AMPERIF ERROR RECOVERY

Document Type: 
Document Number (FOIA) /ESDN (CREST): 
CIA-RDP94T00858R000600880001-7
Release Decision: 
RIPPUB
Original Classification: 
K
Document Page Count: 
8
Document Creation Date: 
December 28, 2016
Document Release Date: 
February 12, 2008
Sequence Number: 
1
Case Number: 
Publication Date: 
April 13, 1983
Content Type: 
REPORT
Body: 
Approved For Release 2008/02/12 : CIA-RDP94T00858R000600880001-7

NDS OPERATIONS PROCEDURE MANUAL
SYSTEMS SW & HW
NO. P-AO03
13 April 1983
ORIGINATOR: STAT

DATA CENTER OPERATIONS BRANCH
DCOB OPERATIONAL PROCEDURE MANUAL
OPERATIONS - GENERAL
No. 50-0009
25 February 1983

AMPERIF CACHE DISC ERROR RECOVERY

PURPOSE:
1. This will establish a procedure to follow when an error occurs with the Amperif Cache Disc Subsystems.

REFERENCES:
2. The following references are incorporated within these procedures: the Amperif Cache Disc Operators Manual and the Univac 5046/8434 Operators Manual.

PROCESSING:
3. See accompanying attached documents.

Attachment: a/s
STAT

CACHE ERROR RECOVERY:

There are three major areas where an error can occur in a cache disk sub-system. They are:

1. Control Units
2. Disc Drives
3. Cache Memory

Before any action is taken for repair and/or recovery, it must first be determined where the error is occurring. The operator should know what types of errors can occur in a disc sub-system; for an understanding of these errors, reference the Univac Operators Manual on disc sub-systems.

In the process of determining where an error has occurred, some initial procedures should be followed:

1. Only under the direct supervision of a customer engineer and OCO should the cache memory be initialized. Once the memory has been initialized, all data that was in cache will be lost and cannot be recovered.
2. Do not power off any cache modules or control units. This will destroy the data that is in cache and it cannot be recovered.
3. Before doing any resetting, check the toggle registers twice to confirm what type of reset you are about to do.
4. The status of the control unit(s) and any outstanding error messages should be quickly surveyed and recorded. This will give the C.E.'s additional information when they arrive.
5. A call should be put in to the C.E.'s when an error occurs so that they can give you additional information and/or help with recovery procedures by telephone.

In determining where an error occurs, one should first look at the disc drive in question (the logical address of the disc drive, control unit and path will be contained in the error message). The disc drive is the most probable cause of error. Being mostly a mechanical device, it has the highest chance of an error occurring on it. If an error does occur, a few items should be looked at to determine if the disc drive is where the error is occurring:

1. Are any fault lights lit on the operator's indicator panel?
2. Is the drive powered on and in a ready condition?
3. Is the write-protect switch on?
4. Are the fans blowing, indicating power to the disc drive?
5. Is a pack in place and the lid closed (only for non-Winchester type disc drives)?

If any of the above items applies to the situation, appropriate action should be taken to correct the condition. (See Attachment) Once the action has been taken, answer any outstanding console messages with an "A" and the system should return to normal. If the same type of problem occurs on another drive, but through the same path and/or control unit, further action must be taken. There is a high possibility that the problem is occurring in one of the other areas.

If none of the previously mentioned situations has occurred, the problem will probably be associated with a control unit, cache and/or path. The following suggestions may help in isolating the problem.
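
Before turning to those suggestions, the first-pass decision described above (look at the drive first, then suspect the control unit, cache or path) can be written out as a short Python sketch. This is only an illustration and not part of the procedure: the DriveStatus fields and function names are hypothetical, and the reading that an abnormal answer to any of the five questions counts as a drive-level condition is an assumption.

```python
from dataclasses import dataclass
from typing import List

OTHER_AREAS = "control unit, cache and/or path"


@dataclass
class DriveStatus:
    """Operator's answers to the five drive questions (hypothetical field names)."""
    fault_light_lit: bool
    powered_on_and_ready: bool
    write_protect_on: bool
    fans_blowing: bool
    pack_in_place_lid_closed: bool
    is_winchester: bool = False  # question 5 applies only to non-Winchester drives


def drive_conditions(status: DriveStatus) -> List[str]:
    """List the drive-level conditions that call for corrective action."""
    found = []
    if status.fault_light_lit:
        found.append("fault light lit")
    if not status.powered_on_and_ready:
        found.append("drive not powered on and ready")
    if status.write_protect_on:
        found.append("write-protect switch on")
    if not status.fans_blowing:
        found.append("fans not blowing")
    if not status.is_winchester and not status.pack_in_place_lid_closed:
        found.append("no pack or lid not closed")
    return found


def suspected_area(status: DriveStatus, repeats_via_same_path_or_cu: bool) -> str:
    """Pick the area to investigate first.

    A problem that repeats on another drive through the same path and/or
    control unit points away from the drive, whatever the drive shows.
    """
    if repeats_via_same_path_or_cu:
        return OTHER_AREAS
    return "disc drive" if drive_conditions(status) else OTHER_AREAS
```

For example, a drive with its fault light lit and no repeat on another drive is classified as a drive problem, while a drive showing none of the five conditions sends the operator on to the control unit, cache and path checks.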
In dealing with cache disk systems, certain errors or conditions can occur where a loss of data will be incurred. This is an unfortunate fact: no matter how many safeguards are introduced to the system, there is always the possibility of losing data. Some instances where data loss is inevitable are:

1. Loss of a power supply to a cache module.
2. Uncorrectable errors in cache.
3. Loss of a memory board and/or memory control board.
4. Hard errors in the file register board (i.e., FRB parity errors, etc.).

If one of the above conditions does occur, systems (S.P.S.) and a customer engineer should be notified immediately. It should be noted that, without the aid of some special tools and diagnosis, the person observing or trying to locate the source of the problem will not know what condition is occurring, with the exception of the hard FRB error. He will only know, through his observations of display readouts, of a potential problem that might exist where loss of data will be incurred.

In trying to isolate control unit problems (this also includes path problems) and cache memory problems, the operator should attempt to put all of the disc drives into a bypass mode. If this is done successfully and processing continues normally, it is advisable to leave all of the drives in a bypass mode and record any errors that have been logged. At this point the C.E. should be called and time scheduled to look at the problem.

There will be the possibility of errors occurring where the data in cache cannot be written back to the disc drive. If the disc drive has no errors, an attempt should be made to do the following:

1. A data copy should be attempted and write-protect placed on the data copy drive. Refer to the Amperif Cache Disc Operators Manual for data copy procedures.
2. Disable control unit B and do a bypass all.
   a. If it works then o.k. - leave control unit B disabled and down via software (console).
   b. If the problem reoccurs, do the above procedure (a) for control unit A.
3. If the above procedure does not work, notify systems personnel and C.E.'s.
4. At this point the data is stored in the data copy and is good. Cache can be re-initialized and data restored to cache from the data copy drive. Another attempt can be made to place all the drives into bypass after the data has been restored.
5. If the disc drive has an error, attempt to move the pack and I.D. plug to a spare disc drive and try again to put the drives into bypass.

Control Units

In isolating control units, as previously mentioned, all drives should be placed in bypass if possible. Once this has been accomplished we can begin to isolate which control unit and/or path is causing problems. The procedure to be followed for isolating a control unit should be as follows:

1. Before attempting to determine which control unit may be bad, check the shut-down authorization light coming from the U.P.S. If this light is on, place the U.P.S. into bypass mode and reset the control units, but do not initialize cache. Answer all outstanding messages. The system should go back to normal. It should be noted that the U.P.S. is defective and should be looked at. It is not advisable to put disk drives in a caching mode; they should be placed in either write-thru or bypass mode. If a problem persists then continue with the following steps.
2. Down the associated control unit, via the console, with the outstanding error message.
3. Answer the message with the proper response. If the error does not return and normal processing continues, the control unit and/or path is bad. Time should be scheduled for emergency maintenance.
4. If an error message does continue, do the above procedure (#3) for the other control unit.
   a. If an error still continues, it is possible that the disc drive is at fault or that another problem exists.
5. If a control unit does prove to be bad, it would be useful information to the customer engineer whether or not the problem stays with one path or both.
6. In determining which path may be bad, down the associated path on the bad control unit with the outstanding error message. Up the control unit and see if the problem goes to the other path. If so, do the same procedure for the other path. If the problem goes through both paths of the control unit you can be certain that there is a control unit problem. Leave the control unit down and all the drives in bypass and schedule emergency maintenance.
7. If in the last step the problem stayed with a path, one can isolate the problem further. This means determining whether the problem stays with the path from the XFER switch to the control units or with a path from an IOU to the XFER switches.
8. When isolating a path problem one must ensure that both IOU's in question (the associated IOU's on the XFER switch) are in the system. If they are not, then notify the OCO's that a configuration change must be made.
9. To determine which path is bad, or whether both are, down the control unit and both paths in question. Switch the XFER switch and up the associated path and control unit. If no problems occur then the possibilities are that the XFER switch or the path from the IOU's to the XFER switch is bad. If the problem remains, it will be associated with the control unit or the path from the XFER switch to the control unit.

DISK DRIVE ERRORS

FAULT CONDITION                     ACTION TO BE TAKEN

I.   Fault light indicated on       1. Reset fault light and answer any
     operator's panel.                 outstanding messages.
                                    2. Recycle drive.
                                       a. If fault does not appear, answer
                                          any outstanding messages.
                                       b. If fault remains, remove pack and
                                          I.D. plug to known good spare disc
                                          drive and answer any outstanding
                                          messages.*

II.  Is the disc drive powered      1. Power on disc drive and press ready.
     on and ready?                     a. If drive comes ready, answer
                                          message with an "A"; system should
                                          go back to normal.**
                                       b. If drive does not come ready or
                                          fault occurs, remove pack and I.D.
                                          plug and place on a known good
                                          drive.*

III. Is the write-protect           1. Remove write-protect and answer
     switch on?                        messages.**

IV.  Are fans blowing (i.e., is     1. If power is to the disc drive then
     there power to the drive)?        try to recycle drive.
                                       a. If drive comes ready and you are
                                          able to answer message with an
                                          "A", proceed as normal.**
                                       b. If drive does not come ready then
                                          remove pack and I.D. plug to a
                                          known good spare drive.*
                                       c. If no power is going to the disc
                                          drive, the associated circuit
                                          breaker should be checked and, if
                                          tripped, put back on. If it
                                          continues to trip, move pack and
                                          I.D. plug to a known good spare
                                          drive.*

V.   No pack or lid not closed.     1. Put pack into drive, make sure drive
                                       is to be in system, start drive and
                                       answer with an "A".

*  NOTE: A C.E. should be called and notified of the problem.
** NO NEED TO CALL C.E.

Attachment "A"
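
The fault-condition table in Attachment "A" is in effect a lookup from an observed condition and its outcome to an action and a C.E. note. A minimal Python sketch of that lookup follows. It is illustrative only: the dictionary keys, outcome labels and function name are hypothetical, the action strings are paraphrases of the attachment, and a note is recorded only where the attachment itself carries a "*" or "**" marker.

```python
from typing import Dict, List, Optional, Tuple

CALL_CE = "A C.E. should be called and notified of the problem."  # "*" note
NO_CE = "No need to call C.E."                                     # "**" note

MOVE_PACK = "move pack and I.D. plug to a known good spare drive"

# condition -> list of (outcome, action, note); keys and outcome labels are
# hypothetical shorthand for rows I to V of the attachment.
ATTACHMENT_A: Dict[str, List[Tuple[str, str, Optional[str]]]] = {
    "fault_light": [
        ("fault clears", "answer any outstanding messages", None),
        ("fault remains", MOVE_PACK + "; answer any outstanding messages", CALL_CE),
    ],
    "not_powered_or_ready": [
        ("drive comes ready", 'answer message with "A"', NO_CE),
        ("drive does not come ready", MOVE_PACK, CALL_CE),
    ],
    "write_protect_on": [
        ("always", "remove write-protect and answer messages", NO_CE),
    ],
    "fans_not_blowing": [
        ("drive comes ready", 'answer message with "A" and proceed as normal', NO_CE),
        ("drive does not come ready", MOVE_PACK, CALL_CE),
        ("no power to drive", "restore power per item IV.c; if it trips again, " + MOVE_PACK, CALL_CE),
    ],
    "no_pack_or_lid_open": [
        ("always", 'put pack into drive, start drive and answer with "A"', None),
    ],
}


def lookup(condition: str, outcome: str) -> Tuple[str, Optional[str]]:
    """Return (action, C.E. note) for a fault condition and observed outcome."""
    for known_outcome, action, note in ATTACHMENT_A[condition]:
        if known_outcome == outcome:
            return action, note
    raise KeyError("no entry for %r with outcome %r" % (condition, outcome))
```

A call such as lookup("fault_light", "fault remains") returns the move-the-pack action together with the note that a C.E. should be called. Keeping the note beside each action mirrors the way the attachment marks individual outcomes rather than whole fault conditions.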