PHASE II FINAL REPORT VOLUME V SYSTEM ORGANIZATION, FUNCTIONS, AND PROCEDURES

Document Type: 
Collection: 
Document Number (FOIA) /ESDN (CREST): 
CIA-RDP78-03952A000100050001-7
Release Decision: 
RIPPUB
Original Classification: 
S
Document Page Count: 
493
Document Creation Date: 
November 16, 2016
Document Release Date: 
February 3, 2000
Sequence Number: 
1
Case Number: 
Publication Date: 
March 1, 1965
Content Type: 
REPORT
Body: 
Approved For Release 2000/05/30 : CIA-RDP78-03952A000100050001-7

SYSTEM ORGANIZATION, FUNCTIONS, AND PROCEDURES

DIRECTORATE OF SCIENCE AND TECHNOLOGY
OFFICE OF COMPUTER SERVICES

WARNING

This material contains information affecting the National Defense of the United States within the meaning of the espionage laws, Title 18, USC, Secs. 793 and 794, the transmission or revelation of which in any manner to an unauthorized person is prohibited by law.

CONFIDENTIAL

Phase II Final Report
Volume V
SYSTEM ORGANIZATION, FUNCTIONS, AND PROCEDURES
CHIVE/R-3-65
1 March 1965

TABLE OF CONTENTS

5.1. Introduction
    5.1.1. General
    5.1.2. System Overview
5.2. System Organization
    5.2.1. Background
    5.2.2. Proposed Organizational Concept
    5.2.3. Position Descriptions
5.3. Data Base
    5.3.1. The Selection Problem
    5.3.2. Basic Selection Criteria
    5.3.3. Sources to be Exploited
    5.3.4. Level of Coverage
5.4. CHIVE Indexing Technique
    5.4.1. Introduction
    5.4.2. Concepts
    5.4.3. System Description
5.5. System Files
    5.5.1. Introduction
    5.5.2. Document Index Files
    5.5.3. Document Image Files
    5.5.4. Vocabulary Control Files
    5.5.5. Unsynthesized Information Files (UIF)
    5.5.6. Summary Information Files (SIF)
    5.5.7. Special Projects Files
    5.5.8. Referral Service Files
    5.5.9. Management Data File
5.6. System Flows and Transactions
    5.6.1. Document Input
    5.6.2. Document Retrieval
    5.6.3. Information File Building, Maintenance and Retrieval
    5.6.4. Task Tables for System Transactions
5.7. File Conversion
    5.7.1. Introduction
    5.7.2. Document Index Files
    5.7.3. Document Image Files
5.8. Computer Interface
    5.8.1. General
    5.8.2. Command Language
    5.8.3. File Definitions and the EDP File Analyst
    5.8.4. Summary
5.A. The Organizational Problem
    5.A.1. Organizational Objectives
    5.A.2. Alternative First-Level Organizational Concepts
    5.A.3. Organizational Alternatives Within A Geographic Division
5.B. Preliminary Evaluation of the CHIVE Indexing Experiment
    5.B.1. Summary Description of Experiment
    5.B.2. Preliminary Findings
    5.B.3. Feasible Alternatives in Index Design
5.C. CHIVE Indexing Guide
    5.C.1. Introduction
    5.C.2. Content Indexing System
    5.C.3. Header Data Transcription Guide
    Tab A Code Schedules
    Tab B Project CHIVE Tags
    Tab C CHIVE Index Terms
    Tab D CHIVE Header Form
    Tab E Authorized Abbreviations/CHIVE
5.D. Inherited Files
    5.D.1. Introduction
    5.D.2. Index Files
    5.D.3. Document Image Files

FIGURES

5-1 CHIVE System Flow Chart
5-2 UIF File Building Alternatives
5-3 SIF File Building Alternatives
5-4 Document Input Processing
5-5 Document Retrieval Processing
5-6 Information File Maintenance
5.D-1 List of China-Related Inherited Files
5.D-2 Vocabulary Control, Summary and Unsynthesized China-Related Inherited Files
5.D-3 Format A - SR Subject/Commodity File Card
5.D-4 Format B - SR (China) Area Detail File Card
25X6
5.D-5 Format C - SR Organization 25X6 File Card; SR Personality File Card; SR Foreigner File Card
5.D-6 Format D - SR Soviet Organization File Card; SR Soviet Personality File Card; SR Soviet Foreigner File Card
5.D-7 Format E - All Other Organization File Card
5.D-8 Format F - All Other Personality File Card; All Other Foreigner File Card
5.D-9 Format G - PI Subject/Commodity File Card; PI Area File Card
5.D-10 Subject/Commodity and Area Files
5.D-11 Organization Files and Derivative Files
5.D-12 Job 3 File Statistics
5.D-13 Reports Title Index
5.D-14 Job 3 Card Format
5.D-15 Job 3 (KWIC) Elements of Information
5.D-16 FIB Town/City Information Card Format
5.D-17 FIB Installation Information Card Format
5.D-18 FIB Location Cross Reference Card Format
5.D-19 FIB ICF Coordinate Card Format
5.D-20 FIB ICF City Cross Reference Card Format
5.D-21 FIB ICF Name Card Format
5.D-22 FIB Model-Type Brochure Index Card Format
5.D-23 Punched Card Characteristics of the IRS Document Index File (New)
5.D-24 Punched Card Characteristics of the IRS Document Index File (Old)
5.D-25 Punched Card Characteristics of the Film Index File

TABLES

5-1 CHIVE Inputs
25X1A
5-2 Index Report
5-3 Over-Counter Document Search
5-4 Generation and Input Processing of Formatted Information/Index Records Prepared Under Contract
5-5 Information Analyst Activity Relative to an All-Source, All-File Search for a Named Personality

Chapter 5.1. INTRODUCTION

5.1.1. GENERAL

This volume of the report is primarily concerned with the non-EDP aspects of the CHIVE system, that is, the organization of personnel required to operate the system and types of personnel needed, the nature and extent of the data base to be exploited, the indexing philosophy and technique, the files which will be identified to the user, system flows and data handling procedures, and the man-machine interactions projected for the computer-centered system.

Of course, not all design problems have been resolved. Moreover, even if they had, it would not be possible to describe within the confines of one volume all of the transactions which must be performed in a system as large and complex as this. However, illustrations of representative tasks are included and some concepts of system data flows are presented to demonstrate the impact of hardware and programs upon personnel actions.

The recommendations interspersed in this volume result from Phase II of the design study, including a preliminary evaluation of the CHIVE Indexing Experiment conducted between November 1964 and January 1965, and are supported by material in some of the appendices to this volume as well as in earlier CHIVE documentation.
The other appendices present further details on the indexing language and technique and the files to be inherited from the existing central reference repositories. All are recommended reading for recipients of this report who desire more detail on specific aspects of the system, as well as further background on the alternative configurations considered and the steps taken to arrive at the recommended system. A supplementary appendix to this volume will be issued later describing the CHIVE Indexing Experiment in greater detail, and reporting the final conclusions derived therefrom.

5.1.2. SYSTEM OVERVIEW

A simplified graphic view of the CHIVE system can be obtained by referring to Figure 5-1. In this diagram the flow paths within the system are separated for descriptive purposes into three major functional categories--document input processing (flow path 2), document retrieval processing (flow path 1), and information file building and maintenance (flow path 3). The following paragraphs will summarize briefly the major elements of the system, leaving the more detailed explanations to subsequent chapters in the volume.

In general, the philosophy of the CHIVE system is to combine the required intellectual talents of trained intelligence information analysts with the processing and storage capabilities of the computer. The source documents to be input to the system, the necessary human functions to be performed relative to these documents (i.e., reading, selecting, indexing, querying and reporting) and the outputs to be derived from the system are quite similar to those which characterize one or more elements of the existing central reference operation.
Only if the proposed system is compared to an individual register subsystem within the current OCR complex does the contrast appear, and then only with respect to certain features of the existing subsystem.

In terms of file organization, the system follows the approach used in SR/OCR and DD/OCR in maintaining a separation between an index and the document holdings to which it refers. This necessarily has implications in terms of input time which may compare unfavorably with some of the current systems which are oriented toward multiple-filed documents (e.g., BR/OCR), but it also offers certain advantages in such areas as procedural standardization, index integration, number and variety of access points to the files, space requirements, etc.

The information is received primarily in the form of documents; however, index records to maps, photographs, and films will also be included in the system, as will certain machine-language data prepared on contract (but under CHIVE control) by external organizations (e.g., the Library of Congress). Following preparation of the index record (a function normally performed by humans except where only a limited retrieval capability seems required), the index will be converted to machine storage with the aid of an optical character reader and placed in a random access device, ultimately the IBM System/360 Data Cell Drive. The information storage capacity of one Data Cell Drive will allow us to accommodate the content of an estimated 600,000 index records (the actual storage capacity is 400 million characters of information), and there is no practical limit on the number of modules that could be provided.
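
The two capacity figures quoted above imply an average allowance per index record, which can be checked with a short calculation. The sketch below is illustrative only: the function names are hypothetical, the 2,000,000-record example is an assumed figure not taken from this report, and the per-record result is an upper bound, since the report also places the directory to the index records on the same device.

```python
# Figures quoted in the text for one Data Cell Drive.
DRIVE_CAPACITY_CHARS = 400_000_000  # stated storage capacity, in characters
RECORDS_PER_DRIVE = 600_000         # estimated index records per drive


def chars_per_record(capacity_chars: int = DRIVE_CAPACITY_CHARS,
                     records: int = RECORDS_PER_DRIVE) -> float:
    """Average number of characters available per index record."""
    return capacity_chars / records


def drives_needed(total_records: int,
                  records_per_drive: int = RECORDS_PER_DRIVE) -> int:
    """Number of drives (modules) required, rounded up to a whole drive."""
    # Integer ceiling division avoids floating-point rounding surprises.
    return -(-total_records // records_per_drive)


if __name__ == "__main__":
    # About 667 characters per record, before any directory overhead.
    print(round(chars_per_record()))
    # Hypothetical file of 2,000,000 index records (assumed figure): 4 drives.
    print(drives_needed(2_000_000))
```

Because the estimate of 600,000 records is the report's own planning figure, the calculation only restates it in per-record terms; it does not say how long an actual index record would be.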
The same device would be used to hold what might be called the directory to the index records themselves, i.e., a list of the terms which appear in the index records and, for each term, the record and phrase number(s) containing said term. This would obviate the need to examine every index record in the file to see if it contains the term (or terms) sought. Index entries can be retrieved from the index store at the rate of about two per second depending on the number of terms involved in the search formula.

CHIVE's recommendation is that most textual documents should be converted to microfilm and stored either in the form of 35 mm. aperture cards (containing up to 8 images per aperture) or packed microfiche (sheet microfilm records containing up to 60 letter-size pages on each microfiche). Documents in excess of a certain page limit and those of poor image quality should be kept in hard copy. Maps, films, and photos will continue to be stored in the conventional manner in the physical repositories in which they are now located.

Whether the 35 mm. aperture card or microfiche storage system is chosen, the document images should be filed in motorized card files, but should be retrieved and refiled manually. Assuming 10 million documents were to be stored on site, the estimated floor space required for a packed microfiche system would be an area approximately 30' x 60'; for the 35 mm. system, 40' x 70'. Output from either the hard copy document or microimage files would consist of paper copies.
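
The directory described earlier in this overview, a list of terms with the record and phrase numbers containing each, corresponds to what is now usually called an inverted index. A minimal sketch follows, assuming a hypothetical in-memory layout; the sample records, the terms, and the function names are invented for illustration and are not taken from the CHIVE design.

```python
from collections import defaultdict


def build_directory(index_records):
    """Map each term to the set of (record number, phrase number) pairs.

    index_records: dict of record number -> list of phrases,
    where each phrase is a list of terms (assumed layout).
    """
    directory = defaultdict(set)
    for record_no, phrases in index_records.items():
        for phrase_no, terms in enumerate(phrases, start=1):
            for term in terms:
                directory[term].add((record_no, phrase_no))
    return directory


def search_all_terms(directory, terms):
    """Return sorted record numbers whose index record contains every term."""
    record_sets = [
        {record_no for record_no, _ in directory.get(term, ())}
        for term in terms
    ]
    if not record_sets:
        return []
    return sorted(set.intersection(*record_sets))


if __name__ == "__main__":
    # Hypothetical index records: record number -> phrases -> terms.
    records = {
        101: [["STEEL", "PRODUCTION"], ["SHANGHAI"]],
        102: [["STEEL", "IMPORTS"]],
        103: [["SHANGHAI", "PORT"], ["STEEL"]],
    }
    directory = build_directory(records)
    # Only the directory is consulted; no index record is scanned.
    print(search_all_terms(directory, ["STEEL", "SHANGHAI"]))  # [101, 103]
```

The lookup touches only the entries for the terms in the search formula, which is the saving the text attributes to the directory: the full file of index records need not be examined.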
The integrity of the document collection will be maintained such that none of the master microimages, or original documents if filed only in hard copy, will leave the file except for photoduplication or hard copy printing.

Chapter 5.2. SYSTEM ORGANIZATION

5.2.1. BACKGROUND

The organizational configuration recommended by CHIVE is the product of much thought and discussion extending back into the Phase I study and reflects a variety of views expressed by persons both internal to the CHIVE design team as well as to OCR. One of the two most vexing and, at the same time, one of the most important of the CHIVE design problems, it is not anticipated that the organizational plan which has evolved will be attractive to all. Nevertheless, it appears to offer the best hope of achieving the desired system objectives, consistent with the human factor requirements imposed by the environment within which the system must operate.

The search for a revision of the existing central reference organizational structure was largely influenced by the findings of the DD/I survey and the set of system requirements derived therefrom. These findings and inferred goals have been described in the CHIVE Phase I Report, in CHIVE/R-1-63, and (in more abbreviated fashion) in Volume IV, Chapter 2, of this report. The organization study itself may be said to have consisted of three phases:

    a. An analysis of the personnel or management requirements imposed on the system by the overall system objectives.
    b. A study of various alternative organizational configurations which might be adopted, ranging from a completely decentralized activity to various kinds of centralized operations, including alternative configurations at different hierarchic levels.
    c. An evaluation of one organizational concept by the process of subjecting the concept to a practical experiment which simulated to some extent the problems to be encountered in a live environment.

Phases a. and b. are described in some depth in Appendix 5.A. to this volume and are briefly reviewed below. Phase c., which resulted in some revision of the organizational concept, is discussed in Appendix 5.B. and only its conclusions are reflected here.

In considering the managerial problem of how best to organize the input and retrieval functions to be performed, as well as the personnel to carry out these functions, a number of organizational requirements were set forth. These requirements, or objectives, may be summarized for the purposes of this review as follows:

    a. Specialization with minimum processing duplication
    b. Minimum customer contact points
    c. All-source service from any point
    d. Close communication between input and query handlers
    e. Close communication between system operators and users
    f. Document control as the first priority
    g. Operator job satisfaction
    h. Personnel flexibility

The next step was to pass various organizational configurations against these objectives to determine which would appear to offer the best hope of accommodating the defined goals. Because of the size of the contemplated activity in terms of the number of personnel needed to operate the system, this required that alternative configurations be considered not only at the first organizational level, but at least at one additional level below that.
For the first cut the following four different organizational concepts were considered:

    a. Retention of the existing OCR configuration
    b. Development of a single, all-source document retrieval system, with a separate biographic information facility
    c. Dispersal of some or all of the information storage and retrieval activity among the research and production components
    d. Continuation of the central system, but on an all-source, geographically-organized basis

Where the additional subdivision of personnel would be required because of the size of a particular component these additional means of grouping the analysts assigned thereto were studied:

    a. Organization by document source (Collateral, Comint, etc.)
    b. Organization by function (input, retrieval, information file maintenance, etc.)
    c. Organization by class of data to be stored and retrieved (biographic, installation, subject/commodity, etc.)
    d. Organization by topic (political, scientific, economic, military, etc.)

The study very quickly made clear that none of the alternatives considered resolved all problems that could be anticipated. However, the combination of the geographic approach at the first level, and topical specialization (where required) at the second, seemed to come closest to meeting the organizational objectives outlined above. It remained to be seen, however, whether an information analyst could perform all-topic indexing of all-source documents satisfactorily, and what effect it might have on his morale and attitude if he had to operate in this kind of environment.
The CHIVE Indexing Experiment afforded the opportunity to test the configuration proposed and, as detailed in Appendix 5.B., identified a number of problem areas which suggested that some additional organizational and procedural alternatives might well be considered. Of principal interest from the organizational point of view was the recommendation that the geographic concept be retained but that the coding process per se be separated from the function of selecting documents and identifying the subjects or objects to be indexed. Acceptance of this approach meant some compromise of the single-point indexing concept but offered the advantage of increased job satisfaction on the part of the more highly qualified analyst, helped reduce the selection problem, and suggested the possibility of acquiring more personnel for less money to perform the more routine input functions. Since it still permitted achieving all the other organizational objectives, it was selected as the alternative best satisfying the system requirements and is the approach recommended here.

5.2.2. PROPOSED ORGANIZATIONAL CONCEPT

The responsibility for implementing a specific organizational configuration must be left to those who will direct the operation since there are a variety of factors to be considered which are beyond the purview of the system designer. To assist those, however, who will be charged with this activity, it might be useful to summarize the principal CHIVE organizational recommendations in the context of the major functions to be performed within the system, and to give some feel for the interrelationships between these functions since these could have implications for management in terms of communication interface, assignment of physical space, and so forth.
This first look will be an abbreviated one since much of the same ground is covered (if from a slightly different point of view) in more detail in other sections of this volume. A set of position descriptions outlining the duties and responsibilities of the various types of personnel within the system concludes the chapter.

5.2.2.1. Input Control and Customer Service

The CHIVE system would be built largely around Information Analysts organized (at the first level) into some four or five geographic components. It is our view that it is difficult to identify any better way of organizing the input and retrieval activity than by grouping the primary individuals involved by geographic area. As stated in earlier documentation, this approach loses the advantage of source specialization in processing and poses the problem of geographic overlap in document analysis and query coordination. At the same time, it contributes to standardization of vocabularies and procedures important in an all-source environment, and is in focus with customer inquiries which normally relate to a particular geographic region of the world. Thus, on balance, while it does not overcome all operational problems that can be envisaged, of all the alternatives considered it seems to come nearest to meeting the system objectives.

Without specific restrictive criteria (which, thus far, seem impossible to obtain) with respect to the content of the documents to be processed, the experienced Information Analyst, operating in close communication with his customers, appears to offer the best hope of resolving the data selection problem. The Information Analyst would, therefore, be responsible for determining not only what documents entered the system files but what data within these documents was captured for retrieval purposes.

The Information Analyst operating out of a geographic component would also be solely responsible for the selection and processing of data input to information files required by customers, and would handle all queries levied on the system. By virtue of the fact that he was personally involved in the input process, he would not only be familiar with the current reporting but would know what material had been stored for retrospective searching and how to get at it.

Whether the Information Analyst should also specialize by topic within area or by some class of intelligence data (e.g., biographic, installation, etc.) remains a moot point. CHIVE continues to favor the former in the belief that it would lessen the number of times a document would have to be handled, but additional testing of both concepts is desirable.

5.2.2.2. Index Preparation

The function of physically preparing the index records to documents, including both the header (bibliographic) as well as the content data descriptions, would be assigned to special personnel known as Header Indexers and Content Indexers, operating in close communication with the analytical components. Content Indexers serving one geographic desk, e.g., the Far East, should probably be located together as a unit attached to said component. The Content Indexers, like the Information Analysts, would be subdivided by geographic area and each would normally process the output of his counterpart analyst or analysts.
Content Indexers would each have a set of the dictionaries and other vocabulary control tools pertinent to his area of responsibility. In addition, a master set of other area dictionaries would be located within each content indexing group for reference purposes.

Content Indexers would translate the items of data tagged by Information Analysts into the codes and other descriptors dictated by the vocabulary of the system. To increase their sense of participation in the more intellectual aspects of the input process (and, thereby, reduce turnover), they might be given full responsibility for general subject indexing as distinct from named-object control.

Header Indexers would perform a function similar to content indexing, but on the bibliographic elements of a document. One group of Header Indexers would operate in a centralized mode, serving all geographic components by header indexing, immediately upon receipt, those documents for which CHIVE has a repository responsibility. Other Header Indexers would be assigned to each geographic organization to capture the necessary bibliographic data pertaining to non-repository-type documents which had been reviewed by Information Analysts and selected for retention by the system.

5.2.2.3. Dissemination

The dissemination function, apart from any necessary re-routing of documents within a CHIVE geographic component, is external to the system per se. However, it might be advantageous to co-locate dissemination personnel with the centralized header indexing group to shorten the time between document receipt and file availability for repository-type documents.

5.2.2.4. Data Transcription

This function refers to the rather formalized typing operation required if, as planned, optical recognition equipment is to be used to convert index and other records into machine-recognizable form. Header Indexers can type their inputs in a form suitable for processing by a page reader. However, a central pool of typists will also be needed, operating at the system level, to convert the majority of transcript sheets received from Content Indexers as well as search requests from Information Analysts into the graphic quality required. This central pool can be supplemented by typists assigned to the various area desks who, in addition to typing finished reports, memoranda, etc., would also transcribe many of the file maintenance and query transactions for input to the page reader.

5.2.2.5. Image Processing and Document File Maintenance

Image processing is that activity conducted by the so-called "Document Delivery System," i.e., the microfilming and associated operations required to convert incoming documents to microimage form, as well as the reproduction of items retrieved from the document store for delivery to customers. This is a relatively discrete function although, if an aperture card storage system is employed, it requires some support from the machine side of the house. Otherwise, its principal interface is with the document store itself to which materials are passed after microfilming and from which it receives, in turn, items to be reproduced.

During the evolutionary development of the CHIVE system both the new and old system operators will require
access to many of the same document collections. If the logistical problems are not too severe, it would seem advisable to co-locate all master document files in one general physical area to lessen the communication problem as well as render file maintenance operations more efficient. This might increase the distance which now obtains between an existing central document collection and a set of users, but over time the majority of users would probably benefit from the establishment of one "Document Center." Similarly, because of the close relationship between the document files themselves and the image processing function, it is recommended that the latter be connected both physically and organizationally to the former.*

* Problems involved in co-locating files are discussed in Volume III.

5.2.2.6. Machine Functions

The principal machine-related activities and hardware include:

    a. EAM personnel and equipment needed to input data to files not yet absorbed into the new system and to retrieve data therefrom. Assuming no conversion to an EDP storage medium, the latter, in particular, will necessitate the retention of an EAM facility for as long as the inherited files have value.
    b. EDP hardware needed to operate the new system, including associated I/O devices (e.g., the page reader), and computer operator personnel.
    c. System analysts/programmers (referred to in this report as EDP File Analysts) who will develop and refine the machine operations to be performed, define new files to the system, etc.
Logically, all of these personnel and operations should be centralized in one organizational component whether located within the central reference complex or external to it. 5.2.3. POSITION DESCRIPTIONS The personnel involved in making up the CHIVE operator complex will include the following: Information Analyst, Content Indexer, Header Indexer, Dictionary Editor, Data Transcriber, Information Control Clerk, Document File Clerk, Reproduction Equipment Opeator, EAM Operator, Computer Operator, and EDP File Analyst. 5.2.3.1. Information Analyst The Information Analyst will be the principal inter- mediary between the customer and the system. He will be responsible for selecting what goes into the files and will screen all output before it is delivered to a requester. Senior Information Analysts will serve in various supervisory capacities frm the sub-Section to the Branch or Division level, directing, coordinating, and reviewing the work performed by their subordinates. SYSTEM ORGANIZATION Position Descriptions 5.2.3.1. - 21 - Approved For Release 2000/05/30 : CIAMMT-03952A000100050001-7 Approved For Release 2000/05/?ifcgURDP78-03952A000100050001-7 All Information Analysts will hold professional positions, and will specialize in a particular geographic area and (where required by-reason of work volume) by topic within area. Every Information Analyst will be trained in applying the indexing vocabulary to documents by actual involve- ment in the coding process. He will also be thoroughly familiar with all the CHIVE-built files available within the system as well as the query language used to interro- gate or modify said files. In addition, he will know what inherited files were acquired from the existing system and their general content, although not necessarily the vocabulary used in these files. The duties of an Information Analyst will include: a. 
Receiving and reviewing the content of documents, cables, graphics, and other incoming data for information worthy of retention by the central reference system.
b. Selectively marking the elements of information to be extracted from the documents for representation in the system's index files and distributing the marked documents to Content and/or Header Indexers.
c. Exploiting the content of document index records for the purpose of building formatted information files pertaining to a specific subject or class of subjects.
d. Preparing file maintenance transcript sheets as the means of adding data to, or changing data within, said information files.
e. Receiving requests from customers and preparing the necessary search prescription after consulting the relevant vocabulary control files and other Information Analysts most familiar with the vocabularies of certain inherited files.
f. Requesting copies of documents as well as dossiers and other master records from the central document repository.
g. Reviewing, analyzing, and synthesizing data recovered as a result of the search process and preparing responses in raw or finished form for delivery to the customer.
h. Advising customers about files or persons external to CHIVE that might be worthwhile consulting, and personally contacting same if required.
i. Recording necessary management data relative to requests received, responses furnished, and other system processes.

5.2.3.2. Content Indexer

The Content Indexer will be a semi-professional possessing at least a high school education. His duties will include:

a. Extracting the elements of information in a document identified for him by the Information Analyst.
b.
Consulting the relevant dictionaries and other vocabulary control files for the purpose of selecting the appropriate controlled terms to express these items of data.
c. Arranging the data into a form for machine entry using pro-forma content data transcript sheets.
d. Consulting with the appropriate Information Analysts and Dictionary Editors with regard to the application of the index language and possible revisions to the system vocabularies.
e. Initiating additions or changes to the vocabulary control files through preparation of file maintenance transcript sheets.
f. Reviewing printouts of changes and additions to the files, including incorrect entries.

5.2.3.3. Header Indexer

The Header Indexer will occupy a clerical position and must be a qualified typist. The duties of the Header Indexer will include:

a. Extracting the standard header (bibliographic) data appropriate to the category of document involved, and expressing this data (where required) in the codes used by the system.
b. Typing the data in the prescribed manner for machine entry using the correct header data transcript sheet.
c. Consulting with the Dictionary Editor for Header Data with regard to the use of the header data codes and format, and recommending changes when required.

5.2.3.4. Dictionary Editor

The Dictionary Editor will be an Information Analyst with primary responsibility for control of one of the system's vocabulary files. Some Dictionary Editors will have system-wide control over the application of terms in their respective subject areas. Others (e.g., an Organization Dictionary Editor) may govern the use of terms only within a given country or other geographic area. The
duties of a Dictionary Editor will include:

a. General review of the content and format of sample transcript sheets emanating from Indexers assigned to the area unit of which he is a part.
b. Providing advice and counsel to Indexers on the use of the specific dictionary for which he is responsible.
c. Reviewing all new entries to the dictionary for the purpose of determining whether each was a legitimate entry and whether format and content met established procedures.
d. Personally initiating changes to a dictionary where required.
e. Reviewing printouts of changes and additions and insuring that all revisions to the dictionary are published and disseminated.
f. Consulting with other Information Analysts and customers regarding current requirements and possible improvements to the system's vocabulary control files.
g. Advising Information Analysts preparing request statements on the terms to be used in the query prescription.

5.2.3.5. Data Transcriber

A Data Transcriber is any person exclusively assigned to operate a key-driven device from copy provided via another system operation. The duties of a Data Transcriber will be as follows:

a. Receive format instructions from Information Analyst, Content Indexer, or other individual for typing, tape perforation, or card punching.
b. Prepare typed copy, punched paper tape, or cards for optical character recognition or other form of computer entry.
c. Check transcribed copy for accuracy and correct if necessary.
d. Operate typewriter, Flexowriter-like device, 026 Key Punch, and 056 Verifier.

5.2.3.6. Information Control Clerk

Information Control Clerks will be assigned to most operational components of the system. Their general duties will include:

a. Receiving material such as hard copy documents, machine listings, document request forms, paper and magnetic tapes, card decks, etc.
b. Accounting for material received and maintaining necessary special-purpose logs of requests and other actions.
c. Intra-office routing and delivery of materials to staff personnel and mailing of system products to customers.
d. Assisting Information Analysts in the routine maintenance of manual files including the insertion of handwritten entries to machine listings and other hard copy records.

5.2.3.7. Document File Clerk

The duties of the Document File Clerk will include:

a. Filing newly-processed documents or refiling old materials into one or more of the following types of document files: personality or installation dossiers, card files, open-shelf document files, and 16 mm. or 35 mm. aperture card collections.
b. Receiving requests for documents or other records to be retrieved and recovering same from the appropriate files on either a routine or priority basis.
c. Maintaining dossier and other special-purpose logs pertaining to transactions affecting the document files.
d. Recording action taken on document request forms, forwarding requests for unrecovered documents to other file repositories for searching, and transmitting master records to image processing for photographic reproduction.

5.2.3.8. Reproduction Equipment Operator

The duties of the Reproduction Equipment Operator will include:

a. Receiving incoming documents and determining which are photographable and which must be stored in hard copy.
b.
Operating the appropriate microfilming equipment required to reduce the documents to a micro-storage medium and reviewing the quality of the photographic record.
c. Receiving documents retrieved from the master files and reproducing same on a variety of image-processing equipment.
d. Servicing and supplying reproducing equipment.
e. Supplying copies of documents to Information Control Clerks for delivery to internal or external requesters.

5.2.3.9. EAM Operator

EAM Operators will be required to process certain card files inherited from the existing system as well as select new files. In general, the duties of an EAM Operator will include:

a. Operating electrical accounting machines including interpreter, reproducer, tabulator, sorter, and printer units.
b. Performing routine machine operations in accordance with conditions outlined by EDP File Analysts, Information Analysts, and Indexers.
c. Wiring panels in accordance with directions.

5.2.3.10. Computer Operator

In general, the duties of the Computer Operator will include:

a. Maintaining a schedule and operating log of the components of the computer complex.
b. Loading and unloading Tape Units.
c. Loading and operating stored programs.
d. Tracing and correcting program errors.
e. Correcting failures in card, paper tape, or optical character reading equipment.
f. Wiring and/or selecting control panels for use in card reading machines.

5.2.3.11. EDP File Analyst

The duties of the EDP File Analyst will include:

a. Determining from Information Analysts requirements for new system files and developing the record structures, file formats, and output products needed to establish and maintain such files.
b. Preparing general and special-purpose programs either for the purpose of converting extant machine-language files or to provide new data processing capabilities.
c. Testing newly designed programs utilizing the computer and necessary input/output units.
d. Conducting studies of system data flow for the development and refinement of programs.
e. Determining utilization requirements for input/output devices including displays, and designing programs to permit exploitation of input/output capabilities.
f. Designing quantitative techniques and statistical devices for special program applications.
g. Preparing procedures descriptions including coding formats and flow charts for operator task guidance.

Chapter 5.3.
DATA BASE

5.3.1. THE SELECTION PROBLEM

The selection problem has been with OCR since its inception. No coordinated study of selection as an entire OCR problem has ever been made. Individual registers have established selection criteria, some more formalized than others. An attempt to summarize these criteria for compatibility, or to establish common criteria to be used by all registers, was not deemed necessary heretofore. Since each register has been more or less independent, ipso facto, its criteria have for the most part been unrelated to those of any other register. This condition has led to non-uniform levels of coverage and, in some cases, duplicative processing of the same subject matter. Regardless of CHIVE, if OCR adopts a geographical organization posture, uniform criteria for document series and depth of subject indexing become mandatory within each geographic component.

5.3.2. BASIC SELECTION CRITERIA

Selection criteria will depend on two factors: (a) the documents used and information needed by the analytic offices; (b) the all-source concept and organizational configuration thereof. These two factors have to be balanced against the manpower and resultant capability available for the operation.

There seems to be a consensus of opinion that several levels of indexing should be applied to the various categories of documents:

- Entire series to be indexed in depth.
- Entire series to be rejected for depth indexing, but to receive header or bibliographic control.
- Entire series to be rejected completely.
- Specific documents within a series to be indexed in depth.

Selection of an indexing level for a particular document category is contingent upon customer reaction and acceptance; determining this requires discussion of interest in series not covered now and re-examination of series presently covered. Customer participation in determining selection criteria can mean the success or failure of the system in terms of usage. Once the level of indexing is agreed upon, document priorities will need to be established for implementing the CHIVE system, since not all categories can be implemented simultaneously within the initial system.

5.3.3. SOURCES TO BE EXPLOITED

The following major document series are planned for CHIVE control:

Raw Intelligence Reports (Collateral)
State Airgrams
Military Attache Reports
CIA Reports--00, CS, etc.
Military Command Reports
Selected Other Governmental--AID, USIA, etc.
[redacted]
International Organizations--NATO, etc.
[redacted]
Cables (Collateral)
CIA-TDCS
Non-CIA
Finished Intelligence
U.S.
Open Publications and Translations
FDD
JPRS
COMINT
Messages
Reports
Photo Interpretation Reports (T/KH)
Maps, Films, and Ground Photos
Miscellaneous
Select Contractual--[redacted], etc.
State Biographic Cards
Unclassified Selected Periodicals, e.g., for China: Peking Review, Survey of China Mainland Press, etc.

Criteria for the depth of coverage will be developed by the CHIVE information analyst working in concert with the research offices. He will direct the indexer as to coverage and depth, i.e., which personalities, which organizations, and/or which subjects should be indexed.

The CHIVE Indexing Experiment has shown the need for title coverage of most documents regardless of the level of indexing unless the document or series is completely rejected. This includes title preparation for those types to be selectively indexed which have no titles, e.g., non-CIA cables.

5.3.4. LEVEL OF COVERAGE

5.3.4.1. Raw Intelligence Reports

Since the information content of IR's supports a variety of intelligence interests, all IR's will be considered for some level of indexing. Duplicative information which frequently occurs between sources will be eliminated wherever possible, based on the information analyst's recall capability supplemented by data contained in dictionaries and identifier lists.

5.3.4.2. Cables

CIA cables (TDCS's) have always been handled as Information Reports and should be continued as such.
As for non-CIA cables, their very fragmentary and highly perishable nature and the frequent duplication by follow-up reporting would indicate that only a small percentage of these cables are worthy of storage for retrospective search purposes. Only those cables containing positive foreign intelligence information will be indexed for header control as well as content. All others will be rejected completely--the Cable Secretariat continuing to retain repository responsibility for same.

5.3.4.3. Finished Intelligence

The Intelligence Publications Index (IPI) and Special Register's Job 3 are published by OCR to provide current awareness, and, to a lesser extent, retrospective subject and area searching for finished intelligence. In addition, some finished intelligence is incorporated into the files of BR and FIB. Since the Agency has a repository responsibility for finished intelligence, bibliographic control over such material will be established in the CHIVE system for document retrieval purposes. Furthermore, some named-object indexing of finished intelligence documents will be performed similar to the control currently maintained by BR and FIB.

During the evolution of the CHIVE system, the bibliographic and named-object control achieved by the [redacted] and subsequent branches will in part duplicate the contents of the IPI and Job 3. This duplication seems unavoidable since the issuance of these publications should continue in order to serve the current awareness needs of analysts, and it does not seem feasible during the implementation period to split the preparation of the publications between CHIVE and the existing activities.
When implementation has been completed, however, it will be desirable to investigate the feasibility of producing a permuted title index to finished intelligence from the machine-stored data base as a replacement for both the IPI and Job 3.

[redacted]

5.3.4.5. [illegible] Translations

FDD and JPRS translations can be considered as one type. Heretofore in CIA no subject indexing scheme has incorporated both of these open literature sources. The manpower needed to cope with this large volume (see Table 5-1) is of significant concern. However, broad customer interest dictates in-depth subject and named-object control.

Table 5-1
CHIVE INPUTS

Column headings: Series; Approximate Annual Volume; Repository Responsibility; Bibliographic (B) and/or Content (C) Control; Remainder.

Series listed: Raw Intelligence (including TDCS's), Cables, Finished Intelligence, Translations (FDD, JPRS), COMINT, Photo Interpretation Reports, Maps, Films and Ground Photos, [redacted].

Approximate annual volume, in the order printed: 253,500; 192,000; 7,803; 78,000 items; 44,300 items; 109,050; 7,900; 6,000; 87,000; 903,035; 7,200; 36 625 items; 1,560 items; 3,400 items; 104 items; 12,925.

Repository responsibility, in the order printed: X; X.

Control and remainder entries, in the order printed: B and C; Not processed; B only; B and C; B and C; B and C; B and C; B and C; B and C; Not processed; B and C; B and C; B and C; B and C; B and C; B; B and C; B and C; B only; B; B only; B only; B only; B only; B and C; Not processed; Not processed; Not processed; Not processed; Not processed.

5.3.4.6. COMINT

All hard-copy SI material with the possible exception of military order-of-battle data will be considered for indexing in depth.
Teletypes are excluded in their entirety pending the design of an automatic processing capability which will take advantage of the fact that the data is available in machine language. An information analyst knowledgeable in both collateral and SI may be able to spot duplicative information if such exists.

One large series of SI material which, in the present OCR/SR system, is given cursory control, will be studied to determine whether it should receive any title or subject control whatsoever. A few items in this series were processed during the experiment, but the titles were so general as to be practically worthless for retrieval.

5.3.4.7. Photo Interpretation Reports

The unquestioned value of this category requires that all published reports receive in-depth content and header indexing.

5.3.4.8. Maps, Films, and Photos

These categories of receipts will be excluded from CHIVE processing control because of the specialized knowledge needed for their analysis and input, the difficulty of separating the indexing function from the acquisition activity, etc. It has been agreed, however, to have these materials indexed by GR and the Map Library according to the CHIVE indexing scheme, and the index records will be incorporated into the CHIVE data base.

5.3.4.9. Other

A number of miscellaneous classes of documents will also be processed by CHIVE. Most (e.g., press reviews and surveys) will receive named-object indexing primarily. The large volume of State biographic cards needs rigid selection and weeding not only to determine names of interest but also to eliminate repetitive information. Like the [redacted], there is little high-grade ore contained therein in relation to volume.
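The coverage decisions of this chapter can be pictured as a small lookup from document series to indexing level. The following is only a minimal sketch in Python, not part of the CHIVE design: the four levels are those listed in Section 5.3.2, while the series keys, the names `IndexingLevel`, `COVERAGE`, `level_for`, and `needs_title`, and the level assigned to each series are assumptions made for illustration, being one reading of Sections 5.3.3 and 5.3.4.

```python
from enum import Enum


class IndexingLevel(Enum):
    """The four levels of indexing listed in Section 5.3.2."""
    SERIES_IN_DEPTH = "entire series indexed in depth"
    SERIES_HEADER_ONLY = "entire series given header or bibliographic control only"
    SERIES_REJECTED = "entire series rejected completely"
    SELECTED_IN_DEPTH = "specific documents within a series indexed in depth"


# Hypothetical series keys. Each assignment is an assumed reading of
# Section 5.3.4, not a table taken from the report.
COVERAGE = {
    # 5.3.4.7: all published reports receive in-depth content and header indexing.
    "photo_interpretation_reports": IndexingLevel.SERIES_IN_DEPTH,
    # 5.3.4.5: in-depth subject and named-object control.
    "fdd_jprs_translations": IndexingLevel.SERIES_IN_DEPTH,
    # 5.3.4.2: only cables with positive foreign intelligence are indexed.
    "non_cia_cables": IndexingLevel.SELECTED_IN_DEPTH,
    # 5.3.4.6: teletypes excluded pending an automatic processing capability.
    "comint_teletypes": IndexingLevel.SERIES_REJECTED,
}


def level_for(series: str) -> IndexingLevel:
    """Return the indexing level recorded for a series, or raise KeyError."""
    try:
        return COVERAGE[series]
    except KeyError:
        raise KeyError(f"no coverage decision recorded for series {series!r}") from None


def needs_title(level: IndexingLevel) -> bool:
    """Section 5.3.3 calls for title coverage of most documents at every
    level except complete rejection; this sketch simplifies 'most' to 'all'."""
    return level is not IndexingLevel.SERIES_REJECTED


if __name__ == "__main__":
    for series in sorted(COVERAGE):
        level = level_for(series)
        print(f"{series}: {level.value}; title coverage: {needs_title(level)}")
```

Keeping the decisions in one table, rather than scattered through procedures, reflects the chapter's concern that criteria be uniform across components; a series not yet discussed with customers simply has no entry.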
Chapter 5.4.
CHIVE INDEXING TECHNIQUE

5.4.1. INTRODUCTION

The most critical design element of the proposed system is the indexing system to be applied to input documents; the performance of the system is no better than the data with which it is supplied. The transformation of textual material to the system language is an expensive process, one which has been given more attention than any other in the Phase II effort.

5.4.2. CONCEPTS

5.4.2.1. Document/Information Retrieval

The system will provide combined information retrieval and document retrieval capability. Documents themselves will be at the heart of the system, with their index records providing access to them through content control. The index records will also be the base from which information files will be built. That is, in the process of indexing documents, facts about named things of intelligence interest will be extracted and stored.

The approach will be to extract information about specific named objects, keep this information in the context of the document for document retrieval, and manipulate this information out of context for information retrieval. It is not proposed to create non-redundant summary records from index records at input time either through human or machine collation. Summary records will be formed and maintained on select high-interest personalities, installations, and other finite subjects, but the creation of these records will be an analytic activity requiring the synthesis of index records and documentary information.

In addition to the index records, the indexer working aids will themselves be a source of answers to questions.
For example, the Organization Identifier List will contain names of organizations, their locations, type of activity, etc.

5.4.2.2. Manual Indexing

An investigation of the state-of-the-art of automatic indexing reveals that it is still largely experimental and is not sufficiently precise to meet most of the Agency's retrieval requirements. Automatic indexing techniques usually involve word frequency counts, assigning weights to high-frequency words, and storing these words as index terms. Other techniques include syntactic analysis, sometimes in conjunction with the above statistical process. It is obvious that these techniques could not be applied to an intelligence storage and retrieval system requiring a high relevance/recall rate, since much intelligence information is inferential and interpretive and requires analysis for high-quality indexing. Human indexing, therefore, with its recognized faults, is still superior to automatic techniques and is the only feasible system for CHIVE.

However, some documents will require only title indexing and in these cases automatic title-indexing techniques can be applied. The most notable title-indexing system is the Key-Word-In-Context (KWIC) method. In this system, the key words in titles are permuted so that each word appears in its alphabetic file position along with the other significant surrounding words from the title. The permuted titles can be machine stored for searching on demand, or printed listings can be generated for manual perusal.

5.4.2.3. Depth--Subjects vs. Named-Objects

It need hardly be argued that intelligence interests are catholic in nature, and that if an information storage and retrieval system arbitrarily decides to limit its coverage to personalities, installations, or conceptual-type subjects, it automatically limits its ability to satisfy its total customer population.

Intelligence analysts have found that "named-objects"--e.g., installations, personalities, organizations--most often provide the clues to resolving research problems. OCR request experience is an accurate reflection of this interest. We recommend, therefore, that these subjects receive the greatest emphasis; and, in view of OCR experience relating to the kinds of things users are interested in concerning named-objects, we recommend that an increased number of attributes of named-objects be brought under control. Attributes are the elements of information which identify a named object, e.g., a person's address, organizational affiliation, etc. In-depth indexing of named-object attributes does not necessarily have to mean an equivalent increase in the volume of data indexed or in indexing time, since common attributes, such as addresses, types of organizations, and products of an installation, will be stored in indexer identifier lists (see Section 5.4.2.5 below), and it will not be necessary to re-index this data when it is reported repetitively in documents.

We recommend that subject indexing, that is, the kind of indexing performed by the Intellofax system and the Subject/Commodity Section of the Special Register, be continued at least to the present level, but on a broader data base to include important document series (e.g., foreign translations) which are excepted today.

5.4.2.4. Index Language; Linkage

The CHIVE indexing language consists of controlled entries taken from identifier lists and code schedules, as well as words and phrases extracted directly from documents.

5.4.2.4.1. Identifier Lists and Code Schedules

In the case of certain kinds of named-objects, identifier lists are required to ensure that the same organization, place, etc., is always entered in the same manner so that information is not missed during retrieval because of incorrect or synonymous entries. In the subject indexing area, a subject authority list or code scheme is required to control the depth of indexing, synonyms, and homographs.

In some cases, the authorized entry form will be identical to the way the entry will frequently appear in documents. In other instances, the entry will be converted to a code to either express the hierarchic structure built into the identifier list--e.g., the hierarchic arrangement of organizations in a Communist country--or to compress a long entry into more abbreviated form to conserve storage space.

5.4.2.4.2. Extracted Words and Phrases

Words or phrases extracted from documents are used (a) to index certain kinds of named-objects which will not receive identifier list control, (b) to give greater specificity to subject indexing, and (c) to provide information retrieval via the index record.

In the first instance, it is felt that identifier list control of all named-objects is impractical and impossible. Where the volume of reporting is reasonably restricted, or where one can predict fairly well which
named-objects will be the subject of customer queries, it makes sense to control input through identifier lists. Such is the case, for example, for place names and priority organizations and installations. Personalities, however, are neither few in number nor can one readily anticipate which names will be requested. Similarly, in the case of lower-level installations, it would not pay to exercise a high degree of input control when it is probable that the referenced information will be retrieved infrequently, if at all. For both these categories, therefore, we recommend that the burden of overcoming the synonym problem be transferred to the output end of the system.

Key words taken from documents are added to subject index categories to provide greater retrieval specificity without complicating the subject schedule. The subject indexing vocabulary provides a medium-depth, generic searching capability. Key words added to the subject schedule provide a specific search capability, e.g., equipment nomenclatures, types of research, new concepts, etc.

The third application for entering key words from documents is to provide a level of information retrieval. In this case, the entry is uncontrolled, but the class of entry is searchable. For example, one of the personality attributes in the CHIVE system is "Reason for Travel." [...] information would be provided by the index record which would aid in selecting documents or in some cases obviate the need to refer to documents.

5.4.2.4.3. Linkage

Index entries which are related (e.g., an organization and its address) will be linked together in the index record so that the relationship can be interrogated at the index record level, thus negating the need to refer to documents to determine ties among elements of information. This is necessary because intelligence documents typically include many people, organizations, areas, subjects, and their interrelationships. If there were no way to determine the contextual relationship between these subjects, the system would be overburdened with false retrieval matches (false drops) requiring reference to many irrelevant documents.

Linkage can be accomplished through the use of formatted input, as is typical in punch card systems (i.e., all entries in one defined record are by definition linked), or by appending a linkage symbol to each index entry, as is typical in systems utilizing unformatted input. Formatted input records are not practical for CHIVE because of the long record lengths and large number of variable elements of information included. Experimentation with appending the linkage symbol to each entry has worked very successfully and will be adopted.

5.4.2.5. Requirements for Identifier Lists and Thesauri

The use of identifier lists is recommended for the following reasons:

(a) There is little consistency in the way named-objects are reported, e.g., the Institute of Physics of Moscow University may be referred to as the Moscow Institute of Physics, or the Moscow Physics Institute, or the Physics Institute of Moscow University, or the Nuclear Physics Institute, etc. Even place names are translated and transliterated in a variety of ways.
Therefore, if named-objects were entered as reported, it would be very difficult at retrieval time to determine the right synonyms to use in order to find the variant entries. An identifier list includes variants but allows only one correct entry format.

(b) An identifier list (e.g., for organizations) contains not only the name of the organization, but also a number of identifying attributes of the organization, including address, commodities produced, etc. This capsule summary aids the indexer in identifying and discriminating among organizations and improves the quality of the indexing.

(c) As was pointed out earlier, an identifier list helps decrease redundant indexing because the common attributes of a named-object do not have to be repetitively indexed when they are listed in the identifier list.

(d) Identifier lists are of value for answering queries of a non-complex nature such as the correct spelling of an organization or place, the precise location of a facility, etc.

Identifier lists will be required for installations and organizations, place names, significant national and international meetings and conferences, and personalities on whom physical or logical dossiers are maintained. The initial identifier lists will be constructed from the machine language data which exists in OCR, and will be issued to indexers in machine-listing form organized geographically in the various sort orders as required.

Key words will be appended to hierarchic classification terms to reflect the terminology of documents and to provide greater search specificity. The initial concept is that these words will be entered as written in documents and will not be subject to thesaurus control. The key words
may be printed out, however, in answers to queries on the hierarchic subject codes to which they are appended, and should aid in determining which documents are relevant. For example, if a requester is searching for a particular aluminum alloy and three of the index records retrieved refer to alloys in which he is not interested, the requester can screen out these references from further consideration.

If in the future it is determined that dictionary control over key word entries will raise the quality of the indexing and retrieval, key word thesauri can be created by obtaining listouts of the key words which have been applied to the individual hierarchic codes. These key word lists would be turned over to dictionary editors who would resolve synonym and homograph problems and weed out undesirable terms. It is felt that this method of building a thesaurus, i.e., building it from the actual terminology used in documents, is both superior to and cheaper than trying to adapt an established dictionary to the Agency's indexing problem. In addition, one can take advantage of the uncontrolled key word indexing prior to the building of the thesauri.

5.4.2.6. Header Data Indexing

The foregoing discussion dealt with CHIVE concepts related to indexing the subject content of documents. Another important aspect of document indexing relates to the so-called header (or bibliographic) elements of the document such as title, author, control number, etc. Header data indexing is required for the following reasons:

(a) To obtain bibliographic control of documents over which the Agency has a repository responsibility.
(b) As searching parameters in conjunction with subject or named-object searches.

(c) To provide minimum index control over documents which are not indexed in depth.

In the first instance above, header data control would perform a service comparable to that performed by the source card file maintained by the CIA Library. The machine-stored header data record will be used to verify the receipt of documents in the Agency and to recover specific documents whose control numbers are unknown.

In the second instance, header data control will be used most often to limit searches (e.g., searches can be restricted to certain document series or dates), or a subject request can specify that information is required only when authored by a particular scientist.

In the third instance, header data will provide minimum, but important, search keys at very little input cost. Permuted title indexes can be published for certain series (e.g., finished intelligence) in lieu of in-depth indexing. Similarly, searches can be made for all reports issued by a particular post during a specific time period when an important event occurred. In this latter case, all documents can be retrieved whether they were subject indexed or not.

Whereas the selection of documents for content indexing will be subject to well-defined criteria and therefore limited, it is anticipated that most documents can be brought under header data control. This possibility is rendered more likely by the fact that header data indexing (with the exception of title expansion) can be performed by clerical personnel, as borne out by the recent CHIVE Indexing Experiment.

5.4.3. SYSTEM DESCRIPTION

What follows is a summary description of the indexing technique. A detailed description is given in Appendix 5.C.
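As orientation for the summary, the central mechanism (tagged entries grouped into numbered phrases, described in 5.4.3.2 and 5.4.3.3) can be pictured with a small Python sketch. It is an illustration under stated assumptions, not the CHIVE implementation: the tags "PNM" and "ONM", the accession numbers, and the function names are all hypothetical. It shows why a linked query returns fewer false drops than an unlinked one.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    doc: str     # document accession number
    phrase: int  # phrase number the indexer assigned within that document
    tag: str     # three-character tag (the values used below are hypothetical)
    value: str   # the indexed term


def linked_matches(entries, wanted):
    """Return (doc, phrase) pairs whose phrase contains every wanted (tag, value)."""
    groups = defaultdict(set)
    for e in entries:
        groups[(e.doc, e.phrase)].add((e.tag, e.value))
    return sorted(key for key, terms in groups.items() if wanted <= terms)


def unlinked_matches(entries, wanted):
    """Return documents containing every wanted term anywhere, linked or not."""
    docs = defaultdict(set)
    for e in entries:
        docs[e.doc].add((e.tag, e.value))
    return sorted(doc for doc, terms in docs.items() if wanted <= terms)


# Hypothetical index: DOC-2 mentions the person and the organization,
# but in different phrases, so the two are not related in that document.
INDEX = [
    Entry("DOC-1", 1, "PNM", "PERSON A"),
    Entry("DOC-1", 1, "ONM", "INSTITUTE X"),
    Entry("DOC-2", 1, "PNM", "PERSON A"),
    Entry("DOC-2", 2, "PNM", "PERSON B"),
    Entry("DOC-2", 2, "ONM", "INSTITUTE X"),
]
QUERY = {("PNM", "PERSON A"), ("ONM", "INSTITUTE X")}

print(linked_matches(INDEX, QUERY))
print(unlinked_matches(INDEX, QUERY))
```

In this sketch the linked search reports only the phrase in DOC-1, while the unlinked search also returns DOC-2, where the co-occurrence is accidental.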
5.4.3.1. Elements of Information and Indexing Tools

As stated above, the CHIVE indexing concept includes "named-objects" and "subjects." Named-objects refer to people, places, organizations/facilities, and conferences/meetings. Subjects include commodities, concepts, research activities, military activities, and all other topics and events which do not fall under the above-defined named-objects.

5.4.3.1.1. Personalities

Personality names will be entered more or less as they appear on documents. Only those misspellings which can be recognized without reference to identifier lists or other support files will be corrected. The use of name search tools such as the Name Tables and printouts of unique personal name/surname combinations entered into the system will be investigated as substitutes for controlling names during input processing. When a specific name is searched, and all of the records relating to that personality have been identified, this identification will be retained so that subsequent searches for the same personality will have to address only those records which have been entered since the previous search.

A detailed list of the attributes of personalities which will be indexed is included in Appendix 5.C. Most of these attributes will be entered in a prescribed manner and thus will be available for direct searching in term files. For example, all locations will be entered from approved gazetteers, dates will be formatted, organization affiliations will be entered from organization identifier lists, etc.
This will provide the capability to make information retrieval type queries from the index.

5.4.3.1.2. Organizations/Installations

Two levels of control will be applied to organizations and facilities. Priority organizations will be included in identifier lists. These lists will also include significant attributes of the organization, e.g., addresses, synonymous names, function code, products, etc. The lists will be built from the machine language data which exists in SR, BR, and FIB. The organization identifier lists will be issued on a country basis in several arrangements, i.e., by name of organization, by function, and by place name location.

For organizations on the list, the indexer will enter an identifying number in lieu of the organization's name, thus ensuring that all indexed information relating to a specific organization can be retrieved exclusive of other organizations with the same or similar names. Attributes of organizations included in the identifier lists will not be re-indexed when the same information is repetitively reported.

Low-level installations and organizations will not be identifier list controlled. They will not be indexed by name but rather by location and a function code. It may be desirable later to produce listings of these facilities for mapping and aerial photographic customers. Once these listings are established, it is unlikely that any further indexing of these facilities would be required unless the status of the facility changed.

5.4.3.1.3. Area/Locations

For indexing large geographic areas, e.g., blocs, countries, and provinces within countries, the ISC area code has proven a satisfactory tool and it is recommended that it, or a similar country code, be adopted. For place
names within countries, there are a number of gazetteers available, e.g., OCR generated gazetteers, the NIS gazetteer, etc. The NIS gazetteer has recognized faults, but it is generally conceded to be the most authoritative tool available and it is recommended that it be used as the authority for entering place names. The basic gazetteer will be updated with new place name entries encountered in documents and will be issued on a country-by-country basis. Place names will be entered in clear text as they are spelled in the gazetteer, appended to the appropriate country code.

Geographic coordinates will be entered in index records only when they are not associated with a place name. Coordinate searches will be accomplished by a machine search of the gazetteer to locate the place names having the desired coordinates, followed by a search of the place name term file plus a search for those coordinates that were entered without an associated place name.

5.4.3.1.4. Meetings/Conferences

Significant national and international meetings and conferences will be controlled in identifier lists. Earlier comments on the use of identifier lists for organization control apply to this category also. Less significant conferences will not be indexed by name, but will be subject indexed with appropriate ISC subject codes.

5.4.3.1.5. Subjects

The Intelligence Subject Code has been used throughout the Intelligence Community for a number of years for subject indexing, and it is generally recognized as the best general indexing tool for intelligence documents.
For these reasons, CHIVE has recommended that it be used as the basic subject indexing tool in a revised OCR system. However, during the CHIVE Indexing Experiment, several weaknesses were noted which should be corrected prior to its adoption in a going system.

5.4.3.1.5.1. ISC Structure

The 1960 revision of the ISC did much to simplify its structure. However, experience in using this edition points to several areas where further simplification is desirable.

(a) Expanded Use of Modifiers: The ISC subject modifiers are a faceting device which can be combined with certain subjects to specify actions or states which affect those subjects. For example, the modifier "049 Production" can be combined with any commodity to indicate production of the commodity. The 1960 revision greatly expanded the use of these modifiers over previous editions, but further expansion is desirable in two ways:

(1) The 1960 revision limited the use of the modifiers, i.e., each modifier could only be used with specific chapters or sections of the ISC. As a result, in sections where a modifier cannot be applied, it has been necessary to set up a subject code in lieu of the modifier. For example, modifier "069 Government Policies, Laws, Legislation, etc." can only be applied to the commodity chapter of the ISC. As a result, a subject code for government policy has had to be set up in various non-commodity sections of the ISC. If the modifiers were freed and the redundant subject codes deleted, the ISC could be applied more efficiently. During the CHIVE Indexing Experiment, the modifiers were freely applied, and no particular difficulties ensued.
(2) In a subject classification system, the same subject is often repeated in several different sections because each section gives a different meaning or emphasis to the subject. For example, in most classification systems, guided missile subjects would be found under engineering, production activities, and military activities. This repetition is logical, but it complicates the structure of the system and makes it hard to apply. A generalist indexer often finds it difficult to determine whether the information he is reading is oriented toward engineering or production. If he mistakenly puts engineering information under production, it may be lost in a later retrieval run. With the addition of some new subject modifiers, much of this repetition could be eliminated, i.e., the various subject facets could be shown through the use of modifiers to distinguish production from military activities, etc. This would also considerably reduce the size of the ISC.

(b) Expanded Use of Clear Text: Some of the detailed subject breakdowns in the ISC could be eliminated with a more liberal use of clear text. During the experiment's indexing consistency test, it was found that there was a low level of consistency in applying the ISC. This can be attributed to the depth of subject detail in the ISC, i.e., one indexer will use "621.349 Uranium" and another indexer will index the same subject matter using "621.351 Natural Uranium." If some of this subject detail were further reduced so that there was only one subject code for uranium, the consistent application of the ISC would rise measurably. Moreover, indexing specificity (e.g., natural vs. enriched uranium) could still be achieved by using controlled clear text as an extension of the subject code.
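A minimal Python sketch of the single-code-plus-clear-text idea follows. It is an assumption-laden illustration: the choice of "621.349" as the one retained uranium code, the qualifier vocabulary, and the function names are hypothetical and are not taken from the ISC.

```python
# Assumption: one subject code is kept for uranium; "621.349" is reused here
# only for illustration. The qualifier vocabulary is hypothetical.
CONTROLLED_TEXT = {"621.349": {"NATURAL", "ENRICHED"}}


def make_entry(doc, code, clear_text=None):
    """Build an index entry, rejecting clear text outside the controlled list."""
    if clear_text is not None:
        clear_text = clear_text.strip().upper()
        if clear_text not in CONTROLLED_TEXT.get(code, set()):
            raise ValueError(f"{clear_text!r} is not controlled clear text for {code}")
    return (doc, code, clear_text)


def search(index, code, clear_text=None):
    """Generic search on the code alone, or a narrower one when clear text is given."""
    return [doc for doc, c, text in index
            if c == code and (clear_text is None or text == clear_text)]


INDEX = [
    make_entry("DOC-1", "621.349", "natural"),
    make_entry("DOC-2", "621.349", "enriched"),
    make_entry("DOC-3", "621.349"),
]

print(search(INDEX, "621.349"))              # every uranium reference
print(search(INDEX, "621.349", "ENRICHED"))  # restricted by controlled clear text
```

Because every indexer uses the same code, the generic search cannot miss a document through an inconsistent code choice; the clear text then narrows the result.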
The advantage of this approach is that with more consistent application of the ISC there is less likelihood of losing information. This may often put a burden on the searcher in that with fewer subject categories, more material will initially be retrieved, but this is preferable to losing information, and the free use of clear text can help alleviate the problem. Thus, if the clear text is uncontrolled, it can be used as a screening device to get rid of unwanted references, or if it is controlled, it can be used as a searching device to restrict the volume retrieved.

5.4.3.1.5.2. Subject Schedules for Occupations and Installations

A subject schedule or code is required for occupations and installation types in order to respond to queries on such subjects as all [25X6]. During the recent Indexing Experiment, specified subject codes in the ISC were designated for this purpose. Since the ISC was not constructed with this aim in mind, the designated codes proved quite inadequate. Problems were caused by the previously mentioned duplication of subjects (e.g., an atomic installation could be indexed in several different places), and by the multiplicity of subjects in the ISC (i.e., a rather simple code schedule was required, and the ISC was too detailed for the purpose). In addition, the ISC did not have appropriate subjects for some occupation and installation categories.

*This need for a generalized subject schedule for occupations and installation types is to be distinguished from the requirement to retrieve by specific activity. The latter capability will be provided either through the ISC code itself or, where necessary, through ISC plus key word.
In view of the above problems, it is recommended that the ISC not be modified to perform this function, but that separate subject schedules be developed based on the experience available in the Foreign Installations Branch and Biographic Register.

5.4.3.1.5.3. Area Rules

The present ISC area rules proved inadequate on several counts during the recent experiment.

(a) The terminology is sometimes confusing; e.g., some rules read "nationality is primary area." Since the CHIVE indexing procedures provide for area tags for nationality and primary country, the terminology is subject to ambiguous interpretation.

(b) Some of the rules are illogical; e.g., there are two subject codes in Chapter VII which can be used for foreign military training and the area rule for one of the codes is the opposite of the other.

(c) The CHIVE indexing technique allows more flexibility in area relationships than is allowed in the ISC as used in the Intellofax system. Consequently, many subjects which now lack area rules should have them appended for CHIVE purposes.

All these rules should be re-examined and modified before the CHIVE system goes operational.

5.4.3.1.5.4. Subject Gaps

Prior to the recent experiment, it was felt that a number of subjects occurred in Codeword materials which did not appear in Collateral documents and that, therefore, the ISC would not be adequate for indexing these materials. For this reason, sections of the Special Register code manual were utilized as a supplement to the ISC during the experiment.
As it turned out, the SR supplement was not used a great deal because the ISC had subject categories which were almost comparable. However, there are a limited number of special-purpose subjects which should be added to the ISC to make it fully suitable for all-source indexing.

5.4.3.2. Tags

Each entry in the CHIVE indexing system is preceded by a tag. A tag is a three-character mnemonic symbol which identifies the entry which follows. Tags are used to:

(a) Distinguish between homographs, e.g., Washington a person vs. Washington a city or street.

(b) Organize machine files, i.e., separate people's names from organizations and subjects and thereby facilitate searching.

The CHIVE tags were made mnemonic as a memory aid. The first character of a tag represents a major subject category, e.g., "P" = Personality, "O" = Organization, etc. The second and third characters further specify the element of information being indexed, e.g., "PVN" = Personality Name Variant, "POH" = Personality Organization Head, etc. (See Appendix 5.C. for a detailed list of the CHIVE elements of information and their associated tags.)

5.4.3.3. Phrasing

The requirements for linkage were discussed earlier. In the CHIVE system, linkage is accomplished through a system defined as phrasing. A phrase is simply a group of tags and terms which the indexer relates together with a unique number which is assigned to each tag and value in the group. On retrieval, queries can specify that the input linkage must be present for the query to be satisfied, i.e., a query may specify all information on a person [25X6]. Without phrasing (linkage), all documents which contained
these two terms would be retrieved and in some cases the relationship would be accidental. On retrieval, the phrase linkage can be reconstituted by testing for those terms which have the same document accession number and phrase number.

The rule for phrasing is very simple. All terms which are logically related can be combined in a phrase. Thus, if a person is affiliated with an organization in [...], these three elements of information can be combined together in a phrase. However, if additional information were given that this individual also traveled to [25X6], an additional phrase would have to be constructed; otherwise it might be interpreted that the organization, if it appeared in the same phrase, was located in both [25X6].

Phrases can be very simple or complex. The simplest phrase contains only a place name or an area and one subject. A complex phrase may contain a number of index terms which constitute a rather detailed biographic sketch of an individual. Further details on phrasing with examples are contained in Appendix 5.C.

5.4.3.4. Header Data Indexing

Header data indexing will be performed by clerical personnel who will type the information on formatted transcript sheets. During the recent experiment, as illustrated in the header section of Appendix 5.C., a single transcript sheet was used for all documents. This does not appear to be as efficient as developing unique transcript sheets for different series. The elements of information comprising header data will be taken from the document, or the information from the document will be translated into code to achieve conciseness and uniformity of entry.
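A formatted transcript sheet of the kind this section describes can be pictured as a fixed-width record. The Python sketch below is only an illustration: the field names, widths, and sample values are hypothetical, since the actual layouts are given in Appendix 5.C. It reads one sheet line by position, without tags, and ties each header element to the document control number.

```python
# Hypothetical layout for one document series: (field name, width in characters).
FIELDS = [("control_no", 10), ("series", 4), ("date", 6), ("post", 12)]


def parse_sheet_line(line):
    """Split one fixed-width transcript line into named header elements."""
    record, pos = {}, 0
    for name, width in FIELDS:
        record[name] = line[pos:pos + width].strip()
        pos += width
    return record


def header_terms(record):
    """Pair every non-empty header element with the document control number."""
    ctl = record["control_no"]
    return [(ctl, name, value)
            for name, value in record.items()
            if name != "control_no" and value]


line = "DOC0000123RPT 650301POST-A"
record = parse_sheet_line(line)
for term in header_terms(record):
    print(term)
```

Because position alone identifies each element, the typist enters no tags, and the control number is the only linkage carried on each term.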
A formatted transcript sheet can be used since the header data elements are fixed in number for each document series, and the length of entries is either fixed or a maximum field length can be determined. The use of a formatted sheet obviates the need for tags, and the only linkage required is to the document control number. The latter will be appended automatically to each header data term. A detailed list of the elements of information comprising header data is included in Appendix 5.C.

Chapter 5.5. SYSTEM FILES

5.5.1. INTRODUCTION

This chapter classifies and describes the logical files and sub-files which will be available in the CHIVE system. These are the files which are identified to the user, i.e., the CHIVE information analyst and, perhaps ultimately, the research analyst. They are the files he must be familiar with if he is to take full advantage of the resources of the system and exploit it intelligently.

The total number of individual system files, including old as well as new, might easily exceed a hundred. However, it is possible to classify all the various files into no more than nine types, each with very distinctive functions and properties. These nine categories are as follows:

Document Index Files: Files containing all the raw document index records in the system, including not only the complete index records themselves but the access mechanism to these records. The documents referenced by these records may include any form of information carrier (e.g., maps, photos, films, or other) and need not necessarily be readily accessible to the system.
Vocabulary Control Files: Files required to ensure consistent entry of index terms (tag and value) into the Document Index Files and other system files. The principal function of these files is to reduce the synonym problem at search time. They include "identifier files" for named objects (which, like scope notes in a code schedule, help to distinguish one specific subject from another), code books, dictionaries, thesauri, and other authority lists.

Unsynthesized Information Files: Files consisting of select phrases or terms extracted from document index records or directly from the raw documents themselves. Such files would be built to facilitate retrieval where a substantial number of requests for the pertinent data can be anticipated on a continuing basis. Unlike Summary Information Files (see below), records in these files would often contain duplicative and/or contradictory information. Periodically, however, information in such files might be reviewed and added to the appropriate Summary Information Files.

Summary Information Files: Files built either from records (or portions of records) in the Document Index Files, from records in Unsynthesized Information Files, or from the raw documents themselves during or after input processing. The distinguishing feature of these files is the fact that they will ordinarily contain evaluated, non-redundant data about named objects or events associated with named objects. Named-object identifier files could be placed in this file category, the only apparent difference being the limited amount of historical data ordinarily found in such files.
Special Project Files: The unique features of these files are as follows: (a) the inputs to the files originate outside CHIVE; (b) CHIVE actually acquires the files and not simply "profiles" thereof; (c) additions or modifications to the files can be anticipated; (d) the files do not use the elements of information and/or vocabulary controlled in CHIVE. Special Project Files may otherwise have the properties of any of the file classes named above. These files will be processed by CHIVE but maintained by CIA or other agency analysts. The degree of CHIVE involvement in such files remains to be determined since the responsibility for such files is currently assigned to the Applications Division of OCS.

Referral Service Files: These files differ from Special Project Files in that they are not substantive data files but rather descriptions or profiles of files located outside the CHIVE system. Referral Service Files will consist both of profiles of analysts' special fields of competence and of profiles of files maintained by analysts and/or information repositories external to CHIVE. CHIVE will not maintain, or retrieve data from, the substantive files themselves. It will simply inform customers of those files potentially relevant to a given query.

Document Image Files: Files of documents stored by the CHIVE system. From a functional point of view they include "aspect" systems (where the index is stored separately from the documents) as well as self-indexed document files. Both existing OCR document collections and CHIVE-originated document repositories are encompassed by this category. The storage media for such files will include hard copy, various types of microimages, and even digital storage in some instances.
Similarly, the categories of documents involved will differ widely in size, shape, classification, and point of origin.

Management Data Files: Files of data collected on the activity of the CHIVE system to (a) enable operational management to evaluate the cost/performance ratio of the system and (b) guide system designers in improving hardware and software support. From the point of view of what data is collected, most of the Management Data Files will have to do with either system processing times or processing volumes.

System Processing Files: Files used to support the system in processing data. Most such files will be organized in table form enabling values to be obtained from arguments. Examples would include a file of legal tags and other error correction files, decode dictionaries which would convert codes into clear text for display to a reader, intermediate files which exist only temporarily during the processing of a transaction, working storage files, etc. Since these files are largely internal to the CHIVE EDP System and the information analyst need not interact with them in any direct way (he need only know what functions the system is capable of performing), they will not be covered further in this volume but rather in Volume VII of the report.

For each of the file categories listed above a second-level categorization may be required, i.e., one which classifies CHIVE files from the point of view of the origin of the files. These classes are three in number:

CHIVE-Built Files: Files built by and for the CHIVE system either from new inputs or through the conversion of existing OCR files to the format and vocabulary of CHIVE. These files will be continually updated as part of the regular processing cycle.
Inherited Files: Files originally established by the various OCR systems which it was not found possible to integrate with new CHIVE files. Such files will include records in hard copy as well as machine language. In some instances these files may be transferred to another storage medium (e.g., magnetic tape) if querying and output can thereby be improved. Similarly, some existing machine-readable files may be restructured and interrogated in the vocabulary of a single CHIVE language. Neither of these changes, however, implies true conversion to the CHIVE system. Another significant difference between these files and CHIVE-Built Files is that while both will be used by the CHIVE information analyst, no additions will be made to the Inherited Files once the CHIVE system is fully operational.

Supplemental Files: Files not built or maintained by CHIVE, nor inherited from OCR, but which contain data functionally useful to CHIVE as a secondary source of information. All Special Project Files (see above) fit this category, as do reference aids of various kinds (e.g., Who's Who compilations, gazetteers, commercially published indexes, etc.) obtained from external sources and left essentially in the form in which they were received.

In the broadest sense the CHIVE system must necessarily include not only the new files it creates but the files it inherits from the existing system. The discussion of these separate but related subjects, however, has been divided in the pages to follow to lessen the possibility of losing the reader in the file forest. In the main body of this chapter, we will focus primarily on the CHIVE-Built Files, providing a summary description of their functions, data content, maintenance criteria, and
other characteristics. Appendix 5.D to this volume describes the principal Inherited Files which must be accommodated by the system, with primary attention given to those which fall in the categories of Document Index and Document Image files, as defined above.

It should be emphasized that the basic objective in this chapter is to communicate a more or less static image of the files in order to simplify understanding of the structural framework (or file philosophy) of the system. In Chapter 5.6. we will examine the more dynamic aspects of file activity within the system, i.e., the transactions which will affect the files, interactions which might take place between files, etc.

5.5.2. DOCUMENT INDEX FILES

5.5.2.1. Master Index File (MIF)

The Master Index File of the CHIVE system will contain the index entries for all the documents available in the CHIVE system as well as certain classes of documents located in repositories not under CHIVE management. Examples of the latter include maps, the storage responsibility for which will be retained by the Map Library Division of ORR, and select open-source books and periodicals which may be accessible only at the Library of Congress or at some other holding agency. Conceivably, certain documents indexed by CHIVE may not even be available at all--for example, select Soviet periodicals never received in this country, but which were described in secondary sources that were accessioned. In all cases, however, whether the original source document is readily available or not, the preparation of index records for the CHIVE Master Index File will be under CHIVE format and vocabulary control no matter where the records are physically prepared.
All index records will be stored in such a manner that a search, based on certain criteria, will produce all the records in the system or, at the customer's option, phrases and/or terms within records which may apply to the search criteria. The index records will contain sufficient information to enable the requester to determine if the document referred to in the index entry should be requested for detailed study. In the case of named-object associated information, the entries will have sufficient information-bearing content to permit summary data files to be built and responses given to certain queries directly from the index records themselves without referral to the source documents.

Records entering the Master Index File will originate from the following sources:

- CHIVE information analysts processing incoming documents in the CHIVE geographic divisions.
- Graphic analysts indexing photos and films in the Graphics Register (GR).
- Map catalogers processing maps in the Map Library Division (ML), ORR.
- Miscellaneous additional organizations (either under contract to CHIVE or agreeing to follow CHIVE input procedures) exploiting primarily foreign language documents. Examples of such organizations might be the Library of Congress, FDD, etc.
- Documents received by CHIVE in machine language (e.g., Comint teletype) on which a limited form of automatic indexing is to be performed.
- Machine-converted document index files from existing central repositories.

With regard to input selection criteria, assuming continuation of present practices, CHIVE will have the responsibility to serve as the Agency's repository for community-published positive intelligence materials (with the exception of cables and maps), and to provide reference service on "active" documents.
In addition, it will presumably assume OCR's obligation to serve as the office of record for archival storage of certain CIA document series.

In order to fulfill these responsibilities, CHIVE will be obliged to index at least the header (or bibliographic) data for every "intelligence" document received. By "intelligence" documents we mean all categories of textual materials generally considered to be in the mainstream of intelligence reporting. These include Comint (messages, reports, and teletype), T/KH reports, USIB-produced IRs, USIB-produced finished intelligence, the FBIS, photo enclosures to IRs, and USIB-produced translations of foreign documents. By agreement with the Library it will also store map index records generated by ML. The preparation of index records on other categories of materials, e.g., cables, non-USIB-produced reports, studies, films, and original open-source literature, will depend on the substantive content therein.

The content of document index records can include any term type permitted by the vocabulary of the CHIVE indexing system. (For a list of all permissible term types, see Appendix 5.C.) No single record will, of course, contain all possible term types since some terms will be unique to certain kinds of documents.

Outputs from the Master Index File will consist of scheduled and ad hoc products. The principal items to be provided within each category are briefly described below.

Scheduled Products

- KWIC listings of titles or expanded titles of all documents which have not been content indexed, as well as the FBIS Daily Reports. The permuted portion of the listing will be the title and expanded title, while the reference portion will include basic header data for the document including document control number.
Separate, as well as combined, listings will probably be run for the different categories of documents involved, e.g., SI Teletype, FBIS, Finished Intelligence, and Raw Intelligence Reports.
- Map catalog cards in 3" x 5" form containing in clear text on each card the entire index record for a map. This record would include accession number, area code, subject, scale, classification, map title, date of publication and name of publisher. The cards would be output in the sequence of the Map Library Card Catalog to facilitate interfiling at the Map Library.
- Output similar to the map cards, but reflecting index records on films stored in the Master Index. In this instance, the records will probably be of tab card size to conform with the size of the existing file. The sequence will also conform with the existing Intellofax reference card file on the film collection.
- Accessions lists composed of clear-text index records on maps, ground photos, and perhaps select additional document receipts (e.g., tables of contents of foreign scientific periodicals) processed by CHIVE.

Ad Hoc (Query) Products

- Listings in natural language of document index records or subsets thereof (i.e., phrases within records) containing the search terms specified in the query. Subject or concept-oriented queries will normally require output of complete document index records including header as well as content data. Named-object-oriented queries will ordinarily result in the output of only those select phrases which match the search criteria, together with a limited amount of header data (e.g., document classification, document type, and appropriate document control numbers).
- Listings of control numbers only for documents whose index records match the search parameters.
- Listings containing simply a computed figure of the number of index records matching the search parameters. This kind of intermediate output will enable the customer to broaden or narrow the search prescription depending on the volume of the anticipated output.
- Listings of index records containing terms which match standing customer queries or analyst interest profiles. Whenever hits occur because new information is received on a subject or person of interest to a particular research analyst, the customer would be notified through the transmittal of the listing containing the pertinent record.

5.5.3. DOCUMENT IMAGE FILES

The Document Image File (DIF) is the central repository for active textual intelligence documents. Maps and graphics are to be retained within the respective organizations currently responsible for their retention, although all of these will be used in conjunction with the computer-based Master Index File described in the previous section.

The Document Image File shall consist of those textual intelligence documents for which the Agency has repository responsibility as well as other documents which are judged to contain information of potential value within the intelligence community. As a central repository it is to be all-source, containing USIB finished and unfinished intelligence reports, FBIS, Open-Source literature (including JPRS and FDD translations), COMINT (messages, reports, and teletypes) and selected cables. The system will be inclusive of inherited document image files (see Appendix 5.D.) as well as newly-accessioned, CHIVE-processed documents. To effect this, files currently maintained in various locations (SR, LY/Circ, etc.)
would be moved to a single physical area within the headquarters building along with the CHIVE document system, thus offering the user a single point of entry for his reference needs. The discussion of a proposed approach to implementing this central repository is contained in section 5.7.3.

The primary purpose of the Document Image File is to serve as a central reference point from which identified documents may be retrieved and copied for distribution. The identification of the documents (by unique identification number) may be accomplished via a computer search of the Master Index File, or the number may already be known to the requester by some other means. The file must be responsive to either type of demand. Documents are not to be circulated outside of the file area, and requests are to be serviced by producing a durable, hard-copy replica of the document master for distribution to the requesting user. The design goals for the volume and turn-around times in responding to these file demands are outlined in sections 6.2.1. and 6.5.6.

Aside from its primary purpose of providing a repository for retrospective reference, a number of secondary purposes must be served by the system. First, provision must be made for a backup file capability. This duplicate file must be produced as a by-product of the input procedure, and must be suitable both as an alternate reference point in the event of loss or destruction of items in the main file and as a means of reconstructing the main file in the event of catastrophic destruction. Provision for selective protection of vital records is also within the scope of the document image subsystem although no special design consideration has been devoted to this requirement in this study.
An additional implicit requirement of the document system is the need to provide archival quality records for those items requiring prolonged retention.

5.5.4. VOCABULARY CONTROL FILES

5.5.4.1. Personality Identifier Files

5.5.4.1.1. Master Dossier File (MDF)

The functions of the Master Dossier File are:

- To identify hard copy folder files maintained by CHIVE on select personalities.*

  *The reason for maintaining hard copy folder files, in addition to storing documents in the Master Image File in microimage form, would be the anticipated high request activity on these select documents which would increase reproduction costs significantly. An alternative to maintaining hard-copy personality files would be to maintain lists of documents referring to select individuals, and, when one of these individuals' file is requested, to reproduce all the documents referred to in the list. This approach of building a dossier on a "demand" basis would make sense if experience proves that redundancy in name searches is minimal. Pending further study, however, of the redundancy factor during Phase III, we have assumed a requirement for some hard-copy personality files and made provision for same in the system design described here.

- To reduce search time on requests for select personalities by virtue of the fact that the information analyst determined in advance of the request which incoming name references pertained to these personalities. An analogy could be drawn here to the difference between searching a tightly controlled classification system and an uncontrolled keyword index.
A listing of the Dossier File, wherein is contained a unique file number and set of attributes for each personality cited, is directly analogous to a listing of a classified schedule which contains both a unique code and often scope notes defining each term in the listing.
- To provide by means of a printout of the identifying information on each dossier personality a summary-type information record which can be used to answer requests, serve as a reference aid for research analysts, facilitate screening of dossier files without the necessity for examining the files themselves, etc.

The initial CHIVE Master Dossier File will be derived from the BR dossier system with possibly some deletions to the latter file. The subsequent creation of new dossier records will occur largely as a result of name searches, taking advantage of the fact that document index records and their related documents have been analyzed in the course of answering the customer's inquiry.

Dossier identifier records may refer either to physical or logical dossiers. In those cases where it seems desirable to establish a hard-copy file on a personality, a digital record containing select elements of identifying information (see below) will be prepared and added to the Master Dossier File. In addition, all documents containing information on the individual will be reproduced and stored in a folder on the personality involved. Logical dossiers will also be represented by identifier records in the digitalized MDF, but the documents relevant thereto will be accessible only in the Master Image File.
Maintenance of both the hard-copy dossiers and the digitalized Master Dossier File--although left to the discretion of the analyst--would also ordinarily be performed as a corollary function to name searches and not at the time incoming documents are indexed. This means that a given hard-copy folder file will not necessarily contain all the available documents on an individual which may be held in the Master Image File except immediately subsequent to a request having been answered on said personality. Similarly, the digital record on a dossier personality will only be current as of the time of the last request.

The contents of a digital record in the Master Dossier File will be:

- Personality Name
- Variant Name
- Telegraphic Code
- Dossier Number
- Birth Date
- Citizenship
- Date of Death
- General Occupation
- Organization Affiliation
- Position Title
- Organization Affiliation Date (year only)
- Date Record was Last Updated
- Document Reference Numbers

Whenever new dossier records are added to this file or changes made to existing records as a result of name searches, the following actions will take place:

(a) Documents not previously filed in physical dossiers will be reproduced for same.
(b) Dossier identifier records will be created or updated.
(c) The list of document control numbers attached to each identifier record will be compiled or updated.

The effect of (c) will be to establish the identity of the individual mentioned, thus capturing the results of the analysis.
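The maintenance steps (a) through (c) above can be illustrated with a minimal sketch. It assumes a much-reduced identifier record; the class name, field names, and sample control numbers are hypothetical and do not come from the CHIVE design.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional, Set


@dataclass
class DossierRecord:
    """Reduced, hypothetical stand-in for a Master Dossier File identifier record."""
    dossier_number: str
    personality_name: str
    physical: bool = False                  # True when a hard-copy folder is kept
    last_updated: Optional[date] = None     # "Date Record was Last Updated"
    document_numbers: Set[str] = field(default_factory=set)

    def apply_search_results(self, found: List[str], on: date) -> List[str]:
        """Fold the results of a name search into the record.

        Returns the control numbers not previously attached to the record;
        for a physical dossier these are the documents to reproduce (step a).
        """
        new = sorted(set(found) - self.document_numbers)
        self.document_numbers.update(new)   # step (c): update the control-number list
        self.last_updated = on              # step (b): update the identifier record
        return new


# Hypothetical usage: two successive name searches on the same personality.
record = DossierRecord("D-0001", "EXAMPLE, Ivan", physical=True)
first = record.apply_search_results(["DOC-1", "DOC-2"], date(1965, 3, 1))
second = record.apply_search_results(["DOC-2", "DOC-3"], date(1965, 4, 1))
```

Because the record is only touched when a search is answered, it is current as of `last_updated` and no later, which matches the limitation noted in the text.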
Persons searching the same name at a later date will have to employ standard search strategy techniques only to recover those records from the Master Index File which might have entered the system subsequent to the previous search. Earlier references will be available either via the hard copy dossier itself or, in the case of logical dossiers, through an "absolute" search on the document numbers known to be relevant to the individual concerned. The latter can be a semi-automatic process in which the information analyst need only specify the dossier number involved--i.e., the computer will find the document numbers pertinent to the dossier, and either print them out or use these numbers to locate and output the corresponding index records.

In the initial CHIVE system, scheduled outputs from the Master Dossier File will consist of:

- Master listings in natural language of the personality identifier records arranged by name within citizenship.
- Cumulative supplemental listings in the same arrangement as the master listings.

Demand (ad hoc) products of the file will include natural language printouts of the records on a variety of media (e.g., cards, listings, etc.) in any sort order desired by customers.

5.5.4.1.2. Name Group Tables

The function of name group tables is essentially that of any dictionary of synonym and "see also" references. Such tables properly belong in a list of "vocabulary control files" since, like any term dictionary, they serve to relate the several ways in which a term (in this case personality name) can be spelled to a standard code. In the CHIVE system, it is proposed to experiment with the two kinds of name group tables developed by and for the [redacted]: Surname Table and Given Name Table.
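As a rough illustration of how a name group table relates variant spellings to a standard code, the following sketch maps each spelling to a group number and expands a search to every spelling in the same group. The table contents, group numbers, and function names are invented for the example; the actual table format is not given in this volume.

```python
from typing import Dict, List, Optional

# Hypothetical surname table: spelling -> name group number.
SURNAME_TABLE: Dict[str, int] = {
    "IVANOV": 101,
    "IVANOFF": 101,
    "IWANOW": 101,
    "PETROV": 102,
}


def group_of(name: str, table: Dict[str, int]) -> Optional[int]:
    """Return the group number for a spelling, or None if it is not on file."""
    return table.get(name.upper())


def variants(name: str, table: Dict[str, int]) -> List[str]:
    """Return every spelling that shares the name's group, in sorted order."""
    group = group_of(name, table)
    if group is None:
        return []
    return sorted(n for n, g in table.items() if g == group)


def search_spellings(name: str, table: Dict[str, int], use_groups: bool = True) -> List[str]:
    """Spellings to search: the whole group, or only the exact spelling."""
    if not use_groups:
        return [name.upper()]
    return variants(name, table)
```

A spelling with no entry yields `None` from `group_of`, which is the condition under which an analyst would have to assign a group before the record could be filed.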
Each of these tables contains a list of all the surnames or given names, as the case may be, which have occurred within the system. Listed with each name is a reference to the name group to which it has been assigned.

The functions of these name tables are: (a) to determine if a specifically spelled surname or given name is contained within the system, and (b) to associate a group number with the name. A new name entry in a document index record, before it can be filed, must match a name in the name table. If it finds no match, the machine will print out a notice to the information analyst to this effect. The latter will then consult his tables or the .../CHIVE expert concerned (procedure to be determined), assign a group number to the name, and re-enter the record into the machine file.

Ideally, the name group table concept reduces the intellectual problem for the name searcher by providing for a guided search of potentially relevant, alternative name spellings. This capability will not, however, preclude the searcher from bypassing the name grouping feature if he wishes the machine to yield only those records which exactly match the spelling(s) in his request.

Scheduled products of the Name Group Tables will include listings of both the surname and given name tables arranged in both name and group number order. Query products may include the variant names searched within a given name group as well as "see also" references to names in related groups.

5.5.4.2.
Organization/Facility Identifier Files

The Master Organization/Facility Identifier File (MOFIF), like other vocabulary control files, is required to insure consistent indexing of items of information derived from documents--in this case organizations or facilities (installations). In the sense used here, organizations and facilities are defined in the broadest possible terms. They include political, economic, military, cultural and scientific bodies, as well as physical installations which are relatively fixed in terms of geographic location (e.g., a weather station).

Like the Master Dossier File, the functions of the MOFIF will be:

- To identify hard-copy folder files maintained by the system on select organizations where high request activity is anticipated.
- To provide, via printouts from the file, identifying information about organizations which an information analyst can browse in order to determine (a) whether he has previously assigned a code or unique identifying number to an organization and/or (b) whether there is a hard-copy dossier available on an organization.
- To reduce search time on requests for "controlled" organizations.
- To provide a summary-type information record which can be used to answer requests, serve as a published reference aid, etc.

Not all organization and facility names will be placed under MOFIF control. Furthermore, not all organization references encountered in documents will be indexed by the specific name and/or identifying number of the organization mentioned. Some organizations will be indexed only by type using the OTF tag.
Still others (e.g., a laboratory or committee) will be indexed by their parent organization's code, but will not be assigned a unique identifying number of their own; consequently they too will not appear in the MOFIF.

The initial CHIVE Master Organization/Facility Identifier File, which will be composed [redacted], will be built during Phase III of the CHIVE project from existing organizational dictionaries developed by FIB, SR, and BR. Each organization record resulting from this process of analysis and synthesis will include, in addition to the CHIVE-assigned identifying number and name, the code number or numbers (if any) by which the organization was previously identified in the OCR register(s) from which it was derived. These "cross reference" numbers will help indicate to the searcher whether there is information stored on an organization in one or more of the files inherited from OCR (e.g., SR Detail Index, FIB card and plant folder files, BR dossier, etc.). The absence of such cross references in a record would mean either that there was no inherited information available on the organization, or that the organization was so loosely controlled in the earlier system that a subject search or some other method for accessing the files would be required to uncover the pertinent data.

As in the case of personalities, it is planned that certain organizations would have hard-copy dossier files where a high request rate is anticipated (but see footnote in section 5.5.4.1.1. regarding request redundancy study).
If this plan is implemented, incoming documents containing information on dossier-controlled organizations would probably be added to the dossiers as a part of the initial processing activity rather than at the conclusion of search operations on said organizations (as was proposed in the case of personality dossier maintenance). The reason for this is that the MOFIF would ordinarily have to be consulted when indexing an organization in order to obtain the correct CHIVE identification number for the organization. Such being the case, little additional effort would be required to determine from the MOFIF listing whether a dossier was being maintained on the organization and, if so, to direct that a copy of the document be deposited in the dossier concerned.

The contents of a digital record in the Master Organization/Facility Identifier File will be as follows:

- Translated Name and/or Number
- Functional (Assigned) Name
- Foreign Language or Transliterated Name
- Variant Name(s)
- Previous Name(s)
- Name Abbreviation
- Telegraphic Code
- CHIVE-Assigned O/F Number
- Dossier Indicator
- Cross-Reference Numbers
    - FIB
    - SR
    - COMOR
    - BR
    - BE
    - NPIC
    - TDI
- Address
    - Country
    - Political Subdivision
    - Place Name
    - Coordinates
    - Street Address
    - Cable Address
    - Post Box Number
- Parent Organization
- Type O/F
- Source Citations
- Remarks

Not all the elements of information listed above will appear in every organization/facility record in the MOFIF. Not only will the type of organization have an effect on the elements of information that will customarily
appear in its identifier record (e.g., a political body will not have a COMOR or BE number, nor perhaps a specific address), but specific elements of information will be unavailable on many organizations and facilities.

Source citations, where desirable, for items of data carried in the MOFIF can be included in the identifier records by referencing the control number of the document which provided the information. One source reference for each element of information in an MOFIF record would probably be sufficient. If additional supporting evidence for a given fact were required, the index records in the Master Index File could be searched.

The "Remarks" field of an MOFIF record is intended for use in recording historical facts about changes in organizational nomenclature, hierarchic relationship to other organizations, etc. This information will not be directly accessible, but may be displayed on printout to enable searchers to determine how to formulate a request which will insure recovery of all pertinent data about an organization despite organizational changes which might have taken place over the years.

In the initial operational system, current thinking is that human interface with this file, as well as all other vocabulary control files, will be through the medium of the printed listing. It is recognized that this is not a wholly satisfactory solution (although a familiar one), particularly in view of the probable increase in the number and size of vocabulary control files which the CHIVE indexer must routinely consult. For this reason, an in-depth study of the matter is planned during Phase III of the CHIVE project.
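The record layout and the citation rule described above (elements optional, at most one source reference per element) might be modeled along the following lines. This is a sketch only: the class, its field names, and the sample identifiers are assumptions made for illustration, and only a subset of the listed elements is shown.

```python
from dataclasses import dataclass, field, fields
from typing import Dict, List, Optional


@dataclass
class OFRecord:
    """Hypothetical, reduced model of an MOFIF identifier record."""
    of_number: str                                   # CHIVE-assigned O/F number
    translated_name: Optional[str] = None
    functional_name: Optional[str] = None
    variant_names: List[str] = field(default_factory=list)
    previous_names: List[str] = field(default_factory=list)
    dossier_indicator: bool = False
    cross_references: Dict[str, str] = field(default_factory=dict)  # register -> number
    country: Optional[str] = None
    place_name: Optional[str] = None
    parent_organization: Optional[str] = None
    of_type: Optional[str] = None
    source_citations: Dict[str, str] = field(default_factory=dict)  # element -> control no.
    remarks: str = ""

    def cite(self, element: str, control_number: str) -> None:
        """Attach one source document to an element; the first citation is kept."""
        if element not in {f.name for f in fields(self)}:
            raise ValueError(f"unknown element: {element}")
        self.source_citations.setdefault(element, control_number)

    def inherited_registers(self) -> List[str]:
        """Registers (e.g. FIB, SR, BR) holding a cross-reference number for this entry."""
        return sorted(self.cross_references)


# Hypothetical usage with invented identifiers.
rec = OFRecord(of_number="OF-000123", translated_name="Example Institute")
rec.cite("translated_name", "DOC-0001")
rec.cite("translated_name", "DOC-0002")   # ignored: one citation per element
rec.cross_references["FIB"] = "F-77"
```

An empty result from `inherited_registers` corresponds to the ambiguous case described earlier: either no inherited information exists, or the organization was too loosely controlled in the earlier system to carry a number.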
Scheduled outputs of the Master Organization/Facility Identifier File will consist of:

(a) Master listings in natural language of the organization identifier records pertaining to a single country. The types of master listings required are:

    (1) A permuted title listing of all organization names (including official, assigned, variants, etc.). The reference portion of the listing will be ordered by O/F number and contain the complete identifier record for each organization referenced.
    (2) A listing of MOFIF records, without organization name permutation, ordered on place name.
    (3) A listing identical to (2) but ordered on type of organization/facility.
    (4) A listing identical to (2) but ordered on geographic coordinates.

(b) Cumulative supplements to the master listings issued in the same arrangements as the master listings but bound together. Alternatively, supplementary listings may be issued in the form of pages to be inserted in master listings.

Demand (ad hoc) products of the file will include natural language printouts of the records on a variety of media (e.g., cards, listings, etc.) in any sort order desired by customers.

5.5.4.3. Meeting/Conference Identifier Files

In the planned central retrieval system a requirement exists for retrospective searching of documents dealing with certain meetings and conferences. The conditions which dictate whether a given conference should be indexed by name or identifying number cannot be stated with complete precision at this time. Nevertheless, the fact that some conferences (possibly international scientific meetings attended by USSR nationals) must be controlled dictates that the capability be provided in the CHIVE system to retrieve the pertinent data, whatever the input criteria might ultimately turn out to be.
The function, therefore, of the Master Conference Identifier File (MCIF) is to relate the several ways in which the name of a conference or meeting may be spelled to a standard code and, in addition, to supply other identifying information which would facilitate distinguishing meetings having similar names.

The initial data base for the MCIF may be derived from the BR's International Conference and Travel File. In this instance, a requirement does not exist to merge separate OCR system vocabularies since BR is the only organization which maintains a conference authority file. Informal consultation between the CHIVE conference dictionary editor and BR's authority in this area during the evolution of the CHIVE system should enable standardization to be achieved between the two systems on the identification of international meetings which both systems index.

The contents of the MCIF will be as follows:

- Name of Meeting/Conference
- Assigned Code Number
- Location
    - Country
    - City
- Date of Conference
- Type of Meeting (Subject)
- Sponsor Organization

Scheduled outputs from the file in the initial CHIVE system will consist of the usual master listings arranged in this instance by name of conference, location, and sponsoring organization. The name listing will be a permuted title arrangement with the reference portion of the listing ordered by code number and containing the complete identifier record for each conference referenced. Supplementary listings will also be provided either in cumulative form or (as indicated earlier) as pages to be inserted into master listings.
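The permuted title arrangement mentioned above can be sketched as a simple keyword rotation. The stop-word list, the "/" marker for the rotation point, the conference names, and the code numbers below are all assumptions made for the example; the report does not specify the listing format at this level of detail.

```python
from typing import Dict, List, Tuple

# Assumed stop words; the report does not say which words are left unpermuted.
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to"}


def permuted_entries(names: Dict[str, str]) -> List[Tuple[str, str, str]]:
    """Build the permuted portion: one (keyword, rotated name, code) entry per
    significant word, sorted by keyword. The name is rotated so the keyword
    leads, with "/" marking where the original name began."""
    entries = []
    for code, name in names.items():
        words = name.split()
        for i, word in enumerate(words):
            if word.lower() in STOP_WORDS:
                continue
            rotated = " ".join(words[i:] + ["/"] + words[:i]) if i else name
            entries.append((word.upper(), rotated, code))
    return sorted(entries)


def reference_portion(names: Dict[str, str]) -> List[Tuple[str, str]]:
    """Build the reference portion: (code, name) pairs ordered by code number."""
    return sorted(names.items())


# Hypothetical conference names keyed by assigned code number.
conferences = {
    "0002": "Symposium on Arctic Weather",
    "0001": "Congress of Arctic Geology",
}
listing = permuted_entries(conferences)
```

A reader scans the permuted portion by keyword, picks up the code number, and then turns to the reference portion for the full identifier record.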
Demand products will, of course, be issued in any sort order desired.

5.5.4.4. Location Identifier Files

The function served by the Master Location Dictionary (MLD) is to specify the approved entry form for certain classes of locational-type information listed below. (Detailed address information, e.g., cable address or street name, will not be under vocabulary control and, consequently, will not appear in this file.) Additional uses of the file will be to confirm file coverage by location, to show hierarchical and synonymous relationships between place names and political/administrative regions of the world, and to support requests by location defined in terms of country, political subdivision, or place name.

Location identifier (authority) files will ultimately be maintained on all countries, but in the initial system primary concentration will be placed on the [deleted] element of the Master Location Dictionary. Specific senior content indexers will serve as dictionary editors for certain geographic portions of the file, rejecting or approving all new entries generated as a byproduct of the document input process.

The initial Master Location Dictionary will be constructed on the base of the NIS Gazetteer. The ISC 4-digit classification system will be used to identify country and political subdivision. Place name entries will be carried in full text.

Map catalogers at the Map Library, according to the terms of a tentative agreement arranged between CHIVE and the Map Library, will employ a modification of the ISC area code which appends the ML provincial codes to the ISC area code.
This expanded code will permit ML to continue to index provinces and other political subdivisions where this degree of index specificity may not be required for document retrieval.

During the preparation of the map index transcript sheets at the Map Library, the map cataloger will either (a) enter the modified ISC area code in addition to the ML area code on the transcript form, or (b) enter the ML area code only. If the former procedure is followed, both codes will be converted to machine language but only the modified ISC area code will be stored in the CHIVE Master Index File. The map catalog cards returned to ML, however, will carry the ML area code since the ML card catalog employs this system and the use of another area code would upset the existing file arrangement. Alternatively, a conversion table may be built which would permit the computer to convert the ML area code appearing in the index records (option [b] above) to the modified ISC area code, thus obviating the need for both codes to be entered on the transcript forms by the cataloger.

The Master Location Dictionary records will contain the following elements of information:

- ISC 4-Digit Numeric Notation
- Major Area, Subordinate Geographic Region, Country, or other Political Subdivision (including cross references)
- Remarks (scope notes and comments on historical changes)
- Place Name (including cross references)
- Remarks (place name scope notes and comments on historical changes)
- Geographic Coordinates

Scheduled master and supplemental listings of the Master Location Dictionary will include:

(a) A listing arranged hierarchically by ISC area code and containing the major area and political subdivision names together with any "Remarks" pertaining thereto.
(b) A listing identical to (a) but ordered alphabetically on area and political subdivision name.

(c) A listing arranged hierarchically by ISC area code with the minor sort alphabetical by place name. This listing will also include the "Remarks" field pertaining to place names, as well as geographic coordinates.

(d) A listing identical to (c) but ordered on geographic coordinates.

Ad hoc (demand) products of the file will include a geo-coordinate computation capability. This program, which uses a mathematical technique based on the overlap of two convex polygons, will allow an information analyst to retrieve all references to place names falling within any regular (or irregular) shaped area whose vertices are known.

5.5.4.5. Subject/Commodity Authority Files

5.5.4.5.1. Intelligence Subject Code (ISC)

A modified form of the ISC classified schedule will be used in CHIVE as the dictionary authority for entry of all terms of a descriptive, semi-abstract nature whether they modify named objects or stand alone. The file is designed to perform three main functions: (a) to display relationships among these descriptive terms, (b) to define these terms when required, and (c) to serve as a code book for input to the computer. The relationships displayed include synonyms and alternate spellings as well as class inclusion and class membership. The file also serves an important mechanical role by requiring that every ISC code in a new index or query be present in the file before the transaction of file maintenance or searching is processed, hence controlling input errors. The file will be maintained manually, that is, all
relationships and original entries will be externally controlled with changes made only by a change sheet following approval by the ISC dictionary editor.

To increase the specificity of the ISC, it will be augmented by the addition of key words in clear text. Initially, the information analyst will be permitted to append these key words to any ISC code without reference to a controlled list. After a suitable length of time, however, a key word dictionary separate from the ISC classified schedule may be developed to provide guidance for consistent entry.

The ISC is generally recognized as a satisfactory mechanism for indexing intelligence documents to a medium level of specificity. It is detailed enough to organize a document collection into manageable categories, but not so detailed that it is difficult to learn or apply with reasonable uniformity. The ISC is particularly strong for indexing political and socio-economic concepts.

Key word indexing will be used to supplement the ISC in those areas where it is weakest, and to obtain more specificity in commodity indexing. Heavy emphasis will be placed on key word indexing of equipment nomenclatures and model types. Key words will also be used to index scientific processes and concepts, as well as military strategy and tactics. In other fields, e.g., politics, there will be less need for key word enhancement of ISC codes.

Specific revisions required of the ISC to accommodate it to an all-source document base include the following:

- Reduction of the depth of subject coverage in selected areas to simplify its application.
- Development of separate schedules for the classification of such subjects as organization types and personality occupational categories. (This would require the deletion of organization types which are currently scattered throughout the ISC.)
- Expansion of the list of coded modifiers.
- Expansion of the ISC to provide for special subject requirements unique to certain sources, e.g., photos and SI documents.

The contents of a digital record in the ISC file will consist of:

- ISC 6-Digit Numeric Notation
- Clear-Text Term Definition
- Scope Notes

Outputs from the file will include:

- Master listings of the complete ISC dictionary arranged hierarchically by ISC code with appropriate indentations for each lower-level category. In addition to codes and term definitions, the listing will contain pertinent scope notes.
- Master listings of the subject index to the ISC arranged alphabetically by index term, including "see references."

Both of these types of listings must be classified as demand products since the frequency and number of changes to the ISC vocabulary will dictate the periodicity of master re-runs. Indeed, it is likely that this vocabulary control file more than any other will be updated by page inserts rather than by re-issuance of the complete master schedule.

5.5.4.5.2. Header Data Dictionary

This file encompasses a variety of separate and distinct system dictionaries which control the entry of data pertaining primarily to the header portion of document index records. It includes such specialized system tables or dictionaries as the following:

- Document Category File
- Report Producing Component File
- Series/Periodical Name File
- Classification File
- Codeword Control Stamps File
- Dissemination Controls File
- Photo Type File

None of these files are of such magnitude that their size alone or access requirements would justify their storage in a digital medium. Nevertheless, an important goal of the system is to present information to the external customer in a language with which he is familiar. This must be accomplished even though the information is carried within the system in a different form. For this reason, wherever a convention of codes has been established for a certain type of information, and this information must be displayed to a user on output, the file must be available in digital storage and a conversion routine provided to substitute clear text for codes on output.

The data content of records in all the separate authority files making up the Header Data Dictionary is identical, i.e., code (whether numeric, alphameric, or alpha) and applicable term. No machine-generated products, either on a scheduled or demand basis, are presently envisaged from the file as such, although, as indicated above, the file will be machine searched to serve other system functions.

5.5.5. UNSYNTHESIZED INFORMATION FILES (UIF)

No attempt will be made in this section to specify the particular Unsynthesized Information Files which will be built by information analysts in the CHIVE system.
It is assumed there will be a continuing requirement for certain of the analogous information files currently being maintained (e.g., BR's International Conference and Travel File), but this is a decision best left to the information analysts within the system, working in concert with their external customers. This section merely outlines the characteristics of Unsynthesized Information Files (UIF), explains the rationale underlying their establishment, and describes some of the methods by which such files will be constructed and maintained.

As indicated in the introduction to this chapter, Unsynthesized Information Files consist of select elements of information about a given subject whether it be a personality, an installation, or some class of activity or event. They are to be distinguished from Special Project Files (discussed in section 5.5.7.), some of which may be information rather than document reference type files, in that they reflect only the elements of information contained in CHIVE document index records. Similarly, they are distinguishable from Summary Information Files (see section 5.5.6.) which are evaluated, concise statements of fact about similar topics.

Most Unsynthesized Information Files being maintained in OCR today are the products of specialized input activity which is separate and distinct from other input processing. The principal reason for this situation is that the regular processing system (or systems) cannot readily be modified to accommodate these specialized indexing requirements, primarily because of the limitations of the supporting EAM equipment. A good example is the BR travel index which, since its inception, has been functionally and physically separate from the dossier processing system.
In the CHIVE concept of document processing, however, wherein all the data of significance in the document is captured in one pass, so to speak, by the person originally assigned to index the document, the resultant product will feed both the document reference system (i.e., the Master Index File) and such Unsynthesized Information Files as the information analyst has decided to build. This means, obviously, that the UIF contain data no different from that stored in the Master Index, only select subsets of the same records or phrases.

The reader might well ask at this point why have Unsynthesized Information Files at all if their content is identical with elements of information stored in the Master Index File? Why not simply query the Master Index when a specific set of data is desired? This brings us to the criteria for establishment of a UIF:

- The customer's information requirements must be capable of definition in terms of logical data units which have specified characteristics--i.e., that there is a logical separation of data elements into related files so that any one file contains data relative to a given subject or function.
- A sufficient number of requests can be anticipated on a continuing basis for the particular set of data elements contained in an information file to justify establishment of the file.

Where neither of the above conditions obtains, the data would remain in the Master Index File, and requests for the retrieval of specific elements of information would be handled like any other ad hoc queries levied on the system.
On the other hand, if these requirements are met, it is generally agreed that it is useful to group the data elements involved into files, organized on a functional basis, since they can then be handled as logical elements in the system for maintenance, retrieval, and system output.

The basis of organization is affected not only by the type of information to be processed, but also by the relative activity of data within a file and the user's control of the information stored in the file. Thus, in establishing an information file system, the user will probably want to functionally group his data (e.g., personality travel, leader appearances, missile site order-of-battle, directories of government officials, etc.).

For particular applications the user may desire to have his stored information combined on a different basis. Ordinarily, he would exercise this option only on the Summary Information Files discussed in the next section, since the resultant output would be an evaluated, higher-quality product. However, the system proposed will provide for a multi-file output capability from any file stored within the system. This capability will be achieved by allowing the user to query several files and selectively assemble on an output work tape the resultant information. The data will then be presented to the user in the format he specifies. For example, if the user has three files containing [remainder of example deleted]

With regard to the means by which inputs to the UIF are to be obtained, it is evident that a separate indexing and transcription process is not required if the CHIVE one-time indexing concept is implemented. In other words, the plan is to exploit the document retrieval system in order to build information files. On the other hand, once the
data has been put into machine readable form, the information file inputs might be derived either by (a) automatic duplication of portions of the index records during their input processing into the Master Index, or (b) by periodically querying the Master Index. Figure 5-2 illustrates these alternatives graphically. CHIVE proposes to follow the latter path for the following reasons:

- It will create less of a burden on the machine processor which would otherwise have to examine every incoming record to determine if it contained data relevant to a particular information file.
- There is no real requirement to update the information files at the instant that the data is entered into the machine.
- By requiring some external action to be taken before data is transferred to an information file, management control is enhanced.

To facilitate file building, standing queries will be written, punched, and entered into the system. Thus, when the information analyst wishes to add new data to an information file, he can merely call for the pertinent query by name and the computer will make the necessary search and load the data into the relevant file.

[Figure 5-2. UIF File Building Alternatives. Panel labels: Index Record; Processor; DP Processor; Master Index File; Unsynthesized Information Files; Master Index; Human-triggered Information File Building.]

The content of records in a UIF may consist only of a specified set of fixed elements of information which
appear once in each data record or a combination of fixed and repetitive elements of information including some fields of variable length. For example, in a file [deleted] elements in a record contained in this file might have a fixed number of characters or, alternatively, some may be fixed while others (e.g., "function attended") may be fields of variable length. Similarly, in the case of a travel file [deleted]

Document references, while not vital to information files which ordinarily do not require consultation of the documents from which the data was originally extracted, can nevertheless be included in UIF records where required by citing the pertinent document control number. Similarly, the security classification of a record in such files is easily denoted since the record should typically carry the same classification as the parent record in the Master Index. In both cases, the transferral of the document control number and security classification from the header portion of a Master Index Record to a record in a UIF can be accomplished automatically at the same time that the content data is duplicated for storage in a UIF record.

Provision will be made to automatically identify those records in the Master Document Index whose content has been extracted for a given information file so that the same records need not be searched later. The system will also allow the analyst to specify both an "active" and a "history" file of information for any one functional area if it seems desirable to save the digital records representing a file for some indefinite period of time.
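The file-building mechanics described above (a standing query called by name, automatic carry-over of the document control number and security classification, and flagging of records already extracted for a given file) can be sketched as follows. This is a hedged illustration only, not the CHIVE design: the record layout, the names IndexRecord, StandingQuery, and run_standing_query, and the use of an in-memory list in place of the Master Index are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


@dataclass
class IndexRecord:
    """Simplified Master Index record (hypothetical layout)."""
    control_number: str            # document control number (header)
    classification: str            # security classification (header)
    content: Dict[str, str]        # indexed elements of information
    extracted_for: Set[str] = field(default_factory=set)  # UIFs already fed


@dataclass
class StandingQuery:
    """A stored query the analyst can call for by name."""
    name: str
    target_file: str                           # UIF that receives the data
    selects: Callable[[IndexRecord], bool]     # search criterion
    fields: List[str]                          # content elements to copy


def run_standing_query(
    query: StandingQuery,
    master_index: List[IndexRecord],
    uif_files: Dict[str, List[Dict[str, str]]],
) -> int:
    """Load matching records into the target UIF; return the number loaded."""
    target = uif_files.setdefault(query.target_file, [])
    loaded = 0
    for rec in master_index:
        # Skip records whose content was already extracted for this file.
        if query.target_file in rec.extracted_for:
            continue
        if not query.selects(rec):
            continue
        # Duplicate the selected content elements ...
        row = {name: rec.content.get(name, "") for name in query.fields}
        # ... and carry the header data over automatically.
        row["control_number"] = rec.control_number
        row["classification"] = rec.classification
        target.append(row)
        rec.extracted_for.add(query.target_file)
        loaded += 1
    return loaded
```

Because each extracted record is flagged per target file, running the same standing query a second time loads nothing new until further index records have been added.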
Outputs from the UIF will largely consist of periodic listings of the complete contents of a UIF arranged in various sequences. Such listings will be used to service customers as well as information analysts within CHIVE who will analyze and modify the listed records which will then be converted back to machine language and input to Summary Information Files. Some of these listings will no doubt be required on a regularly scheduled basis, e.g., a leader appearance listing published weekly. Others will be demand products issued irregularly as the need arises.

In addition to the provision of hard-copy machine listings of an entire file for browsing purposes, a digital UIF file itself can be queried for records meeting the request specifications, in which case only the records satisfying the request will be output. This mode of man/file interface will probably be used less than hard-copy browsing; but, unlike the CHIVE Vocabulary Control Files discussed in section 5.5.4., the means by which the UIF records will be made available to the human in the system will not be exclusively through a hard-copy representation of the file contents.

5.5.6. SUMMARY INFORMATION FILES (SIF)

Summary Information Files (SIF), like the Unsynthesized Information Files, can be classed as formatted in nature since their specifications can be pre-defined and the data elements making up their content can be handled as logical entities in the system for purposes of input, query, and output processing. Pertinent input data is organized by major subject and formatted for ready retrieval and tabulation by content.
Summary Information Files consist of semi-evaluated data relative to specific classes of events or named objects. In format and content they are indistinguishable from UIF files, differing only in the fact that redundant, and, usually, contradictory information has been removed from the SIF files through a process of human analysis and synthesis of the raw data originally received. While they cannot be accurately described as containing only "finished intelligence" (if, by definition, this term is meant to apply only to the refined outputs of an intelligence research facility), neither is their content truly "raw" and, for this reason, the expression "semi-evaluated" has been used advisedly.

Certain of the Vocabulary Control Files which must exceed the boundaries of a typical dictionary in order to adequately "identify" a controlled term can also be properly classified (as pointed out in the introduction to this chapter) as Summary Information Files. These include the personality and organization/facility identifier files discussed in sections 5.5.4.1. and 5.5.4.2., respectively. Not only are these formatted files whose content is fixed and specific, they contain a variety of summarized, semi-evaluated facts about named objects which, when displayed, can serve a variety of information needs other than that of supporting the document indexer.

An example of an SIF would be a [deleted] Officials File.
In reality, a type of organization summary file, since the stored data would consist of a set of facts about significant public and private organs of society within the country concerned and not summary data about personalities as such, the file entries might contain in tabular form information on the name of an organization, its subordination, the names of its officers and perhaps ordinary members of the organization, their individual position titles, and the dates of appointment and/or earliest and latest dates of identification for each. The data would be retrievable on the basis of any of the categories which make up the file so that answers to such questions as:

[deleted]

can be readily obtained.

Summary Information Files can, of course, include fully automatic, semi-automatic, and manual data files ranging from the highly structured to the unformatted, narrative type. FIB's Installation Summaries File and the various types of biographic reports on file in BR are examples of manual summary information files which are only partially formatted if at all. BR's Who's Who Card File is a formatted, semi-automated summary file on personalities. The CHIVE system will also produce and maintain manual files of summary information, but the focus of discussion in this section is on the digital, and not hard-copy, summary files planned for the system.

The criteria dictating the establishment and maintenance of an SIF file are identical to those for a UIF file, i.e., the set of data elements making up the file must be definable and a "substantial" number of requests for such data must be anticipated. The means by which input and maintenance transactions will be obtained, however, will differ from that of the UIF files.
Unlike UIF processing, additions to and modifications of SIF files cannot be automatically generated since human judgment is required, if not to identify potentially relevant inputs, to make the conclusive determination that a record should be added to a file or a change made to an existing record. The process by which these functions will be performed will vary depending on the nature of a particular SIF file. Figure 5-3 depicts the various approaches which can be used.

As indicated in the figure, the SIF file builder has essentially three options open to him as the means of inputting data to a Summary Information File: (a) he may query the basic index records to documents for data pertinent to an established SIF file (Option 1), (b) he may obtain a printout of a UIF file which serves as the raw data base for an associated SIF file (Option 2), or (c) he may arrange with other information analysts to have all documents containing information pertinent to his needs routed to him (Option 3).*

*A fourth option, which has not been suggested because it would lead to completely duplicative document handling, would require the SIF specialist to examine all incoming documents for their possible relevance to a summary data file. Such an approach would never be required as long as the initial document indexing was sufficiently specific to enable the information file specialist to recover the pertinent index records and/or documents by a search of the Master Index.
[Figure 5-3. SIF File Building Alternatives. Option 1: search request from SIF specialist; index record listing (select records or phrases). Option 2: search request from SIF specialist; listing of UIF records. Option 3: original documents screened by information analyst; select documents. Common elements: requests for select documents; Document Image File; SIF specialist review; additions or changes to SIF; SIF file; additional routine input processing.]

Options 1 and 2 both provide for the possibility that the SIF specialist may wish to examine certain documents for himself to clarify some fact reported in the records listed for him as a result of his inquiry. It is anticipated, however, that in most instances the listed records will speak for themselves and no reference to documents will be required in the SIF input process.

SIF files will be available in both digital and hard-copy (listing) form, and, like the UIF, can include active as well as history files. Source citations in the form of document control numbers may be listed at the end of each summary information record or referenced to each term (element of information) in the record.

5.5.7. SPECIAL PROJECT FILES

In the remarks made earlier in this chapter with regard to Special Project Files, it was noted that the limits of CHIVE responsibility for special file building, maintenance, and output processing have not as yet been satisfactorily determined. Furthermore, in a survey of existing OCR files which were either exclusively or in part [deleted] (see Appendix 5.D.), no files were identified which in the CHIVE context would be considered
as "special project" in nature. For both of these reasons, therefore, it is difficult to specify the characteristics of individual Special Project Files which might be included either in the initial or final CHIVE system.

One must anticipate, however, that the need for such special files will be expressed by customers from time to time, and some remarks may be in order as to the features which distinguish these files from other system files and how requirements for such files might be accommodated.

Processing demands of a one-time nature which necessitate the input, manipulation, and retrieval of a peculiar set of data will not fall within the Special Project File category since these will be handled like any other request. However, if the system were asked to continue the activity on an indefinite basis, this method of handling would no longer suffice and a special project need would have been established.

Special projects will include any files obtained from organizations external to CHIVE which cannot be fully integrated with equivalent CHIVE files and which require machine handling. In this sense the term "special projects" could apply to certain EAM files inherited from OCR, as well as to files acquired from other agencies.

Special Project Files will also describe any customer input requirements which cannot be satisfactorily handled by the established system for representing the information content of documents.
The data involved might be largely numeric in character or, if non-numeric, would require the extraction of items of information not planned for inclusion in the system and which, if accepted, would significantly add to total processing time.

It is well known that individual members and groups within the CIA customer population have a number of relatively unique and distinct information handling problems which cannot be met by a generalized information system attempting to serve only the common interests of the many. Examples of such requirements were uncovered during the earlier fact-finding survey of the DD/I. The following is but a partial list of some of the needs expressed:

[list deleted]

In the past when a research analyst, or group of analysts, developed an information control problem which was not being satisfactorily handled either in the appropriate production office or by the central reference system, one of the following courses of action was generally adopted:

- The problem was contracted out to some external organization which obtained the necessary source documents and did the input processing and retrieval required to respond to the identified need. Examples of this approach include OSI's Project [deleted] and LSD/SI's [deleted].
- A special group was set up within one of the research offices to perform either or both the input and data manipulation functions depending on what was required. The [deleted] of ORR is a good illustration of this type of problem solution.
- OCR was asked to expand its operations either by increasing the depth of its indexing or by broadening its document coverage, or both. OCR projects generated by a demand for increasing indexing depth beyond that normally provided by the basic indexing
systems employed include the SR Test Range Activity File and the BR Travel File. Illustrations of projects which required expansion of the OCR document base can be found in the [25X1A], the Library's Science Information Service (SIS), the SR PI Reports File, the [25X1B], etc.
- Mechanical, if not data extraction, assistance was requested either of OCR's Machine Division or of some external machine facility (governmental or private) where manual techniques for data manipulation and display were unsatisfactory. Examples [25X1A] File for OBI.

Today, the resources available to an analyst faced with an unresolved information processing requirement are much the same--but with one significant difference. If he is willing to prepare the input data of interest to him to the point at least where it can be transcribed into machine-recognizable form, he now has a powerful machine capability available to him to perform a host of operations on the data. The CHIVE EDP System can, of course, increase this capability. The question which remains, however, is whether CHIVE should get involved in "special-project" applications at all, where only a limited set of customer interests is served, and, if so, whether its involvement should be restricted to the provision of EDP support or whether it should also assist in input preparation.

Currently, the OCS/Applications Division is performing a major role in supporting special project requirements of CIA research analysts. In all such projects, however, the data extraction responsibility has been assumed by the customer concerned.
It might be argued that to avoid confusion of responsibility, the role CHIVE should play in this area (if any) should be restricted to the assumption of the responsibility for those special projects where, for one reason or another, it seems most efficient to have the central reference organization, rather than research analysts, prepare the input data. This, however, heightens the risk of gradually proliferating the information processing responsibilities of the CHIVE system to the point where it might become simply a collection of special projects.

In the design concept presented here, the responsibility for CHIVE's undertaking certain special projects has been accepted, but the duty of preparing the input to Special Project Files is assumed to be the customer's and the report reflects this philosophy. In the final analysis, however, the matter can only be resolved by management decision. If the choice is to include special projects within CHIVE, including the function of data preparation, the approach suggested above may provide a modus vivendi for relating the respective roles to be played by CHIVE and OCS/Applications in the handling of these projects.

5.5.8. REFERRAL SERVICE FILES

Current manpower ceilings seriously limit the coverage of present central system operations. Even if available manpower can somehow be more effectively utilized, the volume of material of potential value is so great that complete coverage would still not be possible. Thus, the only alternative appears to be to develop support from other systems, including centralized as well as personalized (analyst-driven) file activities. To do this, CHIVE must identify information resources available in such systems and determine how best to tap these resources for the Agency consumer.
In earlier CHIVE documentation, reference was made to a "support mode" which envisaged not only the referral of customers to persons or files of possible interest external to CHIVE, but the actual acquisition of certain machine-language files and supporting documentation which would be searched in-house on behalf of CHIVE customers. The "support mode" concept remains valid in present CHIVE thinking, but in the further refinement of the design a distinction has been made between: (a) files outside of CHIVE's control which are actually available within the system in either manual or mechanized form, and (b) information resources not directly accessible to CHIVE to which customers may be referred. The former have been classified in this report as "Supplemental Files" (see section 5.5.1.), i.e., files neither built by CHIVE nor inherited from OCR, of which Special Project Files may be one class. The latter are now termed "Referral Service Files" and are the subject of this section.

The community's information resources are so vast and scattered that even the simple identification of all potential sources constitutes a major problem. For this reason, it is planned to concentrate initially on the identification of the many different human file resources scattered amongst the various service components and production shops within this Agency. Only after this is done will any attempt be made to obtain descriptions of information resources and repositories in other USIB components.
Perhaps the simplest means of beginning to build a referral service capability will be to derive a set of analyst profiles from requests levied against the central system. Using this technique, search terms chosen for query purposes will gradually form the set of subject identifiers descriptive of each analyst's interests. This approach will be supplemented by the circulation of questionnaires to analysts throughout the research (and select service) components of the Agency, which would solicit narrative statements of their areas of substantive knowledgeability, including their personal files or files maintained by their respective sections, branches, or other organizational component. Not all analysts, of course, can be expected to respond. However, experience with similar surveys in other organizations suggests that a response figure of about 80% is not beyond reason.

The returned questionnaires will be indexed in the vocabulary of the CHIVE system, including both the ISC classified schedule and key words. These descriptors will not necessarily be limited to conceptual-type subjects (although the emphasis will probably be on these kinds of topics) but will, in all probability, also contain on occasion named-object identifiers such as the names of persons, organizations, military installations, etc. It is not anticipated that the responses will include every specific subject heading in an analyst file. However, it is hoped that the principal categories of information contained in such files will be described and this alone would greatly assist analysts in seeking to exploit the Agency's human and documentary resources.
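The profile-building technique described above, in which the search terms of successive requests accumulate into a set of subject identifiers for each analyst, might be sketched as follows. This is an illustration only, written in a present-day language (Python) under stated assumptions: the function names, the analyst labels, and the sample terms are hypothetical and do not appear in the CHIVE design.

```python
from collections import Counter, defaultdict


def build_profiles(requests):
    """Accumulate search terms per analyst from (analyst, terms) request records."""
    profiles = defaultdict(Counter)
    for analyst, terms in requests:
        # Normalize case and spacing so repeated terms are counted together.
        profiles[analyst].update(term.strip().upper() for term in terms)
    return profiles


def top_interests(profiles, analyst, n=3):
    """Return the n search terms an analyst has used most often."""
    return [term for term, _ in profiles[analyst].most_common(n)]


# Hypothetical request log: (analyst, search terms used in one request).
requests = [
    ("ANALYST-A", ["missile test range", "telemetry"]),
    ("ANALYST-B", ["grain production"]),
    ("ANALYST-A", ["Missile Test Range", "launch site"]),
]

profiles = build_profiles(requests)
print(top_interests(profiles, "ANALYST-A", n=1))
```

A frequency count is only one possible way to rank an analyst's interests; the report itself does not specify how the accumulated terms would be weighted.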
In addition to storing information descriptive of the subject matter in which a person or file specializes, the data which will be contained in these referral service records will include:

- Name of Individual
- Organization Identification (component to which analyst or file is attached)
- Address of Individual or File (room and phone number)
- Descriptive Title of File
- Overall Security Classification of File
- Releasability
- Countries or Geographical Areas Covered by the Person or File
- Primary Intelligence Activity Supported by the Person or File (e.g., Missile Intelligence, OB, Ground Photography, etc.)
- File Storage Medium (documents, 5" x 8" cards, EAM cards, magnetic tape, etc.)

[25X1B]

Assuming the cooperation of a reasonable number of analysts, it is likely that the collected records will be sufficiently voluminous and file order requirements so varied that a machine data base will be needed. It is not contemplated, however, that the Referral Service Files will be automatically searched at the time queries are levied against the substantive data files of the CHIVE system. Rather the content of such files will be made available in the form of a published Directory of Information Resources. This Directory would be issued to CHIVE system operators and perhaps, in a variety of classifications, selectively disseminated to Agency consumers.

When specifically requested to do so by a customer, CHIVE personnel, in addition to searching the basic files of the CHIVE system, will consult the Directory for the
purpose of determining what other files or intelligence analysts might possess information pertinent to a given query. The type of service that will be provided, if and when a potentially relevant resource is uncovered, will vary depending on such factors as the location of the file or the urgency of the request. In some instances, CHIVE information analysts will act as the intermediary between the customer and the other information resource. In other cases, they will simply refer him to the appropriate system.

With regard to the provision of a referral service capability for files outside of CIA, advantage might be taken of efforts currently being sponsored both by DIA and by CODIB to collect descriptions of intelligence data files maintained in an automated form by Department of Defense elements and USIB member agencies, respectively. It is planned that a catalog of such files will be published periodically and may be interrogated on an ad hoc basis. If these external collection programs prove successful, the data resulting therefrom might be merged with the product of the internal file survey to form a relatively comprehensive record of information resources throughout the Community.

5.5.9. MANAGEMENT DATA FILE

The CHIVE Management Data File will contain two types of data:

- Data, obtained by computer methods, about the processes performed by the EDP portion of the CHIVE system.
- Data, obtained by manual methods, about the non-computer processes of the CHIVE system.

The following paragraphs discuss the sources and collection methodologies for these two types of data, the reasons for the dichotomy, and the use of this data to operational management.
5.5.9.1. Collection Techniques

As indicated above, the method employed in collecting the data (EDP data or manual data) determines the origin of the data and to a large extent the use of the data by CHIVE managers.

EDP data collection refers to an activity within the computer itself. The monitor program system (with its attendant bookkeeping functions) will supervise all computer operations. This is implied under the philosophy of a multiprogramming system. This method of operating provides a natural means of recording:

(a) Process times. The computer has timing mechanisms which the monitor can use to record computation, input, and output times as individual entities, as well as the total time the computer uses to process a transaction.

(b) Error rates and types. A variety of errors and malfunctions may abort an operation or degrade the output. Certain of these, e.g., misuse of the language, transcription errors, equipment disorders, and illegal file manipulations, may be more readily detected and recorded by the computer programs than by manual means.

(c) File activity. It is of significant importance to determine which files or parts of files experience a high rate of use. File system design, program system design, and language structure are just a few of the areas which affect the use of the files and are, in turn, influenced by usage statistics.

This dynamic data may be supplemented by such relatively static data as:

- Equipment availability
- Day, month, and year
- Priority of the transaction

As this data is recorded (either by the bookkeeping routines within the monitor or by specially produced CHIVE programs as adjuncts to the monitor) it should be entered into a file. This file is essentially a log of CHIVE EDP transactions and their associated management data.
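By way of illustration only, one entry in such a log, carrying the process times, errors, and file activity listed in (a) through (c), might be laid out as in the following Python sketch. The field names, the file names MASTER_INDEX and VOCAB_CONTROL, and the sample values are assumptions made for this example; the report does not define a record format.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class EdpLogEntry:
    """One transaction's management data (hypothetical layout)."""
    job_number: int
    compute_sec: float
    input_sec: float
    output_sec: float
    errors: list = field(default_factory=list)      # error types detected
    files_used: list = field(default_factory=list)  # files touched by the job
    priority: int = 0

    @property
    def total_sec(self):
        # Total time is the sum of the individually recorded times.
        return self.compute_sec + self.input_sec + self.output_sec


def file_activity(log):
    """Count how many logged jobs used each file."""
    counts = Counter()
    for entry in log.values():
        counts.update(entry.files_used)
    return counts


# The log is keyed by job number; all values below are invented.
log = {}
for entry in (
    EdpLogEntry(101, 2.0, 0.5, 1.5, files_used=["MASTER_INDEX"]),
    EdpLogEntry(102, 1.0, 0.25, 0.75,
                errors=["illegal file manipulation"],
                files_used=["MASTER_INDEX", "VOCAB_CONTROL"]),
):
    log[entry.job_number] = entry

print(log[101].total_sec)
print(file_activity(log).most_common(1))
```

Keeping the three times separate and deriving the total preserves the "individual entities" called for in (a) while still giving the overall figure per transaction.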
The normal method of labelling entries in such a file is by "job" or transaction number. Under the multiprogramming mode of computer operations, it is mandatory that each "job" which enters the computer be uniquely identified. This "job" or transaction number provides a natural storage and retrieval device.

Manual data collection refers to that activity outside the domain of the computer which collects management data about the processing of transactions. As presently envisioned, the process will begin when a transaction is initiated and will end when the transaction is completed. For example, a query against a file is a transaction which begins with the request and ends when the requester obtains the data and materials which satisfy his request. Between these two events many functions are performed in many organizational elements. The majority of the time-consuming and error-prone functions are performed by people. Data regarding these functions may be conveniently collected by manual techniques. It is suggested that data regarding each transaction accompany the transaction during the entire process. If feasible, a standard form should be used. Examples of manual data are as follows:

- Name of Requester
- Requester's Organization
- Name of Analyst
- Analyst's Organization
- Type of Transaction
- Transaction Number
- Dissemination Code
- Time Received and Time Released (by each organizational unit which handles or is responsible for the transaction)
- Organizational Identifier (for each component which handles or is responsible for the transaction)

Not all of these are applicable to each transaction.
However, the last two items--times and organizations--must be supplied for each component and each transaction for two reasons:

(a) To account for each transaction and its location in the system.

(b) To provide a complete file of data for process evaluation.

5.5.9.2. Storage, Retrieval, and Processing

The EDP data, due to the collection method, is naturally stored as a file of data within the CHIVE EDP system. As such, it can be processed and retrieved through the use of the CHIVE query language. It is suggested that initially no language capabilities be added for this specific purpose since the number and nature of reports on machine processes which CHIVE management will require is not completely predictable.

The manual data collected in the system should initially be stored, retrieved, and processed by manual methods. As the system is used and evaluated, the file of manual data and the number of management reports will increase. At some point, this data must be processed by the EDP part of the CHIVE system. For this reason, it is important to design the manual data forms so that, as volume increases and operational procedures become firm, the data may readily be input to the computer and integrated into the EDP management data file. When this point in system evolution is reached, all manually collected data regarding CHIVE operations will be retrieved and stored by initiating a transaction. Thus, data about the processing of the Management Data File is recorded in the Management Data File and constitutes a resource which management may use to study its own evaluative and analytic activities.

5.5.9.3. Reports and Their Use

In discussing reports and their content, a distinction should be made as to when, during the evolution of the system, the reports are needed. This is particularly true in the case of those reports drawn from the EDP management data file prior to the incorporation of the manual data.

The purpose of reports based on data collected by EDP methods is to assist the CHIVE analysts and designers in improving, correcting, and modifying the EDP portion of the system. During the initial stages of operational testing it will be necessary to examine EDP operations carefully in order to eliminate bottlenecks and optimize equipment usage. Certain reports will be highly specialized, e.g., an analysis of disk storage use over some period of time, and will not be necessary as a regular product. Of continuing interest will be reports which provide management with an insight into the amount of time used on the computer and its various components. This has long-range implications regarding computer hardware acquisition.

Reports derived from the manually collected data will vary in frequency and detail as the system gains operational acceptance. In any new system, there will be imbalances which must be adjusted if the best results are to be obtained from the available personnel and equipment. The parameters which can be measured within the system are primarily concerned with rates and volumes. It is suggested that forms be designed and procedures instituted which will provide managers with raw data on how long a transaction stays in each component. This is the first step toward the elimination of delay points in the system. Shifting of manpower and new procedures will undoubtedly be necessary. This in turn will prompt another round of reports and analysis. And so on.
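As a rough illustration of how the Time Received and Time Released entries could yield a time-in-component report, the following Python sketch averages the hours each component holds a transaction and picks out the slowest one. The record layout, the timestamp format, and the sample figures are assumptions introduced for this example; only the component names are borrowed from this report.

```python
from collections import defaultdict
from datetime import datetime

TIME_FORMAT = "%Y-%m-%d %H:%M"  # assumed format for the times on the form


def average_hours_by_component(records):
    """Average time, in hours, that each component held a transaction."""
    held = defaultdict(list)
    for _transaction, component, received, released in records:
        start = datetime.strptime(received, TIME_FORMAT)
        end = datetime.strptime(released, TIME_FORMAT)
        held[component].append((end - start).total_seconds() / 3600)
    return {component: sum(hours) / len(hours) for component, hours in held.items()}


# Invented entries: (transaction number, component, received, released).
records = [
    (1, "Request Control", "1965-03-01 09:00", "1965-03-01 09:30"),
    (1, "Data Transcription", "1965-03-01 09:30", "1965-03-01 13:30"),
    (2, "Request Control", "1965-03-01 10:00", "1965-03-01 10:30"),
    (2, "Data Transcription", "1965-03-01 10:30", "1965-03-01 12:30"),
]

averages = average_hours_by_component(records)
slowest = max(averages, key=averages.get)
print(averages)
print(slowest)
```

An average is the simplest summary; a manager looking for delay points might equally want the longest single stay or the backlog at a given hour, neither of which is computed here.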
Of interest to management in terms of long-range changes to the system will be reports on sources and types of transactions. Such reports are generated by the present system and will be produced by CHIVE. Data on the number of cards produced, number of file accessions, number of references generated, and number of pages delivered will also provide managers with the necessary background for making adjustments in the processing of transactions.

After all management data has been combined in the file maintained by the EDP system, reports can be generated on a regular or demand basis with much less expenditure of manpower. The nature of the reports will probably vary little after the shakedown period is completed. However, the volume of data which must be manipulated dictates an EDP mode of report generation.

Chapter 5.6. SYSTEM FLOWS AND TRANSACTIONS

This chapter provides a more detailed view of system flows and transactions, i.e., the more dynamic aspects of the data processing activity, including some descriptions of illustrative tasks. The document image storage and delivery portion of the system is covered in outline only, leaving the more definitive treatment of this subject to Volume VI. Similarly, only passing mention is made of the EDP design since it is fully discussed in Volume VII.

5.6.1. DOCUMENT INPUT

Referring to Figure 5-4, the input to the system will be described.
The principal categories of incoming documents will consist of (a) textual-type documents received in all source classifications ranging from Unclassified to T/KH, (b) select documents (principally SI Teletype) received in machine language (as well as hard copy), (c) graphic images in the form of ground photography and films, (d) maps, and (e) machine language (ML) index records prepared by external organizations according to CHIVE rules and formats.

[25X1B]

Graphics and maps will continue to flow to GR and the Map Library Division (ML) through their existing acquisition channels. The only significant change in their operations will be that they will employ the CHIVE vocabulary in their indexing or cataloguing operations, and will transmit a copy of their index transcript sheets to CHIVE for conversion into machine readable form and entry into the Master Index File. CHIVE in turn will return to them a printed version of their index records for entry into their manual files where this seems desirable.

Documents selected by the information analyst which are available in machine language and have a formatted header and title (e.g., SI Teletype) will bypass indexing and transcription steps and go, in their machine language versions, directly to the EDP System where the necessary conversion to CHIVE format will be performed.
The hard copy versions of the documents will be sent simultaneously to microfilming for processing into the microimage store (Master Image File).

Other machine language receipts, consisting of abstracts of foreign scientific and technical literature, bibliographic records, and formatted information extracts pertaining to named-object data appearing in open sources, may likewise be input directly to the EDP System. Printed versions of these receipts, however, may be passed to information analysts within the system who will thereby be afforded the opportunity to review their content, and, if desired, delete the corresponding machine record from the EDP file. Since the source documents will not accompany these ML inputs, no photoprocessing will be required.

The remainder of this section will deal with the principal input flow process depicted in Figure 5-4, i.e., that relating to all-source textual documents.

Upon their receipt in the mail room, these documents will be counted, batched by type, and assigned document control numbers where required. The batches will then be forwarded to a dissemination unit where the documents will be disseminated to other offices as well as to CHIVE. Documents to be distributed to CHIVE will be divided into two categories: (a) reports for which CHIVE has a repository responsibility, and, therefore, must be kept regardless of substantive content (hereafter referred to as "R" documents); and (b) non-repository ("NR") documents whose retention value can only be determined after examination by an experienced intelligence information analyst.

"R" documents (constituting the vast majority of incoming receipts) will be addressed to the appropriate CHIVE subcomponent, but will flow initially to a centralized Header Indexing Group which will index the bibliographic data on the documents. Once this operation is completed, the documents will be transmitted directly to the Document Delivery System for image processing into the Document Image File, while the header index would be sent to the EDP System for conversion to machine language. The "R" documents which, in all probability, would be the ones most often requested by Agency customers in the period immediately following their receipt, will (by this process) find their way quickly into the document store where they will be available for retrieval. Following image processing, they will be forwarded to the CHIVE analytical desks marked on the documents for content review and indexing where warranted.

"NR" documents will bypass the centralized Header Indexing Group, being forwarded by the Dissemination Unit directly to the analytical components within the CHIVE geographic divisions. Here a further redistribution of some of the "R" as well as "NR" documents might take place if the initial dissemination was not sufficiently precise. In any event, the ultimate recipient of both types of documents will be an information analyst specializing in an area or topic within an area. His responsibility, relative to the "R" documents, will be to determine whether content indexing is warranted in addition to the header indexing already performed.
If not, he will destroy the documents and send a notice to the EDP System that no content index will be forthcoming. If the documents, however, do warrant content indexing, he will mark the parts of the documents which he wants reflected in the index, and will pass the marked documents to a Content Indexing Group serving his Division. (The activity of these individuals is described below.)

"NR" documents will likewise be examined by information analysts and will either be destroyed or marked for some form of indexing. If indexing is required, the documents will be sent first to header indexing clerks functioning at the division or desk level. They will prepare header transcript sheets, like their counterparts in the centralized Header Indexing Group. Where content indexing is not required but storage is desired, the "NR" documents will be sent to the Document Delivery System for microfilming, while their corresponding header transcript sheets will be passed to the EDP System. The remaining "NR" documents (and transcript sheets) which were to be content indexed will be forwarded to the Division's Content Indexing Group where they will rejoin the select "R" documents discussed above.

In the Content Indexing Group, semi-professionals known as content indexers will prepare content data transcript sheets by extracting and formatting the data identified for them by the information analysts. A selected portion of this work will be inspected and revised if necessary. Corrections and changes will be written on the data sheets. Once the content data transcript sheets have been prepared, the marked-up copies of the "R" documents can be destroyed since an image of these will already be available in the Document Delivery System.
The indexed "NR" documents, however, will be forwarded to the Document Delivery System for processing into the Master Image File.

Content data transcript sheets for both "R" and "NR" documents will be sent to a Data Transcription Group where they will be copied by typists.* The typed index entries, after sight verification, will then be fed to the EDP System for machine processing.

*Header data sheets can presumably be typed by the header indexers who prepared them.

Within the EDP Subsystem, a Page Reader will convert the clear-text header and content indexes into machine language. Following this operation, punched Work Cards will be generated by the computer from a portion of the header data record which will be used in the Document Delivery System (see below) in the preparation of the microimage store. The complete digitalized records of the header and content indexes will be processed by computer programs which will check the records for format and certain types of content errors and add them to the pertinent system files.

In the Document Delivery System, documents to be kept in hard copy for reasons of length, image quality, or other will be shelf-filed in an area contiguous to the microimage file according to their meaningful document control numbers. The remaining documents will be routed to a microfilm section. There they will be photographed, and, assuming the storage medium selected is the 35 mm. aperture card, the resultant product will be an aperture card with the document batch and serial numbers eye-visible in the aperture.
After these numbers are punched into the aperture cards, the aperture cards will be mechanically collated on these numbers with the deck of Work Cards prepared by the computer from the header data records for the same documents. Following collation, other data punched in the Work Cards will be reproduced and interpreted for the Vital Materials Repository (VMR) and NSA as appropriate. Lastly, a master set of the cards will be filed in document control number sequence in the Master Image File.

5.6.2. DOCUMENT RETRIEVAL

Referring now to Figure 5-5, the recovery of information from the files will be discussed. The retrieval process will ordinarily begin with a customer external to CHIVE originating a request for data either on a form designed for this purpose, by telephone contact, or by personal visit to the system. He will be put in touch with an information analyst working on the geographic/topical area of concern. The information analyst will be familiar with the current reporting, having screened incoming documents to determine what should be indexed, and will also have had extensive training in the indexing vocabulary, the logical files available within the system, and the query language required to conduct the computer search. After ascertaining the clearance level of the customer, the degree of sensitivity desired in the search, and the heterogeneity of the document base to be explored (e.g., "search document and photo
indexes, but not maps or films"), the information analyst (assuming a machine search is required) will translate the request into a set of commands using the formal language developed by CHIVE (see section 7.A,). To prepare the necessary search criteria he will consult the various Vocabulary Control Files--e.g., MOFIF, ISC, etc.--in order to derive the proper terms on which the search should be conducted. This research might also reveal whether certain inherited files would be worth interrogating (see section 5.5.4.2.1.).

[25X1B]

Having determined what descriptors to employ in the search, he will obtain a request number from a central control point and proceed to fill out an inter-leaved set of request forms on which he will identify himself (as well as his customer) by name and address, cite the file(s) to be interrogated, detail the logic and priority of the search, and define the output format required. One copy of his request statement will then be sent to the request control point to be added to the file of open requests. Assuming, however, that some inherited files must also be searched since the date span of the request encompassed the period prior to the initiation of the CHIVE system, the information analyst may be required to take one or more of the following additional steps:

a. Examine hard copy files of cards or documents co-located with his organizational component.

b. Request the retrieval of hard copy records (e.g., AIRA, one-name cards, etc.) from the system's centrally-located, master document collection.

c. Consult with other information analysts familiar with the contents, vocabularies, and record formats of machine files inherited by CHIVE and obtain their assistance (where required) in preparing the special request forms to interrogate said files.

The formulated machine requests will be typed, sight verified, and then transmitted to the Page Reader via the pneumatic tube system. For those requests to be passed against the EDP files, the computer will check for such things as the completeness of the request statement and validation of the terms composing the query. All requests will then be queued for processing against the pertinent inherited and CHIVE-built files.*

*The periodicity of searches may differ between these files, i.e., inherited files may customarily be searched only once a day while the CHIVE-built files will be searched on a demand basis.

Searches of unconverted EAM files will be conducted as at present, with the output taking the form of existing machine listings which cite documents, personality dossiers, installation numbers, or photo accession numbers relevant to the request. For files converted to EDP and the CHIVE-built Master Index File, the product of the search will also be a listing, albeit in a different form. On the first page(s) of the listing will appear the identity of the information analyst levying the request, the request itself, and the list of document control numbers which satisfied the search criteria. On succeeding pages will appear, depending on the output format requested, either the complete "hit" index records or select elements thereof. (Output of a statistical count of the number of documents which matched the search
prescription, without the records themselves, is also possible if the information analyst so desires.) Codes appearing in the records would be translated into clear text for ease of understanding by the information analyst and customer (if the latter also reviews the listing directly).

The information analyst will study the various machine listings received to determine the relevance of the retrieved records to the search prescription, and, particularly in the case of inherited file outputs, will consult with other information analysts familiar with the contents and vocabularies of such files as required.

In a certain percentage of cases the output records may, themselves, answer the request. If so, the retrieval activity will end with the information analyst transmitting the desired information by mail or phone to the customer. On the other hand, the response might have been such that he will wish to re-enter the request with improved criteria.

When the index record output is satisfactory but, in itself, does not supply the answer sought, the information analyst may order the pertinent documents from the Document Delivery System before transmitting the results of the search to his customer for review. If so, he will encircle the appropriate document numbers appearing on the first page of his listing and send this page to the Document Delivery System. Where inherited files, however, are involved he may be ordering personality or installation dossiers, as well as documents, and will, therefore, follow a slightly different procedure.
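The request checks and listing layout described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the report does not give the CHIVE query language or record formats, so the names Request, validate, search, and APPROVED_TERMS, the sample descriptors, and the dictionary layout of an index record are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical stand-in for a Vocabulary Control File: the approved descriptors.
APPROVED_TERMS = {"HARBOR", "PETROLEUM", "PIPELINE"}


@dataclass
class Request:
    """One request statement, as the analyst might enter it on the form."""
    request_number: str
    analyst: str
    customer: str
    files: tuple            # logical files to be interrogated
    terms: frozenset        # descriptors that must all appear in a hit
    count_only: bool = False


def validate(request):
    """Check completeness and term validity; an empty list means 'queue it'."""
    problems = []
    if not (request.request_number and request.analyst and request.customer):
        problems.append("request number, analyst, and customer are all required")
    if not request.files:
        problems.append("no file cited for interrogation")
    for term in sorted(request.terms - APPROVED_TERMS):
        problems.append("term not in vocabulary: " + term)
    return problems


def search(request, index_records):
    """Return the listing: analyst, request, and either a count or the hits."""
    hits = [rec for rec in index_records if request.terms <= rec["descriptors"]]
    listing = {"analyst": request.analyst, "request": sorted(request.terms)}
    if request.count_only:
        # Statistical count only, without the records themselves.
        listing["count"] = len(hits)
        return listing
    # "First page" data: the control numbers that satisfied the criteria.
    listing["control_numbers"] = [rec["control_number"] for rec in hits]
    # "Succeeding pages": the complete hit records.
    listing["records"] = hits
    return listing
```

Keeping validation separate from the search mirrors the text, in which the computer checks a request for completeness and valid terms before it is queued against the files.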
Graphics and map index records uncovered during the initial search will be transmitted to the customer who will order these items for himself. Dossiers, following their retrieval from the file, will be forwarded directly to the information analyst requesting same. A replica, rather than the file copy of all other documents, however, including those recovered from the existing Intellofax and SR collections as well as from the microimage and hard-copy files of CHIVE, will be prepared before being transmitted to the analyst.

The information analyst will review the output from the various document files, and, after removing those documents which do not appear to be pertinent, will transmit the response to the customer. Alternatively, the information analyst may be asked to respond to the inquiry by phone, memorandum, completion of a customer's response form, or by the preparation of a narrative report (e.g., a biographic summary). In the latter case, he would obviously have to supply information rather than documents, which might necessitate a more sophisticated analysis and synthesis of the materials at hand.

Lastly, the information analyst may update certain of his identifier records, as well as dossier files, to reflect the results of his analysis (see section 5.5.4.1.1.), or send a marked copy of his report (if it deserves retention) back through the input process for indexing and storage in the Master Image File. He will also return any master cards or dossiers to their appropriate files, and report the closing out of the request by completing his copy of the request form. The latter will be sent for processing into the Management Data Files.

5.6.3. INFORMATION FILE BUILDING, MAINTENANCE, AND RETRIEVAL

As has been pointed out, the CHIVE system, like the existing central reference operation, will require a variety of dictionaries and other support tools (given the general title of Vocabulary Control Files in this report). In addition, it will maintain substantive files of information either in unsynthesized or summary form. Since the procedures for building such files as well as retrieving data therefrom will differ substantially from the document indexing and recovery process, they are reviewed here separately. Moreover, these files, unlike the Master Index records, will require continual maintenance, i.e., the deletion of obsolete or useless data as well as the correction of, or addition of information to, existing records in the file. The Master Index File, on the other hand, will require little maintenance at the sub-record level as such--only the addition of new records to the file and the periodic retirement of segments of the file to a less accessible storage medium.

5.6.3.1. Vocabulary Control File Maintenance

Vocabulary Control Files (e.g., MOFIF, MLD, etc.) will be consulted by content indexers as well as header data indexers in order to select the approved term or code for representing a subject or named-object mentioned in a document.* These files, initially, will be represented in listing form although some alternative reference medium will be investigated.
If the indexer finds no suitable entry for the topic mentioned in the document, or if the entry is erroneous or incomplete, he will prepare a File Maintenance Transcript Sheet on which he will specify the changes to be made to the file in question, using a portion of the same command language employed in the retrieval of records from the Master Index File.

*The maintenance of the personality identifier file (Master Dossier Index) is excepted from this discussion since, as the reader will recall from section 5.5.4.1.1., names will not be "identified" during the input process.

The File Maintenance Transcript Sheet will be passed to a dictionary editor who will be responsible for reviewing all changes made to this specific vocabulary control file. He will insure that the proposed transaction is legitimate and proper, and, after entering the proposed changes by hand in his master listing, will forward the transcript sheet to the Data Transcription Group for typing.

After the transcript sheet has been copied and any necessary corrections made, it will be processed in essentially the same manner as the Document Index Transcript Sheets, that is, the forms will be converted to machine language by the Page Reader and the resultant output fed to the EDP System for updating the pertinent machine files. A record of the changes made will then be printed out in the various arrangements required, and returned to the dictionary editor as well as all indexers using the particular vocabulary control file affected.
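The update cycle just described (proposed changes, an editorial check, the file update, and a printed record of changes) can be illustrated with a short Python sketch. It is an assumption-laden illustration, not the report's design: the transaction verbs "add", "change", and "delete", the term-to-code mapping, and the function name apply_transactions are all hypothetical, and the automatic rejection of inconsistent transactions merely stands in for the dictionary editor's review.

```python
def apply_transactions(vocabulary, transactions):
    """Apply (operation, term, code) transactions to a term -> code mapping.

    Returns a new mapping plus a change record suitable for printing and
    returning to the editor and indexers. The input mapping is not modified.
    """
    updated = dict(vocabulary)
    change_record = []
    for operation, term, code in transactions:
        if operation == "add" and term not in updated:
            updated[term] = code
            change_record.append(f"ADDED {term} = {code}")
        elif operation == "change" and term in updated:
            change_record.append(f"CHANGED {term}: {updated[term]} -> {code}")
            updated[term] = code
        elif operation == "delete" and term in updated:
            old_code = updated.pop(term)
            change_record.append(f"DELETED {term} = {old_code}")
        else:
            # Stand-in for the editor refusing an improper transaction,
            # e.g. adding a term already present or deleting a missing one.
            change_record.append(f"REJECTED {operation} {term}")
    return updated, change_record
```

Returning a fresh mapping rather than editing in place keeps the prior listing available for comparison, which loosely corresponds to the printed supplements being issued against an existing master listing.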
The frequency of preparation of these printed supplements to master listings, as well as the frequency with which the master listings themselves will be rerun, will vary depending on the number of changes occurring over a given period of time. The initial period of CHIVE operation will permit time for some experimentation to arrive at the most satisfactory procedure.

5.6.3.2. UIF and SIF Processing

As indicated previously, formatted information files consisting of logical data units either in unsynthesized or summary form may be initiated either by: (a) analysts external to the CHIVE system having a pressing and continuing need for the retrieval of select facts (as distinct from documents) pertaining to a given subject or function; or (b) by CHIVE information analysts reacting to the accumulative effect of specific request patterns. Requirements of this nature, since they will increase both the human and machine burden, will be reviewed by managers at the branch or higher level to determine the anticipated load on the system and its capacity to respond to same.

Accepted requests for the establishment of UIF or SIF files will be assigned to one or more information analysts conversant in the subject matter involved, for initiation of the input as well as maintenance and retrieval processing. Assuming the data is to be stored in digital files, the information analyst responsible for the file will consult first with a specialist assigned to the EDP System known as an EDP File Analyst. The latter will be thoroughly familiar with the internal operations of the EDP System and, in particular, the method used to establish new digital files. His duties would be analogous to those of an individual in the Planning Staff of the Machine Division/OCR, i.e., he will design the format and record structure of the machine file required by the information analyst and see to it that the file is actually established.

In general, the approach of the area information analyst will be to use the document retrieval system to help build the required information files. If the file, however, is to have the characteristics of an Unsynthesized Information File (see section 5.5.5, above), the actual involvement of the information analyst in the input process may not be great since, presumably, the data requested is already reflected in the content of document index records (i.e., the UIF would be built directly from rearranged elements of index records).* Where this is indeed the case, the information analyst will periodically direct the computer to take such action by calling for the appropriate standing query and record generation job to be run.

*If the data is not already being captured, then the request must be classified as a "special project" which would require a procedure all its own.

SIF files, on the other hand, will require more activity on the part of the information analyst since they will consist of evaluated, summary records about named-objects or events. These can only be generated (as suggested in section 5.5.6.) by the analysis of the output from a UIF file, from the Master Index File, or by the processing of the incoming documents themselves.
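As a rough illustration of a UIF being "built directly from rearranged elements of index records," the following Python sketch runs a standing query over index records and keeps only selected elements in a chosen order. The record fields, the sample selection rule, and the function name build_uif are hypothetical; the report does not specify the record generation job at this level of detail.

```python
def build_uif(index_records, selects, elements):
    """Generate UIF records from document index records.

    index_records: iterable of dicts, one per document index record.
    selects:       the standing query, a function returning True for records
                   that belong in the file.
    elements:      names of the index elements to carry over, in the order
                   the UIF record should present them.
    """
    uif = []
    for record in index_records:
        if selects(record):
            # Rearrangement: copy only the wanted elements, in the wanted order.
            uif.append({name: record.get(name) for name in elements})
    return uif


def sample_standing_query(record):
    """Hypothetical standing query: keep records indexed under PIPELINE."""
    return "PIPELINE" in record.get("descriptors", ())
```

Because the selection rule and element list are passed in, the same job could be rerun periodically against new index records, which is the sense in which the analyst "calls for" the standing query rather than preparing input by hand.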
Assuming the SIF is to be built from data in a UIF, the information analyst will review the listed product from a UIF, comparing it with a listing of any records already stored in the SIF. If he decides to make a change to the SIF either by adding new data, deleting what was there, or by replacing old information with new, he will prepare a File Maintenance Transcript Sheet (similar, if not identical, to that used to update vocabulary control files) on which he will describe the transactions to be performed. This form will follow the usual path to typing, thence to the Page Reader, and finally to the EDP System for computer processing.

The retrieval of data from either the SIF or UIF files might be initiated for a variety of reasons, the principal ones being as follows:

a. To provide a listing of changes to the master file in order to update the information analyst's printed version of the file.

b. To provide a listing of the complete master file either for reference use by the information analyst* or for periodic publication and distribution to interested customers.

c. To search, in response to a customer's request, for a specific fact or correlation of facts which could not be readily derived by human browsing of the printed records.

Whatever the reason for initiating a retrieval transaction the process will be virtually the same as that followed in the retrieval of document index records (using the same retrieval language), with the exception that no inherited files should be involved in the search and no documents will ordinarily need to be retrieved from the document image store.
Schedules can, of course, be set up for the levying of standing queries which would cause the listing of all or a portion of a file on a periodic basis without any action being required on the part of the responsible information analyst.

*The listing will be the primary mechanism for analyst-SIF communication.

5.6.4. TASK TABLES FOR SYSTEM TRANSACTIONS

Examples of the step-by-step procedure by which some of the system transactions outlined above might be carried out using the equipment, file organization, program organization, and operator procedures described elsewhere in this report are provided below. Obviously, there are a variety of procedures that might be used to perform any of these tasks. What is suggested here must, therefore, be regarded as tentative and subject to modification as procedures are worked out in detail during Phase III.

With regard to the method of presentation, it should be pointed out that written descriptions of even the most routine human activities make difficult reading at best. And this is no less true of a data processing operation, especially when couched in the language of the systems analyst. Secondly, it is a fact that if flow charts were prepared of many current central-reference operations, the resultant products would also appear relatively complex. Yet, somehow, humans manage to carry out the operations involved. Lastly, it should be recognized that some atypical problems are covered in the task tables which would not ordinarily be encountered in the average transaction. These, necessarily, further complicate the narrative discussion.

The tables which follow have four columns.
The first column (STEP) contains the number of the operation. The number is used in the body of the table to reference deviations from the normal sequence of operations. The phrase, "go to step 10," will tell the reader that the next operation in the sequence is step 10. The second column (AGENT) identifies the person or equipment which is chiefly responsible for carrying out the operation. The third column (LOCATION) shows where most of the operation is carried out. The fourth column (OPERATION) has one or more sentences for each operation which describes what takes place in the operation. These are either processing operations, in which some action is taken on the data covered by the task table, or they are decision operations in which a question is asked and the consequences are given for the two or more possible answers. These consequences are usually in the form of "go to" statements. The statement, "STOP," is the last statement in the OPERATION column for a particular task and indicates that the task is completed.

[Next 3 page(s) in document exempt]

Table 5-3
OVER-COUNTER DOCUMENT SEARCH

Step 1. Agent: Requester. Location: Will vary.
Operation: Communicate available bibliographic identifying data on document(s) wanted by phone, mail, or in person to Document Delivery System, and indicate response priority.

Step 2. Agent: Information Control Clerk. Location: Document Delivery System.
Operation: Prepare request form if not already made out.

Step 3. Agent: Information Control Clerk. Location: Document Delivery System.
Operation: If control number is available for the requested document, send one copy of the request form to the search unit responsible for the particular collection or sub-file in which the document would be stored; if the control number is not available, go to step 10.

Step 4. Agent: Document File Clerk. Location: Document Delivery System.
Operation: If the document would ordinarily be in the Microimage File, search the motorized card file for the document control number cited and proceed to step 5; if the document would ordinarily be in the Hard Copy File, go to step 18.

Step 5. Agent: Document File Clerk. Location: Document Delivery System.
Operation: If the document is found, remove document, replacing it with an "out" card, and send document with request form attached to reproduction; if document is not found, and it is in a category for which the system has a repository responsibility, forward request to Hard Copy File searchers and go to step 18.

Step 6. Agent: Reproduction Equipment Operator. Location: Document Delivery System.
Operation: Prepare paper copy of document on appropriate image-processing equipment.

Step 7. Agent: Reproduction Equipment Operator. Location: Document Delivery System.
Operation: Transmit paper reproduction of document plus request form to request receipt point, and return master image to appropriate files section for refiling.

Step 8. Agent: Information Control Clerk. Location: Document Delivery System.
Operation: Deliver copy of document (if found) to requester. Otherwise notify requester that document is either still in transit or not available in CHIVE (and why). If requester wishes, hold the request for a second search after a suitable time interval.
Step 9. Agent: Information Control Clerk. Location: Document Delivery System.
Operation: Record temporary or final completion of action on request form and transmit form to Data Transcription Group for typing and subsequent insertion (via the Page Reader) into the Management Data Files. End of Over-Counter Document Search. STOP.

Step 10. Agent: Information Control Clerk. Location: Document Delivery System.
Operation: Telephone, or send copy of request form to, EDP System.

Step 11. Agent: Information Control Clerk. Location: Computer Center.
Operation: If a priority request, deliver to console operator; if not priority, send to key punching and go to step 16.

Step 12. Agent: Computer Operator. Location: Computer Center.
Operation: Key the request into the computer using the inquiry console.*

Step 13. Agent: Computer Operator. Location: Computer Center.
Operation: Using the document identifying handles provided by the requester (e.g., post, airgram number, JPRS number, date, or other), search the header data portion of the Master Document Index File and print out the corresponding document control numbers.

Step 14. Agent: Computer Operator. Location: Computer Center.
Operation: Transmit results of printout to Information Control Clerk.

Step 15. Agent: Information Control Clerk. Location: Computer Center.
Operation: Telephone or transmit request form with list of document control numbers to Document Delivery System. Go to step 4.

Step 16. Agent: Key Punch Operator. Location: Computer Center.
Operation: Key punch search specifications and transmit cards to operations section to await batch processing.

Step 17. Agent: Computer Operator. Location: Computer Center.
Operation: Insert the request into the computer and go to step 13.

Step 18. Agent: Document File Clerk. Location: Document Delivery System.
Operation: Search the appropriate segment of the Hard Copy File. If document is found, remove document, replacing with an "out" card, send document with request form attached to reproduction, and go to step 6. If document is not found, so indicate on request form, send request form back to receipt point, and go to step 8.

*Cross reference listings, arranged in various sequences, will also be available for consultation and may be used in preference to machine queries to recover document control numbers where this approach would be equally effective.

Table 5-4
GENERATION AND INPUT PROCESSING OF FORMATTED INFORMATION/INDEX RECORDS PREPARED UNDER CONTRACT*

Step 1. Agent: Information Control Clerk. Location: Contractor.
Operation: Receive and log in the periodical, monograph, or other publication to be exploited.

Step 2. Agent: Information Control Clerk. Location: Contractor.
Operation: Obtain code designation (if a serial) from an official list and enter same on a routing sheet clipped to the publication.

Step 3. Agent: Information Control Clerk. Location: Contractor.
Operation: Sort and distribute publications to appropriate translators depending upon language or content of publication.

Step 4. Agent: Translator. Location: Contractor.
Operation: Scan content of publication for data of interest to CHIVE and determine elements of information to be extracted.

*This table illustrates the procedure which might be followed where the following conditions prevail: (a) CHIVE can influence the automation of data at the source; (b) the elements of information to be extracted lend themselves to a highly formatted record structure.
Information of this type which enters the central reference system now, but only in hard copy, includes the Political and Scientific Biographic Cards from JPRS, Bibliographic Cards from the MIRA contract at the Library of Congress, abstracts of scientific articles from FDD, etc.

Step 5. Agent: Translator. Location: Contractor.
Operation: Type formatted transcript sheet for each article, monograph, or other, containing the pertinent information required. Enter data in English in the appropriate columns or spaces provided, and in the coding convention required where this does not require dictionary consultation. For the latter (e.g., organization names), enter descriptor in clear text. Type "remarks"-type information, the abstract body (if a scientific article), and similar unformatted text at the end of the index record.

Step 6. Agent: Translator. Location: Contractor.
Operation: Clip transcript sheet to publication and transmit both to coding group co-located with the Contractor or internal to CHIVE.

Step 7. Agent: Content Indexer. Location: Contractor or CHIVE.
Operation: Add codes, where required, on to transcript sheet in addition to clear text after consulting pertinent CHIVE dictionaries.

Step 8. Agent: Content Indexer. Location: Contractor or CHIVE.
Operation: Return publications to file and send transcript sheets to typists.

Step 9. Agent: Typist. Location: Contractor or CHIVE.
Operation: If typed product is to be read by CHIVE's Page Reader, type entries in form of hard copy; otherwise, generate paper tape as well as hard copy on Flexowriter-like device and go to step 11.

Step 10. Agent: Page Reader. Location: CHIVE.
Operation: Read typed copy and feed machine-language product to computer.

Step 11. Agent: Computer. Location: CHIVE.
Operation: Process records into Master Document Index File.

Step 12. Agent: Computer. Location: CHIVE.
Operation: If CHIVE area desk most concerned with input records generated by contractor does not desire to review additions made to the files, input process is completed. End of Input of Formatted Index Records Prepared under Contract. STOP. If opposite is true, print out (on a periodic basis) a hard copy listing of new records entering system, transmit listing to appropriate CHIVE area desk, and go to step 13.

Step 13. Agent: Information Analyst. Location: CHIVE.
Operation: Scan output listing for unwanted items.

Step 14. Agent: Information Analyst. Location: CHIVE.
Operation: Prepare a File Maintenance Transcript Sheet containing the usual job specifications (e.g., transaction originator, classification, file to be addressed, date, etc.), the numbers of the unique records to be addressed, and the operation (presumably a "delete") to be performed.

Step 15. Agent: Information Analyst. Location: CHIVE.
Operation: Send transcript sheet via typing and Page Reader to EDP System for processing.

Step 16. Agent: Computer. Location: CHIVE.
Operation: Delete unwanted records from the pertinent file.*

*An alternative approach to that taken in steps 13-16 would have the information analyst responsible for the file make use of a remote display device to screen additions to the file and make deletions thereto. Indeed, such a device could be introduced much earlier in the input cycle as the means by which codes would be added to the records and any unwanted entries deleted before file updating is actually undertaken by the computer.
Table 5-5 INFORMATION ANALYST ACTIVITY RELATIVE TO A.N .A.LL-SOURCE, ALL -FILE SEARCH FOR A NAMED PKSONALITY Agent q!ation Anaayst ';aalyst Infor- itation Analyst Infor- mation Analyst Location Operation -All vary - 183 - Obtain available identifying data (e.q., name, citizenship, occupa- tion, affiliation) on personality wanted by phone, mall or in per- son from requester. If request has been levied on right party, accept same if request has been levied on riqht area desk but wrong Information Anal7st (-because, on this desk, 1-..h re is more than one analyst and each specializes in a differ- ent topic), transfer request to correct individual. Obtain request number from cen- tral control point an::.1 enter in First section of interleaved. re- cfuest form no elementary data needed for logqinrj purposes, I .e., name of rerTuester, date, name of analyst handling request, etc. Send one copy of request form to control point for filing with. other ''opon" requests. Approved For Release 2000/05/30 : CISEalbe78-03952A000100050001-7 Approved For Release 2000/05/AEWRDP78-03952A000100050001-7 Step Agent Location Operation S. Informa- tion Analyst C.G.D. Search Master Dossier Index list- ing for references to inherited as well as CHIVE-built dossiers. If an entry for the personality is found, extract dossier number and date dossier identifier record was last updated. . Informa- tion Analyst C.G.D. Enter in the query statement section of one copy of the inter- leaved request form the specific search parameters to be used in querying the CHIVE-built Master Index File. 
For example, if the Name Group Table is to be used, enter single spellings of both surname and personal names; if the name group feature is to be bypassed, enter the specific variant spellings to be included in the search; if FNU's are not wan4-.er1, so specify; if a dossier is avaiMble on the personality, exclude unwanted references already on file in the dossier by specifying that the date of preparation of any document index record containing the desired name should not be of a lesser value than the date the dossier identifier record was last up- dated. Also list any other factors 'which will serve to limit the scope of the search e.g., citizenship, general or specific occupational category, date of birth range, etc. - 184 - Approved For Release 2000/05/MM-RDP78-03952A000100050001-7 Approved For Release 2000/05/30 : MCRN78-03952A000100050001-7 [Step Agent Location Operation . ' , ' . . . . , 7. 8. 9. Informa- tion Analyst , , ' , ? . . ? ' , Informae ? tion Analyst Informs.- tion ? Analyst C.G.D. . C.G.D. C.G.D. ? Assuming the Special Register (SR) name index portion of the Detail File to Comint Reports has not been integrated with the CHIVE Master Document Index, complete the copy of the inter- leaved request form used for searches of the SR Detail File consulting (as necessary) with an Information Analyst familiar with the vocabulary and file structure of this inherited file system. Refer to the printed version of the Name sGroup Table to help select the variant name spellings to be searched in this file, and also include any variant spellings required if the transliteration system employed in this file is unique. If a dossier was discovered on the personality in step 5, enter its number on the dossier re- trieval copy of the request form. Forward the completed request form resulting from step 6 Ithrough typing and. 
Page Reader to the Computer Center for re- trieval of the pertinent index ,records from the CHIVE-built Master Index File; forward the request form resulting from step 7 directly to the Computer Center for manual retrieval and subse- quent listing of the relevant name records from the punch card - 185 - Approved For Release 2000/05/30 : CReRE178-03952A000100050001-7 Approved For Release 2000/05853CRETX-RDP78-03952A000100050001-7 Step Agent Location Operation file, inherited from SR; forward the dossier request form to the hard copy section of the Docu- ment Delivery System for re- covery of the dossier desired. 10. Informa- tion Analyst C.G.D. Telephone or communicate in some other fashion the details of the request to the Graphics Register (GR) for manual retrieval of photographs on the individual wanted from the inherited GR Per- sonality Photo File. (Photos on the person processed subsequent to the initiation of the CHIVE system will be uncovered, in- itially in the form of index records, in the computer search of the Master Index File refer- red to above.) 11. Informa- tion C.G.D. While awaiting receipt of the listed index records from the Analyst Master Index File and SR Name File, as well as the arrival of the hard copy dossier and photos, investigate any self-indexed card or document files on per- sonalities inherited from BR which may be located either with the area desk or in the central hard copy files of the Document Delivery System. Also examine any Supplementary Files (e.g., Who's Who publications, commercial indexes, etc.) avail- able at the area desk. - 186 - Approved For Release 2000/05atR@Ifk-RDP78-03952A000100050001-7 4110 Approved For Release 2000/05/30 : CgRg8-03952A000100050001-7 Step, Agent , Location Operation . , . ? , , , " . , . . ? 13. , . . . . - . 14. Informa- tion Analyst Informa- tion Analyst Informa:- tion Analyst C.G.D. . e.G.D. ? ? . , . , C.G.D. 
12.   Information Analyst  C.G.D.    Upon delivery of the index listings from the Master Document Index and SR Name File searches, examine the references printed out to determine whether they indeed refer to the person sought. Consult again, if necessary, with an Information Analyst familiar with the SR system to interpret the output from the SR file.

13.   Information Analyst  C.G.D.    Assuming the request will not be rerun with improved criteria, identify the documents desired by encircling the appropriate document numbers appearing on the first pages of the listings. (Alternatively, the listing may be on a two-part form which will allow the Information Analyst to keep a carbon copy of the index record listing after using the original as an order for documents.)

14.   Information Analyst  C.G.D.    Transmit the document orders to the Document Delivery System, and any photo control numbers to GR, for retrieval and reproduction of the items desired.

15.   Information Analyst  C.G.D.    Assemble all material collected from the various document repositories (i.e., hard copy dossier . . . reproductions of documents from the CHIVE Master Image File, inherited Comint Document File, and GR Personality Photo File . . . original items pulled from self-indexed card or document files . . . and reference works from the Supplementary Files). Remove those items which, after analysis of the documents themselves, prove to be unrelated to the person in question, and prepare the response in the manner requested by the customer.

End of All-Source Search for a Named Personality. STOP.

*It is assumed, for the purposes of this table, that all material available on the personality being searched must be examined before a response can be made to the requester. For this reason, the search cannot end with the retrieval of an index record or card from a manual file.
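The dossier cut-off rule described under step 6 (exclude index records prepared before the dossier identifier record was last updated) can be sketched as a simple filter. This is only an illustration under stated assumptions: the record layout, the field names, and the function `new_since_dossier` are hypothetical and are not taken from the CHIVE file definitions.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class IndexRecord:
    """Hypothetical stand-in for a document index record."""
    document_number: str
    name: str
    date_prepared: date


def new_since_dossier(records, wanted_spellings, dossier_last_updated):
    """Keep records for the wanted spellings whose date of preparation
    is not of a lesser value than the dossier's last update date."""
    wanted = {spelling.upper() for spelling in wanted_spellings}
    return [
        r for r in records
        if r.name.upper() in wanted and r.date_prepared >= dossier_last_updated
    ]


# Illustrative data only.
sample = [
    IndexRecord("D-001", "IVANOV", date(1963, 5, 1)),
    IndexRecord("D-002", "IVANOFF", date(1964, 9, 15)),
    IndexRecord("D-003", "PETROV", date(1964, 10, 1)),
]
hits = new_since_dossier(sample, ["Ivanov", "Ivanoff"], date(1964, 1, 1))
```

With the sample data, only the 1964 record under a wanted spelling survives; the older reference is presumed to be on file in the dossier already.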
Chapter 5.7. FILE CONVERSION

5.7.1. INTRODUCTION

Of the many types of extant central reference files which might be candidates for full or partial conversion to the CHIVE system, two are of primary concern. These are the document index and document image type files. In the former category are such files as the following:

- SR Detail Index File (Comint)
- SR Detail Index File (PI)
- FIB Active Installation Index File
- BR Dossier Index File
- IRS Document Index File
- GR Ground Photo Index

Inherited document image files include:

- IRS Document File (includes aperture cards and hard copy)
- SR Comint Document File
- BR One-Name File
- FIB Active Installation File (includes cards and folders)
- BR Dossier Folder File

There are, of course, many other types of central reference files in addition to those listed above, including some already in machine language. Most of these, however, are either information files of such short-term interest that there would be little reason for converting the existing records, or are vocabulary control type files which, while they might be used to build analogous CHIVE indexing and retrieval tools, would not be converted per se. The discussion in this section, therefore, will cover only index and image files, in that order.

5.7.2. DOCUMENT INDEX FILES

5.7.2.1. Reasons for Conversion

One of the most important reasons for converting the inherited files to the CHIVE system would be to create a truly centralized source of reference data and information for the Agency. Conversion of the existing document index files to magnetic tape under the CHIVE system would provide a means of establishing effective data systems management.
The conversion of the inherited files would result in a reduction in the total number of document index files that would have to be maintained. In addition, conversion of these files would tend to simplify the operating procedures of the document indexing and retrieval system. With conversion, only one set of procedures would be needed; without it, one set would be required for the inherited files and a different set for the CHIVE-built files. Furthermore, a reduction in the total number of personnel in the document indexing and retrieval system and a reduction in space should be obtained by converting the inherited files.

5.7.2.2. Degrees of Conversion

There are at least three different degrees or types of conversion that are possible. The first is a direct conversion and is probably the simplest and least expensive. Direct conversion means simply that the card image would be converted directly to tape. This type of conversion would not reduce any of the duplicative information existing in the card files. Moreover, it is the least desirable because it would provide the least amount of flexibility.

The second type of conversion is to convert the card files to the CHIVE format. This would eliminate any redundancy existing in the card files by pulling all data that was indexed on any particular document into one logical CHIVE record. This type of conversion is more desirable since it would provide good flexibility and would eliminate the built-in redundancy of the existing card files.
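The second type of conversion can be pictured as a grouping operation: every card that indexes a given document contributes its data to one logical record for that document. The following is a minimal sketch under assumptions; the card layout (a document number paired with one index term) and the function name `merge_cards` are hypothetical and do not represent the actual card formats or the CHIVE record format.

```python
from collections import defaultdict


def merge_cards(cards):
    """Collect all index terms found on cards for the same document
    into one logical record per document, dropping duplicate terms.

    cards: iterable of (document_number, index_term) pairs, one per card.
    Returns a dict ordered by document number.
    """
    merged = defaultdict(list)
    for document_number, term in cards:
        if term not in merged[document_number]:
            merged[document_number].append(term)
    return dict(sorted(merged.items()))


# Illustrative card images only.
cards = [
    ("D-2", "STEEL"),
    ("D-1", "USSR"),
    ("D-2", "USSR"),
    ("D-2", "STEEL"),  # duplicate card for the same document and term
]
logical_records = merge_cards(cards)
```

Four card images reduce to two logical records here, which is the kind of redundancy removal a direct card-image conversion would not provide.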
The third type of conversion would be a complete conversion, both syntactic and semantic. The syntactic aspects of the change would be similar to that described in the preceding paragraph. The semantic or vocabulary conversion, however, would require a considerable amount of intellectual participation by analysts from the respective areas where the inherited files originate. This type of conversion would be the most desirable and most flexible, but it would also be the most complex and difficult to accomplish.

5.7.2.3. SR Detail Index File Study

Of the various document index files described in Appendix 5.D., only one has been looked at in any detail to ascertain the conversion possibilities. That file was the SR Detail Index File; the study was performed to determine the advisability and feasibility of converting the file from cards to magnetic tape or to a direct access device. Some of the findings of this preliminary study are presented below. Whether these are representative of similar conclusions that might be reached vis-a-vis other inherited document index files after investigation of their individual conversion potential, one cannot say. Further study of the entire problem will be required during Phase III before any final recommendations can be made.

The following are the data that were collected from the SR study. The number of cards, in millions, that would have to be read to convert all of the Detail Index File is as follows:

No. 1 File - Subject/Commodity                             7.2
No. 4 File - Area                                          4.0
No.'s 2,3,6,7,8,9 Files - Organization and Personality     4.1
Total                                                     15.6

This means that 15.6 million cards would have to be read to acquire all of the data in the current Detail Index File. This data applies to conversion to tape or conversion to a direct access device. Both approaches are discussed in the following sections.

5.7.2.3.1. Conversion to a Magnetic Tape File

For the first part of this study, it was assumed that the Detail Index File would be converted to one long file ordered on series-document number, with all data pertaining to any one document constituting a logical record.

The file size converted to tape would be approximately 930 million characters. This would result in approximately 40 tapes for the master file, with as many again as a first backup. This indicates that at least 80 tapes would be required at any one time to represent the file on tape.

Assuming a thousand cards per minute input rate with 20% allowed for manual handling, this results in 307 hours of 360/Mod 30 machine time to read the file in. This is equivalent to approximately 1.3 months of Mod 30 time (eight hours per day). Assuming the read-in is performed on extra shift, the minimum cost would be $3,000. In addition, 30 to 35 hours of 7090 or 360/Mod 60 time would be needed for sorting, merging, and file building. This cost would amount to approximately $14,000. Programming and analyst costs are estimated at $10,000. Therefore, the initial cost of conversion would be, at a minimum, about $27,000.

It would take a minimum of three hours to read a tape file of this size. An additional half-hour per day would be required for input request processing, sorting of input and output, output processing, output, and maintenance. It was assumed that the Mod 60 would be used to do the search processing. This would amount to approximately $200 per hour. Assuming a once-a-day search, the approximate monthly machine rental to perform the maintenance and retrieval of the SR Detail Index File would be $15,400. This is approximately two-and-a-half times the present EAM rental of SR's Machine Branch.

Turnaround time on requests would suffer by converting a large file of this nature to tape. The SR personnel contacted indicated that a 24-hour turnaround on all requests would be unacceptable. They further indicated that approximately 20% of the requests handled by SR require a two-hour-or-less response time. These priority requests are spread throughout the file, not just in a selected portion of the file.

The amount of space presently occupied by the SR Machine Branch (card files and EAM gear) is approximately 4,300 square feet. A reasonable value to place on this would be about $4 per square foot, per year. Assuming that 3,000 square feet of this area could be saved by conversion, this would result in an effective savings of $12,000 per year.

5.7.2.3.2. Conversion to a Direct Access File

Slightly different ground rules were chosen for this technique than were used on the "long tape file." Instead of trying to form one logical record from all the cards existing in the Detail File which originated from any one document, the existing file structure was assumed to be transferred to the Data Cell. Also, it was assumed that a directory or access file of a very simple nature would be maintained to enhance retrieval on this file.
It was further assumed that the IBM 360/Mod 60 would be used to build the file and perform the operational activities required of the file.

The file would reside on a 2321 Data Cell which has a capacity of 400 million characters of on-line storage. However, the cells on a Data Cell Drive may be changed much in the same manner that tapes are changed on tape drives or disk packs on disk drives. Only one Data Cell Drive, which can have a maximum of ten cells on line, is required. The converted Detail Index File would occupy approximately thirty cells, assuming about 75% packing.

The read-in of the file, assuming a thousand cards per minute reading rate and 20% handling, would take 307 hours on the 1402 attached to the Mod 60. The data could be read on to tape as an interim measure to save some rental on the Data Cell. However, some of these savings may be absorbed by additional programming costs. Assuming the Mod 60 would be operating in a multi-programmed mode, the cost of initial conversion would be as follows:

Reader (1402)               $ 1,600
Channels                        200
Tapes                         3,000
Data Cell                       200
CPU                             100
Analysis and Programming     10,000
Total                       $15,100

As was mentioned earlier, the structure of the file would be the same as exists presently in cards. Therefore, no sorting for the input conversion is needed.

Retrieval on the file would take advantage of the directory to reduce the number of records that must be read to satisfy a request. A rough estimate of the average number of cards accessed from the existing file is in the range of 60 to 70 thousand per request. Therefore, 100 thousand cards per request was assumed as a very safe estimate for the direct access file.
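The sizing and initial-cost figures above can be checked with a few lines of arithmetic. The sketch below assumes that the 400 million character capacity refers to the ten cells that can be on line at once, i.e. 40 million characters per cell; that reading is an interpretation of the text, and the constant and function names are illustrative only.

```python
# Assumption: 400 million characters is the capacity of one drive
# with its maximum of ten cells on line.
DRIVE_CAPACITY_CHARS = 400_000_000
CELLS_ON_LINE = 10
CHARS_PER_CELL = DRIVE_CAPACITY_CHARS // CELLS_ON_LINE


def usable_capacity(cells, packing):
    """Characters that fit in the given number of cells at a packing ratio."""
    return cells * CHARS_PER_CELL * packing


# Initial conversion cost items as listed in the table above.
INITIAL_COSTS = {
    "Reader (1402)": 1_600,
    "Channels": 200,
    "Tapes": 3_000,
    "Data Cell": 200,
    "CPU": 100,
    "Analysis and Programming": 10_000,
}

file_capacity = usable_capacity(cells=30, packing=0.75)
initial_cost = sum(INITIAL_COSTS.values())
```

Under that assumption, thirty cells at 75% packing hold about 900 million characters, in line with the roughly 930 million characters estimated for the tape file, and the cost items sum to the $15,100 shown.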
Assuming 10 requests per day (based on current usage) at 100 thousand cards per request, approximately one million cards would be processed per day. An average time of 137 microseconds per card was estimated for card processing. This results in approximately 0.83 hours per month CPU time. CPU time for retrieval and maintenance is 1.83 hours plus about 10% for handling, which equals approximately two hours per month. This results in approximately $300-400 per month rental for the Mod 60 (for everything except the Data Cell). A range of costs is provided instead of more stable figures because of the difficulty in estimating for a multi-programming environment.

Estimated use of the Data Cell is approximately 20 hours per month for retrieval and 54 hours per month for maintenance if the entire file is passed each maintenance run. These two functions result in approximately $1,200 a month rental.

5.7.2.3.3. Summary

The comments in this summary generally apply to both parts of the study except where specifically stated otherwise. The following table of data was provided, with some modifications by SR personnel, from the Report [25X1A]:

File Request Rates

File           Searches/Mo.  Searches/Day  Requests/Mo.  Requests/Day
No. 1               32           1.5           167           7.6
No. 4               62           3.0           443          20.0
No. 8               24           1.1            41           2.0
No. 7               14           0.6            88           4.0
No. 6               11           0.5            33           1.5
No.'s 2,3,9         73           3.3          1076          50.0
Total              215          10.0          1848          85.1

The table shows, as the headings indicate, the average requests per month and day. It should be noted that 90% of the requests against the No. 4 (Area) and No.'s 2,3,9 (Personality) files are selected by manually browsing the files. This means that 57% of the SR requests are handled manually. Further, from these facts, it is seen that the conversion to tape or direct access file would effectively replace an EAM system that is handling an average of only 93 requests per month. This usage rate is very low. Even if the total request rate were used, it would still be a low usage rate for a computer-driven file.

The last statement is made for two reasons. First, if the actual number of requests (from a computer file point of view) were 215 per month, it would be highly questionable whether this would be large enough to warrant conversion. Second, the original 215 "requests" do not actually represent that many requests from a tape or direct access file standpoint. To explain: a sheet of paper entering the SR machine area containing instructions for searching a file may ask for references relating to pipes, paper, and cars. These parts are treated as three requests, not one, even though they all would go against the same file. However, this would represent only one request against the file from a tape or direct access point of view. Therefore, the total request rate of 215 per month would have to be divided by some factor to reflect how many requests this would represent in a tape or direct access system. Data on what this factor should be is not available at this time.

On the basis of these findings it is recommended that the total Detail File not be converted to magnetic tape. On the other hand, conversion of the Detail File to Data Cell storage appears to be economically feasible. The costs of performing the conversion and doing the required retrieval on a Data Cell attached to an IBM 360/Mod 60 are reasonable.
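The cost and usage figures behind this recommendation can be gathered into one short calculation. The sketch below is a check on the arithmetic only. It assumes 22 working days per month, a figure not stated in the study but one that reproduces both the $15,400 monthly tape rental and the 0.83 CPU hours per month; the function names are illustrative.

```python
WORKING_DAYS_PER_MONTH = 22  # assumption; not stated in the study


def tape_costs():
    """Initial and monthly cost of the magnetic tape approach."""
    initial = 3_000 + 14_000 + 10_000   # read-in, sort/merge/build, programming
    hours_per_day = 3.0 + 0.5           # full tape pass plus daily processing
    monthly = hours_per_day * 200 * WORKING_DAYS_PER_MONTH  # $200 per hour
    return initial, monthly


def data_cell_costs():
    """Initial cost and monthly rental range of the Data Cell approach."""
    initial = 15_100
    return initial, (300 + 1_200, 400 + 1_200)  # Mod 60 range plus Data Cell


def machine_handled(total=215, browsed_files=(62, 73), browsed_share=0.90):
    """Share of searches handled manually and the remainder left to machine."""
    manual = browsed_share * sum(browsed_files)  # No. 4 and No.'s 2,3,9 files
    return manual / total, total - manual


def retrieval_cpu_hours(requests_per_day=10, cards_per_request=100_000,
                        seconds_per_card=137e-6):
    """Monthly CPU hours for retrieval at 137 microseconds per card."""
    daily_seconds = requests_per_day * cards_per_request * seconds_per_card
    return daily_seconds * WORKING_DAYS_PER_MONTH / 3600


tape_initial, tape_monthly = tape_costs()
cell_initial, cell_monthly = data_cell_costs()
manual_share, machine_per_month = machine_handled()
```

On these figures the tape approach costs about $27,000 to set up and $15,400 per month, against $15,100 and roughly $1,500 to $1,600 per month for the Data Cell, while the manual share comes to about 57% and the machine-handled remainder to about 93 searches per month.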
The turnaround time on a request is also satisfactory, since it would take only a little over five minutes to read and process the required 100,000 records to answer a request. This should leave adequate time for coding and outputting the request.

The decision to convert this file, however, cannot be based on these technical considerations alone. The usage rate must also be carefully appraised. Finally, it is important to remember that this conversion problem is but one of many CHIVE implementation tasks which must be addressed during the next 18 to 24 month period.

5.7.3. DOCUMENT IMAGE FILES

A comprehensive list of existing document image files is contained in Appendix 5.D. along with a capsule description of the function and activity characteristics of each. Also included for most files is an appraisal of each file's susceptibility to being segmented according to geographical area as a means of transition to the creation of an all-source document file.

This section will discuss the conversion alternatives and recommend a posture for concurrent operation of inherited and CHIVE-built document image files. It is felt that the approach presented will constitute a basis for orderly implementation of a new central document reference facility.

It is appropriate, first, to look at the reasons why conversion to a single document system should be considered. The overriding argument for such a step is to eliminate the multiple reference points that an analyst must currently consult and to present to him a central reference point where a comprehensive response to his request can be provided.
A further incentive for conversion to a central document system would be intra-Agency standardization of:

- File media and techniques
- Microfilm processing and reproduction equipment
- Hard copy quality and format

Conversion to a central repository and reproduction facility also presents a potential for reducing operating costs by combining similar clerical efforts, and by facilitating the use of more advanced processing devices.

Assuming then that there are advantages to be derived from converting to a centralized document reference facility, let us consider to what degree this could reasonably be accomplished.

Of about 25 document image files which are candidates for conversion (files enumerated in Appendix 5.D.), many can be excluded from consideration. A policy decision has been made to the effect that only textual documents are to fall within CHIVE repository responsibility. This immediately excludes all graphic files (i.e., photo, film, slide, and map files), which are to remain the respective responsibilities of GR and the Map Library.

Another major group of files are the dossiers, which are subject-oriented folders relating to personalities, organizations, and installations. These files are maintained and referenced by information specialists who generally act as intermediaries between the consumer and the files. It has not been demonstrated that this type of information reference service can be improved by conversion of the existing files to another storage medium. Consequently, for the present it will be assumed that these files, which are primarily under the cognizance of BR and FIB, will be retained in their present form.
The foregoing exclusions restrict the discussion, then, to document image files currently maintained by the Library (Intellofax) and SR. These files are characterized by direct reference activity by the consumer, and, in most cases, respond by furnishing the consumer with a document. Primarily, they fulfill a document retrieval function rather than an information retrieval function, and, as such, are prime candidates for initial implementation as part of a centralized document reference service. Other files may prove suitable for incorporation into such a facility, but they should be evaluated on an ad hoc basis after a nucleus system has been established.

Our recommendation, therefore, is that an all-source document reference facility consisting of document image files within Intellofax and SR be a design goal for the initial system. It should be pointed out that the document system is largely independent of the CHIVE computer/indexing effort and consequently could be implemented prior to placing the EDP system on an operational basis.

The centralization goal could be attained either in one step or on a modular basis. Either all incoming documents from the two systems could be incorporated into the new system, or some portion of each (such as Chicom materials) could be assimilated into the CHIVE-built system. The latter approach offers the advantage of limiting the volume during an initial shakedown phase.

The question remains as to how such an all-source document reference capability could be instituted.
Essentially, it involves the problem of somehow combining two diverse inherited systems and integrating these with a third, new CHIVE-built system.

As a fundamental tenet, total conversion of the existing document image files to the newly adopted file medium is not warranted or practical. The inherited files are very large in volume, having been accumulated over a number of years. Conversion to virtually any new system would require a copy of the document to be completely re-photographed and reprocessed into the new file medium. Some partial conversion to the new system might prove advisable for any segment of the file where high reference activity, over a long term, can be anticipated. However, because of the low activity rate of the total file, the cost of converting records which will never be active should be avoided.

The recommended posture, therefore, is that inherited files will not be converted from their current form but will merely be co-located within a single area along with the CHIVE-built files. The appropriate processing equipment will be installed within this same area and a single reference point will be presented to the consumer. Requests will be serviced through the appropriate systems, and responses furnished through a single distribution point where the proper enforcement of security restraints will be administered. The inherited files will be retained for reference purposes only and will not be augmented. All new items introduced into the file will be assimilated into the CHIVE-built system.

It is recognized that the recommended approach perpetuates existing files and techniques while introducing one additional document system to operate concurrently. Nonetheless, this approach seems to be the only feasible way to cut over to a single, standardized document system and also eliminate the extreme cost and effort associated with a large-scale retrospective conversion. Experience has shown that there is a bias of reference activity toward more recent materials which would effect a gradual phasing out of the inherited systems with the growth of the CHIVE-built document file.

Chapter 5.8. COMPUTER INTERFACE

5.8.1. GENERAL

The EDP portion of CHIVE will perform the following functions:

- Build and maintain files
- Create sub-files from existing files
- Search files and retrieve data from them
- Display data

The techniques chosen to implement these functions provide a built-in flexibility that will also allow revisions in the definition of the content and structure of CHIVE-built files.

In a computer based system, special effort must be devoted to inputting data, searching for it, reorganizing it, and subsequently displaying it. An integral part of the EDP system is a command language that allows these types of manipulation. It is recognized that "unlimited" flexibility is allowed if the user can be persuaded to use machine language. More practically, a set of commands is provided that permits personnel other than programmers to use the EDP system.

The CHIVE command language is fully described in Appendix 7.A. The language allows the user to direct the performance of the four functions mentioned above.
Full use of the commands requires good knowledge of the indexing procedures, logic, and the content and structure of the records and files to be manipulated. It is planned that only information analysts, dictionary editors, and, to some extent, content indexers, will be trained to use the language.

The responsibilities concerned with defining new files and modifying existing file definitions will be assigned to the EDP file analyst. (See section 5.2.3. for further description.) The EDP file analyst must be trained to a level similar to that of a programmer, since he must be able to specify files to the system, initiate jobs for the machine operations personnel, and participate in subsequent check-out.

5.8.2. COMMAND LANGUAGE

The command language permits the information analysts to direct the EDP system to provide desired results and products.

The first consideration of the user is to build and maintain files. The usual file maintenance operations are provided. They are:

- Adding new data to a file
- Changing existing data
- Deleting existing data

The user can control the file maintenance operations in either of two ways. The first way is the usual one of specifying a unique record identification and then having the desired maintenance performed on that record. The second way is to specify logical conditions that could qualify a single record or many records within a file for the specified maintenance operation. For example, it may be desired to change the names of all factories named the Stalin Works to [illegible]. In such a case it is only necessary to set up the test condition with a replace command.
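Condition-driven maintenance of this kind can be sketched in a general-purpose language as a test applied to every record, with the replacement made wherever the test succeeds. This is not CHIVE command language syntax, which is given in Appendix 7.A and is not reproduced here; the record layout, the function `replace_where`, and the replacement value "NEW NAME" are all hypothetical placeholders.

```python
def replace_where(records, condition, field, new_value):
    """Set `field` to `new_value` in every record satisfying `condition`.

    Returns the number of records changed. The caller does not need to
    know the identifiers of the affected records in advance.
    """
    changed = 0
    for record in records:
        if condition(record):
            record[field] = new_value
            changed += 1
    return changed


# Illustrative installation records only.
installations = [
    {"id": "F-1", "type": "FACTORY", "name": "STALIN WORKS"},
    {"id": "F-2", "type": "FACTORY", "name": "STALIN WORKS"},
    {"id": "F-3", "type": "FACTORY", "name": "OTHER WORKS"},
]

count = replace_where(
    installations,
    condition=lambda r: r["type"] == "FACTORY" and r["name"] == "STALIN WORKS",
    field="name",
    new_value="NEW NAME",  # placeholder; the source does not give the new name
)
```

The same pattern would serve for conditional deletion or addition of data by substituting a different action for the assignment.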
With such a conditional replace, the desired changes are made without requiring the user to know in advance the unique identifications of all of the records involved in the transaction.

The second concern of the user is to search the files. The CHIVE command language provides basic search operators and logical linkage. The available operators are: and, or, not, greater than, less than, and equal. In addition, a "scan" operator allows searches for a contiguous string of characters in a value field. Notation is provided for specifying that the character string can be in any position within the value field or in some relative position. For example, it may be desired to find all occurrences of the character string ACZN22 no matter where it occurs in the value field, or only when it is the first six characters of a value.

Another capability provided by the command language is to allow indirect searches. Here we mean that the user can specify the results of one search to be used as arguments in a subsequent search. An example would be: "What universities or colleges were attended by engineers working at radar plants in Country A?" A first search is necessary to determine the names of engineers associated with radar plants in Country A. A second search can then be made to associate these engineers with schools. The command language allows the researcher to specify that the names of the engineers be automatically used as input arguments to the second search. Thus the problems involved with routing an intermediate machine output to an information analyst, setting up a second search, and then submitting it to the system are eliminated.
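The scan operator and the indirect search can both be illustrated with a short sketch. It is written in a general-purpose language rather than in the CHIVE command language, and every file layout, field name, and data value in it is hypothetical, chosen only to mirror the radar-plant example above.

```python
def scan(value, string, first_position_only=False):
    """True if `string` occurs in `value`: anywhere by default, or only
    as the leading characters when first_position_only is set."""
    if first_position_only:
        return value.startswith(string)
    return string in value


def search(records, predicate, output_field):
    """Return the set of `output_field` values from records that qualify."""
    return {r[output_field] for r in records if predicate(r)}


def schools_of_radar_engineers(employment, education, country):
    """Indirect search: results of the first search feed the second."""
    engineers = search(
        employment,
        lambda r: (r["occupation"] == "ENGINEER"
                   and r["plant_type"] == "RADAR"
                   and r["country"] == country),
        "name",
    )
    # The engineer names become the arguments of the second search.
    return search(education, lambda r: r["name"] in engineers, "school")


# Illustrative data only.
employment = [
    {"name": "IVANOV", "occupation": "ENGINEER", "plant_type": "RADAR", "country": "A"},
    {"name": "PETROV", "occupation": "ENGINEER", "plant_type": "RADAR", "country": "B"},
    {"name": "SIDOROV", "occupation": "CLERK", "plant_type": "RADAR", "country": "A"},
]
education = [
    {"name": "IVANOV", "school": "INSTITUTE X"},
    {"name": "PETROV", "school": "INSTITUTE Y"},
    {"name": "SIDOROV", "school": "INSTITUTE Z"},
]
schools = schools_of_radar_engineers(employment, education, "A")
```

Chaining the two searches inside one function stands in for the point made in the text: no intermediate listing has to be routed to an analyst and resubmitted.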
New files can be created by preserving the results of extensive searches of large document files. In addition, the capability of restructuring records is provided by the HIT processing commands of the CHIVE language. These commands allow a user to manipulate records after they have been found to satisfy search criteria and before they are transmitted to an output file. The control available permits saving for output all or specified portions of the original records. In addition, computations can be specified and the resulting values can be appended to the new output records. The resulting files can in turn be searched and updated in the same manner as any other system data file.

The command language also governs printing and displaying data. Section 7.11. describes output processing in detail and Appendix 7.C. shows samples of the types of reports provided by the EDP System. To specify a report it is only necessary to use the print command and then to state the name of the file, the sort sequence, and the output format desired. The format type includes such parameters as number of lines per page, width of printed portion of page, top and bottom literals, pagination, etc. The current report capability is felt to be adequate at this stage of the CHIVE development. Additional features will be provided only after actual need is established in an operational environment.

5.8.3. FILE DEFINITIONS AND THE EDP FILE ANALYST

The CHIVE command language allows manipulation of data in existing files and also permits a way of creating sub-files which can in turn be processed by the EDP system.
These features directly concern the information analyst. The tasks and procedures associated with changing file definitions and adding new files to the system are the responsibility of the EDP file analyst.

The CHIVE EDP programs are controlled by external descriptions of the data files to be processed. The data descriptions taken collectively are called File Format Tables. Each table describes a file and its constituent elements. If it is desired to process files other than those currently defined it is necessary to add new table descriptions to those already in existence.

The File Format Tables contain all the information about an item that is required to process it. Included are the terms allowed in a record, term groupings, which terms are used as identifiers, addressing parameters, occurrence data, how stored, and content legality parameters. Extensive revisions can be made to the tables. In addition to adding new files, terms can be added to or deleted from an existing file. Legalities can also be changed. It is important to note that revisions of this type do not require any maintenance to the EDP programs.

The external file definition concept requires a special maintenance system. There are two main functions involved: the first concerns generating file format tables, and the second involves restructuring existing file data records. File format tables are generated from descriptions supplied by file analysts. Some types of table revision will result in producing a table that is inconsistent with the existing file.
In this case, the existing file is processed so that its item structure reflects the new table revisions. After this step it is possible for the EDP system to operate correctly on the revised file with the new file format table.

5.3.4. SUMMARY

The CHIVE EDP System can be viewed by the information analyst as a tool for manipulating data. In order to get at this information, he must learn the rules and procedures attendant with the CHIVE command language. Forms will be designed to aid and guide in transcribing the commands. The EDP system is designed to allow random transactions which will obviate to some extent the scheduling of input to the machine. Output will be sufficiently identified so it can be routed to the information analyst.

It is recognized that the interaction of the man and machine is never smooth. For this reason two remote consoles will be included in the initial system. These consoles will permit experimenting, in an operational environment, with the problems of direct communication between the information analyst and the EDP System. They should be helpful in expediting search processing, reducing paper output volumes, and in simplifying the problem of routing requests to and from the computer.

Appendix 5.A.

THE ORGANIZATIONAL PROBLEM

This appendix describes the reasoning which led CHIVE to recommend the geographic organization of input and retrieval personnel with additional topical specialization for certain priority countries. In it, various alternative organizational configurations are described and their advantages and disadvantages discussed.
A formal report on the CHIVE Indexing Experiment which led to some revision of the organizational concept recommended here--namely, the removal of the coding responsibility as such from the information analyst's area of concern--will be published in the near future as an additional appendix to this Phase II Report.

5.A.1. ORGANIZATIONAL OBJECTIVES

In considering the overall problem of how best to organize the functions to be performed and personnel to carry out these functions in a future storage and retrieval system, it appears logical to address oneself first to the primary objectives of the contemplated system and to derive from these a subset of organizational or management requirements which, if met, could assist in the attainment of the ultimate system goals. A particular organizational and management framework, of course, cannot by itself insure the achievement of a system superior to that now in existence. On the other hand, it is equally clear that despite all the advantages of EDP hardware (including stored program logic, speeds, etc.) and new developments in the information retrieval state-of-the-art, these tools alone are as yet insufficient to provide any major breakthroughs, and indeed have inherent disadvantages as well as advantages which, in the final analysis, must be taken into account. For this reason the efficient organization and employment of personnel takes on added significance. In fact, it may well determine whether a major step forward is possible.

The principal CHIVE system design objectives which have been discussed in some detail in earlier
documentation may be summarized for the purposes of this discussion as follows:

Objectives derived from user needs

1. Broader document coverage
2. Increased indexing specificity
3. More exhaustive indexing
4. Capability to answer more complex questions
5. Reduction of retrieval time
6. Single-service point
7. Common system vocabularies
8. All-source output capability

Objectives derived from [illegible] needs

9. Micro-storage medium
10. Increased transcription speeds
11. Increased file utilization
12. More efficient use of available manpower without unacceptable degradation of system performance
13. Reduction of index and support file query time
14. Reduction of manual labor involved in preparing system outputs (research aids, acquisition lists, etc.)
15. Improved communication with customer
16. Increased index record lengths so as to reduce file proliferation
17. Improved evaluative tools for management

Some of the above are themselves organizational objectives for CHIVE, e.g., items 6, 8, and 15. Other listed objectives, if they are to be achieved, have implications at least for the organizational side of the total system design effort as well as for other design tasks. Combining the former with some deductive reasoning about the latter which is oriented towards the personnel and management implications thereof, it is possible to form a list of what might be called CHIVE organizational requirements. This list follows, and it is important to this discussion since it sets the goals in terms of which various alternative organizational configurations are compared.

Objectives Influencing CHIVE Organizational Structure

1.
Specialization with minimum processing duplication

Encourage specialization on the part of information analysts to the extent possible so as to improve the quality of inputs and relevance of outputs to customer needs. At the same time minimize duplicative processing activities--i.e., multiple readings of the same documents, expenditure of intellectual time in term selection, transcription, etc.

2. Minimum customer contact points

Facilitate direct interface between the user seeking information and the information analyst most knowledgeable on the problem. Provide a coordination capability where required, but organize analysts so as to reduce need for same.

3. All-source service from any point

Organize system so that requester, if he so desires, can receive all pertinent information from whatever source that bears on his search problem.

4. Close communication between input and query handlers

Enable person querying system store to be thoroughly acquainted with processed inputs. Similarly, keep indexers informed of requests being handled by the system. Ideally, input and query processors should be one and the same.

5. Close communication between system operators and users

Operators should be fully cognizant of intelligence needs and priorities of research analysts. This is especially important in the CIA application where the breadth of customer subject interests and responsibilities and the volume of the data base are so large as to prevent equal attention being given to all subjects or sources.

6.
Document control--first priority

The primary responsibility of the central reference system, i.e., to establish a basic retrospective search capability for all positive intelligence documents of immediate or potential interest to the Agency, must not be diluted by the addition of special tasks which, if permitted to grow unrestrained, would prevent the achievement of fundamental goals. Elemental priorities must be established and adhered to, and personnel organized in a fashion to bar the drift toward serving specialized user interests.

7. Job satisfaction

Morale of the central reference personnel must be maintained to reduce turnover and attract high-quality persons to the staff. Information analyst positions should afford opportunities for career growth and offer sufficient intellectual challenge to interest professional employees.

8. Flexibility in personnel allocations

New processing requirements and shifts in intelligence interests and priorities should not unduly upset the central reference operations and organizational structure. Requirements for retraining should be minimal if standard vocabularies, input, and retrieval systems prevail throughout CHIVE. Ideally the shift of one or more persons to more pressing tasks would not completely destroy an existing activity, assuming the assignment of more than one person to a given subject or geographic area to begin with.

5.A.2.
ALTERNATIVE FIRST-LEVEL ORGANIZATIONAL CONCEPTS

Keeping in mind the above-listed objectives for organizing the central reference personnel and activities, what kind of organizational configuration would appear to offer the best hope of meeting most if not all of these aims? In this section we will review some of the possible alternatives without necessarily considering all variant approaches which might theoretically be envisaged. The focus here will be on the initial, or first-level, organizational breakdown. In a subsequent section we will address the problem of how to manage activities within the rough organizational framework selected.

5.A.2.1. Alternative A - Retention of Present Configuration

Under this concept the existing structure of OCR would be accepted as is. Input and querying would be organized by subject (Biographic Register, and Intellofax), by subject within source (Special Register), and by information carrier (Graphics Register and Map Library). Specialized EDP systems could be developed which would be tailored to the needs and desires of each Register or Division which might well employ different vocabularies, input and output processes, document storage media, etc. Alternatively, all systems might be required to adopt common file formats, dictionaries, programs, document storage and delivery systems, and so forth in order to simplify management understanding and control of processing activities and reduce design costs.
The principal advantages of this approach are operator and management familiarity with administering such a system, the availability of trained personnel and established operational procedures, the avoidance of any drastic reshuffling of personnel and slots with all the attendant problems associated therewith, and the assurance of continuing a level of system performance at least as high as that which now obtains. In summary, the retention of the existing configuration is attractive because it would be the easiest to implement, and because we know it works even if the efficiency and quality of its performance are perhaps less than might be desired.

The major reason for not following this route is that, while the risks are less, the system will always be constrained by the organizational structure within which it must operate. Thus the potential for real improvement will be limited. Specifically, it would be impossible to make any real progress toward achieving objectives 1-3 above, and what can be accomplished on objective 8 would be severely limited. Redundant reading and analysis of collateral documents could scarcely be avoided and the trend toward all-source information files might foster duplicative processing (already initiated by FIB's exploitation of Comint materials) in the SI area as well. Semi-duplicative document repositories, such as now exist in FIB, BR, the Intellofax System, and to a minor extent GR, would probably persist because of the difficulty of identifying in advance which repository will choose to keep a given document.
Customers seeking to exploit all the subsystems would still be faced with the necessity of interrogating each system separately unless an inter-system reference group were provided or the system contacted assumed the responsibility of querying all others. Either of the latter potential solutions, however, would interpose request "interpreters" between the customer and the ultimate respondent with consequent ill effects to the communication process.

In brief, while Alternative A is appealing because of its familiarity, its inherent disadvantages are sufficient in number to influence a search for something better if such can be found.

5.A.2.2. Alternative B - Single, All-Source Document Retrieval System: Separate Biographic Information Facility

Between the extremes of a completely centralized, all-source, all-topic storage and retrieval system and the existing decentralized configuration of OCR many variations and alternative combinations can be conceived. That which has attracted the most attention perhaps is the concept of merging Intellofax, the Special Register, and the Foreign Installations Branch but leaving the Biographic Register as a separate activity. Proponents of this approach (some of whom would also except FIB from the merger) generally point to the "unique character" of the BR operation, its "analytical" responsibilities, its production of finished intelligence, the fact that it is not a document retrieval system at all but rather an information file, and so forth.

Most of those favoring this compromise approach are somewhat vague on the organizational details.
Some, apparently, would establish an all-source BR, removing the responsibility for personality control of Comint materials from the conjoined Intellofax-Special Register operation. Others would not make this transfer of responsibility arguing, inter alia, that most BR customers are not cleared for Comint anyway. Some would retain the all-source FIB system as a separate file as well, presumably with installation indexing remaining a part of the Intellofax-SR document input activity. The redundant analysis of documents common to each of these systems has either not been considered by those who have recommended this approach or has been accepted as a necessary evil.

Of those favoring Alternative B or some variation thereof, most do so in the belief that there are indeed advantages to be gained from the all-source approach, integrated indexing, system standardization across OCR, common vocabularies and other reference tools, and other CHIVE goals. Most would, therefore, adopt CHIVE's system recommendations if biographic data handling at least were excluded.

What appears, however, to disturb people the most about the prospect of including biographic intelligence in a centralized system is the index transcription problem. It is pointed out first of all that, while the necessity for filling out transcript sheets has long been accepted by Intellofax and SR analysts, it would not be readily accepted by BR personnel who, in recent years, have employed a file system (sometimes referred to as a "Collectanea" by documentalists)* which requires no transcription at all.
Second, there is the fact that any transcription requirement, no matter how limited, would diminish the number of personality references which could be processed by BR since it would necessarily add to processing time. Third, there is the argument, frequently expressed, that BR's need for multiple access points to personality data has fallen off steadily over the past several years following the assumption of community responsibility for political personalities. Why have more than name control over files, the reasoning goes, if the majority of requests are for specific named individuals?

*This term refers to any file system that uses the general approach of lifting sections from a single source document, reproducing these excerpts, and physically filing them under each of the categories or key words of interest.

The transcription argument might, indeed, justify leaving BR outside the central system concept were it not for the fact that following such a course helps none at all to resolve BR's storage and retrieval problems. Examined realistically, it appears clear that there are only two fundamental ways of processing biographic or any other kind of information: (a) by creating an index to documents containing the pertinent information (which index is then screened prior to the recovery of the documents themselves) or (b) by filing (and, if necessary, reproducing) the documents under the terms which constitute the desired search parameters (i.e., by establishing a "self-indexed" document collection). If the choice is to take the index path then
certain elementary requirements must be met if retrieval from the system is to be successful. In the case of large personality record collections it means the index must carry sufficient identifying information about the personality to enable the searcher to distinguish between personalities bearing similar names. The more identifying information extracted from the document the better, but at the price of increased transcription time. Alternatively, the more abbreviated the index the less the transcription burden, but at the cost of more irrelevant documents retrieved.

The "collectanea" (or self-indexed document file) approach offers the user a reverse set of advantages and disadvantages. On the one hand, it virtually eliminates the function of having to transcribe words from documents. On the other hand, it vastly increases the physical storage requirements of the system by virtue of the fact that each document must be multiplied by as many file headings as one chooses to store the document under.
Since no system has unlimited space, this usually means that the means of access to the document collection are severely limited in comparison with document index systems. In addition, the filing problem is exaggerated by the explosion of the original document population (witness BR's assignment of [number deleted] file clerks full-time to its central biographic card file and [number deleted] clerks to its dossier system).

The point of this brief detour into personality data handling is to make clear that nothing is really gained by leaving BR outside the central system framework unless it has already been concluded that biographic data will not be controlled by an index per se. Even this would not necessarily dictate the exclusion of BR from the [illegible] processing, since it would be perfectly possible for the input analyst, after indexing the remainder of the document's content, to have the document or selected pages therefrom reproduced and filed (in hard copy or microimage form) under the personality names of interest. If, on the other hand, the decision is to index biographic information then there are certain very real benefits in integrating this index activity with the representation of other subjects discussed in documents.

As for the remaining arguments deployed in the cause of keeping BR outside the integrated processing activity, they have little bearing on the manner in which biographic data should be stored and retrieved. Rather, they relate to the analytical functions to be performed, i.e., interpretation, correlation, synthesis, etc., after the raw material has been recovered from the files.
Admittedly this intellectual process could be carried out by a separate group altogether, as indeed often occurs when a customer (e.g., a scientific intelligence analyst) chooses to review and interpret the basic documentation himself. But it can also be performed, perhaps equally well, by persons who also index and retrieve biographic information. Whichever path is chosen it need not affect where and how documents are processed.

5.A.2.3. Alternative C - Co-located Organizational Configuration

A radically different organization concept from those discussed thus far, one which deserves at least brief consideration, is the notion of decentralizing document processing in the Agency by dispersing the activity amongst the research and production components. Among the arguments for upgrading the so-called "analyst files" versus attempting to improve the central reference system are the following:

- Analyst files will continue to be maintained whatever is done centrally. Since they are a major information retrieval resource why not make them even more effective and efficient?

- Providing analysts with manpower support in the form of information assistants physically co-located with research personnel in the production offices would relieve the analyst of most of his file maintenance problems and enable him to devote more time to research.

- Analysts could more readily control what goes into the files thus reducing input chaff and providing semi-evaluated retrieval.
- Full-time information specialists could index more material than analysts can process into their files today thus improving the breadth and depth of coverage.

In the decentralized as in the centralized system approach, it is possible to think of many ways in which the processing activity might be organized. The following, however, are perhaps the most logical alternatives:

a. Decentralized input and files/central directory of files

Under this approach OCR would virtually disappear with the exception of the Library, FDD, and possibly the Graphics Register. Analysts would continue to process materials into their own files but might be provided some machine assistance in the areas of file manipulation, storage, and reproduction. In addition, a master profile or directory of analyst files would be created and maintained at some central location. Analysts with a search problem would consult the directory, determine which file(s) to peruse, and then either exploit the file directly or work through the analyst who maintains the file. Personnel formerly attached to OCR could either be assigned to the research analysts as information assistants where they would perform the bulk of the input and retrieval activity, or the research analyst population might be increased by converting the slots to intelligence production positions.

b. Decentralized input/centralized files

This scheme would be much the same as the above in that input processing would still be performed on a decentralized basis. The difference would be that, in addition to maintaining decentralized analyst files, research analysts and/or their information assistants would be required to transcribe their indexing in such a fashion that a record thereof could be passed to a central storage and retrieval facility. Similarly, reproductions of the documents they wished to store or the pertinent citations thereto would be sent to central storage. Adoption of this approach would greatly increase search specificity over the directory technique and greatly simplify the problem of gaining access to the data files themselves.

c. Decentralized input and files for select subjects/centralized input and files where interests overlap

This system is perhaps best represented in the real world by NSA where files of restricted interest are co-located with the most appropriate customer offices, while files of interest to many are maintained centrally.

d. Centralized input and files/information specialists co-located with research components

This system would continue the central reference activity without prejudice to decentralized analyst files, but representatives of the central system would serve on permanent or rotational assignments in the customer offices. Their function would not be to index material for analysts, nor to actually search and retrieve material from the central system, but to improve communications between the analysts and the central storage and retrieval operation.
They would provide advice to analysts on the reference services available to them, transmit their queries to the proper components, identify unnecessary and/or duplicative data files, inform the central service of current intelligence priorities and anticipated retrieval needs, and in general insure that both sides of the house achieved a full understanding of each other's problems, capabilities, and requirements.

There is much that is attractive about all the above alternatives primarily because all provide better user definition and control of what the Agency should be retaining in its record collections, and because all provide a means for the analyst to exploit potentially useful files maintained by others. With the exception of system 3.d., however, which appears to offer some significant advantages which might well be tested on a limited basis, all suffer from one or more of the following disadvantages which are sufficiently serious to recommend the rejection of the decentralized organizational concept as a practical solution:

- Elimination of part or all of the existing central processing activities would inevitably give rise to increased record keeping by Agency analysts. Indexing by these analysts would be highly duplicative and inefficient because of overlapping interests amongst Agency components. Even today the duplication of analyst file activity is sufficiently widespread to cause some to seek ways in which the situation might be ameliorated. In a recent study* one research analyst reported that "the files of several offices within OCI and ORR practically mirror each other, if not in totality, then at least in certain subjects." Among the reasons for this situation, the same analyst observed, are the failure of management to properly define the exact responsibility of the analyst beyond his geographic area, the necessity for the analyst to be aware of the "big picture," fear of requests from Agency officialdom whether they fall within the analyst's assigned mission or not, physical distance from other potentially useful files, etc. Whatever the truth of these remarks (and all were noted during the CHIVE Fact-Finding Survey of the DD/I), any enlargement of the analyst's filing responsibilities would result in a corresponding increase in duplicate files.

*The Analyst's Inbox in the DD/I Area: Help or Hindrance?, 30 June 1964, OTR/IPC, Confidential.

- It would be virtually impossible to establish and maintain inter-analyst consistency in indexing, and to enforce adherence to standard rules and practices. The many components involved, each responsible to a different line of command, would make coordination and management most difficult.

- Analysts regard file maintenance as a necessary evil. Any suggestion that they expand their input activities, especially if it requires them to prepare index records in a fashion which can be "captured" for storage at a central location, would meet with great resistance.

- Analysts select only a small percentage of incoming documents for filing. This fraction of collected intelligence information ordinarily reflects a current problem bias or that material pertinent to an analyst's production assignments for the coming year.
Moreover, some information which would be filed by an analyst with less experience on the job would be ignored by the more senior type who has already stored such information in his head. Unfortunately, the analyst's cranium, although a well-recognized part of the Agency's institutional memory, is not easily accessed by information seekers and is lost when the analyst leaves the Agency.

- Analysts almost universally state that they want and need a central system for retrospective search and file back-up. They do not feel that their own files, nor even the sum of all files of all research components even if they could be made readily available to them, would fully satisfy their requirements.

- The possibility of co-locating select central reference files with the primary users, as suggested in 3.c. above, is practical only for intelligence organizations having clear demarcations of subject and area responsibility. Regrettably, no such pattern prevails in this Agency, as pointed out in the study referred to above.

- Agency reference responsibilities to other USIB components, whether imposed by DCI directive (e.g., biographic) or the result of tradition and historical precedent, could be met only with great difficulty if the centralized file concept were abandoned. Interface problems of indescribable complexity would inevitably arise.

In summary, there appears to be no acceptable alternative to a central reference system for a consumer population as large and complex as that represented by the DD/I and other CIA and non-CIA components.

5.A.2.4. Alternative D - Centralized, Geographically Organized Configuration

Assuming the organizational objectives listed on pages 224-227 are indeed the controlling parameters in selecting a management framework for a future information storage and retrieval system for the Agency, it is difficult to conceive of any better way of organizing the personnel involved than by grouping them initially by geographic area. While this would not overcome all operational problems that can be envisaged, of all the systems considered it comes nearest to meeting the requirements outlined above.

In a geographic organizational arrangement there would be, perhaps, five major geographic divisions reporting directly to a single manager, presumably at the Assistant Director level. Most of the existing central reference repositories (i.e., BR, FIB, SR, and DD) would be abolished and their personnel transferred to the new geographic components. Previous area assignments would be taken into account in relocating personnel.

Documents would be disseminated to the geographic divisions by an external dissemination group which would also handle dissemination to the research offices. These documents would include all materials of whatever classification, format, or mode of presentation. International documents (those dealing with subjects or events occurring in more than one country) would be routed to each of the geographic desks concerned when the application of area expertise in the indexing process seemed justified by the character of the subject matter dealt with in the document.
The majority of documents, however, would be processed by one desk only. A single master file would be maintained of all documents indexed by the central reference system.

Most requests would be levied directly on the geographic unit having responsibility for the area of concern. Occasional requests would have to be coordinated between the divisions when more than one country was involved, but this would be the exception rather than the rule. The respondent, under the new configuration, would be familiar with reporting from all sources on the matter of interest to the customer, and could thus insure that the data retrieved reflected the full response potential of the system.

The proposed configuration would lose the advantage of source specialization in processing and would pose occasional problems of geographic overlap in document indexing and query coordination. However, these disadvantages are not felt to be serious. The system would come very close to achieving all of the organizational goals set forth earlier, as the following review of said objectives demonstrates:

a. Processing duplication

There would be a minimum amount of redundant reading and expenditure of intellectual effort in input processing since the majority of documents would be completely processed by the person to whom they were sent. While the international document problem will arise, there are fewer international documents than there are documents dealing with multiple subjects (i.e., persons, organizations/installations, commodities, etc.). Nor must Information Analyst specialization necessarily be surrendered.
Instead of concentrating on biographic, installation, or other data, they could specialize in certain topic areas of interest to intelligence--e.g., military, economic, political affairs, etc.--within the country to which they are assigned. In addition, the extant duplication of document files would be eliminated with concomitant benefits in terms of storage space, reproduction loads, and filing requirements.

b. Customer contact points

Analyst inquiries normally relate to a particular geographic area of the world, although the information sought is frequently diverse in character and not restricted to any particular collection resource. Under the configuration proposed, there would ordinarily be no need for the requester to interrogate more than one component of the system since the organization of service personnel would mirror the manner in which user organizations are themselves organized, i.e., by topic within country.

c. All-source service

One of the principal advantages of geographic organization is that, in addition to the establishment of all-source files, there is an extra benefit to be derived from the bringing together of information analysts who have specialized source background. This pooling of knowledge will make for more informed reference personnel and will help remove gaps and ambiguities in the data files and authority lists developed in separate source environments.

d. Input-output communication

The geographic organization of central reference personnel does not, in itself, assure or encourage communication between input and query handlers.
Rather, this is affected by the communication processes built into the system, and by the extent to which personnel specialize in the various functional areas of input and output processing. These matters will be discussed in the next section.

e. Operator-user communication

While it may seem that geographic organization per se offers no inherent benefits over the present central reference configuration in terms of insuring better communication between information and research analysts, in fact the information analyst in the proposed system, by virtue of the fact that he has access to a wider variety of sources and shares a subject/area assignment similar to that of his research counterpart, should be more cognizant of the latter's resources and problems and, therefore, be able to offer him better service. This, to be sure, is not enough, given the separate physical and operational environments in which each operates, and for this reason experiments such as locating certain Information Analysts in the research components should be tried as well.

f. Processing priorities

Geographic organization at the upper management levels cannot prevent information personnel from being assigned to respond to narrow interests. Within the geographic divisions an organizational structure reflecting processing concerns (e.g., document control vs. special file projects) might help [illegible] doing what, but since personnel can always be shifted around it is management control which, in the final analysis, determines the direction and continuity of [illegible].

g. Job satisfaction

It would appear that the system proposed offers a richer and more meaningful environment for the information specialist than now available to him in the majority of OCR registers. He would not be assigned one function only as, for example, the input analyst in the Intellofax System; he would have access to a greater variety of documentary materials; he would be able to specialize in a substantive area of intelligence concern; he would have contact with users of the information store and thus gain some appreciation of the problems to which his effort was addressed; and, not least important from the Agency's point of view, he would be better able to assume a research position if the opportunity arises for him to make such a move--as it often does.

h. Flexibility

Common system standards and procedures across the geographic divisions, as well as the increased availability of personnel on any geographic area, should lessen the problems entailed in re-allocating personnel to accommodate changes in user needs. In a sense, the bringing together of all persons working on the same country--persons now scattered amongst the various OCR registers--is analogous to the establishment of a medical clinic composed of specialists in various subject areas versus the continuation of individual medical practice. The assemblage of these various skills increases overall flexibility and assures the highest quality services.

Before concluding this section of the discussion, some additional facts may be worth noting.
In a report to the Critical Collection Problems Committee of USIB, the Director/SCIPS observed that "information processing activities, as contrasted with collection or research, generally are not oriented to area or country organization." However, he went on to point out, "most of the [illegible] information handling activities surveyed [by SCIPS] are concerned with peripheral descriptive data rather than the substantive content of the information items and are, therefore, organized on a functional basis rather than a geographic coverage basis." A different situation exists, he said, "where the process is dependent upon substantive content such as . . . deep indexing." In the latter case "then the lowest organization level is more apt to be structured on a geographic area basis, like collection and research activities are prone to be."*

* [25X1A]

In fact there can be little doubt that the processing of multi-source documents by geographically-organized personnel will work. Within our own agency we have the Analysis Branch of the Document Division organized on this basis to process inputs into the Intellofax System. As is well known, the system deals with a wide variety of intelligence report series and other documentary media. BR and FIB are similarly arranged and, though they confine themselves to restricted subject areas, are faced with an even greater diversification of documentary inputs, including books, periodicals, newspapers and even photos.
Outside CIA there is the DIA document storage and retrieval system which receives inputs from all USIB agencies and indexes persons, organizations, locations, as well as other subjects. Thus, the issue is not whether input processing organized on geographic lines will work, or whether a multiplicity of document types can be handled by a single organization, but what the tradeoffs are versus some other approach to the problem.

5.A.3. ORGANIZATIONAL ALTERNATIVES WITHIN A GEOGRAPHIC DIVISION

The preceding section was addressed to the issue of the first-level organization of the central processing activity. The problem, however, does not end here since, even if the geographic division concept is accepted, each geographic division would be so large that some division of personnel into more manageable administrative units would be required.

Referring back to the organizational objectives listed earlier, it appears that if the geographic arrangement makes sense as the first cut, it would likewise be the preferred approach at every succeeding management level within the organization until the country level itself is reached. For example, if it did not seem desirable to group persons by document source or by the subject matter in documents they were assigned to store and retrieve because of the effects this would have on processing overlap, interface with the customer, capability for providing all-source service, and so forth, then it would make equally little sense to permit such groupings to creep back into the system, although at a lower level, if the effect on the system's performance was still the same.
The geographic concept begins to break down, however, when the volume of activity (input as well as requests) on a single country is characteristically so great that a relatively large number of information analysts must be assigned to the same country. It would be possible, of course, to have both the documents as well as the requests distributed indiscriminately amongst these analysts, but specialization is always advantageous if it can be achieved at minimum or no cost to other system goals.

Since not enough is known at this point about the input/output traffic that can be expected on every country in the world, nor what the manpower requirements and constraints will be on the CHIVE system, it is impossible to state with any degree of certainty where a division of personnel within a given geographic area will be required. For some areas it seems logical to predict that an analyst will have complete responsibility for a country, e.g., one of the emerging states in Africa which is of little consequence in international affairs and, therefore, engenders little in the way of intelligence reporting or analyst interest. On the other hand, many information analysts will be required for the larger countries such as the USSR and China and thus the organization of these analysts becomes a matter of serious concern.

The most reasonable alternative ways of grouping personnel assigned to one country would seem to be the following:

5.A.3.1. Organization by Document Source

Adoption of this approach would mean that separate groups of analysts would be established for each major document category.
These categories might be the open literature, collateral intelligence reports, Comint and T/KH, etc. The principal advantage to be gained from this method of organization would be the availability of personnel trained on a document source basis. It would have stronger selling power if the indexing systems used were to differ by source. However, the latter will not be the case. Its disadvantages are that almost every request would have to be coordinated among the different source-oriented units since customers would customarily want more than one source searched; Information Analysts would operate in different worlds and none would have a complete picture of reporting in his particular area of concern; the tendency would be to maintain separate rather than integrated all-source files; and the multiple service-point problem would remain. On balance, it does not seem to be a desirable approach.

5.A.3.2. Organization by Function

This system would allocate to certain information analysts assigned to a country the responsibility for indexing all documents received on their area, to others the responsibility for answering all requests on said country, and possibly to a third group the task of maintaining special project files and establishing and periodically updating information files consisting of summarized data about a particular person or group of persons, installation, or activity.

The notion of distinguishing input from retrieval personnel is not a new one. Libraries have traditionally followed this approach in separating the cataloguing from the reference librarian function. Many EDP-supported information retrieval systems have also chosen this
route, the original 433-L system of SAC being a prime example in which so-called "query specialists" (as distinguished from "coding specialists" and "file modification specialists") were to handle all searches directed against the system.

The advantages of separating personnel by the functions named are the following:

- It heightens the job satisfaction of those assigned to the output end of the activity, thus reducing personnel turnover and enabling the system to recruit and retain higher-quality personnel.

- Persons unqualified to deal effectively with requesters can be separated therefrom with less embarrassment to management. Similarly, persons who have neither the interest, background, nor temperament to become effective indexers can be given assignments more in keeping with their qualifications.

- New personnel can be trained more quickly if the job responsibilities are more narrowly defined. This will reduce the total amount of unproductive time expended by the system, a matter of no small significance if the turnover rate is reasonably high.

- By encouraging specialization the quality of the system's performance is enhanced. It permits processing to go on undisturbed by request interruptions with some consequent increase in operational efficiency.

- By formally separating the document storage and retrieval responsibility from special and general-purpose information file maintenance, system functions would be better defined and management would have a clearer picture of their investment in either area. This would bar the often unnoticed drift of centralized retrieval systems toward increased special file-building activities to the detriment of establishing a basic retrieval capability over the documents entering the system.

The principal disadvantages of functional separation are:

- Query specialists would be unfamiliar with the inputs to the system except those they retrieved as the result of searches levied against the files. As a result they would tend to lose touch with current intelligence reporting unless some mechanism was provided for them to read select incoming documents, review the product of the indexer activity, or other. In addition, all persons who index documents as well as answer requests retain a great deal of information in their heads which is never reflected in the index representation of documents. Subtle though this advantage may be, it makes for more effective service to customers in ways too numerous to mention. And it is most difficult to acquire this knowledge through any other mechanism than participating in the input process itself.

- Input specialists would have little appreciation of customer needs. Being barred from dealing with requesters, they would not know what subjects to stress in their input processing, nor how to distinguish the significant from the insignificant.

- The inevitable tendency would be to consider the query specialist a cut above the indexer to the detriment of the input person's morale. As experience has shown in OCR, the request handler would be regarded as having the more interesting job primarily because, having contact with users, he could understand better what contribution the entire activity was making to the intelligence mission. Those indexers who were unable to make the change from input to request handling because no vacancies developed would ultimately take positions elsewhere.
Those who remained would tend to represent the less capable and imaginative until, ultimately, the entire input staff would take on these characteristics.

- This approach would conflict with the mode of operation in most OCR components. With the exception of the Intellofax System, most OCR systems have chosen to have the same individuals handle queries who handle input to the files. Both the Special Register as well as sections of the Biographic Register have actually operated for varying periods of time on a functional basis but reverted back to the integrated configuration. Certainly, the majority of experienced OCR staff members would prefer to have information analysts operate in both modes and would resist the other approach.

- Peak request or input loads would require the temporary assignment of personnel to the duty which was not their prime responsibility. Indexers who performed the retrieval function would thereafter be able to claim, and rightly so, that they were able to do the job, otherwise they would not have been called on in the first instance. This would tend to weaken management's argument for continuing the distinction.

As can be seen, while a good case can be made for either configuration, we tend to favor not making a formal division of central reference personnel along functional lines. While there will, inevitably, be some persons in the system whose functions will be more or less unique, and others who because of personality or other limitations will be confined to a restricted area of operations, these will be the exceptions rather than the rule and, in the latter case at least, would not be reflected in the formal organizational structure.

5.A.3.3. Organization by Named Object

This configuration would organize the information analysts by the major classes of data stored and retrieved by the system. For example, within the USSR Division there might be a Personalities Branch, an Organization/Installation Branch, and a Subject/Commodity Branch. The Special Register is divided on this basis today and, in a sense, the collateral repositories of OCR, i.e., BR, FIB, and the Intellofax System, are reflections of the same concept except on a larger scale.

We know that this approach will work since it has been proven over many years of operating experience. Furthermore, by introducing this kind of division at a much lower operational level (namely the country desk) than is the case today, many of the ills of the existing system such as conflicting vocabularies, overlapping document files, diverse input/output procedures, and so on might well be eliminated. It also offers the advantage of immediately identifiable manpower trained in these particular areas and, in addition, permits a high degree of analyst specialization.

What makes this solution unattractive? The principal objection is, of course, the fact that it would be a rare document that would not have to be read and indexed by all three groups. While Comint materials would be less troublesome in this regard, collateral documents and open literature are not typically oriented to any single type of named object.
Attempts to coordinate the input effort so as to reduce duplication would be extremely difficult to implement, and document dissemination would in all likelihood take the form of dissemination of the same documents to all three points. Finally, there would remain the problem of coordinating the response to queries. A significant proportion of the requests would relate to all three subject areas and require a coordinated response.

In summary, while this configuration is preferable in many ways to the existing central reference organization, it would be less efficient and economical than what might be desired. That there may be a better alternative was suggested earlier, and it will be the subject of the next section.

5.A.3.4. Organization by Topic

The major selling points for a topic approach to the organization of the central reference activity beneath the country level are (a) that it corresponds more closely than any other configuration to the kinds of requests we can anticipate will be levied on the system; and (b) that, while it does not eliminate entirely the problem of the multi-subject document, it would seem to confine the problem to reasonable bounds. If the former statement is accepted, then organization along topic lines would lessen the need to coordinate the search activity in order to provide the customer with a complete response to his query.
Similarly, if documents tend to relate to a single, though broad, subject area of intelligence concern (for example political affairs, scientific and technical intelligence, military activities, or economic matters), then the need for multiple routing of documents should be diminished and processing duplication minimized.

A preliminary examination of documents entering the current system, as well as a review of queries levied on the system by analysts, indicates that both do tend to concentrate on one or another of these basic subject areas. This is not too surprising since these are the classic divisions of strategic intelligence, and collection as well as production organizations within the intelligence community reflect this fact. It also appears that there is a reasonable balance of documents as well as queries in each of these topic areas such that there would not be a preponderance of personnel assigned to any one field.

As to whether it might be desirable to further refine the topical breakdown within political affairs, economics, etc., this would depend on the number of information analysts assigned to any one topic. Additional subdivisions are clearly possible and could be advantageous in that they would permit increased analyst specialization and lessen the span of control problem for supervisors. On the other hand, these benefits might ultimately be offset by the inability of the system to separate documents cleanly on the basis of these increasingly narrow subject categories.

Documents dealing with two or more major topics would, of course, be received by the system.
However, this need not cause any undue concern. Multi-processing of a single multi-subject document by different topical specialists is less important than multi-processing of an international document by geographic specialists. Such documents would be directed to the unit which seemed principally concerned for complete indexing even when the choice seemed rather arbitrary. If it appeared that the information reported seemed of more than average significance, this would not preclude an information copy of the same document being routed to another unit.

Research analysts should prefer to deal with topic-oriented information specialists since they would find them better able to understand their search problems. Indeed, such information specialists might in time become more factually knowledgeable than their customers since they would have fewer extraneous responsibilities and could concentrate their exclusive attention on the subject at hand.

Appendix 5.B. PRELIMINARY EVALUATION OF THE CHIVE INDEXING EXPERIMENT

5.B.1. SUMMARY DESCRIPTION OF EXPERIMENT

A joint OCR/CHIVE indexing experiment was conducted from about 15 November 1964 to 15 January 1965. Approximately two months training preceded the indexing phase of the experiment, while the query and evaluation phase is expected to extend through May. The personnel involved included 16 indexers, 4 senior indexers, 3 clerical typists, and 3 project monitors. The data consisted of some 5,000 all-source documents on [25X6] collected during the period 1 July - 30 September 1964.
The experiment was held to test certain organizational and indexing techniques proposed by CHIVE. Specifically, it was desired to test the following major concepts:

- That with adequate supporting tools, a person can satisfactorily index all of the information contained in documents, i.e., people, organizations/installations, areas, subjects, etc.
- That all-source materials (including Collateral, SI, and T/KH) not only can be processed and retrieved in one integrated system but that certain advantages will accrue therefrom.
- That personnel organization by geographic area and, if necessary, by topic is feasible and desirable.
- That the CHIVE indexing approach will provide at least as many entry points to documents as that now obtainable from the sum of the individual indexes and other controls established in the various registers of OCR.
- That header data (bibliographic) indexing can be performed by clerical personnel with a minimum of guidance.

To test these concepts, an experimental Branch was established. The Branch was organized into four topical sections: Political, Economic, Military, and Scientific and Technical. Each section was headed by a senior indexer. More than half of the OCR personnel assigned to the project had some previous indexing experience, but less than half were currently full-time indexers. Each section was allotted personnel who had experience in working with SI materials, [redacted] background, or familiarity with the Intelligence Subject Code. Some of the individuals had more than one of these attributes.
Unfortunately, few of the indexers had previous topical specialization similar to that employed in the experiment.

The indexing tools used during the experiment included:

- The Intelligence Subject Code
- A listing of [redacted] on whom the Biographic Register maintains dossiers
- The Special Register Manual
- The NIS Gazetteer
- The Special Register Code Book Supplement to the ISC
- The CHIVE Indexing Manual
- Miscellaneous dictionaries and other reference works.

The Intelligence Subject Code and the SR Code Book Supplement were used to index subjects and commodities. The BR dossier list, SR Organization Manual, and the NIS Gazetteer were used as authorities for entering people, organizations, and place names--that is, whenever a significant person or organization was encountered, the indexer had to refer to the dossier list or organization manual, find the correct entry, and enter the code assigned by BR or SR. All place names were checked in the NIS Gazetteer for the correct entry form. The CHIVE Indexing Manual contained the explanation of the indexing techniques, the method of transcription, and some preliminary indexing rules and procedures.

The data base was all-source and consisted of [redacted], Collateral intelligence reports, translations, the FBIS, newspaper articles, Comint, T/KH materials, and miscellaneous other series. Each document category was represented in proportion to the total documents currently received in that category during a year. All of the documents concerned [redacted] relations with other countries.
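
The proportional make-up of the data base described above can be illustrated with a short calculation. The following is only a sketch: the annual receipt figures are invented for illustration (the report does not give them), the category names are taken from the paragraph above, and the function name `allocate_sample` is hypothetical. It splits a fixed sample among categories in proportion to annual receipts, handing any units left over by rounding to the categories with the largest remainders.

```python
def allocate_sample(annual_counts: dict[str, int], sample_size: int) -> dict[str, int]:
    """Split sample_size among categories in proportion to annual_counts."""
    total = sum(annual_counts.values())
    shares: dict[str, int] = {}
    remainders: dict[str, int] = {}
    for category, count in annual_counts.items():
        # Integer arithmetic: whole share plus the remainder left by rounding down.
        shares[category], remainders[category] = divmod(sample_size * count, total)
    # Units not yet assigned go to the categories with the largest remainders.
    leftover = sample_size - sum(shares.values())
    for category in sorted(remainders, key=remainders.get, reverse=True)[:leftover]:
        shares[category] += 1
    return shares


# Hypothetical annual receipts per category (illustrative numbers only).
annual_receipts = {
    "Collateral reports": 42000,
    "Translations": 6000,
    "FBIS": 18000,
    "Newspaper articles": 9000,
    "Comint": 30000,
    "T/KH materials": 3000,
    "Other series": 12000,
}

sample = allocate_sample(annual_receipts, 5000)
for category, share in sample.items():
    print(f"{category:20s} {share:5d}")
```

With any set of receipts the allocated shares always add up to the requested sample size, which is the property that matters when a fixed number of documents must be drawn.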
For documents which contained multi-country/subject content, the rule was established to index that material which would normally be processed by an operational [redacted]. Consistency was difficult to obtain here because the indexers had slightly different interpretations as to what a [redacted] would process.

It was decided not to apply any selection criteria, but to index all of the information concerning [redacted]. As a result, many low-level personalities and installations, as well as fragmentary subject matter, were indexed which would not be captured in an operational system. No selection criteria were applied because it was felt that realistic criteria could not be established prior to the experiment and that artificial criteria would affect the experimental results. It was further felt that great indexing depth would aid in establishing future criteria--that is, that the experiment would show that redundant indexing of many subjects is unrealistic. However, despite the lack of criteria, an indexing consistency test following the experiment showed that each indexer tended to apply his own criteria based on his views of what was important.

The documents were broken out into the four topical categories mentioned above. Each senior controlled the flow of material to his indexers, thus assuring that each processed a variety of sources. Upon completion of the indexing, the seniors reviewed the transcript sheets for accuracy and logic. However, many errors were not caught because neither the indexers nor the seniors were as well versed in the system as would be desirable in an operational system.
In fact, it would be fair to say that it was not until the end of the experiment that the indexers and seniors were beginning to gain confidence in what they were doing. In addition, several of the indexers were not suited to the task and would have to be given other assignments in an operational system.

Following review by the seniors, the documents and transcript sheets were transmitted to the three typists for header data transcription. One of these clericals acted as a senior for resolving problems. In addition, an OCS system analyst who had planned the header data transcription task oversaw this phase of the operation. The documents were then filed by a CHIVE accession number, and the transcript sheets were transmitted to key punching. Computer processing resulted in a print-out of index records which contained errors. These listings were reviewed by one of the project monitors and final corrections were made.

5.B.2. PRELIMINARY FINDINGS

The final results of the experiment await the conclusion of the query phase. However, preliminary findings relating to indexer reactions, selection problems, indexing times, etc., can be described, and these are perhaps the critical factors affecting the organization of the proposed CHIVE system.

5.B.2.1. Personnel Considerations

The personnel involved in the experiment were college-graduate professionals and less than half had worked in jobs that involved full-time indexing. Even those with an OCR indexing background had worked on allied tasks such as querying or dictionary building, or had served as experts on some aspect of indexing.
In this experiment they did nothing but index and found the tools and rules with which they [...]

Figure 5.D-15

JOB 3 (KWIC) ELEMENTS OF INFORMATION

Col. 1. File
    Content Description: The numbers (1-4) which identify the exploded punch card records. Numbers are suppressed in printout.
    Corresponding CHIVE Field: None

Col. 2-15. Document Identity, Series & No.
    Content Description: Clear-text transcription of the document series and the document number. Series and number are separated by a blank space.
    Corresponding CHIVE Field: Card 03 - Doc/Series identification no. includes series, no. & year. The digraph subject portion of the series is also carried in field 02-02.

Col. 16. Year
    Content Description: Year of publication of the report taken from the documentation. The numbers 0 to 9 are combined with overpunches to develop 2-digit year on printout.
        X overpunch for 4
        No overpunch for 5
        [illegible] overpunch for 6
        X overpunch for 7*

Date of Information**

Col. 17-18. From Month
    Content Description: 0-12
Col. 19. Year
    Content Description: as in 16 above
    Corresponding CHIVE Field (cols. 17-19): 01-07 first subfield

Col. 20-21. To Month
    Content Description: 0-12
Col. 22. Year
    Content Description: as in 16 above
    Corresponding CHIVE Field (cols. 20-22): 01-07 second subfield

*  All year fields for Job 3 are one column and a 2-digit year is developed in the same manner for all.
** No information date on PI; publication date appearing on the date line is entered in cols. 20-22.

Figure 5.D-15 (Cont'd.)

Subject segment

Col. 23, 37, 51, 65. Keyword Code
    Content Description: 3 or 0 are used to show whether or not the word following is a keyword and, therefore, to be printed in the alpha list of keywords which comprise the index. 3=Keyword. 0=Non-Keyword*

Col. 24-36, 38-50, 52-64, 66-78. Clear Text
    Content Description: The clear-text words taken from the document. Some but not all are dictionary controlled.

Col. 79. [field name illegible]
    Content Description:
        a. SI Reports. The number 2 identifies Cuban reports. The number 3 identifies UAR reports.
        b. PI reports. One of 6 codes C,K,S,T,Z,N used to indicate the security channels in which the document is being handled. For multiple channels, highest indicator is used.

Col. 80. [field name illegible]
    Content Description: Distribution control symbols 1-7.

Corresponding CHIVE Field (entries in the order printed; their alignment with the rows above is not recoverable from this copy):
    01-14
    01-02 and 01-03
    No one-to-one relation with CHIVE code.
    01-04
    No one-to-one equivalency. CHIVE code could be easily expanded to accommodate these entries.

* All Keywords are dictionary controlled. See sections 5 and 6 for sample pages from China area book and Job 3 dictionary.
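
The subject segment layout in Figure 5.D-15 (a one-column keyword code followed by a 13-column clear-text word, repeated four times) can be read mechanically. The sketch below assumes the card is available as an 80-character string; the function names and the sample card contents (series "XX 1234" and the four words) are invented for illustration and do not come from the report. Only the column positions and the meaning of codes 3 and 0 are taken from the figure.

```python
# 1-based card columns that hold the keyword code (Figure 5.D-15).
KEYWORD_CODE_COLUMNS = (23, 37, 51, 65)
WORD_WIDTH = 13  # each clear-text word occupies the 13 columns after its code


def subject_segments(card: str) -> list[tuple[str, bool]]:
    """Return (word, is_keyword) pairs for the four subject segments of a card."""
    card = card.ljust(80)  # pad short records to a full 80 columns
    segments = []
    for column in KEYWORD_CODE_COLUMNS:
        code = card[column - 1]                         # the code column itself
        word = card[column:column + WORD_WIDTH].strip()  # the 13 columns after it
        if word:
            segments.append((word, code == "3"))        # 3 = keyword, 0 = non-keyword
    return segments


def keywords(card: str) -> list[str]:
    """Words flagged with code 3, i.e. those printed in the alpha list."""
    return [word for word, is_keyword in subject_segments(card) if is_keyword]


# Hypothetical sample card: file number, series and number, blank date columns,
# four coded words, and blank columns 79-80.
sample_words = [("3", "STEEL"), ("0", "PRODUCTION"), ("3", "PLANT"), ("0", "EXPANDED")]
sample_card = (
    "1"
    + "XX 1234".ljust(14)
    + " " * 7
    + "".join(code + word.ljust(WORD_WIDTH) for code, word in sample_words)
    + "  "
)

print(subject_segments(sample_card))
print(sorted(keywords(sample_card)))
```

Sorting the flagged words gives the alphabetical keyword list the figure describes; a full KWIC index would also carry the document identity from cols. 2-15 alongside each keyword.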
Figure 5.D-16

FIB TOWN/CITY INFORMATION CARD FORMAT

Col.     Field
1-3      1. FIB Country Code
4-6      2. FIB Political Subdivision Code
7-30     3. Location Name
31-32    4. 200 Chart Series
33-35    5. B. E. - WAC Number
36-40    6. B. E. - Town Number
41-42    7. Degrees (N/S)
43-44    8. Minutes (N/S)
45-46    9. Seconds (N/S) (South "X")
47-49    10. Degrees (E/W)
50-51    11. Minutes (E/W)
52-53    12. Seconds (E/W) (West "X")
54-55    13. Date of Latest Information (Yr)
56       14. Source Code
57-64    15. AMS Chart Number
65-69    16. Location Identification Code
70       17. Town Card Indicator ("X")
71       18. Town Information Indicators
72-74    19. Cat. Design. Code
75-80    20. Town "C" Code

Figure 5.D-17

FIB INSTALLATION INFORMATION CARD FORMAT

Col.     Field
1-3      1. FIB Country Code
4-6      2. FIB Political Subdivision Code
7-11     3. Location Identification Code
12-35    4. Installation Name
36-40    5. B. E. Installation Number
41-42    6. Degrees (N/S)
43-44    7. Minutes (N/S)
45-46    8. Seconds (N/S) (South "X")
47-49    9. Degrees (E/W)
50-51    10. Minutes (E/W)
52-53    11. Seconds (E/W) (West "X")
54-55    12. Date of Latest Information (Yr)
56       13. Source Code
57-64    14. FIB Identification Number (Firm #)
65-70    15. Installation Identification Code (ICC)
71       16. Installation Use/Assoc. Indicators
72-74    17. Cat. Design. Code
75-80    18. Installation "C" Code

Figure 5.D-18

FIB LOCATION CROSS REFERENCE CARD FORMAT

Col.     Field
1-3      1. FIB Country Code
4-6      2. FIB Political Subdivision Code
7-11     3. Location Identification Code
12-14    4. "See"
15       5. (Blank)
16-35    6. Location Cross Reference Name
36-40    7. (Blank)
41-42    8. Degrees (N/S)
43-44    9. Minutes (N/S)
45-46    10. Seconds (N/S) (South "X")
47-49    11. Degrees (E/W)
50-51    12. Minutes (E/W)
52-53    13. Seconds (E/W) (West "X")
54-69    14. (Blank)
70       15. Cross Reference Card Indicator ("12")
71-80    16. (Blank)

Figure 5.D-19

FIB ICF COORDINATE CARD FORMAT

Col.     Field
1-7      1. Sequence Number
8-28     2. Location
29       3. Country Code (Target "X")
30-31    4. Country Code
32-35    5. Political Subdivision Code
36-40    6. (Blank)
41-42    7. Degrees (N/S)
43       8. (Blank)
44-45    9. Minutes (N/S)
46       10. "N" or "S"
47       11. (Blank)
48-50    12. Degrees (E/W)
51       13. (Blank)
52-53    14. Minutes (E/W)
54       15. "E" or "W"
55-58    16. "APPR" (If Approximation)
59       17. (Blank)
60-63    18. WAC Number
64-69    19. (Blank)
70       20. Control "X"
71-72    21. (Blank)
73       22. Control "X"
74-75    23. (Blank)
76       24. Town Folder Indicator
77-79    25. (Blank)
80       26. Card Type "1"

Figure 5.D-20

FIB ICF CITY CROSS REFERENCE CARD FORMAT

Col.     Field
1-7      1. Sequence Number
8-28     2. Location
29       3. Country Code (Target "X")
30-31    4. Country Code
32-35    5. Political Subdivision Code
36-38    6. "See"
39       7. (Blank)
40-63    8. Location
64-79    9. (Blank)
80       10. Card Type "2"

Figure 5.D-21

FIB ICF NAME CARD FORMAT

Col.     Field
1-7      1. Sequence Number
8-28     2. Location
29       3. Country Code (Target "X")
30-31    4. Country Code
32-35    5. Political Subdivision Code
36-63    6. Firm Name
64-67    7. Plant Number
68       8. Status "X"
69-75    9. Firm Number
76       10. Plant Folder Indicator
77-79    11. Industrial Category Code
80       12. Alpha

Figure 5.D-22

FIB MODEL-TYPE/BROCHURE INDEX CARD FORMAT

Col.     Field
1-20     1. Model Type/Series
21-59    2. Descriptive Name
60       3. Tech. Material Indicator
61       4. Tech. Material Language*
62-63    5. Industry
64       6. Category Code (ICC)
65-71    7. Dossier Number (Firm #)
72-73    8. Date (Month)
74-75    9. Date (Year)
76-78    10. Country Code
79-80    11. USSR Area Code

* Admissible entries are: (1) English (2) Native Language (3) Other

Figure 5.D-23

PUNCH CARD CHARACTERISTICS OF THE IRS DOCUMENT INDEX FILE (NEW)

[Card layout diagram; column positions are not legible in this copy. Fields shown: Subject Code; Subject Modifier Code; Clear Text Subject; Organization Abbr.; Place Name; Area Code; Source Code; Document No.; Pub. Date; Classification Code.]

* NOTE: Numbers indicate action codes. These are literal entries.

Figure 5.D-24

PUNCHED CARD CHARACTERISTICS OF THE IRS DOCUMENT INDEX FILE (OLD)

[Card layout diagram with nine numbered fields; punch positions are not legible in this copy. Data entries, in the order printed: Subject Code; Subject Modifier (Action Code); Area Code; Classification Code; Source Code; Locator No.; Related Area Code; Related Area Code; Pub. Date; Control No.]

Figure 5.D-25

PUNCHED CARD CHARACTERISTICS OF THE FILM INDEX FILE

[Card layout diagram with nine numbered fields; punch positions are not legible in this copy. Data entries, in the order printed: Subject Code; Area Code; Text (Language) Code; Type Code; Holding Agency Code; Pub. Date (Yr); Classification Code; Title No. (Control No.); Availability Code.]
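
As an illustration of how a fixed-column layout such as the FIB ICF coordinate card (Figure 5.D-19) might be read, the following sketch extracts the latitude and longitude fields and converts them to signed decimal degrees. It is a minimal example under stated assumptions: the card is taken to be an 80-character string, the sign convention (south and west negative) is a modern convention rather than anything the report specifies, and the helper names and the sample card contents are hypothetical.

```python
def field(card: str, first: int, last: int) -> str:
    """Return card columns first..last (1-based, inclusive)."""
    return card[first - 1:last]


def decode_coordinates(card: str) -> tuple[float, float, bool]:
    """Decode (latitude, longitude, approximate) from an ICF coordinate card.

    Column positions follow Figure 5.D-19. South and west are returned as
    negative values, which is an assumption of this sketch.
    """
    card = card.ljust(80)
    latitude = int(field(card, 41, 42)) + int(field(card, 44, 45)) / 60
    if field(card, 46, 46) == "S":
        latitude = -latitude
    longitude = int(field(card, 48, 50)) + int(field(card, 52, 53)) / 60
    if field(card, 54, 54) == "W":
        longitude = -longitude
    approximate = field(card, 55, 58) == "APPR"
    return latitude, longitude, approximate


# Hypothetical sample card; every value here is invented for illustration.
sample_coordinate_card = (
    "0000001"                    # cols. 1-7   sequence number
    + "EXAMPLEVILLE".ljust(21)   # cols. 8-28  location
    + " "                        # col. 29     target indicator
    + "ZZ"                       # cols. 30-31 country code
    + "0000"                     # cols. 32-35 political subdivision code
    + " " * 5                    # cols. 36-40 blank
    + "10" + " " + "15" + "S"    # cols. 41-46 latitude 10 deg 15 min S
    + " "                        # col. 47     blank
    + "020" + " " + "30" + "W"   # cols. 48-54 longitude 20 deg 30 min W
    + "APPR"                     # cols. 55-58 approximation marker
    + " " * 21                   # cols. 59-79 left blank in this sample
    + "1"                        # col. 80     card type
)

print(decode_coordinates(sample_coordinate_card))
```

The same `field` helper would serve for any of the card formats in Figures 5.D-16 through 5.D-22, since each is defined purely by column ranges.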