FINAL REPORT: TASK TEAM V (BIOGRAPHICS)

Document Type:

CREST

Collection:

General CIA Records

Document Number (FOIA) /ESDN (CREST):

CIA-RDP80B01139A000300040006-1

Release Decision:

RIPPUB

Original Classification:

Document Page Count:

Document Creation Date:

December 15, 2016

Document Release Date:

December 19, 2003

Sequence Number:

Case Number:

Publication Date:

February 11, 1966

Content Type:

REPORT

Body:

SECRET

CODIB-D-[number illegible]
11 February 1966

UNITED STATES INTELLIGENCE BOARD
COMMITTEE ON DOCUMENTATION

Final Report: Task Team V (Biographics)

Attached for coordination within member agencies and discussion at a subsequent meeting is the Task Team V report.

Secretary

Attachment

ARMY, DIA, DOS, FBI, ONI, USAF reviews completed. On file GSA & OMB release instructions apply.

GROUP 1: Excluded from automatic downgrading and declassification.

Approved For Release 2004/01/15 : CIA-RDP80B01139A000300040006-1

UNITED STATES INTELLIGENCE BOARD
COMMITTEE ON DOCUMENTATION
TASK TEAM V -- BIOGRAPHICS

FINAL REPORT

T/V/R-1
1 February 1966

MEMORANDUM FOR: Chairman, Committee on Documentation

SUBJECT: Report of Task Team V

1. Attached is the report of Task Team V for your consideration.

2. The Team has attempted, in an evolving interpretation of its Terms of Reference, to present realistic recommendations while developing in some depth a substantive description of the problems for the use of interested agencies. While the overall report is classified SECRET, Annex 2 has been given a lower classification to permit wider distribution to U. S. Government officials.

3. A large file of information, monographs on various aspects of the problem (National Agency Check System, search strategies, data conversion techniques and experienced costs, SCIPS studies of PI files, etc.) is available in or through the CODIB Support Staff.

4.
It is recommended that the Task Team be discharged on CODIB acceptance of this report. A formal mechanism for continued exchange on biographic problems and techniques is, however, contained in the RECOMMENDATIONS.

5. My thanks to [name redacted] for his extensive and imaginative support.

Chairman, Task Team V

Attachment: Task Team V Report

Table of Contents

Purpose
Summary of Findings
Recommendations
The Nature of the Problem
Counterintelligence and Security
Positive Intelligence

Annexes:
1. Glossary
2. Proposed Approach to the Machine Recording of Personal Names
   Attachment 1: Machine Recording Techniques for Personal Names
3. Biographic Index Facts Summary
4. Data Elements in Team Member Agency Records
5. Examples of Name Variants
6. Terms of Reference
7. List of Task Team Members

FINAL REPORT

PURPOSE

The objective of this Team was to "identify means for improving the storage, retrieval and exchange of information from the major name files and related data files in the Intelligence Community."

SUMMARY OF FINDINGS

1. Improvements in the speed and quality of biographic information processing involving interagency exchange on U. S. citizens and foreign nationals are necessary to further improve security, and to afford policy makers and analysts better response from biographic intelligence files on foreign nationals of interest from a variety of angles--military, subversive, political and scientific.
The Team finds that use of computer techniques and inter-agency telecommunications links may provide significant improvements.

2. There are, however, profound, complex problems and significant costs in making major changes in the large biographic holdings of community concern, particularly if the changes involve conversion to computer systems.

3. There are three basically separate, but somewhat overlapping biographic areas: Counterintelligence* (CI), Positive Intelligence* (PI), and Security*. Name finding* and name searching* take place in all three. (See Annex 1, Glossary, for definition of these and subsequent asterisked terms).

4. The major indexes* considered by the Team ranged from 300,000 unit records (Secret Service) to 50,000,000 (FBI). These now total about 170,000,000 unit records of interagency concern, and are growing at the rate of over eleven million yearly. (See Annex 3).

5. An average of 30,000 requests concerning individuals are made against these indexes daily. Of the 30,000 requests, about one-half are made between agencies (see footnote) and the other half are processed within the agencies where the requests originate. The 30,000 requests, plus file maintenance procedures, generate 155,000 name searches each day. About one-half of the 15,000 requests made daily between agencies result in a no-record* response.

6. There are several thousand people involved in biographic activity in the Intelligence Community. Approximately 1000 of these, at an annual salary-only cost of $5,000,000, are directly involved, at the index level, in the preparation, maintenance, and searching of the major biographic indexes. These indexes occupy about 100,000 square feet and about $500,000 a year is being spent on supplies and equipment for their support.

7.
Agencies in the Washington area are answering security name check requests from each other within two to eighteen days, portal-to-portal, with an overall average response time of nine calendar days. Considerable additional time and cost is involved in delivering the results to the original requester within the requesting agency. The timeliness of response is believed to vary widely owing to volume, personnel costs, and a combination of many other factors unique to each agency. It is difficult to measure the actual loss to the government in terms of personnel not taken on board, personnel taken on board waiting for appropriate clearances, and personnel not utilized in a contact or contractual sense because of the slowness of the system. These are intangibles that only the various elements of the respective agencies can weigh within the purview of their own responsibilities and requirements.

8. In the area of name searching, significant quality and time improvements may be obtained through automation and use of telecommunication links. No major name index in the intelligence community has yet been fully automated. Therefore, proof of success has not been conclusively demonstrated. Several agencies are at various stages in developing systems with practical applications anticipated in the near future.

Note: Since these statistics were gathered, the number of interagency name requests submitted by several agencies has increased on the order of 50% during the last several months, mainly as a result of several new programs.

9. The critical problem in any large name index used for name searching is the way in which personal names are recorded, filed, and searched. Any planning for index mechanization must emphasize this aspect. The success of an improved interagency name
check exchange system based on telecommunications coupled with computer search requires a common approach to recording personal names and certain additional basic identifying data.

10. Name finding activities could be improved through increased understanding resulting from the exchange between agencies (at both the user and system planning levels) of information about the nature and purpose of each other's specialized files, as well as the exchange of data files in certain cases and interchange of information on manual and ADP techniques for improving speed and flexibility of response.

11. The Team agreed that the professional interchange derived from the Task Team effort was highly valuable to each member in providing new insights in manual and machine techniques, interagency channels, sources of information, and policies of other agencies.

RECOMMENDATIONS

IT IS RECOMMENDED THAT:

1. USIB urge those agencies with large name indexes used for name searching in the National Agency Check system and in Positive Intelligence applications of Community interest to continue to strive within their organizations for index mechanization wherever it is found to be feasible and practical (recognizing that several agencies are already in various steps of development in this area). The findings and report of this Task Team should be used as a point of departure.

2. In conjunction with Recommendation 1, USIB request each agency to study the feasibility of establishing telecommunications links within the National Agency Check complex to facilitate the exchange of requests and replies.

3.
USIB request those agencies engaged principally in Positive Intelligence activities to study the feasibility of tying into the Washington area LDX system for the exchange of Positive biographic intelligence.

4. Those agencies which plan to convert large manual biographic indexes to computer-based name searching systems consider the approach to the machine recording of personal names outlined in Annex 2.

5. The CODIB Support Staff be directed to prepare and maintain current publications to inform users of biographic information in the community of the characteristics of each major collection, and the procedures and channels for getting service from each, within the limits of security classification and need-to-know prescribed by each agency.

6. The CODIB Support Staff also serve as the vehicle for informing those agencies developing new computer data files, particularly in the PI biographic area, of the format and coverage requirements of others in the community to reduce unnecessary duplication and coverage gaps.

7. DIA expand its program for the processing of military personality information to meet the needs of the PI community. This should include the processing of open source material and should provide for an EDP file of personality information as well as hard copy backup for such a file. This can be coordinated by DIA with a group composed of representatives of NSA, CIA, State and cognizant service branches.

8. The Task Team III (or its successor) be tasked to study those various programs exploiting open source scientific and technical information, which generate personality information of positive intelligence value as a by-product.
In conjunction therewith, a coordinated program should be developed using EDP methods to provide machine indexes of the bibliographic data processed by any organization in this field, so that the personality information is accessible to a recipient in machine form, with quick follow-up to the translated source.

9. Two or three day seminars be held semi-annually (with chairmen rotating from the respective agencies) on the progress of the various agencies in the biographic field, with working sessions for groups with specific problems (such as CI, Security, PI, Communications, the state of relevant technology, software, control techniques, and other functional or technical aspects).

THE NATURE OF THE PROBLEM

1. The Intelligence Community has for many years collected an ever-increasing amount of information about individuals from a great diversity of sources through a large number of channels, and has stored this data in a variety of retrieval systems in diverse formats. These have traditionally taken the form of index references, either self-contained or leading to dossier files or individual documents. The Team decided, as a point of departure, that the relative pay-off in system improvement would be higher in respect to the larger biographic files in which there is a high degree of activity and interagency communication. Thus, many of the smaller files studied by SCIPS (the Staff for the Community Information Processing Study) were not included.

2. There are three types of major biographic indexes and files now in operation. They are the Positive Intelligence, Counterintelligence and Security holdings. There is relatively little exchange of requests between the PI biographic files and the Security files, moderate exchange between the CI and PI communities and frequent exchange between Security and CI. The Counterintelligence (CI) biographic system centers around the foreign counterintelligence repository of CIA and the domestic counterintelligence holdings of the FBI. The security and PI holdings of the agencies referred to in this report also lead to CI data in some degree. The interagency exchange of Security data centers around the name search type operations performed by CIA, State, Army, Navy, Air Force, FBI, Secret Service, Immigration and Naturalization Service (INS), and Civil Service Commission (CSC). The major PI biographic records are contained in the files of the CIA Biographic and Special Registers, DIA, NSA/Office of Central Reference, Department of State and Air Force/Foreign Technology Division (FTD).

3. There are important and fundamental differences between, and some similarities in, the basic operating procedures and kinds of searches that are made in the PI systems versus the CI/Security systems. The PI biographic systems are deeply intertwined with, and in many cases actually part of, larger intelligence collection and storage systems which are mission, subject or area oriented. In contrast, the CI/Security systems are clearly oriented to the heavy use of name searching among alphabetically ordered biographic indexes which, in most cases, lead to dossier files. The Team determined that there is name searching and name finding going on in both the Positive as well as the CI/Security activity. However, the bulk of the requests in both areas involve name searching (above 95% in the CI/Security area and about 80% in the PI area).

4. The critical problem in name searching large manual or machine indexes involves the ways in which personal names are reported and stored for retrieval.
This is a spelling phenomenon, particularly in PI and CI indexes, which may be classified in two parts:

a. Name Variants: Different spellings of the phonetically same surname in the original language (SCHUKOW, CHOUKOV, DIUKOV, DZHUGOV, JOUKOFF, YOUKOV, ZHJUKOV, ZHUKOV, etc.). Given name equivalents, diminutives or abbreviations are also considered part of the name variant problem (WILLIAM, WILHELM, WILL, BILL, WM.).

b. Name Variations: Different conventions in recording and using parts of names (name elements), for example: Fidel CASTRO; CASTRO, Fidel A.; CASTRO y RUZ, Fidel Alejandro; John Taylor BROWN; BROWN, J. Taylor; BROWN, John T.

5. The difficulties in handling the name variant/variation combinations are particularly crucial in those systems in which the preponderance of names are on foreign nationals, or U. S. citizens where control of the source reporting (e.g., employee applications, identification of individual by social security or other number, etc.) is not available. The reasons for the corruption of name spellings received by the majority of agencies considered in this report reflect the real world of intelligence biographies - foreign and domestic. The causes include different transliteration systems between countries (and even within a given country), usage and custom, mistranscription in rewriting names, typographical error, telegraphic garble, and phonetic renditions of names overheard. Examples of these problems are given in Annex 5.

6. Given this situation, the possible combinations and permutations of name variants/variations are unlimited and, more to the point, unpredictable. Thus no formal linguistically based system for reducing name variants to a common denominator has been found wholly adequate for reliable storage and search by those agencies dealing primarily with uncontrolled sources. A pragmatic approach to this problem - called name grouping - is being developed. See Annex 5.

7.
The problem is minimized for those agencies which have numerical identifiers (such as social security number or date of birth) in the large majority of their index records. The name variant problem cannot be escaped even so, since these agencies are recipients of name search requests on foreign nationals or U. S. citizens on whom the requesting agency has no control number, and quite possibly a different spelling of the name.

8. The high proportion of common names adds to the difficulties in large indexes, foreign and domestic. For example, in one multi-million card file on Soviets containing over 300,000 different surname spellings, some 1,500 common surnames account for over 50% of the file. In the case of Vietnam, 54% of the people in the Red River Delta area have the surname NGUYEN; 85% of the Vietnamese population is represented by twelve surnames, with the balance less than 300 clan names.

9. The lack of identifying data on named persons is intimately related to the name variant and common name problems for those agencies without source control. While Annex 4 shows the categories of identifying data recorded if available in the reporting, most foreign and domestic reporting deals with vaguely identified personalities. It is therefore impossible to develop rigid rules on what constitutes the minimum identifying data required. Each agency, in recognizing these problems and the nature of its own index, forms its own rules regarding minimum identifying data for recording, and the depth of search according to the nature of the request.

10. The above indicates what is involved in the quality of name searching. In the past, many agencies have reduced their capability for quality search in manual or machine systems (e.g., by restricting the amount of data recorded). All involved in this Task Team recognize the need to observe the following principles:

a.
Preserve complete name spellings, and record name element components in a consistent format for either manual or potentially mechanized indexes. If an agency is planning the latter, the methodology for the formatting of individual name elements as explained in Annex 2 should be considered.

b. Retain in the index record all identifying data which assists in distinguishing persons of the same or similar name from one another. Such data elements as sex, date of birth, place of birth, citizenship/nationality, occupation/profession, location, social security number are generally agreed to be desirable, if available, though additional amplifying data further distinguishing the individual should be recorded - regardless of the feasibility of machine search - for human analysis.

c. Follow the progress of the "name grouping" approach to the name variant problem and, should it prove operationally successful, take advantage of already developed computer techniques to capitalize on the linguistic effort expended by the Government and private agencies for this purpose.

11. It was also found that name finding requires substantially more time and effort per search. This is true because a name finding request generally must be structured in a more complex fashion and requires a more involved search procedure.

12. The Team decided to consider the CI and Security systems as one area and the PI biographic systems as a separate area for the purposes of developing the facts, defining the problems, and making recommendations in this report.

COUNTERINTELLIGENCE AND SECURITY

1. The Security activity clearly stands out as a network of ten large indexes which are heavily used.
Name searches are conducted mainly for granting security clearances for a variety of reasons such as employment, contact, association, contract, etc., and at a variety of security levels. An agency's requirement to grant such a clearance results in the selective checking by that agency of an average of seven other agencies. The major agencies involved in this program include CIA, State, Army, Navy, Air Force, NSA, FBI, Immigration and Naturalization Service, Secret Service, and the Civil Service Commission. The latter three listed are not part of the USIB Community but, in formulating the Team, it was recognized that these agencies are an integral and significant part of the National Agency Check (NAC) Program. Of the approximately 114 million unit records in the Security holdings, these three agencies hold approximately 50 million (I&NS, 37 million; CSC, 12 million; Secret Service, 0.3 million). Of the 28,000 requests generated daily in the CI/Security System, approximately 8,000 are generated by these three agencies.

2. Intertwined with the Security request activity are the foreign and domestic Counterintelligence activities centered respectively in CIA and FBI. There are, however, some CI functions in most of the other agencies represented. The normal purpose of the Counterintelligence biographic name check activity, as it takes place between the agencies, is to determine the presence of information about an individual of interest to the requesting agency for some counterintelligence reason (e.g., relating to hostile activities of foreign intelligence services and the Communist Party). The CIA maintains a separate and significantly large foreign counterintelligence index in light of its foreign counterintelligence responsibilities under NSCID 5/3. Security indexes lead primarily to investigative cases and criminal records, predominantly on U. S. citizens.
In spite of the fact that requests are made of the CI/Security holdings for different reasons, the nature of the requests and the structure of the data bases involved are substantially the same.

3. The various contributing agencies are listed in Annex 3 along with a set of facts about the respective size, type, growth, activity, etc., of their CI/Security files. It can readily be seen that the size of the various indexes ranges from 300,000 in the case of the Secret Service to over 50 million in the case of the FBI. Most of the unit records are still on 3 x 5 cards. Some of the individual agencies are in the process of converting their indexes to machine language at the present time. This is true of the Office of Security and the Clandestine Services of CIA. The Army and Navy indexes are already on IBM cards, and the NSA Security Records are on magnetic tape. As a result of recent DoD action, the Army, Navy, and Air Force are completing plans to merge their three index holdings on punched cards by mid-1966. Consequently, the Air Force will shortly convert its 3 x 5 index cards to IBM cards for insertion into the common DoD index. This DoD index, although to be in machine language (IBM cards), will, in its initial phase of development, be searched manually. The Immigration and Naturalization Service is presently studying a program to convert its index to machine language and prepare for a machine-based system. This is likewise true of the Secret Service, FBI, and the Civil Service Commission.

4. The CI/Security indexes are growing at approximately 7% per year. This means that they will double in size within ten years at the present rate of growth.
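The doubling estimate can be checked with ordinary compound-growth arithmetic. The sketch below is illustrative only and is not part of the Team's analysis: it assumes the 7% rate stays constant and compounds annually, and the function names are invented for this example. The 114 million starting figure is the approximate size of the Security holdings cited earlier in this section.

```python
import math


def doubling_time_years(annual_growth_rate: float) -> float:
    """Years for a quantity to double under constant compound annual growth."""
    return math.log(2) / math.log(1 + annual_growth_rate)


def projected_size(current_size: float, annual_growth_rate: float, years: int) -> float:
    """Size after the given number of years of compound annual growth."""
    return current_size * (1 + annual_growth_rate) ** years


# Assumption: growth holds steady at 7% per year, compounded annually.
RATE = 0.07
print(f"Doubling time: {doubling_time_years(RATE):.1f} years")
print(f"114 million records after 10 years: {projected_size(114e6, RATE, 10) / 1e6:.0f} million")
```

Under these assumptions the doubling time comes out at a little over ten years, so the ten-year figure is a close round-number estimate.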
Of particular significance is the fact that the 28,000 requests made per day in these indexes (along with the daily maintenance) result in over 120,000 actual name searches being made, mostly manually, in these indexes each day. Of these 28,000 requests, approximately half are made between agencies. Of these 14,000 name checks flowing between the agencies, more than half result in a no-record response by the responding agency.

5. The elements of the CI/Security search process considered by the Team include the size and the activity between the agencies, the accuracy and form of the requests and responses, as well as the time that it takes the agencies to respond to each other's requests. The Team noted the fact that there are literally dozens of name check request forms now being utilized by the various agencies. In observing some of these typical and most widely used forms, the Team found that certain basic data such as name, place and date of birth, service serial number, social security number, sex, etc. were included on each form. The Team considered a study of the need for a single name check form to be used by the various agencies. It was considered more important, however, to examine the data elements used and what rules should be applied to their control. These considerations become increasingly critical as the agencies move toward greater use of machine language.

6. To obtain a reasonably dependable determination of the kind of response time in which the various agencies were providing information to each other, a sample survey was made of 3,000 individual typical routine requests. Emergency and priority requests are handled by every agency in a matter of minutes or hours depending upon the results of search. The FBI, I&NS, CSC, CIA, and Army participated in this test.
These agencies tabulated the response times of requests from each other as well as from the Navy and the Air Force. The interagency response time varies from two to eighteen days with the average of all the agencies being nine calendar days. There were factors which the Team recognized as causing possible aberration in these figures: hand carrying of the requests by liaison personnel, the variations in the depth of searches (i.e., on the head* or checking different possible spellings of the same name) and the researching of the files by requesting agency personnel on the premises of the answering agency. In spite of these, the Team feels that the nine-day figure is a reasonably accurate estimate of the average time (within a day or two) required for processing of the great bulk of the name checks being made in this system.

7. It should be noted that the response time referred to above does not include any internal processing time, in or out, by the various requesting agencies. The time was measured in all cases from the day the request left the requesting agency to the day that it returned to the requesting agency. This time included the mail time plus that required to make the index search by the responding agency and the analysis of files in the case of possible identification. Based on informal observations of the various Team members it appears that, in the great majority of these cases, there is far more time spent processing these requests within the requesting agencies (i.e., from the time the original requester - e.g., analyst, investigator, Ambassador, etc. - sends out his query to the point where it re-enters the agency and is provided to the ultimate user) than the nine-day figure of external processing time explained above.
To determine the extent of the internal processing lags and the reasons therefor was a task far beyond the capability of the Team.

8. Many CI requests are answered from materials that are not processed into the files, such as directories, working aids, etc., or from material too current to be in the file, such as today's newspaper. Some files are restricted by security classification as to what can be processed. Research in such a limited source file often gives incomplete or out-dated information. It is doubtful that any single file, whether it be computerized or manual, can ever be considered a complete or sole source for biographic information.

9. It was not possible for the Team to consider specifically the relative merits of: (a) the improvement of the manual systems within each agency, (b) the potentials in automation of the index systems within each agency, and (c) the system efficiency that might be realized by the institution of a machine language communication system between the various agencies. These are tasks requiring management supported feasibility studies, dominated by the professionals within each agency, in terms of the unique history and problems of each.

POSITIVE INTELLIGENCE

1. The positive intelligence (PI) biographic files can be defined as those files in the intelligence community that have been developed to support the evaluation and production of foreign intelligence. The files are used primarily by government reports officers, researchers and policy makers in establishing or determining facts and reaching decisions in the fields of foreign affairs and defense. The personalities contained in the community's PI files are predominantly foreign nationals.
The Team concentrated its review upon the major files of the PI community (see Annex 3) on the assumption that the problems involved in the areas of storage, retrieval and exchange would also exist in other PI files and because a large number of the smaller subject-oriented PI files contain the same source material. Development of these smaller files may often be the result of the problems of size, immobility and accessibility that have developed over the years in the large PI files.

2. The management of a PI file can be broken down into four functional areas: collection of source material; selection of information for the files from the source material collected; processing of information into the files; and dissemination of information from the files. The Task Team concentrated mainly on the area of dissemination and procedures for searching information requests. Since the other three areas have a definite effect on dissemination, they were reviewed.

a. Collection - Literally hundreds of thousands of source documents are received by a PI file system each year. They will be in English or in a foreign language and each must be read and evaluated. These sources will include the following: newspapers, press services, foreign journals, books, government publications, radio broadcast information and the entire intelligence output of the US intelligence community. A portion of this material will be of a very current nature, having been produced the same day or the previous day.

b. Selection - The basic criterion of any agency for selecting an item for a PI file is whether or not the item supports the foreign intelligence effort on a particular country or area. Every organization has its own standards for selection based on the mission it is supporting and budgetary limitations. The same source document is frequently processed by different PI organizations.
The amount of information that is already available in authoritative sources such as military registers, directories, etc., will often determine what will be selected for the files. On areas such as the USSR, China, etc., a great deal of open source and classified intelligence will be processed because reliable directory-type information is not obtainable. There is an overlap of information in PI files because the different file systems support the same requirements, or because the personality mentioned in the source report meets the selection criteria for two different requirements: e.g., CIA and State have an interest in military personalities who are prominent in other fields such as politics, science, space, etc., whereas DIA and NSA are interested in the same person because he is in the military field. There is no assurance, however, that a personality mentioned in a source document will necessarily be processed into a PI file.

c. Processing - Most PI organizations process an abstract, a page or the entire document into their files. The main file may be in the form of a dossier or a structured alphabetical file which can be approached directly or through a card or machine index. The file items may be photocopy, microfilm, multilith, typed abstract, or the original document. Because of the timeliness of some information (the same day or previous day) and the current nature of some requests, it is necessary either to process this information on a priority basis and get it into the file quickly or to arrange support files that will give a researcher quick access to this information. The file item may be indexed for a particular computer file at the same time it is processed into a manual PI file system. The personality name as it appears in a source document is often either incomplete or misspelled, and the name is researched and corrected wherever possible.
Routine processing time from selection of an item to filing the item will range from an average of seven to twenty days.

d. Dissemination - The dissemination of information from a PI file will usually be one of two types: ad hoc research on a specific request for information on personalities, or the production of biographic intelligence by the PI element itself. Examples of the latter are the biographic handbooks produced by CIA and DIA on high level personalities, Soviet Men of Science, Biographic Briefs, and the Directory of Soviets.

3. In order to analyze the biographic request activity, the team members from DIA, CIA, NSA, and State each exchanged a group of typical research requests. These requests could be grouped into the following categories: diplomatic and government; military; scientific and technical; subversive; foreign trade; business and international organizations. The requests involved either name searching, where the identification of or complete information on a named individual is requested, or name finding, where the name of the person(s) is either missing or so badly misspelled that research on the other data elements available, such as his position, location, organization or persons associated with him, is required.

4. The group arrived at the following conclusions as a result of its analysis of the requests and its discussion and review of the file systems.

a. PI requests are basically 20% name finding and 80% name searching. It takes more time to research a name finding request, particularly if identifying data in the request is incomplete. A name finding request may generate a list of hundreds of personalities of possible relevance. Many name searching requests require the analyst to use various name finding approaches.
If the requester wants a complete identification or biographic sketch on a person holding a government position or an organizational position, e.g., commander of the Moscow PVO district, General [...], it is necessary to check the records by organization. This will insure that any documents reflecting his change in the organization by position but not by name are examined, since they might provide the desired information.

b. A computer system that is developed to process PI information should provide the researcher with both name-searching and name-finding approaches. In a manual system this is usually accomplished by two file systems: a name file in which the personality is searched by his name, and files that are set up by the other data elements such as organization, location, occupation, etc. In a computer file of limited size, e.g., one or two magnetic tapes, where the maximum search time is fixed, a single file containing the name and all pertinent data elements may be adequate. This will not be true of a file system containing millions of personality records growing at the rate of a million records per year. If name finding approaches are not provided in a large system, the result may well be the development of a new group of subject-oriented files, either manual or computerized, similar to those that presently exist, to meet the needs of specific components of an organization.

c. Many PI requests are answered from materials that are not processed into the files, such as directories, working aids, etc., or from material too current to be in the file, such as today's newspaper. Some files are restricted by security classification as to what can be processed. Research in such a limited source file often gives incomplete or outdated information.
It is doubtful that any single file, whether computerized or manual, can ever be considered a complete or sole source for biographic information.

d. "On the head" name search (i.e., researching the name only as it is spelled in the request) cannot always be considered adequate in the PI areas. Such a search assumes that information will be found under the name spelling in the request; and since PI name spellings do not usually come from official sources, they are more likely to be incorrect than names found in those indexes where source data is controlled. As mentioned previously, an effort is usually made to correct the spelling before an item is filed, and the same effort is and must be made when performing research.

e. The PI request is often of a current and timely nature, requiring an answer within an hour, or even minutes, if it is to be useful to the requester. Routine requests are normally answered within a day. Some extensive research projects may involve thousands of names and require weeks or months to complete. The need for rapid response is one of the reasons a PI element often cannot rely on another agency to answer its requests, and hence one of the reasons for the overlap found in the various PI files. The present communications between agencies are not adequate for quick exchange of classified information.

f. There is an extensive but insufficiently coordinated effort in the intelligence community to produce or bring under control scientific information from open sources on the Soviet Union and Eastern European Communist countries. This activity results in the creation of a great deal of personality information on scientists at all levels of significance.

g. The community could benefit from a coordinated effort in the production of military biographic information from open sources.
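
The two access paths called for in conclusion 4b, and the strict "on the head" search of conclusion 4d, can be illustrated with a minimal Python sketch. It is not drawn from any agency system: the `Record` layout, the `BiographicIndex` class and the three finding fields are hypothetical, and the index is a toy held in memory. It keeps a name file for name searching and a second set of entries keyed on the other data elements for name finding.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """One hypothetical personality record; the field names are illustrative."""
    name: str
    organization: str
    position: str
    location: str


class BiographicIndex:
    """Toy index offering both access paths described in conclusion 4b."""

    FINDING_FIELDS = ("organization", "position", "location")

    def __init__(self):
        self._records = []
        self._by_name = defaultdict(list)    # the name file
        self._by_element = defaultdict(set)  # files set up by other data elements

    def add(self, record):
        rec_id = len(self._records)
        self._records.append(record)
        self._by_name[record.name.upper()].append(rec_id)
        for fld in self.FINDING_FIELDS:
            self._by_element[(fld, getattr(record, fld).upper())].add(rec_id)

    def name_search(self, name):
        """'On the head' search: the exact spelling given, no variants."""
        return [self._records[i] for i in self._by_name.get(name.upper(), [])]

    def name_find(self, **criteria):
        """Identify persons from data elements other than the name."""
        if not criteria:
            return []
        id_sets = [self._by_element.get((fld, value.upper()), set())
                   for fld, value in criteria.items()]
        matches = set.intersection(*id_sets)
        return [self._records[i] for i in sorted(matches)]
```

A request naming only an organization and a location would go through `name_find(organization=..., location=...)`, while a request carrying a spelled name would go through `name_search`. A misspelled name returns nothing from `name_search`, which is the weakness noted in conclusion 4d.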
5. It was not possible for the Team to consider specifically the relative merits of: (a) the improvement of the manual systems within each agency, (b) the potential for automating the index systems within each agency, and (c) the system efficiency that might be realized by instituting a machine-language communication system between the various agencies. These are tasks requiring management-supported feasibility studies, dominated by the professionals within each agency, in terms of the unique history and problems of each.

ANNEX 1

GLOSSARY

COUNTERINTELLIGENCE BIOGRAPHIC AREA: That activity which deals with information on personalities who constitute a known or possible threat to national security. These normally include members and agents of foreign intelligence services, Communist Party officials, and others engaged in organized subversive activities.

POSITIVE INTELLIGENCE BIOGRAPHIC AREA: That activity which deals with information on personalities, usually foreign, who are of general interest to the intelligence community. These include leaders in the scientific, political, governmental, economic, military, and other professional/governmental fields.

SECURITY BIOGRAPHIC AREA: That activity which deals with information held by those organizations which have the normal function of investigating and granting clearances on individuals or organizations. This activity includes information of counterintelligence interest in respect to the internal operations of the holding organization.

NAME FINDING: Searching to identify individuals from data elements other than the name, such as age, position, location, organizational affiliation, occupation, military rank, or nationality, including a combination of such factors.

NAME SEARCHING: Search of indexes or files organized by the names of persons to determine if information exists on the individual, or to validate basic information.

MAJOR NAME INDEX: Those personality indexes, in or associated with the intelligence community, which are large in size (several hundred thousand or more unit records) and which are regularly consulted on a routine basis by at least several of the intelligence community member agencies.

ON THE HEAD SEARCH: This consists of a name search on the exact spelling given. For example, a request on BURKE, Robert M. results only in a search in the index against the name BURKE, Robert M., and not any variation of the name. This is the strict interpretation, but some groups which operate biographic holdings in the intelligence community indicate that this definition might include, from the example above, such variations as BURKE, Robert (no middle initial); BURKE, Robert Meredith; BURKE, [illegible]; and BURKE, R. M. All are fairly well agreed that it would not include variants of the spelling of BURKE.

NO RECORD RESPONSE: This refers almost exclusively to name searching. It describes the situation where the check being made results in no information about the individual at the index level. This is the basis of the statistics reflected in column 15 of Annex 3. It does not reflect the situation where several possible identifications are made at the index level which, when later analyzed from file information, are determined to be different individuals, in which case a no record response still is returned to the requesting agency, nor the numerous cases in which one or more similarly named persons may possibly be identical with the subject of the request.

ANNEX 2

PROPOSED APPROACH TO THE MACHINE RECORDING OF PERSONAL NAMES

INTRODUCTION

1.
A USIB-endorsed approach to the machine recording of personal names is proposed, subject to the qualifications outlined below. The purpose in proposing the adoption of this approach is to insure that those agencies automating their indexes for name searching purposes, where continuing inter-agency exchange is involved, recognize the problems of identifying the elements of personal names in machine recording, and adopt similar, if not identical, logic in storing, maintaining and searching these name elements. This is necessary if the agencies concerned are to exchange, eventually, formatted queries via telecommunications facilities, for input to automated biographic indexes with little or no programmed format conversion and manual reprocessing.

2. In suggesting this approach, it is recognized that significant problems could confront those now using or developing manual or EAM indexes. It also is not intended to preclude the immediate adoption of electrical communications between agencies for speedier search request response.

3. The proposed approach is subject to the following qualifications and assumptions:

a. It is intended to apply only to those major PI, CI, and Security indexes consulted regularly on an inter-agency basis (e.g., Major NAC indexes, Biographic Register, NSA/CREF), though the approach to personal name recording should be of value as well to those developing internally-used index systems.

b. The approach assumes computer data recording and manipulation, as opposed to punched card systems (the rules can only apply to variable length records and computer programming techniques to manipulate data elements internally).

c.
The proposal assumes that the rules would be applied only at that point when an agency begins machine language preparation of new input for eventual computer operation, and is not intended to apply to existing punched card records which, however imperfect, may be the only means for converting an existing file to a computer data base.

4. It is felt that those agencies contemplating eventual conversion to computer search systems should evaluate the desirability of recording personal name and related identifying data in variable length input format for computer processing. This will accomplish the beginnings of a data base which will not require later keypunch conversion, provide means for manipulating and editing index information not possible in EAM or manual systems, and provide also the capability to print or punch index records as a byproduct to keep up manual and EAM systems during the interim stages.

5. Attached hereto is a description of machine recording techniques classified FOR OFFICIAL USE ONLY.

ANNEX 3

1. The index size refers to the actual number of index records (3 x 5 cards, IBM cards, logical records on magnetic tape, etc.).

2. The type of index record would include whether it is a 3 x 5 card, 5 x 8 card, IBM card, on magnetic tape (MT), in document form, etc.

3. The increase per year is the best possible estimate of the yearly change in the number of the index records during the next three years.

4. A multiple reference card is one which leads to more than one dossier, document, etc., by some reference mechanism such as a number.

5. The emphasis in this definition is on the word "predominately," with the understanding that probably all indexes being considered are mixed to some degree. The purpose of this item is to indicate in general terms whether an index mainly concerns U. S. citizens or foreign nationals.

7. A "request" means a requirement levied on the index, either by the organization internally or by another organization, for the checking of a name of a person. If the request is in the form of a list, each name counts separately; for example, the names of ten different individuals are considered ten requests.

8. The average number of searches per request indicates how many different ways on the average a request is searched. The searcher may look for a variation in the name, for example, E. J. Jones, Ed Jones, etc., or for the name variant in either the surname or other name elements (for example, Nicholas, Nichols, Nickols, Nickles, etc.). Some organizations may make one or both types of multiple searches on a certain type or percentage of requests.

9. This is the product of column 7 times column 8.

10. Maintenance searches include such activities as prechecks for any reason, the filing of new cards, the refiling of cards for any reason, activity involved in correction of cards, cards being placed or removed for the purposes of opening new cases, purging operations and any other index search or look-up which is not made directly as a result of a normal request as defined under item 7.

11. This is the summation of items 9 and 10. This item reflects the actual total number of searches performed by the reporting organization per day.

12. This is the percentage of the requests (item 7) on which no record or no identifiable information is obtained from a check of the index. It was recognized by the Team that many possible identifications made at the index level later result, after final analysis, in a no record or a no identifiable information response; but it was agreed by the Team that since this figure was not readily available, the best criterion for the purposes of this report would be the no record at the index level.

13. This percentage figure represents that proportion of total requests (item 7) which come from other agencies.

14. This represents the number of requests from other agencies as calculated from the percentage figure in column 13 times the request figure in column 7.

15. This percentage figure indicates the portion of requests from other agencies for which no record is found at the index level. The same criterion was used as for item 12.

16. This represents the number of external requests on which no record is found at the index level. It was computed from the percentage figure in column 15 times the number of requests in column 14.

BIOGRAPHIC INDEX FACTS SUMMARY

[Table not legible in the released copy. Legible legend: NA = Not ascertainable; lines 1-13 = CI/Security systems.]
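
The arithmetic in notes 9, 11, 14 and 16 can be restated as a short Python sketch. The class and attribute names are hypothetical, the figures used with it are invented for illustration rather than taken from the summary table, and it assumes (following note 11) that the request and search counts are daily figures.

```python
from dataclasses import dataclass


@dataclass
class IndexActivity:
    """Reported figures for one index, named after the Annex 3 columns."""
    requests: float                # column 7
    searches_per_request: float    # column 8
    maintenance_searches: float    # column 10
    pct_external: float            # column 13, in percent
    pct_external_no_record: float  # column 15, in percent

    @property
    def request_searches(self):
        # Column 9 = column 7 times column 8.
        return self.requests * self.searches_per_request

    @property
    def total_searches(self):
        # Column 11 = column 9 plus column 10.
        return self.request_searches + self.maintenance_searches

    @property
    def external_requests(self):
        # Column 14 = column 13 (percent) times column 7.
        return self.pct_external / 100 * self.requests

    @property
    def external_no_record(self):
        # Column 16 = column 15 (percent) times column 14.
        return self.pct_external_no_record / 100 * self.external_requests


# Invented example: 1,000 requests searched 1.5 ways each, plus 500
# maintenance searches; 80% of requests external, 90% of those no record.
example = IndexActivity(1000, 1.5, 500, 80, 90)
print(example.total_searches, example.external_requests, example.external_no_record)
```

Keeping the derived columns as computed properties means only the five reported figures have to be entered per index, which mirrors how the notes describe columns 9, 11, 14 and 16 as calculated rather than reported.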
Annex 2
Attachment 1

MACHINE RECORDING TECHNIQUES FOR PERSONAL NAMES

1. Described below are some of the problems involved in the recording, filing, and searching of personal names, together with suggested solutions. The problems in the handling of personal names by electronic data processing are dealt with specifically, and consideration is limited to large personal name indexes where (1) the point of retrieval is on name spelling; (2) the quality of name recording, i.e., spelling and/or completeness of name, cannot adequately be controlled, e.g., names recorded in newspaper articles, heard on radio broadcasts, copied from documents, or obtained from second or third hand sources whose knowledge of the name spelling and/or completeness may not be reliable; and (3) additional identifying information such as date and place of birth, occupation, etc., may not be consistently reported, and such specific numeric controls as social security number, military service number, drivers registration number, etc., do not apply. These conditions are found not only in the names recorded in an index, but also in the names received as requests for information.

2. The first problem in recording personal names is to define the basic order in which the name parts will be recorded. That is, shall the name be recorded in the English signature style (given names followed by family name) or in telephone book style (family name followed by given names)? If the index in question stores names of all nationalities (very few do not), either style of recording will require some rearrangement of name parts at the time of recording. For example, Hungarian and Chinese name signatures are quite different from the English signature style. That is, the Hungarian or Chinese name is usually written with the family name first, followed by the given names.

3.
Regardless of the recording style selected, it is important to define the various elements within a name and to identify them in some manner when they are recorded. The definition and identification of various name elements is necessary to (1) adequately describe recording rules to reporters and recorders as they apply to names of various nationalities, (2) facilitate accurate filing of the name records in the index, (3) permit accurate machine processing (sorting) for alphabetic listings, etc., and (4) facilitate storage and retrieval (search) of name records by computer.

4. Many different codes, symbols, characters, or fielding techniques may be used to identify various name elements. However, if a printed version of the name is to be read by persons not normally associated with the EDP environment, it is preferable to use common punctuation which can easily be interpreted by the customer, e.g., a period after a single alphabetic character to identify an initial as opposed to a single character name or particle.

5. Definitions of various name elements which should be identified when recording the name follow:

a. NAME: That word or combination of words used to identify a person.

(1) The minimum field length for recording the name should be forty characters. Although many names can be recorded in less than 40 characters, the truncation imposed upon lengthy names by, say, a 20 character limit often eliminates the very elements which provide discreteness. Such system-imposed restraint increases the number of name records which will be retrieved in a search. Additionally, it often imposes pre-input editing to be sure that critical elements of the name can be recorded in the field size allotted.
For example, the name Evangelica Concepcion Rodriquez y Gonzalez contains 42 characters including spaces and without any special characters to identify various name elements. The usual pre-input edit of this name would probably reduce it to RODRIQUEZ, EVANGELIC, thus making it impossible to distinguish this Evangelica Rodriquez from any other Evangelica Rodriquez. If the name were not pre-input edited, but merely truncated by the input typist or arbitrarily by the machine, the entry RODRIQUEZ Y GONZALEZ, EVANGELICA CONCEPCION would be truncated to RODRIQUEZ Y GONZALEZ, which is even less discrete. Forty characters permits recording of the family name and most of her given names, i.e., RODRIQUEZ Y GONZALEZ, EVANGELICA CONCEPC.

b. SURNAME: The word or words which comprise the element of a name commonly referred to as the "last name" or "family name," including initials, abbreviations, and particles (defined below) if reported as part of the surname. The surname is that element of the name which governs the primary position of a name in an alphabetic file. Surnames containing more than one word are referred to as "compound" or "multi-word" surnames.

(1) Because surnames often contain more than one word, and in view of the surname's basic importance to the filing and subsequent finding of the name record, it is necessary to identify which part of the complete name is the surname. In the examples which follow, the surname is printed first, followed by a comma to show the end of the surname. If some such method of surname identification is not used, surnames which contain more than one word cannot be distinguished from those with only one word followed by a first name.

Examples:
BROWNE, T. R.
CESPEDA Y LOPEZ, JUAN
KAMAL AL DIN, MOHAMED

c.
GIVEN NAME: The word or words in a name commonly referred to as the "first," "baptismal," "Christian," "middle," or "patronymic," etc. Initials and abbreviations are included. Given names dictate the alphabetic position of a name record within like surnames. Therefore, particles, titles, and telecodes (defined below) are not included in the definition of "Given Name."

(1) Whether the name parts being recorded are called "Surname and Given Name" or "Clan Names" or whatever, is irrelevant. It is important, however, to identify which word or words in a name are to be used as the primary storage or search element (Surname) and which are to be used secondarily (Given Name).

(2) Note, in the following list of names recorded without commas, that "compound" surnames cannot be distinguished by a computer from non-compound surnames and, therefore, the second word of the compound surname is likely to be used as a given name.

GARCIA LOPEZ JOSE        should be   GARCIA LOPEZ, JOSE
MAC DONALD HENRY         should be   MAC DONALD, HENRY
RODRIGUEZ L. JUAN        should be   RODRIGUEZ L., JUAN
ST. CLAIR ROMAN LUIS     should be   ST. CLAIR ROMAN, LUIS
STA. ANA RAUL            should be   STA. ANA, RAUL

d. PARTICLES: Particles include the articles (la, der, etc.), prepositions (de, von, etc.) and conjunctions (und, etc.), foreign equivalents of the English the, of, and, etc., which have not become an integrated part of the name.

(1) Particles are usually ignored in the filing of names because they may be different each time a name is reported and recorded, or may at times be completely absent. Therefore, if the particles were used in determining the alphabetic file position of the name, the same name would be filed in different places.
Examples:
GARCIA LOPEZ, JUAN
GARCIA (Y) LOPEZ, JUAN
GARCIA (E) LOPEZ, JUAN
(DE) GENNARO, GUISEPPE
(DI) GENNARO, GUISEPPE
GENNARO, GUISEPPE
KAMAL (AL) DIN, MOHD
KAMAL (UD) DIN, MOHD
KAMAL (EL) DIN, MOHD
KAMAL (ED) DIN, MOHD

(2) For the above reasons, it is important to identify those words in a name which are particles. When they have been properly identified, the computer processing of these names will be able to facilitate appropriate alphabetic sequence.

(3) For name searching purposes, it is particularly important that particles appearing in the given name field be identified (for example, by enclosing them in parentheses) so that they are not confused with given names.

Examples:
NASSIR, GAMAL ABD (AL)
SHARIF, ABD (AL) MOHD

e. TITLES: A descriptive name or appellation which denotes rank, office, or privilege, or is used as a mark of respect. The terms Jr., III, 2nd, Mrs., Miss, Colonel, Prince, etc., are included as titles.

Example: BROWN, JOHN /JR/

(1) In most files dealing with military personalities, rank is normally fielded separately. If titles are included in the name field, it is important that they be identified as such, so that they do not become confused with given names.

Example: SCHEINHEIMER, BARON should be SCHEINHEIMER, /BARON/

f. TELECODE: Numeric equivalent of ideographs used in Chinese, Korean, and Japanese writings. Some Japanese ideographs which have no numeric equivalent are represented phonetically, i.e., "KATAKANA." When the ideograph is illegible and/or the numeric equivalent is not known, the term NTA (No Telecode Available) is often used.

Examples:
TOJIMA, FUSANOSUKE /2073/*02701/2075/0037/6534/
LEE, WON-LOU /NTA/0029/0283/
CHAN, LI-SHU /7115/0173/0209/

(1) Each numeric or alphabetic set in the telecode should be separated from the other by some special character.
If the telecode is recorded in the name field, special characters should be used to identify it for potential special processing by the computer.

g. PREPARATION OF THE NAME FOR SORTING AND STORAGE:

(1) If characters other than alphabetic are used in the name, certain special characters should be removed for sorting purposes, creating a so-called "Pure Name." The internal creation of a sort name is necessary to assure accurate sequencing of names for alphabetic printing or storage. When the name is printed, the original input name field is used.

(2) If characters such as a hyphen or an apostrophe were allowed to remain in the name during a sort, the name HERNANDEZ-PELAGIO would be listed after the name HERNANDEZ ZERTUCHE. A search for O'BRIEN would find it listed before names beginning with OA and not in the OB part of the list as would be expected.

(3) Characters and special elements to be removed for sort purposes are:

(a) Particles - remove and left justify the remainder of the name.

(b) Hyphen - remove and insert space.

(c) Period - remove and left justify the remainder of the name.

(d) Comma - remove and insert an extra space code.

(4) Titles and telecodes included in the name field are sorted to numeric and/or alpha order. The virgules or other special characters enclosing these characters are also used in sorting and will provide the uniqueness required to place names embodying titles or telecodes after like names in the file without a title or telecode.

(5) Upon the removal and substitution of the foregoing, the name may be sorted accurately to alphabetic order. Note, in the following examples, the effect of the foregoing rules, especially with respect to compound names.
NAME AS PRINTED              NAME FOR SORTING
'AZIM, MOHAMED (AL)          AZIM MOHAMED
AZIM, MOHAMED AL             AZIM MOHAMED AL
GARCIA, MARIA                GARCIA MARIA
GARCIA-LOPEZ, MARIA          GARCIA LOPEZ MARIA
GARCIA (Y) LOPEZ, MARIA      GARCIA LOPEZ MARIA
O'BRIEN, JOHN                OBRIEN JOHN
O'BRIEN, JOHN /DR./          OBRIEN JOHN /DR./
(DE) SANTOS, JOSE            SANTOS JOSE
SMITH, J. X.                 SMITH J X
SMITH, J. XAVIER             SMITH J XAVIER
SMITH, ZELAYA                SMITH ZELAYA
SMITH-CORONA, JAMES          SMITH CORONA JAMES
STE. ANTON, GREGOR           STE ANTON GREGOR
STE-ANTON, GREGOR            STE ANTON GREGOR

10. The following, in summary, is the approach the Team recommends in the identification of name elements, with examples of the types of punctuation controls which may be used:

a. Record complete name elements in a consistent order, i.e., surname followed by given names, then by telecodes and/or titles.

Example: CHIANG, KAI-CHEK /1203/0009/7156/

b. Identify surname elements as opposed to given name elements, i.e., by placing a comma between the two elements.

Example: DOE, JOHN

c. Identify particles, i.e., by placing parentheses around them.

Example: GARCIA (Y) LOPEZ, JOSE

d. Identify titles and/or telecodes, i.e., by placing virgules around them.

Examples:
CHAN, WON LI /0148/0029/0173/
ROBBINS, CHARLES A. /JR./

e. Identify initials from one character names, i.e., by terminating them with a period.

Examples:
SMITH, J. L. ARMAND
Y, LI CHU (one character surname)
SANCHEZ R., JUAN

f. Allow sufficient space for recording the entire name. Forty (40) positions minimum are recommended.
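
The punctuation controls summarized above, and the sort-name rules of paragraph 5g, can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a specification from the report: the function and class names are hypothetical; apostrophes are dropped because the O'BRIEN example implies it, although the list in 5g(3) does not name them; and the "extra space code" for the comma is read as leaving two spaces after the surname.

```python
import re
from dataclasses import dataclass, field

FIELD_WIDTH = 40  # minimum name field length recommended in the text
PARTICLE = re.compile(r"\(([^)]*)\)")  # particles are enclosed in parentheses


@dataclass
class NameElements:
    surname: str
    given: str
    particles: list = field(default_factory=list)
    tagged: list = field(default_factory=list)  # titles and telecode groups


def parse_name(printed: str) -> NameElements:
    """Split a printed name into the elements marked by its punctuation."""
    head, slash, rest = printed.partition("/")
    # Everything from the first virgule onward is titles and/or telecodes.
    tagged = [t.strip() for t in (slash + rest).split("/") if t.strip()]
    particles = PARTICLE.findall(head)
    head = PARTICLE.sub(" ", head)
    # The first comma marks the end of the surname.
    surname, _, given = head.partition(",")
    return NameElements(" ".join(surname.split()), " ".join(given.split()),
                        particles, tagged)


def _pure(part: str) -> str:
    """Drop periods and apostrophes, turn hyphens into spaces, close up gaps."""
    part = part.replace("-", " ").replace(".", "").replace("'", "")
    return " ".join(part.split())


def sort_name(printed: str) -> str:
    """Build the internal 'pure name' used only for sequencing."""
    elements = parse_name(printed.upper())
    key = _pure(elements.surname)
    if elements.given:
        # The comma becomes an extra space, so the end of the surname
        # sorts ahead of any second surname word.
        key += "  " + _pure(elements.given)
    if elements.tagged:
        # Virgules are kept so that titled names file after untitled ones.
        key += " /" + "/".join(elements.tagged) + "/"
    return key


def fit_field(printed: str, width: int = FIELD_WIDTH) -> str:
    """Truncate a printed name to a fixed field width."""
    return printed[:width]
```

Under this reading, sorting the fourteen printed names in the table by `sort_name` leaves them in the order shown there: the double space places SMITH, ZELAYA ahead of SMITH-CORONA, JAMES, which a single space would not. `fit_field` reproduces the 20- and 40-character truncations of the RODRIQUEZ Y GONZALEZ example in paragraph 5a.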
ANNEX 4

DATA ELEMENTS USED IN BIOGRAPHIC SYSTEMS

[The column headings naming the individual systems are illegible in this copy. Marks are listed in the order they appear, without column alignment.]

 1  NAME                     [marks illegible]
    a  Title                 x x x x x x x x x x x
    b  Rank or Grade         x x x x x x x x x x
    c  Alias                 x x x x x x x x x x x x x
    d  Also known as         x x x x x x x x x x x x x
    e  Maiden                x x x x x x x x x
    f  Pseudonym             x x x x x x x x x
    g  Ming # telecode       x x x x x x x x
 2  Date of Birth            x x x x x x x x x x x x
 3  Place of Birth           x x x x x x x x x x x x x
 4  Nationality              x x x x x x
 5  Citizenship              x x x x x x x x x x
 6  Sex                      x x x x x x x x x x x x
 7  Race                     x x x x
 8  File number              x x x x x x x x x x x x x
 9  Social Sec. No.          x x x x x x x
10  Service No.              x x x x x x
11  Residence                x x x x x x x
12  Employment               x x x x x x x x x
13  Occupation               x x x x x x x x x
15  Personal Description     x x x
16  Spouse Data              x x x
17  Military Service         x x x x
18  Language                 x x F F F
19  Reference                x x x
20  Date of Card             x x x
21  Document Reference       x
22  Document Date            x x
23  Record Number            x
24  Location of File
25  Year Record Created      x x x x
26  Added Admin Elements     x
27  Localities               x x x
28  Phone Caller
29  Letter Writer
30  Added Info               x x x
31  Type of Case             x
32  Addresses                x x
33  Disposition              x x
34  Remarks                  x x x

(These elements are entered by each agency as available or required.)

ANNEX 5

THE NAME GROUPING APPROACH

1. The Name Grouping approach is designed to insure that a search of a name brings together all references to an individual although his name may have been recorded in various spellings and transliterations. This is accomplished by having linguists (native speakers) examine the name spellings recorded in a particular index in order to put names which belong together phonetically in a group which is then identified by a number. Thus,
when the index is searched, references recorded on any variant of a surname or given name are brought together through the pre-analysis and grouping by the language expert.

2. The purpose of the technique is to build into a given index system a one-time, professional linguistic analysis of each unique name spelling related to other phonetically identical name spellings on a purely pragmatic basis. That is, name grouping is concerned with the name spellings actually received by an organization, not with rules or theories on how names might have been, or ought to be, spelled. The primary advantage is to avoid a variety of search criteria by various index clerks.

3. Inherent in this technique is the logic for random access storage of biographic records in a computer system. The surnames and given names are used as computer dictionaries (tables) leading to all group index records on a given name variant in one storage area of a random access file.

EXAMPLES OF NAME VARIANTS

VARIANT SPELLINGS OCCURRING FROM TRANSLITERATION

FAR EAST

    [character] = Telecode 0491
    LIU = Mandarin
    LAU = Cantonese
    YU  = Korean
    RYU = Japanese

ND L YHM

    MUHI-AL-DIN
    MAHJOEDIN
    MAHAYIDEEN
    MAHYUDDIN
    MHIDINE
    MOHAYUDDIN
    MOHHDIN
    MOYIDEEN
    MOYIDEEN
    MOHIEDDIN
    MUHY-AL-DIN
    MUHYI-UD-DIN
    plus 25 more

    WAGE    = WOEGE, WERGE
    JANSEN  = JAANSEN
    NONEN   = NOONEN
    IANOZZI = JANOZZI, YANOZZI
    SNJDER  = SNYDER, SNIDER
    MENSKJ  = MENSKY, MENSKIY
    PETROW  = PETROV, PETROF
    FELDMAN = FELDMAN, FELTMAN, FELDTMAN

EXAMPLES OF SURNAME GROUPS

GROUP    NAMES
009546   IZJERMAN, EISERMANN
001712   MATZGER, METZGER, MEZHER, MAETZCHKER, METZKER, MEZGER
002914   CHLADEK, HLADIC, HLADIC, HLADIK, HLADK, HLADIK
002194   SCHUKOW, CHOUKHOV, DIUKOV, DZHUGOV, JOUKOFF, SCHUCHOW, SHUKHOV, YOUKOV, YOUKOVA, ZHJUKOV, ZHUKOV, ZHUKOVA
008687   ABOURGELI, RUJAYLAH
004739   FOGELER, VOGELER, VOGLER, WOEGELER

EXAMPLES OF GIVEN NAME GROUPS

GROUP    NAMES
Z00007   ABRAHAM, BRAHIM, EBRAHIM, IBRAGIM, IBRAHIM
Z00086   EDWARD, EDVARD, EDOARD, EDUARD, EDUART, EDVART
         SEE ALSO: ED. GROUP #Z00002, EDW. GROUP #Z00018
[group number illegible]
         EDWIN, EDVIN, EDWINS, EDVINE
         SEE ALSO: ED. GROUP #Z00002, EDW. GROUP #Z00018
Z00650   STEPHAN, STEVAN, STEVEN, ISTVAN, ETIENNE, ESTABAN, STEFAN, STEFA, STEVE, STEVO, STJEPAN

ANNEX 6

TERMS OF REFERENCE

A. OBJECTIVE

To identify means for improving the storage, retrieval and exchange of information from the major name files and related data files in the Intelligence Community.

B. FACT FINDING

1. Identify those significant index and related systems leading to biographic information collections in the government which are routinely consulted by intelligence agencies for their security, counterintelligence or foreign (positive) intelligence content.

2. Establish the following facts concerning each of the above:

a. Size: Number of index records (i.e., extracts of information, such as 3 x 5 cards, punched cards, magnetic tape records, disk records, strip records, etc., normally leading to documents and files), type and size of index records, single or multiple reference.

b. Emphasis on types of personalities covered: e.g., percentage of foreign vs. U. S. citizens, scientists, military, political, Communist Party, maritime, foreign intelligence services, agents, etc. This will include the "name finding" as well as the "name searching" activity.

c. Number of names searched daily: Percentage of positive and negative responses, depth of search on name variants.

d. Major requesters; proportion of requests from each.

e. Methods of communicating requests and responses: Forms, memoranda, teletape, transceiver, data phone; security classification of requests and responses.

f. Identifying data in conjunction with name normally included in index reference.

g. General description of input, maintenance and search processing.

h. Current requirements for submission of requests.

i. Classification of the index.

C. REVIEW

1. Examine costs, methodology and prospects for biographic systems now undergoing mechanization.

2. Identify basic problems to be faced and areas where policy decisions are required by each agency in planning for mechanization.

3. Identify those areas where format, methodology and equipment compatibility are required or are highly desirable in name searching or finding to obtain optimum speed, quality and economy in automating query and response.

D. RECOMMENDATIONS

Formulate recommendations for CODIB and USIB approval outlining policy objectives for the Community, with generalized projections of cost, manpower and time required to meet these objectives. Include specific guidelines for agencies to follow in systems planning and development.
ANNEX 7

MEMBERS OF CODIB TASK TEAM V - BIOGRAPHICS

CIA                   Mr. [name deleted]
                      Mr. [name deleted]
                      Mr. [name deleted]
DIA                   Mr. [name deleted]
                      Mr. John L. Keefe
STATE                 Mr. Mitchell Stanley
                      Mr. Halvor Eckern (Alternate)
ARMY                  Mr. Paul Anderson
NAVY                  Mr. Marvin E. Van Dera
                      Mr. William Urick (Alternate)
                      Mr. Earl W. McCoy
AIR FORCE             Lt. Col. Edmund M. Manning
                      Maj. Russell S. Keen (Alternate)
SECRET SERVICE        Mr. Frank G. Stoner
CSC                   Mr. Pearley G. Buck
CODIB Support Staff   [name deleted], Secretary

[Three further "(Alternate)" designations appear in the original; the names they belong to cannot be recovered from this copy. Names are grouped in the order they appear.]
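The dictionary logic described in paragraph 3 of Annex 5 can be sketched in a few lines. This is a minimal modern illustration, not the system the Team examined. The class `GroupIndex`, the helper `build_dictionary`, and the reference strings are hypothetical. The group tables are a subset of the Annex 5 examples, and the assignment of spellings to groups 002194 and 004739 follows a reading of a garbled table. Falling back to the literal spelling for names with no group is also an assumption.

```python
from collections import defaultdict

# Subset of the Annex 5 example groups: group number -> recorded spellings.
SURNAME_GROUPS = {
    "002194": ["SCHUKOW", "CHOUKHOV", "JOUKOFF", "SHUKHOV", "ZHUKOV"],
    "004739": ["FOGELER", "VOGELER", "VOGLER", "WOEGELER"],
}
GIVEN_GROUPS = {
    "Z00086": ["EDWARD", "EDVARD", "EDOARD", "EDUARD", "EDUART", "EDVART"],
    "Z00650": ["STEPHAN", "STEVAN", "STEVEN", "ISTVAN", "STEFAN"],
}


def build_dictionary(groups):
    """Invert {group: [spellings]} into the lookup table {spelling: group}."""
    return {name: group for group, names in groups.items() for name in names}


SURNAME_DICT = build_dictionary(SURNAME_GROUPS)
GIVEN_DICT = build_dictionary(GIVEN_GROUPS)


class GroupIndex:
    """Stores references under (surname group, given name group)."""

    def __init__(self):
        self._records = defaultdict(list)

    @staticmethod
    def _key(surname, given):
        s, g = surname.upper(), given.upper()
        # A spelling with no group is filed under itself (assumption).
        return SURNAME_DICT.get(s, s), GIVEN_DICT.get(g, g)

    def add(self, surname, given, reference):
        self._records[self._key(surname, given)].append(reference)

    def search(self, surname, given):
        # Every spelling in a group resolves to the same storage key.
        return list(self._records.get(self._key(surname, given), []))


index = GroupIndex()
index.add("ZHUKOV", "STEFAN", "REF-1")    # hypothetical references
index.add("JOUKOFF", "STEVEN", "REF-2")
print(index.search("SCHUKOW", "ISTVAN"))  # ['REF-1', 'REF-2']
```

Because the linguistic judgment is stored in the tables, the search applies no phonetic rule of its own. This matches the stated aim of avoiding differing search criteria among index clerks.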
