Approved For Release 2001/07/12 : CIA-RDP78-04727A000200250043-7
CIA AUTOMATIC DATA PROCESSING STAFF
PROJECT [redacted]
DOCUMENT/INFORMATION RETRIEVAL SYSTEM DEVELOPMENT TASK
PHASE I OUTLINE REPORT
28 June 1963
Preface
This outline report deals with the document/information retrieval
system development element of Project [redacted]. It reflects thinking
at the end of Phase I of the system development task.
The report covers:
(1) The results of [redacted] fact-finding throughout
the DD/I;
(2) The conclusion that a major central reference
system is required;
(3) The initial concept of a new central system;
(4) A suggestion to management that a base document
indexing system be urged upon the
intelligence community and that this indexing
function be performed once and centrally for
the members of the community;
(5) The plan for proceeding with the detailed
development of a new document/information
retrieval system (through Phases II & III);
(6) A set of general observations of particular
interest to management;
(7) Major alternatives open to management; and
(8) ADPS recommendation.
Note: [redacted] has produced several "depth" papers for its own purposes
which elaborate on the contents of this outline report. These
papers are available in ADPS to persons wishing to peruse them.
Contents
I. Document/Information Retrieval System Development Task
A. Four Phases of System Development Task
B. Phase I
1. Fact-Finding
2. Central vs De-Centralized System
3. [redacted] System Concept
4. An Intelligence Community Task, Ideally
5. General Plan for Proceeding with CHIVE System Task
C. Phase II
D. Phase III
E. Phase IV
II. General Observations
III. Alternatives
IV. Recommendation
I. Document/Information Retrieval System Development Task
A. Four Phases of System Development Task:
Phase I - Fact-Finding and Formulation of the Overall
Concept of the New System
(Sept 62 - June 63)
Phase II - Detailed Systems Design
(July 63 - June 64)
Phase III - Implementation of Initial Segment
(July 64 - April 65)
Phase IV - Implementation of Additional Increments
(May 65 - ?)
B. Phase I
1. Fact Finding
a. General
Personnel Conducting the Survey:
4 ADPS
4 [redacted]
Scope
All Offices of the DD/I
150 + components studied
Fact-finding reports prepared on each
Major Targets of Fact-Finding
(1) Missions and functions of DD/I components
(2) Information sources used
(3) Internal processing and files (internal
to Branch, etc. visited)
(4) Use and evaluation of external files
(5) Reports produced
(6) Information needs and problems
Survey Completed April 1963
b. Major Factors Bearing on System Development Task
Volume of Document Receipts
Multiplicity of DD/I Missions and Interests
Variety and Depth of Info Required from these
Documents
Variable Time Requirements:
For basic intelligence research
For programmed, shorter-length research
For current intelligence
Trend toward Current Reporting
c. DD/I Information Resources (Present System) Composed of:
Analyst Files (para. d immediately below)
Central Info System (OCR) (para. e)
Dissemination Services (para. f)
Other Internal and External Services (para. g)
d. Analyst Files
The Analyst Files are, in fact, the primary DD/I
info retrieval system in terms of:
Use rate
Response time
Indexing and content to meet analyst
specifications
Major Uses
To check validity of new data and to determine
its effect on what is already known.
To handle immediate, short lead-time ad hoc
queries. Basis for more leisurely research,
also.
Major Strengths
Readily accessible
Contain filtered data (reflects specialist/user
judgment)
Tailored to analysts' needs (topic, sequence,
and index control)
Ability to control subjects (concepts) according
to the specific requirements of the analyst
Major Weaknesses
Data control largely limited to current interests
Not readily manipulated
Limited and partial historical depth
Not ideally accessible to other analysts
Organizations, personalities, areas not
easily controlled
Duplicative processing among DD/I components
File maintenance detracts from analytic time
e. Central System (OCR)
General Role - Back-up to Analyst Files for:
Historical depth
Gaps in analyst file coverage
Routine, long lead time requests
Major Uses
To provide comprehensive recovery for long lead-time
research projects
To provide retrieval of data not controlled in
analyst files
To provide comprehensive storage and retrieval
on organizations, personalities, areas
Major Strengths
Provides historical depth (institutional memory)
Comprehensive topic and area coverage
Multi-access to documents, e.g., date, source,
topic, area, etc.
Backstops intelligence gaps in analyst files
Document repository
Major Weaknesses
No single point for all-source retrieval
Outputs from multiple points not compatible
Insufficient emphasis given to open literature,
[redacted], and cables
Not sensitive to shifts in intelligence sources
and priority interests
Inadequate geographic coordinate retrieval
Duplicative processing
f. Dissemination Services
Manual system
Minimum of 120 man years/year (rough estimate)
One million unique documents/year
10-15 million multiple copies/year
150-200 components served with specific reading
requirements
General analyst satisfaction
Timely and accurate
Inefficient and costly
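As a rough check on scale, the dissemination figures above can be reduced to per-unit rates. The sketch below is illustrative arithmetic only: it uses the estimates quoted in this section, and the variable names are assumptions introduced here, not terms from the report. Because 120 man-years is stated as a minimum, the documents-per-man-year figure is an upper bound.

```python
# Figures quoted in the outline (all are rough estimates in the source).
UNIQUE_DOCS_PER_YEAR = 1_000_000
COPIES_PER_YEAR = (10_000_000, 15_000_000)  # (low, high)
MAN_YEARS_PER_YEAR = 120                    # stated as a minimum
COMPONENTS_SERVED = (150, 200)              # (low, high)

# Average number of copies made of each unique document.
copies_per_doc = tuple(c / UNIQUE_DOCS_PER_YEAR for c in COPIES_PER_YEAR)

# Unique documents handled per man-year (upper bound, since 120 is a minimum).
docs_per_man_year = UNIQUE_DOCS_PER_YEAR / MAN_YEARS_PER_YEAR

# Copies delivered per component per year:
# low = fewest copies spread over most components, high = the reverse.
copies_per_component = (
    COPIES_PER_YEAR[0] / COMPONENTS_SERVED[1],
    COPIES_PER_YEAR[1] / COMPONENTS_SERVED[0],
)

print(copies_per_doc)            # copies per unique document (low, high)
print(round(docs_per_man_year))  # unique documents per man-year
print(copies_per_component)      # copies per component per year (low, high)
```

On these figures each unique document is reproduced 10 to 15 times, which gives a sense of why the manual system is described as both timely and costly.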
g. Other Information Retrieval Services
[redacted], Agriculture, etc.
Published bibliographies and indexes: Monthly Index
of Russian Accessions, Referativnyy Zhurnal, ASTIA
Technical Abstract Bulletin, etc.
Files of other agencies: FTD/AFSC (White Stork),
Dept. of Commerce, NSA, etc.
Map Library, NPIC, [redacted], RPB/[redacted],
RID/DDP, etc.
Analyst chatter
2. Central vs De-Centralized System
This is a major decision area for both systems design
and management.
A decision for a de-centralized system would mean the
up-grading and coordination of the Analyst File complex with
near-total dependence upon same and the correlative curtailment
of the central system to a very low use, very slow
response, essentially archival role.
On the other hand, a decision for the continuation of
an up-graded central system, in addition to the Analyst File
system, means that heavy expenditures for a central system
will not only continue but undoubtedly increase, that the
effort to devise an improved central system must continue,
and that eventually the resultant advanced system must be
implemented and the cost and commotion of doing so accepted.
a. De-Centralized System (Analyst Files)
Provides primary support to intelligence production
Proven in practice
Reflects user needs and judgments
For majority of uses, is preferred by analysts.
(Will always exist to some degree.)
Integrated sources (within clearance-level of analyst)
"Personalized" files
Difficult for others to use
Lack continuity and consistency
Difficult to manipulate
Coverage of all orgs., persons, and areas, etc. not
feasible
Number and size would increase without central system
b. Centralized System
[redacted] concludes a central system is a long-run "must"
for systematic doc/info control
If improved, would:
Have higher use rate... thereby increasing the
return on expenditures; and
Make inroads into present Analyst Files...
thereby helping to offset costs
If accepted as a base index system for the Intelligence
Community (see para. IB4 below), the [redacted] system
would undoubtedly pay for itself several times over.
3. [redacted] System Concept
a. Very simple to say:
Central, integrated, machine-supported system to
provide document and information retrieval for the
total DD/I document flow.
All geographic areas
All topics (persons, places, things, organizations,
subjects)
Depth indexing
Direct entry to files (input or querying)
Single-processing of input
Single-point retrieval
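The concept above (index each document once, then retrieve from a single point by any access key) can be illustrated with a small in-memory index. This is a minimal sketch under stated assumptions, not the design of the system described here: the class name CentralIndex, the field names (area, topic, source), and the sample document ids are all hypothetical.

```python
from collections import defaultdict


class CentralIndex:
    """Toy single-point index: one input pass per document, many access keys."""

    def __init__(self):
        self._docs = {}
        self._postings = defaultdict(set)  # (field, value) -> set of doc ids

    def add(self, doc_id, **fields):
        """Index a document once under every field value supplied."""
        self._docs[doc_id] = fields
        for field, values in fields.items():
            if isinstance(values, str):
                values = [values]
            for value in values:
                self._postings[(field, value.lower())].add(doc_id)

    def query(self, **criteria):
        """Return sorted ids of documents that match all given criteria."""
        result = None
        for field, value in criteria.items():
            hits = self._postings.get((field, value.lower()), set())
            result = set(hits) if result is None else result & hits
        return sorted(result or [])


# Hypothetical sample entries, for illustration only.
index = CentralIndex()
index.add("D-001", area="Region A", topic=["organizations", "persons"],
          source="open literature")
index.add("D-002", area="Region A", topic="subjects", source="cables")

print(index.query(area="region a"))                   # both documents
print(index.query(area="Region A", topic="persons"))  # only D-001
```

Each document passes through input once, and every query goes through the same entry point regardless of which key is used.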
Machine-assisted input (incl. [illegible] auto indexing)
[illegible] indexing
[illegible] dissemination
[illegible] random access capability
[illegible] machine translation/Stenowriter
capability
Experimental remote inquiry or display
Intermediate System (1966-1967)
Large hardware complex/some advanced hardware
[illegible] indexing of hard copy
Some automatic indexing of machine language
sources
Some character recognition (experimental)
Limited remote interrogation and display
Some automatic dissemination
Volume machine translation
Target System (1968 - ?)
Very large and advanced hardware complex,
including extensive random access capability
Automatic indexing for major portions of
base recovery system (incl. character
recognition)
Human indexing for special info retrieval
projects
Remote interrogation and display
Automatic dissemination
Volume machine translation (improved quality)
(1) Document storage and retrieval
(a) Persons, organizations/installations, and geographic
locations to be stressed
[illegible] of most universal interest to
Analysts
Weakest links in Analyst Files
Strongest elements of present Central
System
Volume beyond proper handling via Analyst
Files
(b) Commodities and Subjects to be covered with
less emphasis
Not priority need
Limited use in central system
Analyst Files handle concepts (Subjects)
better
(2) Information Storage, Manipulation, and Retrieval
(a) Correlative to Document Index System via:
Index display
Synthesis and summarization of index entries
(b) Special Projects (Language Processing), such as:
Strategic Facilities Project
[redacted] Project
(c) Major Automated Information System
Subject: Targets
Scope: World-wide
Inputs: Machine language files external
to [redacted]
[redacted] index data (selected)
Special inputs designed for this
system (For elaboration, see [redacted]
paper, same subject, dated 2 May 63)
(d) Computation (Numerical Processing), such as:
[redacted]
(3) Non-liter[illegible] Processing, such as:
[redacted]
(4) Machine Translation/Stenowriter
(5) Publication Support
(Use of computer for composing, typesetting, etc.)
d. [redacted] troubled by size of task
(1) Complexity of system design
Balanced handling of such variety and volume
Accomplish objectives without undesirable
consequences
(2) Hardware/software limitations
(3) Costs - personnel and budgetary
e. Full solution will require:
(1) Development of new techniques
Index, dissemination, abstract, display, input/
output, etc.
(2) Development of new hardware
Memory, input/output, character readers, etc.
(3) Money and people
Major investments during developmental years.
Savings in long run?
4. Ideally, an Intelligence Community Task
a. Ideal approach [illegible]: task could be done centrally
for the Intelligence Community
(1) Community [illegible]
(a) Design and develop centrally a base doc/info
[illegible] system for use by community members
(b) Index centrally all docs collected/originated
by Intelligence Community
Some decentralized input but [illegible]
to base system
Some special-purpose, limited-interest
categories excepted
(c) Provide the retrieval index, or suitable
portions, to community members
(d) Output servicing to be performed by individual
members for its local users
Base system - provided by central organization
Special files, as required - built and
maintained by individual members
Some output servicing provided by central
organization
(e) Initially: aoc/info indexing and retrieval
(f) Eventually: translation, requirements control,
etc.
(2) Executive Agent - CIA (or Intelligence Processing
Center under USIB)
CIA has most experience in large-scale document
systems
CIA has best/largest personnel base
CIA already started towards such a system via
[redacted]
CIA must do so anyway for its own needs
[illegible] management to take
[illegible] interest to CIA,
[illegible] should respond with real
[illegible] to such an idea, and
[illegible] one system [illegible] to real world
c. Fund and shape external R&D of hardware/software if
commercial development of same is not adequate... ([illegible] - ?)
Must have new capabilities to accommodate growth
of system
Requirements will be clarified during system design
d. Implement initial segment of new system...(July 64 - April 65)
e. Expand coverage of new system...(May 65 - ?)
C. Phase II - Detailed Systems Design (July 63 - June 64)
1. Personnel:
a. ADPS - continuing from Phase I
b. Contractor ([illegible]) - continuing from Phase I
c. OCR -
[illegible] middle-level [illegible] from OCR to
work full-time on Phase II. This [illegible] would:
Receive training in EDP
[illegible]
Integrated output
Postures the central system to grow with EDP (where future
machine-support capabilities lie)
Eventual automation of some functions now done manually
C. Functions of OCR Affected/Not Affected by [redacted] System
1. Affected:
Indexing and retrieval
Machine support
Dissemination
Document storage and retrieval
Photo storage and retrieval
Extracting/abstracting services
Publications procurement accounting and control
2. Not Affected:
Book cataloging and shelving
Publications and [illegible] procurement
Library reference and circulation services (non-document)
Distribution services, i.e., the mailroom functions
Motion picture presentations
Liaison Staff
Historical Intelligence Collection
D. Organizational Effects on OCR
Interim - New system will slowly absorb people and functions
Constitute new element; traditional elements
continue
Eventual - Present OCR Divisions will largely disappear
Input Divisions within [redacted] will be organized by
Geographic Region
Service Division
Systems Development Division
Programming Division
Computer Operations Division
New non-[redacted] Division(s) for non-[redacted] functions
E. Schedule of Effects on OCR
Phase I - Fact-Finding and Systems Concept... (Sept 62 - June 63)
Effect: None
Phase II - Detailed Systems Design... (July 63 - June 64)
Effect: None, except OCR System Trainees join
with [redacted]
Phase III - Initial Implementation... (July 64 - April 65)
Effect: [illegible] index, reference, and punch
personnel phase over to new system
Conversion of old to new files
([illegible] limited)
[illegible] old system
[illegible]
Phase IV - Expansion of [redacted]... (May 65 - ?)
Effect: File maintenance/index/reference
from IR/DR/GR/DD/LY
(Intellofax) phase into new system
[illegible] personnel in MD phase over
File conversion accomplished (limited)
Unconverted portions of old system
continue operations
F. Single Service Point Idea
Implementation of initial segment of [redacted] adds one more--
unless OCR develops now a single service point to tap for the
consumer all pertinent OCR resources.
G. Organization of OCR by Geographic Region Prior to Implementation
of [redacted]
Organization of OCR by Region before [redacted] implementation
would foster development of single OCR service point, would
lead to [redacted] increments by Region as well as source, and
would facilitate successive expansions of [redacted].
H. State-of-the-Art Implications
Conventional human indexing pushed to limit
EAM support pushed to limit
EDP offers hope through new capabilities
Even with EDP, R&D in hardware and software a "must" to
expand capabilities to meet expanded requirements in [redacted]
Phase IV.
Machine indexing inferior to human indexing today
But offers speed, consistency, and eventually perhaps
comparable quality
Total document retrieval system for DD/I appears not feasible
with today's equipment
Eventual DD/I system will be based on next 3-5 years of
implementation experience and on [illegible] industry
I. Budgetary Implications
Development and implementation costs will be heavy
Hardware development (Government R&D support may be
required)
Systems/Techniques Development (Government support almost
certainly required)
Parallel Systems Operation
Conversion
Eventual system more economical per item of data controlled
J. Manpower Implications
By single input handling of documents, hope to gain manpower
to permit:
Deeper indexing
Broader coverage
Greater effort on output
K. Conversion Implications
It is desirable to convert present OCR machine files, if
feasible. EAM data may not be compatible with EDP files,
however
--A study question for Phase II
L. Security Implications
"All-Source" clearance for all personnel operating the CHIVE
system
system
25X1A2g I
Approved For Release 2001/07/12 : CiA--RDP78-04727AO00200250043-7
Approved For Release 2001/07/ 000200250043-7
([illegible] security classification code,
however)