(UNTITLED)
Document Number (FOIA)/ESDN (CREST): CIA-RDP83-00714R000100410001-1
Release Decision: RIPPUB
Original Classification: K
Document Page Count: 11
Document Creation Date: December 20, 2016
Document Release Date: October 18, 2007
Sequence Number: 1
Publication Date: May 7, 1976
Content Type: REPORT
Approved For Release 2007/10/19: CIA-RDP83-00714R000100410001-1
NSA review
completed.
USAF review(s)
completed.
STAT
STAT
DIA review(s)
completed.
Acting Director for
Intelligence
Room 7E44 Headquarters
Chief, DDI Management Staff
Room 2F28 Headquarters
Here is the final report of the
Department of the Air Force Working Group
on Machine Translation, on which [redacted]
and [redacted] of FBIS represented CIA.
I was the Intelligence
Community Staff representative.
You might care to glance at the
multiple mentions of FBIS (marked in red)
and of the recent FBIS-sponsored Machine
Translation Seminar.
The Working Group recommends
approximately [amount not shown] in R&D funding for
machine translation over the next 6 years.
Other conclusions and recommendations can
be found on pp 9-10.
Director, Foreign Broadcast
Information Service
MACHINE TRANSLATION
PROBLEM: Assess the effectiveness and utility of machine translation of textual
materials for the intelligence process.
EXPLANATION OF TERMS:
- Human Translation (HT): The process by which a human transfers the meaning
from the linguistic pattern of a source language to the linguistic pattern of an object
language.
- Machine-Aided Translation (MAT): A computer-based system for the storage
and retrieval of lexical information, designed to assist and improve the proficiency
of the human translator.
- Machine Translation (MT): A computer process which accomplishes the translation
function without human assistance except in a post-edit function, as required for
quality improvement. Current MT systems have lexical, syntactical, and semantic
components which produce translations of a given level of quality. Based on the
amount of human post-editing required to achieve a level of quality acceptable for
publication as a finished product, MT systems can be characterized as second or
third generation. Second generation systems require 25-30 percent post-editing,
whereas a third generation system would require 5-10 percent post-editing. A third
generation system requires more sophisticated lexical, syntactical, and semantic
components, plus techniques to accommodate contextual and pragmatic information,
in order to resolve problems of style and ambiguity in journalistic/literary prose.
Finally, automation of input to the MT process must be treated as part of an MT system.
- Scientific and Technical (S&T) Prose: A style of writing largely constrained
and regularized by a need to present a logically coherent discourse. It is, therefore,
amenable to the logical formulations required by second generation MT systems.
- Journalistic/Literary Prose: A style of writing that lacks precision and conciseness
and is largely unconstrained and informal. It is characterized by colloquialisms,
idioms, proverbial expressions, metaphor, and other literary devices. Syntactic
complexity is exemplified by structural inversions, ellipsis, and grammatically
incomplete utterances. A third generation MT system must be able to solve fundamental
linguistic problems which obscure meaning and comprehension in order to provide a quality
translation of journalistic/literary materials.
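To make the second- versus third-generation post-editing figures concrete, the sketch below computes how many words a post-editor would revise in a document of a given length. It is purely illustrative arithmetic: reading "percent post-editing" as the share of machine output words that must be revised is an assumption (the report does not define the measure), and the function name `post_edit_range` is hypothetical. The 75,000-word length is the report's average for a Soviet Military Thought volume.

```python
# Illustrative only: treats "percent post-editing" as the fraction of MT output
# words a human editor must revise. This interpretation is an assumption.

GENERATIONS = {
    "second generation": (0.25, 0.30),  # 25-30 percent post-editing
    "third generation": (0.05, 0.10),   # 5-10 percent post-editing
}


def post_edit_range(doc_words: int, low: float, high: float) -> tuple[int, int]:
    """Return the (min, max) number of words expected to need post-editing."""
    return round(doc_words * low), round(doc_words * high)


if __name__ == "__main__":
    book_words = 75_000  # average Soviet Military Thought volume cited in the report
    for name, (low, high) in GENERATIONS.items():
        min_words, max_words = post_edit_range(book_words, low, high)
        print(f"{name}: {min_words:,}-{max_words:,} words to post-edit")
```

Under this reading, a third generation system would cut the editing load on such a book from about 18,750-22,500 words to about 3,750-7,500 words, roughly a three- to five-fold reduction when comparing like ends of each range.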
OPR: AF/INYXP (Maj Baldauf/764'0)
7 May 1976
FACTS:
- Exploitation of foreign language materials can provide valuable intelligence
regarding capabilities and intentions.
- Current MT systems are useful and cost-effective for translation of S&T materials.
- Current MT systems rest on a conceptual base that is now 20 years old.
- Current input techniques are a major constraint on cost-effective and timely
MT production.
- Principal emphasis is on the translation of Russian into English.
- A substantial volume of Russian material of potential intelligence value remains
untranslated.
- The quality of human translation varies considerably and human translator pro-
ficiency must be improved.
- Manpower and fiscal constraints preclude a substantial increase in human translator
resources in the Department of Defense.
- Automated techniques can improve the quality and efficiency of the human
translation process.
- Indicative translations and keyword applications have utility in the intelligence
process.
- The current state-of-the-art of MT cannot produce high quality journalistic/literary
output without significant post-editing.
ASSUMPTIONS:
- There has been considerable development in computer technology and linguistics,
such that a new synthesis is possible which might produce improved translations
on a more cost-effective basis.
- Continued enhancement of currently operational MT systems will reach a point
of diminishing returns; a third generation MT system requires an advanced technological
approach.
- Machine-aided translation techniques and advanced Optical Character Reader
(OCR) technology afford the greatest potential for improving the quality and efficiency
of the translation process in the near term.
- Development of a third generation MT system is a long-term evolutionary process
best pursued by support of promising technological approaches.
CRITERIA:
- To have utility to the intelligence process, an MT system must:
- Provide more timely translations.
- Offset a lack of human translation expertise.
- Be competitive with human translation.
- Be responsive to the needs of the consumer with regard to completeness
and quality.
BACKGROUND:
1. Efforts to apply automated data processing (ADP) techniques to the problem
of language translation have been under way for over 20 years. The Air Force Systems
Command's Foreign Technology Division (FTD) has developed a large-scale MT
system for the translation of Russian-language scientific and technical literature
which satisfies the needs of FTD's analysts. Other initiatives in the Intelligence
Community have addressed dictionary development and keyword selection. All such
efforts are aimed at improving the capability to exploit foreign language materials
for intelligence purposes and at providing a more responsive method for exchanging
data with other nations.
2. The requirement to translate an increasing amount of contemporary Soviet military
and socio-political material has generated renewed interest in MT. As a result
of initiatives by the Deputy Assistant Secretary of Defense (Resources and Management)
and by the Assistant Chief of Staff, Intelligence, USAF, the Defense Intelligence
Agency (DIA) conducted a preliminary survey of community translation requirements
and needs for MAT and MT. This survey was provided to the Air Force by DIA letter,
subject: "Survey of Machine Translation Requirements," 23 Jan 1976.
In addition, the Foreign Broadcast Information Service sponsored a seminar on machine
translation, 8-9 Mar 1976. This seminar was attended by a wide variety
of experts from government, industry, and the academic community and provided
a valuable forum for the exchange of information concerning the current state-of-the-art
and potential of MAT and MT.
3. In the Intelligence [portion partially illegible in source] Defense Guidance
Memorandum (PPGM) of 13 Feb 1976, the Deputy Secretary of Defense tasked the
Air Force to: "chair studies which will determine current state of development
of machine translation and its usefulness in the intelligence process." The PPGM
further tasked the Army, Navy, NSA and DIA to participate in this effort and invited
CIA to send representation. If the study determined that automated translation
systems are efficient and economical, those findings, together with proposed resource
levels for a five-year program, were to be furnished in a report to the Assistant
Secretary of Defense (Intelligence).
4. To accomplish the PPGM tasking, on 19 Mar 1976 the Air Force convened
a meeting of senior representatives from the Services and Agencies concerned to
develop terms of reference for the study. A series of working group meetings was
held in March and April to evaluate the state-of-the-art, refine Service/Agency
requirements, and develop a program for MAT/MT. The DIA MT Survey and the data
acquired through the FBIS Seminar provided valuable material for consideration
by the MT Study Group. This report sets forth the findings and proposals of the
Study Group.
CURRENT SITUATION:
1. FTD is the only DOD organization currently employing a large-scale MT system.
The system is used successfully for translation of S&T materials. Output is provided
in an unedited, partially edited, or fully edited version, depending on the require-
ments of the consumer. The system can provide indicative translations of journalistic/
literary material but has never been optimized for such prose. FTD is also pursuing
initiatives to improve the efficiency of the input and post-edit processes.
2. Other agencies employ human translation and rely principally on the Joint
Publications Research Service (JPRS) and/or commercial vendors. Some of this
translation support is so inferior in quality that considerable additional editing
is required before publication of the finished product.
a. Two major translation efforts are conducted by the Air Staff (Directorate
of Threat Applications). They are: Monthly Soviet Press Translations and the
Soviet Military Thought series, generally published in book form. These projects
require translations to be in high quality, idiomatic English. The Soviet Military
Thought series is an open-ended project, and each book averages 75,000 words in
length. The current average monthly volume of Soviet press translations is 13,000
words. The Air Staff has identified requirements for translation of 13 additional
Soviet journals and newspapers, totalling 14,000 pages annually.
b. With a present estimated human translation (HT) capability of 17.5 million words
from Russian per 12-month period, the Army OACSI is a long way from requiring MT
(the FY 75 requirement, produced from Russian by HT, was 6.[illegible] million words).
With an expansion of the currently small reserve-officer translation program, and the
assignment of more projects of relatively less difficulty to the Regular Army linguist
units at Forts Bragg and Hood, an HT production [text illegible in source] quite
feasible by FY 78. Although 65-70 percent of the Russian material translated may
be considered S&T (75-80 percent for all languages), such material usually comes
in written context. With the frequent Intelligence Community and DOD-wide
dissemination of OACSI translations, camera-ready copy requiring terminological
accuracy, readability and graphics is ordinarily imperative. An ever-growing roster
of qualified commercial translation sources with subject specialties generally enables
immediate assignment of, and good to excellent turnaround time for, translation
projects.
c. The Translation Services Division of the Naval Intelligence Support Center
translates a total of approximately 5 million words per year of carefully screened
foreign literature. High quality is desired. The breakdown by subject is as follows:
S&T, 3 million words; naval, 1.5 million words; other, 750,000 words. Russian
accounts for 70 percent of the total volume, with Japanese, German, French and Italian
accounting for most of the remaining 30 percent, in descending order. The Division
regularly exploits 40 high-yield periodicals and newspapers and prepares abstracts
and translations of the tables of contents of about 220 books per year, which in turn
generate requests for translations. Work is farmed out to individual translators
whose product must meet Division quality standards. Individual consultants are
brought in as needed to provide services in languages not represented in staff
capabilities. The Translation Services Division operates a Reserve Translation Project
which uses the linguistic skills of 70 Reserve officers and enlisted men in Russian,
German, French, and Spanish. Applicants for the program must pass a difficult
test before they are accepted for participation. Current Navy MT-related
initiatives are directed toward the development of specialized lexical aids. Navy
requirements for socio-political literature are almost completely satisfied by the FBIS
exploitation effort. Navy estimates that about 2 million words per year would be
translated for intelligence exploitation if additional resources were available. That
translating would represent "nice-to-have" material and would not have to meet
Navy's quality standards.
d. DIA supports the MT efforts at FTD within the general area of the DOD
Scientific and Technical Intelligence Information Support Program (STIISP). In ad-
dition, DIA is presently translating, or having translated for it, a total of approxi-
mately one million words per year. If additional translation capability were developed,
DIA estimates that this requirement would increase to 1.9 million words per year.
e. currently produces 235,000 pages (100 million words) annually. Sixty
languages are involved, with Russian accounting for 45 percent of the total
workload. Approximately one-third of the total effort involves S&T material; the
remaining two-thirds involves political, military, economic, biographic, and
sociological material. FBIS maintains an in-house staff of linguists and draws on a roster
of about [number not shown] translators under contract to JPRS. All translations must be of literary
quality. FBIS maintains that adequate human resources are available to satisfy
its requirements. MAT techniques can materially improve the quality and efficiency
of human translation, and initiatives are being pursued in the development of lexical
aids. [Text illegible in source] cost-effective programs are implemented.
f. NSA is extremely interested in the processing of natural-language material
in both graphemic and phonemic form. Not all this material need be translated.
However, what must be translated into English must eventually pass through several
layers of quality control. The final output of the translation process, in addition
to appearing with due timeliness, must maintain the semantics of the original material:
it must omit nothing of significance, it must add nothing of significance and, to
the greatest extent possible, it must minimize the distortion unavoidably resulting
from conversion of semantics of the source language to semantics of the target
language. Currently, NSA is continuing its development of computerized lexical
aids, dictionaries, and keyword search techniques. Recognizing the similarity of
some of their needs in these areas of MAT, the NSA and CIA contingents to the
MT Working Group have agreed to coordinate their MAT efforts.
g. The Intelligence Community Staff is concerned with the overall problem
of linguistic expertise in the United States. This problem results from the relative
lack of emphasis on foreign language training in the American academic environment
and inadequate professional opportunities and rewards for linguists. The IC Staff
supports initiatives in MAT to improve the competence and professional status of
linguists and intelligence analysts and in MT to provide responsive translations of
pertinent material for intelligence exploitation.
DISCUSSION:
1. Information provided by the DIA MT Survey indicates that there is a definite
requirement for additional translation of Soviet material, principally in the journalistic/
literary category of contemporary Soviet military doctrine, concepts and related
subjects. It is not possible to determine how much of this additional requirement
is duplicative, but the shortfall ranges from a minimum of 5 million words annually
to a cumulative total of 12 million words. The actual requirement, therefore, lies
between these extremes and is, in any case, substantial.
2. The FTD experience is useful in determining the cost-effectiveness and utility
of MT for high volume translation. At the request of the Assistant Secretary of
the Air Force, Research and Development, the USAF Scientific Advisory Board
(SAB) conducted a study of the FTD Translation System. The SAB reported (5 Jun 1975)
that the output was highly acceptable to FTD analysts. The system was competi-
tive in cost with human translation. The SAB noted that 75 percent of the MT cost
was accounted for by post-editing and recomposition to provide material that is
camera-ready for the printer. Automation of these processes, as planned by FTD,
would lower the cost of a finished MT product below that of human translation.
Subsequent system improvements have enabled FTD to provide analysts with a greater
proportion of low cost unedited or partially edited MT output which satisfies user
requirements.
a. Translation production, in pages of Russian material (heading partially illegible
in source; dated Mar 1976):

    Unedited MT              20,991
    Partially Edited MT      14,143
    Finished MT                 951
    Manual (HT)               5,013
    Total                    41,098
b. The following figures represent approximate direct labor and materials cost
per 1,000 words translated at FTD.

    Unedited MT              $ 8.63
    Partially Edited MT      $17.87
    Finished MT              $32.38
    Manual Draft (HT)        $27.28
    Manual Finished (HT)     $36.00
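The two FTD tables can be summarized with a little arithmetic. The sketch below, a minimal illustration rather than any FTD method, computes each service level's share of the production total and each level's cost relative to a finished human translation. Because the production figures are in pages and the costs are per 1,000 words, the two tables are analysed separately; the function names `shares` and `relative_cost` are hypothetical.

```python
# FTD production (pages of Russian material) and direct labor and materials
# cost per 1,000 words, as given in the two tables. The units differ (pages vs.
# words), so the tables are analysed separately rather than combined.

PAGES = {
    "Unedited MT": 20_991,
    "Partially Edited MT": 14_143,
    "Finished MT": 951,
    "Manual (HT)": 5_013,
}
COST_PER_1000_WORDS = {
    "Unedited MT": 8.63,
    "Partially Edited MT": 17.87,
    "Finished MT": 32.38,
    "Manual Draft (HT)": 27.28,
    "Manual Finished (HT)": 36.00,
}


def shares(counts: dict[str, int]) -> dict[str, float]:
    """Percentage of total production contributed by each service level."""
    total = sum(counts.values())
    return {level: 100 * n / total for level, n in counts.items()}


def relative_cost(costs: dict[str, float], baseline: str) -> dict[str, float]:
    """Cost of each service level as a multiple of the chosen baseline level."""
    base = costs[baseline]
    return {level: c / base for level, c in costs.items()}


if __name__ == "__main__":
    print("Total pages:", sum(PAGES.values()))
    for level, pct in shares(PAGES).items():
        print(f"{level:22s}{pct:6.1f}%")
    for level, ratio in relative_cost(COST_PER_1000_WORDS, "Manual Finished (HT)").items():
        print(f"{level:22s}{ratio:6.2f} x finished HT cost")
```

Run as written, this confirms the 41,098-page total and shows unedited and partially edited MT together accounting for about 85 percent of pages, with unedited MT costing roughly a quarter as much per 1,000 words as a finished human translation.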
3. The Air Staff has conducted experiments using the FTD MT System for translation
of purely journalistic/literary material. The output provided was indicative of content
but would require excessive post-editing to obtain a literary English language product.
The system's limitations in this area are accentuated because optimization efforts have
never been directed toward a journalistic/literary capability. Upgrading any currently
operational system is probably not the best approach to achieving an MT
capability for literary quality output. Such systems are implemented on a conceptual
base that is 20 years old. As stated by several participants and commentators at
the FBIS MT Seminar, a fundamentally new approach is probably required to resolve
the problems which have precluded current MT systems from providing high quality
output with minimal post-editing.
4. MT Working Group participants have expressed the requirements of their respective
agencies for high quality literary translations. Admittedly, material of long term
value intended for wide distribution should be published as a quality product. However,
considerations involving the American-Soviet Copyright Agreement of May 1973
may well require the restriction of a large volume of material for internal government
use only. In addition, there is a considerable amount of material of a transient nature
which is required by intelligence analysts but need not be provided in a high quality
or camera-ready form. A major constraint in using state-of-the-art MT systems
for such timely indicative translations is the requirement for manual input of the
material to be translated. It is currently more practical to have human translators
scan material for content and value to intelligence analysis. Inasmuch as translation
resources are limited within the Department of Defense and a volume of material
with potential intelligence payoff goes untranslated, automation of the MT input
process is a priority requirement. The development of an OCR system would bring
the computer power of an MT system to bear on the problem in a cost-effective
and [text illegible in source] performance of the present system as well as for any
third generation system which may evolve. Although the material on OCR technology
presented at the FBIS MT Seminar was not encouraging, other opinions obtained from
Seminar commentators indicated that such technology is approaching a stage where
it can be successfully employed in an MT process.
5. A program to attain high volume, timely production of journalistic/literary
Russian translations should aim to deliver the level of translation quality
necessary and sufficient to satisfy the user's information requirements, at the
lowest cost. In the past, high quality translation has been the goal regardless of
whether such quality is in fact required in all cases for the user to perform
his task. The user is presumably an expert in the subject of the document and
brings that expertise to bear in comprehending the translated material. The validity
of this concept is demonstrated by the fact that users at FTD and Oak Ridge National
Laboratory use the raw output of their respective MT systems.
6. It is agreed that no MT system will totally replace the human translator. The
goal in developing a third-generation MT system is to provide an output in idiomatic
English that is faithful to the source input in content and meaning, with a minimum
of human editing (5-10 percent). Such a system should provide the option for human
intervention during the translation process as well as in a post-edit mode. The system
should be modular and designed for ease of software maintenance. Finally, it should
be as language-independent as possible to facilitate implementation of MT for languages
other than Russian. Such a system would accommodate the requirement for timely
translation of an increasing volume of pertinent material. By minimizing manpower-
intensive input and editing functions, it would provide quality translations at a cost
somewhere between that for unedited and partially edited MT at FTD.
7. There are numerous efforts underway which could lead to a third-generation
MT system. Because immediate substantial payoff from investment in this technology
is unlikely and the exact direction that the development effort should take is uncertain,
a cautious and evolutionary approach is required. However, DOD involvement in
such technological development is essential to insure that it is responsive to identified
requirements. In this regard, near-term emphasis should be placed on MAT tech-
niques, which offer more immediate practical benefits and which might provide
a valuable contribution (e.g., through dictionary development) to any future MT
system.
8. The MT Working Group participants agree that such initiatives in MAT
methodologies should be pursued. Specifically, emphasis should be placed on the
continued development of dictionaries and lexical aids. Standard MAT software should
be developed which would provide a common format for dictionary entries, on-line and
batch processing capabilities for dictionary update and retrieval, and aids for
editing and formatting of translations. Much of the dictionary development
envisioned by Army and Navy will also contribute to improving the capability of
the FTD system. The development of such MAT capabilities is well within
the current state-of-the-art and will contribute substantially toward increasing
the pr[text illegible in source] the FBIS MT Seminar was that MAT techniques could
provide an increase in productivity by a factor of 2-1 or 3-1. It appears, however,
that an increase of 60 percent is a more realistic assessment. In any event, the
benefits are substantial.
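The gap between the Seminar's estimates and the Working Group's 60 percent figure can be made concrete with a simple ratio. The sketch below assumes that "a factor of 2-1 or 3-1" means output per translator-hour doubles or triples, and that a 60 percent increase means a factor of 1.6; it then computes the fraction of current translator effort still needed for an unchanged workload. These readings and the function name `effort_fraction` are illustrative assumptions, not figures from the report.

```python
# Hypothetical illustration: compare MAT productivity estimates by the share of
# current translator effort still needed to handle an unchanged workload.

ESTIMATES = {
    "Seminar estimate, 2-1": 2.0,
    "Seminar estimate, 3-1": 3.0,
    "Working Group estimate, +60%": 1.6,
}


def effort_fraction(productivity_factor: float) -> float:
    """Fraction of current translator-hours needed if output per hour is
    multiplied by productivity_factor (assumed reading of the estimates)."""
    if productivity_factor <= 0:
        raise ValueError("productivity factor must be positive")
    return 1.0 / productivity_factor


if __name__ == "__main__":
    for label, factor in ESTIMATES.items():
        frac = effort_fraction(factor)
        print(f"{label}: {frac:.1%} of current effort ({1 - frac:.1%} freed)")
```

Read this way, the 60 percent figure still frees about three-eighths of current translator effort, compared with one-half to two-thirds under the Seminar's estimates.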
9. Technological development and implementation have often been characterized
by multiple independent efforts which result in duplicative capabilities and unneces-
sary costs. To avoid this situation in proposed MAT and MT development efforts,
a formal coordinating structure should be established at the USIB level. In the
interim, if so directed by OASD(I), the present Ad Hoc MT Working Group can per-
form this function for the DOD. In view of the difficulty in identifying the most
lucrative approach to a follow-on MT capability, some mechanism for providing
professional advice on MT development should be established. Considerable expertise
is available at RADC. In addition, a carefully selected advisory body composed
of experts from such relevant disciplines as linguistics, computational linguistics,
computer science, psychology, human factors engineering, and artificial intelligence
would be helpful. The Working Group recognizes the biases that exist in all of these
disciplines and emphasizes the advisory nature of such a body of experts.
CONCLUSIONS:
In the intelligence process, translations are principally useful insofar as the
material translated contributes to the analysis of foreign capabilities and intentions.
In this regard, considerations of comprehensiveness and timeliness must be weighed
against requirements for quality that will insure the proper transfer of concept from
one language into another.
- MT has proven cost-effective and responsive to some S&T user requirements.
- The current state-of-the-art of MT will not support quality production of
journalistic/literary material without excessive post-editing. Lack of automated
input technology precludes its effective use for timely indicative translations.
- Implementation of MAT methodologies can yield immediate benefits and may also
contribute to development of an advanced MT capability.
- A long-term, cautious and evolutionary development effort might provide a
cost-effective system capable of providing timely, quality translations of needed
materials, some of which are currently untranslated and probably unexploited.
RECOMMENDATIONS:
- That ASD(I) provide the following funding for implementation of near-term
MAT and long-term OCR and journalistic/literary MT capabilities ($ in thousands):
FY77 FY78 FY79 FY80 FY81 FY82
[funding figures illegible in source]
- That [text illegible in source] and designate an Executive Agent for MAT/MT implementation.
- That a similar structure be established by the USIB to address and coordinate
overall community translation requirements, including both the improvement of
translator professionalism and the implementation of automated aids for translation.
- That RADC be tasked to further refine overall translation requirements,
assist in development of Service/Agency Statements of Work for MAT/MT support,
and identify the allocation of funding (by appropriation) needed for MAT/MT
development and implementation.
Submitted by the Ad Hoc Machine Translation Working Group.
Air Force
Col W. P. Olsen AFIS/IND
Col N. P. Vaslef AF/INA
Maj R. E. Baldauf AF/INY
Maj L. M. Hansen FTD/NIT
Army
Mr. G. C. Cooney OACSI
Navy
Mr. T. P. Koines NISC
Mr. C. R. Moctezuma NISC
DIA
FBIS
I. C. Staff
DIA/DT3
NSA/R51
FBIS/EPS
APPROVED:
WILLIAM P. OLSEN, Colonel, USAF
Chairman