LINGUISTICS AND THE CONTEMPORARY STATE OF MACHINE TRANSLATION IN THE USSR BY R. G. KOTOV
Document Type:
Collection:
Document Number (FOIA) /ESDN (CREST):
CIA-RDP83M00171R001800120015-9
Release Decision:
RIPPUB
Original Classification:
K
Document Page Count:
31
Document Creation Date:
January 4, 2017
Document Release Date:
December 14, 2001
Sequence Number:
15
Case Number:
Publication Date:
October 31, 1976
Content Type:
REPORT
File:
Attachment | Size |
---|---|
CIA-RDP83M00171R001800120015-9.pdf | 1.93 MB |
Body:
Approved For Release 2008/03/03: CIA-RDP83M00171 R001800120015-9
Voprosy Yazykoznanija, 3, pp. 37-49, 1976
LINGUISTICS AND THE CONTEMPORARY STATE OF MACHINE TRANSLATION IN THE USSR
by R. G. Kotov
During recent years Machine Translation received considerable public
attention in connection with analyzing solutions for further development
of the State system for scientific-technical information (Gosudarstvennaja
sistema nauchno-tekhnicheskoj informatsii). MT is treated as one of the
components of this system. Within this framework MT is considered not as
an exciting theoretical field, but rather as a practical tool for obtaining
large amounts of translation of scientific texts of "rough" quality for
purposes of information retrieval services.
During this period there began to appear reports in foreign countries
describing successful and economically profitable application of the com-
puters for massive commercial translations. However, this changing view-
point was not noted in our country since the established opinion in the
USSR was that MT is a task for the future, and that it will take a long
time to work out the theoretical fundamentals of the theory of translation.
There was another fact which escaped the attention of the linguistic
community in the USSR. Toward the end of 1973 and the beginning of 1974
a special temporary committee was organized under the auspices of the
State Committee of the Council of Ministers of the USSR for Science and
Technology (Gosudarstvennyj Komitet Soveta Ministrov SSSR po nauke i
tekhnike). This committee, composed of representatives of various
organizations interested in practical MT, specialists on automatization
of informational processes, and specialists on MT, took as its task
determining under what conditions MT could be developed as a practical
system at the present time. This should be a working, expedient MT system.
By a "practical (working) MT system" is understood a system of
automatically used dictionaries, equipped with the necessary linguistic
information and programs, that produces "rough"-quality translations of
scientific and technical texts on a massive scale. The editing of such a
machine translation should not take more effort than the editing of the
usual translation. "Rough" quality means that the translated text is
understood by the user in terms of clearly presented meaning, and that
this meaning corresponds to the meaning of the source; therefore this
kind of translated text could be used as a source of information.
The conclusions this committee arrived at after a detailed analysis
of the state of affairs in the MT field, both in the USSR and abroad, are
the following:
"The level of achievements both in theories and experiments on MT makes
it feasible to raise the question of moving toward the practical
realization of MT in the USSR."
"The economic significance of practical MT could be evaluated on the basis
of the following assumptions/assertions:
--MT processing of the text is approximately 5 times cheaper than human
translation...;
--The time required for input of the text into the computer could be
compared with human retyping of a text translated by humans;
--In terms of speed, MT (including the post-editing) could be at least
ten times quicker."
"The work for creating practical MT and its deployment should be carried
out already at the present time, without demanding as a preliminary
condition the solution of all the theoretical problems aimed at producing
translation of a higher quality."
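The committee's ratios lend themselves to a quick back-of-the-envelope calculation. The following is a minimal Python sketch, not part of the original report: the function name mt_estimate and the baseline figures (1000 cost units and 200 hours for a batch of human translation) are hypothetical, and only the two ratios (approximately 5 times cheaper, at least 10 times quicker including post-editing) come from the conclusions quoted above.

```python
def mt_estimate(human_cost, human_hours, cost_ratio=5.0, speed_ratio=10.0):
    """Estimate MT cost and turnaround from human-translation figures.

    cost_ratio and speed_ratio default to the committee's stated
    assumptions; the input figures are supplied by the caller.
    """
    mt_cost = human_cost / cost_ratio      # "approximately 5 times cheaper"
    mt_hours = human_hours / speed_ratio   # "at least ten times quicker"
    return mt_cost, mt_hours


# Hypothetical baseline for one batch of texts translated by humans.
human_cost = 1000.0   # arbitrary cost units (assumption)
human_hours = 200.0   # working hours (assumption)

mt_cost, mt_hours = mt_estimate(human_cost, human_hours)
print(f"MT cost: {mt_cost}, MT hours (upper bound): {mt_hours}")
```

Because the speed ratio is stated as a lower bound ("at least"), the hours returned here should be read as an upper estimate of the MT turnaround.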
The Committee has also stated that the absence of practical MT
works became a hampering factor for further development of retrieval
research in general, not just that of MT in particular.
The basic conclusion is that the USSR has no practical MT system
despite the existing achievements in theory and practice and the real
opportunities, while in foreign countries MT has entered an era of com-
mercial application by both state and private organizations.
What are the causes of this state of affairs in the MT field in the
USSR? A brief history of MT development within the USSR should be
presented. Certain facts in its development might help in understanding
the peculiarity of MT development in our country.
THE FIRST STAGE (1954-1958)
At the Institute of Precise Mechanics and Computing Technology
(Institut tochnoj mekhaniki i vychislitel'noj tekhniki) of the Academy
of Sciences of the USSR, under the guidance of D. Yu. Panov and I. S. Mukhin
and with the other members of their group (which included L. N. Korolev,
S. N. Razumovskij, and the linguist I. K. Bel'skaja), the first Soviet
experimental machine translation from English into Russian was made on the
EVM BESM computer in December of 1955. Then, in 1955, the group under the
guidance of A. A. Ljapunov and his assistant O. S. Kulagina carried out
experiments in translation from French into Russian using the EVM "STRELA"
(the linguistic work was done by I. A. Mel'chuk and T. N. Moloshnaja). The
results of these two groups were reported by the senior researchers of
both groups in co-authored papers.1
1. D. Yu. Panov, A. A. Ljapunov, I. S. Mukhin, Avtomatizatsija perevoda
s odnogo yazyka na drugoj, M., 1956.
Without commenting on the questions connected with coding techniques
on the computers, the differences in using the linguistic information and
designing the transfer algorithms could be reduced to the differences
between the "empirical" (I. K. Belskaja) and "analytical" (I. A. Mel'chuk,
T. N. Moloshnaja) ways of solving the same problem. However, one has to
note not so much the differences as the fact that the leaders of both
groups recognized the existence and justification of different attitudes
toward solving new and complex MT problems.
In that very paper Panov warned about the danger of being carried away
by logical analysis of the language structure as a tool for solving the
MT problem. This logical way seemed to be attractive especially if one
would follow the direction of certain MT work in the USA, particularly
from the mathematical point of view since it made it possible to formulate
the MT problem as essentially a mathematical problem. However, "the very
nature of the translation is such that one can not completely ignore the
individual features of the input text. Evidently, we encounter here a
problem which requires special analytical methods, similar to those
experimental methods which are used in studying natural phenomena."2
The group of D. Yu. Panov has also formulated the basic principles of
designing an MT algorithm, some of which are valid at the present time, too:
--the maximal separation of the dictionary from the programs;
--storing in the dictionary the inherent grammatical features of words;
--determining the meaning of polysemic words on the basis of the
contextual environment, their grammatical features, the analysis of the
grammatical structure of the sentence, and other factors.
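The three principles listed above can be illustrated with a short Python sketch. This is not a reconstruction of the Panov group's algorithm: the dictionary entries, the transliterated Russian targets, the feature names, and the function choose_sense are all hypothetical, chosen only to show a dictionary kept as data apart from the program, grammatical features stored with each entry, and a polysemic word resolved from its context and an expected part of speech.

```python
# Principle 1: the dictionary is plain data, separate from the program logic.
# Principle 2: each entry stores the word's inherent grammatical features.
# All entries below are hypothetical illustrations.
DICTIONARY = {
    "light": [
        {"pos": "noun", "target": "svet",
         "context": {"source", "beam", "speed"}},
        {"pos": "adj", "target": "legkij",
         "context": {"weight", "metal", "alloy"}},
    ],
    "metal": [
        {"pos": "noun", "target": "metall", "context": set()},
    ],
}


def choose_sense(word, neighbours, expected_pos=None):
    """Principle 3: pick a sense of a polysemic word from its context.

    neighbours   -- set of words found near `word` in the sentence
    expected_pos -- part of speech suggested by a (hypothetical)
                    analysis of the sentence structure, if available
    Returns the chosen dictionary entry, or None for an unknown word.
    """
    senses = DICTIONARY.get(word, [])
    if not senses:
        return None

    def score(sense):
        points = len(sense["context"] & neighbours)  # contextual environment
        if expected_pos is not None and sense["pos"] == expected_pos:
            points += 1                              # grammatical evidence
        return points

    # On a tie, max() keeps the first entry, i.e. the dictionary's default.
    return max(senses, key=score)


print(choose_sense("light", {"alloy", "metal"})["target"])
print(choose_sense("light", {"speed"})["target"])
```

Since the entries live in a data structure rather than in the code, new words or senses can be added without touching choose_sense, which is the practical point of separating the dictionary from the programs.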
2. Ibid., p 15.
[illegible] enthusiasts in our country, who on the whole did not have
the opportunity to test their algorithms on a computer and were engaged
essentially in theoretical work.
THE SECOND STAGE (1958-1961)
The majority of participants in the first MT conference in the USSR
in 1958, despite the warnings of Panov mentioned above, found the
direction of Belskaja too "empirical." The directions of research that
proposed following formal models and using an intermediary language
in order to recreate the logical structure of a natural language were
recognized as more attractive and more promising. The discussions at the
conference indicated not only the differences in opinions concerning the
ways of going about solving MT problems, but also the inclination of some
researchers to consider the direction of their research as the only correct
one. The stage of suppression of "empiricists" and the start of fruitless
searches for "universal solutions" of MT tasks was completed at the ALL
UNION CONFERENCE ON INFORMATION PROCESSING, MACHINE TRANSLATION, AND
AUTOMATIC READING OF TEXTS IN 1961 (Vsesojuznaja konferencija po obrabotke
informatsii, mashinnomu perevodu i avtomaticheskomu chteniju teksta v
1961 g.). At the final session it was announced that, according to
prevailing opinion, MT as a practical problem should be removed from the
agenda and all efforts should be devoted to working out the theoretical
basis of translation.
The group of Panov, Mukhin and Belskaja ceased to exist at that time.
The work of other groups came slowly to an end.
THE THIRD STAGE (1961-19[illegible])
This period is characterized by the prevailing development of
theoretical studies of language outside of any connection with the specific
task of designing practical working MT systems.
It is necessary to note that during the period of the late '50s and
early '60s a "re-evaluation of values" in the field of MT has also taken
place in foreign countries. The works carried out made it possible to
come to certain important conclusions.
1. It turned out that the existing grammars and the experience in
formalization of linguistic data were inadequate. There was no formal
apparatus for describing morphology, syntax and semantics to the degree
that they could be used in designing MT algorithms. This, in particular,
served as a stimulus for development in various directions of structural
and mathematical linguistics.
2. It became evident that one should test on the computer not only
algorithms, but also theoretical constructs in linguistics, without which
one can not evaluate their applied importance for MT.
3. Designing an MT system even in its simplest variety should not be
considered as temporary work, but rather as a consuming long term task,
whose success can be guaranteed only by simultaneous efforts of linguists,
programmers and computer engineers.
The conclusion concerning insufficient access to computers and the
resulting discussion that the special purpose computers should be built
in the immediate future lost its relevance since the new computers with
large memories were built, coupled with high speeds and mathematical
operating systems.
However, most important was the awareness of the fact that language
studies and the search for formalizing language structures are necessary
for procedures for treating the information by computers as well as in
the interest of the development of the theory of language. However, while
the language studies in [MT] terms are both defined and delimited by their
applicability for computer testing, no such polar limits are assumed for
the formal model studies concerning language theory or cybernetic problems
in general.
Thus, there developed two trends in linguistic studies, applied
and information retrieval, which differ from each other in their goals,
tasks, depth and time periods needed for achieving the stated goals.
Accordingly, one has to differentiate between MT as a scientific
technical problem for designing a working MT system tested and used on
a computer as a source of information, and MT (if that term should be
used at all) for the various retrieval researches in which language
studies are used for solving various processes in information treatment.
In connection with this division, one has to evaluate various linguistic
studies. MT as a universal scientific problem is a logical intersection
of various sciences interested in aspects of language such as general
linguistics, mathematical logic, semiotics, psychology, a series of
cybernetic sciences, etc. Within this framework, the area of linguistic
investigations keeps enlarging toward the more fundamental description
of language disregarding its connection with the tasks appropriate for MT.
The studies and research aimed at designing MT systems as practical
and working systems found themselves in quite a different situation. These
efforts were not supported and as a result there is not a single working
MT system in our country.
[illegible] technical problem [illegible] scientific problem. As a result
of the consequences of the "goals-tools" approach there was an unbalanced
correlation between the levels of research serving the solution of MT as
a scientific technical problem versus retrieval research disregarding any
applied testing for MT.3 The applied studies turned into retrieval studies
and, having lost the connection with the original goal, naturally were
not able to secure or provide the solution for working MT systems.
Starting with the '60s, the development of MT in our country and in the
West went on different roads. One could completely agree here with the
evaluation expressed by MT specialists.4
In the West, despite the discussion of "crisis" in MT, attempts to
solve MT problems by "brute force," by using the great dictionaries and
relatively simple algorithms, were not stopped.5 At the same time
theoretical studies were also conducted concerning formalization of
language structures in a mode of close connection with the computers.
Thus, for example, the principles of syntactic analysis of sentences and
generation of "microsentences" were programmed and experimentally tested.6
3. G. Pospelov, Ob'ekt upravlenija - nauka, "NAUKA I ZHIZN," 1975, 11.
4. V. N. Gerasimov, Yu. N. Marchuk, SOVREMENNOE SOSTOJANIE MASHINNOGO
PEREVODA, collection "MASHINNYJ PEREVOD I AVTOMATIZATSIJA
INFORMATSIONNYKH PROTSESSOV," M., 1975.
5. J. M. Daniel, Translation by computer, "Electronics Weekly," 304,
1966, 7.
6. B. T. Carmody, P. E. Jones, Automatic derivation of microsentences,
"Communications of the ACM," 9, 6, 1966.
[illegible] formal model of [illegible] on the basis of Chomsky's concepts,7
[illegible] of the syntactic structures of language, and of theoretical
constructs in linguistics.
This provided the opportunity to evaluate the results of theory from
the point of view of practical significance for solving MT tasks, to select
the level of realistically necessary details and formalization of
linguistic descriptions, and to correctly modify the direction of further
investigations.
One should not maintain the idea that the coupling of these studies
with specific MT tasks, narrowing formal linguistic studies to their
usefulness for MT and their feasibility for testing on the computer,
hampered or suppressed the creative thinking of researchers and led to a
blind alley, as is asserted by some purely theoretically oriented
researchers.
Experience showed the opposite. Combinability of the theory with the
solution of practical problems and experimental testing of the theory led
to the creation of the series of working systems, "unscientific" as their
principles may be, with various degrees of automatization of the
translation process. Thus, for example, there are in existence large
automatic dictionaries, the use of which secures a higher quality and
quicker human translation,9 systems of translation for information data,10
systems of MT producing "rough" translations of arbitrary texts, and with
the additional editing--translations of high quality which is quicker and
cheaper than by human hands.11
7. J. Friedman, A computer system for transformational grammar,
"Communications of the ACM," 12, 6, 1969.
8. W. A. Woods, Transition network grammars for natural language analysis,
"Communications of the ACM," 13, 5, 1970.
9. H. J. Schock, Zusammenarbeit Mensch/Maschine beim Umgang mit electronisch
gespei[remainder illegible]
[illegible] created a real basis for [illegible] respective problems.
These studies have demonstrated the role of linguistic investigations
also for the development of electronic computer technology. In
particular, they have made it possible to reformulate the new demands and
conditions regarding the design of the computers of future generations
including considerations of specific features characteristic of human
handling of the information data.12
In our own country, the development of MT after 1961 took another
road. A new trend in linguistic investigations was formed and rooted
which considered MT as a general scientific problem, but retained the old
applied title "automatic translation." Within the framework of this trend
the scientific-technical problem of MT was pictured as one of the many
specific problems the solution of which was possible after completing
the whole complex of theoretical studies in linguistics. Retaining of
the title "automatic translation" assisted this trend during its
formative stage since it created an illusion that the research would
continue for purposes of solving the scientific-technical task of MT,
while in reality this new trend set as its purpose quite different tasks
far removed from MT problems.
The fundamentals of this theoretical direction are most completely
exemplified in the preface to the book "AUTOMATIC TRANSLATION 1949-1963."13
12. D. G. Hays, Linguistics and the future of computations, "AFIPS Conference
Proceedings," New York, 1973.
13. I. A. Mel'chuk, Preface to the book: I. A. MEL'CHUK, R. D. RAVICH,
AVTOMATICHESKIJ PEREVOD 1949-63. A critical bibliographical manual, M.,
1967. (Indications for pages are given in the text.)
In it, in particular, it is said that the investigations in MT have
[several lines illegible] "...(automatic translation) is a specific job
within a more general scientific goal--to teach the computers to learn
human languages" (p. 7).
Thus, instead of setting a specific problem, a general goal with a
global perspective is postulated. From this it follows that "the
description of trends according to which the text is connected with the
meaning is a central problem of linguistics--one that is theoretical
and descriptive" (p. 8). "Within such wide phrasing of linguistic goals
there is no natural border line between purely linguistic work and work
concerning MT... Any sufficiently rigorous linguistic study or work that
contains material adequately processed has a direct or at least indirect
relevance for MT" (p. 9).
(It should be noted that an objective criterion for the "strictness" and
appropriateness of linguistic work for MT is the fact of its being
included in the bibliographic list "AUTOMATIC ANALYSIS OF TEXT AND AUTOMATIC
TRANSLATION" of RF (Referativnyj Zhurnal) "INFORMATIKA," whose editor is
I. A. Mel'chuk himself.)14
Furthermore, all the questions of algorithmization of procedures,
using the results of linguistic investigations for applied goals, and
the experimental testing of linguistic algorithms are declared as not
14. I. A. Mel'chuk, OPYT TEORII LINGVISTICHESKIKH MODELEJ "SMYSL(--)TEKST,"
M., 1974. (Indications of pages are given in the text of this
article.) English title: Experience in modeling the theory
"MEANING(--)TEXT".
[illegible] unnecessary for linguists to [illegible] mathematicians
[illegible] expect from the linguistic [illegible]. [illegible] presented
a [illegible] of transforming syntactic constructions from Russian into
German for translation [illegible] pairs of languages, and also for
intralanguage transformations: identification of synonymic constructions.
Representation of syntactic structures in the given system makes it
feasible to solve such complex problems as finding the antecedents of
pronouns (anaphora) (G. Klimonov, GDR). G. S. Tsejtin presented new models
for analysis, using the preference of linguistic constructs. These types
of models, and also the models with limited nonprojectivity, were tested
experimentally (G. S. Tsejtin, B. M. Lejkina).
The collective paper of scientists from Czechoslovakia described
the basic components of the functional generative model. Semantic re-
presentations (formulae) are generated based on the syntax of dependences
(E. Beneshova and others). An attempt was made to make a choice of gram-
matical analysis of text on theoretical grounds.
The general principles set up by Hjelmslev toward linguistic analysis
are acceptable for the MT as well as the principles of descriptive lin-
guistics. However, the concrete text analysis is based on its own laws,
which are best described by distributional-statistical methods supported
by probabilistic evaluations. The main principle of this method consists
in the fact that the main postulates are not given a priori, rather they
are arrived at as a result of attempts at formalization of language
(A. A. Koverin).
Many papers were devoted to specific syntactic problems; for example,
interpretation of comparative relations in the grammar of syntactic
analysis of [illegible], the algorithm of analysis of the continuous type
(E. E. [illegible]), analysis of prepositions ([illegible], N. Leont'eva),
rules for designing predicative relations (H. S. Fersh[illegible]ova and
others). The automatic morphological analysis of the written Polish
language was treated in an interesting paper by A. Lukashevich (Polish
People's Republic). M. P. Muravitskaja delivered a report on automatic
morphemic analysis of verb forms.
It could be said that in the semantic analysis the experimental
method took root. Thus, the concepts of the semantic connectedness of
the text are verified by algorithms for segmentation of texts into
paragraphs and connected fragments (T. N. Rylova, L. V. Orlova, R. A.
Kovalevich and others, T. V. Dolgaleva, G. S. Osipov and others). Some
presentations deal with the dictionaries containing semantic information,
and also the semantics of specific words and word combinations in natural
language (M. I. Otkupshchikova, G. M. Il'in and others, O. A. Shteronova).
The research in the field of formalization of semantics provides
the output of immediate interest for the information analysis and
retrieval (automatic indexing and annotation, creation of dictionaries
and thesauri). In this connection the paper of V. A. Moskovich and Yu. S.
Martemjanov dealing with a generative model was very interesting, as were
the papers on the communicative organization of text and its reflection
in the semantic structures (E. I. Korolev, A. M. Shaljapina), on the
linguistic justification of the "Question-Answer" system (P. Konrad, GDR),
and on the quantitative evaluation of the quality of translation
(N. A. Kuzemskaja, E. F. Skorokhod'ko).
In the section on [illegible] Service, papers were presented dealing
with [illegible] the machine and man within the process of inter- and
post-editing in the MT systems ([illegible]), approach toward mathematical
and [illegible] for MT (D. N. [illegible], [illegible] Skitnevskij), and
design of general and specific programs (N. A. Krupko and others, N. G.
Arsent'eva, R. S. Karetnikov, L. N. Beljaeva, S. A. Anan'evskij and others,
N. A. Balandina, V. S. Krisevich and others, L. F. Lukjanenkov and others).
The problems of automatic recognition and synthesis of hieroglyphs were
also considered (S. M. Shevenko). The results of various programs were
demonstrated, representing linguistic algorithms.
In the sessions of this seminar 205 persons participated,
approximately 80 theses of papers were sent to the organization committee,
and 63 papers and reports were presented in sections and at plenary
sessions. Scientists and collectives from five countries and 15 cities of
the USSR participated.
This shows the interest in MT and the trend of practically all
Soviet collectives and scientists studying MT and also the specialists
from other countries--members of MSNTI (?) toward organizational unity
in designing scientific and technical problems for commercially suitable
MT systems.
Yu. Marchuk (Moskva)