LINGUISTICS AND THE CONTEMPORARY STATE OF MACHINE TRANSLATION IN THE USSR BY R. G. KOTOV

Document Type: 
Collection: 
Document Number (FOIA) /ESDN (CREST): 
CIA-RDP83M00171R001800120015-9
Release Decision: 
RIPPUB
Original Classification: 
K
Document Page Count: 
31
Document Creation Date: 
January 4, 2017
Document Release Date: 
December 14, 2001
Sequence Number: 
15
Case Number: 
Publication Date: 
October 31, 1976
Content Type: 
REPORT
Body: 
Approved For Release 2008/03/03: CIA-RDP83M00171R001800120015-9

Voprosy Yazykoznanija, 3, pp. 37-49, 1976

LINGUISTICS AND THE CONTEMPORARY STATE OF MACHINE TRANSLATION IN THE USSR
by R. G. Kotov

During recent years Machine Translation (MT) received considerable public attention in connection with analyzing solutions for further development of the State system for scientific-technical information (Gosudarstvennaja sistema nauchno-tekhnicheskoj informatsii). MT is treated as one of the components of this system. Within this framework MT is considered not as an exciting theoretical field, but rather as a practical tool for obtaining large amounts of translation of scientific texts of "rough" quality for purposes of information retrieval services.

During this period reports began to appear in foreign countries describing successful and economically profitable application of computers for massive commercial translations. However, this changing viewpoint was not noted in our country, since the established opinion in the USSR was that MT is a task for the future, and that it will take a long time to work out the theoretical fundamentals of the theory of translation.

There was another fact which escaped the attention of the linguistic community in the USSR. Toward the end of 1973 and the beginning of 1974 a special temporary committee was organized under the auspices of the State Committee of the Council of Ministers of the USSR for Science and Technology (Gosudarstvennyj Komitet Soveta Ministrov SSSR po nauke i tekhnike). This committee, composed of representatives of various organizations interested in practical MT, specialists on automatization of informational processes, and specialists on MT, took as its task determining under what conditions MT could be developed as a practical system at the present time. This should be a working, expedient MT system.
A "practical (working) MT system" is understood as a system of automatically used dictionaries, equipped with the necessary linguistic information and programs, that produces massive "rough"-quality translations of scientific-technical texts. The editing of such machine translation should not take more effort than the editing of a usual translation. "Rough" quality means that the translated text is understood by the user in terms of clearly presented meaning; the meaning corresponds to the meaning of the source, and therefore this kind of translated text could be used as a source of information.

The conclusions this committee arrived at after a detailed analysis of the state of affairs in the MT field, both in the USSR and abroad, are the following:

"The level of achievements both in theories and experiments on MT makes it feasible to raise the question of moving toward the practical realization of MT in the USSR."

"The economic significance of practical MT could be evaluated on the basis of the following assumptions/assertions:
--MT processing of the text is approximately 5 times cheaper than human translation...;
--The time consumption required by input of the text into the computer could be compared with human retyping of the text translated by humans;
--In terms of speed, MT (including the post-editing) could be achieved at least ten times quicker."

"The work for creating practical MT and its deployment should be carried out already at the present time, without demanding the preliminary condition of solving all the theoretical problems aimed at producing translation of a higher quality."
The Committee has also stated that the absence of practical MT works became a hampering factor for further development of retrieval research in general, not just that of MT in particular.

The basic conclusion is that the USSR has no practical MT system despite the existing achievements in theory and practice and the real opportunities, while in foreign countries MT has entered an era of commercial application by both state and private organizations.

What are the causes of this state of affairs in the MT field in the USSR? A brief history of MT development within the USSR should be presented. Certain facts in its development might help in understanding the peculiarity of MT development in our country.

THE FIRST STAGE (1954-1958)

In the Institute of Precise Mechanics and Computing Technology (Institut tochnoj mekhaniki i vychislitel'noj tekhniki) of the Academy of Sciences of the USSR, under the guidance of D. Yu. Panov and I. S. Mukhin and with other members of their group (which included L. N. Korolev, S. N. Razumovskij, and the linguist I. K. Bel'skaja), the first Soviet experimental machine translation from English into Russian was made on the EVM BESM computer in December of 1955. Then, in 1955, the group under the guidance of A. A. Ljapunov and his assistant O. S. Kulagina carried out experiments from French into Russian using the EVM "STRELA" (the linguistic work was done by I. A. Mel'chuk and T. N. Moloshnaja). The results of these two groups were reported by the senior researchers of both groups in co-authored papers.1

1. D. Yu. Panov, A. A. Ljapunov, I. S. Mukhin, Avtomatizatsija Perevoda s Odnogo Yazyka na Drugoj, M., 1956.

Without commenting on the questions connected with coding techniques on the computers, differences in using the linguistic information and designing the transfer algorithms could be reduced to the differences between "empirical" (I. K. Bel'skaja) and "analytical" (I. A. Mel'chuk, T. N. Moloshnaja) ways of solving the same problem. However, one has to note not so much the differences as the fact that the leaders of both groups recognized the existence and justification of different attitudes for solving new and complex MT problems.

In that very paper Panov warned about the danger of being carried away by logical analysis of the language structure as a tool for solving the MT problem. This logical way seemed to be attractive, especially if one would follow the direction of certain MT work in the USA, particularly from the mathematical point of view, since it made it possible to formulate the MT problem as essentially a mathematical problem. However, "the very nature of the translation is such that one cannot completely ignore the individual features of the input text. Evidently, we encounter here a problem which requires special analytical methods, similar to those experimental methods which are used in studying natural phenomena."2

The group of D. Yu. Panov has also formulated the basic principles of designing an MT algorithm, some of which are valid at the present time, too:
--the maximal separation of the dictionary from the programs;
--storing in the dictionary the inherent grammatical features of words;
--determining the meaning of polysemic words on the basis of the contextual environment, their grammatical features, the analysis of the grammatical structure of the sentence, and other factors.

2. Ibid., p. 15.

[...] enthusiasts in our country, who on the whole did not have the opportunity to test their algorithms on a computer and were engaged essentially in theoretical work.
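The three design principles attributed above to the group of D. Yu. Panov (a dictionary kept separate from the programs, grammatical features stored with each dictionary entry, and polysemy resolved from the context) can be illustrated with a minimal sketch. It is not a reconstruction of the BESM algorithm, whose details the article does not give. The names `Sense`, `Entry`, `DICTIONARY`, `choose_sense` and `translate`, the three-word English-Russian dictionary, and the cue-word test for context are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sense:
    target: str                    # transliterated Russian equivalent
    cues: frozenset = frozenset()  # context words that select this sense


@dataclass(frozen=True)
class Entry:
    pos: str        # inherent grammatical feature stored with the word
    senses: tuple   # first sense is the default


# Hypothetical toy dictionary: pure data, independent of the program below.
DICTIONARY = {
    "river": Entry("noun", (Sense("reka"),)),
    "money": Entry("noun", (Sense("den'gi"),)),
    "bank": Entry("noun", (
        Sense("bank"),                                # default: financial sense
        Sense("bereg", frozenset({"river", "water"})),  # shore of a river
    )),
}


def choose_sense(entry, context):
    """Pick the first sense whose cue words occur in the context."""
    for sense in entry.senses:
        if sense.cues & context:
            return sense
    return entry.senses[0]


def translate(words, dictionary):
    """Word-for-word 'rough' transfer; unknown words pass through unchanged."""
    context = frozenset(words)
    result = []
    for word in words:
        entry = dictionary.get(word)
        if entry is None:
            result.append((word, "unknown"))
        else:
            result.append((choose_sense(entry, context).target, entry.pos))
    return result


print(translate(["river", "bank"], DICTIONARY))
print(translate(["money", "bank"], DICTIONARY))
```

Because `translate` takes the dictionary as an argument, the same program could in principle run over a different dictionary without modification, which is the point of the first principle. A practical system would also need morphology, word order and syntactic analysis, none of which is attempted here.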
THE SECOND STAGE (1958-1961)

The majority of participants in the first MT conference in the USSR in 1958, despite the warnings of Panov mentioned above, found the direction of Bel'skaja too "empirical." The directions for research which suggested following some models in formal terms and using an intermediary language in order to recreate the logical structure of a natural language were recognized as more attractive and more promising. The discussions at the conference indicated not only the differences in opinions concerning the ways of going about solving MT problems, but also the inclination of some researchers to consider the direction of their research as the only correct one.

The stage of suppression of "empiricists" and the start of fruitless searches for "universal solutions" of MT tasks was completed at the ALL-UNION CONFERENCE ON INFORMATION PROCESSING, MACHINE TRANSLATION, AND AUTOMATIC READING OF TEXTS IN 1961 (Vsesojuznaja konferencija po obrabotke informatsii, mashinnomu perevodu i avtomaticheskomu chteniju teksta v 1961 g.). At the final session it was announced that, according to prevailing opinion, MT as a practical problem should be removed from the agenda and all efforts should be devoted to working out the theoretical basis of translation. The group of Panov, Mukhin and Bel'skaja ceased to exist at that time. The work of other groups came slowly to an end.

THE THIRD STAGE (1961-19[...])

This period was marked by the prevailing development of theoretical studies of language outside of any connection with the specific task of designing practical working MT systems.

It is necessary to note that during the period of the late '50s and early '60s a "reevaluation of values" in the field of MT has also taken place in foreign countries. The works carried out made it possible to come to certain important conclusions.
1. It turned out that the existing grammars and the experience in formalization of linguistic data were inadequate. There was no formal apparatus for describing morphology, syntax and semantics to the degree that they could be used in designing MT algorithms. This, in particular, served as a stimulus for development in various directions of structural and mathematical linguistics.

2. It became evident that one should test on the computer not only algorithms, but also theoretical constructs in linguistics, without which one cannot evaluate their applied importance for MT.

3. Designing an MT system, even in its simplest variety, should not be considered as temporary work, but rather as a consuming long-term task, whose success can be guaranteed only by simultaneous efforts of linguists, programmers and computer engineers.

The conclusion concerning insufficient access to computers, and the resulting discussion that special-purpose computers should be built in the immediate future, lost its relevance since new computers with large memories were built, coupled with high speeds and mathematical operating systems.

However, most important was the awareness of the fact that language studies and the search for formalizing language structures are necessary for procedures for treating the information by computers as well as in the interest of the development of the theory of language. However, while the language studies in [...] terms are both defined and delimited by their applicability for computer testing, no such polar limits are assumed for the formal model studies concerning language theory or cybernetic problems in general.
Thus, there developed two trends in linguistic studies, applied and information retrieval, which differ from each other in their goals, tasks, depth and time periods needed for achieving the stated goals. Accordingly, one has to differentiate between MT as a scientific-technical problem of designing a working MT system, tested and used on a computer as a source of information, and MT (if that term should be used at all) for the various retrieval researches in which language studies are used for solving various processes in information treatment.

In connection with this division, one has to evaluate various linguistic studies. MT as a universal scientific problem is a logical intersection of various sciences interested in aspects of language, such as general linguistics, mathematical logic, semiotics, psychology, a series of cybernetic sciences, etc. Within this framework, the area of linguistic investigations keeps enlarging toward the more fundamental description of language, disregarding its connection with the tasks appropriate for MT.

The studies and research aimed at designing MT systems as practical and working systems found themselves in quite a different situation. These efforts were not supported, and as a result there is not a single working MT system in our country.

[...] technical problem. [...] scientific problem.

As a result of the consequences of the "roils-tools" approach there was a disbalanced correlation between the levels of research serving the solution of MT as a scientific-technical problem versus retrieval research disregarding any applied testing for MT:3 the applied studies turned into retrieval studies and, having lost the connection with the original goal, naturally were not able to secure or provide the solution for working MT systems.
Starting with the '60s the development of MT in our country and the West went on different roads. One could completely agree here with the evaluation expressed by MT specialists.4

In the West, despite the discussion of "crisis" in MT, attempts to solve MT problems by "brute force," by using great dictionaries and relatively simple algorithms, were not stopped.5 At the same time theoretical studies were also conducted concerning formalization of language structures in a mode of close connection with the computers. Thus, for example, the principles of syntactic analysis of sentences and generation of "microsentences" were programmed and experimentally tested.6

3. G. Pospelov, Ob'ekt upravlenija - nauka, "NAUKA I ZHIZN," 1975, 11.
4. V. N. Gerasimov, Yu. N. Marchuk, SOVREMENNOE SOSTOJANIE MASHINNOGO PEREVODA, collection "MASHINNYJ PEREVOD I AVTOMATIZATSIJA INFORMATSIONNYKH PROTSESSOV," M., 1975.
5. J. M. Daniel, Translation by computer, "Electronics Weekly," 304, 1966, 7.
6. B. T. Carmody, P. E. Jones, Automatic derivation of microsentences, "Communications of the ACM," 9, 6, 1966.

[...] formal model of [...] on the basis of Chomsky's concepts,7 [...] of the syntactic structure of language, and of theoretical constructs in linguistics. This provided the opportunity to evaluate the results of theory from the point of view of practical significance for solving MT tasks, to select the level of realistically necessary details and formalization of linguistic descriptions, and to correctly modify the direction of further investigations.
One should not maintain the idea that the coupling of these studies with specific MT tasks, narrowing formal linguistic studies to their usefulness for MT and their feasibility for testing on the computer, hampered or suppressed the creative thinking of researchers and led to a blind alley, as is asserted by some purely theoretically oriented researchers.

Experience showed the opposite. Combining the theory with the solution of practical problems and with experimental testing of the theory led to the creation of a series of working systems, "unscientific" as their principles may be, with various degrees of automatization of the translation process. Thus, for example, there are in existence large automatic dictionaries, the use of which secures a higher quality and quicker human translation,9 systems of translation for information data,10 and systems of MT producing "rough" translations of arbitrary texts and, with additional editing, translations of high quality which are quicker and cheaper than those made by human hands.11

7. J. Friedman, A computer system for transformational grammar, "Communications of the ACM," 12, 6, 1969.
8. W. A. Woods, Transition network grammars for natural language analysis, "Communications of the ACM," 13, 5, 1970.
9. H. J. Schock, Zusammenarbeit Mensch/Maschine beim Umgang mit elektronisch gespei[...]
[...] created a real basis for [...] respective problems. These studies have demonstrated the role of linguistic investigations also for the development of electronic computer technology. In particular, they have made it possible to reformulate the new demands and conditions regarding the design of the computers of future generations, including considerations of specific features characteristic of human handling of the information data.12

In our own country, the development of MT after 1961 took another road. A new trend in linguistic investigations was formed and rooted which considered MT as a general scientific problem, but retained the old applied title "automatic translation." Within the framework of this trend the scientific-technical problem of MT was pictured as one of the many specific problems the solution of which was possible after completing the whole complex of theoretical studies in linguistics. Retaining the title "automatic translation" assisted this trend during its formative stage, since it created an illusion that the research would continue for purposes of solving the scientific-technical task of MT, while in reality this new trend set as its purpose quite different tasks far removed from MT problems.

The fundamentals of this theoretical direction are most completely exemplified in the preface to the book "AUTOMATIC TRANSLATION 1949-1963."13

12. D. G. Hays, Linguistics and the future of computations, "AFIPS Conference Proceedings," New York, 1973.
13. I. A. Mel'chuk, Preface to the book: I. A. MEL'CHUK, R. D. RAVICH, AVTOMATICHESKIJ PEREVOD 1949-63. A critical bibliographical manual, M., 1967. (Indications for pages are given in the text.)
In it, in particular, it is said that the investigations in [...] have [...] of a new [...] language [...] "[...] (automatic translation) is a specific job within a more general scientific goal--to teach the computers to learn human languages" (p. 7).

Thus, instead of solving a specific problem, a general goal with a global perspective is postulated. From this it follows that "the description of trends according to which the text is connected with the meaning is a central problem of linguistics--one that is theoretical and descriptive" (p. 8). "Within such wide phrasing of linguistic goals there is no natural border line between purely linguistic work and work concerning MT... Any sufficiently rigorous linguistic study or work that contains material adequately processed has a direct or at least indirect relevance for MT" (p. 9).

(It should be noted that the objective criterion for "strictness" and appropriateness of linguistic work for MT is the fact of its being included in the bibliographic list "AUTOMATIC ANALYSIS OF TEXT AND AUTOMATIC TRANSLATION" of RZh (Referativnyj Zhurnal) "INFORMATIKA," whose editor is I. A. Mel'chuk himself.)14

Furthermore, all the questions of algorithmization of procedures, using the results of linguistic investigations for applied goals, and the experimental testing of linguistic algorithms are declared as not [...]

14. I. A. Mel'chuk, OPYT TEORII LINGVISTICHESKIKH MODELEJ "SMYSL <--> TEKST," M., 1974. (Indications of pages are given in the text of this article.) English title: Experience in modeling the theory "MEANING <--> TEXT".

[...] it is unnecessary for linguists to do [...] mathematicians [...] expect from the linguistic [...]

[...] transforming syntactic constructions from Russian into German for translation pairs of languages, and also for intralanguage transformations and identification [...] of synonymic constructions. Representation of syntactic structures in the given system makes it feasible to solve such complex problems as finding the antecedents of (anaphoric) pronouns (C. Klimonov, GDR). G. S. Tsejtin presented new models for analysis, using the preference of linguistic constructs. These types of models, and also the models with limited nonprojectivity, were tested experimentally (G. S. Tsejtin, B. M. Lejkina).

The collective paper of scientists from Czechoslovakia described the basic components of the functional generative model. Semantic representations (formulae) are generated based on the syntax of dependences (E. Beneshova and others). An attempt was made to make a choice of grammatical analysis of text on theoretical grounds.

The general principles set up by Hjelmslev toward linguistic analysis are acceptable for MT, as well as the principles of descriptive linguistics. However, the concrete text analysis is based on its own laws, which are best described by distributional-statistical methods supported by probabilistic evaluations. The main principle of this method consists in the fact that the main postulates are not given a priori; rather, they are arrived at as a result of attempts at formalization of language (A. A. Koverin).

Many papers were devoted to specific syntactic problems; for example, interpretation of comparative relation in the grammar of syntactic analysis of [...] ([...]), the algorithm of analysis of the continuous type (E. E. [...]), analysis of prepositions ([...], T. [...] Nikanorova, N. Leont'eva), and rules for designing predicative relations (H. S. Fersh[...]ova, and others).

The automatic morphological analysis of the written Polish language was treated in an interesting paper by A. Lukashevicha (Polish People's Republic). M. P. Muravitskaja delivered a report on automatic morphemic analysis of verb forms.

It could be said that in semantic analysis the experimental method took root. Thus, the concepts of the semantic connectedness of the text are verified by algorithms for segmentation of texts into paragraphs and connected fragments (T. N. Rylova, L. V. Orlova, R. A. Kovalevich and others, T. V. Dolgaleva, G. S. Osipov and others). Some presentations deal with dictionaries containing semantic information, and also with the semantics of specific words and word combinations in natural language (M. I. Otkupshchikova, G. M. Il'in and others, O. A. Shteronova).

The research in the field of formalization of semantics provides output of immediate interest for information analysis and retrieval (automatic indexing and annotation, creation of dictionaries and thesauri). Because of this, the paper of V. A. Moskovo'o and Yu. S. Martemjanov dealing with a generative model was very interesting, as were the papers on the communicative organization of text and its reflection in the semantic structures (E. I. Korolev, A. M. Shaljapina) and on the linguistic justification of the "Question-Answer" system (Konrad P., GDR). The quantitative evaluation of the quality of translation was also addressed (N. A. Kuzemskaja, E. F. Skorokhod'ko).

In the section [...] papers were presented dealing with [...] the machine and man within the process of inter- and post-editing in the MT systems ([...]), the approach toward mathematical and [...] for MT (D. N. [...] Skitnevskij), and design of general and specific programs (N. A. Krupko and others, N. G. Arsent'eva, R. S. Karetnikov, L. N. Beljaeva, S. A. Anan'evskij and others, N. A. Balandina, V. S. Krisevich and others, L. F. Lukjanenkov and others). The problems of automatic recognition and synthesis of hieroglyphs were also considered (S. M. Shevenko). The results of various programs were demonstrated, representing linguistic algorithms.

In the sessions of this seminar 205 persons participated, approximately 80 theses of papers were sent to the organization committee, and 63 papers and reports were presented in sections and at plenary sessions. Scientists and collectives from five countries and 15 cities of the USSR participated. This shows the interest in MT and the trend of practically all Soviet collectives and scientists studying MT, and also the specialists from other countries--members of MSNTI (?), toward organizational unity in designing scientific and technical problems for commercially suitable MT systems.

Yu. Marchuk (Moskva