REPORT ON: 1) SOME PRINCIPLES OF THE UNIFIED TRANSFER SYSTEM (UTS) 2) AUTOMATIC DECLENSION OF RUSSIAN NOUNS FOR UTS 3) COMPUTER IMPLEMENTATION OF UTS

Document Type: 
Collection: 
Document Number (FOIA) /ESDN (CREST): 
CIA-RDP64-00046R000200030003-3
Release Decision: 
RIPPUB
Original Classification: 
K
Document Page Count: 
75
Document Creation Date: 
December 15, 2016
Document Release Date: 
December 19, 2003
Sequence Number: 
3
Case Number: 
Publication Date: 
January 1, 1960
Content Type: 
REPORT
Body: 
Approved For Release 2004/01/15 : CIA-RDP64-00046R000200030003-3

REPORT ON:

1) SOME PRINCIPLES OF THE UNIFIED TRANSFER SYSTEM
   By Ariadne Lukjanow

2) AUTOMATIC DECLENSION OF RUSSIAN NOUNS FOR UNIFIED TRANSFER SYSTEM
   By Rudolf Loewenthal

3) COMPUTER IMPLEMENTATION OF UNIFIED TRANSFER SYSTEM
   By B. D. Blickstein

January 1960

C-E-I-R, INC.
Main Office: 734 Fifteenth Street, N.W., Washington 5, D.C.
Research Center: 1200 Jefferson Davis Highway, Arlington 2, Va.

REPORT ON SOME PRINCIPLES OF THE UNIFIED TRANSFER SYSTEM (UTS)
By Ariadne Lukjanow
C-E-I-R, INC.

I. INTRODUCTION

Several approaches have been employed in Machine Translation in the course of the past few years. These approaches were either determined by specific objectives or influenced by the background of the research workers.
The objectives range from automatic dictionaries to translations with varying degrees of accuracy, readability, and perfection. The background of a researcher can influence his approach to Machine Translation in three basic ways. One approach may be influenced by machines in such a way that only the development of a new language computer would lead to acceptable results. Another approach may consist of an attempt to simulate human reasoning on a standard computer. A third approach would be to make Machine Translation as mechanical and utilitarian as possible, by adapting this attempt to the capabilities of the machine and by clearly defining the relationship between man and machine. Since present-day computers are best suited to repetitive mathematical operations and man is still the best thinker, this last approach will make it possible to utilize both of these capabilities to their fullest extent. All thinking will be expressed in the form of codes in the dictionary in the manner provided for by the system.

In order to translate at all, any system must provide solutions to the problem of transferring structure, function, form and meaning from the source language into the target language. Thus, we can call translation a fourfold transfer process consisting of:

(1) Transfer of the function of words (parts of speech)
(2) Transfer of the form of words (morphology)
(3) Transfer of the meaning of words (semantics)
(4) Transfer of the location of words (syntax)

Every word has a meaning, even if there occurs a so-called "zero-translation," or non-translation. In this system, we shall accept a 1:1 translation as equivalent to no-meaning problem. Every word in a language has its function; i.e., it is a part of speech and, unless it is a non-translation item, it also has a location or position (syntax) qualification.
The transfer process can be visualized as a combination of the following six concepts:

(1) Function (some "particles," some adverbs)
(2) Function + location (some punctuation marks, some adverbs, some gerunds)
(3) Function + form + location (groups from all parts of speech)
(4) Function + form (some prepositions, some adverbs, some gerunds, negations, etc.)
(5) Function + form + meaning + location (groups from every part of speech)
(6) Function + meaning + location (some adverbs, some conjunctions, etc.)

Example: Combination of function and location:

posle - later; adverb with a 1:1 translation equivalent and location "after verb."
Colon, punctuation mark: 1:1 equivalent; position is at the end of a clause.

Function   Form   Meaning   Location
   x        0       0          0
   0        x       0          0
   0        0       x          0
   0        0       0          x
   x        x       0          0
   x        0       x          0
   x        0       0          x
   0        x       x          0
   0        x       0          x
   0        0       x          x
   x        x       x          0
   x        0       x          x
   0        x       x          x
   x        x       0          x

It would seem that these variations could be expressed in mathematical formulae, but this is not true because the relationship between the variants does not follow the rules of permutation or random combinations. In contrast, these variations follow definite linguistic rules which permit only certain variants within certain combinations. In order to determine these linguistic combinations for the elements of transfer, it is necessary to define and classify each variant for every element of transfer, as well as the relationship between the variants of each element of the transfer to the variants of the other three.
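The distinction between purely combinatorial possibilities and linguistically permitted ones can be sketched in a few lines of Python. This is an illustration only: the names ELEMENTS, ALLOWED, all_combinations, and is_valid_combination are hypothetical, and the whitelist holds just the six combinations listed in this section, not a full rule set.

```python
from itertools import combinations

# The four elements of transfer named in the report.
ELEMENTS = ("function", "form", "meaning", "location")

# The six combinations listed in this section.
# Hypothetical encoding: each combination is a frozenset of element names.
ALLOWED = {
    frozenset({"function"}),
    frozenset({"function", "location"}),
    frozenset({"function", "form", "location"}),
    frozenset({"function", "form"}),
    frozenset({"function", "form", "meaning", "location"}),
    frozenset({"function", "meaning", "location"}),
}

def all_combinations(elements=ELEMENTS):
    """Every non-empty subset of the elements (the purely combinatorial view)."""
    return [frozenset(c)
            for r in range(1, len(elements) + 1)
            for c in combinations(elements, r)]

def is_valid_combination(combo):
    """True if the combination is one of the permitted ones."""
    return frozenset(combo) in ALLOWED

if __name__ == "__main__":
    candidates = all_combinations()
    valid = [c for c in candidates if is_valid_combination(c)]
    # Prints: 15 combinatorial possibilities, 6 accepted
    print(len(candidates), "combinatorial possibilities,", len(valid), "accepted")
```

The point of the whitelist is that validity is looked up rather than computed: the linguistic judgment is made once, in advance, and the program only checks membership.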
This can best be illustrated on prepositions:

ELEMENT OF TRANSFER   DEFINITION
function              preposition
form                  case government; i.e., prepositions demanding the genitive, dative, accusative, instrumental, or locative
meaning               prepositions of time (static, earlier, later), location or space (where, to where, from where), cause, goal, substitution, division, etc.
location              first item in prepositional phrase, or position 1 in prepositional phrase

Theoretically, we could produce a transfer combination of preposition + dative + location (from where) + position 1 of prepositional phrase, but the grammatical rules and semantic connotations do not permit this type of combination. The prepositions of location are subject to the following division only:

LOCATION         GENITIVE                      DATIVE   ACCUSATIVE                          INSTRUMENTAL                  LOCATIVE
a) where?        bliz, mezhdu, vne, sredi, u   po       -                                   za, mezhdu, nad, pered, pod   v, na, pri
b) where to?     do                            k        v, za, o, na, pod, skvoz6, cherez   -                             -
c) from where?   iz, iz-za, iz-pod, ot, s      -        -                                   -                             -

The above table shows that the "from where?" definition is used only with the genitive case. Thus, the only usable and meaningful combination is:

preposition + genitive + location (from where?) + first position of prepositional phrase

In the UTS we accept any meaningful and valid combination of elements of transfer expressed in the form of numerical digits as a single unified transfer code. Since many words of the source language can be associated with several function, form, meaning, and location qualifications, it is necessary to combine single transfer code units into sets of codes which can express these variations.

Examples:

dannye   nominal; modifier
vdol6    preposition of genitive; adverb
s        preposition of genitive; preposition of instrumental
sredi    preposition of location (where?); preposition of time (static)

If we consider that we have four elements of transfer, each of which has a definite and limited number of variants, it is safe to assume that the number of transfer codes is limited and that we may likewise assume that the same applies to sets of transfer codes. This leads us to the concept that numerous words in the dictionary are associated with identical transfer codes or identical sets of transfer codes. This fact makes possible the concept of code patterns. The number of single transfer code units in the pattern can vary from one to several. After examining some 50,000 canonical entries (stems) in the dictionary of Smirnitskij, we have decided to set the limit at a maximum of 25 single code units in the pattern.

Now let us examine the actual elements of each transfer. Since in translation we are dealing with at least two languages simultaneously, we have to develop a criterion for parts of speech, morphology, semantics, and syntax which would accommodate both languages under consideration, or we must establish a classification system which in form of transfer codes would permit us to place an equal sign between the two languages. This necessitates a certain type of analysis and of synthesis of the grammars of both languages.

II. THE FUNCTION OF WORDS OR THE CATEGORIZATION OF WORD BEHAVIOR

When examining conventional parts of speech in Russian and English grammars separately, we note that they contain identical categories such as prepositions, adverbs, nominals, modifiers, etc. But when we compare these categories of both languages, we discover that they differ considerably in usage, behavior, and function.
In terms of a translation system, this means that either we have to introduce new synthetic categories or we have to divide and redistribute words differently within these categories. Categorizing is, of course, a somewhat subjective process. That can best be illustrated by examining the English preposition "to," in the following manner:

QUALIFICATION   ENGLISH                       RUSSIAN EQUIVALENTS   BILINGUAL DATA             TRANSFER DATA (CLASSIFICATION)
Function        1. preposition                1. preposition        1. preposition-like item   1. preposition code
Behavior        2. introducer of infinitive   2. non-existent       2. particle-like item      2. particle code

[...] classified as a special auxiliary verb (instead of "particle"), but to the author of the system the definition as "particle" appears more reasonable, perhaps because of the occurrence of the Russian particle "by" in the verbal phrase.

In the process of comparative analysis-synthesis, we have established the following basic categories as transfer parts of speech (listed alphabetically):

(1) adjectival modifier
(2) adjective/noun
(3) adverb (incl. some gerunds and the particle li)
(4) adverbial modifier (type: bolee, menee, etc.)
(5) auxiliary verb (byl, byli, etc.)
(6) auxiliary verb (moch6, khotet6, etc.)
(7) conjunction
(8) negation (incl. some negative adverbs)
(9) nominal (animate), incl. some pronouns
(10) nominal (inanimate), incl. some pronouns and numerals
(11) nominal (formulae, cardinal numbers, missing words)
(12) numerical modifier
(13) particle
(14) participial modifier
(15) preposition
(16) pronominal modifier
(17) pronoun (type: nami, vami, imi, etc.)
(18) pronoun (soboj)
(19) punctuation marks (each treated as a separate category, a total of six)
(20) verb (including participles such as izucheny, otkryty, etc.)

The assignment of these basic categories to individual words is a discrete and subjective process.
It can give valid results only if all other factors and constituent parts of transfer are being taken into consideration. We proceed from the parts of speech as categories to their classification. That can be expressed in the form of a numeric code. We know that sentences and phrases are combinations of these categories and that these combinations cannot be produced by random distribution of words. Words have to occupy certain positions in order to form a meaningful combination or phrase.

If we take the three-word phrase "in this room," we cannot convey the same idea by a redistribution of the participating words:

"this in room"
"this room in"
"room in this"
"room this in"
"in room this"

We will either get a meaningless jumble of words or convey a different idea. We say "our new building," but not "new our building." We place some adverbs before verbs, some after them. Some of these phenomena can be explained, some are ascribed to usage, but others escape any logical explanation.

Dealing with 26 categories and considering each of them in relation to the other 25, we can establish a hierarchy within the meaningful combinations of parts of speech; i.e., logical sequences. This point can be illustrated by the position of words within the sequence of a prepositional phrase consisting of a preposition (P), a nominal (N), two adjectival modifiers (AM), and a pronominal modifier (PM):

P before N
AM before N
PM before N
PM before AM
P before PM
P before AM
AM = AM

Thus, we arrive at P-PM-AM-AM-N; or if we assign numerical values to these categories and would like them to form a progression of i1, i2, i3, etc., we will emerge with the following correlations.

[Table of progression numbers not recoverable from the source scan.]
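The pairwise precedence rules for the prepositional phrase can be turned into numeric ranks mechanically. The sketch below is an illustration under assumptions, not the report's own procedure: the names RULES, progression_numbers, and order_phrase are hypothetical, the rank values it produces are not the report's progression numbers, and the English sample words are invented for the example.

```python
# Precedence rules for the prepositional phrase, as (earlier, later) pairs.
RULES = [("P", "N"), ("AM", "N"), ("PM", "N"),
         ("PM", "AM"), ("P", "PM"), ("P", "AM")]

def progression_numbers(rules):
    """Rank each category by the longest chain of rules leading up to it."""
    categories = {c for pair in rules for c in pair}
    rank = {c: 1 for c in categories}
    # With n categories and no cyclic rules, n - 1 relaxation passes suffice.
    for _ in range(len(categories) - 1):
        for earlier, later in rules:
            rank[later] = max(rank[later], rank[earlier] + 1)
    return rank

def order_phrase(words, rank):
    """Stable sort of (word, category) pairs; equal categories keep their order."""
    return sorted(words, key=lambda wc: rank[wc[1]])

if __name__ == "__main__":
    rank = progression_numbers(RULES)
    phrase = [("room", "N"), ("new", "AM"), ("in", "P"),
              ("large", "AM"), ("this", "PM")]
    # Prints: in this new large room
    print(" ".join(word for word, _ in order_phrase(phrase, rank)))
```

Because the sort is stable, the two adjectival modifiers share one rank and keep their relative order, which mirrors the "AM = AM" rule.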
COMPUTER IMPLEMENTATION OF THE UNIFIED TRANSFER SYSTEM
B. D. Blickstein

The basic flow diagram, Figure 1 on the next page, traces the basic functions which the computer must follow, and shows the necessary magnetic tape configuration. Also shown on this chart is an index to the tapes, showing the processes in which each tape is involved. The flow chart is divided into the following computer program steps:

1. Text Preparation
The entry to this box is the raw text, prepared by either key-punching from the Russian or by a character-scanning device. The function of this program is to convert the text to a form which the machine may more easily accept. At the same time, Romanized expressions will be extracted and saved for later re-entry into the system. At this point, a transliteration of the text can be produced.

2. Alpha Sort
The sequenced and prepared text is now sorted into dictionary order. The original text sequence numbers are retained.

3. Dictionary Search
The sorted text is matched against the dictionary tape. For each text entry for which a dictionary match exists, a record will be written on tape D, consisting of the appropriate pattern number and the set of English meanings, still retaining the text sequence number.
For each text entry which has no match, a dummy "word missing" record will be written, and the Russian word written on the "missing entries" tape D1 for subsequent printing.

4. Sequence Sort
Tape D is now sorted back into text sequence. At the end of the sort, a split of the tape D record will occur, creating two tapes, E and E1; tape E contains only pattern numbers, and tape E1 the corresponding sets of English meanings.

[Figure 1. Basic flow diagram and index to the tapes.]

5. Unified Transfer
The basic code matching algorithms are performed here. Blocks are recognized, and the proper meaning selections are made; the output is the sequenced selections tape. The computer considerations of this section will be treated later at some length.

6. English Extraction
The selections tape is used to select the proper English meaning from the English tape at this point. The output is an English text with certain block marks present.

7. Syntactic Ordering
Re-arrangement of the syntactic blocks is performed here; at the same time, the Romanized expressions are merged back into the text, and a final translation tape, suitable for printing, is produced.
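Steps 2 through 4 amount to a sort, a single sequential pass against a sorted dictionary, and a sort back into text order. A minimal in-memory sketch follows, assuming Python lists in place of magnetic tapes. The function name dictionary_search, the sample dictionary, its pattern numbers and English glosses, and the use of None for the dummy record are all hypothetical and are not taken from the report.

```python
# Hypothetical miniature dictionary "tape", already in alphabetical order:
# (Russian entry, pattern number, set of English meanings).
DICTIONARY = [
    ("dannye", 17, ["data", "given"]),
    ("posle", 4, ["later"]),
    ("sredi", 9, ["among"]),
]

def dictionary_search(words):
    """Return tapes E and E1 plus the missing-entries tape D1 for a word list."""
    # Step 2 (Alpha Sort): attach sequence numbers, sort into dictionary order.
    sorted_text = sorted((word, seq) for seq, word in enumerate(words))

    # Step 3 (Dictionary Search): one sequential pass over both sorted lists.
    tape_d, tape_d1 = [], []
    i = 0
    for word, seq in sorted_text:
        while i < len(DICTIONARY) and DICTIONARY[i][0] < word:
            i += 1
        if i < len(DICTIONARY) and DICTIONARY[i][0] == word:
            _, pattern, meanings = DICTIONARY[i]
            tape_d.append((seq, pattern, meanings))
        else:
            # Dummy "word missing" record; pattern None is an assumption.
            tape_d.append((seq, None, ["<word missing>"]))
            tape_d1.append(word)

    # Step 4 (Sequence Sort): restore text order, then split each record.
    tape_d.sort(key=lambda record: record[0])
    tape_e = [(seq, pattern) for seq, pattern, _ in tape_d]
    tape_e1 = [(seq, meanings) for seq, _, meanings in tape_d]
    return tape_e, tape_e1, tape_d1

if __name__ == "__main__":
    e, e1, d1 = dictionary_search(["sredi", "xyz", "dannye", "posle"])
    print(e)   # [(0, 9), (1, None), (2, 17), (3, 4)]
    print(d1)  # ['xyz']
```

Sorting the text first lets the dictionary be read once from front to back, which suits a sequential medium such as tape; carrying the sequence number through every record is what makes the return to text order possible.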
Some discussion of the matching algorithms is appropriate here; the first part of the process is shown in Figure 2 on the next page. This involves the identification of phrases by means of the parts-of-speech code numbers, which we shall refer to as progression numbers. Let PR(j) be the progression number associated with the jth text sequence. As the translation progresses, suppose all phrases through the (j-1)th are strung, and we thus wish to find the boundaries of the phrase beginning with this jth word. The flow chart (beginning at step t0) traces the entire technique for identifying the phrase. At the conclusion of this process, the phrase is bounded, and the code matching on the actual dictionary patterns may commence.

[Figure 2. Unified Transfer: Phrase Identification.]

It can be seen that this algorithm involves little else than a few arithmetic counts and comparisons, and certainly no analysis of the source language is performed. This example serves well to point up the essential philosophy of the Unified Transfer technique; the computer is used for the things it does best, namely arithmetic and logic, while the analysis is done in advance by means of the dictionary. We do not ask the computer to come to conclusions about form; we merely ask it to choose between various possible forms on a basis of simple logical rules. In this way, the full power of the machine is used in the most efficient manner. The subsequent code matching process is also designed with this same philosophy. The only question asked is basically an "equal-or-unequal" choice; blocking for syntactic re-arrangement is similarly well suited to this type of treatment. In no case does the machine ever "know" about syntax or meaning; it only follows completely abstract rules for operating on certain numerical