REPORT ON: 1) SOME PRINCIPLES OF THE UNIFIED TRANSFER SYSTEM (UTS) 2) AUTOMATIC DECLENSION OF RUSSIAN NOUNS FOR UTS 3) COMPUTER IMPLEMENTATION OF UTS
Document Type:
Collection:
Document Number (FOIA) /ESDN (CREST):
CIA-RDP64-00046R000200030003-3
Release Decision:
RIPPUB
Original Classification:
K
Document Page Count:
75
Document Creation Date:
December 15, 2016
Document Release Date:
December 19, 2003
Sequence Number:
3
Case Number:
Publication Date:
January 1, 1960
Content Type:
REPORT
Body:
Approved For Release 2004/01/15 : CIA-RDP64-00046R000200030003-3
1) SOME PRINCIPLES OF THE UNIFIED TRANSFER
SYSTEM (UTS)
2) AUTOMATIC DECLENSION OF RUSSIAN NOUNS
FOR UTS
3) COMPUTER IMPLEMENTATION OF UTS
By
Ariadne Lukjanow
Rudolf Loewenthal
B. D. Blickstein
C-E-I-R
MAIN OFFICE: 734 Fifteenth Street, N.W., Washington 5, D. C.
REPORT ON:
1) SOME PRINCIPLES OF THE UNIFIED TRANSFER SYSTEM
By
Ariadne Lukjanow
2) AUTOMATIC DECLENSION OF RUSSIAN NOUNS FOR
UNIFIED TRANSFER SYSTEM
By
Rudolf Loewenthal
3) COMPUTER IMPLEMENTATION OF UNIFIED TRANSFER
SYSTEM
By
B. D. Blickstein
January 1960
C-E-I-R, INC.
Main Office: 734 Fifteenth Street, N.W., Washington 5, D.C.
Research Center: 1200 Jefferson Davis Highway, Arlington 2, Va.
SOME PRINCIPLES OF THE UNIFIED TRANSFER SYSTEM
By Ariadne Lukjanow
AUTOMATIC DECLENSION OF RUSSIAN NOUNS FOR
UNIFIED TRANSFER SYSTEM
By Rudolf Loewenthal
COMPUTER IMPLEMENTATION OF UNIFIED TRANSFER SYSTEM
By B. D. Blickstein
REPORT ON SOME PRINCIPLES OF THE UNIFIED TRANSFER SYSTEM (UTS)
By
Ariadne Lukjanow
C-E-I-R, INC.
I. INTRODUCTION
Several approaches have been employed in Machine Translation in the course
of the past few years. These approaches were either determined by specific
objectives or influenced by the background of the research workers. The
objectives range from automatic dictionaries to translations with varying
degrees of accuracy, readability, and perfection. The background of a
researcher can influence his approach to Machine Translation in three basic
ways. One approach may be influenced by machines in such a way that only the
development of a new language computer would lead to acceptable results.
Another approach may consist of an attempt to simulate human reasoning on a
standard computer.
A third approach would be to make Machine Translation as mechanical and
utilitarian as possible, by adapting this attempt to the capabilities of the
machine and by clearly defining the relationship between man and machine.
Since present-day computers are best suited to repetitive mathematical
operations and man is still the best thinker, this last approach will make it
possible to utilize both of these capabilities to their fullest extent. All
thinking will be expressed in the form of codes in the dictionary in the
manner provided for by the system.
In order to translate at all, any system must provide solutions to the
problem of transferring structure, function, form and meaning from the source
language into the target language. Thus, we can call translation a fourfold
transfer process consisting of:
(1) Transfer of the function of words (parts of speech)
(2) Transfer of the form of words (morphology)
(3) Transfer of the meaning of words (semantics)
(4) Transfer of the location of words (syntax)
Every word has a meaning, even if there occurs a so-called
"zero-translation," or non-translation. In this system, we shall accept a 1:1
translation as equivalent to the absence of a meaning problem.
Every word in a language has its function; i.e., it is a part of speech
and, unless it is a non-translation item, it also has a location or position
(syntax) qualification. The transfer process can be visualized as a
combination of the following six concepts:
(1) Function (some "particles," some adverbs)
(2) Function + location (some punctuation marks, some adverbs,
some gerunds)
(3) Function + form + location (groups from all parts of speech)
(4) Function + form (some prepositions, some adverbs, some gerunds,
negations, etc.)
(5) Function + form + meaning + location (groups from every part of
speech)
(6) Function + meaning + location (some adverbs, some conjunctions,
etc.)
Example:
Combination of function and location:
posle - later; adverb with a 1:1 translation equivalent
and location "after verb."
Colon, punctuation mark: 1:1 equivalent; position is at
the end of a clause.
Function    Form    Meaning    Location
   x         0         0          0
   0         x         0          0
   0         0         x          0
   0         0         0          x
   x         x         0          0
   x         0         x          0
   x         0         0          x
   0         x         x          0
   0         x         0          x
   0         0         x          x
   x         x         x          0
   x         0         x          x
   0         x         x          x
   x         x         0          x
It would seem that these variations could be expressed in mathematical
formulae, but this is not true because the relationship between the variants
does not follow the rules of permutation or random combinations. Instead,
these variations follow definite linguistic rules which permit only certain
variants within certain combinations. In order to determine these linguistic
combinations for the elements of transfer, it is necessary to define and
classify each variant for every element of transfer, as well as the relationship
between the variants of each element of the transfer to the variants of the other
three.
This can best be illustrated with prepositions:
ELEMENT OF TRANSFER    DEFINITION
function               preposition
form                   case government; i.e., prepositions demanding the
                       genitive, dative, accusative, instrumental, or locative
meaning                prepositions of time (static, earlier, later), location
                       or space (where, to where, from where), cause, goal,
                       substitution, division, etc.
location               first item in prepositional phrase, or position 1 in
                       prepositional phrase
Theoretically, we could produce a transfer combination of preposition +
dative + location (from where) + position 1 of prepositional phrase, but the
grammatical rules and semantic connotations do not permit this type of com-
bination. The prepositions of location are subject to the following division
only:
LOCATION         GENITIVE   DATIVE   ACCUSATIVE   INSTRUMENTAL   LOCATIVE
a) where?        bliz       po                    za             v
                 vne                              mezhdu         na
                 mezhdu                           nad            pri
                 sredi                            pered
                 u                                pod
b) where to?     do         k        v
                                     za
                                     o
                                     na
                                     pod
                                     skvoz6
                                     cherez
c) from where?   iz
                 iz-za
                 iz-pod
                 ot
                 s
The above table shows that the "from where?" definition is used
only with the genitive case. Thus, the only usable and meaningful combination
is:
preposition + genitive + location (from where?) +
first position of prepositional phrase
In the UTS we accept any meaningful and valid combination of elements of transfer
expressed in the form of numerical digits as a single unified transfer code.
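The restriction described here can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the case sets are transcribed from the table of prepositions of location above, while the digit assignments and the names ALLOWED_CASES, is_valid, and transfer_code are hypothetical, since the report does not give the actual numerical codes.

```python
# Cases permitted for each location meaning of a preposition, as read from
# the table above (an illustrative transcription, not the authors' coding).
ALLOWED_CASES = {
    "where": {"genitive", "dative", "instrumental", "locative"},
    "where to": {"genitive", "dative", "accusative"},
    "from where": {"genitive"},
}

# Hypothetical digit assignments; the report does not state the real ones.
CASE_DIGIT = {"genitive": 1, "dative": 2, "accusative": 3,
              "instrumental": 4, "locative": 5}
MEANING_DIGIT = {"where": 1, "where to": 2, "from where": 3}


def is_valid(case, meaning):
    """True if a preposition governing `case` may carry this location meaning."""
    return case in ALLOWED_CASES.get(meaning, set())


def transfer_code(case, meaning, position=1):
    """Pack case, meaning, and phrase position into one digit string."""
    if not is_valid(case, meaning):
        raise ValueError(f"{case} + {meaning} is not a permitted combination")
    return f"{CASE_DIGIT[case]}{MEANING_DIGIT[meaning]}{position}"


# The one meaningful "from where?" combination is accepted ...
print(transfer_code("genitive", "from where"))
# ... while the theoretical dative variant is rejected.
print(is_valid("dative", "from where"))
```

Only combinations that pass the check receive a code, which mirrors the statement that the system accepts only meaningful and valid combinations of the elements of transfer.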
Since many words of the source language can be associated with
several function, form, meaning, and location qualifications, it is necessary
to combine single transfer code units into sets of codes which can express
these variations.
Examples:
dannye     nominal
           modifier
vdol6      preposition of genitive
           adverb
s          preposition of - genitive
                          - instrumental
sredi      preposition of - location (where?)
                          - time (static)
If we consider that we have four elements of transfer, each of which has
a definite and limited number of variants, it is safe to assume that the number
of transfer codes is limited, and likewise the number of sets of transfer
codes. It follows that numerous words in the dictionary are associated with
identical transfer codes or identical sets of transfer codes. This fact makes
possible the concept of code patterns.
The number of single transfer code units in the pattern can vary from one to
several. After examining some 50,000 canonical entries (stems) in the dictionary
of Smirnitskij, we have decided to set the limit at a maximum of 25 single code
units in the pattern.
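The idea of a code pattern can be illustrated with a minimal sketch: every distinct set of transfer codes receives one pattern number, and each dictionary stem stores only that number. The code strings, the function name build_patterns, and the sample entries are hypothetical and serve only to show the grouping; the limit of 25 units is the one stated above.

```python
MAX_UNITS = 25  # maximum number of single code units per pattern (from the text)


def build_patterns(dictionary):
    """Assign one pattern number to every distinct set of transfer codes.

    `dictionary` maps a stem to its sequence of transfer codes.
    Returns (stem -> pattern number, pattern number -> tuple of codes).
    """
    pattern_number = {}  # tuple of codes -> pattern number
    stem_pattern = {}    # stem -> pattern number
    for stem, codes in dictionary.items():
        codes = tuple(codes)
        if not 1 <= len(codes) <= MAX_UNITS:
            raise ValueError(f"{stem}: a pattern holds 1 to {MAX_UNITS} code units")
        # Reuse the number of an identical code set, or issue the next one.
        stem_pattern[stem] = pattern_number.setdefault(
            codes, len(pattern_number) + 1)
    return stem_pattern, {n: c for c, n in pattern_number.items()}


# Hypothetical entries: the code strings are invented for illustration.
entries = {
    "bliz": ("P-GEN-WHERE",),
    "vne": ("P-GEN-WHERE",),
    "sredi": ("P-GEN-WHERE", "P-GEN-TIME"),
}
stem_pattern, patterns = build_patterns(entries)
print(stem_pattern)
print(patterns)
```

In this sketch the stems bliz and vne share a pattern because their code sets are identical, while sredi receives a second pattern with two code units.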
Now let us examine the actual elements of each transfer. Since in
translation we are dealing with at least two languages simultaneously, we have
to develop a criterion for parts of speech, morphology, semantics, and syntax
which would accommodate both languages under consideration, or we must establish
a classification system which, in the form of transfer codes, would permit us to place
an equal sign between the two languages. This necessitates a certain type of
analysis and of synthesis of the grammars of both languages.
II. THE FUNCTION OF WORDS OR THE CATEGORIZATION OF WORD BEHAVIOR
When examining conventional parts of speech in Russian and English grammars
separately, we note that they contain identical categories such as prepositions,
adverbs, nominals, modifiers, etc. But when we compare these categories of both
languages, we discover that they differ considerably in usage, behavior, and
function. In terms of a translation system, this means that either we have to
introduce new synthetic categories or we have to divide and redistribute words
differently within these categories. Categorizing is, of course, a somewhat
subjective process. That can best be illustrated by examining the English
preposition "to," in the following manner:
QUALIFICATION   ENGLISH                        RUSSIAN EQUIVALENTS   BILINGUAL DATA              TRANSFER DATA (CLASSIFICATION)
Function        1. preposition                 1. preposition        1. preposition-like item    1. preposition code
Behavior        2. introducer of infinitive    2. non-existent       2. particle-like item       2. particle code
As an introducer of the infinitive, "to" could also be
classified as a special auxiliary verb (instead of "particle"), but to the
author of the system the definition as "particle" appears more reasonable,
perhaps because of the occurrence of the Russian particle "by" in the verbal
phrase.
In the process of comparative analysis-synthesis, we have established the
following basic categories as transfer parts of speech (listed alphabetically):
(1) adjectival modifier
(2) adjective/noun
(3) adverb (incl. some gerunds and the particle li)
(4) adverbial modifier (type: bolee, menee, etc.)
(5) auxiliary verb (byl, byli, etc.)
(6) auxiliary verb (moch6, khotet6, etc.)
(7) conjunction
(8) negation (incl. some negative adverbs)
(9) nominal (animate), incl. some pronouns
(10) nominal (inanimate), incl. some pronouns and numerals
(11) nominal (formulae, cardinal numbers, missing words)
(12) numerical modifier
(13) particle
(14) participial modifier
(15) preposition
(16) pronominal modifier
(17) pronoun (type: nami, vami, imi, etc.)
(18) pronoun (soboj)
(19) punctuation marks (each treated as a separate
category, a total of six)
(20) verb (including participles such as izucheny, otkryty, etc.)
The assignment of these basic categories to individual words is a discrete
and subjective process. It can give valid results only if all other factors and
constituent parts of transfer are being taken into consideration. We proceed
from the parts of speech as categories to their classification. That can be
expressed in the form of a numeric code.
We know that sentences and phrases are combinations of these categories
and that these combinations cannot be produced by random distribution of words.
Words have to occupy certain positions in order to form a meaningful combination
or phrase.
If we take the three-word phrase "in this room," we cannot convey the
same idea by a redistribution of the participating words:
"this in room"
"this room in"
"room in this"
"room this in"
"in room this"
We will either get a meaningless jumble of words or convey a different
idea. We say "our new building," but not "new our building." We place some
adverbs before verbs, some after them. Some of these phenomena can be explained,
some are ascribed to usage, but others escape any logical explanation.
Dealing with 26 categories and considering each of them in relation to
the other 25, we can establish a hierarchy within the meaningful combinations
of parts of speech; i.e., logical sequences.
This point can be illustrated by the position of words within the sequence
of a prepositional phrase consisting of a preposition (P), a nominal (N), two
adjectival modifiers (AM), and a pronominal modifier (PM):
P before N
AM before N
PM before N
PM before AM
P before PM
P before AM
AM = AM
Thus, we arrive at P-PM-AM-AM-N; or, if we assign numerical values to
these categories and would like them to form a progression i1, i2, i3, etc.,
we will emerge with the following correlations.
[Table of correlations between the categories and their numerical progression values; not legible in the source.]
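The step from "before" relations to numerical values in the prepositional-phrase example can be sketched as follows. This is a minimal illustration, not the author's procedure: each category receives the smallest number consistent with the listed relations, and the names BEFORE and progression_numbers are hypothetical.

```python
# "X before Y" relations from the prepositional-phrase example.
BEFORE = [("P", "N"), ("AM", "N"), ("PM", "N"),
          ("PM", "AM"), ("P", "PM"), ("P", "AM")]


def progression_numbers(before):
    """Give each category the smallest number consistent with the relations."""
    categories = {c for pair in before for c in pair}
    rank = {c: 1 for c in categories}
    # Relax repeatedly: a category must rank above everything that precedes it.
    for _ in range(len(categories)):
        changed = False
        for first, second in before:
            if rank[second] <= rank[first]:
                rank[second] = rank[first] + 1
                changed = True
        if not changed:
            return rank
    raise ValueError("the relations are circular")


rank = progression_numbers(BEFORE)
# Two adjectival modifiers share one value (AM = AM), so a stable sort
# keeps them in their original relative order.
phrase = ["N", "AM", "P", "AM", "PM"]
print("-".join(sorted(phrase, key=rank.get)))
```

Sorting the words of a phrase by these values reproduces the sequence P-PM-AM-AM-N, a non-decreasing progression of the assigned numbers.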
COMPUTER IMPLEMENTATION OF THE UNIFIED TRANSFER SYSTEM
By
B. D. Blickstein
The basic flow diagram, Figure 1 on the next page, traces the basic
functions which the computer must follow, and shows the necessary magnetic
tape configuration. Also shown on this chart is an index to the tapes,
showing the processes in which each tape is involved.
The flow chart is divided into the following computer program steps:
1. Text Preparation
The entry to this box is the raw text, prepared by either key-
punching from the Russian or by a character-scanning device. The function of
this program is to convert the text to a form which the machine may more easily
accept. At the same time, Romanized expressions will be extracted and saved
for later re-entry into the system. At this point, a transliteration of the
text can be produced.
2. Alpha Sort
The sequenced and prepared text is now sorted into dictionary order.
The original text sequence numbers are retained.
3. Dictionary Search
The sorted text is matched against the dictionary tape. For each
text entry for which a dictionary match exists, a record will be written on
tape D, consisting of the appropriate pattern number and the set of English
meanings, still retaining the text sequence number. For each text entry which
has no match, a dummy "word missing" record will be written, and the Russian
word written on the "missing entries" tape D1 for subsequent printing.
4. Sequence Sort
Tape D is now sorted back into text sequence. At the end of the
sort, a split of the tape D record will occur, creating two tapes, E and E1;
tape E contains only pattern numbers, and tape E1 the corresponding sets of
English meanings.
[Figure 1. Basic flow diagram of the computer program steps and magnetic tape configuration, with an index to the tapes.]
5. Unified Transfer
The basic code matching algorithms are performed here. Blocks are
recognized, and the proper meaning selections are made; the output is the
sequenced selections tape. The computer considerations of this section will be
treated later at some length.
6. English Extraction
The selections tape is used to select the proper English meaning
from the English tape at this point. The output is an English text with
certain block marks present.
7. Syntactic Ordering
Re-arrangement of the syntactic blocks is performed here; at the
same time, the Romanized expressions are merged back into the text, and a
final translation tape, suitable for printing, is produced.
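Steps 2 through 4 of the program sequence above can be sketched in Python. The sketch makes several assumptions that are not in the report: lists stand in for magnetic tapes, an in-memory mapping replaces the sequential match against the sorted dictionary tape, and the names dictionary_pass and MISSING_PATTERN, the pattern numbers, and the sample words are hypothetical.

```python
MISSING_PATTERN = 0  # hypothetical pattern number for the dummy record


def dictionary_pass(words, dictionary):
    """Sort the text, look each word up, restore text order, and split.

    `words` is the prepared text as a list of word forms; `dictionary` maps
    a word form to (pattern number, tuple of English meanings).
    """
    # Step 2: attach text sequence numbers, then sort into dictionary order.
    sorted_text = sorted(enumerate(words), key=lambda item: item[1])

    # Step 3: write one record per word ("tape D"); unmatched words get a
    # dummy record and are also listed on the missing-entries tape.
    tape_d, missing_tape = [], []
    for seq, word in sorted_text:
        if word in dictionary:
            pattern, meanings = dictionary[word]
        else:
            pattern, meanings = MISSING_PATTERN, ("<word missing>",)
            missing_tape.append(word)
        tape_d.append((seq, pattern, meanings))

    # Step 4: sort back into text sequence, then split the records into a
    # pattern-number tape and an English-meanings tape.
    tape_d.sort(key=lambda record: record[0])
    pattern_tape = [pattern for _, pattern, _ in tape_d]
    meaning_tape = [meanings for _, _, meanings in tape_d]
    return pattern_tape, meaning_tape, missing_tape


# Hypothetical two-entry dictionary and a three-word text.
sample = {"v": (7, ("in", "into")), "dome": (3, ("house",))}
print(dictionary_pass(["v", "etom", "dome"], sample))
```

Sorting the text before the search matters on tape equipment, because the dictionary can then be read once from beginning to end; the retained sequence numbers make it possible to restore the original word order afterwards.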
Some discussion of the matching algorithms is appropriate here; the first
part of the process is shown in Figure 2 on the next page. This involves the
identification of phrases by means of the parts-of-speech code numbers, which
we shall refer to as progression numbers. Let PR(j) be the progression number
associated with the jth text sequence. As the translation progresses, suppose
all phrases through the (j-1)th are strung, and we thus wish to find the
boundaries of the phrase beginning with this jth word. The flow chart (beginning
at step tO) traces the entire technique for identifying the phrase. At the
conclusion of this process, the phrase is bounded, and the code matching on the
actual dictionary patterns may commence. It can be seen that this algorithm
involves little else than a few arithmetic counts and comparisons, and certainly
no analysis of the source language is performed. This example serves well to
point up the essential philosophy of the Unified Transfer technique; the computer
is used for the things it does best, namely arithmetic and logic, while the
analysis is done in advance by means of the dictionary. We do not ask the computer
to come to conclusions about form; we merely ask it to choose between various
possible forms on a basis of simple logical rules. In this way, the full power
of the machine is used in the most efficient manner.
[Figure 2. Unified Transfer: Phrase Identification]
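The kind of counting and comparison involved in bounding a phrase can be sketched as follows. The actual rule of Figure 2 is not reproduced here; the sketch assumes, purely for illustration, that a phrase continues as long as the progression numbers PR(j) do not decrease. The function names and the sample numbers are hypothetical.

```python
def phrase_end(pr, j):
    """Return the index one past the last word of the phrase starting at j.

    ASSUMPTION: a phrase continues while progression numbers do not
    decrease. This stands in for the rule of Figure 2, which may differ.
    """
    k = j + 1
    while k < len(pr) and pr[k] >= pr[k - 1]:
        k += 1
    return k


def split_phrases(pr):
    """Bound every phrase in a sequence of progression numbers."""
    bounds, j = [], 0
    while j < len(pr):
        end = phrase_end(pr, j)
        bounds.append((j, end))
        j = end
    return bounds


# Hypothetical progression numbers for a seven-word text.
print(split_phrases([1, 2, 3, 3, 4, 1, 4]))
```

Under this assumption the procedure uses nothing but index arithmetic and numerical comparisons, which is the point the text makes about leaving linguistic analysis to the dictionary.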
The subsequent code matching process is also designed with this same
philosophy. The only question asked is basically an "equal-or-unequal" choice;
blocking for syntactic re-arrangement is similarly well suited to this type of
treatment. In no case does the machine ever "know" about syntax or meaning; it
only follows completely abstract rules for operating on certain numerical