The two principal books published in Russian by the laboratory more than fifteen years ago have unfortunately become rare and are unlikely to be found in bookstores today. We would like to offer
you electronic versions of these books in DjVu format. To view the content, unpack the
files and use any suitable DjVu reader (for MS Windows-based machines you can download such a viewer here).
The books:
Лингвистическое обеспечение системы ЭТАП-2 (Linguistic Support of the ETAP-2 System) (9.2 MB)
Лингвистический процессор для сложных информационных систем (A Linguistic Processor for Complex Information Systems) (5.5 MB)
Various publications:
1. Leonid Iomdin & Leonid Cinman, 1997.
Lexical Functions and Machine Translation
[Dialogue'97. Computational Linguistics and its Applications. Proceedings.
Moscow, 1997.]
Abstract:
The ETAP-3 English-to-Russian/Russian-to-English MT system makes use of two main combinatorial dictionaries (ComD), the Russian
ComD and the English ComD, both resorted to in each direction of translation. Recently, an
important step has been made to introduce information on lexical functions (LF) of the
Meaning <=> Text theory (MTT) into the ETAP-3 system. The information on LFs is considered
to be independent of the translation purposes and is therefore located in the general zone
of the ComDs. The notation used for representing LFs is close to that accepted in the
standard MTT and, hence, quite familiar to NLP developers.
A major lexicographic problem to be solved when introducing LFs into MT is the
determination of the space of LF values. Normally, an LF value is a lexeme; however, more
complex values such as wordforms on the one hand and word combinations on the other hand
are quite common in the MTT. In the present ETAP-3 release, every entered LF value is a
lexeme which may however be followed by a preposition or a phrasal adverb.
LFs are used at a few concrete points of the MT algorithm:
· After the syntactic structure of the sentence has been built, the parameter-type LFs
are identified.
· In the transfer phase, the LF values are translated into the target language in
accordance with the LF dictionary zone, overriding default equivalents.
· Subsequently, information on LFs is used to generate the missing prepositions and
adverbs, wherever necessary.
2. Leonid Iomdin & Oliver Streiter, 1999.
Learning from Parallel Corpora:
Experiments in Machine Translation
[Dialogue'99: Computational Linguistics and its Applications International
Workshop Vol.2, pp. 79-88]
Abstract:
The research described in this paper is rooted in the endeavours to dynamically combine
different MT approaches in order to improve the performance of MT systems - most
importantly, the quality of translation. The authors review the ongoing activities in the
field and present a case study, which shows how simple statistical data concerning single-
and multiword translations can be drawn from parallel corpora and compiled into the
lexicon of a rule-based MT system. As a result, the lexicon is enriched with translation
equivalents attested for different subject domains, which facilitates the tuning of the MT
system to a specific subject domain and improves the quality and adequacy of translation.
3. Michael Carl, Leonid L. Iomdin, Catherine Pease and Oliver Streiter, 1999.
Towards a Dynamic Linkage of Example-Based and
Rule-Based Machine Translation
Abstract:
In order to ensure a better performance of a machine translation system, most importantly
to improve the quality of translation, and to make the MT systems easier to tune to the
needs of different users, IPPI and IAI are combining the advantages of two machine
translation ideologies, those of inductive and deductive MT, into one system. The
objective of this activity is to investigate the consequences of this linkage and to
determine the types of linguistic entities that can be dynamically transferred between the
different components without introducing additional translation errors. Extensive research
in this area will contribute to a better understanding of translation as a human activity
and help to optimize the general paradigm of machine translation.
4. Oliver Streiter, Leonid L. Iomdin, Munpyo Hong and Ute Hauck, 1999.
Learning, Forgetting and Remembering: Statistical Support for Rule-Based MT
[Proceedings of the 8th International Conference on Theoretical and Methodological Issues in Machine Translation (TMI99), August 23-25, 1999, Chester, England]
Abstract:
The paper describes the incorporation of statistical knowledge into two different
Rule-Based MT (RBMT) systems. In earlier experiments, these systems were linked with
Memory-Based MT components, so that by now the translation process is supported by three MT
paradigms. The paper concentrates on the acquisition of rich, informative, balanced, and
up-to-date statistical data from monolingual and parallel corpora and on ways of using
these data in RBMT systems. The authors keep their pledge of a systematic investigation of
the linkage of different MT paradigms aimed at improving the quality of translation.
5. И.М. Богуславский, Л.Л. Иомдин, 1999.
Семантика быстроты (Semantics of Quickness)
[Вопросы языкознания, № 6, 1999. С. 13–30.] (In Russian.)
pdf, 661 KB
6. И.М. Богуславский, Л.Л. Иомдин, 2000.
Семантика медленности (Semantics of Slowness)
[Слово в тексте и в словаре. Сборник статей к 70-летию академика Ю.Д.Апресяна. Москва, Языки русской культуры, 2000. С. 52–60.] (In Russian.)
pdf, 288 KB
7. Igor Boguslavsky, Nadezhda Frid, Leonid Iomdin, Leonid Kreidlin, Irina Sagalova and Victor Sizov, 2000.
Creating a Universal Networking Language Module within an Advanced NLP System
[COLING in Europe. The 18th International Conference on Computational Linguistics. Saarbrücken, Proceedings. Vol. 1, 2000, pp. 83–90]
[zipped Word 97 file]
Abstract:
A multifunctional NLP environment, ETAP-3, is presented. The environment has several NLP applications, including a machine translation system, a natural language interface to SQL type databases, synonymous paraphrasing of sentences, a syntactic error correction module, and a computer-assisted language learning tool. Emphasis is laid on a new module of the processor responsible for the interface with the Universal Networking Language, a recent product by the UN University intended for the facilitation of multilanguage, multiethnic access to communication networks such as the WWW. The UNL module of ETAP-3 naturally combines the two major approaches accepted in machine translation: the transfer-based approach and the interlingua approach.
8. Л.Л.Иомдин, В.Г.Сизов, Л.Л.Цинман, 2001.
Использование эмпирических весов при синтаксическом анализе (Using Empirical Weights in Syntactic Analysis)
[Обработка текста и когнитивные технологии, № 6. Казань, Отечество, 2001. С. 64–72] (In Russian, English abstract.)
pdf, 229 KB
Abstract:
The paper discusses a complex of solutions aimed at ambiguity resolution in the parsing component of a multipurpose NLP system, ETAP-3. The main idea is to introduce a system of priorities, or weights, dynamically assigned to the elements of the text processed and of the structure generated during all parsing phases. These weights, empirically assigned by linguists to lexical entries and fragments of parsing rules, help tune the parser to the generation of an optimal syntactic structure of an ambiguous sentence.
9. Ju. D. Apresjan, I. M. Boguslavsky, L.L. Iomdin, L. L. Tsinman, 2002.
Lexical Functions in NLP: Possible Uses.
[Computational Linguistics for the New Millennium: Divergence or Synergy? Festschrift in Honour of Peter Hellwig on the occasion of his 60th Birthday. Peter Lang, 2002, pp. 55–72]
pdf, 182 KB
Abstract:
The paper describes the use of lexical functions, an instrument proposed in Igor Mel'cuk's Meaning <=> Text
linguistic model, in advanced NLP applications, including parsers, high quality machine
translation, a system of paraphrasing and computer-aided learning of lexica. In parsing, collocate LFs are used to resolve or reduce syntactic and lexical ambiguity. The MT system resorts to LFs to provide idiomatic target language equivalents for source sentences in which both the argument and the value of the same LF are present. The system of paraphrasing, which automatically produces one or several synonymous transforms for a given sentence or phrase, can be used in a number of advanced NLP applications ranging from machine translation to authoring and text planning. The computer-aided system of learning lexica is also based on the concept of LFs as a tool of formal description of that part of vocabulary which is simultaneously systematic and idiomatic and is therefore most difficult for language acquisition.
10. И.М. Богуславский, Л.Л. Иомдин и др., 2002.
Разработка синтаксически размеченного корпуса русского языка (Development of a syntactically annotated corpus of Russian)
[Доклады научной конференции «Корпусная лингвистика и лингвистические базы данных». СПб, изд-во Санкт-Петербургского университета, 2002. С. 40–50] (In Russian.)
pdf, 214 KB
Abstract:
For the past several years, the Laboratory of Computational Linguistics of the
Institute for Information Transmission Problems (IPPI RAN) has been developing an
annotated corpus of Russian texts for subsequent use in a wide range of theoretical
and applied tasks. The depth of text annotation gives the corpus considerable
scientific and practical value: in the corpus under construction, the first annotated
corpus for Russian ever, texts are supplied with detailed morphological and
syntactic information. The second stage of the corpus is currently under development;
upon its completion, the total size of the corpus will reach 12,000 syntactically
annotated sentences, or over 180,000 word tokens. Once the work is finished, free
online access will be provided to both stages of the corpus.
11. L.L. Iomdin, V.G. Sizov, L.L. Tsinman, 2002.
Utilisation des poids empiriques dans l’analyse syntaxique: une application en Traduction Automatique
[META, vol. 47. 2002. N. 3, pp. 351–358]
pdf, 84 KB
Abstract: Empirical Weights in Parsing
The paper discusses a complex of solutions aimed at
ambiguity resolution in the parsing component of a multipurpose NLP system, ETAP-3. The main
idea is to introduce a system of priorities, or weights, dynamically produced for the elements of
the text processed and of the structure generated during all parsing phases. These weights,
empirically assigned to lexical entries and fragments of parsing rules, help tune the parser to the
generation of an optimal syntactic structure of an ambiguous sentence.
12. Igor Boguslavsky, Ivan Chardin, Svetlana Grigorjeva, Nikolai Grigoriev, Leonid Iomdin, Leonid Kreidlin, Nadezhda Frid, 2002.
Development of a dependency treebank for Russian and its possible applications in NLP
[Proceedings of the Third International Conference on Language Resources and Evaluation (LREC-2002), v. III, Las Palmas. P. 852–856]
pdf, 214 KB
Abstract:
The paper describes a tagging scheme designed for the Russian Treebank and presents tools used for corpus creation.
13. Leonid Iomdin, 2003.
Natural Language Processing as a Source of Linguistic Knowledge
[Proceedings of the International Conference on Machine Learning; Models, Technologies and Applications. Las Vegas, June 23–26 2003, pp. 68–74]
pdf, 266 KB
Abstract:
The paper discusses a number of specific problems of natural text parsing that emerge during
the operation of a highly developed rule-based machine translation system, ETAP-3. Emphasis is
laid on two classes of problems: 1) adequacy of linguistic description of the working languages of the MT system and 2) means of resolving lexical and syntactic ambiguity of the source text. It is claimed that no parser, however sophisticated or advanced, can be made entirely free of lacunae and gaps. The reason is that many of the linguistic facts, including those critical for parser operation, have never come into view of researchers simply because they have not had at their disposal mass material of unexpected or incorrect parsing. It is exactly such material that is
amply provided by a highly developed NLP system. If handled properly, this feedback helps the researcher to find the gaps of scientific descriptions and eliminate them. Consequently, linguistic
experimentation with NLP systems becomes a rightful and very promising scientific method. In a way,
linguistic applications start to stimulate theoretical research, thus inverting the situation that has existed ever since NLP came to life.
14. Jurij Apresian, Igor Boguslavsky, Leonid Iomdin, Alexander Lazursky, Vladimir Sannikov, Victor Sizov, Leonid Tsinman, 2003.
ETAP-3 Linguistic Processor: a Full-Fledged NLP Implementation of the MTT
[MTT 2003. First International Conference on Meaning-Text Theory. Paris, Ecole Normale Superieure, June 16–18 2003, pp. 279–288]
pdf, 214 KB
Abstract:
A multifunctional NLP environment, the ETAP-3 linguistic processor, is presented. The
environment, largely based on the Meaning <-> Text Theory, offers several NLP applications,
including a machine translation system, a module of synonymous paraphrasing of sentences, a
tagger for syntactic annotation of text corpora, a Universal Networking Language interface, a
computer-assisted language learning tool, a natural language interface to SQL type databases,
and a syntactic error correction module. While all applications are briefly discussed, emphasis
is laid on machine translation, as it is by far the most advanced application of all.
15. Jurij Apresian, Igor Boguslavsky, Leonid Iomdin, Leonid Tsinman, 2003.
Lexical Functions as a Tool of
ETAP-3.
[MTT 2003. First International Conference on Meaning-Text Theory. Paris, Ecole Normale Superieure, June 16–18 2003]
pdf, 187 KB
Abstract:
The paper describes the use of lexical functions, an instrument proposed in Igor Mel'cuk's “Meaning
<-> Text Theory” (MTT), in advanced NLP applications as exemplified in the ETAP-3 linguistic
processor, including parsers, high quality machine translation (MT), a system of paraphrasing and
computer-aided learning of lexica. In parsing, collocate LFs are used to resolve or reduce syntactic and lexical ambiguity. The MT system resorts to LFs to provide idiomatic target language equivalents for source sentences in which both the argument and the value of the same LF are present. The system of paraphrasing, which automatically produces one or several synonymous transforms for a given sentence or phrase, can be used in a number of advanced NLP applications ranging from MT to authoring and text planning. The computer-aided system of learning lexica is also based on the concept of LFs as a tool of formal description of that part of vocabulary which is simultaneously systematic and idiomatic and is therefore most difficult for language acquisition.
16. Leonid Iomdin, 2003.
Purpose and Idea: a Lesson Drawn from Machine Translation
[MTT 2003. First International Conference on Meaning-Text Theory. Paris, Ecole Normale Superieure, June 16–18 2003, pp. 269–278]
pdf, 187 KB
Abstract:
The paper discusses certain problems of natural text parsing that emerge during the operation
of a machine translation system. Emphasis is laid on adequacy of syntactic description of the
working languages. It is claimed that no parser, however sophisticated, can be made
completely free of lacunae. The reason is that many of the linguistic facts, critical for parser
operation, have never come into view of researchers because they have not had at their
disposal mass material of unexpected or incorrect parsing. It is exactly such material that is
abundantly provided by the output of a highly developed NLP system. If handled properly,
this material helps the researcher to locate the gaps of linguistic descriptions and eliminate
them. Consequently, linguistic experimentation with NLP systems becomes a rightful and
very promising scientific method. In a way, linguistic applications start to stimulate theoretical
research, thus inverting the situation that has existed ever since NLP came to life. To
substantiate this standpoint, a specific type of Russian copulative compound sentences is
considered in detail. A new type of syntactic feature is introduced in order to adequately
handle such sentences.
17. И.М.Богуславский, Л.Л.Иомдин, В.Г.Сизов, И.С.Чардин, 2003.
Использование размеченного корпуса текстов при автоматическом синтаксическом анализе (Using a tagged corpus in automatic parsing)
[Труды Международной конференции «Когнитивное моделирование в лингвистике-2003». Варна, 2003] (In Russian.)
pdf, 266 KB
Abstract:
A combined parsing algorithm is proposed, which is used in the ETAP-3 linguistic
processor and, primarily, in its machine translation system. In resolving linguistic
ambiguity, the heuristic rules that form the core of the processor interact dynamically
with a specially developed statistical module, which assigns weights to hypothetical
syntactic links on the basis of data from a syntactically annotated text corpus.
Corpus data were collected from syntactically annotated Russian texts
totalling 6,900 sentences (about 104,000 words). Experiments in Russian-to-English
machine translation with this combined algorithm revealed local improvements
in the performance of the linguistic processor, which stimulate qualitative
development of the parser and open new prospects for its developers.
At the same time, a quantitative comparison of the combined and heuristic
parsing algorithms showed no substantial differences in their results.
18. Igor Boguslavsky, Leonid Iomdin, Victor Sizov, 2003.
Interactive enconversion by means of the ETAP-3 system
[Proceedings of the International Conference on the Convergence of Knowledge, Culture, Language and Information Technologies. Alexandria, 2003]
pdf, 149 KB
Abstract:
A module for enconversion of NL texts into Universal Networking Language (UNL) graphs is considered. This module is designed for the system of multilingual communication on the Internet that is being developed by research centers of about 15 countries under the aegis of the UN. The enconversion of NL texts into UNL is carried out by means of a multifunctional linguistic processor, ETAP-3, developed in the Computational Linguistics Laboratory of the Institute for Information Transmission Problems of the Russian Academy of Sciences. One of the major problems in automatic text analysis is the high degree of ambiguity of linguistic units. The resolution of this ambiguity (morphological, syntactic, lexical, translational) is partly ensured by the linguistic knowledge base of ETAP-3, but a complete algorithmic solution of this problem is unfeasible. We describe an interactive system that helps resolve difficult cases of linguistic ambiguity by means of a dialogue with the human user.
19. Igor Boguslavsky, Leonid Iomdin, Victor Sizov, 2004.
Multilinguality in ETAP-3. Reuse of Linguistic Resources
[Proceedings of the Workshop “Multilingual Linguistic Resources”. 20th International Conference on Computational Linguistics, Geneva, 2004. pp. 7–14]
pdf, 128 KB
Abstract:
The paper presents the work done at the Institute for Information Transmission Problems (Russian Academy of Sciences, Moscow) on the multifunctional linguistic processor ETAP-3. Its two multilingual options are discussed - machine translation in a variety of language pairs and translation to and from UNL, a meaning representation language.
For each working language, ETAP has one integral dictionary, which is used in all applications both for the analysis and synthesis (generation) of the given language. In difficult cases, interactive dialogue with the user is used for disambiguation. Emphasis is laid on multiple use of lexical resources in the multilingual environment.
20. Л.Л. Иомдин, 2004.
Уроки машинного перевода для детей и взрослых (Lessons of Machine Translation for Children and Grownups)
[Лингвистика для всех. Зимняя лингвистическая школа–2004. Москва, НИИРО. С. 56–68]
(In Russian.)
pdf, 291 KB
Abstract:
What is machine translation, also known as automatic or computer translation? Today, when
computers are used to translate texts from one language into another in a great variety
of ways, from bilingual and multilingual electronic dictionaries to systems of the
translation memory type (a "memory", or "archive", of translations), this question turns out
to be not so simple. We will understand machine translation as a process in which a computer,
given a text in one language, produces a new text in another language that was not
previously stored in that computer: clearly, neither dictionaries nor translation archives
have this property. When can we say that text A in one natural language is a translation of text B in another language? Of course, when both texts, A and B, have the same
meaning. The task of any translator is precisely to convey the meaning of a text
(whether written or spoken) in one language by means of another language. This is
also the task of machine translation.
21. Л.Л. Иомдин, 2004.
Идея и цель: об одном типе русских связочных предложений (Idea and Purpose: On One Sort of Russian Copula Sentences)
[Сокровенные смыслы. Слово, текст, культура. Сборник статей в честь Н.Д. Арутюновой. Москва, Языки славянской культуры. 2004. С. 418–425] (In Russian.)
pdf, 291 KB
22. Igor M. Boguslavsky, Leonid L. Iomdin et al., 2005.
Interactive Resolution of Intrinsic and Translational Ambiguity in a Machine Translation System
[CICLing 2005. Lecture notes in computer science. A.Gelbukh (ed.), Springer-Verlag Berlin – Heidelberg 2005, pp. 383–394]
pdf, 288 KB
Abstract:
The paper presents the module of interactive word sense disambiguation and syntactic ambiguity resolution used within a sophisticated machine translation system, ETAP-3. The method applied consists in asking the user to identify a word sense, or a syntactic interpretation, whenever the system lacks reliable data to make the choice automatically. For this purpose, entries of the working dictionaries of the system are supplemented with clear diagnostic comments and illustrations that enable the user to choose the most appropriate option and in this way channel the course of system operation.