Copyright © 2009
Advances in Machine Translation Systems
Vishal Goyal, M.Tech.
Gurpreet Singh Lehal, Ph.D.
A machine translation system is software that takes a text in one language (the source language) and translates it into another language (the target language). This paper presents the state of the art in the field of machine translation. The first part discusses machine translation systems for non-Indian languages, and the second part discusses machine translation systems for Indian languages.
Keywords: Machine Translation Systems, Natural Language Processing, MT in India
1. Machine Translation Systems
1.1 Machine Translation Systems for Non-Indian Languages
Various machine translation (MT) systems have already been developed for most of the commonly used natural languages. This section briefly discusses some of the existing machine translation systems and the approaches that have been followed.
An English-Japanese machine translation system (1982) was developed by Makoto Nagao et al. The title sentences of scientific and engineering papers were analyzed by simple parsing strategies, and only eighteen fundamental sentential structures were obtained from ten thousand titles. Using these fundamental structures, English title sentences from physics and mathematics databases are translated into Japanese together with their keywords, author names, journal names and so on. The translation accuracy for physics and mathematics titles from the INSPEC database was about 93%.
RUSLAN (1985) is a direct machine translation system between the closely related languages Czech and Russian, developed by Hajic J. for a single thematic domain, operating systems of mainframes. The system used a transfer-based architecture. The project started in 1985 at Charles University, Prague, in cooperation with the Research Institute of Mathematical Machines in Prague. It was terminated in 1990 due to lack of funds.
The system was rule-based and implemented in Colmerauer's Q-Systems. It had a main dictionary of about 8,000 words, accompanied by a transducing dictionary covering another 2,000 words.
The typical steps followed in the system are Czech morphological analysis, syntactic-semantic analysis with respect to Russian sentence structure, and morphological synthesis of Russian. Because the two languages are closely related, a transfer-like translation scheme was adopted with many simplifications, and many ambiguities were left unresolved. No deep analysis of input sentences was performed.
The evaluation of RUSLAN's results showed that roughly 40% of the input sentences were translated correctly, about 40% were translated with minor errors correctable by a human post-editor, and about 20% required substantial editing or re-translation.
Two main factors caused a deterioration of the translation quality: the incompleteness of the system's main dictionary and the module for syntactic analysis of Czech. RUSLAN is a unidirectional system dealing with one language pair, Czech to Russian.
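The analysis, transfer and synthesis stages named above can be sketched as a chain of dictionary lookups. This is a toy illustration only: the lexicon entries, the tag names and the function names are assumptions made for the example, not RUSLAN's actual data or its Q-Systems implementation, and the syntactic-semantic analysis step is omitted entirely.

```python
# Toy Czech-to-Russian pipeline: morphological analysis -> lexical transfer
# -> morphological synthesis. All dictionaries are illustrative assumptions.

# Morphological analysis: Czech word form -> (lemma, morphological tag)
CZECH_FORMS = {
    "soubor": ("soubor", "NOUN.SG.NOM"),
    "soubory": ("soubor", "NOUN.PL.NOM"),
    "disk": ("disk", "NOUN.SG.NOM"),
    "disky": ("disk", "NOUN.PL.NOM"),
}

# Lexical transfer: Czech lemma -> Russian lemma
TRANSFER = {"soubor": "файл", "disk": "диск"}

# Morphological synthesis: (Russian lemma, tag) -> Russian word form
RUSSIAN_FORMS = {
    ("файл", "NOUN.SG.NOM"): "файл",
    ("файл", "NOUN.PL.NOM"): "файлы",
    ("диск", "NOUN.SG.NOM"): "диск",
    ("диск", "NOUN.PL.NOM"): "диски",
}


def translate_word(word):
    """Translate one word; unknown words are wrapped in <...> for a post-editor."""
    analysis = CZECH_FORMS.get(word.lower())
    if analysis is None:
        return f"<{word}>"
    lemma, tag = analysis
    target_lemma = TRANSFER.get(lemma)
    if target_lemma is None:
        return f"<{word}>"
    return RUSSIAN_FORMS.get((target_lemma, tag), f"<{word}>")


def translate(sentence):
    """Word-by-word translation that keeps the source word order."""
    return " ".join(translate_word(w) for w in sentence.split())


print(translate("soubory disky"))
```

Because the morphological tag is carried unchanged from analysis to synthesis, the sketch relies on the two languages sharing grammatical categories, which is the kind of simplification a close language pair permits.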
PONS (1995) is an experimental interlingua system for automatic translation of unrestricted text, constructed by Helge Dyvik, Department of Linguistics and Phonetics, University of Bergen. 'PONS' is an acronym in Norwegian for "Partiell Oversettelse mellom Nærstående Språk" (Partial Translation between Closely Related Languages).
PONS exploits the structural similarity between source and target language to take shortcuts during the translation process. The system makes use of a lexicon and a set of syntactic rules. There is no morphological analysis. The lexicon consists of a list of entries for all word forms and a list of stem entries, or 'lexemes'. The source text is divided into substrings at certain punctuation marks, and the strings are parsed by a bottom-up, unification-based active chart parser.
The system was tested on the translation of sentence sets and simple texts between the closely related languages Norwegian and Swedish, and between the more distantly related English and Norwegian. The developer concluded that in the case of closely related languages, formally similar constructions will typically share stylistic properties.
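The two-part lexicon and the punctuation-based segmentation can be illustrated with a small sketch. Everything here is assumed for the example: the set of punctuation marks, the English entries, the feature names and the function names are hypothetical and are not taken from PONS. The chart parser itself is not shown.

```python
import re

# Stem entries ("lexemes"): hypothetical identifiers with a syntactic category.
LEXEMES = {
    "SHE": {"cat": "PRON"},
    "READ": {"cat": "V"},
    "BOOK": {"cat": "N"},
}

# Full-form entries: every word form is listed with its lexeme and features,
# so no morphological analysis is needed when a text is processed.
WORD_FORMS = {
    "she": [("SHE", {"case": "nom"})],
    "reads": [("READ", {"tense": "pres", "pers": "3"})],
    "book": [("BOOK", {"num": "sg"})],
    "books": [("BOOK", {"num": "pl"})],
}

# Assumed set of segment-breaking punctuation marks.
SEGMENT_BREAKS = r"[.;:!?]"


def split_segments(text):
    """Divide the source text into substrings at the chosen punctuation marks."""
    return [s.strip() for s in re.split(SEGMENT_BREAKS, text) if s.strip()]


def lookup(segment):
    """Return, for each token, its readings as (lexeme, category, features)."""
    readings = []
    for token in segment.lower().split():
        entries = WORD_FORMS.get(token, [])
        readings.append(
            (token, [(lex, LEXEMES[lex]["cat"], feats) for lex, feats in entries])
        )
    return readings


for segment in split_segments("She reads books; he reads."):
    print(lookup(segment))
```

A token with an empty list of readings (here "he") is simply absent from the full-form list; a parser working on these readings would have to decide how to treat such gaps.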
CESILKO (2000) is a machine translation system for closely related Slavic language pairs, developed by Hajic J., Hric J. and Kubon V. It has been fully implemented for Czech to Slovak, the two most closely related Slavic languages.
The main aim of the system is the localization of texts and programs from one source language into a group of mutually related target languages.
In this system, no deep analysis is performed; translation is word-for-word, using stochastic disambiguation of Czech word forms. The input text passes through a sequence of modules: morphological analyzer, morphological disambiguation, domain-related bilingual glossaries, general bilingual dictionary, and morphological synthesis of Slovak. The dictionary covers over 700,000 items and is able to recognize more than 15 million word forms. The system is claimed to achieve about a 90% match with the results of human translation, based on a relatively large test sample. Work is in progress on translation for the Czech-to-Polish language pair.
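The module sequence described above can be sketched as follows. This is a minimal illustration under stated assumptions: the word lists, tags and probabilities are invented for the example, the function names are hypothetical, and picking the most probable reading per word form is only a stand-in for a real stochastic disambiguator, which would also use context.

```python
# Toy Czech-to-Slovak pipeline: analysis -> disambiguation -> glossary and
# dictionary lookup -> synthesis. All data below is illustrative.

# Morphological analyzer: Czech form -> candidate (lemma, tag, probability).
ANALYSES = {
    "stát": [("stát", "NOUN.SG.NOM", 0.7), ("stát", "VERB.INF", 0.3)],
    "dům": [("dům", "NOUN.SG.NOM", 1.0)],
    "domy": [("dům", "NOUN.PL.NOM", 1.0)],
    "soubor": [("soubor", "NOUN.SG.NOM", 1.0)],
}

# Domain glossary (consulted first) and general dictionary (fallback),
# both keyed by (Czech lemma, part of speech).
DOMAIN_GLOSSARY = {("soubor", "NOUN"): "súbor"}
GENERAL_DICTIONARY = {
    ("stát", "NOUN"): "štát",
    ("stát", "VERB"): "stáť",
    ("dům", "NOUN"): "dom",
}

# Morphological synthesis: (Slovak lemma, tag) -> inflected Slovak form.
SLOVAK_FORMS = {("dom", "NOUN.PL.NOM"): "domy"}


def disambiguate(readings):
    """Pick the most probable reading (stand-in for stochastic tagging)."""
    return max(readings, key=lambda reading: reading[2])


def translate_word(word):
    readings = ANALYSES.get(word.lower())
    if not readings:
        return word  # assumption: unknown words are passed through unchanged
    lemma, tag, _prob = disambiguate(readings)
    key = (lemma, tag.split(".")[0])
    target = DOMAIN_GLOSSARY.get(key) or GENERAL_DICTIONARY.get(key)
    if target is None:
        return word
    # Fall back to the lemma when no inflected form is listed.
    return SLOVAK_FORMS.get((target, tag), target)


def translate(sentence):
    """Word-for-word translation that keeps the source word order."""
    return " ".join(translate_word(w) for w in sentence.split())


print(translate("domy soubor stát"))
```

Consulting the domain glossary before the general dictionary lets terminology for a particular domain take precedence, which fits the localization aim stated for the system.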
Vishal Goyal, M.Tech.
Department of Computer Science
firstname.lastname@example.org
Gurpreet Singh Lehal, Ph.D.
Advanced Centre for Technical Development of Punjabi Language, Literature & Culture