LANGUAGE IN INDIA

Strength for Today and Bright Hope for Tomorrow

Volume 9 : 11 November 2009
ISSN 1930-2940

Managing Editor: M. S. Thirumalai, Ph.D.
Editors: B. Mallikarjun, Ph.D.
         Sam Mohanlal, Ph.D.
         B. A. Sharada, Ph.D.
         A. R. Fatihi, Ph.D.
         Lakhan Gusain, Ph.D.
         K. Karunakaran, Ph.D.
         Jennifer Marie Bayer, Ph.D.


  • E-mail your articles and book-length reports in Microsoft Word to msthirumalai2@gmail.com.
  • Contributors from South Asia may send their articles to
    B. Mallikarjun,
    Central Institute of Indian Languages,
    Manasagangotri,
    Mysore 570006, India
    or e-mail to mallikarjun@ciil.stpmy.soft.net.
  • PLEASE READ THE GUIDELINES GIVEN IN HOME PAGE IMMEDIATELY AFTER THE LIST OF CONTENTS.
  • Your articles and book-length reports should follow the APA, MLA, LSA, or IJDL stylesheet.
  • The Editorial Board has the right to accept, reject, or suggest modifications to the articles submitted for publication, and to make suitable stylistic adjustments. High quality, academic integrity, ethics and morals are expected from the authors and discussants.

Copyright © 2009
M. S. Thirumalai


 

Advances in Machine Translation Systems

Vishal Goyal, M.Tech.
Gurpreet Singh Lehal, Ph.D.


Abstract

A machine translation system is software that takes a text in one language (called the source language) and translates it into another language (called the target language). This paper presents the state of the art in the field of machine translation. The first part of the paper discusses machine translation systems for non-Indian languages, and the second part discusses machine translation systems for Indian languages.

Keywords: Machine Translation Systems, Natural Language Processing, MT in India

1. Machine Translation Systems

1.1 Machine Translation Systems for Non-Indian Languages

Various machine translation (MT) systems have already been developed for most of the commonly used natural languages. This section briefly discusses some of the existing machine translation systems and the approaches that have been followed.

An English-Japanese machine translation system (1982) was developed by Makoto Nagao et al. The title sentences of scientific and engineering papers are analyzed by simple parsing strategies, and only eighteen fundamental sentential structures were obtained from ten thousand titles. Title sentences in physics and mathematics from some English-language databases are translated into Japanese, together with their keywords, author names, journal names and so on, using these fundamental structures. The translation accuracy for the specific areas of physics and mathematics from the INSPEC database was about 93%.

RUSLAN (1985) is a direct machine translation system between the closely related languages Czech and Russian, developed by Hajic J. for a single thematic domain: operating systems of mainframes. The system used a transfer-based architecture. The project started in 1985 at Charles University, Prague, in cooperation with the Research Institute of Mathematical Machines in Prague. It was terminated in 1990 due to lack of funds.

The system was rule-based and implemented in Colmerauer's Q-Systems. It had a main dictionary of about 8,000 words, accompanied by a transducing dictionary covering another 2,000 words.

The typical steps in the system are Czech morphological analysis, syntactic-semantic analysis with respect to Russian sentence structure, and morphological synthesis of Russian. Because the two languages are so close, a transfer-like translation scheme was adopted with many simplifications. For the same reason, many ambiguities were left unresolved, and no deep analysis of input sentences was performed.

Evaluation of RUSLAN's output showed that roughly 40% of the input sentences were translated correctly, about 40% were translated with minor errors correctable by a human post-editor, and about 20% required substantial editing or re-translation.

Two main factors degraded the translation quality: the incompleteness of the system's main dictionary and the module for syntactic analysis of Czech. RUSLAN is a unidirectional system dealing with a single language pair, Czech to Russian.

PONS (1995) is an experimental interlingua system for automatic translation of unrestricted text, constructed by Helge Dyvik, Department of Linguistics and Phonetics, University of Bergen. 'PONS' is a Norwegian acronym for "Partiell Oversettelse mellom Nærstående Språk" (Partial Translation between Closely Related Languages).

PONS exploits the structural similarity between source and target language to take shortcuts during the translation process. The system makes use of a lexicon and a set of syntactic rules. There is no morphological analysis. The lexicon consists of a list of entries for all word forms and a list of stem entries, or 'lexemes'. The source text is divided into substrings at certain punctuation marks, and the strings are parsed by a bottom-up, unification-based active chart parser.
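The segmentation and full-form lexicon lookup described above can be illustrated with a minimal Python sketch. It is not the PONS implementation: the set of punctuation marks, the English toy entries, the feature names, and the function names `split_at_punctuation` and `look_up` are all assumptions made for illustration, and the chart parser itself is not modelled.

```python
import re

# Assumed set of segmenting punctuation marks; the source only says "certain" marks.
PUNCTUATION = ".;:!?"

# Toy full-form lexicon (illustrative entries only): every word form is listed
# explicitly and points to a stem entry ("lexeme"), so no morphological
# analysis is needed at lookup time.
WORD_FORMS = {
    "cat": ("CAT", {"number": "sg"}),
    "cats": ("CAT", {"number": "pl"}),
    "sleep": ("SLEEP", {"form": "base"}),
    "sleeps": ("SLEEP", {"form": "3sg"}),
}

# Toy stem entries holding the information shared by all forms of a lexeme.
LEXEMES = {
    "CAT": {"category": "noun"},
    "SLEEP": {"category": "verb"},
}


def split_at_punctuation(text, marks=PUNCTUATION):
    """Divide the source text into substrings at the given punctuation marks."""
    pattern = "[" + re.escape(marks) + "]"
    return [part.strip() for part in re.split(pattern, text) if part.strip()]


def look_up(segment):
    """Return (token, lexeme, features) for each token; unknown tokens get None."""
    entries = []
    for token in segment.lower().split():
        if token in WORD_FORMS:
            lexeme, features = WORD_FORMS[token]
            # Combine the shared stem information with the form-specific features.
            entries.append((token, lexeme, {**LEXEMES[lexeme], **features}))
        else:
            entries.append((token, None, {}))
    return entries
```

Each segment returned by `split_at_punctuation` would then be handed to the parser; here `look_up` only shows how a word-form entry and its stem entry combine into one feature set.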

The system has been tested on the translation of sentence sets and simple texts between the closely related languages Norwegian and Swedish, and between the more distantly related English and Norwegian. The developer concluded that, in the case of closely related languages, formally similar constructions will typically share stylistic properties.

CESILKO (2000) is a machine translation system for closely related Slavic language pairs, developed by Hajic J., Hric J. and Kubon V. It has been fully implemented for Czech to Slovak, the pair of the two most closely related Slavic languages.

The main aim of the system is the localization of texts and programs from one source language into a group of mutually related target languages.

In this system, no deep analysis is performed; translation is word-for-word, using stochastic disambiguation of Czech word forms. The input text is passed through the following modules: morphological analyzer, morphological disambiguation, domain-related bilingual glossaries, general bilingual dictionary, and morphological synthesis of Slovak. The dictionary covers over 700,000 items and is able to recognize more than 15 million word forms. The system is claimed to achieve about a 90% match with the results of human translation, based on a relatively large test sample. Work is in progress on translation for the Czech-to-Polish language pair.
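The module sequence above can be sketched as a small word-for-word pipeline in Python. This is a toy illustration under stated assumptions, not the CESILKO code: the tag names, the three lexicon entries, and the tag probabilities are invented for the example, and a context-free "most probable tag" rule stands in for a real stochastic tagger.

```python
# Morphological analysis: Czech word form -> possible (lemma, tag) readings.
# Toy entries; "města" is ambiguous between genitive singular and nominative plural.
ANALYSES = {
    "města": [("město", "NOUN-GEN-SG"), ("město", "NOUN-NOM-PL")],
    "žena": [("žena", "NOUN-NOM-SG")],
}

# Invented tag probabilities standing in for a trained stochastic disambiguator.
TAG_PROBABILITY = {"NOUN-GEN-SG": 0.4, "NOUN-NOM-PL": 0.6, "NOUN-NOM-SG": 1.0}

# Lemma-to-lemma dictionaries; the domain glossary is consulted first.
DOMAIN_GLOSSARY = {}
GENERAL_DICTIONARY = {"město": "mesto", "žena": "žena"}

# Morphological synthesis: (Slovak lemma, tag) -> Slovak word form.
SYNTHESIS = {
    ("mesto", "NOUN-GEN-SG"): "mesta",
    ("mesto", "NOUN-NOM-PL"): "mestá",
    ("žena", "NOUN-NOM-SG"): "žena",
}


def disambiguate(readings):
    """Pick the reading whose tag has the highest (toy) probability."""
    return max(readings, key=lambda reading: TAG_PROBABILITY.get(reading[1], 0.0))


def translate_word(form):
    """Analyse, disambiguate, look up and synthesise a single word form."""
    readings = ANALYSES.get(form)
    if not readings:
        return form  # unknown forms are passed through unchanged
    lemma, tag = disambiguate(readings)
    target_lemma = DOMAIN_GLOSSARY.get(lemma) or GENERAL_DICTIONARY.get(lemma, lemma)
    # Fall back to the bare target lemma if no synthesised form is listed.
    return SYNTHESIS.get((target_lemma, tag), target_lemma)


def translate(text):
    """Translate word by word, keeping the source word order."""
    return " ".join(translate_word(word) for word in text.split())
```

The ambiguous form "města" shows why disambiguation precedes synthesis even without deep analysis: the two readings map to different Slovak forms ("mesta" and "mestá"), so the chosen tag determines the output.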




Vishal Goyal, M.Tech.
Department of Computer Science
Punjabi University
Patiala-147002
Punjab, India
vishal.pup@gmail.com

Gurpreet Singh Lehal, Ph.D.
Advanced Centre for Technical Development of Punjabi Language, Literature & Culture
Punjabi University
Patiala 147002
Punjab, India
gslehal@gmail.com

 