LANGUAGE IN INDIA

Strength for Today and Bright Hope for Tomorrow

Volume 10 : 10 October 2010
ISSN 1930-2940

Managing Editor: M. S. Thirumalai, Ph.D.
Editors: B. Mallikarjun, Ph.D.
         Sam Mohanlal, Ph.D.
         B. A. Sharada, Ph.D.
         A. R. Fatihi, Ph.D.
         Lakhan Gusain, Ph.D.
         K. Karunakaran, Ph.D.
         Jennifer Marie Bayer, Ph.D.
         S. M. Ravichandran, Ph.D.
         G. Baskaran, Ph.D.

Copyright © 2010
M. S. Thirumalai


 

Development of a Hindi to Punjabi
Machine Translation System
A Doctoral Dissertation

Vishal Goyal, Ph.D.


Abstract

Machine Translation is the task of automatically translating a text from one natural language to another. Even after more than 60 years of research, Machine Translation is still an open problem. Work on the development of Machine Translation systems for Indian languages is still in its infancy. This research work is an attempt to develop a Machine Translation system from Hindi to Punjabi. A number of Machine Translation systems have already been developed, though their accuracy needs to be improved. Machine Translation is not a trivial task, by the nature of the translation process itself, but translating between closely related languages eases the task. We call a language pair closely related if the languages have grammars that are close in structure, contain similar constructs with almost the same semantics, and share a great deal of lexicon. By closely related languages, we also mean inflectively and morphosyntactically similar languages. Some linguists define closeness between languages on the basis of features such as a common root, similar alphabets, similar verb patterns, structural similarity, similar grammar, similar religio-cultural and demographic contexts and references, and a similar, clearly displayed ability to blend with foreign tongues. Generally, such languages have originated from the same source and are spoken in areas in close proximity. Hindi and Punjabi belong to the same subgroup of the Indo-European family and are thus sibling languages. Analysis shows that Hindi and Punjabi share all the features of closely related languages. For such closely related sibling languages, effective word-for-word translation can be achieved (Hajic et al., 2000) [90]. Thus, for our system, the Direct Machine Translation approach, which seems promising, has been used.
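The word-for-word idea behind the direct approach can be illustrated with a minimal sketch. The lexicon entries and the function name `translate_direct` below are illustrative assumptions and are not taken from the dissertation's lexical resources; a real system would add the further modules described in the synopsis.

```python
# Toy Hindi-to-Punjabi lexicon. The entries are illustrative assumptions,
# not data from the dissertation.
TOY_LEXICON = {
    "मेरा": "ਮੇਰਾ",   # my
    "घर": "ਘਰ",      # house
    "यहाँ": "ਇੱਥੇ",    # here
    "है": "ਹੈ",       # is
}

def translate_direct(sentence, lexicon=TOY_LEXICON):
    """Translate word for word, keeping the source word order."""
    translated = []
    for word in sentence.split():
        # Words missing from the lexicon are passed through unchanged
        # in this sketch.
        translated.append(lexicon.get(word, word))
    return " ".join(translated)

if __name__ == "__main__":
    print(translate_direct("मेरा घर यहाँ है"))
```

Because the two languages largely share word order, the sketch does no reordering at all, which is the property that makes the direct approach attractive for closely related pairs.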

The challenges in developing a Hindi to Punjabi Machine Translation system relate mainly to the non-availability of lexical resources, spelling variations, word sense disambiguation, transliteration, named entity recognition, and collocations.
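To give a feel for the transliteration problem, here is a naive sketch that relies on the fact that the Devanagari (U+0900 to U+097F) and Gurmukhi (U+0A00 to U+0A7F) Unicode blocks are laid out in parallel. It is not the method used in the dissertation. The function name `naive_transliterate` is hypothetical, and a code-point shift ignores the spelling conventions a real transliteration module must handle.

```python
import unicodedata

DEVANAGARI_START, DEVANAGARI_END = 0x0900, 0x097F
# Distance between the start of the Devanagari and Gurmukhi blocks.
GURMUKHI_OFFSET = 0x0A00 - 0x0900

def naive_transliterate(text):
    """Shift each Devanagari character to the parallel Gurmukhi code point."""
    result = []
    for ch in text:
        code_point = ord(ch)
        if DEVANAGARI_START <= code_point <= DEVANAGARI_END:
            candidate = chr(code_point + GURMUKHI_OFFSET)
            # Some shifted code points are unassigned in Gurmukhi;
            # in that case the original character is kept.
            if unicodedata.name(candidate, None) is not None:
                result.append(candidate)
                continue
        result.append(ch)
    return "".join(result)

if __name__ == "__main__":
    print(naive_transliterate("घर"))
```

Characters with no Gurmukhi counterpart at the shifted position are left in Devanagari, which shows why rule-based handling is needed beyond a simple mapping.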

Synopsis

This research work addresses the problems in the various stages of the development of a complete Hindi to Punjabi Machine Translation system and discusses potential solutions. The thesis has been divided into eight chapters.

The first chapter of the thesis introduces the general concept of Machine Translation, the various approaches to Machine Translation systems, and the key activities involved in Machine Translation. It also provides a formal description of the research question undertaken for this study. The objectives, need, and scope of the study are also discussed. Some of the key application areas of Machine Translation systems are then explored. Afterwards, the approach followed to solve this research problem is explained in brief, along with the reasons behind its selection. An overview of the design of the Machine Translation system developed in this research work is provided next. The chapter concludes by presenting the major contributions of this research work and an outline of the study.

Chapter 2 discusses the existing work in the field of Machine Translation in India and outside India. This literature survey forms the basis of our work on developing the Machine Translation system and later helps us compare our work with the existing state of the art in Machine Translation systems.

Chapter 3 explains and compares Hindi and Punjabi languages with respect to orthography, grammar, and Machine Translation.

Chapters 4 and 5 provide the design and implementation details of the various activities involved in the Machine Translation system. Chapter 4 describes the system architecture and the preprocessing stage. The chapter starts with the choice of approach and discusses the motivation behind its selection. The required resources are then discussed, followed by a description of the system architecture. The details of the preprocessing phase, which involves text normalization, identifying collocations, and identifying proper nouns, are discussed. The tokenization process is then explained. The details of the translation system, which involves identifying titles, identifying surnames, lexicon lookup, the word sense disambiguation module, the transliteration module, and the post processing modules, are discussed in Chapter 5.
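The first and last preprocessing steps named above, text normalization and tokenization, can be sketched as follows. This is a minimal illustration under stated assumptions: Unicode NFC normalization and whitespace cleanup stand in for the dissertation's text normalization, whose actual rules are not given here, and the function names `normalize_text` and `tokenize` are hypothetical.

```python
import re
import unicodedata

def normalize_text(text):
    """Give canonically equivalent spellings one code-point sequence."""
    # NFC maps, for example, the precomposed letter U+0958 to the
    # sequence U+0915 U+093C, so both spellings compare equal afterwards.
    text = unicodedata.normalize("NFC", text)
    # Collapse runs of whitespace into single spaces.
    return re.sub(r"\s+", " ", text).strip()

def tokenize(text):
    """Split text into words and punctuation tokens.

    The danda (U+0964) and common punctuation marks become separate tokens.
    """
    return re.findall(r"[^\s\u0964,.?!]+|[\u0964,.?!]", text)

if __name__ == "__main__":
    print(tokenize(normalize_text("घर   है।")))
```

Normalizing before tokenizing means that later stages, such as lexicon lookup, see a single spelling for each word form covered by Unicode canonical equivalence; other spelling variations would need language-specific rules.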

Chapter 6 describes the post processing stage of the system. Chapter 7 provides the evaluation of the system and its results. Chapter 8 concludes this thesis by providing a summary of the research work undertaken, the contributions of this research work, its limitations, and some directions in which this work could be extended in the future. Appendix A discusses the interface designed for text translation, website translation, and email translation. The test data sets for the intelligibility test and the accuracy test are available in Appendices B and C, respectively. The system has been rigorously evaluated, and its accuracy has been found to be 94% on the basis of the intelligibility test and 90.84% on the basis of the accuracy test.


This is only the beginning part of the dissertation.




Vishal Goyal, Ph.D.
Department of Computer Science
Punjabi University, Patiala
Punjab, India

 