LANGUAGE IN INDIA

Strength for Today and Bright Hope for Tomorrow

Volume 8 : 5 May 2008
ISSN 1930-2940

Managing Editor: M. S. Thirumalai, Ph.D.
Editors: B. Mallikarjun, Ph.D.
         Sam Mohanlal, Ph.D.
         B. A. Sharada, Ph.D.
         A. R. Fatihi, Ph.D.
         Lakhan Gusain, Ph.D.
         K. Karunakaran, Ph.D.
         Jennifer Marie Bayer, Ph.D.

Copyright © 2007
M. S. Thirumalai


 

A Proposal for Standardization of English to Bangla Transliteration and Bangla Universal Editor

Joy Mustafi, M.C.A. & B. B. Chaudhuri, Ph.D.


1. Introduction

Indian language technology is becoming an increasingly challenging field in linguistics and computer science. Bangla (also written as Bengali) is one of the most widely spoken languages in the world [Chinese Mandarin 13.69%, Spanish 5.05%, English 4.84%, Hindi 2.82%, Portuguese 2.77%, Bengali 2.68%, Russian 2.27%, Japanese 1.99%, German 1.49%, Chinese Wu 1.21%]. Bangla is a member of the New Indo-Aryan language family and is spoken by a vast population within the Indian subcontinent and abroad. It offers considerable scope for research on computational aspects.

Efficient processors for Bangla that deal exhaustively with all the general and particular phenomena of the language are yet to be developed. A transliteration system is one of them. Transliteration is the representation of letters or words in the corresponding characters of another alphabet.

So far there is no standard for English to Bangla transliteration. Some early systems, such as Lekho [1], Pata [2], and Bangla Pad [3], are not very user-friendly because they have complex rules for character mapping. Some keyboard layouts, such as Ekushey [4], Avro [5], and Bijoy [6], have Unicode [7], ASCII [8], or ISCII [9] mappings, which are again very hard to use, particularly when dealing with compound clusters of consonant characters. The main problem is that there is no definite rule for English to Bangla transliteration.

The system described here proposes a standard, a definite rule, and an application program for writing, editing, storing, reusing and viewing Bangla text in digital media. The text in English script can be stored as a simple plain text file on any platform and may be used for other research activities such as machine translation, information retrieval, spell checking, optical character recognition, speech technology and other Bangla language technologies [10].

This system is designed for English to Bangla character conversion and for the representation of Bangla in Unicode. The mapping of characters follows the morphological structure and the spelling rules of Bangla. Although some earlier systems were developed for phonological representation, for a visual editor or for storage of a Bangla corpus the spelling is more important than the pronunciation.
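As a minimal illustration of what representing Bangla in Unicode means in practice, the sketch below inspects a short Bangla string with Python's standard `unicodedata` module. The sample string and the helper names `describe` and `in_bengali_block` are chosen here for illustration and are not taken from the paper. The sample shows that a consonant cluster is stored as consonant, hasanta (virama) sign, consonant.

```python
import unicodedata

# Sample chosen for illustration (not from the paper): KA + hasanta + RA.
sample = "\u0995\u09cd\u09b0"


def describe(text):
    """Return (code point, Unicode name) pairs for each character in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]


def in_bengali_block(text):
    """True if every character lies in the Unicode Bengali block U+0980..U+09FF."""
    return all(0x0980 <= ord(ch) <= 0x09FF for ch in text)


for code, name in describe(sample):
    print(code, name)
print(in_bengali_block(sample))
```

Whether the three code points are drawn as a single conjunct glyph depends on the font and the rendering engine, not on the stored text.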

A universal editor for Bangla is also proposed here, which follows the standard and represents Bangla in Unicode [11] with a suitable Bangla OpenType font. Text in English script is used as the input, which can be browsed from any location; the Bangla Universal Editor converts the text into Bangla and displays it in the specified window.

The Bangla Unicode output can be used for the development of Bangla software such as operating systems, compilers, word processors, dictionaries, web pages [12] and other software. It is also useful for writing emails, messages and blogs in Bangla. The standards, methodologies and applications are described here.

1.1 Objective

The main objective of the work is to introduce a standard for an English to Bangla transliteration system. Our advanced research on Bangla language technology [13][15][16][17] is already established. A Bangla corpus is used for developing many language technology systems.

Some early research on Bangla also proposed transliteration rules or character mappings [14]. As there is no standard for Bangla transliteration, a Bangla corpus cannot be stored in a specific format. As a result, researchers get different representations of Bangla text from different sources. If one can write Bangla text in English script and store the data in plain text, no other Bangla-specific software may be required. A simple ASCII editor (like Notepad, gedit, nedit) will work. The text will also become platform independent.
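The plain-text storage idea can be sketched as follows. This is a minimal example under stated assumptions: the input string `latin_text`, the file name `corpus.txt`, and the helper names are hypothetical, and the letters in the sample are placeholders rather than the mapping proposed in the paper. It only shows that Latin-script text is pure ASCII and survives a write and read cycle unchanged.

```python
import os
import tempfile

# Hypothetical Latin-script input; the letters are placeholders,
# not the character chart proposed in the paper.
latin_text = "kQr bQl"


def save_plain_text(text, path):
    """Write text as ASCII; raises UnicodeEncodeError on non-ASCII input."""
    with open(path, "w", encoding="ascii") as handle:
        handle.write(text)


def load_plain_text(path):
    """Read the text back exactly as it was stored."""
    with open(path, "r", encoding="ascii") as handle:
        return handle.read()


with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "corpus.txt")
    save_plain_text(latin_text, path)
    restored = load_plain_text(path)

print(restored == latin_text)
```

Because the file contains only ASCII bytes, any of the editors named above can open it without a Bangla font or a special encoding setting.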

To view and edit the English script written to represent Bangla, a universal editor is introduced here, which can be used for correction or modification of the Bangla text written in English script. In this editor, one can see the Bangla text in English script as well as in a Bangla font simultaneously, in separate frames of the same application program.

1.2 Justification of the Proposal

The proposal for the standard is necessary as there is no standard available for Bangla transliteration. Some important points discussed here are:

  1. Plain Text Storage Mode for English (platform, encoding, font independent)
  2. One-to-One Character Mapping (the lower-case [a~z] and upper-case [A~Z] letters of English, 26 × 2 = 52 characters → 50 basic characters of Bangla (vowels, consonants, special characters); 1 'hasanwa' [Q]; and 1 unused [L])
  3. Phonetic Character Chart (English characters are chosen very close to the phonetics of Bangla characters, but the word construction rule obeys the spelling of correct Bangla words. In most cases the phonetic character chart is maintained, but there are a few exceptions)
  4. Simple Representation of Character Clusters (easy to parse the input text)
  5. Morphological Word Construction Rule (using spelling not pronunciation, to overcome ambiguities)
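
The one-to-one mapping and the cluster representation in the list above can be sketched as a simple table lookup. This is a minimal sketch, not the authors' implementation: only the hasanta letter `Q` and the unused letter `L` come from the list, while every other table entry, and the names `CHAR_MAP` and `transliterate`, are hypothetical placeholders, since the full character chart is not reproduced in this part of the paper. The sketch does not attempt vowel signs or the spelling-based word construction rule.

```python
HASANTA = "\u09cd"  # BENGALI SIGN VIRAMA

# 'Q' -> hasanta and the unused 'L' are stated in the list above.
# All other entries are HYPOTHETICAL placeholders, not the paper's chart.
CHAR_MAP = {
    "Q": HASANTA,
    "k": "\u0995",  # BENGALI LETTER KA (assumed assignment)
    "b": "\u09ac",  # BENGALI LETTER BA (assumed assignment)
    "m": "\u09ae",  # BENGALI LETTER MA (assumed assignment)
    "r": "\u09b0",  # BENGALI LETTER RA (assumed assignment)
    "l": "\u09b2",  # BENGALI LETTER LA (assumed assignment)
}
UNUSED = {"L"}


def transliterate(text):
    """Map each Latin letter to one Bangla code point; pass other characters through."""
    out = []
    for ch in text:
        if ch in UNUSED:
            raise ValueError(f"{ch!r} is unused in the scheme")
        if ch in CHAR_MAP:
            out.append(CHAR_MAP[ch])
        elif ch.isascii() and ch.isalpha():
            # The table above is deliberately partial.
            raise ValueError(f"no mapping for {ch!r} in this partial sketch")
        else:
            out.append(ch)  # spaces, digits, punctuation
    return "".join(out)


print(transliterate("kQr"))
```

Because each input letter yields exactly one output code point, a cluster such as `kQr` needs no lookahead to parse: the hasanta between the two consonants is what asks the font to draw them as a conjunct.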

This is only the beginning of the paper. The complete article is available in the printer-friendly version.




Joy Mustafi, M.C.A.
Technology Integration and Management
IBM India Pvt. Ltd.
Kolkata - 700156
India
jmustafi@in.ibm.com

B. B. Chaudhuri, Ph.D.
Computer Vision and Pattern Recognition Unit
Indian Statistical Institute
Kolkata - 700108
India
bbc@isical.ac.in

 