LANGUAGE IN INDIA

Strength for Today and Bright Hope for Tomorrow

Volume 9 : 10 October 2009
ISSN 1930-2940

Managing Editor: M. S. Thirumalai, Ph.D.
Editors: B. Mallikarjun, Ph.D.
         Sam Mohanlal, Ph.D.
         B. A. Sharada, Ph.D.
         A. R. Fatihi, Ph.D.
         Lakhan Gusain, Ph.D.
         K. Karunakaran, Ph.D.
         Jennifer Marie Bayer, Ph.D.

Copyright © 2009
M. S. Thirumalai

Will Sentences Have Divergence Upon
Translation? : A Corpus-Evidence Based
Solution for Example Based Approach
Deepa Gupta, B.A (Hons), M.Sc. (Maths), Ph.D.

Abstract

This paper presents a corpus-evidence based scheme for deciding whether the translation of an English sentence into Hindi will involve divergence. Divergence is the phenomenon in which sentences of similar structure in the source language do not translate into structurally similar sentences in the target language. Divergence assumes special significance in Example Based Machine Translation (EBMT), where the translation of a given sentence is generated by first retrieving translation example(s) of similar sentence(s) from the system's example base and then adapting them to meet the requirements of the input sentence. The occurrence of divergence is a serious hindrance to efficient adaptation of the retrieved sentences.

A possible remedy may lie in dividing the example base of an EBMT system into two parts, with examples of normal translation in one and examples involving divergence in the other, so that for a given input the retrieval can be made from the appropriate part of the example base. The success of this scheme, however, depends heavily on the system's ability to judge a priori whether the translation of a given input will involve divergence. The task is not straightforward, as the occurrence of divergence does not follow any rules that make its prior identification simple. The technique proposed here is aimed at achieving this goal. The scheme is explained and illustrated in the context of English to Hindi EBMT.

1 Introduction

Dealing with divergence is a major difficulty for any translation system. Typically, the structure of a translated sentence is guided by the syntactic and semantic properties of the target language. If, upon translation, the Parts of Speech (POS) and Functional Tags (FT) of the constituent words of the source language sentence do not undergo any change, we term it a normal translation. However, there are occasions when the structure of the translated sentence deviates from this normal structure. Such exceptions are called translation divergences [4]. Consider, for example, the English sentences "It is running" and "It is raining". Although these two sentences are structurally very similar, their Hindi translations are structurally very different. The first sentence is translated as "wah (it) bhaag (run) rahaa (..ing) hai (is)", which is a normal translation. But the second is translated as "baarish (rain) ho (be) rahii (..ing) hai (is)". The second example is a clear case of divergence, where the subject of the Hindi sentence is realized from the verb of the English sentence.
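The definition of a normal translation given above can be sketched in a few lines of Python. This is only an illustration: the `Token` type, the tag names (`PRON`, `VERB`, `NOUN`, `subject`, `main_verb`) and the hand-made word alignments are assumptions introduced here, not the tag set or alignment procedure of the paper.

```python
from typing import NamedTuple


class Token(NamedTuple):
    word: str
    pos: str  # part of speech (illustrative tag names)
    ft: str   # functional tag (illustrative tag names)


def is_normal_translation(aligned_pairs):
    """Return True if every aligned source/target word keeps its POS and FT."""
    return all(src.pos == tgt.pos and src.ft == tgt.ft
               for src, tgt in aligned_pairs)


# "It is running" -> "wah bhaag rahaa hai": POS and FT are preserved.
running = [
    (Token("it", "PRON", "subject"), Token("wah", "PRON", "subject")),
    (Token("running", "VERB", "main_verb"), Token("bhaag", "VERB", "main_verb")),
]

# "It is raining" -> "baarish ho rahii hai": the English verb is realized
# as the Hindi subject noun, so POS and FT both change.
raining = [
    (Token("raining", "VERB", "main_verb"), Token("baarish", "NOUN", "subject")),
]

print(is_normal_translation(running))
print(is_normal_translation(raining))
```

Note that this check needs the Hindi translation to be available already, so it can only label existing examples. It does not solve the a priori decision problem that the paper addresses.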

Translation divergence has a heavy bearing on Example Based Machine Translation (EBMT). In an EBMT system the translation for a given input sentence is generated by retrieving the translation of a similar sentence from the system's example base and then modifying (adapting) it to suit the requirements of the current input sentence [8] [1]. Selection of the right past example is, therefore, extremely important for successful EBMT. Divergence makes this selection difficult primarily in the following two scenarios:

The past example that is retrieved for carrying out the task of adaptation has a normal translation, but translation of the input sentence should involve divergence.
The translation of the retrieved example involves divergence, whereas the input sentence should have a normal translation.

In both situations the retrieved example may not be helpful in generating the translation of the given input, and consequently developing an efficient adaptation scheme becomes extremely difficult.

A possible solution may lie in separating the example base (EB) into two parts, a Divergence EB and a Normal EB, so that for a given input sentence retrieval can be made from the appropriate part of the example base. However, this scheme can work only if the EBMT system is able to judge from the input sentence itself whether its translation will involve any divergence. Making such a decision is not straightforward, since the occurrence of divergence does not follow any patterns or rules. In fact, a divergence may be induced by various factors, such as the structure of the input sentence or the semantics of its constituent words. In this work we propose a corpus-evidence based approach to deal with this difficulty. Under this scheme, upon receiving an input sentence, the system looks into its example base to gather evidence for or against each possible type of divergence. Based on this evidence the system decides whether the retrieval has to be made from the Normal EB or from the Divergence EB.
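The routing idea described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the algorithm of the paper: `predicts_divergence` is a hypothetical callback standing in for the corpus-evidence based decision, and `word_overlap` (Jaccard similarity over words) is a stand-in for whatever retrieval measure an actual EBMT system would use.

```python
def word_overlap(a, b):
    """Jaccard similarity over lower-cased words (stand-in retrieval measure)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def retrieve(sentence, normal_eb, divergence_eb, predicts_divergence):
    """Pick one half of the example base, then return its most similar example.

    Each example base is a list of (english, hindi) pairs.
    predicts_divergence is a hypothetical function: sentence -> bool.
    """
    eb = divergence_eb if predicts_divergence(sentence) else normal_eb
    return max(eb, key=lambda ex: word_overlap(sentence, ex[0]), default=None)


normal_eb = [("It is running", "wah bhaag rahaa hai")]
divergence_eb = [("It is raining", "baarish ho rahii hai")]


# Toy predictor for this example only: treats one known word as a trigger.
def toy_predictor(sentence):
    return "raining" in sentence.lower().split()


print(retrieve("It is raining heavily", normal_eb, divergence_eb, toy_predictor))
print(retrieve("He is running", normal_eb, divergence_eb, toy_predictor))
```

Passing the predictor in as a function keeps the retrieval step independent of how the divergence decision is made, so the toy predictor could be replaced by an evidence-based one without changing `retrieve`.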

EBMT has been studied extensively as a major paradigm for machine translation over the last decade and more [2]. At the same time, the literature is replete with work on translation divergence and on its identification and resolution. However, work on these two aspects of machine translation has progressed somewhat independently. We have so far found no significant work on how divergence can be dealt with efficiently in an EBMT framework. The proposed work aims at bridging this gap. Since divergence is a language-dependent phenomenon, we have concentrated on a specific source and target language pair, English and Hindi, for this work.

Divergence in English to Hindi translation has been studied thoroughly in some of our earlier works ([5], [6], [7]). With respect to English to Hindi translation, seven different types of divergence have been identified: structural, categorial, conflational, demotional, pronominal, nominal and possessional. Of the seven types, possessional divergence is somewhat different in nature because, unlike the other six, its occurrence depends upon more than one Functional Tag of the sentence. The scheme in its present form cannot handle possessional divergence efficiently, so we exclude it from the present discussion. The algorithm proposed here therefore works with respect to the first six types of divergence. For convenience of presentation we denote them as d1, d2, d3, d4, d5 and d6, respectively.

Barring structural divergence (d1), each of the other five types of divergence (i.e. d2,...,d6) has been further classified into several sub-types depending upon the variations in the role of different functional tags upon translation to Hindi. Appendix-A gives a brief description of the six divergence types mentioned above and their sub-types. It also lists the FT-features that a source language (English) sentence must have for a particular type/sub-type of divergence to occur. These features are necessary but not sufficient: a sentence having them will not necessarily produce a divergence upon translation. As a consequence, mere examination of the FTs of an input sentence cannot ascertain whether its translation will induce any divergence. Hence more evidence needs to be considered. In this work we describe this evidence and how it is used to decide a priori whether an input English sentence will involve any divergence upon translation to Hindi.

This paper is organised in the following way. Section 2 explains the different types of corpus-based evidence used by the proposed approach. Most of this evidence is formulated by analysing a parallel corpus comprising more than 4000 sentences collected from various sources, such as children's stories, translation books, advertisement materials and official letters. Section 3 explains how the different pieces of evidence are generated and combined to arrive at a final decision regarding an input. Section 4 provides illustrations of the scheme and experimental results.

Deepa Gupta, B.A (Hons), M.Sc(Maths), Ph.D.
Department of Mathematics
Amrita School of Engineering
Amrita Vishwa Vidyapeetham University
Kasavanahalli, Bangalore - 560 035
Karnataka, India
deepag iitd@yahoo.com
