Copyright © 2009 M. S. Thirumalai
Will Sentences Have Divergence Upon
Translation? : A Corpus-Evidence Based
Solution for Example Based Approach
Deepa Gupta, B.A (Hons), M.Sc. (Maths), Ph.D.
Abstract
This paper presents a corpus-evidence based scheme for deciding whether
the translation of an English sentence into Hindi will involve divergence.
Divergence is the phenomenon in which sentences of similar structure in the
source language do not translate into structurally similar sentences in the
target language. Divergence assumes special significance in the domain of
Example Based Machine Translation (EBMT) where translation of a given
sentence is generated by first retrieving translation example(s) of similar
sentence(s) from the system's example base, and then by adapting them
suitably to meet the requirements of the present input sentence. The
occurrence of divergence greatly hinders efficient adaptation of retrieved sentences.
A possible remedy may lie in dividing the example
base of an EBMT system into two parts: examples of normal translation,
in one, and examples involving divergence in the other, so that given an
input, the retrieval can be made from the appropriate part of the example base. But the success of this scheme depends heavily on the system's
ability to judge a priori whether translation of a given input will involve
divergence. The task, however, is not straightforward, as the occurrence of
divergence does not follow any rules that would make its prior identification
simple. The technique proposed here is aimed at achieving this goal. The
scheme is explained and illustrated in the context of English to Hindi
EBMT.
1 Introduction
Dealing with divergence is one major difficulty of any translation system. Typically, in a translation the structure of the translated sentence is guided by the
syntactic and semantic properties of the target language. If upon translation
the Parts of Speech (POS) and Functional Tags (FT) of the constituent words
of the source language sentence do not undergo any changes, we term it
a normal translation. However, there are occasions when the structure of the
translated sentence deviates from this normal structure. Such exceptions are
called translation divergences [4]. Consider, for example, the English sentences
"It is running" and "It is raining". Although these two sentences are structurally
very similar, their Hindi translations are structurally very different. The first
sentence is translated as "wah (it) bhaag (run) rahaa (..ing) hai (is)", which is
a normal translation. But the second one is translated as "baarish (rain) ho
(be) rahii (..ing) hai (is)". The second example is a clear case of divergence,
where the subject of the Hindi sentence is realized from the verb of the English
sentence.
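The definition of a normal translation above can be sketched as a simple check over aligned words. This is only an illustration: the `Tagged` type, the tag names, and the hand-made word alignments are assumptions made for this example, not the tag set or alignment procedure used in the paper.

```python
from typing import List, NamedTuple, Tuple


class Tagged(NamedTuple):
    """A word with hypothetical Part of Speech and Functional Tag labels."""
    word: str
    pos: str  # Part of Speech (label names are assumed for illustration)
    ft: str   # Functional Tag (label names are assumed for illustration)


Alignment = List[Tuple[Tagged, Tagged]]  # (English word, Hindi counterpart)


def is_normal_translation(aligned: Alignment) -> bool:
    """Return True if every aligned pair keeps the same POS and FT."""
    return all(src.pos == tgt.pos and src.ft == tgt.ft for src, tgt in aligned)


# "It is running" -> "wah bhaag rahaa hai": tags are preserved.
running: Alignment = [
    (Tagged("it", "pronoun", "subject"), Tagged("wah", "pronoun", "subject")),
    (Tagged("running", "verb", "main verb"), Tagged("bhaag rahaa", "verb", "main verb")),
    (Tagged("is", "auxiliary", "auxiliary"), Tagged("hai", "auxiliary", "auxiliary")),
]

# "It is raining" -> "baarish ho rahii hai": the English verb surfaces
# as the Hindi subject noun, so POS and FT both change.
raining: Alignment = [
    (Tagged("raining", "verb", "main verb"), Tagged("baarish", "noun", "subject")),
]

print(is_normal_translation(running))
print(is_normal_translation(raining))
```

The check only compares tags of words that have already been aligned; producing that alignment is a separate problem that this sketch does not address.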
Translation divergence has a heavy bearing on Example Based Machine Translation (EBMT). In an EBMT system the translation for a given input sentence
is generated by retrieving the translation of a similar sentence from the system's
example base, and then modifying (adapting) it to suit the requirements of
the current input sentence [8] [1]. Selection of the right past example is, therefore, extremely important for successful EBMT. Difficulty arises primarily in the following two scenarios:
- The past example that is retrieved for carrying out the task of adaptation
has a normal translation, but translation of the input sentence should
involve divergence.
- The translation of the retrieved example involves divergence, whereas the
input sentence should have a normal translation.
In both situations the retrieved example may not be helpful in generating the translation of the given input, and consequently, developing an efficient
adaptation scheme becomes extremely difficult.
A possible solution may lie in separating the example base (EB) into two
parts: Divergence EB and Normal EB so that given an input sentence retrieval
can be made from the appropriate part of the example base. However, this
scheme can work successfully only if the EBMT system has the capability to
judge from the input sentence itself whether its translation will involve any divergence. But making such a decision is not straightforward, since the occurrence
of divergence does not follow any patterns or rules. In fact, a divergence may
be induced by various factors, such as the structure of the input sentence, the semantics of its constituent words, etc. In this work we propose a corpus-evidence
based approach to deal with this difficulty. Under this scheme, upon receiving
an input sentence, a system looks into its example base to glean evidence
for or against each possible type of divergence. Based on this evidence the system decides whether the retrieval has to be made from the normal EB, or
from the divergence EB.
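The split example base can be pictured with a minimal sketch. The class `SplitExampleBase`, the method `select_part`, and the function `toy_predictor` are hypothetical names introduced here. In particular, `toy_predictor` is a deliberately trivial placeholder and does not implement the corpus-evidence scheme, whose details are given later in the paper.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

Example = Tuple[str, str]  # (English sentence, Hindi translation)


@dataclass
class SplitExampleBase:
    """Example base divided into a normal part and a divergence part."""
    normal_eb: List[Example] = field(default_factory=list)
    divergence_eb: List[Example] = field(default_factory=list)

    def select_part(
        self, sentence: str, predicts_divergence: Callable[[str], bool]
    ) -> List[Example]:
        """Choose which part of the example base to retrieve from."""
        if predicts_divergence(sentence):
            return self.divergence_eb
        return self.normal_eb


def toy_predictor(sentence: str) -> bool:
    """Placeholder decision rule (an assumption, not the paper's method)."""
    return "raining" in sentence.lower().split()


eb = SplitExampleBase(
    normal_eb=[("It is running", "wah bhaag rahaa hai")],
    divergence_eb=[("It is raining", "baarish ho rahii hai")],
)

print(eb.select_part("It is raining", toy_predictor))
print(eb.select_part("It is running", toy_predictor))
```

Passing the decision rule in as a function keeps the retrieval step independent of how the a priori judgement is made, so a more realistic predictor could be substituted without changing the example base.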
A critical look at machine translation suggests that EBMT has been studied
extensively as a major paradigm for machine translation over the last decade
and more [2]. At the same time, the literature is replete with work on translation
divergence, its identification, resolution, etc. However, work on these
two aspects of machine translation has progressed somewhat independently.
No significant work has so far been found regarding how divergence can be
dealt with efficiently in an EBMT framework. The proposed work aims at
bridging this gap. Since divergence is a language-dependent phenomenon, we
have concentrated on a specific source and target language pair, English and
Hindi, for this work.
Divergence in English to Hindi translation has been studied thoroughly
in some of our earlier works ([5], [6], [7]). With respect to English to Hindi
translation, seven different types of divergence have been identified. These are
structural, categorial, conflational, demotional, pronominal, nominal and
possessional. Of the seven types, possessional divergence is somewhat different
in nature as, unlike the other six, its occurrence depends upon more than one
Functional Tag of the sentence. The scheme in its present form cannot handle
possessional divergence efficiently. Hence we exclude possessional divergence
from the present discussion. The algorithm proposed here, therefore, works
with respect to the first six types of divergence. For convenience of presentation
we denote them as d1, d2, d3, d4, d5 and d6, respectively.
Barring structural divergence (d1), all five other types of divergence
(i.e. d2,...,d6) have further been classified into several sub-types depending upon
the variations in the role of different functional tags upon translation to Hindi.
Appendix-A gives a brief description of all the six divergence types mentioned above, and their sub-types. It further provides the necessary FT-features that
the source language (English) sentences should have in order that a particular
type/sub-type of divergence may occur. This, however, does not mean that
any sentence having those FT-features will necessarily produce a divergence
upon translation. As a consequence, mere examination of the FTs of an input
sentence cannot ascertain whether its translation will induce any divergence or
not. Hence more evidence needs to be considered. In this work we describe
all this evidence and how it is to be used for making an a priori decision
regarding whether the input English sentence will involve any divergence upon
translation to Hindi.
This paper is organised in the following way. Section 2 explains the different
types of corpus-based evidence used by the proposed approach. Most
of this evidence is formulated by analysing a parallel corpus comprising
more than 4000 sentences collected from various sources, such as children's
stories, translation books, advertisement materials and official letters. Section
3 explains how the different kinds of evidence are generated and combined to arrive at a final
decision regarding an input. Section 4 provides illustrations of the scheme, and
experimental results.
Deepa Gupta, B.A (Hons), M.Sc(Maths), Ph.D.
Department of Mathematics
Amrita School of Engineering
Amrita Vishwa Vidyapeetham University
Kasavanahalli, Bangalore - 560 035
Karnataka, India
deepag iitd@yahoo.com