LANGUAGE IN INDIA

Strength for Today and Bright Hope for Tomorrow

Volume 4 : 3 March 2004

Editor: M. S. Thirumalai, Ph.D.
Associate Editors: B. Mallikarjun, Ph.D.
         Sam Mohanlal, Ph.D.
         B. A. Sharada, Ph.D.



Copyright © 2001
M. S. Thirumalai

A CORPUS BASED LINGUISTIC TOOL FOR MACHINE TRANSLATION:
-ING AND ITS EQUIVALENTS IN TAMIL
S. Kamakshi, Ph.D.


1. INTRODUCTION

This paper uses a corpus-based linguistic tool for machine translation to study the various types of -ing forms in English and their equivalents in Tamil, through the grammatical association approach.

Traditionally, linguistic analyses have emphasized structure - identifying the structural units and classes of a language (for example, morphemes, words, phrases, grammatical classes, etc.) and describing how smaller units can be combined to form larger grammatical units (for example, how words can be combined to form phrases, phrases can be combined to form clauses, etc.).

A more recent perspective, the corpus-based statistical approach, is adopted here to investigate how speakers and writers exploit the resources of their language. Rather than looking at what is theoretically possible in a language, we study the language actually used in naturally occurring texts.

This paper specifically investigates the grammatical associations of ~ing words in English and their possible translation equivalents in Tamil. Seemingly similar structures occur in different contexts and serve different functions. Their grammatical associations can help us prepare a bilingual machine-tractable linguistic tool for machine translation.

2. THE OPERATION

To accomplish this task, we need to make the machine recognize the right and left collocates of -ing.
  1. One of the left collocates of ~ing is the verb, which belongs to the open parts of speech in English and is grammatically associated with the present participle suffix ~ing. A corpus-based statistical approach builds a database of all the words that a dictionary or a tagged corpus lists as verbs, each as a principal entry.
  2. Nouns can also be used as verbs, so that the language reuses existing words instead of adding new ones to each part of speech; this facility also enriches its lexical resources. An investigation of this process can result in a machine-readable dictionary, a key linguistic tool for machine translation.
  3. When present participles are derived, some English verbs whose root already ends in ing take a second ing (sing + ing = singing). It is therefore necessary to distinguish the root-final ing from the suffix -ing in forms such as singing and ringing, since the machine cannot work out the formation of present participles on its own.
  4. Problematic verb forms such as will also need analysis, since will has no past tense form *willed and no past participle form *willed/*willen. It may, however, have the present participle form willing, which can function as an adjective, as in `Lakshmi was willing to join with us.' Such forms may also take the third person singular present tense -s, giving forms such as `wills' in rare contexts, as in the expression `if God wills ...'
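The operation steps above can be sketched, under simplifying assumptions, as a stem-recovery routine in Python: strip exactly one -ing, also try restoring a final e and undoubling a final consonant, and accept only candidates attested in a verb lexicon. The lexicon `VERB_LEXICON` and the function names are hypothetical, and the spelling rules are deliberately simplified; this is an illustration, not the author's implementation.

```python
# Hypothetical toy verb lexicon; a real system would load the verb entries
# of a machine-readable dictionary or a tagged corpus.
VERB_LEXICON = {"sing", "ring", "bring", "make", "run", "smile", "save", "do", "will"}


def ing_stem_candidates(word: str) -> list[str]:
    """Return candidate verb stems for a word ending in -ing, most likely first."""
    if not word.endswith("ing") or len(word) <= 4:
        return []                         # too short to be stem + -ing (king, sing)
    base = word[:-3]                      # strip exactly one -ing: singing -> sing
    candidates = [base, base + "e"]       # e-deletion: smiling -> smile
    if len(base) >= 2 and base[-1] == base[-2]:
        candidates.append(base[:-1])      # doubled consonant: running -> run
    return candidates


def recover_verb_stem(word: str, lexicon=VERB_LEXICON):
    """Return the first candidate stem attested in the lexicon, else None."""
    for stem in ing_stem_candidates(word.lower()):
        if stem in lexicon:
            return stem
    return None


for w in ["singing", "ringing", "smiling", "running", "willing", "bring", "evening"]:
    print(w, "->", recover_verb_stem(w))
```

Because only one -ing is removed and the result must be a listed verb, singing yields sing, while bring and evening are rejected (br and even are not verbs in this lexicon). Note that willing still yields will; deciding whether it is used adjectivally needs the context checks discussed later.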

3. OVERLAP BETWEEN WORD-FORMATION RULES

The following samples illustrate the overlap between word-formation rules in English. It is very interesting to note that even an adjective like `ready' is used as a verb, as shown below:

  • `Theni readying for CM's Visit', found in the popular daily THE HINDU dated 23 January 2004.
  • `. . . officials are readying venues here and…', found in the same daily dated 31 January 2004.

An analysis of such expressions against the corpus database is worth pursuing, to find new word formations and usages and to record them in lexical resources such as a Machine Readable Dictionary (MRD).
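A corpus check of this kind might be sketched as follows, assuming a part-of-speech lexicon is available: flag every -ing form whose base is listed in the lexicon but never as a verb, since such forms (like readying) are candidates for new verb uses. The lexicon contents and function name are hypothetical; the test sentence is built from the two newspaper samples quoted above.

```python
import re
from collections import Counter

# Hypothetical part-of-speech lexicon (word -> dictionary categories).
LEXICON = {
    "ready": {"ADJ"},
    "neighbour": {"NOUN"},
    "sing": {"VERB"},
    "visit": {"NOUN", "VERB"},
}


def conversion_candidates(tokens, lexicon=LEXICON):
    """Count -ing forms whose base is in the lexicon but never listed as a verb."""
    counts = Counter()
    for token in tokens:
        word = token.lower()
        if not word.endswith("ing") or len(word) <= 4:
            continue
        # Strip one -ing (readying -> ready); e-restoration is ignored here.
        categories = lexicon.get(word[:-3], set())
        if categories and "VERB" not in categories:
            counts[word] += 1
    return counts


text = "Theni readying for CM's Visit ... officials are readying venues here and"
tokens = re.findall(r"[A-Za-z']+", text)
print(conversion_candidates(tokens))
```

Forms whose base is already a verb (singing) are ignored, so the counts isolate the unexpected conversions that an MRD entry would need to record.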

Context Free Grammar (CFG) techniques can be adopted to train the machine to recognize the various grammatical functions of ~ing words, as follows:

  1. Adjective, as in `Smiling beauty'.
  2. Gerund, as in `Smiling rapports everybody.'
  3. Present participle, as in `She is smiling'.

Concordance listings help us in this analysis.
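As a rough illustration, a keyword-in-context (KWIC) concordance can be paired with a few collocate rules to sort the three examples above. This is a heuristic stand-in, not a CFG: it only inspects the nearest left and right collocates. The mini-lexicons `NOUNS` and `VERBS` and the function names are hypothetical.

```python
# Hypothetical mini-lexicons for illustration only.
BE_FORMS = {"am", "is", "are", "was", "were", "be", "been", "being"}
NOUNS = {"beauty", "story", "money"}
VERBS = {"rapports", "helps", "saves"}


def concordance(tokens, suffix="ing", width=2):
    """Yield (left context, node, right context) for each token ending in suffix."""
    for i, token in enumerate(tokens):
        if token.lower().endswith(suffix):
            yield tokens[max(0, i - width):i], token, tokens[i + 1:i + 1 + width]


def classify_ing(left, right):
    """Guess the function of an -ing node from its nearest collocates."""
    if left and left[-1].lower() in BE_FORMS:
        return "present participle"   # be + V-ing
    if right and right[0].lower() in NOUNS:
        return "adjective"            # V-ing + noun
    if right and right[0].lower() in VERBS:
        return "gerund"               # V-ing as subject of a verb
    return "unknown"


for sentence in ["Smiling beauty", "Smiling rapports everybody", "She is smiling"]:
    for left, node, right in concordance(sentence.split()):
        label = classify_ing(left, right)
        print(f"{' '.join(left):>12} [{node}] {' '.join(right):<18} -> {label}")
```

Such rules are only heuristics: in `The story is interesting' the be-rule would wrongly return present participle, which is why corpus evidence is needed to refine them.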

4. STATISTICAL ANALYSIS

Statistical analysis can be used to build a database of the -ing words that are not verb forms and to classify them by grammatical function, as follows:

Evening [Noun/Time adverb]
King [Pure Noun]
During [Preposition]
According to [Complex preposition]
Something [Pronoun]

This analysis would be useful for preparing a linguistic tool that recognizes and tags the various grammatical categories of ~ing words in a given corpus.
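One simple statistical baseline for such a tool, sketched here under assumptions, is to count the tags each -ing word receives in a tagged corpus and assign the most frequent one. The tagged corpus below is a toy invented for illustration, and the tag labels are hypothetical; a real study would use a large tagged corpus.

```python
from collections import Counter, defaultdict

# Toy tagged corpus invented for illustration; tags are hypothetical labels.
TOY_TAGGED = [
    ("evening", "NOUN"), ("evening", "NOUN"), ("evening", "ADV"),
    ("king", "NOUN"), ("during", "PREP"), ("something", "PRON"),
    ("smiling", "ADJ"), ("smiling", "VERB"), ("smiling", "VERB"),
    ("visit", "NOUN"),
]


def ing_tag_profile(tagged_tokens):
    """Count how often each -ing word carries each tag."""
    profile = defaultdict(Counter)
    for word, tag in tagged_tokens:
        if word.lower().endswith("ing"):
            profile[word.lower()][tag] += 1
    return profile


def most_frequent_tags(tagged_tokens):
    """Assign each -ing word its most frequent tag (a unigram baseline)."""
    return {w: c.most_common(1)[0][0] for w, c in ing_tag_profile(tagged_tokens).items()}


print(most_frequent_tags(TOY_TAGGED))
```

Keeping the full profile, not just the winning tag, preserves ambiguity such as evening being both a noun and a time adverb.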

5. IMPORTANT STEPS

A correlative case study of -ing forms in English and their equivalents in Tamil requires that we keep the following points in mind while building a Machine Tractable Linguistic Tool (MTLT).

  1. It is important to determine whether every gerundial noun in English has a Tamil equivalent with either the `thal' or the `ththal' suffix. These suffixes are highly productive. We also need to examine whether further suffixes, such as `ippu', can be counted as gerundial suffixes in Tamil.
  2. The gerund `doing' is neutral with respect to tense in English examples (a), (b), and (c) below. This makes it a challenge to provide translation equivalents, since in Tamil the main consideration here seems to be relative tense.

Examples:

  (a) By doing it himself, he is saving a lot of money (present time reference).
    avane athai ceykirathan/ceyvathan muulamaaka avan athika panaththai ceemiththukkontirukkiraan.
  (b) By doing it himself, he saved a lot of money (past time reference).
    avane athai ceyththan muulamaaka avan athika panaththai cemithaan.
  (c) By doing it himself, he will save a lot of money (future time reference).
    avane athai ceyvathan muulamaaka avan athika panaththai cemippan.
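The dependence of the Tamil form on the tense of the main clause could be prototyped as a lookup keyed by that tense. The Tamil forms are copied from examples (a) to (c); the tense detector is a hypothetical, very rough heuristic (will means future, is/am/are means present, an -ed word or was/were means past) assumed only for illustration.

```python
# Tamil renderings of `doing' in `By doing it himself, ...', as given in
# examples (a)-(c); the present has two attested variants.
DOING_BY_TENSE = {
    "present": ["ceykirathan", "ceyvathan"],
    "past": ["ceyththan"],
    "future": ["ceyvathan"],
}


def main_clause_tense(clause: str) -> str:
    """Very rough tense guess for an English main clause (heuristic assumption)."""
    words = clause.lower().replace(".", "").split()
    if "will" in words or "shall" in words:
        return "future"
    if any(w in {"is", "am", "are"} for w in words):
        return "present"
    if any(w.endswith("ed") or w in {"was", "were"} for w in words):
        return "past"
    return "present"                      # default when no cue is found


def tamil_doing(sentence: str) -> list[str]:
    """Pick the Tamil form of `doing' from the tense of the clause after the comma."""
    _, _, main_clause = sentence.partition(",")
    return DOING_BY_TENSE[main_clause_tense(main_clause)]


print(tamil_doing("By doing it himself, he saved a lot of money."))
```

The point of the sketch is the design: the English gerund carries no tense, so the translation module must look outside it, to the main clause, before choosing a Tamil form.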

6. -ING WORDS THAT FUNCTION AS ADJECTIVES

~ing words that function as adjectives take participial noun suffixes in Tamil, marked for tense, person, number and gender. Alternatively, they may take the suffix aaka or aana, depending on the context revealed by the concordance, as illustrated in the following examples.

  1. `The story is interesting': intha kathai aarvamaaka irukkirathu. Here the word interesting functions as an adjective, but it takes the suffix `aaka' (usually described as an adverbial suffix) in Tamil, since the copula `is' is used in the English sentence.
  2. `It is an interesting story': ithu oru arumaiyaana kathai. Here the word interesting again functions as an adjective, and it takes the suffix `aana' (usually described as an adjectival suffix) in Tamil, since it is used attributively, after the article and before the noun `story'.
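The suffix choice in these two examples can be sketched as a rule on the word preceding the adjective: a copula selects aaka, an article selects aana. The Tamil stem (aarvam or arumai) is taken as given, and the glide rule in `attach` (insert y after i or e) is a simplified assumption about Tamil sandhi, used only so that the two example forms come out right.

```python
COPULAS = {"am", "is", "are", "was", "were"}
ARTICLES = {"a", "an", "the"}


def tamil_adjectival_suffix(previous_word):
    """Return 'aaka' after a copula, 'aana' after an article, else None."""
    word = previous_word.lower()
    if word in COPULAS:
        return "aaka"   # predicative use: The story is interesting
    if word in ARTICLES:
        return "aana"   # attributive use: an interesting story
    return None         # undecided: more context is needed


def attach(stem, suffix):
    """Join a Tamil stem and suffix, inserting the glide y after i or e (simplified)."""
    if stem[-1] in "ie":
        return stem + "y" + suffix
    return stem + suffix


print(attach("aarvam", tamil_adjectival_suffix("is")))
print(attach("arumai", tamil_adjectival_suffix("an")))
```

Returning None for an undecided context, such as `very interesting story', keeps the rule honest about where concordance evidence must take over.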

7. -ING WORDS AFTER A FORM OF `BE'

Whether a verb + -ing form that follows a form of `be' takes kontiru in its Tamil translation (be + verb + ing = ----- + kontiru + -------- + -----) needs to be investigated.
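One way to begin this investigation, sketched under assumptions, is to measure in a parallel corpus how often English sentences containing be + V-ing have kontiru on the Tamil side. The sentence pairs below are taken from the examples in this paper; the function names and the adjacency-only pattern match are hypothetical simplifications.

```python
import re

BE_FORMS = {"am", "is", "are", "was", "were", "be", "been", "being"}


def has_be_ing(english: str) -> bool:
    """True if a form of `be' is immediately followed by a word ending in -ing."""
    words = re.findall(r"[a-z]+", english.lower())
    return any(a in BE_FORMS and b.endswith("ing") and len(b) > 4
               for a, b in zip(words, words[1:]))


def kontiru_rate(pairs):
    """Among English sentences with be + V-ing, the share whose Tamil side has kontiru."""
    hits = [tamil for english, tamil in pairs if has_be_ing(english)]
    if not hits:
        return 0.0
    return sum("kontiru" in tamil for tamil in hits) / len(hits)


# Sentence pairs taken from the examples in this paper.
PAIRS = [
    ("By doing it himself, he is saving a lot of money.",
     "avane athai ceykirathan/ceyvathan muulamaaka avan athika panaththai ceemiththukkontirukkiraan."),
    ("By doing it himself, he saved a lot of money.",
     "avane athai ceyththan muulamaaka avan athika panaththai cemithaan."),
    ("The story is interesting.", "intha kathai aarvamaaka irukkirathu."),
    ("It is an interesting story.", "ithu oru arumaiyaana kathai."),
]

print(kontiru_rate(PAIRS))
```

Even on these four pairs the surface pattern be + -ing also catches `is interesting', whose Tamil rendering uses aaka rather than kontiru, which illustrates why the question needs a larger parallel corpus.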

8. VARIANT FORMS, MORE OR LESS SYNONYMOUS

Since systematic usages such as `neighbouring house' and `neighbouring country' are translated either as pakkaththil ulla viitu or pakkaththu viitu, and as pakkaththil ulla nadu or pakkaththu nadu, the frequency of lexical-lexical association, or concordance listings from a parallel corpus, can be used to select translation equivalents in machine-understandable linguistic tools such as Arbitrarily Reordered Dictionaries (ARD) or Machine Readable Dictionaries (MRD).
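Frequency-based selection of this kind might be sketched as ranking the Tamil renderings observed for each English phrase in an aligned corpus, so that a dictionary entry lists the most frequent equivalent first. The aligned pairs below are a toy set invented for illustration, not real corpus counts, and `rank_equivalents` is a hypothetical name.

```python
from collections import Counter, defaultdict

# Toy aligned phrase pairs invented for illustration (not real corpus counts).
ALIGNED = [
    ("neighbouring house", "pakkaththu viitu"),
    ("neighbouring house", "pakkaththil ulla viitu"),
    ("neighbouring house", "pakkaththu viitu"),
    ("neighbouring house", "pakkaththu viitu"),
    ("neighbouring country", "pakkaththil ulla nadu"),
    ("neighbouring country", "pakkaththu nadu"),
    ("neighbouring country", "pakkaththil ulla nadu"),
]


def rank_equivalents(aligned_pairs):
    """Map each English phrase to its Tamil renderings, most frequent first."""
    table = defaultdict(Counter)
    for english, tamil in aligned_pairs:
        table[english][tamil] += 1
    return {english: counts.most_common() for english, counts in table.items()}


for english, ranked in rank_equivalents(ALIGNED).items():
    total = sum(n for _, n in ranked)
    best, n = ranked[0]
    print(f"{english}: {best} ({n}/{total})")
```

Keeping the lower-ranked variants, rather than only the winner, lets an MRD record both synonymous renderings with their relative frequencies.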

9. CONCLUSIONS

The scope of this paper falls within the very wide task of identifying translation equivalents, from morphemes to words to sentences, and of transferring English sentences into Tamil. An exhaustive correlative study of English and Tamil, combining contrastive and typological studies of the two languages, would provide documentation for anyone attempting a full-fledged Machine Tractable Dictionary (MTD) for a Machine Translation (MT) system that transfers English into Tamil.


Acknowledgement

My thanks are due to Professor Dr. S. Rajendran, Professor of Linguistics at the Tamil University, and to Mr. T. Ronald, Software Engineer, Chennai, for their helpful suggestions on implementing my language analysis by machine. My final year students in the M.A. Applied Linguistics program at the University of Madras helped me collect the ~ing words from the Oxford English Dictionary as part of their practical work.



S. Kamakshi, Ph.D.
Linguistics Studies Unit
University of Madras
Marina Campus
Chennai 600 005
E-mail: skamatchi@yahoo.com
