LANGUAGE IN INDIA

Strength for Today and Bright Hope for Tomorrow

Volume 7 : 1 January 2007
ISSN 1930-2940

Managing Editor: M. S. Thirumalai, Ph.D.
Editors: B. Mallikarjun, Ph.D.
         Sam Mohanlal, Ph.D.
         B. A. Sharada, Ph.D.
         A. R. Fatihi, Ph.D.
         Lakhan Gusain, Ph.D.
         K. Karunakaran, Ph.D.
         Jennifer Marie Bayer, Ph.D.

  • E-mail your articles and book-length reports (preferably in Microsoft Word) to mthirumalai@comcast.net.
  • Contributors from South Asia may send their articles to
    B. Mallikarjun,
    Central Institute of Indian Languages,
    Manasagangotri,
    Mysore 570006, India
    or e-mail to mallikarjun@ciil.stpmy.soft.net
  • Your articles and book-length reports should be written following the MLA, LSA, or IJDL Stylesheet.
  • The Editorial Board has the right to accept, reject, or suggest modifications to the articles submitted for publication, and to make suitable stylistic adjustments. High quality, academic integrity, ethics and morals are expected from the authors and discussants.

Copyright © 2006
M. S. Thirumalai


 

COMPLEXITY OF TAMIL IN POS TAGGING

S. Rajendran, Ph.D.


THE FOCUS OF THIS PAPER

This paper focuses on the morphological complexity of Tamil from the point of view of POS tagging. Nouns inflect for number and case. Verbs take a range of inflections, including tense, finite and non-finite suffixes. Verbs can be adjectivalized and adverbialized, and both verbs and adjectives can be nominalized by means of certain nominalizers. Adjectives and adverbs do not inflect. Many postpositions in Tamil come from nominal and verbal sources. As a result, we often have to rely on syntactic function or context to decide whether a form is a noun, adjective, adverb or postposition. This is the source of the complexity of Tamil in POS tagging.

PARTS OF SPEECH IN TAMIL

Modern grammarians identify the following parts of speech, or word classes, for Tamil: 1) Noun, 2) Verb, 3) Adjective, 4) Adverb, 5) Postposition, 6) Numeral, 7) Quantifier, 8) Words of conjunction, 9) Exclamatory words, 10) Words expressing feeling, 11) Word of calling, and 13) Words accepting calling.

NOMINAL COMPLEXITY

Nouns need to be classified as pronoun, proper noun or common noun. Pronouns need to be further annotated for person (1st, 2nd and 3rd), number (singular and plural), gender (masculine, feminine and neuter) and status (honorific and non-honorific). Nouns also need to be annotated as rational or irrational, and for case: nominative, accusative, dative, instrumental, sociative, locative, ablative, genitive and vocative. Both nouns and pronouns need to be annotated as oblique or non-oblique forms.
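The feature inventory in this paragraph can be written down as a small annotation record. The Python sketch below is only illustrative: the names NounType, Case and NounAnnotation, the short value labels, and the rule that person and status are recorded for pronouns only are assumptions made for this example, not a tagset proposed in the paper.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class NounType(Enum):
    PRONOUN = "pronoun"
    PROPER = "proper"
    COMMON = "common"


class Case(Enum):
    # The nine cases listed in the text; the short labels are assumed.
    NOMINATIVE = "nom"
    ACCUSATIVE = "acc"
    DATIVE = "dat"
    INSTRUMENTAL = "ins"
    SOCIATIVE = "soc"
    LOCATIVE = "loc"
    ABLATIVE = "abl"
    GENITIVE = "gen"
    VOCATIVE = "voc"


@dataclass(frozen=True)
class NounAnnotation:
    noun_type: NounType
    case: Case
    number: str                       # "sg" or "pl"
    gender: str                       # "m", "f" or "n"
    rational: bool                    # rational vs. irrational
    oblique: bool                     # oblique vs. non-oblique form
    person: Optional[int] = None      # 1, 2 or 3 (assumed: pronouns only)
    honorific: Optional[bool] = None  # status (assumed: pronouns only)

    def __post_init__(self):
        if self.number not in ("sg", "pl"):
            raise ValueError("number must be 'sg' or 'pl'")
        if self.gender not in ("m", "f", "n"):
            raise ValueError("gender must be 'm', 'f' or 'n'")
        is_pronoun = self.noun_type is NounType.PRONOUN
        if is_pronoun and self.person not in (1, 2, 3):
            raise ValueError("a pronoun needs person 1, 2 or 3")
        if not is_pronoun and (self.person is not None
                               or self.honorific is not None):
            raise ValueError("person and status are kept for pronouns only")
```

Even this flat record multiplies out to a large number of distinct feature combinations per noun, which gives a rough sense of how nominal morphology alone inflates a fine-grained tagset.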

Furthermore, nouns need to be annotated for number and gender (masculine, feminine and neuter), because subject nouns agree with the PNG (person-number-gender) marker on the finite verb. Nominalization adds further complexity: nominalized verbal forms need to be distinguished into two or three types. For example, the productive forms built with the suffixes tal/kai/aamai, which are sentential in nature, have to be differentiated from the non-productive forms built with suffixes such as ppu, which are lexical in nature. Thus paTittal is a sentential form and paTippu is a lexical form.
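The sentential versus lexical split can be illustrated with a deliberately naive suffix check. This is a sketch under strong assumptions: the function name classify_nominalization and the labels it returns are made up for this example, and plain string matching on the transliterated form is no substitute for a morphological analyzer, since it will misfire on any word that merely happens to end in these letters.

```python
# Suffixes named in the text. The text says "ppu etc.", so the lexical
# list is known to be incomplete.
PRODUCTIVE_NOMINALIZERS = ("tal", "kai", "aamai")  # sentential forms
LEXICAL_NOMINALIZERS = ("ppu",)                    # lexical forms


def classify_nominalization(form: str) -> str:
    """Guess the type of a nominalized verbal form from its ending."""
    if form.endswith(PRODUCTIVE_NOMINALIZERS):
        return "sentential"
    if form.endswith(LEXICAL_NOMINALIZERS):
        return "lexical"
    return "unknown"


print(classify_nominalization("paTittal"))  # sentential
print(classify_nominalization("paTippu"))   # lexical
```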

Many pronominalized forms are also ambiguous in Tamil and need to be distinguished into two types: lexical and sentential (productive).

VERBAL COMPLEXITY

The verbal forms are complex in Tamil. A finite verb shows the following morphological structure:

V+Tense+PNG
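The template can be rendered as a tag label, as in the sketch below. The class name FiniteVerbTag and the value labels (PAST, PRES, FUT, SG, PL, M, F, N) are assumptions chosen for illustration; the summary itself does not fix any label inventory.

```python
from dataclasses import dataclass

# Assumed label inventories, for illustration only.
TENSES = ("PAST", "PRES", "FUT")
NUMBERS = ("SG", "PL")
GENDERS = ("M", "F", "N")


@dataclass(frozen=True)
class FiniteVerbTag:
    tense: str
    person: int
    number: str
    gender: str

    def __post_init__(self):
        if self.tense not in TENSES:
            raise ValueError(f"unknown tense: {self.tense}")
        if self.person not in (1, 2, 3):
            raise ValueError(f"unknown person: {self.person}")
        if self.number not in NUMBERS:
            raise ValueError(f"unknown number: {self.number}")
        if self.gender not in GENDERS:
            raise ValueError(f"unknown gender: {self.gender}")

    def label(self) -> str:
        # Mirror the V+Tense+PNG template, one slot per morpheme class.
        png = f"{self.person}{self.number}{self.gender}"
        return "+".join(("V", self.tense, png))


print(FiniteVerbTag("PAST", 3, "SG", "M").label())  # V+PAST+3SGM
```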

A number of non-finite forms are possible: adverbial forms, adjectival forms, infinitival forms and nominalized forms.

A distinction needs to be made between a main verb followed by another main verb and a main verb followed by an auxiliary verb. A main verb followed by an auxiliary needs to be interpreted together with it, whereas a main verb followed by a main verb needs to be interpreted separately. This leads to functional ambiguity.
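Once the main verb versus auxiliary decision has been made, the grouping itself is mechanical, as the sketch below suggests. It assumes tokens arrive as (form, tag) pairs with the hypothetical tags V and AUX already assigned; making that assignment is precisely the hard part the text describes, and the code does not attempt it. The forms w1 to w4 are placeholders, not Tamil words.

```python
from typing import List, Tuple

Token = Tuple[str, str]  # (word form, tag); the tag names are assumed


def group_verb_sequences(tokens: List[Token]) -> List[List[Token]]:
    """Attach each AUX to the verb group before it; keep main verbs apart."""
    groups: List[List[Token]] = []
    for token in tokens:
        tag = token[1]
        follows_verb = bool(groups) and groups[-1][-1][1] in ("V", "AUX")
        if tag == "AUX" and follows_verb:
            groups[-1].append(token)   # interpreted together with its verb
        else:
            groups.append([token])     # interpreted separately
    return groups


sequence = [("w1", "V"), ("w2", "AUX"), ("w3", "V"), ("w4", "V")]
for group in group_verb_sequences(sequence):
    print(group)
```

Here w1 and w2 come out as one unit, while w3 and w4 stay separate.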

FUNCTIONAL AMBIGUITY IN ADVERBIAL FORM

FUNCTIONAL AMBIGUITY IN INFINITIVAL FORM

The adjectival forms differ by tense markings: V+Tense+Adjectivalizer.

Adjectival form allows several interpretations as given in the following examples.

The adjectival forms, when followed by nouns such as ceyti 'news' and uNmai 'fact', are ambiguous, as they allow both a relative and a non-relative interpretation.

Some adjectivalized verbal forms are lexicalized as adjectives (as against sentential ones). Such a form is therefore ambiguous between a pure adjective, which modifies only the noun that follows it, and a sentential adjective, which stands as a relative clause modifying its head noun (i.e. the noun that has moved to the position after the relativized verb).

Nominals can function as adjectives modifying a noun as given in the following examples.

Verbal roots can function as adjectives, as in the following examples.

cuTu cooRu (T) 'hot rice'
aazh kiNaRu (T) 'deep well'

A number of adverbial forms of verbs function as postpositions. They are discussed under 'complexity in postpositions'.

COMPLEXITY IN ADVERBS

We have seen that a number of adjectival and adverbial forms of verbs are lexicalized as adjectives and adverbs respectively, and that these clash semantically with their sentential adjectival and adverbial counterparts, creating ambiguity in POS tagging.

Adverbs too need to be distinguished on the basis of their source category. Many adverbs in Tamil are derived by suffixing aaka to nouns, but not all aaka-suffixed forms are adverbial.

Functional clash can be seen between adjective and adverb in aaka suffixed forms. This type of clash is seen among other Dravidian languages too.

COMPLEXITY IN POSTPOSITIONS

Postpositions in Tamil come from various categories, such as verbal, nominal and adverbial. Often the dividing line between verb/noun/adverb and postposition is thin, which leads to ambiguity. Some postpositions are simple and some are compound. Postpositions are conditioned by the case-inflected nouns they follow. Simply tagging a form as a postposition will be misleading. There are also postpositions that occur after nouns as well as after verbs, which makes them ambiguous (spatial vs. temporal).

Use of adverbial forms of verbs leads to ambiguity in the annotation of postpositions.

CONCLUSION

Tamil is no doubt a morphologically rich language. The relation between a verb and its nominal arguments is determined by case suffixes rather than by position. It is possible to work with a small tagset at the shallow level, but the other unique features need to be addressed at the deep level. A hierarchical tagset is therefore welcome.
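One way to picture a hierarchical tagset is as dotted paths that can be cut back to any depth, so that the same annotation serves both the shallow and the deep level. The sketch below assumes that encoding; the tag strings (V.NONFINITE.ADJECTIVAL, N.COMMON) and the function names collapse and is_a are hypothetical and are not taken from the paper.

```python
def collapse(tag: str, depth: int = 1, sep: str = ".") -> str:
    """Cut a hierarchical tag back to its first `depth` levels."""
    if depth < 1:
        raise ValueError("depth must be at least 1")
    return sep.join(tag.split(sep)[:depth])


def is_a(tag: str, ancestor: str, sep: str = ".") -> bool:
    """True if `tag` equals `ancestor` or lies below it in the hierarchy."""
    return tag == ancestor or tag.startswith(ancestor + sep)


deep = "V.NONFINITE.ADJECTIVAL"
print(collapse(deep))           # V
print(collapse(deep, depth=2))  # V.NONFINITE
print(is_a(deep, "V"))          # True
print(is_a("N.COMMON", "V"))    # False
```

With such an encoding, a shallow tagger can be trained and evaluated on the depth-1 labels while the finer distinctions remain available to later processing.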


This is only a brief summary.





S. Rajendran, Ph.D.
Department of Linguistics
Tamil University
Thanjavur 613 005
Tamilnadu, India
raj_ushush@yahoo.com
 
  • Please ensure that your name, academic degrees, institutional affiliation and institutional address, and your e-mail address are all given on the first page of your article. Also include a declaration that your article or work submitted for publication in LANGUAGE IN INDIA is an original work by you and that you have duly acknowledged the work or works of others you either cited or used in writing your articles, etc. Remember that by maintaining academic integrity we not only do the right thing but also help the growth, development and recognition of Indian scholarship.