Copyright © 2006 M. S. Thirumalai
COMPLEXITY OF TAMIL IN POS TAGGING
S. Rajendran, Ph.D.
THE FOCUS OF THIS PAPER
This paper focuses on the morphological complexity of Tamil from the point of view of POS tagging. Nouns inflect for number and case. Verbs take a range of inflections, including tense and finite and non-finite suffixes. Verbs are adjectivalized and adverbialized, and both verbs and adjectives are nominalized by means of certain nominalizers. Adjectives and adverbs do not inflect. Many postpositions in Tamil derive from nominal and verbal sources. As a result, we often have to rely on syntactic function or context to decide whether a form is a noun, an adjective, an adverb, or a postposition. This is what makes POS tagging complex for Tamil.
PARTS OF SPEECH IN TAMIL
The following parts of speech or word classes are identified for Tamil by modern grammarians: 1) Noun, 2) Verb, 3) Adjective, 4) Adverb, 5) Postposition, 6) Numeral, 7) Quantifier, 8) Words of conjunction, 9) Exclamatory words, 10) Words expressing feeling, 11) Words of calling, and 13) Words accepting calling.
NOMINAL COMPLEXITY
Nouns need to be annotated as pronoun, proper noun, or common noun. Pronouns need to be further annotated for person (1st, 2nd, and 3rd), number (singular and plural), gender (masculine, feminine, and neuter), and status (honorific and non-honorific). Nouns need to be annotated as rational or irrational. Nouns also need to be annotated for the nominative, accusative, dative, instrumental, sociative, locative, ablative, genitive, and vocative cases. Both nouns and pronouns need to be annotated as oblique or non-oblique forms.
Furthermore, nouns need to be annotated for number and gender (masculine, feminine, and neuter), because subject nouns agree with the PNG (person-number-gender) marker on the finite verbal form. Nominalization adds further complexity. Nominalized verbal forms need to be distinguished into two or three types. For example, the productive forms built with the suffixes tal/kai/aamai, which are sentential in nature, must be differentiated from the non-productive forms built with suffixes such as ppu, which are lexical in nature. For instance, paTittal is a sentential form and paTippu is a lexical form.
Many pronominalized forms are also ambiguous in Tamil and need to be distinguished into two types: lexical and sentential (productive).
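The distinction between sentential and lexical nominalized forms described above can be sketched as a simple suffix check. This is only an illustration under strong assumptions: the function name and labels are hypothetical, the input is assumed to be in the romanization used in this paper, and a bare suffix match will misfire on ordinary nouns that happen to end in the same letters, so a real tagger would need a lexicon and context.

```python
# Suffixes named in the text: tal/kai/aamai are productive (sentential),
# ppu is non-productive (lexical). Labels and function name are illustrative.
PRODUCTIVE_SUFFIXES = ("aamai", "tal", "kai")
LEXICAL_SUFFIXES = ("ppu",)


def nominalization_type(form: str) -> str:
    """Guess the nominalization type of a romanized form from its ending."""
    if form.endswith(PRODUCTIVE_SUFFIXES):
        return "sentential"
    if form.endswith(LEXICAL_SUFFIXES):
        return "lexical"
    return "unknown"


for word in ("paTittal", "paTippu"):
    print(word, nominalization_type(word))
```

The two forms cited in the text come out as sentential and lexical respectively. Anything else is reported as unknown rather than guessed.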
VERBAL COMPLEXITY
The verbal forms are complex in Tamil. A finite verb shows the following morphological structure:
V+Tense+PNG
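As a rough illustration of the V+Tense+PNG template, the sketch below segments a few finite forms of paTi 'read'. The stem list and suffix tables are tiny illustrative assumptions that cover only this one verb; they are not the paper's analysis, and real Tamil tense allomorphy is far richer.

```python
from dataclasses import dataclass
from typing import Optional

# Toy tables (assumptions for illustration only), in the paper's romanization.
STEMS = ("paTi",)
TENSE = {"tt": "PAST", "kkiR": "PRES", "pp": "FUT"}
PNG = {"een": "1S", "aan": "3SM", "aaL": "3SF"}


@dataclass(frozen=True)
class FiniteVerb:
    stem: str
    tense: str
    png: str


def analyze(word: str) -> Optional[FiniteVerb]:
    """Return a V+Tense+PNG analysis, or None if the word does not fit."""
    for stem in STEMS:
        if not word.startswith(stem):
            continue
        rest = word[len(stem):]
        for marker, tense_tag in TENSE.items():
            ending = rest[len(marker):]
            if rest.startswith(marker) and ending in PNG:
                return FiniteVerb(stem, tense_tag, PNG[ending])
    return None


print(analyze("paTittaan"))  # stem + past marker + 3rd singular masculine
print(analyze("paTippu"))    # not a finite verb under these tables
```

Note that paTippu shares the stem and the string pp with the future forms but fails the template, which is the kind of surface overlap that a tagger has to resolve.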
A number of non-finite forms are possible: adverbial forms, adjectival forms, infinitive forms, and nominalized forms.
A distinction needs to be made between a main verb followed by another main verb and a main verb followed by an auxiliary verb. A main verb followed by an auxiliary needs to be interpreted together with it, whereas a main verb followed by a main verb needs to be interpreted separately. This leads to functional ambiguity.
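The two readings of a verb sequence can be sketched as follows. The set of auxiliary-capable lemmas here (viTu, iru, koL) is a small illustrative assumption, not an inventory from this paper, and the function only enumerates candidate readings; choosing between them requires context.

```python
# Lemmas that can act either as main verbs or as auxiliaries.
# Illustrative assumption only; a real system needs a full inventory.
AUX_CAPABLE = {"viTu", "iru", "koL"}


def verb_pair_readings(v1_lemma: str, v2_lemma: str) -> list:
    """List the candidate analyses for a sequence of two verbs."""
    # Two main verbs, interpreted separately, is always a possible reading.
    readings = [("MAIN", "MAIN")]
    # If the second verb can be an auxiliary, the pair may form one unit.
    if v2_lemma in AUX_CAPABLE:
        readings.append(("MAIN", "AUX"))
    return readings


print(verb_pair_readings("paTi", "viTu"))
```

Returning both readings instead of picking one keeps the ambiguity visible to a later disambiguation step.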
FUNCTIONAL AMBIGUITY IN ADVERBIAL FORM
FUNCTIONAL AMBIGUITY IN INFINITIVAL FORM
The adjectival forms differ by tense markings: V+Tense+Adjectivalizer.
Adjectival form allows several interpretations as given in the following examples.
When followed by nouns such as ceyti 'news' and uNmai 'fact', the adjectival forms are ambiguous, as they allow both a relative and a non-relative interpretation.
Some adjectivalized verbal forms are lexicalized as adjectives (as against sentential ones). Such a form is therefore ambiguous between a pure adjective, which modifies only the noun that follows it, and a sentential adjective, which stands as a relative clause modifying the head noun (i.e. the noun that has moved to the position after the relativized verb).
Nominals can function as adjectives modifying a noun as given in the following examples.
Verbal roots can function as adjectives, as in the following examples.
cuTu cooRu (T) 'hot rice'
aazh kiNaRu (T) 'deep well'
A number of adverbial forms of verbs function as postpositions. They are discussed under 'Complexity in Postpositions'.
COMPLEXITY IN ADVERBS
We have seen that a number of adjectival and adverbial forms of verbs are lexicalized as adjectives and adverbs respectively. These lexicalized forms clash semantically with the corresponding sentential adjectival and adverbial forms, creating ambiguity in POS tagging.
Adverbs too need to be distinguished based on their source category. Many adverbs in Tamil are derived by suffixing aaka to nouns, but not all aaka-suffixed forms are adverbial.
Functional clash can be seen between adjective and adverb in aaka suffixed forms. This type of clash is seen among other Dravidian languages too.
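One way to carry this clash through a tagging pipeline is to assign an ambiguity class rather than a single tag. The sketch below is a minimal illustration: the tag labels and function name are hypothetical, and the example word azhakaaka 'beautifully' (azhaku 'beauty' + aaka) is added here for illustration and does not come from the paper.

```python
def candidate_tags(form: str) -> tuple:
    """Return the candidate tags for an aaka-suffixed form.

    An empty tuple means this rule does not apply to the form.
    """
    suffix = "aaka"
    # Require some material before the suffix so that the bare
    # suffix string itself is not treated as a derived form.
    if form.endswith(suffix) and len(form) > len(suffix):
        return ("ADV", "ADJ")
    return ()


print(candidate_tags("azhakaaka"))
```

The rule deliberately does not choose between the two tags, since the text makes clear that the choice depends on function in context.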
COMPLEXITY IN POSTPOSITIONS
Postpositions in Tamil come from various categories, such as verbal, nominal, and adverbial. The dividing line between verb/noun/adverb and postposition is often thin, which leads to ambiguity. Some postpositions are simple and some are compound. Postpositions are conditioned by the case-inflected nouns they follow, so simply tagging a form as a postposition will be misleading. There are also postpositions that occur after both nouns and verbs, which makes them ambiguous (spatial vs. temporal).
Use of adverbial forms of verbs leads to ambiguity in the annotation of postpositions.
CONCLUSION
Tamil is no doubt a morphologically rich language. The relation between a verb and its nominal arguments is decided by case suffixes rather than by position. It is possible to work with a small tagset at the shallow level, but the other unique features have to be addressed at the deep level. A hierarchical tagset is therefore welcome.
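The idea of a hierarchical tagset, with a small shallow level and finer distinctions at deeper levels, can be sketched with dotted tag paths. The tag labels below are hypothetical and are not taken from this paper or from any published tagset; the sketch only shows how deep tags collapse to a shallow tagset.

```python
def coarsen(tag: str, depth: int) -> str:
    """Keep only the first `depth` levels of a dotted hierarchical tag."""
    if depth < 1:
        raise ValueError("depth must be at least 1")
    return ".".join(tag.split(".")[:depth])


# Hypothetical deep-level tags: category, subtype, then a feature.
deep_tags = ["N.COMMON.DAT", "V.NONFIN.ADJP", "V.FIN.PAST", "PSP.VERBAL"]

# Shallow level: top-level category only.
shallow = [coarsen(tag, 1) for tag in deep_tags]
print(shallow)

# Intermediate level: category plus subtype.
print(coarsen("V.NONFIN.ADJP", 2))
```

With this layout, an annotator or tagger can commit to the shallow level when the deeper distinction is unclear and refine the tag later without relabelling from scratch.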
This is only a brief summary.
S. Rajendran, Ph.D.
Department of Linguistics
Tamil University
Thanjavur 613 005
Tamilnadu, India
raj_ushush@yahoo.com