LANGUAGE IN INDIA

Strength for Today and Bright Hope for Tomorrow

Volume 11 : 9 September 2011
ISSN 1930-2940

Managing Editor: M. S. Thirumalai, Ph.D.
Editors: B. Mallikarjun, Ph.D.
         Sam Mohanlal, Ph.D.
         B. A. Sharada, Ph.D.
         A. R. Fatihi, Ph.D.
         Lakhan Gusain, Ph.D.
         Jennifer Marie Bayer, Ph.D.
         S. M. Ravichandran, Ph.D.
         G. Baskaran, Ph.D.
         L. Ramamoorthy, Ph.D.


  • E-mail your articles and book-length reports in Microsoft Word to languageinindiaUSA@gmail.com.
  • Contributors from South Asia may e-mail their articles to
    B. Mallikarjun,
    Central Institute of Indian Languages,
    Manasagangotri,
    Mysore 570006, India
    mallikarjun@ciil.stpmy.soft.net.
  • PLEASE READ THE GUIDELINES GIVEN IN HOME PAGE IMMEDIATELY AFTER THE LIST OF CONTENTS.
  • Your articles and book-length reports should be written following the APA, MLA, LSA, or IJDL Stylesheet.
  • The Editorial Board has the right to accept, reject, or suggest modifications to the articles submitted for publication, and to make suitable stylistic adjustments. High quality, academic integrity, ethics and morals are expected from the authors and discussants.

Copyright © 2010
M. S. Thirumalai


A Hybrid POS Tagger for Indian Languages

M. Mohamed Yoonus, M.Sc., M.Phil., PGDNLP and Samar Sinha, M.A., M.Phil.


Abstract

This paper describes the development of a Part-of-Speech (POS) tagger for 12 Indian languages using a hybrid approach, and presents the performance of the tagger for each language. Unlike most previous POS taggers for Indian languages, which are designed to annotate only a few languages, the present tagger, called 'POS Tagger', is an attempt to facilitate annotation of several Indian languages following a computational approach. The POS Tagger is trained on 80K to 85K of tagged corpus data for each language, drawn from the LDC-IL corpus. Finally, this paper highlights the performance of the tagger and the need for language-specific resources to obtain optimal results.

1 Introduction

The basic objective of Natural Language Processing (henceforth, NLP) is to facilitate human-machine interaction by means of natural human language. Research in NLP has focused on various intermediate tasks that make partial sense of language structure without requiring complete understanding, and these tasks, in turn, contribute to the development of a successful system. Part-of-Speech (henceforth, POS) tagging is one such process, in which a grammatical category is assigned to each token in its context from a given set of tags called a POS tagset. It serves a wide range of applications, such as speech synthesis and recognition, information extraction, partial parsing, machine translation, lexicography, Word Sense Disambiguation (WSD), and question answering.
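To make the idea of assigning a tag to each token concrete, here is a minimal sketch of a most-frequent-tag baseline in Python. It is not the tagger described in this paper: the toy English corpus, the tag names (DET, NOUN, VERB, PRON), and the function names are all hypothetical and chosen only for illustration. This baseline also ignores context, which is exactly the limitation that fuller taggers address.

```python
from collections import Counter, defaultdict

# Toy tagged corpus (hypothetical; not drawn from the LDC-IL corpus).
TAGGED = [
    [("the", "DET"), ("dog", "NOUN"), ("runs", "VERB")],
    [("the", "DET"), ("run", "NOUN"), ("ends", "VERB")],
    [("dogs", "NOUN"), ("run", "VERB")],
    [("they", "PRON"), ("run", "VERB")],
]


def train_unigram(sentences):
    """Map each token to the tag it carries most often in the training data."""
    counts = defaultdict(Counter)
    for sent in sentences:
        for token, tag in sent:
            counts[token.lower()][tag] += 1
    return {tok: c.most_common(1)[0][0] for tok, c in counts.items()}


def tag(tokens, model, default="NOUN"):
    """Tag each token by lookup; unseen tokens get the default tag."""
    return [(t, model.get(t.lower(), default)) for t in tokens]


model = train_unigram(TAGGED)
result = tag(["the", "dogs", "run"], model)
print(result)
```

In the toy data, "run" occurs once as a noun and twice as a verb, so the baseline always tags it as a verb, even after "the". Resolving such cases requires looking at the surrounding tokens.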

Various automatic POS taggers have been developed worldwide using linguistic rules, stochastic models, and hybrid approaches, but each approach has its own merits and demerits. Indian languages present a further challenge for automatic POS tagging because they are highly inflectional and morphologically rich. Hence, text processing needs to be considered prior to POS tagging in order to achieve high performance and greater reliability, and to incorporate most Indian languages into a single framework of POS annotation.
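As a rough illustration of how a hybrid approach can combine statistics with linguistic rules, the Python sketch below pairs a frequency-based lexicon with suffix rules for unseen words and one contextual correction rule. It is an assumption-laden toy and not the authors' implementation: the English training data, the suffix patterns, the tag names, and the functions `train`, `guess_by_suffix`, and `hybrid_tag` are all hypothetical. For morphologically rich Indian languages, the suffix rules would have to come from language-specific resources.

```python
import re
from collections import Counter, defaultdict

# Toy training data (hypothetical; English stands in for any language).
TRAIN = [
    [("the", "DET"), ("children", "NOUN"), ("played", "VERB")],
    [("she", "PRON"), ("sings", "VERB"), ("loudly", "ADV")],
    [("the", "DET"), ("play", "NOUN"), ("ended", "VERB")],
    [("they", "PRON"), ("play", "VERB")],
    [("we", "PRON"), ("play", "VERB")],
]

# Hand-written suffix rules for unseen words, tried in order (illustrative).
SUFFIX_RULES = [
    (re.compile(r".+ly$"), "ADV"),
    (re.compile(r".+(ed|ing)$"), "VERB"),
    (re.compile(r".+s$"), "NOUN"),
]


def train(sentences):
    """Stochastic part: most frequent tag per token in the training data."""
    counts = defaultdict(Counter)
    for sent in sentences:
        for tok, tag in sent:
            counts[tok.lower()][tag] += 1
    return {tok: c.most_common(1)[0][0] for tok, c in counts.items()}


def guess_by_suffix(token, default="NOUN"):
    """Rule part: guess a tag for an unseen token from its suffix."""
    for pattern, tag in SUFFIX_RULES:
        if pattern.match(token):
            return tag
    return default


def hybrid_tag(tokens, lexicon):
    """Use the lexicon first, suffix rules second, then a contextual rule."""
    tags = []
    for tok in tokens:
        low = tok.lower()
        tags.append(lexicon[low] if low in lexicon else guess_by_suffix(low))
    # Contextual rule: a word tagged VERB right after a determiner is a NOUN.
    for i in range(1, len(tags)):
        if tags[i - 1] == "DET" and tags[i] == "VERB":
            tags[i] = "NOUN"
    return list(zip(tokens, tags))


lexicon = train(TRAIN)
result = hybrid_tag(["the", "play", "started", "quickly"], lexicon)
print(result)
```

Here the lexicon alone would tag "play" as a verb, and the contextual rule corrects it to a noun after "the". The unseen words "started" and "quickly" are handled by the suffix rules.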


This is only the beginning part of the article.


M. Mohamed Yoonus, M.Sc., M.Phil., PGDNLP
Lecturer cum Resource Person
LDC-IL Project
Central Institute of Indian Languages
Mysore 570006
Karnataka, India
yoonussoft@gmail.com

Samar Sinha, M.A., M.Phil.
Senior Lecturer cum Junior Research Officer
LDC-IL Project
Central Institute of Indian Languages
Mysore 570006
Karnataka, India
samarsinha@gmail.com
