LANGUAGE IN INDIA

Strength for Today and Bright Hope for Tomorrow

Volume 11 : 5 May 2011
ISSN 1930-2940

Managing Editor: M. S. Thirumalai, Ph.D.
Editors: B. Mallikarjun, Ph.D.
         Sam Mohanlal, Ph.D.
         B. A. Sharada, Ph.D.
         A. R. Fatihi, Ph.D.
         Lakhan Gusain, Ph.D.
         Jennifer Marie Bayer, Ph.D.
         S. M. Ravichandran, Ph.D.
         G. Baskaran, Ph.D.
         L. Ramamoorthy, Ph.D.




  • E-mail your articles and book-length reports in Microsoft Word to languageinindiaUSA@gmail.com.
  • Contributors from South Asia may e-mail their articles to
    B. Mallikarjun,
    Central Institute of Indian Languages,
    Manasagangotri,
    Mysore 570006, India
    mallikarjun@ciil.stpmy.soft.net.
  • PLEASE READ THE GUIDELINES GIVEN IN HOME PAGE IMMEDIATELY AFTER THE LIST OF CONTENTS.
  • Your articles and book-length reports should be written following the APA, MLA, LSA, or IJDL Stylesheet.
  • The Editorial Board has the right to accept, reject, or suggest modifications to the articles submitted for publication, and to make suitable stylistic adjustments. High quality, academic integrity, ethics and morals are expected from the authors and discussants.

Copyright © 2010
M. S. Thirumalai



Sentence Boundary Disambiguation in Kannada Texts

Mona Parakh, Rajesha N. and Ramya M.


Abstract

This paper reports work on developing a system that identifies valid sentence boundaries in Kannada texts and fragments the text into sentences. Sentence boundary identification is challenging because periods, question marks and exclamation marks do not always mark a sentence boundary. The paper particularly addresses the problem of disambiguating the period, which in Kannada can mark either a sentence boundary or an abbreviation. The methodology is designed to fragment corpora into sentences without intermediate tools or resources such as named-entity recognition (NER) or an abbreviation list.

I. INTRODUCTION

Sentence boundary disambiguation (SBD) is the important and challenging problem in natural language processing of deciding where sentences begin and end. Natural language processing tools often require their input to be divided into sentences for various purposes, such as building bilingual parallel corpora. “A parallel corpus is a collection of texts in two languages, one of which is the translation equivalent of the other. Although parallel corpora are very useful resources for many natural languages processing applications such as building machine translation systems, multi-lingual dictionaries and word sense disambiguation, they are not yet available for many languages of the world” [2].
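The ambiguity of the period described above can be illustrated with a minimal sketch. It is not the method developed in this paper, which is not reproduced in this excerpt. It is a naive heuristic that assumes a period following a very short token marks an abbreviation or initial rather than a sentence boundary. The function name split_sentences and the threshold MAX_ABBREV_LEN are hypothetical, chosen only for illustration.

```python
import re

# Hypothetical threshold: a token this short before a period is treated as an
# abbreviation or initial. Illustrative assumption, not a value from the paper.
MAX_ABBREV_LEN = 2


def split_sentences(text, max_abbrev_len=MAX_ABBREV_LEN):
    """Split text at '.', '?' or '!' when followed by whitespace or end of text.

    A period is skipped when the token before it is at most max_abbrev_len
    code points long. Length is counted in Unicode code points, which is only
    a rough measure for Kannada, where one written syllable may span several
    code points.
    """
    sentences = []
    start = 0
    for match in re.finditer(r"[.?!](?=\s|$)", text):
        if match.group() == ".":
            tokens = text[start:match.start()].split()
            last_token = tokens[-1] if tokens else ""
            if len(last_token) <= max_abbrev_len:
                continue  # likely an abbreviation, not a sentence boundary
        sentence = text[start:match.end()].strip()
        if sentence:
            sentences.append(sentence)
        start = match.end()
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)  # trailing text without a final marker
    return sentences


if __name__ == "__main__":
    for s in split_sentences("ಡಾ. ರಾವ್ ಬಂದರು. ಅವರು ಹೋದರು."):
        print(s)
```

A length threshold of this kind needs no abbreviation list, but it wrongly joins sentences that end in a very short word and wrongly splits after longer abbreviations, which is why the period requires a more careful disambiguation strategy.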


This is only the beginning part of the article. The complete article is available in the printer-friendly version.


Mona Parakh
Reader-Research Officer
ldc-monaparakh@ciil.stpmy.soft.net

Rajesha N.
Senior Technical Officer
ldc-rajesha@ciil.stpmy.soft.net

Ramya M.
Senior Technical Officer
ldc-ramya@ciil.stpmy.soft.net

Linguistic Data Consortium for Indian Languages
Central Institute of Indian Languages
Mysore 570 006
Karnataka
India

  • Please ensure that your name, academic degrees, institutional affiliation, institutional address and e-mail address are all given on the first page of your article. Also include a declaration that the article or work you are submitting for publication in LANGUAGE IN INDIA is your original work and that you have duly acknowledged the works of others that you cited or used in writing it. Remember that by maintaining academic integrity we not only do the right thing but also help the growth, development and recognition of Indian scholarship.