LANGUAGE IN INDIA

Strength for Today and Bright Hope for Tomorrow

Volume 18:1 January 2018
ISSN 1930-2940

Managing Editor: M. S. Thirumalai, Ph.D.
Editors: B. Mallikarjun, Ph.D.
         Sam Mohanlal, Ph.D.
         B. A. Sharada, Ph.D.
         A. R. Fatihi, Ph.D.
         Lakhan Gusain, Ph.D.
         Jennifer Marie Bayer, Ph.D.
         G. Baskaran, Ph.D.
         L. Ramamoorthy, Ph.D.
         C. Subburaman, Ph.D. (Economics)
         N. Nadaraja Pillai, Ph.D.
         Renuga Devi, Ph.D.
         Soibam Rebika Devi, M.Sc., Ph.D.
         Dr. S. Chelliah, Ph.D.
Assistant Managing Editor: Swarna Thirumalai, M.A.

Language in India www.languageinindia.com is included in the UGC Approved List of Journals. Serial Number 49042.




  • E-mail your articles and book-length reports in Microsoft Word to languageinindiaUSA@gmail.com.
  • PLEASE READ THE GUIDELINES GIVEN IN HOME PAGE IMMEDIATELY AFTER THE LIST OF CONTENTS.
  • Your articles and book-length reports should be written following the APA, MLA, LSA, or IJDL Stylesheet.
  • The Editorial Board has the right to accept, reject, or suggest modifications to the articles submitted for publication, and to make suitable stylistic adjustments. High quality, academic integrity, ethics and morals are expected from the authors and discussants.

Copyright © 2016
M. S. Thirumalai

Publisher: M. S. Thirumalai, Ph.D.
11249 Oregon Circle
Bloomington, MN 55438
USA



Developing POS Tagset for Dogri

Sunil Kumar, M.A., M.Phil., B.Ed.
Central Institute of Indian Languages


Abstract

Annotated text corpora are an important resource for advances in Natural Language Processing (NLP) research and for developing language technologies. Corpora are annotated with a set of tags that mark the linguistic properties of a word, sentence or discourse. In corpus linguistics, part-of-speech (POS) tagging is also called grammatical tagging or word-category disambiguation. It is the process of marking each word in a text or corpus with a particular part of speech, based on both its definition and its context, i.e. its relationship with adjacent and related words in a phrase, sentence or paragraph. Corpora annotated with various kinds of linguistic information are a precious resource for language technologies, but building them takes a large amount of effort and time. It is therefore important to create corpora that, once built, can be used for various purposes. The significance of large annotated corpora for applications such as Machine Translation, Information Retrieval, speech recognition and related areas is now widely recognized. This paper attempts to provide a structure for a POS tagset for Dogri, one of the languages of the Indo-Aryan family.

Keywords: Corpora, Dogri, Part-of-Speech (POS), Tagging, Tagset.
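
The idea of tagging a word by both its definition and its context can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy English lexicon, the tag labels (DET, NOUN, VERB, PART, PRON, UNK) and the two context rules are hypothetical and are not the Dogri tagset proposed in the paper.

```python
# Hypothetical toy lexicon: the tags each word can take "by definition".
LEXICON = {
    "i": ["PRON"],
    "the": ["DET"],
    "a": ["DET"],
    "to": ["PART"],
    "want": ["VERB"],
    "read": ["VERB"],
    "flight": ["NOUN"],
    "book": ["NOUN", "VERB"],  # ambiguous without context
}


def tag(tokens):
    """Assign one tag per token using the lexicon plus the previous tag."""
    tagged = []
    prev = None
    for tok in tokens:
        candidates = LEXICON.get(tok.lower(), ["UNK"])
        if len(candidates) == 1:
            choice = candidates[0]            # definition alone decides
        elif prev == "DET" and "NOUN" in candidates:
            choice = "NOUN"                   # context: after a determiner
        elif prev in ("PART", "PRON") and "VERB" in candidates:
            choice = "VERB"                   # context: after "to" or a pronoun
        else:
            choice = candidates[0]            # fall back to the first listed tag
        tagged.append((tok, choice))
        prev = choice
    return tagged


print(tag("I read the book".split()))
print(tag("I want to book a flight".split()))
```

In this sketch the word "book" receives a different tag in each sentence because the preceding tag differs, which is the disambiguation step the abstract describes.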

Dogri Language

Dogri is a modern Indo-Aryan language spoken primarily in the state of Jammu and Kashmir and the adjoining areas of Himachal Pradesh and Punjab, and across the border in the Sialkot and Shakargarh tehsils, now in Pakistan. Because the language data of the Census of India 2011 are not yet available, the figure used here comes from the Census of India 2001, which records 22,82,589 Dogri speakers. Along with Punjabi, Dogri is one of the modern Indo-Aryan languages that have developed tonal contrasts. It has three tones: low / ? /, mid / - / and high / ’ /. Dogri is a morphologically rich language whose predominant word order is Subject-Object-Verb (SOV), with the flexibility to rearrange constituents that many Indian languages allow. Nouns are generally inflected for number, gender and case. There are two numbers (singular and plural), two genders (masculine and feminine) and three cases (simple, oblique and vocative). The oblique forms occur when a noun or noun phrase is followed by a postposition. Nouns are inflected according to their gender and their word-final sound.
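
The rule that a noun takes its oblique form when followed by a postposition can be sketched as a simple pass over category-tagged tokens. This is an illustrative sketch under stated assumptions: the category labels "N", "PSP" and "V" and the placeholder tokens are hypothetical, only immediate adjacency is checked (not whole noun phrases), and the vocative case is not handled.

```python
def mark_case(tagged):
    """Add a case value to each (token, category) pair.

    Nouns ("N") directly followed by a postposition ("PSP") are marked
    "oblique"; other nouns are marked "simple"; non-nouns get None.
    """
    result = []
    for i, (token, category) in enumerate(tagged):
        if category != "N":
            result.append((token, category, None))
            continue
        next_category = tagged[i + 1][1] if i + 1 < len(tagged) else None
        case = "oblique" if next_category == "PSP" else "simple"
        result.append((token, category, case))
    return result


# Placeholder tokens w1..w4 stand in for real words.
sentence = [("w1", "N"), ("w2", "PSP"), ("w3", "N"), ("w4", "V")]
print(mark_case(sentence))
```

A fuller treatment would also carry the number and gender features mentioned above, since the inflected form depends on them and on the word-final sound.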


This is only the beginning of the article; the entire article is available in the journal's printer-friendly version.



Sunil Kumar, M.A., M.Phil., B.Ed.
Senior Resource Person (Academic)
National Translation Mission
Central Institute of Indian Languages
Mysore 570006
Karnataka
India
sk07choudhary@gmail.com

