LANGUAGE IN INDIA

Strength for Today and Bright Hope for Tomorrow

Volume 11 : 5 May 2011
ISSN 1930-2940

Managing Editor: M. S. Thirumalai, Ph.D.
Editors: B. Mallikarjun, Ph.D.
         Sam Mohanlal, Ph.D.
         B. A. Sharada, Ph.D.
         A. R. Fatihi, Ph.D.
         Lakhan Gusain, Ph.D.
         Jennifer Marie Bayer, Ph.D.
         S. M. Ravichandran, Ph.D.
         G. Baskaran, Ph.D.
         L. Ramamoorthy, Ph.D.





Copyright © 2010
M. S. Thirumalai



ADVANCEMENT OF CLINICAL STEMMER

Pramod Premdas Sukhadeve and Dr. Sanjay Kumar Dwivedi


Abstract

Word stemming is a common form of language processing in most Information Retrieval (IR) systems, and an important feature of present-day indexing and search systems. The idea is to improve retrieval by handling word endings automatically, reducing words to their roots at indexing and search time. Stemming is usually done by removing any attached suffixes and prefixes from index terms before the term is assigned. Since the stem of a term represents a broader concept than the original term, stemming ultimately increases the number of retrieved documents. Processing texts from the medical domain is an important task for natural language processing. This paper investigates the usefulness of a large medical database for the translation of medical documents using a rule-based machine translation system. We demonstrate the extraction of affixes from words.

Keywords: Stemming, Information Retrieval, Suffix, Prefix, Natural Language Processing.

Introduction

Stemming is the procedure of finding the root of a word by stripping away the affixes attached to it. In many languages, words are often formed by attaching affixes to existing words or roots. Stemming is a widespread form of language processing in most information retrieval systems [1]. It is similar to the morphological processing used in natural language processing, but has somewhat different aims. In an information retrieval system, stemming is used to reduce different word forms to common roots, and thereby improve the ability of the system to match query and document vocabulary. In clinical language it also helps to handle clinical terms, names of diseases, and patient symptoms. Although stemming has been studied mainly for English, there is evidence that it is useful for a number of languages. Stemming in English is usually done during document indexing by removing word endings or suffixes, using tables of common endings and heuristics about when it is appropriate to remove them. Using a stemmer thus increases the number of documents retrieved when translating clinical data. Also, since many terms are mapped to one, stemming decreases the size of the index files in the information retrieval system. Many stemming algorithms have been proposed, and there have been many experimental evaluations of them. However, very little work on stemming has been reported for clinical language. This paper investigates the usefulness of a large medical database for the translation of documents, and we present a stemmer for clinical language. The stemmer conflates terms by stripping off word endings that appear in a suffix list maintained in a database.
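The suffix-stripping idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the suffix list, the minimum stem length, and the names stem and build_index are assumptions made for illustration, and an in-memory list stands in for the suffix list that the article keeps in a database.

```python
# Hypothetical suffix list (an assumption); the article stores its list in a database.
SUFFIXES = ["ectomy", "itis", "osis", "ing", "es", "s"]

# Assumed heuristic: never strip a suffix if fewer than this many letters remain.
MIN_STEM_LEN = 3


def stem(word, suffixes=SUFFIXES, min_stem_len=MIN_STEM_LEN):
    """Strip the longest matching suffix from a word, if enough of the word remains."""
    word = word.lower()
    # Try longer suffixes first so that "es" is preferred over "s", and so on.
    for suffix in sorted(suffixes, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem_len:
            return word[: -len(suffix)]
    return word


def build_index(docs):
    """Map each stem to the set of document ids that contain a word with that stem."""
    index = {}
    for doc_id, text in docs.items():
        for token in text.split():
            index.setdefault(stem(token), set()).add(doc_id)
    return index


if __name__ == "__main__":
    docs = {1: "patient fevers", 2: "fever reported"}
    index = build_index(docs)
    # The query term is stemmed the same way as the indexed terms.
    print(index.get(stem("fevers"), set()))
```

Because "fevers" and "fever" are conflated to one stem, a query for either form retrieves both documents, and the index holds one entry where it would otherwise hold two.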


This is only the beginning of the article; the full text is available in the printer-friendly version.


Pramod Premdas Sukhadeve
Department of Computer Science
Babasaheb Bhimrao Ambedkar University (A Central University)
Lucknow
Uttar Pradesh, India
Sukhadeve.pramod@gmail.com

Dr. Sanjay Kumar Dwivedi
Department of Computer Science
Babasaheb Bhimrao Ambedkar University (A Central University)
Vidya Vihar Raebareli Road
Lucknow
Uttar Pradesh, India
Skd2000@yahoo.com


