LANGUAGE IN INDIA

Strength for Today and Bright Hope for Tomorrow

Volume 25:11 November 2025
ISSN 1930-2940

Editors:
         Selvi M. Bunce, M.A., Ph.D. Candidate
         Nathan Mulder Bunce, M.A., Ph.D. Candidate
         Sam Mohanlal, Ph.D.
         B. Mallikarjun, Ph.D.
         A. R. Fatihi, Ph.D.
         G. Baskaran, Ph.D.
         T. Deivasigamani, Ph.D.
         Pammi Pavan Kumar, Ph.D.
         Soibam Rebika Devi, M.Sc., Ph.D.

Honorary Managing Editor & Publisher: M. S. Thirumalai, Ph.D.

Copyright © 2025
M. S. Thirumalai

Publisher: M. S. Thirumalai, Ph.D.
11249 Oregon Circle
Bloomington, MN 55438
USA



Evaluating IndicNER for Telugu: Entity-Level Performance and Error Analysis

Vanka Ramesh


Abstract

This paper presents a comprehensive evaluation of IndicNER, a multilingual transformer-based Named Entity Recognition (NER) system developed by AI4Bharat, on Telugu-language data. A dataset of 100 headlines, collected manually from the Andhra Jyothi newspaper, was annotated for three entity types: Person (PER), Location (LOC), and Organization (ORG). The IndicNER model's output was benchmarked against human annotations using the standard metrics of Precision, Recall, and F1-score. While the system achieves satisfactory results for frequently occurring named entities, its performance drops significantly on morphologically rich expressions, compound names, and regional or domain-specific entities. Detailed error analysis reveals systematic challenges, including the misclassification of honorifics and code-mixed tokens and the inconsistent tagging of multi-word entities. The findings underscore the need for domain-adapted NER models and more representative training corpora for Telugu and other low-resource Indian languages.

Keywords: Natural Language Processing, Named Entity Recognition, IndicNER, Telugu

Introduction

Named Entity Recognition (NER) is a foundational task in Natural Language Processing (NLP) that involves identifying and classifying entities, such as persons, locations, and organizations, within text. Over the past two decades, NER systems for high-resource languages like English have achieved remarkable accuracy, owing to the availability of large annotated corpora and advanced deep learning architectures. However, for an Indian language like Telugu, which is spoken by over 80 million people, primarily in southern India, the development and evaluation of robust NER systems remain a significant challenge.

Dravidian languages, including Telugu, present unique linguistic complexities for NER systems, such as agglutinative morphology, rich inflectional forms, compound nouns, and frequent use of honorifics. Furthermore, media language, which takes some grammatical liberties and relies on short forms, is often characterized by code-mixing, domain-specific jargon, and abbreviations, making entity recognition more difficult. Despite recent advancements in multilingual models, the performance of these systems on Telugu data is underexplored and largely unvalidated.

IndicNER, a transformer-based NER model developed at IIT Madras as part of the AI4Bharat initiative, represents one of the first major efforts to bring pre-trained NER capabilities to Indian languages. While IndicNER has demonstrated promising results across several languages in the Indic NLP landscape, its performance on domain-specific and informal Telugu text, such as that found in newspaper headlines, has not been rigorously benchmarked or analyzed.
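
The comparison of model output against human annotations using Precision, Recall, and F1-score, as described in the abstract, can be illustrated with a short sketch. This is not the paper's evaluation script. It assumes BIO-style tags (B-PER, I-PER, O, and so on) and exact-match scoring of entity spans, neither of which is confirmed by this excerpt. The function names and the toy tag sequences are hypothetical.

```python
from collections import defaultdict


def extract_entities(tags):
    """Convert one BIO tag sequence into a set of (type, start, end) spans.

    Assumption: tags look like "B-PER", "I-PER", "O". The end index is exclusive.
    """
    entities = set()
    start, etype = None, None
    # A trailing "O" sentinel closes any entity still open at the end.
    for i, tag in enumerate(list(tags) + ["O"]):
        continues = tag.startswith("I-") and etype == tag[2:]
        if not continues:
            if etype is not None:
                entities.add((etype, start, i))
                start, etype = None, None
            if tag.startswith("B-") or tag.startswith("I-"):
                # A stray I- tag is treated as the start of a new entity.
                start, etype = i, tag[2:]
    return entities


def score(gold_sents, pred_sents):
    """Per-type precision, recall, and F1 under exact span matching."""
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "fn": 0})
    for gold_tags, pred_tags in zip(gold_sents, pred_sents):
        gold = extract_entities(gold_tags)
        pred = extract_entities(pred_tags)
        for etype, _, _ in gold & pred:
            counts[etype]["tp"] += 1
        for etype, _, _ in pred - gold:
            counts[etype]["fp"] += 1
        for etype, _, _ in gold - pred:
            counts[etype]["fn"] += 1

    results = {}
    for etype, c in counts.items():
        p = c["tp"] / (c["tp"] + c["fp"]) if c["tp"] + c["fp"] else 0.0
        r = c["tp"] / (c["tp"] + c["fn"]) if c["tp"] + c["fn"] else 0.0
        f1 = 2 * p * r / (p + r) if p + r else 0.0
        results[etype] = {"precision": p, "recall": r, "f1": f1}
    return results


# Toy data (hypothetical): two "headlines" as tag sequences.
gold = [["B-PER", "I-PER", "O", "B-LOC"], ["B-ORG", "O", "B-LOC"]]
pred = [["B-PER", "O", "O", "B-LOC"], ["B-ORG", "O", "O"]]

results = score(gold, pred)
for etype in sorted(results):
    print(etype, results[etype])
```

In the toy data, the predicted person name covers only the first token of a two-token name. Exact matching therefore counts it as both a false positive and a false negative. This is one way the inconsistent tagging of multi-word entities can lower entity-level scores even when most tokens are tagged correctly.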


This is only the beginning of the article; the complete article is available in the printer-friendly version.


Vanka Ramesh
Research Scholar
University of Hyderabad
vankaramesh2001@gmail.com

