LANGUAGE IN INDIA

Strength for Today and Bright Hope for Tomorrow

Volume 20:8 August 2020
ISSN 1930-2940

Editors:
         Sam Mohanlal, Ph.D.
         B. Mallikarjun, Ph.D.
         A. R. Fatihi, Ph.D.
         G. Baskaran, Ph.D.
         T. Deivasigamani, Ph.D.
         Pammi Pavan Kumar, Ph.D.
         Soibam Rebika Devi, M.Sc., Ph.D.

Managing Editor & Publisher: M. S. Thirumalai, Ph.D.

Celebrate India!
Unity in Diversity!!


Copyright © 2020
M. S. Thirumalai

Publisher: M. S. Thirumalai, Ph.D.
11249 Oregon Circle
Bloomington, MN 55438
USA



Generating a Parallel Corpus Stream for Odia: Mining Parallel Corpus
from Odia Twitter

Anjan Kumar Panda, MSC IT, KSOU Mysore and
Dr Arun Kumar Malik, PhD


Introduction

A corpus is a fundamental need for natural language processing applications.

A parallel corpus is a foundational need for languages like Odia (Oriya - The Unicode Standard, Version 13.0. https://unicode.org/charts/PDF/U0B00.pdf. Accessed 8 Aug. 2020). It would enable advances in natural language processing such as machine translation (Machine translation - Wikipedia. https://en.wikipedia.org/wiki/Machine_translation. Accessed 8 Aug. 2020), computational language modelling (Language model - Wikipedia. https://en.wikipedia.org/wiki/Language_model. Accessed 8 Aug. 2020), question answering systems, and generative systems (Better Language Models and Their Implications - OpenAI. 14 Feb. 2019, https://openai.com/blog/better-language-models/. Accessed 8 Aug. 2020).

For a neural model to learn actively, a stream of training data is needed.

NLP tasks built on deep-learning neural architectures need large amounts of training data.

Machine translation tasks need millions of parallel sentence pairs, collectively known as a parallel corpus, for training.

This paper describes a way to mine a parallel corpus stream from social media for use by machine learning-based natural language processing systems.
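To make the idea of a parallel pair concrete, the following is a minimal sketch in Python, not the authors' method. It assumes candidate text pairs have already been collected from some source, and it keeps only those pairs where one side is mostly in the Odia script (the Unicode block U+0B00 to U+0B7F from the chart cited above) and the other is mostly in Latin letters. The names ParallelPair, script_ratio, and mine_pairs, the 0.5 threshold, and the sample strings are all hypothetical and chosen for illustration only.

```python
from typing import Iterable, Iterator, NamedTuple, Tuple

# Oriya (Odia) block in the Unicode Standard: U+0B00..U+0B7F.
ODIA_RANGE = (0x0B00, 0x0B7F)
# Basic Latin letters fall inside U+0041..U+007A; non-letters are
# filtered out by str.isalpha() before the range check.
LATIN_RANGE = (0x0041, 0x007A)


class ParallelPair(NamedTuple):
    """One aligned pair of the (hypothetical) parallel corpus."""
    odia: str
    english: str


def script_ratio(text: str, lo: int, hi: int) -> float:
    """Fraction of alphabetic characters whose code point is in [lo, hi]."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    return sum(lo <= ord(ch) <= hi for ch in letters) / len(letters)


def is_odia(text: str, threshold: float = 0.5) -> bool:
    return script_ratio(text, *ODIA_RANGE) >= threshold


def is_latin(text: str, threshold: float = 0.5) -> bool:
    return script_ratio(text, *LATIN_RANGE) >= threshold


def mine_pairs(candidates: Iterable[Tuple[str, str]]) -> Iterator[ParallelPair]:
    """Lazily yield pairs with one Odia-script side and one Latin-script side.

    Because this is a generator, it can consume an unbounded stream of
    candidates and emit pairs as they arrive.
    """
    for a, b in candidates:
        if is_odia(a) and is_latin(b):
            yield ParallelPair(odia=a.strip(), english=b.strip())
        elif is_latin(a) and is_odia(b):
            yield ParallelPair(odia=b.strip(), english=a.strip())


# Toy candidates, assumed for illustration; real input would come from a
# social media source.
sample = [
    ("ଓଡ଼ିଆ ଭାଷା", "Odia language"),
    ("hello", "world"),
    ("Good morning", "ଶୁଭ ସକାଳ"),
]
pairs = list(mine_pairs(sample))
```

A script check like this only filters by writing system. It does not verify that the two sides are translations of each other, so a real pipeline would need an additional alignment or quality step.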


This is only the beginning of the article. The entire article is available in a printer-friendly version.


Anjan Kumar Panda, MSC IT, KSOU Mysore
Internet Application Specialist, Technology Manager
Life Member, OSA. The Odisha Society of the Americas
5050, Hacienda Drive, Apt 2232, Dublin, CA, 94568
panda.anjankumar@gmail.com
Contact: 1-845-535-0961

Dr Arun Kumar Malik, PhD, Assistant Professor of Political Science
Gujarat National Law University, Gandhinagar
amalik@gnlu.ac.in
Contact No. 8128650850
