Copyright © 2010 M. S. Thirumalai
Development of Punjabi-Hindi Aligned Parallel Corpus from Web Using Machine Translation
Gurpreet Singh Josan, Ph.D.
Jagroop Kaur, M.Tech.
Abstract
An aligned parallel corpus plays a vital role in research on various automatic NLP tasks. The World Wide Web is a constantly growing resource for collecting parallel text. This paper discusses a novel approach for collecting parallel text for the language pair Punjabi-Hindi. We use machine translation and the DOM (Document Object Model) to find parallel text on the internet. The collected text is heterogeneous in nature and is aligned at the word level with high precision. The approach discussed in this paper guarantees high-quality parallel data in a short time span.
1. Introduction
Recent advances in natural language processing are largely based on statistical approaches. A parallel corpus plays a vital role in such approaches, as it allows empirical studies for various NLP applications such as language studies, machine translation, cross-language information retrieval, and bilingual lexicon development. A parallel corpus is a collection of original texts translated into another language, in which the texts, paragraphs, and sentences, down to the word level, are typically linked to each other.
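The definition above can be made concrete with a small data structure. The following is a minimal sketch, not the representation used in this paper: the class names `AlignedSentence` and `ParallelDocument` are hypothetical, and the romanized Punjabi and Hindi tokens are an illustrative example chosen for this sketch, not data from the collected corpus.

```python
from dataclasses import dataclass, field


@dataclass
class AlignedSentence:
    """One sentence pair with word-level links (hypothetical structure)."""
    source: list                                # tokens of the source sentence
    target: list                                # tokens of the target sentence
    links: list = field(default_factory=list)   # (source index, target index) pairs

    def aligned_words(self):
        """Return the linked word pairs as (source word, target word) tuples."""
        return [(self.source[i], self.target[j]) for i, j in self.links]


@dataclass
class ParallelDocument:
    """A document pair: a list of aligned sentences plus language codes."""
    source_lang: str
    target_lang: str
    sentences: list = field(default_factory=list)


# Illustrative romanized example ("I go home"); not taken from the paper's data.
pair = AlignedSentence(
    source="main ghar janda haan".split(),   # Punjabi
    target="main ghar jata hoon".split(),    # Hindi
    links=[(0, 0), (1, 1), (2, 2), (3, 3)],
)
doc = ParallelDocument(source_lang="pa", target_lang="hi", sentences=[pair])

for src_word, tgt_word in doc.sentences[0].aligned_words():
    print(src_word, "<->", tgt_word)
```

Storing links as index pairs rather than word pairs keeps the alignment unambiguous when a word occurs more than once in a sentence.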
Multilingual parallel corpora exist, such as Europarl, the Bible, and OPUS, as do bilingual parallel corpora such as the IJS-ELAN Slovene-English, English-Chinese, and English-Norwegian corpora. English enjoys a privileged position in the creation of parallel corpora: most of the time, it is one of the two languages in the pair. The size of the available corpora is also limited. Another constraint is the limited domain: most existing corpora are developed from either government documents or newswire texts. Parallel corpora for language pairs that exclude English are scarce, particularly among Indian languages. This scarcity is a major barrier to the development of NLP applications involving Indian languages.
The World Wide Web is a constantly evolving source of parallel text. Electronically accessible information on the web is increasing day by day. Web mining therefore seems promising for building parallel corpora for under-privileged and minority languages. Collecting a parallel corpus from the internet, particularly for resource-starved languages, is among the challenging problems in NLP. It is far from trivial, because the sheer size of the network makes the process very labor intensive. Besides, useful documents and high-quality translations are likely to be mixed up with garbage.
Researchers have therefore designed several systems to automate this construction process. This idea led to the development of software for automatically discovering parallel text on the World Wide Web, such as BITS (Xiaoyi and Liberman, 1999), PTMiner (Chen and Nie, 2000), and STRAND (Resnik, 1998; Resnik and Smith, 2003). This paper describes a technique for the automatic generation of parallel corpora for Punjabi and Hindi. We try to use the best available techniques and supplement them with additional resources. We show why the existing systems are not suitable for our work, and then describe how a machine translation system helps in identifying and then aligning the parallel corpus obtained from the web.
2. Existing systems
Resnik (1998) proposed a simple method based on the anchor tag. A simple query is posted to AltaVista to locate pages that point to a pair of pages and whose anchor texts indicate the language of each parallel text. This is the case, for example, for an Index.html file that contains pointers to two parallel texts anchored as "English version" and "French version". However, this simple method catches only a small part of all parallel pages, because many parallel pages do not satisfy this condition.
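The anchor-text check described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Resnik's implementation: it skips the search-engine query and inspects a single page that has already been fetched. The names `LANGUAGE_LABELS`, `AnchorCollector`, `find_parallel_links`, and `is_parent_page`, as well as the sample page and file names, are assumptions made for this sketch.

```python
from html.parser import HTMLParser

# Hypothetical anchor labels; a real system would need a much larger list.
LANGUAGE_LABELS = {
    "english version": "en",
    "french version": "fr",
}


class AnchorCollector(HTMLParser):
    """Collects (href, anchor text) pairs from an HTML page."""

    def __init__(self):
        super().__init__()
        self.anchors = []
        self._href = None   # href of the <a> element currently open, if any
        self._text = []     # text fragments seen inside that element

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.anchors.append((self._href, "".join(self._text).strip()))
            self._href = None


def find_parallel_links(html):
    """Return {language code: href} for anchors whose text names a language version."""
    collector = AnchorCollector()
    collector.feed(html)
    collector.close()
    found = {}
    for href, text in collector.anchors:
        lang = LANGUAGE_LABELS.get(text.lower())
        if lang is not None:
            found.setdefault(lang, href)   # keep the first link per language
    return found


def is_parent_page(html):
    """A page is a candidate parent if it links to at least two language versions."""
    return len(find_parallel_links(html)) >= 2


# Made-up index page in the spirit of the "English version" / "French version" example.
INDEX_HTML = """
<html><body>
<a href="doc_en.html">English version</a>
<a href="doc_fr.html">French version</a>
<a href="contact.html">Contact</a>
</body></html>
"""

print(find_parallel_links(INDEX_HTML))
print(is_parent_page(INDEX_HTML))
```

The sketch also makes the limitation visible: a page pair is found only if some parent page labels both links with text from the fixed label list, so parallel pages linked in any other way are missed.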
This is only the beginning part of the article.
Gurpreet Singh Josan, Ph.D.
Rayat & Bahra Institute of Engineering & Biotechnology
Sahauran
Mohali
Punjab, India
josangurpreet@rediffmail.com
Jagroop Kaur, M.Tech.
University College of Engineering
Punjabi University
Patiala
Punjab, India
jagroop_80@rediffmail.com
- Send your articles as an attachment to your e-mail to languageinindiaUSA@gmail.com.
- Please ensure that your name, academic degrees, institutional affiliation and institutional address, and your e-mail address are all given on the first page of your article. Also include a declaration that your article or work submitted for publication in LANGUAGE IN INDIA is an original work by you and that you have duly acknowledged the work or works of others you either cited or used in writing your articles, etc. Remember that by maintaining academic integrity we not only do the right thing but also help the growth, development and recognition of Indian scholarship.