Copyright © 2004 M. S. Thirumalai
A SURVEY OF THE STATE OF THE ART IN
TAMIL LANGUAGE TECHNOLOGY
S. Rajendran, Ph.D.
A PRELUDE
The use of computers for language analysis drives the technological
development of languages in general, and of Tamil in particular. Worldwide
developments in language technology have had their impact on Tamil too. Both
government and private organizations have initiated programs for the
technological development of the Tamil language.
The Department of Electronics conducted training courses on
Natural Language Processing through selected institutions throughout India and
paved the way for the technological development of Tamil. It funded machine
translation programs among Indian languages and between English and Indian
languages. It also funded the development of corpora for Indian languages. It
identified certain centres for the technological development of Indian
languages and funded them to initiate projects towards that goal.
Anna University at Chennai was identified for the technological
development of the Tamil language and provided with a fund of a few crores of
rupees to fulfill this mission. Under this scheme, a Resource Centre for Indian
Language Technology Solutions-Tamil has been established at Anna University.
A team of researchers employed under the scheme has prepared a number of
language technology products. This has led to the technological development
of Tamil in many areas. Many other organizations, both government and private,
have followed suit.
Tamil University at Thanjavur, Tamil Virtual University, the AUKBC Research
Centre at Chennai, the Central Institute of Indian Languages at Mysore, and the
International Forum for Information Technology in Tamil (INFITT), which
conducts an international Tamil Internet conference every year, have all worked
towards the technological development of Tamil. Apart from the above
institutions, IIT, Chennai, IISc, Bangalore, and Micro Software, Bangalore have
also contributed to the technological development of Tamil.
In this paper the technological development of Tamil is classified under
certain heads, and the research work undertaken and successfully completed,
as well as the products made, are discussed in detail.
CORPUS AND CORPUS MANAGEMENT TOOLS
Corpus linguistics seeks to further our understanding of language through
the analysis of large quantities of naturally occurring data. There is a long
tradition of corpus linguistic studies in Europe. The need for a corpus of a
language is manifold. From the preparation of a dictionary or lexicon
to machine translation, the corpus has become an indispensable resource for
the technological development of languages. A corpus is a large body of text
incorporating various types of textual material, including newspapers, weeklies,
fiction, scientific writing, literary writing, and so on. A corpus is meant to
represent all the styles of a language. It must be very large, as it is going
to be used for many language applications, such as the preparation of lexicons
of different sizes, purposes and types, machine translation programs, and so on.
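As a rough illustration of how a corpus feeds lexicon preparation, the following Python sketch builds a word frequency list from a handful of text samples. The function name frequency_list, the whitespace tokenization and the sample texts are assumptions made for this illustration and are not described in the article; a real Tamil corpus would need proper tokenization and morphological analysis.

```python
from collections import Counter


def frequency_list(texts, top_n=None):
    """Count word forms across several text samples.

    Tokenization is plain whitespace splitting, which is only a
    simplifying assumption for this sketch.
    """
    counts = Counter()
    for text in texts:
        counts.update(text.lower().split())
    # most_common(None) returns every word, most frequent first
    return counts.most_common(top_n)


# Hypothetical samples standing in for different text types of a corpus
samples = [
    "the corpus is large",           # e.g. scientific writing
    "the corpus covers many styles", # e.g. newspaper text
]
top_words = frequency_list(samples, top_n=2)
```

Candidate headwords for a lexicon could then be drawn from the most frequent forms, with the cut-off chosen according to the intended size of the lexicon.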
Tagged Corpus, Parallel Corpus, and Aligned Corpus
Corpora can be distinguished as tagged, parallel, and aligned corpora.
A tagged corpus is one whose words are tagged for part of speech. A
parallel corpus contains texts and their translations in each of the languages
involved. It allows wider scope for double-checking translation equivalents.
An aligned corpus is a kind of bilingual corpus in which text samples of one
language and their translations into the other language are aligned sentence by
sentence, phrase by phrase, word by word, or even character by character.
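The three kinds of corpus can be pictured as simple data structures. The Python sketch below is only an illustration under stated assumptions: the class names TaggedToken and AlignedPair, the function align_sentences, the tag labels (PRON, NOUN, VERB) and the example sentences are all hypothetical and do not come from any corpus mentioned in this article. Alignment here is purely positional, which works only if the translation preserves sentence order.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TaggedToken:
    """One word of a tagged corpus with its part-of-speech label."""
    word: str
    pos: str  # hypothetical tag set: PRON, NOUN, VERB


@dataclass(frozen=True)
class AlignedPair:
    """One unit of an aligned corpus: a source sentence and its translation."""
    source: str
    target: str


def align_sentences(source: List[str], target: List[str]) -> List[AlignedPair]:
    """Pair sentences by position; assumes the translation keeps sentence order."""
    if len(source) != len(target):
        raise ValueError("sentence counts differ; positional alignment is not possible")
    return [AlignedPair(s, t) for s, t in zip(source, target)]


# Illustrative data only, not taken from any existing corpus.
tagged_sentence = [
    TaggedToken("நான்", "PRON"),
    TaggedToken("புத்தகம்", "NOUN"),
    TaggedToken("படித்தேன்", "VERB"),
]

# A parallel corpus holds the same texts in both languages ...
english = ["I read a book.", "She sang a song."]
tamil = ["நான் புத்தகம் படித்தேன்.", "அவள் பாடல் பாடினாள்."]

# ... and an aligned corpus additionally links them unit by unit.
aligned = align_sentences(english, tamil)
```

Finer alignment (phrase, word or character level) would need the same kind of pairing at a smaller unit, usually produced by an alignment tool and not by position alone.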
CIIL Corpus for Tamil
As far as building corpora for the Indian languages is concerned, it was the
Central Institute of Indian Languages (CIIL) that took the initiative and
started preparing corpora for some of the Indian languages (Tamil, Telugu,
Kannada, and Malayalam). The Department of Electronics (DOE) financed the
corpus-building project. The target was to prepare a corpus of ten million
words for each language, but because of financial and time constraints the
project ended up with three million words for each language. The
three-million-word Tamil corpus was built by CIIL in this way. It is a
partially tagged corpus. The corpus is available on CD, and a free copy can be
obtained from CIIL for research purposes. At present CIIL is planning to build
corpora of 10 million words for Indian languages.
AUKBCRC’s Improved Tagged Corpus for Tamil
The AUKBC Research Centre, which has taken up NLP-oriented work for Tamil, has
improved upon the CIIL Tamil Corpus and tagged it for its MT programs. It
has also developed English-Tamil parallel corpora to further its goal of
preparing an MT tool for English-Tamil translation. A parallel corpus is very
useful as training material for MT programs and for building example-based
machine translation systems.
Corpus Indexing Tools (Concordance, KWIC index, etc.)
Many such tools have been made for Tamil. A few of the important ones are
listed in the full article.
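To make the idea of a KWIC (keyword in context) index concrete, here is a minimal Python sketch. It is not the implementation of any of the Tamil tools referred to above; the function name kwic, the width parameter and the whitespace tokenization are assumptions made for illustration.

```python
def kwic(text, keyword, width=3):
    """Return (left context, keyword, right context) for each exact match.

    Tokens are produced by whitespace splitting and matched exactly, so
    inflected forms of the keyword are not found; a practical tool for
    Tamil would need morphological analysis at this step.
    """
    tokens = text.split()
    lines = []
    for i, token in enumerate(tokens):
        if token == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, token, right))
    return lines


# Hypothetical sample text
sample = "the corpus is large and the corpus is tagged"
for left, key, right in kwic(sample, "corpus", width=2):
    # Right-align the left context so the keyword forms a column
    print(f"{left:>10} [{key}] {right}")
```

Printing the keyword in a fixed column is what distinguishes a KWIC display from a plain list of matches: the analyst can scan the contexts on either side at a glance.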
This Article
In addition, this long and detailed research article presents the state of the art in Tamil language technology, reviews the strengths and weaknesses of current research, and suggests directions for future work.
S. Rajendran, Ph.D.
Department of Linguistics
Tamil University
Thanjavur 613 005
Tamilnadu, India
raj_ushush@yahoo.com