LANGUAGE IN INDIA

Strength for Today and Bright Hope for Tomorrow

Volume 6 : 10 October 2006
ISSN 1930-2940

Managing Editor: M. S. Thirumalai, Ph.D.
Editors: B. Mallikarjun, Ph.D.
         Sam Mohanlal, Ph.D.
         B. A. Sharada, Ph.D.
         A. R. Fatihi, Ph.D.
         Lakhan Gusain, Ph.D.
         K. Karunakaran, Ph.D.
         Jennifer Marie Bayer, Ph.D.




Copyright © 2004
M. S. Thirumalai

A SURVEY OF THE STATE OF THE ART IN
TAMIL LANGUAGE TECHNOLOGY
S. Rajendran, Ph.D.

A PRELUDE

The use of computers for language analysis has led to the technological development of languages in general and of Tamil in particular. Worldwide developments have had their impact on Tamil too. Both government and private organizations have initiated programs for the technological development of the Tamil language.

The Department of Electronics conducted training courses on Natural Language Processing through selected institutions throughout India and paved the way for the technological development of Tamil. It funded machine translation programs among Indian languages and between English and Indian languages. It also funded the development of corpora for Indian languages. It identified certain centres for the technological development of Indian languages and funded them to initiate projects aimed at that goal.

Anna University at Chennai was identified for the technological development of the Tamil language and provided with a fund of a few crores of rupees to fulfill this mission. Under this scheme, a Resource Centre for Indian Language Technology Solutions-Tamil has been established at Anna University. A team of researchers employed under the scheme has prepared a number of language technology products. This has led to the technological development of Tamil in many areas. Many other organizations, both government and private, have followed suit.

Tamil University at Thanjavur, Tamil Virtual University, the AUKBC Research Centre at Chennai, the Central Institute of Indian Languages at Mysore, and the International Forum for Information Technology in Tamil (INFITT), which conducts an international Tamil Internet conference every year, have all worked toward the technological development of Tamil. Apart from these institutions, IIT, Chennai, IISC, Bangalore, and Micro Software, Bangalore have also contributed to the technological development of Tamil.

In this paper, the technological development of Tamil is classified under certain heads, and the research work undertaken and successfully completed, as well as the products made, are discussed in detail.

CORPUS AND CORPUS MANAGEMENT TOOLS

Corpus linguistics seeks to further our understanding of language through the analysis of large quantities of naturally occurring data. There is a long tradition of corpus linguistic studies in Europe. The need for a corpus of a language is manifold. From the preparation of a dictionary or lexicon to machine translation, the corpus has become an indispensable resource for the technological development of languages. A corpus is a large body of text incorporating various types of textual material, including newspapers, weeklies, fiction, scientific writing, literary writing, and so on. A corpus should represent all the styles of a language. It must be very large, since it is to be used for many language applications, such as the preparation of lexicons of different sizes, purposes, and types, machine translation programs, and so on.

Tagged Corpus, Parallel Corpus, and Aligned Corpus

Corpora can be distinguished as tagged, parallel, and aligned. A tagged corpus is one that is tagged for part of speech. A parallel corpus contains texts together with their translations in each of the languages involved, which allows wider scope for double-checking translation equivalents. An aligned corpus is a kind of bilingual corpus in which text samples of one language and their translations into the other language are aligned sentence by sentence, phrase by phrase, word by word, or even character by character.
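The three corpus types can be pictured as simple data structures. The following is a minimal sketch in Python, not the format used by any of the corpora surveyed here: the class names, the part-of-speech labels, and the `align_by_position` helper are hypothetical, and the Tamil sentences are left as placeholders. It assumes the two sides of the parallel corpus already have the same number of sentences in the same order, which real alignment tools cannot take for granted.

```python
from dataclasses import dataclass


@dataclass
class TaggedToken:
    form: str  # the word as it appears in the text
    pos: str   # part-of-speech label (hypothetical tagset)


@dataclass
class AlignedPair:
    source: str  # unit in the first language
    target: str  # its translation in the second language


# Tagged corpus: every token carries a part-of-speech label.
tagged_sentence = [TaggedToken("birds", "NOUN"), TaggedToken("fly", "VERB")]

# Parallel corpus: texts and their translations, not yet linked unit by unit.
parallel = {
    "en": ["Birds fly.", "Fish swim."],
    "ta": ["<Tamil sentence 1>", "<Tamil sentence 2>"],  # placeholders
}


def align_by_position(src, tgt):
    """Naive sentence alignment: pair the i-th source with the i-th target."""
    if len(src) != len(tgt):
        raise ValueError("positional alignment needs equal sentence counts")
    return [AlignedPair(s, t) for s, t in zip(src, tgt)]


# Aligned corpus: the same material, now linked sentence by sentence.
aligned = align_by_position(parallel["en"], parallel["ta"])
for pair in aligned:
    print(pair.source, "<->", pair.target)
```

The same `AlignedPair` shape could hold phrases, words, or characters instead of sentences; only the unit of alignment changes.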

CIIL Corpus for Tamil

As far as building corpora for the Indian languages is concerned, it was the Central Institute of Indian Languages (CIIL) that took the initiative and started preparing corpora for some of the Indian languages (Tamil, Telugu, Kannada, and Malayalam). The Department of Electronics (DOE) financed the corpus-building project. The target was to prepare a corpus of ten million words for each language, but owing to financial and time constraints the project ended up with three million words for each language. The three-million-word Tamil corpus was built by CIIL in this way. It is a partially tagged corpus. It is available on CD, and a free copy can be obtained from CIIL for research purposes. At present, CIIL is planning to build corpora of ten million words for Indian languages.

AUKBCRC's Improved Tagged Corpus for Tamil

The AUKBC Research Centre, which has taken up NLP-oriented work for Tamil, has improved upon the CIIL Tamil corpus and tagged it for its MT programs. It has also developed English-Tamil parallel corpora to further its goal of preparing an MT tool for English-Tamil translation. A parallel corpus is very useful for training MT programs and for building example-based machine translation.

Corpus Indexing Tools (Concordance, KWIC index, etc.)

Many such tools have been built for Tamil. A few important ones are listed in the full article.
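To illustrate what a concordance and a KWIC (keyword in context) index do, here is a minimal sketch in Python. It is not the code of any of the Tamil tools mentioned; the function names `kwic` and `concordance` and the toy English sentence are assumptions made for illustration. It also assumes plain whitespace tokenization, which would be too crude for Tamil, where many word forms carry suffixes that a real tool would need to analyse.

```python
def kwic(tokens, keyword, width=3):
    """Return (left context, keyword, right context) for every hit."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines


def concordance(tokens):
    """Map each word form to the list of positions where it occurs."""
    index = {}
    for i, tok in enumerate(tokens):
        index.setdefault(tok, []).append(i)
    return index


text = "the corpus is large and the corpus is tagged"
tokens = text.split()  # naive whitespace tokenization (assumption)

# Print each hit with the keyword lined up in one column.
for left, key, right in kwic(tokens, "corpus", width=2):
    print(f"{left:>12} [{key}] {right}")

# Positions of every occurrence of a word form.
print(concordance(tokens)["corpus"])
```

The KWIC view shows each occurrence with its neighbours, which is what a lexicographer scans when writing a dictionary entry; the concordance index answers the simpler question of where a form occurs.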

This Article

In addition, this long research article presents the state of the art in Tamil language technology in detail, reviews the strengths and weaknesses of current research, and suggests directions for future work.



S. Rajendran, Ph.D.
Department of Linguistics
Tamil University
Thanjavur 613 005
Tamilnadu, India
raj_ushush@yahoo.com

