Copyright © 2004 M. S. Thirumalai
ONTOLOGY FOR WORD-FORM GENERATION IN ORIYA
Bira Chandra Singh
ABSTRACT
This paper discusses the need to include ontological properties in the generation of word-forms in Oriya. Ontological properties are highly significant in the construction of Oriya nominal word-forms. For instance, plural markers like -mAne, -gudZika, -gudZAka, and -gudZA are attached to nominal bases according to ontological properties such as [±] human and [±] animate, and socially relevant properties such as [±] honorific, [±] formal, [±] dearness, etc. The selection of certain case markers is also based on features like [±] animate, [±] definite, etc. This treatment seems necessary not only for Oriya but also for other languages with similarly deep-rooted ontological properties in their morphology (e.g. Bangla and Telugu have been reported to have such properties).
1. INTRODUCTION
A number of NLP applications require computational processing of words. This provides the context for work on computational morphology leading to the development of morphological processing systems like an analyser, a generator, etc. Computational morphology involves the analysis and synthesis of words using well designed computational tools and techniques. The basic principle of morphological generation is to get all and only the possible word-forms from a given stem and an inventory of a set of morphosyntactic properties. It involves mechanical generation of fully inflected word forms ready for insertion into syntactic constructs (cf. Chaitra, T. P. 2003).
"Indian languages, though agglutinative in their morphology in the usual sense of the term, present a highly complex morphology in their surface realization" (Rao 2004: 13). Computational analysis and implementation in these languages require careful consideration of their morphological complexity. These languages mostly involve allomorphic variations; as a consequence morphological generators have to tackle this allomorphism. However, Oriya shows very limited allomorphism. Conversely, in word-form construction, ontological properties are the deciding factors of well-formed ness.
Nominal word-forms in Oriya inflect for number and case. They also involve functional categories like adverbial nouns, postpositional words, particles, and clitics.
2. NUMBER INFLECTION
Oriya nouns distinguish two numbers, viz. singular and plural. In general, an unmarked singular contrasts with a marked plural. The markers most frequently used for pluralization are -mAne, -gudZA, -gudZAka, and -gudZika. (/-gudZi/ is unattested in Oriya.) It should be emphasized that certain ontologically derived semantic properties of the root/stem are very significant in the selection of these plural markers. Nouns having the [+human] feature take -mAne (see Mohapatra, B. 1997:7-8). In other words, nonhuman animate/inanimate nouns do not usually take -mAne (e.g. *bahimAne 'books', *CelYimAne 'goats'). However, nonhuman animate/inanimate nouns with the plural suffix -mAne are also attested in Oriya when exceptional reverence or honour is shown, probably due to some religious faith attached to them. Certain non-human nouns such as animals, birds, stars, the Sun, and the Moon also belong to this category.
Ex. 2. gAImAne padZiAre caruCanwi
       cow-pl ground-on grazing-are
       ‘Cows are grazing on the ground.’
Ex. 3. manxiramAnafkare GaNtA bAjuCi
       temples-in bell ringing-are
       ‘Bells are ringing in the temples.’
Ex. 4. ebe mXXa graha nakRawramAnafku loke pUjA karanwi
       now even planet star-pl-obj people worship-do
       ‘Even now people worship planets and stars.’
Nouns with the features [-human, animate] take -gudZA, -gudZAka, and -gudZika. There are some subtle usage differences among these three plural markers. Some of the criteria involved can be characterized as standard/regular/common/frequent and others as non-standard/irregular/infrequent/rare. -gudZika is more standard, regular and formal; it denotes some degree of endearment (dearness), closeness or interiority. -gudZAka and -gudZA are less frequent and less formal, and denote some degree of non-dearness, i.e. exteriority. A study of the corpus (originally developed by the Institute of Applied Language Sciences, Bhubaneswar, with 3 million words of running text) also supports our assumption: the number of word-forms with -gudZika is the highest, and -gudZAka occurs more often than -gudZA. However, these distinctions are not always very sharp.
It is very significant to note that these three suffixes can also be used with human nouns with an implication of derogatoriness or negative attitude towards the persons referred to (cf. Ray 2003).
Ex. 5. AjikAli mARtaragudZAka Au pATapaDZAunAhAnwi.
       now-a-days teacher-pl-derog anymore teaching-not.
       ‘Now-a-days teachers are not teaching any more.’
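The selection logic of this section can be sketched in Python as follows. This is only an illustrative sketch, not the author's implementation: the function names and the three boolean parameters are hypothetical, suffixes are written in the same transliteration as the examples, and plain concatenation of stem and suffix is assumed.

```python
# Plural suffixes discussed in Section 2 (hyphens omitted).
GU_SERIES = ["gudZika", "gudZAka", "gudZA"]


def plural_markers(human, reverence=False, derogatory=False):
    """Return the candidate plural suffixes for a noun.

    human      -- True for [+human] nouns
    reverence  -- True when a nonhuman noun is treated with exceptional reverence
    derogatory -- True when a human noun is referred to with a negative attitude
    """
    if human:
        # Human nouns take -mAne, unless a derogatory reading is intended.
        return list(GU_SERIES) if derogatory else ["mAne"]
    # Nonhuman nouns take the -gudZ- series, unless exceptionally revered.
    return ["mAne"] if reverence else list(GU_SERIES)


def pluralize(stem, **features):
    """Concatenate the stem with every admissible plural suffix."""
    return [stem + suffix for suffix in plural_markers(**features)]
```

For example, `pluralize("gAI", human=False, reverence=True)` yields the form seen in Ex. 2, and `pluralize("mARtara", human=True, derogatory=True)` includes the form seen in Ex. 5. The finer distinctions among the three -gudZ- markers (dearness, formality, frequency) are left out of this sketch.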
3. QUANTIFICATIONAL CLASSIFIERS OR CLASSIFICATORY PLURALS
In Oriya there is an interesting morphological class of classifiers akin to plural markers, viz. -gaNa, -bqnxa, -balYi, -mAna, -rAji, -caya, etc. They usually occur with words of Sanskrit origin, especially in formal contexts (e.g. CAwragaNa 'students', SikRakabqnxa 'teachers', prabanXAbalYi 'essays', waWyamAna 'data/information', bqkRarAji 'trees', puRpacaya 'flowers'). They are, in fact, classifiers, and their distribution depends on the noun's membership in an ontological category. In other words, -gaNa and -bqnxa are attached to [+] human nouns; -balYi, -mAna, -rAji, -caya, etc. are attached to [-] animate nouns.
Ex. 6. CAwragaNa ‘students’
       *kukuragaNa ‘dogs’
       *puswakagaNa ‘books’
Ex. 7. bqkRarAji ‘trees’
       *CAwrarAji ‘students’
       *haswIrAji ‘elephants’
Like the four regular plural markers, these markers are mutually exclusive with the numerals.
Ex. 8. CAwramAne ‘students’
       winoti CAwra ‘three students’
       *winoti CAwramAne ‘three students’
       CAwragaNa ‘students’
       *winoti CAwragaNA ‘three students’
Also, they are mutually exclusive in distribution with the other plural markers of the regular type shown in Section 2.
Ex. 9. *CAwragaNAmAne ‘three students’
Beyond the number property above, these classifiers also affect agreement. When they are attached to [+] human or [+] animate nouns that function as subjects, the verb shows plural agreement. On the other hand, when they are attached to [-] animate nouns that function as subjects, the verb shows singular agreement.
Ex. 10. newAgaNa upasWiwa aCanwi.
        leaders present are
        ‘Leaders are present.’
Ex. 11. sabuja bqkRarAji upawyakAtira SoBA baDZAuCi
        green trees valley-def-of beauty magnifying-is
        ‘The green trees are magnifying the beauty of the valley.’
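The distributional facts of this section can be expressed as a small set of checks. The Python sketch below is a hedged illustration only: the ontology labels ("human", "animate", "inanimate") and all function names are assumptions introduced here, and only the classifiers named in the text are covered.

```python
HUMAN_CLASSIFIERS = ("gaNa", "bqnxa")
INANIMATE_CLASSIFIERS = ("balYi", "mAna", "rAji", "caya")


def classifier_allowed(classifier, ontology):
    """Check whether a classifier may attach to a noun of the given ontology."""
    if classifier in HUMAN_CLASSIFIERS:
        return ontology == "human"
    if classifier in INANIMATE_CLASSIFIERS:
        return ontology == "inanimate"
    raise ValueError(f"unknown classifier: {classifier}")


def well_formed(ontology, classifier, numeral=None, plural_suffix=None):
    """A classifier excludes both numerals and the regular plural markers."""
    if numeral is not None or plural_suffix is not None:
        return False
    return classifier_allowed(classifier, ontology)


def subject_agreement(ontology):
    """Verb agreement triggered by a classifier-marked subject."""
    return "plural" if ontology in ("human", "animate") else "singular"
```

Under these assumptions, `classifier_allowed("gaNa", "animate")` is false, matching the starred *kukuragaNa in Ex. 6, and `subject_agreement("inanimate")` returns "singular", matching the verb form in Ex. 11.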
4. CASE INFLECTION
Case marking in Oriya is also based on certain semantic features of the nominals. Mohanty (1995) and Mohapatra, B. (1997) have mentioned the role of features like [±] animate, [±] human, etc. However, a reconsideration of their distributional properties through corpus study leads us to reformulate them as below:
Some instrumental, ablative and locative case markers show certain distinctions in their distribution according to the [±] animate feature of the nouns. The instrumental marker -re, the ablative markers -u and -ru, and the locative markers -e and -re are restricted to [-] animate nouns only.
Ex. 12 (a) Ame kalamare leKu
           we pen-by write
           ‘We write with pen.’
       (b) *balYaxare halYa karAyYAe
           ox-by ploughing done-is
           ‘Ploughing is done by ox.’
Ex. 13 (a) AkASaru pANi topA KasuCi
           sky-from water drop falling-is
           ‘Water drops are falling from sky.’
       (b) *gAIru Ame kRIra pAu
           cow-from we milk get
           ‘We get milk from the cow.’
Ex. 14 (a) rAswAre kukuratie SoiCi
           road-on dog-def sleeping-is
           ‘A dog is sleeping on the road.’
       (b) *rAmare tafkA nAhiz
           Rama-at money have-not
           ‘Rama does not have money with him.’
Ablative -u and locative -e have further restrictions. They are added to a few nouns ending in ‘ra’, ‘la’, ‘na’, etc. (e.g. Gara + u => Garu ‘from the house’, xinu ‘from the day’, bAhAre ‘at outside’, Gare ‘at the house’, etc.). In those cases -u and -e replace the final vowel -a. But there are many instances with similar endings where -e and -u are not possible (e.g. pararu ‘from feather’, *paru; BubaneSbara-re ‘at Bhubaneswara’, *BubaneSbare, etc.). Hence, in all instances they must be treated as lexicalized and not rule-governed, since they are not productive. Another point to note is that the final vowel ‘u’ in the accusative marker -ku can optionally be replaced by ‘i’ if a word ending in ‘i’ precedes it, as an instance of vowel harmony (e.g. BAiki ‘to brother’, gAIki ‘to cow’).
The other instrumental, ablative and locative markers, i.e. -xbArA, -xei, -TAru, -Tu, -Tuz, -TAre, -Ti, can be added to both [+] animate and [-] animate stems. However, there seem to be some more subtle restrictions on their distribution, which we are currently studying with the help of the Oriya corpus.
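The animacy restriction on case markers and the optional -ku/-ki alternation described in this section can be sketched as follows. This is a minimal Python illustration under stated assumptions: the function names are hypothetical, the lexicalized markers -u and -e are only checked for animacy (not for which stems accept them), and a word is taken to be "‘i’-ending" if its transliteration ends in "i" or "I".

```python
# Markers restricted to [-animate] nouns, and markers open to all stems.
INANIMATE_ONLY = {"re", "ru", "u", "e"}
UNRESTRICTED = {"xbArA", "xei", "TAru", "Tu", "Tuz", "TAre", "Ti"}


def marker_allowed(marker, animate):
    """Check a case marker against the animacy of the noun."""
    if marker in UNRESTRICTED:
        return True
    if marker in INANIMATE_ONLY:
        return not animate
    raise ValueError(f"unknown case marker: {marker}")


def accusative_forms(word):
    """Return the accusative form(s): -ku, plus optional -ki after final i/I."""
    forms = [word + "ku"]
    if word.endswith(("i", "I")):
        forms.append(word + "ki")
    return forms
```

With these definitions, `marker_allowed("ru", animate=True)` is false, in line with the starred Ex. 13 (b), while `accusative_forms("gAI")` includes gAIki alongside the regular -ku form.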
5. NOMINAL BASES
On the basis of the case inflections, we distinguish two types of nominal bases (both singular and plural) in Oriya, viz. direct and indirect/oblique. The direct base form is the nominative case form, to which no explicit case markers or postpositions are added and which hence remains unchanged in a sentence. In all other cases, i.e. accusative, instrumental, dative, ablative, locative, and genitive, the relevant case markers/postpositions are added to the oblique base. Not all oblique bases, however, show explicit marking for oblique extension.
Specifically, plural and/or honorific human nouns add -fka as the oblique extension (e.g. bApAfka ‘father-obl’, pilAmAnafka ‘children-obl’). This also applies to animate/inanimate nouns when exceptional reverence is attached (e.g. gAImAnafka ‘cows-obl’, manxiramAnafka ‘temples-obl’, etc.). However, plural animate nouns can also add the above oblique marker to show some degree of respect (e.g. CelYigudZifka/CelYigudZAfka ‘goats’). In those cases, the final sequence ‘ka’ in the plural suffix is lost (e.g. pilAgudZika + fka => pilAgudZifka). Also, the plural suffix -mAne replaces its final vowel ‘e’ by ‘a’ (e.g. pilAmAne + fka => pilAmAnafka). In other cases, the oblique base form is the same as the direct form (we assume a null marker as oblique extension).
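The formation of the oblique base can be written as a short function. The Python sketch below is an illustration only, with a hypothetical function name and a boolean flag standing in for the feature conditions (plural and/or honorific human, exceptional reverence, respect usage) that license -fka.

```python
def oblique_base(direct_base, takes_fka):
    """Derive the oblique base from the direct base.

    takes_fka -- True when the noun's features license the -fka extension;
                 otherwise a null marker is assumed and the base is unchanged.
    """
    if not takes_fka:
        return direct_base
    if direct_base.endswith("mAne"):
        # -mAne replaces its final 'e' by 'a' before -fka.
        return direct_base[:-1] + "afka"
    if direct_base.endswith(("gudZika", "gudZAka")):
        # The final 'ka' of the plural suffix is lost before -fka.
        return direct_base[:-2] + "fka"
    return direct_base + "fka"
```

For instance, `oblique_base("pilAmAne", True)` gives pilAmAnafka and `oblique_base("pilAgudZika", True)` gives pilAgudZifka, as in the examples above. The suffix test is purely string-based, so a stem that happens to end in one of these sequences would need separate handling.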
In the light of the above discussion, the possible singular and plural nominal bases (both direct and oblique) can be listed as below with their detailed feature specification combinations.
1. N-φ (D)
   [sg, hum, hon/norm/nhon, stnd]
   [sg, anim, norm, stnd]
   [sg, inanim, norm, stnd]
2. N-φ (O)
   [sg, hum, norm/nhon, stnd]
   [sg, anim, norm, stnd]
   [sg, inanim, norm, stnd]
3. N-fka (O)
   [sg, hum, hon, stnd]
   [sg, anim, exp. reverence, stnd]
   [sg, inanim, exp. reverence, stnd]
4. N-mAne (D)
   [pl, hum, hon/norm, stnd]
   [pl, anim, exp. reverence, stnd]
   [pl, inanim, exp. reverence, stnd]
5. N-mAnafka (O)
   [pl, hum, hon/norm, stnd]
   [pl, anim, exp. reverence, stnd]
   [pl, inanim, exp. reverence, stnd]
6. N-gudZika (D)
   [pl, hum, nhon, non-stnd, dearness]
   [pl, anim, norm, stnd, dearness]
   [pl, inanim, norm, stnd, dearness]
7. N-gudZika (O)
   [pl, hum, nhon, non-stnd, dearness]
   [pl, anim, norm, stnd, dearness]
   [pl, inanim, norm, stnd, dearness]
8. N-gudZifka (O)
   [pl, hum, nhon, non-stnd]
   [pl, anim, norm, non-stnd, dearness]
9. N-gudZAka (D)
   [pl, hum, nhon, non-stnd]
   [pl, anim, norm, stnd, non-dearness]
   [pl, inanim, norm, stnd, non-dearness]
10. N-gudZAka (O)
   [pl, hum, nhon, non-stnd]
   [pl, anim, norm, stnd, non-dearness]
   [pl, inanim, norm, stnd, non-dearness]
11. N-gudZAfka (O)
   [pl, hum, nhon, non-stnd, non-dearness]
   [pl, anim, non-stnd, non-dearness]
12. N-gudZA (D)
   [pl, hum, nhon, non-stnd, infml]
   [pl, anim, norm, stnd, non-dearness, infml]
   [pl, inanim, norm, stnd, non-dearness, infml]
13. N-gudZA (O)
   [pl, hum, nhon, non-stnd, infml]
   [pl, anim, norm, stnd, non-dearness, infml]
   [pl, inanim, norm, stnd, non-dearness, infml]
14. N-gaNa (D)
   [pl, hum, hon/norm, stnd, classif, formal]
15. N-gaNafka (O)
   [pl, hum, hon/norm, stnd, classif, formal]
16. N-bqnxa (D)
   [pl, hum, hon/norm, stnd, classif, formal]
17. N-bqnxafka (O)
   [pl, hum, hon/norm, stnd, classif, formal]
18. N-balYi (D/O)
   [pl, inanim, norm, stnd, classif, formal]
19. N-mAna (D/O)
   [pl, inanim, norm, stnd, classif, formal]
20. N-caya (D/O)
   [pl, inanim, norm, stnd, classif, formal]
21. N-rAji (D/O)
   [pl, inanim, norm, stnd, classif, formal]
6. IMPLEMENTATION ALGORITHMS
Taking into consideration the ontological properties discussed above, the overall algorithm for generating nominal word-forms in Oriya is given in the following steps. Appropriate feature combinations help the program to select the base forms (sg/pl) required, and further inflections are made to construct larger word-forms. The output word-forms of the generator can readily be inserted into syntactic constructs.
Step-1 Prompt the user to type in a word (root/stem).
Step-2 Specify its category.
Step-3 If 'Noun', then specify the ontological category.
Step-4 Specify the social status.
Step-5 Specify the usage pattern and attitude.
Step-6 Specify the number.
Step-7 After getting appropriate feature combinations, choose the exact functional element to be concatenated:
       Which Adverbial Noun?
       Which Case / Postposition?
       Which Particle?
       Which Clitic?
Step-9 Now the output will go through the allomorphy table.
Step-10 Finally, the well-formed word-form is generated and appears on the screen.
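The steps above can be sketched as a small non-interactive pipeline. The Python code below is a hedged illustration, not the generator described in this paper: the class and function names are hypothetical, only case markers are handled as functional elements, only -mAne and -gudZika are generated as plural suffixes, and the single allomorphy rule (-fka + -ku => -fku) is an assumption inferred from the form nakRawramAnafku in Ex. 4.

```python
from dataclasses import dataclass


@dataclass
class Features:
    ontology: str            # "hum", "anim" or "inanim"   (Step 3)
    status: str = "norm"     # "hon", "norm" or "nhon"     (Step 4)
    reverence: bool = False  # exceptional reverence       (Step 5)
    number: str = "sg"       # "sg" or "pl"                (Step 6)


def bases(stem, f):
    """Return the (direct, oblique) bases selected by the feature bundle."""
    if f.number == "pl":
        elevated = f.reverence or (f.ontology == "hum" and f.status != "nhon")
        if elevated:
            return stem + "mAne", stem + "mAnafka"
        # Simplification: only the standard -gudZika is generated here.
        return stem + "gudZika", stem + "gudZika"
    takes_fka = f.reverence or (f.ontology == "hum" and f.status == "hon")
    return stem, (stem + "fka" if takes_fka else stem)


def apply_allomorphy(base, marker):
    """Step 9: a one-entry allomorphy table (assumed, inferred from Ex. 4)."""
    if base.endswith("fka") and marker == "ku":
        return base[:-2] + marker
    return base + marker


def generate(stem, f, case_marker=None):
    """Steps 7 to 10: attach the chosen case marker to the right base."""
    direct, oblique = bases(stem, f)
    if case_marker is None:
        return direct
    return apply_allomorphy(oblique, case_marker)
```

As a usage example, `generate("manxira", Features("inanim", reverence=True, number="pl"), "re")` produces manxiramAnafkare, the form in Ex. 3, and `generate("kalama", Features("inanim"), "re")` produces kalamare, as in Ex. 12 (a). A fuller generator would also consult the animacy restrictions on case markers before concatenation.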
7. CONCLUSION
Studies in Oriya morphology have not so far been concerned in any detail with the role of ontological properties in the generation of word-forms. This paper is an attempt in that direction, towards the exhaustive generation or synthesis of Oriya word-forms.
ABBREVIATIONS USED
anim Animate
classif Classifier
D Direct form
exp Exceptional
hon Honorific
hum Human
inanim Inanimate
N Noun
nhon Non-honorific
Nm Numeral
non-stnd Non-standard
norm Normal
O Oblique form
P Pronoun
pl Plural
sg Singular
stnd Standard
V Verb
REFERENCES
Mohanty, P. 1987. An Argument for Three Cases in Oriya, International Journal of Dravidian Linguistics Vol. XVI. 2. p. 286-290.
Mohanty, P. 1995. Translation between Cognate Languages: The Problem of Case-endings in Oriya and Bengali, CALTS working papers: 75-92. Hyderabad: University of Hyderabad.
Mohapatra, B. 1997. Materials for an Oriya Morphological Analyzer for the Anusaaraka (Machine Translation) System, M. Phil. dissertation. Hyderabad: University of Hyderabad.
Rao, G. U. et al. 2004. The Generic Architecture for Morphological Generators of Morphologically Complex Agglutinative Languages, SIMPLE-04: 13-15. IIT Kharagpur.
Ray, Tapas S. 2003. Oriya. In Cardona, G. and Jain, D. (ed.), The Indo-Aryan Languages, Routledge Language Family Series. New York: Routledge.
Singh, B.C. 2004. A Morphological Generator for Oriya. M. Phil. dissertation. Hyderabad: University of Hyderabad.
Spencer, A. and Zwicky, A. M. (ed.). 2003. The Handbook of Morphology. Great Britain: Blackwell Publishers.
ACKNOWLEDGEMENT: I am grateful to Prof. G. Uma Maheswar Rao for his guidance in the research reported here. I thank CIIL, Mysore for their financial support.
Bira Chandra Singh
Centre for Applied Linguistics and Translation Studies
University of Hyderabad
India.
C/o. LANGUAGE IN INDIA