LANGUAGE IN INDIA
http://www.languageinindia.com
Volume 5 : 8 August 2005

Strength for Today and Bright Hope for Tomorrow

Editor: M. S. Thirumalai, Ph.D.
Associate Editors: B. Mallikarjun, Ph.D.
         Sam Mohanlal, Ph.D.
         B. A. Sharada, Ph.D.
         A. R. Fatihi, Ph.D.

ONTOLOGY FOR WORD-FORM GENERATION IN ORIYA
Bira Chandra Singh


ABSTRACT

This paper discusses the need to include ontological properties in the generation of word-forms in Oriya. Ontological properties are very significant for the construction of Oriya nominal word-forms. For instance, plural markers like -mAne, -gudZika, -gudZAka, and -gudZA are attached to nominal bases according to ontological properties such as [±] human and [±] animate, and socially relevant properties like [±] honorific, [±] formal, [±] dearness, etc. Inflection of certain case markers is also based on features like [±] animate, [±] definite, etc. Such treatment seems necessary not only for a language like Oriya but also for other languages with similarly deep-rooted ontological properties in their morphology (e.g. Bangla and Telugu have been reported to have such properties).

1. INTRODUCTION

A number of NLP applications require computational processing of words. This provides the context for work on computational morphology, leading to the development of morphological processing systems such as analysers and generators. Computational morphology involves the analysis and synthesis of words using well-designed computational tools and techniques. The basic principle of morphological generation is to obtain all and only the possible word-forms from a given stem and an inventory of morphosyntactic properties. It involves mechanical generation of fully inflected word-forms ready for insertion into syntactic constructs (cf. Chaitra, T. P. 2003).

"Indian languages, though agglutinative in their morphology in the usual sense of the term, present a highly complex morphology in their surface realization" (Rao 2004: 13). Computational analysis and implementation in these languages require careful consideration of their morphological complexity. These languages mostly involve allomorphic variation; as a consequence, morphological generators have to tackle this allomorphism. Oriya, however, shows very limited allomorphism. Instead, in word-form construction, ontological properties are the deciding factors of well-formedness.

Nominal word-forms in Oriya inflect for number and case. They also involve functional categories like adverbial nouns, postpositional words, particles, and clitics.

2. NUMBER INFLECTION

Oriya nouns distinguish two numbers, viz. singular and plural. In general, the unmarked singular contrasts with the marked plural. The markers most frequently used for pluralization are -mAne, -gudZA, -gudZAka, and -gudZika. (/-gudZi/ is unattested in Oriya.) It should be emphasized that certain ontologically derived semantic properties of the root/stem are very significant in the selection of these plural markers. Nouns having the [+human] feature take -mAne (see Mohapatra, B. 1997: 7-8). In other words, nonhuman animate/inanimate nouns do not usually take -mAne (e.g. *bahimAne 'books', *CelYimAne 'goats'). However, nonhuman animate/inanimate nouns with the plural suffix -mAne are also attested in Oriya when exceptional reverence or honour is shown, probably due to some religious faith attached to them. Certain non-human nouns, such as particular animals and birds and celestial bodies like stars, the Sun, and the Moon, belong to this category.

Ex. 2.               gAImAne          padZiAre             caruCanwi

                        cow-pl              ground-on            grazing-are

                        ‘Cows are grazing on the ground.’

Ex. 3.               manxiramAnafkare     GaNtA      bAjuCi

                        temples-in                     bell            ringing-are

                        ‘Bells are ringing in the temples.’

Ex. 4.                  ebe    mXXa    graha     nakRawramAnafku    loke      pUjA karanwi

                        now    even    planet    star-pl-obj        people    worship-do

                        ‘Even now people worship planets and stars.’

Nouns with the features [-human, animate] take -gudZA, -gudZAka, and -gudZika. There are some subtle usage differences among these three plural markers. Some of the criteria responsible can be characterized as standard/regular/common/frequent and others as non-standard/irregular/infrequent/rare. -gudZika is more standard, regular and formal; it denotes some degree of endearment (dearness), closeness or interiority. -gudZAka and -gudZA are rather less frequent and less formal, and denote some degree of non-dearness, i.e. exteriority, etc. A study of the Oriya corpus (originally developed by the Institute of Applied Language Sciences, Bhubaneswar, with 3 million words of running text) also supports our assumption: the number of word-forms with -gudZika is the highest, and -gudZAka is more frequent than -gudZA. However, these distinctions are not always very sharp.

It is very significant to note that these three suffixes can also be used with human nouns with an implication of derogatoriness or negative attitude towards the persons referred to (cf. Ray 2003).

Ex. 5.                  AjikAli       mARtaragudZAka      Au         pATapaDZAunAhAnwi.

                        now-a-days    teacher-pl-derog    anymore    teaching-not

                        ‘Now-a-days teachers are not teaching any more.’
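
The selection of plural markers described in this section can be sketched in Python as follows. This is only a minimal illustration: the class NounFeatures and its field names are hypothetical, and the sketch assumes that a revered nonhuman noun takes only -mAne, which simplifies the usage described above.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class NounFeatures:
    """Hypothetical feature bundle; the field names are illustrative."""
    human: bool
    revered: bool = False      # exceptional reverence (e.g. cows, temples, stars)
    derogatory: bool = False   # negative attitude towards a human referent


# Ordered as in the text: -gudZika is the most standard of the three.
GUDZ_MARKERS = ["gudZika", "gudZAka", "gudZA"]


def plural_markers(f: NounFeatures) -> List[str]:
    """Return the plural markers licensed by the feature bundle."""
    if f.human:
        # Human nouns take -mAne; the -gudZ- markers signal derogation.
        return GUDZ_MARKERS if f.derogatory else ["mAne"]
    if f.revered:
        # Assumption: revered nonhuman nouns pattern with human nouns.
        return ["mAne"]
    return GUDZ_MARKERS


def plural_forms(stem: str, f: NounFeatures) -> List[str]:
    """Concatenate the stem with every licensed plural marker."""
    return [stem + marker for marker in plural_markers(f)]


print(plural_forms("CAwra", NounFeatures(human=True)))               # ['CAwramAne']
print(plural_forms("gAI", NounFeatures(human=False, revered=True)))  # ['gAImAne']
print(plural_forms("mARtara", NounFeatures(human=True, derogatory=True)))
```

The last call returns all three -gudZ- forms, including mARtaragudZAka of Ex. 5; choosing among them would need the dearness and formality features, which this sketch leaves out.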

3. QUANTIFICATIONAL CLASSIFIERS OR CLASSIFICATORY PLURALS

In Oriya there is an interesting morphological class of classifiers akin to plural markers, viz. -gaNa, -bqnxa, -balYi, -mAna, -rAji, -caya, etc. They usually occur with words of Sanskrit origin, especially in formal contexts (e.g. CAwragaNa 'students', SikRakabqnxa 'teachers', prabanXAbalYi 'essays', waWyamAna 'data/information', bqkRarAji 'trees', puRpacaya 'flowers'). They are, in fact, classifiers, and their distribution depends on the noun's membership in an ontological category: -gaNa and -bqnxa are attached to [+] human nouns; -balYi, -mAna, -rAji, -caya, etc. are attached to [-] animate nouns.

Ex. 6.                  CAwragaNa           ‘students’

                        *kukuragaNa         ‘dogs’

                        *puswakagaNa        ‘books’

Ex. 7.                  bqkRarAji           ‘trees’

                        *CAwrarAji          ‘students’

                        *haswIrAji          ‘elephants’

Like the four regular plural markers, these markers are mutually exclusive with the numerals.

Ex. 8.                  CAwramAne               ‘students’

                        winoti CAwra            ‘three students’

                        *winoti CAwramAne       ‘three students’

                        CAwragaNa               ‘students’

                        *winoti CAwragaNA       ‘three students’

Also, they are mutually exclusive in distribution with the regular plural markers described in Section 2.

Ex. 9.                  *CAwragaNAmAne          ‘students’

In addition to the above number property, these classifiers affect verb agreement. When they are attached to [+] human or [+] animate nouns that function as subjects, the verb shows plural agreement. When they are attached to [-] animate nouns that function as subjects, the verb shows singular agreement.

Ex. 10.                 newAgaNa    upasWiwa    aCanwi.

                        leaders     present     are

                        'Leaders are present.'

Ex. 11.                 sabuja    bqkRarAji    upawyakAtira     SoBA      baDZAuCi

                        green     trees        valley-def-of    beauty    magnifying-is

                        ‘The green trees are magnifying the beauty of the valley.’
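
The distribution of the classifiers in this section can be sketched in Python as below. The function names and the three category labels ("human", "animate", "inanimate") are illustrative assumptions, not part of the original implementation; the sketch covers only the restrictions stated above (ontological class, incompatibility with numerals, and subject agreement).

```python
from typing import Optional

# Ontological class required by each classifier, as stated in the text.
CLASSIFIER_CLASS = {
    "gaNa": "human", "bqnxa": "human",
    "balYi": "inanimate", "mAna": "inanimate",
    "rAji": "inanimate", "caya": "inanimate",
}


def classifier_form(stem: str, onto: str, classifier: str,
                    numeral: Optional[str] = None) -> Optional[str]:
    """Return stem + classifier, or None if the combination is ruled out."""
    if numeral is not None:
        # Classifiers are mutually exclusive with numerals (Ex. 8).
        return None
    if CLASSIFIER_CLASS.get(classifier) != onto:
        # The noun is not in the classifier's ontological class (Ex. 6, 7).
        return None
    return stem + classifier


def subject_agreement(onto: str) -> str:
    """Verb agreement triggered by a classifier-marked subject (Ex. 10, 11)."""
    return "pl" if onto in ("human", "animate") else "sg"


print(classifier_form("CAwra", "human", "gaNa"))      # CAwragaNa
print(classifier_form("kukura", "animate", "gaNa"))   # None
print(classifier_form("bqkRa", "inanimate", "rAji"))  # bqkRarAji
print(subject_agreement("inanimate"))                 # sg
```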

4. CASE INFLECTION

Inflection of case markers in Oriya is also based on certain semantic features of the nominals. Mohanty (1995) and Mohapatra, B. (1997) have mentioned the role of features like [±] animate, [±] human, etc. However, reconsideration of their distributional properties through corpus study leads us to reformulate the restrictions as below:

Some instrumental, ablative and locative case markers show certain distinctions in their distribution according to the [±] animate feature of the nouns. The instrumental marker -re, the ablative markers -u and -ru, and the locative markers -e and -re are restricted to [-] animate nouns.

Ex. 12 (a)              Ame    kalamare    leKu

                        we     pen-by      write

                        ‘We write with a pen.’

       (b)              *balYaxare    halYa        karAyYAe

                        ox-by         ploughing    done-is

                        ‘Ploughing is done by an ox.’

Ex. 13 (a)              AkASaru     pANi     topA    KasuCi

                        sky-from    water    drop    falling-is

                        ‘Water drops are falling from the sky.’

       (b)              *gAIru      Ame    kRIra    pAu

                        cow-from    we     milk     get

                        ‘We get milk from the cow.’

Ex. 14 (a)              rAswAre    kukuratie    SoiCi

                        road-on    dog-def      sleeping-is

                        ‘A dog is sleeping on the road.’

       (b)              *rAmare    tafkA    nAhiz

                        Rama-at    money    have-not

                        ‘Rama does not have money with him.’

Ablative -u and locative -e have further restrictions. They are added to a few nouns ending in ‘ra’, ‘la’, ‘na’, etc. (e.g. Gara + u => Garu ‘from the house’, xinu ‘from the day’, bAhAre ‘outside’, Gare ‘at the house’, etc.). In those cases -u and -e replace the final vowel -a. But there are many instances with similar endings where -e and -u are not possible (e.g. pararu ‘from feather’, *paru; BubaneSbara-re ‘at Bhubaneswara’, *BubaneSbare, etc.). Hence, in all instances they must be treated as lexicalized and not rule-governed, since they are not productive. Another point to be noted here is that the final vowel ‘u’ of the accusative marker -ku can optionally be replaced by ‘i’ when the marker follows a word ending in ‘i’, as an instance of vowel harmony (e.g. BAiki ‘to brother’, gAIki ‘to cow’).

The other instrumental, ablative and locative markers, i.e. -xbArA, -xei, -TAru, -Tu, -Tuz, -TAre, and -Ti, can be added to both [+] animate and [-] animate stems. However, there seem to be some more subtle restrictions on their distribution, which we are currently studying with the help of the Oriya corpus.
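
The animacy restriction on case markers and the optional -ku/-ki alternation can be sketched in Python as follows. The function names are hypothetical, and the lexicalized markers -u and -e are deliberately left out because, as noted above, they are not productive. The sketch attaches markers by plain concatenation and does not model the oblique base.

```python
from typing import Optional

# Markers restricted to [-] animate nouns: -re (instrumental/locative), -ru (ablative).
INANIMATE_ONLY = {"re", "ru"}
# Markers reported to attach to both [+] animate and [-] animate stems.
ANY_ANIMACY = {"xbArA", "xei", "TAru", "Tu", "Tuz", "TAre", "Ti"}


def attach_case(stem: str, marker: str, animate: bool) -> Optional[str]:
    """Return stem + marker, or None if animacy rules the marker out."""
    if marker not in INANIMATE_ONLY | ANY_ANIMACY:
        raise ValueError("marker not covered by this sketch: " + marker)
    if marker in INANIMATE_ONLY and animate:
        return None
    return stem + marker


def accusative(stem: str, harmonize: bool = False) -> str:
    """Attach -ku; optionally use -ki after a stem ending in 'i' or 'I'."""
    if harmonize and stem.endswith(("i", "I")):
        return stem + "ki"
    return stem + "ku"


print(attach_case("kalama", "re", animate=False))  # kalamare
print(attach_case("balYaxa", "re", animate=True))  # None
print(attach_case("AkASa", "ru", animate=False))   # AkASaru
print(accusative("BAi", harmonize=True))           # BAiki
```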

5. NOMINAL BASES

On the basis of case inflection, we have two types of nominal bases (both singular and plural) in Oriya, viz. direct and indirect/oblique. The direct base form is the nominative case form, to which no explicit case markers or postpositions are added and which hence remains unchanged in a sentence. In all other cases, i.e. accusative, instrumental, dative, ablative, locative, and genitive, the relevant case markers/postpositions are added to the oblique base. Not all oblique bases, however, show explicit marking for the oblique extension.

Specifically, plural and/or honorific human nouns add -fka as the oblique extension (e.g. bApAfka ‘father-obl’, pilAmAnafka ‘children-obl’). This also applies to animate/inanimate nouns to which exceptional reverence is attached (e.g. gAImAnafka ‘cows-obl’, manxiramAnafka ‘temples-obl’, etc.). Plural animate nouns can also add this oblique marker to show some degree of respect (e.g. CelYigudZifka/CelYigudZAfka ‘goats’). In those cases, the final sequence ‘ka’ of the plural suffix is lost (e.g. pilAgudZika + fka => pilAgudZifka). Also, the plural suffix -mAne replaces its final vowel ‘e’ with ‘a’ (e.g. pilAmAne + fka => pilAmAnafka). In other cases, the oblique base form is the same as the direct form (we assume a null marker as the oblique extension).
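
The formation of the oblique base from the direct base can be sketched in Python as follows. The function names are hypothetical; whether a given noun takes -fka at all is passed in as a flag, since that decision depends on the feature combinations discussed above.

```python
def add_fka(direct_base: str) -> str:
    """Attach the oblique extension -fka to a direct base."""
    if direct_base.endswith("mAne"):
        # -mAne replaces its final 'e' with 'a': pilAmAne + fka => pilAmAnafka
        return direct_base[:-1] + "a" + "fka"
    if direct_base.endswith(("gudZika", "gudZAka")):
        # The final 'ka' of the plural suffix is lost: pilAgudZika + fka => pilAgudZifka
        return direct_base[:-2] + "fka"
    return direct_base + "fka"


def oblique_base(direct_base: str, takes_fka: bool) -> str:
    """Return the oblique base; a null extension leaves the base unchanged."""
    return add_fka(direct_base) if takes_fka else direct_base


print(oblique_base("bApA", takes_fka=True))         # bApAfka
print(oblique_base("pilAmAne", takes_fka=True))     # pilAmAnafka
print(oblique_base("pilAgudZika", takes_fka=True))  # pilAgudZifka
print(oblique_base("kalama", takes_fka=False))      # kalama
```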

In the light of the above discussion, the possible singular and plural nominal bases (both direct and oblique) can be listed as below with their detailed feature specification combinations.

1.   N-Ø (D)             [sg, hum, hon/norm/nhon, stnd]
                         [sg, anim, norm, stnd]
                         [sg, inanim, norm, stnd]

2.   N-Ø (O)             [sg, hum, norm/nhon, stnd]
                         [sg, anim, norm, stnd]
                         [sg, inanim, norm, stnd]

3.   N-fka (O)           [sg, hum, hon, stnd]
                         [sg, anim, exp. reverence, stnd]
                         [sg, inanim, exp. reverence, stnd]

4.   N-mAne (D)          [pl, hum, hon/norm, stnd]
                         [pl, anim, exp. reverence, stnd]
                         [pl, inanim, exp. reverence, stnd]

5.   N-mAnafka (O)       [pl, hum, hon/norm, stnd]
                         [pl, anim, exp. reverence, stnd]
                         [pl, inanim, exp. reverence, stnd]

6.   N-gudZika (D)       [pl, hum, nhon, non-stnd, dearness]
                         [pl, anim, norm, stnd, dearness]
                         [pl, inanim, norm, stnd, dearness]

7.   N-gudZika (O)       [pl, hum, nhon, non-stnd, dearness]
                         [pl, anim, norm, stnd, dearness]
                         [pl, inanim, norm, stnd, dearness]

8.   N-gudZifka (O)      [pl, hum, nhon, non-stnd]
                         [pl, anim, norm, non-stnd, dearness]

9.   N-gudZAka (D)       [pl, hum, nhon, non-stnd]
                         [pl, anim, norm, stnd, non-dearness]
                         [pl, inanim, norm, stnd, non-dearness]

10.  N-gudZAka (O)       [pl, hum, nhon, non-stnd]
                         [pl, anim, norm, stnd, non-dearness]
                         [pl, inanim, norm, stnd, non-dearness]

11.  N-gudZAfka (O)      [pl, hum, nhon, non-stnd, non-dearness]
                         [pl, anim, non-stnd, non-dearness]

12.  N-gudZA (D)         [pl, hum, nhon, non-stnd, infml]
                         [pl, anim, norm, stnd, non-dearness, infml]
                         [pl, inanim, norm, stnd, non-dearness, infml]

13.  N-gudZA (O)         [pl, hum, nhon, non-stnd, infml]
                         [pl, anim, norm, stnd, non-dearness, infml]
                         [pl, inanim, norm, stnd, non-dearness, infml]

14.  N-gaNa (D)          [pl, hum, hon/norm, stnd, classif, formal]

15.  N-gaNafka (O)       [pl, hum, hon/norm, stnd, classif, formal]

16.  N-bqnxa (D)         [pl, hum, hon/norm, stnd, classif, formal]

17.  N-bqnxafka (O)      [pl, hum, hon/norm, stnd, classif, formal]

18.  N-balYi (D/O)       [pl, inanim, norm, stnd, classif, formal]

19.  N-mAna (D/O)        [pl, inanim, norm, stnd, classif, formal]

20.  N-caya (D/O)        [pl, inanim, norm, stnd, classif, formal]

21.  N-rAji (D/O)        [pl, inanim, norm, stnd, classif, formal]
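
A list of this kind can be stored as a lookup table. The Python sketch below encodes only the [hum, stnd] rows of entries 1 to 5, as an illustration of one possible representation; the table layout and the function name base_suffixes are assumptions, not the data structure of the original generator.

```python
# Each row: (suffix, base type, allowed feature values).
# "" stands for the null marker; only the [hum, stnd] rows of entries 1-5 are encoded.
ROWS = [
    ("",        "D", {"num": {"sg"}, "status": {"hon", "norm", "nhon"}}),  # entry 1
    ("",        "O", {"num": {"sg"}, "status": {"norm", "nhon"}}),         # entry 2
    ("fka",     "O", {"num": {"sg"}, "status": {"hon"}}),                  # entry 3
    ("mAne",    "D", {"num": {"pl"}, "status": {"hon", "norm"}}),          # entry 4
    ("mAnafka", "O", {"num": {"pl"}, "status": {"hon", "norm"}}),          # entry 5
]


def base_suffixes(num, status, base_type):
    """Return the suffixes whose feature specification matches the request."""
    return [
        suffix
        for suffix, btype, allowed in ROWS
        if btype == base_type
        and num in allowed["num"]
        and status in allowed["status"]
    ]


print(base_suffixes("sg", "hon", "O"))   # ['fka']
print(base_suffixes("sg", "norm", "O"))  # ['']
print(base_suffixes("pl", "norm", "D"))  # ['mAne']
```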

6. IMPLEMENTATION ALGORITHMS

Taking into consideration the ontological properties discussed above, the overall algorithm to generate nominal word-forms in Oriya is outlined in the following steps. Appropriate feature combinations help the program to select the required base form (sg/pl), and further inflections are then added to construct larger word-forms. The output word-forms of the generator can readily be inserted into syntactic constructs.

Step-1 Prompt the user to type in a word (root/stem).
Step-2 Specify its category.
Step-3 If 'Noun', then specify the ontological category.
Step-4 Specify the social status.
Step-5 Specify the usage pattern and attitude.
Step-6 Specify the number.
Step-7 After getting appropriate feature combinations, choose the exact functional element to be concatenated:
         Which Adverbial Noun?
         Which Case / Postposition?
         Which Particle?
         Which Clitic?
Step-9 Now the output will go through the allomorphy table.
Step-10 Finally, the well-formed word form is generated and appears on the screen.
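
The steps above can be sketched as a small Python function. This is a simplified illustration, not the original program: the features are passed as arguments instead of being prompted for, only case markers are handled in Step-7, non-respected plurals always take -gudZika, and the single allomorphy entry is inferred from Ex. 3 (manxiramAnafkare) and Ex. 4 (nakRawramAnafku) and is not taken from the actual allomorphy table. The status value "rev" is a hypothetical label for exceptional reverence.

```python
# Illustrative allomorphy table (assumption): oblique -fka followed by -ku
# surfaces as -fku, as in nakRawramAnafku (Ex. 4).
ALLOMORPHY = {"fkaku": "fku"}


def select_base(stem, onto, status, number, oblique):
    """Steps 3-6: choose the direct or oblique base for a feature bundle."""
    respected = (onto == "hum" and status in ("hon", "norm")) or status == "rev"
    if number == "sg":
        # Only honorific human or revered singular nouns mark the oblique with -fka.
        marked = oblique and (status == "rev" or (onto == "hum" and status == "hon"))
        return stem + "fka" if marked else stem
    if respected:
        return stem + ("mAnafka" if oblique else "mAne")
    # Simplification: always pick the most standard -gudZ- marker.
    return stem + "gudZika"


def generate(stem, category, onto, status, number, case_marker=""):
    """Generate a nominal word-form from a stem and its feature values."""
    if category != "noun":                    # Step-2
        raise ValueError("this sketch only covers nominal word-forms")
    base = select_base(stem, onto, status, number, oblique=bool(case_marker))
    form = base + case_marker                 # Step-7: concatenate the element
    for old, new in ALLOMORPHY.items():       # Step-9: allomorphy table
        if form.endswith(old):
            form = form[: -len(old)] + new
    return form                               # Step-10: final word-form


print(generate("pilA", "noun", "hum", "norm", "pl"))             # pilAmAne
print(generate("manxira", "noun", "inanim", "rev", "pl", "re"))  # manxiramAnafkare
print(generate("nakRawra", "noun", "inanim", "rev", "pl", "ku")) # nakRawramAnafku
print(generate("kalama", "noun", "inanim", "norm", "sg", "re"))  # kalamare
```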

7. CONCLUSION

Studies in Oriya morphology have so far not been concerned in any detail with the role of ontological properties in the generation of word-forms. This paper is an attempt in this direction, towards the exhaustive generation or synthesis of Oriya word-forms.


ABBREVIATIONS USED

anim                 Animate

classif               Classifier

D                     Direct form

exp                   Exceptional

hon                   Honorific

hum                  Human

inanim               Inanimate

N                     Noun

nhon                 Non-honorific

Nm                   Numeral

non-stnd            Non-standard

norm                 Normal

O                     Oblique form

P                      Pronoun

pl                      Plural

sg                     Singular

stnd                  Standard

V                     Verb


REFERENCES

Mohanty, P. 1987. An Argument for Three Cases in Oriya. International Journal of Dravidian Linguistics, Vol. XVI.2, pp. 286-290.

Mohanty, P. 1995. Translation between Cognate Languages: The Problem of Case-endings in Oriya and Bengali. CALTS working papers: 75-92. Hyderabad: University of Hyderabad.

Mohapatra, B. 1997. Materials for an Oriya Morphological Analyzer for the Anusaaraka (Machine Translation) System. M. Phil. dissertation. Hyderabad: University of Hyderabad.

Rao, G. U. et al. 2004. The Generic Architecture for Morphological Generators of Morphologically Complex Agglutinative Languages. SIMPLE-04: 13-15. IIT Kharagpur.

Ray, Tapas S. 2003. Oriya. In Cardona, G. and Jain, D. (eds.), The Indo-Aryan Languages, Routledge Language Family Series. New York: Routledge.

Singh, B. C. 2004. A Morphological Generator for Oriya. M. Phil. dissertation. Hyderabad: University of Hyderabad.

Spencer, A. and Zwicky, A. M. (eds.). 2003. The Handbook of Morphology. Great Britain: Blackwell Publishers.


ACKNOWLEDGEMENT: I am grateful to Prof. G. Uma Maheswar Rao for his guidance in the research reported here. I thank CIIL, Mysore for their financial support.



Bira Chandra Singh
Centre for Applied Linguistics and Translation Studies
University of Hyderabad
India.
C/o. LANGUAGE IN INDIA