LANGUAGE IN INDIA

Strength for Today and Bright Hope for Tomorrow

Volume 3 : 11 October 2003

Editor: M. S. Thirumalai, Ph.D.
Associate Editors: B. Mallikarjun, Ph.D.
         Sam Mohanlal, Ph.D.
         B. A. Sharada, Ph.D.


Copyright © 2001
M. S. Thirumalai

A PROLOG ANALYZER/GENERATOR FOR
SANSKRIT SUBANTA PADAS
Girish Nath Jha, Ph.D.


1. INTRODUCTION

The present paper is an attempt to apply the techniques of Prolog based Natural Language Processing for capturing the formalism inherent in Panini's nominal inflectional morphology by suitably interpreting and adapting the sutras for computing purposes.

INPUT
|
Script
|
Nominal DB
|
Morph Analyzer
|
Affix Lexicon
|
Sandhi
|
Script
|
OUTPUT

The resultant model is intended

  • as an intermediate system to be input to larger, more sophisticated NL systems, such as a parser for verb-less sutras in the sutraic literature, an M(A)T System, or an NL Understanding System in the Indian multilingual context, and
  • as an end-system for application in educational research institutions for pedagogical purposes.

The transliteration scheme followed is that specified by ITRANS 5.0, as shown in Appendix I.

2. IMPORTANCE OF INFLECTIONAL MORPHOLOGY

Morphology is central to the Paninian grammatical system because of the highly inflectional nature of Sanskrit. Its rule-based nature, with few unmanageable exceptions, has prompted computational linguists and computer scientists alike to adapt and formalize it, keeping in mind its tremendous potential for application-oriented Research and Development (R & D).

The inflections in the hierarchy of word-structure have the effect of equipping the word for a larger role at the sentence level. They carry grammatical information, determine the status of and relation between the sentential constituents, and release the structured semantic payload for intended communication.

In Sanskrit, inflections operate at the level of nominals (subanta) and verbs (ti~Nanta). Stems that end in an inflection are called `pada' (syntactic word). Padas with suP constitute the NPs (subanta-pada), and those with ti~N constitute the VPs (ti~Nanta-pada). In the former, the bases are called `prAtipadikas' (pdk); they undergo suP affixation under specifically formulated conditions of case, gender, number, and the end-characters of the bases to yield nominal syntactic words.

This paper presents a Prolog model for subanta analysis and generation.

2.1. PREVIOUS WORK AND THE WORK PRESENTED NOW

Some projects funded by the Government of India (GOI) have attempted to build analyzers/generators for Sanskrit in general, not specifically for subanta padas. Among the significant ones are the following projects, completed under the TDIL (Technology Development for Indian Languages) initiative of the Government of India.

  1. Desika is an NLU system for the generation and analysis of plain and accented written Sanskrit texts, based on the grammar rules of Panini's Ashtadhyayi. It also has a database based on Amarakosha and heuristics based on Nyaya & Mimamsa. It is claimed to analyze Vedic (scriptural) texts as well.
  2. Shabdabodha is an interactive application to analyze the semantic and syntactic structure of Sanskrit sentences.

The present work needs to be distinguished from the above-mentioned projects because it attempts to present a comprehensive solution for subanta padas only, not for other Sanskrit derivations. It takes Panini as the basic formalism and draws on interpretations from Siddhantakaumudi and other sources to understand the intricacies of subanta morphology. The R&D for it was carried out as M.Phil. work completed by the author at J.N.U. in 1993, and was supported by the CASTLE project run at SC&SS, J.N.U., under Prof. G.V. Singh.

3. THE OPERATION OF NOMINAL INFLECTIONS

In Sanskrit, there are 21 (7x3) case affixes for the purpose of morphological generation of nominal syntactic words (subanta padas). Panini [4.1.2] has listed them as:

svaujasamautchasTAbhyAmbhis~NebhyAmbhyas~NasibhyAmbhyas~NasosAm~Nyossup

sU(1-1) au(1-2) Jas(1-3)
am(2-1) auT(2-2) shas(2-3)
TA(3-1) bhyAm(3-2) bhis(3-3)
~Ne(4-1) bhyAm(4-2) bhyas(4-3)
~Nasi(5-1) bhyAm(5-2) bhyas(5-3)
~Nas(6-1) os(6-2) Am(6-3)
~Ni(7-1) os(7-2) suP(7-3)
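
The grid above can be held as a simple lookup table keyed by case and number. The following is a minimal sketch in Python, used here only for illustration (the program described later is written in Prolog); the names `SUP_ROWS`, `SUP` and `sup_affix` are hypothetical and do not come from the original program.

```python
# Illustrative lookup table for the 21 suP affixes of 4.1.2 (ITRANS).
# Rows are cases 1-7; columns are singular, dual, plural.
SUP_ROWS = [
    ("sU", "au", "Jas"),
    ("am", "auT", "shas"),
    ("TA", "bhyAm", "bhis"),
    ("~Ne", "bhyAm", "bhyas"),
    ("~Nasi", "bhyAm", "bhyas"),
    ("~Nas", "os", "Am"),
    ("~Ni", "os", "suP"),
]

# Build a {(case, number): affix} dictionary from the rows.
SUP = {
    (case, number): affix
    for case, row in enumerate(SUP_ROWS, start=1)
    for number, affix in enumerate(row, start=1)
}


def sup_affix(case, number):
    """Return the unreduced affix for case 1-7 and number 1-3."""
    return SUP[(case, number)]
```

For example, `sup_affix(3, 1)` returns `TA`, the instrumental singular affix before its marker is dropped.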

3.1 KEY DEFINITIONS

PRATIPADIKA (pdk) are the crude forms with which the suP affixes combine under specific morphophonemic environments to generate padas or syntactic words. According to Panini any meaningful form of a word which is neither a root nor an affix is called a pdk.

arthavadadhAturapratyayaH prAtipadikam [1.2.45]

They are either primitive (stored in GaNa PATha) or are derived through primary (kRRit), secondary (taddhita), feminine (stri) affixations, and also by compounding (samAsa) [1.2.46].

About NUMBER, Panini says that the three affixes in each set, such as [sU, au, Jas], [am, auT, shas] etc., are respectively called singular, dual and plural [1.4.103]. They denote unity, duality and plurality respectively [1.4.21-22]. Some words occur only in the plural, like `dArAH' (wife) and `apaH' (water). The numeral `eka' (one) is singular only, `dvi' is dual only, while `tri' and above are plural only. As in other languages, many words are, by the nature of their use, found to occur only in the singular. The dual is used strictly in the cases where two objects are logically related, whether directly or by the combination of two individuals. When the duality of the objects is well understood (as in the case of pairs of objects), the dual is used (without the word `dvi'). For example, `Ashvinau' (two Ashvins), `akShiNI' (pair of eyes), `hastau' (pair of hands) etc.

VIBHAKTI is defined as the sets of three case affixes [1.4.104]. Each case will be marked by its vibhakti for three numbers. The declensional forms show primarily case and number, but they also indicate gender: although the distinctions of gender are made partly in the stem itself, they also appear in the changes of inflection. Sanskrit has three genders: masculine, feminine and neuter. The only words which show no sign of gender distinction are the personal pronouns of the first or second person, and the numerals above four.

PADA is another crucial term here. The Ashtadhyayi (AD) defines it as supti~Nantam padam [1.4.14], that is, anything that ends in one of the 21 suP or 9+9 ti~N affixes is a pada. The other definition of pada as given in the subsequent three sutras is not relevant here.

3.2 CATEGORIZATION

The declension of Sanskrit nominal bases (NB) (nouns, adjectives, pronouns and numerals) is done according to the end-characters of the bases. End-character can be a vowel (mAtrA) or a consonant. Each of these classes can have gender specifications from one to three.

Broadly, however, the NBs are divided into four categories. Of the general classes of nouns, adjectives, pronouns and numerals, the correspondence of the first two is so close that they are treated as one category. For these Whitney provides a more compact classification:

stems ending in `a'
stems ending in `i' and `u'
stems ending in `A', `I', `U'
stems ending in `RRi'
stems ending in consonants.

According to him, there is no agreement among scholars as to the number and order of Sanskrit declensions. The pronouns exhibit many peculiarities and hence form a category of their own. The words designating number, or numerals, also form a class peculiar enough to be presented by themselves.
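
A classification by end-character is easy to express as an ordered series of suffix tests. The sketch below is in Python, for illustration only, and makes two assumptions that should be read as such: that the third class refers to the long vowels `A', `I', `U', and that the fourth refers to `RRi' in ITRANS. The function name `stem_class` is hypothetical, and stems ending in diphthongs are simply reported as falling outside the five classes listed.

```python
def stem_class(stem):
    """Label an ITRANS stem by its final sound (illustrative only)."""
    # Multi-letter endings are tested first, because `RRi' ends in `i'
    # and the diphthongs `ai' and `au' end in `i' and `u'.
    if stem.endswith(("RRi", "RRI")):
        return "stems ending in RRi"
    if stem.endswith(("ai", "au", "e", "o")):
        return "diphthong stem (outside the five classes listed)"
    if stem.endswith("a"):
        return "stems ending in a"
    if stem.endswith(("i", "u")):
        return "stems ending in i and u"
    if stem.endswith(("A", "I", "U")):
        return "stems ending in A, I, U"
    return "stems ending in consonants"
```

The order of the tests matters: a naive test for a final `i' placed first would misfile a stem such as `pitRRi'.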

4. MORPHOLOGICAL PROCESSING

Though the program model described in Section 5, and the system developed on that model, account for all the categorizations of nominal bases to generate grammatically correct noun forms, in this section we take only the bases ending in the vowel `a' to show the complexity involved in the processing.

4.1 "a" ENDING MASCULINE NOUNS

For nom. sing. (1-1), the affix is 'r-s' (from `sU' [1.3.2]), which in three stages is replaced by visarga (s>r>:). The padas formed are `rAmaH', `gopAlaH' etc. The acc. sing. (2-1) termination is `am', which forms `rAmam' after 6.1.107 (ami pUrvaH) blocks dIrgha sandhi. Rule 7.1.12 provides for instr. sing. (3-1) `ina' (TA > A [1.3.7] > ina [7.1.12]), abl. sing. (5-1) `At' (~Nasi > ~Nas [1.3.2] > as [1.3.8] > At [7.1.12]), and gen. sing. (6-1) `sya' (~Nas > as [1.3.8] > sya [7.1.12]). The padas generated are `rAmeNa', `rAmAt' and `rAmasya' respectively for the nominal base `rAma'. For dat. sing. (4-1), `ya' is substituted for the `e' of `~Ne' (~Ne > e [1.3.8] > ya [7.1.13]), which causes the final `a' of the base to be lengthened [7.3.102]. The pada obtained is `rAmAya'. The loc. sing. (7-1) form is `rAme' [6.1.87], i.e. `rAma' + `i' (~Ni > i [1.3.8]) => `rAme'. The voc. sing. (8-1) form is `he rAma' (he rAma + sU [2.3.49] => he rAma + 0 [6.1.69]).

The nom-dual (1-2) form is rAma + rAma = rAmau (rAma + au -> rAmau [6.1.88]). The Paninian injunction here is that if there is more than one word of the same form and vibhakti, then only the last one is retained [1.2.64]. Similarly, the acc-dual (2-2) form is rAma + au -> rAmau [6.1.88] (auT -> au [1.3.7]). The 3-2, 4-2, 5-2 termination `bhyAm' yields the pada rAmAbhyAm [7.3.102]. The gen. and loc. dual (6-2, 7-2) affix os -> oH yields rAmayoH [6.1.78] (rAme [7.3.104] + oH). The 8-2 form is like the nominative with the prefix `he'.

For nom-pl (1-3), the termination `as' (from `Jas') by dIrgha gives rAmAs or rAmAH. Acc-pl (2-3) `as' (from `shas' [1.3.8]) combines with the base: rAma + as -> rAma + an (as -> an [6.1.103]) -> rAmAn [6.1.102]. The instr-pl (3-3) affix `bhis' is substituted by `ais' [7.1.9] to give rAmaiH [6.1.88]. The 4-3, 5-3 affix `bhyas' combines with a modified base (`a' > `e' [7.3.103]). Thus:

rAma + bhyas -> rAme + bhyaH -> rAmebhyaH. For 6-3, the affix `Am' gets the `nuT' augment, with the final `a' of the base lengthened: rAma + Am -> rAmA [6.4.3] + nAm [7.1.54] -> rAmANAm (n -> N [8.4.2]). Before the loc-pl (7-3) vibhakti `su' (suP -> su [1.3.3]), the final `a' of the base is changed to `e' [7.3.103]. Thus rAma + suP -> rAme + su -> rAmeShu (s -> Sh [8.3.57]). The voc-pl (8-3) is like the nom-pl (1-3): the form is `he rAmAH'.
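
Several of the steps in this section come down to joining a base that ends in `a', `A' or `e' with a vowel-initial affix. As a rough illustration, the following Python sketch (not the author's Prolog) covers only the vowel combinations cited above under 6.1.87, 6.1.88, 6.1.101 and 6.1.78, with an optional flag for the blocking effect of 6.1.107. The function names and the flag `block_dirgha` are hypothetical, and later steps such as final s -> H and retroflexion are deliberately left out.

```python
GUNA = {"i": "e", "I": "e", "u": "o", "U": "o"}            # 6.1.87
VRIDDHI = {"e": "ai", "ai": "ai", "o": "au", "au": "au"}   # 6.1.88
# Two-letter vowels come first so that `ai' is not read as `a'.
VOWEL_TOKENS = ("ai", "au", "a", "A", "i", "I", "u", "U", "e", "o")


def first_vowel(affix):
    """Return the vowel the affix begins with, or '' if there is none."""
    for vowel in VOWEL_TOKENS:
        if affix.startswith(vowel):
            return vowel
    return ""


def join(base, affix, block_dirgha=False):
    """Join base and affix with a toy subset of external vowel sandhi."""
    vowel = first_vowel(affix)
    if not vowel:
        return base + affix                  # consonant-initial affix
    rest = affix[len(vowel):]
    if base.endswith(("a", "A")):
        stem = base[:-1]
        if vowel in ("a", "A"):
            if block_dirgha:                 # e.g. 6.1.107 before `am'
                return base + rest
            return stem + "A" + rest         # 6.1.101
        if vowel in GUNA:
            return stem + GUNA[vowel] + rest
        return stem + VRIDDHI[vowel] + rest
    if base.endswith("e"):
        return base[:-1] + "ay" + affix      # 6.1.78
    return base + affix
```

With these definitions, `join("rAma", "i")` gives `rAme`, `join("rAme", "os")` gives `rAmayos` (before the final s is turned into visarga), and `join("rAma", "am", block_dirgha=True)` gives `rAmam`.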

4.2 "a" ENDING MASCULINE PRONOUNS

In Sanskrit pronouns are 35 in number. The basic Paninian rule is

sarvAdIni sarvanAmAni [1.1.27]

Some words marking numbers (like `eka', `dvi' etc.), directions, and some adjectives are also included among pronouns. All these are declined alike and thus can be distinguished from nominals, adjectives or numerals. Except in the cases of nom-pl (1-3), dat-sg (4-1), abl-sg (5-1), gen-pl (6-3), loc-sg (7-1), and voc-pl (8-3), the forms of `a' ending masculine pronouns are the same as those of nouns. We take `sarva' (R) as representative of the class of `a' ending masculine pronouns to illustrate the following processes -

1-3 : R + Jas -> R + shi [7.1.17] -> R + i [1.3.7] -> sarve [6.1.87]
4-1 : R + ~Ne -> R + e [1.3.7] -> R + smai [7.1.14] -> sarvasmai
5-1 : R + ~Nasi -> R + as [1.3.8/1.3.2] -> R + smAt [7.1.15] -> sarvasmAt
6-3 : R + Am -> sarve [7.3.103] + s [7.1.52] + Am -> sarveShAm (s -> Sh [8.3.57])
7-1 : R + ~Ni -> R + i [1.3.7] -> R + smin [7.1.15] -> sarvasmin
8-3 : he R + Jas -> he R + shi [7.1.17] -> he R + i [1.3.7] -> he sarve
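
Because pronouns differ from nouns in only a handful of cells, one way to picture the relationship is a small table of overrides laid over the noun endings. The Python sketch below is illustrative only and is not taken from the original program: it uses fused endings that replace the final `a' of the base rather than the step-by-step derivation, and the noun table is a deliberately partial subset, so cells not listed raise a `KeyError`. All names are hypothetical.

```python
# Fused endings that replace the final `a' of the base (partial subset).
NOUN_ENDINGS = {
    (1, 1): "aH", (1, 2): "au", (2, 1): "am", (2, 2): "au",
    (5, 1): "At", (6, 1): "asya", (7, 1): "e",
}

# Cells in which `a' ending masculine pronouns differ from nouns,
# using the forms derived for `sarva' above.
PRONOUN_ENDINGS = {
    (1, 3): "e", (4, 1): "asmai", (5, 1): "asmAt",
    (6, 3): "eShAm", (7, 1): "asmin",
}


def decline(base, case, number, pronoun=False):
    """Return one cell of the paradigm for an `a' ending masculine base."""
    if not base.endswith("a"):
        raise ValueError("this sketch handles only bases ending in `a'")
    stem = base[:-1]
    cell = (case, number)
    # The pronoun table is consulted first; otherwise fall back to nouns.
    if pronoun and cell in PRONOUN_ENDINGS:
        return stem + PRONOUN_ENDINGS[cell]
    return stem + NOUN_ENDINGS[cell]
```

Here `decline("sarva", 4, 1, pronoun=True)` yields `sarvasmai`, while `decline("sarva", 6, 1, pronoun=True)` falls through to the noun ending and yields `sarvasya`.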

`a' ending numeral `eka' which falls in this category is declined like `sarva'. Original adjectives ending in `a' form a large class and are declined like `a' ending masculine nouns. There are no feminine bases ending in `a'.

4.3 "a" ENDING NEUTER BASES

Like their masculine counterparts, `a' ending neuter stems also form a large class, with their declension differing from the former only in the nominative, accusative and vocative forms (1-1, 1-2, 1-3, 2-1, 2-2, 2-3, 8-1, 8-2, 8-3). Let `phala' (R) represent this class. Thus

1-1 : R + sU -> R + am [7.1.24] -> phalam [6.1.107]
2-1 : R + am -> phalam [6.1.107]
1-2 : R + au -> R + i [7.1.19/6.4.148/vArttika] -> phale [6.1.87]
2-2 : R + auT -> R + au [1.3.3] -> R + i [7.1.19] -> phale [6.1.87]
1-3 : R + Jas -> R + as [1.3.7] -> R + shi [7.1.20] -> phalA + n [7.1.72] + i [1.3.7] -> phalAn [6.4.7] + i -> phalAni
2-3 : R + shas -> R + as [1.3.7] -> R + shi [7.1.20] -> phalA + n [7.1.72] + i [1.3.7] -> phalAn [6.4.7] + i -> phalAni
8-1 : he phalam [6.1.107] -> he phala [6.1.69]
8-2 : he phale (as in 1-2)
8-3 : he phalAni (as in 1-3)

4.4 "A" ENDING MASCULINE NOUNS AND ADJECTIVES

This is a comparatively smaller class. The declension of the word `vishvapA' (R), taken as a representative of this class, is as follows -

2-3 : R + shas -> vishvap [6.4.140/1.1.52] + as [1.3.7] -> vishvapaH
3-1 : R + TA -> vishvap [6.4.140/1.1.52] + a [1.3.7] -> vishvapA
3-3 : R + bhis -> vishvapAbhiH [1.3.8]
4-1 : R + ~Ne -> vishvap [6.4.140/1.1.52] + e [1.3.7] -> vishvape
5-1 : R + ~Nasi -> vishvap [6.4.140/1.1.52] + as [1.3.7] -> vishvapaH
6-1 : R + ~Nas -> vishvap [6.4.140/1.1.52] + as [1.3.7] -> vishvapaH
6-2, 7-2 : R + os -> vishvap [6.4.140/1.1.52] + os -> vishvapoH
6-3 : R + Am -> vishvap [6.4.140/1.1.52] + Am -> vishvapAm
7-1 : R + ~Ni -> vishvap [6.4.140/1.1.52] + i [1.3.7] -> vishvapi

The rest are like `a' ending masculine nominal bases. The exceptions to these are those `A' ending masculine bases which are NOT of the `Bha' type and do not end in a root, for example the word `hAhA' (R), which is declined differently in the following cases -

2-3 : R + shas -> R + an[6.1.103] -> hAhAn
3-1 : R + TA -> R + a[1.3.7] -> hAhA [6.1.101]
4-1 : R + ~Ne -> R + e[1.3.7] -> hAhai[6.1.88]
5-1 : R + ~Nasi -> R + as[1.3.8/1.3.2] -> hAhAH[6.1.101]
6-1 : R + ~Nas ->R + as[1.3.7] -> hAhAH
6-2,7-2 : R + os -> hAhauH[6.1.88]
7-1 : R + ~Ni -> hAhA + i[1.3.7] -> hAhe[6.1.87]

The rest operate by the general rule.

4.5 "A" ENDING FEMININES

`A' ending feminines form a very large class formed by the addition of feminine affixes. The inflection of these stems has maintained itself with little change through the history of the language, being almost the same in the Vedas as later. The declension process of this class (represented by `sItA' (R)) is as follows -

1-1 : R + sU -> R + 0 [6.1.87] -> sItA
1-2, 2-2 : R + au (1-2), auT (2-2) -> R + shi [7.1.17] -> R + i [1.3.7] -> sIte [6.1.87]
1-3, 2-3 : R + Jas (1-3), shas (2-3) -> R + as -> sItAH
2-1 : R + am -> sItAm [6.1.68]
3-1 : R + TA -> sIte [7.3.105] + a [1.3.7] -> sItayA [6.1.78]
4-1 : R + ~Ne -> R + e [1.3.7] -> sItayA [7.3.113] + e -> sItAyai [6.1.88]
5-1, 6-1 : R + ~Nasi (5-1), ~Nas (6-1) -> sItA + as -> sItAya + aH -> sItAyAH [6.1.68]
6-2, 7-2 : R + os -> sIte [7.3.105] + oH -> sItayoH [6.1.78]
6-3 : R + Am -> R + nAm [7.1.54] -> sItAnAm
7-1 : R + ~Ni -> sItAya [7.3.113] + Am [7.3.116] -> sItAyAm [6.1.68]
8-1 : he R + sU -> he sIte + 0 [6.1.69]
8-2, 8-3 : like 1-2 and 1-3 respectively.

4.6 "A" ENDING PRONOUNS

Let `sarvA' (R) be the representative of this class. We have -

4-1 : R + ~Ne -> sarvasya [7.3.114] (R1) + e [1.3.8] -> sarvasyai [6.1.88]
5-1 : R + ~Nasi -> R1 + as [1.3.8/1.3.2] -> sarvasyAH [6.1.88] (R2)
6-1 : R + ~Nas -> R1 + as [1.3.8] -> R2
6-3 : R + Am -> R + s [7.1.52] + Am -> sarvAsAm [6.1.68]
7-1 : R + ~Ni -> R1 + Am [7.3.116] -> sarvasyAm [6.1.68]

Other forms operate like `sItA'.

5. THE PROGRAM MODEL

The present program model uses the AI language Prolog (PDC-3.2) to implement the Paninian subanta rules as explained in Bhattojidiksita's Siddhanta Kaumudi. The GIST technology developed at C-DAC, Pune, has been used for transliteration. For each input, called a prAtipadika, the program generates a maximum of 24 (21+3) shabdarUpas; on pressing `Enter' on a word form, it also shows the parsed constituents and the details of processing. The rules for processing subanta padas are scattered in the Ashtadhyayi, mostly in chapters 7-1, 7-2, 7-3, 6-1 and 6-4. However, these rules have been treated together in the Subanta chapter of Siddhanta Kaumudi, from rule number 177 to 446. The explanation of these rules provided by the Kaumudikara was basically for pedagogical purposes, but it presupposes some basic knowledge of the traditional system. Therefore, the machine implementation of these rules required the implementer to have a thorough acquaintance with the system of Panini, modern linguistics, and an AI language like Prolog.

5.1 PROCEDURAL DETAILS

On running `nomor.exe', the program asks for an input, a nominal base, technically called prAtipadika in Panini's meta-language, and flashes a set of Roman symbols at the right-hand corner of the screen to handle the input-output mechanism in Roman as well (in case facilities for the earlier mentioned Indian scripts do not exist). On pressing `Enter', the script program first converts the input into its Nagari equivalent and searches for it in a program-specific database, which returns other required information such as the category and the gender of the nominal base. If a nominal base is used in all three genders, the program asks the user, through a small menu, to enter the gender for which he/she would like to see the paradigm of word forms. If the input nominal base is not a member of the database of nominal bases, the program queries the user about the category and the gender/s associated with the input through small menus.

The information pertinent to the prAtipadika (either returned from the database or sought from the user) is fed into the morphological analyzer and thence to the lexicon of case affixes. These two components constitute the core of the program model. They embody the grammar section containing the Prolog implementation of the Paninian rules for nominal inflectional morphology. The morphological analyzer first extracts the final character/string of the nominal base with the help of two user-defined predicates, `last_str' and `word_tail'. The former returns the last string (of length N) and the remaining string; it is useful for extracting strings of variable length for identification and processing. The latter splits the input string into the final string of length one (N=1) and the remaining string; it is particularly useful for vowel-ending stems. The standard Prolog predicates `frontstr' and `str_len' are called in these and other such user-defined predicates.
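
The behaviour described for the two string predicates can be sketched as follows. This is a Python illustration only, not the original Prolog clauses; it reuses the predicate names from the text but assumes a plain character string, whereas the program works on its internal Nagari representation, where a "character" may not correspond to one Roman letter.

```python
def last_str(word, n):
    """Split word into (final n characters, remaining front part)."""
    if not 0 <= n <= len(word):
        raise ValueError("n must lie between 0 and the length of the word")
    cut = len(word) - n
    return word[cut:], word[:cut]


def word_tail(word):
    """Special case of last_str with n = 1."""
    return last_str(word, 1)
```

For instance, `last_str("rAma", 2)` returns `("ma", "rA")` and `word_tail("rAma")` returns `("a", "rAm")`.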

The terminating string can be a vowel represented by its mAtrA, a consonant represented by its marker called `halanta', a consonant cluster, a conveniently isolated part of the input string, or the whole string itself. The clause section of the program has specifications for vowels (mAtrA), consonants, and other specific constituents as final strings of the nominal base. The program searches for the terminating string/whole word among the clauses and returns the change/s (if any) in the nominal base before affixation. If the program does not find any clause for the said final string/whole word, the last clause (that is, the general rule) will bind the variables with values.
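
The clause search with a general-rule fallback resembles an ordered list of patterns in which the first match wins. The Python sketch below illustrates the idea for one cell only, the base change before the dual affix `os' (6-2, 7-2), using the changes and rule numbers cited in Section 4. The table layout, the gender codes and the function name `modify_base` are assumptions made for this illustration and do not reproduce the program's actual clauses.

```python
# Ordered clauses: (final string, gender or None, change, rule label).
# More specific clauses come first; the empty ending matches anything.
OS_CLAUSES = [
    ("vishvapA", None, lambda b: b[:-1], "6.4.140"),        # whole word
    ("A", "f", lambda b: b[:-1] + "e", "7.3.105"),          # sItA -> sIte
    ("a", None, lambda b: b[:-1] + "e", "7.3.104"),         # rAma -> rAme
    ("", None, lambda b: b, "general rule"),                # fallback
]


def modify_base(base, gender):
    """Return (modified base, rule label) from the first matching clause."""
    for ending, clause_gender, change, rule in OS_CLAUSES:
        if base.endswith(ending) and clause_gender in (None, gender):
            return change(base), rule
    raise AssertionError("unreachable: the last clause always matches")
```

The masculine base `hAhA' matches neither the feminine `A' clause nor the `a' clause, so `modify_base("hAhA", "m")` falls through to the general rule and returns the base unchanged.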

In the next stage, the program will go to the lexicon of case affixes and look for a possible match among various clauses. For each category of the terminating string/character, there would be a set of case affixes with their appropriate reduced forms, descriptive details and rules etc. Otherwise, a general set of affixes and other values will return the outputs.

The next stage involves the all-important morphophonemic combinations to arrive at the macro forms from the micro constituent elements. Here, the modified (modifications if any) stem and the reduced stem (reductions if any) pass through morphemic processing under specific rule-governed environments to yield the final output called `pada' or syntactic word, finished word etc. This output will once again pass through the `Script' program for re-conversion into its original form. This operation will be repeated a maximum of 24 times (21+3 case conditions). All these components, that is, the script, the morphological analyzer, the case_aff_lexicon and the sandhi component, operate within a super component called the `sound_class', which has the basic Paninian pratyAhAras or sigla denoting important sound classes used in the various components.

After this chain of multiple processing, combinations and substitutions, the final word forms are displayed in a paradigm arranged in case-number order. If the user wishes to see the parsed constituents, descriptive details etc., the cursor can be moved onto the appropriate word form and `Enter' pressed, upon which a full-size window displays the details based on the Ashtadhyayi formalism for nominal morphology.

5.2 THE ACCESSORY PROGRAMS

As the diagram in Section 1 shows, the model contains some accessory programs apart from what can be called the core part of the program model (the morphological analyzer and the case_aff_lexicon). The following list enumerates them in order:

  • script.pro : the program which converts a Roman input (through a given set of symbols) into Devanagari equivalents for processing, and finally reconverts the output (which will be in Devanagari equivalents) into the original Roman equivalents for display;
  • sound_cl.pro : the sound class program, which contains conveniently selected Paninian pratyAhAras or sigla to be called in various programs;
  • nom_lex.pro : the program-specific database of nominal bases, which returns associated information with respect to class/category and gender;
  • mph_comb.pro : the rules of morphophonemic combination, called at the end to combine the micro constituents.

5.2.1 THE SCRIPT PROGRAM

The program script.pro accepts an input in Roman, for which the symbols are flashed on the screen. It converts this input into its Nagari equivalent for internal processing, and finally reconverts the result back into Roman for display. Inputs in Nagari or the other mentioned scripts are allowed to pass through unchanged.
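
The round trip between Roman and Nagari can be sketched as below. This is a Python illustration under clearly stated assumptions: it targets Unicode Devanagari, whereas the original program relied on GIST; it covers only a subset of the symbols in Appendix I; and all table and function names are hypothetical. The key point it shows is that a consonant carries an inherent `a', so a following vowel becomes a mAtrA and a consonant with no following vowel takes the halanta.

```python
VOWELS = {"a": "अ", "A": "आ", "i": "इ", "I": "ई", "u": "उ", "U": "ऊ",
          "RRi": "ऋ", "e": "ए", "ai": "ऐ", "o": "ओ", "au": "औ"}
MATRAS = {"a": "", "A": "ा", "i": "ि", "I": "ी", "u": "ु", "U": "ू",
          "RRi": "ृ", "e": "े", "ai": "ै", "o": "ो", "au": "ौ"}
CONSONANTS = {"k": "क", "kh": "ख", "g": "ग", "gh": "घ", "~N": "ङ",
              "ch": "च", "Ch": "छ", "j": "ज", "jh": "झ", "~n": "ञ",
              "T": "ट", "Th": "ठ", "D": "ड", "Dh": "ढ", "N": "ण",
              "t": "त", "th": "थ", "d": "द", "dh": "ध", "n": "न",
              "p": "प", "ph": "फ", "b": "ब", "bh": "भ", "m": "म",
              "y": "य", "r": "र", "l": "ल", "v": "व",
              "sh": "श", "Sh": "ष", "s": "स", "h": "ह"}
SIGNS = {"M": "ं", "H": "ः"}
VIRAMA = "्"   # the halanta
# Longest symbols first, so that `kh' is not read as `k' + `h'.
TOKENS = sorted([*VOWELS, *CONSONANTS, *SIGNS], key=len, reverse=True)


def tokenize(text):
    """Split Roman input into symbols by greedy longest match."""
    out, i = [], 0
    while i < len(text):
        for tok in TOKENS:
            if text.startswith(tok, i):
                out.append(tok)
                i += len(tok)
                break
        else:
            raise ValueError(f"unknown symbol at position {i}: {text[i]!r}")
    return out


def to_nagari(text):
    toks, out = tokenize(text), []
    for k, tok in enumerate(toks):
        if tok in CONSONANTS:
            out.append(CONSONANTS[tok])
            nxt = toks[k + 1] if k + 1 < len(toks) else None
            if nxt not in VOWELS:           # no vowel follows: add halanta
                out.append(VIRAMA)
        elif tok in VOWELS:
            prev = toks[k - 1] if k > 0 else None
            out.append(MATRAS[tok] if prev in CONSONANTS else VOWELS[tok])
        else:
            out.append(SIGNS[tok])
    return "".join(out)


REV_VOWELS = {v: k for k, v in VOWELS.items()}
REV_MATRAS = {v: k for k, v in MATRAS.items() if v}
REV_CONSONANTS = {v: k for k, v in CONSONANTS.items()}
REV_SIGNS = {v: k for k, v in SIGNS.items()}


def to_roman(text):
    out = []
    for ch in text:
        if ch in REV_CONSONANTS:
            out.append(REV_CONSONANTS[ch] + "a")     # inherent `a'
        elif ch in REV_MATRAS:
            out[-1] = out[-1][:-1] + REV_MATRAS[ch]  # replace inherent `a'
        elif ch == VIRAMA:
            out[-1] = out[-1][:-1]                   # drop inherent `a'
        elif ch in REV_VOWELS:
            out.append(REV_VOWELS[ch])
        elif ch in REV_SIGNS:
            out.append(REV_SIGNS[ch])
        else:
            raise ValueError(f"unsupported character: {ch!r}")
    return "".join(out)
```

In this sketch a form such as `rAmaH' survives the round trip unchanged, which is the property the script component needs for display.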

5.2.2 THE SOUND CLASSES

Panini's sounds are given in a sound catalog called the shivasUtras, which are 14 in number. These are used to generate pratyAhAras or sigla, 42 of which have been used by Panini in the Ashtadhyayi. There were two options for implementing the shivasUtras with respect to the needs of the main program. One was to write the rules in such a way that the program could generate the list of elements in the required siglum. The other was to maintain a small database of these and other sound classes used in the various programs. The former option would have been too arduous and time-consuming an exercise from the point of view of the present work, and it would also have considerably slowed down processing. Hence the preference for the second option. In sound_cl.pro, the important lists are those of vowels, consonants, mAtrAs, and classes 1-5 containing consonants based on their place and manner of articulation, like velar, palatal, retroflex, dental, labial, semivowels, nasals, stops, aspirates, non-aspirates, voiced and unvoiced, sibilants etc.
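
To make the two options concrete, the Python sketch below expands a siglum from the 14 shivasUtras (the first option, which the author did not adopt) and then stores a few expansions in a fixed table (closer to the second option). It is an illustration only: the ITRANS spellings, the function name `pratyahara` and the table `SOUND_CLASSES` are assumptions, and the ambiguity of the marker `N', which closes two sutras, is resolved by simply taking the first occurrence after the starting sound.

```python
# Each entry is (sounds, marker); the marker itself is never collected.
SHIVA_SUTRAS = [
    (["a", "i", "u"], "N"),
    (["RRi", "LLi"], "k"),
    (["e", "o"], "~N"),
    (["ai", "au"], "ch"),
    (["h", "y", "v", "r"], "T"),
    (["l"], "N"),
    (["~n", "m", "~N", "N", "n"], "m"),
    (["jh", "bh"], "~n"),
    (["gh", "Dh", "dh"], "Sh"),
    (["j", "b", "g", "D", "d"], "sh"),
    (["kh", "ph", "Ch", "Th", "th", "ch", "T", "t"], "v"),
    (["k", "p"], "y"),
    (["sh", "Sh", "s"], "r"),
    (["h"], "l"),
]


def pratyahara(start, marker):
    """List the sounds from `start' up to the sutra closed by `marker'."""
    sounds, collecting = [], False
    for group, closing in SHIVA_SUTRAS:
        for sound in group:
            if sound == start:
                collecting = True
            # `h' occurs in two sutras; keep it only once.
            if collecting and sound not in sounds:
                sounds.append(sound)
        if collecting and closing == marker:
            return sounds
    raise ValueError(f"no siglum from {start!r} to {marker!r}")


# Option two in miniature: expand once, then look up by name.
SOUND_CLASSES = {
    "ach": pratyahara("a", "ch"),   # vowels
    "ik": pratyahara("i", "k"),
    "yaN": pratyahara("y", "N"),    # semivowels
    "hal": pratyahara("h", "l"),    # consonants
}
```

A lookup such as `SOUND_CLASSES["ik"]` then returns `["i", "u", "RRi", "LLi"]` without any traversal at run time, which is the speed advantage the text attributes to the database approach.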

5.2.3 THE NOMINAL DATABASE

Names and concepts in a language are rooted in their socio-cultural milieu. For example, while learning Sanskrit nominal inflection, an input like `John' or `Mary' would lead to an error, while inputs like `RAma', `KRRiShNa', `SItA', `Hari', `vRRikSha' etc., which are peculiar to the language in question or to the language/s related to it socio-culturally, would yield correct results. To make this distinction right at the beginning, and also to provide information related to class/category and gender, a small database has been provided with entries which are commonly used for teaching shabdarUpa. If an input is not a member of the said database, the program queries the user to elicit further information. If a nominal base has forms in more than one gender, the user has to select from the menu the gender in which the forms are to be seen. Only after this stage does the Prolog program go on to the next stage.
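
The lookup logic described here, with its two fallbacks to the user, can be sketched as follows. This Python illustration is not the original nom_lex.pro: the four entries, the field names and the callbacks `ask_category` and `ask_gender` (standing in for the program's menus) are all hypothetical.

```python
# Toy database; the entries and their fields are illustrative only.
NOMINAL_DB = {
    "rAma": {"category": "noun", "genders": ["m"]},
    "sItA": {"category": "noun", "genders": ["f"]},
    "phala": {"category": "noun", "genders": ["n"]},
    "sarva": {"category": "pronoun", "genders": ["m", "f", "n"]},
}
ALL_GENDERS = ["m", "f", "n"]


def lookup(base, ask_category, ask_gender):
    """Return category and gender, querying the user only when needed."""
    entry = NOMINAL_DB.get(base)
    if entry is None:
        # Unknown base: both pieces of information come from the user.
        return {"category": ask_category(base),
                "gender": ask_gender(base, ALL_GENDERS)}
    genders = entry["genders"]
    # A single recorded gender needs no menu; several do.
    gender = genders[0] if len(genders) == 1 else ask_gender(base, genders)
    return {"category": entry["category"], "gender": gender}
```

Passing the menus in as functions keeps the lookup itself free of any screen handling, so it can be exercised with fixed answers, for example `lookup("sarva", lambda b: "noun", lambda b, options: options[1])`.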

5.2.4 PROGRAM FOR EUPHONIC COMBINATIONS

All the morphological processes in the Ashtadhyayi take place by the operation of sandhi rules, for which there are specific utsargas and apavAdas (general operational rules and exceptions). Euphonic combinations are primarily of two types: external and internal.

The accessory program in question combines two phonemes or morphemes externally and also effects internal morphophonemic changes (like those of dental `n' and `s' to their retroflex counterparts) according to the requirements of the present research. Some of the more complex internal sandhi mechanisms within a pada were found to be difficult at this level of implementation and were hence avoided. After the external sandhi program `mph_comb.pro' has combined the micro elements into one larger unit, the latter is passed on to `n_s_retflx.pro', which looks for dental to cerebral change/s (if any) for `n' and `s'.
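
The dental to retroflex step can be approximated with two simplified checks, shown below in Python for illustration. This is not the content of n_s_retflx.pro. It assumes the stem and affix are still available separately, so that only an affix-initial `s' is tested; it uses the usual textbook summary of the conditions (for `n': a preceding r, RRi or Sh with only vowels, h, y, v, r, velars, labials or anusvara in between, and a following vowel or n, m, y, v; for `s': a preceding vowel other than a/A, or k, or r); and it ignores the finer exceptions of the grammar.

```python
VOWELS = {"a", "A", "i", "I", "u", "U", "RRi", "RRI", "LLi", "LLI",
          "e", "ai", "o", "au"}
# Multi-letter ITRANS symbols, longest first.
DIGRAPHS = ["RRi", "RRI", "LLi", "LLI", "ai", "au", "kh", "gh", "~N",
            "ch", "Ch", "jh", "~n", "Th", "Dh", "th", "dh", "ph", "bh",
            "sh", "Sh"]
N_TRIGGERS = {"r", "RRi", "RRI", "Sh"}
N_TRANSPARENT = VOWELS | {"h", "y", "v", "r", "M",
                          "k", "kh", "g", "gh", "~N",
                          "p", "ph", "b", "bh", "m"}
N_FOLLOWERS = VOWELS | {"n", "m", "y", "v"}
S_TRIGGERS = (VOWELS - {"a", "A"}) | {"k", "r"}


def tokenize(word):
    """Split an ITRANS string into sound symbols."""
    tokens, i = [], 0
    while i < len(word):
        for tok in DIGRAPHS:
            if word.startswith(tok, i):
                break
        else:
            tok = word[i]
        tokens.append(tok)
        i += len(tok)
    return tokens


def n_s_retroflex(stem, affix):
    """Join stem and affix, applying simplified s -> Sh and n -> N."""
    st, af = tokenize(stem), tokenize(affix)
    # s -> Sh (cf. 8.3.57): affix-initial, non-final s after a trigger.
    if len(af) > 1 and af[0] == "s" and st and st[-1] in S_TRIGGERS:
        af[0] = "Sh"
    tokens = st + af
    active = False            # True while a trigger is still in force
    for i, tok in enumerate(tokens):
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        if tok == "n" and active and nxt in N_FOLLOWERS:
            tokens[i] = "N"   # n -> N (cf. 8.4.2)
            active = False
        elif tok in N_TRIGGERS:
            active = True
        elif tok not in N_TRANSPARENT:
            active = False    # an intervening sound blocks the change
    return "".join(tokens)
```

Under these assumptions `n_s_retroflex("rAme", "na")` gives `rAmeNa` and `n_s_retroflex("rAme", "su")` gives `rAmeShu`, while `n_s_retroflex("sItA", "nAm")` leaves the dental in `sItAnAm`.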

6. THE MAIN PROGRAM

The main program consists of three files namely `call.pro', `morph.pro' (with separate files for vowel and consonant ending stems) and `c_aff_lex.pro'.

  • The first file, `call.pro', contains the main predicates, which make windows, menus etc., read the inputs and associated information (if any), call predicates from other files, and display the generated word forms along with the parsed constituents on the screen.
  • The second file, `morph.pro', examines the end character of the input string. Based on its type (various kinds of vowels and consonants), it effects modifications (if any) in the input string (nominal base). The output from this file becomes the first input (W1) to the morphophonemic component.
  • The third file, `c_aff_lex.pro', is a sort of lexicon of nominal case affixes based on the various types of end characters of a nominal base. After the morphological type of the nominal base has been identified by the previous file, this file assigns sets of values of case affixes to the modified nominal base depending on the number, gender and case.

Thus for each nominal base, there would be a maximum of 21+3=24 combinations. Outputs from this file, that is values of affixes, will go as the second input (W2) into the morphophonemic component which will combine W1 and W2 according to appropriate sandhi rules and send the output to the call.pro wherefrom the results are displayed.
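
The flow of W1 and W2 into the morphophonemic component can be pictured as a loop over the 24 case-number cells. The Python sketch below shows only this skeleton, for illustration: the three stages are passed in as functions, and the stand-ins used in the usage note are placeholders, not the rules of `morph.pro', `c_aff_lex.pro' or the sandhi program. All names are hypothetical.

```python
CASES = range(1, 9)      # cases 1-7, plus 8 for the vocative
NUMBERS = range(1, 4)    # singular, dual, plural


def generate_paradigm(base, gender, morph, affix_lexicon, combine):
    """Build {(case, number): form} by running the three stages per cell."""
    paradigm = {}
    for case in CASES:
        for number in NUMBERS:
            w1 = morph(base, gender, case, number)          # modified base
            w2 = affix_lexicon(base, gender, case, number)  # affix value
            paradigm[(case, number)] = combine(w1, w2)      # sandhi step
    return paradigm
```

A call with trivial stand-ins, such as `generate_paradigm("rAma", "m", lambda b, g, c, n: b, lambda b, g, c, n: f"+[{c}-{n}]", lambda w1, w2: w1 + w2)`, returns 24 entries and makes the division of labour between the stages visible without committing to any particular rule set.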

6.1 TECHNICAL/NON-TECHNICAL PROBLEMS

The program model works on the Paninian formalisms for the inflectional morphology of case affixes and their explanation as provided by Bhattojidiksita's Siddhanta Kaumudi (SK). The chapter SubantaprakaraNam in SK contains the Paninian rules for subanta forms collected in one place, along with commentary and explanations which are not entirely without ambiguities. Therefore, cross-references to other books, like Katre's edition of the Ashtadhyayi, Kale's grammar and a few school-level Sanskrit grammar books, were essential.

One of the technical problems was morphophonemic combination, internal and external. The multiple operations needed to combine elements at different stages have affected the execution speed of the program. Handling exceptions was another difficulty. Finally, providing a facility for the execution of the program on PCs without the GIST card facility was particularly problematic. Separate programs had to be written for sandhi as well as for the script.

6.2 LIMITATIONS

The program gives the paradigm of word forms correctly for most words, though many words for which the Paninian grammar provides rules are rarely used in commonplace Sanskrit. The parsed constituents and procedural details also follow the Paninian methodology as far as practicable. These details may not be of much use or interest to a child learner, but they certainly are to an advanced learner who has some background in the Paninian system.

The program allows input and output in the scripts of the major Indian languages or in Roman, for which the symbols are flashed on the screen at the point where the input nominal base is to be entered. The input has to be correct, that is, it should NOT be a mix of scripts (for example, a mixing of Devanagari and Roman symbols for entering the input nominal base), in which case the program may return wrong answers or fail. Secondly, the names, concepts etc. to be input as nominal bases should be peculiar to the Sanskrit language or to other Indian languages which are culturally on the same plane. Other inputs (if required) which are to be selected from menus should also be correct; otherwise the outputs may not be according to the Paninian formalism.

7. CONCLUSION

The paper thus attempts to present a computational system for the nominal inflectional morphology of Sanskrit by applying Prolog-based NL techniques. Though there are limitations on various counts as mentioned above, the resultant program model can be used in machine-aided language pedagogy and for NLP research purposes. It can also be suitably modified and adapted as a system intermediate to larger and more sophisticated NL systems for Indian languages besides Sanskrit, e.g. a parser for verb-less sutras for machine-aided interpretation of the sutraic literature in Sanskrit, an M(A)T System, or an NL Understanding System in the Indian multilingual context.


BIBLIOGRAPHY

Cardona, George, 1965. On translating and formalizing Paninian rules, Journal of Oriental Institute, Baroda, vol 14, 306-14.

Cardona, George, 1987. Panini: His work and its traditions (vols 1-3), first edn. 1987. Motilal Banarasidass, 1988.

Deshpande, Madhav M. 1992. Panini in the context of Modernity. Language and text. ed. R.N.Srivastava, et al., Kalinga Publications, Delhi.

Gal, A., Lapalme, G., Saint-Dizier, P. & Somers, H., 1991. Prolog for Natural Language Processing. John Wiley and Sons Ltd., West Sussex, England.

Grishman, R. 1986. Computational Linguistics: An Introduction (Studies in NLP, sponsored by ACL). Cambridge Univ. Press, New York Univ. 1986.

Joshi, S.D. 1969. Sentence structure according to Panini. Indian Antiquary

Kale, M.R. 1894. A Higher Sanskrit Grammar. Motilal Banarasidass, 1987, latest edition.

Kanthan, K.L. Formal language system of Panini. Chemical Bank, Information Technology Management, New York.

Kapoor, Kapil. 1991. Panini Vyakarana: Nature, Applicability and Organization. Course notes for NLP-91, IIT-Kanpur, 1991.

Katre, Sumitra, M. 1985. Astadhyayi of Panini. First Indian edn. Motilal Banarasidass,1989.

Rowe, Neil, C. 1988. Artificial Intelligence through Prolog. US Naval Post Graduate School, Prentice Hall, Inc.

Singh, G.V., Mishra, Mandan, Suryanarayan, K. 1992. The word morphology of Sanskrit. CPAL-2 proceedings of the second regional workshop, March 1992. IIT-Kanpur, ed. R.M.K. Sinha, pp 315-16.

Singh, G.V., Jha, Girish Nath. 1994. Indian Theory of Knowledge: An Artificial Intelligence perspective. Proceedings of the national level seminar on Inference mechanisms in shastras and computer science (Apr 9-10, 1994) held by Academy of Sanskrit Research, Melkote, Karnataka.

Vasu, S.C. 1891. Astadhyayi (Vols I & II)

Veda Varidhi Ramanujam, P. 1992. Computer Processing of Sanskrit (CPAL-2 proceedings of the second regional workshop, March 1992, IIT-Kanpur, ed. R.M.K. Sinha. pp 159-168)

Whitney, W.D. Sanskrit Grammar. Motilal Banarasidass, Delhi 1969.


Appendix I
Phonetic chart (ITRANS 5.0 - http://www.aczone.com/itrans/TRANS.TXT)

Vowels

a A i I u U RRi RRI LLi LLI e ai o au aM aH

Consonants

k kh g gh ~N
ch Ch j jh ~n
T Th D Dh N
t th d dh n
p ph b bh m
y r l v
sh Sh s h L
kSh j~n shr



Girish Nath Jha, Ph.D.
Special Centre for Sanskrit Studies
Jawaharlal Nehru University
New Delhi-110067, India
E-mail: gnjha@hotmail.com