LANGUAGE IN INDIA

Strength for Today and Bright Hope for Tomorrow

Volume 5 : 8 August 2005

Editor: M. S. Thirumalai, Ph.D.
Associate Editors: B. Mallikarjun, Ph.D.
         Sam Mohanlal, Ph.D.
         B. A. Sharada, Ph.D.
         A. R. Fatihi, Ph.D.


Copyright © 2004
M. S. Thirumalai


STUDY OF HINDI NOUN PHRASE MORPHOLOGY FOR
DEVELOPING A LINK GRAMMAR BASED PARSER
Shailly Goyal and Niladri Chatterjee


ABSTRACT

Development of Hindi Link Grammar has already been initiated [Goyal and Chatterjee, 2005. To appear in a forthcoming issue of LANGUAGE IN INDIA]. However, in that work simple sentence structures were considered, and the focus was on verb morphology only. This work considers the Noun Phrase Morphology of Hindi in detail, and suggests appropriate links by taking into account the variations in the Noun Phrase structures of English and Hindi.

1. INTRODUCTION

Link Grammar provides a systematic way of parsing sentences by establishing links between the constituent words of a sentence. Typically, these links capture the syntactic relationships between the words of a phrase, and also between different phrases in a sentence. These two types of links are termed "Intra-phrase" and "Inter-phrase" links, respectively. Although English Link Grammar (ELG) is fully developed [Sleator and Temperley, 1991], work on developing Hindi links has only just begun. In this work we focus on developing Intra-phrase links for Noun Phrases in Hindi.

Typically, a link grammar is developed by creating a dictionary of all the words in a language. By judging the roles of a particular word in different contexts, a list of possible linkages that can be associated with that word is ascertained. A sentence is said to be valid if all its words have their link bindings satisfied.
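
The validity condition described above can be illustrated with a minimal Python sketch. The three-word lexicon, the connector names and the function `bindings_satisfied` are hypothetical and chosen only for illustration. The sketch assumes that each word has a single fixed list of connectors, and it checks only that every connector is bound; a full link grammar also allows alternative connector sets per word and imposes further conditions on linkages that are not modelled here.

```python
from collections import Counter

# Toy lexicon (hypothetical): each word lists the connectors it must bind.
# "X+" looks for a partner to the right, "X-" for a partner to the left.
LEXICON = {
    "the": ["D+"],
    "cat": ["D-", "S+"],
    "runs": ["S-"],
}


def bindings_satisfied(words, links, lexicon=LEXICON):
    """Return True if every word's connectors are bound exactly once.

    `links` is a list of (left_index, right_index, label) triples.
    Words missing from the lexicon make the sentence invalid.
    """
    used = [Counter() for _ in words]
    for left, right, label in links:
        if not 0 <= left < right < len(words):
            return False  # a link must join an earlier word to a later one
        used[left][label + "+"] += 1   # left word offers a rightward connector
        used[right][label + "-"] += 1  # right word offers a leftward connector
    return all(
        word in lexicon and used[i] == Counter(lexicon[word])
        for i, word in enumerate(words)
    )


sentence = ["the", "cat", "runs"]
print(bindings_satisfied(sentence, [(0, 1, "D"), (1, 2, "S")]))
print(bindings_satisfied(sentence, [(0, 1, "D")]))
```

The first call binds every connector; the second leaves the 'S' connectors of "cat" and "runs" unbound, so the sentence is rejected.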

However, creating such an exhaustive dictionary for any language is arduous and time-consuming. A simpler approach may be to follow the English links that are already developed and available for developing a link grammar-based parser for Hindi. However, variations of the syntactic rules of the two languages make straightforward utilization of the English links in Hindi difficult, if not impossible. Consequently, appropriate modifications need to be made.

This work focuses on identifying the discrepancies in Noun Phrase morphology between English and Hindi. Any such work almost inevitably demands a systematic analysis of the morphologies of the two languages under consideration. This paper is therefore organized as follows. Section 2 provides a description of different English Noun Phrase links as given in [Sleator and Temperley, 1991]. Difficulties in straightforward adaptation of English links for Hindi are discussed in Section 3. Section 4 discusses different Hindi Noun Phrase structures, as given in [Singh, 2003], and explains how English links need to be modified in order to capture the nuances of Hindi syntax. The proposed links are illustrated with examples.

2. STUDY OF ENGLISH NOUN PHRASES AND LINKS

An English Noun Phrase (NP) typically consists of a noun/pronoun, called the "head" of the NP. It may further have the following optional constituents [Singh, 2003]:

  • Determiner(s), e.g. 'the', 'an', 'this', 'all', 'one', 'some' and also genitive cases.
  • Pre-modifier(s), such as adjectives, participles.
  • Post-modifier(s), such as preposition phrase, to-infinitive.

Each of the constituents along with its related links is discussed below:

  • Determiners: English nouns may take more than one determiner. The most important determiner links in ELG are 'D', 'DD' and 'DG'. Details of these links are given below:
    1. D: D connects determiners to nouns. Suffixes following 'D' link are used to differentiate between singular, mass and plural nouns. For example, in the sentence, "Many people were present", 'many' and 'people' are connected with 'D' link.
    2. DD: DD is used to connect definite determiners ("the", "his", "Ramesh's") to number expressions and adjectives acting as nouns. It is also used to connect determiners to adjectives when they are being used as self-contained noun-phrases. For example, in the sentence "His three sisters are coming next week", 'his' and 'three' are linked via 'DD' link.
    3. DG: DG connects the determiner "the" with proper nouns. For example, in the sentence "The Emir of Kuwait died today", 'the' is connected to 'Emir' through 'DG' link.
    4. Genitive Links: Genitives in ELG are connected through 'YS' and 'YP' links. YS connects nouns to the possessive suffix "'s" and YP is used in possessive constructions to connect plural noun forms ending in 's' to " ' ". " 's / ' " then acts as a determiner, making a 'D' link to a noun. For illustration, in the sentences "I like Ramesh's book" and "Workers' demands have been neglected", there is a 'YS' connection between 'Ramesh' and ' 's ' and a 'YP' link between 'workers' and ' ' ', respectively.
  • Pre-modifiers: In English, pre-modifiers come after the determiners. Three different parts-of-speech can pre-modify the noun: Adjective, Participle (present or past), and Noun. Links for these pre-modifiers are discussed below.
    1. A: A connects pre-noun ("attributive") adjectives to nouns. Any number of adjectives can be used; all connect to the noun. This link connects pre-modifying adjectives as well as participles. For illustration, consider the sentences, "He ate a big apple", "I like the sleeping baby" and "Rotten apples are harmful". In each of these sentences, the modifier and its noun ('big' and 'apple', 'sleeping' and 'baby', 'rotten' and 'apples') are linked through the 'A' link.
    2. AN: AN connects noun-modifiers to nouns. In the sentence, "This lady doctor is good", 'AN' connects 'lady' and 'doctor'.
    3. EA: EA connects adverbs to adjectives. For example, in the sentence, "He has done a very good job", EA link connects 'very' and 'good'.
    4. G: G connects proper nouns together in a series. For example, consider the sentence, "The president of India is Dr. Avul Pakir Jainulabdeen Abdul Kalam". In this sentence, the consecutive words of the proper name are connected by 'G' links.
    5. GN: GN is a link used in expressions where proper nouns are introduced by a common noun, with or without a determiner. For example, in the sentence, "The famous painter Anjolie will come tonight", 'GN' link connects the common noun 'painter' with the proper noun 'Anjolie'.
  • Post Modifiers: In English sentences, post modifiers are used in many forms; such as, adjective (e.g., the President designate), preposition phrase (e.g., the girl in the room), participle (e.g., the train bound for Delhi), relative clause (e.g., the house where I was born). Most important links used for post-nominal modifiers are briefly discussed below:
    1. M: M connects nouns to various kinds of post-nominal modifiers without commas, such as prepositional phrases, participle modifiers, prepositional relatives, and possessive relatives. Although the 'M' link is used for most kinds of post-nominal modifiers, relative clauses are an exception (relative clauses use the 'R' and 'B' links). Different suffixes after the 'M' link are used to distinguish between different types of modifiers: 'p' is used for prepositional phrases, 'a' is used for adjectives, 'v' and 'g' are used for participle modifiers, etc. As an illustration, consider the following sentences: "The girl in the garden is beautiful", "People unhappy about the economy are angry", "The apple eaten by the child was sweet". In these sentences, each noun is connected to its post-modifier ('girl' to 'in', 'people' to 'unhappy', 'apple' to 'eaten') through the appropriate 'M' link.
    2. MX: MX connects nouns to post-nominal noun modifiers surrounded by commas. Similar to 'M' connective, the 'MX' connectors have different subscripts depending on the kind of modifying phrase.
    3. TH: TH connects words that take "that [clause]" complements with the word "that". These include verbs, nouns and adjectives. In this work, we consider only noun "that" clause. For example, in the sentence "Your assertion that I knew her was wrong", 'assertion' is connected to 'that' with 'TH' link.

There are several other links like R, RS, B, C etc. which are used for English Noun Phrases. However, space limitation prohibits us from discussing all those links.
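
The genitive links described above ('YS' or 'YP' from the possessor noun to the possessive marker, followed by a 'D' link from the marker to the head noun) can be sketched in Python as follows. The function name `genitive_links` and the representation of links as (left word, right word, label) triples are assumptions made for illustration; the number suffixes that follow the 'D' link are omitted.

```python
def genitive_links(possessor, head):
    """Links for "<possessor> <head>", e.g. "Ramesh's book".

    Returns (tokens, links); each link is (left_token, right_token, label).
    """
    if possessor.endswith("'s"):
        # Noun followed by the possessive suffix 's: joined by 'YS'.
        noun, marker, label = possessor[:-2], "'s", "YS"
    elif possessor.endswith("s'"):
        # Plural noun ending in 's' followed by a bare apostrophe: 'YP'.
        noun, marker, label = possessor[:-1], "'", "YP"
    else:
        raise ValueError("possessor must end in 's or s'")
    tokens = [noun, marker, head]
    # The marker then behaves like a determiner of the head noun.
    links = [(noun, marker, label), (marker, head, "D")]
    return tokens, links


print(genitive_links("Ramesh's", "book"))
print(genitive_links("Workers'", "demands"))
```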

3. DISSIMILARITIES IN ENGLISH AND HINDI NPs

Straightforward application of the English Intra-phrase Noun Phrase links to Hindi suffers from several difficulties. The most important ones are discussed in the following subsections.

3.1 Usage of Articles

Usage of articles in English Noun Phrases is governed by certain rules. Hindi, on the other hand, does not have articles. Generally, nouns not preceded by "ek" are considered definite, and nouns preceded by "ek" are treated as either indefinite or quantitative [Singh, 2003].

3.2 Dissimilarities in English-Hindi Adjectives

  • Inflected adjectives: English adjectives are necessarily uninflected - they undergo no morphological changes with the variations in the nouns they qualify. But Hindi adjectives may be inflected as well as uninflected [Sastri and Apte, 1968]. For example: "achchhaa" is an inflected adjective (e.g., achchhaa laDkaa, achchhii laDkii, achchhe laDke), while "sundar" is an uninflected adjective (e.g., sundar laDkaa, sundar laDkii, sundar laDke).
  • Different types of adjectives in Hindi: Hindi adjectives of certain forms do not have any equivalent structure in English [Singh, 2003]. For example, reduplicated adjective (e.g., achchhe-achchhe pakwaan), saa-adjective (e.g., laal-saa kapDaa), waalaa-adjective (e.g., saamne waalii khiDkii).
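
The contrast between inflected and uninflected adjectives can be sketched in Python as follows. The function `inflect_adjective` is hypothetical. It makes two simplifying assumptions that the text above does not state as rules: that exactly the adjectives ending in "aa" (in this romanization) are inflected, and that the choice among the endings aa/e/ii follows the same pattern as kaa/ke/kii in Table 1.

```python
def inflect_adjective(adjective, gender, number, has_case_ending=False):
    """Inflect a romanized Hindi adjective to agree with its head noun.

    gender: "m" or "f"; number: "s" or "p";
    has_case_ending: True if the head noun is followed by a case ending.
    """
    if gender not in ("m", "f") or number not in ("s", "p"):
        raise ValueError("gender must be 'm'/'f' and number 's'/'p'")
    if not adjective.endswith("aa"):
        return adjective        # uninflected, e.g. "sundar"
    stem = adjective[:-2]
    if gender == "f":
        return stem + "ii"      # achchhii laDkii
    if number == "s" and not has_case_ending:
        return stem + "aa"      # achchhaa laDkaa
    return stem + "e"           # achchhe laDke


for gender, number, noun in [("m", "s", "laDkaa"), ("f", "s", "laDkii"), ("m", "p", "laDke")]:
    print(inflect_adjective("achchhaa", gender, number), noun)
    print(inflect_adjective("sundar", gender, number), noun)
```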

3.3 Post-modifiers

Pre-modifiers as well as post-modifiers are used in English (e.g., "blue-eyed girl", "the girl with blue eyes"), whereas only pre-modifiers are used in Hindi (e.g., niilii aankhoon waalii laDkii). In Hindi, post-modifiers are used only in the form of relative clause or ki-clause [Singh, 2003] (e.g. "yah tathya ki tumne jhooth bolaa bhulaaya nahii jaa saktaa").

3.4 Dependence of Modifiers on Gender, Number and Case-ending of Head Noun

Like Hindi adjectives, morphology of other pre-nominal modifiers (such as, genitives, participle modifier) varies with variations in head noun. In this subsection, we discuss the variations for genitives in detail. Similar variations are observed for other modifiers.

In Hindi, genitives are indicated with kaa/ke/kii as morpho-word. Choice of kaa/ke/kii depends on the gender, number and case ending of the head noun. Table 1 explains and illustrates the usage of kaa/ke/kii for different variations of head noun.

TABLE 1: Usage of kaa/ke/kii in genitive case

Gender of head noun | Number of head noun | Case-ending of head noun | kaa/ke/kii | Example(s)
Masculine           | Singular            | Absent                   | kaa        | laDke kaa bhai
Masculine           | Plural              | Absent                   | ke         | laDke ke bhai
Masculine           | Don't care          | Present                  | ke         | laDke ke bhai ne; laDke ke bhaiyoon ne
Feminine            | Don't care          | Don't care               | kii        | laDke kii behan; laDke kii behan ne; laDke kii behanon ne; laDke kii behanein
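
The decision logic of Table 1 can be written as a short Python function. The name `genitive_marker` and the single-letter codes for gender and number are assumptions made for illustration; the rows themselves are taken directly from the table.

```python
def genitive_marker(gender, number, has_case_ending):
    """Choose kaa/ke/kii from the properties of the head noun (Table 1).

    gender: "m" or "f"; number: "s" or "p";
    has_case_ending: True if the head noun carries a case ending.
    """
    if gender not in ("m", "f") or number not in ("s", "p"):
        raise ValueError("gender must be 'm'/'f' and number 's'/'p'")
    if gender == "f":
        return "kii"   # feminine head: number and case ending do not matter
    if number == "s" and not has_case_ending:
        return "kaa"   # masculine singular without a case ending
    return "ke"        # masculine plural, or masculine with a case ending


print("laDke", genitive_marker("m", "s", False), "bhai")
print("laDke", genitive_marker("m", "s", True), "bhai ne")
print("laDke", genitive_marker("f", "p", True), "behanon ne")
```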

4. PROPOSED HINDI LINKS FOR NPs

Here we explain the Hindi links (H-links) that we propose corresponding to the above discussed English links. It may be noted that unlike the Verb Phrase links discussed in [Goyal and Chatterjee, 2005], direction for the Noun Phrase links can be specified since the relative position of various modifiers with respect to the head noun is fixed.

In the following links, the direction information is given by direction specifier '+' or '-', as is followed with respect to the English Link Grammar [Sleator and Temperley, 1991].

D Link: 'D' link connects Hindi determiner with the head noun. Like English 'D' link, this link is also followed by two suffixes to give number information of the head noun.

Genitive Links: As discussed in Section 3.4, case ending kaa/ke/kii is used to construct Hindi genitives. We propose the following two links for this construction:

  • J: 'J' H-link connects the Hindi nouns with case endings [Goyal and Chatterjee, 2005]. We propose a new suffix 'p' for 'J' H-link to mark the possessive case, i.e. 'Jp' will connect the nouns with the case ending kaa/ke/kii.
  • D1: 'D1' link is proposed to connect the case ending kaa/ke/kii with the head noun. Three suffixes may follow 'D1' H-link. The first suffix can be either 'm' or 'f', denoting the gender of the head noun. The second suffix gives the number information and thus it can be 's' (singular) or 'p' (plural). The third suffix, 'j', can be used as a case ending marker of the head noun.

Thus, the case ending kaa/ke/kii has (Jp- & D1+). Table 2 gives different 'D1' suffixes for kaa/ke/kii. Figure 1 gives an example of these links.

[Table 2: 'D1' H-link. Table image not reproduced; alt text: "case-link"]

[Figure 1: Example of Genitive Case. Image not reproduced; alt text: "jp-dimp"]
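
The composition of the 'D1' label and the resulting genitive linkage can be sketched in Python as follows. The helper names `d1_label` and `genitive_linkage` are hypothetical. The sketch assumes that the suffixes are simply concatenated in the order gender, number, case marker, and it always writes out both gender and number; the exact labels listed in Table 2 (for instance, how "don't care" values are written) may differ.

```python
def d1_label(gender, number, has_case_ending=False):
    """Build a 'D1' label: gender ('m'/'f'), number ('s'/'p'), optional 'j'."""
    if gender not in ("m", "f") or number not in ("s", "p"):
        raise ValueError("gender must be 'm'/'f' and number 's'/'p'")
    return "D1" + gender + number + ("j" if has_case_ending else "")


def genitive_linkage(possessor, marker, head, gender, number, has_case_ending=False):
    """Links for "<possessor> <kaa/ke/kii> <head>".

    The marker has the connectors (Jp- & D1+): 'Jp' to the noun on its
    left and 'D1' to the head noun on its right. Gender, number and case
    ending describe the head noun.
    """
    return [
        (possessor, marker, "Jp"),
        (marker, head, d1_label(gender, number, has_case_ending)),
    ]


# "laDke ke bhai": masculine plural head noun without a case ending.
print(genitive_linkage("laDke", "ke", "bhai", "m", "p"))
```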

Adjective Links: Below we discuss the proposed links for Hindi adjectives.

  • A: This H-link connects the adjectives with the noun. All adjectives are given the 'A+' H-link. As discussed earlier, Hindi adjectives can be inflected as well as uninflected. Since inflected adjectives depend on the gender, number and case ending of the head noun, the 'A' link is followed by three suffixes for adjective-noun agreement. These suffixes are the same as those for the genitive case. Figure 2 illustrates an example of this link.
  • PT1: In Hindi, the present participle form of adjectives uses the taa/te/tii form of the verb, followed by the morpho-word huaa/hue/huii. The 'PT1' link connects the taa/te/tii form of the verb with the morpho-word huaa/hue/huii. Further, huaa/hue/huii is connected to the head noun through the 'A' link. In this case also, adjective morphology depends on the noun. An example of this H-link is given in Figure 3.
    [Figure 2: Example of 'A' H-link. Image not reproduced; alt text: "ams, achhaa"]

    [Figure 3: Example of 'PT1' H-link. Image not reproduced; alt text: "roote hue bachchee"]

  • PP2: This link connects aa/e/ii form of the verb (see [Sastri and Apte, 1968]) with huaa/hue/huii. This formulation is used in the past participle form of modifiers. Like other noun modifier H-links, this too has three suffixes for adjective-noun agreement.
  • AR: Unlike English, Hindi has reduplicated adjectives. The 'AR' link is used in this case to connect the first occurrence of the adjective with its second occurrence. The second occurrence then makes an 'A' H-link with the head noun. No suffix is required for this H-link, since adjective-noun agreement is enforced through the 'A' link. An example of this H-link is given in Figure 4.
    [Figure 4: Example of 'AR' H-link. Image not reproduced; alt text: "main sundar"]

    [Figure 5: Example of 'AV' H-link. Image not reproduced; alt text: "saamne waali"]

  • AV: Hindi also has another type of adjective, the waalaa-adjective. The 'AV' H-link is used to connect waalaa/waale/waalii with the preceding word. This link is also followed by three suffixes to give the gender, number and case information of the head noun. Further, waalaa/waale/waalii is linked to the head noun through the 'A' link. Figure 5 gives an example of this link.
  • AS: saa-adjectives are also unique to Hindi. saa/se/sii is added after the Hindi adjective to mean "like". 'AS' link is proposed to be used to connect the adjective with saa/se/sii. Choice of saa/se/sii depends on the head noun, and thus 'AS' link is followed by three suffixes, as is the case with other Hindi adjective links.
  • AN, EA, G and GN: These H-links will have the same usage as in English Link Grammar.
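
The two-step pattern shared by several of the adjective links above (an inner link inside the modifier, then an 'A' link from the last modifier word to the head noun) can be sketched in Python as follows. The names `INNER_LINK` and `modifier_links` are hypothetical, and the agreement suffixes are omitted. 'PP2' and 'AS' are left out because the descriptions above do not spell out how the second word of those modifiers is linked to the head noun.

```python
# Inner link joining the first word of a two-word modifier to its second word.
INNER_LINK = {
    "present_participle": "PT1",  # taa/te/tii verb form + huaa/hue/huii
    "reduplicated": "AR",         # first occurrence + second occurrence
    "waalaa": "AV",               # preceding word + waalaa/waale/waalii
}


def modifier_links(modifier_words, head, kind="plain"):
    """Return (left_word, right_word, label) links for one pre-modifier."""
    if kind == "plain":
        (adjective,) = modifier_words  # a plain adjective is a single word
        return [(adjective, head, "A")]
    first, second = modifier_words
    return [(first, second, INNER_LINK[kind]), (second, head, "A")]


print(modifier_links(["sundar"], "laDkii"))
print(modifier_links(["roote", "hue"], "bachchee", "present_participle"))
print(modifier_links(["achchhe", "achchhe"], "pakwaan", "reduplicated"))
print(modifier_links(["saamne", "waalii"], "khiDkii", "waalaa"))
```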

Post-Modifier Links: In Hindi, post modifiers come either in the form of relative clause or as ki-clause. Due to lack of space, we discuss only ki-clause in this work.

  • TH and C: 'TH' H-link connects the head noun with ki. Further, ki is linked with the subject of the subordinate clause through 'C' link. As an example of these links, consider the linkage of the sentence given in Figure 6.

[Figure 6: Example of ki-clause. Image not reproduced; alt text: "apki manyata"]

It may be noted that like in ELG, 'TH' and 'C' links can be used in various constructions in Hindi Link Grammar also. Since the focus of this work is on Noun Phrase links, we omit the discussion of other usages of these links from this work.
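
The 'TH' and 'C' links for a ki-clause can be sketched in Python as follows. The function `ki_clause_links` is hypothetical and rests on a strong simplifying assumption: the head noun is the word immediately before "ki", and the subject of the subordinate clause is the word immediately after it. The example is the beginning of the sentence quoted in Section 3.3.

```python
def ki_clause_links(tokens):
    """Return the 'TH' and 'C' links around the first "ki" in `tokens`.

    Assumes the head noun directly precedes "ki" and the subject of the
    subordinate clause directly follows it.
    """
    position = tokens.index("ki")  # raises ValueError if "ki" is absent
    if position == 0 or position == len(tokens) - 1:
        raise ValueError("'ki' needs a word on both sides")
    head_noun = tokens[position - 1]
    subject = tokens[position + 1]
    return [(head_noun, "ki", "TH"), ("ki", subject, "C")]


print(ki_clause_links("yah tathya ki tumne jhooth bolaa".split()))
```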

5. CONCLUDING REMARKS

In this work, we studied the Hindi Noun Phrase morphology to develop links that may provide syntactic relationship between the words in a Noun Phrase. We have followed an Example Based approach where links given in ELG have been considered and suitably modified to capture the characteristics of Hindi morphology. Due to lack of space, many other variations (e.g. relative clause) could not be discussed here. We are currently working on developing algorithms for parsing Hindi sentences using the proposed Hindi Link Grammar.


REFERENCES

Goyal S. and Chatterjee N.: 2005, Towards Developing a Link Grammar Based Parser for Hindi, a paper submitted to Workshop on Morphology, IIT Bombay. To appear in LANGUAGE IN INDIA http://www.languageinindia.com.

Sastri S. and Apte B.: 1968, Hindi Grammar, Dakshina Bharat Hindi Prachar Sabha, Madras, India.

Singh S.: 2003, English-Hindi Translation Grammar, Prabhat Publication, New Delhi.

Sleator D. and Temperley D.: 1991, Parsing English with a Link Grammar, Computer Science technical report CMU-CS-91-196, Carnegie Mellon University.




Shailly Goyal and Niladri Chatterjee
Department of Mathematics
Indian Institute of Technology Delhi
Hauz Khas, New Delhi 110016
India.
C/o. LANGUAGE IN INDIA