ABSTRACT
Development of Hindi Link Grammar has already been initiated [Goyal and Chatterjee, 2005. To appear in a forthcoming issue of LANGUAGE IN INDIA]. However, in that work simple sentence structures were considered, and the focus was on verb morphology only. This work considers the Noun Phrase Morphology of Hindi in detail, and suggests appropriate links by taking into account the variations in the Noun Phrase structures of English and Hindi.
1. INTRODUCTION
Link Grammar provides a systematic way of parsing sentences by establishing links between the constituent words of a sentence. Typically, these links capture the syntactic relationships between the words of a phrase, and also between different phrases in a sentence. These two types of links are called "Intra-phrase" and "Inter-phrase" links, respectively. Although English Link Grammar (ELG) is fully developed [Sleator and Temperley, 1991], work towards developing Hindi links has just begun. In this work we focus on developing Intra-phrase links for Noun Phrases in Hindi.
Typically, a link grammar is developed by creating a dictionary of all the words in a language. By judging the roles of a particular word in different contexts, a list of possible linkages that can be associated with that word is ascertained. A sentence is said to be valid if all its words have their link bindings satisfied.
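The notion of satisfied link bindings can be made concrete with a minimal Python sketch that checks a proposed linkage against a toy dictionary. The dictionary entries, the function name is_satisfied, and the simplification that every listed connector is mandatory and used exactly once are assumptions made for illustration only; the sketch also ignores the planarity and connectivity conditions of the full formalism [Sleator and Temperley, 1991].

```python
from collections import Counter

# Toy dictionary (hypothetical entries): each word lists the connectors it
# requires. '+' means the partner lies to the right, '-' to the left.
TOY_DICT = {
    "many": ["D+"],
    "people": ["D-"],
}

def is_satisfied(words, links, dictionary=TOY_DICT):
    """Return True if `links` uses every required connector exactly once.

    `links` is a list of (left_index, right_index, label) triples.
    """
    used = [Counter() for _ in words]
    for left, right, label in links:
        # A link must join two distinct positions, left before right.
        if not (0 <= left < right < len(words)):
            return False
        used[left][label + "+"] += 1
        used[right][label + "-"] += 1
    # Every word must have used exactly the connectors its entry demands.
    return all(
        used[i] == Counter(dictionary.get(word, []))
        for i, word in enumerate(words)
    )

print(is_satisfied(["many", "people"], [(0, 1, "D")]))  # True
print(is_satisfied(["many", "people"], []))             # False
```

A real dictionary entry is a formula over connectors (with alternatives and optional parts) rather than a flat list, so this check covers only the simplest case.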
However, creating such an exhaustive dictionary for any language is arduous and time-consuming. A simpler approach may be to follow the English links that are already developed and available for developing a link grammar-based parser for Hindi. However, variations of the syntactic rules of the two languages make straightforward utilization of the English links in Hindi difficult, if not impossible. Consequently, appropriate modifications need to be made.
This work focuses on identifying the discrepancies in the Noun Phrase morphology between English and Hindi. Any such work almost inevitably demands a systematic analysis of the morphologies of the two languages under consideration. This paper is therefore organized as follows. Section 2 provides a description of different English Noun Phrase links as given in [Sleator and Temperley, 1991]. Difficulties in straightforward adaptation of English links for Hindi are discussed in Section 3. Section 4 discusses different Hindi Noun Phrase structures, as given in [Singh, 2003], and explains how English links need to be modified in order to capture the nuances of Hindi syntax. The proposed links are illustrated with examples.
2. STUDY OF ENGLISH NOUN PHRASES AND LINKS
An English Noun Phrase (NP) typically consists of a noun/pronoun, called the "head" of the NP. It may further have the following optional constituents [Singh, 2003]:
- Determiner(s), e.g. 'the', 'an', 'this', 'all', 'one', 'some' and also genitive cases.
- Pre-modifier(s), such as adjectives, participles.
- Post-modifier(s), such as preposition phrase, to-infinitive.
Each of the constituents along with its related links is discussed below:
- Determiners: English nouns may take more than one determiner. The most important determiner links in ELG are 'D', 'DD' and 'DG'. Details of these links are given below:
- D: D connects determiners to nouns. Suffixes following 'D' link are used to differentiate between singular, mass and plural nouns. For example, in the sentence, "Many people were present", 'many' and 'people' are connected with 'D' link.
- DD: DD is used to connect definite determiners ("the", "his", "Ramesh's") to number expressions and adjectives acting as nouns. It is also used to connect determiners to adjectives when they are being used as self-contained noun-phrases. For example, in the sentence "His three sisters are coming next week", 'his' and 'three' are linked via 'DD' link.
- DG: DG connects the determiner "the" with proper nouns. For example, in the sentence "The Emir of Kuwait died today", 'the' is connected to 'Emir' through 'DG' link.
- Genitive Links: Genitives in ELG are connected through 'YS' and 'YP' links. YS connects nouns to the possessive suffix "'s" and YP is used in possessive constructions to connect plural noun forms ending in 's' to " ' ". " 's / ' " then acts as a determiner, making a 'D' link to a noun. For illustration, in the sentences "I like Ramesh's book" and "Workers' demands have been neglected", there is a 'YS' connection between 'Ramesh' and ' 's ' and a 'YP' link between 'workers' and ' ' ', respectively.
- Pre-modifiers: In English, pre-modifiers come after the determiners. Three different parts-of-speech can pre-modify the noun: Adjective, Participle (present or past), and Noun. Links for these pre-modifiers are discussed below.
- A: A connects pre-noun ("attributive") adjectives to nouns. Any number of adjectives can be used; all connect to the noun. This link connects pre-modifying adjectives as well as participles. For illustration, consider the sentences, "He ate a big apple", "I like the sleeping baby" and "Rotten apples are harmful". In these sentences, 'A' links connect 'big' to 'apple', 'sleeping' to 'baby' and 'rotten' to 'apples', respectively.
- AN: AN connects noun-modifiers to nouns. In the sentence, "This lady doctor is good", 'AN' connects 'lady' and 'doctor'.
- EA: EA connects adverbs to adjectives. For example, in the sentence, "He has done a very good job", EA link connects 'very' and 'good'.
- G: G link connects proper nouns together in a series. For example, consider the sentence, "The president of India is Dr. Avul Pakir Jainulabdeen Abdul Kalam". In this sentence, the italicized words are connected by 'G' links.
- GN: GN is a link used in expressions where proper nouns are introduced by a common noun, with or without a determiner. For example, in the sentence, "The famous painter Anjolie will come tonight", 'GN' link connects the common noun 'painter' with the proper noun 'Anjolie'.
- Post Modifiers: In English sentences, post modifiers are used in many forms; such as, adjective (e.g., the President designate), preposition phrase (e.g., the girl in the room), participle (e.g., the train bound for Delhi), relative clause (e.g., the house where I was born). Most important links used for post-nominal modifiers are briefly discussed below:
- M: M connects nouns to various kinds of post-nominal modifiers without commas, such as prepositional phrases, participle modifiers, prepositional relatives, and possessive relatives. Although 'M' link is used for most kinds of post-nominal modifiers, relative clauses are an exception (relative clauses use 'R' and 'B' links). Different suffixes after 'M' link are used to distinguish between different types of modifiers: 'p' is used for prepositional phrases, 'a' is used for adjectives, 'v' and 'g' are used for participle modifiers, etc. As an illustration consider the following sentences: "The girl in the garden is beautiful", "People unhappy about the economy are angry", "The apple eaten by the child was sweet". In these sentences, the italicized words are connected through the appropriate 'M' links.
- MX: MX connects nouns to post-nominal noun modifiers surrounded by commas. Similar to 'M' connective, the 'MX' connectors have different subscripts depending on the kind of modifying phrase.
- TH: TH connects words that take "that [clause]" complements with the word "that". These include verbs, nouns and adjectives. In this work, we consider only noun "that" clause. For example, in the sentence "Your assertion that I knew her was wrong", 'assertion' is connected to 'that' with 'TH' link.
There are several other links like R, RS, B, C etc. which are used for English Noun Phrases. However, space limitation prohibits us from discussing all those links.
3. DISSIMILARITIES IN ENGLISH AND HINDI NPs
Straightforward application of the English Intra-phrase Noun Phrase links to Hindi suffers from several difficulties. The most important ones are discussed in the following subsections.
3.1 Usage of Articles
Usage of articles in English Noun Phrases is governed by certain rules. Hindi, on the other hand, does not have articles. Generally, nouns not preceded by "ek" are considered definite, and nouns preceded by "ek" are treated as either indefinite or quantitative nouns [Singh, 2003].
3.2 Dissimilarities in English-Hindi Adjectives
- Inflected adjectives: English adjectives are necessarily uninflected - they undergo no morphological changes with the variations in the nouns they qualify. But Hindi adjectives may be inflected as well as uninflected [Sastri and Apte, 1968]. For example: "achchhaa" is an inflected adjective (e.g., achchhaa laDkaa, achchhii laDkii, achchhe laDke), while "sundar" is an uninflected adjective (e.g., sundar laDkaa, sundar laDkii, sundar laDke).
- Different types of adjectives in Hindi: Hindi adjectives of certain forms do not have any equivalent structure in English [Singh, 2003]. For example, reduplicated adjective (e.g., achchhe-achchhe pakwaan), saa-adjective (e.g., laal-saa kapDaa), waalaa-adjective (e.g., saamne waalii khiDkii).
3.3 Post-modifiers
Pre-modifiers as well as post-modifiers are used in English (e.g., "blue-eyed girl", "the girl with blue eyes"), whereas Hindi generally uses only pre-modifiers (e.g., niilii aankhoon waalii laDkii). In Hindi, post-modifiers occur only in the form of a relative clause or a ki-clause [Singh, 2003] (e.g. "yah tathya ki tumne jhooth bolaa bhulaaya nahii jaa saktaa").
3.4 Dependence of Modifiers on Gender, Number and Case-ending of Head Noun
Like Hindi adjectives, morphology of other pre-nominal modifiers (such as, genitives, participle modifier) varies with variations in head noun. In this subsection, we discuss the variations for genitives in detail. Similar variations are observed for other modifiers.
In Hindi, genitives are indicated with kaa/ke/kii as morpho-word. Choice of kaa/ke/kii depends on the gender, number and case ending of the head noun. Table 1 explains and illustrates the usage of kaa/ke/kii for different variations of head noun.
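The dependence of kaa/ke/kii on the head noun can be written as a small decision function. The following Python sketch encodes the rows of Table 1; the function name genitive_marker and the feature codes ("m"/"f", "sg"/"pl", and a boolean for the presence of a case ending) are assumptions introduced here for illustration and are not part of the proposed grammar.

```python
def genitive_marker(gender, number, has_case_ending):
    """Pick kaa/ke/kii from the features of the head noun (rows of Table 1)."""
    if gender == "f":
        # Feminine head noun: number and case ending do not matter.
        return "kii"
    if has_case_ending:
        # Masculine head noun followed by a case ending: number does not matter.
        return "ke"
    # Masculine head noun without a case ending: number decides.
    return "kaa" if number == "sg" else "ke"

# Examples taken from Table 1
print("laDke", genitive_marker("m", "sg", False), "bhai")
print("laDke", genitive_marker("m", "pl", True), "bhaiyoon ne")
print("laDke", genitive_marker("f", "pl", False), "behanein")
```

The same three features govern the other pre-nominal modifiers mentioned above, so a function of this shape could be reused for them with different endings.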
TABLE 1: Usage of kaa/ke/kii in genitive case
Gender of head noun | Number of head noun | Case-ending of head noun | kaa/ke/kii | Example(s)
Masculine | Singular | Absent | kaa | laDke kaa bhai
Masculine | Plural | Absent | ke | laDke ke bhai
Masculine | Don't care | Present | ke | laDke ke bhai ne, laDke ke bhaiyoon ne
Feminine | Don't care | Don't care | kii | laDke kii behan, laDke kii behan ne, laDke kii behanon ne, laDke kii behanein
4. PROPOSED HINDI LINKS FOR NPs
Here we explain the Hindi links (H-links) that we propose corresponding to the above discussed English links. It may be noted that unlike the Verb Phrase links discussed in [Goyal and Chatterjee, 2005], direction for the Noun Phrase links can be specified since the relative position of various modifiers with respect to the head noun is fixed.
In the following links, the direction information is given by direction specifier '+' or '-', as is followed with respect to the English Link Grammar [Sleator and Temperley, 1991].
D Link: 'D' link connects Hindi determiner with the head noun. Like English 'D' link, this link is also followed by two suffixes to give number information of the head noun.
Genitive Links: As discussed in Section 3.4, case ending kaa/ke/kii is used to construct Hindi genitives. We propose the following two links for this construction:
- J: 'J' H-link connects the Hindi nouns with case endings [Goyal and Chatterjee, 2005]. We propose a new suffix 'p' for 'J' H-link to mark the possessive case, i.e. 'Jp' will connect the nouns with the case ending kaa/ke/kii.
- D1: 'D1' link is proposed to connect the case ending kaa/ke/kii with the head noun. Three suffixes may follow 'D1' H-link. The first suffix can be either 'm' or 'f', denoting the gender of the head noun. The second suffix gives the number information and thus it can be 's' (singular) or 'p' (plural). The third suffix, 'j', can be used as a case ending marker of the head noun.
Thus, the case ending kaa/ke/kii has (Jp- & D1+). Table 2 gives different 'D1' suffixes for kaa/ke/kii. Figure 1 gives an example of these links.
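As a rough illustration of how the suffixes combine, the Python sketch below assembles a 'D1' label and the connector pair of the genitive morpho-word. The function names and the exact way the suffixes are concatenated (for example 'D1ms' or 'D1fpj') are assumptions made for illustration; how "don't care" values are written in the actual dictionary is not modelled here.

```python
def d1_label(gender, number, has_case_ending):
    """Build a 'D1' label: gender suffix, number suffix, optional 'j'."""
    if gender not in ("m", "f"):
        raise ValueError("gender must be 'm' or 'f'")
    if number not in ("s", "p"):
        raise ValueError("number must be 's' or 'p'")
    # 'j' marks that the head noun carries a case ending.
    return "D1" + gender + number + ("j" if has_case_ending else "")

def genitive_connectors(gender, number, has_case_ending):
    """Connectors of kaa/ke/kii: 'Jp' to the left, suffixed 'D1' to the right."""
    return ["Jp-", d1_label(gender, number, has_case_ending) + "+"]

print(genitive_connectors("m", "s", False))  # ['Jp-', 'D1ms+']
print(genitive_connectors("f", "p", True))   # ['Jp-', 'D1fpj+']
```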
Table 2: 'D1' H-link
Figure 1: Example of Genitive Case
Adjective Links: Below we discuss the proposed links for Hindi adjectives.
- A: This H-link connects adjectives with the noun. All adjectives are given 'A+' H-link. As discussed earlier, Hindi adjectives can be inflected as well as uninflected. Since inflected adjectives depend on the gender, number and case ending of the head noun, 'A' link is followed by three suffixes for adjective-noun agreement. These suffixes are the same as those for the genitive case. Figure 2 illustrates an example of this link.
- PT1: In Hindi, the present participle form of adjectives uses the taa/te/tii form of the verb followed by the morpho-word huaa/hue/huii. 'PT1' link connects the taa/te/tii form of the verb with the morpho-word huaa/hue/huii. Further, huaa/hue/huii is connected to the head noun through 'A' link. In this case also, adjective morphology depends on the noun. An example of this H-link is given in Figure 3.
Figure 2: Example of 'A' H-link
Figure 3: Example of 'PT1' H-link
- PP2: This link connects aa/e/ii form of the verb (see [Sastri and Apte, 1968]) with huaa/hue/huii. This formulation is used in the past participle form of modifiers. Like other noun modifier H-links, this too has three suffixes for adjective-noun agreement.
- AR: Unlike English, Hindi has reduplicated adjectives. 'AR' link is used in this case to connect the first repetition of adjective with second repetition of adjective. Finally second repetition of adjective makes an 'A' H-link with the head noun. No suffix is required for this H-link, since adjective-noun agreement is confirmed through 'A' link. An example of this H-link is given in Figure 4.
Figure 4: Example of 'AR' H-link
Figure 5: Example of 'AV' H-link
- AV: Hindi has one more distinct type of adjective, the waalaa-adjective. 'AV' H-link is used to connect waalaa/waale/waalii with the preceding word. This link is also followed by three suffixes to give gender, number and case information of the head noun. Further, waalaa/waale/waalii is linked to the head noun through 'A' link. Figure 5 gives an example of this link.
- AS: saa-adjectives are also unique to Hindi. saa/se/sii is added after the Hindi adjective to mean "like". 'AS' link is proposed to be used to connect the adjective with saa/se/sii. Choice of saa/se/sii depends on the head noun, and thus 'AS' link is followed by three suffixes, as is the case with other Hindi adjective links.
- AN, EA, G and GN: These H-links will have the same usage as in English Link Grammar.
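The two-step pattern shared by the 'AR' and 'AV' links described above (an inner link to the word that immediately precedes the head noun, which in turn takes an 'A' link to the head) can be sketched in Python as follows. The function name modifier_chain and the representation of a linkage as (left_index, right_index, label) triples are assumptions for illustration, and the agreement suffixes are omitted for brevity.

```python
def modifier_chain(words, inner_label):
    """Link the last three words of a phrase whose final word is the head noun.

    The word two places before the head is joined to the word just before
    the head with `inner_label`; that word is joined to the head with 'A'.
    """
    n = len(words)
    if n < 3:
        raise ValueError("need at least two modifier words and a head noun")
    return [(n - 3, n - 2, inner_label), (n - 2, n - 1, "A")]

# Reduplicated adjective: achchhe-achchhe pakwaan
print(modifier_chain(["achchhe", "achchhe", "pakwaan"], "AR"))
# waalaa-adjective: saamne waalii khiDkii
print(modifier_chain(["saamne", "waalii", "khiDkii"], "AV"))
```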
Post-Modifier Links: In Hindi, post modifiers come either in the form of relative clause or as ki-clause. Due to lack of space, we discuss only ki-clause in this work.
- TH and C: 'TH' H-link connects the head noun with ki. Further, ki is linked with the subject of the subordinate clause through 'C' link. As an example of these links, consider the linkage of the sentence given in Figure 6.
Figure 6: Example of ki-clause
It may be noted that, as in ELG, 'TH' and 'C' links can also be used in various other constructions in Hindi Link Grammar. Since the focus of this work is on Noun Phrase links, we omit the discussion of those usages here.
5. CONCLUDING REMARKS
In this work, we studied the Hindi Noun Phrase morphology to develop links that may provide syntactic relationship between the words in a Noun Phrase. We have followed an Example Based approach where links given in ELG have been considered and suitably modified to capture the characteristics of Hindi morphology. Due to lack of space, many other variations (e.g. relative clause) could not be discussed here. We are currently working on developing algorithms for parsing Hindi sentences using the proposed Hindi Link Grammar.
REFERENCES
Goyal S. and Chatterjee N.: 2005, Towards Developing a Link Grammar Based Parser for Hindi, a paper submitted to Workshop on Morphology, IIT Bombay. To appear in LANGUAGE IN INDIA http://www.languageinindia.com.
Sastri S. and Apte B.: 1968, Hindi Grammar, Dakshina Bharat Hindi Prachar Sabha, Madras, India.
Singh S.: 2003, English-Hindi Translation Grammar, Prabhat Publication, New Delhi.
Sleator D. and Temperley D.: 1991, Parsing English with a Link Grammar, Computer Science technical report CMU-CS-91-196, Carnegie Mellon University.