A Link Grammar for an Agglutinative Language


1 A Link Grammar for an Agglutinative Language
Ozlem Istek & Ilyas Cicekli Bilkent University, TURKEY

2 Outline
Link Grammar Formalism
Some Distinctive Features of Turkish Syntax
The System Architecture of the Turkish Parser and Our Adapted Link Grammar Formalism
Method for Handling the Syntactic Roles of Words with Derivations
Evaluation
Concluding Remarks
RANLP-2007

3 Link Grammar
Link grammar is a formal grammatical system developed by Sleator and Temperley. The syntax of a language is defined by a grammar that lists the words together with their linking requirements. The grammar is defined in a dictionary file, and the linking requirements of each word are expressed in terms of connectors: a connector ending in + must link to a word on its right, and one ending in - to a word on its left. A given sentence is accepted by the system if there is a linkage in which the linking requirements of all the words are satisfied (satisfaction), the links connect all the words (connectivity), no two links cross each other (planarity), and there is at most one link between any pair of words (exclusion).
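Three of these conditions can be checked mechanically once a candidate linkage is written down. The Python sketch below uses a representation chosen only for illustration (words indexed from 0, left to right; links as (left, right, label) triples with left < right). It checks that no two links cross (planarity), that the links connect all words (connectivity), and that no pair of words is linked twice (exclusion); whether each word's linking requirements are satisfied depends on the dictionary and is left out.

```python
from itertools import combinations


def crosses(a, b):
    """True if links a = (i, j) and b = (k, l), with i < j and k < l, cross."""
    (i, j), (k, l) = a, b
    return i < k < j < l or k < i < l < j


def check_linkage(n_words, links):
    """Check planarity, connectivity and exclusion for (left, right, label) links.

    Satisfaction also needs the dictionary, so it is not checked here.
    """
    pairs = [(i, j) for i, j, _ in links]

    # Exclusion: at most one link between any pair of words.
    exclusion = len(pairs) == len(set(pairs))

    # Planarity: no two links cross when drawn above the sentence.
    planarity = not any(crosses(a, b) for a, b in combinations(pairs, 2))

    # Connectivity: union-find over the words; one component means connected.
    parent = list(range(n_words))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in pairs:
        parent[find(i)] = find(j)
    connectivity = len({find(w) for w in range(n_words)}) == 1

    return {"planarity": planarity, "connectivity": connectivity,
            "exclusion": exclusion}


# Words 0, 1, 2 = Kadın, portakalı, yedi; links S(0, 2) and O(1, 2).
print(check_linkage(3, [(0, 2, "S"), (1, 2, "O")]))
# {'planarity': True, 'connectivity': True, 'exclusion': True}
```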

4 Link Grammar – Example
The linking requirements of three Turkish words:
yedi : O- & S-;  (ate)
kadın : S+;  (the woman)
portakalı : O+;  (the orange)
A linkage for a sentence containing these three words:
+-------S--------+
|         +---O--+
|         |      |
Kadın portakalı yedi
(The woman ate the orange; word by word: the woman, the orange, ate)
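For a sentence this short, all linkages can be found by brute force. The Python sketch below encodes the three dictionary entries above (with the words lowercased) and tries every subset of candidate links, keeping those that use every connector exactly once and are planar, exclusive, and connected. It is only an illustration: it is not the parsing algorithm of Sleator and Temperley, which uses dynamic programming, and it simplifies the formalism by ignoring the order of connectors within a conjunction.

```python
from itertools import combinations

# Linking requirements from the example, as (label, direction) connectors.
# Simplification: the order of connectors inside a conjunction is ignored.
LEXICON = {
    "kadın": [("S", "+")],
    "portakalı": [("O", "+")],
    "yedi": [("O", "-"), ("S", "-")],
}


def is_connected(n, links):
    """Depth-first search from word 0 over the links."""
    adj = {w: set() for w in range(n)}
    for i, j, _ in links:
        adj[i].add(j)
        adj[j].add(i)
    seen, stack = {0}, [0]
    while stack:
        for nxt in adj[stack.pop()] - seen:
            seen.add(nxt)
            stack.append(nxt)
    return len(seen) == n


def linkages(words):
    """Brute-force every linkage that is satisfied, planar, exclusive and connected."""
    # Candidate links: a '+' connector on an earlier word meeting the same
    # label with '-' on a later word.
    candidates = []
    for i, j in combinations(range(len(words)), 2):
        for label, direction in LEXICON[words[i]]:
            if direction == "+" and (label, "-") in LEXICON[words[j]]:
                candidates.append((i, j, label))

    found = []
    for r in range(1, len(candidates) + 1):
        for subset in combinations(candidates, r):
            # Satisfaction: every connector of every word is used exactly once.
            used = {w: [] for w in range(len(words))}
            for i, j, label in subset:
                used[i].append((label, "+"))
                used[j].append((label, "-"))
            if any(sorted(used[w]) != sorted(LEXICON[words[w]]) for w in used):
                continue
            pairs = [(i, j) for i, j, _ in subset]
            if len(pairs) != len(set(pairs)):  # exclusion
                continue
            if any(i < k < j < l or k < i < l < j  # planarity
                   for (i, j), (k, l) in combinations(pairs, 2)):
                continue
            if is_connected(len(words), subset):
                found.append(list(subset))
    return found


print(linkages(["kadın", "portakalı", "yedi"]))  # [[(0, 2, 'S'), (1, 2, 'O')]]
```

Reordering the words as yedi kadın portakalı yields no linkage in this sketch, because kadın's S+ connector has no word with a matching S- to its right.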

5 Turkish Syntax
The basic word order is SOV, but the order of constituents may change according to the discourse context. Turkish is head-final: modifiers precede the items they modify. For example, an adjective (modifier) precedes the head noun (modified item) in a noun phrase, and in the basic word order of a sentence the subject and the object (modifiers) precede the verb (modified item). Although the head-final property can be violated at the level of the major constituents of a sentence (SOV), it is preserved within sub-clauses and smaller syntactic structures.
kırmızı şapkalı kız (the girl with the red hat; word by word: red with hat girl)

6 Turkish Syntax (cont.)
Turkish is agglutinative. Words can take many derivational suffixes, and each derived form can take its own inflectional suffixes. Inflectional suffixes have important grammatical roles, and there is a significant amount of interaction between syntax and morphotactics.
uygarlaştı (He got civilized.)
uygar-laş-tı
uygar+Noun+A3sg+Pnon+Nom^DB+Verb+Become+Pos+Past+A3sg

7 Motivation for a New Formalism
In the standard link grammar formalism, linking requirements are defined for individual words. If all possible derivations and inflections of Turkish words are considered, the number of possible word forms is huge. However, words in the same category behave similarly at the syntactic level. We therefore chose to define linking requirements based on word classes and their inflections, and to treat derivations as separate words.

8 System Architecture of the Turkish Parser
Input Sentence → Morphological Analysis → Stripping Lexical Parts → Separating Derivation Boundaries → Create Sentence List → Parse Sentences with Link Grammar (using the Linking Requirements for Turkish Word Classes and Derivations) → All Possible Linkages
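The slides do not spell out how the sentence list is built. One plausible reading, given that each representation of the sentence is parsed separately and that no POS tagger is used, is that every combination of the analyses returned for the individual words becomes one representation. The Python sketch below takes that assumption; the function name is hypothetical, and the example analyses (two readings of portakalı, accusative "the orange" or possessive "his/her orange") are illustrative.

```python
from itertools import product


def create_sentence_list(analyses_per_word):
    """Return one sentence representation per combination of word analyses.

    analyses_per_word[k] lists the candidate analyses of the k-th word
    (assumed to be already stripped of their lexical parts).
    """
    return [list(choice) for choice in product(*analyses_per_word)]


# Hypothetical ambiguity: the second word has two candidate analyses.
sentences = create_sentence_list([
    ["Noun+A3sg+Pnon+Nom"],
    ["Noun+A3sg+Pnon+Acc", "Noun+A3sg+P3sg+Nom"],
    ["Verb+Pos+Past+A3sg"],
])
for s in sentences:
    print(" ".join(s))
```

Under this reading, the number of representations is the product of the number of analyses per word, so it can grow quickly with sentence length.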

9 System Architecture (cont.)
Morphological Analysis: All the words in the input sentence are analyzed by a fully functional Turkish morphological analyzer.
okudun → oku+Verb+Pos+Past+A2sg (you read)
uygarlaşmak → uygar+Noun+A3sg+Pnon+Nom^DB+Verb+Become+Pos^DB+Noun+Inf1+A3sg+Pnon+Nom (to get civilized)
Stripping Lexical Parts: The lexical parts of the words are removed for all word types except conjunctions, because the Turkish link grammar is designed for word classes and their feature structures rather than for individual words.
oku+Verb+Pos+Past+A2sg → Verb+Pos+Past+A2sg
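Stripping the lexical part amounts to dropping everything before the first "+" of an analysis, except for conjunctions. A minimal Python sketch, assuming analyses are plain strings in the format shown above and that conjunctions carry the POS tag Conj (the tag name and the function name are assumptions, not taken from the slides):

```python
def strip_lexical_part(analysis, keep_pos=("Conj",)):
    """Remove the root (lexical part) from a 'root+POS+...' analysis string.

    Conjunctions keep their lexical part; 'Conj' is assumed to be the
    analyzer's POS tag for them.
    """
    _, sep, features = analysis.partition("+")
    if not sep:
        return analysis  # no feature part: nothing to strip
    # The POS tag is the first feature (cut at '^' in case '^DB' follows it).
    pos = features.split("+", 1)[0].split("^", 1)[0]
    if pos in keep_pos:
        return analysis  # conjunctions keep their lexical form
    return features


print(strip_lexical_part("oku+Verb+Pos+Past+A2sg"))  # Verb+Pos+Past+A2sg
print(strip_lexical_part("ve+Conj"))                 # ve+Conj
```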

10 System Architecture (cont.)
Separating Derivation Boundaries: The words are split at their derivational boundaries, and the part-of-speech tag of each derived form is marked to indicate its position in the word. Each token starts with a part-of-speech tag together with a position mark, and continues with the inflectional features.
Noun+A3sg+P1pl+Loc^DB+Adj+Rel^DB+Noun+Zero+A3sg+Pnon+Gen
→ NounRoot+A3sg+P1pl+Loc   AdjDB   NounDBEnd+A3sg+Pnon+Gen
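The separation step can be sketched in Python as follows. The sketch assumes the lexical part has already been stripped and that each derived group has the form POS+DerivationTag+inflections; it drops the derivation tag (Rel and Zero in the example), which reproduces the tokens above but is an inference from the example rather than a documented rule. An underived word keeps its bare POS tag with no position mark. The function name is hypothetical.

```python
def separate_derivations(analysis):
    """Split a root-stripped analysis at its derivational boundaries.

    Marks the first group Root, intermediate groups DB and the last DBEnd,
    dropping the derivation tag right after each derived POS tag
    (an assumption inferred from the example).
    """
    groups = analysis.replace(" ", "").split("^DB+")
    if len(groups) == 1:
        return groups  # underived word: no position mark
    tokens = []
    for k, group in enumerate(groups):
        pos, *features = group.split("+")
        if k == 0:
            mark = "Root"
        else:
            mark = "DBEnd" if k == len(groups) - 1 else "DB"
            features = features[1:]  # drop the derivation tag (Rel, Zero, ...)
        tokens.append("+".join([pos + mark] + features))
    return tokens


print(separate_derivations(
    "Noun+A3sg+P1pl+Loc^DB+Adj+Rel^DB+Noun+Zero+A3sg+Pnon+Gen"))
# ['NounRoot+A3sg+P1pl+Loc', 'AdjDB', 'NounDBEnd+A3sg+Pnon+Gen']
```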

11 System Architecture (cont.)
Parsing Sentences: Each representation of the sentence is fed into the parser and parsed with respect to the designed Turkish link grammar. The Turkish link grammar contains linking requirements for each part-of-speech tag, and for each part-of-speech tag followed by one of the strings “Root”, “DB”, or “DBEnd”. The linking requirement for a token depends on the part-of-speech tag of the token and the inflectional suffixes in that token.

12 Turkish Link Grammar
Linking requirements are defined for a part-of-speech tag together with its inflectional suffixes:
Noun+A3sg+Pnon+Nom : linking requirements for nouns with A3sg+Pnon+Nom inflections
Noun+A3sg+Pnon+Acc : linking requirements for nouns with A3sg+Pnon+Acc inflections
Verb+Pos+Past+A1sg : linking requirements for verbs with Pos+Past+A1sg inflections
Verb+Pos+Past+A2sg : linking requirements for verbs with Pos+Past+A2sg inflections
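Because the dictionary is keyed by morphological categories rather than by words, a lookup needs only the stripped token. The Python sketch below uses a hypothetical three-entry dictionary whose formulas simply reuse the S and O connectors of the earlier kadın / portakalı / yedi example; the entries and the function name are illustrative, not taken from the actual Turkish grammar.

```python
# Hypothetical class-based dictionary: keys are a POS tag plus inflections,
# values are link-grammar formulas (connector names reuse the S/O example).
GRAMMAR = {
    "Noun+A3sg+Pnon+Nom": "S+",
    "Noun+A3sg+Pnon+Acc": "O+",
    "Verb+Pos+Past+A3sg": "O- & S-",
}


def requirements_for(tokens):
    """Look up each token's formula; a token with no entry raises KeyError."""
    return [GRAMMAR[token] for token in tokens]


# Stripped tokens for "Kadın portakalı yedi"; any other nominative noun
# (e.g. kız) or accusative noun (e.g. elmayı) maps to the same entries.
print(requirements_for(["Noun+A3sg+Pnon+Nom", "Noun+A3sg+Pnon+Acc",
                        "Verb+Pos+Past+A3sg"]))
```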

13 Linking Requirements for Derivations
To preserve the syntactic roles played by the intermediate derived forms of a word, these forms are treated as separate words in the grammar. To indicate that they are derivations of the same word, consecutive forms are linked with the special “DB” (derivational boundary) connector.
Noun+A3sg+P1pl+Loc^DB+Adj+Rel^DB+Noun+Zero+A3sg+Pnon+Gen
+----------DB-----------+ +-DB--+
|                       | |     |
NounRoot+A3sg+P1pl+Loc AdjDB NounDBEnd+A3sg+Pnon+Gen
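Once each word has been split into tokens, the DB links follow directly from word membership: each token is linked to the next token of the same word. A Python sketch under that reading (the function name is hypothetical; the input is one list of tokens per word, in sentence order):

```python
def flatten_with_db_links(words):
    """Flatten per-word token lists and link consecutive forms of each word with DB.

    words: one list of tokens per word, in sentence order.
    Returns (tokens, links) with links as (left, right, "DB") index triples.
    """
    tokens, links = [], []
    for word_tokens in words:
        start = len(tokens)
        tokens.extend(word_tokens)
        # Root -> intermediate forms -> last form: one DB link per boundary.
        for k in range(start, start + len(word_tokens) - 1):
            links.append((k, k + 1, "DB"))
    return tokens, links


tokens, links = flatten_with_db_links([
    ["NounRoot+A3sg+P1pl+Loc", "AdjDB", "NounDBEnd+A3sg+Pnon+Gen"],
])
print(links)  # [(0, 1, 'DB'), (1, 2, 'DB')]
```

Because every DB link joins two adjacent tokens, it can never cross another link, so these links never violate planarity.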

14 Linking Requirements for Derivations (cont.)
A derived word consists of a root word, intermediate derived forms, and a last derived form.
Root word: contributes only its left linking requirements, and is connected to the right with a DB connector.
Intermediate derived forms: also contribute only their left linking requirements, and are connected to the left and to the right with DB connectors.
Last derived form: contributes both its left and its right linking requirements, and is connected to the left with a DB connector.

15 Linking Requirements for Derivations (cont.)
For each part-of-speech tag, three more linking requirements are needed, one for each position in a derived word (root, intermediate, and last). Example:
Noun Inflections : LeftLinkingRs & RightLinkingRs
NounRoot Inflections : LeftLinkingRs & DB+
NounDB Inflections : LeftLinkingRs & DB- & DB+
NounDBEnd Inflections : LeftLinkingRs & RightLinkingRs & DB-
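The four entries per part-of-speech tag can be generated mechanically from a word class's left and right requirements. In link grammar a "+" connector links to the right and a "-" connector to the left, so this Python sketch gives the root, which is linked to the form on its right, a DB+ connector; intermediate forms get DB- and DB+, and the last form gets DB-. The function name, the placeholder formulas, and the simple "key: formula;" output are assumptions for illustration.

```python
def positional_entries(pos, inflections, left, right):
    """Return the dictionary entries for one POS tag and inflection string.

    left / right: the class's left- and right-linking formulas (as text).
    """
    def key(mark):
        token = pos + mark
        return f"{token}+{inflections}" if inflections else token

    return {
        key(""): f"{left} & {right}",              # underived word
        key("Root"): f"{left} & DB+",              # root: DB link to the right
        key("DB"): f"{left} & DB- & DB+",          # intermediate: DB links both ways
        key("DBEnd"): f"{left} & {right} & DB-",   # last form: DB link to the left
    }


# Placeholder formulas standing in for LeftLinkingRs and RightLinkingRs.
for k, v in positional_entries("Noun", "A3sg+Pnon+Nom", "(L)", "(R)").items():
    print(f"{k}: {v};")
```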

16 Evaluation
We tested the Turkish parser on a set of 250 sentences. The average sentence length is 5.19 words, and the average number of parses per sentence is 7.49. For 84.31% of the sentences, the result set contains the correct parse, and the average rank of the correct parse in the result set is 1.78. For 62.39% of the sentences, the first parse is the correct one; for 80.94%, one of the first three parses is correct.
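Figures of this kind can be computed from two values per test sentence: the number of parses returned and the rank of the correct parse (or nothing, if it is missing). The slides do not state the denominators, so this Python sketch assumes the average rank is taken only over sentences whose result set contains the correct parse, while the top-1 and top-3 rates are taken over all sentences. The function name is hypothetical, and the input data below is made up for illustration and is not from the evaluation.

```python
def evaluate(results):
    """Summarize parser output.

    results: one (num_parses, rank) pair per sentence, where rank is the
    1-based position of the correct parse, or None if it was not found.
    """
    n = len(results)
    ranks = [rank for _, rank in results if rank is not None]
    return {
        "avg_parses": sum(p for p, _ in results) / n,
        "coverage": len(ranks) / n,  # correct parse somewhere in the result set
        "avg_rank": sum(ranks) / len(ranks) if ranks else None,
        "top1": sum(r == 1 for r in ranks) / n,
        "top3": sum(r <= 3 for r in ranks) / n,
    }


# Toy input for four sentences (not the paper's data).
print(evaluate([(4, 1), (10, 2), (3, None), (8, 5)]))
```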

17 Conclusions
A Turkish grammar has been developed in the link grammar formalism. The developed Turkish link grammar is not a lexical grammar: its linking requirements are defined for morphological categories, using word classes and their morphological feature structures rather than individual words. We preserved the syntactic roles of the intermediate derived forms of words by splitting derived words at their derivational boundaries and treating each derived form as a distinct word. Our current system does not use a POS tagger; adding one is expected to improve performance in terms of both time and precision.

