The Rule-based Parser of the NLP Group of the University of Torino Leonardo Lesmo Dipartimento di Informatica and Centro di Scienze Cognitive, Università


1 The Rule-based Parser of the NLP Group of the University of Torino Leonardo Lesmo Dipartimento di Informatica and Centro di Scienze Cognitive, Università di Torino, Italy

2 Goals
§ Wide-coverage tool
§ Domain independence
Approach
§ Manually developed rules
§ Two phases: chunking and subcategorization
§ Extensibility to semantics
§ Procedural analysis of conjunctions and of the identification of verbal dependents

3 TULE (Turin University Linguistic Environment)
TOKENIZER (input: Text; resource: Token Automaton; output: Tokens)
  Splits the text into words, numbers, punctuation marks
DICTIONARY LOOKUP (resources: Morphological dictionary, Suffix tables; output: Sets of lexical items)
  Extracts all lexical interpretations of each token
POS TAGGER (resource: Tagging rules; output: Lexical items)
  Chooses one lexical interpretation
DEPENDENCY PARSER (resources: Parsing rules, Verbal Caseframes; output: Parse Tree)
  Establishes the connections between lexical items
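The first three stages of the pipeline above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a regular expression stands in for the token automaton, `TOY_LEXICON` is a hypothetical three-entry stand-in for the morphological dictionary, and the "tagger" simply picks the first interpretation instead of applying tagging rules. None of these names come from TULE.

```python
import re

# Assumption: a regex replaces the token automaton (words, numbers, punctuation).
TOKEN_RE = re.compile(r"\d+(?:[.,]\d+)*|\w+|[^\w\s]")


def tokenize(text):
    """Split the text into words, numbers and punctuation marks."""
    return TOKEN_RE.findall(text)


# Hypothetical toy lexicon: each form maps to ALL its lexical interpretations.
TOY_LEXICON = {
    "la": [("IL", "ART"), ("LA", "PRON")],
    "decisione": [("DECISIONE", "NOUN")],
    "stato": [("STATO", "NOUN"), ("ESSERE", "VERB")],
}


def lookup(token):
    """Return every interpretation of a token (unknown words get a dummy entry)."""
    return TOY_LEXICON.get(token.lower(), [(token.upper(), "UNKNOWN")])


def tag(interpretations):
    """Placeholder for the tagging rules: keep the first interpretation."""
    return [options[0] for options in interpretations]


def analyze(text):
    """Tokenizer -> dictionary lookup -> tagger, returning (token, item) pairs."""
    tokens = tokenize(text)
    return list(zip(tokens, tag([lookup(t) for t in tokens])))
```

The point of the sketch is the data flow: lookup keeps ambiguity (a set of items per token) and only the tagger commits to one item, which is what the parser then receives.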

4 The grammar
§ Rule-based dependency grammar
§ Chunking (non-verbal groups) + verbal subcategorization frames
§ Output: a projective tree represented as pointers to parents, including some null elements (understood items, e.g. pro-drop, and traces)

5 Parser Architecture
CHUNKING (input: Lexical items; resource: Chunking rules; output: Chunked text)
  Splits the text into groups of strictly connected words
ANALYSIS OF CONJUNCTIONS (resource: Procedural preference rules 1; output: Chunked text)
  Connects chunks linked by conjunctions, to form larger chunks
SEGMENTATION (resource: Procedural preference rules 2)
  Determines the dependents of verbs
VERBAL ATTACHMENT (resources: Verb classes, Verbal Caseframes; output: Parse Tree)
  Determines the role (arc labels) of the verbal dependents

6 An example
Example: Slitta a Tirana la decisione sullo stato di emergenza. (The decision on the emergency status in Tirana has been delayed)
Lexical Items                                            Parse Tree Infos
1 Slitta (SLITTARE VERB MAIN IND PRES INTRANS 3 SING)    [0;TOP-VERB]
2 a (A PREP MONO)                                        [1;PREP-RMOD]
3 Tirana (TIRANA NOUN PROPER F SING ££CITY)              [2;PREP-ARG]
4 la (IL ART DEF F SING)                                 [1;VERB-SUBJ]
5 decisione (DECISIONE NOUN COMMON F SING DECIDERE INTRANS)  [4;DET+DEF-ARG]
6 sullo ((SU PREP MONO)                                  [5;PREP-RMOD]
  6.10 (IL ART DEF M SING))                              [6;PREP-ARG]
7 stato (STATO NOUN COMMON M SING)                       [6.10;DET+DEF-ARG]
8 di (DI PREP MONO)                                      [7;PREP-RMOD]
9 emergenza (EMERGENZA NOUN COMMON F SING)               [8;PREP-ARG]
10 . (#\. PUNCT)                                         [1;END]
Parse tree (as drawn on the slide):
1: Slitta
  Prep-rmod 2: a
    Prep-arg 3: Tirana
  Verb-subj 4: la
    Det+def-arg 5: decisione
      Prep-rmod 6: su
        Prep-arg 6.10: lo
          stato di emergenza
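The parent-pointer output of this example can be loaded into a small Python structure to recover the children of each node and to check projectivity. The node ids, parent ids and arc labels below are copied from the parse tree infos of the slide; the function names and the tuple layout are assumptions made for this sketch, not part of the parser.

```python
from collections import defaultdict

# (node id, word, parent id, arc label); "0" is the virtual root.
PARSE = [
    ("1", "Slitta", "0", "TOP-VERB"),
    ("2", "a", "1", "PREP-RMOD"),
    ("3", "Tirana", "2", "PREP-ARG"),
    ("4", "la", "1", "VERB-SUBJ"),
    ("5", "decisione", "4", "DET+DEF-ARG"),
    ("6", "su", "5", "PREP-RMOD"),
    ("6.10", "lo", "6", "PREP-ARG"),
    ("7", "stato", "6.10", "DET+DEF-ARG"),
    ("8", "di", "7", "PREP-RMOD"),
    ("9", "emergenza", "8", "PREP-ARG"),
    ("10", ".", "1", "END"),
]


def children_index(parse):
    """Invert the parent pointers: parent id -> [(child id, arc label), ...]."""
    kids = defaultdict(list)
    for node_id, _word, parent_id, label in parse:
        kids[parent_id].append((node_id, label))
    return kids


def is_projective(parse):
    """True if every word between a head and its dependent descends from that head."""
    position = {node_id: i for i, (node_id, _w, _p, _l) in enumerate(parse, start=1)}
    parent_of = {node_id: parent_id for node_id, _w, parent_id, _l in parse}

    def dominated_by(node_id, head_id):
        while node_id != "0":
            node_id = parent_of[node_id]
            if node_id == head_id:
                return True
        return False

    for node_id, _w, head_id, _l in parse:
        if head_id == "0":
            continue  # the arc from the virtual root spans nothing relevant
        lo, hi = sorted((position[node_id], position[head_id]))
        for other_id, pos in position.items():
            if lo < pos < hi and not dominated_by(other_id, head_id):
                return False
    return True
```

Ids are kept as strings because the second half of "sullo" is numbered 6.10 on the slide; linear order is taken from the list position rather than from the id.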

7 Chunking
Example: Puoi dirmi che spettacoli di cabaret posso vedere domani? (Can you tell me what cabaret plays I can see tomorrow?)
Puoi/V-modal-2nd-sing-pres dir/V-inf [mi/Pron-1st-dative]Pron [che/Adj-interr spettacoli/Noun [di/Prep cabaret/Noun]P-group]N-group posso/V-modal-1st-sing-pres vedere/V-inf [domani/Adv]A-group ?
Chunking Rules
§ Chunking rules are grouped in packets.
§ Each packet is associated with a lexical category, and describes the possible chunkable dependents of words of that category.
§ "Chunkable" means that the dependent is handled during chunking (e.g. auxiliaries, but not arguments of verbs).

8 A chunk rule
  (NOUN common
    (precedes (ADJ qualif T (#\- #\' #\"))
      (ADJ ((type qualif) (agree)))
      ADJC+QUALIF-RMOD))
Parts of the rule:
§ NOUN: packet (governing word)
§ common: feature (constrains applicability)
§ (precedes (ADJ qualif T (#\- #\' #\")) ...): position of dep (and possible words separating head from dep)
§ (ADJ ((type qualif) (agree))): category of possible dep (and constraints on it)
§ ADJC+QUALIF-RMOD: label of connecting arc
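To make the parts of such a rule concrete, here is a minimal Python sketch that encodes the rule as a record and applies it to a pair of adjacent words. Several things are assumptions: `precedes` is read as "the dependent comes before the governing word", agreement is reduced to gender and number, the list of admissible separating words is ignored, and the `Word` and `ChunkRule` types are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    form: str
    cat: str     # lexical category, e.g. "NOUN", "ADJ"
    type: str    # subtype, e.g. "common", "qualif"
    gender: str
    number: str


@dataclass(frozen=True)
class ChunkRule:
    head_cat: str   # packet: category of the governing word
    head_type: str  # feature constraining applicability
    position: str   # "precedes": dependent before the head (assumed reading)
    dep_cat: str
    dep_type: str
    agree: bool
    label: str      # label of the connecting arc


RULE = ChunkRule("NOUN", "common", "precedes", "ADJ", "qualif", True, "ADJC+QUALIF-RMOD")


def agrees(a, b):
    """Simplified agreement: same gender and number."""
    return a.gender == b.gender and a.number == b.number


def apply_rule(rule, words, head_idx):
    """Return (dep index, head index, label) if the rule attaches a neighbour, else None."""
    head = words[head_idx]
    if (head.cat, head.type) != (rule.head_cat, rule.head_type):
        return None
    dep_idx = head_idx - 1 if rule.position == "precedes" else head_idx + 1
    if not 0 <= dep_idx < len(words):
        return None
    dep = words[dep_idx]
    if (dep.cat, dep.type) != (rule.dep_cat, rule.dep_type):
        return None
    if rule.agree and not agrees(head, dep):
        return None
    return (dep_idx, head_idx, rule.label)
```

For instance, with the words "bella" (ADJ qualif F SING) and "decisione" (NOUN common F SING), `apply_rule(RULE, words, 1)` attaches the adjective to the noun; changing the adjective to masculine makes the agreement test fail and no arc is produced.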

9 Conjunctions
§ When a coordinating conjunction is found, all following and preceding chunks are collected
§ All pairs are built, and the best one is chosen according to criteria based on structural similarity and distance
§ Special treatment for verbs
Example: Ho incontrato Marco e Lucia e li ho salutati (I met Marco and Lucia and I greeted them)
Ho/V-aux incontrato/V-main [Marco/Noun-Proper]Noun e/Conj-coord [Lucia/Noun-Proper]Noun e/Conj-coord [li/Pron-pers]Pron ho/V-aux salutati/V-main
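The pair-building step can be illustrated with a short Python sketch. The slides only say that the criteria are based on structural similarity and distance; the concrete score used here (same category first, then smaller distance) is an assumption, verb groups are treated as single elements, and the special treatment of verbs is not modelled.

```python
from itertools import product


def best_conjunct_pair(elements, conj_idx):
    """Pick the (left, right) pair of elements that the conjunction at conj_idx links.

    elements: list of (text, category) tuples in sentence order.
    Scoring is a hypothetical stand-in: same category wins, then shorter distance.
    """
    left = [i for i in range(conj_idx) if elements[i][1] != "Conj"]
    right = [j for j in range(conj_idx + 1, len(elements)) if elements[j][1] != "Conj"]

    def score(pair):
        i, j = pair
        same_category = 1 if elements[i][1] == elements[j][1] else 0
        distance = (conj_idx - i) + (j - conj_idx)
        return (same_category, -distance)

    return max(product(left, right), key=score)


SENTENCE = [
    ("Ho incontrato", "Verb"),
    ("Marco", "Noun"),
    ("e", "Conj"),
    ("Lucia", "Noun"),
    ("e", "Conj"),
    ("li", "Pron"),
    ("ho salutati", "Verb"),
]
```

On the example sentence, the first "e" (index 2) pairs "Marco" with "Lucia", while the second "e" (index 4) pairs the two verb groups, since no closer pair shares a category.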

10 Segmentation
Puoi/V-modal-2nd-sing-pres { dir/V-inf [mi/Pron-1st-dative]Pron { [che/Adj-interr spettacoli/Noun [di/Prep cabaret/Noun]P-group]N-group posso/V-modal-1st-sing-pres { vedere/V-inf [domani/Adv]A-group ? } } }
§ For each verb (going from left to right):
  - Look for possible dependents (on its right and left)
  - On the left, the search is blocked by the previous verb
  - On the right, some barriers are defined to stop the search (for instance, a subordinating conjunction acts as a barrier)
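The search window described above can be sketched as follows. This is a simplification under assumptions: each element carries a hypothetical `kind` of "verb", "chunk" or "barrier", the right-hand search collects everything up to the first barrier, and the barrier itself is never proposed as a dependent. How the parser actually ranks or filters the candidates is not shown on the slides.

```python
def candidate_dependents(elements):
    """Map each verb index to (left candidates, right candidates).

    elements: list of (text, kind), kind in {"verb", "chunk", "barrier"}.
    Left search stops at the previous verb; right search stops at a barrier.
    """
    result = {}
    previous_verb = -1
    for i, (_text, kind) in enumerate(elements):
        if kind != "verb":
            continue
        left = [j for j in range(previous_verb + 1, i) if elements[j][1] != "barrier"]
        right = []
        for j in range(i + 1, len(elements)):
            if elements[j][1] == "barrier":
                break
            right.append(j)
        result[i] = (left, right)
        previous_verb = i
    return result


# "Marco dice che Lucia dorme oggi" (Marco says that Lucia sleeps today)
ELEMENTS = [
    ("Marco", "chunk"),
    ("dice", "verb"),
    ("che", "barrier"),
    ("Lucia", "chunk"),
    ("dorme", "verb"),
    ("oggi", "chunk"),
]
```

In this toy sentence the subordinating conjunction "che" stops the rightward search of "dice", so "Lucia" and "oggi" are only offered to "dorme".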

11 Verbal Subcategorization
The subcategorization classes:
[Figure: hierarchy of subcategorization classes rooted at "verbs". Classes shown: nosubj-verbs, subj-verbs, obj-verbs, basic-trans, empty-modal, modal, ssubj-inf-verbs, trans, indobj-verbs, trans-indobj. Dictionary verbs linked to the classes: bisognare (need), camminare (walk), dovere (must), potere (can).]

12 Example subcategorization class definitions:
  (subj-verbs (intrans) (verbs)
    ; *** verbs with a subject. Definition of subject
    (verb-subj
      ((noun (agree))
       (art (agree))
       (pron (not (word quale) (type relat)) (case lsubj) (agree))
       (adj (type (indef demons deitt interr poss)) (agree))
       (num (agree))
       (prep (word in) (down (cat pron) (type indef)) (agree)))))
  (ssubj-inf-verbs () (verbs)
    ; *** verbs with an inf-verb sentential subject
    (verb-subj ((verb (mood infinite) (agree)))))
  (empty-modal () (no-subj-verbs)
    ; *** modals without subject
    (verb-indcompl-modal ((verb (mood infinite)))))
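One way to read these definitions is as a class hierarchy in which each class adds or overrides case definitions inherited from its parents. The Python sketch below follows that reading, which is an assumption: it treats the third element of each definition as the list of parent classes, keeps only the categories of the admissible fillers, and adds two hypothetical classes (`toy-obj-verbs`, `toy-trans`, with an invented `verb-obj` case) purely to show multiple inheritance. It also assumes that `no-subj-verbs` descends from `verbs`.

```python
CLASSES = {
    "verbs": {"parents": [], "cases": {}},
    "no-subj-verbs": {"parents": ["verbs"], "cases": {}},  # parent assumed
    "subj-verbs": {
        "parents": ["verbs"],
        "cases": {"verb-subj": ["noun", "art", "pron", "adj", "num", "prep"]},
    },
    "ssubj-inf-verbs": {"parents": ["verbs"], "cases": {"verb-subj": ["verb"]}},
    "empty-modal": {
        "parents": ["no-subj-verbs"],
        "cases": {"verb-indcompl-modal": ["verb"]},
    },
    # Hypothetical classes, not taken from the slides:
    "toy-obj-verbs": {"parents": ["verbs"], "cases": {"verb-obj": ["noun", "art", "pron"]}},
    "toy-trans": {"parents": ["subj-verbs", "toy-obj-verbs"], "cases": {}},
}


def caseframe(name, classes=CLASSES):
    """Collect the cases of a class: inherited ones first, own definitions override."""
    frame = {}
    for parent in classes[name]["parents"]:
        frame.update(caseframe(parent, classes))
    frame.update(classes[name]["cases"])
    return frame
```

With this table, `caseframe("empty-modal")` contains only the modal complement, while `caseframe("toy-trans")` combines the subject inherited from `subj-verbs` with the object of the hypothetical `toy-obj-verbs`.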

13 Transformations
basic class (e.g. trans) → transformed classes (e.g. trans, trans+passivization, trans+infinitivization, trans+prodrop, trans+passivization+infinitivization, ...)
Example transformation:
  (infinitivization replacing (subj-verbs)
    (is-inf-form tr-verb v-casefr)
    (cancel-case s-subj))
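The derivation of transformed classes from a basic class can be sketched as the application of every combination of transformations to a base case frame. In the Python sketch below, `infinitivization` mirrors the `(cancel-case s-subj)` action of the example; everything else is an assumption: the `s-obj` case name, the base frame, the omission of applicability conditions such as `is-inf-form`, and `toy_passivization`, which is an invented stand-in because the slides do not define passivization.

```python
from itertools import combinations


def infinitivization(frame):
    """Cancel the subject case, as in (cancel-case s-subj)."""
    return {case: filler for case, filler in frame.items() if case != "s-subj"}


def toy_passivization(frame):
    """Invented stand-in: the object becomes the subject, the old subject is dropped."""
    out = {case: filler for case, filler in frame.items() if case not in ("s-subj", "s-obj")}
    if "s-obj" in frame:
        out["s-subj"] = frame["s-obj"]
    return out


def derive(base_name, base_frame, transformations):
    """Apply every ordered subset of transformations; names are joined with '+'."""
    derived = {base_name: dict(base_frame)}
    for size in range(1, len(transformations) + 1):
        for combo in combinations(transformations, size):
            frame = dict(base_frame)
            for _name, transform in combo:
                frame = transform(frame)
            derived["+".join([base_name] + [name for name, _ in combo])] = frame
    return derived


DERIVED = derive(
    "trans",
    {"s-subj": "noun", "s-obj": "noun"},  # hypothetical base frame
    [("passivization", toy_passivization), ("infinitivization", infinitivization)],
)
```

The resulting keys follow the naming pattern of the slide (`trans`, `trans+passivization`, `trans+infinitivization`, `trans+passivization+infinitivization`), which illustrates how a modest number of base classes can expand into a much larger set of surface case frames.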

14 Some statistics
Chunking rules
  Total: 295 rules
  Common: 250 rules
  English: 34 rules
  Italian: 7 rules
  Spanish + Catalan: 4 rules
Base Subcategorization
  Total: 118 classes
  Abstract: 21 classes
  plus verbal locutions
  Italian: 40 classes
  English: 1 class
Derived surface case frames
  2653 case frames

15 Conclusions
§ Test of the parser on other languages, using the same grammar augmented with extra rules (see previous slide)
§ Partial use of semantic information (about 400 words classified according to a semantic taxonomy)
§ The parser has been used in a project involving spoken and written linguistic interaction with a user. It has been interfaced with a repository of semantic knowledge to build a meaning representation.

