
A Cascaded Finite-State Parser for German Michael Schiehlen Institut für Maschinelle Sprachverarbeitung Universität Stuttgart


1 A Cascaded Finite-State Parser for German
Michael Schiehlen
Institut für Maschinelle Sprachverarbeitung, Universität Stuttgart
mike@ims.uni-stuttgart.de
EACL 2003, Budapest, April 17th, 2003

2 Dependency-Based Evaluation
(IMS Stuttgart, EACL 2003, April 17th, 2003, © Michael Schiehlen)
• every word either depends on another word (the head) or is independent
• parsing seen as a classification task (Lin:95)
• measured in (labelled) precision and recall: assign to every word
  – a pair (head, grammatical role)
  – or a marker TOP (for independent words)
• unlabelled precision and recall: neglect the grammatical role
  – only assign the head and TOP
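The labelled and unlabelled scores described on this slide can be sketched as follows. This is a minimal illustration, not the evaluation code used for the talk: the function name `precision_recall`, the dictionary representation and the toy analyses are assumptions. Only the TOP marker and the role names NPnom, NPakk and SPR come from the slides. Words missing from the predicted analysis count as unassigned, which is why precision can exceed recall.

```python
TOP = "TOP"  # marker for independent words

def precision_recall(gold, predicted, labelled=True):
    """Score a predicted analysis against a gold analysis.

    Both arguments map a word position to TOP or to a (head, role) pair.
    Words absent from `predicted` are treated as not assigned.
    """
    def key(value):
        if value == TOP:
            return TOP
        head, role = value
        # Unlabelled scoring neglects the grammatical role.
        return (head, role) if labelled else head

    correct = sum(
        1 for word, value in predicted.items()
        if word in gold and key(value) == key(gold[word])
    )
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall

# Hypothetical analyses of "Udo kennt eine Frau" (positions 0 to 3).
gold = {0: (1, "NPnom"), 1: TOP, 2: (3, "SPR"), 3: (1, "NPakk")}
# Word 0 gets the wrong role; word 3 is left unassigned.
predicted = {0: (1, "NPakk"), 1: TOP, 2: (3, "SPR")}

print(precision_recall(gold, predicted))                  # labelled
print(precision_recall(gold, predicted, labelled=False))  # unlabelled
```

In the toy case the wrong role on word 0 costs a labelled point but not an unlabelled one, and the unassigned word 3 lowers recall only.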

3 Dependency Structure (Details)
• PPs:
  – headed by internal arguments (NP), not by Prep
• coordination:
  – multi-headed constituent: every conjunct is a head
  – conjunction only linked to final conjunct
• verb complex (auxiliary verbs + full verb):
  – abstraction over verb complexes
  – all attachments into the verb complex count as correct (Lin:95)

4 Test Environment
• tokenized version of NEGRA tree bank
• ca. 340,000 tokens in 19,547 sentences
• investigated effect of POS tagging quality
  – I: ideal tags from tree bank
  – L: lexicon tags from tagger trained on tree bank
  – T: tagger tags as determined by tagger trained on independent corpus

5 Baseline: Tagging Approach
• determine dependency tuples directly
• used TreeTagger (Schmid:94) on tag trigrams
• three approaches to encode head
  – exact position of head: pos_head
  – distance of head from dependent: pos_head - pos_dep
  – nth-tag method (Lin:95), e.g. <<<N (third noun left), which encodes
    · category of head
    · direction in which to find head from token
    · number of words with same category between token and head
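The three head encodings can be illustrated with a short sketch. Everything here is an assumption made for illustration: the function names, the made-up tag sequence, and in particular the mapping from a POS tag to its category (taken to be the first letter of the tag, so NN and NE both count as N). The slide only fixes the idea and the example code `<<<N`.

```python
def category(tag):
    # Assumption: the category is the first letter of the POS tag.
    return tag[0]

def encode_position(head):
    """Exact position of the head."""
    return str(head)

def encode_distance(dep, head):
    """Distance of the head from the dependent: pos_head - pos_dep."""
    return str(head - dep)

def encode_nth_tag(tags, dep, head):
    """Direction marker repeated n times, followed by the head's category."""
    cat = category(tags[head])
    if head < dep:
        between, arrow = tags[head + 1:dep], "<"
    else:
        between, arrow = tags[dep + 1:head], ">"
    # Words of the head's category lying between token and head, plus the head.
    n = 1 + sum(1 for t in between if category(t) == cat)
    return arrow * n + cat

def decode_nth_tag(tags, dep, code):
    """Return the head position an nth-tag code points to, or None."""
    arrow = code[0]
    n = len(code) - len(code.lstrip(arrow))
    cat = code[n:]
    step = -1 if arrow == "<" else 1
    i = dep + step
    while 0 <= i < len(tags):
        if category(tags[i]) == cat:
            n -= 1
            if n == 0:
                return i
        i += step
    return None  # no head found: the token counts as not assigned

# Made-up tag sequence; the token at position 5 has its head at position 0.
tags = ["NN", "VVFIN", "NN", "ART", "NE", "ADJA"]
print(encode_position(0))                # "0"
print(encode_distance(5, 0))             # "-5"
print(encode_nth_tag(tags, 5, 0))        # "<<<N"
print(decode_nth_tag(tags, 5, "<<<N"))   # 0
```

Unlike the position and distance codes, the nth-tag code does not change when words of other categories are inserted between the token and its head.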

6 Tagging Approach (contd.)
• hybrid method:
  – choose between nth-tag and distance result on the basis of POS tag
  – build decision list greedily so as to optimize F-value in training set (using 10-fold cross-validation)
• all results achieved by 10-fold cross-validation
• if no head is found, token counts as not assigned (=> precision usually higher than recall)
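One way to read the greedy construction of the decision list is sketched below. This is an assumption-laden illustration rather than the procedure used in the talk: it takes nth-tag as the default method, switches one POS tag at a time to the distance method as long as the F-value on the training tokens improves, and leaves out the cross-validation. The data format, the function names and the toy data are all hypothetical.

```python
def f_value(correct, assigned, total):
    """Harmonic mean of precision (correct/assigned) and recall (correct/total)."""
    if correct == 0:
        return 0.0
    precision = correct / assigned
    recall = correct / total
    return 2 * precision * recall / (precision + recall)

def build_decision_list(tokens, default="nth-tag", alternative="distance"):
    """Greedily pick POS tags for which the alternative method is used.

    `tokens` is a list of (pos_tag, outcomes) pairs, where `outcomes` maps
    a method name to "correct", "wrong" or "none" (no head found).
    """
    def score(switched):
        correct = assigned = 0
        for pos, outcomes in tokens:
            result = outcomes[alternative if pos in switched else default]
            assigned += result != "none"
            correct += result == "correct"
        return f_value(correct, assigned, len(tokens))

    switched = []
    best = score(set())
    candidates = {pos for pos, _ in tokens}
    while candidates:
        gains = {pos: score(set(switched) | {pos}) for pos in candidates}
        pos = max(sorted(gains), key=gains.get)
        if gains[pos] <= best:
            break  # no remaining POS tag improves the F-value
        best = gains[pos]
        switched.append(pos)
        candidates.remove(pos)
    return switched, best

# Hypothetical training outcomes per token.
tokens = [
    ("ART", {"nth-tag": "correct", "distance": "wrong"}),
    ("ART", {"nth-tag": "correct", "distance": "correct"}),
    ("ADV", {"nth-tag": "wrong", "distance": "correct"}),
    ("ADV", {"nth-tag": "none", "distance": "correct"}),
    ("NN", {"nth-tag": "correct", "distance": "none"}),
]
switched, best = build_decision_list(tokens)
print(switched, best)
```

On the toy data only ADV is switched to the distance method; switching ART or NN as well would lower the F-value.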

7 Results for Tagging Approach

8 Overview of Finite-State Parser

9 Recognition Phase
• consists of cascaded deterministic transducers (like Abney:97)
• noun chunker also recognizes nested noun phrases (`full noun chunks')
• inflectional information checked on-line
• clause chunker recognizes complete clauses, not simplex clauses (Abney:97)

10 Example Output of Noun Chunker

11 Example Output of Clause Chunker

12 Rule Interpretation
• inserts
  – syntactic structure (AdjP, coordinated VP or Prep)
  – grammatical roles (13 different roles)
• recognition grammar generated from interpretation grammar by removing semicolon symbols, e.g.
    det ;SPR ( ;[ADJP ( adv ;ADJ )* adja ;HD ;]ADJP )* nn ;HD FINAL:NP
• nondeterministic transducer (like Abney:97)
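Deriving a recognition rule from the interpretation rule above amounts to dropping every symbol that starts with a semicolon. The sketch below does that and, purely as an illustration, compiles the result into a Python regular expression over space-separated POS tags. The helper names are hypothetical, the regex stands in for the actual transducer, and the `FINAL:NP` annotation is left out because the slide does not explain its semantics.

```python
import re

INTERPRETATION_RULE = "det ;SPR ( ;[ADJP ( adv ;ADJ )* adja ;HD ;]ADJP )* nn ;HD"

def to_recognition_rule(rule):
    """Remove the semicolon symbols (inserted structure and roles)."""
    return " ".join(tok for tok in rule.split() if not tok.startswith(";"))

def to_regex(recognition_rule):
    """Compile a recognition rule into a regex over 'tag tag tag ' strings."""
    parts = []
    for tok in recognition_rule.split():
        if tok == "(":
            parts.append("(?:")          # non-capturing group
        elif tok in (")", ")*", ")+", ")?"):
            parts.append(tok)
        else:
            parts.append(re.escape(tok) + r"\s")
    return re.compile("".join(parts))

def matches(regex, tags):
    """True if the whole tag sequence is accepted by the rule."""
    return regex.fullmatch(" ".join(tags) + " ") is not None

recognition_rule = to_recognition_rule(INTERPRETATION_RULE)
noun_chunk = to_regex(recognition_rule)

print(recognition_rule)                                   # det ( ( adv )* adja )* nn
print(matches(noun_chunk, ["det", "adv", "adja", "nn"]))  # True
print(matches(noun_chunk, ["det", "adv", "nn"]))          # False
```

Keeping one annotated grammar and generating the recognition grammar from it means the two cannot drift apart.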

13 Example Output of Rule Interpreter

14 Subcat Frame Recognition
• deterministic transducer to find lexically given subcategorization frames
• fine-grained distinction of complements (61 additional roles), partially disambiguates between adjuncts and complements
• if no corresponding frame is found, unspecified role (CMP, ACMP) remains
  – only correct in half-labelled precision and recall
• several frames can be encoded at once

15 Example Output of Frame Recognizer

16 Conversion into Dependency Tuples
• explicit representation of ambiguities (subcat roles and attachment) with context variables
• measuring performance of parsers with underspecified output (Riezler et al.:02)
  – lower bound: random disambiguation
  – upper bound: ideal disambiguation
• also heuristic disambiguation: choose
  – highest attachment and
  – most frequent subcat frame
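The two bounds can be sketched on the example sentence from slide 17. The sketch rests on assumptions that the slides do not state: every reading of an ambiguous token is taken to be equally likely under random disambiguation, the lower bound is computed as the expected share of correct tokens, and the gold analysis is hypothetical. The function name and data layout are likewise illustrative.

```python
from fractions import Fraction

TOP = "TOP"

def bounds(gold, underspecified):
    """Return (lower, upper) share of correctly analysed tokens.

    `underspecified` maps each word position to the list of its readings.
    Upper bound: ideal disambiguation picks the gold reading if present.
    Lower bound: expected value under uniform random disambiguation.
    """
    n = len(gold)
    upper = sum(1 for w, alts in underspecified.items() if gold[w] in alts)
    lower = sum(
        Fraction(alts.count(gold[w]), len(alts))
        for w, alts in underspecified.items()
    )
    return float(lower / n), upper / n

# Readings for "Udo kennt eine sehr nette Frau aus Rio ." as on slide 17.
parser_output = {
    0: [(1, "NPnom"), (1, "NPakk")],   # Udo: subcat role ambiguity
    1: [TOP],
    2: [(5, "SPR")],
    3: [(4, "ADJ")],
    4: [(5, "ADJ")],
    5: [(1, "NPakk"), (1, "NPnom")],   # Frau: subcat role ambiguity
    6: [(7, "MRK")],
    7: [(1, "ADJ"), (5, "ADJ")],       # Rio: attachment ambiguity
    8: [TOP],
}
# Hypothetical gold analysis, assumed for this illustration only.
gold = {0: (1, "NPnom"), 1: TOP, 2: (5, "SPR"), 3: (4, "ADJ"), 4: (5, "ADJ"),
        5: (1, "NPakk"), 6: (7, "MRK"), 7: (5, "ADJ"), 8: TOP}

lower, upper = bounds(gold, parser_output)
print(lower, upper)
```

Because the expected number of correct tokens is a sum of per-token probabilities, the coupling that context variables introduce between readings does not change the lower bound, as long as each reading of a token stays equally likely.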

17 Example Output: Dependency Tuples
dependent   head      role
Udo/0       kennt/1   [1a]:NPnom, [1b]:NPakk
kennt/1     TOP
eine/2      Frau/5    SPR
sehr/3      nette/4   ADJ
nette/4     Frau/5    ADJ
Frau/5      kennt/1   [1a]:NPakk, [1b]:NPnom
aus/6       Rio/7     MRK
Rio/7       kennt/1   ADJ [1A0]
            Frau/5    ADJ [1A1]
./8         TOP

18 Results for Finite-State Parser

19 Conclusion
• two approaches to partial parsing: tagger, finite-state parser
• hybrid model of nth-tag tagging and finite-state parsing achieves 87.3-89.0% on I-tags (gain of 4.8% in lower and 1% in upper bound)
• some constructions not yet handled in parser
  – attachment of extraposed relative clauses and noun-complement clauses
  – distribution of constituents in the middle field under VP coordination

