Coping with Problems in Grammars Automatically Extracted from Treebanks Carlos A. Prolo Computer and Info. Science Dept. University of Pennsylvania.


1 Coping with Problems in Grammars Automatically Extracted from Treebanks Carlos A. Prolo Computer and Info. Science Dept. University of Pennsylvania

2 Context ● The extraction of a Tree Adjoining Grammar (TAG) ● From the Penn Treebank (English WSJ corpus) ● Using Xia's extraction tool + other stuff Focus ● Extraction problems: – Some case studies

3 Teaser ● The business of grammar extraction from corpora is intended to produce a grammar with “full” coverage of the constructions in a language ● But we know we don't know how to model many syntactic phenomena ● So, what are we doing? ● We have to start looking, pragmatically, at the quality of the extracted grammars we produce

4 Sources of extraction problems 1 Lack of proper linguistic account 2 Treebank annotation style 3 Extraction tool/process itself 4 Unsuitability of the language model 5 Unsuitability of the grammar formalism 6 Annotation errors 7 ... and, of course, inability on the part of the grammar developers

5 Sources of extraction problems 1 Lack of proper linguistic account 2 Treebank annotation style 3 X Extraction tool/process itself 4 Unsuitability of the language model 5 Unsuitability of the grammar formalism 6 X Annotation errors

6 Lexicalized Tree Adjoining Grammar (LTAG) [Figure: elementary trees; node labels S, NP, VP, V, N, Adv, and foot node VP*]

7 LTAG: combining trees [Figure: the elementary trees (node labels S, NP, VP, V, N, Adv, foot node VP*) and the tree derived by combining them]
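The two combining operations named on this slide, substitution and adjunction, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Node` class, the function names, and the words "Kim", "likes", "tea" and "often" are hypothetical and do not come from the slides; only the tree shapes (a transitive verb tree, an NP tree, and a VP auxiliary tree with an Adv) follow the node labels of the figure.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)
    subst: bool = False  # substitution site (frontier node awaiting an initial tree)
    foot: bool = False   # foot node of an auxiliary tree (written VP* on the slides)


def substitute(site: Node, initial: Node) -> None:
    """Fill a substitution site with an initial tree that has the same root label."""
    assert site.subst and site.label == initial.label
    site.children = initial.children
    site.subst = False


def find_foot(node: Node) -> Optional[Node]:
    """Return the foot node of an auxiliary tree, or None if there is none."""
    if node.foot:
        return node
    for child in node.children:
        found = find_foot(child)
        if found is not None:
            return found
    return None


def adjoin(target: Node, aux: Node) -> None:
    """Adjoin the auxiliary tree `aux` at the node `target`, in place."""
    foot = find_foot(aux)
    assert foot is not None and aux.label == target.label == foot.label
    # The foot node takes over the material that used to hang below the target ...
    foot.children = target.children
    foot.foot = False
    # ... and the target node now expands into the auxiliary tree.
    target.children = aux.children


def to_brackets(node: Node) -> str:
    """Render a tree in Penn Treebank style bracketing."""
    if not node.children:
        return node.label
    return "(" + node.label + " " + " ".join(to_brackets(c) for c in node.children) + ")"


# Hypothetical example: a transitive verb tree with two NP substitution sites.
subj, obj = Node("NP", subst=True), Node("NP", subst=True)
vp = Node("VP", [Node("V", [Node("likes")]), obj])
s = Node("S", [subj, vp])

substitute(subj, Node("NP", [Node("N", [Node("Kim")])]))
substitute(obj, Node("NP", [Node("N", [Node("tea")])]))
adjoin(vp, Node("VP", [Node("VP", foot=True), Node("Adv", [Node("often")])]))

print(to_brackets(s))
```

Substitution fills a frontier node, while adjunction splices an auxiliary tree into the middle of an existing tree, which is why the adverb ends up as a sister of the original VP material.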

8 Automatic TAG extraction (figure courtesy of Fei Xia)

9 Automatic TAG extraction (figure courtesy of Fei Xia)

10 A few selected problem cases 1 (PTB) Extraction of Free Relatives 2 Wh percolation up 3 “Unlike Coordinated Phrases” (UCP) 4 Extraposition (Verb Subcategorization) 5 Parentheticals 6 VP topicalization 7 X (PTB) Projection of Parts-of-speech

11 Extraction of Free Relatives (problem due to PTB annotation style) Original annotation: (S-3 (NP-SBJ (PRP We)) (VP (VBP make) (SBAR-NOM (WHNP-1 (WP what)) (S we know how to make)))) Modified annotation: (S-3 (NP-SBJ (PRP We)) (VP (VBP make) (NP (NP (WP what)) (SBAR (WHNP-1 (-NONE- 0)) (S we know how to make))))) ● Problem: Free relatives are annotated as wh sentential complements, so the verb is extracted with the wrong argument category: “S (SBAR)” ● Solution: Change the free relatives to NP (the relative clause has an empty wh: the “head” account, Bresnan 78)
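The rewrite described on this slide can be sketched as a small tree transformation. The code below is an assumption-laden illustration, not the preprocessing actually used for the extraction: the function names are hypothetical, trees are plain nested lists, and any function tags on the SBAR-NOM node are simply dropped from the new NP.

```python
def parse(text):
    """Parse one Penn Treebank style bracketing into nested lists [label, child, ...]."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok != "(":
            return tok  # a word or an empty-element string
        node = [tokens[pos]]  # the constituent label
        pos += 1
        while tokens[pos] != ")":
            node.append(read())
        pos += 1  # skip the closing parenthesis
        return node

    return read()


def show(node):
    """Render nested lists back into a bracketed string."""
    if isinstance(node, str):
        return node
    return "(" + " ".join(show(part) for part in node) + ")"


def fix_free_relatives(node):
    """Rewrite (SBAR-NOM (WHNP-i wh) S) as (NP (NP wh) (SBAR (WHNP-i (-NONE- 0)) S))."""
    if isinstance(node, str):
        return node
    label = node[0]
    children = [fix_free_relatives(child) for child in node[1:]]
    is_free_relative = (
        label.startswith("SBAR-NOM")
        and len(children) == 2
        and isinstance(children[0], list)
        and children[0][0].startswith("WH")
    )
    if is_free_relative:
        wh, clause = children
        head = ["NP"] + wh[1:]               # the wh-word becomes the NP head
        empty_wh = [wh[0], ["-NONE-", "0"]]  # the relative clause keeps an empty wh
        return ["NP", head, ["SBAR", empty_wh, clause]]
    return [label] + children


original = (
    "(S-3 (NP-SBJ (PRP We)) (VP (VBP make) "
    "(SBAR-NOM (WHNP-1 (WP what)) (S we know how to make))))"
)
print(show(fix_free_relatives(parse(original))))
```

After this rewrite the verb "make" sees an NP sister rather than an SBAR, so an extractor that reads argument categories off the tree would record an NP object.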

12 Wh percolation up (NP (NP (DT the) (NNS researchers)) (SBAR (WHNP-3 (WP who)) (S (NP-SBJ (-NONE- *T*-3)) (VP (VBD studied) (NP (DT the) (NNS workers)))))) [Figure: extracted trees; node labels WHNP, WP, NP, NNS]

13 Wh percolation up (SBARQ (WHNP-46 (WP What) (NN sector)) (SQ (VBZ is) (NP-SBJ-2 (-NONE- *T*-46)) (VP (VBG stepping) (ADVP-DIR (RB forward))))) [Figure: candidate trees combined with "+"; node labels NP, NN, WP, and foot node NP*]

14 Wh percolation up (SBARQ (WHNP-46 (WP What) (NN sector)) (SQ (VBZ is) (NP-SBJ-2 (-NONE- *T*-46)) (VP (VBG stepping) (ADVP-DIR (RB forward))))) [Figure: candidate trees combined with "+"; node labels NP, NN, WP, WHNP, and foot nodes NP* and WHNP*]

15 Wh percolation up (SBARQ (WHNP-46 (WP What) (NN sector)) (SQ (VBZ is) (NP-SBJ-2 (-NONE- *T*-46)) (VP (VBG stepping) (ADVP-DIR (RB forward))))) [Figure: candidate trees combined with "+"; node labels NP, NN, WP, WHNP, and foot nodes NP* and WHNP*]

16 Wh percolation up (SBARQ (WHNP-46 (WP What) (NN sector)) (SQ (VBZ is) (NP-SBJ-2 (-NONE- *T*-46)) (VP (VBG stepping) (ADVP-DIR (RB forward))))) [Figure: candidate trees combined with "+"; node labels NP, NN, WP, WHNP, and foot nodes NP* and WHNP*] (Vijay-Shanker et al.)

17 Wh percolation up (SBARQ (WHNP-46 (WP What) (NN sector)) (SQ (VBZ is) (NP-SBJ-2 (-NONE- *T*-46)) (VP (VBG stepping) (ADVP-DIR (RB forward))))) [Figure: tree with node labels WHNP, WP, combined ("+") with an open alternative marked "?"]

18 Wh percolation up (NP-SBJ (NP The bid) (PP for Great Northern) (, ,) (SBAR (WHNP-1 (NP (DT a) (NN notice)) (WHPP (IN of) (WHNP (WDT which)))) (S *T* appears in an advertisement))) [Figure: extracted tree; node labels WHNP, WP]

19 Wh percolation up (SBARQ (WHNP-46 (WP What) (NN sector)) (SQ (VBZ is) (NP-SBJ-2 (-NONE- *T*-46)) (VP (VBG stepping) (ADVP-DIR (RB forward))))) [Figure: trees combined with "+"; node labels WHNP, NN, WP, IN, WHPP, and foot node WHNP*]

20 Unlike Coordinated Phrases (UCP) (NP (UCP (NN construction) (CC and) (JJ commercial)) (NNS loans)) (VP (VB be) (UCP-PRD (NP (CD 35)) (CC or) (ADJP (JJR older)))) (VP (VB take) (NP (NN effect)) (UCP-TMP (ADVP 96 days later) (, ,) (CC or) (PP in early February)))

21 Unlike Coordinated Phrases (UCP) (NP (UCP (NN construction) (CC and) (JJ commercial)) (NNS loans)) (VP (VB be) (UCP-PRD (NP (CD 35)) (CC or) (ADJP (JJR older)))) (VP (VB take) (NP (NN effect)) (UCP-TMP (ADVP 96 days later) (, ,) (CC or) (PP in early February))) [Figure: extracted trees, including one anchored by [be]; node labels NP, NP*, UCP, JJ, NN, CC, S, VB, VP]

22 Unlike Coordinated Phrases (UCP) ● We give the UCP the status of an independent nonterminal, as if it had some intrinsic categorial significance ● Multiple conjuncts: it is enough for one of them to be of a distinct category to turn the entire constituent into a UCP
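The second bullet amounts to a simple labelling rule, sketched below. The function names are hypothetical and the rule is a simplification made for illustration: function tags and indices are stripped by splitting at the first hyphen, and no distinction is drawn between word-level and phrase-level conjuncts.

```python
def category(label):
    """Drop function tags and indices: "ADJP-PRD" -> "ADJP", "NP-SBJ-1" -> "NP"."""
    return label.split("-")[0]


def coordination_label(conjunct_labels):
    """Label of a coordination: the shared category, or UCP if any conjunct differs."""
    if not conjunct_labels:
        raise ValueError("a coordination needs at least one conjunct")
    categories = {category(label) for label in conjunct_labels}
    if len(categories) == 1:
        return categories.pop()
    return "UCP"


# Conjunct categories taken from the UCP examples on the slides.
print(coordination_label(["NN", "JJ"]))        # construction and commercial
print(coordination_label(["NP", "ADJP"]))      # 35 or older
print(coordination_label(["ADJP-PRD", "VP"]))  # uninsured and rated ...
print(coordination_label(["NP", "NP", "NP"]))  # like conjuncts keep their category
```

Because a single odd conjunct flips the whole constituent, the UCP label carries no information about what the coordination can combine with, which is the difficulty the first bullet points at.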

23 Unlike Coordinated Phrases (UCP): as the head of a constituent (S (NP-SBJ-1 The Series 1989 B bonds) (VP (VBP are) (VP (VBN rated) (S *-1 double-A)))) (S (NP-SBJ-1 The Series 1989 B bonds) (VP (VBP are) (UCP-PRD (ADJP-PRD (JJ uninsured)) (CC and) (VP (VBN rated) (S *-1 double-A)))))

24 Extraposition (“it” extraposition) (S (NP-SBJ-1 (NP (PRP it)) (S (-NONE- *EXP*-2))) (VP (MD would) (ADVP-TMP (RB no) (RBR longer)) (VP (VB be) (ADJP-PRD (JJ possible)) (S-2 (NP-SBJ (-NONE- *-1)) (VP (TO to) (VP (VB win) (NP (NN reinstatement)))))))) [Figure: extracted tree anchored by [win]; node labels VP, VP*, S]

25 Extraposition (relative clause) (S (ADVP-TMP (RB Soon)) (, ,) (NP-SBJ (NP (NNS T-shirts)) (SBAR (-NONE- *ICH*-1))) (VP (VBD appeared) (PP-LOC (IN in) (NP (DT the) (NNS corridors))) (SBAR-1 (WHNP-2 (WDT that)) (S (NP-SBJ (-NONE- *T*-2)) (VP (VBD carried) (NP (NP the school 's familiar logo) (PP-LOC on the front))))))) [Figure: extracted tree anchored by [carried]; node labels VP, VP*, SBAR]
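Both extraposition examples rely on the treebank's pseudo-attachment traces, *EXP* and *ICH*, whose numeric index points to the displaced constituent. As a rough sketch of how an extraction script might pair the two, the code below collects such traces and looks up the co-indexed node. Everything here is an assumption made for illustration: the function names are hypothetical, trees are nested lists, and indices are taken to be unique within a sentence.

```python
import re

TRACE = re.compile(r"^\*(EXP|ICH)\*-(\d+)$")  # pseudo-attachment traces only
INDEX = re.compile(r"-(\d+)$")                # trailing co-index on a label


def parse(text):
    """Parse one bracketing into nested lists [label, child, ...]."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok != "(":
            return tok
        node = [tokens[pos]]
        pos += 1
        while tokens[pos] != ")":
            node.append(read())
        pos += 1
        return node

    return read()


def extraposed_pairs(tree):
    """Return (trace kind, label of the co-indexed constituent) for each trace."""
    antecedents = {}  # index -> label of the constituent carrying that index
    traces = []       # (kind, index) for every *EXP* / *ICH* trace found

    def walk(node):
        if isinstance(node, str):
            return
        label, children = node[0], node[1:]
        if label == "-NONE-":
            match = TRACE.match(children[0])
            if match:
                traces.append((match.group(1), match.group(2)))
            return
        match = INDEX.search(label)
        if match:
            antecedents[match.group(1)] = label
        for child in children:
            walk(child)

    walk(tree)
    return [(kind, antecedents.get(index)) for kind, index in traces]


SLIDE_25 = (
    "(S (ADVP-TMP (RB Soon)) (, ,) "
    "(NP-SBJ (NP (NNS T-shirts)) (SBAR (-NONE- *ICH*-1))) "
    "(VP (VBD appeared) "
    "(PP-LOC (IN in) (NP (DT the) (NNS corridors))) "
    "(SBAR-1 (WHNP-2 (WDT that)) "
    "(S (NP-SBJ (-NONE- *T*-2)) "
    "(VP (VBD carried) "
    "(NP (NP the school 's familiar logo) (PP-LOC on the front)))))))"
)
print(extraposed_pairs(parse(SLIDE_25)))
```

Ordinary movement traces such as *T*-2 are deliberately ignored, since only the *EXP* and *ICH* cases mark material that surfaces away from the phrase it belongs to.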

26 Extraposition (Object) (S (NP-SBJ Mr. Peters) (VP (VBZ says) (PP-LOC in his affidavit) (SBAR (IN that) (S (NP the movie 's staff) (VP (VBD was) (VP (VBN told) (NP (-NONE- *-1)) (NP-TMP last week) (SBAR that Warner was...) [Figure: extracted trees, anchor in brackets: S, NP, VBZ, SBAR, VP [says]; S, NP, VBD, NP, VP, SBAR [told]; VP, VP*, PP [in]; VP, VP*, NP [week]]

27 Extraposition (Object) (S (NP-SBJ Mr. Peters) (VP (VBZ says) (PP-LOC in his affidavit) (SBAR (IN that) (S (NP the movie 's staff) (VP (VBD was) (VP (VBN told) (NP (-NONE- *-1)) (NP-TMP last week) (SBAR that Warner was...) [Figure: extracted trees, anchor in brackets: S, NP, VBZ, SBAR, VP [says]; S, NP, VBD, NP, VP, SBAR [told]; VP, VP*, PP [in]; VP, VP*, NP [week]] Note: Chiang 2000 (sister adjunction)

28 Extraposition (Object) (S (NP-SBJ Mr. Peters) (VP (VBZ says) (PP-LOC in his affidavit) (SBAR (IN that) (S (NP the movie 's staff) (VP (VBD was) (VP (VBN told) (NP (-NONE- *-1)) (NP-TMP last week) (SBAR that Warner was...) [Figure: extracted trees, anchor in brackets: S, NP, VBZ, VP [says]; S, NP, VBD, NP, VP [told]; VP, VP*, PP [in]; VP, VP*, NP [week]; VP, VP*, SBAR [S compl]]

29 Extraposition (Object) [Figure: tree sets anchored by [says] and [told]; node labels S, NP, VP, SBAR, VBZ, VBD] Note: Multi-component TAGs (Bleam & Xia, TAG+ 2000)

30 Parentheticals (non-lexicalized trees!) (NP (NP the 3 billion New Zealand dollars) (PRN (-LRB- -LRB-) (NP US$ 1.76 billion *U*) (-RRB- -RRB-))) (S (NP-SBJ The total relationship) (PRN (, ,) (SBAR-ADV as Mr. Lee sees it) (, ,)) (VP (VBZ is)...)) [Figure: extracted trees; node labels VP, PRN, VP*, SBAR and NP, NP*, PRN, NP]

31 VP Topicalization [Figure: lexical head versus syntactic head analyses, with trees anchored by [be] and [excluded]; node labels S, NP, VP, V, VBN, and foot node VP*] (S (NP-SBJ-1 investments in...) (VP (MD will) (VP (VB be) (VP (VBN excluded)))))

32 VP Topicalization [Figure: lexical head versus syntactic head analyses, with trees anchored by [be] and [excluded]; node labels S, NP, VP, V, VBN, and foot node VP*] (SINV (ADVP (RB Also)) (VP-TPC-2 (VBN excluded)) (VP (MD will) (VP (VB be) (VP (-NONE- *T*-2)))) (NP-SBJ-1 investments in...))

33 Projections of Parts-of-speech (NP (DT a) (JJR stronger) (NN argument)) (NP (DT an) (ADJP (RB even) (JJR stronger)) (NN argument)) [Figure: two extracted trees anchored by [stronger]; node labels NP, JJR, NP* and NP, ADJP, NP*, JJR]

34 Projections of Parts-of-speech (NP-SBJ-1 (NNP October) (NN weather)) (NP-SBJ-1 (NP (JJ late) (NNP October)) (NN weather)) [Figure: two extracted trees anchored by [October]; node labels NP, NNP, NP* and NP, NP, NP*, NNP]

35 Forced Projections of Parts-of-speech
PROJECTED → PROJECTION
NN, NNP, PRP, EX → NP
JJ, JJR, JJS → ADJP
RB, RBR, RBS → ADVP
S, SINV → SBAR
SQ → SBARQ
WP → WHNP
WRB → ADVP
CD → QP
QP → NP
UH → INTJ
LS → LST
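The projection table lends itself to a direct lookup. The sketch below encodes it as a Python dictionary and wraps bare modifiers in their forced projection, so that "stronger" and "even stronger" yield modifier trees of the same shape. The function names are hypothetical, and treating the last child as the head of the phrase is a loud simplification made only for this example; a real extractor would use a head table. Only one projection step is applied per node.

```python
# Forced projections, copied from the table on the slide.
PROJECTION = {
    "NN": "NP", "NNP": "NP", "PRP": "NP", "EX": "NP",
    "JJ": "ADJP", "JJR": "ADJP", "JJS": "ADJP",
    "RB": "ADVP", "RBR": "ADVP", "RBS": "ADVP",
    "S": "SBAR", "SINV": "SBAR",
    "SQ": "SBARQ",
    "WP": "WHNP",
    "WRB": "ADVP",
    "CD": "QP",
    "QP": "NP",
    "UH": "INTJ",
    "LS": "LST",
}


def force_projection(node):
    """Wrap a node in its forced projection, if the table lists its label."""
    label = node[0]
    if label in PROJECTION:
        return [PROJECTION[label], node]
    return node


def project_modifiers(phrase):
    """Project every non-head child of a phrase.

    Assumption for illustration only: the head is the last child.
    """
    *modifiers, head = phrase[1:]
    return [phrase[0]] + [force_projection(m) for m in modifiers] + [head]


bare = ["NP", ["DT", "a"], ["JJR", "stronger"], ["NN", "argument"]]
print(project_modifiers(bare))
```

With the bare adjective wrapped in an ADJP, both NPs from the "stronger" slide present the modifier as an ADJP, which is the point of forcing the projection.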

36 Conclusion ● Full coverage of a language is (currently) utopian ● Grammar extraction can/should be used to search for solutions to grammar development problems ● We presented a few selected problems in grammar extraction and discussed solutions with varying degrees of acceptability (using TAGs) ● There are more and harder ones where these came from ● Question: how would these problems be handled: – By other grammar formalisms? – By other linguistic approaches using the TAG formalism?

37 LTAG Verb Trees [Figure: verb elementary trees; node labels S, NP, V, VBN, VP, SBAR, WHNP, and foot node NP*]

38 Automatic TAG extraction

39 Figure courtesy of Fei Xia

40 Wh percolation up (NP-SBJ (NP The bid) (PP for Great Northern) (, ,) (SBAR (WHNP-1 (NP (DT a) (NN notice)) (WHPP (IN of) (WHNP (WDT which)))) (S *T* appears in an advertisement))) (NP-PRD (NP (NNS hitches)) (, ,) (SBAR (RB not) (WHNP-17 (NP (DT the) (JJS least)) (WHPP (IN of) (WHNP (WDT which)))) (S *T* was that... )

41 Extraposition (S (NP-SBJ (NP (PRP it)) (S (-NONE- *EXP*-1))) (VP (VBZ is) (ADJP-PRD (JJ unjust)) (S-1 (NP-SBJ (-NONE- *)) (VP (TO to) (VP (VB reprove) (NP (NNP China)) (PP-PRP (IN for) (NP (PRP it))))))))

