Acquiring Reliable Predicate-argument Structures from Raw Corpora for Case Frame Compilation. Daisuke Kawahara and Sadao Kurohashi. LREC2010, 2010/05/20.

1 Acquiring Reliable Predicate-argument Structures from Raw Corpora for Case Frame Compilation
Daisuke Kawahara (1) and Sadao Kurohashi (1,2)
LREC2010, 2010/05/20
(1) National Institute of Information and Communications Technology
(2) Kyoto University

2 Background
NLP analyzers so far
– (Mainly) supervised, (relatively) knowledge-poor
  e.g., PP-attachment or parsing
    Mary ate the salad with a fork
    Mary ate the salad with mushrooms
– Only 1.5% of bilexical dependencies were learned [Bikel, 04]
→ Toward knowledge-oriented NLP
– Automatically compile case frames and integrate them into NLP analyzers/applications

3 Related work
Subcategorization frames
– [Brent, 93] [Ushioda et al., 93] [Manning, 93] [Briscoe and Carroll, 97] [Korhonen, 02] …
  e.g., She greeted me.
        NP(sbj) greet NP(obj)
  e.g., She gave him a book.
        NP(sbj) give NP(obj) NP(obj)

4 Related work
Subcategorization frames
– [Brent, 93] [Ushioda et al., 93] [Manning, 93] [Briscoe and Carroll, 97] [Korhonen, 02] …
(Manually compiled) semantic frames
– FrameNet [Baker et al., 98], PropBank [Palmer et al., 05]
Japanese semantic case frames
– Semantic marker-based: [Haruno, 95] [Utsuro et al., 96]
– Example-based: [Kawahara and Kurohashi, 06]

5 Compilation of Japanese semantic case frames [Kawahara and Kurohashi, 06]

                              CS   examples (in English)
yaku (1) (bake)               ga   I:18, person:15, craftsman:10, …
                              wo   bread:2484, meat:1521, cake:1283, …
                              de   oven:1630, frying pan:1311, …
yaku (2) (have difficulty)    ga   teacher:3, government:3, person:3, …
                              wo   hand:2950
                              ni   attack:18, action:15, son:15, …
yaku (3) (burn)               ga   company:1, distributor:1, …
                              wo   data:178, file:107, copy:9, …
                              ni   R:1583, CD:664, CDR:3, …
…

ga: nominative, wo: accusative, ni: dative, de: instrument

6 Compilation of English case frames [Kawahara and Uchimoto, 08]
[Flow diagram] Sentence extraction → Sentences → Parsing and filtering → Predicate-argument structures → Clustering → Case frames
Diagram labels: WordNet; Dependency parser 89.9% → 91.5% (short sentences)

7 Examples of obtained case frames [Kawahara and Uchimoto, 08]

            CS       examples
burn (1)    sbj      they:262, it:113, protester:99, …
            obj      flag:247, effigy:81, house:67, …
            pp:in    :29, ramallah:14, brisbane:11, …
            pp:for   week:15, hour:6, month:5, …
burn (2)    sbj      candle:26, lamp:5
            pp:on    motor-scooter:7, altar:3, platform:1, …
            pp:for   day:2, steinhaeuser:1
…

CS: surface cases and prepositions
sbj, obj, obj2, sbar, pp:for, pp:in, …

8 Compilation of English case frames
[Flow diagram] Sentence extraction → Sentences → Parsing and filtering → Predicate-argument structures → Clustering → Case frames
Diagram labels: WordNet; Dependency parser 89.9% → 91.5% (short sentences)

9 Procedure
1. Apply POS tagging and chunking to a raw corpus
2. Filter out unreliable and inappropriate sentences and chunks
3. Extract predicate-argument structures and apply PP-attachment disambiguation if a PP exists
Example:
I borrowed the kits with a $25.00 deposit, and …
→ NP:[I] VP:[borrowed] NP:[the kits] PP:[with] NP:[a $ 25.00 deposit] O:, O:and …
→ NP:[I] VP:[borrowed] NP:[the kits] PP:[with] NP:[a $ 25.00 deposit]
→ sbj:[I] pred:[borrow] obj:[the kits] pp:with:[a $ 25.00 deposit]

10 1. POS tagging and chunking
POS tagging
– Tsuruoka’s tagger [Tsuruoka and Tsujii, 05]
  accuracy: 97.1%
Chunking
– YamCha chunker [Kudo and Matsumoto, 01]
  precision: 93.89%, recall: 93.06%, F: 93.47
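The chunker's F value can be checked against its precision and recall. The sketch below assumes the slide reports the balanced F-measure (the harmonic mean of precision and recall); the function name f_measure is only illustrative.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure: the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Chunker figures quoted on the slide (in percent).
chunk_f = f_measure(93.89, 93.06)
print(f"F = {chunk_f:.2f}")
```

Under that assumption the result rounds to the 93.47 reported on the slide.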

11 2. Filtering of unreliable sentences and chunks
sentences to be discarded
– a sentence that begins with a VP or a PP
– a sentence that ends with a question mark
– a sentence that has a comma adjacent to a VP
– a sentence that contains a sign (-, ;, …)
– a sentence that does not have an NP before a VP
– a sentence in which the first VP is a participle or an infinitive
chunks to be discarded
– chunks following the first comma outside an NP
– chunks following wh-clauses
– chunks following the second VP except participles and infinitives
Coverage: 17.9%
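A minimal sketch of how some of these filters might be applied to a chunked sentence. The Chunk class, its verb_form field, and the function names are hypothetical and not from the slides; the sign set holds only the two signs the slide names, since its list is elided. The wh-clause and second-VP rules are left out because they need information this simple representation does not carry.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Chunk:
    label: str                  # "NP", "VP", "PP", "SBAR" or "O" (outside any chunk)
    text: str
    verb_form: str = "finite"   # for VP chunks: "finite", "participle" or "infinitive"


# Only the signs named on the slide; the remainder of its list is elided.
SIGNS = {"-", ";"}


def keep_sentence(chunks: List[Chunk]) -> bool:
    """Return False if any sentence-level discard rule fires."""
    if not chunks:
        return False
    if chunks[0].label in ("VP", "PP"):      # begins with a VP or a PP
        return False
    if chunks[-1].text == "?":               # ends with a question mark
        return False
    if any(c.label == "O" and c.text in SIGNS for c in chunks):
        return False                         # contains a sign
    for i, c in enumerate(chunks):
        if c.label == "O" and c.text == ",":
            neighbours = chunks[max(i - 1, 0):i] + chunks[i + 1:i + 2]
            if any(n.label == "VP" for n in neighbours):
                return False                 # comma adjacent to a VP
    vp_positions = [i for i, c in enumerate(chunks) if c.label == "VP"]
    if not vp_positions:
        return False                         # assumption: no VP means nothing to extract
    first_vp = vp_positions[0]
    if not any(c.label == "NP" for c in chunks[:first_vp]):
        return False                         # no NP before the VP
    if chunks[first_vp].verb_form != "finite":
        return False                         # first VP is a participle or an infinitive
    return True


def trim_chunks(chunks: List[Chunk]) -> List[Chunk]:
    """Drop the first comma outside an NP and every chunk after it."""
    for i, c in enumerate(chunks):
        if c.label == "O" and c.text == ",":
            return chunks[:i]
    return list(chunks)


example = [
    Chunk("NP", "I"), Chunk("VP", "borrowed"), Chunk("NP", "the kits"),
    Chunk("PP", "with"), Chunk("NP", "a $ 25.00 deposit"),
    Chunk("O", ","), Chunk("O", "and"),
]
if keep_sentence(example):
    print([f"{c.label}:[{c.text}]" for c in trim_chunks(example)])
```

On the example sentence from the Procedure slide, the sentence is kept and the chunks from the comma onward are dropped.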

12 Evaluation of filtering results
VP
– precision: 96.46% (517/536)
– 12/19 are not harmful, e.g., “successfully contended”
  → precision: 98.69% (529/536)
NP
– precision: 96.18% (1559/1621)
– 38/62 are not harmful, e.g., “about 10,000 diamond miners”
  → precision: 98.52% (1597/1621)
Example: His firm favors selected computer, drug and pollution-control stocks.
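The two precision figures for each chunk type follow from the counts on the slide: the second figure counts the non-harmful errors as correct. A short sketch of the arithmetic (the helper name pct is illustrative):

```python
def pct(correct: int, total: int) -> float:
    """Precision as a percentage."""
    return 100.0 * correct / total


# VP chunks: 517 of 536 correct; 12 of the 19 errors are not harmful.
vp_strict = pct(517, 536)
vp_lenient = pct(517 + 12, 536)

# NP chunks: 1559 of 1621 correct; 38 of the 62 errors are not harmful.
np_strict = pct(1559, 1621)
np_lenient = pct(1559 + 38, 1621)

print(f"VP: {vp_strict:.2f}% -> {vp_lenient:.2f}%")
print(f"NP: {np_strict:.2f}% -> {np_lenient:.2f}%")
```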

13 3. Extract predicate-argument structures from chunks
Use straightforward rules
– VP → pred
– NP preceding the predicate → sbj
– NP following the predicate → obj
– NP following “obj” → obj2
– SBAR → sbar
– a pair of adjoining PP and NP → pp
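The rules above can be sketched as a single pass over the filtered chunk sequence. This is an illustration under stated assumptions, not the authors' implementation: chunks are represented as hypothetical (label, text) pairs, the function name extract_pas is made up, the predicate keeps its surface form (the slides show a lemmatized predicate, e.g. borrow), and PP-attachment disambiguation is not attempted.

```python
from typing import List, Tuple

Chunk = Tuple[str, str]   # (chunk label, chunk text), e.g. ("NP", "the kits")


def extract_pas(chunks: List[Chunk]) -> List[Tuple[str, str]]:
    """Map a filtered chunk sequence to (role, text) pairs."""
    pas: List[Tuple[str, str]] = []
    seen_pred = False
    prev_role = None
    i = 0
    while i < len(chunks):
        label, text = chunks[i]
        role = None
        if label == "VP":
            if not seen_pred:                 # VP -> pred (first VP only)
                role = "pred"
                seen_pred = True
        elif label == "SBAR":                 # SBAR -> sbar
            role = "sbar"
        elif label == "PP":
            # a pair of adjoining PP and NP -> pp:<preposition>
            if i + 1 < len(chunks) and chunks[i + 1][0] == "NP":
                role = f"pp:{text.lower()}"
                text = chunks[i + 1][1]
                i += 1                        # consume the NP as well
        elif label == "NP":
            if not seen_pred:                 # NP preceding the predicate -> sbj
                role = "sbj"
            elif prev_role == "pred":         # NP following the predicate -> obj
                role = "obj"
            elif prev_role == "obj":          # NP following "obj" -> obj2
                role = "obj2"
        if role is not None:
            pas.append((role, text))
            prev_role = role
        i += 1
    return pas


borrowed = [("NP", "I"), ("VP", "borrowed"), ("NP", "the kits"),
            ("PP", "with"), ("NP", "a $ 25.00 deposit")]
gave = [("NP", "She"), ("VP", "gave"), ("NP", "him"), ("NP", "a book")]

print(" ".join(f"{role}:[{text}]" for role, text in extract_pas(borrowed)))
print(" ".join(f"{role}:[{text}]" for role, text in extract_pas(gave)))
```

The first example reproduces the structure from the Procedure slide apart from lemmatization; the second shows the obj/obj2 rule on a ditransitive sentence.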

14 Experiments
From 2G English sentences, we acquired 2.4G predicate-argument structures
  sbj:[the super-user] pred:[raise] obj:[the hard limits]
  sbj:[it] pred:[strengthen] obj:[the action]
  sbj:[he] pred:[raise] obj:[a hand]
  sbj:[this web page] pred:[be linked] pp:to:[any other web sites]
  sbj:[a user] pred:[view] obj:[items] pp:from:[your catalog]
  sbj:[you] pred:[read] obj:[this]
Manual evaluation of 200 predicate-argument structures: 97% are correct
– incorrect objects of say, know and so on
– incorrect detection of “sbar”
– errors of PP-attachment disambiguation
Example: He said the assets to be sold would be...

15 Conclusion and future work
Acquired high-quality predicate-argument structures for case frame compilation
– Real use of English predicates
Future work
– Apply clustering to compile case frames [Kawahara and Uchimoto, 08]
– Integrate case frames into parsing (and other applications)
  cf. [Zeman, 02] for subcategorization frames
      [Kawahara and Kurohashi, 06] for case frames
