Event Extraction: Learning from Corpora Prepared by Ralph Grishman Based on research and slides by Roman Yangarber NYU.

1 Event Extraction: Learning from Corpora Prepared by Ralph Grishman Based on research and slides by Roman Yangarber NYU

2 Finding Patterns
How can we collect patterns?
Supervised learning
– mark information to be extracted in text
– collect information and context = specific patterns
– generalize patterns
Annotation is quite expensive
Zipfian distribution of patterns means that annotation of consecutive text is inefficient … the same pattern is annotated many times

3 Unsupervised learning?
The intuition: if we collect documents D_R relevant to the scenario, patterns relevant to the scenario will occur more frequently in D_R than in the language as a whole (cf. sublanguage predicates in Harris’s distributional analysis)

4 Riloff ‘96
Corpus manually divided into relevant and irrelevant documents
Collect patterns around each noun phrase
Score patterns by R log F, where R = relevance rate = freq in relevant docs / overall freq
Select top-ranked patterns
These patterns each find one template slot; combining filled slots into templates is a separate task
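The scoring step can be sketched as follows. This is a minimal illustration, not Riloff's implementation: the slide does not define F or the log base, so the sketch assumes F is the pattern's frequency in relevant documents and uses log base 2. The names `rlogf_score` and `rank_patterns` are hypothetical.

```python
import math

def rlogf_score(freq_relevant: int, freq_total: int) -> float:
    """R log F, with R = freq_relevant / freq_total.

    Assumption: F is the frequency in relevant documents and the
    logarithm is base 2; the slide leaves both unspecified.
    """
    if freq_relevant <= 0 or freq_total <= 0:
        return 0.0
    relevance_rate = freq_relevant / freq_total
    return relevance_rate * math.log2(freq_relevant)

def rank_patterns(counts):
    """Rank patterns by descending score.

    counts maps pattern -> (freq in relevant docs, overall freq).
    """
    scored = {p: rlogf_score(fr, ft) for p, (fr, ft) in counts.items()}
    return sorted(scored, key=scored.get, reverse=True)
```

The log factor favors patterns that are frequent as well as relevant: a pattern seen once, and only in a relevant document, has R = 1 but scores 0.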

5 Extending the Discovery Procedure
Finding relevant documents automatically
– Yangarber … use patterns to select documents
– Sudo … use keywords and IR engine
Defining larger patterns (covering several template slots)
– Yangarber … clause structures
– Nobata; Sudo … larger structures

6 Automated Extraction Pattern Discovery
Goal: find examples / patterns relevant to a given scenario without any corpus tagging (Yangarber ‘00)
Method:
– identify a few seed patterns for the scenario
– retrieve documents containing the patterns
– find a subject-verb-object pattern with both high frequency in retrieved documents and relatively high frequency in retrieved docs vs. other docs
– add the pattern to the seed set and repeat
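The loop above can be sketched roughly as below. This is a simplified illustration under stated assumptions, not the procedure as published: each document is reduced to a set of candidate patterns, counts are document counts, candidates are scored with an R log F style score (log base 2 assumed), and the per-pattern confidence weighting is left out. The function `discover_patterns` and its parameters are hypothetical.

```python
import math

def discover_patterns(docs, seeds, iterations=3, min_relevant=2):
    """Bootstrap a pattern set from seed patterns.

    docs: list of sets, each holding one document's candidate patterns.
    seeds: initial patterns. Returns the accepted patterns in order.
    """
    accepted = list(seeds)
    for _ in range(iterations):
        # Retrieve documents that contain any accepted pattern.
        relevant = [d for d in docs if d & set(accepted)]
        candidates = set().union(*relevant) - set(accepted)
        best, best_score = None, 0.0
        for p in sorted(candidates):  # sorted for deterministic ties
            in_rel = sum(1 for d in relevant if p in d)
            overall = sum(1 for d in docs if p in d)
            if in_rel < min_relevant:
                continue  # require support from several relevant docs
            score = (in_rel / overall) * math.log2(in_rel)
            if score > best_score:
                best, best_score = p, score
        if best is None:
            break  # nothing new qualifies; stop early
        accepted.append(best)
    return accepted
```

With two documents containing both a "person retire" and a "person named president" tuple, seeding with the first is enough for the second to be picked up on the first iteration.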

7 #1: pick seed pattern Seed:

8 #2: retrieve relevant documents
Seed:
Relevant documents:
Fred retired. ... Harry was named president.
Maki retired. ... Yuki was named president.
Other documents

9 #3: pick new pattern
Seed:
appears in several relevant documents (top-ranked by Riloff metric)
Fred retired. ... Harry was named president.
Maki retired. ... Yuki was named president.

10 #4: add new pattern to pattern set Pattern set: Note: new patterns added with confidence < 1

11 Experiment
Task: Management succession (as MUC-6)
Source: Wall Street Journal
Training corpus: ~ 6,000 articles
Test corpus:
– 100 documents: MUC-6 formal training
– + 150 documents judged manually

12 Pre-processing
For each document, find and classify names:
– { person | location | organization | … }
Parse document
– (regularize passive, relative clauses, etc.)
For each clause, collect a candidate pattern:
tuple: heads of
– [ subject, verb, direct object, object/subject complement, locative and temporal modifiers, … ]
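One way to picture the candidate tuple is the sketch below. It assumes the parser has already produced the clause heads and regularized passives, and that heads which are classified names are replaced by their class; the slide does not spell out that replacement, so treat it as an assumption. Locative and temporal modifiers are omitted for brevity. `Clause` and `candidate_pattern` are hypothetical names.

```python
from typing import NamedTuple, Optional

class Clause(NamedTuple):
    """Heads of one regularized clause (modifiers omitted here)."""
    subject: Optional[str]
    verb: str
    direct_object: Optional[str] = None
    complement: Optional[str] = None

def candidate_pattern(clause, name_classes):
    """Build a candidate tuple, replacing known names by their class.

    name_classes maps a name to its class, e.g. "person".
    """
    def generalize(head):
        if head is None:
            return None
        return name_classes.get(head, head)

    return (
        generalize(clause.subject),
        clause.verb,
        generalize(clause.direct_object),
        generalize(clause.complement),
    )
```

For a regularized passive such as "Harry was named president", the subject slot would be empty and "Harry" would sit in the direct-object slot.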

13 Experiment: two seed patterns
v-appoint = { appoint, elect, promote, name }
v-resign = { resign, depart, quit, step-down }
Run discovery procedure for 80 iterations

14 Evaluation
Look at discovered patterns
– new patterns, missed in manual training
Document filtering
Slot filling

15 Discovered patterns

16 Evaluation: Text Filtering
How effective are discovered patterns at selecting relevant documents?
IR-style: documents matching at least one pattern count as retrieved
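A minimal sketch of such a filtering evaluation follows. It assumes that "IR-style" means precision and recall over retrieved documents, which the slide does not state outright, and that each document is represented by the set of patterns found in it. `filtering_scores` is a hypothetical name.

```python
def filtering_scores(docs, patterns, relevant_ids):
    """Precision and recall of pattern-based document filtering.

    docs: dict mapping doc id -> set of patterns found in that document.
    patterns: set of discovered patterns.
    relevant_ids: set of doc ids judged relevant.
    """
    # A document is retrieved if it matches at least one pattern.
    retrieved = {doc_id for doc_id, found in docs.items() if found & patterns}
    hits = len(retrieved & relevant_ids)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant_ids) if relevant_ids else 0.0
    return precision, recall
```

Re-running this after each discovery iteration would show how precision and recall move as the pattern set grows.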

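18 Evaluation: Slot filling
How effective are patterns within a complete IE system?
MUC-style IE on MUC-6 corpora
Caveat: filtered / aligned by hand
[results table; column headings and two row labels not recoverable]
manual–MUC: 54 71 62 47 70 56
manual–now: 69 79 74 56 75 64
(unlabeled row): 27 74 40
(unlabeled row): 52 72 60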

