Event Extraction: Learning from Corpora Prepared by Ralph Grishman Based on research and slides by Roman Yangarber NYU.


Finding Patterns
How can we collect patterns?
Supervised learning
–mark information to be extracted in text
–collect information and context = specific patterns
–generalize patterns
Annotation is quite expensive
Zipfian distribution of patterns means that annotation of consecutive text is inefficient … the same pattern is annotated many times

Unsupervised learning?
The intuition: if we collect documents D_R relevant to the scenario, patterns relevant to the scenario will occur more frequently in D_R than in the language as a whole (cf. sublanguage predicates in Harris’s distributional analysis)

Riloff ‘96
Corpus manually divided into relevant and irrelevant documents
Collect patterns around each noun phrase
Score patterns by R log F, where R = relevance rate = freq in relevant docs / overall freq
Select top-ranked patterns
These patterns each find one template slot; combining filled slots into templates is a separate task
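The scoring step can be illustrated with a minimal sketch. The slide does not define F or the base of the logarithm; the code below assumes F is the pattern's frequency in relevant documents and uses log base 2. The function name `rlogf_scores` and the representation of a document as a list of pattern strings are hypothetical, chosen only for illustration.

```python
import math
from collections import Counter

def rlogf_scores(relevant_docs, irrelevant_docs):
    """Score each pattern by R * log2(F).

    R = freq in relevant docs / overall freq (as on the slide).
    F = freq in relevant docs (an assumption; the slide leaves F undefined).
    Each document is an iterable of hashable patterns.
    """
    rel_freq = Counter(p for doc in relevant_docs for p in doc)
    total_freq = Counter(rel_freq)
    total_freq.update(p for doc in irrelevant_docs for p in doc)

    scores = {}
    for pattern, f in rel_freq.items():
        relevance_rate = f / total_freq[pattern]
        scores[pattern] = relevance_rate * math.log2(f)
    return scores

def top_patterns(scores, k):
    """Return the k highest-scoring patterns, best first."""
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

Under these assumptions, a pattern seen only once in the relevant documents scores zero regardless of its relevance rate, so the metric favors patterns that are both frequent and concentrated in the relevant set.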

Extending the Discovery Procedure
Finding relevant documents automatically
–Yangarber … use patterns to select documents
–Sudo … use keywords and IR engine
Defining larger patterns (covering several template slots)
–Yangarber … clause structures
–Nobata; Sudo … larger structures

Automated Extraction Pattern Discovery
Goal: find examples / patterns relevant to a given scenario without any corpus tagging (Yangarber ‘00)
Method:
–identify a few seed patterns for scenario
–retrieve documents containing patterns
–find subject-verb-object pattern with
  high frequency in retrieved documents
  relatively high frequency in retrieved docs vs. other docs
–add pattern to seed and repeat
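The loop above can be sketched as follows. This is a simplified illustration, not the procedure as implemented in the cited work: it assumes each document has already been reduced to a set of pattern strings, ranks candidates with an R log2 F score (F taken to be frequency in the retrieved documents), adds one pattern per iteration, and treats every accepted pattern as fully trusted. The function name `discover_patterns` and its parameters are hypothetical.

```python
import math
from collections import Counter

def discover_patterns(docs, seeds, iterations=3):
    """Bootstrap a pattern set from seed patterns.

    docs:  list of sets of patterns (one set per document).
    seeds: set of seed patterns.
    """
    accepted = set(seeds)
    all_freq = Counter(p for d in docs for p in d)

    for _ in range(iterations):
        # Retrieve documents containing at least one accepted pattern.
        retrieved = [d for d in docs if d & accepted]
        if not retrieved:
            break
        rel_freq = Counter(p for d in retrieved for p in d)

        # Pick the candidate that is frequent in the retrieved documents
        # and relatively more frequent there than in the corpus overall.
        best, best_score = None, 0.0
        for pattern, f in sorted(rel_freq.items()):
            if pattern in accepted:
                continue
            score = (f / all_freq[pattern]) * math.log2(f)
            if score > best_score:
                best, best_score = pattern, score

        if best is None:  # no candidate scores above zero
            break
        accepted.add(best)
    return accepted
```

Because each new pattern widens the set of retrieved documents, the next iteration scores candidates against a larger relevant set, which is what lets the procedure move beyond the seeds.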

#1: pick seed pattern
Seed:

#2: retrieve relevant documents
Seed:
Fred retired.... Harry was named president.
Maki retired.... Yuki was named president.
Relevant documents
Other documents

#3: pick new pattern
Seed:
appears in several relevant documents (top-ranked by Riloff metric)
Fred retired.... Harry was named president.
Maki retired.... Yuki was named president.

#4: add new pattern to pattern set
Pattern set:
Note: new patterns added with confidence < 1

Experiment
Task: Management succession (as MUC-6)
Source: Wall Street Journal
Training corpus: ~ 6,000 articles
Test corpus:
–100 documents: MUC-6 formal training
–+ 150 documents judged manually

Pre-processing
For each document, find and classify names:
–{ person | location | organization | …}
Parse document
–(regularize passive, relative clauses, etc.)
For each clause, collect a candidate pattern:
–tuple: heads of [ subject, verb, direct object, object/subject complement, locative and temporal modifiers, … ]
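One way to picture the candidate tuple is as a small record per clause. The sketch below is illustrative only: the type `ClausePattern`, its field names, and the helper `generalize` are hypothetical, and replacing a name head by its class label is an assumption about how classified names enter the pattern, not something the slide states.

```python
from typing import NamedTuple, Optional, Tuple

class ClausePattern(NamedTuple):
    """Heads of the main constituents of one regularized clause."""
    subject: Optional[str]
    verb: str
    direct_object: Optional[str] = None
    complement: Optional[str] = None
    modifiers: Tuple[str, ...] = ()

def generalize(pattern, name_classes):
    """Replace name heads by their class label (assumed step).

    name_classes maps a name string to a label such as "person".
    Heads that are not classified names are left unchanged.
    """
    def lift(head):
        return name_classes.get(head, head)

    return pattern._replace(
        subject=lift(pattern.subject),
        direct_object=lift(pattern.direct_object),
        complement=lift(pattern.complement),
        modifiers=tuple(lift(m) for m in pattern.modifiers),
    )

# "Harry was named president", with the passive regularized so that
# "Harry" is the direct object and the subject is unknown.
clause = ClausePattern(subject=None, verb="name",
                       direct_object="Harry", complement="president")
candidate = generalize(clause, {"Harry": "person"})
```

Generalizing over names in this way lets clauses about different people or companies count as instances of the same candidate pattern.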

Experiment: two seed patterns
v-appoint = { appoint, elect, promote, name }
v-resign = { resign, depart, quit, step-down }
Run discovery procedure for 80 iterations
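The two verb classes can be written down as plain sets. The sketch below covers only the verb part of the seeds, since the argument constraints of the seed patterns did not survive in this transcript; the names `VERB_CLASSES`, `verb_class`, and `seeded_clauses`, and the (subject, verb, object) tuple layout, are hypothetical.

```python
# Verb classes copied from the slide.
VERB_CLASSES = {
    "v-appoint": {"appoint", "elect", "promote", "name"},
    "v-resign": {"resign", "depart", "quit", "step-down"},
}

def verb_class(verb):
    """Return the seed class label for a verb head, or None."""
    for label, members in VERB_CLASSES.items():
        if verb in members:
            return label
    return None

def seeded_clauses(clauses):
    """Keep (subject, verb, object) tuples whose verb is in a seed class."""
    return [c for c in clauses if verb_class(c[1]) is not None]
```

A clause headed by "retire" is not matched by either class, which is why a pattern built on that verb has to be found by the discovery procedure rather than supplied as a seed.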

Evaluation
Look at discovered patterns
–new patterns, missed in manual training
Document filtering
Slot filling

Discovered patterns

Evaluation: Text Filtering
How effective are discovered patterns at selecting relevant documents?
IR-style
–documents matching at least one pattern
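A minimal sketch of this kind of evaluation follows. It assumes that "IR-style" means precision and recall of the retrieved document set against manual relevance judgments, which the slide does not spell out. The function name `filtering_scores` and the input layout are hypothetical.

```python
def filtering_scores(docs, patterns, relevant_ids):
    """Precision and recall of pattern-based document filtering.

    docs:         dict mapping document id to its set of patterns.
    patterns:     set of discovered patterns.
    relevant_ids: set of ids judged relevant by hand.
    A document is retrieved if it matches at least one pattern.
    """
    retrieved = {doc_id for doc_id, pats in docs.items() if pats & patterns}
    true_positives = len(retrieved & relevant_ids)
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant_ids) if relevant_ids else 0.0
    return precision, recall
```

Running this after each discovery iteration would show how recall and precision move as patterns are added to the set.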

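Evaluation: Slot filling
How effective are patterns within a complete IE system?
MUC-style IE on MUC-6 corpora
Caveat: filtered / aligned by hand
[Results table; column headings and some row labels were lost in extraction]
manual–MUC: 54 71 62 | 47 70 56
manual–now: 69 79 74 | 56 75 64
(row labels lost): 27 74 40; 52 72 60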
