1 (Entity and) Event Extraction CSCI-GA.2591
NYU
Ralph Grishman

2 Entity Tagger
Comparisons to JetLite are difficult because stages are divided differently in many systems:
- some systems combine with coreference
- some combine with NE tagging
The closest match is mention detection and classification: F = 82.5 with either NN or MEMM [Nguyen 2017]

3 Motivations for JetLite pipeline
Part of the current pipeline:
- Tokenizer – adds Token annotations
- NEtagger – adds Enamex annotations, shadowing some Token annotations
- parse – adds dependents feature
- Coref – adds Entity annotations
- EntityTagger – adds semType features to Entity annotations
Why this order?
- Do we want the NE tagger before parse?
- Do we want parse before coref?
- Do we want coref before entity tagging?
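The stage ordering above can be sketched as a simple sequential pipeline. This is an illustrative mock-up, not the actual JetLite API: the `annotate` driver and the stage functions are hypothetical stand-ins, each adding the kind of annotation the slide names.

```python
# Minimal sketch of a JetLite-style annotation pipeline (illustrative only).
# Each stage reads the document and adds annotations that later stages may use,
# which is why the ordering questions on the slide matter.

def tokenizer(doc):     doc["Token"] = doc["text"].split()   # adds Token annotations
def ne_tagger(doc):     doc["Enamex"] = []                   # adds Enamex annotations
def parser(doc):        doc["dependents"] = {}               # adds dependents feature
def coref(doc):         doc["Entity"] = []                   # adds Entity annotations
def entity_tagger(doc): doc["semType"] = {}                  # adds semType features

PIPELINE = [tokenizer, ne_tagger, parser, coref, entity_tagger]

def annotate(text):
    doc = {"text": text}
    for stage in PIPELINE:   # order matters: each stage may rely on earlier output
        stage(doc)
    return doc
```

Reordering the list (e.g. running coref before parse) is then just a matter of permuting `PIPELINE`, which is the design question the slide raises.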

4 ACE Events
An ACE event mention reports an event and possibly its time, place, participants, and status:
- Two soldiers were wounded → injure event (also, the wounded soldiers)
- Jane Doe and John Smith were married on May 9 → marry event
ACE 2005 had 6 types of events and 33 subtypes; most papers report on subtypes.
Information about the event may appear anywhere in a sentence:
- variable number of arguments
- at longer distance than relations (makes annotation harder)

5 ACE Event Mentions
Have a complex structure:
- a trigger word (also called an ‘anchor’), typically a verb or nominalization
- a set of arguments (with roles):
  - participants in the event (ACE entity mentions)
  - time of the event (TIMEX)
  - place of the event (GPE or LOCATION)
- a set of features:
  - time (past / present / future)
  - specific / generic
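The structure above maps naturally onto a small record type. A minimal sketch, with field names of our choosing (not ACE's official schema):

```python
from dataclasses import dataclass, field

# Illustrative representation of an ACE event mention; field and class
# names are ours, chosen to mirror the structure described on the slide.

@dataclass
class Argument:
    role: str        # e.g. "Victim", "Time-Within", "Place"
    mention: str     # an ACE entity mention, TIMEX, or GPE/LOCATION

@dataclass
class EventMention:
    subtype: str                  # one of the 33 ACE 2005 subtypes, e.g. "Injure"
    trigger: str                  # the anchor word, typically a verb or nominalization
    arguments: list = field(default_factory=list)
    tense: str = "past"           # past / present / future
    genericity: str = "specific"  # specific / generic

# "Two soldiers were wounded" -> injure event
ev = EventMention(subtype="Injure", trigger="wounded",
                  arguments=[Argument("Victim", "Two soldiers")])
```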

6 ACE Events
ACE requires coreference resolution for entities, relations, and events. We are doing it for entities.
It is straightforward for relations; check for:
- same relation type
- arg1’s corefer
- arg2’s corefer
It is much less straightforward for events:
- different mentions of an event may include different subsets of arguments
- arguments which overlap but are not exact matches
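The relation-coreference test above is simple enough to state directly in code. A sketch, where `entity_id` is a hypothetical oracle mapping each argument mention to the entity it corefers with:

```python
# Two relation mentions corefer iff they have the same relation type
# and their corresponding arguments corefer (map to the same entity).
# Relations here are plain dicts; `entity_id` maps mention -> entity id.

def relations_corefer(r1, r2, entity_id):
    return (r1["type"] == r2["type"]
            and entity_id[r1["arg1"]] == entity_id[r2["arg1"]]
            and entity_id[r1["arg2"]] == entity_id[r2["arg2"]])
```

No such three-line test exists for events, which is exactly the difficulty the slide points out: mentions of the same event can carry different, partially overlapping argument sets.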

7 Benchmark
- ACE 2005 corpus, generally assuming perfect entity mentions on input
- Standard set of 40 test documents (out of 600)
- Dual annotation and adjudication

8 Evaluation
Official ACE ‘value’ metric:
- very complicated
- combined everything into one value
For R&D, report separate trigger and argument metrics.

9 Using MaxEnt + kNN
Early description of an ACE event extractor [Ahn 2006].
Did classification in four separate stages:
- trigger
- arguments
- features
- coreference
Also tried memory-based learner (kNN).
Trigger classifier:
- applied to most words in corpus → very unbalanced training data
- rich feature set, including info on entities in same sentence
- F = 50 with MaxEnt, F = 60 with kNN

10 Using MaxEnt + kNN
Argument classifier:
- classify every pair of <event, entity> and <event, TIMEX> in same sentence
- best result with separate classifier for each event type
- MaxEnt F = 57, kNN F = 52
Independent classification of triggers and arguments is a simplification:
- presence of arguments affects choice of event type:
  (1) An American tank fired on the Palestine Hotel. → attack
  (2) He has fired his air defense chief. → endPosition
- presence of one argument may affect assignment of other roles (usually only one attacker)
How do we capture this info?
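The candidate generation for this argument stage is a simple cross product within a sentence. A sketch (the classifier itself is not shown; each pair would be scored for a role label, or "None"):

```python
# Ahn-style argument candidate generation: every <event, entity> and
# <event, TIMEX> pair within the same sentence becomes one classification
# instance. The role classifier that scores each pair is omitted here.

def argument_candidates(events, entities, timexes):
    for ev in events:
        for cand in entities + timexes:
            yield (ev, cand)   # each pair gets a role label or "None"
```

Because each pair is classified independently, nothing in this setup lets the label for one pair influence another, which is the limitation the slide describes.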

11 Global Features
More global features for events; how to capture this?
Other events in this document:
- Having one attack event in a document increases the chance of other attacks
- Having an attack event in a document increases the chance of injure and die events
Other documents:
- Find similar documents (news stories) using BoW
- Events in retrieved documents increase likelihood of events in initial document
How to capture this? Let’s look at specific solutions:
- Ahn: rich features
- Ji, Liao: rules
- Li: structured perceptron
- Nguyen: complex NN

12 Ahn’s response to the problem was to include as trigger features information about potential arguments (entities)

13 Rule-Based
Ji and Grishman [ACL 08] and Liao and Grishman used rules which distinguished low-confidence and high-confidence extractions.

14 Structured Prediction
A more principled solution will treat this as a structured prediction problem.
Until now we have decomposed the language analysis task into independent subtasks:
- each with its own loss function
- each making a 1-of-N prediction
Now we would like to predict a larger structure:
- an event with trigger and arguments
- a sentence with multiple events
Why is this a problem?

15 Decoding
When we run a MaxEnt classifier, it computes the probability of each outcome y’ and returns the most probable outcome. For a structured prediction task, there are too many outcomes |Y|.

16 Approximate Decoding
We must approximate the decoding: iterate over likely outcomes.
For Event Extraction:
- loop over tokens in sentence, generating all possible event labels for the current token
- keep K best [beam search]
- for each event, loop over entities in sentence, assigning the entity to the event with all possible roles
Found beam size 4 was sufficient.
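The token-level part of this beam search can be sketched compactly. This is a generic illustration, not Li's actual decoder: `score` stands in for the model's scoring function, and the argument-assignment loop is omitted.

```python
import heapq

# Beam-search decoding sketch: at each token we extend every hypothesis in
# the beam with every possible event label and keep only the K best partial
# labelings. `score(token, label, prefix)` is a hypothetical model score.

def beam_decode(tokens, labels, score, beam_size=4):
    beam = [(0.0, [])]                       # (cumulative score, label sequence)
    for tok in tokens:
        candidates = [(s + score(tok, lab, seq), seq + [lab])
                      for s, seq in beam for lab in labels]
        beam = heapq.nlargest(beam_size, candidates, key=lambda c: c[0])
    return beam[0][1]                        # best complete labeling
```

Because `score` sees the prefix `seq`, earlier labeling decisions can influence later ones, which exact independent classification cannot do; the slides report that a beam of 4 was enough in practice.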

17 Global Features
Supports arbitrary features within the sentence, but does not support wider scope, such as an event elsewhere in the document.

18 Perceptron
Basic linear model; building block of neural networks.
Trained using the perceptron algorithm:
for each training example <xj, dj>:
  compute output yj(t) = f(w(t) · xj)
  update weights wi(t+1) = wi(t) + (dj – yj(t)) xj,i
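The update rule above fits in a few lines of code. A self-contained toy implementation (with `f` taken to be a simple threshold on the dot product):

```python
# Perceptron: f thresholds the dot product w · x, and each mistake moves
# the weights by (d_j - y_j) * x_j, exactly the slide's update rule.

def predict(w, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else 0

def train_perceptron(examples, n_features, epochs=10):
    w = [0.0] * n_features
    for _ in range(epochs):
        for x, d in examples:            # training example <x_j, d_j>
            y = predict(w, x)            # y_j(t) = f(w(t) · x_j)
            for i in range(n_features):  # w_i(t+1) = w_i(t) + (d_j - y_j(t)) x_{j,i}
                w[i] += (d - y) * x[i]
    return w
```

On linearly separable data (here the examples include a constant bias feature) the updates converge to a separating weight vector.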

19 Structured Perceptron
We face a decoding problem in training a structured perceptron: again use argmax
  zi = argmax(z ∈ Y) F(xi, z) · α
  if (zi ≠ yi) α = α + F(xi, yi) – F(xi, zi)
Used by Collins & Roark to enhance parser output.
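One training step of this update can be sketched directly. For illustration the argmax is brute force over a small candidate set Y; in practice it would be the beam-search decoder, and `F` is whatever feature map the model defines:

```python
# One structured-perceptron update: decode z_i = argmax_z F(x_i, z) · alpha,
# then shift alpha toward the gold structure's features when the decode is
# wrong. F(x, z) returns a feature vector for an (input, structure) pair.

def sp_update(alpha, x, y_gold, Y, F):
    score = lambda z: sum(a * f for a, f in zip(alpha, F(x, z)))
    z = max(Y, key=score)                      # (approximate) decode
    if z != y_gold:                            # mistake-driven correction
        fy, fz = F(x, y_gold), F(x, z)
        alpha = [a + (fyi - fzi) for a, fyi, fzi in zip(alpha, fy, fz)]
    return alpha
```

After an update, the gold structure's features score higher and the wrongly decoded structure's features score lower, so repeated passes push the decoder toward the gold output.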

20 Neural Net for EE
Basic approach [Nguyen et al NAACL 2016]:
Recurrent NN:
- loop with memory (LSTM or GRU)
- one iteration for each token in sentence
Bidirectional: two RNNs, one operating L to R, the other R to L.
Tried three memories:
- G_i^trg = trigger subtypes recognized in words 1,…,i
- G_i^arg = argument roles recognized in words 1,…,i
- G_i^arg/trg = entities recognized as arguments of an event of a given subtype in words 1,…,i
Only G_i^arg/trg improved performance.
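The G_i^arg/trg memory can be sketched as a set that grows as decoding moves left to right. This is a schematic illustration of the bookkeeping only, not Nguyen et al.'s network: `predict_step` is a hypothetical stand-in for the RNN's per-token prediction, which receives the memory as an extra feature.

```python
# Sketch of the G_i^{arg/trg} memory: record which (entity, event-subtype)
# pairs have already been produced in words 1..i, and feed that set back
# into each subsequent prediction step as a feature.

def decode_with_memory(tokens, predict_step):
    memory = set()                                  # {(entity, event_subtype), ...}
    outputs = []
    for tok in tokens:
        trigger, args = predict_step(tok, memory)   # memory is an input feature
        for entity, role in args:
            memory.add((entity, trigger))           # update G^{arg/trg}
        outputs.append((trigger, args))
    return outputs
```

This is how a later prediction can be conditioned on, e.g., an entity already having served as the Attacker of an attack event earlier in the sentence.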

21 Results
                         trigger F   argument F
Li’s baseline            65.9        43.9
structured perceptron    67.5        52.7
JRNN                     69.3        55.4

