
1 NYU Coreference CSCI-GA.2591 Ralph Grishman

2 Coreference is basically a clustering task: clustering mentions into entities.

3 Types of Referring Expressions
names
nominals
pronouns

4 Types of ‘Coreference’
Identity
Predication (ACE considers this part of coreference)
Bridging anaphora (not included in ACE)

5 Strategies
mention-pair model: for every pair of mentions, a model determines coreferentiality; then (to enforce transitivity) mentions are clustered guided by these decisions (see the sketch below)
entity-mention model: single pass through the document, building entities; the model chooses which entity to add each mention to (if any)
entity-entity model: agglomerative clustering
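
To make the mention-pair strategy concrete, here is a minimal sketch. The names (`cluster_mentions`, `is_coreferent`) are illustrative, and `is_coreferent` stands in for any trained pairwise classifier; the point is that the positive pairwise decisions are closed under transitivity with a union-find structure, yielding a partition of the mentions into entities.

```python
from itertools import combinations

def cluster_mentions(mentions, is_coreferent):
    # Each mention starts as its own entity.
    parent = {m: m for m in mentions}

    def find(m):
        # Find the root of m's cluster, compressing the path as we go.
        while parent[m] != m:
            parent[m] = parent[parent[m]]
            m = parent[m]
        return m

    def union(a, b):
        parent[find(a)] = find(b)

    # One coreference decision per pair of mentions; union enforces
    # transitivity over the positive decisions.
    for m1, m2 in combinations(mentions, 2):
        if is_coreferent(m1, m2):
            union(m1, m2)

    # Group mentions by their root to read off the entities.
    entities = {}
    for m in mentions:
        entities.setdefault(find(m), []).append(m)
    return list(entities.values())

# Toy run with (text, position) mentions and string identity as the
# (very crude) pairwise model.
mentions = [("Obama", 0), ("he", 1), ("Obama", 2)]
print(cluster_mentions(mentions, lambda a, b: a[0] == b[0]))
# [[('Obama', 0), ('Obama', 2)], [('he', 1)]]
```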

6 Ordering
Clustering can be done:
in one pass
in several passes (sieve)
in dynamically-determined order
Raghunathan et al. report gains of 1-4% in F1 score from multi-pass over single-pass, but multi-pass makes incremental processing more difficult. (A sketch of the multi-pass ordering follows.)
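
A rough sketch of multi-pass (sieve) ordering, under simplifying assumptions: each pass is a function that inspects the current clusters and proposes one merge at a time, and passes run in a fixed highest-precision-first order, so later passes see the entities built so far. `run_sieve` and `exact_match_pass` are hypothetical names; the latter is a crude stand-in for the first tier listed two slides below.

```python
def run_sieve(mentions, passes):
    # Start from singleton clusters; passes run in a fixed order,
    # highest precision first.
    clusters = [[m] for m in mentions]
    for sieve_pass in passes:
        while True:
            pair = sieve_pass(clusters)   # (i, j) with i < j, or None
            if pair is None:
                break
            i, j = pair
            clusters[i].extend(clusters.pop(j))
    return clusters

def exact_match_pass(clusters):
    # Stand-in for the highest-precision tier: merge clusters that
    # contain mentions with identical (case-folded) text.
    for i in range(len(clusters)):
        for j in range(i + 1, len(clusters)):
            if any(a.lower() == b.lower()
                   for a in clusters[i] for b in clusters[j]):
                return (i, j)
    return None

print(run_sieve(["Mr. Smith", "he", "mr. smith"], [exact_match_pass]))
# [['Mr. Smith', 'mr. smith'], ['he']]
```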

7 Hand-coded rules
The bulk of the cases follow well-understood patterns:
predicate complement
apposition
role modifier
relative pronouns
So many systems use hand-coded rules, or hybrid systems combining corpus-trained and hand-coded rules [Jet].

8 Hand-coded rules
Sieve passes, in order [Raghunathan et al. 2010]:
1. exact extent match
2. appositive | predicate nominative | role appositive | relative pronoun | acronym | demonym
3. cluster head match with word inclusion and compatible modifiers
4. cluster head match with word inclusion
5. cluster head match with compatible modifiers
6. relaxed cluster head match with word inclusion
7. pronoun match
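
As a rough illustration of one mid-sieve tier, "cluster head match with word inclusion": an anaphoric cluster may merge into an antecedent cluster when its head word matches the head of some antecedent mention and all its non-stopword words already appear in the antecedent cluster. Taking the last token as the head and the tiny stopword list are simplifications for illustration, not Raghunathan et al.'s actual implementation.

```python
STOPWORDS = {"the", "a", "an", "this", "that", "of"}

def head(mention):
    # Crude head extraction: last token of the mention.
    return mention.lower().split()[-1]

def content_words(cluster):
    return {w for m in cluster for w in m.lower().split()} - STOPWORDS

def head_match_with_inclusion(antecedent_cluster, anaphor_cluster):
    # (a) head of the anaphor matches a head in the antecedent cluster;
    # (b) every content word of the anaphoric cluster is included.
    heads = {head(m) for m in antecedent_cluster}
    return (head(anaphor_cluster[0]) in heads and
            content_words(anaphor_cluster) <= content_words(antecedent_cluster))

ante = ["the Florida state Supreme Court"]
print(head_match_with_inclusion(ante, ["the Florida Supreme Court"]))  # True
print(head_match_with_inclusion(ante, ["the New York court"]))         # False
```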

9 Hand-coded rules
Sieve: P R F
exact extent match: 96 32 48
appositive | predicate nominative | role appositive | relative pronoun | acronym | demonym
cluster head match with word inclusion and compatible modifiers
cluster head match with word inclusion
cluster head match with compatible modifiers
relaxed cluster head match with word inclusion
pronoun match
Nominal coreference helps little.

10 Anaphoricity
Should we have a separate model for anaphoricity?

11 Role of deep learning
Systems do worst in resolving nominal anaphors.
Systems typically extract features of the anaphor and candidate antecedent and then use a log-linear model to capture compatibility, for example using WordNet lexical relations.
Deep learning systems try to do this more directly: building a large distributed representation of the mentions and the entities (based on the word embeddings of the words in the anaphor and the candidate antecedent and in their immediate context) and then learning a ranking among entity pairs [Clark and Manning 2016].
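
A minimal sketch of this idea (not Clark and Manning's actual architecture): pool word embeddings over each mention and its immediate context, concatenate the anaphor and antecedent representations, and score the pair with a small feed-forward network. The embedding table and weights here are random placeholders; a real system learns them from coreference-annotated data.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 50
vocab = {}   # word -> embedding; random placeholders stand in for learned vectors

def embed(words):
    vecs = [vocab.setdefault(w, rng.normal(size=DIM)) for w in words]
    return np.mean(vecs, axis=0)               # average pooling

def represent(mention_words, context_words):
    # Mention representation = pooled mention words + pooled context words.
    return np.concatenate([embed(mention_words), embed(context_words)])

# One hidden layer mapping the concatenated pair representation to a score.
W1 = rng.normal(size=(4 * DIM, 100)) * 0.1
W2 = rng.normal(size=100) * 0.1

def pair_score(anaphor, antecedent):
    x = np.concatenate([represent(*anaphor), represent(*antecedent)])
    h = np.maximum(0, x @ W1)                   # ReLU hidden layer
    return h @ W2                               # higher = more likely coreferent

print(pair_score((["the", "company"], ["said", "that"]),
                 (["Google"], ["announced", "on", "Monday"])))
```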

12 Benchmarks
The most common evaluation is the CoNLL-2011 shared task:
based on the OntoNotes corpus
did not mark singletons
included event references

13 Evaluation Metric
There is no consensus on an evaluation metric for coreference.
CoNLL used an average of 3 scores:
MUC score
B-cubed
CEAF
(not to mention the official ACE scorer)

14 MUC Scoring
The first coreference scorer was developed for MUC-6; it is link-based.
The key S and the response R each define a set of equivalence classes Si and Ri.
To assess the recall of the response with respect to key class Si, we ask how many links would have to be added to R to link all the mentions in Si. If p(Si) is the partition of Si induced by the response, that number is |p(Si)| - 1, out of the |Si| - 1 links needed, so
Recall_i = (|Si| - |p(Si)|) / (|Si| - 1)
Recall = Σi (|Si| - |p(Si)|) / Σi (|Si| - 1)
Precision is computed by swapping S and R.
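
The link-based computation can be spelled out directly; a minimal sketch following Vilain et al. (1995), where key and response are lists of sets of mentions:

```python
def muc_recall(key, response):
    numer = denom = 0
    for s in key:
        # Partition s by which response class each mention falls in;
        # mentions absent from the response count as singletons.
        partition = {}
        for m in s:
            owner = next((i for i, r in enumerate(response) if m in r),
                         ("singleton", m))
            partition.setdefault(owner, set()).add(m)
        numer += len(s) - len(partition)   # |Si| - |p(Si)| links recovered
        denom += len(s) - 1                # |Si| - 1 links needed
    return numer / denom if denom else 0.0

def muc_precision(key, response):
    # Precision is the same computation with key and response swapped.
    return muc_recall(response, key)

key = [{"A", "B", "C"}]
resp = [{"A", "B"}, {"C"}]
print(muc_recall(key, resp))      # 0.5: one of the two key links recovered
print(muc_precision(key, resp))   # 1.0: the single response link is correct
```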

15 MUC Scoring
Example (diagram): truth A --- B --- C, with a response partition.

16 MUC Scoring
One shortcoming of MUC scoring is that you don’t get credit for correct singletons.
Also, the metric rates as equal some responses which are worse than others.

17 MUC Scoring
Example (diagram): truth A --- B --- C, with a different response partition.
With the MUC scorer, this gets the same score.

18 B-CUBED Scoring
B-CUBED is a mention-based scorer designed to avoid the problems of the MUC scorer [Bagga and Baldwin 1998].
Precision_i = (# of correct mentions in the response equivalence class containing mention_i) / (# of mentions in the response equivalence class containing mention_i)
Recall_i = (# of correct mentions in the response equivalence class containing mention_i) / (# of mentions in the key equivalence class containing mention_i)
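
A minimal sketch of the B-cubed computation, assuming key and response partition the same set of mentions and averaging the per-mention scores:

```python
def b_cubed(key, response):
    # key, response: lists of sets covering the same mentions.
    key_of = {m: s for s in key for m in s}
    resp_of = {m: s for s in response for m in s}
    p = r = 0.0
    for m in key_of:
        correct = len(key_of[m] & resp_of[m])  # mentions correctly grouped with m
        p += correct / len(resp_of[m])
        r += correct / len(key_of[m])
    n = len(key_of)
    return p / n, r / n

key = [{"A", "B", "C"}]
resp = [{"A", "B"}, {"C"}]
print(b_cubed(key, resp))   # precision 1.0, recall (2/3 + 2/3 + 1/3) / 3
```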

19 CEAF
Constrained Entity Alignment F-measure:
based on a similarity metric between clusters (entities)
computes an optimal alignment between key and response using this metric
leftover clusters are not scored
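
A sketch of entity-based CEAF, using one standard choice of similarity, phi_4(K, R) = 2|K ∩ R| / (|K| + |R|) from Luo (2005), and SciPy's Hungarian-algorithm solver to find the optimal one-to-one alignment; leftover (unaligned) clusters contribute nothing to the total.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def ceaf(key, response):
    # Similarity of every key entity with every response entity.
    sim = np.array([[2 * len(k & r) / (len(k) + len(r)) for r in response]
                    for k in key])
    rows, cols = linear_sum_assignment(-sim)   # negate to maximize similarity
    total = sim[rows, cols].sum()
    # Self-similarity phi_4(S, S) = 1, so the normalizers are the counts.
    recall = total / len(key)
    precision = total / len(response)
    return precision, recall

key = [{"A", "B", "C"}, {"D"}]
resp = [{"A", "B"}, {"C", "D"}]
print(ceaf(key, resp))
```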

