Presentation on theme: "Multi-Document Person Name Resolution Michael Ben Fleischman (MIT), Eduard Hovy (USC) From Proceedings of ACL-42 Reference Resolution workshop 2004."— Presentation transcript:
Multi-Document Person Name Resolution Michael Ben Fleischman (MIT), Eduard Hovy (USC) From Proceedings of ACL-42 Reference Resolution workshop 2004
Abstract Determine if two instances with the same name from different documents refer to the same individual Use Maximum Entropy model to give the probability that two names refer to the same individual Apply modified agglomerative clustering technique to partition the instances
Related Work Mann and Yarowsky (2003) –Bag of words, biographic information –clustering task –output two clusters Bagga and Baldwin (1998) –within-document coreference resolution –Text surrounding each reference chain –vector space model
Data 2 million concept-instance pairs from ACL dataset –e.g. (president of the United States, William Jefferson Clinton) They randomly selected 2675 pairs. Each of them was matched with another concept- instance pair that had an identical instance name, but a different concept name. 1875 for training (84%) 400 for developing (85.5%) 400 for testing (83.5%)
Features (1) Name features Web features Overlap features Semantic features Estimated statistics features
Features (2) Name: –Name-common, freq of name in census data –Name-Fame, freq of name in ACL dataset –Web-Fame, num of hits in google Web: –Web Intersection, the # of hits from query (name + head1 + head2) –Web Difference, abs ((name + head1) – (name + head2)) –Web Ratio, (name + head1 + head2)/((name + head1) + (name + head2))
Features (3) Overlap features: –Sentence-Count, # of words common to context of both instances –Sentence-TF, as above but weighted by term frequency –Concept-Count –Concept-TF Semantic features: –They employ five metrics described in the literature that use WordNet to determine a semantic distance –JCN, HSO, LCH, Lin, Res
Features (4) Estimated statistics features: –Concept information is very useful. E.g. politician, lawyer –p(i1=i2 | i1->A, i2->B) –p(i1->A, i2->B | i1=i2) –p(i1->A | i2->B) + p(i2->B | i1->A) –p(i1->A, i2->B) / p(i1->A) + p(i2->B)
Maximum Entropy Model They model the probability of two instances having the same referent given a vector of features x according to a formulation
Clustering (1) Use agglomerative clustering algorithm to cluster each pairs with identical names Build a fully connected graph G with vertex set D Label each edge in G with a score While the edge with max score in G > threshold, merge the two nodes connected by the edge For each node in the graph, merge the two edges connecting it to the newly merged node, and assign the new edge a score equal to the avg. of the two old edge scores
Clustering (2) The final output of this algorithm is a new graph in which each node represents a single referent associated with a set of concept-instance pairs. It is free to choose a different number of referents for each instance name.
Experimental Design They randomly selected a set of 31 instance names from the ACL dataset, 11 of which referred to multiple individuals and 20 of which had only a single referent. They use a simple clustering accuracy as their performance metric. They compare their algorithm with two baseline systems.