Published by Quinn Groce. Modified over 2 years ago.

1
Learning Clusterwise Similarity with First-Order Features
Aron Culotta and Andrew McCallum
University of Massachusetts - Amherst
NIPS Workshop on Theoretical Foundations of Clustering
December 10, 2005

2
Supervised Clustering
Estimate pairwise similarity metric

3
Supervised Clustering

4
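Conditional Models of Identity Uncertainty with Application to Noun Coreference [McCallum, Wellner 04]
[Figure: graphical model over the mentions x1, x2, x3 ("He", "Jon", "Jonathan"). Pairwise variables y12, y23, y13 are scored by the factors g12, g23, g13 (learned pairwise metric); g123 is a transitivity checking function. Values shown in the figure: 1, 1, 1.]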

5
Inference = Graph Partitioning [McCallum, Wellner 04] [Boykov et al 99] [Bansal et al 02]
[Figure: graph over the mentions x1, x2, x3 ("He", "Jon", "Jonathan") with edge weights 23, -12, -2.]
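As a rough illustration of inference as graph partitioning, the sketch below enumerates every partition of three mentions and keeps the one with the highest total within-cluster edge weight. This is one common way to state the correlation-clustering objective, not necessarily the exact inference procedure used in the talk. The assignment of the weights 23, -12, -2 to particular edges is an assumption, since the figure layout is not recoverable, and the helper names `partitions` and `partition_score` are hypothetical.

```python
from itertools import combinations


def partitions(items):
    """Yield every partition of a list into non-empty clusters."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for smaller in partitions(rest):
        # Add `first` to each existing cluster in turn...
        for i, cluster in enumerate(smaller):
            yield smaller[:i] + [[first] + cluster] + smaller[i + 1:]
        # ...or place it in a new singleton cluster.
        yield [[first]] + smaller


def partition_score(partition, weights):
    """Sum the weights of all edges whose endpoints share a cluster."""
    return sum(
        weights[frozenset(pair)]
        for cluster in partition
        for pair in combinations(cluster, 2)
    )


# Assumed mapping of the slide's edge weights to mention pairs.
weights = {
    frozenset({"Jon", "Jonathan"}): 23,
    frozenset({"He", "Jon"}): -2,
    frozenset({"He", "Jonathan"}): -12,
}

mentions = ["He", "Jon", "Jonathan"]
all_partitions = list(partitions(mentions))
best = max(all_partitions, key=lambda p: partition_score(p, weights))
```

Brute-force enumeration is only feasible for a handful of nodes; on larger graphs an approximate partitioning method is needed.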

6
Inside the Pairwise Metric
String x_i has low edit distance to x_j
x_i is a pronoun in the same sentence as x_j
x_i is the same number and gender as x_j
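The three pairwise features above could be computed roughly as follows. This is a minimal sketch: the mention fields (`text`, `sentence`, `number`, `gender`), the pronoun list, the edit-distance threshold, and the weights in `lam` are all assumptions made for illustration, not values from the talk.

```python
def edit_distance(a, b):
    """Levenshtein distance via the standard dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


# Small illustrative pronoun list (assumption).
PRONOUNS = {"he", "she", "it", "they", "him", "her", "them"}


def pairwise_features(xi, xj, max_edits=2):
    """Binary features of a single mention pair (x_i, x_j)."""
    ti, tj = xi["text"].lower(), xj["text"].lower()
    return {
        "low_edit_distance": edit_distance(ti, tj) <= max_edits,
        "pronoun_same_sentence": ti in PRONOUNS and xi["sentence"] == xj["sentence"],
        "same_number_gender": (xi["number"], xi["gender"]) == (xj["number"], xj["gender"]),
    }


def pairwise_score(xi, xj, lam):
    """Weighted sum of the active features; `lam` holds hypothetical learned weights."""
    return sum(lam[name] for name, on in pairwise_features(xi, xj).items() if on)


he = {"text": "He", "sentence": 0, "number": "sg", "gender": "m"}
jon = {"text": "Jon", "sentence": 0, "number": "sg", "gender": "m"}
lam = {"low_edit_distance": 2.0, "pronoun_same_sentence": 1.0, "same_number_gender": 0.5}
score = pairwise_score(he, jon, lam)
```

Every feature here looks at exactly two mentions, which is the restriction the clusterwise metric later removes.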

7
Drawbacks of Pairwise Metric
Cannot represent cluster-wide constraints
E.g.
– A cluster of pronouns should have at least one non-pronoun.
– A researcher is unlikely to publish in more than 5 different conferences in the same year.
– A person is unlikely to have more than 3 different job titles in the same year.
[Milch et al 04]

8
Clusterwise Metric
Measures compatibility of all nodes in a cluster
Enables first-order features
– mean, median, mode of attributes
– maximum string edit distance is K
– cluster size is greater than N
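To make the idea of first-order features concrete, the sketch below computes a few features over a whole cluster of mention strings rather than over a single pair. The function name `clusterwise_features`, the thresholds `k` and `n`, the pronoun list, and the use of string length as the aggregated attribute are assumptions chosen for illustration.

```python
from itertools import combinations
from statistics import median

# Small illustrative pronoun list (assumption).
PRONOUNS = {"he", "she", "it", "they", "him", "her", "them"}


def levenshtein(a, b):
    """Levenshtein distance via the standard dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def clusterwise_features(cluster, k=4, n=2):
    """Features that quantify over every member of a cluster at once."""
    texts = [m.lower() for m in cluster]
    dists = [levenshtein(a, b) for a, b in combinations(texts, 2)]
    max_dist = max(dists, default=0)  # 0 for a singleton cluster
    return {
        # True when the cluster contains no non-pronoun mention.
        "all_pronouns": all(t in PRONOUNS for t in texts),
        "max_edit_distance": max_dist,
        "max_edit_distance_exceeds_k": max_dist > k,
        "size_exceeds_n": len(texts) > n,
        # Aggregate statistic of an attribute (here: string length).
        "median_length": median([len(t) for t in texts]),
    }


features = clusterwise_features(["He", "Jon", "Jonathan"])
```

A feature such as `all_pronouns` cannot be written as a function of one pair of mentions, which is why it needs the clusterwise view.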

9
Probabilistic Interpretation of Pairwise Metric Learning
[Figure: graphical model over nodes x1, x2, x3 with pairwise variables y12, y23, y13 and pairwise factors g12, g13, g23.]
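One plausible way to write the model in this figure is as a product of the pairwise factors, normalized over all labelings. The log-linear form of each factor, the feature functions f_l, the weights \lambda_l, and the normalizer Z are assumptions in the style of conditional random fields; the slide itself only names the variables y_ij and the factors g_ij.

```latex
P(\mathbf{y} \mid \mathbf{x})
  = \frac{1}{Z(\mathbf{x})}
    \prod_{i<j} g_{ij}(x_i, x_j, y_{ij}),
\qquad
g_{ij}(x_i, x_j, y_{ij})
  = \exp\Big( \sum_{l} \lambda_l \, f_l(x_i, x_j, y_{ij}) \Big)
```

For the three nodes shown, the product runs over g_{12}, g_{13}, and g_{23}. Under this reading, learning the pairwise metric amounts to estimating the weights \lambda_l.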

10
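Probabilistic Interpretation of Clusterwise Metric Learning
[Figure: the same graphical model over nodes x1, x2, x3 with pairwise variables y12, y23, y13 and factors g12, g13, g23, extended with a clusterwise variable y123 and factor g123.]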
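A hedged sketch of how the clusterwise variable might enter the distribution: the pairwise factors are kept, and one additional factor scores the three nodes jointly. Which arguments g_{123} takes, and the normalizer Z, are assumptions; the slide only shows that y_{123} and g_{123} are added to the pairwise model.

```latex
P(\mathbf{y} \mid \mathbf{x})
  = \frac{1}{Z(\mathbf{x})}
    \; g_{12}(x_1, x_2, y_{12})
    \; g_{13}(x_1, x_3, y_{13})
    \; g_{23}(x_2, x_3, y_{23})
    \; g_{123}(x_1, x_2, x_3, y_{123})
```

Because g_{123} sees all three nodes at once, it can encode first-order features such as cluster size or maximum edit distance.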

11
Empirical Results
Citation matching
– paper deduplication
– author deduplication/disambiguation
Proper noun coreference
Modest but consistent improvements over pairwise metric (10-30% error reduction)

12
Implications of Clusterwise Metric
[Figure: nodes x1, x2, x3 with pairwise scores 12, 54, 33 (locally compatible) and a clusterwise score of -122 (globally incompatible).]
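The arithmetic behind this slide can be sketched as follows. The numbers come from the figure, but their assignment to specific edges and the additive way the pairwise and clusterwise scores are combined are assumptions made for illustration.

```python
# Pairwise scores from the figure; the edge assignment is assumed.
pairwise = {("x1", "x2"): 12, ("x2", "x3"): 54, ("x1", "x3"): 33}
# Clusterwise score for the cluster {x1, x2, x3}.
clusterwise = -122

# Every pair looks compatible when judged in isolation.
locally_compatible = all(w > 0 for w in pairwise.values())

# Assumed combination rule: add the clusterwise score to the pairwise total.
pairwise_total = sum(pairwise.values())
combined = pairwise_total + clusterwise
globally_compatible = combined > 0
```

Under this additive assumption the pairwise evidence alone would merge all three nodes, while the clusterwise term is large enough to veto the merge.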

13
Open Questions
What is the geometric interpretation of a clusterwise metric?
What are the implications of clusterwise metrics for common clustering methods?
What is the kernel interpretation of a clusterwise metric?

14
References
N. Bansal et al. Correlation Clustering. FOCS 2002.
Y. Boykov et al. Fast Approximate Energy Minimization via Graph Cuts. ICCV 1999.
A. Culotta and A. McCallum. Practical Markov logic containing first-order quantifiers with application to identity uncertainty. Technical Report IR-430, University of Massachusetts, September 2005.
A. McCallum and B. Wellner. Conditional models of identity uncertainty with applications to proper noun coreference. NIPS 2004.
B. Milch et al. BLOG: Relational modeling with unknown objects. Statistical Relational Learning Workshop, ICML 2004.
