Leverage Consensus Partition for Domain-Specific Entity Coreference


1 Leverage Consensus Partition for Domain-Specific Entity Coreference
龚赛赛

2 Contents
Introduction
Overview of Approach
Improve quality of labeled data from user feedback
Schema independent learning approach
Conclusion

3 Introduction
Human intelligence is valuable for entity coreference
  Manually resolve coreference of given URIs
  Collect training examples
However, the quality of labeled data from user feedback is not always satisfactory
To improve quality, we use the approach of consensus partition.

4 Introduction
Modeling entity coreference at a high level is worth considering
Two challenges
  Usually, property matching is not available or not sufficient for entity resolution
  In many cases, the datasets used for statistics are not available
To deal with these challenges, we build a classifier for entity coreference based on improved labeled data
  Labeled data can be relatively small compared with the datasets
  A classifier based on weak features can be enhanced

5 Related work
Improving labeled data
  Detect good and bad workers
Modeling entity resolution
  Rule based
  Graph based
  Learning based
  …..

6 Overview of Approach

7 Overview of Approach
Running Example
Mike browses u1 and labels <u1,u2> and <u1,u3> as coreferent. He then browses u4 and labels <u4,u5> as coreferent, so his partition is {u1 u2 u3 | u4 u5}.
Similarly, Tom browses u1 and labels <u1,u3> as coreferent, browses u5 and labels <u5,u1> as coreferent, and browses u2 and labels <u2,u4> as coreferent. His partition becomes {u1 u3 u5 | u2 u4}.
Alice's partition is {u1 u3 | u2 u4}.
Finally, the consensus partition is {u1 u3 | u2 u4 | u5}.
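One way to read the running example is that each user's partition is the transitive closure of the pairs that user labeled as coreferent. The sketch below illustrates that reading with a small union-find. The function name `partition_from_pairs` and the rule that unlabeled URIs stay in singleton blocks are assumptions made for illustration, not details stated on the slide.

```python
from collections import defaultdict


def partition_from_pairs(uris, coreferent_pairs):
    """Group URIs into blocks via the transitive closure of labeled pairs.

    Assumption: a URI that appears in no labeled pair forms its own block.
    """
    parent = {u: u for u in uris}

    def find(u):
        # Follow parent links to the representative, halving the path as we go.
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u

    for a, b in coreferent_pairs:
        parent[find(a)] = find(b)

    blocks = defaultdict(set)
    for u in uris:
        blocks[find(u)].add(u)
    return {frozenset(block) for block in blocks.values()}


uris = ["u1", "u2", "u3", "u4", "u5"]
# Labeled pairs taken from the running example.
mike = partition_from_pairs(uris, [("u1", "u2"), ("u1", "u3"), ("u4", "u5")])
tom = partition_from_pairs(uris, [("u1", "u3"), ("u5", "u1"), ("u2", "u4")])
```

Under these assumptions, `mike` and `tom` come out as the two partitions given in the example for Mike and Tom.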

8 Improve quality of labeled data from user feedback
Compute a consensus partition that minimizes disagreement between input partitions
  Using symmetric difference
  Maximizing [formula not preserved in the transcript]
  Hierarchical clustering (average link)
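The following is a minimal sketch of how such a consensus could be computed, not the authors' implementation. It assumes the disagreement between two partitions is the number of URI pairs that one partition groups together and the other separates (the symmetric difference of their co-clustered pair sets), and that average-link agglomeration stops once the closest two clusters are separated by at least half of the input partitions. The function names and the `threshold=0.5` stopping rule are assumptions.

```python
from itertools import combinations


def co_pairs(partition):
    """Unordered URI pairs that the partition places in the same block."""
    return {frozenset(p) for block in partition
            for p in combinations(sorted(block), 2)}


def symmetric_difference_distance(p, q):
    """Number of pairs grouped together by exactly one of the two partitions."""
    return len(co_pairs(p) ^ co_pairs(q))


def consensus_partition(uris, partitions, threshold=0.5):
    """Average-link agglomerative clustering over pairwise disagreement.

    The distance between two URIs is the fraction of input partitions that
    separate them. The threshold is an assumed stopping rule.
    """
    pair_sets = [co_pairs(p) for p in partitions]

    def dist(a, b):
        key = frozenset((a, b))
        return sum(key not in s for s in pair_sets) / len(pair_sets)

    clusters = [frozenset([u]) for u in uris]
    while len(clusters) > 1:
        best = None
        for c1, c2 in combinations(clusters, 2):
            avg = sum(dist(a, b) for a in c1 for b in c2) / (len(c1) * len(c2))
            if best is None or avg < best[0]:
                best = (avg, c1, c2)
        if best[0] >= threshold:
            break
        _, c1, c2 = best
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
    return set(clusters)


uris = ["u1", "u2", "u3", "u4", "u5"]
mike = [{"u1", "u2", "u3"}, {"u4", "u5"}]
tom = [{"u1", "u3", "u5"}, {"u2", "u4"}]
alice = [{"u1", "u3"}, {"u2", "u4"}, {"u5"}]  # u5 treated as a singleton
consensus = consensus_partition(uris, [mike, tom, alice])
total = sum(symmetric_difference_distance(consensus, p)
            for p in (mike, tom, alice))
```

On the running example this sketch returns {u1 u3 | u2 u4 | u5}, which agrees with the consensus partition stated there. Only the pairs <u1,u3> and <u2,u4> are grouped together by a majority of the three users.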

9 Schema independent learning approach
Learning model: random forest
  Bagging with decision trees
  Handles noise
  Handles imbalanced training data
  Enhances weak learners
Feature: property pair value similarity (schema independent)
  URIs: vsim = 1 iff identical or in an equivalent class
  Numeric literals: vsim = 1 iff the difference is less than a threshold
  Boolean literals: vsim = 1 iff the values are equal
  Other literals: Jaccard similarity
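The value similarity rules can be sketched as a single function. This is an illustration under stated assumptions rather than the authors' code: the `Uri` wrapper type, the `equivalence_class` mapping (URI to a class identifier), the default `numeric_threshold`, and the choice of lowercase whitespace-delimited tokens for Jaccard similarity are all hypothetical. Values of mismatched kinds are assumed to have similarity 0.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Uri:
    """Hypothetical wrapper that distinguishes URI values from string literals."""
    value: str


def jaccard(a, b):
    """Jaccard similarity over lowercase whitespace-delimited tokens (assumed)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def is_number(v):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(v, (int, float)) and not isinstance(v, bool)


def vsim(v1, v2, equivalence_class=None, numeric_threshold=1e-6):
    """Similarity of two property values, following the rules listed above."""
    equivalence_class = equivalence_class or {}
    if isinstance(v1, Uri) and isinstance(v2, Uri):
        if v1 == v2:
            return 1.0
        c1, c2 = equivalence_class.get(v1), equivalence_class.get(v2)
        return 1.0 if c1 is not None and c1 == c2 else 0.0
    if isinstance(v1, bool) and isinstance(v2, bool):
        return 1.0 if v1 == v2 else 0.0
    if is_number(v1) and is_number(v2):
        return 1.0 if abs(v1 - v2) < numeric_threshold else 0.0
    if isinstance(v1, str) and isinstance(v2, str):
        return jaccard(v1, v2)
    return 0.0  # mismatched value kinds (assumption)


a, b = Uri("http://example.org/a"), Uri("http://example.org/b")
same = vsim(a, b, equivalence_class={a: 1, b: 1})
text = vsim("New York City", "new york")
```

One such similarity per property pair would then form the feature vector passed to the random forest. Because the features depend only on value kinds and not on property names, no schema matching is needed.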

10 Conclusion
Possible contributions
  Propose a new method of improving labeled data from user feedback based on consensus partition
  Propose a novel approach for entity coreference which is schema independent with high accuracy
  Evaluate the effect of consensus partition on the quality of labeled data
  Evaluate the performance of our learning approach compared with other approaches
    Information gain based
    SVM based

11 Thank you for your attention!

