An Interactive Approach to Collectively Resolving URI Coreference

Saisai Gong, Wei Hu, Gong Cheng, Yuzhong Qu

Contents
- Background
- Related Work
- Overview of our Approach
- Evolvement of Individual Partition
- Computing Consensus Partition
- Evaluation
- Conclusion

Background
URI coreference: several URIs denote the same entity and can be linked with owl:sameAs. For example, the following URIs, among others, all refer to Tim Berners-Lee:
- http://advogato.org/person/timbl/foaf.rdf#me
- http://www.w3.org/People/Berners-Lee/card#i
- http://data.semanticweb.org/person/tim-berners-lee
- http://dbpedia.org/resource/Tim_Berners-Lee
- http://dblp.l3s.de/d2r/resource/authors/Tim_Berners-Lee

Related Work
- Fully automatic approaches
  - OWL semantics
  - Similarities between descriptions
  - Self-training
  - …
- Automatic approaches remain far from perfect (see Ferrara et al. 2013)

Related Work (Cont.)
- Semi-automatic approaches
  - Active learning
  - Micro-task crowdsourcing
  - …
- Assumptions made by semi-automatic approaches
  - Users act as an “oracle”
  - There is one single right answer
- These assumptions do not always hold
  - Users may have different opinions
  - Disagreement among users happens
- Distinguish a user's individual URI coreference from that of the mass
- Resolve disagreement among users

Our Approach: iReC
- iReC: an interactive approach to collectively resolving URI coreference with user involvement
- Basic idea: achieve a good partition of the URI universe
  - Maintain an individual partition for each user
  - Form a consensus partition aggregated from the individual ones
  - Evolve partitions through user interaction
- Two goals
  - Alleviate user involvement
  - Reflect the collective power of the masses

Overview of our Approach

Candidate Selector: Generating Candidates
- Find potential coreference from various sources
  - owl:sameAs links
  - existing resolution services such as sameas.org
  - keyword-based entity search engines such as Falcons Object Search
  - the user's individual partition
  - the consensus partition
- Merge URIs belonging to the same equivalence class into a candidate entity

Learning Binary Classifier
- Purpose: to reduce user involvement
- Learning model: averaged perceptron (see Collins 2002)
- Online learning algorithm
- Learn the individual classifier both online and offline; learn the global one offline
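A minimal sketch of a binary averaged perceptron in the spirit of Collins (2002). It assumes sparse feature dictionaries and labels of +1 (coreferent) or -1 (not coreferent); the class and method names are illustrative and are not taken from the slides.

```python
from collections import defaultdict


class AveragedPerceptron:
    """Binary averaged perceptron over sparse feature dicts (illustrative)."""

    def __init__(self):
        self.weights = defaultdict(float)  # current weight vector
        self.totals = defaultdict(float)   # sum of the weight vector over all steps
        self.steps = 0                     # number of examples seen so far

    def margin(self, features, weights=None):
        """Signed score of a feature dict under the given (or current) weights."""
        w = self.weights if weights is None else weights
        return sum(w.get(f, 0.0) * v for f, v in features.items())

    def update(self, features, label):
        """One online step; label is +1 or -1."""
        self.steps += 1
        if label * self.margin(features) <= 0:  # mistake: move towards the label
            for f, v in features.items():
                self.weights[f] += label * v
        for f, w in self.weights.items():       # accumulate for averaging
            self.totals[f] += w

    def averaged_weights(self):
        """Average of the weight vectors seen after each step."""
        if self.steps == 0:
            return {}
        return {f: t / self.steps for f, t in self.totals.items()}
```

Calling `update` once per piece of user feedback corresponds to the online setting; looping over a stored training set corresponds to the offline one. The averaging here is the naive version that touches every weight on every step, which is fine for a sketch but slow for large feature spaces.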

Learning Binary Classifier: Training data
- Online: latest URI pairs from user feedback
- Offline training examples
  - Positive: URI pairs from equivalence classes
  - Negative:
    - URI pairs from user feedback
    - URI pairs from different equivalence classes sharing types
    - URI pairs from Falcons search results

Learning Binary Classifier: Training algorithm
- Features: the Cartesian product of the two candidates' properties
- Feature value: for each property pair, the maximum similarity (vsim) between the two properties' values
  - URIs: vsim = 1 iff identical or in the same equivalence class
  - Numeric literals: vsim = 1 iff the difference is less than a threshold
  - Boolean literals: vsim = 1 iff the values are equal
  - Other literals: Jaccard similarity
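The feature computation can be sketched as follows. This is an illustration under assumptions that the slides do not state: URIs are wrapped in a hypothetical `URI` class, Jaccard similarity is taken over lowercase whitespace tokens, the numeric threshold defaults to an arbitrary small value, and value pairs of mixed types score 0.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class URI:
    value: str


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity over lowercase whitespace tokens (tokenisation assumed)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def is_number(v) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly
    return isinstance(v, (int, float)) and not isinstance(v, bool)


def vsim(x, y, class_of=None, threshold=1e-6) -> float:
    """Value similarity; class_of maps a URI to its equivalence class id."""
    class_of = class_of or {}
    if isinstance(x, URI) and isinstance(y, URI):
        if x == y:
            return 1.0
        cx, cy = class_of.get(x), class_of.get(y)
        return 1.0 if cx is not None and cx == cy else 0.0
    if isinstance(x, bool) and isinstance(y, bool):
        return 1.0 if x == y else 0.0
    if is_number(x) and is_number(y):
        return 1.0 if abs(x - y) < threshold else 0.0
    if isinstance(x, str) and isinstance(y, str):
        return jaccard(x, y)
    return 0.0  # mixed types: treated as dissimilar (assumption)


def feature_vector(desc_a, desc_b, class_of=None, threshold=1e-6):
    """desc_*: dict property -> list of values. One feature per property pair."""
    features = {}
    for (p, vals_p), (q, vals_q) in product(desc_a.items(), desc_b.items()):
        features[(p, q)] = max(
            (vsim(x, y, class_of, threshold) for x, y in product(vals_p, vals_q)),
            default=0.0,
        )
    return features
```

The resulting dictionary, keyed by property pair, can be fed directly to a linear classifier that works on sparse features.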

Learning Binary Classifier Training algorithm

Selecting Most Beneficial Candidate
- Combine the individual classifier and the global one by their weights (α + β = 1)
- Confidence of coreference is based on the margin: the larger the absolute value of the margin, the higher the confidence
- Uncertainty is measured by the absolute value of the margin (a smaller value means more uncertainty)
- Select the candidate with the minimum absolute value of the margin
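A small sketch of this selection step. It assumes both classifiers are linear with sparse weight dictionaries, and that α weights the individual classifier while β = 1 - α weights the global one; the slides do not say which weight goes with which classifier, and all function names are hypothetical.

```python
def dot(weights, features):
    """Sparse dot product between a weight dict and a feature dict."""
    return sum(weights.get(f, 0.0) * v for f, v in features.items())


def combined_margin(features, w_individual, w_global, alpha=0.5):
    """Margin of the combined classifier, with alpha + beta = 1."""
    beta = 1.0 - alpha
    return alpha * dot(w_individual, features) + beta * dot(w_global, features)


def most_uncertain(candidates, w_individual, w_global, alpha=0.5):
    """candidates: dict candidate id -> feature dict.

    Returns the id whose combined margin has the smallest absolute value.
    """
    return min(
        candidates,
        key=lambda c: abs(combined_margin(candidates[c], w_individual, w_global, alpha)),
    )
```

Because both classifiers are linear, mixing their margins is equivalent to mixing their weight vectors first and scoring once.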

Comparative Snippets
- Purpose: to facilitate user interaction
- Coreferent (resp. non-coreferent): values of discriminative property pairs are significantly similar (resp. dissimilar)
- Discriminability of a property pair: the absolute value of its weight in the combined classifier

Comparative Snippets
- Compute a maximum weighted matching on the bipartite graph built from property pairs
- Get the top-k property value pairs based on the maximum similarity of property values
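The two steps can be illustrated with a brute-force sketch that is only practical for a handful of properties. It assumes the edge weights are non-negative numbers per property pair (for instance their discriminability) and that a separate dictionary holds the maximum value similarity per pair; both inputs and all function names are hypothetical.

```python
from itertools import permutations


def max_weight_matching(left, right, weight):
    """Brute-force maximum weighted bipartite matching (small inputs only).

    weight maps (l, r) -> non-negative weight; missing pairs count as 0.
    Returns the matched pairs that carry a positive weight.
    """
    if len(left) > len(right):
        flipped = max_weight_matching(
            right, left, {(r, l): w for (l, r), w in weight.items()}
        )
        return [(l, r) for r, l in flipped]
    best, best_total = [], -1.0
    for perm in permutations(right, len(left)):
        pairs = list(zip(left, perm))
        total = sum(weight.get(p, 0.0) for p in pairs)
        if total > best_total:
            best, best_total = pairs, total
    return [p for p in best if weight.get(p, 0.0) > 0]


def top_k_pairs(matching, similarity, k):
    """Keep the k matched property pairs with the highest value similarity."""
    return sorted(matching, key=lambda p: similarity.get(p, 0.0), reverse=True)[:k]
```

For realistic numbers of properties, a polynomial-time assignment algorithm such as the Hungarian method would replace the permutation search.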

Computing Consensus Partition
- Minimize disagreements between individual partitions
- In our approach, disagreement is measured with the symmetric difference distance
- Solving this optimization exactly is NP-complete
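For two partitions of the same set, the symmetric difference distance counts the element pairs that are grouped together in one partition but separated in the other. A minimal sketch, assuming each partition is given as a list of sets over the same domain (function names are illustrative):

```python
from itertools import combinations


def coclustered_pairs(partition):
    """All unordered pairs of elements that share a class in the partition."""
    return {
        frozenset(pair)
        for cls in partition
        for pair in combinations(sorted(cls), 2)
    }


def symmetric_difference_distance(p1, p2):
    """Number of pairs co-clustered in exactly one of the two partitions."""
    return len(coclustered_pairs(p1) ^ coclustered_pairs(p2))
```

A consensus partition in this sense is one that keeps the sum of these distances to all individual partitions as small as possible.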

Computing Consensus Partition: Approximation algorithm (clustering-based)
- Compute a partition on the union of the individual partitions' domains
- First, initialize a similarity matrix Mtrx with one entry per class pair (i, j)
- Begin with each URI forming its own equivalence class
- For each class pair (i, j) whose similarity in Mtrx is > 0, merge classes i and j, and update Mtrx
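One way to realise such a clustering-based approximation is sketched below. The similarity definition is an assumption, since the slide's formula is not preserved: the score of a URI pair is the number of users who group the two URIs together minus the number who have both URIs but keep them apart, and the similarity of two classes is the sum of scores over their cross pairs. Merging greedily by highest positive similarity is also a choice made for this sketch.

```python
from itertools import combinations


def pair_score(u, v, partitions):
    """Votes for grouping u and v minus votes against (assumed definition)."""
    score = 0
    for partition in partitions:
        cls_u = next((c for c in partition if u in c), None)
        cls_v = next((c for c in partition if v in c), None)
        if cls_u is None or cls_v is None:
            continue  # this user's domain does not contain both URIs
        score += 1 if cls_u is cls_v else -1
    return score


def consensus_partition(partitions):
    """Greedy agglomerative consensus over partitions given as lists of sets."""
    domain = sorted(set().union(*(c for p in partitions for c in p)))
    classes = [{u} for u in domain]  # start with singleton classes

    def similarity(ci, cj):
        return sum(pair_score(u, v, partitions) for u in ci for v in cj)

    while True:
        best, best_pair = 0, None
        for i, j in combinations(range(len(classes)), 2):
            s = similarity(classes[i], classes[j])
            if s > best:
                best, best_pair = s, (i, j)
        if best_pair is None:  # no class pair with positive similarity is left
            return classes
        i, j = best_pair
        classes[i] |= classes[j]
        del classes[j]
```

With this scoring, merging two classes whose similarity is positive lowers the total symmetric difference distance to the individual partitions, which is why merging stops once no positive entry remains. The sketch recomputes similarities on every pass instead of maintaining a matrix incrementally, trading speed for brevity.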

Computing Consensus Partition

Evaluation
- Build links between NYT and DBpedia of the OAEI benchmark
- 10-fold cross-validation

Evaluation F-Measure

Evaluation: Examination
- Choose 50 popular URIs from Falcons
- Invite 10 people to resolve URI coreference on the 50 URIs using SView
- On average: 290.1 verifications, 32.0 accepted as positive, and 53.9 pairs of URIs in individual partitions

Evaluation: User study
- SUS vs. sigma: 72 vs. 68

Conclusion
- The averaged perceptron is feasible
- User involvement is reduced

References
- A. Ferrara, A. Nikolov, J. Noessner, and F. Scharffe. Evaluation of instance matching tools: The experience of OAEI. Journal of Web Semantics, 21:49-60, 2013.
- M. Collins. Discriminative training methods for hidden Markov models: Theory and experiments with perceptron algorithms. In Proc. of EMNLP, pages 1-8, 2002.