[jws13] Evaluation of instance matching tools: The experience of OAEI

[jws13] Evaluation of instance matching tools: The experience of OAEI A. Ferrara, A. Nikolov, J. Noessner , F. Scharffe

Outline: Introduction; State of the art; The real-data benchmark; The automatically generated benchmark; Current issues and open problems; Concluding remarks

Introduction Abstract: Large collections of data are now available, together with techniques and tools capable of linking data together. These tools retrieve potentially useful relations among data and associate data that represent the same or similar real-world objects. Problem: What is needed is a methodology and a set of benchmarks for understanding the quality of the results produced by the matching process, and a framework in which tools can be compared with others on the same data.

Introduction Solution: Organize the Instance Matching track of the Ontology Alignment Evaluation Initiative (IM@OAEI). Goal: Discover where improvements and new solutions are possible and needed in matching techniques and tools.

Introduction: The instance matching problem Informal definition: A special case of relation discovery that takes two collections of data as input and produces a set of mappings between their entities as output. High-level formal definition: For relational data, D = relational database and I = primary key value; for linked data, D = RDF graphs and I = URIs, use classification schema.

Introduction: The instance matching problem Three main categories of instance matching: Value matching: The basic building block of data linking tools; identifies equivalence between property values of instances (e.g. with string similarity metrics such as Jaro-Winkler). Individual matching: Decides whether two individuals represent the same real-world object by aggregating the similarities between their property values. Dataset matching: Constructs an optimal alignment between whole sets of individuals; relies on the results of individual matching and refines them further with methods such as similarity propagation, optimization algorithms, and logical reasoning.
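The first two categories can be sketched in a few lines of Python. This is only an illustration, not the method of any tool evaluated at OAEI: `jaro_winkler` is a plain implementation of the standard Jaro-Winkler metric, while `individual_similarity`, its weighted-average aggregation, and the property names in the usage example are assumptions made for the example.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity between two strings, in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters match if equal and at most `window` positions apart.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Transpositions: matched characters that appear in a different order.
    seq1 = [c for c, m in zip(s1, matched1) if m]
    seq2 = [c for c, m in zip(s2, matched2) if m]
    transpositions = sum(a != b for a, b in zip(seq1, seq2)) // 2
    return (matches / len1 + matches / len2
            + (matches - transpositions) / matches) / 3


def jaro_winkler(s1: str, s2: str, prefix_scale: float = 0.1,
                 max_prefix: int = 4) -> float:
    """Value matching: Jaro similarity boosted for a shared prefix."""
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1, s2):
        if a != b or prefix == max_prefix:
            break
        prefix += 1
    return j + prefix * prefix_scale * (1.0 - j)


def individual_similarity(a: dict, b: dict, weights: dict) -> float:
    """Individual matching: weighted average of per-property similarities.

    A property missing from either individual contributes zero
    (an assumption of this sketch).
    """
    total = sum(weights.values())
    score = 0.0
    for prop, weight in weights.items():
        if prop in a and prop in b:
            score += weight * jaro_winkler(a[prop].lower(), b[prop].lower())
    return score / total


if __name__ == "__main__":
    print(round(jaro_winkler("MARTHA", "MARHTA"), 4))
    person_a = {"name": "Jon Smith", "city": "Trondheim"}
    person_b = {"name": "John Smith", "city": "Trondheim"}
    print(individual_similarity(person_a, person_b, {"name": 2.0, "city": 1.0}))
```

A tool would then compare the aggregated score with a threshold to decide whether the two individuals are reported as a mapping; dataset matching would refine those pairwise decisions across the whole collection.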

Introduction: Requirements for the evaluation of instance matching & data linking approaches Main goals of common evaluation: Validate the different proposed methods; identify the most promising techniques and directions for improvement; guide further research in the area and the development of robust tools for real-world tasks.

Introduction: Requirements for the evaluation of instance matching & data linking approaches Requirements of the evaluation procedure: 1st requirement: Representative. The evaluation approach should provide useful information about the expected performance of techniques & tools, so that different methods can be compared and the best-suited ones chosen (Benchmark & Criterion). 2nd requirement: Pragmatic. The requirements are mutually contradictory to some extent and can hardly all be satisfied in full, so the aim is a reasonable compromise.

Introduction: Requirements for the evaluation of instance matching & data linking approaches Benchmarking: Comprehensive: Include as many of the challenges found in real-world tasks as possible (e.g. diversity of data formats, attributes & schemas). Illustrative: Reflect a distribution of data features similar to the most likely parameters of real-world tasks (e.g. a feature that rarely occurs should not dominate the data). Criterion: Precision: the proportion of correct mappings among the results returned by the method. Recall: the proportion of all actual mappings that the tool correctly identifies. Other.
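As a minimal sketch of the two criteria, assuming that mappings are represented as sets of (source id, target id) pairs: the function names and the identifiers in the example are hypothetical, and the F-measure helper is the usual harmonic mean of precision and recall.

```python
def precision_recall(found: set, reference: set) -> tuple:
    """Precision and recall of a set of found mappings against a reference."""
    true_positives = len(found & reference)
    precision = true_positives / len(found) if found else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    reference = {("a1", "b1"), ("a2", "b2"), ("a3", "b3"), ("a4", "b4")}
    found = {("a1", "b1"), ("a2", "b2"), ("a3", "b9")}  # one wrong, two missed
    p, r = precision_recall(found, reference)
    print(p, r, f_measure(p, r))
```

In this example two of the three returned mappings are correct (precision 2/3) and two of the four actual mappings are found (recall 1/2).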

Introduction: Instance matching at OAEI Process: Preparation phase: Provide the datasets to be matched & the reference alignments. Execution phase: Systems automatically match the instance data from the test cases. Evaluation phase: The standard evaluation measures are precision and recall computed against the reference alignments; results are aggregated with weighted harmonic means, where the weights are the numbers of true positives.
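The aggregation step can be illustrated as follows. This is a sketch under the assumption that each test case is summarised by its counts of true positives, returned mappings, and reference mappings; the function name and the numbers are made up for the example. With true-positive counts as weights, the weighted harmonic mean of the per-test precisions equals the total number of true positives divided by the total number of returned mappings, and likewise for recall.

```python
def weighted_harmonic_mean(values, weights) -> float:
    """Weighted harmonic mean; entries with zero weight are skipped."""
    pairs = [(v, w) for v, w in zip(values, weights) if w > 0]
    if not pairs:
        return 0.0
    return sum(w for _, w in pairs) / sum(w / v for v, w in pairs)


if __name__ == "__main__":
    # Hypothetical per-test-case counts: (true positives, found, reference).
    test_cases = [(8, 10, 16), (2, 10, 4)]
    true_positives = [tp for tp, _, _ in test_cases]
    precisions = [tp / found for tp, found, _ in test_cases]
    recalls = [tp / ref for tp, _, ref in test_cases]

    h_precision = weighted_harmonic_mean(precisions, true_positives)
    h_recall = weighted_harmonic_mean(recalls, true_positives)
    print(h_precision, h_recall)
    # Same values as pooling all test cases: 10 / 20 and 10 / 20.
```

One consequence of this weighting is that a test case with no true positives has zero weight and drops out of the aggregate.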

State of the art: Evaluation initiatives in the database community Evaluation test sets: Real-world data sources: Two or more publicly available datasets that originate from different sources but describe the same domain, with a gold standard created or validated manually after an initial automatic generation. Artificially generated datasets: Created by taking one reference dataset and introducing artificial distortions in a controlled way (e.g. by removing/adding attributes and changing values randomly).

State of the art: Evaluation initiatives in the database community Earlier stages of research: Databases in the domain of scientific publications; citation matching; public availability (Cora, ACM-DBLP, etc.). Advantage of reuse: The possibility to compare with the techniques developed in the database community. Disadvantages: Not fully representative of the challenges of linked data; lack of version consistency.

State of the art: Evaluation initiatives in the database community Solution: Create benchmarks that represent realistic matching challenges & maintain “canonical” versions of the benchmark datasets; these are the primary motivations of OAEI. Valid quantitative evaluation measures: Maximum F-measure (harmonic mean of pairwise precision and recall); pairwise accuracy for the optimal number of pairs; percentage of correct equivalence classes; proportion of true matching pairs at different error rate levels; precision-recall curves over the whole range of possible threshold values. Conclusion: Precision & recall are the most informative measures; the F-measure gives a single quantitative indicator, and precision-recall curves give a more fine-grained illustration.
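A precision-recall curve and the maximum F-measure can be computed by sweeping the acceptance threshold over the similarity scores a tool assigns to candidate pairs. The sketch below assumes the tool's output is a dictionary from candidate pairs to scores; the function names, identifiers, and scores are hypothetical.

```python
def pr_curve(scored_pairs: dict, reference: set) -> list:
    """Return (threshold, precision, recall, f_measure) for each distinct score.

    A pair is accepted at threshold t when its score is >= t.
    """
    points = []
    for threshold in sorted(set(scored_pairs.values())):
        accepted = {pair for pair, score in scored_pairs.items()
                    if score >= threshold}
        true_positives = len(accepted & reference)
        precision = true_positives / len(accepted)
        recall = true_positives / len(reference) if reference else 0.0
        total = precision + recall
        f = 2 * precision * recall / total if total else 0.0
        points.append((threshold, precision, recall, f))
    return points


def max_f_measure(points: list) -> tuple:
    """Pick the curve point with the highest F-measure."""
    return max(points, key=lambda point: point[3])


if __name__ == "__main__":
    reference = {("a1", "b1"), ("a2", "b2"), ("a3", "b3")}
    scored = {
        ("a1", "b1"): 0.95,
        ("a2", "b2"): 0.90,
        ("a3", "b7"): 0.85,  # wrong candidate with a high score
        ("a3", "b3"): 0.60,
        ("a4", "b9"): 0.40,  # wrong candidate with a low score
    }
    for point in pr_curve(scored, reference):
        print(point)
    print("best:", max_f_measure(pr_curve(scored, reference)))
```

The full list of points is the fine-grained view; the single best point is the summary indicator.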

State of the art: Evaluation of ontology matching tools Development: Originally, a single artificial benchmark, which serves well for checking the capability of schema matching tools to handle the presence or absence of features in ontologies but is less suited for comparing the overall performance of tools. Extension: Include some realistic benchmarks involving real-world ontologies covering the same topics (particularly Conference & Anatomy). Conclusion: To achieve an effective evaluation of the tools, benchmark tests have to use both artificial & real-world datasets.

State of the art: Evaluation of ontology matching tools Important differences between the two tasks make ontology matching benchmarks unsuited for reuse in instance matching: larger datasets; a large number of literal data values; identity & similarity; different roles of names & property values; different kinds of data heterogeneity; structural differences between ontologies & instances as graphs; relations between datasets & the real world; mutual relations between ontology & instance matching.

The real-data benchmark Background: There was a need to develop a separate set of benchmarks specifically for the instance matching task. Instance matching evaluation was established as a separate subtrack within the OAEI and has been run 3 times (in 2009, 2010, & 2011).

The automatically generated benchmark (IIMB) Idea: Automatically acquire a potentially large set of data from an existing data source & represent it in the form of an OWL ABox, serialized in RDF. Programmatically introduce several kinds of data transformations to produce a final set of ABoxes in a controlled way. Match each of the transformed ABoxes against the initial one to find the correct mappings between them. Advantage: control over the type and strength of each transformation.

Creation of the benchmark Approach: SWING (Semantic Web Instance Generation)

Data acquisition techniques Operations: Add super classes & super properties; convert attributes to class assertions; determine disjointness restrictions; enrich with inverse properties; specify domain & range restrictions.

Data transformation techniques Test cases: Deletion/addition of individuals; data value transformation; data structure transformation; data semantics transformation.

Data evaluation techniques: A ground truth is created automatically as the reference alignment for each test case. Reference alignment: the mappings between the individuals of the reference ABox and the corresponding transformed individuals in the test case.
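The generate-and-record idea can be sketched as below. This is not the SWING implementation: the dictionary representation of an ABox, the `make_test_case` function, the single typo-style value transformation, and the identifier scheme are all assumptions made to show how a reference alignment falls out of the transformation step for free.

```python
import random


def swap_adjacent_chars(value: str, rng: random.Random) -> str:
    """Data value transformation: swap one random pair of adjacent characters."""
    if len(value) < 2:
        return value
    i = rng.randrange(len(value) - 1)
    chars = list(value)
    chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)


def make_test_case(reference_abox: dict, delete_rate: float, seed: int):
    """Build a transformed ABox and its reference alignment.

    reference_abox maps an individual id to a dict of property -> string value.
    Deleted individuals get no mapping in the reference alignment.
    """
    rng = random.Random(seed)  # seeded so the test case is reproducible
    transformed = {}
    alignment = set()
    for n, (ind_id, props) in enumerate(sorted(reference_abox.items())):
        if rng.random() < delete_rate:
            continue  # deletion of individuals
        new_id = f"test#item{n}"  # opaque id, so tools cannot match on names
        transformed[new_id] = {
            prop: swap_adjacent_chars(value, rng)
            for prop, value in props.items()
        }
        alignment.add((ind_id, new_id))
    return transformed, alignment


if __name__ == "__main__":
    abox = {
        "ref#film1": {"title": "Blade Runner", "director": "Ridley Scott"},
        "ref#film2": {"title": "Alien", "director": "Ridley Scott"},
    }
    test_abox, reference_alignment = make_test_case(abox, delete_rate=0.0, seed=7)
    print(test_abox)
    print(reference_alignment)
```

Because the generator knows which transformed individual came from which original, the alignment needs no manual validation, and `delete_rate` plays the role of a transformation strength that can be varied per test case.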

Current issues & open problems Future directions for IM@OAEI: Larger datasets; identity & similarity; different roles of names & property values; different kinds of data heterogeneity; structural differences between ontologies & instances as graphs; relations between datasets & the real world; mutual relations between ontology & instance matching.

Concluding remarks Approach: Based on the idea of combining real data with automatically & programmatically generated data for the evaluation. Real data provides a realistic context for instance matching tools. Generated data provides a framework in which different causes of data heterogeneity can be reproduced, to verify analytically & programmatically the strengths and weaknesses of each evaluated tool. Future work: Study new measures for the evaluation; improve the benchmarks to evaluate the behavior of instance matching tools with respect to some crucial problems in the field.

Thank You!