Discovering Missing Background Knowledge in Ontology Matching Pavel Shvaiko 17th European Conference on Artificial Intelligence (ECAI06) 30 August 2006,

Slides:



Advertisements
Similar presentations
Program Verification using Probabilistic Techniques Sumit Gulwani Microsoft Research Invited Talk: VSTTE Workshop August 2006 Joint work with George Necula.
Advertisements

Requirements Engineering Processes – 2
Mathematical Preliminaries
Chapter 13: Query Processing
Advanced Piloting Cruise Plot.
Managing diversity in Knowledge Fausto Giunchiglia ECAI 2006, Riva del Garda, Trento To be cited as: Fausto Giunchiglia, Managing Diversity in Knowledge,
Chapter 1 The Study of Body Function Image PowerPoint
Universität Innsbruck Leopold Franzens Copyright 2006 DERI Innsbruck LarCK Workshop, ISWC/ASWC Busan, Korea 16-Feb-14 Towards Scalable.
1 EnviroInfo 2006, 05/09/06 Graz Automatic Concept Space Generation in Support of Resource Discovery in Spatial Data Infrastructures Paul Smits, Anders.
Summary of Convergence Tests for Series and Solved Problems
Jeopardy Q 1 Q 6 Q 11 Q 16 Q 21 Q 2 Q 7 Q 12 Q 17 Q 22 Q 3 Q 8 Q 13
Jeopardy Q 1 Q 6 Q 11 Q 16 Q 21 Q 2 Q 7 Q 12 Q 17 Q 22 Q 3 Q 8 Q 13
FACTORING ax2 + bx + c Think “unfoil” Work down, Show all steps.
Year 6 mental test 5 second questions
Overview of Lecture Partitioning Evaluating the Null Hypothesis ANOVA
Chapter 3 Critically reviewing the literature
Projects in Computing and Information Systems A Student’s Guide
Programming Language Concepts
Evaluating Window Joins over Unbounded Streams Author: Jaewoo Kang, Jeffrey F. Naughton, Stratis D. Viglas University of Wisconsin-Madison CS Dept. Presenter:
Robust Window-based Multi-node Technology- Independent Logic Minimization Jeff L.Cobb Kanupriya Gulati Sunil P. Khatri Texas Instruments, Inc. Dept. of.
Week 2 The Object-Oriented Approach to Requirements
Chapter 4: Informed Heuristic Search
Fact-finding Techniques Transparencies
1 Verification of Parameterized Systems Reducing Model Checking of the Few to the One. E. Allen Emerson, Richard J. Trefler and Thomas Wahl Junaid Surve.
On Comparing Classifiers : Pitfalls to Avoid and Recommended Approach
3 Logic The Study of What’s True or False or Somewhere in Between.
Outline Minimum Spanning Tree Maximal Flow Algorithm LP formulation 1.
1 Undirected Breadth First Search F A BCG DE H 2 F A BCG DE H Queue: A get Undiscovered Fringe Finished Active 0 distance from A visit(A)
VOORBLAD.
Name Convolutional codes Tomashevich Victor. Name- 2 - Introduction Convolutional codes map information to code bits sequentially by convolving a sequence.
Quadratic Inequalities
1 Breadth First Search s s Undiscovered Discovered Finished Queue: s Top of queue 2 1 Shortest path from s.
Factor P 16 8(8-5ab) 4(d² + 4) 3rs(2r – s) 15cd(1 + 2cd) 8(4a² + 3b²)
© 2012 National Heart Foundation of Australia. Slide 2.
Lecture plan Outline of DB design process Entity-relationship model
Science as a Process Chapter 1 Section 2.
Understanding Generalist Practice, 5e, Kirst-Ashman/Hull
1 IPSI 2003 © 2003 T. Abou-Assaleh, N. Cercone, & V. Keselj An Overview of the Theory of Relaxed Unification Tony Abou-Assaleh Nick Cercone & Vlado Keselj.
25 seconds left…...
Exponential and Logarithmic Functions
Januar MDMDFSSMDMDFSSS
Chapter 10: The Traditional Approach to Design
Systems Analysis and Design in a Changing World, Fifth Edition
©Brooks/Cole, 2001 Chapter 12 Derived Types-- Enumerated, Structure and Union.
PSSA Preparation.
Chapter 11 Describing Process Specifications and Structured Decisions
From Model-based to Model-driven Design of User Interfaces.
1 Programming Languages (CS 550) Mini Language Interpreter Jeremy R. Johnson.
Minimum Vertex Cover in Rectangle Graphs
© Imperial College LondonPage 1 Model checking and refinement checking for modal transition systems and their cousins MTS meeting 2007 Adam Antonik & Michael.
S-Match: an Algorithm and an Implementation of Semantic Matching Fausto Giunchiglia July 2004, Hannover, Germany work in collaboration with Pavel Shvaiko.
S-Match: an Algorithm and an Implementation of Semantic Matching Pavel Shvaiko 1 st European Semantic Web Symposium, 11 May 2004, Crete, Greece paper with.
Semantic Matching Pavel Shvaiko Stanford University, October 31, 2003 Paper with Fausto Giunchiglia Research group (alphabetically ordered): Fausto Giunchiglia,
Web Explanations for Semantic Heterogeneity Discovery Pavel Shvaiko 2 nd European Semantic Web Conference (ESWC), 1 June 2005, Crete, Greece work in collaboration.
LDK R Logics for Data and Knowledge Representation Semantic Matching.
Reasoning with context in the Semantic Web … or contextualizing ontologies Fausto Giunchiglia July 23, 2004.
SCHEMA-BASED SEMANTIC MATCHING Pavel Shvaiko joint work on “semantic matching” with Fausto Giunchiglia and Mikalai Yatskevich joint work on “ontology matching”
Logics for Data and Knowledge Representation Semantic Matching.
BACKGROUND KNOWLEDGE IN ONTOLOGY MATCHING Pavel Shvaiko joint work with Fausto Giunchiglia and Mikalai Yatskevich INFINT 2007 Bertinoro Workshop on Information.
Automatic Lexical Annotation Applied to the SCARLET Ontology Matcher Laura Po and Sonia Bergamaschi DII, University of Modena and Reggio Emilia, Italy.
Semantic Matching Fausto Giunchiglia work in collaboration with Pavel Shvaiko The Italian-Israeli Forum on Computer Science, Haifa, June 17-18, 2003.
A Classification of Schema-based Matching Approaches Pavel Shvaiko Meaning Coordination and Negotiation Workshop, ISWC 8 th November 2004, Hiroshima, Japan.
Element Level Semantic Matching Pavel Shvaiko Meaning Coordination and Negotiation Workshop, ISWC 8 th November 2004, Hiroshima, Japan Paper by Fausto.
Element Level Semantic Matching
Logics for Data and Knowledge Representation
Presentation transcript:

Discovering Missing Background Knowledge in Ontology Matching Pavel Shvaiko 17th European Conference on Artificial Intelligence (ECAI06) 30 August 2006, Riva del Garda, Italy joint work with Fausto Giunchiglia and Mikalai Yatskevich

ECAI, 30 August 2006, Riva del Garda, Italy 2 Outline Introduction Semantic Matching Lack of Knowledge Iterative Semantic Matching Evaluation Conclusions and Future Work

ECAI, 30 August 2006, Riva del Garda, Italy 3 Introduction Information sources (e.g., ontologies) can be viewed as graph-like structures containing terms and their inter-relationships Matching takes two graph-like structures and produces a mapping between the nodes of the graphs that correspond semantically to each other

ECAI, 30 August 2006, Riva del Garda, Italy 4 Semantic Matching

ECAI, 30 August 2006, Riva del Garda, Italy 5 Semantic matching Semantic Matching: Given two graphs G1 and G2, for any node n 1 i G1, find the strongest semantic relation R holding with node n 2 j G2 Computed Rs, listed in the decreasing binding strength order: equivalence { = } more general/specific {, } disjointness { } I dont know {idk} We compute semantic relations by analyzing the meaning (concepts, not labels) which is codified in the elements and the structures of ontologies Technically, labels at nodes written in natural language are translated into propositional logical formulas which explicitly codify the labels intended meaning. This allows us to codify the matching problem into a propositional validity problem

ECAI, 30 August 2006, Riva del Garda, Italy 6 Concept of a label & concept of a node Hobbies and Interests Music Top Books Entertainment Concept of a label is the propositional formula which stands for the set of documents that one would classify under a label it encodes Concept at a node is the propositional formula which represents the set of documents which one would classify under a node, given that it has a certain label and that it is in a certain position in a tree

ECAI, 30 August 2006, Riva del Garda, Italy 7 1.For all labels in T1 and T2 compute concepts at labels 2.For all nodes in T1 and T2 compute concepts at nodes 3.For all pairs of labels in T1 and T2 compute relations between concepts at labels (background knowledge) 4.For all pairs of nodes in T1 and T2 compute relations between concepts at nodes Steps 1 and 2 constitute the preprocessing phase, and are executed once and each time after the ontology is changed (OFF- LINE part) Steps 3 and 4 constitute the matching phase, and are executed every time two ontologies are to be matched (ON - LINE part) Four macro steps Given two labeled trees T1 and T2, do:

ECAI, 30 August 2006, Riva del Garda, Italy 8 Step 1: compute concepts at labels The idea Translate labels at nodes written in natural language into propositional logical formulas which explicitly codify the labels intended meaning Preprocessing Tokenization. Labels (according to punctuation, spaces, etc.) are parsed into tokens. E.g., Hobbies and Interests Lemmatization. Tokens are further morphologically analyzed in order to find all their possible basic forms. E.g., Hobbies Hobby Building atomic concepts. An oracle (WordNet) is used to extract senses of lemmas. E.g., Hobby has 3 senses Building complex concepts. Prepositions, conjunctions are translated into logical connectives and used to build complex concepts out of the atomic concepts E.g., C Hobbies_and_Interests =, where U is a union of the senses that WordNet attaches to lemmas

ECAI, 30 August 2006, Riva del Garda, Italy 9 Step 2: compute concepts at nodes The idea Extend concepts at labels by capturing the knowledge residing in a structure of a tree in order to define a context in which the given concept at a label occurs Computation Concept at a node for some node n is computed as a conjunction of concepts at labels located above the given node, including the node itself C 2 = C Top C Entertainment C 4 = C Top (C Hobbies C Interests ) C Books Example Hobbies and Interests Books Top Entertainment

ECAI, 30 August 2006, Riva del Garda, Italy 10 Step 3: compute relations between (atomic) concepts at labels The idea Exploit a priori knowledge, e.g., lexical, domain knowledge, with the help of element level semantic matchers

ECAI, 30 August 2006, Riva del Garda, Italy 11 Step 3: Element level semantic matchers Sense-based matchers have two WordNet senses in input and produce semantic relations exploiting (direct) lexical relations of WordNet String-based matchers have two labels in input and produce semantic relations exploiting string comparison techniques

ECAI, 30 August 2006, Riva del Garda, Italy 12 Step 4: compute relations between concepts at nodes The idea Decompose the graph (tree) matching problem into the set of node matching problems Translate each node matching problem, namely pairs of nodes with possible relations between them, into a propositional formula Check the propositional formula for validity

ECAI, 30 August 2006, Riva del Garda, Italy 13 Step 4: Example of a node matching task ?

ECAI, 30 August 2006, Riva del Garda, Italy 14 Lack of Knowledge

ECAI, 30 August 2006, Riva del Garda, Italy 15 Problem of low recall (incompletness) - I Facts Matching has two components: element level matching and structure level matching Contrarily to many other systems, the S-Match structure level algorithm is correct and complete Still, the quality of results is not very good Why?... the problem of lack of knowledge Example

ECAI, 30 August 2006, Riva del Garda, Italy 16 Problem of low recall (incompletness) - II Preliminary (analytical) evaluation Dataset [Avesani et al., ISWC05]

ECAI, 30 August 2006, Riva del Garda, Italy 17 On increasing the recall: an overview Multiple strategies Strengthen element level matchers Reuse of previous match results from the same domain of interest PO = Purchase Order Use general knowledge sources (unlikely to help) WWW Use, if available (!), domain specific sources of knowledge UMLS

ECAI, 30 August 2006, Riva del Garda, Italy 18 Iterative Semantic Matching

ECAI, 30 August 2006, Riva del Garda, Italy 19 Iterative semantic matching (ISM) The idea Repeat Step 3 and Step 4 of the matching algorithm for some critical (hard) matching tasks ISM macro steps Discover critical points in the matching process Generate candidate missing axiom(s) Re-run SAT solver on a critical task taking into account the new axiom(s) If SAT returns false, save the newly discovered axiom(s) for future reuse

ECAI, 30 August 2006, Riva del Garda, Italy 20 ISM: Discovering critical points - Example Google (T1)Looksmart (T2) cLabsMatrix (result of Step 3)cNodesMatrix (result of Step 4)

ECAI, 30 August 2006, Riva del Garda, Italy 21 ISM: Generating candidate axioms Sense-based matchers have two WordNet senses in input and produce semantic relations exploiting structural properties of WordNet hierarchies Gloss-based matchers have two WordNet senses as input and produce relations exploiting gloss comparison techniques

ECAI, 30 August 2006, Riva del Garda, Italy 22 ISM: generating candidate axioms Hierarchy distance Hierarchy distance returns the equivalence relation if the distance between two input senses in WordNet hierarchy is less than a given threshold value (e.g., 3) and Idk otherwise Distance between these concepts is 2 (1 more general link and 1 less general). Thus, we can conclude that games and entertainment are close in their meaning and return the equivalence relation diversion entertainment games There is no direct relation between games and entertainment in WordNet

ECAI, 30 August 2006, Riva del Garda, Italy 23 Evaluation

ECAI, 30 August 2006, Riva del Garda, Italy 24 Testing methodology Dataset [Avesani et al., ISWC05] Measuring match quality Indicators Precision, [0,1]; Recall, [0,1] By construction in that dataset reference mappings represent only true positives, thus allowing us to estimate only recall Higher values of recall can be obtained at the expense of lower values of precision Additional tests to ensure that precision does not decrease Indicators

ECAI, 30 August 2006, Riva del Garda, Italy 25 Experimental results

ECAI, 30 August 2006, Riva del Garda, Italy 26 Conclusions and Future Work

ECAI, 30 August 2006, Riva del Garda, Italy 27 Conclusions The problem of missing domain knowledge is a major problem of all (!) matching systems This problem on the industrial size matching tasks is very hard We have investigated it by examples of light weight ontologies, such as Google and Yahoo Partial solution by applying semantic matching iteratively

ECAI, 30 August 2006, Riva del Garda, Italy 28 Future work Iterative semantic matching New element level matchers Interactive semantic matching GUI Cutomizing technology Extensive evaluation Testing methodology Industry-strength tasks

ECAI, 30 August 2006, Riva del Garda, Italy 29 References Project website - KNOWDIVE: F. Giunchiglia, P. Shvaiko: Semantic matching. Knowledge Engineering Review Journal, 18(3), F. Giunchiglia, P.Shvaiko, M. Yatskevich: Semantic schema matching. In Proceedings of CoopIS05. P. Bouquet, L. Serafini, S. Zanobini: Semantic coordination: a new approach and an application. In Proceedings of ISWC, P. Avesani, F. Giunchiglia, M. Yatskevich: A large scale taxonomy mapping evaluation. In Proceedings of ISWC, C. Ghidini, F. Giunchiglia: Local models semantics, or contextual reasoning = locality + compatibility. Artificial Intelligence Journal, 127(3), Ontology Matching: P. Shvaiko and J. Euzenat: A survey of schema-based matching approaches. Journal on Data Semantics, IV, 2005.

ECAI, 30 August 2006, Riva del Garda, Italy 30 Thank you!

ECAI, 30 August 2006, Riva del Garda, Italy 31 FN – False negatives TP – True positives FP – False positives TN – True negatives FN TP FP TN Reference Matches System Matches