Leveraging Data and Structure in Ontology Integration Octavian Udrea 1 Lise Getoor 1 Renée J. Miller 2 1 University of Maryland College Park 2 University.


1 Leveraging Data and Structure in Ontology Integration Octavian Udrea 1 Lise Getoor 1 Renée J. Miller 2 1 University of Maryland College Park 2 University of Toronto

2 Contents Motivation and goals Short overview of OWL Lite The ILIADS method Experimental evaluation

3 ILIADS Goal:  Produce high-quality integration via a flexible method able to adapt to a wide variety of ontology sizes and structures Method:  Combining statistical and logical inference  Use schema (structure) and data (instances) effectively Solution:  Integrated Learning In Alignment of Data and Schema (ILIADS)

4 Contributions Show how to combine statistical and logical inference effectively Show that a small amount of inference yields high qualitative gain Show that parameters needed to perform inference over data and structure are robust Provide a thorough evaluation on 30 pairs of real-world ontologies (with ground truth)

5 Contents Motivation and goals Short overview of OWL Lite The ILIADS method Experimental evaluation

6 Example OWL Lite ontologies (discoveredBy, owl:inverseOf, discoverer); (discoveredBy, owl:type, owl:FunctionalProperty) (discoveredBy, owl:inverseOf, discoverer); (associatedWith, owl:type, owl:TransitiveProperty) (resultsFrom, rdfs:subPropertyOf, associatedWith)

7 Example OWL Lite ontologies An entity can be a: Class (discoveredBy, owl:inverseOf, discoverer); (discoveredBy, owl:type, owl:FunctionalProperty) (discoveredBy, owl:inverseOf, discoverer); (associatedWith, owl:type, owl:TransitiveProperty) (resultsFrom, rdfs:subPropertyOf, associatedWith)

8 Example OWL Lite ontologies An entity can be a: Class Instance (discoveredBy, owl:inverseOf, discoverer); (discoveredBy, owl:type, owl:FunctionalProperty) (discoveredBy, owl:inverseOf, discoverer); (associatedWith, owl:type, owl:TransitiveProperty) (resultsFrom, rdfs:subPropertyOf, associatedWith)

9 Example OWL Lite ontologies An entity can be a: Class Instance Property (discoveredBy, owl:inverseOf, discoverer); (discoveredBy, owl:type, owl:FunctionalProperty) (discoveredBy, owl:inverseOf, discoverer); (associatedWith, owl:type, owl:TransitiveProperty) (resultsFrom, rdfs:subPropertyOf, associatedWith)

10 Example OWL Lite ontologies (discoveredBy, owl:inverseOf, discoverer) (discoveredBy, owl:type, owl:FunctionalProperty) (discoveredBy, owl:inverseOf, discoverer) (associatedWith, owl:type, owl:TransitiveProperty) (resultsFrom, rdfs:subPropertyOf, associatedWith)
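To make the effect of these axioms concrete, here is a minimal Python sketch of forward chaining over triples. The schema triples are the ones on the slide (which writes owl:type where standard RDF vocabulary uses rdf:type); the instance triples, the function name `infer`, and the restriction to three rules are assumptions made for illustration. It is not a complete OWL Lite reasoner.

```python
# Schema triples as written on the slide.
SCHEMA = {
    ("discoveredBy", "owl:inverseOf", "discoverer"),
    ("associatedWith", "owl:type", "owl:TransitiveProperty"),
    ("resultsFrom", "rdfs:subPropertyOf", "associatedWith"),
}

# Hypothetical instance data, invented for this example.
DATA = {
    ("E-ColiPoisoning", "discoveredBy", "TheodorEscherich"),
    ("Fever", "resultsFrom", "Infection"),
    ("Infection", "associatedWith", "Bacteria"),
}


def infer(triples):
    """Apply three simplified rules until no new triple appears."""
    facts = set(triples)
    while True:
        new = set()
        for (s, p, o) in facts:
            transitive = (p, "owl:type", "owl:TransitiveProperty") in facts
            for (a, q, b) in facts:
                # inverseOf: (s p o) and (p inverseOf b) give (o b s).
                if q == "owl:inverseOf" and a == p:
                    new.add((o, b, s))
                # inverseOf is symmetric: (a inverseOf p) gives (o a s).
                if q == "owl:inverseOf" and b == p:
                    new.add((o, a, s))
                # subPropertyOf: (s p o) and (p subPropertyOf b) give (s b o).
                if q == "rdfs:subPropertyOf" and a == p:
                    new.add((s, b, o))
                # Transitivity: (s p o) and (o p b) give (s p b).
                if transitive and q == p and a == o:
                    new.add((s, p, b))
        if new <= facts:
            return facts
        facts |= new


closure = infer(SCHEMA | DATA)
```

With this data, the closure contains the inverse triple for the discovery and chains the sub-property and transitivity rules for the `Fever` triple.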

11 Inference in OWL Lite

14 The integration problem

18 Contents Motivation and goals Short overview of OWL Lite The ILIADS method Experimental evaluation

19 State of the art Robust statistical methods  Well-known similarity measures  Used for matching data (entities) and schema  May use graph structure of schema Logical inference  Not combined with statistical inference  Basis for most schema mapping and ontology integration methods  Approaches integrate schema, but not data

20 Issues How to combine statistical inference with logical inference  Takes into account data, structure, etc. so it’s no longer obvious  In particular, how to quantify the results of logical inference into a similarity-like form? How to do logical inference in a tractable manner  For OWL Lite, inference over the entire ontology is EXPTIME-complete in the worst case

21 The ILIADS algorithm repeat until no more candidates 1. Compute local similarities 2. Select promising candidates 3. For each candidate a. Select relationship b. Perform logical inference c. Update score with the inference similarity 4. Select the candidate with the best score end
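The control flow of the four steps can be sketched in Python as a greedy loop. This is only an outline under stated assumptions: the function `align`, its callback parameters, and the `combine` hook for the score update are hypothetical names, and steps 3a to 3c are collapsed into a single `inference_similarity` callback.

```python
def align(pairs, local_similarity, threshold, inference_similarity, combine):
    """Greedy loop following the slide's four steps.

    pairs: candidate entity pairs not yet aligned
    local_similarity(pair, alignment) -> float          (step 1)
    threshold: minimum local score for a candidate      (step 2)
    inference_similarity(pair, alignment) -> float      (steps 3a-3c, collapsed)
    combine(local, inferred) -> float                   (assumed score update)
    """
    alignment = []
    remaining = set(pairs)
    while remaining:
        # 1. Compute local similarities for every remaining pair.
        local = {p: local_similarity(p, alignment) for p in remaining}
        # 2. Keep only the promising candidates.
        candidates = [p for p in remaining if local[p] > threshold]
        if not candidates:
            break
        # 3. Update each candidate's score with its inference similarity.
        scores = {
            p: combine(local[p], inference_similarity(p, alignment))
            for p in candidates
        }
        # 4. Commit the best candidate (ties broken by pair order), then repeat.
        best = max(candidates, key=lambda p: (scores[p], p))
        alignment.append(best)
        remaining.discard(best)
    return alignment
```

Passing the current alignment to both callbacks reflects the idea that each committed pair can change the similarities computed in the next iteration.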

22 The ILIADS algorithm repeat until no more candidates 1. Compute local similarities 2. Select promising candidates 3. For each candidate a. Select relationship b. Perform logical inference c. Update score with the inference similarity 4. Select the candidate with the best score end

23 Computing local similarities sim_lexical: Jaro-Winkler and WordNet; sim_structural: Jaccard for neighborhoods; sim_extensional: Jaccard on extensions; parameters: λ_x, λ_s, λ_e  different for classes, instances and properties
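A rough Python sketch of how three such measures might be combined follows. Two things here are assumptions rather than statements from the slides: the weighted sum used to combine the measures, and the use of `difflib.SequenceMatcher` as a stand-in for Jaro-Winkler and WordNet. The dictionary keys `name`, `neighbors`, and `extension` are hypothetical, and the neighbor and extension sets are assumed to be expressed in identifiers shared by both ontologies.

```python
from difflib import SequenceMatcher


def jaccard(a, b):
    """Jaccard coefficient |A ∩ B| / |A ∪ B|; 0.0 when both sets are empty."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def lexical(name1, name2):
    # Stand-in for Jaro-Winkler plus WordNet: difflib's ratio on lowercased names.
    return SequenceMatcher(None, name1.lower(), name2.lower()).ratio()


def local_similarity(e1, e2, lam_x, lam_s, lam_e):
    """Weighted sum of the three measures (an assumed combination)."""
    return (
        lam_x * lexical(e1["name"], e2["name"])
        + lam_s * jaccard(e1["neighbors"], e2["neighbors"])
        + lam_e * jaccard(e1["extension"], e2["extension"])
    )
```

Because the slide says the parameters differ for classes, instances and properties, a caller would pick `lam_x`, `lam_s`, `lam_e` per entity type.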

24 The ILIADS algorithm repeat until no more candidates 1. Compute local similarities 2. Select promising candidates 3. For each candidate a. Select relationship b. Perform logical inference c. Update score with the inference similarity 4. Select the candidate with the best score end

25 Selecting promising candidates 1. Select candidates with sim(e,e’) > λ_t 2. Use a policy based on entity type to order them, e.g.: class alignments first; instance alignments first; alternate between classes and instances
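The two selection steps could look like the following Python sketch. The tuple layout, the function name `select_candidates`, and the tie-break by descending similarity are assumptions; only the "one entity type first" policies are modeled, not the alternating one.

```python
def select_candidates(scored, lam_t, policy=("class", "instance", "property")):
    """Filter by threshold, then order by entity-type policy.

    scored: list of (entity1, entity2, entity_type, similarity) tuples.
    policy: entity types in the order they should be tried.
    """
    rank = {entity_type: i for i, entity_type in enumerate(policy)}
    # 1. Keep candidates whose similarity exceeds the threshold.
    kept = [c for c in scored if c[3] > lam_t]
    # 2. Order by policy position, then by descending similarity.
    return sorted(kept, key=lambda c: (rank.get(c[2], len(rank)), -c[3]))
```

Calling it with `policy=("instance", "class", "property")` gives the "instance alignments first" ordering.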

26 The ILIADS algorithm repeat until no more candidates 1. Compute local similarities 2. Select promising candidates 3. For each candidate a. Select relationship b. Perform logical inference c. Update score with the inference similarity 4. Select the candidate with the best score end

27 Selecting relationship Must decide on relation type  subClassOf vs. equivalentClass  subPropertyOf vs. equivalentProperty Determination is difficult, especially under the OWL open-world semantics Use a simple extension-based technique with a threshold λ_r
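The transcript does not preserve the actual decision rule, so the Python sketch below shows one plausible extension-based rule, not the ILIADS rule itself. It assumes that the share of each class's instances found in the other class is compared with λ_r; the function name and the `superClassOf` label are hypothetical.

```python
def select_relationship(ext1, ext2, lam_r):
    """Guess a relation between two classes from their instance sets.

    Assumed rule: if both extensions are mostly contained in each other,
    call the classes equivalent; if only one is, call it a subclass.
    """
    ext1, ext2 = set(ext1), set(ext2)
    shared = ext1 & ext2
    cover1 = len(shared) / len(ext1) if ext1 else 0.0  # share of ext1 inside ext2
    cover2 = len(shared) / len(ext2) if ext2 else 0.0  # share of ext2 inside ext1
    if cover1 >= lam_r and cover2 >= lam_r:
        return "equivalentClass"
    if cover1 >= lam_r:
        return "subClassOf"    # class 1 is (mostly) contained in class 2
    if cover2 >= lam_r:
        return "superClassOf"  # hypothetical label for the reverse direction
    return None
```

Under open-world semantics a missing instance is not evidence of absence, which is one reason a threshold below 1 is a natural choice in a rule of this kind.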

28 Selecting relationship

30 The ILIADS algorithm repeat until no more candidates 1. Compute local similarities 2. Select promising candidates 3. For each candidate a. Represent candidate relationship b. Perform logical inference c. Update score with the inference similarity 4. Select the candidate with the best score end

31 Performing logical inference For the candidate pair (e,e’): Select an axiom to apply The logical consequences are the pairs of entities (e(i), e(j)) that have just become equivalent Repeat a small number of times (5) to maintain tractability

32 Performing logical inference

35 (TheodorEscherich, owl:sameAs, T.S. Escherich) is a logical consequence of the candidate (E-ColiPoisoning, owl:sameAs, E-Coli)
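One way such a consequence can arise is through the functional property discoveredBy declared earlier: if two diseases are the same and each has exactly one discoverer, the discoverers must be the same. The Python sketch below models only that single axiom with a bounded number of rounds. The two instance triples are assumed, since the slide's figure is not in the transcript, and `consequences` is a hypothetical name.

```python
def consequences(triples, functional, candidate, max_steps=5):
    """Pairs that become equivalent once `candidate` is assumed to be sameAs.

    Only one axiom is modeled: a functional property has at most one value
    per subject, so equivalent subjects force their values to be equivalent.
    """
    equal = {frozenset(candidate)}
    derived = []
    frontier = [tuple(candidate)]
    for _ in range(max_steps):  # bounded, to keep inference tractable
        next_frontier = []
        for (x, y) in frontier:
            for prop in sorted(functional):
                xs = sorted(o for (s, p, o) in triples if s == x and p == prop)
                ys = sorted(o for (s, p, o) in triples if s == y and p == prop)
                for a in xs:
                    for b in ys:
                        if a != b and frozenset((a, b)) not in equal:
                            equal.add(frozenset((a, b)))
                            derived.append((a, b))
                            next_frontier.append((a, b))
        if not next_frontier:
            break
        frontier = next_frontier
    return derived


# Assumed instance triples for the slide's example.
TRIPLES = {
    ("E-ColiPoisoning", "discoveredBy", "TheodorEscherich"),
    ("E-Coli", "discoveredBy", "T.S. Escherich"),
}
pairs = consequences(TRIPLES, {"discoveredBy"}, ("E-ColiPoisoning", "E-Coli"))
```

Each newly derived pair is fed back as a starting point for the next round, so a chain of functional properties can propagate equivalence until the step limit is reached.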

36 The ILIADS algorithm repeat until no more candidates 1. Compute local similarities 2. Select promising candidates 3. For each candidate a. Represent candidate relationship b. Perform logical inference c. Update score with the inference similarity 4. Select the candidate with the best score end

37 Updating score For the candidate pair (e,e’): Initial local similarity sim(e,e’) Inference similarity over all consequences: Updated similarity:

38 Updating score

40 Consistency The constructed alignment is not guaranteed to be consistent  ILIADS can only detect inconsistencies that appear in the few logical inference steps  Pellet used to check consistency after ILIADS Experimentally, inconsistent ontologies in less than 0.5% of runs

41 Contents Motivation and goals Short overview of OWL Lite The ILIADS method Experimental evaluation

42 Experimental framework 30 pairs of real-world ontologies  From 194 to over 20,000 triples  From a variety of domains: medical, geographical, economical, biological Ground truth provided by human reviewers  Multiple iterations to ensure the best human-provided alignment Datasets available:  http://www.cs.umd.edu/linqs/projects/iliads

43 Experimental framework Evaluation: precision, recall and F1 quality  F1 = 2 * Precision * Recall / (Precision + Recall)  7 independent runs ILIADS Variations:  ILIADS-tailored uses the best set of parameters for each pair of ontologies  ILIADS-fixed uses one set of parameters for all pairs of ontologies Used to evaluate robustness of the parameters
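The F1 formula above can be computed from a predicted alignment and the ground truth as in this short Python sketch. Representing alignments as sets of pairs, and returning 0.0 when a denominator would be zero, are assumptions of the example.

```python
def precision_recall_f1(predicted, truth):
    """Precision, recall and F1 of a predicted alignment against ground truth."""
    predicted, truth = set(predicted), set(truth)
    correct = len(predicted & truth)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(truth) if truth else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 = 2 * Precision * Recall / (Precision + Recall)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

For instance, four predicted pairs of which three are correct, against six true pairs, give a precision of 0.75 and a recall of 0.5.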

44 Experimental framework ILIADS compared to two leading tools:  FCA-merge [Stumme and Maedche, IJCAI 2001] uses formal concept analysis and an external document corpus  COMA++ [Aumueller et al., SIGMOD 2005] implements multiple match strategies, including fragment and reuse-based matching

45 Precision/recall

49 Precision/recall comparison

50 Precision/recall for ontologies with substantial instance data

51 Number of inference steps

52 ILIADS parameters (column groups: lexical, structural and extensional parameters) ILIADS-fixed: .2 .4 .1 .5 .6 .4 .3 .5 .7 .2; Min ILIADS-tailored: .15 .40 .3 .45 .35 .2 .35 .65 .2; Max ILIADS-tailored: .25 .45 .1 .65 .7 .5 .35 .65 .7 .2

53 Choosing ILIADS parameters Despite the number of parameters, method is quite robust  Parameters are stable around the ILIADS-fixed values if the two ontologies in a pair are not very different Strong correlations between  Structural similarity coefficients and the average node degree  Extensional coefficients and the ratio of instances to classes

54 False negative analysis

55 Concluding remarks New ontology integration algorithm  First to combine statistical and logical inference Evaluated feasibility of combined inference  A small number of logical inference steps is sufficient for integration decisions  Inference is stable to parameter settings  Parameters permit principled tuning based on ontology characteristics Dataset and code available at: http://www.cs.umd.edu/linqs/projects/iliads

