A survey of approaches to automatic schema matching Erhard Rahm, Universität für Informatik, Leipzig Philip A. Bernstein, Microsoft Research VLDB 2001.

A survey of approaches to automatic schema matching Erhard Rahm, Universität für Informatik, Leipzig Philip A. Bernstein, Microsoft Research VLDB 2001

Problem Schema matching: produce a mapping between elements of two schemas such that the elements in the mapping correspond semantically to each other. 2 Cust C# Cname FirstName LastName Customer CustID Company Contact Schema 1Schema 2 A real-world problem: Schema integration, Data warehouses, E-commerce, Semantic query processing

Problem (cont.) 3 The paper surveys approaches for automated schema matching and presents a taxonomy. Manual schema matching: t edious, time-consuming, error-prone and therefore expensive. Automated schema matching: t he solution

Overview Problem and applications Match operator Classification – Schema level matchers – Instance level matchers – Combining matchers Prototype implementations Conclusion Critique 4

Match Match is an abstract operator for implementing schema matching – Input: two input schemas – Output: a set of mapping elements Match is based on heuristics that approximate what the user considers to be a good match Implementations of match produces ’match candidates’ Not possible to determine all matches automatically 5

Match (cont.) 6 Match candidates S1.C# = S2.CustID S1.Cname = S2.Company S1.Firstname = S2.Contact Match Schema 1 Schema 2 User acceptance Matches S1.C# = S2.CustID S1.Cname = S2.Company

Generic Match Architecture 7

Classification 8

Schema-level matchers Element-level – Linguistic approaches: Similarity of names, e.g. FirstName  first_name Equality of synonyms, e.g. car  automobile Equality of hypernyms, i.e. book  publication, article  publication Description matching: S1: empn // employee name S2: name // name of employee – Constraint-based approaches: Data types, e.g. varchar  text Value ranges Uniqueness Structural-level 10

Classification 11

Instance-level matchers Linguistic characterization – Keywords, frequencies of words, combinations, etc. 12 CName Microsoft Apple Microsoft Lenovo Schema 1Schema 2 Company IBM Microsoft Apple Microsoft Apple EmpName Allan Steve Bob Carol match

Instance-level matchers Constraint-based characterization – Character patterns and numerical value ranges 13 Price $19.80 $136.25 $5.00 $64.36 Schema 1Schema 2 Paid $24.20 $32.54 $532.00 $33.33 match

Classification 14

Combining matchers The best result is archived by combining multiple matchers Two types: – Hybrid matchers – Composite matchers 15 Hybrid matcher Datatypes Names Value ranges Match candidates...

Combining matchers The best result is archived by combining multiple matchers Two types: – Hybrid matchers – Composite matchers 16 Composite matcher Match candidates... Name matcher Datatype matcher

Prototype implementations 17

Prototype implementations (cont.) 18

Prototype implementations (cont.) 19

Example: SemInt 15 contraint-based, 5 contant-based matching criteria Each criteria is mapped to a range [0..1] for every element. Yields an N- dimensional point for N matching criteria 20 1 1 0 0 Field length Data type C# CustID CName Company

Conclusion Proposes a taxonomy Characterizes and compares previous implementations using this taxonomy Useful for: – Programmers who need to implement Match – Researchers looking to develop better algorithms Proposes subjects for further research: – Test of performance and accuracy of existing approaches – Better utilization of instance-level information 21

Critique Good: Provides a good overview of the subject, Fig. 2 and Table 5 in particular Good at pointing out subjects that should be researched further Taxonomy is easy to understand and is explained well Could be improved: Does not compared performance or correctness of implementations No examples in the descripton of existing implementations Lacking good examples of structural level matching Relative performance of implementations are mentioned only once: ”Cupid performed somewhat better overall”. Cupid is developed by the authors. 22

Questions?

A survey of approaches to automatic schema matching Erhard Rahm, Universität für Informatik, Leipzig Philip A. Bernstein, Microsoft Research VLDB 2001.

Similar presentations

Presentation on theme: "A survey of approaches to automatic schema matching Erhard Rahm, Universität für Informatik, Leipzig Philip A. Bernstein, Microsoft Research VLDB 2001."— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

A survey of approaches to automatic schema matching Erhard Rahm, Universität für Informatik, Leipzig Philip A. Bernstein, Microsoft Research VLDB 2001.

Similar presentations

Presentation on theme: "A survey of approaches to automatic schema matching Erhard Rahm, Universität für Informatik, Leipzig Philip A. Bernstein, Microsoft Research VLDB 2001."— Presentation transcript:

Similar presentations

About project

Feedback