A Survey of Approaches to Automatic Schema Matching
Erhard Rahm, Philip A. Bernstein
The VLDB Journal 10:334-350 (2001)

1. A Survey of Approaches to Automatic Schema Matching
Erhard Rahm, Philip A. Bernstein
The VLDB Journal 10:334-350 (2001)

2. The Problem
- Schema matching
  - Input schemas
  - Output mappings
- Motivations
  - Manual schema matching
  - Generic and customizable schema matching

3. Application Domains
- Schema integration: structures and terminological relationships
- Data warehouses: source-to-warehouse transformation
- E-commerce: message translation
- Semantic query processing: a run-time scenario

4. The Match Operator
- Representations of input schemas and output mapping
  - Schema representation
    - Schema elements
    - Structure
  - Mapping representation
    - Mapping elements
    - Mapping expressions
- Matching function
  - Mathematically unsatisfying
  - Heuristics
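As a rough illustration of the representations named on slide 4, the sketch below models a schema as a list of elements with parent links (the structure) and a mapping as a list of mapping elements, each optionally carrying a mapping expression. All class and field names, and the example element names `PO` and `Order`, are assumptions made for illustration; they are not the paper's notation.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class SchemaElement:
    """One schema element; `parent` encodes the schema structure."""
    name: str
    parent: Optional[str] = None


@dataclass
class MappingElement:
    """Relates S1 elements to S2 elements, optionally with an expression."""
    s1_elements: tuple
    s2_elements: tuple
    expression: Optional[str] = None


@dataclass
class Mapping:
    """The output of a match: a collection of mapping elements."""
    elements: list = field(default_factory=list)

    def add(self, s1, s2, expression=None):
        self.elements.append(MappingElement(tuple(s1), tuple(s2), expression))


# Hypothetical input schemas (names invented for this sketch).
s1 = [SchemaElement("PO"), SchemaElement("Price", parent="PO")]
s2 = [SchemaElement("Order"), SchemaElement("Amount", parent="Order")]

# A hand-written mapping of the kind a match operation would return.
mapping = Mapping()
mapping.add(["Price"], ["Amount"], "Amount = Price")
```

Keeping the expression optional reflects that a mapping element may only state that elements correspond, without saying how values are converted.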

5. Architecture for Generic Match
[Figure: architecture diagram. Components shown: Tool 1 (Portal schemas), Tool 2 (E-business schemas), Tool 3 (Data warehousing schemas), Global libraries (dictionaries, schemas, …), Schema import/export, Generic Match Implementation, Internal schema representation]

6. Classification of Approaches
- Individual matchers
  - Instance vs. schema
  - Element vs. structure matching
  - Language vs. constraint
  - Matching cardinality (1:1, 1:n, n:1, and n:m)
  - Auxiliary information
- Combinations of multiple matchers

7. Schema-level Approaches
- Granularity of match (element-level vs. structure-level)
- Match cardinality
- Linguistic approaches
- Constraint-based approaches
- Reusing schema and mapping information

8. Granularity of Match
- Full structure match of Address and CustomerAddress
  - S1 elements: Address (Street, City, State, Zip)
  - S2 elements: CustomerAddress (Street, City, USState, PostalCode)
- Partial structural match of AccountOwner and Customer
  - S1 elements: AccountOwner (Name, Address, Birthdate, TaxExempt)
  - S2 elements: Customer (Cname, CAddress, Cphone)

9. Match Cardinality

Local match cardinalities | S1 element(s) | S2 element(s) | Matching expression
1. 1:1, element-level | Price | Amount | Amount = Price
2. n:1, element-level | Price, Tax | Cost | Cost = Price * (1 + Tax/100)
3. 1:n, element-level | Name | FirstName, LastName | FirstName, LastName = Extract(Name, …)
4. n:1, structure-level (n:m element-level) | B.Title, B.PuNo, P.PuNo, P.Name | A.Book, A.Publisher | A.Book, A.Publisher = select B.Title, P.Name from B, P where B.PuNo = P.PuNo
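The element-level matching expressions on slide 9 (rows 1 to 3) can be read as small functions from an S1 record to S2 fields. The sketch below is one way to write them. The slide leaves the arguments of `Extract(Name, …)` unspecified, so `map_name` is only a hypothetical stand-in that splits at the last space, and the sample record is invented.

```python
def map_amount(s1):
    """1:1, element-level: Amount = Price."""
    return {"Amount": s1["Price"]}


def map_cost(s1):
    """n:1, element-level: Cost = Price * (1 + Tax/100)."""
    return {"Cost": s1["Price"] * (1 + s1["Tax"] / 100)}


def map_name(s1):
    """1:n, element-level: hypothetical stand-in for Extract, splitting at the last space."""
    first, _, last = s1["Name"].rpartition(" ")
    return {"FirstName": first, "LastName": last}


# Invented S1 record used only to exercise the three expressions.
record = {"Price": 100.0, "Tax": 25.0, "Name": "Ada Lovelace"}
amount = map_amount(record)
cost = map_cost(record)
name = map_name(record)
```

Row 4 is already executable in spirit: its expression is a relational join of B and P on PuNo.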

10. Linguistic Approaches
- Name matching
  - Equality of names
  - Equality of canonical name representations
  - Equality of synonyms
  - Equality of hypernyms
  - Similarity of names based on common substrings, edit distance, pronunciation, and Soundex
  - User-provided name matches
- Description matching
  - Ex. S1: empn // employee name
  - Ex. S2: name // name of employee
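Several of the name-matching criteria on slide 10 can be combined in a few lines. The sketch below checks equality of canonical names, then a synonym lookup, and finally falls back to a similarity score derived from edit distance. The synonym table and the normalization rule are assumptions for illustration; a real matcher would consult a dictionary or thesaurus, and hypernyms and Soundex are not covered here.

```python
import re

# Hypothetical synonym table, keyed by unordered pairs of canonical names.
SYNONYMS = {frozenset({"zip", "postalcode"}), frozenset({"car", "automobile"})}


def canonical(name):
    """Lower-case the name and drop every non-alphanumeric character."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def edit_distance(a, b):
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # delete ca
                            curr[j - 1] + 1,             # insert cb
                            prev[j - 1] + (ca != cb)))   # substitute or keep
        prev = curr
    return prev[-1]


def name_similarity(n1, n2):
    """Score in [0, 1]: 1.0 for equal canonical names or synonyms, else edit-based."""
    c1, c2 = canonical(n1), canonical(n2)
    if c1 == c2 or frozenset({c1, c2}) in SYNONYMS:
        return 1.0
    return 1.0 - edit_distance(c1, c2) / max(len(c1), len(c2))
```

For example, `name_similarity("Cust_Name", "custname")` is 1.0 because the canonical forms are equal, while `State` and `USState` score below 1.0 because two insertions separate them.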

11. Constraint-based Approaches

12. Reusing Schema and Mapping Information

13. Instance-level Approaches
- Linguistic characterization
  - Information retrieval techniques
  - Ex. extracting keywords and themes
- Constraint-based characterization
  - Numeric value ranges
  - Numeric value averages
  - Character patterns (phone numbers, ISBNs, SSNs, …)
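The constraint-based characterization on slide 13 can be sketched as a function that summarizes a column of instance values: numeric columns get a value range and an average, other columns get a set of character patterns. The pattern encoding (digits become `9`, letters become `A`) and the sample values are assumptions chosen for illustration, and the function assumes a non-empty column.

```python
import re
from statistics import mean


def char_pattern(value):
    """Abstract a string: digits become '9', letters become 'A', the rest is kept."""
    return re.sub(r"[A-Za-z]", "A", re.sub(r"\d", "9", value))


def characterize(values):
    """Summarize a non-empty column of instance values given as strings."""
    try:
        numbers = [float(v) for v in values]
    except ValueError:
        # At least one value is not numeric: describe the column by its patterns.
        return {"kind": "text", "patterns": {char_pattern(v) for v in values}}
    return {"kind": "numeric", "min": min(numbers),
            "max": max(numbers), "avg": mean(numbers)}


# Invented sample columns.
phones = characterize(["555-1234", "555-9876"])
salaries = characterize(["1000", "3000"])
```

Columns from two schemas whose summaries agree, for example the same single phone-number pattern, would then be candidates for a match.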

14. Combining Different Matchers
- Hybrid matchers
  - Hard-wired combination of multiple matching criteria
  - Better performance
- Composite matchers
  - Independent basic matchers
  - Flexible execution order
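A composite matcher in the sense of slide 14 runs independent basic matchers and then combines their results. The sketch below uses a weighted average as the combination step, which is only one possible strategy and an assumption here. The two basic matchers, the weights, and the element dictionaries are likewise invented for illustration.

```python
from difflib import SequenceMatcher


def name_matcher(e1, e2):
    """Basic matcher: similarity of element names based on common substrings."""
    return SequenceMatcher(None, e1["name"].lower(), e2["name"].lower()).ratio()


def type_matcher(e1, e2):
    """Basic matcher: 1.0 if the declared data types agree, else 0.0."""
    return 1.0 if e1["type"] == e2["type"] else 0.0


def composite_match(e1, e2, matchers):
    """Combine independently computed scores with a weighted average."""
    total_weight = sum(weight for _, weight in matchers)
    weighted = sum(weight * matcher(e1, e2) for matcher, weight in matchers)
    return weighted / total_weight


# The matcher list can be reordered or extended without touching the matchers.
matchers = [(name_matcher, 0.7), (type_matcher, 0.3)]
price = {"name": "Price", "type": "decimal"}
unit_price = {"name": "UnitPrice", "type": "decimal"}
amount = {"name": "Amount", "type": "decimal"}
```

With these inputs, `composite_match(price, unit_price, matchers)` is higher than `composite_match(price, amount, matchers)`, because only the first pair shares a name substring. A hybrid matcher would instead fold both criteria into a single fixed function.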

15. Sample Approaches
- SEMINT
- LSD
- SKAT
- TranScm
- DIKE
- ARTEMIS
- CUPID

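17. [Table: comparison of SEMINT, LSD, TranScm, Cupid, and the BYU approach. Rows are listed below; where a row has fewer than five entries, entries appear in their original order because their column positions are not recoverable.]

- Schema type: SEMINT: relational, files | LSD: XML | TranScm: SGML, OO | Cupid: XML, relational | BYU approach: OSM
- Metadata representation: SEMINT: attribute-based | LSD: XML | TranScm: labeled graph | Cupid: extended ER | BYU approach: OSM
- Match granularity: 1:1 1:1 and 1:n 1:1 and n:m
- Schema-level match
  - Name-based: ****
  - Constraint-based: * ***
  - Structure matching: ****
- Instance-level match
  - Text-oriented: * *
  - Constraint-oriented: ** *
- Reuse/auxiliary information used: ** **
- Combination of matches: Hybrid, Composite, Hybrid, Composite
- Manual work/user input: *****
- Application area: Data integration, Data integration, Data translation, Generic
- Remarks: Neural network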

18. Conclusion
- Propose a taxonomy that covers many of the existing approaches
- Suggest quantitative work on the relative performance and accuracy of different approaches

