Generic Schema Matching with Cupid. Jayant Madhavan, Philip A. Bernstein, Erhard Rahm. Proceedings of the 27th VLDB Conference.


1 Generic Schema Matching with Cupid. Jayant Madhavan, Philip A. Bernstein, Erhard Rahm. Proceedings of the 27th VLDB Conference

2 Schema Matching

3 Schema Matching (Cont.) Definition: finding a mapping between those elements of two schemas that semantically correspond to each other. Applications: schema integration; data translation; XML message mapping; data warehouse loading. Goal

4 Taxonomy: Schema vs. instance based; Element vs. structure granularity; Linguistic based; Constraint based; Matching cardinality; Auxiliary information; Individual vs. combinational

5 Cupid: Schema-based; Automated linguistic-based matching; Both element-based and structure-based; Biased toward similarity of atomic elements; Exploits internal structure; Exploits keys, referential constraints and views; Makes context-dependent matches of a shared type; 1:n mapping

6 Similarity Coefficient Computation. First phase, linguistic matching: names, data types, domains; yields the linguistic similarity coefficient, lsim. Second phase, structural matching: contexts, linguistic similarity coefficients; yields the structural similarity coefficient, ssim. Hybrid: wsim = w_struct * ssim + (1 - w_struct) * lsim
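
The hybrid formula on this slide can be sketched in a few lines of Python. This is only an illustration of the weighted sum as stated; the function name, the default weight of 0.5, and the range check are assumptions, not values taken from the slides.

```python
def weighted_similarity(lsim: float, ssim: float, w_struct: float = 0.5) -> float:
    """Combine linguistic and structural similarity as on the slide:
    wsim = w_struct * ssim + (1 - w_struct) * lsim.

    The default w_struct of 0.5 is an illustrative assumption.
    """
    if not 0.0 <= w_struct <= 1.0:
        raise ValueError("w_struct must lie in [0, 1]")
    return w_struct * ssim + (1.0 - w_struct) * lsim


# With w_struct = 0 only the linguistic score counts; with 1 only the structural one.
print(weighted_similarity(lsim=0.8, ssim=0.4, w_struct=0.0))
print(weighted_similarity(lsim=0.8, ssim=0.4, w_struct=1.0))
```

A larger w_struct makes the match depend more on context than on names, which is the trade-off the hybrid coefficient exposes.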

7 Linguistic Matching. Normalization: tokenization, expansion, elimination. Categorization: data types, schema hierarchy, linguistic contents. Comparison, producing the linguistic similarity coefficient (lsim): thesaurus, sub-string matching
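
The normalization and comparison steps listed on this slide might look roughly like the following Python sketch. Everything concrete here is an assumption made for illustration: the abbreviation table, the stop-word list, the thesaurus scores, the containment-based sub-string score, and the averaging of best token matches are not given in the slides.

```python
import re

# Hypothetical lookup tables; a real matcher would load these from a thesaurus.
ABBREVIATIONS = {"po": ["purchase", "order"], "qty": ["quantity"], "num": ["number"]}
STOP_WORDS = {"of", "the", "a", "an", "to"}
THESAURUS = {frozenset(("invoice", "bill")): 0.9}  # assumed synonym strength


def tokenize(name):
    """Split on punctuation, digits, and case changes, then lowercase."""
    tokens = []
    for part in re.split(r"[^A-Za-z0-9]+", name):
        tokens.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|[0-9]+", part))
    return [t.lower() for t in tokens]


def normalize(name):
    """Tokenization, then expansion of abbreviations, then elimination of stop words."""
    expanded = []
    for token in tokenize(name):
        expanded.extend(ABBREVIATIONS.get(token, [token]))
    return [t for t in expanded if t not in STOP_WORDS]


def token_similarity(a, b):
    """Exact match, then thesaurus lookup, then a sub-string containment score."""
    if a == b:
        return 1.0
    score = THESAURUS.get(frozenset((a, b)))
    if score is not None:
        return score
    shorter, longer = sorted((a, b), key=len)
    return len(shorter) / len(longer) if shorter in longer else 0.0


def name_similarity(n1, n2):
    """Average of each token's best match in the other name (an assumed scheme)."""
    t1, t2 = normalize(n1), normalize(n2)
    if not t1 or not t2:
        return 0.0
    best1 = sum(max(token_similarity(x, y) for y in t2) for x in t1)
    best2 = sum(max(token_similarity(x, y) for x in t1) for y in t2)
    return (best1 + best2) / (len(t1) + len(t2))


print(normalize("POLines"))
print(name_similarity("InvoiceTo", "BillTo"))
```

Normalizing before comparing lets names such as POLines and PurchaseOrderLines reduce to the same token list, so the comparison step only has to handle genuine vocabulary differences.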

8 Structural Matching: bottom-up; mutually recursive
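
One way to read "bottom-up" and "mutually recursive" is sketched below in Python. It is a simplified illustration, not the authors' algorithm as presented: the Node class, the initial leaf score of 0.5 for equal data types, the "fraction of strongly linked leaves" rule, and all threshold and adjustment parameters (th_accept, th_high, th_low, c_inc, c_dec) are assumptions that the slide itself does not state.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # eq=False keeps identity hashing, so nodes can be dict keys
class Node:
    name: str
    dtype: str = ""
    children: list = field(default_factory=list)


def post_order(node):
    """Visit children before their parent (bottom-up order)."""
    for child in node.children:
        yield from post_order(child)
    yield node


def leaves(node):
    return [n for n in post_order(node) if not n.children]


def tree_match(s_root, t_root, lsim, w_struct=0.5, th_accept=0.5,
               th_high=0.6, th_low=0.35, c_inc=1.2, c_dec=0.9):
    """Return wsim for every compared node pair (all parameters are assumed)."""
    ssim = {}
    # Leaves start from data-type compatibility (assumed: 0.5 if equal, else 0).
    for s in leaves(s_root):
        for t in leaves(t_root):
            ssim[s, t] = 0.5 if s.dtype == t.dtype else 0.0

    def wsim(s, t):
        return w_struct * ssim[s, t] + (1 - w_struct) * lsim(s.name, t.name)

    for s in post_order(s_root):
        for t in post_order(t_root):
            ls, lt = leaves(s), leaves(t)
            if s.children or t.children:
                # Inner nodes: fraction of leaves with a strong link to the other side.
                strong_s = sum(any(wsim(x, y) > th_accept for y in lt) for x in ls)
                strong_t = sum(any(wsim(x, y) > th_accept for x in ls) for y in lt)
                ssim[s, t] = (strong_s + strong_t) / (len(ls) + len(lt))
            # Mutual recursion: the pair's score feeds back into its leaves.
            w = wsim(s, t)
            factor = c_inc if w > th_high else c_dec if w < th_low else 1.0
            for x in ls:
                for y in lt:
                    ssim[x, y] = min(1.0, ssim[x, y] * factor)
    return {pair: wsim(*pair) for pair in ssim}
```

In this sketch, leaf similarity determines the similarity of enclosing elements, and the similarity of enclosing elements in turn raises or lowers the similarity of the leaves beneath them, which is the sense in which the computation is mutually recursive.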

9 Example

10 Example (Cont.)

11

12 Schema Graphs: Elements; Relationships (containment, aggregation, and IsDerivedFrom); Matching Shared Types (context-dependent mappings); Matching Referential Constraints; General Schemas

13 Matching Shared Types

14 Matching Referential Constraints

15 Other Features: Optionality; Views; Initial Mappings; Lazy Expansion; Pruning Leaves

16 Comparative Study. Algorithms: MOMIS, DIKE, Cupid. Canonical Examples. Real World Example

17 Canonical Examples: Identical schemas; Atomic elements with same names, but different data types; Atomic elements with same data types, but different names (a prefix or suffix is added); Different class names, but atomic elements with same names and data types; Different nesting of the data (similar schemas with nested and flat structures); Type substitution or context-dependent mapping

18 Real World Example

19 Experimental Conclusions: Linguistic matching; Thesaurus; Linguistic similarity with no structure similarity; Granularity of similarity computation; Leaves; Structure information beyond the immediate vicinity; Context-dependent mappings; Performance parameters

20 Future Work. A truly robust solution: machine learning applied to instances; natural language technology; pattern matching to reuse known matches. Immediate challenges: off-the-shelf thesaurus; schema annotations; automatic tuning of the control parameters; scalability analysis and testing; more comparative analysis of algorithms

