Reconciling Schemas of Disparate Data Sources: A Machine-Learning Approach
AnHai Doan, Pedro Domingos, Alon Halevy


1 Reconciling Schemas of Disparate Data Sources: A Machine-Learning Approach
AnHai Doan, Pedro Domingos, Alon Halevy

2 Data Integration

3 Problem & Solution
Problem: Large-scale Data Integration Systems; Bottleneck: Semantic Mappings; 1-1 Mappings
Solution: Multi-strategy Learning; Integrity Constraints; XML Structure Learner

4 Learning Source Descriptions (LSD)
Components: Base learners; Meta-learner; Prediction converter; Constraint handler
Operations: Training phase; Matching phase

5 Learners
Base Learners: Name Matcher (Whirl); Content Matcher (Whirl); Naïve Bayes Learner; County-Name Recognizer; XML Learner
Meta-Learner (Stacking)
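
A minimal sketch of how a stacking meta-learner might combine the predictions of several base learners. It assumes each base learner returns a confidence score per mediated-schema label and that the meta-learner holds one weight per (learner, label) pair. The learner names, labels, scores, and weights are hypothetical illustration values, not taken from the slides; in practice the weights would be learned from training data.

```python
from typing import Dict

Scores = Dict[str, float]  # label -> confidence score


def combine(predictions: Dict[str, Scores],
            weights: Dict[str, Dict[str, float]]) -> Scores:
    """Weighted sum of base-learner scores, computed per label."""
    labels = {label for scores in predictions.values() for label in scores}
    combined: Scores = {}
    for label in labels:
        combined[label] = sum(
            weights.get(learner, {}).get(label, 0.0) * scores.get(label, 0.0)
            for learner, scores in predictions.items()
        )
    return combined


def best_label(scores: Scores) -> str:
    """Return the label with the highest combined score."""
    return max(scores, key=scores.get)


# Hypothetical predictions of two base learners for one source element.
example_predictions = {
    "name_matcher": {"ADDRESS": 0.6, "DESCRIPTION": 0.4},
    "naive_bayes": {"ADDRESS": 0.3, "DESCRIPTION": 0.7},
}
# Hypothetical per-(learner, label) weights held by the meta-learner.
example_weights = {
    "name_matcher": {"ADDRESS": 0.8, "DESCRIPTION": 0.2},
    "naive_bayes": {"ADDRESS": 0.2, "DESCRIPTION": 0.8},
}

if __name__ == "__main__":
    merged = combine(example_predictions, example_weights)
    print(merged, best_label(merged))
```

Keeping a separate weight for each label lets the meta-learner trust a learner on the labels it handles well and discount it elsewhere.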

6 XML Learner

7 XML Learner (Cont.)

8 Constraint Handler Domain Constraints

9 Constraint Handler (Cont.) Search Heuristic Mapping Cost

10 Training Phase

11 Example1 (Training Phase)

12 Example1 (Cont.)

13 (“location”, ADDRESS) (“Miami, FL”, ADDRESS)
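
The two pairs above can be read as one training example for a name-based learner and one for a content-based learner. A small sketch under that assumption: given a user-supplied mapping from source tags to mediated-schema labels and some extracted data instances, it builds both kinds of examples. The function name and data layout are hypothetical.

```python
from typing import Dict, List, Tuple

Example = Tuple[str, str]  # (input text, mediated-schema label)


def make_training_examples(
    mapping: Dict[str, str],
    instances: List[Tuple[str, str]],
) -> Tuple[List[Example], List[Example]]:
    """Build (tag name, label) and (data value, label) training pairs.

    mapping:   source tag -> mediated-schema label, supplied by the user
    instances: (source tag, data value) pairs extracted from the source
    """
    # One example per mapped tag name, for a name-based learner.
    name_examples = [(tag, label) for tag, label in mapping.items()]
    # One example per data value whose tag is mapped, for a content-based learner.
    content_examples = [
        (value, mapping[tag]) for tag, value in instances if tag in mapping
    ]
    return name_examples, content_examples


if __name__ == "__main__":
    names, contents = make_training_examples(
        {"location": "ADDRESS"},
        [("location", "Miami, FL")],
    )
    print(names)
    print(contents)
```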

14 Matching Phase

15 Example2 (Matching Phase)

16 Example2 (Cont.)

17

18 Empirical Evaluation

19 Measures
Matching accuracy of a source; Average matching accuracy of a source; Average matching accuracy of a domain
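
A hedged sketch of how these measures could be computed. It assumes that the matching accuracy of a source is the fraction of its matchable tags that receive the correct mediated-schema label, that the per-source average is taken over repeated runs, and that the domain average is taken over the sources in the domain. The slides do not spell out these definitions, and the function names are hypothetical.

```python
from typing import Dict, Iterable


def matching_accuracy(predicted: Dict[str, str], gold: Dict[str, str]) -> float:
    """Fraction of matchable source tags (keys of gold) labeled correctly."""
    if not gold:
        raise ValueError("gold mapping must contain at least one matchable tag")
    correct = sum(1 for tag, label in gold.items() if predicted.get(tag) == label)
    return correct / len(gold)


def average(values: Iterable[float]) -> float:
    """Arithmetic mean, used both across runs of a source and across sources."""
    values = list(values)
    if not values:
        raise ValueError("cannot average an empty collection")
    return sum(values) / len(values)


if __name__ == "__main__":
    # Hypothetical gold mapping and prediction for one source.
    gold = {"location": "ADDRESS", "price": "PRICE",
            "comments": "DESCRIPTION", "phone": "AGENT-PHONE"}
    predicted = {"location": "ADDRESS", "price": "PRICE",
                 "comments": "DESCRIPTION", "phone": "OFFICE-PHONE"}
    print(matching_accuracy(predicted, gold))
```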

20 Experiment Result

21 Experiment Result (Cont.) Contributions of base learners and the constraint handler

22 Experiment Result (Cont.) Contributions of Schema information and Data Instances

23 Experiment Result (Cont.) Performance sensitivity to the amount of data instances

24 Limitations
Enough Training Data; Domain Dependent Learners; Ambiguities in Sources; Efficiency; Overlapping of Schemas

25 Conclusion and Future Work
Improve over time; Extensible framework; Multiple types of knowledge; Non 1-1 mapping?

