Multifaceted Exploitation of Metadata for Attribute Match Discovery in Information Integration. David W. Embley, David Jackman, Li Xu.

1 Multifaceted Exploitation of Metadata for Attribute Match Discovery in Information Integration. David W. Embley, David Jackman, Li Xu

2 Background. Problem: attribute matching. Matching possibilities (facets): attribute names, data-value characteristics, expected data values, data-dictionary information, and structural properties.

3 Approach. Target schema T; source schema S. Framework: individual-facet matching; combining facets; best-first match iteration.
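The slide names "best-first match iteration" without spelling it out. One plausible reading, sketched here under that assumption, is a greedy one-to-one pass: take the highest-confidence (target, source) pair, accept it if it clears the threshold, and skip any later pair that reuses an already matched attribute. The function name best_first_matches, the >= comparison, and all confidence values are hypothetical.

```python
def best_first_matches(confidence, threshold=0.5):
    """Greedy one-to-one matching: accept the most confident pair first.

    confidence maps (target_attribute, source_attribute) -> combined confidence.
    Assumption: a pair is accepted when its confidence is >= threshold and
    neither attribute has already been matched.
    """
    matches = []
    used_targets, used_sources = set(), set()
    # Visit candidate pairs from most to least confident.
    for (t, s), c in sorted(confidence.items(), key=lambda kv: kv[1], reverse=True):
        if c < threshold:
            break  # every remaining pair is below the threshold as well
        if t in used_targets or s in used_sources:
            continue  # one side is already matched; skip this pair
        matches.append((t, s, c))
        used_targets.add(t)
        used_sources.add(s)
    return matches


# Invented confidence values, for illustration only.
scores = {
    ("Year", "Year"): 0.95,
    ("Miles", "Mileage"): 0.80,
    ("Miles", "Year"): 0.60,
    ("Feature", "Style"): 0.55,
    ("Phone", "Cost"): 0.20,
}
print(best_first_matches(scores))
```

Here ("Miles", "Year") is skipped because both of its attributes were already claimed by stronger matches, and ("Phone", "Cost") falls below the threshold.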

4 Example. (Figure: source schema S and target schema T for a car domain, drawn as conceptual models with participation constraints such as 0:1 and 0:*. Object sets in the figure include Car, Year, Make, Model, Cost, Style, Feature, Mileage, Miles, and Phone; the correspondences highlighted between the schemas include Year, Make, Model, Car, and Mileage/Miles.)

5 Individual-Facet Matching: attribute names; data-value characteristics; expected data values.

6 Attribute Names. Target attribute A in schema T; source attribute B in schema S. Tools: WordNet and a C4.5 decision tree. Features chosen by feature selection: f0: A and B are the same word; f1: A and B are synonyms; f2: sum of the distances of A and B to a common hypernym root; f3: number of different common hypernym roots of A and B; f4: sum of the number of senses of A and B.
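A minimal sketch of how the five name features could be computed. To stay self-contained it replaces WordNet with a tiny hand-built lexicon (SENSES and HYPERNYM are hypothetical, not real WordNet data) in which every sense has a single hypernym; real WordNet allows several. For f2 it follows the wording on the next slide and measures distance to the lowest common hypernym, taking the minimum over sense pairs; that choice is an assumption.

```python
from itertools import product

# Hypothetical toy lexicon standing in for WordNet (not real WordNet data).
SENSES = {  # word -> sense ids
    "cost": ["cost.n.price"],
    "price": ["cost.n.price", "price.n.reward"],
    "make": ["make.n.brand", "make.v.create"],
    "year": ["year.n.period"],
    "mileage": ["mileage.n.distance"],
    "miles": ["mile.n.unit"],
}
HYPERNYM = {  # sense -> parent sense; senses without an entry are roots
    "cost.n.price": "amount.n", "amount.n": "entity.n",
    "price.n.reward": "payment.n", "payment.n": "entity.n",
    "make.n.brand": "name.n", "name.n": "entity.n",
    "make.v.create": "act.v",
    "year.n.period": "time.n", "time.n": "entity.n",
    "mileage.n.distance": "distance.n", "mile.n.unit": "distance.n",
    "distance.n": "entity.n",
}


def hypernym_path(sense):
    """Return [sense, parent, grandparent, ..., root]."""
    path = [sense]
    while path[-1] in HYPERNYM:
        path.append(HYPERNYM[path[-1]])
    return path


def name_features(a, b):
    """Compute (f0, f1, f2, f3, f4) for target attribute a and source attribute b."""
    a, b = a.lower(), b.lower()
    sa, sb = SENSES.get(a, []), SENSES.get(b, [])
    f0 = int(a == b)                   # same word
    f1 = int(bool(set(sa) & set(sb)))  # synonyms: share at least one sense
    f2 = None                          # stays None if no common hypernym exists
    for x, y in product(sa, sb):
        px, py = hypernym_path(x), hypernym_path(y)
        for i, h in enumerate(px):
            if h in py:  # first node on x's path that also lies on y's path
                d = i + py.index(h)
                f2 = d if f2 is None else min(f2, d)
                break
    roots_a = {hypernym_path(s)[-1] for s in sa}
    roots_b = {hypernym_path(s)[-1] for s in sb}
    f3 = len(roots_a & roots_b)        # different common hypernym roots
    f4 = len(sa) + len(sb)             # total number of senses
    return f0, f1, f2, f3, f4


print(name_features("Cost", "Price"))     # (0, 1, 0, 1, 3)
print(name_features("Mileage", "Miles"))  # (0, 0, 2, 1, 2)
```

With real data, these lookups would go through a WordNet interface rather than the hard-coded dictionaries.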

7 WordNet Rule. The rule is expressed in terms of the number of different common hypernym roots of A and B (f3), the sum of the distances of A and B to a common hypernym (f2), and the sum of the number of senses of A and B (f4).
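The rule above comes out of a C4.5 decision tree, and C4.5 chooses which feature to split on by gain ratio (information gain divided by split information). The sketch below computes gain ratio for a discrete feature only; C4.5 also handles continuous features such as f2 and f4 by testing thresholds, which this sketch omits. The labels and feature vectors are invented.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gain_ratio(feature_values, labels):
    """C4.5-style gain ratio of splitting labels on a discrete feature."""
    n = len(labels)
    groups = {}
    for v, y in zip(feature_values, labels):
        groups.setdefault(v, []).append(y)
    # Expected entropy after the split, weighted by group size.
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder
    split_info = entropy(feature_values)  # entropy of the split itself
    return gain / split_info if split_info > 0 else 0.0


labels = ["match", "match", "none", "none"]
print(gain_ratio([1, 1, 0, 0], labels))  # 1.0: the feature separates the classes
print(gain_ratio([1, 0, 1, 0], labels))  # 0.0: the feature carries no information
```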

8 Confidence Measures

9 Data-Value Characteristics. C4.5 decision-tree features: for numeric data, mean, variation, standard deviation, …; for alphanumeric data, string length, numeric ratio, and space ratio.
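A minimal sketch of extracting these column-level features before they are handed to the decision tree. The exact definitions are assumptions: "variation" is read as population variance, string length as average length per value, and the numeric and space ratios as the fraction of all characters that are digits or spaces.

```python
import statistics


def numeric_features(values):
    """Mean, population variance, and population standard deviation."""
    return {
        "mean": statistics.mean(values),
        "variance": statistics.pvariance(values),
        "std_dev": statistics.pstdev(values),
    }


def alphanumeric_features(values):
    """Average string length plus digit and space ratios over all characters."""
    text = "".join(values)
    total = len(text)
    if total == 0:
        return {"avg_length": 0.0, "numeric_ratio": 0.0, "space_ratio": 0.0}
    return {
        "avg_length": total / len(values),
        "numeric_ratio": sum(ch.isdigit() for ch in text) / total,
        "space_ratio": sum(ch == " " for ch in text) / total,
    }


print(numeric_features([1998, 2000, 2002]))
print(alphanumeric_features(["555-1234", "555 9876"]))
```

For the two phone-like strings, 14 of the 16 characters are digits and 1 is a space, which is the kind of profile that separates a phone column from, say, a model-name column.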

10 Confidence Measures

11 Expected Data Values. Given target schema T and source schema S: a regular-expression recognizer for attribute A in T, and data instances for attribute B in S. Hit ratio for a candidate (A, B) match = N'/N, where N' is the number of B data instances recognized by the regular expressions of A, and N is the number of B data instances.
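The hit ratio is simple to compute once a recognizer exists. In this sketch a value counts as recognized when any of A's regular expressions matches the whole trimmed value; that rule, and the two Year patterns, are hypothetical stand-ins for the real recognizers.

```python
import re


def hit_ratio(patterns, values):
    """N'/N: fraction of source values recognized by any of the target's regexes."""
    if not values:
        return 0.0
    compiled = [re.compile(p) for p in patterns]
    # Assumption: "recognized" means a pattern matches the entire trimmed value.
    hits = sum(1 for v in values if any(rx.fullmatch(v.strip()) for rx in compiled))
    return hits / len(values)


# Hypothetical recognizer for a target attribute Year: four-digit or 'YY years.
year_patterns = [r"(19|20)\d{2}", r"'\d{2}"]
print(hit_ratio(year_patterns, ["1998", "2003", "'97", "red"]))  # 0.75
```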

12 Confidence Measures

13 Combined Measures. Threshold: 0.5. (Figure: matrix of combined confidence values after thresholding, with 1 marking a match and 0 a non-match; the individual cells are not recoverable.)
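The slides do not say how the facet confidences are combined. A minimal sketch, assuming an unweighted mean of the per-facet confidences and the 0.5 threshold from the slide (with ties counted as matches), shows how a 0/1 decision per attribute pair arises. The function name and all scores are hypothetical.

```python
from statistics import mean


def combine_facets(facet_scores, threshold=0.5):
    """Map each (target, source) pair to (combined confidence, 0/1 decision).

    Assumption: facets are combined by an unweighted mean, and a combined
    confidence equal to the threshold counts as a match.
    """
    combined = {}
    for pair, scores in facet_scores.items():
        c = mean(scores)
        combined[pair] = (c, 1 if c >= threshold else 0)
    return combined


# Invented per-facet confidences (attribute names, data values, expected values).
example = {
    ("Year", "Year"): [0.9, 0.8, 1.0],
    ("Phone", "Cost"): [0.1, 0.3, 0.0],
}
for pair, (c, decision) in combine_facets(example).items():
    print(pair, round(c, 3), decision)
```

A weighted or learned combination would slot in where mean is called.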

14 Final Confidence Measures

15 Experimental Results. Matched attributes: 100% (32 of 32). Unmatched attributes: 99.5% (374 of 376); the two errors were “Feature”–“Color” and “Feature”–“Body Type”. Individual facets F1, F2, F3: 93.75%, 84%, 92% on matched attributes; 98.9%, 97.9%, 98.4% on unmatched attributes.
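The headline rates follow directly from the counts on the slide; this quick check only reproduces that arithmetic (percent is a hypothetical helper name).

```python
def percent(correct, total):
    """Percentage of correct decisions, rounded to one decimal place."""
    return round(100 * correct / total, 1)


print(percent(32, 32))    # 100.0
print(percent(374, 376))  # 99.5
```

Since 376 - 374 = 2, the 99.5% figure corresponds to exactly the two erroneous Feature pairs listed.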

16 Conclusions. Direct attribute matching: feasible. Individual-facet matching: good. Multifaceted matching: better.

17 Future Work: additional facets; more sophisticated combinations; additional application domains; automating feature selection; indirect attribute matching. www.deg.byu.edu

