Multifaceted Exploitation of Metadata for Attribute Match Discovery in Information Integration
David W. Embley, David Jackman, Li Xu

Background
Problem: attribute matching
Matching possibilities (facets): attribute names, data-value characteristics, expected data values, data-dictionary information, structural properties

Approach
Given a target schema T and a source schema S, the framework proceeds through individual-facet matching, combining facets, and best-first match iteration.

Example
[Figure: source schema S and target schema T for car advertisements, drawn as conceptual-model diagrams with participation constraints such as 0:1 and 0:*. Labels in the diagram include Car, Year, Make, Model, Cost, Style, Feature, Mileage, Miles, and Phone.]

Individual Facet Matching
Facets: attribute names, data-value characteristics, expected data values

Attribute Names
Target and source attributes: A in T, B in S
Using WordNet, a C4.5 decision tree is trained on features chosen by feature selection:
f0: A and B are the same word
f1: A and B are synonyms
f2: sum of the distances of A and B to a common hypernym root
f3: number of different common hypernym roots of A and B
f4: sum of the number of senses of A and B
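To make the five name features concrete, here is a minimal sketch that computes f0 to f4 over a tiny hand-built lexicon. The lexicon (`HYPERNYM`, `SENSES`) and the function names are hypothetical stand-ins for WordNet, not its real data. Two interpretations are assumptions: synonyms are words that share a sense, and f2 uses the closest common hypernym, with -1 when none exists.

```python
# Hypothetical toy lexicon standing in for WordNet (invented data).
HYPERNYM = {  # sense -> direct hypernym (None marks a hypernym root)
    "mileage.n.01": "distance.n.01",
    "distance.n.01": "abstraction.n.01",
    "mile.n.01": "unit.n.01",
    "unit.n.01": "abstraction.n.01",
    "cost.n.01": "value.n.01",
    "value.n.01": "abstraction.n.01",
    "phone.n.01": "device.n.01",
    "device.n.01": "entity.n.01",
    "abstraction.n.01": None,
    "entity.n.01": None,
}
SENSES = {  # word -> its senses
    "mileage": ["mileage.n.01"],
    "miles": ["mileage.n.01", "mile.n.01"],
    "cost": ["cost.n.01"],
    "phone": ["phone.n.01"],
}


def ancestors(sense):
    """Return {hypernym: distance} for `sense` and every hypernym above it."""
    dist, d = {}, 0
    while sense is not None:
        dist[sense] = d
        sense = HYPERNYM[sense]
        d += 1
    return dist


def name_features(a, b):
    """Compute (f0, f1, f2, f3, f4) for target attribute a and source attribute b."""
    sa, sb = SENSES.get(a, []), SENSES.get(b, [])
    f0 = int(a == b)                   # same word
    f1 = int(bool(set(sa) & set(sb)))  # synonyms: share at least one sense (assumption)
    best, roots = None, set()
    for x in sa:
        ax = ancestors(x)
        for y in sb:
            ay = ancestors(y)
            for c in ax.keys() & ay.keys():  # common hypernyms
                total = ax[c] + ay[c]
                best = total if best is None else min(best, total)
                if HYPERNYM[c] is None:
                    roots.add(c)             # common hypernym root
    f2 = best if best is not None else -1    # -1: no common hypernym (assumption)
    f3 = len(roots)
    f4 = len(sa) + len(sb)                   # total number of senses
    return f0, f1, f2, f3, f4


print(name_features("mileage", "miles"))  # (0, 1, 0, 1, 3)
print(name_features("cost", "phone"))     # (0, 0, -1, 0, 2)
```

With the real WordNet, the same quantities would come from its synset and hypernym relations; the toy graph only keeps the example self-contained.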

WordNet Rule
The learned rule tests three of the features: the number of different common hypernym roots of A and B (f3), the sum of the distances of A and B to a common hypernym (f2), and the sum of the number of senses of A and B (f4).
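C4.5 picks the feature tested at each node by gain ratio (information gain divided by split information). The sketch below computes that criterion for a discrete test. The six labeled training pairs and their f2/f3 values are hypothetical, chosen only to show how a rule built on such features could arise; they are not the authors' training data.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(values, labels):
    """C4.5's split criterion for a discrete test: information gain / split info."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder
    split_info = -sum(len(g) / n * math.log2(len(g) / n) for g in groups.values())
    return gain / split_info if split_info > 0 else 0.0


# Hypothetical training pairs: 1 = (A, B) is a match, 0 = not a match.
labels = [1, 1, 1, 0, 0, 0]
f3 = [1, 1, 1, 0, 0, 1]  # number of common hypernym roots
f2 = [0, 1, 2, 4, 6, 5]  # summed distance to a common hypernym
print(gain_ratio(f3, labels))                    # about 0.5
print(gain_ratio([x <= 2 for x in f2], labels))  # 1.0: f2 <= 2 separates perfectly
```

For numeric features such as f2 and f4, C4.5 evaluates binary threshold tests like `x <= 2`, as in the last line; the threshold here is illustrative.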

Confidence Measures: Attribute Names

Data-Value Characteristics
A C4.5 decision tree is trained on features computed from the data values:
Numeric data: mean, variation, standard deviation, …
Alphanumeric data: string length, numeric ratio, space ratio
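A minimal sketch of computing such column statistics follows, using only the standard library. The exact definitions are assumptions: "variation" is read as sample variance, string length is averaged over values, and the numeric and space ratios are the fractions of digit and space characters over all characters in the column. The slide's list is partial, so this covers only the named features.

```python
import statistics


def numeric_features(values):
    """Mean, sample variance, and standard deviation of a numeric column.
    Reading 'variation' as variance is an assumption."""
    return {
        "mean": statistics.mean(values),
        "variance": statistics.variance(values),
        "stdev": statistics.stdev(values),
    }


def alphanumeric_features(values):
    """Average string length plus the fractions of digit and space characters
    over all characters in the column (ratio definitions are assumptions)."""
    total = sum(len(v) for v in values)
    return {
        "avg_length": total / len(values),
        "numeric_ratio": sum(ch.isdigit() for v in values for ch in v) / total,
        "space_ratio": sum(ch == " " for v in values for ch in v) / total,
    }


print(numeric_features([1998, 2000, 2002]))
print(alphanumeric_features(["555 1234", "555 9876"]))
# {'avg_length': 8.0, 'numeric_ratio': 0.875, 'space_ratio': 0.125}
```

Feature vectors like these, computed for a target attribute's sample data and a source attribute's data, are what a decision tree could then compare.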

Confidence Measures: Data-Value Characteristics

Expected Data Values
Given target schema T and source schema S, each attribute A in T has a regular-expression recognizer, and each attribute B in S has data instances.
Hit ratio for the (A, B) match = N′/N, where
N′: number of B data instances recognized by the regular expressions of A
N: number of B data instances
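The hit ratio N′/N can be sketched directly with Python's `re` module. The Year recognizer patterns are hypothetical, and treating "recognized" as a full match of at least one pattern (after trimming whitespace) is an assumption; a real recognizer could be richer.

```python
import re

# Hypothetical recognizer for a target attribute "Year": four-digit years
# and abbreviated forms such as '02.
YEAR_RECOGNIZERS = [r"(19|20)\d{2}", r"'\d{2}"]


def hit_ratio(recognizers, instances):
    """N'/N: fraction of source data instances that at least one of the
    target attribute's regular expressions fully matches (assumption)."""
    patterns = [re.compile(r) for r in recognizers]
    n = len(instances)
    n_hit = sum(1 for v in instances if any(p.fullmatch(v.strip()) for p in patterns))
    return n_hit / n if n else 0.0


print(hit_ratio(YEAR_RECOGNIZERS, ["1998", "'02", "2001", "red"]))  # 0.75
```

Here three of the four source values are recognized as years, so N′ = 3, N = 4, and the hit ratio is 0.75.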

Confidence Measures: Expected Data Values

Combined Measures
Threshold: 0.5
[Figure: 0/1 matrix of target attributes against source attributes, marking which pairs pass the 0.5 threshold on the combined confidence measures.]
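The slides give the threshold (0.5) but not the combination function. As a hypothetical illustration only, the sketch below averages the per-facet confidence matrices element by element and marks pairs whose average reaches the threshold, yielding a 0/1 matrix of the kind shown on the slide. The averaging, the `>=` comparison, and the confidence values are all assumptions.

```python
def combine(facet_matrices):
    """Average per-facet confidence matrices element-wise (assumed combination)."""
    k = len(facet_matrices)
    rows, cols = len(facet_matrices[0]), len(facet_matrices[0][0])
    return [[sum(m[i][j] for m in facet_matrices) / k for j in range(cols)]
            for i in range(rows)]


def threshold(matrix, t=0.5):
    """1 where the combined confidence reaches t, else 0."""
    return [[1 if x >= t else 0 for x in row] for row in matrix]


# Hypothetical confidences: rows = target attributes, columns = source attributes.
names = [[0.9, 0.1], [0.2, 0.8]]     # attribute-name facet
values = [[0.7, 0.3], [0.4, 0.6]]    # data-value-characteristics facet
expected = [[0.8, 0.2], [0.3, 0.4]]  # expected-data-values facet
print(threshold(combine([names, values, expected])))  # [[1, 0], [0, 1]]
```

Averaging lets a strong facet compensate for a weak one; a weighted combination would be an equally plausible design.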

Final Confidence Measures
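The framework's last step, best-first match iteration, is named but not spelled out on the slides. One plausible reading, sketched here as a hypothetical: repeatedly accept the highest remaining confidence at or above the threshold, then exclude that target and source attribute from further direct matches. The function name, attribute lists, and confidence values are illustrative assumptions.

```python
def best_first_matches(conf, targets, sources, t=0.5):
    """Greedy best-first selection of one-to-one direct matches
    (a hypothetical reading of 'best-first match iteration')."""
    candidates = sorted(
        ((conf[i][j], i, j) for i in range(len(targets)) for j in range(len(sources))),
        reverse=True,
    )
    used_t, used_s, matches = set(), set(), []
    for score, i, j in candidates:
        if score < t:
            break  # remaining candidates are all below the threshold
        if i in used_t or j in used_s:
            continue  # attribute already matched by a better pair
        used_t.add(i)
        used_s.add(j)
        matches.append((targets[i], sources[j], score))
    return matches


targets = ["Year", "Make", "Mileage"]
sources = ["Year", "Make", "Miles", "Color"]
conf = [
    [0.95, 0.10, 0.20, 0.05],
    [0.15, 0.90, 0.10, 0.30],
    [0.60, 0.05, 0.85, 0.10],  # Mileage also scores 0.60 against Year
]
matches = best_first_matches(conf, targets, sources)
print(matches)
# [('Year', 'Year', 0.95), ('Make', 'Make', 0.9), ('Mileage', 'Miles', 0.85)]
```

The Mileage/Year pair passes the threshold but is skipped because Year was already taken by a higher-confidence match.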

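Experimental Results
Multifaceted matching:
Matched attributes: 100% (32 of 32)
Unmatched attributes: 99.5% (374 of 376)
The two errors were false matches: "Feature" with "Color" and "Feature" with "Body Type".

Individual facets:
                      F1       F2      F3
Matched attributes    93.75%   84%     92%
Unmatched attributes  98.9%    97.9%   98.4%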
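The multifaceted percentages follow directly from the reported counts; this short check recomputes them and the number of errors among the unmatched attributes. The helper name `pct` is illustrative.

```python
def pct(correct, total):
    """Percentage correct, rounded to one decimal place."""
    return round(100 * correct / total, 1)


print(pct(32, 32))    # 100.0  (matched attributes)
print(pct(374, 376))  # 99.5   (unmatched attributes)
print(376 - 374)      # 2      (the two false matches involving "Feature")
```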

Conclusions
Direct attribute matching: feasible
Individual-facet matching: good
Multifaceted matching: better

Future Work
Additional facets
More sophisticated combinations
Additional application domains
Automating feature selection
Indirect attribute matching
www.deg.byu.edu
