Presentation is loading. Please wait.

Presentation is loading. Please wait.

Direct and Indirect Matching of Schema Elements for Data Integration on the Web Li Xu Data Extraction Group Brigham Young University Sponsored by NSF.

Similar presentations


Presentation on theme: "Direct and Indirect Matching of Schema Elements for Data Integration on the Web Li Xu Data Extraction Group Brigham Young University Sponsored by NSF."— Presentation transcript:

1 Direct and Indirect Matching of Schema Elements for Data Integration on the Web Li Xu Data Extraction Group Brigham Young University Sponsored by NSF

2 Car Schema Matching Source Car Year Cost Style Year Feature Cost Car Phone Target Car Miles Mileage Model Make & Model Color Body Type

3 Mapping Direct Matches Indirect Matches Union Selection Composition Decomposition

4 Union and Selection Car Source Car Year Cost Style Year Feature Cost Car Phone Target Car Miles Mileage Model Make & Model Color Body Type

5 Composition and Decomposition Car Source Car Year Cost Style Year Feature Cost Car Phone Target Car Miles Mileage Model Make & Model Color Body Type

6 Matching Techniques Terminological Relationships Value Characteristics Expected Data Values Structure

7 Terminological Relationships WordNet Machine-Learned Rules Example: (Make, Brand) The number of different common hypernym roots of A and B Sum of distances of A and B to a common hypernym The sum of the number of senses of A and B

8 Value Characteristics Machine Learning Features [LC94] String length, numeric ratio, space ratio. Mean, variation, coefficient variation, standard deviation;

9 Make & ModelBrand Model Expected Values Application Concepts Data Frames CarMake  “ford”  “honda”  … CarModel  “accord”  “mustang”  “taurus”  … Ford Mustang Ford Taurus Ford F150 … CarMake. CarModel Legend Mustang A4 … CarModel CarMake TargetSource Acura Audi BMW …

10 Structure PO POShipToPOBillToPOLines CityStreetCityStreetItem Count LineQtyUoM PurchaseOrder DeliverToInvoiceTo Items ItemItemCount ItemNumber QuantityUnitOfMeasure CityStreet Address TargetSource

11 Structure (Cont.) PO POShipToPOBillToPOLines CityStreetCityStreetItem Count LineQtyUoM PurchaseOrder DeliverToInvoiceTo Items ItemCount ItemNumber QuantityUnitOfMeasure CityStreet Address DeliverTo TargetSource

12 Structure (Cont.) PO POBillToPOLines CityStreetCityStreetItem Count LineQtyUoM PurchaseOrder InvoiceTo Items ItemCount ItemNumber QuantityUnitOfMeasure City Street City Street POShipToDeliverTo TargetSource

13 Structure (Cont.) PO POBillToPOLines CityStreetCityStreetItem Count PurchaseOrder InvoiceTo Items ItemCount City Street City Street LineQtyUoM ItemNumber Quantity LineQtyUoM ItemNumber Quantity LineQty QuantityUnitOfMeasure POShipToDeliverTo TargetSource

14 Structure (Cont.) PO POBillToPOLines CityStreetCityStreetItem Count LineQtyUoM PurchaseOrder InvoiceTo Items ItemCount ItemNumber Quantity City Street City Street City Street City Street Count LineQty QuantityUnitOfMeasure POShipToDeliverTo TargetSource

15 Experiments Methodology Measures Precision Recall F Measure

16 Results Applications (Number of Schemes) Precision (%) Recall (%) F (%) CorrectFalse Positive False Negative Course Schedule (5) 98939611929 Faculty Member (5) 100 14000 Real Estate (5) 9296942352010 Data borrowed from Univ. of Washington Indirect Matches: 94% (precision, recall, F-measure)

17 Contributions Direct Matches Indirect Matches Expected values Structure High Precision and High Recall


Download ppt "Direct and Indirect Matching of Schema Elements for Data Integration on the Web Li Xu Data Extraction Group Brigham Young University Sponsored by NSF."

Similar presentations


Ads by Google