QoM: Qualitative and Quantitative Measure of Schema Matching
Naiyana Tansalarak and Kajal T. Claypool (Kajal Claypool, presenter), University of Massachusetts, Lowell


1 QoM: Qualitative and Quantitative Measure of Schema Matching
Naiyana Tansalarak and Kajal T. Claypool (Kajal Claypool, presenter)
University of Massachusetts, Lowell
Oct 14th, 2003
22nd International Conference on Conceptual Modeling (ER 2003), Chicago, Illinois

2 Introduction: Integration of information, a big challenge!
“Data, data everywhere and ……”
Problem: Heterogeneous data sources
–Concepts: protein sequence, grams of protein
–Semantics: “protein” for a protein scientist vs “protein” for a nutritionist
–Data Formats: XML, object-oriented, relational
–Access Methods: special-purpose programs (BLAST), SQL, XQuery
“Need a way to integrate”

3 Integration of heterogeneous sources
[Diagram: Source 1, Source 2, ..., Source n combined into Integrated Sources]
Problems:
–Resolve conflicts
–Integrate data
–Interpret results
Goal:
–Automated / semi-automated integration via “Schema Matching”

4 Schema Matching: The Process
Schema matching: the process of finding “semantic correspondences” between the entities of two or more schemas
–Input: two schemas
–Output: set of matches between the two schemas
Two entities match if their similarity value is above a threshold
Similarity values and thresholds are tightly coupled to the algorithm
–Example: CUPID [MBR01] defines the similarity value as the fraction of leaves in the two subtrees that have at least one “strong” link to a leaf in the other subtree. Linguistic algorithms base their similarity values on the level of matching in a hypernym tree.
–Thresholds are ad hoc
Problem: A match from one algorithm may not be considered a match by another algorithm!
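The threshold problem described on this slide can be illustrated with a minimal Python sketch. It is not taken from the talk: `find_matches` and `toy_similarity` are hypothetical names, and the character-set Jaccard score is only a stand-in for a real similarity measure such as the ones CUPID or a linguistic matcher would compute.

```python
def toy_similarity(a: str, b: str) -> float:
    """Hypothetical similarity: Jaccard overlap of the characters in two labels."""
    sa, sb = set(a.lower()), set(b.lower())
    union = sa | sb
    if not union:
        return 0.0
    return len(sa & sb) / len(union)


def find_matches(source, target, similarity, threshold):
    """Return (source_entity, target_entity, score) for pairs scoring above threshold."""
    matches = []
    for s in source:
        for t in target:
            score = similarity(s, t)
            if score > threshold:
                matches.append((s, t, score))
    return matches


source_entities = ["firstName", "qty"]
target_entities = ["name", "qty"]

# The same similarity function with two different ad hoc thresholds.
lenient = find_matches(source_entities, target_entities, toy_similarity, 0.4)
strict = find_matches(source_entities, target_entities, toy_similarity, 0.5)

print([(s, t) for s, t, _ in lenient])  # includes ('firstName', 'name')
print([(s, t) for s, t, _ in strict])   # only ('qty', 'qty')
```

Under these assumptions the pair firstName/name is a match at threshold 0.4 and not a match at 0.5, which is the kind of disagreement a common quality metric is meant to expose.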

5 Contributions of Our Work
Proposal of QoM, the Quality of Match metric
–A metric for comparing different matches produced by the match algorithms
Measurement of QoM
–Qualitative measure: Match Taxonomy
–Quantitative measure: Weight-based Match Model

6 Outline
Motivation
Our Approach
–Unifying Data Model: UML
–Match Taxonomy
–Weight-based Match Model
Related Work
Conclusions and Future Work

7 Unifying Data Model
[Figure: source data models shown side by side (“vs.”) and unified as a UML model]

8-11 Definition (built up over four slides)
a schema S = …
a class c = …
an attribute a = …
a method m = …

12 Qualitative Measure: Taxonomy of Schema Matches
Goal: Describe the “quality” and the “coverage” of a match
–Attribute/Method Level: Micro Match
–Class Level: Sub-Macro Match
–Schema Level: Macro Match

13 Micro Match
Attributes can be compared based on: label ( L ), scope ( A ), type ( T ), atomicity ( N ), initializer ( I )
A match can be:
–Exact. Labels: exact string match or synonyms ( name vs name ). Other properties: equivalent values ( String vs char[] ).
–Relaxed. Labels: “almost the same”, i.e. in the same hypernym tree ( firstName vs name ). Other properties: implied values ( protected vs private ).
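A minimal Python sketch of the exact/relaxed distinction for labels follows. The lookup tables `SYNONYMS` and `HYPERNYM_GROUPS` are hypothetical stand-ins for a thesaurus and a hypernym tree, and the rule in `classify_micro` (exact only if every property is exact, relaxed if every property matches but at least one is relaxed) is an assumption; the slide does not spell out how property-level results combine.

```python
from enum import Enum


class Match(Enum):
    EXACT = "exact"
    RELAXED = "relaxed"
    NONE = "none"


# Hypothetical lookup tables; a real matcher would consult a thesaurus or WordNet-style tree.
SYNONYMS = {frozenset({"desc", "description"})}
HYPERNYM_GROUPS = [{"name", "firstName", "lastName"}]


def match_label(src: str, tgt: str) -> Match:
    """Classify a label pair as an exact, relaxed, or failed match."""
    if src == tgt or frozenset({src, tgt}) in SYNONYMS:
        return Match.EXACT
    if any(src in group and tgt in group for group in HYPERNYM_GROUPS):
        return Match.RELAXED
    return Match.NONE


def classify_micro(property_matches) -> Match:
    """Combine per-property results (label, scope, type, atomicity, initializer).

    Assumed rule: any failed property means no match; all exact means exact;
    otherwise the micro match is relaxed.
    """
    kinds = list(property_matches)
    if Match.NONE in kinds:
        return Match.NONE
    if all(kind is Match.EXACT for kind in kinds):
        return Match.EXACT
    return Match.RELAXED


label = match_label("firstName", "name")
print(label)                                         # Match.RELAXED
print(classify_micro([label] + [Match.EXACT] * 4))   # Match.RELAXED
```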

14 Example
[Figure: attribute name compared between source and target: Exact Match]

15 Example
[Figure: attributes name and qty compared between source and target: Relaxed Match]

16 Sub-Macro Match
Classes can be compared based on:
–The quality of match of their attributes (micro match): Exact vs Relaxed
–The “coverage”: the number of micro matches between the source and the target classes
Total: all attributes of the source have a match in the target
Partial: some, but not all, attributes of the source have a match in the target
[Figure labels: Total Match, Partial Match]

17 Sub-Macro Match (Cont’d)
–Total Exact Match
–Partial Exact Match
–Total Relaxed Match
–Partial Relaxed Match

18 Example
[Figure: classes Recipe and Dish, attributes name, desc: Total Exact Match]

19 Example
Ingredient (id, qty, item) and Item (qty, item)
–Ingredient to Item: Partial Exact Match
–Item to Ingredient: Total Exact Match
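The sub-macro taxonomy can be sketched in Python as a small classifier that reproduces the Ingredient/Item example. `classify_sub_macro` is a hypothetical helper, and treating a class match as Exact only when every micro match it contains is exact is an assumption made for this sketch.

```python
def classify_sub_macro(source_attrs, micro_matches):
    """Label a class-level match by coverage (Total/Partial) and quality (Exact/Relaxed).

    source_attrs:  attribute names of the source class
    micro_matches: dict mapping each matched source attribute to "exact" or "relaxed"
    """
    matched = [attr for attr in source_attrs if attr in micro_matches]
    if not matched:
        return "No Match"
    coverage = "Total" if len(matched) == len(source_attrs) else "Partial"
    # Assumption: a single relaxed micro match makes the whole class match relaxed.
    all_exact = all(micro_matches[attr] == "exact" for attr in matched)
    quality = "Exact" if all_exact else "Relaxed"
    return f"{coverage} {quality} Match"


# qty and item match exactly in both directions; id has no counterpart in Item.
micro = {"qty": "exact", "item": "exact"}

print(classify_sub_macro(["id", "qty", "item"], micro))  # Partial Exact Match
print(classify_sub_macro(["qty", "item"], micro))        # Total Exact Match
```

Because coverage is measured from the source side, swapping source and target changes the label, as the two directions of the example show.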

20 Macro Match
Schemas can be compared in a similar manner to classes, based on:
–The quality of match of their classes (sub-macro match): Total Exact, Total Relaxed, Partial Exact and Partial Relaxed
–The “coverage”: the number of sub-macro matches between the source and the target schemas
Total: all classes of the source have a match in the target
Partial: some, but not all, classes of the source have a match in the target

21 Macro Match (Cont’d)
–Total Exact Match
–Partial Exact Match
–Total Relaxed Match
–Partial Relaxed Match

22 Example
RECIPE schema (Recipe, Ingredient, Instruction) and DISH schema (Dish, Item, Step)
–RECIPE to DISH: Partial Exact Match
–DISH to RECIPE: Total Exact Match
[Figure labels on the class-level matches: TE, PE]

23 Quantitative Measure: Weight-Based Measure of QoM
Match Taxonomy:
–Qualitative measure of the match between two entities
–Can distinguish between a total exact and a partial exact match, or a total exact and a partial relaxed match
–Cannot decide if one partial exact match is better than another, or if a total relaxed match is better than a partial exact match
Weight-Based Measure:
–Provides a quantitative metric for the QoM

24 Weight-based Match Model
Match Value: the “weight” of each match operator, representing the match between two properties
[Table: Match operator, Weight; entries not preserved]
Example: label match on Name: $W(l_s, l_t) = 1.0$

25 What has been done: Related Work
Domain-specific [BHP94,BCVB01,BM01] and domain-independent [HMN+99,MBR01,DR02] algorithms
Approaches exploit various types of information
–Element names, structural properties, ontologies, characteristics of data instances
Examples:
–Doan et al. [DDH01]: combines match predictions using a set of machine learning techniques; match predictions are based on element name matching, content matching, text classification and domain knowledge
–Madhavan et al. (Cupid) [MBR01]: hybrid algorithm that combines linguistic and structural match algorithms

26 Conclusions and Future Work
Contributions:
–Proposed QoM: a quality metric for schema matches
–Two techniques to evaluate the QoM: Qualitative (Match Taxonomy) and Quantitative (Weight-based Match Model)
Future Work:
–Combining “user input” for desired matches to optimize the schema match process
–Refinement of QoM for the XML model, accounting for order and the different levels of nesting
–Development of match algorithms based on QoM

27 More Information: http://www.cs.uml.edu/dsl email: kajal@cs.uml.edu

28 Micro Match Model
Attribute match:
$$QoM(a_s, a_t) = \frac{W(L_s, L_t) + W(A_s, A_t) + W(T_s, T_t) + W(N_s, N_t) + W(I_s, I_t)}{5}$$
Method match:
$$QoM(m_s, m_t) = \frac{QoM_{sig}(m_s, m_t) + 2 \cdot QoM_{spec}(m_s, m_t)}{3}$$
$$QoM_{sig}(m_s, m_t) = \frac{W(A_s, A_t) + W(O_s, O_t) + W(I_s, I_t)}{3}$$
$$QoM_{spec}(m_s, m_t) = \frac{W(pre_s, pre_t) + W(post_s, post_t)}{2}$$
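The micro match formulas on this slide translate directly into code. The Python sketch below is illustrative only: the function names are hypothetical, the weight 1.0 for an exact match comes from the label example on slide 24, the weight 0.5 for a relaxed match is inferred from the worked example on slide 35, and 0.0 for no match is an assumption. Reading O and I as a method's output and input is also an assumption, since the definition slides did not survive.

```python
# Weights per match operator. 1.0 (exact) appears on slide 24; 0.5 (relaxed) is
# inferred from the slide 35 example; 0.0 (no match) is an assumption.
WEIGHTS = {"exact": 1.0, "relaxed": 0.5, "none": 0.0}


def qom_attribute(w_label, w_scope, w_type, w_atomicity, w_initializer):
    """Normalized sum of the five attribute property weights (L, A, T, N, I)."""
    return (w_label + w_scope + w_type + w_atomicity + w_initializer) / 5


def qom_signature(w_scope, w_output, w_input):
    """Normalized sum of the method signature weights (A, O, I)."""
    return (w_scope + w_output + w_input) / 3


def qom_specification(w_pre, w_post):
    """Normalized sum of the pre- and post-condition weights."""
    return (w_pre + w_post) / 2


def qom_method(sig, spec):
    """Method QoM: the specification counts twice as much as the signature."""
    return (sig + 2 * spec) / 3


exact, relaxed = WEIGHTS["exact"], WEIGHTS["relaxed"]

# Attribute whose label is a relaxed match and whose other properties are exact.
print(qom_attribute(relaxed, exact, exact, exact, exact))  # 0.9

# Method with an exact signature and a relaxed post-condition.
sig = qom_signature(exact, exact, exact)
spec = qom_specification(exact, relaxed)
print(round(qom_method(sig, spec), 3))  # 0.833
```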

29 Example
[Figures: The Recipe Schema, The Dish Schema]

30 Micro Match (Attribute)
$a_s$ vs $a_t$, compared property by property:
–$L_s$ vs $L_t$
–$A_s$ vs $A_t$
–$T_s$ vs $T_t$
–$N_s$ vs $N_t$
–$I_s$ vs $I_t$
Each comparison is an Exact Match or a Relaxed Match

31 Micro Match (Method)
$m_s$ vs $m_t$, compared property by property:
–$A_s$ vs $A_t$
–$O_s$ vs $O_t$
–$I_s$ vs $I_t$
–$Pre_s$ vs $Pre_t$
–$Post_s$ vs $Post_t$
Each comparison is an Exact Match or a Relaxed Match

32 Weighing the Micro Match
Match between attributes is based on the match of the individual properties
–Exact or relaxed
$QoM(a_s, a_t)$:
–Quantitative measure of the match between attributes $a_s$ and $a_t$
–The normalized sum of the match values of the individual properties of an attribute

33 Example
Attribute name in Recipe vs attribute name in Dish:
$$QoM(name_{recipe}, name_{dish}) = \frac{1.0 + 1.0 + 1.0 + 1.0 + 1.0}{5} = 1.0$$

34 Weighing the Sub-Macro Match
Sub-Macro match:
–Normalized sum of QoM of micro matches:
$$R_W(C_s, C_t) = \frac{\sum QoM(M_s, M_t)}{|C_s|}$$
–Coverage:
$$R_S(C_s, C_t) = \frac{|C^m_s|}{|C_s|} \qquad R_T(C_s, C_t) = \frac{|C^m_s|}{|C_t|}$$
–Sub-Macro Match:
$$QoM(C_s, C_t) = \frac{R_W(C_s, C_t) + R_S(C_s, C_t) + R_T(C_s, C_t)}{3}$$

35 Example
Instruction (id, step) vs Step (direction); step and direction form a relaxed (R) match
$$QoM(step_{Instruction}, direction_{Step}) = \frac{0.5 + 1.0 + 1.0 + 1.0 + 1.0}{5} = 0.9$$
$$R_W(Instruction, Step) = 0.9 / 2 = 0.45$$
$$R_S(Instruction, Step) = 1 / 2 = 0.5$$
$$R_T(Instruction, Step) = 1 / 1 = 1.0$$
$$QoM(Instruction, Step) = (0.45 + 0.5 + 1.0) / 3 = 0.65$$
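The Instruction/Step numbers can be checked with a short Python sketch of the sub-macro computation. `qom_sub_macro` is a hypothetical helper that takes the QoM of each matched source member together with the member counts of the two classes; it follows the three ratios as they are used in this example.

```python
def qom_sub_macro(member_qoms, n_source_members, n_target_members):
    """Class-level QoM from the micro QoMs of the matched source members.

    member_qoms:      dict mapping each matched source member to its micro QoM
    n_source_members: |C_s|, number of members in the source class
    n_target_members: |C_t|, number of members in the target class
    """
    n_matched = len(member_qoms)
    r_w = sum(member_qoms.values()) / n_source_members  # weighted quality
    r_s = n_matched / n_source_members                  # coverage of the source
    r_t = n_matched / n_target_members                  # coverage of the target
    return (r_w + r_s + r_t) / 3


# Instruction has members id and step; Step has the single member direction.
# Only step matches (relaxed label, other properties exact), with micro QoM 0.9.
result = qom_sub_macro({"step": 0.9}, n_source_members=2, n_target_members=1)
print(round(result, 2))  # 0.65
```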

36 Weighing the Macro Match
Macro match:
–Normalized sum of sub-macro QoMs:
$$R_W(S_s, S_t) = \frac{\sum QoM(C_s, C_t)}{|S_s|}$$
–Coverage:
$$R_S(S_s, S_t) = \frac{|S^m_s|}{|S_s|} \qquad R_T(S_s, S_t) = \frac{|S^m_s|}{|S_t|}$$
–Macro Match:
$$QoM(S_s, S_t) = \frac{R_W(S_s, S_t) + R_S(S_s, S_t) + R_T(S_s, S_t)}{3}$$

37 Example
RECIPE schema (Recipe, Ingredient, Instruction) vs DISH schema (Dish, Item, Direction)
Class-level QoM values shown in the figure: 1.00, 0.78, 0.65
$$R_W(RECIPE, DISH) = (1.00 + 0.78 + 0.65) / 3 = 0.81$$
$$R_S(RECIPE, DISH) = 3 / 3 = 1.0$$
$$R_T(RECIPE, DISH) = 3 / 3 = 1.0$$
$$QoM(RECIPE, DISH) = (0.81 + 1.0 + 1.0) / 3 = 0.94$$
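The same arithmetic at schema level can be sketched in Python. `qom_macro` is a hypothetical helper that returns the three ratios alongside the final score so each line of the example can be compared; the input is simply the three class-level QoM values from the slide.

```python
def qom_macro(class_qoms, n_source_classes, n_target_classes):
    """Schema-level QoM from the sub-macro QoMs of the matched source classes."""
    n_matched = len(class_qoms)
    r_w = sum(class_qoms) / n_source_classes  # normalized sum of sub-macro QoMs
    r_s = n_matched / n_source_classes        # coverage of the source schema
    r_t = n_matched / n_target_classes        # coverage of the target schema
    return {"R_W": r_w, "R_S": r_s, "R_T": r_t, "QoM": (r_w + r_s + r_t) / 3}


# Three matched classes in a three-class source and a three-class target schema.
scores = qom_macro([1.00, 0.78, 0.65], n_source_classes=3, n_target_classes=3)
for name, value in scores.items():
    print(name, round(value, 2))
```

Rounded to two decimals, the values agree with the slide: 0.81, 1.0, 1.0 and 0.94.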

