Top-K Generation of Integrated Schemas Based on Directed and Weighted Correspondences by Ahmed Radwan, Lucian Popa, Ioana R. Stanoi, Akmal Younis Presented.


1 Top-K Generation of Integrated Schemas Based on Directed and Weighted Correspondences by Ahmed Radwan, Lucian Popa, Ioana R. Stanoi, Akmal Younis. Presented by Prasanna Kunchavaram (800690762), ITCS 6265, 3rd November 2009

2 Introduction Schema integration is the process of unifying heterogeneous data sources into a single, non-redundant, consistent data source; put differently, it combines local schemas into a global, integrated schema. Examples:
- Combining databases or tables after a merger or acquisition of companies.
- Combining two products into one, and with them their historic sales data.
- Creating a new employee table from employment data and medical records.

3 Introduction (continued) A correspondence is a match between elements of heterogeneous schemas. Each correspondence carries a weight and a direction that reflect the confidence of the match, so the weight of the correspondence from A to B may differ from the weight of the correspondence from B to A. Previous schema integration tools are interactive: the user selects the desired integrated schema from the surviving correspondences.

4 Problem Previous schema integration techniques:
- Do not consider the direction and weight of correspondences.
- Need user interaction for the final integration decision.
- Make the user decide easy as well as difficult integrations, which is laborious.
- Are time consuming.
- Are resource intensive.

5 Problem (continued) Example: weighted and directed correspondences; options for schema integration

6 Solution
1) Relationships in the integrated schema are defined using the direction and weight of the correspondences between elements.
2) Candidate schemas are ranked by similarity and coverage to produce the top k schemas.
3) Easy integrations are adopted without user interaction.
4) For difficult integrations, the user can specify constraints on the schemas involved.
5) The system generates revised top k schemas that satisfy the constraints.
6) Steps 4 and 5 are repeated until the final schema is obtained.

7 Example of easy integration (integration without user interaction)

8 Concept and concept graph A concept is a relation name associated with a set of attributes in a schema. Correspondences between schemas are expressed over concept graphs. A concept graph is a pair (V, has), where V is a set of concepts and has is a set of directed, labeled edges between concepts. Consider a pair (C1, C2) of concepts, where C1 is from schema S1 and C2 is from schema S2. The weight of the directed correspondence C1 → C2 is denoted ŝ(C1, C2), and the weight of the directed correspondence C2 → C1 is denoted ŝ(C2, C1). The correspondence between the two concepts across S1 and S2 is then defined by the pair of weights [ŝ(C1, C2), ŝ(C2, C1)], one per direction.
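The definitions on this slide can be sketched as plain data structures. This is only an illustration: the class names, field names, and the example concepts (`dept`, `division`) and weights are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Concept:
    """A relation name plus its set of attributes, tagged with its schema."""
    schema: str
    name: str
    attributes: tuple

@dataclass
class ConceptGraph:
    """The pair (V, has): concepts and directed, labeled 'has' edges."""
    concepts: set = field(default_factory=set)
    has_edges: set = field(default_factory=set)  # (parent, label, child)

    def add_has(self, parent, label, child):
        self.concepts.update((parent, child))
        self.has_edges.add((parent, label, child))

@dataclass(frozen=True)
class Correspondence:
    """A pair of concepts from two schemas with one weight per direction."""
    c1: Concept
    c2: Concept
    w12: float  # weight of C1 -> C2
    w21: float  # weight of C2 -> C1

    def weights(self):
        return (self.w12, self.w21)

# Hypothetical example: two concepts matched with asymmetric confidence.
dept = Concept("S1", "dept", ("dno", "dname"))
division = Concept("S2", "division", ("id", "name", "budget"))
corr = Correspondence(dept, division, w12=0.8, w21=0.5)
```

Keeping the two weights as separate fields makes the asymmetry explicit: `corr.weights()` returns `(0.8, 0.5)` rather than a single similarity score.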

9 Example- Concept graph

10 Assignment An assignment A is a fixed-size, ordered vector of bits in which each bit X represents the state of one correspondence: 1 means the correspondence is selected and 0 means it is not. The set of assignments is ranked to obtain the top K assignments. For each bit with value 1, the concepts involved in the corresponding correspondence are combined. Concepts can be combined in two ways, depending on the weight and direction of the similarity: merge and has. A threshold λ decides which of the two methods is used.
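A minimal sketch of how an assignment and the threshold λ might interact. The transcript does not state the exact rule (it is on a figure slide), so the rule below is an assumption: merge when both directed weights reach λ, otherwise use a has relationship. The function names and example weights are hypothetical.

```python
def combine_method(s12, s21, lam):
    """ASSUMED rule, not taken from the transcript:
    merge if both directed weights reach the threshold, else 'has'."""
    if s12 >= lam and s21 >= lam:
        return "merge"
    return "has"

def apply_assignment(bits, weights, lam):
    """For every bit set to 1, report how that correspondence is combined.

    bits    -- assignment vector, one 0/1 entry per correspondence
    weights -- list of (s12, s21) pairs, aligned with bits
    lam     -- the threshold lambda
    """
    return [
        (i, combine_method(weights[i][0], weights[i][1], lam))
        for i, bit in enumerate(bits)
        if bit == 1
    ]

# Hypothetical weights for three correspondences.
weights = [(0.9, 0.8), (0.7, 0.7), (0.9, 0.3)]
decisions = apply_assignment([1, 0, 1], weights, lam=0.6)
```

Here correspondence 1 is skipped because its bit is 0, and correspondence 2 falls back to has because only one direction reaches λ.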

11 Example of λ effect on integration decision

12 Algorithm

13 Cost function (used to rank assignments) Here n is the total number of non-zero correspondences. Example: assignment with weights

14 Top K algorithm
1) Calculate Ŝ_i and D̂_i for each correspondence i, and assign 1 where Ŝ_i > D̂_i and 0 where Ŝ_i < D̂_i.
2) The result is the optimal assignment A_1 (the answer for k = 1). The next k-1 best assignments are obtained by deciding which bits of this assignment vector to flip.
3) Let the vector Δf hold the difference between Ŝ_i and D̂_i for each i. Δf_i quantifies the cost impact of flipping bit i from its current value in A_1: it is the increase in cost with respect to cost(A_1) if bit i were flipped.
4) Sort Δf in increasing order and denote the result Δf_s. The goal is to find the next assignment that minimizes the increase in cost.
5) The 2nd best assignment is obtained by flipping the bit X_i that gives the least cost increase.
6) To compute the 3rd best assignment, flip the variable with the next smallest cost increase and leave X_i unflipped. If there are two choices, select the one that gives the smaller cost increase.
7) The remaining assignments are calculated likewise.
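The steps above can be sketched as follows. Two things here are assumptions rather than the authors' stated method: the cost is taken to be separable, with bit i contributing D̂_i when set to 1 and Ŝ_i when set to 0, and the enumeration uses a standard heap-based "k smallest subset sums" technique over the sorted Δf values. The function name and the sample weights are hypothetical.

```python
import heapq

def top_k_assignments(S, D, k):
    """Return up to k (cost, bits) pairs in non-decreasing cost order.

    ASSUMPTION: bit i costs D[i] when 1 and S[i] when 0, so flipping a
    bit of the optimal assignment raises the cost by |S[i] - D[i]|.
    """
    n = len(S)
    # Step 1: optimal assignment selects i exactly when S[i] > D[i].
    best = [1 if s > d else 0 for s, d in zip(S, D)]
    base = sum(d if b else s for s, d, b in zip(S, D, best))
    results = [(base, best)]
    if n == 0:
        return results
    # Steps 3-4: cost increases per flip, sorted in increasing order.
    order = sorted(range(n), key=lambda i: abs(S[i] - D[i]))
    delta = [abs(S[i] - D[i]) for i in order]
    # Heap entries: (extra cost, last sorted position, flipped positions).
    heap = [(delta[0], 0, (0,))]
    while heap and len(results) < k:
        extra, last, flipped = heapq.heappop(heap)
        bits = best[:]
        for pos in flipped:
            bits[order[pos]] ^= 1
        results.append((base + extra, bits))
        nxt = last + 1
        if nxt < n:
            # Choice 1: also flip the next bit.
            heapq.heappush(heap, (extra + delta[nxt], nxt, flipped + (nxt,)))
            # Choice 2: unflip the last bit and flip the next one instead.
            heapq.heappush(
                heap,
                (extra - delta[last] + delta[nxt], nxt, flipped[:-1] + (nxt,)),
            )
    return results

# Hypothetical weights for three correspondences.
ranked = top_k_assignments([0.9, 0.4, 0.6], [0.1, 0.6, 0.5], k=4)
```

The two pushes mirror step 6: after flipping a bit, the next candidate either keeps it flipped and adds the next cheapest flip, or leaves it unflipped and flips the next one instead, and the heap picks whichever costs less.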

15 Top K Algorithm- Example

16 Tuning λ As stated before, λ is the threshold used to decide how concepts are combined in an integration. Steps to calculate λ:
1. Iteratively scan all the correspondences in E, where E is the set of correspondences that are selected by at least one of the top k assignments.
2. For each such correspondence, record max(ŝ1, ŝ2) and add this value to a list L.
3. Set λ to the minimum of the values in L.
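The three steps translate almost directly into code. A minimal sketch, assuming assignments are bit lists and each correspondence is a pair of directed weights; the function name and sample data are hypothetical.

```python
def tune_lambda(top_k, weights):
    """Compute lambda from the top-k assignments.

    top_k   -- list of assignments, each a list of 0/1 bits
    weights -- list of (s1, s2) directed weight pairs, aligned with the bits
    Returns None when no correspondence is selected by any assignment.
    """
    # Step 1: E = correspondences selected by at least one assignment.
    selected = {
        i for bits in top_k for i, bit in enumerate(bits) if bit == 1
    }
    # Step 2: record the stronger direction of each selected correspondence.
    strongest = [max(weights[i]) for i in sorted(selected)]
    # Step 3: lambda is the smallest of those values.
    return min(strongest) if strongest else None

# Hypothetical data: three correspondences, two top-ranked assignments.
weights = [(0.9, 0.7), (0.3, 0.6), (0.8, 0.2)]
lam = tune_lambda([[1, 0, 1], [1, 1, 0]], weights)
```

With this data all three correspondences are in E, the recorded maxima are 0.9, 0.6 and 0.8, and λ comes out as 0.6, so every selected correspondence has at least one direction at or above the threshold.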

17 Example of Schema Integration with different λ values

18 Results

19 Conclusion A top K algorithm for schema integration that executes in polynomial time is developed. The weight and direction of each correspondence are used to reduce user interaction. Easy integrations are performed by the system without any user interaction while keeping the data consistent. The results indicate that the algorithm reduces user interaction and thus the time taken to achieve a complex schema integration. Future work includes automation (integration without user interaction) and enhancements that let the algorithm handle a couple of hundred schemas.

20 Questions?

