ANHAI DOAN ALON HALEVY ZACHARY IVES Chapter 6: General Schema Manipulation Operators PRINCIPLES OF DATA INTEGRATION.

1 ANHAI DOAN ALON HALEVY ZACHARY IVES Chapter 6: General Schema Manipulation Operators PRINCIPLES OF DATA INTEGRATION

2 Outline
• Introduction to model management and motivation
• The merge operator
• The ModelGen operator
• The Invert operator

3 Model Management Operators
• We saw operators for creating mappings between pairs of schemas.
• But you can imagine other operators on schemas and mappings:
  • Merge schemas, compose and invert mappings, translate schemas from one data model to another
• In fact, imagine an entire algebra of operators that apply to schemas and to mappings:
  • Many common workflows can be formulated as a sequence of such operators [Bernstein, 2000]
• Note: “model” = “schema”. More terminology coming soon.

4 Example of Model Management (1)
• In a data integration scenario, you may proceed as follows, beginning with sources S1 and S2:
  • Use a match operator to create a mapping between S1 and S2
  • Use merge to create a merged (mediated) schema of S1 and S2, together with mappings from S1 and S2 to it. Merge will create the minimal schema that includes both S1 and S2.

5 Example of Model Management (2)
• Suppose we have another source S3, which is very similar to S1.
  • We could first use match to create a mapping from S1 to S3
  • Then use compose to create a mapping from S3 to the mediated schema G.
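The compose step can be sketched in a few lines. This is only an illustration under strong assumptions: mappings are reduced to plain attribute-to-attribute correspondences stored in dictionaries, the match result is taken to be already oriented from S3 to S1, and every attribute name below is hypothetical.

```python
def compose(m_ab, m_bc):
    """Compose attribute correspondences: a -> b and b -> c yield a -> c.

    Attributes of A whose image has no counterpart in C are dropped.
    """
    return {a: m_bc[b] for a, b in m_ab.items() if b in m_bc}


# Hypothetical correspondences, for illustration only.
m_s3_s1 = {"S3.title": "S1.name", "S3.yr": "S1.year"}      # from match
m_s1_g = {"S1.name": "G.title", "S1.year": "G.year",
          "S1.genre": "G.genre"}                           # from merge

m_s3_g = compose(m_s3_s1, m_s1_g)
print(m_s3_g)
```

Real schema mappings (e.g., GLAV) are logical formulas rather than attribute pairs, so composing them is considerably harder than this dictionary lookup suggests.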

6 Operators
• Match: see previous chapters
• Merge: create a merged schema of S1 and S2 w.r.t. a mapping M12
• ModelGen: create an equivalent model in a different data model (e.g., relational to XML)
• Invert: given M12, create M21
• Diff: find the difference between two models (see bibliography)

7 Some Terminology
• Model: a specific description of a set of data in a given data model.
• Meta-model: a data model, such as relational schema, XML DTD, Java class definitions, …
• Meta-meta-model: a generic language that is independent of a particular meta-model
  • Usually a graph-based formalism.

8 Outline
• Introduction to model management and motivation
• The merge operator
• The ModelGen operator
• The Invert operator

9 The Merge Operator
• Given:
  • Two models, M1 and M2
  • A mapping from M1 to M2
• Create:
  • A merged model M12 that contains only the information in M1 and M2, but does not repeat information that is in both
  • Mappings from M1 and M2 to M12
• Challenge to many model management operators:
  • Can you develop algorithms that are generic, i.e., not specific to particular data models?
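A minimal sketch of the merge contract described above, assuming a model is just a dictionary from element name to type and the input mapping is a dictionary of one-to-one element correspondences. The function name `merge`, the example elements, and the choice to keep M1's name and type for matched elements are all assumptions made for illustration.

```python
def merge(m1, m2, mapping):
    """Merge two flat models given correspondences from m1 to m2.

    m1, m2:  dict of element name -> type
    mapping: dict of m1 element -> corresponding m2 element
    Returns (merged, map1, map2), where map1 and map2 send each element
    of m1 and m2 to its element in the merged model.
    """
    merged, map1, map2 = {}, {}, {}
    for name, typ in m1.items():
        merged[name] = typ
        map1[name] = name
    matched = {b: a for a, b in mapping.items()}   # m2 element -> m1 element
    for name, typ in m2.items():
        if name in matched:
            # Same information as an m1 element: do not repeat it.
            map2[name] = matched[name]
        else:
            # Avoid accidental name clashes between unrelated elements.
            target = name if name not in merged else f"m2_{name}"
            merged[target] = typ
            map2[name] = target
    return merged, map1, map2


# Hypothetical models, for illustration only.
m1 = {"title": "str", "year": "int"}
m2 = {"name": "str", "director": "str"}
m12, map1, map2 = merge(m1, m2, {"title": "name"})
print(m12)
print(map2)
```

The sketch ignores type conflicts between matched elements and works only for flat models, which is exactly where the challenges on the following slides arise.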

10 Merge Challenges: Example
• Challenge 1: different attribute representations. Resolution should be part of the input mappings.

11 Merge Challenges: Example
• Challenge 2: merging models of different data models. (What if one data model supports sub-attributes and another doesn’t?)
• See ModelGen.

12 Merge Challenges: Example
• Challenge 3: “fundamental conflicts”. Zipcode is an integer in one model and a string in another. The merged model cannot have both.
• Solutions depend on the particular conflict and the data models involved.

13 Outline
• Introduction to model management and motivation
• The merge operator
• The ModelGen operator
• The Invert operator

14 The ModelGen Operator
• Transform a schema from one meta-model (e.g., Java object model, relational, XML) to another meta-model.
• Main challenge: features that exist in the source meta-model may not exist in the target (e.g., sub-classes and inheritance).
• The need for ModelGen is very common in practice, and it is used by several of the other operators.

15 ModelGen Example
• Java classes to relational tables
• No classes or inheritance in the relational model

16 ModelGen Strategy
• Possible to design specific transformations from one meta-model to another, but we want a generic approach.
• Design a super meta-model that has (almost) all features that exist in the meta-models.
• The super meta-model knows which features are present in each meta-model.
• The algorithm will translate a given model into the super meta-model and from there to the target meta-model.

17 ModelGen Algorithm
• Input: a model M1 in meta-model MM1
• Output: a model M2 in meta-model MM2 that is equivalent to M1.
• Transform M1 to the super-model, yielding M’.
• While M’ includes features that are not present in MM2, apply transformations to remove these features (e.g., remove class hierarchy by translating it to multiple vertically partitioned tables)
• Transform M’ into M2
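The loop above can be sketched as follows. Everything here is a simplifying assumption rather than the actual ModelGen implementation: the super-model is a list of `Entity` records, the only feature tracked is inheritance, and the vertical partitioning step gives each subclass table a hypothetical `<parent>_id` column referencing its parent's table.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Entity:
    name: str
    attrs: List[str]
    parent: Optional[str] = None   # inheritance link (super-model feature)


def features_of(model):
    """Return the set of super-model features the model actually uses."""
    feats = set()
    if any(e.parent is not None for e in model):
        feats.add("inheritance")
    return feats


def remove_inheritance(model):
    """Vertical partitioning: each subclass keeps only its own attributes
    plus a key column referencing its parent's table."""
    out = []
    for e in model:
        attrs = list(e.attrs)
        if e.parent is not None:
            attrs = [f"{e.parent.lower()}_id"] + attrs
        out.append(Entity(e.name, attrs, None))
    return out


# One feature-removing transformation per unsupported feature.
TRANSFORMS = {"inheritance": remove_inheritance}


def model_gen(model, target_features):
    m = list(model)   # assume the input is already in the super-model
    while True:
        unsupported = features_of(m) - target_features
        if not unsupported:
            return m   # every remaining feature exists in the target
        m = TRANSFORMS[unsupported.pop()](m)


# Hypothetical Java-like model; the relational target supports no inheritance.
java_model = [Entity("Person", ["id", "name"]),
              Entity("Student", ["gpa"], parent="Person")]
relational = model_gen(java_model, target_features=set())
print(relational)
```

Keeping the transformations in a table keyed by feature is what makes the loop generic: supporting a new meta-model means declaring its feature set, not writing a new pairwise translator.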

18 Outline
• Introduction to model management and motivation
• The merge operator
• The ModelGen operator
• The Invert operator

19 The Invert Operator
• Schema mappings are often directional:
  • They map data in a source schema into a target schema.
• Natural question:
  • Can we find an inverse mapping?
• But what is the right definition of inverse?
  • We’ll see a couple of failed attempts before we see a good one.
• Note: the algorithms here are not generic. They are highly dependent on the meta-model.

20 Invert Definition: Attempt 1
• Given a mapping M between a source S and a target T.
• M defines a relation between pairs of instances (I, J) that are consistent with each other:
  • I is an instance of S, J is an instance of T.
• Hence, a natural definition is: M⁻¹ should define the relation of pairs (J, I) such that (I, J) is in M.
• However, inverses defined this way will not be expressible with tuple-generating dependencies (TGDs)/GLAV mappings.
  • Why? See next slide.

21 Attempt #1 Problem Explained
• Any relation defined by TGDs is closed up on the right and closed down on the left.
• Formally, assume
  • (I, J) is in M
  • I’ is a subset of I, and J is a subset of J’; then
  • (I’, J’) is also in M.
• However, by definition, the inverse M⁻¹ would have to be closed up on the left and closed down on the right.
• Hence, it cannot be defined with TGDs or GLAV.
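The closure argument can be checked on a tiny example. The sketch below assumes the simplest possible TGD, S(x) → T(x), over unary relations represented as Python sets; the instances are hypothetical and chosen only to exhibit the property.

```python
def satisfies(I, J):
    """(I, J) is in M for the TGD  S(x) -> T(x)  iff every S-fact is in T."""
    return I <= J


def in_inverse(J, I):
    """Attempt 1: (J, I) is in the inverse iff (I, J) is in M."""
    return satisfies(I, J)


I, J = {1, 2}, {1, 2, 3}
assert satisfies(I, J)

# M is closed down on the left and closed up on the right:
# shrinking I and growing J keeps the pair in M.
assert satisfies({1}, {1, 2, 3, 4})

# The inverse is not: (J, I) is in it, but shrinking its left side (J)
# and growing its right side (I) yields a pair that is no longer in it.
assert in_inverse(J, I)
assert not in_inverse({1}, {1, 2, 3})
```

Since every TGD-definable relation has the first closure property and the flipped relation fails it, no set of TGDs can define this inverse.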

22 Invert Definition: Attempt 2
• Definition by composition:
  • M composed with M’ should be the identity mapping!
• However, it can be shown that under that condition, a mapping has an inverse only if the following holds:
  • If I1 and I2 are two distinct instances of S, then their targets under M should be distinct instances of T.
• The above result considerably limits the mappings that have inverses. m1 and m2 won’t have inverses. [Example mappings m1 and m2 not preserved in the transcript.]
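To see how easily the distinctness condition fails, consider a projection. The sketch below is a hypothetical example, not the slide's m1 or m2: it assumes the mapping S(x, y) → T(x) and models "the target under M" as the result of applying the projection to a source instance.

```python
def target(I):
    """Target instance produced by the projection  S(x, y) -> T(x)."""
    return {x for (x, _y) in I}


# Two distinct source instances (hypothetical data).
I1 = {(1, "a")}
I2 = {(1, "b")}

print(target(I1), target(I2))
assert I1 != I2
assert target(I1) == target(I2)   # same target, so y cannot be recovered
```

Any mapping that drops or merges information behaves this way, which is why the composition-based definition rules out most mappings used in practice.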

23 Third Time’s a Charm: Quasi-inverses
• Define equivalence between two instances w.r.t. M as: [formula not preserved in the transcript]
• Define M’ to be the quasi-inverse of M if the composition of M and M’ always maps I to an instance I’ such that I’ is equivalent to I w.r.t. M.
• Example: [mappings not preserved in the transcript] So m is a quasi-inverse of m’

24 Summary of Chapter 6
• Generic model management operators save a lot of repetitive code and can result in several forms of efficiency gains.
• Employing such operators also ensures that applications think carefully about the meaning of what they are doing.
• Two main open challenges:
  • Can the implementation of these operators be described in a meta-model independent fashion?
  • Is model management a system in itself that should be built, or should operator implementations be individual services?

