Data Exchange & Composition of Schema Mappings Phokion G. Kolaitis IBM Almaden Research Center.

1 Data Exchange & Composition of Schema Mappings Phokion G. Kolaitis IBM Almaden Research Center

2 Joint work with
- Ron Fagin, IBM Almaden Research Center
- Renee Miller, University of Toronto
- Lucian Popa, IBM Almaden Research Center
- Wang-Chiew Tan, UC Santa Cruz
Studied foundational aspects of schema mappings:
- data exchange based on schema mappings
- composition of schema mappings

3 Schema Mappings & Data Exchange
[Diagram: M₁₂ from Schema S₁ to Schema S₂, with instances I₁ and I₂ beneath the schemas and a label q]
Schema mappings are logic-based specifications that describe the relationship between a “source” schema S₁ and a “target” schema S₂.
The Data Exchange Problem associated with such a schema mapping M₁₂ is as follows:
- Input: Source instance I₁
- Output: Target instance I₂ such that ⟨I₁, I₂⟩ satisfies the specifications of the schema mapping

4 Main issues in data exchange
For a given source instance, there may be more than one target instance satisfying the specifications of the schema mapping. Thus,
- When more than one solution exists, which solutions are “better” than others?
- How do we compute a “best” solution?
- Can the certain answers of target queries be obtained by evaluating them on a “best” solution?

5 Schema Mapping Specification Language
- The relationship between the source and the target is given by a set Σ₁₂ of source-to-target tuple-generating dependencies (s-t tgds)
  φ(x) → ∃y ψ(x, y), where
  - φ(x) is a conjunction of atoms over the source, and
  - ψ(x, y) is a conjunction of atoms over the target.
- Among the most general assertions used in data integration
- Generalize LAV (local-as-view) and GAV (global-as-view) specifications in data integration
- Equivalent to GLAV (global-and-local-as-view) specifications

6 Universal Solutions in Data Exchange
We introduced the notion of universal solutions as the “best” solutions in data exchange.
- By definition, they have homomorphisms to all other solutions (thus, they are the most general solutions).
Main Results (FKMP in ICDT 2003)
- Universal solutions are unique up to homomorphic equivalence; they represent the entire solution space.
- The chase procedure produces a universal solution in polynomial time.
- The certain answers of target conjunctive queries can be obtained by evaluation on an arbitrary universal solution.
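As a rough illustration of the chase mentioned above, here is a minimal Python sketch of the standard chase for s-t tgds. The encoding (instances as dicts of tuple sets, tgds as triples, labeled nulls as the strings N0, N1, ...) and the names `matches` and `chase` are assumptions made for this sketch, not notation from the talk. The sample tgds are borrowed from the Enrollments example on slide 13.

```python
from itertools import count

def matches(atoms, instance, binding=None):
    """Yield every variable binding that maps all atoms into the instance."""
    binding = binding or {}
    if not atoms:
        yield dict(binding)
        return
    (rel, args), rest = atoms[0], atoms[1:]
    for fact in instance.get(rel, set()):
        if len(fact) != len(args):
            continue
        new = dict(binding)
        # setdefault binds a fresh variable or returns the existing value
        if all(new.setdefault(a, v) == v for a, v in zip(args, fact)):
            yield from matches(rest, instance, new)

def chase(source, tgds):
    """Fire each s-t tgd on each premise match whose conclusion is not yet satisfied."""
    target = {}
    fresh = (f"N{i}" for i in count())  # assumption: nulls never clash with constants
    for premise, exist_vars, conclusion in tgds:
        for b in list(matches(premise, source)):
            if next(matches(conclusion, target, b), None) is not None:
                continue  # conclusion already holds for this match
            ext = dict(b)
            for y in exist_vars:
                ext[y] = next(fresh)  # one fresh labeled null per existential variable
            for rel, args in conclusion:
                target.setdefault(rel, set()).add(tuple(ext[a] for a in args))
    return target

# Takes(n,c) -> exists s Students(n,s);  Takes(n,c) -> Takes1(n,c)
sigma12 = [
    ([("Takes", ("n", "c"))], ["s"], [("Students", ("n", "s"))]),
    ([("Takes", ("n", "c"))], [], [("Takes1", ("n", "c"))]),
]
I1 = {"Takes": {("Alice", "Math"), ("Alice", "Art")}}
solution = chase(I1, sigma12)
print(solution)
```

Because the premises of s-t tgds mention only source relations, a single pass over the tgds is enough in this setting; a chase with target dependencies would need to iterate to a fixpoint.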

7 Composing Schema Mappings
Given M₁₂ = (S₁, S₂, Σ₁₂) and M₂₃ = (S₂, S₃, Σ₂₃), derive a schema mapping M₁₃ = (S₁, S₃, Σ₁₃) that is “equivalent” to the successive application of M₁₂ and M₂₃.
What is the semantics of composition of schema mappings? What does “equivalent” mean in this context?
[Diagram: Schema S₁ → Schema S₂ via M₁₂, Schema S₂ → Schema S₃ via M₂₃, and Schema S₁ → Schema S₃ via M₁₃]

8 Earlier Work
Metadata Model Management (Bernstein in CIDR 2003)
- Composition is one of the fundamental operators
- However, no semantics is given
Composing Mappings among Data Sources (Madhavan & Halevy in VLDB 2003)
- First to propose a semantics for composition
- However, their definition is in terms of maintaining the same certain answers relative to a class of queries.
- Their notion of composition depends on the class of queries and may not be unique up to logical equivalence.

9 Semantics of Composition
- Definition (FKPT in PODS 2004): A schema mapping M₁₃ is a composition of M₁₂ and M₂₃ if for every instance I₁ of S₁ and every instance I₃ of S₃,
  ⟨I₁, I₃⟩ ⊨ Σ₁₃ if and only if there exists I₂ such that ⟨I₁, I₂⟩ ⊨ Σ₁₂ and ⟨I₂, I₃⟩ ⊨ Σ₂₃.
- In other words, Inst(M₁₃) = Inst(M₁₂) ∘ Inst(M₂₃), where Inst(M) = { ⟨I, J⟩ | ⟨I, J⟩ ⊨ Σ_ST }
Thus, M₁₃ defines the composition of the binary relations of the instances associated with M₁₂ and M₂₃.
[Diagram: Schema S₁ → Schema S₂ via M₁₂, Schema S₂ → Schema S₃ via M₂₃, and Schema S₁ → Schema S₃ via M₁₃]

10 The Composition of Schema Mappings
Fact: If M = (S₁, S₃, Σ) and M′ = (S₁, S₃, Σ′) are both compositions of M₁₂ and M₂₃, then Σ and Σ′ are logically equivalent.
For this reason:
- We say that M (or M′) is the composition of M₁₂ and M₂₃.
- We write M₁₂ ∘ M₂₃ to denote it.
Definition: The composition query of M₁₂ and M₂₃ is the set Inst(M₁₂) ∘ Inst(M₂₃).

11 Issues in Composition of Schema Mappings
The semantics of composition was the first main issue. Some other key issues:
- Is the language of finite sets of s-t tgds closed under composition? That is, if M₁₂ and M₂₃ are specified by finite sets of s-t tgds, is M₁₂ ∘ M₂₃ also specified by a finite set of s-t tgds?
- If not, what is the “right” language for composing schema mappings?
- What is the complexity of the associated composition query?

12 Composition: Expressibility & Complexity
Σ₁₂ | Σ₂₃ | Σ₁₃ | Composition query
finite set of full s-t tgds φ(x) → ψ(x) | finite set of s-t tgds φ(x) → ∃y ψ(x, y) | finite set of s-t tgds φ(x) → ∃y ψ(x, y) | in PTIME
finite set of s-t tgds φ(x) → ∃y ψ(x, y) | finite set of (full) s-t tgds φ(x) → ψ(x) | may not be definable by any set of s-t tgds, in FO-logic, or in Datalog | in NP; can be NP-complete

13 Enrollments Example
Σ₁₂:
  ∀n ∀c (Takes(n, c) → ∃s Students(n, s))
  ∀n ∀c (Takes(n, c) → Takes₁(n, c))
Σ₂₃:
  ∀n ∀s ∀c (Students(n, s) ∧ Takes₁(n, c) → Enrollments(s, c))
Implied by the composition:
  ∀n ∀c₁ ∀c₂ (Takes(n, c₁) ∧ Takes(n, c₂) → ∃s (Enrollments(s, c₁) ∧ Enrollments(s, c₂)))
But what if Alice takes 3 courses?
Example instances:
  I₁: Takes = {(Alice, Math), (Alice, Art)}
  I₂: Takes₁ = {(Alice, Math), (Alice, Art)}; Students = {(Alice, 1234)}
  I₃: Enrollments = {(1234, Math), (1234, Art)}
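The example can be checked mechanically. The following Python sketch tests whether a pair of instances satisfies a set of tgds; the encoding and the names `matches`, `satisfies`, `implied2`, `implied3`, `J1` and `J3` are assumptions for illustration only. The instances `J1` and `J3` are hypothetical and not from the slides; they are meant to probe the question of what happens when Alice takes three courses.

```python
def matches(atoms, instance, binding=None):
    """Yield every variable binding that maps all atoms into the instance."""
    binding = binding or {}
    if not atoms:
        yield dict(binding)
        return
    (rel, args), rest = atoms[0], atoms[1:]
    for fact in instance.get(rel, set()):
        new = dict(binding)
        if len(fact) == len(args) and all(
            new.setdefault(a, v) == v for a, v in zip(args, fact)
        ):
            yield from matches(rest, instance, new)

def satisfies(source, target, tgds):
    """True if every premise match in source extends to a conclusion match in target."""
    for premise, conclusion in tgds:
        for b in matches(premise, source):
            if next(matches(conclusion, target, b), None) is None:
                return False
    return True

sigma12 = [
    ([("Takes", ("n", "c"))], [("Students", ("n", "s"))]),
    ([("Takes", ("n", "c"))], [("Takes1", ("n", "c"))]),
]
sigma23 = [
    ([("Students", ("n", "s")), ("Takes1", ("n", "c"))], [("Enrollments", ("s", "c"))]),
]
# The two-course dependency from the slide, and its three-course analogue.
implied2 = [([("Takes", ("n", "c1")), ("Takes", ("n", "c2"))],
             [("Enrollments", ("s", "c1")), ("Enrollments", ("s", "c2"))])]
implied3 = [([("Takes", ("n", "c1")), ("Takes", ("n", "c2")), ("Takes", ("n", "c3"))],
             [("Enrollments", ("s", "c1")), ("Enrollments", ("s", "c2")),
              ("Enrollments", ("s", "c3"))])]

I1 = {"Takes": {("Alice", "Math"), ("Alice", "Art")}}
I2 = {"Takes1": {("Alice", "Math"), ("Alice", "Art")}, "Students": {("Alice", "1234")}}
I3 = {"Enrollments": {("1234", "Math"), ("1234", "Art")}}

# Hypothetical instances: three courses, but no single id enrolled in all of them.
J1 = {"Takes": {("Alice", "Math"), ("Alice", "Art"), ("Alice", "History")}}
J3 = {"Enrollments": {("1", "Math"), ("1", "Art"), ("2", "Art"),
                      ("2", "History"), ("3", "Math"), ("3", "History")}}

print(satisfies(I1, I2, sigma12), satisfies(I2, I3, sigma23), satisfies(I1, I3, implied2))
print(satisfies(J1, J3, implied2), satisfies(J1, J3, implied3))
```

The pair (J1, J3) satisfies the two-course dependency but not the three-course one, which suggests why no single dependency with a fixed number of courses captures the composition.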

14 Enrollments Example - continued
There are infinitely many s-t tgds that are implied by the composition.
M₁₂ ∘ M₂₃ = (S₁, S₃, Σ₁₃), where
  Σ₁₃ = { …, ∀n ∀c₁ … ∀cₖ (Takes(n, c₁) ∧ … ∧ Takes(n, cₖ) → ∃s (Enrollments(s, c₁) ∧ … ∧ Enrollments(s, cₖ))), … }
We show that Σ₁₃ is not equivalent to any finite set of s-t tgds.

15 Employee Example
Σ₁₂:
  ∀e (Emp(e) → ∃m Mgr1(e, m))
Σ₂₃:
  ∀e ∀m (Mgr1(e, m) → Mgr(e, m))
  ∀e (Mgr1(e, e) → SelfMgr(e))
Theorem: The composition M₁₂ ∘ M₂₃
- is not definable by any finite set of s-t tgds;
- is not FO-definable;
- is not definable in Datalog.
[Diagram: relations Emp(e) in S₁, Mgr1(e, m) in S₂, and Mgr(e, m), SelfMgr(e) in S₃]

16 Second-Order Tgds
Definition: Let S be a source schema and T a target schema. A second-order tuple-generating dependency (SO tgd) is a formula of the form
  ∃f₁ … ∃fₘ ( (∀x₁ (φ₁ → ψ₁)) ∧ … ∧ (∀xₙ (φₙ → ψₙ)) ), where
- Each fᵢ is a function symbol
- Each φᵢ is a conjunction of atoms from S and equalities of terms
- Each ψᵢ is a conjunction of atoms from T
Theorem: The composition of two finite sets of s-t tgds is always definable by an SO-tgd.

17 Employee Example - revisited
Σ₁₂:
  ∀e (Emp(e) → ∃m Mgr1(e, m))
Σ₂₃:
  ∀e ∀m (Mgr1(e, m) → Mgr(e, m))
  ∀e (Mgr1(e, e) → SelfMgr(e))
Fact: The composition is definable by the SO-tgd
Σ₁₃:
  ∃f ( ∀e (Emp(e) → Mgr(e, f(e))) ∧ ∀e (Emp(e) ∧ (e = f(e)) → SelfMgr(e)) )
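To make the role of the quantified function f concrete, the following Python sketch decides by brute force whether a pair of instances satisfies the SO-tgd above. The function name `satisfies_so_tgd` and the sample data are hypothetical. The search restricts the values of f to values that occur in the instances, which loses nothing here because the first conjunct forces (e, f(e)) to be a Mgr fact.

```python
from itertools import product

def satisfies_so_tgd(emp, mgr, self_mgr):
    """Search for a function f on Emp witnessing the existential quantifier over f."""
    emps = sorted(emp)
    candidates = sorted({m for _, m in mgr} | set(emps))
    for values in product(candidates, repeat=len(emps)):
        f = dict(zip(emps, values))
        # Conjunct 1: Emp(e) -> Mgr(e, f(e))
        # Conjunct 2: Emp(e) and e = f(e) -> SelfMgr(e)
        if all((e, f[e]) in mgr and (f[e] != e or e in self_mgr) for e in emps):
            return True
    return False

# bob manages only himself but is not recorded as a self-manager: no f works.
print(satisfies_so_tgd({"bob"}, {("bob", "bob")}, set()))
# Recording bob in SelfMgr repairs it.
print(satisfies_so_tgd({"bob"}, {("bob", "bob")}, {"bob"}))
# Alternatively, a second manager lets f(bob) avoid the equality e = f(e).
print(satisfies_so_tgd({"bob"}, {("bob", "bob"), ("bob", "ann")}, set()))
```

The third call shows the choice that f encodes: the same Mgr(bob, bob) fact is harmless once some other value is available for f(bob).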

18 Composing SO-Tgds and Data Exchange
Theorem:
- The composition of two SO-tgds is definable by an SO-tgd.
- There is a polynomial-time algorithm for composing SO-tgds.
- The chase procedure can be extended to schema mappings specified by SO-tgds, so that it produces universal solutions in polynomial time.
- For schema mappings specified by SO-tgds, the certain answers of target conjunctive queries are polynomial-time computable.

19 Synopsis of Schema Mapping Composition
- s-t tgds are not closed under composition.
- SO-tgds form a well-behaved fragment of second-order logic.
  - SO-tgds are closed under composition; thus, they are a “good” language for composing schema mappings.
  - SO-tgds are “chasable”:
    - Polynomial-time data exchange with universal solutions
    - Polynomial-time computation of certain answers of target conjunctive queries
- SO-tgds form the basis of the schema-mapping language used in the Criollo metadata management system.

20 "The notion of composition of maps leads to the most natural account of fundamental notions of mathematics, from multiplication, addition, and exponentiation, through the basic notions of logic."
Conceptual Mathematics by F.W. Lawvere and S.H. Schanuel

21 Reduction from 3-Colorability
Σ₁₂:
  ∀x ∀y (E(x, y) → ∃u ∃v (C(x, u) ∧ C(y, v)))
  ∀x ∀y (E(x, y) → F(x, y))
Σ₂₃:
  ∀x ∀y ∀u ∀v (C(x, u) ∧ C(y, v) ∧ F(x, y) → D(u, v))
Let I₃ = { (r, g), (g, r), (b, r), (r, b), (g, b), (b, g) }
Given G = (V, E), let I₁ be the instance over S₁ consisting of the edge relation E of G.
G is 3-colorable iff ⟨I₁, I₃⟩ ∈ Inst(M₁₂) ∘ Inst(M₂₃)
[Dawar98] showed that 3-colorability is not expressible in L^ω_∞ω.
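The reduction can be exercised on small graphs with the Python sketch below. It searches for an intermediate instance I₂ = (C, F) and assumes, as a simplification, that it is enough to try minimal candidates: F equal to E, and exactly one colour from {r, g, b} per vertex. This is reasonable because adding facts to I₂ only adds obligations under Σ₂₃, but it is an assumption of the sketch, as are the names `COLORS`, `I3_D` and `in_composition`.

```python
from itertools import product

COLORS = ("r", "g", "b")
# I3: the relation D holds exactly for pairs of distinct colours.
I3_D = {(u, v) for u in COLORS for v in COLORS if u != v}

def in_composition(edges):
    """Search for an I2 = (C, F) witnessing that <I1, I3> is in the composition."""
    vertices = sorted({x for edge in edges for x in edge})
    F = set(edges)  # minimal F satisfying E(x,y) -> F(x,y)
    for colouring in product(COLORS, repeat=len(vertices)):
        C = set(zip(vertices, colouring))  # one colour per vertex
        # Sigma23: C(x,u) and C(y,v) and F(x,y) -> D(u,v)
        if all((u, v) in I3_D for (x, u) in C for (y, v) in C if (x, y) in F):
            return True
    return False

triangle = [(1, 2), (2, 3), (1, 3)]
k4 = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
print(in_composition(triangle), in_composition(k4))
```

The search is exponential in the number of vertices, in line with the NP-completeness claim for the composition query; it is meant only as a check on tiny inputs.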

22 Algorithm Compose(M₁₂, M₂₃)
Input: Two schema mappings M₁₂ and M₂₃
Output: A schema mapping M₁₃ = M₁₂ ∘ M₂₃
Step 1: Split up tgds in Σ₁₂ and Σ₂₃
  C₁₂ = { Emp(e) → Mgr1(e, f(e)) }
  C₂₃ = { Mgr1(e, m) → Mgr(e, m);  Mgr1(e, e) → SelfMgr(e) }
Step 2: Compose C₁₂ with C₂₃
  χ₁: Emp(e₀) ∧ (e = e₀) ∧ (m = f(e₀)) → Mgr(e, m)
  χ₂: Emp(e₀) ∧ (e = e₀) ∧ (e = f(e₀)) → SelfMgr(e)
Step 3: Construct M₁₃
  Return M₁₃ = (S₁, S₃, Σ₁₃), where Σ₁₃ = ∃f ( ∀e₀ ∀e ∀m χ₁ ∧ ∀e₀ ∀e χ₂ )
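Step 2 can be sketched in Python as a purely syntactic rewriting. This is an illustrative sketch under several assumptions, not the authors' implementation: Step 1 is taken as already done, each C₁₂ rule has a single conclusion atom, variables are strings, a function term is a tuple such as ("f", "e"), variables of C₁₂ are renamed apart by appending the index of the atom being replaced, and the names `rename`, `show` and `compose_step2` are hypothetical.

```python
def rename(term, suffix):
    """Rename variables apart; a tuple (fname, arg, ...) is a function term."""
    if isinstance(term, tuple):
        return (term[0],) + tuple(rename(a, suffix) for a in term[1:])
    return term + suffix

def show(term):
    """Render a variable or function term as text."""
    if isinstance(term, tuple):
        return f"{term[0]}({', '.join(show(a) for a in term[1:])})"
    return term

def compose_step2(c12, c23):
    """Replace every S2 atom in a C23 premise by a C12 premise plus equalities."""
    rules = []
    for premise23, conclusion23 in c23:
        rewritings = [[]]  # each rewriting is a list of conjuncts (strings)
        for n, (rel, args) in enumerate(premise23):
            extended = []
            for conjuncts in rewritings:
                for premise12, (rel12, terms12) in c12:
                    if rel12 != rel:
                        continue
                    suffix = str(n)
                    source_atoms = [
                        f"{r}({', '.join(show(rename(t, suffix)) for t in ts)})"
                        for r, ts in premise12
                    ]
                    equalities = [
                        f"({z} = {show(rename(t, suffix))})"
                        for z, t in zip(args, terms12)
                    ]
                    extended.append(conjuncts + source_atoms + equalities)
            rewritings = extended
        rules += [" ∧ ".join(c) + " → " + conclusion23 for c in rewritings]
    return rules

# Employee example after Step 1 (Skolem term f(e) for the manager).
c12 = [([("Emp", ("e",))], ("Mgr1", ("e", ("f", "e"))))]
c23 = [
    ([("Mgr1", ("e", "m"))], "Mgr(e, m)"),
    ([("Mgr1", ("e", "e"))], "SelfMgr(e)"),
]
for rule in compose_step2(c12, c23):
    print(rule)
```

For the Employee example this produces implications of the shape listed under Step 2; Step 3 would then quantify the function symbol existentially and the remaining variables universally.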

23 Data Exchange with SO tgds Example
Let M = (S, T, Σ_ST), where Σ_ST is:
  ∃f ( ∀x ∀y (R(x, y) → U(x, y, f(x))) ∧ ∀x ∀x′ ∀y ∀y′ (R(x, y) ∧ R(x′, y′) ∧ (f(x) = f(x′)) → T(y, y′)) )
R = {(a, b), (a, c), (d, e)}
U = {(a, b, f(a)), (a, c, f(a)), (d, e, f(d))}
T = {(b, b), (b, c), (c, b), (c, c), (e, e)}

24 Data Exchange with SO tgds Example
Let M = (S, T, Σ_ST), where Σ_ST is:
  ∃f ( ∀x ∀y (R(x, y) → U(x, y, f(x))) ∧ ∀x ∀x′ ∀y ∀y′ (R(x, y) ∧ R(x′, y′) ∧ (f(x) = f(x′)) → T(y, y′)) )
R = {(a, b), (a, c), (d, e)}
U = {(a, b, N₀), (a, c, N₀), (d, e, N₁)}
T = {(b, b), (b, c), (c, b), (c, c), (e, e)}
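The two slides above can be reproduced with a short Python sketch that is hard-wired to this one SO tgd. It assumes that each distinct term f(x) is replaced by its own labeled null and that f(x) = f(x′) holds exactly when the two terms are syntactically identical; the function name `chase_so_tgd` and the null names N0, N1 are choices made for this sketch.

```python
from itertools import count

def chase_so_tgd(R):
    """Chase R with: R(x,y) -> U(x,y,f(x));  R(x,y), R(x',y'), f(x)=f(x') -> T(y,y')."""
    fresh = count()
    f = {}
    for x, _ in sorted(R):
        if x not in f:
            f[x] = f"N{next(fresh)}"  # one labeled null per distinct term f(x)
    U = {(x, y, f[x]) for x, y in R}
    # f(x) = f(x') is decided by comparing the nulls assigned to the two terms
    T = {(y, y2) for x, y in R for x2, y2 in R if f[x] == f[x2]}
    return U, T

R = {("a", "b"), ("a", "c"), ("d", "e")}
U, T = chase_so_tgd(R)
print(sorted(U))
print(sorted(T))
```

On the slide's relation R this yields the same U and T as shown, with N0 standing for f(a) and N1 for f(d).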
