Implementing Mapping Composition Todd J. Green * University of Pennsylvania with Philip A. Bernstein (Microsoft Research), Sergey Melnik (Microsoft Research),

Presentation transcript:

Implementing Mapping Composition
Todd J. Green* (University of Pennsylvania)
with Philip A. Bernstein (Microsoft Research), Sergey Melnik (Microsoft Research), Alan Nash (UC San Diego)
VLDB 2006, Seoul, Korea
*Work partially supported by NSF grants IIS and IIS

2 Schema mappings
Mapping: a correspondence between instances of different schemas
S1: Students(Name, Address)
S2: Names(SID, Name), Addresses(SID, Address)
m: Students = π_{Name,Address}(Names ⋈ Addresses)

3 Applications of mappings: schema evolution
S1: Students(Name, Address, Country)
S2: Names(SID, Name), Addresses(SID, Address, Country)
S3: Names(SID, Name), Local(SID, Address), Foreign(SID, Address, Country)
...
m12: Students = π_{Name,Address,Country}(Names ⋈ Addresses)
m23:
  Names = Names
  σ_{Country=KR}(Addresses) = π_{SID,Address}(Local) × {KR}
  σ_{Country≠KR}(Addresses) = Foreign

4 Applications of mappings: data integration, data exchange
Schemas S1, ..., S_{n−1}, S_n related by mappings m1, ..., mn
Students(Name, Address, Country)
Names(SID, Name), Addresses(SID, Address, Country)
Names(SID, Name), Foreign(SID, Address, Country), Local(SID, Address)
Mappings:
  Students = π_{Name,Address}(Names ⋈ Addresses)
  Names = Names
  Local = π_{SID,Address}(σ_{Country=KR}(Addresses))
  Foreign = σ_{Country≠KR}(Addresses)

5 Requirements for constraints
“First attribute in R is a key for R”
  π_{2,4}(R ⋈_{1=3} R) ⊆ π_{2,2}(R)
“View V equals R joined with S”
  V ⊆ R ⋈ S, V ⊇ R ⋈ S
“Second attribute of R is a foreign key in S”
  π_2(R) ⊆ π_1(S)
  π_{2,4}(S ⋈_{1=3} S) ⊆ π_{2,2}(S)
Data integration, data exchange: GLAV
  R ⋈ S ⊆ T ⋈ U
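
A constraint of this kind is a containment between two relational algebra expressions, so it can be checked directly on a small instance. The sketch below is illustrative only: it assumes relations are Python sets of tuples and that column positions are 1-based. The helper names `project`, `join_eq` and `first_attribute_is_key` are hypothetical and not taken from the talk.

```python
from itertools import product

def project(rel, cols):
    """Projection onto the given 1-based column positions (repeats allowed)."""
    return {tuple(t[c - 1] for c in cols) for t in rel}

def join_eq(left, right, i, j):
    """Join: concatenate tuples and keep those whose columns i and j agree.

    Positions i and j are 1-based and refer to the concatenated tuple.
    """
    return {l + r for l, r in product(left, right)
            if (l + r)[i - 1] == (l + r)[j - 1]}

def first_attribute_is_key(rel):
    """Check the key constraint: project 2,4 of (R join 1=3 R) is contained in project 2,2 of R."""
    return project(join_eq(rel, rel, 1, 3), (2, 4)) <= project(rel, (2, 2))

# Two made-up binary instances: in `bad`, the value 1 appears twice as first attribute.
good = {(1, "Ann"), (2, "Bob")}
bad = {(1, "Ann"), (1, "Bob")}

print(first_attribute_is_key(good), first_attribute_is_key(bad))
```

The join pairs up tuples that share their first attribute; the containment then forces their second attributes to coincide, which is exactly the key condition for a binary relation.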

6 Mapping composition
S1: Students(Name, Address, Country)
S2: Names(SID, Name), Addresses(SID, Address, Country)
S3: Names(SID, Name), Local(SID, Address), Foreign(SID, Address, Country)
m12: Students = π_{Name,Address,Country}(Names ⋈ Addresses)
m23:
  Names = Names
  σ_{Country=KR}(Addresses) = π_{SID,Address}(Local) × {KR}
  σ_{Country≠KR}(Addresses) = Foreign
m12 ∘ m23: Students = π_{Name,Address,Country}(Names ⋈ (π_{SID,Address}(Local) × {KR} ∪ Foreign))

7 Composition is hard
Hard part: write composition in the same language as the input mappings
Depending on language:
  Not always possible
  Not even decidable whether possible
Strategy 1: use powerful (second-order) mapping language closed under composition [FKPT04]
  Not supported by DBMS today
  Expensive to check
  Source-target restriction
Strategy 2: settle for partial solutions [NBM05]
  Containment mappings ⇒ easier integration with DBMS
  The strategy we adopt in this work

8 Our contributions
New algorithm for composition problem
  Incorporates view unfolding and left-composition (new technique)
  Makes best effort in failure cases
  Algebraic rather than logic-based mappings
  Use of monotonicity to handle more operators
  Modular and extensible factoring of algorithm
First implementation of composition
Experimental evaluation

9 Formal definition of composition
Mapping: set of pairs of instances of db schemas
The composition m12 ∘ m23 is the mapping
  { ⟨A,C⟩ : (∃B)(⟨A,B⟩ ∈ m12 and ⟨B,C⟩ ∈ m23) }
where A, B, C are instances of S1, S2, S3
Composition problem: find constraints in the same language as the input mappings giving the composition of the input mappings
Example: S1 = {R}, S2 = {S, T}, S3 = {U, V, W}
  Relations: R(∙,∙,∙), S(∙,∙), T(∙,∙), U(∙,∙,∙), V(∙,∙), W(∙,∙)
  m12: R ⊆ S ⋈ T
  m23: S ⊆ π(U), T = V − W
  R ⊆ S ⋈ T, S ⊆ π(U), T = V − W ⇒ R ⊆ π(U) ⋈ (V − W)
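
Read semantically, the definition is ordinary composition of binary relations over instances. A minimal sketch, assuming each mapping is given extensionally as a finite set of (instance, instance) pairs and each instance is stood in for by a hashable label; the names `compose`, `m12`, `m23` and the labels are illustrative only:

```python
def compose(m12, m23):
    """Pairs (A, C) such that some B has (A, B) in m12 and (B, C) in m23."""
    return {(a, c) for (a, b) in m12 for (b2, c) in m23 if b == b2}

# Instances are represented by labels; real instances would be whole databases.
m12 = {("A1", "B1"), ("A2", "B2")}
m23 = {("B1", "C1"), ("B1", "C2"), ("B3", "C3")}

print(sorted(compose(m12, m23)))
```

Mappings of interest are usually infinite sets of pairs, which is why the composition problem asks for constraints describing the composed mapping rather than for an enumeration like this one.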

10 Best-effort composition problem
Composition not always possible
“Best-effort” composition problem: compute set of constraints equivalent to input constraints, but with as many symbols from S2 eliminated as possible
  R ⊆ U                          R ⊆ V
  π_{1,4}(σ_{2=3}(U × U)) ⊆ U    π_{1,4}(σ_{2=3}(V × V)) ⊆ V
  U ⊆ T                          V ⊆ T
Can eliminate U (cross out left column) or V (right column), but not both [NBM05]

11 Composition algorithm overview
For each relation R in S2
  Try to eliminate R via (1) view unfolding
Replace = by pairs of ⊆, ⊇
For each relation R in S2 not yet eliminated
  Try to eliminate R via (2) left compose
  Else, try to eliminate R via (3) right compose
Output: new constraints and list of relations successfully eliminated

12 (1) View unfolding
Idea: exploit equality constraints (if we have any)
Standard technique: substitute view definition for occurrences of view relation in mappings
  T = V − W, R ⊆ S ⋈ T, T ∪ X ⊆ π(U)
  ⇒ R ⊆ S ⋈ (V − W), (V − W) ∪ X ⊆ π(U)
Body must not mention view relation itself
Doesn’t matter what else is in body
Can substitute everywhere
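
View unfolding can be sketched as substitution on expression trees. The representation below is an assumption made for illustration, not the data structure of the actual tool: an expression is either a relation name (a string) or a tuple `(operator, arg, ...)`, and a constraint is a triple `(lhs, op, rhs)` with `op` equal to `"="` or `"<="`.

```python
def mentions(expr, name):
    """True if relation `name` occurs anywhere in the expression tree."""
    if isinstance(expr, str):
        return expr == name
    return any(mentions(arg, name) for arg in expr[1:])

def substitute(expr, name, body):
    """Replace every occurrence of relation `name` by the expression `body`."""
    if isinstance(expr, str):
        return body if expr == name else expr
    return (expr[0],) + tuple(substitute(arg, name, body) for arg in expr[1:])

def unfold_view(constraints, name):
    """Eliminate `name` using a constraint `name = body`, or return None.

    The body must not mention `name` itself; it may contain any operators.
    """
    for c in constraints:
        lhs, op, rhs = c
        if op == "=" and lhs == name and not mentions(rhs, name):
            rest = [d for d in constraints if d is not c]
            return [(substitute(l, name, rhs), o, substitute(r, name, rhs))
                    for l, o, r in rest]
    return None

# T = V - W together with R <= S join T
constraints = [("T", "=", ("diff", "V", "W")),
               ("R", "<=", ("join", "S", "T"))]
print(unfold_view(constraints, "T"))
```

Because the defining constraint is an equality, the substitution is applied on both sides of every remaining constraint with no monotonicity requirement.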

13 (2) Left compose
“View unfolding” for containment constraints
  π(V) ⊆ R − U, R ⊆ S ⋈ T ⇒ π(V) ⊆ (S ⋈ T) − U
Needs monotonicity of expressions in R
  E1 ⊆ E2(R), R ⊆ E3 ≡ E1 ⊆ E2(E3) if E2(R) is monotone in R (and R not in E3)
Partial check for monotonicity
  “Is S − (T − R) monotone in R?”
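
One way to picture the partial monotonicity check and the substitution step is as a polarity computation over expression trees. This is a hedged sketch, not the algorithm as implemented: expressions are assumed to be relation names or tuples `(operator, arg, ...)`, a constraint `(lhs, rhs)` stands for lhs ⊆ rhs, and the `SIGNS` table and function names are hypothetical.

```python
# +1: operator is monotone in that argument, -1: antimonotone (assumed table).
SIGNS = {"join": (1, 1), "union": (1, 1), "intersect": (1, 1),
         "product": (1, 1), "diff": (1, -1), "project": (1,), "select": (1,)}

def polarity(expr, name):
    """1 if monotone in `name`, -1 if antimonotone, 0 if independent, None if unknown."""
    if isinstance(expr, str):
        return 1 if expr == name else 0
    signs = SIGNS.get(expr[0])
    if signs is None:
        return None  # unknown operator: the check is only partial
    seen = set()
    for sign, arg in zip(signs, expr[1:]):
        p = polarity(arg, name)
        if p is None:
            return None
        if p != 0:
            seen.add(sign * p)
    if len(seen) > 1:
        return None  # mixed polarities: give up
    return seen.pop() if seen else 0

def substitute(expr, name, body):
    if isinstance(expr, str):
        return body if expr == name else expr
    return (expr[0],) + tuple(substitute(arg, name, body) for arg in expr[1:])

def left_compose(constraints, name):
    """Eliminate `name` given exactly one constraint `name <= E3`, else return None."""
    defs = [c for c in constraints if c[0] == name]
    if len(defs) != 1 or polarity(defs[0][1], name) != 0:
        return None
    bound = defs[0][1]
    out = []
    for lhs, rhs in constraints:
        if lhs == name:
            continue
        # `name` may only occur on the right, and only monotonically.
        if polarity(lhs, name) != 0 or polarity(rhs, name) not in (0, 1):
            return None
        out.append((lhs, substitute(rhs, name, bound)))
    return out

# Is S - (T - R) monotone in R?
print(polarity(("diff", "S", ("diff", "T", "R")), "R"))
# project(V) <= R - U together with R <= S join T
print(left_compose([(("project", "V"), ("diff", "R", "U")),
                    ("R", ("join", "S", "T"))], "R"))
```

The check is sound but incomplete: an expression that fails it may still be monotone, in which case this sketch simply declines to eliminate the relation.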

14 Normalization for left compose
Need one constraint of form R ⊆ E1
Use identities to normalize, e.g.:
  R ⊆ E1 and R ⊆ E2 iff R ⊆ E1 ∩ E2
  E1 ∪ E2 ⊆ E3 iff E1 ⊆ E3 and E2 ⊆ E3
  π(E1) ⊆ E2 iff E1 ⊆ E2 × D^r
More identities in paper
After left compose, try to eliminate D
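
The three identities (intersection on the right, union on the left, projection moved across as a product with D^r) can be read as rewrite rules that isolate R on the left. The sketch below assumes the tuple-based expression trees used for illustration, with a constraint `(lhs, rhs)` meaning lhs ⊆ rhs. The projection is kept abstract, and `"D^r"` is a placeholder symbol for the padding columns rather than a computed expression.

```python
def mentions(expr, name):
    if isinstance(expr, str):
        return expr == name
    return any(mentions(arg, name) for arg in expr[1:])

def split(lhs, rhs, name):
    """Yield constraints equivalent to lhs <= rhs, isolating `name` where an identity applies."""
    if isinstance(lhs, tuple) and mentions(lhs, name):
        if lhs[0] == "union":
            # E1 union E2 <= E3  iff  E1 <= E3 and E2 <= E3
            for arg in lhs[1:]:
                yield from split(arg, rhs, name)
            return
        if lhs[0] == "project":
            # project(E1) <= E2  iff  E1 <= E2 x D^r
            yield from split(lhs[1], ("product", rhs, "D^r"), name)
            return
    yield (lhs, rhs)

def left_normalize(constraints, name):
    """Return ((name, bound), other_constraints), or None if `name` is never isolated."""
    pieces = [p for lhs, rhs in constraints for p in split(lhs, rhs, name)]
    bounds = [rhs for lhs, rhs in pieces if lhs == name]
    others = [(lhs, rhs) for lhs, rhs in pieces if lhs != name]
    if not bounds:
        return None
    bound = bounds[0]
    for b in bounds[1:]:
        # R <= E1 and R <= E2  iff  R <= E1 intersect E2
        bound = ("intersect", bound, b)
    return (name, bound), others

# R union X <= E3 together with project(R) <= E2
example = [(("union", "R", "X"), "E3"), (("project", "R"), "E2")]
print(left_normalize(example, "R"))
```

The `D^r` symbol introduced here is what the slide's last line refers to: it has to be eliminated again after left compose.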

15 (3) Right compose
Dual to left compose, from [NBM05]
Example: S ⋈ T ⊆ R, R − U ⊆ π(V) ⇒ (S ⋈ T) − U ⊆ π(V)
Monotonicity check needed here too
Normalization may introduce Skolem functions
  E1 ⊆ π(E2) iff f(E1) ⊆ E2
Must eliminate Skolem functions after composition
  Lots of effort coding this step!

16 User-defined operators
User specifies:
  Monotonicity of operator in its arguments
    “If E1 monotone in R and E2 antimonotone in R or independent of R, then E1 * E2 monotone in R”
    “If E1 monotone in R or independent of R and E2 antimonotone in R, then E1 * E2 monotone in R”
  Identities for normalization
    “E1 * E2 ⊆ E3 iff E1 ⊆ E2 ∪ E3”
User-defined operators and standard relational operators treated uniformly

17 Implementation
12K lines of C# code, command-line tool
  # Test case 13: PODS05 example 2
  SCHEMA R(2), S(2), T(2)
  CONSTRAINTS R <= S, P_{0,2} J_{0,1:1,2} (S S) <= R, S <= T
  ELIMINATE S;
Output:
  P_{0,2} J_{0,1:1,2}(R R) <= R, R <= T
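
To make the shape of such a test case concrete, here is a small reader for the sample above. It is a guess at the surface syntax based only on this one example and is unrelated to the C# tool's real parser. It assumes each keyword starts its own line, that `#` starts a comment, and that constraints are separated by commas outside braces and parentheses. The names `parse_task` and `split_top_level` are hypothetical.

```python
import re

SAMPLE = """\
# Test case 13: PODS05 example 2
SCHEMA R(2), S(2), T(2)
CONSTRAINTS R <= S, P_{0,2} J_{0,1:1,2} (S S) <= R, S <= T
ELIMINATE S;
"""

def split_top_level(text):
    """Split on commas that are not nested inside {...} or (...)."""
    parts, depth, current = [], 0, []
    for ch in text:
        if ch in "{(":
            depth += 1
        elif ch in "})":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    parts.append("".join(current).strip())
    return parts

def parse_task(text):
    """Return the schema (name to arity), the constraints as (lhs, rhs) strings, and the symbols to eliminate."""
    task = {"schema": {}, "constraints": [], "eliminate": []}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        keyword, _, rest = line.partition(" ")
        if keyword == "SCHEMA":
            task["schema"] = {n: int(a) for n, a in re.findall(r"(\w+)\((\d+)\)", rest)}
        elif keyword == "CONSTRAINTS":
            task["constraints"] = [tuple(side.strip() for side in c.split("<="))
                                   for c in split_top_level(rest)]
        elif keyword == "ELIMINATE":
            task["eliminate"] = [s.strip() for s in rest.rstrip(";").split(",")]
    return task

task = parse_task(SAMPLE)
print(task["schema"], task["eliminate"])
```

The operator expressions themselves (`P_{...}`, `J_{...}`) are left as uninterpreted strings, since the sample does not pin down their grammar.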

18 Experimental evaluation
First attempt at a composition benchmark
Schema editing and schema reconciliation scenarios
  “Add a column to R to produce S”: π(R) = S
Measure
  % of symbols eliminated
  Running time
As a function of
  Editing primitives allowed, length of edit sequence, presence/absence of keys, starting schema size, …
Synthetic data

19 Summary of results
Algorithm often effective in eliminating most or even all relation symbols from S2
Running time in subsecond range even for large problems containing hundreds of constraints
Certain schema editing primitives problematic
Key constraints did not reduce effectiveness, although did increase running time (and output size)

20 Schema editing
Random starting schema (30 relations of 2-10 attributes)
100 random edits
100 different runs, sorted by execution time

21 Schema reconciliation (1)
Random schema (30 relations of 2-10 attributes), random edits
Point represents median time of reconciliation step of 500 runs

22 Schema reconciliation (2)
Random schema (variable # relations of 2-10 attributes)
100 random edits
100 different runs, sorted by execution time

23 Related work
[MH03] J. Madhavan, A. Y. Halevy. Composing mappings among data sources. VLDB, 2003.
[FKPT04] R. Fagin, Ph. G. Kolaitis, L. Popa, W.C. Tan. Composing schema mappings: second-order dependencies to the rescue. PODS, 2004.
[NBM05] A. Nash, P. A. Bernstein, S. Melnik. Composition of mappings given by embedded dependencies. PODS, 2005.

24 Conclusion and future work
We motivated and described the mapping composition problem
We presented an implementation of a practical new algorithm for the composition problem
We also presented an experimental evaluation
To do: theoretical analysis of impact of user-defined operators
To do: output constraints from algorithm can be a mess! How to clean up?