1 Provenance Semirings T.J. Green, G. Karvounarakis, V. Tannen University of Pennsylvania Principles of Provenance (PrOPr) Philadelphia, PA June 26, 2007.

Slides:



Advertisements
Similar presentations
1 Datalog: Logic Instead of Algebra. 2 Datalog: Logic instead of Algebra Each relational-algebra operator can be mimicked by one or several Database Logic.
Advertisements

Containment of Conjunctive Queries on Annotated Relations TJ Green University of Pennsylvania Symposium on Database Provenance University of Edinburgh.
Presented by: Ms. Maria Estrellita D. Hechanova, ECE
Daniel Deutch Tel Aviv Univ. Tova Milo Tel Aviv Univ. Sudeepa Roy Univ. of Washington Val Tannen Univ. of Pennsylvania.
ANHAI DOAN ALON HALEVY ZACHARY IVES CHAPTER 14: DATA PROVENANCE PRINCIPLES OF DATA INTEGRATION.
Provenance analysis of algorithms 10/1/13 V. Tannen University of Pennsylvania 1WebDam someTowards ?
CPSC 504: Data Management Discussion on Chandra&Merlin 1977 Laks V.S. Lakshmanan Dept. of CS UBC.
Queries with Difference on Probabilistic Databases Sanjeev Khanna Sudeepa Roy Val Tannen University of Pennsylvania 1.
Basic Structures: Sets, Functions, Sequences, Sums, and Matrices
Annotated XML: Queries and Provenance Nate Foster T.J. Green Val Tannen University of Pennsylvania PODS ’08 Vancouver, B.C. June 11, 2008.
Efficient Query Evaluation on Probabilistic Databases
Containment of Conjunctive Queries on Annotated Relations Todd J. Green University of Pennsylvania March 25, ICDT 09, Saint Petersburg.
CMPT 354, Simon Fraser University, Fall 2008, Martin Ester 52 Database Systems I Relational Algebra.
1 Lecture 11: Provenance and Data privacy December 8, 2010 Dan Suciu -- CSEP544 Fall 2010.
An Overview of Data Provenance
1 2. Constraint Databases Next level of data abstraction: Constraint level – finitely represents by constraints the logical level.
1 Section 10.1 Boolean Functions. 2 Computers & Boolean Algebra Circuits in computers have inputs whose values are either 0 or 1 Mathematician George.
Cs3431 Relational Algebra : #I Based on Chapter 2.4 & 5.1.
1 Provenance in O RCHESTRA T.J. Green, G. Karvounarakis, Z. Ives, V. Tannen University of Pennsylvania Principles of Provenance (PrOPr) Philadelphia, PA.
Propositional Calculus Math Foundations of Computer Science.
Boolean Functions.
Section Section Summary Introduction to Boolean Algebra Boolean Expressions and Boolean Functions Identities of Boolean Algebra Duality The Abstract.
Approximated Provenance for Complex Applications
Boolean Algebra – I. Outline  Introduction  Digital circuits  Boolean Algebra  Two-Valued Boolean Algebra  Boolean Algebra Postulates  Precedence.
Systems Architecture I1 Propositional Calculus Objective: To provide students with the concepts and techniques from propositional calculus so that they.
Logic Gates & Boolean Algebra Chin-Sung Lin Eleanor Roosevelt High School.
Combinational Logic 1.
Boolean Algebra and Computer Logic Mathematical Structures for Computer Science Chapter 7.1 – 7.2 Copyright © 2006 W.H. Freeman & Co.MSCS Slides Boolean.
CSE314 Database Systems The Relational Algebra and Relational Calculus Doç. Dr. Mehmet Göktürk src: Elmasri & Navanthe 6E Pearson Ed Slide Set.
Christopher Re and Dan Suciu University of Washington Efficient Evaluation of HAVING Queries on a Probabilistic Database.
SQL Part I: Standard Queries. COMP-421: Database Systems - SQL Queries I 2 Example Instances sid sname rating age 22 debby debby lilly.
Reconcilable Differences Todd J. GreenZachary G. IvesVal Tannen University of Pennsylvania March 24, ICDT 09, Saint Petersburg.
These courseware materials are to be used in conjunction with Software Engineering: A Practitioner’s Approach, 6/e and are provided with permission by.
Propositional Calculus CS 270: Mathematical Foundations of Computer Science Jeremy Johnson.
CompSci 102 Discrete Math for Computer Science
Webdamlog and Contradictions Daniel Deutch Tel Aviv University Joint work with Serge Abiteboul, Meghyn Bienvenu, Victor Vianu.
Logic Gates & Boolean Algebra Chin-Sung Lin Eleanor Roosevelt High School.
Advanced Relational Algebra & SQL (Part1 )
Extra slides for Chapter 3: Propositional Calculus & Normal Forms Based on Prof. Lila Kari’s slides For CS2209A, 2009 By Dr. Charles Ling;
Discrete Mathematics CS 2610 September Equal Boolean Functions Two Boolean functions F and G of degree n are equal iff for all (x 1,..x n )  B.
Dale Roberts Department of Computer and Information Science, School of Science, IUPUI Dale Roberts, Lecturer Computer Science, IUPUI
© BYU 03 BA1 Page 1 ECEn 224 Boolean Algebra – Part 1.
1 CSE544 Monday April 26, Announcements Project Milestone –Due today Next paper: On the Unusual Effectiveness of Logic in Computer Science –Need.
1 Provenance Semirings T.J. Green, G. Karvounarakis, V. Tannen University of Pennsylvania PODS 2007.
Containment of Relational Queries with Annotation Propagation Wang-Chiew Tan University of California, Santa Cruz.
Provenance for Database Transformations Val Tannen University of Pennsylvania Joint work with J.N. Foster T.J. Green G. Karvounarakis IPAW ’08 Salt Lake.
Update Exchange with Provenance Schemas are related by GLAV schema mappings (tgds) : M4: Domain_Ref(SrcID, 'Interpro', ITAcc), Entry2Meth(ITAcc, DBAcc,
Querying and storing XML
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Relational Algebra Chapter 4, Part A.
Chapter 2 1. Chapter Summary Sets (This Slide) The Language of Sets - Sec 2.1 – Lecture 8 Set Operations and Set Identities - Sec 2.2 – Lecture 9 Functions.
1 Ullman et al. : Database System Principles Notes 6: Query Processing.
1 Working Models for Uncertain Data Anish Das Sarma, Omar Benjelloun, Alon Halevy, Jennifer Widom Stanford InfoLab.
Chapter 12. Chapter Summary Boolean Functions Representing Boolean Functions Logic Gates Minimization of Circuits (not currently included in overheads)
Logic Gates & Boolean Algebra
Digital Technology.
A Course on Probabilistic Databases
Basic Rules Of ALGEBRA.
Boolean Algebra – Part 1 ECEn 224.
STRUCTURE OF PRESENTATION :
Queries with Difference on Probabilistic Databases
Propositional Calculus: Boolean Algebra and Simplification
Set Operations Section 2.2.
Discrete Mathematics CS 2610
Boolean Algebra Introduction CSCI 240
Chapter 2: Intro to Relational Model
Lecture 5 Binary Operation Boolean Logic. Binary Operations Addition Subtraction Multiplication Division.
Logic Logic is a discipline that studies the principles and methods used to construct valid arguments. An argument is a related sequence of statements.
On Provenance of Queries on Linked Web Data
Probabilistic Databases with MarkoViews
Lesson 3.3 Writing functions.
Presentation transcript:

1 Provenance Semirings T.J. Green, G. Karvounarakis, V. Tannen University of Pennsylvania Principles of Provenance (PrOPr) Philadelphia, PA June 26, 2007

2 PROPR 2007 Provenance ● First studied in data warehousing ▪ Lineage [Cui,Widom,Wiener 2000] ● Scientific applications (to assess quality of data) ▪ Why-Provenance [Buneman,Khanna,Tan 2001] ● Our interest: P2P data sharing in the O RCHESTRA system (project headed by Zack Ives) ▪ Trust conditions based on provenance ▪ Deletion propagation

3 PROPR 2007 Annotated relations ● Provenance: an annotation on tuples ● Our observation: propagating provenance/lineage through views is similar to querying ▪ Incomplete Databases (conditional tables) ▪ Probabilistic Databases (independent tuple tables) ▪ Bag Semantics Databases (tuples with multiplicities) ● Hence we look at queries on relations with annotated tuples

4 PROPR 2007 semantics: a set of instances Incomplete databases: boolean C-tables a b c p d b e r f g e s a b c f g e { } I(R)=,,,,,,, ; d b ea b cf g e d b e f g e a b c d b e f g e R a b c d b e boolean variables

5 PROPR 2007 Imielinski & Lipski (1984): queries on C -tables R a b cp d b er f g es a c(p Æ p) Ç (p Æ p) a ep Æ r d cr Æ p d e(r Æ r) Ç (r Æ r) Ç (r Æ s) f e(s Æ s) Ç (s Æ s) Ç (s Æ r) p p Æ r r s = q(R) q(x,z) :- R(x, _,z), R( _, _,z) q(x,z) :- R(x,y, _ ), R( _,y,z) r r rr s union of conjunctive queries (UCQ) a c f e p=true r=false s=true

6 PROPR 2007 Why-provenance/lineage a b cp d b er f g es R Which input tuples contribute to the presence of a tuple in the output? q(R) tuple ids same query a c{p} a e{p,r} d c{p,r} d e{r,s} f e{r,s} [Cui,Widom,Wiener 2000] [Buneman,Khanna,Tan 2001]

7 PROPR 2007 C – tables vs. Why-provenance a c ({p}  {p})  ({p}  {p}) a e {p}  {r} d c {r}  {p} d e ({r}  {r})  ({r}  {r})  ({r}  {s}) f e ({s}  {s})  ({s}  {s})  ({s}  {r}) a c (p Æ p) Ç (p Æ p) a e p Æ r d c r Æ p d e (r Æ r) Ç (r Æ r) Ç (r Æ s) f e (s Æ s) Ç (s Æ s) Ç (s Æ r) c-table calculations Why-provenance calculations The structure of the calculations is the same!

8 PROPR 2007 Another analogy, with bag semantics a b c2 d b e5 f g e1 R tuple multiplicities a c 2 ¢ ¢ 2 a e 2 ¢ 5 d c 5 ¢ 2 d e 5 ¢ ¢ ¢ 1 f e 1 ¢ ¢ ¢ 5 q(R) a c8 a e10 d c10 d e55 f e7 multiplicity calculations same query a c (p Æ p) Ç (p Æ p) a e p Æ r d c r Æ p d e (r Æ r) Ç (r Æ r) Ç (r Æ s) f e (s Æ s) Ç (s Æ s) Ç (s Æ r) c-table calculations The structure of the calculations is the same!

9 PROPR 2007 Abstracting the structure of these calculations These expressions capture the abstract structure of the calculations, which encodes the logical derivation of the output tuples We shall use these expressions as provenance C-tablesBagsWhy-provenanceAbstract join Æ¢[¢ union Ç + [ + a c (p ¢ p) + (p ¢ p) a e p ¢ r d c r ¢ p d e (r ¢ r) + (r ¢ r) + (r ¢ s) f e (s ¢ s) + (s ¢ s) + (s ¢ r) abstract calculations

10 PROPR 2007 Positive K -relational algebra ● We define an RA+ on K -relations: ▪ The ¢ corresponds to join: ▪ The + corresponds to union and projection ▪ 0 and 1 are used for selection predicates ▪ Details in the paper (but recall how we evaluated the UCQ q earlier and we will see another example later)

11 PROPR 2007 RA+ identities imply semiring structure! ● Common RA+ identities ▪ Union and join are associative, commutative ▪ Join distributes over union ▪ etc. (but not idempotence!) These identities hold for RA+ on K -relations iff (K, +, ¢, 0, 1) is a commutative semiring (K,+,0) is a commutative monoid (K, ¢,1) is a commutative monoid ¢ distributes over +, etc

12 PROPR 2007 Calculations on annotated tables are particular cases ( B, Ç, Æ, false, true) usual relational algebra ( N, +, ¢, 0, 1) bag semantics (PosBool(B), Ç, Æ, false, true) boolean C-tables ( P ( ­ ), [, Å, ;, ­ ) probabilistic event tables ( P (X), [, [, ;, ; ) lineage/why-provenance

13 PROPR 2007 Provenance Semirings ● X = {p, r, s, …}: indeterminates (provenance “tokens” for base tuples) ● N [X] : multivariate polynomials with coefficients in N and indeterminates in X ● ( N [X], +, ¢, 0, 1) is the most “general” commutative semiring: its elements abstract calculations in all semirings ● N [X] –relations are the relations with provenance! ▪ The polynomials capture the propagation of provenance through (positive) relational algebra

14 PROPR 2007 A provenance calculation a b cp d b er f g es R a c2p 2 a epr d cpr d e2r 2 + rs f e2s 2 + rs q(R) a c{p} a e{p,r} d c{p,r} d e{r,s} f e{r,s} Why-provenance ● Not just why- but also how-provenance (encodes derivations)! ● More informative than why-provenance same why-provenance, different polynomials q(x,z) :- R(x, _,z), R(_, _,z) q(x,z) :- R(x,y, _), R(_,y,z)

15 PROPR 2007 Further work ● Application: P2P data sharing in the O RCHESTRA system: ▪ Need to express trust conditions based on provenance of tuples ▪ Incremental propagation of deletions ▪ Semiring provenance itself is incrementally maintainable ● Future extensions: ▪ full relational algebra: For difference we need semirings with “proper subtraction” ▪ richer data models: nested relations/complex values, XML