LIVE: A lineage-supported, versioned DBMS
Anish Das Sarma, Martin Theobald, Jennifer Widom


1 LIVE: A lineage-supported, versioned DBMS
Anish Das Sarma, Martin Theobald, Jennifer Widom

2 Agenda
LIVE - A lineage-supported, versioned DBMS
- ULDB Data Model and the Trio System: Uncertainty & Lineage
- LIVE Data Model (LDM): Uncertainty, Lineage & Versioning
- Data Modifications: Insert/Delete Tuples, Update Values, Update Confidences
- Query Evaluation: Valid-At vs. Snapshot Queries, Interval Computations, Confidence Computations, Complexity
- Experiments/Conclusions

3 ULDB Data Model
- Different types of uncertainty:
  1. Tuple Alternatives
  2. '?' (Maybe) Annotations
  3. Confidences
- Implementation of the ULDB data model: the Trio System
  - TriQL query language
  - TrioExplorer browser frontend, trioplus client, API
  - Enhanced PostgreSQL backend (SPI)
  - Search for "Stanford Trio"

4 ULDBs – Alternatives
1. Alternatives: uncertainty about attribute values
2. '?' (Maybe) Annotations
3. Confidences
Saw (witness, color, car)
  Amy | red, Honda ∥ red, Toyota ∥ orange, Mazda
Three possible worlds

5 ULDBs – Maybe Annotations
1. Alternatives
2. '?' (Maybe): uncertainty about tuple presence
3. Confidences
Saw (witness, color, car)
  Amy | red, Honda ∥ red, Toyota ∥ orange, Mazda
  Betty | blue, Acura ?
Six possible worlds

6 ULDBs – Confidences
1. Alternatives
2. '?' (Maybe) Annotations
3. Confidences: weighted uncertainty
Saw (witness, color, car)
  Amy | red, Honda 0.5 ∥ red, Toyota 0.3 ∥ orange, Mazda 0.2
  Betty | blue, Acura 0.6 ?
Six possible worlds, each with a probability
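The count of six worlds can be checked with a small Python sketch. It assumes that each tuple's alternatives are mutually exclusive, that different tuples are independent, and that any confidence mass left over (0.4 for Betty) is the probability that the tuple is absent. The function name `possible_worlds` and the dictionary layout are illustrative, not part of Trio.

```python
from itertools import product

# Each entry: witness -> list of (alternative, confidence).
# Leftover confidence mass is read as "tuple absent" (the '?' annotation).
saw = {
    "Amy": [(("red", "Honda"), 0.5), (("red", "Toyota"), 0.3), (("orange", "Mazda"), 0.2)],
    "Betty": [(("blue", "Acura"), 0.6)],
}

def possible_worlds(relation):
    """Enumerate (world, probability) pairs, assuming independent tuples."""
    choices = []
    for witness, alts in relation.items():
        options = [((witness,) + alt, p) for alt, p in alts]
        absent = 1.0 - sum(p for _, p in alts)
        if absent > 1e-9:
            options.append((None, absent))  # the tuple may be missing
        choices.append(options)
    worlds = []
    for combo in product(*choices):
        tuples = frozenset(t for t, _ in combo if t is not None)
        prob = 1.0
        for _, p in combo:
            prob *= p
        worlds.append((tuples, prob))
    return worlds

worlds = possible_worlds(saw)
print(len(worlds))  # 6
```

The probabilities of the six worlds sum to 1 under these assumptions.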

7 ULDBs – Closure
Saw (witness, car)
  Cathy | Mazda ∥ Honda
Drives (person, car)
  Jimmy, Toyota ∥ Jimmy, Mazda
  Billy, Honda ∥ Frank, Honda
  Hank, Honda
Suspects = π_person (Saw ⋈ Drives)
Suspects
  Jimmy ?
  Billy ∥ Frank ?
  Hank ?
Does not (CANNOT) correctly capture the possible worlds in the result!

8 ULDBs – Lineage
ID | Saw (witness, car)
11 | Cathy | Honda ∥ Mazda
ID | Drives (person, car)
21 | Jimmy, Toyota ∥ Jimmy, Mazda
22 | Billy, Honda ∥ Frank, Honda
23 | Hank, Honda
Suspects = π_person (Saw ⋈ Drives)
ID | Suspects
31 | Jimmy ?
32 | Billy ∥ Frank ?
33 | Hank ?
λ(31) = (11,2) ∧ (21,2)
λ(32,1) = (11,1) ∧ (22,1); λ(32,2) = (11,1) ∧ (22,2)
λ(33) = (11,1) ∧ 23

9 ULDBs – Summary
Uncertainty-Lineage Databases (ULDBs)
1. Alternatives
2. '?' (Maybe) Annotations
3. Confidences
4. Lineage
ULDBs are closed and complete

10 Lineage & Confidences
- Lineage alone can be used to compute the confidence of a result tuple.
- #P-complete for general Boolean formulas
- Approximation algorithms: Luby-Karp, etc.
ID | Saw (witness, car)
11 | (Mary, Honda) : 0.8
12 | (Susan, Honda) : 0.9
13 | (Betty, Honda) : 0.5
Select distinct car from Saw;
ID | SuspectCars (car)
21 | Honda : 0.99
λ(21) = (11 ∨ 12 ∨ 13)
P(21) = 1 – (1-0.8) × (1-0.9) × (1-0.5) = 0.99
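As a rough illustration of computing a confidence from lineage alone, the Python sketch below evaluates a lineage formula by brute force over all truth assignments of the base tuples, assuming the base tuples are independent. The enumeration is exponential in the number of base tuples, which is in line with the #P-completeness remark. The tuple IDs 12 and 13 and the confidences 0.8 and 0.9 are read off the P(21) formula on the slide; the function name `confidence` is an illustrative choice.

```python
from itertools import product
from math import prod

def confidence(lineage, base_conf):
    """Exact confidence of a lineage formula by enumerating all assignments.

    lineage:   function mapping {tuple_id: bool} to bool
    base_conf: {tuple_id: confidence}, base tuples assumed independent
    """
    ids = sorted(base_conf)
    total = 0.0
    for bits in product([False, True], repeat=len(ids)):
        world = dict(zip(ids, bits))
        if lineage(world):
            total += prod(
                base_conf[i] if world[i] else 1.0 - base_conf[i] for i in ids
            )
    return total

base = {11: 0.8, 12: 0.9, 13: 0.5}
# Lineage of SuspectCars tuple 21: 11 OR 12 OR 13
p21 = confidence(lambda w: w[11] or w[12] or w[13], base)
print(round(p21, 2))  # 0.99
```

For a pure disjunction of independent tuples the closed form on the slide gives the same value without enumeration.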

11 Versioning (LDM Data Model)
- Version intervals for tuples
  - Contiguous version numbers 0, …, ∞
  - Database has current version v_D
  - Tuples have validity intervals [s, e]
- Valid-At Queries: Select * from Photo valid-at 2;
- Snapshot Queries: View Photo at 2;
- Possible Worlds: LDM databases encode lists of sets of possible worlds.
ID | Photo (Number, Name) at version 2
11 | (1, Amy) [0, 1]
12 | (1, Bob) [0, ∞] : 0.6
   | (2, Carl) [0, 1]
   | (3, Dale) [1, 1] : 0.1
Result of the valid-at query (interval kept):
12 | (1, Bob) [0, ∞] : 0.6
Result of the snapshot query (interval dropped):
12 | (1, Bob) : 0.6
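The difference between the two query forms can be sketched in a few lines of Python. This is only an illustration of the semantics suggested by the slide (valid-at keeps the version interval, a snapshot drops it); the row layout and the function names `valid_at` and `snapshot` are assumptions, and confidences are left out.

```python
import math

INF = math.inf  # open-ended interval end

# (tuple values, start version, end version), taken from the Photo example
photo = [
    ((1, "Amy"), 0, 1),
    ((1, "Bob"), 0, INF),
    ((2, "Carl"), 0, 1),
    ((3, "Dale"), 1, 1),
]

def valid_at(rows, v):
    """Rows whose interval contains version v, with their intervals."""
    return [(t, s, e) for t, s, e in rows if s <= v <= e]

def snapshot(rows, v):
    """The plain relation as it looked at version v (no intervals)."""
    return [t for t, s, e in rows if s <= v <= e]

print(valid_at(photo, 2))  # [((1, 'Bob'), 0, inf)]
print(snapshot(photo, 2))  # [(1, 'Bob')]
```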

12 Data Modifications – Insert
- Insert Tuple: insert t with version [v_D + 1, ∞]
- commit; increase v_D
ID | People (Name, State, Job), versions 0 to 2
21 | (Bob, NY, Analyst) [0, ∞]
   | (Carl, IL, Teacher) [0, ∞]
   | (David, PA, Manager) [0, ∞]
   | (Frank, CA, Eng.) [1, ∞] : 0.3   (1)
25 | (David, PA, CEO) [2, ∞] : 0.3   (2)

13 Data Modifications – Delete
- Insert Tuple: insert t with version [v_D + 1, ∞]
- Delete Tuple: set end(t) to v_D
- commit; increase v_D
ID | People (Name, State, Job), version 2 to version 3
21 | (Bob, NY, Analyst) [0, ∞]
   | (Carl, IL, Teacher) [0, ∞], becomes [0, 2] : 1.0   (3)
   | (David, PA, Manager) [0, ∞]
   | (Frank, CA, Eng.) [1, ∞]
   | (David, PA, CEO) [2, ∞]

14 Data Modifications – Update
- Insert Tuple: insert t with version [v_D + 1, ∞]
- Delete Tuple: set end(t) to v_D
- Update Value: set end(t) to v_D, then insert t' with version [v_D + 1, ∞]
- commit; increase v_D
ID | People (Name, State, Job), version 3 to version 4
21 | (Bob, NY, Analyst) [0, ∞], becomes [0, 3] : 1.0   (4)
   | (Carl, IL, Teacher) [0, 2]
   | (David, PA, Manager) [0, ∞]
   | (Frank, CA, Eng.) [1, ∞]
   | (David, PA, CEO) [2, ∞]
   | (Bob, CA, Student) [4, ∞]   (4)

15 Data Modifications – Update
- Insert Tuple: insert t with version [v_D + 1, ∞]
- Delete Tuple: set end(t) to v_D
- Update Value: set end(t) to v_D, then insert t' with version [v_D + 1, ∞]
- Update Probability: set end(t) to v_D, then insert t' = t with probability p' and version [v_D + 1, ∞]
- commit; increase v_D
ID | People (Name, State, Job), version 4 to version 5
21 | (Bob, NY, Analyst) [0, 3]
   | (Carl, IL, Teacher) [0, 2]
   | (David, PA, Manager) [0, ∞]
   | (Frank, CA, Eng.) [1, ∞]
   | (David, PA, CEO) [2, ∞]
   | (Bob, CA, Student) [4, ∞] : 0.3, becomes [4, 4] : 0.3   (5)
   | (Bob, CA, Student) [5, ∞]   (5)

16 Data Modifications – Summary
- Insert Tuple: insert t with version [v_D + 1, ∞]
- Delete Tuple: set end(t) to v_D
- Update Value: set end(t) to v_D, then insert t' with version [v_D + 1, ∞]
- Update Probability: set end(t) to v_D, then insert t' = t with probability p' and version [v_D + 1, ∞]
- Possible worlds: updates may create duplicate worlds, which are merged (at any version v).
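The four modification rules can be replayed on the People example with a small Python sketch. It is a toy model under stated assumptions, not the LIVE implementation: the class `VersionedTable` and its method names are hypothetical, the initial rows are assumed to be loaded at version 0, and most confidence values are placeholders because the slides do not show them legibly.

```python
import math
from dataclasses import dataclass

INF = math.inf

@dataclass
class Row:
    values: tuple
    conf: float
    start: int
    end: float = INF

class VersionedTable:
    def __init__(self, initial_rows=()):
        self.version = 0  # current database version v_D
        self.rows = [Row(v, c, 0) for v, c in initial_rows]

    def _current(self, values):
        # The open-ended row for these values is the one still valid.
        for row in self.rows:
            if row.values == values and row.end == INF:
                return row
        raise KeyError(values)

    def insert(self, values, conf):
        self.rows.append(Row(values, conf, self.version + 1))

    def delete(self, values):
        self._current(values).end = self.version

    def update_value(self, old, new, conf):
        self.delete(old)        # close the old tuple at v_D
        self.insert(new, conf)  # new tuple starts at v_D + 1

    def update_confidence(self, values, conf):
        self.delete(values)
        self.insert(values, conf)

    def commit(self):
        self.version += 1

# Placeholder confidences (1.0, 0.5) are assumptions, not slide values.
people = VersionedTable([
    (("Bob", "NY", "Analyst"), 1.0),
    (("Carl", "IL", "Teacher"), 1.0),
    (("David", "PA", "Manager"), 1.0),
])
people.insert(("Frank", "CA", "Eng."), 0.3); people.commit()   # step (1)
people.insert(("David", "PA", "CEO"), 0.3); people.commit()    # step (2)
people.delete(("Carl", "IL", "Teacher")); people.commit()      # step (3)
people.update_value(("Bob", "NY", "Analyst"),
                    ("Bob", "CA", "Student"), 0.3); people.commit()  # step (4)
people.update_confidence(("Bob", "CA", "Student"), 0.5); people.commit()  # step (5)

intervals = [(r.values[0], r.values[2], r.start, r.end) for r in people.rows]
```

After the five commits the intervals match those on the slides: Carl ends at 2, Bob the Analyst at 3, and Bob the Student appears once with [4, 4] and once with [5, ∞].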

17 Query Evaluation
1) Data Computation (regular SQL, including lineage)
2) Interval Computation (stored procedure)
[Diagram: the database D is an encoding of possible worlds D_1, D_2, …, D_n at each of the versions (0), (1), …, (v_D). Applying Q to each world gives Q(D_1), Q(D_2), …, Q(D_n). The implementation of Q (operational semantics) maps D to D + Result, which encodes those result worlds.]

18 Lineage, Confidences & Versions
- Lineage alone can be used to compute the confidence of any result tuple.
- Lineage alone can be used to compute the version interval of any result tuple.

19 Version Interval Computation
- Positive Lineage (disjunctions & conjunctions)
  - In the lineage formula λ(t):
    - Replace every tuple t' by its version interval
    - Replace every ∨ with ∪ and every ∧ with ∩
ID | Saw (witness, car) at version 3
11 | (Mary, Honda) [1, ∞] : 0.8
12 | (Susan, Honda) [2, ∞] : 0.9
13 | (Betty, Honda) [3, ∞] : 0.5
Select distinct car from Saw;
ID | SuspectCars (car) at version 3
21 | (Honda) [1, ∞] : 0.99
λ(21) = (11 ∨ 12 ∨ 13)
P(21) = 1 – (1-0.8) × (1-0.9) × (1-0.5) = 0.99
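The substitution rule for positive lineage can be sketched in Python as follows. The sketch assumes integer version numbers, represents a validity set as a sorted list of disjoint closed intervals, and encodes lineage as nested tuples such as `("or", 11, 12, 13)`; this encoding and the function names are illustrative only.

```python
import math

INF = math.inf

def union(a, b):
    """Union of two interval lists; adjacent integer versions are merged."""
    merged = []
    for s, e in sorted(a + b):
        if merged and s <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], e))
        else:
            merged.append((s, e))
    return merged

def intersect(a, b):
    """Intersection of two interval lists."""
    out = []
    for s1, e1 in a:
        for s2, e2 in b:
            s, e = max(s1, s2), min(e1, e2)
            if s <= e:
                out.append((s, e))
    return sorted(out)

def versions(expr, base):
    """Evaluate positive lineage: 'or' becomes union, 'and' becomes intersection."""
    if not isinstance(expr, tuple):
        return [base[expr]]  # a base tuple id: look up its interval
    op, *args = expr
    parts = [versions(arg, base) for arg in args]
    result = parts[0]
    for part in parts[1:]:
        result = union(result, part) if op == "or" else intersect(result, part)
    return result

saw = {11: (1, INF), 12: (2, INF), 13: (3, INF)}
print(versions(("or", 11, 12, 13), saw))  # [(1, inf)]
```

A conjunction works the same way: with r valid on [0, 10] and s valid on [5, 15], `versions(("and", "r", "s"), {"r": (0, 10), "s": (5, 15)})` yields `[(5, 10)]`.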

20 Version & Confidence Computation
- Positive Lineage (disjunctions & conjunctions)
  - In the lineage formula λ(t):
    - Replace every tuple t' by its version interval
    - Replace every ∨ with ∪ and every ∧ with ∩
ID | Saw (witness, car) at version 3
11 | (Mary, Honda) [1, ∞] : 0.8
12 | (Susan, Honda) [2, ∞] : 0.9
13 | (Betty, Honda) [3, ∞] : 0.5
Select distinct car from Saw;
ID | SuspectCars (car) at version 3
21 | (Honda) [1, ∞] : 0.99
Select distinct car from Saw valid-at 2;
ID | SuspectCars (car) at version 2
21 | (Honda) [1, ∞] : 0.98
λ(21) = (11 ∨ 12)
P(21) = 1 – (1-0.8) × (1-0.9) = 0.98
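The valid-at variant of the example can be reproduced with a short Python sketch: only base tuples whose interval contains the queried version contribute to the disjunction. It assumes independent base tuples and a purely disjunctive lineage (as produced by the `Select distinct` query here); the function name is hypothetical.

```python
from math import inf, prod

# tuple id -> ((start, end), confidence), from the Saw example
saw = {
    11: ((1, inf), 0.8),
    12: ((2, inf), 0.9),
    13: ((3, inf), 0.5),
}

def distinct_confidence_valid_at(rows, v):
    """Confidence of the 'distinct' result tuple at version v.

    Keeps only base tuples valid at v, then combines them as an
    independent disjunction: 1 - product of (1 - p).
    """
    live = [conf for (start, end), conf in rows.values() if start <= v <= end]
    return 1.0 - prod(1.0 - c for c in live)

print(round(distinct_confidence_valid_at(saw, 2), 2))  # 0.98
print(round(distinct_confidence_valid_at(saw, 3), 2))  # 0.99
```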

21 Interval Computations & Query Plans
- Can decouple interval computation from data computation
- Or: push interval computation into query plans, but only when there is no negation.
Without negation:
  Select R.A from R,S Where R.A=S.A;
  r=(a) [0,10], s=(a) [5,15], result t=(a) [5,10]
With negation:
  Select R.A from R EXCEPT ( Select R.A from R EXCEPT Select S.A from S );
  Plan annotations: r=(a) [0,10], s=(a) [5,15], u=(a) [0,10], t=(a) [0,10]

22 Complexity Results
- Positive Lineage (disjunctions & conjunctions)
  - Version interval computation: PTIME (linear)
  - Confidence computation: #P-complete
- Arbitrary Lineage (including negation)
  - Version interval computation:
    - PTIME (linear) if all confidences are known
    - NP-hard if confidences are not known (need to check for idempotence of negated tuples)
  - Confidence computation: #P-complete

23 Experiments – Setup
- Probabilistic & versioned TPC-H setting
  - Queries over the Lineitem and Orders tables with join selectivity varying from 0.1% to 1% (6,000-60,000 and 1,500-15,000 tuples for Lineitem and Orders)
  - Update 0.1% to 1% of the input data
  - Assign probabilities within [0,1] to tuples uniformly at random
- Additional indexes for versioning
  - Two B+-trees on (start, end) and end points of intervals
  - Rewrite valid-at & snapshot queries using WHERE (start ≤ v ≤ end) predicates
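The query rewrite mentioned in the setup can be illustrated with Python's built-in `sqlite3` module standing in for the PostgreSQL backend. Everything schema-related here is an assumption made for the sketch: the column names `v_start` and `v_end`, the integer sentinel used for ∞, and the two index definitions, which only loosely mirror the two B+-trees on the slide.

```python
import sqlite3

INF = 2**31 - 1  # sentinel for an open-ended interval (assumption)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE saw (id INTEGER, witness TEXT, car TEXT, conf REAL, "
    "v_start INTEGER, v_end INTEGER)"
)
conn.executemany(
    "INSERT INTO saw VALUES (?, ?, ?, ?, ?, ?)",
    [
        (11, "Mary", "Honda", 0.8, 1, INF),
        (12, "Susan", "Honda", 0.9, 2, INF),
        (13, "Betty", "Honda", 0.5, 3, INF),
    ],
)
# Indexes over the interval columns, in the spirit of the slide.
conn.execute("CREATE INDEX saw_start_end ON saw (v_start, v_end)")
conn.execute("CREATE INDEX saw_end ON saw (v_end)")

def valid_at(v):
    """'valid-at v' rewritten as a plain WHERE start <= v <= end predicate."""
    cur = conn.execute(
        "SELECT id, witness, car, conf, v_start, v_end FROM saw "
        "WHERE v_start <= ? AND ? <= v_end ORDER BY id",
        (v, v),
    )
    return cur.fetchall()

def snapshot(v):
    """Snapshot at v: same predicate, interval columns dropped."""
    return [row[:4] for row in valid_at(v)]

print([row[0] for row in valid_at(2)])  # [11, 12]
```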

24 Experiments – Results (I)
- Join query: overhead of versioned system vs. non-versioned system (versions not computed)
- Join query: overhead of computing versions (versioned system) (%)

25 Experiments – Results (II)
- Join query: progressive data updates (overwrite multiple times)
- Join query: valid-at queries vs. full version computation

26 Experiments – Results (III)
- Overhead of version computation, different query types (1% data modified)

27 Conclusions
- LDMs are closed and complete
- Generalizes to full ULDB data model (including value alternatives & maybe (?) annotations)
- Can employ lineage also for update propagations
- Supports all of INSERT/DELETE/UPDATE with INTERSECT/UNION/EXCEPT set operations
[Diagram labels: Lineage, Uncertainty, Versioning, DBMS]

