On Propagation of Deletions and Annotations through Views. Wang-Chiew Tan, University of Pennsylvania Database Group. Joint work with Peter Buneman and Sanjeev Khanna.

1 On Propagation of Deletions and Annotations through Views
Wang-Chiew Tan
University of Pennsylvania Database Group
Joint work with Peter Buneman and Sanjeev Khanna

2 Data Annotations (share annotations)
Knowledge sharing through annotations
Annotations on data at various levels of granularity, annotations on annotations
Improve accuracy of data
–data and annotations can be reviewed by independent parties
Annotations:
–loosely structured
Source Data:
–proprietary
–fixed schema
A system that overlays annotations on existing data
big business in scientific databases
Wang-Chiew Tan, Penn Database Group

3 Data Annotations (share annotations)
NYRestaurants (Source Table)
Restaurant          Cost  Type      Zip
Peacock Alley       $$$   French    10022
Bull & Bear         $$$   Seafood   10022
Pacifica            $     Chinese   10013
Soho Kitchen & Bar  $     American  10022
All Restaurants (View 1)
Restaurant          Cost  Type
Peacock Alley       $$$   French
Bull & Bear         $$$   Seafood
Pacifica            $     Chinese
Soho Kitchen & Bar  $     American
Cheap Restaurants (View 2)
Restaurant          Cost  Type
Pacifica            $     Chinese
Soho Kitchen & Bar  $     American
Annotations shown on the slide:
–Yummy chicken curry!!
–Serves fine French Cuisine in elegant setting. Jackets required. Extensive wine list!

4 Data Annotations
Communicate metadata through annotations
–bounce or spread annotations around by piggybacking annotations on data items in the source-query-view model.
An annotation is placed in the view
–where do we place the annotation on source?
Annotation placement problem presented in relational setting
–results carry over to fragments of XML (hierarchical model)
Model:
–Source: Relational Database
–Query
–View: result of query applied on source
Not an easy problem!

5 Location and Propagation Rules
A location is a triple: (R, t, A)
–R is a relation name
–t is a tuple in R
–A is an attribute in schema of R
Propagation Rules:
–Select:
–Project:
–Join:
–Union:
[Figure: one diagram per rule, over relations R, R1, R2 with attributes A1, A2, A3]
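
The rule bodies next to Select, Project, Join, and Union did not survive in this transcript, so the following is only a minimal sketch of one plausible reading: an annotation at location (R, t, A) follows the value it sits on into every output tuple that value is copied to. The helper names (`row_key`, `make_relation`, `_merge`) and the choice to attach the note to Pacifica are assumptions for illustration, not taken from the slides.

```python
from itertools import product


def row_key(row):
    """Hashable identity of a tuple t: its sorted (attribute, value) pairs."""
    return tuple(sorted(row.items()))


def make_relation(rows, notes=None):
    """Map each tuple to {attribute: set of annotations}.

    `notes` maps a location (row_key, attribute) to a list of annotations.
    """
    rel = {row_key(r): {a: set() for a in r} for r in rows}
    for (key, attr), texts in (notes or {}).items():
        rel[key][attr].update(texts)
    return rel


def _merge(out, key, anns):
    """Add annotations to an output tuple, creating the tuple if needed."""
    slot = out.setdefault(key, {a: set() for a, _ in key})
    for attr, texts in anns.items():
        slot[attr] |= texts


def select(rel, pred):
    # Select: (R, t, A) carries over when t satisfies the predicate.
    out = {}
    for key, anns in rel.items():
        if pred(dict(key)):
            _merge(out, key, anns)
    return out


def project(rel, attrs):
    # Project: (R, t, A) carries over to t[attrs] when A is kept.
    out = {}
    for key, anns in rel.items():
        row = dict(key)
        new_key = row_key({a: row[a] for a in attrs})
        _merge(out, new_key, {a: anns[a] for a in attrs})
    return out


def join(r1, r2):
    # Natural join: annotations of both input tuples reach the joined tuple.
    out = {}
    for (k1, a1), (k2, a2) in product(r1.items(), r2.items()):
        row1, row2 = dict(k1), dict(k2)
        if all(row1[a] == row2[a] for a in row1.keys() & row2.keys()):
            new_key = row_key({**row1, **row2})
            _merge(out, new_key, a1)
            _merge(out, new_key, a2)
    return out


def union(r1, r2):
    # Union: annotations from either input reach the matching output tuple.
    out = {}
    for rel in (r1, r2):
        for key, anns in rel.items():
            _merge(out, key, anns)
    return out


rows = [
    {"Restaurant": "Peacock Alley", "Cost": "$$$", "Type": "French", "Zip": "10022"},
    {"Restaurant": "Bull & Bear", "Cost": "$$$", "Type": "Seafood", "Zip": "10022"},
    {"Restaurant": "Pacifica", "Cost": "$", "Type": "Chinese", "Zip": "10013"},
    {"Restaurant": "Soho Kitchen & Bar", "Cost": "$", "Type": "American", "Zip": "10022"},
]
# Assumption: the note is attached to Pacifica's Restaurant field in the source.
source = make_relation(rows, {(row_key(rows[2]), "Restaurant"): ["Yummy chicken curry!!"]})
all_restaurants = project(source, ["Restaurant", "Cost", "Type"])
cheap = select(all_restaurants, lambda t: t["Cost"] == "$")
pacifica = row_key({"Restaurant": "Pacifica", "Cost": "$", "Type": "Chinese"})
print(cheap[pacifica]["Restaurant"])
```

Under these assumptions the note placed on the source table shows up in both views, which is the forward direction; the placement problem on the next slide asks for the reverse.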

6 Annotation Placement Problem
Annotation Placement Problem:
–Given a view V = Q(S) and an annotation A placed in the view V, decide if there is an annotation in the source that, when propagated to the view, produces no other annotation except A.
Q = query
S = data source
–side-effect-free annotation: an annotation on the source that produces no other annotation except A in the view
Diagram: S -> Q -> V=Q(S)
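
As a rough illustration of the definition, the sketch below brute-forces the placement problem for one fixed query, PROJ A,C (R1 JOIN R2), the query that appears on a later slide. It assumes R1 has schema (A, B), R2 has schema (B, C), and that an annotation follows the value it sits on; the function names and the sample data are hypothetical.

```python
def reached(r1, r2, location):
    """View locations of PROJ A,C (R1 JOIN R2) annotated from `location`."""
    rel, tup, attr = location
    hits = set()
    for a, b1 in r1:
        for b2, c in r2:
            if b1 != b2:
                continue
            if (rel, tup, attr) == ("R1", (a, b1), "A"):
                hits.add(((a, c), "A"))
            if (rel, tup, attr) == ("R2", (b2, c), "C"):
                hits.add(((a, c), "C"))
    return hits


def side_effect_free_annotations(r1, r2, target):
    """Source locations whose propagation annotates exactly `target`."""
    # B is projected out, so only A in R1 and C in R2 can reach the view.
    candidates = [("R1", t, "A") for t in r1] + [("R2", t, "C") for t in r2]
    return [loc for loc in candidates if reached(r1, r2, loc) == {target}]


r1 = [("a1", "b1"), ("a1", "b2")]
r2 = [("b1", "c1"), ("b2", "c1"), ("b2", "c2")]
print(side_effect_free_annotations(r1, r2, (("a1", "c1"), "A")))
# [('R1', ('a1', 'b1'), 'A')]
print(side_effect_free_annotations(r1, r2, (("a1", "c2"), "A")))
# []
```

In the second call the only source tuple that reaches (a1, c2) also annotates (a1, c1), so no side-effect-free placement exists for that view location.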

7 A Dichotomy Theorem
Theorem:
(a) It is NP-hard to decide if there is a side-effect-free annotation for a PJ query.
(b) There is a polynomial time algorithm for queries which do not simultaneously contain a Project and a Join operation.
Diagram: S -> Q -> V=Q(S)

8 Project and Join Query
Intuition: PJ can encode 3SAT
Formula: (x1 + x2 + x3) ... (x3 + x5 + x2), with clauses C1 ... Cm
Query: Join, then Project on C1 ... Cm
Query Output: columns C1 ... Cm
T - true, F - false
Relation for C1. Assignment tuples: all possible satisfying assignments for C1
x1  x2  x3  C1
F   F   F   C1
T   F   F   C1
F   T   F   C1
T   T   F   C1
F   F   T   C1
F   T   T   C1
T   T   T   C1
Dummy tuple: d d d
...
Relation for Cm. Assignment tuples: all possible satisfying assignments for Cm
x3  x5  x2  Cm
T   F   F   Cm
F   T   F   Cm
T   T   F   Cm
F   F   T   Cm
T   F   T   Cm
F   T   T   Cm
T   T   T   Cm
Dummy tuple: d d d

9 Project and Join Query
Intuition: PJ can encode 3SAT
Formula: (x1 + x2 + x3) ... (x3 + x5 + x2), with clauses C1 ... Cm
Query: Join, then Project on C1 ... Cm
Output: columns C1 ... Cm
T - true, F - false
Relation for C1 (columns x1, x2, x3, C1). Assignment tuples: all possible satisfying assignments for C1
–FFF, TFF, FTF, TTF, FFT, FTT, TTT, each with C1 in column C1
Relation for Cm (columns x3, x5, x2, Cm). Assignment tuples: all possible satisfying assignments for Cm
–TFF, FTF, TTF, FFT, TFT, FTT, TTT, each with Cm in column Cm
Dummy tuples: d d d in each relation

10 Related Work on Annotations
Superimposed Information (D. Maier, L. Delcambre [WebDB99])
–data placed over existing information, e.g. bookmark files, schema of a database
Annotation Systems
–Annotea (W3C): annotate web pages; location is defined with XPointer
–Multivalent Browser (R. Wilensky, T. A. Phelps, UC Berkeley DL Project): annotate on PDF files, HTML, etc.; robust locations
–BioDAS (Distributed Annotation Server) (L. Stein et al.): annotate on genome sequences; notion of location is genome specific
No one has formally studied the annotation placement problem

11 The classical view deletion problem
A view tuple is to be deleted
–What changes should be made to the source?
Many kinds of view-to-source deletion translations
–e.g. deletion-to-insertion, deletion-to-modification, etc.
  Update Semantics of Relational Views (F. Bancilhon, N. Spyratos, [TODS81])
  On the Correct Translation of Update Operations on Relational Views (U. Dayal, P. Bernstein, [TODS82])
  Algorithms for Translating View Updates to Database Updates for Views Involving Selections, Projections and Joins (A. M. Keller, [PODS85])
–deletion-to-deletion
  Run-Time Translations of View Tuple Deletions Using Data Lineage (Y. Cui, J. Widom, [2001])
  –exploits lineage information to find side-effect-free deletions whenever possible

12 View Deletion Problem (Deletion-to-deletion translation)
View Deletion Problem (minimize view side-effect):
–Given a view V=Q(S) and a tuple t in V, decide if there is a side-effect-free deletion for t
–side-effect-free deletion: a set of source tuples whose removal from the database will only remove t from the view
Model:
–Source: Relational Database
–Query
–View: result of query applied on source
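
To make the definition concrete, here is a small brute-force sketch for the query PROJ A,C (R1 JOIN R2) that a later slide uses. It simply tries every non-empty set of source tuples, so it is exponential in the size of the source and is meant only to illustrate the definition, not as the algorithm from the talk. Schemas R1(A, B) and R2(B, C), the function names, and the sample data are assumptions.

```python
from itertools import combinations


def view(r1, r2):
    """PROJ A,C (R1 JOIN R2) under set semantics."""
    return {(a, c) for a, b1 in r1 for b2, c in r2 if b1 == b2}


def side_effect_free_deletion(r1, r2, t):
    """Smallest set of source tuples whose removal deletes only t, else None."""
    wanted = view(r1, r2) - {t}
    source = [("R1", x) for x in r1] + [("R2", x) for x in r2]
    for size in range(1, len(source) + 1):
        for subset in combinations(source, size):
            gone = set(subset)
            new_r1 = [x for x in r1 if ("R1", x) not in gone]
            new_r2 = [x for x in r2 if ("R2", x) not in gone]
            if view(new_r1, new_r2) == wanted:
                return subset
    return None


r1 = [("a1", "b1"), ("a1", "b2")]
r2 = [("b1", "c1"), ("b2", "c1"), ("b2", "c2")]
print(side_effect_free_deletion(r1, r2, ("a1", "c2")))
# (('R1', ('a1', 'b2')),)

# Here every way of removing (a1, c1) also removes another view tuple.
s1 = [("a1", "b1"), ("a2", "b1")]
s2 = [("b1", "c1"), ("b1", "c2")]
print(side_effect_free_deletion(s1, s2, ("a1", "c1")))
# None
```

In the first example (a1, c1) survives because it is still derivable through b1; in the second, each source tuple that supports (a1, c1) is also the only support for some other view tuple.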

13 A Dichotomy Theorem
Theorem:
(a) It is NP-hard to decide if there is a side-effect-free deletion for a PJ or JU query in normal form.
(b) There is a polynomial time algorithm to find the set of source deletions with minimum side-effects for all other queries, i.e., queries that involve only S,P,U or S,J operators.
Theorem (a) is true even for a constant size PJ query involving only two relations: PROJ A,C (R1 JOIN R2)

14 View Deletion: PJ Query
Theorem: It is NP-hard to decide if there is a side-effect-free deletion for a PJ query in normal form.
Formula: (x1+x2+x3)(x2+x4+x5)(x4+x1+x3)
Query: PROJ A,C (R1 JOIN R2)
R1 (A, B): (a,x1), (a,x2), (a,x3), (a,x4), (a,x5), (c2,x2), (c2,x4), (c2,x5)
R2 (B, C): (x1,c), (x2,c), (x3,c), (x4,c), (x5,c), (x1,c1), (x2,c1), (x3,c1), (x4,c3), (x1,c3), (x3,c3)
View (A, C): (a,c), (a,c1), (a,c3), (c2,c), (c2,c1), (c2,c3)
For each xi, decide whether to delete (a,xi) or (xi,c).

15 Ongoing and Future Work
Implementation of annotation system
–on RDBMS
Special cases of PJ queries with polynomial time algorithm
–PJ queries that do not project out key information
–on XML
–effects on query languages?

16 Do we need an annotation-conscious QL?
The same query in different languages, but different annotation behavior
Emp(Name, Sal, Dept): [Name:Joe, Sal:50K, Dept:Marketing]
Department(Dept, Manager): [Dept:Marketing, Manager:Jane]
Relational Algebra: Emp JOIN Department
SQL: SELECT e.Name, e.Sal, e.Dept, d.Manager FROM Emp e, Department d WHERE e.Dept = d.Dept
Result: [Name:Joe, Sal:50K, Dept:Marketing, Manager:Jane]
Equivalent queries in the same language, but different annotation behavior
Q1 = SELECT e.Name, e.Sal FROM Emp e WHERE e.Sal = 50K
Q2 = SELECT e.Name, 50K AS Sal FROM Emp e WHERE e.Sal = 50K
Result: [Name:Joe, Sal:50K]
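
One way to see why Q1 and Q2 can behave differently is to model each value as a pair (value, annotations) and let annotations travel only with values that are copied from a variable binding. This is a sketch under that assumption, with the further assumption that a constant such as 50K starts with no annotations; the annotation text and helper names are hypothetical.

```python
def annotated(value, *notes):
    """A value paired with the set of annotations attached to it."""
    return (value, frozenset(notes))


def q1(emp):
    # Q1: both output fields are copied from the binding e, annotations included.
    return [{"Name": e["Name"], "Sal": e["Sal"]}
            for e in emp if e["Sal"][0] == "50K"]


def q2(emp):
    # Q2: Sal is rebuilt from the constant 50K, assumed to carry no annotation.
    return [{"Name": e["Name"], "Sal": annotated("50K")}
            for e in emp if e["Sal"][0] == "50K"]


def plain(rows):
    """Strip annotations to compare ordinary query answers."""
    return [{attr: value for attr, (value, _) in row.items()} for row in rows]


emp = [{
    "Name": annotated("Joe"),
    "Sal": annotated("50K", "confirmed by payroll"),  # hypothetical annotation
    "Dept": annotated("Marketing"),
}]

print(plain(q1(emp)) == plain(q2(emp)))  # same answer once annotations are ignored
print(q1(emp)[0]["Sal"][1], q2(emp)[0]["Sal"][1])
```

With annotations stripped the two queries return the same rows, yet only Q1 keeps the note on Sal, which is the kind of discrepancy the slide points at.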

17 Do we need an annotation-conscious QL?
Relational algebra seems to suggest a natural set of propagation rules
SQL seems to suggest another natural propagation rule
–one that is based on variable bindings
Not clear how we extend the semantics of query languages so that annotation propagation is well-behaved.
Should a query language be annotation-conscious?
OR
Should the user be allowed to control which annotation gets propagated to where?

18 End of Talk

