Download presentation

Presentation is loading. Please wait.

Published byMargery Lambert Modified over 2 years ago

1
Annotated XML: Queries and Provenance Nate Foster T.J. Green Val Tannen University of Pennsylvania PODS ’08 Vancouver, B.C. June 11, 2008

2
Need to Track XML Provenance For scientific data processing [Buneman+ 08] – Tree-structured data, heterogeneous sources – XML is the natural data model – Data annotated with source info; annotations need to be propagated during query processing For incomplete/probabilistic data [Senellart&Abiteboul 07] – Query output annotated with Boolean formulas – Annotations indicate logical correlations between source data and output data For data warehousing [Cui+ 00] – Even when data is relational, often have XML views 2

3
Background: Provenance for Relational Algebra Views 3 ABC abc dbe fge AC ac ae dc de fe V := ¼ AC ( ¼ AB (R) ⋈ ¼ BC (R)) source R view V ? ? ? ?

4
Background: Semiring-Annotated Relations [G.,Karvounarakis,Tannen 07] Associate each tuple in database with an annotation from a commutative semiring (K, +, ¢, 0, 1) – + and ¢ are abstract operations Combine and propagate annotations during (positive) relational query processing – ⋈, £, Å combine annotations using ¢ – ¼, [ combine annotations using + – ¾ multiplies annotations by 0 or 1 4

5
Background: Annotated Relations Example 5 ABC abcp dber fges R AB ac2p22p2 aepr dc de2r 2 + rs fe2s 2 + rs V V := ¼ AB (( ¼ AC (R) ⋈ ¼ BC (R)) [ ( ¼ AB (R) ⋈ ¼ BC (R)))

6
Background: Semiring Bestiary ( B, Ç, Æ, ?, > )Set semantics ( N, +, ¢, 0, 1)Bag semantics (PosBool(B), Ç, Æ, ?, > )Incomplete dbs ( P ( ), [, Å, ;, )Probabilistic dbs ( P ( P (X)), [, d, ;, { ; })Why-provenance where A d B := {a [ b : a 2 A, b 2 B} ( N [X], +, ¢, 0, 1)Prov. polynomials “most informative” (universal) 6

7
Our Contribution: Annotated XML Data model: unordered XML data with semiring annotations (K-UXML) Query language: positive, unordered XQuery fragment (K-UXQuery) Semantics: how queries operate on annotated data Correctness – Sanity checks: agrees with encoded relational queries, bag semantics, probabilistic XML,... – Main theorem: commutation with homomorphisms Applications: security, incomplete databases,... 7

8
K-UXML No attributes, no text values, no repeated children (inessential); no order (essential!) Each subtree decorated with a value k from semiring K (1 “neutral,” 0 “not present”) K-collection: a finite set of elements annotated with values from K The child subtrees of a node form a K- collection 8

9
c b c b c ad c ad In NRC K : { h a, { h b, { h a, { h c, {} i y 3, h d, {} i 1 } i 1 } i x 1, h c, {...} i y 1 } i 1 } K-UXML Example 9 a bx1bx1 cy3cy3 cy1cy1 ad a cy2cy2 bx2bx2 d a bc a d 1 1y3y3 x1x1 1 y1y1 y2y2 x2x2 1 ´ Annotations are on elements of K-collections. There are 5 K-collections in this tree (all colored differently). To annotate whole tree, must include in singleton K- collection.

10
K-UXQuery Syntax Based on Core XQuery [W3C] –if... then... else... instead of where – nested for loops instead of complex XPath /a/b/c We added one new construct: annot k p – Construct any K-UXML document via K-UXQuery 10

11
Semantics for K-UXQuery How do annotations propagate through these query constructs? We adopt a principled approach that leads to a compositional semantics and makes previous work on relations a precise special case We do this by translation to Nested Relational Calculus (NRC) with a tree type and annotations (NRC details in paper; illustrated here by example on UXQuery) 11

12
a dudu x b dvdv ewew y c f z,, K-UXQuery Semantics: for -Loops 12 Answer: axax dudu byby dvdv, czcz f, ewew d xu + yv, e yw, f z Computation: axax dudu byby dvdv czcz f, ewew, Source, $S: d xu, d vy, e yw, f z x d u, y d y, y e w, z f Query: for $t in $S return $t/*

13
for -loops Example With K = N 13 Answer: a d b2b2 d2d2, c3c3 f, e d 1+2 ¢ 2 = 5, e 2, f 3 Source, $S: Query: for $t in $S return $t/* d, d, d, d, d, e, e, f, f, f i.e., 5 d’s appear as children in source; 5 d’s in answer a d b d, c f, ed, c f b d, ed i.e., c f,

14
Annotation of result is a sum over products of annotations along paths to root K-UXQuery Semantics: // Operator 14 Source, $S: r c x 1 ¢ y 3 + y 1 ¢ y 2 cy1cy1 d a cy2cy2 bx2bx2 Answer: Query: $S//c a bx1bx1 cy3cy3 cy1cy1 ad a cy2cy2 bx2bx2 d

15
Data annotated with clearance levels from total order C : P < C < S < T < 0 Joint use of data ( ¢ ) requires access to both (max of clearances); alternative use of data (+) requires access to either (min of clearances) ( C, min, max, 0, P) is a commutative semiring p d min(max(P,C,C), max(P,C,S)) e max(P,C,T) Application: Access Control 15 Query: $S/*/* bCbC dCdC cCcC dSdS eTeT a dCdC eTeT p

16
For any given clearance level (e.g., C), want the following diagram to commute: Security Condition: Non-Interference 16 query erase > C a bCbC dCdC cCcC dSdS eTeT p dCdC eTeT p dCdC a dCdC bCbC cCcC

17
Application: Incomplete XML Data annotated with Boolean expressions; tree T represents set of possible worlds Rep(T) 17 T = a b cycy cxcx a d a czcz b d a b c c ad a cb d Rep(T) = a b a d a b c a d a bc ad a b d,,,..., 7 possible worlds

18
Correctness: Possible Worlds 18 For every incomplete tree T, and every UXQuery query q, want this diagram to commute: TRep(T) q(Rep(T)) = Rep(q(T)) q(T)q(T) q q Rep q(Rep(T))

19
Commutation with Homomorphisms Ex: access control h c : C C h c (k) := if k · c then k else 0 Ex: incomplete databases º : Vars B Eval º : PosBool(Vars) B Ex: duplicate elimination ± : N B ± (k) := if k = 0 then ? else > 19 Theorem: Let h : K 1 K 2 be a semiring homomorphism. Then for any UXQuery query q, for any K 1 -UXML document D, we have h(q(D)) = q(h(D)).

20
Provenance is Universal 20 Corollary: The semantics of K-UXQuery evaluation on K- UXML for any commutative semiring K factors through evaluation using provenance polynomials N [X]. i.e., for any K-UXML document D, for any K-UXQuery q, we have q(D) = Eval º (q(D’)) where 1. D’ is obtained by replacing K-annotations in D with fresh variables from X 2. º : X K is the corresponding valuation 3.Eval º : N [X] K is the unique semiring homomorphism such that for the one-variable monomials, Eval º (x) = º (x).

21
Related Work Bag semantics for NRC [Libkin&Wong 97] Incomplete XML [Kanza+ 99, Abiteboul+ 06] Probabilistic XML [Nierman&Jagadish 02, van Keulen+ 05, Abit.&Senellart 06, Sen.&Abit. 07, Hung+ 07] XML provenance [Buneman+ 01] NRC provenance [Hidders+ 07] Soft CSPs [Bistarelli et al] Semiring-annotated XPath [Grahne+ 07] Negation, expressiveness of RA K [Geerts&Poggi 08] 21

22
Conclusion We showed how to annotate unordered XML trees (NRC complex values) with values from a commutative semiring K, and propagate those annotations in queries for a large, positive fragment of XQuery (NRC + srt) We saw novel applications in security and incomplete dbs, made possible by a fundamental property of our framework, commutation with homomorphisms 22

23
Future Work Practical applications based on framework – Access control – Jointly recording provenance, security, multiplicities, uncertainty, etc. (product of semirings is also a semiring!) Query optimization: containment/equivalence w.r.t. annotated semantics depends on K! – In paper, we show K-equivalence for UXQuery is the same as B -equivalence when K is a distributive lattice (e.g., access control, incomplete dbs) 23

24
24

25
XPath Descendant Operator Uses srt // ¤ applied to forest $T translates to [ (x 2 $T) ¼ 1 ((srt(b, s). f) x) where f := let self = Tree(b, [ (x 2 s) { ¼ 2 (x)} in let matches = [ (x 2 s) { ¼ 1 (x)} in (matches [ {self}, self)) //a, similar to above 25

26
K-UXQuery Semantics: Union Sums annotations 26 auau bvbv auau bvbv auau bwbw, Query: return $S, $T a2ua2u bvbv auau bwbw, Source: $S$T Answer: auau bvbv answer has 2 copies of trees distinguished by annotations of children

27
Semantics of Big Union Ordinary NRC: iterates and collects results in set « [ (x 2 e 1 ) e 2 ¬ := [ a 2« e 1 ¬ « e 2 ¬ [x := a] Annotated NRC: sums and multiplies annotations « [ (x 2 e 1 ) e 2 ¬ K (y) := « e 1 ¬ K (a i ) ¢ « e 2 ¬ K [x := a i ] (y) where the support (the set of elements with non-zero annotations) of « e 1 ¬ K is {a 1,..., a n } 27

28
a b2b2 c3c3 c a d r c 2 ¢ 3 + 1 = 7 XPath Example With K = N 28 Source, $S: Answer: Query: element r { $S//c } ´ a b c c a dcc b c a dcc ´ r ccccccc

29
XPath Example With K = N Source, $S: Answer: Query: $S//c a b2b2 c3c3 c ad a c b2b2 d r c 2 ¢ 3+1 = 7 c d a c b2b2 r cccccccc d a cbb i.e., a b c c a d a cb dcc b b c a dcc 29

Similar presentations

OK

XML –Query Languages, Extracting from Relational Databases ADVANCED DATABASES Khawaja Mohiuddin Assistant Professor Department of Computer Sciences Bahria.

XML –Query Languages, Extracting from Relational Databases ADVANCED DATABASES Khawaja Mohiuddin Assistant Professor Department of Computer Sciences Bahria.

© 2017 SlidePlayer.com Inc.

All rights reserved.

Ads by Google

Ppt on retail sales Free ppt on human circulatory system Ppt on pricing policy in economics Free ppt on employee motivation Ppt on power generation by coal Export pdf to ppt online free Ppt on project management in software engineering Ppt on share market in india Ppt on solar energy utilization Ppt on high voltage testing of transformer