Annotated XML: Queries and Provenance Nate Foster T.J. Green Val Tannen University of Pennsylvania PODS ’08 Vancouver, B.C. June 11, 2008.

1 Annotated XML: Queries and Provenance Nate Foster T.J. Green Val Tannen University of Pennsylvania PODS ’08 Vancouver, B.C. June 11, 2008

2 Need to Track XML Provenance
For scientific data processing [Buneman+ 08]
– Tree-structured data, heterogeneous sources
– XML is the natural data model
– Data annotated with source info; annotations need to be propagated during query processing
For incomplete/probabilistic data [Senellart&Abiteboul 07]
– Query output annotated with Boolean formulas
– Annotations indicate logical correlations between source data and output data
For data warehousing [Cui+ 00]
– Even when data is relational, often have XML views

3 Background: Provenance for Relational Algebra Views
V := π_AC(π_AB(R) ⋈ π_BC(R))
source R:
  A B C
  a b c
  d b e
  f g e
view V (the slide marks the provenance of the view tuples with "?"):
  A C
  a c
  a e
  d c
  d e
  f e

4 Background: Semiring-Annotated Relations [G., Karvounarakis, Tannen 07]
Associate each tuple in the database with an annotation from a commutative semiring (K, +, ·, 0, 1)
– + and · are abstract operations
Combine and propagate annotations during (positive) relational query processing
– ⋈, ×, ∩ combine annotations using ·
– π, ∪ combine annotations using +
– σ multiplies annotations by 0 or 1

5 Background: Annotated Relations Example
V := π_AC((π_AC(R) ⋈ π_BC(R)) ∪ (π_AB(R) ⋈ π_BC(R)))
R:
  A B C   annotation
  a b c   p
  d b e   r
  f g e   s
V:
  A C   annotation
  a c   2p²
  a e   pr
  d c   pr
  d e   2r² + rs
  f e   2s² + rs
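The annotations in this example can be recomputed mechanically. The following is a minimal sketch, not the authors' implementation: it assumes polynomials are stored as dictionaries from monomials to coefficients and K-relations as dictionaries from tuples to annotations, and every function name here (`var`, `p_add`, `p_mul`, `row`, `project`, `union`, `join`) is hypothetical. It takes the outer projection to be onto A and C, which is what the listed view tuples (a,c), (a,e), (d,c), (d,e), (f,e) require.

```python
from collections import defaultdict
from itertools import product

# Polynomials in N[X]: dict mapping a monomial (sorted tuple of variable
# names, repeated for powers) to its natural-number coefficient.
def var(name):
    return {(name,): 1}

def p_add(f, g):
    out = defaultdict(int)
    for poly in (f, g):
        for mono, coeff in poly.items():
            out[mono] += coeff
    return dict(out)

def p_mul(f, g):
    out = defaultdict(int)
    for (m1, c1), (m2, c2) in product(f.items(), g.items()):
        out[tuple(sorted(m1 + m2))] += c1 * c2
    return dict(out)

# A K-relation: dict mapping a tuple (frozenset of (attribute, value)
# pairs) to its annotation. Absent tuples implicitly carry annotation 0.
def row(**fields):
    return frozenset(fields.items())

def _accumulate(out, key, k):
    out[key] = p_add(out[key], k) if key in out else k

def project(rel, attrs):          # projection combines annotations with +
    out = {}
    for t, k in rel.items():
        _accumulate(out, frozenset((a, v) for a, v in t if a in attrs), k)
    return out

def union(r1, r2):                # union combines annotations with +
    out = dict(r1)
    for t, k in r2.items():
        _accumulate(out, t, k)
    return out

def join(r1, r2):                 # natural join combines annotations with ·
    out = {}
    for (t1, k1), (t2, k2) in product(r1.items(), r2.items()):
        d1, d2 = dict(t1), dict(t2)
        if all(d1[a] == d2[a] for a in d1.keys() & d2.keys()):
            _accumulate(out, frozenset({**d1, **d2}.items()), p_mul(k1, k2))
    return out

R = {row(A="a", B="b", C="c"): var("p"),
     row(A="d", B="b", C="e"): var("r"),
     row(A="f", B="g", C="e"): var("s")}

AB, BC, AC = {"A", "B"}, {"B", "C"}, {"A", "C"}
V = project(union(join(project(R, AC), project(R, BC)),
                  join(project(R, AB), project(R, BC))), AC)

for t, k in sorted(V.items(), key=lambda item: sorted(item[0])):
    print(dict(t), k)
```

In this encoding the monomial `("p", "p")` with coefficient 2 stands for 2p², so the entry for (d, e) comes out as `{("r", "r"): 2, ("r", "s"): 1}`, that is, 2r² + rs.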

6 Background: Semiring Bestiary
  (𝔹, ∨, ∧, ⊥, ⊤)             Set semantics
  (ℕ, +, ·, 0, 1)             Bag semantics
  (PosBool(B), ∨, ∧, ⊥, ⊤)    Incomplete dbs
  (P(Ω), ∪, ∩, ∅, Ω)          Probabilistic dbs
  (P(P(X)), ∪, ⋓, ∅, {∅})     Why-provenance, where A ⋓ B := {a ∪ b : a ∈ A, b ∈ B}
  (ℕ[X], +, ·, 0, 1)          Provenance polynomials: "most informative" (universal)
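Three of these semirings are easy to write down directly. This is an illustrative sketch only: the `Semiring` record and the names `BOOL`, `NAT`, `WHY`, and `pairwise_union` are assumptions made for the example, and the probabilistic and PosBool instances are left out.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass(frozen=True)
class Semiring:
    zero: Any
    one: Any
    add: Callable[[Any, Any], Any]
    mul: Callable[[Any, Any], Any]

# Set semantics: (B, or, and, false, true)
BOOL = Semiring(False, True, lambda a, b: a or b, lambda a, b: a and b)

# Bag semantics: (N, +, ·, 0, 1)
NAT = Semiring(0, 1, lambda a, b: a + b, lambda a, b: a * b)

# Why-provenance: sets of sets of source identifiers, where the product
# takes the union of every pair of witness sets.
def pairwise_union(A, B):
    return frozenset(a | b for a in A for b in B)

WHY = Semiring(frozenset(), frozenset({frozenset()}),
               lambda A, B: A | B, pairwise_union)

A = frozenset({frozenset({"x"})})
B = frozenset({frozenset({"y"}), frozenset({"z"})})
print(WHY.mul(A, B))   # the witness sets {x, y} and {x, z}
```

Passing the operations around as plain functions keeps query code independent of the choice of K, which is the point of treating + and · abstractly.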

7 Our Contribution: Annotated XML
Data model: unordered XML data with semiring annotations (K-UXML)
Query language: positive, unordered XQuery fragment (K-UXQuery)
Semantics: how queries operate on annotated data
Correctness
– Sanity checks: agrees with encoded relational queries, bag semantics, probabilistic XML, ...
– Main theorem: commutation with homomorphisms
Applications: security, incomplete databases, ...

8 K-UXML
No attributes, no text values, no repeated children (inessential); no order (essential!)
Each subtree decorated with a value k from semiring K (1 "neutral," 0 "not present")
K-collection: a finite set of elements annotated with values from K
The child subtrees of a node form a K-collection

9 K-UXML Example
[Figure: an annotated tree with root a. The root has children b^x1 and c^y1; b^x1 has a child a with children c^y3 and d; c^y1 has children d and a, the latter with children c^y2 and b^x2. Unmarked nodes carry annotation 1.]
In NRC_K: { ⟨a, { ⟨b, { ⟨a, { ⟨c, {}⟩^y3, ⟨d, {}⟩^1 }⟩^1 }⟩^x1, ⟨c, {...}⟩^y1 }⟩^1 }
Annotations are on elements of K-collections. There are 5 K-collections in this tree (all colored differently). To annotate the whole tree, it must be included in a singleton K-collection.

10 K-UXQuery Syntax
Based on Core XQuery [W3C]
– if ... then ... else ... instead of where
– nested for loops instead of complex XPath /a/b/c
We added one new construct: annot k p
– Construct any K-UXML document via K-UXQuery

11 Semantics for K-UXQuery
How do annotations propagate through these query constructs?
We adopt a principled approach that leads to a compositional semantics and makes previous work on relations a precise special case
We do this by translation to Nested Relational Calculus (NRC) with a tree type and annotations (NRC details in paper; illustrated here by example on UXQuery)

12 K-UXQuery Semantics: for-Loops
Source, $S: a^x with child d^u; b^y with children d^v, e^w; c^z with child f
Query: for $t in $S return $t/*
Computation: x · {d^u}, y · {d^v, e^w}, z · {f}, which gives d^{x·u}, d^{y·v}, e^{y·w}, f^z
Answer: d^{x·u + y·v}, e^{y·w}, f^z

13 for-Loops Example With K = ℕ
Source, $S: a with child d; b^2 with children d^2, e; c^3 with child f
i.e., a (d); b (d, d, e); b (d, d, e); c (f); c (f); c (f)
Query: for $t in $S return $t/*
Answer: d^{1+2·2 = 5}, e^2, f^3
i.e., d, d, d, d, d, e, e, f, f, f
5 d's appear as children in source; 5 d's in answer
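The bag-semantics example can be checked with a few lines of code. This is a sketch under stated assumptions rather than the paper's NRC translation: a K-collection is modeled as a dictionary from trees to annotations, a tree as a pair of a label and a frozen set of (child, annotation) pairs, and the names `tree` and `for_children` are hypothetical.

```python
import operator

def tree(label, children=None):
    """Build a hashable tree; `children` maps child trees to annotations."""
    return (label, frozenset((children or {}).items()))

def for_children(source, add, mul):
    """Sketch of `for $t in $S return $t/*` over a K-collection.

    Each child's annotation is multiplied by its parent's annotation,
    and equal children reached through different parents are summed.
    """
    out = {}
    for (_, kids), k in source.items():
        for child, kc in kids:
            term = mul(k, kc)
            out[child] = add(out[child], term) if child in out else term
    return out

d, e, f = tree("d"), tree("e"), tree("f")
S = {
    tree("a", {d: 1}): 1,         # a with one d
    tree("b", {d: 2, e: 1}): 2,   # b^2 with d^2 and e
    tree("c", {f: 1}): 3,         # c^3 with f
}

answer = for_children(S, operator.add, operator.mul)
print(answer[d], answer[e], answer[f])   # 5 2 3
```

The value for d is 1·1 + 2·2 = 5, matching the count of d's among the children in the expanded source.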

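14 K-UXQuery Semantics: // Operator
Annotation of result is a sum over products of annotations along paths to root
Source, $S: a with children b^x1 and c^y1; b^x1 has a child a with children c^y3 and d; c^y1 has children d and a, the latter with children c^y2 and b^x2
Query: $S//c
Answer: r with children c^{x1·y3 + y1·y2} and c^y1 (with children d and a, the latter with children c^y2 and b^x2)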
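The sum-over-paths rule can be sketched as a recursive walk. Several things here are assumptions: the tree shape is one reading of the flattened figure, the variables are instantiated in ℕ with x1 = 2, y3 = 3, y1 = 1, y2 = 1, x2 = 2, the helper names are hypothetical, and matching is descendant-or-self (the root is labeled a, so that choice does not affect this example). The deck itself defines the operator by structural recursion in NRC, not by this walk.

```python
import operator

def tree(label, children=None):
    """Build a hashable tree; `children` maps child trees to annotations."""
    return (label, frozenset((children or {}).items()))

def descendant_or_self(forest, label, add, mul):
    """Collect every subtree with the given label from a K-collection.

    A match is annotated with the product of the annotations on the path
    from the top-level collection down to it; equal matches are summed.
    """
    out = {}

    def visit(t, k):
        if t[0] == label:
            out[t] = add(out[t], k) if t in out else k
        for child, kc in t[1]:
            visit(child, mul(k, kc))

    for t, k in forest.items():
        visit(t, k)
    return out

c, d, b = tree("c"), tree("d"), tree("b")
left = tree("b", {tree("a", {c: 3, d: 1}): 1})    # b^x1 over a(c^y3, d)
right = tree("c", {d: 1, tree("a", {c: 1, b: 2}): 1})  # c^y1 over d, a(c^y2, b^x2)
S = {tree("a", {left: 2, right: 1}): 1}

answer = descendant_or_self(S, "c", operator.add, operator.mul)
print(answer[c])   # 2*3 + 1*1 = 7
```

The leaf c is reached along two paths, contributing x1·y3 = 6 and y1·y2 = 1, while the larger c subtree is reached once with annotation y1 = 1.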

15 Application: Access Control
Data annotated with clearance levels from the total order C : P < C < S < T < 0
Joint use of data (·) requires access to both (max of clearances); alternative use of data (+) requires access to either (min of clearances)
(C, min, max, 0, P) is a commutative semiring
Source, $S: a with children b^C (child d^C) and c^C (children d^S, e^T)
Query: $S/*/*
Answer: p with children d^{min(max(P,C,C), max(P,C,S))} = d^C and e^{max(P,C,T)} = e^T
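The two annotations in this example follow directly from the min/max operations. A small sketch, assuming clearance levels are encoded as strings ranked by their position in a list; the names `LEVELS`, `RANK`, `c_add`, `c_mul`, and `path` are hypothetical.

```python
from functools import reduce

LEVELS = ["P", "C", "S", "T", "0"]          # total order P < C < S < T < 0
RANK = {level: i for i, level in enumerate(LEVELS)}

def c_add(a, b):
    """Alternative use (+): the lower of the two clearances suffices."""
    return min(a, b, key=RANK.__getitem__)

def c_mul(a, b):
    """Joint use (·): the higher of the two clearances is required."""
    return max(a, b, key=RANK.__getitem__)

ZERO, ONE = "0", "P"    # additive and multiplicative identities

def path(*levels):
    """Product of the annotations along one path from the root."""
    return reduce(c_mul, levels, ONE)

# d is reachable through b (P, C, C) and through c (P, C, S).
d_level = c_add(path("P", "C", "C"), path("P", "C", "S"))
# e is reachable only through c (P, C, T).
e_level = path("P", "C", "T")
print(d_level, e_level)   # C T
```

Note that the semiring's 0 is the top of the order: data annotated 0 is accessible to nobody, which is why it serves as the identity for min and the absorbing element for max.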

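16 Security Condition: Non-Interference
For any given clearance level (e.g., C), want the following diagram to commute:
[Diagram: "query" maps the source a (b^C with child d^C; c^C with children d^S, e^T) to the answer p (d^C, e^T). "erase > C" maps the source to a (b^C with child d^C; c^C) and the answer to p (d^C). Querying and then erasing gives the same result as erasing and then querying.]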

17 Application: Incomplete XML
Data annotated with Boolean expressions; tree T represents a set of possible worlds Rep(T)
[Figure: a tree T whose c nodes carry Boolean variables x, y, z, alongside Rep(T), its 7 possible worlds.]

18 Correctness: Possible Worlds
For every incomplete tree T and every UXQuery query q, want this diagram to commute:
[Diagram: T maps to Rep(T) under Rep and to q(T) under q; Rep(T) maps to q(Rep(T)) under q, and q(T) maps to the same set under Rep.]
q(Rep(T)) = Rep(q(T))

19 Commutation with Homomorphisms
Ex: access control
  h_c : C → C
  h_c(k) := if k ≤ c then k else 0
Ex: incomplete databases
  ν : Vars → 𝔹
  Eval_ν : PosBool(Vars) → 𝔹
Ex: duplicate elimination
  δ : ℕ → 𝔹
  δ(k) := if k = 0 then ⊥ else ⊤
Theorem: Let h : K1 → K2 be a semiring homomorphism. Then for any UXQuery query q and any K1-UXML document D, we have h(q(D)) = q(h(D)).
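One instance of the theorem can be checked by hand or in code. The sketch below tests only the duplicate-elimination homomorphism δ on a single query, `for $t in $S return $t/*`, so it illustrates the statement and proves nothing. The data representation and the names `tree`, `for_children`, `apply_hom`, and `delta` are assumptions made for the example.

```python
import operator

def tree(label, children=None):
    """Build a hashable tree; `children` maps child trees to annotations."""
    return (label, frozenset((children or {}).items()))

def for_children(source, add, mul):
    """Sketch of `for $t in $S return $t/*` over a K-collection."""
    out = {}
    for (_, kids), k in source.items():
        for child, kc in kids:
            term = mul(k, kc)
            out[child] = add(out[child], term) if child in out else term
    return out

def apply_hom(coll, h, add2, zero2):
    """Apply h to every annotation in a K1-collection, recursively.

    Trees that become equal after mapping are merged by summing in K2,
    and entries whose annotation maps to zero are dropped.
    """
    out = {}
    for (label, kids), k in coll.items():
        t2 = (label, frozenset(apply_hom(dict(kids), h, add2, zero2).items()))
        k2 = h(k)
        out[t2] = add2(out[t2], k2) if t2 in out else k2
    return {t: k for t, k in out.items() if k != zero2}

def delta(k):
    """Duplicate elimination N -> B."""
    return k != 0

d, e, f = tree("d"), tree("e"), tree("f")
D = {tree("a", {d: 1}): 1, tree("b", {d: 2, e: 1}): 2, tree("c", {f: 1}): 3}

# h(q(D)): evaluate with bag semantics, then eliminate duplicates.
lhs = apply_hom(for_children(D, operator.add, operator.mul),
                delta, operator.or_, False)
# q(h(D)): eliminate duplicates first, then evaluate with set semantics.
rhs = for_children(apply_hom(D, delta, operator.or_, False),
                   operator.or_, operator.and_)
print(lhs == rhs)   # True
```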

20 Provenance is Universal
Corollary: The semantics of K-UXQuery evaluation on K-UXML for any commutative semiring K factors through evaluation using provenance polynomials ℕ[X].
i.e., for any K-UXML document D and any K-UXQuery q, we have q(D) = Eval_ν(q(D')), where
1. D' is obtained by replacing K-annotations in D with fresh variables from X
2. ν : X → K is the corresponding valuation
3. Eval_ν : ℕ[X] → K is the unique semiring homomorphism such that for the one-variable monomials, Eval_ν(x) = ν(x).
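Evaluating a provenance polynomial under a valuation is a short fold. The sketch below assumes a polynomial is stored as a dictionary from monomials (tuples of variable names) to natural-number coefficients; the name `evaluate` is hypothetical. The polynomial used is x·u + y·v, the annotation of d in the for-loop example, with the ℕ valuation taken from the K = ℕ version of that example.

```python
import operator

def evaluate(poly, nu, add, mul, zero, one):
    """Sketch of Eval_nu : N[X] -> K for a valuation nu : X -> K.

    A coefficient n is interpreted as an n-fold sum in K, and a monomial
    as the product of the values of its variables.
    """
    total = zero
    for monomial, coeff in poly.items():
        term = one
        for x in monomial:
            term = mul(term, nu[x])
        for _ in range(coeff):
            total = add(total, term)
    return total

d_poly = {("u", "x"): 1, ("v", "y"): 1}    # x·u + y·v

# Bag semantics: a has annotation 1 with d^1 below it, b has 2 with d^2.
in_nat = evaluate(d_poly, {"x": 1, "u": 1, "y": 2, "v": 2},
                  operator.add, operator.mul, 0, 1)

# Set semantics: suppose the b branch is absent (y is false).
in_bool = evaluate(d_poly, {"x": True, "u": True, "y": False, "v": True},
                   operator.or_, operator.and_, False, True)
print(in_nat, in_bool)   # 5 True
```

The same polynomial thus yields the bag count and the set-membership answer without re-running the query, which is the practical reading of the corollary.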

21 Related Work
Bag semantics for NRC [Libkin&Wong 97]
Incomplete XML [Kanza+ 99, Abiteboul+ 06]
Probabilistic XML [Nierman&Jagadish 02, van Keulen+ 05, Abit.&Senellart 06, Sen.&Abit. 07, Hung+ 07]
XML provenance [Buneman+ 01]
NRC provenance [Hidders+ 07]
Soft CSPs [Bistarelli et al]
Semiring-annotated XPath [Grahne+ 07]
Negation, expressiveness of RA_K [Geerts&Poggi 08]

22 Conclusion
We showed how to annotate unordered XML trees (NRC complex values) with values from a commutative semiring K, and propagate those annotations in queries for a large, positive fragment of XQuery (NRC + srt)
We saw novel applications in security and incomplete dbs, made possible by a fundamental property of our framework, commutation with homomorphisms

23 Future Work
Practical applications based on framework
– Access control
– Jointly recording provenance, security, multiplicities, uncertainty, etc. (product of semirings is also a semiring!)
Query optimization: containment/equivalence w.r.t. annotated semantics depends on K!
– In paper, we show K-equivalence for UXQuery is the same as 𝔹-equivalence when K is a distributive lattice (e.g., access control, incomplete dbs)

25 XPath Descendant Operator Uses srt
//∗ applied to forest $T translates to
  ∪(x ∈ $T) π_1((srt(b, s). f) x)
where
  f := let self = Tree(b, ∪(x ∈ s) {π_2(x)}) in
       let matches = ∪(x ∈ s) {π_1(x)} in
       (matches ∪ {self}, self)
//a is similar to the above

26 K-UXQuery Semantics: Union
Sums annotations
Source ($S and $T): a^u with child b^v; a^u with child b^v, and a^u with child b^w
Query: return $S, $T
Answer: a^{2u} with child b^v, and a^u with child b^w
The answer has 2 copies of trees distinguished by annotations of children

27 Semantics of Big Union
Ordinary NRC: iterates and collects results in a set
  ⟦∪(x ∈ e1) e2⟧ := ∪_{a ∈ ⟦e1⟧} ⟦e2⟧[x := a]
Annotated NRC: sums and multiplies annotations
  ⟦∪(x ∈ e1) e2⟧_K (y) := Σ_{i=1..n} ⟦e1⟧_K(a_i) · ⟦e2⟧_K[x := a_i](y)
where the support (the set of elements with non-zero annotations) of ⟦e1⟧_K is {a_1, ..., a_n}

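28 XPath Example With K = ℕ
Source, $S: a with children b^2 and c, where b^2 has a child a with children c^3 and d
≡ a with children b, b, c, where each b has a child a with children c, c, c, d
Query: element r { $S//c }
Answer: r with child c^{2·3+1 = 7}
≡ r with children c, c, c, c, c, c, c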

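29 XPath Example With K = ℕ
Source, $S: a with children b^2 and c, where b^2 has a child a with children c^3 and d, and c has children d and a, the latter with children c and b^2
Query: $S//c
Answer: r with children c^{2·3+1 = 7} and c, where the second c keeps its children d and a (with children c and b^2)
i.e., r has 8 children labeled c: 7 leaves and one c subtree with children d and a (with children c, b, b)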
