Incremental Validation of XML Documents Yannis Papakonstantinou Victor Vianu Presented by Claudia Levin.


1 Incremental Validation of XML Documents Yannis Papakonstantinou Victor Vianu Presented by Claudia Levin

2 Introduction This work investigates an incremental validation algorithm for XML documents with respect to a DTD (Document Type Definition) running in $O(m \log n)$ time, and an incremental validation algorithm for XML Schema running in $O(m \log^2 n)$ time, where $m$ is the number of updates and $n$ is the size of the document. Both use an auxiliary structure of size $O(n)$.

3 Example of an XML document [XML markup lost in extraction; surviving text values: Honda, 92, BMW]

4 An XML document as Labeled Ordered Tree [Figure: tree rooted at Dealer with children UsedCars and NewCars; Ad nodes with Model and Year children; leaf values Honda, 92, Subaru, 99, BMW, Mazda]

5 Abstraction of Document Type Definitions (DTDs) The basic mechanism for specifying the type of XML documents.
root: dealer
dealer → UC NC
UC → ad*
NC → ad*
ad → model (year | ε)
model → ε
year → ε
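The DTD above can be read as one regular expression per label, matched against the sequence of child labels. The following is a minimal sketch of validation from scratch under that reading; the tuple encoding of trees, the `;` separator, and the names `RULES`, `valid` and `satisfies` are assumptions made for illustration, not part of the presentation.

```python
import re

# One regular expression per label, over child labels terminated by ";".
# The ";" separator is an illustrative choice that keeps label boundaries unambiguous.
RULES = {
    "dealer": r"UC;NC;",
    "UC": r"(ad;)*",
    "NC": r"(ad;)*",
    "ad": r"model;(year;)?",
    "model": r"",
    "year": r"",
}

def valid(node, rules=RULES):
    """Check a node (label, [children]) and its whole subtree against the rules."""
    label, children = node
    word = "".join(child[0] + ";" for child in children)
    if label not in rules or re.fullmatch(rules[label], word) is None:
        return False
    return all(valid(child, rules) for child in children)

def satisfies(tree, root="dealer"):
    """A tree satisfies the DTD if its root label matches and every node is valid."""
    return tree[0] == root and valid(tree)

good = ("dealer", [
    ("UC", [("ad", [("model", []), ("year", [])])]),
    ("NC", [("ad", [("model", [])])]),
])
bad = ("dealer", [("UC", [("ad", [("year", [])])]), ("NC", [])])
print(satisfies(good), satisfies(bad))
```

This revalidates every node, which is the cost the incremental algorithms are designed to avoid.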

6 Specialized DTD abstraction (XML Schema) A specialized DTD is a 4-tuple $\langle \Sigma, \Sigma^t, d, \mu \rangle$ where $\Sigma$ is a finite alphabet of labels, $\Sigma^t$ is a finite alphabet of types, $d$ is a DTD over $\Sigma^t$, and $\mu$ is a mapping from $\Sigma^t$ to $\Sigma$.

7 Specialized DTD (XML Schema) example
root: $d^t$
$d^t \to UC^t\, NC^t$, $\mu(d^t) = dealer$
$UC^t \to (ad^u)^*$, $\mu(UC^t) = UC$
$NC^t \to (ad^n)^*$, $\mu(NC^t) = NC$
$ad^u \to m^t\, y^t$, $\mu(ad^u) = ad$
$ad^n \to m^t$, $\mu(ad^n) = ad$
$m^t \to \varepsilon$, $\mu(m^t) = model$
$y^t \to \varepsilon$, $\mu(y^t) = year$

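8 Specialized DTD example [Figure: the Dealer tree (UC, NC, Ad, Model, Year) annotated with types: $d^t$ at the root, $UC^t$ and $NC^t$ below it, $ad^u$ and $ad^n$ on the Ad nodes, $m^t$ on the Model nodes and $y^t$ on the Year nodes]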

9 Incremental Validation Problem Given a specialized DTD $\tau$, a tree $T \in sat(\tau)$, and a sequence of updates to $T$ yielding another tree $T'$, we wish to efficiently check whether $T' \in sat(\tau)$. An auxiliary structure computed from $T$ is used and maintained to help in the validation.

10 Update types Replace the current label of a specified node by another label; Insert a new leaf node after a specified node; Insert a new leaf node as the first child of a specified node; Delete a specified leaf node.
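The four update types can be sketched on a plain ordered tree as follows. The `Node` class and the function names are hypothetical, chosen only to mirror the list above; the sketch shows the updates themselves and performs no validation.

```python
class Node:
    """A node of an ordered, labeled tree (illustrative representation)."""
    def __init__(self, label, parent=None):
        self.label = label
        self.parent = parent
        self.children = []

def rename(node, label):
    """Replace the current label of a node by another label."""
    node.label = label

def insert_after(node, label):
    """Insert a new leaf as the next sibling of the given node."""
    parent = node.parent
    if parent is None:
        raise ValueError("the root has no siblings")
    new = Node(label, parent)
    parent.children.insert(parent.children.index(node) + 1, new)
    return new

def insert_first_child(node, label):
    """Insert a new leaf as the first child of the given node."""
    new = Node(label, node)
    node.children.insert(0, new)
    return new

def delete_leaf(node):
    """Delete a leaf node; inner nodes and the root are rejected."""
    if node.children or node.parent is None:
        raise ValueError("only non-root leaves can be deleted")
    node.parent.children.remove(node)

root = Node("dealer")
uc = insert_first_child(root, "UC")
nc = insert_after(uc, "NC")
ad = insert_first_child(uc, "ad")
rename(ad, "advert")
print([child.label for child in root.children], ad.label)
```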

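11 Node label renaming [Figure: the rename $u(a_i, b)$ applied to node $a_i$ in the sibling sequence $\dots a_{i-1}\, a_i\, a_{i+1} \dots$ under root $r$; further nodes $c_1, c_2, \dots, c_n$ appear below]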

12 Inserting a new node [Figure: insertion of node $a_i$ between the siblings $a_{i-1}$ and $a_{i+1}$ under root $r$]

13 Deleting a node [Figure: deletion of node $a_i$ from the sibling sequence $\dots a_{i-1}\, a_i\, a_{i+1} \dots$ under root $r$]

14 Warmup: incremental validation of Strings Check the validity of a string $a_1 \dots a_n$ with respect to an NFA $N = \langle \Sigma, Q, Q_0, F, \delta \rangle$ after a sequence of element renames $u(a_{i_1}, b_1) \dots u(a_{i_m}, b_m)$, where $i_1 < i_2 < \dots < i_m$. Validating the new string from scratch by running it through $N$ takes $O(n\,|Q|^2 \log |Q|)$.

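15 Incremental validation of Strings (the first attempt) Consider a single renaming $u(i, b)$ for $1 \le i \le n$. Let $Pre(i) = \delta(q_0, a_1 \dots a_{i-1})$ and $Post(i) = \{ s \mid \delta(s, a_{i+1} \dots a_n) \cap F \neq \emptyset \}$. The updated string is valid iff there are states $s_1 \in Pre(i)$ and $s_2 \in Post(i)$ with $s_2 \in \delta(s_1, b)$. [Figure: a $b$-transition from $s_1$ in $Pre(i)$ to $s_2$ in $Post(i)$]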
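A minimal sketch of this first attempt, assuming the NFA is given as a dictionary from (state, symbol) pairs to sets of states. The example automaton (accepting $(ab)^*$) and all function names are illustrative assumptions. Note that Pre and Post are recomputed here from scratch, which is exactly why this attempt is not yet efficient.

```python
def step(delta, states, symbol):
    """All states reachable from `states` by reading one symbol."""
    return {t for s in states for t in delta.get((s, symbol), ())}

def pre(delta, initial, word, i):
    """States reachable from the initial states after a_1 ... a_{i-1} (i is 1-based)."""
    states = set(initial)
    for a in word[:i - 1]:
        states = step(delta, states, a)
    return states

def post(delta, all_states, final, word, i):
    """States from which a_{i+1} ... a_n can reach a final state."""
    good = set(final)
    for a in reversed(word[i:]):
        good = {s for s in all_states if step(delta, {s}, a) & good}
    return good

def valid_after_rename(delta, all_states, initial, final, word, i, b):
    """Valid iff some b-transition leads from Pre(i) into Post(i)."""
    reached = step(delta, pre(delta, initial, word, i), b)
    return bool(reached & post(delta, all_states, final, word, i))

# Illustrative NFA for the language (ab)*.
DELTA = {(0, "a"): {1}, (1, "b"): {0}}
STATES, INITIAL, FINAL = {0, 1}, {0}, {0}
print(valid_after_rename(DELTA, STATES, INITIAL, FINAL, "abab", 2, "b"))
print(valid_after_rename(DELTA, STATES, INITIAL, FINAL, "abab", 2, "a"))
```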

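16 Definition of Transition Relation For each $i, j$ with $1 \le i \le j \le n$: $T_{i,j} = \{ \langle p, q \rangle \mid p, q \in Q,\ q \in \delta(p, a_i \dots a_j) \}$ and $\delta_b = \{ \langle r, s \rangle \mid r, s \in Q,\ s \in \delta(r, b) \}$. [Figure: a run from state $p$ to state $q$ over $a_i\, a_{i+1} \dots a_j$, and a $b$-transition from state $r$ to state $s$]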
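The two definitions can be sketched directly as sets of state pairs, with $T_{i,j}$ obtained by composing single-symbol relations from left to right. The dictionary encoding of $\delta$ and the names `compose`, `symbol_relation` and `T` are assumptions for illustration; the composition here is the naive one, not an optimized implementation.

```python
def compose(r1, r2):
    """Relational composition: (p, q) such that (p, m) in r1 and (m, q) in r2."""
    return {(p, q) for (p, m) in r1 for (m2, q) in r2 if m == m2}

def symbol_relation(delta, b):
    """delta_b: all pairs (r, s) with s in delta(r, b)."""
    return {(r, s) for (r, sym), targets in delta.items() if sym == b for s in targets}

def T(delta, word, i, j):
    """T_{i,j} for 1-based positions i <= j, composed left to right."""
    rel = symbol_relation(delta, word[i - 1])
    for a in word[i:j]:
        rel = compose(rel, symbol_relation(delta, a))
    return rel

# Illustrative NFA for the language (ab)*.
DELTA = {(0, "a"): {1}, (1, "b"): {0}}
print(T(DELTA, "abab", 1, 1), T(DELTA, "abab", 1, 2), T(DELTA, "abab", 2, 3))
```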

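17 Checking of validity with Transition Relation The updated string $a_1 \dots a_{i_1-1}\, b_1\, a_{i_1+1} \dots a_{i_m-1}\, b_m\, a_{i_m+1} \dots a_n$ is valid iff $\langle q_0, f \rangle \in T_{1,i_1-1} \circ \delta_{b_1} \circ T_{i_1+1,i_2-1} \circ \dots \circ \delta_{b_m} \circ T_{i_m+1,n}$ for some $f \in F$. Time complexity here is $O(m\,|Q|^2 \log |Q|)$.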

18 Divide-and-conquer validation with Transition Relation Tree Validates a sequence of $m$ renamings to a string of length $n$. The time taken is $O(m\,|Q|^2 \log |Q| \log n)$. The auxiliary structure size is $O(|Q|^2 n)$.

19 Transition Relation Tree example
$T_{18}$
$T_{14}$ $T_{58}$
$T_{12}$ $T_{34}$ $T_{56}$ $T_{78}$
$T_{11}$ $T_{22}$ $T_{33}$ $T_{44}$ $T_{55}$ $T_{66}$ $T_{77}$ $T_{88}$
$a_1$ $a_2$ $a_3$ $a_4$ $a_5$ $a_6$ $a_7$ $a_8$
The number of nodes in $T_{1n}$ is $2n-1$. Its depth is $\log n$.

20 Label renaming with Transition Relation Tree Consider $a_1 \dots a_n \in L(N)$ and a sequence of renames $u(i_1, b_1), \dots, u(i_m, b_m)$, where $i_1 < i_2 < \dots < i_m$. The updated string is $a_1 \dots a_{i_1-1}\, b_1\, a_{i_1+1} \dots a_{i_m-1}\, b_m\, a_{i_m+1} \dots a_n$. The relations $T_{ij}$ affected by the updates are those lying on a path from a changed leaf to the root of $T_n$. The number of relations changed is at most $m \log n$.
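The idea can be sketched as a balanced binary tree (a segment tree) whose nodes store transition relations: a rename replaces one leaf and recomputes only the relations on the path to the root. This is an illustrative sketch for a nonempty string, assuming a dictionary-encoded NFA; the class name `RelationTree` and its methods are hypothetical, and the naive `compose` does not achieve the $|Q|^2 \log |Q|$ bound quoted in the slides.

```python
def compose(r1, r2):
    """Relational composition of two sets of state pairs."""
    return {(p, q) for (p, m) in r1 for (m2, q) in r2 if m == m2}

class RelationTree:
    def __init__(self, delta, word):
        self.delta = delta
        self.n = len(word)           # assumed to be at least 1
        self.rel = {}                # node id -> transition relation of its interval
        self._build(1, 0, self.n - 1, word)

    def _leaf(self, symbol):
        return {(r, s) for (r, sym), ts in self.delta.items() if sym == symbol for s in ts}

    def _build(self, node, lo, hi, word):
        if lo == hi:
            self.rel[node] = self._leaf(word[lo])
            return
        mid = (lo + hi) // 2
        self._build(2 * node, lo, mid, word)
        self._build(2 * node + 1, mid + 1, hi, word)
        self.rel[node] = compose(self.rel[2 * node], self.rel[2 * node + 1])

    def rename(self, pos, symbol):
        """Rename the symbol at 0-based position pos; touches O(log n) relations."""
        node, lo, hi, path = 1, 0, self.n - 1, []
        while lo < hi:
            path.append(node)
            mid = (lo + hi) // 2
            if pos <= mid:
                node, hi = 2 * node, mid
            else:
                node, lo = 2 * node + 1, mid + 1
        self.rel[node] = self._leaf(symbol)
        for p in reversed(path):     # recompute ancestors bottom-up
            self.rel[p] = compose(self.rel[2 * p], self.rel[2 * p + 1])

    def accepts(self, initial, final):
        return any(p in initial and q in final for (p, q) in self.rel[1])

# Illustrative NFA for the language (ab)*.
DELTA = {(0, "a"): {1}, (1, "b"): {0}}
tree = RelationTree(DELTA, "abab")
print(tree.accepts({0}, {0}))
```

Inserts and deletes are not handled by this fixed-shape tree, which is the motivation for the B-tree variant discussed next.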

21 Label Renaming by Divide-and-Conquer approach in $O(\log n)$ [Figure: the transition relation tree over $a_1 \dots a_8$ after the rename $u(a_3, b)$; the relations on the path from the leaf $T_{33}$ to the root ($T_{33}$, $T_{34}$, $T_{14}$, $T_{18}$) are recomputed]

22 Dealing with inserts and deletes: Why B-trees? Inserts and deletes change the positions of the nodes in the string. The length of the string and the set of relevant intervals used to construct $T_n$ are now dynamic. The tree must remain balanced, with depth $O(\log n)$.

23 B-trees Each node has 3 cells. A cell is either empty or contains a set $T_s$ corresponding to some subsequence $s$ of the string. At most one of the 3 cells in a node can be empty. Each nonempty cell is either at a leaf or has one node as a child.

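24 B-Trees for dealing with inserts and deletes in $O(\log n)$ [Figure: a B-tree whose root holds the cells $T_{sa}, T_{sb}, T_{sc}$ above leaf nodes holding ($T_{s1}, T_{s2}$), ($T_{s3}, T_{s5}, T_{s6}$) and ($T_{s7}, T_{s9}$), over the string positions $n_1, n_2, n_3, n_5, n_6, n_7, n_9$] $T_{sa} = T_{s1} \circ T_{s2}$, $T_{sb} = T_{s3} \circ T_{s5} \circ T_{s6}$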

25 Validation with B-trees with respect to NFA $N = \langle \Sigma, Q, Q_0, F, \delta \rangle$ When $T$ for the updated string is computed, check that for some $f \in F$, $\langle q_0, f \rangle$ belongs to the composition of the sets $T_s$ in the cells of the root node of $T$. The cost of checking is $O(|Q|^2 \log |Q|)$.

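26 Insertion to a Transition Relation Tree Insertion of nodes $n_4$ and $n_8$. [Figure: the B-tree before the insertion, with root cells $T_{sa}, T_{sb}, T_{sc}$ and leaf nodes ($T_{s1}, T_{s2}$), ($T_{s3}, T_{s5}, T_{s6}$), ($T_{s7}, T_{s9}$); the new positions $n_4$ and $n_8$ are marked in the string]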

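27 Insertion to a Transition Relation Tree Insertion of nodes $n_4$ and $n_8$. [Figure: the B-tree after the insertion, with root cells $T_{se}, T_{sf}$, inner nodes ($T_{sa}, T_{sb'}$) and ($T_{sb''}, T_{sc}$), and leaf nodes ($T_{s1}, T_{s2}$), ($T_{s3}, T_{s4}$), ($T_{s5}, T_{s6}$), ($T_{s7}, T_{s8}, T_{s9}$) over the positions $n_1, \dots, n_9$]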

28 B-Tree validation algorithm costs Renaming: the update propagates from the leaf to the root, giving $O(\log n)$ updates. Insertion or deletion: may involve splits and merges of the cells all the way to the root. The worst case complexity is $O(|Q|^2 \log |Q| \log n)$.

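29 Incremental DTD validation [Figure: a tree with root labeled $d$ and rule $d \to r(d)$; a sibling sequence $a_1 \dots a_{i-1}\, a_i\, a_{i+1} \dots a_n$; a node $v$ with children $c_1, c_2, c_3, c_4, \dots$; the new label $b$]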

30 Incremental DTD validation The auxiliary structure maintained: for each sequence of siblings in the tree, the transition relations $T_s$ of the divide-and-conquer algorithm are precomputed. The auxiliary structure size is at most $O(|\Sigma|\,|d|^2\,|T|)$, where $|T|$ is the size of $T$ and $|d| = \max\{ |r_a| \mid a \to r_a \in d \}$. The total validation time is $O(m\,|\Sigma|\,|d|^2 \log |d| \log |T|)$.

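31 Specialized DTDs: a first attempt Tree $T$ is valid iff $root(d) \in types(root(T))$. [Figure: a tree with root $r$, a node $v$ with children $c_1, c_2, c_3, c_4, \dots, c_n$ and siblings $a_{i-1}, a_i, a_{i+1}$, the new label $b$, and the sets $types(r)$, $types(v)$, $types(a_i)$, $types(a_n)$ attached to the nodes]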
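To make the validity condition concrete, the set of types of each node can be computed bottom-up: a type belongs to a node if it maps to the node's label and some choice of child types matches the type's rule. The sketch below is a brute-force illustration (it enumerates all type choices and is exponential in the worst case), not the incremental algorithm of the presentation; the ASCII type names such as `dt` and `adu`, the `;` separator and the function name `types` are assumptions.

```python
import re
from itertools import product

# The specialized DTD example, with types written in ASCII (dt for d^t, adu for ad^u, ...).
MU = {"dt": "dealer", "UCt": "UC", "NCt": "NC",
      "adu": "ad", "adn": "ad", "mt": "model", "yt": "year"}
RULES = {"dt": r"UCt;NCt;", "UCt": r"(adu;)*", "NCt": r"(adn;)*",
         "adu": r"mt;yt;", "adn": r"mt;", "mt": r"", "yt": r""}

def types(node):
    """Set of types t with mu(t) = label and a matching choice of child types."""
    label, children = node
    child_types = [types(child) for child in children]
    result = set()
    for t, rule in RULES.items():
        if MU[t] != label:
            continue
        # product() over no children yields one empty choice, i.e. the empty word.
        for choice in product(*child_types):
            word = "".join(x + ";" for x in choice)
            if re.fullmatch(rule, word):
                result.add(t)
                break
    return result

used_ad = ("ad", [("model", []), ("year", [])])
new_ad = ("ad", [("model", [])])
good = ("dealer", [("UC", [used_ad]), ("NC", [new_ad])])
bad = ("dealer", [("UC", [new_ad]), ("NC", [new_ad])])
print(types(used_ad), types(new_ad), "dt" in types(good), "dt" in types(bad))
```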

32 Specialized DTDs: a first attempt The auxiliary structure size is the same as for DTDs, at most $O(|\Sigma|\,|d|^2\,|T|)$, where $|T|$ is the size of $T$ and $|d| = \max\{ |r_a| \mid a \to r_a \in d \}$. The total validation time for DTDs is $O(m\,|\Sigma|\,|d|^2 \log |d| \log |T|)$. The total validation time for specialized DTDs is $O(m\,|\Sigma^t|\,|d|^2 \log |d| \cdot depth(T) \cdot \log |T|)$.

33 Binary tree encoding of unranked tree [Figure: an unranked tree over the labels a, b, c, d, e, f, g, h, i, j, k and its binary encoding, in which missing children are filled with # leaves]
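One common way to encode an unranked tree as a binary tree is the first-child/next-sibling encoding, padded with `#` leaves. The sketch below shows that encoding as an assumption; the orientation used in the presentation's figure may differ, and the function name `enc` and the tuple representation are illustrative.

```python
def enc(forest):
    """Encode an ordered forest [(label, children), ...] as a binary tree.

    The left subtree encodes the children of the first tree, the right subtree
    encodes its following siblings; "#" stands for an absent subtree.
    """
    if not forest:
        return "#"
    (label, children), rest = forest[0], forest[1:]
    return (label, enc(children), enc(rest))

tree = ("a", [("b", []), ("c", [])])
print(enc([tree]))
```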

34 One of the standard encodings in the literature (F. Neven. Automata, Logic and XML. In Computer Science Logic, 2002). Lemma: For each specialized DTD $\tau = \langle \Sigma, \Sigma^t, d, \mu \rangle$ there exists a BNTA $A_\tau$ over $\Sigma_\#$ whose number of states is $O(|\Sigma^t|\,|d|)$ such that $L(A_\tau) = \{ enc(T) \mid T \in sat(\tau) \}$.

35 Principal lines [Figure: the binary encoding of the example tree (labels a to k, with # leaves), with its principal lines marked]

36 From BNTA to NFA on principal lines [Figure: the binary encoding of the example tree with its principal lines; annotations on the lines include $T_c$, $bd$, $T_j$, $T_g$ and $f$]

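37 From BNTA to NFA on principal lines [Figure: the principal lines of the example drawn as strings over the labels a, b, d, e, f, i, h, c, k, j, g; annotations include $T_c$, $bd$, $T_j$, $T_g$ and $f$]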

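38 NFA construction We construct an NFA $N$ which accepts the string $a_n \dots a_1$ iff the BNTA $A = \langle \Sigma_\#, Q, Q_0, q_f, \delta \rangle$ accepts $enc(T)$. Let $N = \langle \Sigma', Q, q_0, F', \delta' \rangle$, where $\Sigma' = \{\#\} \cup (Q \times \Sigma) \cup (\Sigma \times Q)$, $F' = \{q_f\}$, and
$\delta'(\#, q_0) = Q_0$;
$\delta'(\langle a, S \rangle, q) = \bigcup_{q' \in S} \delta(a, q, q')$ for $a \in \Sigma$;
$\delta'(q, \langle a, S \rangle) = \bigcup_{q' \in S} \delta(a, q', q)$ for $a \in \Sigma$.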

39 Line rearrangement for insertions and deletions [Figure: nodes $v$ and $v'$ and the lines $l$, $l_0$, $l'$ and $l''$]

40 Complexity Results Given a sequence of $m$ updates, for the DTD abstraction of XML we get: the auxiliary structure size is at most $O(|\Sigma|\,|d|^2\,|T|)$, where $|T|$ is the size of $T$ and $|d| = \max\{ |r_a| \mid a \to r_a \in d \}$; the total validation time is $O(m\,|\Sigma|\,|d|^2 \log |d| \log |T|)$.

41 Complexity Results Given a sequence of $m$ updates, for specialized DTDs (XML Schema) we get: the auxiliary structure size is at most $O(|\Sigma|\,|d|^2\,|T|)$; the total validation time is $O(m\,|\Sigma^t|^2\,|d|^2 \log(|\Sigma^t|\,|d|) \log^2 |T|)$.

