Published by Isabell Chesshir. Modified over 3 years ago.

1
Incremental Maintenance of XML Structural Indexes
Ke Yi (1), Hao He (1), Ioana Stanoi (2), and Jun Yang (1)
(1) Department of Computer Science, Duke University
(2) IBM T. J. Watson Research Center

2
Motivation
- XML has gained tremendously in popularity in recent years
  - Used to represent many kinds of data
- Major DB vendors are rushing to incorporate solutions for native XML repositories and retrieval
  - IBM DB2, Oracle, Microsoft SQL Server
  - Tamino, Natix, X-Hive, …

3
Overview
[Figure: example XML data graph of a paper document. Nodes are numbered 1 to 18 and labeled paper, section, title, algorithm, proof, uses, exp, and about; the section titles are "intro", "1-index", "A(k)-index", and "experiments".]

4
Label Path Expressions
[Figure: the example data graph of the paper document, nodes 1 to 18]
Example: /paper/section/algorithm
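
To make the idea of a label path expression concrete, here is a minimal sketch of evaluating one directly on a data graph, one label step at a time. The function name `eval_label_path` and the dictionaries `labels` and `children` are illustrative, and the edges are an assumed fragment of the example graph, since the slide text does not preserve them.

```python
def eval_label_path(path, root, labels, children):
    """Return the set of nodes reached from `root` by following `path`."""
    steps = path.strip("/").split("/")
    # The first step must match the label of the root itself.
    frontier = {root} if labels[root] == steps[0] else set()
    for step in steps[1:]:
        # Advance one edge, keeping only children with the wanted label.
        frontier = {
            child
            for node in frontier
            for child in children.get(node, ())
            if labels[child] == step
        }
    return frontier


# Assumed fragment of the example data graph (edges are hypothetical).
labels = {1: "paper", 2: "section", 3: "title", 4: "section", 5: "title",
          6: "algorithm", 8: "section", 9: "title", 10: "algorithm"}
children = {1: [2, 4, 8], 2: [3], 4: [5, 6], 8: [9, 10]}

result = eval_label_path("/paper/section/algorithm", 1, labels, children)
print(sorted(result))  # [6, 10]
```

Each step touches every node in the current frontier, which is the cost a structural index is meant to reduce.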

5
Structural Indexes
- Why do we need them?
  - Speed up the evaluation of path expressions
  - Provide a structural summary of the data graph
- Structural indexes
  - DataGuide [Goldman & Widom 97]
  - 1-index [Milo & Suciu 99]
  - A(k)-index [Kaushik et al. 02], D(k)-index [Qun et al. 03], M(k)-index [He & Yang 04]
  - Integration of structural indexes and inverted lists [Kaushik et al. 04]
- Focus on maintenance
  - Has a major effect on index efficiency
  - Remains an overlooked issue

6
Outline
[Figure: the example data graph from the Overview slide, with sections titled "intro", "1-index", "A(k)-index", and "experiments"]

7
1-Index: Definition
- Constructed by using bisimilarity
- Definition based on stability
  - Partition the data nodes (dnodes) into index nodes (inodes); dnode v belongs to inode I[v]
  - I[u] is v's index parent if u is v's parent
  - An inode is stable if all of its dnodes have the same index parents
  - In a 1-index, all inodes are stable
[Figure: dnodes u and v and their inodes I[u] and I[v]]
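
The stability condition can be checked mechanically. The sketch below is illustrative only: `index_parents` and `is_stable` are hypothetical helper names, and the small graph is an assumed one in the style of the deck's R/A/B/C examples, not data taken from the slides.

```python
def index_parents(v, inode_of, parents):
    """Set of inodes I[u] over all data-graph parents u of dnode v."""
    return frozenset(inode_of[u] for u in parents.get(v, ()))


def is_stable(members, inode_of, parents):
    """An inode is stable if all of its dnodes share the same index parents."""
    return len({index_parents(v, inode_of, parents) for v in members}) <= 1


# Assumed data graph: parents[v] lists the parents of dnode v.
parents = {"A": ["R"], "B": ["R"],
           "C1": ["A"], "C2": ["A"], "C3": ["A", "B"]}
# Assumed partition: C1 and C2 share the inode named "C12".
inode_of = {"R": "R", "A": "A", "B": "B",
            "C1": "C12", "C2": "C12", "C3": "C3"}

stable_before = is_stable(["C1", "C2"], inode_of, parents)
parents["C2"].append("B")  # insert the edge B -> C2
stable_after = is_stable(["C1", "C2"], inode_of, parents)
print(stable_before, stable_after)  # True False
```

After the edge insertion, C1 and C2 no longer agree on their index parents, so the inode holding both has to be split before the index is valid again.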

8
1-Index: Example
[Figure: the example data graph (nodes 1 to 18) next to its 1-index. Inode extents shown: 1 (paper); 2,4,8,13 (section); 3,5,9,14 (title); 6,10 (algorithm); 7 (proof); 11 (proof); 12 (uses); 15,16 and 17,18 (exp, about).]
Example: /paper/section/algorithm
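
A path expression can be answered on the index graph instead of the data graph, returning the union of the extents of the matching inodes. The sketch below uses the extents listed on the slide; the inode names (`I1` to `I4`), the index edges, and the function name `eval_on_index` are assumptions made for illustration.

```python
def eval_on_index(path, root_inode, ilabels, ichildren, extent):
    """Follow `path` over inodes, then return the union of their extents."""
    steps = path.strip("/").split("/")
    frontier = {root_inode} if ilabels[root_inode] == steps[0] else set()
    for step in steps[1:]:
        frontier = {
            child
            for inode in frontier
            for child in ichildren.get(inode, ())
            if ilabels[child] == step
        }
    result = set()
    for inode in frontier:
        result |= extent[inode]
    return result


# Part of the 1-index from the slide; the index edges are assumed.
ilabels = {"I1": "paper", "I2": "section", "I3": "title", "I4": "algorithm"}
ichildren = {"I1": ["I2"], "I2": ["I3", "I4"]}
extent = {"I1": {1}, "I2": {2, 4, 8, 13}, "I3": {3, 5, 9, 14}, "I4": {6, 10}}

answer = eval_on_index("/paper/section/algorithm", "I1",
                       ilabels, ichildren, extent)
print(sorted(answer))  # [6, 10]
```

The traversal visits three inodes rather than every matching dnode, which is where the speedup comes from.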

9
1-Index: Quality
- Assigning bisimilar dnodes to different inodes does not affect correctness, but does affect efficiency
- The quality of an index:
  \[ \text{quality} = \left( \frac{\#\,\text{inodes}}{\#\,\text{inodes in the minimum 1-index}} - 1 \right) \times 100\% \]
- Ideal: quality = 0%
[Figure: the 1-index from the previous example, with the section inode {2,4,8,13} split into {2,4} and {8,13}]
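
As a small worked example of the quality measure, the sketch below computes it for given inode counts. The function name `index_quality` and the counts used are hypothetical; they are not measurements from the talk.

```python
def index_quality(num_inodes, num_inodes_minimum):
    """Percentage by which an index exceeds the size of the minimum 1-index."""
    if num_inodes_minimum <= 0:
        raise ValueError("the minimum 1-index must have at least one inode")
    return (num_inodes / num_inodes_minimum - 1) * 100.0


# Hypothetical counts: a 10-inode index where the minimum has 9 inodes.
print(round(index_quality(10, 9), 1))  # 11.1
print(index_quality(9, 9))             # 0.0
```

A value of 0% means the index is exactly as small as the minimum 1-index; larger values mean redundant inodes.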

10
Previous Results
- Construction
  - The PT algorithm [Paige & Tarjan 87], in time O(m log n), where m = # edges and n = # nodes
- Edge changes
  - The propagate algorithm [Kaushik et al. 02]
- Quality of the 1-index after update
  - No guarantee on the quality of the resulting index
  - 3 ~ 5% after 500 edge insertions in experiments
- Subgraph addition
  - Index reconstruction

11
Edge Insertion: An Example (1)
[Figure: three panels.
 Data Graph: nodes R, A, B, C1, C2, C3, D1, D2, D3.
 1-Index: inodes R, A, B, {C1, C2}, {C3}, {D1, D2}, {D3}.
 Split 1: inodes R, A, B, {C1}, {C2}, {C3}, {D1, D2}, {D3}.]

12
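Edge Insertion: An Example (2)
[Figure: three panels.
 Split 2: inodes R, A, B, {C1}, {C2}, {C3}, {D1}, {D2}, {D3}.
 Merge 1: inodes R, A, B, {C1}, {C2, C3}, {D1}, {D2}, {D3}.
 Merge 2: inodes R, A, B, {C1}, {C2, C3}, {D1}, {D2, D3}.]
- The result is indeed the minimum 1-index for the data graph after the update
- Not a coincidence!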
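
One way to double-check the final panel is to recompute the minimum 1-index from scratch after the update. The sketch below does this with naive fixed-point refinement, which is neither the split/merge algorithm of the talk nor the PT algorithm. The edge set is an assumption chosen to be consistent with the partitions shown on the slides (the figure's edges did not survive extraction), and `minimum_1_index` and `blocks_of` are illustrative names.

```python
def blocks_of(assign):
    """Turn a node -> block-id mapping into a set of frozensets."""
    groups = {}
    for node, block in assign.items():
        groups.setdefault(block, set()).add(node)
    return {frozenset(g) for g in groups.values()}


def minimum_1_index(labels, parents):
    """Refine the label partition until every block is stable."""
    ids = {}
    block = {v: ids.setdefault(lab, len(ids)) for v, lab in labels.items()}
    while True:
        ids, refined = {}, {}
        for v in labels:
            # Signature: own block plus the set of blocks of the parents.
            sig = (block[v], frozenset(block[u] for u in parents.get(v, ())))
            refined[v] = ids.setdefault(sig, len(ids))
        # Refinement only splits blocks, so an equal count means no change.
        if len(set(refined.values())) == len(set(block.values())):
            return blocks_of(refined)
        block = refined


labels = {"R": "R", "A": "A", "B": "B", "C1": "C", "C2": "C", "C3": "C",
          "D1": "D", "D2": "D", "D3": "D"}
# Assumed edges, listed as parents of each node, before the update.
parents = {"A": ["R"], "B": ["R"], "C1": ["A"], "C2": ["A"],
           "C3": ["A", "B"], "D1": ["C1"], "D2": ["C2"], "D3": ["C3"]}

before = minimum_1_index(labels, parents)
parents["C2"].append("B")  # assumed inserted edge B -> C2
after = minimum_1_index(labels, parents)
print(sorted(sorted(b) for b in after))
```

Under these assumed edges, `before` groups {C1, C2} and {D1, D2}, while `after` groups {C2, C3} and {D2, D3}, matching the 1-Index and Merge 2 panels. Recomputing from scratch is what an incremental algorithm avoids; it serves here only as a reference result.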

13
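Minimum & Minimal Indexes
- Minimum: with the smallest number of inodes
- Minimal: no two inodes can be merged
[Figure: three panels. A data graph over R, A1, A2, B1, B2; the minimum 1-index with inodes R, {A1, A2}, {B1, B2}; and a minimal 1-index in which A1, A2, B1, and B2 each remain a separate inode.]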
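
The difference between minimal and minimum can be illustrated on a small cyclic graph. The sketch below is built on two assumptions: that two inodes can be merged when they carry the same label and have the same set of index parents, and that the graph's edges form the two cycles shown in the code (the figure's edges did not survive extraction). `inode_parents` and `mergeable_pairs` are hypothetical helper names.

```python
from itertools import combinations


def inode_parents(partition, parents):
    """Map each inode to the set of inodes holding parents of its dnodes."""
    result = {}
    for v, inode in partition.items():
        result.setdefault(inode, set()).update(
            partition[u] for u in parents.get(v, ()))
    return result


def mergeable_pairs(partition, labels, parents):
    """Pairs of inodes with equal label and equal index parents (assumed rule)."""
    ipar = inode_parents(partition, parents)
    ilabel = {inode: labels[v] for v, inode in partition.items()}
    return [(i, j) for i, j in combinations(sorted(ipar), 2)
            if ilabel[i] == ilabel[j] and ipar[i] == ipar[j]]


labels = {"R": "R", "A1": "A", "A2": "A", "B1": "B", "B2": "B"}
# Assumed cyclic graph: R -> A1 -> B1 -> A1 and R -> A2 -> B2 -> A2.
parents = {"A1": ["R", "B1"], "A2": ["R", "B2"], "B1": ["A1"], "B2": ["A2"]}

singletons = {v: v for v in labels}                      # 5 inodes
coarse = {"R": "R", "A1": "A", "A2": "A", "B1": "B", "B2": "B"}  # 3 inodes

print(mergeable_pairs(singletons, labels, parents))  # []
print(mergeable_pairs(coarse, labels, parents))      # []
```

Neither partition has a mergeable pair, so both are minimal under the assumed rule, yet only the three-inode one is minimum. The cycle is what blocks pairwise merging: A1 and A2 could be merged only if B1 and B2 already were, and vice versa.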

14
Quality Guarantee
- Theorem: The split/merge algorithm always maintains a minimal 1-index
- Lemma: For acyclic data graphs, there is a unique minimal 1-index
  - The minimum 1-index is always maintained
- For cyclic data graphs, there could be more than one minimal 1-index
  - One of them is maintained

15
Outline
[Figure: the example data graph from the Overview slide, with sections titled "intro", "1-index", "A(k)-index", and "experiments"]

16
A(k)-Index: Definition
- k-bisimilarity
- Definition based on stability
  - A(0)-index: partition by label
  - …
  - A(k)-index: an inode in the A(k)-index is stable if all of its dnodes have the same index parents in the A(k-1)-index
- Only interested in paths of length ≤ k
- Shown to be much smaller and more efficient than the 1-index [Kaushik et al. 02]
- But no efficient maintenance algorithms are known!
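
The level-by-level definition translates into a short construction: start from the label partition and refine k times, each time grouping dnodes by their previous block and the previous blocks of their parents. This is a naive sketch for illustration, not a maintenance algorithm; `ak_partitions` and `blocks_of` are hypothetical names, and the graph's edges are an assumption chosen to be consistent with the partitions in the deck's A(k)-index example.

```python
def blocks_of(assign):
    """Turn a node -> block-id mapping into a set of frozensets."""
    groups = {}
    for node, block in assign.items():
        groups.setdefault(block, set()).add(node)
    return {frozenset(g) for g in groups.values()}


def ak_partitions(labels, parents, k):
    """Return the partitions [A(0), A(1), ..., A(k)]."""
    levels = [dict(labels)]  # A(0): partition by label
    for _ in range(k):
        prev = levels[-1]
        ids, nxt = {}, {}
        for v in labels:
            # Same A(i) block iff same A(i-1) block and same A(i-1) parents.
            sig = (prev[v], frozenset(prev[u] for u in parents.get(v, ())))
            nxt[v] = ids.setdefault(sig, len(ids))
        levels.append(nxt)
    return [blocks_of(level) for level in levels]


labels = {"R": "R", "A": "A", "B": "B", "C1": "C", "C2": "C", "C3": "C",
          "C4": "C", "C5": "C", "C6": "C"}
# Assumed edges, listed as parents of each node.
parents = {"A": ["R"], "B": ["R"], "C1": ["A"], "C2": ["B"], "C3": ["B"],
           "C4": ["C1"], "C5": ["C2"], "C6": ["C3"]}

a0, a1, a2 = ak_partitions(labels, parents, 2)
print(len(a0), len(a1), len(a2))  # 4 6 7
```

With these edges, A(1) separates {C1}, {C2, C3}, and {C4, C5, C6}, and A(2) further splits the last block into {C4} and {C5, C6}. Each level is computed only from the level before it, which is why maintaining A(i) needs the information in A(i-1).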

17
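A(k)-index: Example
[Figure: four panels.
 Data graph: nodes R, A, B, C1, C2, C3, C4, C5, C6.
 A(2) (= 1-index): inodes R, A, B, {C1}, {C2, C3}, {C4}, {C5, C6}.
 A(1): inodes R, A, B, {C1}, {C2, C3}, {C4, C5, C6}.
 A(0): inodes R, A, B, {C1, C2, C3, C4, C5, C6}.]
- Maintenance of the A(i)-index requires the information in the A(i-1)-index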

18
A(k)-index: Refinement Tree
[Figure: the data graph and the A(2) (= 1-index), A(1), and A(0) partitions from the previous example, with each inode linked to the inode it refines at the level below]

19
A(k)-index: Refinement Tree
[Figure: compact form of the refinement tree for the same data graph, with four C inodes in A(2), three in A(1), and one in A(0)]
- 0.5% ~ 13% additional storage
1. Reduce storage cost
2. Reduce maintenance cost
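
The refinement tree links every A(i) inode to the A(i-1) inode that contains it. The sketch below builds those links for the C-labeled inodes of the example, using the partitions shown on the earlier example slide; the function name `refinement_tree` and the list-of-frozensets representation are assumptions for illustration, not the storage layout used in the talk.

```python
def refinement_tree(levels):
    """Map each (level, block) to its parent (level - 1, containing block)."""
    parent = {}
    for i in range(1, len(levels)):
        for block in levels[i]:
            # Each A(i) block is contained in exactly one A(i-1) block.
            container = next(b for b in levels[i - 1] if block <= b)
            parent[(i, block)] = (i - 1, container)
    return parent


# C-labeled inodes of A(0), A(1), A(2) from the example.
levels = [
    [frozenset({"C1", "C2", "C3", "C4", "C5", "C6"})],
    [frozenset({"C1"}), frozenset({"C2", "C3"}),
     frozenset({"C4", "C5", "C6"})],
    [frozenset({"C1"}), frozenset({"C2", "C3"}),
     frozenset({"C4"}), frozenset({"C5", "C6"})],
]

tree = refinement_tree(levels)
for (level, block), (plevel, pblock) in sorted(
        tree.items(), key=lambda kv: (kv[0][0], sorted(kv[0][1]))):
    print(f"A({level}) {sorted(block)} -> A({plevel}) {sorted(pblock)}")
```

Blocks that do not split between levels, such as {C1} and {C2, C3} here, appear unchanged at consecutive levels, which suggests why a compact representation can keep the extra storage small.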

20
Quality Guarantee
- Theorem: The split/merge algorithm always maintains the A(k)-index
- Lemma: There is a unique minimal A(k)-index for any data graph, acyclic or cyclic

Index maintained:
            1-index        A(k)-index
Acyclic     the minimum    the minimum
Cyclic      a minimal      the minimum

21
Outline
[Figure: the example data graph from the Overview slide, with sections titled "intro", "1-index", "A(k)-index", and "experiments"]

22
Experiments on Edge Changes
- Datasets
  - Real-life: IMDB (272,000 nodes)
  - Benchmark: XMark (198,000 nodes)
- Setup
  - First delete a portion of the existing ID-REF links
  - Then do random mixed insertions/deletions
- Compare with
  - 1-index: propagate (+ reconstruction)
  - A(k)-index: recompute affected portion (+ reconstruction)

23
Experiment Results: 1-index

24
Experiment Results: A(k)-index
k    speedup
2    1.35
3    6.15
4    16.6
5    15.3
[Figure: running times]

25
Conclusions
- The first solutions for the maintenance (edge & subgraph additions/deletions) of the 1-index and the A(k)-index that are both effective and efficient
  - Effective: quality guarantee on the resulting index
  - Efficient: the algorithms themselves are fast
Thank you!

26
Graphical Illustration
[Figure: index size across valid 1-indexes, with split and merge steps marked]
- The index can only grow in size due to splitting if merging is not enforced
