Incremental Maintenance of XML Structural Indexes


Incremental Maintenance of XML Structural Indexes. Ke Yi (1), Hao He (1), Ioana Stanoi (2) and Jun Yang (1). (1) Department of Computer Science, Duke University; (2) IBM T. J. Watson Research Center.

Motivation: XML has gained tremendous popularity in recent years and is used to represent many kinds of data. Major DB vendors are rushing to incorporate solutions for native XML repositories and retrieval: IBM DB2, Oracle, Microsoft SQL Server; Tamino, Natix, X-Hive, …

Overview. [Figure: example XML data graph rooted at a paper node (node 1), with section, title, exp, algorithm, proof, about and uses nodes numbered up to 18; text values include “intro”, “experiments”, “A(k)-index” and “1-index”.]

Label Path Expressions. Example: /paper/section/algorithm. [Figure: the example data graph, annotated with the path expression.]
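
As a minimal sketch of what evaluating a label path expression such as /paper/section/algorithm means, the following walks a data graph one label at a time. The function name `eval_label_path`, the dictionary-based graph encoding and the toy graph are assumptions made for illustration; they are not taken from the slides, and the toy graph is not the one in the figure.

```python
def eval_label_path(labels, children, root, path):
    """Return the set of nodes reached from `root` by following `path`.

    labels:   node -> label
    children: node -> list of child nodes
    path:     a label path such as "/paper/section/algorithm"
    """
    steps = [s for s in path.split("/") if s]
    # The first step must match the label of the root itself.
    if not steps or labels[root] != steps[0]:
        return set()
    frontier = {root}
    for step in steps[1:]:
        # Keep only the children whose label matches the next step.
        frontier = {
            child
            for node in frontier
            for child in children.get(node, ())
            if labels[child] == step
        }
    return frontier


# Hypothetical toy graph (not the graph shown on the slide).
labels = {1: "paper", 2: "section", 3: "section",
          4: "title", 5: "algorithm", 6: "algorithm", 7: "exp"}
children = {1: [2, 3], 2: [4, 5], 3: [6, 7]}

matches = eval_label_path(labels, children, 1, "/paper/section/algorithm")
print(sorted(matches))
```

Evaluating the path directly on the data graph touches every matching section node; a structural index aims to answer the same query on a smaller summary graph.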

Structural Indexes: Why do we need them? Structural indexes speed up the evaluation of path expressions and provide a structural summary of the data graph. Existing structural indexes: DataGuide [Goldman & Widom 97]; 1-index [Milo & Suciu 99]; A(k)-index [Kaushik et al. 02], D(k)-index [Qun et al. 03], M(k)-index [He & Yang 04]; integration of structural indexes and inverted lists [Kaushik et al. 04]. Focus of this talk: maintenance, which has a major effect on index efficiency yet remains an overlooked issue.

Outline: “1-index”. [Figure: the example data graph.]

1-Index: Definition. Constructed by using bisimilarity; the definition here is based on stability. Partition the data nodes (dnodes) into index nodes (inodes); I[v] denotes the inode that contains dnode v. I[u] is v’s index parent if u is v’s parent. An inode is stable if all of its dnodes have the same index parents. In a 1-index, all inodes are stable. [Figure: dnodes u and v with their inodes I[u] and I[v].]
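
The stability definition can be illustrated with a naive refinement loop: start from the partition by label and keep splitting inodes until every inode is stable. This is only a sketch under stated assumptions. It is not the PT algorithm and not the split/merge algorithm of the talk, and the name `one_index`, the graph encoding and the toy graph are hypothetical.

```python
def one_index(labels, parents):
    """Partition dnodes into stable inodes by naive repeated refinement.

    labels:  dnode -> label
    parents: dnode -> list of parent dnodes
    Returns a set of frozensets, one per inode.
    """
    block = dict(labels)  # start from the partition by label
    while True:
        # Signature of a dnode: its current inode plus its set of index parents.
        signature = {
            v: (block[v], frozenset(block[u] for u in parents.get(v, ())))
            for v in labels
        }
        ids = {}
        refined = {v: ids.setdefault(signature[v], len(ids)) for v in labels}
        # Each round only splits inodes, so an unchanged count means all are stable.
        if len(ids) == len(set(block.values())):
            break
        block = refined
    inodes = {}
    for v, b in block.items():
        inodes.setdefault(b, set()).add(v)
    return {frozenset(members) for members in inodes.values()}


# Hypothetical toy graph: R -> A, R -> B, A -> C1, A -> C2, B -> C3.
labels = {"R": "R", "A": "A", "B": "B", "C1": "C", "C2": "C", "C3": "C"}
parents = {"A": ["R"], "B": ["R"], "C1": ["A"], "C2": ["A"], "C3": ["B"]}

index = one_index(labels, parents)
print(sorted(sorted(inode) for inode in index))
```

In this toy graph C1 and C2 share the index parent I[A], so they stay in one inode, while C3 is split off because its only index parent is I[B].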

1-Index: Example. [Figure: the data graph next to its 1-index, both annotated with /paper/section/algorithm; the multi-node inodes shown include {2,4,8,13}, {3,5,9,14}, {6,10}, {15,16} and {17,18}.]

1-Index: Quality. Assigning dnodes that are bisimilar into different inodes does not affect correctness, but does affect efficiency. The quality of an index: quality = (# inodes / # inodes in the minimum 1-index − 1) × 100%. Ideal: quality = 0%. [Figure: a 1-index of the example in which inode {2,4,8,13} appears split into {2,4} and {8,13}.]
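
The quality measure is simple arithmetic, sketched below. The function name `index_quality` and the sample inode counts are assumptions for illustration only; the formula is rearranged algebraically to (n − m) / m × 100, which equals (n / m − 1) × 100.

```python
def index_quality(num_inodes, num_min_inodes):
    """Quality in percent: (num_inodes / num_min_inodes - 1) * 100.

    0.0 means the index is as small as the minimum 1-index.
    """
    if num_min_inodes <= 0:
        raise ValueError("the minimum 1-index must have at least one inode")
    if num_inodes < num_min_inodes:
        raise ValueError("an index cannot be smaller than the minimum 1-index")
    return 100.0 * (num_inodes - num_min_inodes) / num_min_inodes


# Hypothetical counts: one extra inode on top of a 10-inode minimum index.
print(index_quality(11, 10))
print(index_quality(10, 10))
```

Lower is better: every unnecessary split adds an inode and pushes the quality value away from 0%.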

Previous Results. Construction: the PT algorithm [Paige & Tarjan 87], in time O(m log n), where m is the number of edges and n the number of nodes. Edge changes: the propagate algorithm [Kaushik et al. 02]; it gives no guarantee on the quality of the resulting index after updates (quality of 3 ~ 5% after 500 edge insertions in experiments). Subgraph addition: index reconstruction.

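Edge Insertion: An Example (1). [Figure: three panels. Data graph with nodes A, B, C1, C2, C3, D1, D2, D3; its 1-index with inodes {C1,C2}, {C3}, {D1,D2}, {D3}; the index after Split 1, with C1, C2, C3 separate and {D1,D2}, {D3} unchanged.]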

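Edge Insertion: An Example (2). [Figure: three panels. After Split 2: C1, C2, C3 and D1, D2, D3 all separate; after Merge 1: {C1}, {C2,C3}, with D1, D2, D3 separate; after Merge 2: {C1}, {C2,C3}, {D1}, {D2,D3}.] The result is indeed the minimum 1-index for the data graph after the update. Not a coincidence!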

Minimum & Minimal Indexes. Minimum: with the smallest number of inodes. Minimal: no two inodes can be merged. [Figure: a data graph with nodes R, A1, A2, B1, B2; its minimum 1-index with inodes {A1,A2} and {B1,B2}; a minimal 1-index that keeps A1, A2, B1 and B2 separate.]
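
The difference between minimum and minimal can be checked mechanically on a small cyclic graph. The sketch below assumes that two inodes can be merged when they carry the same label and have the same set of index parents; that merge condition, the helper names and the edges of the toy graph are assumptions made here for illustration, since the slide text does not spell them out.

```python
from itertools import combinations


def parent_blocks(v, parents, block_of):
    """Set of inodes (by position in the partition) holding a parent of dnode v."""
    return frozenset(block_of[u] for u in parents.get(v, ()))


def is_stable(partition, parents):
    """True if, in every inode, all dnodes have the same index parents."""
    block_of = {v: i for i, blk in enumerate(partition) for v in blk}
    return all(
        len({parent_blocks(v, parents, block_of) for v in blk}) == 1
        for blk in partition
    )


def mergeable_pairs(partition, labels, parents):
    """Pairs of inodes with equal label and equal index parents (stable input assumed)."""
    block_of = {v: i for i, blk in enumerate(partition) for v in blk}

    def key(i):
        v = next(iter(partition[i]))  # any member represents a stable inode
        return labels[v], parent_blocks(v, parents, block_of)

    return [(i, j) for i, j in combinations(range(len(partition)), 2)
            if key(i) == key(j)]


# Hypothetical cyclic graph: R -> A1, R -> A2, A1 -> B1 -> A1, A2 -> B2 -> A2.
labels = {"R": "R", "A1": "A", "A2": "A", "B1": "B", "B2": "B"}
parents = {"A1": ["R", "B1"], "A2": ["R", "B2"], "B1": ["A1"], "B2": ["A2"]}

fine = [{"R"}, {"A1"}, {"A2"}, {"B1"}, {"B2"}]
coarse = [{"R"}, {"A1", "A2"}, {"B1", "B2"}]

for name, partition in (("fine", fine), ("coarse", coarse)):
    print(name, is_stable(partition, parents),
          mergeable_pairs(partition, labels, parents))
```

Both partitions are stable and neither contains a mergeable pair under the assumed condition, so both are minimal, yet only the coarse one has the smallest number of inodes. The cycle is what prevents A1 and A2 from merging in the fine partition: their index parents differ only because B1 and B2 are still apart, and vice versa.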

Quality Guarantee. Theorem: The split/merge algorithm always maintains a minimal 1-index. Lemma: For acyclic data graphs, there is a unique minimal 1-index, so the minimum 1-index is always maintained. For cyclic data graphs, there could be more than one minimal 1-index; one of them is maintained.

Outline: “A(k)-index”. [Figure: the example data graph.]

A(k)-Index: Definition. Based on k-bisimilarity; the definition here is based on stability. A(0)-index: partition by label. … A(k)-index: an inode in the A(k)-index is stable if all of its dnodes have the same index parents in the A(k-1)-index. Only paths of length ≤ k are of interest. Shown to be much smaller and more efficient than the 1-index [Kaushik et al. 02]. But no efficient maintenance algorithms are known!
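
The level-by-level definition can be sketched as k rounds of refinement, where round i splits the inodes of A(i-1) by their index parents in A(i-1). This is an illustrative construction only, not the maintenance algorithm of the talk; the name `a_k_index`, the graph encoding and the toy graph are assumptions.

```python
def a_k_index(labels, parents, k):
    """Build the A(k)-index partition by k rounds of refinement.

    labels:  dnode -> label
    parents: dnode -> list of parent dnodes
    Returns a set of frozensets, one per inode.
    """
    block = dict(labels)  # A(0): partition by label
    for _ in range(k):
        # A(i): split each A(i-1) inode by the index parents in A(i-1).
        signature = {
            v: (block[v], frozenset(block[u] for u in parents.get(v, ())))
            for v in labels
        }
        ids = {}
        block = {v: ids.setdefault(signature[v], len(ids)) for v in labels}
    inodes = {}
    for v, b in block.items():
        inodes.setdefault(b, set()).add(v)
    return {frozenset(members) for members in inodes.values()}


# Hypothetical toy graph: r -> a -> x1 -> c1 and r -> b -> x2 -> c2.
labels = {"r": "R", "a": "A", "b": "B",
          "x1": "X", "x2": "X", "c1": "C", "c2": "C"}
parents = {"a": ["r"], "b": ["r"], "x1": ["a"], "x2": ["b"],
           "c1": ["x1"], "c2": ["x2"]}

for k in range(3):
    print(k, len(a_k_index(labels, parents, k)))
```

In this toy graph c1 and c2 share an inode in A(0) and A(1), because their incoming paths of length at most 1 look the same, and are separated only in A(2), where the difference between the A and B ancestors becomes visible.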

A(k)-index: Example. [Figure: a data graph with nodes R, A, B, C1, C2, C3 next to its A(2)-index (= 1-index), A(1)-index and A(0)-index; inodes shown include {C1} and {C2,C3}.] Maintenance of the A(i)-index requires the information in the A(i-1)-index.

A(k)-index: Refinement Tree. [Figure: a data graph with nodes A, B and C1 to C6 next to its A(2)-index (= 1-index), A(1)-index and A(0)-index; inode labels shown include {C1}, {C2,C3}, {C1,C2,C3}, {C4}, {C5,C6} and {C4,C5,C6}.]

A(k)-index: Refinement Tree (continued). [Figure: the same data graph with the A(2), A(1) and A(0) levels drawn as a refinement tree.] Reduces storage cost. Reduces maintenance cost. 0.5% ~ 13% additional storage.

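Quality Guarantee. Theorem: The split/merge algorithm always maintains the minimum A(k)-index. Lemma: There is a unique minimal A(k)-index for any data graph, acyclic or cyclic. Summary: for the 1-index, the algorithm maintains the minimum index on acyclic graphs and a minimal index on cyclic graphs; for the A(k)-index, it maintains the minimum index in both cases.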

Outline: “experiments”. [Figure: the example data graph.]

Experiments on Edge Changes. Datasets: real-life, IMDB (272,000 nodes); benchmark, XMark (198,000 nodes). Setup: first delete a portion of the existing ID-REF links, then perform random mixed insertions/deletions. Compared against: for the 1-index, propagate (+ reconstruction); for the A(k)-index, recompute affected portion (+ reconstruction).

Experiment Results: 1-index

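Experiment Results: A(k)-index. [Figure: running times.] Speedup table (the first value of each pair is unlabeled in the transcript, presumably k): 2: 1.35; 3: 6.15; 4: 16.6; 5: 15.3.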

Conclusions. The first solutions for the maintenance (edge and subgraph additions/deletions) of the 1-index and A(k)-index that are both effective and efficient. Effective: quality guarantee on the resulting index. Efficient: the algorithms themselves are fast. Thank you!

Graphical Illustration. [Figure: diagram labeled with size, valid 1-index, index, split and merge.] The index can only grow in size due to splitting if merging is not enforced.