Twig2Stack: Bottom-up Processing of Generalized-Tree-Pattern Queries over XML Documents

Presentation transcript:

Twig2Stack: Bottom-up Processing of Generalized-Tree-Pattern Queries over XML Documents
Songting Chen, Hua-Gang Li*, Junichi Tatemura, Wang-Pin Hsiung, Divyakant Agrawal and K. Selcuk Candan
NEC Laboratories America
* University of California, Santa Barbara

VLDB' Seoul, Korea

2. Background
XML
– Hierarchical (tree-)structured data
– Provides flexibility to model semi-structured data
– Widely accepted as a universal data exchange format
Queries over XML
– XPath, XQuery [W3C]
– Extensively used by many applications
– Adopted by a number of commercial systems

3. State-of-the-art: XML Query Processing
(Diagram: existing approaches arranged by query type: path, tree, and generalized tree pattern (GTP).)
Binary structural joins [Timber]
– Large intermediate results
Holistic approach
– PathStack [Bruno et al.]
– TwigStack [Bruno et al.]
Algebraic approach
– Generalized Tree Pattern (GTP): optimizes multiple path expressions of XQuery [Chen et al.]
– Expensive post-processing
Position marked "?" on the slide: Twig2Stack

4. Processing Generalized Tree Pattern (GTP) Queries
XQuery: FOR $b in //A[E]/B, $d in $b/$D LET $c = $b/C RETURN $b, $c, $d
(Figure: GTP with nodes A, B, C, D. The legend distinguishes return nodes, group return nodes and non-return nodes, and mandatory vs. optional axes.)
Algebraic approach [Chen et al.] relies on
– Structural joins
– Structural outer joins
– Grouping
– Duplicate elimination
– Sort
(Figure: example data with elements a1, a2, b1, b2 for //A//B and //A/B.)
Our goal: avoid ALL of these!

5. Motivation: PathStack [Bruno et al.]
Query: //A//B
(Figure: data tree with elements a1, a2, b1, b2 and the stacks S[A] and S[B].)
Key observation: minimize intermediate results through a compact representation of path matches, by recording
– Inter-node: the AD (ancestor-descendant) relationship between elements in different query nodes, e.g., b1→a2, b2→a2
– Intra-node: the AD relationship between elements within the same query node, e.g., b1, b2
TwigStack [Bruno et al.] minimizes intermediate results by
– Outputting only those path matches that are in the final twig results
– However, such optimality cannot be guaranteed [Choi et al.]
– Not helpful for processing GTP queries
Question: can we minimize intermediate results for twig queries through compact result encoding (similar to PathStack)?
– Would this be useful for processing GTP queries as well?
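The compact encoding described on this slide can be illustrated with a minimal Python sketch. It is not the authors' implementation: it handles only the two-node query //A//B, the element names and positions are hypothetical, and elements are assumed to arrive in document order with well-nested (start, end) positions. Each B element stores a single index into S[A]; that one pointer stands for every A ancestor at or below it on the stack.

```python
def path_stack_ad(elements):
    """Simplified PathStack-style evaluation of //A//B.

    elements: (name, tag, start, end) tuples sorted by start position.
    Returns (b_name, pointer, ancestors) triples, where pointer is the
    index of the top of S[A] when the B element arrived.
    """
    stack_a = []   # S[A]: (name, end) entries forming a nested chain
    encoded = []
    for name, tag, start, end in elements:
        # Pop A elements that closed before this element starts.
        while stack_a and stack_a[-1][1] < start:
            stack_a.pop()
        if tag == "A":
            stack_a.append((name, end))
        elif tag == "B" and stack_a:
            pointer = len(stack_a) - 1
            # Expanding the pointer yields all matches for this B element.
            ancestors = [a for a, _ in stack_a[: pointer + 1]]
            encoded.append((name, pointer, ancestors))
    return encoded


# Hypothetical data: a1 contains a2 and b3; a2 contains b1 and b2.
stream = [
    ("a1", "A", 1, 10),
    ("a2", "A", 2, 7),
    ("b1", "B", 3, 4),
    ("b2", "B", 5, 6),
    ("b3", "B", 8, 9),
]
encoded = path_stack_ad(stream)
```

In this data b1 and b2 both point at a2, which implicitly covers a1 as well, while b3 points only at a1 because a2 has already been popped.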

6. Hierarchical Stack Encoding
Inter-node: //A//B
– Can still use explicit edges
Intra-node: A
– Matching elements form a tree structure as well
Associate each query node with a hierarchical stack
– Push element e into hierarchical stack HS[E] iff e satisfies the sub-twig query rooted at E
– Whether e matches can be determined once the entire sub-tree of e has been seen
– Requires a post-order document traversal
(Figure: document elements a1, a2, a3, a4 and the corresponding hierarchical stack HS[A].)
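The push rule above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the Twig2Stack algorithm itself: each HS[E] is a flat list rather than a hierarchical stack, no inter-node edges are recorded, every query node is assumed to have a distinct tag, and the classes `Elem` and `QNode`, the function `bottom_up_match`, and the sample document are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Elem:
    name: str
    tag: str
    children: list = field(default_factory=list)


@dataclass
class QNode:
    tag: str
    # (axis, child) pairs; axis is "/" (parent-child) or "//" (ancestor-descendant)
    edges: list = field(default_factory=list)


def bottom_up_match(root, query):
    """Post-order pass: e goes into hs[E] iff e satisfies the sub-twig rooted at E."""
    qnodes = []

    def collect(q):
        qnodes.append(q)
        for _axis, child in q.edges:
            collect(child)

    collect(query)
    hs = {q.tag: [] for q in qnodes}

    def visit(e):
        child_own, below = set(), set()
        for c in e.children:            # children first: post-order
            own, sub = visit(c)
            child_own |= own            # query tags matched by direct children
            below |= own | sub          # query tags matched anywhere below e
        own = set()
        for q in qnodes:
            if q.tag == e.tag and all(
                child.tag in (child_own if axis == "/" else below)
                for axis, child in q.edges
            ):
                hs[q.tag].append(e.name)
                own.add(q.tag)
        return own, below

    visit(root)
    return hs


# Twig //A/B with B//C and B//D, over a small hypothetical document.
query = QNode("A", [("/", QNode("B", [("//", QNode("C")), ("//", QNode("D"))]))])
doc = Elem("a1", "A", [
    Elem("a2", "A", [Elem("b1", "B", [Elem("c1", "C"), Elem("d1", "D")])]),
    Elem("b2", "B", [Elem("d2", "D")]),
])
hs = bottom_up_match(doc, query)
```

Here b2 is rejected because it has no C descendant, and a1 is rejected because its only B child is b2. d2 still lands in HS[D], since a leaf query node is satisfied by tag alone.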

7. Twig2Stack: Running Example
(Figure: twig query with nodes A, B, C, D; document tree with elements a1, a2, b1, b2, b3, c1, c2, d1, d2, d3; region labels as extracted from the slide: [1,20], 1; [2,15], 2; [3,14], 3; [4,11], 4; [8,9], 6; [5,10], 5; [6,7], 6; [12,13], 4; [16,19], 2; [17,18], 3; hierarchical stacks HS[A], HS[B], HS[C], HS[D]; merging of stacks.)
TwigStack needs to enumerate 3 matches for //A/B//D and 2 for //A/B//C, then join them together.
Twig2Stack requires neither path joins nor path enumeration!
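The labels in the running example look like region encodings. Assuming each label reads [start, end], level (an interpretation, since the slide does not spell it out), the ancestor-descendant and parent-child tests reduce to interval containment plus a level check. The Python sketch below uses label values from the slide but hypothetical variable names, because the flattened text does not show which element carries which label.

```python
from typing import NamedTuple


class Region(NamedTuple):
    start: int
    end: int
    level: int


def is_ancestor(a: Region, d: Region) -> bool:
    """a is an ancestor of d iff d's interval lies strictly inside a's."""
    return a.start < d.start and d.end < a.end


def is_parent(a: Region, d: Region) -> bool:
    """Parent-child additionally requires adjacent levels."""
    return is_ancestor(a, d) and d.level == a.level + 1


# Label values taken from the slide; the names are placeholders.
r_root = Region(1, 20, 1)
r_x = Region(2, 15, 2)
r_y = Region(4, 11, 4)
r_z = Region(6, 7, 6)
r_w = Region(16, 19, 2)
```

With these values, r_root is the parent of r_x, r_y is an ancestor but not the parent of r_z, and r_x and r_w are siblings whose intervals do not overlap.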

8. GTP Result Enumeration
Bottom-up computation vs. top-down enumeration
– Visit only those elements that are in the twig matches
Handling grouping results
– Automatic grouping through inter-node edges
Handling duplicates and out-of-order results
– Problems come from non-return nodes
– If D is a return node while B is not: b1 → d1, d2, d3 and b2 → d2, d3 (duplicates)
– Observation: the intra-node hierarchy provides hints
(Figure: stack contents a4, b1, b2, c1, c2, d1, d2, d3.)
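The duplicate problem and the hint from the intra-node hierarchy can be sketched in Python. The edge lists come from the slide; the assumption that b2 is nested inside b1 in the stack hierarchy, and both function names, are hypothetical. The shortcut shown is valid only for an ancestor-descendant edge such as B//D: every D below a nested B element is also below its enclosing B element, so only top-level B elements need to be expanded.

```python
def enumerate_naive(edges):
    """Expand every B element; nested B elements repeat their D matches."""
    out = []
    for _b, ds in edges.items():
        out.extend(ds)
    return out


def enumerate_top_level(edges, parent_of):
    """Expand only B elements that have no B ancestor in the stack hierarchy."""
    out = []
    for b, ds in edges.items():
        if parent_of.get(b) is None:
            out.extend(ds)
    return out


# Inter-node edges as listed on the slide.
edges = {"b1": ["d1", "d2", "d3"], "b2": ["d2", "d3"]}
# Assumed intra-node hierarchy: b2 sits below b1.
parent_of = {"b1": None, "b2": "b1"}

naive = enumerate_naive(edges)
deduped = enumerate_top_level(edges, parent_of)
```

The naive pass returns d2 and d3 twice, while the hierarchy-aware pass returns each D element once and in document order, without a separate sort or duplicate-elimination step.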

9. Experiment Setup
Implementation
– Twig2Stack: Java
– TwigStack, TJFast: Java, kindly provided by Jiaheng Lu from National University of Singapore (NUS)
Datasets
– XMark, DBLP, TreeBank
Metrics
– Query processing time
– IO time

10. Processing Full Twig Queries
(Charts comparing TwigStack, Twig2Stack and TJFast.)
Optimization of query processing: TwigStack, Twig2Stack
Optimization of IO: TJFast

11. Not Yet Done: Memory Usage
Hierarchical stack encoding could hold the entire document in memory in the worst case
– Unlike the DOM approach, only matches need to be stored
  – Tag match
  – (Partial) twig match
  – Predicate evaluation
Early result enumeration dramatically reduces memory usage
– Enumerate query results before the end of the document and release the buffer
– Main idea: a hybrid of the top-down (PathStack) and bottom-up (Twig2Stack) approaches

12. Early Result Enumeration (ERM)
Enumerate results and release the buffer when elements in the top-branch node are popped from PathStack
(Figure: the running example again, with twig query nodes A, B, C, D, the region-labeled document tree, the PathStack stacks S[A], S[B], S[C], S[D], and the hierarchical stacks HS[A], HS[B], HS[C], HS[D].)

13. Memory Usage
(Figure: two document shapes. One with nodes dblp, article, title, year, annotated "Small sub-tree"; one with nodes site, open_auctions, bidder, bid, reserve, increase, annotated "Huge sub-tree".)

14. Conclusions and Future Work
Proposed a bottom-up GTP processing solution
– A twig encoding scheme
– A GTP enumeration algorithm that avoids any post-processing operations
– A hybrid scheme to reduce memory usage
Future directions
– Handling worst-case memory issues
– Optimizing IO cost by exploiting indexes
– Handling other axes, full XQuery, graph input
– Handling XML streams
– …

16. Processing GTP
Optimization of non-return nodes
Automatic grouping