Processing Recursive XQuery over XML Streams: The Raindrop Approach. Mingzhu Wei, Ming Li, Elke A. Rundensteiner, Murali Mani. Worcester Polytechnic Institute.

Presentation transcript:

1 Processing Recursive XQuery over XML Streams: The Raindrop Approach
Mingzhu Wei, Ming Li, Elke A. Rundensteiner, Murali Mani
Worcester Polytechnic Institute
XSDM Workshop, 2006
Supported by USA National Science Foundation

2 What's Special for XML Streams
Token-by-token access manner: tokens arrive one after another along a timeline.
A token is not a counterpart of a self-contained tuple.
Pattern retrieval on token streams.
Example stream fragment (element markup lost in extraction): Jack, Brooks
Q1: for $a in stream("persons")//person return $a, $a//name

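3 Running Example
Q1: for $a in stream("persons")//person return $a, $a//name
D1 (not recursive; element markup lost in extraction): text content Jack, Brooks and Amy
D2 (recursive; element markup lost in extraction): labels 1 and 2; text content Jack, Brooks and Will, Brooks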

4 Retrieving Patterns Using Automata
How to process "/" pattern retrieval in automata?
Q2: for $a in stream("persons")/person return $a, $a/name
Automaton of Q2: s0 --person--> s1 --name--> s2
How to process "//" pattern retrieval in automata?
Q1: for $a in stream("persons")//person return $a, $a//name
Automaton of Q1: s0 --λ--> s1 --person--> s2 --λ--> s3 --name--> s4, with * transitions for arbitrary elements
Automaton of Q1 and its stack (figure): the stack holds sets of active states; the snapshots shown include {s0}, {s1, s2}, and {s1, s3, s4}, the last while the name element containing Jack is open.
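
The stack-based matching of "//" patterns on this slide can be sketched in Python as follows. This is a minimal illustration, not the Raindrop implementation: the token format, the function names, the placement of the * self-loops on s1 and s3, and the wrapper tag `child` in the sample stream are all assumptions made for the example, and the state sets it computes are not guaranteed to match the figure exactly.

```python
# Sketch: NFA for //person//name with a stack of active-state sets.
# Assumed encoding of the Q1 automaton (states s0..s4 as integers 0..4).
EPSILON = {0: 1, 2: 3}                      # s0 -λ-> s1, s2 -λ-> s3
SELF_LOOP = {1, 3}                          # assumed: s1 and s3 loop on any element (*)
LABELED = {(1, "person"): 2, (3, "name"): 4}
BINDINGS = {2: "$a", 4: "$b"}               # final states and the variables they bind


def closure(states):
    """Add every state reachable through λ transitions."""
    result = set(states)
    pending = list(states)
    while pending:
        target = EPSILON.get(pending.pop())
        if target is not None and target not in result:
            result.add(target)
            pending.append(target)
    return result


def match_stream(tokens):
    """Return (variable, tag, depth) for every element matched by Q1's paths."""
    stack = [closure({0})]
    matches = []
    for kind, value in tokens:
        if kind == "start":
            current = stack[-1]
            fired = {LABELED[(s, value)] for s in current if (s, value) in LABELED}
            kept = current & SELF_LOOP       # states that survive via the * loop
            stack.append(closure(fired | kept))
            depth = len(stack) - 1
            matches.extend((BINDINGS[s], value, depth) for s in sorted(fired))
        elif kind == "end":
            stack.pop()                      # restore the parent's active states
    return matches


# Hypothetical recursive stream in the spirit of D2; the "child" tag is invented.
D2 = [("start", "person"), ("start", "name"), ("text", "Jack, Brooks"), ("end", "name"),
      ("start", "child"), ("start", "person"), ("start", "name"),
      ("text", "Will, Brooks"), ("end", "name"),
      ("end", "person"), ("end", "child"), ("end", "person")]

print(match_stream(D2))
```

Pushing a new state set on every start tag and popping it on the matching end tag is what lets the same automaton match the nested person element without losing the context of the outer one.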

5 Raindrop Algebra Plan
Plan (figure), with operators labeled op1 to op5, fed by the stream data through the Q1 automaton (s0 --λ--> s1 --person--> s2 --λ--> s3 --name--> s4):
Navigate //person->$a
Navigate $a//name->$b
ExtractUnnest $a
ExtractNest $b
StructuralJoin $a
Note that the structural join (in-time structural join) only performs Cartesian products!
The person element will be purged after generating output!

6 Problems with Recursion
Same plan as on the previous slide (figure): Navigate //person->$a, Navigate $a//name->$b, ExtractUnnest $a, ExtractNest $b, StructuralJoin $a, fed by the Q1 automaton.
D2 (element markup lost in extraction): labels 1 and 2; text content Jack, Brooks and Will, Brooks
After the second person and name are joined, we can't get the correct result for the first person.

7 Goals
How to correctly process recursive data and recursive queries?
How to guarantee that data is output as early as possible?
When data is non-recursive, how to make the cost of the plan as cheap as possible?

8 Recursive-Mode Operators
Each operator has a recursive-mode counterpart.
Associate IDs with elements: each element is associated with a triple (startID, endID, level).
Given two elements and the corresponding triples, we can determine ancestor-descendant and parent-child relationships.
Example (figure; element markup lost in extraction): text content Jack and Amy; triples (1, 12, 1) and (2, 4, 2).
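
As a rough illustration of how such triples could be assigned, the sketch below numbers tokens from 1 and records the start token, end token, and nesting level of each element. The token format, the function name `assign_triples`, and the wrapper tag `child` in the sample stream are assumptions for the example; the slides do not show how Raindrop computes the IDs.

```python
def assign_triples(tokens):
    """Return (tag, startID, endID, level) per element, in order of start tokens.

    Every token (start tag, text, end tag) consumes one ID, starting at 1.
    """
    open_indexes = []            # positions in `result` of elements still open
    result = []
    for token_id, (kind, value) in enumerate(tokens, start=1):
        if kind == "start":
            open_indexes.append(len(result))
            # endID is unknown until the matching end tag arrives.
            result.append([value, token_id, None, len(open_indexes)])
        elif kind == "end":
            result[open_indexes.pop()][2] = token_id
    return [tuple(entry) for entry in result]


# Hypothetical recursive stream; the "child" wrapper tag is invented.
D2 = [("start", "person"), ("start", "name"), ("text", "Jack, Brooks"), ("end", "name"),
      ("start", "child"), ("start", "person"), ("start", "name"),
      ("text", "Will, Brooks"), ("end", "name"),
      ("end", "person"), ("end", "child"), ("end", "person")]

for triple in assign_triples(D2):
    print(triple)
```

On this stream the outer person receives (1, 12, 1) and its name receives (2, 4, 2), which agrees with the triples quoted on the slide.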

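9 Features of Recursive Navigate Operators
Keep track of the triple for each element.
Call structural join only when all triples in the Navigate operator are complete.
Example (figure; element markup lost in extraction, text content Jack and Amy):
Token 1: Navigate //person->$a records (1, -, 1).
Token 2: Navigate $a//name->$b records (2, -, 2), later completed to (2, 4, 2).
Token 9: Navigate $a//name->$b completes (7, 9, 4); Navigate //person->$a later completes (6, 10, 3).
Token 12: Navigate //person->$a completes (1, 12, 1).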

10 Features of Recursive Extract Operators
ExtractUnnest: compose the tokens into tuples; associate ID information with each corresponding element.
ExtractNest: collect the tokens and create one tuple for the whole collection; move the group-by functionality to the top structural join.

11 Changes of Structural Join
In-time structural join: do a Cartesian product. Example (figure): Structural Join $a over a and b1, b2 outputs (a, b1), (a, b2).
ID-based structural join: change from the in-time structural join to an ID-based comparison method.
ID-based comparison condition:
Ancestor-descendant: a.startID < b.startID && b.endID < a.endID
Parent-child: a.startID < b.startID && b.endID < a.endID && b.level = a.level + 1
Example (figure): ExtractUnnest $a holds a1 = (1, 12, 1) and a2 = (6, 10, 3); ExtractUnnest $b holds b1 = (2, 4, 2) and b2 = (7, 9, 4); Structural Join $a outputs (a1, b1), (a1, b2), (a2, b2).
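
The comparison condition on this slide translates directly into code. The sketch below is only an illustration: triples are plain (startID, endID, level) tuples and the function names are invented for the example.

```python
def is_ancestor(a, b):
    """a is an ancestor of b: b's token range lies strictly inside a's."""
    return a[0] < b[0] and b[1] < a[1]


def is_parent(a, b):
    """a is the parent of b: ancestor test plus a level difference of one."""
    return is_ancestor(a, b) and b[2] == a[2] + 1


def id_based_join(a_list, b_list, parent_child=False):
    """Pair each a with the b elements it contains, instead of a Cartesian product."""
    test = is_parent if parent_child else is_ancestor
    return [(a, b) for a in a_list for b in b_list if test(a, b)]


# Triples taken from the slide's example.
a1, a2 = (1, 12, 1), (6, 10, 3)
b1, b2 = (2, 4, 2), (7, 9, 4)

print(id_based_join([a1, a2], [b1, b2]))                     # ancestor-descendant
print(id_based_join([a1, a2], [b1, b2], parent_child=True))  # parent-child
```

A plain Cartesian product over the same buffers would also emit (a2, b1), which is wrong because b1 closes before a2 opens; the ID comparison filters that pair out.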

12 Structural Join Invoking Issue
Invoking strategy: the structural join is invoked only when all the triples are complete.
Example (figure): ExtractUnnest $a holds a1 = (1, 12, 1) and a2 = (6, 10, 3); ExtractUnnest $b holds b1 = (2, 4, 2) and b2 = (7, 9, 4); Structural Join $a outputs (a1, b1), (a1, b2), (a2, b2), after which the buffers are cleaned.
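
One way to read the invoking strategy is: buffer completed triples, and run the join as soon as the outermost open $a element closes, since at that moment no triple is still pending. The sketch below follows that reading for the paths //person and $a//name. The token format, the function name, and the wrapper tag `child` are assumptions, and the real operators are driven by the automaton rather than by tag comparison.

```python
def join_when_complete(tokens, a_tag="person", b_tag="name"):
    """Return one list of (a, b) triple pairs per join invocation."""
    open_elements = []           # (tag, startID, level) of elements still open
    a_buf, b_buf = [], []        # completed triples waiting for the join
    open_a = 0                   # number of $a elements currently open
    batches = []
    for token_id, (kind, value) in enumerate(tokens, start=1):
        if kind == "start":
            open_elements.append((value, token_id, len(open_elements) + 1))
            if value == a_tag:
                open_a += 1
        elif kind == "end":
            tag, start, level = open_elements.pop()
            triple = (start, token_id, level)
            if tag == b_tag and open_a > 0:
                b_buf.append(triple)
            elif tag == a_tag:
                a_buf.append(triple)
                open_a -= 1
                if open_a == 0:  # all triples complete: invoke the join, then clean
                    batches.append([(a, b)
                                    for a in sorted(a_buf) for b in sorted(b_buf)
                                    if a[0] < b[0] and b[1] < a[1]])
                    a_buf.clear()
                    b_buf.clear()
    return batches


# Hypothetical recursive stream; the "child" wrapper tag is invented.
D2 = [("start", "person"), ("start", "name"), ("text", "Jack, Brooks"), ("end", "name"),
      ("start", "child"), ("start", "person"), ("start", "name"),
      ("text", "Will, Brooks"), ("end", "name"),
      ("end", "person"), ("end", "child"), ("end", "person")]

print(join_when_complete(D2))
```

Waiting for the outermost person avoids the problem of purging the inner person's name before the outer person has been joined with it, while still producing output at the earliest point where the result is known to be complete.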

13 Another Query With ExtractNest Operators
Q3: for $x in //a return $x//b, $x//c
ExtractNest = ExtractUnnest + GroupBy
Plan (figure): the stream data feeds Navigate //a->$x, Navigate $x//b->$y, and Navigate $x//c->$z; ExtractNest $y and ExtractNest $z feed StructuralJoin $x.
Example data (figure): a (1, 14) contains a (2, 9), b (10, 11), and c (12, 13); a (2, 9) contains b (3, 4), b (5, 6), and c (7, 8).

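14 Process ExtractNest
Q3: for $x in //a return $x//b, $x//c
Push GroupBy up: it is better to do the group-by in the structural join here!
Plan (figure): the stream data feeds Navigate //a->$x, Navigate $x//b->$y, and Navigate $x//c->$z; ExtractNest $y and ExtractNest $z feed Structural Join $x, which now also performs the GroupBy.
Buffered IDs (figure): b elements (3, 4), (5, 6), (10, 11); c elements (7, 8), (12, 13); a elements (2, 9), (1, 14).
Grouped output (figure): for a2, the b group is b2, b3 and the c group is c1; for a1, the b group is b1, b2, b3 and the c group is c1, c2.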
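
Folding the group-by into the structural join can be sketched as follows: for each $x element, the join collects the b and c elements whose ID ranges fall inside it. This is an illustration under assumptions: elements are reduced to (startID, endID) pairs as in the figure, and the function name and dictionary layout are invented.

```python
def grouped_structural_join(a_elems, branches):
    """For each a, group the elements of every branch that a contains.

    `branches` maps a branch name (for example "b") to its buffered (start, end) IDs.
    """
    rows = []
    for a in sorted(a_elems):
        row = {"a": a}
        for name, elems in branches.items():
            row[name] = [e for e in sorted(elems) if a[0] < e[0] and e[1] < a[1]]
        rows.append(row)
    return rows


# IDs from the example data of Q3.
a_elems = [(2, 9), (1, 14)]
branches = {"b": [(3, 4), (5, 6), (10, 11)], "c": [(7, 8), (12, 13)]}

for row in grouped_structural_join(a_elems, branches):
    print(row)
```

Because the same b and c elements belong to the groups of both the inner and the outer a, grouping inside ExtractNest before the join would be premature on recursive data; doing it in the join lets each a pick its own members by ID.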

15 Further Optimization Using Context-Aware Structural Join
Run-time switching from the ID-based structural join to the efficient in-time structural join strategy.
Flow (figure): Navigate and the automata feed a context check. If the data is recursive, the recursive structural join is used; if the data is not recursive, the in-time structural join is used. The join then outputs tuples and purges tuples.
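
The run-time switch can be sketched as a join that first checks its context and then picks a strategy. In the slides the context check is tied to the automata; the sketch below approximates it, as an assumption, by testing whether any buffered $a triple is nested inside another. The function name and the returned strategy labels are invented for the example.

```python
def context_aware_join(a_buf, b_buf):
    """Return (strategy, pairs) for the buffered (startID, endID, level) triples."""
    # Assumed context check: the fragment is recursive if one $a contains another.
    recursive = any(x[0] < y[0] and y[1] < x[1] for x in a_buf for y in a_buf)
    if not recursive:
        # In-time structural join: a plain Cartesian product, no ID comparisons.
        return "in-time", [(a, b) for a in a_buf for b in b_buf]
    # Recursive structural join: compare IDs to keep only contained elements.
    return "id-based", [(a, b) for a in a_buf for b in b_buf
                        if a[0] < b[0] and b[1] < a[1]]


# Hypothetical non-recursive fragment: one person with one name.
print(context_aware_join([(1, 5, 1)], [(2, 4, 2)]))
# Recursive fragment using the triples from the structural join example.
print(context_aware_join([(1, 12, 1), (6, 10, 3)], [(2, 4, 2), (7, 9, 4)]))
```

The point of the switch is cost: on non-recursive fragments the cheaper Cartesian product is already correct, so the ID comparisons are only paid for when nesting actually occurs.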

16 Plan Optimization with Multiple Structural Joins
Query:
for $a in stream("s")//a
return {
  for $b in $a//b
  return {
    for $c in $b//c
    return { $c//d, $c//e, $c//f },
    $b//f
  },
  $a//g
}
Plan (figure), with operators labeled op1 to op3: StructuralJoin $a, StructuralJoin $b, StructuralJoin $c; ExtractNest $d, ExtractNest $e, ExtractNest $f, ExtractNest $g; Navigate //a->$a, Navigate $a//b->$b, Navigate $b//c->$c, Navigate $c//d->$d, Navigate $c//e->$e, Navigate $b//f->$f, Navigate $a//g->$g.
Goal: try to generate as many non-recursive operators as possible.
Traverse the query plan in a top-down manner. When a structural join that corresponds to a path expression with "//" is encountered, we instantiate this structural join and its descendants as recursive-mode operators.
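
The top-down traversal rule can be sketched as a recursive walk over a tree of structural joins. The `JoinNode` class, its fields, and the sample plan (which mixes "/" and "//" so that the rule has a visible effect) are assumptions for illustration, not structures from the Raindrop code.

```python
from dataclasses import dataclass, field


@dataclass
class JoinNode:
    name: str                                  # e.g. "StructuralJoin $a"
    axis: str                                  # "/" or "//" of the bound path expression
    children: list = field(default_factory=list)
    recursive: bool = False                    # set by mark_modes


def mark_modes(node, inherited=False):
    """Mark a join recursive if its path uses "//" or if an ancestor join was marked."""
    node.recursive = inherited or node.axis == "//"
    for child in node.children:
        mark_modes(child, node.recursive)


def modes(node):
    """Flatten the plan into (name, recursive) pairs in top-down order."""
    result = [(node.name, node.recursive)]
    for child in node.children:
        result.extend(modes(child))
    return result


# Hypothetical plan: the top join uses "/", the middle one "//".
plan = JoinNode("StructuralJoin $a", "/", [
    JoinNode("StructuralJoin $b", "//", [
        JoinNode("StructuralJoin $c", "/"),
    ]),
])
mark_modes(plan)
print(modes(plan))
```

In this sketch the top join stays in the cheaper recursion-free mode, while the "//" join and everything below it are switched to recursive mode, which is the effect the traversal rule describes.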

17 Experiments
Advantages of early invocation of structural join
Context-aware structural join vs. recursive structural join

18 Recursion-free Mode VS Recursive Mode

19 Related Work
Stack-Tree-Anc [AJK02]: uses a stack to store the chain of ancestor candidates; can be combined with our system.
Transducer-based XML query processor [LPY02]: FSAs without a stack are not sufficient for handling recursion.
YFilter, NFA-based path navigation [DF03]: does not guarantee that the structural join is processed at the first possible moment.

20 Conclusions
Propose a new class of stream operators for recursive XQuery stream processing.
Propose a context-aware structural join.
Use cheaper algebra operators whenever possible in plan generation.
Illustrate performance benefits with little overhead in experiments.
