A Query Algebra for Fragmented XML Stream Data (Bose, Fegaras, Levine, Chaluvadi; DBPL 2003)

Presentation transcript:

A Query Algebra for Fragmented XML Stream Data
Sujoe Bose, Leonidas Fegaras, David Levine, Vamsi Chaluvadi
University of Texas at Arlington

Processing Streamed XML Data

Most web servers are pull-based: a client submits a request and the server returns the requested data. This does not scale well to large numbers of clients or large query results.

Alternative method: push-based dissemination
- The server broadcasts or multicasts data in a continuous stream.
- The client connects to multiple streams and evaluates queries locally.
- No handshaking, no error correction.
- All processing is done at the client side.
- The only task performed by the server is slicing, scheduling, and broadcasting data:
  - Critical data may be repeated more often than non-critical data.
  - Invalid data may be revoked.
  - New updates may be broadcast as soon as they become available.

A Framework for Processing XML Streams

The server slices an XML data source into XML fragments. Each fragment:
- is a filler that fills a hole;
- may contain holes, which can be filled by other fragments;
- is wrapped with control information, such as its unique hole ID and the path that reaches the fragment.

The client opens connections to streams and evaluates XQueries against these streams:
- For large streams, it is a bad idea to reconstruct the streamed data in the client's memory; fragments need to be processed as soon as they become available from the server.
- Some operators are blocking and require unbounded memory:
  - sorting
  - joins between two streams, and self-joins
  - group-by with aggregation

The Fragmented Hole-Filler Model

[Example figure: an XML document sliced into fillers and holes. The markup was lost in extraction; only the text values survive: Wal-Mart, PDA, HP, PalmPilot, Calculator, Casio, FX.]
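
The slicing step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the element names (`vendor`, `item`, `name`, `make`, `model`), the `<hole id="..."/>` syntax, and the helper `slice_document` are hypothetical, since the slide's markup did not survive; only the text values come from the slide. Filler id 0 is used for the root, as on the reconstruction slide.

```python
import xml.etree.ElementTree as ET

def slice_document(root, cut_tags):
    """Slice a tree into fillers. Every element whose tag is in cut_tags is
    replaced by <hole id="k"/> and becomes the separate filler k."""
    fillers = {}      # filler id -> ET.Element payload
    counter = [0]     # last id handed out; id 0 is reserved for the root filler

    def cut(elem):
        copy = ET.Element(elem.tag, elem.attrib)
        copy.text = elem.text
        for child in elem:
            if child.tag in cut_tags:
                counter[0] += 1
                hole_id = counter[0]
                fillers[hole_id] = cut(child)               # the child is shipped on its own
                ET.SubElement(copy, "hole", {"id": str(hole_id)})
            else:
                copy.append(cut(child))
        return copy

    fillers[0] = cut(root)
    return fillers

# Hypothetical document built around the text values visible on the slide.
doc = ET.fromstring(
    "<vendor><name>Wal-Mart</name>"
    "<item><name>PDA</name><make>HP</make><model>PalmPilot</model></item>"
    "<item><name>Calculator</name><make>Casio</make><model>FX</model></item>"
    "</vendor>"
)
fillers = slice_document(doc, {"item"})
for fid, payload in sorted(fillers.items()):
    print(fid, ET.tostring(payload, encoding="unicode"))
```

Each filler can then be broadcast independently; the root filler carries two holes that the two `item` fillers fill.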

An Algebra for Stored XML Data

Based on the nested-relational algebra:

  $\mathrm{source}_v(T)$                  access the XML data source $T$ using $v$
  $\sigma_{pred}(X)$                      select fragments from $X$ that satisfy $pred$
  $\pi_{v_1,\ldots,v_n}(X)$               project
  $X \cup Y$                              merge
  $X \bowtie_{pred} Y$                    join
  $\mu^{pred}_{v,path}(X)$                unnest (retrieve descendants of elements)
  $\Delta^{pred}_{\oplus,h}(X)$           apply $h$ and reduce by $\oplus$
  $\Gamma^{gs,pred}_{v,\oplus,h}(X)$      group by $gs$, apply $h$ to each group, and reduce each group by $\oplus$

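Semantics

\[
\begin{aligned}
\mathrm{source}_v(T) &= \{\, \langle v = T \rangle \,\} \\
\sigma_{pred}(X) &= \{\, t \mid t \in X,\ pred(t) \,\} \\
\pi_{v_1,\ldots,v_n}(X) &= \{\, \langle v_1 = t.v_1, \ldots, v_n = t.v_n \rangle \mid t \in X \,\} \\
X \cup Y &= X \mathbin{+\!\!+} Y \\
X \bowtie_{pred} Y &= \{\, t_x \circ t_y \mid t_x \in X,\ t_y \in Y,\ pred(t_x, t_y) \,\} \\
\mu^{pred}_{v,path}(X) &= \{\, t \circ \langle v = w \rangle \mid t \in X,\ w \in \mathrm{PATH}(t, path),\ pred(t, w) \,\} \\
\Delta^{pred}_{\oplus,h}(X) &= \oplus / \{\, h(t) \mid t \in X,\ pred(t) \,\} \\
\Gamma^{gs,pred}_{v,\oplus,h}(X) &= \ldots
\end{aligned}
\]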
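
To make these set comprehensions concrete, here is a minimal Python sketch that models tuples as dicts and collections as lists. The function names, the dict-based data model, and the sample books are all assumptions made for illustration; group-by is omitted because the slide elides its definition.

```python
from functools import reduce as fold

def source(v, T):
    """Bind the whole data source T to variable v."""
    return [{v: T}]

def select(pred, X):
    return [t for t in X if pred(t)]

def project(vs, X):
    return [{v: t[v] for v in vs} for t in X]

def merge(X, Y):
    return X + Y                       # list concatenation plays the role of ++

def join(pred, X, Y):
    # {**tx, **ty} plays the role of tuple concatenation
    return [{**tx, **ty} for tx in X for ty in Y if pred(tx, ty)]

def unnest(v, path_fn, X, pred=lambda t, w: True):
    """Extend each tuple t with v = w for every w reached by path_fn(t)."""
    return [{**t, v: w} for t in X for w in path_fn(t) if pred(t, w)]

def reduce_op(oplus, zero, h, X, pred=lambda t: True):
    """Apply h to every qualifying tuple and fold the results with oplus."""
    return fold(oplus, (h(t) for t in X if pred(t)), zero)

# Made-up sample data, shaped like the bib/book document of Example #1.
bib = {"book": [
    {"title": "Book A", "publisher": "Addison-Wesley", "year": 1994},
    {"title": "Book B", "publisher": "Other Press", "year": 1999},
    {"title": "Book C", "publisher": "Addison-Wesley", "year": 1989},
]}

books = unnest("b", lambda t: t["v"]["book"], source("v", bib))
wanted = select(lambda t: t["b"]["publisher"] == "Addison-Wesley"
                and t["b"]["year"] > 1991, books)
titles = reduce_op(lambda a, b: a + b, [], lambda t: [t["b"]["title"]], wanted)
print(titles)
```

The pipeline mirrors the shape of Example #1: access the source, unnest the books, filter, then reduce the surviving tuples into one result.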

Example #1

[XQuery listing lost in extraction; only the keyword "where" survives.]

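Example #1 (cont.)

[Algebraic plan figure; the operator symbols were lost in extraction. Recoverable labels:
- reduce with head element("book", $b/title)
- predicate: $b/publisher = "Addison-Wesley" and … > 1991
- unnest $v/bib/book into $b
- source document("…") bound to $v]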

Example #2

    for $u in document("users.xml")//user_tuple
    return <user> { $u/name }
             { for $b in document("bids.xml")//bid_tuple[userid=$u/userid]/itemno,
                   $i in document("items.xml")//item_tuple[itemno=$b]
               return <bid> { $i/description/text() } </bid> sortby(.) }
           </user> sortby(name)

[Algebraic plan figure; the operator symbols were lost in extraction. Recoverable labels:
- source document("users.xml") bound to $us; unnest $us/users/user_tuple into $u
- source document("bids.xml") bound to $bs; unnest $bs/bids/bid_tuple into $c; unnest $c/itemno into $b
- source document("items.xml") bound to $is; unnest $is/items/item_tuple into $i
- predicates: $c/userid = $u/userid and $i/itemno = $b
- inner result: sort, elem("bid", $i/description/text())
- outer result: sort($u/name), elem("user", $u/name ++ …)]

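XPath Expressions

Path evaluation is central to the algebra:

\[ \mathrm{PATH} : (\text{XML-data},\ \text{simple-XPath}) \rightarrow \mathrm{set}(\text{XML-data}) \]

Some rules for stored XML data:

\[
\begin{aligned}
\mathrm{PATH}(\texttt{<A>}\,x\,\texttt{</A>},\ A/path) &= \mathrm{PATH}(x,\ path) \\
\mathrm{PATH}(\texttt{<A>}\,x\,\texttt{</A>},\ A) &= \{\, x \,\} \\
\mathrm{PATH}(x_1\, x_2,\ path) &= \mathrm{PATH}(x_1,\ path) \cup \mathrm{PATH}(x_2,\ path) \\
\mathrm{PATH}(x,\ path) &= \emptyset \quad \text{otherwise}
\end{aligned}
\]

Predicates have existential semantics:

\[ \$v/A/B = \text{"text"} \;\equiv\; \exists x \in \mathrm{PATH}(v,\ A/B) : x = \text{"text"} \]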
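
The PATH rules translate almost line for line into a recursive function. The sketch below is an assumption-laden illustration: XML data is modeled as a Python list of nodes, where a node is either a text string or a `(tag, children)` pair, a path is a list of tag names, and results are returned as a list rather than a set. The names `PATH` and `exists_eq` are chosen here for readability.

```python
def PATH(x, path):
    """x: list of nodes (text str or (tag, children) pair); path: list of tag names."""
    results = []
    for node in x:                     # PATH(x1 x2, p) = PATH(x1, p) U PATH(x2, p)
        if isinstance(node, tuple) and node[0] == path[0]:
            _tag, children = node
            if len(path) == 1:
                results.append(children)                     # PATH(<A>x</A>, A) = { x }
            else:
                results.extend(PATH(children, path[1:]))     # PATH(<A>x</A>, A/p) = PATH(x, p)
        # any other node contributes the empty set
    return results

def exists_eq(v, path, text):
    """Existential predicate: true if some x in PATH(v, path) equals the text."""
    return any(x == [text] for x in PATH(v, path))

v = [("A", [("B", ["text"]), ("B", ["other"])]), ("C", [])]
print(PATH(v, ["A", "B"]))
print(exists_eq(v, ["A", "B"], "text"))
```

Because the predicate is existential, `$v/A/B = "text"` holds as soon as one of the reached `B` elements matches, even though another one does not.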

The Streamed XML Algebra

Much like the stored XML algebra, but works on streams. A stream $\tau$ takes one of two forms:

  $t ; \tau'$    a fragment $t$ followed by the rest of the stream $\tau'$
  $\mathrm{eos}$          end-of-stream

Each stored XML algebraic operator has a streamed counterpart, e.g. the streamed selection $\bar{\sigma}_{pred}$:

\[
\begin{aligned}
\bar{\sigma}_{pred}(t ; \tau) &= t ; \bar{\sigma}_{pred}(\tau) && \text{if } pred \text{ is true for } t \\
\bar{\sigma}_{pred}(t ; \tau) &= \bar{\sigma}_{pred}(\tau) && \text{otherwise} \\
\bar{\sigma}_{pred}(\mathrm{eos}) &= \mathrm{eos}
\end{aligned}
\]

But we may not be able to validate $pred$ because of holes in $t$.
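
Ignoring holes for the moment, the three equations describe an ordinary lazy filter. A minimal Python sketch, assuming a stream is any iterator of fragments and that end-of-stream is simply the iterator running out (the name `stream_select` is hypothetical):

```python
def stream_select(pred, stream):
    """Streamed selection without holes: emit each fragment that satisfies pred."""
    for t in stream:        # the stream has the form t ; rest
        if pred(t):
            yield t         # t ; select(rest)
        # otherwise t is dropped and we continue with select(rest)
    # iterator exhausted: eos maps to eos

fragments = iter([{"year": 1994}, {"year": 1989}, {"year": 2001}])
for t in stream_select(lambda t: t["year"] > 1991, fragments):
    print(t)
```

The generator holds no state and never looks ahead, which is what makes selection non-blocking when every fragment is complete.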

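Streamed Algebra Semantics

To keep the suspended fragments, each streamed algebraic operator has:
- one state $S_0$ for the output, and
- optional state(s) $S_1$ / $S_2$ for the input(s).

The result of PATH may now be unspecified. For a hole $\mathit{hole}_m$ with id $m$:

\[
\mathrm{PATH}(\mathit{hole}_m,\ path) =
\begin{cases}
\mathrm{PATH}(S_1(m),\ path) & \text{if } m \in S_1 \\
\{\, \bot \,\} & \text{otherwise}
\end{cases}
\]

When it appears in predicates, $\bot$ requires three-valued logic.

Incomplete fragments are suspended when necessary, e.g. for the streamed selection $\bar{\sigma}_{pred}$:

\[
\begin{aligned}
\bar{\sigma}_{pred}(t ; \tau) &= t ; \bar{\sigma}_{pred}(\tau) && \text{if } \mathrm{true} \in \mathrm{PATH}(t,\ pred) \\
\bar{\sigma}_{pred}(t ; \tau) &= \bar{\sigma}_{pred}(\tau) && \text{otherwise} \\
S_0 &\leftarrow S_0 \cup \{ t \} && \text{if } \bot \in \mathrm{PATH}(t,\ pred)
\end{aligned}
\]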
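
The suspension rule can be illustrated with a small Python sketch in which `None` stands for the unspecified value. Everything here is an assumption for illustration: the `Hole` class, the dict-of-lists fragment layout, the equality-only predicate, and the choice to emit a fragment (rather than also suspend it) when the predicate is both true for one value and unknown for another.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Hole:
    id: int

def eval_pred(t, key, wanted, fillers):
    """Set of truth values of `t[key] == wanted`; None means 'unknown'."""
    truths = set()
    for item in t[key]:
        if isinstance(item, Hole):
            if item.id in fillers:
                truths.add(fillers[item.id] == wanted)   # hole already filled
            else:
                truths.add(None)                         # unfilled hole: unknown
        else:
            truths.add(item == wanted)
    return truths

def stream_select(key, wanted, stream, fillers, suspended):
    """Emit fragments known to qualify; park undecidable ones in `suspended`."""
    for t in stream:
        truths = eval_pred(t, key, wanted, fillers)
        if True in truths:
            yield t
        elif None in truths:
            suspended.append(t)      # output state: wait for the missing filler
        # otherwise the fragment is known not to qualify and is dropped

suspended = []
stream = [
    {"publisher": ["Addison-Wesley"]},
    {"publisher": [Hole(7)]},            # cannot be decided yet
    {"publisher": ["Other Press"]},
]
emitted = list(stream_select("publisher", "Addison-Wesley", stream, {}, suspended))
print(emitted)
print(suspended)
```

A fuller implementation would re-evaluate the suspended fragments whenever a new filler arrives; that step is left out to keep the sketch short.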

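Join

Much like the main-memory symmetric join.

States:
- $S_0$: all suspended output tuples due to unfilled holes
- $S_1$: all tuples from the left stream
- $S_2$: all tuples from the right stream

A tuple from the left stream:

\[
\begin{aligned}
(t_1 ; \tau_1) \bowtie_{pred} \tau_2 &= \{\, t_1 \circ t_2 \mid t_2 \in S_2,\ \mathrm{true} \in \mathrm{PATH}(t_1 \circ t_2,\ pred) \,\} ;\ (\tau_1 \bowtie_{pred} \tau_2) \\
S_1 &\leftarrow S_1 \cup \{ t_1 \} \\
S_0 &\leftarrow S_0 \cup \{\, t_1 \circ t_2 \mid t_2 \in S_2,\ \bot \in \mathrm{PATH}(t_1 \circ t_2,\ pred) \,\}
\end{aligned}
\]

A tuple from the right stream:

\[
\begin{aligned}
\tau_1 \bowtie_{pred} (t_2 ; \tau_2) &= \{\, t_1 \circ t_2 \mid t_1 \in S_1,\ \mathrm{true} \in \mathrm{PATH}(t_1 \circ t_2,\ pred) \,\} ;\ (\tau_1 \bowtie_{pred} \tau_2) \\
S_2 &\leftarrow S_2 \cup \{ t_2 \} \\
S_0 &\leftarrow S_0 \cup \{\, t_1 \circ t_2 \mid t_1 \in S_1,\ \bot \in \mathrm{PATH}(t_1 \circ t_2,\ pred) \,\}
\end{aligned}
\]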
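
The two cases are symmetric, so a single loop can serve both sides. Below is a minimal Python sketch under stated assumptions: the two input streams are interleaved into one sequence of `("L", tuple)` / `("R", tuple)` events, tuples are dicts, and the predicate returns `True`, `False`, or `None` (unknown because of an unfilled hole). The function name, event encoding, and sample data are hypothetical.

```python
def symmetric_join(events, pred, suspended):
    """Symmetric join over interleaved arrivals; undecidable pairs go to `suspended`."""
    left, right = [], []              # input states: tuples seen so far on each side
    for side, t in events:
        others = right if side == "L" else left
        for o in others:
            # keep 'left fields then right fields' order regardless of arrival side
            pair = {**t, **o} if side == "L" else {**o, **t}
            verdict = pred(pair)
            if verdict is True:
                yield pair
            elif verdict is None:
                suspended.append(pair)     # output state: pending on a filler
        (left if side == "L" else right).append(t)

def same_user(pair):
    if pair["bid_userid"] is None:         # value hidden behind an unfilled hole
        return None
    return pair["userid"] == pair["bid_userid"]

events = [
    ("L", {"userid": 1, "name": "a"}),
    ("R", {"bid_userid": 1, "itemno": "x"}),
    ("R", {"bid_userid": None, "itemno": "y"}),
    ("L", {"userid": 2, "name": "b"}),
]
pending = []
results = list(symmetric_join(events, same_user, pending))
print(results)
print(len(pending))
```

Both input states grow without bound as the streams advance, which is why the framework slide lists joins among the blocking, memory-hungry operators.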

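Reconstructing the XML Data

$\rho : \mathrm{set}(\mathrm{int} \times \text{XML-data})$ is an environment that binds filler ids to XML.

$x \otimes \rho$ replaces holes with fillers in $x$ using the environment $\rho$, where $\mathit{hole}_m$ is a hole with id $m$:

\[
\begin{aligned}
\texttt{<A>}\,x\,\texttt{</A>} \otimes \rho &= \texttt{<A>}\,(x \otimes \rho)\,\texttt{</A>} \\
(x_1\, x_2) \otimes \rho &= (x_1 \otimes \rho)\,(x_2 \otimes \rho) \\
\mathit{hole}_m \otimes \rho &= \rho[m] && \text{if } m \in \rho \\
x \otimes \rho &= x && \text{otherwise}
\end{aligned}
\]

$R(\tau)$ returns a pair $(a, \rho)$, where $\rho$ is the environment and $a$ is $\rho[0]$ (the reconstructed data). Basically, $R(t ; \tau) = f(R(\tau))$: if $R(\tau) = (a, \rho)$ and $t$ is a filler with id $m$ and content $x$, then

\[
R(t ; \tau) =
\begin{cases}
(x \otimes \rho,\ \rho) & \text{if } m = 0 \\
(a \otimes \rho',\ \rho') & \text{if } m \neq 0
\end{cases}
\qquad \text{where } \rho' = \{ (m,\ x \otimes \rho) \} \cup \rho[m/x]
\]

\[ R(\mathrm{eos}) = (\emptyset,\ \emptyset) \]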
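
A simplified version of reconstruction can be sketched in Python. Note that this is not the recursive definition of R from the slide: it first collects all fillers into an environment and then substitutes holes top-down from filler 0, which gives the same document for a finite, acyclic set of fillers. The `Hole` class, the `(tag, children)` node encoding, and the sample fillers are assumptions for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Hole:
    id: int

def fill(x, env):
    """Replace holes in x by fillers from env; unknown holes are left in place.
    Assumes the fillers do not refer to each other cyclically."""
    if isinstance(x, Hole):
        return fill(env[x.id], env) if x.id in env else x
    if isinstance(x, tuple):                   # element node: (tag, children)
        tag, children = x
        return (tag, [fill(c, env) for c in children])
    return x                                   # text node

def reconstruct(stream):
    """stream: iterable of (filler_id, payload); id 0 is the root filler."""
    env = {}
    for filler_id, payload in stream:
        env[filler_id] = payload               # a repeated id replaces the earlier filler
    return fill(env[0], env) if 0 in env else None

# Fillers may arrive out of order.
stream = [
    (2, ("item", ["Calculator"])),
    (0, ("vendor", [("name", ["Wal-Mart"]), Hole(1), Hole(2)])),
    (1, ("item", ["PDA"])),
]
print(reconstruct(stream))
```

Keying the environment by filler id is what lets out-of-sequence arrival, repetition, and replacement all fall out of the same dictionary update.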

Equivalence Between Stored & Streamed Algebras

If we reconstruct the XML document from the streamed fragments and evaluate a query using the stored algebra, we get the same result as when we use the equivalent streamed algebra over the streamed XML fragments and reconstruct the result.

[Diagram: XML fragments, via reconstruction, give the XML document, which the stored XML algebra maps to the result; XML fragments, via the streamed XML algebra, give XML fragments, which reconstruction maps to the same result.]

Proof sketch: We prove $R(\bar{\sigma}_p(\tau)) = \sigma_p(R(\tau))$ inductively, where $\bar{\sigma}_p$ is the stream version of $\sigma_p$. If $\mathrm{true} \in \mathrm{PATH}(t,\ pred)$, then

\[
R(\bar{\sigma}_p(t ; \tau)) = R(t ; \bar{\sigma}_p(\tau)) = f(R(\bar{\sigma}_p(\tau))) = f(\sigma_p(R(\tau))) = \sigma_p(f(R(\tau))) = \sigma_p(R(t ; \tau)) \;\ldots
\]

Conclusion

- Fragmented XML data are easier to handle and synchronize than an infinitely long stream.
- Associating holes with fillers takes care of out-of-sequence transmission, repetitions, replacements, and removals.
- Our streamed algebra has operators similar to those of our stored algebra, but with different semantics.
- Our algebra can capture most non-recursive XQueries.
- Our future work includes:
  - the development of main-memory algorithms for processing XML data streams under memory and power constraints;
  - the development of a comprehensive approach to optimizing XQueries that utilizes our main-memory algorithms.