A Unified Model for XQuery Evaluation over XML Data Streams. Jinhui Jian, Hong Su, Elke A. Rundensteiner. Worcester Polytechnic Institute. ER 2003.

Presentation transcript:

1 A Unified Model for XQuery Evaluation over XML Data Streams Jinhui Jian Hong Su Elke A. Rundensteiner Worcester Polytechnic Institute ER 2003

2 Need for Stream Processing
New environment
- Data sources are everywhere
- Data requests are everywhere
New applications
- Sensor networks
- Analysis of XML web logs
- Selective dissemination of XML information (e.g., news)
New features
- Data arriving on-line
- Potentially unstable data
- Real-time response requirement
- Scalability requirement

3 Specific Challenges for XML Streams
Pattern retrieval on nested data + filtering/restructuring:
FOR $b in doc("bib.xml")//book LET $p := $b/price, $t := $b/title WHERE $p > 50 RETURN $t
Sample book record: TCP/IP Illustrated; Stevens, W.; Addison-Wesley; …
Token-by-token access along the timeline: TCP/IP Illustrated …
A token
- can be an open tag, a close tag, or PCDATA
- is not a direct counterpart of a tuple
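The token-by-token access described on this slide can be sketched with Python's standard SAX parser. This is only an illustration: the sample document, its tags, and the price value are assumptions (the slide's markup did not survive), and `TokenCollector` and `tokenize` are hypothetical names, not part of any system named in the talk.

```python
import xml.sax


class TokenCollector(xml.sax.ContentHandler):
    """Turns SAX callbacks into a flat list of (kind, value) tokens."""

    def __init__(self):
        super().__init__()
        self.tokens = []

    def startElement(self, name, attrs):
        self.tokens.append(("open", name))

    def endElement(self, name):
        self.tokens.append(("close", name))

    def characters(self, content):
        # A parser may deliver one text node in several chunks; this
        # sketch keeps each non-blank chunk as its own PCDATA token.
        text = content.strip()
        if text:
            self.tokens.append(("pcdata", text))


def tokenize(xml_text):
    handler = TokenCollector()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.tokens


# Assumed sample document; tag names and the price are illustrative.
SAMPLE = (
    "<bib><book><title>TCP/IP Illustrated</title>"
    "<price>65.95</price></book></bib>"
)
tokens = tokenize(SAMPLE)
for token in tokens:
    print(token)
```

Each token carries only a tag name or a text fragment, which is why a single token cannot stand in for a tuple: a tuple such as (title, price) only exists once several tokens have been seen.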

4 Observations and Questions
Observations
- Pattern retrieval: the automata model has long been studied for pattern retrieval on tokens
- Filtering/restructuring: the algebraic model has long been studied for optimizing query plans on tuples
Questions
- How to integrate the two models?
- How to optimize a query within the integrated query model?

5 Uniform Modeling in an Algebraic Framework

6 A Running Example
Give me the titles of books whose price is greater than 50:
FOR $b in doc("bib.xml")//book WHERE $b/price > 50 RETURN $b/title
Input XML stream (along the timeline): TCP/IP Illustrated; Stevens, W.; Addison-Wesley; Languages and Machines; Sudkamp, T.; Addison-Wesley; …

7 Automata Computation: NFAs + Buffers
FOR $b in doc("bib.xml")//book WHERE $b/price > 50 RETURN $b/title
[Figure: NFA for the query. State 1 carries a * self-loop, book leads to state 2, price leads from state 2 to state 3, and title leads from state 2 to state 4. A buffer for title and a buffer for price collect the matched content.]
Trace over input tokens t0 … t7 (TCP/IP Illustrated …):
- Active states: +0, +1, +1,2, +1,4, -1,4, +1,3, …
- Stack snapshots: [0][1]; [0][1][1,2]; [0][1][1,2][1,4]; [0][1][1,2]; [0][1][1,2][1,3]; …
No materialization needed
Multiple patterns resolved in one pass
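A minimal sketch of the stack-based NFA run traced on this slide. The transition table is an assumption read off the slide's trace (state 0 moves to state 1 on any tag), and the token list, the price value, and the names `TRANSITIONS`, `BUFFER_FOR`, and `run_nfa` are hypothetical.

```python
# Assumed transition table: state -> list of (label, next_state).
TRANSITIONS = {
    0: [("*", 1)],
    1: [("*", 1), ("book", 2)],
    2: [("title", 4), ("price", 3)],
}
# Final states and the buffer each one fills.
BUFFER_FOR = {4: "title", 3: "price"}


def run_nfa(tokens):
    """Runs the NFA over (kind, value) tokens with a stack of state sets."""
    stack = [{0}]
    buffers = {"title": [], "price": []}
    trace = []
    for kind, value in tokens:
        if kind == "open":
            # Follow every transition whose label matches the tag.
            nxt = {
                target
                for state in stack[-1]
                for label, target in TRANSITIONS.get(state, [])
                if label in ("*", value)
            }
            stack.append(nxt)
            trace.append("+" + ",".join(str(s) for s in sorted(nxt)))
        elif kind == "close":
            # Backtrack by popping the state set pushed at the open tag.
            popped = stack.pop()
            trace.append("-" + ",".join(str(s) for s in sorted(popped)))
        else:
            # PCDATA goes to the buffer of any active final state.
            for state in stack[-1]:
                if state in BUFFER_FOR:
                    buffers[BUFFER_FOR[state]].append(value)
    return trace, buffers


# Assumed token stream; the price value is illustrative.
TOKENS = [
    ("open", "bib"), ("open", "book"),
    ("open", "title"), ("pcdata", "TCP/IP Illustrated"), ("close", "title"),
    ("open", "price"), ("pcdata", "65.95"), ("close", "price"),
    ("close", "book"), ("close", "bib"),
]
trace, buffers = run_nfa(TOKENS)
print(trace)
print(buffers)
```

Both the title and the price pattern are resolved in the same pass, and nothing is stored except the text that reaches a final state.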

8 Algebraic Computation
FOR $b in doc("bib.xml")//book WHERE $b/price > 50 RETURN $b/title
Plan operators: Extract //book; Navigate //book, price; Select price > 50; Navigate //book, title; Tagger
[Figure: element tree of a book with title, author (last, first), publisher, and price nodes, and text content]
Selection push-down enabled

9 The Raindrop Approach
- Uniform: automata computation modeled in an algebraic manner
- Tight coupling: automata and regular tuple-based computation are interchangeable

10 Path Bindings in XQuery
FOR $b in doc("bib.xml")//book LET $p := $b/price, $t := $b/title WHERE $p > 50 RETURN $t
FLWR expression: FOR … LET … WHERE … RETURN …
- FOR and LET: path bindings
- WHERE and RETURN: filtering and restructuring
“The purpose of path bindings is to produce a tuple stream in which each tuple consists of one or more bound variables” [W3C]

11 Data Flow
XML data stream -> automata plan -> tuple stream -> regular algebraic plan -> query answer

12 Modeling the Automata Plan: Black Box [xscan] vs. White Box
Black box: a single automata plan computing Q1 := //book, Q2 := //book/price, Q3 := //book/title
White box: SJoin //book over Extract //book/price and Extract //book/title

13 A Unified Process at the Logical View
Plan operators: Select //book/price > 50; Navigate //book, //book/title; SJoin //book; Extract //book/price; Extract //book/title

14 The Algebra Core
Relational-like operators
- Selection: filter tuples based on the predicate pred
- Projection: filter columns in the input tuples based on the variable list v
- Join: join input tuples based on the predicate pred
- Aggregate: aggregate over input tuples with the aggregate function f, e.g., sum and average
XML-specific operators
- Tagger: format outputs based on the pattern pt, i.e., reconstruct XML tags
- Navigate: take input elements of path p1 and output ancestor elements of path p2
- Extract: identify elements of path p from the input stream
- Structural Join: join input tuples on their structural relationship, e.g., the common parent relationship p
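The tuple-based operators in this list can be sketched as Python generators over dictionaries. This is a sketch under stated assumptions, not the engine's implementation: the function names, the dictionary representation of a tuple, the string pattern used by `tagger`, and the prices are all hypothetical.

```python
def select(tuples, pred):
    """Selection: keep the tuples that satisfy the predicate pred."""
    return (t for t in tuples if pred(t))


def project(tuples, variables):
    """Projection: keep only the columns named in the variable list."""
    return ({v: t[v] for v in variables} for t in tuples)


def tagger(tuples, pattern):
    """Tagger: rebuild XML text from each tuple using a format pattern."""
    return (pattern.format(**t) for t in tuples)


# Assumed tuple stream with bound variables p (price) and t (title);
# the prices are illustrative.
bindings = [
    {"p": 65.95, "t": "TCP/IP Illustrated"},
    {"p": 39.95, "t": "Data on the Web"},
]

plan = tagger(
    project(select(bindings, lambda t: t["p"] > 50), ["t"]),
    "<title>{t}</title>",
)
result = list(plan)
print(result)
```

Because each operator consumes and produces a stream of tuples, operators can be reordered or swapped by rewriting the plan, which is what makes algebraic optimization possible.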

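15 The Extract Operator
[Figure: Extract //book/title runs an automaton (states 1 and 2; labels *, book, title) over the input stream (TCP/IP Illustrated …)]
Extracted title elements: TCP/IP Illustrated; Data on the Web; Advanced Programming in the Unix environment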

16 The Structural Join Operator
FOR $b in doc("bib.xml")//book LET $p := $b/price, $t := $b/title WHERE $p > 50 RETURN $t
[Figure: automaton with states 1 to 4 (labels *, book, title, price) feeding Extract //book/title and Extract //book/price, whose outputs are combined by SJoin //book]
Tight coupling
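A rough sketch of how two Extract operators and a structural join on the common book parent could cooperate over a token stream. It assumes that the join fires when the closing book tag arrives and that books are not nested; neither detail is stated on the slide. The token list, the prices, and the name `extract_and_sjoin` are hypothetical.

```python
def extract_and_sjoin(tokens):
    """Yields one {t, p} tuple per title/price pair under the same book."""
    path = []      # tags of the currently open elements
    titles = []    # buffer of Extract //book/title
    prices = []    # buffer of Extract //book/price
    for kind, value in tokens:
        if kind == "open":
            path.append(value)
        elif kind == "pcdata":
            if path[-2:] == ["book", "title"]:
                titles.append(value)
            elif path[-2:] == ["book", "price"]:
                prices.append(value)
        else:
            path.pop()
            if value == "book":
                # SJoin //book: pair the buffered items of this book,
                # then clear the buffers for the next book.
                for title in titles:
                    for price in prices:
                        yield {"t": title, "p": float(price)}
                titles.clear()
                prices.clear()


# Assumed token stream; tag names and prices are illustrative.
TOKENS = [
    ("open", "bib"),
    ("open", "book"),
    ("open", "title"), ("pcdata", "TCP/IP Illustrated"), ("close", "title"),
    ("open", "price"), ("pcdata", "65.95"), ("close", "price"),
    ("close", "book"),
    ("open", "book"),
    ("open", "title"), ("pcdata", "Data on the Web"), ("close", "title"),
    ("open", "price"), ("pcdata", "39.95"), ("close", "price"),
    ("close", "book"),
    ("close", "bib"),
]
joined = list(extract_and_sjoin(TOKENS))
print(joined)
```

The output is an ordinary tuple stream, so a Select on p > 50 can be applied to it like any other algebraic operator.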

17 The Navigate Operator
[Figure: Navigate //book, title applied to book elements (e.g., TCP/IP Illustrated; Stevens, W.; Addison-Wesley; …)]

18 Optimization

19 In or Out?
Data flow: XML data stream -> automata plan -> tuple stream -> regular algebraic plan -> query answer
Pattern retrieval: inside the automata plan or outside it?

20 Pattern Retrieval Alternatives
- In automata (/title, /price): automaton with states 1 to 4 (labels *, book, title, price)
- Out of automata (/title, /price): automaton with states 1 and 2 (labels *, book)

21 Plan Alternatives
The pull-out plan: Extract //book (automaton labels *, book); Navigate //book, price; Select price > 50; Navigate //book, title
The push-in plan: Extract //book/price and Extract //book/title (automaton labels *, book, title, price); SJoin //book; Select //book/price > 50
Output operator: Tagger
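The two alternatives can be compared with a small sketch built on Python's `xml.etree.ElementTree.iterparse`. It is only an approximation of the plans on this slide: the sample document and prices are assumptions, `pull_out_plan` and `push_in_plan` are hypothetical names, and the Tagger step is left out so that both functions return plain title strings.

```python
import io
import xml.etree.ElementTree as ET

# Assumed sample document; tag names and prices are illustrative.
XML = (
    b"<bib>"
    b"<book><title>TCP/IP Illustrated</title><price>65.95</price></book>"
    b"<book><title>Data on the Web</title><price>39.95</price></book>"
    b"</bib>"
)


def pull_out_plan(xml_bytes):
    """Pattern matching stops at //book; the rest is done on whole books."""
    answers = []
    for _, elem in ET.iterparse(io.BytesIO(xml_bytes), events=("end",)):
        if elem.tag == "book":                            # Extract //book
            price = elem.findtext("price")                # Navigate //book, price
            if price is not None and float(price) > 50:   # Select price > 50
                answers.append(elem.findtext("title"))    # Navigate //book, title
    return answers


def push_in_plan(xml_bytes):
    """Title and price are matched during the scan, then joined per book."""
    answers, path, titles, prices = [], [], [], []
    events = ET.iterparse(io.BytesIO(xml_bytes), events=("start", "end"))
    for event, elem in events:
        if event == "start":
            path.append(elem.tag)
            continue
        if path[-2:] == ["book", "title"]:                # Extract //book/title
            titles.append(elem.text)
        elif path[-2:] == ["book", "price"]:              # Extract //book/price
            prices.append(float(elem.text))
        elif elem.tag == "book":
            joined = [(t, p) for t in titles for p in prices]   # SJoin //book
            answers.extend(t for t, p in joined if p > 50)      # Select
            titles, prices = [], []
        path.pop()
    return answers


print(pull_out_plan(XML))
print(push_in_plan(XML))
```

Both functions return the same titles; they differ in how much of the pattern matching happens during the scan and how much happens afterwards on materialized book elements.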

22 Experiment 1:

23 Experiment 2

24 Camp 1: Complete Automata Model [XSQ, XSM, XPush]
All details on the same level
- Hard to understand
- Not suitable for optimizing at different levels
Automata have been little studied as a query processing paradigm
Example query: For $X in $R/a return for $Y in $X/b return $Y, $X
[Figure: transducer for the example query, with states labeled 0,0,0 through 2,2,2 and transitions annotated with read and write actions on the pointers r, x, x', x" and the output o]

25 Camp 2: Automata-Algebra Loosely Coupled Model [Tukwila, YFilter]
Fixed interface for automata computation (all pattern retrieval pushed down)
- No opportunity to push computation into the automata or pull it out
Bloated, black-box operator
- Algebraic rewriting impossible for internal optimization
[Figure: automata plan binding $b := //book, $p := //book/price, $t := //book/title and outputting tuples ($b, $p, $t)]

26 Contribution
- Automata and algebra modeled in one framework, allowing a uniform logical view
- Opportunity for push-into-automata and pull-out-of-automata provided via query rewriting
- Necessity of optimization verified by experiments
