A Uniform and Layered Algebraic Framework for XQueries on XML Streams Hong Su Jinhui Jian Elke A. Rundensteiner Worcester Polytechnic Institute CIKM, Nov.

Slides:



Advertisements
Similar presentations
CH-4 Ontologies, Querying and Data Integration. Introduction to RDF(S) RDF stands for Resource Description Framework. RDF is a standard for describing.
Advertisements

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Relational Algebra Chapter 4, Part A.
CS 540 Database Management Systems
Raindrop: An Algebra-Automata Combined XQuery Engine over XML Streams Hong Su, Elke Rundensteiner, Murali Mani, Ming Li Worcester Polytechnic Institute.
Static Optimization of Conjunctive Queries with Sliding Windows over Infinite Streams Presented by: Andy Mason and Sheng Zhong Ahmed M.Ayad and Jeffrey.
CMPT 354, Simon Fraser University, Fall 2008, Martin Ester 52 Database Systems I Relational Algebra.
Paper by: A. Balmin, T. Eliaz, J. Hornibrook, L. Lim, G. M. Lohman, D. Simmen, M. Wang, C. Zhang Slides and Presentation By: Justin Weaver.
1 Rewriting Nested XML Queries Using Nested Views Nicola Onose joint work with Alin Deutsch, Yannis Papakonstantinou, Emiran Curtmola University of California,
RAINDROP: XML Stream Processing Engine Murali Mani, DB seminar June 08, 2006 Partially Supported by NSF grant IIS
11/08/2002WIDM20021 An Algebraic Approach For Incremental Maintenance of Materialized XQuery Views Maged EL-Sayed, Ling Wang, Luping Ding, and Elke A.
Staged-DB IC-65 Advances in Data Management Systems 1 Scheduling in Staged- DB Systems Nicolas Bonvin, Rammohan Narendula, and Surender Reddy Yerva.
Elke A. Rundensteiner Topics projects in database and Information systems, such as, web information systems, distributed databases, Etc. Database Systems.
Continuous Stream Monitoring Technology Elke A. Rundensteiner Database Systems Research Laboratory Department of Computer Science Worcester Polytechnic.
A Graphical Environment to Query XML Data with XQuery
1 Efficient XML Stream Processing with Automata and Query Algebra A Master Thesis Presentation Student: Advisor: Reader: Jinhui Jian Prof. Elke A. Rundensteiner.
CSCI 5708: Query Processing I Pusheng Zhang University of Minnesota Feb 3, 2004.
Marakas: Decision Support Systems, 2nd Edition © 2003, Prentice-Hall Chapter Chapter 1: Introduction to Decision Support Systems Decision Support.
The Raindrop Engine: Continuous Query Processing Elke A. Rundensteiner Database Systems Research Lab, WPI 2003.
1 A Unified Model for XQuery Evaluation over XML Data Streams Jinhui Jian Hong Su Elke A. Rundensteiner Worcester Polytechnic Institute ER 2003.
CSCI 5708: Query Processing I Pusheng Zhang University of Minnesota Feb 3, 2004.
WIDM 2002 DSRG, Worcester Polytechnic Institute1 Honey, I Shrunk the XQuery! —— An XML Algebra Optimization Approach Xin Zhang, Bradford Pielech and Elke.
Sangam: A Transformation Modeling Framework Kajal T. Claypool (U Mass Lowell) and Elke A. Rundensteiner (WPI)
1 Rainbow XML-Query Processing Revisited: The Incomplete Story (Part II) Xin Zhang.
A Unified Model for XQuery Evaluation over XML Data Streams Jinhui Jian Hong Su Elke A. Rundensteiner Worcester Polytechnic Institute ER 2003.
Query Execution Chapter 15 Section 15.1 Presented by Khadke, Suvarna CS 257 (Section II) Id
Prefetching for Visual Data Exploration Punit R. Doshi, Elke A. Rundensteiner, Matthew O. Ward Computer Science Department Worcester Polytechnic Institute.
1 Processing Recursive Xquery over XML Streams: The Raindrop Approach Mingzhu Wei Ming Li Elke A. Rundensteiner Murali Mani Worcester Polytechnic Institute.
CSC2012 Database Technology & CSC2513 Database Systems.
Chapter 16 The World Wide Web. 2 The Web An infrastructure of information combined and the network software used to access it Web page A document that.
1 Distributed Monitoring of Peer-to-Peer Systems By Serge Abiteboul, Bogdan Marinoiu Docflow meeting, Bordeaux.
Database System Concepts, 5th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-usewww.db-book.com Chapter 13: Query Processing.
Some Thoughts on HPC in Natural Language Engineering Steven Bird University of Melbourne & University of Pennsylvania.
Context Tailoring the DBMS –To support particular applications Beyond alphanumerical data Beyond retrieve + process –To support particular hardware New.
CSE 636 Data Integration Limited Source Capabilities Slides by Hector Garcia-Molina Fall 2006.
DANIEL J. ABADI, ADAM MARCUS, SAMUEL R. MADDEN, AND KATE HOLLENBACH THE VLDB JOURNAL. SW-Store: a vertically partitioned DBMS for Semantic Web data.
Schema-Based Query Optimization for XQuery over XML Streams Hong Su Elke A. Rundensteiner Murali Mani Worcester Polytechnic Institute, Massachusetts, USA.
Lecture 05 Structured Query Language. 2 Father of Relational Model Edgar F. Codd ( ) PhD from U. of Michigan, Ann Arbor Received Turing Award.
RELATIONAL FAULT TOLERANT INTERFACE TO HETEROGENEOUS DISTRIBUTED DATABASES Prof. Osama Abulnaja Afraa Khalifah
Academic Year 2014 Spring. MODULE CC3005NI: Advanced Database Systems “QUERY OPTIMIZATION” Academic Year 2014 Spring.
© 2001 Business & Information Systems 2/e1 Chapter 8 Personal Productivity and Problem Solving.
Lead Black Slide Powered by DeSiaMore1. 2 Chapter 8 Personal Productivity and Problem Solving.
SCUHolliday - COEN 17814–1 Schedule Today: u Query Processing overview.
SPARQL Query Graph Model (How to improve query evaluation?) Ralf Heese and Olaf Hartig Humboldt-Universität zu Berlin.
R-SOX : R untime S emantic Query O ptimization over X ML Streams Song Wang, Hong Su, Ming Li, Mingzhu Wei, Shoushen Yang Drew Ditto, Elke A. Rundensteiner.
1 XQuery to SQL by XML Algebra Tree Brad Pielech, Brian Murphy Thanks: Xin.
Aum Sai Ram Security for Stream Data Modified from slides created by Sujan Pakala.
The Volcano Optimizer Generator Extensibility and Efficient Search.
Lecture 1- Query Processing Advanced Databases Masood Niazi Torshiz Islamic Azad university- Mashhad Branch
Chapter 12 Query Processing (1) Yonsei University 2 nd Semester, 2013 Sanghyun Park.
Sept. 27, 2002 ISDB’02 Transforming XPath Queries for Bottom-Up Query Processing Yoshiharu Ishikawa Takaaki Nagai Hiroyuki Kitagawa University of Tsukuba.
16.7 Completing the Physical- Query-Plan By Aniket Mulye CS257 Prof: Dr. T. Y. Lin.
INT-2: XQuery Levels the Data Integration Playing Field Carlo (Minollo) Innocenti DataDirect XML Technologies, Program Manager.
CS 540 Database Management Systems
Rate-Based Query Optimization for Streaming Information Sources Stratis D. Viglas Jeffrey F. Naughton.
Chapter 13 Query Optimization Yonsei University 1 st Semester, 2015 Sanghyun Park.
1 Overview of Query Evaluation Chapter Outline  Query Optimization Overview  Algorithm for Relational Operations.
Chapter 13: Query Processing
CS4432: Database Systems II Query Processing- Part 1 1.
Query Execution Chapter 15 Section 15.1 Presented by Khadke, Suvarna CS 257 (Section II) Id
Database Management System
Software Design and Architecture
Chapter 12: Query Processing
Chapter 15 QUERY EXECUTION.
Relational Algebra 461 The slides for this text are organized into chapters. This lecture covers relational algebra, from Chapter 4. The relational calculus.
Chapter 2 Database Environment Pearson Education © 2009.
Raindrop: An Algebra-Automata Combined XQuery Engine over XML Streams
Query Execution Presented by Jiten Oswal CS 257 Chapter 15
Chapter 12 Query Processing (1)
Query Optimization.
Adaptive Query Processing (Background)
Presentation transcript:

A Uniform and Layered Algebraic Framework for XQueries on XML Streams Hong Su Jinhui Jian Elke A. Rundensteiner Worcester Polytechnic Institute CIKM, Nov 5, 2003

2 Need for Stream Processing New computing environment  Data sources can be anywhere/anytime On-line arriving data  Data requests can be anywhere/anytime Real-time response requirement New applications  Relational Sensor networks  XML Analysis of XML web logs Selective dissemination of XML information (e.g., news)

3 What ’ s Special for XML Stream Processing Dream Catcher King S. Bt Bound 30 … Dream Catcher … Token-by-Token access manner timeline Pattern retrieval + Filtering + Restructuring FOR $b in stream(biditems.xml) //book LET $p := $b/price $t := $b/title WHERE $p < 20 Return $t Token: not a direct counterpart of a tuple 30Bt BoundS.KingDream2001 pricepublisherfirstlasttitleyear Pattern Retrieval on Token Streams

4 Two Computation Paradigms Automata-based [yfilter02, xscan01, xsm02, xsq03, xpush03…] Algebraic [niagara00, …] This Raindrop framework intends to integrate both paradigms into one

5 Automata-Based Paradigm FOR $b in stream(biditems.xml) //book LET $p := $b/price $t := $b/title WHERE $p < 20 Return $t 1 book * 2 4 title price Auxiliary structures for: 1.Buffering data 2.Filtering 3.Restructuring … //book //book/title //book/price 3

6 Algebraic Computation book title author last first publisherprice Text … Navigate $b, /title -> $t Navigate $b, /price->$p Navigate $b, /title-> $t Tagger Select $p < 30 Logic Plan Navigate //$b, /title->$t Rewrite by “pushing down selection” Navigate $b,/price->$p Select $p < 30 Tagger Rewritten Logic Plan Navigate-Index $b, /price -> $p Select $p < 30 Tagger Navigate-Scan $b, /title -> $t Physical Plan Choose low- level implementation alternatives FOR $b in stream(biditems.xml) //book LET $p := $b/price $t := $b/title WHERE $p < 20 Return $t $b $t … …

7 Observations Either paradigm has deficiencies Both paradigms complement each other Automata ParadigmAlgebra Paradigm Good for pattern retrieval on tokensDoes not support token inputs Need patches for filtering and restructuring Good for filtering and restructuring Present all details on same low levelSupport multiple descriptive levels (declarative->procedural) Little studied as query processing paradigm Well studied as query process paradigm

8 How to Integrate Two Paradigms

9 How to Integrate Two Models? Design choices  Extend algebraic paradigm to support automata?  Extend automata paradigm to support algebra?  Come up with completely new paradigm? Extend algebraic paradigm to support automata  Practical Reuse & extend existing algebraic query processing engines  Natural Present details of automata computation at low level Present semantics of automata computation (target patterns) at high level

10 Raindrop: Four-Level Framework Semantics-focused Plan Stream Logic Plan Stream Physical Plan Stream Execution Plan Abstraction Level High (Declarative) Low (Procedural)

11 Level I: Semantics-focused Plan [Rainbow- ZPR02] Express query semantics regardless of stored or stream input sources Reuse existing techniques for stored XML processing  Query parser  Initial plan constructor  Rewriting optimization Decorrelation Selection push down …

12 FOR $b in stream(biditems.xml) //book LET $p := $b/price $t := $b/title WHERE $p < 20 Return $t Dream Catcher King S. Bt Bound 30 … $S1 … $S1 … $b … … … $S1 … $b … $p 30 … … … $S1 … $b … $p 30 $t Dream Catcher …... …… NavUnnest $S1, //book ->$b NavNest $b, /price/text() ->$p NavNest $b, /title ->$t Select $p<30 Tagger “Inexpensive”, $t->$r Example Semantics-focused Plan

13 Level II: Stream Logical Plan Extend semantics-focused plan to accommodate tokenized stream inputs  New input data format: contextualized tokens  New operators: StreamSource, Nav, ExtractUnnest, ExtractNest, StructuralJoin  New rewrite rules: Push-into-Automata

14 One Uniform Algebraic View Token-based plan (automata plan) Tuple-based plan Tuple stream XML data stream Query answer Algebraic Stream Logical Plan

15 Modeling the Automata in Algebraic Plan: Black Box[XScan01] vs. White Box $b := //book $p := $b/price $t := $b/title Black Box FOR $b in stream(biditems.xml) //book LET $p := $b/price $t := $b/title WHERE $p < 20 Return $t XScan StructuralJoin $b ExtractNest $b, $p ExtractNest $b, $t White Box Navigate $b, /price->$p Navigate $b, /title->$t Navigate $S1, //book ->$b

16 Example Uniform Algebraic Plan FOR $b in stream(biditems.xml) //book LET $p := $b/price $t := $b/title WHERE $p < 30 Return $t Tuple-based plan Token-based plan (automata plan)

17 Example Uniform Algebraic Plan FOR $b in stream(biditems.xml) //book LET $p := $b/price $t := $b/title WHERE $p < 30 Return $t StructuralJoin $b ExtractNest $b, $p ExtractNest $b, $t Navigate $b, /price->$p Navigate $b, /title->$t Navigate $S1, //book ->$b Tuple-based plan

18 Example Uniform Algebraic Plan FOR $b in stream(biditems.xml) //book LET $p := $b/price $t := $b/title WHERE $p < 30 Return $t StructuralJoin $b ExtractNest $b, $p ExtractNest $b, $t Navigate $b, /price->$p Navigate $b, /title->$t Navigate $S1, //book ->$b Select $p<30 Tagger “Inexpensive”, $t->$r

19 From Semantics-focused Plan to Stream Logical Plan StructuralJoin $b ExtractNest $b, $p ExtractNest $b, $t Nav $b, /price/text()->$p Nav $b, /title->$t Nav $S1, //book ->$b Select $p<30 Tagger “Inexpensive”, $t->$r NavUnnest $S1, //book ->$b NavNest $b, /price/text() ->$p NavNest $b, /title ->$t Select $p<30 Tagger “Inexpensive”, $t->$r Apply “push into automata”

20 Level III: Stream Physical Plan For each stream logical operator, define how to generate outputs when given some inputs  Multiple physical implementations may be provided for a single logical operator  Automata details of some physical implementation are exposed at this level Nav, ExtractNest, ExtractUnnest, Structural Join

21 One Implementation of Extract/Structural Join 1 book title * 4 price 3 2 ExtractNest $b, $t ExtractNest /$b, $p SJoin //book … Dream Catcher … … … Nav $b, /price->$p Nav $b, /title->$t Nav., //book ->$b

22 Level IV: Stream Execution Plan Describe coordination between operators regarding when to fetch the inputs  When input operator generates one output tuple  When input operator generates a batch  When a time period has elapsed  … Potentially unstable data arrival rate in stream makes fixed scheduling strategy unsuitable  Delayed data under scheduling may stall engine  Bursty data not under scheduling may cause overflow

23 Raindrop: Four-Level Framework (Recap) Semantics-focused Plan Stream Logic Plan Stream Physical Plan Stream Execution Plan Express the semantics of query regardless of input sources Accommodate tokenized input streams Describe how operators manipulate given data Decides the Coordination among operators

24 Optimization Opportunities

25 Optimization Opportunities Semantics-focused Plan Stream Logic Plan Stream Physical Plan Stream Execution Plan General rewriting (e.g., selection push down) Break-linear-navigation rewriting Physical implementations choosing Execution strategy choosing

26 From Semantics-focused to Stream Logical Plan: In or Out? Token-based plan (automata plan) Tuple-based Plan Tuple stream XML data stream Query answer Pattern retrieval in Semantics- focused plan Apply “push into automata”

27 Plan Alternatives Nav $b, /price->$p ExtractNest $b, $p ExtractNest $b, $t SJoin //book Select price < 30 Tagger Nav $b, /title->$t Nav $S1, //book->$b ExtractNest $S1, $b Navigate /price Select price<30 Navigate book/title Tagger Nav $S1, //book->$b NavUnnest $S1, //book ->$b NavNest $b, /price ->$p NavNest $b, /title ->$t Select $p<30 Tagger “Inexpensive”, $t->$r Out In

28 Experimentation Results

29 Contributions Combined automata and algebra based paradigms into one uniform algebraic paradigm Provided four layers in algebraic paradigm  Query semantics expressed at high layer  Automata computation on streams hidden at low layer Supported optimization at an iterative manner (from high abstraction level to low abstraction level) Illustrated enriched optimization opportunities by experiments

30