A Transducer-Based XML Query Processor
Bertram Ludäscher, SDSC/CSE UCSD
Pratik Mukhopadhyay, CSE UCSD
Yannis Papakonstantinou, CSE UCSD

Overview
- Motivation
- Architecture
- Framework: Streams + XQuery
- XSM (XML Stream Machine)
- XSM Networks
- Network Composition
- Conclusions

Efficient Processing of Sequentially Accessed XML Data: Web Service Implementations & RMI
[Diagram: Web Service → XML message → XML Message Transformer → transformed XML message → Web Service]

Efficient Processing of Sequentially Accessed XML Data: Web Development
[Diagram: XML file → XML-to-XHTML Transformer → XHTML page → Web Front-End]

Efficient Processing of Sequentially Accessed XML Data: Archive Transformation & ETL (Extraction, Transformation & Loading) Applications
[Diagram: XML archive file → XML Processor → XML target file]

Efficient Processing of Sequentially Accessed XML Data: Sensor Data Analysis
[Diagram: XML stream → Sensor Data Processor → Acting/Mining Software]

Bandwidth & Connectivity will Increase the Amount of Data …
[Diagram: several XML streams feeding an XML Sensor Data Processor]

… Hardware Advances do not Favor Conventional Architectures
[Plot: magnitude vs. year for CPU speed, CPU-to-memory speed, and bandwidth]

Transducer-Based Processing: On-the-Fly & Minimal Memory
[Diagram: an XML Stream Machine whose transitions are labeled Condition | Action; it reads from input buffers and writes to output buffers]

XML Stream Machine (XSM) High-Level Architecture
[Diagram: XQuery (with optional input DTD) → XQuery Compiler → XSM → XSM-to-C Compiler → C program]

Components of the XQuery Compiler
[Diagram: XQuery → XQuery-to-Network Translation → XSM Network → XSM Composition → Single XSM; Schema Optimization (optional input DTD)]

XQuery Subset
- for-where-return expressions
- Path expressions
- Element construction
- Concatenation
Example: for $X in $R/a return for $Y in $X/b return $Y, $X
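To make the example query concrete, here is a minimal, non-streaming reference evaluation in Python. It is only a sketch of what the query returns, not of how an XSM evaluates it. It assumes that $R is bound to the root element of a small sample document; the function name `evaluate` and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET

def evaluate(root):
    """Reference result of: for $X in $R/a return for $Y in $X/b return $Y, $X."""
    out = []
    for x in root.findall("a"):       # $X ranges over the a-children of $R
        for y in x.findall("b"):      # $Y ranges over the b-children of $X
            # return $Y, $X: the binding of $Y followed by the binding of $X
            out.append(ET.tostring(y, encoding="unicode"))
            out.append(ET.tostring(x, encoding="unicode"))
    return out

# Hypothetical sample document; $R is assumed to be its root element <r>.
doc = ET.fromstring("<r><a><b>5</b><b>1</b></a></r>")
result = evaluate(doc)
```

Because $X is repeated once per binding of $Y, the whole `a` element appears twice in `result`, which is why a streaming evaluator has to buffer it.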

XML Stream: Tags, Data & Control Tokens
An XML stream is a sequence of
- data tokens (e.g. 5, 1)
- open tag & close tag tokens
- control tokens (S_$R, E_$R)
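The token view of a stream can be sketched in Python as follows. This is an illustration only: it tokenizes an already parsed tree instead of reading incrementally, the tuple encoding of tokens is an assumption, and so is the reading that the control tokens S and E delimit one binding of a variable. The names `tokens` and `binding_stream` are hypothetical.

```python
import xml.etree.ElementTree as ET

def tokens(elem):
    """Yield open-tag, data and close-tag tokens for one element, in document order."""
    yield ("open", elem.tag)
    if elem.text and elem.text.strip():
        yield ("data", elem.text.strip())
    for child in elem:
        yield from tokens(child)
        if child.tail and child.tail.strip():   # text that follows a child element
            yield ("data", child.tail.strip())
    yield ("close", elem.tag)

def binding_stream(var, elems):
    """Wrap each element in control tokens S_var ... E_var (assumed meaning: one binding)."""
    for e in elems:
        yield ("S", var)
        yield from tokens(e)
        yield ("E", var)

stream = list(binding_stream("R", [ET.fromstring("<a><b>5</b><b>1</b></a>")]))
```

For the sample element the stream starts with `("S", "R")`, ends with `("E", "R")`, and carries the data tokens 5 and 1 between their `b` tags.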

XML Stream Machine (XSM)
Concatenation of bindings of Y, X into bindings of Z. Transitions (condition | action):
- *y = S_y | y++
- *x = S_x | w(z, S_z), x++
- *y = E_y | y++
- *y != E_y | w(z, *y), y++
- *x != E_x | w(z, *x), x++
- *x = E_x | w(z, E_z), x++
[Diagram: input buffers X and Y with read pointers x and y; output buffer Z holding S_z … E_z]
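The six condition | action transitions of the concatenation XSM can be simulated with a few lines of Python. This is a sketch under assumptions: the transcript does not preserve the state diagram, so the order in which the states are visited (numbered 0 to 3 here) is a guess, the sample buffer contents are illustrative, and the function name `concat_xsm` is hypothetical. Input buffers are assumed to be well formed.

```python
S_X, E_X, S_Y, E_Y, S_Z, E_Z = "Sx", "Ex", "Sy", "Ey", "Sz", "Ez"

def concat_xsm(buf_x, buf_y):
    """Concatenate each binding of Y with the matching binding of X into a binding of Z."""
    x = y = 0      # read pointers into input buffers X and Y
    z = []         # output buffer Z; w(z, t) is modeled as z.append(t)
    state = 0      # assumed state order, not taken from the slide
    while state != 0 or (x < len(buf_x) and y < len(buf_y)):
        if state == 0 and buf_y[y] == S_Y:      # *y = S_y | y++
            y += 1
            state = 1
        elif state == 1 and buf_x[x] == S_X:    # *x = S_x | w(z, S_z), x++
            z.append(S_Z)
            x += 1
            state = 2
        elif state == 2 and buf_y[y] != E_Y:    # *y != E_y | w(z, *y), y++
            z.append(buf_y[y])
            y += 1
        elif state == 2:                        # *y = E_y | y++
            y += 1
            state = 3
        elif state == 3 and buf_x[x] != E_X:    # *x != E_x | w(z, *x), x++
            z.append(buf_x[x])
            x += 1
        elif state == 3:                        # *x = E_x | w(z, E_z), x++
            z.append(E_Z)
            x += 1
            state = 0
        else:
            raise ValueError("no transition enabled")
    return z

# Illustrative buffers: one binding of X holding 5, 1 and one binding of Y holding 5.
z = concat_xsm([S_X, "5", "1", E_X], [S_Y, "5", E_Y])
```

Each input token is read exactly once and the machine keeps no memory besides its state and the buffer pointers, which is the point of the transducer model.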

Comparison of XSM against State Automata & Transducers
State automata:
- Do not construct
- Do not store intermediate results
- Sufficient for XPath only
Transducers:
- Finite alphabets
- State is the only memory
- No reset of input pointers
XSM:
- Unbounded alphabet
- Buffers
- Pointer reset

XSM Networks: Intermediate Step in Translating Queries to XSMs
[Diagram: XQuery → XQuery-to-Network Translation → XSM Network → XSM Composition → Single XSM]

XSM Network
Query: for $X in $R/a return for $Y in $X/b return $Y, $X
[Diagram: network of the XSMs $R/a, $X/b, For $Y: [$Y,$X] → [$Y’,$X’], and $Y’,$X’, connected by the buffers $R, $X, $Y, $X’, $Y’, $Z and $O]

From XQueries to XSM Networks: Non-FLWR Expressions
Expression: $Y, $X
[Diagram: an XSM for $Y,$X with input buffers $X and $Y and buffers $Z and $O]

From XQueries to XSM Networks: FLWRs without Free Variables
Expression: for $X in G return expr($X)
[Diagram: $R → G → $X → expr($X) → $O]

From XQueries to XSM Networks: FLWRs with Free Variables
Expression: for $Y in $X/b return $Y, $X (free variable: $X)
[Diagram: the XSMs $X/b, For $Y: [$Y,$X] → [$Y’,$X’], and $Y’, $X’, connected by the buffers $X, $Y, $X’, $Y’ and $O]

Composition Merges Two XSMs into One
[Diagram, before: the XSMs $R/a, $X/b, For $Y: [$Y,$X] → [$Y’,$X’], and $Y’,$X’, with buffers $R, $X, $Y, $X’, $Y’, $Z and $O]
[Diagram, after: the same network without the intermediate buffer $Z]

XSM Composition: “State Product” Emulates Producer-Consumer
[Diagram: producer M1 (state q1) and consumer M2 (state q2) are combined into the “state product” M3 = (M2 ∘ M1), whose states are pairs (q1, q2)]

Naive Composition
M1 has transitions q1 → q1’ labeled φ1 | A1; M2 has transitions q2 → q2’ labeled φ2 | A2.
In M3 = (M2 ∘ M1), from state (q1, q2):
- M2 step, to (q1, q2’) with φ2 | A2, if no shared read-pointer r_i of q2 is At End: ¬AE(r1) ∧ … ∧ ¬AE(rn)
- M1 step, to (q1’, q2) with φ1 | A1, otherwise
Here r1 … rn are the read pointers of q2 into the shared buffer.
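The naive scheduling rule (the consumer steps whenever its shared read pointer is not At End, otherwise the producer steps) can be sketched in Python. This is a deliberately simplified model, not the construction from the slides: each machine is reduced to a per-token function, there is a single shared read pointer r, and every M2 step consumes one token. The names `compose_naive`, `m1_step` and `m2_step` are hypothetical.

```python
def compose_naive(m1_step, m2_step, source):
    """Run producer M1 and consumer M2 as one machine with a runtime At-End test per step."""
    shared = []    # buffer written by M1 and read by M2
    out = []       # output of M2
    r = 0          # M2's read pointer into the shared buffer
    src = 0        # M1's read pointer into its own input
    trace = []     # which machine stepped, for illustration
    while True:
        at_end = r >= len(shared)                 # AE(r), evaluated at runtime
        if not at_end:                            # M2 step if not AE(r)
            out.append(m2_step(shared[r]))
            r += 1
            trace.append("M2")
        elif src < len(source):                   # M1 step otherwise
            shared.extend(m1_step(source[src]))   # M1 may write zero or more tokens
            src += 1
            trace.append("M1")
        else:
            break                                 # input exhausted, buffer drained
    return out, trace

# Hypothetical machines: M1 drops "skip" tokens, M2 upper-cases what it reads.
out, trace = compose_naive(
    lambda t: [] if t == "skip" else [t],
    str.upper,
    ["a", "skip", "b"],
)
```

The At-End test runs before every single step here; avoiding that test where its outcome is already known is what the smart composition is after.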

Smart Composition
Normalization assumptions:
- #(read-pointers-into-shared-buffer(q2)) ≤ 1
- Atomic actions only
Basic idea: avoid runtime tests (“At-End”) whenever the outcome can be determined at compile time.
Different “modes”:
- go: consumer M2 proceeds (full buffer)
- no: producer M1 proceeds (empty buffer); the consumer may be able to follow immediately
- ae: do runtime check AE

Smart Composition: no Case (shared buffer is empty)
M1 has transitions q1 → q1’ labeled φ1 | A1; M2 has transitions q2 → q2’ labeled φ2 | A2.
Case | Transition inserted
A1 does not write to the shared buffer | (q1, q2, no) → (q1’, q2, no) with φ1 | A1
M2 does not wait on shared buffer | (q1, q2, no) → (q1, q2’, no) with φ2 | A2

Smart Composition: Producer fills buffer
Case | Transition inserted
A1 writes a token to the shared buffer and M2 consumes the token | (q1, q2, no) → (q1’, q2’, no) with φ12 | A12
A1 writes to the shared buffer, but M2 doesn’t advance its read pointer | (q1, q2, no) → (q1’, q2’, go) with φ12 | A12
A12: combination of A1 with A2; φ12: combination of φ1 with φ2

Smart Composition: go - ae - no
In go mode:
- if A2 advances the read pointer into the shared buffer: (q1, q2, go) → (q1, q2’, no) with φ2 | A2
- if A2 does not advance the read pointer into the shared buffer: (q1, q2, go) → (q1, q2’, go) with φ2 | A2
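The mode changes for the no and go cases can be summarized as a small lookup function. This is a sketch of one reading of the transition diagrams on these slides, which are only partly preserved here; the ae mode is left out because its target modes cannot be read off reliably. The function name `next_mode` and its boolean parameters are hypothetical.

```python
def next_mode(mode, a1_writes=False, a2_advances=False):
    """Mode after one composed step, for the 'no' and 'go' cases only.

    a1_writes:   the producer action A1 writes a token to the shared buffer
    a2_advances: the consumer action A2 advances its shared read pointer
    """
    if mode == "no":                   # shared buffer known to be empty
        if not a1_writes:
            return "no"                # M1 step alone, buffer stays empty
        # combined step: A1 writes, then A2 may consume the token
        return "no" if a2_advances else "go"
    if mode == "go":                   # consumer M2 proceeds
        return "no" if a2_advances else "go"
    raise ValueError("mode not covered by this sketch: " + mode)

modes = [
    next_mode("no"),
    next_mode("no", a1_writes=True, a2_advances=True),
    next_mode("no", a1_writes=True, a2_advances=False),
    next_mode("go", a2_advances=True),
    next_mode("go", a2_advances=False),
]
```

Since the mode is part of the composed state, these decisions are made when the composed machine is built, so no At-End test is left to run in the no and go modes.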

Smart Composition: go - ae - no
In ae mode: insert transitions for M2 step if possible ...
- if φ2; A2 has no read from the shared buffer: (q1, q2, ae) → (q1, q2’, ae) with φ2 | A2
- if φ2; A2 has a read from the shared buffer: (q1, q2, ae) → (q1, q2’) with ¬AE(r) ∧ φ2 | A2

Smart Composition: go - ae - no
... AND transitions corresponding to M1 step, from (q1, q2, ae) with AE(r) ∧ φ1 | A1; the inserted transition depends on whether
- A1 has no write into the shared buffer
- A1 has one write into the shared buffer
- A1 has more than one write into the shared buffer
[Diagram: target mode (ae, no or go) for each of the three cases]

Performance Datapoint (Transformation Query on DBLP)
[Table columns: Data Size (KB), Xalan (ms), XSM Java, XSM C]

Conclusions & Future Work
- Novel query processor model
- Success in filtering & transformation
- To be extended for joins & aggregations
- Memory footprint questions
- Facilitated by model’s simplicity

Related Work
- Relational Data Streams & Sequence Data Models
- Pipelined Join Operators
- Aggregates & Approximations
- Fast XPath on streams
- Memory requirements of validating XML

Smart Composition: go - ae - no
In no mode: execute M1 step ...
- if A1 does not advance shared write pointer: (q1, q2, no) → (q1’, q2, no) with φ1 | A1
- if A1 does advance shared write pointer: (q1, q2, no) → (q1’, q2, ae) with φ1 | A1
... AND possibly M2 step (simplified composed φ1 ∧ φ2 and (A1; A2)):
- if A2 advances shared read pointer: (q1, q2, no) → (q1’, q2’, no) with φ12 | A12
- if A2 does not advance shared read pointer: (q1, q2, no) → (q1’, q2’, go) with φ12 | A12