University of Crete, School of Sciences, Department of Computer Science, HY-561: Data Management on the World Wide Web. XQuery Streaming à la Carte &

1 University of Crete, School of Sciences, Department of Computer Science. HY-561: Data Management on the World Wide Web. XQuery Streaming à la Carte & Combined Static and Dynamic Analysis for Effective Buffer Minimization in Streaming XQuery Evaluation

2 XQuery Streaming à la Carte: Introduction (HY-561 Paper Presentation, Konstantinos Galanakis)
- Existing XML query evaluation techniques
  - Algebraic optimization with algorithms for persistent data
  - Streaming algorithms for transient data
- New idea
  - Physical algebra for XQuery
  - À la carte use of streaming algorithms and optimization techniques

3 Introduction: Diverse Data Sources
- Join of a local repository and a streaming source

4 Preliminaries
- List: immutable ordered sequence of homogeneous values
- Cursor: mutable ordered sequence of homogeneous values
  - Destructive
  - C(α): cursor containing values of type α
- Operators: fromList, next, peek
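The cursor abstraction above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the class name `Cursor`, the method names `from_list`, `next` and `peek`, and the use of `None` as an end-of-cursor marker are all assumptions made for the example.

```python
from typing import Generic, Iterable, Iterator, Optional, TypeVar

T = TypeVar("T")


class Cursor(Generic[T]):
    """Destructive, forward-only sequence: each value can be consumed once.

    Simplification (assumption): None signals an exhausted cursor, so the
    cursor should not be used to carry None values.
    """

    _EMPTY = object()  # sentinel meaning "no value has been peeked yet"

    def __init__(self, source: Iterable[T]) -> None:
        self._it: Iterator[T] = iter(source)
        self._lookahead = Cursor._EMPTY  # one-slot buffer filled by peek()

    @classmethod
    def from_list(cls, values):
        # Build a cursor over a list; the list itself stays untouched.
        return cls(list(values))

    def peek(self) -> Optional[T]:
        # Look at the next value without consuming it.
        if self._lookahead is Cursor._EMPTY:
            self._lookahead = next(self._it, None)
        return self._lookahead

    def next(self) -> Optional[T]:
        # Consume and return the next value; it cannot be read again.
        value = self.peek()
        self._lookahead = Cursor._EMPTY
        return value
```

A list can be traversed any number of times, whereas a value taken from the cursor with `next` is gone; `peek` is the only way to inspect a value while leaving it available for a later `next`.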

5 Physical Data Model 1/2
- Physical value
  - Physical XML value (Xml)
    - Cursor of XML tokens, C(Tok)
    - List of tree values, L(Tree)
  - Physical table (Table)
    - Cursor of tuples, C(τ)
    - List of tuples, L(τ)
- Physical tuple, τ: record of fields containing physical XML values
- XML token (Tok): parsing event produced by a SAX parser

6 Physical Data Model 2/2
- XML token (Tok): parsing event produced by a SAX parser
  - startElem
  - endElem
  - text
  - atomic
  - hole
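As a rough illustration of how SAX parsing events map onto tokens, the sketch below uses Python's standard `xml.sax` module. The tuple encoding `("startElem", name)`, the names `TokenHandler` and `tokenize`, and the decision to ignore attributes are assumptions for the example. The `atomic` and `hole` tokens are not modelled, since a plain SAX parser does not emit them here.

```python
import xml.sax


class TokenHandler(xml.sax.ContentHandler):
    """Collects SAX events as (kind, payload) tuples (hypothetical encoding)."""

    def __init__(self):
        super().__init__()
        self.tokens = []

    def startElement(self, name, attrs):
        # Attributes are ignored in this sketch.
        self.tokens.append(("startElem", name))

    def endElement(self, name):
        self.tokens.append(("endElem", name))

    def characters(self, content):
        # SAX may deliver one text node in several chunks; merge them.
        if self.tokens and self.tokens[-1][0] == "text":
            self.tokens[-1] = ("text", self.tokens[-1][1] + content)
        else:
            self.tokens.append(("text", content))


def tokenize(xml_bytes):
    """Parse a complete document and return its token list."""
    handler = TokenHandler()
    xml.sax.parseString(xml_bytes, handler)
    return handler.tokens


tokens = tokenize(b"<bib><book><title>XQuery</title></book></bib>")
```

For brevity the tokens are gathered into a list; a streaming engine would hand them to the consumer one at a time through a cursor instead of materializing the whole sequence.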

7 Physical Representation & Conversion

8 Physical Algebra: Overview & Operators
- Physical algebra for the logical algebra proposed in C. Re, J. Simeon and M. Fernandez, “A complete and efficient algebraic compiler for XQuery”, in ICDE 2006

9 Physical Algebra: Constructors

10 Physical Algebra: Navigation Operators 1/3
- TreeProject
  - Projection of path expressions on a tree
  - Injected after Parse to reduce the plan input size
- TreeJoin
  - Returns a node sequence in document order with no duplicates
  - Strictly-forward path expressions: self, child, descendant, descendant-or-self and attribute axes

11 Physical Algebra: Navigation Operators 2/3
- Example path: desc-or-self::section/child::title
- Figure: the path compiled into a physical plan, and the plan applied to an input document
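A single forward pass over a token sequence is enough to evaluate a path such as desc-or-self::section/child::title. The sketch below is an assumption-laden illustration rather than the compiled plan from the slide: tokens are `(kind, payload)` tuples, the function name `section_titles` is hypothetical, and a title nested inside an already matched title is copied as part of the outer match instead of being returned as a separate node.

```python
def section_titles(tokens):
    """Yield the tokens of every title element whose parent is a section."""
    path = []    # names of the currently open elements
    depth = 0    # > 0 while inside a matched title subtree
    for tok in tokens:
        kind = tok[0]
        if kind == "startElem":
            name = tok[1]
            if depth:
                depth += 1                      # nested element inside a match
            elif name == "title" and path and path[-1] == "section":
                depth = 1                       # a new match starts here
            path.append(name)
            if depth:
                yield tok
        elif kind == "endElem":
            path.pop()
            if depth:
                yield tok
                depth -= 1                      # leaves the match at depth 0
        elif depth:
            yield tok                           # text inside a match


doc = [
    ("startElem", "doc"),
    ("startElem", "title"), ("text", "T0"), ("endElem", "title"),
    ("startElem", "section"),
    ("startElem", "title"), ("text", "T1"), ("endElem", "title"),
    ("endElem", "section"),
    ("endElem", "doc"),
]
matched = list(section_titles(doc))
```

Only the stack of open element names is kept in memory, which is what makes strictly-forward axes attractive for streamed input: the title directly under `doc` is skipped and the one under `section` is copied to the output as it arrives.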

12 Physical Algebra: Tuple Operators 1/2
- Tuple operators are polymorphic, except MapFromItem
- MapFromItem
  - Input → item sequence
  - Output → one tuple for each item
  - 2 implementations: one for lists of trees and one for token cursors
  - Relies on map and split
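The token-cursor variant of MapFromItem can be pictured as a split step followed by a map step. The sketch below is only a guess at that shape: the helper names `split_items` and `map_from_item`, the `(kind, payload)` token encoding and the use of a dict as a tuple are assumptions introduced for illustration.

```python
def split_items(tokens):
    """Split a flat token sequence into one token list per top-level item."""
    depth = 0
    item = []
    for tok in tokens:
        item.append(tok)
        if tok[0] == "startElem":
            depth += 1
        elif tok[0] == "endElem":
            depth -= 1
        if depth == 0:          # a complete item (element or top-level text)
            yield item
            item = []


def map_from_item(field, tokens):
    """Produce one tuple (here a dict with a single field) per input item."""
    for item in split_items(tokens):
        yield {field: item}


stream = [
    ("startElem", "a"), ("text", "x"), ("endElem", "a"),
    ("startElem", "b"), ("endElem", "b"),
]
tuples = list(map_from_item("x", stream))
```

Because both helpers are generators, each tuple is produced as soon as its item is complete, so the input never has to be materialized as a whole.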

13 Physical Algebra: Tuple Operators 2/2

14 Physical Algebra: Code Selection 1/4
- Mapping from a logical plan (Op) to a physical plan (POp): CS(Op) → POp
- Physical plan correctness
  - Stream safety is sufficient to ensure correctness

15 Physical Algebra: Code Selection 2/4
Conditions for stream safety of an operator Op:
- Navigational access on the XML values returned by Op is strictly forward
- Tuples returned by Op are consumed in the order of creation
- Tuple fields returned by Op are accessed at most once

16 Physical Algebra: Code Selection 3/4
- The code selection heuristic is based on the following assumptions
  - Conversion between physical representations is expensive
  - Streaming operators are more efficient on streamed sources
  - Copying whole sub-trees is expensive and should be avoided
- The following rules are applied bottom-up to each subplan Op of a whole plan Op0
  1. If (a) the inputs of Op are streamed, (b) a streaming operator POp exists for Op and (c) Op is stream-safe, then CS(Op) selects POp
  2. If Op is a constructor operator, CS(Op) uses a streaming operator
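The two rules can be read as a bottom-up pass over the logical plan. The sketch below is a simplified assumption, not the paper's algorithm: the `Op` record, its boolean flags, the `streamed_source` marker for leaves, and the fallback to a materialized operator when neither rule applies are all introduced for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Op:
    """Hypothetical logical operator; names are assumed to be unique."""
    name: str
    inputs: List["Op"] = field(default_factory=list)
    has_streaming_impl: bool = False   # a streaming POp exists for this Op
    stream_safe: bool = False          # result of the stream-safety analysis
    is_constructor: bool = False
    streamed_source: bool = False      # leaf that reads from a stream


def code_select(op: Op, plan: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Choose 'streaming' or 'materialized' for every operator, bottom-up."""
    if plan is None:
        plan = {}
    for child in op.inputs:            # children are decided first
        code_select(child, plan)
    if op.inputs:
        inputs_streamed = all(plan[c.name] == "streaming" for c in op.inputs)
    else:
        inputs_streamed = op.streamed_source
    if op.is_constructor:
        plan[op.name] = "streaming"    # rule 2
    elif inputs_streamed and op.has_streaming_impl and op.stream_safe:
        plan[op.name] = "streaming"    # rule 1
    else:
        plan[op.name] = "materialized" # assumed fallback
    return plan
```

In this reading, one operator that is not stream-safe forces every non-constructor operator above it onto materialized input, which is where the cost of converting between physical representations would be paid.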

17 Experimental Evaluation 1/2
- Experiments on synthetic data
  - Verify linear scalability of streaming operators w.r.t. query and document sizes
  - Run over MemBeR documents in the XCheck framework
- XMark benchmarks
  - Q2, Q6, Q15 are fully streamable
  - Q1, Q4, Q5, Q7, Q14, Q16–19 are partially streamable
  - Self-join queries Q8–12 / Q20 are not streamable

18 Experimental Evaluation 2/2

19 Combined Static and Dynamic Analysis for Effective Buffer Minimization in Streaming XQuery Evaluation: Introduction (General)
- The buffer manager of a streaming XQuery engine should
  1. Put only data relevant to query evaluation into the buffer
  2. Avoid keeping data buffered longer than necessary
  3. Avoid keeping multiple copies of the data in buffers
- Claim: a combination of static analysis and dynamic buffer minimization techniques is needed

20 Introduction: Previous Work 1/2
XQuery:
    { for $b in /bib/book
      where ($b/author = “A. Turing” and fn:exists($b/price))
      return $b/title }
Projection paths:
    { /bib/book,
      /bib/book/author/dos::node(),
      /bib/book/price,
      /bib/book/title/dos::node() }
XML document (figure): tree rooted at bib containing book, author, price, title, article and isbn elements

21 Introduction: Previous Work 2/2
XQuery:
    { for $x1 in //book return
        for $x2 in //* return
          for $x3 in //article return }
Two approaches:
(1) Single DOM-tree
(2) Buffers for variables

22 Active Garbage Collection
- Buffer management technique for XQuery engines
  - Both static and dynamic analysis are exploited
- Basic idea
  - Determine which data objects won’t be accessed in the future
- A.G.C. strategy
  - Reference counting
- New approach
  - Roles assigned to nodes: multiple roles per node, multiple nodes per role
  - signOff statement

23 Main Idea
Figure labels: role removal (A.G.C.), variable bindings

24 Query Language
- XQ is an XQuery fragment
  - Nested for-expressions
  - Conditions
  - Joins
- Covers a syntactically simple fragment of XQuery
- Syntactically richer fragments are assumed to be evaluable through query normalization
  - Remove let-expressions
  - Rewrite where-conditions to if-then-else expressions
  - Replace for-loops with nested single-step for-loops

25 Query Language: where-expressions → if-statement
Original query:
    { for $b in /bib
      where (fn:exists($b/book))
      return { $b/book } }
Rewritten query:
    { for $b in /bib return
        ( if (fn:exists($b/book)) then else (),
          if (fn:exists($b/book)) then $b/book else (),
          if (fn:exists($b/book)) then else () ) }

26 Query Language: (i) where-expressions → if-statement, (ii) pushing if-statements
Original query:
    { for $b in /bib
      where (fn:exists($b/book))
      return { $b/book } }
Rewritten query:
    { for $b in /bib return
        ( if (fn:exists($b/book)) then else (),
          if (fn:exists($b/book)) then $b/book else (),
          if (fn:exists($b/book)) then else () ) }

27 Query Language: Role extraction
Query:
    { for $bib in /bib return
        ( for $x in $bib/* return
            if (not(fn:exists($x/price))) then $x else (),
          for $b in $bib/book return $b/title ) }
Role extraction (figure): tree of the projection paths derived from the query, with steps /, bib, *, book, title, price[1] and dos::node()

28 Query Language: Role assignment
Roles:
    r1  /
    r2  /bib
    r3  /bib/*
    r4  /bib/*/price[1]
    r5  /bib/*/dos::node()
    r6  /bib/book
    r7  /bib/book/title/dos::node()
XML document (figure), nodes with their roles:
    bib     { r2 }
    book    { r3, r5, r6 }
    author  { r5 }
    title   { r5, r7 }
- Roles are assigned to document nodes when they are projected into the buffer
- On-the-fly role assignment
- Nodes without roles and without role-carrying ancestors need not be buffered

29 Query Language: Inserting role updates
Original query:
    { for $bib in /bib return
        ( for $x in $bib/* return
            if (not(fn:exists($x/price))) then $x else (),
          for $b in $bib/book return $b/title ) }
Query with signOff statements:
    { for $bib in /bib return
        ( ( for $x in $bib/* return
              ( if (not(exists($x/price))) then $x else (),
                signOff($x, r3),
                signOff($x/price[1], r4),
                signOff($x/dos::node(), r5) ),
            for $b in $bib/book return
              ( $b/title,
                signOff($b, r6),
                signOff($b/title/dos::node(), r7) ) ),
          signOff($bib, r2) ) }
Roles and the variables they are bound to:
    r1  /
    r2  /bib                          $bib
    r3  /bib/*                        $x
    r4  /bib/*/price[1]               $x/price
    r5  /bib/*/dos::node()            $x
    r6  /bib/book                     $b
    r7  /bib/book/title/dos::node()   $b/title

30 Query Language: Active Garbage Collection
Query with signOff statements:
    { for $bib in /bib return
        ( ( for $x in $bib/* return
              ( if (not(exists($x/price))) then $x else (),
                signOff($x, r3),
                signOff($x/price[1], r4),
                signOff($x/dos::node(), r5) ),
            for $b in $bib/book return
              ( $b/title,
                signOff($b, r6),
                signOff($b/title/dos::node(), r7) ) ),
          signOff($bib, r2) ) }
Input stream: <bib><book><title/><author/></book>…
Output stream: <r><book><title/><author/></book>
Buffer (figure, successive role sets per node):
    bib     {r2}
    book    {r3, r5, r6} → {r5, r6} → {r6}
    title   {r5, r7} → {r7}
    author  {r5} → {}
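The role bookkeeping behind this trace can be sketched as a small buffer tree in which each node carries a multiset of roles. This is an illustrative assumption rather than the engine's actual data structure: the names `BufferNode` and `sign_off` are hypothetical, and the purge rule used here (drop a node once neither it nor any descendant carries a role) is a simplified reading of the slides.

```python
from collections import Counter


class BufferNode:
    """Buffered document node carrying a multiset of roles."""

    def __init__(self, name, roles=(), parent=None):
        self.name = name
        self.roles = Counter(roles)
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)

    def has_roles_in_subtree(self):
        # +Counter drops entries whose count is zero or negative.
        return bool(+self.roles) or any(
            child.has_roles_in_subtree() for child in self.children
        )


def sign_off(node, role):
    """Remove one instance of role from node, then purge what became useless."""
    if node.roles[role] > 0:
        node.roles[role] -= 1
    current = node
    # Walk upwards, unlinking every subtree that no longer carries any role.
    while (current is not None and current.parent is not None
           and not current.has_roles_in_subtree()):
        parent = current.parent
        parent.children.remove(current)
        current.parent = None
        current = parent


# Buffer contents taken from the slide's example document.
bib = BufferNode("bib", ["r2"])
book = BufferNode("book", ["r3", "r5", "r6"], bib)
title = BufferNode("title", ["r5", "r7"], book)
author = BufferNode("author", ["r5"], book)

sign_off(book, "r3")                 # signOff($x, r3)
for n in [book, title, author]:      # signOff($x/dos::node(), r5)
    sign_off(n, "r5")
```

After these sign-offs the `author` node has no roles left and is unlinked, while `book` and `title` stay buffered because they still carry `r6` and `r7` for the second loop.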

31 Optimizations
- Aggregated roles
- Remove redundant roles
Original query:
    { for $bib in /bib
      return $bib/book }
With signOff statements:
    { for $bib in /bib return
        ( $bib/book,
          signOff($bib, r1),
          signOff($bib/book/dos::node(), r2) ) }
Path steps → for-expressions:
    { for $bib in /bib return
        for $_1 in $bib/book
        return $_1/book }
With signOff statements:
    { for $bib in /bib return
        ( for $_1 in $bib/book return
            ( $_1/book,
              signOff($_1/book/dos::node(), r2) ),
          signOff($bib, r1) ) }

32 Benchmarking: Benchmark Results 1/5
- Time and memory consumption
  - Queries and documents from the XMark benchmark
  - Queries and documents modified to match the supported fragment
  - 3 GHz Intel Pentium IV CPU with 2 GB RAM
  - SuSE Linux 10.0, J2RE v1.4.2 for Java-based systems
  - Time limit: 1 hour
- Benchmarks against the following systems
  - FluX: Java in-memory engine for streaming XQuery evaluation
  - MonetDB v4.12.0/XQuery v0.12.0: secondary-storage engine written in C++; loading of the document is included in the time measurements
  - QizX/open v1.1: free in-memory XQuery engine written in Java
  - Saxon v8.7.1: free in-memory XQuery engine written in Java

33 Benchmarking: Benchmark Results 2/5
XMark Q1:
    <query1>{
      for $s in /site return
        for $p in $s/people return
          for $pe in $p/person return
            if ($pe/person_id="person0")
            then { $pe/name } else ()
    }</query1>
Figure: running time (s)

34 Benchmarking: Benchmark Results 3/5
XMark Q1:
    <query1>{
      for $s in /site return
        for $p in $s/people return
          for $pe in $p/person return
            if ($pe/person_id="person0")
            then { $pe/name } else ()
    }</query1>
Figure: memory consumption (MB)

35 Benchmarking: Benchmark Results 4/5
XMark Q8:
    { for $root in (/) return
        for $site in $root/site return
          for $people in $site/people return
            for $person in $people/person return
              { ( { $person/name },
                  { for $site2 in $root/site return
                      for $cas in $site2/closed_auctions return
                        for $ca in $cas/closed_auction return
                          for $buyer in $ca/buyer return
                            if ($buyer/buyer_person=$person/person_id)
                            then { $ca } else () } ) } }

36 Benchmarking: Benchmark Results 5/5
XMark Q8 (figure):
- Failure for 100MB: MonetDB
- Failure for 200MB: GCX, FluxQuery, MonetDB

