Published by Kaliyah Faye
Modified over 3 years ago
Using Partial Evaluation in Distributed Query Evaluation
Peter Buneman, Gao Cong, Wenfei Fan, Anastasios (Tasos) Kementsietsidis
© Anastasios Kementsietsidis, VLDB 2006
Cutting Down Trees…
[Figure: an XML tree rooted at portofolio, with broker nodes (Merill Lynch, Bache), market nodes (NASDAQ, NYSE) and stock nodes (code, buy, sell), spread over peers P0, P1 and P2.]
Tell me when GOOG stock sells for 376: [//stock[code = GOOG ∧ sell = 376]]
Let's stream! Not.
Let's do a depth-first traversal. We visit: P0, P1, P2, P1, P0, P2, P0.
Status report…
We have XML trees, arbitrarily fragmented and distributed.
We want to execute Boolean XPath queries Q = [q] over the fragmented trees:
q := p | p/text() = str | label() = A | ¬q | q ∧ q | q ∨ q
p := ε | A | * | p//p | p/p | p[q]
Lessons learned:
– We want to visit each peer only once, irrespective of the number of (tree) fragments it stores.
– We want to minimize communication costs. Ideally, no fragment data should be sent while evaluating a query.
Our motto: Send processing to data, NOT data to processing.
Partial Evaluation
Consider a function f(s, d) and part of its input, say s. Partial evaluation specializes f(s, d) with respect to s, i.e., it performs the part of f's computation that depends only on s. This generates a residual function g(d) that depends only on d.
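As a minimal sketch of this definition, take f(s, d) = d^s with the exponent s known in advance. The exponentiation example and the names power and specialize_power are illustrative assumptions and do not come from the talk. Everything that depends only on s (here, the square-and-multiply schedule) is computed once, and the returned closure plays the role of the residual function g(d).

```python
from typing import Callable


def power(s: int, d: float) -> float:
    """f(s, d): raise d to the non-negative integer power s."""
    result = 1.0
    for _ in range(s):
        result *= d
    return result


def specialize_power(s: int) -> Callable[[float], float]:
    """Partially evaluate power with respect to s and return g(d)."""
    # This part depends only on s: the binary digits of the exponent
    # fix the square-and-multiply schedule before d is known.
    bits = bin(s)[2:]

    def g(d: float) -> float:
        # Residual computation: depends only on d (the schedule is fixed).
        result = 1.0
        for bit in bits:
            result *= result
            if bit == "1":
                result *= d
        return result

    return g


cube = specialize_power(3)
print(cube(2.0))      # 8.0
print(power(3, 2.0))  # 8.0
```

The residual function can be reused for many values of d without redoing the work that depended on s.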
Tree Fragments
[Figure: the portofolio tree split into four fragments, F0, F1, F2 and F3, shown next to the corresponding fragment tree. The tree contains broker Bache with market NYSE and stock IBM (buy $80, sell $78); broker Merill Lynch; market NASDAQ with stocks YHOO (buy $33, sell $35) and GOOG (buy $374, sell $373); and market NASDAQ with stocks AAPL (buy $71, sell $65) and GOOG (buy $370, sell $372).]
Partial Evaluation in Distributed Query Evaluation
Main idea: Given a query Q, send Q to every peer holding a fragment.
Q = [//stock[code = GOOG ∧ sell = 376]]
[Figure: peers P0, P1 and P2; a fragment of the portofolio tree (broker Bache, market NYSE, stock IBM) containing nodes F1 and F3.]
Compute partial answers (Boolean formulas):
– Q is evaluated bottom-up.
– We use Boolean variables for the evaluation of fragment nodes.
P2 has two fragments but is only visited once.
Answer of Q: computed by solving a linear system of Boolean equations.
Query Evaluation
Q = [//stock[code = GOOG ∧ sell = 376]]
Query representation:
q0: code = GOOG
q1: sell = 376
q2: */q0 ∧ */q1
q3: stock[q2]
q4: //q3
Query Evaluation Example 1: [Figure: a market node with a stock whose children are code GOOG, buy $370 and sell $376.]
Query Evaluation Example 2: [Figure: a market node with a stock whose children are code GOOG, buy $370 and a fragment node F.]
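The bottom-up evaluation of the sub-queries q0 to q4 over a single fragment can be sketched as follows. This is a simplified illustration hard-coded for this one query, not the ParBoX implementation: the Node class, the tuple encoding of formulas, the variable names such as F.q1, and the dropped dollar signs are all assumptions made for the example. A node that stands for a fragment stored elsewhere contributes one Boolean variable per sub-query, so the result is either a truth value or a formula over those variables.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Union

# A formula is True/False, a variable name, or ("and" | "or", left, right).
Formula = Union[bool, str, tuple]
SUBQUERIES = ("q0", "q1", "q2", "q3", "q4")


def f_and(a: Formula, b: Formula) -> Formula:
    """Conjunction that folds constants away."""
    if a is False or b is False:
        return False
    if a is True:
        return b
    if b is True:
        return a
    return ("and", a, b)


def f_or(a: Formula, b: Formula) -> Formula:
    """Disjunction that folds constants away."""
    if a is True or b is True:
        return True
    if a is False:
        return b
    if b is False:
        return a
    return ("or", a, b)


@dataclass
class Node:
    label: str
    text: Optional[str] = None
    children: List["Node"] = field(default_factory=list)
    fragment: Optional[str] = None  # set on a placeholder for a remote fragment


def evaluate(node: Node) -> Dict[str, Formula]:
    """Evaluate q0..q4 at node, children first (bottom-up)."""
    if node.fragment is not None:
        # Unknown here: one Boolean variable per sub-query.
        return {q: f"{node.fragment}.{q}" for q in SUBQUERIES}
    child_q0: Formula = False
    child_q1: Formula = False
    below_q4: Formula = False
    for child in node.children:
        k = evaluate(child)
        child_q0 = f_or(child_q0, k["q0"])
        child_q1 = f_or(child_q1, k["q1"])
        below_q4 = f_or(below_q4, k["q4"])
    q0 = node.label == "code" and node.text == "GOOG"
    q1 = node.label == "sell" and node.text == "376"
    q2 = f_and(child_q0, child_q1)          # */q0 and */q1
    q3 = f_and(node.label == "stock", q2)   # stock[q2]
    q4 = f_or(q3, below_q4)                 # //q3
    return {"q0": q0, "q1": q1, "q2": q2, "q3": q3, "q4": q4}


code, buy = Node("code", "GOOG"), Node("buy", "370")
example1 = Node("market", children=[
    Node("stock", children=[code, buy, Node("sell", "376")])])
example2 = Node("market", children=[
    Node("stock", children=[code, buy, Node("", fragment="F")])])

print(evaluate(example1)["q4"])  # True
print(evaluate(example2)["q4"])  # ('or', 'F.q1', 'F.q4')
```

In the second example the answer stays symbolic: it is true exactly when the root of F is a matching sell node or F itself contains a matching stock.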
The ParBoX Algorithm
Three stages:
Stage 1: The querying peer P_Q sends query Q to all peers having a fragment (it uses the fragment tree to identify all such peers).
Stage 2: Evaluate Q, in parallel, over each fragment F_i in peer P_j.
Stage 3: Collect partial answers in P_Q and compute the answer to Q.
Key considerations/concerns:
– (Total/parallel) computation costs.
– Communication costs.
– Level of fragmentation.
Fragments and the peers storing them: F0 (P0), F1 (P1), F2 (P2), F3 (P2).
ParBoX comes in flavors: HybridParBoX, FullDistParBoX, LazyParBoX.
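Stage 3 can be sketched as below, assuming each peer returns, per fragment, one formula per sub-query whose variables refer only to that fragment's child fragments. The function names, the tuple encoding of formulas, and the fragment tree and partial answers in the example are hypothetical and are not the data from the talk. Because the equations are acyclic along the fragment tree, this sketch solves them by substituting values from the leaves upward; whether the authors' implementation proceeds this way is an assumption.

```python
from typing import Dict, List, Union

# A formula is True/False, a variable name, or ("and" | "or", left, right).
Formula = Union[bool, str, tuple]


def solve(formula: Formula, env: Dict[str, bool]) -> bool:
    """Evaluate a formula once all of its variables are in env."""
    if isinstance(formula, bool):
        return formula
    if isinstance(formula, str):
        return env[formula]
    op, left, right = formula
    if op == "and":
        return solve(left, env) and solve(right, env)
    return solve(left, env) or solve(right, env)


def combine(partial_answers: Dict[str, Dict[str, Formula]],
            fragment_tree: Dict[str, List[str]],
            root: str) -> Dict[str, bool]:
    """Resolve every variable 'fragment.subquery', children first."""
    env: Dict[str, bool] = {}

    def visit(fragment: str) -> None:
        for child in fragment_tree.get(fragment, []):
            visit(child)
        # All child variables are known now, so these formulas are closed.
        for subquery, formula in partial_answers[fragment].items():
            env[f"{fragment}.{subquery}"] = solve(formula, env)

    visit(root)
    return env


# Hypothetical fragment tree: F0 contains F1 and F2, and F2 contains F3.
fragment_tree = {"F0": ["F1", "F2"], "F2": ["F3"]}
# Hypothetical partial answers, one dictionary per fragment.
partial_answers = {
    "F0": {"q4": ("or", "F1.q4", "F2.q4")},
    "F1": {"q4": False},
    "F2": {"q4": ("or", "F3.q1", "F3.q4")},
    "F3": {"q1": True, "q4": False},
}

answer = combine(partial_answers, fragment_tree, "F0")["F0.q4"]
print(answer)  # True
```

Only formulas travel to the querying peer in this sketch; no fragment data is shipped.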
Analysis of Algorithms
Algorithm         Visits/Peer  Computation                                 Communication
NaiveCentralized  1            O(|Q| |T|)                                  O(|T|)
NaiveDistributed  card(S_i)    O(|Q| |T|)                                  O(|Q| card(T))
ParBoX            1            Tot: O(|Q| (|T| + card(T)))                 O(|Q| card(T))
                               Par: O(|Q| (max_Pj |F_Pj| + card(T)))
HybridParBoX      1            Tot: O(|Q| |T|)                             O(|T|)
                               Par: O(|Q| (max_Pj |F_Pj| + card(T)))
FullDistParBoX    card(S_i)    Tot: O(|Q| (|T| + card(T)))                 O(|Q| card(T))
                               Par: O(|Q| (max_Pj |F_Pj| + card(T)))
LazyParBoX        card(S_i)    Tot: O(|Q| (|T| + card(T)))                 O(|Q| card(T))
                               Par: O(|Q| card(T) max_T |F_i|)
card(S_i) = # of fragments in peer P_i.
card(T) = # of fragments of tree T. Note that card(T) ≤ |T|.
|F_Pj| = sum of the sizes of the fragments in peer P_j.
Communication costs are LOW and independent of T (the data).
Computation costs are comparable to the best-known centralized algorithm.
The Experimental Study
The setting:
– Ten (10) Linux machines (peers) distributed over a LAN.
– XMark sites are fragmented and distributed over the network. Their sizes vary between 5MB and 150MB.
The parameters:
– # of machines participating in each experiment
– Size of query Q
– Size of tree T
– The shape of the fragment tree: number of fragments in the tree, nesting level (deep vs. shallow fragment trees), number of fragments per machine
NaiveCentralized vs. ParBoX
|T| = 50MB, |Q| = 8, # fragments/peer = 1
With |T| fixed, as we increase the number of machines, the difference (between iterations) in the size of the fragment allocated to each machine decreases.
Parallelism works! Shipping data costs!
Varying Query and Data Size
# peers = 8, # fragments/peer = 1
[Figure: fragment tree over fragments F0 to F7.]
Summary
We (practically) proved that partial evaluation is effective in query processing over fragmented XML document trees.
We presented the family of ParBoX algorithms to evaluate Boolean XPath queries.
Our algorithms guarantee:
– Optimal computation costs.
– Each peer is visited only once.
– Communication depends only on the query size (and not the tree).
The question on everybody's mind… Can we extend this idea to non-Boolean XPath queries?
The answer is YES… but you have to wait a bit to read about it!