Using Partial Evaluation in Distributed Query Evaluation
Peter Buneman, Gao Cong, Wenfei Fan, Anastasios (Tasos) Kementsietsidis
© Anastasios Kementsietsidis, VLDB
Cutting Down Trees…
[Figure: an XML portfolio tree. Brokers Merrill Lynch and Bache trade on NASDAQ and NYSE markets. Each stock element (YHOO, GOOG, IBM, AAPL, and a second GOOG) has code, buy, and sell children, e.g. GOOG buy $374 sell $373 and GOOG buy $370 sell $372. The tree is stored in pieces at peers P0, P1, and P2, with P2 holding two pieces.]
Tell me when GOOG stock sells for 376: [//stock[code = "GOOG" ∧ sell = "376"]]
Let's stream the data? No. Let's do a depth-first traversal instead. We visit: P0, P1, P2, P1, P0, P2, P0. That is, P0 is visited three times, and P1 and P2 twice each.
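The visit sequence above can be reproduced with a short sketch. It assumes a fragment tree in which F0 (at P0) points to F1 (at P1) and F3 (at P2), and F1 points to F2 (at P2). This encoding is inferred from the visit order on the slide rather than stated on it. `CHILDREN`, `PEER_OF`, and `dfs_peer_visits` are hypothetical names.

```python
from typing import Dict, List

# Hypothetical encoding of the example, inferred from the visit order on the slide:
# CHILDREN lists the fragments each fragment points to; PEER_OF says where each is stored.
CHILDREN: Dict[str, List[str]] = {"F0": ["F1", "F3"], "F1": ["F2"], "F2": [], "F3": []}
PEER_OF: Dict[str, str] = {"F0": "P0", "F1": "P1", "F2": "P2", "F3": "P2"}


def dfs_peer_visits(fragment: str) -> List[str]:
    """Peers visited when evaluation walks the fragment tree depth-first and
    control returns to the parent fragment's peer after each child."""
    visits = [PEER_OF[fragment]]
    for child in CHILDREN[fragment]:
        visits.extend(dfs_peer_visits(child))
        visits.append(PEER_OF[fragment])  # back at the parent's peer
    return visits


visits = dfs_peer_visits("F0")
print(visits)  # ['P0', 'P1', 'P2', 'P1', 'P0', 'P2', 'P0']
print({peer: visits.count(peer) for peer in sorted(set(visits))})  # {'P0': 3, 'P1': 2, 'P2': 2}
```

Every peer is visited more than once, and P2 is contacted separately for each of its two fragments.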
Status report…
We have XML trees, arbitrarily fragmented and distributed.
We want to execute Boolean XPath queries Q = [q] over the fragmented trees:
q := p | p/text() = str | label() = A | ¬q | q ∧ q | q ∨ q
p := ε | A | * | p//p | p/p | p[q]
Lessons learned:
– We want to visit each peer only once, irrespective of the number of (tree) fragments it stores.
– We want to minimize communication costs. Ideally, no fragment data should be sent while evaluating a query.
Our motto: send processing to data, NOT data to processing.
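To make the grammar concrete, here is a minimal centralized evaluator for Boolean XPath over an in-memory tree. It is a sketch under stated assumptions. The tuple encoding of queries and the names `Node`, `eval_path`, and `holds` are hypothetical. `p1//p2` is read in the usual XPath way, as descendant-or-self followed by `p2`. Prices are plain strings without the `$` sign. This is not the paper's algorithm: it is the unfragmented baseline whose answers any distributed evaluation must reproduce.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass(eq=False)  # eq=False keeps identity-based hashing, so nodes can be put in sets
class Node:
    label: str
    text: str = ""
    children: List["Node"] = field(default_factory=list)


# Hypothetical tuple encoding of the grammar:
#   paths       p := ("eps",) | ("label", A) | ("star",) | ("desc", p, p) | ("child", p, p) | ("filter", p, q)
#   qualifiers  q := ("exists", p) | ("text", p, str) | ("is", A) | ("not", q) | ("and", q, q) | ("or", q, q)
Expr = Tuple


def desc_or_self(n: Node) -> List[Node]:
    """n together with all of its descendants."""
    out = [n]
    for c in n.children:
        out.extend(desc_or_self(c))
    return out


def eval_path(p: Expr, ctx: Node) -> Set[Node]:
    """Nodes reachable from ctx via path p."""
    tag = p[0]
    if tag == "eps":
        return {ctx}
    if tag == "label":
        return {c for c in ctx.children if c.label == p[1]}
    if tag == "star":
        return set(ctx.children)
    if tag == "child":   # p1/p2
        return {m for n in eval_path(p[1], ctx) for m in eval_path(p[2], n)}
    if tag == "desc":    # p1//p2: apply p2 at every descendant-or-self of p1's result
        return {m for n in eval_path(p[1], ctx) for d in desc_or_self(n) for m in eval_path(p[2], d)}
    if tag == "filter":  # p[q]
        return {n for n in eval_path(p[1], ctx) if holds(p[2], n)}
    raise ValueError(f"unknown path {p!r}")


def holds(q: Expr, n: Node) -> bool:
    """Truth value of qualifier q at node n."""
    tag = q[0]
    if tag == "exists":
        return bool(eval_path(q[1], n))
    if tag == "text":    # p/text() = str
        return any(m.text == q[2] for m in eval_path(q[1], n))
    if tag == "is":      # label() = A
        return n.label == q[1]
    if tag == "not":
        return not holds(q[1], n)
    if tag == "and":
        return holds(q[1], n) and holds(q[2], n)
    if tag == "or":
        return holds(q[1], n) or holds(q[2], n)
    raise ValueError(f"unknown qualifier {q!r}")


def stock(code: str, buy: str, sell: str) -> Node:
    return Node("stock", children=[Node("code", code), Node("buy", buy), Node("sell", sell)])


# Q = [//stock[code = "GOOG" ∧ sell = "376"]], evaluated at the root
Q = ("exists", ("desc", ("eps",), ("filter", ("label", "stock"),
     ("and", ("text", ("label", "code"), "GOOG"), ("text", ("label", "sell"), "376")))))

tree = Node("portfolio", children=[Node("market", children=[stock("GOOG", "374", "373")])])
print(holds(Q, tree))  # False: this GOOG sells for 373
tree.children[0].children.append(stock("GOOG", "370", "376"))
print(holds(Q, tree))  # True
```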
Partial Evaluation
Consider a function f(s, d) and part of its input, say s. Partial evaluation specializes f(s, d) with respect to s: it performs the part of f's computation that depends only on s. The result is a residual function g(d) that depends only on d, with g(d) = f(s, d) for every d.
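The definition can be illustrated with a textbook example that is not taken from the slides. Here f is exponentiation: the static input s is the exponent n, and the dynamic input d is the base x. Specializing on n runs the loop over n ahead of time. The residual g therefore contains only multiplications on x and never looks at n. `power` and `specialize_power` are hypothetical names, and `exec` is used only to make the residual program visible.

```python
from typing import Callable, Dict, Tuple


def power(x: int, n: int) -> int:
    """General function f(s, d): static input s = n, dynamic input d = x."""
    result = 1
    while n > 0:
        if n & 1:
            result *= x
        x *= x
        n >>= 1
    return result


def specialize_power(n: int) -> Tuple[Callable[[int], int], str]:
    """Do all of power's work that depends only on n and emit a residual g(x)."""
    lines = ["def g(x):", "    result = 1"]
    while n > 0:  # the loop over n runs now, at specialization time
        if n & 1:
            lines.append("    result = result * x")
        n >>= 1
        if n > 0:
            lines.append("    x = x * x")
    lines.append("    return result")
    source = "\n".join(lines)
    namespace: Dict[str, object] = {}
    exec(source, namespace)  # compile the residual program
    return namespace["g"], source


g, source = specialize_power(5)
print(source)              # straight-line code: no loop, no reference to n
print(g(2), power(2, 5))   # 32 32
```

In the distributed setting the same idea applies with s being a peer's local fragment and d the unknown values coming from other fragments. There the residual is a Boolean formula rather than a program.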
Tree Fragments
[Figure: the portfolio tree cut into fragments F0, F1, F2, and F3, next to the fragment tree that relates them. F0 holds the root portfolio and broker Bache (market NYSE, stock IBM buy $80 sell $78). F1 holds broker Merrill Lynch. The other two fragments each hold a NASDAQ market: one with stocks YHOO (buy $33, sell $35) and GOOG (buy $374, sell $373), the other with AAPL (buy $71, sell $65) and GOOG (buy $370, sell $372).]
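Cutting a tree into fragments can be sketched as follows. Each cut subtree becomes a fragment of its own, and the place where it was cut off is marked by a virtual node. Recording which fragment points to which yields the fragment tree. `Node`, `cut`, and the `virtual` label are hypothetical names. The tree is a simplified stand-in for the slide's portfolio (stocks omitted), and the cut points are chosen for illustration rather than read off the figure.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass(eq=False)  # identity hashing: nodes can be used as dict keys
class Node:
    label: str
    text: str = ""
    children: List["Node"] = field(default_factory=list)
    ref: Optional[str] = None  # set only on virtual nodes: the fragment they stand for


def cut(root: Node, cuts: Dict[Node, str],
        root_id: str = "F0") -> Tuple[Dict[str, Node], Dict[str, List[str]]]:
    """Split a tree at the nodes in `cuts` (node -> fragment name). Each cut
    subtree becomes its own fragment and is replaced by a virtual node."""
    fragments: Dict[str, Node] = {}
    fragment_tree: Dict[str, List[str]] = {root_id: []}

    def copy(node: Node, owner: str) -> Node:
        clone = Node(node.label, node.text)
        for child in node.children:
            if child in cuts:
                fid = cuts[child]
                fragment_tree[owner].append(fid)  # edge owner -> fid in the fragment tree
                fragment_tree[fid] = []
                fragments[fid] = copy(child, fid)
                clone.children.append(Node("virtual", ref=fid))
            else:
                clone.children.append(copy(child, owner))
        return clone

    fragments[root_id] = copy(root, root_id)
    return fragments, fragment_tree


ml_market = Node("market", "NASDAQ")
ml = Node("broker", "Merrill Lynch", [ml_market])
bache_nasdaq = Node("market", "NASDAQ")
bache = Node("broker", "Bache", [Node("market", "NYSE"), bache_nasdaq])
portfolio = Node("portfolio", children=[ml, bache])

fragments, fragment_tree = cut(portfolio, {ml: "F1", ml_market: "F2", bache_nasdaq: "F3"})
print(fragment_tree)  # {'F0': ['F1', 'F3'], 'F1': ['F2'], 'F2': [], 'F3': []}
```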
Partial Evaluation in Distributed Query Evaluation
[Figure: fragment F0 at peer P0 (portfolio, broker Bache, market NYSE, stock IBM), containing fragment nodes F1 and F3.]
Main idea: given a query Q = [//stock[code = "GOOG" ∧ sell = "376"]], send Q to every peer holding a fragment (P0, P1, P2).
– Compute partial answers (Boolean formulas): Q is evaluated bottom-up over each fragment. Boolean variables stand for the still-unknown values at fragment nodes, i.e. nodes whose subtrees are stored in other fragments.
– P2 holds two fragments but is visited only once.
– Answer of Q: computed by solving a linear system of Boolean equations built from the partial answers.
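The answer-computation step can be sketched as follows. The sketch makes a simplifying assumption: there is one variable per fragment, meaning "the subtree rooted at this fragment satisfies the query". That is enough only when no stock element is split across fragments; the general case needs one variable per fragment and sub-query. Each fragment contributes one equation. Because the fragment tree is acyclic, solving children before parents settles the system in a single pass. The formulas, the fragment tree, and the names `evaluate` and `solve` are hypothetical.

```python
from typing import Dict, List, Tuple, Union

# A partial answer is True/False, a variable (the name of a child fragment),
# or a tuple ("and", f, g) / ("or", f, g) / ("not", f).
Formula = Union[bool, str, Tuple]


def evaluate(f: Formula, env: Dict[str, bool]) -> bool:
    """Evaluate a formula once all of its variables have values in env."""
    if isinstance(f, bool):
        return f
    if isinstance(f, str):
        return env[f]
    op = f[0]
    if op == "not":
        return not evaluate(f[1], env)
    if op == "and":
        return evaluate(f[1], env) and evaluate(f[2], env)
    if op == "or":
        return evaluate(f[1], env) or evaluate(f[2], env)
    raise ValueError(f"unknown operator {op!r}")


def solve(partial: Dict[str, Formula], children: Dict[str, List[str]], root: str) -> bool:
    """Solve X_F = partial[F] (one equation per fragment) bottom-up over the fragment tree."""
    env: Dict[str, bool] = {}

    def visit(fid: str) -> None:
        for child in children[fid]:
            visit(child)  # children first, so their variables are known
        env[fid] = evaluate(partial[fid], env)

    visit(root)
    return env[root]


children = {"F0": ["F1", "F3"], "F1": ["F2"], "F2": [], "F3": []}
partial = {"F0": ("or", "F1", "F3"), "F1": "F2", "F2": False, "F3": True}
print(solve(partial, children, "F0"))  # True
```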
Query Evaluation
Q = [//stock[code = "GOOG" ∧ sell = "376"]]
Query representation:
q0: code = "GOOG"
q1: sell = "376"
q2: */q0 ∧ */q1
q3: stock[q2]
q4: //q3
Q = [q4]
Query Evaluation Example 1: a stock element, below a market element, with children code GOOG, buy $370, and sell $376, all stored in one fragment.
Query Evaluation Example 2: the same stock element with children code GOOG and buy $370, but its remaining child is a fragment node F, whose value is not known locally.
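Bottom-up evaluation of the sub-queries q0 to q4 can be sketched as follows. Each node gets a vector of values, one per sub-query, computed from its children's vectors. A fragment node contributes variables instead of values. The constant folding in `or_` and `and_` is where partial evaluation happens: whatever can be decided locally collapses to True or False. The rest remains as a residual formula.

The sketch makes several assumptions:
- q0 at a node is read as "this node is a code element with text GOOG", so that `*/q0` in q2 checks the children.
- `//` is read as descendant-or-self.
- Example 2 is read as the sell child being stored in fragment F.
- Variable names such as `F.q1` and all function names are hypothetical.

```python
from dataclasses import dataclass, field
from functools import reduce
from typing import Dict, List, Optional, Tuple, Union

Formula = Union[bool, str, Tuple]  # constant, variable name, or ("and"|"or", f, g)


@dataclass
class Node:
    label: str
    text: str = ""
    children: List["Node"] = field(default_factory=list)
    ref: Optional[str] = None  # set only on a fragment node standing for another fragment


def or_(a: Formula, b: Formula) -> Formula:
    """Disjunction with constant folding."""
    if a is True or b is True:
        return True
    if a is False:
        return b
    if b is False:
        return a
    return ("or", a, b)


def and_(a: Formula, b: Formula) -> Formula:
    """Conjunction with constant folding."""
    if a is False or b is False:
        return False
    if a is True:
        return b
    if b is True:
        return a
    return ("and", a, b)


SUBQUERIES = ("q0", "q1", "q2", "q3", "q4")


def vector(n: Node) -> Dict[str, Formula]:
    """Values of q0..q4 at node n, as formulas over fragment-node variables."""
    if n.ref is not None:  # unknown locally: one variable per sub-query, e.g. "F.q1"
        return {q: f"{n.ref}.{q}" for q in SUBQUERIES}
    kids = [vector(c) for c in n.children]

    def some_child(q: str) -> Formula:  # the "*/q" part: some child satisfies q
        return reduce(or_, (k[q] for k in kids), False)

    v: Dict[str, Formula] = {}
    v["q0"] = n.label == "code" and n.text == "GOOG"    # code = "GOOG"
    v["q1"] = n.label == "sell" and n.text == "376"     # sell = "376"
    v["q2"] = and_(some_child("q0"), some_child("q1"))  # */q0 ∧ */q1
    v["q3"] = and_(n.label == "stock", v["q2"])         # stock[q2]
    v["q4"] = or_(v["q3"], some_child("q4"))            # //q3
    return v


def stock(*kids: Node) -> Node:
    return Node("stock", children=list(kids))


example1 = Node("market", children=[stock(Node("code", "GOOG"), Node("buy", "370"), Node("sell", "376"))])
example2 = Node("market", children=[stock(Node("code", "GOOG"), Node("buy", "370"), Node("virtual", ref="F"))])
print(vector(example1)["q4"])  # True
print(vector(example2)["q4"])  # ('or', 'F.q1', 'F.q4')
```

In Example 2 the peer storing F computes F's own vector. For a sell element with text 376 that yields `F.q1 = True`, so substituting into the residual formula gives True.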
The ParBoX Algorithm: three stages
Stage 1: The querying peer P_Q sends query Q to all peers holding a fragment (the fragment tree identifies these peers).
Stage 2: Each peer P_j evaluates Q, in parallel with the other peers, over every fragment F_i it stores.
Stage 3: P_Q collects the partial answers and computes the answer to Q.
Fragment placement in the example: F0 (P0), F1 (P1), F2 (P2), F3 (P2).
Key considerations:
– Computation costs (total and parallel).
– Communication costs.
– Level of fragmentation.
ParBoX comes in several flavors: HybridParBoX, FullDistParBoX, and LazyParBoX.
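The first two stages can be sketched as follows: one request per peer, carrying all of that peer's fragments, with peers working in parallel. The fragment placement comes from the slide. Threads stand in for remote peers. `group_by_peer`, `evaluate_at_peer`, and `run_round` are hypothetical names, and `evaluate_at_peer` is a stub that returns labels instead of real Boolean formulas.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

# Fragment placement from the slide: F0 at P0, F1 at P1, F2 and F3 at P2.
PEER_OF: Dict[str, str] = {"F0": "P0", "F1": "P1", "F2": "P2", "F3": "P2"}


def group_by_peer(peer_of: Dict[str, str]) -> Dict[str, List[str]]:
    """Stage 1: decide which peers receive Q; a peer's fragments share one request."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for fid, peer in peer_of.items():
        groups[peer].append(fid)
    return dict(groups)


def evaluate_at_peer(peer: str, fragments: List[str], query: str) -> Dict[str, str]:
    """Stage 2 (hypothetical stub): one visit covers all of the peer's fragments.
    A real peer would return Boolean formulas; the stub returns labels."""
    return {fid: f"partial({query}, {fid}@{peer})" for fid in fragments}


def run_round(query: str, peer_of: Dict[str, str]) -> Tuple[Dict[str, str], int]:
    groups = group_by_peer(peer_of)
    collected: Dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=len(groups)) as pool:  # peers work in parallel
        per_peer = pool.map(lambda item: evaluate_at_peer(item[0], item[1], query), groups.items())
        for answers in per_peer:
            collected.update(answers)
    # Stage 3 would solve the Boolean equation system over `collected` at P_Q.
    return collected, len(groups)


answers, requests = run_round("Q", PEER_OF)
print(requests, sorted(answers))  # 3 ['F0', 'F1', 'F2', 'F3']
```

Four fragments cost three requests, because P2 answers for F2 and F3 in a single visit.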
Analysis of Algorithms
Algorithm         Visits/peer  Total computation         Parallel computation               Communication
NaiveCentralized  1            O(|Q| |T|)                –                                  O(|T|)
NaiveDistributed  card(S_i)    O(|Q| |T|)                –                                  O(|Q| card(T))
ParBoX            1            O(|Q| (|T| + card(T)))    O(|Q| (max_j |F_Pj| + card(T)))    O(|Q| card(T))
HybridParBoX      1            O(|Q| |T|)                O(|Q| (max_j |F_Pj| + card(T)))    O(|T|)
FullDistParBoX    card(S_i)    O(|Q| (|T| + card(T)))    O(|Q| (max_j |F_Pj| + card(T)))    O(|Q| card(T))
LazyParBoX        card(S_i)    O(|Q| (|T| + card(T)))    O(|Q| card(T) max_i |F_i|)         O(|Q| card(T))
card(S_i) = number of fragments stored at peer P_i
card(T) = number of fragments of tree T; note that card(T) ≪ |T|
|F_Pj| = total size of the fragments stored at peer P_j
|F_i| = size of fragment F_i
– Communication costs are LOW and independent of T (the data).
– Computation costs are comparable to the best-known centralized algorithm.
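To see how the ParBoX row's terms interact, the expressions can be evaluated for concrete parameters. This is a sketch only: O-notation drops constant factors, so the numbers show relative scaling, not running times. |Q| = 8 matches the experiments. The fragment sizes are hypothetical abstract units, and `parbox_costs` is a hypothetical name.

```python
from typing import Dict


def parbox_costs(q_size: int, fragment_sizes: Dict[str, int], peer_of: Dict[str, str]) -> Dict[str, int]:
    """Evaluate the ParBoX cost expressions from the table with constant factors dropped."""
    card_t = len(fragment_sizes)              # card(T): number of fragments
    tree_size = sum(fragment_sizes.values())  # |T|
    per_peer: Dict[str, int] = {}             # |F_Pj|: total fragment size at peer Pj
    for fid, size in fragment_sizes.items():
        per_peer[peer_of[fid]] = per_peer.get(peer_of[fid], 0) + size
    return {
        "total computation": q_size * (tree_size + card_t),
        "parallel computation": q_size * (max(per_peer.values()) + card_t),
        "communication": q_size * card_t,
    }


sizes = {"F0": 10, "F1": 15, "F2": 15, "F3": 10}  # hypothetical sizes, abstract units
placement = {"F0": "P0", "F1": "P1", "F2": "P2", "F3": "P2"}
print(parbox_costs(8, sizes, placement))
# {'total computation': 432, 'parallel computation': 232, 'communication': 32}
```

P2 dominates the parallel term because it stores two fragments (25 units). The communication term stays at 32 however large the fragments grow.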
The Experimental Study
The setting:
– Ten Linux machines (peers) on a LAN.
– XMark documents are fragmented and distributed over the network; their sizes vary between 5MB and 150MB.
The parameters:
– Number of machines participating in each experiment
– Size of query Q
– Size of tree T
– Shape of the fragment tree:
  – number of fragments in the tree
  – nesting level (deep vs. shallow fragment trees)
  – number of fragments per machine
NaiveCentralized vs. ParBoX
Setting: |T| = 50MB, |Q| = 8, one fragment per peer.
With |T| fixed, adding machines shrinks the fragment allocated to each machine, but each additional machine shrinks it by less than the previous one did. With an even split, going from 2 to 3 machines cuts each fragment from 25MB to about 16.7MB, while going from 3 to 4 only cuts it to 12.5MB.
Parallelism works! Shipping data costs!
Varying Query and Data Size
Setting: 8 peers, one fragment per peer.
[Figure: fragment tree with fragments F0 through F7.]
Summary
We showed, in practice, that partial evaluation is effective for processing queries over fragmented XML document trees.
We presented the ParBoX family of algorithms for evaluating Boolean XPath queries. Our algorithms guarantee:
– optimal computation costs;
– each peer is visited only once;
– communication depends only on the query size (and not on the tree).
The question on everybody's mind: can we extend this idea to non-Boolean XPath queries? The answer is YES, but you will have to wait a bit to read about it!