Published by Kaliyah Faye
Using Partial Evaluation in Distributed Query Evaluation
Peter Buneman, Gao Cong, Wenfei Fan, Anastasios (Tasos) Kementsietsidis
© Anastasios Kementsietsidis, VLDB 2006
Cutting Down Trees…
[Figure: an XML portofolio tree stored across peers P0, P1 and P2. It contains brokers (Bache, Merill Lynch), markets (NYSE, NASDAQ) and stock elements with code, buy and sell children: IBM (buy $80, sell $78), YHOO (buy $33, sell $35), AAPL (buy $71, sell $65), GOOG (buy $374, sell $373) and GOOG (buy $370, sell $372).]
Tell me when GOOG stock sells for 376: [//stock[code = GOOG ∧ sell = 376]]
Let's stream! Not.
Let's do a depth-first traversal. We visit: P0, P1, P2, P1, P0, P2, P0
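The visit sequence above can be reproduced with a small sketch. The fragment tree shape (F0 with children F1 and F3, F1 with child F2) is an assumption; the placement of fragments on peers follows the F0 (P0), F1 (P1), F2 (P2), F3 (P2) labels used later in the talk, and the names `fragment_children`, `peer_of` and `peer_visits` are hypothetical.

```python
from typing import Dict, List

# Assumed fragment tree: F0 is the root fragment.
fragment_children: Dict[str, List[str]] = {
    "F0": ["F1", "F3"],
    "F1": ["F2"],
    "F2": [],
    "F3": [],
}
# Placement of fragments on peers.
peer_of: Dict[str, str] = {"F0": "P0", "F1": "P1", "F2": "P2", "F3": "P2"}


def peer_visits(fragment: str) -> List[str]:
    """List the peers contacted by a depth-first traversal starting at `fragment`."""
    visits = [peer_of[fragment]]
    for child in fragment_children[fragment]:
        visits += peer_visits(child)
        # Control returns to the peer holding the parent fragment.
        visits.append(peer_of[fragment])
    return visits


sequence = peer_visits("F0")
```

Under these assumptions, P0 is contacted three times and P1 and P2 twice each, which is the behaviour the talk sets out to avoid.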
Status report…
We have XML trees, arbitrarily fragmented and distributed.
We want to execute Boolean XPath queries Q = [q] over the fragmented trees:
q := p | p/text() = str | label() = A | ¬q | q ∧ q | q ∨ q
p := ε | A | * | p//p | p/p | p[q]
Lessons learned:
– We want to visit each peer only once, irrespective of the number of (tree) fragments it stores.
– We want to minimize communication costs. Ideally, no fragment data should be sent while evaluating a query.
Our motto: Send processing to data, NOT data to processing.
Partial Evaluation
Consider a function f(s, d) and part of its input, say s. Then partial evaluation is to specialize f(s, d) with respect to s, i.e., to perform the part of f's computation that depends only on s. This generates a residual function g(d) that depends only on d.
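As an illustration of this definition (not taken from the talk), here is a minimal sketch that specializes a power function on its exponent. The exponent plays the role of the known input s and the base plays the role of d; `power` and `specialize_power` are hypothetical names.

```python
from typing import Callable


def power(n: int, x: float) -> float:
    """f(s, d): s is the exponent n (known early), d is the base x (known late)."""
    result = 1.0
    for _ in range(n):
        result *= x
    return result


def specialize_power(n: int) -> Callable[[float], float]:
    """Perform the part of `power` that depends only on n.

    The loop over n is unrolled once, here, into the source text of a
    residual function g(x) that no longer mentions n.
    """
    body = " * ".join(["x"] * n) if n > 0 else "1.0"
    return eval(f"lambda x: {body}")  # e.g. n = 3 gives "lambda x: x * x * x"


cube = specialize_power(3)  # residual function g(d)
```

Calling `cube(2.0)` gives the same value as `power(3, 2.0)`, but the work that depends on the exponent was done once, at specialization time.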
Tree Fragments
[Figure: the portofolio tree split into four fragments, Fragment F0, Fragment F1, Fragment F2 and Fragment F3, shown next to the fragment tree over F0, F1, F2 and F3.]
Partial Evaluation in Distributed Query Evaluation
Main idea: Given a query Q, send Q to every peer holding a fragment.
Example: Q = [//stock[code = GOOG ∧ sell = 376]] is sent to peers P0, P1 and P2.
Compute partial answers (Boolean formulas):
– Q is evaluated bottom-up.
– We use Boolean variables for the evaluation of fragment nodes.
P2 has two fragments but is only visited once.
Answer of Q: computed by solving a linear system of Boolean equations.
Query Evaluation
Q = [//stock[code = GOOG ∧ sell = 376]]
Query representation:
q0: code = GOOG
q1: sell = 376
q2: */q0 ∧ */q1
q3: stock[q2]
q4: //q3
Query Evaluation Example 1: [Figure: a market subtree containing a stock with children code GOOG, buy $370 and sell $376.]
Query Evaluation Example 2: [Figure: a market subtree containing a stock with children code GOOG, buy $370 and a fragment node F.]
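The bottom-up evaluation of the two examples can be sketched as follows. This is an illustrative reading of the slide, not the authors' implementation: formulas are kept in disjunctive normal form, a fragment node contributes one Boolean variable per subquery, and all names (`Node`, `evaluate`, variable names such as `F1.q1`) are hypothetical. Text values are stored without the currency sign.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List, Optional

# A formula in DNF: a set of conjunctions, each a set of variable names.
Formula = FrozenSet[FrozenSet[str]]
TRUE: Formula = frozenset([frozenset()])
FALSE: Formula = frozenset()


def var(name: str) -> Formula:
    return frozenset([frozenset([name])])


def const(value: bool) -> Formula:
    return TRUE if value else FALSE


def or_(a: Formula, b: Formula) -> Formula:
    result = a | b
    return TRUE if frozenset() in result else result


def and_(a: Formula, b: Formula) -> Formula:
    return frozenset(x | y for x in a for y in b)


@dataclass
class Node:
    label: str
    text: Optional[str] = None
    children: List["Node"] = field(default_factory=list)
    fragment: Optional[str] = None  # set on a node that stands for a missing fragment


def evaluate(v: Node) -> Dict[str, Formula]:
    """Return the values of q0, q1 and q4 at v, computed from its children."""
    if v.fragment is not None:
        # Unknown subtree: one Boolean variable per subquery.
        return {q: var(f"{v.fragment}.{q}") for q in ("q0", "q1", "q4")}
    q0 = const(v.label == "code" and v.text == "GOOG")
    q1 = const(v.label == "sell" and v.text == "376")
    child_q0 = child_q1 = below = FALSE
    for child in v.children:
        r = evaluate(child)
        child_q0 = or_(child_q0, r["q0"])
        child_q1 = or_(child_q1, r["q1"])
        below = or_(below, r["q4"])
    q2 = and_(child_q0, child_q1)             # */q0 ∧ */q1
    q3 = and_(const(v.label == "stock"), q2)  # stock[q2]
    return {"q0": q0, "q1": q1, "q4": or_(q3, below)}  # //q3


def market_with_stock(*extra: Node) -> Node:
    stock = Node("stock", children=[Node("code", "GOOG"), Node("buy", "370"), *extra])
    return Node("market", children=[stock])


example1 = market_with_stock(Node("sell", "376"))
example2 = market_with_stock(Node("F1", fragment="F1"))
```

In this sketch, `evaluate(example1)["q4"]` is `TRUE`, while `evaluate(example2)["q4"]` is the residual formula `F1.q1 ∨ F1.q4`: the answer depends only on what the missing fragment reports.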
The ParBoX Algorithm
Three stages:
– Stage 1: The querying peer P_Q sends query Q to all peers having a fragment (use the fragment tree to identify all such peers).
– Stage 2: Evaluate Q, in parallel, over each fragment F_i in peer P_j.
– Stage 3: Collect the partial answers in P_Q and compute the answer to Q.
Fragment placement: F0 (P0), F1 (P1), F2 (P2), F3 (P2).
Key considerations/concerns:
– (Total/parallel) computation costs.
– Communication costs.
– Level of fragmentation.
ParBoX comes in flavors: HybridParBoX, FullDistParBoX, LazyParBoX.
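Stage 3 can be sketched as a bottom-up substitution over the fragment tree. The equations below are hypothetical partial answers, not results from the talk, and the fragment tree shape (F0 with children F1 and F3, F1 with child F2) is an assumption; `equations`, `fragment_children` and `solve` are illustrative names.

```python
from typing import Callable, Dict, List

Env = Dict[str, bool]

# Hypothetical output of Stage 2: one equation per variable of each fragment,
# written over the variables of that fragment's child fragments.
equations: Dict[str, Callable[[Env], bool]] = {
    "F0.q4": lambda e: e["F1.q4"] or e["F3.q4"],
    "F1.q4": lambda e: e["F2.q1"] or e["F2.q4"],
    "F2.q1": lambda e: True,
    "F2.q4": lambda e: False,
    "F3.q4": lambda e: False,
}

# Assumed fragment tree.
fragment_children: Dict[str, List[str]] = {
    "F0": ["F1", "F3"],
    "F1": ["F2"],
    "F2": [],
    "F3": [],
}


def solve(root: str) -> bool:
    """Resolve all variables bottom-up and return the answer at the root fragment."""
    env: Env = {}

    def visit(fragment: str) -> None:
        for child in fragment_children[fragment]:
            visit(child)
        # Every variable this fragment depends on is now known.
        for name, equation in equations.items():
            if name.startswith(fragment + "."):
                env[name] = equation(env)

    visit(root)
    return env[root + ".q4"]


answer = solve("F0")
```

Because the fragment tree is acyclic, each equation is evaluated exactly once, and only formulas, never fragment data, need to reach the querying peer.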
Analysis of Algorithms

Algorithm         Visits/Peer  Computation                                 Communication
NaiveCentralized  1            O(|Q| |T|)                                  O(|T|)
NaiveDistributed  card(S_i)    O(|Q| |T|)                                  O(|Q| card(T))
ParBoX            1            Tot: O(|Q| (|T| + card(T)))                 O(|Q| card(T))
                               Par: O(|Q| (max_Pj |F_Pj| + card(T)))
HybridParBoX      1            Tot: O(|Q| |T|)                             O(|T|)
                               Par: O(|Q| (max_Pj |F_Pj| + card(T)))
FullDistParBoX    card(S_i)    Tot: O(|Q| (|T| + card(T)))                 O(|Q| card(T))
                               Par: O(|Q| (max_Pj |F_Pj| + card(T)))
LazyParBoX        card(S_i)    Tot: O(|Q| (|T| + card(T)))                 O(|Q| card(T))
                               Par: O(|Q| card(T) max_T |F_i|)

Tot = total computation cost; Par = parallel computation cost.
card(S_i) = number of fragments in peer P_i.
card(T) = number of fragments of tree T. Note that card(T) ≤ |T|.
|F_Pj| = sum of the sizes of the fragments in peer P_j.

Communication costs are LOW and independent of T (the data).
Computation costs are comparable to the best-known centralized algorithm.
The Experimental Study
The setting:
– Ten (10) Linux machines (peers) distributed over a local LAN.
– XMark sites are fragmented and distributed over the network. Their sizes vary between 5MB and 150MB.
The parameters:
– Number of machines participating in each experiment
– Size of query Q
– Size of tree T
– The shape of the fragment tree:
  – Number of fragments in the tree
  – Nesting level (deep vs. shallow fragment trees)
  – Number of fragments per machine
NaiveCentralized vs. ParBoX
Setting: |T| = 50MB, |Q| = 8, # fragments/peer = 1.
With |T| fixed, as we increase the number of machines, the difference (between iterations) in the size of the fragment that is allocated to each machine decreases.
Parallelism works! Shipping data costs!
Varying Query and Data Size
Setting: # peers = 8, # fragments/peer = 1.
[Figure: fragment tree over the eight fragments F0 to F7.]
Summary
We (practically) proved that partial evaluation is effective in query processing over fragmented XML document trees.
We presented the family of ParBoX algorithms to evaluate Boolean XPath queries.
Our algorithms guarantee that:
– Computation costs are optimal.
– Each peer is visited only once.
– Communication depends only on the query size (and not on the tree).
The question on everybody's mind… Can we extend this idea to non-Boolean XPath queries?
The answer is YES… but you have to wait a bit to read about it!