Eddies: Continuously Adaptive Query Processing Ross Rosemark.

Eddies: Continuously Adaptive Query Processing Ross Rosemark

Goals of Presentation  What are traditional query optimizers?  What are the problems with these optimizers?  What is an eddy?  How does an eddy work?  Discuss how an eddy self tunes query plans?

Traditional Query Optimizer  Metadata (statistics) about the distribution of data is collected by the query optimizer  Based on the metadata the query optimizer determines the most energy efficient query plan  Query plan is executed.  This is good for snapshot queries (short lived queries)

Problems  The problem with this approach is that data became streaming  Internet  Sensor nodes  Etc.  The query plan chosen by the query optimizer could eventually become inefficient if the metadata statistics change  Cost of operators  Operator selectivies could change  Rate tuples arrive from inputs  Also it’s difficult to just re-optimize a query  Determining when to re-optimize the query is a difficult research issue that has not been addressed.  Must determine not only when you should re-optimize but that re-optimizing still leaves produces the same state

Example  Assume a query that asks for Employees age and salary > 100000 is issued into the sensor network.  Initially the predicate salary > 100000 will be very selective… hence eliminating a lot of tuples  As older employees in the database start to be read the predicate salary > 100000 will be less selective

Lets show some things Eddie must consider.  Eddie should:  Increase synchronization  Increase the moments of Symmetry

Synchronization Barriers  Synchronization barriers  Where one operation hinders the speed of another operations  Extreme example  Assume you have a merge join on two duplicate-free inputs. (slowlo and fasthi)  Assume that during processing the next tuple is always consumed from the relation that had the lowest values  Assume that slowlo is a slowly delivered external relation with many low columns in it’s bandwith  Assume that fasthi is high bandwith (i.e. delivers tuples fast) and has only high values in it’s column  In this example fasthi is delayed why slowlo delivers tuples  Known as a synchronization barrier  Desirable to minimize the number of Synchronization barriers

Moments of Symmetry  You can only re-optimize at a moment of symmetry  A moment of symmetry is when the query is executed to a point that the optimizer can change the query plan without affecting the way the query plans predicates are performed  Typically happens in joins  Example  Assume you have a nested loop join with inner relation R and outer relation S.  In this example you can only re-optimize this join when R is completely scanned.  It is possible to re-optimize in the middle of scanning S but the join algorithm would then have to be changed

Eddy  An Eddy was designed to dynamically re- optimize queries.  The authors implemented Eddies in a River  A river is a shared nothing parallel query processing framework that dynamically adapts to fluctuations and workloads   An eddy is implemented via a module in a river containing an arbitrary number of input relations, a number of participating unary and binary modules, and a single output relation  An Eddy also maintains a fixed size buffer of tuples that need to be processed  Each operator takes two tuples, processes them and delivers them back to the eddy

Eddy (Cont)  A tuple in a eddy is associated with a tuple descriptor  Contains a vector of Ready bits and Done bits  The Eddy ships the tuple to only the operators that have the Ready bits turned on  After an operators is processed it’s Done bits are set  If all done bits are set the tuple is sent to the Eddy’s output  Else it’s sent to another operators

Eddy (Cont)  So how do you route tuples to the different Eddy operators  In this paper they look at multiple different ways

Naïve Scheme  Eddies buffer is implemented as a priority queue  Recall the buffer is used to store tuples that should be processed by the Eddies  When a tuple enters a buffer it’s priority is set to low  After it’s processed by an operator in the Eddy and returned to the buffer it’s priority is set to high  This ensures that tuples do not get stuck in the Eddy. I.e. starvation  This scheme dynamically adjusts to work required of operators  Operators that are slower (i.e. take 4 seconds vs. 1 second will receive less tuples)  Note each operator has a fixed size queue  Once queue is filled up no more tuples can be inserted into queue

Lottery Scheme  Each time a tuple is routed to a operator the operator is credited with a ticket  When the operator returns a tuple the eddy one ticket is debited from the eddies running count for that operator  Tracks how efficiently a operator drains tuples from the system  When a tuple is to be routed to an operator the Eddy holds a lottery  Only the operators that have their Ready bit sets can participate in the lottery   An operator’s chance of “winning the lottery” and receiving the tuple corresponds to the count of tickets for that operator.   Dynamically adjusts to selectivity of operators

Window Scheme  Problem with lottery scheme is that it uses to much past info  Problem: An operator that gained a lot of ticket initially but then became slow  In this scheme the lottery scheme is modified such that the lottery only looks at tickets gained by an operator in a fixed window.  Keeps track of two types of tickets  Banked tickets  Used when running the lottery  Escrow tickets  Used to measure efficiency during the window  At the end of a window  Banked Tickets = Escrow Tickets  Escrow Tickets = 0  Ensure operators re-approve themselves each window

Comparison  Shows that due to “Fluid Dynamics” (i.e. the varying rates of operators) the Naïve approach naturally adjusts based on the cost of operators  Shows that Lottery also adjusts based on workload

Comparison  Naïve eddies does not adjust based on selectivity  Naïve performs between the best and worse  Lottery does adjust based on selectivity

Joins  Shows that Lottery performs nearly optimally  Naïve performs between the best and worst

Joins (Cont)  All joins are ripple joins  Change selectivity of join predicate  In all cases the Eddy with the Lottery is close to optimal

Window Scheme  Both cases are related to different query plans  Shows eddy is in between the best and worst case

Adaptability  Toggle the cost of the operator 3 times throughout experiment  Notice that Eddy switches how many tuples are first processed by that operator

Problems  Does not work well when there is initially a long delay at an operator  Eddy gives all tuples to operator because operator is not returning tuples that satisfy the join predicates criteria  Not until after the delay  Notice eventually this problem is figured out by Eddy  Just make take some time

Eddies: Continuously Adaptive Query Processing Ross Rosemark.

Similar presentations

Presentation on theme: "Eddies: Continuously Adaptive Query Processing Ross Rosemark."— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

Eddies: Continuously Adaptive Query Processing Ross Rosemark.

Similar presentations

Presentation on theme: "Eddies: Continuously Adaptive Query Processing Ross Rosemark."— Presentation transcript:

Similar presentations

About project

Feedback