
Efficient Scheduling of Heterogeneous Continuous Queries
Mohamed A. Sharaf, Panos K. Chrysanthis, Alexandros Labrinidis, Kirk Pruhs
Advanced Data Management Technologies Lab, Department of Computer Science, University of Pittsburgh
VLDB 2006

University of Pittsburgh. Sharaf, Chrysanthis, Labrinidis, Pruhs.
Slide 2: Motivating Example
Tell me when there are airplane tickets such that:
- Itinerary: Pittsburgh -> Korea -> Pittsburgh
- Dates: September 8 -> September 16
- Price < $1200
This is a form of Continuous Query (CQ):
- CQs are registered ahead of time
- Arrival of new data triggers execution
CQs support monitoring applications.

Slide 3: Data Stream Management System (DSMS)
- DSMS = Database system + Online system
- Our goal: improve the online performance of a DSMS
[Figure: DSMS architecture. Input data streams feed continuous queries Q1 ... Qn, each a chain of operators, producing output data streams D1 ... Dn. Components shown: Query Optimizer, Query Scheduler, Memory Manager, Load Shedder.]

Slide 4: Need for Query Scheduling
The execution order of continuous queries determines the overall behavior of the system, e.g., memory usage [Babcock et al., SIGMOD'03].
Traditionally:
- One operator per thread
- Resource management done by the OS
Problems:
- No objective for optimization
- Does not exploit query semantics

Slide 5: Scheduling Multiple Continuous Queries (MCQ)
Given:
- A set of n queries ready to execute (queries with pending updates)
- A certain metric to optimize
Then:
- The MCQ scheduler decides the execution order of the n queries so as to optimize the given metric
[Figure: continuous queries CQ1, CQ2, ..., CQn, each a chain of operators.]

Slide 6: Outline
- Introduction
- Scheduling for Quality of Service (QoS)
  - Average response time
  - Average slowdown
  - Balancing the trade-off between average and worst case
- Implementation issues
- Conclusions

Slide 7: Response Time
- The response time of a tuple is the interval of time from its arrival at the DSMS until its departure
- Tuples that are filtered out (discarded) during query processing do not contribute to the metric
- Shortest Remaining Processing Time (SRPT) is the policy used to optimize response time in Web servers
- Would SRPT optimize response time for multiple CQs? No, because it does not exploit the characteristics of CQs!

Slide 8: Impact of Selectivity
Selectivity of a query (S) is the probability of producing an output tuple after processing an input tuple (i.e., detecting a related event):
- S = 0.1: 10 input tuples -> 1 output event
- S = 1.0: 10 input tuples -> 10 output events
If two queries have the same cost, then the one with higher selectivity produces more tuples per time unit (higher output rate).

Slide 9: Impact of Output Rate
- Q1: $S_1 = 1.0$ and $C_1 = 1$ ms, so $OR_1 = 1.0$
- Q2: $S_2 = 0.2$ and $C_2 = 1$ ms, so $OR_2 = 0.2$
- 5 pending tuples arrived at time 0
Response time (RT) by execution order:
  Order         RT
  Q2 then Q1    12.2
  Q1 then Q2    7.1
[Figure: execution timelines for the two orders, with time axis marks at 0, 5, and 10.]

Slide 10: Highest Rate Policy
- Assign each query a priority equal to its output rate
- The output rate of a query = selectivity / cost
- How do we compute the output rate of a query with more than one operator?
- At each scheduling point, schedule the query with the highest global output rate: the Highest Rate (HR) policy
[Figure: a query as a chain of three operators.]
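The selection rule on this slide can be illustrated with a minimal Python sketch. The `Query` class and the function names are hypothetical (they do not come from the talk), and each query is assumed to consist of a single operator, so its output rate is simply selectivity divided by cost. The sample values are those of the two queries on slide 9.

```python
from dataclasses import dataclass


@dataclass
class Query:
    # Hypothetical container for illustration only.
    name: str
    selectivity: float  # probability that an input tuple yields an output tuple
    cost_ms: float      # processing cost per input tuple, in milliseconds


def output_rate(q: Query) -> float:
    """Output rate as defined on the slide: selectivity / cost."""
    return q.selectivity / q.cost_ms


def pick_highest_rate(ready: list) -> Query:
    """At a scheduling point, return the ready query with the highest output rate."""
    return max(ready, key=output_rate)


# The two queries from slide 9.
ready = [Query("Q1", 1.0, 1.0), Query("Q2", 0.2, 1.0)]
chosen = pick_highest_rate(ready)
print(chosen.name)
```

A real scheduler would re-evaluate this choice at every scheduling point, since the set of ready queries changes as tuples arrive.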

Slide 11: Simulation Testbed
- Developed a DSMS simulator in C++
- Policies for multi-query scheduling:
  - Round Robin (RR; Aurora)
  - Highest Rate (HR)
  - First Come First Serve (FCFS)
  - Shortest Remaining Processing Time (SRPT)
- Input traces from Internet traffic
- Generated 500 continuous queries:
  - select-join-project
  - Uniform distribution of costs and selectivities
  - Assigned costs and selectivities determine the system's utilization (or load)

Slide 12: Results: Average Response Time
[Figure: plot with y-axis "Avg. Response Time" (in Sec; the unit prefix is lost in the source); annotated values: 65% and 73%.]

Slide 14: Slowdown
- Slowdown (or stretch) [Mehta & DeWitt, VLDB'93]: the ratio of a tuple's response time to its ideal processing time, i.e., its processing time if it were the only tuple in the system
- Slowdown is fairer than response time because it relates response time to demand: tuples for an expensive query are expected to stay longer, as they contribute more to the load
- Ideally, slowdown = 1
- Slowdown increases with increasing load

Slide 15: SRPT vs. HR
In Web servers, SRPT is:
- Optimal for response time, and
- Near-optimal for slowdown, since short jobs spend less time in the system
In DSMSs:
- HR minimizes average response time, but what about average slowdown?
- Is it possible under HR for short queries to experience high slowdown, leading to a high overall slowdown?
- Queries with low selectivity are penalized!

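Slide 16: Example
- Q1: $S_1 = 1.0$ and $C_1 = 5$ ms, so $OR_1 = 0.2$
- Q2: $S_2 = 0.33$ and $C_2 = 2$ ms, so $OR_2 = 0.17$
- 3 pending tuples arrived at time 0
HR policy (Q1 first): Q1 tuples complete at times 5, 10, 15 and Q2 tuples at 17, 19, 21. Slowdowns of the output tuples: SD = 1, 2, 3 (Q1) and SD = 9.5 (Q2).
Another policy (Q2 first): Q2 tuples complete at times 2, 4, 6 and Q1 tuples at 11, 16, 21. Slowdowns of the output tuples: SD = 2 (Q2) and SD = 2.2, 3.2, 4.2 (Q1).
Average response time (RT) and average slowdown (SD):
  Policy   RT     SD
  HR       12.2   3.8
  Other    13     2.9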
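The averages in this example can be checked with a short Python sketch. It rests on two assumptions that are inferred from the slide rather than stated on it: every tuple arrives at time 0, and the single output of Q2 (selectivity 0.33 over 3 tuples) comes from its second tuple, which is what the slowdowns 9.5 (at time 19) and 2 (at time 4) imply. The function names are hypothetical.

```python
def replay(order, cost, emitting):
    """Replay a fixed execution order and collect (response_time, slowdown)
    for every tuple that produces an output.

    order:    query names, one entry per processed tuple, in execution order
    cost:     per-tuple processing cost of each query (ms)
    emitting: set of (query, k) pairs; the k-th tuple of that query emits output
    """
    clock = 0.0
    processed = {}
    results = []
    for name in order:
        clock += cost[name]
        processed[name] = processed.get(name, 0) + 1
        if (name, processed[name]) in emitting:
            # All tuples arrived at time 0, so response time equals the clock.
            results.append((clock, clock / cost[name]))
    return results


def averages(results):
    """Return (average response time, average slowdown)."""
    n = len(results)
    return (sum(rt for rt, _ in results) / n,
            sum(sd for _, sd in results) / n)


cost = {"Q1": 5.0, "Q2": 2.0}
# Assumption: all three Q1 tuples emit output; only the second Q2 tuple does.
emitting = {("Q1", 1), ("Q1", 2), ("Q1", 3), ("Q2", 2)}

hr = averages(replay(["Q1"] * 3 + ["Q2"] * 3, cost, emitting))
other = averages(replay(["Q2"] * 3 + ["Q1"] * 3, cost, emitting))
print("HR:   ", hr)
print("Other:", other)
```

Under these assumptions HR yields an average response time of 12.25 and an average slowdown of 3.875, which the slide reports as 12.2 and 3.8, while the other order yields 13 and 2.9.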

Slide 17: Parameters for Scheduling
For a query $Q_x$ with three operators (costs $c_1, c_2, c_3$ and selectivities $s_1, s_2, s_3$):
- $S_x = s_1 \cdot s_2 \cdot s_3$
- $C_x^{avg} = c_1 + (c_2 \cdot s_1) + (c_3 \cdot s_1 \cdot s_2)$
- $C_x = c_1 + c_2 + c_3$ = cost of detecting an event = ideal processing time
- $W_x$ = the current wait time of the oldest tuple in the input queue of $Q_x$
[Figure: query $Q_x$ as a chain of three operators.]
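As an illustration, the three per-query parameters can be computed for an operator chain of any length with the Python sketch below. The function name and the sample costs and selectivities are hypothetical; the formulas are the ones on this slide, generalized from three operators to n.

```python
def chain_parameters(costs, selectivities):
    """Return (S_x, C_x_avg, C_x) for a linear chain of operators.

    costs[i] and selectivities[i] describe the i-th operator in the chain.
    """
    reach_prob = 1.0   # probability that a tuple reaches the current operator
    avg_cost = 0.0     # expected processing cost per input tuple (C_x^avg)
    for c, s in zip(costs, selectivities):
        avg_cost += c * reach_prob   # operator cost is paid only if reached
        reach_prob *= s              # tuple survives this operator with prob. s
    global_selectivity = reach_prob  # S_x = product of all selectivities
    ideal_cost = sum(costs)          # C_x: cost for a tuple that passes every operator
    return global_selectivity, avg_cost, ideal_cost


# Hypothetical three-operator query.
S_x, C_avg, C_x = chain_parameters([1.0, 2.0, 4.0], [0.5, 0.5, 1.0])
print(S_x, C_avg, C_x)
```

For these sample values the expected cost is 1 + 2(0.5) + 4(0.25) = 3, while a tuple that survives all operators costs 1 + 2 + 4 = 7.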

Slide 18: Scheduling for Slowdown (1)
Compute the slowdown (H) under two policies:
- Policy X: first Q1, then Q2
- Policy Y: first Q2, then Q1
Setup: Q1 has wait time $W_1$, costs $C_1$ and $C_1^{avg}$, selectivity $S_1$, and produces tuple $t_1$; Q2 has $W_2$, $C_2$, $C_2^{avg}$, $S_2$, and produces $t_2$.
Terms labeled in the slide's equation: the probability that $t_1$ is produced, the wait time, the processing time, and the extra wait time for Q1 to finish execution; together they give the slowdown of $t_1$ and the slowdown of $t_2$.
[Equation not preserved in the transcript.]

Slide 19: Scheduling for Slowdown (2)
- Under policy X: first Q1, then Q2
- Under policy Y: first Q2, then Q1
- Condition derived for $H_X < H_Y$
[Equations not preserved in the transcript; the diagram repeats the setup with $W_1$, $C_1$, $C_1^{avg}$, $S_1$ for Q1 and $W_2$, $C_2$, $C_2^{avg}$, $S_2$ for Q2.]
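The equations on slides 18 and 19 did not survive extraction. One reconstruction that is consistent with the surviving labels (probability that a tuple is produced, wait time, processing time, extra wait time for the other query to finish) is sketched below. It is an assumption, not necessarily the authors' exact formulation: each query's expected slowdown is weighted by its selectivity, and the query scheduled second waits an extra $C^{avg}$ of the query scheduled first.

```latex
\begin{align*}
H_X &= S_1 \cdot \frac{W_1 + C_1}{C_1}
     + S_2 \cdot \frac{W_2 + C_1^{avg} + C_2}{C_2} \\
H_Y &= S_2 \cdot \frac{W_2 + C_2}{C_2}
     + S_1 \cdot \frac{W_1 + C_2^{avg} + C_1}{C_1} \\
H_X < H_Y
  &\iff \frac{S_2 \, C_1^{avg}}{C_2} < \frac{S_1 \, C_2^{avg}}{C_1}
   \iff \frac{S_1}{C_1^{avg} \, C_1} > \frac{S_2}{C_2^{avg} \, C_2}
\end{align*}
```

Under this reading, the terms that do not depend on the execution order cancel, and Q1 should run first exactly when its output rate divided by its ideal processing time exceeds that of Q2.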

Slide 20: Scheduling for Slowdown (3)
Priority of $Q_x$ = $\frac{S_x}{C_x^{avg} \cdot C_x}$ = $\frac{OR_x}{C_x}$
- $S_x / C_x^{avg}$ is the output rate ($OR_x$) of $Q_x$
- $C_x$ is the ideal processing time of a tuple produced by $Q_x$
- Our Highest Normalized Rate (HNR) policy emphasizes the tuple's ideal processing time
- Inexpensive queries with low productivity are not penalized
- For equal costs ($C_i = 1$): HNR = HR
- For selectivity 1 ($S_i = 1$): HNR = SRPT
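To see how normalizing by the ideal processing time changes the decision, the Python sketch below compares HR and HNR on the two queries from slide 16. It assumes the HNR priority is the output rate divided by the ideal processing time, as the slide's description suggests, and that each query has a single operator so that the average cost equals the ideal cost. The function names are hypothetical.

```python
def hr_priority(selectivity, avg_cost):
    """Highest Rate: priority is the output rate S_x / C_x^avg."""
    return selectivity / avg_cost


def hnr_priority(selectivity, avg_cost, ideal_cost):
    """Highest Normalized Rate (assumed form): output rate divided by C_x."""
    return selectivity / (avg_cost * ideal_cost)


# (S_x, C_x^avg, C_x) for the single-operator queries of slide 16.
queries = {
    "Q1": (1.0, 5.0, 5.0),
    "Q2": (0.33, 2.0, 2.0),
}

hr_choice = max(queries, key=lambda q: hr_priority(*queries[q][:2]))
hnr_choice = max(queries, key=lambda q: hnr_priority(*queries[q]))
print("HR runs first: ", hr_choice)
print("HNR runs first:", hnr_choice)
```

HR prefers Q1 (rate 0.2 versus 0.165), whereas HNR prefers the cheaper Q2 (0.0825 versus 0.04), which is the order that gave the lower average slowdown in the slide 16 example.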

Slide 21: Results: Average Slowdown
[Figure: plot with y-axis "Avg. Slowdown"; annotated value: 20%.]

Slide 23: Worst-Case Performance
- Queries/events may experience starvation
  - Queries with low selectivity and/or high cost
- Typically measured using maximum response time or maximum slowdown
- Maximum slowdown (or response time):
  - Is a very sensitive metric
  - Does not consider the average-case performance

Slide 24: Trade-off between Average Case and Worst Case
- Maximum slowdown = worst-case performance
- Average slowdown = average-case performance
- We need to look at both metrics at the same time
- The $L_p$ norm of slowdowns captures both metrics
- $L_2$ norm of N tuples = $\sqrt{\sum_{i=1}^{N} SD_i^2}$, where $SD_i$ is the slowdown of tuple i
  - It takes into account all values
  - It penalizes outliers
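The effect of the L2 norm can be illustrated with a small Python sketch that computes the average, the maximum, and the L2 norm for the two sets of slowdowns listed on slide 16. The function name is hypothetical, and the L2 norm used here is the standard one (square root of the sum of squares).

```python
import math


def summarize(slowdowns):
    """Return average, maximum, and L2 norm of a list of slowdowns."""
    return {
        "avg": sum(slowdowns) / len(slowdowns),
        "max": max(slowdowns),
        "l2": math.sqrt(sum(sd * sd for sd in slowdowns)),
    }


hr_stats = summarize([1.0, 2.0, 3.0, 9.5])     # HR order on slide 16
other_stats = summarize([2.0, 2.2, 3.2, 4.2])  # the other order on slide 16
print(hr_stats)
print(other_stats)
```

The single outlier of 9.5 dominates the L2 norm of the first list, so the norm separates the two schedules more sharply than the average does while still reflecting every tuple.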

Slide 25: Scheduling for the $L_2$ Norm of Slowdowns
Balance Slowdown (BSD) policy:
Priority of $Q_x$ = $\frac{S_x}{C_x^{avg} \cdot C_x} \times \frac{W_x}{C_x}$ (normalized rate × current slowdown)
A query is scheduled either because:
- It has a high normalized rate, or
- Its pending tuples have accumulated a high slowdown
All users are satisfied = fairness
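A minimal Python sketch of this priority follows. It assumes, based on the slide's two labels, that the BSD priority is the product of the normalized rate and the current slowdown, with the current slowdown taken as the wait time of the oldest pending tuple divided by the ideal processing time. The function name and the sample queries are hypothetical.

```python
def bsd_priority(selectivity, avg_cost, ideal_cost, wait):
    """Balance Slowdown priority (assumed form): normalized rate * current slowdown."""
    normalized_rate = selectivity / (avg_cost * ideal_cost)
    current_slowdown = wait / ideal_cost
    return normalized_rate * current_slowdown


def pick(queries):
    """Return the name of the query with the highest BSD priority."""
    return max(queries, key=lambda name: bsd_priority(*queries[name]))


# (S_x, C_x^avg, C_x, W_x): a cheap, productive query and an expensive,
# low-selectivity query whose oldest tuple has waited for different times.
early = {"fast": (1.0, 1.0, 1.0, 2.0), "slow": (0.1, 4.0, 4.0, 400.0)}
late = {"fast": (1.0, 1.0, 1.0, 2.0), "slow": (0.1, 4.0, 4.0, 2000.0)}
print(pick(early), pick(late))
```

While the expensive query's wait time is moderate, the cheap query still wins; once the wait grows large enough, the accumulated slowdown outweighs the low normalized rate and the expensive query is scheduled, which is how the policy counters starvation.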

Slide 26: Results: Balancing the Trade-off
[Figure: two plots, "Max. Slowdown" and "Avg. Slowdown"; annotated values: 77% and 31%.]

Slide 27: Results: $L_2$ Norm of Slowdowns
[Figure: plot with y-axis "$L_2$ Norm of Slowdowns"; annotated value: 24%.]

Slide 28: Slowdown per Class (same cost queries)

Slide 29: Outline
- Introduction
- Scheduling for Quality of Service (QoS)
- Implementation issues
  - Scheduling overhead
  - Shared operators (details in paper)
- Conclusions

Slide 30: Optimization Methods
[Figure: plot of the ratio ($L_2$ SD of BSD-Logarithmic) / ($L_2$ SD of BSD-Hypothetical).]
BSD-Hypothetical = BSD without overhead

Slide 31: Conclusions
In this talk, we presented:
- QoS metrics for evaluating the performance of a DSMS
- Scheduling policies that exploit the properties of CQs
- Policies to improve QoS:
  - Highest Rate (HR) for average response time
  - Highest Normalized Rate (HNR) for average slowdown
  - Balance Slowdown (BSD) for balancing the trade-off between average- and worst-case performance
We addressed implementation issues to ensure the applicability of our proposed policies.
We empirically evaluated the gains provided by the proposed policies compared to existing policies.

Slide 32: Thank You
Questions?
http://db.cs.pitt.edu/streams
Thanks: NSF IIS-0534531 (AQSIOS Project)
