
1 Runtime Optimization of Continuous Queries
Balakumar K. Kendai and Sharma Chakravarthy
Information Technology Laboratory, Department of Computer Science and Engineering
The University of Texas at Arlington

2 Introduction
- Data Stream Management System (DSMS)
  - Processes queries over continuous streams of data
  - Supports monitoring applications
  - Needs to provide real-time or near real-time response
- Continuous queries (CQ)
  - Evaluated continuously, producing an output stream
  - Quality of Service (QoS) is important
- Examples of streaming applications
  - GPS, network/traffic monitoring, stock feeds
  - Sensors, RFID

3 Current MavStream Architecture
[Architecture diagram: user input enters the Input Processor; data streams enter through the Feeder into the ready queues of operators (selects, projects, joins); the Master Scheduler and its schedulers execute operators to produce the output streams; the Run-Time Optimizer (System Monitor and Decision Maker) receives monitored QoS from the output streams. Control flow and data flow are drawn separately within the MavStream Server.]

4 Modules
- Input processor
  - Processes queries to create a query plan object
- Instantiator
  - Creates actual instances of operators
- CQ execution
  - Each operator is modeled as a separate thread
  - Select, Project
  - Window-based: Join, Aggregates
- Scheduling strategies [Kendai:BNCOD'08]
  - Round robin (RR) / weighted round robin: each operator is allocated an equal amount of time (or time based on its priority)
  - Path capacity scheduling (PCS): lowest tuple latency
  - Segment scheduling: lowest memory utilization
  - Simplified segment scheduling (SS): intermediate performance
  - Each scheduling strategy executes as a separate thread
[Figure: example query tree]

5 Motivation
- Why runtime optimization?
  - Multiple QoS requirements
    - Tuple latency: delay in results; the difference between the entry and exit time of a tuple
    - Memory utilization: size of the tuples in the input queues of operators
    - Throughput: tuples produced per second

6 Motivation
- Why runtime optimization? (continued)
  - Different scheduling strategies optimize different QoS measures
    - Chain: low memory usage but high latency
    - Path capacity strategy: low latency but high memory usage
  - Long-running queries and bursty input
  - Which scheduling strategy to use for a query?
    - Based on the QoS measures of interest
    - Adapt the scheduling strategy of a query to meet its QoS requirements

7 Alternatives For Optimization
- Need to choose the best scheduling strategy
- Input-based
  - Reacts when the input rate changes drastically
  - Monitors selectivity
  - QoS requirements are not considered
- QoS-based
  - Driven by the difference between expected and monitored values
  - Provides finer control

8 Runtime Optimizer Architecture
- Feedback control system
- Optimizes queries individually
- System Monitor
  - Monitors QoS parameters
  - Provides monitored values to the Decision Maker
- Decision Maker
  - Selects a scheduling strategy
  - Activates and deactivates load shedders
[Diagram: the System Monitor feeds monitored QoS from the output streams back to the Decision Maker, which directs the Master Scheduler and its schedulers over the operators' ready queues.]

9 Runtime Optimizer Architecture
- Inputs
  - Prioritized QoS measures of interest
  - Expected values for QoS measures
  - Monitored values for QoS measures (provided by the System Monitor)
- Logic / decision making
  - Decision-table-driven approach to choosing a scheduling strategy
- Outputs / actions
  - Choose a strategy to improve QoS
  - Update the monitoring interval

10 Piecewise Approximation of QoS
- Fixed value
  - Inflexible: expected values may not be the same for all time periods
- Multiple values
  - Modeled as piecewise linear functions, ordered by time
    - X values: time intervals
    - Y values: QoS (tuple latency, memory utilization, or throughput)
[Figure: latency-over-time plots contrasting a single fixed expectation with a piecewise specification]

11 QoS From Piecewise Functions
- Flexibility to specify a single interval or multiple intervals
- QoS specification
  - A list of (x, y) values; a pair of (x, y) values forms an interval
- Expected values
  - Calculated using the slope and boundary values within an interval
  - Between intervals, interpolated using the values on either side
  - For a time point outside all intervals, the endpoint values are extrapolated
- Provides a value for comparison throughout the lifetime of the query
[Figure: piecewise latency/memory/throughput specification over time]
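A minimal sketch of how such a piecewise specification could be evaluated, written in Java since the system runs on a JDK; the class and method names are illustrative assumptions, not MavStream's actual API. Within a segment the value follows the slope; outside all intervals the nearest endpoint value is used.

```java
/** A piecewise-linear QoS specification: (time, value) points sorted by time. */
class PiecewiseQos {
    private final double[] times;
    private final double[] values;

    PiecewiseQos(double[] times, double[] values) {
        this.times = times;
        this.values = values;
    }

    /** Expected QoS at time t: interpolated within/between intervals, endpoint value outside. */
    double expectedAt(double t) {
        int n = times.length;
        if (t <= times[0]) return values[0];          // before the first interval
        if (t >= times[n - 1]) return values[n - 1];  // after the last interval
        for (int i = 0; i < n - 1; i++) {
            if (t <= times[i + 1]) {                  // t lies on segment i..i+1
                double slope = (values[i + 1] - values[i]) / (times[i + 1] - times[i]);
                return values[i] + slope * (t - times[i]);
            }
        }
        return values[n - 1];                         // not reached for sorted input
    }
}
```

For example, the latency specification (1, 0.5)-(10, 0.5) used later in the experiments would return 0.5 for expectedAt(5), and also for any time before 1 or after 10.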

12 Priority Classes And Usage
- QoS measures: tuple latency, memory utilization, throughput
- The algorithm is independent of the number of priority classes
- Initial weights can be assigned

  Priority Class | Actions
  Must Satisfy   | 1. Select best scheduling strategy  2. Load shedding  3. Initial weight: 1
  Best Effort    | 1. Select best scheduling strategy  2. Initial weight: 0.5
  Don't Care     | 1. Select scheduling strategy  2. Initial weight: 0.01

13 Need For Decision Table
- The system supports multiple scheduling strategies, each with different characteristics
- Example
  - Tuple latency in the Must Satisfy class: PCS is the best strategy, Segment the worst
  - Memory utilization in the Must Satisfy class: Segment is the best, PCS the worst
  - Both tuple latency and memory utilization in the Must Satisfy class: Simplified Segment (SS) is better for both
  - The number of possible combinations grows with more strategies and QoS measures
- Hardwired logic
  - Inflexible; decision logic mixed with code
- Decision table
  - Easy to extend or modify
  - What-if scenarios can be easily explored

14 Ranking Of Scheduling Strategies
- Each strategy is ranked relative to the others for each QoS measure
- Example with four strategies (4 = best, 1 = worst):

                      RR   PCS   Segment   SS
  Tuple Latency        2     4         1    3
  Memory Utilization   2     1         4    3
  Throughput           2     4         1    3

- Adding a strategy only revises the decision table; the runtime optimizer logic does not change. With Chain added (5 = best, 1 = worst):

                      RR   PCS   Segment   SS   Chain
  Tuple Latency        3     5         2    4       1
  Memory Utilization   2     1         4    3       5
  Throughput           3     5         2    4       1

  (RR = Round Robin, PCS = Path Capacity, SS = Simplified Segment)

15 Example: Decision Making
- Decision table (4 = best, 1 = worst):

                      RR   PCS   Segment   SS
  Tuple Latency        2     4         1    3
  Memory Utilization   2     1         4    3
  Throughput           2     4         1    3

- Input: tuple latency is Must Satisfy (initial weight 1)
- Tuple latency violated; each strategy scores weight * (rank / highest rank):
  - RR: 1*(2/4) = 0.5
  - PCS: 1*(4/4) = 1
  - Segment: 1*(1/4) = 0.25
  - SS: 1*(3/4) = 0.75
  - PCS has the highest score and is chosen
- Input: tuple latency and memory utilization are both Must Satisfy (initial weight 1 each)
- Both violated; per-measure scores are summed for each strategy:
  - RR: 1*(2/4) + 1*(2/4) = 1
  - PCS: 1*(4/4) + 1*(1/4) = 1.25
  - Segment: 1*(1/4) + 1*(4/4) = 1.25
  - SS: 1*(3/4) + 1*(3/4) = 1.5
  - SS has the highest score and is chosen
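A compact Java sketch of this table-driven selection, under the assumption that the table is stored as a per-measure map of strategy ranks (class and method names are illustrative, not MavStream's API): each violated measure contributes weight * rank / maxRank to a strategy's score, and the highest-scoring strategy is selected.

```java
import java.util.Map;

class DecisionTable {
    // ranks.get(measure).get(strategy): higher rank = better strategy for that measure
    private final Map<String, Map<String, Integer>> ranks;
    private final int maxRank; // equals the number of strategies

    DecisionTable(Map<String, Map<String, Integer>> ranks, int maxRank) {
        this.ranks = ranks;
        this.maxRank = maxRank;
    }

    /** Pick the strategy with the highest weighted score over the violated measures. */
    String chooseStrategy(Map<String, Double> violatedMeasureWeights) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        // The strategies are the column names of any row of the table.
        for (String strategy : ranks.values().iterator().next().keySet()) {
            double score = 0.0;
            for (Map.Entry<String, Double> m : violatedMeasureWeights.entrySet()) {
                int rank = ranks.get(m.getKey()).get(strategy);
                score += m.getValue() * rank / (double) maxRank; // weight * normalized rank
            }
            if (score > bestScore) { bestScore = score; best = strategy; }
        }
        return best;
    }
}
```

With tuple latency and memory utilization both violated at weight 1, SS scores 1.5 and is returned, matching the second case on the slide.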

16 QoS Measures Belong To Different Priority Classes
- Multilevel approach for selecting strategies
- QoS measures are considered level by level
  - Best Effort measures are considered only when the Must Satisfy measures are met
  - Don't Care measures are considered only when the Must Satisfy and Best Effort measures are met

17 Reduction of Weights
- Must Satisfy measures would otherwise dominate when Best Effort measures are considered
- Keep track of the margin by which a measure is satisfied, and use that margin to reduce its weight
  - The higher the margin by which a measure is satisfied, the more its weight can be lowered without affecting the measure
- Margin = (Expected - Observed) / Expected
- The lowest weight allowed stays above the initial weight of the next lower priority class

18 Example: Decision Making
- Decision table (4 = best, 1 = worst):

                      RR   PCS   Segment   SS
  Tuple Latency        2     4         1    3
  Memory Utilization   2     1         4    3
  Throughput           2     4         1    3

- Input: tuple latency is Must Satisfy (weight 1), memory utilization is Best Effort (weight 0.5), throughput is Don't Care (weight 0.01)
- Tuple latency is satisfied; memory utilization is violated
- Reduce the weight of the satisfied measure:
  - Expected latency 2, observed 1, so reduction percentage = (2 - 1)/2 = 0.5
  - Reduced weight = initial weight - (reduction percentage * weight range) = 1 - 0.5*(1 - 0.5) = 0.75
- Scores using the reduced tuple latency weight:
  - RR: 0.75*(2/4) + 0.5*(2/4) = 0.625
  - PCS: 0.75*(4/4) + 0.5*(1/4) = 0.875
  - Segment: 0.75*(1/4) + 0.5*(4/4) = 0.6875
  - SS: 0.75*(3/4) + 0.5*(3/4) = 0.9375
  - SS has the highest score and is chosen
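The weight-reduction rule in this example, as a small Java sketch (names are illustrative): the weight range is the gap between this class's initial weight and the next lower class's, so even a fully satisfied measure's weight does not fall below the class boundary.

```java
class WeightReducer {
    /**
     * Reduce a satisfied measure's weight in proportion to its satisfaction margin.
     * Example from the slide: reducedWeight(1.0, 0.5, 2.0, 1.0) = 1 - 0.5*(1 - 0.5) = 0.75.
     */
    static double reducedWeight(double initialWeight, double nextLowerClassWeight,
                                double expected, double observed) {
        double margin = (expected - observed) / expected;       // how comfortably it is met
        double weightRange = initialWeight - nextLowerClassWeight;
        double reduced = initialWeight - margin * weightRange;
        return Math.max(reduced, nextLowerClassWeight);         // stay above the lower class
    }
}
```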

19 Design Issues
- Delays due to strategy switching
  - Synchronization
  - Moving schedulable objects between ready queues
- Overhead
  - Number of switches
  - Monitoring frequency

20 Look-Ahead Time
- A query has to execute under a new strategy for some period of time before changes appear in the output
- By that time, the expected QoS may no longer be the same
- Look ahead for the expected QoS values
  - Compare monitored QoS values with the expected QoS values at time + ∆t
  - ∆t > time to switch + time to schedule all operators once with the new strategy
- Can reduce unwanted switches
[Figure: looking ahead along a piecewise segment from (x1, y1) to (x2, y2)]
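A small sketch of the look-ahead comparison, reusing the PiecewiseQos sketch from slide 11; the violation test assumes a lower-is-better measure such as latency or memory, and the names are illustrative.

```java
class LookAhead {
    /** True if the monitored value would violate the expected QoS at time now + deltaT. */
    static boolean violatesAhead(PiecewiseQos spec, double now, double deltaT,
                                 double monitoredValue) {
        // deltaT should exceed the switch time plus one scheduling pass of all operators.
        double expected = spec.expectedAt(now + deltaT);
        return monitoredValue > expected; // lower-is-better measures (latency, memory)
    }
}
```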

21 Overhead
- Decision-making overhead
  - Time taken to evaluate all measures and choose a strategy
  - With a fixed number of strategies and measures, this can be considered constant
- Actions to change the scheduling strategy
  - Overhead proportional to the number of switches
  - Worst case: one switch per monitoring cycle
- Overhead can be reduced by reducing the number of switches

22 Monitoring Frequency
- Monitoring frequency determines the maximum number of switches
  - If it is too low, the runtime optimizer may not react in time
- Determining the monitoring interval
  - Monitor at least at the begin, center, and end points of each interval
  - Minimum time between monitoring cycles
    - All operators in a query must be scheduled for some period under a strategy for its effects to become visible
    - t1 = m * (time required to schedule all operators), m >= 1
  - Maximum time between monitoring cycles
    - t2 = n * (time required to schedule all operators)
    - A query should be monitored at least every t2 seconds
  - m and n are configurable, with m << n; defaults are m = 2, n = 10
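As a sketch, the bounds above amount to clamping the wait before the next monitoring cycle (the method and parameter names are assumptions; a real implementation would also align monitoring with the begin/center/end points of each interval):

```java
class MonitoringInterval {
    /** Clamp the wait until the next monitoring cycle between t1 = m*s and t2 = n*s. */
    static double nextWait(double desiredWait, double scheduleAllOperatorsTime,
                           int m, int n) {
        double t1 = m * scheduleAllOperatorsTime; // minimum: let the strategy take effect
        double t2 = n * scheduleAllOperatorsTime; // maximum: monitor at least this often
        return Math.min(Math.max(desiredWait, t1), t2);
    }
}
```

With the defaults m = 2 and n = 10, the optimizer waits at least two and at most ten full scheduling passes between cycles.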

23 Runtime Optimizer Flowchart
[Flowchart, summarized as steps:]
1. Compare the monitored QoS against the expected QoS
2. If all measures are satisfied, check the lower-priority measures
3. If any measure is violated, get a strategy using the violated and satisfied measures
4. If that strategy differs from the current one, switch to the new strategy
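One way the cycle could look in code, stitched together from the earlier sketches (PiecewiseQos and DecisionTable); the flow and names are an illustration of the flowchart, not MavStream's implementation.

```java
import java.util.HashMap;
import java.util.Map;

class OptimizerCycle {
    /**
     * One monitoring cycle for a single query. Returns the strategy to use next;
     * if it differs from currentStrategy, the caller asks the Master Scheduler to switch.
     */
    static String runCycle(Map<String, PiecewiseQos> specs,  // expected QoS per measure
                           Map<String, Double> monitored,     // observed QoS per measure
                           Map<String, Double> weights,       // current weight per measure
                           DecisionTable table,
                           String currentStrategy,
                           double now, double deltaT) {
        Map<String, Double> violated = new HashMap<>();
        for (Map.Entry<String, Double> m : monitored.entrySet()) {
            double expected = specs.get(m.getKey()).expectedAt(now + deltaT); // look ahead
            if (m.getValue() > expected) {                   // lower-is-better measures
                violated.put(m.getKey(), weights.get(m.getKey()));
            }
        }
        if (violated.isEmpty()) {
            return currentStrategy; // all satisfied; lower-priority measures handled elsewhere
        }
        return table.chooseStrategy(violated);
    }
}
```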

24 Implementation Details
- Input Processor
  - Processes the list of expected QoS values and creates the QoS Parameters data structure
- Decision Maker
  - Holds the decision table and the algorithm to choose a strategy
  - A hash table holds the QoS data (QoS Parameters) for each query
  - Methods provide the expected QoS values for a given time and the next time to monitor a query
- System Monitor
  - Maintains a hash table containing the information required to monitor a query
  - Wakes up and monitors the output in a cycle
  - Determines how long the monitor should wait between cycles
  - Invokes the Decision Maker

25 Implementation Issues
- An individual monitoring thread per query
  - High overhead
- The runtime optimizer changing scheduling strategies directly
  - Synchronization issues: it can potentially block, delaying the monitoring of other queries
- Switching the strategy for a query is therefore delegated to the Master Scheduler

26 Experimental Setup
- AMD Opteron 2 GHz, dual-core, quad-processor
- Red Hat Enterprise Linux AS 4
- JDK 1.5, 16 GB RAM, 8 GB max heap size
- Monitoring interval fixed for comparison purposes
- Query
  - Two joins, 8 operators
  - Three input streams, 2 million tuples each
[Figure: query tree used in the experiments]

27 Multiple QoS Measures – Different Priority
- Mean rates for the Poisson distribution: 800, 550, 900 tuples/sec
  - Means doubled at different points
- Tuple-based window: 500 tuples/window
- Choice of strategy left to the runtime optimizer
- Tuple latency: Must Satisfy, (1, 0.5)-(10, 0.5), in seconds
- Memory utilization: Don't Care, (1, 10K)-(500, 10K), (550, 1K)-(3000, 1K), in kilobytes
- Initial strategy provided: Segment

28-30 Multiple QoS Measures – Different Priority
[Results graphs for this experiment spanned slides 28-30.]

31 Multiple QoS Measures – Same Priority
- Mean rates for the Poisson distribution: 2000, 1800, 2200 tuples/sec
- Tuple latency: Must Satisfy, (1, 0.5)-(10, 0.5), in seconds
- Memory utilization: Must Satisfy, (1, 10K)-(500, 10K), (550, 1K)-(3000, 1K), in kilobytes
- Initial strategy provided: Segment

32-33 Multiple QoS Measures – Same Priority
[Results graphs for this experiment spanned slides 32-33.]

34 Related Work
- Aurora: A New Model and Architecture for Data Stream Management. D. Abadi et al. VLDB Journal 12(2): 120-139, August 2003
  - Two-level scheduling
  - QoS-driven load shedding
  - Follow-on systems: Borealis, StreamBase
- STREAM: The Stanford Stream Data Manager. The STREAM Group. IEEE Data Engineering Bulletin, March 2003
  - CQL
  - Chain scheduling
  - Load shedding for aggregation queries

35 Related Work
- TelegraphCQ: Continuous Dataflow Processing for an Uncertain World. Sirish Chandrasekaran et al. CIDR 2003
  - Ingress and caching
  - Query processing
  - Adaptive routing
- NiagaraCQ: A Scalable Continuous Query System for Internet Databases. Jianjun Chen et al. SIGMOD 2000, pp. 379-390
- Cougar: Towards Sensor Database Systems. Philippe Bonnet et al. International Conference on Mobile Data Management, January 2001

36 Related Work
- A Framework for Supporting Quality of Service Requirements in a Data Stream Management System. Qingchun Jiang. PhD thesis, UTA, 2005
  - Continuous query modeling
    - System capacity planning
    - Choice of QoS delivery mechanisms
    - QoS verification
  - Scheduling strategies
    - Path Capacity Strategy (minimizes tuple latency)
    - Segment strategies (Greedy, MOS, Simplified)
    - Threshold strategy: a hybrid of Path Capacity and MOS
  - Load shedding
    - System load estimation
    - Optimal location of shedders
    - Distribution of the shed load among shedders

37 To Conclude
- Presented the issues involved in the design, implementation, and evaluation of a runtime optimizer
- Introduced a decision table that stores information about the performance of the various scheduling strategies
- The runtime optimizer uses this decision table to select the appropriate strategy
- Extensive experimental validation indicates the correctness of the runtime optimizer under disparate input characteristics

38 Thank you…

