Control-Based Load Shedding in Data Stream Management Systems Yicheng Tu and Sunil Prabhakar Department of Computer Sciences, Purdue University April 3, 2006


1 Control-Based Load Shedding in Data Stream Management Systems Yicheng Tu and Sunil Prabhakar Department of Computer Sciences, Purdue University April 3, 2006

2 Data stream management systems. Applications: financial analysis, mobile services, sensor networks, network monitoring, and more. Continuous data, discarded after being processed. Continuous queries. Data-active, query-passive model.

3 DSMS architecture. Network of query operators (O1 – O3). Each operator has its own queue (q1 – q4). The scheduler decides which operator to execute. Query results (Q1, Q2) are pushed to clients. Example systems: Aurora/Borealis, STREAM.

4 Qualities in DSMS data processing. Data processing in a DSMS is quality-critical: tuple delay, data loss, sampling rate, window size, … Overloading during spikes → degraded quality (delay). Solution: adjust data loss (i.e., load shedding) on the DSMS side, eliminating excess load by dropping data items. Tuple delay is the major concern: results generated from old data are useless! The real problem is: how to maintain processing delays while minimizing data loss?
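As a rough illustration of "eliminating excess load by dropping data items", the sketch below drops each incoming tuple independently with a given probability. The slides do not say which drop policy is used, so uniform random dropping is an assumption here, and the names `shed` and `drop_fraction` are hypothetical.

```python
import random

def shed(tuples, drop_fraction, rng=None):
    """Return the tuples that survive random dropping.

    Assumption: each tuple is dropped independently with probability
    drop_fraction. This is one simple policy, not necessarily the one
    used in Borealis or in the talk.
    """
    if not 0.0 <= drop_fraction <= 1.0:
        raise ValueError("drop_fraction must be in [0, 1]")
    rng = rng or random.Random()
    # rng.random() is in [0, 1), so drop_fraction=0 keeps everything
    # and drop_fraction=1 drops everything.
    return [t for t in tuples if rng.random() >= drop_fraction]
```

With this shape, the remaining question is how to choose `drop_fraction` in each period so that delay stays on target while as few tuples as possible are lost.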

5 Related work. Accuracy of aggregate queries under load shedding (Babcock et al., ICDE04). Data triage (Reiss & Hellerstein, ICDE05): put data into an asylum upon overloading. LoadStar (Chi et al., VLDB05). QoS-driven load shedding (Tatbul et al., VLDB03). Key questions: When? How much? Where? Use a load shedding roadmap to decide where; a simple, intuitive algorithm to decide when and how much.

6 What's wrong? A highly dynamic environment is the reality: bursty data input and variable unit processing cost. Existing methods fail to capture the current system status (queue length) and output (delay); delay is positively related to queue length. Example 1: unbounded increase of delay. Example 2: unnecessary data loss.

7 Our approach. View load shedding as a control problem. Control: manipulation of system behavior by adjusting system input (cruise control of automobiles, room temperature control, etc.). Open-loop vs. closed-loop (feedback) control. The feedback control loop: Plant, Monitor, Controller, Actuator; Disturbances. How it works: error (e) = desired output (y_r) - measured output (y). Focal point: the controller, which maps e to the control signal u.
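The open-loop versus closed-loop distinction can be made concrete with a toy simulation. The sketch below is not the controller from the talk: it assumes a single queue, a proportional correction on the delay error e = y_r - y, and a true per-tuple cost (`c_actual`) that differs from the cost the shedder believes (`c_assumed`). All names and default values are hypothetical.

```python
def simulate_delay(closed_loop, steps=50, T=1.0, f_in=300.0,
                   c_assumed=0.005, c_actual=0.006, y_ref=2.0, gain=0.5):
    """Simulate queue delay (seconds) under open- or closed-loop shedding.

    Assumptions: one queue, delay = queue length * actual cost, and a
    purely proportional correction in the closed-loop case.
    """
    q = 0.0
    delays = []
    for _ in range(steps):
        y = q * c_actual                 # measured output: current delay
        u = 1.0 / c_assumed              # open loop: admit what we think we can process
        if closed_loop:
            e = y_ref - y                # error = desired output - measured output
            u += gain * e / (c_assumed * T)
        u = min(max(u, 0.0), f_in)       # cannot admit less than 0 or more than arrives
        q = max(0.0, q + T * (u - 1.0 / c_actual))
        delays.append(q * c_actual)
    return delays

open_loop = simulate_delay(closed_loop=False)
feedback = simulate_delay(closed_loop=True)
```

In this toy setting the open-loop delay grows without bound because the cost estimate is wrong and nothing corrects it, while the feedback version stays bounded. It settles somewhat above the target because proportional-only control leaves a steady-state offset.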

8 Why feedback control? (Figure: panels "Open loop" and "Closed-loop"; label "1/a".)

9 Challenges. Can we model the system? An analytical model may not be easy to derive; system identification: experimental methods. How to design the controller? Use control-theoretic tools for guaranteed performance. DSMS-specific problems: lack of real-time measurement of the output signal (y); how to set the control period (T). Real system evaluation: we use Borealis in our study.

10 Modeling a DSMS. Borealis data stream manager: round-robin operator scheduler, FIFO waiting queues. For now, fix the per-tuple processing cost c. Proposed model: y = qc, where q is the number of outstanding data tuples. Discrete form: y(k) = q(k-1)c. Denote the input load as f_i and system processing power as f_o:
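The discrete model y(k) = q(k-1)c can be exercised with a few lines of code. The equation relating q to f_i and f_o did not survive in this transcript, so the queue update below (queue grows by T times the difference between input load and processing power) is an assumption, as is taking f_o = 1/c. The function name `predict_delays` is hypothetical.

```python
def predict_delays(input_rates, c, T=1.0):
    """Predict per-period delay from the model y(k) = q(k-1) * c.

    Assumptions (not taken from the slides):
      - processing power f_o = 1 / c tuples per second
      - queue update q(k) = max(0, q(k-1) + T * (f_i(k) - f_o))
    """
    f_o = 1.0 / c
    q = 0.0                              # outstanding tuples before the first period
    delays = []
    for f_i in input_rates:
        delays.append(q * c)             # y(k) depends on the queue left by period k-1
        q = max(0.0, q + T * (f_i - f_o))
    return delays

# Two overloaded periods followed by a light one, with c = 10 ms per tuple.
delays = predict_delays([150.0, 150.0, 50.0], c=0.01)
```

Under these assumptions the predicted delay lags the input by one period: the overload in period k only shows up as delay in period k+1.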

11 Controller design. Design based on pole placement. Guaranteed performance, targeting convergence rate (responsiveness) and damping (smoothness). The controller:
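The controller equation itself is not preserved in this transcript, so the following is only a minimal sketch of the pole-placement idea on a simplified plant, not the authors' controller. It assumes the delay can be read as y = qc from the current queue and that the queue evolves as q(k+1) = q(k) + T(u(k) - f_o), where u is the admitted rate. With the proportional law u = f_o + K e/(cT), the error obeys e(k+1) = (1 - K) e(k), so choosing the closed-loop pole p fixes K = 1 - p. All function names are hypothetical.

```python
def gain_for_pole(pole):
    """Proportional gain that places the closed-loop pole at `pole`."""
    if not 0.0 <= pole < 1.0:
        raise ValueError("pole must be in [0, 1) for stable, non-oscillating convergence")
    return 1.0 - pole

def admitted_rate(y_ref, q, c, f_out, T, pole):
    """Control signal u: tuples per second to admit in the next period."""
    e = y_ref - q * c                    # delay error in seconds
    return max(0.0, f_out + gain_for_pole(pole) * e / (c * T))

def simulate(y_ref, q0, c, f_out, T, pole, steps):
    """Run the simplified closed loop and return the delay after each period."""
    q = q0
    delays = []
    for _ in range(steps):
        u = admitted_rate(y_ref, q, c, f_out, T, pole)
        q = max(0.0, q + T * (u - f_out))
        delays.append(q * c)
    return delays

delays = simulate(y_ref=2.0, q0=0.0, c=0.01, f_out=100.0, T=1.0, pole=0.5, steps=3)
```

A pole closer to 0 converges faster (more responsive) but reacts more sharply to noise; a pole closer to 1 is smoother but slower. This is the convergence-rate versus damping tradeoff the slide refers to, shown here only for a first-order loop.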

12 Control period. Provides a complete answer to the question "when to shed load?" Arbitrarily set in previous studies. Case-by-case decision with some systematic rules. In our problem, a tradeoff between: (1) sampling theory (Nyquist-Shannon theorem): to capture the moving trends of the disturbances, a higher sampling frequency (shorter period) is preferred; (2) the stochastic nature of the output (y) and parameter (c): more samples are needed, so a longer period is preferred. The first factor should be given more weight.
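The second side of the tradeoff can be quantified with a standard statistical estimate. Assuming per-tuple costs are independent samples with standard deviation `sigma_c` (an assumption, as are all names below), the standard error of the average cost measured over one control period shrinks with the square root of the number of tuples observed in that period.

```python
import math

def samples_per_period(tuple_rate, period):
    """Expected number of tuples observed in one control period."""
    return tuple_rate * period

def cost_estimate_std_error(sigma_c, tuple_rate, period):
    """Standard error of the mean per-tuple cost over one period.

    Assumes independent, identically distributed per-tuple costs.
    """
    n = samples_per_period(tuple_rate, period)
    if n <= 0:
        raise ValueError("need at least one sample per period")
    return sigma_c / math.sqrt(n)

short_period = cost_estimate_std_error(sigma_c=0.004, tuple_rate=100.0, period=1.0)
long_period = cost_estimate_std_error(sigma_c=0.004, tuple_rate=100.0, period=4.0)
```

Quadrupling the period only halves the estimation noise, while it makes the controller four times slower to react to a burst, which is consistent with giving the sampling argument more weight.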

13 Experiments. Controller and load shedder implemented in Borealis. Synthetic ("pareto") and real ("Web") data streams. Small query network with variable average processing cost.

14 Experimental results. Experiments for comparison: Aurora (open-loop solution) and Baseline (a simple feedback method). Target delay: 2000 ms. Control period: 1 second. Total time: 400 seconds. For both data types, data loss is almost the same for the three load shedding strategies.

15 Future work. Time-varying DSMS model, for example time-varying cost c; possible solution: adaptive control. Adaptation other than load shedding: new disturbances? Model changes? Other database problems?

16 Summary. Load shedding is an important quality adaptation method. Ad hoc solutions do not work under dynamic load and system features. We propose an approach, based on feedback control theory, to guide load shedding in a highly dynamic environment. Initial experiments performed in a real-world DSMS show the promising potential of our approach.

17 Acknowledgements. Dr. Song Liu, Hurco Companies, Inc., Indianapolis, IN. Prof. Bin Yao, School of Mechanical Engineering, Purdue University. Ms. Nesime Tatbul and Profs. Ugur Cetintemel and Stan Zdonik, CS Department, Brown University.

18 Backup - 1

19 Backup - 2. Lack of robustness of the open-loop solution: more optimistic policy adopted in Aurora; unstable performance. Our solution is robust under input streams with different burstiness.

20 Backup - 3

21 Backup - 4: Model verification. Feed Borealis with synthetic streams. Input rate: step function or sinusoidal function of time. Average processing cost is fixed.

