
Control-Based Load Shedding in Data Stream Management

Yicheng Tu †, Song Liu ‡, Sunil Prabhakar †, Bin Yao ‡
† Indiana Center of Database Systems, Department of Computer Sciences, 305 N. University Street, West Lafayette, IN 47907
‡ School of Mechanical Engineering, 140 S. Intramural Drive, West Lafayette, IN 47907

Introduction
Data Stream Management Systems (DSMSs) process a large number of data streams to answer user-specified queries. These systems generally follow a query-passive, data-active model: all data are pushed to the database server for processing, and query results are sent to the users continuously. Processing delay is critical in a DSMS, because query results generated from stale data are useless to users. When the system is overloaded, some data tuples must be discarded without processing so that the desired processing delay can be met. This is called load shedding.

Key questions: When? How much? Where? We focus on the first two questions.

Acknowledgements
This is joint work with my advisor, Prof. Sunil Prabhakar (sunil@cs.purdue.edu), and with Dr. Song Liu (liu1@purdue.edu) and Prof. Bin Yao (byao@purdue.edu) of the School of Mechanical Engineering at Purdue University. The author also thanks Ms. Nesime Tatbul and Prof. Stan Zdonik, both of the Computer Science Department at Brown University, for providing the Aurora/Borealis source code.

Figure 1. Push-based DSMS system model.

Objective
To design and implement a load shedding framework that:
- minimizes data loss;
- maintains processing delays while rejecting disturbances such as:
  - bursty data arrivals;
  - internal dynamics of the DSMS;
- is robust, i.e., works for a wide range of input streams.

Figure 2. Examples of disturbances in data processing in a DSMS. Top: bursty arrival rates; bottom: unit processing costs.
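The "how much?" question can be made concrete with a small sketch. Suppose the admitted (injected) rate must not exceed some target. Each arriving tuple can then be kept with probability target/arrival. The poster does not say which tuples are dropped ("Where?" is left aside), so this Python sketch simply drops uniformly at random. The function names keep_probability and shed, and the 400 and 1000 tuples/s figures, are hypothetical illustrations and not part of Borealis.

```python
import random


def keep_probability(target_rate: float, arrival_rate: float) -> float:
    """Fraction of arriving tuples to admit so the admitted rate is about target_rate."""
    if arrival_rate <= 0:
        return 1.0  # nothing is arriving, so nothing needs to be shed
    return max(0.0, min(1.0, target_rate / arrival_rate))


def shed(tuples, target_rate, arrival_rate, rng=random):
    """Drop tuples uniformly at random, keeping each with the same probability."""
    p = keep_probability(target_rate, arrival_rate)
    return [t for t in tuples if rng.random() < p]


# Hypothetical window: 1000 tuples arrived in one second, but only 400/s can be admitted.
rng = random.Random(42)
window = list(range(1000))
admitted = shed(window, target_rate=400, arrival_rate=1000, rng=rng)
print(keep_probability(400, 1000))  # 0.4
print(len(admitted))  # close to 400; the exact count depends on the random seed
```

This sketch fixes the target rate by hand. Choosing that rate well is the hard part, and the control-based approach below adjusts it automatically.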
Our approach
- View load shedding as a feedback control problem.
- Develop a dynamic model of a specific DSMS.
- Design the controller using rigorous control-theoretical methods.
- Work on a real DSMS: the open-source Borealis system.

Figure 3. The feedback control loop for load shedding. Output (y): average tuple delay; input (u): tuple injection rate into the DSMS; target delay value (y_r); control error (e).

Results
- Obtained a first-order linear model of Borealis.
- Pole-placement-based design resulted in a PD controller expressed in terms of c and H, which are system-specific constants, and T, the control period.
- Identified and solved several DSMS-specific problems.
- Evaluated the control framework with real and synthetic data.

Figure 4. Performance of our load shedding solution (CTRL), AURORA (an open-loop solution that represents the state of the art in DSMS load shedding), and BASELINE (a naïve feedback-based solution).

Figure 5. Relative performance of CTRL compared with AURORA and BASELINE. A, B, C: various aspects of delay violations; D: percentage of data discarded.

Figure 6. Robustness of CTRL and AURORA tested with input streams of different burstiness (a smaller bias factor represents a burstier stream).

Conclusions
1. This is the first database work that uses feedback-control-theoretical methods.
2. Rigorous system modeling and controller design yield a PD controller that controls average tuple delays by adjusting the amount of load shedding.
3. The control framework was implemented and evaluated in a real DSMS. Experiments show that the feedback-control-based method significantly improves delay control, with the same amount of data loss as current solutions.
4. The solution is also robust.
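The feedback loop of Figure 3 (measure the delay y, compare it with y_r, adjust the injection rate u) can be illustrated with a toy simulation. This is a minimal Python sketch and not the authors' controller. The single-queue model, the parameters cost=0.001 and cpu_share=0.8, the 0.5 s target, the closed-loop poles 0.5 and -0.3, and the feed-forward of a known capacity are all our own assumptions. In this toy model the queue acts as an integrator with a constant drain. The PD term therefore outputs the injection rate in excess of capacity. Its gains come from pole placement on the linearised model y(k+1) = y(k) + gain * v(k). The injection rate is clamped to the arrival rate, so arrival minus u is the amount shed.

```python
from dataclasses import dataclass


@dataclass
class ToyStreamSystem:
    """Hypothetical single-queue DSMS (an assumption, not the Borealis model).

    cost:      CPU seconds needed to process one tuple
    cpu_share: fraction of the CPU available for query processing
    """
    cost: float
    cpu_share: float
    queue: float = 0.0  # tuples admitted but not yet processed

    @property
    def capacity(self) -> float:
        return self.cpu_share / self.cost  # tuples per second

    def delay(self) -> float:
        # Time needed to drain the queue, used here as a proxy for average tuple delay.
        return self.queue / self.capacity

    def step(self, injection_rate: float, period: float) -> float:
        # Admit injection_rate * period tuples and process up to capacity * period of them.
        self.queue = max(0.0, self.queue + (injection_rate - self.capacity) * period)
        return self.delay()


class PDController:
    """PD controller with gains chosen by pole placement on the linearised toy model
    y(k+1) = y(k) + gain * v(k), where v is the injection rate in excess of capacity."""

    def __init__(self, target_delay, gain, period, poles=(0.5, -0.3)):
        p1, p2 = poles  # arbitrary closed-loop poles inside the unit circle
        self.kp = (1 - p1) * (1 - p2) / gain
        self.kd = -p1 * p2 * period / gain
        self.target = target_delay
        self.period = period
        self.prev_error = None

    def excess_rate(self, measured_delay):
        error = self.target - measured_delay  # e = y_r - y
        prev = error if self.prev_error is None else self.prev_error
        self.prev_error = error
        return self.kp * error + self.kd * (error - prev) / self.period


def run(arrival_rates, target_delay=0.5, period=1.0):
    system = ToyStreamSystem(cost=0.001, cpu_share=0.8)  # about 800 tuples/s
    ctrl = PDController(target_delay, gain=period / system.capacity, period=period)
    delays, discarded = [], []
    for arrival in arrival_rates:
        v = ctrl.excess_rate(system.delay())
        # Feed forward the (assumed known) capacity, then clamp to what actually arrived.
        u = min(max(system.capacity + v, 0.0), arrival)
        discarded.append(arrival - u)  # tuples/s shed in this period
        delays.append(system.step(u, period))
    return delays, discarded


# A burst from 600 to 2000 tuples/s and back, with a 0.5 s delay target.
rates = [600] * 5 + [2000] * 20 + [600] * 5
delays, discarded = run(rates)
print(f"end of burst: delay {delays[24]:.3f} s, shedding {discarded[24]:.0f} tuples/s")
```

The closed-loop error obeys e(k+1) = 0.2 e(k) + 0.15 e(k-1) while the queue is non-empty. As a result, the delay settles near the 0.5 s target within a few periods of the burst, assuming the capacity estimate is exact. In steady state, roughly the excess arrivals (about 1200 tuples/s) are shed. Once the burst ends, no tuples are shed and the queue drains.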

