Control-Based Load Shedding in Data Stream Management Systems Yicheng Tu and Sunil Prabhakar Department of Computer Sciences, Purdue University April 3,

Presentation transcript:

Control-Based Load Shedding in Data Stream Management Systems Yicheng Tu and Sunil Prabhakar Department of Computer Sciences, Purdue University April 3, 2006

Data stream management systems
- Applications
  - Financial analysis
  - Mobile services
  - Sensor networks
  - Network monitoring
  - More …
- Continuous data, discarded after being processed
- Continuous queries
- Data-active, query-passive model

DSMS architecture
- Network of query operators (O1 – O3)
- Each operator has its own queue (q1 – q4)
- Scheduler decides which operator to execute
- Query results (Q1, Q2) are pushed to clients
- Example systems: Aurora/Borealis, STREAM

Qualities in DSMS data processing
- Data processing in a DSMS is quality-critical
  - Tuple delay
  - Data loss
  - Sampling rate, window size, …
- Overloading during spikes → degraded quality (delay)
- Solution: adjust data loss (i.e., load shedding) on the DSMS side
  - Eliminate excessive load by dropping data items
- Tuple delay is the major concern: results generated from old data are useless!
- The real problem: how to maintain processing delays while minimizing data loss?
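As a rough illustration of "eliminating excessive load by dropping data items", the sketch below drops each incoming tuple independently with a fixed probability. The function name `shed` and the uniform random-drop policy are assumptions made for this example; the slides do not say which drop policy the authors use.

```python
import random


def shed(tuples, drop_fraction, rng=None):
    """Return the tuples that survive random load shedding.

    Hypothetical helper: each tuple is dropped independently with
    probability `drop_fraction` (0.0 keeps everything, 1.0 drops everything).
    """
    if not 0.0 <= drop_fraction <= 1.0:
        raise ValueError("drop_fraction must be in [0, 1]")
    rng = rng or random.Random()
    # rng.random() is in [0, 1), so a tuple is kept with
    # probability 1 - drop_fraction.
    return [t for t in tuples if rng.random() >= drop_fraction]


if __name__ == "__main__":
    incoming = list(range(10_000))
    kept = shed(incoming, 0.3, random.Random(0))
    print(f"kept {len(kept)} of {len(incoming)} tuples")
```

The open question raised on this slide is how to choose `drop_fraction` over time so that delay stays bounded without dropping more data than necessary.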

Related work
- Accuracy of aggregate queries under load shedding (Babcock et al., ICDE04)
- Data triage (Reiss & Hellerstein, ICDE05)
  - Put data into an asylum upon overloading
- LoadStar (Chi et al., VLDB05)
- QoS-driven load shedding (Tatbul et al., VLDB03)
- Key questions
  - When?
  - How much?
  - Where?
- Use a load shedding roadmap to decide where
- Simple, intuitive algorithm to decide when and how much

What's wrong?
- A highly dynamic environment is the reality
  - Bursty data input
  - Variable unit processing cost
- Existing approaches fail to capture the current system status (queue length) and output (delay)
  - Delay is positively related to queue length
- Example 1: unbounded increase of delay
- Example 2: unnecessary data loss

Our approach
- View load shedding as a control problem
  - Control: manipulation of system behavior by adjusting system input
  - Cruise control of automobiles, room temperature control, etc.
  - Open-loop vs. closed-loop (feedback) control
- The feedback control loop (diagram components: Plant, Monitor, Controller, Actuator, Disturbances)
- How it works
  - Error (e) = desired output (y_r) - measured output (y)
  - Focal point: the controller, which maps e to the control signal u
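The difference between open-loop and closed-loop control can be sketched on a toy plant. Everything here is an assumption chosen for illustration and is not taken from the slides: the static plant `y = a*u + d`, the parameter names, and the simple integral controller.

```python
def open_loop(y_r, a_assumed, a_true, d, steps):
    """Open loop: compute u once from an assumed model and never look at y."""
    u = y_r / a_assumed
    return [a_true * u + d for _ in range(steps)]


def closed_loop(y_r, a_true, d, gain, steps):
    """Closed loop: measure y, form e = y_r - y, and adjust u from e."""
    u, outputs = 0.0, []
    for _ in range(steps):
        y = a_true * u + d   # plant output as seen by the monitor
        e = y_r - y          # error = desired output - measured output
        u += gain * e        # controller maps e to the control signal u
        outputs.append(y)
    return outputs


if __name__ == "__main__":
    # A disturbance d = 3 that the open-loop design knows nothing about.
    print(open_loop(10.0, 2.0, 2.0, 3.0, 5)[-1])
    print(closed_loop(10.0, 2.0, 3.0, 0.25, 50)[-1])
```

In this toy setting the open-loop output stays offset by the disturbance, while the closed-loop error shrinks by a factor of `1 - a_true * gain` per step, so it converges whenever that factor has magnitude below 1.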

Why feedback control?
[Figure: open-loop and closed-loop control diagrams; the label "1/a" appears in the figure]

Challenges
- Can we model the system?
  - An analytical model may not be easy to derive
  - System identification: experimental methods
- How to design the controller?
  - Use control-theoretic tools for guaranteed performance
- DSMS-specific problems
  - Lack of real-time measurement of the output signal (y)
  - How to set the control period (T)
- Real system evaluation
  - We use Borealis in our study

Modeling a DSMS
- Borealis data stream manager
  - Round-robin operator scheduler
  - FIFO waiting queues
- For now, fix the per-tuple processing cost c
- Proposed model: y = qc, where q is the number of outstanding data tuples
- Discrete form: y(k) = q(k-1)c
- Denote the input load as f_i and the system processing power as f_o:
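The delay model y(k) = q(k-1)c can be tried out numerically. The queue update used below, q(k) = max(0, q(k-1) + T(f_i(k) - f_o(k))), is an assumption for this sketch (a plain flow-balance equation with rates in tuples per second); the equation that followed f_i and f_o on the slide is not preserved in this transcript. The function name and the example numbers are likewise hypothetical.

```python
def simulate_delay(f_in, f_out, c, T=1.0, q0=0.0):
    """Simulate tuple delay with y(k) = q(k-1) * c.

    f_in, f_out: per-period input load and processing power (tuples/s).
    c: per-tuple processing cost in seconds (held fixed, as on the slide).
    T: control period in seconds.
    The queue update is an assumed flow balance, not the authors' equation.
    """
    q_prev = q0
    delays = []
    for fi, fo in zip(f_in, f_out):
        delays.append(q_prev * c)                  # y(k) = q(k-1) * c
        q_prev = max(0.0, q_prev + T * (fi - fo))  # queue cannot go negative
    return delays


if __name__ == "__main__":
    # Sustained overload: 300 tuples/s arrive, 200 tuples/s can be processed.
    print(simulate_delay([300.0] * 5, [200.0] * 5, c=0.005))
```

Under sustained overload the queue grows by T(f_i - f_o) tuples per period, so the modeled delay increases without bound unless load is shed.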

Controller design
- Design based on pole placement
- Guaranteed performance targeting
  - Convergence rate (responsiveness)
  - Damping (smoothness)
- The controller:
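The controller equation from this slide is not preserved in the transcript, so the following is only a sketch of pole placement on a simplified plant, not the authors' controller. It assumes the delay model y(k) = q(k-1)c together with a flow-balance queue update q(k+1) = q(k) + T·u(k), where u is the net queue growth rate chosen by a proportional controller u(k) = K·e(k). Under those assumptions the error obeys e(k+2) = e(k+1) - cTK·e(k), with characteristic polynomial z² - z + cTK, so choosing K = p(1-p)/(cT) places the poles at p and 1-p. All function and parameter names are hypothetical.

```python
def proportional_gain(c, T, pole=0.5):
    """Gain K that places the closed-loop poles at `pole` and 1 - `pole`."""
    return pole * (1.0 - pole) / (c * T)


def run_controller(y_r, f_in, f_out, c, T, steps, pole=0.5):
    """Simulate delay under proportional load shedding (illustrative only)."""
    K = proportional_gain(c, T, pole)
    q_prev, q = 0.0, 0.0              # q(k-1) and q(k)
    delays = []
    for _ in range(steps):
        y = c * q_prev                # measured delay, y(k) = q(k-1) * c
        e = y_r - y                   # error against the target delay
        u = K * e                     # desired net queue growth (tuples/s)
        # Admit at most what arrives; the remainder of f_in is shed.
        admitted = min(f_in, max(0.0, f_out + u))
        q_prev, q = q, max(0.0, q + T * (admitted - f_out))
        delays.append(y)
    return delays


if __name__ == "__main__":
    trace = run_controller(y_r=2.0, f_in=400.0, f_out=200.0,
                           c=0.005, T=1.0, steps=60)
    print([round(y, 3) for y in trace[:8]], round(trace[-1], 6))
```

With a double pole at 0.5 the simulated delay approaches the 2-second target without overshoot in this example. Moving the poles closer to 0 speeds up convergence, and complex poles would introduce oscillation, which is the responsiveness and smoothness tradeoff the slide refers to.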

Control period
- Provides a complete answer to the question "when to shed load?"
- Arbitrarily set in previous studies
- Case-by-case decision with some systematic rules
- In our problem, a tradeoff between:
  - Sampling theory (Nyquist-Shannon theorem): to capture the moving trends of the disturbances, a higher sampling frequency (shorter period) is preferred
  - Stochastic nature of the output (y) and parameter (c): more samples are needed → a longer period is preferred
- The first factor should be given more weight

Experiments
- Controller and load shedder implemented in Borealis
- Synthetic ("pareto") and real ("Web") data streams
- Small query network with variable average processing cost

Experimental results
- Experiments for comparison
  - Aurora: open-loop solution
  - Baseline: a simple feedback method
- Target delay: 2000 ms
- Control period: 1 second
- Total time: 400 seconds
- For both data types, data loss is almost the same for the three load shedding strategies

Future work
- Time-varying DSMS model
  - For example, time-varying cost c
  - Possible solution: adaptive control
- Adaptation other than load shedding
  - New disturbances?
  - Model changes?
- Other database problems?

Summary
- Load shedding is an important quality adaptation method
- Ad hoc solutions do not work under dynamic load and system features
- We propose an approach, based on feedback control theory, to guide load shedding in a highly dynamic environment
- Initial experiments in a real-world DSMS show the promising potential of our approach

Acknowledgements
- Dr. Song Liu, Hurco Companies, Inc., Indianapolis, IN
- Prof. Bin Yao, School of Mechanical Engineering, Purdue University
- Ms. Nesime Tatbul, Profs. Ugur Cetintemel and Stan Zdonik, CS Department, Brown University

Backup - 1

Backup - 2
- Lack of robustness of the open-loop solution
  - More optimistic policy adopted in Aurora
  - Unstable performance
- Our solution is robust
  - Under input streams with different burstiness

Backup - 3

Backup - 4: Model verification
- Feed Borealis with synthetic streams
- Input rate: step function or sinusoidal function of time
- Average processing cost is fixed