DEXA 2005 Control-based Quality Adaptation in Data Stream Management Systems (DSMS) Yicheng Tu†, Mohamed Hefeeda‡, Yuni Xia†, Sunil Prabhakar†, and Song Liu￥ †Department of Computer Sciences, Purdue University, USA; ‡School of Computing Science, Simon Fraser University at Surrey, Canada; ￥School of Mechanical Engineering, Purdue University, USA
DEXA 2005 Data Stream Management Continuous data, discarded after being processed Continuous query Data-active, query-passive model Applications –Financial analysis –Mobile services –Sensor networks –Network monitoring –More …
DEXA 2005 DSMS architecture Network of query operators (O1 – O3) Each operator has its own queue (q1 – q4) Scheduler decides which operator to execute Query results (Q1, Q2) pushed to clients Example systems: –Aurora/Borealis –STREAM
DEXA 2005 Quality-of-Service (QoS) in DSM Data processing is QoS-critical in DSMS –Tuple delay is the major concern: results generated from old data are useless! A highly dynamic environment makes QoS hard to maintain –Bursty data input –Unpredictable unit processing cost Overloading during spikes degrades (delay) QoS Solution: adjust the following (i.e. quality adaptation) –Sampling rate (source side) –Data loss (DSMS side), i.e. load shedding
DEXA 2005 Load Shedding Eliminating excessive load by dropping data items, which leads to fewer QoS violations Basic algorithm (Tatbul et al., 2003): runs periodically; CPU is the bottleneck resource Key questions –When? –How much? –Where? –Which tuples?
DEXA 2005 What’s missing? Current solutions focus on steady-state performance, assuming the input level changes between stable states However, arrivals are bursty in practice, so the system is always in a transient state Taking averages (the baseline) wouldn’t work
DEXA 2005 Our approach View load shedding as a feedback control problem Feedback Control: manipulation of system behavior by adjusting system input based on system output –Cruise control of automobiles, room temperature control, etc. The feedback control loop: –Plant –Monitor –Controller –Actuator How it works –Error = measured output – desirable output –Focal point: controller, which maps error to control signal
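The loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the plant is a hypothetical single FIFO queue with fixed per-period `capacity`, queue length stands in for delay, and the controller is a plain proportional rule with a made-up `gain`. None of these details come from the slides; only the roles (plant, monitor, controller, actuator) and the error definition do.

```python
def run_loop(arrivals, capacity, target_queue, gain):
    """Toy feedback loop around a single FIFO queue (hypothetical plant)."""
    queue = 0.0          # plant state: tuples waiting at the end of a period
    admitted_log = []
    for load in arrivals:
        # Monitor: measure the output (queue length as a stand-in for delay).
        measured = queue
        # Error as defined on the slide: measured output - desirable output.
        error = measured - target_queue
        # Controller: proportional rule mapping the error to a control signal
        # (the load we would like to admit during this period).
        desired = max(0.0, capacity - gain * error)
        # Actuator: shed whatever exceeds the desired load.
        admitted = min(load, desired)
        # Plant: admitted tuples join the queue; up to `capacity` are processed.
        queue = max(0.0, queue + admitted - capacity)
        admitted_log.append(admitted)
    return admitted_log, queue


if __name__ == "__main__":
    log, final_queue = run_loop([150.0] * 30, capacity=100.0,
                                target_queue=20.0, gain=0.5)
    print(log[:3], final_queue)
```

Under sustained overload the queue in this toy model settles at the target, because each period the controller admits slightly more or less than the capacity depending on the sign of the error.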
DEXA 2005 Why Feedback Control? Maintain system performance under internal/external uncertainties Control theory provides tools to choose and tune the controller toward desired performance –Current load shedding solution is also feedback-based –Difference: we use control theory to guide the controller design Steps of problem-solving using control theory 1. Mapping the problem to a feedback control loop, determining input/output 2. System identification: modeling the input/output relationship 3. Controller design: can be done analytically
DEXA 2005 The feedback control loop Plant: current DSMS –Input: load admitted –Output: delay QoS –Reference output: specified by DBA Actuator –adaptor: load shedder –admission controller Monitor: new Controller: new System dynamics: disturbances Discrete control: control period T
DEXA 2005 System identification Goal: build a dynamic model that describes the relationship between input and output Most systems can be modeled by a linear difference equation of order n that relates the output to past inputs and outputs: –I(x): input at period x –O(x): output at period x –n: order of the equation –a_i, b_i: system-specific coefficients Determine n, a_i, b_i by experiments using synthetic inputs
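As an illustration of the identification step, the sketch below fits a first-order model by least squares. It assumes the standard form O(k) = a·O(k-1) + b·I(k-1), which is one common instance of a linear difference equation and may differ from the authors' exact parameterization; the function name `fit_first_order` and the sample data are hypothetical.

```python
def fit_first_order(inputs, outputs):
    """Least-squares fit of O(k) = a*O(k-1) + b*I(k-1) (assumed model form).

    `inputs[k]` and `outputs[k]` are the measured input and output
    at control period k.
    """
    # Accumulate the sums that appear in the 2x2 normal equations.
    s_oo = s_oi = s_ii = s_yo = s_yi = 0.0
    for k in range(1, len(outputs)):
        o_prev, i_prev, y = outputs[k - 1], inputs[k - 1], outputs[k]
        s_oo += o_prev * o_prev
        s_oi += o_prev * i_prev
        s_ii += i_prev * i_prev
        s_yo += y * o_prev
        s_yi += y * i_prev
    det = s_oo * s_ii - s_oi * s_oi
    if det == 0.0:
        raise ValueError("input is not rich enough to identify a and b")
    # Solve the normal equations with Cramer's rule.
    a = (s_yo * s_ii - s_yi * s_oi) / det
    b = (s_yi * s_oo - s_yo * s_oi) / det
    return a, b


if __name__ == "__main__":
    # Synthetic experiment: drive a known system and recover its coefficients.
    true_a, true_b = 0.8, 0.5
    inputs = [1.0, 0.0, 2.0, 1.0, 3.0, 0.0, 1.0, 2.0]
    outputs = [0.0]
    for k in range(1, len(inputs)):
        outputs.append(true_a * outputs[k - 1] + true_b * inputs[k - 1])
    print(fit_first_order(inputs, outputs))
```

A varied synthetic input matters here: with a constant input and a settled output, the two regressors become collinear and the coefficients cannot be separated.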
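DEXA 2005 Controller design PI controller: computes the desirable input from the error –E(k): error –g, r: controller coefficients –I_d(k): desirable input The controller can be evaluated more efficiently than by its direct definition Design uses the transfer function (TF) of the PI controller, the TF of the plant (for example, a second-order system), and the resulting closed-loop TF (CLTF) Determine g and r by pole placement of the CLTF (details skipped)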
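A minimal sketch of a discrete PI controller follows. It assumes the common incremental form I_d(k) = I_d(k-1) + g·(E(k) - r·E(k-1)), which needs only the previous error and previous output instead of a running sum. This form is an assumption and may not match the authors' exact formula; the class name and the numeric gains in the usage example are hypothetical. The sign of `g` has to match whichever error convention is in use.

```python
class PIController:
    """Incremental discrete PI controller (assumed form, see lead-in)."""

    def __init__(self, g, r, initial_input=0.0):
        self.g = g                        # controller gain
        self.r = r                        # weight on the previous error
        self.prev_error = 0.0             # E(k-1)
        self.prev_input = initial_input   # I_d(k-1)

    def update(self, error):
        """Return I_d(k) given the current error E(k)."""
        desired = self.prev_input + self.g * (error - self.r * self.prev_error)
        # Remember this period's values for the next update.
        self.prev_error = error
        self.prev_input = desired
        return desired


if __name__ == "__main__":
    controller = PIController(g=2.0, r=0.5, initial_input=10.0)
    for e in (1.0, 1.0, 0.0):
        print(controller.update(e))
```

In practice g and r would come from pole placement on the identified model rather than from hand-picked values like these.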
DEXA 2005 Actuator (load shedder) design I_d(k) is the desirable load (# of data tuples) entering the DSMS during the next control period k Let S(k) be the real load during period k; we need to discard S(k) - I_d(k) tuples Two implementations of load shedder: –Admit the first I_d(k) tuples during period k Pros: easy to implement, generates a (100%) accurate control signal Cons: skewed toward early arrivals –Sampling-based shedding: each tuple is discarded with probability 1 - p, where p = I_d(k) / S(k) However, S(k) is unknown at the beginning of period k Solution: use S(k-1) to estimate S(k); this does not affect controller performance (see backup slide)
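The sampling-based shedder can be sketched as below, using S(k-1) as the estimate of S(k). The function names, the clamping of p to [0, 1], and the choice to admit everything when the previous period was empty are illustrative assumptions, not details from the slides.

```python
import random


def admit_probability(desired_load, previous_load):
    """p = I_d(k) / S(k-1), clamped to [0, 1].

    `previous_load` is S(k-1), used as the estimate of the unknown S(k).
    """
    if previous_load <= 0:
        # Assumption: with no load observed last period, admit everything.
        return 1.0
    return max(0.0, min(1.0, desired_load / previous_load))


def shed(tuples, p, rng=random):
    """Keep each tuple independently with probability p (drop with 1 - p)."""
    return [t for t in tuples if rng.random() < p]


if __name__ == "__main__":
    p = admit_probability(desired_load=50, previous_load=200)
    kept = shed(list(range(200)), p)
    print(p, len(kept))   # roughly a quarter of the tuples survive
```

Unlike admitting the first I_d(k) tuples, this spreads the drops across the whole period, at the cost of the admitted count only matching I_d(k) in expectation.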
DEXA 2005 Determining control period Control period T is critical in controller design Two primary concerns in setting T –T should be short enough to capture changes in the input rate (Nyquist-Shannon sampling theorem): the shorter the better –The output signal (delay) is measured as an average over all data tuples in one control period: if T is too short, only a small number of tuples are sampled, and the output signal may fail to represent the real system status We trade off these two factors and set T to one second
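The effect of T on the measured output can be seen with a small monitor sketch that averages tuple delays per control period and also reports how many tuples each average is based on. The function name and the `(departure_time, delay)` sample format are hypothetical.

```python
from collections import defaultdict


def average_delay_per_period(samples, period):
    """Group (departure_time, delay) samples into control periods of length
    `period` and return {period_index: (average_delay, sample_count)}."""
    sums = defaultdict(lambda: [0.0, 0])
    for departure_time, delay in samples:
        k = int(departure_time // period)   # index of the control period
        sums[k][0] += delay
        sums[k][1] += 1
    return {k: (total / count, count)
            for k, (total, count) in sorted(sums.items())}


if __name__ == "__main__":
    samples = [(0.1, 0.25), (0.6, 0.75), (1.2, 1.0)]
    print(average_delay_per_period(samples, period=1.0))
    print(average_delay_per_period(samples, period=0.5))
```

Halving the period in this example leaves a single tuple behind every average, which is the situation the slide warns about: the measurement reacts faster but becomes noisier.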
DEXA 2005 Experiments We evaluate our control-based solution by simulations Set four classes of delays: 500ms – 2000ms Operator scheduling policy: Earliest Deadline First –Input: CPU utilization –Output: deadline miss ratio Small query network with 13 operators Stream data: –Synthetic: Poisson, Pareto –Real: TCP traces Comparison: static shedding –Amount of shedding follows a pre-determined STEPSIZE –Similar to TCP rate control
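The slides do not spell out the static shedding rule, so the sketch below is only one plausible reading of "follows a pre-determined STEPSIZE": the shed fraction moves up by a fixed step when the system is overloaded and down by the same step otherwise. The function name, the `overloaded` flag, and the symmetric step are all assumptions.

```python
def static_shedding_step(shed_fraction, overloaded, step_size):
    """Move the shed fraction by a fixed step, clamped to [0, 1].

    Hypothetical baseline: the step does not depend on how large
    the QoS violation is, only on whether one occurred.
    """
    if overloaded:
        shed_fraction += step_size
    else:
        shed_fraction -= step_size
    return min(1.0, max(0.0, shed_fraction))


if __name__ == "__main__":
    fraction = 0.0
    for overloaded in (True, True, False, True):
        fraction = static_shedding_step(fraction, overloaded, step_size=0.25)
        print(fraction)
```

Because the step is fixed, a small value reacts slowly to bursts and a large value oscillates, which is why choosing it is awkward compared with a controller whose output scales with the error.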
DEXA 2005 Simulation results: Poisson inputs Target deadline miss ratio (control goal) is set to zero [Figure panels: Inputs, Outputs]
DEXA 2005 Simulation results: bursty inputs [Figure panels: a. Pareto, b. TCP trace] Far fewer deadline misses than static shedding The same or lower level of data loss (load shed) Hard to choose an appropriate STEPSIZE in static shedding; not a problem in the control-based approach
DEXA 2005 Summary Load shedding is an important quality adaptation method Current solutions focusing on steady-state performance do not work well under bursty inputs We propose an approach to guide load shedding in a highly dynamic environment based on feedback control theory Initial experimental results by simulation show promising potential of our approach
DEXA 2005 Verification of model First order linear model
DEXA 2005 Simulation: unpredictable unit processing cost Control-based method learns the real cost
DEXA 2005 Controller stability after replacing S(k) with S(k-1) Let I_d'(k) be the input signal that results from using S(k-1) instead of S(k). We have I_d'(k) = p S(k-1) and thus S(k-1) I_d(k) = S(k) I_d'(k). In the z-domain, we get I_d(k) = z I_d'(k). Plugging this into the CLTF shows that, according to control theory, the controller is still stable.
DEXA 2005 Ongoing work Performed all three steps in a real DSMS: the Borealis system We set the output to average delay System identification gives a first-order model structure Controller analysis gives the control function and its set of parameters