Understanding and Predicting Host Load Peter A. Dinda Carnegie Mellon University

2 Talk in a Nutshell
  Load is self-similar
  Load exhibits epochal behavior
  Load prediction benefits from capturing self-similarity
  Statistical analysis of two sets of week-long, 1 Hz resolution traces of load on ~40 machines, and evaluation of linear time series models for load prediction

3 Why Study Load?
  Load partially determines execution time
  We want to model and predict load
  [Figure labels: Interactive Application; Short tasks with deadlines; Unmodified Distributed System; [t_min, t_max] ?]

4 Load and Execution Time

5 Outline
  Measurement methodology
  Load traces
  Load variance
  New Results
    – Self-similarity
    – Epochal behavior
  Benefits of capturing self-similarity in linear models
  Conclusions

6 Measurement Methodology
  [Figure: Digital Unix kernel and user-level measurement tool; labels: Ready Queue, RUN]
  Kernel: ready-queue length samples len_t, len_{t-T}, len_{t-2T}, ..., len_{t-29T}, len_{t-30T}, ... feed an exponential average (the 1 minute load "average"); T = 2 seconds
  User level: our measurements avg_t, avg_{t-0.5T}, avg_{t-T}, ... (1 Hz sample rate)
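A minimal sketch of the exponential averaging described on this slide. The slide does not give the kernel's smoothing constant, so the decay factor exp(-T/60) is an assumption chosen to match a 1 minute time constant with T = 2 seconds; the function name and parameters are hypothetical.

```python
import math

def exponential_load_average(queue_lengths, sample_period_s=2.0, time_constant_s=60.0):
    """Exponentially average ready-queue lengths sampled every sample_period_s.

    ASSUMPTION: decay = exp(-T / time_constant); the real Digital Unix
    constants are not stated on the slide.
    """
    decay = math.exp(-sample_period_s / time_constant_s)
    average = 0.0
    averages = []
    for length in queue_lengths:
        # New average = old average decayed, plus a share of the new sample.
        average = decay * average + (1.0 - decay) * length
        averages.append(average)
    return averages

# A queue that always holds one runnable process drives the average toward 1.
history = exponential_load_average([1] * 1000)
```

Because the average only moves a small fraction toward each new sample, a user-level tool reading it at 1 Hz sees a smoothed signal rather than the raw queue length.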

7 Load Traces

8 Absolute Variation

9 Relative Variation

10 Load Autocorrelation and Periodogram
  [Figure: autocorrelation versus time lag; periodogram versus frequency]
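The two plots on this slide can be computed from a load trace roughly as follows. This is an illustrative sketch using the textbook sample autocorrelation and a direct DFT periodogram, not the talk's actual analysis code; function names are hypothetical.

```python
import cmath
import math

def autocorrelation(samples, max_lag):
    """Sample autocorrelation for lags 0..max_lag (lag 0 is always 1)."""
    n = len(samples)
    mean = sum(samples) / n
    centered = [v - mean for v in samples]
    total = sum(c * c for c in centered)
    return [
        sum(centered[t] * centered[t + lag] for t in range(n - lag)) / total
        for lag in range(max_lag + 1)
    ]

def periodogram(samples):
    """Power at frequencies k/n cycles per sample, k = 0..n//2 (direct DFT, O(n^2))."""
    n = len(samples)
    mean = sum(samples) / n
    centered = [v - mean for v in samples]
    powers = []
    for k in range(n // 2 + 1):
        dft = sum(c * cmath.exp(-2j * math.pi * k * t / n)
                  for t, c in enumerate(centered))
        powers.append(abs(dft) ** 2 / n)
    return powers
```

For a week-long 1 Hz trace the quadratic DFT would be too slow; an FFT library would be the practical choice, and the direct form is used here only to stay self-contained.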

11 Visual Self-Similarity Here

12 The Hurst Parameter
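One common way to estimate the Hurst parameter H is the aggregated-variance method: for a self-similar series, the variance of block means of size m scales like m^(2H-2), so H can be read off the slope of a log-log fit. The sketch below illustrates that idea; it is not claimed to be the estimator used in the talk, and the function names are hypothetical.

```python
import math
import random

def block_mean_variance(samples, block_size):
    """Variance of the means of consecutive non-overlapping blocks."""
    blocks = len(samples) // block_size
    means = [sum(samples[i * block_size:(i + 1) * block_size]) / block_size
             for i in range(blocks)]
    grand_mean = sum(means) / blocks
    return sum((m - grand_mean) ** 2 for m in means) / (blocks - 1)

def estimate_hurst(samples, block_sizes):
    """Least-squares slope of log(variance) vs log(block size); H = 1 + slope / 2."""
    xs = [math.log(m) for m in block_sizes]
    ys = [math.log(block_mean_variance(samples, m)) for m in block_sizes]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    slope = (sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
             / sum((x - x_mean) ** 2 for x in xs))
    return 1.0 + slope / 2.0

def white_noise(count, seed):
    """Independent Gaussian samples, for which H should be close to 0.5."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(count)]
```

Independent noise gives an estimate near 0.5, while values between 0.5 and 1 indicate long-range dependence.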

13 Self-similarity Statistics

14 Why is Self-Similarity Important?
  Complex structure
    – Not completely random, nor independent
    – Short range dependence: excellent for history-based prediction
    – Long range dependence: possibly a problem
  Modeling implications
    – Suggests models that can capture it: ARFIMA, FGN, TAR

15 Load Exhibits Epochal Behavior

16 Epoch Length Statistics

17 Why is Epochal Behavior Important?
  Complex structure
    – Non-stationary
  Modeling implications
    – Suggests models: ARIMA, ARFIMA, etc.; non-parametric spectral methods
    – Suggests problem decomposition

18 Linear Time Series Models
  [Figure: Unpredictable Random Sequence -> Fixed Linear Filter -> Partially Predictable Load Sequence]
  Choose weights ψ_j to minimize σ_a^2
  σ_a (the standard deviation of the prediction error) determines the confidence interval for t+1 predictions
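As an illustration of choosing filter weights to minimize the one-step prediction error variance, the sketch below fits a pure autoregressive model AR(p) by solving the Yule-Walker equations with the Levinson-Durbin recursion. AR(p) is only one of the model families in the talk, the fitting method is an assumption, and all function names are hypothetical.

```python
import math
import random

def autocovariance(samples, max_lag):
    """Biased sample autocovariances for lags 0..max_lag, plus the mean."""
    n = len(samples)
    mean = sum(samples) / n
    centered = [v - mean for v in samples]
    cov = [sum(centered[t] * centered[t + k] for t in range(n - k)) / n
           for k in range(max_lag + 1)]
    return mean, cov

def fit_ar(samples, order):
    """Return (mean, coefficients, sigma_a) for an AR(order) model."""
    mean, r = autocovariance(samples, order)
    phi = []
    error = r[0]
    for k in range(1, order + 1):
        acc = r[k] - sum(phi[j] * r[k - 1 - j] for j in range(k - 1))
        reflection = acc / error
        phi = [phi[j] - reflection * phi[k - 2 - j] for j in range(k - 1)] + [reflection]
        error *= 1.0 - reflection ** 2
    return mean, phi, math.sqrt(error)

def predict_next(history, mean, phi):
    """One-step-ahead (t+1) prediction from the most recent len(phi) samples."""
    return mean + sum(c * (history[-1 - j] - mean) for j, c in enumerate(phi))

def simulate_ar1(coefficient, count, seed):
    """Synthetic AR(1) data with unit-variance noise, used only for demonstration."""
    rng = random.Random(seed)
    values = [0.0]
    for _ in range(count):
        values.append(coefficient * values[-1] + rng.gauss(0.0, 1.0))
    return values
```

The returned sigma_a is the estimated standard deviation of the t+1 prediction error, so an approximate 95% interval is the prediction plus or minus 1.96 times sigma_a under a Gaussian assumption.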

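19 Realizable Pole-Zero Models
  [Figure: nested model classes] ARFIMA(p,d,q) contains ARIMA(p,d,q), which contains ARMA(p,q), which contains AR(p) and MA(q)
  ARFIMA: self-similarity, d related to the Hurst parameter
  ARIMA: non-stationarity, d integer
  p, q are the numbers of parameters
  d is the degree of differencing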
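To make the "degree of differencing" concrete, the sketch below computes the weights of the differencing operator (1 - B)^d, which is an ordinary difference for integer d (ARIMA) and an infinite, slowly decaying filter for fractional d (ARFIMA). It also uses the standard relation d = H - 1/2 for a stationary ARFIMA process; the slide itself only says that d is related to the Hurst parameter. Function names are hypothetical.

```python
def differencing_weights(d, count):
    """First `count` coefficients of (1 - B)^d, where B is the backshift operator.

    Uses the recursion w_0 = 1, w_k = w_{k-1} * (k - 1 - d) / k.
    """
    weights = [1.0]
    for k in range(1, count):
        weights.append(weights[-1] * (k - 1 - d) / k)
    return weights

def hurst_to_d(hurst):
    """Standard relation for stationary ARFIMA processes: d = H - 1/2."""
    return hurst - 0.5

def difference(samples, d, count):
    """Apply the truncated filter: y_t = sum_k w_k * x_{t-k} over available history."""
    weights = differencing_weights(d, count)
    return [
        sum(weights[k] * samples[t - k] for k in range(min(count, t + 1)))
        for t in range(len(samples))
    ]
```

With d = 1 the weights are 1, -1, 0, 0, ..., which is the usual first difference; with a fractional d the weights never reach zero, which is how the model carries long-range dependence.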

20 Real World Benefits of Models
  σ_a (the standard deviation of the prediction error) determines the confidence interval for t+1 predictions
  Map work that would take 100 ms at zero load
  axp0: σ_z = 0.54, μ = 1.0, σ_a(ARMA(4,4)) = [missing], σ_a(ARFIMA(4,d,4)) = [missing]
    no model: 1.0 +/- [missing] (95%) => 100 to 306 ms
    ARMA: 1.0 +/- [missing] (95%) => 178 to 222 ms
    ARFIMA: 1.0 +/- [missing] (95%) => 179 to 221 ms
  axp7: σ_z = 0.14, μ = 0.12, σ_a(ARMA(4,4)) = [missing], σ_a(ARFIMA(4,d,4)) = [missing]
    no model: 0.12 +/- [missing] (95%) => 100 to 139 ms
    ARMA: 0.12 +/- [missing] (95%) => 104 to 120 ms
    ARFIMA: 0.12 +/- [missing] (95%) => 107 to 117 ms
  [Slide callouts: 1 %, 40 %]
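The "no model" rows on this slide can be reproduced with a small calculation. The sketch assumes, since the slide does not state it, that execution time equals the zero-load time multiplied by (1 + load), that the 95% interval is the mean plus or minus 1.96 standard deviations, and that load cannot fall below zero. The function name is hypothetical.

```python
def execution_time_interval(nominal_ms, mean_load, sigma, z_score=1.96):
    """Map a load confidence interval to an execution-time interval.

    ASSUMPTIONS: time = nominal * (1 + load); Gaussian 95% interval;
    negative load is clamped to zero.
    """
    half_width = z_score * sigma
    low_load = max(0.0, mean_load - half_width)
    high_load = mean_load + half_width
    return nominal_ms * (1.0 + low_load), nominal_ms * (1.0 + high_load)

# "No model" case: use the raw standard deviation of the load signal.
axp0 = execution_time_interval(100.0, mean_load=1.0, sigma=0.54)
axp7 = execution_time_interval(100.0, mean_load=0.12, sigma=0.14)
print([round(v) for v in axp0])  # [100, 306]
print([round(v) for v in axp7])  # [100, 139]
```

Under these assumptions the results match the slide's 100 to 306 ms and 100 to 139 ms, which suggests the slide used the same mapping; replacing sigma with a model's σ_a narrows the interval in the same way.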

21 t+1 prediction

22 t+8 prediction

23 Conclusions
  Load has high variance
  Load is self-similar
  Load exhibits epochal behavior
  Capturing self-similarity in linear time series models improves predictability

24 Load Traces Would a web-accessible load trace database be useful? Would you like to contribute?