Understanding and Predicting Host Load Peter A. Dinda Carnegie Mellon University
2 Talk in a Nutshell Load is self-similar Load exhibits epochal behavior Load prediction benefits from capturing self-similarity Statistical analysis of two sets of week long, 1 Hz resolution traces of load on ~40 machines and evaluation of linear time series models for load prediction
3 Why Study Load? Load partially determines execution time. We want to model and predict load. [Figure: Interactive Application; short tasks with deadlines; Unmodified Distributed System; execution time in [t_min, t_max]?]
4 Load and Execution Time
5 Outline Measurement methodology Load traces Load variance New Results –Self-similarity –Epochal behavior Benefits of capturing self-similarity in linear models Conclusions
6 Measurement Methodology [Figure: Digital Unix kernel: ready queue (RUN) lengths len_t, len_{t-T}, len_{t-2T}, ..., len_{t-29T}, len_{t-30T}, ... feed an exponential average (the 1 minute load "average"). User-level measurement tool: our measurements avg_t, avg_{t-0.5T}, avg_{t-T}, ... (1 Hz sample rate). T = 2 seconds.]
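The exponential averaging in the measurement diagram can be sketched as follows. This is a minimal illustration, not the Digital Unix kernel code: the update rule, the decay constant `exp(-interval_s / window_s)`, and all names here are assumptions chosen to match the slide's labels (T = 2 seconds, 1 minute average).

```python
import math

def load_average(queue_lengths, interval_s=2.0, window_s=60.0):
    """Exponentially average ready-queue lengths sampled every interval_s seconds.

    Assumed update rule: avg = decay * avg + (1 - decay) * length,
    with decay = exp(-interval_s / window_s).
    """
    decay = math.exp(-interval_s / window_s)
    avg = 0.0
    history = []
    for length in queue_lengths:
        avg = decay * avg + (1.0 - decay) * length
        history.append(avg)
    return history

# Hypothetical input: one runnable process for two minutes (60 samples at
# T = 2 s), followed by an empty queue for two minutes.
trace = load_average([1] * 60 + [0] * 60)
```

Under these assumptions the average rises toward 1 while the process runs and decays back toward 0 afterwards, which is why the reported load lags the true queue length.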
7 Load Traces
8 Absolute Variation
9 Relative Variation
10 Load Autocorrelation Periodogram Time Lag Frequency
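A minimal sketch of how the two plots named on this slide can be computed from a load trace. The slides do not say which estimators were used; the sample autocorrelation and the raw periodogram below are the textbook definitions, and the sine-wave input is a hypothetical stand-in for a real trace.

```python
import cmath
import math

def autocorrelation(x, max_lag):
    """Sample autocorrelation of x for lags 0..max_lag."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    var = sum(d * d for d in dev)
    return [sum(dev[i] * dev[i + k] for i in range(n - k)) / var
            for k in range(max_lag + 1)]

def periodogram(x):
    """Raw periodogram at frequencies k/n cycles per sample, k = 1..n//2."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    power = []
    for k in range(1, n // 2 + 1):
        coeff = sum(d * cmath.exp(-2j * math.pi * k * t / n)
                    for t, d in enumerate(dev))
        power.append(abs(coeff) ** 2 / n)
    return power

# Hypothetical signal: a sine wave with a period of 8 samples.
signal = [math.sin(2 * math.pi * t / 8) for t in range(64)]
acf = autocorrelation(signal, 16)
power = periodogram(signal)
# Frequency index (cycles per 64 samples) with the most power.
peak_frequency_index = power.index(max(power)) + 1
```

The direct DFT here costs O(n^2); for week-long 1 Hz traces an FFT routine would be the practical choice.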
11 Visual Self-Similarity Here
12 The Hurst Parameter
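One common way to estimate the Hurst parameter H is the aggregated variance (variance-time) method, sketched below. The slides do not state which estimator was used, so this is an illustration only. It relies on the property that, for a self-similar series, the variance of block means over blocks of size m scales as m^(2H-2); the function name, block sizes, and white-noise input are assumptions.

```python
import math
import random

def hurst_aggregated_variance(x, block_sizes):
    """Estimate H from the slope of log(variance of block means) vs log(m)."""
    points = []
    for m in block_sizes:
        blocks = len(x) // m
        means = [sum(x[i * m:(i + 1) * m]) / m for i in range(blocks)]
        grand = sum(means) / blocks
        var = sum((v - grand) ** 2 for v in means) / (blocks - 1)
        points.append((math.log(m), math.log(var)))
    # Least-squares slope of the log-log points; slope = 2H - 2.
    mx = sum(p[0] for p in points) / len(points)
    my = sum(p[1] for p in points) / len(points)
    slope = (sum((px - mx) * (py - my) for px, py in points)
             / sum((px - mx) ** 2 for px, _ in points))
    return 1.0 + slope / 2.0

# Hypothetical input: independent Gaussian noise, for which H should be
# close to 0.5 (no long-range dependence).
rng = random.Random(0)
white_noise = [rng.gauss(0.0, 1.0) for _ in range(8192)]
h_white = hurst_aggregated_variance(white_noise, [2, 4, 8, 16, 32, 64])
```

Estimates well above 0.5 indicate long-range dependence, the property the talk attributes to host load.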
13 Self-similarity Statistics
14 Why is Self-Similarity Important? Complex structure –Not completely random, nor independent –Short range dependence: excellent for history-based prediction –Long range dependence: possibly a problem Modeling Implications –Suggests models that can capture it: ARFIMA, FGN, TAR
15 Load Exhibits Epochal Behavior
16 Epoch Length Statistics
17 Why is Epochal Behavior Important? Complex structure –Non-stationary Modeling Implications –Suggests models: ARIMA, ARFIMA, etc.; non-parametric spectral methods –Suggests problem decomposition
18 Linear Time Series Models Choose weights ψ_j to minimize σ_a^2. σ_a is the confidence interval for t+1 predictions. [Figure: Unpredictable Random Sequence -> Fixed Linear Filter -> Partially Predictable Load Sequence]
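Choosing filter weights to minimize the one-step prediction error variance can be sketched for the simplest case, a pure AR(p) model. The code below solves the Yule-Walker equations with the Levinson-Durbin recursion. This is a textbook technique swapped in for illustration, not necessarily the fitting procedure used in the talk, and the autocovariance values are hypothetical.

```python
def levinson_durbin(acov, order):
    """Fit AR(order) from autocovariances acov[0..order].

    Returns (phi, err): phi[j] multiplies the value at lag j+1, and err is
    the variance of the one-step prediction error.
    """
    phi = []
    err = acov[0]
    for k in range(1, order + 1):
        acc = acov[k] - sum(phi[j] * acov[k - 1 - j] for j in range(k - 1))
        reflection = acc / err
        phi = [phi[j] - reflection * phi[k - 2 - j]
               for j in range(k - 1)] + [reflection]
        err *= 1.0 - reflection ** 2
    return phi, err

def predict_next(history, mean, phi):
    """One-step-ahead prediction from the most recent len(phi) values."""
    recent = history[-len(phi):][::-1]  # most recent value first
    return mean + sum(c * (v - mean) for c, v in zip(phi, recent))

# Hypothetical autocovariances of an AR(1) process with coefficient 0.8
# and unit variance.
phi, err = levinson_durbin([1.0, 0.8, 0.64], order=2)
half_width_95 = 1.96 * err ** 0.5  # assumes Gaussian prediction errors
```

With no model, the best guess is the mean and the error variance is the full signal variance (1.0 here); the fitted model reduces it to about 0.36, which is the kind of narrowing the talk measures.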
19 Realizable Pole-Zero Models ARFIMA(p,d,q) (self-similarity, d related to Hurst); ARIMA(p,d,q) (non-stationarity, d integer); ARMA(p,q); AR(p); MA(q). p, q are numbers of parameters; d is degree of differencing.
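What separates ARFIMA from ARIMA is that the degree of differencing d may be fractional. A minimal sketch of the differencing operator (1 - B)^d follows, using the standard binomial-expansion recursion for its weights and the usual relation d = H - 0.5 between d and the Hurst parameter. The function names and the truncation length are assumptions.

```python
def differencing_weights(d, count):
    """First `count` weights of (1 - B)^d, where B is the backshift operator."""
    weights = [1.0]
    for k in range(1, count):
        weights.append(weights[-1] * (k - 1 - d) / k)
    return weights

def difference(x, d, count):
    """Apply truncated (1 - B)^d to x; earlier samples use fewer weights."""
    w = differencing_weights(d, count)
    return [sum(w[k] * x[t - k] for k in range(min(count, t + 1)))
            for t in range(len(x))]

integer_weights = differencing_weights(1.0, 4)      # ordinary differencing
fractional_weights = differencing_weights(0.5, 4)   # fractional d
d_from_hurst = 0.8 - 0.5                            # hypothetical H = 0.8
```

For integer d the weights terminate (d = 1 gives x_t - x_{t-1}), while for fractional d they decay slowly and never reach zero, which is how the model carries long-range dependence.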
20 Real World Benefits of Models σ_a is the confidence interval for t+1 predictions. Map work that would take 100 ms at zero load. axp0: σ_z = 0.54, μ = 1.0, σ_a(ARMA(4,4)) = , σ_a(ARFIMA(4,d,4)) = ; no model: 1.0 +/- (95%) => 100 to 306 ms; ARMA: 1.0 +/- (95%) => 178 to 222 ms; ARFIMA: 1.0 +/- (95%) => 179 to 221 ms. axp7: σ_z = 0.14, μ = 0.12, σ_a(ARMA(4,4)) = , σ_a(ARFIMA(4,d,4)) = ; no model: 0.12 +/- (95%) => 100 to 139 ms; ARMA: 0.12 +/- (95%) => 104 to 120 ms; ARFIMA: 0.12 +/- (95%) => 107 to 117 ms. 1 % 40 %
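The mapping from a load confidence interval to an execution time interval can be sketched as follows. Two assumptions are made here, both inferred from the slide's numbers rather than stated on it: execution time is the zero-load time multiplied by (1 + load), and the 95% interval is the mean plus or minus 1.96 standard deviations, with load clipped at zero. The values 0.54, 1.0, 0.14, and 0.12 are read as the standard deviation and mean of load on axp0 and axp7.

```python
def execution_time_interval(nominal_ms, mean_load, sigma, z=1.96):
    """Map a load interval mean_load +/- z*sigma to an execution time range.

    Assumes time = nominal_ms * (1 + load) and that load cannot go below 0.
    """
    low_load = max(0.0, mean_load - z * sigma)
    high_load = mean_load + z * sigma
    return nominal_ms * (1.0 + low_load), nominal_ms * (1.0 + high_load)

# "No model" case: predict the mean, with the raw standard deviation of load.
axp0_low, axp0_high = execution_time_interval(100.0, 1.0, 0.54)
axp7_low, axp7_high = execution_time_interval(100.0, 0.12, 0.14)
# Rounded, these give 100 to 306 ms and 100 to 139 ms, the slide's
# "no model" ranges.
```

Passing a fitted model's smaller prediction error in place of `sigma` narrows the range in the same way, which is the benefit the slide quantifies for ARMA and ARFIMA.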
21 t+1 prediction
22 t+8 prediction
23 Conclusions Load has high variance Load is self-similar Load exhibits epochal behavior Capturing self-similarity in linear time series models improves predictability
24 Load Traces Would a web-accessible load trace database be useful? Would you like to contribute?