Online Prediction of the Running Time Of Tasks Peter A. Dinda Department of Computer Science Northwestern University

2 Overview
- Predict the running time of a task
- Application supplies the task size (in seconds, currently)
- Task is compute-bound (current limitation)
- Prediction is a confidence interval
  - Expresses prediction error
  - Enables statistically valid decision-making in a scheduler
- Based on host load prediction
- Homogeneous Digital Unix hosts (current limitation)
  - System is portable to many operating systems
- Everything in this talk is publicly available

3 Outline
- Running time advisor
- Host load results
- Computing confidence intervals
- Performance evaluation
- Related work
- Conclusions

4 A Universal Challenge in High Performance Distributed Applications
- Highly variable resource availability
  - Shared resources
  - No reservations
  - No globally respected priorities
  - Competition from other users ("background workload")
- Running time can vary drastically
- Adaptation
  - Example goal: soft real-time for interactivity
  - Example mechanism: server selection
- Performance queries

5 Running Time Advisor (RTA)
- Application asks: "What will be the running time of this 3 second task if started now?" (nominal time: the task size, i.e., running time on an empty host)
- Host, running background workload, answers: "It will be 5.3 seconds"
- Entirely user-level tool
- No reservations or admission control
- Query result is a prediction

6 Variability and Prediction
- Exchange high resource-availability variability for low prediction-error variability plus a characterization of that variability
(Figure: resource signal over time, its prediction, the resulting error signal, and the error signal's autocorrelation function (ACF))

7 Running Time Advisor (RTA)
- Application asks: "With 95% confidence, what will be the running time of this 3 second task if started now?"
- Host, running background workload, answers: "It will be 4.1 to 6.3 seconds"
- The CI captures prediction error to the extent the application is interested in it
- Independent of prediction techniques

8 RTA API

9 Outline
- Running time advisor
- Host load results
- Computing confidence intervals
- Performance evaluation
- Related work
- Conclusions

10 Host Load Traces
- DEC Unix 5-second exponential average
- Full bandwidth captured (1 Hz sample rate)
- Long durations

11 Host Load Properties
- Self-similarity (long-range dependence)
- Epochal behavior (non-stationarity)
- Complex correlation structure
[LCR '98, Scientific Programming, 3:4, 1999]

12 Host Load Prediction
- Fully randomized study on traces
- MEAN, LAST, AR, MA, ARMA, ARIMA, ARFIMA models
- AR(16) models most appropriate
- Covariance matrix for prediction errors
- Low overhead: <1% CPU
[HPDC '99, Cluster Computing, 3:4, 2000]
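To make the AR(16) approach concrete, here is a minimal sketch of fitting an AR(p) model to a load trace by least squares and predicting future samples. This is not the RPS implementation; the function names and the least-squares fitting method are illustrative assumptions.

```python
import numpy as np

def fit_ar(signal, p=16):
    """Fit an AR(p) model to a load signal by least squares.
    Returns coefficients a such that
    z[t] ~= a[0]*z[t-1] + ... + a[p-1]*z[t-p]."""
    # Each row of X holds the p samples preceding the target sample.
    X = np.column_stack(
        [signal[p - k - 1 : len(signal) - k - 1] for k in range(p)]
    )
    y = signal[p:]
    coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coeffs

def predict_ar(history, coeffs, steps):
    """Iteratively predict `steps` future samples from the most
    recent p samples of `history`, feeding predictions back in."""
    p = len(coeffs)
    buf = list(history[-p:])          # oldest ... newest
    out = []
    for _ in range(steps):
        nxt = float(np.dot(coeffs, buf[::-1]))  # newest sample first
        out.append(nxt)
        buf = buf[1:] + [nxt]
    return np.array(out)
```

An AR predictor of this form is cheap to evaluate at each step, which is consistent with the <1% CPU overhead the slide reports.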

13 RPS Toolkit
- Extensible toolkit for implementing resource signal prediction systems
- Easy "buy-in" for users
  - C++ and sockets (no threads)
- Prebuilt prediction components
- Libraries (sensors, time series, communication)
- Users have bought in
  - Incorporated in CMU Remos, BBN QuO
[CMU-CS]

14 Outline
- Running time advisor
- Host load results
- Computing confidence intervals
- Performance evaluation
- Related work
- Conclusions

15 A Model of the Unix Scheduler
- Inputs: a task with nominal running time t_nom (running time on an empty host) and the background workload (actual load)
- Output: the actual running time t_act
- t_act = f(t_nom, background workload)

16 A Model of the Unix Scheduler
- Replace the actual load with the predicted load
- Output: the predicted running time t_exp
- t_exp = g(t_nom, predicted load) = t_act + error

17 Available Time and Average Load
- at(t): available time from 0 to t, derived from the average load from 0 to t
- t_act is the minimum t where at(t) = t_nom
- Holds under a fluid model: processor sharing, idealized round-robin, ...
- Replace the load signal with a prediction of the load signal
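Under processor sharing, a task competing with background load z(s) receives a 1/(1 + z(s)) share of the CPU at time s. One plausible way to write the fluid model the slide names (the exact notation in the talk may differ) is:

$$ at(t) = \int_0^t \frac{ds}{1 + z(s)}, \qquad t_{act} = \min\{\, t : at(t) = t_{nom} \,\}. $$

That is, the available time accumulates at a rate set by the instantaneous load, and the task completes once enough available time has accumulated to cover its nominal time.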

18 Discrete Time
- No magic here: this is the obvious discretization
- Δ is the sample interval
- z_{t+j} is replaced with its prediction

19 Confidence Intervals
- z_{t+j} is replaced with its prediction, giving predicted al_i, at_i, and at(t)
- A confidence interval for at(t) follows from a confidence interval for the al_i
- Since at(t) is a sum, the central limit theorem applies to the prediction errors
- A 95% confidence interval then follows from the normal approximation

20 The Variance of the Sum
- Prediction errors a_{t+j} are not independent
- The predictor's covariance matrix captures this
- The predictor therefore makes it possible to compute the variance of the sum, and thus the CI
- Important detail: load discounting
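The key identity is that the variance of a sum of correlated errors is the sum of all entries of their covariance matrix: Var(Σ e_j) = 1ᵀC1. A minimal sketch (the function name and z = 1.96 normal quantile for a 95% interval are illustrative):

```python
import numpy as np

def ci_from_covariance(point_estimate, cov, z=1.96):
    """Confidence interval for a sum of correlated prediction errors.
    Var(sum_j e_j) = 1' C 1, i.e., the sum of every entry of the
    covariance matrix C.  z = 1.96 gives a ~95% normal interval."""
    var = float(np.sum(cov))   # quadratic form with the ones vector
    sd = np.sqrt(var)
    return point_estimate - z * sd, point_estimate + z * sd
```

If the errors were treated as independent, only the diagonal of C would be summed; including the off-diagonal terms is exactly what the predictor's covariance matrix makes possible.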

21 Outline
- Running time advisor
- Host load results
- Computing confidence intervals
- Performance evaluation
- Related work
- Conclusions

22 Experimental Setup
- Environment
  - AlphaStation 255s, Digital Unix 4.0
  - Workload: host load trace playback [LCR 2000]
  - Prediction system on each host: AR(16), MEAN, LAST
- Tasks
  - Nominal time ~ U(0.1, 10) seconds
  - Interarrival time ~ U(5, 15) seconds
  - 95% confidence level
- Methodology
  - Predict CIs
  - Run the task and measure

23 Metrics
- Coverage: fraction of testcases whose measured running time falls within the confidence interval; ideally equals the 95% target
- Span: average length of the confidence interval; ideally as short as possible
- R² between t_exp and t_act
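The two headline metrics are straightforward to compute from a set of predicted intervals and measured running times; a minimal sketch (function name is illustrative):

```python
def coverage_and_span(intervals, actuals):
    """intervals: predicted (lo, hi) CIs; actuals: measured running
    times.  Coverage is the fraction of actuals that fall inside
    their CI; span is the mean CI width."""
    hits = sum(lo <= t <= hi for (lo, hi), t in zip(intervals, actuals))
    span = sum(hi - lo for lo, hi in intervals) / len(intervals)
    return hits / len(intervals), span
```

Note the tension the slide implies: a trivially wide predictor achieves perfect coverage, so coverage near the 95% target only counts when the span is also short.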

24 General Picture of Results
- Five classes of behavior; I'll show you two
- The RTA works: coverage near 95% is possible in most cases
- Predictor quality matters
  - Better predictors lead to smaller spans on lightly loaded hosts and to correct coverage on heavily loaded hosts
  - AR(16) >= LAST >= MEAN
- Performance is slightly dependent on the nominal time

25 Most Common Coverage Behavior

26 Most Common Span Behavior

27 Uncommon Coverage Behavior

28 Uncommon Span Behavior

29 Related Work
- Distributed interactive applications: QuakeViz/Dv, Aeschlimann [PDPTA '99]
- Quality of service: QuO, Zinky, Bakken, Schantz [TPOS, April '97]; QRAM, Rajkumar et al. [RTSS '97]
- Distributed soft real-time systems: Lawrence, Jensen [assorted]
- Workload studies for load balancing: Mutka et al. [PerfEval '91]; Harchol-Balter et al. [SIGMETRICS '96]
- Resource signal measurement systems: Remos [HPDC '98]; Network Weather Service [HPDC '97, HPDC '99]
- Host load prediction: Wolski et al. [HPDC '99] (NWS); Samadani et al. [PODC '95]; Hailperin ['93]
- Application-level scheduling: Berman et al. [HPDC '96]; Stochastic Scheduling, Schopf [Supercomputing '99]

30 Conclusions
- Predict the running time of a compute-bound task, based on host load prediction
- Prediction is a confidence interval
- Confidence interval algorithm: covariance matrix plus load discounting
- Effective for the domain: Digital Unix, 0.1 to 10 second tasks, 5 to 15 second interarrivals
- Extensions in progress

31 For More Information
- All software and traces are available
  - RPS + RTA + RTSA
  - Load traces and playback
- Prescience Lab
- Peter Dinda, Jason Skicewicz, Dong Lu

32 Outline
- Running time advisor
- Host load results
- Computing confidence intervals
- Performance evaluation
- Related work
- Conclusions

33 A Universal Problem
- Which host should the application send the task to so that its running time is appropriate?
- Known resource requirements: "What will the running time be if I..."
- Example: real-time

34 Running Time Advisor
- Application notifies the advisor of the task's computational requirements (nominal time)
- Advisor predicts the running time on each host
- Application assigns the task to the most appropriate host

35 Real-time Scheduling Advisor
- Application specifies the task's computational requirements (nominal time) and its deadline
- Advisor acquires predicted task running times for all hosts
- Advisor chooses one of the hosts where the deadline can be met

36 Confidence Intervals to Characterize Variability
- Application specifies a confidence level (e.g., 95%)
- Running time advisor predicts running times as a confidence interval (CI), e.g., "3 to 5 seconds with 95% confidence"
- Real-time scheduling advisor chooses a host where the CI falls below the deadline
- The CI captures variability to the extent the application is interested in it
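A minimal sketch of this selection step. The tie-breaking rule (prefer the smallest upper bound among qualifying hosts) is an illustrative assumption; the advisor in the talk may choose differently among hosts that can meet the deadline.

```python
def choose_host(ci_by_host, deadline):
    """Pick a host whose predicted running-time CI lies entirely
    below the deadline, i.e., whose CI upper bound beats it.
    Returns None if no host meets the deadline at this confidence."""
    ok = {h: hi for h, (lo, hi) in ci_by_host.items() if hi < deadline}
    if not ok:
        return None
    return min(ok, key=ok.get)   # smallest upper bound (assumed rule)
```

For example, with CIs {"a": (3, 5), "b": (2, 4)} and a deadline of 4.5 seconds, only host "b" qualifies, so it is chosen.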

37 Prototype System
(Figure: system architecture, highlighting the portion covered in this paper)

38 Load Discounting: Motivation
- I/O priority boost
- Short tasks are less affected by load

39 Load Discounting
- Apply before using load predictions
- The discount is an estimable machine property