A Prediction-based Real-time Scheduling Advisor Peter A. Dinda Prescience Lab Department of Computer Science Northwestern University

2 RTSA Real-time Scheduling Advisor
Request: “I have a 5 second job. I want it to finish in under 10 seconds with at least 95% probability. Here is a list of hosts where I can run it. Which one should I use?”
Possible responses: “Use host 3. It’ll finish there in 7 to 9 secs” or “There is no host where that is possible. The fastest is host 5, where it’ll finish in 12 to 15 seconds.”

3 Core Results
RTSA based on predictive signal processing
Layered system architecture for scalable performance prediction
Targets commodity shared, unreserved distributed environments
All at user level
Randomized trace-based evaluation giving evidence of its effectiveness
Limitations
–Compute-bound tasks
–Evaluation on Digital Unix platform
Publicly available as part of RPS system

4 Outline
Motivation: interactive applications
Interface
Implementation
Performance evaluation
Conclusions and future work

5 Interactive Applications on Shared, Unreserved Distributed Computing Environments
Examples: visualization, games, VR
Responsiveness requirements => soft deadlines
No resource reservation or admission control
Constant competition from other users
Changing resource availability => adaptation
Adaptation is via server selection
–Other mechanisms possible

6 Interactive Applications and the RTSA
RTSA controls adaptation mechanisms
Operates on behalf of a single application
Multiple RTSAs may be running independently
Current limitation: compute-bound tasks

7 Interface - Request

“I have a 5 second job. I want it to finish in under 10 seconds with at least 95% probability. Here is a list of hosts where I can run it. Which one should I use?”

struct RTSARequest {
  double tnom;    // Size of task in CPU-seconds
  double sf;      // Maximum slack allowed
  double conf;    // Minimum probability allowed
  Host   hosts[]; // List of hosts to choose from
};

int RTSAAdviseTask(RTSARequest &req, RTSAResponse &resp);

deadline = now + tnom(1+sf)

8 Interface - Response

“Use host 3. It’ll finish there in 7 to 9 secs”

struct RTSAResponse {
  double tnom;    // Size of task in CPU-seconds
  double sf;      // Maximum slack allowed
  double conf;    // Minimum probability allowed
  Host   host;    // Host to use
  RunningTimePredictionResponse runningtime; // Predicted running time of task on that host
};

int RTSAAdviseTask(RTSARequest &req, RTSAResponse &resp);

9 RunningTimePredictionResponse

“The most likely running time is 7.5 seconds. There is a 95% chance that the actual running time will be in the range 7 to 9 seconds.”

struct RunningTimePredictionResponse {
  Host   host;    // Host to use
  double tnom;    // Size of task in CPU-seconds
  double conf;    // Confidence level
  double texp;    // Point estimate of running time
  double tlb;     // Confidence interval of running time (lower bound)
  double tub;     // Confidence interval of running time (upper bound)
};
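
A minimal usage sketch of the interface above, following the slide-2 example (a 5 CPU-second job that must finish within 10 seconds with at least 95% probability). The Host type, struct layouts, and RTSAAdviseTask come from the slides; how the hosts[] array is filled and the meaning of the int return value are not specified there, so those parts are assumptions.

  // Usage sketch only: fill in a request, ask the advisor, read the response.
  void advise_example()
  {
      RTSARequest req;
      req.tnom = 5.0;    // task size in CPU-seconds
      req.sf   = 1.0;    // slack factor: deadline = now + tnom*(1+sf) = now + 10 s
      req.conf = 0.95;   // minimum probability of meeting the deadline
      // req.hosts = ...;  // candidate hosts (Host is defined elsewhere in RPS)

      RTSAResponse resp;
      if (RTSAAdviseTask(req, resp) == 0) {   // assumption: 0 means success
          // resp.host is the recommended host; resp.runningtime.texp is the point
          // estimate and [tlb, tub] the confidence interval for its running time.
      }
  }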

10 Implementation

11 Underlying Components
Host load measurement
–Digital Unix 5 second load average, 1 Hz [LCR98, SciProg99]
Host load prediction
–Periodic linear time series analysis (continuously monitored AR(16) predictors)
–<1% of CPU [HPDC99, Cluster00]
Running time advisor (RTA)
–Task size + host load predictions => confidence interval for running time of task [SIGMETRICS01, HPDC01, Cluster02]
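
As a rough illustration of the AR(16) load predictor mentioned above, the sketch below applies an already-fitted AR(p) model to produce a one-step-ahead load estimate. The fitting step (e.g., Yule-Walker) and the continuous refitting and error monitoring that RPS performs are not shown; the function and variable names are hypothetical.

  #include <vector>

  // One-step-ahead AR(p) prediction on mean-removed load samples:
  // next = mean + sum_i coeffs[i] * (history[t-i] - mean).
  // The coefficients come from a fitting step that is not shown here.
  double ar_predict(const std::vector<double>& coeffs,   // a[0..p-1], most recent lag first
                    const std::vector<double>& history,  // recent load samples, newest last
                    double mean)                         // long-run mean removed before fitting
  {
      double pred = 0.0;
      const std::size_t p = coeffs.size();
      for (std::size_t i = 0; i < p; ++i)
          pred += coeffs[i] * (history[history.size() - 1 - i] - mean);
      return pred + mean;
  }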

12 RTSA Implementation Simplified
[Figure: predicted running time (texp) of a task of size tnom on each candidate host]
RTA predicts running time on each host

13 RTSA Implementation Simplified
[Figure: predicted running times compared against the deadline = (1+sf)·tnom]
RTSA picks randomly from among the hosts where the deadline can be met. If there is no such host, RTSA returns the host with the lowest running time. RTSA also returns the estimate of the running time.

14 Prediction Error
Predictions are not perfect
–Some machines harder to predict than others
Need more than a point estimate (texp)
Predictors can estimate their quality
–Covariance matrix for prediction errors
–Estimate of predictor error also continually monitored for accuracy
Confidence interval captures this
Deadline probability serves as confidence level
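
The sketch below shows one simple way a point estimate and an error variance can be turned into a confidence interval, assuming approximately normal prediction errors and a small hard-coded quantile table. It is only an illustration; the actual RTA computation in the cited papers derives the variance from the load predictor's error covariance over the task's expected lifetime.

  #include <cmath>

  struct Interval { double texp, tlb, tub; };

  // Illustrative only: symmetric normal-approximation confidence interval
  // around the point estimate texp, given an estimated error variance.
  Interval confidence_interval(double texp, double error_variance, double conf)
  {
      // Two-sided normal quantile for the requested confidence level,
      // e.g. conf = 0.95 -> z ~= 1.96.
      double z = (conf >= 0.99) ? 2.576 : (conf >= 0.95) ? 1.960 : 1.645;
      double half_width = z * std::sqrt(error_variance);
      return { texp, texp - half_width, texp + half_width };
  }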

15 RTSA Implementation
[Figure: confidence intervals [tlb, tub] of predicted running times (conf = 95%) compared against the deadline = (1+sf)·tnom]
RTSA picks randomly from among the hosts where the deadline can be met even given the maximum running time captured in the confidence interval. If there is no such host, RTSA returns the host with the lowest running time. RTSA also returns the estimate of the running time.
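
A minimal sketch of the selection rule on slides 13 and 15, assuming a hypothetical per-host prediction record mirroring the fields of RunningTimePredictionResponse; the real RTSA obtains these predictions by querying the RTA for each host, which is not shown.

  #include <vector>
  #include <random>

  // Hypothetical per-host prediction (fields as on slide 9).
  struct HostPrediction {
      int    host;
      double texp;   // point estimate of running time
      double tub;    // upper bound of the confidence interval
  };

  // Pick uniformly at random among hosts whose upper-bound running time meets
  // the deadline; if none qualifies, fall back to the lowest predicted running time.
  int choose_host(const std::vector<HostPrediction>& preds,
                  double tnom, double sf, std::mt19937& rng)
  {
      if (preds.empty()) return -1;
      const double deadline = (1.0 + sf) * tnom;

      std::vector<int> feasible;
      for (const auto& p : preds)
          if (p.tub <= deadline)
              feasible.push_back(p.host);

      if (!feasible.empty()) {
          std::uniform_int_distribution<std::size_t> pick(0, feasible.size() - 1);
          return feasible[pick(rng)];
      }

      // No host can meet the deadline: return the fastest one anyway.
      int best = preds.front().host;
      double best_t = preds.front().texp;
      for (const auto& p : preds)
          if (p.texp < best_t) { best_t = p.texp; best = p.host; }
      return best;
  }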

16 Experimental Setup
Environment
–AlphaStation 255s, Digital Unix 4.0
–Private network
–Separate machine for submitting tasks
–Prediction system on each host
Background (BG) workload: host load trace playback [LCR00]
–Traces from PSC Alpha cluster, wide range of CMU machines
–Reconstruct any combination of these machines (scenario)
Testcase: submit synthetic task to system, run on host that RTSA selects, measure result

17 Scenarios

Name  Num hosts  Average Load  Average Epoch
4LS   4          High          Small
4SL   4          Low           Large
4MM   4          Mixed         Mixed
5SS   5          Low           Small
4MS   4          Mixed         Small
4SM   4          Low           Mixed
2CS   2          large memory compute servers
2MP   2          very predictable hosts

18 The Metrics
Fraction of deadlines met
–Probability of meeting deadline
Fraction of deadlines met when predicted
–Probability of meeting deadline if RTSA claims it is possible
Number of possible hosts
–Degree of randomness in RTSA’s decision
–High randomness means different RTSAs are unlikely to conflict

19 Testcases
Synthetic compute-bound tasks
–Size: 0.1 to 10 seconds, uniform
–Interarrival: 5 to 15 seconds, uniform
–sf: 0 to 2, uniform
–conf: 0.95 in all cases
8,000 to 16,000 testcases for each scenario
How do metrics vary with scenario, size, sf?
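
For concreteness, the sketch below generates one synthetic testcase with the distributions listed above (size 0.1–10 s, interarrival 5–15 s, slack factor 0–2, all uniform, conf fixed at 0.95). It is a sketch of the evaluation workload as described on the slide, not code from RPS itself.

  #include <random>

  // Hypothetical testcase record and generator matching slide 19.
  struct Testcase { double tnom, interarrival, sf, conf; };

  Testcase make_testcase(std::mt19937& rng)
  {
      std::uniform_real_distribution<double> size(0.1, 10.0);   // task size (CPU-seconds)
      std::uniform_real_distribution<double> gap(5.0, 15.0);    // interarrival time (s)
      std::uniform_real_distribution<double> slack(0.0, 2.0);   // slack factor sf
      return { size(rng), gap(rng), slack(rng), 0.95 };
  }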

20 The RTSA Implementations
AR(16)
–RTSA as described here
–Instantiated with the AR(16) load predictor
MEASURE
–Send task to host with lowest load
–Does not return predicted running time
–High probability of conflicts
RANDOM
–Send task to a random host
–Does not return predicted running time
–Low probability of conflicts

21 Fraction of Deadlines Met – 4LS Performance gain from prediction

22 Fraction of Deadlines Met – 4LS Performance gain from prediction

23 Fraction of Deadlines Met – 4LS Highest performance gain from prediction near “critical slack”

24 Fraction of Deadlines Met When Predicted – 4LS Only predictive strategy can indicate whether meeting deadline is possible

25 Fraction of Deadlines Met When Predicted – 4LS Only predictive strategy can indicate whether meeting deadline is possible

26 Fraction of Deadlines Met When Predicted – 4LS Operating near critical slack is most challenging

27 Number of Possible Hosts – 4LS Predictive strategy introduces “appropriate randomness”

28 Number of Possible Hosts – 4LS Predictive strategy introduces “appropriate randomness”

29 Number of Possible Hosts – 4LS Operation near “critical slack” is most challenging

30 Conclusions and Future Work
Introduced RTSA concept
Described prediction-based implementation
Demonstrated feasibility
Evaluated performance
Current and future work
–Incorporate communication, memory, disk
–Improved predictive models

31 For More Information Peter Dinda – RPS – Prescience Lab –