
1 Predicting Queue Waiting Time in Batch Controlled Systems Rich Wolski, Dan Nurmi, John Brevik, Graziano Obertelli Computer Science Department University of California, Santa Barbara

2 Problem: Predicting Delay in Batch Queues
Time in queue is experienced as application delay
Sounds like an easy problem, but
—Distribution of load from users is a matter of some debate
—Scheduling policy is partially hidden
—Sites need to change the policies dynamically and without warning
—Job execution times are difficult to predict
Much research in this area over the past 20 years, but few solutions
—Current commercial systems provide high variance estimates
—Most sites simply disable this feature

3 Hard Problem

4 For Scheduling: It’s all about the big Q
Predictions of the form
—“What is the maximum time my job will wait with X% certainty?”
—“What is the minimum time my job will wait with X% certainty?”
Requires two estimates if certainty is to be quantified
—Estimate the (1-X) quantile for the distribution of availability => Q_x
—Estimate the upper or lower X% confidence bound on the statistic Q_x => Q_(x,lb)
If the estimates are unbiased and the distribution is stationary, future availability duration will be larger than Q_(x,lb) X% of the time, guaranteed

5 New Predictive Methodology
New quantile estimator based on the Binomial distribution
—Requires a carefully engineered numerical system to deal with large-scale combinatorics
New changepoint detector
—Binomial method in a time series context is difficult
—Need a system to determine
–Stationary regions in the data
–Minimum statistically meaningful history in each region
New clustering methodology (coming soon)
—More accurate estimates are possible if predictions are made from jobs with similar characteristics
—Takes dynamic policy changes into account more effectively
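A minimal sketch of how a binomial quantile bound of the kind named on this slide can be computed. It is not the authors' implementation; it uses the standard order-statistic argument and assumes the observed waits are independent draws from one stationary distribution. If n waits are sorted, the k-th smallest is at least the q quantile with probability P(Binomial(n, q) <= k-1), so the smallest k for which that probability reaches the desired confidence gives an upper bound. The function names are hypothetical, and the direct summation is only suitable for short histories; it does not replace the carefully engineered numerics the slide calls for.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(Binomial(n, p) <= k), summed directly (adequate for small n only)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def upper_bound_index(n, q, confidence):
    """Smallest 1-based rank k whose order statistic is >= the q quantile
    with probability >= confidence; None if the history is too short."""
    for k in range(1, n + 1):
        if binom_cdf(k - 1, n, q) >= confidence:
            return k
    return None

def quantile_upper_bound(waits, q=0.95, confidence=0.95):
    """Upper confidence bound on the q quantile of queue wait times."""
    k = upper_bound_index(len(waits), q, confidence)
    if k is None:
        return None  # not enough observations for this (q, confidence) pair
    return sorted(waits)[k - 1]

def min_history(q=0.95, confidence=0.95):
    """Fewest observations for which the largest one is a valid bound:
    the smallest n with 1 - q**n >= confidence."""
    n = 1
    while 1 - q**n < confidence:
        n += 1
    return n
```

Under these assumptions, a 95% confidence bound on the 0.95 quantile needs at least 59 observations, which is one way to read "minimum statistically meaningful history" for a stationary region.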

6 Ten Years of Supercomputing

7 See it In Action http://pompone.cs.ucsb.edu/~rgarver/bqindex.php

8 Predicting Things Upside Down
Deadline scheduling: My job needs to start in the next X seconds for the results to be meaningful.
—Amitava Mujumdar, Tharaka Devaditha, Adam Birnbaum (SDSC)
–Need to run a 4 minute image reconstruction that completes in the next 8 minutes
Given a
—Machine
—Queue
—Processor count
—Run time
—Deadline
What is the probability that a job will meet the deadline?
http://pompone.cs.ucsb.edu/~rgarver/invbqueue.php
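The inverted question on this slide can be sketched as follows. This is an illustration under stated assumptions, not the code behind the linked tool: it assumes a history of waits already filtered to the machine, queue, and processor count of interest, and that those waits are independent and stationary. The job meets its deadline if it starts within slack = deadline - run time. The sketch counts how many historical waits fit in the slack and then bisects for the largest quantile q whose binomial upper bound still fits, so the result reads as "with the given confidence, at least this fraction of jobs start in time". All names are hypothetical.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(Binomial(n, p) <= k), summed directly (adequate for small n only)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def deadline_probability(waits, run_time, deadline, confidence=0.95, tol=1e-6):
    """Conservative probability that a job starts early enough to finish
    by the deadline, given historical queue waits in the same units."""
    slack = deadline - run_time
    n = len(waits)
    m = sum(1 for w in waits if w <= slack)  # historical waits that fit
    if slack < 0 or m == 0:
        return 0.0
    # The m-th smallest wait bounds the q quantile from above with the
    # requested confidence exactly when binom_cdf(m - 1, n, q) >= confidence.
    # That expression decreases in q, so bisect for the largest such q.
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if binom_cdf(m - 1, n, mid) >= confidence:
            lo = mid
        else:
            hi = mid
    return lo

# The slide's example: a 4 minute run that must complete within 8 minutes.
# The wait history here is made up purely for illustration.
example_waits = list(range(59))  # seconds
p = deadline_probability(example_waits, run_time=240, deadline=480)
```

Because the estimate is a lower confidence bound, it stays below 1 even when every historical wait fits within the slack.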

9 How Well Does it Work with an Application?
EMAN
[Figure: EMAN workflow diagram; labels: Preliminary 3D model, Particles, Electron Micrograph, Refine, Final 3D model]
EMAN has been developed at Baylor College of Medicine by the research group of Wah Chiu and Steven Ludtke ({wah,sludtke}@bcm.tmc.edu)

10 VGrADS EMAN Batch Scheduler
EMAN emulator
—Run the EMAN scheduler to determine a job launch sequence
—Launch the jobs by submitting them to the queues specified by the scheduler
—When an EMAN job acquires the processors, exit and “sleep” the emulator for the predicted execution time
–Saves system allocation time
—Record the overall makespan
Experiment:
—Chicago TeraGrid, SDSC TeraGrid, NCSA TeraGrid and CNSI Dell at UCSB
—57 separate runs
Results: mean observed and mean predicted makespans are not significantly different at alpha = 0.05
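The slide does not say which significance test was used. As one plausible reading, the sketch below runs a paired two-sided test on the per-run differences between observed and predicted makespan, using a normal approximation that is reasonable for a sample of about 57 runs. The function name and the choice of test are assumptions, and no data from the experiment is reproduced here.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def paired_mean_test(observed, predicted, alpha=0.05):
    """Two-sided paired test of whether the mean difference is zero.
    Uses a normal approximation to the test statistic (an assumption;
    a t distribution would be slightly more conservative).
    Requires at least two runs and differences that are not all equal."""
    diffs = [o - p for o, p in zip(observed, predicted)]
    n = len(diffs)
    z = mean(diffs) / (stdev(diffs) / sqrt(n))
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value, p_value < alpha  # True means "significantly different"
```

A result of False in the last position corresponds to the slide's statement that the two means are not significantly different at the chosen alpha.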

11 95% Upper Bound on Median

12 Clustering
RMS ratio of Binomial with Clustering to without
—Both achieve 95% correctness
—Measures “tightness” improvement through clustering
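The slide does not define the RMS measure. One reading, assumed here, is the root-mean-square gap between each predicted upper bound and the wait actually observed, computed once for predictions made with clustering and once without; a ratio below 1 then means clustering gives tighter bounds. "Correctness" is taken to be the fraction of jobs whose observed wait fell at or below the predicted bound. The names below are hypothetical.

```python
from math import sqrt

def rms_gap(bounds, actual):
    """Root-mean-square distance between predicted bounds and observed waits."""
    return sqrt(sum((b - a) ** 2 for b, a in zip(bounds, actual)) / len(actual))

def correctness(bounds, actual):
    """Fraction of jobs whose observed wait did not exceed the predicted bound."""
    return sum(1 for b, a in zip(bounds, actual) if a <= b) / len(actual)

def rms_ratio(bounds_clustered, bounds_plain, actual):
    """RMS gap with clustering divided by RMS gap without it."""
    return rms_gap(bounds_clustered, actual) / rms_gap(bounds_plain, actual)
```

Comparing the ratio only makes sense when both methods reach the same correctness, which is why the slide notes that both achieve 95%.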

13 Batch Queue Prediction for Grid Systems
A good point-valued prediction remains elusive
Grid users certainly can use bounds instead
—Early job completion is okay, typically
—Bounds give a good intuitive feel for which queue will be quickest
Automatic schedulers are coming
—EMAN doesn’t use ranges…it should
—VGrADS is developing new schedulers (workflow)
—NEESGrid and ISI are in development (workflow)
—Large-scale sensor network simulation

14 What’s Next?
Open questions:
—Does the availability of predictions affect load?
–Rolling out production tools now and we will be monitoring
–Job cancellation does not affect results
—If it does, will allocations be stable?
–Grid economies
Virtual resource reservations (VGrADS)
—Conditional prediction and resubmission
—Virtual Cluster??
Thanks
—NSF SCI, VGrADS, SDSC, TACC
Us: rich@cs.ucsb.edu, nurmi@cs.ucsb.edu

