1
Modelling a supercomputer with the model
Tuan V. Dinh, Lachlan Andrew and Yoni Nazarathy
Australia and New Zealand Applied Probability Workshop

2
10 July 2013, Australia and New Zealand Applied Probability Workshop
Supercomputer clusters
- Large-scale simulation: climate, genome, astronomy, etc.
- Foundation of cloud computing.
- Big data and exascale computing: more computing power is desired.
- Costs: electricity bills; heat (thermal management); investment (cooling systems, hardware, etc.).

3
Power proportionality
[Figure: power versus load for a single server (1), ideal versus reality, with the 60% peak level marked.]
- An idle server draws about 60% of peak power.
- Idea: turn off idle servers.
- Challenges: switching cost (setup, wear and tear) and performance impacts.
- Swinburne Supercomputer.
(1) Barroso and Hölzle, “The Case for Energy-Proportional Computing”.

4
An energy saving framework
[Diagram: CONTROL FRAMEWORK built around a system congestion model.]
- Question: how many active servers are needed?
- To consider: historical implications, ongoing system states, arrival characteristics, job elapsed times.
- Objective: min (energy + switching + performance penalty).

5
Congestion model
[Diagram: CONTROL FRAMEWORK.]
- Question: how many active servers are needed?
- To consider: historical implications, ongoing system states, arrival characteristics, job elapsed times.
- Objective: min (energy + switching + performance penalty).

6
Congestion model …
- Arrivals: batch Poisson, with a rate function and a batch size distribution.
- Service times: i.i.d., with a common c.d.f.
Why this model?
- Jobs arrive in a “batch” manner, i.e. within seconds of each other, from the same user.
- The system is mostly under-utilized, which motivates the infinite-server approximation.
- Daily variations are substantial.

7
Discrete-time cost
[Timeline: current time t (currently running jobs), a future slot t+k, and the end of the window T+t.]
The jobs in the system at t+k come from two groups:
- {jobs arriving in (t, t+k], still around at t+k}
- {jobs arriving before t, still around at t+k}
Per-slot cost: $C(k) = C_1(k) + C_2(k) + C_3(k)$
- $C_1(k)$: energy, the $n(k)$ term
- $C_2(k)$: switching, the $|n(k) - n(k-1)|$ term
- $C_3(k)$: performance penalty
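The per-slot cost can be illustrated with a short sketch. This is only an illustration under stated assumptions: the weights `w_energy`, `w_switch` and `w_penalty` are hypothetical, and the penalty is assumed here to be proportional to the shortfall of active servers below the load, which the slide does not specify.

```python
def slot_cost(n_now, n_prev, load, w_energy=1.0, w_switch=1.0, w_penalty=1.0):
    """Cost of one slot: energy + switching + performance penalty.

    n_now, n_prev: active servers in this slot and the previous one.
    load: number of jobs needing a server in this slot.
    All weights are hypothetical; the penalty form is an assumption.
    """
    energy = w_energy * n_now                      # C_1(k): grows with n(k)
    switching = w_switch * abs(n_now - n_prev)     # C_2(k): |n(k) - n(k-1)|
    penalty = w_penalty * max(load - n_now, 0)     # C_3(k): assumed shortfall penalty
    return energy + switching + penalty


def total_cost(n, loads, n_initial, **weights):
    """Sum the slot costs along a schedule n(0), n(1), ... for given loads."""
    total = 0.0
    prev = n_initial
    for n_k, load_k in zip(n, loads):
        total += slot_cost(n_k, prev, load_k, **weights)
        prev = n_k
    return total
```

For example, `total_cost([5, 5], [7, 4], n_initial=3)` charges switching only in the first slot, where the server count moves from 3 to 5.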

8
Optimization formulation
Per-slot cost: $C(k) = C_1(k) + C_2(k) + C_3(k)$, with $C_1(k)$ the energy term in $n(k)$, $C_2(k)$ the switching term in $|n(k) - n(k-1)|$, and $C_3(k)$ the performance penalty.
The optimization problem over these costs is labelled (*).
- Solving (*) requires load estimation in the far future.
- The system can feed back the ACTUAL load U(s) for s < k.

9
A Model Predictive Control framework
[Diagram: CONTROL FRAMEWORK, with MPC as the controller.]
- Question: how many active servers are needed?
- To consider: historical implications, ongoing system states, arrival characteristics, job elapsed times.
- Objective: min (energy + switching + performance penalty).

10
Model Predictive Control execution
[Timeline: at time t the window runs to T+t; at time t+1 it shifts to run to T+t+1.]
- At time t: solve (**), obtain {n*(0), n*(1), …}, and ONLY “execute” n*(0).
- At time t+1: solve (**) again over the shifted window, obtain {n*(0), n*(1), …}, and ONLY “execute” n*(0).
(**) uses a limited look-ahead:
1. It is less sensitive to load estimation accuracy.
2. It uses “on-going” information: at t+1 we know how many jobs actually arrived in (t, t+1].
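The receding-horizon loop described on this slide can be sketched as follows. This is a generic sketch, not the authors' controller: `solve_horizon` stands in for a solver of (**), and `hold_last_load` is a hypothetical toy solver included only so the loop can run.

```python
def run_mpc(solve_horizon, observe_load, n_initial, steps, horizon):
    """Receding-horizon loop: plan over the window, execute only the first decision.

    solve_horizon(t, n_prev, history, horizon) -> [n*(0), ..., n*(horizon-1)]
    observe_load(t) -> the load actually seen in slot t (fed back to the solver).
    """
    n_prev = n_initial
    history = []      # actual loads observed so far
    executed = []     # decisions that were actually applied
    for t in range(steps):
        plan = solve_horizon(t, n_prev, history, horizon)
        n_now = plan[0]                   # ONLY n*(0) is executed
        executed.append(n_now)
        history.append(observe_load(t))   # on-going information for the next solve
        n_prev = n_now
    return executed


def hold_last_load(t, n_prev, history, horizon):
    """Hypothetical toy solver: plan to match the most recently observed load."""
    level = history[-1] if history else n_prev
    return [level] * horizon
```

With loads `[3, 5, 2]` and `n_initial=4`, `run_mpc(hold_last_load, lambda t: loads[t], 4, 3, 3)` executes one decision per slot and discards the rest of each plan, which is the point of the scheme: later decisions are always recomputed with fresher information.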

11
Solving the optimization problem
Per-slot cost: $C(k) = C_1(k) + C_2(k) + C_3(k)$, for $k = 0, 1, \ldots, K-1$, with $C_1(k)$ the energy term in $n(k)$, $C_2(k)$ the switching term in $|n(k) - n(k-1)|$, and $C_3(k)$ the performance penalty.
- A Normal approximation is applied.
- This gives problem (***): an objective built from $\{\, n(k) + |u(k)| \,\}$, subject to constraints for $k = 0, 1, \ldots, K-1$.
- (***) is solved numerically using LP.
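The slide does not show how the Normal approximation enters the constraints. One common use, sketched here purely as an assumption, is to size the server count so that a Normally distributed load exceeds it with at most a chosen probability. The name `overflow_prob` is hypothetical and is not the ε used elsewhere in the slides.

```python
import math
from statistics import NormalDist


def servers_needed(mean_load, var_load, overflow_prob):
    """Smallest integer n with P(load > n) <= overflow_prob,
    treating the load as Normal(mean_load, var_load).

    This is an assumed sizing rule, not the constraint from the slides.
    """
    z = NormalDist().inv_cdf(1.0 - overflow_prob)   # standard Normal quantile
    return max(0, math.ceil(mean_load + z * math.sqrt(var_load)))
```

Because the rule needs only a mean and a variance per slot, it pairs naturally with load estimates that are summarised by their first two moments.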

12
X(k): new arrivals
X(k) = {jobs arriving in (t, t+k], still around at t+k}.
- [Carrillo, 89]: X(k) is a compound Poisson RV, $X(k) = \sum_{i=1}^{N} b_i$, with a batch rate evaluated at s = (k + 1/2)Δ, where Δ is the slot time.
- N ~ Poisson( ); the $b_i$ are i.i.d. batch sizes with a given mean and variance.
- This holds even if the arrival process is NOT Poisson [Whitt, 99].
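The mean and variance of a compound Poisson sum follow from standard identities, which is why only the batch size mean and variance are needed. A minimal sketch, with hypothetical argument names:

```python
def compound_poisson_moments(poisson_mean, batch_mean, batch_var):
    """Mean and variance of X = b_1 + ... + b_N, with N ~ Poisson(poisson_mean)
    and i.i.d. b_i independent of N.

    E[X]   = E[N] * E[b]
    Var[X] = E[N] * E[b^2] = E[N] * (Var[b] + E[b]^2)
    """
    mean = poisson_mean * batch_mean
    var = poisson_mean * (batch_var + batch_mean ** 2)
    return mean, var
```

For instance, a Poisson mean of 2 with batch sizes of mean 3 and variance 1 gives a mean of 6 and a variance of 20. The variance is well above the mean, which is the extra burstiness that batching introduces.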

13
U(k): existing jobs
U(k) = {jobs arriving before t, still around at t+k}.
- [Carrillo, 91]: U(k) is a binomial RV, with its two parameters evaluated at s = (k + 1/2)Δ, where Δ is the slot time.
- One can use job elapsed runtimes in this calculation [Whitt, 99].
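One way elapsed runtimes can be used, sketched here as an assumption rather than the authors' exact formula, is through the conditional survival probability: a job that has already run for time a is still running after a further time s with probability (1 - G(a + s)) / (1 - G(a)), where G is the service time c.d.f. The function names below are hypothetical.

```python
import math


def survival_prob(elapsed, s, service_cdf):
    """P(job still running after a further time s | it has run for `elapsed`)."""
    tail_now = 1.0 - service_cdf(elapsed)
    if tail_now <= 0.0:
        return 0.0   # the c.d.f. says such a job should already have finished
    return (1.0 - service_cdf(elapsed + s)) / tail_now


def existing_jobs_moments(elapsed_times, s, service_cdf):
    """Mean and variance of the number of current jobs still around after time s,
    treating jobs as independent Bernoulli trials (an assumption)."""
    probs = [survival_prob(a, s, service_cdf) for a in elapsed_times]
    mean = sum(probs)
    var = sum(p * (1.0 - p) for p in probs)
    return mean, var


def exponential_cdf(rate):
    """Example service c.d.f. used only for illustration."""
    return lambda x: (1.0 - math.exp(-rate * x)) if x > 0 else 0.0
```

When every job has the same survival probability, the count is binomial, as on the slide. With differing elapsed times the probabilities differ per job, and the sum above still gives the mean and variance.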

14
Summary of analytical framework
[Diagram: CONTROL FRAMEWORK, with MPC, LP optimization and Normal approximation as its components.]
- Question: how many active servers are needed?
- To consider: historical implications, ongoing system states, arrival characteristics, job elapsed times.
- Objective: min (energy + switching + performance penalty).

15
Numerical evaluation
[Diagram: Swinburne supercomputer logs feed a supercomputer simulator; the simulator passes system states to the CONTROLLER and receives control decisions; the output is cost performance.]

16
Scheme 1: All up (no turn off)
[Diagram: Swinburne supercomputer logs feed the supercomputer simulator; the controller is NO CONTROL; the output is cost performance.]

17
Scheme 2: t_wait heuristic
[Diagram: Swinburne supercomputer logs feed the supercomputer simulator; the controller is the t_wait heuristic; the output is cost performance.]
- Rule: a server that has been idle for t_wait is turned OFF.
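The idle-timeout rule can be sketched in a few lines. The data layout is an assumption made for illustration: `idle_since` maps a hypothetical server id to the time it became idle, or to `None` while it is busy.

```python
def servers_to_turn_off(idle_since, now, t_wait):
    """Return the ids of servers that have been idle for at least t_wait.

    idle_since: dict of server id -> time the server became idle (None if busy).
    """
    return sorted(
        server_id
        for server_id, start in idle_since.items()
        if start is not None and now - start >= t_wait
    )
```

For example, with `idle_since = {"a": 0.0, "b": None, "c": 90.0}`, `now = 100.0` and `t_wait = 60.0`, only server `"a"` qualifies. The rule reacts only to past idleness, which is what distinguishes it from the predictive scheme.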

18
Scheme 3: predictive control
[Diagram: Swinburne supercomputer logs feed the supercomputer simulator; the controller is MPC, with its inputs estimated from historical data; the output is cost performance.]

19
S.3: rate function
[Figure: arrival rate versus time of day.]
- Use daily periodic rates.
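A daily periodic rate can be estimated from logs by folding arrival times onto a single day and averaging per time-of-day bin. The sketch below assumes timestamps in seconds and hourly bins by default; the estimator actually used for the slides is not shown, so this is only one plausible reading.

```python
from collections import Counter

SECONDS_PER_DAY = 86_400


def daily_periodic_rate(arrival_times, num_days, bins_per_day=24):
    """Estimate arrivals per second for each time-of-day bin.

    arrival_times: timestamps in seconds (for batch arrivals, one per batch).
    num_days: number of days covered by the log.
    """
    bin_width = SECONDS_PER_DAY / bins_per_day
    counts = Counter(
        int((t % SECONDS_PER_DAY) // bin_width) for t in arrival_times
    )
    # Average count per day in each bin, divided by the bin length in seconds.
    return [counts.get(b, 0) / (num_days * bin_width) for b in range(bins_per_day)]
```

The returned list has one rate per bin, so the rate at any time of day is a lookup by bin index, repeated identically every day.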

20
S.3: service time & batch size
- [Lublin et al., 2003]: Hyper-Gamma, Log-uniform.
- [Li et al., 2005]: Log Normal, Weibull.
[Figures: c.d.f. of service time G versus time (sec), and c.d.f. of batch size X versus size (CPU); curves: Empirical (2010) and Gamma.]
- Our approximations only concern the MEAN and VARIANCE of X.
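Since only the mean and variance matter, a Gamma curve can be matched to data by moments: shape = mean² / variance and scale = variance / mean. The sketch below illustrates that standard identity; it is not a claim about how the Gamma curve on the slide was fitted.

```python
from statistics import fmean, pvariance


def gamma_moment_fit(samples):
    """Return (shape, scale) of the Gamma distribution whose mean and
    variance equal the sample mean and population variance."""
    mean = fmean(samples)
    var = pvariance(samples, mu=mean)
    if mean <= 0 or var <= 0:
        raise ValueError("need a positive mean and variance to fit a Gamma")
    return mean * mean / var, var / mean
```

By construction, `shape * scale` recovers the sample mean and `shape * scale ** 2` recovers the sample variance.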

21
S.3: cost performance
[Figure: normalised cost versus ε, where ε ~ service availability.]
- Cost 1 = total cost when there is NO CONTROL (energy only).
- Simulation period: 1 year.

22
Cost performance: all schemes
[Figure: cost for S.1, S.2 and S.3 (ε = 0.58), compared with the “offline” optimal cost [Lu et al., 12], which has no performance penalty.]
- Consider the predictive setting (S.3) whose demand penalty cost is the same as that of the t_wait heuristic (S.2).
- After all, the model is there to estimate the θ(k)s.
- There is still > 20% to gain.

23
Remarks and considerations
1. Room for improvement: ~20% to gain!
2. Examining our estimations:
   - Rate function not accurate.
   - Use job elapsed times.
   - Normal approximation?
3. Is there a fundamental bound on what can be achieved given uncertainty? [Dinh, Andrew and Branch, CCGrid13]

24
Thank you
[Diagram: CONTROL FRAMEWORK, with MPC, LP optimization and Normal approximation as its components.]
- Question: how many active servers are needed?
- To consider: historical implications, ongoing system states, arrival characteristics, job elapsed times.
- Objective: min (energy + switching + performance penalty).

25
The objective cost
[Diagram: CONTROL FRAMEWORK.]
- Question: how many active servers are needed?
- To consider: historical implications, ongoing system states, arrival characteristics, job elapsed times.
- Objective: min (energy + switching + performance penalty).
