What Analytical Performance Modeling Teaches Us About Computer Systems Design
Mor Harchol-Balter, Computer Science Dept, CMU



2 Defn: Queueing Theory: the study of queues, congestion, resource management, and stochastic (probabilistic) modeling.
Our Goal: Computer Systems Design
CMU CS Dept: 15-359, 15-857

3 TERMINOLOGY WARMUP
Incoming jobs arrive at a single-server queue.
λ: Avg. rate at which jobs arrive (jobs/sec)
μ: Avg. rate at which jobs are served (server speed)
Mean response time: E[T]
Examples of single-server queues: router, supercomputing center, database lock queue, web server.

4 Q1: TERMINOLOGY WARMUP
λ: Avg. rate at which jobs arrive (jobs/sec)
μ: Avg. rate at which jobs are served (server speed)
Mean response time: E[T]
QUESTION 1: Suppose λ → 2λ. We want to keep E[T] unchanged. Should we:
(a) Double the service rate
(b) More than double the service rate
(c) Less than double the service rate
Impact: Be careful not to overprovision!
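The slide does not name a queueing model, so the following is only a sketch under an assumed M/M/1 queue, where E[T] = 1/(μ − λ). Under that assumption, holding E[T] fixed while λ doubles requires a new service rate of μ + λ, which is less than 2μ. The function names and the numbers 3.0 and 5.0 are illustrative, not from the talk.

```python
def mm1_mean_response_time(arrival_rate, service_rate):
    """Mean response time E[T] = 1 / (mu - lambda) in an M/M/1 queue (assumed model)."""
    if arrival_rate >= service_rate:
        raise ValueError("queue is unstable: need arrival_rate < service_rate")
    return 1.0 / (service_rate - arrival_rate)


def service_rate_for_target(arrival_rate, target_response_time):
    """Service rate that gives the target E[T] at the given arrival rate."""
    return arrival_rate + 1.0 / target_response_time


lam, mu = 3.0, 5.0                      # illustrative values
baseline = mm1_mean_response_time(lam, mu)
new_mu = service_rate_for_target(2 * lam, baseline)

print(f"baseline E[T] = {baseline}")
print(f"service rate needed after doubling arrivals = {new_mu} (doubling would be {2 * mu})")
```

In this sketch the required service rate grows by λ rather than by μ, which is why doubling the server speed would overprovision.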

5 Workload Distribution Warmup
Heavy tail: the top 1% of jobs comprise half the load. Huge variability.
Heavy tails are everywhere in CS:
CPU lifetimes of UNIX jobs [Harchol,Downey96]
Supercomputing job sizes [Harchol-Balter,Schroeder00]
Web file sizes [Crovella,Bestavros98],[Barford,Crovella98]
Internet node degree [Faloutsos,Faloutsos,Faloutsos99]
IP flow durations [Rexford99]
Self-similar arrival processes [Willinger93]
many, many more...
[Figure: job-size distributions for an exponential workload and a heavy-tailed workload]

6 Q2: Exponential Distribution
QUESTION 2: Under exponentially distributed job demands, which scheduling policy wins for E[T]: FCFS or PS?
Answer: PS = FCFS
[Figure: E[T] versus load ρ for FCFS and PS under the exponential workload]
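A short derivation of why the two policies tie, assuming an M/G/1 queue (Poisson arrivals at rate λ, job size S, load ρ = λE[S]); the slide itself does not state the arrival process, so that part is an assumption. It uses the standard Pollaczek-Khinchine formula for FCFS and the standard M/G/1/PS mean response time.

```latex
\begin{align*}
E[T]^{\mathrm{FCFS}} &= E[S] + \frac{\lambda\,E[S^2]}{2(1-\rho)},
  \qquad \rho = \lambda E[S] \\
E[T]^{\mathrm{PS}}   &= \frac{E[S]}{1-\rho} \\[4pt]
\text{Exponential } S:\quad E[S^2] &= 2\,E[S]^2 \\
\Rightarrow\quad E[T]^{\mathrm{FCFS}}
  &= E[S] + \frac{\rho\,E[S]}{1-\rho}
   = \frac{E[S]}{1-\rho}
   = E[T]^{\mathrm{PS}}
\end{align*}
```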

7 Q3: Heavy-tailed workload
QUESTION 3: Under heavy-tailed job demands, which scheduling policy wins for E[T]: FCFS or PS?
Answer: PS
[Figure: E[T] versus load ρ for FCFS and PS under the heavy-tailed workload]
Impact: Know your workload; it determines the right scheduling policy.
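A small numeric sketch of the FCFS versus PS comparison, assuming an M/G/1 queue and summarizing variability by the squared coefficient of variation C² = Var(S)/E[S]² (so E[S²] = (C² + 1)E[S]²). These are assumptions for illustration; a truly heavy-tailed workload can have infinite variance, in which case the FCFS value is unbounded. Function and parameter names are hypothetical.

```python
def mean_response_fcfs(arrival_rate, mean_size, scv):
    """Pollaczek-Khinchine mean response time for M/G/1/FCFS.

    scv is the squared coefficient of variation of job size.
    """
    rho = arrival_rate * mean_size
    if rho >= 1:
        raise ValueError("load must be below 1")
    second_moment = (scv + 1.0) * mean_size ** 2
    return mean_size + arrival_rate * second_moment / (2.0 * (1.0 - rho))


def mean_response_ps(arrival_rate, mean_size):
    """Mean response time for M/G/1/PS; it depends on job size only via the mean."""
    rho = arrival_rate * mean_size
    if rho >= 1:
        raise ValueError("load must be below 1")
    return mean_size / (1.0 - rho)


for scv in (1.0, 10.0, 100.0):   # scv = 1 corresponds to exponential job sizes
    fcfs = mean_response_fcfs(0.5, 1.0, scv)
    ps = mean_response_ps(0.5, 1.0)
    print(f"C^2 = {scv:6.1f}: E[T] FCFS = {fcfs:6.2f}, E[T] PS = {ps:6.2f}")
```

With C² = 1 the two values coincide; as C² grows, only the FCFS value grows.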

8 Q4: Scheduling to minimize E[T]
QUESTION 4: Under heavy-tailed job demands, in M/G/1, order these scheduling policies from low E[T] to high E[T]: FCFS, PS, SRPT, SJF, RANDOM.

9 Scheduling to minimize E[T]
Answer: Under heavy-tailed job demands, in M/G/1, from low E[T] to high E[T]:
SRPT < PS < SJF < FCFS = RANDOM
No “Starvation!” Even the biggest jobs prefer SRPT to PS [Bansal, Harchol-Balter 01], [Wierman, Harchol-Balter 03]:
THM: E[T(x)]^SRPT < E[T(x)]^PS for all x, when load < ½.
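To make the SRPT rule concrete, here is a minimal single-server sketch that replays a fixed list of (arrival time, size) pairs under FCFS and under SRPT (always run the job with the shortest remaining processing time, preempting on arrival). The three-job trace is a hypothetical toy input, not data from the talk, and it is not a heavy-tailed M/G/1 experiment.

```python
import heapq


def mean_response_fcfs(jobs):
    """jobs: list of (arrival_time, size), sorted by arrival; unit-speed server."""
    clock = 0.0
    total = 0.0
    for arrival, size in jobs:
        clock = max(clock, arrival) + size      # job runs to completion
        total += clock - arrival
    return total / len(jobs)


def mean_response_srpt(jobs):
    """Preemptive shortest-remaining-processing-time on the same input."""
    clock = 0.0
    total = 0.0
    active = []                                 # min-heap of (remaining, arrival)
    i, n = 0, len(jobs)
    while i < n or active:
        if not active:                          # server idle: jump to next arrival
            clock = max(clock, jobs[i][0])
        while i < n and jobs[i][0] <= clock:    # admit everything that has arrived
            heapq.heappush(active, (jobs[i][1], jobs[i][0]))
            i += 1
        remaining, arrival = heapq.heappop(active)
        next_arrival = jobs[i][0] if i < n else float("inf")
        run = min(remaining, next_arrival - clock)
        clock += run
        if run < remaining:                     # interrupted by an arrival
            heapq.heappush(active, (remaining - run, arrival))
        else:                                   # job finished
            total += clock - arrival
    return total / n


trace = [(0.0, 10.0), (1.0, 1.0), (2.0, 1.0)]   # one big job, then two small ones
print("FCFS mean response time:", mean_response_fcfs(trace))
print("SRPT mean response time:", mean_response_srpt(trace))
```

On this toy trace the big job finishes at the same time under both policies, while the two small jobs no longer wait behind it under SRPT.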

10 Single-server questions; multi-server questions

11 Growing trend towards server farms …
Server farms: + cheap, + easy to scale
[Figure: a dispatcher routing incoming jobs to a farm of servers]

12 Growing trend towards server farms …
Supercomputing/Manufacturing (router in front of FCFS servers):
- Jobs non-preemptible
- Run-to-completion
- Served in FCFS order
- Often variable job size
Web server farm (router in front of PS servers):
- HTTP requests fully preemptible
- Commodity PS servers
- Highly variable job size
- Examples: Cisco Local Director, IBM Network Dispatcher, Microsoft SharePoint, etc.

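13 Q5: 1 Fast versus Many Slow?
QUESTION 5: Which has lower E[T] (for heavy-tailed workload): one fast FCFS server, or several slow FCFS servers behind a smart dispatcher?
Under RANDOM Dispatch …
[Figure: a dispatcher in front of two FCFS servers, each of half speed, versus one FCFS server of full speed]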
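For the RANDOM dispatch case, a sketch under assumed M/G/1 conditions: randomly splitting a Poisson stream over two half-speed FCFS servers gives each server half the arrival rate and jobs that take twice as long, and the Pollaczek-Khinchine formula can be applied to each configuration. The slide gives no numbers; the values below are illustrative only.

```python
def pk_fcfs(arrival_rate, mean_service, second_moment):
    """Pollaczek-Khinchine mean response time for an M/G/1/FCFS queue."""
    rho = arrival_rate * mean_service
    if rho >= 1:
        raise ValueError("load must be below 1")
    return mean_service + arrival_rate * second_moment / (2.0 * (1.0 - rho))


def one_fast(arrival_rate, mean_size, second_moment):
    """One full-speed FCFS server."""
    return pk_fcfs(arrival_rate, mean_size, second_moment)


def two_slow_random(arrival_rate, mean_size, second_moment):
    """Two half-speed FCFS servers with RANDOM (probability 1/2 each) dispatch.

    Each server sees Poisson arrivals at half the rate, and every job takes
    twice as long, so the second moment of service time is multiplied by 4.
    """
    return pk_fcfs(arrival_rate / 2.0, 2.0 * mean_size, 4.0 * second_moment)


# Illustrative highly variable workload: E[S] = 1, E[S^2] = 50, load 0.5.
fast = one_fast(0.5, 1.0, 50.0)
slow = two_slow_random(0.5, 1.0, 50.0)
print(f"one fast: {fast}, two slow with random dispatch: {slow}")
```

In this model the two-slow-server system under RANDOM dispatch has exactly twice the mean response time of the single fast server, so any advantage of many slow servers has to come from a smarter dispatch rule.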

14 Q5: 1 Fast versus Many Slow?
QUESTION 5: Which has lower E[T] (for heavy-tailed workload): one fast FCFS server, or several slow FCFS servers behind a smart dispatcher?
Under Least-Work-Left Dispatch: multiple servers are way better under a variable workload.
[Wierman, Osogami, Harchol-Balter, Scheller-Wolf, Perf. Eval. 06]
[Figure: optimal number of servers as a function of load and job-size variability]

15 Q5: 1 Fast versus Many Slow?
QUESTION 5: Which has lower E[T] (for heavy-tailed workload): one fast FCFS server, or several slow FCFS servers behind a smart dispatcher?
Under Size-based Dispatch: small jobs go to one FCFS server and big jobs to another. Multiple servers are way better under a high-variability workload.
[Harchol-Balter, Crovella, Murta, Jour.Par.Dist.Comp.99]
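A toy calculation of why size-based dispatch can beat one fast server. It assumes Poisson arrivals and a hypothetical two-point job-size distribution (size 1 with probability 0.999, size 1000 with probability 0.001), chosen only so that each host sees a single job size and becomes an M/D/1 queue; none of these numbers come from the talk.

```python
def pk_fcfs(arrival_rate, mean_service, second_moment):
    """Pollaczek-Khinchine mean response time for an M/G/1/FCFS queue."""
    rho = arrival_rate * mean_service
    if rho >= 1:
        raise ValueError("load must be below 1")
    return mean_service + arrival_rate * second_moment / (2.0 * (1.0 - rho))


SMALL, LARGE, P_SMALL = 1.0, 1000.0, 0.999      # hypothetical workload
P_LARGE = 1.0 - P_SMALL

mean_size = P_SMALL * SMALL + P_LARGE * LARGE
second_moment = P_SMALL * SMALL ** 2 + P_LARGE * LARGE ** 2
arrival_rate = 0.5 / mean_size                   # total load 0.5 on one fast server

# Option 1: one full-speed FCFS server sees the full mixture of sizes.
one_fast = pk_fcfs(arrival_rate, mean_size, second_moment)

# Option 2: two half-speed FCFS servers; smalls go to one host, larges to the other.
# At half speed a job of size s takes 2s, and each host sees one fixed size.
t_small = pk_fcfs(arrival_rate * P_SMALL, 2 * SMALL, (2 * SMALL) ** 2)
t_large = pk_fcfs(arrival_rate * P_LARGE, 2 * LARGE, (2 * LARGE) ** 2)
two_slow_size_based = P_SMALL * t_small + P_LARGE * t_large

print(f"one fast FCFS server:          E[T] = {one_fast:.1f}")
print(f"two slow servers, size-based:  E[T] = {two_slow_size_based:.1f}")
```

Separating the sizes removes the variability each small job sees, so almost all jobs (the small ones) have short waits even though every server is half as fast.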

16 Q5: 1 Fast versus Many Slow?
QUESTION 5: Which has lower E[T] (for heavy-tailed workload): one fast FCFS server, or several slow FCFS servers behind a smart dispatcher?
Under Size-based Dispatch … with unknown job sizes (Unknown Size Dispatch): small jobs and big jobs are again served on separate FCFS servers. Multiple servers are way better under a variable workload.
[Harchol-Balter, Journ. ACM 02]

17 Q5: 1 Fast versus Many Slow?
QUESTION 5: Which has lower E[T] (for heavy-tailed workload): one fast FCFS server, or several slow FCFS servers behind a smart dispatcher?
Impact: Best architecture can be cheaper …

18 Q6: Which routing policy is best?
A: Supercomputing/Manufacturing: router in front of FCFS servers; Poisson arrival process; heavy-tailed, highly variable jobs.
B: Web Server Farm: router in front of PS servers; heavy-tailed, highly variable jobs.
Candidate routing policies:
Random: each host is chosen with equal probability.
Join-Shortest-Queue: go to the host with the fewest jobs.
Least-Work-Left: go to the host with the least total work.
Size-Based Splitting: jobs are split up by size.
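The four routing rules above can be sketched as small decision functions. The data layout is an assumption made for illustration: each host is a list holding the remaining work of the jobs currently at that host, and size-based splitting uses a hypothetical sorted list of size cutoffs (one fewer than the number of hosts).

```python
import random
from bisect import bisect_left


def route_random(hosts, rng=random):
    """Random: pick every host with equal probability."""
    return rng.randrange(len(hosts))


def route_join_shortest_queue(hosts):
    """Join-Shortest-Queue: host with the fewest jobs (ties go to the lowest index)."""
    return min(range(len(hosts)), key=lambda i: len(hosts[i]))


def route_least_work_left(hosts):
    """Least-Work-Left: host with the least total remaining work."""
    return min(range(len(hosts)), key=lambda i: sum(hosts[i]))


def route_size_based(job_size, cutoffs):
    """Size-Based Splitting: host 0 takes sizes up to cutoffs[0], host 1 the next range, etc."""
    return bisect_left(cutoffs, job_size)


# Three hosts: one with a single huge job, one with three tiny jobs, one in between.
hosts = [[50.0], [1.0, 1.0, 1.0], [4.0, 4.0]]
print("JSQ picks host", route_join_shortest_queue(hosts))
print("LWL picks host", route_least_work_left(hosts))
print("Size-based sends a job of size 30 to host", route_size_based(30.0, [10.0, 100.0]))
```

The example state is chosen so that Join-Shortest-Queue and Least-Work-Left disagree: the host with the fewest jobs is also the one with the most work.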

19 Q6: Which routing policy is best?
A: Supercomputing/Manufacturing: router in front of FCFS servers; Poisson arrival process; heavy-tailed, highly variable jobs.
B: Web Server Farm: router in front of PS servers; heavy-tailed, highly variable jobs.
Answer to A (from high E[T] to low E[T]):
1. Random
2. Join-Shortest-Queue
3. Least-Work-Left
4. Size-Based (best!)
Answer to B (from high E[T] to low E[T]):
1. Random = Size-Based
2. Least-Work-Left
3. Join-Shortest-Queue (best!)
[Gupta, Harchol-Balter, Sigman, Whitt 06+] [Harchol-Balter, Crovella, Murta, JPDC 99]

20 Single-server questions; multi-server questions; multiple servers with dependencies between servers

21 N-sharing model
Studied by: S. Bell, R. Williams, M. Harrison, M. Lopez, M. Squillante, C. Xia, D.Yao, L. Zhang, R. Schumsky, L. Green, S. Meyn, A. Ahn, D. Stanford, W. Grassman, …
Cycle stealing: the Donor helps the Beneficiary with her work when he’s free. But we can do better with threshold policies…
[Figure: queues B and D for the Beneficiary (Betty) and the Donor (Dan), with the Donor's server able to serve both]

22 Q9: Who gets control: man or woman?
Beneficiary Side Control (threshold T_B): "Help me when I have > T_B jobs, or you’re free."
Donor Side Control (threshold T_D): "I’ll help you when I have < T_D jobs."
Question 9: Who should have control? Dan (donor) or Betty (beneficiary)?
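The two control rules quoted above can be written as predicates on the queue lengths. This is only a sketch of the rules as worded on the slide: the function names are hypothetical, and the guard that the Beneficiary must have at least one job for help to make sense is an added assumption.

```python
def helps_under_beneficiary_control(n_beneficiary, n_donor, t_b):
    """Beneficiary-side control: "Help me when I have > T_B jobs, or you're free."

    Returns True when the Donor should work on the Beneficiary's queue.
    """
    if n_beneficiary == 0:          # assumption: nothing to help with
        return False
    return n_beneficiary > t_b or n_donor == 0


def helps_under_donor_control(n_beneficiary, n_donor, t_d):
    """Donor-side control: "I'll help you when I have < T_D jobs."."""
    if n_beneficiary == 0:          # assumption: nothing to help with
        return False
    return n_donor < t_d


# Same system state, two different answers depending on who holds the threshold.
state = {"n_beneficiary": 8, "n_donor": 5}
print("beneficiary control, T_B = 6:", helps_under_beneficiary_control(t_b=6, **state))
print("donor control,       T_D = 3:", helps_under_donor_control(t_d=3, **state))
```

With a Beneficiary threshold the decision reacts to Betty's backlog; with a Donor threshold it reacts only to Dan's own backlog.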

23 Q9: Who gets control: man or woman?
Beneficiary: "Help me when I have > T_B jobs, or you’re free."
Donor: "I’ll help you when I have < T_D jobs."
[Figure: Markov chain indexed by # Donor jobs and # Beneficiary jobs]
The analysis is difficult because the chain is infinite in two dimensions. We introduce Markov-based Dimensionality Reduction.
[Harchol-Balter, Osogami, Scheller-Wolf SPAA03, Sigmetrics03, Allerton04, Questa05, Perf. Eval. 06]

24 Q9: Who gets control: man or woman?
Beneficiary Side Control (threshold T_B): "Help me when I have > T_B jobs, or you’re free."
Donor Side Control (threshold T_D): "I’ll help you when I have < T_D jobs."
Answer: Mean response time E[T] is minimized when the woman (the Beneficiary) controls!

25 Q10: Which policy is more robust?
Beneficiary Side Control (threshold T_B): "Help me when I have > T_B jobs, or you’re free."
Donor Side Control (threshold T_D): "I’ll help you when I have < T_D jobs."
Q10: We want a policy that is robust against mis-estimation of load…

26 Q10: Which policy is more robust?
Beneficiary Side Control (threshold T_B): "Help me when I have > T_B jobs, or you’re free."
Donor Side Control (threshold T_D): "I’ll help you when I have < T_D jobs."
Answer: Donor control helps, but even better is to let the Beneficiary have 2 thresholds, where the Donor controls which threshold is used.

27 Results: Adaptive Dual Threshold (ADT) policy
[Figure: mean response time versus Dan’s load, with curves for T_B = 6 (opt), T_B = 20 (robust), and ADT]
ADT: meets both goals.
Impact: Robustness is equally important to efficiency.

28 Conclusion
We’ve covered many themes in system design:
Capacity provisioning
Heavy-tailed workloads & scheduling
Architectures: 1 fast or many slow
Load UNbalancing
Server farms: routing policies
Threshold-based resource sharing/control
Robustness

29 If you want to know more …
15-857 Performance Modeling & Design
** Highly recommended for CS theory, Math, TEPPER, and ACO doctoral students
Instructor: Mor Harchol-Balter
Prerequisite: Strong probability background
Mondays & Wednesdays 3-4:30
Queueing theory is an old area of mathematics which has recently become very hot. The goal of queueing theory has always been to improve the design/performance of systems, e.g. networks, servers, memory, disks, distributed systems, etc., by finding smarter schemes for allocating resources to jobs. In this class we will study the beautiful mathematical techniques used in queueing theory, including stochastic analysis, discrete-time and continuous-time Markov chains, renewal theory, product-forms, transforms, supplementary random variables, fluid theory, scheduling theory, matrix-analytic methods, and more. Throughout we will emphasize realistic workloads, in particular heavy-tailed workloads. This course is packed with open problems: problems which, if solved, are not just interesting theoretically, but which have huge applicability to the design of computer systems today.
Come take my class

30 BACKUP

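31 Q7: To balance or not to balance?
Size-based dispatch sends jobs to hosts S, M, L, XL according to job size.
[Figure: x·f(x) plotted against job size x, with unknown cutoffs between the S, M, L and XL hosts]
Question 7: How to choose the size cutoffs?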

32 To Balance or Not to Balance?
Answer: Recent research on heavy-tailed workloads, where Pr{Job size > x} ~ x^(-α):
α < 1: UNBALANCE, favor smalls
α = 1: BALANCE LOAD
α > 1: UNBALANCE, favor larges
[Harchol-Balter,Vesilo, 06+], [Glynn, Harchol-Balter, Ramanan, 06+]
[Figure: x·f(x) plotted against job size x; size-based dispatch sends small jobs to FCFS host S and large jobs to FCFS host L]
Impact: May want to rethink all those load balancing policies …
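One way to turn a balance-or-unbalance decision into an actual cutoff for two hosts is sketched below. It is an illustration rather than the method of the cited papers: given a hypothetical sample of job sizes, it returns the smallest size c such that jobs of size at most c account for a chosen fraction of the total work. A fraction of 0.5 balances load; a smaller fraction leaves the small-job host underloaded.

```python
def size_cutoff(job_sizes, small_host_load_fraction=0.5):
    """Smallest cutoff c such that jobs with size <= c carry at least the
    requested fraction of the total work in the sample."""
    if not 0.0 < small_host_load_fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    sizes = sorted(job_sizes)
    target = small_host_load_fraction * sum(sizes)
    running = 0.0
    for size in sizes:
        running += size
        if running >= target:
            return size
    return sizes[-1]


sample = [1.0, 2.0, 3.0, 4.0, 10.0]      # hypothetical job sizes, total work 20
print("balanced cutoff:", size_cutoff(sample, 0.5))
print("cutoff that underloads the small-job host:", size_cutoff(sample, 0.25))
```

With few distinct sizes the achievable load splits are coarse, so the realized fraction can exceed the requested one; a larger sample gives finer control.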

