1 Mor Harchol-Balter Computer Science Dept, CMU What Analytical Performance Modeling Teaches Us About Computer Systems Design.

Presentation transcript:

1 Mor Harchol-Balter Computer Science Dept, CMU What Analytical Performance Modeling Teaches Us About Computer Systems Design

2 Defn: Queueing Theory: The study of queues, congestion, resource management, and stochastic (probabilistic) modeling. Our Goal: Computer Systems Design. CMU CS Dept

3 TERMINOLOGY WARMUP. Incoming jobs. λ: Avg. rate jobs arrive (jobs/sec). μ: Avg. rate jobs served (server speed). Response Time: E[T]. Examples of single-server queues: router, supercomputing center, a database lock queue, Web server.

4 Q1: TERMINOLOGY WARMUP. Incoming jobs. λ: Avg. rate jobs arrive (jobs/sec). μ: Avg. rate jobs served (server speed). Response Time: E[T]. QUESTION 1: Suppose λ → 2λ. We want to keep E[T] unchanged. Should we: (a) Double service rate, (b) More than double service rate, (c) Less than double service rate? Impact: Be careful not to overprovision!
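Question 1 can be checked numerically. The sketch below assumes an M/M/1 queue (Poisson arrivals, exponential service times), which the slide does not state explicitly; under that assumption E[T] = 1/(μ − λ). The function names and the example rates are illustrative only.

```python
def mm1_response_time(lam, mu):
    """Mean response time E[T] = 1 / (mu - lam) for a stable M/M/1 queue."""
    if lam >= mu:
        raise ValueError("unstable: need lam < mu")
    return 1.0 / (mu - lam)


def service_rate_for_target(lam, target_et):
    """Smallest service rate that gives E[T] = target_et at arrival rate lam."""
    return lam + 1.0 / target_et


# Hypothetical example rates (jobs/sec).
lam, mu = 3.0, 5.0
target = mm1_response_time(lam, mu)

# Arrival rate doubles; solve for the service rate that keeps E[T] unchanged.
new_mu = service_rate_for_target(2 * lam, target)
print(f"old mu = {mu}, new mu = {new_mu}, doubled mu would be {2 * mu}")
```

In this model the required rate is μ + λ, which is always below 2μ for a stable queue, so the answer under the M/M/1 assumption is (c), consistent with the slide's warning about overprovisioning.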

5 Workload Distribution Warmup. Exponential workload vs. heavy-tailed workload (huge variability). Heavy tail: the top 1% of jobs comprise half the load. Heavy tails are everywhere in CS: CPU lifetimes of UNIX jobs [Harchol,Downey96]; supercomputing job sizes [Harchol-Balter,Schroeder00]; Web file sizes [Crovella,Bestavros98],[Barford,Crovella98]; Internet node degree [Faloutsos,Faloutsos,Faloutsos99]; IP flow durations [Rexford99]; self-similar arrival processes [Willinger93]; many, many more...
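To make the "top 1% of jobs comprise half the load" property concrete, the sketch below computes the share of total work carried by the largest fraction q of jobs for two distributions. It assumes an exponential distribution and an unbounded Pareto distribution with tail index α > 1; the specific α is a hypothetical value chosen for illustration, not a figure from the talk.

```python
import math


def exp_top_load(q):
    """Share of total load in the largest fraction q of jobs, exponential sizes.

    For Exp(mean 1), P(S > c) = q gives c = -ln q, and the work above c is
    the integral of x * e^{-x} from c to infinity, i.e. (1 + c) * e^{-c}.
    """
    c = -math.log(q)
    return (1.0 + c) * math.exp(-c)


def pareto_top_load(q, alpha):
    """Same share for an unbounded Pareto with tail index alpha > 1."""
    if alpha <= 1:
        raise ValueError("mean is infinite for alpha <= 1")
    return q ** (1.0 - 1.0 / alpha)


print(f"exponential, top 1% of jobs: {exp_top_load(0.01):.3f} of the load")
print(f"Pareto(alpha=1.18), top 1%:  {pareto_top_load(0.01, 1.18):.3f} of the load")
```

Under these assumptions the top 1% of exponential jobs carry only a few percent of the load, while a Pareto tail index a little below 1.2 puts roughly half of the load in the top 1%.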

6 Q2: Exponential Distribution. QUESTION 2: Under exponentially-distributed job demands, which scheduling policy wins for E[T]: FCFS or PS? Answer: PS = FCFS. [Figure: E[T] vs. load ρ for FCFS and PS.]

7 Q3: Heavy-tailed workload. QUESTION 3: Under heavy-tailed job demands, which scheduling policy wins for E[T]: FCFS or PS? Answer: PS. [Figure: E[T] vs. load ρ for FCFS and PS.] Impact: Know your workload; it determines the right scheduling policy.
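The answers to Questions 2 and 3 follow from two standard M/G/1 formulas: the Pollaczek-Khinchine formula for FCFS, E[T] = E[S] + λE[S²] / (2(1 − ρ)), and the PS result E[T] = E[S] / (1 − ρ), where ρ = λE[S]. The sketch below evaluates both; the second-moment values are hypothetical inputs chosen to contrast low and high variability.

```python
def et_fcfs(lam, es, es2):
    """M/G/1 FCFS mean response time (Pollaczek-Khinchine)."""
    rho = lam * es
    if rho >= 1:
        raise ValueError("unstable: need load < 1")
    return es + lam * es2 / (2.0 * (1.0 - rho))


def et_ps(lam, es):
    """M/G/1 PS mean response time; depends on the job size mean only."""
    rho = lam * es
    if rho >= 1:
        raise ValueError("unstable: need load < 1")
    return es / (1.0 - rho)


lam, es = 0.5, 1.0  # load rho = 0.5

# Exponential job sizes with mean 1 have E[S^2] = 2.
print("exponential:      FCFS", et_fcfs(lam, es, 2.0), " PS", et_ps(lam, es))

# Same mean, much larger second moment (hypothetical high-variability case).
print("high variability: FCFS", et_fcfs(lam, es, 100.0), " PS", et_ps(lam, es))
```

With exponential sizes the two policies tie; as E[S²] grows, FCFS degrades while PS is unaffected.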

8 Q4: Scheduling to minimize E[T]. QUESTION 4: Under heavy-tailed job demands, in M/G/1, order these scheduling policies from low E[T] to high E[T]: FCFS, PS, SRPT, SJF, RANDOM.

9 Scheduling to minimize E[T]. Answer: Under heavy-tailed job demands, in M/G/1, from low E[T] to high E[T]: SRPT < PS < SJF < FCFS = RANDOM. No “Starvation!” Even the biggest jobs prefer SRPT to PS: [Bansal, Harchol-Balter 01], [Wierman, Harchol-Balter 03]: THM: E[T(x)]^SRPT < E[T(x)]^PS for all x, when load < ½.
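The two ends of this ordering can be compared on a single sample path. The sketch below is a minimal single-server simulation of FCFS and preemptive SRPT; the Pareto tail index, load, and job count are hypothetical choices, and the remaining policies (PS, SJF, RANDOM) are not simulated.

```python
import heapq
import random


def mean_response_fcfs(arrivals, sizes):
    """Mean response time when jobs run to completion in arrival order."""
    free_at = total = 0.0
    for t, s in zip(arrivals, sizes):
        free_at = max(free_at, t) + s
        total += free_at - t
    return total / len(sizes)


def mean_response_srpt(arrivals, sizes):
    """Mean response time under preemptive Shortest-Remaining-Processing-Time."""
    heap = []  # entries are (remaining work, arrival time)
    now = total = 0.0
    for t, s in zip(arrivals, sizes):
        # Run the job with the least remaining work until the next arrival.
        while heap and now < t:
            rem, arr = heapq.heappop(heap)
            if now + rem <= t:
                now += rem
                total += now - arr
            else:
                heapq.heappush(heap, (rem - (t - now), arr))
                now = t
        now = t  # idle until the arrival if the queue emptied early
        heapq.heappush(heap, (s, t))
    while heap:  # no more arrivals: finish jobs in order of remaining work
        rem, arr = heapq.heappop(heap)
        now += rem
        total += now - arr
    return total / len(sizes)


rng = random.Random(0)
sizes = [rng.paretovariate(1.5) for _ in range(5000)]  # heavy-tailed sizes
lam = 0.7 / (sum(sizes) / len(sizes))                  # target load 0.7
arrivals, t = [], 0.0
for _ in sizes:
    t += rng.expovariate(lam)
    arrivals.append(t)

fcfs = mean_response_fcfs(arrivals, sizes)
srpt = mean_response_srpt(arrivals, sizes)
print(f"FCFS E[T] ~ {fcfs:.2f}, SRPT E[T] ~ {srpt:.2f}")
```

Because SRPT minimizes mean response time on any arrival sequence for a single preemptive server, its result here cannot exceed the FCFS result; the size of the gap depends on the sampled workload.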

10 single-server questions multi-server questions

11 Growing trend towards server farms … Server farms: + cheap, + easy to scale. [Figure label: Dispatch]

12 Growing trend towards server farms …
Supercomputing/Manufacturing (router in front of FCFS servers): jobs non-preemptible; run-to-completion; served in FCFS order; often variable job size.
Web server farm (router in front of PS servers): HTTP requests fully preemptible; commodity PS servers; highly-variable job size. Examples: Cisco Local Director, IBM Network Dispatcher, Microsoft SharePoint, etc.

13 Q5: 1 Fast versus Many Slow? QUESTION 5: Which has lower E[T] (for heavy-tailed workload): one FCFS server of speed μ, or two FCFS servers of speed ½μ each behind a dispatcher? Under RANDOM Dispatch …
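For the RANDOM dispatch case, a short calculation is possible without simulation. The sketch below assumes Poisson arrivals, so that random splitting gives each slow server an independent Poisson stream of rate λ/k, and applies the Pollaczek-Khinchine formula to each server. The slide itself does not state the outcome; the numbers used are hypothetical.

```python
def et_fcfs(lam, es, es2):
    """M/G/1 FCFS mean response time (Pollaczek-Khinchine)."""
    rho = lam * es
    if rho >= 1:
        raise ValueError("unstable: need load < 1")
    return es + lam * es2 / (2.0 * (1.0 - rho))


def et_k_slow_random(lam, es, es2, k):
    """k FCFS servers, each 1/k as fast, with uniform random dispatch.

    Each server sees arrival rate lam / k, and every job takes k times longer,
    so the first moment scales by k and the second moment by k squared.
    """
    return et_fcfs(lam / k, k * es, k * k * es2)


lam, es, es2 = 0.5, 1.0, 100.0  # hypothetical high-variability workload
fast = et_fcfs(lam, es, es2)
slow = et_k_slow_random(lam, es, es2, k=2)
print(f"1 fast: {fast}, 2 slow with random dispatch: {slow}")
```

Under these assumptions the k slow servers are exactly k times worse than the single fast server, so any benefit from multiple servers has to come from a smarter dispatch rule than random.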

14 Q5: 1 Fast versus Many Slow? QUESTION 5: Which has lower E[T] (for heavy-tailed workload): one FCFS server of speed μ, or two FCFS servers of speed ½μ each behind a smart dispatcher? Under Least-Work-Left Dispatch: multiple servers way better under variable workload. [Wierman, Osogami, Harchol-Balter, Scheller-Wolf, Perf. Eval. 06] [Figure: OPT # servers (1, 2, 3, …) versus load ρ and variability.]

15 Q5: 1 Fast versus Many Slow? QUESTION 5: Which has lower E[T] (for heavy-tailed workload): one FCFS server of speed μ, or two FCFS servers of speed ½μ each behind a smart dispatcher? Under Size-based Dispatch …: small jobs and big jobs go to separate FCFS servers. [Harchol-Balter, Crovella, Murta, Jour.Par.Dist.Comp.99] Multiple servers way better under high variability workload.

16 Q5: 1 Fast versus Many Slow? QUESTION 5: Which has lower E[T] (for heavy-tailed workload): one FCFS server of speed μ, or two FCFS servers of speed ½μ each behind a smart dispatcher? Under Size-based Dispatch … Unknown Size Dispatch: small jobs and big jobs go to separate FCFS servers. [Harchol-Balter, Journ. ACM 02] Multiple servers way better under variable workload.

17 Q5: 1 Fast versus Many Slow? QUESTION 5: Which has lower E[T] (for heavy-tailed workload): one FCFS server of speed μ, or two FCFS servers of speed ½μ each behind a smart dispatcher? Impact: Best architecture can be cheaper …

18 Q6: Which routing policy is best?
A: Supercomputing/Manufacturing: Poisson arrival process; heavy-tailed, highly variable jobs; router in front of FCFS servers.
B: Web Server Farm: heavy-tailed, highly variable jobs; router in front of PS servers.
Candidate policies: Random: equal probability. Join-Shortest-Queue: go to host with fewest # jobs. Least-Work-Left: go to host with least total work. Size-Based Splitting: jobs split up by size.

19 Q6: Which routing policy is best? Answers, listed from high E[T] to low E[T]:
Answer to A (FCFS servers): 1. Random 2. Join-Shortest-Queue 3. Least-Work-Left 4. Size-Based (best!) [Harchol-Balter, Crovella, Murta, JPDC 99]
Answer to B (PS servers): 1. Random = Size-Based 2. Least-Work-Left 3. Join-Shortest-Queue (best!) [Gupta, Harchol-Balter, Sigman, Whitt 06+]
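Scenario A can be explored with a small simulation. The sketch below is a rough illustration under stated assumptions, not the analysis from the cited papers: two unit-speed FCFS hosts, Poisson arrivals, Bounded Pareto job sizes with hypothetical parameters, and a hypothetical size cutoff. It covers Random, Least-Work-Left, and Size-Based dispatch; Join-Shortest-Queue and the PS servers of scenario B are left out to keep it short.

```python
import random


def bounded_pareto(rng, alpha, lo, hi):
    """Inverse-CDF sample from a Bounded Pareto on [lo, hi)."""
    u = rng.random()
    return lo / (1.0 - u * (1.0 - (lo / hi) ** alpha)) ** (1.0 / alpha)


def simulate(policy, n_jobs=20000, load=0.7, cutoff=4.0, seed=1):
    """Mean response time for two unit-speed FCFS hosts under one policy."""
    k = 2
    rng = random.Random(seed)        # job sizes and arrival times
    coin = random.Random(seed + 1)   # used only by the random policy
    sizes = [bounded_pareto(rng, 1.5, 1.0, 1e4) for _ in range(n_jobs)]
    lam = load * k / (sum(sizes) / n_jobs)
    free_at = [0.0] * k              # time at which each host clears its backlog
    t = total = 0.0
    for s in sizes:
        t += rng.expovariate(lam)
        if policy == "random":
            i = coin.randrange(k)
        elif policy == "least-work-left":
            i = min(range(k), key=lambda j: max(free_at[j] - t, 0.0))
        elif policy == "size-based":
            i = 0 if s <= cutoff else 1   # host 0: small jobs, host 1: big jobs
        else:
            raise ValueError(policy)
        free_at[i] = max(free_at[i], t) + s
        total += free_at[i] - t
    return total / n_jobs


results = {p: simulate(p) for p in ("random", "least-work-left", "size-based")}
for p, v in results.items():
    print(f"{p:>16}: E[T] ~ {v:.1f}")
```

All three policies see the same job sizes and arrival times, so differences in the printed values come from the dispatch rule alone. Heavy-tailed simulations converge slowly, so a single run should be read as indicative only.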

20 single-server questions multi-server questions multiple servers with dependencies between servers

21 N-sharing model. Studied by: S. Bell, R. Williams, M. Harrison, M. Lopez, M. Squillante, C. Xia, D.Yao, L. Zhang, R. Schumsky, L. Green, S. Meyn, A. Ahn, D. Stanford, W. Grassman, … Cycle-stealing: Donor helps Beneficiary with her work when he’s free. But can do better with threshold policies… [Figure: Beneficiary (Betty) queue B and Donor (Dan) queue D.]

22 Q9: Who gets control: man or woman? Beneficiary Side Control (threshold T_B): “Help me when I have > T_B jobs, or you’re free.” Donor Side Control (threshold T_D): “I’ll help you when I have < T_D jobs.” Question 9: Who should have control? Dan (donor) or Betty (beneficiary)?

23 Q9: Who gets control: man or woman? Beneficiary Side Control: “Help me when I have > T_B jobs, or you’re free.” Donor Side Control: “I’ll help you when I have < T_D jobs.” [Figure: state space with axes # Donor jobs and # Beneficiary jobs.] Difficulty of analysis due to 2D-infinite chain. We introduce Markov-based Dimensionality Reduction. [Harchol-Balter, Osogami, Scheller-Wolf SPAA03, Sigmetrics03, Allerton04, Questa05, Perf. Eval. 06]

24 Q9: Who gets control: man or woman? Beneficiary Side Control: “Help me when I have > T_B jobs, or you’re free.” Donor Side Control: “I’ll help you when I have < T_D jobs.” Answer: Mean response time E[T] minimized when woman (Betty, the beneficiary) controls!

25 Q10: Which policy is more robust? Beneficiary Side Control: “Help me when I have > T_B jobs, or you’re free.” Donor Side Control: “I’ll help you when I have < T_D jobs.” Q10: Want policy robust against mis-estimation of load…

26 Q10: Which policy is more robust? Beneficiary Side Control: “Help me when I have > T_B jobs, or you’re free.” Donor Side Control: “I’ll help you when I have < T_D jobs.” Answer: Donor control helps, but even better is to let Benef. have 2 thresholds, where Donor controls which threshold is used.
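The three control rules can be written as small predicates that decide whether Dan (the donor) should serve one of Betty's (the beneficiary's) jobs. This is a sketch under assumptions: the function and parameter names are hypothetical, and the way the donor selects between the two beneficiary thresholds (by comparing his own queue length to T_D) is an assumed reading of "Donor controls which threshold is used", not something the slides spell out.

```python
def beneficiary_side_help(n_b, n_d, t_b):
    """Beneficiary control: help when Betty has > t_b jobs, or Dan is free."""
    return n_b > t_b or (n_d == 0 and n_b > 0)


def donor_side_help(n_b, n_d, t_d):
    """Donor control: help when Dan has fewer than t_d jobs of his own."""
    return n_d < t_d and n_b > 0


def adaptive_dual_threshold_help(n_b, n_d, t_b_low, t_b_high, t_d):
    """Two beneficiary thresholds; Dan's queue length picks the active one.

    Assumption: the lower threshold applies while Dan has fewer than t_d jobs,
    and the higher threshold applies otherwise.
    """
    t_b = t_b_low if n_d < t_d else t_b_high
    return beneficiary_side_help(n_b, n_d, t_b)


# Hypothetical state: Betty has 10 jobs queued, Dan has 4.
print(beneficiary_side_help(10, 4, t_b=6))                                  # True
print(donor_side_help(10, 4, t_d=3))                                        # False
print(adaptive_dual_threshold_help(10, 4, t_b_low=6, t_b_high=20, t_d=3))   # False
```

The thresholds 6 and 20 echo the T_B values shown on the results slide, but pairing them as the low and high thresholds of one policy is an illustrative choice.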

27 Results: Adaptive Dual Threshold (ADT) policy. [Figure: mean response time vs. Dan’s load for T_B = 6 (opt), T_B = 20 (robust), and ADT.] ADT: meets both goals. Impact: Robustness equally important to efficiency.

28 Conclusion. We’ve covered many themes in system design: capacity provisioning; heavy-tailed workloads & scheduling; architectures: 1 fast or many slow; load UNbalancing; server farms: routing policies; threshold-based resource sharing/control; robustness.

29 If you want to know more … Performance Modeling & Design ** Highly-recommended for CS theory, Math, TEPPER, and ACO doctoral students. Instructor: Mor Harchol-Balter. Prerequisite: Strong probability background. MONDAYS & WEDNESDAYS 3-4:30. Queueing theory is an old area of mathematics which has recently become very hot. The goal of queueing theory has always been to improve the design/performance of systems, e.g. networks, servers, memory, disks, distributed systems, etc., by finding smarter schemes for allocating resources to jobs. In this class we will study the beautiful mathematical techniques used in queueing theory, including stochastic analysis, discrete-time and continuous-time Markov chains, renewal theory, product-forms, transforms, supplementary random variables, fluid theory, scheduling theory, matrix-analytic methods, and more. Throughout we will emphasize realistic workloads, in particular heavy-tailed workloads. This course is packed with open problems -- problems which if solved are not just interesting theoretically, but which have huge applicability to the design of computer systems today. Come take my class

30 BACKUP

31 Q7: To balance or not to balance? Size-based dispatch with size classes S, M, L, XL. [Figure: x·f(x) versus job size x, with the cutoffs marked “?”.] Question 7: How to choose the size cutoffs?

32 To Balance or Not to Balance? Answer: Recent research on heavy-tailed workloads, Pr{Job size > x} ~ x^(-α): α < 1: UNBALANCE, favor smalls. α = 1: BALANCE LOAD. α > 1: UNBALANCE, favor larges. [Harchol-Balter, Vesilo, 06+], [Glynn, Harchol-Balter, Ramanan, 06+] Impact: May want to rethink all those load balancing policies …
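To experiment with size cutoffs, it helps to know how much load falls below a given cutoff. The sketch below assumes a Bounded Pareto job size distribution with α ≠ 1 and hypothetical bounds, and computes the equal-load cutoff for two hosts as a baseline. Per the slide, the best cutoff may deliberately depart from this balanced point, and the sketch does not compute that optimum.

```python
def bp_load_fraction_below(c, alpha, lo, hi):
    """Share of total load from jobs of size <= c, Bounded Pareto, alpha != 1.

    The load between a and b is proportional to b**(1-alpha) - a**(1-alpha),
    so the normalizing constants cancel in the ratio.
    """
    e = 1.0 - alpha
    return (c ** e - lo ** e) / (hi ** e - lo ** e)


def bp_balanced_cutoff(alpha, lo, hi):
    """Cutoff that places half of the load on each of two hosts."""
    e = 1.0 - alpha
    return ((lo ** e + hi ** e) / 2.0) ** (1.0 / e)


alpha, lo, hi = 1.5, 1.0, 1e4  # hypothetical workload parameters
c = bp_balanced_cutoff(alpha, lo, hi)
print(f"balanced cutoff ~ {c:.2f}")
print(f"load share below cutoff = {bp_load_fraction_below(c, alpha, lo, hi):.3f}")

# Moving the cutoff down shifts load away from the small-job host.
print(f"load share below 2.0 = {bp_load_fraction_below(2.0, alpha, lo, hi):.3f}")
```

Lowering the cutoff below the balanced point underloads the small-job host and raising it underloads the large-job host, which are the two ways of unbalancing the slide refers to.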