System Performance & Scalability i206 Fall 2010 John Chuang

Computing Trends
Multi-core CPUs
Data centers
Cloud computing
What are the drivers?
-scalability, availability, cost-effectiveness

Lecture Outline
Performance Metrics
Availability
Queuing theory
-M/M/1 queue
Scalability
-M/M/m queue

What is Performance?
Users want fast response time and high availability
Managers want happy users, and many of them, while minimizing cost
What are standard measures of system performance?

Performance Metrics
Response time (seconds)
Throughput (MIPS, Mbps, TPS, ...)
Resource utilization (%)
Availability (%)

Availability

Availability   Down-time per year   One hour of down-time per
90%            36 days              9 hours
99%            3.7 days             4.1 days
99.9%          9 hours              41.6 days
99.99%         53 minutes           1.14 years
99.999%        5 minutes            11.41 years

Availability = MTTF / (MTTF + MTTR)
-Mean-time-to-failure (MTTF)
-Mean-time-to-recover (MTTR)
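
The availability formula and the down-time figures can be checked with a few lines of Python. This is a minimal sketch rather than part of the original slides: the function names and the 8,760-hour year (no leap days) are assumptions made for illustration.

```python
HOURS_PER_YEAR = 365 * 24  # assumption: 8,760-hour year, leap days ignored


def availability(mttf_hours: float, mttr_hours: float) -> float:
    """Availability = MTTF / (MTTF + MTTR)."""
    return mttf_hours / (mttf_hours + mttr_hours)


def downtime_hours_per_year(avail: float) -> float:
    """Expected hours of down-time in one year at the given availability."""
    return (1.0 - avail) * HOURS_PER_YEAR


def uptime_hours_per_hour_down(avail: float) -> float:
    """Average hours of up-time that accompany one hour of down-time."""
    return avail / (1.0 - avail)


if __name__ == "__main__":
    for a in (0.90, 0.99, 0.999, 0.9999, 0.99999):
        print(f"{a:.3%}: {downtime_hours_per_year(a):9.2f} h down per year, "
              f"one hour down per {uptime_hours_per_hour_down(a):9.0f} h up")
```

For instance, a system that fails on average every 999 hours and takes one hour to recover has an availability of 99.9%, which corresponds to roughly 9 hours of down-time per year.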

Response Time
Client: formulate request, interpret response
Network: message latency
Server: queuing time, processing time
Adapted from: David Messerschmitt

Queuing Theory
Six parameters specify a queuing system:
1. Arrival Process
2. Service Time Distribution
3. Number of Servers
4. System Capacity
5. Customer Population
6. Service Discipline
Source: Raj Jain

Kendall’s Notation (1953)
A/B/c/k/N/D
-A: arrival process
-B: service time distribution
-c: number of servers
-k: system capacity
-N: population size
-D: service discipline
Values for A and B:
-M: Markov (exponential, memoryless, random, Poisson)
-D: deterministic
-E: Erlang
-H: hyper-exponential
-G: general
Values for the service discipline:
-FCFS: first come first served
-LCFS: last come first served
-RR: round-robin
-etc.

Example Systems
M/M/1/∞/∞/FCFS (simplified as M/M/1)
-Markovian (Poisson, memoryless) arrival
-Markovian service time
-1 server
-Infinite system capacity
-Infinite arrival stream
-First-come-first-served discipline
Other examples:
-M/M/1/k (finite capacity)
-M/M/m (m servers)
-G/D/1 (arbitrary arrival, deterministic service time)
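
To illustrate how the shorthand expands, the sketch below parses a Kendall string and fills in the defaults that a short form such as M/M/1 leaves implicit. The `Kendall` dataclass and the `parse_kendall` helper are hypothetical names introduced here, and the parser assumes a numeric server count, so a symbolic form such as M/M/m is not handled.

```python
import math
from dataclasses import dataclass


@dataclass
class Kendall:
    arrival: str                    # A: arrival process, e.g. "M", "D", "G"
    service: str                    # B: service time distribution
    servers: int                    # c: number of servers
    capacity: float = math.inf      # k: system capacity (default: unbounded)
    population: float = math.inf    # N: population size (default: unbounded)
    discipline: str = "FCFS"        # D: service discipline


def parse_kendall(spec: str) -> Kendall:
    """Parse 'A/B/c[/k[/N[/D]]]'; omitted fields keep their defaults."""
    parts = [p.strip() for p in spec.split("/")]
    if not 3 <= len(parts) <= 6:
        raise ValueError(f"expected 3 to 6 fields, got {len(parts)}: {spec!r}")

    def size(token: str) -> float:
        # An empty field, "inf" or the infinity sign all mean "unbounded".
        return math.inf if token in ("", "inf", "∞") else int(token)

    queue = Kendall(parts[0], parts[1], int(parts[2]))
    if len(parts) > 3:
        queue.capacity = size(parts[3])
    if len(parts) > 4:
        queue.population = size(parts[4])
    if len(parts) > 5:
        queue.discipline = parts[5] or "FCFS"
    return queue


if __name__ == "__main__":
    print(parse_kendall("M/M/1"))
    print(parse_kendall("M/M/1/20"))
```

With this sketch, `parse_kendall("M/M/1")` and `parse_kendall("M/M/1/inf/inf/FCFS")` compare equal, which mirrors the simplification stated above.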

M/M/1 Queue
Poisson arrival, with average arrival rate of $\lambda$ jobs/sec
Poisson service (exponential service times), with average service rate of $\mu$ jobs/sec
Single server with infinite queue
System utilization (hopefully < 1): $\rho = \lambda/\mu$
Average number of jobs in system: $N = \sum_n n\,p_n = \rho/(1-\rho)$, where $p_n$ is the probability of $n$ jobs in the system
System throughput (if $\rho < 1$): $X = \lambda$
Average response time (from Little’s Law): $R = N/X = 1/(\mu - \lambda)$
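
The closed forms for the M/M/1 queue follow in a few steps. The derivation below is a sketch that assumes the standard steady-state distribution of an M/M/1 queue, $p_n = (1-\rho)\rho^n$, which is not stated on the slide itself.

```latex
\begin{align*}
p_n &= (1-\rho)\,\rho^{n}, \qquad n = 0, 1, 2, \dots \\
N   &= \sum_{n=0}^{\infty} n\,p_n
     = (1-\rho)\,\rho \sum_{n=1}^{\infty} n\,\rho^{\,n-1}
     = (1-\rho)\,\rho \cdot \frac{1}{(1-\rho)^{2}}
     = \frac{\rho}{1-\rho} \\
R   &= \frac{N}{X}
     = \frac{\rho}{\lambda\,(1-\rho)}
     = \frac{1}{\mu\,(1-\rho)}
     = \frac{1}{\mu-\lambda}
\end{align*}
```

The last line uses Little's Law with $X = \lambda$ and substitutes $\rho = \lambda/\mu$.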

Example: Web Server
Web server receives 40 requests/second
Web server can process 100 requests/second
What is the server utilization?
At any given time, how many requests are at the server (waiting plus being processed)?
What is the mean total delay at the server (waiting plus processing)?
What happens when the traffic rate doubles?

Example: Web Server
$\lambda = 40$ requests/second
$\mu = 100$ requests/second
Utilization $= \rho = \lambda/\mu = 40/100 = 40\%$
# of requests $= N = \rho/(1-\rho) = 0.4/0.6 = 0.67$
Average time spent at server $= R = N/X = 0.67/40 = 17$ ms

Example: Traffic Doubled
$\lambda = 80$ requests/second
$\mu = 100$ requests/second
Utilization $= \rho = \lambda/\mu = 80/100 = 80\%$
# of requests $= N = \rho/(1-\rho) = 0.8/0.2 = 4$
Average time spent at server $= R = N/X = 4/80 = 50$ ms (more than doubled!)

Approaching Congestion
$\lambda = 99$ requests/second
$\mu = 100$ requests/second
Utilization $= \rho = \lambda/\mu = 99/100 = 99\%$
# of requests $= N = \rho/(1-\rho) = 0.99/0.01 = 99$
Average time spent at server $= R = N/X = 99/99 = 1$ second!
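
The three cases (40, 80 and 99 requests/second against a server that handles 100 requests/second) can be reproduced with a small helper built on the M/M/1 formulas. This is a minimal sketch; `mm1` and its dictionary keys are names chosen here for illustration.

```python
def mm1(arrival_rate: float, service_rate: float) -> dict:
    """Steady-state M/M/1 metrics for arrival rate lambda and service rate mu."""
    if arrival_rate >= service_rate:
        raise ValueError("unstable queue: arrival rate must be below service rate")
    rho = arrival_rate / service_rate          # utilization
    jobs = rho / (1.0 - rho)                   # N, mean number in system
    response = jobs / arrival_rate             # R = N / X, with X = lambda
    return {"utilization": rho, "jobs_in_system": jobs, "response_time_s": response}


if __name__ == "__main__":
    for lam in (40, 80, 99):
        m = mm1(lam, 100)
        print(f"lambda={lam}: rho={m['utilization']:.0%}, "
              f"N={m['jobs_in_system']:.2f}, "
              f"R={m['response_time_s'] * 1000:.0f} ms")
```

Running the loop shows the non-linear growth: response time goes from about 17 ms to 50 ms to 1 second as utilization rises from 40% to 80% to 99%.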

Figure: Utilization Affects Performance
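
M/M/1/k Queue (Finite Capacity)
$\rho = \lambda/\mu$
$N = \dfrac{\rho}{1-\rho} - \dfrac{(k+1)\,\rho^{k+1}}{1-\rho^{k+1}}$
$R = N/X = N/\lambda_{\text{eff}}$
-where $\lambda_{\text{eff}} = \lambda\,(1 - P_k)$ = effective arrival rate
-and $P_k = \rho^{k}(1-\rho)/(1-\rho^{k+1})$ = probability of a full queue
Loss rate $= \lambda - \lambda_{\text{eff}}$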
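
The finite-capacity formulas translate directly into code. The sketch below is an illustration, with `mm1k` and its dictionary keys as names chosen here; the branch for $\rho = 1$ is an added assumption (the limiting case in which all $k+1$ states are equally likely), since the closed forms divide by zero there.

```python
def mm1k(arrival_rate: float, service_rate: float, k: int) -> dict:
    """Steady-state M/M/1/k metrics; k is the total system capacity."""
    rho = arrival_rate / service_rate
    if rho == 1.0:
        # Limiting case (not on the slide): states 0..k are equally likely.
        p_full = 1.0 / (k + 1)
        jobs = k / 2.0
    else:
        p_full = rho**k * (1.0 - rho) / (1.0 - rho**(k + 1))
        jobs = rho / (1.0 - rho) - (k + 1) * rho**(k + 1) / (1.0 - rho**(k + 1))
    effective_rate = arrival_rate * (1.0 - p_full)   # lambda_eff, also throughput X
    return {
        "jobs_in_system": jobs,
        "p_full": p_full,
        "throughput": effective_rate,
        "response_time_s": jobs / effective_rate,
        "loss_rate": arrival_rate - effective_rate,
    }


if __name__ == "__main__":
    # Unlike M/M/1, an overloaded system (rho > 1) still has a steady state.
    for lam in (40, 99, 150):
        m = mm1k(lam, 100, k=10)
        print(f"lambda={lam}: R={m['response_time_s'] * 1000:.1f} ms, "
              f"X={m['throughput']:.1f}/s, loss={m['loss_rate']:.1f}/s")
```

Because excess requests are dropped rather than queued, response time stays bounded even under overload; the cost shows up as a non-zero loss rate.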

Figure: M/M/1/k Response Time

Figure: M/M/1/k Throughput

Scalability
The capability of a system to increase total throughput under an increased load when resources (typically hardware) are added
-Cost of additional resource
-Performance degradation under increased load

Scalability Example
Original web server: can process $\mu$ requests/sec; accepts requests at $\lambda$/sec
Now request rate increases to $10\lambda$/sec and web server is swamped ($\rho = 10\lambda/\mu$)!
Need to add new hardware!

Which is better?
Option 1: One big web server that can process $10\mu$ requests/sec
Option 2: Ten web servers, each can process $\mu$ requests/sec; each accepts 10% of requests ($\lambda$/sec per server)
Option 3: Ten web servers, each can process $\mu$ requests/sec; share single queue (load balancer) that accepts requests at $10\lambda$/sec

Option 1: M/M/1 queue with big server
Option 2: ten M/M/1 queues
Option 3: M/M/10 queue
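
M/M/m Queue (m Servers)
$\rho = \lambda/(m\mu)$, where $\lambda$ is the total arrival rate into the shared queue
$N = m\rho + \dfrac{\rho\,\varrho}{1-\rho}$
where $\varrho = \dfrac{(m\rho)^{m}}{m!\,(1-\rho)}\,p_0$ is the probability that an arriving job has to queue
and $p_0 = \left[\,1 + \dfrac{(m\rho)^{m}}{m!\,(1-\rho)} + \sum_{n=1}^{m-1} \dfrac{(m\rho)^{n}}{n!}\right]^{-1}$ is the probability of an empty system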
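
A quick consistency check is that the M/M/m expression for $N$ collapses to the M/M/1 result when $m = 1$. The sketch below assumes the standard Erlang-C forms for the queuing probability $\varrho$ and the empty-system probability $p_0$.

```latex
\begin{align*}
m = 1:\quad
p_0 &= \left[\,1 + \frac{\rho}{1-\rho}\right]^{-1} = 1-\rho
      \qquad \text{(the sum over } n = 1,\dots,m-1 \text{ is empty)} \\
\varrho &= \frac{\rho}{1-\rho}\,p_0 = \rho \\
N &= \rho + \frac{\rho\cdot\rho}{1-\rho}
   = \frac{\rho\,(1-\rho) + \rho^{2}}{1-\rho}
   = \frac{\rho}{1-\rho}
\end{align*}
```

This matches the M/M/1 value $N = \rho/(1-\rho)$, as expected for a single server.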
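
Which is Better?

                         Option 1       Option 2       Option 3
                         (M/M/1 big)    (ten M/M/1)    (M/M/10)
Utilization ($\rho$)     0.5            0.5            0.5
Number of requests (N)   1              1*             5.04
Response time (R)        2 ms           20 ms          10.07 ms

m = 10; $\mu = 100$; $\lambda = 50$ (so the total load is $10\lambda = 500$ requests/sec)
N for Option 3 follows from Little’s Law: $N = R \cdot X = 10.07$ ms $\times$ 500 requests/sec $\approx 5.04$
Remember: Scalability is not just about performance!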
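
The three response times can be recomputed from the queueing formulas. The sketch below is an illustration under stated assumptions: the function names are chosen here, and the M/M/m part uses the standard Erlang-C expressions for the queuing probability and the empty-system probability.

```python
from math import factorial


def mm1_response_time(arrival_rate: float, service_rate: float) -> float:
    """Mean response time of an M/M/1 queue: R = 1 / (mu - lambda)."""
    if arrival_rate >= service_rate:
        raise ValueError("M/M/1 needs arrival rate < service rate")
    return 1.0 / (service_rate - arrival_rate)


def mmm_response_time(arrival_rate: float, service_rate: float, m: int) -> float:
    """Mean response time of an M/M/m queue fed at the total arrival rate."""
    rho = arrival_rate / (m * service_rate)
    if rho >= 1.0:
        raise ValueError("M/M/m needs arrival rate < m * service rate")
    load = m * rho                                        # offered load, lambda / mu
    tail = load**m / (factorial(m) * (1.0 - rho))
    p0 = 1.0 / (1.0 + tail + sum(load**n / factorial(n) for n in range(1, m)))
    queue_prob = tail * p0                                # probability of queuing
    jobs = m * rho + rho * queue_prob / (1.0 - rho)       # N
    return jobs / arrival_rate                            # R = N / X


LAM, MU, M = 50.0, 100.0, 10   # values from the comparison above

options = {
    "Option 1 (one M/M/1, 10x faster)": mm1_response_time(M * LAM, M * MU),
    "Option 2 (ten separate M/M/1)": mm1_response_time(LAM, MU),
    "Option 3 (M/M/10, shared queue)": mmm_response_time(M * LAM, MU, M),
}

if __name__ == "__main__":
    for name, seconds in options.items():
        print(f"{name}: {seconds * 1000:.2f} ms")
```

Under these assumptions the sketch gives about 2 ms, 20 ms and 10.07 ms, in line with the table. A shared queue roughly halves the response time of ten isolated servers, while the single fast server remains the quickest of the three.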