NETE4631: Network Information System Capacity Planning (2) Suronapee Phoomvuthisarn, Ph.D. /

Slides:



Advertisements
Similar presentations
Lecture 5 This lecture is about: Introduction to Queuing Theory Queuing Theory Notation Bertsekas/Gallager: Section 3.3 Kleinrock (Book I) Basics of Markov.
Advertisements

Chapter 13 Queueing Models
SMA 6304/MIT2.853/MIT2.854 Manufacturing Systems Lecture 19-20: Single-part-type, multiple stage systems Lecturer: Stanley B. Gershwin
Performance Evaluation: Markov Models, revisited CSCI 8710 E. Kraemer.
10. 5: Model Solution Model Interpretation 10
Queueing Models and Ergodicity. 2 Purpose Simulation is often used in the analysis of queueing models. A simple but typical queueing model: Queueing models.
INDR 343 Problem Session
Review of Probability Distributions
TCOM 501: Networking Theory & Fundamentals
NETE4631:Capacity Planning (3)- Private Cloud Lecture 11 Suronapee Phoomvuthisarn, Ph.D. /
Reliable System Design 2011 by: Amir M. Rahmani
Lecture 13 – Continuous-Time Markov Chains
System Performance & Scalability i206 Fall 2010 John Chuang.
ECS 152A Acknowledgement: slides from S. Kalyanaraman & B.Sikdar
1 CSSE 477 – A bit more on Performance Steve Chenoweth Friday, 9/9/11 Week 1, Day 2 Right – Googling for “Performance” gets you everything from Lady Gaga.
Reliability Week 11 - Lecture 2. What do we mean by reliability? Correctness – system/application does what it has to do correctly. Availability – Be.
OS Fall ’ 02 Performance Evaluation Operating Systems Fall 2002.
Single queue modeling. Basic definitions for performance predictions The performance of a system that gives services could be seen from two different.
Performance Evaluation
Copyright © 1998 Wanda Kunkle Computer Organization 1 Chapter 2.1 Introduction.
1 PERFORMANCE EVALUATION H Often in Computer Science you need to: – demonstrate that a new concept, technique, or algorithm is feasible –demonstrate that.
1 CS 501 Spring 2005 CS 501: Software Engineering Lecture 22 Performance of Computer Systems.
OS Fall ’ 02 Performance Evaluation Operating Systems Fall 2002.
Queuing Networks: Burke’s Theorem, Kleinrock’s Approximation, and Jackson’s Theorem Wade Trappe.
Lecture 14 – Queuing Systems
CDA6530: Performance Models of Computers and Networks Examples of Stochastic Process, Markov Chain, M/M/* Queue TexPoint fonts used in EMF. Read the TexPoint.
Computer Networks Performance Evaluation. Chapter 12 Single Class MVA Performance by Design: Computer Capacity Planning by Example Daniel A. Menascé,
yahoo.com SUT-System Level Performance Models yahoo.com SUT-System Level Performance Models8-1 chapter12 Single Class MVA.
yahoo.com SUT-System Level Performance Models yahoo.com SUT-System Level Performance Models8-1 chapter10 Markov Models.
Copyright warning. COMP5348 Lecture 6: Predicting Performance Adapted with permission from presentations by Alan Fekete.
AN INTRODUCTION TO THE OPERATIONAL ANALYSIS OF QUEUING NETWORK MODELS Peter J. Denning, Jeffrey P. Buzen, The Operational Analysis of Queueing Network.
M EAN -V ALUE A NALYSIS Manijeh Keshtgary O VERVIEW Analysis of Open Queueing Networks Mean-Value Analysis 2.
Verification & Validation
Chapter 2 Machine Interference Model Long Run Analysis Deterministic Model Markov Model.
1 1 Slide © 2001 South-Western College Publishing/Thomson Learning Anderson Sweeney Williams Anderson Sweeney Williams Slides Prepared by JOHN LOUCKS QUANTITATIVE.
Performance Evaluation of Computer Systems Introduction
1 Performance Evaluation of Computer Systems and Networks Introduction, Outlines, Class Policy Instructor: A. Ghasemi Many thanks to Dr. Behzad Akbari.
MIT Fun queues for MIT The importance of queues When do queues appear? –Systems in which some serving entities provide some service in a shared.
Introduction to Operations Research
Lecture 14 – Queuing Networks Topics Description of Jackson networks Equations for computing internal arrival rates Examples: computation center, job shop.
Lecture 10: Queueing Theory. Queueing Analysis Jobs serviced by the system resources Jobs wait in a queue to use a busy server queueserver.
NETE4631:Capacity Planning (2)- Lecture 10 Suronapee Phoomvuthisarn, Ph.D. /
Network Design and Analysis-----Wang Wenjie Queueing System IV: 1 © Graduate University, Chinese academy of Sciences. Network Design and Analysis Wang.
Queuing Theory Basic properties, Markovian models, Networks of queues, General service time distributions, Finite source models, Multiserver queues Chapter.
Queueing Models with Multiple Classes CSCI 8710 Tuesday, November 28th Kraemer.
OPERATING SYSTEMS CS 3530 Summer 2014 Systems and Models Chapter 03.
Review Session Jehan-François Pâris. Agenda Statistical Analysis of Outputs Operational Analysis Case Studies Linear Regression.
Internet Applications: Performance Metrics and performance-related concepts E0397 – Lecture 2 10/8/2010.
Little’s Law & Operational Laws. Little’s Law Proportionality relation between the average number of jobs (E[N]) in a system and the average system time.
(C) J. M. Garrido1 Objects in a Simulation Model There are several objects in a simulation model The activate objects are instances of the classes that.
© 2015 McGraw-Hill Education. All rights reserved. Chapter 17 Queueing Theory.
Random Variables r Random variables define a real valued function over a sample space. r The value of a random variable is determined by the outcome of.
Managerial Decision Making Chapter 13 Queuing Models.
Reliability Engineering
Ó 1998 Menascé & Almeida. All Rights Reserved.1 Part VI System-level Performance Models for the Web.
OPERATING SYSTEMS CS 3502 Fall 2017
Lecture 14 – Queuing Networks
CPE 619 Mean-Value Analysis
DTMC Applications Ranking Web Pages & Slotted ALOHA
Lecture on Markov Chain
System Performance: Queuing
Computer Systems Performance Evaluation
TexPoint fonts used in EMF.
Lecture 13 – Queuing Systems
Lecture 14 – Queuing Networks
Lecture 18 Syed Mansoor Sarwar
Computer Systems Performance Evaluation
Chapter-5 Traffic Engineering.
Chapter 2 Machine Interference Model
TexPoint fonts used in EMF.
Presentation transcript:

NETE4631: Network Information System Capacity Planning (2) Suronapee Phoomvuthisarn, Ph.D. /

Lecture Outline  Solution to last lecture’s question (open system)  Markov Chain of close system  Example: Database server  Example: Data center reliability

Question1  Part (a): Calculate the resulting mean response time for each for the three alternatives  This aim of this question is to compare 3 different ways to upgrade an existing system.  Let us first calculate the response time of the existing system.  The assumptions mean that the queue is an M/M/1 queue with arrival rate λ = 9 and mean service time (= 1/ μ ) = 0.1.  This gives an utilisation ρ = λ / μ = 9 * 0.1 = 0.9.  Plugging these numbers into the mean response time T for M/M/1 = 1/( μ (1- ρ )), we have T = 1s.

Utilization Law  Consider  Utilisation U = B / T  Mean service time per completion S = B / C  Output rate X = C / T  Utilisation law  U = S X

Alternative 1  We upgrade the CPU to one twice as fast and since the service time is inversely proportional to the speed, the new mean service time =  The system is an M/M/1 queue with λ = 9 and = 1/ μ = The utilisation ρ = λ / μ = 9 * 0.05 =  Using the M/M/1 mean response time formula gives T =

Alternative 2 (Configuration 3)  We use 2 existing systems and each system maintains its own queue. The arrival rateat each queue is half of the system arrival rate.  Effectively, we have two M/M/1 queue with with λ = 4.5 and 1/ μ = 0.1. The utilisation ρ = λ / μ = 4.5 * 0.1 = Using the M/M/1 mean response time formula gives T =

Alternative 3 (Configuration 2)  This alternative is effectively an M/M/2 queue with λ = 9 and 1/ μ = 0.1. The utilisation ρ = λ / μ / 2= 9 * 0.1 / 2 =  Using the M/M/2 mean response time formula gives =

Curve of how the response time changes with the arrival rate.  Part (b): Plot a graph of arrival rates against the mean response time.

Queueing networks for close systems  Closed queueing networks  Have no external arrivals nor departures ( Known #customers )  Throughput depends on # customers  Example: Database server  A database server has a CPU and 2 disks (Disk1 and Disk2)  The response time is 10s for each query. How can we improve it?  Change the CPU? To what speed?  Add a CPU? What speed?  Add a new disk? What to move there?

DB server example  1 CPU, 1 fast disk, 1 slow disk.  Peak demand = 2 users in the system all the time.  Transactions alternate between CPU and disks.  The transactions will equally likely find files on either disk  Service time are exponentially distributed with mean showed in parentheses.

Typical capacity planning questions  What response time can a typical user expect?  What is the utilisation of each of the system resources?  How will performance parameters change if number of users are doubled?  If fast disk fails and all files are moved to slow disk, what will be the new response time?

Markov chain solution to the DB server problem  We use a 3-tuple (X,Y,Z) as the state  X is # users at CPU  Y is # users at fast disk  Z is # users at slow disk  Examples  (2,0,0): both users at CPU  (1,0,1): one user at CPU and one user at slow disk  Six possible states  (2,0,0) (1,1,0) (1,0,1) (0,2,0) (0,1,1) (0,0,2)

Identifying state transitions (1)  A state is: (#users at CPU, #users at fast disk, #users at slow disk)  What is the rate of moving from State (2,0,0) to State (1,1,0)?  This is caused by a job finishing at the CPU and move to fast disk  Jobs complete at CPU at a rate of 6 transactions/minute  Half of the jobs go to the fast disk  Transition rate from (2,0,0) -> (1,1,0) = 3 transaction/minute  Similarly, transition rate from (2,0,0) -> (1,0,1) = 3 transaction/minute

Identifying state transitions (2)  From (1,1,0) there are 3 possible transitions  Fast disk user goes back to CPU (2,0,0)  CPU user goes to the fast disk (0,2,0), or  CPU user goes to the slow disk (0,1,1)  Question: What are the transition rates in number of transactions per minute?

Markov model for the database server with 2 users

Flow balance equations

Steady State Probability  You can find the steady state probabilities from 6 equations  It’s easier to solve the equations by a software packages, e.g  Matlab, Scilab, Octave, Excel etc.  The solutions are:  P(2,0,0) =  P(1,1,0) =  P(1,0,1) =  P(0,2,0) =  P(0,1,1) =  P(0,0,2) =  How can we use these results for capacity planning?

Model interpretation  Response time of each transaction  Use Little’s Law T = N/X with N = 2  System throughput = CPU Throughput  Throughput = Utilisation x Service rate  CPU utilisation (using states where there is a job at CPU): P(2,0,0)+ P(1,1,0)+P(1,0,1)=  Throughput = x 6 = transactions / minute  Response time (with 2 users) = 2 / = minutes per transaction  When there are a large number of users, the burden to build a Markov chain model is large  If we have 4 users ->15 states need to solve 15 equations in 15 unknowns  We can use Mean Value Analysis (not cover in this lecture)

Another Example  In the previous example, we find that with the current workload and hardware specifications of the system, the response time is minutes per transaction. The engineer in charge of the system would like to improve the response time of the system by using a faster CPU.  Assuming: The workload remains the same as before.  There are always 2 users in the system.  The service time for the disks remains as before.  The service time for the CPU is inversely proportional to the speed of the CPU.  If the engineer would like to achieve a mean response time of 0.65 minutes per transactions, by how many times must the engineer speed up the CPU? Is speeding up the CPU a good choice? Explain.

Solving the model

What if we have 3 users instead?  What if we have 3 users in the database example instead of only 2 users?  We continue to use (X,Y,Z) as the state  X is the # users at CPU  Y is the # users at the fast disk  Z is the # users at the slow disk  There are 10 states:  (3,0,0),  (2,1,0),(2,0,1)  (1,2,0),(1,1,1),(1,0,2)  (0,3,0),(0,2,1),(0,1,2),(0,0,3)

What if there are n users?  You can show that if there are n users in the database server, the number of states m required will be  For n = 100, m (= #states) ~  The Markov model for a practical system will require many states due to  Large number of users  Large number of components

Example: Internet data centre availability  Distributed data centers  Availability problem:  Each data center may go down  Mean time between going down is 90 days  Mean repair time is 6 hours  Can I maintain % availability for 3 out of 4 centres  Technique: Markov Chain

Reliability problem using Markov chain  Consider the working-repair cycle of a machine  “Failure” is an arrival to the repair workshop  “Repair” time is the service time at the repair workshop  Let us assume  “Time-to-next-failure” and “Repair time” are exponentially distributed

Data centre reliability problem  Example: A data centre has 10 machines  Each machine may go down  Time-to-next-failure is exponentially distributed with mean 90 days  Repair time is exponentially distributed with mean 6 hours  Capacity planning question:  Can I make sure that at least 8 machines are available % of the time?  What is the probability that at least 6 machines are available?  How many repair staff are required to guarantee that at least k machines are available with a given probability?  What is the mean time to repair (MTTR) a machine?  Note: Mean-time-to-repair includes waiting time at the repair queue.

Data centre reliability - general problem  Data centre has  M machines  N staff maintain and repair machine  Assumption: M > N  Automatic diagnostic system  Check “heartbeat” by “ping” (Failure detection)  Staff are informed if failure is detected  Repair work  If a machine fails, any one of the idle repair staff (if there is one) will attend to it.  If all repair staff are busy, a failed machine will need to wait until a repair staff has finished its work  This is a queueing problem solvable by Markov chain!!!  Let us denote  λ = 1 / Mean-time-to-failure  μ = 1/ Mean repair time

Queueing model for data centre example

Markov model for the repair queue  State k represents k machines have failed  Part of the state transition diagram is showed below

Markov Model for the repair queue (2)

Solving the model

Using the model  Probability that exactly k machines are available = P(M-k)  Probability that at least k machines are available = P(0) + P(1) … + P(M-k)  But expression for P(k) are so complicated, need numerical software  Example:  M = 120  Mean-time-to-failure = 500 minutes  Mean repair time = 20 minutes  N = 2, 5 or 10  The results are showed in the graphs in the next page

Probability that at least k machines operate

Mean machine failure rate

Continuous-time Markov chain  Useful for analysing queues when the inter-arrival or service time distribution are exponential  The procedure is fairly standard for obtaining the steady state probability distribution  Identify the state  Find the state transition rates  Set up the balance equations  Solve the steady state probability  We can use the steady state probability to obtain other performance metrics: throughput, response time etc.  May need Little’s Law etc.  Continuous-time Markov chain is only applicable when the underlying probability distribution is exponential but the operations laws (e.g. Little’s Law) are applicable no matter what the underlying probability distributions are.

Simulation  You had learnt how to solve a number of queues analytically (= mathematically) given their  Inter-arrival time probability distribution  Service time probability distribution  Unfortunately, many queueing problems are still analytically intractable!  We will learn discrete event simulation for queueing problems  Simulation is an imitation of the operation of real-life system over time

References  Most of the slides have been modified from COMP9334 Capacity Planning of Computer Systems and Networks Week 1-7: Introduction to Capacity Planning, Chou, C. T., 2008  Recommended reading from Chou  The database server example is taken from Menasce et al., “Performance by design”, Chapter 10  The data centre example is taken from Mensace et al, “Performance by desing”, Chapter 7, Sections 1-4  For a more in-depth, and mathematical discussion of continuous-time Markov chain, see  Alberto Leon-Gracia, “Probabilities and random processes for Electrical Engineering”, Chapter 8.  Leonard Kleinrock, “Queueing Systems”, Volume 1