A Data Center by Ulrike Talbiersky, Holger Wichert, Christian Lohrengel, André Augustyniak Case Study Source: D. Menasce, V.A. Almeida, L.W. Dowdy Performance.


A Data Center, by Ulrike Talbiersky, Holger Wichert, Christian Lohrengel, André Augustyniak. Case Study Source: D. Menascé, V.A. Almeida, L.W. Dowdy, Performance by Design: Computer Capacity Planning by Example, Prentice Hall, 2004

2 Table of Contents:
- Introduction
- The Data Center
- First Model Attempt: Markov Chain
- Tasks
- Second Model Attempt: Two-Device QN
- Cost Analysis

3 Introduction
- Data centers offer a variety of services
- Trend: service-based data centers
- Problems:
  - Compliance with SLAs
  - Fault tolerance, privacy, security (...)
  - Too expensive
- How to choose the optimal size? (→ cost)

4 The Data Center
Machine-Repair-Model:
- M machines (functionally identical)
- N repair people
- Diagnostic system:
  - Detects failures of the machines
  - Maintains a queue of machines waiting to be repaired
  - Logs failure times
  - Records repair times

5 GSPN-Model (Sharpe)
- MiO: machines in operation
- MBR: machines being repaired
- MWR: machines waiting to be repaired
- λ: failure rate
- μ: repair rate

6 Queueing Model
Diagram labels: machines in operation, machines waiting to be repaired, machines being repaired

7 Parameters
- λ: failure rate
- 1/λ: MTTF (Mean Time to Failure)
- μ: repair rate
- 1/μ: time to repair a machine
- MTTR: Mean Time to Repair
- MTBF: Mean Time Between Failures

8 Building a Model ~1~
Example: Markov Chain
- k: number of failed machines
- k → k+1: transition when a machine fails
- k → k-1: transition when a machine is repaired
- λ_k = (M - k)λ: aggregate failure rate
- μ_k = kμ for k ≤ N, and Nμ for k > N: aggregate repair rate

9 Building a Model ~2~
One-dimensional Generalized Birth-Death (GBD) process: in state k, M - k machines are in operation
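The birth-death chain can be solved numerically by chaining the ratios of failure and repair rates. The following is a minimal Python sketch, assuming the standard machine-repair rates λ_k = (M - k)λ and μ_k = min(k, N)μ; the function name `steady_state` is illustrative and does not come from the slides.

```python
def steady_state(M, N, lam, mu):
    """Return [p_0, ..., p_M], where p_k is the steady-state probability
    that k machines are failed.

    Assumed rates: lambda_k = (M - k) * lam and mu_k = min(k, N) * mu.
    """
    weights = [1.0]  # unnormalized weight of state k = 0
    for k in range(M):
        failure_rate = (M - k) * lam      # aggregate failure rate in state k
        repair_rate = min(k + 1, N) * mu  # aggregate repair rate in state k + 1
        weights.append(weights[-1] * failure_rate / repair_rate)
    total = sum(weights)
    return [w / total for w in weights]


# Example values from the slides (M = 120, lambda = 0.002, mu = 0.05), with N = 5
p = steady_state(M=120, N=5, lam=0.002, mu=0.05)
print(f"p_0 = {p[0]:.3e}; most likely number of failed machines: {p.index(max(p))}")
```

The weights are normalized only at the end, which avoids computing p_0 in closed form first.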

10 Building a Model ~3~
Average aggregate rate at which machines fail (which equals the average aggregate rate at which machines are repaired): the state-dependent failure rates λ_k, weighted by the state probabilities p_k
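Written out as a formula, assuming p_k denotes the steady-state probability of k failed machines and μ_k = min(k, N)μ (the symbol \bar{X} is chosen here for illustration):

```latex
\[
\bar{X} \;=\; \sum_{k=0}^{M} \lambda_k \, p_k
        \;=\; \sum_{k=0}^{M-1} (M-k)\,\lambda \, p_k
        \;=\; \sum_{k=1}^{M} \mu_k \, p_k
\]
```

The last equality holds because, in steady state, the flow from state k to k+1 equals the flow from k+1 back to k.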

11 Building a Model ~4~
Interactive Response Time Law:
- Client workstations ↔ machines in operation
- Average think time Z ↔ MTTF
- Average response time R ↔ MTTR
- System throughput: X = M / (MTTF + MTTR)

12 Building a Model ~5~
Little's Law (applied to a box around the repair facility):
- R ↔ MTTR
- N_f = X · MTTR = average number of failed machines

13 Building a Model ~6~
Little's Law (applied to the operational machines):
- R ↔ MTTF
- N_o = X · MTTF = average number of operational machines
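The three laws above can be applied directly once the throughput X is known. The sketch below is illustrative only: the function name `apply_operational_laws` and the throughput value of 0.2 failures per minute are assumptions made for the example, not figures from the slides.

```python
def apply_operational_laws(M, mttf, throughput):
    """Derive MTTR, N_f and N_o from the throughput X (failures per minute)."""
    mttr = M / throughput - mttf       # Interactive Response Time Law: R = M/X - Z
    n_failed = throughput * mttr       # Little's Law on the repair facility
    n_operational = throughput * mttf  # Little's Law on the operational machines
    return mttr, n_failed, n_operational


# Hypothetical throughput, chosen only to make the arithmetic easy to follow
mttr, n_f, n_o = apply_operational_laws(M=120, mttf=500.0, throughput=0.2)
print(f"MTTR = {mttr:.1f} min, N_f = {n_f:.1f}, N_o = {n_o:.1f}")
```

By construction N_f + N_o = X · (MTTR + MTTF) = M, which is a quick sanity check on any computed throughput.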

14 Values for the Example
- M = 120 machines
- MTTF = 500 min, so λ = 1/500 = 0.002 per min
- Time to repair a machine = 20 min, so μ = 1/20 = 0.05 per min

15 Task 1
Given:
- failure rate of machines λ = 0.002 per min
- number of machines M = 120
- repair rate of machines μ = 0.05 per min
What is the probability that exactly j machines are operational?

16 Task 1
Use: P(exactly j machines in operation) = p_{M-j}, where p_k is the probability that k machines are failed

17 Task 1: results for N = 2, 5, 10

18 Task 2
Given:
- failure rate of machines λ = 0.002 per min
- number of machines M = 120
- number of repair people N
- repair rate of machines μ = 0.05 per min
What is the probability P_j that at least j machines are operational?

19 Task 2
Use Task 1 and sum over all states with at most M - j failed machines: P_j = p_0 + p_1 + ... + p_{M-j}
- Once the repair personnel become overloaded, the system tends towards failure
- If M >> N, having extra machines is pointless
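Tasks 1 and 2 reduce to indexing and summing the state probabilities. The following is a hedged Python sketch, again assuming the rates λ_k = (M - k)λ and μ_k = min(k, N)μ; the helper names `prob_exactly` and `prob_at_least` are hypothetical.

```python
def steady_state(M, N, lam, mu):
    """p_k = probability of k failed machines (assumed machine-repair rates)."""
    weights = [1.0]
    for k in range(M):
        weights.append(weights[-1] * (M - k) * lam / (min(k + 1, N) * mu))
    total = sum(weights)
    return [w / total for w in weights]


def prob_exactly(p, j):
    """Task 1: exactly j operational means exactly M - j failed."""
    M = len(p) - 1
    return p[M - j]


def prob_at_least(p, j):
    """Task 2: at least j operational means at most M - j failed."""
    M = len(p) - 1
    return sum(p[: M - j + 1])


for N in (2, 5, 10):
    p = steady_state(M=120, N=N, lam=0.002, mu=0.05)
    print(f"N = {N:2d}: P(exactly 80 up) = {prob_exactly(p, 80):.4f}, "
          f"P(at least 80 up) = {prob_at_least(p, 80):.4f}")
```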

20 Task 3
Given:
- failure rate of machines λ = 0.002 per min
- number of machines M = 120
- wanted probability: P_j = 0.9
- time to repair a machine = 20 min
How many repair people are necessary to guarantee that at least two thirds of the machines (j = 80) are operational with probability P_j = 0.9?

21 Tasks 2 and 3: results for N = 2, 3, 4, 5, 10
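Task 3 can be approached by increasing N until the probability from Task 2 reaches the target. This is a sketch under the same rate assumptions as before (λ_k = (M - k)λ, μ_k = min(k, N)μ); `min_repair_people` is an illustrative name, and the linear search is just the simplest option.

```python
def steady_state(M, N, lam, mu):
    """p_k = probability of k failed machines (assumed machine-repair rates)."""
    weights = [1.0]
    for k in range(M):
        weights.append(weights[-1] * (M - k) * lam / (min(k + 1, N) * mu))
    total = sum(weights)
    return [w / total for w in weights]


def min_repair_people(M, lam, mu, j, target):
    """Smallest N with P(at least j machines operational) >= target, or None."""
    for N in range(1, M + 1):
        p = steady_state(M, N, lam, mu)
        if sum(p[: M - j + 1]) >= target:
            return N
    return None


# Two thirds of 120 machines is j = 80; time to repair 20 min gives mu = 0.05
needed = min_repair_people(M=120, lam=0.002, mu=1 / 20, j=80, target=0.9)
print("repair people needed:", needed)
```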

22 Task 4
Given are the values. What is the effect of the size of the repair team, N, on the MTTR of a machine?

23-27 Task 4 computation (built up step by step over five slides)
1. p_0
2. p_k
4. MTTR
5. N_o
6. N_f
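The computation steps can be chained in a few lines. The sketch below assumes the rates λ_k = (M - k)λ and μ_k = min(k, N)μ and takes the throughput to be the average aggregate failure rate; the function name `repair_metrics` is hypothetical.

```python
def repair_metrics(M, N, lam, mu):
    """Return (MTTR, N_o, N_f) for M machines and N repair people."""
    # p_0 and p_k: unnormalized weights, then normalization
    weights = [1.0]
    for k in range(M):
        weights.append(weights[-1] * (M - k) * lam / (min(k + 1, N) * mu))
    total = sum(weights)
    p = [w / total for w in weights]
    # Average aggregate failure rate, used as the system throughput X
    throughput = sum((M - k) * lam * pk for k, pk in enumerate(p))
    mttf = 1.0 / lam
    mttr = M / throughput - mttf      # Interactive Response Time Law
    n_operational = throughput * mttf  # Little's Law, operational machines
    n_failed = throughput * mttr       # Little's Law, repair facility
    return mttr, n_operational, n_failed


for N in (1, 2, 3, 4, 5, 10, 120):
    mttr, n_o, n_f = repair_metrics(M=120, N=N, lam=0.002, mu=0.05)
    print(f"N = {N:3d}: N_o = {n_o:6.1f}, N_f = {n_f:6.1f}, MTTR = {mttr:7.1f} min")
```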

28 Task 4: Effect of the Number of Repair People
Table columns:
- N: number of repair people
- N_o: average number of operational machines
- N_f: average number of failed machines
- MTTR: Mean Time to Repair

29 Task 4
- If the number of repair people is increased beyond 5, further decreases in the MTTR are minimal
- With 5 repair people: 111 machines operational on average, down time of 38 minutes (MTTR = 38 min: 20 min repair, 18 min wait)

30 Task 4, case N = M = 120 (one repair person per machine)
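For this limiting case the values can be worked out by hand. The derivation below is a sketch that assumes μ_k = kμ for all k when N = M, so that no failed machine ever waits; it uses only the example values λ = 0.002 and μ = 0.05 per minute.

```latex
\[
\mathrm{MTTR} = \frac{1}{\mu} = 20\ \text{min}, \qquad
N_o = M \cdot \frac{\mathrm{MTTF}}{\mathrm{MTTF} + \mathrm{MTTR}}
    = 120 \cdot \frac{500}{520} \approx 115.4, \qquad
N_f = M - N_o \approx 4.6
\]
```

This is the best MTTR any repair team can achieve, since only the 20 minutes of actual repair time remain.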

31 Task 5
Given are the values. What is the effect of a repair person's skill level on the overall down time?

32 Task 5
Given are the values. How does the skill level affect the percentage of operational machines?

33 Task 5: Effect of the Repair Rate
Table columns:
- N_o: average number of operational machines
- N_f: average number of failed machines
- MTTR: Mean Time to Repair
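One way to explore Task 5 is to treat skill level as the repair rate μ and sweep it for a fixed team size. In the sketch below, the team size of 5 and the repair times of 30, 20 and 10 minutes are hypothetical choices for illustration, and the rates λ_k = (M - k)λ, μ_k = min(k, N)μ are assumed as before.

```python
def repair_metrics(M, N, lam, mu):
    """Return (MTTR, N_o, N_f) under the assumed machine-repair rates."""
    weights = [1.0]
    for k in range(M):
        weights.append(weights[-1] * (M - k) * lam / (min(k + 1, N) * mu))
    total = sum(weights)
    throughput = sum((M - k) * lam * w / total for k, w in enumerate(weights))
    mttr = M / throughput - 1.0 / lam
    return mttr, throughput / lam, throughput * mttr


M, N = 120, 5  # N = 5 is a hypothetical team size for this sweep
results = []
for repair_time in (30.0, 20.0, 10.0):  # hypothetical skill levels, in minutes
    mttr, n_o, n_f = repair_metrics(M, N, lam=0.002, mu=1.0 / repair_time)
    results.append((repair_time, mttr, n_o))
    print(f"repair time {repair_time:4.0f} min: MTTR = {mttr:6.1f} min, "
          f"operational = {100.0 * n_o / M:5.1f} %")
```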

34 Second Modeling Attempt ~1~
The failure-recovery model can also be modeled by a two-device QN:
- 1st device: delay server (represents the machines in operation)
- 2nd device: load-dependent server (represents the repair people)

35 Second Modeling Attempt ~2~
Delay server: a repaired machine goes into operation without queueing. The time a machine stays in operation depends only on its MTTF.

36 Second Modeling Attempt ~3~
Load-dependent server: the total rate at which machines are repaired (TRMR) depends on:
- the number of failed machines k
- the number of repair people N
Service rate: μ(k) = kμ for k ≤ N, and Nμ for k > N

37 Second Modeling Attempt ~4~
Use the MVA method with load-dependent devices to solve this model.
Required: service rate multipliers for k = 1, ..., M (see Chapter 14)

38 Second Modeling Attempt ~5~
The solution of this MVA model gives us:
- the average throughput X
- the average residence time at the LD device, which equals the MTTR
- by Little's Law applied to the LD device, the average number of failed machines: N_f = X · MTTR
- the average number of machines in operation: N_o = M - N_f
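The two-device model can be solved with the single-class MVA recursion for load-dependent devices. The sketch below assumes service rate multipliers of min(j, N) for j failed machines, a think time equal to the MTTF, and a service demand equal to the time to repair a machine; the function name `mva_load_dependent` is illustrative.

```python
def mva_load_dependent(M, N, mttf, repair_time):
    """Single-class MVA: one delay server plus one load-dependent device.

    Returns (X, MTTR, N_f, N_o). Assumed multipliers: alpha(j) = min(j, N).
    """
    prob = [1.0]  # prob[j] = P(j machines at the repair device | population n - 1)
    throughput, residence = 0.0, 0.0
    for n in range(1, M + 1):
        # Residence time at the load-dependent device with n machines in total
        residence = repair_time * sum(
            j / min(j, N) * prob[j - 1] for j in range(1, n + 1)
        )
        throughput = n / (mttf + residence)
        # Marginal queue-length probabilities for population n
        new_prob = [0.0] * (n + 1)
        for j in range(1, n + 1):
            new_prob[j] = repair_time * throughput / min(j, N) * prob[j - 1]
        new_prob[0] = max(0.0, 1.0 - sum(new_prob[1:]))
        prob = new_prob
    n_failed = throughput * residence  # Little's Law on the LD device
    return throughput, residence, n_failed, throughput * mttf


X, mttr, n_f, n_o = mva_load_dependent(M=120, N=5, mttf=500.0, repair_time=20.0)
print(f"X = {X:.4f} per min, MTTR = {mttr:.1f} min, N_f = {n_f:.1f}, N_o = {n_o:.1f}")
```

Computing the empty-device probability by subtraction is the textbook form of the recursion, but it can lose precision when that probability is very small, so the sketch clamps it at zero.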

39 A Cost Analysis
- C_p: annual personnel cost
- C_m: annual cost per machine
- constant revenue multiplier
- N_o: average number of machines in operation
- M_min: minimum number of machines that need to be in operation for the data center not to have to pay a penalty
- C α: cost
- R α: revenue

40 A Cost Analysis
Formulas for cost, revenue, and profit (profit = revenue - cost)

41 A Cost Analysis

42 A Cost Analysis
- Negative profit for low numbers of personnel, because of low machine availability
- With more than 6 personnel, costs increase more than revenue, thus 6 service personnel are optimal

43 References
- Lecture notes and talks by Menascé, CS672_Performance: cs672-07CaseStudy-III-DataCenter.pdf, cs672-03QuantifyingPerformanceModels.pdf
- Lecture notes SN1
- Haverkort: Computer Communication Systems Performance Analysis