
1 What Changed? Frank Bereznay Kaiser Permanente

2 What Changed? Two Questions –Can we use statistical techniques to help us differentiate between variation that is part of a normal operation and variation due to assignable causes? –How should data be organized to properly use these techniques?

3 Agenda Frank’s One Hour Stat Class –What is Statistics all about? –Commonly Used Statistical Techniques Hypothesis Testing Statistical Process Control Analysis of Variance –Time Series Data –An Example

4 What is Statistics all about? Populations have Parameters. Samples have Statistics. Statistics is all about estimating Population parameters by taking samples and calculating statistics. A Key Question –What is the data population you are trying to estimate and what are its properties?

5 Hypothesis Testing A very brief review Conduct an experiment to make a decision about a population parameter. Relies on the Central Limit Theorem to describe the properties of a sample. –For sufficiently large samples, the sample mean is approximately normally distributed irrespective of the distribution of the underlying population.

6 Hypothesis Testing Classic form of test H₀: µ = µ₀ Hₐ: µ ≠ µ₀ A level of confidence, usually 95%, is specified. You collect a sample set of values and compare the derived statistic against a normal distribution to make the determination.
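The test described on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `two_sided_z_test`, the 200 ms target, and both samples are hypothetical, and the sample standard deviation stands in for the unknown population value, which is reasonable only for fairly large samples.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def two_sided_z_test(sample, mu0, alpha=0.05):
    """Test H0: mu = mu0 against Ha: mu != mu0 with a normal approximation."""
    n = len(sample)
    standard_error = stdev(sample) / sqrt(n)
    z = (mean(sample) - mu0) / standard_error
    # Two-sided p-value: probability of a statistic at least this extreme
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value, p_value < alpha

# Hypothetical response times in ms, tested against a mandated 200 ms target
on_target = [198 + (i % 5) for i in range(40)]   # centered on 200
shifted = [203 + (i % 5) for i in range(40)]     # centered on 205

z_ok, p_ok, reject_ok = two_sided_z_test(on_target, 200)
z_bad, p_bad, reject_bad = two_sided_z_test(shifted, 200)
print(reject_ok, reject_bad)
```

A 95% level of confidence corresponds to `alpha=0.05`; the null hypothesis is rejected only when the p-value falls below that threshold.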

7 Hypothesis Testing

8 So, what can this do for us? –If we know what a metric is supposed to be, we can sample and test. –This has some value for SLAs and other metrics that are mandated. –Most of the time we don’t know the population parameters.

9 Statistical Process Control A bit of History –Walter Shewhart –W Edwards Deming –Post WWII Japan and the Deming Cycle

10 Statistical Process Control Key Concepts and Terms –There is no up front hypothesis test. –No information is required about the parameters of the process being evaluated. –Control Chart Primary method to track a process. Many forms of the metric can be analyzed. –Rational Subgroups Recurring sets of data that are summarized for analysis.

11 Statistical Process Control Sample Control Chart

12 Statistical Process Control A number of CMG papers have been published in this area: –Brey “Managing at the Knee of the Curve (The use of SPC in Managing a Data Center)”, CMG90 –Lipner “Zero-Defect Capacity and Performance Management”, CMG92 –Chu “A Three Sigma Quality Target for 100 Percent SLA”, CMG92 –Buzen & Shum “MASF – Multivariate Adaptive Statistical Filtering”, CMG95

13 Statistical Process Control Following the Buzen & Shum paper, Trubin has published a set of papers on applications of MASF. However, the overall interest in this area seemed to wane. –I believe it is primarily due to the complexity of the data we work with.

14 Analysis of Variance (ANOVA) Developed by Sir Ronald A. Fisher in the early 20th Century. Initial use was focused on helping the agriculture industry. –A method was needed to evaluate the effectiveness of multiple simultaneous attempts to improve crop yield.

15 Analysis of Variance (ANOVA) Take an area of interest and sub-divide it into multiple populations. Subject these separate populations to various treatments. Determine whether there are differences in the population means that can be attributed to the treatments.

16 Analysis of Variance (ANOVA) Hypothesis test H₀: µ₁ = µ₂ = µ₃ = … = µₙ Hₐ: Not all µᵢ (i = 1, 2, 3, …, n) are equal This can be a very handy tool to determine if there are differences in the sub-groups within a body of data. –Are business volumes the same Monday through Friday? –Is there a difference in Tuesday’s volume week over week?
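As a rough illustration of the arithmetic behind this test, the sketch below computes the one-way ANOVA F statistic from raw groups. The function name `one_way_anova` and the daily volumes are hypothetical, not taken from the presentation. The p-value is left out because it needs the F distribution, which the Python standard library does not provide; a statistics package or an F table would supply it.

```python
from statistics import mean

def one_way_anova(groups):
    """Return (ss_between, ss_within, df_between, df_within, f_value)."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, df_between, df_within, f_value

# Hypothetical transaction volumes (thousands) for three weekdays, four weeks each
volumes = [
    [10, 12, 11, 13],  # Monday
    [14, 15, 13, 14],  # Tuesday
    [10, 11, 12, 11],  # Wednesday
]

ss_b, ss_w, df_b, df_w, f_value = one_way_anova(volumes)
print(df_b, df_w, round(f_value, 2))
```

A large F value means the group means differ by more than the within-group scatter would explain, which is evidence against H₀.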

17 Quick Summary We have described three techniques: –Hypothesis Testing –Statistical Process Control –Analysis of Variance (ANOVA) Time to see an example

18 Example

19 Bottling Process We have a process that puts a beverage in a bottle –The intended fill volume is 2 liters, or 2,000 cubic centimeters (CM) –We collect 36 random samples over a nine-day period, four per day. Sample Mean: 1,999.51 Sample Variance: 1.89 Standard Deviation: 1.37

20 Bottling Process

21 Upper Control Limit = 2001.46 Lower Control Limit = 1997.56 Process Range = 3.9
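The slide does not say how these limits were calculated. One plausible reading, offered here only as an assumption, is a three-sigma chart for daily subgroup means of four samples, with the within-day spread taken from the Error mean square (1.70058796) in the ANOVA output on slide 23. The sketch below, with the hypothetical helper `xbar_limits`, lands within about 0.01 of the slide's limits; a conventional range-based X-bar chart would give slightly different values.

```python
from math import sqrt

def xbar_limits(center, sigma_within, subgroup_size, k=3.0):
    """k-sigma control limits for the mean of a subgroup of the given size."""
    half_width = k * sigma_within / sqrt(subgroup_size)
    return center - half_width, center + half_width

process_mean = 1999.51       # sample mean from slide 19
mse_within = 1.70058796      # Error mean square from slide 23 (assumed sigma source)
subgroup_size = 4            # samples per day

lcl, ucl = xbar_limits(process_mean, sqrt(mse_within), subgroup_size)
print(round(lcl, 2), round(ucl, 2), round(ucl - lcl, 1))
```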

22 Bottling Process

23 The ANOVA Procedure
Class Level Information
Class      Levels  Values
TimeStamp  9       13MAR06 14MAR06 15MAR06 16MAR06 17MAR06 18MAR06 19MAR06 20MAR06 21MAR06
Number of Observations Read  36
Number of Observations Used  36
Dependent Variable: CM
Source           DF  Sum of Squares  Mean Square  F Value  Pr > F
Model             8     20.23060000   2.52882500     1.49  0.2081
Error            27     45.91587500   1.70058796
Corrected Total  35     66.14647500

24 Tukey's Studentized Range (HSD) Test for CM
Means with the same letter are not significantly different.
Tukey Grouping  Mean       N  TimeStamp
A               2000.3525  4  13MAR06
A               2000.3200  4  20MAR06
A               2000.2050  4  19MAR06
A               1999.8775  4  18MAR06
A               1999.8275  4  14MAR06
A               1999.1275  4  16MAR06
A               1999.0150  4  21MAR06
A               1998.8075  4  17MAR06
A               1998.0650  4  15MAR06

25 Second Quick Summary That was reassuring! –When we purchase a bottle of wine we can be sure we are getting our money’s worth. Why did Frank choose this example? What is the relevance to our commercial computing environments?

26 Time Series Data The instrumentation data we analyze is a very complex data aggregate that contains the influences of multiple factors. Many of the factors are related to time or duration. –Hour of the day, Day of the week. –Day of the month, Month of the Year. –Growth rate for the enterprise.

27 Time Series Data Example of overall MIPS usage.

28 Time Series Data Time Series data generally contains four components. –Trend: Long-term constant movement. –Cycle: Movement pattern greater than a year. –Seasonal Variations: Movement patterns within a year. –Irregular Fluctuations: Events not triggered by a duration.

29 Time Series Data To properly work with this type of data you need to decompose it into its components before you begin the testing. You have four separate questions to ask, one for each component. –Did the trend component change? –Did the cycle component change? –Did the seasonal component change? –Did the irregular component change?
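A minimal sketch of such a decomposition, assuming an additive model and an odd seasonal period, is shown below. The function `decompose_additive` and the test series are hypothetical. The cycle component is not separated out, since isolating it would need several years of data; here it would be absorbed into the trend.

```python
from statistics import mean

def decompose_additive(series, period):
    """Split a series into trend, seasonal, and irregular parts (odd period)."""
    half = period // 2
    n = len(series)
    # Trend: centered moving average over one full period; undefined at the edges
    trend = [None] * n
    for i in range(half, n - half):
        trend[i] = mean(series[i - half:i + half + 1])
    # Seasonal: average detrended value at each position within the period
    detrended = [(i, series[i] - trend[i]) for i in range(n) if trend[i] is not None]
    raw = [mean(d for i, d in detrended if i % period == p) for p in range(period)]
    offset = mean(raw)
    seasonal = [r - offset for r in raw]  # force seasonal effects to sum to zero
    # Irregular: whatever trend and seasonal effects do not explain
    irregular = [
        None if trend[i] is None else series[i] - trend[i] - seasonal[i % period]
        for i in range(n)
    ]
    return trend, seasonal, irregular

# Hypothetical daily series: linear growth plus a weekly pattern, four weeks long
weekly_pattern = [5, 3, 0, -2, -4, -3, 1]
series = [100 + 2 * i + weekly_pattern[i % 7] for i in range(28)]

trend, seasonal, irregular = decompose_additive(series, period=7)
```

Each returned component can then be tested on its own, which is the point of the four questions above.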

30 Time Series Data While it is possible to perform the decomposition of Time Series data into its components, the best strategy is to avoid the need to do so. –Choose granular data intervals. –Keep the number of intervals to a minimum. –Start with a 24x7 type of matrix and use ANOVA to determine the hours that belong together.

31 Third Quick Summary OK, now we have a strategy. –Treat each hour of each day as a separate process. –Use ANOVA to see how similar the day/hour combination is week over week. Are we selecting the right combination? –Use SPC to develop a process mean and control limits for this day/hour. –Plot the results on a day by day basis. Let’s see how this looks.
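The SPC step of this strategy can be sketched as follows. Everything here is illustrative: the function names, the `(weekday, hour, value)` record layout, and the CPU figures are assumptions. For simplicity the limits are three sample standard deviations around the mean of the weekly values for each day/hour cell, not a full control-chart calculation.

```python
from collections import defaultdict
from statistics import mean, stdev

def build_limits(history, k=3.0):
    """Map each (weekday, hour) cell to (lcl, center, ucl) from past weeks."""
    cells = defaultdict(list)
    for weekday, hour, value in history:
        cells[(weekday, hour)].append(value)
    limits = {}
    for key, values in cells.items():
        if len(values) < 2:
            continue  # at least two weeks are needed to estimate spread
        center, spread = mean(values), stdev(values)
        limits[key] = (center - k * spread, center, center + k * spread)
    return limits

def out_of_control(limits, weekday, hour, value):
    """True when a new observation falls outside its cell's control limits."""
    lcl, _, ucl = limits[(weekday, hour)]
    return value < lcl or value > ucl

# Hypothetical CPU busy % for Thursdays at 10:00 over four baseline weeks
history = [("Thu", 10, v) for v in (28.1, 29.9, 28.0, 28.7)]
limits = build_limits(history)

print(out_of_control(limits, "Thu", 10, 29.0))
print(out_of_control(limits, "Thu", 10, 34.8))
```

Keeping one set of limits per day/hour cell is what lets the comparison avoid decomposing the series first: each cell is only ever compared with its own history.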

32 Example – Midrange CPU

33 Tukey's Studentized Range (HSD) Test for CPUAVE
Alpha                                0.05
Error Degrees of Freedom             15
Error Mean Square                    5.953
Critical Value of Studentized Range  4.36699
Minimum Significant Difference       5.3275
Means with the same letter are not significantly different.
Tukey Grouping  Mean    N  DATE
A               34.800  4  30MAR06
A B             29.850  4  09MAR06
B               28.650  4  23MAR06
B               28.075  4  02MAR06
B               28.025  4  16MAR06
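The Minimum Significant Difference on this slide follows from the other reported values as q × sqrt(MSE / n). The sketch below reproduces that arithmetic and applies it to the listed means; the variable names are illustrative, and the critical value is taken from the slide instead of being computed.

```python
from itertools import combinations
from math import sqrt

q_crit = 4.36699     # critical value of the studentized range (from the slide)
mse = 5.953          # error mean square (from the slide)
n_per_group = 4      # observations per date

# Tukey's minimum significant difference for equal group sizes
msd = q_crit * sqrt(mse / n_per_group)

means = {
    "30MAR06": 34.800,
    "09MAR06": 29.850,
    "23MAR06": 28.650,
    "02MAR06": 28.075,
    "16MAR06": 28.025,
}

# Pairs of dates whose means differ by more than the MSD
different = [
    (a, b) for a, b in combinations(means, 2)
    if abs(means[a] - means[b]) > msd
]
print(round(msd, 4), different)
```

Only 30MAR06 separates from the three lowest dates, while 09MAR06 sits within the MSD of every other date, which is why it carries both letters in the grouping.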

34 Example – Midrange CPU

35

36

37 Example – Midrange CPU Usage

38 Example – Midrange CPU

39 Example – Midrange CPU Usage

40 Summary These statistical tools can really help. –But it is not a slam dunk to implement. –You need to get to know your data. Producing the information is only the beginning. –Recall the problems Shewhart and Deming had. –This really needs to be the basis for managing the environment.

