Did Something Change? Frank Bereznay

Did Something Change? Using Statistical Techniques to Interpret Service and Resource Metrics. Frank Bereznay

Did Something Change? Abstract: In a perfect world, one would always know the answer to that question. Unfortunately, nobody works in a perfect world. This paper / presentation will explore statistical techniques used to look for deviations in metrics that are due to assignable causes, as opposed to the period-to-period variation that is normally present. Hypothesis Testing, Statistical Process Control, Multivariate Adaptive Statistical Filtering, and Analysis of Variance will be compared and contrasted. SAS code will be used to perform the analysis. Exploratory analysis techniques will be used to build populations for analysis purposes.

Did Something Change? Outline: What is Statistics all about? It's the population that counts. Repeatable processes. Four techniques to review: Hypothesis Testing, Statistical Process Control, Multivariate Adaptive Statistical Filtering (MASF), and Analysis of Variance (ANOVA). Example. Summary & Questions.

Did Something Change? A Note About Bill Mullen

What is Statistics All About? It is the Population that Counts. Populations have Parameters. Samples have Statistics. The Science of Statistics is all about estimating Population Parameters by taking Samples and calculating Statistics.

What is Statistics All About? What is your population? It can be anything you want it to be, but it must have well-defined boundaries, for example a production cycle of a manufacturing process, a work shift, or a bottling run for a particular wine vintage. It must be randomly sampled.

Hypothesis Testing: Standard topic for a first-year statistics class. Simple and easy to do. Interpretation of results has been misunderstood.

Hypothesis Testing: Start with a statement you wish to contradict or disprove. Typically this is the status quo. It becomes the null hypothesis: the average message rate is 15 per minute. Create an alternative hypothesis that contradicts the null hypothesis: the average message rate is not 15 per minute.

Hypothesis Testing: Standard notation for stating the problem.
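The notation shown on this slide did not survive in the transcript. Assuming the conventional form, and writing \mu for the population mean message rate per minute (a symbol chosen here for illustration), the two hypotheses from the previous slide would read:

```latex
H_0: \mu = 15 \qquad H_a: \mu \neq 15
```

This is a two-sided alternative, which matches the wording "is not 15 per minute".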

Hypothesis Testing: What is the population we are working with here? It is a 24-hour period. We must randomly sample across the entire period. We randomly collect message rates at 10 different points in time: 13, 14, 16, 11, 16, 15, 12, 16, 12, 14.

Hypothesis Testing – Population Parameters

Hypothesis Testing – Population Distribution

Hypothesis Testing – Sample Statistics

Hypothesis Testing: Calculation of the t statistic with 9 (N-1) degrees of freedom.
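The worked numbers on this slide are not in the transcript, so the following is a minimal sketch of the calculation using the ten sampled message rates and the null value of 15 from the earlier slides. It is written in Python rather than the SAS used in the talk, and the critical value 2.262 is the two-sided 5% point for 9 degrees of freedom taken from a standard t table.

```python
import math
import statistics

# Message rates sampled at 10 random points in the 24-hour period (from the slide).
rates = [13, 14, 16, 11, 16, 15, 12, 16, 12, 14]
mu0 = 15  # mean stated by the null hypothesis

n = len(rates)
mean = statistics.mean(rates)    # sample mean
s = statistics.stdev(rates)      # sample standard deviation (N-1 divisor)
t_stat = (mean - mu0) / (s / math.sqrt(n))

# Two-sided 5% critical value for 9 degrees of freedom, from a standard t table.
T_CRIT_9DF = 2.262
reject_null = abs(t_stat) > T_CRIT_9DF

print(f"mean={mean:.2f} s={s:.3f} t={t_stat:.3f} reject H0: {reject_null}")
```

The t statistic comes out at roughly -1.88, which lies inside ±2.262, so the null hypothesis is not rejected.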

Hypothesis Testing

Hypothesis Testing: So, what does this tell us? The official statement is: At a 95% confidence level, the data is insufficient for us to state that the mean of the population is not 15 for the 24-hour period being examined. Important point: the contrary is not necessarily true. This does not prove in any way that the population mean is 15.

Hypothesis Testing: Statistical assumptions that need to be considered. Underlying population does not need to be normally distributed. The population must be randomly sampled.

Hypothesis Testing: Some practical uses. Validating we have met an SLA. Looking to see if something is not what we expect it to be. Key point: This technique combines an a priori expectation about a quality metric with sampled data. You need to know your data and choose wisely.

Statistical Process Control: Two legends stand out in this area, Walter Shewhart and W. Edwards Deming. SPC is conceptually similar to Hypothesis Testing, but computationally different. No a priori data point is needed. Data is sub-grouped for calculation purposes. SPC and Hypothesis Testing can produce different results for the same set of data.

Statistical Process Control Sample Order Output

Statistical Process Control

Statistical Process Control, done the correct way: 15 ± 0.373 × 4.33
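The expression on this slide has the shape of the usual X-bar chart limit, grand mean ± A2 × mean subgroup range. The factor 0.373 matches the published A2 constant for subgroups of eight, so a subgroup size of 8 is assumed below; the slide itself does not state it. A minimal Python sketch, with hypothetical function names:

```python
# A2 constants for X-bar/R control charts, indexed by subgroup size
# (standard published values).
A2 = {2: 1.880, 3: 1.023, 4: 0.729, 5: 0.577, 6: 0.483, 7: 0.419, 8: 0.373}

def summarize(subgroups):
    """Return (grand mean, mean range) for equally sized subgroups."""
    means = [sum(g) / len(g) for g in subgroups]
    ranges = [max(g) - min(g) for g in subgroups]
    return sum(means) / len(means), sum(ranges) / len(ranges)

def xbar_limits(grand_mean, mean_range, subgroup_size):
    """Return (lower, upper) X-bar chart limits: grand mean +/- A2 * mean range."""
    half_width = A2[subgroup_size] * mean_range
    return grand_mean - half_width, grand_mean + half_width

# Slide values: grand mean 15 and mean range 4.33.
# Subgroup size 8 is an assumption inferred from the 0.373 factor.
lcl, ucl = xbar_limits(15, 4.33, 8)
print(f"LCL={lcl:.2f} UCL={ucl:.2f}")
```

With raw data, `summarize` would supply the first two arguments; here the slide's summary values are used directly, giving limits of roughly 13.4 and 16.6.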

Statistical Process Control Without Sub-grouping limit calculation

Statistical Process Control Without Sub-grouping limit calculation

Statistical Process Control: Statistical assumptions that need to be considered. The data does not need to be normally distributed. Proper sub-grouping of the data is fundamental to the technique. Sampling plan must be random and cover the boundaries of the population being examined.

Statistical Process Control: Practical uses. Useful for measuring discrete physical objects: things that have physical properties, counts for outputs, dollar volumes for orders / sales. Not appropriate for the interval-based instrumentation data we frequently use.

Multivariate Adaptive Statistical Filtering (MASF): Developed by Annie Shum and Jeff Buzen. Subject of a 1995 CMG paper by the same name. A practitioner's approach to creating a statistical detection technique that addresses the unique challenges of the interval-driven time series datasets used by Computer Resource Management professionals.

Why MASF? Variance-based statistical detection techniques are based on repeatable processes, such as filling a bottle with wine or manufacturing a roll of paper. Commercial computer workloads are generally not repeatable processes (and that is an understatement!).

MASF: A two-step process is established. First, a Reference Set is created during a period of normal operation in place of a random sample. Second, the Reference Set is used as a set of criteria to examine data from subsequent periods.
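The two steps can be sketched as follows. This is an illustration only: the talk leaves the measurement methodology to the user, so the mean ± 3 standard deviations limit is an assumption borrowed from common control-chart practice, and the function names and data values are hypothetical.

```python
import statistics

def detection_limits(reference_values, k=3.0):
    """Step 1: limits as mean +/- k sample standard deviations of the reference set."""
    center = statistics.mean(reference_values)
    spread = statistics.stdev(reference_values)
    return center - k * spread, center + k * spread

def exceptions(new_values, limits):
    """Step 2: return (index, value) pairs from a later period outside the limits."""
    low, high = limits
    return [(i, v) for i, v in enumerate(new_values) if v < low or v > high]

# Hypothetical CPU busy % for one group of hours during normal operation.
reference_set = [30.1, 29.4, 31.2, 30.8, 29.9, 30.5, 31.0, 29.7, 30.3, 30.6]
limits = detection_limits(reference_set)

# Hypothetical observations from the period being examined.
current = [30.4, 29.8, 36.5, 30.9]
print(limits, exceptions(current, limits))
```

Only the third observation (36.5) falls outside the limits derived from this reference set.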

MASF – Reference Set: What is a normal period? Workloads vary by time of day, day of week, and month of year.

MASF – Aggregation Policies: The collected data can / should be grouped into sets of hours with the same characteristics. This increases the number of samples per collection period. (Table: days Mon through Fri as rows, hours 8 through 17 as columns, with cells assigned to numbered aggregation groups.)

MASF – Aggregation Policies Response Time Example.

MASF – Detection Limits: Monday; Tuesday thru Thursday; Friday.

MASF - Summary: Very robust statistical detection technique. Addresses random sampling issues. Addresses volatility in commercial computing workloads. More of a framework than a specific procedure. Reference set is user defined. Measurement methodology is user defined.

MASF - Summary: Measurement framework is intended to be an N-period rolling average. Ideally 10 to 20 points per reference set. Longer-term datasets are subject to time series influences, which distort metrics. This technique should be included in every Resource Management Specialist's toolkit!

Analysis of Variance (ANOVA): A comparison of parameters across populations. Best explained by why it was developed. Agricultural work in the late 1800s to improve crop yields. A plot of land was divided into multiple areas and subjected to different treatments. The test was developed to compare the effects of these different treatments on crop yield.

ANOVA: Example of how this type of experiment would be set up. Important point: we are dealing with six separate populations.

ANOVA: Same ground rules as Hypothesis Testing. Start off by assuming all population means are equal (Null Hypothesis). Attempt to prove they are not all the same (Alternative Hypothesis). However, calculation of the result is very laborious and best done by a computer.

ANOVA: Test stated in similar fashion to Hypothesis Testing.
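The statement shown on this slide is missing from the transcript. Assuming the conventional form, with \mu_i denoting the mean of population i and k the number of populations (six in the experiment described earlier), it would read:

```latex
H_0: \mu_1 = \mu_2 = \cdots = \mu_k \qquad H_a: \text{at least one } \mu_i \text{ differs from the others}
```

The symbols are chosen here for illustration and are not taken from the slide.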

ANOVA: Accepting the Null Hypothesis has the same meaning as in Hypothesis Testing: we can't prove any mean is different, and that is the end of the test. Accepting the Alternative Hypothesis has an interesting twist: one or more of the means are different, but which one(s)?

ANOVA: The Tukey test answers the Alternative Hypothesis question. John Tukey developed a technique to group means of an ANOVA test when the Alternative Hypothesis is accepted. We now have a way to take a set of multiple data populations and segment them into like groups.

ANOVA SMF Data Volume Example

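SAS Proc ANOVA Procedure:
Proc ANOVA;
  Class Day;            /* Day of week is the classification (treatment) variable */
  Model Count = Day;    /* Test whether the mean Count differs by Day */
  Means Day / Tukey;    /* Group the Day means with Tukey's test */
Run;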
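To make the laborious part concrete, here is a minimal Python sketch of the F statistic that a one-way ANOVA such as the one above computes. It is not the SAS implementation, the function name is hypothetical, and the counts are made-up values, not the SMF data from the talk. Turning F into the Pr > F value requires the F distribution, which is left to SAS or a statistics library.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA on a list of samples."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(g) / len(g) for g in groups]

    # Variation of the group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Variation of each observation around its own group mean.
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, group_means) for v in g)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical record counts for three days, three weeks each.
counts_by_day = {"Mon": [1, 2, 3], "Tue": [2, 3, 4], "Wed": [5, 6, 7]}
f_stat, df_b, df_w = one_way_anova_f(list(counts_by_day.values()))
print(f"F={f_stat:.2f} with ({df_b}, {df_w}) degrees of freedom")
```

A large F means the day-to-day differences in means are large relative to the week-to-week scatter within each day.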

ANOVA: Key Results from Test. ANOVA: Pr > F = 0.0424. We conclude at a 95% confidence level that one or more of the means are different. Tukey Test: Monday and Friday are different. All other days are the same. A certain degree of ambiguity.

ANOVA: Typical way to report or display results of Tukey test.
Mon    Tue    Wed    Thur   Fri
|-----------------------|
       |------------------------|

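ANOVA: Second test. Compare the day of the week across weeks.
Proc ANOVA;
  Format Date Date8.;
  Class Date;           /* Each calendar date is a treatment */
  Model Count = Date;
  By Day;               /* One test per day of week; input must be sorted by Day */
Run;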

ANOVA: Results from second test. All five tests accepted the null hypothesis. Pr > F values were all in the high 90% range. So the 'official' statement is: The data is insufficient to conclude there is any difference in the mean value for a day of the week across weeks.

ANOVA: Statistical assumptions that need to be considered. Sufficient data is needed to obtain 6 to 10 observations for each treatment. Need to be sensitive to correlated data. Sampling plan must be random and cover the boundaries of the population being examined.

ANOVA: Practical uses. Comparing data from multiple days to see if it is the same or different. Use it as a clustering technique to build aggregated data groups for a MASF analysis. Multiple-factor ANOVAs can look at multiple treatments (factors) at the same time, for example day of week and hour of day. A very powerful tool that should be in everybody's toolkit!

Midrange Server Example: One month of Prime Shift usage data for an OLTP server. The MASF technique will be used to look for deviations. The first three weeks will be used as the reference set to examine the fourth week's data. ANOVA will be used to create Aggregation Policies to cluster the hourly data.

Midrange Server Example Table of Hourly Usage Metrics Reference Set

Midrange Server Example: An ANOVA test was performed on the hours of the day. Two overlapping groups were identified.
CPUAVE   33.1  30.8  30.7  30.5  29.7  29.6  29.4  28.8
Hour     8     11    12    9     10    13    14    15
|----------------------| |--------------------------------------------|

Midrange Server Example: A second ANOVA test was performed on the day of the week. It identified two non-overlapping groups. Group 1: Monday and Friday. Group 2: Tuesday, Wednesday, and Thursday.

Midrange Server Example The following aggregation policy was built for this workload.

Midrange Server Example: The aggregation policy was used to build the following reference set from the table of hourly usage metrics:
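The policy and reference-set tables themselves are not in the transcript. As a rough Python sketch of the idea, the code below applies the two day-of-week groups from the second ANOVA test and summarizes each (day group, hour) cell. Keeping every hour separate, the function name, and the data values are all assumptions made for illustration; the actual policy in the talk may also merge hours.

```python
import statistics
from collections import defaultdict

# Day groups from the second ANOVA test: Mon/Fri versus Tue/Wed/Thur.
DAY_GROUP = {"Mon": 1, "Fri": 1, "Tue": 2, "Wed": 2, "Thur": 2}

def build_reference_set(observations):
    """Group (day, hour, value) rows by (day group, hour) and summarize each cell."""
    cells = defaultdict(list)
    for day, hour, value in observations:
        cells[(DAY_GROUP[day], hour)].append(value)
    return {
        key: {"n": len(vals), "mean": statistics.mean(vals), "std": statistics.stdev(vals)}
        for key, vals in cells.items()
        if len(vals) >= 2  # a standard deviation needs at least two points
    }

# Hypothetical hourly CPU averages from the reference weeks.
rows = [
    ("Mon", 8, 33.0), ("Fri", 8, 34.0), ("Mon", 8, 32.0),
    ("Tue", 8, 30.0), ("Wed", 8, 31.0), ("Thur", 8, 29.0),
]
reference = build_reference_set(rows)
print(reference)
```

Pooling days that ANOVA could not tell apart is what raises the number of points per cell toward the 10 to 20 suggested for a reference set.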

Midrange Server Example Plotting this along with the actual data from the fourth week produced the following control chart for Monday:

Midrange Server Example Exception Table for Rest of Week

Summary: So, what is in your toolkit? Pick up these tools at your nearest CMG meeting. They do take some getting used to, but are worth the learning curve: Hypothesis Testing, Statistical Process Control, MASF, and ANOVA. Be very wary of your data. The time series data we routinely work with is a very complicated multi-dimensional dataset. Get to know your data. The better you know the data, the better you know your workload.

Summary: Next Step - Recommended Reading. I. Trubin's CMG papers on application of MASF and variance-based statistical detection techniques:
2001 – Exception Detection System, Based on Statistical Process Control Concept.
2002 – Global and Application Levels Exception Detection System, Based on MASF Technique.
2003 – Disk Subsystem Capacity Management, Based on Business Drivers, I/O Performance Metrics and MASF.
2004 – Mainframe Global and Workload Levels Statistical Exception Detection System, Based on MASF.
2005 – Capturing Workload Pathology by Statistical Exception Detection System.

Questions ???