What Changed? Frank Bereznay, Kaiser Permanente

What Changed? Two Questions –Can we use statistical techniques to help us differentiate between variation that is part of a normal operation and variation due to assignable causes? –How should data be organized to properly use these techniques?

Agenda Frank’s One Hour Stat Class –What is Statistics all about? –Commonly Used Statistical Techniques Hypothesis Testing Statistical Process Control Analysis of Variance –Time Series Data –An Example

What is Statistics all about? Populations have parameters. Samples have statistics. Statistics is all about estimating population parameters by taking samples and calculating statistics. A Key Question –What is the data population you are trying to estimate, and what are its properties?

Hypothesis Testing A very brief review Conduct an experiment to make a decision about a population parameter. Relies on the Central Limit Theorem to describe the properties of a sample. –The distribution of the sample mean is approximately normal for sufficiently large samples, irrespective of the underlying population (provided it has a finite variance).

Hypothesis Testing Classic form of test: $H_0: \mu = 0$ versus $H_a: \mu \neq 0$. A level of confidence, usually 95%, is specified. You collect a sample set of values and compare the derived statistic against a normal distribution to make the determination.
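A minimal sketch of the two-sided test above, assuming the sample is large enough that the sample standard deviation can stand in for the population value, so the statistic is compared against the standard normal distribution as the slide describes. The function name `one_sample_z_test` and its parameters are illustrative, not from the presentation; for small samples a t distribution would be the better reference.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def one_sample_z_test(sample, mu0, alpha=0.05):
    """Two-sided test of H0: mu == mu0 against Ha: mu != mu0.

    Uses the sample standard deviation in place of sigma, which is only
    a reasonable approximation for larger samples (roughly n >= 30).
    """
    n = len(sample)
    xbar = mean(sample)
    se = stdev(sample) / sqrt(n)              # standard error of the mean
    z = (xbar - mu0) / se                     # test statistic
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    reject = p_value < alpha                  # alpha = 0.05 <-> 95% confidence
    return z, p_value, reject
```

With `alpha=0.05`, rejecting H0 corresponds to the sample mean falling outside the 95% band around the hypothesized value.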

Hypothesis Testing

So, what can this do for us? –If we know what a metric is supposed to be, we can sample and test. –This has some value for SLAs and other metrics that are mandated. –Most of the time we don’t know the population parameters.

Statistical Process Control A bit of History –Walter Shewhart –W. Edwards Deming –Post-WWII Japan and the Deming Cycle

Statistical Process Control Key Concepts and Terms –There is no up-front hypothesis test. –No information is required about the parameters of the process being evaluated. –Control Chart: the primary method to track a process. Many forms of the metric can be analyzed. –Rational Subgroups: recurring sets of data that are summarized for analysis.
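A hedged sketch of how control limits can be derived from rational subgroups without any prior knowledge of the process parameters. It assumes equal-sized subgroups and estimates sigma from the pooled within-subgroup variance; textbook Shewhart charts usually use tabulated constants (A2 with the average range, or A3 with the average standard deviation), so treat this as a simplified stand-in. The names `xbar_limits` and `out_of_control` are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def xbar_limits(baseline_subgroups, k=3.0):
    """Center line and k-sigma limits for the means of rational subgroups.

    Sigma comes from the pooled within-subgroup variance, so shifts in
    the subgroup means themselves do not inflate the limits.
    """
    n = len(baseline_subgroups[0])
    if n < 2 or any(len(g) != n for g in baseline_subgroups):
        raise ValueError("need equal-sized subgroups of at least two values")
    center = mean([mean(g) for g in baseline_subgroups])
    sigma_within = sqrt(mean([variance(g) for g in baseline_subgroups]))
    half_width = k * sigma_within / sqrt(n)   # k standard errors of a subgroup mean
    return center - half_width, center, center + half_width

def out_of_control(limits, subgroups):
    """Indexes of subgroups whose mean falls outside [lcl, ucl]."""
    lcl, _, ucl = limits
    return [i for i, g in enumerate(subgroups) if not lcl <= mean(g) <= ucl]
```

The limits are computed once from a baseline period and then applied to new subgroups, which keeps an unusual subgroup from widening the very limits it is being judged against.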

Statistical Process Control Sample Control Chart

Statistical Process Control A number of CMG papers have been published in this area: –Brey “Managing at the Knee of the Curve (The use of SPC in Managing a Data Center)”, CMG90 –Lipner “Zero-Defect Capacity and Performance Management”, CMG92 –Chu “A Three Sigma Quality Target for 100 Percent SLA”, CMG92 –Buzen & Shum “MASF – Multivariate Adaptive Statistical Filtering”, CMG95

Statistical Process Control Following the Buzen & Shum paper, Trubin has published a set of papers on applications of MASF. However, the overall interest in this area seemed to wane. –I believe it is primarily due to the complexity of the data we work with.

Analysis of Variance (ANOVA) Developed by Sir Ronald A. Fisher in the early 20th century. Initial use was focused on helping the agricultural industry. –A method was needed to evaluate the effectiveness of multiple simultaneous attempts to improve crop yield.

Analysis of Variance (ANOVA) Take an area of interest and sub-divide it into multiple populations. Subject these separate populations to various treatments. Determine whether there are differences in the population means that can be attributed to the treatments.

Analysis of Variance (ANOVA) Hypothesis test: $H_0: \mu_1 = \mu_2 = \mu_3 = \cdots = \mu_n$ versus $H_a$: not all $\mu_i$ ($i = 1, 2, 3, \ldots, n$) are equal. This can be a very handy tool to determine if there are differences in the sub-groups within a body of data. –Are business volumes the same Monday through Friday? –Is there a difference in Tuesday’s volume week over week?
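A minimal sketch of the one-way ANOVA F statistic behind this hypothesis test, written against the standard library only. It stops at the F value and its degrees of freedom because the F-distribution tail probability is not in the standard library; `scipy.stats.f_oneway` performs the same calculation and also returns the p-value. The function name `one_way_anova` is illustrative.

```python
from statistics import mean

def one_way_anova(groups):
    """Return (F, df_between, df_within) for H0: all group means are equal."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    group_means = [mean(g) for g in groups]
    grand_mean = mean([x for g in groups for x in g])
    # Between-group (treatment / "Model") sum of squares
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Within-group ("Error") sum of squares
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within
```

A large F means the spread between group means is large relative to the spread inside the groups. With nine groups and 36 observations in total, as in the bottling example, the degrees of freedom are 8 (Model) and 27 (Error) however the observations are split across the days.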

Quick Summary We have described three techniques: –Hypothesis Testing –Statistical Process Control –Analysis of Variance (ANOVA) Time to see an example

Example

Bottling Process We have a process that puts a beverage in a bottle. –The intended fill volume is 2 liters, or 2,000 cubic centimeters (CM). –We collect 36 random samples over a nine-day period. Sample Mean 1, Sample Variance 1.89, Standard Deviation 1.37
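The slide reports sample statistics, the estimates of the population parameters discussed earlier. A small sketch of how they are computed, using hypothetical fill volumes (the presentation's 36 measurements are not available); note that the sample variance divides by n − 1, and the standard deviation is its square root.

```python
from math import sqrt
from statistics import mean, variance, stdev

def summarize(fills):
    """Sample statistics for a list of fill volumes in cubic centimeters."""
    return {
        "n": len(fills),
        "mean": mean(fills),
        "variance": variance(fills),   # sample variance, n - 1 denominator
        "stdev": stdev(fills),         # square root of the sample variance
    }

# The slide's figures are consistent: the square root of 1.89 rounds to 1.37.
assert round(sqrt(1.89), 2) == 1.37
```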

Bottling Process

Upper Control Limit = Lower Control Limit = Process Range = 3.9

Bottling Process

The ANOVA Procedure
Class Level Information: Class TimeStamp, Levels 9, Values 13MAR06 14MAR06 15MAR06 16MAR06 17MAR06 18MAR06 19MAR06 20MAR06 21MAR06
Number of Observations Read: 36. Number of Observations Used: 36.
Dependent Variable: CM
Columns: Source, DF, Sum of Squares, Mean Square, F Value, Pr > F. Sources: Model, Error, Corrected Total.

Tukey's Studentized Range (HSD) Test for CM
Means with the same letter are not significantly different.
Columns: Tukey Grouping, Mean, N, TimeStamp. All nine MAR06 TimeStamp values are in Tukey group A.

Second Quick Summary That was reassuring! –When we purchase a bottle of wine we can be sure we are getting our money’s worth. Why did Frank choose this example? What is the relevance to our commercial computing environments?

Time Series Data The instrumentation data we analyze is a very complex data aggregate that contains the influences of multiple factors. Many of the factors are related to time or duration. –Hour of the day, Day of the week. –Day of the month, Month of the Year. –Growth rate for the enterprise.

Time Series Data Example of overall MIPS usage.

Time Series Data Time series data generally contains four components. –Trend: long-term constant movement. –Cycle: movement patterns longer than a year. –Seasonal Variations: movement patterns within a year. –Irregular Fluctuations: events not triggered by a duration.

Time Series Data To properly work with this type of data you need to decompose it into its components before you begin the testing. You have four separate questions to ask, one for each component. –Did the trend component change? –Did the cycle component change? –Did the seasonal component change? –Did the irregular component change?
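A hedged sketch of one way to do the decomposition: the classical additive method, in which a centered moving average estimates the trend, averaging the detrended values by position in the period gives the seasonal indices, and the remainder is the irregular component. This method does not separate the cycle from the trend (they come out together), and it assumes at least two full periods of data. `statsmodels.tsa.seasonal.seasonal_decompose` implements the same classical approach; the function names below are illustrative.

```python
def centered_moving_average(y, period):
    """Trend estimate; None where the window would run off either end."""
    half = period // 2
    trend = [None] * len(y)
    for t in range(half, len(y) - half):
        window = y[t - half : t + half + 1]
        if period % 2:       # odd period: simple centered average
            trend[t] = sum(window) / period
        else:                # even period: 2 x m average, end points half-weighted
            trend[t] = (sum(window) - 0.5 * (window[0] + window[-1])) / period
    return trend

def additive_decompose(y, period):
    """Split y into trend (including cycle) + seasonal + irregular."""
    trend = centered_moving_average(y, period)
    buckets = [[] for _ in range(period)]
    for t, (obs, tr) in enumerate(zip(y, trend)):
        if tr is not None:
            buckets[t % period].append(obs - tr)   # detrended value for this season
    raw = [sum(b) / len(b) for b in buckets]
    offset = sum(raw) / period                     # make seasonal indices sum to zero
    indices = [r - offset for r in raw]
    seasonal = [indices[t % period] for t in range(len(y))]
    irregular = [None if tr is None else obs - tr - s
                 for obs, tr, s in zip(y, trend, seasonal)]
    return trend, seasonal, irregular
```

Each of the component questions on the slide can then be asked of its own series, for example by comparing this month's seasonal indices with last month's.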

Time Series Data While it is possible to perform the decomposition of time series data into its components, the best strategy is to avoid the need to do so. –Choose granular data intervals. –Keep the number of intervals to a minimum. –Start with a 24x7 type of matrix and use ANOVA to determine the hours that belong together.
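One way to build the 24x7 matrix is to key each measurement by day of week and hour of day, so that every cell collects the week-over-week values for that slot. A minimal sketch, assuming the data arrives as (timestamp, value) pairs; the function name and the sample CPU values are hypothetical. The lists in each cell can then serve as the groups passed to a one-way ANOVA.

```python
from collections import defaultdict
from datetime import datetime

def day_hour_cells(samples):
    """Group (timestamp, value) pairs into a 7 x 24 matrix of cells.

    Keys are (weekday, hour) with Monday == 0, as returned by
    datetime.weekday(); each cell holds that slot's values across weeks.
    """
    cells = defaultdict(list)
    for ts, value in samples:
        cells[(ts.weekday(), ts.hour)].append(value)
    return dict(cells)

samples = [
    (datetime(2006, 3, 13, 9, 15), 41.0),   # hypothetical CPU % values
    (datetime(2006, 3, 20, 9, 45), 44.0),   # same Monday 09:00 slot, next week
    (datetime(2006, 3, 14, 9, 0), 57.0),    # Tuesday 09:00
]
cells = day_hour_cells(samples)
```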

Third Quick Summary OK, now we have a strategy. –Treat each hour of each day as a separate process. –Use ANOVA to see how similar the day/hour combination is week over week. Are we selecting the right combination? –Use SPC to develop a process mean and control limits for this day/hour. –Plot the results on a day-by-day basis. Let's see how this looks.
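The strategy above treats each day/hour cell as its own process. A hedged sketch of the SPC step: compute a process mean and control limits per cell from prior weeks, then check a new observation against its cell's limits. It assumes a dictionary of history keyed by (weekday, hour) and uses the sample standard deviation of individual values with k = 3; individuals charts in the SPC literature often estimate sigma from the average moving range instead, so this is a simplification. `cell_limits` and `is_out_of_control` are hypothetical names.

```python
from statistics import mean, stdev

def cell_limits(history, k=3.0):
    """Map (weekday, hour) -> (lcl, center, ucl) from prior weeks' values."""
    limits = {}
    for cell, values in history.items():
        if len(values) < 2:
            continue                      # a standard deviation needs two or more points
        center = mean(values)
        spread = k * stdev(values)
        limits[cell] = (center - spread, center, center + spread)
    return limits

def is_out_of_control(limits, cell, value):
    """True when a new observation falls outside its cell's control limits."""
    lcl, _, ucl = limits[cell]
    return not lcl <= value <= ucl
```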

Example – Midrange CPU

Tukey's Studentized Range (HSD) Test for CPUAVE
Alpha 0.05; Error Degrees of Freedom 15; Error Mean Square; Critical Value of Studentized Range; Minimum Significant Difference
Means with the same letter are not significantly different.
Tukey Grouping by DATE (five MAR06 dates): A; A and B; B; B; B
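The "Minimum Significant Difference" in this output is, for equal group sizes, the critical value of the studentized range multiplied by the square root of the error mean square over the per-group count. A minimal sketch of that arithmetic and of the pairwise comparison it drives; the critical value must be looked up (from a table, or for example with `scipy.stats.studentized_range.ppf` in SciPy 1.7+), and all numbers below are hypothetical rather than taken from the slide. Unequal group sizes would need the Tukey-Kramer variant instead.

```python
from itertools import combinations
from math import sqrt

def tukey_msd(q_crit, mse, n_per_group):
    """Minimum significant difference for Tukey's HSD with equal group sizes."""
    return q_crit * sqrt(mse / n_per_group)

def different_pairs(means, msd):
    """Pairs of labels whose means differ by more than the MSD."""
    return [(a, b) for a, b in combinations(sorted(means), 2)
            if abs(means[a] - means[b]) > msd]

msd = tukey_msd(q_crit=4.0, mse=9.0, n_per_group=4)   # hypothetical inputs
pairs = different_pairs({"d1": 10, "d2": 14, "d3": 20}, msd)
```

Here only d1 and d3 differ significantly, so d1 would be lettered A, d3 lettered B, and d2 lettered both A and B, the same shape as the grouping pattern in the CPUAVE output.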

Example – Midrange CPU

Example – Midrange CPU Usage

Example – Midrange CPU

Example – Midrange CPU Usage

Summary These statistical tools can really help. –But they are not a slam dunk to implement. –You need to get to know your data. Producing the information is only the beginning. –Recall the problems Shewhart and Deming had. –This really needs to be the basis for managing the environment.