Presentation on theme: "Analyze Phase Inferential Statistics"— Presentation transcript:

1 Analyze Phase Inferential Statistics
Now we will continue in the Analyze Phase with “Inferential Statistics”.

2 Inferential Statistics
This module's lessons:
- Nature of Sampling
- Central Limit Theorem
- Inferential Statistics

Analyze Phase roadmap:
- Welcome to Analyze
- "X" Sifting
- Inferential Statistics
- Intro to Hypothesis Testing
- Hypothesis Testing ND P1
- Hypothesis Testing ND P2
- Hypothesis Testing NND P1
- Hypothesis Testing NND P2
- Wrap Up & Action Items

The core fundamentals of this phase are Inferential Statistics, Nature of Sampling and Central Limit Theorem.

3 Putting the pieces of the puzzle together….
Nature of Inference
in·fer·ence (n.): "The act or process of deriving logical conclusions from premises known or assumed to be true. The act of reasoning from factual knowledge or evidence." (Dictionary.com)
Inferential Statistics: drawing inferences about the process or population being studied by modeling patterns of data in a way that accounts for randomness and uncertainty in the observations. (Wikipedia.com)
One objective of Six Sigma is to move from merely describing the nature of the data (Descriptive Statistics) to being able to infer from the data what will happen in the future (Inferential Statistics).

4 5 Step Approach to Inferential Statistics
So many questions…?
1. What do you want to know?
2. What tool will give you that information?
3. What kind of data does that tool require?
4. How will you collect the data?
5. How confident are you with your data summaries?
As with most things you have learned in Six Sigma, there are defined steps to be taken.

5 Types of Error
1. Error in sampling: error due to differences among samples drawn at random from the population (luck of the draw). This is the only source of error that statistics can accommodate.
2. Bias in sampling: error due to lack of independence among random samples or due to systematic sampling procedures (e.g., sampling the height of horse jockeys only).
3. Error in measurement: error in the measurement of the samples (MSA/GR&R).
4. Lack of measurement validity: the measurement does not actually measure what it is intended to measure (e.g., placing a probe in the wrong slot, or measuring temperature with a thermometer that merely sits next to the furnace).
These four types of error contribute to uncertainty when trying to draw inferences from data.

6 Population, Sample, Observation
Population: EVERY data point that has ever been or ever will be generated from a given characteristic.
Sample: A portion (or subset) of the population, either at one time or over time.
Observation: An individual measurement.
Let's just review these few definitions before moving on.

7 Significance is all about differences…
Practical difference and significance: the amount of difference, change or improvement that will be of practical, economic or technical value to you; the amount of improvement required to pay for the cost of making the improvement.
Statistical difference and significance: the magnitude of difference or change required to distinguish between a true difference, change or improvement and one that could have occurred by chance.
Twins: sure there are differences… but do they matter?
In general, larger differences (or deltas) are considered "more significant." As you see here, we can experience both a practical difference and a statistical difference. Six Sigma decisions will ultimately have a return on resource investment (RORI) element associated with them, so the key question of interest for our decisions is: is the benefit of making a change worth the cost and risk of making it?

8 The Mission: Mean Shift, Variation Reduction or Both
Your mission, which you have chosen to accept, is to reduce cycle time, reduce the error rate, reduce costs, reduce investment, improve service level, improve throughput, reduce lead time, increase productivity… in short, to change the output metric of some process. In statistical terms this translates to the need to move the process Mean and/or reduce the process Standard Deviation. You will be making decisions about how to adjust key process input variables based on sample data, not population data, which means you are taking some risks. How will you know your key process output variable really changed and is not just an unlikely sample? The Central Limit Theorem helps us understand the risk we are taking and is the basis for using sampling to estimate population parameters.

9 A Distribution of Sample Means
Imagine you have some population. The individual values of this population form some distribution.
1. Take a sample of some of the individual values and calculate the sample Mean.
2. Keep taking samples and calculating sample Means.
3. Plot a new distribution of these sample Means.
The Central Limit Theorem says that as the sample size becomes large, this new distribution (the sample Mean distribution) will form a Normal Distribution, no matter what the shape of the population distribution of individuals. Please read the slide.

10 Sampling Distributions—The Foundation of Statistics
Samples from the population, each with five observations (population values: 3, 5, 2, 12, 10, 1, 6, 14, 11, 9; labeled Sample 1, Sample 2 and Sample 3 on the slide).
In this example we have taken three samples out of the population, each with five observations, and computed a Mean for each sample. Note the Means are not the same! Why not? What would happen if we kept taking more samples? Every statistic comes from a sampling distribution: if you were to keep taking samples from the population over and over, a distribution could be formed for the Mean, Median, Mode, Standard Deviation, etc. As you can see, each sample above yields a different statistic. The goal is to make sound inferences about the population from these sample statistics.
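As a rough illustration of the idea (not part of the original deck), the following Python sketch draws three samples of five from the slide's population values and shows that the sample Means differ from one another and from the population Mean; it assumes NumPy is available.

```python
import numpy as np

rng = np.random.default_rng(1)

# Population values shown on the slide
population = np.array([3, 5, 2, 12, 10, 1, 6, 14, 11, 9])

# Draw three samples of five observations each and compare their Means
for i in range(1, 4):
    sample = rng.choice(population, size=5, replace=False)
    print(f"Sample {i}: {sample}, Mean = {sample.mean():.1f}")

print(f"Population Mean = {population.mean():.1f}")
```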

11 Constructing Sampling Distributions
Open Minitab Worksheet “Die Example”. Roll ‘em! To demonstrate how sampling distributions work we will create some random data for die rolls. Create a sample of 1,000 individual rolls of a die that we will store in a variable named “Population”. From the population we will draw five random samples.
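The next slides build this worksheet in Minitab. As a hedged Python equivalent (assuming NumPy is available; the seed and names here are only illustrative), a 1,000-roll population and five small samples could be generated like this:

```python
import numpy as np

rng = np.random.default_rng(42)

# 1,000 individual rolls of a fair die - the "Population"
population = rng.integers(1, 7, size=1000)

# Five random samples of five observations each, drawn without replacement
samples = {f"Sample{i}": rng.choice(population, size=5, replace=False)
           for i in range(1, 6)}

for name, values in samples.items():
    print(name, values)
```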

12 Sampling Distributions
Calc> Random Data> Sample from Columns… To draw random samples from the population, use this menu path and repeat it four more times for the remaining sample columns.

13 Sampling Error
Calculate the Mean and Standard Deviation for each column and compare the sample statistics to the population.
Stat > Basic Statistics > Display Descriptive Statistics…
Descriptive Statistics: Population, Sample1, Sample2, Sample3, Sample4, Sample5 (output abridged)

Variable    N     Mean    SE Mean  StDev
Population  1000  3.5510  0.0528   1.6692
Sample1     5     3.400   0.927    2.074
Sample2     5     4.600   0.678    1.517
Sample3     5     4.200   0.663    1.483
Sample4     5     3.800   0.917    2.049
Sample5     5     3.600   0.872    1.949

Now compare the Mean and Standard Deviation of the samples of five observations to the population. Subtract the lowest from the highest. What do you see?
Range in Mean: 4.600 – 3.400 = 1.200
Range in StDev: 2.074 – 1.483 = 0.591
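For readers following along in Python rather than Minitab, here is a sketch of the same comparison (again assuming NumPy and pandas; the exact numbers will differ from the slide because the random draws differ):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
population = rng.integers(1, 7, size=1000)  # 1,000 die rolls

# Five samples of five observations each
samples = pd.DataFrame(
    {f"Sample{i}": rng.choice(population, size=5, replace=False)
     for i in range(1, 6)}
)

print(f"Population: Mean = {population.mean():.4f}, "
      f"StDev = {population.std(ddof=1):.4f}")
print(samples.agg(["mean", "std"]).T)

print("Range in Mean :", samples.mean().max() - samples.mean().min())
print("Range in StDev:", samples.std().max() - samples.std().min())
```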

14 Sampling Error Create 5 more columns of data sampling 10 observations from the population. Calc> Random Data> Sample from Columns… Please read the slide.

15 Sampling Error - Reduced
Calculate the Mean and Standard Deviation for each column and compare the sample statistics to the population.
Stat > Basic Statistics > Display Descriptive Statistics…
Descriptive Statistics: Sample6, Sample7, Sample8, Sample9, Sample10 (output abridged)

Variable   N    Mean   SE Mean  StDev
Sample6    10   3.600  0.653    2.066
Sample7    10   4.100  0.567    1.792
Sample8    10   3.200  0.442    1.398
Sample9    10   –      0.563    1.780
Sample10   10   3.300  0.616    1.947

Range in Mean: 4.100 – 3.200 = 0.900
Range in StDev: 2.066 – 1.398 = 0.668
Can you tell what is happening to the Mean and Standard Deviation? As the sample size increases, the sample Means and Standard Deviations vary less from sample to sample - with 10 observations the differences between samples are much smaller. What do you think would happen if the sample size increased further? Let's try a sample size of 30.

16 Sampling Error - Reduced
Create five more columns, this time sampling 30 observations from the population.
Calc> Random Data> Sample from Columns…
Stat> Basic Statistics> Display Descriptive Statistics…
(Descriptive statistics for the five samples of 30; output abridged.)
Range in Mean: 0.63
Range in StDev: 0.381
Do you notice anything different? Look how much smaller the ranges of the Means and Standard Deviations are. Did the sampling error get reduced?
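To see the whole pattern in one place, here is a hedged Python sketch (assuming NumPy; the seed and number of samples are illustrative) that draws five samples at each of n = 5, 10 and 30 and prints how far apart the sample statistics fall:

```python
import numpy as np

rng = np.random.default_rng(7)
population = rng.integers(1, 7, size=1000)  # 1,000 die rolls

# For each sample size, draw five samples and measure how far apart
# the five sample Means and StDevs fall (the sampling error)
for n in (5, 10, 30):
    means, stds = [], []
    for _ in range(5):
        s = rng.choice(population, size=n, replace=False)
        means.append(s.mean())
        stds.append(s.std(ddof=1))
    print(f"n = {n:2d}  range in Mean = {max(means) - min(means):.3f}  "
          f"range in StDev = {max(stds) - min(stds):.3f}")
```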

17 Sampling Distributions
In theory, if we kept taking samples of size n = 5 and n = 10 and calculated the sample Means, we could see how the sample Means are distributed. Simulate this in Minitab by creating ten columns of 1,000 rolls of a die. Feeling lucky…? Calc> Random Data> Integer… Now, instead of looking at the effect of sample size on error, we will create a sampling distribution of averages. Follow along to generate your own random data.
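An equivalent sketch in Python (assuming NumPy; C1–C10 simply become the columns of a 1,000 × 10 array):

```python
import numpy as np

rng = np.random.default_rng(0)

# Ten columns (C1-C10) of 1,000 die rolls each
rolls = rng.integers(1, 7, size=(1000, 10))
print(rolls[:5])  # first five rows across the ten columns
```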

18 Sampling Distributions
For each row calculate the Mean of the first five columns (C1–C5) and store the result in Mean5. Repeat this command to calculate the Mean of C1–C10 and store the result in Mean10. Calc> Row Statistics… The commands shown here create new columns that are averages of the columns of random population data, giving us 1,000 averages of sample size 5 and 1,000 averages of sample size 10.
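In the Python sketch this is simply a pair of row means (an illustration, not the deck's Minitab output):

```python
import numpy as np

rng = np.random.default_rng(0)
rolls = rng.integers(1, 7, size=(1000, 10))  # ten columns of 1,000 die rolls

# Row means: 1,000 averages of sample size 5 and 1,000 of sample size 10
mean5 = rolls[:, :5].mean(axis=1)
mean10 = rolls.mean(axis=1)
print(mean5[:5])
print(mean10[:5])
```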

19 Sampling Distributions
Create a Histogram of C1, Mean5 and Mean10. Graph> Histogram> Simple… Under Multiple Graphs, choose "On separate graphs" and select "Same X, including same bins" to facilitate comparison. Follow these commands in Minitab; the Histograms that are generated make it easy to see what happened when the sample size was increased.
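A matplotlib version of the same comparison (illustrative only; assumes NumPy and matplotlib are available):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
rolls = rng.integers(1, 7, size=(1000, 10))
c1 = rolls[:, 0]                      # individuals
mean5 = rolls[:, :5].mean(axis=1)     # averages of 5
mean10 = rolls.mean(axis=1)           # averages of 10

# Same bins on every panel so the three distributions are directly comparable
bins = np.linspace(0.5, 6.5, 25)
fig, axes = plt.subplots(3, 1, sharex=True, figsize=(6, 8))
for ax, data, title in zip(axes, (c1, mean5, mean10),
                           ("Individuals (C1)", "Mean of 5", "Mean of 10")):
    ax.hist(data, bins=bins)
    ax.set_title(title)
plt.tight_layout()
plt.show()
```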

20 Different Distributions
(Histograms: the individuals and the two distributions of sample Means.) What is different about the three distributions? What happens as the number of die rolls averaged in each sample increases? Let's examine how the number of rolls impacts our analysis results.

21 Observations
As the sample size (number of die rolls averaged) increases from 1 to 5 to 10, there are three points to note:
1. The Center remains the same.
2. The variation decreases.
3. The shape of the distribution changes - it tends to become Normal.
Good news: the Mean of the sample Mean distribution is the Mean of the population, μx̄ = μ.
Better news: you can reduce your uncertainty about the population Mean by increasing the sample size n, because the Standard Deviation of the sample Mean distribution, also known as the Standard Error, is σx̄ = σ/√n.
Here are the answers to the questions posed on the previous slide.

22 Central Limit Theorem
If all possible random samples, each of size n, are taken from any population with Mean μ and Standard Deviation σ, the distribution of sample Means will:
- have a Mean μx̄ = μ,
- have a Standard Deviation (the Standard Error) σx̄ = σ/√n, and
- be Normally Distributed when the parent population is Normally Distributed, or be approximately Normal for samples of size 30 or more when the parent population is not Normally Distributed. The approximation improves with samples of larger size. Bigger is Better!
Everything we have gone through with sampling error and sampling distributions was leading up to the Central Limit Theorem.
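A quick numerical check of the theorem, sketched in Python under assumed inputs (NumPy, and an exponential parent population chosen only because it is clearly non-Normal):

```python
import numpy as np

rng = np.random.default_rng(3)

# A deliberately non-Normal (right-skewed) parent population
population = rng.exponential(scale=2.0, size=100_000)
mu, sigma = population.mean(), population.std(ddof=1)

n = 30
sample_means = rng.choice(population, size=(10_000, n)).mean(axis=1)

print(f"population Mean       : {mu:.3f}")
print(f"Mean of sample Means  : {sample_means.mean():.3f}")        # ~ mu
print(f"sigma / sqrt(n)       : {sigma / np.sqrt(n):.3f}")
print(f"StDev of sample Means : {sample_means.std(ddof=1):.3f}")   # ~ sigma/sqrt(n)
```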

23 So What? So how does this theorem help me understand the risk I am taking when I use sample data instead of population data? Recall that 95% of Normally Distributed data lies within ± 2 Standard Deviations of the Mean. Therefore there is a 95% probability that my sample Mean is within 2 Standard Errors of the true population Mean. Let's examine the importance of this.

24 A Practical Example Let's say your project is to reduce the setup time for a large casting: based on a sample of 20 setups you learn your baseline average is 45 minutes with a Standard Deviation of 10 minutes. Because this is just a sample, the 45 minute average is only an estimate of the true average. Using the Central Limit Theorem, there is a 95% probability the true average is somewhere between 40.5 and 49.5 minutes (45 ± 2 × 10/√20 ≈ 45 ± 4.5 minutes). Therefore do not get too excited if a process change results in a reduction of only 2 minutes. What is the likelihood of getting a sample with a 2 minute difference? It could be caused by the changes you implemented, or it could simply be random sampling variation - sampling error. The 95% confidence interval is wider than the 2 minute difference (delta) seen as a result. What causes the delta? It could be a true difference in performance or random sampling error. This is why you look further than point estimators alone.
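The arithmetic behind that interval, as a small Python sketch (applying the ±2 Standard Error rule of thumb from the previous slide; the values come straight from the example):

```python
import math

n = 20        # setups sampled
mean = 45.0   # baseline average, minutes
stdev = 10.0  # sample Standard Deviation, minutes

se = stdev / math.sqrt(n)              # Standard Error of the Mean
lower, upper = mean - 2 * se, mean + 2 * se

print(f"Standard Error       : {se:.2f} minutes")
print(f"Approx. 95% interval : {lower:.1f} to {upper:.1f} minutes")
# ~40.5 to ~49.5 minutes, so a 2 minute shift is well inside sampling noise
```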

25 Sample Size and the Mean
(Graphic: the distribution of individuals in the population, and the theoretical distributions of sample Means for n = 2 and n = 10.)
When taking a sample we have only estimated the true Mean. All we know is that the true Mean lies somewhere within the theoretical distribution of sample Means - the t-distribution - which is analyzed using t-tests. t-tests measure the significance of differences between Means.

26 Standard Error of the Mean
The Standard Deviation of the distribution of Means is called the Standard Error of the Mean and is defined as σx̄ = σ/√n. This formula shows the Mean is more stable than a single observation by a factor of the square root of the sample size.

27 Standard Error
The rate of change in the Standard Error approaches zero at about 30 samples. This is why a sample size of 30 is often recommended when generating summary statistics such as the Mean and Standard Deviation, and why 30 comes up so often in discussions of sample size. It is also the point at which the t and Z distributions become nearly equivalent: if you compare Z = 1.96 with the t table value as the sample size approaches infinite degrees of freedom, they are equal.
(Graphic: Standard Error plotted against sample size, flattening out near n = 30.)
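A quick tabulation of σ/√n (a sketch assuming a unit population Standard Deviation, chosen only for illustration) shows how little the Standard Error changes once n passes about 30:

```python
import math

sigma = 1.0  # assume a unit population Standard Deviation for illustration

# Standard Error shrinks as 1/sqrt(n); note how little it changes past n ~ 30
for n in (2, 5, 10, 20, 30, 50, 100):
    print(f"n = {n:3d}  Standard Error = {sigma / math.sqrt(n):.3f}")
```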

28 Summary
At this point you should be able to:
- Explain the term "Inferential Statistics"
- Explain the Central Limit Theorem
- Describe what impact sample size has on your estimates of population parameters
- Explain Standard Error
Please read the slide.


