1
The DMAIC Lean Six Sigma Project and Team Tools Approach Analyze Phase (Part 2)
2
Lean Six Sigma Black Belt Training! Analyze (Part 2) Agenda
- Review Analyze Part 1
- Inferential Statistics
- Hypothesis Testing
- P-values
- Discrete X / Continuous Y Statistical Tests
- Continuous X / Continuous Y Statistical Tests
- Discrete X / Discrete Y Statistical Tests
- Applications / Lessons Learned / Conclusions
- Next Steps
3
Six Sigma Analyze Inferential Statistics (Identifying What’s Different (Xs) Statistically)
4
Introduction to Hypothesis Testing
Are these samples from the same population? Sample 1 Sample 2 Mean=6.5 StDev=1 Mean=6.9 StDev=1.2
5
Intro. to Confidence Intervals (pg. 157)
Brutal Facts Regarding Samples: We know that the size of the sampling error is driven primarily by the variation in the population and the size of the sample selected. Larger samples have a smaller margin of error, yet are more costly to obtain. In practice, one sample is usually selected, and it is usually the minimum size required. Therefore, a method was needed to estimate a population parameter from such a sample; this method resulted in the term Confidence Interval.
6
Intro. to Confidence Intervals (pg. 157)
A statistic plus or minus a margin of error is called a confidence interval. A confidence interval is a range of values, calculated from a data set, that gives an assigned probability that the true value falls within that range. The confidence level depends on the margin of error that is selected. Generally, the accepted margin of error is plus or minus 2 standard errors, resulting in a 95% confidence level. "We are 95% confident that the true average door-to-balloon time is between 60 and 100 minutes."
7
We get a normal distribution with a mean of 50 and n=100.
Assume we have a population of size N that is not normally distributed. We draw 100 random samples, compute the average of each sample, and plot those averages. We get an approximately normal distribution of sample means centered at 50 (with n = 100 samples).
8
The mean of our sampled distribution is 50.
How confident are we of where the population mean lies? Similar to standard deviation, we know that 68% of the sampling distribution lies within 1 standard error of the mean and 95% lies within 2 standard errors.
9
The mean of our sample distribution is 50.
Let's assume we want to be 95% confident of where the true mean of the population lies. We can be 95% confident that the true mean lies within +/- 2 SE of the sample mean, where SE = σ / √n. In this case, let's assume that SE = 3, so 2 × SE = 6. We are therefore 95% confident that the true mean of the population lies between 44 and 56; our margin of error is +/- 6.
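As a rough illustration of this arithmetic, here is a minimal Python sketch using NumPy and invented door-to-balloon-time data (the numbers are assumptions, not part of the example above):

```python
import numpy as np

# Hypothetical sample of 100 door-to-balloon times (minutes); values invented for illustration
rng = np.random.default_rng(1)
sample = rng.normal(loc=80, scale=30, size=100)

mean = sample.mean()
se = sample.std(ddof=1) / np.sqrt(len(sample))   # SE = s / sqrt(n)

# Rough 95% confidence interval: mean +/- 2 standard errors
lower, upper = mean - 2 * se, mean + 2 * se
print(f"mean = {mean:.1f}, SE = {se:.1f}, 95% CI roughly ({lower:.1f}, {upper:.1f})")
```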
10
Central Limit Theorem/ Margin of Error/ Confidence Intervals
Why use it? Why is this important? Six Sigma practitioners use sample data and apply normal theory to make inferences about population parameters, irrespective of the actual form of the parent population. Many statistical tests are founded on the principle that we do not need to know the original distribution: means and proportions will always be approximately "normal" if n is big enough. Practically, we use the central limit theorem to help us estimate the true average and calculate the likelihood of observing certain events. Considering time and resources, we need a measure of confidence around our sample statistics. None of this is applicable if your data is unreliable or BIASED!
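A quick simulation sketch in Python/NumPy (with an invented, deliberately skewed "population") shows the central limit theorem in action: the individual values are far from normal, but the sample means pile up symmetrically around the true mean.

```python
import numpy as np

# A deliberately non-normal (exponential) "population"; values invented for illustration
rng = np.random.default_rng(7)
population = rng.exponential(scale=50, size=100_000)

# Draw many samples of size n and record each sample's mean
n, draws = 100, 1_000
sample_means = np.array([rng.choice(population, size=n).mean() for _ in range(draws)])

print("population mean:", round(population.mean(), 1))
print("mean of the sample means:", round(sample_means.mean(), 1))
print("theoretical SE (sigma/sqrt(n)):", round(population.std() / np.sqrt(n), 2))
print("observed spread of sample means:", round(sample_means.std(ddof=1), 2))
```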
11
Data-Driven Problem Solving: Hypothesis Testing
Two fundamental questions must be adequately answered in order to be able to adequately perform hypothesis testing: What type of data is available (and reliable)? What question are you asking (what do you need to understand)?
12
Introduction to Hypothesis Testing (pg. 156)
Hypothesis testing is basically the process of using statistical analysis to determine if the observed differences between two or more sets of data are due to random chance variation, or due to true differences in the underlying populations. Generally, hypothesis testing tells us whether or not sets of data are truly different, with a certain level of confidence. What are the two or more sets of data? Pre-intervention (or change) and post-intervention (or change); the local site versus a peer or best practice; etc.
13
Introduction to Hypothesis Testing
Are these samples from the same population? Sample 1 Sample 2 Mean=6.5 StDev=1 Mean=6.9 StDev=1.2
14
The Six Sigma Approach
Six Sigma applies many tools, including statistical tools, to practical problems; the key is data-driven decision making. Practical Problem – an unacceptable variation or gap in quality (e.g., lab specimens are mislabeled too often, leading to incorrect diagnosis and treatment). Statistical Problem – the problem defined in statistical terms (specimens are mislabeled 8 out of 10,000 collected). Statistical Solution – using data and statistics to understand the cause of the problem (~85% of mislabeled specimens come from the ED). Practical Solution – addressing the verified root causes (redesigning the process of labeling and transporting specimens leads to a dramatic reduction in errors).
15
Introduction to Hypothesis Testing
Hypothesis Testing allows us to answer a practical question - Is there a true difference between ___ and ___ ? Practically, Hypothesis Testing uses relatively small sample sizes to answer questions about the population. There is always a chance that the samples we have collected are not truly representative of the population. Thus, we may obtain a wrong conclusion about the population(s) being studied.
16
Introduction to Hypothesis Testing: Testing Terms and Concepts
Statistically, we "ask and answer questions" using stated hypotheses that are tested at some level of confidence. The null hypothesis (Ho) is a statement being tested to determine whether or not it is true (the assumption that there is no difference). The alternative hypothesis (Ha) is a statement that represents reality if there is enough evidence to reject the stated null (Ho), i.e. the null hypothesis is false. The null hypothesis: generally there is no difference between A and B… the null is dull! When there is not enough evidence to reject the null hypothesis, we do not say that we accept the null; we "fail to reject it." This is analogous to the scientific method: it never proves something, it rigorously fails to disprove it.
17
Introduction to Hypothesis Testing
Example: Is the average Length of Stay (LOS) for a total knee replacement different for Hospital A vs. Hospital B? Common Language: Ho: There is no difference in average length of stay between facilities. Ha: There is a difference in average length of stay between facilities. Statistical Language: Ho: μA(LOS) = μB(LOS); Ha: μA(LOS) ≠ μB(LOS)
18
Introduction to Hypothesis Testing: Type I and Type II Errors (Risk)
As stated earlier, there is the risk of arriving at a wrong conclusion about the hypothesis we are testing. The two types of error that can occur with hypothesis testing are called Type I and Type II; the associated risks are called Alpha and Beta risks. A Type I (Alpha) error is concluding there is a difference when there really isn't one – rejecting the null when you should not! A Type II (Beta) error is concluding there is not a difference when there really is one – failing to reject the null when you should!
19
Type I and Type II errors, Confidence, Power, and p-values
Conclusion Drawn vs. The True Statement:
- H0 is true, and you do not reject H0: Correct
- H0 is true, but you reject H0: Type I Error (α risk) – you conclude there IS a difference when there really isn't
- H0 is false, but you do not reject H0: Type II Error (β risk) – you conclude there is NO difference when there really is
- H0 is false, and you reject H0: Correct
20
Type I and Type II errors in the Justice System
True State vs. Verdict:
- Did not commit crime, Acquittal: innocent person acquitted (correct)
- Did not commit crime, Guilty: innocent person convicted – Type I error (The Shawshank Redemption, The Green Mile, Nelson Mandela in South Africa)
- Committed crime, Acquittal: guilty person acquitted – Type II error (OJ Simpson, Casey Anthony)
- Committed crime, Guilty: guilty person convicted (correct)
21
Result Matrix Ho: No difference between the accused and an innocent person
The Truth vs. the decision (Jury Trial verdict / Hypothesis Testing decision):
- Did not commit crime (Ho is true): Acquittal / Do not reject Ho – Correct; Guilty / Reject Ho – Type I error (α)
- Committed crime (Ho is false): Acquittal / Do not reject Ho – Type II error (β); Guilty / Reject Ho – Correct
22
Introduction to p-value
The p-value measures the probability of observing a certain amount of difference if the null hypothesis is true. In comparing the average length of stay (ALOS) at Hospitals A and B, p-value measures the likelihood of observing a difference in ALOS if the null hypothesis is true. If the p-value is large, then both averages probably came from the same population (i.e. there is no difference between ALOS at Hospital A and B). If the p-value is small, then it is unlikely both averages came from the same population (i.e. there is a difference between ALOS at Hospital A and B).
23
P-Value (pg. 160) What’s the probability of getting a value of “40”?
If I have a sample distribution with a mean of 50 and I pull a second sample with a mean of 40, how do I know whether this sample represents a truly different population, or whether it is from the same population and, through random sampling, just happens to be lower than the first sample's mean? The yellow zones in the tails represent the cut-off: the point at which I decide that the likelihood of the difference being due to random chance is so small that I am willing to say the samples are from different populations. In the first distribution, the second sample mean of 40 falls far enough within the body of the main distribution that I am not willing to go out on a limb and call it a different population. But in the second distribution, the chance of getting a sample mean so different is very, very small, so I am willing to say that this sample came from a different population.
24
Setting the Alpha threshold
Alpha (α) is the level of risk you are willing to accept of making a Type I error (i.e. rejecting the null when the null is true). Traditionally, alpha is set at 0.05, which means you are willing to accept a 5% chance of making a Type I error. The p-value, a statistical calculation, is the probability of seeing a difference at least as large as the one observed if the null hypothesis is true; alpha is the level of risk you are willing to accept that a conclusion to reject the null is wrong. Using the criminal justice metaphor, your alpha is the level of "reasonable doubt" a juror holds. Remember, we are innocent until proven guilty: innocence is the null hypothesis. If my alpha is .05 (5%), I have to be at least 95% certain of a person's guilt to convict; I can't have more than 5% doubt or else I have to acquit.
25
P-Value and the rejection threshold: reject the null hypothesis when the p-value falls at or below alpha.
"If p is low, Ho must go" (usually at or below 0.05). For many purposes, an alpha of 5% is fine; but for clinical trials and some precision manufacturing processes, alpha may need to be 1% or 0.1%.
26
Hypothesis Testing – Basic Steps (see also pg 156-160)
1. State the practical problem
2. State the null hypothesis
3. State the alternate hypothesis
4. Test the assumptions of the data
5. Determine the appropriate alpha (α) decision value
6. Calculate the appropriate test statistic and the p-value
7. If the calculated p-value < α, reject Ho; if the p-value ≥ α, fail to reject Ho
8. Formulate the statistical conclusion into a practical solution
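As a worked sketch of these steps in Python (SciPy), using hypothetical length-of-stay data for Hospitals A and B; the data values and the choice of Welch's t-test are illustrative assumptions, not part of the slides:

```python
import numpy as np
from scipy import stats

# Hypothetical length-of-stay data (days) for Hospitals A and B; values invented for illustration
rng = np.random.default_rng(3)
los_a = rng.normal(3.2, 0.8, size=40)
los_b = rng.normal(3.6, 0.9, size=45)

alpha = 0.05                                   # step 5: alpha decision value
# Steps 2-3: Ho: mu_A = mu_B, Ha: mu_A != mu_B
t_stat, p_value = stats.ttest_ind(los_a, los_b, equal_var=False)   # Welch's 2-sample t-test

# Step 7: compare the p-value to alpha
if p_value < alpha:
    print(f"p = {p_value:.3f} < {alpha}: reject Ho - average LOS differs between hospitals")
else:
    print(f"p = {p_value:.3f} >= {alpha}: fail to reject Ho - no detectable difference")
```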
27
Analyze – Hypothesis Testing – Type I / II Errors
So I may have a theory about mileage on my car: I suspect that my spouse's driving habits lead to him/her getting worse mileage than I can get. The null hypothesis is that there is no difference in the effect of that specific behavior. The alternate hypothesis is that there is a difference, and that it is statistically significant. What makes it statistically significant? To what degree is it significant? The alpha determines that. So you may want to consider: how much risk am I willing to take in making a statement to my spouse that his/her driving habits are a problem? 5%? 1%? 0.1%?
28
Statistical Testing – Basic Steps
1. What theory or potential cause is presented or proposed?
2. Given the theory or potential cause in front of you, what is the question you are trying to answer?
3. Do you have data directly related to and describing the question you are asking? What type of data do you have?
4. If you do not have data, can you collect the appropriate data (reasonably and appropriately)? If no data exists relating to the theory being considered, or if it will be very costly to obtain, re-visit the magnitude and urgency of testing this particular theory.
5. Proceed with data collection and sorting/grouping as needed.
6. State the question as a null hypothesis ("There is no difference…").
7. State the alternate hypothesis.
8. Test the assumptions of the data as needed (normality, quantity, variances, etc.).
9. Determine the appropriate alpha (α) decision value (.05, etc.).
10. Choose and calculate the appropriate test statistic (determined by the data you have and the question you are asking) and the associated p-value.
11. If the calculated p-value < α, reject Ho; if the p-value ≥ α, fail to reject Ho.
12. Formulate the statistical conclusion into a practical solution (the answer to the question).
29
Remember? - Data-Driven Problem Solving: Hypothesis Testing
Two fundamental questions must be adequately answered in order to be able to adequately perform hypothesis testing: What type of data is available (and reliable)? What question are you asking (what do you need to understand)?
30
What Type of Data to Analyze:
- Discrete X / Continuous Y
- Continuous X / Continuous Y
- Discrete X / Discrete Y
32
Reference Sheet: Statistical Test Selection and "p-values" interpretation (based on 95% Confidence)
For each practical question: the tool, the Minitab command, and how to read the p-value (p < 0.05 vs. p ≥ 0.05).

Normality (continuous data):
- Anderson-Darling Normality Test (Stat > Basic Statistics > Display Descriptive Statistics, Graphical Summary) – Is my collected set of data normally distributed? p < 0.05: you can be confident that your data is not normally distributed. p ≥ 0.05: you can assume that your data is normally distributed.

Discrete X / Continuous Y:
- 1-Sample t-Test against a known value (Stat > Basic Statistics > 1-Sample t) – Is the average of my sample the same as a given or known value? p < 0.05: your sample has a different average from the known test value. p ≥ 0.05: no difference between your sample average and the known test value (based on the data you have).
- 2-Sample t-Test (Stat > Basic Statistics > 2-Sample t) – Are the averages from 2 different sets of data the same? p < 0.05: the averages of the two samples are different. p ≥ 0.05: no difference between the averages of the two samples (based on the data you have).
- Paired t-Test (Stat > Basic Statistics > Paired t) – Are the averages from paired sets of data (e.g. before/after) the same? p < 0.05: there is a consistent difference between the pairs of data. p ≥ 0.05: no consistent difference between the pairs of data (based on the data you have).
- One-Way ANOVA (Stat > ANOVA > One-Way) – Is there at least one average from several sets of data (>2) that is different? p < 0.05: at least one of the samples has a different average from the others. p ≥ 0.05: no difference in the averages of the samples (based on the data you have).
- Kruskal-Wallis & Mood's Median Test (Stat > Nonparametrics) – Is there at least one median from several sets of data (>2) that is different? p < 0.05: at least one of the samples has a different median from the others. p ≥ 0.05: no difference in the medians of the samples (based on the data you have).
- F-test, Levene's test, Bartlett's test (Stat > ANOVA > Test for Equal Variances) – Is there at least one variance from several sets of data that is different? p < 0.05: at least one of your samples has a different standard deviation from the others. p ≥ 0.05: no difference between the standard deviations of the samples (based on the data you have).

Discrete X / Discrete Y:
- 1 Proportion against a known value (Stat > Basic Statistics > 1 Proportion) – Is the proportion, or rate, from my sample the same as a given proportional value? p < 0.05: your sample has a different proportion from the known test value. p ≥ 0.05: no difference between your sample proportion and the known test value (based on the data you have).
- 2 Proportions (Stat > Basic Statistics > 2 Proportions) – Are the proportions from 2 different sets of data the same? p < 0.05: the proportions from the two samples are different. p ≥ 0.05: no difference between the proportions from the two samples (based on the data you have).
- Chi-Square (Stat > Tables > Cross Tabulation and Chi-Square) – Is there at least one proportion from several sets of data that is different; are observed frequencies the same as expected? p < 0.05: at least one of the samples has a different proportion from the others. p ≥ 0.05: no difference in the proportions from the samples (based on the data you have).

Continuous X / Continuous Y:
- Correlation, Pearson coefficient (Stat > Basic Statistics > Correlation) – As one variable changes, can you predict the change in another (correlated) variable? p < 0.05: you can be confident that there is a correlation (the Pearson coefficient is not zero). p ≥ 0.05: no correlation (based on the data you have); the Pearson coefficient could be zero.
- Regression (Stat > Regression > Regression) – Does one continuous factor (input) affect another continuous factor (output)? p < 0.05: you can be confident that the input factor (predictor) affects the process output. p ≥ 0.05: no relationship between the input factor (predictor) and the process output (based on the data you have).
33
Data-Driven Analysis: Discrete X / Continuous Y
Descriptive Statistics: mean, median, variance, standard deviation. Graphical display: box plots, error bars, run charts. Potential Questions: Is there a difference in means, medians, or variances?
34
Variance Testing – choosing a test by distribution and number of samples
- Normal, 1 sample: Chi-square test against a target. Ho: σ1 = σt; Ha: σ1 ≠ σt (t = target). Stat > Basic Stat > Display Descriptive Statistics > Graphical Summary (if the target std dev falls within the CI, fail to reject Ho).
- Normal, 2 samples: F Test. Ho: σ1 = σ2; Ha: σ1 ≠ σ2. Stat > ANOVA > Test for Equal Variances.
- Normal, >2 samples: Bartlett's Test. Ho: σ1 = σ2 = σ3 …; Ha: σi ≠ σj for some i ≠ j (at least one is different). Stat > ANOVA > Test for Equal Variances.
- Non-normal or unknown distribution (2 or more samples): Levene's Test. Ho: σ1 = σ2 = σ3 …; Ha: σi ≠ σj for some i ≠ j (at least one is different). Stat > ANOVA > Test for Equal Variances.
- If variances are NOT equal, proceed with caution or use Welch's Test, which is not available in Minitab.
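A short Python sketch of the >2-sample variance tests (SciPy's bartlett and levene, with invented group data) shows how the two tests are run and read:

```python
import numpy as np
from scipy import stats

# Three hypothetical groups (e.g., quality scores by region); values invented for illustration
rng = np.random.default_rng(11)
g1 = rng.normal(70, 5, 30)
g2 = rng.normal(70, 5, 30)
g3 = rng.normal(70, 9, 30)   # deliberately more variable

# Bartlett's test assumes normal data; Levene's test is robust to non-normal data
stat_b, p_bartlett = stats.bartlett(g1, g2, g3)
stat_l, p_levene = stats.levene(g1, g2, g3)

print(f"Bartlett p = {p_bartlett:.3f}   Levene p = {p_levene:.3f}")
# p < 0.05 -> at least one group's variance differs from the others
```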
35
Test for Equal Variances
Stat>Basic Statistics>2 Variances
36
Test for Equal Variances
Stat>Basic Statistics>2 Variances Test for Equal Variances: Quality versus Region 95% Bonferroni confidence intervals for standard deviations Region N Lower StDev Upper Bartlett's Test (Normal Distribution) Test statistic = 5.58, p-value = 0.061 Levene's Test (Any Continuous Distribution) Test statistic = 6.24, p-value = 0.002
37
Test for Equal Variances
Stat>Basic Statistics>2 Variances
38
Hypothesis Testing: Discrete X / Continuous Y
For: 1-Sample t-test (see page 162 in The Lean Six Sigma Pocket Toolbook). Ho: μ is equal to a target or known value. Ha: μ is not equal to the target or known value. Statistical Test: one-sample t-test. Test Statistic: t-value – based on the area under the curve of the t-distribution (used when the population standard deviation is unknown).
39
Hypothesis Testing: Discrete X / Continuous Y
For: 2-Sample t-test (see page 182 in The Lean Six Sigma Pocket Toolbook). Ho: μ1 = μ2. Ha: μ1 ≠ μ2. Statistical Test: 2-sample t-test. Test Statistic: t-value – based on the area under the curve of the t-distribution (used when the population standard deviations are unknown).
40
Hypothesis Testing: Discrete X / Continuous Y – test selection
- 1 group: population is Normal – 1-Sample t Test; population is Non-Normal or Unknown – 1-Sample Wilcoxon
- 2 groups: population is Normal – 2-Sample t Test; population is Non-Normal or Unknown – Mann-Whitney Test
- >2 groups: population is Normal – ANOVA; population is Non-Normal or Unknown – Mood's Median Test or Kruskal-Wallis Test
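A minimal Python illustration of the >2-group row of this table, assuming SciPy and invented cycle-time data: run ANOVA when the groups look normal, Kruskal-Wallis when normality is doubtful.

```python
import numpy as np
from scipy import stats

# Hypothetical cycle times for three clinics; values invented for illustration
rng = np.random.default_rng(5)
a = rng.normal(30, 4, 25)
b = rng.normal(32, 4, 25)
c = rng.normal(35, 4, 25)

# If the groups look normal: one-way ANOVA compares the means
f_stat, p_anova = stats.f_oneway(a, b, c)

# If normality is doubtful: Kruskal-Wallis compares the groups using ranks (medians)
h_stat, p_kw = stats.kruskal(a, b, c)

print(f"ANOVA p = {p_anova:.4f}   Kruskal-Wallis p = {p_kw:.4f}")
```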
41
Analyze Tools: Discrete X / Continuous Y
Graphical display: Box plots. The box shows the range of data values comprising the 2nd and 3rd quartiles of the data – the "middle" 50% of the data – with lines marking the 1st quartile, the median, and the 3rd quartile. See page 110 in The Lean Six Sigma Pocket Toolbook.
42
Analyze Tools: Box Plots
Example built from a table of 24 data entries. Each quartile contains 25% of the data (1st, 2nd, 3rd, and 4th quartiles); the median (here 4.5) sits between the 2nd and 3rd quartiles. The Inter-Quartile Range (IQR) is the range encompassed by the 2nd and 3rd quartiles (the box), here 6 − 4 = 2. The upper whisker extends to the largest value within Q3 + 1.5 × IQR; the lower whisker extends to the smallest value within the bottom of the box (1st quartile value) − 1.5 × IQR; values beyond the whiskers are plotted individually as outliers (*).
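A small Python/NumPy sketch of these box-plot ingredients, using assumed data; note that quartile conventions differ slightly between tools, so the exact quartile values may not match Minitab's box plot.

```python
import numpy as np

# 24 assumed data points behind a box plot (illustrative only)
data = np.array([1, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 4,
                 5, 5, 5, 5, 6, 6, 6, 7, 7, 8, 10, 13])

q1, median, q3 = np.percentile(data, [25, 50, 75])
iqr = q3 - q1                                        # the "box"
upper_whisker = data[data <= q3 + 1.5 * iqr].max()   # largest value within Q3 + 1.5*IQR
lower_whisker = data[data >= q1 - 1.5 * iqr].min()   # smallest value within Q1 - 1.5*IQR
outliers = data[(data > q3 + 1.5 * iqr) | (data < q1 - 1.5 * iqr)]

print(f"Q1 = {q1}, median = {median}, Q3 = {q3}, IQR = {iqr}")
print(f"whiskers: {lower_whisker} to {upper_whisker}, outliers: {outliers}")
```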
43
Data-Driven Analysis: Continuous X / Continuous Y
Descriptive Statistics: correlation. Graphical Display: scatter plot, run charts. See The Lean Six Sigma Pocket Toolbook.
44
Analyze Tools: Continuous X / Continuous Y
Correlation indicates whether there is a relationship between the values of two measurements. Positive correlation: higher values in X are associated with higher values in Y. Negative correlation: higher values in X are associated with lower values in Y. Correlation does NOT imply cause-and-effect! The correlation could be coincidence, or both variables could be influenced by some lurking variable.
45
Hypothesis Testing Correlation Statistics
Regression analysis generates correlation coefficients to indicate the strength and nature of the relationship. Pearson correlation coefficient (r): the strength and direction of the relationship; between -1 and 1. r²: the percent of variation in Y that is attributable to X; between 0 and 1.
46
Hypothesis Testing: Continuous X / Continuous Y
For: Regression and Correlation (pg. 168). Ho: The slope of the line is equal to zero (β1 = 0). Ha: The slope of the line does not equal zero (β1 ≠ 0). Statistical Test: Regression. Test Statistic: F ratio – a measure of actual to expected variation in the sample.
47
Correlation Example Stat>Basic Statistics>Correlation
Correlations: Clarity, Quality Pearson correlation of Clarity and Quality = 0.075 P-Value = 0.208
48
Pearson’s r Rules of Thumb
Strength and direction of relationship between x and Y 0 to .20: no or negligible correlation. .20 to .40: low degree of correlation. .40 to .60: moderate degree of correlation. .60 to .80: marked degree of correlation. .80 to 1.00: high correlation.
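A hedged Python sketch of computing r, r², and the associated p-value with SciPy's pearsonr; the Clarity/Quality-style data is invented for illustration.

```python
import numpy as np
from scipy import stats

# Hypothetical paired measurements (e.g., Clarity vs. Quality scores); values invented
rng = np.random.default_rng(9)
clarity = rng.normal(7, 1, 60)
quality = 0.2 * clarity + rng.normal(5, 1, 60)   # a deliberately weak relationship

r, p_value = stats.pearsonr(clarity, quality)
print(f"Pearson r = {r:.3f}, r^2 = {r**2:.3f}, p = {p_value:.3f}")
# |r| near 0 -> negligible correlation; p > 0.05 -> cannot rule out that r is really 0
```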
49
Regression Example Stat>Regression>Regression…
Regression Analysis: Quality versus Clarity The regression equation is Quality = Clarity Predictor Coef SE Coef T P Constant Clarity S = R-Sq = 0.6% R-Sq(adj) = 0.2% Analysis of Variance Source DF SS MS F P Regression Residual Error Total
50
Regression Example 2 Stat>Regression>Fitted Line Plot…
Regression Analysis: Quality versus Clarity The regression equation is Quality = Clarity S = R-Sq = 0.6% R-Sq(adj) = 0.2% Analysis of Variance Source DF SS MS F P Regression Error Total
51
r2 Rules of Thumb The “coefficient of determination”
What percent of the variation in Y is due to x? Less than or equal to .4 – not predictive; .40 to .65 – mildly predictive; .65 to .86 – moderately predictive; .86 to 1 – strongly predictive.
52
Residuals: Regression uses a method called "least squares" to choose the line that minimizes the sum of the squared vertical distances from the points to the line.
53
Residuals: The distances between the points and the regression line are called "residuals." Residuals represent the portion of the variation in Y that is not explained by the regression equation, and they are a good indicator of how good your model – the regression equation – is.
54
Residuals In Minitab, you can plot the residuals four ways.
(also see in The Lean Six Sigma Toolbook)
55
Residuals Regression has three assumptions about residual “errors.”
Errors are: random and independent; normally distributed; and have constant variance.
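A rough Python sketch (SciPy's linregress and shapiro, with invented x/y data) of fitting the least-squares line, extracting the residuals, and spot-checking these assumptions:

```python
import numpy as np
from scipy import stats

# Hypothetical x/y data (e.g., Flavor vs. Quality); values invented for illustration
rng = np.random.default_rng(4)
x = rng.normal(6, 1, 50)
y = 2.0 * x + rng.normal(0, 1, 50)

fit = stats.linregress(x, y)                       # least-squares line
residuals = y - (fit.intercept + fit.slope * x)    # what the line does not explain

print("mean of residuals (should be near 0):", round(residuals.mean(), 3))
print("normality check (Shapiro-Wilk) p =", round(stats.shapiro(residuals).pvalue, 3))
# For independence and constant variance, plot residuals vs. run order and vs. fitted values
```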
56
Residuals Errors are random and independent Residuals versus order
Displayed in the order collected. If order is immaterial, do not use this plot. Are the residuals random? Do they exhibit any patterns?
57
Residuals Errors are normally distributed Normal plot of residuals
Errors should follow a straight line on a normal probability plot Use the “fat pencil” test. Would a fat pencil laid on the normal probability plot cover the data points?
58
Residuals Errors have constant variance over all values of x
Residuals versus fits: should show a random scatter with no pattern, and should have roughly the same number of points above 0 as below.
59
Flavor versus Quality Correlations: Quality, Flavor
Pearson correlation of Quality and Flavor = 0.870 P-Value = 0.000 Regression Analysis: Quality versus Flavor The regression equation is Quality = Flavor S = R-Sq = 75.7% R-Sq(adj) = 75.6% Analysis of Variance Source DF SS MS F P Regression Error Total
60
Analyze Tools: Continuous X / Continuous Y
r = 1, r² = 1: perfect positive correlation. r = -1, r² = 1: perfect negative correlation. r = 0, r² = 0: no correlation.
61
Data-Driven Analysis: Discrete X / Discrete Y
Descriptive Statistics: counts and proportions. Graphical display: bar graph and Pareto chart. A Pareto chart is a type of bar graph where the categories are arranged from largest to smallest, with a line indicating the cumulative percent.
62
Contingency Tables – χ²: the statistic used to test hypotheses about the frequency of some event. Goodness of Fit: is the observed different from the expected? Test for Independence: are the samples from the same distribution?
63
Goodness of Fit Test Compare actual and expected frequencies
Calculate the χ² statistic; compare it to the χ² critical value from the table; if χ²calc > χ²crit, there is a difference.
64
Calculate the χ2 statistic
χ² = Σ (fo − fe)² / fe, summed over the g cells (j = 1 … g): the sum of the squares of the differences between the actual and the expected frequencies, each divided by the expected frequency.
65
Coin-toss Will a fair coin tossed 100 times come up 66 times heads and 34 times tails?
66
Coin-toss: χ²calc = Σ (fo − fe)² / fe
Observed: Heads 66, Tails 34. Expected (fe): 50 each. Heads: (66 − 50)² / 50 = 16² / 50 = 256 / 50 = 5.12. Tails: (34 − 50)² / 50 = (−16)² / 50 = 256 / 50 = 5.12. χ²calc = 5.12 + 5.12 = 10.24.
67
Look up the χ2 critical value
First we must determine the degrees of freedom in the contingency table “Degrees of freedom” represents the number of values in the final calculation of a statistic that are free to vary DF=(rows in data-1)*(columns in data-1) In our example, the DF=1
68
Look up the χ2 critical value
From the χ² table, the critical value for df = 1 at α = 0.05 is 3.84 (at α = 0.01 it would be 6.63). If χ²calc > χ²crit, there is a difference. Here χ²calc = 10.24 > χ²crit = 3.84, so there is a difference!
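For reference, a minimal sketch of the same coin-toss calculation, assuming SciPy is available:

```python
from scipy import stats

# Coin-toss example from the slides: 66 heads and 34 tails observed, 50/50 expected
observed = [66, 34]
expected = [50, 50]

chi2, p_value = stats.chisquare(f_obs=observed, f_exp=expected)
print(f"chi-square = {chi2:.2f}, p = {p_value:.4f}")   # chi-square = 10.24
# 10.24 exceeds the critical value 3.84 (df = 1, alpha = 0.05), so reject Ho
```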
69
Chi-Square Test for Independence
Goodness of Fit asked if frequencies were different than expected Test for Independence asks whether our samples come from the same population Example: Students in a Six Sigma Black Belt course are offered two different time slots for taking their final exam. Is there a difference in the passing and failing rates for each group? State the null and alternative hypotheses for this problem.
70
Chi-Square Test for Independence
We use the same formula, χ² = Σ (fo − fe)² / fe summed over all cells, but calculate the expected frequencies differently.
71
Test for Independence: 1. Arrange the data in a table showing the observed frequencies. 2. Calculate the expected frequency for each cell. 3. Calculate the χ² statistic for each cell. 4. Sum the χ² statistics from all the cells. 5. Compare to the χ² critical value from the table. 6. If χ²calc > χ²crit, there is a difference.
72
Calculating fe: fe = (f row × f column) / N
Observed frequencies – 1st test: passing fo = 20, failing fo = 50, row total 70. 2nd test: passing fo = 40, failing fo = 70, row total 110. Column totals: 60 passing, 120 failing; grand total N = 180.
73
Calculating fe: fe = (f row × f column) / N
For the 1st test / passing cell: fe = (70 × 60) / 180 (row total 70, column total 60, N = 180); the observed value is fo = 20. The other cells follow the same pattern.
74
Calculating fe: fe = (f row × f column) / N
1st test / passing: fe = 23.33 (fo = 20). 1st test / failing: fe = (120 × 70) / 180 (fo = 50).
75
Calculating fe: fe = (f row × f column) / N
All expected frequencies – 1st test: passing fe = 23.33 (fo = 20), failing fe = 46.67 (fo = 50). 2nd test: passing fe = 36.67 (fo = 40), failing fe = 73.33 (fo = 70). Row totals 70 and 110; column totals 60 and 120; N = 180.
76
Calculate the χ² statistic for each cell
(fo − fe)² / fe for each cell – 1st test: passing (20 − 23.33)² / 23.33 = 0.476, failing (50 − 46.67)² / 46.67 = 0.238. 2nd test: passing (40 − 36.67)² / 36.67 = 0.303, failing (70 − 73.33)² / 73.33 = 0.151. χ²calc = 0.476 + 0.238 + 0.303 + 0.151 ≈ 1.169.
77
Look up the χ2 critical value
From the χ² table, the critical value for df = 1 at α = 0.05 is 3.84. If χ²calc > χ²crit, there is a difference. Here χ²calc = 1.169 < χ²crit = 3.84, so there is no detectable difference, and we fail to reject the null hypothesis. Ho: the pass and fail rates are independent of the time the test was administered.
78
Cramer's test: θ = √( χ²calc / (n(q − 1)) )
Quantifies the strength of the association between x and y, where n = total number of observations and q = the lesser of the number of rows or columns. Describing the strength of association: 0.5 to 1.0 – high association; 0.3 to 0.5 – moderate association; 0.1 to 0.3 – low association; 0 to 0.1 – little if any association.
79
Cramer's test, step 1: substitute the calculated χ² – θ = √( 1.169 / (n(q − 1)) ), with n = total number of observations and q = the lesser of rows or columns.
80
Cramer's test, step 2: substitute n and q – θ = √( 1.169 / (180 × (2 − 1)) ) = √( 1.169 / 180 ).
81
Cramer's test, step 3: θ = √0.0065.
82
Cramer's test result: θ = 0.0806 – between 0 and 0.1, so there is little if any association between which sitting of the exam a student took and whether they passed.
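The whole independence example, including Cramer's θ, can be sketched in Python with SciPy; correction=False matches the hand calculation above, which does not apply the Yates continuity correction.

```python
import numpy as np
from scipy import stats

# Pass/fail counts for the two exam sittings (from the slides)
table = np.array([[20, 50],    # 1st test: 20 passed, 50 failed
                  [40, 70]])   # 2nd test: 40 passed, 70 failed

# correction=False matches the hand calculation (no Yates continuity correction)
chi2, p_value, dof, expected = stats.chi2_contingency(table, correction=False)
print(f"chi-square = {chi2:.3f}, p = {p_value:.3f}, df = {dof}")   # ~1.17, below 3.84

# Cramer's test (theta): strength of the association
n = table.sum()
q = min(table.shape)                        # lesser of rows or columns
theta = np.sqrt(chi2 / (n * (q - 1)))
print(f"Cramer's theta = {theta:.3f}")      # ~0.08 -> little if any association
```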
83
Hypothesis Testing: Discrete X / Discrete Y
For : Comparing one proportion to a given value Ho: The proportion is equal to a given percentage Ha: The proportion is not equal to a given percentage Statistical Test: 1 Proportion Test Statistic: Z score – based on the area under the curve of a normal distribution
84
Hypothesis Testing: Discrete X / Discrete Y
For: comparing two proportions Ho: The proportion of group A equals the proportion of group B PA = PB Ha: The proportion of group A does not equal the proportion of group B PA ≠ PB Statistical Test: Test of Proportions Test Statistic: Z Score – based on the area under the curve of a normal distribution
85
Hypothesis Testing: Discrete X / Discrete Y
Considerations For contingency tables, the expected cell count should be at least 5 For proportions tests, if you do not have enough successes or failures in your numerator, consider using Fisher’s Exact Test Generally, np > 5 and n(1-p) > 5 is a minimum standard
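A hedged Python sketch of a two-proportion comparison, assuming statsmodels is available for the z-test and SciPy for Fisher's exact test; the mislabeling counts are invented for illustration.

```python
from scipy import stats
from statsmodels.stats.proportion import proportions_ztest

# Hypothetical mislabeling counts at two sites; numbers invented for illustration
defects = [8, 3]            # mislabeled specimens
totals = [10_000, 9_500]    # specimens collected

z_stat, p_value = proportions_ztest(count=defects, nobs=totals)
print(f"2-proportion z-test: z = {z_stat:.2f}, p = {p_value:.3f}")

# With so few "successes", Fisher's exact test on the 2x2 table is the safer choice
table = [[8, 10_000 - 8], [3, 9_500 - 3]]
odds_ratio, p_exact = stats.fisher_exact(table)
print(f"Fisher's exact test: p = {p_exact:.3f}")
```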
86
Six Sigma Analyze: Remember, statistical analysis and testing within the context of practically applying Lean Six Sigma is about using data to identify the key Xs to "fix" that will most likely result in a measurable improvement in the process Y (output), which in turn will improve customer satisfaction and efficiency.
87
Key Deliverables for Analyze
Main elements of Define and Measure completed “Obvious Xs” identified and confirmed Potential Xs identified, data collected and analyzed Root causes investigated and supported with data – the Xs to improve
88
Project Name: Define Measure Analyze Improve Control
Problem Statement: Mislabeled example. Project Scope: Enter scope description. Champion: Name. Process Owner: Name. Black Belt: Name. Green Belts: Names. Customer(s): / CTQ(s): / Defect(s): / Beginning DPMO: / Target DPMO: / Estimated Benefits: / Actual Benefits:
Define (Start Date: Enter Date, End Date: Enter Date): Benchmark Analysis; Project Charter; Formal Champion Approval of Charter (signed); SIPOC – High-Level Process Map; Customer CTQs; Initial Team Meeting (kickoff)
Measure (Start Date: Enter Date, End Date: Enter Date): Identify Project Y(s); Identify Possible Xs (possible cause-and-effect relationships); Develop & Execute Data Collection Plan; Measurement System Analysis; Establish Baseline Performance
Analyze (Start Date: Enter Date, End Date: Enter Date): Identify Vital Few Root Causes of Variation Sources & Improvement Opportunities; Define Performance Objective(s) for Key Xs; Quantify Potential $ Benefit
Improve (Start Date: Enter Date, End Date: Enter Date): Generate Solutions; Prioritize Solutions; Assess Risks; Test Solutions; Cost-Benefit Analysis; Develop & Implement Execution Plan; Formal Champion Approval
Control (Start Date: Enter Date, End Date: Enter Date): Implement Sustainable Process Controls – Validate: Control System, Monitoring Plan, Response Plan, System Integration Plan; $ Benefits Validated; Formal Champion Approval and Report-Out
Directions: Replace all of the italicized black text with your project's information. Change the blank box into a check mark by clicking Format > Bullets and Numbering and changing the bullet. Status key: Not Complete / Complete / Not Applicable. Author: Enter Name. Date: April 20, 2017
89
Six Sigma Analyze: Now, what specifically are we going to improve in the Improve Phase? We should have evidence (data) to support what we are improving and why.