# Blended Lean Six Sigma Black Belt Training – ABInBev

Normal class introduction: “Welcome to the 3-day session of the ASQ Lean Six Sigma Black Belt Blended training. We plan to cover several topics and areas during our time together. The scope and intent is to review the session materials that you have already completed, or are in the process of completing, to ensure that you are comfortable with the pace of learning, the materials, and the progress you have made to date on your project.” Transition to next slide. Correlation and Regression ©2010 ASQ. All Rights Reserved.

Module Objectives
Learn and apply some key Black Belt tools used to analyze your data: how to develop and interpret the correlation between variables, and how to develop a mathematical model expressing the relationship (regression), covering simple linear regression, multiple linear regression, and logistic regression. Interpret the correlation coefficient and determine its statistical significance (p-value); recognize the difference between correlation and causation. Interpret the linear regression equation and determine its statistical significance (p-value). Use regression models for estimation and prediction. Use and interpret the results of hypothesis tests for means, variances, and proportions. Select, calculate, and interpret the results of ANOVAs. This review module is aligned with your MoreSteam web training, Session 7: Identifying Root Cause.

So Where Are We Now?
We have understood our process using process maps and FMEA. We have created graphs and charts to visualize what is happening in our process (the seven basic tools). We have validated our measurement system with Gage R&R to ensure our data are both precise and accurate. We have collected data to establish our process performance using process capability analysis. Now we are going to use statistical tools to infer cause and effect and uncover underlying relationships. [Self-explanatory]

Terms
Correlation: used when both Y and X are continuous. Measures the strength of the linear relationship between Y and X. Metric: Pearson correlation coefficient, r (r varies between -1 and +1). Perfect positive relationship: r = 1. No linear relationship: r = 0. Perfect negative relationship: r = -1.
Regression: simple linear regression is used when both Y and X are continuous. Quantifies the relationship between Y and X (Y = b0 + b1X). Metric: coefficient of determination, R-Sq (varies from 0.0 to 1.0, or 0% to 100%). None of the variation in Y is explained by X: R-Sq = 0.0. All of the variation in Y is explained by X: R-Sq = 1.0.
Correlation and regression must be used with continuous variables.
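To make the correlation metric concrete, here is a minimal Python sketch (standard library only; `pearson_r` is a name chosen for illustration, not a Minitab function) that computes r from its definition. The third example is deliberately curved: Y = X² depends completely on X, yet r = 0, because r measures only *linear* association.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient r for two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squares about the means
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


x = [1, 2, 3, 4, 5]
print(pearson_r(x, [2, 4, 6, 8, 10]))                 # 1.0  (perfect positive)
print(pearson_r(x, [10, 8, 6, 4, 2]))                 # -1.0 (perfect negative)
print(pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4]))  # 0.0  (Y = X**2: related, but not linearly)
```

For simple linear regression with one X, R-Sq equals r², so the same sums of squares drive both metrics.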

Correlation Coefficients: Illustration
[Figure: scatterplots of Y versus X illustrating r = +1.0, r = -1.0, and r = 0.0.] If r = +1: perfect positive correlation; increase X and Y increases. If r = -1: perfect negative correlation; increase X and Y decreases. If r = 0: no linear correlation; increase or decrease X and Y has a random response. Issue: what is “good enough”? r does not tell us, but r-squared, addressed later, does.

Correlation: Minitab Example
Voltage for the same power supply is measured at Station 1 and Station 2. Determine the correlation for voltage between the two stations. Approach: Open datafile CORRELAT.mtw (the data are displayed in the Data Window). Go to Stat > Basic Statistics > Correlation…

Correlation: Minitab Example (Continued)
Step 1: Select C1 Station 1 and C2 Station 2, and select Display p-values. Step 2: Go to Graph > Scatterplot… > Simple.

Correlation: Minitab Example (Continued)
From the Minitab Session Window: the null hypothesis is that there is no correlation between Station 1 and Station 2. Because p is less than 0.05, reject H0 and conclude that the correlation is statistically significant. Instructor note (16 JUL 03, Dave): remind the class to always calculate the correlation coefficient for a scatterplot so they don't get misled.
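The p-value in the Session Window comes from a significance test on r. A standard way to obtain it, sketched below under the assumption that it matches Minitab's default Pearson test, converts r into t = r·√(n − 2)/√(1 − r²) and takes a two-sided Student's t probability with n − 2 degrees of freedom. Python's standard library has no t distribution, so the sketch evaluates it through the regularized incomplete beta function using the classic continued-fraction method; all helper names are hypothetical.

```python
import math


def _betacf(a, b, x, max_iter=200, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz's method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the continued fraction
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the continued fraction
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def _betai(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    # Evaluate directly or via the symmetry I_x(a, b) = 1 - I_(1-x)(b, a)
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def correlation_p_value(r, n):
    """Two-sided p-value for H0: no correlation, from n paired observations."""
    if n < 3:
        raise ValueError("need at least 3 pairs")
    if abs(r) >= 1.0:
        return 0.0  # points lie exactly on a straight line
    df = n - 2
    t = r * math.sqrt(df) / math.sqrt(1.0 - r * r)
    # P(|T| > |t|) for Student's t with df degrees of freedom
    return _betai(df / 2.0, 0.5, df / (df + t * t))


print(correlation_p_value(0.8, 10))  # strong r: small p, reject H0
print(correlation_p_value(0.2, 10))  # weak r: large p, do not reject H0
```

As on the slide, a p-value below 0.05 leads to rejecting H0 of no correlation.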

ABI Example 1 – Correlation
This project related to measuring client satisfaction in the BSC. Client satisfaction was measured by a monthly survey of five general questions. Four answers could be given for each question: “very dissatisfied”, “dissatisfied”, “satisfied”, and “very satisfied”. The questions were about response time, language knowledge, helpfulness, quality of solution, and knowledge. A correlation test was run to determine whether there is a relationship between the questions, that is, whether a low score in one area tends to go with a low score in another. Isabelle Verdoodt and Matthias Pindur Belt Project, Zone WE © Anheuser-Busch InBev. All Rights Reserved.

ABI Example 1 – Correlation (Continued)
Is there a correlation between customer satisfaction questions? Isabelle Verdoodt and Matthias Pindur Belt Project, Zone WE. The only correlation is between the questions Helpfulness and Knowledge.

ABI Example 2 – Correlation

ABI Example 2 – Correlation (Continued)
Pearson Correlation P-Value Volume vs. Investment / case – Bud SBT 0.819 0.000 Volume vs. Investment / case – HICE 500 0.594 Volume vs. Investment / case - Bud BBT .890 Volume vs. Investment / case – HICE 600 .139 0.312 Is there a correlation? What is the strength of the relationship? Luke Zhou Belt Project, Zone APAC

Testing Method Selection Matrix
| Variable type | Attribute Y | Count Y | Continuous Y |
|---|---|---|---|
| Discrete X, 1 or 2 treatments | Proportions | Poisson | t-tests |
| Discrete X, 3+ treatments | Chi-square | | ANOVA |
| Continuous X | Logistic regression | | Least squares regression |

Simple Linear Regression Analysis
Used to fit lines and curves to data when the model is linear in its parameters (the b's). The fitted lines: quantify the relationship between the predictor (input) variable (X) and the response (output) variable (Y); help to identify the vital few Xs; enable predictions of the response Y to be made from knowledge of the predictor X; and identify the impact of controlling a process input variable (X) on a process output variable (Y). Produces an equation of the form Y = b0 + b1X. Instructor note: paraphrase the slide. Regression analysis is used to estimate or investigate the mathematical relationship between variables. We can predict outcomes based on the equation of the fitted “line” and can eliminate non-critical variables. Define the Xs to be controlled that drive the Ys in Y = f(X).

Regression: Minitab Example 1
A Black Belt in the Supply department is tracking the voltage output at two different stations, measuring voltage at Station 1 and Station 2. The task is to predict the voltage at Station 2 from the voltage measured at Station 1. Approach: Open datafile CORRELAT.mtw (the data are displayed in the Data Window). Go to Stat > Regression > Fitted Line Plot…

Regression: Minitab Example 1 (Continued)

Regression: Minitab Example 1 (Continued)
Callouts: the prediction equation; the fitted line, which follows the prediction equation; and the coefficient of determination (use R-Sq for simple linear regression, with one X).

Regression: Minitab Example 1 (Continued)
From the Session Window, the regression equation has the form Station 2 = b0 + b1 × Station 1. The intercept, b0, is where the fitted line (regression line) crosses the Y-axis, that is, the fitted value of Y when X = 0. The slope, b1, is “rise over run”, or ΔY/ΔX. The coefficients b0 and b1 are estimates of the population parameters β0 and β1; the model is linear in these coefficients. Instructor notes added 5/2/02: The regression equation allows us to predict the performance of Station 2 based upon the performance of Station 1! Practically, what does this mean? You can measure the voltage only at Station 1 and plug it into the equation to predict the voltage at Station 2. As a result of the regression equation, you no longer need to measure the voltage at Station 2.
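The least-squares estimates behind a fitted line plot can be computed directly: b1 = Sxy/Sxx and b0 = ȳ − b1·x̄. The sketch below uses made-up voltage readings (not the CORRELAT.mtw data) purely to show the arithmetic and the “measure at Station 1, predict Station 2” step; `fit_line` is a hypothetical helper name.

```python
def fit_line(x, y):
    """Least-squares estimates (b0, b1) for the model Y = b0 + b1*X."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx              # slope: "rise over run"
    b0 = mean_y - b1 * mean_x   # intercept: fitted Y when X = 0
    return b0, b1


# Hypothetical voltage readings (not the CORRELAT.mtw data)
station_1 = [100.0, 101.0, 102.0, 103.0]
station_2 = [99.5, 100.5, 101.5, 102.5]
b0, b1 = fit_line(station_1, station_2)
print(b0, b1)                  # -0.5 1.0
predicted = b0 + b1 * 101.5    # predict Station 2 from a Station 1 reading of 101.5
print(predicted)               # 101.0
```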

Statistical Significance – Minitab Example 2
An analysis of variance (ANOVA) table informs us about the statistical significance of the regression analysis. Hypotheses for regression: H0: the regression results from common cause variation; when H0 is true, there is no statistically significant regression, and the best prediction of Y is the mean of Y. Ha: the regression is statistically significant. Compare the p-value with alpha (here, alpha = 0.05): if p is less than alpha, reject the null hypothesis and conclude that the regression is statistically significant. Approach: Use datafile REGRESSANOVA.mtw. Go to Stat > Regression > Regression…
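The ANOVA table partitions the total variation in Y (SST) into the part explained by the regression (SSR) and the residual part (SSE); then F = MSR/MSE and R-Sq = SSR/SST. The sketch below (hypothetical function name; it assumes the fitted values come from a least-squares fit that includes an intercept, which is what makes SST = SSR + SSE) reproduces those columns on made-up data. Minitab then converts F into the p-value using the F distribution with (k, n − k − 1) degrees of freedom, a step omitted here.

```python
def regression_anova(y, fitted, k):
    """ANOVA quantities for a least-squares regression with k predictors plus an intercept."""
    n = len(y)
    mean_y = sum(y) / n
    sst = sum((yi - mean_y) ** 2 for yi in y)               # total variation
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # residual (unexplained)
    ssr = sst - sse                                         # explained by the regression
    df_reg, df_err = k, n - k - 1
    msr, mse = ssr / df_reg, sse / df_err
    return {
        "SSR": ssr, "SSE": sse, "SST": sst,
        "DF": (df_reg, df_err, n - 1),
        "F": msr / mse,
        "R-Sq": ssr / sst,
    }


# Made-up example: Y = X**2 with its least-squares line Y = -5 + 6X
x = [0, 1, 2, 3, 4, 5, 6]
y = [v ** 2 for v in x]
fitted = [-5 + 6 * v for v in x]
table = regression_anova(y, fitted, k=1)
print(table["F"], table["R-Sq"])
```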

ANOVA for Simple Linear Regression – Minitab Example 2 (Continued)
REGRESSANOVA.mtw: Stat > Regression > Regression…

ANOVA for Simple Linear Regression – Minitab Example 2 (Continued)
The regression is significant: p < 0.05. What is the R-Sq value telling us?

Analysis of Residuals – Minitab Example 2 (Continued)
Residuals (observed Y minus fitted Y) are used to test the adequacy of the prediction equation (model). In residual plots, three types of patterns indicate model inadequacy: (1) fans, (2) bands sloping up or down, and (3) curved bands. The patterns will be dramatic, not subtle!
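A residual plot is judged by eye, but the idea behind a curved band can be illustrated numerically. The sketch below is a rough stand-in of our own, not a Minitab feature: it fits a straight line to made-up curved data and counts runs of same-sign residuals ordered by X. A curved band shows up as a few long runs (positive, then negative, then positive) instead of a random mix of signs.

```python
def line_residuals(x, y):
    """Residuals (observed minus fitted) from a least-squares straight line."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    b0 = mean_y - b1 * mean_x
    return [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]


def sign_runs(x, residuals, tol=1e-9):
    """Count runs of same-sign residuals ordered by x; near-zero residuals are skipped.

    Very few long runs suggest a curved band; random scatter gives many short runs.
    """
    ordered = [r for _, r in sorted(zip(x, residuals))]
    signs = [r > 0 for r in ordered if abs(r) > tol]
    if not signs:
        return 0
    return 1 + sum(1 for a, b in zip(signs, signs[1:]) if a != b)


x = [0, 1, 2, 3, 4, 5, 6]
y = [xi ** 2 for xi in x]          # curved data fitted with a straight line
res = line_residuals(x, y)
print([round(r, 6) for r in res])  # [5.0, 0.0, -3.0, -4.0, -3.0, 0.0, 5.0]
print(sign_runs(x, res))           # 3: positive, negative, positive -> curved band
```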

Analysis of Residuals – Minitab Example 2 (Continued)
Do you see any patterns in the residuals that might indicate model inadequacy?

Regression: Minitab Example 3
Illustrating the analysis of residuals: use datafile RESIDUALS.mtw. Go to Stat > Regression > Fitted Line Plot and select Linear.

Regression: Minitab Example 3 (Continued)
R-Sq is 89.7%. The regression is significant. Can we do better? How do the residuals look?

Regression: Minitab Example 3 (Continued)
Not quite random! What do the Residuals look like? Is the straight line a best fit? What do you suggest?

Regression: Minitab Example 3 (Continued)
Illustrating the analysis of residuals, continuing with the same example: use datafile RESIDUALS.mtw. Go to Stat > Regression > Fitted Line Plot and select Quadratic.

Regression: Minitab Example 3 (Continued)
Improving the model adequacy increased R-Sq from 89.7% to 95.0%. How do the residuals look?
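Selecting Quadratic instead of Linear adds an X² term; the model is still linear in its coefficients, so ordinary least squares applies. This sketch solves the normal equations with plain Gaussian elimination (adequate for small illustrations; statistical packages use more numerically robust methods) on made-up data where Y = X², and compares R-Sq for the two fits. The function names are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a @ v = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    v = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        v[r] = (m[r][n] - sum(m[r][c] * v[c] for c in range(r + 1, n))) / m[r][r]
    return v


def polyfit_r_sq(x, y, degree):
    """Least-squares polynomial fit via the normal equations; returns (coefs, R-Sq)."""
    p = degree + 1
    rows = [[xi ** j for j in range(p)] for xi in x]  # columns: 1, x, x^2, ...
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    coefs = solve(xtx, xty)
    fitted = [sum(c * v for c, v in zip(coefs, r)) for r in rows]
    mean_y = sum(y) / len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sst = sum((yi - mean_y) ** 2 for yi in y)
    return coefs, 1.0 - sse / sst


x = [0, 1, 2, 3, 4, 5, 6]
y = [v ** 2 for v in x]  # made-up, clearly curved data
_, r_sq_linear = polyfit_r_sq(x, y, 1)
_, r_sq_quad = polyfit_r_sq(x, y, 2)
print(round(r_sq_linear, 3), round(r_sq_quad, 3))  # 0.923 1.0
```

On this toy data the straight line already has a high R-Sq, yet the quadratic term captures the curvature completely, mirroring the direction of the 89.7% to 95.0% improvement on the slide.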

ABI Example 1: Correlation and Regression
Is there a relationship between Customer Delivery Performance and Forecast Accuracy? What is the regression equation? UKI Forecast Accuracy (FA). Gustavo Burger Belt Project – Zone WE

ABI Example 2: Correlation and Regression
Bud SBT: Pearson correlation = 0.819; p-value = 0.00. Legend: Volume is units sold; investment per case is how much money is paid to the POC. Project description: Volume vs. Investment per case. If the POC sells more, ABI is willing to pay more; however, other factors such as loyalty, image, and relationship also influence investment per case. Class questions: Is there a correlation between Volume and Investment per case for Bud SBT? Do we need to run a regression equation? Why or why not? What is the R-Sq(adj) figure telling you? Luke Zhou Belt Project - Zone APAC

ABI Example 3: Correlation and Regression – Spare Parts Inventory
Determine whether there is a correlation between the inventory value of spare parts and the volume packaged at each brewery. Regression Analysis: Gross Inv Val vs. Volume packaged. The regression equation is: Gross Inv Val = 3,531, ,979 Vol pack (MM bbl). [Minitab output: coefficient table (Predictor, Coef, SE Coef, T, P for Constant and Vol pack (MM bbl)); S; R-Sq = 66.7%; R-Sq(adj) = 63.3%; Analysis of Variance table (Source, DF, SS, MS, F, P for Regression, Residual Error, and Total).] Class questions: What is the regression equation? What is the R-Sq(adj) figure telling you? Is there a correlation between the inventory value and the volume packaged at each brewery? Null hypothesis: there is no correlation. Alternative hypothesis: there is a correlation. The p-value was low, and if p is low, reject the null. Conclusion: there is a correlation. Katie Shiro Belt Project, Zone NA

Multiple Linear Regression – Exercise 1 (Continued)
Our goal is to fit a multiple regression of the form Y = b0 + b1X1 + b2X2 + … + bkXk. This example illustrates two additional aspects of multiple regression: (1) elimination of X variables that have no explanatory power and (2) residual analysis.

Multiple Factor Correlation and Regression
Data on water usage has been collected along with data on factors that may be used to predict water usage. The factors were average temperature, production volume, number of associates, number of days of plant operation, and number of visitors. Data is in Water Usage.mtw

Multiple Factor Regression
Stat > Regression > General Regression. We recommend that you always turn this option on.

Visitors are not significant and should be removed from the model.
[Session Window output: the regression equation for Water Usage in terms of Average Temp, Production, Operating Days, Associates, and Visitors, and a coefficients table with Term, Coef, SE Coef, T, P, and VIF for each term.]

Reduced Model Regression Equation
[Session Window output: the regression equation for Water Usage in terms of Average Temp, Production, Operating Days, and Associates, and a coefficients table with Term, Coef, SE Coef, T, P, and VIF.] The variance inflation factor (VIF) checks for factors that are collinear. Collinear factors may cause invalid models (unstable coefficient estimates) and should be avoided. Rule of thumb: VIFs < 8 are not a problem. If factors are highly correlated, try removing one from the model or using Partial Least Squares Regression.
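The VIF for a factor is 1/(1 − R²ⱼ), where R²ⱼ comes from regressing that factor on all the other factors in the model. The sketch below computes it exactly that way; the predictor columns are hypothetical (not Water Usage.mtw), chosen so that x2 is nearly twice x1, and the helper names are our own.

```python
def _solve(a, b):
    """Gaussian elimination with partial pivoting for the small system a @ v = b."""
    n = len(b)
    m = [list(row) + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    v = [0.0] * n
    for r in range(n - 1, -1, -1):
        v[r] = (m[r][n] - sum(m[r][c] * v[c] for c in range(r + 1, n))) / m[r][r]
    return v


def r_squared(y, predictors):
    """R-Sq from regressing y on the predictor columns (an intercept is always included)."""
    rows = [[1.0] + [col[i] for col in predictors] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    coefs = _solve(xtx, xty)
    fitted = [sum(c * v for c, v in zip(coefs, r)) for r in rows]
    mean_y = sum(y) / len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sst = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - sse / sst


def vif(columns):
    """VIF_j = 1 / (1 - R-Sq_j), where R-Sq_j regresses column j on all other columns."""
    return [1.0 / (1.0 - r_squared(col, columns[:j] + columns[j + 1:]))
            for j, col in enumerate(columns)]


# Hypothetical predictor columns; x2 is roughly 2 * x1
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]
x3 = [3, 1, 4, 1, 5, 9]
print([round(v, 1) for v in vif([x1, x2, x3])])
```

The near-duplicate pair x1 and x2 shows VIFs far above the slide's threshold of 8, while x3 comes out much smaller; removing either x1 or x2 would resolve the collinearity.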

The Rest of the Session Window
Summary of Model: S is the standard deviation of the error term; R-Sq = 76.10%; R-Sq(adj) = 68.14%; R-Sq(pred) = 58.65% (computed from PRESS), which indicates how well the model is expected to predict new observations. [Analysis of Variance table: Source, DF, Seq SS, Adj SS, Adj MS, F, and P for Regression (Average Temp, Production, Operating Days, Associates), Error, and Total.]
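R-Sq(adj) penalizes R-Sq for the number of terms, and R-Sq(pred) = 1 − PRESS/SST, where PRESS sums the squared errors made when each observation is predicted by a model fitted without it. The sketch below computes PRESS by literally refitting, and uses a one-predictor line to stay short (an assumption for illustration; the water usage model has four predictors, but the definition is the same). Function names are hypothetical.

```python
def adjusted_r_sq(sse, sst, n, k):
    """R-Sq(adj) = 1 - [SSE/(n-k-1)] / [SST/(n-1)] for k predictors plus an intercept."""
    return 1.0 - (sse / (n - k - 1)) / (sst / (n - 1))


def press_simple(x, y):
    """PRESS for a one-predictor line: refit without each point, then predict that point."""
    press = 0.0
    for i in range(len(x)):
        xs = x[:i] + x[i + 1:]
        ys = y[:i] + y[i + 1:]
        m = len(xs)
        mx, my = sum(xs) / m, sum(ys) / m
        b1 = (sum((a - mx) * (b - my) for a, b in zip(xs, ys))
              / sum((a - mx) ** 2 for a in xs))
        b0 = my - b1 * mx
        press += (y[i] - (b0 + b1 * x[i])) ** 2  # squared leave-one-out prediction error
    return press


def predicted_r_sq(x, y):
    """R-Sq(pred) = 1 - PRESS / SST."""
    mean_y = sum(y) / len(y)
    sst = sum((v - mean_y) ** 2 for v in y)
    return 1.0 - press_simple(x, y) / sst


x = [0, 1, 2, 3, 4, 5, 6]
y = [v ** 2 for v in x]  # made-up data
print(adjusted_r_sq(84, 1092, n=7, k=1), predicted_r_sq(x, y))
```

Because each left-out prediction error is at least as large as the ordinary residual, R-Sq(pred) is typically the lowest of the three figures, as in the water usage output.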

Residual Analysis
The residuals appear normally distributed with a mean of zero and constant variance. There is no reason to reject the model.

Let’s Use the Model to Predict Usage
You have been asked to predict the usage for a month with an average temperature of 68, production of 1400, 20 days of operation, and 175 associates. Press Ctrl+E to bring back the previous dialog box.

The Prediction
[Minitab output: Predicted Values for New Observations, with columns New Obs, Fit (the predicted value), SE Fit, CI, and PI.] However, because of the low R-Sq(pred), the prediction intervals are very wide.
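Why the PI is wider than the CI: the CI covers uncertainty in the fitted mean only, while the PI adds the scatter S of a single new observation. For one predictor, SE Fit = S·√(1/n + (x0 − x̄)²/Sxx), CI = fit ± t·SE Fit, and PI = fit ± t·√(S² + SE Fit²). The sketch below assumes a single predictor (the water usage model's SE Fit uses the matrix form of the same idea) and assumes you supply the two-sided t critical value with n − 2 degrees of freedom, since the standard library has no t quantile function. The function name and data are hypothetical.

```python
import math


def ci_pi_half_widths(x, y, x0, t_crit):
    """Fit at x0 plus half-widths of the CI (mean response) and PI (new observation).

    t_crit: two-sided t critical value with n-2 degrees of freedom, looked up separately
    (for example, about 2.571 for 95% with 5 degrees of freedom).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    b0 = mean_y - b1 * mean_x
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))                                # std. dev. of the error term
    se_fit = s * math.sqrt(1.0 / n + (x0 - mean_x) ** 2 / sxx)  # "SE Fit"
    ci = t_crit * se_fit
    pi = t_crit * math.sqrt(s ** 2 + se_fit ** 2)               # adds scatter of one new point
    return b0 + b1 * x0, ci, pi


x = [0, 1, 2, 3, 4, 5, 6]
y = [v ** 2 for v in x]  # made-up data
print(ci_pi_half_widths(x, y, x0=3, t_crit=2.571))
```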

Multiple Regression: ABI Example 1 – Brand Health
Is there a relationship between price and market share? This output is from Excel. What is the significance telling us? How do you interpret the R-Sq? Interpretation: a 10% increase in price will decrease the share by 17.6%. Pedro Lozada Belt Project – Zone GHQ

ABI Example 2 – UK CDP Performance (Multiple Regression Analysis)
GLY considers all changeover times that happen during the filling process on the lines; this can be a very good indication of a potential source of variation. To assess line performance without the numerous changeover times, we need to look at LEF (line efficiency). What is the prediction model between Customer Delivery Performance in the UK and line efficiency (LEF)? Gustavo Burger Belt Project – Zone WE

ABI Example 3 – Multiple Regression
Price Change vs. Ad Feature Use multi-variable regression to separate the impact of a price decrease vs. placing the product in the ad feature. Mike Zacharias Belt Project – Zone NA Source: NC Food Lion Natural Light 24pks

ABI Example 3 (Continued)
Practically, what does this mean? What is the regression equation? From the regression equation: a \$1 price decrease is worth 1.8 share points, and an ad feature is worth 6.0 share points. Mike Zacharias Belt Project – Zone NA

Logistic Regression
Logistic regression is a variation of ordinary regression that is used when the dependent (response) variable is dichotomous (it takes only two values, which usually represent the occurrence or non-occurrence of some outcome event, usually coded as 0 or 1) and the independent (input) variables are continuous, categorical, or both.

Testing Method Selection Matrix
| Variable type | Attribute Y | Count Y | Continuous Y |
|---|---|---|---|
| Discrete X, 1 or 2 treatments | Proportions | Poisson | t-tests |
| Discrete X, 3+ treatments | Chi-square | Chi-square | ANOVA |
| Continuous X | Logistic regression | | Least squares regression |

Logistic Regression: $\ln\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 x$
Logistic regression evaluates the occurrence of the event in terms of its probability. If the event happens (success), its probability is $p$; the probability of the event not happening is $1-p$. The odds of success relative to failure are the ratio $p/(1-p)$. The logistic regression model is fitted to the natural logarithm of the odds, $\ln\{p/(1-p)\}$. The statistical model for logistic regression is

$$\ln\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 x,$$

where $p$ is a binomial proportion and $x$ is the input factor. The parameters of the logistic model are $\beta_0$ and $\beta_1$.
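The logit and its inverse can be checked numerically. The sketch below uses hypothetical coefficients β0 = −4 and β1 = 0.8 (not from any dataset in this module) to trace the S-shaped probability curve, and shows that exp(β1) is the factor by which the odds are multiplied for a one-unit increase in x, the quantity reported in the Odds Ratio column.

```python
import math


def logistic(z):
    """Inverse of the logit: maps log-odds z = b0 + b1*x to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def logit(p):
    """Natural log of the odds, ln(p / (1 - p))."""
    return math.log(p / (1.0 - p))


# Hypothetical coefficients, for illustration only
b0, b1 = -4.0, 0.8
for x in [0, 2.5, 5, 7.5, 10]:
    print(x, round(logistic(b0 + b1 * x), 3))  # 0.018, 0.119, 0.5, 0.881, 0.982: an S-curve
# Odds ratio for a one-unit increase in x
print(round(math.exp(b1), 3))                  # 2.226
```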

The Logistic Function
The logistic function starts very close to 0, rises rapidly through the middle of its range (it is steepest at a probability of 0.5), and then asymptotically approaches 1. [Figure: probability of event versus X. Datafile/EXHREG.XLS]

An Example
A cereal company wants to determine the factors that increase the probability that a consumer will purchase its product. Data were collected on 71 consumers: whether they had seen an advertisement, whether they have children, their income, and whether they purchased the cereal. The data are in Logistic Regression Cereal Ad.mtw.

Set Up the Analysis
Stat > Regression > Binary Logistic Regression
Discrete factors that are included in the model are entered in the Factors box.

Options and Graphs

Logistic Regression Output
[Minitab output: Response Information (Variable Bought; Value Yes (Event), No; Count; Total) and the Logistic Regression Table (Predictor, Coef, SE Coef, Z, P, Odds Ratio, and CI Lower/Upper for Constant, Income, Children Yes, and ViewAd Yes), followed by the Log-Likelihood.] Test that all slopes are zero: G statistic with DF = 3, P-Value = 0.000. For each predictor, the null hypothesis is that the factor has no effect on the event probability. All three factors are statistically significant.
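Coefficients in binary logistic regression are estimated by maximum likelihood, and the overall test uses G = 2 × (log-likelihood of the fitted model − log-likelihood of the intercept-only model), compared with a chi-square distribution whose DF equals the number of slopes (3 on the slide). The sketch below fits a one-predictor model by Newton-Raphson on made-up data; it illustrates the idea and is not a reproduction of Minitab's algorithm or of the cereal data. All names are hypothetical.

```python
import math


def fit_logistic(x, y, iters=25):
    """Maximum-likelihood fit of ln(p/(1-p)) = b0 + b1*x by Newton-Raphson."""
    b0 = b1 = 0.0
    for _ in range(iters):
        p = [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]
        # Gradient (score) of the log-likelihood
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum(xi * (yi - pi) for xi, yi, pi in zip(x, y, p))
        # Information matrix [[a, b], [b, c]] with weights w = p(1 - p)
        w = [pi * (1.0 - pi) for pi in p]
        a = sum(w)
        b = sum(wi * xi for wi, xi in zip(w, x))
        c = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = a * c - b * b
        # Newton step: solve [[a, b], [b, c]] @ step = [g0, g1]
        b0 += (c * g0 - b * g1) / det
        b1 += (a * g1 - b * g0) / det
    return b0, b1


def log_likelihood(x, y, b0, b1):
    total = 0.0
    for xi, yi in zip(x, y):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
        total += math.log(p) if yi == 1 else math.log(1.0 - p)
    return total


def g_statistic(x, y, b0, b1):
    """G for 'all slopes are zero': 2 * (LL fitted model - LL intercept-only model)."""
    p_bar = sum(y) / len(y)  # the intercept-only model predicts the overall event rate
    ll_null = sum(math.log(p_bar) if yi == 1 else math.log(1.0 - p_bar) for yi in y)
    return 2.0 * (log_likelihood(x, y, b0, b1) - ll_null)


# Made-up data: y = 1 if the event occurred
x = list(range(1, 11))
y = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]
b0, b1 = fit_logistic(x, y)
print("b0 =", round(b0, 3), "b1 =", round(b1, 3))
print("odds ratio =", round(math.exp(b1), 3))
print("G =", round(g_statistic(x, y, b0, b1), 3), "with DF = 1")
```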

Model Integrity
[Minitab output: Goodness-of-Fit Tests (Pearson, Deviance, and Hosmer-Lemeshow, each with Chi-Square, DF, and P).] The null hypothesis for goodness of fit is that the model fits. Here we do not reject the null hypothesis: there is no evidence of lack of fit, so we conclude the model fits. [Minitab output: Measures of Association between the response variable and predicted probabilities (concordant, discordant, and tied pairs, with number and percent; summary measures Somers' D, Goodman-Kruskal Gamma = 0.77, and Kendall's Tau-a).]
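The Measures of Association compare every pair made of one event and one non-event: the pair is concordant if the event has the higher predicted probability, discordant if it has the lower one, and tied if they are equal. The sketch below follows the definitions we understand Minitab to use (Somers' D over all such pairs, Goodman-Kruskal Gamma over concordant plus discordant pairs, Kendall's Tau-a over all n(n − 1)/2 pairs of observations); the probabilities in the example are made up and the function name is hypothetical.

```python
from itertools import product


def association_measures(y, p):
    """Somers' D, Goodman-Kruskal Gamma and Kendall's Tau-a from predicted event probabilities."""
    events = [pi for yi, pi in zip(y, p) if yi == 1]
    non_events = [pi for yi, pi in zip(y, p) if yi == 0]
    conc = disc = ties = 0
    for pe, pn in product(events, non_events):  # every (event, non-event) pair
        if pe > pn:
            conc += 1
        elif pe < pn:
            disc += 1
        else:
            ties += 1
    pairs = conc + disc + ties
    n = len(y)
    return {
        "Somers' D": (conc - disc) / pairs,
        "Goodman-Kruskal Gamma": (conc - disc) / (conc + disc),
        "Kendall's Tau-a": (conc - disc) / (n * (n - 1) / 2),
    }


# Made-up outcomes and predicted probabilities: 3 concordant pairs, 1 discordant
print(association_measures([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]))
```

Values closer to 1 mean the model more consistently assigns higher probabilities to the observations where the event actually occurred.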

The Chi-Square vs. Probability Graph
Right-click on the graph and brush the outliers. Note them in the data sheet.

Prepare a Graph of the Results
Press Ctrl+E to bring back the previous dialog box.

Storing the Data

Preparing the Graph
Graph > Scatterplot

Presenting the Results

Exercise – Your Turn
Data were collected on the outcome of emergency room admissions. A hospital administrator would like help determining whether any of the collected factors could be used to predict the probability of dying in the hospital. The data are in Datafile/Emergency.MTW. A definition of the terms is given in Datafile/EmergencyFileTerms.DOC. Allow 25 minutes for this exercise.

What Have We Covered?
We learned and applied key tools to analyze your data: how to develop and interpret the correlation between variables, and how to develop a mathematical model expressing the relationship (regression), including simple linear regression, multiple linear regression, and logistic regression.

In the Next Module . . .
We will learn how to determine the proper sample size and the power of the test. We will use Minitab to determine sample size, delta, and power.

Supplemental Material

Exercise Solution – Emergency Room

Exercise Solution – Emergency Room (Continued)
[Minitab output: Logistic Regression Table (Predictor, Coef, SE Coef, Z, P, Odds Ratio, CI Lower/Upper) for Constant, Age, Sex, Race, Ser, Can, PRE, and TYP.] Age, TYP, and Can are significant.

Exercise Solution – Emergency Room (Continued)

Exercise Solution – Emergency Room (Continued)
[Minitab output: Logistic Regression Table (Predictor, Coef, SE Coef, Z, P, Odds Ratio, CI Lower/Upper) for Constant, Age, Can, and TYP.] Even though the p-value for Can is slightly over 0.05, let's keep it in the model.

Exercise Solution – Emergency Room (Continued)

Exercise Solution – Emergency Room (Continued)