Hypothesis Testing. Why do we need it? – simply, we are looking for something – a statistical measure - that will allow us to conclude there is truly.


1 Hypothesis Testing

2 Why do we need it?
Simply, we are looking for a statistical measure that will allow us to conclude there is truly a difference between the data from two samples. Mathematically, we infer this with some degree of confidence in our decision.

What does it prove?
It helps us determine whether observed differences are:
- statistically significant, or
- due to chance (random or common cause variation)

3 Test a Hypothesis (Ho and Ha)
Null Hypothesis (Ho)
- comes from the word "nullify" (to negate)
- associated with the distribution of chance events
- typically, the null hypothesis is: "the 2 samples are the same, except for variation caused by chance"

Alternate Hypothesis (Ha)
- used as an alternative to the null hypothesis
- covers those hypotheses that identify a distribution of events that is not a chance distribution
- typically, the alternative hypothesis is: "the 2 samples are fundamentally different"

4 Null Hypothesis and Risk
Alpha Risk: finding a difference when one doesn't really exist [A FALSE REJECT]
- the probability of rejecting Ho when Ho is actually true
- usually set at 5% or less
- e.g., the jury returned a GUILTY verdict when the person was really INNOCENT
- e.g., rejecting a good part on the assembly line (aka producer's risk)

Beta Risk: NOT finding a difference when there is one [A FALSE ACCEPT]
- the probability of failing to reject Ho when Ho is actually false
- usually set at 10% or less
- e.g., the taxi driver thought the corner was safe when it was dangerous
- e.g., accepting a defective part from the assembly line (aka consumer's risk)

NOTE: Statistically, the p-value is the probability of seeing a difference at least as large as the one observed by "chance only", that is, when Ho is true (no difference). When Ho is true, a "high" p-value (> .05) is the usual result.
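The two risks can be estimated by simulation. The following is a minimal sketch, not part of the original slides. It assumes normally distributed data with a known standard deviation so that a simple z-test works with only the Python standard library. The function names (`z_test_p_value`, `reject_rate`), the sample size, and the shift of 0.5 are illustrative assumptions.

```python
import random
from statistics import NormalDist, mean


def z_test_p_value(sample_a, sample_b, sigma):
    """Two-sided p-value for a difference in means (sigma assumed known)."""
    standard_error = sigma * (1 / len(sample_a) + 1 / len(sample_b)) ** 0.5
    z = (mean(sample_a) - mean(sample_b)) / standard_error
    return 2 * (1 - NormalDist().cdf(abs(z)))


def reject_rate(shift, trials=2000, n=30, alpha=0.05, seed=1):
    """Fraction of simulated experiments in which Ho is rejected.

    shift is the true difference between the two process means.
    """
    rng = random.Random(seed)
    rejects = 0
    for _ in range(trials):
        a = [rng.gauss(0.0, 1.0) for _ in range(n)]
        b = [rng.gauss(shift, 1.0) for _ in range(n)]
        if z_test_p_value(a, b, sigma=1.0) < alpha:
            rejects += 1
    return rejects / trials


# Ho is true (no shift): every rejection is a false reject (alpha risk).
alpha_risk = reject_rate(shift=0.0)
# Ho is false (real shift): every non-rejection is a false accept (beta risk).
beta_risk = 1 - reject_rate(shift=0.5)
print(f"estimated alpha risk: {alpha_risk:.3f}")
print(f"estimated beta risk:  {beta_risk:.3f}")
```

With no true shift, the rejection rate should land near the chosen 5%. The beta estimate is not fixed in advance: it depends on the size of the true shift and on the sample size.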

5 Alpha Beta Risk (GENERAL RULES)
Hypothesis testing tests the "NULL" hypothesis [Ho = NO difference] against an alternative hypothesis [Ha = groups (data) are different].

- If p-value < .05: reject Ho and conclude Ha. The groups are (truly) different.
- If p-value > .05: cannot reject Ho. There is not enough evidence to conclude the groups are different. This does not prove that there is NO difference.

Why use it?
- To detect differences that may be important to the business.
- To judge whether a minor difference in averages is due to random variation or reflects a true difference.
- To see the impact of our intervention.
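One way to see what "probability of occurrence by chance only" means is a permutation test. This is a different technique from the t-tests named in the slides, used here only because it needs nothing beyond the Python standard library. The sketch below shuffles the group labels many times and counts how often chance alone produces a difference at least as large as the observed one. The data values and the names `permutation_p_value` and `decide` are made up for illustration.

```python
import random
from statistics import mean


def permutation_p_value(group_a, group_b, shuffles=2000, seed=0):
    """Estimate a two-sided p-value for a difference in means by shuffling labels."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_large = 0
    for _ in range(shuffles):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            at_least_as_large += 1
    # Add 1 to both counts so the estimate is never exactly zero.
    return (at_least_as_large + 1) / (shuffles + 1)


def decide(p_value, alpha=0.05):
    """Apply the general rule: reject Ho only when p is below alpha."""
    return "reject Ho" if p_value < alpha else "cannot reject Ho"


# Hypothetical cycle times before and after an intervention.
before = [10.1, 9.8, 10.3, 10.0, 9.9, 10.2]
after = [11.0, 11.2, 10.9, 11.1, 11.3, 10.8]
p = permutation_p_value(before, after)
print(p, decide(p))
```

Note that `decide` returns "cannot reject Ho" rather than "accept Ho", which mirrors the wording of the rule.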

6 Statistical Difference vs. Practical Importance
You Decide …
- If there are large amounts of data, or the variation within the data is very small, hypothesis tests can detect very small differences between samples.
- While the samples are statistically different, the differences may not mean much in the PRACTICAL world.

DOES IT MAKE BUSINESS SENSE? DOES IT PASS THE COMMON SENSE TEST?
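The effect of sample size can be illustrated with a small calculation. This sketch is not from the slides: it assumes two groups of equal size with a known, common standard deviation so that a z-test applies, and the means, sigma, and sample sizes are invented for the example.

```python
from statistics import NormalDist


def two_sample_z_p_value(mean_a, mean_b, sigma, n):
    """Two-sided p-value for two groups of size n with known sigma."""
    standard_error = sigma * (2 / n) ** 0.5
    z = (mean_a - mean_b) / standard_error
    return 2 * (1 - NormalDist().cdf(abs(z)))


# The same tiny difference in averages (0.1 units), tested at two sample sizes.
p_small_sample = two_sample_z_p_value(50.0, 50.1, sigma=2.0, n=50)
p_large_sample = two_sample_z_p_value(50.0, 50.1, sigma=2.0, n=50_000)

print(f"n = 50:     p = {p_small_sample:.3f}")
print(f"n = 50,000: p = {p_large_sample:.3g}")
```

The difference of 0.1 is identical in both cases. Only the amount of data changes, yet the large sample makes it statistically significant. Whether 0.1 matters is a business question the p-value cannot answer.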

7 What are the Data Assumptions?
If data is continuous, we assume the underlying distribution is Normal. You may need to transform non-Normal data (e.g., cycle times).

When comparing groups from different populations we assume:
- independent samples, achieved through random sampling
- samples are representative (unbiased) of the population

When comparing groups from different processes we assume:
- each process is stable
- there are no special causes or shifts over time
- samples are representative of the process (unbiased)

Also note: Pre-test and post-test measurements would violate independence, because knowing one makes it possible to predict the other. Any repeated measures of the same individuals would also violate independence.

8 Two Sample T Test
"Are the means of these two normally distributed groups really different from each other?" If the P value is .05 or less, it is usually accepted that the groups are different.

One Way ANOVA
Similar to the Two Sample T test, except that it can handle more than two groups. Again, the groups must be normal. We also have the added requirement that the variances (and the standard deviations) of all the groups are approximately equal. A P value of .05 or less indicates that the mean of at least one group is different from the rest.

Mann-Whitney
Similar to the Two Sample T test, and somewhat less powerful when the data really are normal, but it does not require normally distributed data.

Homogeneity of Variance
"Are the variances (and hence the standard deviations) of these groups of data equal?" Often used preparatory to ANOVA. If the P value of Levene's test is .05 or less, the variances are assumed to be unequal.

Normality Test
"Is this data normally distributed?" If the P value of the Anderson-Darling test is .05 or less, the data is presumed to be not normal. For small groups of data, the "fat pencil test" is more meaningful.
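The link between the Two Sample T test and One Way ANOVA can be checked numerically: with exactly two groups and equal variances assumed, the ANOVA F statistic equals the square of the pooled t statistic. The sketch below computes only the test statistics with the Python standard library. Turning them into P values needs a t or F distribution, which a package such as Minitab supplies. The function names and data are illustrative assumptions.

```python
from statistics import mean, variance


def pooled_t_statistic(a, b):
    """Two-sample t statistic, assuming equal variances in both groups."""
    n_a, n_b = len(a), len(b)
    pooled_var = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / (n_a + n_b - 2)
    return (mean(a) - mean(b)) / (pooled_var * (1 / n_a + 1 / n_b)) ** 0.5


def anova_f_statistic(*groups):
    """One-way ANOVA F statistic for two or more groups."""
    all_values = [x for group in groups for x in group]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum((len(g) - 1) * variance(g) for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical measurements from two machines.
machine_1 = [5.1, 4.9, 5.3, 5.0, 5.2]
machine_2 = [5.6, 5.8, 5.5, 5.9, 5.7]

t = pooled_t_statistic(machine_1, machine_2)
f = anova_f_statistic(machine_1, machine_2)
print(t, f, t ** 2)
```

A third group can be passed to `anova_f_statistic` as an extra argument. The t statistic has no such extension, which is why ANOVA is used for more than two groups.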

9 Multi-Vari Study
A passive examination of the process as it runs in its normal state. By noting the state of key input variables, and the simultaneous state of output variables, useful correlations can often be found. Sometimes a Multi-Vari study will reveal the sources of problems. In other cases, the outputs of a Multi-Vari study become the inputs to a designed experiment. Outputs are often shown as Main Effects Plots and/or Boxplots.

Chi Square Test
Used with count data, arranged in a matrix of rows and columns. For example, TREATED and UNTREATED columns, and LIVED and DIED rows, in a 2x2 matrix. The counts entered into each cell are the number of people in each category. The P value of the Chi Square test indicates whether or not the rows and columns are statistically independent, i.e., does "treatment" or the lack of it influence survival?

Regression
Regression is used with interval/ratio/variable inputs and outputs. It answers the questions, "Are the inputs and outputs linearly correlated?" and "If they are linearly correlated, what is the formula that connects them?" One output is an equation of the form Y = mX + b, where Y is the output variable, m is the slope of the line, X is the input variable, and b is a constant (the Y intercept). Another output is an R² value. An R² of 86% says that 86% of the observed variation is explained by the straight line model, and 14% is not. Regression with more than one input variable is called Multiple Linear Regression.
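Both the Chi Square test on a 2x2 table and simple regression are small enough to compute by hand. The sketch below is an illustration using only the Python standard library, with invented counts and data points. The chi-square calculation applies no continuity correction, and its P value formula is valid only for a 2x2 table (one degree of freedom). The function names are assumptions, not Minitab terms.

```python
import math
from statistics import mean


def chi_square_2x2(table):
    """Chi-square statistic and p-value for a 2x2 table of counts."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if rows and columns are independent.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    # Upper tail of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


def fit_line(xs, ys):
    """Least-squares slope m, intercept b, and R-squared for Y = mX + b."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    m = sxy / sxx
    b = y_bar - m * x_bar
    ss_residual = sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))
    ss_total = sum((y - y_bar) ** 2 for y in ys)
    return m, b, 1 - ss_residual / ss_total


# Hypothetical counts: columns are TREATED, UNTREATED; rows are LIVED, DIED.
chi2, p = chi_square_2x2([[40, 20], [10, 30]])
print(f"chi-square = {chi2:.2f}, p = {p:.2g}")

# Hypothetical input/output pairs.
m, b, r_squared = fit_line([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1])
print(f"Y = {m:.2f}X + {b:.2f}, R-squared = {r_squared:.1%}")
```

A small P value from `chi_square_2x2` suggests the rows and columns are not independent. An R-squared near 100% from `fit_line` means the straight line explains nearly all of the observed variation.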

10 Hypothesis Testing "Roadmap"
For all tests: p > 0.05: Fail to Reject Ho (Null); p < 0.05: Reject Ho

Attribute Data
  Contingency Table
    Ho: Two Factors are INDEPENDENT
    Ha: Two Factors are DEPENDENT
    Minitab: Stat > Tables > Chi-Square Test

Continuous Data
  Normality Test
    Ho: Data is normal
    Ha: Data is NOT normal
    Minitab: Stat > Basic Stat > Normality Test (Use Anderson-Darling)

  Non-Normal Data (M = median)
    One Sample
      Ho: M1 = M target
      Ha: M1 ≠ M target
      Minitab: Stat > Nonparametric > 1 sample sign (OR) Stat > Nonparametric > 1 sample - Wilcoxon
    Two or More Samples
      Levene's Test
        Ho: σ1 = σ2 = σ3 …
        Ha: At least one is different
        Minitab: Stat > ANOVA > Homog of Variance
      Ho: M1 = M2 = M3 …
      Ha: At least one is different
      Minitab: Stat > Nonparametric > Mann Whitney (OR) Stat > Nonparametric > Kruskal Wallis (OR) Stat > Nonparametric > Moods Median (OR) Stat > Nonparametric > Friedmans

  Normal Data
    One Sample
      One Sample T-Test
        Ho: μ1 = μ target
        Ha: μ1 ≠ μ target
        Minitab: Stat > Basic Stat > 1 Sample T
    Two or More Samples
      Bartlett's Test
        Ho: σ1 = σ2 = σ3 …
        Ha: At least one is different
        Minitab: Stat > ANOVA > Homog of Variance
      Two Samples
        Two-Sample T-Test (Variances Equal)
          Ho: μ1 = μ2
          Ha: μ1 ≠ μ2
          Minitab: Stat > Basic Stat > 2 Sample T (Check Box for Equal Variance)
        Two-Sample T-Test (Variances Not Equal)
          Ho: μ1 = μ2
          Ha: μ1 ≠ μ2
          Minitab: Stat > Basic Stat > 2 Sample T (Check Box for Unequal Variance)
      Equal Variance (Three or more samples)
        One Way ANOVA
          Ho: μ1 = μ2 = μ3 …
          Ha: At least one is different
          Minitab: Stat > ANOVA > One-Way
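The roadmap can also be read as a small decision function. The sketch below is one possible reading of the slide, not an official procedure. The function name `choose_test` and its parameters are hypothetical, and the branch for three or more normal samples with unequal variances is an assumption, because the roadmap lists no test for that case.

```python
def choose_test(data_type, n_samples, is_normal=True, equal_variances=True):
    """Return the test the roadmap suggests for a given situation.

    data_type is "attribute" (counts) or "continuous" (measurements).
    is_normal and equal_variances are the outcomes of the normality
    and homogeneity-of-variance tests run beforehand.
    """
    if data_type == "attribute":
        return "Chi-Square Test (contingency table)"
    if data_type != "continuous":
        raise ValueError("data_type must be 'attribute' or 'continuous'")
    if n_samples < 1:
        raise ValueError("n_samples must be at least 1")

    if not is_normal:
        if n_samples == 1:
            return "1-Sample Sign or 1-Sample Wilcoxon"
        return "Mann-Whitney, Kruskal-Wallis, Mood's Median, or Friedman"

    if n_samples == 1:
        return "One Sample T-Test"
    if n_samples == 2:
        if equal_variances:
            return "Two-Sample T-Test (Variances Equal)"
        return "Two-Sample T-Test (Variances Not Equal)"
    if equal_variances:
        return "One Way ANOVA"
    # Assumption: the roadmap gives no entry for this combination.
    return "not covered by the roadmap (One Way ANOVA assumes equal variances)"


print(choose_test("continuous", n_samples=2, is_normal=True, equal_variances=False))
print(choose_test("continuous", n_samples=3, is_normal=False))
print(choose_test("attribute", n_samples=2))
```

Encoding the roadmap this way makes the order of the checks explicit: data type first, then normality, then the number of samples, then equality of variances.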

