Week 10 Nov 3-7 Two Mini-Lectures QMM 510 Fall 2014.


1 Week 10 Nov 3-7 Two Mini-Lectures QMM 510 Fall 2014

2 15-2 Chi-Square Tests (ML 10.1). Chapter 15 contents: 15.1 Chi-Square Test for Independence; 15.2 Chi-Square Tests for Goodness-of-Fit; 15.3 Uniform Goodness-of-Fit Test; 15.4 Poisson Goodness-of-Fit Test; 15.5 Normal Chi-Square Goodness-of-Fit Test; 15.6 ECDF Tests (Optional). So many topics, so little time …

3 15-3 Chi-Square Test for Independence: Contingency Tables. A contingency table is a cross-tabulation of n paired observations into categories. Each cell shows the count of observations that fall into the category defined by its row (r) and column (c) heading.

4 15-4 Chi-Square Test for Independence: Contingency Tables. For example: [contingency table figure]

5 15-5 Chi-Square Test for Independence: Chi-Square Test. In a test of independence for an r × c contingency table, the hypotheses are $H_0$: Variable A is independent of variable B; $H_1$: Variable A is not independent of variable B. Use the chi-square test for independence to test these hypotheses. This nonparametric test is based on frequencies. The n data pairs are classified into c columns and r rows, and then the observed frequency $f_{jk}$ is compared with the expected frequency $e_{jk}$.

6 15-6 Chi-Square Test for Independence: Chi-Square Distribution. The critical value comes from the chi-square probability distribution with d.f. degrees of freedom, where d.f. = (r – 1)(c – 1), r = number of rows in the table, and c = number of columns in the table. Appendix E contains critical values for right-tail areas of the chi-square distribution, or use Excel’s =CHISQ.INV.RT(α,d.f.). (Excel’s =CHISQ.DIST.RT(x,d.f.) goes the other way: it returns the right-tail area for a given value x.) The mean of a chi-square distribution is d.f. and its variance is 2 d.f.

7 15-7 Chi-Square Test for Independence: Chi-Square Distribution. Consider the shape of the chi-square distribution: [figure]

8 15-8 Chi-Square Test for Independence: Expected Frequencies. Assuming that $H_0$ is true, the expected frequency of row j and column k is $e_{jk} = R_j C_k / n$, where $R_j$ = total for row j (j = 1, 2, …, r), $C_k$ = total for column k (k = 1, 2, …, c), and n = sample size.
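
The expected-frequency formula can be sketched in a few lines of Python. This is only an illustration: the function name `expected_frequencies` and the 2 × 3 table of counts are made up here and do not come from the lecture.

```python
def expected_frequencies(observed):
    """Return e_jk = R_j * C_k / n for a table given as a list of rows."""
    row_totals = [sum(row) for row in observed]        # R_j
    col_totals = [sum(col) for col in zip(*observed)]  # C_k
    n = sum(row_totals)                                # sample size
    return [[r * c / n for c in col_totals] for r in row_totals]


# Made-up 2 x 3 table of counts, for illustration only.
observed = [[20, 30, 50],
            [30, 30, 40]]

expected = expected_frequencies(observed)
for row in expected:
    print(row)
```

Because both row totals are equal in this made-up table, every row gets the same expected counts, which is exactly what independence implies.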

9 15-9 Chi-Square Test for Independence: Steps in Testing the Hypotheses. Step 1: State the Hypotheses. $H_0$: Variable A is independent of variable B; $H_1$: Variable A is not independent of variable B. Step 2: Specify the Decision Rule. Calculate d.f. = (r – 1)(c – 1). For a given α, look up the right-tail critical value ($\chi^2_R$) from Appendix E or by using Excel =CHISQ.INV.RT(α,d.f.). Reject $H_0$ if the test statistic exceeds $\chi^2_R$.

10 15-10 Chi-Square Test for Independence: Steps in Testing the Hypotheses. For example, for d.f. = 6 and α = .05, $\chi^2_{.05}$ = 12.59.
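
For readers without Excel or Appendix E, the critical value quoted above can be reproduced with the Python standard library alone. The sketch below is one possible approach, not the lecture's method: it evaluates the right-tail area with a series for the incomplete gamma function and then finds the critical value by bisection. The function names `chi2_right_tail` and `chi2_critical` are hypothetical.

```python
import math


def chi2_right_tail(x, df):
    """Right-tail area P(X > x) for a chi-square variable with df degrees of freedom."""
    if x <= 0:
        return 1.0
    a, half_x = df / 2.0, x / 2.0
    # Series for the regularized lower incomplete gamma function P(a, x/2).
    term = 1.0 / a
    total = term
    n = 0
    while abs(term) > 1e-15 * abs(total):
        n += 1
        term *= half_x / (a + n)
        total += term
    lower = total * math.exp(-half_x + a * math.log(half_x) - math.lgamma(a))
    return 1.0 - lower


def chi2_critical(alpha, df):
    """Value whose right-tail area equals alpha, found by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_right_tail(hi, df) > alpha:  # expand until the tail is small enough
        hi *= 2.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if chi2_right_tail(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


print(round(chi2_critical(0.05, 6), 2))  # 12.59, matching the slide
```

The same `chi2_right_tail` function gives a p-value when it is called with a calculated test statistic instead of a trial critical value.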

11 15-11 Chi-Square Test for Independence: Steps in Testing the Hypotheses. Here is the rejection region. [figure]

12 15-12 Chi-Square Test for Independence: Steps in Testing the Hypotheses. Step 3: Calculate the Expected Frequencies. $e_{jk} = R_j C_k / n$. For example, [figure]

13 15-13 Chi-Square Test for Independence: Steps in Testing the Hypotheses. Step 4: Calculate the Test Statistic. The chi-square test statistic is $\chi^2_{calc} = \sum_{j=1}^{r} \sum_{k=1}^{c} (f_{jk} - e_{jk})^2 / e_{jk}$. Step 5: Make the Decision. Reject $H_0$ if the test statistic $\chi^2_{calc} > \chi^2_R$ or if the p-value ≤ α.
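
Steps 3 to 5 can be sketched end to end in Python. The 3 × 4 table below is invented for illustration (it gives d.f. = 6, so the critical value 12.59 quoted earlier for α = .05 applies), and `chi_square_statistic` is a hypothetical helper name.

```python
def chi_square_statistic(observed):
    """Return (chi-square statistic, degrees of freedom) for a table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    stat = 0.0
    for j, row in enumerate(observed):
        for k, f_jk in enumerate(row):
            e_jk = row_totals[j] * col_totals[k] / n  # expected frequency
            stat += (f_jk - e_jk) ** 2 / e_jk
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return stat, df


# Made-up 3 x 4 table of counts, for illustration only.
observed = [[10, 20, 30, 40],
            [20, 20, 30, 30],
            [30, 20, 30, 20]]

stat, df = chi_square_statistic(observed)
critical = 12.59  # right-tail critical value for d.f. = 6, alpha = .05
print(f"chi-square = {stat:.2f}, d.f. = {df}")
print("Reject H0" if stat > critical else "Do not reject H0")
```

For this made-up table the statistic is about 16.67, which exceeds 12.59, so independence would be rejected at α = .05.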

14 15-14 Chi-Square Test for Independence: Example: MegaStat. The p-value = 0.2154 is not small enough to reject the hypothesis of independence at α = .05. All cells have $e_{jk} \geq 5$, so Cochran’s Rule is met. Caution: Don’t highlight row or column totals.

15 15-15 Chi-Square Test for Independence: Test of Two Proportions. For a 2 × 2 contingency table, the chi-square test is equivalent to a two-tailed z test for two proportions. The hypotheses are $H_0: \pi_1 = \pi_2$ versus $H_1: \pi_1 \neq \pi_2$. (Figure 14.6)
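
The equivalence can be checked numerically: for a 2 × 2 table, the chi-square statistic equals the square of the pooled two-proportion z statistic. The sketch below uses made-up counts (30 of 100 versus 45 of 100) and hypothetical function names.

```python
import math


def two_proportion_z(x1, n1, x2, n2):
    """z statistic for H0: pi1 = pi2, using the pooled proportion."""
    p1, p2 = x1 / n1, x2 / n2
    p_pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se


def chi_square_2x2(x1, n1, x2, n2):
    """Chi-square statistic for the 2 x 2 table of successes and failures."""
    observed = [[x1, n1 - x1], [x2, n2 - x2]]
    row_totals = [n1, n2]
    col_totals = [x1 + x2, (n1 - x1) + (n2 - x2)]
    n = n1 + n2
    return sum((observed[j][k] - row_totals[j] * col_totals[k] / n) ** 2
               / (row_totals[j] * col_totals[k] / n)
               for j in range(2) for k in range(2))


# Made-up counts, for illustration only.
z = two_proportion_z(30, 100, 45, 100)
chi2 = chi_square_2x2(30, 100, 45, 100)
print(f"z^2 = {z ** 2:.4f}, chi-square = {chi2:.4f}")
```

Both values come out to 4.8 for these counts, so the two tests give the same two-tailed p-value.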

16 15-16 Chi-Square Test for Independence: Small Expected Frequencies. The chi-square test is unreliable if the expected frequencies are too small. Rules of thumb: (1) Cochran’s Rule requires that $e_{jk} > 5$ for all cells. (2) A more relaxed rule allows up to 20% of the cells to have $e_{jk} < 5$. Most agree that a chi-square test is infeasible if $e_{jk} < 1$ in any cell. If this happens, try combining adjacent rows or columns to enlarge the expected frequencies.
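
These rules of thumb are easy to check mechanically once the expected frequencies are known. The sketch below is an assumption-laden illustration: the function name `small_frequency_check`, the dictionary keys, and the table of expected frequencies are all invented here.

```python
def small_frequency_check(expected):
    """Apply the rules of thumb to a table of expected frequencies."""
    cells = [e for row in expected for e in row]
    share_below_5 = sum(e < 5 for e in cells) / len(cells)
    return {
        "cochran_ok": all(e > 5 for e in cells),  # every e_jk > 5
        "relaxed_ok": share_below_5 <= 0.20,      # at most 20% of cells below 5
        "infeasible": any(e < 1 for e in cells),  # any e_jk < 1
    }


# Made-up expected frequencies (row totals 24 and 36; column totals 30, 20, 10).
expected = [[12.0, 8.0, 4.0],
            [18.0, 12.0, 6.0]]

print(small_frequency_check(expected))
```

Here one of six cells falls below 5, so the strict rule fails while the relaxed rule passes.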

17 15-17 Chi-Square Test for Independence: Cross-Tabulating Raw Data. Chi-square tests for independence can also be used to analyze quantitative variables by coding them into categories. For example, the variables Infant Deaths per 1,000 and Doctors per 100,000 can each be coded into various categories: [figure]

18 15-18 Chi-Square Test for Independence: Why Do a Chi-Square Test on Numerical Data? (1) The researcher may believe there’s a relationship between X and Y, but doesn’t want to use regression. (2) There are outliers or anomalies that prevent us from assuming that the data came from a normal population. (3) The researcher has numerical data for one variable but not the other.

19 15-19 Chi-Square Test for Independence: 3-Way Tables and Higher. More than two variables can be compared using contingency tables. However, it is difficult to visualize a higher-order table. For example, you could visualize a cube as a stack of tiled 2-way contingency tables. Major computer packages permit three-way tables.

20 15-20 Chi-Square Tests for Goodness-of-Fit (ML 10.2): Purpose of the Test. The goodness-of-fit (GOF) test helps you decide whether your sample resembles a particular kind of population. The chi-square test is versatile and easy to understand. Hypotheses for GOF tests: $H_0$: The population follows a _____ distribution; $H_1$: The population does not follow a _____ distribution. The blank may contain the name of any theoretical distribution (e.g., uniform, Poisson, normal).

21 15-21 Chi-Square Tests for Goodness-of-Fit: Test Statistic and Degrees of Freedom for GOF. Assuming n observations, the observations are grouped into c classes and then the chi-square test statistic is found using $\chi^2_{calc} = \sum_{j=1}^{c} (f_j - e_j)^2 / e_j$, where $f_j$ = the observed frequency of observations in class j and $e_j$ = the expected frequency in class j if the sample came from the hypothesized population.

22 15-22 Chi-Square Tests for Goodness-of-Fit: Test Statistic and Degrees of Freedom for GOF. If the proposed distribution gives a good fit to the sample, the test statistic will be near zero. The test statistic follows the chi-square distribution with degrees of freedom d.f. = c – m – 1, where c is the number of classes used in the test and m is the number of parameters estimated.
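
A minimal sketch of the GOF statistic and its degrees of freedom, using a uniform hypothesis so that no parameters are estimated (m = 0). The observed counts and the helper name `gof_statistic` are made up for illustration.

```python
def gof_statistic(observed, expected, m):
    """Return (chi-square GOF statistic, d.f. = c - m - 1)."""
    stat = sum((f - e) ** 2 / e for f, e in zip(observed, expected))
    df = len(observed) - m - 1
    return stat, df


# Made-up counts in c = 5 classes, for illustration only.
observed = [18, 22, 25, 15, 20]
n, c = sum(observed), len(observed)
expected = [n / c] * c  # uniform hypothesis: e_j = n / c in every class

stat, df = gof_statistic(observed, expected, m=0)
print(f"chi-square = {stat:.2f}, d.f. = {df}")
```

The statistic here is 2.90 with d.f. = 4, a small value consistent with a good fit to the uniform distribution.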

23 15-23 Normal Chi-Square GOF Test: Is the Sample from a Normal Population? Many statistical tests assume a normal population, so this is the most common GOF test. Two parameters, the mean μ and the standard deviation σ, fully describe a normal distribution. Unless μ and σ are known a priori, they must be estimated from a sample in order to perform a GOF test for normality.

24 15-24 Normal Chi-Square GOF Test: Method 1: Standardize the Data. Transform sample observations $x_1, x_2, \ldots, x_n$ into standardized z-values. Count the sample observations within each interval on the z-scale and compare them with expected normal frequencies $e_j$. Problem: Frequencies will be small in the end bins yet large in the middle bins (this may violate Cochran’s Rule and seems inefficient).

25 15-25 Normal Chi-Square GOF Test: Method 2: Equal Bin Widths. Step 1: Divide the exact data range into c groups of equal width, and count the sample observations in each bin to get observed bin frequencies $f_j$. Step 2: Convert the bin limits into standardized z-values: $z = (x - \bar{x}) / s$. Step 3: Find the normal area within each bin assuming a normal distribution. Step 4: Find expected frequencies $e_j$ by multiplying each normal area by the sample size n. Problem: Frequencies will be small in the end bins yet large in the middle bins (this may violate Cochran’s Rule and seems inefficient).
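
The four steps of Method 2 can be sketched with Python's standard `statistics` module. The data values and the function name `equal_width_expected` are invented for illustration. The sketch follows the steps literally, so the normal area lying outside the observed data range is not assigned to any bin and the expected frequencies sum to slightly less than n.

```python
from statistics import NormalDist, mean, stdev


def equal_width_expected(data, c):
    """Return (observed, expected) bin frequencies for c equal-width bins."""
    n = len(data)
    x_bar, s = mean(data), stdev(data)  # estimates of mu and sigma
    lo, hi = min(data), max(data)
    width = (hi - lo) / c
    limits = [lo + i * width for i in range(c + 1)]

    # Step 1: count observations in each bin (the maximum goes in the last bin).
    observed = [0] * c
    for x in data:
        j = min(int((x - lo) / width), c - 1)
        observed[j] += 1

    # Step 2: standardize the bin limits.
    z_limits = [(x - x_bar) / s for x in limits]

    # Steps 3 and 4: normal area in each bin, times n.
    std_normal = NormalDist()
    expected = [n * (std_normal.cdf(z_limits[j + 1]) - std_normal.cdf(z_limits[j]))
                for j in range(c)]
    return observed, expected


# Made-up sample of 20 values, for illustration only.
sample = [52, 55, 57, 58, 60, 61, 62, 63, 64, 65,
          65, 66, 67, 68, 69, 70, 72, 73, 75, 78]

observed, expected = equal_width_expected(sample, 4)
print(observed)
print([round(e, 2) for e in expected])
```

With a sample this small, the end bins tend to have low expected frequencies, which is the problem the slide points out.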

26 15-26 Normal Chi-Square GOF Test: Method 3: Equal Expected Frequencies. Define histogram bins in such a way that an equal number of observations would be expected under the hypothesis of a normal population, i.e., so that $e_j = n/c$. A normal area of 1/c is expected in each bin. The first and last classes must be open-ended, so to define c bins we need c – 1 cut points. Count the observations $f_j$ within each bin. Compare the $f_j$ with the expected frequencies $e_j = n/c$. Advantage: Makes efficient use of the sample. Disadvantage: Cut points on the z-scale may seem strange.

27 15-27 Normal Chi-Square GOF Test: Method 3: Equal Expected Frequencies. Table 15.16: Standard normal cut points for equal area bins. [table]
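
Cut points like those in Table 15.16 can be generated from the inverse standard normal CDF. The sketch below is an illustration with hypothetical function names and made-up data; it uses `statistics.NormalDist.inv_cdf` from the Python standard library.

```python
from bisect import bisect_right
from statistics import NormalDist, mean, stdev


def equal_area_cut_points(c):
    """The c - 1 standard normal cut points that give c bins of area 1/c."""
    std_normal = NormalDist()
    return [std_normal.inv_cdf(j / c) for j in range(1, c)]


def observed_counts(data, c):
    """Count standardized observations in each of the c equal-area bins."""
    x_bar, s = mean(data), stdev(data)
    cuts = equal_area_cut_points(c)
    counts = [0] * c
    for x in data:
        z = (x - x_bar) / s
        counts[bisect_right(cuts, z)] += 1  # index of the bin containing z
    return counts


# Made-up sample of 20 values, for illustration only.
sample = [52, 55, 57, 58, 60, 61, 62, 63, 64, 65,
          65, 66, 67, 68, 69, 70, 72, 73, 75, 78]

cuts = equal_area_cut_points(4)
counts = observed_counts(sample, 4)
print([round(z, 4) for z in cuts])
print(counts, "expected in each bin:", len(sample) / 4)
```

For c = 4 the cut points are approximately -0.6745, 0, and 0.6745, the quartiles of the standard normal distribution.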

28 15-28 Normal Chi-Square GOF Test: Critical Values for Normal GOF Test. Two parameters, μ and σ, are estimated from the sample (m = 2), so the degrees of freedom are d.f. = c – m – 1 = c – 3. We need at least four bins to ensure at least one degree of freedom. Small Expected Frequencies: Cochran’s Rule suggests at least $e_j \geq 5$ in each bin (e.g., with 4 bins we would want n ≥ 20, and so on).

29 15-29 Normal Chi-Square GOF Test: Visual Tests. The fitted normal superimposed on a histogram gives visual clues as to the likely outcome of the GOF test. A simple “eyeball” inspection of the histogram may suffice to rule out a normal population by revealing outliers or other non-normality issues.

30 15-30 ECDF Tests (ML 10.3): ECDF Tests for Normality. There are alternatives to the chi-square test for normality based on the empirical cumulative distribution function (ECDF). ECDF tests are done by computer. Details are omitted here. A small p-value casts doubt on normality of the population. The Kolmogorov-Smirnov (K-S) test uses the largest absolute difference between the actual and expected cumulative relative frequency of the n data values. The Anderson-Darling (A-D) test is based on a probability plot. When the data fit the hypothesized distribution closely, the probability plot will be close to a straight line. The A-D test is widely used because of its power and attractive visual.
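
Although the lecture leaves the details to software, the K-S statistic itself is simple to compute. The sketch below is an illustration only: `ks_statistic` is a hypothetical name, the data are made up, and no p-value is computed. When μ and σ are estimated from the same sample, as here, the usual K-S tables do not apply directly, which is one reason to rely on a statistical package for the p-value.

```python
from statistics import NormalDist, mean, stdev


def ks_statistic(data, mu, sigma):
    """Largest absolute gap between the ECDF and the normal CDF."""
    dist = NormalDist(mu, sigma)
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        expected = dist.cdf(x)
        # The ECDF jumps from (i - 1)/n to i/n at x, so check both sides.
        d = max(d, abs(i / n - expected), abs((i - 1) / n - expected))
    return d


# Made-up sample of 20 values, for illustration only.
sample = [52, 55, 57, 58, 60, 61, 62, 63, 64, 65,
          65, 66, 67, 68, 69, 70, 72, 73, 75, 78]

d_stat = ks_statistic(sample, mean(sample), stdev(sample))
print(f"K-S statistic D = {d_stat:.4f}")
```

A value of D near zero means the ECDF tracks the fitted normal CDF closely.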

31 15-31 ECDF Tests: Example: Minitab’s Anderson-Darling Test for Normality. Data: weights of 80 babies (in ounces). The near-linear probability plot suggests a good fit to a normal distribution. The p-value = 0.122 is not small enough to reject a normal population at α = .05.

32 15-32 ECDF Tests: Example: MegaStat’s Normality Tests. Data: weights of 80 babies (in ounces). The near-linear probability plot suggests a good fit to a normal distribution. The p-value = 0.2487 is not small enough to reject a normal population at α = .05 in this chi-square test. Note: MegaStat’s chi-square test is not as powerful as the A-D test, so we would prefer the A-D test if software is available. The MegaStat probability plot is good, but shows no p-value.

