
Multinomial Experiments: Goodness of Fit Tests
We have just seen an example of comparing two proportions. For that analysis, we used the normal distribution as a sampling distribution. We will now look at problems where we compare more than two proportions. We will not be able to use the normal distribution; instead we will use a different distribution called the chi-square, or χ², distribution. Consider the problem of testing a die to see if it is unfair. The die has six numbers. If the die is fair, all six numbers are equally likely, so each has a probability of 1/6, and in the long run each number will come up on 1/6 of the rolls. Suppose we take a sample of 60 rolls. Theoretically, each number should come up 1/6 × 60 = 10 times. If the counts are not all 10, then one of two things is true. Either the die is not fair, or the die is fair and the departures from 10 are explained by sampling variation. To sort this out, we need a hypothesis test, a sampling distribution, and a p-value. (Section 11.2, Page 236)

Goodness of Fit Test: Fair Die Example
Following is the distribution of the observed frequencies of results from rolling a die 60 times. Is the die fair? The hypotheses are as follows:
Ho: The die is fair (each number has probability 1/6).
Ha: The die is not fair.
Clearly, the observed frequencies are not all equal to the theoretical frequency of 10. We need a way to measure how big the “miss” is. Then we can judge whether it is likely to be due to sampling variation, or so large that sampling variation cannot explain it. (Section 11.2, Page 239)

Chi-Square Statistic: Fair Die Example
We calculate the “miss,” called the chi-square statistic, in a way similar to how we calculate the variance. For each cell, we square the difference between the observed and expected frequencies and divide by the expected frequency. Then we add up the results over all cells. Note that the expected frequency for each cell always equals the total number of observations × the Ho proportion for that cell. Also note that the expected frequencies always add up to the same total as the observed frequencies. The χ² statistic is the total, 2.2. Also note that the minimum value of χ² is zero, which occurs only when every observed frequency equals its expected frequency. If we took another sample, we would likely get a different value for the chi-square statistic. (Section 11.1, Page 239)
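The same calculation can be written in symbols. The notation is ours rather than the slides': O_i and E_i are the observed and expected counts in cell i, k is the number of cells, n is the number of observations, and p_i is the Ho proportion for cell i.

```latex
\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}, \qquad E_i = n\,p_i, \qquad \sum_{i=1}^{k} E_i = \sum_{i=1}^{k} O_i = n
% Fair die: k = 6 cells, n = 60 rolls, p_i = 1/6 under Ho
E_i = 60 \cdot \tfrac{1}{6} = 10 \qquad (i = 1, \dots, 6)
```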

Chi-Square Distribution: Fair Die Example
Now we need a sampling distribution for the χ² statistic of 2.2. With it, we can calculate the probability of getting χ² ≥ 2.2 when the true proportions are all equal to 1/6.
χ² distribution for 5 df: this is the distribution of all possible χ² statistics calculated from all possible samples of 60 observations when there are 6 proportions, or cells. Note that the degrees of freedom equal the number of proportions − 1, here 6 − 1 = 5.
Finding the p-value on the TI-83, given the χ² statistic and df:
PRGM – CHI2DIST
LOWER BOUND: 2.2
UPPER BOUND: 2ND E99
df: 5
Output: P-VALUE = .8208
The null hypothesis cannot be rejected. (Section 11.2, Page 240)
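The CHI2DIST program on the slide is a calculator add-in. Below is a rough, standard-library-only sketch of what it computes. The function `chi2_sf` is a hypothetical name of our own. It returns the upper-tail area P(χ² ≥ x) using the closed forms that exist for integer degrees of freedom. With SciPy installed, `scipy.stats.chi2.sf(x, df)` is the usual library call instead.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail area P(X >= x) for a chi-square variable with integer df.

    Sketch using closed forms for integer degrees of freedom:
    even df uses a Poisson-type sum, odd df an erfc term plus a finite sum.
    """
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{j=0}^{df/2 - 1} (x/2)^j / j!
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= (x / 2) / j
            total += term
        return math.exp(-x / 2) * total
    # Odd df: erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2) * (1 + x/3 + x^2/15 + ...)
    p = math.erfc(math.sqrt(x / 2))
    term = math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    for j in range(1, (df - 1) // 2 + 1):
        p += term
        term *= x / (2 * j + 1)
    return p


# Fair die example: chi-square statistic 2.2 with 6 - 1 = 5 degrees of freedom
p_value = chi2_sf(2.2, df=5)
print(f"p-value = {p_value:.4f}")  # p-value = 0.8208
```

The upper bound of 2ND E99 on the calculator plays the role of "infinity." The function integrates to infinity implicitly by returning the tail area directly.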

Chi-Square Distribution Conditions
The sample is random, and the observed data represent counts of individuals in the categories of a categorical variable.
Each expected count is 5 or greater. (Section 11.1, Page 241)

Goodness of Fit Test: Fair Die Example (TI-83 Add-In)
Following is the distribution of the observed frequencies of results from rolling a die 60 times. Is the die fair? The hypotheses are as follows:
Ho: The die is fair (each number has probability 1/6).
Ha: The die is not fair.
Each expected cell = 1/6 × 60 = 10.
STAT – EDIT – L1: Enter the observed frequencies. L2: Enter the expected values, 10 in each of the 6 cells.
PRGM – GOODFIT
OBSERVED LIST = 2ND L1
EXPECTED LIST = 2ND L2
Answer: p-value = .8208, Chi-Square Stat = 2.2. Since p-value > 0.05, Ho cannot be rejected. (Section 11.2, Page 240)

Goodness of Fit Test: Mendelian Theory Problem
Mendel’s genetic theory of inheritance claims that round and yellow, wrinkled and yellow, round and green, and wrinkled and green peas will occur in the ratio 9:3:3:1. In testing the theory, Mendel obtained frequencies of 315, 101, 108, and 32, respectively. Does the data contradict the theory? Do a hypothesis test.
Ho: The data fits the theory.
Ha: The data does not fit the theory.
Calculation of Expected Values
Observed    | Expected Proportion | Expected Count
315         | 9/16                | 9/16 × 556 = 312.75
101         | 3/16                | 3/16 × 556 = 104.25
108         | 3/16                | 3/16 × 556 = 104.25
32          | 1/16                | 1/16 × 556 = 34.75
Total = 556 | Total = 1           | Total = 556
(Section 11.2, Page 245)

Goodness of Fit Test: Mendelian Theory Problem (continued)
STAT – EDIT: Enter the observed data (315, 101, 108, 32) in L1 and the expected counts (312.75, 104.25, 104.25, 34.75) in L2.
PRGM – GOODFIT
OBSERVED LIST = 2ND L1
EXPECTED LIST = 2ND L2
Answer: p-value = .9254, Chi-Square Stat = .47. The null hypothesis cannot be rejected; the observed data does not contradict the theory. (Section 11.2, Page 243)
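Below is a minimal Python sketch of the expected-count and χ² arithmetic for Mendel's data. It assumes the counts are listed in the order given on the slide. The sketch stops at the statistic, df, and the "expected count ≥ 5" condition. The p-value is the upper-tail area of the χ² distribution with 3 df, which GOODFIT reports as .9254.

```python
# Mendel's pea data: round-yellow, wrinkled-yellow, round-green, wrinkled-green
observed = [315, 101, 108, 32]
ratio = [9, 3, 3, 1]          # theoretical ratio 9:3:3:1

n = sum(observed)             # 556 peas in total
ratio_total = sum(ratio)      # 16
# Expected count for each cell = n * (Ho proportion for that cell)
expected = [n * r / ratio_total for r in ratio]

# Chi-square statistic: sum of (Observed - Expected)^2 / Expected over all cells
chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1        # number of categories minus 1

print(expected)               # [312.75, 104.25, 104.25, 34.75]
print(round(chi_sq, 2), df)   # 0.47 3
# Condition check: every expected count should be at least 5
print(min(expected) >= 5)     # True
```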

Problems
a. Perform a hypothesis test to see if the colors are not equally likely. State the hypotheses.
b. Find the p-value and state your conclusion.
c. What is the name of the model used for the sampling distribution?
d. What conditions must be satisfied? Are they?
(Problems, Page 252)

Problems
a. Perform a hypothesis test to see if the preferences are not all the same. State the hypotheses.
b. Find the p-value and state your conclusion.
c. What is the name of the model used for the sampling distribution?
d. What conditions must be satisfied? Are they?
(Problems, Page 252)

Problems
a. Perform a hypothesis test to see if the observed data is inconsistent with the stated ratios. State the appropriate hypotheses.
b. Find the expected counts for each color.
c. What are the necessary conditions for the sampling distribution?
d. What is the name of the model used for the sampling distribution?
e. Find the p-value and state your conclusion.
(Problems, Page 252)

Test for Independence
Following is a two-way table. In this case, two categorical variables are measured on one group of college students: for each student, their Gender and Favorite Subject Area are recorded.
Independence of Two Variables
Consider the Social Science category. 113/300, or 38%, of all students chose Social Science. However, 41/122, or 34%, of males chose the category, and 72/178, or 40%, of females chose it. Treat these as probabilities. If I pick a person at random, there is a 38% chance the person chose Social Science. However, if you tell me the person is female, the probability that she chose the category is 40%. This is an indication that the two variables are not independent, but related. Two variables are independent if knowing the outcome of one variable does not change the probability of the outcome of the other variable. (Section 11.3, Page 244)
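The same comparison can be written as probabilities, using only the counts quoted on the slide. The notation is ours:

```latex
P(\text{Social Science}) = \tfrac{113}{300} \approx 0.377
P(\text{Social Science} \mid \text{Male}) = \tfrac{41}{122} \approx 0.336
P(\text{Social Science} \mid \text{Female}) = \tfrac{72}{178} \approx 0.404
% Independence would require P(A \mid B) = P(A) for every subject area A and every gender B
```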

Tests for Independence
The sample data gives an indication that the variables are not independent, but this indication may be due to sampling variation. To test for independence, we will use chi-square methods. The appropriate hypotheses are:
Ho: The variables are independent.
Ha: The variables are not independent.
Next, we need to calculate the expected value for each cell of the data matrix under the assumption that the variables are independent. For example, if the variables are independent, then the overall proportion of students in the Social Science category is 113/300 = .3767. The proportions for males and for females must both equal this overall proportion. The expected value for males is 113/300 × 122 ≈ 45.95, and the expected value for females is 113/300 × 178 ≈ 67.05. (Section 11.3, Page 244)
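The expected-count rule can be written as a small Python helper. The name `expected_count` is hypothetical. Only the Social Science margins quoted on the slide are used, because the full table is not reproduced here. The helper uses the equivalent form row total × column total ÷ grand total, which is the overall column proportion times the row total.

```python
def expected_count(row_total: int, col_total: int, grand_total: int) -> float:
    """Expected cell count if the row and column variables are independent."""
    # (col_total / grand_total) * row_total, rearranged as a single fraction
    return row_total * col_total / grand_total


males, females, students = 122, 178, 300
social_science = 113  # column total for Social Science

print(round(social_science / students, 4))                          # 0.3767
print(round(expected_count(males, social_science, students), 2))    # 45.95
print(round(expected_count(females, social_science, students), 2))  # 67.05
```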

Test for Independence
The expected values for all six cells are calculated in the same way. Next, we calculate each cell's contribution to the χ² statistic, (Observed − Expected)²/Expected. For example, the first cell has an expected value of 29.28, so its contribution is (Observed − 29.28)²/29.28. Adding up the contributions of the 6 cells gives the total χ² statistic. The formula for the degrees of freedom is df = (#rows − 1) × (#columns − 1) = (2 − 1) × (3 − 1) = 2. The area under the χ² curve (df = 2) to the right of the statistic is .1001 > .05, so the null hypothesis cannot be rejected. (Section 11.3, Page 246)
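In symbols, for a table with r rows and c columns, the steps above are as follows. The notation is ours: O_ij and E_ij are the observed and expected counts in row i, column j.

```latex
E_{ij} = \frac{(\text{row } i \text{ total}) \times (\text{column } j \text{ total})}{\text{grand total}}, \qquad
\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}, \qquad
df = (r - 1)(c - 1)
% Gender x Favorite Subject Area: r = 2, c = 3, so df = (2 - 1)(3 - 1) = 2
```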

Test for Independence: Black Box Program
Ho: The variables are independent.
Ha: The variables are not independent.
2ND MATRIX – EDIT, then set the dimensions 2 ENTER 3 (the data table is 2 rows and 3 columns; ignore the total row and total column).
Enter the data in matrix [A], left to right.
STAT – TESTS – C:χ²-Test
Observed: [A]
Expected: [B]
Calculate
Answer: p-value = .0999, along with the χ² statistic.
2ND MATRIX – EDIT – [B] – ENTER displays the expected-values matrix. All cells ≥ 5, so the conditions are satisfied. (Section 11.3, Page 246)
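Below is a rough Python counterpart of the calculator's χ²-Test, assuming the table is given as a list of rows of counts. `chi_square_independence` is a hypothetical helper. It returns the statistic, the df, and the expected-values matrix (what the calculator stores in [B]). It leaves the p-value to a χ² upper-tail function. SciPy's `scipy.stats.chi2_contingency` performs the whole test, but note that by default it applies Yates' continuity correction when df = 1.

```python
def chi_square_independence(table):
    """Chi-square statistic, df and expected counts for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected count for cell (i, j) = row_i total * column_j total / grand total
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    # Sum (Observed - Expected)^2 / Expected over every cell
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df, expected


# Made-up 2 x 2 table, only to show the mechanics (not data from the slides)
toy = [[10, 20],
       [20, 10]]
stat, df, expected = chi_square_independence(toy)
print(round(stat, 4), df)   # 6.6667 1
print(expected)             # [[15.0, 15.0], [15.0, 15.0]]
# Condition check, like displaying matrix [B] on the calculator
print(min(min(row) for row in expected) >= 5)  # True
```

The toy table is invented purely to exercise the code, because the slide's full student table is not reproduced in this transcript.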

Problems
a. Test the hypothesis that the size of the community a person was reared in is independent of the size of the community the person now resides in. State the appropriate hypotheses.
b. Find the p-value and state your conclusion.
c. What is the name of the sampling distribution?
d. What are the necessary conditions, and are they satisfied? What is the value of the smallest expected cell?
(Section 11.3, Page 254)

Problems
a. Test the hypothesis that years of employment and knowing what the supervisor expects are independent. State the appropriate hypotheses.
b. Find the p-value and state your conclusion.
c. What is the name of the sampling distribution?
d. What are the necessary conditions, and are they satisfied? What is the value of the smallest expected cell?
(Section 11.3, Page 254)

Problems
a. Test the hypothesis that the survival rate and the treatment are independent. State the appropriate hypotheses.
b. Find the p-value and state your conclusion.
c. What is the name of the sampling distribution?
d. What are the necessary conditions, and are they satisfied? What is the value of the smallest expected cell?
(Problems, Page 253)

Tests for Homogeneity
Another application of chi-square procedures is the test for homogeneity: essentially, a test of whether different groups have the same distribution for a given variable. Consider a table that gives voters’ opinions on a proposal, broken down by location. In a test for independence, we had one group of individuals and measured two categorical variables in that group. In a test for homogeneity, we have one categorical variable, Opinion on Proposal, and three separately located groups of voters. The hypotheses are:
Ho: The distributions are homogeneous.
Ha: The distributions are not homogeneous.
(Section 11.3, Page 247)

Tests for Homogeneity
The mechanics of a test for homogeneity are exactly the same as for a test of independence. We calculate the expected values under the assumption that Ho is true. The proportions in favor are all assumed to equal 254/500 = .5080. The expected value for urban voters in favor is .5080 × 200 = 101.6, so the χ² contribution for cell 1 is (Observed − 101.6)²/101.6. Summing the contributions over all cells gives the total χ² statistic. The df = 2 and the p-value = 1.21E-20 ≅ 0. Ho is rejected: the distributions are not the same. (Section 11.3, Page 248)
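The pooled-proportion calculation for the urban cell is the same row total × column total ÷ grand total rule used in the test for independence. It is written out here with the numbers from the slide:

```latex
E_{\text{urban, favor}} = \frac{254}{500} \times 200 = \frac{254 \times 200}{500} = 0.508 \times 200 = 101.6
```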

Problems
a. State the hypotheses.
b. Find the p-value and state your conclusion.
c. What is the name of the model used for the sampling distribution?
d. What is the value of the smallest expected cell?
(Section 11.3, Page 253)

Problems
a. State the hypotheses.
b. Find the p-value and state your conclusion.
c. What is the name of the model used for the sampling distribution?
d. What is the value of the smallest expected cell?
(Section 11.3, Page 253)

Summary of Chi-Square Applications
Goodness of Fit Test: Given one categorical variable with a fixed set of proportions for the categories.
Ha: The observed data does not fit the proportions.
Calculate expected values (Ho true proportion × total observations).
Observed and expected data in the List Editor; PRGM: GOODFIT.
Test for Independence: Given two categorical variables measured on the same population.
Ha: The variables are not independent (they are related).
Observed data in the Matrix Editor; STAT – TESTS – χ²-Test.
Test for Homogeneity: Given one categorical variable and two or more populations.
Ha: The proportions for the categories are not the same for all populations.
Observed data in the Matrix Editor; STAT – TESTS – χ²-Test.
(Chapter 12, Summary)

Problems
a. Is this a goodness of fit test, a test for independence, or a test for homogeneity?
b. State the hypotheses.
c. Find the p-value and state your conclusion.
d. What is the name of the model used for the sampling distribution?
e. What is the value of the smallest expected cell?
(Problems, Page 252)