Chi Squared Tests Hypothesis Tests for Linear Regression

AP Statistics Topic 7: Chi-Squared Tests and Hypothesis Tests for Linear Regression

These are the last 2 things we’ll study
Chi-squared tests: goodness of fit; independence and homogeneity.
Hypothesis tests for linear regression: the significance of the linear relationship.
We’ll spend one week on each.

Chi-Squared Tests
Chi-squared tests analyze categorical data. The tests we’ll study are the goodness-of-fit test and the tests for homogeneity and independence. The homogeneity and independence tests are performed exactly the same way. For homogeneity, we look at two or more samples and one characteristic. For independence, we look at one sample and two characteristics.

Goodness of Fit Test
Measures the extent to which some empirical distribution “fits” the distribution expected under the null hypothesis.
[Figure: histogram of frequency against fork length]

For example
A GEICO Direct magazine had an interesting article concerning the percentage of teenage motor vehicle deaths and the time of day. The following percentages were given from a sample.
Time        %
12-3AM      17
3-6AM        8
6-9AM        8
9AM-noon     6
Noon-3PM    10
3-6PM       16
6-9PM       15
9PM-12AM    19

The Distribution and Hypothesis Statements
Is the percentage of teenage motor vehicle deaths the same for each time period? Conduct a hypothesis test at the 1% level.
Ho: The percent of teenage motor vehicle deaths is the same for each time period.
Ha: The percent of teenage motor vehicle deaths is not the same for each time period.

Let’s look at this more closely
In this problem, what type of data are we considering? Categorical data: the time of day.
How many classes is our data divided into? 8 different classes.

Put another way …
We want to see if the distribution of our data is consistent with the hypothesized distribution. In this case, we want to see if the distribution of deaths is uniform: about 12.5% per period.

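So how can we do this?
We have our observed occurrences. Are these consistent with our hypothesis? What should we compare with these? Expected values. Treating the listed percentages as counts, the total is 99, so under the null hypothesis each of the 8 periods has an expected count of 99/8 ≈ 12.38.
Time        Observed   Expected
12-3AM      17         12.38
3-6AM        8         12.38
6-9AM        8         12.38
9AM-noon     6         12.38
Noon-3PM    10         12.38
3-6PM       16         12.38
6-9PM       15         12.38
9PM-12AM    19         12.38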
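The expected values can be sketched in a few lines of Python. This is only an illustration: the function name `uniform_expected` is made up here, and the observed values are the GEICO percentages treated as counts, as the slides do.

```python
# Observed values from the GEICO table, treated as counts (as on the slides).
observed = [17, 8, 8, 6, 10, 16, 15, 19]


def uniform_expected(counts):
    """Expected count per class when every class is equally likely."""
    total = sum(counts)
    k = len(counts)
    return [total / k] * k


expected = uniform_expected(observed)
print("total observed:", sum(observed))
print("expected per period:", expected[0])
```

Every period gets the same expected count because the null hypothesis says the distribution is uniform.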

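Test Statistic
Our test statistic is $\chi^2 = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}}$, summed over all classes.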
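As a rough sketch, the chi-squared statistic (the sum of squared differences between observed and expected counts, each divided by the expected count) can be computed as below. The helper name `chi_squared_statistic` is illustrative, and the data are again the GEICO values treated as counts.

```python
def chi_squared_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all classes."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


observed = [17, 8, 8, 6, 10, 16, 15, 19]
# Uniform null hypothesis: the same expected count in every period.
expected = [sum(observed) / len(observed)] * len(observed)

stat = chi_squared_statistic(observed, expected)
print("chi-squared statistic:", round(stat, 3))
print("degrees of freedom:", len(observed) - 1)
```

Large differences between observed and expected counts make the statistic large, which is why large values count as evidence against the null hypothesis.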

Chi-squared Distribution
A family of curves identified by degrees of freedom (k-1, where k is the number of classes).
Mean = degrees of freedom.
Variance = 2(degrees of freedom).
As the degrees of freedom increase, the curves approach normal.
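The mean and variance can be checked informally by simulation. The sketch below relies on a fact not stated on the slides: a chi-squared variable with k degrees of freedom has the same distribution as the sum of k squared independent standard normal values. The function name, seed, and number of draws are arbitrary choices.

```python
import random
import statistics


def simulate_chi_squared(df, n_draws, seed=0):
    """Draw chi-squared values as sums of df squared standard normals."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [
        sum(rng.gauss(0, 1) ** 2 for _ in range(df))
        for _ in range(n_draws)
    ]


df = 7  # degrees of freedom for the 8-class GEICO example
draws = simulate_chi_squared(df, n_draws=20000)
sample_mean = statistics.fmean(draws)
sample_var = statistics.variance(draws)

print("sample mean:", round(sample_mean, 2), "theory:", df)
print("sample variance:", round(sample_var, 2), "theory:", 2 * df)
```

The simulated values should land near 7 and 14, though not exactly on them, since this is a random sample.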

How we’ll use the chi-squared distribution
We’ll use the chi-squared distribution to determine our p-value, which is the area under the curve to the right of the test statistic. If our test statistic is large, the p-value is small, and we’ll reject the null hypothesis.

How can we find p-values?
Calculate the chi-squared statistic. Then either use the chi-squared table or use the chi-squared cdf function on your TI-83.
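For readers without a table or calculator at hand, here is a minimal Python sketch of the same lookup. It is not the calculator's method: it uses the standard recurrence for the upper-tail chi-squared probability with whole-number degrees of freedom, built only on the standard library. The function name `chi2_sf` is an assumption of this example.

```python
import math


def chi2_sf(x, df):
    """Upper-tail area P(X >= x) for chi-squared with integer df >= 1."""
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        a = 1.0
        q = math.exp(-half)              # exact tail for df = 2
    else:
        a = 0.5
        q = math.erfc(math.sqrt(half))   # exact tail for df = 1
    # Each step raises the degrees of freedom by 2 and adds one term.
    while a < df / 2.0:
        q += math.exp(a * math.log(half) - half - math.lgamma(a + 1.0))
        a += 1.0
    return q


# GEICO example: statistic of about 13.727 with 8 - 1 = 7 degrees of freedom.
p_value = chi2_sf(13.727, 7)
print("p-value:", round(p_value, 3))
```

The p-value is the upper-tail area, so it equals 1 minus the cdf evaluated at the test statistic.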

The Chi-square table

Graphing Calculator 2nd DIST

Graphing Calculator STAT
Inputs: observed data list, expected data list, degrees of freedom.

Assumptions
We have 2 assumptions for this test.
First, the observed cell counts are based on a random sample (our sample is random).
Second, our sample is large. How do we determine large? Expected cell counts must all be greater than or equal to 5.
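The large-sample condition is easy to check mechanically. The sketch below uses an illustrative helper, `expected_counts_ok`, on the GEICO expected counts. The randomness condition cannot be checked from the numbers alone.

```python
def expected_counts_ok(expected, minimum=5):
    """True when every expected cell count is at least `minimum`."""
    return all(e >= minimum for e in expected)


observed = [17, 8, 8, 6, 10, 16, 15, 19]
expected = [sum(observed) / len(observed)] * len(observed)

print("large-sample condition met:", expected_counts_ok(expected))
```

Note that the check applies to the expected counts, not the observed ones.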

Our conclusions?
The same as we’ve always done. We reject or fail to reject the null based on a comparison of the p-value and our significance level. We interpret our conclusion in the context of our alternative hypothesis.

Let’s summarize
Use the same 9 steps for hypothesis testing:
1. Identify the parameter
2. Null
3. Alternative
4. Choose significance level
5. Test Statistic
6. Assumptions
7. Calculate Test Statistic
8. Determine P-value
9. Make your conclusion

Let’s finish the Geico Problem
Let’s identify the parameter: the proportion of teenage motor vehicle deaths in each time period.
Null Hypothesis Ho: The percent of teenage motor vehicle deaths is the same for each time period.
Alternative Hypothesis Ha: The percent of teenage motor vehicle deaths is not the same for each time period.
Significance level: .01

Continuing …
Test Statistic: $\chi^2 = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}}$
Assumptions: The sample is random. The sample is large.

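Continuing …
Calculate the Test Statistic
Time        Observed   Expected
12-3AM      17         12.38
3-6AM        8         12.38
6-9AM        8         12.38
9AM-noon     6         12.38
Noon-3PM    10         12.38
3-6PM       16         12.38
6-9PM       15         12.38
9PM-12AM    19         12.38
With these values the chi-squared statistic is about 13.73, with 8 - 1 = 7 degrees of freedom.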

Conclusion
We fail to reject the null hypothesis because the p-value (.056) is greater than the significance level (.01). The data do not suggest that teenage motor vehicle deaths are distributed differently among the time periods.

Homework 7-1
Read section 12.1 in the textbook. Problems: 12.10, 12.12, 12.14

Let’s try this for some practice
Using a chi-squared test, investigate whether it’s reasonable to assume the random number table is random. Use a significance level of .05.

Tests for Homogeneity and Independence
In a test for homogeneity, we take n samples and look at one characteristic. Example: take samples of 1000 people from each of 4 different countries and ask how they feel about whether the use of torture against suspected terrorists is justified. In this case we’d like to see if the responses are distributed the same way (homogeneous) among the countries.
In a test for independence, we take one sample and look at two characteristics. Example: take a sample of 300 adults and determine each person’s political philosophy and what television news station they watch. In this case we’d like to see if political philosophy and news station are independent.

Let’s do an example of each
First, let’s do a test for independence. Big Office is a chain of large office supply stores that sell an extensive line of desktop and laptop computers. Company executives want to know whether the demands for these types of computers are related in any way. They might act as complementary products, or sales may not be related. Big Office randomly selected 250 business days and categorized demand for each type of computer as Low, MedLow, MedHi and Hi.
                     Desktops
Laptops    Low   MedLow   MedHi    Hi   Total
Low          4       17      17     5      43
MedLow       8       23      22    27      80
MedHi       16       20      14    20      70
Hi          10       17      19    11      57
Total       38       77      72    63     250

So how many samples do we have? Is the data we are collecting categorical or numerical? How many characteristics are we investigating? How many classes within those characteristics?

Hypothesis Statements
Ho: The two variables are independent.
Ha: The two variables are not independent.

How do we test for independence?
Recall that if events A and B are independent, then P(A and B) = P(A) · P(B). So, we’ll assume the two variables are independent. Then we’ll determine the expected cell count for each cell of the Big Office table: (row total)(column total) / (grand total). We’ll look at the differences between the expected and observed counts for our test.
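A small Python sketch of the expected cell counts under independence follows. Two things here are assumptions rather than facts from the slides: the cell counts are a reconstruction that matches the row and column totals shown for the Big Office example, and the row/column orientation (laptops as rows, desktops as columns) is a guess. The function name `expected_counts` is illustrative.

```python
# Reconstructed Big Office counts (assumption: consistent with the slide's
# row totals 43, 80, 70, 57 and column totals 38, 77, 72, 63).
# Rows: laptop demand; columns: desktop demand (assumed orientation).
observed = [
    [4, 17, 17, 5],
    [8, 23, 22, 27],
    [16, 20, 14, 20],
    [10, 17, 19, 11],
]


def expected_counts(table):
    """Expected counts under independence: row total * column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


expected = expected_counts(observed)
for row in expected:
    print([round(e, 2) for e in row])
```

The formula comes straight from the independence rule: the probability of landing in a cell is the row proportion times the column proportion, and multiplying by n turns that probability into a count.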

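Test Statistic
$\chi^2 = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}}$, summed over all cells of the table, with degrees of freedom = (number of rows - 1)(number of columns - 1).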

Assumptions
The sample is random.
The sample is large: each expected cell count is at least 5.

At this point …
We can calculate the p-value using the chi-squared table, or the chi-squared CDF function on our calculator. Or, we can use the chi-squared test.

Let’s look at the Chi-Squared Test
This test is a piece of cake. First, put your observed matrix into the calculator. Then use STAT – TEST – and just select Calculate. The calculator creates the Expected matrix. Output: the value of your test statistic and the p-value.
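What the calculator's test does can be imitated in Python: take an observed matrix, build the expected matrix, and report the statistic and p-value. This is a hedged sketch, not the calculator's actual algorithm. The names `chi2_sf` and `chi_squared_test` are invented for the example, the p-value uses a standard-library recurrence, and the Big Office counts are a reconstruction that matches the slide's row and column totals.

```python
import math


def chi2_sf(x, df):
    """Upper-tail chi-squared probability for integer df (recurrence form)."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-half)
    else:
        a, q = 0.5, math.erfc(math.sqrt(half))
    while a < df / 2.0:
        q += math.exp(a * math.log(half) - half - math.lgamma(a + 1.0))
        a += 1.0
    return q


def chi_squared_test(table):
    """Return (statistic, df, p_value, expected) for a two-way table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df, chi2_sf(stat, df), expected


# Reconstructed Big Office counts (assumption: matches the slide's totals).
big_office = [
    [4, 17, 17, 5],
    [8, 23, 22, 27],
    [16, 20, 14, 20],
    [10, 17, 19, 11],
]

stat, df, p_value, expected = chi_squared_test(big_office)
print("statistic:", round(stat, 2), "df:", df, "p-value:", round(p_value, 3))
```

The resulting p-value is then compared with the chosen significance level, exactly as in the goodness-of-fit test.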

So let’s do this problem using our 9 steps of hypothesis testing

Chi-Squared Test for Homogeneity
The paper “No Evidence of Impaired Neurocognitive Performance in Collegiate Soccer Players” compared collegiate soccer players, athletes in sports other than soccer, and a group of students who were not involved in collegiate sports with respect to head injuries. Three independent random samples were chosen and each person in the sample was asked to complete a medical history survey. The following 2-way contingency table was created based on reported concussions.
             Number of concussions
            0     1     2    3+   Total
Soccer     45    25    11    10      91
Other      68    15     8     5      96
Non        45     5     3     0      53
Total     158    45    22    15     240

So how many samples do we have? Is the data we are collecting categorical or numerical? How many characteristics are we investigating? How many classes within those characteristics?

Hypothesis Statements
Ho: The populations are homogeneous or, the category proportions are the same for all populations.
Ha: The populations are not homogeneous or, the category proportions are not the same for all populations.

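Test Statistic
$\chi^2 = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}}$, summed over all cells of the table.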

Assumptions
The samples are random and independent.
The samples are large: the expected cell counts are at least 5.

Everything else is the same
Let’s finish this test using our 9 steps, with the concussion table above.
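Before running the 9 steps, the expected counts and the large-sample condition can be checked with a short Python sketch. The concussion counts below are an assumption: they are a reconstruction that matches the totals visible on the slide (row totals 91 and 96, column totals 158 and 22, grand total 240). The helper names are illustrative.

```python
# Reconstructed concussion counts (assumption, see lead-in).
# Columns: 0, 1, 2, 3+ reported concussions.
concussions = [
    [45, 25, 11, 10],  # soccer players
    [68, 15, 8, 5],    # athletes in other sports
    [45, 5, 3, 0],     # students not in collegiate sports
]


def expected_counts(table):
    """Expected counts: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


expected = expected_counts(concussions)

# Collect the cells that fail the "expected count at least 5" condition.
small_cells = [
    (i, j, round(e, 2))
    for i, row in enumerate(expected)
    for j, e in enumerate(row)
    if e < 5
]
print("cells with expected count below 5:", small_cells)
```

If these reconstructed counts are right, some expected counts in the last row fall below 5, so the large-sample condition would need attention before the test is trusted. One common remedy is to combine adjacent categories.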

To summarize …
Test for Independence: one sample, two characteristics. Assumptions: the sample is random; the sample is large.
Test for Homogeneity: multiple samples, one characteristic. Assumptions: the samples are independent and random; the samples are large.

Homework 7-2
Read Section 12.2. Problems: 12.18, 12.22