1 Chi-Squared Tests; Hypothesis Tests for Linear Regression
AP Statistics Topic 7
2 These are the last two things we’ll study:
Chi-squared tests: goodness of fit; independence and homogeneity.
Hypothesis tests for linear regression: the significance of the linear relationship.
We’ll spend one week on each.
3 Chi-Squared Tests: analysis of categorical data
The tests we’ll study are the goodness-of-fit test and the tests for homogeneity and independence. The tests for homogeneity and independence are performed exactly the same way:
For homogeneity, we look at two or more samples and one characteristic.
For independence, we look at one sample and two characteristics.
4 Goodness-of-Fit Test
Measures the extent to which some empirical distribution “fits” the distribution expected under the null hypothesis.
[Figure: histogram of fork length (x-axis) against frequency (y-axis)]
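For reference, the usual chi-squared statistic for measuring this fit is written below in standard notation (the symbols O_i for the observed count and E_i for the expected count in category i are labels introduced here). Each category contributes its squared gap between observed and expected counts, scaled by the expected count, so a larger value means a worse fit. For a goodness-of-fit test with k categories, the statistic is compared with a chi-squared distribution with k − 1 degrees of freedom.

```latex
\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}, \qquad df = k - 1
```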
5 For example
A GEICO Direct magazine had an interesting article concerning the percentage of teenage motor vehicle deaths and the time of day. The following percentages were given from a sample.
Time          %
12-3 AM      17
3-6 AM        8
6-9 AM        8
9 AM-noon     6
Noon-3 PM    10
3-6 PM       16
6-9 PM       15
9 PM-12 AM   19
6 The Distribution and Hypothesis Statements
Is the percentage of teenage motor vehicle deaths the same for each time period? Conduct a hypothesis test at the 1% level.
Ho: The percent of teenage motor vehicle deaths is the same for each time period.
Ha: The percent of teenage motor vehicle deaths is not the same for each time period.
7 Let’s look at this more closely
In this problem, what type of data are we considering? Categorical data, that is, the time of day.
How many classes are our data divided into? Eight different classes.
8 Put another way…
We want to see whether the distribution of our data is consistent with the hypothesized distribution. In this case, we want to see whether the distribution of deaths is uniform: 100% / 8 = 12.5% per period.
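9 So how can we do this?
We have our observed occurrences. Are these consistent with our hypothesis? What should we compare them with? Expected values. Since the observed counts total 99, each of the 8 periods has an expected count of 99/8 ≈ 12.38.
Time          Observed   Expected
12-3 AM          17        12.38
3-6 AM            8        12.38
6-9 AM            8        12.38
9 AM-noon         6        12.38
Noon-3 PM        10        12.38
3-6 PM           16        12.38
6-9 PM           15        12.38
9 PM-12 AM       19        12.38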
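A minimal Python sketch of where the expected values come from, assuming (as the slides do) that the reported percentages can be treated as counts. Under the uniform null hypothesis each of the k = 8 periods gets n/k of the total; the observed values sum to 99 rather than 100, presumably because of rounding, so each expected count is 12.375 (shown as 12.38). The helper name `uniform_expected` is hypothetical.

```python
# Observed counts from the GEICO example, in time-period order (12-3 AM first).
observed = [17, 8, 8, 6, 10, 16, 15, 19]


def uniform_expected(counts):
    """Expected count per category when H0 says all categories are equally likely."""
    n = sum(counts)  # total number of observations
    k = len(counts)  # number of categories (time periods)
    return [n / k] * k


expected = uniform_expected(observed)
# Large-sample condition from the assumptions: every expected count must be at least 5.
large_enough = all(e >= 5 for e in expected)
print(sum(observed), expected[0], large_enough)  # prints: 99 12.375 True
```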
16 Graphing calculator: STAT
Inputs: observed data list, expected data list, degrees of freedom.
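The calculator does the arithmetic for us; as a cross-check, here is a rough Python sketch of the same computation (observed list, expected list and degrees of freedom in; statistic and p-value out). The function names are hypothetical, and the p-value comes from a standard power-series formula for the chi-squared upper tail, which is an assumption here and not necessarily how the calculator computes it.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-squared variable with df degrees of freedom.

    Uses the power series for the regularized lower incomplete gamma function
    P(df/2, x/2) and returns 1 - P (adequate for the moderate values used here).
    """
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a  # n = 0 term of the series
    total = term
    n = 1
    while term > 1e-15 * total:
        term *= z / (a + n)
        total += term
        n += 1
    lower = math.exp(a * math.log(z) - z - math.lgamma(a)) * total
    return min(1.0, max(0.0, 1.0 - lower))


def gof_test(observed, expected, df=None):
    """Chi-squared goodness-of-fit test. Returns (statistic, df, p_value)."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    if df is None:
        df = len(observed) - 1  # k categories -> k - 1 degrees of freedom
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, df, chi2_sf(stat, df)


observed = [17, 8, 8, 6, 10, 16, 15, 19]
expected = [sum(observed) / len(observed)] * len(observed)
stat, df, p = gof_test(observed, expected)
print(f"chi2 = {stat:.2f}, df = {df}, p = {p:.3f}")  # chi2 = 13.73, df = 7, p = 0.056
```

With the GEICO counts this reproduces a statistic of about 13.73 on 7 degrees of freedom and the p-value of about .056 used in the conclusion.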
17 Assumptions
We have two assumptions for this test. First, the observed cell counts are based on a random sample (our sample is random). Second, our sample is large. How do we determine large? The expected cell counts must all be greater than or equal to 5.
18 Our conclusions?
The same as we’ve always done: we reject or fail to reject the null hypothesis based on a comparison of the p-value and our significance level, and we interpret our conclusion in the context of our alternative hypothesis.
19 Let’s summarize
Use the same nine steps for hypothesis testing: identify the parameter; state the null hypothesis; state the alternative hypothesis; choose the significance level; identify the test statistic; check the assumptions; calculate the test statistic; determine the p-value; make your conclusion.
20 Let’s finish the GEICO problem
Parameter: the proportions of teenage motor vehicle deaths in each time period.
Null hypothesis, Ho: The percent of teenage motor vehicle deaths is the same for each time period.
Alternative hypothesis, Ha: The percent of teenage motor vehicle deaths is not the same for each time period.
Significance level: α = 0.01.
21 Continuing…
Test statistic: χ² = Σ (observed − expected)² / expected.
Assumptions: The sample is random. The sample is large: every expected count (12.38) is at least 5.
22 Continuing…
Calculate the test statistic, using the observed counts 17, 8, 8, 6, 10, 16, 15, 19 (12-3 AM through 9 PM-12 AM) and an expected count of 12.38 for every period.
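Working the sum explicitly, with the unrounded expected count 99/8 = 12.375 for every period (the two periods with 8 observed deaths contribute equal terms, so that term appears twice):

```latex
\chi^2 = \frac{(17-12.375)^2 + 2\,(8-12.375)^2 + (6-12.375)^2 + (10-12.375)^2 + (16-12.375)^2 + (15-12.375)^2 + (19-12.375)^2}{12.375}
       = \frac{169.875}{12.375} \approx 13.73, \qquad df = 8 - 1 = 7
```

With 7 degrees of freedom, this statistic corresponds to the p-value of about .056 reported in the conclusion.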
23 Conclusion
We fail to reject the null hypothesis because the p-value (.056) is greater than the significance level (.01). The data do not provide convincing evidence that teenage motor vehicle deaths are distributed differently among the time periods.
24 Homework 7-1
Read Section 12.1 in the textbook. Exercises 12.10, 12.12, 12.14.
25 Let’s try this for some practice
Using a chi-squared goodness-of-fit test, investigate whether it’s reasonable to assume the random number table is random. Use a significance level of .05.
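One way to organize the practice problem, sketched in Python under stated assumptions: the null hypothesis is that the digits 0 to 9 are equally likely, so with n digits each expected count is n/10 and there are 10 − 1 = 9 degrees of freedom. Instead of a p-value, this sketch compares the statistic with the α = .05 critical value 16.919 from a standard chi-squared table. The function names and the idea of pasting the table's digits in as a string are hypothetical.

```python
from collections import Counter

# Critical value for chi-squared with df = 10 - 1 = 9 at alpha = 0.05 (standard chi-squared table).
CRITICAL_VALUE = 16.919


def digit_gof(digits):
    """Goodness-of-fit statistic for H0: the digits 0-9 are equally likely.

    Returns (statistic, expected count per digit).
    """
    ds = [ch for ch in digits if ch in "0123456789"]  # skip spaces/line breaks copied from the table
    if len(ds) < 50:
        raise ValueError("need at least 50 digits so every expected count (n/10) is at least 5")
    expected = len(ds) / 10
    counts = Counter(ds)  # missing digits count as 0
    stat = sum((counts[str(d)] - expected) ** 2 / expected for d in range(10))
    return stat, expected


def fail_to_reject(digits):
    """True if the statistic is below the alpha = 0.05 critical value (data consistent with H0)."""
    stat, _ = digit_gof(digits)
    return stat < CRITICAL_VALUE
```

Usage would look like `fail_to_reject(table_digits)`, where `table_digits` is a hypothetical string holding digits transcribed from the random number table.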
26 Tests for Homogeneity and Independence
In a test for homogeneity, we take several samples and look at one characteristic. For example, take samples of 1000 people from 4 different countries and ask how they feel about whether the use of torture against suspected terrorists is justified. In this case we’d like to see whether the responses are distributed the same way (homogeneous) in every country.
In a test for independence, we take one sample and look at two characteristics. For example, take a sample of 300 adults and determine each person’s political philosophy and which television news station they watch. In this case we’d like to see whether political philosophy and news station are independent.
27 Let’s do an example of each
First, let’s do a test for independence. Big Office is a chain of large office supply stores that sells an extensive line of desktop and laptop computers. Company executives want to know whether the demands for these types of computers are related in any way. They might act as complementary products, or their sales may not be related. Big Office randomly selected 250 business days and categorized the demand for each type of computer as Low, MedLow, MedHi, or Hi.
DesktopsLowMedLowMedHiHi417543823222780162014701019115738777263250Laptops
28 So how many samples do we have? Is the data we are collecting categorical or numerical? How many characteristics are we investigating? How many classes are within those characteristics?
29 Hypothesis statements
Ho: The two variables are independent.
Ha: The two variables are not independent.
30 How do we test for independence?
LowMedLowMedHiHi417543823222780162014701019115738777263250
Recall that if events A and B are independent, then P(A and B) = P(A)·P(B). So we’ll assume the two variables are independent. Then we’ll determine the expected count for each cell, and we’ll use the differences between the expected and observed counts for our test.
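The independence rule on this slide turns directly into the expected-count formula. Writing R_i for the total of row i, C_j for the total of column j, n for the grand total, and O_ij for the observed count in row i, column j (notation introduced here), estimate P(row i) by R_i/n and P(column j) by C_j/n, then multiply by n. The same chi-squared statistic as before is summed over all r × c cells.

```latex
E_{ij} = n \cdot \frac{R_i}{n} \cdot \frac{C_j}{n} = \frac{R_i \, C_j}{n},
\qquad
\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}
```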
32 Assumptions
The sample is random. The sample is large: each expected cell count is at least 5.
33 At this point…
We can calculate the p-value using the chi-squared table, the chi-squared CDF function on our calculator, or the chi-squared test.
34 Let’s look at the chi-squared test
This test is a piece of cake…
First, put your observed matrix into the calculator. Then go to STAT → TEST → χ²-Test and select Calculate. The calculator creates the expected matrix. Output: the value of your test statistic and the p-value.
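A Python sketch of what the calculator's chi-squared test does with the observed matrix: build the expected matrix from the row and column totals, sum the chi-squared contributions, and find the p-value with (r − 1)(c − 1) degrees of freedom. The function names are hypothetical, and the p-value helper uses a standard power-series formula for the chi-squared upper tail, an assumption rather than the calculator's internal method. Because the tests for independence and homogeneity are performed the same way, the same function applies to both.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-squared variable with df degrees of freedom.

    Series for the regularized lower incomplete gamma P(df/2, x/2); returns 1 - P.
    """
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 1
    while term > 1e-15 * total:
        term *= z / (a + n)
        total += term
        n += 1
    lower = math.exp(a * math.log(z) - z - math.lgamma(a)) * total
    return min(1.0, max(0.0, 1.0 - lower))


def chi2_two_way(table):
    """Chi-squared test on an r x c table of observed counts (a list of rows).

    Returns (expected, statistic, df, p_value).
    """
    r, c = len(table), len(table[0])
    row_totals = [sum(row) for row in table]
    col_totals = [sum(row[j] for row in table) for j in range(c)]
    n = sum(row_totals)
    # Expected count = (row total)(column total) / (grand total)
    expected = [[row_totals[i] * col_totals[j] / n for j in range(c)] for i in range(r)]
    stat = sum(
        (table[i][j] - expected[i][j]) ** 2 / expected[i][j]
        for i in range(r)
        for j in range(c)
    )
    df = (r - 1) * (c - 1)
    return expected, stat, df, chi2_sf(stat, df)


expected, stat, df, p = chi2_two_way([[10, 20], [30, 40]])  # small made-up table
print(expected, df)  # [[12.0, 18.0], [28.0, 42.0]] 1
```

The 2 × 2 table above is invented purely to show the mechanics; for the Big Office problem the observed matrix would be the 4 × 4 table of demand counts, giving (4 − 1)(4 − 1) = 9 degrees of freedom.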
35 So let’s do this problem using our nine steps of hypothesis testing.
36 Chi-Squared Test for Homogeneity
The paper “No Evidence of Impaired Neurocognitive Performance in Collegiate Soccer Players” compared collegiate soccer players, athletes in sports other than soccer, and a group of students who were not involved in collegiate sports with respect to head injuries. Three independent random samples were chosen, and each person in the sample was asked to complete a medical history survey. The following two-way contingency table was created based on reported concussions.
Concussions     0     1     2    3+   Total
Soccer         45    25    11    10     91
Other          68    15     8     5     96
Non            45     5     3     0     53
Total         158    45    22    15    240
37 So how many samples do we have? Is the data we are collecting categorical or numerical? How many characteristics are we investigating? How many classes are within those characteristics?
38 Hypothesis statements
Ho: The populations are homogeneous; that is, the category proportions are the same for all populations.
Ha: The populations are not homogeneous; that is, the category proportions are not the same for all populations.
40 Assumptions
The samples are random and independent. The samples are large: the expected cell counts are at least 5.
41 Everything else is the same
Let’s finish this test using our nine steps.
Concussions     0     1     2    3+   Total
Soccer         45    25    11    10     91
Other          68    15     8     5     96
Non            45     5     3     0     53
Total         158    45    22    15    240
42 To summarize…
Test for independence: one sample, two characteristics. Assumptions: the sample is random; the sample is large.
Test for homogeneity: multiple samples, one characteristic. Assumptions: the samples are independent and random; the samples are large.