Author(s): Brenda Gunderson, Ph.D., 2011 License: Unless otherwise noted, this material is made available under the terms of the Creative Commons Attribution–Non-commercial–

1 Author(s): Brenda Gunderson, Ph.D., 2011 License: Unless otherwise noted, this material is made available under the terms of the Creative Commons Attribution–Non-commercial–Share Alike 3.0 License: http://creativecommons.org/licenses/by-nc-sa/3.0/ We have reviewed this material in accordance with U.S. Copyright Law and have tried to maximize your ability to use, share, and adapt it. The citation key on the following slide provides information about how you may share and adapt this material. Copyright holders of content included in this material should contact open.michigan@umich.edu with any questions, corrections, or clarification regarding the use of content. For more information about how to cite these materials visit http://open.umich.edu/education/about/terms-of-use. Any medical information in this material is intended to inform and educate and is not a tool for self-diagnosis or a replacement for medical evaluation, advice, diagnosis or treatment by a healthcare professional. Please speak to your physician if you have questions about your medical condition. Viewer discretion is advised: Some medical content is graphic and may not be suitable for all viewers.

2 Attribution Key for more information see: http://open.umich.edu/wiki/AttributionPolicy Use + Share + Adapt Make Your Own Assessment Creative Commons – Attribution License Creative Commons – Attribution Share Alike License Creative Commons – Attribution Noncommercial License Creative Commons – Attribution Noncommercial Share Alike License GNU – Free Documentation License Creative Commons – Zero Waiver Public Domain – Ineligible: Works that are ineligible for copyright protection in the U.S. (17 USC § 102(b)) *laws in your jurisdiction may differ Public Domain – Expired: Works that are no longer protected due to an expired copyright term. Public Domain – Government: Works that are produced by the U.S. Government. (17 USC § 105) Public Domain – Self Dedicated: Works that a copyright holder has dedicated to the public domain. Fair Use: Use of works that is determined to be Fair consistent with the U.S. Copyright Act. (17 USC § 107) *laws in your jurisdiction may differ Our determination DOES NOT mean that all uses of this 3rd-party content are Fair Uses and we DO NOT guarantee that your use of the content is Fair. To use this content you should do your own independent analysis to determine whether or not your use will be Fair. { Content the copyright holder, author, or law permits you to use, share and adapt. } { Content Open.Michigan believes can be used, shared, and adapted because it is ineligible for copyright. } { Content Open.Michigan has used under a Fair Use determination. }

3 Inference for Categorical Variables (chpt 15, pg 205)
1. Goodness of Fit Test: this test assesses whether a particular discrete model is a good fitting model for a discrete characteristic, based on a random sample from the population.
2. Test of Homogeneity: this test assesses whether two or more populations are homogeneous (alike) with respect to the distribution of some discrete (categorical) variable.
3. Test of Independence: this test helps us assess whether two discrete (categorical) variables are independent for a population, or whether there is an association between the two variables.

4 The Chi-square Tests
All three tests are based on the $X^2$ test statistic, $X^2 = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}}$, which, if $H_0$ is true and the assumptions hold, follows a chi-square distribution $\chi^2(df)$.
From Utts, Jessica M. and Robert F. Heckard. Mind on Statistics, Fourth Edition. 2012. Used with permission.

5 The Chi-square Distribution
If we have a chi-square distribution with df = degrees of freedom, then the...
Mean is equal to ___________
Variance is equal to __________
Standard deviation is equal to ________
From Utts, Jessica M. and Robert F. Heckard. Mind on Statistics, Fourth Edition. 2012. Used with permission.

6 Table A.5 Chi-square distribution
You can also use the test() function in R to find the exact p-value.
From Utts, Jessica M. and Robert F. Heckard. Mind on Statistics, Fourth Edition. 2012. Used with permission.

7 Try It! Consider the $\chi^2(4)$ distribution… pg 206
a. What is the mean?
b. What is the median?
c. How likely is it to see a value of 4 or larger?
d. How likely is it to see a value of 10.3 or larger?
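The four questions on this slide can be checked numerically. The following is a minimal Python sketch using only the standard library; the helper names `chi2_sf` and `chi2_median` are hypothetical (they are not from the course materials, which use Table A.5 or R), and the upper-tail probability is built from the standard recurrence for the regularized incomplete gamma function.

```python
import math

def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) for a chi-square distribution with integer df."""
    if x <= 0:
        return 1.0
    z = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-z)                # starting value for even df (df = 2)
    else:
        a, q = 0.5, math.erfc(math.sqrt(z))     # starting value for odd df (df = 1)
    # Recurrence: Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1)
    while a < df / 2.0:
        q += z ** a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return q

def chi2_median(df: int) -> float:
    """Median found by bisection: the point where the upper tail equals 0.5."""
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2.0
        if chi2_sf(mid, df) > 0.5:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

df = 4
mean = df                        # the mean of a chi-square distribution equals df
median = chi2_median(df)         # a little below the mean (right-skewed distribution)
p_at_4 = chi2_sf(4.0, df)        # chance of a value of 4 or larger
p_at_10_3 = chi2_sf(10.3, df)    # chance of a value of 10.3 or larger

print(f"mean = {mean}, median = {median:.3f}")
print(f"P(X >= 4) = {p_at_4:.3f}, P(X >= 10.3) = {p_at_10_3:.4f}")
```

The median comes out smaller than the mean because the chi-square distribution is skewed to the right.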

8 Big Idea about the Chi-square Tests
The data consist of observed counts.
We compute expected counts under $H_0$ – these counts are what we would expect (on average) if the corresponding $H_0$ were true.
Compare observed and expected counts using the $X^2$ test statistic. The statistic measures how close the observed counts are to the expected counts under $H_0$. If this distance is large, we have support for the alternative $H_a$.

9 Test of Goodness of Fit
to assess if a particular discrete model is a good fitting model for a discrete characteristic, based on a random sample from the population.
- 1 population
- 1 random sample
- 1 response which is categorical or discrete

10 Toll Road
Scenario: 1 population of interest – all cars exiting a toll road w/ four booths at exit
Question: Are the 4 booths used equally often?
Data: 1 random sample of 100 cars
k = number of categories = 4
Let $p_i$ = proportion of cars using booth i
$H_0$: $p_1$ = _______, $p_2$ = _______, $p_3$ = _______, $p_4$ = _______
$H_a$:

              Booth 1   Booth 2   Booth 3   Booth 4
Obs # cars         26        20        28        26

11 Expected Counts
$H_0$: $p_1 = 0.25$, $p_2 = 0.25$, $p_3 = 0.25$, $p_4 = 0.25$
100 cars and 4 booths … If the booths are used equally often ($H_0$ is true), then we would expect …
... cars to use Booth #1
... cars to use Booth #2
... cars to use Booth #3
... cars to use Booth #4
Expected Counts: $E = np$

12 Toll Road Test Statistic
$X^2$ =

                         Booth 1    Booth 2    Booth 3    Booth 4
Obs # cars (expected)    26 (25)    20 (25)    28 (25)    26 (25)
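The arithmetic behind the blank on this slide can be sketched in a few lines of Python. This is an illustrative calculation from the observed counts and the equal-use null proportions given on the slides; the variable names are assumptions chosen for the example.

```python
# Observed counts for Booths 1-4, taken from the slide.
observed = [26, 20, 28, 26]
n = sum(observed)                          # 100 cars in the sample

# Under H0 each booth is used equally often, so every p_i is 0.25.
null_probs = [0.25, 0.25, 0.25, 0.25]
expected = [n * p for p in null_probs]     # E = np for each booth

# X^2 = sum over categories of (observed - expected)^2 / expected
x2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

print("expected counts:", expected)
print("X^2 =", round(x2, 2))
```

Each term is a squared difference scaled by its expected count, so the four contributions here are 1/25, 25/25, 9/25, and 1/25.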

13 Toll Road Example
Do you think a value of $X^2$ = ___________ is large enough to reject $H_0$?
If $H_0$ is true, then $X^2$ has the $\chi^2$ distribution with df = _________________
Use this null distribution to find the p-value!

14 p-value for Toll Road Example page 209
Observed value = __________
df = ___________
Are the results statistically significant at the 5% level?
Conclusion at a 5% level: It appears that....
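A p-value for the toll road data can be sketched with the standard library alone. The example below assumes df = k − 1 = 3, the usual degrees of freedom for a goodness of fit test with fully specified null proportions; the helper `chi2_sf` is a hypothetical name, not a function from the course materials.

```python
import math

def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) for a chi-square distribution with integer df."""
    if x <= 0:
        return 1.0
    z = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-z)                # starting value for even df
    else:
        a, q = 0.5, math.erfc(math.sqrt(z))     # starting value for odd df
    # Recurrence: Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1)
    while a < df / 2.0:
        q += z ** a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return q

observed = [26, 20, 28, 26]
expected = [25.0, 25.0, 25.0, 25.0]             # n * 0.25 for each booth
x2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

df = len(observed) - 1                          # k - 1 categories free to vary
p_value = chi2_sf(x2, df)
significant = p_value < 0.05                    # decision at the 5% level

print(f"X^2 = {x2:.2f}, df = {df}, p-value = {p_value:.3f}")
print("reject H0" if significant else "fail to reject H0")
```

With a p-value this large, the sample gives no evidence that the four booths are used unequally.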

15 Aside: Using our Frame of Reference
If we have a chi-square distribution with df = degrees of freedom, then the mean is equal to df and the standard deviation is equal to $\sqrt{2\,df}$.
So, if $H_0$ is true, we would expect our $X^2$ test statistic to be about ______ give or take about ___________.
Since we reject $H_0$ for large values of $X^2$, and we only observed a value of ___________ (even less than we would expect to see under $H_0$), we do not have enough evidence to reject $H_0$.
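The frame-of-reference check takes only a few lines. This sketch assumes df = 3 for the toll road test (k − 1 with k = 4 booths) and recomputes the observed statistic from the counts on the earlier slides.

```python
import math

df = 3                                   # assumed: k - 1 for the four toll booths
null_mean = df                           # mean of a chi-square distribution is df
null_sd = math.sqrt(2 * df)              # standard deviation is sqrt(2 * df)

observed = [26, 20, 28, 26]
expected = [25.0, 25.0, 25.0, 25.0]
x2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# A statistic below the null mean sits on the "small" side of the distribution,
# and we only reject H0 for large values.
below_mean = x2 < null_mean

print(f"expect about {null_mean} give or take about {null_sd:.2f}")
print(f"observed X^2 = {x2:.2f}, below the null mean: {below_mean}")
```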

16 Goodness of Fit Test Summary page 209

17 Try It! Crossbreeding of Peas pg 210
Mendel data from second generation seeds resulting from crossing yellow round peas and green wrinkled peas. Test the theory that the four types occur with probabilities 9/16, 3/16, 3/16, and 1/16 respectively at α = 0.01.
$H_0$:

Yellow Round   Yellow Wrinkled   Green Round   Green Wrinkled
         315               101           108               32

18 Try It! Crossbreeding of Peas
$H_0$: $p_1 = 9/16$, $p_2 = 3/16$, $p_3 = 3/16$, and $p_4 = 1/16$
$X^2$ =

Yellow Round   Yellow Wrinkled   Green Round   Green Wrinkled
315 ( )        101 ( )           108 ( )       32 ( )

Find the p-value and be ready to click in your answer.

19 What can you say about the p-value?
A) p-value < 0.01
B) p-value = 0.47
C) p-value > 0.50
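The pea example can be worked the same way as the toll road example. The sketch below is an illustrative calculation from the counts and null probabilities on the slides, assuming df = k − 1 = 3; `chi2_sf` is a hypothetical helper built on the standard library, not part of the course materials.

```python
import math

def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) for a chi-square distribution with integer df."""
    if x <= 0:
        return 1.0
    z = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-z)                # starting value for even df
    else:
        a, q = 0.5, math.erfc(math.sqrt(z))     # starting value for odd df
    # Recurrence: Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1)
    while a < df / 2.0:
        q += z ** a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return q

# Yellow round, yellow wrinkled, green round, green wrinkled
observed = [315, 101, 108, 32]
null_probs = [9 / 16, 3 / 16, 3 / 16, 1 / 16]

n = sum(observed)
expected = [n * p for p in null_probs]          # E = np for each type
x2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

df = len(observed) - 1
p_value = chi2_sf(x2, df)

print("expected counts:", expected)
print(f"X^2 = {x2:.2f}, df = {df}, p-value = {p_value:.3f}")
```

Note that 0.47 is the value of the test statistic, not the p-value, which is why option B is a tempting wrong answer.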

20 Yes or No
Do these data support Mendel’s theory?
In fact, the results look almost “TOO GOOD” – Mendel had a fictitious assistant … perhaps fictitious data too? Or did the assumptions not hold? Or did we just observe a very unusually “too good” result?
Try the Desired Vacation Place example on your own – to be posted on CTools!

21 Test of Homogeneity page 211
to assess if the distribution for 1 discrete (categorical) variable is the same for 2 or more populations.
- 2 or more populations
- 2 or more (independent) random samples
- 1 response which is categorical or discrete

22 Ice Cream Preference
Scenario: Two populations of interest – preschool boys & preschool girls
Question: Is Ice Cream Preference the same for boys and girls?
Data: 1 random sample of 75 preschool boys
1 random sample of 75 preschool girls
The two samples are independent
Find row and column totals … Note: column totals are known even before measuring preference.

Ice Cream Preference   Boys   Girls
Vanilla (V)              25      26
Chocolate (C)            30      23
Strawberry (S)           20      26

23 Ice Cream Preference Null Hypothesis
Scenario: Two populations of interest – preschool boys & preschool girls
Question: Is Ice Cream Preference the same for boys and girls?
$H_0$: The distribution of ice cream preference is the same for the two populations, boys and girls.
We state it in words (the mathematical form from page 211 follows)…
P(prefer flavor i | girl) = P(prefer flavor i | boy) = P(prefer flavor i)

24 Expected Counts page 212
Strawberry: Since there were ____ children who preferred Strawberry overall, if the distributions for boys & girls are the same, then we would expect ____ of these children to be boys and the remaining ____ of these children to be girls.
Note: if not 50% boys, 50% girls; would need to adjust.

25 Expected Counts
Chocolate: Since there were ____ children who preferred Chocolate overall, if the distributions for boys & girls are the same, then we would expect ____ of these children to be boys and the remaining ____ of these children to be girls.

26 Expected Counts
Vanilla: Since there were ____ children who preferred Vanilla overall, if the distributions for boys & girls are the same, then we would expect ____ of these children to be boys and the remaining ____ of these children to be girls.

27 Observed and Expected Counts
A closer look … Overall P(child prefers vanilla) = 51/150 = p; if $H_0$ is true, this vanilla rate should apply to boys and to girls.
Expected number of boys preferring vanilla = np = 75(51/150)
Called the Cross-Product Rule

Ice Cream Preference   Boys     Girls    Total
Vanilla (V)            25 ( )   26 ( )      51
Chocolate (C)          30 ( )   23 ( )      53
Strawberry (S)         20 ( )   26 ( )      46
Total                  75       75         150

28 Ice Cream Preference Test Statistic
$X^2$ =

Ice Cream Preference   Boys        Girls       Total
Vanilla (V)            25 (25.5)   26 (25.5)      51
Chocolate (C)          30 (26.5)   23 (26.5)      53
Strawberry (S)         20 (23)     26 (23)        46
Total                  75          75            150
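The expected counts in parentheses and the resulting test statistic can be reproduced with a short Python sketch. It applies the cross-product rule (row total × column total ÷ overall total) to the observed table from the slides; the variable names are assumptions chosen for the example.

```python
# Rows: Vanilla, Chocolate, Strawberry. Columns: Boys, Girls.
observed = [
    [25, 26],
    [30, 23],
    [20, 26],
]

row_totals = [sum(row) for row in observed]            # 51, 53, 46
col_totals = [sum(col) for col in zip(*observed)]      # 75, 75
n = sum(row_totals)                                    # 150 children

# Cross-product rule: expected = (row total * column total) / overall total
expected = [[r * c / n for c in col_totals] for r in row_totals]

# X^2 sums (observed - expected)^2 / expected over all six cells
x2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

print("expected counts:", expected)
print("X^2 =", round(x2, 2))
```

Because the two samples are the same size, each flavor's expected count is simply half of its row total.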

29 Ice Cream Preference Example
Is the value of ____________ large enough to reject $H_0$?
If $H_0$ is true, then $X^2$ has the $\chi^2$ distribution with df = _________________
Motivation: If we knew 50% were girls → ____% boys? If we knew 66% liked C & S → _____% V?

30 p-value for Ice Cream Preference Example
Observed value = __________
df = ___________
Decision at a 5% significance level? Reject $H_0$ / Fail to reject $H_0$
Conclusion at a 5% level: It appears that....
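The p-value and decision can be sketched as follows. The example assumes df = (rows − 1)(columns − 1) = 2, the usual degrees of freedom for a two-way table; for df = 2 the chi-square upper-tail probability has the closed form exp(−x/2), so no table or statistics library is needed.

```python
import math

# Observed and expected counts from the test statistic slide (Boys, Girls).
observed = [[25, 26], [30, 23], [20, 26]]
expected = [[25.5, 25.5], [26.5, 26.5], [23.0, 23.0]]

x2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

rows, cols = len(observed), len(observed[0])
df = (rows - 1) * (cols - 1)             # (3 - 1) * (2 - 1) = 2

# For df = 2 only: P(X >= x) = exp(-x / 2)
p_value = math.exp(-x2 / 2.0)
reject = p_value < 0.05                  # decision at the 5% significance level

print(f"X^2 = {x2:.2f}, df = {df}, p-value = {p_value:.3f}")
print("reject H0" if reject else "fail to reject H0")
```

The closed form used here is specific to df = 2; other degrees of freedom need a table, R, or a general tail-probability routine.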

