Published by Tyler Callahan
Modified over 2 years ago
Copyright © 2010 Pearson Education, Inc. Slide
Solution: B
Chapter 26: Comparing Counts
Goodness-of-Fit
A test of whether the distribution of counts in one categorical variable matches the distribution predicted by a model is called a goodness-of-fit test.
Assumptions and Conditions
Counted Data Condition: Check that the data are counts for the categories of a categorical variable.
Independence Assumption: The counts in the cells should be independent of each other.
Randomization Condition: The individuals who have been counted and whose counts are available for analysis should be a random sample from some population.
Sample Size Assumption: We must have enough data for the methods to work.
Expected Cell Frequency Condition: We should expect to see at least 5 individuals in each cell.
Calculations
The test statistic, called the chi-square (or chi-squared) statistic, is found by summing, over all cells, the squared deviations between the observed and expected counts, each divided by the expected count:
$$\chi^2 = \sum_{\text{all cells}} \frac{(\text{Obs} - \text{Exp})^2}{\text{Exp}}$$
Calculations (cont.)
The chi-square models are actually a family of distributions indexed by degrees of freedom (much like the t-distribution).
The number of degrees of freedom for a goodness-of-fit test is n - 1, where n is the number of categories.
The chi-square statistic is used only for testing hypotheses, not for constructing confidence intervals.
If the observed counts don't match the expected counts, the statistic will be large; it can't be "too small." So the chi-square test is always one-sided.
If the calculated statistic value is large enough, we'll reject the null hypothesis.
The Chi-Square Calculation
1. Find the expected values: Every model gives a hypothesized proportion for each cell. The expected value is the total number of observations times this proportion.
2. Compute the residuals: Once you have expected values for each cell, find the residuals, Observed - Expected.
3. Square the residuals.
4. Compute the components: For each cell, divide the squared residual by the expected count, (Observed - Expected)^2 / Expected.
5. Find the sum of the components (that's the chi-square statistic).
6. Find the degrees of freedom. It's equal to the number of cells minus one.
7. Test the hypothesis. Use your chi-square statistic to find the P-value. (Remember, you'll always have a one-sided test.) Large chi-square values mean lots of deviation from the hypothesized model, so they give small P-values.
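The seven steps of the chi-square calculation can be sketched in Python. This is a minimal illustration, not part of the original slides: the function names `goodness_of_fit` and `chi_square_p_value` and the three-category data are hypothetical, and the P-value routine uses a standard series for the incomplete gamma function that should be adequate for the moderate statistics seen in classroom examples.

```python
import math


def chi_square_p_value(statistic, df):
    """Upper-tail P-value for a chi-square model with df degrees of freedom.

    Computed as 1 minus the regularized lower incomplete gamma function,
    evaluated with its power series (an illustrative implementation).
    """
    if statistic <= 0:
        return 1.0
    a, x = df / 2.0, statistic / 2.0
    term = 1.0 / a
    total = term
    n = 1
    while term > 1e-15 * total:
        term *= x / (a + n)
        total += term
        n += 1
    lower = total * math.exp(-x + a * math.log(x) - math.lgamma(a))
    return max(0.0, 1.0 - lower)


def goodness_of_fit(observed, proportions):
    """Follow the slide's steps 1-7 for a goodness-of-fit test."""
    total = sum(observed)
    # Step 1: expected count = total observations * hypothesized proportion
    expected = [total * p for p in proportions]
    # Step 2: residuals = observed - expected
    residuals = [o - e for o, e in zip(observed, expected)]
    # Steps 3-4: square each residual and divide by its expected count
    components = [r ** 2 / e for r, e in zip(residuals, expected)]
    # Step 5: the chi-square statistic is the sum of the components
    statistic = sum(components)
    # Step 6: degrees of freedom = number of cells - 1
    df = len(observed) - 1
    # Step 7: one-sided (upper-tail) P-value
    return expected, components, statistic, df, chi_square_p_value(statistic, df)


# Hypothetical counts and model proportions, chosen only for illustration
observed = [30, 50, 20]
proportions = [0.25, 0.50, 0.25]
expected, components, statistic, df, p_value = goodness_of_fit(observed, proportions)
print(expected)    # [25.0, 50.0, 25.0]
print(statistic)   # 2.0
print(df, round(p_value, 4))
```

Every expected count here is at least 5, so the Expected Cell Frequency Condition holds for this made-up data. With a statistic of 2.0 on 2 degrees of freedom the P-value is about 0.37, so these counts would not give reason to reject the model.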
Comparing Observed Distributions
A test comparing the distribution of counts for two or more groups on the same categorical variable is called a chi-square test of homogeneity.
The statistic that we calculate for this test is identical to the chi-square statistic for goodness-of-fit.
In this test, however, we ask whether choices are the same among different groups (i.e., there is no model).
The expected counts are found directly from the data and we have different degrees of freedom.
Assumptions and Conditions
The assumptions and conditions are the same as for the chi-square goodness-of-fit test:
Counted Data Condition: The data must be counts.
Randomization Condition and 10% Condition: As long as we don't want to generalize, we don't have to check these conditions.
Expected Cell Frequency Condition: The expected count in each cell must be at least 5.
Calculations
We calculate the chi-square statistic as we did in the goodness-of-fit test:
$$\chi^2 = \sum_{\text{all cells}} \frac{(\text{Obs} - \text{Exp})^2}{\text{Exp}}$$
In this situation we have (R - 1)(C - 1) degrees of freedom, where R is the number of rows and C is the number of columns.
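For a two-way table, the slides say only that the expected counts come "directly from the data." The sketch below assumes the usual rule, expected count = (row total × column total) / grand total, and then computes the statistic and the (R - 1)(C - 1) degrees of freedom. The function names and the 2 × 2 table are hypothetical, chosen only for illustration.

```python
def expected_counts(table):
    """Expected count for each cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def chi_square_two_way(table):
    """Chi-square statistic and degrees of freedom for an R x C table of counts."""
    expected = expected_counts(table)
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    # (R - 1)(C - 1) degrees of freedom
    df = (len(table) - 1) * (len(table[0]) - 1)
    return expected, statistic, df


# Hypothetical counts: rows are groups, columns are categories
table = [[10, 20],
         [30, 40]]
expected, statistic, df = chi_square_two_way(table)
print(expected)   # [[12.0, 18.0], [28.0, 42.0]]
print(round(statistic, 4), df)
```

The same calculation applies whether the table is read as a test of homogeneity or a test of independence; only the question being asked differs.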
Examining the Residuals
When we reject the null hypothesis, it's always a good idea to examine residuals.
For chi-square tests, we want to work with standardized residuals, since we want to compare residuals for cells that may have very different counts.
To standardize a cell's residual, we just divide by the square root of its expected value:
$$c = \frac{\text{Obs} - \text{Exp}}{\sqrt{\text{Exp}}}$$
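Standardizing a residual is a one-line calculation. The sketch below uses a hypothetical helper, `standardized_residuals`, and made-up counts; note that squaring each standardized residual gives that cell's component of the chi-square statistic.

```python
import math


def standardized_residuals(observed, expected):
    """(Observed - Expected) / sqrt(Expected) for each cell."""
    return [(o - e) / math.sqrt(e) for o, e in zip(observed, expected)]


# Hypothetical observed and expected counts for three cells
observed = [30, 50, 20]
expected = [25.0, 50.0, 25.0]
residuals = standardized_residuals(observed, expected)
print(residuals)  # [1.0, 0.0, -1.0]

# The squared standardized residuals are the chi-square components
print(sum(r ** 2 for r in residuals))  # 2.0
```

The sign shows the direction of the deviation: a positive value means the cell has more individuals than expected, a negative value means fewer.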
Independence
Contingency tables categorize counts on two (or more) variables so that we can see whether the distribution of counts on one variable is contingent on the other.
A test of whether the two categorical variables are independent examines the distribution of counts for one group of individuals classified according to both variables in a contingency table.
A chi-square test of independence uses the same calculation as a test of homogeneity.
Assumptions and Conditions
We still need counts and enough data so that the expected values are at least 5 in each cell.
If we're interested in the independence of variables, we usually want to generalize from the data to some population. In that case, we'll need to check that the data are a representative random sample from that population.
What have we learned?
All three methods we examined look at counts of data in categories and rely on chi-square models.
Goodness-of-fit tests compare the observed distribution of a single categorical variable to an expected distribution based on a theory or model.
Tests of homogeneity compare the distributions of several groups for the same categorical variable.
Tests of independence examine counts from a single group for evidence of an association between two categorical variables.
Example: Some researchers have proposed that children who are among the older ones in their class at school naturally perform better in sports, and that these children then get more coaching and encouragement. Could that make a difference in who makes it to the professional level in sports? We have the birthdates of every one of the 16,804 players who ever played in a major league game from 1975 who played through […]. Let's test whether the observed distribution of ballplayers' birth months shows just random fluctuations or whether it represents a real deviation from the national pattern. How can we find the expected counts?
Month   Ballplayer count   National birth %
1       137                8%
2       121                7%
3       116                8%
4       121                8%
5       126                8%
6       114                8%
7       102                9%
8       165                9%
9       134                9%
10      …                  …%
11      …                  …%
12      …                  …%
Total   …                  …%
Example: Are the assumptions and conditions met for performing a goodness-of-fit test?
Example: What are the hypotheses, and what does the test show?
Example: What's different about the distribution of birth months among major league ballplayers?
Example: Tiny black potato flea beetles can damage potato plants. These pests chew holes in the leaves, causing the plants to die. They can be killed with an insecticide, but canola oil spray has been suggested as a natural method of controlling the beetles. We gather 500 beetles and place them in 3 containers. The 200 beetles in the first container are sprayed with canola oil, the 200 in the second container are sprayed with insecticide, and the 100 in the last container serve as a control group. We wait 6 hours and count the number of surviving beetles in each container.
a. Why do we need the control group?
b. What would our null hypothesis be?
c. After the experiment is over, we could summarize the results in a table. Draw the table and calculate the degrees of freedom for our chi-squared test.
d. Altogether, 125 beetles survived. What's the expected count in the first cell: survivors among those sprayed with the natural spray?
e. If it turns out that only 40 of the beetles in the first container survived, what's the calculated component of chi-squared for that cell?
f. If the total calculated value of chi-squared for this table turns out to be around 10, would you expect the P-value of our test to be large or small? Explain.
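The arithmetic behind parts (c) through (f) of the beetle exercise can be checked with a short sketch. It assumes the results table has 3 rows (canola oil, insecticide, control) and 2 columns (survived, died), which the exercise does not state explicitly, and it uses the fact that for 2 degrees of freedom the upper-tail probability of a chi-square model is exp(-x/2). The variable names are illustrative only.

```python
import math

total_beetles = 500
canola_group = 200
total_survivors = 125

# (c) Assumed layout: 3 treatment rows by 2 outcome columns
rows, cols = 3, 2
df = (rows - 1) * (cols - 1)

# (d) Expected survivors in the canola group if survival is the same for all groups
expected_canola_survivors = canola_group * total_survivors / total_beetles

# (e) Component for that cell if 40 canola-sprayed beetles survived
observed_canola_survivors = 40
component = (observed_canola_survivors - expected_canola_survivors) ** 2 / expected_canola_survivors

# (f) Upper-tail P-value for a statistic of about 10 with 2 degrees of freedom
p_value = math.exp(-10 / 2)

print(df)                         # 2
print(expected_canola_survivors)  # 50.0
print(component)                  # 2.0
print(round(p_value, 4))          # 0.0067
```

A statistic near 10 on 2 degrees of freedom therefore corresponds to a small P-value, well below 0.05.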
Which of the following chi-square tests (goodness-of-fit, homogeneity, or independence) would you use in each of the following situations?
a. A restaurant manager wonders whether customers who dine on Friday nights have the same preferences among the four chef's special entrees as those who dine on Saturday nights. One weekend he has the wait staff record which entrees were ordered each night. Assuming these customers to be typical of all weekend diners, he'll compare the distributions of meals chosen Friday and Saturday.
b. Company policy calls for parking spaces to be assigned to everyone at random, but you suspect that may not be so. There are three lots of equal size: lot A, next to the building; lot B, a bit farther away; and lot C, on the other side of the highway. You gather data about employees at middle management level and above to see how many were assigned parking in each lot.
c. Is a student's social life affected by where the student lives? A campus survey asked a random sample of students whether they lived in a dorm, in off-campus housing, or at home and whether they had been out on a date 0, 1-2, 3-4, or 5 or more times in the past two months.
Homework: Pg ,2,3-9 (day 2) pg odd