Cross Tabs and Chi-Squared Testing for a Relationship Between Nominal/Ordinal Variables.



Cross Tabs and Chi-Squared

The test you choose depends on level of measurement:

Independent                     Dependent                       Statistical Test
Dichotomous                     Interval-ratio or dichotomous   Independent Samples t-test
Nominal or dichotomous          Nominal or dichotomous          Cross Tabs
Nominal or dichotomous          Interval-ratio or dichotomous   ANOVA
Interval-ratio or dichotomous   Interval-ratio                  Correlation and OLS Regression

We are asking whether there is a relationship between two nominal (or ordinal) variables; this includes dichotomous variables. (Even though one may use cross tabs for ordinal variables, it is generally better to treat them as interval variables and use more powerful statistical techniques whenever you can.)

Cross tabs and Chi-Squared will tell you whether classification on one nominal or ordinal variable is related to classification on a second nominal or ordinal variable.

For example:

Are rural Americans more likely to vote Republican in presidential races than urban Americans?
    Variables: Classification of Region and Party Vote

Are white people more likely to drive SUVs than blacks or Hispanics?
    Variables: Race and Type of Vehicle

The statistical focus will be on the number of people in a sample who are classified in patterned ways on two variables. Why? Means and standard deviations are meaningless for nominal variables.

The procedure starts with a “cross classification” of the cases in categories of each variable.

Example: Data on male and female support for SJSU football from 650 students put into a matrix

          Yes    No     Maybe   Total
Female:   185    200    65      450
Male:     80     65     55      200
Total:    265    265    120     650

In the example, I can see that the campus is divided on the issue. But are there associations between sex and attitudes?

But are there associations between sex and attitudes? An easy way to get more information is to convert the frequencies to percentages.

Example: Data on male and female support for SJSU football from 650 students put into a matrix

          Yes          No           Maybe        Total
Female:   185 (41%)    200 (44%)    65 (14%)     450 (99%)*
Male:     80 (40%)     65 (33%)     55 (28%)     200 (101%)*
Total:    265 (41%)    265 (41%)    120 (18%)    650 (100%)

*percentages do not add to 100 due to rounding
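The percentage conversion can be sketched in a few lines of Python. This is only an illustration: the dictionary layout and the function name `row_percentages` are assumptions made for the example, and only the counts come from the table.

```python
# Observed counts from the SJSU football example (rows: sex, columns: answer).
observed = {
    "Female": {"Yes": 185, "No": 200, "Maybe": 65},
    "Male": {"Yes": 80, "No": 65, "Maybe": 55},
}

def row_percentages(table):
    """Convert each row of counts into percentages of that row's total."""
    result = {}
    for group, counts in table.items():
        total = sum(counts.values())
        result[group] = {answer: 100 * n / total for answer, n in counts.items()}
    return result

percentages = row_percentages(observed)
for group, row in percentages.items():
    # One decimal place, so the unrounded values stay visible.
    print(group, {answer: f"{p:.1f}%" for answer, p in row.items()})
```

Percentaging within each row (each sex) is what lets us compare women and men even though the two groups differ in size.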

We can see that in the sample men are less likely to oppose football, but no more likely to say “yes” than women; men are more likely to say “maybe.”

Using percentages to describe relationships is valid statistical analysis: These are descriptive statistics! However, they are not inferential statistics.

What can we say about the population? Could we have gotten sample statistics like these from a population where there is no association between sex and attitudes about starting football?

This is where the Chi-Squared Test of Independence comes in handy.

The whole idea behind the Chi-Squared test of independence is to determine whether the patterns of frequencies in your cross classification table could have occurred by chance, or whether they represent systematic assignment to particular cells. For example, were women more likely to answer “no” than men or could the deviation in responses by sex have occurred because of random sampling or chance alone?

A number called Chi-Squared, χ², tells us whether the numbers in our sample deviate from what would be expected by chance. Its formula:

$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$

$f_o$ = observed frequency in each cell
$f_e$ = expected frequency in each cell

A bigger χ² will result as our sample data deviate more and more from what would be expected by chance. A big χ² will imply that there is a relationship between our two nominal variables.

Calculating χ² begins with the concept of a deviation of observed data from what is expected by chance alone.

Deviation in χ² = Observed frequency - Expected frequency

Observed frequency is just the number of cases in each cell of the cross classification table. For example, 185 women said “yes,” they support football at SJSU. 185 is the observed frequency.

Expected frequency is the number of cases that would be in a cell of the cross classification table if people in every group of one variable answered in the same proportions on the second variable.

Data on male and female support for SJSU football from 650 students

          Yes    No     Maybe   Total
Female:   185    200    65      450
Male:     80     65     55      200
Total:    265    265    120     650

Expected frequency (if our variables were unrelated): Since females comprise 69.2% of the sample, we’d expect 69.2% of the “Yes” answers to come from females, 69.2% of the “No” answers to come from females, and 69.2% of the “Maybe” answers to come from females. On the other hand, 30.8% of the “Yes,” “No,” and “Maybe” answers should come from males.

Therefore, to calculate expected frequency for each cell you do this:

$f_e = \frac{\text{cell's row total}}{\text{table total}} \times \text{cell's column total}$

or

$f_e = \frac{\text{cell's column total}}{\text{table total}} \times \text{cell's row total}$

The idea is that you find the percent of persons in one category on the first variable, and “expect” to find that percent of those people in the other variable’s categories.
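As a rough sketch, the expected-frequency rule above (row total / table total * column total) can be applied to every cell at once. The list-of-lists layout and the function name `expected_frequencies` are assumptions for illustration; the counts are the football data.

```python
observed = [
    [185, 200, 65],  # Female: Yes, No, Maybe
    [80, 65, 55],    # Male:   Yes, No, Maybe
]

def expected_frequencies(table):
    """Expected count per cell if the row and column variables were unrelated."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # f_e = (row total / table total) * column total
    return [[r / n * c for c in col_totals] for r in row_totals]

expected = expected_frequencies(observed)
for row in expected:
    print([round(f_e, 1) for f_e in row])
# prints [183.5, 183.5, 83.1] and then [81.5, 81.5, 36.9]
```

Note that each row and each column of expected counts adds up to the same totals as the observed table; only the distribution inside the table changes.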

Now you know how to calculate the expected frequency (and the observed frequency is obvious). Cells are numbered 1 to 3 across the Female row and 4 to 6 across the Male row.

$f_{e1} = (450/650) \times 265 = 183.5$
$f_{e2} = (450/650) \times 265 = 183.5$
$f_{e3} = (450/650) \times 120 = 83.1$
$f_{e4} = (200/650) \times 265 = 81.5$
$f_{e5} = (200/650) \times 265 = 81.5$
$f_{e6} = (200/650) \times 120 = 36.9$

You already saw how to calculate the deviations too: $D_c = f_o - f_e$

$D_1 = 185 - 183.5 = 1.5$
$D_2 = 200 - 183.5 = 16.5$
$D_3 = 65 - 83.1 = -18.1$
$D_4 = 80 - 81.5 = -1.5$
$D_5 = 65 - 81.5 = -16.5$
$D_6 = 55 - 36.9 = 18.1$

Deviations ($D_c = f_o - f_e$): $D_1 = 1.5$, $D_2 = 16.5$, $D_3 = -18.1$, $D_4 = -1.5$, $D_5 = -16.5$, $D_6 = 18.1$

Now, we want to add up the deviations… What would happen if we added these deviations together? The positive and negative deviations would cancel out, and the sum would be zero.

To get rid of negative deviations, we square each one (like in standard deviations).

$(D_1)^2 = (1.5)^2 = 2.25$
$(D_2)^2 = (16.5)^2 = 272.25$
$(D_3)^2 = (-18.1)^2 = 327.61$
$(D_4)^2 = (-1.5)^2 = 2.25$
$(D_5)^2 = (-16.5)^2 = 272.25$
$(D_6)^2 = (18.1)^2 = 327.61$

Just how large is each of these squared deviations? The next step is to give the deviations a “metric.” The deviations are compared relative to what was expected. In other words, we divide by what was expected.

You’ve already calculated what was expected in each cell: $f_{e1} = f_{e2} = 183.5$, $f_{e3} = 83.1$, $f_{e4} = f_{e5} = 81.5$, $f_{e6} = 36.9$.

Relative Deviations-squared (small values indicate little deviation from what was expected, while larger values indicate much deviation from what was expected):

$(D_1)^2 / f_{e1} = 2.25 / 183.5 = 0.012$
$(D_2)^2 / f_{e2} = 272.25 / 183.5 = 1.484$
$(D_3)^2 / f_{e3} = 327.61 / 83.1 = 3.942$
$(D_4)^2 / f_{e4} = 2.25 / 81.5 = 0.028$
$(D_5)^2 / f_{e5} = 272.25 / 81.5 = 3.340$
$(D_6)^2 / f_{e6} = 327.61 / 36.9 = 8.878$

The next step will be to see what the total relative deviations-squared are:

Sum of Relative Deviations-squared = 0.012 + 1.484 + 3.942 + 0.028 + 3.340 + 8.878 = 17.68

This number is also what we call Chi-Squared or χ². So… Of what good is knowing this number?
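The whole calculation (expected counts, deviations, squaring, dividing by what was expected, and summing) can be sketched as one short function. The function name `chi_squared` and the data layout are assumptions for illustration. Because the code does not round the intermediate values, it gives roughly 17.67 rather than the slightly different figure produced by hand with rounded expected frequencies.

```python
observed = [
    [185, 200, 65],  # Female: Yes, No, Maybe
    [80, 65, 55],    # Male:   Yes, No, Maybe
]

def chi_squared(table):
    """Sum (f_o - f_e)^2 / f_e over every cell of a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    total = 0.0
    for i, row in enumerate(table):
        for j, f_o in enumerate(row):
            f_e = row_totals[i] * col_totals[j] / n  # expected count if unrelated
            total += (f_o - f_e) ** 2 / f_e          # relative deviation-squared
    return total

chi2 = chi_squared(observed)
print(f"chi-squared = {chi2:.2f}")
```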

This value, χ², would form an identifiable shape in repeated sampling if the two variables were unrelated to each other. That shape depends only on the number of rows and columns. We technically refer to this as the “degrees of freedom.”

For χ², df = (#rows - 1) * (#columns - 1)
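One way to see that “identifiable shape” is to simulate repeated sampling from a population where sex and attitude really are unrelated. The sketch below is an illustration under stated assumptions: the population proportions are set equal to the football sample's margins, and the number of repetitions (2000) and the random seed are arbitrary choices. It checks what share of the simulated χ² values fall above 5.99, the standard .05 critical value for df = 2.

```python
import random

def chi_squared(table):
    """Sum (f_o - f_e)^2 / f_e over every cell of a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return sum(
        (f_o - row_totals[i] * col_totals[j] / n) ** 2
        / (row_totals[i] * col_totals[j] / n)
        for i, row in enumerate(table)
        for j, f_o in enumerate(row)
    )

def simulate_unrelated(n=650, reps=2000, seed=0):
    """Draw samples in which sex and answer are independent; return each chi-squared."""
    rng = random.Random(seed)
    values = []
    for _ in range(reps):
        # Sex and answer are drawn separately, so they cannot be related.
        sexes = rng.choices([0, 1], weights=[450, 200], k=n)
        answers = rng.choices([0, 1, 2], weights=[265, 265, 120], k=n)
        table = [[0, 0, 0], [0, 0, 0]]
        for s, a in zip(sexes, answers):
            table[s][a] += 1
        values.append(chi_squared(table))
    return values

values = simulate_unrelated()
share_above = sum(v > 5.99 for v in values) / len(values)
print(f"share of simulated chi-squared values above 5.99: {share_above:.3f}")
```

The share should land close to 5%, which is exactly what a critical value at the .05 level means.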

For χ², df = (#rows - 1) * (#columns - 1)

[Figure: χ² distributions for different degrees of freedom, including df = 20, df = 10, and df = 5]

FYI: This should remind you of the normal distribution, except that it changes shape depending on the nature of your variables.

We can use the known properties of the χ² distribution to identify the probability that we would get a χ² as large as our sample’s if our variables were unrelated! This is exciting! Think of the Power!!!!

If our χ² in a particular analysis were under the shaded area or beyond, what could we say about the population given our sample?

[Figure: χ² distribution with the upper 5% of χ² values shaded]

Answer: We’d reject the null, saying that it is highly unlikely that we could get such a large chi-squared value from a population where the two variables are unrelated.

So, what is the critical χ² value?

That depends on the particular problem because the distribution changes depending on the number of rows and columns.

[Figure: χ² distributions for different degrees of freedom, each with its critical χ² marked]

According to Table C, with α-level = .05:

df = 1, critical χ² = 3.84
df = 5, critical χ² = 11.07
df = 10, critical χ² = 18.31
df = 20, critical χ² = 31.41

In our football problem above, we had a chi-squared of about 17.68 in a cross classification table with 2 rows and 3 columns.

Our chi-squared distribution for that table would have df = (2 - 1) * (3 - 1) = 2.

According to Table C, with α-level = .05, Critical Chi-Squared is 5.99.

Since 17.68 > 5.99, we reject the null. We reject that our sample could have come from a population where sex was not related to attitudes toward football.
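The decision rule for the football problem can be sketched as a lookup plus a comparison. The dictionary below holds standard .05-level critical values for a handful of df values (the kind of numbers a table such as Table C lists). The names `CRITICAL_AT_05`, `degrees_of_freedom`, and `reject_null` are made up for this example, and 17.67 is the football table's χ² computed without rounding the intermediate steps.

```python
# Standard critical chi-squared values at alpha = .05 for a few df values.
CRITICAL_AT_05 = {1: 3.84, 2: 5.99, 5: 11.07, 10: 18.31, 20: 31.41}

def degrees_of_freedom(n_rows, n_cols):
    """df = (#rows - 1) * (#columns - 1)."""
    return (n_rows - 1) * (n_cols - 1)

def reject_null(chi2, n_rows, n_cols, critical=CRITICAL_AT_05):
    """True when the sample chi-squared exceeds the critical value for its df."""
    df = degrees_of_freedom(n_rows, n_cols)
    return chi2 > critical[df]

df = degrees_of_freedom(2, 3)         # 2 rows (sex) by 3 columns (answer)
decision = reject_null(17.67, 2, 3)   # football chi-squared, roughly 17.67
print(f"df = {df}, reject the null: {decision}")
```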

Now let’s get formal… 7 steps to Chi-squared test of independence:

1. Set α-level (e.g., .05)
2. Find Critical χ² (depends on df and α-level)
3. The null and alternative hypotheses:
   H0: The two nominal variables are independent
   Ha: The two variables are dependent on each other
4. Collect Data
5. Calculate χ²: $\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$
6. Make decision about the null hypothesis
7. Report the P-value
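The seven steps can be traced in code for the football data. This is a minimal sketch, not a general-purpose routine: it leans on the fact that when df is even the upper-tail probability of the χ² distribution has a simple closed form, and for df = 2 it reduces to exp(-χ²/2). The function names are hypothetical, and a statistics package would normally supply the P-value.

```python
import math

def chi_squared_statistic(table):
    """Return (chi-squared, df) for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, f_o in enumerate(row):
            f_e = row_totals[i] * col_totals[j] / n
            chi2 += (f_o - f_e) ** 2 / f_e
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

def p_value_even_df(chi2, df):
    """Upper-tail probability of chi-squared; closed form valid for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only covers positive, even df")
    half = chi2 / 2
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))

alpha = 0.05                                   # Step 1: set the alpha-level
critical = -2 * math.log(alpha)                # Step 2: critical value (df = 2 only), about 5.99
# Step 3: H0 = sex and attitude are independent; Ha = they are related
observed = [[185, 200, 65], [80, 65, 55]]      # Step 4: collect data
chi2, df = chi_squared_statistic(observed)     # Step 5: calculate chi-squared
reject_null = chi2 > critical                  # Step 6: decide about the null
p_value = p_value_even_df(chi2, df)            # Step 7: report the P-value
print(f"chi2 = {chi2:.2f}, df = {df}, reject H0: {reject_null}, p = {p_value:.5f}")
```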

Afterwards, what have you found?

If Chi-Squared is not significant, you have no evidence that your variables are related.
If Chi-Squared is significant, your variables are related.
That’s All!

Chi-Squared cannot tell you anything like the strength or direction of association. For purely nominal variables, there is no “direction” of association.

Chi-Squared is a large-sample test. If dealing with small samples, look up appropriate tests. (A condition of the test: no cell may have an expected frequency lower than 5.)

The larger the sample size, the easier it is for Chi-Squared to be significant.

For a 2 x 2 table, Chi-Squared gives the same result as the Independent Samples t-test for proportions and ANOVA.

If you want to know how you depart from independence, you may:

1. Check percentages (conditional distributions) in your cross classification table.
2. Do a residual analysis: The difference between observed and expected counts in a cell behaves like a significance test when divided by a standard error for the difference.

$\text{s.e.} = \sqrt{f_e \times (1 - \text{cell's row proportion}) \times (1 - \text{cell's column proportion})}$

$Z = \frac{f_o - f_e}{\text{s.e.}}$

Residual Analysis: Let’s do cell 5!

Cell 5: row proportion = 200/650 = .308, column proportion = 265/650 = .408

$\text{s.e.} = \sqrt{81.5 \times (.692) \times (.592)} = 5.78$

$Z = (65 - 81.5) / 5.78 = -2.85$

Since 2.85 > 1.96, there is a significant difference in cell 5.
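The same residual calculation can be sketched for every cell at once. The function name `residual_z_scores` and the list layout are assumptions for illustration. The code keeps full precision, so cell 5 comes out near -2.86 rather than the -2.85 obtained from rounded inputs.

```python
import math

observed = [
    [185, 200, 65],  # Female: Yes, No, Maybe (cells 1 to 3)
    [80, 65, 55],    # Male:   Yes, No, Maybe (cells 4 to 6)
]

def residual_z_scores(table):
    """Z = (f_o - f_e) / s.e. for each cell, with
    s.e. = sqrt(f_e * (1 - row proportion) * (1 - column proportion))."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    z_scores = []
    for i, row in enumerate(table):
        z_row = []
        for j, f_o in enumerate(row):
            f_e = row_totals[i] * col_totals[j] / n
            se = math.sqrt(f_e * (1 - row_totals[i] / n) * (1 - col_totals[j] / n))
            z_row.append((f_o - f_e) / se)
        z_scores.append(z_row)
    return z_scores

z = residual_z_scores(observed)
for label, z_row in zip(["Female", "Male"], z):
    print(label, [round(value, 2) for value in z_row])
```

Cells whose Z is beyond 1.96 in absolute value are the ones that depart significantly from independence at the .05 level.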

Further topics you could explore:

Strength of Association
    Discussing outcomes in terms of difference of proportions
    Reporting Odds Ratios (likelihood of a group giving one answer versus other answers or the group giving an answer relative to other groups giving that answer)

Strength and Direction of Association for Ordinal Variables
    Gamma (an inferential statistic, so check for significance)
        Ranges from -1 to +1
        Valence (sign) indicates direction of relationship
        Magnitude indicates strength of relationship
        Chi-squared and Gamma can disagree when there is a nonrandom pattern that has no direction. Chi-squared will catch it, gamma won’t.
    Kendall’s tau-b
    Somers’ d
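As a small taste of the first two topics, here is a sketch that compares women and men on answering “No” in the football data. The function names are hypothetical, and the choice of “No” versus every other answer is just one possible comparison.

```python
def difference_of_proportions(count_a, total_a, count_b, total_b):
    """Proportion giving the answer in group A minus the proportion in group B."""
    return count_a / total_a - count_b / total_b

def odds_ratio(count_a, total_a, count_b, total_b):
    """Odds of the answer (versus all other answers) in group A relative to group B."""
    odds_a = count_a / (total_a - count_a)
    odds_b = count_b / (total_b - count_b)
    return odds_a / odds_b

# "No" answers: 200 of 450 women, 65 of 200 men.
diff = difference_of_proportions(200, 450, 65, 200)
ratio = odds_ratio(200, 450, 65, 200)
print(f"difference of proportions = {diff:.3f}")
print(f"odds ratio = {ratio:.2f}")
```

A difference of 0 or an odds ratio of 1 would mean no association for that answer; values further from those points indicate a stronger association.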