Cross Tabs and Chi-Squared Testing for a Relationship Between Nominal (or Ordinal) Variables —It’s All About Deviations!


Cross Tabs and Chi-Squared
The test you choose depends on level of measurement:

Independent              Dependent            Test
Dichotomous              Interval-ratio       Independent Samples t-test
Nominal (Dichotomous)    Interval-ratio       ANOVA
Nominal or Ordinal       Nominal or Ordinal   Cross Tabs
  (Dichotomous)            (Dichotomous)

Cross Tabs and Chi-Squared We are asking whether there is a relationship between two nominal (or ordinal) variables— this includes dichotomous variables. One may use cross tabs for ordinal variables, but it is generally better to use more powerful statistical techniques if you can treat them as interval-ratio variables.

Cross Tabs and Chi-Squared
Cross tabs and Chi-Squared will tell you whether classification on one nominal variable is related to classification on a second nominal variable.
For example: Are rural Americans more likely to vote Republican in presidential races than urban Americans? (Classification of Region → Party Vote)
Are white people more likely to drive SUVs than blacks or Latinos? (Classification on Race → Type of Vehicle)

Cross Tabs and Chi-Squared
The statistical focus will be on the number or "count" of people in a sample who are classified in patterned ways on two variables. In other words, the number or "count" of people classified in each category created when considering both variables at the same time, such as:

Vehicle Type \ Race    White            Black
SUV                    # White & SUV    # Black & SUV
Car                    # White & Car    # Black & Car
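Counting cases in each joint category is simple to do mechanically. A minimal sketch with made-up data; the respondent values below are hypothetical and only illustrate the idea:

```python
from collections import Counter

# Hypothetical raw data: one (race, vehicle type) pair per respondent.
respondents = [
    ("White", "SUV"), ("White", "Car"), ("Black", "Car"),
    ("White", "SUV"), ("Black", "SUV"), ("Latino", "Car"),
]

# The cross tab is just the count of cases in each joint category.
counts = Counter(respondents)
print(counts[("White", "SUV")])   # 2
print(counts[("Black", "Car")])   # 1
```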

Cross Tabs and Chi-Squared Number in Each Joint Group? Why? Means and standard deviations are meaningless for nominal variables. So we need statistics that allow us to work “categorically.”

Cross Tabs
The procedure starts with a "cross classification" of the cases in categories of each variable.
Example: Data on male and female support for keeping SJSU football from 650 students, put into a matrix:

           Yes    No    Maybe   Total
Female:    185    200     65     450
Male:       80     65     55     200
Total:     265    265    120     650

Cross Tabs
In the example, you can see that the campus is divided on the issue. But is there an association between sex and attitudes?
Example: Data on male and female support for SJSU football from 650 students, put into a matrix:

           Yes    No    Maybe   Total
Female:    185    200     65     450
Male:       80     65     55     200
Total:     265    265    120     650

Cross Tabs, Descriptive Statistics
But is there an association between sex and attitudes? An easy way to get more information is to convert the frequencies (or "counts" in each cell) to percentages.
Data on male and female support for SJSU football from 650 students:

           Yes         No          Maybe       Total
Female:    185 (41%)   200 (44%)    65 (14%)   450 (99%)*
Male:       80 (40%)    65 (33%)    55 (28%)   200 (101%)
Total:     265 (41%)   265 (41%)   120 (18%)   650 (100%)
*Percentages do not add to 100 due to rounding.

Cross Tabs, Descriptive Statistics
We can see that in the sample men are less likely to oppose football, but no more likely than women to say "yes": men are more likely to say "maybe."
Data on male and female support for SJSU football from 650 students:

           Yes         No          Maybe       Total
Female:    185 (41%)   200 (44%)    65 (14%)   450 (99%)*
Male:       80 (40%)    65 (33%)    55 (28%)   200 (101%)
Total:     265 (41%)   265 (41%)   120 (18%)   650 (100%)
*Percentages do not add to 100 due to rounding.

Chi-Squared Using percentages to describe relationships is valid statistical analysis: These are descriptive statistics! However, they are not inferential statistics. What can we say about the population using this sample (inferential statistics)? Thinking about random variations in who would be selected from random sample to random sample… Could we have gotten sample statistics like these from a population where there is no association between sex and attitudes about keeping football? The Chi-Squared Test of Independence allows us to answer questions like those above.

Chi-Squared The whole idea behind the Chi-Squared test of independence is to determine whether the patterns of frequencies (or “counts”) in your cross classification table could have occurred by chance, or whether they represent systematic assignment to particular cells. For example, were women more likely to answer “no” than men or could the deviation in responses by sex have occurred because of random sampling or chance alone?

Calculating Chi-Squared
A number called Chi-Squared, χ², tells us whether the numbers in each cross classification cell in our sample deviate from the kind of random fluctuations you would get if our two variables were not associated with each other (i.e., were independent of each other). Its formula:

χ² = Σ ((fo − fe)² / fe)

fo = observed frequency in each cell; fe = expected frequency in each cell

The crux of χ² is that it gets larger as observed data deviate more from the data we would expect if our variables were unrelated. From sample to sample, one would expect some deviation from what is expected even when variables are unrelated. But when χ² gets really big, it grows beyond the numbers that random variation in samples would produce. A big χ² will imply that there is a relationship between our two nominal variables.

Calculating Chi-Squared
Calculating χ² begins with the concept of a deviation of observed data from what would be expected if the variables were unrelated.

Deviation = Observed frequency − Expected frequency

Observed frequency is just the number of cases in each cell of the cross classification table for your sample. For example, 185 women said "yes," they support football at SJSU; 185 is the observed frequency. Expected frequency is the number of cases that would be in a cell of the cross classification table if people in each group of one variable were classified in the second variable's groups in the same proportions.

χ² = Σ ((fo − fe)² / fe)

Chi-Squared, Expected Frequency
Data on male and female support for SJSU football from 650 students:

           Yes   No    Maybe   Total
Female:     ?     ?      ?     450 (69.2%)
Male:       ?     ?      ?     200 (30.8%)
Total:     265   265    120    650 (100%)

Expected frequency (if our variables were unrelated): Females comprise 69.2% of the sample, so we'd expect 69.2% of the "Yes" answers to come from females, and likewise 69.2% of the "No" and "Maybe" answers. On the other hand, 30.8% of the "Yes," "No," and "Maybe" answers should come from men. Therefore, to calculate the expected frequency for each cell:

fe = (cell's row total / table total) × cell's column total
or
fe = (cell's column total / table total) × cell's row total

The idea: 1. Find the percent of persons in one category of the first variable, then 2. "Expect" to find that percent of those people in each of the other variable's categories.

χ² = Σ ((fo − fe)² / fe)

Chi-Squared, Expected Frequency
Data on male and female support for SJSU football from 650 students:

           Yes           No            Maybe        Total
Female:    fe1 = 183.5   fe2 = 183.5   fe3 = 83.1   450 (69.2%)
Male:      fe4 = 81.5    fe5 = 81.5    fe6 = 36.9   200 (30.8%)
Total:     265           265           120          650 (100%)

Now you know how to calculate the expected frequencies:
fe1 = (450/650) × 265 = 183.5    fe4 = (200/650) × 265 = 81.5
fe2 = (450/650) × 265 = 183.5    fe5 = (200/650) × 265 = 81.5
fe3 = (450/650) × 120 = 83.1     fe6 = (200/650) × 120 = 36.9
…and the observed frequencies are obvious.

χ² = Σ ((fo − fe)² / fe)
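As a check on the arithmetic, the expected-frequency rule (row total / table total × column total) can be sketched in a few lines. The observed counts come from the football table in the slides:

```python
# Observed counts from the football example (row = sex, column = answer).
observed = {
    ("Female", "Yes"): 185, ("Female", "No"): 200, ("Female", "Maybe"): 65,
    ("Male",   "Yes"):  80, ("Male",   "No"):  65, ("Male",   "Maybe"): 55,
}
rows = ["Female", "Male"]
cols = ["Yes", "No", "Maybe"]

row_totals = {r: sum(observed[(r, c)] for c in cols) for r in rows}
col_totals = {c: sum(observed[(r, c)] for r in rows) for c in cols}
n = sum(observed.values())  # 650

# fe = (row total / table total) * column total, for every cell.
expected = {(r, c): row_totals[r] / n * col_totals[c]
            for r in rows for c in cols}

print(round(expected[("Female", "Yes")], 1))   # 183.5
print(round(expected[("Male", "Maybe")], 1))   # 36.9
```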

Chi-Squared, Deviations
Data on male and female support for SJSU football from 650 students, fo with (fe):

           Yes           No            Maybe       Total
Female:    185 (183.5)   200 (183.5)   65 (83.1)   450
Male:       80 (81.5)     65 (81.5)    55 (36.9)   200
Total:     265           265           120         650

You already know how to calculate the deviations too: Dc = fo − fe.
D1 = 185 − 183.5 = 1.5     D4 = 80 − 81.5 = −1.5
D2 = 200 − 183.5 = 16.5    D5 = 65 − 81.5 = −16.5
D3 = 65 − 83.1 = −18.1     D6 = 55 − 36.9 = 18.1

χ² = Σ ((fo − fe)² / fe)

Chi-Squared, Deviations
Deviations: Dc = fo − fe
D1 = 185 − 183.5 = 1.5     D4 = 80 − 81.5 = −1.5
D2 = 200 − 183.5 = 16.5    D5 = 65 − 81.5 = −16.5
D3 = 65 − 83.1 = −18.1     D6 = 55 − 36.9 = 18.1

Now, we want to add up the deviations… What would happen if we added these deviations together? (They would cancel out and sum to zero.) To get rid of negative deviations, we square each one (like in computing variance and standard deviation).

χ² = Σ ((fo − fe)² / fe)

Chi-Squared, Deviations Squared
To get rid of negative deviations, we square each one (as for variance and standard deviation):
(D1)² = (1.5)² = 2.25       (D4)² = (−1.5)² = 2.25
(D2)² = (16.5)² = 272.25    (D5)² = (−16.5)² = 272.25
(D3)² = (−18.1)² = 327.61   (D6)² = (18.1)² = 327.61

χ² = Σ ((fo − fe)² / fe)

Chi-Squared, Deviations Squared
Just how large is each of these squared deviations? What do these numbers really mean?
Squared deviations:
(D1)² = (1.5)² = 2.25       (D4)² = (−1.5)² = 2.25
(D2)² = (16.5)² = 272.25    (D5)² = (−16.5)² = 272.25
(D3)² = (−18.1)² = 327.61   (D6)² = (18.1)² = 327.61

χ² = Σ ((fo − fe)² / fe)

Chi-Squared, Relative Deviations²
The next step is to give the deviations a "metric." The deviations are compared relative to what was expected. In other words, we divide each by what was expected.
Squared deviations:
(D1)² = 2.25     (D4)² = 2.25
(D2)² = 272.25   (D5)² = 272.25
(D3)² = 327.61   (D6)² = 327.61
You've already calculated what was expected in each cell:
fe1 = (450/650) × 265 = 183.5    fe4 = (200/650) × 265 = 81.5
fe2 = (450/650) × 265 = 183.5    fe5 = (200/650) × 265 = 81.5
fe3 = (450/650) × 120 = 83.1     fe6 = (200/650) × 120 = 36.9

χ² = Σ ((fo − fe)² / fe)

Chi-Squared, Relative Deviations²
Relative squared deviations: small values indicate little deviation from what was expected, while larger values indicate much deviation from what was expected:
(D1)²/fe1 = 2.25/183.5 = 0.012     (D4)²/fe4 = 2.25/81.5 = 0.028
(D2)²/fe2 = 272.25/183.5 = 1.484   (D5)²/fe5 = 272.25/81.5 = 3.340
(D3)²/fe3 = 327.61/83.1 = 3.942    (D6)²/fe6 = 327.61/36.9 = 8.878

χ² = Σ ((fo − fe)² / fe)

Chi-Squared
Relative squared deviations: small values indicate little deviation from what was expected, while larger values indicate much deviation from what was expected:
(D1)²/fe1 = 0.012    (D4)²/fe4 = 0.028
(D2)²/fe2 = 1.484    (D5)²/fe5 = 3.340
(D3)²/fe3 = 3.942    (D6)²/fe6 = 8.878
The next step is to see what the total of the relative squared deviations is:

Sum of relative squared deviations = 0.012 + 1.484 + 3.942 + 0.028 + 3.340 + 8.878 = 17.68

This number is what we call Chi-Squared, or χ². So… of what good is knowing this number?

χ² = Σ ((fo − fe)² / fe)
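The whole computation, from observed counts to χ², fits in a few lines. One caveat: working with unrounded expected frequencies gives about 17.67, while the slides round each fe to one decimal first and arrive at 17.68; the tiny difference is pure rounding.

```python
# Observed counts for the football table, cell by cell
# (Female Yes/No/Maybe, then Male Yes/No/Maybe), with each
# cell's row and column totals in the same order.
fo = [185, 200, 65, 80, 65, 55]
row_totals = [450, 450, 450, 200, 200, 200]
col_totals = [265, 265, 120, 265, 265, 120]
n = 650

# Unrounded expected frequencies: (row total / n) * column total.
fe = [r / n * c for r, c in zip(row_totals, col_totals)]

# Chi-squared is the sum of relative squared deviations.
chi2 = sum((o - e) ** 2 / e for o, e in zip(fo, fe))
print(round(chi2, 2))   # 17.67
```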

Chi-Squared, Degrees of Freedom
This value, χ², would form an identifiable shape in repeated sampling if the two variables were unrelated to each other: the chance variation that we should expect among samples. That shape depends only on the number of rows and columns (or the nature of your variables). We technically refer to this as the "degrees of freedom." For χ²:

df = (#rows − 1) × (#columns − 1)

Chi-Squared Distributions
For χ², df = (#rows − 1) × (#columns − 1).

[Figure: χ² distributions for df = 1, 5, 10, and 20.]

FYI: This should remind you of the normal distribution, except that it changes shape depending on the nature of your variables.

Chi-Squared, Significance Test
We can use the known properties of the χ² distribution to identify the probability that we would get our sample's χ² if our variables were not related to each other! This is exciting! Think of the Power!!!!

Chi-Squared, Significance Test  2 Using the null that our variables are unrelated, when  2 is large enough to be in the shaded area, what can be said about the population given our sample? 5% of  2 values My Chi-squared

Chi-Squared, Significance Test  2 Answer: We’d reject the null, saying that it is highly unlikely that we could get such a large chi-squared value from a population where the two variables are unrelated. 5% of  2 values My Chi-squared

Critical Chi-Squared  2 So, what does the critical  2 value equal? 5% of  2 values My Chi-squared

Critical Chi-Squared
That depends on the particular problem, because the distribution changes depending on the number of rows and columns in your cross classification table.

[Figure: critical χ² values marked on distributions for df = 1, 5, 10, and 20.]

Critical Chi-Squared
According to A.4 in Field, with α-level = .05, if:
df = 1, critical χ² = 3.84
df = 5, critical χ² = 11.07
df = 10, critical χ² = 18.31
df = 20, critical χ² = 31.41

Critical Chi-Squared
In our football problem above, we had a chi-squared of 17.68 in a cross classification table with 2 rows and 3 columns. Our chi-squared distribution for that table has df = (2 − 1) × (3 − 1) = 2. According to A.4, with α-level = .05, the critical chi-squared is 5.99. Since 17.68 > 5.99, we reject the null. We reject that our sample could have come from a population where sex was not related to attitudes toward football.

[Figure: χ² distribution for df = 2 with the upper 5% tail shaded and the football χ² marked.]
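For df = 2 specifically, the χ² tail probability has a convenient closed form, P(χ² ≥ x) = exp(−x/2), so the football decision can be double-checked without a table. (For other df there is no such simple formula; you would consult a table like A.4 or a library routine such as scipy.stats.chi2.sf.)

```python
import math

chi2_stat = 17.68   # chi-squared from the football table
critical = 5.99     # critical value for df = 2, alpha = .05

# Tail probability, valid only for df = 2: P(X >= x) = exp(-x / 2).
p_value = math.exp(-chi2_stat / 2)

print(chi2_stat > critical)   # True -> reject the null
print(p_value < 0.05)         # True: p is about 0.00014
```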

Chi-Squared 7-Step Significance Test
Now let's get formal… the 7 steps of the chi-squared test of independence:
1. Set α-level (e.g., .05)
2. Find critical χ² (depends on df and α-level)
3. State the null and alternative hypotheses:
   H0: The two variables are independent of each other
   Ha: The two variables are dependent on each other
4. Collect data
5. Calculate χ² = Σ ((fo − fe)² / fe)
6. Make a decision about the null hypothesis
7. Report the p-value

Chi-Squared Interpretation Afterwards, what is discovered? If Chi-Squared is not significant, the variables are unrelated. If Chi-Squared is significant, the variables are related. That’s All! Chi-Squared cannot tell you anything like the strength or direction of association. For purely nominal variables, there is no “direction” of association.

Chi-Squared Properties
Other points…
1. Chi-Squared is a large-sample test. If dealing with small samples, look up appropriate tests. (A condition of the test: no expected frequency lower than 5 in any cell.)
2. The larger the sample size, the easier it is for Chi-Squared to be significant.
3. For a 2 x 2 table, Chi-Square gives the same result as an Independent Samples t-test for a proportion and as ANOVA.
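The "no expected frequency below 5" condition is easy to check mechanically. A minimal sketch; the function name is mine, not part of any library:

```python
def expected_ok(expected, minimum=5):
    """Return True if every expected cell frequency meets the minimum."""
    return all(e >= minimum for e in expected)

# Expected frequencies from the football example: all comfortably above 5.
football_fe = [183.5, 183.5, 83.1, 81.5, 81.5, 36.9]
print(expected_ok(football_fe))   # True
print(expected_ok([10, 4.2, 8]))  # False -> chi-squared may be unreliable
```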

Cross Tabs, Strength and Direction of Association (Ordinal Variables)
Further topics you could explore:
Strength of association:
- Discussing outcomes in terms of differences of proportions
- Reporting odds ratios (the likelihood of a group giving one answer versus other answers, or of a group giving an answer relative to other groups giving that answer)
- Yule's Q and Phi: for 2x2 tables, ranging from -1 to 1, with 0 indicating no relationship and 1 a strong relationship
Strength and direction of association for ordinal (not nominal) variables:
- Gamma (an inferential statistic, so check for significance): ranges from -1 to +1; the sign indicates the direction of the relationship and the magnitude indicates its strength. Chi-squared and gamma can disagree when there is a nonrandom pattern that has no direction: chi-squared will catch it, gamma won't.
- Kendall's tau-b and tau-c
- Somers' d

Cross Tabs and Chi-Squared, Controlling for a Third Variable
One can see the relationship between two variables for each level of a third variable. E.g., Sex and Football by Lower or Upper Division:

Upper Division:     Yes   No   Maybe
  Female
  Male

Lower Division:     Yes   No   Maybe
  Female
  Male

Cross Tabs and Chi-Squared Controlling for a Third Variable Sex and Pornlaws

Cross Tabs and Chi-Squared, Controlling for a Third Variable: Sex and Pornlaw by Sex Education

Cross Tabs and Chi-Squared, Another Example
A criminologist is interested in the effects of placement type on recidivism and severity of crime. He collects data from delinquents in four placement types: Boot Camp (n = 50), Wilderness Expedition (n = 75), Electronic Supervision (n = 100), and Residential Treatment (n = 25). He records recidivism and severity of crime. The categories are: no crime (NC), minor (MI), moderate (MO), serious (S). Severity for each group:

                          NC   MI   MO   S
Boot Camp:                 ?    ?    ?   ?
Wilderness Expedition:     ?    ?    ?   ?
Electronic Supervision:    ?    ?    ?   ?
Residential Treatment:     5    5   10   5