

CATEGORICAL VARIABLES Testing hypotheses using

Assessing relationships between categorical variables

When only one variable is being measured, we can display it. But we can’t answer the question: why does this variable change? Why does homicide go up and down? That, in effect, is the research question.

To find out why, we begin with a literature review and identify variables (we call them “independent” variables) whose change may cause change in our variable of interest (the “dependent” variable; say, homicides, or an officer’s disposition).
– In other words, its value or level “depends” on the value or level of these other variables

We then state this formally, in a hypothesis: changes in IV → changes in DV

To assess a relationship, we can place both variables on a table, with the case frequencies in the cells.

But without more, all this can suggest is association: that certain values of these variables coincide.
– Uncooperative students are arrested a lot, and cooperative students are seldom arrested

To find out whether cooperation → disposition, we must take it further…

[Table: officer’s disposition]

Hypothesis: Higher income persons drive more expensive cars

Independent variable: Income, measured categorically (nominal variable)
– Two values: low income and high income
– Income is measured by where a car is parked: student lot (low income) or faculty-staff lot (high income)

Dependent variable: Car value, measured categorically (ordinal variable)
– 1, 2, 3, 4 or 5 (1 = cheapest, 5 = most expensive)

Sampling
– Stratified, disproportionate, systematic random sampling of 10 cars from a student lot and 10 cars from a faculty-staff lot

Coding
– Income is automatically coded by a car’s location (faculty-staff or student lot)
– A 5-level categorical measure is used to code car values

Step 1: Coding (Team A)

Hypothesis: Higher income persons drive more expensive cars

[Tally sheets: car value, student lot; car value, faculty/staff lot]

                        DV - Car value
IV - Income             1     2     3     4     5     n
LOW (student lot)
HIGH (F/S lot)

For the purposes of this class, always place the values of the DV along the horizontal axis, and the values of the IV along the vertical axis.
– Each value of the DV has its own column
– Each value of the IV has its own row

Step 1: Coding (Team B)

Hypothesis: Higher income persons drive more expensive cars

[Tally sheets: student lot; faculty/staff lot]

                        DV - Car value
IV - Income             1     2     3     4     5     n
LOW (student lot)
HIGH (F/S lot)

For the purposes of this class, always place the values of the DV along the horizontal axis, and the values of the IV along the vertical axis.
– Each value of the DV has its own column
– Each value of the IV has its own row
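The coding step can be sketched in a few lines of Python. This is only an illustration: the observations below are hypothetical (they are not either team's data), and the names `observations` and `build_frequency_table` are invented for the example. Each case is one car, coded by lot (the IV) and car value (the DV), and the function counts the cases that fall in each cell.

```python
from collections import Counter

# Hypothetical coded observations, one tuple per car: (lot, car value).
# These are made-up cases for illustration, not the class data.
observations = [
    ("LOW", 1), ("LOW", 2), ("LOW", 2), ("LOW", 4), ("LOW", 5),
    ("HIGH", 1), ("HIGH", 3), ("HIGH", 5), ("HIGH", 5), ("HIGH", 4),
]

IV_VALUES = ["LOW", "HIGH"]   # rows: one per value of the IV
DV_VALUES = [1, 2, 3, 4, 5]   # columns: one per value of the DV


def build_frequency_table(cases):
    """Count the cases in each (IV, DV) cell; returns {iv: [count per DV value]}."""
    counts = Counter(cases)
    return {iv: [counts[(iv, dv)] for dv in DV_VALUES] for iv in IV_VALUES}


table = build_frequency_table(observations)
for iv in IV_VALUES:
    row = table[iv]
    print(iv, row, "n =", sum(row))
```

The last column of each printed row is n, the number of cars sampled in that lot.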

Step 2: Percentaging

For accurate analysis, frequencies must be converted to percentages. Convert each row separately so the cells add to 100 percent.

[Frequency tables for Team A and Team B: cell counts not preserved]

Team A
                        DV - Car value
IV - Income             1     2     3     4     5     %
LOW (student lot)      20%   20%    0%   30%   30%   100%
HIGH (F/S lot)         30%   10%   10%   10%   40%   100%

Team B
                        DV - Car value
IV - Income             1     2     3     4     5     %
LOW (student lot)      60%   40%    0%    0%    0%   100%
HIGH (F/S lot)         50%   10%   20%   20%    0%   100%
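Row percentaging is simple arithmetic, sketched here in Python. The cell counts are an assumption: the slide's frequency tables did not survive, so the counts below are merely ones that would be consistent with 10 cars per lot, and `row_percentages` is a name made up for this example.

```python
def row_percentages(freq_table):
    """Convert each row of a frequency table to percentages of that row's total."""
    result = {}
    for iv_value, counts in freq_table.items():
        n = sum(counts)
        # Guard against an empty row so we never divide by zero.
        result[iv_value] = [round(100 * c / n, 1) if n else 0.0 for c in counts]
    return result


# Assumed counts (10 cars per lot), used only to illustrate the conversion.
frequencies = {
    "LOW (student lot)": [2, 2, 0, 3, 3],
    "HIGH (F/S lot)": [3, 1, 1, 1, 4],
}

percentages = row_percentages(frequencies)
for iv_value, row in percentages.items():
    print(iv_value, row, "total =", sum(row))
```

Percentaging by row, rather than by column or by the grand total, is what lets us compare the distribution of the DV across categories of the IV even when the rows have different n.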

Step 3: Analysis

Switch values of the independent variable. Does the distribution of car values change? If so, is the difference in the predicted direction?

Team A
                        DV - Car value
IV - Income             1     2     3     4     5     %
LOW (student lot)      20%   20%    0%   30%   30%   100%
HIGH (F/S lot)         30%   10%   10%   10%   40%   100%

Forty percent of the cars in the student lot are value 1 and 2. Same for the F/S lot. There are differences between rows in values 3-5, but they seem minimal.

Team B
                        DV - Car value
IV - Income             1     2     3     4     5     %
LOW (student lot)      60%   40%    0%    0%    0%   100%
HIGH (F/S lot)         50%   10%   20%   20%    0%   100%

All the cars in the student lot are value 1 and 2. But forty percent of the cars in the F/S lot are value 3 and 4. As we “switch” values of the IV from low to high income, the proportion of expensive cars substantially increases. The direction of the effect is consistent with the hypothesis.

Another example

Hypothesis: poverty → crime

The IV, poverty, is measured by income; the DV, crime, is measured by arrests.
– Income has two values, low and high
– Arrests has two values, never arrested and arrest record

To test the hypothesis, switch from one category of the IV to the other.
– Does the distribution of cases along the DV change substantially?
– If so, is the change in the hypothesized direction?

               Never Arrested   Arrest Record
Low Income          80%              20%         100%
High Income         20%              80%         100%

Distribution flip-flops in an unexpected direction. High income persons seem much more likely to have an arrest record. The hypothesis is rejected.

               Never Arrested   Arrest Record
Low Income          50%              50%         100%
High Income         50%              50%         100%

Distribution remains the same. There seems to be no connection between income and arrest record. The hypothesis is rejected.

               Never Arrested   Arrest Record
Low Income          20%              80%         100%
High Income         80%              20%         100%

Distribution flip-flops in the expected direction. High income persons seem much less likely to have an arrest record. The hypothesis is confirmed.
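The “switch the IV and look” test above can be written as a small Python sketch. It is a simplification under stated assumptions: the function name `assess` is invented, and the 10-point cutoff for a “substantial” change is an arbitrary choice made for illustration, since the slides judge this by eye.

```python
def assess(low_income_pct_arrested, high_income_pct_arrested, min_change=10):
    """Compare the percent with an arrest record across the two income rows.

    The hypothesis (poverty -> crime) predicts that switching from low to
    high income should LOWER the percent with an arrest record.
    min_change is an assumed cutoff, in percentage points, for "substantial".
    """
    change = high_income_pct_arrested - low_income_pct_arrested
    if abs(change) < min_change:
        return "no substantial change: hypothesis rejected"
    if change < 0:
        return "expected direction: hypothesis confirmed"
    return "unexpected direction: hypothesis rejected"


# The three tables from the slide (percent with an arrest record).
print(assess(20, 80))  # first table
print(assess(50, 50))  # second table
print(assess(80, 20))  # third table
```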

Cranking it up a notch with “elaboration analysis”

Hypothesis: position on police force determines job stress

Job Stress (frequencies; cell counts not preserved)
Position           Low    High    n
Sergeant
Patrol officer
Source: Fitzgerald

Job Stress (percentages)
Position           Low    High
Sergeant           33%    67%    100%
Patrol officer     78%    22%    100%

Hmmm, interesting! Sergeants are more stressed than patrol officers. But is it possible that another variable, one closely associated with position, either mediates the relationship with job stress or is the real driving force? In other words…

Position on police force → other variable → job stress
OR
other variable → job stress (with position on police force only associated with the other variable)

Elaboration analysis: using first-order partial tables to analyze the effect of a “control” variable

So…what variables might be associated with position and with job stress?
– Data indicate that females are less likely to be police supervisors
– The literature review also suggests that males and females may have different stress responses

Let’s “elaborate” (dig deeper)
– Does the effect of position on job stress hold regardless of gender?

Gender is used as a “control” variable. We will test the original, “zero-order” relationship between position and job stress, “controlling” for each value of gender.
– Gender is categorical, so we keep using tables

Create one table just like the one we originally designed (position → job stress) for each value of the control variable, gender
– One table for males, another for females
– Each table is identical to the zero-order table, except it only includes cops of that gender

These tables are called “first-order partial tables” because they represent our first attempt to introduce a “control” variable.
– Each table is “partial” (only part of the sample) because it only includes cases with a certain value of the control variable
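Building first-order partial tables amounts to splitting the cases by the control variable and cross-tabulating each subset the same way as the full sample. The Python sketch below illustrates that idea; the officer records are hypothetical (the study's raw cases are not in these slides), and `crosstab` is a helper written for the example.

```python
from collections import defaultdict

# Hypothetical officer records: (gender, position, job stress).
records = [
    ("male", "Sergeant", "High"), ("male", "Sergeant", "High"),
    ("male", "Sergeant", "Low"), ("male", "Patrol officer", "Low"),
    ("male", "Patrol officer", "Low"), ("female", "Sergeant", "Low"),
    ("female", "Sergeant", "High"), ("female", "Patrol officer", "Low"),
    ("female", "Patrol officer", "High"), ("female", "Patrol officer", "Low"),
]


def crosstab(pairs):
    """Frequency table {position: {stress: count}} from (position, stress) pairs."""
    table = defaultdict(lambda: defaultdict(int))
    for position, stress in pairs:
        table[position][stress] += 1
    return {position: dict(cells) for position, cells in table.items()}


# Zero-order table: every case, ignoring the control variable.
zero_order = crosstab((pos, stress) for _, pos, stress in records)

# First-order partial tables: one table per value of the control variable.
partials = {
    gender: crosstab((pos, stress) for g, pos, stress in records if g == gender)
    for gender in ("male", "female")
}

print("zero-order:", zero_order)
for gender, table in partials.items():
    print(gender, table)
```

Because every case appears in exactly one partial table, the cells of the partial tables add back up to the zero-order table.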

Original “zero-order” tables

Job Stress (frequencies; cell counts not preserved)
Position           Low    High    n
Sergeant
Patrol officer

Job Stress (percentages)
Position           Low    High
Sergeant           33%    67%    100%
Patrol officer     78%    22%    100%

First-order partial tables: one for each value of the control variable

Job Stress - male officers (frequencies; cell counts not preserved)
Position           Low    High    n
Sergeant
Patrol officer

Job Stress - male officers (percentages)
Position           Low    High
Sergeant           23%    77%    100%
Patrol officer     83%    17%    100%

Job Stress - 70 female officers (frequencies; cell counts not preserved)
Position           Low    High    n
Sergeant
Patrol officer

Job Stress - 70 female officers (percentages)
Position           Low    High
Sergeant           53%    47%    100%
Patrol officer     70%    30%    100%

Does the zero-order relationship between position and job stress persist for males?

Zero-order table, all cops
Job Stress
Position           Low    High
Sergeant           33%    67%    100%
Patrol officer     78%    22%    100%

First-order partial table, male cops
Job Stress - Male officers
Position           Low    High
Sergeant           23%    77%    100%
Patrol officer     83%    17%    100%

No, the percentages aren’t exactly the same. But, overall, the relationship in the first-order partial table is in the same direction as in the zero-order table, perhaps stronger. Most male sergeants report being highly stressed, and most male patrol officers report very low stress. Knowing that an officer is male is consistent with the hypothesis that higher position leads to more job stress.

Does the zero-order relationship between position and job stress persist for females?

Zero-order table, all cops
Job Stress
Position           Low    High
Sergeant           33%    67%    100%
Patrol officer     78%    22%    100%

Job Stress - Male officers
Position           Low    High
Sergeant           23%    77%    100%
Patrol officer     83%    17%    100%

Job Stress - Female officers
Position           Low    High
Sergeant           53%    47%    100%
Patrol officer     70%    30%    100%

OUTCOME: SPECIFICATION

Knowing that officers were male didn’t change our opinion about the effects of position on job stress. So for male officers, the “zero-order” relationship between position and job stress holds.

But knowing that officers were female gave us a new insight. Only 47% of female sergeants report being highly stressed, a far smaller proportion than the 77% of male sergeants. So our opinion of the effects of position on job stress is moderated by one value of the control variable, female. Knowing that a supervisor is female tells us something we didn’t know.

First-order partial analysis: three outcomes

Doing a first-order partial analysis yields three possible interpretive outcomes:
– Specification (prior example): The zero-order relationship persists for some but not all values of the new variable. Coding this variable teaches us something.
– Replication (next example): The original relationship from the zero-order table persists at both values of the new variable. Coding for the new variable teaches us nothing.
– Explanation (final example): The zero-order relationship is not present at any value of the new variable. The apparent effect of the original independent variable (the one in the hypothesis) has been completely “explained away.”

We just covered specification. Let’s turn to the other two possible outcomes of elaboration analysis.
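The three outcomes can be expressed as a rough decision rule, sketched below in Python. This is a simplification, not the method the slides use (they compare tables by eye): the function name `classify_outcome` is invented, and the 20-point threshold for deciding that a relationship “persists” in a partial table is an arbitrary assumption. The inputs are percentage-point differences between the two rows of each partial table, taken from the tables in this presentation.

```python
def classify_outcome(partial_differences, threshold=20):
    """Classify an elaboration outcome, assuming a zero-order relationship exists.

    partial_differences: one percentage-point difference between the IV
    categories per partial table (one table per value of the control variable).
    threshold: assumed minimum difference for the relationship to "persist".
    """
    persists = [abs(diff) >= threshold for diff in partial_differences]
    if all(persists):
        return "replication"
    if not any(persists):
        return "explanation"
    return "specification"


# Position -> job stress by gender: % high stress, sergeants minus patrol officers.
print(classify_outcome([77 - 17, 47 - 30]))   # males, females

# Rank -> cynicism by gender: % low cynicism, supervisors minus officers.
print(classify_outcome([50 - 17, 50 - 25]))   # males, females

# Rank -> cynicism by time on the job: % low cynicism, supervisors minus officers.
print(classify_outcome([5 - 0, 83 - 80]))     # <5 years, 5+ years
```

A different threshold could change the labels, which is one reason eyeballing percentage tables is eventually replaced by a formal test.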

PRACTICAL EXERCISE

Hypothesis: Higher rank → Less cynicism

Sample of 100 officers and 100 supervisors
– Twenty officers scored low on cynicism; 80 were high cynicism
– Fifty supervisors scored low on cynicism; 50 were high cynicism

Build a (zero-order) frequency table, then convert it to percentages.

Be sure to place the categories of the dependent variable in columns, and the categories of the independent variable in rows.

PRACTICAL EXERCISE

Hypothesis: Higher rank → Less cynicism

Zero-order tables

Cynicism (frequencies)
Rank            Low    High    n
Officers         20     80    100
Supervisors      50     50    100

Cynicism (percentages)
Rank            Low    High
Officers        20%    80%    100%
Supervisors     50%    50%    100%

Well, that was easy! Looks like the hypothesis (higher rank, less cynicism) is confirmed.

Of course, we can’t stop here. Many variables are floating around. Are there any that may be related to our main independent variable, rank, and which could possibly affect cynicism?

PRACTICAL EXERCISE

Hypothesis: Higher rank → Less cynicism

According to our literature review, a variable associated with rank, gender, may affect cynicism. Let’s “control” for gender. We get data on cynicism for officers and supervisors, broken down by gender:

MALES
– Officers: 10 low cynicism, 50 high cynicism
– Supervisors: 35 low, 35 high

FEMALES
– Officers: 10 low, 30 high
– Supervisors: 15 low, 15 high

Create first-order partial tables for gender, convert tables to percentages, and analyze the results...

Original “zero-order” tables

Cynicism (frequencies)
Rank            Low    High    n
Officers         20     80    100
Supervisors      50     50    100

Cynicism (percentages)
Rank            Low    High
Officers        20%    80%    100%
Supervisors     50%    50%    100%

First-order partial tables

Cynicism - Males (frequencies)
Rank            Low    High    n
Officers         10     50     60
Supervisors      35     35     70

Cynicism - Males (percentages)
Rank            Low    High
Officers        17%    83%    100%
Supervisors     50%    50%    100%

Cynicism - Females (frequencies)
Rank            Low    High    n
Officers         10     30     40
Supervisors      15     15     30

Cynicism - Females (percentages)
Rank            Low    High
Officers        25%    75%    100%
Supervisors     50%    50%    100%

OUTCOME: REPLICATION

Each level of the control (“first order”) variable, male and female, yielded about the same findings as the zero-order table. Gender did not add anything to what we already knew about the relationship between rank and cynicism. In essence, we replicated the zero-order findings. In other words, gender is not a factor in rank → cynicism. Our original hypothesis remains confirmed.
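The arithmetic behind these tables can be checked with a short Python sketch. The counts are the ones given in the exercise; the dictionary layout and the helper name `percent_low` are just one way to organize them.

```python
def percent_low(low, high):
    """Percent of a row's cases that scored low on cynicism, rounded to a whole number."""
    return round(100 * low / (low + high))


# (low cynicism, high cynicism) counts from the exercise.
counts = {
    "all":     {"Officers": (20, 80), "Supervisors": (50, 50)},
    "males":   {"Officers": (10, 50), "Supervisors": (35, 35)},
    "females": {"Officers": (10, 30), "Supervisors": (15, 15)},
}

for group, table in counts.items():
    for rank, (low, high) in table.items():
        pct = percent_low(low, high)
        print(f"{group:8s} {rank:12s} low {pct}%  high {100 - pct}%  n = {low + high}")

# Sanity check: the male and female partial tables add up to the zero-order table.
for rank in ("Officers", "Supervisors"):
    combined = tuple(m + f for m, f in zip(counts["males"][rank], counts["females"][rank]))
    print(rank, "partials sum to zero-order:", combined == counts["all"][rank])
```

In both partial tables supervisors are markedly less cynical than officers, just as in the zero-order table, which is what makes this a replication.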

PRACTICAL EXERCISE

Hypothesis: Higher rank → Less cynicism

But the literature suggests that still another variable associated with rank, time on the job, may affect cynicism. Let’s “control” for time on the job. Here are the data:

LESS THAN FIVE YEARS ON THE JOB
– Officers: 0 low cynicism, 75 high cynicism
– Supervisors: 2 low cynicism, 40 high cynicism

FIVE YEARS OR MORE ON THE JOB
– Officers: 20 low, 5 high
– Supervisors: 48 low, 10 high

Create first-order partial tables for time on the job, convert tables to percentages, and analyze the results...

Original “zero-order” tables

Cynicism (frequencies)
Rank            Low    High    n
Officers         20     80    100
Supervisors      50     50    100

Cynicism (percentages)
Rank            Low    High
Officers        20%    80%    100%
Supervisors     50%    50%    100%

First-order partial tables

Cynicism, <5 years (frequencies)
Rank            Low    High    n
Officers          0     75     75
Supervisors       2     40     42

Cynicism, <5 years (percentages)
Rank            Low    High
Officers         0%   100%    100%
Supervisors      5%    95%    100%

Cynicism, 5+ years (frequencies)
Rank            Low    High    n
Officers         20      5     25
Supervisors      48     10     58

Cynicism, 5+ years (percentages)
Rank            Low    High
Officers        80%    20%    100%
Supervisors     83%    17%    100%

OUTCOME: EXPLANATION

Each level of the control (“first order”) variable, time on the job, demonstrates a very strong relationship with cynicism. This completely “explains” (wipes away) what seemed to be a relationship between rank and cynicism. Our original hypothesis (rank → cynicism) must be rejected. Now we have a new hypothesis to test: time on the job → cynicism. Good luck!
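To see why time on the job “explains away” rank, it can help to collapse the partial tables the other way, ignoring rank and comparing the two time-on-the-job groups. The Python sketch below does this with the counts from the exercise; the variable names are invented for the example, and the collapsed table is an illustration that does not appear on the slides.

```python
# (low cynicism, high cynicism) counts from the exercise, by time on the job and rank.
partials = {
    "<5 years": {"Officers": (0, 75), "Supervisors": (2, 40)},
    "5+ years": {"Officers": (20, 5), "Supervisors": (48, 10)},
}


def percent_low(low, high):
    """Percent of cases scoring low on cynicism, rounded to a whole number."""
    return round(100 * low / (low + high))


# Within each time-on-the-job group, rank makes almost no difference.
within = {
    tenure: {rank: percent_low(*cells) for rank, cells in table.items()}
    for tenure, table in partials.items()
}
print("percent low cynicism, by rank within tenure:", within)

# Collapsing over rank, time on the job makes a very large difference.
by_tenure = {}
for tenure, table in partials.items():
    low = sum(cells[0] for cells in table.values())
    high = sum(cells[1] for cells in table.values())
    by_tenure[tenure] = percent_low(low, high)
print("percent low cynicism, by tenure alone:", by_tenure)
```

Supervisors only looked less cynical in the zero-order table because most of them have five or more years on the job, while most officers do not.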

But isn’t this too “loosey-goosey”?

Cynicism - Males (frequencies)
Rank            Low    High    n
Officers         10     50     60
Supervisors      35     35     70

Cynicism - Males (percentages)
Rank            Low    High
Officers        17%    83%    100%
Supervisors     50%    50%    100%

Assume there is a relationship between variables. When we “switch” the value of the IV, will the change in the DV always be this obvious? No. And when the DV has multiple categories, such as in our parking lot exercise, visually discerning an effect can be impossible. Bottom line: changes in percentage are not enough.

Great. Now what?

Fortunately, we can use the cell frequencies to calculate a statistic known as “Chi-square” (χ²). This statistic assigns a numerical measure to the relationship between variables. We then look up that number in a table to determine if it is large enough to be statistically “significant.”

All we need is the original frequency table. We use that table to build a second table, which projects what the frequencies would be if there were NO relationship between variables. We then compare the two frequency tables.

More on that during the third part of the semester!
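As a preview, the two-table comparison can be sketched in Python using the standard Pearson chi-square formula: each expected frequency is (row total × column total) / N, and χ² is the sum over all cells of (observed − expected)² / expected. The slides defer the details to later in the semester, so treat this as a sketch of the usual textbook calculation. The function names are invented; the data are the male officers’ cynicism counts from the exercise.

```python
def expected_frequencies(observed):
    """Frequencies we would expect in each cell if there were NO relationship."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[rt * ct / n for ct in col_totals] for rt in row_totals]


def chi_square(observed):
    """Pearson chi-square: sum over cells of (observed - expected)^2 / expected."""
    expected = expected_frequencies(observed)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )


# Cynicism - Males: rows are Officers, Supervisors; columns are Low, High.
males = [[10, 50], [35, 35]]

print("expected:", expected_frequencies(males))
print("chi-square:", round(chi_square(males), 2))
```

The larger the gap between the observed and expected tables, the larger χ² becomes; the printed value is the number that would then be looked up in a chi-square table.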