© Willett, Harvard University Graduate School of Education, 1/17/2016. S010Y/C05: What types of data are collected?

Presentation transcript:

© Willett, Harvard University Graduate School of Education, 1/17/2016, S010Y/C05 – Slide 1
S010Y: Answering Questions with Quantitative Data. Class 5/II.2: Examining the Relationship Between Categorical Variables

Good research is a partnership of questions and data.

What types of data are collected? “Categorical” data and “continuous” data.
What kinds of question can be asked of those data?

“Descriptive” questions
- Categorical data: How many members of the class are women? What proportion of the class is full-time?
- Continuous data: How tall are class members, on average? How many hours a week do class members report that they study?

“Relational” questions
- Categorical data: Are men more likely to study part-time? Are women more likely to enroll in USP?
- Continuous data: Do people who say they study for more hours think they’ll finish their doctorate earlier? Are computer literates less anxious about statistics?

Slide 2

We’re trying to address the following research question: Is there a greater probability that a convicted murderer will be sentenced to death, in Georgia, if he kills someone Black, or if he kills someone White? (2475 cases)

As we’ve seen, this question can be addressed in the DEATHPEN dataset by asking whether the categorical variable DEATH is related to the categorical variable RVICTIM in the sample of convicted murderers. In other words, we are asking whether the values in the DEATH column correspond to the values in the RVICTIM column in some meaningful way.

Our approach:
- Display the sample relationship between DEATH and RVICTIM in a “two-way contingency table.”
- Describe their sample relationship with suitable sample percentages.
- Summarize their sample relationship using a Pearson chi-square (χ²) statistic.
- Use statistical inference to carry out a statistical test?
- Interpret and tell the story (especially to Justice Powell).

Slide 3

So far, we’ve begun by displaying and describing the sample relationship in a two-way contingency table. [Contingency table: frequencies we have observed in the sample]

Our prior estimation of slice-by-slice percentages in this block chart has described the sample relationship between DEATH and RVICTIM in these data. This descriptive analysis suggests that knowing the value of RVICTIM does indeed help you predict the value of DEATH, in the sample, and so perhaps we might legitimately conclude that DEATH and RVICTIM are “related.”

For instance:
- When the victim was Black, 1.33% of the defendants were sentenced to death.
- When the victim was White, 11.1% of the defendants were sentenced to death.

So, the percentage of our sample of convicted murderers who were sentenced to death in Georgia after killing a White victim was 8.33 times the percentage of convicted murderers who were sentenced to death after killing a Black victim.
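
The percentage comparison above can be sketched in a few lines. This is an illustration in Python rather than the SAS used in the handout, and the cell counts are an assumption: the transcript does not preserve the table, so the counts below were chosen only to be consistent with the reported 2475 cases and the 1.33% and 11.1% rates.

```python
# Illustrative cell counts (an ASSUMPTION, not quoted from the slides):
# chosen to agree with the reported total of 2475 cases and the
# 1.33% / 11.1% death-sentence rates.
observed = {
    "Black": {"No": 1482, "Yes": 20},   # victim was Black
    "White": {"No": 865, "Yes": 108},   # victim was White
}


def death_rate(counts):
    """Percentage sentenced to death within one victim-race group."""
    total = counts["No"] + counts["Yes"]
    return 100.0 * counts["Yes"] / total


rate_black = death_rate(observed["Black"])
rate_white = death_rate(observed["White"])
ratio = rate_white / rate_black  # how many times larger the White-victim rate is

print(f"Black victim: {rate_black:.2f}% sentenced to death")
print(f"White victim: {rate_white:.1f}% sentenced to death")
print(f"White-victim rate is about {ratio:.1f} times the Black-victim rate")
```

Conditioning on the race of the victim (the predictor) and comparing the rates of the outcome is what makes the two percentages directly comparable.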

Slide 4

To help summarize the sample relationship between DEATH and RVICTIM more fully, we imagined what the sample might look like if there were no relationship between the two variables. [Contingency table: frequencies expected in the sample if there were no relationship]

And then we asked ourselves how different the observed and expected tables of sample frequencies are:
- If the “observed” and “expected” contingency tables seem very similar, we might be tempted to conclude that we have not observed much of a relationship between DEATH and RVICTIM, or that it is even zero??
- If the “observed” and “expected” contingency tables seem very different from each other, we might be tempted to say that a relationship does indeed exist between the variables, and may be quite strong????
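
A minimal sketch of how a “no relationship” table can be built: each expected frequency is the row total times the column total, divided by the sample size. The code is Python rather than the handout’s SAS, and the observed counts are an assumption chosen to match the reported 2475 cases and percentages, since the transcript does not preserve the table itself.

```python
# Observed counts are an ASSUMPTION (the slide's table did not survive);
# rows are DEATH (No, Yes), columns are RVICTIM (Black, White).
observed = [
    [1482, 865],  # DEATH = No
    [20, 108],    # DEATH = Yes
]


def expected_counts(table):
    """Expected frequencies if the row and column variables were unrelated."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Under "no relationship", every column has the same row proportions.
    return [[r * c / n for c in col_totals] for r in row_totals]


expected = expected_counts(observed)

for obs_row, exp_row in zip(observed, expected):
    print(obs_row, [round(e, 1) for e in exp_row])
```

The expected table keeps the same row and column totals as the observed one; only the way cases are spread across the cells changes.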

Slide 5

To help us in our quest to computerize this process, we summarized the net discrepancy between the tables of observed and expected frequencies by estimating a single-number index. It was called the Pearson χ² statistic.
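
The formula shown on this slide did not survive in the transcript. For reference, the standard definition of the Pearson statistic is given below; the symbols O (observed frequency), E (expected frequency) and n (sample size) are notation introduced here, not taken from the slide.

```latex
\chi^2 \;=\; \sum_{\text{cells}} \frac{(O - E)^2}{E},
\qquad
E \;=\; \frac{(\text{row total}) \times (\text{column total})}{n}
```

Each cell contributes its squared discrepancy, scaled by the expected frequency, so the statistic is zero when the observed and expected tables agree exactly and grows as they diverge.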

Slide 6

Decision rule???
- “If χ² is big, then declare that there is a relationship between DEATH and RVICTIM.”
- “If χ² is zero, or close to zero, then declare that there is no relationship between DEATH and RVICTIM.”

Key issues:
- What is “big”?
- What is “close to zero”?
- Is 115 big or close to zero?

Slide 7

To respond to these issues we must step back … and think more broadly about the nature of the problem we’re facing …

First, let’s re-assess where we are:
- All we’ve done so far is putter around in some data on a sample of convicted murderers.
- But, out there, somewhere, there’s a larger population of convicted murderers from which our sample was drawn (somehow).
- Is there something about our “sampling from a population” that could resolve our problem?
- And, wouldn’t our conclusions be more compelling if there were some way to generalize our sample conclusions about the DEATH-RVICTIM relationship back to the underlying population?

This is called statistical inference, and it is the critical contribution of quantitative methods to research!

Slide 8

Of course, when you generalize from a sample back to its underlying population, you must be careful that your sole original empirical study has not been the victim of sampling idiosyncrasy!!!

For instance, is the following scenario plausible?
- There is really no relationship between DEATH and RVICTIM in the population.
- But, by accident, we have drawn an idiosyncratic sample from the population.
- This “sampling idiosyncrasy” has ended up giving us a χ² statistic that is as large as 115 purely by accident.

If this were plausible, you wouldn’t want to claim a relationship between DEATH and RVICTIM, even though the sample evidence points to one!!

How can we assess the plausibility of this scenario?

Slide 9

Hypothetical scenario … Imagine we draw samples of 2475 cases repeatedly from a hypothetical population of convicted murderers in which there is no relationship between DEATH and RVICTIM, and we go ahead and estimate the χ² statistic for each of these drawings, using our usual methods …

Hypothetical “null population” in which H₀: DEATH and RVICTIM are not related.
- Sample #1, χ² = 3.2
- Sample #2, χ² = 0.3
- Sample #3, χ² = 17.4
- Etc.
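
The repeated-sampling thought experiment can be sketched as a small simulation. This is Python rather than the handout’s SAS, and the population rates `p_death` and `p_white`, the number of replications, and the seed are all assumptions made for illustration; only the sample size of 2475 comes from the slides. In the simulated null population, DEATH and RVICTIM are generated independently, so any non-zero χ² arises purely by accident of sampling.

```python
import random


def chi_square_2x2(table):
    """Pearson chi-square for a 2x2 table of counts."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    total = 0.0
    for i in range(2):
        for j in range(2):
            e = rows[i] * cols[j] / n
            if e > 0:  # guard against an empty margin in a freak sample
                total += (table[i][j] - e) ** 2 / e
    return total


def simulate_null(n_cases, p_death, p_white, n_samples, seed=0):
    """Chi-square values from samples of a population with NO relationship."""
    rng = random.Random(seed)
    stats = []
    for _ in range(n_samples):
        counts = [[0, 0], [0, 0]]
        for _ in range(n_cases):
            death = int(rng.random() < p_death)  # drawn independently of ...
            white = int(rng.random() < p_white)  # ... the victim's race
            counts[death][white] += 1
        stats.append(chi_square_2x2(counts))
    return stats


# ASSUMED population rates (hypothetical values, roughly 5% and 39%).
null_stats = simulate_null(n_cases=2475, p_death=0.052, p_white=0.39,
                           n_samples=500, seed=0)

print("largest accidental chi-square:", round(max(null_stats), 1))
print("share of samples at or above 115:",
      sum(s >= 115 for s in null_stats) / len(null_stats))
```

Collecting the simulated values is exactly what the histogram on the following slides summarizes.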

Slide 10

In this hypothetical “repeated sampling from a null population” scenario, I could record all the values of the χ² statistic that occurred by accident in a vertical histogram … What if it looked like this?

[Vertical histogram: accidental value of the χ² statistic against the frequency of each accidental value]

The histogram summarizes the “natural variation” that could occur in a Pearson χ² statistic as a result of sampling idiosyncrasy, after drawing repeated samples from a hypothetical population in which there is no relationship between DEATH and RVICTIM.

If such a histogram were available, it would provide a context for deciding whether our sole “empirical” value of the χ² statistic was big or small!!!

Slide 11

[Vertical histogram: accidental value of the χ² statistic against the frequency of each accidental value]

If this were the histogram that could be obtained by sampling idiosyncrasy, what would you think?
- It seems highly unlikely that we could have obtained a value of the Pearson χ² statistic as large as 115, in our actual empirical analysis, if we had been sampling from a null population in which there was no relationship between DEATH and RVICTIM!
- So, we can reject the null hypothesis that there is no relationship between DEATH and RVICTIM, in the population, and conclude that there really is a relationship between the two variables!!

Slide 12

Actually, for you to reach a conclusion, I wouldn’t really even have to show you the entire vertical histogram … I could just tell you one of the two following alternatives:
- “Hey, in a hypothetical exercise of sampling repeatedly from a null population, … of all accidental values of the χ² statistic fall to the left of a value of 115, mate!!!”
- Or: “Hey, in a hypothetical exercise of sampling repeatedly from a null population, only … of all accidental values of the χ² statistic fall to the right of a value of 115, mate!!!”

[Vertical histogram: accidental value of the χ² statistic against the frequency of each accidental value]

In fact, I really only need to tell you one of these … so, I choose to tell you this: “In repeated sampling from a null population, we’d expect the proportion of all values of the Pearson χ² statistic that could be equal to, or greater than, 115 by an accident of sampling to be .0001.”

We call this proportion the “p-value,” and it can be obtained by computer simulation or from tables.
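
As an alternative to simulation or printed tables, the upper-tail proportion can be computed directly. The sketch below is in Python rather than the handout’s SAS, and it assumes the usual large-sample result that, for a 2×2 table, the null distribution of χ² is a chi-square distribution with 1 degree of freedom. In that case the statistic behaves like a squared standard normal value, so the tail area can be written with the complementary error function.

```python
import math


def chi2_p_value_df1(x):
    """P(chi-square with 1 df >= x), assuming the large-sample approximation.

    A chi-square variable with 1 df is a squared standard normal Z, so
    P(Z**2 >= x) = P(|Z| >= sqrt(x)) = erfc(sqrt(x / 2)).
    """
    if x < 0:
        raise ValueError("a chi-square statistic cannot be negative")
    return math.erfc(math.sqrt(x / 2.0))


for value in (0.3, 3.2, 17.4, 115.0):
    p = chi2_p_value_df1(value)
    label = "< .0001" if p < 0.0001 else f"{p:.4f}"
    print(f"chi-square = {value:6.1f}   p-value {label}")
```

For a statistic of 115 the computed proportion is far smaller than .0001, which is why software typically reports it simply as “<.0001”.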

Slide 13

At what p-value do you cease to believe that the single value of the χ² statistic that you obtained in your actual empirical research was “big” (i.e., was unlikely to have occurred by accident)?

[Series of histograms, each marking the sole value of your statistic, with p-values of .0001, .001, .01, .05, .10, .25 and .50]
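
One way to make this question concrete is to ask how large the χ² statistic has to be to reach each of these p-values. The Python sketch below (a stand-in for the handout’s SAS) assumes a 2×2 table, so 1 degree of freedom, and the large-sample chi-square approximation; the helper names are hypothetical. It finds each threshold by bisection on the tail-area function.

```python
import math


def p_value_df1(x):
    """Upper-tail area of a chi-square distribution with 1 df."""
    return math.erfc(math.sqrt(x / 2.0))


def critical_value_df1(p, lo=0.0, hi=200.0, tol=1e-9):
    """Chi-square value (1 df) whose upper-tail area equals p, by bisection."""
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if p_value_df1(mid) > p:
            lo = mid  # tail still too large: the threshold lies further right
        else:
            hi = mid
    return hi


for p in (0.0001, 0.001, 0.01, 0.05, 0.10, 0.25, 0.50):
    print(f"p-value {p:<7} reached when chi-square >= "
          f"{critical_value_df1(p):.2f}")
```

Bisection works here because the tail area decreases steadily as the statistic grows, so each p-value corresponds to exactly one threshold.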

Slide 14

Of course, we can’t actually do all this random re-sampling from a hypothetical null population … but we can get the computer to simulate it and tell us what it finds … it’s in Class 5/Handout 1:

    OPTIONS Nodate Pageno=1;
    TITLE1 'A010Y: Answering Questions with Quantitative Data';
    TITLE2 'Class 5/Handout 1: Introducing the Notion of Statistical Inference';
    TITLE3 'Death penalty and race bias in Georgia';
    TITLE4 'Data in DEATHPEN.txt';

    * Input data, name and label variables in dataset;
    DATA DEATHPEN;
      INFILE 'C:\DATA\A010Y\DEATHPEN.txt';
      INPUT DEATH RDEFEND RVICTIM;
      LABEL DEATH   = 'Sentenced to death?'
            RDEFEND = 'Race of defendant'
            RVICTIM = 'Race of victim';

    * Format labels for values of categorical variables;
    PROC FORMAT;
      VALUE DFMT 0 = 'No' 1 = 'Yes';
      VALUE RFMT 1 = 'Black' 2 = 'White';

    * Summarizing the relationship between DEATH and RVICTIM;
    PROC FREQ DATA=DEATHPEN;
      TITLE5 'Using a p-value to Test the Relationship Between DEATH and RVICTIM';
      FORMAT DEATH DFMT. RVICTIM RFMT.;
      TABLES DEATH*RVICTIM / EXPECTED DEVIATION CELLCHI2 CHISQ NOCOL NOROW NOPERCENT;
    RUN;

Everything before PROC FREQ is the usual titling, data input, labeling and formatting that you have seen several times; it should be getting quite familiar by now. Next page …

Slide 15

    * Summarizing the relationship between DEATH and RVICTIM;
    PROC FREQ DATA=DEATHPEN;
      TITLE5 'Using a p-value to Test the Relationship Between DEATH and RVICTIM';
      FORMAT DEATH DFMT. RVICTIM RFMT.;
      TABLES DEATH*RVICTIM / EXPECTED DEVIATION CELLCHI2 CHISQ NOCOL NOROW NOPERCENT;
    RUN;

PC-SAS uses the PROC FREQ procedure to carry out standard contingency table analyses.
- The TABLES statement requests a contingency table of DEATH by RVICTIM.
- The EXPECTED option requests the computation of the expected frequencies.
- The DEVIATION option requests the computation of the difference between the observed and expected frequencies.
- The CELLCHI2 option requests the computation of the bit of the overall χ² statistic that is contributed by each cell in the contingency table.
- The CHISQ option requests the estimation of the χ² statistic.
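
To see what these options report, the same per-cell quantities can be computed by hand. The sketch below is Python, not SAS, and is only a rough analogue of the PROC FREQ output; the counts are an assumption chosen to match the reported 2475 cases and percentages, and the function and key names are hypothetical.

```python
# ASSUMED counts; rows are DEATH (No, Yes), columns are RVICTIM (Black, White).
observed = [
    [1482, 865],
    [20, 108],
]
row_labels = ["No", "Yes"]
col_labels = ["Black", "White"]


def cell_breakdown(table):
    """Per-cell observed, expected, deviation and chi-square contribution."""
    row_totals = [sum(r) for r in table]
    col_totals = [sum(c) for c in zip(*table)]
    n = sum(row_totals)
    cells = []
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_totals[i] * col_totals[j] / n   # like EXPECTED
            dev = obs - exp                           # like DEVIATION
            cells.append({
                "death": row_labels[i],
                "rvictim": col_labels[j],
                "observed": obs,
                "expected": exp,
                "deviation": dev,
                "cell_chi2": dev ** 2 / exp,          # like CELLCHI2
            })
    return cells


cells = cell_breakdown(observed)
chi2 = sum(c["cell_chi2"] for c in cells)             # like CHISQ

for c in cells:
    print(f"DEATH={c['death']:<3} RVICTIM={c['rvictim']:<5} "
          f"obs={c['observed']:5d} exp={c['expected']:8.1f} "
          f"dev={c['deviation']:7.1f} cell chi2={c['cell_chi2']:6.2f}")
print(f"Pearson chi-square = {chi2:.1f}")
```

Printing the cell contributions shows which cells drive the overall statistic; with these counts, most of it comes from the two DEATH = Yes cells.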

Slide 16

[Annotated PROC FREQ output. In each cell of the table:]
- Here’s the observed frequency in the cell.
- Here’s the expected frequency in the cell.
- Here’s the observed frequency minus the expected frequency in the cell.
- Here’s the cell’s contribution to the χ² statistic.

[Below the table:]
- Here’s the χ² statistic.
- Here’s the p-value, <.0001.

Because the p-value is less than .05 (representing a 5% chance of getting a χ² statistic this large by an accident of sampling from a null population), we can conclude that DEATH and RVICTIM are probably related in the actual population of convicted murderers in Georgia …

Slide 17

So, there it is … statistical inference … in several steps:

1. State a research question: Is imposition of the death penalty related to the race of the victim in the population of convicted murderers in Georgia?
2. Display and describe the observed data: use a block chart and sample frequencies.
3. Summarize the observed data in a contingency table: find the observed frequencies, figure out the expected frequencies, estimate the χ² statistic.
4. Estimate the p-value: figure out how likely it is that you could’ve obtained a value of the χ² statistic equal to, or greater than, the observed value by an accident of sampling from a population in which the null hypothesis (H₀: DEATH and RVICTIM are not related in the population) is true.
5. If your p-value is less than .05 (.01? .10?), reject the null hypothesis and conclude that there really is a relationship between DEATH and RVICTIM in the population, i.e., that you are confident your finding is not a consequence of idiosyncratic sampling.
6. Interpret your findings in words, drawing explicitly on your plots, summary statistics, and test statistics, for a naïve but intelligent audience to read.

“In the population of convicted murderers in Georgia, capital sentencing and race of victim are related (χ² = 115, p < .0001). The percentage of convicted murderers who were sentenced to death after killing a White victim was more than 8 times the percentage of convicted murderers who were sentenced to death after killing a Black victim. In the block chart in Figure 1, notice that … etc.”

p.s. Make sure the Supreme Court gets the memo!