Bivariate Relationships: Chapter 5. Sharon Lawner Weinberg and Sarah Knapp Abramowitz, Statistics Using SPSS: An Integrative Approach, Second Edition.

Summarizing the Relationship Between Two Variables: An Overview
Variable Types | Summary Graphic | Summary Statistic
Both Scale | Scatterplot | Pearson Correlation
Both Ordinal | Scatterplot | Spearman Correlation
An Ordinal & A Scale | Scatterplot | Spearman Correlation
A Scale & A Dichotomy | Scatterplot or Boxplot | Pearson (point biserial) Correlation
Both Dichotomies | Clustered Bar Graph | Pearson (phi-coefficient) Correlation or Contingency Table

The Relationship Between Two Scale Variables: What the Scatterplot Tells Us. The scatterplot shows whether the relationship appears linear. If it does appear linear, the scatterplot also shows the direction and nature of the linear relationship and its relative strength.

Overview: Examples Using the Scatterplot to Describe the Relationship Between Two Scale Variables. Hamburg data set: FAT and CALORIES. States data set: PERTAK (percentage of eligible students taking the SAT) and SATV (average verbal SAT score for the state). Currency data set: BILLVALUE (bill denomination) and the number of bills in circulation. Marijuana data set: YEAR and the percentage of students reporting that they ever smoked marijuana from

Creating the Scatterplot Example: Using the Hamburg data set, describe the relationship between the FAT and the CALORIES of a burger. Solution: To obtain the scatterplot between FAT and CALORIES for the Hamburg data set, using SPSS, go to Graphs on the main menu bar, Legacy Dialogs, and then Scatter. Click Define. Put CALORIES into the box labeled y-axis and FAT into the box labeled x-axis and click OK.

Scatterplot: FAT vs. CALORIES

Interpreting the Scatterplot of FAT vs. CALORIES A line appears to fit the data well; i.e., there is not a simple curve that would provide a better fit, so a linear model is appropriate. The direction of the linear relationship is positive because the slope of the line representing the data is positive. The nature of the linear relationship is that burgers that are relatively high in fat tend also to be relatively high in calories. The strength of the linear relationship appears to be strong because the points cluster tightly around the line.

Editing the Scatterplot to Label Points Go to Graphs on the main menu bar, Legacy Dialogs, and then Scatter. Click Define. Put CALORIES into the box labeled y-axis, FAT into the box labeled x-axis, and NAME in the box labeled label cases by and click OK. Double click on the graph to put it in the Chart Editor. Click on Elements, Show Data Labels. Move Name to the Displayed box and eliminate count. Click Apply, Close.

Labeled Scatterplot: FAT vs CALORIES

Scatterplot: PERTAK vs. SATV

Interpreting the Scatterplot of SATV vs. PERTAK. Although the points have a curvilinear shape, a line appears to represent them reasonably well, so we will use a line in this case. The direction of the linear relationship is negative because the slope of the line representing the data is negative. The nature is that states with a relatively low percentage of students taking the SAT tend to have higher SAT Verbal scores, on average. The strength of the linear relationship is more moderate than in the hamburger example because the points in this case do not cluster as tightly around the line.

Scatterplot: Denomination (BILLVALUE) vs. number of bills in circulation. Note: Use Transform, Compute to combine variables to create a variable for the number of bills in circulation.

Interpreting the Scatterplot of BILLVALUE vs. NUMBER Because the points have a “cloud like” formation, neither a simple curve nor a line is a good fit for these data. We conclude that there is little or no relationship between the bill value and the number in circulation.

Scatterplot: Year vs. percentage of high school seniors reporting that they smoked marijuana at least once. Note: Use Select Cases to restrict to the appropriate years.

Interpreting the Scatterplot of YEAR vs. MARIJUANA A simple curve (or two lines) provides a better fit for the data than a single line and is therefore more appropriate than a line for modeling the data. The relationship between marijuana use and year is non-linear.

Quantifying the Linear Relationship between Two Scale Variables: Pearson Product Moment Correlation Coefficient. Often called simply the correlation and symbolized by the letter r. Before calculating it, use a scatterplot to verify that the relationship between the variables appears to be linear. It is calculated as the average of the products of the paired z-scores. This summary statistic measures the direction, nature, and strength of the linear relationship. Direction: look at the sign of r (positive or negative). Nature: look at the sign of r (positive means that high scores on one variable correspond to high scores on the other and low with low; negative means that low scores on one variable correspond to high scores on the other and vice versa). Strength: look at the magnitude (absolute value) of r. In the social sciences, a good rule of thumb comes from Cohen’s scale: r = .1 weak, r = .3 moderate, r = .5 strong.
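The "average of the products of the z-scores" definition can be sketched in a few lines of Python. This is only an illustration, not the SPSS procedure: the function names and the five data values are hypothetical, and the z-scores use the population standard deviation (dividing by N) so that a plain average of the products gives r. With the sample standard deviation, the sum of products would be divided by N - 1 instead.

```python
import math

def z_scores(values):
    """Standardize values using the population standard deviation (divide by N)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]

def pearson_r(x, y):
    """Pearson r as the average of the products of paired z-scores."""
    zx, zy = z_scores(x), z_scores(y)
    return sum(a * b for a, b in zip(zx, zy)) / len(x)

# Hypothetical illustrative values, NOT taken from the Hamburg data set.
fat = [10, 15, 20, 25, 30]
calories = [270, 320, 410, 430, 520]

r = pearson_r(fat, calories)
print(round(r, 3))
```

The sign of the printed value gives the direction and nature of the relationship, and its absolute value gives the strength.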

Obtaining the Pearson Correlation Using SPSS To use SPSS to obtain the correlation coefficient between CALORIES and FAT, click Analyze on the Main Menu Bar, Correlate, and Bivariate. Move the two variables, CALORIES and FAT, into the Variables box and click OK.

Interpreting the Pearson Correlation Coefficients. The correlation between FAT and CALORIES is .997, indicating a very strong positive linear relationship: burgers that are relatively high in fat tend also to be relatively high in calories, and burgers that are relatively low in fat tend also to be relatively low in calories. The correlation between SAT Verbal and the percentage of students taking the SAT is -.86, indicating a strong negative linear relationship: states that have a relatively high verbal SAT average tend to have a relatively low percentage of students taking the SAT, and states that have a relatively low verbal SAT average tend to have a relatively high percentage of students taking the SAT.
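One way to turn a correlation into a verbal description of this kind is sketched below. It assumes Cohen's conventional benchmarks of .1, .3, and .5 as cutoffs. The function names and the "negligible" label for values below .1 are assumptions of this sketch, not part of the slides.

```python
def cohen_strength(r):
    """Label the strength of r using Cohen's benchmarks (.1, .3, .5) as cutoffs."""
    size = abs(r)
    if size >= 0.5:
        return "strong"
    if size >= 0.3:
        return "moderate"
    if size >= 0.1:
        return "weak"
    return "negligible"  # label for |r| < .1 is an assumption of this sketch

def describe(r):
    """Combine strength (magnitude of r) and direction (sign of r)."""
    if r == 0:
        return "no linear relationship"
    direction = "positive" if r > 0 else "negative"
    return f"{cohen_strength(r)} {direction} linear relationship"

print(describe(0.997))  # FAT and CALORIES
print(describe(-0.86))  # SATV and PERTAK
```

Both correlations from the slide fall in the "strong" band. The benchmarks do not distinguish "very strong" from "strong", so that finer wording remains a judgment call.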

Other Properties of Correlation. The strength of the correlation is measured on an ordinal scale. Correlation does not imply causation; i.e., when two variables are correlated, it is not necessarily true that changing one will result in a predictable change in the other. A linear transformation applied to one variable does not change the magnitude of the correlation. The sign of the correlation will change, however, if the transformation involves multiplication by a negative number. Restricting the range of one of the variables can increase or decrease the magnitude of the correlation.
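The linear-transformation property can be checked numerically. The sketch below uses hypothetical data and a hand-written `pearson_r` helper (an assumption, not SPSS output): rescaling x with a positive multiplier leaves r unchanged, while a negative multiplier flips only its sign.

```python
import math

def pearson_r(x, y):
    """Pearson correlation from deviation sums."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data for illustration only.
x = [2, 4, 5, 7, 9, 12]
y = [1, 3, 2, 6, 8, 9]

r_original = pearson_r(x, y)
r_rescaled = pearson_r([1.8 * v + 32 for v in x], y)  # positive multiplier
r_flipped = pearson_r([-2 * v + 5 for v in x], y)     # negative multiplier

print(round(r_original, 4), round(r_rescaled, 4), round(r_flipped, 4))
```

The first two printed values agree, and the third has the same magnitude with the opposite sign.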

Relationships between Two Ordinal or One Ordinal and One Scale: Scatterplot and Spearman Rank Correlation Coefficient The Spearman correlation, called Spearman’s rho, is a special case of the Pearson correlation computed on ranked data. Example: Describe the relationship, or indicate that there is not one, between the amount of time spent in school on homework (HWKIN12) and the amount of time spent out of school on homework (HWKOUT12) in twelfth grade for students in the NELS data set.

Scatterplot: HWKIN12 and HWKOUT12

Obtaining the Spearman Rank Correlation Coefficient Using SPSS Click Analyze, Correlate, Bivariate. Move the variables HWKIN12 and HWKOUT12 into the Variables box. Click Spearman and click off Pearson in the Correlation Coefficients box. Click OK. Note that when using SPSS, we do not need to transform the data to rankings to obtain the Spearman correlation coefficient. SPSS does this transformation for us.
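The ranking transformation that SPSS performs behind the scenes can be sketched as follows: each variable is converted to ranks (tied values share the average of their ranks), and the Pearson correlation is then computed on the ranks. The helper names and the homework values are hypothetical and are not taken from the NELS data set.

```python
import math

def average_ranks(values):
    """Rank values from 1 to n; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the ranked data."""
    return pearson_r(average_ranks(x), average_ranks(y))

# Hypothetical ordinal homework categories, for illustration only.
hwk_in = [0, 1, 1, 2, 3, 4]
hwk_out = [1, 1, 2, 4, 3, 5]
print(round(spearman_rho(hwk_in, hwk_out), 3))
```

Because only the ranks matter, any monotonic increasing relationship, linear or not, yields rho = 1.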

Interpreting the Spearman Rank Correlation Coefficient. The Spearman correlation is interpreted in the same way as the Pearson correlation. In this case, Spearman’s rho = .40, indicating a moderate positive relationship. Twelfth grade students in the NELS data set who spend a relatively large amount of time doing homework in school tend also to spend a relatively large amount of time doing homework outside of school, and students who spend a relatively small amount of time doing homework in school tend also to spend a relatively small amount of time doing homework outside of school.

Relationships between One Scale and One Dichotomous Variable Example using the Hamburg data set: Describe the relationship between calories and cheese.

Interpreting the Correlation When One Variable is Scale and One is Dichotomous. The correlation between CALORIES and CHEESE is r = .51. The correlation is positive, indicating that high scores on one variable are associated with high scores on the other. CHEESE is coded with 0 (a relatively low score) representing the absence of cheese and 1 (a relatively high score) representing the presence of cheese. Burgers with cheese tend to be higher in calories than those without cheese. This special case of Pearson correlation is sometimes called the point biserial correlation.
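The point biserial correlation is simply the Pearson correlation applied to a 0/1 variable. The sketch below, using hypothetical burger values rather than the Hamburg data, also checks it against the equivalent group-means form, (M1 - M0) / s * sqrt(p * q), where s is the population standard deviation of the scale variable and p and q are the proportions coded 1 and 0. This makes it visible that r is positive exactly when the group coded 1 has the higher mean.

```python
import math

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def point_biserial(scores, group):
    """Group-means form: (M1 - M0) / s * sqrt(p * q), with s dividing by N."""
    n = len(scores)
    ones = [s for s, g in zip(scores, group) if g == 1]
    zeros = [s for s, g in zip(scores, group) if g == 0]
    p = len(ones) / n
    q = 1 - p
    mean_all = sum(scores) / n
    sd = math.sqrt(sum((s - mean_all) ** 2 for s in scores) / n)
    mean_diff = sum(ones) / len(ones) - sum(zeros) / len(zeros)
    return mean_diff / sd * math.sqrt(p * q)

# Hypothetical values, NOT the Hamburg data: 1 = cheese, 0 = no cheese.
calories = [270, 320, 410, 430, 520, 350]
cheese = [0, 0, 1, 0, 1, 1]

r_pearson = pearson_r(calories, cheese)
r_formula = point_biserial(calories, cheese)
print(round(r_pearson, 3), round(r_formula, 3))
```

The two printed values agree, since both expressions compute the same quantity.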

Description of the Impeach Data Set On February 12, 1999, for only the second time in the nation’s history, the U.S. Senate voted on whether to remove a President, based on impeachment articles passed by the U.S. House. Dozens of political talk shows featured analyses of why senators may have voted the way they did, but such discourse was rarely (if ever) informed by systematic statistical analysis of the votes. Professor Alan Reifman of Texas Tech University created this data set about the senators to be used as part of such an analysis. The relevant variable descriptions appear in the following table.

Variables in the Impeach Data Set

Scatterplot Example: Describe the relationship between conservatism score and the vote on perjury

Interpreting the Correlation between Senators’ Conservatism and Their Vote on Perjury. The correlation between VOTE1 and conservatism is r = .87, indicating a strong relationship between the two variables. The sign of the correlation is positive, so high scores on one variable are associated with high scores on the other. VOTE1 is coded with 0 (a relatively low score) representing not guilty and 1 (a relatively high score) representing guilty. Senators who are more conservative tended to vote guilty on perjury.

Scatterplot Example: Describe the relationship between conservatism score and the vote on obstruction of justice

Interpreting the Correlation between Senators’ Conservatism and Their Vote on Obstruction of Justice. The correlation between VOTE2 and conservatism is r = .94, indicating a strong relationship between the two variables and a stronger relationship than that between VOTE1 and conservatism. The sign of the correlation is positive, so high scores on one variable are associated with high scores on the other. VOTE2 is coded with 0 (a relatively low score) representing not guilty and 1 (a relatively high score) representing guilty. Senators who are more conservative tended to vote guilty on obstruction of justice.

Relationships between Two Dichotomous Variables. Example: Is there a relationship between whether or not the senator is first-term and his or her vote on perjury? Solutions via: clustered bar graph, Pearson correlation, and crosstabulation.

Using SPSS to Obtain a Clustered Bar Graph Click Graphs on the main menu bar, Legacy Dialogs, and Bar. Change from Simple to Clustered and click Define. Put VOTE1 in the Category Axis box and NEWBIE in the Define Clusters By box. Click OK.

Clustered Bar Graph

Using SPSS to Obtain the Contingency Table To obtain the frequencies of each of the four cells (a contingency table or cross-tabulation), click Analyze on the main menu bar, Descriptive Statistics, Crosstabs. Put VOTE1 in the Row(s) box and NEWBIE in the Column(s) box. Click OK.

Contingency Table

Contingency Table Analysis. First-term senators tended to vote guilty and more established senators tended to vote not guilty. Any of the following alternatives may be used to provide statistical support. Approximately 62.9 percent (39/62*100) of the non-first-term senators voted not guilty, whereas 42.1 percent (16/38*100) of the first-term senators voted not guilty. Approximately 37.1 percent (23/62*100) of the non-first-term senators voted guilty, whereas 57.9 percent (22/38*100) of the first-term senators voted guilty. Approximately 70.9 percent (39/55*100) of the not guilty votes came from non-first-term senators, whereas 51.1 percent (23/45*100) of the guilty votes came from non-first-term senators. Approximately 29.1 percent (16/55*100) of the not guilty votes came from first-term senators, whereas 48.9 percent (22/45*100) of the guilty votes came from first-term senators.
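The percentages quoted above come from dividing each cell count by either its tenure total or its vote total. A minimal Python sketch, using the four cell counts stated on the slide (the dictionary layout and function name are assumptions of this sketch):

```python
# Cell counts from the slide: (tenure, vote on perjury) -> number of senators.
counts = {
    ("non-first term", "not guilty"): 39,
    ("non-first term", "guilty"): 23,
    ("first term", "not guilty"): 16,
    ("first term", "guilty"): 22,
}

def percentages(counts, within):
    """Percent of each cell within its tenure group (within=0) or vote group (within=1)."""
    totals = {}
    for key, n in counts.items():
        totals[key[within]] = totals.get(key[within], 0) + n
    return {key: round(100 * n / totals[key[within]], 1) for key, n in counts.items()}

by_tenure = percentages(counts, within=0)  # e.g. share of first-term senators voting guilty
by_vote = percentages(counts, within=1)    # e.g. share of guilty votes cast by first-term senators

for key in counts:
    print(key, by_tenure[key], by_vote[key])
```

Conditioning on tenure answers "how did each group vote?", while conditioning on vote answers "who cast each kind of vote?". Either set supports the same conclusion.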

Correlation Analysis. The correlation between VOTE1 and NEWBIE is r = .20. The sign of the correlation is positive, so high scores on one variable are associated with high scores on the other. VOTE1 is coded with 0 (a relatively low score) representing not guilty and 1 (a relatively high score) representing guilty. NEWBIE is coded with 0 representing non-first term and 1 representing first term. First-term senators tended to vote guilty on perjury and more established senators tended to vote not guilty. This special case of Pearson correlation is sometimes called the phi coefficient.
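The phi coefficient can be computed directly from the four cell counts of the contingency table, and it matches the Pearson correlation computed on the underlying 0/1 codes. The sketch below uses the counts reported in the contingency table analysis. The function names are assumptions of this sketch, not SPSS terminology.

```python
import math

def phi_coefficient(n00, n01, n10, n11):
    """Phi from a 2x2 table; n_ij counts cases with NEWBIE = i and VOTE1 = j."""
    numerator = n00 * n11 - n01 * n10
    denominator = math.sqrt((n00 + n01) * (n10 + n11) * (n00 + n10) * (n01 + n11))
    return numerator / denominator

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Counts from the slide: 39 and 23 non-first-term senators voted not guilty and
# guilty; 16 and 22 first-term senators voted not guilty and guilty.
phi = phi_coefficient(39, 23, 16, 22)

# Expand the table back into 100 pairs of 0/1 codes and apply Pearson directly.
newbie, vote = [], []
for (n_code, v_code), count in {(0, 0): 39, (0, 1): 23, (1, 0): 16, (1, 1): 22}.items():
    newbie += [n_code] * count
    vote += [v_code] * count

print(round(phi, 2), round(pearson_r(newbie, vote), 2))
```

Both values round to .20, in agreement with the correlation reported on the slide.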

Relationships between Other Variable Types Nominal non-dichotomous or ordinal with fewer than about five categories by dichotomous. Example: Are there regional differences in how the senators tended to vote on obstruction of justice? Nominal non-dichotomous or ordinal with fewer than about five categories by scale. Example: Are there regional differences in the typical conservatism score of the senators?

Clustered Bar Graph: Graphically Representing Vote on Obstruction vs Region

Contingency Table: Tabulating Vote on Obstruction of Justice by Region

Contingency Table Analysis. Senators from the northeast tended to vote not guilty, while those from the south and west tended to vote guilty and those from the midwest were equally likely to vote guilty or not guilty. In particular, approximately 83.3 percent (15/18*100) of the senators from the northeast voted not guilty, whereas 50.0 percent (12/24*100) from the midwest, 40.6 percent (13/32*100) from the south, and 38.5 percent (10/26*100) from the west voted not guilty. Alternatively, in terms of voting guilty, approximately 16.7 percent (3/18*100) of the senators from the northeast voted guilty, whereas 50.0 percent (12/24*100) from the midwest, 59.4 percent (19/32*100) from the south, and 61.5 percent (16/26*100) from the west voted guilty.
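The regional percentages follow the same pattern as the 2x2 case: each cell count is divided by its region total and multiplied by 100. A short sketch using the counts quoted on the slide (the dictionary layout and function name are assumptions of this sketch):

```python
# Counts from the slide: region -> votes on obstruction of justice.
votes_by_region = {
    "northeast": {"not guilty": 15, "guilty": 3},
    "midwest": {"not guilty": 12, "guilty": 12},
    "south": {"not guilty": 13, "guilty": 19},
    "west": {"not guilty": 10, "guilty": 16},
}

def row_percentages(table):
    """Percent of each vote within each region (rows sum to 100)."""
    result = {}
    for region, cells in table.items():
        total = sum(cells.values())
        result[region] = {vote: 100 * n / total for vote, n in cells.items()}
    return result

pct = row_percentages(votes_by_region)
for region, cells in pct.items():
    print(f"{region}: {cells['not guilty']:.1f}% not guilty, {cells['guilty']:.1f}% guilty")
```

Reading down the printed rows shows the pattern described on the slide: the northeast leans heavily toward not guilty, the midwest splits evenly, and the south and west lean toward guilty.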

Boxplots: Graphically Representing Conservatism Score by Region

Compare Means or Medians: Comparing Conservatism Scores by Region

Analysis Based on Medians Because the data are noticeably skewed for the northeast region, a more appropriate comparison of conservatism across regions is via the median, although results based on the means in this example yield the same result. According to the values of the median, the most conservative senators come from the south (72), followed by the west (64), the midwest (50), and the northeast (19.5).

Selection The table on the following slide provides guidelines for choosing the appropriate statistic(s) and graphs for describing the relationship between two variables. Other combinations may be correct.