SPSS Session 4: Association and Prediction Using Correlation and Regression.


Learning Objectives
- Review information from Lecture 10
- Understand the relationship between two interval/ratio variables
- Test for association between two variables using correlation and interpret the correlation coefficients
- Using regression, describe how one variable can be used to predict the score on another
- Conduct correlation and regression analyses in SPSS and interpret the statistical findings

Review of Lecture 10
Completion of that session enabled you to:
- Understand how multiple variables may interact with one another
- Appreciate the role of intervening variables
- Be aware of how the interpretation of statistics may be affected by outliers and misinterpretations

Association Between Variables Correlation is a statistical technique that allows us to gauge the association between two interval/ratio variables. For example, we would expect age and height to be correlated among children: as age increases, we expect a corresponding increase in height. Pearson’s r is the most common correlation coefficient. Correlation is best understood through a chart called a scatterplot.

Correlation and Pearson’s r Pearson’s r is the most common correlation coefficient. It statistically shows the magnitude and direction of a relationship between two variables, and it ranges from -1 to 1. Its distance from 0, in either direction, shows the magnitude. The sign of r (+/-) shows the direction: – either negative or positive.
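To make the definition concrete, here is a minimal Python sketch that computes Pearson’s r from deviations around each mean: the sum of cross-products divided by the square root of the product of the two sums of squares. The function name `pearson_r` and the toy data are illustrative assumptions. In practice, SPSS computes the same quantity through the Bivariate Correlations dialog.

```python
import math

def pearson_r(x, y):
    """Pearson's r: sum of cross-products of deviations divided by the
    square root of the product of the sums of squared deviations."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dx = [xi - mean_x for xi in x]
    dy = [yi - mean_y for yi in y]
    sxy = sum(a * b for a, b in zip(dx, dy))
    sxx = sum(a * a for a in dx)
    syy = sum(b * b for b in dy)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical toy data: the sign gives the direction, the size gives the magnitude.
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0 (perfect positive)
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))  # -1.0 (perfect negative)
```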

Scatterplots Scatterplots produce a useful visualization of the association between two variables. The independent variable is shown on the horizontal axis (X axis). The dependent variable is shown on the vertical axis (Y axis). In the next example, we wanted to describe the relationship between the age of the person responding to the questionnaire in our child protection study and the age of the child in their care.

In this example of a scatterplot, age of the respondent is on the X axis. Age of the child is on the Y axis.

Each dot is a single family, placed at the point where the respondent’s age (X axis) and the child’s age (Y axis) intersect. Example of a case: parent age = 45 years, child age = 5 years.

Correlation Lines Based on the scatterplot, think of a line that could be drawn to represent the relationship between the age of the person responding to the questionnaire and the age of the child in their care. This line should keep the points as close to it as possible. In the usual (least squares) approach, it minimizes the sum of the squared vertical distances between the points and the line. It’s often called “the line of best fit”.
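The least squares idea can be sketched in a few lines of Python. The sketch fits the slope and intercept that minimize the sum of squared vertical distances. It then computes each case’s vertical distance from the line (its residual), which is how well-fitting and poorly fitting cases can be told apart. The (respondent age, child age) pairs are made up for illustration and are not the study’s data. The function names are hypothetical.

```python
def least_squares_line(x, y):
    """Return (intercept, slope) of the ordinary least squares line y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def residuals(x, y, intercept, slope):
    """Vertical distance of each point from the line (observed minus predicted)."""
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

# Hypothetical (respondent age, child age) pairs, NOT the study's data.
# The last pair imitates an older carer looking after a young child.
respondent_age = [20, 25, 30, 35, 40, 45, 50, 50]
child_age = [1, 3, 5, 7, 9, 11, 13, 2]

a, b = least_squares_line(respondent_age, child_age)
res = residuals(respondent_age, child_age, a, b)
# The case with the largest absolute residual is the one the line fits worst.
worst = max(range(len(res)), key=lambda i: abs(res[i]))
print(f"line: child_age = {a:.2f} + {b:.3f} * respondent_age")
print("worst-fitting case:", respondent_age[worst], child_age[worst])
```

Squaring the distances means a single case far from the line counts heavily. This is one reason the cases discussed below, which sit far from the line, deserve attention.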

Correlation Line?

Correlation Line Shown

The line describes some of the cases very well! For these cases, the ages of the respondent and child sit right on the line!

The line does not fit other cases quite as well. These cases are vertically far from the line. Perhaps these were cases where the children were placed in the care of their grandparents after being removed from their parents.

Correlation and Pearson’s r There are three critical characteristics of correlation needed to properly describe the association between two variables:
1. MAGNITUDE
2. DIRECTION
3. STATISTICAL SIGNIFICANCE

Magnitude of the Correlation The distance from 0, in either direction, shows the magnitude. Correlation coefficients farther from 0 (closer to either -1 or 1) are deemed stronger. Two variables with a correlation of -1 or 1 are perfectly correlated.

Direction of Correlation Correlation scores above 0 are called positive correlations. – As values for one variable increase, we would expect an associated increase in the other. Correlation scores below 0 are called negative correlations. – As values for one variable increase, we would expect an associated decrease in the other.

The correlation between the ages of the children and the respondents to the questionnaire in the child protection study was r=.514. The magnitude was moderate, as the correlation coefficient was roughly halfway between 0 and 1. Because the correlation was above 0, it was a positive correlation.
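The magnitude and direction rules can be written as a small Python helper. The cutoffs below are an assumption chosen to match the wording used in this session:
- |r| below .3 is weak
- .3 up to .7 is moderate
- .7 or more is strong

Textbooks use different cutoffs, so treat these as illustrative. The helper also says nothing about statistical significance, the third characteristic, which needs a separate test.

```python
def describe_r(r, weak_below=0.3, strong_from=0.7):
    """Label r by magnitude and direction. The cutoffs are illustrative
    assumptions chosen to match this session's wording, not a standard."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and 1")
    if r == 0:
        return "no linear correlation"
    size = abs(r)
    if size == 1.0:
        magnitude = "perfect"
    elif size >= strong_from:
        magnitude = "strong"
    elif size >= weak_below:
        magnitude = "moderate"
    else:
        magnitude = "weak"
    direction = "positive" if r > 0 else "negative"
    return f"{magnitude}, {direction}"

# Coefficients reported in this session.
for r in (0.514, -0.184, 0.556, -0.486, -0.403):
    print(f"r = {r:+.3f}: {describe_r(r)}")
```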

Correlation Example 1: GHQ and WAI We wanted to test for an association between two variables in our child protection study: – The General Health Questionnaire (GHQ) total score, a measure of psychological distress reported by the respondent answering the questionnaire. – The Working Alliance Inventory (WAI) total score, a measure of the quality of the relationship that respondents reported having with their child protection worker. We hypothesized that respondents reporting greater distress (GHQ scores) would report having a worse relationship (WAI scores) with their child protection worker.

Correlation Example 1: GHQ and WAI Our research hypothesis is that GHQ scores and WAI scores are negatively and significantly correlated. We expected the r correlation coefficient to be less than 0, closer to -1, and statistically significant. Our null hypothesis is that the two variables are not significantly associated and thus have an r correlation coefficient not significantly different from 0.

Correlation Example 1: GHQ and WAI In SPSS, we select the “Analyze” menu, then “Correlate”, and select “Bivariate”.

Correlation Example 1: GHQ and WAI The “Bivariate Correlations” window will appear. Find the “WAI_Total” and “GHQ_TotalScore” variables and add them to the “Variables” list. The options selected below are the usual defaults.

Correlation Example 1: GHQ and WAI Click “OK” to conduct the analysis.

Correlation Results 1: GHQ and WAI The results from the analysis indicate that the GHQ and WAI scores have a weak, negative correlation (r= -.184). However, the p-value for this correlation is above the standard significance level of α=.05. The obtained p-value is .075 (p>.05). If there were no association between the two variables, a correlation at least this strong would arise by chance 7.5% of the time, so this correlation is not statistically significant. We therefore fail to reject the null hypothesis and conclude that we do not have evidence that these two variables are related.
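The p-value reported for Pearson’s r comes from a t test with n - 2 degrees of freedom, where t = r√(n - 2)/√(1 - r²). Below is a minimal, dependency-free Python sketch of that test. It integrates the t density numerically (Simpson’s rule) in place of a statistics library such as `scipy.stats`. The function names are hypothetical. The sample size in the usage line is also hypothetical, since the GHQ/WAI sample size is not given on these slides.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def two_tailed_p(t, df, steps=2000):
    """Two-tailed p-value P(|T| >= |t|), found by integrating the density
    from 0 to |t| with Simpson's rule (steps must be even)."""
    b = abs(t)
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # P(0 < T < |t|)
    return max(0.0, min(1.0, 1.0 - 2.0 * area))

def r_significance(r, n):
    """t test of H0: population correlation = 0, with n - 2 degrees of freedom.
    Not defined for r = -1 or r = 1."""
    df = n - 2
    t = r * math.sqrt(df) / math.sqrt(1.0 - r * r)
    return t, df, two_tailed_p(t, df)

# Hypothetical inputs, not values from the study.
t, df, p = r_significance(0.5, 30)
print(f"r = .5, n = 30: t({df}) = {t:.3f}, p = {p:.4f}")
```

The same r can be significant in a large sample and non-significant in a small one, because t grows with √(n - 2). This is why magnitude and significance are reported separately.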

Correlation Results 1: GHQ and WAI

Correlation Example 2: Family Environment In the child protection study, we have three measures of the characteristics of the family environment using the Family Environment Scale:
- FES – Cohesion: a measure of the perceived level of commitment and support expressed by family members
- FES – Expressiveness: a measure of the degree of emotional openness and encouragement in the family
- FES – Conflict: a measure of familial conflict and expressed anger

Correlation Example 2: Family Environment Based on the cohesion, expressiveness, and conflict within a family environment, we can begin to hypothesize about the relationships between the three measures. We would expect the Cohesion and Expressiveness scores to be positively, strongly, and significantly correlated (correlation coefficient closer to 1). We would expect the Conflict scores to be negatively, strongly, and significantly correlated with the Cohesion and Expressiveness scores (correlation coefficient closer to -1). Our null hypothesis for each of these analyses is that the two scores are not correlated, producing a correlation coefficient not significantly different from 0.

Correlation Example 2: Family Environment In SPSS, we select the “Analyze” menu, then “Correlate”, and select “Bivariate”.

Correlation Example 2: Family Environment The “Bivariate Correlations” window will appear. Find the “FES_Cohesive”, “FES_Express”, and “FES_Conflict” variables and add them to the “Variables” list. The options selected below are the usual defaults.

Correlation Example 2: Family Environment Click “OK” to conduct the analysis.

Correlation Results 2: Family Environment Here are the results
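With three variables, the results come as a correlation matrix. Every pair of variables is shown, the diagonal holds 1s, and each coefficient appears twice. Below is a minimal Python sketch of that layout. It assumes three short, made-up score lists named after the FES variables; they are not the study’s data. The function names are hypothetical.

```python
import math
from itertools import combinations

def pearson_r(x, y):
    """Pearson's r from deviations around each mean."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_matrix(columns):
    """Map every (row, column) pair of variable names to r, like a correlation table."""
    matrix = {(name, name): 1.0 for name in columns}
    for a, b in combinations(columns, 2):
        matrix[(a, b)] = matrix[(b, a)] = pearson_r(columns[a], columns[b])
    return matrix

# Hypothetical scores for six families, NOT the study's data.
scores = {
    "FES_Cohesive": [7, 5, 8, 3, 6, 9],
    "FES_Express": [6, 4, 7, 4, 5, 8],
    "FES_Conflict": [2, 5, 1, 7, 4, 1],
}
m = correlation_matrix(scores)
names = list(scores)
print(" " * 14 + "".join(f"{n:>14}" for n in names))
for a in names:
    print(f"{a:<14}" + "".join(f"{m[(a, b)]:>14.3f}" for b in names))
```

Because the matrix is symmetric, only one triangle needs to be read when interpreting the results.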

Correlation Results 2: Family Environment Cohesion and Expressiveness are moderately, positively, and significantly correlated (r=.556, p<.05). We can reject our null hypothesis that these variables are not associated. In our study, there appears to be a moderate and significant association between parent or carer reports of the level of commitment and support expressed by family members and reports of their degree of emotional openness and encouragement of each other.

Correlation Results 2: Family Environment

The degree of family Conflict is moderately, negatively, and significantly correlated with both Cohesion (r= -.486, p<.05) and Expressiveness (r= -.403, p<.05). We can reject our null hypothesis that these variables are not associated. In our study, increased reports of family conflict appear to be associated with decreased reports of both the level of commitment and support expressed by family members and their degree of emotional openness and encouragement of each other.

Moving from Association to Prediction Moving from Correlation to Regression

Regression Regression is an extension of correlation in which we use the value of an independent variable to predict the value of another (dependent) variable. Both variables must be at the interval/ratio level of measurement.

Regression Equation

Regression Equation and Lines X₁ = Predictor (IV); Y = Outcome (DV)
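The slide’s equation image did not survive. For reference, the conventional form of the simple (one-predictor) least squares regression equation is given below. The notation is assumed rather than taken from the slide:
- a is the intercept and b is the slope
- s_Y and s_{X_1} are the standard deviations
- r is Pearson’s r between X₁ and Y

```latex
\hat{Y} = a + b X_1, \qquad b = r \, \frac{s_Y}{s_{X_1}}, \qquad a = \bar{Y} - b \, \bar{X}_1
```

The slope b is the average change in the predicted outcome for a one-unit increase in the predictor. Its sign always matches the sign of r.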

Regression Example 1: Age and FES We will conduct three separate regression analyses in this example. In each case, we will use age of the child (IV) to predict one of the three FES scores (DV). – FES – Cohesion: Measure of the perceived level of commitment and support expressed by family members

Regression Example 1: Age and FES Within our child protection study, we wanted to determine if age of the child could predict characteristics of the family environment as reported by the parent or carer responding to the questionnaire. We would expect that older children are associated with more challenges in the family environment (research hypothesis). Like correlation, regression uses two interval/ratio variables. For this analysis, our interval/ratio variables are age of the child and one of the three FES scores.

Regression Example 1: Age and FES

To conduct each analysis, we first open the Linear Regression menu so that we can select the FES score as the dependent variable. Select “Analyze”, then “Regression”, then “Linear”. The Linear Regression window will appear.

Regression Example 1: Age and FES

The Linear Regression window:

Find our first dependent variable, “FES_Cohesive”, and add it to the “Dependent” list.

Our independent variable is the age of the child. Find the “Child_Age_Yrs” variable and add it to the “Independent(s)” variables list.

Regression Example 1: Age and FES Under the “Statistics” menu on the right side of the “Linear Regression” window, select the following:
– Regression Coefficients – Estimates: the regression coefficients (slope and constant) of the regression equation, with a t test for each
– Model Fit: R, R Square (an estimate of the percentage of the variance in the DV that is explained by the IV), and the ANOVA table
– Descriptives: the descriptive statistics for the variables in the analysis, along with their correlations
Click “Continue”, then click “OK” to conduct the analysis.

Regression Example 1: Age and FES

Regression Results 1: Age and FES The first table provides the descriptive statistics for the IV and DV.

Regression Results 1: Age and FES The second table offers the correlation coefficients between the age of the child (IV) and the FES – Cohesion scores (DV). From this table, we see that these two variables are significantly associated and have a weak, negative correlation (r = -.244, p<.05).

Regression Results 1: Age and FES Produced by the “Model Fit” option, this table summarizes how well our regression equation predicts FES – Cohesion using age of the child as the predictor.

Regression Results 1: Age and FES We see that “R” is the absolute value of our correlation coefficient, its distance from r = 0. “R Square” is r², the square of the correlation coefficient. It can be interpreted as the percentage of the variance in the DV that is explained by the IV. In this case, age of the child statistically explains 5.9% of the variation in the FES – Cohesion scores.
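For a single predictor, “R Square” can be reached two ways:
- by squaring r
- as 1 - SSE/SST, the share of the total variation in the DV that the regression line accounts for

Here SSE is the sum of squared residuals and SST the total sum of squares. Below is a minimal Python sketch on made-up data (not the study’s) that computes both and shows they agree. The function name is hypothetical.

```python
import math

def r_and_r_squared(x, y):
    """Return (r, R^2), with R^2 computed as 1 - SSE/SST from the fitted line."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    sst = sum((b - my) ** 2 for b in y)  # total variation in the DV
    slope = sxy / sxx
    intercept = my - slope * mx
    # variation left unexplained by the line
    sse = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    r = sxy / math.sqrt(sxx * sst)
    return r, 1 - sse / sst

# Hypothetical data, not from the study.
r, r2 = r_and_r_squared([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])
print(f"r = {r:.3f}, r squared = {r * r:.3f}, R Square = {r2:.3f}")
```

For this toy data, r = .8 and both routes give .64, so the line accounts for 64% of the variation in y.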

Regression Results 1: Age and FES The regression model is tested with an F test, the same class of test as ANOVA. For this analysis, the ANOVA table is below:

Regression Results 1: Age and FES The table indicates that our regression model significantly predicts FES – Cohesion scores (F=5.880, df=1, 93, p<.05). We can reject our null hypothesis that the age of the child does not predict FES – Cohesion scores.
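With one predictor, the ANOVA F for the model and the t for the slope carry the same information. F = R²(n - 2)/(1 - R²) with df = (1, n - 2), and F = t². Below is a quick Python check. It assumes n = 95 cases, as implied by df = 1, 93, and uses the rounded r = -.244 from the correlation table. It should reproduce the reported F = 5.880 and t = -2.425 up to rounding. The function name is hypothetical.

```python
import math

def f_and_t_from_r(r, n):
    """F test for a one-predictor regression, plus the equivalent slope t."""
    r2 = r * r
    df_resid = n - 2
    f = r2 * df_resid / (1 - r2)
    t = math.copysign(math.sqrt(f), r)  # the slope's t has the same sign as r
    return f, t, (1, df_resid)

# Assumes 95 cases (df = 1, 93) and the rounded r reported above.
f, t, df = f_and_t_from_r(-0.244, 95)
print(f"F{df} = {f:.3f}, t = {t:.3f}")
```

The small gap from the reported values comes from starting with r rounded to three decimals.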

Regression Results 1: Age and FES

From our equation, we can see that for every year that a child is older, there is an average decrease in the FES-Cohesion score of 1.34. The range of FES-Cohesion scores was from 0-9. A decrease of 1.34 on a scale from 0-9 for every year that the child is older is a significant and meaningful decrease in the cohesion of the family environment as reported by the parent or carer!
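Because the regression line is straight, the predicted difference between two children depends only on the slope: Δŷ = b × Δx. Below is a minimal Python sketch using the slope stated above (-1.34 FES-Cohesion points per year). The ages are hypothetical, and no intercept is needed to compute a difference. The function name is hypothetical.

```python
def predicted_difference(slope, x_from, x_to):
    """Change in the predicted DV when the IV moves from x_from to x_to
    along a straight regression line (intercept cancels out)."""
    return slope * (x_to - x_from)

slope_per_year = -1.34  # FES-Cohesion points per year of child age, as stated above
# Hypothetical comparison: a 4-year-old versus a 7-year-old.
print(round(predicted_difference(slope_per_year, 4, 7), 2))
```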

The regression model significantly predicts FES-Cohesion scores (F=5.880, df=1, 93, p<.05). Age of the child is a significant predictor (t=-2.425, p<.05) of FES-Cohesion. Age of the child explains 5.9% of the variance in the FES-Cohesion scores.

Regression Example 2: Age and SDQ To conduct this analysis, we first open the Linear Regression menu so that we can select the SDQ score as the dependent variable. Select “Analyze”, then “Regression”, then “Linear”. The Linear Regression window will appear.

Regression Example 2: Age and SDQ scores We found from the previous analysis that older children in the home are associated with lower reported cohesion in the family environment. We wanted to explore this aspect of the family further. The Strengths and Difficulties Questionnaire (SDQ) provides a view of the psychosocial problems of a child as reported by the parent or carer. We hypothesized that age of the child would predict increased psychosocial difficulties reported by the parent or carer on the SDQ measure (research hypothesis).

Regression Example 2: Age and SDQ

Replace “FES_Cohesive” in the “Dependent” list with “SDQ_TotalDif” from the variable list. “SDQ_TotalDif” is the dependent variable in this new analysis.

Regression Example 2: Age and SDQ Under the “Statistics” menu on the right side of the “Linear Regression” window, select the following:
– Regression Coefficients – Estimates: the regression coefficients (slope and constant) of the regression equation, with a t test for each
– Model Fit: R, R Square (an estimate of the percentage of the variance in the DV that is explained by the IV), and the ANOVA table
– Descriptives: the descriptive statistics for the variables in the analysis, along with their correlations
Click “Continue”, then click “OK” to conduct the analysis.

Regression Example 2: Age and SDQ

Regression Results 2: Age and SDQ The first table gives the descriptive statistics for the variables in the analysis.

Regression Results 2: Age and SDQ The second table provides the correlation coefficients between the two variables.

Age of the child is not significantly correlated with the SDQ total difficulties score (r=.127, p>.05). The weak, positive correlation is not statistically significant, so we have no evidence that it reflects an actual relationship between the two variables rather than chance.

We see that “R” is our correlation coefficient, r=.127. “R Square” is r², the square of the correlation coefficient. It can be interpreted as the percentage of the variance in the DV that is explained by the IV. In this case, age of the child statistically explains 1.6% of the variation in the SDQ total difficulties score. This very small r² shows how poorly age of the child predicts SDQ scores.

Regression Results 2: Age and SDQ The regression model is tested with an F test, the same class of test as ANOVA. For this analysis, the ANOVA table is below:

Regression Results 2: Age and SDQ The table indicates that our regression model does NOT significantly predict SDQ total scores (F=1.524, df=1, 93, p>.05). We fail to reject our null hypothesis that the age of the child does not predict SDQ total scores.

The regression model does not significantly predict SDQ scores (F=1.524, df=1, 93, p>.05). Age of the child is not a significant predictor (t= , p>.05) of SDQ scores.

Regression Results 2: Age and SDQ It is interesting that the family environment was predicted by the age of the child, but the age of the child did not predict the parent/carer reports of the psychosocial difficulties of the child. These two analyses show how the results of regression models can differ. We completed the regression equation only for the model in which the independent variable (age of child) significantly predicted the dependent variable (reports of cohesion in the family environment).