Correlation and Linear Regression.


Correlation Correlation: the mathematical extent to which two variables are related to each other. The term refers to both a type of research design and a descriptive statistical procedure. A correlation is generally computed between two scores obtained from the same source, such as two measurements drawn from every person in a group.

Correlation Coefficient
Correlation Coefficient: a number between –1.0 and +1.0 that represents the strength (degree) and direction of the relationship between two variables. Coefficients can be positive or negative. Correlations closer to +1 or –1 are stronger and allow more accurate prediction; correlations close to zero indicate no relationship between the variables. An important use of the correlation coefficient is prediction: if we know someone’s score on one variable, we can use that score to predict their score on the correlated variable.

Types of Correlation Coefficients
Pearson r: used when both variables are measured at the interval/ratio level. Spearman rho: used when the measurement of at least one variable is ordinal (scores on the other variable must be converted to ranks).
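To make the distinction concrete, here is a minimal pure-Python sketch of both coefficients (the data sets are hypothetical). Spearman rho is simply a Pearson r computed on the ranks of the scores:

```python
from statistics import mean, pstdev

def pearson_r(x, y):
    """Pearson product-moment correlation for interval/ratio data."""
    mx, my = mean(x), mean(y)
    n = len(x)
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (n * pstdev(x) * pstdev(y))

def ranks(values):
    """Convert scores to 1-based ranks, averaging ranks for ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(values):
        j = i
        while j + 1 < len(values) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman_rho(x, y):
    """Spearman rho: Pearson r computed on the ranked scores."""
    return pearson_r(ranks(x), ranks(y))
```

Note that for a monotone but nonlinear relationship (e.g., y = x²), Spearman rho is a perfect 1.0 while Pearson r falls below 1.0, since Pearson measures only linear association.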

Positive Correlations
Positive Correlation: a correlation greater than zero, up to +1, indicating a direct relationship: high scores on one variable are associated with high scores on the other, and the values of the two variables increase and decrease together.

Negative Correlations
Negative Correlation: a correlation coefficient whose value is between 0 and –1, indicating an inverse relationship: an increase in one variable is associated with a decrease in the other, so a high score on X is related to a low score on Y, and vice versa.

Linear Relationships Linear Relationship: a condition in which the relationship between two variables is best described by a straight line (the regression line, or line of best fit)

Scatterplots Scatterplot: a visual representation of the relationship between two variables. Each point shows the coordinates of the paired measurements on the two variables for a single source (e.g., one individual). Scatterplots are used to graph correlations: in a positive correlation, the points can be summarized by a line running from the lower left to the upper right of the graph; in a negative correlation, the line runs from the upper left to the lower right. A perfect positive correlation places all of the points on a straight line from lower left to upper right; a perfect negative correlation places all of the points on a straight line from upper left to lower right; a correlation of zero scatters the points randomly throughout the graph. Five possible relationships in scatterplots: positive correlation, perfect positive correlation, negative correlation, perfect negative correlation, and zero (or near-zero) correlation.

Understanding the Pearson Product Moment Correlation Coefficient
Pearson r: represents the extent to which individuals occupy the same relative position in the two distributions. Definitional equation: r = Σ(zx·zy) / N. Important reminder: Σz² = N. A high positive Pearson r indicates that each individual or event obtained approximately the same z-score on both variables. The sum Σ(zx·zy) is at its maximum only when each zx equals its corresponding zy; no other combination of sums of products is as large as when the two z-scores are identical, and the less the z-scores are aligned, the smaller the sum of their products. An advantage of the Pearson r is that we can correlate variables that were measured on different scales with different means and standard deviations: the z-score transformation always converts the numbers to a common scale for comparison.
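The definitional formula can be verified directly in a few lines of Python (a sketch with made-up numbers; note the use of the population standard deviation, which is what makes Σz² = N hold exactly):

```python
from statistics import mean, pstdev

def z_scores(values):
    """Standardize scores; with the population SD, sum(z**2) equals N."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]

def pearson_r_from_z(x, y):
    """Definitional formula: r = sum(zx * zy) / N."""
    zx, zy = z_scores(x), z_scores(y)
    return sum(a * b for a, b in zip(zx, zy)) / len(x)
```

When every zx exactly matches its zy, the sum of products equals Σz² = N, so r = N / N = 1; when each zx is the negative of its zy, r = –1.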

Interpreting the Correlation Coefficient
Coefficient of Determination (r²): the proportion of variance in one variable that can be described or explained by the other variable Coefficient of Nondetermination (1 − r²): the proportion of variance in one variable that cannot be described or explained by the other variable
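A quick numeric illustration, using a hypothetical correlation of .80 between two variables:

```python
r = 0.8                        # hypothetical correlation, e.g. hours studied vs. exam score
r_squared = r ** 2             # coefficient of determination: 64% of variance explained
unexplained = 1 - r_squared    # coefficient of nondetermination: 36% unexplained
```

Notice that even a fairly strong correlation of .80 leaves over a third of the variance unexplained, which is why r² is often more informative than r itself.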

Correlation Matrices Tables of correlations are generated when more than two variables are involved. A Correlation Matrix is a table in which each variable is listed both across the top and down the left side, and the correlation of every possible pair of variables is shown in the body of the table. An asterisk identifies statistically significant correlations.

Caution: Spurious Correlations
Spurious Correlation: a correlation coefficient that is artificially high or low because of the nature of the data or the method of collecting the data. Common causes of spurious correlations: a nonlinear relationship, truncated range, small sample size, outliers, multiple populations, and extreme scores.
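One of these causes, truncated range, is easy to demonstrate: restricting the range of X shrinks the correlation even when the underlying relationship is unchanged. A small sketch with deterministic toy data:

```python
from statistics import mean, pstdev

def pearson(x, y):
    mx, my = mean(x), mean(y)
    n = len(x)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n * pstdev(x) * pstdev(y))

# Toy data: y tracks x with a small repeating wobble
x = list(range(1, 21))
wobble = [0, 1, -1]
y = [xi + wobble[i % 3] for i, xi in enumerate(x)]

r_full = pearson(x, y)            # full range of x (1-20)
r_trunc = pearson(x[14:], y[14:]) # range truncated to x >= 15
```

The wobble is the same size in both cases, but relative to the shrunken spread of X it looms larger, so r_trunc comes out noticeably lower than r_full. This is why correlations computed on restricted samples (e.g., only admitted students) understate the true relationship.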

Caution: No Causality Correlations only tell us that two variables are related; they do not establish causality. Four possible explanations for an observed correlation: X → Y (temporal directionality), Y → X (temporal directionality), X ↔ Y (bidirectional causation: the behaviors could affect each other), or Z → X and Y (the third-variable problem: another variable causes the two behaviors to appear related). Two general requirements of causation that correlation cannot address: causes must precede effects (the IV must occur in time before the DV: temporal directionality), and there must be no other variable that could cause both X and Y to change (the third-variable problem).

Computing the Correlation Coefficient Using SPSS
Analyze  Correlate  Bivariate Select variables to be correlated in the left side of the Bivariate Correlations window and move them to the right side Select the appropriate correlation coefficient Check two tailed and flag significant correlations  click OK

Interpreting the Output

Creating a Scatterplot
Graphs  Scatter Click Simple  Click Define Move the criterion variable to the Y axis box Move the predictor variable to the X axis box Click OK Double-click on the chart to edit it. Click Fit Line at Total.

Linear Regression An important use of the correlation coefficient is the ability to predict one set of scores from another: if we know a score on one variable, we can use it to predict someone’s score on the correlated variable. Regression is applied to research in which all of the variables are measured (as opposed to manipulated), and it is useful for exploring relationships when experimentation is difficult, impossible, or unethical.

The Regression Line Line of Best Fit: the line that minimizes the distance between each individual point and the regression line. Correlations measure how close the data points come to the line that relates them. The regression line summarizes the relationship between X and Y somewhat as a mean summarizes a sample of scores: it is a measure of central tendency that moves with the values of X. The line is placed by the method of least squares, so that the sum of the squared distances from the points to the line is a minimum. The distance between each individual data point and the regression line is the error in prediction.

The Regression Equation
Equation: Y′ = aY + bY(X), where Y′ = the predicted score on Y based on a known value of X, aY = the intercept of the regression line, bY = the slope of the line, and X = the score being used as the predictor. Linear regression describes a relationship between variables in terms of the slope (the regression coefficient, or beta) of a straight line. The slope (often designated m) tells how much the predicted variable (Y) changes as the predictor variable (X) changes by one unit. X is the predictor variable, the variable whose values precede the values of Y; Y is the predicted variable; Y′ is the predicted value of Y, which generally differs from its actual value. The differences between Y′ and Y are the errors of prediction; their standard deviation is called the standard error of the estimate. The intercept of a regression line is the value of Y when X equals zero, i.e., the value where the regression line crosses the Y axis at X = 0. Coefficient of determination (R²): the square of the correlation; it measures the proportion of variation in the Y values that is explained or predicted by the X values. The remainder, 1 − R², represents the variation left unaccounted for, sometimes called the error.

In English Please… Slope: how much variable Y changes as the values of variable X change one unit Intercept: the value of variable Y when X = 0 Predictor Variable: the variable X which is used to predict the score on variable Y (antecedent or independent variable) Criterion Variable: the variable that is predicted (dependent variable)
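The least-squares fit described above can be sketched in a few lines of Python (the hours/scores data are hypothetical, chosen so the fit is exact):

```python
from statistics import mean

def least_squares(x, y):
    """Return (intercept a, slope b) for the prediction line Y' = a + b*X."""
    mx, my = mean(x), mean(y)
    b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
    a = my - b * mx  # the least-squares line always passes through (mean x, mean y)
    return a, b

# Hypothetical data: hours studied (predictor X) and quiz score (criterion Y)
hours = [1, 2, 3, 4]
scores = [3, 5, 7, 9]
a, b = least_squares(hours, scores)
predicted = a + b * 5  # predicted score Y' for someone who studied 5 hours
```

Here the slope b says each additional hour of study predicts a fixed gain in quiz score, and the intercept a is the predicted score at zero hours, exactly the plain-English definitions above.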

Linear Regression Using SPSS
Analyze  Regression  Linear Click on the criterion variable and move it to the Dependent box Click on the predictor variable and move ot to the Independent(s) box Click Statistics  check Descriptives  make sure that Estimates and Model fit are also selected Click Continue Click OK

Interpreting the Output
The F value in the ANOVA box indicates whether the predictor variable was a significant predictor of the criterion variable. The unstandardized coefficient for the constant is the Y intercept of the regression equation, and the unstandardized coefficient for the predictor variable is the slope of the line. The regression equation for this example therefore takes the form Y′ = a + b(X), with the intercept and slope read from the unstandardized coefficients in the output.