Correlation
Correlation: the mathematical extent to which two variables (two sets of numbers) are related to each other.
Correlation refers both to a type of research design and to a descriptive statistical procedure.
It is generally performed on two scores obtained from the same source, such as two measurements taken from every person in a group.
Correlation Coefficient
Correlation coefficient: a number between -1.0 and +1.0 that represents the strength (degree) and direction of the relationship between two variables.
The sign of the coefficient gives the direction of the relationship (positive or negative), and its size gives the strength.
Correlations closer to +1 or -1 are stronger and allow more accurate prediction. Correlations close to zero indicate little or no linear relationship between the variables.
An important use of the correlation coefficient is prediction: if we know a person's score on one variable, we can use it to predict that person's score on the correlated variable.
Types of Correlation Coefficients
Pearson r: used when both variables are measured at the interval or ratio level.
Spearman rho: used when at least one variable is measured at the ordinal level. Scores on both variables are converted to ranks before the coefficient is computed.
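Spearman rho can be sketched as Pearson r computed on ranks. The Python below is only an illustration, since the slides themselves use SPSS. The function names, the average-rank rule for tied scores, and the race data are all assumptions made for this sketch.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson r from deviation scores: sum of cross-products / sqrt(SSx * SSy)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sp = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return sp / sqrt(ss_x * ss_y)


def to_ranks(values):
    """Rank scores from 1 (smallest) upward; tied scores share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next score ties with the first score in the run.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average_rank = (start + end) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = average_rank
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rho: convert both variables to ranks, then apply Pearson r."""
    return pearson_r(to_ranks(x), to_ranks(y))


# Hypothetical data: finishing place in a race (ordinal) and weekly training hours.
place = [1, 2, 3, 4, 5]
hours = [10, 8, 9, 4, 2]
print(round(spearman_rho(place, hours), 2))  # -0.9
```

The negative rho means that more training hours go with lower-numbered (better) finishing places.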
Positive Correlations
Positive correlation: a correlation coefficient greater than 0 and up to +1. A perfect positive correlation equals +1.
It indicates that high scores on one variable are associated with high scores on the other variable. As scores on one variable go up, scores on the other go up as well, so high raw scores match high raw scores and the values of the two variables increase and decrease together.
A positive correlation describes a relationship in which an increase in one measure is associated with an increase in the other. It is also called a direct relationship.
Negative Correlations
Negative correlation: a correlation coefficient less than 0 and down to -1. A perfect negative correlation equals -1.
It indicates an inverse relationship between the two sets of scores. As scores on one variable go up, scores on the other go down, so a high score on X is related to a low score on Y, and vice versa.
A negative correlation describes a relationship in which an increase in one variable is associated with a decrease in the other. It is also called an inverse relationship.
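The sign of r separates a direct relationship from an inverse one. Below is a minimal Python sketch that computes Pearson r from deviation scores, applied to small hypothetical data sets. The variable names and values are invented for illustration.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson r from deviation scores: sum of cross-products / sqrt(SSx * SSy)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sp = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return sp / sqrt(ss_x * ss_y)


hours_studied = [1, 2, 3, 4, 5]
# Hypothetical direct relationship: quiz scores tend to rise with hours studied.
quiz_score = [2, 4, 5, 4, 5]
# Hypothetical inverse relationship: errors tend to fall as hours studied rise.
errors_made = [5, 4, 5, 4, 2]

print(round(pearson_r(hours_studied, quiz_score), 2))   # 0.77
print(round(pearson_r(hours_studied, errors_made), 2))  # -0.77
```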
Linear Relationships
Linear relationship: a relationship between two variables that is best described by a straight line (the regression line, or line of best fit).
Scatterplots
Scatterplot: a graph that gives a visual representation of the relationship between two variables.
Each point represents the paired measurements on the two variables for one individual, plotted at the coordinates of that source's two values.
In a positive correlation, the line that represents the relationship runs from the lower left to the upper right. In a negative correlation, it runs from the upper left to the lower right.
A perfect positive correlation places all the points on a straight line running from the lower left to the upper right.
A perfect negative correlation places all the points on a straight line running from the upper left to the lower right.
A zero correlation shows the points scattered randomly throughout the graph.
Five possible relationships in scatterplots:
Positive correlation
Perfect positive correlation
Negative correlation
Perfect negative correlation
Zero (or near-zero) correlation
Understanding the Pearson Product Moment Correlation Coefficient
Pearson r: represents the extent to which individuals occupy the same relative position in two distributions.
Definitional equation: $r = \frac{\sum z_X z_Y}{N}$
Important reminder: $\sum z^2 = N$ (when the z-scores are computed with the population standard deviation, which divides by N).
A high positive Pearson r indicates that each individual or event obtained approximately the same z-score on both variables.
The sum of the products $z_X z_Y$ reaches its maximum only when each $z_X$ equals its corresponding $z_Y$. The sum then becomes $\sum z^2 = N$, which gives r = +1. No other pairing of the scores produces a sum of products that large.
The less the z-scores are aligned, the smaller their sum of products.
Advantage of Pearson r: we can correlate variables measured on different scales, with different means and standard deviations, because the z-score transformation converts both sets of numbers to a common scale for comparison.
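The definitional equation translates almost directly into code. In this Python sketch, z-scores use the population standard deviation (dividing by N), because that is the convention under which $\sum z^2 = N$ holds. The function names and the quiz and exam scores are hypothetical.

```python
from math import sqrt


def z_scores(values):
    """Convert raw scores to z-scores using the population SD (divides by N)."""
    n = len(values)
    mean = sum(values) / n
    sd = sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]


def pearson_r_definitional(x, y):
    """Definitional formula: r = (sum of z_X * z_Y) / N."""
    zx, zy = z_scores(x), z_scores(y)
    return sum(a * b for a, b in zip(zx, zy)) / len(x)


# Hypothetical scores on two different scales (a 10-point quiz and a 100-point exam).
quiz = [4, 6, 7, 9, 10]
exam = [55, 62, 70, 81, 90]
z_quiz = z_scores(quiz)
print(round(sum(z * z for z in z_quiz), 6))  # 5.0, i.e. N
print(round(pearson_r_definitional(quiz, exam), 3))
```

Rescaling a variable, for example to 3 × quiz + 40, leaves its z-scores unchanged. Correlating quiz with that rescaled copy therefore gives r = +1, and correlating quiz with its negation gives r = -1.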
Interpreting the Correlation Coefficient
Coefficient of determination ($r^2$): the proportion of variance in one variable that can be described or explained by the other variable.
Coefficient of nondetermination ($1 - r^2$): the proportion of variance in one variable that cannot be described or explained by the other variable.
For example, r = .60 gives $r^2$ = .36. That is, 36% of the variance is explained and 64% is not.
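To see why $r^2$ is a proportion of variance, one can fit the least-squares line and compare the variability of the predicted scores with the variability of the actual scores. This Python sketch uses hypothetical data and an invented function name. It checks that the two quantities agree.

```python
from math import sqrt


def r_and_explained_share(x, y):
    """Return (r squared, share of Y's sum of squares reproduced by the regression line)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sp = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    r = sp / sqrt(ss_x * ss_y)
    # Least-squares slope and intercept, used only to generate predicted scores.
    slope = sp / ss_x
    intercept = mean_y - slope * mean_x
    predicted = [intercept + slope * a for a in x]
    ss_explained = sum((p - mean_y) ** 2 for p in predicted)
    return r ** 2, ss_explained / ss_y


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r_squared, share = r_and_explained_share(x, y)
print(round(r_squared, 2), round(share, 2), round(1 - r_squared, 2))  # 0.6 0.6 0.4
```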
Correlation Matrices
Tables of correlations are generated when more than two variables are involved.
A correlation matrix is a table in which each variable is listed both across the top and down the left side, and the correlation for every possible pair of variables is shown inside the table.
An asterisk identifies statistically significant correlations.
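A correlation matrix is simply Pearson r applied to every pair of variables. The Python sketch below builds one and prints it as a table. The variable names and scores are hypothetical. This sketch does not compute significance, so no asterisks appear; flagging would also require a p-value for each r.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson r from deviation scores: sum of cross-products / sqrt(SSx * SSy)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sp = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return sp / sqrt(ss_x * ss_y)


def correlation_matrix(data):
    """Correlate every pair of variables; row and column order follow the dict keys."""
    names = list(data)
    return names, [[pearson_r(data[a], data[b]) for b in names] for a in names]


# Hypothetical scores for five people on three variables.
data = {
    "hours": [1, 2, 3, 4, 5],
    "score": [2, 4, 5, 4, 5],
    "anxiety": [5, 4, 5, 4, 2],
}
names, matrix = correlation_matrix(data)
print(" " * 8 + "".join(f"{name:>9}" for name in names))
for name, row in zip(names, matrix):
    print(f"{name:<8}" + "".join(f"{r:>9.2f}" for r in row))
```

Every variable correlates +1.00 with itself along the diagonal, and the table is symmetric. That is why many reports show only the half above or below the diagonal.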
Caution: Spurious Correlations
Spurious correlation: a correlation coefficient that is artificially high or low because of the nature of the data or the method used to collect the data.
Common causes of spurious correlations:
A nonlinear relationship
Truncated range
Sample size
Outliers
Multiple populations
Extreme scores
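Two of the listed causes are easy to reproduce with hypothetical numbers. In the Python sketch below, a U-shaped relationship gives r = 0 even though Y is completely determined by X. In the second case, a single extreme pair turns a near-zero correlation into a strong one. All data are invented for illustration.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson r from deviation scores: sum of cross-products / sqrt(SSx * SSy)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sp = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return sp / sqrt(ss_x * ss_y)


# 1) Nonlinear relationship: Y is perfectly determined by X, yet r is 0.
x_curve = [-2, -1, 0, 1, 2]
y_curve = [v ** 2 for v in x_curve]
print(pearson_r(x_curve, y_curve))  # 0.0

# 2) Outlier: five nearly unrelated pairs give r = 0.1; one extreme pair inflates r.
x_pairs = [1, 2, 3, 4, 5]
y_pairs = [2, 5, 1, 4, 3]
print(round(pearson_r(x_pairs, y_pairs), 2))                # 0.1
print(round(pearson_r(x_pairs + [20], y_pairs + [20]), 2))  # 0.96
```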
Caution: No Causality
Correlations only tell us that a relationship exists between two variables; they do not determine causality.
Four possible explanations for a correlation between X and Y:
X → Y (temporal directionality: X causes Y)
Y → X (temporal directionality: Y causes X)
X ↔ Y (bidirectional causation: each affects the other)
Z → X and Y (third-variable problem)
Two general problems with causation that correlation cannot address:
Temporal directionality: causes precede effects, so the IV must occur before the DV in time.
Third-variable problem: there must be no other variable that could cause both X and Y to change.
Bidirectional causation: the behaviors could all affect each other.
Third variable: another variable could cause both behaviors, making them appear to be related to each other.
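The third-variable problem can be mimicked with a small deterministic example. In this Python sketch, hypothetical X and Y values are each computed only from a third variable Z plus fixed "noise" terms, yet X and Y still correlate strongly. Z, the noise terms, and the multipliers are all invented for illustration.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson r from deviation scores: sum of cross-products / sqrt(SSx * SSy)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sp = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return sp / sqrt(ss_x * ss_y)


# Hypothetical third variable Z and fixed, made-up noise terms.
z = [1, 2, 3, 4, 5, 6, 7, 8]
noise_x = [0.3, -0.2, 0.1, -0.4, 0.2, 0.0, -0.1, 0.3]
noise_y = [-0.1, 0.4, -0.3, 0.2, 0.0, -0.2, 0.3, -0.1]

# X and Y are each computed from Z only; neither is computed from the other.
x = [2 * zi + e for zi, e in zip(z, noise_x)]
y = [3 * zi + e for zi, e in zip(z, noise_y)]

print(round(pearson_r(x, y), 2))
```

A high r here says nothing about X causing Y or Y causing X. The shared cause Z produces the whole relationship.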
Computing the Correlation Coefficient Using SPSS
Analyze > Correlate > Bivariate
Select the variables to be correlated on the left side of the Bivariate Correlations window and move them to the right side.
Select the appropriate correlation coefficient (Pearson or Spearman).
Select Two-tailed, check Flag significant correlations, and click OK.
Creating a Scatterplot
Graphs > Scatter
Click Simple, then click Define.
Move the criterion variable to the Y Axis box.
Move the predictor variable to the X Axis box.
Click OK.
Double-click the chart to edit it.
Click Fit Line at Total.
Linear Regression
An important use of the correlation coefficient is the ability to predict one set of scores from another: if we know a person's score on one variable, we can use that score to predict the person's score on the correlated variable.
Linear regression is applied to research in which all of the variables are measured (as opposed to manipulated).
It is useful for exploring relationships when experimentation is difficult, impossible, or unethical.
The Regression Line
Line of best fit: the line that comes as close as possible to all of the data points, where "closest" is defined by the method of least squares.
Correlations measure how close the data points come to the line that relates them.
The regression line summarizes the relationship between X and Y in a manner somewhat analogous to the way a mean summarizes a sample of scores. It is a measure of central tendency that moves with the values of X.
The line is placed by the method of least squares: the sum of the squared vertical distances from the points to the line is a minimum.
The vertical distance between each data point and the regression line is the error in prediction.
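The least-squares idea can be checked numerically. This Python sketch computes the intercept and slope from deviation scores. It then confirms that nudging either value up or down increases the sum of squared errors. The function names and data are hypothetical.

```python
def least_squares_line(x, y):
    """Return (intercept, slope) of the line minimizing the sum of squared errors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
             / sum((a - mean_x) ** 2 for a in x))
    intercept = mean_y - slope * mean_x
    return intercept, slope


def sum_squared_errors(x, y, intercept, slope):
    """Sum of squared vertical distances (errors in prediction) from the points to a line."""
    return sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b = least_squares_line(x, y)
best = sum_squared_errors(x, y, a, b)
print(round(a, 2), round(b, 2), round(best, 2))  # 2.2 0.6 2.4

# Nudging either the intercept or the slope can only increase the squared error.
for da, db in [(0.1, 0), (-0.1, 0), (0, 0.1), (0, -0.1)]:
    assert sum_squared_errors(x, y, a + da, b + db) > best
```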
The Regression Equation
Equation: $Y' = a_Y + b_Y X$
where
$Y'$ = the predicted score on Y for a known value of X
$a_Y$ = the intercept of the regression line
$b_Y$ = the slope of the line
$X$ = the score being used as the predictor
Linear regression describes a relationship between variables in terms of the slope of a straight line. The slope is the regression coefficient, and its standardized version is reported as Beta.
The slope (often designated m) tells how much the second variable (Y) changes when the other variable (X) changes by one unit.
X is the predictor variable, the variable whose values precede the values of Y. Y is the predicted variable.
Y' is the predicted value of Y, which generally differs from the actual value of Y.
The difference between the actual and predicted scores, Y - Y', is the error of prediction (the residual). The standard deviation of these errors is called the standard error of the estimate.
The intercept of a regression line is the value of Y when X equals zero, that is, the point where the regression line crosses the Y axis.
Coefficient of determination ($R^2$): the square of the correlation value. It measures the proportion of variation in the Y values that is explained or predicted by the X values.
The remainder, $1 - R^2$, represents the variation left unaccounted for and is sometimes called the error.
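Below is a minimal Python sketch of the regression equation. It assumes the common textbook formulas $b_Y = r(s_Y / s_X)$ and $a_Y = \bar{Y} - b_Y \bar{X}$, with sample standard deviations. It also computes the standard error of the estimate using N - 2 degrees of freedom, as regression software usually does. Some textbooks divide by N instead. The function names and data are hypothetical.

```python
from math import sqrt


def regression_equation(x, y):
    """Return (a_Y, b_Y, r) using b_Y = r * (s_Y / s_X) and a_Y = mean_Y - b_Y * mean_X."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_x = sqrt(sum((v - mean_x) ** 2 for v in x) / (n - 1))
    s_y = sqrt(sum((v - mean_y) ** 2 for v in y) / (n - 1))
    r = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / ((n - 1) * s_x * s_y)
    b_y = r * s_y / s_x
    a_y = mean_y - b_y * mean_x
    return a_y, b_y, r


def predict(a_y, b_y, x_value):
    """Y' = a_Y + b_Y * X."""
    return a_y + b_y * x_value


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a_y, b_y, r = regression_equation(x, y)

# Errors of prediction (residuals) are Y - Y' for each observed pair.
residuals = [yi - predict(a_y, b_y, xi) for xi, yi in zip(x, y)]
# Standard error of the estimate: SD of the residuals, using N - 2 degrees of freedom.
see = sqrt(sum(e ** 2 for e in residuals) / (len(x) - 2))

print(round(a_y, 2), round(b_y, 2))     # 2.2 0.6
print(round(predict(a_y, b_y, 6), 2))   # 5.8
print(round(r ** 2, 2), round(see, 3))  # 0.6 0.894
```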
In English Please…
Slope: how much variable Y changes when variable X changes by one unit.
Intercept: the value of variable Y when X = 0.
Predictor variable: the variable X, which is used to predict the score on variable Y (the antecedent or independent variable).
Criterion variable: the variable that is predicted (the dependent variable).
Linear Regression Using SPSS
Analyze > Regression > Linear
Click the criterion variable and move it to the Dependent box.
Click the predictor variable and move it to the Independent(s) box.
Click Statistics, check Descriptives, and make sure that Estimates and Model fit are also selected.
Click Continue.
Click OK.
Interpreting the Output
The F value in the ANOVA table, together with its significance level, indicates whether the predictor variable was a significant predictor of the criterion variable.
The unstandardized coefficient for the constant is the Y intercept of the regression equation.
The unstandardized coefficient for the predictor variable is the slope of the line.
The regression equation for this example would be $Y' = a_Y + b_Y X$, with $a_Y$ and $b_Y$ replaced by the constant and slope values from the output.
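To show where the F value comes from with a single predictor, here is a Python sketch of the standard formula F = (SS_regression / 1) / (SS_residual / (N - 2)), computed alongside the two unstandardized coefficients. It is not a reproduction of SPSS output. The function name and data are hypothetical, and the p-value is left out. With SciPy installed, `scipy.stats.f.sf(f, 1, n - 2)` would give the upper-tail probability.

```python
def simple_regression_f(x, y):
    """Return (intercept, slope, F) for one predictor: F = (SS_reg / 1) / (SS_res / (N - 2))."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sp = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_total = sum((b - mean_y) ** 2 for b in y)
    slope = sp / ss_x                       # unstandardized coefficient for the predictor
    intercept = mean_y - slope * mean_x     # unstandardized coefficient for the constant
    ss_regression = slope * sp              # variation in Y accounted for by X
    ss_residual = ss_total - ss_regression  # variation left over (error)
    df_regression, df_residual = 1, n - 2
    f = (ss_regression / df_regression) / (ss_residual / df_residual)
    return intercept, slope, f


intercept, slope, f = simple_regression_f([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(intercept, 2), round(slope, 2), round(f, 2))  # 2.2 0.6 4.5
```

With one predictor, this F equals the square of the t statistic for the slope, so both tests lead to the same conclusion.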