Published by Rosamond Dickerson. Modified over 9 years ago.
1
Chapter 10: Linear Regression
For Explaining Psychological Statistics, 4th ed. by B. Cohen
A perfect correlation implies the ability to predict one score from another perfectly.
Perfect predictions:
–When dealing with z scores, the z score you predict for the Y variable is exactly the same as the z score for the X variable.
–That is, when r = +1.0: $z_{Y'} = z_X$
–And, when r = -1.0: $z_{Y'} = -z_X$
When r is less than perfect, this rule must be modified according to the strength of the correlation. The modified rule is the standardized regression equation, as shown on the next slide.
2
Predicting with z scores
–Standardized Regression Equation: $z_{Y'} = r z_X$
If r = -1 or +1, the magnitude of the predicted z score is the same as that of the z score from which we are predicting.
If r = 0, the predicted z score is always zero (i.e., the mean), which implies that, given no other information, our best prediction for a variable is its own mean.
–As the magnitude of r becomes smaller, there is less of a tendency to expect an extreme score on one variable to be associated with an equally extreme score on the other. This is consistent with Galton’s concept of “regression toward mediocrity” (i.e., regression toward the mean).
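The standardized regression equation can be illustrated with a minimal sketch. The function name `predict_z_y` and the input values are hypothetical, chosen only to show how the predicted z score shrinks toward zero (the mean) as r weakens.

```python
def predict_z_y(r: float, z_x: float) -> float:
    """Standardized regression: the predicted z score on Y is r times the z score on X."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    return r * z_x


# Illustrative values only (not taken from the slides).
for r in (1.0, 0.5, 0.0, -1.0):
    print(f"r = {r:+.1f}: z_X = 2.0 -> predicted z_Y = {predict_z_y(r, 2.0):+.1f}")
```

With r = 0.5, a score two standard deviations above the mean on X yields a prediction only one standard deviation above the mean on Y, which is regression toward the mean in action.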
3
[Figure: raw score graph and z score graph]
4
Regression Formulas When Dealing With a Population
–A basic formula for linear regression in terms of population means and standard deviations is as follows:
$$\frac{Y' - \mu_Y}{\sigma_Y} = r \, \frac{X - \mu_X}{\sigma_X}$$
–This formula can be simplified to the basic equation for a straight line:
$$Y' = b_{YX} X + a_{YX}$$
where
$$b_{YX} = r \, \frac{\sigma_Y}{\sigma_X}$$
and
$$a_{YX} = \mu_Y - b_{YX} \, \mu_X$$
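The simplification to a straight line can be sketched step by step. This assumes the starting point is the standardized equation $z_{Y'} = r z_X$ with each z score written out using the population mean and standard deviation; the symbols $b_{YX}$ (slope) and $a_{YX}$ (Y-intercept) are a notational assumption.

```latex
\begin{align*}
\frac{Y' - \mu_Y}{\sigma_Y} &= r\,\frac{X - \mu_X}{\sigma_X} \\
Y' - \mu_Y &= r\,\frac{\sigma_Y}{\sigma_X}\,(X - \mu_X) \\
Y' &= \underbrace{r\,\frac{\sigma_Y}{\sigma_X}}_{b_{YX}}\,X
      + \underbrace{\left(\mu_Y - r\,\frac{\sigma_Y}{\sigma_X}\,\mu_X\right)}_{a_{YX}}
\end{align*}
```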
5
Regression Formulas for Making Predictions From Samples
–The same raw-score regression equation is used when working with samples:
$$Y' = b_{YX} X + a_{YX}$$
except that the slope of the line is now found from the unbiased SDs:
$$b_{YX} = r \, \frac{s_Y}{s_X}$$
and the Y-intercept is now expressed in terms of the sample means:
$$a_{YX} = \bar{Y} - b_{YX} \, \bar{X}$$
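A minimal sketch of the sample computation follows, assuming the slope is r times the ratio of the unbiased standard deviations and the intercept is the mean of Y minus the slope times the mean of X. The helper names and the five data pairs are hypothetical.

```python
from math import sqrt


def mean(values):
    return sum(values) / len(values)


def unbiased_sd(values):
    """Standard deviation with N - 1 in the denominator."""
    m = mean(values)
    return sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


def pearson_r(x, y):
    mx, my = mean(x), mean(y)
    sp = sum((a - mx) * (b - my) for a, b in zip(x, y))  # sum of cross products
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    return sp / sqrt(ss_x * ss_y)


def regression_line(x, y):
    """Return (slope, intercept) for predicting Y from X."""
    slope = pearson_r(x, y) * unbiased_sd(y) / unbiased_sd(x)
    intercept = mean(y) - slope * mean(x)
    return slope, intercept


# Made-up data for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
slope, intercept = regression_line(x, y)
print(f"Y' = {slope:.3f} X + {intercept:.3f}")
```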
6
Quantifying the Errors Around the Regression Line
–Residual: The difference between the actual Y value and the predicted Y value (Y – Y’). Each residual can be thought of as an error of prediction.
–The positive and negative residuals will balance out so that the sum of the residuals will always be zero.
–The linear regression equation gives us the straight line that minimizes the sum of the squared residuals (i.e., the sum of squared errors). Therefore, it is called the least-squares regression line.
–The regression line functions like a running average of Y, in that it passes through the mean of the Y values (approximately) for each value of X.
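Both properties of the residuals can be checked numerically. The sketch below uses made-up data and hypothetical helper names; it fits the least-squares line, confirms that the residuals sum to (essentially) zero, and shows that nudging the slope or intercept away from the fitted values increases the sum of squared errors.

```python
def fit_line(x, y):
    """Least-squares slope and intercept for predicting Y from X."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return slope, my - slope * mx


def residuals(x, y, slope, intercept):
    """Actual Y minus predicted Y for every case."""
    return [b - (slope * a + intercept) for a, b in zip(x, y)]


def sum_squared_errors(x, y, slope, intercept):
    return sum(e ** 2 for e in residuals(x, y, slope, intercept))


# Made-up data for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
slope, intercept = fit_line(x, y)

best_sse = sum_squared_errors(x, y, slope, intercept)
print(f"sum of residuals = {sum(residuals(x, y, slope, intercept)):.6f}")
print(f"SSE at fitted line      = {best_sse:.3f}")
print(f"SSE with slope + 0.1    = {sum_squared_errors(x, y, slope + 0.1, intercept):.3f}")
print(f"SSE with intercept + 0.1 = {sum_squared_errors(x, y, slope, intercept + 0.1):.3f}")
```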
7
The Variance of the Estimate in a Population
–Quantifies the average amount of squared error in the predictions:
$$\sigma^2_{est\,Y} = \frac{\sum (Y - Y')^2}{N}$$
–The variance of the estimate (or residual variance) is the variance of the data points around the regression line.
–As long as r is not zero, $\sigma^2_{est\,Y}$ will be less than $\sigma^2_Y$ (the ordinary variance of the Y values); the amount by which it is less represents the advantage of performing regression.
–Larger rs (in absolute value) will lead to less error in prediction (i.e., points closer to the regression line), and therefore a smaller value for $\sigma^2_{est\,Y}$.
–This relation between $\sigma^2_{est\,Y}$ and Pearson’s r is shown in the following formula:
$$\sigma^2_{est\,Y} = \sigma^2_Y \, (1 - r^2)$$
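The link between the variance of the estimate and r can be verified on a small data set treated as a whole population. This is a sketch under stated assumptions: the variance of the estimate is taken to be the mean squared residual (dividing by N), and the shortcut is taken to be the ordinary variance of Y multiplied by 1 minus r squared. Function names and data are hypothetical.

```python
def sums_of_squares(x, y):
    """Return means, sums of squares, and sum of cross products."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    sp = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return mx, my, ss_x, ss_y, sp


def variance_of_estimate_direct(x, y):
    """Mean squared residual around the least-squares line (divide by N)."""
    mx, my, ss_x, _, sp = sums_of_squares(x, y)
    slope = sp / ss_x
    intercept = my - slope * mx
    squared_errors = [(b - (slope * a + intercept)) ** 2 for a, b in zip(x, y)]
    return sum(squared_errors) / len(x)


def variance_of_estimate_from_r(x, y):
    """Population variance of Y times (1 - r squared)."""
    _, _, ss_x, ss_y, sp = sums_of_squares(x, y)
    r_squared = sp ** 2 / (ss_x * ss_y)
    return (ss_y / len(y)) * (1 - r_squared)


# Made-up data, treated as an entire population for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(f"direct:   {variance_of_estimate_direct(x, y):.4f}")
print(f"shortcut: {variance_of_estimate_from_r(x, y):.4f}")
```

The two routes agree, which is why r alone is enough to say how much prediction error remains relative to the ordinary variance of Y.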
8
Coefficient of Determination
–The proportion of variance in the predicted variable that is not accounted for by the predicting variable is found by rearranging the formula for the variance of the estimate in the previous slide:
$$1 - r^2 = \frac{\text{unexplained variance}}{\text{total variance}} = \frac{\sigma^2_{est\,Y}}{\sigma^2_Y}$$
–The ratio of the variance of the estimate to the ordinary variance of Y is called the coefficient of nondetermination, and it is sometimes symbolized as $k^2$.
–Larger absolute values of r are associated with smaller values for $k^2$.
–The proportion of the total variance that is explained by the predictor variable is called the coefficient of determination, and it is simply equal to $r^2$:
$$r^2 = \frac{\text{explained variance}}{\text{total variance}} = 1 - k^2$$
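A short worked example may help fix the two coefficients. The values r = 0.6 and a total variance of 100 are hypothetical, chosen only to make the arithmetic easy to follow.

```latex
\begin{align*}
r^2 &= (0.6)^2 = 0.36 && \text{(coefficient of determination)} \\
k^2 &= 1 - r^2 = 0.64 && \text{(coefficient of nondetermination)} \\
\sigma^2_{est\,Y} &= k^2\,\sigma^2_Y = 0.64 \times 100 = 64
\end{align*}
```

In this illustration a correlation of 0.6 explains 36% of the variance in Y and leaves 64% unexplained.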
9
Here Is a Concrete Example of Linear Regression …
Example from Lockhart, Robert S. (1998). Introduction to statistics and data analysis. New York: W. H. Freeman & Company.
10
Estimating the Variance of the Estimate From a Sample
–When using a sample to estimate the variance of the estimate, we need to correct for bias, even though we are basing our formula on the unbiased estimate of the ordinary variance:
$$s^2_{est\,Y} = \frac{N - 1}{N - 2} \, s^2_Y \, (1 - r^2)$$
Standard Error of the Estimate
The standard error of the estimate is just the square root of the variance of the estimate. When estimating from a sample, the formula is:
$$s_{est\,Y} = s_Y \sqrt{\frac{N - 1}{N - 2} \, (1 - r^2)}$$
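The sample estimate can be sketched in a few lines. This assumes the usual bias correction, in which the unbiased variance of Y times (1 - r squared) is multiplied by (N - 1)/(N - 2); the function name and input values are hypothetical.

```python
from math import sqrt


def standard_error_of_estimate(s_y: float, r: float, n: int) -> float:
    """Sample standard error of the estimate.

    s_y is the unbiased standard deviation of Y, r is Pearson's r,
    and n is the number of (X, Y) pairs.
    """
    if n <= 2:
        raise ValueError("need more than two pairs of scores")
    return s_y * sqrt((n - 1) / (n - 2) * (1 - r ** 2))


# Illustrative values only: s_Y = 10, r = 0.6, N = 30.
print(f"{standard_error_of_estimate(10.0, 0.6, 30):.3f}")
```

The correction factor (N - 1)/(N - 2) approaches 1 as N grows, so it matters mainly for small samples.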
11
Assumptions Underlying Linear Regression
–Independent random sampling
–Bivariate normal distribution
–Linearity of the relationship between the two variables
–Homoscedasticity (i.e., the variance around the regression line is the same for every X value)
Uses for Linear Regression
–Prediction
–Statistical control (i.e., removing the linear effect of one variable on another)
–Quantifying the relationship between a DV and a manipulated IV with quantitative levels
12
The Point-Biserial Correlation Coefficient
–An ordinary Pearson’s r calculated for one continuous multivalued variable and one dichotomous (i.e., grouping) variable. The sign of $r_{pb}$ is arbitrary and therefore usually ignored.
–An $r_{pb}$ can be tested for significance with a one-sample t test as follows:
$$t = \frac{r_{pb} \sqrt{N - 2}}{\sqrt{1 - r_{pb}^2}}$$
–By solving for $r_{pb}$, we obtain a simple formula for converting a two-sample pooled-variance t value into a correlational measure of effect size:
$$r_{pb} = \sqrt{\frac{t^2}{t^2 + df}}$$
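The two conversions can be sketched as a round trip. This assumes the standard t test for a Pearson correlation with df = N - 2 and its algebraic inverse; the function names and values are hypothetical.

```python
from math import sqrt


def t_from_r_pb(r_pb: float, n_total: int) -> float:
    """t statistic for testing a point-biserial r, with df = n_total - 2."""
    return r_pb * sqrt(n_total - 2) / sqrt(1 - r_pb ** 2)


def r_pb_from_t(t: float, df: int) -> float:
    """Convert a pooled-variance two-sample t into a point-biserial r (sign ignored)."""
    return sqrt(t ** 2 / (t ** 2 + df))


# Illustrative values only: t = 2.0 with 14 cases in total (df = 12).
r_pb = r_pb_from_t(2.0, 12)
print(f"r_pb = {r_pb:.3f}")
print(f"back to t = {t_from_r_pb(r_pb, 14):.3f}")
```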
13
The Proportion of Variance Accounted for in a Two-Sample Comparison
–Squaring $r_{pb}$ gives the proportion of variance in your DV accounted for by your two-level IV (i.e., group membership).
–Even when you obtain a large t value it is possible that little variance is accounted for; therefore $r_{pb}$ is a useful supplement to the two-sample t value.
–$r_{pb}$ is an alternative to g for expressing the effect size found in your samples. The two measures have a fairly simple relationship:
$$r_{pb} = \sqrt{\frac{g^2}{g^2 + 4\left(\frac{df}{N}\right)}}$$
where N is the total number of cases across both groups, and df = N – 2
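A minimal sketch of the conversion from g to the point-biserial r follows. It assumes two equal-sized groups, for which the squared r equals g squared divided by (g squared plus 4 df/N); the function name and the numbers are hypothetical. The example also shows how a sizable t can coexist with a tiny proportion of variance when N is large.

```python
from math import sqrt


def r_pb_from_g(g: float, n_total: int) -> float:
    """Point-biserial r from the sample effect size g (equal group sizes assumed)."""
    df = n_total - 2
    return sqrt(g ** 2 / (g ** 2 + 4 * df / n_total))


# Illustrative case: t = 3.0 from two groups of 501 (N = 1002).
# With equal groups, g = 2 * t / sqrt(N).
n_total = 1002
t = 3.0
g = 2 * t / sqrt(n_total)
r_pb = r_pb_from_g(g, n_total)
print(f"g = {g:.3f}, r_pb = {r_pb:.3f}, proportion of variance = {r_pb ** 2:.4f}")
```

Here a t of 3.0 would usually be statistically significant, yet group membership accounts for less than 1% of the variance in the DV.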
14
Estimating the Proportion of Variance Accounted for in the Population
–$r_{pb}^2$ from a sample tends to overestimate the proportion of variance accounted for in the population. This bias can be corrected with the following formula:
$$\text{est. } \omega^2 = \frac{t^2 - 1}{t^2 + df + 1}$$
–$\omega^2$ and $d^2$ are two different measures of the effect size in the population. They have a very simple relationship, as shown by the following formula:
$$\omega^2 = \frac{d^2}{d^2 + 4}$$
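The bias correction and the link between the two population measures can be sketched as follows. Both formulas are assumptions here: the estimate of omega squared is taken to be (t squared - 1) / (t squared + df + 1), and omega squared is taken to equal d squared / (d squared + 4), which presumes two equal-sized populations. Function names and values are hypothetical.

```python
def estimate_omega_squared(t: float, df: int) -> float:
    """Bias-corrected estimate of the population proportion of variance accounted for."""
    return (t ** 2 - 1) / (t ** 2 + df + 1)


def omega_squared_from_d(d: float) -> float:
    """Population omega squared from the population effect size d."""
    return d ** 2 / (d ** 2 + 4)


# Illustrative values only: t = 3.0 with df = 20.
t, df = 3.0, 20
uncorrected = t ** 2 / (t ** 2 + df)  # squared point-biserial r from the sample
corrected = estimate_omega_squared(t, df)
print(f"r_pb squared = {uncorrected:.3f}, estimated omega squared = {corrected:.3f}")
print(f"omega squared for d = 0.8: {omega_squared_from_d(0.8):.3f}")
```

The corrected estimate is always smaller than the sample value, and under this formula it becomes negative whenever the absolute value of t is below 1.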