Theme 6. Linear regression

Presentation on theme: "Theme 6. Linear regression"— Presentation transcript:

1 Theme 6. Linear regression
1. Introduction.
2. The equation of the line.
3. The least squares criterion.
4. Graphical representation.
5. Standardized regression coefficients.
6. The coefficient of determination.
7. Introduction to multiple regression.

2 Introduction Establishing a correlation between two variables is important, but it is only a first step towards predicting one variable from the other (or from several others, in the case of multiple regression, i.e., multiple predictors). Of course, if we know that the variable X is closely related to Y, we can predict Y from X: we are now in the field of prediction. (Obviously, if X is unrelated to Y, then X is of no use as a predictor of Y.) Note: we will use the terms "regression" and "prediction" as almost synonymous. (The reason for the term "regression" is historical, and the name has simply stuck.)

3 Introduction For simplicity, we will focus on the case where the relationship between X and Y is linear. [Scatterplot: Performance (Y) against IQ (X)] Of course, the issue now is how to obtain the "best" line through the points: we need a criterion. While there are other criteria, the most commonly used, and the one we will see here, is the least squares criterion. Least squares criterion: choose the line that minimizes the sum of squared (vertical) distances from the points to the line.

4 Review of the equation of a line
Y = A + BX. A is the intercept (this is where the line crosses the Y axis). B is the slope (observe that in the case of a positive relationship B is positive, in the case of a negative relationship B is negative, and if there is no relationship B will be approximately 0). [Figure: Performance (Y) plotted against IQ (X)] If we want to predict Y from X, we need to calculate (in the case of a linear relationship) the regression of Y on X.

5 Calculation of the linear regression equation (Y on X)
The least squares criterion gives us a value of A and a value of B such that the sum of squared errors, $\sum_i (Y_i - Y'_i)^2$, is minimal, where $Y' = A + BX$. [Scatterplot: Performance (Y) against IQ (X), with the regression line and the vertical distances of the points from it]

6 Calculation of the linear regression equation (Y on X)
[Scatterplot of the data: Performance (Y) against IQ (X)]

7 Calculation of the linear regression equation (Y on X)
The least-squares line is: Y' = -8.5 + 0.15X. With this line, $\sum (Y - Y')^2$ is minimal; it equals 11.5 in this case. Important notes: each unit of IQ increases the predicted grade by 0.15; and although in this case it does not make sense, a person with an IQ of 0 would be predicted a grade of -8.5.

8 Calculation of the linear regression equation (Y on X)
Formulae, in direct (raw) scores. Slope: $B = \dfrac{N \sum XY - \sum X \sum Y}{N \sum X^2 - (\sum X)^2}$. Intercept: $A = \bar{Y} - B\bar{X}$. Note: both A and B can be easily obtained on any calculator with an "LR" (Linear Regression) option.
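As an illustration, here is a minimal Python/NumPy sketch of these raw-score formulas; the IQ and grade arrays below are hypothetical, not the slide's own data:

```python
# Minimal sketch of the raw-score least-squares formulas (hypothetical data).
import numpy as np

X = np.array([80.0, 90.0, 100.0, 110.0, 120.0])  # predictor, e.g. IQ (hypothetical)
Y = np.array([4.0, 5.5, 6.0, 8.0, 9.0])          # criterion, e.g. grade (hypothetical)

N = len(X)
B = (N * (X * Y).sum() - X.sum() * Y.sum()) / (N * (X ** 2).sum() - X.sum() ** 2)
A = Y.mean() - B * X.mean()                      # intercept from the means
print(f"Y' = {A:.3f} + {B:.3f} * X")             # fitted regression line
```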

9 Calculation of the linear regression equation (Y on X)
Then, substituting our data into these formulas: Y' = -8.5 + 0.15X

10 Calculation of the linear regression equation (Y on X)
The formulas in differential scores (deviations from the mean: $x = X - \bar{X}$, $y = Y - \bar{Y}$). Notice that the means of x and y are 0, so the intercept is $a = 0$. IMPORTANT: $b = B$. That is, the slope in differential scores is the same as the slope in direct scores. Therefore, the regression in differential scores is, in our case: y' = 0.15x

11 Calculation of the linear regression equation (Y on X)
Formulas in standardized scores. As in differential scores, the intercept is 0. The slope is actually Pearson's correlation coefficient: $b_z = r_{xy}$. Therefore, the regression line in standard scores is, in our case: $z_{y'} = 0.703\, z_x$
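A small sketch of the same fact: standardizing both variables and fitting by least squares gives Pearson's r as the slope (hypothetical data; population standard deviations, matching the descriptive formulas in the slides):

```python
# Sketch: in standardized (z) scores the slope is Pearson's r and the intercept is 0.
import numpy as np

X = np.array([80.0, 90.0, 100.0, 110.0, 120.0])  # hypothetical
Y = np.array([4.0, 5.5, 6.0, 8.0, 9.0])          # hypothetical

zx = (X - X.mean()) / X.std()   # np.std uses ddof=0 (population SD) by default
zy = (Y - Y.mean()) / Y.std()
r = (zx * zy).mean()            # Pearson's r
print(f"z_y' = {r:.3f} * z_x")  # regression in standard scores
```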

12 Calculation of the linear regression equation (Y on X)
OUTPUT from SPSS: the table shows the intercept and slope in direct scores ("Ord. y pendiente, punt. directas" in the Spanish output) and the slope in standardized scores. Note: in standardized scores, the slope matches Pearson's r coefficient.

13 Calculation of the linear regression equation (Y on X)
We know that $z_{y'} = r_{xy} z_x$, and from Theme 5, $r_{xy} = \dfrac{S_{XY}}{S_X S_Y}$. And also $z_x = \dfrac{X - \bar{X}}{S_X}$ and $z_{y'} = \dfrac{Y' - \bar{Y}}{S_Y}$; therefore, rewriting the standardized equation in direct scores: $Y' = \bar{Y} + r_{xy} \dfrac{S_Y}{S_X} (X - \bar{X})$

14 Calculation of the linear regression equation (Y on X)
Therefore, $B = r_{xy} \dfrac{S_Y}{S_X} = \dfrac{S_{XY}}{S_X^2}$ and $A = \bar{Y} - B\bar{X}$

15 Prediction errors in the regression line of Y on X
Observed scores: $Y_i$. Predicted scores: $Y'_i = A + B X_i$. Prediction errors with the equation: $E_i = Y_i - Y'_i$. The question now is how much the error variance is reduced by using the regression of Y on X (i.e., having X as a predictor), compared with the case in which we do not have the regression line.

16 Prediction errors in the regression of Y on X
If we had no predictor, what score would we predict for Y? In this case, following the least squares criterion, if we lack data on X our best estimate of Y will be its mean, $\bar{Y}$. Recall that the mean minimizes the sum of squared differences: $\sum (Y - \bar{Y})^2$ is minimal. If we use the mean as the predictor, the variance of the prediction errors will be $S_Y^2$.
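A quick numerical check of this property of the mean, with hypothetical scores:

```python
# Sketch: the mean minimizes the sum of squared differences sum((Y - c)^2).
import numpy as np

Y = np.array([4.0, 5.5, 6.0, 8.0, 9.0])  # hypothetical scores

def sse(c):
    return ((Y - c) ** 2).sum()

print(sse(Y.mean()))                             # smallest possible value
print(sse(Y.mean() - 1.0), sse(Y.mean() + 1.0))  # any other constant is worse
```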

17 Prediction errors in the regression of Y on X
But if we have a predictor X, the error variance will be $S_{Y \cdot X}^2 = \dfrac{\sum (Y - Y')^2}{N}$. This is the variance of Y not explained by X. It can be proven that $S_{Y \cdot X}^2 = S_Y^2 (1 - r_{xy}^2)$, and as a result, $r_{xy}^2 = 1 - \dfrac{S_{Y \cdot X}^2}{S_Y^2}$

18 How good is the prediction of the regression line?
How good is the prediction of the regression line? The coefficient of determination serves as an index of the goodness of fit of our model (the regression line). We just showed that $r_{xy}^2 = 1 - \dfrac{S_{Y \cdot X}^2}{S_Y^2}$. This is called the coefficient of determination. It indicates how good the fit of the regression line (or, in general, of the linear model) is, and it is bounded between 0 and 1. If all the points in the scatterplot lie on the line (with a slope different from 0), then $S_{Y \cdot X}^2$ will be 0 and the coefficient of determination will be 1.
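A sketch computing the coefficient of determination from the error variance, using the same hypothetical data as in the earlier sketches:

```python
# Sketch: r^2 = 1 - S2_{Y.X} / S2_Y (variances with ddof=0, as in the slides).
import numpy as np

X = np.array([80.0, 90.0, 100.0, 110.0, 120.0])  # hypothetical
Y = np.array([4.0, 5.5, 6.0, 8.0, 9.0])          # hypothetical

B = ((X - X.mean()) * (Y - Y.mean())).sum() / ((X - X.mean()) ** 2).sum()
A = Y.mean() - B * X.mean()
errors = Y - (A + B * X)                         # prediction errors E = Y - Y'
r2 = 1 - errors.var() / Y.var()                  # bounded between 0 and 1
print(r2)
```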

19 The coefficient of determination and the proportion of associated / explained / common variance (1)
Let's start with a tautology: $Y_i = Y'_i + E_i$. This expression indicates that the observed score of the i-th subject equals the predicted score for that subject plus a prediction error. It can be shown that the predicted scores and the prediction errors are independent, so we can write: $S_Y^2 = S_{Y'}^2 + S_E^2$, where $S_Y^2$ is the total variance of Y, $S_{Y'}^2$ is the variance of the predicted scores of Y, and $S_E^2$ is the variance of the residuals (errors) made when using the equation to predict Y.

20 The coefficient of determination and the proportion of associated / explained / common variance (2)
From the previous slide we have $S_Y^2 = S_{Y'}^2 + S_E^2$. As we know that $S_E^2 = S_Y^2 (1 - r_{xy}^2)$, it follows that $S_{Y'}^2 = r_{xy}^2 S_Y^2$, and therefore $r_{xy}^2 = \dfrac{S_{Y'}^2}{S_Y^2}$. In short, the coefficient of determination measures the proportion of the variance of Y that is associated with / explained by the predictor X.
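This variance decomposition can be checked numerically; a sketch with the same hypothetical data:

```python
# Sketch: S2_Y = S2_Y' + S2_E, and S2_Y' / S2_Y equals r^2.
import numpy as np

X = np.array([80.0, 90.0, 100.0, 110.0, 120.0])  # hypothetical
Y = np.array([4.0, 5.5, 6.0, 8.0, 9.0])          # hypothetical

B = ((X - X.mean()) * (Y - Y.mean())).sum() / ((X - X.mean()) ** 2).sum()
A = Y.mean() - B * X.mean()
Y_pred = A + B * X
E = Y - Y_pred
print(np.isclose(Y.var(), Y_pred.var() + E.var()))  # True: the variances add up
print(Y_pred.var() / Y.var())                       # proportion explained, i.e. r^2
```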

21 Introduction to multiple linear regression (1)
We have seen the case of one predictor (X) and one predicted variable (Y), and we obtained the regression of Y on X by the least squares method. Given the nature of human behavior, in which any observed behavior can be influenced by several variables, we can have several predictors X1, X2, ... to predict Y (or, if you prefer, several predictors X2, X3, ... to predict X1). This is the case of multiple regression. So far we had one dependent variable and one predictor; now we will have k predictors (independent variables).

22 Introduction to multiple linear regression (2)
Introduction to multiple linear regression (2). It is important to realize that the weights B1, B2, ..., Bk play the same role as the slope B in the simple regression line. For instance, with k predictors: $Y' = A + B_1 X_1 + B_2 X_2 + \dots + B_k X_k$. Such coefficients represent how important the respective predictor variable is in the regression equation. As in the regression line (notice that the case of one predictor is a particular case of multiple regression), A represents where the multiple regression hyperplane crosses the axis of the predicted variable. For simplicity, the whole process is usually done by computer, so we will not see the formulas...

23 Introduction to multiple linear regression (3)
In raw scores, the regression equation is the one we already know: $Y' = A + B_1 X_1 + B_2 X_2 + \dots + B_k X_k$. In differential scores, remember that A was 0 in the simple regression; the same applies in the multiple regression equation: $y' = b_1 x_1 + b_2 x_2 + \dots + b_k x_k$. And applying the same logic, the value of the weights is the same as in direct scores: $b_1 = B_1$, $b_2 = B_2$, etcetera.

24 Introduction to multiple linear regression (4)
Data (N = 5), with variables Rendimiento (Performance), Ansiedad (Anxiety) and Neuroticismo (Neuroticism). [The data table and worked computations are not reproduced in the transcript.] As in the case of one predictor, the weights are obtained by the least squares criterion.
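Since the slides leave these computations to the computer, here is a minimal least-squares sketch for two predictors; the N = 5 values below are hypothetical stand-ins for the slide's data:

```python
# Sketch: multiple regression Y' = A + B1*X1 + B2*X2 via np.linalg.lstsq.
import numpy as np

perf = np.array([7.0, 5.0, 6.0, 3.0, 4.0])  # Y: Rendimiento (hypothetical)
anx  = np.array([2.0, 4.0, 3.0, 6.0, 5.0])  # X1: Ansiedad (hypothetical)
neur = np.array([2.0, 1.0, 4.0, 3.0, 5.0])  # X2: Neuroticismo (hypothetical)

design = np.column_stack([np.ones_like(perf), anx, neur])  # columns: 1, X1, X2
coefs, *_ = np.linalg.lstsq(design, perf, rcond=None)      # least-squares fit
A, B1, B2 = coefs
print(f"Y' = {A:.3f} + {B1:.3f}*X1 + {B2:.3f}*X2")
```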

25 The general linear model
The general linear model underlies many of the statistical tests conducted in psychology and other social sciences. To name a few:
- Regression analysis (already seen)
- Analysis of variance (2nd semester)
- t-test (2nd semester)
- Analysis of covariance
- Cluster analysis
- Factor analysis
- Discriminant analysis
- ...

26 The general linear model (2)
Clearly, the regression analysis we have seen is a particular case of the general linear model. In general terms: Observed = Predicted + prediction error

27 The general linear model
The general expression is $Y = B_0 + B_1 X_1 + B_2 X_2 + \dots + B_k X_k + e$, where: Y is the dependent variable; X1, X2, ... are the independent variables (predictors of Y); e is a random error; and B1, B2, ... are the weights that determine the contribution of each independent variable.
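As a closing illustration, a sketch of the general linear model in matrix form, Y = XB + e, fitted by least squares on simulated (hypothetical) data:

```python
# Sketch: the general linear model Y = X @ B + e with k = 3 predictors.
import numpy as np

rng = np.random.default_rng(0)
n, k = 20, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])  # [1, X1, ..., Xk]
B_true = np.array([1.0, 0.5, -0.3, 2.0])                    # hypothetical weights
e = rng.normal(scale=0.1, size=n)                           # random error
Y = X @ B_true + e
B_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)               # least-squares estimate
print(B_hat)                                                # close to B_true
```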

