Theme 6. Linear regression

Presentation on theme: "Theme 6. Linear regression"— Presentation transcript:

1 Theme 6. Linear regression
1. Introduction.
2. The equation of the line.
3. The least squares criterion.
4. Graphical representation.
5. Standardized regression coefficients.
6. The coefficient of determination.
7. Introduction to multiple regression.

2 Introduction Establishing a correlation between two variables is important, but it is only a first step towards predicting one variable from the other (or from several others, in the case of multiple regression, i.e., multiple predictors). Of course, if we know that the variable X is closely related to Y, we can predict Y from X: we are now in the field of prediction. (Obviously, if X is unrelated to Y, then X is of no use as a predictor of Y.) Note: we will use the terms "regression" and "prediction" as almost synonymous. (The reason for the term "regression" is historical, and the name has simply stuck.)

3 Introduction For simplicity, we will focus on the case where the relationship between X and Y is linear. [Scatterplot: Performance (Y) against IQ (X)] Of course, the issue now is how to obtain the "best" line through the points: we need a criterion. While there are other criteria, the most commonly used, and the one we will see here, is the least squares criterion. Least squares criterion: choose the line that minimizes the sum of squared (vertical) distances from the points to the line.

4 Review of the equation of a line
Y = A + BX. A is the intercept (this is where the line crosses the Y axis). B is the slope (observe that in the case of a positive relationship B is positive, in the case of a negative relationship B is negative, and if there is no relationship B will be approximately 0). [Figure: Performance (Y) plotted against IQ (X)] If we want to predict Y from X, we need to calculate (in the case of a linear relationship) the regression of Y on X.

5 Calculation of the linear regression equation (Y on X)
The least squares criterion gives us a value of A and a value of B such that the sum of squared errors, $\sum_i (Y_i - Y'_i)^2$, is minimal, where $Y' = A + BX$. [Scatterplot: Performance (Y) against IQ (X), with the regression line and the vertical distances of the points from it]

6 Calculation of the linear regression equation (Y on X)
[Scatterplot of the data: Performance (Y) against IQ (X)]

7 Calculation of the linear regression equation (Y on X)
The least-squares line is: Y' = -8.5 + 0.15X. With this line, $\sum (Y - Y')^2$ is minimal; it equals 11.5 in this case. Important notes: each unit of IQ increases the predicted grade by 0.15; and although in this case it does not make sense, a person with an IQ of 0 would be predicted a grade of -8.5.

8 Calculation of the linear regression equation (Y on X)
Formulae, in direct (raw) scores. Slope: $B = \dfrac{N \sum XY - \sum X \sum Y}{N \sum X^2 - (\sum X)^2}$. Intercept: $A = \bar{Y} - B\bar{X}$. Note: both A and B can be easily obtained on any calculator with an "LR" (Linear Regression) option.
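As an illustration, here is a minimal Python/NumPy sketch of these raw-score formulas; the IQ and grade arrays below are hypothetical, not the slide's own data:

```python
# Minimal sketch of the raw-score least-squares formulas (hypothetical data).
import numpy as np

X = np.array([80.0, 90.0, 100.0, 110.0, 120.0])  # predictor, e.g. IQ (hypothetical)
Y = np.array([4.0, 5.5, 6.0, 8.0, 9.0])          # criterion, e.g. grade (hypothetical)

N = len(X)
B = (N * (X * Y).sum() - X.sum() * Y.sum()) / (N * (X ** 2).sum() - X.sum() ** 2)
A = Y.mean() - B * X.mean()                      # intercept from the means
print(f"Y' = {A:.3f} + {B:.3f} * X")             # fitted regression line
```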

9 Calculation of the linear regression equation (Y on X)
Then, substituting our data into these formulas: Y' = -8.5 + 0.15X

10 Calculation of the linear regression equation (Y on X)
The formulas in differential scores (deviations from the mean: $x = X - \bar{X}$, $y = Y - \bar{Y}$). Notice that the means of x and y are 0, so the intercept is $a = 0$. IMPORTANT: $b = B$. That is, the slope in differential scores is the same as the slope in direct scores. Therefore, the regression in differential scores is, in our case: y' = 0.15x

11 Calculation of the linear regression equation (Y on X)
Formulas in standardized scores. As in differential scores, the intercept is 0. The slope is actually Pearson's correlation coefficient: $b_z = r_{xy}$. Therefore, the regression line in standard scores is, in our case: $z_{y'} = 0.703\, z_x$
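A small sketch of the same fact: standardizing both variables and fitting by least squares gives Pearson's r as the slope (hypothetical data; population standard deviations, matching the descriptive formulas in the slides):

```python
# Sketch: in standardized (z) scores the slope is Pearson's r and the intercept is 0.
import numpy as np

X = np.array([80.0, 90.0, 100.0, 110.0, 120.0])  # hypothetical
Y = np.array([4.0, 5.5, 6.0, 8.0, 9.0])          # hypothetical

zx = (X - X.mean()) / X.std()   # np.std uses ddof=0 (population SD) by default
zy = (Y - Y.mean()) / Y.std()
r = (zx * zy).mean()            # Pearson's r
print(f"z_y' = {r:.3f} * z_x")  # regression in standard scores
```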

12 Calculation of the linear regression equation (Y on X)
OUTPUT from SPSS: the table shows the intercept and slope in direct scores ("Ord. y pendiente, punt. directas" in the Spanish output) and the slope in standardized scores. Note: in standardized scores, the slope matches Pearson's r coefficient.

13 Calculation of the linear regression equation (Y on X)
We know that $z_{y'} = r_{xy} z_x$, and from Theme 5, $r_{xy} = \dfrac{S_{XY}}{S_X S_Y}$. And also $z_x = \dfrac{X - \bar{X}}{S_X}$ and $z_{y'} = \dfrac{Y' - \bar{Y}}{S_Y}$; therefore, rewriting the standardized equation in direct scores: $Y' = \bar{Y} + r_{xy} \dfrac{S_Y}{S_X} (X - \bar{X})$

14 Calculation of the linear regression equation (Y on X)
Therefore, $B = r_{xy} \dfrac{S_Y}{S_X} = \dfrac{S_{XY}}{S_X^2}$ and $A = \bar{Y} - B\bar{X}$

15 Prediction errors in the regression line of Y on X
Observed scores: $Y_i$. Predicted scores: $Y'_i = A + B X_i$. Prediction errors with the equation: $E_i = Y_i - Y'_i$. The question now is how much the error variance is reduced by using the regression of Y on X (i.e., having X as a predictor), compared with the case in which we do not have the regression line.

16 Prediction errors in the regression of Y on X
If we had no predictor, what score would we predict for Y? In this case, following the least squares criterion, if we lack data on X our best estimate of Y will be its mean, $\bar{Y}$. Recall that the mean minimizes the sum of squared differences: $\sum (Y - \bar{Y})^2$ is minimal. If we use the mean as the predictor, the variance of the prediction errors will be $S_Y^2$.
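A quick numerical check of this property of the mean, with hypothetical scores:

```python
# Sketch: the mean minimizes the sum of squared differences sum((Y - c)^2).
import numpy as np

Y = np.array([4.0, 5.5, 6.0, 8.0, 9.0])  # hypothetical scores

def sse(c):
    return ((Y - c) ** 2).sum()

print(sse(Y.mean()))                             # smallest possible value
print(sse(Y.mean() - 1.0), sse(Y.mean() + 1.0))  # any other constant is worse
```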

17 Prediction errors in the regression of Y on X
But if we have a predictor X, the error variance will be $S_{Y \cdot X}^2 = \dfrac{\sum (Y - Y')^2}{N}$. This is the variance of Y not explained by X. It can be proven that $S_{Y \cdot X}^2 = S_Y^2 (1 - r_{xy}^2)$, and as a result, $r_{xy}^2 = 1 - \dfrac{S_{Y \cdot X}^2}{S_Y^2}$

18 How good is the prediction of the regression line?
How good is the prediction of the regression line? The coefficient of determination serves as an index of the goodness of fit of our model (the regression line). We just showed that $r_{xy}^2 = 1 - \dfrac{S_{Y \cdot X}^2}{S_Y^2}$. This is called the coefficient of determination. It indicates how good the fit of the regression line (or, in general, of the linear model) is, and it is bounded between 0 and 1. If all the points in the scatterplot lie on the line (with a slope different from 0), then $S_{Y \cdot X}^2$ will be 0 and the coefficient of determination will be 1.
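A sketch computing the coefficient of determination from the error variance, using the same hypothetical data as in the earlier sketches:

```python
# Sketch: r^2 = 1 - S2_{Y.X} / S2_Y (variances with ddof=0, as in the slides).
import numpy as np

X = np.array([80.0, 90.0, 100.0, 110.0, 120.0])  # hypothetical
Y = np.array([4.0, 5.5, 6.0, 8.0, 9.0])          # hypothetical

B = ((X - X.mean()) * (Y - Y.mean())).sum() / ((X - X.mean()) ** 2).sum()
A = Y.mean() - B * X.mean()
errors = Y - (A + B * X)                         # prediction errors E = Y - Y'
r2 = 1 - errors.var() / Y.var()                  # bounded between 0 and 1
print(r2)
```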

19 The coefficient of determination and the proportion of associated / explained / common variance (1)
Let's start with a tautology: $Y_i = Y'_i + E_i$. This expression indicates that the observed score of the i-th subject equals the predicted score for that subject plus a prediction error. It can be shown that the predicted scores and the prediction errors are independent, so we can write: $S_Y^2 = S_{Y'}^2 + S_E^2$, where $S_Y^2$ is the total variance of Y, $S_{Y'}^2$ is the variance of the predicted scores of Y, and $S_E^2$ is the variance of the residuals (errors) made when using the equation to predict Y.

20 The coefficient of determination and the proportion of associated / explained / common variance (2)
From the previous slide we have $S_Y^2 = S_{Y'}^2 + S_E^2$. As we know that $S_E^2 = S_Y^2 (1 - r_{xy}^2)$, it follows that $S_{Y'}^2 = r_{xy}^2 S_Y^2$, and therefore $r_{xy}^2 = \dfrac{S_{Y'}^2}{S_Y^2}$. In short, the coefficient of determination measures the proportion of the variance of Y that is associated with / explained by the predictor X.
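This variance decomposition can be checked numerically; a sketch with the same hypothetical data:

```python
# Sketch: S2_Y = S2_Y' + S2_E, and S2_Y' / S2_Y equals r^2.
import numpy as np

X = np.array([80.0, 90.0, 100.0, 110.0, 120.0])  # hypothetical
Y = np.array([4.0, 5.5, 6.0, 8.0, 9.0])          # hypothetical

B = ((X - X.mean()) * (Y - Y.mean())).sum() / ((X - X.mean()) ** 2).sum()
A = Y.mean() - B * X.mean()
Y_pred = A + B * X
E = Y - Y_pred
print(np.isclose(Y.var(), Y_pred.var() + E.var()))  # True: the variances add up
print(Y_pred.var() / Y.var())                       # proportion explained, i.e. r^2
```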

21 Introduction to multiple linear regression (1)
We have seen the case of one predictor (X) and one predicted variable (Y), and we obtained the regression of Y on X by the least squares method. Given the nature of human behavior, in which any observed behavior can be influenced by several variables, we can have several predictors X1, X2, ... to predict Y (or, if you prefer, several predictors X2, X3, ... to predict X1). This is the case of multiple regression. So far we had one dependent variable and one predictor; now we will have k predictors (independent variables).

22 Introduction to multiple linear regression (2)
Introduction to multiple linear regression (2). It is important to realize that the weights B1, B2, ..., Bk play the same role as the slope B in the simple regression line. For instance, with k predictors: $Y' = A + B_1 X_1 + B_2 X_2 + \dots + B_k X_k$. Such coefficients represent how important the respective predictor variable is in the regression equation. As in the regression line (notice that the case of one predictor is a particular case of multiple regression), A represents where the multiple regression hyperplane crosses the axis of the predicted variable. For simplicity, the whole process is usually done by computer, so we will not see the formulas...

23 Introduction to multiple linear regression (3)
In raw scores, the regression equation is the one we already know: $Y' = A + B_1 X_1 + B_2 X_2 + \dots + B_k X_k$. In differential scores, remember that A was 0 in the simple regression; the same applies in the multiple regression equation: $y' = b_1 x_1 + b_2 x_2 + \dots + b_k x_k$. And applying the same logic, the value of the weights is the same as in direct scores: $b_1 = B_1$, $b_2 = B_2$, etcetera.

24 Introduction to multiple linear regression (4)
Data (N = 5), with variables Rendimiento (Performance), Ansiedad (Anxiety) and Neuroticismo (Neuroticism). [The data table and worked computations are not reproduced in the transcript.] As in the case of one predictor, the weights are obtained by the least squares criterion.
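Since the slides leave these computations to the computer, here is a minimal least-squares sketch for two predictors; the N = 5 values below are hypothetical stand-ins for the slide's data:

```python
# Sketch: multiple regression Y' = A + B1*X1 + B2*X2 via np.linalg.lstsq.
import numpy as np

perf = np.array([7.0, 5.0, 6.0, 3.0, 4.0])  # Y: Rendimiento (hypothetical)
anx  = np.array([2.0, 4.0, 3.0, 6.0, 5.0])  # X1: Ansiedad (hypothetical)
neur = np.array([2.0, 1.0, 4.0, 3.0, 5.0])  # X2: Neuroticismo (hypothetical)

design = np.column_stack([np.ones_like(perf), anx, neur])  # columns: 1, X1, X2
coefs, *_ = np.linalg.lstsq(design, perf, rcond=None)      # least-squares fit
A, B1, B2 = coefs
print(f"Y' = {A:.3f} + {B1:.3f}*X1 + {B2:.3f}*X2")
```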

25 The general linear model
The general linear model underlies many of the statistical tests conducted in psychology and other social sciences. To name a few:
- Regression analysis (already seen)
- Analysis of variance (2nd semester)
- t-test (2nd semester)
- Analysis of covariance
- Cluster analysis
- Factor analysis
- Discriminant analysis
- ...

26 The general linear model (2)
Clearly, the regression analysis we have seen is a particular case of the general linear model. In general terms: Observed = Predicted + prediction error

27 The general linear model
The general expression is $Y = B_0 + B_1 X_1 + B_2 X_2 + \dots + B_k X_k + e$, where: Y is the dependent variable; X1, X2, ... are the independent variables (predictors of Y); e is a random error; and B1, B2, ... are the weights that determine the contribution of each independent variable.
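As a closing illustration, a sketch of the general linear model in matrix form, Y = XB + e, fitted by least squares on simulated (hypothetical) data:

```python
# Sketch: the general linear model Y = X @ B + e with k = 3 predictors.
import numpy as np

rng = np.random.default_rng(0)
n, k = 20, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])  # [1, X1, ..., Xk]
B_true = np.array([1.0, 0.5, -0.3, 2.0])                    # hypothetical weights
e = rng.normal(scale=0.1, size=n)                           # random error
Y = X @ B_true + e
B_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)               # least-squares estimate
print(B_hat)                                                # close to B_true
```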

