Lecture 5: Simple Linear Regression

1 Lecture 5: Simple Linear Regression
Laura McAvinue School of Psychology Trinity College Dublin

2 Previous Lecture Regression Line
Offers a model of the relationship between two variables A straight line that represents the ‘best fit’ Enables us to predict Variable Y on the basis of Variable X

3 Today Calculation of the regression line
Measuring the accuracy of prediction Some practice!

4 How is the regression line calculated?
The Method of Least Squares. Computes a line that minimises the difference between the predicted values of Y (Y’) and the actual values of Y (Y). It minimises the (Y – Y’)s, which are the errors of prediction, or residuals.

5 [Figure: scatterplot of Y against X with a fitted regression line. The vertical lines from each data point to the regression line are the errors of prediction, (Y – Y’)s, or residuals.]

6 [Figure: scatterplot of Y against X highlighting one data point, with actual value Y = 6 and predicted value Y’ = 5.]

7 Method of Least Squares
When fitting a line to the data, the regression procedure attempts to fit a line that minimises these errors of prediction, the total of the (Y – Y’)s. But! You can’t try to minimise $\sum (Y - Y')$ directly, as the (Y – Y’)s will have positive and negative values, which will cancel each other out. So, you square the residuals, add them up, and try to minimise $\sum (Y - Y')^2$. Hence the name, ‘Method of Least Squares’.
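The idea on this slide can be sketched in a few lines of Python. This is a minimal illustration and not part of the lecture: the five data points and the function names `fit_line` and `sum_squared_residuals` are made up for the example, and the slope and intercept come from the standard closed-form least-squares formulas.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line Y' = slope * X + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sum of squared deviations of X
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def sum_squared_residuals(xs, ys, slope, intercept):
    """Sum of (Y - Y')^2 for the line Y' = slope * X + intercept."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data, chosen only for illustration
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

slope, intercept = fit_line(xs, ys)
residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]

# The raw residuals cancel out (their sum is zero up to rounding) ...
raw_sum = sum(residuals)
# ... so the squared residuals are what gets minimised.
best_ss = sum_squared_residuals(xs, ys, slope, intercept)
# Any other line, e.g. Y' = 0.5X + 2.5, gives a larger sum of squares.
other_ss = sum_squared_residuals(xs, ys, 0.5, 2.5)

print(round(slope, 2), round(intercept, 2))
print(round(raw_sum, 10), round(best_ss, 2), round(other_ss, 2))
```

The comparison line at the end is arbitrary. Any slope and intercept other than the fitted pair should give a sum of squared residuals at least as large as the fitted line's.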

8 How do we measure the accuracy of prediction?
The regression line is fitted in such a way that the errors of prediction are kept as small as possible. You can fit a regression line to any dataset, but that doesn’t mean it’s a good fit! How do we measure how good this fit is? How do we measure the accuracy of the prediction that our regression equation makes? Three methods: Standard Error of the Estimate, r², Statistical Significance.

9 Standard Error of the Estimate
A measure of the size of the errors of prediction. We’ve seen that the regression line is computed in such a way as to minimise the difference between the predicted values (Y’) and the actual values (Y). The differences between these values are known as errors of prediction or residuals, (Y – Y’)s. For any set of data, the errors of prediction will vary. Some data points will be close to the line, so (Y – Y’) will be small. Some data points will be far from the line, so (Y – Y’) will be big.

10 Standard Error of the Estimate
One way to assess the fit of the regression line is to take the standard deviation of all of these errors On average, how much do the data points vary from the regression line? Standard error of the estimate

11 Standard Error of the Estimate
One point to note… Standard error is a measure of the standard deviation of data points around the regression line. (Standard error)² expresses the variance of the data points around the regression line: the residual or error variance.
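A minimal sketch of this calculation, under stated assumptions: the data and the line Y' = 0.6X + 2.2 are hypothetical, the function name `standard_error_of_estimate` is made up for the example, and the denominator n − 2 is the usual convention for simple linear regression (the slides do not state the formula).

```python
import math


def standard_error_of_estimate(actual, predicted):
    """Square root of the residual variance, using n - 2 degrees of freedom (assumed convention)."""
    ss_error = sum((y - y_hat) ** 2 for y, y_hat in zip(actual, predicted))
    return math.sqrt(ss_error / (len(actual) - 2))


# Hypothetical data and predictions from the illustrative line Y' = 0.6X + 2.2
xs = [1, 2, 3, 4, 5]
actual = [2, 4, 5, 4, 5]
predicted = [0.6 * x + 2.2 for x in xs]

see = standard_error_of_estimate(actual, predicted)
error_variance = see ** 2  # (standard error)^2 = residual or error variance

print(round(see, 3), round(error_variance, 3))
```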

12 r²
Interested in the relationship between two variables. Variable X and Variable Y are each a set of scores that vary around a mean. If these two variables are correlated, they will share some variance.

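13 [Figure: two overlapping circles, X and Y. The part of X outside the overlap is variance in X that is not related to Y. The part of Y outside the overlap is variance in Y that is not related to X. The overlap is the shared variance between X and Y.]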

14 In regression, we are trying to explain Variable Y as a function of Variable X
Would be useful if we could find out what percentage of variance in Variable Y can be explained by variance in Variable X

15 Total Variance in Variable Y (SStotal)
Splits into two parts. Variance due to Variable X: regression / model variance, SSm = SStotal - SSerror. Variance due to other factors: error variance, SSerror.

16 r² To calculate the percentage of variance in Variable Y that can be explained by variance in Variable X: $r^2 = \frac{SS_m}{SS_{total}}$, where SSm is the variance due to X (the regression) and SStotal is the total variance in Y.

17 r² = (Pearson Correlation)²
Shared variance between two variables. Used in simple linear regression to show what percentage of Variable Y can be explained by Variable X. For example: if rxy = .8, r²xy = .64, then 64% of the variability in Y is directly predictable from variable X. If rxy = .2, r²xy = .04, then 4% of the variability in Y is due to / can be explained by X.
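The two routes to r² described on these slides (the ratio of sums of squares, and the squared Pearson correlation) can be checked against each other. The sketch below uses hypothetical data and illustrative variable names; it is not taken from the lecture.

```python
import math

# Hypothetical data, chosen only for illustration
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
sxx = sum((x - mean_x) ** 2 for x in xs)
syy = sum((y - mean_y) ** 2 for y in ys)

# Least-squares line Y' = slope * X + intercept
slope = sxy / sxx
intercept = mean_y - slope * mean_x

ss_total = syy                                   # total variance in Y
ss_error = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
ss_model = ss_total - ss_error                   # variance due to X

r_squared = ss_model / ss_total                  # route 1: SSm / SStotal
pearson_r = sxy / math.sqrt(sxx * syy)           # route 2: Pearson r, then square it

print(round(r_squared, 2), round(pearson_r ** 2, 2))
```

For these made-up numbers both routes give the same value, which would be read as the proportion of variability in Y that can be explained by X.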

18 Statistical Significance
Does the regression model predict Variable Y better than chance? In simple linear regression: does X significantly predict Y? If the correlation between X & Y is statistically significant, the regression model will be statistically significant. Not so for multiple regression (next lecture). The test used is the F Ratio.

19 Statistical Significance
F-Ratio = average variance due to the regression / average variance due to error: $F = \frac{MS_m}{MS_{error}} = \frac{SS_m / df_m}{SS_{error} / df_{error}}$. It uses the mean square rather than the sum of squares in order to compare the average variance. You want the F-Ratio to be large and statistically significant. If large, then more variance is explained by the regression than by the error in the model.
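A short sketch of the F-ratio arithmetic, under stated assumptions: the sums of squares are hypothetical, the function name `f_ratio` is made up for the example, and the degrees of freedom follow the usual convention (df for the model = number of predictors, df for error = n − number of predictors − 1). It does not compute the p-value, which needs the F distribution.

```python
def f_ratio(ss_model, ss_error, n, n_predictors=1):
    """Return (F, df_model, df_error) from the sums of squares and the sample size."""
    df_model = n_predictors
    df_error = n - n_predictors - 1
    ms_model = ss_model / df_model   # average variance due to the regression
    ms_error = ss_error / df_error   # average variance due to error
    return ms_model / ms_error, df_model, df_error


# Hypothetical sums of squares from a dataset of 5 cases
f, df_m, df_e = f_ratio(ss_model=3.6, ss_error=2.4, n=5)
print(f"F({df_m}, {df_e}) = {f:.2f}")
```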

20 An example ‘Linear regression’ data-set
I want to predict a person’s verbal coherency based on the number of units of alcohol they consume. Record how much alcohol is consumed and administer a test of verbal coherency. In SPSS: Analyse, Regression, Linear. Dependent variable: Verbal Coherency. Independent variable: Alcohol. Method: Enter.

21 Three parts to the output
Model Summary: r², standard error. ANOVA: F Ratio. Coefficients: regression equation.

22 Model Summary table: how well our regression model explains the variation in verbal coherency
R: Pearson r between alcohol and verbal coherency. R Square: proportion of variation in verbal coherency that is related to alcohol. Adjusted R Square: statistical estimate of the population proportion of variation in verbal coherency that is related to alcohol. Std. Error of the Estimate: statistical estimate of the error in the regression model.

23 ANOVA table
Sum of Squares, Regression: total variation in data due to regression model. Sum of Squares, Residual: total variation in data NOT due to regression model. Mean Square, Regression: average variation in data due to regression model. Mean Square, Residual: average variation in data NOT due to regression model. F: ratio of variation in data due to regression model & variation not due to model. Sig.: probability of observing this F-ratio if Ho is true.

24 T-statistic = tells us whether using the predictor variable gives us a better than chance prediction of the DV
Alcohol is a sig. predictor of verbal coherency. Values that we use in the regression equation (Y = BX + a): Verbal Coherency = B (alcohol) + constant. Verbal coherency = 4.7 (alcohol). As alcohol changes by 1 unit, verbal coherency changes by 4.7 units.

25 Second Example Can we predict how many months a person survives after being diagnosed with cancer, based on their level of optimism? Linear Regression dataset Analyse, regression, linear Dependent variable: Survival Independent variable: Optimism

26 Aspects of Regression analysis
Write the regression equation Explain what this equation tells us about the relationship between Variables X and Y Make a prediction of Y when given a value of X State the standard error of your prediction Ascertain if the regression model significantly predicts the dependent variable Y State what percentage of Variable Y is explained by Variable X

27 State the following… Describe the relationship between survival (Y) and optimism (X) in terms of a regression equation. In your own words, explain what this equation tells us about the relationship between survival and optimism. Using this equation, predict how many months a person will survive for if their optimism score is 10.

28 State the following… What is the standard error of your prediction?
Does the regression model significantly predict the dependent variable? What percentage of variance in survival is explained by optimism level?

29 Answers
Describe the relationship between survival (Y) and optimism (X) in terms of a regression equation. Y’ = 18.4 + .69X
In your own words, explain what this equation tells us about the relationship between survival and optimism. As optimism level increases by one unit, survival increases by .69 months. When a person’s optimism score is 0, his/her predicted length of survival is 18.4 months.
Using this equation, predict how many months a person will survive for if their optimism score is 10. Y’ = 18.4 + .69(10) = 25.3 months
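The prediction in this answer can be reproduced directly. The sketch below takes the slope (.69) and the predicted survival at an optimism score of 0 (18.4 months) from the answers above; the function name `predict_survival` is made up for the example.

```python
def predict_survival(optimism, slope=0.69, intercept=18.4):
    """Predicted months of survival, Y' = intercept + slope * optimism."""
    return intercept + slope * optimism


print(round(predict_survival(0), 1))   # predicted survival at optimism 0
print(round(predict_survival(10), 1))  # predicted survival at optimism 10
```

Each extra unit of optimism adds .69 months to the prediction, so a score of 10 adds 6.9 months to the 18.4 months predicted at a score of 0.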

30 State the following… What is the standard error of your prediction?
4.5 months. Does the regression model significantly predict the dependent variable? Yes, F (1, 432) = 202, p < .001. What percentage of variance in survival is explained by optimism level? 32%
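As a consistency check on these answers (not something shown on the slide): in simple linear regression the model has one degree of freedom, so F = SSm / (SSerror / dferror), and r² can be recovered from the reported F-ratio and its error degrees of freedom.

```latex
r^2 = \frac{SS_m}{SS_m + SS_{error}}
    = \frac{F}{F + df_{error}}
    = \frac{202}{202 + 432}
    \approx .32
```

This agrees with the 32% of variance in survival reported as explained by optimism.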

31 Summary Simple linear regression
Provides a model of the relationship between two variables Creates a straight line that best represents the relationship between two variables Enables us to estimate the percentage of variance in one variable that can be explained by another Enables us to predict one variable on the basis of another Remember that a regression line can be fitted to any dataset. It’s necessary to assess the accuracy of the fit.

