Business Statistics - QBM117 Least squares regression.

Presentation on theme: "Business Statistics - QBM117 Least squares regression."— Presentation transcript:

1 Business Statistics - QBM117 Least squares regression

2 Objectives w To explain the least squares method of finding the line of best fit. w To understand the relationship between predicted values and residuals. w To introduce measures which can be used to assess how well the line fits the data and also how good the predictions are.

3 Regression:prediction of one variable from another w Linear regression analysis can be used to predict one variable from the other, using an estimated straight line that summarises the relationship between the two variables. w The variable being predicted is the y variable (dependent variable), and the variable that helps with the prediction is the x variable (independent variable).

4 w Just as we use the average to summarise a single variable, we can use a straight line to summarise a linear predictive relationship between two variables. w Just as there is variability in the data about the average for univariate data, there is also variability about the straight line which summarises the bivariate data. w Just like the average, the straight line is a useful, but imperfect summary of the data, due to this variability in the data.

5 Straight line equations w A straight line can be exactly described by two numbers: the slope and the y intercept. w The slope is a measure of how steeply the line rises or falls, and the y intercept is simply the value of y (on the y axis) where x = 0. w In situations where it is not sensible for x to be zero, the y intercept should not be interpreted directly. w The general equation of a straight line is therefore y = (y intercept) + (slope) × x.

6 Finding a line which best summarises the data How do we find the line which best summarises a set of bivariate data? How do we find the line which will best predict y from x? w We find the line which has the smallest prediction error overall. w The most common way of doing this is the least squares method.

7 The least squares method w This method finds that line which has the smallest sum of squared vertical prediction errors compared to all other lines that could possibly be drawn. w This line will then provide the best predictions for y from x. w This least squares line can easily be found using Excel or any other statistical package.
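The criterion described on this slide can be written out as a short formula. The notation below (b_0 for the intercept, b_1 for the slope, bars for sample means) is an assumption for illustration and does not come from the slides; the closed-form solution shown is the standard one for simple linear regression.

```latex
% Sum of squared vertical prediction errors for a candidate line y = b_0 + b_1 x
\[
  \mathrm{SSE}(b_0, b_1) = \sum_{i=1}^{n} \bigl( y_i - b_0 - b_1 x_i \bigr)^2
\]
% The least-squares line is the choice of b_0 and b_1 that makes SSE smallest:
\[
  b_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
  \qquad
  b_0 = \bar{y} - b_1 \bar{x}
\]
```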

8 Example: Salary and Experience w Salary vs. years of experience for n = 6 employees. w Linear (straight line) relationship. w Increasing relationship: higher salary generally goes with higher experience. w Correlation r = 0.8667. w Mary earns $55,000 per year and has 20 years of experience.
Experience (years):   15  10  20   5  15   5
Salary ($thousand):   30  35  55  22  40  27
[Figure: scatter plot of Salary ($thousand) against Experience (years).]

10 The Sample Least-Squares Line w Summarizes bivariate data: predicts y from x with the smallest errors (in the vertical direction, along the y axis). w Least-squares line: Salary = 15.32 + 1.673 Experience, with salary in $thousand and experience in years. w The intercept is 15.32: the predicted salary at 0 years of experience is $15,320. w The slope is 1.673: for each additional year of experience, salary increases by $1,673 on average. [Figure: scatter plot of Salary ($000s) (y) against Experience (x) with the least-squares line.]
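The slides obtain the line from Excel. As a cross-check, the following minimal Python sketch computes the intercept and slope by hand from the six data points of the salary example, using the standard least-squares formulas. The function name `least_squares_line` is made up for this illustration.

```python
# Data from the salary example (experience in years, salary in $thousand).
experience = [15, 10, 20, 5, 15, 5]
salary = [30, 35, 55, 22, 40, 27]


def least_squares_line(x, y):
    """Return (intercept, slope) of the least-squares line for paired data."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    # Sum of cross-deviations and sum of squared x-deviations.
    s_xy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_mean) ** 2 for xi in x)
    slope = s_xy / s_xx
    intercept = y_mean - slope * x_mean
    return intercept, slope


intercept, slope = least_squares_line(experience, salary)
print(f"Salary = {intercept:.2f} + {slope:.3f} Experience")
# prints: Salary = 15.32 + 1.673 Experience
```

The printed equation agrees with the line quoted on the slide.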

11 Predicted values and residuals w The predicted value for y given a value for x is the height of the line at x. It can be found by substituting the value of x into the equation of the least-squares line. w Each data point has a residual, which measures how far the actual data point is above (or below, if negative) the fitted (least-squares) line. w Residual = actual y – predicted y

12 w The predicted value comes from the least-squares line. For example, Mary (with 20 years of experience) has a predicted salary of 15.32 + 1.673(20) = 48.8, i.e. $48,800. So does anyone with 20 years of experience. w The residual is actual y minus predicted y. Mary’s residual is 55 – 48.8 = 6.2: she earns about $6,200 more than the predicted salary for a person with 20 years of experience. A person who earns less than predicted will have a negative residual.

13 [Figure: scatter plot of Salary against Experience with the least-squares line.] Mary earns 55 thousand; Mary’s predicted value is 48.8; Mary’s residual is 6.2.
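Mary's predicted value and residual can be reproduced in a few lines. This is a sketch that uses the rounded coefficients quoted on the slides; the helper names `predict_salary` and `residual` are hypothetical.

```python
# Rounded coefficients of the least-squares line from the slides
# (salary in $thousand, experience in years).
INTERCEPT = 15.32
SLOPE = 1.673


def predict_salary(years):
    """Predicted salary: the height of the least-squares line at `years`."""
    return INTERCEPT + SLOPE * years


def residual(actual_salary, years):
    """Residual = actual y minus predicted y."""
    return actual_salary - predict_salary(years)


mary_predicted = predict_salary(20)
mary_residual = residual(55, 20)
print(f"Predicted salary: {mary_predicted:.1f}")  # Predicted salary: 48.8
print(f"Residual: {mary_residual:.1f}")           # Residual: 6.2
```

A positive residual means the employee earns more than the line predicts; a negative one means less.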

14 How useful is the line for prediction? The least squares line is a useful summary of the main trend of the data, but it does not describe the data perfectly. So, how useful is the line for making predictions? This depends on two measures: 1. the standard error of estimate, an absolute measure of how large the prediction errors are; and 2. the coefficient of determination, a relative measure of how much of the variability has been explained.

15 The standard error of estimate w provides an approximation of how large the prediction errors (residuals) are for the data; w is measured in the same units as y. w When the standard error is small, we would expect the predicted values to be reasonably accurate. w When the standard error is large, we would expect the predicted values to be less reliable. w The standard error can be read directly from the Excel output. w The standard error for our example was 6.52 ($6,520).
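The slides read the standard error from the Excel output and do not state a formula. The sketch below assumes the usual definition for simple linear regression, the square root of the sum of squared residuals divided by n - 2, and applies it to the six data points of the salary example.

```python
import math

# Data from the salary example (experience in years, salary in $thousand).
experience = [15, 10, 20, 5, 15, 5]
salary = [30, 35, 55, 22, 40, 27]

n = len(experience)
x_mean = sum(experience) / n
y_mean = sum(salary) / n

# Least-squares slope and intercept (unrounded).
s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(experience, salary))
s_xx = sum((x - x_mean) ** 2 for x in experience)
slope = s_xy / s_xx
intercept = y_mean - slope * x_mean

# Residual = actual y minus predicted y, for every data point.
residuals = [y - (intercept + slope * x) for x, y in zip(experience, salary)]

# Assumed formula: sqrt(SSE / (n - 2)); two degrees of freedom are used up
# by estimating the intercept and the slope.
sse = sum(e ** 2 for e in residuals)
std_error = math.sqrt(sse / (n - 2))
print(f"Standard error of estimate: {std_error:.2f}")  # 6.52
```

The result matches the 6.52 quoted for the example, i.e. a typical prediction error of about $6,520.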

16 The coefficient of determination w tells us how much of the variability in y is explained by the variability in x; w can be found by squaring the correlation or can be read directly from the Excel output. w For our example, R² = 0.751. Therefore, experience explains 75.1% of the variation in salaries. The remaining 24.9% of the salary variation is due to other factors. w Generally, larger values of R² are considered better, as they indicate a stronger relationship between x and y and a better fit of the line to the data.
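Both routes to the coefficient of determination can be checked on the salary data. This sketch computes the correlation r and squares it, then compares that with the share of variation explained by the line, 1 - SSE/SST. The variable names are illustrative assumptions.

```python
import math

# Data from the salary example (experience in years, salary in $thousand).
experience = [15, 10, 20, 5, 15, 5]
salary = [30, 35, 55, 22, 40, 27]

n = len(experience)
x_mean = sum(experience) / n
y_mean = sum(salary) / n

s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(experience, salary))
s_xx = sum((x - x_mean) ** 2 for x in experience)
s_yy = sum((y - y_mean) ** 2 for y in salary)  # total variation in y (SST)

# Route 1: square the correlation coefficient.
r = s_xy / math.sqrt(s_xx * s_yy)
r_squared = r ** 2

# Route 2: one minus the unexplained share of the variation in y.
slope = s_xy / s_xx
intercept = y_mean - slope * x_mean
sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(experience, salary))
r_squared_from_sse = 1 - sse / s_yy

print(f"r = {r:.4f}")            # r = 0.8667
print(f"R^2 = {r_squared:.3f}")  # R^2 = 0.751
```

For a straight-line fit with one x variable, the two routes give the same value.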

17 Reading for next lecture Read Chapter 18, Sections 18.4 and 18.8 (Chapter 11, Sections 11.4 and 11.8 abridged). Exercises to be completed before next lecture: S&S 18.11, 18.13, 18.15 (11.11, 11.13, 11.15 abridged).

