Business Statistics - QBM117 Least squares regression.

Presentation transcript:

Business Statistics - QBM117 Least squares regression

Objectives

- To explain the least squares method of finding the line of best fit.
- To understand the relationship between predicted values and residuals.
- To introduce measures which can be used to assess how well the line fits the data and how good the predictions are.

Regression: prediction of one variable from another

- Linear regression analysis can be used to predict one variable from another, using an estimated straight line that summarises the relationship between the two variables.
- The variable being predicted is the y variable (the dependent variable), and the variable that helps with the prediction is the x variable (the independent variable).

- Just as we use the average to summarise a single variable, we can use a straight line to summarise a linear predictive relationship between two variables.
- Just as there is variability in the data about the average for univariate data, there is also variability about the straight line which summarises the bivariate data.
- Like the average, the straight line is a useful but imperfect summary of the data, because of this variability.

Straight line equations

- A straight line can be described exactly by two numbers: the slope and the y intercept.
- The slope is a measure of how steeply the line rises or falls, and the y intercept is simply the value of y (on the y axis) where x = 0.
- In situations where it is not sensible for x to be zero, the y intercept should not be interpreted directly.
- Therefore the general equation of a straight line is given by

      y = b0 + b1 x

  (writing b0 for the y intercept and b1 for the slope).

Finding a line which best summarises the data

How do we find the line which best summarises a set of bivariate data? How do we find the line which will best predict y from x?

- We find the line which has the smallest prediction error overall.
- The usual way of doing this is the least squares method.

The least squares method

- This method finds the line with the smallest sum of squared vertical prediction errors of all the lines that could possibly be drawn.
- This line then provides the best predictions of y from x.
- The least squares line can easily be found using Excel or any other statistical package.
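In symbols, the least squares line y-hat = b0 + b1 x is the one that minimises the sum of squared residuals, SSE = sum of (yi − y-hat_i)^2. As a minimal sketch of how the coefficients are computed (the lecture itself uses Excel, so this is purely illustrative; the data below are hypothetical stand-ins, not the six-employee data set from the example):

```python
# Least squares fit of y-hat = b0 + b1 * x, computed from first
# principles. The data are hypothetical (experience in years,
# salary in $thousands); the lecture's actual data set is not
# reproduced in the transcript.
x = [5, 8, 12, 16, 20, 25]
y = [28, 33, 40, 44, 55, 60]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# b1 = sum((xi - x_bar)(yi - y_bar)) / sum((xi - x_bar)^2)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sxy / sxx            # slope
b0 = y_bar - b1 * x_bar   # intercept

print(f"y-hat = {b0:.2f} + {b1:.3f} x")
```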

Example: Salary and Experience

- Salary vs. years of experience, for n = 6 employees.
- Linear (straight line) relationship.
- Increasing relationship: higher salary generally goes with higher experience.
- Correlation r ≈ 0.87 (consistent with R² = 0.751 reported below).

[Scatterplot: Salary ($thousands) against Experience (years)]

Mary earns $55,000 per year and has 20 years of experience.

The Sample Least-Squares Line

- Summarises the bivariate data: it predicts y from x with the smallest errors (in the vertical direction, along the y axis).
- The fitted line is approximately Salary = 15.3 + 1.673 × Experience, with Experience in years and Salary in $thousands (the intercept is inferred from the predicted salary of 48.8 at 20 years, shown below).
- The intercept is the predicted salary at 0 years of experience.
- The slope is the change in salary per year of experience: for each additional year of experience, the salary will increase by $1,673 on average.

Predicted values and residuals

- The predicted value of y for a given value of x is represented by the height of the line at that x. It can be found by substituting the value of x into the equation of the least-squares line.
- Each data point has a residual, which is a measure of how far the actual data point is above (or below, if negative) the fitted (least-squares) line.
- Residual = actual y − predicted y

- The predicted value comes from the least-squares line. For example, Mary (with 20 years of experience) has a predicted salary of 15.3 + 1.673 × 20 ≈ 48.8, and so does anyone else with 20 years of experience.
- The residual is the actual y minus the predicted y. Mary's residual is 55 − 48.8 = 6.2: she earns about $6,200 more than the predicted salary for a person with 20 years of experience. A person who earns less than predicted will have a negative residual.
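A quick sketch of this arithmetic (note that the intercept 15.34 is not in the transcript; it is inferred from the predicted value of 48.8 at 20 years, so treat it as an assumption):

```python
# Fitted line from the lecture example: Salary = b0 + b1 * Experience,
# with salary in $thousands. b1 = 1.673 is given on the slide; b0 is
# inferred (assumed) from the predicted salary of 48.8 at 20 years.
b0, b1 = 15.34, 1.673

predicted = b0 + b1 * 20   # height of the line at x = 20 years
residual = 55 - predicted  # actual y minus predicted y (Mary earns 55)

print(f"predicted = {predicted:.1f}")  # ~48.8
print(f"residual  = {residual:.1f}")   # ~6.2
```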

[Scatterplot: Salary vs. Experience, marking Mary's actual salary (55), her predicted value (48.8), and her residual (6.2).]

How useful is the line for prediction? The least squares line is a useful summary of the main trend of the data, but it does not describe the data perfectly. So how useful is the line for making predictions? This depends on two measures:

1. the standard error of estimate, an absolute measure of how large the prediction errors are; and
2. the coefficient of determination, a relative measure of how much of the variability has been explained.

The standard error of estimate

- provides an approximation of how large the prediction errors (residuals) are for the data;
- is measured in the same units as y.
- When the standard error is small, we would expect the predicted values to be reasonably accurate; when it is large, we would expect them to be less reliable.
- The standard error can be read directly from the Excel output. The standard error for our example was 6.52, i.e. $6,520.
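For reference, the standard error of estimate is s_e = sqrt(SSE / (n − 2)), where SSE is the sum of squared residuals and n − 2 the degrees of freedom. A minimal sketch, reusing the hypothetical data and fitted coefficients from the earlier snippet:

```python
import math

# Standard error of estimate: s_e = sqrt(SSE / (n - 2)), where SSE is
# the sum of squared residuals. x, y, b0, b1 are assumed to come from
# an earlier least squares fit (see the sketch above).
def standard_error_of_estimate(x, y, b0, b1):
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    sse = sum(e ** 2 for e in residuals)
    return math.sqrt(sse / (len(x) - 2))  # n - 2 degrees of freedom
```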

The coefficient of determination

- tells us how much of the variability in y is explained by the variability in x;
- can be found by squaring the correlation, or can be read directly from the Excel output.

For our example, R² = 0.751. Therefore experience explains 75.1% of the variation in salaries; the remaining 24.9% of the salary variation is due to other factors. Larger values of R² are generally considered better, as they indicate a stronger relationship between x and y and a better fit of the line to the data.
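A matching sketch under the same assumptions; for simple linear regression, R² = 1 − SSE/SST, which equals the squared correlation r²:

```python
# Coefficient of determination: R^2 = 1 - SSE / SST. For simple linear
# regression this equals the square of the correlation coefficient r.
def r_squared(x, y, b0, b1):
    y_bar = sum(y) / len(y)
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))  # unexplained
    sst = sum((yi - y_bar) ** 2 for yi in y)                       # total
    return 1 - sse / sst
```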

Reading for next lecture

Read Chapter 18, Sections 18.4 and 18.8 (Chapter 11, Sections 11.4 and 11.8 abridged).

Exercises to be completed before next lecture: S&S ( abridged)