1 Simple Linear Regression and Correlation
The Model
Estimating the Coefficients
Example 1: Used Car Sales
Assessing the Model
–t-tests
–R-square
2 The Model
The first-order linear model: $y = \beta_0 + \beta_1 x + \varepsilon$
$y$ = dependent variable
$x$ = independent variable
$\beta_0$ = y-intercept
$\beta_1$ = slope of the line (rise/run)
$\varepsilon$ = error variable
$\beta_0$ and $\beta_1$ are unknown and are therefore estimated from the data.
3 Estimating the Coefficients
The estimates are determined by
–drawing a sample from the population of interest,
–calculating sample statistics,
–producing a straight line that cuts into the data.
The question is: Which straight line fits best?
4 The best line is the one that minimizes the sum of squared vertical differences between the points and the line.
Data points: (1, 2), (2, 4), (3, 1.5), (4, 3.2)
Sum of squared differences = $(2 - 1)^2 + (4 - 2)^2 + (\quad)^2 + (\quad)^2$
The smaller the sum of squared differences, the better the fit of the line to the data.
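As a rough illustration of this criterion, the sketch below computes the sum of squared vertical differences for the four points on the slide. The function name `sum_sq_diff` is made up for this example. The line y = x is an assumption, inferred from the first two terms of the slide's sum, and the horizontal line at 2.5 is a hypothetical second candidate added only for comparison.

```python
# The four data points shown on the slide.
points = [(1, 2), (2, 4), (3, 1.5), (4, 3.2)]


def sum_sq_diff(points, intercept, slope):
    """Sum of squared vertical differences between the points and a line."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in points)


# Assumed candidate: y = x (inferred from the (2 - 1)^2 and (4 - 2)^2 terms).
ssd_diagonal = sum_sq_diff(points, intercept=0.0, slope=1.0)

# Hypothetical second candidate: a horizontal line at y = 2.5.
ssd_horizontal = sum_sq_diff(points, intercept=2.5, slope=0.0)

print(ssd_diagonal, ssd_horizontal)
```

Under these assumptions the horizontal line gives the smaller sum, so of these two candidates it would be judged the better fit.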
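5 To calculate the estimates of the coefficients that minimize the sum of squared differences between the data points and the line, use the formulas:
$b_1 = \dfrac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$, $b_0 = \bar{y} - b_1 \bar{x}$
The regression equation that estimates the equation of the first-order linear model is:
$\hat{y} = b_0 + b_1 x$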
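A minimal sketch of the calculation, assuming the standard least squares estimates (slope = sum of cross-deviations divided by sum of squared x-deviations, intercept = mean of y minus slope times mean of x). The helper name `least_squares` is hypothetical, and the four points from the earlier slide are used purely as toy data, not the used car sample.

```python
def least_squares(xs, ys):
    """Return (b0, b1) for the least squares line y_hat = b0 + b1 * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sum of cross-deviations and sum of squared deviations of x.
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Toy data: the four points from the "best line" slide.
xs = [1, 2, 3, 4]
ys = [2, 4, 1.5, 3.2]

b0, b1 = least_squares(xs, ys)
print(f"y_hat = {b0:.3f} + {b1:.3f} x")
```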
6 Example 1
Relationship between odometer reading and a used car’s selling price.
–A car dealer wants to find the relationship between the odometer reading and the selling price of used cars.
–A random sample of 100 cars is selected, and the data recorded.
–Find the regression line.
Independent variable x: odometer reading
Dependent variable y: selling price
7 Solution
–Solving by hand
To calculate $b_0$ and $b_1$ we first need to calculate several sample statistics, where n = 100.
8 Assessing the Model
The least squares method will produce a regression line whether or not there is a linear relationship between x and y.
–Are the coefficients different from zero? (t-tests)
–How closely does the line fit the data? (R-square)
9 Sum of squares for errors (SSE)
–This is the sum of squared differences between the points and the regression line: $SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$.
–It can serve as a measure of how well the line fits the data.
–This statistic plays a role in every statistical technique we employ to assess the model.
10 Testing the slope
–When no linear relationship exists between two variables, the regression line should be horizontal.
Linear relationship: different inputs (x) yield different outputs (y). The slope is not equal to zero.
No linear relationship: different inputs (x) yield the same output (y). The slope is equal to zero.
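11 We can draw inferences about $\beta_1$ from $b_1$ by testing
$H_0: \beta_1 = 0$
$H_1: \beta_1 \neq 0$ (or $< 0$, or $> 0$)
–The test statistic is $t = \dfrac{b_1 - \beta_1}{s_{b_1}}$, where $s_{b_1}$ is the standard error of $b_1$: $s_{b_1} = \dfrac{s_\varepsilon}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}}$ with $s_\varepsilon = \sqrt{SSE/(n-2)}$.
–If the error variable is normally distributed, the statistic has a Student t distribution with d.f. = n - 2.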
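The test statistic can be sketched as follows, assuming the usual standard error of the slope (the standard error of estimate, sqrt(SSE / (n - 2)), divided by the square root of the sum of squared x-deviations) and a null value of zero for the slope. The function name `slope_t_statistic` is hypothetical, and the four points from the earlier slide stand in as toy data, since the used car sample is not reproduced here.

```python
import math


def slope_t_statistic(xs, ys):
    """Return (t, degrees of freedom) for testing H0: slope = 0."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    # Least squares estimates of slope and intercept.
    b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / s_xx
    b0 = y_bar - b1 * x_bar
    # Sum of squares for errors and standard error of estimate.
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    s_eps = math.sqrt(sse / (n - 2))
    # Standard error of the slope, then the t statistic under H0: slope = 0.
    s_b1 = s_eps / math.sqrt(s_xx)
    t = (b1 - 0.0) / s_b1
    return t, n - 2


# Toy data only: the four points from the "best line" slide.
t_stat, df = slope_t_statistic([1, 2, 3, 4], [2, 4, 1.5, 3.2])
print(f"t = {t_stat:.3f} with {df} degrees of freedom")
```

The statistic would then be compared with the critical value of the Student t distribution with n - 2 degrees of freedom; that lookup is left out of this sketch.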
12 Solution
–Solving by hand
To compute t we need the values of $b_1$ and $s_{b_1}$.
–Using the computer
There is overwhelming evidence to infer that the odometer reading affects the auction selling price.
13 Coefficient of determination –When we want to measure the strength of the linear relationship, we use the coefficient of determination.
14 –To understand the significance of this coefficient, note:
The overall variability in y is explained in part by the regression model and remains, in part, unexplained (the error).
15 Two data points $(x_1, y_1)$ and $(x_2, y_2)$ of a certain sample are shown.
Total variation in y = Variation explained by the regression line + Unexplained variation (error)
16 $R^2$ measures the proportion of the variation in y that is explained by the variation in x.
Variation in y = SSR + SSE, so $R^2 = \dfrac{SSR}{SSR + SSE}$
$R^2$ takes on any value between zero and one.
$R^2 = 1$: Perfect match between the line and the data points.
$R^2 = 0$: There is no linear relationship between x and y.
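The decomposition can be checked numerically with a short sketch. It assumes SSR is the sum of squared differences between the fitted values and the mean of y, and SSE the sum of squared residuals. The function name `r_squared` is hypothetical, and the data are again the four toy points from the earlier slide rather than the used car sample.

```python
def r_squared(xs, ys):
    """Return (R^2, SSR, SSE) for the least squares line through the data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / s_xx
    b0 = y_bar - b1 * x_bar
    fitted = [b0 + b1 * x for x in xs]
    # Explained variation: fitted values around the mean of y.
    ssr = sum((f - y_bar) ** 2 for f in fitted)
    # Unexplained variation: observations around the fitted values.
    sse = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    return ssr / (ssr + sse), ssr, sse


# Toy data only: the four points from the "best line" slide.
ys = [2, 4, 1.5, 3.2]
r2, ssr, sse = r_squared([1, 2, 3, 4], ys)

# Total variation in y, computed directly, should equal SSR + SSE.
y_bar = sum(ys) / len(ys)
total = sum((y - y_bar) ** 2 for y in ys)
print(f"R^2 = {r2:.4f}, SSR + SSE = {ssr + sse:.4f}, total = {total:.4f}")
```

For these four points $R^2$ comes out close to zero, meaning the fitted line explains very little of the variation in y.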
17 Example 2
–Find the coefficient of determination for Example 17.1; what does this statistic tell you about the model?
Solution
–Solving by hand
–Using the computer
From the regression output, 65% of the variation in the auction selling price is explained by the variation in odometer reading. The rest (35%) remains unexplained by this model.