Lecture 19 Simple linear regression (Review, 18.5, 18.8)
1 Lecture 19 Simple linear regression (Review, 18.5, 18.8) Homework 5 is posted and due next Tuesday by 3 p.m. Extra office hour on Thursday after class.
2 Review of Regression Analysis Goal: Estimate E(Y|X), the regression function. Uses: E(Y|X) is a good prediction of Y based on X, and E(Y|X) describes the relationship between Y and X. Simple linear regression model: E(Y|X) is a straight line (the regression line).
3 The Simple Linear Regression Line Example 18.2 (Xm18-02) A car dealer wants to find the relationship between the odometer reading and the selling price of used cars. A random sample of 100 cars is selected, and the data recorded. Find the regression line. Independent variable: x (odometer reading). Dependent variable: y (selling price).
4 Simple Linear Regression Model The data are assumed to be a realization of y_i = β0 + β1·x_i + ε_i, i = 1, …, n, where the errors ε_i are independent with mean 0 and standard deviation σ_ε. β0, β1, and σ_ε are the unknown parameters of the model; the objective of regression is to estimate them. β1, the slope, is the amount that Y changes on average for each one-unit increase in X. σ_ε, the standard error of estimate, is the standard deviation of the amount by which Y differs from E(Y|X), i.e., the standard deviation of the errors.
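As an illustration of the model as a data-generating process, a small simulation can help; the parameter values below are made up for this sketch and are not from Example 18.2:

```python
import random

random.seed(1)  # reproducible illustration

# Hypothetical parameter values: E(Y|X) = beta0 + beta1 * x
beta0, beta1, sigma_eps = 5.0, 2.0, 1.5

xs = list(range(1, 21))
# Each observation is y_i = beta0 + beta1 * x_i + eps_i,
# with errors eps_i drawn from N(0, sigma_eps).
ys = [beta0 + beta1 * xi + random.gauss(0.0, sigma_eps) for xi in xs]
```

Plotting ys against xs would show points scattered around the straight line beta0 + beta1 * x rather than lying exactly on it.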
5 Estimation of Regression Line We estimate the regression line by the least squares line ŷ = b0 + b1·x, the line that minimizes the sum of squared prediction errors Σ(y_i − ŷ_i)² for the data. The least squares estimates are b1 = Σ(x_i − x̄)(y_i − ȳ) / Σ(x_i − x̄)² and b0 = ȳ − b1·x̄.
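A minimal sketch of the least squares computation, using a small made-up dataset (not the car data from Example 18.2):

```python
x = [1, 2, 3, 4, 5]            # hypothetical data for illustration
y = [2.0, 4.1, 5.9, 8.2, 9.8]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n

# b1 = S_xy / S_xx and b0 = y_bar - b1 * x_bar
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
b1 = s_xy / s_xx               # estimated slope
b0 = y_bar - b1 * x_bar        # estimated intercept
```

For this toy data the slope comes out near 2 and the intercept near 0, matching how the y values were chosen.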
6 Fitted Values and Residuals The least squares line decomposes the data into two parts: y_i = ŷ_i + e_i, where ŷ_i = b0 + b1·x_i are called the fitted or predicted values, and e_i = y_i − ŷ_i are called the residuals. The residuals are estimates of the errors ε_i.
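Continuing with the same made-up dataset, the decomposition y_i = ŷ_i + e_i can be checked directly:

```python
x = [1, 2, 3, 4, 5]            # hypothetical data for illustration
y = [2.0, 4.1, 5.9, 8.2, 9.8]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / \
     sum((xi - x_bar) ** 2 for xi in x)
b0 = y_bar - b1 * x_bar

y_hat = [b0 + b1 * xi for xi in x]             # fitted values
resid = [yi - yh for yi, yh in zip(y, y_hat)]  # residuals

# Each observation splits exactly into fitted value + residual,
# and least squares residuals always sum to zero.
```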
7 Estimating σ_ε The standard error of estimate (root mean squared error), s_ε = √(Σe_i² / (n − 2)), is an estimate of σ_ε. The standard error of estimate is basically the standard deviation of the residuals. It measures how useful the simple linear regression model is for prediction. If the simple regression model holds, then approximately 68% of the data will lie within one s_ε of the LS line, and 95% of the data will lie within two s_ε of the LS line.
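The standard error of estimate for the same toy dataset, computed from the residuals:

```python
import math

x = [1, 2, 3, 4, 5]            # hypothetical data for illustration
y = [2.0, 4.1, 5.9, 8.2, 9.8]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / \
     sum((xi - x_bar) ** 2 for xi in x)
b0 = y_bar - b1 * x_bar

resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
sse = sum(e ** 2 for e in resid)       # sum of squared errors
s_eps = math.sqrt(sse / (n - 2))       # standard error of estimate
```

The divisor is n − 2 rather than n because two parameters (b0 and b1) were estimated from the data.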
8 18.4 Error Variable: Required Conditions The error ε is a critical part of the regression model. Four requirements involving the distribution of ε must be satisfied. The probability distribution of ε is normal. The mean of ε is zero for each x: E(ε|x) = 0. The standard deviation of ε is σ_ε for all values of x. The errors associated with different observations are all independent.
9 The Normality of ε From the first three assumptions we have: given x, y is normally distributed with mean E(y) = β0 + β1x and a constant standard deviation σ_ε. [Figure: normal curves centered at β0 + β1x1, β0 + β1x2, β0 + β1x3; the standard deviation remains constant, but the mean value changes with x.]
10 Coefficient of determination To measure the strength of the linear relationship we use the coefficient of determination R2.
11 Coefficient of determination To understand the significance of this coefficient, note that in the regression model the overall variability in y is explained in part by the regression (the variation in x) and remains, in part, unexplained (the error).
12 Coefficient of determination [Figure: two data points (x1,y1) and (x2,y2) of a certain sample are shown relative to the regression line.] Variation in y = SSR + SSE: total variation in y = variation explained by the regression line + unexplained variation (error).
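The decomposition on this slide can be verified numerically on the same made-up dataset: the total sum of squares splits exactly into the explained and unexplained parts.

```python
x = [1, 2, 3, 4, 5]            # hypothetical data for illustration
y = [2.0, 4.1, 5.9, 8.2, 9.8]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / \
     sum((xi - x_bar) ** 2 for xi in x)
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * xi for xi in x]

sst = sum((yi - y_bar) ** 2 for yi in y)               # total variation
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)           # explained (regression)
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # unexplained (error)
# For the least squares line, SST = SSR + SSE holds exactly.
```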
13 Coefficient of determination R2 measures the proportion of the variation in y that is explained by the variation in x. R2 takes on any value between zero and one. R2 = 1: perfect match between the line and the data points. R2 = 0: there is no linear relationship between x and y.
14 Coefficient of determination, Example Find the coefficient of determination for Example 18.2; what does this statistic tell you about the model? Solution: solving by hand, compute R2 = SSR/SST = 1 − SSE/SST from the sums of squares.
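The "by hand" computation, sketched on the same toy dataset rather than the car data:

```python
x = [1, 2, 3, 4, 5]            # hypothetical data for illustration
y = [2.0, 4.1, 5.9, 8.2, 9.8]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / \
     sum((xi - x_bar) ** 2 for xi in x)
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * xi for xi in x]

sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
sst = sum((yi - y_bar) ** 2 for yi in y)
r2 = 1 - sse / sst             # coefficient of determination
```

For this nearly linear toy data, r2 is close to 1, meaning almost all of the variation in y is explained by x.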
16 SEs of Parameter Estimates [JMP output shown.] Imagine yourself taking repeated samples of the prices of cars with the odometer readings from the “population.” For each sample, you could estimate the regression line by least squares. Each time, the least squares line would be a little different. The standard errors estimate how much the least squares estimates of the slope and intercept would vary over these repeated samples.
17 Confidence Intervals If the simple linear regression model holds, the estimated slope follows a t distribution. A 95% confidence interval for the slope β1 is given by b1 ± t.025,n−2 · s_b1. A 95% confidence interval for the intercept β0 is given by b0 ± t.025,n−2 · s_b0.
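A sketch of the slope confidence interval on the toy dataset. With n = 5 there are 3 degrees of freedom, so the critical value t.025,3 = 3.182 is taken from a t-table:

```python
import math

x = [1, 2, 3, 4, 5]            # hypothetical data; n - 2 = 3 d.f.
y = [2.0, 4.1, 5.9, 8.2, 9.8]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
b0 = y_bar - b1 * x_bar

sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
s_eps = math.sqrt(sse / (n - 2))   # standard error of estimate
s_b1 = s_eps / math.sqrt(s_xx)     # standard error of the slope

t_crit = 3.182                     # t.025 with 3 d.f., from a t-table
lo, hi = b1 - t_crit * s_b1, b1 + t_crit * s_b1   # 95% CI for beta1
```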
18 Testing the slope When no linear relationship exists between two variables, the regression line should be horizontal: different inputs (x) yield the same output (y), and the slope is equal to zero. When a linear relationship exists, different inputs (x) yield different outputs (y), and the slope is not equal to zero.
19 Testing the Slope We can draw inference about β1 from b1 by testing H0: β1 = 0 versus H1: β1 ≠ 0 (or < 0, or > 0). The test statistic is t = b1 / s_b1, where s_b1 = s_ε / √Σ(x_i − x̄)² is the standard error of b1. If the error variable is normally distributed, the statistic has a Student t distribution with d.f. = n − 2.
20 Testing the Slope, Example Test to determine whether there is enough evidence to infer that there is a linear relationship between the car auction price and the odometer reading for all three-year-old Tauruses in Example 18.2. Use α = 5%.
21 Testing the Slope, Example Solving by hand: to compute t we need the values of b1 and s_b1. The rejection region is t > t.025 or t < −t.025 with d.f. = n − 2 = 98. Approximately, t.025 = 1.984.
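The same hand computation, sketched on the toy dataset (where the critical value for 3 d.f. is 3.182, not 1.984):

```python
import math

x = [1, 2, 3, 4, 5]            # hypothetical data; d.f. = n - 2 = 3
y = [2.0, 4.1, 5.9, 8.2, 9.8]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
b0 = y_bar - b1 * x_bar

sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
s_eps = math.sqrt(sse / (n - 2))
s_b1 = s_eps / math.sqrt(s_xx)

t_stat = b1 / s_b1             # test statistic for H0: beta1 = 0
t_crit = 3.182                 # two-sided 5% critical value, 3 d.f.
reject = abs(t_stat) > t_crit  # True: reject H0, conclude a linear relationship
```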
22 Testing the Slope, Example Xm18-02 Using the computer: there is overwhelming evidence to infer that the odometer reading affects the auction selling price.
23 Cause-and-effect Relationship A test of whether the slope is zero is a test of whether there is a linear relationship between x and y in the observed data, i.e., whether a change in x is associated with a change in y. It does not test whether a change in x causes a change in y. Such a relationship can only be established by a carefully controlled experiment or extensive subject-matter knowledge about the relationship.
24 Example of Pitfall A researcher measures the number of television sets per person X and the average life expectancy Y for the world’s nations. The regression line has a positive slope: nations with many TV sets have higher life expectancies. Could we lengthen the lives of people in Rwanda by shipping them TV sets?
25 18.7 Using the Regression Equation Before using the regression model, we need to assess how well it fits the data. If we are satisfied with how well the model fits the data, we can use it to predict the values of y. To make a prediction we use point prediction and interval prediction.
26 Point Prediction Example 18.7 Predict the selling price of a three-year-old Taurus with 40,000 miles on the odometer (Example 18.2). A point prediction: ŷ = b0 + b1(40,000). It is predicted that a 40,000-mile car would sell for $14,575. How close is this prediction to the real price?
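A point prediction just plugs the given x value into the fitted line. Sketched on the toy dataset (not the car data):

```python
x = [1, 2, 3, 4, 5]            # hypothetical data for illustration
y = [2.0, 4.1, 5.9, 8.2, 9.8]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / \
     sum((xi - x_bar) ** 2 for xi in x)
b0 = y_bar - b1 * x_bar

x_g = 4.0                      # the given x value
y_pred = b0 + b1 * x_g         # point prediction y-hat at x_g
```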
27 Interval Estimates Two intervals can be used to discover how closely the predicted value will match the true value of y. Prediction interval: predicts y for a given value of x. Confidence interval: estimates the average y for a given x. The prediction interval is ŷ ± t.025,n−2 · s_ε · √(1 + 1/n + (x_g − x̄)²/Σ(x_i − x̄)²). The confidence interval is ŷ ± t.025,n−2 · s_ε · √(1/n + (x_g − x̄)²/Σ(x_i − x̄)²).
28 Interval Estimates, Example Example continued: provide an interval estimate for the bidding price on a Ford Taurus with 40,000 miles on the odometer. Two types of predictions are required: a prediction for a specific car, and an estimate for the average price per car.
29 Interval Estimates, Example Solution A prediction interval provides the price estimate for a single car: ŷ ± t.025,98 · s_ε · √(1 + 1/n + (x_g − x̄)²/Σ(x_i − x̄)²), where t.025,98 ≈ 1.984.
30 Interval Estimates, Example Solution, continued A confidence interval provides the estimate of the mean price per car for a Ford Taurus with a 40,000-mile reading on the odometer. The 95% confidence interval is ŷ ± t.025,98 · s_ε · √(1/n + (x_g − x̄)²/Σ(x_i − x̄)²).
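Both intervals, sketched on the toy dataset (3 d.f., so t.025 = 3.182 rather than 1.984). The prediction interval carries an extra "1 +" under the square root for the variability of a single observation, so it is always the wider of the two:

```python
import math

x = [1, 2, 3, 4, 5]            # hypothetical data for illustration
y = [2.0, 4.1, 5.9, 8.2, 9.8]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
b0 = y_bar - b1 * x_bar
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
s_eps = math.sqrt(sse / (n - 2))

x_g = 4.0                      # the given x value
t_crit = 3.182                 # t.025 with n - 2 = 3 d.f.
y_hat_g = b0 + b1 * x_g

# Half-widths: prediction interval (single y) vs confidence interval (mean y).
half_pi = t_crit * s_eps * math.sqrt(1 + 1 / n + (x_g - x_bar) ** 2 / s_xx)
half_ci = t_crit * s_eps * math.sqrt(1 / n + (x_g - x_bar) ** 2 / s_xx)
```

Both half-widths grow with (x_g − x̄)², so each interval is narrowest when x_g equals x̄.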
31 The effect of the given xg on the length of the interval As xg moves away from x̄, the interval becomes longer. That is, the shortest interval is found at xg = x̄.