# Lecture 19 Simple linear regression (Review, 18.5, 18.8)

Lecture 19 Simple linear regression (Review, 18.5, 18.8)
Homework 5 is posted and due next Tuesday by 3 p.m. Extra office hour on Thursday after class.

Review of Regression Analysis
Goal: Estimate E(Y|X), the regression function.
Uses: E(Y|X) is a good prediction of Y based on X; E(Y|X) describes the relationship between Y and X.
Simple linear regression model: E(Y|X) is a straight line (the regression line).

The Simple Linear Regression Line
Example 18.2 (Xm18-02) A car dealer wants to find the relationship between the odometer reading and the selling price of used cars. A random sample of 100 cars is selected, and the data are recorded. Find the regression line. The independent variable x is the odometer reading; the dependent variable y is the selling price.

Simple Linear Regression Model
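The data $(x_1, y_1), \ldots, (x_n, y_n)$ are assumed to be a realization of the model $y_i = \beta_0 + \beta_1 x_i + \epsilon_i$, where the errors $\epsilon_i$ have mean zero and standard deviation $\sigma_\epsilon$. $\beta_0$, $\beta_1$ and $\sigma_\epsilon$ are the unknown parameters of the model. The objective of regression is to estimate them. $\beta_1$, the slope, is the amount that Y changes on average for each one unit increase in X. $\sigma_\epsilon$ is the standard deviation of the amount by which Y differs from E(Y|X), i.e., the standard deviation of the errors.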

Estimation of Regression Line
We estimate the regression line by the least squares line $\hat{y} = b_0 + b_1 x$, the line that minimizes the sum of squared prediction errors $\sum_{i=1}^{n} (y_i - b_0 - b_1 x_i)^2$ for the data.
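
The least squares estimates have a closed form, which can be sketched in a few lines of Python. This is a minimal illustration on a small hypothetical data set (not the Xm18-02 data); the function name `least_squares` is an assumption made for this example.

```python
def least_squares(x, y):
    """Return (b0, b1) that minimize the sum of squared prediction errors."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx           # slope
    b0 = y_bar - b1 * x_bar    # intercept: the line passes through (x_bar, y_bar)
    return b0, b1


# Hypothetical toy data, chosen only so the arithmetic is easy to follow.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1 = least_squares(x, y)
print(f"y_hat = {b0:.2f} + {b1:.2f} x")
```

For these toy numbers the fitted line is approximately y_hat = 2.2 + 0.6 x.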

Fitted Values and Residuals
The least squares line decomposes the data into two parts, $y_i = \hat{y}_i + e_i$, where $\hat{y}_i = b_0 + b_1 x_i$ are called the fitted or predicted values and $e_i = y_i - \hat{y}_i$ are called the residuals. The residuals are estimates of the errors $\epsilon_i$.
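
The decomposition into fitted values and residuals can be checked numerically. The sketch below reuses a hypothetical toy data set (not the lecture's data) and prints each observation as fitted value plus residual; for a least squares line with an intercept, the residuals sum to zero up to rounding.

```python
# Hypothetical toy data (assumption for illustration only).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
b0 = y_bar - b1 * x_bar

fitted = [b0 + b1 * xi for xi in x]                  # y_hat_i
residuals = [yi - fi for yi, fi in zip(y, fitted)]   # e_i = y_i - y_hat_i

for yi, fi, ei in zip(y, fitted, residuals):
    print(f"y = {yi:.1f} = fitted {fi:.1f} + residual {ei:+.1f}")
print(f"sum of residuals = {sum(residuals):.2e}")
```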

Estimating $\sigma_\epsilon$
The standard error of estimate (root mean squared error) $s_\epsilon = \sqrt{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2 / (n-2)}$ is an estimate of $\sigma_\epsilon$. The standard error of estimate is basically the standard deviation of the residuals. $s_\epsilon$ measures how useful the simple linear regression model is for prediction. If the simple regression model holds, then approximately 68% of the data will lie within one $s_\epsilon$ of the LS line, and approximately 95% of the data will lie within two $s_\epsilon$ of the LS line.
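
As a rough sketch, the standard error of estimate can be computed from the residuals with the usual divisor n - 2. The data are hypothetical toy values and the function name `standard_error_of_estimate` is an assumption for this example.

```python
import math


def standard_error_of_estimate(x, y):
    """Root mean squared error of the least squares line, with divisor n - 2."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    return math.sqrt(sse / (n - 2))


# Hypothetical toy data (assumption for illustration only).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
s_eps = standard_error_of_estimate(x, y)
print(f"s_eps = {s_eps:.4f}")
```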

18.4 Error Variable: Required Conditions
The error $\epsilon$ is a critical part of the regression model. Four requirements involving the distribution of $\epsilon$ must be satisfied. (1) The probability distribution of $\epsilon$ is normal. (2) The mean of $\epsilon$ is zero for each x: $E(\epsilon|x) = 0$ for each x. (3) The standard deviation of $\epsilon$ is $\sigma_\epsilon$ for all values of x. (4) The errors associated with different values of y are all independent.

The Normality of $\epsilon$
From the first three assumptions we have: y is normally distributed with mean $E(y|x) = \beta_0 + \beta_1 x$ and a constant standard deviation $\sigma_\epsilon$ given x.
[Figure: normal curves for y at $x_1$, $x_2$, $x_3$, centered at $\mu_i = E(y|x_i) = \beta_0 + \beta_1 x_i$. The standard deviation remains constant, but the mean value changes with x.]

Coefficient of determination
To measure the strength of the linear relationship we use the coefficient of determination $R^2$.

Coefficient of determination
To understand the significance of this coefficient, note: the overall variability in y is explained in part by the regression model and remains, in part, unexplained (the error).

Coefficient of determination
Two data points $(x_1, y_1)$ and $(x_2, y_2)$ of a certain sample are shown. Total variation in y = Variation explained by the regression line + Unexplained variation (error), that is, Variation in y = SSR + SSE.
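
Written out in symbols, and assuming the usual definitions of the sums of squares for a least squares line with an intercept (with $\hat{y}_i$ the fitted value and $\bar{y}$ the sample mean of y), the decomposition reads:

```latex
\begin{align*}
\underbrace{\sum_{i=1}^{n} (y_i - \bar{y})^2}_{\text{total variation in } y}
  &= \underbrace{\sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2}_{\text{SSR (explained)}}
   + \underbrace{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}_{\text{SSE (unexplained)}}
\end{align*}
```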

Coefficient of determination
$R^2$ measures the proportion of the variation in y that is explained by the variation in x. $R^2$ takes on any value between zero and one. $R^2 = 1$: perfect match between the line and the data points. $R^2 = 0$: there is no linear relationship between x and y.
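
A minimal sketch of the computation, assuming the usual definition R² = SSR / (SSR + SSE), where SSR is the variation explained by the line and SSE the unexplained variation. The data are hypothetical toy values, and `r_squared` is a name chosen for this example.

```python
def r_squared(x, y):
    """Return (R^2, SSR, SSE, SST) for the least squares line."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
    b0 = y_bar - b1 * x_bar
    fitted = [b0 + b1 * xi for xi in x]
    ssr = sum((fi - y_bar) ** 2 for fi in fitted)              # explained
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))     # unexplained
    sst = sum((yi - y_bar) ** 2 for yi in y)                   # total
    return ssr / sst, ssr, sse, sst


# Hypothetical toy data (assumption for illustration only).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r2, ssr, sse, sst = r_squared(x, y)
print(f"SSR = {ssr:.2f}, SSE = {sse:.2f}, SST = {sst:.2f}, R^2 = {r2:.2f}")
```

For these toy numbers about 60% of the variation in y is explained by the line.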

Coefficient of determination, Example
Find the coefficient of determination for Example 18.2; what does this statistic tell you about the model? Solution Solving by hand;

Example 18.2 in JMP

SEs of Parameter Estimates
From the JMP output, Imagine yourself taking repeated samples of the prices of cars with the odometer readings from the “population.” For each sample, you could estimate the regression line by least squares. Each time, the least squares line would be a little different. The standard errors estimate how much the least squares estimates of the slope and intercept would vary over these repeated samples.
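
The repeated-sampling thought experiment can be imitated with a small simulation. The sketch below draws many samples from a simple linear regression model with made-up parameter values (`beta0`, `beta1`, `sigma` and the x values are all assumptions, not estimates from Xm18-02), refits the least squares slope each time, and compares the spread of the slopes with the textbook value $\sigma_\epsilon / \sqrt{\sum (x_i - \bar{x})^2}$.

```python
import random
import statistics


def ls_slope(x, y):
    """Least squares slope for one sample."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    return s_xy / s_xx


random.seed(19)                          # fixed seed so the run is repeatable
beta0, beta1, sigma = 17.0, -0.06, 0.3   # hypothetical "population" values
x = [float(v) for v in range(20, 50)]    # the same x values in every sample

slopes = []
for _ in range(2000):                    # 2000 repeated samples
    y = [beta0 + beta1 * xi + random.gauss(0.0, sigma) for xi in x]
    slopes.append(ls_slope(x, y))

empirical_sd = statistics.stdev(slopes)
x_bar = statistics.mean(x)
theoretical_sd = sigma / sum((xi - x_bar) ** 2 for xi in x) ** 0.5
print(f"mean of slopes   = {statistics.mean(slopes):.4f}")
print(f"sd of slopes     = {empirical_sd:.5f}")
print(f"theoretical s.d. = {theoretical_sd:.5f}")
```

In practice only one sample is available, so the standard error reported by software replaces the unknown $\sigma_\epsilon$ with its estimate $s_\epsilon$.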

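Confidence Intervals
If the simple linear regression model holds, the standardized estimated slope $(b_1 - \beta_1)/s_{b_1}$ follows a t-distribution with $n-2$ degrees of freedom. A 95% confidence interval for the slope is given by $b_1 \pm t_{.025,n-2}\, s_{b_1}$. A 95% confidence interval for the intercept is given by $b_0 \pm t_{.025,n-2}\, s_{b_0}$.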
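
A sketch of both intervals, assuming the standard formulas $s_{b_1} = s_\epsilon / \sqrt{\sum (x_i - \bar{x})^2}$ and $s_{b_0} = s_\epsilon \sqrt{1/n + \bar{x}^2 / \sum (x_i - \bar{x})^2}$. The data are hypothetical toy values, the function name is an assumption, and the critical value is passed in from a t table (3.182 for 3 degrees of freedom) rather than computed.

```python
import math


def coefficient_intervals(x, y, t_crit):
    """95% CIs for slope and intercept; t_crit is t_{.025, n-2} from a table."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s_eps = math.sqrt(sse / (n - 2))
    s_b1 = s_eps / math.sqrt(s_xx)
    s_b0 = s_eps * math.sqrt(1 / n + x_bar ** 2 / s_xx)
    ci_slope = (b1 - t_crit * s_b1, b1 + t_crit * s_b1)
    ci_intercept = (b0 - t_crit * s_b0, b0 + t_crit * s_b0)
    return ci_slope, ci_intercept, s_b1, s_b0


# Hypothetical toy data; n - 2 = 3 degrees of freedom, so t_{.025,3} = 3.182.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
ci_slope, ci_intercept, s_b1, s_b0 = coefficient_intervals(x, y, t_crit=3.182)
print(f"slope:     ({ci_slope[0]:.3f}, {ci_slope[1]:.3f})")
print(f"intercept: ({ci_intercept[0]:.3f}, {ci_intercept[1]:.3f})")
```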

Testing the slope
When no linear relationship exists between two variables, the regression line should be horizontal.
[Figure: two scatterplots. Linear relationship: different inputs (x) yield different outputs (y); the slope is not equal to zero. No linear relationship: different inputs (x) yield the same output (y); the slope is equal to zero.]

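Testing the Slope
We can draw inference about $\beta_1$ from $b_1$ by testing $H_0: \beta_1 = 0$ against $H_1: \beta_1 \neq 0$ (or $< 0$, or $> 0$). The test statistic is $t = \dfrac{b_1 - \beta_1}{s_{b_1}}$, where $s_{b_1} = \dfrac{s_\epsilon}{\sqrt{(n-1)s_x^2}}$ is the standard error of $b_1$. If the error variable is normally distributed, the statistic has a Student t distribution with d.f. = $n-2$.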
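
The two-sided test can be sketched as follows. Under the null hypothesis the slope is zero, so the statistic reduces to the estimated slope divided by its standard error. The data are hypothetical toy values, `slope_t_test` is a name chosen for this example, and the critical value comes from a t table rather than being computed.

```python
import math


def slope_t_test(x, y, t_crit):
    """Two-sided test of H0: slope = 0. Returns (t statistic, reject H0?)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)   # equals (n - 1) * s_x^2
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s_eps = math.sqrt(sse / (n - 2))
    s_b1 = s_eps / math.sqrt(s_xx)
    t_stat = (b1 - 0.0) / s_b1                  # slope is 0 under H0
    return t_stat, abs(t_stat) > t_crit


# Hypothetical toy data; t_{.025,3} = 3.182 from a t table.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
t_stat, reject = slope_t_test(x, y, t_crit=3.182)
print(f"t = {t_stat:.3f}, reject H0 at the 5% level: {reject}")
```

With only five toy observations the statistic does not reach the rejection region, which shows that a visibly positive slope is not enough on its own when the sample is small.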

Testing the Slope, Example
Test to determine whether there is enough evidence to infer that there is a linear relationship between the car auction price and the odometer reading for all three-year-old Tauruses in the example. Use $\alpha = 5\%$.

Testing the Slope, Example
Solving by hand: to compute t we need the values of $b_1$ and $s_{b_1}$. The rejection region is $t > t_{.025}$ or $t < -t_{.025}$ with $\nu = n - 2 = 98$ degrees of freedom. Approximately, $t_{.025} = 1.984$.

Testing the Slope, Example
Using the computer (Xm18-02): there is overwhelming evidence to infer that the odometer reading affects the auction selling price.

Cause-and-effect Relationship
A test of whether the slope is zero is a test of whether there is a linear relationship between x and y in the observed data, i.e., is a change in x associated with a change in y. This does not test whether a change in x causes a change in y. Such a relationship can only be established based on a carefully controlled experiment or extensive subject matter knowledge about the relationship.

Example of Pitfall
A researcher measures the number of television sets per person X and the average life expectancy Y for the world’s nations. The regression line has a positive slope: nations with many TV sets have higher life expectancies. Could we lengthen the lives of people in Rwanda by shipping them TV sets?

18.7 Using the Regression Equation
Before using the regression model, we need to assess how well it fits the data. If we are satisfied with how well the model fits the data, we can use it to predict the values of y. To make a prediction we use a point prediction and an interval prediction.

Point Prediction Example 18.7
Predict the selling price of a three-year-old Taurus with 40,000 miles on the odometer (Example 18.2). A point prediction: it is predicted that a car with 40,000 miles would sell for \$14,575. How close is this prediction to the real price?

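Interval Estimates
Two intervals can be used to discover how closely the predicted value will match the true value of y. Prediction interval: predicts y for a given value of x. Confidence interval: estimates the average y for a given x. Here $x_g$ denotes the given value of x.
The prediction interval: $\hat{y} \pm t_{\alpha/2,\,n-2}\, s_\epsilon \sqrt{1 + \dfrac{1}{n} + \dfrac{(x_g - \bar{x})^2}{(n-1)s_x^2}}$
The confidence interval: $\hat{y} \pm t_{\alpha/2,\,n-2}\, s_\epsilon \sqrt{\dfrac{1}{n} + \dfrac{(x_g - \bar{x})^2}{(n-1)s_x^2}}$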
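
The two intervals differ only by the extra "1 +" under the square root, which accounts for the error of a single new observation. The sketch below computes both half-widths at a given x value, here called `x_g`. The data are hypothetical toy values (not the Taurus data), the function name is an assumption, and the critical value is taken from a t table.

```python
import math


def interval_estimates(x, y, x_g, t_crit):
    """Return (y_hat, prediction half-width, confidence half-width) at x_g."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)   # equals (n - 1) * s_x^2
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s_eps = math.sqrt(sse / (n - 2))
    y_hat = b0 + b1 * x_g
    leverage = 1 / n + (x_g - x_bar) ** 2 / s_xx
    pi_half = t_crit * s_eps * math.sqrt(1 + leverage)   # single new y
    ci_half = t_crit * s_eps * math.sqrt(leverage)       # mean of y at x_g
    return y_hat, pi_half, ci_half


# Hypothetical toy data; t_{.025,3} = 3.182 from a t table.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
y_hat, pi_half, ci_half = interval_estimates(x, y, x_g=4, t_crit=3.182)
print(f"prediction interval: {y_hat:.2f} +/- {pi_half:.2f}")
print(f"confidence interval: {y_hat:.2f} +/- {ci_half:.2f}")
```

The prediction interval is always the wider of the two, because predicting one car's price is harder than estimating the mean price of all such cars.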

Interval Estimates, Example
Example continued: provide an interval estimate for the bidding price on a Ford Taurus with 40,000 miles on the odometer. Two types of predictions are required: a prediction for a specific car, and an estimate for the average price per car.

Interval Estimates, Example
Solution: a prediction interval provides the price estimate for a single car, using $t_{.025,98} \approx 1.984$.

Interval Estimates, Example
Solution – continued A confidence interval provides the estimate of the mean price per car for a Ford Taurus with 40,000 miles reading on the odometer. The confidence interval (95%) =

The effect of the given $x_g$ on the length of the interval
As $x_g$ moves away from $\bar{x}$ the interval becomes longer. That is, the shortest interval is found at $x_g = \bar{x}$.
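
This behaviour follows from the squared distance between the given x value and the sample mean of x inside the interval formula. A small sketch on hypothetical toy data (the grid of x values and the table-based critical value are assumptions for illustration) tabulates the width of the 95% confidence interval for the mean of y over a range of given x values:

```python
import math

# Hypothetical toy data; t_{.025,3} = 3.182 from a t table.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
t_crit = 3.182

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
b0 = y_bar - b1 * x_bar
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
s_eps = math.sqrt(sse / (n - 2))


def ci_width(x_g):
    """Full width of the 95% confidence interval for the mean of y at x_g."""
    return 2 * t_crit * s_eps * math.sqrt(1 / n + (x_g - x_bar) ** 2 / s_xx)


grid = [1, 2, 3, 4, 5, 6, 7]
widths = {x_g: ci_width(x_g) for x_g in grid}
for x_g, width in widths.items():
    print(f"x_g = {x_g}: width = {width:.2f}")

shortest_at = min(widths, key=widths.get)
print(f"shortest interval at x_g = {shortest_at} (x_bar = {x_bar})")
```

The widths are symmetric around the sample mean of x and keep growing outside the range of the observed data.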

Practice Problems: 18.84, 18.86, 18.88, 18.90, 18.94
