1
**Lecture 19 Simple linear regression (Review, 18.5, 18.8)**

Homework 5 is posted and due next Tuesday by 3 p.m. Extra office hour on Thursday after class.

2
**Review of Regression Analysis**

Goal: Estimate E(Y|X), the regression function.

Uses:
- E(Y|X) is a good prediction of Y based on X.
- E(Y|X) describes the relationship between Y and X.

Simple linear regression model: E(Y|X) is a straight line (the regression line).

3
**The Simple Linear Regression Line**

Example 18.2 (Xm18-02): A car dealer wants to find the relationship between the odometer reading and the selling price of used cars. A random sample of 100 cars is selected, and the data are recorded. Find the regression line.

- Independent variable x: odometer reading
- Dependent variable y: selling price

4
**Simple Linear Regression Model**

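The data are assumed to be a realization of

\[ y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \qquad i = 1, \dots, n, \]

where the errors \(\varepsilon_i\) have mean zero, so that \(E(Y|X) = \beta_0 + \beta_1 X\). The quantities \(\beta_0, \beta_1, \sigma_\varepsilon\) are the unknown parameters of the model; the objective of regression is to estimate them.

- \(\beta_1\), the slope, is the amount that Y changes on average for each one-unit increase in X.
- \(\sigma_\varepsilon\), the standard error of estimate, is the standard deviation of the amount by which Y differs from E(Y|X), i.e., the standard deviation of the errors.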

5
**Estimation of Regression Line**

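We estimate the regression line by the least squares line \(\hat y = b_0 + b_1 x\), the line that minimizes the sum of squared prediction errors for the data, \(\sum_{i=1}^{n} (y_i - b_0 - b_1 x_i)^2\). The minimizing values are

\[ b_1 = \frac{\sum_{i=1}^{n} (x_i - \bar x)(y_i - \bar y)}{\sum_{i=1}^{n} (x_i - \bar x)^2}, \qquad b_0 = \bar y - b_1 \bar x. \]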
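
A minimal Python sketch of the least squares formulas, assuming NumPy is available. The five-point data set is hypothetical toy data chosen for easy hand-checking, not the Xm18-02 car data.

```python
import numpy as np

def least_squares_line(x, y):
    """Return (b0, b1) minimizing sum((y - (b0 + b1 * x)) ** 2)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_bar, y_bar = x.mean(), y.mean()
    # Slope: sum of cross-deviations divided by sum of squared x-deviations
    b1 = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
    # Intercept: the least squares line passes through (x_bar, y_bar)
    b0 = y_bar - b1 * x_bar
    return b0, b1

# Hypothetical toy data (not Xm18-02)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1 = least_squares_line(x, y)
print(round(b0, 4), round(b1, 4))  # 2.2 0.6

# Cross-check with NumPy's degree-1 polynomial fit, which returns [slope, intercept]
print(np.allclose(np.polyfit(x, y, 1), [b1, b0]))  # True
```

The explicit formulas make the "minimize the sum of squared errors" idea concrete; in practice `np.polyfit(x, y, 1)` or a statistics package such as JMP gives the same estimates.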

6
**Fitted Values and Residuals**

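The least squares line decomposes the data into two parts, \(y_i = \hat y_i + e_i\), where \(\hat y_i = b_0 + b_1 x_i\) are called the fitted or predicted values and \(e_i = y_i - \hat y_i\) are called the residuals. The residuals are estimates of the errors \(\varepsilon_i\).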
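
A short sketch of the decomposition \(y_i = \hat y_i + e_i\), assuming NumPy and the same hypothetical five-point toy data. It also shows a property of least squares residuals (when the line has an intercept) that is easy to verify numerically: they sum to zero.

```python
import numpy as np

# Hypothetical toy data (not Xm18-02)
x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2, 4, 5, 4, 5], dtype=float)
b1, b0 = np.polyfit(x, y, 1)   # least squares slope and intercept

y_hat = b0 + b1 * x            # fitted (predicted) values: 2.8, 3.4, 4.0, 4.6, 5.2
e = y - y_hat                  # residuals: -0.8, 0.6, 1.0, -0.6, -0.2

# y = y_hat + e holds exactly, and least squares residuals sum to zero
print(np.allclose(y, y_hat + e), np.isclose(e.sum(), 0.0))  # True True
```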

7
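**Estimating \(\sigma_\varepsilon\)**

The standard error of estimate (root mean squared error)

\[ s_\varepsilon = \sqrt{\frac{SSE}{n-2}}, \qquad SSE = \sum_{i=1}^{n} e_i^2, \]

is an estimate of \(\sigma_\varepsilon\). It is basically the standard deviation of the residuals; the divisor is n − 2 because two parameters, \(b_0\) and \(b_1\), were estimated. \(s_\varepsilon\) measures how useful the simple linear regression model is for prediction. If the simple regression model holds, then approximately 68% of the data will lie within one \(s_\varepsilon\) of the LS line, and approximately 95% of the data will lie within two \(s_\varepsilon\) of the LS line.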
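
A minimal simulation sketch, assuming NumPy, that computes \(s_\varepsilon = \sqrt{SSE/(n-2)}\) and checks the 68%/95% rule of thumb. The model \(y = 10 - 0.5x + \varepsilon\) with \(\sigma_\varepsilon = 2\) and the sample size are hypothetical choices for illustration only.

```python
import numpy as np

def standard_error_of_estimate(x, y):
    """Return s_e = sqrt(SSE / (n - 2)) and the residuals of the LS line."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    b1, b0 = np.polyfit(x, y, 1)
    residuals = y - (b0 + b1 * x)
    sse = np.sum(residuals ** 2)
    # n - 2 degrees of freedom: two parameters (b0, b1) were estimated
    return np.sqrt(sse / (len(x) - 2)), residuals

# Simulated data from a hypothetical model: y = 10 - 0.5 x + error, error ~ N(0, 2^2)
rng = np.random.default_rng(0)
x = rng.uniform(0, 20, size=5000)
y = 10 - 0.5 * x + rng.normal(0, 2.0, size=x.size)

s_e, res = standard_error_of_estimate(x, y)
within_1 = np.mean(np.abs(res) <= s_e)      # should be roughly 0.68
within_2 = np.mean(np.abs(res) <= 2 * s_e)  # should be roughly 0.95
print(f"s_e = {s_e:.3f}, within 1 s_e: {within_1:.3f}, within 2 s_e: {within_2:.3f}")
```

The 68% and 95% figures rely on the errors being normal; with heavily skewed or heavy-tailed errors the coverage can differ noticeably.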

8
**18.4 Error Variable: Required Conditions**

The error \(\varepsilon\) is a critical part of the regression model. Four requirements involving the distribution of \(\varepsilon\) must be satisfied:

1. The probability distribution of \(\varepsilon\) is normal.
2. The mean of \(\varepsilon\) is zero for each x: \(E(\varepsilon|x) = 0\) for each x.
3. The standard deviation of \(\varepsilon\) is \(\sigma_\varepsilon\) for all values of x.
4. The errors associated with different values of y are all independent.

9
**The Normality of \(\varepsilon\)**

From the first three assumptions we have: given x, y is normally distributed with mean \(E(y|x) = \beta_0 + \beta_1 x\) and a constant standard deviation \(\sigma_\varepsilon\).

[Figure: normal curves of y at \(x_1, x_2, x_3\), centered at \(\mu_1 = \beta_0 + \beta_1 x_1\), \(\mu_2 = \beta_0 + \beta_1 x_2\), \(\mu_3 = \beta_0 + \beta_1 x_3\). The standard deviation remains constant, but the mean value changes with x.]

10
**Coefficient of determination**

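To measure the strength of the linear relationship we use the coefficient of determination,

\[ R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}, \]

where SST is the total variation in y, SSR is the variation explained by the regression line, and SSE is the unexplained variation.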

11
**Coefficient of determination**

To understand the significance of this coefficient, note that the overall variability in y is explained in part by the regression model; the rest, due to the error, remains unexplained.

12
**Coefficient of determination**

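[Figure: two data points \((x_1, y_1)\) and \((x_2, y_2)\) of a certain sample, shown with the regression line.]

Total variation in y = variation explained by the regression line + unexplained variation (error):

\[ \sum_{i=1}^{n} (y_i - \bar y)^2 = \sum_{i=1}^{n} (\hat y_i - \bar y)^2 + \sum_{i=1}^{n} (y_i - \hat y_i)^2, \quad \text{i.e.,} \quad SST = SSR + SSE. \]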
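
A quick numerical check of SST = SSR + SSE, assuming NumPy and the hypothetical five-point toy data (not Xm18-02). The identity holds for any least squares line fitted with an intercept.

```python
import numpy as np

# Hypothetical toy data (not Xm18-02)
x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2, 4, 5, 4, 5], dtype=float)
b1, b0 = np.polyfit(x, y, 1)
y_hat = b0 + b1 * x

sst = np.sum((y - y.mean()) ** 2)      # total variation in y
ssr = np.sum((y_hat - y.mean()) ** 2)  # variation explained by the regression line
sse = np.sum((y - y_hat) ** 2)         # unexplained variation (error)
print(sst, round(ssr, 6), round(sse, 6))  # 6.0 3.6 2.4
```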

13
**Coefficient of determination**

\(R^2\) measures the proportion of the variation in y that is explained by the variation in x. \(R^2\) takes on any value between zero and one:

- \(R^2 = 1\): perfect match between the line and the data points.
- \(R^2 = 0\): there is no linear relationship between x and y.
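
A small sketch of \(R^2 = 1 - SSE/SST\), assuming NumPy. The three hypothetical data sets illustrate an intermediate value, a perfect linear fit, and a pattern with no linear relationship (the fitted slope is zero even though y clearly depends on x).

```python
import numpy as np

def r_squared(x, y):
    """Proportion of the variation in y explained by the LS line: 1 - SSE/SST."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    b1, b0 = np.polyfit(x, y, 1)
    sse = np.sum((y - (b0 + b1 * x)) ** 2)
    sst = np.sum((y - y.mean()) ** 2)
    return 1.0 - sse / sst

xs = [1, 2, 3, 4, 5]                           # hypothetical x-values
print(round(r_squared(xs, [2, 4, 5, 4, 5]), 4))   # 0.6
print(round(r_squared(xs, [3, 5, 7, 9, 11]), 4))  # 1.0: points lie exactly on a line
print(r_squared(xs, [1, 2, 3, 2, 1]))             # approximately 0: no linear relationship
```

The last case is a reminder that \(R^2 = 0\) rules out only a *linear* relationship; the inverted-V pattern is a strong nonlinear one.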

14
**Coefficient of determination, Example**

Find the coefficient of determination for Example 18.2. What does this statistic tell you about the model?

Solution (by hand):

15
Example 18.2 in JMP

16
**SEs of Parameter Estimates**

The JMP output reports standard errors for the estimated intercept and slope, \(s_{b_0}\) and \(s_{b_1}\). Imagine yourself taking repeated samples of the prices of cars with the same odometer readings from the "population." For each sample, you could estimate the regression line by least squares. Each time, the least squares line would be a little different. The standard errors estimate how much the least squares estimates of the slope and intercept would vary over these repeated samples. For the slope,

\[ s_{b_1} = \frac{s_\varepsilon}{\sqrt{(n-1)s_x^2}}. \]
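
The repeated-sampling idea can be simulated directly. This sketch assumes NumPy and a hypothetical "population" model (the values of \(\beta_0\), \(\beta_1\), \(\sigma_\varepsilon\) below are made up, not estimates from Xm18-02). It keeps the x-values fixed, draws many fresh samples of y, refits the line each time, and compares the spread of the slope estimates with \(\sigma_\varepsilon / \sqrt{\sum (x_i - \bar x)^2}\); the reported \(s_{b_1}\) estimates this quantity by plugging in \(s_\varepsilon\).

```python
import numpy as np

# Hypothetical population model (illustrative values only)
BETA0, BETA1, SIGMA = 17.0, -0.06, 0.3
rng = np.random.default_rng(1)
x = rng.uniform(20, 50, size=100)   # fixed x-values reused in every sample

slopes = []
for _ in range(2000):
    # A fresh sample of y at the same x-values
    y = BETA0 + BETA1 * x + rng.normal(0, SIGMA, size=x.size)
    b1, b0 = np.polyfit(x, y, 1)
    slopes.append(b1)

empirical_sd = np.std(slopes, ddof=1)
# Theoretical standard deviation of b1: sigma / sqrt(sum((x - x_bar)^2))
theoretical_sd = SIGMA / np.sqrt(np.sum((x - x.mean()) ** 2))
print(f"SD of b1 over repeated samples: {empirical_sd:.5f}")
print(f"sigma / sqrt(Sxx):              {theoretical_sd:.5f}")
```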

17
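**Confidence Intervals**

If the simple linear regression model holds, the standardized slope estimate \((b_1 - \beta_1)/s_{b_1}\) follows a t-distribution with n − 2 degrees of freedom.

A 95% confidence interval for the slope is given by \(b_1 \pm t_{.025,\,n-2}\, s_{b_1}\).

A 95% confidence interval for the intercept is given by \(b_0 \pm t_{.025,\,n-2}\, s_{b_0}\).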
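
A sketch of both 95% intervals, assuming NumPy. To avoid extra dependencies the critical value \(t_{.025,\,n-2}\) is passed in by the caller; for the hypothetical five-point toy data, n − 2 = 3 and \(t_{.025,3} \approx 3.182\). The intercept's standard error uses \(s_{b_0} = s_\varepsilon \sqrt{1/n + \bar x^2 / \sum (x_i - \bar x)^2}\).

```python
import numpy as np

def coefficient_cis(x, y, t_crit):
    """Return (slope CI, intercept CI) as b1 +/- t*s_b1 and b0 +/- t*s_b0."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    b1, b0 = np.polyfit(x, y, 1)
    s_e = np.sqrt(np.sum((y - (b0 + b1 * x)) ** 2) / (n - 2))
    sxx = np.sum((x - x.mean()) ** 2)                    # equals (n - 1) * s_x^2
    s_b1 = s_e / np.sqrt(sxx)                            # standard error of the slope
    s_b0 = s_e * np.sqrt(1.0 / n + x.mean() ** 2 / sxx)  # standard error of the intercept
    slope_ci = (b1 - t_crit * s_b1, b1 + t_crit * s_b1)
    intercept_ci = (b0 - t_crit * s_b0, b0 + t_crit * s_b0)
    return slope_ci, intercept_ci

# Hypothetical toy data with n = 5, so d.f. = 3 and t_.025,3 is about 3.182
slope_ci, intercept_ci = coefficient_cis([1, 2, 3, 4, 5], [2, 4, 5, 4, 5], t_crit=3.182)
print(f"slope CI: ({slope_ci[0]:.3f}, {slope_ci[1]:.3f})")  # slope CI: (-0.300, 1.500)
print(f"intercept CI: ({intercept_ci[0]:.3f}, {intercept_ci[1]:.3f})")
```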

18
**Testing the Slope**

When no linear relationship exists between two variables, the regression line should be horizontal.

[Figure: scatter plots contrasting a linear relationship (different inputs x yield different outputs y; the slope is not equal to zero) with no linear relationship (different inputs x yield the same output y; the slope is equal to zero).]

19
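**Testing the Slope**

We can draw inference about \(\beta_1\) from \(b_1\) by testing

\(H_0: \beta_1 = 0\)
\(H_1: \beta_1 \neq 0\) (or \(\beta_1 < 0\), or \(\beta_1 > 0\))

The test statistic is

\[ t = \frac{b_1 - \beta_1}{s_{b_1}}, \qquad \text{where } s_{b_1} = \frac{s_\varepsilon}{\sqrt{(n-1)s_x^2}} \text{ is the standard error of } b_1. \]

If the error variable is normally distributed, the statistic has a Student t distribution with d.f. = n − 2.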
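
A sketch of the test statistic under \(H_0: \beta_1 = 0\), assuming NumPy. The helper name `slope_t_statistic` is hypothetical, and the five-point data set is toy data, not Xm18-02.

```python
import numpy as np

def slope_t_statistic(x, y, beta1_null=0.0):
    """Return t = (b1 - beta1_null) / s_b1 and its degrees of freedom, n - 2."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    b1, b0 = np.polyfit(x, y, 1)
    s_e = np.sqrt(np.sum((y - (b0 + b1 * x)) ** 2) / (n - 2))
    s_b1 = s_e / np.sqrt(np.sum((x - x.mean()) ** 2))   # standard error of b1
    return (b1 - beta1_null) / s_b1, n - 2

# Hypothetical toy data (not Xm18-02)
t, df = slope_t_statistic([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(f"t = {t:.4f} with {df} d.f.")  # t = 2.1213 with 3 d.f.
```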

20
**Testing the Slope, Example**

Test to determine whether there is enough evidence to infer that there is a linear relationship between the car auction price and the odometer reading for all three-year-old Tauruses in Example 18.2. Use \(\alpha = 5\%\).

21
**Testing the Slope, Example**

Solving by hand: to compute t we need the values of \(b_1\) and \(s_{b_1}\). The rejection region is \(t > t_{.025}\) or \(t < -t_{.025}\) with \(\nu = n - 2 = 98\) degrees of freedom. Approximately, \(t_{.025,98} = 1.984\).
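
Instead of a t table, the critical value and a two-sided p-value can be computed with SciPy (assumed to be installed). `scipy.stats.t.ppf` is the inverse CDF and `scipy.stats.t.sf` the upper-tail probability; the helper names below are hypothetical.

```python
from scipy import stats

alpha, df = 0.05, 98                      # two-tailed test with n - 2 = 98 d.f.
t_crit = stats.t.ppf(1 - alpha / 2, df)   # upper alpha/2 critical value
print(f"t_.025,98 = {t_crit:.3f}")        # t_.025,98 = 1.984

def reject_h0(t_stat, t_crit):
    """Two-tailed rejection region: t > t_crit or t < -t_crit."""
    return abs(t_stat) > t_crit

def two_sided_p_value(t_stat, df):
    """P(|T| >= |t_stat|) for a Student t variable with df degrees of freedom."""
    return 2 * stats.t.sf(abs(t_stat), df)
```

With the \(b_1\) and \(s_{b_1}\) from the regression output, compute \(t = b_1 / s_{b_1}\) and pass it to either helper; the two give the same decision at \(\alpha = 5\%\).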

22
**Testing the Slope, Example**

Using the computer (Xm18-02): there is overwhelming evidence to infer that the odometer reading is linearly related to the auction selling price.

23
**Cause-and-effect Relationship**

A test of whether the slope is zero is a test of whether there is a linear relationship between x and y in the observed data, i.e., whether a change in x is associated with a change in y. It does not test whether a change in x causes a change in y. Such a causal relationship can only be established by a carefully controlled experiment or from extensive subject-matter knowledge about the relationship.

24
**Example of a Pitfall**

A researcher measures the number of television sets per person, X, and the average life expectancy, Y, for the world's nations. The regression line has a positive slope: nations with many TV sets have higher life expectancies. Could we lengthen the lives of people in Rwanda by shipping them TV sets?

25
**18.7 Using the Regression Equation**

Before using the regression model, we need to assess how well it fits the data. If we are satisfied with the fit, we can use the model to predict values of y. To make a prediction we use a point prediction and an interval prediction.

26
**Point Prediction Example 18.7**

Predict the selling price of a three-year-old Taurus with 40,000 miles on the odometer (Example 18.2).

A point prediction: \(\hat y = b_0 + b_1(40{,}000)\). It is predicted that a car with 40,000 miles would sell for $14,575. How close is this prediction to the real price?

27
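**Interval Estimates**

Two intervals can be used to discover how closely the predicted value will match the true value of y:

- Prediction interval: predicts y for a given value of x.
- Confidence interval: estimates the average y for a given x.

With \(\hat y = b_0 + b_1 x_g\) at the given value \(x_g\):

The prediction interval
\[ \hat y \pm t_{\alpha/2,\,n-2}\; s_\varepsilon \sqrt{1 + \frac{1}{n} + \frac{(x_g - \bar x)^2}{(n-1)s_x^2}} \]

The confidence interval
\[ \hat y \pm t_{\alpha/2,\,n-2}\; s_\varepsilon \sqrt{\frac{1}{n} + \frac{(x_g - \bar x)^2}{(n-1)s_x^2}} \]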
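
A sketch of both intervals, assuming NumPy, with the critical value supplied by the caller. The helper name `intervals_at` and the five-point data set are hypothetical; they are not the Xm18-02 computation. Note that \(\sum (x_i - \bar x)^2 = (n-1)s_x^2\), so the code's `sxx` matches the denominator in the formulas.

```python
import numpy as np

def intervals_at(x, y, x_g, t_crit):
    """Return (y_hat, prediction interval, confidence interval) at x = x_g."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    b1, b0 = np.polyfit(x, y, 1)
    s_e = np.sqrt(np.sum((y - (b0 + b1 * x)) ** 2) / (n - 2))
    sxx = np.sum((x - x.mean()) ** 2)             # equals (n - 1) * s_x^2
    y_hat = b0 + b1 * x_g                         # point prediction
    h = 1.0 / n + (x_g - x.mean()) ** 2 / sxx     # term shared by both intervals
    pi_half = t_crit * s_e * np.sqrt(1.0 + h)     # for a single new y at x_g
    ci_half = t_crit * s_e * np.sqrt(h)           # for the mean of y at x_g
    return y_hat, (y_hat - pi_half, y_hat + pi_half), (y_hat - ci_half, y_hat + ci_half)

# Hypothetical toy data, n = 5, t_.025,3 about 3.182
y_hat, pi, ci = intervals_at([1, 2, 3, 4, 5], [2, 4, 5, 4, 5], x_g=3.0, t_crit=3.182)
print(f"y_hat = {y_hat:.2f}")
print(f"prediction interval: ({pi[0]:.3f}, {pi[1]:.3f})")
print(f"confidence interval: ({ci[0]:.3f}, {ci[1]:.3f})")
```

The extra "1 +" under the square root is what makes the prediction interval wider: it accounts for the scatter of an individual car's price around the mean price.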

28
**Interval Estimates, Example**

Example continued: Provide an interval estimate for the bidding price on a Ford Taurus with 40,000 miles on the odometer. Two types of predictions are required:

- a prediction for a specific car;
- an estimate for the average price per car.

29
**Interval Estimates, Example**

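Solution: A prediction interval provides the price estimate for a single car:

\[ \hat y \pm t_{.025,98}\; s_\varepsilon \sqrt{1 + \frac{1}{n} + \frac{(x_g - \bar x)^2}{(n-1)s_x^2}}, \qquad x_g = 40{,}000, \quad t_{.025,98} \approx 1.984. \]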

30
**Interval Estimates, Example**

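Solution (continued): A confidence interval provides the estimate of the mean price per car for a Ford Taurus with a 40,000-mile odometer reading. The 95% confidence interval is

\[ \hat y \pm t_{.025,98}\; s_\varepsilon \sqrt{\frac{1}{n} + \frac{(x_g - \bar x)^2}{(n-1)s_x^2}}, \qquad x_g = 40{,}000. \]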

31
**The effect of the given xg on the length of the interval**

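As \(x_g\) moves away from \(\bar x\), the interval becomes longer. That is, the shortest interval is found at \(x_g = \bar x\), where the term \((x_g - \bar x)^2\) in the interval formulas equals zero.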
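
A numerical illustration, assuming NumPy and hypothetical x-values: evaluate the square-root factors of both intervals (everything except the constant \(t \cdot s_\varepsilon\)) over a grid of \(x_g\) values and locate the minimum.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)   # hypothetical x-values, x_bar = 3
n, x_bar = len(x), x.mean()
sxx = np.sum((x - x_bar) ** 2)

# Interval half-widths divided by the constant t * s_e
grid = np.linspace(0, 6, 61)
ci_factor = np.sqrt(1.0 / n + (grid - x_bar) ** 2 / sxx)
pi_factor = np.sqrt(1.0 + 1.0 / n + (grid - x_bar) ** 2 / sxx)

print(f"{grid[np.argmin(ci_factor)]:.2f} {grid[np.argmin(pi_factor)]:.2f}")  # 3.00 3.00
```

Both factors grow with \((x_g - \bar x)^2\), which is also why predictions far outside the range of the observed x-values come with very wide intervals.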

34
Practice Problems 18.84,18.86,18.88,18.90,18.94
