Lecture 19 Simple linear regression (Review, 18.5, 18.8)


Lecture 19 Simple linear regression (Review, 18.5, 18.8) Homework 5 is posted and due next Tuesday by 3 p.m. Extra office hour on Thursday after class.

Review of Regression Analysis Goal: Estimate E(Y|X) – the regression function Uses: E(Y|X) is a good prediction of Y based on X E(Y|X) describes the relationship between Y and X Simple linear regression model: E(Y|X) is a straight line (the regression line)

The Simple Linear Regression Line Example 18.2 (Xm18-02) A car dealer wants to find the relationship between the odometer reading and the selling price of used cars. A random sample of 100 cars is selected, and the data recorded. Find the regression line. Independent variable x: odometer reading. Dependent variable y: selling price.

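Simple Linear Regression Model The data $(x_1, y_1), \ldots, (x_n, y_n)$ are assumed to be a realization of $y_i = \beta_0 + \beta_1 x_i + \epsilon_i$, where the errors $\epsilon_i$ have mean zero and standard deviation $\sigma_\epsilon$. $\beta_0$, $\beta_1$ and $\sigma_\epsilon$ are the unknown parameters of the model. The objective of regression is to estimate them. $\beta_1$, the slope, is the amount that Y changes on average for each one-unit increase in X. $\sigma_\epsilon$, the standard error of estimate, is the standard deviation of the amount by which Y differs from E(Y|X), i.e., the standard deviation of the errors.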

Estimation of Regression Line We estimate the regression line by the least squares line $\hat{y} = b_0 + b_1 x$, the line that minimizes the sum of squared prediction errors $\sum_{i=1}^{n} (y_i - b_0 - b_1 x_i)^2$ for the data.
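As a minimal sketch of the least squares computation, the Python below fits a line to five made-up (odometer, price) pairs using the closed-form formulas b1 = s_xy / s_xx and b0 = y_bar - b1 * x_bar. The data values and the function name least_squares are illustrative assumptions; they are not the Xm18-02 file and not JMP's routine.

```python
def least_squares(x, y):
    """Return (b0, b1) for the line that minimizes the sum of squared errors."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of cross-products and sum of squares about the means
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx           # slope
    b0 = y_bar - b1 * x_bar    # intercept
    return b0, b1


# Hypothetical toy data: odometer in thousands of miles, price in $1000s
odometer = [20.0, 30.0, 40.0, 50.0, 60.0]
price = [16.0, 15.2, 14.6, 14.0, 13.2]

b0, b1 = least_squares(odometer, price)
print(f"fitted line: y_hat = {b0:.3f} + ({b1:.4f}) x")
```

For these toy numbers the slope is negative: each additional thousand miles lowers the fitted price.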

Fitted Values and Residuals The least squares line decomposes the data into two parts, $y_i = \hat{y}_i + e_i$, where $\hat{y}_i = b_0 + b_1 x_i$ are called the fitted or predicted values and $e_i = y_i - \hat{y}_i$ are called the residuals. The residuals are estimates of the errors $\epsilon_i$.

Estimating $\sigma_\epsilon$ The standard error of estimate (root mean squared error), $s_\epsilon = \sqrt{SSE/(n-2)}$ with $SSE = \sum_{i=1}^{n} e_i^2$, is an estimate of $\sigma_\epsilon$. The standard error of estimate is basically the standard deviation of the residuals. $s_\epsilon$ measures how useful the simple linear regression model is for prediction. If the simple regression model holds, then approximately 68% of the data will lie within one $s_\epsilon$ of the LS line and 95% of the data will lie within two $s_\epsilon$ of the LS line.
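The fitted values, residuals and standard error of estimate can be sketched in a few lines of Python. The data are the same made-up five points as before (an assumption, not Xm18-02), and the divisor n - 2 follows the root mean squared error definition.

```python
import math

# Hypothetical toy data: odometer in thousands of miles, price in $1000s
x = [20.0, 30.0, 40.0, 50.0, 60.0]
y = [16.0, 15.2, 14.6, 14.0, 13.2]
n = len(x)

# Least squares estimates
x_bar = sum(x) / n
y_bar = sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
b0 = y_bar - b1 * x_bar

# Decompose each observation into fitted value plus residual
fitted = [b0 + b1 * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]

# Standard error of estimate: sqrt(SSE / (n - 2))
sse = sum(e ** 2 for e in residuals)
s_eps = math.sqrt(sse / (n - 2))

print("residuals:", [round(e, 3) for e in residuals])
print("standard error of estimate:", round(s_eps, 4))
```

The residuals of a least squares line with an intercept always sum to zero, which is a quick sanity check on the fit.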

18.4 Error Variable: Required Conditions The error $\epsilon$ is a critical part of the regression model. Four requirements involving the distribution of $\epsilon$ must be satisfied. (1) The probability distribution of $\epsilon$ is normal. (2) The mean of $\epsilon$ is zero for each x: $E(\epsilon|x) = 0$. (3) The standard deviation of $\epsilon$ is $\sigma_\epsilon$ for all values of x. (4) The errors associated with different values of y are all independent.

The Normality of $\epsilon$ [Figure: normal curves for y at $x_1$, $x_2$, $x_3$, centered at $E(y|x_1) = \beta_0 + \beta_1 x_1$, $E(y|x_2) = \beta_0 + \beta_1 x_2$ and $E(y|x_3) = \beta_0 + \beta_1 x_3$. The standard deviation remains constant, but the mean value changes with x.] From the first three assumptions we have: y is normally distributed with mean $E(y) = \beta_0 + \beta_1 x$ and a constant standard deviation $\sigma_\epsilon$ given x.

Coefficient of determination To measure the strength of the linear relationship we use the coefficient of determination $R^2$.

Coefficient of determination To understand the significance of this coefficient, note: the overall variability in y is explained in part by the regression model and remains, in part, unexplained (the error).

Coefficient of determination [Figure: two data points $(x_1, y_1)$ and $(x_2, y_2)$ of a certain sample, shown with the regression line.] Total variation in y = Variation explained by the regression line + Unexplained variation (error), i.e., Variation in y = SSR + SSE.

Coefficient of determination $R^2$ measures the proportion of the variation in y that is explained by the variation in x. $R^2$ takes on any value between zero and one. $R^2 = 1$: perfect match between the line and the data points. $R^2 = 0$: there is no linear relationship between x and y.
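The decomposition of the variation in y and the resulting coefficient of determination can be checked numerically. The sketch below uses made-up data (an assumption, not Example 18.2) and computes $R^2$ as SSR divided by the total variation.

```python
# Hypothetical toy data: odometer in thousands of miles, price in $1000s
x = [20.0, 30.0, 40.0, 50.0, 60.0]
y = [16.0, 15.2, 14.6, 14.0, 13.2]
n = len(x)

# Least squares estimates
x_bar = sum(x) / n
y_bar = sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
b0 = y_bar - b1 * x_bar
fitted = [b0 + b1 * xi for xi in x]

# Total variation, explained variation (SSR), unexplained variation (SSE)
sst = sum((yi - y_bar) ** 2 for yi in y)
ssr = sum((fi - y_bar) ** 2 for fi in fitted)
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))

r_squared = ssr / sst
print(f"SST = {sst:.3f}, SSR = {ssr:.3f}, SSE = {sse:.3f}, R^2 = {r_squared:.4f}")
```

Because SST = SSR + SSE for a least squares line, the same value is obtained from 1 - SSE/SST.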

Coefficient of determination, Example Find the coefficient of determination for Example 18.2; what does this statistic tell you about the model? Solution Solving by hand;

Example 18.2 in JMP

SEs of Parameter Estimates From the JMP output, Imagine yourself taking repeated samples of the prices of cars with the odometer readings from the “population.” For each sample, you could estimate the regression line by least squares. Each time, the least squares line would be a little different. The standard errors estimate how much the least squares estimates of the slope and intercept would vary over these repeated samples.

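Confidence Intervals If the simple linear regression model holds, the standardized slope estimate $(b_1 - \beta_1)/s_{b_1}$ follows a t-distribution with $n - 2$ degrees of freedom. A 95% confidence interval for the slope is given by $b_1 \pm t_{.025,\,n-2}\, s_{b_1}$. A 95% confidence interval for the intercept is given by $b_0 \pm t_{.025,\,n-2}\, s_{b_0}$. Here $s_{b_1}$ and $s_{b_0}$ are the standard errors of the slope and intercept estimates.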
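A rough sketch of these two intervals in Python follows. It assumes the textbook standard error formulas $s_{b_1} = s_\epsilon/\sqrt{s_{xx}}$ and $s_{b_0} = s_\epsilon\sqrt{1/n + \bar{x}^2/s_{xx}}$, made-up data with n = 5, and a critical value of 3.182 taken from a t table for 3 degrees of freedom. None of these numbers come from the JMP output for Example 18.2.

```python
import math

T_CRIT = 3.182  # t_{.025} with n - 2 = 3 d.f., from a t table (assumption for toy data)

# Hypothetical toy data: odometer in thousands of miles, price in $1000s
x = [20.0, 30.0, 40.0, 50.0, 60.0]
y = [16.0, 15.2, 14.6, 14.0, 13.2]
n = len(x)

# Least squares estimates and standard error of estimate
x_bar = sum(x) / n
y_bar = sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
b0 = y_bar - b1 * x_bar
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
s_eps = math.sqrt(sse / (n - 2))

# Standard errors of the slope and intercept estimates
s_b1 = s_eps / math.sqrt(s_xx)
s_b0 = s_eps * math.sqrt(1 / n + x_bar ** 2 / s_xx)

# 95% confidence intervals: estimate +/- t * standard error
slope_ci = (b1 - T_CRIT * s_b1, b1 + T_CRIT * s_b1)
intercept_ci = (b0 - T_CRIT * s_b0, b0 + T_CRIT * s_b0)
print("slope CI:", slope_ci)
print("intercept CI:", intercept_ci)
```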

Testing the slope When no linear relationship exists between two variables, the regression line should be horizontal. [Figure: two scatterplots. Linear relationship: different inputs (x) yield different outputs (y); the slope is not equal to zero. No linear relationship: different inputs (x) yield the same output (y); the slope is equal to zero.]

Testing the Slope We can draw inference about $\beta_1$ from $b_1$ by testing $H_0: \beta_1 = 0$ against $H_1: \beta_1 \neq 0$ (or $< 0$, or $> 0$). The test statistic is $t = (b_1 - \beta_1)/s_{b_1}$, where $s_{b_1} = s_\epsilon / \sqrt{(n-1) s_x^2}$ is the standard error of $b_1$. If the error variable is normally distributed, the statistic has a Student t distribution with d.f. = n - 2.

Testing the Slope, Example Test to determine whether there is enough evidence to infer that there is a linear relationship between the car auction price and the odometer reading for all three-year-old Tauruses, in Example 18.2. Use $\alpha$ = 5%.

Testing the Slope, Example Solving by hand To compute t we need the values of $b_1$ and $s_{b_1}$. The rejection region is $t > t_{.025}$ or $t < -t_{.025}$ with $\nu = n - 2 = 98$. Approximately, $t_{.025} = 1.984$.
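The by-hand test can be sketched in Python as follows. This is an illustration on made-up data with n = 5, so the critical value is 3.182 (t table, 3 degrees of freedom) rather than the 1.984 used for the 100 cars in Example 18.2; the variable names are assumptions.

```python
import math

T_CRIT = 3.182  # t_{.025} with n - 2 = 3 d.f., from a t table (assumption for toy data)

# Hypothetical toy data: odometer in thousands of miles, price in $1000s
x = [20.0, 30.0, 40.0, 50.0, 60.0]
y = [16.0, 15.2, 14.6, 14.0, 13.2]
n = len(x)

# Least squares estimates and standard error of estimate
x_bar = sum(x) / n
y_bar = sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)  # equals (n - 1) * s_x^2
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
b0 = y_bar - b1 * x_bar
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
s_eps = math.sqrt(sse / (n - 2))

# Test statistic under H0: beta1 = 0
s_b1 = s_eps / math.sqrt(s_xx)
t_stat = (b1 - 0.0) / s_b1

# Two-tail rejection region: t > t_crit or t < -t_crit
reject_h0 = abs(t_stat) > T_CRIT
print(f"t = {t_stat:.2f}, reject H0: {reject_h0}")
```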

Testing the Slope, Example Xm18-02 Using the computer There is overwhelming evidence to infer that the odometer reading is linearly related to the auction selling price.

Cause-and-effect Relationship A test of whether the slope is zero is a test of whether there is a linear relationship between x and y in the observed data, i.e., is a change in x associated with a change in y. This does not test whether a change in x causes a change in y. Such a relationship can only be established based on a carefully controlled experiment or extensive subject matter knowledge about the relationship.

Example of Pitfall A researcher measures the number of television sets per person X and the average life expectancy Y for the world’s nations. The regression line has a positive slope – nations with many TV sets have higher life expectancies. Could we lengthen the lives of people in Rwanda by shipping them TV sets?

18.7 Using the Regression Equation Before using the regression model, we need to assess how well it fits the data. If we are satisfied with how well the model fits the data, we can use it to predict the values of y. To make a prediction we use a point prediction and an interval prediction.

Point Prediction Example 18.7 Predict the selling price of a three-year-old Taurus with 40,000 miles on the odometer (Example 18.2). A point prediction is obtained by substituting the odometer reading into the fitted least squares line. It is predicted that a car with 40,000 miles would sell for $14,575. How close is this prediction to the real price?

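Interval Estimates Two intervals can be used to discover how closely the predicted value will match the true value of y. Prediction interval: predicts y for a given value of x. Confidence interval: estimates the average y for a given x. The prediction interval is $\hat{y} \pm t_{\alpha/2,\,n-2}\, s_\epsilon \sqrt{1 + \frac{1}{n} + \frac{(x_g - \bar{x})^2}{(n-1) s_x^2}}$. The confidence interval is $\hat{y} \pm t_{\alpha/2,\,n-2}\, s_\epsilon \sqrt{\frac{1}{n} + \frac{(x_g - \bar{x})^2}{(n-1) s_x^2}}$. Here $x_g$ is the given value of x.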
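To make the difference between the two intervals concrete, here is a small Python sketch. It assumes the standard textbook half-widths (the prediction interval carries an extra 1 under the square root), made-up data with n = 5, and a t table value of 3.182 for 3 degrees of freedom; it does not reproduce the numbers of Example 18.7.

```python
import math

T_CRIT = 3.182  # t_{.025} with n - 2 = 3 d.f., from a t table (assumption for toy data)

# Hypothetical toy data: odometer in thousands of miles, price in $1000s
x = [20.0, 30.0, 40.0, 50.0, 60.0]
y = [16.0, 15.2, 14.6, 14.0, 13.2]
n = len(x)

# Least squares estimates and standard error of estimate
x_bar = sum(x) / n
y_bar = sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)  # equals (n - 1) * s_x^2
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
b0 = y_bar - b1 * x_bar
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
s_eps = math.sqrt(sse / (n - 2))

# Point prediction at the given x
x_g = 40.0
y_hat = b0 + b1 * x_g

# Half-widths: prediction interval (single car) vs confidence interval (mean price)
spread = 1 / n + (x_g - x_bar) ** 2 / s_xx
pi_half = T_CRIT * s_eps * math.sqrt(1 + spread)
ci_half = T_CRIT * s_eps * math.sqrt(spread)

print(f"point prediction: {y_hat:.3f}")
print(f"95% prediction interval: {y_hat - pi_half:.3f} to {y_hat + pi_half:.3f}")
print(f"95% confidence interval: {y_hat - ci_half:.3f} to {y_hat + ci_half:.3f}")
```

The prediction interval is always the wider of the two, because a single car varies around the mean price in addition to the uncertainty in the estimated line.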

Interval Estimates, Example Example 18.7 - continued Provide an interval estimate for the bidding price on a Ford Taurus with 40,000 miles on the odometer. Two types of predictions are required: A prediction for a specific car An estimate for the average price per car

Interval Estimates, Example Solution A prediction interval provides the price estimate for a single car. The critical value is $t_{.025,98}$, approximately 1.984.

Interval Estimates, Example Solution – continued A confidence interval provides the estimate of the mean price per car for a Ford Taurus with 40,000 miles reading on the odometer. The confidence interval (95%) =

The effect of the given $x_g$ on the length of the interval As $x_g$ moves away from $\bar{x}$ the interval becomes longer. That is, the shortest interval is found at $x_g = \bar{x}$.
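The effect of $x_g$ on the interval length can be seen by evaluating the half-width at several values. The helper below is a hypothetical illustration with made-up summary values (n = 5, mean of x equal to 40, sum of squares 1000, assumed $s_\epsilon$ and t value), not quantities from Example 18.2.

```python
import math

# Hypothetical summary values for a toy sample (assumptions, not Example 18.2)
N = 5
X_BAR = 40.0
S_XX = 1000.0     # sum of squared deviations of x, i.e. (n - 1) * s_x^2
S_EPS = 0.073     # standard error of estimate
T_CRIT = 3.182    # t_{.025} with 3 d.f., from a t table


def half_width(x_g, single_observation=True):
    """Half-width of the prediction (or confidence) interval at x_g."""
    extra = 1.0 if single_observation else 0.0
    return T_CRIT * S_EPS * math.sqrt(extra + 1 / N + (x_g - X_BAR) ** 2 / S_XX)


for x_g in (20.0, 30.0, 40.0, 50.0, 60.0):
    print(x_g, round(half_width(x_g), 4), round(half_width(x_g, False), 4))
```

Both half-widths are smallest at the sample mean of x and grow symmetrically on either side of it.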

Practice Problems 18.84, 18.86, 18.88, 18.90, 18.94