# Lesson #32: Simple Linear Regression


Lesson #32 Simple Linear Regression

Regression is used to model and/or predict a variable, called the dependent variable, Y, based on one or more independent variables, the X's. Here, all independent variables are numeric.

Simple Linear Regression: one independent variable, X

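Assume the “true” relationship is

$$Y = \alpha + \beta X + \varepsilon \qquad (\text{or } Y = \beta_0 + \beta_1 X + \varepsilon)$$

- $\alpha$ is the intercept and $\beta$ is the slope; these are the regression coefficients
- $\varepsilon$ represents random error, with $\varepsilon \sim N(0, \sigma^2)$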
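To make the model concrete, here is a minimal sketch that draws observations from an intercept, a slope, and normally distributed random error with mean 0 and variance σ². The function name `simulate` and the parameter values are hypothetical, chosen only for illustration; they are not from the lesson.

```python
import random


def simulate(xs, alpha, beta, sigma, seed=0):
    """Draw Y = alpha + beta * X + eps for each X, with eps ~ N(0, sigma^2)."""
    rng = random.Random(seed)  # seeded so the draws are reproducible
    return [alpha + beta * x + rng.gauss(0.0, sigma) for x in xs]


# Hypothetical parameter values (assumptions, not taken from the lesson).
xs = [10, 20, 30, 40, 50]
ys = simulate(xs, alpha=100.0, beta=2.5, sigma=15.0)

for x, y in zip(xs, ys):
    print(f"X = {x:2d}   Y = {y:7.2f}")
```

With `sigma=0.0` every point falls exactly on the line; larger values of `sigma` scatter the points more widely around it.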

Y =  +  X Y X

Assumptions (LINE)

- **L**inear relationship
- **I**ndependent observations
- **N**ormal errors
- **E**qual variances

Estimation

We want to estimate the regression coefficients. The estimates are

- $\hat{\alpha}$, the estimated intercept
- $\hat{\beta}$, the estimated slope

The estimated line is then

$$\hat{Y} = \hat{\alpha} + \hat{\beta} X$$

[Figure: scatterplot of Y against X.]

We want the line that “best fits” the data.

[Figure: scatterplot with a fitted line, marking a point $(X_i, Y_i)$ and its residual $e_i$.]

The predicted value at the $i$th point is:

$$\hat{Y}_i = \hat{\alpha} + \hat{\beta} X_i$$

The residual at the $i$th point is:

$$e_i = Y_i - \hat{Y}_i$$

Ordinary least squares (OLS) chooses the line that minimizes the sum of the squared residuals, $\sum e_i^2$.
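The least squares criterion can be illustrated by scoring candidate lines on a small data set. The sketch below computes predicted values, residuals (observed minus predicted), and the sum of squared residuals for two lines. The helper names and the four data points are hypothetical and are not the lesson's data.

```python
def predicted(a, b, xs):
    """Predicted values a + b * x for a candidate intercept a and slope b."""
    return [a + b * x for x in xs]


def residuals(a, b, xs, ys):
    """Residuals: observed Y minus predicted Y at each point."""
    return [y - y_hat for y, y_hat in zip(ys, predicted(a, b, xs))]


def sse(a, b, xs, ys):
    """Sum of squared residuals for the candidate line."""
    return sum(e * e for e in residuals(a, b, xs, ys))


# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4]
ys = [2, 4, 5, 8]

print(sse(0.0, 2.0, xs, ys))  # candidate line Y = 2X
print(sse(1.0, 1.5, xs, ys))  # candidate line Y = 1 + 1.5X
```

Of the two candidates, the first has the smaller sum of squared residuals, so it fits these points better in the least squares sense. OLS picks the intercept and slope that make this quantity as small as possible over all lines.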

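For OLS:

$$\hat{\beta} = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2} \qquad \hat{\alpha} = \bar{Y} - \hat{\beta}\,\bar{X}$$

The estimated line, or prediction equation, is

$$\hat{Y} = \hat{\alpha} + \hat{\beta} X$$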
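For readers who want the reasoning behind the OLS estimates, the standard derivation can be sketched as follows. The symbol $Q$ for the sum of squared residuals, and $a$, $b$ for a candidate intercept and slope, are introduced here only for illustration and do not appear in the slides.

$$
\begin{aligned}
Q(a, b) &= \sum_{i=1}^{n} (Y_i - a - b X_i)^2 \\
\frac{\partial Q}{\partial a} &= -2 \sum (Y_i - a - b X_i) = 0
  \;\Longrightarrow\; a = \bar{Y} - b\,\bar{X} \\
\frac{\partial Q}{\partial b} &= -2 \sum X_i (Y_i - a - b X_i) = 0 \\
\text{substituting } a:\quad & \sum X_i \bigl[(Y_i - \bar{Y}) - b\,(X_i - \bar{X})\bigr] = 0
  \;\Longrightarrow\; b = \frac{\sum X_i (Y_i - \bar{Y})}{\sum X_i (X_i - \bar{X})}
  = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2}
\end{aligned}
$$

The last equality holds because $\sum \bar{X}(Y_i - \bar{Y}) = 0$ and $\sum \bar{X}(X_i - \bar{X}) = 0$. The minimizing values of $a$ and $b$ are the estimated intercept and the estimated slope.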

Sums of Squares

- $SS_{TOTAL} = \sum (Y_i - \bar{Y})^2$: total variation in the dependent variable
- $SS_{ERROR} = SS_{RESIDUAL} = \sum e_i^2$: unexplained variation after fitting the model
- $SS_{REGRESSION} = SS_{TOTAL} - SS_{ERROR}$: variation “explained” by the model

ANOVA Table

| Source | df | SS | MS | F |
|---|---|---|---|---|
| Regression | 1 | $SS_{REG}$ | $MS_{REG}$ | $F_0$ |
| Error | $n-2$ | $SS_{ERROR}$ | $MS_{ERROR}$ | |
| Total | $n-1$ | $SS_{TOTAL}$ | | |

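$E(MS_{ERROR}) = \sigma^2$

$F_0 = \dfrac{MS_{REG}}{MS_{ERROR}}$ tests $H_0: \beta = 0$ against $H_1: \beta \neq 0$.

Reject $H_0$ if $F_0 > F_{(1,\,n-2),\,1-\alpha}$, where $\alpha$ here is the significance level of the test, not the intercept.

$R^2 = \dfrac{SS_{REG}}{SS_{TOTAL}}$, and in simple linear regression $R^2 = (r)^2$, the square of the correlation between X and Y.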
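The ANOVA quantities can be assembled in a few lines once the line is fitted. The sketch below computes the sums of squares, mean squares, the F statistic, and R², and also the sample correlation r so that the identity R² = r² can be checked numerically. The function name `anova` and the data are hypothetical.

```python
import math


def anova(xs, ys):
    """ANOVA quantities for a simple linear regression of ys on xs."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    ss_total = sum((y - y_bar) ** 2 for y in ys)

    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    ss_error = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_reg = ss_total - ss_error

    ms_reg = ss_reg / 1            # regression df = 1
    ms_error = ss_error / (n - 2)  # error df = n - 2
    return {
        "ss_reg": ss_reg,
        "ss_error": ss_error,
        "ss_total": ss_total,
        "ms_reg": ms_reg,
        "ms_error": ms_error,
        "f0": ms_reg / ms_error,
        "r_squared": ss_reg / ss_total,
        "r": sxy / math.sqrt(sxx * ss_total),  # sample correlation
    }


# Hypothetical data, for illustration only.
table = anova([1, 2, 3, 4], [2, 4, 5, 8])
print(f"F0 = {table['f0']:.2f}")
print(f"R^2 = {table['r_squared']:.4f}, r^2 = {table['r'] ** 2:.4f}")
```

The two values on the last line agree, which is the relationship R² = (r)² for one independent variable.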

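| X = fat cal. | Y = chol. |
|---|---|
| 28 | 177 |
| 35 | 194 |
| 43 | 247 |
| 24 | 186 |
| 51 | 232 |
| 33 | 207 |
| 18 | 143 |
| 40 | 206 |

With $\bar{X} = 34$ and $\bar{Y} = 199$:

$$\sum (X_i - \bar{X})(Y_i - \bar{Y}) = (28 - 34)(177 - 199) + \cdots + (40 - 34)(206 - 199) = 2180$$

$$\sum (X_i - \bar{X})^2 = (28 - 34)^2 + \cdots + (40 - 34)^2 = 800$$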

[Figure: scatterplot of Y = cholesterol against X = % calories from fat.]

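$$\hat{\beta} = \frac{2180}{800} = 2.725$$

$$\hat{\alpha} = 199 - (2.725)(34) = 106.35$$

$$\hat{Y} = 106.35 + 2.725\,(\text{fat cal.})$$

At X = 30: $\hat{Y} = 106.35 + 2.725(30) = 188.1$

At X = 28: $\hat{Y} = 106.35 + 2.725(28) = 182.65$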
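The arithmetic for the fat and cholesterol example can be checked with a short script. This is a minimal sketch using only the eight data pairs given in the lesson; the variable names and the `predict` helper are chosen here for illustration.

```python
# Data from the lesson: X = % calories from fat, Y = cholesterol.
fat = [28, 35, 43, 24, 51, 33, 18, 40]
chol = [177, 194, 247, 186, 232, 207, 143, 206]

n = len(fat)
x_bar = sum(fat) / n   # mean of X
y_bar = sum(chol) / n  # mean of Y

# Sum of cross-products and sum of squares of X about the means.
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(fat, chol))
sxx = sum((x - x_bar) ** 2 for x in fat)

slope = sxy / sxx                  # estimated slope
intercept = y_bar - slope * x_bar  # estimated intercept


def predict(x):
    """Predicted cholesterol at x percent calories from fat."""
    return intercept + slope * x


print(f"means: {x_bar}, {y_bar}")
print(f"slope = {slope:.3f}, intercept = {intercept:.2f}")
print(f"prediction at 30: {predict(30):.2f}")
print(f"prediction at 28: {predict(28):.2f}")
```

The printed slope, intercept, and predictions match the hand calculation, up to floating-point rounding.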

| X = fat cal. | Y = chol. | $\hat{Y}$ | $e$ |
|---|---|---|---|
| 28 | 177 | 182.650 | -5.650 |
| 35 | 194 | 201.725 | -7.725 |
| 43 | 247 | 223.525 | 23.475 |
| 24 | 186 | 171.750 | 14.250 |
| 51 | 232 | 245.325 | -13.325 |
| 33 | 207 | 196.275 | 10.725 |
| 18 | 143 | 155.400 | -12.400 |
| 40 | 206 | 215.350 | -9.350 |

$$SS_{ERROR} = (-5.650)^2 + \cdots + (-9.350)^2 = 1379.5$$

$$SS_{TOTAL} = (177 - 199)^2 + \cdots + (206 - 199)^2 = 7320$$

$$SS_{REGRESSION} = 7320 - 1379.5 = 5940.5$$

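| Source | df | SS | MS | F |
|---|---|---|---|---|
| Regression | 1 | 5940.5 | 5940.5 | 25.84 |
| Error | 6 | 1379.5 | 229.92 | |
| Total | 7 | 7320 | | |

Reject $H_0$ if $F_0 > F_{(1,6),\,.95} = 5.99$.

Since $F_0 = 25.84 > 5.99$, reject $H_0$. Conclude there is a positive linear relationship between calories from fat and cholesterol.

$$R^2 = \frac{5940.5}{7320} = .8115$$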
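The table entries and the test decision follow from the sums of squares alone. The sketch below starts from the lesson's values of the regression and error sums of squares with n = 8, and takes the critical value 5.99 directly from the lesson rather than computing it. The variable names are chosen here for illustration.

```python
# Values taken from the lesson's worked example.
n = 8
ss_reg = 5940.5
ss_error = 1379.5
f_crit = 5.99  # F critical value with (1, 6) df at the .95 quantile, as given in the lesson

ss_total = ss_reg + ss_error
df_reg, df_error = 1, n - 2

ms_reg = ss_reg / df_reg        # mean square for regression
ms_error = ss_error / df_error  # mean square for error
f0 = ms_reg / ms_error          # F statistic
r_squared = ss_reg / ss_total   # proportion of variation explained

print(f"MS_REG = {ms_reg:.1f}, MS_ERROR = {ms_error:.2f}")
print(f"F0 = {f0:.2f}, R^2 = {r_squared:.4f}")
print("Reject H0" if f0 > f_crit else "Do not reject H0")
```

The F test itself only detects a nonzero slope; the direction of the relationship is read from the sign of the estimated slope, which is positive in this example.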
