# This Week

- Continue with linear regression
- Begin multiple regression (Le 8.2; C & S 9:A-E)
- Handout: Class examples and assignment 3



## Linear Regression

- Investigates the relationship between two variables
- Dependent variable: the variable that is being predicted or explained
- Independent variable: the variable that is doing the predicting or explaining
- Think of the data as pairs (xᵢ, yᵢ)

## Linear Regression: Purpose

- Association: is there an association between the two variables? (e.g., is BP change related to weight change?)
- Estimation of impact: how much BP change occurs per pound of weight change?
- Prediction: if a person loses 10 pounds, how much of a drop in blood pressure can be expected?

## Assumptions for Linear Regression

- For each value of x there is a population of y's that is normally distributed
- The population means fall on a straight line
- Each population has the same variance σ²
- Note: the x's do not need to be normally distributed; in fact, the researcher can select them prior to data collection

## Simple Linear Regression Equation

The simple linear regression equation is:

E(y) = β₀ + β₁x

- β₀ is the mean of y when x = 0
- The mean increases by β₁ for each increase of x by 1

## Simple Linear Regression Model

The equation that describes how individual y values relate to x and an error term is called the regression model:

y = β₀ + β₁x + ε

ε reflects how individuals deviate from others with the same value of x.

## Estimated Simple Linear Regression Equation

The estimated simple linear regression equation is:

ŷ = b₀ + b₁x

- b₀ is the estimate of β₀
- b₁ is the estimate of β₁
- ŷ is the estimated (predicted) value of y for a given x value; it is the estimated mean for that x

## Least Squares Method

Least squares criterion: choose β₀ and β₁ to minimize the sum of the squared distances of each point from the line. Of all possible lines, pick the one that minimizes:

S = Σ (yᵢ − β₀ − β₁xᵢ)²

## The Least Squares Estimates

Slope:

b₁ = Σ(xᵢ − x̄)(yᵢ − ȳ) / Σ(xᵢ − x̄)²

Intercept:

b₀ = ȳ − b₁x̄
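The least squares estimates can be computed directly from these formulas. Below is a minimal sketch in plain Python; the data are made up purely for illustration.

```python
def least_squares(x, y):
    """Compute the least squares slope b1 and intercept b0."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    # b1 = sum((xi - xbar)(yi - ybar)) / sum((xi - xbar)^2)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar  # intercept: b0 = ybar - b1 * xbar
    return b0, b1

# Toy data (not from the lecture): four (x, y) pairs
x = [1, 2, 3, 4]
y = [1, 3, 3, 5]
b0, b1 = least_squares(x, y)
print(b0, b1)  # 0.0 1.2
```

For these points the fitted line is ŷ = 0 + 1.2x.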

## Estimating the Variance

An estimate of σ²: the mean square error (MSE) provides the estimate of σ², and the notation s² is also used:

s² = MSE = SSE/(n − 2)

where SSE = Σ(yᵢ − ŷᵢ)² is the sum of squared errors.

- If points are close to the regression line, SSE will be small
- If points are far from the regression line, SSE will be large

## Estimating σ

To estimate σ we take the square root of s². The resulting s is called the root mean square error.
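As a sketch of these two steps, the snippet below computes SSE, s², and s for a small made-up dataset whose least squares line is ŷ = 1.2x (the data and fitted line are assumptions for illustration only).

```python
import math

# Toy data (illustrative only); the least squares fit for these points
# is b0 = 0, b1 = 1.2, so the fitted values are 1.2 * x.
x = [1, 2, 3, 4]
y = [1, 3, 3, 5]
yhat = [1.2 * xi for xi in x]

sse = sum((yi - fi) ** 2 for yi, fi in zip(y, yhat))  # SSE
mse = sse / (len(x) - 2)   # s^2 = MSE = SSE/(n - 2)
s = math.sqrt(mse)         # root mean square error
print(round(sse, 4), round(mse, 4), round(s, 4))
```

Here SSE is 0.8 on n − 2 = 2 degrees of freedom, giving s² = 0.4 and s ≈ 0.63.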

## Hypothesis Testing for β₁

- H₀: β₁ = 0 (no relation between x and y)
- Hₐ: β₁ ≠ 0 (relation between x and y)

Test statistic: t = b₁/SE(b₁)

SE(b₁) depends on:
- the sample size
- how well the estimated line fits the points
- how spread out the range of x values is

## Testing for Significance: t Test

Rejection rule: reject H₀ if t < −t_{α/2} or t > t_{α/2}, where t_{α/2} is based on a t distribution with n − 2 degrees of freedom.
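As a quick numeric sketch of this test, the snippet below plugs in the slope estimate and standard error from the PROC REG output shown later in this deck (b₁ = 5.0, SE(b₁) = 0.58027, n = 10, so 8 degrees of freedom); the two-sided 5% cutoff 2.306 is the standard t-table value for 8 df.

```python
# Slope t test using the estimates from the PROC REG output in this deck
b1, se_b1 = 5.0, 0.58027
t = b1 / se_b1             # test statistic t = b1 / SE(b1)
t_crit = 2.306             # two-sided 5% cutoff, t distribution, 8 df
reject = abs(t) > t_crit   # rejection rule
print(round(t, 2), reject)  # 8.62 True
```

Since |8.62| > 2.306, H₀: β₁ = 0 is rejected at the 5% level, matching the p-value (<.0001) in the output.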

## Confidence Interval for β₁

b₁ ± t_{α/2} · SE(b₁)

where t_{α/2} is the cutoff value from a t distribution with n − 2 df. In SAS, the CLB option on the MODEL statement produces this interval.
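The interval is straightforward to compute by hand. The sketch below uses the slope estimate and standard error from the PROC REG output later in this deck (b₁ = 5.0, SE(b₁) = 0.58027) and the standard t-table cutoff 2.306 for 8 degrees of freedom.

```python
# 95% CI for beta1: b1 +/- t * SE(b1), using estimates from this deck
b1, se_b1 = 5.0, 0.58027
t_crit = 2.306  # t cutoff for 8 df, 95% confidence
lo = b1 - t_crit * se_b1
hi = b1 + t_crit * se_b1
print(round(lo, 2), round(hi, 2))  # 3.66 6.34
```

The interval (3.66, 6.34) excludes 0, consistent with rejecting H₀: β₁ = 0.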

## Estimating the Mean for a Particular x

Simply plug your value of x into the estimated regression equation. For example, to estimate the mean BP for persons aged 50, suppose b₀ = 100 and b₁ = 0.80:

Estimate = 100 + 0.80 × 50 = 140 mmHg

A 95% CI for this estimate can be computed using the SAS CLM option on the MODEL statement.

## The Coefficient of Determination

Relationship among SST, SSR, and SSE:

SST = SSR + SSE

where:
- SST = total sum of squares = Σ(yᵢ − ȳ)²
- SSR = sum of squares due to regression = Σ(ŷᵢ − ȳ)²
- SSE = sum of squares due to error = Σ(yᵢ − ŷᵢ)²

The coefficient of determination is:

r² = SSR/SST

where SST is the total sum of squares and SSR is the sum of squares due to regression. r² is the proportion of variability explained by x (it must be between 0 and 1).
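The decomposition SST = SSR + SSE and the resulting r² can be verified numerically. The sketch below uses a small made-up dataset whose least squares line is ŷ = 1.2x (data and fit are assumptions for illustration).

```python
# Toy data (illustrative only); least squares fit is yhat = 1.2 * x
x = [1, 2, 3, 4]
y = [1, 3, 3, 5]
yhat = [1.2 * xi for xi in x]
ybar = sum(y) / len(y)

sst = sum((yi - ybar) ** 2 for yi in y)               # total
ssr = sum((fi - ybar) ** 2 for fi in yhat)            # explained by regression
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, yhat))  # error
r2 = ssr / sst                                        # coefficient of determination
print(round(sst, 4), round(ssr, 4), round(sse, 4), round(r2, 4))
```

For these points SST = 8 = SSR + SSE = 7.2 + 0.8, so r² = 7.2/8 = 0.9: the line explains 90% of the variability in y.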

## Residuals

A residual is how far off (the distance) an individual point is from the estimated regression line:

residual = observed value − predicted value

## SAS Code for Regression

```
PROC REG DATA=datasetname SIMPLE;
  MODEL depvar = indvar(s);
  PLOT depvar * indvar;
RUN;
```

There are several options on the MODEL and PLOT statements.

## Options on the MODEL Statement

```
MODEL depvar = indvar(s) / options;
```

| Option | What it does |
|--------|--------------|
| CLB | 95% CI for β₁ |
| P | Predicted values |
| R | Residuals |
| CLM | 95% CI for the mean at a value of x |

## Output from PROC REG

```
Dependent Variable: quarsales

                     Analysis of Variance
                            Sum of        Mean
Source           DF        Squares      Square   F Value   Pr > F
Model             1          14200       14200     74.25   <.0001
Error             8           1530   191.25000
Corrected Total   9          15730

Root MSE          13.82932    R-Square   0.9027
Dependent Mean   130.00000    Coeff Var 10.63794
```

Here the Model sum of squares is SSR (14200), the Error sum of squares is SSE (1530), and the Corrected Total is SST (15730). The Error mean square (191.25) is the MSE, and R-Square is the coefficient of determination: 14200/15730 = 0.9027.

## Parameter Estimates

```
                      Parameter     Standard
Variable      DF       Estimate        Error   t Value   Pr > |t|
Intercept      1       60.00000      9.22603      6.50     0.0002
studentpop     1        5.00000      0.58027      8.62     <.0001
```

Regression equation: ŷ = 60.0 + 5.0x, i.e., QUARSALES = 60 + 5*STUDENTPOP. The slope estimate b₁ is 5.00000 with standard error SE(b₁) = 0.58027.
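Every derived quantity in this output can be reproduced from the printed sums of squares and estimates. The sketch below re-derives Root MSE, R-Square, the F value, and both t values from the numbers on these two slides.

```python
import math

# Sums of squares from the Analysis of Variance table in this deck
ssr, sse, sst = 14200.0, 1530.0, 15730.0
mse = sse / 8  # Error df = n - 2 = 8

root_mse = math.sqrt(mse)       # Root MSE
r_square = ssr / sst            # R-Square (coefficient of determination)
f_value = ssr / mse             # F Value (Model MS / Error MS, Model df = 1)
t_intercept = 60.0 / 9.22603    # intercept t = estimate / SE
t_slope = 5.0 / 0.58027         # slope t = b1 / SE(b1)

print(round(root_mse, 5))  # 13.82932
print(round(r_square, 4))  # 0.9027
print(round(f_value, 2))   # 74.25
print(round(t_slope, 2))   # 8.62
```

Each value matches the corresponding entry in the PROC REG output.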

