Simple Linear Regression Statistics 515 Lecture
Example for Illustration The human body takes in more oxygen when exercising than when it is at rest. To deliver oxygen to the muscles, the heart must beat faster. Heart rate is easy to measure, but measuring oxygen uptake requires elaborate equipment. If oxygen uptake (VO2) can be accurately predicted from heart rate (HR), the predicted values may replace actually measured values for various research purposes. Unfortunately, not all human bodies are the same, so no single prediction equation works for all people. Researchers can, however, measure both HR and VO2 for one person under varying sets of exercise conditions and calculate a regression equation for predicting that person’s oxygen uptake from heart rate. 12/3/2018 Simple Linear Regression
Data From An Individual
Goals in this illustration:
1. Scatterplot: is the relationship linear or not?
2. Obtain the best-fitting line using least squares.
3. Test whether the model is significant.
4. Obtain a confidence interval for the regression coefficient.
5. Obtain predictions.
The Scatterplot
Simple Linear Regression Model
1. Conditional on $X = x$, the response variable $Y$ has mean $\mu(x) = \alpha + \beta x$.
2. $\alpha$ is the y-intercept and $\beta$ is the slope of the regression line; $\beta$ can be interpreted as the change in the mean value of $Y$ per unit change in the independent variable.
3. For each $X = x$, the conditional distribution of $Y$ is normal with mean $\mu(x)$ and variance $\sigma^2$.
4. $Y_1, Y_2, \ldots, Y_n$ are independent of each other.
Shorthand: $Y_i = \alpha + \beta x_i + \epsilon_i$ with $\epsilon_i$ IID $N(0, \sigma^2)$.
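The model assumptions can be made concrete by simulating from them. The following is a minimal sketch, not part of the lecture: the function name `simulate_slr`, the parameter values, and the predictor values are all hypothetical and chosen only for illustration.

```python
import random


def simulate_slr(alpha, beta, sigma, xs, seed=None):
    """Draw Y_i = alpha + beta * x_i + e_i for each x_i, with e_i IID N(0, sigma^2)."""
    rng = random.Random(seed)  # private generator so results are reproducible
    return [alpha + beta * x + rng.gauss(0.0, sigma) for x in xs]


# Hypothetical parameter and predictor values (NOT the lecture's data).
xs = [90, 100, 110, 120, 130]
ys = simulate_slr(alpha=-1.0, beta=0.02, sigma=0.1, xs=xs, seed=515)

for x, y in zip(xs, ys):
    print(f"x = {x:5.1f}   simulated y = {y:6.3f}")
```

Each simulated response scatters around the line $\alpha + \beta x$ with the same standard deviation at every $x$, which is what assumptions 1 through 4 describe.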
Least-Squares (LS) Regression
One of the goals in regression analysis is to estimate the parameters $\alpha$, $\beta$, and $\sigma^2$ of the regression model. Denote by
$$\hat{y} = a + bx$$
the estimate of the regression line, so that $a$ estimates $\alpha$ and $b$ estimates $\beta$. Then, for the observed values $x_1, x_2, \ldots, x_n$ of $X$, we may obtain the predicted value of the response variable $Y$ at each of these $X$-values. These are:
$$\hat{y}_i = a + b x_i, \quad i = 1, 2, \ldots, n.$$
Predicted Values
A good estimate of the regression line should produce predicted values that are close to the observed values of the response variable. That is, the deviations
$$y_i - \hat{y}_i, \quad i = 1, 2, \ldots, n,$$
of the observed values from the predicted values should ideally be close (if not equal) to zero. These deviations are called residuals.
Principle of Least-Squares (LS)
In least-squares regression, the best-fitting regression line is the one that makes the sum of the squared deviations (residuals) as small as possible. Thus, the regression coefficients $a$ and $b$ are chosen to minimize the quantity
$$Q(a, b) = \sum_{i=1}^{n} (y_i - a - b x_i)^2.$$
Using calculus, the values of $a$ and $b$ that minimize this quantity are given by:
$$b = \frac{S_{XY}}{S_{XX}}, \qquad a = \bar{y} - b\bar{x},$$
where $S_{XX} = \sum_{i=1}^{n} (x_i - \bar{x})^2$ and $S_{XY} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$.
Least-Squares Solution
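The least-squares solution can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `least_squares_fit` is hypothetical, and the five data points are made up for checking the arithmetic by hand; they are not the HR and VO2 measurements from the lecture.

```python
def least_squares_fit(xs, ys):
    """Return (a, b), the least-squares intercept and slope for y-hat = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Corrected sums of squares and cross-products.
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = s_xy / s_xx          # slope: S_XY / S_XX
    a = y_bar - b * x_bar    # intercept: y-bar minus b times x-bar
    return a, b


# Made-up toy data (NOT the lecture's data).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = least_squares_fit(xs, ys)
print(f"fitted line: y-hat = {a:.2f} + {b:.2f} x")
```

For the toy data, $\bar{x} = 3$, $\bar{y} = 4$, $S_{XX} = 10$, and $S_{XY} = 6$, so the slope is 0.6 and the intercept is $4 - 0.6 \times 3 = 2.2$.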
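Estimating the Variance
$$S_{YY} = \sum_{i=1}^{n} (y_i - \bar{y})^2, \qquad SSR = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2, \qquad SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2,$$
with $S_{YY} = SSR + SSE$. The variance $\sigma^2$ is estimated by $MSE = SSE/(n-2)$.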
Interpretations of Quantities
SSE: measures the variation not explained by the predictor variable.
SSR: measures the amount of variation explained by the predictor variable.
$S_{YY}$: total variation in the Y-values. This is partitioned into SSR and SSE.
$R^2 = SSR/S_{YY}$: coefficient of determination; indicates the proportion of variation in the Y-values explained by the predictor variable.
$MSE = SSE/(n-2)$: the mean-squared error. This provides an unbiased estimate of the common variance $\sigma^2$.
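The quantities above can be computed directly from their definitions. The sketch below is illustrative only: the function name `variation_summary` and the toy data are hypothetical and are not taken from the lecture.

```python
def variation_summary(xs, ys):
    """Return SSE, SSR, SYY, R^2 and MSE for the least-squares line through (xs, ys)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    fitted = [a + b * x for x in xs]

    sse = sum((y - f) ** 2 for y, f in zip(ys, fitted))  # unexplained variation
    ssr = sum((f - y_bar) ** 2 for f in fitted)          # explained variation
    syy = sum((y - y_bar) ** 2 for y in ys)              # total variation
    return {
        "SSE": sse,
        "SSR": ssr,
        "SYY": syy,
        "R2": ssr / syy,
        "MSE": sse / (n - 2),  # n - 2 because two coefficients were estimated
    }


# Made-up toy data (NOT the lecture's data).
summary = variation_summary([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
for name, value in summary.items():
    print(f"{name:>3} = {value:.3f}")
```

For the toy data the partition is $6 = 3.6 + 2.4$, so $R^2 = 0.6$ and $MSE = 2.4/3 = 0.8$.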
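Sampling Distributions of Estimators
Under the model assumptions, with $S_{XX} = \sum_{i=1}^{n} (x_i - \bar{x})^2$,
$$b \sim N\!\left(\beta, \frac{\sigma^2}{S_{XX}}\right), \qquad a \sim N\!\left(\alpha, \sigma^2\left[\frac{1}{n} + \frac{\bar{x}^2}{S_{XX}}\right]\right).$$
To estimate these variances, $\sigma^2$ is replaced by the MSE.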
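Testing Hypotheses
To test the null hypothesis $H_0: \beta = \beta_0$ versus $H_1: \beta \ne \beta_0$, we use the t-statistic
$$T_c = \frac{b - \beta_0}{\sqrt{MSE / S_{XX}}},$$
where $S_{XX} = \sum_{i=1}^{n} (x_i - \bar{x})^2$. Under the null hypothesis, $T_c$ follows a t-distribution with $n-2$ degrees of freedom. Thus, we reject $H_0$ if $|T_c| > t_{n-2;\alpha/2}$, where the $\alpha$ in the subscript is the significance level, not the intercept. Similarly, for testing $H_0: \alpha = \alpha_0$, we use:
$$T_c = \frac{a - \alpha_0}{\sqrt{MSE\left[\dfrac{1}{n} + \dfrac{\bar{x}^2}{S_{XX}}\right]}}.$$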
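The slope test can be sketched as follows. This is an illustrative sketch under stated assumptions: the function name `slope_t_statistic` and the toy data are hypothetical, and the critical value 3.182 (the upper 2.5% point of the t-distribution with 3 degrees of freedom) is read from a t-table rather than computed, so that only the standard library is needed.

```python
import math


def slope_t_statistic(xs, ys, beta0=0.0):
    """Return (t, df) for testing H0: beta = beta0 in simple linear regression."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    mse = sse / (n - 2)
    se_b = math.sqrt(mse / s_xx)  # estimated standard error of the slope
    return (b - beta0) / se_b, n - 2


# Made-up toy data (NOT the lecture's data).
t_stat, df = slope_t_statistic([1, 2, 3, 4, 5], [2, 4, 5, 4, 5], beta0=0.0)

T_CRIT = 3.182  # t-table value for df = 3 at two-sided level 0.05
reject = abs(t_stat) > T_CRIT
print(f"t = {t_stat:.3f} on {df} df; reject H0 at the 5% level: {reject}")
```

With only five observations the slope estimate of 0.6 has a standard error of about 0.283, so the statistic of about 2.12 falls short of the critical value and $H_0: \beta = 0$ is not rejected for the toy data.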
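Confidence Interval for the Mean and Prediction of the Value of Y for a New Unit
Estimate of the mean and predicted value at $x_0$:
$$\hat{y}(x_0) = a + b x_0.$$
Estimated variances, with $S_{XX} = \sum_{i=1}^{n} (x_i - \bar{x})^2$:
$$\text{for the mean: } MSE\left[\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{XX}}\right], \qquad \text{for a new value: } MSE\left[1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{XX}}\right].$$
CI for $\mu(x_0)$:
$$\hat{y}(x_0) \pm t_{n-2;\alpha/2} \sqrt{MSE\left[\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{XX}}\right]}.$$
Prediction interval for $Y(x_0)$:
$$\hat{y}(x_0) \pm t_{n-2;\alpha/2} \sqrt{MSE\left[1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{XX}}\right]}.$$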
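Both intervals at a point $x_0$ can be sketched in Python. This is an illustrative sketch under stated assumptions: the function name `intervals_at` and the toy data are hypothetical, and the t critical value is passed in from a t-table (3.182 for 3 degrees of freedom at the 95% level) rather than computed.

```python
import math


def intervals_at(xs, ys, x0, t_crit):
    """Return (fit, ci, pi) at x0: the fitted value, the confidence interval for
    the mean response, and the prediction interval for a new observation."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    mse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys)) / (n - 2)

    fit = a + b * x0
    leverage = 1.0 / n + (x0 - x_bar) ** 2 / s_xx
    half_ci = t_crit * math.sqrt(mse * leverage)          # uncertainty in the mean
    half_pi = t_crit * math.sqrt(mse * (1.0 + leverage))  # adds new-unit noise
    return fit, (fit - half_ci, fit + half_ci), (fit - half_pi, fit + half_pi)


# Made-up toy data (NOT the lecture's data); 3.182 is the t-table value for 3 df.
fit, ci, pi = intervals_at([1, 2, 3, 4, 5], [2, 4, 5, 4, 5], x0=3, t_crit=3.182)
print(f"fit = {fit:.3f}")
print(f"95% CI for the mean:     ({ci[0]:.3f}, {ci[1]:.3f})")
print(f"95% prediction interval: ({pi[0]:.3f}, {pi[1]:.3f})")
```

The prediction interval is always wider than the confidence interval at the same $x_0$, because it includes the variance $\sigma^2$ of the new observation in addition to the uncertainty in the estimated line.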
Results of Regression Analysis (using Minitab)
[Minitab output, annotated with the labels: Prediction Line; P-value for regression; P-Value; Coefficient of Determination; (MSR)/(MSE)]
Fitted Line on the Scatterplot
Confidence Interval for Mean and Prediction Interval
The confidence interval is for predicting the mean value; the prediction interval is for predicting the value of the response.