Simple Linear Regression

G. Baker, Department of Statistics, University of South Carolina

Slide 2: Relationship Between Two Quantitative Variables
If we can model the relationship between two quantitative variables, we can use one variable, X, to predict another variable, Y.
– Use height to predict weight.
– Use the percentage of hardwood in pulp to predict the tensile strength of paper.
– Use the square footage of warehouse space to predict the monthly rental cost.

Slide 3: Relationship Between Two Quantitative Variables
We use data to create the model.
– Observational study: the height and weight example; the warehouse square footage and rental cost example.
– Designed experiment: the percentage of hardwood and tensile strength of paper example.

Slide 4: Simple Linear Regression
Simple: only one predictor variable.
Linear: a straight-line relationship.
Regression: fit the data to a (straight-line) model.
y: response, or dependent, variable. x: predictor, regressor, or independent variable.

Slide 5: Use a Scatter Plot to See the Relationship
[Figure: scatter plot with axes labeled Predictor and Response]

Slide 6: Absorbed Liquid Data
In a chemical process, batches of liquid are passed through a bed containing an ingredient that is absorbed by the liquid. We will attempt to relate the absorbed percentage of the ingredient (y) to the amount of liquid in the batch (x). Exercise 6.11 in text.

Slide 7: Absorbed Liquid Data

Slide 8: Absorbed Liquid Data

Slide 9
Abs% = -1822 + 435(Amt)
The regression line, or model, is deterministic: it gives exactly one value of Abs% for each value of Amt.

Slide 10
We are going to use a probabilistic model, which accounts for the variation around the line.

Slide 11: Probabilistic Model
Probabilistic model: a deterministic component plus an error component for unexplained variation.

Slide 12: Regression Equation
y = deterministic model + random error = $\beta_0 + \beta_1 x + \varepsilon$
$\beta_0$ = y-intercept; $\beta_1$ = slope; $\varepsilon$ = random error.
The regression line is an estimate of the mean value of y at a given value of x.
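
A minimal Python sketch of the probabilistic model, assuming normally distributed errors and hypothetical parameter values (BETA0, BETA1, and SIGMA are illustrative, not estimates from the absorbed liquid data). At a fixed x, individual y values scatter around the line, but their average settles near $\beta_0 + \beta_1 x$, which is why the regression line is read as the mean of y at that x.

```python
import random

# Hypothetical parameters, chosen only for illustration
BETA0 = 2.0  # y-intercept, beta_0
BETA1 = 0.5  # slope, beta_1
SIGMA = 1.0  # standard deviation of the random error, epsilon (assumed normal)


def draw_y(x: float, rng: random.Random) -> float:
    """One observation from y = beta_0 + beta_1 * x + epsilon, epsilon ~ N(0, SIGMA^2)."""
    deterministic_part = BETA0 + BETA1 * x
    random_error = rng.gauss(0.0, SIGMA)
    return deterministic_part + random_error


def average_y(x: float, n: int = 20_000, seed: int = 1) -> float:
    """Average of n simulated y values at a fixed x; approximates E(y) at that x."""
    rng = random.Random(seed)
    return sum(draw_y(x, rng) for _ in range(n)) / n


if __name__ == "__main__":
    rng = random.Random(7)
    for x in (0.0, 4.0, 10.0):
        line = BETA0 + BETA1 * x
        draws = [round(draw_y(x, rng), 2) for _ in range(3)]
        print(f"x={x:4.1f}  line={line:5.2f}  three draws={draws}  mean of many={average_y(x):5.2f}")
```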

Slide 13: Probabilistic Model
Probabilistic model: a deterministic component plus an error component for unexplained variation.

Slide 14: Estimating $\beta_0$ and $\beta_1$
Once we determine that a straight-line model is reasonable, we want to establish the best line by estimating $\beta_0$ and $\beta_1$ in
$\mu = E(y) = \beta_0 + \beta_1 x$
– $\beta_1$ is the slope: the amount by which the mean of y changes for a one-unit increase in x.
– $\beta_0$ is the y-intercept: the expected (mean) value of y when x = 0. (This may or may not be meaningful.)

Slide 15
If Amount goes up by 1 unit, Absorb% is expected to go up by 435 percentage points.
If Amount = 0, the expected Absorb% is -1822%. A negative percentage is impossible, so the intercept has no practical meaning here.

Slide 16: Absorbed Liquid Data
Do not use the model for x values outside the range of the data.

Slide 17: Errors of Prediction = Vertical Distances Between the Points and the Line

Slide 18: Method of Least Squares
– For the least squares line, the sum of the prediction errors is 0.
– Sum of the squared errors = sum of squares for error = SSE:
$SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} [y_i - (\hat\beta_0 + \hat\beta_1 x_i)]^2$
– Many lines have a sum of errors equal to 0.
– Only one line minimizes SSE.
– Least squares line = regression line = the line for which SSE is minimized.
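
The claims on this slide can be checked numerically. This Python sketch uses a small hypothetical data set (not the absorbed liquid data) and the least squares slope formula from Slide 20. Every line through the point $(\bar{x}, \bar{y})$ has prediction errors that sum to 0, but the least squares line has a much smaller SSE than the other lines tried.

```python
# Hypothetical illustrative data (not the absorbed-liquid data from the slides)
XS = [1.0, 2.0, 3.0, 4.0, 5.0]
YS = [2.1, 3.9, 6.2, 7.8, 10.1]

X_BAR = sum(XS) / len(XS)
Y_BAR = sum(YS) / len(YS)


def errors(b0: float, b1: float) -> list[float]:
    """Prediction errors y_i - (b0 + b1 * x_i) for a candidate line."""
    return [y - (b0 + b1 * x) for x, y in zip(XS, YS)]


def sse(b0: float, b1: float) -> float:
    """Sum of squared prediction errors for a candidate line."""
    return sum(e * e for e in errors(b0, b1))


def line_through_means(b1: float) -> tuple[float, float]:
    """(intercept, slope) of the line with slope b1 passing through (x_bar, y_bar)."""
    return Y_BAR - b1 * X_BAR, b1


# Least squares slope SS_xy / SS_xx
SS_XX = sum((x - X_BAR) ** 2 for x in XS)
SS_XY = sum((x - X_BAR) * (y - Y_BAR) for x, y in zip(XS, YS))
B1_LS = SS_XY / SS_XX

if __name__ == "__main__":
    for slope in (B1_LS, 1.0, 3.0):
        b0, b1 = line_through_means(slope)
        print(f"slope={b1:5.2f}  sum of errors={sum(errors(b0, b1)):+.1e}  SSE={sse(b0, b1):8.3f}")
```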

Slide 19: Least Squares Estimates
Deviation of the ith point from its estimated value: $y_i - \hat{y}_i = y_i - (\hat\beta_0 + \hat\beta_1 x_i)$
Sum of the squared deviations for all n points: $SSE = \sum_{i=1}^{n} [y_i - (\hat\beta_0 + \hat\beta_1 x_i)]^2$
The values of $\hat\beta_0$ and $\hat\beta_1$ that minimize SSE are called the least squares estimates. Under the model assumptions (including normally distributed errors), they are minimum variance unbiased estimates.

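Slide 20: Formulas for Least Squares Estimates
$\hat\beta_1 = \frac{SS_{xy}}{SS_{xx}}, \qquad \hat\beta_0 = \bar{y} - \hat\beta_1 \bar{x}$
where
$SS_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}), \qquad SS_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2$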
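
The formulas translate directly into code. This Python sketch is an illustration, not a reference implementation; the function name and the data in the usage example are hypothetical.

```python
def least_squares(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Return (b0, b1) with b1 = SS_xy / SS_xx and b0 = y_bar - b1 * x_bar."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    ss_xx = sum((x - x_bar) ** 2 for x in xs)
    ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    if ss_xx == 0:
        raise ValueError("all x values are equal, so the slope is undefined")
    b1 = ss_xy / ss_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1


if __name__ == "__main__":
    # Hypothetical illustrative data
    b0, b1 = least_squares([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1])
    print(f"y-hat = {b0:.2f} + {b1:.2f} x")  # y-hat = 0.05 + 1.99 x
```

On Python 3.10 and later, statistics.linear_regression(xs, ys) returns the slope and intercept as a named tuple, which makes a convenient cross-check.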

Slide 21: Estimate of the Variance $\sigma^2$ at Each x
$s^2 = \frac{SSE}{n-2}$
$s = \sqrt{s^2}$ is the estimated standard error of the regression model.
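
A short Python sketch of $s^2 = SSE/(n-2)$ and s, reusing the least squares formulas from Slide 20. The divisor n - 2 reflects the two parameters estimated from the data. The function name is hypothetical, and the data are illustrative.

```python
import math


def mse_and_root_mse(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Return (s^2, s), where s^2 = SSE / (n - 2) for the least squares line."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need equal-length data with at least 3 points so that n - 2 > 0")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    ss_xx = sum((x - x_bar) ** 2 for x in xs)
    ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = ss_xy / ss_xx
    b0 = y_bar - b1 * x_bar
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    s2 = sse / (n - 2)  # two degrees of freedom are used up estimating beta_0 and beta_1
    return s2, math.sqrt(s2)


if __name__ == "__main__":
    s2, s = mse_and_root_mse([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1])  # hypothetical data
    print(f"MSE = s^2 = {s2:.5f}, Root MSE = s = {s:.5f}")
```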

Slide 22: MSE and Root MSE

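Slide 23: Sampling Distribution of $\hat\beta_1$
Under the model assumptions, $\hat\beta_1$ is normally distributed with mean $\beta_1$ and standard deviation $\sigma/\sqrt{SS_{xx}}$.
Standard error of $\hat\beta_1$: $s_{\hat\beta_1} = \frac{s}{\sqrt{SS_{xx}}}$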

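Slide 24: Test of Model Utility
$H_0: \beta_1 = 0 \qquad H_a: \beta_1 \neq 0$
Test statistic: $t = \frac{\hat\beta_1}{s_{\hat\beta_1}} = \frac{\hat\beta_1}{s/\sqrt{SS_{xx}}}$, with n - 2 degrees of freedom.
$(1-\alpha)100\%$ confidence interval for $\beta_1$: $\hat\beta_1 \pm t_{\alpha/2}\, s_{\hat\beta_1}$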
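
A Python sketch of the model utility test and the confidence interval for $\beta_1$. Python's standard library has no t quantile function, so the sketch assumes the caller supplies $t_{\alpha/2}$ with n - 2 degrees of freedom from a t table (3.182 for $\alpha = 0.05$ and 3 degrees of freedom). The function name and data are hypothetical.

```python
import math


def slope_inference(xs: list[float], ys: list[float], t_crit: float):
    """Return (b1, s_b1, t, (lower, upper)) for H0: beta_1 = 0 vs Ha: beta_1 != 0.

    t_crit is t_{alpha/2} with n - 2 degrees of freedom, supplied by the caller
    from a t table (the standard library has no t quantile function).
    """
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    ss_xx = sum((x - x_bar) ** 2 for x in xs)
    ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = ss_xy / ss_xx
    b0 = y_bar - b1 * x_bar
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(sse / (n - 2))
    s_b1 = s / math.sqrt(ss_xx)  # standard error of b1 (Slide 23)
    t = b1 / s_b1  # test statistic for H0: beta_1 = 0
    return b1, s_b1, t, (b1 - t_crit * s_b1, b1 + t_crit * s_b1)


if __name__ == "__main__":
    # Hypothetical data, n = 5: df = 3 and t_{0.025} = 3.182 for a 95% interval
    b1, s_b1, t, (lo, hi) = slope_inference([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1], 3.182)
    print(f"b1 = {b1:.3f}, SE = {s_b1:.4f}, t = {t:.2f}, 95% CI = ({lo:.3f}, {hi:.3f})")
    # Reject H0 at the 5% level when |t| > t_crit, equivalently when the CI excludes 0
    print("reject H0" if abs(t) > 3.182 else "do not reject H0")
```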

Slide 25: Amt and Absorb%

Predictor   Coef.   SE Coef.   T        P-value
Intercept   -1822   366        -4.978   <0.0001
Slope       435     60         7.25     <0.0001

$H_0: \beta_1 = 0 \qquad H_a: \beta_1 \neq 0$
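
As a quick arithmetic check on the output, each T value is the coefficient divided by its standard error. This Python snippet recomputes the ratios from the rounded values shown in the table; they agree with the reported T values to the displayed precision.

```python
# Coefficients and standard errors copied from the Amt and Absorb% output above
REPORTED = {
    "Intercept": (-1822.0, 366.0),
    "Slope": (435.0, 60.0),
}


def t_ratio(coef: float, se: float) -> float:
    """T = estimate / standard error, the statistic for H0: parameter = 0."""
    return coef / se


if __name__ == "__main__":
    for name, (coef, se) in REPORTED.items():
        print(f"{name}: T = {t_ratio(coef, se):.3f}")
    # Intercept: T = -4.978
    # Slope: T = 7.250
```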

Slide 26: Coefficient of Correlation
Correlation measures the linear relationship between two quantitative variables.
– To get a visual picture, use a scatter plot.
– To assign a numeric value, use the Pearson product moment coefficient of correlation, r.
r is scaleless (it has no units) and varies from -1 to +1.
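
The slide does not show a formula for r; this Python sketch assumes the standard definition $r = SS_{xy}/\sqrt{SS_{xx}\,SS_{yy}}$, in the same SS notation as the least squares formulas. Changing units with a positive multiplier, or shifting the origin, leaves r unchanged, which is what "scaleless" means. The data are hypothetical.

```python
import math


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson product moment correlation r = SS_xy / sqrt(SS_xx * SS_yy)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    ss_xx = sum((x - x_bar) ** 2 for x in xs)
    ss_yy = sum((y - y_bar) ** 2 for y in ys)
    ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    if ss_xx == 0 or ss_yy == 0:
        raise ValueError("r is undefined when all x values or all y values are equal")
    return ss_xy / math.sqrt(ss_xx * ss_yy)


if __name__ == "__main__":
    xs = [1.0, 2.0, 3.0, 4.0, 5.0]
    ys = [2.1, 3.9, 6.2, 7.8, 10.1]  # hypothetical data
    print(f"r = {pearson_r(xs, ys):.4f}")
    # Rescaling x by a positive constant and shifting its origin does not change r
    print(f"r = {pearson_r([100 * x + 7 for x in xs], ys):.4f}")
```

On Python 3.10 and later, statistics.correlation(xs, ys) computes the same Pearson coefficient.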

Slide 27: Coefficient of Correlation
[Figure: scatter plots illustrating r = +1 and r = -1]

Slide 28: Coefficient of Correlation
[Figure: scatter plots illustrating r = .95, r = 0, and r = -.80]

Slide 29: Coefficient of Determination
The coefficient of determination, $r^2$, measures the contribution of x to the prediction of y.
Recall: $SS_{yy} = \sum_{i=1}^{n} (y_i - \bar{y})^2$ and $SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$.
– If x makes no contribution to the prediction of y, then $\hat\beta_1 = 0$, $\hat{y}_i = \bar{y}$, and $SSE = SS_{yy}$.
– If x contributes to the prediction of y, then we expect $SSE \ll SS_{yy}$.

Slide 30: Coefficient of Determination
Recall:
– $SS_{yy}$ is the total sample variation around $\bar{y}$.
– SSE is the unexplained sample variability after fitting the regression line.
Proportion of the total sample variation explained by the linear relationship:
$r^2 = \frac{SS_{yy} - SSE}{SS_{yy}} = 1 - \frac{SSE}{SS_{yy}}$

Slide 31: Coefficient of Determination
$r^2$ = the proportion of the total sample variability around $\bar{y}$ that is explained by the linear relationship between y and x. $r^2$ varies from 0 to 1.
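
A Python sketch of $r^2 = 1 - SSE/SS_{yy}$. The second usage case is a hypothetical data set in which x carries no linear information ($SS_{xy} = 0$), so the fitted slope is 0, SSE equals $SS_{yy}$, and $r^2 = 0$. The function name and data are illustrative.

```python
def r_squared(xs: list[float], ys: list[float]) -> float:
    """Coefficient of determination r^2 = 1 - SSE / SS_yy for the least squares line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    ss_xx = sum((x - x_bar) ** 2 for x in xs)
    ss_yy = sum((y - y_bar) ** 2 for y in ys)  # total sample variation around y_bar
    ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = ss_xy / ss_xx
    b0 = y_bar - b1 * x_bar
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))  # unexplained variation
    return 1.0 - sse / ss_yy  # proportion of SS_yy explained by the line


if __name__ == "__main__":
    # Hypothetical data with a strong linear trend
    print(f"{r_squared([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1]):.4f}")
    # Hypothetical data with SS_xy = 0: x makes no contribution, so SSE = SS_yy
    print(f"{r_squared([1, 2, 3, 4], [1, 3, 3, 1]):.4f}")
```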

Slide 32: Using the Model for Estimation
Use the model to estimate the mean value of y, E(y), for a specific value of x.
Evaluating the fitted regression equation at a particular value of x gives a point estimate of E(y) at that value of x.

Slide 33: $(1-\alpha)100\%$ Confidence Interval for E(y) at $x = x_p$
$\hat{y}$ is a statistic, so it has a sampling distribution. Since we are operating under the normal assumption, the confidence interval is the point estimate $\pm\, t_{\alpha/2}$ (standard error of $\hat{y}$):
$\hat{y} \pm t_{\alpha/2}\, s \sqrt{\frac{1}{n} + \frac{(x_p - \bar{x})^2}{SS_{xx}}}$
where $t_{\alpha/2}$ has n - 2 degrees of freedom.
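
A Python sketch of this confidence interval. As with the slope test, t_crit must be looked up for n - 2 degrees of freedom; the function name and data are hypothetical. The $(x_p - \bar{x})^2$ term makes the interval narrowest at $x_p = \bar{x}$ and wider toward the edges of the data.

```python
import math


def mean_response_ci(xs: list[float], ys: list[float], x_p: float, t_crit: float) -> tuple[float, float]:
    """CI for E(y) at x = x_p: y_hat +/- t_crit * s * sqrt(1/n + (x_p - x_bar)^2 / SS_xx).

    t_crit is t_{alpha/2} with n - 2 degrees of freedom, taken from a t table by the caller.
    """
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    ss_xx = sum((x - x_bar) ** 2 for x in xs)
    ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = ss_xy / ss_xx
    b0 = y_bar - b1 * x_bar
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(sse / (n - 2))
    y_hat = b0 + b1 * x_p  # point estimate of E(y) at x_p
    se_mean = s * math.sqrt(1.0 / n + (x_p - x_bar) ** 2 / ss_xx)  # standard error of y_hat
    return y_hat - t_crit * se_mean, y_hat + t_crit * se_mean


if __name__ == "__main__":
    xs = [1.0, 2.0, 3.0, 4.0, 5.0]
    ys = [2.1, 3.9, 6.2, 7.8, 10.1]  # hypothetical data, n = 5
    for x_p in (3.0, 5.0):  # both inside the range of the data
        lo, hi = mean_response_ci(xs, ys, x_p, t_crit=3.182)  # t_{0.025} with 3 df
        print(f"x_p = {x_p}: 95% CI for E(y) = ({lo:.3f}, {hi:.3f})")
```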

Slide 34: Predict a New Individual y Value for a Given x
Individual values have more variation than means.
$(1-\alpha)100\%$ prediction interval for an individual value of y at $x = x_p$:
$\hat{y} \pm t_{\alpha/2}\, s \sqrt{1 + \frac{1}{n} + \frac{(x_p - \bar{x})^2}{SS_{xx}}}$
where $t_{\alpha/2}$ has n - 2 degrees of freedom.
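
The prediction interval differs from the confidence interval for E(y) only by the extra 1 under the square root, which accounts for the variation of an individual y around its mean. This Python sketch (hypothetical function names and data, caller-supplied t_crit) computes both half-widths at the same $x_p$ so the difference is visible.

```python
import math


def fit_summary(xs: list[float], ys: list[float]):
    """Least squares b0 and b1, plus s, x_bar, SS_xx, and n needed by the interval formulas."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    ss_xx = sum((x - x_bar) ** 2 for x in xs)
    ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = ss_xy / ss_xx
    b0 = y_bar - b1 * x_bar
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    return b0, b1, math.sqrt(sse / (n - 2)), x_bar, ss_xx, n


def prediction_interval(xs, ys, x_p: float, t_crit: float) -> tuple[float, float]:
    """PI for a new y at x_p: y_hat +/- t_crit * s * sqrt(1 + 1/n + (x_p - x_bar)^2 / SS_xx)."""
    b0, b1, s, x_bar, ss_xx, n = fit_summary(xs, ys)
    y_hat = b0 + b1 * x_p
    half = t_crit * s * math.sqrt(1.0 + 1.0 / n + (x_p - x_bar) ** 2 / ss_xx)
    return y_hat - half, y_hat + half


def half_widths(xs, ys, x_p: float, t_crit: float) -> tuple[float, float]:
    """(CI half-width for E(y), PI half-width for a new y) at the same x_p."""
    _, _, s, x_bar, ss_xx, n = fit_summary(xs, ys)
    core = 1.0 / n + (x_p - x_bar) ** 2 / ss_xx
    return t_crit * s * math.sqrt(core), t_crit * s * math.sqrt(1.0 + core)


if __name__ == "__main__":
    xs = [1.0, 2.0, 3.0, 4.0, 5.0]
    ys = [2.1, 3.9, 6.2, 7.8, 10.1]  # hypothetical data, n = 5, t_{0.025} = 3.182 with 3 df
    lo, hi = prediction_interval(xs, ys, 3.0, t_crit=3.182)
    ci_half, pi_half = half_widths(xs, ys, 3.0, t_crit=3.182)
    print(f"95% PI for a new y at x = 3: ({lo:.3f}, {hi:.3f})")
    print(f"half-widths: CI {ci_half:.3f}, PI {pi_half:.3f}")
```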

Slide 35: Confidence and Prediction Bands

Slide 36: Assumptions of a Regression Analysis
The assumptions involve the distribution of the errors.
– Actual errors: $\varepsilon_i = y_i - (\beta_0 + \beta_1 x_i)$
– Estimated errors (residuals): $\hat\varepsilon_i = y_i - \hat{y}_i = y_i - (\hat\beta_0 + \hat\beta_1 x_i)$
Use plots of the residuals to check the assumptions.
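
Because $\beta_0$ and $\beta_1$ are unknown in practice, the actual errors can never be observed; only the residuals can. This Python sketch simulates data from hypothetical true parameters (BETA0, BETA1, SIGMA, and normal errors are all assumptions for illustration) so that both can be printed side by side. The (x, residual) columns are what a residuals-versus-x plot displays; the residuals from a least squares fit always sum to 0, while the actual errors generally do not.

```python
import random

# Hypothetical "true" parameters. With real data these are unknown,
# so the actual errors epsilon_i cannot be computed; only residuals can.
BETA0, BETA1, SIGMA = 1.0, 2.0, 0.5


def errors_and_residuals(n: int = 10, seed: int = 3):
    """Simulate n points; return x values, actual errors, and least squares residuals."""
    rng = random.Random(seed)
    xs = [float(i) for i in range(1, n + 1)]
    errors = [rng.gauss(0.0, SIGMA) for _ in xs]  # actual errors epsilon_i
    ys = [BETA0 + BETA1 * x + e for x, e in zip(xs, errors)]
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    ss_xx = sum((x - x_bar) ** 2 for x in xs)
    ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = ss_xy / ss_xx
    b0 = y_bar - b1 * x_bar
    residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]  # estimated errors
    return xs, errors, residuals


if __name__ == "__main__":
    xs, errors, residuals = errors_and_residuals()
    print("   x     error  residual")
    for x, e, r in zip(xs, errors, residuals):
        print(f"{x:4.0f}  {e:+8.3f}  {r:+8.3f}")
    print(f" sum  {sum(errors):+8.3f}  {sum(residuals):+8.3f}")
```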

Slide 37: There Are Four Assumptions
(1) The mean of the errors is 0 at each value of x.
[Figure: example residual patterns against x values, labeled YES and NO]

Slide 38: StatCrunch Plot of Residuals vs. X Values

Slide 39: There Are Four Assumptions
(2) The variance of the errors is constant across all values of x.
[Figure: example residual patterns against x values, labeled YES and NO]

Slide 40: StatCrunch Plot of Residuals vs. X Values

Slide 41: There Are Four Assumptions
(3) The errors have a normal distribution at each x.
[Figure: examples labeled YES and NO]

Slide 42: StatCrunch QQ Plot of Residuals

Slide 43: There Are Four Assumptions
(4) The errors are independent. Checking this requires knowing how the data were gathered (for example, by time or by person).
[Figure: examples labeled YES and NO]
