## Presentation on theme: "BCOR 1020 Business Statistics"— Presentation transcript:

Lecture 25 – April 22, 2008

Overview Chapter 12 – Linear Regression
Ordinary Least Squares Formulas; Tests for Significance; Analysis of Variance: Overall Fit; Confidence and Prediction Intervals for Y; Example(s)

Chapter 12 – Ordinary Least Squares Formulas
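Slope and Intercept: The ordinary least squares (OLS) method estimates the slope and intercept of the regression line so that the residuals are small. Recall that the residuals are the differences between the observed y-values and the fitted y-values on the line: $e_i = y_i - \hat{y}_i$. The sum of the residuals is 0 for any line through the point of means $(\bar{x}, \bar{y})$, so it cannot identify the best line. Instead, we consider the sum of the squared residuals (the SSE): $SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i)^2$.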

Chapter 12 – Ordinary Least Squares Formulas
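Slope and Intercept: To find our OLS estimators, we need to find the values of $b_0$ and $b_1$ that minimize the SSE. The OLS estimator for the slope is $b_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}$ or, equivalently, $b_1 = \frac{\sum x_i y_i - n\bar{x}\bar{y}}{\sum x_i^2 - n\bar{x}^2}$. The OLS estimator for the intercept is $b_0 = \bar{y} - b_1 \bar{x}$. These are computed by the regression function on your computer or calculator.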
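The slope and intercept formulas can be sketched in a few lines of Python. This is only an illustration: the five data points and the function name `ols_fit` are made up for the example and are not the textbook's data.

```python
# Hypothetical illustrative data (NOT the textbook's ShipCost dataset).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]


def ols_fit(x, y):
    """Return (b0, b1), the intercept and slope that minimize the SSE."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Numerator and denominator of the slope formula.
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = ss_xy / ss_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1


b0, b1 = ols_fit(x, y)
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
sse = sum(e ** 2 for e in residuals)

print(f"b0 = {b0:.3f}, b1 = {b1:.3f}")
print(f"sum of residuals = {sum(residuals):.6f}, SSE = {sse:.3f}")
```

The residuals of the fitted line sum to zero (up to floating-point error), which is why the squared residuals are minimized instead.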

Chapter 12 – Ordinary Least Squares Formulas
Example (Regression Output): We will consider the dataset “ShipCost” from your text (12.19 on p.438) which considers the relationship between Number of Orders (X) and Shipping Costs (Y). Using MegaStat we can generate a regression output (in handout)… Demonstration in Excel…

Chapter 12 – Ordinary Least Squares Formulas
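Example (Regression Output):

Regression Analysis

| Statistic | Value |
|---|---|
| r² | 0.672 |
| r | 0.820 |
| Std. Error | |
| n | 12 |
| k | 1 |
| Dep. Var. | Ship Cost (Y) |

ANOVA table

| Source | SS | df | MS | F | p-value |
|---|---|---|---|---|---|
| Regression | 7,340, | 1 | | 20.46 | .0011 |
| Residual | 3,588, | 10 | 358, | | |
| Total | 10,929, | 11 | | | |

Regression output

| variables | coefficients | std. error | t (df=10) | p-value | 95% lower | 95% upper |
|---|---|---|---|---|---|---|
| Intercept | | 1, | -0.029 | .9771 | -2, | 2, |
| Orders (X) | 4.9322 | 1.0905 | 4.523 | .0011 | 2.5024 | 7.3619 |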

Chapter 12 – Ordinary Least Squares Formulas
Assessing Fit: We want to explain the total variation in Y around its mean (SST, the total sum of squares): $SST = \sum (y_i - \bar{y})^2$. The regression sum of squares (SSR) is the explained variation in Y: $SSR = \sum (\hat{y}_i - \bar{y})^2$.

Chapter 12 – Ordinary Least Squares Formulas
Assessing Fit: The error sum of squares (SSE) is the unexplained variation in Y: $SSE = \sum (y_i - \hat{y}_i)^2$. If the fit is good, SSE will be small relative to SST. A perfect fit is indicated by SSE = 0. The magnitude of SSE depends on n and on the units of measurement.

Chapter 12 – Ordinary Least Squares Formulas
Coefficient of Determination: $R^2$ is a measure of relative fit based on a comparison of SSR and SST: $R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}$, so $0 \le R^2 \le 1$. Often expressed as a percent, $R^2 = 1$ (i.e., 100%) indicates a perfect fit. In a bivariate regression, $R^2 = r^2$.
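As a rough check of these definitions, the sketch below computes SST, SSR, SSE and $R^2$ for a small made-up dataset and confirms that $R^2$ equals the squared correlation. The data and variable names are illustrative assumptions, not values from the text.

```python
import math

# Hypothetical illustrative data (not from the textbook).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
ss_xx = sum((xi - x_bar) ** 2 for xi in x)
ss_yy = sum((yi - y_bar) ** 2 for yi in y)
ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# OLS fit and fitted values.
b1 = ss_xy / ss_xx
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * xi for xi in x]

sst = ss_yy                                           # total variation
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)          # explained variation
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # unexplained variation

r_squared = ssr / sst
r = ss_xy / math.sqrt(ss_xx * ss_yy)  # sample correlation coefficient

print(f"SST = {sst:.3f}, SSR = {ssr:.3f}, SSE = {sse:.3f}")
print(f"R^2 = {r_squared:.3f}, r^2 = {r ** 2:.3f}")
```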

Clickers Suppose you have found the regression model
for a given set of bivariate data. If the correlation is r = -0.72, what is the coefficient of determination? (A) (B) (C) (D) (E)
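Because this is a bivariate regression, the coefficient of determination follows directly from the correlation, whatever the fitted model is. The sign of r drops out when it is squared:

$$
R^2 = r^2 = (-0.72)^2 = 0.5184
$$

So about 52% of the variation in Y is explained by X.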

Chapter 12 – Test for Significance
Standard Error of Regression: The standard error ($s_{yx}$) is an overall measure of model fit: $s_{yx} = \sqrt{\frac{SSE}{n-2}}$. If the fitted model’s predictions are perfect (SSE = 0), then $s_{yx} = 0$. Thus, a smaller $s_{yx}$ indicates a better fit. It is used to construct confidence intervals. The magnitude of $s_{yx}$ depends on the units of measurement of Y and on data magnitude.

Chapter 12 – Test for Significance
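Confidence Intervals for Slope and Intercept: Standard error of the slope: $s_{b_1} = \frac{s_{yx}}{\sqrt{\sum (x_i - \bar{x})^2}}$. Standard error of the intercept: $s_{b_0} = s_{yx}\sqrt{\frac{1}{n} + \frac{\bar{x}^2}{\sum (x_i - \bar{x})^2}}$. Confidence interval for the true slope: $b_1 \pm t_{\alpha/2,\,n-2}\, s_{b_1}$. Confidence interval for the true intercept: $b_0 \pm t_{\alpha/2,\,n-2}\, s_{b_0}$.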
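A minimal sketch of these standard errors and 95% confidence intervals, again on made-up data. The critical value 3.182 is the table value of $t_{.025}$ with 3 degrees of freedom, hard-coded here because n = 5 in this hypothetical example.

```python
import math

# Hypothetical illustrative data (not from the textbook).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
T_CRIT = 3.182  # t_{.025} with n - 2 = 3 df, from a t table

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
ss_xx = sum((xi - x_bar) ** 2 for xi in x)
ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

b1 = ss_xy / ss_xx
b0 = y_bar - b1 * x_bar
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

s_yx = math.sqrt(sse / (n - 2))                       # standard error of regression
s_b1 = s_yx / math.sqrt(ss_xx)                        # standard error of the slope
s_b0 = s_yx * math.sqrt(1 / n + x_bar ** 2 / ss_xx)   # standard error of the intercept

slope_ci = (b1 - T_CRIT * s_b1, b1 + T_CRIT * s_b1)
intercept_ci = (b0 - T_CRIT * s_b0, b0 + T_CRIT * s_b0)

print(f"s_yx = {s_yx:.4f}, s_b1 = {s_b1:.4f}, s_b0 = {s_b0:.4f}")
print(f"95% CI for slope:     ({slope_ci[0]:.3f}, {slope_ci[1]:.3f})")
print(f"95% CI for intercept: ({intercept_ci[0]:.3f}, {intercept_ci[1]:.3f})")
```

With only five points the slope interval includes zero, so this toy slope would not be judged significant at the 5% level.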

Chapter 12 – Test for Significance
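Hypothesis Tests: If $\beta_1 = 0$, then X cannot influence Y and the regression model collapses to a constant $\beta_0$ plus random error. The hypotheses to be tested are $H_0: \beta_1 = 0$ versus $H_1: \beta_1 \ne 0$ for the slope, and $H_0: \beta_0 = 0$ versus $H_1: \beta_0 \ne 0$ for the intercept. These are tested in the standard regression output in any statistics package like MegaStat.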

Chapter 12 – Test for Significance
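Hypothesis Tests: A t test is used with $\nu = n - 2$ degrees of freedom. The test statistics for the slope and intercept are $t = \frac{b_1 - 0}{s_{b_1}}$ and $t = \frac{b_0 - 0}{s_{b_0}}$. The critical value $t_{\alpha/2,\,n-2}$ is obtained from Appendix D or Excel for a given $\alpha$. Reject $H_0$ if $|t| > t_{\alpha/2,\,n-2}$ or if p-value < $\alpha$. The p-value is provided in the regression output.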

Chapter 12 – Test for Significance
Example (Regression Output): Let’s revisit the regression output from the dataset “ShipCost” from your text (12.19 on p.438) which considers the relationship between Number of Orders (X) and Shipping Costs (Y). Go through tests for significance on b0 and b1.
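The slope test can be checked by hand from the values printed in the ShipCost output (b1 = 4.9322, standard error 1.0905, n = 12). The sketch below assumes a two-tailed test at α = .05 and uses the rounded table value $t_{.025,10} = 2.228$, so the interval agrees with the output only to about three decimals.

```python
# Values taken from the ShipCost regression output.
b1 = 4.9322          # slope coefficient for Orders (X)
se_b1 = 1.0905       # standard error of the slope
t_intercept = -0.029  # t statistic reported for the intercept
n = 12

# Assumption: two-tailed test at alpha = .05; table value for n - 2 = 10 df.
T_CRIT = 2.228

t_slope = (b1 - 0) / se_b1
ci_lower = b1 - T_CRIT * se_b1
ci_upper = b1 + T_CRIT * se_b1

reject_slope = abs(t_slope) > T_CRIT
reject_intercept = abs(t_intercept) > T_CRIT

print(f"t for slope = {t_slope:.3f}, reject H0: {reject_slope}")
print(f"95% CI for slope: ({ci_lower:.4f}, {ci_upper:.4f})")
print(f"t for intercept = {t_intercept}, reject H0: {reject_intercept}")
```

The slope is significantly different from zero, while the intercept is not, which matches the p-values of .0011 and .9771 in the output.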

Chapter 12 – Analysis of Variance
Decomposition of Variance: To explain the variation in the dependent variable around its mean, use the formula $(y_i - \bar{y}) = (y_i - \hat{y}_i) + (\hat{y}_i - \bar{y})$. The same decomposition holds for the sums of squares: $\sum (y_i - \bar{y})^2 = \sum (y_i - \hat{y}_i)^2 + \sum (\hat{y}_i - \bar{y})^2$. The decomposition of variance is written as SST (total variation around the mean) = SSE (unexplained or error variation) + SSR (variation explained by the regression).

Chapter 12 – Analysis of Variance
F Statistic for Overall Fit: For a bivariate regression, the F statistic is $F = \frac{MSR}{MSE} = \frac{SSR/1}{SSE/(n-2)}$. For a given sample size, a larger F statistic indicates a better fit. Reject $H_0$ if $F > F_{1,\,n-2}$ from Appendix F for a given significance level $\alpha$, or if p-value < $\alpha$.

Chapter 12 – Analysis of Variance
Example (Regression Output): Let’s revisit the regression output from the dataset “ShipCost” from your text (12.19 on p.438) which considers the relationship between Number of Orders (X) and Shipping Costs (Y). Go through the Analysis of Variance (ANOVA) to assess overall fit.
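In a bivariate regression the F statistic equals the square of the slope's t statistic, and it can also be written in terms of $R^2$ as $F = \frac{R^2}{1 - R^2}(n - 2)$. The sketch below checks both against the ShipCost output. The critical value 4.96 is the table value of $F_{1,10}$ at α = .05, an assumed significance level.

```python
# Values taken from the ShipCost regression output.
t_slope = 4.523
r_squared = 0.672
n = 12
f_reported = 20.46

# Assumption: alpha = .05; table value of F with (1, n - 2) = (1, 10) df.
F_CRIT = 4.96

f_from_t = t_slope ** 2
f_from_r2 = r_squared / (1 - r_squared) * (n - 2)

print(f"F from t^2 = {f_from_t:.2f}")
print(f"F from R^2 = {f_from_r2:.2f} (differs slightly because R^2 is rounded)")
print(f"Reject H0 (overall fit is significant): {f_reported > F_CRIT}")
```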

Chapter 12 – Example Example (Exam Scores):
We will consider the dataset “ExamScores” from your text (Table 12.3 on p.434) which considers the relationship between Study Hours (X) and Exam Scores (Y). Generate MegaStat regression output. Output on Overhead…

Clickers If a randomly selected student had studied 12
hours for this exam, what score would this model predict (to the nearest %)? (A) 51% (B) 61% (C) 73% (D) 82%

Clickers Find the p-value on the hypothesis test… (A) 0.0012
(B) (C) (D)

Clickers Recall from Tuesday’s lecture, the critical value for
testing whether the correlation is significant is given by $r_\alpha = \frac{t_{\alpha/2,\,n-2}}{\sqrt{t_{\alpha/2,\,n-2}^2 + n - 2}}$. Compute the critical value and determine whether the correlation is significant using $\alpha$ = 10%. (A) Yes, r is significant. (B) No, r is not significant.

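Clickers – Work… Since n = 10 and $\alpha$ = 10%, $t_{\alpha/2,\,n-2} = t_{.05,8} = 1.860$, so $r_\alpha = \frac{1.860}{\sqrt{1.860^2 + 8}} \approx 0.549$. Take r from the output. Since $|r| > r_\alpha$, we can reject $H_0: \rho = 0$ in favor of $H_1: \rho \ne 0$. Or, using the test statistic $T^* = r\sqrt{\frac{n-2}{1-r^2}}$: since $|T^*| > t_{\alpha/2,\,n-2} = t_{.05,8} = 1.860$, we reach the same conclusion. The correlation is significant.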
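The two equivalent checks can be sketched as follows. The sample size and the critical t value come from the slide. The value of r used in the demonstration (0.60) is a hypothetical placeholder, since the transcript does not preserve the correlation from the output.

```python
import math

n = 10
T_CRIT = 1.860  # t_{.05, 8}, as given on the slide


def critical_r(t_crit, n):
    """Critical value of the sample correlation for a two-tailed test."""
    return t_crit / math.sqrt(t_crit ** 2 + n - 2)


def t_statistic(r, n):
    """Test statistic T* for H0: rho = 0."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))


r_crit = critical_r(T_CRIT, n)
print(f"critical r = {r_crit:.4f}")

# Hypothetical correlation, for illustration only (not from the output).
r_example = 0.60
significant_by_r = abs(r_example) > r_crit
significant_by_t = abs(t_statistic(r_example, n)) > T_CRIT
print(f"|r| > r_crit: {significant_by_r}, |T*| > t_crit: {significant_by_t}")
```

The two criteria always agree, because $|r| > r_\alpha$ is an algebraic rearrangement of $|T^*| > t_{\alpha/2,\,n-2}$.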

Chapter 12 – Confidence & Prediction Intervals for Y
How to Construct an Interval Estimate for Y: The regression line is an estimate of the conditional mean of Y. An interval estimate is used to show a range of likely values around the point estimate. Confidence interval for the conditional mean of Y: $\hat{y}_i \pm t_{\alpha/2,\,n-2}\, s_{yx}\sqrt{\frac{1}{n} + \frac{(x_i - \bar{x})^2}{\sum (x_i - \bar{x})^2}}$.

Chapter 12 – Confidence & Prediction Intervals for Y
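How to Construct an Interval Estimate for Y: The prediction interval for individual values of Y is $\hat{y}_i \pm t_{\alpha/2,\,n-2}\, s_{yx}\sqrt{1 + \frac{1}{n} + \frac{(x_i - \bar{x})^2}{\sum (x_i - \bar{x})^2}}$. Prediction intervals are wider than confidence intervals because individual Y values vary more than the mean of Y.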
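To see how much wider the prediction interval is, the sketch below computes both half-widths at one x value. The data, the chosen point x0 = 4, and the function name `interval_half_widths` are illustrative assumptions. The critical value 3.182 is the table value of $t_{.025}$ with 3 degrees of freedom.

```python
import math

# Hypothetical illustrative data (not from the textbook).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
T_CRIT = 3.182  # t_{.025} with n - 2 = 3 df, from a t table

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
ss_xx = sum((xi - x_bar) ** 2 for xi in x)
ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b1 = ss_xy / ss_xx
b0 = y_bar - b1 * x_bar
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
s_yx = math.sqrt(sse / (n - 2))


def interval_half_widths(x0):
    """Return (CI half-width, PI half-width) for Y at x = x0."""
    leverage = 1 / n + (x0 - x_bar) ** 2 / ss_xx
    ci_half = T_CRIT * s_yx * math.sqrt(leverage)
    pi_half = T_CRIT * s_yx * math.sqrt(1 + leverage)
    return ci_half, pi_half


x0 = 4.0
y_hat = b0 + b1 * x0
ci_half, pi_half = interval_half_widths(x0)

print(f"fitted value at x = {x0}: {y_hat:.3f}")
print(f"95% CI for mean Y:       {y_hat - ci_half:.3f} to {y_hat + ci_half:.3f}")
print(f"95% PI for individual Y: {y_hat - pi_half:.3f} to {y_hat + pi_half:.3f}")
```

Both intervals are narrowest at x0 equal to the mean of x and widen as x0 moves away from it.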

Chapter 12 – Confidence & Prediction Intervals for Y
MegaStat’s Confidence and Prediction Intervals: