1 BCOR 1020 Business Statistics Lecture 25 – April 22, 2008
2 Overview
Chapter 12 – Linear Regression
- Ordinary Least Squares Formulas
- Tests for Significance
- Analysis of Variance: Overall Fit
- Confidence and Prediction Intervals for Y
- Example(s)
3 Chapter 12 – Ordinary Least Squares Formulas
Slope and Intercept: The ordinary least squares (OLS) method estimates the slope and intercept of the regression line so that the residuals are small. Recall that the residuals are the differences between the observed y-values and the fitted y-values on the line. Because positive and negative residuals cancel (the sum of the residuals is 0 for the fitted line), we instead consider the sum of the squared residuals (the SSE).
4 Chapter 12 – Ordinary Least Squares Formulas
Slope and Intercept: To find our OLS estimators, we find the values of b0 and b1 that minimize the SSE.
The OLS estimator for the slope is: b1 = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)²
The OLS estimator for the intercept is: b0 = ȳ − b1x̄, or equivalently b0 = (Σyi − b1Σxi)/n
These are computed by the regression function on your computer or calculator.
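The slope and intercept formulas can be sketched directly in a few lines of code. This is a minimal illustration on made-up numbers, not the ShipCost data from the text:

```python
import numpy as np

# Illustrative made-up data (x = hypothetical predictor, y = hypothetical response).
x = np.array([2.0, 4.0, 5.0, 7.0, 9.0])
y = np.array([10.0, 14.0, 15.0, 19.0, 24.0])

xbar, ybar = x.mean(), y.mean()
# OLS slope: sum of cross-deviations over sum of squared x-deviations.
b1 = np.sum((x - xbar) * (y - ybar)) / np.sum((x - xbar) ** 2)
# OLS intercept: forces the fitted line through the point (x-bar, y-bar).
b0 = ybar - b1 * xbar
```

Any regression routine (MegaStat, Excel, a calculator) performs exactly this arithmetic internally.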
5 Chapter 12 – Ordinary Least Squares Formulas
Example (Regression Output): We will consider the dataset “ShipCost” from your text (12.19 on p. 438), which considers the relationship between Number of Orders (X) and Shipping Costs (Y). Using MegaStat we can generate a regression output (in handout). Demonstration in Excel…
7 Chapter 12 – Ordinary Least Squares Formulas
Assessing Fit: We want to explain the total variation in Y around its mean (SST, for total sum of squares): SST = Σ(yi − ȳ)². The regression sum of squares (SSR) is the explained variation in Y: SSR = Σ(ŷi − ȳ)².
8 Chapter 12 – Ordinary Least Squares Formulas
Assessing Fit: The error sum of squares (SSE) is the unexplained variation in Y: SSE = Σ(yi − ŷi)². If the fit is good, SSE will be relatively small compared to SST. A perfect fit is indicated by SSE = 0. The magnitude of SSE depends on n and on the units of measurement.
9 Chapter 12 – Ordinary Least Squares Formulas
Coefficient of Determination: R² is a measure of relative fit based on a comparison of SSR and SST: R² = SSR/SST = 1 − SSE/SST, with 0 ≤ R² ≤ 1. Often expressed as a percent, an R² = 1 (i.e., 100%) indicates a perfect fit. In a bivariate regression, R² = (r)².
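The SST/SSR/SSE decomposition and the identity R² = r² can be checked numerically. A minimal sketch, again on made-up data rather than any dataset from the text:

```python
import numpy as np

# Illustrative made-up data.
x = np.array([2.0, 4.0, 5.0, 7.0, 9.0])
y = np.array([10.0, 14.0, 15.0, 19.0, 24.0])

# Fit the OLS line.
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
yhat = b0 + b1 * x

SST = np.sum((y - y.mean()) ** 2)     # total variation around the mean
SSR = np.sum((yhat - y.mean()) ** 2)  # variation explained by the regression
SSE = np.sum((y - yhat) ** 2)         # unexplained (error) variation

R2 = SSR / SST                        # coefficient of determination
r = np.corrcoef(x, y)[0, 1]           # sample correlation; R2 == r**2 here
```

SST = SSR + SSE holds exactly (up to rounding), and R² equals the squared correlation in the bivariate case.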
10 Clickers
Suppose you have found the regression model for a given set of bivariate data. If the correlation is r = −0.72, what is the coefficient of determination?
(A) (B) (C) (D) (E)
11 Chapter 12 – Test for Significance
Standard Error of Regression: The standard error (syx) is an overall measure of model fit: syx = √(SSE/(n − 2)). If the fitted model’s predictions are perfect (SSE = 0), then syx = 0; thus, a small syx indicates a better fit. It is used to construct confidence intervals. The magnitude of syx depends on the units of measurement of Y and on data magnitude.
12 Chapter 12 – Test for Significance
Confidence Intervals for Slope and Intercept:
Standard error of the slope: s(b1) = syx / √Σ(xi − x̄)²
Standard error of the intercept: s(b0) = syx √(1/n + x̄²/Σ(xi − x̄)²)
Confidence interval for the true slope: b1 ± tα/2 s(b1)
Confidence interval for the true intercept: b0 ± tα/2 s(b0)
(with tα/2 based on ν = n − 2 degrees of freedom)
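The standard errors and a confidence interval for the slope can be sketched as follows, using the textbook formulas on made-up data (the cross-check against scipy's built-in regression confirms the slope standard error):

```python
import numpy as np
from scipy import stats

# Illustrative made-up data.
x = np.array([2.0, 4.0, 5.0, 7.0, 9.0])
y = np.array([10.0, 14.0, 15.0, 19.0, 24.0])
n = len(x)

Sxx = np.sum((x - x.mean()) ** 2)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / Sxx
b0 = y.mean() - b1 * x.mean()

# Standard error of regression, then of slope and intercept.
syx = np.sqrt(np.sum((y - (b0 + b1 * x)) ** 2) / (n - 2))
sb1 = syx / np.sqrt(Sxx)
sb0 = syx * np.sqrt(1.0 / n + x.mean() ** 2 / Sxx)

# 95% confidence interval for the true slope, using t with n - 2 df.
t = stats.t.ppf(0.975, n - 2)
ci_slope = (b1 - t * sb1, b1 + t * sb1)
```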
13 Chapter 12 – Test for Significance
Hypothesis Tests: If β1 = 0, then X cannot influence Y and the regression model collapses to a constant β0 plus random error. The hypotheses to be tested are H0: β1 = 0 versus H1: β1 ≠ 0 (and similarly for the intercept). These are tested in the standard regression output of any statistics package, such as MegaStat.
14 Chapter 12 – Test for Significance
Hypothesis Tests: A t test is used with ν = n − 2 degrees of freedom. The test statistics for the slope and intercept are t = b1/s(b1) and t = b0/s(b0). The critical value tn−2 is obtained from Appendix D or Excel for a given α. Reject H0 if |t| > tα/2 or if the p-value < α. The p-value is provided in the regression output.
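Statistics packages report this t test automatically. A minimal sketch using scipy's `linregress` on made-up data (not a dataset from the text):

```python
import numpy as np
from scipy import stats

# Illustrative made-up data.
x = np.array([2.0, 4.0, 5.0, 7.0, 9.0])
y = np.array([10.0, 14.0, 15.0, 19.0, 24.0])

res = stats.linregress(x, y)
t_slope = res.slope / res.stderr  # t = b1 / s(b1), with n - 2 df
p_value = res.pvalue              # two-tailed p-value for H0: beta1 = 0
```

Compare `p_value` with α: if it is smaller, reject H0 and conclude the slope is significant.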
15 Chapter 12 – Test for Significance
Example (Regression Output): Let’s revisit the regression output from the dataset “ShipCost” from your text (12.19 on p. 438), which considers the relationship between Number of Orders (X) and Shipping Costs (Y). Go through tests for significance on b0 and b1.
16 Chapter 12 – Analysis of Variance
Decomposition of Variance: To explain the variation in the dependent variable around its mean, use the formula (yi − ȳ) = (ŷi − ȳ) + (yi − ŷi).
The same decomposition for the sums of squares is Σ(yi − ȳ)² = Σ(ŷi − ȳ)² + Σ(yi − ŷi)².
The decomposition of variance is written as SST (total variation around the mean) = SSR (variation explained by the regression) + SSE (unexplained or error variation).
17 Chapter 12 – Analysis of Variance
F Statistic for Overall Fit: For a bivariate regression, the F statistic is F = SSR/(SSE/(n − 2)), i.e., MSR/MSE with 1 and n − 2 degrees of freedom. For a given sample size, a larger F statistic indicates a better fit. Reject H0 if F > F1,n−2 from Appendix F for a given significance level α, or if the p-value < α.
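A short sketch of the F statistic on made-up data; in a bivariate regression, F equals the square of the slope's t statistic, so the ANOVA test and the slope t test always agree:

```python
import numpy as np
from scipy import stats

# Illustrative made-up data.
x = np.array([2.0, 4.0, 5.0, 7.0, 9.0])
y = np.array([10.0, 14.0, 15.0, 19.0, 24.0])
n = len(x)

res = stats.linregress(x, y)
yhat = res.intercept + res.slope * x
SSR = np.sum((yhat - y.mean()) ** 2)
SSE = np.sum((y - yhat) ** 2)

F = SSR / (SSE / (n - 2))          # MSR / MSE, with 1 and n - 2 df
p_value = stats.f.sf(F, 1, n - 2)  # right-tail area of the F distribution
```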
18 Chapter 12 – Analysis of Variance
Example (Regression Output): Let’s revisit the regression output from the dataset “ShipCost” from your text (12.19 on p. 438), which considers the relationship between Number of Orders (X) and Shipping Costs (Y). Go through the Analysis of Variance (ANOVA) to assess overall fit.
19 Chapter 12 – Example
Example (Exam Scores): We will consider the dataset “ExamScores” from your text (Table 12.3 on p. 434), which considers the relationship between Study Hours (X) and Exam Scores (Y). Generate MegaStat regression output. Output on overhead…
20 Clickers
If a randomly selected student had studied 12 hours for this exam, what score would this model predict (to the nearest %)?
(A) 51% (B) 61% (C) 73% (D) 82%
21 Clickers
Find the p-value for the hypothesis test… (A) 0.0012 (B) (C) (D)
22 Clickers
Recall from Tuesday’s lecture that the critical value for testing whether the correlation is significant is given by rα = ± tα/2 / √(tα/2² + n − 2). Compute the critical value and determine whether the correlation is significant using α = 10%.
(A) Yes, r is significant. (B) No, r is not significant.
23 Clickers – Work…
Since n = 10 and α = 10%, tα/2,n−2 = t.05,8 = 1.860. From the output, r = … Since |r| > rα, we can reject H0: ρ = 0 in favor of H1: ρ ≠ 0.
Or, using the test statistic T* = r√(n − 2)/√(1 − r²): since |T*| > tα/2,n−2 = t.05,8 = 1.860, we reach the same conclusion. The correlation is significant.
24 Chapter 12 – Confidence & Prediction Intervals for Y
How to Construct an Interval Estimate for Y: The regression line is an estimate of the conditional mean of Y. An interval estimate is used to show a range of likely values around the point estimate.
Confidence interval for the conditional mean of Y: ŷ ± tα/2 syx √(1/n + (x0 − x̄)²/Σ(xi − x̄)²)
25 Chapter 12 – Confidence & Prediction Intervals for Y
How to Construct an Interval Estimate for Y: The prediction interval for individual values of Y is ŷ ± tα/2 syx √(1 + 1/n + (x0 − x̄)²/Σ(xi − x̄)²). Prediction intervals are wider than confidence intervals because individual Y values vary more than the mean of Y.
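Both intervals can be sketched with the textbook formulas. A minimal illustration on made-up data, evaluated at an arbitrary point x0 (both x0 and the data are hypothetical):

```python
import numpy as np
from scipy import stats

# Illustrative made-up data.
x = np.array([2.0, 4.0, 5.0, 7.0, 9.0])
y = np.array([10.0, 14.0, 15.0, 19.0, 24.0])
n = len(x)

Sxx = np.sum((x - x.mean()) ** 2)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / Sxx
b0 = y.mean() - b1 * x.mean()
syx = np.sqrt(np.sum((y - (b0 + b1 * x)) ** 2) / (n - 2))
t = stats.t.ppf(0.975, n - 2)  # 95% two-tailed t, df = n - 2

x0 = 6.0                       # hypothetical point at which to estimate Y
yhat0 = b0 + b1 * x0
h = 1.0 / n + (x0 - x.mean()) ** 2 / Sxx

# CI for the conditional mean of Y, and PI for an individual Y, at x0.
ci = (yhat0 - t * syx * np.sqrt(h), yhat0 + t * syx * np.sqrt(h))
pi = (yhat0 - t * syx * np.sqrt(1 + h), yhat0 + t * syx * np.sqrt(1 + h))
```

The extra "1 +" inside the prediction-interval square root is what makes the PI wider than the CI at every x0.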
26 Chapter 12 – Confidence & Prediction Intervals for Y
MegaStat’s Confidence and Prediction Intervals: