# BCOR 1020 Business Statistics Lecture 26 – April 24, 2007.



Overview Chapter 12 – Linear Regression

- Violations of Assumptions
- Unusual Observations
- Example(s)

Chapter 12 – Violation of Assumptions Three Important Assumptions:

1. The errors are normally distributed.
2. The errors have constant variance (i.e., they are homoscedastic).
3. The errors are independent (i.e., they are nonautocorrelated).

The error εᵢ is unobservable. The residuals eᵢ from the fitted regression give clues about violations of these assumptions.

Chapter 12 – Violation of Assumptions Histogram of Residuals: Check for non-normality by creating histograms of the residuals or of the standardized residuals (each residual divided by its standard error). Standardized residuals typically fall between -3 and +3 unless there are outliers.
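As a rough illustration of the definition in parentheses above, the sketch below divides a residual by its standard error. It assumes the usual formula s·sqrt(1 − hᵢ) for that standard error (s is the regression standard error, hᵢ the leverage), which the slide does not spell out. The inputs are the residuals, leverages, and Std. Error (18.182) reported for observations 6 and 21 in the example later in this lecture; the function name is hypothetical.

```python
import math

def standardized_residual(residual, leverage, std_error):
    """Divide a residual by its estimated standard error, s * sqrt(1 - h).

    The s * sqrt(1 - h) form is an assumption; the slide only says
    "divided by its standard error".
    """
    return residual / (std_error * math.sqrt(1.0 - leverage))

S = 18.182  # regression standard error from the lecture's MegaStat output

# observation number -> (residual, leverage), taken from the lecture's table
examples = {6: (38.36, 0.123), 21: (42.83, 0.044)}

standardized = {
    obs: standardized_residual(e, h, S) for obs, (e, h) in examples.items()
}
for obs, value in standardized.items():
    print(f"observation {obs}: standardized residual = {value:.3f}")
```

Under that assumption the results agree, to rounding, with the studentized residuals MegaStat lists for the same two observations (2.253 and 2.409).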

Chapter 12 – Violation of Assumptions Normal Probability Plot: The Normal Probability Plot tests the assumption H₀: Errors are normally distributed, against H₁: Errors are not normally distributed. If H₀ is true, the residual probability plot should be approximately linear.
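One way to see what the plot contains is to build its points by hand. The sketch below pairs the ordered residuals from this lecture's example with theoretical normal quantiles and then measures how close the points are to a straight line. The plotting positions (i − 0.5)/n are one common convention and an assumption here (Excel and MegaStat may use a different one), and the helper names are hypothetical.

```python
from statistics import NormalDist, mean

# Residuals for the 24 observations in the lecture's example table.
residuals = [
    -1.46, 19.21, -14.35, -9.08, -17.03, 38.36, 3.72, -20.58,
    13.68, 3.20, -15.34, -11.68, -6.55, 5.56, 18.55, 10.66,
    -11.02, -1.57, -29.98, 6.35, 42.83, -12.69, -15.32, 4.56,
]

def normal_plot_points(values):
    """Return (theoretical normal quantile, ordered value) pairs."""
    ordered = sorted(values)
    n = len(ordered)
    # Plotting positions (i - 0.5) / n: an assumed convention.
    quantiles = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(quantiles, ordered))

def pearson_r(pairs):
    """Pearson correlation between the two coordinates of the pairs."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

points = normal_plot_points(residuals)
linearity = pearson_r(points)
print(f"correlation of the plotted points: {linearity:.3f}")
```

A correlation close to 1 means the points lie near a line, which is what H₀ predicts. The number is only a summary; the plot itself still shows where any departure occurs (for example, in the tails).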

Chapter 12 – Violation of Assumptions Tests for Heteroscedasticity: Plot the residuals against X. Ideally, there is no pattern in the residuals moving from left to right.

Chapter 12 – Violation of Assumptions Tests for Heteroscedasticity: The “fan-out” pattern of increasing residual variance is the most common pattern indicating heteroscedasticity.
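A crude numeric companion to the residual plot is to compare the spread of the residuals on the left and right halves of the plot. The sketch below does this for the predicted values and residuals from this lecture's example. Sorting by predicted value gives the same order as sorting by X because the fitted slope is positive. This is an informal check added for illustration, not a formal test and not a method from the slides; the function name is hypothetical.

```python
from statistics import variance

# (predicted revenue, residual) pairs from the lecture's example table.
fitted_and_residuals = [
    (37.36, -1.46), (135.39, 19.21), (27.15, -14.35), (22.88, -9.08),
    (68.03, -17.03), (106.04, 38.36), (6.88, 3.72), (181.88, -20.58),
    (35.02, 13.68), (9.50, 3.20), (27.94, -15.34), (20.78, -11.68),
    (20.35, -6.55), (10.54, 5.56), (8.95, 18.55), (40.84, 10.66),
    (48.52, -11.02), (42.97, -1.57), (58.58, -29.98), (5.05, 6.35),
    (56.87, 42.83), (24.59, -12.69), (91.62, -15.32), (22.24, 4.56),
]

def spread_ratio(pairs):
    """Residual variance in the upper half of the fitted values
    divided by the residual variance in the lower half."""
    ordered = sorted(pairs)          # sort by fitted value
    half = len(ordered) // 2
    low = [e for _, e in ordered[:half]]
    high = [e for _, e in ordered[half:]]
    return variance(high) / variance(low)

ratio = spread_ratio(fitted_and_residuals)
print(f"upper-half / lower-half residual variance: {ratio:.2f}")
```

A ratio near 1 is consistent with constant variance, while a ratio well above 1 is what a fan-out pattern would produce. With only 12 points per half, treat the number as a rough indication to read alongside the plot.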

Chapter 12 – Violation of Assumptions Example: Consider the plots of the residuals for the dataset Ship Cost… (overhead & handout) Let’s quickly assess whether the regression assumptions are reasonable.

Chapter 12 – Unusual Observations Standardized Residuals: In Excel, use Tools > Data Analysis > Regression. MegaStat gives the same general output as Excel.

Chapter 12 – Unusual Observations Studentized Deleted Residuals: Studentized deleted residuals are another way to identify unusual observations. A studentized deleted residual whose absolute value is 2 or more may be considered unusual. A studentized deleted residual whose absolute value is 3 or more is an outlier.
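The slide does not give a formula, but a studentized deleted residual can be obtained from the ordinary studentized residual r through a standard identity, t = r·sqrt((n − k − 2)/(n − k − 1 − r²)), where n is the number of observations and k the number of predictors. The sketch below applies it to the two largest studentized residuals in this lecture's example (n = 24, k = 1); the function name is hypothetical.

```python
import math

def studentized_deleted(r, n, k):
    """Convert a studentized residual r into a studentized deleted residual.

    n = number of observations, k = number of predictors.
    Uses t = r * sqrt((n - k - 2) / (n - k - 1 - r^2)).
    """
    return r * math.sqrt((n - k - 2) / (n - k - 1 - r * r))

N, K = 24, 1  # sample size and number of predictors in the lecture's example

# Studentized residuals reported for observations 6 and 21.
t6 = studentized_deleted(2.253, N, K)
t21 = studentized_deleted(2.409, N, K)
print(f"observation 6:  {t6:.3f}")
print(f"observation 21: {t21:.3f}")
```

The results match, to rounding, the studentized deleted residuals that MegaStat reports for those observations (2.510 and 2.744). Because the deleted version refits the model without the observation in question, it is always at least as large in absolute value as the ordinary studentized residual when |r| exceeds 1.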

Chapter 12 – Other Regression Problems Outliers: Outliers may be caused by

- an error in recording data
- impossible data
- an observation that has been influenced by an unspecified “lurking” variable that should have been controlled but wasn’t.

To fix the problem,

- delete the data (for a recording error or impossible data)
- formulate a multiple regression model that includes the lurking variable.

Example Consider Data Set B on p. 545 of your text, in which bivariate data are compiled to determine whether there is a relationship between Number of Employees (X) and Revenue (Y) for n = 24 large automotive companies in 1999. A scatter plot of the data follows: Note the high R² value: 86% of the total variation in Y is accounted for by the regression line. The correlation, r = 0.9261, is significant. Now let’s generate the MegaStat regression output…

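Example – Residuals…

| Observation | Revenue | Predicted | Residual | Leverage | Studentized Residual | Studentized Deleted Residual |
|---|---|---|---|---|---|---|
| 1 | 35.90 | 37.36 | -1.46 | 0.043 | -0.082 | -0.080 |
| 2 | 154.60 | 135.39 | 19.21 | 0.223 | 1.198 | 1.211 |
| 3 | 12.80 | 27.15 | -14.35 | 0.050 | -0.810 | -0.803 |
| 4 | 13.80 | 22.88 | -9.08 | 0.054 | -0.514 | -0.505 |
| 5 | 51.00 | 68.03 | -17.03 | 0.052 | -0.962 | -0.960 |
| 6 | 144.40 | 106.04 | 38.36 | 0.123 | 2.253 | 2.510 |
| 7 | 10.60 | 6.88 | 3.72 | 0.077 | 0.213 | 0.208 |
| 8 | 161.30 | 181.88 | -20.58 | 0.461 | -1.542 | -1.595 |
| 9 | 48.70 | 35.02 | 13.68 | 0.045 | 0.770 | 0.763 |
| 10 | 12.70 | 9.50 | 3.20 | 0.072 | 0.183 | 0.179 |
| 11 | 12.60 | 27.94 | -15.34 | 0.049 | -0.866 | -0.860 |
| 12 | 9.10 | 20.78 | -11.68 | 0.056 | -0.661 | -0.653 |
| 13 | 13.80 | 20.35 | -6.55 | 0.057 | -0.371 | -0.364 |
| 14 | 16.10 | 10.54 | 5.56 | 0.071 | 0.317 | 0.311 |
| 15 | 27.50 | 8.95 | 18.55 | 0.073 | 1.060 | 1.063 |
| 16 | 51.50 | 40.84 | 10.66 | 0.042 | 0.599 | 0.590 |
| 17 | 37.50 | 48.52 | -11.02 | 0.042 | -0.619 | -0.610 |
| 18 | 41.40 | 42.97 | -1.57 | 0.042 | -0.088 | -0.086 |
| 19 | 28.60 | 58.58 | -29.98 | 0.045 | -1.687 | -1.767 |
| 20 | 11.40 | 5.05 | 6.35 | 0.080 | 0.364 | 0.357 |
| 21 | 99.70 | 56.87 | 42.83 | 0.044 | 2.409 | 2.744 |
| 22 | 11.90 | 24.59 | -12.69 | 0.052 | -0.717 | -0.709 |
| 23 | 76.30 | 91.62 | -15.32 | 0.089 | -0.883 | -0.878 |
| 24 | 26.80 | 22.24 | 4.56 | 0.055 | 0.258 | 0.252 |

Observations #6 and #21 have unusually large studentized residuals.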
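The screening rule from the Unusual Observations slide (absolute value of 2 or more is unusual, 3 or more is an outlier) can be applied mechanically to the last column of the output above. The sketch below does so; the function and variable names are hypothetical.

```python
# Studentized deleted residuals for observations 1..24, from the output above.
studentized_deleted = [
    -0.080, 1.211, -0.803, -0.505, -0.960, 2.510, 0.208, -1.595,
    0.763, 0.179, -0.860, -0.653, -0.364, 0.311, 1.063, 0.590,
    -0.610, -0.086, -1.767, 0.357, 2.744, -0.709, -0.878, 0.252,
]

def classify(t):
    """Apply the lecture's thresholds to a studentized deleted residual."""
    size = abs(t)
    if size >= 3:
        return "outlier"
    if size >= 2:
        return "unusual"
    return "ordinary"

# Keep only the observations that are not "ordinary" (numbering starts at 1).
flagged = {
    obs: classify(t)
    for obs, t in enumerate(studentized_deleted, start=1)
    if classify(t) != "ordinary"
}
print(flagged)
```

Only observations 6 and 21 are flagged, both as unusual; none reaches the outlier threshold of 3.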

Clickers Based on the portion of the regression output below, how much of the total variation in the y-variable is accounted for by the regression line? (A) 0% (B) 18% (C) 86% (D) 93%

Regression Analysis

| Statistic | Value |
|---|---|
| r² | 0.858 |
| r | 0.926 |
| Std. Error | 18.182 |
| n | 24 |
| k | 1 |
| Dep. Var. | Revenue |

Clickers Based on the regression output below, what can we conclude about the hypothesis test at the 10% level of significance? (A) Reject H₀ in favor of H₁. (B) Fail to reject H₀ in favor of H₁. (C) Not enough information is given.

Regression output

| variables | coefficients | std. error | t (df=22) | p-value | 95% lower | 95% upper |
|---|---|---|---|---|---|---|
| Intercept | 0.8153 | 5.4168 | 0.151 | .8817 | -10.4185 | 12.0490 |
| Employees | 0.3048 | 0.0265 | 11.516 | 8.74E-11 | 0.2499 | 0.3597 |
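The transcript does not show the hypotheses for this question. Assuming the usual two-sided test on the slope, H₀: β₁ = 0 against H₁: β₁ ≠ 0, the sketch below reproduces the reasoning from the Employees row of the output: the t statistic is the coefficient divided by its standard error, and the decision compares the reported p-value with the significance level. The critical value 2.074 is the two-tailed 5% point of the t distribution with 22 degrees of freedom, taken from a standard t table.

```python
# Values from the Employees row of the regression output.
coefficient = 0.3048
std_error = 0.0265
reported_p = 8.74e-11

alpha = 0.10        # significance level stated in the question
t_crit_95 = 2.074   # t table value for df = 22, two-tailed 5% (used for the 95% CI)

# Test statistic for H0: slope = 0 (assumed hypotheses, see the note above).
t_stat = coefficient / std_error

# Decision rule: reject H0 when the p-value is below alpha.
reject = reported_p < alpha

# 95% confidence interval for the slope.
ci = (coefficient - t_crit_95 * std_error,
      coefficient + t_crit_95 * std_error)

print(f"t = {t_stat:.2f}, reject H0 at alpha = {alpha}: {reject}")
print(f"95% CI for the slope: ({ci[0]:.4f}, {ci[1]:.4f})")
```

The computed t differs slightly from the 11.516 in the output because the coefficient and standard error are rounded to four decimals there. The interval excludes zero, which agrees with rejecting H₀ at the 5% level and therefore also at the 10% level.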

Clickers The graph below would best be used to check which regression assumption? (A) The errors are normal. (B) The errors have constant variance. (C) The errors are independent.

FCQ…
