BCOR 1020 Business Statistics Lecture 26 – April 24, 2007.
Overview
Chapter 12 – Linear Regression
- Violations of Assumptions
- Unusual Observations
- Example(s)
Chapter 12 – Violation of Assumptions
Three Important Assumptions:
1. The errors are normally distributed.
2. The errors have constant variance (i.e., they are homoscedastic).
3. The errors are independent (i.e., they are nonautocorrelated).
The error ε_i is unobservable. The residuals e_i from the fitted regression give clues about violations of these assumptions.
Chapter 12 – Violation of Assumptions
Histogram of Residuals: Check for non-normality by creating a histogram of the residuals or of the standardized residuals (each residual divided by its standard error). If the errors are normal, almost all standardized residuals fall between -3 and +3; values outside that range suggest outliers.
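The definition of a standardized residual above can be sketched in a few lines of Python. This is a minimal illustration, not the Excel or MegaStat computation: the data are made up, the function name `standardized_residuals` is hypothetical, and the standard error of each residual is taken to be s·sqrt(1 − h_i), where h_i is the leverage of observation i in a simple regression. The cutoff of 2 used to flag points is also an assumption chosen for this small sample.

```python
import math

# Made-up illustration data (not from the lecture or the textbook).
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [3.1, 4.9, 7.2, 8.8, 11.1, 13.0, 14.8, 17.1, 30.0, 21.0]

def standardized_residuals(x, y):
    """Return e_i / (s * sqrt(1 - h_i)) for a simple linear regression."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx               # least-squares slope
    b0 = ybar - b1 * xbar        # least-squares intercept
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    # Standard error of the regression (n - 2 degrees of freedom).
    s = math.sqrt(sum(e * e for e in resid) / (n - 2))
    # Leverage of each observation in a one-predictor model.
    leverage = [1 / n + (xi - xbar) ** 2 / sxx for xi in x]
    return [e / (s * math.sqrt(1 - h)) for e, h in zip(resid, leverage)]

std_resid = standardized_residuals(x, y)
# Positions (0-based) whose standardized residual exceeds 2 in absolute value.
flagged = [i for i, r in enumerate(std_resid) if abs(r) > 2]
print(flagged)
```

A histogram of `std_resid` is then what the slide describes; the flagged list is a quick numeric stand-in for spotting bars far from zero.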
Chapter 12 – Violation of Assumptions
Normal Probability Plot: The normal probability plot tests the assumption
H0: Errors are normally distributed
H1: Errors are not normally distributed
If H0 is true, the residual probability plot should be linear.
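To make "the plot should be linear" concrete, here is a rough sketch of how the points of a normal probability plot can be built and how straight they are. The residuals are made up, the helper names are hypothetical, and the plotting position (i − 0.5)/n is one common convention among several; software packages may use a slightly different one.

```python
import math
from statistics import NormalDist

def normal_scores(n):
    """Expected standard-normal quantiles at plotting positions (i - 0.5) / n."""
    std_normal = NormalDist()
    return [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]

def pearson_r(a, b):
    """Ordinary Pearson correlation between two equal-length lists."""
    n = len(a)
    abar, bbar = sum(a) / n, sum(b) / n
    sab = sum((ai - abar) * (bi - bbar) for ai, bi in zip(a, b))
    saa = sum((ai - abar) ** 2 for ai in a)
    sbb = sum((bi - bbar) ** 2 for bi in b)
    return sab / math.sqrt(saa * sbb)

def probability_plot_points(residuals):
    """Pair the sorted residuals with their normal scores."""
    ordered = sorted(residuals)
    return normal_scores(len(ordered)), ordered

# Made-up residuals, for illustration only.
residuals = [-1.8, 0.4, 1.1, -0.3, 0.0, 2.2, -0.9, 0.7, -0.5, 0.2]
scores, ordered = probability_plot_points(residuals)
straightness = pearson_r(scores, ordered)
print(round(straightness, 3))
```

Plotting `ordered` against `scores` gives the probability plot. A correlation close to 1 means the points lie near a straight line, which is consistent with H0; a visibly curved plot (and a lower correlation) points toward H1.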
Chapter 12 – Violation of Assumptions Tests for Heteroscedasticity: Plot the residuals against X. Ideally, there is no pattern in the residuals moving from left to right.
Chapter 12 – Violation of Assumptions Tests for Heteroscedasticity: The “fan-out” pattern of increasing residual variance is the most common pattern indicating heteroscedasticity.
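The fan-out pattern can also be checked numerically as a rough companion to the residual plot. The sketch below is an informal diagnostic, not a formal test: it sorts the residuals by X and compares the mean squared residual in the upper half of X with that in the lower half. The function name and both data sets are made up for illustration.

```python
def variance_ratio_by_x(x, residuals):
    """Mean squared residual for the upper half of X divided by the lower half."""
    pairs = sorted(zip(x, residuals), key=lambda pair: pair[0])
    half = len(pairs) // 2
    lower = [e for _, e in pairs[:half]]
    upper = [e for _, e in pairs[-half:]]

    def mean_sq(errors):
        return sum(e * e for e in errors) / len(errors)

    return mean_sq(upper) / mean_sq(lower)

x = list(range(1, 11))
# Made-up residuals whose spread grows with X (a "fan-out" pattern).
fan_out = [0.1, -0.2, 0.3, -0.3, 0.5, -0.8, 1.0, -1.5, 2.0, -2.4]
# Made-up residuals with the same spread everywhere.
even = [0.5, -0.5, 0.5, -0.5, 0.5, -0.5, 0.5, -0.5, 0.5, -0.5]

fan_ratio = variance_ratio_by_x(x, fan_out)
even_ratio = variance_ratio_by_x(x, even)
print(round(fan_ratio, 1), round(even_ratio, 1))
```

A ratio near 1 is what constant variance looks like; a ratio far above 1 matches residuals that widen from left to right.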
Chapter 12 – Violation of Assumptions Example: Consider the plots of the residuals for the dataset Ship Cost… (overhead & handout) Let’s quickly assess whether the regression assumptions are reasonable.
Chapter 12 – Unusual Observations
Standardized Residuals: Excel
Use Excel’s Tools > Data Analysis > Regression.
Standardized Residuals: MegaStat
MegaStat gives the same general output as Excel.
Chapter 12 – Unusual Observations Studentized Deleted Residuals: Studentized deleted residuals are another way to identify unusual observations. A studentized deleted residual whose absolute value is 2 or more may be considered unusual. A studentized deleted residual whose absolute value is 3 or more is an outlier.
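As a sketch of how these values arise, the code below computes studentized deleted residuals for a simple regression using the standard identity t_i = e_i · sqrt((n − p − 1) / (SSE·(1 − h_i) − e_i²)), with p = 2 estimated parameters, and then applies the 2 and 3 cutoffs from the slide. The data and function names are made up for illustration; this is not the MegaStat implementation.

```python
import math

# Made-up illustration data (not from the lecture or the textbook).
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [3.1, 4.9, 7.2, 8.8, 11.1, 13.0, 14.8, 17.1, 30.0, 21.0]

def studentized_deleted_residuals(x, y):
    """Studentized deleted residuals for a simple linear regression."""
    n = len(x)
    p = 2  # parameters estimated: intercept and slope
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    sse = sum(e * e for e in resid)
    leverage = [1 / n + (xi - xbar) ** 2 / sxx for xi in x]
    # Equivalent to refitting without observation i, but needs only one fit.
    return [
        e * math.sqrt((n - p - 1) / (sse * (1 - h) - e * e))
        for e, h in zip(resid, leverage)
    ]

def classify(t):
    """Apply the slide's rule of thumb: |t| >= 3 outlier, |t| >= 2 unusual."""
    if abs(t) >= 3:
        return "outlier"
    if abs(t) >= 2:
        return "unusual"
    return "ok"

deleted = studentized_deleted_residuals(x, y)
labels = [classify(t) for t in deleted]
print(labels)
```

Because each observation is compared with a fit that excludes it, a single wild point stands out more sharply here than it does in the ordinary standardized residuals.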
Chapter 12 – Other Regression Problems
Outliers:
Outliers may be caused by
- an error in recording data
- impossible data
- an observation that has been influenced by an unspecified “lurking” variable that should have been controlled but wasn’t.
To fix the problem,
- delete the data
- formulate a multiple regression model that includes the lurking variable
Example
Consider Data Set B on p. 545 of your text, in which bivariate data are compiled to determine whether there is a relationship between Number of Employees (X) and Revenue (Y) for n = 24 large automotive companies in 1999. A scatter plot of the data follows:
Note the high R² value: 86% of the total variation in Y is accounted for by the regression line. The correlation, r = 0.9261, is significant.
Now let’s generate the MegaStat regression output…
Clickers
Based on the portion of the regression output below, how much of the total variation in the y-variable is accounted for by the regression line?
(A) 0%
(B) 18%
(C) 86%
(D) 93%

Regression Analysis
r²           0.858     n           24
r            0.926     k           1
Std. Error   18.182    Dep. Var.   Revenue
Clickers
Based on the regression output below, what can we conclude about the hypothesis test at the 10% level of significance?
(A) Reject H0 in favor of H1.
(B) Fail to reject H0 in favor of H1.
(C) Not enough information is given.

Regression output                                              confidence interval
variables    coefficients   std. error   t (df=22)   p-value    95% lower   95% upper
Intercept    0.8153         5.4168       0.151       .8817      -10.4185    12.0490
Employees    0.3048         0.0265       11.516      8.74E-11   0.2499      0.3597
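The columns of this output are linked by simple arithmetic, which the sketch below reproduces from the printed values: t is the coefficient divided by its standard error, and the 95% limits are the coefficient plus or minus a critical t value times the standard error. The critical value 2.074 (two-tailed 5%, 22 degrees of freedom) is taken from a standard t table and is an assumption here, not part of the output; the function name is hypothetical. Small differences from the printed figures come from working with rounded inputs.

```python
# Two-tailed 5% critical value of Student's t with 22 df (from a t table).
T_CRIT_22 = 2.074

# (coefficient, standard error) as printed in the regression output.
rows = {
    "Intercept": (0.8153, 5.4168),
    "Employees": (0.3048, 0.0265),
}

def t_and_interval(coef, se, t_crit=T_CRIT_22):
    """Return the t statistic and the lower and upper 95% confidence limits."""
    t_stat = coef / se
    margin = t_crit * se
    return t_stat, coef - margin, coef + margin

for name, (coef, se) in rows.items():
    t_stat, lower, upper = t_and_interval(coef, se)
    print(f"{name}: t = {t_stat:.3f}, 95% CI = ({lower:.4f}, {upper:.4f})")
```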
Clickers
The graph below would best be used to check which regression assumption?
(A) The errors are normal.
(B) The errors have constant variance.
(C) The errors are independent.