COMPLETE BUSINESS STATISTICS, fourth edition, Aczel. Irwin/McGraw-Hill © The McGraw-Hill Companies, Inc., 1999. Chapter 10: Simple Linear Regression and Correlation

1 Simple Linear Regression and Correlation (Chapter 10)
Using Statistics
The Simple Linear Regression Model
Estimation: The Method of Least Squares
Error Variance and the Standard Errors of Regression Estimators
Correlation
Hypothesis Tests about the Regression Relationship
How Good is the Regression?
Analysis of Variance Table and an F Test of the Regression Model
Residual Analysis and Checking for Model Inadequacies
Use of the Regression Model for Prediction
Using the Computer
Summary and Review of Terms

2 10-1 Using Statistics

3 Examples of Other Scatterplots
[Figure: several scatterplot panels of Y against X.]

4 Model Building
The inexact nature of the relationship between advertising and sales suggests that a statistical model might be useful in analyzing the relationship. A statistical model separates the systematic component of a relationship from the random component.
Data -> Statistical model -> Systematic component + Random errors
In ANOVA, the systematic component is the variation of means between samples or treatments (SSTR) and the random component is the unexplained variation (SSE). In regression, the systematic component is the overall linear relationship, and the random component is the variation around the line.

5 10-2 The Simple Linear Regression Model
The population simple linear regression model:
$Y = \beta_0 + \beta_1 X + \varepsilon$
Here $\beta_0 + \beta_1 X$ is the nonrandom or systematic component and $\varepsilon$ is the random component,
where Y is the dependent variable, the variable we wish to explain or predict; X is the independent variable, also called the predictor variable; and $\varepsilon$ is the error term, the only random component in the model, and thus the only source of randomness in Y.
$\beta_0$ is the intercept of the systematic component of the regression relationship.
$\beta_1$ is the slope of the systematic component.
The conditional mean of Y: $E[Y \mid X] = \beta_0 + \beta_1 X$

6 Picturing the Simple Linear Regression Model
The simple linear regression model posits an exact linear relationship between the expected or average value of Y, the dependent variable, and X, the independent or predictor variable:
$E[Y_i] = \beta_0 + \beta_1 X_i$
Actual observed values of Y differ from the expected value by an unexplained or random error:
$Y_i = E[Y_i] + \varepsilon_i = \beta_0 + \beta_1 X_i + \varepsilon_i$
[Figure: Regression Plot of Y against X showing the line $E[Y] = \beta_0 + \beta_1 X$, the intercept $\beta_0$, the slope $\beta_1$ (rise per 1 unit of X), and the error $\varepsilon_i$ between $Y_i$ and the line at $X_i$.]

7 Assumptions of the Simple Linear Regression Model
The relationship between X and Y is a straight-line relationship.
The values of the independent variable X are assumed fixed (not random); the only randomness in the values of Y comes from the error term $\varepsilon_i$.
The errors $\varepsilon_i$ are normally distributed with mean 0 and variance $\sigma^2$. The errors are uncorrelated (not related) in successive observations. That is: $\varepsilon \sim N(0, \sigma^2)$
[Figure: the line $E[Y] = \beta_0 + \beta_1 X$ with identical normal distributions of errors, all centered on the regression line.]
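The assumptions above can be made concrete with a small simulation. This is only a sketch: the parameter values, the X grid, and the function name `simulate` are hypothetical choices for illustration, not values from the chapter.

```python
import random

def simulate(beta0, beta1, sigma, xs, seed=0):
    """Draw one Y per fixed X from Y = beta0 + beta1*X + eps, eps ~ N(0, sigma^2)."""
    rng = random.Random(seed)  # seeded so the draw is repeatable
    return [beta0 + beta1 * x + rng.gauss(0.0, sigma) for x in xs]

# Hypothetical fixed (nonrandom) X values and hypothetical population parameters.
xs = [1000 + 200 * i for i in range(25)]
ys = simulate(beta0=275.0, beta1=1.25, sigma=300.0, xs=xs)

# The only randomness in Y is the error term: Y minus the systematic component.
errors = [y - (275.0 + 1.25 * x) for x, y in zip(xs, ys)]
print(sum(errors) / len(errors))  # sample mean of the errors; near 0 relative to sigma
```

Setting `sigma=0.0` removes the random component entirely, so every simulated Y falls exactly on the line.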

8 10-3 Estimation: The Method of Least Squares
Estimation of a simple linear regression relationship involves finding estimated or predicted values of the intercept and slope of the linear regression line.
The estimated regression equation:
$Y = b_0 + b_1 X + e$
where $b_0$ estimates the intercept of the population regression line, $\beta_0$; $b_1$ estimates the slope of the population regression line, $\beta_1$; and $e$ stands for the observed errors, the residuals from fitting the estimated regression line $b_0 + b_1 X$ to a set of n points.

9 Fitting a Regression Line
[Figure: four panels of Y against X: Data; Three errors from a fitted line; Three errors from the least squares regression line; Errors from the least squares regression line are minimized.]

10 Errors in Regression
[Figure: plot of Y against X with the regression error marked.]

11 Least Squares Regression
[Figure: SSE plotted against $b_0$ and $b_1$; the minimum marks the least squares $b_0$ and the least squares $b_1$.]

12 Sums of Squares, Cross Products, and Least Squares Estimators
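The formulas on this slide did not survive in the transcript. As a reference sketch, the standard definitions are given below in the $SS_X$, $SS_Y$, $SS_{XY}$ notation that the correlation example later in the deck uses. They are the usual textbook forms and may differ in layout from the original slide.

```latex
\begin{align*}
SS_X    &= \sum (x - \bar{x})^2 = \sum x^2 - \frac{\left(\sum x\right)^2}{n} \\
SS_Y    &= \sum (y - \bar{y})^2 = \sum y^2 - \frac{\left(\sum y\right)^2}{n} \\
SS_{XY} &= \sum (x - \bar{x})(y - \bar{y}) = \sum xy - \frac{\left(\sum x\right)\left(\sum y\right)}{n} \\
b_1     &= \frac{SS_{XY}}{SS_X} \\
b_0     &= \bar{y} - b_1 \bar{x}
\end{align*}
```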

13 Example 10-1
 Miles   Dollars     Miles^2   Miles*Dollars
  1211      1802     1466521         2182222
  1345      2405     1809025         3234725
  1422      2005     2022084         2851110
  1687      2511     2845969         4236057
  1849      2332     3418801         4311868
  2026      2305     4104676         4669930
  2133      3016     4549689         6433128
  2253      3385     5076009         7626405
  2400      3090     5760000         7416000
  2468      3694     6091024         9116792
  2699      3371     7284601         9098329
  2806      3998     7873636        11218388
  3082      3555     9498724        10956510
  3209      4692    10297681        15056628
  3466      4244    12013156        14709704
  3643      5298    13271449        19300614
  3852      4801    14837904        18493452
  4033      5147    16265089        20757851
  4267      5738    18207289        24484046
  4498      6420    20232004        28877160
  4533      6059    20548089        27465447
  4804      6426    23078416        30870504
  5090      6321    25908100        32173890
  5233      7026    27384289        36767058
  5439      6964    29582721        37877196
 79448    106605   293426946       390185014   (column totals)
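The least squares estimates for Example 10-1 can be reproduced from the Miles and Dollars columns. The following is a minimal sketch using the usual computational formulas $b_1 = SS_{XY}/SS_X$ and $b_0 = \bar{y} - b_1\bar{x}$; the function name `least_squares` is a hypothetical helper, not something from the text.

```python
# Miles (x) and Dollars (y) for the 25 observations of Example 10-1.
miles = [1211, 1345, 1422, 1687, 1849, 2026, 2133, 2253, 2400, 2468,
         2699, 2806, 3082, 3209, 3466, 3643, 3852, 4033, 4267, 4498,
         4533, 4804, 5090, 5233, 5439]
dollars = [1802, 2405, 2005, 2511, 2332, 2305, 3016, 3385, 3090, 3694,
           3371, 3998, 3555, 4692, 4244, 5298, 4801, 5147, 5738, 6420,
           6059, 6426, 6321, 7026, 6964]

def least_squares(x, y):
    """Return (b0, b1, ss_x, ss_xy) for the simple linear regression of y on x."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    # Sum of squares of x and sum of cross products, in computational form.
    ss_x = sum(v * v for v in x) - sum_x ** 2 / n
    ss_xy = sum(a * b for a, b in zip(x, y)) - sum_x * sum_y / n
    b1 = ss_xy / ss_x                   # slope estimate
    b0 = sum_y / n - b1 * sum_x / n     # intercept estimate
    return b0, b1, ss_x, ss_xy

b0, b1, ss_x, ss_xy = least_squares(miles, dollars)
print(f"b0 = {b0:.1f}, b1 = {b1:.5f}")
```

The printed coefficients should agree with the Constant and Miles coefficients in the MINITAB output for this example (274.8 and 1.25533).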

14 Example 10-1: Using the Computer
MTB > Regress 'Dollars' 1 'Miles';
SUBC> Constant.
Regression Analysis
The regression equation is
Dollars = 275 + 1.26 Miles
Predictor      Coef     Stdev   t-ratio       p
Constant      274.8     170.3      1.61   0.120
Miles       1.25533   0.04972     25.25   0.000
s = 318.2    R-sq = 96.5%    R-sq(adj) = 96.4%
Analysis of Variance
SOURCE       DF         SS         MS        F       p
Regression    1   64527736   64527736   637.47   0.000
Error        23    2328161     101224
Total        24   66855896

15 Example 10-1: Using Computer-Excel
The results on the right side are the output created by selecting the REGRESSION option from the DATA ANALYSIS toolkit.

16 Example 10-1: Using Computer-Excel
Residual Analysis. The plot shows the absence of a relationship between the residuals and the X-values (miles).
[Figure: Residuals vs. Miles; Miles from 0 to 6000 on the horizontal axis, Residuals from -800 to 600 on the vertical axis.]

17 Total Variance and Error Variance
[Figure: two views of the scatter of Y against X: what you see when looking at the total variation of Y, and what you see when looking along the regression line at the error variance of Y.]

18 10-4 Error Variance and the Standard Errors of Regression Estimators
Square and sum all regression errors to find SSE.
[Figure: plot of Y against X with the regression errors marked.]

19 Standard Errors of Estimates in Regression
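The formulas for this slide are missing from the transcript. As a sketch, the standard textbook expressions $s^2 = SSE/(n-2)$, $s(b_1) = s/\sqrt{SS_X}$ and $s(b_0) = s\sqrt{1/n + \bar{x}^2/SS_X}$ can be evaluated for Example 10-1. The sums of squares are the ones quoted in the correlation example; `x_bar` is the mean of the 25 Miles values. Treat the formulas as the usual forms, not as a copy of the original slide.

```python
import math

# Summary quantities for Example 10-1.
n = 25
ss_x = 40947557.84    # sum of squares of Miles
ss_y = 66855898.0     # sum of squares of Dollars
ss_xy = 51402852.4    # sum of cross products
x_bar = 3177.92       # mean of the 25 Miles values (79448 / 25)

b1 = ss_xy / ss_x                 # least squares slope
sse = ss_y - b1 * ss_xy           # sum of squared regression errors
mse = sse / (n - 2)               # estimate of the error variance
s = math.sqrt(mse)                # standard error of the regression

s_b1 = s / math.sqrt(ss_x)                          # standard error of the slope
s_b0 = s * math.sqrt(1 / n + x_bar ** 2 / ss_x)     # standard error of the intercept

print(f"SSE = {sse:.0f}, MSE = {mse:.0f}, s = {s:.1f}")
print(f"s(b0) = {s_b0:.1f}, s(b1) = {s_b1:.5f}")
```

These values can be compared with the MINITAB output for the example: Error SS 2328161, MS 101224, s = 318.2, and the Stdev column (170.3 and 0.04972).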

20 Confidence Intervals for the Regression Parameters
[Figure: the slope drawn as a triangle with Length = 1 and Height = Slope.]
Least-squares point estimate: $b_1 = 1.25533$
Upper 95% bound on slope: 1.35820
Lower 95% bound: 1.15246
0 is not a possible value of the regression slope at 95%.
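The bounds on the slope follow from $b_1 \pm t_{\alpha/2,\,n-2}\, s(b_1)$. In the sketch below, the critical value 2.069 is an assumption read from a t table for 23 degrees of freedom at the two-sided 95% level; it is not stated on the slide.

```python
# Reported estimates for Example 10-1.
b1 = 1.25533      # least squares slope
s_b1 = 0.04972    # standard error of the slope
t_crit = 2.069    # assumed t-table value for 0.025 in each tail, 23 degrees of freedom

half_width = t_crit * s_b1
lower, upper = b1 - half_width, b1 + half_width

print(f"95% confidence interval for the slope: ({lower:.5f}, {upper:.5f})")
print("0 inside the interval:", lower <= 0.0 <= upper)
```

Because 0 lies outside the interval, a slope of zero is not plausible at the 95% level, which is the point the slide makes.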

21 10-5 Correlation
The correlation between two random variables, X and Y, is a measure of the degree of linear association between the two variables. The population correlation, denoted by $\rho$, can take on any value from -1 to 1.
$\rho = -1$ indicates a perfect negative linear relationship
$-1 < \rho < 0$ indicates a negative linear relationship
$\rho = 0$ indicates no linear relationship
$0 < \rho < 1$ indicates a positive linear relationship
$\rho = 1$ indicates a perfect positive linear relationship
The absolute value of $\rho$ indicates the strength or exactness of the relationship.

22 Illustrations of Correlation
[Figure: six scatterplot panels of Y against X, labeled $\rho = 0$, $\rho = -.8$, $\rho = .8$, $\rho = 0$, $\rho = -1$, $\rho = 1$.]

23 Covariance and Correlation
Example 10-1:
$r = \frac{SS_{XY}}{\sqrt{SS_X \, SS_Y}} = \frac{51402852.4}{\sqrt{(40947557.84)(66855898)}} = \frac{51402852.4}{52321943.29} = .9824$
*Note: If $\rho < 0$, $b_1 < 0$; if $\rho = 0$, $b_1 = 0$; if $\rho > 0$, $b_1 > 0$.

24 Example 10-2: Using Computer-Excel

25 Example 10-2: Regression Plot
[Figure: Regression Plot of International against United States with the fitted line Y = -8.76252 + 1.42364X, R-Sq = 0.9846.]

26 Hypothesis Tests for the Correlation Coefficient
$H_0: \rho = 0$ (No linear relationship)
$H_1: \rho \neq 0$ (Some linear relationship)
Test Statistic: $t_{(n-2)} = \frac{r}{\sqrt{(1 - r^2)/(n - 2)}}$
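As a worked illustration, the test can be applied to Example 10-1 with the standard statistic $t = r / \sqrt{(1 - r^2)/(n - 2)}$. The sums of squares are those quoted in the correlation example. The critical value 2.069 is an assumption taken from a t table (23 degrees of freedom, two-sided 5% level).

```python
import math

n = 25
ss_x, ss_y, ss_xy = 40947557.84, 66855898.0, 51402852.4

# Sample correlation coefficient.
r = ss_xy / math.sqrt(ss_x * ss_y)

# t statistic with n - 2 degrees of freedom for H0: rho = 0.
t_stat = r / math.sqrt((1 - r ** 2) / (n - 2))

t_crit = 2.069  # assumed t-table value, 23 degrees of freedom, two-sided 5% level
reject_h0 = abs(t_stat) > t_crit

print(f"r = {r:.4f}, t = {t_stat:.2f}, reject H0: {reject_h0}")
```

In simple linear regression this statistic is algebraically the same as the t-ratio for the slope, which is why it lands on the 25.25 reported for Miles in the MINITAB output.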

27 Hypothesis Tests about the Regression Relationship
[Figure: three panels of Y against X: Constant Y; Unsystematic Variation; Nonlinear Relationship.]
A hypothesis test for the existence of a linear relationship between X and Y:
$H_0: \beta_1 = 0$
$H_1: \beta_1 \neq 0$
Test statistic for the existence of a linear relationship between X and Y:
$t_{(n-2)} = \frac{b_1}{s(b_1)}$
where $b_1$ is the least-squares estimate of the regression slope and $s(b_1)$ is the standard error of $b_1$. When the null hypothesis is true, the statistic has a $t$ distribution with $n - 2$ degrees of freedom.
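A short sketch of this test using the reported estimates for Example 10-1 is given below. The critical value 2.069 is again an assumption from a t table, and the small gap from the printed t-ratio comes from using rounded inputs.

```python
# Reported estimates for Example 10-1.
b1 = 1.25533      # least squares slope
s_b1 = 0.04972    # standard error of the slope
n = 25

t_stat = b1 / s_b1   # test statistic for H0: beta1 = 0, with n - 2 degrees of freedom
t_crit = 2.069       # assumed t-table value for n - 2 = 23, two-sided 5% level

if abs(t_stat) > t_crit:
    conclusion = "reject H0: evidence of a linear relationship"
else:
    conclusion = "do not reject H0"

print(f"t({n - 2}) = {t_stat:.2f}: {conclusion}")
```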

28 Hypothesis Tests for the Regression Slope

29 10-7 How Good is the Regression?
The coefficient of determination, $r^2$, is a descriptive measure of the strength of the regression relationship, a measure of how well the regression line fits the data.
[Figure: for one data point in a plot of Y against X, the Total Deviation split into Explained Deviation and Unexplained Deviation.]
$r^2$: Percentage of total variation explained by the regression.
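The decomposition behind the figure can be written out as follows. These are the standard identities, with $\hat{y}$ denoting the fitted value (notation assumed here). The numerical line uses the sums of squares from the analysis of variance table of Example 10-1.

```latex
\begin{align*}
\underbrace{(y - \bar{y})}_{\text{total deviation}}
  &= \underbrace{(\hat{y} - \bar{y})}_{\text{explained deviation}}
   + \underbrace{(y - \hat{y})}_{\text{unexplained deviation}} \\
SST &= SSR + SSE \\
r^2 &= \frac{SSR}{SST} = 1 - \frac{SSE}{SST} \\
r^2 &= \frac{64527736}{66855896} \approx 0.965 \quad \text{(Example 10-1)}
\end{align*}
```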

30 The Coefficient of Determination
[Figure: three panels of Y against X showing SST, SSR and SSE for $r^2 = 0$, $r^2 = 0.90$ and $r^2 = 0.50$.]

31 10-8 Analysis of Variance and an F Test of the Regression Model
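The table for this slide is not in the transcript, but the analysis of variance entries for Example 10-1 appear in the MINITAB output. As a sketch, the F ratio can be rebuilt from the sums of squares and degrees of freedom in that output.

```python
# Sums of squares and degrees of freedom from the ANOVA table of Example 10-1.
ssr, df_regression = 64527736.0, 1
sse, df_error = 2328161.0, 23

msr = ssr / df_regression   # mean square for regression
mse = sse / df_error        # mean square error
f_ratio = msr / mse         # F statistic with (1, 23) degrees of freedom

r_squared = ssr / (ssr + sse)   # share of total variation explained

print(f"MSR = {msr:.0f}, MSE = {mse:.0f}, F = {f_ratio:.2f}, r^2 = {r_squared:.3f}")
```

With a single predictor, F equals the square of the slope's t-ratio (25.25 squared is about 637.6), so the F test and the t test of the slope lead to the same conclusion.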

32 10-9 Residual Analysis and Checking for Model Inadequacies

33 10-10 Use of the Regression Model for Prediction
Point Prediction
– A single-valued estimate of Y for a given value of X obtained by inserting the value of X in the estimated regression equation.
Prediction Interval
– For a value of Y given a value of X: variation in regression line estimate, plus variation of points around regression line
– For an average value of Y given a value of X: variation in regression line estimate

34 Errors in Predicting E[Y|X]
1) Uncertainty about the slope of the regression line
[Figure: Y against X with the regression line, the upper limit on slope and the lower limit on slope.]
2) Uncertainty about the intercept of the regression line
[Figure: Y against X with the regression line, the upper limit on intercept and the lower limit on intercept.]

35 Prediction Interval for E[Y|X]
[Figure: Y against X with the regression line and the prediction band for E[Y|X].]
The prediction band for E[Y|X] is narrowest at the mean value of X.
The prediction band widens as the distance from the mean of X increases.
Predictions become very unreliable when we extrapolate beyond the range of the sample itself.

36 Additional Error in Predicting Individual Value of Y
3) Variation around the regression line
[Figure: Y against X with the regression line, the prediction band for E[Y|X] and the prediction band for Y.]

37 Prediction Interval for a Value of Y

38 Prediction Interval for the Average Value of Y
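The interval formulas for these two slides are missing from the transcript. The sketch below uses the standard forms: $\hat{y} \pm t\, s\sqrt{1/n + (x_0 - \bar{x})^2/SS_X}$ for the average value of Y, with an extra 1 under the square root for an individual value of Y. Inputs are the rounded estimates reported for Example 10-1, evaluated at 4000 miles. The critical value 2.069 is an assumed t-table value for 23 degrees of freedom.

```python
import math

# Rounded estimates reported for Example 10-1.
b0, b1 = 274.8, 1.25533
s = 318.2             # standard error of the regression
n = 25
x_bar = 3177.92       # mean of the 25 Miles values
ss_x = 40947557.84    # sum of squares of Miles
t_crit = 2.069        # assumed t-table value, 23 degrees of freedom, 95% level

x0 = 4000.0
fit = b0 + b1 * x0    # point prediction

# Standard error for the average value of Y at x0 (uncertainty in the line only).
se_mean = s * math.sqrt(1 / n + (x0 - x_bar) ** 2 / ss_x)
# Standard error for an individual Y at x0 (adds variation around the line).
se_value = s * math.sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / ss_x)

ci = (fit - t_crit * se_mean, fit + t_crit * se_mean)
pi = (fit - t_crit * se_value, fit + t_crit * se_value)

print(f"fit = {fit:.1f}, se(fit) = {se_mean:.1f}")
print(f"95% interval for E[Y|X]: ({ci[0]:.1f}, {ci[1]:.1f})")
print(f"95% interval for Y:      ({pi[0]:.1f}, {pi[1]:.1f})")
```

Up to rounding of the inputs, the results track the Fit, Stdev.Fit, C.I. and P.I. columns that MINITAB prints for `predict 4000`. The interval for an individual Y is wider because it includes the variation of points around the line.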

39 Using the Computer
MTB > regress 'Dollars' 1 'Miles' tres in C3 fits in C4;
SUBC> predict 4000;
SUBC> residuals in C5.
Regression Analysis
The regression equation is
Dollars = 275 + 1.26 Miles
Predictor      Coef     Stdev   t-ratio       p
Constant      274.8     170.3      1.61   0.120
Miles       1.25533   0.04972     25.25   0.000
s = 318.2    R-sq = 96.5%    R-sq(adj) = 96.4%
Analysis of Variance
SOURCE       DF         SS         MS        F       p
Regression    1   64527736   64527736   637.47   0.000
Error        23    2328161     101224
Total        24   66855896
   Fit   Stdev.Fit          95.0% C.I.            95.0% P.I.
5296.2        75.6   ( 5139.7, 5452.7)   ( 4619.5, 5972.8)

40 Plotting on the Computer (1)
MTB > PLOT 'Resids' * 'Fits'
MTB > PLOT 'Resids' * 'Miles'
[Figure: two scatterplots of Resids (-500 to 500), one against Fits (2000 to 7000) and one against Miles (1000 to 5500).]

41 Plotting on the Computer (2)
MTB > HISTOGRAM 'StRes'
MTB > PLOT 'Dollars' * 'Miles'
[Figure: histogram of StRes (Frequency 0 to 8, StRes from -2 to 2) and scatterplot of Dollars (2000 to 7000) against Miles (1000 to 5500).]

