Statistics, Spring 2004. Instructor: 余清祥, Department of Statistics. Date: May 11, 2004. Week 12: Building Regression Models


Chapter 16 Regression Analysis: Model Building

- General Linear Model
- Determining When to Add or Delete Variables
- Analysis of a Larger Problem
- Variable-Selection Procedures
- Residual Analysis
- Multiple Regression Approach to Analysis of Variance and Experimental Design

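General Linear Model

Models in which the parameters ($\beta_0, \beta_1, \ldots, \beta_p$) all have exponents of one are called linear models. In the general linear model $y = \beta_0 + \beta_1 z_1 + \beta_2 z_2 + \cdots + \beta_p z_p + \epsilon$, each $z_j$ can be any function of the predictor variables, so the model may be curved in the x's while remaining linear in the β's.

- First-Order Model with One Predictor Variable: $y = \beta_0 + \beta_1 x_1 + \epsilon$
- Second-Order Model with One Predictor Variable: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_1^2 + \epsilon$
- Second-Order Model with Two Predictor Variables with Interaction: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1^2 + \beta_4 x_2^2 + \beta_5 x_1 x_2 + \epsilon$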
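To make "linear in the parameters" concrete, here is a minimal Python sketch that expands two predictor values into the six regressors of the second-order model with interaction. The function names `second_order_terms` and `predict` are hypothetical. Because each β still multiplies a single term, these columns can be passed to any ordinary least-squares routine.

```python
def second_order_terms(x1: float, x2: float) -> list[float]:
    """Regressors z_0..z_5 for the second-order model with interaction
    (hypothetical helper): 1, x1, x2, x1^2, x2^2, x1*x2.
    z_0 = 1 multiplies the intercept beta_0."""
    return [1.0, x1, x2, x1 ** 2, x2 ** 2, x1 * x2]


def predict(beta: list[float], x1: float, x2: float) -> float:
    """E(y) = b0 + b1*x1 + b2*x2 + b3*x1^2 + b4*x2^2 + b5*x1*x2
    (hypothetical helper). The result is linear in beta even though
    it is quadratic in x1 and x2."""
    return sum(b * z for b, z in zip(beta, second_order_terms(x1, x2)))
```

For example, `second_order_terms(2.0, 3.0)` gives `[1.0, 2.0, 3.0, 4.0, 9.0, 6.0]`, so with every β equal to 1 the prediction is their sum, 25.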

General Linear Model

Often the problem of nonconstant variance can be corrected by transforming the dependent variable to a different scale.

- Logarithmic Transformations: Most statistical packages provide the ability to apply logarithmic transformations using either base 10 (common log) or base e = 2.71828... (natural log).
- Reciprocal Transformation: Use 1/y as the dependent variable instead of y.
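A minimal sketch of these transformations, assuming a hypothetical helper `transform_response`; the regression is then run on the returned values in place of y. The ratio check at the end shows why the choice of log base does not affect the fit: ln y equals ln 10 times log10 y, so switching bases rescales every estimated coefficient by the same constant and leaves R-sq unchanged.

```python
import math


def transform_response(ys: list[float], kind: str) -> list[float]:
    """Return the transformed dependent variable (hypothetical helper).

    kind = "log10"      -> common log, requires y > 0
    kind = "ln"         -> natural log (base e = 2.71828...), requires y > 0
    kind = "reciprocal" -> 1/y, requires y != 0
    """
    if kind == "log10":
        return [math.log10(y) for y in ys]
    if kind == "ln":
        return [math.log(y) for y in ys]
    if kind == "reciprocal":
        return [1.0 / y for y in ys]
    raise ValueError(f"unknown transformation: {kind!r}")


ys = [10.0, 100.0, 1000.0]
common = transform_response(ys, "log10")
natural = transform_response(ys, "ln")
# ln y = ln(10) * log10(y): the two scales differ only by a constant factor.
ratios = [nat / com for nat, com in zip(natural, common)]
```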

General Linear Model

Models in which the parameters ($\beta_0, \beta_1, \ldots, \beta_p$) have exponents other than one are called nonlinear models. In some cases we can transform the variables so that regression analysis with the general linear model can still be used.

- Exponential Model: The exponential model involves the regression equation $E(y) = \beta_0 \beta_1^{x}$. We can transform this nonlinear model to a linear model by taking the logarithm of both sides: $\log E(y) = \log \beta_0 + x \log \beta_1$.
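The log transformation of the exponential model can be sketched in a few lines. This is an illustration under stated assumptions (one predictor x, every y > 0, and a hypothetical function name `fit_exponential`): regress ln y on x by simple least squares, then exponentiate the intercept and slope to estimate β0 and β1.

```python
import math


def fit_exponential(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Estimate b0, b1 in E(y) = b0 * b1**x (hypothetical helper).

    Fits ln(y) = ln(b0) + x * ln(b1) by simple least squares,
    then back-transforms. Requires every y > 0.
    """
    n = len(xs)
    logs = [math.log(y) for y in ys]
    x_bar = sum(xs) / n
    l_bar = sum(logs) / n
    sxy = sum((x - x_bar) * (l - l_bar) for x, l in zip(xs, logs))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx                  # estimates ln(b1)
    intercept = l_bar - slope * x_bar  # estimates ln(b0)
    return math.exp(intercept), math.exp(slope)


# Made-up, noise-free data from E(y) = 2 * 1.5**x; the fit should recover
# b0 = 2 and b1 = 1.5 up to floating-point rounding.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * 1.5 ** x for x in xs]
b0, b1 = fit_exponential(xs, ys)
```

With noisy data, this gives the least-squares fit on the log scale, which is generally not the same curve a nonlinear least-squares fit on the original y scale would produce.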

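Determining When to Add or Delete Variables

- F Test: To test whether the addition of $x_2$ to a model involving $x_1$ (or the deletion of $x_2$ from a model involving $x_1$ and $x_2$) is statistically significant, compare the error sums of squares of the two models:

$$F = \frac{[\mathrm{SSE}(x_1) - \mathrm{SSE}(x_1, x_2)]/1}{\mathrm{SSE}(x_1, x_2)/(n - p - 1)}$$

where n is the number of observations and p = 2 is the number of independent variables in the larger model. F has 1 numerator and n − p − 1 denominator degrees of freedom; a large F means that $x_2$ significantly reduces SSE.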
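The F test needs only the two error sums of squares. A minimal sketch, with a hypothetical function name `partial_f` and made-up numbers; the general form for q added or deleted variables divides the SSE reduction by q, which reduces to the formula above when q = 1.

```python
def partial_f(sse_reduced: float, sse_full: float, n: int, p_full: int, q: int = 1) -> float:
    """Partial F statistic for adding (or deleting) q variables (hypothetical helper).

    sse_reduced: SSE of the model without the q variables
    sse_full:    SSE of the model with them (p_full independent variables in total)
    Numerator df = q, denominator df = n - p_full - 1.
    """
    df_error = n - p_full - 1
    return ((sse_reduced - sse_full) / q) / (sse_full / df_error)


# Made-up example: adding x2 to a model with x1 lowers SSE from 10.0 to 8.0
# with n = 23 observations, so F = (2.0 / 1) / (8.0 / 20) = 5.0.
f_stat = partial_f(sse_reduced=10.0, sse_full=8.0, n=23, p_full=2)
```

Here F = 5.0 exceeds the 5% critical value F(1, 20) ≈ 4.35, so in this made-up case the addition of $x_2$ would be judged significant.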

Variable-Selection Procedures

- Stepwise Regression
  - At each iteration, the first consideration is to see whether the least significant variable currently in the model can be removed because its F value, FMIN, is less than the user-specified or default F value, FREMOVE.
  - If no variable can be removed, the procedure checks to see whether the most significant variable not in the model can be added because its F value, FMAX, is greater than the user-specified or default F value, FENTER.
  - If no variable can be removed and no variable can be added, the procedure stops.
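The stepwise logic can be sketched in Python. This is a minimal illustration, not Minitab's implementation. It assumes a caller-supplied function `sse(subset)` that returns the least-squares error sum of squares for a set of variables (with the empty set giving SST), computes FMIN and FMAX as partial F statistics, and uses 4.0 as an illustrative value for both FENTER and FREMOVE. The function names, the variables `a`, `b`, `c`, and their SSE values are all hypothetical.

```python
def f_to_enter(sse, selected: set, var: str, n: int) -> float:
    """Partial F for adding `var` to the variables in `selected` (hypothetical helper)."""
    larger = selected | {var}
    df_error = n - len(larger) - 1
    return (sse(selected) - sse(larger)) / (sse(larger) / df_error)


def f_to_remove(sse, selected: set, var: str, n: int) -> float:
    """Partial F for deleting `var` from the variables in `selected` (hypothetical helper)."""
    df_error = n - len(selected) - 1
    return (sse(selected - {var}) - sse(selected)) / (sse(selected) / df_error)


def stepwise(variables, sse, n, f_enter=4.0, f_remove=4.0) -> set:
    # With f_enter >= f_remove, a variable that just entered cannot be removed
    # on the next step, because its F-to-remove equals the F-to-enter it passed.
    if f_enter < f_remove:
        raise ValueError("f_enter must be >= f_remove")
    selected = set()
    while True:
        # 1. Try to remove the least significant variable (FMIN < FREMOVE).
        if selected:
            f_min, worst = min((f_to_remove(sse, selected, v, n), v) for v in selected)
            if f_min < f_remove:
                selected.remove(worst)
                continue
        # 2. Otherwise try to add the most significant candidate (FMAX > FENTER).
        candidates = [v for v in variables if v not in selected]
        if candidates:
            f_max, best = max((f_to_enter(sse, selected, v, n), v) for v in candidates)
            if f_max > f_enter:
                selected.add(best)
                continue
        # 3. Nothing removed and nothing added: stop.
        return selected


# Hypothetical SSE values for n = 20 observations; "" is the intercept-only model (SST).
TOY_SSE = {"": 100.0, "a": 70.0, "b": 72.0, "c": 50.0,
           "ab": 20.0, "ac": 35.0, "bc": 45.0, "abc": 19.5}


def toy_sse(subset) -> float:
    return TOY_SSE["".join(sorted(subset))]


chosen = stepwise(["a", "b", "c"], toy_sse, n=20)
```

On these made-up values, c enters first, then a, then b; with all three in the model, c's F-to-remove falls to about 0.41, so it is removed and the procedure stops with {a, b}.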

Variable-Selection Procedures

- Forward Selection
  - This procedure is similar to stepwise regression, but does not permit a variable to be deleted.
  - This forward-selection procedure starts with no independent variables.
  - It adds variables one at a time as long as a significant reduction in the error sum of squares (SSE) can be achieved.
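Forward selection is the same entry step without any removal step. A minimal sketch under the same assumptions as a stepwise sketch would use: a caller-supplied `sse(subset)` function, an illustrative FENTER of 4.0, and hypothetical function names and made-up SSE values for three variables `a`, `b`, `c`.

```python
def f_to_enter(sse, selected: set, var: str, n: int) -> float:
    """Partial F for adding `var` to the variables in `selected` (hypothetical helper)."""
    larger = selected | {var}
    df_error = n - len(larger) - 1
    return (sse(selected) - sse(larger)) / (sse(larger) / df_error)


def forward_selection(variables, sse, n, f_enter=4.0) -> set:
    selected = set()  # start with no independent variables
    while True:
        candidates = [v for v in variables if v not in selected]
        if not candidates:
            return selected
        f_max, best = max((f_to_enter(sse, selected, v, n), v) for v in candidates)
        if f_max <= f_enter:  # no significant reduction in SSE: stop
            return selected
        selected.add(best)  # once added, a variable is never deleted


# Hypothetical SSE values for n = 20 observations; "" is the intercept-only model (SST).
TOY_SSE = {"": 100.0, "a": 70.0, "b": 72.0, "c": 50.0,
           "ab": 20.0, "ac": 35.0, "bc": 45.0, "abc": 19.5}


def toy_sse(subset) -> float:
    return TOY_SSE["".join(sorted(subset))]


chosen = forward_selection(["a", "b", "c"], toy_sse, n=20)
```

On these made-up values, c, a, and b enter in that order and the result is {a, b, c}: c stays in the model even though, once a and b are present, it reduces SSE only from 20.0 to 19.5.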

Variable-Selection Procedures

- Backward Elimination
  - This procedure begins with a model that includes all the independent variables the modeler wants considered.
  - It then attempts to delete one variable at a time by determining whether the least significant variable currently in the model can be removed because its F value, FMIN, is less than the user-specified or default F value, FREMOVE.
  - Once a variable has been removed from the model, it cannot reenter at a subsequent step.
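A minimal sketch of backward elimination, assuming a caller-supplied `sse(subset)` function, an illustrative FREMOVE of 4.0, and hypothetical function names and made-up SSE values for three variables `a`, `b`, `c`.

```python
def f_to_remove(sse, selected: set, var: str, n: int) -> float:
    """Partial F for deleting `var` from the variables in `selected` (hypothetical helper)."""
    df_error = n - len(selected) - 1
    return (sse(selected - {var}) - sse(selected)) / (sse(selected) / df_error)


def backward_elimination(variables, sse, n, f_remove=4.0) -> set:
    selected = set(variables)  # start with every candidate variable
    while selected:
        f_min, worst = min((f_to_remove(sse, selected, v, n), v) for v in selected)
        if f_min >= f_remove:  # FMIN is not below FREMOVE: stop
            break
        selected.remove(worst)  # a removed variable is never reconsidered
    return selected


# Hypothetical SSE values for n = 20 observations; "" is the intercept-only model (SST).
TOY_SSE = {"": 100.0, "a": 70.0, "b": 72.0, "c": 50.0,
           "ab": 20.0, "ac": 35.0, "bc": 45.0, "abc": 19.5}


def toy_sse(subset) -> float:
    return TOY_SSE["".join(sorted(subset))]


chosen = backward_elimination(["a", "b", "c"], toy_sse, n=20)
```

Starting from {a, b, c}, c has the smallest F-to-remove (about 0.41) and is dropped; in {a, b} both F values exceed 40, so the procedure stops with {a, b}.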

Variable-Selection Procedures

- Best-Subsets Regression
  - The three preceding procedures are one-variable-at-a-time methods offering no guarantee that the best model for a given number of variables will be found.
  - Some software packages include best-subsets regression, which enables the user to find, for a specified number of independent variables, the best regression model.
  - Minitab output identifies the two best one-variable estimated regression equations, the two best two-variable equations, and so on.
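Best-subsets regression can be sketched by brute force: for each model size, rank every subset by SSE (equivalently by R-sq) and keep the top two. This is a minimal illustration, not necessarily how Minitab computes it. It assumes a caller-supplied `sse(subset)` function, and the function name and SSE values are hypothetical. Exhaustive enumeration fits all 2^m − 1 nonempty subsets of m variables, which is only 31 models for five variables but grows quickly with m.

```python
from itertools import combinations


def best_subsets(variables, sse, keep=2):
    """For each size k, return the `keep` subsets with the smallest SSE as
    (sorted variable list, R-sq) pairs (hypothetical helper)."""
    sst = sse(frozenset())  # intercept-only model: SSE equals SST
    table = {}
    for k in range(1, len(variables) + 1):
        ranked = sorted((frozenset(c) for c in combinations(variables, k)), key=sse)
        table[k] = [(sorted(s), 1.0 - sse(s) / sst) for s in ranked[:keep]]
    return table


# Hypothetical SSE values; "" is the intercept-only model (SST).
TOY_SSE = {"": 100.0, "a": 70.0, "b": 72.0, "c": 50.0,
           "ab": 20.0, "ac": 35.0, "bc": 45.0, "abc": 19.5}


def toy_sse(subset) -> float:
    return TOY_SSE["".join(sorted(subset))]


table = best_subsets(["a", "b", "c"], toy_sse)
```

On these values, the best one-variable model is {c} (R-sq 0.50), yet the best two-variable model is {a, b} (R-sq 0.80), a pair that does not contain c at all.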

Example: PGA Tour Data

The Professional Golfers' Association keeps a variety of statistics regarding performance measures. Data include the average driving distance, percentage of drives that land in the fairway, percentage of greens hit in regulation, average number of putts, percentage of sand saves, and average score. The variable names and definitions are shown on the next slide.

Example: PGA Tour Data

Variable Names and Definitions

- Drive: average length of a drive in yards
- Fair: percentage of drives that land in the fairway
- Green: percentage of greens hit in regulation (a par-3 green is “hit in regulation” if the player’s first shot lands on the green)
- Putt: average number of putts for greens that have been hit in regulation
- Sand: percentage of sand saves (landing in a sand trap and still scoring par or better)
- Score: average score for an 18-hole round

In the sample data, Fair, Green, and Sand are recorded as proportions (for example, .681 means 68.1%).

Example: PGA Tour Data

Sample Data

Drive   Fair   Green   Putt    Sand   Score
277.6   .681   .667    1.768   .550   69.10
259.6   .691   .665    1.810   .536   71.09
269.1   .657   .649    1.747   .472   70.12
267.0   .689   .673    1.763   .672   69.88
267.3   .581   .637    1.781   .521   70.71
255.6   .778   .674    1.791   .455   69.76
272.9   .615   .667    1.780   .476   70.19
265.4   .718   .699    1.790   .551   69.73

Example: PGA Tour Data

Sample Data (continued)

Drive   Fair   Green   Putt    Sand   Score
272.6   .660   .672    1.803   .431   69.97
263.9   .668   .669    1.774   .493   70.33
267.0   .686   .687    1.809   .492   70.32
266.0   .681   .670    1.765   .599   70.09
258.1   .695   .641    1.784   .500   70.46
255.6   .792   .672    1.752   .603   69.49
261.3   .740   .702    1.813   .529   69.88
262.2   .721   .662    1.754   .576   70.27

Example: PGA Tour Data

Sample Data (continued)

Drive   Fair   Green   Putt    Sand   Score
260.5   .703   .623    1.782   .567   70.72
271.3   .671   .666    1.783   .492   70.30
263.3   .714   .687    1.796   .468   69.91
276.6   .634   .643    1.776   .541   70.69
252.1   .726   .639    1.788   .493   70.59
263.0   .687   .675    1.786   .486   70.20
263.0   .639   .647    1.760   .374   70.81
253.5   .732   .693    1.797   .518   70.26
266.2   .681   .657    1.812   .472   70.96

Example: PGA Tour Data

Sample Correlation Coefficients

         Score    Drive    Fair    Green    Putt
Drive    -.154
Fair     -.427    -.679
Green    -.556    -.045    .421
Putt      .258    -.139    .101    .354
Sand     -.278    -.024    .265    .083    -.296
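Each entry in this table is a sample correlation coefficient r. The sketch below (function names hypothetical, data made up) computes r from its definition and checks a fact that links this table to the best-subsets output: in a one-variable regression, R-sq equals r², so Green's r = -.556 corresponds to R-sq = .556² ≈ .309, the 30.9% reported for the best one-variable model.

```python
import math


def pearson_r(xs, ys):
    """Sample correlation coefficient r (hypothetical helper)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simple_regression_r2(xs, ys):
    """R-sq of the least-squares line y = b0 + b1*x, as 1 - SSE/SST (hypothetical helper)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b1 = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    b0 = my - b1 * mx
    sse = sum((y - b0 - b1 * x) ** 2 for x, y in zip(xs, ys))
    sst = sum((y - my) ** 2 for y in ys)
    return 1.0 - sse / sst


# Small made-up data set with r = 0.8, so the one-variable R-sq is 0.64.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 1.0, 4.0, 3.0, 5.0]
r = pearson_r(xs, ys)
r2 = simple_regression_r2(xs, ys)
```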

Example: PGA Tour Data

Best Subsets Regression of SCORE

Vars   R-sq   R-sq(adj)   C-p    s        Variables in model
1      30.9   27.9        26.9   .39685   Green
1      18.2   14.6        35.7   .43183   Fair
2      54.7   50.5        12.4   .32872   …
2      54.6   50.5        12.5   .32891   …
3      60.7   55.1        10.2   .31318   …
3      59.1   53.3        11.4   .31957   …
4      72.2   66.8         4.2   .26913   Drive, Fair, Green, Putt
4      60.9   53.1        12.1   .32011   …
5      72.6   65.4         6.0   .27499   Drive, Fair, Green, Putt, Sand
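The R-sq(adj) and C-p columns can be recomputed from quantities already on these slides. A minimal sketch using the standard definitions, with hypothetical function names: adjusted R-sq = 1 − (1 − R-sq)(n − 1)/(n − p − 1), and Mallows' C-p = SSE_p / MSE_full − (n − 2(p + 1)), where p is the number of independent variables, n = 25, and MSE_full is the mean square error of the model containing all five candidate variables (s² with s = .27499).

```python
def adjusted_r2(r2: float, n: int, p: int) -> float:
    """Adjusted R-sq for p independent variables and n observations (hypothetical helper)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


def mallows_cp(sse_p: float, mse_full: float, n: int, p: int) -> float:
    """Mallows' C-p for a p-variable model (hypothetical helper).

    mse_full is the MSE of the model containing every candidate variable.
    A model with little bias has C-p close to p + 1.
    """
    return sse_p / mse_full - (n - 2 * (p + 1))


n = 25
mse_full = 0.27499 ** 2  # s of the five-variable model, squared
# Five-variable model: R-sq = 72.6%, so R-sq(adj) should be about 65.4%.
adj_full = adjusted_r2(0.726, n, 5)
# Four-variable model (Drive, Fair, Green, Putt): SSE = 1.44865 from the ANOVA table.
cp_four = mallows_cp(1.44865, mse_full, n, 4)
```

The four-variable result is about 4.16, which rounds to the 4.2 in the table. For the full model, C-p equals p + 1 = 6 by construction, matching the 6.0 in the last row.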

Example: PGA Tour Data

Minitab Output

The regression equation is
Score = 74.678 - .0398(Drive) - 6.686(Fair) - 10.342(Green) + 9.858(Putt)

Predictor   Coef      Stdev     t-ratio   p
Constant    74.678    6.952     10.74     .000
Drive       -.0398    .01235    -3.22     .004
Fair        -6.686    1.939     -3.45     .003
Green       -10.342   3.561     -2.90     .009
Putt        9.858     3.180     3.10      .006

s = .2691   R-sq = 72.4%   R-sq(adj) = 66.8%
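As a cross-check on this output, the four-variable fit can be reproduced from the 25 sample observations listed on the earlier slides. This is a pure-Python sketch, not Minitab's algorithm: it centers each predictor for numerical stability, solves the normal equations by Gaussian elimination with partial pivoting, and recovers the intercept from the means. The helper names `solve` and `fit_ols` are hypothetical.

```python
# Sample data from the slides: (Drive, Fair, Green, Putt, Sand, Score)
PGA = [
    (277.6, .681, .667, 1.768, .550, 69.10), (259.6, .691, .665, 1.810, .536, 71.09),
    (269.1, .657, .649, 1.747, .472, 70.12), (267.0, .689, .673, 1.763, .672, 69.88),
    (267.3, .581, .637, 1.781, .521, 70.71), (255.6, .778, .674, 1.791, .455, 69.76),
    (272.9, .615, .667, 1.780, .476, 70.19), (265.4, .718, .699, 1.790, .551, 69.73),
    (272.6, .660, .672, 1.803, .431, 69.97), (263.9, .668, .669, 1.774, .493, 70.33),
    (267.0, .686, .687, 1.809, .492, 70.32), (266.0, .681, .670, 1.765, .599, 70.09),
    (258.1, .695, .641, 1.784, .500, 70.46), (255.6, .792, .672, 1.752, .603, 69.49),
    (261.3, .740, .702, 1.813, .529, 69.88), (262.2, .721, .662, 1.754, .576, 70.27),
    (260.5, .703, .623, 1.782, .567, 70.72), (271.3, .671, .666, 1.783, .492, 70.30),
    (263.3, .714, .687, 1.796, .468, 69.91), (276.6, .634, .643, 1.776, .541, 70.69),
    (252.1, .726, .639, 1.788, .493, 70.59), (263.0, .687, .675, 1.786, .486, 70.20),
    (263.0, .639, .647, 1.760, .374, 70.81), (253.5, .732, .693, 1.797, .518, 70.26),
    (266.2, .681, .657, 1.812, .472, 70.96),
]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (hypothetical helper)."""
    k = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, k):
            factor = m[r][col] / m[col][col]
            for c in range(col, k + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * k
    for r in range(k - 1, -1, -1):
        x[r] = (m[r][k] - sum(m[r][c] * x[c] for c in range(r + 1, k))) / m[r][r]
    return x


def fit_ols(rows, ys):
    """Least squares with an intercept (hypothetical helper). Returns ([b0, ..., bp], SSE, SST)."""
    n, p = len(ys), len(rows[0])
    x_bar = [sum(r[j] for r in rows) / n for j in range(p)]
    y_bar = sum(ys) / n
    xc = [[r[j] - x_bar[j] for j in range(p)] for r in rows]  # centered predictors
    sxx = [[sum(r[i] * r[j] for r in xc) for j in range(p)] for i in range(p)]
    sxy = [sum(r[i] * (y - y_bar) for r, y in zip(xc, ys)) for i in range(p)]
    slopes = solve(sxx, sxy)
    b0 = y_bar - sum(b * mean for b, mean in zip(slopes, x_bar))
    fitted = [b0 + sum(b * v for b, v in zip(slopes, r)) for r in rows]
    sse = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    sst = sum((y - y_bar) ** 2 for y in ys)
    return [b0] + slopes, sse, sst


predictors = [row[:4] for row in PGA]  # Drive, Fair, Green, Putt
score = [row[5] for row in PGA]
coef, sse, sst = fit_ols(predictors, score)
r_sq = 1.0 - sse / sst
```

If the transcribed data are complete and exact, `coef`, `sse`, and `r_sq` should agree with the Minitab coefficients, the error SS of 1.44865, and R-sq = 72.4% to the printed precision.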

Example: PGA Tour Data

Minitab Output

Analysis of Variance

SOURCE       DF   SS        MS       F       P
Regression    4   3.79469   .94867   13.10   .000
Error        20   1.44865   .07243
Total        24   5.24334
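The mean squares and F in this table, along with s, R-sq, and R-sq(adj) from the regression output, all follow from the sums of squares and degrees of freedom. A short sketch of that arithmetic (variable names are illustrative):

```python
import math

# Sums of squares and degrees of freedom from the ANOVA table
ssr, sse, sst = 3.79469, 1.44865, 5.24334
df_reg, df_err = 4, 20          # p = 4 independent variables; n - p - 1 = 20
df_tot = df_reg + df_err        # n - 1 = 24

msr = ssr / df_reg              # regression mean square
mse = sse / df_err              # error mean square
f_stat = msr / mse              # overall F test of H0: beta1 = ... = beta4 = 0
s = math.sqrt(mse)              # standard error of the estimate
r_sq = ssr / sst
r_sq_adj = 1.0 - (sse / df_err) / (sst / df_tot)
```

These reproduce MS = .94867 and .07243, F = 13.10, s = .2691, R-sq = 72.4%, and R-sq(adj) = 66.8% after rounding.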

Residual Analysis: Autocorrelation

- Durbin-Watson Test for Autocorrelation
  - Statistic: $d = \dfrac{\sum_{t=2}^{n} (e_t - e_{t-1})^2}{\sum_{t=1}^{n} e_t^2}$, where $e_t$ is the residual for period t.
  - The statistic ranges in value from zero to four.
  - If successive values of the residuals are close together (positive autocorrelation), the statistic will be small.
  - If successive values are far apart (negative autocorrelation), the statistic will be large.
  - A value near two indicates no autocorrelation.
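The statistic is straightforward to compute from residuals listed in time order. A minimal sketch (the function name `durbin_watson` is hypothetical), with two made-up residual sequences illustrating the two extremes described above:

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic d for residuals e_1..e_n in time order (hypothetical helper)."""
    numerator = sum((e_t - e_prev) ** 2 for e_prev, e_t in zip(residuals, residuals[1:]))
    denominator = sum(e * e for e in residuals)
    return numerator / denominator


# Residuals that alternate in sign (negative autocorrelation) give a large d;
# residuals that drift slowly (positive autocorrelation) give a small d.
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
drifting = [1.0, 1.1, 1.2, 1.3, 1.4, 1.5]
d_alt = durbin_watson(alternating)
d_drift = durbin_watson(drifting)
```

Here d is 20/6 ≈ 3.33 for the alternating residuals and about 0.005 for the drifting ones. Whether a value is far enough from two to be significant is judged against Durbin-Watson tables, which give lower and upper bounds that depend on the sample size and the number of independent variables.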

End of Chapter 16

