MLB STATS Group SIX: Astrid Amsallem, Joel De Martini, Naiwen Chang, Qi He, Wenjie Huang, Wesley Thibault


1 MLB STATS Group SIX
Astrid Amsallem, Joel De Martini, Naiwen Chang, Qi He, Wenjie Huang, Wesley Thibault

2 We are interested in…
1. What affects the WINS of a pitcher?
2. How can these hot pitchers get so much $?!
3. What factors determine the attendance of the game?

3 Top 10 salaries in MLB
1. New York Yankees
2. Boston Red Sox
3. Chicago Cubs
4. Philadelphia Phillies
5. New York Mets
6. Detroit Tigers
7. Chicago White Sox
8. Los Angeles Angels
9. San Francisco Giants
10. Los Angeles Dodgers

4 Variables Explanation
1. Starting pitchers
2. ERA: Earned run average
3. K9: Strikeouts per nine innings
4. AVG
5. WINS (player)
6. WHIP: Walks and hits per inning pitched
7. Career experience
[Figure labels: ERA, K9, WHIP, AVG]

5 Regress Wins against other variables
Dependent Variable: LNWINS
Method: Least Squares
Date: 12/01/10 Time: 11:46
Sample: 1 45
Included observations: 45
LNWINS = C(1) + C(2)*K9 + C(3)*EXPERIENCE + C(4)*ERA + C(5)*AVG + C(6)*WHIP

        Coefficient   Std. Error   t-Statistic   Prob.
C(1)     3.598584     2.481966      1.449893     0.1551
C(2)     0.024904     0.093978      0.265001     0.7924
C(3)     0.114170     0.017685      6.455917     0.0000
C(4)    -0.421004     0.271878     -1.548502     0.1296
C(5)     9.148770     8.535001      1.071912     0.2903
C(6)    -0.827380     1.297105     -0.637867     0.5273

R-squared             0.583988    Mean dependent var      4.281809
Adjusted R-squared    0.530654    S.D. dependent var      0.616112
S.E. of regression    0.422092    Akaike info criterion   1.236377
Sum squared resid     6.948293    Schwarz criterion       1.477265
Log likelihood       -21.81849    F-statistic             10.94948
Durbin-Watson stat    1.656059    Prob(F-statistic)       0.000001
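Each t-statistic in the output above is the coefficient divided by its standard error. The short sketch below recomputes them from the reported values; the |t| > 2 cutoff is a common rule of thumb assumed here for illustration, not something stated on the slides.

```python
# Coefficients and standard errors copied from the LNWINS regression above.
rows = {
    "K9": (0.024904, 0.093978),
    "EXPERIENCE": (0.114170, 0.017685),
    "ERA": (-0.421004, 0.271878),
    "AVG": (9.148770, 8.535001),
    "WHIP": (-0.827380, 1.297105),
}


def t_statistic(coefficient, std_error):
    """t-statistic for the null hypothesis that the coefficient is zero."""
    return coefficient / std_error


t_stats = {name: t_statistic(c, se) for name, (c, se) in rows.items()}

# Assumed rule of thumb: |t| > 2 is roughly significant at the 5% level.
significant = [name for name, t in t_stats.items() if abs(t) > 2.0]

for name, t in t_stats.items():
    print(f"{name:<12} t = {t:8.4f}")
print("Significant under the rule of thumb:", significant)
```

Small differences from the printed t-statistics come from the rounding of the reported coefficients and standard errors.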

6 Dropping K9
Dependent Variable: LNWINS
Method: Least Squares
Date: 12/01/10 Time: 11:46
Sample: 1 45
Included observations: 45
LNWINS = C(1) + C(3)*EXPERIENCE + C(4)*ERA + C(5)*AVG + C(6)*WHIP

        Coefficient   Std. Error   t-Statistic   Prob.
C(1)     4.168256     1.226036      3.399783     0.0015
C(3)     0.113482     0.017289      6.564031     0.0000
C(4)    -0.427853     0.267483     -1.599550     0.1176
C(5)     7.299192     4.855053      1.503422     0.1406
C(6)    -0.741132     1.240930     -0.597239     0.5537

R-squared             0.583239    Mean dependent var      4.281809
Adjusted R-squared    0.541563    S.D. dependent var      0.616112
S.E. of regression    0.417157    Akaike info criterion   1.193732
Sum squared resid     6.960804    Schwarz criterion       1.394472
Log likelihood       -21.85896    F-statistic             13.99459
Durbin-Watson stat    1.660198    Prob(F-statistic)       0.000000

7 Dropping WHIP
Dependent Variable: LNWINS
Method: Least Squares
Date: 12/01/10 Time: 11:47
Sample: 1901 1945
Included observations: 45
LNWINS = C(1) + C(3)*EXPERIENCE + C(4)*ERA + C(5)*AVG

        Coefficient   Std. Error   t-Statistic   Prob.
C(1)     3.727099     0.970836      3.839063     0.0004
C(3)     0.114602     0.017051      6.721023     0.0000
C(4)    -0.548733     0.173501     -3.162707     0.0029
C(5)     7.066036     4.801216      1.471718     0.1487

R-squared             0.579523    Mean dependent var      4.281809
Adjusted R-squared    0.548756    S.D. dependent var      0.616112
S.E. of regression    0.413872    Akaike info criterion   1.158165
Sum squared resid     7.022876    Schwarz criterion       1.318757
Log likelihood       -22.05872    F-statistic             18.83610
Durbin-Watson stat    1.656580    Prob(F-statistic)       0.000000

8 Dropping AVG
Dependent Variable: LNWINS
Method: Least Squares
Date: 12/01/10 Time: 11:48
Sample: 1 45
Included observations: 45
LNWINS = C(1) + C(3)*EXPERIENCE + C(4)*ERA

        Coefficient   Std. Error   t-Statistic   Prob.
C(1)     4.883349     0.578197      8.445815     0.0000
C(3)     0.115720     0.017269      6.700957     0.0000
C(4)    -0.393006     0.139395     -2.819362     0.0073

R-squared             0.557310    Mean dependent var      4.281809
Adjusted R-squared    0.536229    S.D. dependent var      0.616112
S.E. of regression    0.419577    Akaike info criterion   1.165201
Sum squared resid     7.393882    Schwarz criterion       1.285645
Log likelihood       -23.21702    F-statistic             26.43725
Durbin-Watson stat    1.688959    Prob(F-statistic)       0.000000
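The four LNWINS models drop one regressor at a time, so adjusted R-squared is a natural way to compare them. The sketch below recomputes it from the reported R-squared values using the standard formula 1 - (1 - R²)(n - 1)/(n - k), where k counts the estimated coefficients including the constant; the model labels are made up for this illustration.

```python
def adjusted_r_squared(r2, n_obs, n_params):
    """Adjusted R-squared; n_params includes the constant term."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_params)


# (label, reported R-squared, number of estimated coefficients)
models = [
    ("full", 0.583988, 6),
    ("drop K9", 0.583239, 5),
    ("drop WHIP", 0.579523, 4),
    ("drop AVG", 0.557310, 3),
]

adjusted = {name: adjusted_r_squared(r2, 45, k) for name, r2, k in models}
best = max(adjusted, key=adjusted.get)

for name, value in adjusted.items():
    print(f"{name:<10} adjusted R-squared = {value:.6f}")
print("Highest adjusted R-squared:", best)
```

By this criterion alone the model that still contains AVG scores slightly higher; the final specification drops AVG because its coefficient is not individually significant.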

9 White Heteroskedasticity Test:
F-statistic      0.738906    Probability    0.598933
Obs*R-squared    3.894030    Probability    0.564772

Test Equation:
Dependent Variable: RESID^2
Method: Least Squares
Date: 12/01/10 Time: 12:45
Sample: 1901 1945
Included observations: 45

Variable          Coefficient   Std. Error   t-Statistic   Prob.
C                 -6.524628     5.254761     -1.241660     0.2218
EXPERIENCE         0.232587     0.195057      1.192409     0.2403
EXPERIENCE^2       0.003262     0.004671      0.698379     0.4891
EXPERIENCE*ERA    -0.070934     0.049483     -1.433492     0.1597
ERA                2.998153     2.591449      1.156941     0.2543
ERA^2             -0.317762     0.326293     -0.973855     0.3361

R-squared             0.086534    Mean dependent var      0.164308
Adjusted R-squared   -0.030577    S.D. dependent var      0.449694
S.E. of regression    0.456518    Akaike info criterion   1.393186
Sum squared resid     8.127921    Schwarz criterion       1.634075
Log likelihood       -25.34669    F-statistic             0.738906
Durbin-Watson stat    1.865662    Prob(F-statistic)       0.598933

At α = 0.05, the critical value Chi-Square(5) = 11.0705 > Obs*R-squared = 3.894030, so we fail to reject the null hypothesis of homoskedasticity. There is no evidence of heteroskedasticity.
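The Obs*R-squared statistic of the White test is the number of observations times the R-squared of the auxiliary regression of the squared residuals. A minimal sketch of the decision rule, using only numbers reported on the slide (the function and constant names are illustrative):

```python
def white_lm_statistic(n_obs, aux_r_squared):
    """Obs*R-squared: sample size times the auxiliary-regression R-squared."""
    return n_obs * aux_r_squared


# 5% critical value for chi-square with 5 degrees of freedom, as quoted on the slide.
CRITICAL_CHI2_5DF_5PCT = 11.0705

lm = white_lm_statistic(45, 0.086534)
reject_homoskedasticity = lm > CRITICAL_CHI2_5DF_5PCT

print(f"Obs*R-squared = {lm:.6f}")
print("Reject homoskedasticity at 5%:", reject_homoskedasticity)
```

The five degrees of freedom correspond to the five regressors in the test equation other than the constant.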

10 Conclusion 1
Experience and ERA are the most important factors in determining how many wins a pitcher has. It is intuitive that the longer a pitcher is in the league, the more wins he accumulates. Among the performance statistics, ERA is the only one that remains significant, which makes it the most important one for explaining wins.
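Because the dependent variable is LNWINS, each coefficient in the final model can be read as an approximate percentage change in wins. The sketch below converts the reported EXPERIENCE and ERA coefficients into exact percentage effects, assuming LNWINS is the natural logarithm of wins and that EXPERIENCE is measured in years (neither is stated explicitly on the slides).

```python
import math


def percent_change_in_wins(coefficient, delta=1.0):
    """Percent change in wins for a `delta` change in the regressor,
    assuming a log-linear model ln(wins) = ... + coefficient * x."""
    return (math.exp(coefficient * delta) - 1.0) * 100.0


# Coefficients from the final LNWINS model (EXPERIENCE and ERA only).
experience_effect = percent_change_in_wins(0.115720)
era_effect = percent_change_in_wins(-0.393006)

print(f"One more unit of experience: {experience_effect:+.2f}% wins")
print(f"One more run of ERA:         {era_effect:+.2f}% wins")
```

Under these assumptions, one extra unit of experience goes with roughly 12% more wins, and one extra run of ERA goes with roughly a third fewer wins, holding the other variable fixed.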

11 Regress Salary against other variables
Dependent Variable: SALARY
Method: Least Squares
Date: 11/29/10 Time: 22:16
Sample: 1 45
Included observations: 45
SALARY = C(1) + C(2)*K9 + C(3)*ERA + C(4)*EXPERIENCE + C(5)*AVG + C(6)*WINS + C(7)*WHIP

        Coefficient   Std. Error   t-Statistic   Prob.
C(1)    13.26240      26.93882      0.492315     0.6253
C(2)     0.535060      1.019848     0.524647     0.6029
C(3)    -5.824067      3.073178    -1.895128     0.0657
C(4)     0.965893      0.298615     3.234570     0.0025
C(5)    39.57241      94.30549      0.419619     0.6771
C(6)     0.010538      0.022938     0.459394     0.6486
C(7)    -3.340880     14.13198     -0.236406     0.8144

R-squared             0.571672    Mean dependent var      8.531823
Adjusted R-squared    0.504041    S.D. dependent var      6.503770
S.E. of regression    4.580238    Akaike info criterion   6.023414
Sum squared resid     797.1862    Schwarz criterion       6.304450
Log likelihood       -128.5268    F-statistic             8.452832
Durbin-Watson stat    1.995391    Prob(F-statistic)       0.000007

12 Dropping WHIP
Dependent Variable: SALARY
Method: Least Squares
Date: 11/29/10 Time: 22:17
Sample: 1 45
Included observations: 45
SALARY = C(1) + C(2)*K9 + C(3)*ERA + C(4)*EXPERIENCE + C(5)*AVG + C(6)*WINS

        Coefficient   Std. Error   t-Statistic   Prob.
C(1)    12.77971      26.53421      0.481631     0.6328
C(2)     0.474565      0.975198     0.486634     0.6292
C(3)    -6.328931      2.182998    -2.899193     0.0061
C(4)     0.964078      0.294881     3.269375     0.0023
C(5)    33.74487      89.91824      0.375284     0.7095
C(6)     0.011023      0.022568     0.488440     0.6280

R-squared             0.571042    Mean dependent var      8.531823
Adjusted R-squared    0.516047    S.D. dependent var      6.503770
S.E. of regression    4.524459    Akaike info criterion   5.980439
Sum squared resid     798.3586    Schwarz criterion       6.221327
Log likelihood       -128.5599    F-statistic             10.38359
Durbin-Watson stat    1.985975    Prob(F-statistic)       0.000002

13 Dropping AVG
Dependent Variable: SALARY
Method: Least Squares
Date: 11/29/10 Time: 22:18
Sample: 1 45
Included observations: 45
SALARY = C(1) + C(2)*K9 + C(3)*ERA + C(4)*EXPERIENCE + C(6)*WINS

        Coefficient   Std. Error   t-Statistic   Prob.
C(1)    21.87751      10.67097      2.050190     0.0469
C(2)     0.184862      0.589466     0.313610     0.7554
C(3)    -5.924145      1.877419    -3.155472     0.0030
C(4)     0.941778      0.285714     3.296229     0.0021
C(6)     0.012477      0.021993     0.567331     0.5737

R-squared             0.569493    Mean dependent var      8.531823
Adjusted R-squared    0.526442    S.D. dependent var      6.503770
S.E. of regression    4.475605    Akaike info criterion   5.939599
Sum squared resid     801.2417    Schwarz criterion       6.140340
Log likelihood       -128.6410    F-statistic             13.22841
Durbin-Watson stat    1.995520    Prob(F-statistic)       0.000001

14 Dropping K9
Dependent Variable: SALARY
Method: Least Squares
Date: 11/29/10 Time: 22:18
Sample: 1 45
Included observations: 45
SALARY = C(1) + C(3)*ERA + C(4)*EXPERIENCE + C(6)*WINS

        Coefficient   Std. Error   t-Statistic   Prob.
C(1)    24.48138      6.628971      3.693088     0.0006
C(3)    -6.220163     1.604937     -3.875642     0.0004
C(4)     0.950703     0.281150      3.381478     0.0016
C(6)     0.010790     0.021089      0.511631     0.6117

R-squared             0.568434    Mean dependent var      8.531823
Adjusted R-squared    0.536856    S.D. dependent var      6.503770
S.E. of regression    4.426119    Akaike info criterion   5.897611
Sum squared resid     803.2117    Schwarz criterion       6.058203
Log likelihood       -128.6962    F-statistic             18.00096
Durbin-Watson stat    1.983763    Prob(F-statistic)       0.000000

15 Dropping WINS
Dependent Variable: SALARY
Method: Least Squares
Date: 11/29/10 Time: 21:48
Sample: 1 45
Included observations: 45
SALARY = C(1) + C(3)*ERA + C(4)*EXPERIENCE

        Coefficient   Std. Error   t-Statistic   Prob.
C(1)    25.80969      6.045566      4.269194     0.0001
C(3)    -6.549164     1.457502     -4.493416     0.0001
C(4)     1.060266     0.180564      5.871959     0.0000

R-squared             0.565679    Mean dependent var      8.531823
Adjusted R-squared    0.544997    S.D. dependent var      6.503770
S.E. of regression    4.387048    Akaike info criterion   5.859530
Sum squared resid     808.3399    Schwarz criterion       5.979975
Log likelihood       -128.8394    F-statistic             27.35131
Durbin-Watson stat    1.967714    Prob(F-statistic)       0.000000
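The F-statistic in each output tests whether all slope coefficients are jointly zero, and it can be recovered from R-squared alone. The sketch below checks this for the final SALARY model using the standard formula F = (R²/(k - 1)) / ((1 - R²)/(n - k)), with k counting the constant.

```python
def f_statistic(r2, n_obs, n_params):
    """Overall F-statistic of a regression; n_params includes the constant."""
    numerator = r2 / (n_params - 1)
    denominator = (1.0 - r2) / (n_obs - n_params)
    return numerator / denominator


# Final SALARY model: constant, ERA, EXPERIENCE (k = 3), n = 45.
f_final_salary = f_statistic(0.565679, 45, 3)
print(f"F-statistic = {f_final_salary:.5f}")
```

The result agrees with the reported F-statistic up to the rounding of the printed R-squared.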

16 White Heteroskedasticity Test:
F-statistic      3.148422    Probability    0.017555
Obs*R-squared    12.94058    Probability    0.023942

Test Equation:
Dependent Variable: RESID^2
Method: Least Squares
Date: 11/30/10 Time: 16:14
Sample: 1901 1945
Included observations: 45

Variable          Coefficient   Std. Error   t-Statistic   Prob.
C                 250.6808      336.6494      0.744634     0.4610
ERA               -81.81855     166.0227     -0.492815     0.6249
ERA^2               6.237881     20.90415     0.298404     0.7670
ERA*EXPERIENCE      4.320411      3.170161    1.362836     0.1808
EXPERIENCE        -23.21171      12.49642    -1.857468     0.0708
EXPERIENCE^2        0.519519      0.299279    1.735900     0.0905

R-squared             0.287569    Mean dependent var      17.96311
Adjusted R-squared    0.196231    S.D. dependent var      32.62247
S.E. of regression    29.24707    Akaike info criterion   9.713002
Sum squared resid     33360.26    Schwarz criterion       9.953890
Log likelihood       -212.5425    F-statistic             3.148422
Durbin-Watson stat    1.975917    Prob(F-statistic)       0.017555

At α = 0.05, the critical value Chi-Square(5) = 11.0705 < Obs*R-squared = 12.94058, so we reject the null hypothesis of homoskedasticity. There is heteroskedasticity.
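The probability printed next to Obs*R-squared is the upper-tail area of a chi-square distribution with 5 degrees of freedom. For an odd number of degrees of freedom this tail has a closed form, so it can be checked with the standard library alone; the sketch below is one way to do that and is not how EViews itself computes the value.

```python
import math


def chi2_sf_5df(x):
    """Upper-tail probability P(X > x) for chi-square with 5 degrees of freedom."""
    tail = math.erfc(math.sqrt(x / 2.0))
    density_terms = math.sqrt(2.0 / math.pi) * math.exp(-x / 2.0) * (
        math.sqrt(x) + x ** 1.5 / 3.0
    )
    return tail + density_terms


p_wins = chi2_sf_5df(3.894030)    # White test for the LNWINS model
p_salary = chi2_sf_5df(12.94058)  # White test for the SALARY model

print(f"p-value, wins model:   {p_wins:.6f}")
print(f"p-value, salary model: {p_salary:.6f}")
```

Comparing the p-value with 0.05 gives the same decisions as comparing Obs*R-squared with the critical value 11.0705.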

17 Conclusion 2
We come to the same conclusion for salary as for wins: experience and ERA are the main contributing factors.
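The final SALARY equation can be used to produce a fitted salary for a given pitcher profile. In the sketch below the example pitcher (ERA 3.50, experience 8) is hypothetical, and the salary units are whatever the original data set used, which the slides do not state.

```python
def predicted_salary(era, experience):
    """Fitted SALARY from the final model: C(1) + C(3)*ERA + C(4)*EXPERIENCE."""
    return 25.80969 - 6.549164 * era + 1.060266 * experience


# Hypothetical pitcher, chosen only for illustration.
example = predicted_salary(era=3.50, experience=8)
print(f"Fitted salary: {example:.3f}")

# A lower ERA raises the fitted salary; more experience raises it too.
print(predicted_salary(era=3.00, experience=8) > example)
print(predicted_salary(era=3.50, experience=9) > example)
```

Heteroskedasticity in this model affects the reliability of the reported standard errors, not the fitted values computed here.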

18 Variables Explanation
Offensive
1. BA
2. HR
3. RBI
Wins
4. Last Wins: previous year’s wins
5. WINS
Other
6. City Population
7. Team payroll

19 Are hitting statistics important factors for attendance?
2010 percent home attendance vs. important 2010 batting statistics
Dependent Variable: PERCENT
Method: Least Squares
Date: 11/30/10 Time: 23:21
Sample: 1 30
Included observations: 30

Variable   Coefficient   Std. Error   t-Statistic   Prob.
BA          31.89626     456.5329      0.069866     0.9448
HR          -0.067221      0.164119   -0.409587     0.6855
RBI          0.131782      0.090643    1.453856     0.1580
C          -18.45951      91.70705    -0.201288     0.8420

R-squared             0.197309    Mean dependent var      68.53000
Adjusted R-squared    0.104690    S.D. dependent var      19.27158
S.E. of regression    18.23493    Akaike info criterion   8.768120
Sum squared resid     8645.326    Schwarz criterion       8.954947
Log likelihood       -127.5218    F-statistic             2.130342
Durbin-Watson stat    0.591678    Prob(F-statistic)       0.120674

20 Drop BA
Dependent Variable: PERCENT
Method: Least Squares
Date: 12/01/10 Time: 12:44
Sample: 1 30
Included observations: 30

Variable   Coefficient   Std. Error   t-Statistic   Prob.
HR          -0.071257     0.150760    -0.472652     0.6403
RBI          0.135978     0.066638     2.040545     0.0512
C          -12.47021     31.97302     -0.390023     0.6996

R-squared             0.197158    Mean dependent var      68.53000
Adjusted R-squared    0.137688    S.D. dependent var      19.27158
S.E. of regression    17.89574    Akaike info criterion   8.701642
Sum squared resid     8646.950    Schwarz criterion       8.841761
Log likelihood       -127.5246    F-statistic             3.315261
Durbin-Watson stat    0.587604    Prob(F-statistic)       0.051583

21 Drop Constant
Dependent Variable: PERCENT
Method: Least Squares
Date: 12/01/10 Time: 12:40
Sample: 1 30
Included observations: 30

Variable   Coefficient   Std. Error   t-Statistic   Prob.
RBI          0.113385     0.032437     3.495558     0.0016
HR          -0.052144     0.140397    -0.371400     0.7131

R-squared             0.192635    Mean dependent var      68.53000
Adjusted R-squared    0.163800    S.D. dependent var      19.27158
S.E. of regression    17.62270    Akaike info criterion   8.640593
Sum squared resid     8695.666    Schwarz criterion       8.734006
Log likelihood       -127.6089    F-statistic             6.680705
Durbin-Watson stat    0.481354    Prob(F-statistic)       0.015250

RBIs ARE THE MOST IMPORTANT OFFENSIVE STATISTIC

22 Are previous year’s wins significant for attendance? (Yes)
Dependent Variable: PERCENT
Method: Least Squares
Date: 12/01/10 Time: 12:47
Sample: 1 30
Included observations: 30

Variable   Coefficient   Std. Error   t-Statistic   Prob.
LASTWINS     1.191640     0.225222     5.290966     0.0000
C          -27.99286     18.41785     -1.519877     0.1398

R-squared             0.499949    Mean dependent var      68.53000
Adjusted R-squared    0.482090    S.D. dependent var      19.27158
S.E. of regression    13.86898    Akaike info criterion   8.161526
Sum squared resid     5385.758    Schwarz criterion       8.254939
Log likelihood       -120.4229    F-statistic             27.99432
Durbin-Watson stat    1.528954    Prob(F-statistic)       0.000013

23 Are current year’s wins significant for attendance? (Yes)*
Dependent Variable: PERCENT
Method: Least Squares
Date: 12/01/10 Time: 12:48
Sample: 1 30
Included observations: 30

Variable   Coefficient   Std. Error   t-Statistic   Prob.
WINS         0.935905     0.279721     3.345851     0.0023
C           -7.278343    22.85866     -0.318406     0.7525

R-squared             0.285618    Mean dependent var      68.53000
Adjusted R-squared    0.260104    S.D. dependent var      19.27158
S.E. of regression    16.57687    Akaike info criterion   8.518234
Sum squared resid     7694.195    Schwarz criterion       8.611648
Log likelihood       -125.7735    F-statistic             11.19472
Durbin-Watson stat    0.832869    Prob(F-statistic)       0.002348

* but not as significant as the previous year’s record (bandwagon effect)
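The two single-regressor equations (attendance on previous year's wins, and attendance on current year's wins) can be compared by plugging in the same win totals. The win totals below are hypothetical inputs chosen for illustration; the coefficients are the ones reported in the two outputs.

```python
def percent_from_last_wins(last_wins):
    """Fitted attendance % from the LASTWINS regression."""
    return -27.99286 + 1.191640 * last_wins


def percent_from_current_wins(wins):
    """Fitted attendance % from the WINS regression."""
    return -7.278343 + 0.935905 * wins


# Hypothetical season win totals.
for wins in (70, 81, 95):
    by_last = percent_from_last_wins(wins)
    by_current = percent_from_current_wins(wins)
    print(f"{wins} wins: last-year model {by_last:6.2f}%, "
          f"current-year model {by_current:6.2f}%")
```

The steeper slope on LASTWINS means the fitted attendance moves more per win in the previous-year model, which is consistent with the slide's bandwagon remark.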

24 Is city population significant for attendance? (No)
Dependent Variable: PERCENT
Method: Least Squares
Date: 12/01/10 Time: 12:51
Sample: 1 30
Included observations: 30

Variable   Coefficient   Std. Error   t-Statistic   Prob.
CITYPOP     2.95E-06     1.64E-06      1.797594     0.0830
C           63.59600     4.362240     14.57875      0.0000

R-squared             0.103465    Mean dependent var      68.53000
Adjusted R-squared    0.071446    S.D. dependent var      19.27158
S.E. of regression    18.57039    Akaike info criterion   8.745354
Sum squared resid     9656.063    Schwarz criterion       8.838767
Log likelihood       -129.1803    F-statistic             3.231345
Durbin-Watson stat    0.609337    Prob(F-statistic)       0.083035

Attendance reflects general baseball enthusiasm more than population. E.g.:
Boston: population = 645,169, attendance = 100.9%
Arizona: population = 1,601,587, attendance = 51.8%
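The Boston and Arizona figures on the slide can be compared with what the CITYPOP regression would predict for them. The sketch below computes the fitted values and residuals from the reported coefficients, which carry only three significant digits for the slope, so the results are approximate.

```python
def fitted_percent(city_population):
    """Fitted attendance % from the CITYPOP regression."""
    return 63.59600 + 2.95e-06 * city_population


# (population, actual attendance %) as quoted on the slide.
cities = {
    "Boston": (645_169, 100.9),
    "Arizona": (1_601_587, 51.8),
}

residuals = {
    name: actual - fitted_percent(population)
    for name, (population, actual) in cities.items()
}

for name, residual in residuals.items():
    print(f"{name:<8} actual minus fitted = {residual:+.2f} percentage points")
```

Boston sits far above the fitted line and Arizona far below it, which illustrates why population alone explains so little of attendance.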

25 Is team payroll significant for attendance? (Yes)
Dependent Variable: PERCENT
Method: Least Squares
Date: 12/01/10 Time: 12:52
Sample: 1 30
Included observations: 30

Variable   Coefficient   Std. Error   t-Statistic   Prob.
TEAMPAY     3.63E-07     6.60E-08      5.503025     0.0000
C           35.48078     6.498276      5.460030     0.0000

R-squared             0.519588    Mean dependent var      68.53000
Adjusted R-squared    0.502430    S.D. dependent var      19.27158
S.E. of regression    13.59391    Akaike info criterion   8.121461
Sum squared resid     5174.243    Schwarz criterion       8.214874
Log likelihood       -119.8219    F-statistic             30.28328
Durbin-Watson stat    1.229796    Prob(F-statistic)       0.000007

High team payroll = big names = more fans.

26 Are city population and payroll correlated?
Dependent Variable: CITYPOP
Method: Least Squares
Date: 11/30/10 Time: 23:57
Sample: 1 30
Included observations: 30

Variable   Coefficient   Std. Error   t-Statistic   Prob.
TEAMPAY     0.033187     0.008261      4.017155     0.0004
C          -1349341.     813618.3     -1.658444     0.1084

R-squared             0.365619    Mean dependent var      1671312.
Adjusted R-squared    0.342963    S.D. dependent var      2099771.
S.E. of regression    1702029.    Akaike info criterion   31.59688
Sum squared resid     8.11E+13    Schwarz criterion       31.69029
Log likelihood       -471.9532    F-statistic             16.13753
Durbin-Watson stat    2.409794    Prob(F-statistic)       0.000401

Even though big cities don’t necessarily lead to high attendance %, city population is correlated with team payroll. There is an indirect effect from city population: larger population => higher payroll => attracts big names.
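In a regression with a single explanatory variable, R-squared equals the squared correlation between the two variables, and the sign of the correlation matches the sign of the slope. The sketch below recovers the implied correlation between city population and team payroll from the reported output; the function name is illustrative.

```python
import math


def correlation_from_simple_regression(r2, slope):
    """Correlation implied by a one-regressor OLS fit: sqrt(R^2) with the slope's sign."""
    return math.copysign(math.sqrt(r2), slope)


r = correlation_from_simple_regression(0.365619, 0.033187)
print(f"Implied correlation between CITYPOP and TEAMPAY: {r:.4f}")
```

A correlation of about 0.60 is moderate, which fits the slide's reading of an indirect link rather than a tight one.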

27 Conclusion
Most Significant Statistics:
Salary: Experience, ERA
Wins: Experience, ERA, Offense (Constant)
Attendance: RBI, Wins, Team Payroll

28 Questions?

