
1 Baseball Statistics: Just for Fun!

2 Issues, Theory, and Data
Hypothesis: Do home run hitters have more strikeouts and four balls (walks), and fewer steals?
Data collection: Korea Baseball Organization and US Major League home pages.
Model: y1 = #strikeouts, y2 = #steals, y3 = #4Bs, x = #HRs. Regress y on a constant and x.
Hypothesis testing: Test the statistical significance of the regression slopes using t-tests.

3 2. Data Collection
KBO: http://www.koreabaseball.or.kr
US Major League Baseball: http://www.majorleaguebaseball.com

4 3. Model I
$(\#\text{strikeouts}) = \alpha_1 + \beta_1 (\#\text{HRs}) + \varepsilon$

5 3. Model II
$(\#\text{steals made}) = \alpha_2 + \beta_2 (\#\text{HRs}) + \varepsilon$
$(\#\text{steals attempted}) = \alpha_3 + \beta_3 (\#\text{HRs}) + \varepsilon$

6 3. Model III
$(\#\text{four balls}) = \alpha_4 + \beta_4 (\#\text{HRs}) + \varepsilon$

7 4. Hypothesis Testing: t-tests on the slopes $\beta$
Model I (#strikeouts): $\beta_1 = 0.84$, t-value = 2.89; $\alpha_1$ = ??. Significant.
Model II (#steals made): $\beta_2 = -0.12$, t-value = -0.94; (#steals attempted): $\beta_3 = -0.18$, t-value = -1.14; $\alpha_2$, $\alpha_3$ = ??. Insignificant.
Model III (#four balls): $\beta_4 = 0.51$, t-value = 2.50; $\alpha_4$ = ??. Significant.
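As a rough illustration of the slope t-tests on this slide, here is a minimal sketch of a regression of y on a constant and x, with the t-value of the slope. The function name `simple_ols` and the eight data points are made up for illustration; they are not the KBO or MLB data behind the slides.

```python
import math

def simple_ols(x, y):
    """Regress y on a constant and x; return (intercept, slope, t-value of slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual variance uses n - 2 degrees of freedom (two estimated coefficients).
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se_slope = math.sqrt(rss / (n - 2) / sxx)
    return intercept, slope, slope / se_slope

# Made-up numbers for illustration only (hypothetical hitters).
home_runs = [5, 12, 20, 8, 30, 15, 25, 3]
strikeouts = [40, 55, 80, 50, 95, 60, 90, 35]

alpha, beta, t_value = simple_ols(home_runs, strikeouts)
print(f"alpha = {alpha:.2f}, beta = {beta:.2f}, t-value = {t_value:.2f}")
```

A common rule of thumb is to call a slope significant at roughly the 5% level when the absolute t-value exceeds about 2, which is consistent with how the slopes above are labeled.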

8 4. Hypothesis Testing
(1) HR hitters strike out more!
(2) HR hitters do not steal bases well because of their big bodies. Insignificant.
(3) HR hitters draw more four balls!

9 Wait a minute! To prevent “spurious correlation” between #HRs and #strikeouts, #steals, and #4Balls, we need to control for the number of appearances in the batter's box. Right!

10 Multiple Regression: control for “#at bats”
Without controlling for #at bats, a hitter with more appearances would record a higher number in each category than others, generating “spurious correlation” between any pair of variables among #HRs, #strikeouts, #steals, and #four balls.
Two ways to control for the number of appearances in the batter's box:
1. Use a sub-sample of hitters with more than 100 appearances.
2. Use “#at bats” as a control variable in a multiple regression.

11 Model I (extended)
$(\#\text{strikeouts}) = \alpha_1 + \beta_1 (\#\text{HRs}) + \beta_2 (\#\text{at bats}) + \varepsilon$
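To see why the control variable matters, the following sketch simulates hitters whose home runs and strikeouts are drawn independently on each at bat, so that any link between the two comes only through #at bats. It then fits the regression with and without the control. Everything here is an assumption for illustration: the helper names `solve` and `ols`, the per-at-bat rates 0.04 and 0.20, and the range of at bats. None of it comes from the slides or the real data.

```python
import math
import random

def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def ols(columns, y):
    """Regress y on a constant plus the given columns.

    Returns (coefficients, t_values), with the constant first."""
    n = len(y)
    rows = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    coef = solve(xtx, xty)
    rss = sum((yi - sum(c * v for c, v in zip(coef, r))) ** 2
              for r, yi in zip(rows, y))
    sigma2 = rss / (n - k)  # residual variance with n - k degrees of freedom
    t_values = []
    for j in range(k):
        unit = [1.0 if i == j else 0.0 for i in range(k)]
        inv_jj = solve(xtx, unit)[j]  # j-th diagonal element of the inverse of X'X
        t_values.append(coef[j] / math.sqrt(sigma2 * inv_jj))
    return coef, t_values

# Simulated hitters (not real data): each at bat independently yields a home run
# with probability 0.04 and a strikeout with probability 0.20.
rng = random.Random(0)
at_bats = [rng.randint(10, 550) for _ in range(300)]
home_runs = [sum(rng.random() < 0.04 for _ in range(ab)) for ab in at_bats]
strikeouts = [sum(rng.random() < 0.20 for _ in range(ab)) for ab in at_bats]

coef_short, t_short = ols([home_runs], strikeouts)
coef_long, t_long = ols([home_runs, at_bats], strikeouts)
print(f"without control: beta1 = {coef_short[1]:.2f} ({t_short[1]:.2f})")
print(f"with control:    beta1 = {coef_long[1]:.2f} ({t_long[1]:.2f}), "
      f"beta2 = {coef_long[2]:.2f} ({t_long[2]:.2f})")
```

In this setup the slope on #HRs without the control should come out large and positive, even though home runs have no effect on strikeouts by construction, while the slope with the control should be close to zero.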

12 Results (t-values in parentheses)
Using the entire sample:
  without #at bats: $\beta_1 = 2.40$ (11.64)
  with #at bats: $\beta_1 = 0.63$ (3.11), $\beta_2 = 0.14$ (12.53)
Using the sub-sample:
  without #at bats: $\beta_1 = 0.84$ (2.89)
  with #at bats: $\beta_1 = 0.89$ (2.88), $\beta_2 = -0.03$ (-0.49)

13 Interpretation
When using a sub-sample that is already rather homogeneous in terms of number of at bats, it doesn’t make much difference whether you control for #at bats or not. However, when using the entire sample, which comprises hitters differing vastly in number of at bats, controlling for #at bats does matter. In the entire sample, you would get distorted results if you did not control for #at bats.
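The size of the distortion can be made explicit with the standard omitted-variable identity for least squares. The symbols $b$ (slope without the control) and $\delta$ (slope from regressing #at bats on #HRs) are introduced here for illustration and do not appear on the slides. The identity is exact only if both regressions use the same hitters, which is assumed here, and the reported estimates are rounded, so the implied $\hat{\delta}$ is only approximate.

```latex
% Short regression (no control):  (#strikeouts) = a + b (#HRs) + e
% Long regression (with control): (#strikeouts) = \alpha_1 + \beta_1 (#HRs) + \beta_2 (#at bats) + \varepsilon
% Auxiliary regression:           (#at bats)    = c + \delta (#HRs) + u
\[
  \hat{b} = \hat{\beta}_1 + \hat{\beta}_2 \, \hat{\delta}
\]
% Entire sample, using the rounded strikeout estimates from the Results slide:
\[
  2.40 \approx 0.63 + 0.14 \, \hat{\delta}
  \quad\Longrightarrow\quad
  \hat{\delta} \approx \frac{2.40 - 0.63}{0.14} \approx 12.6
\]
```

Read this way, the uncontrolled slope of 2.40 in the entire sample is mostly the at-bats effect passing through #HRs. In the sub-sample $\hat{\beta}_2$ is close to zero, so the two slopes (0.84 and 0.89) nearly coincide.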

14 Model II (extended)
$(\#\text{4Balls}) = \alpha_1 + \beta_1 (\#\text{HRs}) + \beta_2 (\#\text{at bats}) + \varepsilon$

15 Results (t-values in parentheses)
Using the entire sample:
  without #at bats: $\beta_1 = 1.32$ (11.01)
  with #at bats: $\beta_1 = 0.33$ (2.73), $\beta_2 = 0.07$ (11.51)
Using the sub-sample:
  without #at bats: $\beta_1 = 0.51$ (2.50)
  with #at bats: $\beta_1 = 0.34$ (1.71), $\beta_2 = 0.12$ (2.77)

16 The End
Was it fun?

