Published by Brooke Coleman. Modified over 3 years ago.

1
An Intelligence Approach to Evaluation of Sports Teams
by Edward Kambour, Ph.D.

2
Agenda
I. College Football
II. Linear Model
III. Generalized Linear Model
IV. Intelligence (Bayesian) Approach
V. Results
VI. Other Sports
VII. Future Work

3
General Background
Goals
- Forecast winners of future games
  - Beat the Bookie!
- Estimate the outcome of unscheduled games
  - What's the probability that Iowa would have beaten Ohio St?
- Generate reasonable rankings

4
Major College Football
- No playoff system
- Computer rankings are an element of the BCS
- 114 teams
- 12 games for each in a season

5
Linear Model
- Rothman (1970s), Harville (1977), Stefani (1977), …, Kambour (1991), …, Sagarin???
- Response, $Y$, is the net result (point-spread)
- Parameter, $\beta$, is the vector of ratings
- For a game involving teams $i$ and $j$, $E[Y] = \beta_i - \beta_j$

6
Linear Model (cont.)
- Let $X$ be a row vector with $E[Y] = X\beta$ (that is, $+1$ in position $i$, $-1$ in position $j$, and 0 elsewhere)
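As a rough illustration of the row-vector formulation, the sketch below builds one row of X and fits ratings by least squares. The solver is an assumption made for this example (Gauss-Seidel sweeps over the normal equations), not something the slides specify, and the names `design_row` and `fit_ratings` and the three-game schedule are hypothetical.

```python
def design_row(i, j, n_teams):
    """Row of X for a game between teams i and j: +1 for i, -1 for j."""
    row = [0.0] * n_teams
    row[i] = 1.0
    row[j] = -1.0
    return row


def fit_ratings(games, n_teams, sweeps=500):
    """Least-squares ratings from (i, j, margin) games, margin = points_i - points_j.

    Assumed solver: Gauss-Seidel sweeps on the normal equations, where each
    team's rating is set to the mean of (opponent rating + its own margin).
    X is not full rank, so ratings are centred to sum to zero.
    """
    # Re-index the games from each team's point of view.
    by_team = [[] for _ in range(n_teams)]
    for i, j, y in games:
        by_team[i].append((j, y))
        by_team[j].append((i, -y))

    beta = [0.0] * n_teams
    for _ in range(sweeps):
        for t, played in enumerate(by_team):
            if played:
                beta[t] = sum(beta[o] + y for o, y in played) / len(played)
        mean = sum(beta) / n_teams
        beta = [b - mean for b in beta]
    return beta


# Hypothetical schedule: team 0 beat team 1 by 7, 1 beat 2 by 3, 0 beat 2 by 10.
games = [(0, 1, 7.0), (1, 2, 3.0), (0, 2, 10.0)]
beta = fit_ratings(games, 3)

# E[Y] = X beta for a game between team 0 and team 2.
row = design_row(0, 2, 3)
predicted = sum(x * b for x, b in zip(row, beta))
```

Only rating differences are identified, which is why the sketch pins the ratings down by centring them.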

7
Regression Model Notes
- Least Squares
- Normality, Homogeneity
- College Football
  - Estimate 100 parameters
  - Sample size for a full season is about 600
  - Design Matrix is sparse and not full rank

8
Home-field Advantage
- Generic Advantage (Stefani, 1980)
  - Force i to be home team and j the visiting team
  - Add an intercept term to X
  - Adds one more parameter to estimate
  - Same advantage for every team: UAB = Alabama, Rice = Texas A&M
- Team Specific Advantage
  - Doubles the number of parameters to estimate
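A minimal sketch of the generic home-field advantage, assuming the intercept is estimated jointly with the ratings by cyclic coordinate updates on the least-squares criterion. The solver, the name `fit_with_home_edge`, and the made-up double round-robin data are all assumptions for illustration.

```python
def fit_with_home_edge(games, n_teams, sweeps=1000):
    """Fit ratings plus one shared home-field intercept h.

    games: list of (home, away, margin), margin = home points - away points.
    Model: E[margin] = h + beta[home] - beta[away].
    Each pass sets h, then each rating, to its least-squares value given
    the others; ratings are centred because only differences matter.
    """
    beta = [0.0] * n_teams
    h = 0.0
    for _ in range(sweeps):
        # Intercept: average margin left over after removing the ratings.
        h = sum(y - beta[i] + beta[j] for i, j, y in games) / len(games)
        for t in range(n_teams):
            targets = [y - h + beta[j] for i, j, y in games if i == t]
            targets += [beta[i] + h - y for i, j, y in games if j == t]
            if targets:
                beta[t] = sum(targets) / len(targets)
        mean = sum(beta) / n_teams
        beta = [b - mean for b in beta]
    return beta, h


# Hypothetical noise-free data: true ratings 3, 0, -3 and a 2.5 point home edge,
# with every team hosting every other team once.
true_beta = [3.0, 0.0, -3.0]
games = [(i, j, true_beta[i] - true_beta[j] + 2.5)
         for i in range(3) for j in range(3) if i != j]
beta, h = fit_with_home_edge(games, 3)
```

A team-specific advantage would replace the single `h` with one intercept per home team, doubling the parameter count as the slide notes.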

9
Linear Model Issues
- Normality
- Homogeneity
- Lots of parameters, with relatively small sample size
- Overfitting
- The bookie takes you to the cleaners!

10
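Linear Model Issues (cont.)
- Should we model point differential?
  - A and B play twice
  - Scenario 1: A by 34 in the first, B by 14 in the second
  - Scenario 2: A by 10 each time
  - Both scenarios give A the same net margin of 20, so the linear model treats them alike
- Running up the score (or lack thereof)
- BCS: Thou shalt not use margin of victory in thy ratings!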

11
Logistic Regression
- Rothman (1970s)
- Linear Model
- Use binary variable
- Winning is all that matters
- Avoid margin of victory
- Coin Flips
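The win/loss idea can be sketched as a logistic model on rating differences. This is an illustrative assumption rather than the method on the slide: the fit uses plain gradient ascent, and a small ridge penalty (the hypothetical `ridge` argument) is added so that an undefeated team does not push its rating toward infinity.

```python
import math


def win_prob(beta_i, beta_j):
    """P(team i beats team j) under a logistic model on the rating difference."""
    return 1.0 / (1.0 + math.exp(-(beta_i - beta_j)))


def fit_logistic(results, n_teams, ridge=0.1, step=0.1, iters=5000):
    """Fit ratings from (winner, loser) pairs by gradient ascent.

    The ridge term is an assumption of this sketch: without it the
    maximum-likelihood rating of an undefeated team is infinite.
    """
    beta = [0.0] * n_teams
    for _ in range(iters):
        grad = [-ridge * b for b in beta]
        for w, l in results:
            surprise = 1.0 - win_prob(beta[w], beta[l])
            grad[w] += surprise
            grad[l] -= surprise
        beta = [b + step * g for b, g in zip(beta, grad)]
    return beta


# Hypothetical results: team 0 is undefeated, team 2 is winless.
results = [(0, 1), (0, 2), (1, 2), (0, 1)]
beta = fit_logistic(results, 3)
p_0_beats_1 = win_prob(beta[0], beta[1])
```

Margins of victory never enter the fit, so a one-point win and a blowout count the same.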

12
Logistic Regression Issues
- Still have sample size issues
- Throw away a lot of information
- Undefeated teams

13
Transformations
- Transform the differentials to normality
  - Power transformations
- Rothman logistic transform
  - Transforms points to probabilities for logistic regression
- Diminishing returns transforms
  - Downweights runaway scores

14
Power Transforms
- Transform the point-spread: $Y = \operatorname{sign}(Z)\,|Z|^a$
- a = 1: straight margin of victory
- a = 0: just win baby
- a = 0: Poisson or Gamma ish
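The signed power transform is short enough to write out directly. The function name `power_transform` is hypothetical; the formula is the one on the slide.

```python
import math


def power_transform(z, a):
    """Signed power transform Y = sign(Z) * |Z|**a of a point-spread Z."""
    if z == 0:
        return 0.0
    return math.copysign(abs(z) ** a, z)


straight = power_transform(-34, 1)    # a = 1 keeps the raw margin
win_only = power_transform(-14, 0)    # a = 0 keeps only who won
damped = power_transform(9, 0.5)      # 0 < a < 1 shrinks large margins
```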

15
Maximum Likelihood Transform
- seasons
- MLE of the power = 0.98
[Figure: -2 ln(likelihood) plotted against the power]

16
Predicting the Score
- Model point differential: $Y_1 = S_i - S_j$
- Additionally model the sum of the points scored: $Y_2 = S_i + S_j$
- Fit a similar linear model (different parameter estimates)
- Forecast home and visitors score: $H = (Y_1 + Y_2)/2$, $V = (Y_2 - Y_1)/2$
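The arithmetic for turning a forecast difference and a forecast sum into two scores is shown below. The function name `forecast_scores` and the input numbers are made up for illustration.

```python
def forecast_scores(y1, y2):
    """Split a forecast difference y1 and sum y2 into home and visitor scores."""
    home = (y1 + y2) / 2.0
    visitor = (y2 - y1) / 2.0
    return home, visitor


# Hypothetical forecast: home team favoured by 7 with 45 total points.
home, visitor = forecast_scores(7.0, 45.0)
```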

17
Another Transformation Idea
- Scores (touchdowns or field goals) are arrivals, maybe Poisson
- Final score = 7 times a Poisson + 3 times a Poisson + …
- Transform the scores to homogeneity and normality first
- The differences (and sums) should follow suit

18
Square Root Transform
- Since the score is similar to a linear combination of Poissons, square root should work
- Transformation: $Z = \sqrt{S + k}$
- Why k? For small Poisson arrival rates, get better performance (Anscombe, 1948)
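To see why a square root with an offset helps, the sketch below computes the variance of sqrt(X + k) for a Poisson count X by summing over the probability mass function. It assumes the transform has the form sqrt(X + k) and uses Anscombe's offset k = 3/8 for a pure Poisson count; the function name `transformed_variance` is hypothetical, and football scores are only roughly Poisson, so the slide's fitted k need not match.

```python
import math


def transformed_variance(lam, k, n_max=400):
    """Variance of sqrt(X + k) for X ~ Poisson(lam), by summing the pmf."""
    p = math.exp(-lam)          # P(X = 0)
    first = second = 0.0
    for n in range(n_max):
        z = math.sqrt(n + k)
        first += p * z
        second += p * z * z
        p *= lam / (n + 1)      # P(X = n + 1) from P(X = n)
    return second - first * first


# The raw variance of a Poisson equals its rate, so it changes with lam.
# After the transform the variance should sit near 1/4 across rates.
rates = (2.0, 5.0, 20.0, 50.0)
with_offset = {lam: transformed_variance(lam, 0.375) for lam in rates}
no_offset = {lam: transformed_variance(lam, 0.0) for lam in rates}
```

Comparing `with_offset` and `no_offset` at the smaller rates shows what the offset buys when arrivals are rare.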

19
Likelihood Test
- LRT: No transformation vs. square root with fitted k
- Used College Football results from
- k = 21
- Transformation was significantly better
- p-value = , chi-square = 9.26

20
Predicting the Score with Transform
- Model point differential
- Additionally model the sum of the points scored
- Forecast home and visitors score: $H = ((Y_1 + Y_2)/2)^2$, $V = ((Y_2 - Y_1)/2)^2$
- Note the point differential is the product: $H - V = Y_1 Y_2$
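The product remark follows from expanding the two squares. Here $Y_1$ and $Y_2$ are taken to be the forecast difference and sum on the transformed scale, which is how this slide appears to use them:

```latex
\begin{align*}
H - V &= \left(\frac{Y_1 + Y_2}{2}\right)^2 - \left(\frac{Y_2 - Y_1}{2}\right)^2 \\
      &= \frac{\left(Y_1^2 + 2 Y_1 Y_2 + Y_2^2\right) - \left(Y_2^2 - 2 Y_1 Y_2 + Y_1^2\right)}{4} \\
      &= Y_1 Y_2 .
\end{align*}
```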

21
Unresolved Linear Model Issues
- Overfitting
- History
  - Going into the season, we have a good idea as to how teams will do
  - The best teams tend to stay the best
  - The worst teams tend to stay the worst
  - Changes happen: Kansas State

22
Intelligence Model Concept
- The ratings and home-field advantages for year t are similar to those of year t-1.
- There is some drift from one year to the next.
- Model

23
Intelligence Model (Details)
Notation
- $L$ teams
- $M$ seasons of data
- $N_i$ games in the $i$th season
- $X_i$: the $N_i$ by $2L$ X matrix for season $i$
- $Y_i$: the $N_i$ vector of results for season $i$
- $\beta_i$: the $2L$ vector of ratings and home-field advantages for season $i$

24
Details (cont.) Data Distribution: For all i = 1, 2, …, M

25
Details (cont.) Prior Distribution

26
Details (finally, the end)
- The Posterior Distribution of $\beta_M$ and $\sigma^{-2}$ is closed form and can be calculated by an iterative method
- The Predictive Distribution for future results (transformed sum or difference) is straightforward correlated normal (given the variance)
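The slides' exact data and prior distributions did not survive in this transcript, so the following is only a one-dimensional sketch of the general idea: a normal belief about a single rating, a random-walk drift between seasons, and a conjugate normal update per result with the variance treated as known. The function names and all numbers are assumptions, and each result is treated as a direct noisy reading of the rating, which is much simpler than the full 2L-dimensional model.

```python
def drift(mean, var, drift_var):
    """Carry last season's posterior into this season's prior (random walk)."""
    return mean, var + drift_var


def update(mean, var, y, noise_var):
    """Conjugate normal update for one observation y ~ N(rating, noise_var)."""
    gain = var / (var + noise_var)
    return mean + gain * (y - mean), (1.0 - gain) * var


def run_seasons(seasons, prior_mean, prior_var, drift_var, noise_var):
    """Process seasons in order; return the (mean, var) after each season."""
    mean, var = prior_mean, prior_var
    history = []
    for index, results in enumerate(seasons):
        if index > 0:
            mean, var = drift(mean, var, drift_var)
        for y in results:
            mean, var = update(mean, var, y, noise_var)
        history.append((mean, var))
    return history


# Hypothetical: two seasons with one observed result each.
history = run_seasons([[2.0], [2.0]], prior_mean=0.0, prior_var=1.0,
                      drift_var=0.25, noise_var=1.0)
```

The drift step is what lets last season's information count without freezing the rating: uncertainty grows between seasons and shrinks again as results arrive.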

27
Forecasts
- For Scores: simply untransform, $E[Z^2] = \mathrm{Var}[Z] + E[Z]^2$
- For the point-spread: product of two normals; simulate results
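Both forecasting steps can be sketched in a few lines. The predictive means and variances below are invented, and the two normals are drawn independently for simplicity, whereas the slides describe the predictive distribution as correlated; the function names are hypothetical.

```python
import random


def expected_square(mean, var):
    """E[Z^2] = Var[Z] + E[Z]^2, the untransformed expectation of a squared normal."""
    return var + mean ** 2


def simulate_spread(m1, v1, m2, v2, n=20000, seed=0):
    """Draw point-spreads as the product of two normals.

    Y1 ~ N(m1, v1) is the transformed difference and Y2 ~ N(m2, v2) the
    transformed sum. Independence between them is an assumption of this sketch.
    """
    rng = random.Random(seed)
    return [rng.gauss(m1, v1 ** 0.5) * rng.gauss(m2, v2 ** 0.5)
            for _ in range(n)]


expected = expected_square(5.0, 1.0)

# Hypothetical predictive distributions on the transformed scale.
draws = simulate_spread(m1=0.5, v1=0.25, m2=10.0, v2=1.0)
mean_spread = sum(draws) / len(draws)
prob_home_wins = sum(d > 0 for d in draws) / len(draws)
```

Simulation also gives the whole distribution of the spread, so a win probability or a comparison against a betting line can be read off the same draws.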

28
Enhanced Model
- Fit the prior parameters
  - Hierarchical models
  - Drifts and initial variances
- No closed form for posterior and predictive distributions (at least as far as I know)
- The complete conditionals are straightforward, so Gibbs sampling will work (eventually)
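Gibbs sampling alternates draws from each complete conditional. The toy sampler below shows the mechanics on a much simpler problem than the hierarchical ratings model: a normal sample with unknown mean and variance, a flat prior on the mean and a 1/variance prior on the variance. The model, priors, data, and the name `gibbs_normal` are all assumptions chosen for illustration.

```python
import random


def gibbs_normal(data, n_draws=4000, burn_in=500, seed=1):
    """Gibbs sampler for the mean and variance of normal data.

    Assumed priors: flat on the mean, proportional to 1/variance on the variance.
    """
    rng = random.Random(seed)
    n = len(data)
    ybar = sum(data) / n
    mu, var = ybar, 1.0
    mus, variances = [], []
    for step in range(n_draws + burn_in):
        # mean | variance, data ~ Normal(ybar, variance / n)
        mu = rng.gauss(ybar, (var / n) ** 0.5)
        # precision | mean, data ~ Gamma(shape = n / 2, rate = ss / 2);
        # gammavariate takes a scale, so pass 2 / ss.
        ss = sum((y - mu) ** 2 for y in data)
        var = 1.0 / rng.gammavariate(n / 2.0, 2.0 / ss)
        if step >= burn_in:
            mus.append(mu)
            variances.append(var)
    return mus, variances


# Hypothetical observations.
data = [3.1, 2.4, 3.8, 2.9, 3.3, 2.7, 3.5, 3.0]
mus, variances = gibbs_normal(data)
posterior_mean = sum(mus) / len(mus)
```

Each step only needs a standard distribution, which is the sense in which straightforward complete conditionals make the approach workable even without a closed-form posterior.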

29
Results (www.geocities.com/kambour/football.html)
2002 Final Rankings
Team            Rating          Home
Miami           72.23 (1.03)    0.21 (0.04)
Kansas St       72.04 (1.04)    0.44 (0.03)
USC             71.95 (1.03)    0.04 (0.03)
Oklahoma        71.85 (1.02)    0.18 (0.03)
Texas           71.57 (1.03)    0.36 (0.03)
Georgia         71.49 (1.03)    0.02 (0.03)
Alabama         71.45 (1.03)    -0.09 (0.03)
Iowa            71.30 (1.03)    0.21 (0.04)
Florida St      71.29 (1.02)    0.43 (0.03)
Virginia Tech   71.25 (1.03)    0.12 (0.03)
Ohio St         71.18 (1.03)    0.27 (0.03)

32
Bowl Predictions
- Ohio St 17, Miami Fl (-13)
- Washington St 21, Oklahoma (-6.5)
- Iowa 21, USC (-6)
- NC State (E) 20, Notre Dame
- Florida St (+4) 24, Georgia

33
2002 Final Record
- Picking Winners: 522 –
- Against the Vegas lines: 367 – 307 –
- Best Bets: 9 –
- In 2001,

34
ESPN College Pickem (http://games.espn.go.com/cpickem/leader) 1. Barry Schultz Jim Dobbs Michael Reeves Fup Biz Joe * Rising Cream Intelligence Ratings 5559

35
Ratings System Comparison (http://tbeck.freeshell.org/fb/awards2002.html)
- Todd Beck, Ph.D., Statistician, Rush Institute
- Intelligence Ratings – Best Predictors

36
College Football Conclusions
- Can forecast the outcome of games
  - Capture the random nature
  - High variability
  - Sparse design
- Scientists should avoid BCS
  - Statistical significance is impossible
  - Problem Complexity
  - Other issues

37
NFL
- Similar to College Football
- Square root transform is applicable
- Drift is a little higher than College Football
- Better design matrix
- Small sample size
- Playoff

38
NFL Results (www.geocities.com/kambour/NFL.html)
2002 Final Rankings (after the Super Bowl)
Team            Rating          Home
Tampa Bay
Oakland
Philadelphia
New England
Atlanta
NY Jets
Pittsburgh
Green Bay
Kansas City
Denver
Miami

39
2002 Final NFL Record
- Picking Winners: 162 – 104 –
- Against the Vegas lines: 135 – 128 –
- Best Bets: 9 –

40
NFL Europe
- Similar to College and NFL
- Square root transform
- Dramatic drift
  - Teams change dramatically in mid-season
- Few teams
  - Better design matrix

41
College Basketball
- Transform?
  - Much more normal (Central Limit Theorem)
- A lot more games
- Intersectional games
- Less emphasis on programs than in College Football
  - More drift
- NCAA tournament

42
NCAA Basketball Pre-tournament Ratings
Team            Rating          Home
Arizona
Kentucky
Kansas
Texas
Duke
Oklahoma
Florida
Wake Forest
Syracuse
Xavier
Louisville

43
NBA
- Similar to College Basketball
- Normal – No transformation
- A lot more games – fewer teams
- Playoffs are completely different from regular season
  - Regular season – very balanced, strong home court
  - Post season – less balanced, home court lessened

44
Hockey
- Transform
  - Rare events = Poissonish
  - Square root with k around 1
- A lot more games
- History matters
- Playoffs seem similar to regular season
- Balance

45
Soccer
- Similar to hockey
- Transform
  - Square root with low k
- Not a lot of games
- Friendlies versus cup play
- Home pitch is pronounced
  - Varies widely

46
Soccer Results
- Correctly forecasted 2002 World Cup final: Brazil over Germany
- Correctly forecasted US run to quarter-finals
- Won the PROS World Cup Soccer Pool

47
Future Enhancements
- Hierarchical Approaches
  - Conferences
- More complicated drift models
  - Correlations
  - Individual drifts
  - Drift during the season
  - Mean correcting drift
- More informative priors
