Probabilistic Sports Prediction Using Machine Learning Ryan Baird Supervisor: A. Prof. David Dowe.


1 Probabilistic Sports Prediction Using Machine Learning Ryan Baird Supervisor: A. Prof. David Dowe.

2 OVERVIEW: Introduction –Why Sports Prediction? Background –The Elo System of Prediction. Research Undertaken –Prediction and Estimation of K. –Estimation Techniques. –Extending Elo – Home Ground Advantages –Model Selection –Results Conclusion

3 Introduction: Why Sports Prediction? Perfect domain for testing and comparing prediction techniques. –Large amounts of past data readily available. –Easy comparison of machine and human performance. Widely understood by the public. It’s fun!

4 The Elo Rating System Devised as a chess player rating scheme by Arpad E. Elo in 1966. Each team / player is given a starting rank, e.g. $R_A = 1000$. Predictions are given by: $S_E(A) = \frac{c^{R_A}}{c^{R_A} + c^{R_B}}$, where $S_E(A)$ is the expected score of Team A, or the probability that A will defeat B. Rankings are updated after each match: $R'_A = R_A + K\,(S(A) - S_E(A))$, where $R'_A$ is the new rating, $R_A$ the old rating, $S_E(A)$ the expected score (or prediction), and $S(A)$ the actual score (1 = win, 0 = loss, 0.5 = draw). But what is K?
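
The prediction and update rules on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the presenter's software: the function names are hypothetical, and the slide does not give a value for the constant c, so the conventional chess choice c = 10^(1/400) is assumed here.

```python
# Assumption: the slide does not state c; 10 ** (1 / 400) is the usual chess convention.
DEFAULT_C = 10 ** (1 / 400)


def expected_score(r_a, r_b, c=DEFAULT_C):
    """S_E(A) = c^R_A / (c^R_A + c^R_B): probability that A defeats B."""
    strength_a = c ** r_a
    strength_b = c ** r_b
    return strength_a / (strength_a + strength_b)


def play_match(r_a, r_b, actual_a, k, c=DEFAULT_C):
    """Apply R' = R + K * (S - S_E) to both sides of one match.

    actual_a is 1 for a win by A, 0 for a loss, 0.5 for a draw.
    B's actual and expected scores are the complements of A's.
    """
    exp_a = expected_score(r_a, r_b, c)
    new_a = r_a + k * (actual_a - exp_a)
    new_b = r_b + k * ((1 - actual_a) - (1 - exp_a))
    return new_a, new_b


if __name__ == "__main__":
    # Two teams on the starting rank of 1000; A wins; K = 32 is an arbitrary example value.
    print(play_match(1000.0, 1000.0, 1, k=32))
```

With equal ratings the expected score is 0.5, so a win moves the winner up by K/2 and the loser down by the same amount.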

5 The Update Factor, K K is a constant which controls how much the ratings change, given a particular result. It needs to be determined. Estimate using: –Maximum Likelihood –Minimum Message Length Predict using: –Bayesian Averaging

6 The Likelihood Function Based on the scoring system used by the CSSE footy tipping competition. http://www.csse.monash.edu.au/~footy Predictions are scored as follows: –If “A” wins: $1 + \log_2 p$ –If “A” loses: $1 + \log_2 (1-p)$ –If drawn: $1 + \log_2 (p(1-p))$ Let $\delta_i$ = 1 if A wins, 0 if A loses, 0.5 if drawn. $L(K) = -\sum_{i=1}^{n} \left[ \delta_i \log_2 p_i + (1-\delta_i) \log_2 (1-p_i) \right]$
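
As a rough sketch of how L(K) could be evaluated and minimised, the code below replays a season with a given K, accumulates the negative log2 likelihood, and picks the best K from a list of candidates. The match format, the function names, the grid search and the value of c are all assumptions made for illustration; the slides do not say how the maximisation was actually carried out.

```python
import math

# Assumption: c is not given on the slides; the chess convention is used.
DEFAULT_C = 10 ** (1 / 400)


def neg_log_likelihood(k, matches, start=1000.0, c=DEFAULT_C):
    """L(K) = -sum_i [delta_i*log2(p_i) + (1 - delta_i)*log2(1 - p_i)].

    matches is a list of (team_a, team_b, delta) tuples in playing order,
    with delta = 1 if A won, 0 if A lost, 0.5 for a draw (hypothetical format).
    """
    ratings = {}
    total = 0.0
    for team_a, team_b, delta in matches:
        r_a = ratings.get(team_a, start)
        r_b = ratings.get(team_b, start)
        p = c ** r_a / (c ** r_a + c ** r_b)  # Elo prediction for A
        total += delta * math.log2(p) + (1 - delta) * math.log2(1 - p)
        # Elo update for both teams after the result is known.
        ratings[team_a] = r_a + k * (delta - p)
        ratings[team_b] = r_b + k * ((1 - delta) - (1 - p))
    return -total


def ml_estimate_k(matches, candidates):
    """Maximum likelihood over a grid: the candidate K with the smallest L(K)."""
    return min(candidates, key=lambda k: neg_log_likelihood(k, matches))


if __name__ == "__main__":
    # Made-up results, purely to exercise the functions.
    season = [("X", "Y", 1), ("Y", "Z", 0.5), ("X", "Z", 1), ("X", "Y", 1)]
    print(ml_estimate_k(season, candidates=range(0, 101, 10)))
```

With K = 0 the ratings never move, every prediction is 0.5, and each game costs exactly one bit, which gives a useful baseline when comparing values of K.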

7 Minimum Message Length Wallace & Boulton, 1968. Estimate by minimizing the length of a two-part message, where H is the hypothesis and D is the data: MsgLen = MsgLen(H) + MsgLen(D|H). Provides a trade-off between model fit and hypothesis complexity. Applied using the Wallace-Freeman (1987) approximation to the message length.

8 Bayesian Averaging Form predictions using each of a fixed range of K values (e.g. 0-100). Calculate the score so far this season for each of those K values. Calculate a weighted average of the predictions, where the weight for each K is its score so far this season multiplied by a prior for that K, i.e. $P(A) = \sum_K P_K(A) \cdot S_K \cdot \mathrm{Prior}_K$
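
The weighted average can be illustrated as follows. This sketch makes two assumptions that the slide does not state: the weights S_K · Prior_K are normalised so that the result is a probability, and each S_K is already a non-negative weight (for example 2 raised to the cumulative log2 score, which is one plausible conversion). The function name and the example numbers are hypothetical.

```python
def bayesian_average(predictions, scores, priors):
    """Combine per-K predictions: P(A) = sum_K P_K(A) * S_K * Prior_K / sum_K S_K * Prior_K.

    predictions[k] is P_K(A), the Elo prediction made with update factor k.
    scores[k] is S_K, assumed here to be a non-negative weight.
    priors[k] is the prior weight given to k.
    The division by the total weight is an assumption that makes this a true
    weighted average.
    """
    weights = {k: scores[k] * priors[k] for k in predictions}
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return sum(predictions[k] * weights[k] for k in predictions) / total


if __name__ == "__main__":
    # Made-up values for two candidate K values with a uniform prior.
    preds = {10: 0.6, 20: 0.8}
    scores = {10: 1.0, 20: 3.0}
    priors = {10: 0.5, 20: 0.5}
    print(bayesian_average(preds, scores, priors))
```

Because no single K is selected, a K value that has scored well so far dominates the combined prediction without the others being discarded outright.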

9 Extending Elo – Home Ground Advantages Most tippers are familiar with the idea of “Home Ground Advantage” (HGA). Predict using: $S_E(A) = \frac{c^{R_A} + c^{HGA}}{c^{R_A} + c^{R_B} + c^{HGA}}$ HGA factors must also be estimated. Can have: –no HGA –1 HGA, which applies to all teams when at home –1 HGA per team, regardless of opposition –and so on, up to any number. How many should we include?
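
A small sketch of the home-ground prediction follows, reading the slide's formula as S_E(A) = (c^R_A + c^HGA) / (c^R_A + c^R_B + c^HGA) with A as the home team. That reading, the function name, the value of c and the example HGA value are all assumptions; the slide's layout did not survive extraction cleanly.

```python
# Assumption: c is not given on the slides; the chess convention is used.
DEFAULT_C = 10 ** (1 / 400)


def expected_score_home(r_home, r_away, hga, c=DEFAULT_C):
    """Home team's expected score with a home ground advantage term.

    Implements (c^R_A + c^HGA) / (c^R_A + c^R_B + c^HGA), where A is at home.
    The HGA term adds to the home side's strength only.
    """
    home_strength = c ** r_home + c ** hga
    away_strength = c ** r_away
    return home_strength / (home_strength + away_strength)


if __name__ == "__main__":
    # Equal ratings; the HGA value of 900 is purely illustrative.
    print(expected_score_home(1000.0, 1000.0, hga=900.0))
```

Under this reading, two equally rated teams no longer get a 0.5 prediction: the home side's probability rises as the HGA value grows.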

10 Model Selection Can select models using techniques already discussed: –Maximum Likelihood – Choose the model which maximises the likelihood function. –Minimum Message Length – Choose the model which minimises the 2 part message length. –Bayesian Averaging – Don’t choose a single model: combine them with a weighted average. Or new techniques: –Akaike’s Information Criterion

11 Akaike’s Information Criterion Hirotugu Akaike, 1969. Addresses the overfitting of Maximum Likelihood. Where multiple models are possible, AIC provides a metric for comparing them and selecting the best model. $\mathrm{AIC} = -2 \log(\text{maximum likelihood}) + 2\,(\text{number of parameters})$
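
To make the comparison concrete, the sketch below computes AIC for a few candidate models and selects the one with the smallest value. The model names echo the HGA options discussed earlier, but the log-likelihoods and parameter counts are made-up numbers for illustration only, not results from the presentation. The natural logarithm is assumed; a likelihood measured in bits would first be multiplied by ln 2.

```python
def aic(max_log_likelihood, n_params):
    """AIC = -2 * log(maximum likelihood) + 2 * (number of parameters).

    max_log_likelihood is the natural log of the maximised likelihood.
    """
    return -2.0 * max_log_likelihood + 2 * n_params


def select_by_aic(models):
    """Return the name of the model with the smallest AIC.

    models maps a name to (max_log_likelihood, n_params).
    """
    return min(models, key=lambda name: aic(*models[name]))


if __name__ == "__main__":
    # Hypothetical fits: a better likelihood must outweigh the extra parameters.
    candidates = {
        "no HGA": (-100.0, 1),
        "1 HGA": (-99.5, 2),
        "1 HGA per team": (-95.0, 17),
    }
    for name, (loglik, n_params) in candidates.items():
        print(name, aic(loglik, n_params))
    print("selected:", select_by_aic(candidates))
```

In this invented example the extra HGA parameters improve the fit too little to pay for their penalty, so the simplest model is selected.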

12 Results – A.F.L. Season 2000

13 Results – A.F.L. Season 2001

14 Results – A.F.L. Season 2002

15 Conclusion / Achievements: Applied, evaluated and compared prediction and estimation techniques in the domain of sports prediction. Created software for sports prediction, using Elo with MML inference of K and HGAs.

16 Future Work Add more HGAs – perhaps 1 per team per opposition. Software currently works for sports with two competitors: –Cricket, Tennis. Extend to sports with more competitors: –Motor Racing, Swimming, Athletics. –More difficult to model. –No longer a win-lose situation.

17 Thank you! Any questions?

