Josh Weissbock Using Performance Metrics to Forecast Success in the National Hockey League.

1 Josh Weissbock Using Performance Metrics to Forecast Success in the National Hockey League

2 Outline Introduction to Hockey. Introduction to Performance Metrics. Predicting the Outcome of a Single Game. Exploration of Performance Metrics. J. Weissbock (2013)

3 Outline Continued Predicting other series. Random Chance and Luck in the NHL. Predicting Best-of-Seven Playoff Series. Use of NLP and Game Report Predictions. Future Work. J. Weissbock (2013)

4 Aim Performance Metrics (or “Advanced Stats”) have been shown on the internet to correlate much more strongly with wins and points in the standings, for the National Hockey League, than the traditional statistics posted by the NHL. Can we use these advanced stats to predict success in the NHL? J. Weissbock (2013)

5 Introduction Lack of academic attention to hockey.
Hard to analyze due to the lack of events (goals). We attempt to use Machine Learning to predict a single game in the National Hockey League: Using Traditional Statistics; Using Performance Metrics; and “Tuning Performance Metrics” J. Weissbock (2013)

6 What is Machine Learning
Branch of Artificial Intelligence. The construction and study of systems that learn from data. “Learning by example”. “Field of study that gives computers the ability to learn without being explicitly programmed” – Arthur Samuel, 1959. E.g. OCR – Learning of printed digits from previous examples. J. Weissbock (2013)

7 Simple ML Example J. Weissbock (2013)

8 Sports in Machine Learning
Chen et al. (1994) used Neural Networks to predict greyhound races. 2006 Soccer World Cup prediction accuracy 75%. NFL accuracy with neural networks: 78.6%. NCAA Football games prediction accuracy: 76%. NBA basketball accuracy at 76%. J. Weissbock (2013)

9 Background Hockey is a sport played on a rectangular sheet of ice 60-61m x 25-30m. Two teams of five players and one goalkeeper (goalie) each. The team that scores the most goals in 60 minutes wins. The NHL is the top league in the world. Other major leagues: KHL, SHL, ELH. The top 16 teams at the end of the year compete in an elimination tournament for the Stanley Cup: four rounds of best-of-seven series. J. Weissbock (2013)

10 Background Traditional statistics are “Real Time Scoring System” (RTSS) statistics. They are usually simple, goal-based stats (few events) and are subject to rink bias, e.g. Goals, Assists, +/-, Giveaways, Takeaways, etc. Bias has been demonstrated amongst RTSS statistics across NHL arenas. Advanced statistics are based on more events (all shots, misses, blocks, and goals) and have been shown to be highly correlated with wins and points. J. Weissbock (2013)

11 Advanced vs Traditional Statistics
Statistic: Home/Points r-squared, Road/Points
Fenwick Close: 0.623, 0.218
Goals Against: 0.358, -0.119
Goal Differential: 0.297, 0.196
Goals: 0.233, 0.084
Wins: 0.186, 0.199
Points: 0.177
Blocked Shots: 0.157, -0.117
Hits: 0.116, -0.021
Giveaways: -0.001, -0.002
Takeaways: -0.003
Source: J. Weissbock (2013)

12 Advanced vs Traditional Statistics
Team Stat Relationships
Statistic: vs Wins R^2, vs Points R^2
5-5 F/A: 0.605, 0.655
GA/G: 0.472, 0.510
PP+PK: 0.372, 0.390
G/G: 0.352, 0.360
Sv%: 0.227, 0.263
PP%: 0.221, 0.231
SA/G: 0.198, 0.191
S/G: 0.170, 0.203
S%: 0.160, 0.145
PK%: 0.152
FO%: 0.097, 0.109
Source: J. Weissbock (2013)

13 Advanced Statistics Fenwick Close: statistic of possession, summation of shots, missed shots, blocks, and goals. Correlates to zone time. “Close” refers to only when the score is within 1 in the 1st/2nd period, or when the score is tied in the 3rd/OT, to eliminate “Score Effects”. PDO: statistic of “luck” (or random chance), summation of Shooting % + Save %. Regresses to 100% +/- 2% over an 82-game season. 5/5 Goals For/Against: the ratio of goals scored for and against during even-strength play. J. Weissbock (2013)

14 Experiment 1 Predicting a single game in the NHL using both advanced and traditional statistics. To see how high an accuracy we can obtain. To see whether advanced or traditional statistics are more helpful for prediction at the micro-scale. J. Weissbock (2013)

15 Data 517 games of 2012-2013 NHL Season (72% of season)
Python script to collect data daily. 14 Features/Team collected including: Location, Goals For & Against, Season Goals For & Against, PP%, PK%, Sh%, Sv%, 5v5 Goals For/Against, Win Streak, Conference Standing, Fenwick Close, PDO. Data collected before and after game: Goals scored for & against, shots for & against. To assist calculating statistics for future games. Sources:,, J. Weissbock (2013)

16 Example of data J. Weissbock (2013)

17 Experiment Data represented as differentials between both teams.
Two entries for each game, one for each team. Labelled as either “win” or “loss”. Weka’s implementations of SMO (Support Vector Machines), Neural Networks, J48 (Decision Tree), and NaiveBayes. Binary classification using 10-fold cross-validation. Compared datasets of only traditional and advanced statistics, as well as both. J. Weissbock (2013)
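
The representation described on this slide (team-vs-team differentials, two labelled entries per game) can be sketched roughly as follows. This is only an illustration: the feature names, the dictionary layout, and the 1/0 encoding of location are assumptions, not the format used in the original Weka experiments.

```python
from typing import Dict, List, Tuple

Example = Tuple[Dict[str, float], str]


def game_to_examples(home: Dict[str, float],
                     away: Dict[str, float],
                     home_won: bool) -> List[Example]:
    """Turn one game into two labelled examples, one per team's point of view."""
    shared = sorted(set(home) & set(away))
    # Each feature is the difference between the team and its opponent.
    home_view = {name: home[name] - away[name] for name in shared}
    away_view = {name: away[name] - home[name] for name in shared}
    # Assumption: location is encoded as 1 for the home team, 0 for the road team.
    home_view["location"] = 1.0
    away_view["location"] = 0.0
    return [
        (home_view, "win" if home_won else "loss"),
        (away_view, "loss" if home_won else "win"),
    ]


# Hypothetical pre-game season totals for the two teams.
examples = game_to_examples(
    {"goals_against": 40.0, "goal_differential": 12.0},
    {"goals_against": 55.0, "goal_differential": -3.0},
    home_won=True,
)
```

Each game contributes a mirrored pair, so the two labels in a pair are always opposite.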

18 Experiment Accuracy by classifier (columns: Traditional, Advanced, Mixed)
Baseline: 50%
SMO: 58.61%, 54.55%
NB: 57.25%, 54.93%, 56.77%
J48: 55.42%, 50.29%, 55.51%
NN: 57.06%, 52.42%, 59.38%
J. Weissbock (2013)

19 Experiment Best results from Neural Networks:
With additional tuning, accuracy of 59.38%; Not statistically different than SMO. Splitting the data into testing/training (66%/33%) – accuracy of 57.83%: Looked at pairs labelled Win/Win or Loss/Loss by algorithm, keeping the label with the highest confidence and inverting the other, accuracy of 59%. Ensemble learning w/ stacking and voting returned similar accuracy. J. Weissbock (2013)
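
The pair-consistency step on this slide (when both entries of a game get the same label, keep the more confident one and invert the other) could look something like the sketch below. The `(label, confidence)` tuple format and the tie-breaking rule are assumptions made for illustration.

```python
from typing import Tuple

Prediction = Tuple[str, float]  # (predicted label, classifier confidence in that label)


def flip(label: str) -> str:
    """Return the opposite label."""
    return "loss" if label == "win" else "win"


def resolve_pair(first: Prediction, second: Prediction) -> Tuple[str, str]:
    """Make the two predictions for one game consistent (exactly one winner)."""
    label_a, conf_a = first
    label_b, conf_b = second
    if label_a != label_b:
        # Already consistent: one win and one loss.
        return label_a, label_b
    # Win/Win or Loss/Loss: trust the more confident entry, invert the other.
    # Assumption: ties go to the first entry.
    if conf_a >= conf_b:
        return label_a, flip(label_a)
    return flip(label_b), label_b
```

For example, `resolve_pair(("win", 0.8), ("win", 0.6))` keeps the first prediction and turns the second into a loss.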

20 Experiment Using Consistency Subset Evaluation, the top three features were: Location Goals against Goal differential J. Weissbock (2013)

21 Experiment In the second half of our experimental evaluation, we consider shortening PDO to the last n games to see how “lucky” a team has been recently. Accuracy by classifier (columns: PDO1, PDO3, PDO5, PDO10, PDO25, PDOAll)
Baseline: 50%
SMO: 58.61%
NB: 56.38%, 56.96%, 56.58%, 56.77%
J48: 54.93%, 55.71%, 55.42%, 55.90%, 55.61%, 55.51%
NN: 57.64%, 56.67%, 58.03%, 57.74%, 58.41%
J. Weissbock (2013)
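
As a rough sketch of what "PDO over the last n games" means, the function below sums shooting % and save % over a trailing window of games. The `GameLine` record and its field names are hypothetical; the original data-collection script is not shown in the slides.

```python
from typing import List, NamedTuple, Optional


class GameLine(NamedTuple):
    """Hypothetical per-game totals for one team."""
    goals_for: int
    shots_for: int
    goals_against: int
    shots_against: int


def pdo(games: List[GameLine], last_n: Optional[int] = None) -> float:
    """PDO (shooting % + save %, on a 0-200 scale) over the last `last_n` games.

    With last_n=None, all games are used (the "PDOAll" case).
    """
    if last_n is not None and last_n <= 0:
        raise ValueError("last_n must be positive")
    window = games if last_n is None else games[-last_n:]
    goals_for = sum(g.goals_for for g in window)
    shots_for = sum(g.shots_for for g in window)
    goals_against = sum(g.goals_against for g in window)
    shots_against = sum(g.shots_against for g in window)
    if shots_for == 0 or shots_against == 0:
        raise ValueError("need at least one shot for and against")
    shooting_pct = goals_for / shots_for
    save_pct = 1.0 - goals_against / shots_against
    return 100.0 * (shooting_pct + save_pct)


season = [GameLine(3, 30, 2, 25), GameLine(1, 20, 4, 40)]
recent_pdo = pdo(season, last_n=1)   # most recent game only
season_pdo = pdo(season)             # all games
```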

22 Discussion Altering PDO does not appear to have a significant effect on accuracy. We can predict ~60% of games correctly. Although possession has been shown to be more useful in long-term predictions, the traditional statistics are better for predicting a single game. In a single game the most valued features are: goals against, goal differential and location. J. Weissbock (2013)

23 Future Work Collecting additional features:
Rest days, days of travel, time-zone shifts, altitude shifts, change in weather at arena, gambling odds, injuries, score-adjusted Fenwick, possession over the last n games. Collecting a full season of data (1230 games regularly). Training on past seasons of data. Comparing prediction of a single game in other leagues with the same features. Predicting the playoffs with best-of-seven series. J. Weissbock (2013)

24 Random Chance Also known as “luck” or “stochastic processes”.
A single goal can often decide a game of hockey, e.g., the puck deflecting off of another player. Games are low scoring with small goal differentials. How relevant is this to the observed outcomes in the NHL? We can analyze this with Classical Test Theory. J. Weissbock (2013)

25 Classical Test Theory Used to predict outcomes of psychological testing such as the ability of test-takers. Modeled with: Observed Score = Talent + Error J. Weissbock (2013)

26 In the NHL Assuming that all teams are evenly matched, it is reasonable to model each game as a Bernoulli trial with p = 0.5, variance 0.5(1-0.5), so the variance due to luck alone over an 82-game season is: Var(error) = (0.5*0.5)/82 Looking at the observed winning percentage in the last 7 seasons (to , post lockout): Var(observed) = 0.09^2 J. Weissbock (2013)

27 In the NHL Var(observed) = Var(talent) + Var(error)
Var(talent) = Var(observed) - Var(error)
Var(talent) = 0.09^2 - (0.25/82)
Var(talent) = 0.0081 - 0.003049
Var(talent) = 0.005051 J. Weissbock (2013)

28 In the NHL What portion of the observed results is made up of talent?
Talent = (0.005051 / (0.09^2)) = 0.6236 Luck = 1 - (0.005051 / (0.09^2)) = 0.3764 Luck explains 37.64% of the variance in the results in the standings. Is there a theoretical limit in Machine Learning for predicting a single game? J. Weissbock (2013)
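
The variance decomposition on the preceding slides can be checked with a few lines of arithmetic. This is a minimal sketch of the calculation as described (binomial error variance with p = 0.5 over 82 games, observed SD of 0.09); the function name is made up for illustration.

```python
def luck_share(observed_sd: float, games: int, p: float = 0.5) -> float:
    """Fraction of the observed win% variance attributable to random chance."""
    var_observed = observed_sd ** 2
    # Variance of win% over `games` coin-flip games between equal teams.
    var_error = p * (1.0 - p) / games
    return var_error / var_observed


nhl_luck = luck_share(observed_sd=0.09, games=82)
nhl_talent = 1.0 - nhl_luck
print(f"luck: {nhl_luck:.2%}, talent: {nhl_talent:.2%}")
```

With the slide's inputs this reproduces the 37.64% luck figure.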

29 Monte Carlo Method Using a Monte Carlo method to simulate NHL seasons with 10,000 iterations, with random strengths assigned to each team, we compared the resulting standard deviations (SD) of win % to the observed SD. The results were: Observed: 0.09. “All Skill”, better team always wins: 0.3 (p=4.8x10^-16). “All Luck”, 50% chance of winning: (p=0.02). J. Weissbock (2013)

30 Monte Carlo Method Neither are close. J. Weissbock (2013)

31 Monte Carlo Pt 2 Varied the luck/skill required to win a game.
E.g. 90% luck, 10% skill. Rule used to determine the game winner: if rand() < luck, then each team has a 50% chance of winning; else the better team wins. J. Weissbock (2013)
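
A minimal sketch of this kind of simulation is given below, applying the rule stated on the slide to every game. Several details are assumptions that the slides do not specify: 30 teams, uniformly random team strengths, a schedule built from 82 rounds of random pairings, no overtime or shootout points, and far fewer iterations than the 10,000 used in the talk.

```python
import random
import statistics
from typing import Optional


def simulate_season_sd(luck: float, teams: int = 30, games_per_team: int = 82,
                       rng: Optional[random.Random] = None) -> float:
    """Simulate one season and return the SD of win% across teams."""
    rng = rng or random.Random()
    strength = [rng.random() for _ in range(teams)]  # assumed: uniform strengths
    wins = [0] * teams
    for _ in range(games_per_team):
        # Assumed schedule: each round, shuffle the teams and pair them off.
        order = list(range(teams))
        rng.shuffle(order)
        for a, b in zip(order[::2], order[1::2]):
            if rng.random() < luck:
                # Luck decides: each team has a 50% chance of winning.
                winner = a if rng.random() < 0.5 else b
            else:
                # Skill decides: the better team wins.
                winner = a if strength[a] > strength[b] else b
            wins[winner] += 1
    return statistics.pstdev([w / games_per_team for w in wins])


def mean_sd(luck: float, iterations: int = 200, seed: int = 0) -> float:
    """Average the season SD over several simulated seasons."""
    rng = random.Random(seed)
    return statistics.mean(
        [simulate_season_sd(luck, rng=rng) for _ in range(iterations)]
    )
```

Sweeping `luck` from 0.0 to 1.0 and comparing `mean_sd(luck)` against the observed SD gives the kind of comparison described on these slides.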

32 Monte Carlo Pt 2 24% skill, 76% luck: SD is 0.0894 (p=0.992)
J. Weissbock (2013)

33 Results Using the Monte Carlo method it appears the 24% skill league is the closest to the NHL observed league. In other words: The actual observed distribution of win-loss records in the NHL is indistinguishable from a league in which 76% of the games are decided at random and not by the comparative strength of each opponent. J. Weissbock (2013)

34 Discussion If we know that 24% of outcomes will be won by the better team, and by random chance you’ll win half of the other 76% by “luck”, this suggests a theoretical limit in predictions for machine learning to be 24% + (76%/2) = 62%. Similar work has been done in the NFL and finds a limit of ~76% which is in line with the machine learning research in the NFL. Makes sense due to low number of events in NHL. Basketball and Tennis have high number of events, likely to have a higher limit. J. Weissbock (2013)
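
The ceiling calculation on this slide generalises to any luck/skill split. A small sketch, with a hypothetical function name:

```python
def prediction_ceiling(skill_share: float) -> float:
    """Upper bound on accuracy: all skill-decided games plus half of the luck-decided ones."""
    if not 0.0 <= skill_share <= 1.0:
        raise ValueError("skill_share must be between 0 and 1")
    luck_share = 1.0 - skill_share
    return skill_share + luck_share / 2.0


nhl_ceiling = prediction_ceiling(0.24)  # 24% skill, 76% luck
```

For the NHL split of 24% skill this gives 0.62, matching the 62% limit stated above.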

35 Ontario Hockey League Applied the exact same method to the Ontario Hockey League (OHL). The OHL is one of three leagues in the Canadian Hockey League (CHL), based in Ontario and Michigan, for years old. The CHL is the main league from which the NHL drafts players. The other two leagues in the CHL are the Western Hockey League (Prairies, BC, and Western US), and the QMJHL (Quebec and Maritimes). J. Weissbock (2013)

36 Ontario Hockey League Collected and calculated data on all 682 games in the regular season. Used the most useful features as in the NHL: Goals Against, Goal Differential, Location. Represented data in the exact same vector differential (2x a game) as the NHL. Used the same classifiers as with the NHL: SMO, NN, NB and J48. J. Weissbock (2013)

37 Ontario Hockey League Best results also come from tuned Neural Networks: ~64%. Observed SD in win% for the OHL since 1997 is (vs 0.09 in the NHL). Running a Monte Carlo with the same method, the OHL is similar to a league where results are determined by 43% skill, 57% luck. Suggests theoretical maximum ML prediction rate of 71.5%. J. Weissbock (2013)

38 Ontario Hockey League Suggests the OHL is more predictable than the NHL. Possible reasons: more goals scored per game, lower goalie sv%, forwards closer to their peak age, greater distribution in talent due to age of development. Suggests we can predict near the theoretical maximum using three features: Goals Against, Goal Differential, Location. Need to look at the two other CHL leagues as well as other leagues (AHL, ECHL, NCAA, European, Australian). Need to compare the levels of parity amongst these leagues and see how it correlates to prediction rate. J. Weissbock (2013)

39 Canadian Hockey League
The exact same method was applied to the QMJHL and the WHL with similar results: WHL: ~62% w/ tuned Neural Network, obs SD since 1997, 71% theoretical maximum. QMJHL: ??% w/ tuned Neural Network, obs SD win% since 1997, 72.5% theoretical max. Suggests we can build a classifier that will predict near (3% +/- 1%) a league’s theoretical maximum using Goals Against, Goal Differential and Location. J. Weissbock (2013)

40 Playoff Predictions Using traditional RTSS statistics and advanced performance metrics. 6 seasons of data, 15 series a season, 90 total series to train on (Small Sample Size). ~30 features for each team: Home, Distance, Conference Standing, Division Ratings, Z-Rating, BSWP, Strength of Schedule, Season Fenwick-Close, Score-Adjusted Fenwick, Corsi, Sh%, Sv%, PDO, Cap Hit, 5v5 Goals For, Against, Differential, Win%, PEwin%, Points, 5/5 Goals For/against, PP%, PK%, STI, Days rest, Games total, Fenwick Last-7, Corsi Last-7, Goal Year Sv%, GAA, Ev Sv%. J. Weissbock (2013)

41 Playoff Predictions 10-fold cross-validation. SMO: 71.11%.
Logistic Regression: 70%. NeuralNetworks: 67.77%. Voting: 68.89%. All statistically different than baseline. With Tuning: SMO: 74.44%. NN: 71.11%. J. Weissbock (2013)

42 Playoff Prediction Most useful features (per CfsSubsetEval): Z-Rating;
Pythagorean Expected Win%; and Fenwick Last-7. In playoff series prediction, advanced statistics outperform traditional statistics. Easier to predict than a single game due to larger sample. Future work: Roster Strengths. J. Weissbock (2013)

43 NLP Predictions 1230 pre-game reports. Each game has text for both teams, only looked at paragraphs from each team. Example: Team 1: It wasn't a banner offseason for the Flyers, who lost on attempts at signing Zach Parise, Ryan Suter and Shea Weber, and then watched as defenseman Matt Carle and top-line forward Jaromir Jagr departed in free agency… Team 2: Pittsburgh had Stanley Cup hopes last spring, but the Flyers dashed them in six games in the first round. Now, with Sidney Crosby and Evgeni Malkin healthy at the same time, the Penguins will be looking to start the season the right way... J. Weissbock (2013)

44 NLP Predictions Removed stop-words and Player Names.
Labeled the winning team’s text with “Win” and the losing team’s text with “Loss”. Two entries to train on per game. Using WEKA, Bag of Words, SMO default values, and 10-fold cross-validation. Achieves accuracy of 55%. Better than baseline (50%) but close to the simple classifier of the home team always winning (56%). J. Weissbock (2013)
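
To illustrate only the text representation described here (stop-word and player-name removal followed by a bag of words), a plain-Python sketch follows. It does not reproduce the WEKA pipeline used in the talk; the tiny stop-word list, the tokenisation, and the function name are assumptions.

```python
import re
from collections import Counter
from typing import Iterable

# Tiny illustrative stop-word list; a real list would be much longer.
STOP_WORDS = {"the", "a", "an", "and", "for", "it", "at", "on", "of", "in", "to"}


def bag_of_words(text: str, player_names: Iterable[str] = ()) -> Counter:
    """Count the remaining word tokens after dropping stop-words and player names."""
    tokens = re.findall(r"[a-z']+", text.lower())
    drop = STOP_WORDS | {name.lower() for name in player_names}
    return Counter(token for token in tokens if token not in drop)


bag = bag_of_words(
    "The Flyers lost Parise and the Flyers lost Weber",
    player_names={"Parise", "Weber"},
)
```

Each pre-game paragraph would become one such count vector, labelled "Win" or "Loss" for training.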

45 Future Work NLP: Classify into subjective/objective, and then objective into Positive/Negative, look at correlation with winning. Using the SentiWordNet positive/negative values for each word in the game report. Use of AFINN, Linguistic Inquiry, Latent Semantic Analysis, Polarity Lexicon. Twitter predictions. J. Weissbock (2013)

46 Conclusion ~60% Accuracy to predict a single game.
Traditional statistics more effective in predicting a single game than advanced statistics. Predicting a single game is difficult due to large variance in the standings. Theoretical limit in prediction for machine learning for a single game in the NHL appears to be 62%. Changes based on the parity of the league and number of events. J. Weissbock (2013)

47 Conclusion Playoff series prediction accuracy of ~74%.
Not enough data, and it will take a decade for another 150 series to be available. Advanced statistics are better than traditional statistics over the long term. NLP can be used to predict games better than the baseline. Simple BOW and SMO returns 55%; still more work to do here. J. Weissbock (2013)

48 Questions? Joshua Weissbock. Follow my hockey analysis on Twitter: @joshweissbock J. Weissbock (2013)
