1
Data Science with Your Hair on Fire
Applied Research in Soccer/Football
Ted Knutson, CEO, StatsBomb
2
Where Do You Start?
Unlike American sports, soccer was relatively unexplored from a statistical perspective.
Early quants:
Charles Reep.
Dr. Garry Gelade (Chelsea, with Mike Forde).
Some basic work with Sam Allardyce’s group at Bolton.
StatDNA (bought by Arsenal).
Public work? Desjardins. Grayson. Taylor.
3
Hockey Framework
Correlations. Ratios. Shots.
What do you care about? Well, we really care about goals.
How are they scored?
Who scores them?
Can we predict young player performance into the future?
Team performance (largely for gambling purposes)?
The theoretical framework proved problematic.
1) Not all shots are alike, and in soccer they are far less alike than in hockey.
2) Pace seems to vary more in soccer than in hockey, so ratios fall apart quickly. A 2:1 shot ratio could be 18:9 or 12:6 in reality. One of these will produce far more draws than the other, and the two result in very different scorelines.
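A quick sketch of point 2, using the slide's own 18:9 and 12:6 example. The flat `conversion` rate is an assumption made only to turn shots into rough goals; it deliberately ignores point 1 (shots are not alike), and `shot_summary` is a hypothetical helper, not something from the talk.

```python
def shot_summary(shots_for, shots_against, conversion=0.10):
    """Return (shot ratio, shot difference, rough expected goal difference).

    `conversion` is an assumed flat goals-per-shot rate, for illustration only.
    """
    ratio = shots_for / shots_against
    shot_diff = shots_for - shots_against
    goal_diff = conversion * shot_diff
    return ratio, shot_diff, goal_diff


# Same 2:1 ratio, different match pace.
high_pace = shot_summary(18, 9)
low_pace = shot_summary(12, 6)

print("ratio:", high_pace[0], low_pace[0])
print("shot difference:", high_pace[1], low_pace[1])
print("rough goal difference:", high_pace[2], low_pace[2])
```

The ratio is identical in both cases, but the raw shot difference (and anything derived from it) is not, which is why a ratio on its own loses information when pace varies.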
4
Luis Suarez Analysis Summer
Luis Suarez has scored 23 goals for Liverpool.
High volume shooter, high volume of key passes and dribbles.
Context: Great Poor Generally inefficient.
Question: What do you do if someone offers you £35M for Suarez?
Answer: You try to negotiate them up to £40M… …and then you bite their hand off.
5
Luis Suarez vs 13-14
6
The Story of Andros and Neymar
Two major things football teams need to know about data:
There are learning curves involved. (Science takes time.)
Your evaluations are still dependent on analysts.
7
Andros and Neymar
9
Shot Maps
11
The Transition to xG
Logistic regression.
Measures chance quality.
1 = certain goal, 0 = certain miss.
Based on how often similar shots were scored historically.
Similarity based on distance from goal, kick vs header, type of assist, etc.
Proven more predictive of future team performance than headline metrics (points, goal difference, etc.).
Not great for single shots or games.
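A minimal sketch of what a logistic-regression xG value looks like, assuming only two of the features named on the slide (distance from goal and kick vs header). The coefficients below are hypothetical and hand-picked for illustration; they are not fitted to any data and are not StatsBomb's model.

```python
import math

# Hypothetical coefficients, chosen for illustration only (not fitted to data).
INTERCEPT = -0.5
COEF_DISTANCE_M = -0.12  # assumed: each extra metre from goal lowers the log-odds
COEF_HEADER = -0.8       # assumed: headers are harder to score than kicks


def xg(distance_m, is_header):
    """Map shot features to a scoring probability between 0 and 1."""
    z = INTERCEPT + COEF_DISTANCE_M * distance_m + COEF_HEADER * (1 if is_header else 0)
    return 1.0 / (1.0 + math.exp(-z))


# Three made-up shots: (distance in metres, header?)
shots = [(6.0, False), (11.0, True), (25.0, False)]
shot_values = [xg(distance, header) for distance, header in shots]

# Summing per-shot values gives a team-level xG total.
team_xg = sum(shot_values)
print(shot_values, team_xg)
```

Each individual value is a noisy estimate, which fits the slide's caveat about single shots and single games; the totals over many shots are where the measure becomes useful.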
12
Passing Model and GAMs
Credit: Initial version from Łukasz Szczepański & Ian McHale.
StatsBomb version: Marek Kwiatkowski.
The model consists of two stages: the baseline stage and the mixed stage.
The baseline model is a GAM (Generalized Additive Model) seeking to estimate the a priori probability of a pass being completed, and is perfectly analogous to xG models.
The mixed model (a GLMM, Generalized Linear Mixed Model) postulates that the completion likelihood of a pass is a function of its baseline xP value and the identity of the player.
Players are modelled as random effects, and the fitted value of the effect's magnitude is taken to represent the corresponding player's skill.
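One way to picture the mixed stage, as a sketch only. The slide does not give the functional form, so this assumes the player effect is added to the baseline xP on the log-odds scale, which is a common GLMM setup. `completion_probability` and the example effect sizes are hypothetical; nothing is being fitted here.

```python
import math


def logit(p):
    """Convert a probability to log-odds."""
    return math.log(p / (1.0 - p))


def inv_logit(z):
    """Convert log-odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def completion_probability(baseline_xp, player_effect):
    """Assumed form: shift the baseline xP by a player effect on the log-odds scale."""
    return inv_logit(logit(baseline_xp) + player_effect)


baseline = 0.70  # baseline-stage xP for some pass
average_passer = completion_probability(baseline, 0.0)
strong_passer = completion_probability(baseline, 0.3)   # hypothetical positive effect
weak_passer = completion_probability(baseline, -0.3)    # hypothetical negative effect
print(average_passer, strong_passer, weak_passer)
```

Under this assumption a player effect of zero reproduces the baseline xP exactly, and the sign and size of the fitted effect is what would be read as passing skill.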
13
Passing Model and GAMs
14
Bring Out the Bayes - Finishing Skill
Credit: Marek Kwiatkowski
15
And then… We broke everything.
Enter StatsBomb Data
Players applying pressure
Duration of pressure
Actions under pressure
Location of players on pitch during a shot
Position of goalkeeper
Ball receipt
16
SO MANY BROKEN THINGS
17
SO MANY BROKEN THINGS
Additions to the passing model include:
Right foot/Left foot
Under pressure?
Intended Recipient
Backheel
Subtractions from the passing model include:
About 20 million open play passes...
18
And also some NEW things
Credit: Derrick Yam, StatsBomb
19
NEW Things
Spatial relative risk model based on handedness of GK.
(Credit: Derrick Yam, StatsBomb)
20
While science progressed...
We continued to do active work inside of football/soccer, including:
Superior styles of play
Player Recruitment
Manager Recruitment
Innovation in set pieces
Is Burnley Manager Sean Dyche a warlock?
Investigation of new edges (sleep, fatigue, etc.)
Our understanding of the game was deeply imperfect the entire time. However, because football is a competitive sport/industry, perfection isn’t the requirement. We simply need to be less wrong than our competition.
21
Thank You. Ted Knutson