Published by Archibald Willis. Modified over 8 years ago.
1
Predicting the Next Pitch?
Gartheeban Ganeshapillai, John Guttag
Data Driven Medicine Group
2
Data Driven Medicine Group?
Our day job is finding novel ways to predict and/or avoid adverse medical events, e.g.,
– Choosing cardiovascular therapies
– Avoiding antibiotic-resistant infections
– Reducing brain damage in premature infants
We wanted to see if these techniques could be brought to bear on a more important problem
3
If the bases are loaded in a tie game and the count is 2-2, should Pedroia guess that Sabathia will throw a fastball?
4
In General
Given information about:
– Past performance
– The state of the game and of the at bat
Predict whether a particular type of pitch will be thrown
Result: We can do considerably better than simply taking the probability based on priors
5
Factors Considered
Batter’s profile
– Slugging percentage, runs, pitch types faced and performance
Game state
– Inning, score differential, men on base, outs
At bat state
– Count, type and location of previous pitch
Pitcher’s tendency to throw a pitch (prior)
– Coupled with other factors: batter, count
6
Method: Supervised Machine Learning
Train offline on historical data
– Many labeled feature vectors → machine learning algorithm → pitcher-specific classifier
Use learned classifier in game setting
– One unlabeled feature vector → pitcher-specific classifier → yes / no
7
Method
Binary classifier for each pitcher / pitch type
– 300 pitchers
– 6 pitch types
– 1,800 classifiers
49 features per data point
– Weights automatically learned for each classifier
Features are mapped to output (yes / no)
– E.g., will CC Sabathia’s next pitch be a fastball?
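The one-classifier-per-pitcher-and-pitch-type setup can be sketched as follows. This is a minimal illustration, not the authors' implementation: the slides do not name the learning algorithm, so logistic regression trained by stochastic gradient descent stands in for it, and the pitcher name "P1", the two toy features, and the function names are all hypothetical.

```python
import math
from collections import defaultdict

def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_binary(X, y, epochs=500, lr=0.1):
    """Fit weights (bias stored last) by SGD on log loss.
    Assumption: logistic regression stands in for the unnamed learner."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + w[-1])
            err = p - yi
            for j, xj in enumerate(xi):
                w[j] -= lr * err * xj
            w[-1] -= lr * err
    return w

def predict(w, x):
    """Return True ("yes") if the classifier expects this pitch type."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + w[-1]) >= 0.5

def train_all(pitches, pitch_types):
    """pitches: iterable of (pitcher, feature_vector, pitch_type_thrown).
    Returns one binary classifier per (pitcher, pitch type) pair."""
    by_pitcher = defaultdict(list)
    for pitcher, x, thrown in pitches:
        by_pitcher[pitcher].append((x, thrown))
    models = {}
    for pitcher, rows in by_pitcher.items():
        X = [x for x, _ in rows]
        for pt in pitch_types:
            y = [1 if thrown == pt else 0 for _, thrown in rows]
            models[(pitcher, pt)] = train_binary(X, y)
    return models

# Toy data (hypothetical): features are [two_strikes, runners_on].
toy = [
    ("P1", [0.0, 0.0], "fastball"),
    ("P1", [0.0, 1.0], "fastball"),
    ("P1", [1.0, 0.0], "slider"),
    ("P1", [1.0, 1.0], "slider"),
]
models = train_all(toy, ["fastball", "slider"])
fastball_next = predict(models[("P1", "fastball")], [0.0, 1.0])
```

With 300 pitchers and 6 pitch types, `train_all` would return the 1,800 classifiers mentioned above; each real feature vector would hold 49 values rather than the two used here.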
8
Features Often Heavily Weighted
Previous pitch’s velocity
Velocity gradient
Previous pitch type
Inning
Outs
Score difference
Bases occupied
Pitcher – Batter prior
Pitcher – Count prior
Shrunk Pitcher – Batter prior
Shrunk Pitcher – Count prior
9
Shrunk Priors?
Not enough data: normal priors are not reliable
Consider Tim Wakefield

                 Knuckle Ball   Fastball   Support
Global Average   82%            12%
Jason Giambi     90%            9%         59
Adam Everett     33%            66%        3
10
What Is A Shrunk Prior?
Parameters
– s : probability (Tim Wakefield’s prior against Adam Everett is 33%)
– n : support (Support for prior against Adam Everett is 3)
– p : global average (Tim Wakefield 82% Knuckle Ball)
– β : a global constant (4 in our experiments)
Shrunk prior: a function of s, n, p, and β
11
Wakefield’s Shrunk Priors

                 Knuckle Ball   Fastball   Support   Shrunk Prior (fastball)
Global Average   82%            12%
Jason Giambi     90%            9%         59        9.2%
Adam Everett     33%            66%        3         35%
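The shrunk-prior formula itself did not survive in this transcript. As a hedged sketch, the support-weighted average (n·s + β·p) / (n + β) is assumed here; it is a common shrinkage form, and with β = 4 it reproduces both fastball values in Wakefield's table. The function name `shrunk_prior` is illustrative.

```python
def shrunk_prior(s, n, p, beta=4.0):
    """Pull the observed prior s (support n) toward the global average p.
    Assumed form: (n*s + beta*p) / (n + beta); with little support the
    result stays near p, with a lot of support it stays near s."""
    return (n * s + beta * p) / (n + beta)

# Wakefield's fastball priors; the global fastball average is 12%.
giambi = shrunk_prior(s=0.09, n=59, p=0.12)   # well supported: stays near 9%
everett = shrunk_prior(s=0.66, n=3, p=0.12)   # 3 pitches only: pulled toward 12%

print(f"Giambi: {giambi:.1%}, Everett: {everett:.0%}")
# prints "Giambi: 9.2%, Everett: 35%"
```

The Everett case shows why shrinkage matters: three pitches are too few to trust a 66% fastball rate, so the estimate moves most of the way back toward the global average.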
12
Experiments
Trained on MLB pitch data for 2008 from STATS Inc.
Selection criteria for pitcher/pitch type combination:
– Pitchers with at least 300 pitches in both years (2008 and 2009)
Evaluation over entire 2009 season:
– Accuracy
– Improvement, I = A_our_model / A_naive_model – 1
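The improvement metric I can be illustrated with a small sketch. The slides describe the naive model only as using the prior, so this example assumes it always predicts the majority outcome seen in the training year; the label lists and model predictions below are made up for illustration.

```python
def accuracy(predictions, actual):
    """Fraction of predictions that match the actual outcomes."""
    return sum(p == a for p, a in zip(predictions, actual)) / len(actual)

def naive_predictions(train_labels, n_test):
    """Assumed naive model: always predict the majority training outcome."""
    majority = sum(train_labels) * 2 >= len(train_labels)
    return [majority] * n_test

def improvement(a_model, a_naive):
    """I = A_our_model / A_naive_model - 1, as defined on the slide."""
    return a_model / a_naive - 1

# Hypothetical pitcher: 70% fastballs in training year, 30% in test year.
train_2008 = [True] * 7 + [False] * 3
test_2009 = [True] * 3 + [False] * 7
model_preds = [True, True, False] + [False] * 6 + [True]  # made up, 8 of 10 right

a_naive = accuracy(naive_predictions(train_2008, len(test_2009)), test_2009)
a_model = accuracy(model_preds, test_2009)
I = improvement(a_model, a_naive)
```

Here the naive model scores 0.3 and the hypothetical classifier 0.8, so I is about 1.67, or a 167% improvement. A pitcher whose repertoire shifts between seasons makes the naive accuracy small, which is how very large values of I can arise.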
13
Highlights for Fastball
Accuracy is 70% across 359 pitchers
Mean improvement of 18% over naive model
Maximum improvement is 311%
– Andy Sonnanstine learned a new pitch in the off season
– Threw mostly fastballs in ’08 and mostly cutters in ’09
14
Summary Of Results
Prior probability for pitch type between 0.3 and 0.7
15
Some Examples
16
Predictability Against Count
17
Future Work
Improve accuracy of binary classifiers by
– Adjusting meta-parameters on a per-pitcher basis
– Improving features
– Updating with data from earlier in season (or even game)
Use similar ideas to create a multi-class classifier
18
If the bases are loaded in a tie game and the count is 2-2, should Pedroia guess that Sabathia will throw a fastball?
19
Fastball
Bases are loaded and tied game: still fastball
Dustin Pedroia at bat: still fastball
Count is 2-2: SINKER
20
Acknowledgements
STATS Inc. provided us with the data
Quanta Computer Inc. supported work
Data Driven Medicine Group @ MIT CSAIL
“What Pitches Batters Chase & Other Observations about Pitch Location”
Jenna Wiens, Joel Brooks, Anima Singh, Guha Balakrishnan