# Optimization of Batting Order (Frank R. Zheng)

## Presentation transcript

Optimization of Batting Order Frank R. Zheng

A Quick Introduction to Baseball

- Two teams alternate batting and fielding.
- The batting team tries to score runs.
- Runners must advance through first, second, and third base, in that order, to reach home.
- Runners advance when batters get hits or draw walks, when runners steal bases, or when the opposing team's defense commits errors.
- The team with the most runs at the end of the game wins.

Batting Order

- Before each game, the team's coach must submit the team's batting order.
- The batting order dictates the order in which players step up to the plate.
- Substitutions such as pinch hitters or pinch runners are allowed, but are relatively rare.
- The optimal batting order maximizes expected run production.

Batting Order Optimization as a Scheduling Problem

- Finding the optimal batting order for a team can be thought of as a single-machine scheduling problem.
- Each batter is modeled as a job, and the batting order is a sequence of 9 such jobs.
- The objective is to maximize the run production of the lineup.
- This objective is a complicated function that requires simulation to evaluate.

Approach to Optimize Batting Order

- Each baseball team has a roster of about 15 batters, of which only 9 make up the batting order.
- Brute-forcing all possible lineups is impractical: it requires evaluating 15!/6! ordered lineups (over 1.8 billion unique lineups).
- The solution is to combine a qualitative "conventional wisdom" approach with a data-driven quantitative methodology.
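The lineup count quoted above can be checked directly. This is a minimal sketch that assumes a roster of exactly 15 batters (the slide says "about 15"); the constant names are illustrative only.

```python
import math

ROSTER_SIZE = 15  # assumed exact value for the "about 15" quoted on the slide
LINEUP_SIZE = 9

# Ordered lineups: pick 9 of 15 batters and arrange them, i.e. 15!/6!.
ordered_lineups = math.perm(ROSTER_SIZE, LINEUP_SIZE)

# Unordered selections of 9 batters, shown only for comparison.
unordered_sets = math.comb(ROSTER_SIZE, LINEUP_SIZE)

print(ordered_lineups)  # 1816214400
print(unordered_sets)   # 5005
```

Because batting position matters, the relevant count is the number of ordered arrangements, which is why the figure is in the billions rather than the thousands.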

Batting Order Conventional Wisdom

- Over the many decades baseball has been played, coaches have dedicated much thought to finding the best lineup.
- Traditional lineups follow this general order:
  - 1-2: batters who get on base a lot
  - 3-5: batters who get a lot of extra-base hits
  - 6-8: weak batters
  - 9: pitcher, weak batter, or batter who gets on base a lot
- The key is to have players with a high realization value (lots of runs batted in) follow those with a high potential value (getting on base a lot).
- That is, get runners on base so your power hitters can drive them home.

Underlying Causes of Run Production

- There is a limited set of events that have the potential to score runs.
- We refer to these as "Run-Producing Events" or RPEs.
- RPEs include:
  - Singles (1B)
  - Doubles (2B)
  - Triples (3B)
  - Home Runs (HR)
  - Bases on Balls/Batter Hit by Pitch (BB+HBP)
  - Errors (ERR)

Batting Performance

- Does the model fully capture differences among player batting characteristics?
- How to distinguish between "table setters" and "sluggers/cleanup hitters"?

| | OUT | 1B | BB+HBP | 2B | 3B | HR | ERR |
|---|---|---|---|---|---|---|---|
| Regression Value | -0.1040 | 0.4659 | 0.3255 | 0.7613 | 1.0456 | 1.4031 | 0.4340 |

Realization Value vs. Potential Value

- Realization Value is the expected number of runs each RPE actually scores.
- Potential Value is the effect each RPE has on the team's chances to score additional runs in the same inning.
- Differentiating between these two metrics allows us to quantitatively determine which players create the potential for scoring runs and which ones are good at bringing those players to home plate.

| | OUT | 1B | BB+HBP | 2B | 3B | HR | ERR |
|---|---|---|---|---|---|---|---|
| Realization Value | 0.0000 | 0.2314 | 0.0328 | 0.5120 | 0.7411 | 1.7387 | 0.1000 |
| Potential Value | -0.1040 | 0.2345 | 0.2927 | 0.2493 | 0.3045 | -0.3356 | 0.3340 |
| Total Value | -0.1040 | 0.4659 | 0.3255 | 0.7613 | 1.0456 | 1.4031 | 0.4340 |
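The values above satisfy Total Value = Realization Value + Potential Value for every event. The sketch below checks that and then shows one possible way to turn event counts into per-player values. The frequency-weighted average in `per_pa_values` is an assumption (the slides do not say how individual values were computed), and the sample counts are invented.

```python
EVENTS = ["OUT", "1B", "BB+HBP", "2B", "3B", "HR", "ERR"]

# Per-event values copied from the slide.
REALIZATION = dict(zip(EVENTS, [0.0000, 0.2314, 0.0328, 0.5120, 0.7411, 1.7387, 0.1000]))
POTENTIAL = dict(zip(EVENTS, [-0.1040, 0.2345, 0.2927, 0.2493, 0.3045, -0.3356, 0.3340]))
TOTAL = dict(zip(EVENTS, [-0.1040, 0.4659, 0.3255, 0.7613, 1.0456, 1.4031, 0.4340]))

# Consistency check: total value is the sum of the two components.
for event in EVENTS:
    assert abs(REALIZATION[event] + POTENTIAL[event] - TOTAL[event]) < 1e-9


def per_pa_values(counts):
    """Return (R, P) per plate appearance for a dict of event counts.

    Assumption: a player's value is the frequency-weighted average of the
    per-event values. This is an illustrative choice, not the slides' method.
    """
    plate_appearances = sum(counts.values())
    r = sum(REALIZATION[e] * n for e, n in counts.items()) / plate_appearances
    p = sum(POTENTIAL[e] * n for e, n in counts.items()) / plate_appearances
    return r, p


if __name__ == "__main__":
    # Invented season line for a hypothetical batter.
    sample = {"OUT": 400, "1B": 100, "BB+HBP": 60, "2B": 25, "3B": 3, "HR": 20, "ERR": 5}
    print(per_pa_values(sample))
```

Note that a home run has a negative potential value: it clears the bases, so it removes runners who might otherwise have scored on later events.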

Differentiating Players

- By comparing each individual's realization value and potential value to the team's overall averages, we can group players into one of four categories:
  - (R+, P+) Strong Hitters: players who bat in a lot of runs but also create the potential for more runs
  - (R+, P-) Run Producers: players who bat in a lot of runs
  - (R-, P+) Table Setters: players who create a lot of potential for more runs
  - (R-, P-) Weak Hitters: the team's worst players
- This gives us the quantitative data we need to apply the conventional wisdom discussed earlier.
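The four-way grouping can be sketched as a comparison against the team averages. The function name, the label strings, and the tie rule (a value equal to the team average counts as "-") are assumptions made for illustration.

```python
CATEGORY_NAMES = {
    ("R+", "P+"): "Strong Hitter",
    ("R+", "P-"): "Run Producer",
    ("R-", "P+"): "Table Setter",
    ("R-", "P-"): "Weak Hitter",
}


def classify(r, p, team_r, team_p):
    """Classify a player by comparing (r, p) to the team averages.

    Assumption: a value exactly equal to the team average is treated as "-".
    """
    r_sign = "R+" if r > team_r else "R-"
    p_sign = "P+" if p > team_p else "P-"
    return CATEGORY_NAMES[(r_sign, p_sign)]


if __name__ == "__main__":
    # Invented numbers: above-average potential, below-average realization.
    print(classify(r=0.10, p=0.05, team_r=0.12, team_p=0.02))  # Table Setter
```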

Overview of Heuristic

- Now we have the tools we need to combine the qualitative conventional wisdom with quantitative data.
- We adapted this heuristic from the work of Sokol.
- After determining which players fall into which set, we follow the conventional wisdom of placing batters with high realization values after a group of batters with high potential values.
- We want to build up potential value and then release it with realization value.
- The optimal order of the four sets is:
  1. (R-, P+)
  2. (R+, P+)
  3. (R+, P-)
  4. (R-, P-)

Heuristic Steps

1. Select the two batters with the highest P in the (R-, P+) set and assign them to the top two slots in the batting order, in order of increasing P.
2. Place all batters in the (R+, P+) group in the next slots, ordered by decreasing P.
3. Fill as many remaining slots as possible with batters from the (R+, P-) group, ordered by decreasing P.
4. If there are any remaining slots, fill them with batters in the (R-, P-) group, ordered by increasing P.
5. For each player left in the (R-, P+) group, replace a (R-, P-) player if possible, ordering the new (R-, P+) players by increasing P.
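The heuristic steps above can be sketched as follows. The slides leave some details open, so this sketch makes two assumptions: in the last step, leftover table setters first fill any empty slots and then replace weak hitters from the bottom of the order upward, and the lineup is truncated at 9 batters. The `Batter` type, the group tags, and all P values in the example roster are hypothetical placeholders; only the names and group labels come from the Yankees slides.

```python
from typing import NamedTuple


class Batter(NamedTuple):
    name: str
    group: str  # one of "R-P+", "R+P+", "R+P-", "R-P-"
    p: float    # potential value (placeholder numbers in the example below)


def build_lineup(roster, size=9):
    def group(tag, descending):
        members = [b for b in roster if b.group == tag]
        return sorted(members, key=lambda b: b.p, reverse=descending)

    setters = group("R-P+", descending=True)
    # Two highest-P table setters lead off, in increasing P.
    lineup = sorted(setters[:2], key=lambda b: b.p)
    # Strong hitters next, by decreasing P.
    lineup += group("R+P+", descending=True)
    # Then run producers, by decreasing P, as many as fit.
    lineup += group("R+P-", descending=True)
    lineup = lineup[:size]
    # Weak hitters fill whatever is left, by increasing P.
    weak = group("R-P-", descending=False)[: size - len(lineup)]
    lineup += weak

    # Leftover table setters fill empty slots, then replace weak hitters
    # from the bottom up (assumed rule; the slides do not specify which).
    open_slots = size - len(lineup)
    extra = sorted(setters[2:][: open_slots + len(weak)], key=lambda b: b.p)
    n_replaced = max(0, len(extra) - open_slots)
    if n_replaced:
        lineup = lineup[:-n_replaced]
    return lineup + extra


# Group labels follow the slides; every P value is an invented placeholder.
ROSTER = [
    Batter("Derek Jeter", "R-P+", 0.34), Batter("Brett Gardner", "R-P+", 0.32),
    Batter("Nick Swisher", "R-P+", 0.30), Batter("Alex Rodriguez", "R+P+", 0.33),
    Batter("Robinson Cano", "R+P-", 0.24), Batter("Curtis Granderson", "R+P-", 0.23),
    Batter("Andruw Jones", "R+P-", 0.22), Batter("Mark Teixeira", "R+P-", 0.21),
    Batter("Russell Martin", "R-P-", 0.15), Batter("Jorge Posada", "R-P-", 0.18),
]

if __name__ == "__main__":
    for slot, batter in enumerate(build_lineup(ROSTER), start=1):
        print(slot, batter.name, batter.group)
```

The result depends entirely on the P values supplied, so real inputs would have to come from the regression-based values described earlier.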

Application to 2011 New York Yankees

- To see the effects of our heuristic, we applied it to the 2011 New York Yankees.
- First, we placed each player into the appropriate category.

[Figure: players grouped into four quadrants: (R+, P-) Run Producers, (R-, P-) Weak Hitters, (R-, P+) Table Setters, (R+, P+) Strong Hitters]

Simulation

- To determine the value of our objective function (the expected number of runs scored per game), we need to simulate a game of baseball using the designated lineup.
- Our simulation follows the structure of a normal game of baseball.
- At each point in time, the next batter steps up to the plate and either generates an RPE or gets out, depending on that player's distribution.
- RPEs advance runners according to the rules of baseball or by probabilistic outcomes determined using data from the 2011 season.
- The number of outs and runs is recorded for each of 16,200 games.
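A simulation of this kind can be sketched as a simple Monte Carlo loop. This is a simplified model, not the presentation's simulator: runner advancement here is fully deterministic (every runner moves as many bases as the batter, errors are treated like singles, walks only force runners), whereas the slides describe probabilistic advancement fitted to 2011 data. The function names and the sample probabilities are hypothetical.

```python
import random

EVENTS = ["OUT", "1B", "BB+HBP", "2B", "3B", "HR", "ERR"]
# Assumed simplification: each hit type moves batter and runners this many bases.
BASES_GAINED = {"1B": 1, "ERR": 1, "2B": 2, "3B": 3, "HR": 4}


def advance(bases, event):
    """bases = (first, second, third) occupancy flags; returns (new_bases, runs)."""
    first, second, third = bases
    if event == "BB+HBP":  # runners move only when forced
        if first and second and third:
            return (True, True, True), 1
        if first and second:
            return (True, True, True), 0
        if first:
            return (True, True, third), 0
        return (True, second, third), 0
    gained = BASES_GAINED[event]
    # Position 0 is the batter; 1-3 are occupied bases; 4 or more means scored.
    positions = [0] + [i + 1 for i, occupied in enumerate(bases) if occupied]
    moved = [pos + gained for pos in positions]
    runs = sum(1 for pos in moved if pos >= 4)
    return tuple(base in moved for base in (1, 2, 3)), runs


def simulate_game(lineup, rng, innings=9):
    """lineup is a list of per-batter weight lists aligned with EVENTS."""
    runs, batter = 0, 0
    for _ in range(innings):
        outs, bases = 0, (False, False, False)
        while outs < 3:
            weights = lineup[batter % len(lineup)]
            event = rng.choices(EVENTS, weights=weights)[0]
            batter += 1  # the order carries over between innings
            if event == "OUT":
                outs += 1
            else:
                bases, scored = advance(bases, event)
                runs += scored
    return runs


def expected_runs(lineup, games=16200, seed=0):
    rng = random.Random(seed)
    return sum(simulate_game(lineup, rng) for _ in range(games)) / games


if __name__ == "__main__":
    # Invented event probabilities, identical for all nine batters.
    batter = [0.67, 0.15, 0.09, 0.05, 0.005, 0.03, 0.005]
    print(expected_runs([batter] * 9))
```

Comparing two lineups then amounts to calling `expected_runs` on each with the same seed and number of games.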

Results of Analysis

Standard Lineup

- This lineup generated an average of 5.68 runs, and is expected to have a 61.3% chance of winning a 5-game series against the Detroit Tigers.

| Batting Order | Player | Set |
|---|---|---|
| 1 | Derek Jeter | R-, P+ |
| 2 | Curtis Granderson | R+, P- |
| 3 | Robinson Cano | R+, P- |
| 4 | Alex Rodriguez | R+, P+ |
| 5 | Mark Teixeira | R+, P- |
| 6 | Nick Swisher | R-, P+ |
| 7 | Jorge Posada | R-, P- |
| 8 | Russell Martin | R-, P- |
| 9 | Brett Gardner | R-, P+ |

Heuristic Lineup

- This lineup generated an average of 5.84 runs, with a 64.7% chance of winning a 5-game series against the Detroit Tigers.

| Batting Order | Player | Set |
|---|---|---|
| 1 | Brett Gardner | R-, P+ |
| 2 | Derek Jeter | R-, P+ |
| 3 | Alex Rodriguez | R+, P+ |
| 4 | Robinson Cano | R+, P- |
| 5 | Curtis Granderson | R+, P- |
| 6 | Andruw Jones | R+, P- |
| 7 | Mark Teixeira | R+, P- |
| 8 | Russell Martin | R-, P- |
| 9 | Nick Swisher | R-, P+ |
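The relative gain between the two run averages follows from simple arithmetic. The sketch also includes a best-of-five series probability under the assumption of independent games with a fixed per-game win probability; the slides do not say how their series figures were derived, so `series_win_probability` is an illustrative model only, and the 0.55 input is invented.

```python
import math

STANDARD_RUNS = 5.68   # average runs, standard lineup (from the slide)
HEURISTIC_RUNS = 5.84  # average runs, heuristic lineup (from the slide)

relative_gain = (HEURISTIC_RUNS - STANDARD_RUNS) / STANDARD_RUNS
print(f"{relative_gain:.1%}")  # 2.8%


def series_win_probability(p, games=5):
    """Probability of winning a majority of `games` independent games.

    Assumption: every game is won with the same probability p. Playing all
    games gives the same result as stopping once the series is decided.
    """
    needed = games // 2 + 1
    return sum(
        math.comb(games, k) * p**k * (1 - p) ** (games - k)
        for k in range(needed, games + 1)
    )


if __name__ == "__main__":
    # Invented per-game win probability, for illustration only.
    print(series_win_probability(0.55))
```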

Conclusions and Other Applications

- The heuristic lineup increased expected runs by only about 3% (5.84 vs. 5.68 runs per game).
- Because statistical analysis is well established in baseball, the Yankees may already have studied this problem in great detail.
- Even though the gains in expected run production were small, there are other applications for our methodology:
  - Potential trades or acquisitions of new players can be evaluated by the effect they would have on the team's expected run production.
  - A game-theoretic approach could maximize the expected win rate by adjusting the distribution of the team's run production to suit a specific opponent.
