
PECOTA Under the Hood Nate Silver, Baseball Prospectus, 7-11-07.


1 PECOTA Under the Hood Nate Silver, Baseball Prospectus, 7-11-07

2 Background
• PECOTA originally stood for Pitcher Empirical Comparison [and] Optimization Test Algorithm
• Developed in Spring/Summer 2002 on my own time
• Original model was limited:
  • Pitchers only
  • No minor league statistics
• Sold to Gary Huckabay at Baseball Prospectus; became part of Baseball Prospectus Premium subscription package.

3 Why Did the World Need Another Forecasting System?
• Different Aging Curves for Different Players
• Interrelationships between Different Skills
• Comparable Players
• Range of Performance Outcomes
• Multi-year Forecasting

4 The Three Steps
1. Baseline Forecast
2. Selection of Comparables
3. Forecast Range Developed Based on Performance of Comparables

5 Baseline Forecast
• All statistics are normalized
  • Park factors (customized for PECOTA)
  • League/offensive environment factors
  • League difficulty factors (new in 2007)
  • “Role” adjustments (starter/reliever)
• The Big, Fat, Hairy Regression
  • Prior three years of major/minor league data are analyzed
  • Robust dataset provides for flexibility/creativity
  • Builds in some second-order relationships

6 Selection of Comparable Players
• Key Concept: Forward-looking comparables are different from backward-looking comparables; the goal is to identify those factors that are most important from a forecasting point of view.
• Weights originally developed based on Analysis of Variance (ANOVA)
• Resembles a “nearest neighbor” analysis

7 Selection of Comparable Players
Hitters
• Isolated Power
• Batting Average
• Walk Rate
• Speed Score
• Strikeout Rate
• Groundball/Flyball Ratio
• Playing Time
• Position
• Weight
• Major League Experience
• Height
• Handedness (LH/RH)
Pitchers
• Strikeout Rate
• Walk Rate
• Groundball/Flyball Ratio
• Isolated Power Against
• Batting Average Against
• Playing Time
• Role (Starter/Relief)
• Handedness
• Major League Experience
• Height
• Weight
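The “nearest neighbor” idea on slides 6 and 7 can be sketched roughly as a weighted distance over a few of the listed hitter attributes. This is only an illustration: the weights, the standardized (z-score) stat values, and the player names below are all made up, and the slides do not say which distance formula PECOTA actually uses.

```python
import math

# Hypothetical weights for four of the hitter attributes on slide 7.
# The slides say weights came from ANOVA but give no actual values.
WEIGHTS = {"iso": 3.0, "avg": 2.0, "bb_rate": 2.0, "k_rate": 1.0}


def weighted_distance(a, b, weights=WEIGHTS):
    """Weighted Euclidean distance between two players' standardized stats."""
    return math.sqrt(sum(w * (a[k] - b[k]) ** 2 for k, w in weights.items()))


def nearest_comparables(subject, pool, n=2, weights=WEIGHTS):
    """Return the names of the n pool players closest to the subject."""
    ranked = sorted(
        pool, key=lambda name: weighted_distance(subject, pool[name], weights)
    )
    return ranked[:n]


# Made-up z-scores, for illustration only.
subject = {"iso": 1.2, "avg": 0.8, "bb_rate": 0.5, "k_rate": -0.3}
pool = {
    "Player A": {"iso": 1.1, "avg": 0.9, "bb_rate": 0.4, "k_rate": -0.2},
    "Player B": {"iso": -0.5, "avg": 1.5, "bb_rate": -1.0, "k_rate": -1.2},
    "Player C": {"iso": 1.0, "avg": 0.2, "bb_rate": 0.9, "k_rate": 0.4},
}

print(nearest_comparables(subject, pool))
```

Larger weights make a mismatch in that attribute count for more, which is one plausible way to encode the order of importance suggested by the lists above.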

8 Forecast Range
• The actual performances of the comparable players are compared against their respective baselines; this creates an implied performance of the subject relative to his baseline
• One key variable (EqA/EqERA) is used to calibrate other statistical categories, which are determined based on an iterative process involving regression

Player     Baseline  Actual  Delta
Aaron      .320      .330    +.010
Murphy     .300      .280    -.020
Robinson   .330      .360    +.030
A-Rod      .320      .340    +.020
AVERAGE                      +.010
Pujols     .330      .340    +.010
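The arithmetic in the slide 8 table can be reproduced with a short sketch. It assumes the comparables are weighted equally, which matches the simple AVERAGE row on the slide but may not match how PECOTA weights comparables in practice; the function names are hypothetical.

```python
# (baseline, actual) pairs taken from the table on slide 8.
comparables = {
    "Aaron": (0.320, 0.330),
    "Murphy": (0.300, 0.280),
    "Robinson": (0.330, 0.360),
    "A-Rod": (0.320, 0.340),
}


def mean_delta(comps):
    """Average of (actual - baseline) across comparables, equally weighted."""
    deltas = [actual - baseline for baseline, actual in comps.values()]
    return sum(deltas) / len(deltas)


def implied_forecast(subject_baseline, comps):
    """Shift the subject's baseline by the comparables' average delta."""
    return subject_baseline + mean_delta(comps)


avg = mean_delta(comparables)
pujols = implied_forecast(0.330, comparables)
print(f"average delta: {avg:+.3f}")
print(f"implied forecast: {pujols:.3f}")
```

With the slide's numbers the average delta rounds to +.010, so a .330 baseline becomes an implied .340, matching the Pujols row.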

9 But Does it Work?
• Internal Study (2003): PECOTA most accurate forecasting system for pitching; tied for 1st in hitting
• External Study (2006): PECOTA most accurate for hitting; 2nd for pitching
• Percentile Forecasts (2005 Internal Study)
  • Accurate for hitting forecasts, e.g. almost exactly 10% of players exceed their 90th percentile forecasts
  • Pitching forecasts tended to slightly underestimate range of outcomes (10th/90th percentiles not wide enough); problem has since been corrected
• PECOTA team W/L forecasts beat 23 of 30 Vegas Over/Under Lines in 2006
• PECOTA prospect lists performed competitively with scouting-based lists in 2006
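The percentile check described on slide 9 amounts to counting how often players beat their 90th percentile forecast; a well-calibrated system should land near 10%. The sketch below shows that count on made-up numbers. The data and the function name are illustrative assumptions, not figures from the 2005 study.

```python
def exceed_rate(forecasts_p90, actuals):
    """Share of players whose actual result beat their 90th-percentile forecast."""
    if len(forecasts_p90) != len(actuals):
        raise ValueError("forecasts and actuals must be the same length")
    hits = sum(1 for f, a in zip(forecasts_p90, actuals) if a > f)
    return hits / len(actuals)


# Made-up 90th-percentile forecasts and actual results for ten hitters.
p90 = [0.300, 0.290, 0.310, 0.280, 0.295, 0.305, 0.285, 0.315, 0.300, 0.290]
actual = [0.270, 0.295, 0.280, 0.260, 0.275, 0.290, 0.265, 0.300, 0.285, 0.270]

print(f"{exceed_rate(p90, actual):.0%} of players exceeded their 90th percentile")
```

A rate well above 10% would suggest the forecast range is too narrow, which is the problem the slide reports for the early pitching forecasts.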

10 Challenges and Caveats
• PECOTA has inherent problems with highly unique players (Ichiro/Bonds)
• Interaction between playing time and rate performance is complicated
• Minor League pitchers can be hard to distinguish based on statistics alone
• PECOTA uses data since 1946; may miss recent changes in aging curves
• No detailed injury information
• Subject to some noise from sample size effects

11 Key Findings
• Interactions between different statistical categories do matter
  • Players with robust skill sets tend to age better
  • Certain skills cannot be understood in isolation (walk rate for hitters, opponents’ BA for pitchers)
• Minor league statistics should be read differently from major league statistics
  • Example: strikeout rate for hitters, home run rate for pitchers are comparatively more important
• Inflection points / Rexrode Threshold
• Pitchers are not so unpredictable if you focus on the right statistics
• Attrition rates are higher than is generally acknowledged for both pitchers and hitters
• Some players are riskier than others (Beta)
