Nate Silver, Baseball Prospectus,

Presentation transcript:

PECOTA Under the Hood
Nate Silver, Baseball Prospectus, 7-11-07

Background
- PECOTA originally stood for Pitcher Empirical Comparison [and] Optimization Test Algorithm
- Developed in Spring/Summer 2002 on my own time
- Original model was limited:
  - Pitchers only
  - No minor league statistics
- Sold to Gary Huckabay at Baseball Prospectus; became part of the Baseball Prospectus Premium subscription package

Why Did the World Need Another Forecasting System?
- Different aging curves for different players
- Interrelationships between different skills
- Comparable players
- Range of performance outcomes
- Multi-year forecasting

The Three Steps
1. Baseline forecast
2. Selection of comparables
3. Forecast range developed based on performance of comparables

Baseline Forecast
- All statistics are normalized
  - Park factors (customized for PECOTA)
  - League/offensive environment factors
  - League difficulty factors (new in 2007)
  - "Role" adjustments (starter/reliever)
- The Big, Fat, Hairy Regression
  - Prior three years of major/minor league data are analyzed
  - Robust dataset provides for flexibility/creativity
  - Builds in some second-order relationships
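The normalization and three-year baseline idea can be illustrated with a small Python sketch. This is not PECOTA's regression: the multiplicative factor convention (1.0 = neutral), the 3/2/1 recency weights, and the function names `normalize_rate` and `weighted_baseline` are all assumptions made for illustration, and a simple weighted average stands in for the actual regression.

```python
def normalize_rate(raw_rate, park_factor, league_factor):
    """Scale a raw rate stat to a neutral park and league environment.

    Assumption: both factors are multiplicative indexes where 1.0 is neutral.
    """
    return raw_rate / (park_factor * league_factor)


def weighted_baseline(seasons, weights=(3.0, 2.0, 1.0)):
    """Blend up to three prior seasons, most recent first.

    Each season is a (normalized_rate, plate_appearances) pair.
    The 3/2/1 recency weights are hypothetical stand-ins for
    regression coefficients, not values taken from PECOTA.
    """
    numerator = 0.0
    denominator = 0.0
    for (rate, playing_time), weight in zip(seasons, weights):
        numerator += weight * playing_time * rate
        denominator += weight * playing_time
    if denominator == 0:
        raise ValueError("no playing time in the supplied seasons")
    return numerator / denominator


# Hypothetical player: two prior seasons of 600 plate appearances each.
baseline = weighted_baseline([(0.320, 600), (0.280, 600)])
```

Weighting by playing time as well as recency keeps a short, noisy season from dominating the baseline.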

Selection of Comparable Players
- Key concept: forward-looking comparables are different from backward-looking comparables; the goal is to identify those factors that are most important from a forecasting point of view.
- Weights originally developed based on Analysis of Variance (ANOVA)
- Resembles a "nearest neighbor" analysis

Selection of Comparable Players
Hitters
- Isolated Power
- Batting Average
- Walk Rate
- Speed Score
- Strikeout Rate
- Groundball/Flyball Ratio
- Playing Time
- Position
- Weight
- Major League Experience
- Height
- Handedness (LH/RH)
Pitchers
- Isolated Power Against
- Batting Average Against
- Role (Starter/Relief)
- Handedness
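A weighted nearest-neighbor search over attributes like these might look roughly like the following Python sketch. The attribute subset, the `WEIGHTS` and `SCALES` values, and the function names are hypothetical; the slides say only that the weights came from an ANOVA and that the method resembles nearest-neighbor analysis.

```python
import math

# Hypothetical weights and scales for three of the hitter attributes.
WEIGHTS = {"iso": 2.0, "avg": 1.0, "bb_rate": 1.0}
SCALES = {"iso": 0.050, "avg": 0.025, "bb_rate": 0.030}


def similarity_distance(subject, candidate, weights, scales):
    """Weighted Euclidean distance over scaled attribute differences."""
    total = 0.0
    for name, weight in weights.items():
        z = (subject[name] - candidate[name]) / scales[name]
        total += weight * z * z
    return math.sqrt(total)


def nearest_comparables(subject, pool, weights, scales, k=3):
    """Return the k players in pool closest to the subject."""
    ranked = sorted(
        pool,
        key=lambda p: similarity_distance(subject, p, weights, scales),
    )
    return ranked[:k]


# Made-up players for illustration only.
subject = {"iso": 0.200, "avg": 0.280, "bb_rate": 0.10}
pool = [
    {"name": "A", "iso": 0.205, "avg": 0.282, "bb_rate": 0.10},
    {"name": "B", "iso": 0.100, "avg": 0.300, "bb_rate": 0.05},
    {"name": "C", "iso": 0.190, "avg": 0.270, "bb_rate": 0.11},
]
comparables = nearest_comparables(subject, pool, WEIGHTS, SCALES, k=2)
```

Dividing each difference by a per-attribute scale puts the attributes on a common footing before the weights are applied, so the weights express forecasting importance and not units.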

Forecast Range
- The actual performances of the comparable players are compared against their respective baselines; this creates an implied performance of the subject relative to his baseline
- One key variable (EqA/EqERA) is used to calibrate other statistical categories, which are determined based on an iterative process involving regression

  Player     Baseline   Actual   Delta
  Aaron      .320       .330     +.010
  Murphy     .300       .280     -.020
  Robinson   .360 (other value missing)   +.030
  A-Rod      .340 (other value missing)   +.020
  AVERAGE
  Pujols
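The delta step can be sketched in a few lines of Python. The four deltas are the ones shown on the slide; the subject baseline of .330 and the function name `forecast_from_comparables` are assumptions for illustration, and taking the minimum and maximum is a crude stand-in for the percentile range a larger set of comparables would support.

```python
from statistics import mean


def forecast_from_comparables(subject_baseline, deltas):
    """Apply each comparable's (actual - baseline) delta to the subject.

    Returns the mean implied performance plus the lowest and highest
    implied values as a rough range.
    """
    if not deltas:
        raise ValueError("at least one comparable is required")
    implied = sorted(subject_baseline + d for d in deltas)
    return {
        "mean": subject_baseline + mean(deltas),
        "low": implied[0],
        "high": implied[-1],
    }


# Deltas from the slide: Aaron, Murphy, Robinson, A-Rod.
slide_deltas = [0.010, -0.020, 0.030, 0.020]

# Hypothetical baseline for the subject player (not given in the source).
forecast = forecast_from_comparables(0.330, slide_deltas)
```

With these inputs the average delta works out to about +.010, so the implied mean sits roughly ten points above the subject's baseline.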

But Does It Work?
- Internal Study (2003): PECOTA most accurate forecasting system for pitching; tied for 1st in hitting
- External Study (2006): PECOTA most accurate for hitting; 2nd for pitching
- Percentile Forecasts (2005 Internal Study)
  - Accurate for hitting forecasts, e.g. almost exactly 10% of players exceed their 90th percentile forecasts
  - Pitching forecasts tended to slightly underestimate the range of outcomes (10th/90th percentiles not wide enough); problem has since been corrected
- PECOTA team W/L forecasts beat 23 of 30 Vegas Over/Under lines in 2006
- PECOTA prospect lists performed competitively with scouting-based lists in 2006
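The percentile check described here amounts to counting how often actual results land above the 90th percentile forecast; a well-calibrated system should see that happen about 10% of the time. A minimal Python sketch, with a hypothetical function name and made-up data:

```python
def exceed_rate(actuals, percentile_forecasts):
    """Fraction of players whose actual result beat the given forecast."""
    if not actuals or len(actuals) != len(percentile_forecasts):
        raise ValueError("inputs must be non-empty and of equal length")
    exceeded = sum(
        1 for actual, forecast in zip(actuals, percentile_forecasts)
        if actual > forecast
    )
    return exceeded / len(actuals)


# Made-up sample: ten players, one of whom beats his 90th percentile forecast.
actual_values = [0.250, 0.260, 0.270, 0.280, 0.290,
                 0.255, 0.265, 0.275, 0.285, 0.320]
p90_values = [0.300] * 10
rate = exceed_rate(actual_values, p90_values)
```

A rate well above 10% would suggest the forecast ranges are too narrow, which is the pattern the slide reports for the early pitching forecasts.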

Challenges and Caveats
- PECOTA has inherent problems with highly unique players (Ichiro/Bonds)
- Interaction between playing time and rate performance is complicated
- Minor league pitchers can be hard to distinguish based on statistics alone
- PECOTA uses data since 1946; may miss recent changes in aging curves
- No detailed injury information
- Subject to some noise from sample size effects

Key Findings
- Interactions between different statistical categories do matter
- Players with robust skill sets tend to age better
- Certain skills cannot be understood in isolation (walk rate for hitters, opponents' BA for pitchers)
- Minor league statistics should be read differently from major league statistics
  - Example: strikeout rate for hitters, home run rate for pitchers are comparatively more important
- Inflection points / Rexrode Threshold
- Pitchers are not so unpredictable if you focus on the right statistics
- Attrition rates are higher than is generally acknowledged for both pitchers and hitters
- Some players are riskier than others (Beta)