An Intelligence Approach to Evaluation of Sports Teams

Presentation transcript:

An Intelligence Approach to Evaluation of Sports Teams by Edward Kambour, Ph.D.

Agenda
- College Football
- Linear Model
- Generalized Linear Model
- Intelligence (Bayesian) Approach
- Results
- Other Sports
- Future Work

General Background
Goals
- Forecast winners of future games
  - Beat the Bookie!
- Estimate the outcome of unscheduled games
  - What’s the probability that Iowa would have beaten Ohio St?
- Generate reasonable rankings

Major College Football
- No playoff system
- “Computer rankings” are an element of the BCS
- 114 teams
- 12 games for each team in a season

Linear Model
- Rothman (1970s), Harville (1977), Stefani (1977), …, Kambour (1991), …, Sagarin???
- Response, $Y$, is the net result (point-spread)
- Parameter, $\theta$, is the vector of ratings
- For a game involving teams $i$ and $j$, $E[Y] = \theta_i - \theta_j$

Linear Model (cont.)
- Let $X$ be a row vector (+1 in column $i$, -1 in column $j$, 0 elsewhere) with $E[Y] = X\theta$

Regression Model Notes
- Least Squares: relies on normality and homogeneity of variance
- College Football
  - Estimate 100 parameters
  - Sample size for a full season is about 600
  - Design matrix is sparse and not full rank
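The rank deficiency can be seen directly: each game row carries +1 for one team and -1 for the other, so adding the same constant to every rating leaves every predicted point-spread unchanged. A minimal sketch, using hypothetical helper names (`design_row`, `predict`) and made-up ratings that are not from the talk:

```python
def design_row(i, j, n_teams):
    """Row of the design matrix for a game between team i and team j."""
    row = [0.0] * n_teams
    row[i] = 1.0
    row[j] = -1.0
    return row


def predict(row, ratings):
    """Expected point-spread: dot product of the game row with the ratings."""
    return sum(x * r for x, r in zip(row, ratings))


# Illustrative ratings for four teams (assumed values).
ratings = [10.0, 5.0, 3.0, 1.0]
shifted = [r + 100.0 for r in ratings]  # same ratings, shifted by a constant

row = design_row(0, 2, len(ratings))
spread = predict(row, ratings)
spread_shifted = predict(row, shifted)  # identical to `spread`
```

Because only rating differences are identified, a fit needs an extra constraint, such as fixing the average rating.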

Home-field Advantage
- Generic Advantage (Stefani, 1980)
  - Force i to be the home team and j the visiting team
  - Add an intercept term to X
  - Adds one more parameter to estimate
  - Every home field is worth the same: UAB = Alabama, Rice = Texas A&M
- Team Specific Advantage
  - Doubles the number of parameters to estimate

Linear Model Issues
- Normality
- Homogeneity
- Lots of parameters, with relatively small sample size
  - Overfitting
  - The bookie takes you to the cleaners!

Linear Model Issues (cont.)
- Should we model point differential?
  - A and B play twice
    - A wins by 34 in the first, B wins by 14 in the second
    - A wins by 10 each time
    - Both cases give A a net +20, so a point-differential model treats them alike
  - Running up the score (or lack thereof)
- BCS: Thou shalt not use margin of victory in thy ratings!

Logistic Regression
- Rothman (1970s)
- Linear Model
- Use binary variable
  - Winning is all that matters
  - Avoid margin of victory
- Coin Flips

Logistic Regression Issues
- Still have sample size issues
- Throw away a lot of information
- Undefeated teams
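One way to see the undefeated-team problem: if the win probability is a logistic function of the rating difference (an assumed form, since the slides do not give the formula), the log-likelihood of a team that never loses keeps rising as its rating grows, so its maximum-likelihood rating is unbounded. The function names and the twelve-opponent schedule below are hypothetical:

```python
import math


def win_probability(rating_diff):
    """Logistic win probability for a given rating difference (assumed form)."""
    return 1.0 / (1.0 + math.exp(-rating_diff))


def undefeated_log_likelihood(rating, opponent_ratings):
    """Log-likelihood for a team that beat every opponent on its schedule."""
    return sum(math.log(win_probability(rating - opp)) for opp in opponent_ratings)


# Hypothetical schedule: twelve opponents, all rated 0.
opponents = [0.0] * 12
candidate_ratings = (0.0, 2.0, 5.0, 10.0)
log_liks = [undefeated_log_likelihood(r, opponents) for r in candidate_ratings]
# log_liks increases with the rating and approaches 0 without reaching it.
```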

Transformations
- Transform the differentials to normality
  - Power transformations
- Rothman logistic transform
  - Transforms points to probabilities for logistic regression
- “Diminishing returns” transforms
  - Downweights runaway scores

Power Transforms
- Transform the point-spread: $Y = \operatorname{sign}(Z)\,|Z|^{a}$
  - $a = 1$ → straight margin of victory
  - $a = 0$ → just win baby
  - $a = 0.5$ → Poisson or Gamma “ish”
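A short sketch of the signed power transform. The function name `power_transform` is a placeholder, and the zero-margin case is handled explicitly as an assumption about how ties would be treated:

```python
def power_transform(z, a):
    """Signed power transform: sign(z) * |z|**a."""
    if z == 0:
        return 0.0  # a tie stays at zero for every power (assumed convention)
    sign = 1.0 if z > 0 else -1.0
    return sign * abs(z) ** a


raw_margin = power_transform(-14, 1)       # power 1 keeps the margin as-is
win_only = power_transform(34, 0)          # power 0 keeps only the sign
compressed = power_transform(25, 0.5)      # intermediate powers shrink blowouts
```

Powers between 0 and 1 keep the winner but compress large margins, which is the "diminishing returns" behaviour described earlier.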

Maximum Likelihood Transform
- 1995-2002 seasons
- MLE = 0.98

Power   -2ln(likelihood)
0.1     52487
0.3     41213
0.5     35128
0.67    32597
0.8     31418
1       31193

Predicting the Score
- Model point differential: $Y_1 = S_i - S_j$
- Additionally model the sum of the points scored: $Y_2 = S_i + S_j$
  - Fit a similar linear model (different parameter estimates)
- Forecast home and visitors score: $H = (Y_1 + Y_2)/2$, $V = (Y_2 - Y_1)/2$
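The score forecast is just the inversion of the sum and difference. A minimal sketch with a hypothetical function name and made-up inputs (a predicted spread of 7 and a predicted total of 45):

```python
def forecast_scores(spread, total):
    """Recover home and visitor scores from predicted spread and total."""
    home = (spread + total) / 2.0
    visitor = (total - spread) / 2.0
    return home, visitor


home, visitor = forecast_scores(7, 45)
```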

Another Transformation Idea
- Scores (touchdowns or field goals) are arrivals, maybe Poisson
- Final score = 7 times a Poisson + 3 times a Poisson + …
- Transform the scores to homogeneity and normality first
- The differences (and sums) should follow suit

Square Root Transform
- Since the score is “similar” to a linear combination of Poissons, square root should work
- Transformation [formula not preserved in the transcript]
- Why k?
  - For small Poisson arrival rates, get better performance (Anscombe, 1948)
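The transformation formula is missing from the transcript. Assuming it has the usual offset form sqrt(X + k), the sketch below computes the exact variance of sqrt(X + k) for a Poisson count by summing the probability mass function. The offset 3/8 is the value from Anscombe (1948), not the k fitted in the talk, and the function name is hypothetical:

```python
import math


def poisson_sqrt_variance(rate, k):
    """Exact variance of sqrt(X + k) for X ~ Poisson(rate), by summing the pmf."""
    upper = int(rate + 20.0 * math.sqrt(rate) + 50)  # far into the upper tail
    pmf = math.exp(-rate)  # P(X = 0)
    mean = 0.0
    mean_sq = 0.0
    for n in range(upper + 1):
        mean += pmf * math.sqrt(n + k)
        mean_sq += pmf * (n + k)
        pmf *= rate / (n + 1)  # advance to P(X = n + 1)
    return mean_sq - mean * mean


# The raw Poisson variance equals the rate; after the transform it is close
# to 1/4 for each of these (assumed) rates.
variances = {rate: poisson_sqrt_variance(rate, 0.375) for rate in (5.0, 20.0, 50.0)}
```

The near-constant variance is what "transform to homogeneity" refers to: games between high-scoring and low-scoring teams end up on a comparable scale.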

Likelihood Test
- LRT: No transformation vs. square root with fitted k
- Used College Football results from 1995-2002
- k = 21
- Transformation was significantly better
  - p-value = 0.0023, chi-square = 9.26
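The reported p-value matches a chi-square statistic of 9.26 on one degree of freedom. The degrees of freedom are an assumption here, since the slide does not state them. A quick check using only the standard library:

```python
import math


def chi_square_1df_p_value(statistic):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    # P(chi2_1 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)) for standard normal Z.
    return math.erfc(math.sqrt(statistic / 2.0))


p_value = chi_square_1df_p_value(9.26)
```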

Predicting the Score with Transform
- Model point differential
- Additionally model the sum of the points scored
- Forecast home and visitors score: $H = ((Y_1 + Y_2)/2)^2$, $V = ((Y_2 - Y_1)/2)^2$
- Note the point differential is the product: $H - V = Y_1 Y_2$
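A small numerical check of the product identity, following the slide's formulas as written (no offset k is subtracted). The function name and the inputs, a transformed difference of 1 and a transformed sum of 9, are made up for illustration:

```python
def untransform_scores(diff_sqrt, sum_sqrt):
    """Square the half-sum and half-difference to get back to the points scale."""
    home = ((diff_sqrt + sum_sqrt) / 2.0) ** 2
    visitor = ((sum_sqrt - diff_sqrt) / 2.0) ** 2
    return home, visitor


home, visitor = untransform_scores(1.0, 9.0)
spread = home - visitor  # equals diff_sqrt * sum_sqrt
```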

Unresolved Linear Model Issues
- Overfitting
- History
  - Going into the season, we have a good idea as to how teams will do
  - The best teams tend to stay the best
  - The worst teams tend to stay the worst
  - Changes happen
    - Kansas State

Intelligence Model
- Concept
  - The ratings and home-field advantages for year t are similar to those of year t-1.
  - There is some drift from one year to the next.
- Model [formula not preserved in the transcript]
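The idea of "similar to last year, with some drift" can be illustrated with a one-team, known-variance analogue: let the rating drift (which widens the uncertainty), then update it with one noisy observation. This is a simplified scalar sketch, not the author's multivariate model; the function name and all numbers are assumptions:

```python
def drift_then_update(prior_mean, prior_var, drift_var, observed, noise_var):
    """One season step for a single rating: random-walk drift, then a normal update."""
    # Drift: the expected rating is unchanged, but uncertainty grows.
    pred_mean = prior_mean
    pred_var = prior_var + drift_var
    # Update: weight the new observation by the relative precision.
    gain = pred_var / (pred_var + noise_var)
    post_mean = pred_mean + gain * (observed - pred_mean)
    post_var = (1.0 - gain) * pred_var
    return post_mean, post_var


# Last year's rating 0 (variance 1), drift variance 1, one observation of 4
# with noise variance 2.
post_mean, post_var = drift_then_update(0.0, 1.0, 1.0, 4.0, 2.0)
```

A larger drift variance makes the model forget history faster; a drift variance of zero pools all seasons as if the ratings never changed.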

Intelligence Model (Details)
Notation
- $L$ teams
- $M$ seasons of data
- $N_i$ games in the $i$th season
- $X_i$: the $N_i$ by $2L$ “X” matrix for season $i$
- $Y_i$: the $N_i$ vector of results for season $i$
- $\theta_i$: the $2L$ vector of ratings and home-field advantages for season $i$

Details (cont.) Data Distribution: For all i = 1, 2, …, M

Details (cont.) Prior Distribution

Details (finally, the end)
- The posterior distribution of $\theta_M$ and $\sigma^{-2}$ is closed form and can be calculated by an iterative method
- The predictive distribution for future results (transformed sum or difference) is straightforward: correlated normal (given the variance)
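The equations on the "Details" slides did not survive in the transcript. One standard formulation that fits the surrounding text (normal data, year-to-year drift, and a closed-form posterior for the final-season ratings and the precision) is a normal dynamic linear model with a gamma prior on the precision. The sketch below is an assumption for illustration, not a reconstruction of the slides: $\theta_i$ denotes the season-$i$ vector of ratings and home-field advantages, $\sigma^2$ the game-level variance, and $W$, $m_0$, $C_0$, $a$, $b$ are hypothetical prior parameters.

```latex
\begin{aligned}
Y_i \mid \theta_i, \sigma^2 &\sim N\!\left(X_i \theta_i,\ \sigma^2 I_{N_i}\right), \quad i = 1, \ldots, M \\
\theta_i \mid \theta_{i-1}, \sigma^2 &\sim N\!\left(\theta_{i-1},\ \sigma^2 W\right), \quad i = 2, \ldots, M \\
\theta_1 \mid \sigma^2 &\sim N\!\left(m_0,\ \sigma^2 C_0\right) \\
\sigma^{-2} &\sim \mathrm{Gamma}(a, b)
\end{aligned}
```

Under these assumptions the posterior is normal-gamma and can be updated one season at a time, which would be consistent with the "iterative method" mentioned on the slide.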

Forecasts
- For scores
  - Simply untransform
  - $E[Z^2] = \operatorname{Var}[Z] + E[Z]^2$
- For the point-spread
  - Product of two normals
  - Simulate 10000 results
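Both forecasting steps can be sketched in a few lines: the expected score uses the second-moment identity, and the point-spread is simulated as the product of two (possibly correlated) normals. The means, standard deviations, correlation and function names below are made-up illustrations, not values from the talk:

```python
import math
import random


def expected_square(mean, variance):
    """E[Z^2] = Var[Z] + E[Z]^2 for the untransformed score."""
    return variance + mean ** 2


def simulate_spread(mean1, sd1, mean2, sd2, rho, n_draws=10000, seed=0):
    """Draw products Y1 * Y2 of two normals with correlation rho."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        e1 = rng.gauss(0.0, 1.0)
        e2 = rng.gauss(0.0, 1.0)
        y1 = mean1 + sd1 * e1
        y2 = mean2 + sd2 * (rho * e1 + math.sqrt(1.0 - rho ** 2) * e2)
        draws.append(y1 * y2)
    return draws


# Assumed predictive distribution: transformed difference ~ N(1, 1),
# transformed sum ~ N(9, 1), uncorrelated.
draws = simulate_spread(1.0, 1.0, 9.0, 1.0, rho=0.0)
mean_spread = sum(draws) / len(draws)
home_win_prob = sum(d > 0 for d in draws) / len(draws)
```

The simulated draws give the whole predictive distribution of the spread, so the probability of covering a given betting line is the fraction of draws beyond that line.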

Enhanced Model
- Fit the prior parameters
  - Hierarchical models
  - Drifts and initial variances
- No closed form for posterior and predictive distributions (at least as far as I know)
- The complete conditionals are straightforward, so Gibbs sampling will work (eventually)
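To show what "straightforward complete conditionals" buys, here is a Gibbs sampler for a much simpler stand-in problem: a normal sample with unknown mean and precision under independent normal and gamma priors. It is not the enhanced ratings model; the data, prior settings and function name are all assumptions for illustration:

```python
import random


def gibbs_normal(data, n_iter=5000, burn_in=500, seed=0,
                 prior_mean=0.0, prior_precision=1e-6,
                 prior_shape=1.0, prior_rate=1.0):
    """Alternate draws of the mean and the precision from their conditionals."""
    rng = random.Random(seed)
    n = len(data)
    total = sum(data)
    tau = 1.0  # starting value for the precision
    mu_draws = []
    for it in range(n_iter):
        # mean | precision, data is normal
        post_precision = prior_precision + n * tau
        post_mean = (prior_precision * prior_mean + tau * total) / post_precision
        mu = rng.gauss(post_mean, post_precision ** -0.5)
        # precision | mean, data is gamma (gammavariate takes a scale, so invert the rate)
        sq_err = sum((y - mu) ** 2 for y in data)
        shape = prior_shape + n / 2.0
        rate = prior_rate + sq_err / 2.0
        tau = rng.gammavariate(shape, 1.0 / rate)
        if it >= burn_in:
            mu_draws.append(mu)
    return mu_draws


# Made-up data with sample mean 3.0.
mu_draws = gibbs_normal([2.0, 3.0, 4.0, 3.5, 2.5])
posterior_mean = sum(mu_draws) / len(mu_draws)
```

Each step only needs a draw from a standard distribution, which is why the approach works even when the joint posterior has no closed form; the cost is the number of iterations.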

Results (www.geocities.com/kambour/football.html)
2002 Final Rankings

Team            Rating          Home
Miami           72.23 (1.03)    0.21 (0.04)
Kansas St       72.04 (1.04)    0.44 (0.03)
USC             71.95 (1.03)    0.04 (0.03)
Oklahoma        71.85 (1.02)    0.18 (0.03)
Texas           71.57 (1.03)    0.36 (0.03)
Georgia         71.49 (1.03)    0.02 (0.03)
Alabama         71.45 (1.03)    -0.09 (0.03)
Iowa            71.30 (1.03)
Florida St      71.29 (1.02)    0.43 (0.03)
Virginia Tech   71.25 (1.03)    0.12 (0.03)
Ohio St         71.18 (1.03)    0.27 (0.03)

Bowl Predictions

Ohio St           17   Miami Fl (-13)    31   0.8255   0.5228
Washington St     21   Oklahoma (-6.5)   31   0.7347   0.5797
Iowa              21   USC (-6)          30   0.7174   0.5721
NC State (E)      20   Notre Dame        17   0.5639   0.5639
Florida St (+4)   24   Georgia           27   0.5719   0.5320

2002 Final Record

Picking Winners            522 – 157        0.769
Against the Vegas lines    367 – 307 – 5    0.544
Best Bets                  9 – 7            0.563 (In 2001, 11 – 4)
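The reported percentages (0.769 for 522 – 157, 0.544 for 367 – 307 – 5, 0.563 for 9 – 7) are consistent with counting a push as half a win. That convention is inferred from the arithmetic, not stated in the slides; the function name is hypothetical:

```python
def win_pct(wins, losses, pushes=0):
    """Winning percentage with each push counted as half a win (inferred convention)."""
    games = wins + losses + pushes
    return (wins + 0.5 * pushes) / games


straight_up = win_pct(522, 157)
against_spread = win_pct(367, 307, 5)
best_bets = win_pct(9, 7)
```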

ESPN College Pick’em (http://games.espn.go.com/cpickem/leader)
1. Barry Schultz 5830
2. Jim Dobbs 5687
3. Michael Reeves 5651
4. Fup Biz 5594
5. Joe * 5587
6. Rising Cream 5562
7. Intelligence Ratings 5559

Ratings System Comparison (http://tbeck.freeshell.org/fb/awards2002)
- Todd Beck, Ph.D., Statistician, Rush Institute
- Intelligence Ratings – Best Predictors

College Football Conclusions
- Can forecast the outcome of games
- Capture the random nature
  - High variability
  - Sparse design
- Scientists should avoid BCS
  - Statistical significance is impossible
  - Problem complexity
  - Other issues

NFL
- Similar to College Football
  - Square root transform is applicable
- Drift is a little higher than College Football
- Better design matrix
- Small sample size
- Playoff

NFL Results (www.geocities.com/kambour/NFL.html)
2002 Final Rankings (after the Super Bowl)

Team            Rating   Home
Tampa Bay       70.72    0.29
Oakland         70.57    0.28
Philadelphia    70.55    0.10
New England     70.16    0.12
Atlanta         70.13    0.20
NY Jets         70.10    -0.01
Pittsburgh      69.95
Green Bay       69.92
Kansas City     69.90    0.51
Denver          69.89    0.50
Miami                    0.49

2002 Final NFL Record

Picking Winners            162 – 104 – 1    0.609
Against the Vegas lines    135 – 128 – 4    0.513
Best Bets                  9 – 8            0.529

NFL Europe
- Similar to College and NFL
  - Square root transform
- Dramatic drift
  - Teams change dramatically in mid-season
- Few teams
  - Better design matrix

College Basketball
- Transform?
  - Much more normal (Central Limit Theorem)
- A lot more games
  - Intersectional games
- Less emphasis on programs than in College Football
  - More drift
- NCAA tournament

NCAA Basketball Pre-tournament Ratings

Team          Rating   Home
Arizona       100.06   3.97
Kentucky      99.33    4.32
Kansas        95.89    3.85
Texas         93.42    4.44
Duke          92.90    4.66
Oklahoma      90.19    4.31
Florida       90.65    3.99
Wake Forest   88.70    3.65
Syracuse      88.50    3.49
Xavier        87.89    3.37
Louisville    87.88    4.16

NBA
- Similar to College Basketball
  - Normal – No transformation
- A lot more games – fewer teams
- Playoffs are completely different from regular season
  - Regular season – very balanced, strong home court
  - Post season – less balanced, home court lessened

Hockey
- Transform
  - Rare events = “Poissonish”
  - Square root with k around 1
- A lot more games
- History matters
- Playoffs seem similar to regular season
- Balance

Soccer
- Similar to hockey
- Transform
  - Square root with low k
- Not a lot of games
- Friendlies versus cup play
- Home pitch is pronounced
  - Varies widely

Soccer Results
- Correctly forecasted 2002 World Cup final
  - Brazil over Germany
- Correctly forecasted US run to quarter-finals
- Won the PROS World Cup Soccer Pool

Future Enhancements
- Hierarchical Approaches
  - Conferences
- More complicated drift models
  - Correlations
  - Individual drifts
  - Drift during the season
  - Mean correcting drift
- More informative priors