Predicting basketball RPI
What is RPI? Ratings Percentage Index Based on win/loss percentage throughout the season. Not necessarily a predictor of a stronger team.
How is RPI Calculated? Weighted wins, losses Wins worth 1.4 away, 1 neutral, 0.6 at home Losses worth 1.4 at home, 1 neutral, 0.6 away Two parts: – Win pct (wins/(wins+losses)) – Strength of schedule Opponents’ unweighted win pct Opponents’ opponents’ unweighted win pct
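The slide lists the ingredients of RPI but not how they combine. A minimal sketch in R, assuming the commonly cited NCAA weighting of 25% team win pct, 50% opponents' win pct and 25% opponents' opponents' win pct (these weights are not stated in the slides). The function names `weighted_win_pct` and `rpi` are illustrative only.

```r
# Weighted winning percentage using the location weights from the slide.
# Arguments are counts of wins/losses by site; all names are illustrative.
weighted_win_pct <- function(home_w, away_w, neutral_w,
                             home_l, away_l, neutral_l) {
  wins   <- 0.6 * home_w + 1.4 * away_w + 1.0 * neutral_w
  losses <- 1.4 * home_l + 0.6 * away_l + 1.0 * neutral_l
  wins / (wins + losses)
}

# Combine the three components. The 25/50/25 weights are an assumption here
# (the commonly cited NCAA formula), not something taken from the slides.
rpi <- function(wp, owp, oowp) {
  0.25 * wp + 0.50 * owp + 0.25 * oowp
}

# Example: 10-2 at home, 6-4 away, 2-1 on neutral courts
wp <- weighted_win_pct(10, 6, 2, 2, 4, 1)
rpi(wp, owp = 0.55, oowp = 0.52)
```

Because half of the index comes from opponents' records, a team can move in the rankings without playing a game.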
What is RPI used for? Estimator of team strength, as it factors in strength of schedule Helps to seed the NCAA tournament. Helps selection committee/analysts determine quality of wins.
The Selection Committee 10-person committee that determines who will receive an at-large bid and seeding for the tournament 5-year tenure Uses a multitude of selection tools – Win/loss – Conference strength – How a team won – Voodoo – Apparently, more random numbers than me. – Decidedly NOT just RPI.
How can RPI be predicted? Predict outcomes of games Run through season Rinse, repeat (Monte-Carlo!)
Kenpom statistics Statistics on all Division 1 basketball teams – Offensive Efficiency – Defensive Efficiency – Tempo Average possessions per game: FGA - OR + TO + 0.42 × FTA Meteorologist from Salt Lake City, Utah Basketball stats just a hobby, no background Stats referenced by ESPN, Wall Street Journal Season averages
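The possession formula on this slide is easy to check by hand. A small sketch in R using the 0.42 free-throw coefficient quoted above; the box-score numbers are made up, and reading "efficiency" as points per 100 possessions is an assumption about how the ratings are scaled.

```r
# Possession estimate from box-score totals, with the 0.42 coefficient
# from the slide. Argument names are illustrative.
possessions <- function(fga, or, to, fta) {
  fga - or + to + 0.42 * fta
}

# Assumed definition of efficiency: points per 100 possessions.
efficiency <- function(points, poss) {
  100 * points / poss
}

# Made-up single-game totals
poss <- possessions(fga = 58, or = 11, to = 13, fta = 20)
poss                                  # 68.4
efficiency(points = 72, poss = poss)
```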
Step 1: Predicting games Generate scores: compare and mark Home games: – xscore <- round(x/100*rnorm(1,1.05,.2)*(((x+y)/2)*rnorm(1,1.1,.2))+((y*rnorm(1,1.05,.2))/80)) – yscore <- round(y/100*rnorm(1,.95,.2)*(((y+x)/2)*rnorm(1,.9,.2))+((x*rnorm(1,.95,.2))/80))
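The two home-game formulas can be wrapped in a function so that each call draws a fresh pair of scores. This is a sketch only: the slide does not say which statistics `x` and `y` are, so here they are simply treated as one rating for the home team and one for the away team, and the inputs below are arbitrary.

```r
# x: rating of the home team, y: rating of the away team.
# Which statistic these are is not stated on the slide (assumption).
simulate_home_game <- function(x, y) {
  # Home side gets the more favourable multipliers (means 1.05 and 1.1)
  xscore <- round(x / 100 * rnorm(1, 1.05, 0.2) *
                    (((x + y) / 2) * rnorm(1, 1.1, 0.2)) +
                    ((y * rnorm(1, 1.05, 0.2)) / 80))
  # Away side gets the less favourable ones (means 0.95 and 0.9)
  yscore <- round(y / 100 * rnorm(1, 0.95, 0.2) *
                    (((y + x) / 2) * rnorm(1, 0.9, 0.2)) +
                    ((x * rnorm(1, 0.95, 0.2)) / 80))
  c(home = xscore, away = yscore)
}

set.seed(1)                         # reproducible draws
simulate_home_game(x = 80, y = 75)  # arbitrary illustrative inputs
```

Each score uses three independent normal draws with standard deviation 0.2, so results vary widely from call to call; that spread is what the Monte-Carlo repetition averages over.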
Step 2: Recording data Compare scores Higher score wins Mark wins/losses in appropriate places
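One way the bookkeeping might look, as a sketch: a table with one row per team, updated after every game with both the plain and the location-weighted result. The column names and the `record_game` helper are hypothetical, and the slides do not say how tied scores were handled, so this version assumes the caller re-simulates them.

```r
# One row per team; column names are illustrative.
new_records <- function(teams) {
  data.frame(team = teams, wins = 0, losses = 0,
             w_wins = 0, w_losses = 0, stringsAsFactors = FALSE)
}

record_game <- function(rec, home, away, home_score, away_score,
                        neutral = FALSE) {
  stopifnot(home_score != away_score)  # assumption: ties are re-simulated
  home_won <- home_score > away_score
  winner <- if (home_won) home else away
  loser  <- if (home_won) away else home
  # RPI location weight: a home win counts 0.6 for the winner and 0.6
  # against the visiting loser; an away win counts 1.4 both ways.
  w <- if (neutral) 1.0 else if (home_won) 0.6 else 1.4
  wi <- match(winner, rec$team)
  li <- match(loser, rec$team)
  rec$wins[wi]     <- rec$wins[wi] + 1
  rec$losses[li]   <- rec$losses[li] + 1
  rec$w_wins[wi]   <- rec$w_wins[wi] + w
  rec$w_losses[li] <- rec$w_losses[li] + w
  rec
}

rec <- new_records(c("A", "B"))
rec <- record_game(rec, home = "A", away = "B", home_score = 70, away_score = 65)
rec <- record_game(rec, home = "A", away = "B", home_score = 60, away_score = 65)
rec
```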
Step 3: Run through season 5,284 Division 1 vs Division 1 games. Import list of all games, with which team is home and which is away, and pass each game to the game function. Run one of three game situations based on court (1 is home, 2 is away, 3 is neutral). Each team plays approximately 30 games.
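The season loop could be sketched as below. The game list, its column names and the `team1_wins` placeholder are all hypothetical; in particular the win probabilities stand in for the score generator from Step 1 and are made up for illustration.

```r
# Hypothetical game list; the real one had 5,284 rows.
games <- data.frame(
  team1 = c("A", "A", "B", "C"),
  team2 = c("B", "C", "C", "A"),
  court = c(1, 2, 3, 1),   # 1 = team1 at home, 2 = team1 away, 3 = neutral
  stringsAsFactors = FALSE
)

# Placeholder for the score generator: TRUE if team1 wins.
# The probabilities are invented, not taken from the slides.
team1_wins <- function(court) {
  p <- switch(as.character(court), "1" = 0.6, "2" = 0.4, "3" = 0.5)
  runif(1) < p
}

# Walk the game list once and record a winner for every game.
simulate_season <- function(games) {
  games$winner <- NA_character_
  for (i in seq_len(nrow(games))) {
    g <- games[i, ]
    games$winner[i] <- if (team1_wins(g$court)) g$team1 else g$team2
  }
  games
}

set.seed(2011)
season <- simulate_season(games)
table(season$winner)
```

Repeating `simulate_season` many times and storing each run's rankings gives the Monte-Carlo distribution described earlier.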
Step 4: Compile RPI, rank, repeat! After season is done, run through game list to grab opponent’s win-losses. Next, re-run through game list to grab opponent’s opponent’s win-losses by.
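The two passes over the game list might look like this in R. The results table is a made-up four-game season. This sketch averages each opponent's full record, one entry per game played; the official RPI also drops games against the team being rated when computing its opponents' percentages, which is left out here for brevity.

```r
# Hypothetical finished season: one row per game.
results <- data.frame(
  winner = c("A", "A", "B", "C"),
  loser  = c("B", "C", "C", "A"),
  stringsAsFactors = FALSE
)
teams <- sort(unique(c(results$winner, results$loser)))

# Unweighted winning percentage for every team
wins   <- table(factor(results$winner, levels = teams))
losses <- table(factor(results$loser,  levels = teams))
wp <- as.numeric(wins / (wins + losses))
names(wp) <- teams

# Opponents faced by a team, one entry per game played
opponents <- function(team) {
  c(results$loser[results$winner == team],
    results$winner[results$loser == team])
}

# Pass 1: average of opponents' winning percentages
owp <- sapply(teams, function(tm) mean(wp[opponents(tm)]))
# Pass 2: average of opponents' OWP values
oowp <- sapply(teams, function(tm) mean(owp[opponents(tm)]))

round(cbind(wp, owp, oowp), 3)
```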
What do these results mean? Interesting estimator, but cannot be taken too seriously. True Top 25 RPI missed by average of 13 places. My top 25 missed true RPI by average of 11. Kansas, WVU right where they should be! Villanova ranked 18, therefore project should be considered a success.
Is the RPI reliable? Wake Forest (7.4%), DePaul (9.9%) more likely to make tournament than nearly 200 other teams based on RPI alone. Too much weight placed on who you play, not how you play. Still only one factor in determining the NCAA tournament field.
Interesting Oddities Program took over 15 hours to run. In 10,000 simulated seasons, 31 teams will not receive an at-large bid (will not be in the top 37 RPI at the end of the season) Of those 31 teams, half of them would likely end up as a play-in team going to the Final Four. – VCU received at-large bid with RPI rank of 49 (to fill 36th-38th at-large bid), Harvard with RPI rank of 35 denied tournament bid
Are these results reliable? Maybe? Only 2 teams predicted correctly in top 25. Season averages inaccurate for day-of play, but might average out over whole season. Effect of random variables should eventually absorb things like suspensions, injuries, team morale. Possible that one of the seasons actually matches this season perfectly.
Theoretical/Technical Issues Unable to account for mid-season tournaments, changes in schedule, delayed games. ‘Labor Intensive’ program – 5 trillion calculations. Still near-impossible to seed mock tournament without just taking 68 highest RPI (which might not be a bad idea)
March Madness There are about 147,573,952,600,000,000,000 (2^67) different brackets of the NCAA tournament (but only 1 winner!) Over 6 million brackets were submitted to ESPN.com this March in competition. Of those brackets, the best bracket, just 1 of 6 million, got 52, or 77.6% of their picks correct. This year was the first time two 11 seeds made the Sweet 16, and the first time no 1 or 2 seeds made the Final Four. Most even field the tournament has ever had, no great teams
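The bracket count and the 77.6% figure both follow from a 68-team, single-elimination field. A quick check in R, assuming all 67 games (including the opening-round games) are counted:

```r
games_played <- 68 - 1          # every team except the champion loses once
n_brackets <- 2^games_played    # two possible winners per game
format(n_brackets, scientific = TRUE, digits = 5)

# Share of games the best ESPN bracket got right
round(100 * 52 / games_played, 1)   # 77.6
```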
My terrible bracket My original bracket: – 33.6th percentile on ESPN at 480 pts (4-millionth place) Champion: Notre Dame – No Final Four team, only 2 Elite 8 (UNC, UConn) Basically, terrible. Last place in every pool I was in.
Can I make my bracket any better? This year, no, but next year? Maybe! Goal: create a bracket based on Kenpom rankings, and see if it does any better.
Results: Worth twice as many points on ESPN, enough to put me in the 92nd percentile! – Beat 5,520,000 brackets! Still had no Final Four team.
Is it a reliable method? Votes for no: Highest championship percentage was less than 6%, only 4% better than flipping a coin. Many games were decided by less than a percentage point. Doesn’t take into account injuries, coaching, stage fright, ‘home field,’ streaks, incredible ability to lose the lead, or recruiting violations. Examples: – Georgetown and St. John’s both had their star players hurt going into the tournament and lost in the first round. – Tennessee head coach Bruce Pearl was hit with school and NCAA sanctions the day before the game and lost by 30 points. – George Mason entered the tournament on an 11-game win streak
Votes for Yes: Randomness exists to account for the issues previously mentioned. Anything can happen, this season could have been that 1 in 10,000 chance for VCU, data could be reliable. 55.2% of the bracket picked correctly, up from… well, zero-ish.
Oddities and anecdotes First trial of the tournament I ran (before looping) yielded Butler over UConn, with Kentucky and Kansas in the Final Four. Defeated teams sometimes more likely to advance: Notre Dame has a higher chance of making the championship game over Wisconsin, 8.47% over 8.38%, but Wisconsin is more likely to win the championship, 4.90% over 4.76%. VCU had only a 51.5% chance of winning its first game, a 22% chance of advancing past Georgetown, and a 2.64% chance of advancing to the Final Four.
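Percentages like these come from counting how often each team reaches a round across many simulated brackets. A toy sketch in R with a four-team bracket: the ratings and the win model `r_a / (r_a + r_b)` are invented for illustration and are not the method or the numbers used in the presentation.

```r
# Hypothetical ratings for a four-team bracket: (A vs B) and (C vs D),
# then the two winners meet in the final.
rating <- c(A = 0.90, B = 0.80, C = 0.85, D = 0.60)

# Assumed win model, for illustration only: P(a beats b) = r_a / (r_a + r_b)
play <- function(a, b) {
  p <- rating[[a]] / (rating[[a]] + rating[[b]])
  if (runif(1) < p) a else b
}

simulate_bracket <- function() {
  f1 <- play("A", "B")
  f2 <- play("C", "D")
  list(finalists = c(f1, f2), champion = play(f1, f2))
}

set.seed(68)
n_sims <- 10000
sims <- replicate(n_sims, simulate_bracket(), simplify = FALSE)

finalists <- unlist(lapply(sims, `[[`, "finalists"))
champions <- vapply(sims, function(s) s$champion, character(1))

# Estimated chance of reaching the final and of winning the title
reach_final <- sapply(names(rating), function(tm) sum(finalists == tm) / n_sims)
win_title   <- sapply(names(rating), function(tm) mean(champions == tm))
round(rbind(reach_final, win_title), 3)
```

A team's path to the final depends on its own half of the bracket, while winning the title also depends on who comes out of the other half, so the two rankings do not have to agree.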
Conclusions: RPI Can RPI (remember RPI?) be predicted for a season using Monte-Carlo methods? – Decent yardstick, but not perfect – Since RPI is just a yardstick anyway, should work okay. Can it be used to seed a tournament? – Difficult but yes, would need to run through 31 conference tournaments and determine an Ivy league AQ as well.
Conclusions: NCAA tournament Can Monte-Carlo methods be used to predict the NCAA tournament? – Better predictor than me and better than a coin flip. – Good for calculating odds but not for absolute winner.