
By Andrew Finley. Research question: Is it possible to predict a football player’s professional performance based on collegiate performance?


1 By Andrew Finley

2 Research Question
- Is it possible to predict a football player’s professional performance based on collegiate performance? That is, is it possible to accurately predict some of a player’s NFL statistics using only his collegiate statistics?
- Why: too many “busts”
- How:
  - Gather statistics for both NCAA and NFL players
  - Use the statistics and ML algorithms to train a program
  - Use the program to predict unseen examples

3 Presentation Outline
- Related Works
  - Alternative applications of machine learning in sport
- My Approach
  - Machine learning: classification
  - Decision tree algorithm
- Implementation
  - Statistics to predict
  - Gather and format statistics
  - Insert into Weka (ML software)
  - Build the decision tree
- Results and Analysis
  - Cross-validation
  - Feature selection

4 Related Works
- Mr. NFL/NCAA (predicts games): classification using linear regression on team statistics
- FFtoday.com (predicts fantasy football stats): linear regression on fantasy football statistics
- Draft Tek (predicts the NFL Draft): ranks college players and uses a matrix of team needs at every position
- SABRmetrics: uses statistical analysis to create new baseball statistics
  - Example: RUNS = (.41) 1B + (.82) 2B + (1.06) 3B + (1.42) HR, where 1B, 2B, 3B and HR are singles, doubles, triples and home runs
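The SABRmetrics example is a linear-weights formula: estimated runs are a weighted sum of hit counts. The Python sketch below only evaluates the weights shown on the slide; the season line in the usage is hypothetical, not data from this presentation.

```python
# Weights from the slide's example: RUNS = .41*1B + .82*2B + 1.06*3B + 1.42*HR
RUN_WEIGHTS = {"1B": 0.41, "2B": 0.82, "3B": 1.06, "HR": 1.42}


def estimated_runs(hits):
    """Weighted sum of hit counts; hit types absent from `hits` count as 0."""
    return sum(w * hits.get(kind, 0) for kind, w in RUN_WEIGHTS.items())


# Hypothetical season: 100 singles, 30 doubles, 5 triples, 20 home runs.
print(round(estimated_runs({"1B": 100, "2B": 30, "3B": 5, "HR": 20}), 1))  # 99.3
```

Fitting weights like these with linear regression, rather than fixing them by hand, is the same idea the game- and fantasy-prediction sites above apply to football statistics.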

5 Machine Learning
- Type: supervised learning (classification)
  - The program is given a set of examples (instances) from which it learns to classify unseen examples
  - Each instance is a set of attribute values with a known class
  - The goal is to generate a set of rules that correctly classifies new examples
- Algorithm: decision tree

6
- Create a graph (tree) from the training data: internal nodes test attributes, branches are attribute values, and leaves are classes
- Goal: make the smallest tree possible that correctly classifies all training instances
- Use the tree to make a set of classification rules
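Finding the smallest consistent tree is computationally hard, so practical learners grow the tree greedily, picking at each node the attribute that best separates the classes. The slides do not say which Weka learner was used. The sketch below is a simplified ID3-style tree (information gain, nominal attributes only), plus the tree-to-rules step; it is not Weka's implementation, and the attribute names and toy data are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def build_tree(rows, labels, attributes):
    """Greedy ID3-style tree over nominal attributes.

    rows are dicts of attribute -> value, labels the matching classes.
    Returns a class label (leaf) or a tuple (attribute, {value: subtree}).
    """
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not attributes:
        return majority

    def split_entropy(attr):
        # Weighted entropy of the children created by splitting on attr;
        # the lowest value corresponds to the highest information gain.
        groups = {}
        for row, label in zip(rows, labels):
            groups.setdefault(row[attr], []).append(label)
        return sum(len(g) / len(labels) * entropy(g) for g in groups.values())

    best = min(attributes, key=split_entropy)
    remaining = [a for a in attributes if a != best]
    branches = {}
    # sorted() keeps branch order deterministic; values must be comparable.
    for value in sorted({row[best] for row in rows}):
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        branches[value] = build_tree([rows[i] for i in idx],
                                     [labels[i] for i in idx], remaining)
    return (best, branches)


def to_rules(tree, conditions=()):
    """Flatten a tree into (conditions, class) rules, one per leaf."""
    if not isinstance(tree, tuple):
        return [(list(conditions), tree)]
    attr, branches = tree
    rules = []
    for value, subtree in branches.items():
        rules.extend(to_rules(subtree, conditions + ((attr, value),)))
    return rules


def classify(tree, row, default=None):
    """Follow the branches matching row's values down to a leaf."""
    while isinstance(tree, tuple):
        attr, branches = tree
        if row.get(attr) not in branches:
            return default  # value never seen during training
        tree = branches[row[attr]]
    return tree


# Hypothetical toy data: discretized college stats -> "good NFL season?"
rows = [
    {"ypc": "high", "conf": "power5"},
    {"ypc": "high", "conf": "other"},
    {"ypc": "low", "conf": "power5"},
    {"ypc": "low", "conf": "other"},
]
labels = ["yes", "yes", "no", "no"]
tree = build_tree(rows, labels, ["conf", "ypc"])
for conditions, cls in to_rules(tree):
    print("IF", " AND ".join(f"{a} = {v}" for a, v in conditions), "THEN", cls)
```

Here `ypc` alone separates the classes, so the tree has one split and produces two rules; `conf` is never used.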

7 My Data
- I narrowed my predictions down to quarterbacks and running backs only
- Input (NCAA): individual and team stats from every year of college play, team rankings and strength of schedule, and height and weight
  - NFL Combine data was not included because too few players participated
- Output (NFL):
  - RB: yards per carry, total rushing yards, and rushing TDs for each of the first 3 seasons, starting after 3 seasons
  - QB: total passing yards, passing TDs, interceptions, and QB rating for each of the first 3 seasons, starting after 3 seasons

8 Data Retrieval
- Step 1: Find statistics
  - Online: NFL.com, NCAA.org
  - Collegio Football: database software
- Step 2: Extract data
  - Python scripts parsed the necessary statistics from the websites
  - Statistics from Collegio were exported manually
- Step 3: Convert the data into the correct format
  - Python scripts combined the data into 2 large .csv files, one for RBs and one for QBs
  - Missing data is filled in as accurately as possible
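The slides do not show the combining scripts. The sketch below is one plausible way to do Step 3 with Python's standard `csv` module, assuming both sources share a `Player` column to join on (a hypothetical key; real data would need a sturdier ID, since names can collide). The file contents are in-memory stand-ins using a few column names from the example slide; "Player X" is made up.

```python
import csv
import io


def combine(ncaa_file, nfl_file, key="Player"):
    """Join NCAA and NFL rows on a shared player column.

    Returns one merged dict per player present in both files; players
    with no NFL row are dropped.
    """
    nfl_by_player = {row[key]: row for row in csv.DictReader(nfl_file)}
    merged = []
    for row in csv.DictReader(ncaa_file):
        nfl_row = nfl_by_player.get(row[key])
        if nfl_row is not None:
            merged.append({**row, **nfl_row})  # NCAA columns first, then NFL
    return merged


def write_csv(rows, out):
    """Write merged rows with a header taken from the first row's keys."""
    if not rows:
        return
    writer = csv.DictWriter(out, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)


ncaa = io.StringIO("Player,School,Rush Yds1\nRonnie Brown,Auburn,1008\nPlayer X,Some U,500\n")
nfl = io.StringIO("Player,Season1,RushYds1\nRonnie Brown,2005,907\n")
out = io.StringIO()
write_csv(combine(ncaa, nfl), out)
print(out.getvalue())
```

Running it once per position, with real file handles from `open(..., newline="")`, would give the two .csv files (RB and QB).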

9 Example
One running-back instance: Ronnie Brown. In the .csv file this is a single row; each column name carries a season index (Year1 to Year3 for college, Season1 to Season3 for the NFL).

NCAA data (Auburn; Height 6'1", Weight 230)

Column | Year 1 | Year 2 | Year 3
Year | 2002 | 2003 | 2004
Pos | RB | RB | RB
Cl | So | Jr | Sr
G | 12 | 6 | 12
Rush Yds | 1008 | 446 | 913
Car | 175 | 95 | 153
Rush TD | 13 | 5 | 8
Yds/Car | 5.76 | 4.7 | 5.97
RushYds/G | 84 | 74.3 | 76.1
Rec Yds | 166 | 80 | 313
Rec | 9 | 8 | 34
Rec TD | 1 | 0 | 1
Yds/Rec | 18.4 | 10 | 9.2
Rec/G | 0 | 1 | 2
RecYds/G | 13.8 | 13.3 | 26.1
PR, PR Yds, PR TD, Yds/PR, PR/G (each) | 0 | 0 | 0
KR, KR Yds, KR TD, Yds/KR, KR/G (each) | 0 | 0 | 0
Ret TD | 0 | 0 | 0
Tot Yds | 1174 | 526 | 1226
Tot TD | 14 | 5 | 9
TotYds/G | 97.8 | 87.6 | 102.2

NFL data (Miami Dolphins)

Column | Season 1 | Season 2 | Season 3
Season | 2005 | 2006 | 2007
Team | Miami Dolphins | Miami Dolphins | Miami Dolphins
G | 15 | 13 | 7
GS | 14 | 12 | 7
Att | 207 | 241 | 119
RushYds | 907 | 1008 | 602
RushAvg | 4.4 | 4.2 | 5.1
RushLng | 65 | 47 | 60
RushTD | 4 | 5 | 4
Rec | 32 | 33 | 39
RecYds | 232 | 276 | 389
RecAvg | 7.3 | 8.4 | 10
RecLng | 38 | 24 | 43
RecTD | 1 | 0 | 1
FUM | 4 | 4 | 0
Lost | 4 | 2 | 0
Starting | TRUE | TRUE | TRUE
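Several of the NCAA columns appear to be derived from others: Tot Yds equals Rush Yds plus Rec Yds, and the per-carry and per-game columns look like ratios. That is an inference from the numbers, not something the slides state. The Python check below recomputes those columns for the 2002 season from the raw counts in the example; redundant attributes like these are one reason attribute selection matters.

```python
# Ronnie Brown's 2002 NCAA season, raw counts copied from the example above.
season_2002 = {"G": 12, "Rush Yds": 1008, "Car": 175, "Rush TD": 13,
               "Rec Yds": 166, "Rec": 9, "Rec TD": 1}


def derived_columns(s):
    """Recompute rate and total columns from raw counts (assumed formulas).

    Assumes G, Car and Rec are nonzero.
    """
    tot_yds = s["Rush Yds"] + s["Rec Yds"]
    return {
        "Yds/Car": round(s["Rush Yds"] / s["Car"], 2),
        "RushYds/G": round(s["Rush Yds"] / s["G"], 1),
        "Yds/Rec": round(s["Rec Yds"] / s["Rec"], 1),
        "RecYds/G": round(s["Rec Yds"] / s["G"], 1),
        "Tot Yds": tot_yds,
        "Tot TD": s["Rush TD"] + s["Rec TD"],
        "TotYds/G": round(tot_yds / s["G"], 1),
    }


print(derived_columns(season_2002))
```

The recomputed values match the row's 5.76, 84, 18.4, 13.8, 1174, 14 and 97.8.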

10 Weka Data Processing
- Weka is a collection of machine learning algorithms written in Java
- It reads data only in particular formats (its native ARFF format, or .csv files laid out the way it expects)
- Preprocessing:
  - Apply filters to fill in missing stats
  - Remove all NFL data except the statistic being predicted
  - Turn the desired statistic into the class: if numeric, separate it into ranges; if nominal, separate it by value
  - Specify the attributes
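Inside Weka these steps are done with filters (for example ReplaceMissingValues, Remove and Discretize). The plain-Python sketch below only illustrates what the steps do to the data; it does not call Weka. The column names follow the example slide ("Rush Yds1" and "Car1" are NCAA columns, "RushYds1" and "RushTD1" are NFL columns), while the cut points and the rows are hypothetical, and every kept column is assumed to be numeric.

```python
from statistics import mean


def to_range(value, cuts):
    """Map a number to a range label using ascending cut points.

    With cuts [500, 1000]: <=500, 500-1000 (above 500 up to 1000), >1000.
    """
    low = None
    for high in cuts:
        if value <= high:
            return f"<={high}" if low is None else f"{low}-{high}"
        low = high
    return f">{cuts[-1]}"


def column_mean(rows, col):
    """Mean of the non-missing (non-None) values in a column; 0.0 if none."""
    values = [r[col] for r in rows if r[col] is not None]
    return mean(values) if values else 0.0


def preprocess(rows, target, nfl_columns, cuts):
    """1. Drop rows whose target is missing (they carry no class).
    2. Keep NCAA columns plus the target; drop the other NFL columns.
    3. Replace missing values with the column mean.
    4. Turn the numeric target into a nominal range label.
    """
    rows = [r for r in rows if r[target] is not None]
    if not rows:
        return []
    keep = [c for c in rows[0] if c not in nfl_columns or c == target]
    means = {c: column_mean(rows, c) for c in keep}
    cleaned = []
    for r in rows:
        row = {c: means[c] if r[c] is None else r[c] for c in keep}
        row[target] = to_range(row[target], cuts)
        cleaned.append(row)
    return cleaned


data = [  # hypothetical players; None marks a missing stat
    {"Rush Yds1": 1000, "Car1": 200, "RushYds1": 907, "RushTD1": 4},
    {"Rush Yds1": None, "Car1": 150, "RushYds1": 1200, "RushTD1": 9},
    {"Rush Yds1": 600, "Car1": 100, "RushYds1": None, "RushTD1": 1},
]
for row in preprocess(data, "RushYds1", {"RushYds1", "RushTD1"}, [500, 1000]):
    print(row)
```

Hand-picked cut points are used here for readability; Weka's unsupervised Discretize filter instead splits a numeric range into equal-width bins by default.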

11 Building the Tree
- The tree is constructed from the specified attributes
- Weka converts the tree to classification rules
- Accuracy is measured using cross-validation
  - Cross-validation: break the training data into a specified number of sets (folds); use each set once as the test data while the rest is used as training data
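A minimal sketch of k-fold cross-validation as described above, in Python. `train` is a hypothetical interface: any function that takes training rows and labels and returns a predictor. Weka's own cross-validation defaults to 10 folds and stratifies them by class for nominal classes; this sketch only shuffles.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle 0..n-1 and deal the indices into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate(rows, labels, train, k=10, seed=0):
    """Use each fold once as test data while the other k-1 folds train.

    `train(rows, labels)` must return a function row -> predicted label.
    Returns accuracy over all held-out predictions.
    """
    correct = 0
    for test_idx in k_fold_indices(len(rows), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(rows)) if i not in held_out]
        predict = train([rows[i] for i in train_idx],
                        [labels[i] for i in train_idx])
        correct += sum(predict(rows[i]) == labels[i] for i in test_idx)
    return correct / len(rows)


def always_true(rows, labels):
    """Hypothetical trivial learner: ignores the data, always predicts True."""
    return lambda row: True


print(cross_validate([{}] * 10, [True] * 3 + [False] * 7, always_true, k=5))
```

Every instance is predicted exactly once, by a model that never saw it, so the printed accuracy here is simply the share of True labels.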

12 Initial Results
- Initial runs using all attributes failed: Weka created a 1-layer tree that mapped every instance to false for the predicted statistic
- Accuracy varies greatly with slight changes to the attributes used
- Tree size seems to increase as the number of attributes used decreases

13 Analysis
- The initial 1-layer tree gave an accuracy of 68%
- This is the simplest possible tree: it predicts false for every instance, so its 68% accuracy just reflects how often the class is false. Any useful tree must do better than this.
- Attribute selection needs to improve
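A one-leaf tree always predicts the most common class, so its accuracy is that class's share of the data. Weka's ZeroR classifier computes the same baseline. The Python sketch below uses hypothetical class counts: the slides report only the 68% figure, not how many instances there were.

```python
from collections import Counter


def majority_baseline(labels):
    """Return (most common class, accuracy of always predicting it)."""
    label, count = Counter(labels).most_common(1)[0]
    return label, count / len(labels)


# Hypothetical class counts chosen to match the reported 68%.
labels = [False] * 68 + [True] * 32
print(majority_baseline(labels))  # (False, 0.68)
```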

14 Next
- Improve attribute selection to optimize accuracy
- (If time) Implement other algorithms to compare accuracy

15 Questions?

