Sports Data Sources and Data Extraction Gavin Zhang MIS580 University of Arizona 02-06-2008.


2 Outline
Sports Data Sources
– Baseball
– Basketball
– Football
– Olympics
– Greyhound
Data Extraction
– Case Study: AZGreyhound System

3 Baseball Data Source
http://www.baseball1.com/
Download the database

4 Data Download
This database contains pitching, hitting, and fielding statistics for Major League Baseball from 1871 through 2007.
– The data are provided in Microsoft Access, CSV, and other formats.
– The newest version is Version 5.5.
The database can be downloaded at: http://baseball1.com/content/view/57/82/

5 Database
A detailed description of the database is available at: http://baseball1.com/content/view/57/82/
The database has 21 tables; the main tables include:
– MASTER table: player names, dates of birth (DOB), and biographical information;
– Batting table: batting statistics;
– Pitching table: pitching statistics;
– Fielding table: fielding statistics.
A detailed description of each data field in each table is available.
(Screenshot: AwardPlayers.csv … …)
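Because the Batting table stores raw counts, derived statistics have to be computed from it. As a minimal sketch, the Java code below computes batting average (hits divided by at-bats) for each row of a CSV export. It assumes the file has a header row with columns named `H` and `AB` and contains no quoted commas; check both assumptions against the database's field descriptions. The class and method names are illustrative, and the sample rows are made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class BattingAverage {
    // Batting average = hits / at-bats; NaN when a player has no at-bats.
    static double battingAverage(int hits, int atBats) {
        return atBats == 0 ? Double.NaN : (double) hits / atBats;
    }

    // Computes the average for every data row, locating the (assumed) "H" and "AB"
    // columns by header name rather than by fixed position.
    static List<Double> averages(List<String> csvLines) {
        List<String> header = Arrays.asList(csvLines.get(0).split(",", -1));
        int hIdx = header.indexOf("H");
        int abIdx = header.indexOf("AB");
        if (hIdx < 0 || abIdx < 0) {
            throw new IllegalArgumentException("header lacks an H or AB column");
        }
        List<Double> result = new ArrayList<>();
        for (String line : csvLines.subList(1, csvLines.size())) {
            String[] f = line.split(",", -1);
            // Empty cells (missing data) are treated as 0 in this sketch.
            int h = f[hIdx].isEmpty() ? 0 : Integer.parseInt(f[hIdx]);
            int ab = f[abIdx].isEmpty() ? 0 : Integer.parseInt(f[abIdx]);
            result.add(battingAverage(h, ab));
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up rows for illustration only, not real statistics.
        List<String> sample = List.of(
            "playerID,yearID,AB,H",
            "exampleA,2007,500,150",
            "exampleB,2007,0,0");
        System.out.println(averages(sample)); // [0.3, NaN]
    }
}
```

Looking columns up by name keeps the code working if the column order differs between database versions.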

6 Basketball Data Source
http://databaseBasketball.com/
Download all of the player and team statistics

7 Data Download
The website contains NBA data from 1947 to 2007 and ABA data from 1968 to 1976 on players, teams, leagues, all-star games, awards, and coaches.
Download at: http://databasebasketball.com/stats_download.htm

8 Database
This download contains nine delimited text files (.txt format), each of which represents a table in the database. Fields are separated by the bar ("|") character; if you open the files in Excel, you may need to select Data -> Text to Columns and then use the bar character as the delimiter.
Teams.txt
team|location|name|leag
ANA|Anaheim|Amigos|A
AND|Anderson|Duffey Packers|N
ATL|Atlanta|Hawks|N
BA1|Baltimore|Bullets|N
BAL|Baltimore|Bullets|N
BOS|Boston|Celtics|N
BUF|Buffalo|Braves|N
CAP|Capital|Bullets|N
CAR|Carolina|Cougars|A
CH1|Chicago|Stags|N
CH2|Chicago|Zephyrs|N
CHA|Charlotte|Hornets|N
CHI|Chicago|Bulls|N
… …
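A small Java sketch for reading a bar-delimited file such as Teams.txt into one map per row. One pitfall: `String.split` takes a regular expression, in which `|` means alternation, so the bar must be escaped (written `"\\|"` in Java source). The header names come from the Teams.txt excerpt; the class name and the command-line file path are assumptions for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class PipeDelimited {
    // Splits one line on '|'. The bar is escaped because split() takes a regex;
    // limit -1 keeps trailing empty fields.
    static String[] fields(String line) {
        return line.split("\\|", -1);
    }

    // Turns the header line plus data lines into one map per row (column name -> value).
    static List<Map<String, String>> parse(List<String> lines) {
        String[] header = fields(lines.get(0));
        List<Map<String, String>> rows = new ArrayList<>();
        for (String line : lines.subList(1, lines.size())) {
            if (line.isBlank()) continue; // skip empty lines
            String[] values = fields(line);
            Map<String, String> row = new LinkedHashMap<>();
            for (int i = 0; i < header.length && i < values.length; i++) {
                row.put(header[i].trim(), values[i].trim());
            }
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) throws IOException {
        // Pass the path of the downloaded Teams.txt as the first argument (assumed usage);
        // without one, a few lines from the excerpt are used instead.
        List<String> lines = args.length > 0
            ? Files.readAllLines(Path.of(args[0]))
            : List.of("team|location|name|leag", "ATL|Atlanta|Hawks|N", "ANA|Anaheim|Amigos|A");
        for (Map<String, String> row : parse(lines)) {
            System.out.println(row.get("team") + " -> " + row.get("location") + " " + row.get("name"));
        }
    }
}
```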

9 Football Data Source
http://www.pro-football-reference.com/

10 Data Download
A copy of the data set (in CSV format) can be downloaded from: http://ai.arizona.edu/hchen/chencourse/SportsData/Pro-football-refernce_CSV.zip
This version contains game data from 1995 to 2006. The data set contains 64,327 players and the games they played in. Tables include:
– Master: information about players;
– Seasons: the players' statistics by season;
– Games: the players' statistics by game.
A detailed description of each data field in each table is available.
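Splitting a CSV line on every comma breaks as soon as a field is quoted and contains a comma (for example, a name written as "Last, First"). The slides do not say whether this data set uses quoted fields, so the following is only a defensive sketch: a single-line splitter that follows the common RFC 4180 quoting rules (fields may be wrapped in double quotes, and a doubled quote inside them stands for one quote). It does not handle line breaks inside quoted fields. The class name and sample record are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class CsvLine {
    // Splits one CSV record into fields, honoring double-quoted fields that may
    // contain commas and doubled quotes (""). Newlines inside quotes are not handled.
    static List<String> split(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        cur.append('"'); // doubled quote -> literal quote
                        i++;
                    } else {
                        inQuotes = false; // closing quote
                    }
                } else {
                    cur.append(c);
                }
            } else if (c == '"') {
                inQuotes = true; // opening quote
            } else if (c == ',') {
                fields.add(cur.toString()); // field boundary
                cur.setLength(0);
            } else {
                cur.append(c);
            }
        }
        fields.add(cur.toString()); // last field
        return fields;
    }

    public static void main(String[] args) {
        // Hypothetical record; the real column layout is in the data set's field descriptions.
        List<String> f = split("p001,\"Smith, John\",QB,1999");
        System.out.println(f.size() + " fields, second = " + f.get(1)); // 4 fields, second = Smith, John
    }
}
```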

11 Database
(Screenshot: Master.csv … …)

12 Some Other Football Data Sources
http://www.databasefootball.com/
– The website contains National Football League (NFL) data from 1922 to 2005 and American Football League (AFL) data from 1960 to 1969 on players, teams, leagues, awards, and coaches.
– The data set cannot be downloaded directly; the data must be extracted from the HTML Web pages with parsing programs.
http://www.jt-sw.com/football/
– The website contains NFL player/coach statistics from 1920 to the present and AFL statistics from 1960 to 1969.
– The data set cannot be downloaded directly; the data must be extracted from the HTML Web pages with parsing programs.
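Since these two sites offer no download, a parsing program has to pull the values out of their HTML tables. The page layouts are not shown on the slide, so the Java sketch below is generic: it uses regular expressions to collect the text of every `<td>`/`<th>` cell, row by row, and strips nested tags such as links. Regular expressions are brittle against real-world HTML, so a dedicated HTML parser is usually more robust. The class name and the sample markup are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HtmlTableCells {
    // One <tr>...</tr> row, and each <td>/<th> cell inside it (case-insensitive, spanning newlines).
    private static final Pattern ROW =
        Pattern.compile("<tr(?:\\s[^>]*)?>(.*?)</tr>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
    private static final Pattern CELL =
        Pattern.compile("<t[dh](?:\\s[^>]*)?>(.*?)</t[dh]>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
    private static final Pattern TAG = Pattern.compile("<[^>]+>");

    // Returns each table row as a list of cell texts with nested tags removed.
    static List<List<String>> rows(String html) {
        List<List<String>> result = new ArrayList<>();
        Matcher row = ROW.matcher(html);
        while (row.find()) {
            List<String> cells = new ArrayList<>();
            Matcher cell = CELL.matcher(row.group(1));
            while (cell.find()) {
                String text = TAG.matcher(cell.group(1)).replaceAll("");
                cells.add(text.replace("&nbsp;", " ").trim());
            }
            if (!cells.isEmpty()) result.add(cells);
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up markup standing in for a statistics page.
        String page = "<table><tr><th>Year</th><th>Team</th></tr>"
                    + "<tr><td>1960</td><td><a href=\"x.htm\">Example</a></td></tr></table>";
        System.out.println(rows(page)); // [[Year, Team], [1960, Example]]
    }
}
```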

13 Olympics Data Source
http://www.databaseolympics.com/

14 Data Format
DatabaseOlympics.com is your source for every Summer and Winter Olympics medal winner:
– Summer Olympics, 1896-2004;
– Winter Olympics, 1924-2002.
You'll find every medal winner for every country, with easy links to each Olympics, sport, and athlete.

15 Data Format

16 Greyhound
http://66.236.122.233:8080/tracklink/

17 Data Format
The data include daily race programs (videos) and odds charts (.txt format) for all US greyhound tracks. Some tracks had both afternoon and evening programs.

18 Data Format
Chart.txt
1st   Grade: B   Distance: 550   Condition: Fast
DOG               WT    P  O  1/8  Str  Fin     Time   Odds   Comment
PTL Jane          63.5  6  3  1    1    1 ns    32.00  11.60  Held At Wire Inside
Silver Speck      68.5  1  1  2    2    2 ns    32.01  2.80   Cutff 1st, Stayd Cls
Jain't It Doug    75    7  7  6    6    3 1.5   32.10  7.50   Closed For Show Outs
Flyer Whitesocks  75.5  8  8  7    3    4 1.5   32.11  2.30   In The Hunt
Flying Detroit    69    5  5  4    4    5 2     32.15  9.00   Not Far Behind Mdtrk
VP Twix Twizala   59.5  3  4  3    5    6 4.5   32.31  4.20   Losing Position Ins
Sergio            73    4  6  5    7    7 5     32.34  13.30  Blocked 1st Turn
Heartattack Jack  71.5  2  2  8    8    8 5.5   32.39  7.10   Bumped 1st Turn
… … … …
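Each chart line packs a multi-word dog name, numeric columns, and a free-text comment into one line. A minimal Java sketch that splits one line into the header's fields, assuming (as in the sample) that the name is everything before the weight, that `Fin` is a position followed by a margin such as `ns` or `1.5`, and that Time and Odds always carry decimals. The field names `p` and `o` simply mirror the header letters, since their meanings are not defined on the slide. The class is illustrative and requires Java 16+ for `record`.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ChartRow {
    // Groups: name, WT, P, O, 1/8, Str, Fin position, Fin margin, Time, Odds, Comment.
    // The name is matched lazily, so it ends just before the first run of numeric columns.
    private static final Pattern ROW = Pattern.compile(
        "^(.+?)\\s+(\\d+(?:\\.\\d+)?)\\s+(\\d+)\\s+(\\d+)\\s+(\\d+)\\s+(\\d+)\\s+(\\d+)\\s+(\\S+)"
        + "\\s+(\\d+\\.\\d+)\\s+(\\d+\\.\\d+)\\s*(.*)$");

    record Entry(String dog, double weight, int p, int o, int eighth, int str,
                 int finish, String margin, double time, double odds, String comment) {}

    static Entry parse(String line) {
        Matcher m = ROW.matcher(line.trim());
        if (!m.matches()) throw new IllegalArgumentException("unrecognized chart row: " + line);
        return new Entry(m.group(1), Double.parseDouble(m.group(2)),
            Integer.parseInt(m.group(3)), Integer.parseInt(m.group(4)),
            Integer.parseInt(m.group(5)), Integer.parseInt(m.group(6)),
            Integer.parseInt(m.group(7)), m.group(8),
            Double.parseDouble(m.group(9)), Double.parseDouble(m.group(10)), m.group(11));
    }

    public static void main(String[] args) {
        Entry e = parse("Silver Speck 68.5 1 1 2 2 2 ns 32.01 2.80 Cutff 1st, Stayd Cls");
        System.out.println(e.dog() + " finished " + e.finish() + " in " + e.time()); // Silver Speck finished 2 in 32.01
    }
}
```

A row that does not fit the pattern raises an exception rather than being silently mis-split, which makes layout changes in the charts easy to spot.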

19 Case Study: AZGreyhound System
By Rob Schumaker

20 AZGreyhound System Design
[Diagram: AZGreyhound System DB (race data, odds data, greyhound data); AZGreyhound model building (training/testing, prediction); metrics (accuracy, payout, efficiency); traditional betting engine with straight bets (win, place, show, exacta, trifecta, superfecta) and box bets (quiniela, trifecta, superfecta)]
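The design lists the bet types without their rules. The sketch below encodes the standard U.S. pari-mutuel definitions, which may differ in detail from the AZGreyhound betting engine: win, place, and show pay if the dog finishes first, in the top two, or in the top three; exacta, trifecta, and superfecta require the first two, three, or four finishers in exact order; a box accepts those finishers in any order, and a quiniela likewise pays on the first two in either order. It only decides whether a ticket wins; actual payouts come from the betting pools and are not modeled. The class and method names are hypothetical.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class BetSettlement {
    // finish lists the dogs in finishing order (index 0 = winner).

    // Straight bets on one dog: win = 1st, place = top 2, show = top 3.
    static boolean win(List<String> finish, String dog)   { return finish.indexOf(dog) == 0; }
    static boolean place(List<String> finish, String dog) { int i = finish.indexOf(dog); return i >= 0 && i < 2; }
    static boolean show(List<String> finish, String dog)  { int i = finish.indexOf(dog); return i >= 0 && i < 3; }

    // Straight exotic bets: exacta (2 picks), trifecta (3), superfecta (4) must match the exact order.
    static boolean straight(List<String> finish, List<String> picks) {
        return finish.size() >= picks.size() && finish.subList(0, picks.size()).equals(picks);
    }

    // Box bets (and quiniela with k = 2): the top k finishers, in any order, must all be among the picks.
    static boolean box(List<String> finish, Set<String> picks, int k) {
        return finish.size() >= k && picks.containsAll(finish.subList(0, k));
    }

    // Ordered combinations a box of n dogs covers for a k-place bet: n!/(n-k)! (0 when n < k).
    static long boxCombinations(int n, int k) {
        long c = 1;
        for (int i = 0; i < k; i++) c *= (n - i);
        return c;
    }

    public static void main(String[] args) {
        // First four finishers from the Chart.txt sample.
        List<String> finish = List.of("PTL Jane", "Silver Speck", "Jain't It Doug", "Flyer Whitesocks");
        System.out.println(straight(finish, List.of("PTL Jane", "Silver Speck")));                        // true
        System.out.println(box(finish, new HashSet<>(List.of("Silver Speck", "PTL Jane", "Sergio")), 3)); // false
        System.out.println(boxCombinations(4, 3));                                                         // 24
    }
}
```

The combination count is what makes box bets expensive: a four-dog trifecta box covers 4 × 3 × 2 = 24 ordered tickets.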

21 Greyhound Data Extraction
Greyhound data were gathered from www.trackinfo.com. The Web site links to:
– GreyMatter: http://66.236.122.233:8080/tracklink/
– TrackInfo: http://www.trackinfo.com/index2.html
The race and odds data were parsed into a SQL Server database, and the data were then sent to the AZGreyhound system for prediction.
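The example code on slide 22 replaces apostrophes in the scraped text before inserting it, presumably so they cannot break a quoted SQL string. A JDBC `PreparedStatement` binds values as parameters, so no such escaping is needed. In this sketch the table and column names are hypothetical (the slides do not show the SQL Server schema), and the `Connection` is assumed to come from a SQL Server JDBC driver whose connection URL is not given here.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

class ProgramStore {
    // Hypothetical table and column names; the real schema is not shown in the slides.
    static final String INSERT_SQL =
        "INSERT INTO RacePrograms (FileName, FileDate, FileContents) VALUES (?, ?, ?)";

    // Binds the values as parameters, so apostrophes in the scraped text are stored as-is.
    // try-with-resources closes the statement even if executeUpdate throws.
    static int insertProgram(Connection conn, String fileName, String fileDate, String contents)
            throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(INSERT_SQL)) {
            ps.setString(1, fileName);
            ps.setString(2, fileDate);
            ps.setString(3, contents);
            return ps.executeUpdate(); // number of rows inserted
        }
    }
}
```

Preparing the statement once and reusing it inside a loop over many files would also avoid re-parsing the SQL for every insert.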

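22 Example code

    public void RacePrograms() throws Exception {
        // ...
        String URL1 = "http://www.trackinfo.com/trakdocs/hound/";
        String URL2 = "/Rpages";
        // ...
        // web, db, rSet, OpenConnection2() and CloseConnection2() are helpers not shown on the slide
        OpenConnection2();
        try {
            // ...... (elided on the slide; the brace after the for loop closes this elided block)
                TrackAbbrev = rSet.getString("TrackAbbrev");
                // URL: this track's race-program listing
                String URL = URL1 + TrackAbbrev + URL2;
                // Data parsing: download the listing page
                Feed = web.Scraper(URL, 1);
                // ...
                // One entry per "~icons/html.gif" marker in the listing
                NumItems = web.NumItems(Feed, "~icons/html.gif");
                for (int y = 1; y <= NumItems; y++) {
                    // Parse out each data field: advance to the next entry, then read its file name and date
                    Feed = Feed.substring(Feed.indexOf("~icons/html.gif"));
                    FileName = web.ExtractText(Feed, " ");
                    Feed = Feed.substring(Feed.indexOf("<A HREF="));
                    FileDate = web.ExtractText(Feed, "NOWRAP>", " ");
                    // Download the program file itself
                    FileContents = web.Scraper(URL + "/" + FileName, 1);
                    // Presumably keeps apostrophes from breaking a quoted SQL string
                    FileContents = FileContents.replaceAll("'", "-");
                    // Insert into DB
                    db.Insert2DBProgram(FileName, FileDate, FileContents);
                }
            }
        } catch (SQLException e) {
            System.out.println(e);
        } finally {
            // Close the connection even when a SQL error occurs
            CloseConnection2();
        }
    }

This method picks up the overall race information and puts it in the database.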
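The helpers `web.NumItems` and `web.ExtractText` are not shown on the slide. Judging only from how they are called, one plausible reading is "count the occurrences of a marker" and "return the text between two delimiters"; the Java sketch below implements that reading as hypothetical stand-ins, not the author's code. Only the two-delimiter form of `ExtractText` is covered, and the sample HTML in `main` is made up.

```java
class TextExtract {
    // Hypothetical stand-in for web.NumItems: counts non-overlapping occurrences of marker in text.
    static int numItems(String text, String marker) {
        if (marker.isEmpty()) return 0; // guard against an endless loop
        int count = 0;
        for (int i = text.indexOf(marker); i >= 0; i = text.indexOf(marker, i + marker.length())) {
            count++;
        }
        return count;
    }

    // Hypothetical stand-in for the two-delimiter web.ExtractText: returns the text between the
    // first occurrence of start and the next occurrence of end, or "" if either is missing.
    static String extractText(String text, String start, String end) {
        int s = text.indexOf(start);
        if (s < 0) return "";
        s += start.length();
        int e = text.indexOf(end, s);
        return e < 0 ? "" : text.substring(s, e);
    }

    public static void main(String[] args) {
        // Made-up listing markup for illustration.
        String feed = "<IMG SRC=\"/~icons/html.gif\"> <A HREF=\"r1.htm\">r1.htm</A> <TD NOWRAP>2008-02-06 12:00</TD>";
        System.out.println(numItems(feed, "~icons/html.gif"));  // 1
        System.out.println(extractText(feed, "NOWRAP>", " "));  // 2008-02-06
    }
}
```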

23 You can use the sports data sources introduced in this set of slides for your data mining project. You are strongly encouraged to identify other interesting public sports data sets for your project. Thanks!

