Data Surfing on the World Wide Web – Part 2

1 Data Surfing on the World Wide Web – Part 2
Robin Lock, Burry Professor of Statistics, St. Lawrence University
2016 Joint Statistical Meetings – Seattle, WA, August 2016

2 Datasurfing on the World Wide Web Part 1
it.stlawu.edu/~rlock/data96
Department of Math, CS & Stat
1996 JSM - Chicago
Do any of the 1996 links still work?

3 New location hosted by ASA
New location hosted by DataDesk
Not updated
Now at causeweb.org/wiki/chance
New address

4 Yes but not updated
Yes but not updated
New address – mostly NIST Reference Datasets

5

6 Time for some Updating... myslu.stlawu.edu/~rlock/data2016.html

7 Categories of Data Sources
Data Archives with Teaching Support Webpages with Data Links Government Sources R Packages Data from Visualizations More Data for Countries Survey/Study Repositories Fun and Games Data Scraping

8 Data Archives with Teaching Support
JSE Data Archive: More than 100 datasets, most with accompanying JSE Dataset articles on use in the classroom.
DASL – Data and Story Library: An established collection of datasets with accompanying stories, searchable by statistical method or data subject. Resurrected at DataDesk by Paul Velleman.
ICPSR Data-Driven Learning Guides: Topics linked to datasets from political and social research surveys.
TSHS Resources Portal: A new initiative by the ASA’s Teaching Statistics in the Health Sciences Section.

9 Jenny Baglivo (through the ASA/MAA Joint Committee) has quick summaries of some favorite JSE datasets at

10

11

12 TSHS Resources Portal

13

14 Webpages with Data Links
Winner's Data Links: Lots of links (data and documentation) maintained by Larry Winner at Univ. of Florida, organized by statistical technique.
Kuiper's Sources of Data: Links to data sources.
Awesome Public Datasets: A very large list of links to public data, organized by subject area (Sammy Chen). May take some digging to get to the actual data.
33 Brilliant And Free Data Sources For 2016: Article by Bernard Marr in Forbes.

15

16

17

18

19

20 Government Data Sources
Data.gov: Links to LOTS of datasets generated by U.S. government agencies.
Canada Open Data Portal: Similar site for Canadian data sources. Similar sites for lots of countries…
OpenDataSoft List: Open data portals from around the world, organized by country.

21

22

23 122 variables for 7800 colleges and universities

24

25

26

27

28 R Packages
Many R packages come with built-in datasets, and some are mostly datasets.
mosaicData: organized by Pruim, Kaplan, and Horton for the mosaic package.
Lock5Data: datasets from “Statistics: Unlocking the Power of Data” by Lock5, also at lock5stat.com.
Stat2Data: datasets from the “Stat2” text by Cannon, et al.

29 Some Mosaic Datasets
But you need R to access the data in these packages…

30 RDatasets
An R script developed by Vincent Arel-Bundock that can be used to create an HTML page with links to .csv files and documentation for the datasets in a list of installed R packages.
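
The idea of an index page that links each dataset to a .csv file and a documentation page can be sketched in a few lines. This is a Python illustration only, not Arel-Bundock's R script: the `build_index` function, the `csv/<package>/<name>.csv` and `doc/<package>/<name>.html` paths, and the three example datasets are assumptions chosen for the sketch.

```python
from html import escape


def build_index(datasets):
    """Return an HTML page with one table row per (package, dataset) pair.

    Each row links to a CSV file and a documentation page. The relative
    paths used here are an assumed layout, not a documented one.
    """
    header = (
        "<tr><th>Package</th><th>Dataset</th>"
        "<th>Data</th><th>Documentation</th></tr>"
    )
    rows = []
    for package, name in sorted(datasets):
        csv_href = f"csv/{package}/{name}.csv"   # assumed path layout
        doc_href = f"doc/{package}/{name}.html"  # assumed path layout
        rows.append(
            "<tr>"
            f"<td>{escape(package)}</td>"
            f"<td>{escape(name)}</td>"
            f'<td><a href="{escape(csv_href)}">CSV</a></td>'
            f'<td><a href="{escape(doc_href)}">DOC</a></td>'
            "</tr>"
        )
    body = "\n".join([header] + rows)
    return f"<html><body><table>\n{body}\n</table></body></html>"


# Illustrative (package, dataset) pairs; swap in whatever is installed locally.
EXAMPLE = [
    ("mosaicData", "KidsFeet"),
    ("Lock5Data", "SleepStudy"),
    ("mosaicData", "Galton"),
]

page = build_index(EXAMPLE)
print(page)
```

Once the data sit behind plain .csv links, students can open them in any software, which addresses the "you need R" limitation noted on the previous slide.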

31 758 datasets

32 Data from Online Visualizations

33 529 variables over time

34 More Country Data
World Bank Open Data: Search by individual countries, general categories, or specific indicators.
CIA World Factbook: Lots of data on individual countries, but hard to get good downloadable versions.

35 Survey/Study Repositories
ICPSR: Inter-university Consortium for Political and Social Research
Data Dryad: Seeks to promote the availability of data underlying findings in the scientific literature for research and educational reuse.

36

37

38

39

40

41 Fun & Games
Sports:
Baseball-reference.com (MLB)
Basketball-reference.com (NBA)
Pro-football-reference.com (NFL)
Hockey-reference.com (NHL)
Or get to any of these, plus college basketball, college football, and the Olympics, at Sports-reference.com

42

43

44 Fun & Games Shonda Kuiper’s Stat2Labs Games

45

46

47 Web Scraping
Useful R packages: rvest and httr (search for vignettes/tutorials)
Create tools for intro students to easily obtain data (e.g. with a shiny app)
Project source for more advanced students
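
As a rough illustration of what a scraping tool does, the sketch below pulls the rows of an HTML table into records. The slide recommends rvest and httr in R; this is a Python stand-in using only the standard library, and it parses an inline HTML string instead of fetching a live page so that it runs offline. The `TableScraper` class, the `scrape_table` helper, and the sample table values are all invented for the example.

```python
from html.parser import HTMLParser


class TableScraper(HTMLParser):
    """Collect the text of every <th>/<td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # completed rows, each a list of cell strings
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def scrape_table(html):
    """Return the table as a list of dicts keyed by the header row."""
    parser = TableScraper()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return []
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


# Made-up sample page; a real project would download the HTML first.
SAMPLE_HTML = """
<table>
  <tr><th>episode</th><th>rating</th></tr>
  <tr><td>1</td><td>7.5</td></tr>
  <tr><td>2</td><td>8.0</td></tr>
</table>
"""

records = scrape_table(SAMPLE_HTML)
print(records)
```

Wrapping a function like `scrape_table` behind a simple interface is one way to give intro students clean data without exposing the parsing details, while the parser itself is the kind of code more advanced students can write as a project.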

48 shiny.stlawu.edu:3838/sample-apps/imdb_tv/ (created by Ivan Ramler and Tenzin Choeyang)

49 Questions? See you in 2036 for Part 3!
myslu.stlawu.edu/~rlock/data2016.html

