The Commute: The Battle of Finding Distance

1 The Commute: The Battle of Finding Distance
Noah Pollock

2 Purpose
1. Provide the tools and expertise to determine commuting distances and times for a population of students using Google Maps. How far are students from campus?
2. Showcase the power of APIs and R.
Why distance?
Descriptives: time spent commuting; distance commuted; $ spent on gas; air pollution.
Analytic models: retention; student success; admissions (likelihood of enrollment); outreach effectiveness; gift giving (likelihood of donation).
Miscellaneous: reduce return-to-sender errors!

3 How Do We Find Distance?

4 Google Maps: The API
API: Application Programming Interface
A set of methods, tools, and protocols that facilitate communication between two or more pieces of software (including web services).
Google Maps GUI and API
Whereas the GUI involves a user interacting directly with the software, the API pushes the user's interaction back a step. That is, the API allows the user to interact with the software through an intermediary application. In this case, we will use the statistics package R to load location data and send that data to Google Maps in bulk.
APIs were designed for software and app developers, but they can be extremely useful to data scientists!
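To make the "intermediary application" idea concrete, here is a minimal base R sketch of the kind of request such an application might assemble on the user's behalf. It assumes the Google Distance Matrix web service endpoint and its origins, destinations, and key parameters. build_distance_url is a hypothetical helper name and YOUR_API_KEY is a placeholder. The sketch only builds the URL and sends nothing.

```r
# Hypothetical helper: build (but do not send) a distance request URL.
# Assumes the Distance Matrix endpoint and its origins/destinations/key parameters.
build_distance_url <- function(origin, destination, key = "YOUR_API_KEY") {
  base <- "https://maps.googleapis.com/maps/api/distancematrix/json"
  paste0(
    base,
    "?origins=", URLencode(origin, reserved = TRUE),            # percent-encode spaces, commas, etc.
    "&destinations=", URLencode(destination, reserved = TRUE),
    "&key=", key
  )
}

url <- build_distance_url("Rochester, MI", "Oakland University, Rochester, MI 48309")
print(url)
```

A package such as ggmap hides this step: the user supplies addresses, and the package builds the request and parses the response.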

5 Use the R ‘ggmap’ Package to Connect to Google Maps
R: “R is a free software environment for statistical computing and graphics.”
- Open source: users are free to use, change, and add new functionality to R.
- The GUI is either limited or non-existent. Nearly all processes are executed at the command line.
- Derived from the “S” language.
R Package: “ggmap”
“A collection of functions to visualize spatial data and models on top of static maps from various online sources (e.g Google Maps and Stamen Maps). It includes tools common to those tasks, including functions for geolocation and routing.”
D. Kahle and H. Wickham. ggmap: Spatial Visualization with ggplot2. The R Journal, 5(1), URL

6 Let’s Get Distance!

7 Using R to Query the Google Maps API: A Basic Example
1. Download and install R, then open it! Consider also installing RStudio.
2. Install the ggmap package, load the package, and check how many queries you have left today (limit of 2,500 a day, 100 elements per query, and 100 elements per 10 seconds).
install.packages('ggmap')
library(ggmap)
distQueryCheck()
3. Load the address data file. Use “/” instead of “\” in the file path. If your data has variable names in the first row, use header=TRUE; otherwise use header=FALSE. You can also use file.choose() instead of a file path. This example assumes a header row that includes a column named “address”.
datafile <- read.csv('C:/Users/username/Documents/datafile.csv', header=TRUE)
4. Create the character vector of addresses that you will send to Google Maps. Note that “address” represents the header for the column that contains the address data. If your file does not have headers, R will assign default names (V1, V2, and so on). The function names(datafile) will tell you your variable names.
location_data <- as.vector(datafile$address)
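The effect of the header argument in step 3 can be checked offline. The following sketch writes a small temporary CSV and reads it both ways. The addresses and the "id" column are made up for illustration.

```r
# Write a tiny example file so the sketch is self-contained (made-up addresses).
tmp <- tempfile(fileext = ".csv")
writeLines(c("id,address",
             "1,\"123 Example St, Rochester, MI 48309\"",
             "2,\"456 Sample Ave, Troy, MI 48083\""), tmp)

# Read the same file with and without treating the first row as headers.
with_header <- read.csv(tmp, header = TRUE, stringsAsFactors = FALSE)
no_header   <- read.csv(tmp, header = FALSE, stringsAsFactors = FALSE)

names(with_header)  # "id" "address"
names(no_header)    # "V1" "V2"

# Only the header = TRUE version has a column called "address".
location_data <- as.vector(with_header$address)
```

With header=FALSE the address column would have to be referenced as no_header$V2, and the header row itself would be read as data.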

8 Using R to Query the Google Maps API: continued…
5. Create the character vectors containing your student addresses and your institution’s address, or wherever you would like to calculate distance to. This example only pulls in the first 100 addresses.
from <- location_data[1:100]
to <- "Oakland University, 2200 N. Squirrel Rd. Rochester, MI 48309"
6. Query the Google Maps API.
Google_Maps_Query <- mapdist(from, to)
7. Save the data to a file.
save_filepath <- 'C:/Users/username/Documents/datafile with distance data.csv'
write.csv(Google_Maps_Query, save_filepath)
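The example above sends only the first 100 addresses. For a longer list, one possible approach is to split the addresses into batches and pause between them, as sketched below. batch_addresses, query_in_batches, and fake_query are hypothetical names. The batch size of 100 and the 10-second pause simply mirror the limits quoted on the previous slide, which may differ for your account. The demonstration uses a stand-in query function, so it makes no API calls.

```r
# Split a character vector of addresses into batches of at most `size`.
batch_addresses <- function(addresses, size = 100) {
  split(addresses, ceiling(seq_along(addresses) / size))
}

# Run a query function over each batch and stack the results.
# `query_fn` stands in for something like function(b) mapdist(b, to);
# `pause` is the number of seconds to wait between batches.
query_in_batches <- function(addresses, query_fn, size = 100, pause = 10) {
  batches <- batch_addresses(addresses, size)
  results <- vector("list", length(batches))
  for (i in seq_along(batches)) {
    results[[i]] <- query_fn(batches[[i]])
    if (i < length(batches)) Sys.sleep(pause)  # no wait after the last batch
  }
  do.call(rbind, results)
}

# Offline demonstration with a stand-in query function (no API calls made).
fake_query <- function(b) data.frame(from = b, km = NA_real_, stringsAsFactors = FALSE)
demo <- query_in_batches(paste("address", 1:250), fake_query, pause = 0)
nrow(demo)  # 250
```

In real use, query_fn would wrap the mapdist call from step 6, and the total number of addresses would still need to stay within the daily limit.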

9 A Few Insights
Balance accuracy with quantity:
- Trimming to the five-digit ZIP Code might cover an entire file of students, but a ZIP Code can sometimes cover hundreds of square miles of an irregular polygonal region.
- Full street addresses are accurate, but will quickly exceed Google Maps’ daily limit.
Consider excluding unrealistic locations:
- I doubt any students commute from Alaska or California.
- Strings shorter than five characters will likely yield NA or unreliable results.
Consider excluding addresses of students who live on campus.
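Some of these exclusions can be applied before any queries are sent. The sketch below is one possible pre-filter in base R. filter_addresses is a hypothetical helper, the state codes and sample addresses are illustrative, and matching a state by its two-letter code is a rough heuristic that will not suit every address format. It also drops duplicates, which is an added assumption intended to save queries.

```r
# Hypothetical filter: keep addresses worth sending to the API.
# `addresses` is a character vector; `exclude_states` holds two-letter state
# codes to drop when they appear as a separate word (e.g. ", AK 99501").
filter_addresses <- function(addresses, exclude_states = c("AK", "CA"), min_chars = 5) {
  addresses <- trimws(addresses)
  # Drop missing values and strings too short to be a usable address.
  keep <- !is.na(addresses) & nchar(addresses) >= min_chars
  if (length(exclude_states) > 0) {
    pattern <- paste0("\\b(", paste(exclude_states, collapse = "|"), ")\\b")
    keep <- keep & !grepl(pattern, addresses, perl = TRUE)
  }
  unique(addresses[keep])  # duplicates would only cost extra queries
}

sample_addresses <- c("123 Example St, Rochester, MI 48309",
                      "48",
                      "1 Glacier Rd, Anchorage, AK 99501",
                      NA,
                      "123 Example St, Rochester, MI 48309",
                      "9 Canton Ct, Troy, MI 48083")
kept <- filter_addresses(sample_addresses)
print(kept)
```

Excluding on-campus students would need an extra column in the data (for example a housing flag), which this sketch does not assume.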

10 APIs, Resources, and References
Google Maps API:
Python: Use Python instead of R.
U.S. Census Bureau API
U.S. Department of Education API
Various APIs
R
RStudio
ggmap: David Kahle and Hadley Wickham
D. Kahle and H. Wickham. ggmap: Spatial Visualization with ggplot2. The R Journal, 5(1), URL

