Published by Howard Hodge. Modified over 9 years ago.
1
Analyzing open gov data with IBM Bluemix
Carlos Hoyos
2
Objective
Show how to use Bluemix to import large open gov data sets and prepare them for analysis or for use in applications.
Bluemix is a simple platform that allows anyone to run applications with little coding or deployment effort. Data scientists and developers can use it to ingest and understand open source data.
3
Scenario
Analyze parking violations data from NYC.
Import it into BigInsights.
Massage it using BigSheets (a big data spreadsheet).
Visualize it on a map.
Create a heat map to find the “hottest spots” for parking violations.
End goal: give users a way to query how likely they are to get a parking violation at a specific location.
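The heat map idea can be sketched outside BigSheets as simple grid binning: each ticket location is assigned to a cell, and the cell with the highest count is the "hottest spot". This is only an illustrative sketch, not what BigSheets does internally. It assumes each ticket has already been geocoded to a latitude/longitude pair; the function name heat_map_cells, the cell size, and the sample coordinates are all hypothetical.

```python
from collections import Counter


def heat_map_cells(points, cell_size=0.01):
    """Count points per grid cell; each cell is cell_size degrees on a side."""
    counts = Counter()
    for lat, lon in points:
        # Floor division maps a coordinate to the index of its grid cell.
        cell = (int(lat // cell_size), int(lon // cell_size))
        counts[cell] += 1
    return counts


# Hypothetical coordinates, made up for illustration (not taken from the dataset).
sample_points = [
    (40.7512, -73.9934),
    (40.7518, -73.9931),
    (40.7733, -73.9566),
]

cells = heat_map_cells(sample_points)
hottest_cell, hottest_count = cells.most_common(1)[0]
print(hottest_cell, hottest_count)
```

A lookup for a specific location would compute the cell for that location the same way and read its count from the result.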
4
Scenario (cont.) Correlate parking tickets with parking regulations. Are there streets that have meters but are not frequented by ticketing agents?
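One way to picture this correlation is as a set difference: streets that appear in the regulations data as metered, minus streets that appear in the ticket data. The sketch below assumes both data sets can be reduced to lists of street names; the function name and the sample street names are hypothetical and chosen only for illustration.

```python
def normalize_street(name):
    """Upper-case a street name and collapse repeated whitespace."""
    return " ".join(name.upper().split())


def unticketed_metered_streets(metered_streets, ticketed_streets):
    """Return metered streets that never appear in the ticket data."""
    metered = {normalize_street(s) for s in metered_streets}
    ticketed = {normalize_street(s) for s in ticketed_streets}
    return sorted(metered - ticketed)


# Hypothetical street lists, made up for illustration.
metered = ["W 34th St", "E 72nd St", "Broadway"]
ticketed = ["w 34th  st", "BROADWAY"]

print(unticketed_metered_streets(metered, ticketed))
```

Real street names would likely need more careful normalization (abbreviations, directional prefixes) before the two sets can be compared reliably.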
5
Step 1 – Prepare the data
6
1. Find data sources in data.gov
2. NYC parking violations are here: http://catalog.data.gov/dataset/parking-violations-issued-fiscal-year-2014-august-2013-june-2014-c1a76
3. Download file
7
Step 2 – Deploy Analytics for Hadoop and import data
8
Getting started
Register for a free Bluemix account at ibm.biz/Datafest
9
Deploy IBM Analytics for Hadoop
Under Catalog, search for “Analytics” and deploy the IBM Analytics for Hadoop package.
10
Deploy IBM Analytics for Hadoop
Once deployed, you can launch the service from your dashboard.
11
Upload data to be analyzed
Once the service launches, go to the Files section and create a folder named “imports” under “user”.
1- Under “user”, create a new folder
2- Call it “imports”
12
Upload file
Upload the file you downloaded in step 1.
1- Select upload file
2- Select the file from your local machine
13
Create a new BigSheets workbook (i)
1- Select BigSheets > new workbook
2- Name it “ticket data” and select the file you uploaded
14
Create a new BigSheets workbook (ii)
1- Select BigSheets > new workbook
2- Name it “ticket data” and select the file you uploaded
15
Use a CSV importer
1- Select reader > CSV
2- Select “headers” since the file has them
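To see what the “headers” option means, here is a small sketch using Python's standard csv module rather than BigSheets: the first row is treated as column names instead of data. The column names and the two sample rows are assumptions made up for illustration; check the downloaded file for the actual header.

```python
import csv
import io

# A tiny in-memory stand-in for the downloaded file (hypothetical contents).
sample_csv = io.StringIO(
    "Summons Number,Violation Precinct,Street Name\n"
    "1001,14,W 34th St\n"
    "1002,19,E 72nd St\n"
)

# DictReader consumes the first row as the header and maps each
# following row to a dict keyed by those column names.
reader = csv.DictReader(sample_csv)
rows = list(reader)
column_names = reader.fieldnames

print(column_names)
print(rows[0]["Violation Precinct"])
```

Without header handling, the first row would be read as an ordinary data record and would show up as a ticket in later counts.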
16
Let's explore the data
First, let's find out which precincts have the most tickets.
Select “add new chart”.
Create a big data heat map.
17
Create a chart
1- Select the column for the X axis (violations by precinct)
2- Select count occurrences of the X axis
3- Run the chart
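The “count occurrences” step amounts to a group-and-count over the precinct column. A minimal sketch of the same computation in Python follows; the column name “Violation Precinct” and the sample rows are assumptions for illustration, not values from the real file.

```python
from collections import Counter


def count_by_precinct(rows, column="Violation Precinct"):
    """Count how many rows (tickets) fall in each precinct."""
    return Counter(row[column] for row in rows)


# Hypothetical rows, as they might look after parsing the CSV.
sample_rows = [
    {"Violation Precinct": "14"},
    {"Violation Precinct": "19"},
    {"Violation Precinct": "14"},
    {"Violation Precinct": "1"},
    {"Violation Precinct": "19"},
    {"Violation Precinct": "14"},
]

precinct_counts = count_by_precinct(sample_rows)
top_two = precinct_counts.most_common(2)
print(top_two)
```

The chart in BigSheets plots these per-precinct counts, with the precinct on the X axis.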
18
Visualize the data
When it comes to parking tickets, precincts 14 and 19 seem to be the toughest ones.