Data Science for Tackling the Challenges of Big Data

1 Data Science for Tackling the Challenges of Big Data
Dr. Brand Niemann Director and Senior Data Scientist/Data Journalist Semantic Community November 14, 2014

2 Overview Six Week MIT Online Course:
Started November 4th and completed November 12th.
Mined this MIT online course for data sets and ideas: found the subset of slides that contained data sets and ideas and that were also interesting and useful visualizations in themselves.
Professor Karger's lecture slides on visualization user interfaces were all about my heroes: Tukey, Tufte, Shneiderman, and Spotfire. (In fact, they covered everything leading up to Spotfire, but not Spotfire itself!)
Preserve my work and present a tutorial to the Federal Big Data Working Group Meetup: MindTouch knowledge base, Excel spreadsheet index, and Spotfire interactive visualizations.

3 MITProfessionalX 6.BDx Tackling the Challenges of Big Data: Course Assessment
Web Site (private)

4 MITProfessionalX 6.BDx Tackling the Challenges of Big Data: Course Progress

5 MITProfessionalX 6.BDx Tackling the Challenges of Big Data: Big Data Storage
Web Site (private)

6 MITProfessionalX 6.BDx Tackling the Challenges of Big Data: Modern Databases
Web Site (private) and Script (public)

7 Courseware: Big Data Storage
I was especially interested in the following, since both Professors Stonebraker and Madden presented to our Federal Big Data Working Group Meetup:
"This module begins with an overview of a number of these technologies by renowned database professor Mike Stonebraker. In his unique and ardent fashion, Mike expresses his skepticism about many new technologies, particularly Hadoop/MapReduce and NoSQL, and voices support for many new relational technologies, including column stores and main memory databases. After that, Professors Matei Zaharia and Samuel Madden provide a more nuanced view of the tradeoffs between the various approaches, discussing Hadoop and its derivatives, as well as NoSQL and its tradeoffs, in more detail. Professor Stonebraker expresses a number of strong opinions in this module. Which of them do you agree with? Which do you disagree with? Why?"
3.0 Introduction to Big Data Storage and Discussion 3
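One reason often given for column stores is data layout: an analytic query that aggregates one column should not have to read every other column. As a minimal sketch (not from the course materials), the Python below stores the same hypothetical taxi-trip table row-wise and column-wise and counts how many field values a single-column sum touches in each layout. The table, its column names, and the "values read" measure are illustrative assumptions.

```python
# Minimal sketch: the same hypothetical table stored row-wise and column-wise.
# Column names and values are illustrative, not taken from the course.

COLUMNS = ("trip_id", "driver_id", "fare", "distance_km")
rows = [
    (1, "D01", 12.5, 4.2),
    (2, "D02", 8.0, 2.1),
    (3, "D01", 20.0, 9.8),
    (4, "D03", 15.5, 6.0),
]


def to_column_store(rows, columns):
    """Pivot a list of row tuples into one list per column."""
    return {name: [row[i] for row in rows] for i, name in enumerate(columns)}


def total_fare_row_store(rows):
    """Sum fares from a row store; every field of every row is fetched."""
    values_read = 0
    total = 0.0
    for row in rows:
        values_read += len(row)  # a row store reads whole rows
        total += row[2]          # fare is the third field
    return total, values_read


def total_fare_column_store(store):
    """Sum fares from a column store; only the fare column is fetched."""
    fares = store["fare"]
    return sum(fares), len(fares)


store = to_column_store(rows, COLUMNS)
print(total_fare_row_store(rows))
print(total_fare_column_store(store))
```

Both layouts give the same total, but the row store touches all 16 field values while the column store touches only the 4 fares. The gap grows with table width; this toy model deliberately ignores compression, disk blocks, and indexes, which also matter in real systems.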

8 Selected Slides: Professor Sam Madden
What Is This Course Going to Cover?
Other Techniques We'll Cover

9 Selected Slides: Professor David Karger
Overview
Interaction Strategy

10 Selected Slides: Professor Daniela Rus
Case Study: Transportation in Singapore
1.1 Case Study: Transportation - PDF of Presentation slides (Rus)

11 Google Search: Singapore Taxi Data

12 Think Business: Why can’t I find a taxi when I really need one?
Based on: Labor Supply Decisions of Singaporean Cab Drivers, May 8, 2013
Newer paper: Labor Supply Decisions of Singaporean Cab Drivers, September 2014

13 Labor Supply Decisions of Singaporean Cab Drivers: Table 1: Summary Statistics by Days
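The transcript does not reproduce the columns of Table 1, so the following is only a sketch of how a per-day summary table of this kind can be computed from shift-level records. The record fields (day of week, hours worked, earnings) and all values are hypothetical, and the statistics chosen (count, mean, sample standard deviation) are assumptions rather than the paper's exact columns.

```python
import statistics
from collections import defaultdict

# Hypothetical shift records: (day_of_week, hours_worked, earnings_sgd).
shifts = [
    ("Mon", 9.0, 150.0),
    ("Mon", 11.0, 190.0),
    ("Tue", 10.0, 170.0),
    ("Tue", 8.0, 130.0),
    ("Tue", 12.0, 210.0),
]


def summarize_by_day(records):
    """Return {day: (n, mean_hours, sd_hours, mean_earnings)}."""
    groups = defaultdict(list)
    for day, hours, earnings in records:
        groups[day].append((hours, earnings))

    summary = {}
    for day, items in groups.items():
        hours = [h for h, _ in items]
        earnings = [e for _, e in items]
        summary[day] = (
            len(items),
            statistics.mean(hours),
            # Sample standard deviation needs at least two observations.
            statistics.stdev(hours) if len(hours) > 1 else 0.0,
            statistics.mean(earnings),
        )
    return summary


summary = summarize_by_day(shifts)
for day, (n, mean_h, sd_h, mean_e) in sorted(summary.items()):
    print(f"{day}: n={n}, mean hours={mean_h:.1f}, sd={sd_h:.2f}, mean earnings={mean_e:.1f}")
```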

14 MIT Big Data Knowledge Base: Table 1 Spreadsheet
My Note: The source was an image PDF, so I had to build the table by hand!
Spreadsheet

15 Singapore Land Transport Authority: Traffic Info Service Providers

16 Singapore Land Transport Authority: MyTransport.sg
Screen Scrape
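A screen scrape of an HTML table can be sketched with Python's standard-library html.parser module. The page fragment and dataset names below are hypothetical stand-ins; the real MyTransport.sg markup is not shown in the slides, so the tag handling would need adapting to it.

```python
from html.parser import HTMLParser


class TableScraper(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # row currently being built
        self._cell = None  # text fragments of the current cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        # Only keep text that sits inside a table cell.
        if self._cell is not None:
            self._cell.append(data)


# Hypothetical page fragment; a real page will differ.
html = """
<table>
  <tr><th>Dataset</th><th>Format</th></tr>
  <tr><td>Taxi Availability</td><td>API</td></tr>
  <tr><td>Bus Stops</td><td>CSV</td></tr>
</table>
"""

scraper = TableScraper()
scraper.feed(html)
for row in scraper.rows:
    print(row)
```

The resulting list of rows can then be written out with the csv module to seed a spreadsheet like the one on the next slide.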

17 Singapore Land Transport Authority: All Datasets Spreadsheet

18 MIT Big Data Knowledge Base: MindTouch
Labor Supply Decisions of Singaporean Cab Drivers, September 2014, as a Data Science Data Publication
Data Science for Tackling the Challenges of Big Data

19 MIT Big Data: Knowledge Base Spreadsheet

20 MIT Big Data: Course Participant Spreadsheet
My Note: This was mapped in Spotfire after data curation (cleaning the country names). Spotfire has built-in data curation functions.
Spreadsheet
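Spotfire's own curation functions are not shown here. As a standalone analogue, this Python sketch normalizes free-text country names through a hypothetical alias table before counting participants per country, which is the kind of cleaning a map needs before it can aggregate correctly. The alias entries and sample names are illustrative assumptions.

```python
from collections import Counter

# Hypothetical alias table: raw spellings (lower-cased, whitespace collapsed)
# mapped to one canonical country name. Extend it as new variants appear.
ALIASES = {
    "usa": "United States",
    "us": "United States",
    "u.s.a.": "United States",
    "united states of america": "United States",
    "uk": "United Kingdom",
    "great britain": "United Kingdom",
}


def clean_country(raw):
    """Collapse whitespace, look up known aliases, else fall back to title case."""
    collapsed = " ".join(raw.split())
    return ALIASES.get(collapsed.lower(), collapsed.title())


raw_countries = ["USA", " united states of America", "U.S.A.", "uk", "India ", "brazil"]
counts = Counter(clean_country(c) for c in raw_countries)
print(counts.most_common())
```

Unmatched names fall back to simple title-casing, which is crude (for example, it capitalizes the letter after an apostrophe), so in practice the alias table keeps growing as new variants turn up.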

21 MIT Big Data: Spotfire Cover Page
Web Player

22 MIT Big Data: Student Enrollment
Web Player

23 MIT Big Data: Singaporean Cab Drivers
Web Player

24 New York City Open Data: Socrata

25 New York City Open Data: Search Results
My Note: I could only find taxi driver data.
Web Site

26 New York City Open Data: Data Table
Download: XLSX Web Site and Medallion_Drivers_-_Active.xlsx
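Besides the XLSX export, Socrata-hosted portals such as NYC Open Data expose datasets through the SODA API, where the $limit and $offset parameters page through the rows of a /resource/<dataset-id>.json endpoint. The sketch below only builds the paged request URLs. The dataset ID is a placeholder (the real ID of the Medallion Drivers dataset is not given in the slides), and the page size is an assumption.

```python
from urllib.parse import urlencode

BASE = "https://data.cityofnewyork.us/resource"


def soda_page_urls(dataset_id, total_rows, page_size=1000):
    """Build SODA URLs that page through a dataset with $limit/$offset."""
    urls = []
    for offset in range(0, total_rows, page_size):
        # safe="$" keeps the SODA parameter names readable instead of %24.
        query = urlencode({"$limit": page_size, "$offset": offset}, safe="$")
        urls.append(f"{BASE}/{dataset_id}.json?{query}")
    return urls


# "abcd-1234" is a hypothetical placeholder, not a real dataset ID.
for url in soda_page_urls("abcd-1234", total_rows=2500):
    print(url)
```

Each URL could then be fetched with urllib.request.urlopen and parsed with json.load. When paging, it is advisable to also add an $order clause so rows come back in a stable order across requests.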

27 Visualizing NYC’s Open Data: Socrata Beta

28 MIT Big Data Assessment: Questions and Answers
Big Data Collection
2) Data science requires:
  Knowledge of statistics
  Knowledge of data management
  Knowledge of curation
  All of the above (correct)
Big Data Systems
13) For which of the following tasks is interactive visualization most useful? (choose all that apply)
  Developing a hypothesis about data (correct)
  Formally confirming a hypothesis
  Communicating a conclusion about data (correct)
  All of the above
Big Data Analytics
13) Big Data means that there's no shortage of useful data.
  True
  False (correct)
Story

