Data Science for Tackling the Challenges of Big Data Dr. Brand Niemann Director and Senior Data Scientist/Data Journalist Semantic Community

1 Data Science for Tackling the Challenges of Big Data Dr. Brand Niemann Director and Senior Data Scientist/Data Journalist Semantic Community November 14,

2 Overview
Six-Week MIT Online Course:
– Started November 4th and completed November 12th.
Mined This MIT Online Course for Data Sets and Ideas:
– Found the subset of slides that contained data sets and ideas and that were interesting and useful visualizations in themselves.
Professor Karger's Lecture Slides on Visualization User Interfaces Were All About My Heroes:
– Tukey, Tufte, Shneiderman, and Spotfire. (In fact, they covered everything leading up to Spotfire, but not Spotfire itself!)
Preserve My Work and Present a Tutorial to the Federal Big Data Working Group Meetup:
– MindTouch Knowledge Base, Excel Spreadsheet Index, and Spotfire Interactive Visualizations.

3 MITProfessionalX 6.BDx Tackling the Challenges of Big Data: Course Assessment
Web Site (private)

4 MITProfessionalX 6.BDx Tackling the Challenges of Big Data: Course Progress
https://mitprofessionalx.edx.org/courses/MITProfessionalX/6.BDX/2T2014/progress

5 MITProfessionalX 6.BDx Tackling the Challenges of Big Data: Big Data Storage
Web Site (private)

6 MITProfessionalX 6.BDx Tackling the Challenges of Big Data: Modern Databases
Web Site (private) and Script (public)

7 Courseware: Big Data Storage
I was especially interested in the following, since both Professors Stonebraker and Madden presented to our Federal Big Data Working Group Meetup:
– This module begins with an overview of a number of these technologies by renowned database professor Mike Stonebraker. In his unique and ardent fashion, Mike expresses his skepticism about many new technologies, particularly Hadoop/MapReduce and NoSQL, and voices support for many new relational technologies, including column stores and main memory databases.
– After that, Professors Matei Zaharia and Samuel Madden provide a more nuanced view of the tradeoffs between the various approaches, discussing Hadoop and its derivatives, as well as NoSQL and its tradeoffs, in more detail.
– Professor Stonebraker expresses a number of strong opinions in this module. Which of them do you agree with? Which do you disagree with? Why?
3.0 Introduction to Big Data Storage and Discussion 3
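Stonebraker's support for column stores turns on how a table is laid out. The following is a minimal Python sketch, not taken from the course, that contrasts a row-oriented layout with a column-oriented one for a hypothetical three-column trip table (`trip_id`, `fare`, and `distance_km` are illustrative names). It shows only the layout difference, not the I/O and compression gains a real column store builds on top of it.

```python
from typing import Dict, List, Tuple

# Hypothetical trip table; the column names are illustrative only.
COLUMNS = ("trip_id", "fare", "distance_km")
Row = Tuple[int, float, float]


def to_columns(rows: List[Row]) -> Dict[str, list]:
    """Pivot row-oriented records into one list per column (column-store layout)."""
    return {name: [row[i] for row in rows] for i, name in enumerate(COLUMNS)}


def total_fare_row_store(rows: List[Row]) -> float:
    # Row layout: every full record is visited even though only 'fare' is needed.
    fare = COLUMNS.index("fare")
    return sum(row[fare] for row in rows)


def total_fare_column_store(columns: Dict[str, list]) -> float:
    # Column layout: only the 'fare' column is read.
    return sum(columns["fare"])


if __name__ == "__main__":
    rows = [(1, 12.5, 4.0), (2, 8.0, 2.5), (3, 20.0, 9.1)]
    cols = to_columns(rows)
    print(total_fare_row_store(rows), total_fare_column_store(cols))
```

Both functions return the same total; the difference is that the column layout answers the query by scanning one array, which is why column stores suit analytic queries that aggregate a few columns over many rows.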

8 Selected Slides: Professor Sam Madden
– What Is This Course Going to Cover?
– Other Techniques We'll Cover

9 Selected Slides: Professor David Karger
– Overview
– Interaction Strategy

10 Selected Slides: Professor Daniela Rus
– Case Study: Transportation in Singapore
– 1.1 Case Study: Transportation (PDF of presentation slides, Rus)

11 Google Search: Singapore Taxi Data

12 Think Business: Why can’t I find a taxi when I really need one?
Based on: Labor Supply Decisions of Singaporean Cab Drivers, May 8, 2013
Newer paper: Labor Supply Decisions of Singaporean Cab Drivers, September 2014

13 Labor Supply Decisions of Singaporean Cab Drivers: Table 1: Summary Statistics by Days

14 MIT Big Data Knowledge Base: Table 1 Spreadsheet
Spreadsheet
My Note: The paper was an image PDF, so I had to build this spreadsheet by hand!

15 Singapore Land Transport Authority: Traffic Info Service Providers

16 Singapore Land Transport Authority: MyTransport.sg
Screen scrape
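The slide only notes that the MyTransport.sg page was screen scraped. As a hedged illustration, assuming the page presents its information in an ordinary HTML `<table>`, Python's standard-library `html.parser` module is enough to pull the cell text out row by row. `TableScraper` and `scrape_table` are hypothetical names, and fetching the page (for example with `urllib.request.urlopen`) is left out because the page's actual URL and markup are not given here.

```python
from html.parser import HTMLParser
from typing import List


class TableScraper(HTMLParser):
    """Hypothetical helper: collect <td>/<th> text, grouped by <tr> row."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: List[List[str]] = []
        self._row: List[str] = []
        self._cell: List[str] = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th"):
            self._in_cell = True
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._in_cell:
            # Join the cell's text fragments (split by nested tags such as <a>)
            # and collapse runs of whitespace into single spaces.
            self._row.append(" ".join("".join(self._cell).split()))
            self._in_cell = False
        elif tag == "tr" and self._row:
            self.rows.append(self._row)
            self._row = []

    def handle_data(self, data):
        if self._in_cell:
            self._cell.append(data)


def scrape_table(html: str) -> List[List[str]]:
    """Return the table cells in `html` as a list of rows of strings."""
    parser = TableScraper()
    parser.feed(html)
    parser.close()
    return parser.rows
```

This sketch assumes well-formed `<tr>` rows with explicit closing tags; a real page with omitted end tags or several tables would need a sturdier parser.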

17 Singapore Land Transport Authority: All Datasets Spreadsheet
Spreadsheet

18 MIT Big Data Knowledge Base: MindTouch
– Data Science for Tackling the Challenges of Big Data
– Labor Supply Decisions of Singaporean Cab Drivers, September 2014, as a Data Science Data Publication

19 MIT Big Data: Knowledge Base Spreadsheet
Spreadsheet

20 MIT Big Data: Course Participant Spreadsheet
Spreadsheet
My Note: This was mapped in Spotfire after data curation (cleaning the country names). Spotfire has built-in data curation functions.
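Spotfire's own curation functions are not shown in the slides. As a rough Python sketch of the same kind of cleaning, assuming the participant spreadsheet has a free-text country column with inconsistent spellings, one can normalize whitespace and case, map known aliases to one canonical name, and then count participants per country for the map. The alias table and the function names are hypothetical examples, not the ones used for this map.

```python
from collections import Counter
from typing import Dict, Iterable

# Hypothetical alias table: variant spellings mapped to one canonical name.
# A real curation pass would grow this by inspecting the distinct values.
ALIASES: Dict[str, str] = {
    "usa": "United States",
    "us": "United States",
    "u.s.a.": "United States",
    "united states of america": "United States",
    "uk": "United Kingdom",
    "u.k.": "United Kingdom",
}


def clean_country(raw: str) -> str:
    """Collapse whitespace, lowercase, and map known aliases to a canonical name."""
    key = " ".join(raw.split()).lower()
    if key in ALIASES:
        return ALIASES[key]
    # Fallback: title case, so "india" and "INDIA" merge. Title-casing is crude
    # (e.g. names with apostrophes), so such names belong in ALIASES instead.
    return key.title()


def participants_by_country(countries: Iterable[str]) -> Counter:
    """Count participants per cleaned country name, skipping blank entries."""
    return Counter(clean_country(c) for c in countries if c.strip())
```

The resulting counts can then be joined to a map layer by country name, which only works once every spelling has been reduced to the name the map expects.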

21 MIT Big Data: Spotfire Cover Page
Web Player

22 MIT Big Data: Student Enrollment
Web Player

23 MIT Big Data: Singaporean Cab Drivers
Web Player

24 New York City Open Data: Socrata
https://nycopendata.socrata.com/

25 New York City Open Data: Search Results
Web Site
My Note: I could only find data on taxi drivers.

26 New York City Open Data: Data Table
Web Site and Medallion_Drivers_-_Active.xlsx (Download: XLSX)

27 Visualizing NYC’s Open Data: Socrata Beta
https://nycopendata.socrata.com/viz

28 MIT Big Data Assessment: Questions and Answers
Big Data Collection:
– 2) Data science requires: Knowledge of statistics; Knowledge of data management; Knowledge of curation; All of the above (correct)
Big Data Systems:
– 13) For which of the following tasks is interactive visualization most useful? (choose all that apply) Developing a hypothesis about data (correct); Formally confirming a hypothesis; Communicating a conclusion about data (correct); All of the above
Big Data Analytics:
– 13) Big Data means that there's no shortage of useful data. True; False (correct)
Story

