Presentation is loading. Please wait.

Presentation is loading. Please wait.

Data Science and Big Data Analytics Chap1: Intro to Big Data Analytics

Similar presentations


Presentation on theme: "Data Science and Big Data Analytics Chap1: Intro to Big Data Analytics"— Presentation transcript:

1 Data Science and Big Data Analytics Chap1: Intro to Big Data Analytics
Charles Tappert Seidenberg School of CSIS, Pace University

2 1.1 Big Data Overview Industries that gather and exploit data
Credit card companies monitor purchase Good at identifying fraudulent purchases Mobile phone companies analyze calling patterns – e.g., even on rival networks Look for customers might switch providers For social networks data is primary product Intrinsic value increases as data grows

3 Attributes Defining Big Data Characteristics
Huge volume of data Not just thousands/millions, but billions of items Complexity of data types and structures Varity of sources, formats, structures Speed of new data creation and grow High velocity, rapid ingestion, fast analysis

4 Sources of Big Data Deluge
Mobile sensors – GPS, accelerometer, etc. Social media – 700 Facebook updates/sec in2012 Video surveillance – street cameras, stores, etc. Video rendering – processing video for display Smart grids – gather and act on information Geophysical exploration – oil, gas, etc. Medical imaging – reveals internal body structures Gene sequencing – more prevalent, less expensive, healthcare would like to predict personal illnesses

5 Sources of Big Data Deluge

6 Example: Genotyping from 23andme.com

7 1.1.1 Data Structures: Characteristics of Big Data

8 Data Structures: Characteristics of Big Data
Structured – defined data type, format, structure Transactional data, OLAP cubes, RDBMS, CVS files, spreadsheets Semi-structured Text data with discernable patterns – e.g., XML data Quasi-structured Text data with erratic data formats – e.g., clickstream data Unstructured Data with no inherent structure – text docs, PDF’s, images, video

9 Example of Structured Data

10 Example of Semi-Structured Data

11 Example of Quasi-Structured Data visiting 3 websites adds 3 URLs to user’s log files

12 Example of Unstructured Data Video about Antarctica Expedition

13 1.1.2 Types of Data Repositories from an Analyst Perspective

14 1.2 State of the Practice in Analytics
Business Intelligence (BI) versus Data Science Current Analytical Architecture Drivers of Big Data Emerging Big Data Ecosystem and a New Approach to Analytics

15 Business Drivers for Advanced Analytics

16 1.2.1 Business Intelligence (BI) versus Data Science

17 1.2.2 Current Analytical Architecture Typical Analytic Architecture

18 Current Analytical Architecture
Data sources must be well understood EDW – Enterprise Data Warehouse From the EDW data is read by applications Data scientists get data for downstream analytics processing

19 1.2.3 Drivers of Big Data Data Evolution & Rise of Big Data Sources

20 1.2.4 Emerging Big Data Ecosystem and a New Approach to Analytics
Four main groups of players Data devices Games, smartphones, computers, etc. Data collectors Phone and TV companies, Internet, Gov’t, etc. Data aggregators – make sense of data Websites, credit bureaus, media archives, etc. Data users and buyers Banks, law enforcement, marketers, employers, etc.

21 Emerging Big Data Ecosystem and a New Approach to Analytics

22 1.3 Key Roles for the New Big Data Ecosystem
Deep analytical talent Advanced training in quantitative disciplines – e.g., math, statistics, machine learning Data savvy professionals Savvy but less technical than group 1 Technology and data enablers Support people – e.g., DB admins, programmers, etc.

23 Three Key Roles of the New Big Data Ecosystem

24 Three Recurring Data Scientist Activities
Reframe business challenges as analytics challenges Design, implement, and deploy statistical models and data mining techniques on Big Data Develop insights that lead to actionable recommendations

25 Profile of Data Scientist Five Main Sets of Skills

26 Profile of Data Scientist Five Main Sets of Skills
Quantitative skill – e.g., math, statistics Technical aptitude – e.g., software engineering, programming Skeptical mindset and critical thinking – ability to examine work critically Curious and creative – passionate about data and finding creative solutions Communicative and collaborative – can articulate ideas, can work with others

27 1.4 Examples of Big Data Analytics
Retailer Target Uses life events: marriage, divorce, pregnancy Apache Hadoop Open source Big Data infrastructure innovation MapReduce paradigm, ideal for many projects Social Media Company LinkedIn Social network for working professionals Can graph a user’s professional network 250 million users in 2014

28 Data Visualization of User’s Social Network Using InMaps

29 Summary Big Data comes from myriad sources
Social media, sensors, IoT, video surveillance, and sources only recently considered Companies are finding creative and novel ways to use Big Data Exploiting Big Data opportunities requires New data architectures New machine learning algorithms, ways of working People with new skill sets Always Review Chapter Exercises

30 Focus of Course Focus on quantitative disciplines – e.g., math, statistics, machine learning Provide overview of Big Data analytics In-depth study of a several key algorithms


Download ppt "Data Science and Big Data Analytics Chap1: Intro to Big Data Analytics"

Similar presentations


Ads by Google