Big Data Analytics: A Presentation by Meg Monsen, Michael Leonard, and Eric Zeng.

Presentation transcript:

1 Big Data Analytics: A Presentation by Meg Monsen, Michael Leonard, and Eric Zeng

2 Agenda
- Big Data Analytics and its Objectives
- Financial Impact
- Structured vs Unstructured Data
- Users of Big Data
- Relevant Technologies (Hadoop, MongoDB)
- Coding Examples
- Future of Analytics

3 What Is Big Data and Why Does It Matter?
- Defining Big Data Analytics
  - Examining large sets of data
  - Discovering patterns and trends
  - Data warehouses are insufficient
- Purposes
  - Uncovering hidden needs of customers
  - Improving operational efficiency

4 Big Data & Operational Efficiency
- “By using big data for operations analysis, organizations can gain real-time visibility into operations, customer experience, transactions and behavior.” – IBM
- Core Objectives
  - Gain
  - Analyze
  - Apply
  - Optimize

5 Financial Impact of Big Data
- High cost of poor data quality
  - $3.1 trillion to the US economy annually
  - 10-25% of US business revenues
- Opportunities for qualified analysts
  - Business Analyst: $66,000
  - Data Analyst: $60,000
  - Data Scientist: $113,000

6

7 Dimensions of Big Data
- Essential Characteristics:
  - Volume: data quantity
  - Velocity: data speed
  - Variety: data types

8 Structured vs. Unstructured Data
Structured Data
- Represented as text
- Transactional data, formal reports, accounting records of sales and costs
- Relational databases / data warehouse
- SQL
Unstructured Data
- May be textual or non-textual
- Mobile usage, click stream activity, social media responses, genomic data
- No structured database / data lake
- NoSQL (Not only SQL), SQL batch queries
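The contrast on this slide can be sketched in a few lines. This is only an illustration under stated assumptions: the `sales` table, the JSON events, and every field name are made up for the example, and JSON is strictly speaking semi-structured, used here as a stand-in for schema-less data such as click streams and social media responses.

```python
import json
import sqlite3

# Structured data: a fixed schema in a relational table, queried with SQL.
# (Table and column names are hypothetical.)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (item TEXT, units INTEGER, unit_cost REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("widget", 10, 2.50), ("gadget", 4, 10.00), ("widget", 6, 2.50)],
)
widget_units = conn.execute(
    "SELECT SUM(units) FROM sales WHERE item = ?", ("widget",)
).fetchone()[0]

# Schema-less data: records that do not share a common set of fields,
# e.g. click-stream events or social media responses (made-up samples).
raw_events = [
    '{"type": "click", "page": "/home", "device": "mobile"}',
    '{"type": "post", "text": "Love the new widget!", "likes": 12}',
    '{"type": "click", "page": "/cart"}',
]
events = [json.loads(line) for line in raw_events]

# Fields may be missing, so every access has to tolerate absence.
mobile_clicks = sum(
    1 for e in events if e.get("type") == "click" and e.get("device") == "mobile"
)

print(widget_units, mobile_clicks)
```

The SQL side can rely on every row having the same columns; the schema-less side has to check for each field at read time, which is part of why such data tends to land in a data lake rather than a warehouse.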

9 Illustrative Example
- Inventory Analyst
- Insurance Actuary

10 Interpretations
- Big Data Analytics
- Structured Data

11 Users of Big Data
- Device manufacturers, ERP providers, and consulting firms make up 7 of the top 10 users of Big Data
- Based on a survey of large corporations conducted by Dell in 2014:
  - 55% now follow a Big Data strategy
  - 60% of Big Data projects involve a cloud
  - 32% involve real-time or near real-time processing
  - 22% use a data lake
  - 20% of projects by outside consultants

12 Hadoop
- Free, Java-based programming framework
- Distributes the storage and processing of large data sets
- Started from a Google File System paper published in October 2003
- Development was furthered by Apache
- Named after Doug Cutting’s son’s toy elephant (logo!)

13 When to Use (and Not Use) Hadoop
YES!
- Analytics
- Search
- Data retention
- Log file processing
- Analysis of text, image, audio, and video content
- Recommendation systems like those in e-commerce websites
NO!
- Low-latency or near real-time data access
- Large number of small files to process
- Multiple write scenarios requiring arbitrary writes between files

14 Who Uses Hadoop?

15 Hadoop Framework
- Hadoop Common: contains all the libraries and utilities
- Hadoop Distributed File System (HDFS): storage with high bandwidth
- Hadoop YARN: resource-management platform
- Hadoop MapReduce: programming model for data processing

16 HDFS
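HDFS stores each file as a sequence of blocks and replicates every block across machines. The sketch below is a rough back-of-the-envelope calculation, not HDFS code: the block size (128 MB) and replication factor (3) are assumed values matching common defaults, and `hdfs_footprint` is a hypothetical helper. It shows one reason a large number of small files is a poor fit for Hadoop.

```python
import math

BLOCK_SIZE_MB = 128   # assumed block size (a common HDFS default)
REPLICATION = 3       # assumed replication factor (a common HDFS default)

def hdfs_footprint(file_sizes_mb):
    """Return (blocks, replicas) needed to store the given non-empty files."""
    # Blocks are not shared between files, so even a tiny file uses one block.
    blocks = sum(math.ceil(size / BLOCK_SIZE_MB) for size in file_sizes_mb)
    return blocks, blocks * REPLICATION

one_large = hdfs_footprint([1024])        # one 1 GB file
many_small = hdfs_footprint([1] * 1024)   # 1,024 files of 1 MB each

print(one_large, many_small)
```

Both cases hold the same 1 GB of data, but under these assumptions the small-file layout needs 128 times as many blocks to be tracked.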

17 MapReduce

18 MapReduce Example
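The slide's original example did not survive in this transcript. As a stand-in, here is a minimal single-process sketch of the MapReduce programming model using the classic word-count task. It is plain Python rather than Hadoop's Java API, and the function names (`map_phase`, `shuffle`, `reduce_phase`) and the input lines are assumptions made for illustration. On a real cluster the map and reduce steps would run in parallel across machines, with the framework performing the shuffle.

```python
from collections import defaultdict

def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)

lines = ["big data is big", "data about data"]

# Run the three stages in sequence on the sample input.
mapped = [pair for line in lines for pair in map_phase(line)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())

print(counts)
```

Each map call sees only one line and each reduce call sees only one key, which is what lets the framework distribute the work without the functions knowing about one another.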

19 MongoDB

20 MongoDB = “The database for giant ideas”
- Cross-platform, document-oriented database
- Open-source
- “The database for giant ideas”
- Founded in 2007; written to handle specific problems encountered at DoubleClick
- Classified as a NoSQL database

21

22 MongoDB Example
Also, we can practice! http://www.w3resource.com/mongodb-exercises/#PracticeOnline
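The slide's own example is not preserved in this transcript. To give a feel for document-oriented querying, the sketch below imitates a MongoDB-style filter over plain Python dictionaries. It does not use MongoDB or any driver: `matches` and `find` are hypothetical helpers, the sample documents are made up, and only equality plus the `$gt` and `$lt` comparison operators are handled. Against a real database, a comparable filter document would be passed to the collection's `find` method in the mongo shell or a driver.

```python
# Documents in one collection need not share the same fields (made-up data).
restaurants = [
    {"name": "Blue Door", "cuisine": "Thai", "score": 9},
    {"name": "Corner Cafe", "cuisine": "American", "score": 6, "borough": "Bronx"},
    {"name": "Siam House", "cuisine": "Thai", "score": 4},
]

def matches(doc, query):
    """Hypothetical helper: test one document against a MongoDB-style filter.

    Supports only plain equality and the $gt / $lt comparison operators.
    """
    for field, condition in query.items():
        if field not in doc:
            return False
        value = doc[field]
        if isinstance(condition, dict):
            if "$gt" in condition and not value > condition["$gt"]:
                return False
            if "$lt" in condition and not value < condition["$lt"]:
                return False
        elif value != condition:
            return False
    return True

def find(collection, query):
    """Hypothetical stand-in for a collection query; returns matching documents."""
    return [doc for doc in collection if matches(doc, query)]

# Thai restaurants with a score above 5.
good_thai = find(restaurants, {"cuisine": "Thai", "score": {"$gt": 5}})
names = [doc["name"] for doc in good_thai]

print(names)
```

The query is itself a document, which is the main idea this sketch tries to convey: filters are expressed as nested key-value structures rather than SQL strings.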

23

24 The Future of Big Data Analytics

25 Any Questions?

