Zoie Barrett and Brian Lam


Presentation transcript:

1 Big Data and Hadoop
Zoie Barrett and Brian Lam

2 Agenda
What is Big Data?
What Tools are There?
Hadoop
Hadoop vs SQL
Examples
Questions?

3 What is Big Data?
Large, complex, rapidly growing, unstructured data sets that are difficult to process using traditional methods
Analyzing Big Data is complex and requires the skills of both programmers and statisticians

4 Dimensions
Volume: determining relevance; how to use analytics to create value
Velocity: unprecedented speeds for data streaming; reacting quickly
Variety: multiple formats, many unstructured; managing, merging and governing

5 Big Statistics
90% of the world's data was created in the last 2 years
We create 2.5 quintillion bytes of data a day
48 hours of video are uploaded to YouTube every minute (nearly 8 years of content every day)
100 terabytes of data are uploaded to Facebook daily
230 million Tweets are sent a day
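The "nearly 8 years of content every day" figure follows directly from the upload rate on the slide. A quick check, using only the slide's own number and treating a year as 365 days (an assumption):

```python
# Upload rate taken from the slide; the rest is plain unit conversion.
HOURS_UPLOADED_PER_MINUTE = 48
MINUTES_PER_DAY = 60 * 24
HOURS_PER_YEAR = 24 * 365  # assumption: a 365-day year

# Hours of video that arrive during one calendar day.
hours_per_day = HOURS_UPLOADED_PER_MINUTE * MINUTES_PER_DAY

# The same amount expressed as years of continuous playback.
years_per_day = hours_per_day / HOURS_PER_YEAR

print(hours_per_day)            # 69120
print(round(years_per_day, 2))  # 7.89
```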

6 Concerns with Big Data
Data storage is becoming cheaper and cheaper, but how do we manage it?
Read/write speeds are not keeping up with the amount of data being generated
Data is unstructured and hard to analyze
What is the solution?

7 Tools

8 Hadoop
Open-source software framework for storing and processing large data sets
Fundamental assumption: hardware failures are common
Runs on clusters of commodity hardware
Batch processing, not real time
Hadoop-based projects exist for real-time analysis

9 History of Hadoop
Doug Cutting and Mike Cafarella wanted to develop a better open-source search engine
Created Nutch (web crawler)
Based on Lucene (search engine library)

10 How Hadoop works
Hadoop Distributed File System (HDFS)
designed to run on commodity hardware
data is stored across multiple servers
fault tolerant
MapReduce processes data
Map - divides jobs into pieces and distributes them
Reduce - combines results
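The Map and Reduce steps above can be sketched as a word count. This is a single-process illustration in plain Python, not Hadoop's actual API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample input are hypothetical, and on a real cluster each chunk would be handled by a different node.

```python
from collections import defaultdict


def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one chunk of input."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Group emitted values by key, the step that sits between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(chunks):
    mapped = []
    for chunk in chunks:  # stand-in for distributing chunks across nodes
        mapped.extend(map_phase(chunk))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical input, already split into two pieces.
chunks = ["big data big tools", "hadoop stores big data"]
counts = word_count(chunks)
print(counts)
```

Because each chunk is mapped independently and each key is reduced independently, both phases can run in parallel, which is what makes the model suit clusters of commodity machines.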

11

12 Who Uses Hadoop?

13 Hadoop vs SQL
SQL Data Storage: logical, interrelated tables and defined columns
Hadoop Data Storage: compressed file of text or other data types
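The storage difference can be illustrated with a small sketch. It uses Python's built-in `sqlite3` as a stand-in for a SQL database and an in-memory `gzip` blob as a stand-in for a compressed text file on a Hadoop cluster; the table name, columns, and sample rows are all made up for illustration.

```python
import gzip
import sqlite3

# SQL-style storage: a table with defined columns, structure fixed before writing.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE deliveries (truck_id TEXT, miles REAL)")
conn.executemany(
    "INSERT INTO deliveries VALUES (?, ?)",
    [("T1", 12.5), ("T2", 30.0), ("T1", 7.5)],
)
sql_total = conn.execute(
    "SELECT SUM(miles) FROM deliveries WHERE truck_id = 'T1'"
).fetchone()[0]
conn.close()

# Hadoop-style storage: compressed text with no enforced columns.
raw = "T1,12.5\nT2,30.0\nT1,7.5\n"
blob = gzip.compress(raw.encode("utf-8"))

# Structure is applied only when the data is read.
file_total = 0.0
for line in gzip.decompress(blob).decode("utf-8").splitlines():
    truck_id, miles = line.split(",")
    if truck_id == "T1":
        file_total += float(miles)

print(sql_total, file_total)  # 20.0 20.0
```

Both approaches reach the same answer; the difference is where the structure lives. The table declares it up front, while the text file leaves interpretation to whatever job reads it.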

14 Examples
UPS - reduced maintenance cost
Schwan’s - analyzed customer feedback
Memphis PD - used analytics to reduce crime

15 Dilbert

16 Questions?

