Zoie Barrett and Brian Lam

Presentation transcript:

Big Data and Hadoop
Zoie Barrett and Brian Lam

Agenda
What is Big Data?
What Tools are There?
Hadoop
Hadoop vs SQL
Examples
Questions?

What is Big Data?
Large, complex, rapidly growing, unstructured data sets that are difficult to process using traditional methods
Analyzing Big Data is very complex and requires the skills of both programmers and statisticians

Dimensions
Volume: determining relevance; how to use analytics to create value
Velocity: unprecedented speeds for data streaming; reacting quickly
Variety: multiple formats, many unstructured; managing, merging and governing

Big Statistics
90% of the world's data was created in the last 2 years
We create 2.5 quintillion bytes of data a day
48 hours of video are uploaded to YouTube every minute (nearly 8 years of content every day)
100 terabytes of data are uploaded to Facebook daily
230 million Tweets are sent a day
http://analyzingmedia.com/2012/infographic-big-flood-of-big-data-in-digital-marketing/
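The YouTube figure can be checked with quick arithmetic. The sketch below assumes a 365-day year and takes the 48 hours per minute figure from the slide as given:

```python
# Sanity check of the slide's YouTube figure.
# Assumption: a 365-day year; the upload rate is the slide's number.
HOURS_UPLOADED_PER_MINUTE = 48
MINUTES_PER_DAY = 24 * 60
HOURS_PER_YEAR = 24 * 365

hours_per_day = HOURS_UPLOADED_PER_MINUTE * MINUTES_PER_DAY
years_per_day = hours_per_day / HOURS_PER_YEAR

print(f"{hours_per_day} hours/day = {years_per_day:.2f} years of video per day")
```

The result is just under 8 years of video per day, which matches the slide's "nearly 8 years".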

Concerns with Big Data
Data storage is becoming cheaper and cheaper, but how do we manage it?
Read/write speeds are not keeping up with the amount of data being generated
Data is unstructured and hard to analyze
What is the solution?
http://www.slideshare.net/martyhall/hadoop-tutorial-overview-of-hadoop

Tools

Hadoop
Open-source software framework for storing and processing large data sets
Fundamental assumption: hardware failures are common
Runs on clusters of commodity hardware
Batch processing, not real time
Hadoop-based projects are available for real-time analysis

History of Hadoop
Doug Cutting and Mike Cafarella wanted to develop a better open-source search engine
Created Nutch (web crawler)
Based on Lucene (search engine library)

How Hadoop works
Hadoop Distributed Filesystem (HDFS)
  designed to run on commodity hardware
  data is stored across multiple servers
  fault tolerant
MapReduce processes data
  Map - divides jobs into pieces and distributes them
  Reduce - combines results
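The Map and Reduce steps can be illustrated with a word count in plain Python. This is a minimal single-machine sketch of the idea, not Hadoop's actual API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample input are made up for illustration, and a real cluster would run the map calls on different nodes.

```python
from collections import defaultdict


def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one chunk of input."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Group emitted values by key, the step that sits between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(chunks):
    pairs = []
    for chunk in chunks:  # on a cluster, each chunk would be mapped on a separate node
        pairs.extend(map_phase(chunk))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


# Hypothetical input: two chunks standing in for pieces of a large file.
counts = word_count(["big data big", "data hadoop"])
print(counts)
```

Because each chunk is mapped independently, the work can be spread across many machines, and only the grouped results need to be brought together in the reduce step.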

Who Uses Hadoop?

Hadoop vs SQL
SQL data storage: logical, interrelated tables and defined columns
Hadoop data storage: compressed file of text or other data types
https://gigaom.com/2013/03/04/the-history-of-hadoop-from-4-nodes-to-the-future-of-data/
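The storage contrast can be sketched with Python's standard library alone. This is an illustration under stated assumptions, not how Hadoop itself stores data: SQLite stands in for a SQL database, a gzip-compressed tab-separated text file stands in for a file on HDFS, and the `visits` table and sample records are hypothetical.

```python
import gzip
import os
import sqlite3
import tempfile

# Hypothetical sample data: (name, clicks)
records = [("alice", 3), ("bob", 5)]

# SQL style: columns and types are defined before any data is loaded.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visits (name TEXT, clicks INTEGER)")
conn.executemany("INSERT INTO visits VALUES (?, ?)", records)
sql_total = conn.execute("SELECT SUM(clicks) FROM visits").fetchone()[0]
conn.close()

# Hadoop style: raw lines go into a compressed file as-is;
# structure is applied only when the file is read.
path = os.path.join(tempfile.mkdtemp(), "visits.txt.gz")
with gzip.open(path, "wt", encoding="utf-8") as f:
    for name, clicks in records:
        f.write(f"{name}\t{clicks}\n")

with gzip.open(path, "rt", encoding="utf-8") as f:
    file_total = sum(int(line.rstrip("\n").split("\t")[1]) for line in f)

print(sql_total, file_total)
```

Both approaches give the same total here. The difference is where the structure lives: in the table definition for SQL, and in the code that reads the file for the Hadoop-style approach.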

Examples
UPS - reduced maintenance cost
Schwan’s - analyzed customer feedback
Memphis PD - used analytics to reduce crime

Dilbert

Questions?