Introduction to Apache Hadoop Zibo Wang. Introduction  What is Apache Hadoop?  Apache Hadoop is a software framework which provides open source libraries.

1 Introduction to Apache Hadoop Zibo Wang

2 Introduction  What is Apache Hadoop?  Apache Hadoop is an open-source software framework that provides libraries for data-intensive computing through a single, simple Map-Reduce interface and its own distributed file system, HDFS.  Started by Doug Cutting and Mike Cafarella.  Written in Java.

3 Introduction  Uses of Hadoop  Compute  Storage  Database  Advantages of Hadoop  Scalable algorithms  Log management  Extract-Transform-Load (ETL) platform

4 Map-Reduce  Introduced by Google  A simple and powerful interface that enables automatic parallelization and distribution of large-scale computation.  Two major functions  Map  Reduce  Nodes and trackers

5 Map-Reduce
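
The two functions named on the Map-Reduce slide can be illustrated with a word count. This is a minimal single-process sketch in Python, not Hadoop's Java API: the function names (map_words, shuffle, reduce_counts, word_count) are hypothetical, and the shuffle step stands in for the grouping that the framework performs between the map and reduce phases on a real cluster.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_words(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key (done by the framework in Hadoop)."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def reduce_counts(word: str, counts: List[int]) -> Tuple[str, int]:
    """Reduce: collapse all values for one key into a single result."""
    return word, sum(counts)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence on an in-memory input."""
    mapped = (pair for line in lines for pair in map_words(line))
    return dict(reduce_counts(word, counts) for word, counts in shuffle(mapped).items())


if __name__ == "__main__":
    print(word_count(["Hadoop stores data", "Hadoop processes data"]))
```

Because each call to map_words depends only on its own line, and each call to reduce_counts only on its own key, both phases can be spread across many nodes, which is the automatic parallelization the slide refers to.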

6 Hadoop Distributed File System (HDFS)  Large block size (64 MB by default), so that the time spent seeking is small compared with the time spent transferring data. Very large files are therefore the ideal workload.  Streaming data access: a write-once, read-many-times architecture. Because files are large, the time to read the whole file matters more than the latency of seeking to the first record.  Commodity hardware: HDFS is designed to run on commodity hardware, which may fail, and it handles such failures.
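
The block-size argument can be checked with a little arithmetic. The sketch below assumes one seek per block, a 10 ms seek time, and a 100 MB/s transfer rate. These figures are illustrative assumptions, not values from the slides, and the helper names are hypothetical.

```python
MB = 1024 * 1024


def block_count(file_size_bytes: int, block_size_bytes: int) -> int:
    """Number of blocks needed to store a file (the last block may be partial)."""
    return -(-file_size_bytes // block_size_bytes)  # integer ceiling division


def seek_overhead(
    block_size_bytes: int,
    seek_seconds: float = 0.010,                   # assumed seek time
    transfer_bytes_per_second: float = 100 * MB,   # assumed transfer rate
) -> float:
    """Fraction of the time to read one block that is spent seeking."""
    transfer_seconds = block_size_bytes / transfer_bytes_per_second
    return seek_seconds / (seek_seconds + transfer_seconds)


if __name__ == "__main__":
    print("blocks in a 1 GB file:", block_count(1024 * MB, 64 * MB))
    for size in (4 * 1024, 64 * MB):
        print(f"{size:>10} byte blocks: {seek_overhead(size):.1%} of read time is seeking")
```

Under these assumptions a 64 MB block spends under 2% of its read time seeking, while a 4 KB block spends over 99%, which is why large blocks suit streaming reads of very large files.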

7 HDFS Architecture  Filesystem metadata  Write workflow  Read workflow
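
The three items on the architecture slide can be pictured with a toy in-memory model. In HDFS, a NameNode keeps the filesystem metadata (which blocks make up a file and where their replicas live), while DataNodes hold the block contents. Everything else in this sketch is an assumption made for illustration: the class name ToyCluster, the tiny block size, and the round-robin replica placement, which is much simpler than the placement policy of a real cluster.

```python
from typing import Dict, FrozenSet, List


class ToyCluster:
    """Toy model of HDFS metadata plus write and read paths (not the real API)."""

    def __init__(self, datanode_names: List[str], block_size: int = 4, replication: int = 2):
        self.block_size = block_size
        self.replication = min(replication, len(datanode_names))
        # DataNodes: name -> {block_id: block contents}
        self.datanodes: Dict[str, Dict[str, bytes]] = {n: {} for n in datanode_names}
        # NameNode metadata: file path -> ordered block ids, block id -> replica holders
        self.file_blocks: Dict[str, List[str]] = {}
        self.block_locations: Dict[str, List[str]] = {}

    def write(self, path: str, data: bytes) -> None:
        """Split data into blocks and store each block on several DataNodes."""
        if path in self.file_blocks:
            raise FileExistsError(path)  # write once: existing files are not rewritten
        names = list(self.datanodes)
        block_ids: List[str] = []
        for offset in range(0, len(data), self.block_size):
            block_id = f"{path}#blk{len(block_ids)}"
            first = len(self.block_locations) % len(names)  # assumed round-robin placement
            targets = [names[(first + i) % len(names)] for i in range(self.replication)]
            for name in targets:
                self.datanodes[name][block_id] = data[offset:offset + self.block_size]
            self.block_locations[block_id] = targets
            block_ids.append(block_id)
        self.file_blocks[path] = block_ids

    def read(self, path: str, failed: FrozenSet[str] = frozenset()) -> bytes:
        """Look up block locations in the metadata, then fetch from a live replica."""
        chunks = []
        for block_id in self.file_blocks[path]:
            live = [n for n in self.block_locations[block_id] if n not in failed]
            if not live:
                raise IOError(f"no live replica for {block_id}")
            chunks.append(self.datanodes[live[0]][block_id])
        return b"".join(chunks)


if __name__ == "__main__":
    cluster = ToyCluster(["dn1", "dn2", "dn3"])
    cluster.write("/logs/a.txt", b"hello hadoop")
    print(cluster.read("/logs/a.txt", failed=frozenset({"dn1"})))
```

Reading with one DataNode marked as failed still succeeds because every block has a second replica elsewhere, which is the sense in which HDFS tolerates failures of commodity hardware.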

8 Prominent Users of Hadoop  Yahoo!  Linux cluster with more than 10,000 cores  Open source  Facebook  30 PB of data  Amazon  Amazon Elastic Compute Cloud  Amazon Simple Storage Service

9 Thank you!
