
MapReduce

(2012) Average Searches Per Day: 5,134,000,000

Motivation
Process lots of data: Google processed about 24 petabytes of data per day.
A single machine cannot serve all the data in parallel; you need a distributed system to store and process it in parallel.
Parallel programming? Threading is hard!
– How do you facilitate communication between nodes?
– How do you scale to more machines?
– How do you handle machine failures?

MapReduce
MapReduce [OSDI'04] provides:
– Automatic parallelization and distribution
– I/O scheduling: load balancing, network and data transfer optimization
– Fault tolerance: handling of machine failures
Need more power? Scale out, not up: a large number of commodity servers as opposed to a few high-end specialized servers.
Apache Hadoop: open source implementation of MapReduce

Typical problem solved by MapReduce
1. Read a lot of data
2. Map: extract something you care about from each record
3. Shuffle and sort
4. Reduce: aggregate, summarize, filter, or transform
5. Write the results
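To make these five steps concrete, here is a minimal, self-contained sketch in plain Java (not the Hadoop API) that counts words in a handful of in-memory records; the class name and sample data are made up for illustration.

```java
import java.util.*;
import java.util.stream.*;

public class MiniMapReduce {
    public static void main(String[] args) {
        // "Read a lot of data": here, just a few in-memory records (lines of text).
        List<String> records = List.of("the quick brown fox", "the lazy dog", "the fox");

        // Map: extract something you care about from each record -- (word, 1) pairs.
        List<Map.Entry<String, Integer>> mapped = records.stream()
                .flatMap(line -> Arrays.stream(line.split("\\s+")))
                .map(word -> Map.entry(word, 1))
                .collect(Collectors.toList());

        // Shuffle and sort: group all intermediate values by key.
        Map<String, List<Integer>> shuffled = mapped.stream().collect(
                Collectors.groupingBy(Map.Entry::getKey,
                        Collectors.mapping(Map.Entry::getValue, Collectors.toList())));

        // Reduce: aggregate the values for each key (sum the counts).
        Map<String, Integer> reduced = new TreeMap<>();
        shuffled.forEach((word, ones) ->
                reduced.put(word, ones.stream().mapToInt(Integer::intValue).sum()));

        // Write the results.
        reduced.forEach((word, count) -> System.out.println(word + "\t" + count));
    }
}
```

A real MapReduce framework runs the map and reduce steps on many machines and moves the intermediate data over the network; the structure of the computation is the same.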

MapReduce workflow (figure): input data is divided into splits; map workers read the splits and extract something you care about from each record, writing intermediate results to local disk; reduce workers do remote reads and sorting, aggregate/summarize/filter/transform, and write the output files.

Mappers and Reducers
Need to handle more data? Just add more mappers/reducers!
No need to handle multithreaded code:
– Mappers and reducers are typically single threaded and deterministic; determinism allows restarting of failed jobs
– Mappers/reducers run entirely independently of each other; in Hadoop, they run in separate JVMs

Example: Word Count
(2012) Average searches per day: 5,134,000,000. With 1,000 nodes, each node processes about 5,134,000 queries.

Mapper
Reads an input (key, value) pair and outputs intermediate (key, value) pairs.
– Let's count the number of occurrences of each word in user queries (or tweets/blogs)
– The input to the mapper is one query record as a (key, query text) pair
– The output is a (word, 1) pair for every word in the query
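A word-count mapper written against the Hadoop Java API (org.apache.hadoop.mapreduce) might look roughly like the sketch below; the class name is illustrative, not taken from the slides.

```java
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Word-count mapper: for each input line (key = byte offset, value = line text),
// emit a (word, 1) pair for every word it contains.
public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        for (String token : value.toString().split("\\s+")) {
            if (!token.isEmpty()) {
                word.set(token);
                context.write(word, ONE);   // emit (word, 1)
            }
        }
    }
}
```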

Reducer
Accepts the mapper output and aggregates the values for each key.
– For our example, the reducer input is (word, [1, 1, ..., 1])
– The output is (word, total count) for each distinct word
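The matching reducer, again as a rough sketch against the Hadoop Java API with an illustrative class name:

```java
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Word-count reducer: receives (word, [1, 1, ...]) after the shuffle and
// emits (word, total count).
public class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

    private final IntWritable result = new IntWritable();

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable v : values) {
            sum += v.get();
        }
        result.set(sum);
        context.write(key, result);   // emit (word, count)
    }
}
```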

MapReduce execution (figure): the user's Hadoop program forks a master, which assigns map and reduce tasks to workers; map workers read input splits and write intermediate data to local disk, reduce workers remote-read and sort that data, then write the output files. Note: peta-scale data is transferred through the network.

Google File System (GFS) / Hadoop Distributed File System (HDFS)
Split data into chunks and store 3 replicas on commodity servers.
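As a small illustration of replication from the application side, the HDFS Java client exposes the per-file replication factor; the path and the factor of 3 below are assumptions for the example.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReplicationExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();   // picks up core-site.xml / hdfs-site.xml
        FileSystem fs = FileSystem.get(conf);

        Path file = new Path("/data/input/queries.txt");   // hypothetical path

        // Inspect the current replication factor of an existing file.
        FileStatus status = fs.getFileStatus(file);
        System.out.println("replication = " + status.getReplication());

        // Ask HDFS to keep 3 copies of this file (the common default).
        fs.setReplication(file, (short) 3);

        fs.close();
    }
}
```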

MapReduce on HDFS (figure): the master asks the HDFS NameNode where the chunks of input data are, receives their locations, and assigns map tasks so that workers read their splits from local disk.

Locality Optimization
Master scheduling policy:
– Asks GFS for the locations of the replicas of the input file blocks
– Map tasks are scheduled so that a replica of their input block is on the same machine or the same rack
Effect: thousands of machines read input at local-disk speed – eliminates the network bottleneck!

Failure in MapReduce
Failures are the norm on commodity hardware.
Worker failure:
– Detect failure via periodic heartbeats
– Re-execute in-progress map/reduce tasks
Master failure:
– Single point of failure; resume from the execution log
Robust:
– Google's experience: once lost 1600 of 1800 machines, but the job finished fine

Fault tolerance: handled via re-execution
On worker failure:
– Detect failure via periodic heartbeats
– Re-execute completed and in-progress map tasks
– Task completion is committed through the master
Robust: [Google's experience] lost 1600 of 1800 machines, but finished fine

Refinement: Redundant Execution
Slow workers significantly lengthen completion time:
– Other jobs consuming resources on the machine
– Bad disks with soft errors transfer data very slowly
– Weird things: processor caches disabled (!!)
Solution: spawn backup copies of tasks – whichever one finishes first "wins".
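In Hadoop this refinement is exposed as speculative execution; a minimal sketch of toggling it for a job is below, assuming the Hadoop 2.x+ property names (they can differ across versions).

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class SpeculativeExecutionExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Allow backup ("speculative") copies of straggling tasks.
        // Property names assume Hadoop 2.x+ and may differ by version.
        conf.setBoolean("mapreduce.map.speculative", true);
        conf.setBoolean("mapreduce.reduce.speculative", true);

        Job job = Job.getInstance(conf, "job with speculative execution");
        // ... configure mapper/reducer/input/output as usual, then submit.
    }
}
```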

Refinement: Skipping Bad Records
Map/reduce functions sometimes fail for particular inputs.
The best solution is to debug & fix, but that is not always possible.
If the master sees two failures for the same record:
– The next worker is told to skip that record

A MapReduce Job (figure: Mapper and Reducer code)
Run this program as a MapReduce job.
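As a concrete stand-in for the program on the slide, here is a minimal word-count driver for the Hadoop Java API that wires together the mapper and reducer sketched earlier; class names and paths are illustrative.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountJob {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");

        job.setJarByClass(WordCountJob.class);
        job.setMapperClass(WordCountMapper.class);     // mapper sketched earlier
        job.setCombinerClass(WordCountReducer.class);  // optional local aggregation
        job.setReducerClass(WordCountReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));    // input directory
        FileOutputFormat.setOutputPath(job, new Path(args[1]));  // output directory

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Packaged into a jar, this would typically be launched with something like `hadoop jar wordcount.jar WordCountJob /data/input /data/output` (paths hypothetical).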

Summary
MapReduce:
– A programming paradigm for data-intensive computing
– A distributed & parallel execution model
– Simple to program: the framework automates many tedious tasks (machine selection, failure handling, etc.)

Contents
Motivation
Design overview – Write Example – Record Append
Fault Tolerance & Replica Management
Conclusions

Motivation: Large-Scale Data Storage
Manipulate large (petabyte-scale) sets of data
Large number of machines with commodity hardware
Component failure is the norm
Goal: a scalable, high-performance, fault-tolerant distributed file system

Why a new file system?
None designed for their failure model
Few scale as highly, dynamically, or easily
Existing systems lack special primitives for large distributed computation

What to expect from GFS
Designed for Google's applications:
– Control of both the file system and the applications
– Applications use a few specific access patterns: appends to large files, large streaming reads
– Not a good fit for low-latency data access, lots of small files, multiple writers, or arbitrary file modifications
Not POSIX, although mostly traditional
– Specific operations: RecordAppend

Different characteristics than transactional or "customer order" data: "write once, read many" (WORM), e.g. web logs, web crawler data, or healthcare and patient information.
WORM inspired the MapReduce programming model.
Google exploited these characteristics in its Google File System [SOSP'03]
– Apache Hadoop: open source project

Contents
Motivation
Design overview – Write Example – Record Append
Fault Tolerance & Replica Management
Conclusions

Components
Master (NameNode)
– Manages metadata (namespace)
– Not involved in data transfer
– Controls allocation, placement, replication
Chunkserver (DataNode)
– Stores chunks of data
– No knowledge of the GFS file system structure
– Built on the local Linux file system

GFS Architecture (figure)

Write operation

Write(filename, offset, data)
Participants: client, master, primary replica, secondary replicas A and B; control messages go to the master and primary, while data is pushed to all replicas.
1) Client asks the master: who has the lease for this chunk?
2) Master replies with the lease info (the primary and replica locations)
3) Client pushes the data to all replicas
4) Client sends the commit to the primary
5) Primary forwards a serialized commit to the secondaries
6) Secondaries acknowledge the commit
7) Primary replies success to the client
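To make the ordering of steps 1-7 concrete, the sketch below traces the client side of the commit sequence; the Master, Replica, PrimaryReplica, and LeaseInfo types are hypothetical stand-ins for GFS's internal RPCs, not a real API.

```java
import java.util.List;

// Hypothetical stand-ins for GFS RPC endpoints; only the call ordering mirrors
// the write flow described above (this is not a real GFS/HDFS API).
interface Master {
    LeaseInfo whoHasLease(String filename, long offset);   // steps 1-2
}
interface Replica {
    void pushData(byte[] data);                             // step 3: data flow
}
interface PrimaryReplica extends Replica {
    // Step 4: commit request. The primary serializes the mutation, forwards it
    // to the secondaries (step 5), collects their ACKs (step 6), and returns
    // success or failure (step 7).
    boolean commitWrite(String filename, long offset, byte[] data);
}
record LeaseInfo(PrimaryReplica primary, List<Replica> secondaries) {}

public class GfsWriteSketch {
    private final Master master;

    public GfsWriteSketch(Master master) {
        this.master = master;
    }

    public boolean write(String filename, long offset, byte[] data) {
        // 1-2) Ask the master which replica holds the lease for this chunk.
        LeaseInfo lease = master.whoHasLease(filename, offset);

        // 3) Push the data to every replica; control and data flow are separate.
        lease.primary().pushData(data);
        for (Replica secondary : lease.secondaries()) {
            secondary.pushData(data);
        }

        // 4-7) Send the commit to the primary and wait for the final verdict.
        return lease.primary().commitWrite(filename, offset, data);
    }
}
```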

RecordAppend(filename, data)
Significant use in distributed apps; for example, at a Google production cluster:
– 21% of bytes written
– 28% of write operations
Guarantee: all data is appended at least once as a single consecutive byte range.
Same basic structure as a write:
– Client obtains information from the master
– Client sends data to the data nodes (chunkservers)
– Client sends "append-commit"
– The lease holder serializes the append
Advantage: a large number of concurrent writers with minimal coordination

RecordAppend (2)
Record size is limited by the chunk size.
When a record does not fit into the available space:
– The chunk is padded to the end
– The client retries the request

Contents
Motivation
Design overview – Write Example – Record Append
Fault Tolerance & Replica Management
Conclusions

Fault tolerance
Replication
– High availability for reads
– User controllable, default 3 (non-RAID)
– Provides read/seek bandwidth
– The master is responsible for directing re-replication if a data node dies
Online checksumming in data nodes
– Verified on reads

Replica Management
Bias towards topological spreading
– Across racks and data centers
Rebalancing
– Move chunks around to balance disk fullness
– Gently fixes imbalances due to adding/removing data nodes

Replica Management (Cloning)
When a chunk replica is lost or corrupt, the goal is to minimize application disruption and data loss.
Re-replication is approximately in priority order:
– More replicas missing -> priority boost
– Deleted file -> priority decrease
– Client blocking on a write -> large priority boost
The master directs the copying of data.
Performance on a production cluster:
– Single failure, full recovery (600 GB): 23.2 min
– Double failure, restored 2x replication: 2 min

Garbage Collection
The master does not need strong knowledge of what is stored on each data node:
– The master regularly scans the namespace
– After the GC interval, deleted files are removed from the namespace
– Each data node periodically polls the master about each chunk it knows of
– If a chunk has been forgotten, the master tells the data node to delete it

Limitations
The master is a central point of failure
The master can be a scalability bottleneck
Latency when opening/stat-ing thousands of files
The security model is weak

Conclusion
Inexpensive commodity components can be the basis of a large-scale, reliable system
Adjusting the API, e.g. RecordAppend, can enable large distributed apps
Fault tolerant
Useful for many similar apps

HDFS Architecture (figure: client read and write paths, DataNodes, block replication)

Missing Replicas

Map Reduce
