
MapReduce: Simplified Data Processing on Large Clusters. Jeffrey Dean and Sanjay Ghemawat. To appear in OSDI 2004 (Operating Systems Design and Implementation)

Presentation on theme: "MapReduce: Simplified Data Processing on Large Clusters. Jeffrey Dean and Sanjay Ghemawat. To appear in OSDI 2004 (Operating Systems Design and Implementation)" — Presentation transcript:

1 MapReduce: Simplified Data Processing on Large Clusters. Jeffrey Dean and Sanjay Ghemawat. To appear in OSDI 2004 (Operating Systems Design and Implementation)

2 Jeff Dean Sanjay Ghemawat

3 Introduction: MapReduce is an important programming model for large-scale data-parallel applications

4 Motivation
- Parallel applications: widely used, but typically built as special-purpose applications
- Common functionality needed in each: parallelize the computation, distribute the data, handle failures
- Goal: large-scale (big data) data processing

5 What is MapReduce?
- Programming model: parallel, generic, scalable
- Data model: (key, value) pairs
- Implementation: runs on clusters of commodity PCs

6 What is MapReduce? (user-defined functions)
# map(key, val) is run on each item in the input set → emits (new-key, new-val) pairs
# reduce(key, vals) is run once for each unique key emitted by map() → emits the final output
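The map/reduce contract on this slide can be sketched in a single process, using the classic word-count example. This is a minimal illustration, not the paper's C++ library; the function names (`word_count_map`, `run_mapreduce`, etc.) are invented for this sketch.

```python
from collections import defaultdict

def word_count_map(key, value):
    # key: document name (unused here), value: document contents
    for word in value.split():
        yield (word, 1)

def word_count_reduce(key, values):
    # key: a word, values: all counts emitted for that word
    return (key, sum(values))

def run_mapreduce(inputs, map_fn, reduce_fn):
    intermediate = defaultdict(list)
    for k, v in inputs:                       # map phase
        for nk, nv in map_fn(k, v):
            intermediate[nk].append(nv)       # group values by new key
    # reduce phase: one call per unique intermediate key
    return dict(reduce_fn(k, vs) for k, vs in intermediate.items())

counts = run_mapreduce([("doc1", "to be or not to be")],
                       word_count_map, word_count_reduce)
print(counts)  # {'to': 2, 'be': 2, 'or': 1, 'not': 1}
```

The grouping of all values for a key before the reduce call is the essential step the real system performs across machines.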

7 Examples
# Distributed Grep (Global / Regular Expression / Print)
  map: emits a line if it matches the supplied pattern; reduce: identity function
# Count of URL Access Frequency (from logs of web page requests)
  map: emits (URL, 1); reduce: adds the values and emits (URL, total count)
# Reverse Web-Link Graph
  map: emits (target, source) for each link to a target URL found in a page named source; reduce: emits (target, list(sources))
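The reverse web-link graph example above can be sketched with a toy in-memory link graph; the page names and helper functions here are illustrative, not from the paper.

```python
from collections import defaultdict

def reverse_map(source, targets):
    # source: URL of the page, targets: URLs that page links to
    for target in targets:
        yield (target, source)

def reverse_reduce(target, sources):
    # concatenate (and here, sort) the list of pages linking to target
    return (target, sorted(sources))

# toy link graph: page -> pages it links to (hypothetical data)
graph = {"a.html": ["b.html", "c.html"], "b.html": ["c.html"]}

intermediate = defaultdict(list)
for src, tgts in graph.items():
    for k, v in reverse_map(src, tgts):
        intermediate[k].append(v)

inverted = {k: reverse_reduce(k, vs)[1] for k, vs in intermediate.items()}
print(inverted)  # {'b.html': ['a.html'], 'c.html': ['a.html', 'b.html']}
```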

8 Examples (continued)
# Inverted Index
  map: emits (word, document ID); reduce: sorts the document IDs and emits (word, list(document IDs))
# Distributed Sort
  map: extracts the key from each record and emits (key, record); reduce: emits all pairs unchanged
# Term-Vector per Host (a term vector is a list of (word, frequency) pairs)
  map: emits (hostname, term vector) for each input document; reduce: adds the vectors together, throwing away infrequent terms, and emits a final (hostname, term vector) pair
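The inverted-index example maps cleanly to the same pattern; the two toy documents and function names below are illustrative only.

```python
from collections import defaultdict

def index_map(doc_id, text):
    # emit (word, doc_id) once per distinct word in the document
    for word in set(text.split()):
        yield (word, doc_id)

def index_reduce(word, doc_ids):
    # sort the document IDs to form the posting list for this word
    return (word, sorted(doc_ids))

docs = {"d1": "big data systems", "d2": "big clusters"}  # toy corpus

postings = defaultdict(list)
for doc_id, text in docs.items():
    for w, d in index_map(doc_id, text):
        postings[w].append(d)

index = {w: index_reduce(w, ds)[1] for w, ds in postings.items()}
print(index["big"])  # ['d1', 'd2']
```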

9 Execution overview

10 Typical cluster
# Machines: typically 100s or 1000s of dual-processor x86 machines running Linux, with 2-4 GB of memory
# Network: 100 megabits/second or 1 gigabit/second
# Storage: local IDE disks
# GFS: a distributed file system manages the data
# Job scheduling system: jobs are made up of tasks; the scheduler assigns tasks to machines
# Implementation: a C++ library linked into user programs

11 Distributed execution (1)
#1 - The MapReduce library splits the input file into M pieces (16-64 MB each, controllable by the user via an optional parameter) and starts up many copies of the program on a cluster of machines
#2 - Master (1): one of the copies of the program is special; the others are workers (n), assigned work by the master: M map tasks and R reduce tasks
#3 - A map worker reads the contents of its input split, parses out (key, value) pairs, passes each to the user-defined map function, and buffers the output pairs in memory
#4 - Periodically, the buffered pairs are written to local disk; their locations on local disk are passed back to the master, who is responsible for forwarding these locations to the reduce workers
#5 - A reduce worker uses remote procedure calls to read the buffered data from the local disks of the map workers
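In step 4, the buffered pairs are split into R regions, one per reduce task; the paper's default partitioning function is hash(key) mod R. A minimal sketch of that bucketing (the MD5-based stable hash and the toy key/value pairs are illustrative choices, not the paper's implementation):

```python
import hashlib

R = 4  # number of reduce tasks (illustrative)

def partition(key, r=R):
    # stable hash(key) mod R, mirroring the paper's default partitioner;
    # a stable hash is used so equal keys always map to the same bucket
    digest = hashlib.md5(key.encode()).digest()
    return int.from_bytes(digest[:4], "big") % r

buckets = [[] for _ in range(R)]
for key, value in [("apple", 1), ("pear", 1), ("apple", 1)]:
    buckets[partition(key)].append((key, value))

# all pairs with the same key land in the same bucket, hence the same
# reduce task -- the property reduce() relies on
```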

12 Distributed execution (2)
#6 - The reduce worker sorts the intermediate data by key and iterates over it; for each unique intermediate key encountered, it passes the key and the corresponding set of values to the user's Reduce function. The output of the Reduce function is appended to a final output file for this reduce partition
#7 - When all map tasks and reduce tasks have been completed, the master wakes up the user program, and the MapReduce call in the user program returns back to the user code
#8 - After successful completion, the output is available in R output files, one per reduce task, with file names as specified by the user

13 Master Data Structures
# For each map task and reduce task, the master stores a status: idle, in-progress, or completed

14 Fault Tolerance
# Worker failure
- The master pings every worker periodically; a worker that does not respond is marked as failed and its tasks are rescheduled
- MapReduce is resilient to large-scale worker failures
# Master failure → the MapReduce computation stops
- It is easy to make the master write periodic checkpoints of the master data structures described above
- If the master task dies, a new copy can be started from the last checkpointed state
- Clients can check for this condition and retry the MapReduce operation if they desire
# Semantics in the presence of failures

15 Locality
# Storing input data in GFS conserves network bandwidth
- GFS divides each file into 64 MB blocks and stores several copies of each block (typically 3) on different machines
# When running large MapReduce operations on a significant fraction of the workers in a cluster, most input data is read locally and consumes no network bandwidth

16 Task Granularity
# Ideally, the number of map tasks (M) and reduce tasks (R) should be much larger than the number of machines
- improves dynamic load balancing
- speeds recovery when a worker fails
# Practical bounds: the master makes O(M+R) scheduling decisions and keeps O(M*R) state in memory
- the O(M*R) state is small: roughly one byte per map-task/reduce-task pair
# R is often constrained by users, because the output of each reduce task ends up in a separate output file
# In practice: MapReduce computations are often run with M=200,000 and R=5,000 on 2,000 worker machines
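A back-of-the-envelope check of the master's memory footprint for the configuration on this slide, assuming (as the paper states) roughly one byte of state per map-task/reduce-task pair:

```python
M = 200_000   # map tasks
R = 5_000     # reduce tasks

# O(M*R) pieces of state, ~1 byte each (per the paper's estimate)
state_bytes = M * R

print(state_bytes)          # 1000000000
print(state_bytes / 2**30)  # roughly 0.93 GiB on the master
```

So even at this scale, the per-pair state fits comfortably on a single master machine.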

17 Backup Tasks
# A "straggler" is a machine that takes an unusually long time to complete one of the last few map or reduce tasks in the computation
# When a MapReduce operation is close to completion, the master schedules backup executions of the remaining in-progress tasks
# The task is marked as completed whenever either the primary or the backup execution completes

18

19 Combiner Function
[diagram: a master coordinating map tasks on nodes N1-N3 and reduce tasks; the combiner partially merges intermediate data on the map worker, trading CPU time for reduced network traffic]
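For word count, the combiner can be the same logic as the reducer, applied on the map worker before data crosses the network. A minimal sketch (function names are illustrative):

```python
from collections import Counter

def word_count_map(doc):
    for word in doc.split():
        yield (word, 1)

def combine(pairs):
    # combiner: partially merge counts on the map worker; for word
    # count this is the same operation the reducer performs later
    merged = Counter()
    for word, n in pairs:
        merged[word] += n
    return list(merged.items())

pairs = list(word_count_map("to be or not to be"))
combined = combine(pairs)

# 6 raw pairs shrink to 4 before being shipped to the reduce tasks
print(len(pairs), len(combined))  # 6 4
```

The saving grows with repetition in the input, which is exactly the case (frequent words) where network traffic would otherwise be largest.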

20 Status Information
# The master runs an internal HTTP server and exports a set of status pages for human consumption
# The pages show how many tasks have been completed, how many are in progress, bytes of input, bytes of intermediate data, bytes of output, and processing rates
# The user can use this data to predict how long the computation will take

21 Conclusions
Why MapReduce has been successful:
# First, the model is easy to use, even for programmers without experience with parallel and distributed systems
# Second, a large variety of problems are easily expressible as MapReduce computations
# Third, we have developed an implementation of MapReduce that scales to large clusters comprising thousands of machines
Lessons learned:
# First, restricting the programming model makes it easy to parallelize and distribute computations and to make such computations fault-tolerant
# Second, network bandwidth is a scarce resource
# Third, redundant execution can be used to reduce the impact of slow machines, and to handle machine failures and data loss

