Large-scale file systems and Map-Reduce

Presentation transcript:

Large-scale file systems and Map-Reduce. Single-node architecture: CPU, memory, disk. Google example: 20+ billion web pages × 20 KB = 400+ terabytes. At typical disk read rates, one computer would need ~4 months just to read the web, and ~1,000 hard drives just to store it; it takes even more to do something useful with the data. A new standard architecture is emerging: clusters of commodity Linux nodes with a gigabit Ethernet interconnect. Slide based on

Distributed File Systems. Files are very large and are mostly read or appended to, rarely updated in place. They are divided into chunks, typically 64 MB per chunk. Chunks are replicated at several compute nodes. A master node (possibly replicated) keeps track of the locations of all chunks. Slide based on

Commodity clusters: compute nodes are organized into racks. Intra-rack connections are typically gigabit speed; inter-rack connections are faster by a small factor. Recall that chunks are replicated. Some implementations: GFS (Google File System, proprietary; in Aug 2006 Google had ~450,000 machines), HDFS (Hadoop Distributed File System, open source), CloudStore (Kosmix File System, open source). Slide based on

Problems with large-scale computing on commodity hardware. Challenges: – How do you distribute computation? – How can we make it easy to write distributed programs? – Machines fail: one server may stay up 3 years (~1,000 days), so if you have 1,000 servers, expect to lose one per day. People estimated Google had ~1M machines in 2011 – that is about 1,000 machine failures every day! Slide based on

Issue: Copying data over a network takes time. Idea: – Bring computation close to the data – Store files multiple times for reliability Map-reduce addresses these problems – Google’s computational/data manipulation model – An elegant way to work with big data – Storage infrastructure: a distributed file system (Google: GFS, Hadoop: HDFS) – Programming model: Map-Reduce

Problem: – If nodes fail, how do we store data persistently? Answer: – Distributed File System: provides a global file namespace (Google GFS; Hadoop HDFS). Typical usage pattern: – Huge files (100s of GB to TB) – Data is rarely updated in place – Reads and appends are common. Slide based on

Figure: racks of compute nodes holding replicated file chunks. Slide based on

Replication: 3-way replication of files, with copies on different racks. Slide based on

Map-Reduce You write two functions, Map and Reduce. – They each have a special form to be explained. System (e.g., Hadoop) creates a large number of tasks for each function. – Work is divided among tasks in a precise way. Slide based on

Map-Reduce Algorithms Map tasks convert inputs to key-value pairs. – “keys” are not necessarily unique. Outputs of Map tasks are sorted by key, and each key is assigned to one Reduce task. Reduce tasks combine values associated with a key. Slide based on
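To make the later examples concrete, here is a minimal single-machine sketch of this execution model. The map_reduce helper below is purely illustrative and not part of Hadoop or any real framework: a real system distributes the map and reduce tasks across a cluster, while here an in-memory dictionary stands in for the sort/shuffle step. The examples that follow reuse this helper.

```python
from collections import defaultdict

def map_reduce(inputs, map_fn, reduce_fn):
    """Toy, single-machine sketch of the Map-Reduce model.

    map_fn(item)           -> iterable of (key, value) pairs
    reduce_fn(key, values) -> iterable of output records
    """
    # Map phase: turn every input item into key-value pairs.
    groups = defaultdict(list)
    for item in inputs:
        for key, value in map_fn(item):
            groups[key].append(value)

    # The grouping above models the sort/shuffle: all values for one
    # key end up in a single list, handled by a single reduce call.
    output = []
    for key in sorted(groups):
        output.extend(reduce_fn(key, groups[key]))
    return output
```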

Simple map-reduce example: Word Count We have a large file of words, one word to a line Count the number of times each distinct word appears in the file Sample application: analyze web server logs to find popular URLs Different scenarios: – Case 1: Entire file fits in main memory – Case 2: File too large for main mem, but all pairs fit in main mem – Case 3: File on disk, too many distinct words to fit in memory Slide based on

Word Count. Map task: for each word, e.g. CAT, output (CAT, 1). Total output: (w1,1), (w1,1), …, (w1,1); (w2,1), (w2,1), …, (w2,1); … Hash each (w,1) to bucket h(w) in [0, r-1] in a local intermediate file, where r is the number of reducers. Master: group by key, giving (w1,[1,1,…,1]), (w2,[1,1,…,1]), …, and push each group (w,[1,1,…,1]) to reducer h(w). Reduce task: reducer h(w) reads (w,[1,1,…,1]), aggregates each (w,[1,1,…,1]) into (w,sum), and writes (w,sum) to a common output file. Since addition is commutative and associative, the map task could instead have sent partial sums (w1,sum1), (w2,sum2), …; the reduce task for word wi would then receive its partial sums and output (wi,sumi), the total. Slide based on
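A minimal sketch of these word-count Map and Reduce functions, plugged into the map_reduce helper sketched earlier; the helper's group-by-key stands in for the master's grouping, and the hashing into r reducer buckets is omitted. The function names wc_map and wc_reduce are illustrative.

```python
def wc_map(line):
    # Emit (word, 1) for every word on the line, e.g. ("CAT", 1).
    for word in line.split():
        yield word, 1

def wc_reduce(word, counts):
    # counts is the grouped list [1, 1, ..., 1]; sum it.
    yield word, sum(counts)

lines = ["the cat sat", "the cat ran"]
print(map_reduce(lines, wc_map, wc_reduce))
# [('cat', 2), ('ran', 1), ('sat', 1), ('the', 2)]
```

As the slide notes, because addition is commutative and associative, wc_map could equally emit per-line partial sums before the shuffle (a combiner) without changing the result.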

Partition Function Inputs to map tasks are created by contiguous splits of input file For reduce, we need to ensure that records with the same intermediate key end up at the same worker System uses a default partition function e.g., hash(key) mod R Sometimes useful to override – E.g., hash(hostname(URL)) mod R ensures URLs from a host end up in the same output file Slide based on
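A small sketch of the default and overridden partition functions just described; the bucket count NUM_REDUCERS and the use of urllib to extract the hostname are illustrative assumptions, not part of any particular framework's API.

```python
from urllib.parse import urlparse

NUM_REDUCERS = 8  # "R" in the slide: the number of reduce tasks

def default_partition(key):
    # Default partition function: hash(key) mod R.
    # (Python's string hash is salted per process, but it is
    #  consistent within a single run, which is all that matters here.)
    return hash(key) % NUM_REDUCERS

def url_host_partition(url):
    # Override: hash(hostname(URL)) mod R, so that all URLs from the
    # same host end up in the same reducer's output file.
    return hash(urlparse(url).hostname) % NUM_REDUCERS

print(url_host_partition("http://example.com/a") ==
      url_host_partition("http://example.com/b"))  # True: same bucket
```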

Coordination Master data structures – Task status: (idle, in-progress, completed) – Idle tasks get scheduled as workers become available – When a map task completes, it sends the master the location and sizes of its R intermediate files, one for each reducer – Master pushes this info to reducers Master pings workers periodically to detect failures Slide based on

Data flow Input, final output are stored on a distributed file system – Scheduler tries to schedule map tasks “close” to physical storage location of input data Intermediate results are stored on local FS of map and reduce workers Output is often input to another map-reduce task Slide based on

Failures Map worker failure – Map tasks completed or in-progress at worker are reset to idle (result sits locally at worker) – Reduce workers are notified when task is rescheduled on another worker Reduce worker failure – Only in-progress tasks are reset to idle Master failure – Map-reduce task is aborted and client is notified Slide based on

How many Map and Reduce jobs? M map tasks, R reduce tasks Rule of thumb: – Make M and R much larger than the number of nodes in cluster – One DFS chunk per map is common – Improves dynamic load balancing and speeds recovery from worker failure Usually R is smaller than M, because output is spread across R files Slide based on

Relational operators with map-reduce Selection Map task: if C(t) is true, output the pair (t,t) Reduce task: with input (t,[t]), output t Selection is not really a natural fit for map-reduce; everything could have been done in the map task
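A minimal sketch of selection as a map-reduce job, using the map_reduce helper from earlier; the predicate C used here (second attribute greater than 10) is an arbitrary illustrative choice.

```python
def selection_map(t):
    # C(t): keep tuples whose second attribute exceeds 10 (illustrative).
    if t[1] > 10:
        yield t, t

def selection_reduce(t, values):
    # Input is (t, [t]); simply output t -- the map already did the work.
    yield t

R = [(1, 5), (2, 20), (3, 30)]
print(map_reduce(R, selection_map, selection_reduce))  # [(2, 20), (3, 30)]
```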

Relational operators with map-reduce Projection Map task: Let t’ be the projection of t. Output pair (t’,t’) Reduce task: With input (t’,[t’,t’,…,t’] ) output t’ Here the duplicate elimination is done by the reduce task
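A matching sketch of projection, where the reduce step performs the duplicate elimination; projecting onto the first attribute is an illustrative choice, and map_reduce is the toy helper from earlier.

```python
def projection_map(t):
    # t' is the projection of t onto its first attribute.
    t_proj = (t[0],)
    yield t_proj, t_proj

def projection_reduce(t_proj, values):
    # Input is (t', [t', t', ..., t']); emitting t' once eliminates duplicates.
    yield t_proj

R = [(1, "a"), (1, "b"), (2, "c")]
print(map_reduce(R, projection_map, projection_reduce))  # [(1,), (2,)]
```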

Relational operators with map-reduce Union R ∪ S Map task: for each tuple t of the chunk of R or S output (t, t) Reduce task: input is (t,[t]) or (t,[t, t]). Output t Intersection R ∩ S Map task: for each tuple t of the chunk output (t,t) Reduce task: if input is (t,[t,t]), output t; if input is (t,[t]), output nothing Difference R – S Map task: for each tuple t of R output (t,R); for each tuple t of S output (t,S) Reduce task: if input is (t,[R]), output t; if input is (t,[R,S]), output nothing
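A sketch of the three set operations with the same map_reduce helper. Tagging tuples with their source relation ("R" or "S") is only needed for the difference, and R and S are assumed to be sets (no duplicates within a relation), matching the slide's reasoning about list lengths.

```python
# Union and intersection: map every tuple t (from R or S) to (t, t).
def pair_map(t):
    yield t, t

def union_reduce(t, values):
    # Input is (t, [t]) or (t, [t, t]); output t either way.
    yield t

def intersection_reduce(t, values):
    # Output t only if it arrived from both R and S.
    if len(values) == 2:
        yield t

# Difference R - S: the map tags each tuple with its source relation.
def tagged_map(rel_and_tuple):
    rel, t = rel_and_tuple
    yield t, rel

def difference_reduce(t, tags):
    # Keep t only if it came from R and never from S.
    if "R" in tags and "S" not in tags:
        yield t

R = [(1,), (2,), (3,)]
S = [(2,), (4,)]
print(map_reduce(R + S, pair_map, union_reduce))         # [(1,), (2,), (3,), (4,)]
print(map_reduce(R + S, pair_map, intersection_reduce))  # [(2,)]
tagged = [("R", t) for t in R] + [("S", t) for t in S]
print(map_reduce(tagged, tagged_map, difference_reduce)) # [(1,), (3,)]
```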

Joining by Map-Reduce Suppose we want to compute R(A,B) JOIN S(B,C), using k Reduce tasks. – I.e., find tuples with matching B-values. R and S are each stored in a chunked file. Use a hash function h from B-values to k buckets. – Bucket = Reduce task. The Map tasks take chunks from R and S, and send: – Tuple R(a,b) to Reduce task h(b). Key = b value = R(a,b). – Tuple S(b,c) to Reduce task h(b). Key = b; value = S(b,c). Slide based on

Reduce task i receives R(a,b) and S(b,c) tuples from the Map tasks whenever h(b) = i, and produces all (a,b,c) such that h(b) = i, (a,b) is in R, and (b,c) is in S. Key point: if R(a,b) joins with S(b,c), then both tuples are sent to Reduce task h(b). Thus, their join (a,b,c) will be produced there and shipped to the output file. Slide based on

Mapping tuples in joins: the mapper for R(1,2) emits (2, (R,1)); the mapper for R(4,2) emits (2, (R,4)); the mapper for S(2,3) emits (2, (S,3)); the mapper for S(5,6) emits (5, (S,6)). The reducer for B = 2 receives (2, [(R,1), (R,4), (S,3)]); the reducer for B = 5 receives (5, [(S,6)]). Slide based on

Output of the Reducers: the reducer for B = 2, with input (2, [(R,1), (R,4), (S,3)]), outputs the joined tuples (1,2,3) and (4,2,3); the reducer for B = 5, with input (5, [(S,6)]), has no matching R tuples and outputs nothing. Slide based on
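A sketch of this R(A,B) ⋈ S(B,C) join on the same four tuples. The map_reduce helper's grouping stands in for the hash function h that routes each B-value to one reduce task; tagging values with "R" or "S" records which relation they came from.

```python
def join_map(rec):
    # rec is ("R", (a, b)) or ("S", (b, c)); the key is the B-value.
    rel, t = rec
    if rel == "R":
        a, b = t
        yield b, ("R", a)
    else:
        b, c = t
        yield b, ("S", c)

def join_reduce(b, values):
    # Pair every R-value a with every S-value c sharing this b.
    r_vals = [v for tag, v in values if tag == "R"]
    s_vals = [v for tag, v in values if tag == "S"]
    for a in r_vals:
        for c in s_vals:
            yield a, b, c

recs = [("R", (1, 2)), ("R", (4, 2)), ("S", (2, 3)), ("S", (5, 6))]
print(map_reduce(recs, join_map, join_reduce))  # [(1, 2, 3), (4, 2, 3)]
```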

Relational operators with map-reduce Grouping and aggregation: γ_{A, agg(B)}(R(A,B,C)) Map task: for each tuple (a,b,c) output (a, b) Reduce task: if input is (a, [b1, b2, …, bn]), output (a, agg(b1, b2, …, bn)), for example (a, b1 + b2 + … + bn)
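A sketch of grouping and aggregation with SUM standing in for the aggregate agg; as above, the map_reduce helper's grouping collects the list [b1, …, bn] for each key a.

```python
def group_map(t):
    a, b, c = t
    # The grouping attribute a is the key; the C attribute is dropped.
    yield a, b

def sum_reduce(a, bs):
    # bs is [b1, b2, ..., bn]; apply the aggregate, here SUM.
    yield a, sum(bs)

R = [(1, 10, "x"), (1, 5, "y"), (2, 7, "z")]
print(map_reduce(R, group_map, sum_reduce))  # [(1, 15), (2, 7)]
```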

Matrix-vector multiplication using map-reduce: compute x = M v, where x_i = Σ_{j=1}^{n} m_ij v_j.

If the vector doesn’t fit in main memory: divide the matrix and the vector into stripes. Each map task gets a chunk of one stripe of the matrix together with the entire corresponding stripe of the vector, and produces pairs (i, m_ij v_j) for each matrix element m_ij in its chunk. Reduce task i gets all pairs with key i and produces the pair (i, x_i), where x_i = Σ_j m_ij v_j. Slide based on
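A minimal sketch of the map and reduce functions for x = Mv, assuming (unlike the striped scheme above) that the whole vector fits in each map task's memory; the input is a list of nonzero matrix entries (i, j, m_ij), and map_reduce is the toy helper from earlier.

```python
# Vector v, assumed small enough for every map task to hold in memory
# (the striping described above relaxes exactly this assumption).
v = {0: 1.0, 1: 2.0, 2: 3.0}

def mv_map(entry):
    # entry is one nonzero matrix element (i, j, m_ij).
    i, j, m_ij = entry
    yield i, m_ij * v[j]

def mv_reduce(i, products):
    # x_i = sum over j of m_ij * v_j
    yield i, sum(products)

M = [(0, 0, 1.0), (0, 2, 2.0), (1, 1, 4.0)]   # sparse matrix entries
print(map_reduce(M, mv_map, mv_reduce))       # [(0, 7.0), (1, 8.0)]
```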

Example: a small worked matrix-vector product showing the mapper outputs and the reducer inputs (figure on the slide).

Examples: finding bit-strings at Hamming distance 1; matrix multiplication in one MR round; matrix multiplication in two MR rounds; three-way joins in two rounds and in one round.

Relational operators with map-reduce Three-Way Join We shall consider a simple join of three relations, the natural join R(A,B) ⋈ S(B,C) ⋈ T(C,D). One way: cascade of two 2-way joins, each implemented by map-reduce. Fine, unless the 2-way joins produce large intermediate relations. Slide based on

Another 3-Way Join: reduce processes are keyed by the pair of hash values (h(b), h(c)) of an S(B,C) tuple. Choose a hash function h that maps B- and C-values to k buckets. There are k² Reduce processes, one for each (B-bucket, C-bucket) pair. Slide based on

Job of the Reducers: each reducer gets, for certain B-values b and C-values c: 1. all tuples from R with B = b, 2. all tuples from T with C = c, and 3. the tuple S(b,c) if it exists. Thus it can create every tuple of the form (a, b, c, d) in the join. Slide based on

Mapping for 3-Way Join: We map each tuple S(b,c) to key (h(b), h(c)) with value (S, b, c). We map each tuple R(a,b) to keys (h(b), y), for all y = 1, 2, …, k, with value (R, a, b). We map each tuple T(c,d) to keys (x, h(c)), for all x = 1, 2, …, k, with value (T, c, d). Aside: even normal map-reduce allows inputs to map to several key-value pairs. Slide based on

Assigning Tuples to Reducers: the k² reducers form a grid indexed by (h(b), h(c)). For example, an S(b,c) tuple with h(b) = 1 and h(c) = 2 goes only to reducer (1,2); an R(a,b) tuple with h(b) = 2 goes to all reducers (2, y); a T(c,d) tuple with h(c) = 3 goes to all reducers (x, 3). Slide based on

Example with k = 2 (and h the identity on {1, 2}): DB = R(1,1), R(1,2), R(2,1), R(2,2), S(1,1), S(1,2), S(2,1), S(2,2), T(1,1), T(1,2), T(2,1), T(2,2), … etc. The four reducers receive: reducer (1,1): R(1,1), R(2,1), S(1,1), T(1,1), T(1,2); reducer (1,2): R(1,1), R(2,1), S(1,2), T(2,1), T(2,2); reducer (2,1): R(1,2), R(2,2), S(2,1), T(1,1), T(1,2); reducer (2,2): R(1,2), R(2,2), S(2,2), T(2,1), T(2,2). For example, the mapper for R(1,1) emits (1,1,(R,1)) and (1,2,(R,1)); the mapper for S(1,2) emits (1,2,(S,1,2)).
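A sketch of the one-round three-way join for k = 2, matching the example above. The toy hash h (identity on {1, 2}) and the map_reduce helper, whose grouping stands in for the k² reduce processes, are illustrative assumptions.

```python
k = 2

def h(x):
    # Toy hash into buckets 1..k; identity on {1, 2} when k = 2.
    return ((x - 1) % k) + 1

def three_way_map(rec):
    rel, t = rec
    if rel == "S":                       # S(b,c) goes to one reducer
        b, c = t
        yield (h(b), h(c)), ("S", b, c)
    elif rel == "R":                     # R(a,b) goes to all keys (h(b), y)
        a, b = t
        for y in range(1, k + 1):
            yield (h(b), y), ("R", a, b)
    else:                                # T(c,d) goes to all keys (x, h(c))
        c, d = t
        for x in range(1, k + 1):
            yield (x, h(c)), ("T", c, d)

def three_way_reduce(key, values):
    # Join the R(a,b), S(b,c), T(c,d) tuples that met at this reducer.
    rs = [v[1:] for v in values if v[0] == "R"]
    ss = [v[1:] for v in values if v[0] == "S"]
    ts = [v[1:] for v in values if v[0] == "T"]
    for (a, b) in rs:
        for (b2, c) in ss:
            for (c2, d) in ts:
                if b == b2 and c == c2:
                    yield a, b, c, d

recs = ([("R", (a, b)) for a in (1, 2) for b in (1, 2)] +
        [("S", (b, c)) for b in (1, 2) for c in (1, 2)] +
        [("T", (c, d)) for c in (1, 2) for d in (1, 2)])
# All 16 (a,b,c,d) combinations, since every b and c value matches here.
print(map_reduce(recs, three_way_map, three_way_reduce))
```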