MapReduce for the Cell B.E. Architecture
Marc de Kruijf, Department of Computer Science, University of Wisconsin-Madison
Advised by Professor Sankaralingam



MapReduce

- A model for parallel programming, proposed by Google
- Targets large-scale distributed systems: 1,000-node clusters
- Applications: distributed sort, distributed grep, indexing
- Simple, high-level interface
- Runtime handles parallelization, scheduling, synchronization, and communication

Cell B.E. Architecture

- A heterogeneous computing platform: 1 PPE, 8 SPEs
- Programming is hard:
  - Multi-threading is explicit
  - SPE local memories are software-managed
- The Cell is like a "cluster-on-a-chip"

Motivation

- MapReduce: scalable parallel model, simple interface
- Cell B.E.: complex parallel architecture, hard to program
- Combining the two: MapReduce for the Cell B.E. architecture

Overview

- Motivation
- MapReduce
- Cell B.E. Architecture
- MapReduce Example
- Design
- Evaluation
  - Workload Characterization
  - Application Performance
- Conclusions and Future Work

MapReduce Example

Counting word occurrences in a set of documents:


Design: Flow of Execution

Five stages: Map, Partition, Quick-sort, Merge-sort, Reduce

1. Map streams key/value pairs.
2. Partition: hash and distribute keys.
3. Quick-sort.
4. Merge-sort.
5. Reduce "reduces" key/list-of-values pairs to key/value pairs.

Stages 2-4 implement key grouping as a two-phase external sort.


Evaluation Methodology

MapReduce model characterization:
- Synthetic micro-benchmark with six parameters
- Run on a 3.2 GHz Cell blade
- Measured the effect of each parameter on execution time

Application performance comparison:
- Six full applications
- MapReduce versions run on a 3.2 GHz Cell blade
- Single-threaded versions run on a 2.4 GHz Core 2 Duo
- Measured speedup by comparing execution times
- Measured overheads on the Cell by monitoring SPE idle time
- Measured ideal speedup assuming no Cell overheads

MapReduce Model Characterization

Characteristic     Description
Map intensity      Execution cycles per input byte to Map
Reduce intensity   Execution cycles per input byte to Reduce
Map fan-out        Ratio of input size to output size in Map
Reduce fan-in      Number of values per key in Reduce
Partitions         Number of partitions
Input size         Input size in bytes

(Figure: effect of each parameter on execution time)

Application Performance

Applications:
- histogram: counts bitmap RGB occurrences
- kmeans: clustering algorithm
- linearReg: least-squares linear regression
- wordCount: word count
- NAS_EP: EP benchmark from the NAS suite
- distSort: distributed sort

Speedup Over Core 2 Duo

Runtime Overheads


Conclusions and Future Work

Conclusions:
- Programmability benefits
- High performance on computationally intensive workloads
- Not applicable to all application types

Future work:
- Additional performance tuning
- Extension to clusters of Cell processors: hierarchical MapReduce

Questions?

Backup Slides

MapReduce API

void MapReduce_exec(MapReduceSpecification specification);

The exec function initializes the MapReduce runtime and executes MapReduce according to the user specification.

void MapReduce_emitIntermediate(void **key, void **value);
void MapReduce_emit(void **value);

These two functions are called by the user-defined Map and Reduce functions, respectively. They take references to pointers as arguments and modify the referenced pointer to point to pre-allocated storage; the application is then responsible for filling this storage with its output.

Optimizations (1)

1. Priority work queue:
   - Distributes load
   - Avoids serialization
   - Pipelined execution maximizes concurrency
2. Double-buffering
3. Application support:
   - Map only
   - Map with sorted output
   - Chaining invocations


Optimizations (2)

4. Balanced merge (n / log(n) better bandwidth utilization as n → ∞)
5. Map and Reduce output regions pre-allocated:
   - Optimal memory alignment
   - Bulk memory transfers
   - No user memory management
   - No dynamic allocation overhead