Department of Computer Science MapReduce for the Cell B. E. Architecture Marc de Kruijf University of Wisconsin−Madison Advised by Professor Sankaralingam.

1 MapReduce for the Cell B.E. Architecture
Marc de Kruijf
Department of Computer Science, University of Wisconsin−Madison
Advised by Professor Sankaralingam

2 MapReduce
- A model for parallel programming
- Proposed by Google
- Large-scale distributed systems: 1,000-node clusters
- Applications: distributed sort, distributed grep, indexing
- Simple, high-level interface
- Runtime handles parallelization, scheduling, synchronization, and communication

3 Cell B.E. Architecture
- A heterogeneous computing platform: 1 PPE, 8 SPEs
- Programming is hard
  - Multi-threading is explicit
  - SPE local memories are software-managed
- The Cell is like a “cluster-on-a-chip”

4 Motivation
- MapReduce: scalable parallel model, simple interface
- Cell B.E.: complex parallel architecture, hard to program
- MapReduce for the Cell B.E. Architecture

5 Overview
- Motivation
  - MapReduce
  - Cell B.E. Architecture
- MapReduce Example
- Design
- Evaluation
  - Workload Characterization
  - Application Performance
- Conclusions and Future Work

6 MapReduce Example
Counting word occurrences in a set of documents:
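The slide's own example did not survive in the transcript. As a stand-in, here is a minimal single-threaded sketch of word counting in the MapReduce style. Every name in it (`Pair`, `wc_map`, `wc_reduce`, `word_count`) and the fixed capacities are assumptions made for illustration, not the presentation's code. Map emits a (word, 1) pair per word, equal keys are grouped by sorting, and Reduce sums each group.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

#define MAX_PAIRS 256   /* illustrative capacity, not from the slides */
#define MAX_WORD  32

/* One key/value pair: a word and a count. */
typedef struct {
    char key[MAX_WORD];
    int  value;
} Pair;

/* Map: emit (word, 1) for every whitespace-separated word in one document.
   Appends to out starting at index n and returns the new pair count. */
size_t wc_map(const char *doc, Pair *out, size_t n)
{
    while (*doc != '\0') {
        while (*doc != '\0' && isspace((unsigned char)*doc)) doc++;
        size_t len = 0;
        while (doc[len] != '\0' && !isspace((unsigned char)doc[len])) len++;
        if (len == 0) break;                       /* only trailing spaces left */
        assert(n < MAX_PAIRS && len < MAX_WORD);   /* sketch: fixed capacity */
        memcpy(out[n].key, doc, len);
        out[n].key[len] = '\0';
        out[n].value = 1;
        n++;
        doc += len;
    }
    return n;
}

/* Comparison function used to group equal keys next to each other. */
int by_key(const void *a, const void *b)
{
    return strcmp(((const Pair *)a)->key, ((const Pair *)b)->key);
}

/* Reduce: sum the list of values that belong to a single key. */
int wc_reduce(const Pair *values, size_t count)
{
    int sum = 0;
    for (size_t i = 0; i < count; i++) sum += values[i].value;
    return sum;
}

/* Driver: map every document, sort to group keys, reduce each group.
   Writes one pair per distinct word to result and returns how many. */
size_t word_count(const char *const *docs, size_t ndocs, Pair *result)
{
    static Pair inter[MAX_PAIRS];   /* intermediate key/value pairs */
    size_t n = 0, nres = 0;
    for (size_t d = 0; d < ndocs; d++) n = wc_map(docs[d], inter, n);
    qsort(inter, n, sizeof inter[0], by_key);
    for (size_t i = 0; i < n; ) {
        size_t j = i;
        while (j < n && strcmp(inter[j].key, inter[i].key) == 0) j++;
        result[nres] = inter[i];
        result[nres].value = wc_reduce(&inter[i], j - i);
        nres++;
        i = j;
    }
    return nres;
}
```

For the two documents "the cat sat" and "the dog sat on the cat", `word_count` yields five pairs in key order: cat 2, dog 1, on 1, sat 2, the 3.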

8 Design: Flow of Execution
Five stages: Map, Partition, Quick-sort, Merge-sort, Reduce

13 Design: Flow of Execution
Five stages: Map, Partition, Quick-sort, Merge-sort, Reduce
1. Map streams key/value pairs
Key grouping is implemented as stages 2 to 4; Quick-sort and Merge-sort together form a two-phase external sort:
2. Partition: hash and distribute
3. Quick-sort
4. Merge-sort
5. Reduce “reduces” key/list-of-values pairs to key/value pairs
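The grouping stages can be sketched on a single thread as follows. This is an illustration under stated assumptions, not the presentation's runtime: the names (`KV`, `partition_of`, `sort_runs`, `merge_runs`, `group_partition`, `reduce_sum`), the modulo hash, and the run length are all hypothetical. The standard library's `qsort` stands in for the Quick-sort stage, and the Reduce shown simply sums values.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct { unsigned key; int value; } KV;

/* Illustrative sizes; RUN_LEN stands in for what fits in a local memory. */
enum { NUM_PARTITIONS = 2, RUN_LEN = 4, MAX_KV = 64 };

/* Stage 2 (Partition): choose a partition by hashing the key.
   A plain modulo is used here only to keep the sketch predictable. */
unsigned partition_of(unsigned key) { return key % NUM_PARTITIONS; }

int cmp_kv(const void *a, const void *b)
{
    unsigned x = ((const KV *)a)->key, y = ((const KV *)b)->key;
    return (x > y) - (x < y);
}

/* Stage 3 (first sort phase): sort each fixed-size run independently. */
void sort_runs(KV *buf, size_t n)
{
    for (size_t start = 0; start < n; start += RUN_LEN) {
        size_t len = (n - start < RUN_LEN) ? n - start : RUN_LEN;
        qsort(buf + start, len, sizeof *buf, cmp_kv);
    }
}

/* Merge the sorted ranges [lo, mid) and [mid, hi) of src into dst. */
void merge(const KV *src, KV *dst, size_t lo, size_t mid, size_t hi)
{
    size_t i = lo, j = mid, k = lo;
    while (i < mid && j < hi)
        dst[k++] = (src[j].key < src[i].key) ? src[j++] : src[i++];
    while (i < mid) dst[k++] = src[i++];
    while (j < hi)  dst[k++] = src[j++];
}

/* Stage 4 (second sort phase): merge neighbouring runs, doubling the
   run width on every pass until one sorted run remains. */
void merge_runs(KV *buf, KV *tmp, size_t n)
{
    for (size_t width = RUN_LEN; width < n; width *= 2) {
        for (size_t lo = 0; lo < n; lo += 2 * width) {
            size_t mid = (lo + width < n) ? lo + width : n;
            size_t hi  = (lo + 2 * width < n) ? lo + 2 * width : n;
            merge(buf, tmp, lo, mid, hi);
        }
        memcpy(buf, tmp, n * sizeof *buf);
    }
}

/* Stages 2 to 4 for one partition p: collect its pairs and sort them by key.
   out must have room for n pairs; returns the number of pairs collected. */
size_t group_partition(const KV *in, size_t n, unsigned p, KV *out)
{
    KV tmp[MAX_KV];
    size_t m = 0;
    assert(n <= MAX_KV);
    for (size_t i = 0; i < n; i++)
        if (partition_of(in[i].key) == p) out[m++] = in[i];
    sort_runs(out, m);
    merge_runs(out, tmp, m);
    return m;
}

/* Stage 5 (Reduce): collapse each key's list of values to one value (a sum). */
size_t reduce_sum(const KV *sorted, size_t n, KV *out)
{
    size_t r = 0;
    for (size_t i = 0; i < n; ) {
        KV acc = { sorted[i].key, 0 };
        while (i < n && sorted[i].key == acc.key) acc.value += sorted[i++].value;
        out[r++] = acc;
    }
    return r;
}
```

Sorting short runs first and merging afterwards mirrors the two-phase external sort named on the slide: each run can be sorted wherever it fits, and only the merge needs to see more than one run at a time.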

15 Evaluation Methodology
MapReduce Model Characterization
- Synthetic micro-benchmark with six parameters
- Run on a 3.2 GHz Cell Blade
- Measured effect of each parameter on execution time
Application Performance Comparison
- Six full applications
- MapReduce versions run on a 3.2 GHz Cell Blade
- Single-threaded versions run on a 2.4 GHz Core 2 Duo
Evaluation
- Measured speedup by comparing execution times
- Measured overheads on the Cell by monitoring SPE idle time
- Measured ideal speedup assuming no Cell overheads

16 MapReduce Model Characterization
Model Characteristics
Characteristic     Description
Map intensity      Execution cycles per input byte to Map
Reduce intensity   Execution cycles per input byte to Reduce
Map fan-out        Ratio of input size to output size in Map
Reduce fan-in      Number of values per key in Reduce
Partitions         Number of partitions
Input size         Input size in bytes
Effect on Execution Time

17 Application Performance
Applications
- histogram: counts bitmap RGB occurrences
- kmeans: clustering algorithm
- linearReg: least-squares linear regression
- wordCount: word count
- NAS_EP: EP benchmark from NAS suite
- distSort: distributed sort

18 Speedup Over Core 2 Duo

19 Runtime Overheads

21 Conclusions and Future Work
Conclusions
- Programmability benefits
- High performance on computationally intensive workloads
- Not applicable to all application types
Future Work
- Additional performance tuning
- Extend for clusters of Cell processors
- Hierarchical MapReduce

22 Questions?

23 Backup Slides

24 MapReduce API
void MapReduce_exec(MapReduceSpecification specification);
The exec function initializes the MapReduce runtime and executes MapReduce according to the user specification.
void MapReduce_emitIntermediate(void **key, void **value);
void MapReduce_emit(void **value);
These two functions are called by the user-defined Map and Reduce functions, respectively. They take references to pointers as arguments and modify each referenced pointer to point to pre-allocated storage. It is then the responsibility of the application to provision this storage.
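To make the calling convention concrete, here is a hedged sketch of a Map function using `MapReduce_emitIntermediate`. Only the function's signature comes from the slide. Its body below is a hypothetical stand-in for the runtime (fixed-size slots in a static region), `histogram_map` and its signature are invented for the example, and the sketch reads "provision this storage" as "fill in the storage the runtime hands back".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the runtime's pre-allocated Map output region.
   Slot size and slot count are arbitrary choices for this sketch. */
enum { SLOT_SIZE = 16, MAX_SLOTS = 32 };
static char   key_region[MAX_SLOTS][SLOT_SIZE];
static char   value_region[MAX_SLOTS][SLOT_SIZE];
static size_t slots_used = 0;

/* Signature as on the slide; this body is an assumption, not the real
   runtime. It points *key and *value at the next free pre-allocated slot. */
void MapReduce_emitIntermediate(void **key, void **value)
{
    assert(slots_used < MAX_SLOTS);
    *key   = key_region[slots_used];
    *value = value_region[slots_used];
    slots_used++;
}

/* Hypothetical user Map function: for every input byte, emit the pair
   (byte value, 1). The application writes into the storage it was given
   rather than allocating its own. */
void histogram_map(const unsigned char *input, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        void *key, *value;
        MapReduce_emitIntermediate(&key, &value);   /* runtime picks storage */
        unsigned char k = input[i];
        int one = 1;
        memcpy(key, &k, sizeof k);                  /* application fills it */
        memcpy(value, &one, sizeof one);
    }
}
```

Because the runtime chooses where each pair lives, the user code never allocates or frees memory for its output; it only writes through the pointers it receives.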

25 Optimizations
1) Priority work queue
   - Distributes load
   - Avoids serialization
   - Pipelined execution maximizes concurrency
2) Double-buffering
3) Application support
   - Map only
   - Map with sorted output
   - Chaining invocations
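Double-buffering can be illustrated with a small host-side simulation. This is a sketch under loud assumptions: `fetch_chunk` is a hypothetical stand-in for an asynchronous DMA transfer into SPE local store and is just a synchronous copy here, so the code shows only how two buffers alternate, not any real overlap of transfer and compute. `sum_double_buffered` and `CHUNK` are likewise invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { CHUNK = 4 };   /* elements per local buffer; arbitrary for this sketch */

/* Stand-in for starting a DMA transfer from main memory into a local buffer.
   On real hardware the transfer would run while the SPE keeps computing;
   here it is a plain copy. Returns the number of elements fetched. */
size_t fetch_chunk(int *local, const int *main_mem, size_t n, size_t offset)
{
    if (offset >= n) return 0;
    size_t len = (n - offset < CHUNK) ? n - offset : CHUNK;
    memcpy(local, main_mem + offset, len * sizeof *local);
    return len;
}

/* Sum an array chunk by chunk using two alternating buffers. */
long sum_double_buffered(const int *main_mem, size_t n)
{
    int    buf[2][CHUNK];
    size_t len[2];
    int    cur = 0;
    long   total = 0;
    size_t offset = 0;

    len[cur] = fetch_chunk(buf[cur], main_mem, n, offset);   /* prime buffer 0 */
    while (len[cur] > 0) {
        int next = 1 - cur;
        offset += len[cur];
        /* Request the next chunk before touching the current one. */
        len[next] = fetch_chunk(buf[next], main_mem, n, offset);
        /* Process the current chunk while the next one would be in flight. */
        for (size_t i = 0; i < len[cur]; i++) total += buf[cur][i];
        cur = next;                                           /* swap buffers */
    }
    return total;
}
```

The point of the pattern is the ordering inside the loop: the request for chunk k+1 is issued before chunk k is processed, so with a truly asynchronous transfer the processor would rarely wait on memory.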

27 Optimizations
4) Balanced merge (n / log(n) better bandwidth utilization as n → ∞)
5) Map and Reduce output regions pre-allocated
   - Optimal memory alignment
   - Bulk memory transfers
   - No user memory management
   - No dynamic allocation overhead
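One way to read the n / log(n) figure for balanced merge, offered as an interpretation rather than the author's derivation: assume n sorted runs of equal size s, and count the data moved when merging them one at a time into a growing result versus pairwise in a balanced tree (with n a power of two).

```latex
% Assumption: n sorted runs, each of size s; cost = elements moved.
% Unbalanced: merge run 2 into run 1, then run 3 into the result, and so on.
\[
  C_{\text{unbalanced}} \;=\; s \sum_{k=2}^{n} k \;=\; s\left(\frac{n(n+1)}{2} - 1\right)
\]
% Balanced: \log_2 n passes, each moving all n s elements once.
\[
  C_{\text{balanced}} \;=\; n\,s\,\log_2 n
\]
% Ratio of data moved:
\[
  \frac{C_{\text{unbalanced}}}{C_{\text{balanced}}}
  \;=\; \frac{\frac{n(n+1)}{2} - 1}{n \log_2 n}
  \;\approx\; \frac{n}{2 \log_2 n}
  \;=\; \Theta\!\left(\frac{n}{\log n}\right)
  \quad \text{as } n \to \infty
\]
```

For n = 8 this gives 35s against 24s, and the gap widens as the number of runs grows.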
