Concurrent Algorithms

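Summing the elements of an array
[Figure: summation tree. Leaves 7 3 15 10 13 18 6 4; pairwise sums 10 25 31 10; then 35 41; root 76.]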

Parallel sum and parallel prefix sum
- It’s relatively easy to see how to sum the elements of an array in parallel.
  - This is a special case of a reduce operation: combining a number of values into a single value.
- It’s harder to see how to do a prefix (cumulative) sum.
  - For example, turning the list [3, 1, 4, 1, 6] into [3, 4, 8, 9, 15].
  - This is a special case of what is sometimes called a scan operation.
  - An example is shown on the next slide.
- The algorithm is done in two passes:
  - The first pass is “up” the tree, retaining the summands.
  - The second pass is “down” the tree.
- Note: These two examples are from Principles of Parallel Programming by Calvin Lin and Lawrence Snyder.
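The reduce described above can be sketched as a level-by-level pairwise sum. This is a minimal illustration, not the textbook's code: the function name `tree_sum` is hypothetical, and the sample array is the eight leaf values that appear to belong to this deck's summation-tree figure. A thread pool stands in for the parallel processors; in CPython it shows the structure of the algorithm rather than a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def tree_sum(values):
    """Sum `values` by repeated pairwise addition, one tree level per pass."""
    level = list(values)
    if not level:
        return 0
    with ThreadPoolExecutor() as pool:
        while len(level) > 1:
            # Split the current level into adjacent pairs (the last one may be a single).
            pairs = [level[i:i + 2] for i in range(0, len(level), 2)]
            # The additions within one level do not depend on each other,
            # so they could all run at the same time.
            level = list(pool.map(sum, pairs))
    return level[0]


print(tree_sum([7, 3, 15, 10, 13, 18, 6, 4]))  # prints 76
```

Each pass halves the number of values, so n values need about log2(n) passes, and each pass uses fewer workers than the one before.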

Summing the elements of an array
[Figure: tree annotated with prefix sums. Surviving labels: (0+35), (0+10), 66 (35+31), (10+15), (41+13).]
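The two-pass prefix sum can be sketched as follows, assuming a binary tree built over the array. The names `prefix_sum`, `up`, and `down` are illustrative, and the recursion runs sequentially here; in a parallel setting the two recursive calls at each node are independent and could run concurrently.

```python
def prefix_sum(values):
    """Inclusive prefix sum computed with an up pass and a down pass over a tree."""
    values = list(values)
    result = [0] * len(values)

    def up(lo, hi):
        # Up pass: build a node (sum, left, right) for values[lo:hi],
        # retaining the partial sums of both halves.
        if hi - lo == 1:
            return (values[lo], None, None)
        mid = (lo + hi) // 2
        left, right = up(lo, mid), up(mid, hi)
        return (left[0] + right[0], left, right)

    def down(node, lo, hi, carry):
        # Down pass: `carry` is the sum of everything to the left of this node.
        total, left, right = node
        if left is None:
            result[lo] = carry + total
            return
        mid = (lo + hi) // 2
        down(left, lo, mid, carry)             # left child inherits the carry
        down(right, mid, hi, carry + left[0])  # right child adds the left sum

    if values:
        down(up(0, len(values)), 0, len(values), 0)
    return result


print(prefix_sum([3, 1, 4, 1, 6]))  # prints [3, 4, 8, 9, 15]
```

The right child of the root receives 0 plus the left subtree's sum, which matches the pattern of labels such as (0+35) that survive from the figure.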

Using parallel prefix sum to filter
- Apply the filter operation to each element of the sequence (in parallel), yielding 1’s and 0’s.
- Starting with -1, perform prefix sum on the resultant list.
- The first element for each sum is to be kept; the sum is that element’s index in the result.
- Example: Selecting only even numbers gives a[0]=4, a[1]=2, a[2]=6, a[3]=6.
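A minimal sketch of the filtering steps above. The input list is hypothetical (the slide's input did not survive), chosen so that the result matches the slide's output of 4, 2, 6, 6. `filter_by_prefix_sum` is an illustrative name, and `itertools.accumulate` stands in for a parallel scan; its `initial` argument needs Python 3.8 or later.

```python
from itertools import accumulate


def filter_by_prefix_sum(values, keep):
    """Keep the elements for which keep(v) is true, placing them via a prefix sum."""
    # Step 1: apply the filter to every element, yielding 1's and 0's.
    flags = [1 if keep(v) else 0 for v in values]
    # Step 2: prefix sum starting from -1, so the first kept element gets index 0.
    positions = list(accumulate(flags, initial=-1))[1:]
    # Step 3: every flagged element is the first one to reach its sum,
    # and that sum is its index in the output.
    out = [None] * (positions[-1] + 1 if positions else 0)
    for value, flag, pos in zip(values, flags, positions):
        if flag:
            out[pos] = value
    return out


evens = filter_by_prefix_sum([3, 4, 2, 7, 6, 9, 6], lambda v: v % 2 == 0)
print(evens)  # prints [4, 2, 6, 6]
```

Because each kept element knows its destination index in advance, the final copy step has no dependencies between elements and could also run in parallel.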

Batcher’s Bitonic sort
- Batcher’s bitonic sort is a sorting algorithm with the following characteristics:
  - It’s a variation of MergeSort.
  - It’s designed for 2^n processors.
  - It fully occupies all 2^n processors.
    - Unlike array sum, which uses fewer processors on each pass.
- I’m not going to go through this algorithm; I just want you to be able to say you’ve heard of it.
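For readers who want slightly more than the name, here is a sequential sketch of the textbook form of bitonic sort, assuming the input length is a power of two. It is not taken from the slides, and the function names are illustrative. The compare-and-swap loop in `bitonic_merge` is the part that a parallel version would run on all processors at once.

```python
def bitonic_merge(a, ascending):
    """Sort a bitonic sequence (length a power of two) in the given direction."""
    if len(a) <= 1:
        return list(a)
    a = list(a)
    half = len(a) // 2
    # These compare-and-swap steps are independent of one another.
    for i in range(half):
        if (a[i] > a[i + half]) == ascending:
            a[i], a[i + half] = a[i + half], a[i]
    return bitonic_merge(a[:half], ascending) + bitonic_merge(a[half:], ascending)


def bitonic_sort(a, ascending=True):
    """Sort a list whose length is a power of two."""
    if len(a) <= 1:
        return list(a)
    half = len(a) // 2
    # Sort the halves in opposite directions so that together they are bitonic.
    first = bitonic_sort(a[:half], True)
    second = bitonic_sort(a[half:], False)
    return bitonic_merge(first + second, ascending)


print(bitonic_sort([7, 3, 15, 10, 13, 18, 6, 4]))  # prints [3, 4, 6, 7, 10, 13, 15, 18]
```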

MapReduce
- MapReduce is a patented technique perfected by Google to deal with huge data sets on clusters of computers.
- From Wikipedia:
  - "Map" step: The master node takes the input, chops it up into smaller sub-problems, and distributes those to worker nodes. A worker node may do this again in turn, leading to a multi-level tree structure. The worker node processes that smaller problem, and passes the answer back to its master node.
  - "Reduce" step: The master node then takes the answers to all the sub-problems and combines them in a way to get the output: the answer to the problem it was originally trying to solve.
- Hadoop is a free Apache implementation of MapReduce.

Basic idea of MapReduce
- In MapReduce, the programmer has to write only two functions, and the framework takes care of everything else.
- The Map function is applied (in parallel) to each item of data, producing a list of key-value pairs.
- The framework collects all the lists, and groups the key-value pairs by key.
- The Reduce function is applied (in parallel) to each group, returning either a single value, or nothing.
- The framework collects all the returns.
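The division of labor described above can be sketched as a single-process simulation. Everything here is illustrative: `map_reduce`, `word_mapper`, and `count_reducer` are hypothetical names, and a real framework would distribute the map and reduce calls across machines instead of looping.

```python
from collections import defaultdict


def map_reduce(items, mapper, reducer):
    """Run mapper on every item, group the emitted pairs by key, then reduce each group."""
    groups = defaultdict(list)
    for item in items:                      # map phase (parallel in a real framework)
        for key, value in mapper(item):
            groups[key].append(value)       # the framework groups pairs by key
    results = {}
    for key, values in groups.items():      # reduce phase (parallel per group)
        reduced = reducer(key, values)
        if reduced is not None:             # a reducer may return nothing
            results[key] = reduced
    return results


# The only two functions the programmer writes:
def word_mapper(line):
    return [(word, 1) for word in line.split()]


def count_reducer(word, ones):
    return sum(ones)


counts = map_reduce(["the cat", "the hat"], word_mapper, count_reducer)
print(counts)  # prints {'the': 2, 'cat': 1, 'hat': 1}
```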

MapReduce picture
[Figure omitted; the source URL is truncated in the original.]

Example: Counting words (Python)
- The following Python program counts how many times each word occurs in a set of data, and returns the list of words and their counts.

    def mapper(key, value):
        # The text to split arrives in the key; emit (word, '1') for every word in it.
        words = key.split()
        for word in words:
            Wmr.emit(word, '1')

    def reducer(key, iter):
        # iter yields every '1' emitted for this word; add them up.
        total = 0
        for s in iter:
            total = total + int(s)
        Wmr.emit(key, str(total))

Example: Counting words (Java)

    /* Mapper for word count */
    class Mapper {
        public void mapper(String key, String value) {
            // The text to split arrives in the key; emit (word, "1") for every word.
            String[] words = key.split(" ");
            for (int i = 0; i < words.length; i++)
                Wmr.emit(words[i], "1");
        }
    }

    /* Reducer for word count */
    class Reducer {
        public void reducer(String key, WmrIterator iter) {
            // Add up every "1" emitted for this word.
            int sum = 0;
            while (iter.hasNext()) {
                sum += Integer.parseInt(iter.next());
            }
            Wmr.emit(key, Integer.toString(sum));
        }
    }

Example: Average movie ratings

    #!/usr/bin/env python
    def mapper(key, value):
        # value holds an average rating; bin it to the nearest half point
        # and emit (bin, key). Ratings outside (0, 5] go to the 99.0 bin.
        avgRating = float(value)
        binRating = 0.0
        if (0 < avgRating < 1.25):
            binRating = 1.0
        elif (1.25 <= avgRating < 1.75):
            binRating = 1.5
        elif (1.75 <= avgRating < 2.25):
            binRating = 2.0
        elif (2.25 <= avgRating < 2.75):
            binRating = 2.5
        elif (2.75 <= avgRating < 3.25):
            binRating = 3.0
        elif (3.25 <= avgRating < 3.75):
            binRating = 3.5
        elif (3.75 <= avgRating < 4.25):
            binRating = 4.0
        elif (4.25 <= avgRating < 4.75):
            binRating = 4.5
        elif (4.75 <= avgRating <= 5.0):  # include 5.0 itself in the top bin
            binRating = 5.0
        else:
            binRating = 99.0
        Wmr.emit(str(binRating), key)

    #!/usr/bin/env python
    def reducer(key, iter):
        # Count how many keys were emitted for this rating bin.
        count = 0
        for s in iter:
            count = count + 1
        Wmr.emit(key, str(count))
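The chain of comparisons in the mapper rounds a rating to the nearest half point, with a floor of 1.0. A compact way to express the same binning is sketched below; `bin_rating` is a hypothetical helper, not part of the original program, and it assumes a rating of exactly 5.0 belongs in the top bin.

```python
import math


def bin_rating(avg_rating):
    """Round a rating in (0, 5] to the nearest 0.5 (halves round up), floor 1.0."""
    if not (0 < avg_rating <= 5.0):
        return 99.0  # same out-of-range marker as the mapper
    # Doubling turns half-point bins into integer bins; adding 0.5 before
    # floor rounds boundary values such as 1.25 up, as the mapper's ranges do.
    return max(1.0, math.floor(avg_rating * 2 + 0.5) / 2)


print([bin_rating(r) for r in (0.4, 1.25, 3.1, 4.74, 5.0, 6.0)])
# prints [1.0, 1.5, 3.0, 4.5, 5.0, 99.0]
```

The explicit chain is easier to read at a glance; the formula is shorter and has no gaps between adjacent ranges.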

The End