Data Structures and Algorithms in Parallel Computing Lecture 4

Parallel computer
A parallel computer consists of a set of processors that work together on solving a problem
Moore's-law-style frequency scaling broke down around 2007 because of increasing power consumption
– Power grows as O(f^3), where f is the clock frequency
– Tianhe-2 has 3.12 million processor cores and consumes 17.8 MW
– It is about as cheap to put 2 cores on the same chip as to put just one
– 2 cores running at f/2 consume only ¼ of the power of a single core running at f (worked out below)
– However, it is hard to achieve the same total computational speed in practice
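
A quick check of the ¼ claim, using the slide's assumption that power grows as f^3:

    P(f) ∝ f^3
    2 · P(f/2) = 2 · (f/2)^3 = 2 · f^3/8 = f^3/4 = P(f)/4

So two half-frequency cores deliver the same aggregate clock rate as one full-speed core at a quarter of the power, provided the workload parallelizes perfectly, which is exactly the hard part noted in the last bullet.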

Bulk Synchronous Parallelism
The BSP model was first proposed in 1989
Alternative to the PRAM model
Used for distributed-memory computers with fast local memory access
Algorithm developers need not worry about network details, only about global performance
The goal is efficient algorithms that can be run on many different parallel computers

BSP
Processors + network + synchronization
A superstep consists of:
– Concurrent parallel computation
– Message exchanges between processors
– Barrier synchronization: all processors reaching this point wait for the rest
(a sketch of this structure follows)
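
A minimal single-machine sketch of the superstep structure, with Python threads standing in for processors and a barrier for the synchronization point (all names here are ours; real BSP programs would use a library such as BSPlib or MPI):

    import threading

    P = 4                                  # number of "processors" (threads here)
    barrier = threading.Barrier(P)         # the BSP barrier
    inbox = [[] for _ in range(P)]         # messages delivered to each processor
    outbox = [[] for _ in range(P)]        # messages produced during a superstep

    def superstep(pid):
        # 1. computation: use only local state plus messages from the last superstep
        value = sum(inbox[pid]) + pid
        # 2. communication: send one data word to the next processor;
        #    it only becomes visible there in the next superstep
        outbox[(pid + 1) % P].append(value)
        # 3. barrier synchronization: wait until everyone finished computing/sending
        barrier.wait()

    def worker(pid, steps):
        for _ in range(steps):
            superstep(pid)
            if pid == 0:                   # one thread plays "the network":
                for q in range(P):         # deliver all messages between supersteps
                    inbox[q][:] = outbox[q]
                    outbox[q].clear()
            barrier.wait()                 # second barrier: delivery is complete

    threads = [threading.Thread(target=worker, args=(p, 3)) for p in range(P)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(inbox)                           # messages awaiting a fourth superstep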

Supersteps
A BSP algorithm is a sequence of supersteps
– Computation superstep: many small steps
  Example: floating point operations (addition, subtraction, etc.)
– Communication superstep: communication operations, each transmitting a data word
  Example: transfer a real number between 2 processors
In theory we distinguish between the 2 types of supersteps; in practice the two are merged into a single kind of superstep

Communication superstep
h-relation
– A superstep in which every processor sends and receives at most h data words
– h = max{h_s, h_r}, where
  h_s is the maximum number of data words sent by any processor
  h_r is the maximum number of data words received by any processor
Cost
– T(h) = h·g + l, where g is the time per data word and l is the global synchronization time

[Figure not preserved in transcript: measured time of an h-relation on a 4-core Apple iMac desktop; source attribution missing]

Computation superstep
T(w) = w + l
– where w is the maximum number of flops performed by any processor in the superstep, and l is the global synchronization time
– Processors with fewer than w flops sit idle until the barrier completes

Total cost of a BSP algorithm
Add the costs of all supersteps, giving a total of the form a + b·g + c·l
– g and l are functions of the number of processors
– a, b, and c depend on the number of processors p and the problem size n
(here a is the total computation work, b the sum of the h values, and c the number of supersteps; see the sketch below)
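
The whole cost model fits in a few lines. A small sketch (the function names are ours, and the machine parameters g and l in the example are made up for illustration):

    def comm_cost(h, g, l):
        """Cost of an h-relation: h data words at g time units each, plus l."""
        return h * g + l

    def comp_cost(w, l):
        """Cost of a computation superstep: at most w flops per processor, plus l."""
        return w + l

    def total_cost(supersteps, g, l):
        """supersteps: list of (w, h) pairs, one per (merged) superstep.
        Returns a + b*g + c*l with a = sum of w, b = sum of h, c = #supersteps."""
        a = sum(w for w, h in supersteps)
        b = sum(h for w, h in supersteps)
        c = len(supersteps)
        return a + b * g + c * l

    # Hypothetical machine: g = 4 and l = 100 (in units of one flop's time).
    print(total_cost([(1000, 50), (2000, 20)], g=4, l=100))  # 3000 + 280 + 200 = 3480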

BSP implementations
– Google Pregel
– MapReduce
– Apache Giraph: open-source implementation of Pregel
– Apache Hama: inspired by Pregel
– BSPlib
– …

Pregel framework
Computations consist of a sequence of iterations called supersteps
During a superstep, the framework invokes a user-defined function for each vertex, which specifies the behavior at a single vertex V in a single superstep S
The function can:
– Read messages sent to V in superstep S-1
– Send messages to other vertices, to be received in superstep S+1
– Modify the state of V and of its outgoing edges
– Make topology changes (add/remove/update edges/vertices)
(a sketch of this vertex API follows)
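
Pregel's real API is C++; the following Python sketch only mimics its shape, with method names loosely following the paper (the superstep attribute is set by the toy runner shown two slides below):

    class Vertex:
        """Pregel-style vertex for a toy, single-process runner."""
        def __init__(self, vid, value, out_edges):
            self.id = vid
            self.value = value            # mutable per-vertex state
            self.out_edges = out_edges    # ids of the targets of outgoing edges
            self.active = True            # all vertices start active (superstep 0)
            self.superstep = 0            # set by the runner before compute()
            self._outgoing = []           # messages to deliver in superstep S+1

        def compute(self, messages):
            """Override per algorithm; called once per active vertex per
            superstep with the messages sent to it in superstep S-1."""
            raise NotImplementedError

        def send_message_to(self, target, msg):
            self._outgoing.append((target, msg))

        def vote_to_halt(self):
            self.active = False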

Model of computation: Progress
In superstep 0, all vertices are active
Only active vertices participate in a superstep
– They can go inactive by voting to halt
– They can be reactivated by an incoming message from another vertex
The algorithm terminates when all vertices have voted to halt and there are no messages in transit (see the loop below)
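
A toy superstep loop implementing exactly this termination rule, driving the Vertex sketch above (run_pregel is our name, not Pregel's):

    def run_pregel(vertices):
        """vertices: dict mapping vertex id -> Vertex. Runs supersteps until
        every vertex has halted and no messages are in transit."""
        inbox = {vid: [] for vid in vertices}
        step = 0
        while True:
            # a vertex participates if it is active or has pending messages
            active = [v for v in vertices.values() if v.active or inbox[v.id]]
            if not active:
                break                     # all halted, nothing in transit: done
            for v in active:
                v.active = True           # an incoming message reactivates it
                v.superstep = step
                v.compute(inbox[v.id])
            # deliver messages: everything sent in superstep S arrives in S+1
            inbox = {vid: [] for vid in vertices}
            for v in vertices.values():
                for target, msg in v._outgoing:
                    inbox[target].append(msg)
                v._outgoing.clear()
            step += 1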

Model of computation: Vertex
[Figure not preserved in transcript: the vertex state machine; an active vertex becomes inactive by voting to halt, and an incoming message moves it back to active]

Pregel execution
[Figure not preserved in transcript: workers executing supersteps separated by global barriers, coordinated by the master]

Master-worker paradigm
1. Partition the graph
   – Random hash based
   – Custom (mincut, …)
2. Assign vertices to workers (processors)
3. Mark all vertices as active
4. Each active vertex executes a Compute() function and receives the messages that were sent to it in the previous superstep
   – Get/modify the current vertex value using GetValue()/MutableValue()
5. Workers respond to the master with the vertices that are active for the next superstep
All workers execute the same code! The master is used only for coordination (see the sketch below)
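
Steps 1-2 are the only part not already covered by the toy runner; a hash-based partitioner is one line of logic (the function name is ours):

    def partition(vertices, num_workers):
        """Steps 1-2: random, hash-based assignment of vertices to workers.
        A custom partitioner (e.g. mincut-based) would replace the hash."""
        parts = [dict() for _ in range(num_workers)]
        for vid, v in vertices.items():
            parts[hash(vid) % num_workers][vid] = v
        return parts

    # Steps 3-5: each worker would run the superstep loop from the previous
    # sketch over its own partition, exchange messages with the other workers,
    # and report its count of active vertices to the master after the barrier;
    # the master stops the job when the global count reaches zero.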

Fault tolerance
At the start of each superstep, the master instructs workers to checkpoint their state
The master pings workers to detect which are still running
If a failure is detected, the master reassigns the failed worker's partitions to the available workers, which reload them from the last checkpoint
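
In the toy single-process setting, the checkpoint/recover pair might look like this (deep copies stand in for what a real system would write to a distributed file system):

    import copy

    def checkpoint(parts):
        """Called at the start of a superstep: snapshot every partition."""
        return [copy.deepcopy(p) for p in parts]

    def recover(parts, saved, failed, alive):
        """Move the failed worker's partition (index `failed`) to a surviving
        worker (index `alive`), restored from the last checkpoint; the failed
        superstep is then re-executed from that state."""
        parts[alive].update(copy.deepcopy(saved[failed]))
        parts[failed] = {}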

Example: find largest value in a graph
[Figure not preserved in transcript: in the Pregel paper's classic example, every vertex repeatedly sends its value to its neighbors, adopts the largest value it receives, and votes to halt once its value stops changing; a sketch follows]
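
Written against the Vertex and run_pregel sketches above, the algorithm is a few lines:

    class MaxValueVertex(Vertex):
        """Propagate the largest value in the graph to every vertex."""
        def compute(self, messages):
            new_value = max([self.value] + messages)
            if self.superstep == 0 or new_value > self.value:
                # first superstep, or our value grew: tell the neighbors
                self.value = new_value
                for target in self.out_edges:
                    self.send_message_to(target, self.value)
            else:
                self.vote_to_halt()       # nothing changed: go inactive

    # Toy cycle 1 -> 2 -> 3 -> 1 with values 3, 6, 2:
    graph = {
        1: MaxValueVertex(1, 3, [2]),
        2: MaxValueVertex(2, 6, [3]),
        3: MaxValueVertex(3, 2, [1]),
    }
    run_pregel(graph)
    print({vid: v.value for vid, v in graph.items()})  # {1: 6, 2: 6, 3: 6}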

What's next?
BSP model
– SSSP, connected components, PageRank
Vertex-centric vs. subgraph-centric computing
Load balancing
– Importance of partitioning and graph type
...