Bulk Synchronous Parallel Processing Model
Jamie Perkins

Overview
- Four W's: Who, What, When, and Why
- Goals for BSP
- BSP Design and Program
- Cost Functions
- Languages and Machines

A Bridge for Parallel Computation
- Von Neumann model: designed to insulate software from hardware
- BSP model (Bulk Synchronous Parallel):
  - Proposed by Leslie Valiant of Harvard University in 1990
  - Developed further by W.F. McColl of Oxford
  - Designed to be a "bridge" for parallel computation

Goals for BSP
- Scalability: performance of HW and SW must scale from a single processor to thousands of processors
- Portability: SW must run unchanged, with high performance, on any general-purpose parallel architecture
- Predictability: performance of SW on different architectures must be predictable in a straightforward way

BSP Design
Three components:
- Node: processor and local memory
- Router (communication network): message passing / point-to-point communication
- Barrier (synchronization mechanism): implemented in hardware

BSP Design (continued)
- Fixed memory architecture
- Hashing to allocate memory in a "random" fashion
- Fast access to local memory
- Uniformly slow access to remote memory
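As a rough sketch of the hashing idea (not from the original slides): a global address can be hashed to choose its owning processor, so traffic spreads evenly across the machine instead of hot-spotting one node. The multiplicative constant below is an arbitrary, illustrative choice.

    #include <stdint.h>

    /* Map a global address to the processor that owns it.
     * Hashing scatters consecutive addresses across processors, which is
     * what makes remote access "uniformly" slow rather than hot-spotted. */
    static uint32_t owner_of(uint64_t global_addr, uint32_t nprocs) {
        uint64_t h = global_addr * 0x9E3779B97F4A7C15ULL; /* multiplicative hash */
        h ^= h >> 32;                                     /* mix high bits down */
        return (uint32_t)(h % nprocs);
    }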

[Illustration of a BSP computer: processor/memory (P/M) nodes connected by a communication network, with a barrier spanning all nodes. Figure: peace.snu.ac.kr/courses/parallelprocessing/]

BSP Program
- Composed of S supersteps
- A superstep consists of:
  - A computation where each processor uses only locally held values
  - A global message transmission from each processor to any subset of the others
  - A barrier synchronization
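To make the superstep structure concrete, here is a minimal sketch of one superstep using the classic BSPlib C interface (bsp_put, bsp_sync). The buffer size and the ring-neighbor exchange pattern are illustrative assumptions, not something prescribed by the slides.

    #include <bsp.h>    /* classic BSPlib interface (Hill et al.) */

    #define N 1024      /* illustrative buffer size */

    int main(int argc, char **argv) {
        bsp_begin(bsp_nprocs());       /* start the SPMD section on every processor */
        int p   = bsp_nprocs();
        int pid = bsp_pid();

        static double local[N], incoming[N];
        bsp_push_reg(incoming, N * sizeof(double)); /* make buffer remotely writable */
        bsp_sync();                                 /* registration takes effect */

        /* (1) computation on locally held values only */
        for (int i = 0; i < N; i++)
            local[i] = pid + 0.5 * i;

        /* (2) global message transmission: send to the right ring neighbor */
        bsp_put((pid + 1) % p, local, incoming, 0, N * sizeof(double));

        /* (3) barrier synchronization: all puts have landed after this */
        bsp_sync();

        bsp_end();
        return 0;
    }

Each bsp_sync() ends the current superstep; every put issued before it is guaranteed to have arrived once it returns.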

Strategies for programming on BSP
- Balance the computation between processes
- Balance the communication between processes
- Minimize the number of supersteps

[BSP program timeline: processors P1-P4 each perform computation, then communication, then meet at a barrier, which ends superstep 1 and begins superstep 2. Figure: peace.snu.ac.kr/courses/parallelprocessing/]

Advantages of BSP
- Eliminates the need for programmers to manage memory, assign communication, and perform low-level synchronization (given sufficient parallel slackness)
- Bulk synchronization allows automatic optimization of the communication pattern
- The BSP model provides a simple cost function for analyzing the complexity of algorithms

Cost Function
- g: "gap", or bandwidth inefficiency
- L: "latency", the minimum time needed for one superstep
- w_i: largest amount of work performed by any processor in superstep i
- h_i: largest number of packets sent or received by any processor in superstep i
- Execution time for superstep i: w_i + g * h_i + L
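As a quick illustration, the formula is easy to evaluate; the machine parameters and workload numbers below are invented for the example, not taken from any real machine.

    #include <stdio.h>

    /* BSP cost of a superstep: w + g*h + L, where
     *   w = largest local work, h = largest number of packets sent/received,
     *   g = gap (bandwidth inefficiency), L = barrier latency. */
    static double superstep_cost(double w, double h, double g, double L) {
        return w + g * h + L;
    }

    int main(void) {
        double g = 4.0, L = 100.0;                /* invented machine parameters */
        double t = superstep_cost(10000.0, 500.0, g, L);
        printf("predicted cost: %.0f ops\n", t);  /* 10000 + 4*500 + 100 = 12100 */
        return 0;
    }

The formula also explains the "minimize the number of supersteps" strategy above: merging two supersteps into one saves an entire L term.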

Languages & Machines
- Languages and libraries: BSP++, C, C++, Fortran, JBSP, Opal
- Machines: IBM SP1, SGI Power Challenge (shared memory), Cray T3D, Hitachi SR2001, TCP/IP

Thank You! Any questions?

References
- McColl, W.F. The BSP Approach to Architecture Independent Parallel Programming. Technical report, Oxford University Computing Laboratory.
- United States Patent.
- Valiant, L.G. A Bridging Model for Parallel Computation. Communications of the ACM 33, 8 (1990), 103-111.