1 Bulk Synchronous Parallel Processing Model Jamie Perkins

2 Overview
- Four W’s – Who, What, When and Why
- Goals for BSP
- BSP design and program
- Cost function
- Languages and machines

3 A Bridge for Parallel Computation
- Von Neumann model
  - Designed to insulate hardware from software
- BSP model (Bulk Synchronous Parallel)
  - Proposed by Leslie Valiant of Harvard University in 1990
  - Developed by W. F. McColl of Oxford
  - Designed to be a “bridge” for parallel computation

4 Goals for BSP
- Scalability – performance of hardware and software must scale from a single processor to thousands of processors
- Portability – software must run unchanged, with high performance, on any general-purpose parallel architecture
- Predictability – performance of software on different architectures must be predictable in a straightforward way

5 BSP Design
Three components:
- Node – processor and local memory
- Router or communication network – message passing or point-to-point communication
- Barrier or synchronization mechanism – implemented in hardware

6 BSP Design
- Fixed memory architecture
- Hashing to allocate memory in a “random” fashion
- Fast access to local memory
- Uniformly slow access to remote memory

7 Illustration of BSP Computer
[Figure: nodes, each consisting of a processor (P) with local memory (M), connected by a communication network, with a barrier spanning all nodes. Source: http://peace.snu.ac.kr/courses/parallelprocessing/]

8 BSP Program
Composed of S supersteps. Each superstep consists of:
- A computation where each processor uses only locally held values
- A global message transmission from each processor to any subset of the others
- A barrier synchronization
A code sketch of one superstep follows below.
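As a minimal, hedged sketch (not part of the original slides), one superstep might look as follows in C, assuming a BSPlib implementation providing the standard primitives (bsp_begin, bsp_pid, bsp_push_reg, bsp_put, bsp_sync, and related calls) is available; the neighbour-exchange pattern and variable names are purely illustrative.

    /* One BSP superstep: local computation, communication, then a barrier.
       Assumes a BSPlib implementation supplies "bsp.h". */
    #include <stdio.h>
    #include "bsp.h"

    int main(void) {
        bsp_begin(bsp_nprocs());              /* start SPMD execution on all processors */
        int p   = bsp_nprocs();
        int pid = bsp_pid();

        int local    = pid * pid;             /* computation on locally held values only */
        int received = 0;
        bsp_push_reg(&received, sizeof(int)); /* make 'received' remotely writable       */
        bsp_sync();                           /* registration takes effect at a barrier  */

        /* communication: each processor sends its result to the next processor
           (any subset of the others would be allowed) */
        bsp_put((pid + 1) % p, &local, &received, 0, sizeof(int));
        bsp_sync();                           /* barrier: superstep ends, puts delivered */

        printf("processor %d received %d\n", pid, received);
        bsp_pop_reg(&received);
        bsp_end();
        return 0;
    }

In this pattern, everything between two bsp_sync calls forms one superstep, matching the definition on the slide above.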

9 Strategies for Programming on BSP
- Balance the computation between processes
- Balance the communication between processes
- Minimize the number of supersteps (see the illustration below)
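As a hedged illustration of the last point (using the cost parameters g, L, w and h defined on slide 12, and assuming the two supersteps can legally be combined), the cost of a merged superstep is at most (w_1 + w_2) + g(h_1 + h_2) + L, so merging saves at least one barrier cost L compared with running the supersteps separately:

\[
  (w_1 + g\,h_1 + L) + (w_2 + g\,h_2 + L) \;\ge\; (w_1 + w_2) + g\,(h_1 + h_2) + L
\]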

10 BSP Program
[Figure: four processors P1–P4 executing Superstep 1 and Superstep 2; each superstep shows a computation phase, a communication phase, and a closing barrier. Source: http://peace.snu.ac.kr/courses/parallelprocessing/]

11 Advantages of BSP
- Eliminates the need for programmers to manage memory, assign communication and perform low-level synchronization (given sufficient parallel slackness)
- Synchronization allows automatic optimization of the communication pattern
- The BSP model provides a simple cost function for analyzing the complexity of algorithms

12 Cost Function
- g – “gap” or bandwidth inefficiency
- L – “latency”, the minimum time needed for one superstep
- w – largest amount of work performed (per processor)
- h – largest number of packets sent or received
- Execution time of superstep i: w_i + g·h_i + L (written out below)
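Written out in standard BSP notation using the symbols above (the total-cost form is an addition, not on the original slide), the cost of superstep i and of a program with S supersteps is:

\[
  T_i = w_i + g\,h_i + L,
  \qquad
  T_{\text{program}} = \sum_{i=1}^{S} \left( w_i + g\,h_i + L \right) = W + gH + SL,
\]

where \(W = \sum_i w_i\) and \(H = \sum_i h_i\).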

13 Languages & Machines
- Languages and libraries: BSP++, C, C++, Fortran, JBSP, Opal
- Machines and platforms: IBM SP1, SGI Power Challenge (shared memory), Cray T3D, Hitachi SR2001, TCP/IP

14 Thank You
Any questions?

15 References
- http://peace.snu.ac.kr/courses/parallelprocessing/
- http://wwwcs.uni-paderborn.de/fachbereich/AG/agmad
- http://www.cs.mu.oz.au/677/notes/node41.html
- McColl, W. F. The BSP Approach to Architecture Independent Parallel Programming. Technical report, Oxford University Computing Laboratory, Dec. 1994.
- United States Patent 5,083,265.
- Valiant, L. G. A Bridging Model for Parallel Computation. Communications of the ACM 33, 8 (1990), 103–111.

