Presentation transcript:

The Center for Astrophysical Thermonuclear Flashes
An Advanced Simulation & Computing (ASC) Academic Strategic Alliances Program (ASAP) Center at The University of Chicago
FLASH Tutorial, May 13, 2004
Parallel Computing and MPI

What is Parallel Computing? And why is it useful
- Parallel computing is more than one CPU working together on one problem.
- It is useful when:
  - the problem is large and would take very long on a single processor
  - the data are too big to fit in the memory of one processor
- When to parallelize:
  - the problem can be subdivided into relatively independent tasks
- How much to parallelize:
  - as long as the speedup relative to a single processor remains of the order of the number of processors (the standard definitions are sketched below)
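For reference, a minimal sketch of the usual definitions behind that rule of thumb; the symbols T_1, T_p, S, and E do not appear on the slide and are introduced here only for illustration:

```latex
% T_1 = run time on one processor, T_p = run time on p processors
% (symbols introduced here for illustration; not from the slide).
S(p) = \frac{T_1}{T_p}, \qquad E(p) = \frac{S(p)}{p}
% "Worth parallelizing" roughly means S(p) stays of order p,
% i.e. the efficiency E(p) stays close to 1.
```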

Parallel paradigms
- SIMD – Single Instruction, Multiple Data
  - processors work in lock-step
- MIMD – Multiple Instruction, Multiple Data
  - processors do their own thing, with occasional synchronization
- Shared memory
  - one-way communication
- Distributed memory
  - message passing
- Loosely coupled
  - the process on each CPU is fairly self-contained and relatively independent of processes on other CPUs
- Tightly coupled
  - CPUs need to communicate with each other frequently

How to Parallelize
- Divide the problem into a set of mostly independent tasks
  - partitioning the problem (a partitioning sketch follows this list)
- Tasks get their own data
  - localize each task
- Tasks operate on their own data for the most part
  - try to make each task self-contained
- Occasionally:
  - data may be needed from other tasks
    - inter-process communication
  - synchronization may be required between tasks
    - global operations
- Map tasks to different processors
  - one processor may get more than one task
  - the task distribution should be well balanced
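A minimal sketch of the partitioning step for a 1-D domain of N cells split across nprocs tasks. The variable names and the even block distribution are illustrative assumptions, not taken from FLASH:

```c
#include <stdio.h>

/* Compute the [lo, hi) index range owned by task `rank` when N cells
 * are block-distributed over `nprocs` tasks as evenly as possible.   */
static void block_range(int N, int nprocs, int rank, int *lo, int *hi)
{
    int base = N / nprocs;          /* cells every task gets          */
    int rem  = N % nprocs;          /* first `rem` tasks get one more */
    *lo = rank * base + (rank < rem ? rank : rem);
    *hi = *lo + base + (rank < rem ? 1 : 0);
}

int main(void)
{
    int N = 10, nprocs = 4;
    for (int rank = 0; rank < nprocs; ++rank) {
        int lo, hi;
        block_range(N, nprocs, rank, &lo, &hi);
        printf("task %d owns cells [%d, %d)\n", rank, lo, hi);
    }
    return 0;
}
```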

New Code Components
- Initialization
- Query parallel state
  - identify this process
  - identify the number of processes
- Exchange data between processes
  - local, global
- Synchronization
  - barriers, blocking communication, locks
- Finalization
(a minimal MPI skeleton with these pieces follows)
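A minimal sketch of a program with those components: initialization, querying rank and size, a barrier for synchronization, and finalization. The data-exchange call sites are only placeholder comments here:

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size;

    MPI_Init(&argc, &argv);                 /* initialization          */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* identify this process   */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* number of processes     */

    printf("process %d of %d\n", rank, size);

    /* ... exchange data between processes (local or global) here ... */

    MPI_Barrier(MPI_COMM_WORLD);            /* synchronization         */
    MPI_Finalize();                         /* finalization            */
    return 0;
}
```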

MPI
- Message Passing Interface: the standard for the distributed-memory model of parallelism
- MPI-2 will support one-way communication, commonly associated with shared-memory operations
- Works with communicators: collections of processes
  - MPI_COMM_WORLD is the default
- Supports both the lowest-level communication operations and composite operations
- Has blocking and non-blocking operations

Communicators (figure: processes grouped into two communicators, COMM1 and COMM2; a sketch of creating such sub-communicators follows)
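A sketch of how two sub-communicators like the COMM1/COMM2 in the figure could be created from MPI_COMM_WORLD with MPI_Comm_split. The even/odd split and the variable names are illustrative assumptions:

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int world_rank, sub_rank;
    MPI_Comm subcomm;                   /* plays the role of COMM1 or COMM2 */

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

    /* Processes with the same color end up in the same sub-communicator. */
    int color = world_rank % 2;         /* 0 -> "COMM1", 1 -> "COMM2"      */
    MPI_Comm_split(MPI_COMM_WORLD, color, world_rank, &subcomm);

    MPI_Comm_rank(subcomm, &sub_rank);
    printf("world rank %d has rank %d in sub-communicator %d\n",
           world_rank, sub_rank, color);

    MPI_Comm_free(&subcomm);
    MPI_Finalize();
    return 0;
}
```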

Low-level Operations in MPI
- MPI_Init
- MPI_Comm_size
  - find the number of processors
- MPI_Comm_rank
  - find my processor number
- MPI_Send / MPI_Recv
  - communicate with other processors one at a time
- MPI_Bcast
  - global data transmission
- MPI_Barrier
  - synchronization
- MPI_Finalize
(a small example using these calls follows)
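A small point-to-point sketch using those calls: rank 0 broadcasts a parameter to everyone, then each nonzero rank sends a value back to rank 0 with MPI_Send/MPI_Recv. The payload, the tag, and nsteps are made up for illustration:

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size, nsteps = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0) nsteps = 100;
    /* One-to-all: everyone gets rank 0's value of nsteps. */
    MPI_Bcast(&nsteps, 1, MPI_INT, 0, MPI_COMM_WORLD);

    if (rank != 0) {
        int payload = rank * rank;            /* arbitrary local result */
        MPI_Send(&payload, 1, MPI_INT, 0, 99, MPI_COMM_WORLD);
    } else {
        for (int src = 1; src < size; ++src) {
            int payload;
            MPI_Recv(&payload, 1, MPI_INT, src, 99, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            printf("rank 0 received %d from rank %d\n", payload, src);
        }
    }

    MPI_Barrier(MPI_COMM_WORLD);   /* not required; mirrors the slide's list */
    MPI_Finalize();
    return 0;
}
```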

Advanced Constructs in MPI
- Composite operations
  - Gather / Scatter (sketch below)
  - Allreduce
  - Alltoall
- Cartesian grid operations
  - Shift
- Communicators
  - creating subgroups of processors to operate on
- User-defined datatypes
- I/O
  - parallel file operations
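A sketch of the Scatter/Gather pattern: rank 0 scatters one chunk of an array to each process, each process works on its chunk, and the results are gathered back. The chunk size and the doubling operation are illustrative assumptions:

```c
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define CHUNK 4   /* elements handed to each process (illustrative) */

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    double *full = NULL, local[CHUNK];
    if (rank == 0) {                    /* root owns the whole array */
        full = malloc(size * CHUNK * sizeof(double));
        for (int i = 0; i < size * CHUNK; ++i) full[i] = i;
    }

    /* One-to-all: distribute CHUNK elements to every process.       */
    MPI_Scatter(full, CHUNK, MPI_DOUBLE, local, CHUNK, MPI_DOUBLE,
                0, MPI_COMM_WORLD);

    for (int i = 0; i < CHUNK; ++i) local[i] *= 2.0;   /* local work */

    /* All-to-one: collect the processed chunks back on rank 0.      */
    MPI_Gather(local, CHUNK, MPI_DOUBLE, full, CHUNK, MPI_DOUBLE,
               0, MPI_COMM_WORLD);

    if (rank == 0) {
        printf("full[last] = %g\n", full[size * CHUNK - 1]);
        free(full);
    }
    MPI_Finalize();
    return 0;
}
```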

Communication Patterns (figure: point-to-point, shift, all-to-all, and one-to-all broadcast among processes 0–3, plus other collective patterns)

Communication Overheads
- Latency vs. bandwidth
- Blocking vs. non-blocking
  - overlap of communication and computation (sketch below)
  - buffering and copies
- Scale of communication
  - nearest neighbor
  - short range
  - long range
- Volume of data
  - resource contention for links
- Efficiency
  - hardware, software, communication method
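A sketch of hiding latency with non-blocking calls: post MPI_Irecv/MPI_Isend, do independent work, then MPI_Waitall before touching the exchanged data. The ring neighbors and the "work" loop are placeholders:

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int right = (rank + 1) % size, left = (rank - 1 + size) % size;
    double sendbuf = rank, recvbuf = -1.0;
    MPI_Request reqs[2];

    /* Post the communication first ...                               */
    MPI_Irecv(&recvbuf, 1, MPI_DOUBLE, left,  0, MPI_COMM_WORLD, &reqs[0]);
    MPI_Isend(&sendbuf, 1, MPI_DOUBLE, right, 0, MPI_COMM_WORLD, &reqs[1]);

    /* ... overlap it with computation that does not need recvbuf ... */
    double local_work = 0.0;
    for (int i = 0; i < 1000000; ++i) local_work += 1e-6;

    /* ... and wait only when the exchanged data is actually needed.  */
    MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);
    printf("rank %d got %g from the left (work=%g)\n",
           rank, recvbuf, local_work);

    MPI_Finalize();
    return 0;
}
```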

Parallelism in FLASH
- Short-range communications
  - nearest neighbor
- Long-range communications
  - regridding
- Other global operations
  - all-reduce operations on physical quantities
  - specific to solvers
    - multipole method
    - FFT-based solvers

Domain Decomposition (figure: the computational domain divided among processors P0–P3)

Border Cells / Ghost Points
- When splitting up solnData, data is needed from other processors
- Each processor needs a layer of cells from its neighboring processors
- These cells must be updated every time step

Border/Ghost Cells: Short-Range Communication (figure: ghost-cell exchange between neighboring processors)

Two MPI Methods for Doing It
Method 1: Cartesian topology
- MPI_Cart_create
  - create the topology
- MPE_Decomp1d
  - domain decomposition on the topology
- MPI_Cart_shift
  - who is on the left/right?
- MPI_Sendrecv
  - fill ghost cells from the left
- MPI_Sendrecv
  - fill ghost cells from the right
Method 2: manual decomposition
- MPI_Comm_rank, MPI_Comm_size
- manually decompose the grid over processors
- calculate left/right neighbors
- MPI_Send / MPI_Recv
  - ordered carefully to avoid deadlocks
(a sketch of the first method follows)
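A sketch of the first (Cartesian topology) method for a 1-D decomposition: build the topology, find left/right neighbors with MPI_Cart_shift, and exchange one ghost value in each direction with MPI_Sendrecv. The local array size and its contents are illustrative, and MPE_Decomp1d from the MPE library is replaced here by a trivial local array:

```c
#include <mpi.h>
#include <stdio.h>

#define NLOCAL 4   /* interior cells per process (illustrative) */

int main(int argc, char **argv)
{
    int size, rank, left, right;
    MPI_Comm cart;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* 1-D periodic Cartesian topology over all processes. */
    int dims[1] = { size }, periods[1] = { 1 };
    MPI_Cart_create(MPI_COMM_WORLD, 1, dims, periods, 1, &cart);
    MPI_Comm_rank(cart, &rank);
    MPI_Cart_shift(cart, 0, 1, &left, &right);   /* who is left/right? */

    /* u[0] and u[NLOCAL+1] are ghost cells; u[1..NLOCAL] are interior. */
    double u[NLOCAL + 2];
    for (int i = 1; i <= NLOCAL; ++i) u[i] = rank;

    /* Send rightmost interior cell to the right, fill left ghost from the left. */
    MPI_Sendrecv(&u[NLOCAL], 1, MPI_DOUBLE, right, 0,
                 &u[0],      1, MPI_DOUBLE, left,  0, cart, MPI_STATUS_IGNORE);
    /* Send leftmost interior cell to the left, fill right ghost from the right. */
    MPI_Sendrecv(&u[1],          1, MPI_DOUBLE, left,  1,
                 &u[NLOCAL + 1], 1, MPI_DOUBLE, right, 1, cart, MPI_STATUS_IGNORE);

    printf("rank %d ghosts: left=%g right=%g\n", rank, u[0], u[NLOCAL + 1]);

    MPI_Comm_free(&cart);
    MPI_Finalize();
    return 0;
}
```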

Adaptive Grid Issues
- The discretization is not uniform
- Simple left/right guard cell fills are inadequate
- Adjacent grid points may not be mapped to nearest neighbors in the processor topology
- Redistribution of work becomes necessary

Regridding
- The number of cells/blocks changes
- Some processors get more work than others
  - load imbalance
- Data must be redistributed to even out the work on all processors
  - long-range communications
  - large quantities of data moved

Regridding (figure)

Other Parallel Operations in FLASH
- Global max/sum etc. (Allreduce; sketch below)
  - physical quantities
  - in solvers
  - performance monitoring
- Alltoall
  - FFT-based solver on UG
- User-defined datatypes and file operations
  - parallel I/O
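A sketch of the global-reduction pattern mentioned above: each process computes a local maximum of some physical quantity and MPI_Allreduce makes the global maximum available everywhere. The quantity being reduced is illustrative:

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Stand-in for a local physical quantity, e.g. a local maximum. */
    double local_max = 1.0 + 0.1 * rank;
    double global_max;

    /* Every process ends up with the same global maximum. */
    MPI_Allreduce(&local_max, &global_max, 1, MPI_DOUBLE, MPI_MAX,
                  MPI_COMM_WORLD);

    printf("rank %d: local=%g global=%g\n", rank, local_max, global_max);
    MPI_Finalize();
    return 0;
}
```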