Managing Heterogeneous MPI Application Interoperation and Execution: From PVMPI to SNIPE-based MPI_Connect()
Graham E. Fagg*, Kevin S. London, Jack J. Dongarra and Shirley V. Browne
University of Tennessee and Oak Ridge National Laboratory
* contact at

Project Targets
Allow intercommunication between different MPI implementations, or between instances of the same implementation on different machines.
Provide heterogeneous intercommunicating MPI applications with access to some of the MPI-2 process control and parallel I/O features.
Allow use of optimized vendor MPI implementations while still permitting distributed heterogeneous parallel computing in a transparent manner.

MPI-1 Communicators
All processes in an MPI-1 application belong to a global communicator called MPI_COMM_WORLD.
All other communicators are derived from this global communicator.
Communication can only occur within a communicator.
–Safe communication
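
A minimal sketch (not from the original slides; the even/odd split is an arbitrary illustrative choice) of how a derived communicator is created from MPI_COMM_WORLD with MPI_Comm_split, and of communication being confined to it:

/* Sketch: deriving a communicator from MPI_COMM_WORLD.
 * Everything here is standard MPI-1. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int world_rank, half_rank;
    MPI_Comm half_comm;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

    /* Derive a new communicator: processes with the same colour
     * (rank parity) end up in the same communicator. */
    MPI_Comm_split(MPI_COMM_WORLD, world_rank % 2, world_rank, &half_comm);

    /* Communication is now confined to the derived communicator. */
    MPI_Comm_rank(half_comm, &half_rank);
    printf("world rank %d is rank %d in its derived communicator\n",
           world_rank, half_rank);

    MPI_Comm_free(&half_comm);
    MPI_Finalize();
    return 0;
}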

MPI Internals: Processes
All process groups are derived from the membership of the MPI_COMM_WORLD communicator.
–i.e. no external processes.
MPI-1 process membership is static, not dynamic as in PVM or LAM 6.X:
–simplified consistency reasoning
–fast communication (fixed addressing) even across complex topologies
–interfaces well to the simple run-time systems found on many MPPs

MPI-1 Application (diagram): MPI_COMM_WORLD containing a derived communicator.

Disadvantages of MPI-1
Static process model:
–If a process fails, all communicators it belongs to become invalid, i.e. no fault tolerance.
–Dynamic resources either cause applications to fail due to loss of nodes, or make applications inefficient as they cannot take advantage of new nodes by starting/spawning additional processes.
–When using a dedicated MPP MPI implementation you cannot usually use off-machine or even off-partition nodes.

MPI-2
Problem areas and needed additional features identified in MPI-1 are being addressed by the MPI-2 forum. These include:
–inter-language operation
–dynamic process control / management
–parallel I/O
–extended collective operations
Support for inter-implementation communication/control was considered, but was not included.
–See other projects such as NIST IMPI, PACX and PLUS.

User Requirements for Inter-Operation
Dynamic connections between tasks on different platforms/systems/sites.
Transparent inter-operation once it has started.
A single style of API, i.e. MPI only!
Access to virtual machine / resource management when required, not mandatory use.
Support for complex features across implementations, such as user-derived data types.

MPI_Connect/PVMPI2
API in the MPI-1 style.
Inter-application communication using standard point-to-point MPI-1 function calls:
–supporting all variations of send and receive
–all MPI data types, including user-derived types
Naming service functions similar in semantics and functionality to the current MPI inter-communicator functions:
–ease of use for programmers experienced only in MPI and not in other message passing systems such as PVM, Nexus or TCP socket programming!

Process Identification
Process groups are identified by a character name. Individual processes are addressed by a tuple.
In MPI:
–{communicator, rank}
–{process group, rank}
In MPI_Connect/PVMPI2:
–{name, rank}
–{name, instance}
Instance and rank are identical and range from 0..N-1, where N is the number of processes.

Registration, i.e. naming MPI applications
Process groups register their name with a global naming service, which returns a system handle used for future operations on this name / communicator pair:
–int MPI_Conn_register (char *name, MPI_Comm local_comm, int *handle);
–call MPI_CONN_REGISTER (name, local_comm, handle, ierr)
Processes can remove their name from the naming service with:
–int MPI_Conn_remove (int handle);
A process may have multiple names associated with it. Names can be registered, removed and re-registered multiple times without restriction.

MPI-1 Inter-communicators
An inter-communicator is used for point-to-point communication between disjoint groups of processes.
Inter-communicators are formed using MPI_Intercomm_create(), which operates on two existing non-overlapping intra-communicators and a bridge communicator.
MPI_Connect/PVMPI could not use this mechanism, as there is no MPI bridge communicator between groups formed from separate MPI applications: their MPI_COMM_WORLDs do not overlap.
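
For contrast, a minimal sketch (an illustration, not from the slides) of the standard MPI-1 mechanism inside a single application: both halves of MPI_COMM_WORLD can name a bridge communicator (here MPI_COMM_WORLD itself), which is exactly what is missing between two separate MPI applications.

/* Sketch: standard MPI-1 inter-communicator creation within ONE
 * application. Assumes at least two processes. */
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Comm local_comm, inter_comm;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Two disjoint intra-communicators: lower and upper halves. */
    int color = (rank < size / 2) ? 0 : 1;
    MPI_Comm_split(MPI_COMM_WORLD, color, rank, &local_comm);

    /* Rank (within the bridge communicator) of the other group's leader. */
    int remote_leader = (color == 0) ? size / 2 : 0;

    /* MPI_COMM_WORLD acts as the bridge communicator between the two
     * groups -- this is what two separate MPI applications lack. */
    MPI_Intercomm_create(local_comm, 0, MPI_COMM_WORLD, remote_leader,
                         99 /* tag */, &inter_comm);

    MPI_Comm_free(&inter_comm);
    MPI_Comm_free(&local_comm);
    MPI_Finalize();
    return 0;
}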

Forming Inter-communicators
MPI_Connect/PVMPI2 forms its inter-communicators with a modified MPI_Intercomm_create call. The bridging communication is performed automatically, and the user only has to specify the registered name of the remote group:
–int MPI_Conn_intercomm_create (int local_handle, char *remote_group_name, MPI_Comm *new_inter_comm);
–call MPI_CONN_INTERCOMM_CREATE (localhandle, remotename, newcomm, ierr)

Inter-communicators
Once an inter-communicator has been formed, it can be used almost exactly like any other MPI inter-communicator:
–all point-to-point operations
–communicator comparison and duplication
–remote group information
Resources are released with MPI_Comm_free().
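
A brief sketch (assumed for illustration; ocean_comm is a stand-in for a communicator returned by MPI_Conn_intercomm_create, and MPI_Init is assumed to have been called) of the operations listed above, using standard MPI calls:

/* Sketch: querying and duplicating an inter-communicator. */
#include <mpi.h>
#include <stdio.h>

void inspect_intercomm(MPI_Comm ocean_comm)
{
    int is_inter, remote_size;
    MPI_Comm dup_comm;

    MPI_Comm_test_inter(ocean_comm, &is_inter);     /* is it an inter-communicator? */
    MPI_Comm_remote_size(ocean_comm, &remote_size); /* remote group information     */
    printf("inter-communicator: %d, remote group size: %d\n",
           is_inter, remote_size);

    MPI_Comm_dup(ocean_comm, &dup_comm);            /* duplication                  */
    /* ... point-to-point operations on dup_comm ... */
    MPI_Comm_free(&dup_comm);                       /* resources released           */
}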

Simple example: Air Model
/* air model */
MPI_Init (&argc, &argv);
MPI_Conn_register ("AirModel", MPI_COMM_WORLD, &air_handle);
MPI_Conn_intercomm_create (air_handle, "OceanModel", &ocean_comm);
MPI_Comm_rank (MPI_COMM_WORLD, &myrank);
while (!done) {
  /* do work using intra-comms */
  /* swap values with other model */
  MPI_Send (databuf, cnt, MPI_DOUBLE, myrank, tag, ocean_comm);
  MPI_Recv (databuf, cnt, MPI_DOUBLE, myrank, tag, ocean_comm, &status);
} /* end while done work */
MPI_Conn_remove (air_handle);
MPI_Comm_free (&ocean_comm);
MPI_Finalize ();

Ocean Model
/* ocean model */
MPI_Init (&argc, &argv);
MPI_Conn_register ("OceanModel", MPI_COMM_WORLD, &ocean_handle);
MPI_Conn_intercomm_create (ocean_handle, "AirModel", &air_comm);
MPI_Comm_rank (MPI_COMM_WORLD, &myrank);
while (!done) {
  /* do work using intra-comms */
  /* swap values with other model */
  MPI_Recv (databuf, cnt, MPI_DOUBLE, myrank, tag, air_comm, &status);
  MPI_Send (databuf, cnt, MPI_DOUBLE, myrank, tag, air_comm);
} /* end while done work */
MPI_Conn_remove (ocean_handle);
MPI_Comm_free (&air_comm);
MPI_Finalize ();

Coupled model (diagram): two MPI applications, the Air Model and the Ocean Model, each with its own MPI_COMM_WORLD, joined by a global inter-communicator (air_comm on one side, ocean_comm on the other).

MPI_Connect Internals, i.e. SNIPE
SNIPE is a meta-computing system from UTK that was designed to support long-term distributed applications.
MPI_Connect uses SNIPE as a communications layer.
The naming service is provided by the RCDS RC_Server system (which is also used by HARNESS for repository information).
MPI application startup is via SNIPE daemons that interface to standard batch/queuing systems.
–These understand LAM, MPICH, MPIF (POE), SGI MPI and qsub variations; Condor and LSF soon.

PVMPI2 Internals, i.e. PVM
PVMPI2 uses PVM as a communications layer.
The naming service is provided by the PVM Group Server in PVM 3.3.x and by the Mailbox system in PVM 3.4. PVM 3.4 is simpler, as it has user-controlled message contexts and message handlers.
MPI application startup is provided by specialist PVM tasker processes.
–These understand LAM, MPICH, IBM MPI (POE) and SGI MPI.

Internals: linking with MPI
MPI_Connect and PVMPI are built as an MPI profiling interface, and are thus transparent to user applications.
At build time they can be configured to call other profiling interfaces, allowing inter-operation with other MPI monitoring/tracing/debugging tool sets.

MPI_Connect / PVMPI Layering (diagram): the user's code calls an MPI_function, which enters the intercomm library. The library looks up the communicator: if it is a true MPI intra-communicator, the profiled MPI call (PMPI_Function) is used; otherwise the call is translated into SNIPE/PVM addressing and carried out with SNIPE/PVM functions (the other library). The layer then works out the correct return code and passes it back to the user's code.
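
A minimal sketch of such a profiling-interface wrapper (an illustration of the layering described above, not the actual MPI_Connect source; the helpers is_true_intracomm() and bridge_send() are hypothetical placeholders for the communicator lookup and the SNIPE/PVM translation):

/* Sketch of one wrapper in an MPI profiling-interface layer. */
#include <mpi.h>

/* Hypothetical: is this a plain MPI intra-communicator (as opposed to
 * an inter-application communicator managed by the layer)? */
int is_true_intracomm(MPI_Comm comm);

/* Hypothetical: translate the addressing and send via SNIPE/PVM;
 * returns 0 on success. */
int bridge_send(const void *buf, int count, MPI_Datatype type,
                int dest, int tag, MPI_Comm comm);

int MPI_Send(const void *buf, int count, MPI_Datatype type,
             int dest, int tag, MPI_Comm comm)
{
    if (is_true_intracomm(comm)) {
        /* True MPI intra-communicator: pass straight through to the
         * profiled entry point of the underlying implementation. */
        return PMPI_Send(buf, count, type, dest, tag, comm);
    }
    /* Inter-application communicator: send over SNIPE/PVM, then work
     * out the correct MPI return code. */
    return (bridge_send(buf, count, type, dest, tag, comm) == 0)
               ? MPI_SUCCESS : MPI_ERR_COMM;
}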

Process Management
PVM and/or SNIPE can handle the startup of MPI jobs:
–General Resource Manager / specialist PVM Tasker control / SNIPE daemons
Jobs can also be started by MPIRUN:
–useful when testing on interactive nodes
Once enrolled, MPI processes are under SNIPE or PVM control:
–signals (such as TERM/HUP)
–notification messages (fault tolerance)

Process Management (diagram): user request → GRM → MPICH Tasker (MPICH cluster), POE Tasker (IBM SP2), PBS/qsub Tasker and PVMD (SGI O2K).

Conclusions
MPI_Connect and PVMPI allow different MPI implementations to inter-operate.
Only 3 additional calls are required.
Layering requires a fully profiled MPI library (complex linking).
Intra-communication performance may be slightly affected.
Inter-communication is as fast as the intercomm library used (either PVM or SNIPE).

The MPI_Connect interface is much like names, addresses and ports in MPI-2.
Well-known address host:port:
–MPI_PORT_OPEN makes a port
–MPI_ACCEPT lets a client connect
–MPI_CONNECT makes the client-side connection
Service naming:
–MPI_NAME_PUBLISH (port, info, service)
–MPI_NAME_GET client side, to get the port

Server-Client model (diagram): the server calls MPI_Accept() on {host:port}; the client calls MPI_Connect to that address; the result is an inter-communicator.

Server-Client model (diagram): the server calls MPI_Port_open() to obtain {host:port} and publishes it under a NAME with MPI_Name_publish; the client retrieves the port with MPI_Name_get.
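
For comparison, a minimal sketch of the same server-client pattern written with the function names as they were eventually standardized in MPI-2 (MPI_Open_port, MPI_Publish_name, MPI_Comm_accept on the server; MPI_Lookup_name, MPI_Comm_connect on the client). The service name "OceanModel" is only an illustrative choice, and MPI_Init is assumed to have been called.

#include <mpi.h>

/* Server side: open a port, publish it under a name, accept a client. */
void serve(void)
{
    char port[MPI_MAX_PORT_NAME];
    MPI_Comm client;

    MPI_Open_port(MPI_INFO_NULL, port);                  /* get host:port  */
    MPI_Publish_name("OceanModel", MPI_INFO_NULL, port); /* register name  */
    MPI_Comm_accept(port, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &client);
    /* ... point-to-point traffic over the inter-communicator 'client' ... */
    MPI_Comm_disconnect(&client);
    MPI_Unpublish_name("OceanModel", MPI_INFO_NULL, port);
    MPI_Close_port(port);
}

/* Client side: look the port up by name and connect. */
void attach(void)
{
    char port[MPI_MAX_PORT_NAME];
    MPI_Comm server;

    MPI_Lookup_name("OceanModel", MPI_INFO_NULL, port);  /* find the port  */
    MPI_Comm_connect(port, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &server);
    /* ... point-to-point traffic over the inter-communicator 'server' ... */
    MPI_Comm_disconnect(&server);
}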

SNIPE

Single global name space
–built using RCDS, which supports URLs, URNs and LIFNs; being tested against LDAP
Scalable / secure
Multiple resource managers
Safe execution environment
–Java etc.
Parallelism is the basic unit of execution

Additional Information
PVM :
PVMPI :
MPI_Connect :
SNIPE : htp:// and
RCDS : htp://
ICL :
CRPC :
DOD HPC MSRCs
–CEWES :
–ARL
–ASC