SCTP-based Middleware for MPI


1 SCTP-based Middleware for MPI
Humaira Kamal, Brad Penoff, Alan Wagner Department of Computer Science University of British Columbia

2 What is MPI and SCTP?
Message Passing Interface (MPI): a library that is widely used to parallelize scientific and compute-intensive programs.
Stream Control Transmission Protocol (SCTP): a general-purpose unicast transport protocol for IP network data communications. Recently standardized by the IETF. Can be used anywhere TCP is used.

3 What is MPI and SCTP?
Question: Can we take advantage of SCTP features to better support parallel applications using MPI?

4 Communicating MPI Processes
TCP is often used as the transport protocol for MPI. (Diagram labels: SCTP, SCTP)

5 SCTP Key Features
Reliable in-order delivery, flow control, full-duplex transfer.
SACK is built into the protocol.
TCP-like congestion control.

6 SCTP Key Features
Message oriented
Use of associations
Multihoming
Multiple streams within an association

7 Logical View of Multiple Streams in an Association

8 Partially Ordered User Messages Sent on Different Streams

9 MPI Middleware
MPI_Send(msg,count,type,dest-rank,tag,context)
MPI_Recv(msg,count,type,source-rank,tag,context)
Message matching is done based on Tag, Rank and Context (TRC).
Combinations such as blocking, non-blocking, synchronous, asynchronous, buffered, unbuffered.
Use of wildcards for receive.
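The TRC matching rule on this slide can be illustrated with a small model. This is a minimal sketch, not LAM-MPI code: the `Envelope` type and the functions `matches` and `first_match` are hypothetical names, and the wildcard values are placeholders named after `MPI_ANY_SOURCE` and `MPI_ANY_TAG` (the real constants are implementation-defined). It assumes that only the tag and the source rank may be wildcards, while the context must always match exactly.

```python
from collections import namedtuple

# Placeholder wildcard values, named after MPI_ANY_SOURCE / MPI_ANY_TAG.
ANY_SOURCE = -1
ANY_TAG = -1

# Hypothetical envelope carrying the Tag, Rank and Context (TRC) of a message.
Envelope = namedtuple("Envelope", "tag rank context")


def matches(posted, incoming):
    """Return True if a posted receive accepts an incoming message."""
    if posted.context != incoming.context:  # context has no wildcard
        return False
    if posted.rank not in (ANY_SOURCE, incoming.rank):
        return False
    return posted.tag in (ANY_TAG, incoming.tag)


def first_match(posted, arrived):
    """Return the index of the earliest arrived message the receive accepts."""
    for index, envelope in enumerate(arrived):
        if matches(posted, envelope):
            return index
    return None
```

Scanning the arrived messages in order and taking the first match is what keeps two messages with the same TRC from overtaking each other in this model.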

10 MPI Messages Using Same Context, Two Processes

11 MPI Messages Using Same Context, Two Processes
Out-of-order messages with the same tag violate MPI semantics.

12 MPI Middleware Message Progression Layer
Short Messages vs. Long Messages

13 Design and Implementation
LAM (Local Area Multi-computer) is an open source implementation of the MPI library.
We redesigned LAM-MPI to use SCTP.
Three-phased iterative process:
Use of One-to-One Style Sockets
Use of Multiple Streams
Use of One-to-Many Style Sockets

14 Using SCTP for MPI Striking similarities between SCTP and MPI

15 Implementation Issues
Maintaining State Information: Maintain state appropriately for each request function to work with the one-to-many style.
Message Demultiplexing: Extend RPI initialization to map associations to rank. Demultiplex each incoming message to direct it to the proper receive function.
Concurrency and SCTP Streams: Consistently map MPI tag-rank-context to SCTP streams, maintaining proper MPI semantics.
Resource Management: Make the RPI more message-driven. Eliminate the use of the select() system call, making the implementation more scalable. Eliminate the need to maintain a large number of socket descriptors.
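The demultiplexing and stream-mapping items above can be sketched as follows. The slides do not give the actual mapping function, so `stream_for` uses an arbitrary deterministic formula purely as an assumption. The `Demultiplexer` class and its fields are hypothetical names as well. The sketch assumes the sender's rank is recovered from the association a message arrives on, so only tag and context need to select the stream.

```python
def stream_for(tag, context, num_streams):
    """Map a (tag, context) pair to a stream number in [0, num_streams).

    Hypothetical formula: any deterministic mapping sends messages with the
    same tag and context to the same stream, which preserves their order.
    """
    return (tag * 31 + context) % num_streams


class Demultiplexer:
    """Route incoming messages by association id and stream number."""

    def __init__(self, assoc_to_rank):
        # Table built once during initialization: association id -> MPI rank.
        self.assoc_to_rank = dict(assoc_to_rank)
        self.queues = {}

    def deliver(self, assoc_id, stream, payload):
        """Queue a payload under (rank, stream) and return the sender's rank."""
        rank = self.assoc_to_rank[assoc_id]
        self.queues.setdefault((rank, stream), []).append(payload)
        return rank
```

With one shared socket, a single receive loop can feed `deliver` for every peer, which is roughly why the one-to-many style removes the need for select() over many descriptors.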

16 Implementation Issues
Eliminating Race Conditions: Find solutions for race conditions due to added concurrency. Use of a barrier after the association setup phase.
Reliability: Modify the out-of-band daemons and the request progression interface (RPI) to use a common transport layer protocol, allowing all components of LAM to multihome successfully.
Support for Large Messages: Devised a long-message protocol to handle messages larger than the socket send buffer.
Experiments with different SCTP stacks.
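The slides do not describe the long-message protocol itself, and the real one may involve a handshake between sender and receiver. The sketch below only illustrates the size constraint behind it: a message larger than the send buffer has to be sent as several pieces and rebuilt on the other side. The function names and the `max_chunk` parameter are hypothetical.

```python
def split_message(payload, max_chunk):
    """Split a bytes payload into pieces no larger than max_chunk bytes."""
    if max_chunk <= 0:
        raise ValueError("max_chunk must be positive")
    return [payload[i:i + max_chunk] for i in range(0, len(payload), max_chunk)]


def reassemble(chunks):
    """Rebuild the original payload from pieces received in order."""
    return b"".join(chunks)
```

Reassembly by simple concatenation assumes that all pieces of one message travel in order, for example on the same stream.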

17 Features of Design
Head-of-Line Blocking
Multihoming and Reliability
Security

18 Head-of-Line Blocking
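Head-of-line blocking can be illustrated with a toy model. This is a sketch under stated assumptions, not a measurement: the function `deliverable` is a hypothetical name, messages are identified by their position in send order, and each message is assigned to stream `tag % num_streams`. A message can be delivered only if every earlier message on the same stream has arrived. Setting `num_streams` to 1 models a single ordered byte stream such as TCP.

```python
def deliverable(tags, arrived, num_streams):
    """Return indices of messages that can be delivered to the application.

    tags[i] is the tag of the i-th message sent; arrived is the set of
    indices that have reached the receiver so far.
    """
    blocked = set()  # streams waiting on a missing earlier message
    out = []
    for index, tag in enumerate(tags):
        stream = tag % num_streams
        if stream in blocked:
            continue
        if index in arrived:
            out.append(index)
        else:
            blocked.add(stream)
    return out
```

For example, with `tags = [0, 1, 2, 0, 1]` and message 0 lost, a single stream delivers nothing until the retransmission arrives, while three streams still deliver messages 1, 2 and 4.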

19 Multihoming
Heartbeats
Failover
Retransmissions
User adjustable controls
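The interaction between heartbeats, retransmissions and failover can be sketched with a simplified model. The `Path` class and `choose_path` function are hypothetical names and not part of any SCTP API. The model assumes a per-path error counter that is incremented on each timeout and cleared on each acknowledgment, with the path treated as inactive once the counter exceeds a user-adjustable threshold (called Path.Max.Retrans in the SCTP specification).

```python
class Path:
    """One destination address of a multihomed peer (simplified model)."""

    def __init__(self, address, max_retrans=5):
        self.address = address
        self.max_retrans = max_retrans  # user-adjustable threshold
        self.errors = 0

    @property
    def active(self):
        return self.errors <= self.max_retrans

    def on_timeout(self):
        """A data retransmission timer or heartbeat timed out on this path."""
        self.errors += 1

    def on_ack(self):
        """Data or a heartbeat was acknowledged on this path."""
        self.errors = 0


def choose_path(primary, alternates):
    """Prefer the primary path; fail over to the first active alternate."""
    if primary.active:
        return primary
    for path in alternates:
        if path.active:
            return path
    return None
```

Lowering `max_retrans` makes failover faster at the cost of more spurious switches on a lossy but working path.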

20 Added Security
SCTP's use of a signed cookie
User data can be piggy-backed on the third and fourth legs of the four-way handshake

21 Limitations
Comprehensive CRC32c checksum: offload to the NIC is not yet commonly available.
SCTP bundles messages together, so it might not always be able to pack a full MTU.
SCTP stacks are in early stages and will improve over time.
Performance is stack dependent (Linux lksctp stack << FreeBSD KAME stack).

22 Experiments for Loss Performance of MPI Program that Uses Multiple Tags

23 Experiments: Head-of-Line Blocking
Use of Different Tags vs. Same Tags

24 Experiments: SCTP versus TCP
MPBench Ping Pong Test under No Loss

25 Conclusions
SCTP is better suited for MPI:
Avoids unnecessary head-of-line blocking due to use of streams.
Increased fault tolerance in the presence of multihomed hosts.
Built-in security features.
SCTP might be key to moving MPI programs from LANs to WANs.

26 Thank you! More information about our work is at:

