SCTP versus TCP for MPI Brad Penoff, Humaira Kamal, Alan Wagner Department of Computer Science University of British Columbia Distributed Research Group.

1 SCTP versus TCP for MPI Brad Penoff, Humaira Kamal, Alan Wagner Department of Computer Science University of British Columbia Distributed Research Group SC-2005 Nov 16

3 What is SCTP? Stream Control Transmission Protocol (SCTP): a general-purpose unicast transport protocol for IP network data communications, recently standardized by the IETF, that can be used anywhere TCP is used. Question: can we take advantage of SCTP features to better support parallel applications using MPI?

4 Communicating MPI Processes. TCP is often used as the transport protocol for MPI; this work replaces it with SCTP.

5 Overview of SCTP

6 SCTP Key Features. Reliable in-order delivery, flow control, full-duplex transfer. TCP-like congestion control. Selective ACK is built into the protocol.

7 SCTP Key Features. Message oriented; use of associations; multihoming; multiple streams within an association.

8 Associations and Multihoming. [Diagram: Endpoint X (NIC1, NIC2) and Endpoint Y (NIC3, NIC4) joined by a single association across networks 207.10.x.x and 168.1.x.x; addresses shown: 207.10.40.1, 168.1.140.10, 168.1.10.30, 207.10.3.20.]

9 Logical View of Multiple Streams in an Association

10 Partially Ordered User Messages Sent on Different Streams

21 Messages can be received in the same order as they were sent (required in TCP).

22 Partially Ordered User Messages Sent on Different Streams

28 Delivery constraints: A must be before C and C must be before D
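
The delivery constraint on this slide can be checked with a small model. The sketch below is illustrative only: it assumes five messages A to E sent in that order, with A, C and D on one stream (which yields the stated constraint) and B and E each on a stream of their own. The placement of B and E is an assumption, since the slide images are not available. It counts how many delivery orders respect per-stream ordering.

```python
from itertools import permutations

# Hypothetical stream assignment: A, C, D share stream 1 (matching the
# slide's constraint); putting B and E on their own streams is an assumption.
STREAM_OF = {"A": 1, "B": 0, "C": 1, "D": 1, "E": 2}
SEND_ORDER = ["A", "B", "C", "D", "E"]


def is_valid_delivery(order, stream_of=STREAM_OF, send_order=SEND_ORDER):
    """Return True if `order` preserves send order within every stream."""
    sent_pos = {m: i for i, m in enumerate(send_order)}
    last_seen = {}  # stream -> send position of the last delivered message
    for msg in order:
        stream = stream_of[msg]
        if sent_pos[msg] < last_seen.get(stream, -1):
            return False  # an earlier message on this stream arrived later
        last_seen[stream] = sent_pos[msg]
    return True


valid_orders = [o for o in permutations(SEND_ORDER) if is_valid_delivery(o)]
print(len(valid_orders), "of 120 delivery orders respect per-stream ordering")
```

Under these assumptions many delivery orders are acceptable, whereas a single ordered connection such as TCP admits only the original send order.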

29 MPI Point-to-Point Overview

30 MPI Point-to-Point. Message matching is based on tag, rank, and context (TRC). Calls come in combinations such as blocking, non-blocking, synchronous, asynchronous, buffered, and unbuffered. Receives may use wildcards. MPI_Send(msg, count, type, dest-rank, tag, context); MPI_Recv(msg, count, type, source-rank, tag, context)
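
TRC matching with wildcards can be illustrated with a toy receive queue. This is a minimal sketch, not LAM's matching code: the names Envelope, matches and receive are hypothetical, and the wildcard values merely stand in for MPI_ANY_SOURCE and MPI_ANY_TAG. The context has no wildcard, as in MPI.

```python
from collections import namedtuple

# Stand-ins for MPI_ANY_SOURCE / MPI_ANY_TAG; the numeric values are illustrative.
ANY_SOURCE = -1
ANY_TAG = -1

Envelope = namedtuple("Envelope", "source tag context payload")


def matches(recv_source, recv_tag, recv_context, env):
    """TRC match: context must be equal; source and tag may be wildcards."""
    return (
        recv_context == env.context
        and recv_source in (ANY_SOURCE, env.source)
        and recv_tag in (ANY_TAG, env.tag)
    )


def receive(queue, source, tag, context):
    """Remove and return the earliest queued message that matches, else None."""
    for i, env in enumerate(queue):
        if matches(source, tag, context, env):
            return queue.pop(i)
    return None


queue = [
    Envelope(1, 7, 0, "first"),
    Envelope(2, 9, 0, "second"),
    Envelope(1, 7, 0, "third"),
]
```

Scanning the queue from the front means that two messages with the same source, tag and context are always matched in arrival order, which is why the transport must not reorder them.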

31 MPI Messages Using Same Context, Two Processes

32 Out of order messages with same tags violate MPI semantics

33 Using SCTP for MPI Striking similarities between SCTP and MPI

34 SCTP-based MPI

35 MPI over SCTP: Design and Implementation. LAM (Local Area Multi-computer) is an open source implementation of the MPI library. We redesigned the LAM TCP RPI module to use SCTP. The RPI module is responsible for maintaining state information for all requests.

36 Implementation Issues. Maintaining state information: maintain state appropriately for each request function to work with the one-to-many style. Message demultiplexing: extend RPI initialization to map associations to ranks, and demultiplex each incoming message to direct it to the proper receive function. Concurrency and SCTP streams: consistently map MPI tag-rank-context to SCTP streams, maintaining proper MPI semantics. Resource management: make the RPI more message-driven; eliminate the use of the select() system call, making the implementation more scalable; eliminate the need to maintain a large number of socket descriptors.
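
The mapping and demultiplexing ideas on this slide can be sketched as follows. Everything here is an assumption made for illustration: the slides do not give the actual mapping function, so stream_for uses a simple arithmetic hash, and NUM_STREAMS and the Demultiplexer class are hypothetical names. The rank is implied by the association, so only tag and context select the stream.

```python
NUM_STREAMS = 10  # hypothetical size of the stream pool per association


def stream_for(tag, context, num_streams=NUM_STREAMS):
    """Deterministically map an MPI (tag, context) pair to a stream number.

    Messages with the same tag and context always share a stream, so
    per-stream ordering keeps them in order for a given peer.
    """
    return (tag * 31 + context) % num_streams


class Demultiplexer:
    """Maps association ids (one per peer) to MPI ranks on a single socket."""

    def __init__(self):
        self.rank_of = {}

    def register(self, assoc_id, rank):
        """Record which rank sits behind an association (done at setup)."""
        self.rank_of[assoc_id] = rank

    def route(self, assoc_id, stream, payload):
        """Return (rank, stream, payload) for the receive logic to match."""
        return (self.rank_of[assoc_id], stream, payload)


demux = Demultiplexer()
demux.register(assoc_id=101, rank=0)
demux.register(assoc_id=102, rank=1)
routed = demux.route(102, stream_for(tag=5, context=0), b"hello")
```

Any deterministic function would do in place of the hash; the property that matters is that equal tag and context pairs never land on different streams.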

37 Implementation Issues. Eliminating race conditions: find solutions for race conditions due to added concurrency; use a barrier after the association setup phase. Reliability: modify the out-of-band daemons and the request progression interface (RPI) to use a common transport layer protocol so that all components of LAM can multihome successfully. Support for large messages: devised a long-message protocol to handle messages larger than the socket send buffer. Experiments with different SCTP stacks.
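
The slides do not describe the long-message protocol itself. As a rough illustration of the underlying problem only, the sketch below splits a payload into pieces no larger than an assumed send-buffer limit and rebuilds it on the other side. The function names and the offset-based framing are hypothetical and are not the authors' protocol.

```python
def split_message(payload: bytes, max_chunk: int):
    """Yield (offset, chunk) pieces, each at most max_chunk bytes."""
    if max_chunk <= 0:
        raise ValueError("max_chunk must be positive")
    for offset in range(0, len(payload), max_chunk):
        yield offset, payload[offset:offset + max_chunk]


def reassemble(pieces, total_length: int) -> bytes:
    """Rebuild the message from (offset, chunk) pieces arriving in any order."""
    buf = bytearray(total_length)
    for offset, chunk in pieces:
        buf[offset:offset + len(chunk)] = chunk
    return bytes(buf)


message = bytes(range(256)) * 10          # 2560 bytes of sample data
pieces = list(split_message(message, 1000))
rebuilt = reassemble(reversed(pieces), len(message))
```

Carrying the offset with each piece lets the receiver place data directly into the destination buffer without depending on arrival order.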

38 Features of Design. Head-of-line blocking avoidance; scalability (1 socket per process); multihoming; added security.

39 Head-of-Line Blocking
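
Head-of-line blocking can be illustrated with a small timing model. This is a sketch under made-up arrival times, not a measurement: delivery_times is a hypothetical helper that computes when each message can be handed to the application, either with ordering enforced across the whole connection (TCP-like) or only within each stream (SCTP-like).

```python
def delivery_times(messages, per_stream):
    """Compute when each message can be handed to the application.

    messages: list of (name, stream, arrival_time) in send order.
    per_stream: if True, ordering is enforced only within a stream
    (SCTP-like); if False, across the whole connection (TCP-like).
    """
    ready_after = {}  # ordering key -> latest delivery time so far
    result = {}
    for name, stream, arrival in messages:
        key = stream if per_stream else "connection"
        ready = max(arrival, ready_after.get(key, 0.0))
        ready_after[key] = ready
        result[name] = ready
    return result


# Message A is lost and retransmitted, so it arrives late (times are made up).
msgs = [("A", 0, 5.0), ("B", 1, 1.0), ("C", 1, 2.0)]
tcp_like = delivery_times(msgs, per_stream=False)
sctp_like = delivery_times(msgs, per_stream=True)
print("TCP-like: ", tcp_like)
print("SCTP-like:", sctp_like)
```

In the TCP-like case B and C wait for the retransmission of A even though they have already arrived; with separate streams only messages on A's stream are held back.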

47 Performance

48 SCTP Performance. The SCTP stack is in its early stages and will improve over time. Performance is stack dependent (Linux lksctp stack << FreeBSD KAME stack). SCTP bundles messages together, so it might not always be able to pack a full MTU. Comprehensive CRC32c checksum: offload to the NIC is not yet commonly available.

49 Experiments. MPBench ping-pong comparison; NAS Parallel Benchmarks; task farm program. Setup: 8 nodes, Dummynet. Fair comparison: same socket buffer sizes, Nagle disabled, SACK on, no multihoming, CRC32c off.
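
For the TCP side, two of the fair-comparison settings map onto standard socket options. The sketch below is an assumption about how such a setup might look, not the authors' configuration: BUF_SIZE is a made-up value because the slides give no number, and the kernel may adjust the requested buffer size. The SCTP sockets API defines an analogous SCTP_NODELAY option, which is not shown here.

```python
import socket

BUF_SIZE = 220 * 1024  # hypothetical buffer size; the slides give no value

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

# Disable Nagle's algorithm so small messages are sent immediately.
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)

# Request the same send and receive buffer sizes for every run.
sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, BUF_SIZE)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, BUF_SIZE)

# Read the values back: the kernel may round or clamp the buffer sizes.
nodelay = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY)
sndbuf = sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
sock.close()
```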

50 Experiments: Ping-pong MPBench Ping Pong Test under No Loss

51 Experiments: NAS

52 Experiments: Task Farm. Non-blocking communication; overlap of computation with communication; use of multiple tags.

53 Task Farm - Short Messages

54 Task Farm - Head-of-line blocking

55 Conclusions. SCTP is a better match for MPI: it avoids unnecessary head-of-line blocking through its use of streams, increases fault tolerance in the presence of multihomed hosts, has built-in security features, and has improved congestion control. SCTP may enable more MPI programs to execute in LAN and WAN environments.

56 Future Work. Release our LAM SCTP RPI module. Modify real applications to use tags as streams. Continue to look for opportunities to take advantage of standard IP transport protocols for MPI.

57 More information about our work is at: http://www.cs.ubc.ca/labs/dsg/mpi-sctp/ Thank you! Or Google “sctp mpi”

58 Extra Slides

59 Associations and Multihoming. [Diagram: Endpoint X (NIC1, NIC2) and Endpoint Y (NIC3, NIC4) across networks 207.10.x.x and 168.1.x.x; addresses shown: 207.10.40.1, 168.1.140.10, 168.1.10.30, 207.10.3.20.]

60 MPI over SCTP: Design and Implementation. Challenges: lack of documentation; code examination; our document is linked from the LAM/MPI website; extensive instrumentation; diagnostic traces; identification of problems in the SCTP protocol.

61 MPI API Implementation Request Progression Layer Short Messages vs. Long Messages

62 Partially Ordered User Messages Sent on Different Streams

63 Added Security: SCTP's use of a signed cookie. User data can be piggy-backed on the third and fourth legs of the four-way handshake.

64 Added Security. 32-bit verification tag (guards against reset attacks); autoclose feature; no half-closed state.

65 NAS Benchmarks. The NAS benchmarks approximate real-world parallel scientific applications. We experimented with a suite of 7 benchmarks and 4 data set sizes. SCTP performance was comparable to TCP for large datasets.

66 Farm Program - Long Messages

67 Head-of-line blocking – Long messages

68 Experiments: Benchmarks SCTP outperformed TCP under loss for ping pong test.
