SCTP versus TCP for MPI Brad Penoff, Humaira Kamal, Alan Wagner Department of Computer Science University of British Columbia.


1 SCTP versus TCP for MPI Brad Penoff, Humaira Kamal, Alan Wagner Department of Computer Science University of British Columbia

2 Outline Self Introduction Research background Research presentation SCTP & MPI background MPI over SCTP design Design features Results Conclusions

3 Who am I? Born and raised in the Columbus area. OSU alumnus. Europa alumnus. Worked a few years. Grad student finishing my MSc at UBC.

4 UBC

5 Who do I work with? Alan Wagner (Prof, UBC) Humaira Kamal (PhD, UBC) Mike Yao Chen Tsai (MSc, UBC) Edith Vong (BSc, UBC) Randall Stewart (Cisco)

8 What field do we work in? Parallel computing Concurrently utilize multiple resources 1 cook vs 8 cooks

10 What field do we work in? Message passing programming model Message Passing Interface (MPI) Standardized API for applications

11 What field do we work in? Middleware for MPI Glues necessary components together for parallel environment

13 What field do we work in? Parallel library component Implements MPI API for various interconnects Shared memory Myrinet Infiniband Specialized hardware (BlueGene/L, ASCI Red, etc)

14 What field do we work in? TCP/IP protocol stack interconnect Stream Control Transmission Protocol

15 SCTP versus TCP for MPI Brad Penoff, Humaira Kamal, Alan Wagner Department of Computer Science University of British Columbia Supercomputing 2005, Seattle, Washington USA

17 What are MPI and SCTP? Message Passing Interface (MPI): a library widely used to parallelize scientific and compute-intensive programs. Stream Control Transmission Protocol (SCTP): a general-purpose unicast transport protocol for IP network data communications, recently standardized by the IETF, that can be used anywhere TCP is used. Question: can we take advantage of SCTP features to better support parallel applications using MPI?

18 Communicating MPI Processes: TCP is often used as the transport protocol for MPI; SCTP can take its place.

19 SCTP Key Features Reliable in-order delivery, flow control, full-duplex transfer. Selective ACK is built into the protocol. TCP-like congestion control.

20 SCTP Key Features Message oriented Use of associations Multihoming Multiple streams within an association

21 Associations and Multihoming Primary address Heartbeats Retransmissions Failover User adjustable controls CMT
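
The failover behaviour listed on this slide can be sketched as a small state machine. This is only an illustration, not the SCTP stack's code: the class and method names are hypothetical, the addresses are documentation placeholders, and the threshold mirrors the Path.Max.Retrans parameter of the SCTP specification (default 5), which is one of the user-adjustable controls.

```python
class Association:
    """Toy model of path failover in a multihomed association (hypothetical API)."""

    def __init__(self, addresses, path_max_retrans=5):
        self.primary = addresses[0]               # traffic goes here while it is active
        self.errors = {a: 0 for a in addresses}   # consecutive timeouts per path
        self.active = {a: True for a in addresses}
        self.path_max_retrans = path_max_retrans

    def on_timeout(self, addr):
        """A heartbeat or data retransmission to addr timed out."""
        self.errors[addr] += 1
        if self.errors[addr] > self.path_max_retrans:
            self.active[addr] = False             # path is considered unreachable

    def on_ack(self, addr):
        """Any acknowledgement from addr clears its error count."""
        self.errors[addr] = 0
        self.active[addr] = True

    def destination(self):
        """Primary if active, otherwise the first active alternate, else None."""
        if self.active[self.primary]:
            return self.primary
        for addr, ok in self.active.items():
            if ok:
                return addr
        return None


assoc = Association(["192.0.2.1", "198.51.100.1"])
for _ in range(6):                 # six consecutive timeouts exceed the threshold of 5
    assoc.on_timeout("192.0.2.1")
print(assoc.destination())         # the alternate address is now used
```

Lowering the threshold makes failover faster at the cost of reacting to transient loss, which is the trade-off behind exposing it as a tunable.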

22 Logical View of Multiple Streams in an Association

23 Partially Ordered User Messages Sent on Different Streams

34 Messages can be received in the same order as they were sent (required in TCP).
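
The partial ordering pictured in these slides can be sketched as a receiver that keeps one reorder buffer per stream. The sketch assumes each message carries a stream id and a per-stream sequence number; `StreamReceiver` and its methods are hypothetical names, not part of any SCTP API.

```python
from collections import defaultdict


class StreamReceiver:
    """Delivers messages in order within each stream, independently across streams."""

    def __init__(self):
        self.next_seq = defaultdict(int)   # next expected sequence number per stream
        self.pending = defaultdict(dict)   # stream -> {seq: payload} held back
        self.delivered = []                # (stream, seq, payload) in delivery order

    def on_arrival(self, stream, seq, payload):
        self.pending[stream][seq] = payload
        # Release everything that is now in order on this stream only.
        while self.next_seq[stream] in self.pending[stream]:
            s = self.next_seq[stream]
            self.delivered.append((stream, s, self.pending[stream].pop(s)))
            self.next_seq[stream] += 1


rx = StreamReceiver()
# The first message on stream 0 is delayed; stream 1 is unaffected by the gap.
for stream, seq, payload in [(0, 1, "a1"), (1, 0, "b0"), (1, 1, "b1"), (0, 0, "a0")]:
    rx.on_arrival(stream, seq, payload)
print(rx.delivered)
```

With a single stream the same class behaves like TCP: every message waits for all earlier ones.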

39 MPI API Implementation Message matching is done based on Tag, Rank and Context (TRC). Send and receive modes come in combinations such as blocking, non-blocking, synchronous, asynchronous, buffered and unbuffered. Receives may use wildcards. MPI_Send(msg, count, type, dest-rank, tag, context) MPI_Recv(msg, count, type, source-rank, tag, context)
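
A minimal sketch of TRC matching with wildcards, assuming messages that arrive before a matching receive is posted are kept in an arrival-ordered queue. The constants stand in for MPI_ANY_SOURCE and MPI_ANY_TAG (their real values are implementation-defined), and the function names are hypothetical.

```python
ANY_SOURCE = -1   # stand-in for MPI_ANY_SOURCE in this sketch
ANY_TAG = -1      # stand-in for MPI_ANY_TAG in this sketch


def matches(recv_trc, msg_trc):
    """Both arguments are (tag, rank, context); the context never takes a wildcard."""
    r_tag, r_rank, r_ctx = recv_trc
    m_tag, m_rank, m_ctx = msg_trc
    return (r_ctx == m_ctx
            and r_rank in (ANY_SOURCE, m_rank)
            and r_tag in (ANY_TAG, m_tag))


def match_first(recv_trc, unexpected):
    """Remove and return the earliest queued message matching recv_trc, or None."""
    for i, msg in enumerate(unexpected):
        if matches(recv_trc, msg["trc"]):
            return unexpected.pop(i)
    return None


queue = [
    {"trc": (7, 1, 0), "data": "first"},
    {"trc": (9, 1, 0), "data": "other tag"},
    {"trc": (7, 1, 0), "data": "second"},
]
got = match_first((7, ANY_SOURCE, 0), queue)
print(got["data"])   # the earliest message with tag 7 is matched first
```

Scanning from the front of the queue is what keeps two messages with the same TRC from overtaking each other, provided the transport delivered them in order.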

40 MPI Messages Using Same Context, Two Processes

41 Out of order messages with same tags violate MPI semantics

42 MPI API Implementation Request Progression Layer Short Messages vs. Long Messages

43 MPI over SCTP: Design and Implementation LAM (Local Area Multi-computer) is an open source implementation of the MPI library, with origins at the Ohio Supercomputer Center. We redesigned the LAM TCP RPI module to use SCTP. The RPI module is responsible for maintaining state information for all requests.

44 MPI over SCTP: Design and Implementation Challenges: Lack of documentation (code examination; our document is linked from the LAM/MPI website). Extensive instrumentation (diagnostic traces). Identification of problems in the SCTP protocol.

45 Using SCTP for MPI Striking similarities between SCTP and MPI

46 Implementation Issues Maintaining State Information: maintain state appropriately for each request function to work with the one-to-many style. Message Demultiplexing: extend RPI initialization to map associations to ranks, and demultiplex each incoming message to direct it to the proper receive function. Concurrency and SCTP Streams: consistently map MPI tag-rank-context to SCTP streams, maintaining proper MPI semantics. Resource Management: make the RPI more message-driven; eliminate the use of the select() system call, making the implementation more scalable; eliminate the need to maintain a large number of socket descriptors.
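
One way the demultiplexing and stream mapping could look is sketched below. The slides do not give the actual mapping function, so the formula here is an assumption chosen only because it is deterministic; `Demultiplexer`, the association id and the stream count are hypothetical. With a one-to-many socket the association identifies the peer, so only tag and context need to be folded into the stream number.

```python
class Demultiplexer:
    """Maps associations to ranks and (tag, context) pairs to stream numbers."""

    def __init__(self, num_streams):
        self.num_streams = num_streams
        self.rank_of = {}                 # association id -> MPI rank

    def register(self, assoc_id, rank):
        """Called once per peer during initialization."""
        self.rank_of[assoc_id] = rank

    def source_rank(self, assoc_id):
        """Identify the sender of an incoming message from its association."""
        return self.rank_of[assoc_id]

    def stream_for(self, tag, context):
        """Deterministic, so equal (tag, context) always share one stream.

        The formula is an illustrative assumption, not the authors' mapping.
        """
        return (context * 31 + tag) % self.num_streams


demux = Demultiplexer(num_streams=10)
demux.register(assoc_id=4001, rank=3)
s1 = demux.stream_for(tag=42, context=0)
s2 = demux.stream_for(tag=42, context=0)
print(s1, s2, demux.source_rank(4001))
```

Because messages with the same tag, rank and context always land on the same stream, SCTP's per-stream ordering keeps them in order, while messages with different tags may proceed independently.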

47 Implementation Issues Eliminating Race Conditions: find solutions for race conditions due to added concurrency; use a barrier after the association setup phase. Reliability: modify the out-of-band daemons and the request progression interface (RPI) to use a common transport layer protocol so that all components of LAM can multihome successfully. Support for Large Messages: devised a long-message protocol to handle messages larger than the socket send buffer. Experiments with different SCTP stacks.
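
The slides do not describe the long-message protocol itself. As a rough illustration of the underlying problem only, the sketch below splits a payload into pieces no larger than an assumed send-buffer limit and reassembles them by offset; the function names and the (offset, total, data) framing are hypothetical.

```python
def fragment(payload, max_chunk):
    """Split payload into (offset, total_length, data) pieces of at most max_chunk bytes."""
    total = len(payload)
    if total == 0:
        return [(0, 0, b"")]
    return [(offset, total, payload[offset:offset + max_chunk])
            for offset in range(0, total, max_chunk)]


def reassemble(chunks):
    """Rebuild the payload; pieces may be supplied in any order."""
    total = chunks[0][1]
    buf = bytearray(total)
    for offset, _, data in chunks:
        buf[offset:offset + len(data)] = data
    return bytes(buf)


message = bytes(range(10))
pieces = fragment(message, max_chunk=4)   # three pieces: 4 + 4 + 2 bytes
print(len(pieces), reassemble(pieces) == message)
```

Carrying the offset in every piece keeps reassembly independent of arrival order, which matters if pieces are ever sent on different streams.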

48 Features of Design Scalability Head-of-Line Blocking

49 Scalability TCP

50 Scalability SCTP

51 Head-of-Line Blocking
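
Head-of-line blocking can be made concrete with a small model: a message is handed to the application only once it and every earlier message in its ordering domain have arrived. The domain is the whole connection for TCP-like delivery and a single stream for SCTP-like delivery. The function name and the timings are made up for illustration.

```python
def delivery_times(arrivals, per_stream):
    """arrivals: list of (send_order, stream, arrival_time).

    Returns {send_order: time at which the message can be delivered}.
    """
    delivered = {}
    latest = {}   # ordering domain -> latest arrival among messages sent so far
    for order, stream, t in sorted(arrivals):
        domain = stream if per_stream else None   # None means one shared domain
        latest[domain] = max(latest.get(domain, 0.0), t)
        delivered[order] = latest[domain]
    return delivered


# Message 0 (stream 0) is lost and its retransmission arrives at t=10.
arrivals = [(0, 0, 10.0), (1, 1, 1.0), (2, 1, 2.0), (3, 0, 3.0)]
tcp_like = delivery_times(arrivals, per_stream=False)
sctp_like = delivery_times(arrivals, per_stream=True)
print(tcp_like)
print(sctp_like)
```

In the single-domain case every message waits for the retransmission; with per-stream ordering only the messages sharing the lost message's stream are held back.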

59 Limitations Comprehensive CRC32c checksum: offload to the NIC is not yet commonly available. SCTP bundles messages together, so it might not always be able to pack a full MTU. The SCTP stack is in its early stages and will improve over time. Performance is stack dependent (Linux lksctp stack << FreeBSD KAME stack).

60 Experiments Controlled environment: eight nodes, Dummynet. Used standard benchmarks as well as real-world programs. Fair comparison: buffer sizes, Nagle disabled, SACK ON, no multihoming, CRC32c OFF.
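
For the TCP side, two of the settings named above (Nagle disabled, explicit buffer sizes) map onto standard socket options. The sketch below shows them on an unconnected socket; the buffer size is a placeholder, not the value used in the experiments, and `configure_tcp` is a hypothetical helper. SCTP has an analogous SCTP_NODELAY option in its sockets API, which is not shown here.

```python
import socket


def configure_tcp(sock, buf_bytes):
    """Disable Nagle's algorithm and request send/receive buffer sizes."""
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, buf_bytes)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, buf_bytes)


sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
configure_tcp(sock, buf_bytes=64 * 1024)   # placeholder size
nodelay = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY)
sock.close()
print(bool(nodelay))
```

The kernel may round or cap the requested buffer sizes, so a fair comparison should read the effective values back rather than assume the request was honoured.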

61 Experiments: Benchmarks MPBench Ping Pong Test under No Loss

62 NAS Benchmarks The NAS benchmarks approximate real-world parallel scientific applications. We experimented with a suite of 7 benchmarks and 4 data set sizes. SCTP performance was comparable to TCP for large datasets.

63 Latency Tolerant Programs Bulk Farm Processor program Real-world application Non-blocking communication Overlap computation with communication Use of multiple tags

64 Farm Program - Short Messages

65 Head-of-line blocking – Short messages

66 Conclusions SCTP is better suited for MPI: it avoids unnecessary head-of-line blocking through the use of streams, offers increased fault tolerance in the presence of multihomed hosts, has built-in security features, and is robust under loss. SCTP might be key to moving MPI programs from LANs to WANs.

67 Future Work Release LAM SCTP RPI module at SC|05 Incorporate our work into Open MPI and/or MPICH2 Modify real applications to use tags as streams

68 More information about our work is at: http://www.cs.ubc.ca/labs/dsg/mpi-sctp/ Thank you!

69 Extra Slides

71 Added Security SCTP's use of a signed cookie. User data can be piggybacked on the third and fourth legs of the association setup handshake.

72 Added Security 32-bit Verification Tag guards against reset attacks. Autoclose feature. No half-closed state.

73 Farm Program - Long Messages

74 Head-of-line blocking – Long messages

75 Experiments: Benchmarks SCTP outperformed TCP under loss for ping pong test.

