Request ordering for FI_MSG and FI_RDM endpoints


1 Request ordering for FI_MSG and FI_RDM endpoints
29 April ’14

2 Something needed so consumers of libfabric stay sane
A few types of endpoints with simple ordering rules that are reasonably easy to understand. The ordering rules should allow enough flexibility that different providers can deliver maximum performance while still ensuring program correctness.

3 Background – what is ordering?
Ordering is an end-to-end concept and may include some or all of the following:
- Expectations of the API consumer w.r.t. the order of execution of operations posted to the fabric provider
- Execution of operations as expressed on the wire
- Ordering of information as packets/flits/msgs cross the wire
- Ordering of inbound operations (both inbound requests and inbound RDMA operations)
- Order in which inbound data is placed on the memory bus
- Order in which inbound data is written to memory by the memory controller
- Expectations of the API consumer w.r.t. the order in which operations are completed/notified

4 IB: The concise ordering rules
Operations on the SEND queue are transmitted on the wire in order.
Operations at the RESPONDER side are executed in the order received.
A SEND or RDMA WRITE may be executed before a preceding RDMA READ!
Operations on the SEND queue are completed in the order in which they were posted.
[Figure: the requester posts 1 SEND, 2 RDMA RD, 3 SEND to the responder, which returns ACK 1, READ DATA and ACK 3.]
6/14/2011
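The last rule can be pictured as a small reorder buffer. The C sketch below is a model, not verbs or libfabric code (struct send_queue and the sq_* helpers are hypothetical). Wire-level responses may arrive in any order, e.g. ACK 3 before the READ DATA for op 2, but completions are only reported in post order. For simplicity the queue does not wrap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: ops may finish on the wire out of order, but
 * completions are reported to the consumer strictly in post order. */
#define SQ_DEPTH 8

struct send_queue {
    bool done[SQ_DEPTH]; /* done[i]: wire response for op i has arrived */
    size_t posted;       /* number of ops posted so far */
    size_t completed;    /* number of completions already reported */
};

/* Post a new op; returns its sequence number (non-wrapping queue). */
static size_t sq_post(struct send_queue *sq) {
    assert(sq->posted < SQ_DEPTH);
    sq->done[sq->posted] = false;
    return sq->posted++;
}

/* Record that the ACK or READ DATA for `op` has arrived. */
static void sq_wire_done(struct send_queue *sq, size_t op) {
    assert(op < sq->posted);
    sq->done[op] = true;
}

/* Report the next completion only if the oldest outstanding op has
 * finished; later ops that finished early must wait behind it. */
static bool sq_poll(struct send_queue *sq, size_t *op) {
    if (sq->completed == sq->posted || !sq->done[sq->completed])
        return false;
    *op = sq->completed++;
    return true;
}
```

A provider is free to track wire progress however it likes; only the order in which completions are handed back is constrained.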

5 What does man -l man/fi_getinfo.3 say now?
FI_MSG - Provides reliable, in-order, message-based communication, with data transfers maintaining message boundaries.
Hmmm. Okay, so if you bought an adaptive network, you wasted your money.
FI_RDM - Provides reliable datagram communication without ordering guarantees.
Hmmm. Okay, does PSM really work this way? MPI can’t use this mode easily.

6 It’s worse than that
IB provides ordering only on operations posted to a given QP. The QP construct binds together operations of different types in order to provide ordering guarantees between them, e.g. between message and RDMA operations. How is that accomplished using the fabric interfaces?

7 What would be nicer…
FI_MSG - Provides reliable, message-based communication, with data transfers maintaining message boundaries. Messages are ordered by default, with relaxed order optionally supported on a per-message basis.
FI_RDM - Provides reliable datagram communication. By default, ordering is not guaranteed, although for datagrams targeting a given network endpoint, a sequence of datagrams can be specified as an ordered sequence.

8 Using relaxed order in MPI – rendezvous example
[Figure: operations WR0, WR1, SndCmp0 and SndCmp1 sent to the responder; annotations mark where ordering is required and where there is no ordering dependency.]

9 PCI-e Transaction order rules (for given TC, src, target)
Producer/consumer model. Can the second op (row) pass the first op (column)?
Columns (first op): Posted Request, Non-posted read request, Non-posted AMO req.
Rows (second op): Posted Request (RDMA write), Non-posted read request, Non-posted AMO request
- Posted Request (RDMA write) passing Posted Request: a) no, b) y/n
- Posted Request (RDMA write) passing Non-posted read request: yes
- Non-posted read request row: y/n
a) Strict producer/consumer model, i.e. strict order
b) I’m feeling lucky and have set the RO bit in the second request
Requesting relaxed order doesn’t mean you’ll get it, hence y/n. If you’re a HW designer, don’t count on relaxed order.
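A minimal C sketch of the recoverable Posted Request row, treating an RDMA write that lands as a PCI-e posted write. All names are hypothetical; PASS_MAYBE stands for the table's y/n. Treating the Non-posted AMO column as yes is an assumption based on the PCI-e rule that posted requests must be able to pass non-posted ones.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical encoding of the Posted Request row (same TC, src, target). */
enum pcie_req { POSTED_WRITE, NP_READ, NP_AMO };
enum pass_rule { PASS_NO, PASS_YES, PASS_MAYBE }; /* PASS_MAYBE = "y/n" */

/* Can a posted request pass an earlier request of kind `first`?
 * second_has_ro: the RO (relaxed ordering) bit is set in the second request. */
static enum pass_rule posted_may_pass(enum pcie_req first, bool second_has_ro) {
    switch (first) {
    case POSTED_WRITE:
        /* a) strict producer/consumer order; b) RO set: allowed, not promised */
        return second_has_ro ? PASS_MAYBE : PASS_NO;
    case NP_READ:
    case NP_AMO: /* assumption for the AMO column, see lead-in */
        return PASS_YES;
    }
    assert(!"unknown request kind");
    return PASS_NO;
}

/* Software may only rely on order when the rule is a hard "no". */
static bool order_guaranteed(enum pass_rule r) { return r == PASS_NO; }
```

The second helper captures the slide's advice: a y/n cell gives no guarantee either way, so correctness cannot depend on it.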

10 IB Transaction ordering rules (RC)
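Can the second op (row) pass the first op (column)?
Columns (first op): Send/RDMA Write, RDMA Read, Non-posted AMO req.
Rows (second op): Send/RDMA Write, Atomic Op.
- Send/RDMA Write passing Send/RDMA Write: no
- Send/RDMA Write passing RDMA Read: a) yes, b) no
- Atomic Op. row: a) yes, b) no
no: Strict producer-consumer model
a) No ordering guarantee; don’t count on order if you want correctness
b) I need order and have set the fence bit in the second transaction
March 30 – April 2, 2014 #OFADevWorkshop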
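The Send/RDMA Write row can be written as a predicate. This is a hypothetical C sketch, not verbs code. The Atomic column is an assumption taken from the verbs fence semantics (IBV_SEND_FENCE on RC QPs), under which a fenced request waits for all prior RDMA Read and Atomic operations to complete.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical op kinds for one RC QP. */
enum ib_op { IB_SEND_OR_WRITE, IB_RDMA_READ, IB_ATOMIC };

/* May a later SEND/RDMA WRITE be executed before an earlier `first` op?
 * `fenced` models the fence bit set on the second request. */
static bool ib_send_write_may_pass(enum ib_op first, bool fenced) {
    switch (first) {
    case IB_SEND_OR_WRITE:
        return false;   /* strict producer/consumer order */
    case IB_RDMA_READ:
    case IB_ATOMIC:     /* assumption, see lead-in */
        return !fenced; /* no guarantee unless the fence bit is set */
    }
    assert(!"unknown op");
    return false;
}

/* The consumer must fence the second op when it depends on the first
 * (e.g. a SEND announcing data fetched by a prior RDMA READ) and the
 * table says it could otherwise pass. */
static bool needs_fence(enum ib_op first, bool depends_on_first) {
    return depends_on_first && ib_send_write_may_pass(first, false);
}
```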

11 Libfabric FI_MSG now – depending on interpretation of fi_getinfo.3
Can the second op (row) pass the first op (column)?
Columns (first op): Send/RDMA write, RDMA Read, Non-posted AMO req.
Rows (second op): Send/RDMA Write, Atomic Op.
Every cell: no

12 Libfabric FI_MSG with optional relaxed order proposal
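Can the second op (row) pass the first op (column)?
Columns (first op): Send/RDMA write, RDMA Read, Non-posted AMO req.
Rows (second op): Send/RDMA Write, Atomic Op.
- Send/RDMA Write passing Send/RDMA write: a) no, b) y/n
- Send/RDMA Write passing RDMA Read: a) yes, c) no
- Atomic Op. row: a) yes, c) no
a) IB RC-like behavior (default)
b) Relaxed order bit set in flag (SendMsg, etc.)
c) Fence bit set in flag
Provider is free to ignore b), but must observe c).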
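A hypothetical C sketch of the proposed default/relaxed/fence rules for a second Send/RDMA write. PROPOSED_RELAXED_ORDER and PROPOSED_FENCE are illustrative bit values, not libfabric flags, and provider_relaxes models the provider's freedom to ignore b). Treating the Atomic column like the RDMA Read column is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-request flag bits for the proposal; not libfabric API. */
#define PROPOSED_RELAXED_ORDER (UINT64_C(1) << 0)
#define PROPOSED_FENCE         (UINT64_C(1) << 1)

enum msg_op { OP_SEND_OR_WRITE, OP_RDMA_READ, OP_ATOMIC };

/* Is a second Send/RDMA write posted with `flags` allowed to pass an
 * earlier op of kind `first`? provider_relaxes: whether this provider
 * chooses to exploit the relaxed-order hint (it may ignore it). */
static bool msg_second_may_pass(enum msg_op first, uint64_t flags,
                                bool provider_relaxes) {
    if (flags & PROPOSED_FENCE)
        return false; /* c) fence must always be observed */
    switch (first) {
    case OP_SEND_OR_WRITE:
        /* a) no by default; b) y/n when relaxed order is requested */
        return (flags & PROPOSED_RELAXED_ORDER) && provider_relaxes;
    case OP_RDMA_READ:
    case OP_ATOMIC: /* assumption, see lead-in */
        return true; /* a) IB RC-like default */
    }
    assert(!"unknown op");
    return false;
}
```

Putting the fence check first reflects the asymmetry on the slide: a provider may decline the relaxed hint, but never the fence.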

13 Ordering bits for FI_MSG
Add a new flag bit for sendmsg/writemsg: FI_RELAXED_ORDER
- If this bit is set, the MSG, RMA, or AMO operation may be completed ahead of pending MSG, RMA, or AMO ops in the EP’s send queue.
- Messages may appear to complete out of order when this bit is set.
Add a new flag bit for sendmsg/writemsg: FI_FENCE_GLOBAL
- If this bit is set, the operation posted to the EP will not be initiated until all previously posted MSG, RMA, and AMO ops to the EP have completed globally.
- fi_ep_sync sounds blocking; this is a potentially non-blocking way to do a fence.
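The point of FI_FENCE_GLOBAL is that posting never blocks; the wait moves into the provider's progress path. A hypothetical C model of that (struct ep_model and the ep_* helpers are illustrative, not libfabric code), assuming ops on the EP are initiated in post order, so ops posted after a fenced op queue behind it:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define EP_MAX_OPS 16

/* Hypothetical endpoint model honouring a per-op global fence flag. */
struct ep_model {
    bool fence[EP_MAX_OPS]; /* op was posted with the fence flag */
    bool done[EP_MAX_OPS];  /* op has completed globally */
    size_t posted;          /* ops posted so far */
    size_t initiated;       /* ops [0, initiated) have been started */
};

/* Posting only queues the op; it never waits. */
static size_t ep_post(struct ep_model *ep, bool fence) {
    assert(ep->posted < EP_MAX_OPS);
    ep->fence[ep->posted] = fence;
    ep->done[ep->posted] = false;
    return ep->posted++;
}

/* True if every op posted before `op` has completed globally. */
static bool all_prior_done(const struct ep_model *ep, size_t op) {
    for (size_t i = 0; i < op; i++)
        if (!ep->done[i])
            return false;
    return true;
}

/* Start queued ops in post order, stopping at a fenced op whose
 * predecessors are still outstanding. Returns the number started. */
static size_t ep_progress(struct ep_model *ep) {
    size_t started = 0;
    while (ep->initiated < ep->posted) {
        size_t op = ep->initiated;
        if (ep->fence[op] && !all_prior_done(ep, op))
            break;
        ep->initiated++;
        started++;
    }
    return started;
}

/* Record global completion of an op that has been started. */
static void ep_complete(struct ep_model *ep, size_t op) {
    assert(op < ep->initiated);
    ep->done[op] = true;
}
```

Usage: post two writes and a fenced send, then call ep_progress after each completion; the fenced send starts only once both writes have completed, and the caller never blocks in between.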

14 Libfabric FI_RDM now – depending on interpretation of fi_getinfo.3
Can the second op (row) pass the first op (column)?
Columns (first op): Send/RDMA write, RDMA Read, Non-posted AMO req.
Rows (second op): Send/RDMA Write, Atomic Op.
Every cell: yes
Not enough order? fi_ep_sync seems kind of heavyweight.

15 Libfabric FI_RDM suggestion – HyperTransport ordered sequences
Can the second op (row) pass the first op (column)?
Columns (first op): Send/RDMA write, RDMA Read, Non-posted AMO req.
Rows (second op): Send/RDMA Write, Atomic Op.
- Send/RDMA Write row: a) yes, b) no
- Atomic Op. row: a) yes, b) no
a) Default, as in the man page. The app must use fi_ep_sync for ordering.
b) If the second op belongs to the same ordered sequence. Ops must be back-to-back.

16 Ordering bits for FI_RDM
Add a new flag bit for sendmsg/writemsg: FI_ORDERED_SEQ
- If this bit is set, the message, RMA, or AMO operation is treated as part of an ordered sequence. The sequence number is specified in the flow field of the fi_msg (or analogous) argument.
- MSG, RMA, and AMO requests within an ordered sequence must be posted sequentially to a given endpoint, without intervening requests that are not part of the ordered sequence. The operations must all target the same target address.
- Some providers may be able to do this efficiently; otherwise the behavior is as if fi_ep_sync were invoked internally between each operation.
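A hypothetical C checker for the FI_ORDERED_SEQ posting rules, plus a count of the implicit fi_ep_sync calls a provider without native support would behave as if it made. struct posted_req and its fields are illustrative, not libfabric types, and the sketch assumes each flow value identifies a single sequence, so the same flow reappearing after other requests counts as a broken sequence.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of a posted request; not a libfabric struct. */
struct posted_req {
    bool ordered_seq;   /* request carries the ordered-sequence flag */
    uint32_t flow;      /* sequence id, carried in the flow field */
    uint64_t dest_addr; /* target address of the request */
};

/* Check that every ordered sequence in reqs[0..n) is posted back-to-back
 * (no intervening requests) and targets a single address. */
static bool ordered_seqs_valid(const struct posted_req *reqs, size_t n) {
    for (size_t i = 0; i < n; i++) {
        if (!reqs[i].ordered_seq)
            continue;
        size_t j = i; /* extend the run of same-flow requests */
        while (j + 1 < n && reqs[j + 1].ordered_seq &&
               reqs[j + 1].flow == reqs[i].flow) {
            if (reqs[j + 1].dest_addr != reqs[i].dest_addr)
                return false; /* sequence changed target */
            j++;
        }
        /* The flow reappearing later means the sequence was interrupted. */
        for (size_t k = j + 1; k < n; k++)
            if (reqs[k].ordered_seq && reqs[k].flow == reqs[i].flow)
                return false;
        i = j;
    }
    return true;
}

/* Fallback behavior: one implicit sync between consecutive requests
 * of the same sequence. */
static size_t implicit_syncs(const struct posted_req *reqs, size_t n) {
    size_t syncs = 0;
    for (size_t i = 1; i < n; i++)
        if (reqs[i].ordered_seq && reqs[i - 1].ordered_seq &&
            reqs[i].flow == reqs[i - 1].flow)
            syncs++;
    return syncs;
}
```

The sync count shows why native support matters: a three-request sequence costs two full round trips on a provider that can only emulate it.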

