
OFED TCP Port Mapper Proposal June 15, 2011. Overview Current NE020 Linux OFED driver uses host TCP/IP stack MAC and IP address for RDMA connections Hardware.


1 OFED TCP Port Mapper Proposal June 15, 2011

2 Overview
The current NE020 Linux OFED driver uses the host TCP/IP stack's MAC and IP addresses for RDMA connections.
- Hardware tags packets used for RDMA connection management so they are easy to identify.
- Host TCP/IP stack services are used for address resolution and neighbor updates.
- The RDMA CM claims a TCP port by creating a kernel socket when the unified portspace patch is applied and support is enabled via a module option: tch;h=cfe f de9b8ba15f2e35b2997;hb=ofed_kernel_1_5
- The unified portspace kernel patch is applied only when the OFED distribution is used intact.
- At least one OSV (RedHat, starting with RHEL 6.0) is moving to a model where OFED kernel patches will not be applied.
- iSCSI hardware acceleration has moved to a separate MAC/IP address that is not visible to the Linux TCP/IP stack (private interface).
- The Linux community has rejected previous pushes to include the portspace patch, rather violently. Its suggestion is to do what iSCSI did.
Goal of this presentation: describe a solution to the iWARP TCP portspace issue using the Sockets Direct Protocol Port Mapper and netlink sockets.

3 Current OFED iWARP CM Flows (Listen)
- The application issues rdma_listen. For a userspace application, a transition to the kernel occurs.
- The local IP address is the Linux IP address (IP0).
- The OFED CM selects an interface and selects a local port from the appropriate portspace.
- Simple case (IP0 and TCP Port0): the local IP can be ANY, in which case the CM issues the listen on all interfaces. The local port can be ANY, in which case the CM picks a port. If both the local IP and port are ANY, the port must be accepted on all interfaces.
- The portspace patch issues socket and bind calls for iWARP providers. This portion has not been accepted into the kernel; the patch exists only in the OFED package. By default, the kernel CM picks a port independently of the host TCP/IP stack.
Diagram steps:
1. rdma_listen(Local IP0, Local Port0)
2. Transition to kernel CM
3. Interface selected
4. Port selected
5. kern_socket, bind
6. create_listen
7. Setup hardware
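Step 5 (kern_socket, bind) is what keeps the CM and the host stack from handing out the same port: a socket bound to a port occupies it in the host TCP port space, so ordinary sockets applications cannot take it. The Python sketch below shows that effect from user space with an ordinary socket; it is an analogy for the in-kernel patch, not its code, and the loopback address is an assumption.

```python
import errno
import socket


def reserve_tcp_port(ip: str = "127.0.0.1", port: int = 0) -> socket.socket:
    """Claim (ip, port) in the host TCP port space by binding a socket.

    Passing port 0 lets the host stack pick a free port, much as the CM
    picks one when the application passes Local Port 0.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind((ip, port))  # bind alone holds the port; no listen() needed
    return sock


def port_is_claimed(ip: str, port: int) -> bool:
    """Return True if a fresh bind to (ip, port) fails with EADDRINUSE."""
    probe = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        probe.bind((ip, port))
        return False
    except OSError as exc:
        if exc.errno == errno.EADDRINUSE:
            return True
        raise
    finally:
        probe.close()


if __name__ == "__main__":
    holder = reserve_tcp_port()
    ip, port = holder.getsockname()
    print(f"reserved {ip}:{port}, claimed={port_is_claimed(ip, port)}")
    holder.close()
```

Without such a bind (the default, unpatched case), nothing stops a host application from binding the same port the CM selected.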

4 Current OFED iWARP CM Flows (Connect)
- The application issues rdma_connect. For a userspace application, a transition to the kernel occurs.
- The local and remote IP addresses are the Linux IP addresses (IP0, IP2).
- The OFED CM selects an interface and selects a local port from the appropriate portspace. The local IP can be ANY; the CM uses the Linux stack to pick an interface, which usually handles the neighbour update before the request reaches the provider.
- The portspace patch issues socket and bind calls for iWARP providers.
- The kernel provider is informed of (and can trigger) neighbour updates to stay in sync with the Linux TCP/IP stack.
- The kernel provider mini-CM handles the TCP/IP three-way handshake and the MPA exchange through dev_queue_xmit and a private receive path.
Diagram steps:
1. rdma_connect(Local IP0, Local Port0, Remote IP2, Remote Port2)
2. Transition to kernel CM
3. Interface selected
4. Port selected
5. kern_socket, bind
6. connect
7. Setup hardware
8. Neighbour update
9. CM packets

5 New OFED iWARP CM Architecture
- Similar to the current CM flow.
- OFED gains a new iWARP Port Mapper daemon in userspace.
- OFED gains a new netlink interface between userspace and the kernel. It was introduced for statistics and is extended for iWARP providers and the new Port Mapper daemon. The netlink interface is roughly modeled after iSCSI.
- Supports (but does not require) second MAC/IP addresses on the local and remote peers (soft iWARP).
Netlink messages:
- Port Mapper netlink upcalls: Query PID, Add/Remove Mapping, Query Mapping
- Provider netlink upcalls: Query PID, Connect, Listen, Resolve
- Provider netlink downcalls: Inbound Connect, Operation Complete (for upcalls), Interface Down
Three RNIC models are supported:
- RNICs with the CM in the kernel/adapter
- RNICs with the CM in userspace
- Hybrid RNICs with a userspace CM that requires adapter assistance
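The message set above can be summarized as a small table, which makes it easy to see which side starts each exchange. The channel and operation strings in this Python sketch are hypothetical labels for the slide's message names, not identifiers from any merged kernel interface; "upcall" is taken to mean kernel to userspace.

```python
from enum import Enum
from typing import List


class Direction(Enum):
    UPCALL = "kernel to userspace"
    DOWNCALL = "userspace to kernel"


# Hypothetical catalog of the netlink messages named on the slide.
MESSAGES = [
    ("port_mapper", "query_pid", Direction.UPCALL),
    ("port_mapper", "add_mapping", Direction.UPCALL),
    ("port_mapper", "remove_mapping", Direction.UPCALL),
    ("port_mapper", "query_mapping", Direction.UPCALL),
    ("provider", "query_pid", Direction.UPCALL),
    ("provider", "connect", Direction.UPCALL),
    ("provider", "listen", Direction.UPCALL),
    ("provider", "resolve", Direction.UPCALL),
    ("provider", "inbound_connect", Direction.DOWNCALL),
    ("provider", "operation_complete", Direction.DOWNCALL),  # answers any upcall
    ("provider", "interface_down", Direction.DOWNCALL),
]


def operations(channel: str, direction: Direction) -> List[str]:
    """List the operations a channel uses in one direction, in table order."""
    return [op for ch, op, d in MESSAGES if ch == channel and d == direction]
```

For example, operations("provider", Direction.DOWNCALL) returns ["inbound_connect", "operation_complete", "interface_down"].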

6 iWARP Port Mapper Concept
The Port Mapper concept was introduced by the RDMA Consortium as part of the Sockets Direct Protocol (SDP) specification.
- It provides a mechanism to keep an iWARP port space separate from the Linux TCP port space.
- The iWARP port space can be on an independent IP address or on the same IP address.
- The Port Mapper service runs over TCP on a well-known port (3935) on the Linux IP addresses; its listen is issued at service startup.
Port Mapper service rdma_listen steps:
- Register a mapping between the Linux IP address/TCP port and the iWARP IP address/TCP port with the Port Mapper service.
Port Mapper service rdma_connect steps:
1. Receive a query request from a Port Mapper service client.
2. Connect to the remote peer on the well-known port.
3. Query the RDMA peer's iWARP IP address/TCP port using the SDP Port Mapper protocol (PMRequest).
4. Return the information from the PMAccept message to the client of the Port Mapper service.
Port Mapper service peer query steps:
1. Accept the Port Mapper connection (port 3935 on the Linux IP address) from the node issuing the query.
2. Receive the PMRequest message.
3. Look up the IP address and port from the PMRequest in the local database populated by the rdma_listen step.
4. Return the mapped IP address and port in a PMAccept message.
After receiving the PMAccept message, the iWARP provider issues the iWARP connect using the local and remote iWARP IP address/TCP port "quad".
Later slides show more detail.
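The three sets of steps above reduce to a small lookup table on each node. The Python sketch below models them in-process under stated simplifications: the PMRequest and PMAccept classes carry only the fields needed here and do not reproduce the SDP wire format, and query_peer calls the peer object directly where a real client would connect to the peer's Linux IP address on port 3935.

```python
from dataclasses import dataclass
from typing import Dict, Optional

PORT_MAPPER_PORT = 3935  # well-known TCP port named in the proposal


@dataclass(frozen=True)
class Endpoint:
    ip: str
    port: int


@dataclass(frozen=True)
class PMRequest:
    target: Endpoint  # Linux IP/port the client asked to connect to


@dataclass(frozen=True)
class PMAccept:
    mapped: Endpoint  # iWARP IP/port to use for the RDMA connection


class PortMapperService:
    """One node's Port Mapper database (simplified, no network I/O)."""

    def __init__(self) -> None:
        self._db: Dict[Endpoint, Endpoint] = {}

    def register(self, linux_ep: Endpoint, iwarp_ep: Endpoint) -> None:
        # rdma_listen step: record Linux IP/port -> iWARP IP/port.
        self._db[linux_ep] = iwarp_ep

    def handle_request(self, req: PMRequest) -> Optional[PMAccept]:
        # Peer query step: look up the target and answer with a PMAccept.
        mapped = self._db.get(req.target)
        return PMAccept(mapped) if mapped is not None else None


def query_peer(peer: PortMapperService, target: Endpoint) -> Optional[Endpoint]:
    # rdma_connect step. A real client would open a TCP connection to
    # target.ip:PORT_MAPPER_PORT; the peer object stands in for it here.
    reply = peer.handle_request(PMRequest(target))
    return reply.mapped if reply is not None else None
```

The endpoint returned by query_peer is the remote half of the iWARP quad; the local half comes from the provider's own port space.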

7 Pending Netlink Patch for OFED
- A patch was recently submitted to query RDMA connection information via netlink. Roland has rolled this patch into the linux-next patch set for late May.
- The patch introduces a single OFED netlink port and an InfiniBand netlink infrastructure in ib_core.
- It supports 32 clients within OFED and 1024 operations for each client. Only a single client (rdma_cm) is currently defined.
- Components interested in adding netlink capabilities to OFED can register with the InfiniBand netlink infrastructure. The Port Mapper daemon consumes one client, and each iWARP provider consumes an additional client.
- The netlink dump operation is used to provide data back to the netlink client.
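With 32 clients and 1024 operations per client, both values fit in one 16-bit netlink message type. The Python sketch below shows one plausible packing; the 5-bit client and 10-bit operation widths are inferred from those counts and are an assumption, not the layout defined by the patch.

```python
from typing import Tuple

OP_BITS = 10      # 2**10 = 1024 operations per client (assumed layout)
CLIENT_BITS = 5   # 2**5 = 32 clients (assumed layout)


def make_type(client: int, op: int) -> int:
    """Pack a client index and an operation into one message type value."""
    if not 0 <= client < (1 << CLIENT_BITS):
        raise ValueError(f"client {client} out of range")
    if not 0 <= op < (1 << OP_BITS):
        raise ValueError(f"op {op} out of range")
    return (client << OP_BITS) | op


def split_type(msg_type: int) -> Tuple[int, int]:
    """Recover (client, op) from a packed message type."""
    return msg_type >> OP_BITS, msg_type & ((1 << OP_BITS) - 1)
```

The largest value, make_type(31, 1023), is 32767, so it fits in the 16-bit nlmsg_type field. Numbering clients from 1 keeps packed values clear of the low netlink control types (those below NLMSG_MIN_TYPE, 0x10).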

8 New OFED iWARP CM Flows (Listen: Userspace provider CM)
Diagram steps:
1. rdma_listen(Local IP0, Local Port0)
2. Transition to kernel CM
3. Interface selected
4. Port selected
5. create_listen
6. Netlink: Listen
7. Netlink: Register Port Map IP0, Port0 -> IP1, Port1
8. Netlink: Complete
9. Setup hardware (IP1, Port1)
- Similar to the current CM flow.
- The CM can now reserve ports independently, since the Port Mapper allows providers to use any provider-managed port number to represent the CM port number.
- A netlink message is used to issue the listen to the userspace library.
- The mini-CM or userspace TCP stack manages the provider "port space" to obtain local TCP Port1, which is related to the CM local Port0.
- The userspace library registers local IP1, Port1.
- For compatibility, a bind could also be made on the existing MAC/IP stack; soft iWARP requires this, as do some customers. If the userspace provider library issues socket/bind to the Linux TCP/IP stack (as soft iWARP would), then IP0 = IP1 and Port0 != Port1.
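Steps 6 through 8 amount to allocating Port1 from a port space the provider owns and recording IP0:Port0 -> IP1:Port1. In the Python sketch below, ProviderPortSpace and provider_listen are hypothetical names, the 49152 to 65535 range is an assumed default, and a plain dict stands in for the "Netlink: Register Port Map" message to the daemon.

```python
from typing import Dict, Set, Tuple

Endpoint = Tuple[str, int]  # (ip, port)


class ProviderPortSpace:
    """Hypothetical provider-managed iWARP port space, separate from Linux's."""

    def __init__(self, first: int = 49152, last: int = 65535) -> None:
        self._first, self._last = first, last
        self._used: Set[int] = set()

    def allocate(self) -> int:
        # Lowest free port in the range; a real stack may choose differently.
        for port in range(self._first, self._last + 1):
            if port not in self._used:
                self._used.add(port)
                return port
        raise RuntimeError("provider port space exhausted")

    def release(self, port: int) -> None:
        self._used.discard(port)


def provider_listen(cm_ep: Endpoint, iwarp_ip: str,
                    space: ProviderPortSpace,
                    port_map: Dict[Endpoint, Endpoint]) -> Endpoint:
    """Pick Port1 and register IP0:Port0 -> IP1:Port1."""
    iwarp_ep = (iwarp_ip, space.allocate())
    port_map[cm_ep] = iwarp_ep  # stands in for "Netlink: Register Port Map"
    return iwarp_ep
```

Because Port1 comes from the provider's own range, the CM's choice of Port0 never needs to be reserved in the Linux TCP port space. Passing the Linux address as iwarp_ip gives the soft iWARP case, where IP0 = IP1.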

9 New OFED iWARP CM Flows (Connect: Userspace provider CM)
- Similar to the current CM flow.
- Netlink is used to issue the connect to the userspace library.
- The mini-CM or userspace TCP stack manages the provider "portspace" to obtain local TCP Port1, which is related to the CM local Port0.
- The userspace library resolves remote IP2, Port2 through the Port Mapper and obtains the remote iWARP IP address and port number, IP3, Port3.
- The userspace provider CM issues the iWARP connect to IP3, Port3, including the MPA handshake.
- The userspace mini-CM sends a Netlink Connect Complete call to the kernel provider with the new connection information: IP1:Port1, IP3:Port3.
- The kernel driver sets up the RNIC hardware, including transitioning the QP to RTS.
- The kernel CM issues the Connect Reply event.
Diagram steps:
1. rdma_connect(Local IP0, Local Port0, Remote IP2, Remote Port2)
2. Transition to kernel CM
3. Interface selected
4. Port selected
5. connect
6. Netlink: Connect
7. Netlink: Resolve Remote Port IP2, Port2 -> IP3, Port3
8. SDP Port Mapper protocol (between IP0 and IP2)
9. Netlink: Connect Complete
10. Setup hardware
11. Connect Reply event
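The connect flow carries two quads: the one the application passed to rdma_connect (IP0:Port0, IP2:Port2) and the iWARP quad the provider actually connects with (IP1:Port1, IP3:Port3). The Python sketch below builds the second from the first; userspace_connect and Quad are hypothetical names, and resolve is any callable standing in for the Port Mapper lookup.

```python
from typing import Callable, NamedTuple, Optional, Tuple

Endpoint = Tuple[str, int]  # (ip, port)


class Quad(NamedTuple):
    local: Endpoint
    remote: Endpoint


def userspace_connect(app_quad: Quad, iwarp_local: Endpoint,
                      resolve: Callable[[Endpoint], Optional[Endpoint]]) -> Quad:
    """Build the iWARP quad the provider connects with.

    app_quad     -- IP0:Port0 -> IP2:Port2 as passed to rdma_connect
    iwarp_local  -- IP1:Port1 from the provider-managed port space
    resolve      -- Port Mapper lookup, IP2:Port2 -> IP3:Port3
    """
    remote = resolve(app_quad.remote)
    if remote is None:
        raise ConnectionError(f"no port mapping for {app_quad.remote}")
    # This is the connection information the Connect Complete message reports.
    return Quad(local=iwarp_local, remote=remote)
```

For example, with mapping = {("192.168.1.2", 7471): ("10.0.0.2", 50001)}, passing mapping.get as resolve yields the remote iWARP endpoint ("10.0.0.2", 50001).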

10 New OFED iWARP CM Flows (Accept: Userspace provider CM)
- The userspace provider CM receives a connect request on IP1, Port1: the TCP three-way handshake and the MPA request from the peer.
- The userspace library issues a Connect Request netlink downcall to the kernel provider library with:
  Remote iWARP: IP3, Port3 (port mapped)
  Remote TCP: unknown; use the port-mapped IP3, Port3
  Local iWARP: IP1, Port1 (port mapped)
  Local TCP: IP0, Port0 (from the listen)
- The kernel mini-CM sends a Connect Request event to the iWARP CM with the new connection information: IP0:Port0, IP3:Port3.
- The application is notified of the connection request and responds with an rdma_accept call.
- The kernel CM issues an accept call to the kernel provider.
- The kernel provider then sets up the RNIC hardware, including sending the MPA response and transitioning the QP to RTS.
- The kernel provider issues an Established CM event.
Diagram steps:
1. Netlink: Connect Request
2. Connect Request event
3. Transition to userspace CM
4. rdma_accept(Local IP0, Local Port0, Remote IP3, Remote Port3)
5. Transition to kernel CM
6. CM Accept
7. Setup hardware
8. Established event
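On the accept side the translation runs backwards: the request arrives on the iWARP endpoint IP1:Port1, and the listener must be reported as IP0:Port0. The Python sketch below does that reverse lookup over the mapping recorded at listen time; connect_request_info is a hypothetical name, and the remote side is reported as the port-mapped IP3:Port3 because the remote TCP address is unknown.

```python
from typing import Dict, Tuple

Endpoint = Tuple[str, int]  # (ip, port)


def connect_request_info(port_map: Dict[Endpoint, Endpoint],
                         iwarp_local: Endpoint,
                         iwarp_remote: Endpoint) -> Tuple[Endpoint, Endpoint]:
    """Translate an inbound iWARP connection into the listener's view.

    port_map maps each listen endpoint IP0:Port0 to its IP1:Port1. The
    request arrived on iwarp_local (IP1:Port1) from iwarp_remote
    (IP3:Port3), which is passed through unchanged.
    """
    for listen_ep, mapped in port_map.items():
        if mapped == iwarp_local:
            return listen_ep, iwarp_remote
    raise LookupError(f"no listener mapped to {iwarp_local}")
```

A linear scan keeps the sketch short; with many listeners, keeping a second dict keyed by IP1:Port1 would make the lookup constant time.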

11 New OFED iWARP CM Flows (kernel provider CM)
Changes to RNICs whose connection management is handled entirely by kernel drivers are minimal.
- On listen requests, the kernel provider CM must issue the Register Port Map request to the iWARP Port Mapper daemon using netlink sockets.
- On connect requests, the kernel provider CM must issue the Resolve Remote Port netlink message to the iWARP Port Mapper daemon. On completion, it uses the local and remote iWARP IP addresses and port numbers to issue the iWARP connect request (instead of the Linux IP addresses and port numbers from the connect request).
- On Connect Request events and accept request handling, it must map the local iWARP IP address and port number back to the original listen IP address and port number.

12 New OFED iWARP CM Flows (hybrid provider CM)
A hybrid RNIC has a userspace connection manager or private TCP stack that manages the iWARP IP address and port space but is not involved in connection setup.
- The Listen flow for a hybrid RNIC is the same as the flow for the userspace stack.
- The Accept flow is the same as the flow for a kernel provider.
- The Connect flow is slightly different and is depicted on the following slide.
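Across the three RNIC models, each operation follows one of the flow variants described in these slides. The Python table below records that reuse; the model and variant labels are illustrative names chosen for this sketch, with "hybrid" standing for the Connect flow specific to hybrid RNICs.

```python
from enum import Enum


class RnicModel(Enum):
    KERNEL = "CM in kernel/adapter"
    USERSPACE = "CM in userspace"
    HYBRID = "userspace CM with adapter assistance"


# Flow variant each model follows for each operation, per the slides.
FLOW_VARIANT = {
    RnicModel.KERNEL: {"listen": "kernel", "connect": "kernel", "accept": "kernel"},
    RnicModel.USERSPACE: {"listen": "userspace", "connect": "userspace", "accept": "userspace"},
    RnicModel.HYBRID: {"listen": "userspace", "connect": "hybrid", "accept": "kernel"},
}
```

Only the hybrid Connect entry introduces a flow not already needed by the other two models, which is why it is the only one shown separately.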

13 New OFED iWARP CM Flows (Connect: Hybrid CM)
- Similar to the current CM flow.
- Netlink is used to issue a resolve message to the userspace library.
- The mini-CM or userspace TCP stack manages the provider "portspace" to obtain local TCP Port1, which is related to the CM local Port0.
- The userspace library resolves remote IP2, Port2 through the Port Mapper and obtains the remote IP address and port number, IP3, Port3.
- This information is returned to the kernel provider CM in a Resolve Complete netlink message.
- The kernel provider CM issues the iWARP connect to IP3:Port3 from IP1:Port1, including the MPA handshake.
- The kernel driver sets up the RNIC hardware, including transitioning the QP to RTS.
- The kernel CM issues a Connect Reply event indicating IP0:Port0 and IP2:Port2 as the connection information.
Diagram steps:
1. rdma_connect(Local IP0, Local Port0, Remote IP2, Remote Port2)
2. Transition to kernel CM
3. Interface selected
4. Port selected
5. connect
6. Netlink: Resolve
7. Netlink: Resolve Remote Port IP2, Port2 -> IP3, Port3
8. SDP Port Mapper protocol (between IP0 and IP2)
9. Netlink: Resolve Complete
10. Setup hardware
11. Connect Reply event

14 Conclusions/Next Steps
- This proposal moves iWARP traffic to a port space independent of TCP/IP sockets applications, transparently to the RDMA verbs consumer.
- The iWARP port space can remain on the same IP address (like soft iWARP) or move to a separate IP address (like iSCSI).
- Three different RNIC connection management models are supported.
- The RDMA Consortium has published the wire protocol for mapping TCP port numbers to iWARP port numbers.
- This proposal also resolves a port space issue with iSER targets and iWARP in OFED.
- Backward compatibility can be ensured by using timeouts on the Port Mapper protocol to fall back to the current behavior.
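The timeout-based fallback in the last point can be sketched as a bounded wait on the Port Mapper query: if no answer arrives in time, or the peer has no mapping, the original Linux endpoint is used as in the current flows. In this Python sketch, resolve_with_fallback and the one-second default are assumptions, and query is any callable standing in for the PMRequest/PMAccept exchange.

```python
import concurrent.futures
from typing import Callable, Optional, Tuple

Endpoint = Tuple[str, int]  # (ip, port)

# Shared worker pool; a query that never answers keeps its worker busy
# but does not block the caller past the timeout.
_POOL = concurrent.futures.ThreadPoolExecutor(max_workers=4)


def resolve_with_fallback(query: Callable[[Endpoint], Optional[Endpoint]],
                          remote: Endpoint,
                          timeout_s: float = 1.0) -> Endpoint:
    """Return the port-mapped endpoint, or `remote` unchanged on timeout or no mapping."""
    future = _POOL.submit(query, remote)
    try:
        mapped = future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        return remote  # peer runs no Port Mapper: keep the current behavior
    return mapped if mapped is not None else remote
```

The timeout trades connection-setup latency against older peers for robustness; a value well above the expected round-trip time to port 3935 avoids falling back needlessly.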

