iSCSI Extensions for RDMA (iSER) draft-ko-iwarp-iser-02 Mike Ko IBM August 2, 2004.

Presentation transcript:

iSCSI Extensions for RDMA (iSER) draft-ko-iwarp-iser-02 Mike Ko IBM August 2, 2004

Slide 2: Agenda
What is iSER?
iSER connection setup
  – Open issues
iSER flow control
  – Open issues

Slide 3: iSCSI Datamover with RDMA Extensions
The Datamover Architecture defines an abstract model in which the movement of data between iSCSI end nodes is logically separated from the rest of the iSCSI protocol
  – Allows a datamover protocol layer to offload the tasks of data movement and placement from the iSCSI layer
The iSCSI Extensions for RDMA (iSER) protocol is one such datamover protocol
  – Applies the Datamover Architecture in extending the data transfer capabilities of iSCSI to include RDMA (Remote Direct Memory Access) as defined in the iWARP protocol suite
  – Allows iSCSI implementations to have data transfers which achieve true zero copy behavior using generic RDMA network interface controllers (RNICs)
[Figure: protocol stack, top to bottom: SCSI, iSCSI, iSER, RDMAP, DDP, MPA, TCP; labeled interfaces: Datamover Interface, iWARP Verbs]

Slide 4: Connection Setup for iSER-assisted Mode at the Initiator
Negotiated key values may be passed by the iSCSI layer to the iSER layer by invoking the Notice_Key_Values Operational Primitive
Before sending the final Login Request, the iSCSI layer invokes the Allocate_Connection_Resources Operational Primitive to request the iSER layer to allocate the iWARP resources for the connection
After the target returns the final Login Response, the iSCSI layer at the initiator invokes the Enable_Datamover Operational Primitive to request the iSER layer to transition into iSER-assisted mode
The first message sent by the iSER layer at the initiator to the target is the iSER Hello Message

Slide 5: Connection Setup for iSER-assisted Mode at the Target
Negotiated key values may be passed by the iSCSI layer to the iSER layer by invoking the Notice_Key_Values Operational Primitive
Before sending the final Login Response, the iSCSI layer invokes the Allocate_Connection_Resources Operational Primitive to request the iSER layer to allocate the iWARP resources for the connection
The iSCSI layer invokes the Enable_Datamover Operational Primitive to enable the iSER mode qualified with the final Login Response PDU
The iSER layer sends the final Login Response PDU in byte stream mode and then transitions into iSER-assisted mode
After receiving the iSER Hello Message from the initiator, the iSER layer at the target responds by sending the iSER HelloReply Message

Slide 6: Example of Successful iSER Connection Setup
A. iSCSI Login Request PDU with RDMAExtensions=Yes
B. iSCSI Login Response PDU with RDMAExtensions=Yes
C. Optional Notice_Key_Values to pass values of negotiated keys
D. Allocate_Connection_Resources to set up iWARP resources
E. iSCSI Login Request PDU with T=1 and NSG=FullFeaturePhase
F. Enable_Datamover to go into iSER mode (* = send last iSCSI PDU in byte stream mode)
G. iSCSI Login Response PDU in byte stream mode with T=1 and NSG=FullFeaturePhase
H. iWARP Send Message containing iSER Hello
J. iWARP Send Message containing iSER HelloReply
[Figure: sequence diagram between initiator and target, with columns labeled iSCSI Layer, iSER Layer, iSCSI Layer. Labels appear in the order A, B, C, D, E, C, D, F*, G, F, H, J.]
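The successful setup sequence can be sketched as an ordered event trace. This is only an illustration in Python: the primitive and message names come from the slides, while the function name `iser_connection_setup` and the event-string format are assumptions made for the example.

```python
def iser_connection_setup() -> list[str]:
    """Return the events of a successful setup, in the order A..J listed on the slide."""
    events: list[str] = []

    def wire(label: str, sender: str, text: str) -> None:
        # A message that crosses the connection.
        events.append(f"{label} {sender} sends: {text}")

    def primitive(label: str, side: str, name: str) -> None:
        # An Operational Primitive invoked by the iSCSI layer on its local iSER layer.
        events.append(f"{label} {side} invokes: {name}")

    wire("A", "initiator", "Login Request, RDMAExtensions=Yes")
    wire("B", "target", "Login Response, RDMAExtensions=Yes")
    # Initiator allocates before sending the final Login Request.
    primitive("C", "initiator", "Notice_Key_Values (optional)")
    primitive("D", "initiator", "Allocate_Connection_Resources")
    wire("E", "initiator", "Login Request, T=1, NSG=FullFeaturePhase")
    # Target allocates before sending the final Login Response.
    primitive("C", "target", "Notice_Key_Values (optional)")
    primitive("D", "target", "Allocate_Connection_Resources")
    # F*: Enable_Datamover is qualified with the final Login Response PDU,
    # which is still sent in byte stream mode.
    primitive("F*", "target", "Enable_Datamover")
    wire("G", "target", "Login Response (byte stream), T=1, NSG=FullFeaturePhase")
    primitive("F", "initiator", "Enable_Datamover")
    wire("H", "initiator", "iWARP Send: iSER Hello")
    wire("J", "target", "iWARP Send: iSER HelloReply")
    return events


if __name__ == "__main__":
    for event in iser_connection_setup():
        print(event)
```

On each side, Allocate_Connection_Resources precedes Enable_Datamover, and the Hello/HelloReply exchange happens only after both sides have enabled the datamover.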

Slide 7: Negotiation of RDMAExtensions in Leading Connection Only
From section 2.3 of iSER draft: “iSER-assisted mode is negotiated during the iSCSI Login for each connection, but an entire iSCSI session MUST operate in one mode...”
Question: Since RDMAExtensions is leading-only, this statement is incorrect
Proposed change:
  – Replace the sentence with “iSER-assisted mode is negotiated during the iSCSI Login for each session, and an entire iSCSI session MUST operate in one mode...”

Slide 8: CRC32C Protection in the Layer Below iSER
From section 5.1 of iSER draft: “when the RDMAExtensions key is negotiated to "Yes", the HeaderDigest and the DataDigest keys MUST be negotiated to "None"... because... the iWARP protocol suite provides a CRC32c-based error detection for all iWARP Messages”
Recent updates to the MPA draft render the use of CRC optional
  – “Disabling of CRCs should only be done when it is clear that the connection through the network has data integrity at least as good as a CRC”
  – RDDP WG’s position is that all ULPs can assume CRC level or equivalent data protection
Proposed change: Add the explicit requirement that end-to-end CRC32C based error detection or equivalent be provided in a layer below iSER
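For readers unfamiliar with the checksum being discussed, CRC32C is the 32-bit CRC using the Castagnoli polynomial. The following bit-at-a-time Python sketch shows what such a check computes; the function name `crc32c` is chosen for the example, and production code would normally use a table-driven or hardware-assisted version rather than this loop.

```python
def crc32c(data: bytes) -> int:
    """Bitwise CRC32C (Castagnoli), reflected form of polynomial 0x1EDC6F41."""
    crc = 0xFFFFFFFF  # initial value: all ones
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # Shift right one bit; fold in the reflected polynomial when a 1 falls out.
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF  # final inversion


if __name__ == "__main__":
    # Standard check value for the ASCII string "123456789".
    print(hex(crc32c(b"123456789")))  # 0xe3069283
```

A receiver recomputes the value over the received bytes and compares it with the transmitted one; a mismatch signals corruption.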

Slide 9: Order of RDMAExtensions Key Negotiation and Allocate_Connection_Resources
From section (and similarly for section 5.1.2): “If the outcome of the iSCSI negotiation is to enable iSER-assisted mode, then on the initiator side,... the iSCSI Layer MUST invoke the Allocate_Connection_Resources Operational Primitive”
Question: The alternative approach of invoking Allocate_Connection_Resources before negotiating for iSER-assisted mode should be allowed
  – Current approach results in the connection being torn down if the required resources cannot be allocated
  – Alternative approach avoids this problem
      – Resources must be deallocated if login fails
      – Resources may have to be deallocated if the negotiated values are less than the allocated value
Proposed change: Update the draft to allow the alternative approach with the proviso that it is the responsibility of the implementation to deallocate the resources if the login fails or if the negotiated values are less than the allocated value
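The alternative approach and its two deallocation provisos can be sketched as follows. This is a hypothetical Python illustration: `Pool`, `LoginFailed`, and `login_allocate_first` are invented names, and the single integer "resource count" stands in for whatever the implementation actually allocates.

```python
class LoginFailed(Exception):
    """Hypothetical signal that the iSCSI login did not complete."""


class Pool:
    """Hypothetical pool of connection resources, counted as plain integers."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.in_use = 0

    def allocate(self, n: int) -> None:
        if self.in_use + n > self.capacity:
            raise MemoryError("insufficient resources")
        self.in_use += n

    def release(self, n: int) -> None:
        self.in_use -= n


def login_allocate_first(pool: Pool, requested: int, negotiate) -> int:
    """Allocate before negotiating; clean up as the proposed proviso requires."""
    # If this raises, no login exchange has happened yet, so nothing is torn down.
    pool.allocate(requested)
    try:
        negotiated = negotiate()  # returns the negotiated value or raises LoginFailed
    except LoginFailed:
        pool.release(requested)  # proviso 1: deallocate if the login fails
        raise
    held = min(requested, negotiated)
    if held < requested:
        pool.release(requested - held)  # proviso 2: trim to the negotiated value
    return held
```

For example, `login_allocate_first(Pool(8), 8, lambda: 4)` returns 4 and leaves four units in use.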

Slide 10: Clarification on the Usage of the Notice_Key_Values Primitive
From section 5.1.1: “Optionally, the iSCSI Layer MAY invoke the Notice_Key_Values Operational Primitive before invoking the Allocate_Connection_Resources Operational Primitive”
Question: The word “optionally” is ambiguous
  – Could mean the iSCSI layer may choose to invoke the primitive
  – Or the iSCSI layer may choose to use that primitive, or some other defined or undefined primitive
Proposed change: Remove the word “optionally”

Slide 11: Requiring the Use of the Notice_Key_Values Primitive
From section 5.1.1: “The iSCSI Layer MAY invoke the Notice_Key_Values Operational Primitive” “to request the iSER Layer to take note of the negotiated values of the iSCSI keys for the Connection”
Question: The word “MAY” should be replaced with “MUST” to enforce the invocation of the primitive
Proposed change: None
  – If the default values are accepted for all the negotiated keys, then there is no new information to be passed from the iSCSI layer to the iSER layer
  – Requiring a “MUST” instead of a “MAY” would require this primitive be invoked even though it is not necessary
  – Also, it is not architecturally required for the iSCSI layer to issue the Notice_Key_Values primitive

Slide 12: HeaderDigest, DataDigest, OFMarker, & IFMarker in iSER-assisted Mode
From section 6.1 and 6.6: These 4 keys must be negotiated to “none” or “no” if the RDMAExtensions key is negotiated to “yes”
Question: Draft seems to imply that these 4 keys must be negotiated even for the defaults
Suggestion: Negotiations resulting in RDMAExtensions=Yes for a session implies HeaderDigest=None, DataDigest=None, OFMarker=No, and IFMarker=No on all connections in that session
  – Override both the default and explicit settings
Proposed change: Update the draft to reflect the suggested change
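The suggested rule, where RDMAExtensions=Yes implies the four values regardless of defaults or explicit settings, might look like this in Python. The key names and values are taken from the slide; the function `effective_keys` and the dictionary representation are assumptions for illustration.

```python
# Values implied on every connection of a session that negotiated RDMAExtensions=Yes.
IMPLIED_BY_RDMA_EXTENSIONS = {
    "HeaderDigest": "None",
    "DataDigest": "None",
    "OFMarker": "No",
    "IFMarker": "No",
}


def effective_keys(negotiated: dict[str, str]) -> dict[str, str]:
    """Return the key values in effect, applying the implied overrides if needed."""
    keys = dict(negotiated)  # do not mutate the caller's dictionary
    # RDMAExtensions defaults to "No" when it was never offered.
    if keys.get("RDMAExtensions", "No") == "Yes":
        # Overrides both defaults and explicitly negotiated settings.
        keys.update(IMPLIED_BY_RDMA_EXTENSIONS)
    return keys
```

With this rule, `effective_keys({"RDMAExtensions": "Yes", "HeaderDigest": "CRC32C"})` yields `HeaderDigest` of `"None"`, and none of the four keys needs to be negotiated separately.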

Slide 13: Scope of RDMAExtensions Key
From section 6.3: RDMAExtensions key has session-wide scope
Question: Should iSER support mixed mode sessions?
  – Argument for:
      – Open an iSCSI connection when there are insufficient resources to support an iSER-assisted connection in allegiance reassignment and the session is in iSER-assisted mode
      – Flexibility on general principles
  – Argument against:
      – RFC 3720 assumes homogeneous connections in a session
  – Introducing mixed mode sessions would require that the RFC3720 semantics be carefully thought through to ensure correctness
      – The task states maintained by an iSCSI connection may be different from those for an iSER-assisted connection
      – iSER-assisted connection may require different LO key values for optimization compared with iSCSI connection
      – Test and debug effort will increase 2x to 3x for mixed mode support
Proposed change: None

Slide 14: Clarification on the Order of RDMAExtensions Key Negotiation
From section 6.3: “If the RDMAExtensions key is to be negotiated, it must be offered only on the initial Login Request PDU or Login Response PDU of the leading connection, and if offered, the response must be sent in the immediately following Login Response or Login Request PDU respectively.”
Question: Clarify when the negotiation response is to be returned if the key is offered in a PDU where the C-bit is set
Question: Clarify that the negotiation takes place in the LoginOperationalNegotiation stage of the leading connection
Question: Section of RFC3720 states that a response is optional if the Boolean function is "AND" and the value "No" is received
  – iSER draft always requires a response to be returned
  – However, since the default for RDMAExtensions is “no”, it is unlikely that the key-value pair of RDMAExtensions=no will be offered

Slide 15: Clarification on the Order of RDMAExtensions Key Negotiation (cont.)
Proposed change: Replace sentence with “However, if the RDMAExtensions key is to be negotiated, an initiator MUST offer the key on the first Login Request PDU in the LoginOperationalNegotiation stage of the leading connection, and a target MUST offer the key on the first Login Response PDU with which it is allowed to do so (i.e., the first Login Response issued after the first Login Request with the C bit set to 0) in the LoginOperationalNegotiation stage of the leading connection. In response to the offered key=value pair of RDMAExtensions=yes, an initiator MUST respond on the next Login Request PDU with which it is allowed to do so, and a target MUST respond on the next Login Response PDU with which it is allowed to do so.”
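The question about optional responses concerns Boolean keys whose result function is "AND". A minimal Python sketch of that rule follows; it assumes, as the question implies, that RDMAExtensions is such a key, and the function name `and_negotiation` and its return shape are invented for the example.

```python
def and_negotiation(offered: str, acceptor_choice: str) -> tuple[str, bool]:
    """Evaluate a Boolean "AND" key negotiation.

    Returns (result, response_optional). The result is "Yes" only if both
    sides say "Yes". When "No" is offered, the offer alone determines the
    result, which is why the general rule makes the response optional.
    """
    for value in (offered, acceptor_choice):
        if value not in ("Yes", "No"):
            raise ValueError(f"not a Boolean key value: {value!r}")
    result = "Yes" if offered == "Yes" and acceptor_choice == "Yes" else "No"
    response_optional = offered == "No"
    return result, response_optional
```

Under the proposed text, an offer of RDMAExtensions=yes always gets a response, which matches the case where `response_optional` is `False`.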

Slide 16: Order of RDMAExtensions Key Negotiation Response
From section 6.3: RDMAExtensions key must be offered for negotiation in the first PDU that a node is allowed to do so and the response must be returned in the immediately following PDU in which a node is allowed to respond
Question: Why must the RDMAExtensions key be negotiated first?
Negotiating the RDMAExtensions key first allows a node to optimally negotiate the value of other keys
  – Certain iSCSI keys such as MaxBurstLength, MaxOutstandingR2T, ErrorRecoveryLevel, InitialR2T, ImmediateData, etc., may have different optimization points depending on whether iSER-assisted mode is to be enabled in the iSCSI session
Proposed change: Update the draft to include the rationale for the order requirement

Slide 17: Key Ordering Within a PDU
From section 6.3: “The [RDMAExtensions] key must precede any other login keys which may be affected by the outcome of the negotiation of the RDMAExtensions key”
Question: This can be interpreted as requiring key ordering within a PDU which is contrary to RFC3720
Proposed change: Remove the sentence from the draft

Slide 18: iSER Flow Control
For RDMA Send Type Messages
  – The iSER protocol does not provide additional flow control beyond that provided by the iSCSI layer on control-type PDUs
  – An implementation should be able to take advantage of iWARP Verbs mechanisms such as the Shared Receive Queue mechanism to effectively address the Send Message flow control question
For RDMA Read Resources
  – In the iSER Hello Message, the iSER layer at the initiator declares the maximum number of RDMA Read Requests that the initiator can receive on the particular RDMAP Stream (iSER-IRD) to the target
      – This allows the iSER layer at the target to adjust its resources if it can issue more RDMA Read Requests than the initiator can handle
  – In the iSER HelloReply Message, the iSER layer at the target declares the maximum number of RDMA Read Requests that the target can issue on a particular RDMAP Stream (iSER-ORD) to the initiator
      – This allows the iSER layer at the initiator to adjust its resources if it can handle more RDMA Read Requests than the target can issue
  – The iSER layer at the target will flow control the RDMA Read Request Messages to not exceed iSER-ORD
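The RDMA Read flow control described here can be sketched as a small counter at the target. This is an illustration under assumptions: the class `ReadRequestThrottle` and its method names are hypothetical, and taking the minimum of the two limits is one way to read "adjust its resources", not something the slide spells out.

```python
class ReadRequestThrottle:
    """Hypothetical target-side limiter for outstanding RDMA Read Requests."""

    def __init__(self, target_max_ord: int, initiator_ird: int) -> None:
        # iSER-IRD arrives in the Hello Message. The target trims its own limit so
        # that the iSER-ORD it declares in HelloReply does not exceed what the
        # initiator said it can receive (an assumed reading of "adjust").
        self.iser_ord = min(target_max_ord, initiator_ird)
        self.outstanding = 0

    def try_issue(self) -> bool:
        """Return True and count the request if another RDMA Read may be issued now."""
        if self.outstanding >= self.iser_ord:
            return False  # caller must wait for a completion
        self.outstanding += 1
        return True

    def complete(self) -> None:
        """Record that one outstanding RDMA Read Request has finished."""
        if self.outstanding == 0:
            raise RuntimeError("no outstanding RDMA Read Request to complete")
        self.outstanding -= 1
```

With `ReadRequestThrottle(target_max_ord=16, initiator_ird=4)`, the first four calls to `try_issue()` succeed and the fifth returns `False` until `complete()` is called.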

Slide 19: Flow Control for Control-Type PDU
From section 8.1: “The iSER Layer SHOULD provision enough Untagged buffers for handling incoming RDMAP Send Message Types to prevent a buffer underrun condition”
Question: Should some form of send side flow control be established for iSCSI control-type PDUs?
Latest DDP draft, draft-ietf-rddp-ddp-02, no longer mandates that a DDP stream be disabled for a buffer underrun condition
Proposed change: Further discussion is needed