
1 iWARP Protocol
Advisor: Hung Shi-Hao
Presenter: Chen Yu-Jen

2 Outline
Motivation
Previous Solutions
iWARP
  Introduction
  Protocol stack
Conclusions
Future Issues
References

3 Motivation
A 10 Gb/s network is no longer the bottleneck; it is now possible for the CPU to spend more time on network processing than on real computation, in effect starving the application.
For 10 Gbps to become widespread, the capabilities of PCs and servers must rise to match, all the more so for multi-client servers.

4 Previous Solutions
TOE – TCP Offload Engine
Offloads protocol processing to the network device.
Not a full solution: it lacks zero-copy, so CPU usage stays high.
The root cause is the programming interface – the sockets API (see the sketch below).
TOE by itself offers offload, not OS bypass.
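To see why the sockets API itself blocks zero-copy, here is a minimal POSIX sketch (illustrative, not from the slides): the application names its receive buffer only at the moment it calls recv(), so data that has already arrived must sit in a kernel (or TOE) buffer and be copied out afterwards.

/* Minimal sketch: the sockets interface forces a copy, because the
 * NIC cannot know the destination buffer before recv() is called. */
#include <sys/types.h>
#include <sys/socket.h>

ssize_t read_message(int sock)
{
    char buf[4096];  /* application buffer, unknown to the NIC in advance */
    /* By the time recv() runs, the payload has already landed in a
     * kernel (or TOE) buffer; this call copies it out, it does not
     * place it. */
    return recv(sock, buf, sizeof buf, 0);
}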

5 Previous Solutions (contd.)
RDMA protocol
Remote direct memory access, together with OS bypass and zero-copy support, is an integral solution.
InfiniBand
High throughput, but only usable within a LAN.
RDMA (and iWARP) still requires protocol offload (a TOE underneath); otherwise the extra protocol layers make it even slower than TOE alone.
The idea behind RDMA is to deliver data over the network directly into a memory region of a target machine, consuming very little of that machine's processing power. By contrast, the conventional approach requires the system to analyze incoming data in several stages before storing it in the right place; RDMA is therefore faster.
Through the remote direct memory access mechanism, a SAN provides bulk data transfer: the initiator specifies a buffer on the local system and a buffer on the remote system, and the data is then transferred directly between the two locations by the network adapters, without CPU involvement on either host.

6 RDMA and OS Bypass
Remote Direct Memory Access
Allows the network adapter to move data directly from one machine to another without involving either host processor.
The user application interacts directly with the NIC (see the sketch below).
OS bypass avoids the overheads of system calls, context switches, and hardware interrupts.
The operating system is not involved in the critical path of packet processing.
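As a minimal sketch of how an application gives the NIC direct access to its memory, the following uses libibverbs, the OpenFabrics verbs API listed later in this deck; the function name and the elided setup (device open, protection domain) are assumptions for illustration, not part of the slides.

/* Registering a buffer pins it and hands the NIC an address
 * translation, so the adapter can DMA into it with no OS involvement
 * on the critical path. Error handling is elided. */
#include <infiniband/verbs.h>
#include <stdlib.h>

struct ibv_mr *register_rdma_buffer(struct ibv_pd *pd, size_t len)
{
    void *buf = malloc(len);
    if (!buf)
        return NULL;
    /* LOCAL_WRITE lets the NIC place received data here; REMOTE_WRITE
     * and REMOTE_READ let the peer target this buffer with RDMA ops. */
    return ibv_reg_mr(pd, buf, len,
                      IBV_ACCESS_LOCAL_WRITE |
                      IBV_ACCESS_REMOTE_WRITE |
                      IBV_ACCESS_REMOTE_READ);
}

The returned memory region carries an lkey for local work requests and an rkey the peer uses to address this buffer remotely.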

7 iWARP
Internet Wide Area RDMA Protocol – RDMA over TCP/IP.
Compatible with the existing Internet infrastructure.
Uses RDMA and OS bypass to move data without involving the CPU or OS, greatly increasing performance.
Protocol offload – the RDMA-enabled Network Interface Card (RNIC).
This amounts to offloading RDMAP, DDP, MPA, and TCP/IP onto the NIC.

8 Networking Performance Barriers
Three barriers dominate: packet processing, intermediate buffer copies, and command context switches.
Approximate CPU overhead breakdown: application-to-OS context switches 40%, intermediate buffer copies 20%, TCP/IP transport processing 40%.
[Figure: the conventional receive path runs from the I/O adapter up through driver, OS, and application buffers, crossing the hardware/software and kernel/user boundaries, with data carried as standard Ethernet TCP/IP packets. Source: Microsoft, 2006.]

9 Eliminate Networking Performance Barriers With iWARP
With iWARP, each barrier is removed in hardware or bypassed:
Transport (TCP) offload eliminates transport processing (about 40% of CPU overhead).
RDMA/DDP eliminates intermediate buffer copies (about 20%).
User-level direct access / OS bypass eliminates application-to-OS context switches (about 40%).
[Figure: the same path with an iWARP adapter; I/O commands pass directly between the user application and the adapter, and payloads move between the app buffer and the wire as standard Ethernet TCP/IP packets. Source: Microsoft, 2006.]

10 iWARP Protocol Stack
RDMAP – RDMA Protocol
DDP – Direct Data Placement protocol
MPA – Marker PDU Aligned framing layer

11 iWARP Protocol Stack (contd.)
The Verbs layer is the user-level interface to the RDMA-enabled NIC.
The RDMAP layer is responsible for the RDMA operations and for joint buffer management with DDP.
The DDP layer performs direct zero-copy data placement, as well as segmentation and reassembly.
The MPA layer assigns boundaries to DDP messages.

12 Verbs APIs
DAT Collaborative – Direct Access Transport
uDAPL (user Direct Access Programming Library)
kDAPL (kernel Direct Access Programming Library)
OpenFabrics Alliance

13 RDMAP
RDMA Write – transfers data from a local buffer to a remote buffer.
RDMA Read – retrieves data from a remote buffer and places it into a local buffer.
Terminate – transfers information associated with an error.
A verbs-level sketch of Write (and Read) follows.
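A hedged sketch of how RDMA Write maps onto the verbs interface (RDMA Read differs only in the opcode). The connected queue pair qp, the registered region mr, and the peer-advertised remote_addr/rkey are assumed to be set up elsewhere.

/* Post an RDMA Write: push the local buffer into the peer's
 * advertised buffer with no CPU involvement on the remote side. */
#include <infiniband/verbs.h>
#include <stdint.h>
#include <string.h>

int post_rdma_write(struct ibv_qp *qp, struct ibv_mr *mr,
                    uint64_t remote_addr, uint32_t rkey, uint32_t len)
{
    struct ibv_sge sge = {
        .addr   = (uintptr_t)mr->addr,   /* local source buffer */
        .length = len,
        .lkey   = mr->lkey,
    };
    struct ibv_send_wr wr, *bad_wr = NULL;
    memset(&wr, 0, sizeof wr);
    wr.opcode              = IBV_WR_RDMA_WRITE;   /* IBV_WR_RDMA_READ for a Read */
    wr.sg_list             = &sge;
    wr.num_sge             = 1;
    wr.send_flags          = IBV_SEND_SIGNALED;   /* report completion on the CQ */
    wr.wr.rdma.remote_addr = remote_addr;         /* peer's tagged buffer */
    wr.wr.rdma.rkey        = rkey;                /* peer's steering key  */
    return ibv_post_send(qp, &wr, &bad_wr);       /* 0 on success */
}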

14 DDP
Two models: the tagged buffer model and the untagged buffer model.
Tagged buffer model: tagged buffers are typically used for large data transfers, such as large data structures and disk I/O. The peers must exchange a steering tag (STag), a tagged offset, and a length; the tagged offset identifies the base address within the advertised buffer.
Untagged buffer model: untagged buffers are typically used for small control messages, such as I/O status messages.
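As a small illustration, the three values the tagged model exchanges fit naturally in a struct like the one below; the field widths match the protocol (32-bit STag, 64-bit tagged offset), but the struct itself is illustrative, not the wire format.

/* The advertisement one peer sends so the other can target its
 * tagged buffer: exactly the three values named above. */
#include <stdint.h>

struct tagged_buffer_advert {
    uint32_t stag;  /* steering tag: names the registered buffer       */
    uint64_t to;    /* tagged offset: base address within that buffer  */
    uint32_t len;   /* number of bytes the peer may read or write      */
};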

15 Direct Data Placement
The RNIC uses iWARP headers embedded in the packets to place data payloads directly in pre-allocated application buffers.
This eliminates the software (kernel) latency loop.
[Figure: application buffers p, d, and q in host memory; the iWARP receive queue holds preposted buffers for control messages (Ctrl Msg #1–#4), while tagged data payloads (p, q, q+1, d) land directly in their application buffers. Source: Microsoft, 2006.]
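A hedged verbs sketch of the "preposted buffers for control messages" above: untagged messages consume receive-queue buffers in FIFO order, so the application posts them before the peer sends; qp and mr are assumed set up as in the earlier sketches.

/* Prepost one receive buffer for an untagged control message. */
#include <infiniband/verbs.h>
#include <stdint.h>
#include <string.h>

int prepost_ctrl_buffer(struct ibv_qp *qp, struct ibv_mr *mr, uint64_t id)
{
    struct ibv_sge sge = {
        .addr   = (uintptr_t)mr->addr,
        .length = (uint32_t)mr->length,
        .lkey   = mr->lkey,
    };
    struct ibv_recv_wr wr, *bad_wr = NULL;
    memset(&wr, 0, sizeof wr);
    wr.wr_id   = id;          /* echoed in the completion for this buffer */
    wr.sg_list = &sge;
    wr.num_sge = 1;
    return ibv_post_recv(qp, &wr, &bad_wr);  /* 0 on success */
}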

16 MPA
DDP is message-oriented, but TCP is byte-oriented.
MPA frames each DDP message over the TCP byte stream and provides a deterministic method for recovering message boundaries when segments arrive out of order (see the sketch below).
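A small sketch of the MPA framing arithmetic, assuming the standard rules (a 2-byte ULPDU length header, padding to a 4-byte boundary, and a 4-byte CRC32c trailer); the periodic markers that make out-of-order boundary recovery deterministic are noted in the comment but left out of the calculation.

/* Size of the MPA FPDU that wraps one DDP segment (ULPDU). Markers,
 * one every 512 bytes of TCP stream pointing back to the FPDU start,
 * are what let a receiver re-find boundaries after out-of-order
 * arrival; they are omitted here for brevity. */
#include <stddef.h>
#include <stdint.h>

size_t mpa_fpdu_size(uint16_t ulpdu_len)
{
    size_t unpadded = 2 + (size_t)ulpdu_len;   /* length header + ULPDU    */
    size_t pad      = (4 - unpadded % 4) % 4;  /* align to 32-bit boundary */
    return unpadded + pad + 4;                 /* plus CRC32c trailer      */
}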

17 Conclusions
A key player in the future as the technology matures.
Adapts to the existing infrastructure.
Bridges the gap between application throughput and 10 Gb/s Ethernet.
Lowers CPU utilization for network processing.
Significantly improves latency through OS bypass.

18 Future Issues
Security – memory is opened up to the network.
Initial cost.
RDMA-accelerated ULPs (upper-layer protocols) are not compatible with their unaccelerated variants.
Communication between plain NICs and RNICs.
Hardware vendors must all agree for iWARP to succeed in the market.

19 References
IETF RFCs – http://www.ietf.org/
tools.ietf.org
Dennis Dalessandro's publications
RDMA Consortium
DAT Collaborative
OpenFabrics
NetEffect
HP

