
iWARP Protocol Advisor: Hung Shi-Hao Presenter: Chen Yu-Jen

Outline: Motivation, Previous Solutions, iWARP (Introduction, Protocol Stack), Conclusions, Future Issues, References

Motivation The 10 Gb/s network is no longer the bottleneck; the CPU may now spend more time on network processing than on real computation, in effect starving the application. For 10 Gb/s networking to become widespread, PC and server processing capability must improve as well, especially for multi-client servers.

Previous Solutions TOE – TCP Offload Engine: offloads protocol processing to the network device. Not a full solution: it lacks zero-copy, so CPU usage remains high. This limitation is caused by the programming interface, the sockets API. TOE provides offload but no OS bypass.

Previous Solutions (contd.) RDMA protocol: remote direct memory access, combined with OS bypass and zero-copy support, is an integral solution. InfiniBand: high throughput with protocol offload, but only used in LANs. Besides InfiniBand's RDMA there is iWARP, which must be paired with TOE; otherwise it is slower than TOE alone, because the stack grows deeper. The idea behind RDMA is to transfer data over the network directly into a memory region of another machine, consuming little of that machine's processing power. By contrast, conventional approaches require the system to analyze incoming data in several stages before storing it to the correct location, so RDMA is faster than current methods. Through the remote direct memory access (RDMA) mechanism, a SAN can provide bulk data transfer: the initiator specifies a buffer on the local system and a buffer on the remote system, and the data is then transferred directly between the two locations by the network adapters, without CPU involvement on either host.

RDMA and OS Bypass Remote Direct Memory Access allows the network adapter to move data directly from one machine to another without involving either host processor; the user application interacts directly with the NIC. OS bypass avoids the overheads of system calls, context switches, and hardware interrupts: the operating system is not involved in the critical path of packet processing.

iWARP Internet Wide Area RDMA Protocol RDMA over TCP/IP, compatible with the existing Internet infrastructure. Uses RDMA and OS bypass to move data without involving the CPU or OS, greatly increasing performance. Protocol offload is performed by an RDMA-enabled Network Interface Card (RNIC), which is equivalent to offloading RDMAP, DDP, MPA, and TCP/IP.

Networking Performance Barriers Three main barriers: packet processing, intermediate buffer copies, and command context switches. Approximate CPU overhead breakdown: application-to-OS context switches 40%, intermediate buffer copies 20%, TCP/IP transport processing 40%. Data travels from the application buffer through OS and driver buffers to the adapter buffer before leaving the I/O adapter as standard Ethernet TCP/IP packets. (Diagram © 2006 Microsoft Corporation.)

Eliminate Networking Performance Barriers With iWARP iWARP removes each barrier in turn: transport (TCP) offload eliminates TCP/IP transport processing (about 40% of CPU overhead), RDMA/DDP eliminates intermediate buffer copies (about 20%), and user-level direct access/OS bypass eliminates application-to-OS context switches (about 40%). I/O commands pass straight from the application buffer to the I/O adapter, still carried as standard Ethernet TCP/IP packets.

iWARP Protocol Stack RDMAP – RDMA protocol DDP – Direct Data Placement protocol MPA – Marker PDU Aligned Framing Layer

iWARP Protocol Stack (contd.) The Verbs layer is the user-level interface to the RDMA-enabled NIC. The RDMAP layer is responsible for RDMA operations and joint buffer management with DDP. The DDP layer performs direct zero-copy data placement, as well as segmentation and reassembly. The MPA layer assigns boundaries to DDP messages carried over TCP's byte stream.
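The layering above can be sketched as simple encapsulation: each layer prepends its own header before handing the result to the layer below. This is a toy illustration with invented header strings, not the real wire formats.

```python
# Toy sketch of iWARP encapsulation: Verbs -> RDMAP -> DDP -> MPA -> TCP.
# Header strings are invented for illustration; see RFCs 5040/5041/5044
# for the actual formats.
def rdmap(payload): return b"RDMAP|" + payload   # RDMA operation header
def ddp(payload):   return b"DDP|" + payload     # placement header
def mpa(payload):   return b"MPA|" + payload     # framing over TCP

def send_down_stack(app_data):
    # Verbs hands app_data to RDMAP; the MPA output is what TCP transmits.
    return mpa(ddp(rdmap(app_data)))

assert send_down_stack(b"app") == b"MPA|DDP|RDMAP|app"
```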

Verbs APIs DAT Collaborative – Direct Access Transport – http://www.datcollaborative.org/ uDAPL (user-level Direct Access Programming Library) kDAPL (kernel-level Direct Access Programming Library) OpenFabrics Alliance – http://www.openfabrics.org/

RDMAP RDMA Write – transfers data from a local buffer to a remote buffer. RDMA Read – retrieves data from a remote buffer and places it into a local buffer. Terminate – transfers information associated with an error.
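The Write/Read semantics can be modeled in a few lines. This is a hypothetical sketch, not a real RNIC API: it treats a remote host's registered memory as regions addressable by (STag, offset), which is how RDMAP names remote buffers.

```python
# Illustrative model (not a real verbs API): RDMA Write pushes bytes
# directly into a registered remote buffer; RDMA Read pulls them back.
# No intermediate copies or remote-CPU involvement are modeled.
class RegisteredMemory:
    """A remote host's registered buffers, addressable by (STag, offset)."""
    def __init__(self):
        self.regions = {}  # STag -> bytearray

    def register(self, stag, length):
        self.regions[stag] = bytearray(length)

    def rdma_write(self, stag, offset, data):
        # Data lands directly at the given offset in the tagged buffer.
        self.regions[stag][offset:offset + len(data)] = data

    def rdma_read(self, stag, offset, length):
        return bytes(self.regions[stag][offset:offset + length])

remote = RegisteredMemory()
remote.register(stag=0x10, length=64)
remote.rdma_write(0x10, 8, b"hello")
assert remote.rdma_read(0x10, 8, 5) == b"hello"
```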

DDP Two buffer models: tagged and untagged. Tagged buffer model: tagged buffers are typically used for large data transfers, such as large data structures and disk I/O. Requires exchanging a steering tag (STag), tagged offset, and length; the tagged offset identifies the base address of the buffer. Untagged buffer model: untagged buffers are typically used for small control messages, such as I/O status messages.
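The two models can be contrasted in a small sketch. This is a toy model, not the RFC 5041 wire format: untagged messages consume pre-posted receive buffers in order, while tagged messages carry (STag, tagged offset) and land at an exact address.

```python
# Toy contrast of DDP's two buffer models (not the RFC 5041 format):
# untagged delivery consumes pre-posted buffers in queue order;
# tagged delivery places data at (STag, tagged offset) directly.
from collections import deque

posted = deque([bytearray(16), bytearray(16)])  # pre-posted untagged buffers
tagged = {0x22: bytearray(32)}                  # STag -> registered buffer

def deliver_untagged(payload):
    buf = posted.popleft()           # next pre-posted buffer, in order
    buf[:len(payload)] = payload
    return buf

def deliver_tagged(stag, offset, payload):
    tagged[stag][offset:offset + len(payload)] = payload

msg = deliver_untagged(b"STATUS_OK")            # small control message
deliver_tagged(0x22, 4, b"DATA")                # large transfer fragment
assert bytes(msg[:9]) == b"STATUS_OK"
assert bytes(tagged[0x22][4:8]) == b"DATA"
```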

Direct Data Placement The RNIC uses iWARP headers embedded in the packets to directly place data payloads in pre-allocated application buffers in host memory, eliminating the software (kernel) latency loop. Untagged control messages are delivered into pre-posted receive-queue buffers, while tagged data payloads land directly in the application buffers they name.

MPA DDP is message-oriented, but TCP is byte-oriented. MPA provides a deterministic method for locating DDP message boundaries in the TCP byte stream, even when segments arrive out of order.
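The framing idea can be shown with a simplified sketch: each DDP message is prefixed with a 2-byte length so the receiver can recover message boundaries from TCP's byte stream. Real MPA additionally inserts periodic markers and a CRC (see RFC 5044); those are omitted here.

```python
# Simplified MPA-style framing (real MPA adds markers and a CRC,
# per RFC 5044): a 2-byte length prefix lets the receiver recover
# DDP message boundaries from TCP's undifferentiated byte stream.
import struct

def frame(msg):
    return struct.pack(">H", len(msg)) + msg

def deframe(stream):
    msgs, pos = [], 0
    while pos + 2 <= len(stream):
        (length,) = struct.unpack_from(">H", stream, pos)
        msgs.append(stream[pos + 2:pos + 2 + length])
        pos += 2 + length
    return msgs

# However TCP chunks the bytes in transit, boundaries are recoverable.
wire = frame(b"first") + frame(b"second message")
assert deframe(wire) == [b"first", b"second message"]
```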

Conclusions iWARP will be a key player as the technology matures. It adapts to existing infrastructure, bridges the gap between application throughput and 10 Gb/s Ethernet, lowers CPU utilization for network processing, and achieves a significant latency improvement through OS bypass.

Future Issues Security – memory is exposed on the network. Initial cost. RDMA-accelerated ULPs (upper-layer protocols) are not compatible with unaccelerated variants. Communication between a plain NIC and an RNIC. Hardware vendors must all agree for the technology to succeed in the market.

References
IETF RFCs – http://www.ietf.org/
Tools.ietf – http://tools.ietf.org/
Dennis Dalessandro's Publications – http://www.osc.edu/~dennis/publications.html
RDMA Consortium – http://www.rdmaconsortium.org
DAT Collaborative – http://www.datcollaborative.org/
OpenFabrics – http://www.openfabrics.org/
NetEffect – http://www.neteffect.com/documents/
HP – http://h21007.www2.hp.com/portal/site/dspp/menuitem.863c3e4cbcdc3f3515b49c108973a801?ciid=2108a31f05f02110a31f05f02110275d6e10RCRD