VIA and Its Extension To TCP/IP Network
Yingping Lu
Based on Paper “Queue Pair IP, …” by Philip Buonadonna.


Outline
- Motivation
- VIA Overview
- QP/IP Architecture
- QP/IP Performance
- Summary

Motivation
- High-performance computing and clustering applications require a high-throughput, low-latency communication facility.
- Traditional TCP/IP was not designed for high-throughput, low-latency communication.
- Host software processing has not kept pace with the increase in I/O speed. The main sources of overhead are:
  - memory copies
  - checksum computation
  - interrupts
  - context switching

Typical Communication Data Path

Bandwidth Comparison

VIA Solution
VIA (Virtual Interface Architecture) is an industry standard developed jointly by Microsoft, Compaq, and Intel. Key features of VIA:
- Reduce memory copies (zero copy)
- Give user-level processes direct access to the NIC hardware
- Remove the OS kernel from the critical path
- Collapse the ISO/OSI layers
- Offload CPU processing to an intelligent NIC

VIA Architecture

VIA Components
- Consumer: the end entity that uses VIA functions to communicate; it can run at user level or in the kernel, and programs against the VI Provider Library (VIPL)
- VI User Agent: implements the OS-bypass path
- Kernel Agent: the device driver; handles security and other OS-related issues
- VIA-capable NIC (Channel Adapter): implements VIA communication

Programming Abstraction
- Queue pair components:
  - Send queue
  - Receive queue
  - Completion queue (status)
- Data movement operations:
  - Send/Receive
  - RDMA Read
  - RDMA Write
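To make the queue pair abstraction concrete, here is a minimal C sketch of a send queue, a receive queue, and a completion queue. It assumes fixed-depth ring buffers and a simulated adapter step; every name (struct work_req, wq_post, cq_poll, nic_process_one_send) is hypothetical and is not part of the VIPL API.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define WQ_DEPTH 8                 /* hypothetical queue depth */

enum vi_op { OP_SEND, OP_RECV, OP_RDMA_READ, OP_RDMA_WRITE };

struct work_req {                  /* one entry on a send or receive queue */
    enum vi_op op;
    uint64_t   id;                 /* consumer-chosen tag echoed in the completion */
};

struct completion {                /* status entry written when a request finishes */
    uint64_t id;
    int      status;               /* 0 = success */
};

struct work_queue { struct work_req   e[WQ_DEPTH]; size_t head, count; };
struct comp_queue { struct completion e[WQ_DEPTH]; size_t head, count; };

struct queue_pair {
    struct work_queue  sq;         /* send queue: Send, RDMA Read, RDMA Write */
    struct work_queue  rq;         /* receive queue: pre-posted receive buffers */
    struct comp_queue *cq;         /* completion queue, held by pointer so it can be shared */
};

static bool wq_post(struct work_queue *q, struct work_req r) {
    if (q->count == WQ_DEPTH) return false;            /* queue full */
    q->e[(q->head + q->count) % WQ_DEPTH] = r;
    q->count++;
    return true;
}

static bool wq_take(struct work_queue *q, struct work_req *out) {
    if (q->count == 0) return false;                    /* nothing posted */
    *out = q->e[q->head];
    q->head = (q->head + 1) % WQ_DEPTH;
    q->count--;
    return true;
}

static bool cq_push(struct comp_queue *q, struct completion c) {
    if (q->count == WQ_DEPTH) return false;             /* CQ overflow */
    q->e[(q->head + q->count) % WQ_DEPTH] = c;
    q->count++;
    return true;
}

/* Consumer side: non-blocking check for a finished request. */
static bool cq_poll(struct comp_queue *q, struct completion *out) {
    if (q->count == 0) return false;
    *out = q->e[q->head];
    q->head = (q->head + 1) % WQ_DEPTH;
    q->count--;
    return true;
}

/* Simulated adapter step: consume one send-queue entry and report it done. */
static bool nic_process_one_send(struct queue_pair *qp) {
    struct work_req r;
    if (!wq_take(&qp->sq, &r)) return false;
    return cq_push(qp->cq, (struct completion){ .id = r.id, .status = 0 });
}
```

Holding the completion queue by pointer lets several queue pairs report into one CQ, so a consumer can check a single place for status.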

Virtual Interface (Queue Pair)

Memory Access
- Memory registration
  - Memory must be registered before use.
  - The system pins the memory region so that it stays resident in physical memory.
  - The NIC uses DMA to transfer data between memory and the NIC.
- Memory protection
  - A registered memory region is associated with one VI consumer and is valid only for that consumer.
- Gather/scatter lists
  - Gather list: a list of registered source data buffers (read by the NIC)
  - Scatter list: a list of registered destination data buffers (written by the NIC)
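The registration and protection rules above can be sketched as a small table lookup. This is an illustrative C model under stated assumptions: registration only records the region (the real pinning is done by the kernel agent), protection is modeled as an owner-tag comparison, and all names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_REGIONS 16             /* hypothetical table size */

struct mem_region {                /* one registered (pinned) memory region */
    uintptr_t base;
    size_t    len;
    uint32_t  owner;               /* protection tag of the registering consumer */
    bool      valid;
};

struct segment {                   /* one gather/scatter list entry */
    uintptr_t addr;
    size_t    len;
    int       region;              /* handle returned by mem_register */
};

static struct mem_region regions[MAX_REGIONS];

/* Record a buffer; a real kernel agent would also pin its pages here. */
static int mem_register(void *buf, size_t len, uint32_t owner) {
    for (int h = 0; h < MAX_REGIONS; h++) {
        if (!regions[h].valid) {
            regions[h] = (struct mem_region){ (uintptr_t)buf, len, owner, true };
            return h;
        }
    }
    return -1;                     /* table full */
}

/* Every segment must lie entirely inside a region registered by the same consumer. */
static bool sg_list_ok(const struct segment *sg, size_t n, uint32_t owner) {
    for (size_t i = 0; i < n; i++) {
        if (sg[i].region < 0 || sg[i].region >= MAX_REGIONS) return false;
        const struct mem_region *r = &regions[sg[i].region];
        if (!r->valid || r->owner != owner) return false;          /* protection */
        if (sg[i].addr < r->base || sg[i].len > r->len ||
            sg[i].addr - r->base > r->len - sg[i].len) return false; /* bounds, overflow-free */
    }
    return true;
}
```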

Memory Model (figure: virtual memory space with pages 0 through n-1, a registered memory region, and its mapping onto physical memory)
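The memory model implies that the NIC must translate a virtual address inside a registered region into a physical address before it can DMA. A minimal sketch, assuming 4 KiB pages and a per-region table of physical frame numbers filled in at registration time; the structure and names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 12                          /* assume 4 KiB pages */
#define PAGE_SIZE  ((uintptr_t)1 << PAGE_SHIFT)

struct xlate_table {              /* per-region page table the NIC could consult */
    uintptr_t vbase;              /* page-aligned virtual start of the region */
    size_t    npages;             /* covers pages 0 .. npages-1 */
    uint64_t *frames;             /* physical frame number of each pinned page */
};

/* Returns the physical address for va, or 0 if va lies outside the region. */
static uint64_t virt_to_phys(const struct xlate_table *t, uintptr_t va) {
    if (va < t->vbase) return 0;
    size_t page = (va - t->vbase) >> PAGE_SHIFT;
    if (page >= t->npages) return 0;
    /* frame base plus the offset within the page */
    return (t->frames[page] << PAGE_SHIFT) | (va & (PAGE_SIZE - 1));
}
```

Because pages are pinned, the frame numbers stay valid for as long as the region remains registered, which is what lets the NIC use this table without involving the OS.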

Descriptor
- A work queue element that is placed on a queue pair's send or receive queue
- Contains a control segment and a list of address segments
- Specifies the operation command, memory address, and size
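A descriptor can be pictured as a control segment followed by address segments. The field layout below is an assumption for illustration, not the exact format from the VIA specification, and desc_build is a hypothetical helper.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_SEGS 4                 /* hypothetical limit on address segments */

enum desc_op { DESC_SEND, DESC_RECV, DESC_RDMA_WRITE, DESC_RDMA_READ };

struct ctrl_seg {                  /* control segment: what to do */
    enum desc_op op;
    uint32_t     nsegs;            /* number of address segments that follow */
    uint32_t     total_len;        /* sum of the segment lengths */
    uint32_t     status;           /* written back by the NIC on completion */
};

struct addr_seg {                  /* address segment: where the data lives */
    uint64_t addr;
    uint32_t len;
    uint32_t mem_handle;           /* handle obtained from memory registration */
};

struct descriptor {
    struct ctrl_seg ctrl;
    struct addr_seg seg[MAX_SEGS];
};

/* Fill a descriptor; returns 0 on success, -1 if there are too many segments. */
static int desc_build(struct descriptor *d, enum desc_op op,
                      const struct addr_seg *segs, uint32_t n) {
    if (n > MAX_SEGS) return -1;
    d->ctrl = (struct ctrl_seg){ .op = op, .nsegs = n };   /* zeroes len and status */
    for (uint32_t i = 0; i < n; i++) {
        d->seg[i] = segs[i];
        d->ctrl.total_len += segs[i].len;
    }
    return 0;
}
```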

Door Bell
- An asynchronous mechanism that notifies the VI NIC that a new work queue element has been posted
- A doorbell can be a register on the NIC that is accessible to both the CPU and the NIC
(figure: VIPL, VI NIC, descriptor)
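A sketch of the doorbell handshake, assuming the doorbell holds a running count of posted descriptors and is simulated with a C11 atomic instead of a memory-mapped NIC register. The point it illustrates is ordering: the host writes the descriptor before ringing the doorbell, with release/acquire semantics so the NIC never reads a stale descriptor. All names are hypothetical.

```c
#include <stdatomic.h>
#include <stdint.h>

#define RING 16                    /* power of two, so uint32_t counters wrap cleanly */

struct doorbell_qp {
    uint64_t         ring[RING];   /* descriptor slots in host memory (ids only here) */
    uint32_t         posted;       /* host-side count of descriptors written */
    _Atomic uint32_t doorbell;     /* stands in for the NIC doorbell register */
    _Atomic uint32_t consumed;     /* NIC-side count of descriptors fetched */
};

static void dbqp_init(struct doorbell_qp *q) {
    q->posted = 0;
    atomic_init(&q->doorbell, 0);
    atomic_init(&q->consumed, 0);
}

/* Host: write the descriptor first, then ring the doorbell (release). */
static int host_post(struct doorbell_qp *q, uint64_t desc_id) {
    uint32_t done = atomic_load_explicit(&q->consumed, memory_order_acquire);
    if (q->posted - done == RING) return -1;            /* ring full */
    q->ring[q->posted % RING] = desc_id;
    q->posted++;
    atomic_store_explicit(&q->doorbell, q->posted, memory_order_release);
    return 0;
}

/* NIC: read the doorbell (acquire) and fetch every descriptor posted since last time. */
static uint32_t nic_service(struct doorbell_qp *q, uint64_t *out, uint32_t max) {
    uint32_t bell = atomic_load_explicit(&q->doorbell, memory_order_acquire);
    uint32_t c = atomic_load_explicit(&q->consumed, memory_order_relaxed);
    uint32_t n = 0;
    while (c != bell && n < max) {
        out[n++] = q->ring[c % RING];
        c++;
    }
    atomic_store_explicit(&q->consumed, c, memory_order_release);
    return n;
}
```

Because the doorbell carries a count rather than a single flag, one doorbell write can announce several posts, and the NIC catches up on all of them at once.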

Operation Example – Send/Receive
- Sender
  - Consumer:
    - Register the send buffer
    - Post a Send work queue element
  - Channel Adapter:
    - Send out the header and data; the data are retrieved directly from consumer memory
- Receiver
  - Consumer:
    - Register the receive buffer
    - Post the receive buffer on the receive queue
  - Channel Adapter:
    - Receive packets from the sender
    - Find a receive queue element in the receive queue
    - Move the data directly to the buffer specified in that receive queue element
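The receive-side steps can be sketched as follows: the consumer pre-posts buffers, and the adapter places an incoming message straight into the first posted buffer. This is a single-process model with hypothetical names; memcpy stands in for the DMA write, and the error policy (drop when no buffer is posted or the buffer is too small) is an assumption of the sketch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define RQ_DEPTH 4

struct recv_elem { char *buf; size_t cap; };

struct recv_queue {
    struct recv_elem e[RQ_DEPTH];
    size_t head, count;
};

/* Receiver consumer: post an (already registered) buffer. */
static bool post_recv(struct recv_queue *rq, char *buf, size_t cap) {
    if (rq->count == RQ_DEPTH) return false;
    rq->e[(rq->head + rq->count) % RQ_DEPTH] = (struct recv_elem){ buf, cap };
    rq->count++;
    return true;
}

/* Receiving adapter: take the first posted element and place the payload
   directly into its buffer. Returns bytes delivered, or -1 on error. */
static long nic_deliver(struct recv_queue *rq, const char *payload, size_t len) {
    if (rq->count == 0) return -1;          /* no receive posted: message dropped */
    struct recv_elem r = rq->e[rq->head];
    rq->head = (rq->head + 1) % RQ_DEPTH;   /* the element is consumed either way */
    rq->count--;
    if (len > r.cap) return -1;             /* would complete with an error status */
    memcpy(r.buf, payload, len);            /* models the DMA into user memory */
    return (long)len;
}
```

The sketch makes the main consequence of the model visible: no copy through a kernel buffer takes place, but the receiver must post a buffer before the data arrives.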

Operation Example - RDMA Write
- Initiator
  - Consumer:
    - Register the send buffer
    - Obtain the receiver's buffer address
    - Post an RDMA Write
  - Channel Adapter:
    - Send out the data with a header (the operation and the receiving address); the data are retrieved directly from the sender's buffer
- Receiver
  - Consumer:
    - Register the receive buffer
    - Send the buffer address, R-key, and length to the initiator
  - Channel Adapter:
    - Receive the data
    - Check the validity of the address in the RDMA header
    - Move the data directly to the memory specified in the RDMA header
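The receiving adapter's check can be sketched in C: compare the R-key in the header with the one the receiver handed out, confirm that the target range lies inside the advertised region, then write the payload directly. The header layout and names are assumptions for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct rdma_region {               /* receiver-side registered region */
    unsigned char *base;
    size_t         len;
    uint32_t       rkey;           /* key the receiver sends to the initiator */
};

struct rdma_write_hdr {            /* what the initiator's adapter puts on the wire */
    uintptr_t raddr;               /* receiving address */
    size_t    len;
    uint32_t  rkey;
};

/* Receiving adapter: validate the header, then place the payload directly
   into the advertised memory without involving the receiving consumer. */
static bool rdma_write_accept(const struct rdma_region *r,
                              const struct rdma_write_hdr *h, const void *payload) {
    uintptr_t base = (uintptr_t)r->base;
    if (h->rkey != r->rkey) return false;                         /* wrong key */
    if (h->raddr < base || h->len > r->len ||
        h->raddr - base > r->len - h->len) return false;          /* out of bounds */
    memcpy((void *)h->raddr, payload, h->len);
    return true;
}
```

Unlike Send/Receive, no receive queue element is consumed here; the receiver's only involvement was registering the region and advertising its address, R-key, and length in advance.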

Summary of VIA
- Goal: low latency and high throughput by offering direct access to the NIC and zero copy
- Architecture components: consumer (VIPL), user agent (UA), kernel agent (KA), VI NIC
- Main concepts: queue pairs, memory pinning, gather/scatter, descriptors, doorbells
- Operations: Send/Receive, RDMA Read, RDMA Write

Why QP/IP
- TCP/IP networks are robust and ubiquitous.
- However, TCP/IP was not designed for high-performance, low-latency communication.
- The queue pair abstraction offloads CPU processing, shortens the critical data path, and enables zero-copy memory transfers.
- Integrating QP with IP may therefore reduce latency and improve throughput between end-node applications connected through a TCP/IP network.

Challenges to QP/IP
- Provide a VIPL that supports QP/IP
- Integrate connection setup
- Handle message segmentation
- Implement TCP/IP mechanisms on the NIC
- Preserve message boundaries over TCP
- Handle zero copy in the event of packet loss
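One common way to preserve message boundaries on a TCP byte stream is length-prefix framing; QP/IP's actual mechanism may differ. The sketch below assumes a 4-byte big-endian length header and a hypothetical 256-byte maximum message size, and it reassembles messages however the stream happens to be split into segments. All names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

#define MAX_MSG 256                /* hypothetical maximum message size */

struct deframer {
    unsigned char buf[4 + MAX_MSG];  /* 4-byte length header + message body */
    size_t have;                     /* bytes of the current message buffered so far */
};

static size_t frame_len(const unsigned char *h) {    /* big-endian length field */
    return ((size_t)h[0] << 24) | ((size_t)h[1] << 16) | ((size_t)h[2] << 8) | h[3];
}

/* Feed up to n bytes of stream data. Returns the number of bytes consumed and
   stops early once a message is complete. *status is set to:
    1  message ready at d->buf + 4, length frame_len(d->buf)
   -1  header announces more than MAX_MSG bytes
    0  more data needed */
static size_t deframe_feed(struct deframer *d, const unsigned char *data,
                           size_t n, int *status) {
    size_t used = 0;
    *status = 0;
    while (used < n) {
        size_t need = 4;                              /* first complete the header */
        if (d->have >= 4) {
            if (frame_len(d->buf) > MAX_MSG) { *status = -1; return used; }
            need += frame_len(d->buf);
        }
        size_t take = need - d->have;
        if (take > n - used) take = n - used;
        memcpy(d->buf + d->have, data + used, take);
        d->have += take;
        used += take;
        if (d->have >= 4 && frame_len(d->buf) <= MAX_MSG &&
            d->have == 4 + frame_len(d->buf)) {
            d->have = 0;           /* message stays in buf until the next feed */
            *status = 1;
            return used;
        }
    }
    return used;
}
```

The caller keeps feeding the remaining bytes after each completed message, so one TCP segment may yield several messages and one message may span several segments.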

QP/IP Architecture

QPIP Components
- FSMs: Doorbell FSM, Sched/XMT FSM, RECV FSM, Mgmt FSM
- Major data abstractions: QPs, CQs, TCP Control Block (TCB)
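The TCB is where per-connection TCP state would live on the NIC. A minimal sketch of a few of the sequence variables named in RFC 793 (SND.UNA, SND.NXT, SND.WND, RCV.NXT), with wraparound-safe sequence comparison and a simplified ACK handler; the RFC's window-update checks on SND.WL1/SND.WL2 are omitted. The field set is an assumption, not QPIP's actual TCB layout.

```c
#include <stdbool.h>
#include <stdint.h>

/* Modular 32-bit sequence comparison: true if a precedes b (mod 2^32). */
static bool seq_lt(uint32_t a, uint32_t b)  { return (int32_t)(a - b) < 0; }
static bool seq_leq(uint32_t a, uint32_t b) { return (int32_t)(a - b) <= 0; }

struct tcb {
    uint32_t snd_una;   /* oldest unacknowledged sequence number */
    uint32_t snd_nxt;   /* next sequence number to send */
    uint32_t snd_wnd;   /* peer's advertised receive window */
    uint32_t rcv_nxt;   /* next sequence number expected from the peer */
};

/* Accept an ACK only if SND.UNA < ack <= SND.NXT. Returns the number of newly
   acknowledged bytes (which could be used to retire send-queue entries),
   or 0 for a duplicate or out-of-range ACK. */
static uint32_t tcb_on_ack(struct tcb *t, uint32_t ack, uint32_t wnd) {
    if (!(seq_lt(t->snd_una, ack) && seq_leq(ack, t->snd_nxt))) return 0;
    uint32_t acked = ack - t->snd_una;
    t->snd_una = ack;
    t->snd_wnd = wnd;              /* simplified window update */
    return acked;
}

/* Bytes that may still be sent without exceeding the peer's window. */
static uint32_t tcb_usable_window(const struct tcb *t) {
    uint32_t in_flight = t->snd_nxt - t->snd_una;
    return in_flight >= t->snd_wnd ? 0 : t->snd_wnd - in_flight;
}
```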

QP/IP State Machines

QPIP Prototype
Three components:
- Application library: PostSend(), PostRecv(), Poll(), Wait()
- Kernel driver: initialization, address mapping mechanism, interrupt service
- Network interface firmware: implements the TCP, UDP, and IPv6 protocols
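The library offers both Poll() and Wait(). Their difference can be sketched as follows: polling returns immediately, while waiting returns only once a completion is available. The slide gives only the function names, so the signatures here are assumptions, written as hypothetical cq_poll/cq_wait; a callback stands in for the adapter, whereas in the prototype Wait() would presumably rely on the kernel driver's interrupt service instead.

```c
#include <stdbool.h>
#include <stddef.h>

#define CQ_DEPTH 8

struct cq_sim {
    int    ready[CQ_DEPTH];        /* ids of completed work requests */
    size_t head, count;
    void (*nic_step)(struct cq_sim *);   /* stand-in for adapter progress */
};

/* Poll(): non-blocking; returns true and the id if a completion is ready. */
static bool cq_poll(struct cq_sim *cq, int *id) {
    if (cq->count == 0) return false;
    *id = cq->ready[cq->head];
    cq->head = (cq->head + 1) % CQ_DEPTH;
    cq->count--;
    return true;
}

/* Wait(): keeps going until a completion arrives (bounded by max_steps here). */
static bool cq_wait(struct cq_sim *cq, int *id, int max_steps) {
    for (int i = 0; i < max_steps; i++) {
        if (cq_poll(cq, id)) return true;
        cq->nic_step(cq);          /* a real Wait() would sleep until woken */
    }
    return cq_poll(cq, id);
}

/* Example adapter model: each step completes one request with the next id. */
static void demo_nic_step(struct cq_sim *cq) {
    static int next_id = 1;
    if (cq->count < CQ_DEPTH) {
        cq->ready[(cq->head + cq->count) % CQ_DEPTH] = next_id++;
        cq->count++;
    }
}
```

Polling keeps latency lowest at the cost of CPU time, while waiting frees the CPU at the cost of wake-up latency; offering both lets an application choose per call.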

Application-Application RTT

Application Throughput & CPU Utilization

Network Interface Processing Cost

QPIP Based on NBD

NBD Client Throughput and CPU Effectiveness

Summary
- Integrates the QP concept from VIA with ubiquitous TCP/IP networks
- Provides low latency and high throughput for SANs
- QP/IP consists of the Doorbell FSM, Sched/XMT FSM, RECV FSM, and Mgmt FSM, together with the QP, CQ, and TCB data structures
- Demonstrates comparable performance and much lower CPU utilization with modest hardware
- Programmability adds flexibility to adapt to evolving TCP/IP and scheduling requirements

Issues
- How can a TCP offload engine (TOE) be integrated into this mechanism?
- How can message boundaries in TCP be handled efficiently to support upper-level applications, e.g. iSCSI?
- How should segmentation be handled?
- How can zero copy be supported when packets are lost?
- How can this be extended to a WAN environment (more unpredictability, fluctuating latency and available bandwidth, congestion, long fat networks (LFNs))?
- How can OSD communication be supported effectively?

Questions?