Revisiting Network Interface Cards as First-Class Citizens Wu-chun Feng (Virginia Tech) Pavan Balaji (Argonne National Lab) Ajeet Singh (Virginia Tech)

Presentation transcript:

Revisiting Network Interface Cards as First-Class Citizens Wu-chun Feng (Virginia Tech) Pavan Balaji (Argonne National Lab) Ajeet Singh (Virginia Tech) 2/15/2009

Forecast
The application-network interface is the bottleneck in network communication today.
Survey of existing system architectures.
  – Evaluate their merits and demerits.
New system architecture proposed
  – Network interface card (NIC) treated as a first-class citizen, on par with the CPUs.
  – New architecture potentially solves the application-network interfacing problem.

Outline
Motivation
Trends
Evolution of System Architectures
Current Status
NIC as First-Class Citizens
Problems and Solutions
Drawbacks
Conclusion

Motivation
Problem: the application-to-network interface is still the bottleneck.
[Figure; surviving labels: Network and I/O Host TCP/IP NIC]
RDMA reduces the overhead by 40-90%.

Trends: Ethernet Wire Time vs. Processing Time
[Figure. Image source: Intel Corporation]

Trends
CPU
  – Multi-core systems
Memory Access Time (DRAM)
  – Decreases 7-9% every year (5-70 ns now)
Memory Capacity
  – Increasing four-fold every 3 years
Network Link Bandwidth
  – Already hundreds of Gbps and improving!
Application-to-Network Latency?
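The memory trends above compound quickly. The following is a small arithmetic sketch, not part of the original slides: it applies the quoted rates (7-9% yearly decrease in DRAM access time, four-fold capacity growth every 3 years) over a hypothetical six-year horizon. The horizon and the helper name `compound` are assumptions chosen for illustration.

```python
def compound(value, rate_per_period, periods):
    """Apply a constant fractional change per period (negative means decrease)."""
    return value * (1.0 + rate_per_period) ** periods

# Rates quoted on the slide; the 6-year horizon is a hypothetical choice.
YEARS = 6

latency_slow = compound(1.0, -0.07, YEARS)  # relative access time at 7% per year
latency_fast = compound(1.0, -0.09, YEARS)  # relative access time at 9% per year
capacity = 4.0 ** (YEARS / 3)               # four-fold growth every 3 years

print(f"access time after {YEARS} years: "
      f"{latency_fast:.2f}x to {latency_slow:.2f}x of today")
print(f"capacity after {YEARS} years: {capacity:.0f}x of today")
```

Under these assumptions, access time improves by less than a factor of two while capacity grows sixteen-fold, which is the imbalance the slide points at.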

AMD NUMA Architecture

Approach 1: NIC on I/O Bus
[Diagram; components: CPU, Northbridge (memory controller hub), Memory Slots, Internal Bus, High-speed graphics bus (AGP or PCI Express), Graphics Card Slot, Southbridge (I/O controller hub), PCI Bus, Network Interface Card, Disk, LPC Bus, Flash BIOS ROM]

Approach 1: Features
The I/O bus is orders of magnitude slower than the memory bus (memory latency)
DMA initiation process is very expensive
Software overhead (OS intervention)
No direct access to CPU
NIC connected to standard I/O bus
Direct Memory Access (DMA) to ease the load on the CPU

Approach 2: NIC connected to memory bus (2004)
[Diagram; components: CPU, Northbridge (memory controller hub), Memory Slots, Internal Bus, High-speed graphics bus (AGP or PCI Express), Network Interface Card, Southbridge (I/O controller hub), PCI Bus, PCI Slots, Disk, LPC Bus, Flash BIOS ROM]

Approach 2: Features
PCI-Express is orders of magnitude faster than PCI
NIC closer to main memory (reduced latency)
Not integrated with memory subsystem
No direct access to CPU cache
DMA initiation process is very expensive
Software overhead (OS intervention)

Approach 3: Direct Cache Access Capability (Simulated)
[Diagram; components: CPU, Northbridge (memory controller hub), Memory Slots, Internal Bus, High-speed graphics bus (AGP or PCI Express), Network Interface Card, Southbridge (I/O controller hub), PCI Bus, PCI Slots, Disk, LPC Bus, Flash BIOS ROM]

Approach 3: Features
NIC has access to processor cache (low latency)
Reduced memory bandwidth requirement
Not integrated with memory subsystem
Software overhead (OS intervention)
Increases processor cache requirements

Where are we now?
Software
  – Several high-performance RDMA protocols are available.
Hardware
  – Intel’s Communication Stream Architecture has been subsumed by PCI-Express technology.
  – Myrinet cards also provide a user-level network interface.
  – Qs-Net, a Quadrics product, comes closest to first-class-citizen status.

Where Do We Go From Here?
What has been done so far?
  – Bad NIC architecture: the NIC is given first-class citizenship (attached to the memory bus) but is still treated as a second-class citizen, i.e., a peripheral device.
Obvious Solution
  – RDMA
  – Treat the NIC as a first-class citizen. How?

Approach 4: NIC as First-Class Citizen (Proposed)
[Diagram; components: CPU, Network CPU, Northbridge (memory controller hub), Memory Slots, Internal Bus, High-speed graphics bus (AGP or PCI Express), Graphics Card Slot, Southbridge (I/O controller hub), PCI Bus, PCI Slots, Disk, LPC Bus, Flash BIOS ROM]

First-Class Citizens?
Beyond direct cache access.
Connected to the faster memory bus (or PCIe).
NIC integrated with the memory subsystem.
Has processing capabilities.
Not a general-purpose CPU but task-specific.
Has its own cache like other processors.
Treat the NIC as a co-processor!

NIC Access → Memory Access

Virtualize NIC & Bypass OS
Virtualize NIC
  – High latency to access NIC
      Packets go through the OS via Unix sockets.
      High DMA initiation overhead.
  + Easy protection of address spaces
  + Easy address translation
Treat it like main memory and not like a disk!

Cache NIC Registers/Data
NIC Registers Currently Uncached
  – CPU accesses to NIC may have side effects (unlike normal cache memory)
  – Behaves more like cache than main memory (passive)
  – Cache replacement issue
Advantages
  + Low memory latency
  + Exploit temporal locality
      Remove unnecessary memory traffic (e.g., during polling)
  + Explicit handshake required

NIC memory as cache – Block transfer
I/O Transfer
  – Uncached loads/stores to memory-mapped device registers transfer very few bytes (1-16 bytes)
  – High DMA initiation overhead (through CPU)
Cache Block Transfer
  + High bandwidth ( bytes)
  + Memory buses are optimized for cache block transfer
  + Cache-to-cache transfer
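To make the difference between the two transfer styles concrete, here is a rough counting sketch. It only counts how many bus transfers are needed to move one packet; it does not model timing. The packet size and the cache block size are assumptions (the block size is not given in the transcript), and `transfers_needed` is a hypothetical helper.

```python
import math

def transfers_needed(payload_bytes, bytes_per_transfer):
    """Number of bus transfers required to move the whole payload."""
    return math.ceil(payload_bytes / bytes_per_transfer)

PACKET_BYTES = 1500  # assumption: a typical Ethernet payload size
UNCACHED_MAX = 16    # from the slide: uncached loads/stores move 1-16 bytes
CACHE_BLOCK = 64     # assumption: block size chosen only for illustration

uncached = transfers_needed(PACKET_BYTES, UNCACHED_MAX)
blocked = transfers_needed(PACKET_BYTES, CACHE_BLOCK)

print(f"uncached stores:       {uncached} transfers")
print(f"cache block transfers: {blocked} transfers")
```

Even in the best case for uncached accesses (16 bytes each), the packet needs several times more bus transfers than with block-sized moves under these assumed sizes.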

Proper Notification
Interrupt
  – Heavyweight
      Cache corrupted due to context switch
  – Corrupts the cache(s)
      Adversely affects cache hit rate.
      Results in added memory-bus traffic.
Cache Invalidation
  + “Non-intrusive”
      NIC invalidates the cached NIC register in the CPU’s cache.
      CPU misses on the cached but invalidated NIC register and gets the valid NIC register from the NIC.
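The cache-invalidation scheme can be sketched as a toy simulation. This is an illustrative model under stated assumptions, not the authors' design: the classes `Nic` and `CpuCache`, the `rx_count` register, and the `bus_reads` counter are all hypothetical names. The CPU polls a cached copy of a NIC register; polls that hit in the cache cause no bus traffic, and only an invalidation from the NIC forces the next poll to fetch the fresh value.

```python
class Nic:
    """Toy NIC with one status register (hypothetical 'packets received' counter)."""
    def __init__(self):
        self.rx_count = 0
        self.caches = []  # CPU caches that may hold a copy of the register

    def packet_arrives(self):
        self.rx_count += 1
        for cache in self.caches:
            cache.invalidate()  # notification by invalidation, no interrupt

class CpuCache:
    """Toy CPU cache holding a single line: the cached NIC register."""
    def __init__(self, nic):
        self.nic = nic
        self.valid = False
        self.value = None
        self.bus_reads = 0  # counts fetches that actually reach the NIC
        nic.caches.append(self)

    def invalidate(self):
        self.valid = False

    def read_register(self):
        if not self.valid:  # miss: fetch the valid register from the NIC
            self.value = self.nic.rx_count
            self.valid = True
            self.bus_reads += 1
        return self.value   # hit: served from the cache, no bus traffic

nic = Nic()
cpu = CpuCache(nic)
polls_before = [cpu.read_register() for _ in range(1000)]  # 1 miss, 999 hits
nic.packet_arrives()                                       # invalidates the line
after = cpu.read_register()                                # miss, sees new value
print(cpu.bus_reads, after)
```

In this model, a thousand polls plus one packet arrival cost only two reads over the bus, and no context switch is involved.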

Buffering Packets
Use device memory of NIC
  – Limited buffer space
Use virtual memory
  + Plentiful buffer space

[NIC Access → Memory Access] Out-of-Order Access Possible
  + Additional scheduling flexibility in a dynamic pipeline
      Certain loads/stores
        – May be scheduled earlier than other loads/stores
      CPU may not need to stall...
  + Speculative access
      Due to memory-based queues

Memory-Based Queue API
Memory-Based Queue API vs. User-Level NIC API
  + Decouples NIC from CPU
      Sending/receiving packets = reading/writing queue memory
      Both CPU and NIC can send/receive multiple packets to/from queues without blocking
  + No more explicit DMA initiation requests
  + Treat NIC queue accesses as side-effect-free memory accesses
  + Enables out-of-order and speculative access
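One way to picture a memory-based queue is a fixed-size ring shared by a single producer (the CPU) and a single consumer (the NIC). The sketch below simulates that ring with a Python list. It is an assumption-laden illustration rather than the API proposed in the talk: `MemoryQueue`, `try_put`, and `try_get` are hypothetical names, and real hardware would additionally need memory ordering guarantees that this model ignores.

```python
class MemoryQueue:
    """Single-producer/single-consumer ring buffer (simulated shared memory)."""
    def __init__(self, slots):
        self.buf = [None] * slots
        self.head = 0  # next slot the consumer (e.g., NIC) reads
        self.tail = 0  # next slot the producer (e.g., CPU) writes

    def try_put(self, item):
        """Sending = writing queue memory. Returns False instead of blocking."""
        nxt = (self.tail + 1) % len(self.buf)
        if nxt == self.head:  # full: one slot is kept empty to tell full from empty
            return False
        self.buf[self.tail] = item
        self.tail = nxt
        return True

    def try_get(self):
        """Receiving = reading queue memory. Returns None when empty."""
        if self.head == self.tail:
            return None
        item = self.buf[self.head]
        self.buf[self.head] = None
        self.head = (self.head + 1) % len(self.buf)
        return item

tx = MemoryQueue(slots=4)  # usable capacity is slots - 1
sent = [tx.try_put(p) for p in (b"p0", b"p1", b"p2", b"p3")]

drained = []
while (pkt := tx.try_get()) is not None:  # the NIC side drains the queue
    drained.append(pkt)

print(sent, drained)
```

Neither side issues a DMA request or waits on the other: the producer only advances `tail`, the consumer only advances `head`, and a full or empty queue is reported by return value rather than by blocking.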

Drawbacks
Proprietary Memory Bus
  – Non-standard interface, but bridges are possible.
Data Movement
  – Lose explicit program control.
  – Proposed solution currently applies only to bus-based cache-coherence protocols.
High Risk
  – A standard interface needs to be developed and adopted.

Conclusion
The application-network bottleneck is the biggest challenge.
Incremental adjustments to the system architecture are no longer sufficient to let the network realize its full capacity.
Network interface cards must be treated as first-class citizens, on par with the CPUs.

Questions?
Contact information: Wu-chun Feng, Pavan Balaji, Ajeet Singh