Can Google Route? Building a High-Speed Switch from Commodity Hardware. Guido Appenzeller, Matthew Holliman, Q2/2002.

Outline
- Motivation: the cost of switches
- The basic unit
- Switching fabrics
- Analysis & reality check

Price Survey of Gigabit Switches: Apples and Oranges
- PC Gigabit Ethernet card: $34.99 (32-bit), $49.35 (64-bit)
- Layer 2: 24-port switch $2,…; …-port switch $1,500; 8-port switch $830; 4-port switch $460
- Layer 3: 24-port switch $7,467
- High-end routers: Cisco 7600 (max … Gbit, base price $60k, line cards $10k, total $100k–250k?); Cisco … (max 160 Gbit, base price $120k, line cards $25k–$50k, total $300k–$1M?)

Cost/Gigabit vs. Total Switching Capacity
Cost per gigabit increases dramatically with aggregate switching capacity.

Let’s Build a Switch the Google Way
- Use large numbers of cheap PC components
- Cheap PC boards ($250): 16 MBytes of memory, no disk
- Cheap copper Gigabit Ethernet cards
- Use Clos networks to build larger fabrics out of smaller ones
[Diagram: PC performing address lookup and output queuing between Ethernet ports]

PC Data Flow
- An Ethernet frame arrives in the NIC FIFO; the NIC transfers packet(s) over the PCI bus (arbitration/address phase, PCI burst write, terminate transaction), then raises an interrupt for CPU processing.
- PC bus layout: CPU – MCH – DRAM (3.2 GB/s); MCH – PCI bus (0.5 GB/s) – Gig-E NIC.
- The PCI bus is the bottleneck.

Computational Flow
- Per packet: buffer the incoming packet, take an interrupt (~5 μs), read the packet header, do the lookup, copy the packet to the output queue (DRAM accesses, ~100 ns each).
- Interrupts are a bottleneck for short packets.
- Packet processing is done from/to DRAM.
- Packets are written from/to the network cards in bursts to save IRQ overhead and PCI bandwidth.

Per-Port Throughput vs. Burst Size
[Plot: per-port read/write bandwidth (× 100 Mbit/s) vs. burst size, for 33 MHz/32-bit, 66 MHz/32-bit, 33 MHz/64-bit, and 66 MHz/64-bit PCI buses, with the 1 Gbit threshold marked]
We need a 66 MHz, 64-bit system.
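The shape of that plot can be reproduced with a simple model, sketched below. The 8-clock per-transaction overhead and the "two bus crossings, two concurrent flows" accounting are illustrative assumptions, not measured values from the slides.

```python
def per_port_mbits(bus_mhz, bus_width_bytes, burst_bytes, overhead_clocks=8):
    """Per-port throughput (Mbit/s) of a PC switch limited by the PCI bus.

    Every burst pays a fixed arbitration/address overhead (assumed 8 bus
    clocks here). Each packet crosses the bus twice (NIC -> DRAM, then
    DRAM -> NIC), and in 4x4 half-duplex operation two flows share the
    bus, so the effective bus bandwidth is divided by 4 per port.
    """
    cycles = overhead_clocks + burst_bytes / bus_width_bytes
    seconds = cycles / (bus_mhz * 1e6)
    bus_bytes_per_s = burst_bytes / seconds
    return bus_bytes_per_s * 8 / 4 / 1e6

# With 2 KByte bursts, only the 66 MHz / 64-bit bus clears 1 Gbit/s:
# per_port_mbits(66, 8, 2048) -> 1024.0
# per_port_mbits(33, 4, 2048) -> ~260
```

Under this model, larger bursts amortize the fixed transaction overhead, which is exactly why the NIC must aggregate short packets.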

CPU Clock Cycles per Byte vs. Packet Size
[Plot: CPU clocks per byte vs. average packet size in bytes, for N = 1, 10, 30 packets per burst and 5 μs / 11 μs latency, with the 1 Gbit threshold marked]
- N = number of packets per burst; 5 μs = HW latency only; 11 μs = with driver latency
- For 100% throughput we need to aggregate short packets.
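The curves in this plot come from amortizing a fixed per-burst latency over the bytes in the burst. A minimal sketch follows; the 1 GHz CPU clock and the 2 clocks/byte of per-byte processing are assumptions for illustration.

```python
def cpu_clocks_per_byte(pkt_bytes, pkts_per_burst, latency_us,
                        cpu_hz=1e9, per_byte_clocks=2):
    """Fixed per-burst latency (IRQ + driver), amortized over the burst,
    plus an assumed per-byte processing cost."""
    amortized = latency_us * 1e-6 * cpu_hz / (pkts_per_burst * pkt_bytes)
    return amortized + per_byte_clocks

# Budget at 1 Gbit/s line rate with a 1 GHz CPU: 1e9 / 125e6 = 8 clocks/byte.
# 64-byte packets, no aggregation, 11 us latency: ~174 clocks/byte -- hopeless.
# 64-byte packets in bursts of 30: ~7.7 clocks/byte -- just under the budget.
```

This is the quantitative reason short packets must be aggregated into bursts before the CPU sees them.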

PC Performance Summary
Today’s PCs are just fast enough to operate as a 4x4 switch. To build a 4x4 half-duplex (2x2 full-duplex) switch we need:
- a 66 MHz / 64-bit PCI bus
- 1 GByte/s memory bandwidth
- a NIC with sufficient buffering to aggregate short packets into bursts (about 2 KBytes)
- software that runs without interrupts, e.g. Linux in halted mode

Building Larger Switches
A Clos network for a 16x16 switch made of 4x4 switches requires:
- 12 PCs
- 48 network cards
- 8 Gbit capacity
- Total cost: $5,400
3Com is cheaper.

Building Larger Switches
A Clos network for a 256x256 switch made of 16x16 switches requires:
- 576 PCs
- 2304 network cards
- 128 Gbit capacity
- Total cost: $260k
Now we are cheaper! Well, sort of…
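The two cost totals follow directly from the component prices on the earlier slides: $250 per PC board and roughly $50 per 64-bit Gigabit NIC (the $49.35 card, rounded), with four NICs per PC. A quick check:

```python
PC_COST = 250   # cheap PC board, from the basic-unit slide
NIC_COST = 50   # ~ the $49.35 64-bit Gigabit NIC, rounded up

def clos_cost(num_pcs, nics_per_pc=4):
    """Total hardware cost of a PC-based Clos fabric."""
    return num_pcs * PC_COST + num_pcs * nics_per_pc * NIC_COST

# 16x16 fabric:  12 PCs, 48 NICs  -> $5,400
# 256x256 fabric: 576 PCs, 2304 NICs -> $259,200 (~$260k)
```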

Switch Size vs. Basic Units Needed
How many 4x4 switches do we need for an NxN switch?
- 4x4: 1 switch
- 16x16: 12 switches
- 256x256: 576 switches
- 4^(2^n) x 4^(2^n): 3^n · 4^(2^n)/4 switches
In general, an NxN switch needs (N/4) · log_4(N) · 1.5^(log_2 log_4 N) switches, so scaling is slightly worse than N log N.
Could we build a switch with fewer basic units? Maybe, but not by much: the theoretical limit is O((N/4) log_4 N), and the differing term 1.5^(log_2 log_4 N) is small.
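These counts come from the three-stage recursive construction: an n x n Clos network is three stages, each made of sqrt(n) switches of size sqrt(n) x sqrt(n). A minimal sketch, checking the recursion against the slide's closed form:

```python
import math

def clos_units(n):
    """Number of 4x4 switches needed for an n x n Clos network,
    for n = 4^(2^k) (4, 16, 256, 65536, ...): three stages, each
    of sqrt(n) switches of size sqrt(n) x sqrt(n), built recursively."""
    if n == 4:
        return 1
    m = math.isqrt(n)
    return 3 * m * clos_units(m)

def clos_units_closed(n):
    """(N/4) * log4(N) * 1.5^(log2 log4 N), the slide's closed form."""
    log4n = math.log(n, 4)
    return round(n / 4 * log4n * 1.5 ** math.log2(log4n))

# clos_units(16) -> 12, clos_units(256) -> 576, clos_units(65536) -> 442368
```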

Scheduling – The Problem
- How do we do scheduling? For an n=k Clos network we need dynamic matching.
- For 256x256, the matching algorithm is time-consuming.
- How do we pass traffic information between inputs, scheduler, and nodes? Extra network connections are costly, and the timing is critical.

Solution: Buffered Clos Networks
Two ideas:
1. Add a randomization stage (Chang et al.). Now we can use round robin as the scheduler; this is deterministic and requires no synchronization between nodes.
2. Use the PCs’ capability to buffer data. Each node has a few MBytes; if there is a collision, re-send the packets.
We use randomization.

Randomized, Buffered Clos Network
- Stage 1: pseudo-random (no coordination needed). Never blocks.
- Stage 2: round robin (no coordination needed). Never blocks.
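A toy sketch of the two independent per-node policies: stage-1 nodes spread load by picking middle nodes at random, and middle nodes serve per-output virtual output queues (VOQs) round-robin. Class and function names here are illustrative, not from the slides.

```python
import random
from collections import deque

class MiddleNode:
    """Middle-stage node with one virtual output queue (VOQ) per output,
    served round-robin; no coordination with any other node."""
    def __init__(self, num_outputs):
        self.voqs = [deque() for _ in range(num_outputs)]
        self.rr = 0  # round-robin pointer

    def enqueue(self, pkt, output):
        self.voqs[output].append(pkt)

    def dequeue_one(self):
        # Serve the VOQs round-robin: at most one packet leaves per slot,
        # so the second stage never blocks.
        for _ in range(len(self.voqs)):
            q = self.voqs[self.rr]
            self.rr = (self.rr + 1) % len(self.voqs)
            if q:
                return q.popleft()
        return None  # all VOQs empty this slot

def stage1_pick(middle_nodes):
    # Stage 1: pick a middle node uniformly at random (Chang et al.-style
    # load balancing); no scheduler and no synchronization between inputs.
    return random.choice(middle_nodes)
```

Neither policy needs any state from other nodes, which is the whole point: scheduling degenerates into purely local decisions.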

Stability Analysis of the Switch
First stage:
- The matching is random, so the distribution of packets over the middle column of nodes is i.i.d. uniform; no blocking can occur.
- Queueing at the middle stage is equivalent to an input-queued router (IQR) with k inputs, VOQs, and round-robin scheduling. We know such an IQR has 100% throughput under i.i.d. uniform traffic.
Second stage:
- No blocking can occur; 100% throughput if all VOQs in the middle stage are occupied.
- Queueing at the middle stage is equivalent to an output-queued router with k inputs, and an output-queued router has 100% throughput.
The system has 100% throughput for any admissible traffic.

Reality Check
This might look like a good idea…
- Cheap
- Scalable: the switch can grow
- Some redundancy: a node failure reduces throughput by 1/16 in the worst case
…but it is probably not exactly what carriers want:
- High power consumption (50 kW vs. 2.5 kW)
- Major space requirements (10–20 racks)
- Packet reordering can happen (TCP won’t like it)
- Maintenance: one PC will fail per day!

Research Outlook
Why this could still be interesting:
- We can do this in hardware: implement it in VLSI, or build from the chipsets that 24x24 switch manufacturers use.
- We could use better base units: e.g. a … Tbit half-duplex 576x576 switch (fastest in the world?) using 24x24 Netgear switches (GS524T). Cost: $158k (we might get a volume discount from Netgear).
- So far we don’t use the intelligence of the nodes: we could re-configure the matchings periodically and distribute the lookup.

Backup

Randomized/HoQ Buffered Clos Network
- Stage 1: pseudo-random (no coordination needed). Never blocks.
- Stage 2: head of queue (no coordination needed). If it blocks, buffer the packet and resend.
Note: we can overprovision the middle layer!