Router Construction II Outline Network Processors Adding Extensions Scheduling Cycles.

Slides:



Advertisements
Similar presentations
1 Building a Fast, Virtualized Data Plane with Programmable Hardware Bilal Anwer Nick Feamster.
Advertisements

IP Router Architectures. Outline Basic IP Router Functionalities IP Router Architectures.
Scheduling in Web Server Clusters CS 260 LECTURE 3 From: IBM Technical Report.
System Area Network Abhiram Shandilya 12/06/01. Overview Introduction to System Area Networks SAN Design and Examples SAN Applications.
Supercharging PlanetLab A High Performance,Multi-Alpplication,Overlay Network Platform Reviewed by YoungSoo Lee CSL.
Performance Evaluation of Open Virtual Routers M.Siraj Rathore
1 ELEN 602 Lecture 18 Packet switches Traffic Management.
CSC457 Seminar YongKang Zhu December 6 th, 2001 About Network Processor.
Router Architecture : Building high-performance routers Ian Pratt
Chapter 7 Protocol Software On A Conventional Processor.
ECE 526 – Network Processing Systems Design Software-based Protocol Processing Chapter 7: D. E. Comer.
Spring 2002CS 4611 Router Construction Outline Switched Fabrics IP Routers Tag Switching.
1 Virtual Private Caches ISCA’07 Kyle J. Nesbit, James Laudon, James E. Smith Presenter: Yan Li.
UCB Switches Jean Walrand U.C. Berkeley
1 Router Construction II Outline Network Processors Adding Extensions Scheduling Cycles.
1 Design and Implementation of A Content-aware Switch using A Network Processor Li Zhao, Yan Luo, Laxmi Bhuyan University of California, Riverside Ravi.
10 - Network Layer. Network layer r transport segment from sending to receiving host r on sending side encapsulates segments into datagrams r on rcving.
Performance Analysis of the IXP1200 Network Processor Rajesh Krishna Balan and Urs Hengartner.
Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 1 Intel IXP1200 Network Processor q Lab 12, Introduction to the Intel IXA q Jonathan Gunner, Sruti.
Figure 1.1 Interaction between applications and the operating system.
School of Information Technologies IP Quality of Service NETS3303/3603 Weeks
7/2/ _23 1 Pipelining ECE-445 Computer Organization Dr. Ron Hayne Electrical and Computer Engineering.
ECE 526 – Network Processing Systems Design
Research Gísli Hjálmtýsson - AT&T Research - 1 Programmable Networks of Tomorrow (Pronto): The Programmable Interface of Pronto.
Router Architectures An overview of router architectures.
Router Architectures An overview of router architectures.
4: Network Layer4b-1 Router Architecture Overview Two key router functions: r run routing algorithms/protocol (RIP, OSPF, BGP) r switching datagrams from.
Chapter 4 Queuing, Datagrams, and Addressing
1 IP routers with memory that runs slower than the line rate Nick McKeown Assistant Professor of Electrical Engineering and Computer Science, Stanford.
Computer Networks Switching Professor Hui Zhang
A Scalable, Cache-Based Queue Management Subsystem for Network Processors Sailesh Kumar, Patrick Crowley Dept. of Computer Science and Engineering.
Adaptive CPU Allocation for Software based Router Systems Puneet Zaroo.
Networking Virtualization Using FPGAs Russell Tessier, Deepak Unnikrishnan, Dong Yin, and Lixin Gao Reconfigurable Computing Group Department of Electrical.
Paper Review Building a Robust Software-based Router Using Network Processors.
1.  Project Goals.  Project System Overview.  System Architecture.  Data Flow.  System Inputs.  System Outputs.  Rates.  Real Time Performance.
CIS679: Scheduling, Resource Configuration and Admission Control r Review of Last lecture r Scheduling r Resource configuration r Admission control.
SEDA: An Architecture for Well-Conditioned, Scalable Internet Services
1 Scheduling Processes. 2 Processes Each process has state, that includes its text and data, procedure call stack, etc. This state resides in memory.
1 Liquid Software Larry Peterson Princeton University John Hartman University of Arizona
Cluster Computers. Introduction Cluster computing –Standard PCs or workstations connected by a fast network –Good price/performance ratio –Exploit existing.
Router Architecture Overview
Computers Operating System Essentials. Operating Systems PROGRAM HARDWARE OPERATING SYSTEM.
Performance evaluation of component-based software systems Seminar of Component Engineering course Rofideh hadighi 7 Jan 2010.
Harmony: A Run-Time for Managing Accelerators Sponsor: LogicBlox Inc. Gregory Diamos and Sudhakar Yalamanchili.
CS640: Introduction to Computer Networks Aditya Akella Lecture 20 - Queuing and Basics of QoS.
Nick McKeown Spring 2012 Lecture 2,3 Output Queueing EE384x Packet Switch Architectures.
CS 4396 Computer Networks Lab Router Architectures.
Efficient Cache Structures of IP Routers to Provide Policy-Based Services Graduate School of Engineering Osaka City University
Forwarding.
Supporting Multimedia Communication over a Gigabit Ethernet Network VARUN PIUS RODRIGUES.
Chapter 13 – I/O Systems (Pgs ). Devices  Two conflicting properties A. Growing uniformity in interfaces (both h/w and s/w): e.g., USB, TWAIN.
T. S. Eugene Ngeugeneng at cs.rice.edu Rice University1 COMP/ELEC 429 Introduction to Computer Networks Lecture 18: Quality of Service Slides used with.
1 CSE 5346 Spring Network Simulator Project.
1 A quick tutorial on IP Router design Optics and Routing Seminar October 10 th, 2000 Nick McKeown
Spring 2000CS 4611 Router Construction Outline Switched Fabrics IP Routers Extensible (Active) Routers.
ECE 526 – Network Processing Systems Design Programming Model Chapter 21: D. E. Comer.
Cluster Computers. Introduction Cluster computing –Standard PCs or workstations connected by a fast network –Good price/performance ratio –Exploit existing.
Chapter 11 System Performance Enhancement. Basic Operation of a Computer l Program is loaded into memory l Instruction is fetched from memory l Operands.
Univ. of TehranIntroduction to Computer Network1 An Introduction to Computer Networks University of Tehran Dept. of EE and Computer Engineering By: Dr.
Graciela Perera Department of Computer Science and Information Systems Slide 1 of 18 INTRODUCTION NETWORKING CONCEPTS AND ADMINISTRATION CSIS 3723 Graciela.
QoS & Queuing Theory CS352.
CS 286 Computer Organization and Architecture
Chapter 4: Network Layer
Final Review CS144 Review Session 9 June 4, 2008 Derrick Isaacson
Lecture 23: Process Scheduling for Interactive Systems
CS 31006: Computer Networks – The Routers
Chapter 4 Network Layer Computer Networking: A Top Down Approach 5th edition. Jim Kurose, Keith Ross Addison-Wesley, April Network Layer.
Project proposal: Questions to answer
Author: Xianghui Hu, Xinan Tang, Bei Hua Lecturer: Bo Xu
Chapter 4: Network Layer
Presentation transcript:

Router Construction II Outline Network Processors Adding Extensions Scheduling Cycles

Spring 2003CS 4612 Observations Emerging commodity components can be used to build IP routers –switching fabrics, network processors, … Routers are being asked to support a growing array of services –firewalls, proxies, p2p nets, overlays,...

Spring 2003CS 4613 Data Plane (IP) Control Plane (BGP, RSVP…) Router Architecture

Spring 2003CS 4614 Software-Based Router + Cost + Programmability – Performance (~300 Kpps) – Robustness Data Plane (IP) Control Plane (BGP, RSVP…) PC

Spring 2003CS 4615 Hardware-Based Router – Cost – Programmability + Performance (25+ Mpps) + Robustness Data Plane (IP) Control Plane (BGP, RSVP…) ASIC PC

Spring 2003CS 4616 NP-Based Router Architecture + Cost ($1500) + Programmability ? Performance ? Robustness Data Plane ( packet flows) Control Plane ( packet flows) IXP1200 PC

Spring 2003CS 4617 In General... IXP Pentium IXP Pentium... Pentium

Spring 2003CS 4618 Architectural Overview... Network Services... Virtual Router... Hardware Configurations... Packet Flows Forwarding Paths Switching Paths

Spring 2003CS 4619 Virtual Router Classifiers Schedulers Forwarders

Spring 2003CS Simple Example IP Proxy Active Protocol

Spring 2003CS Intel IXP ScratchDRAMSRAM 6 Micro- Engines StrongARM FIFOs IX Bus MAC Ports IXP1200 Chip PCI Bus

Spring 2003CS Processor Hierarchy MicroEngines Pentium StrongArm

Spring 2003CS Data Plane Pipeline DRAM (buffers) SRAM (queues, state) Input FIFO Slots Output FIFO Slots Input Contexts Output Contexts 64B

Spring 2003CS Data Plane Processing INPUT context loop wait_for_data copy in_fifo  regs Basic_IP_processing copy regs  DRAM if (last_fragment) enqueue  SRAM OUTPUT context loop if (need_data) select_queue dequeue  SRAM copy DRAM  out_fifo

Spring 2003CS Pipeline Evaluation 100Mbps Ether  0.142Mpps Measured independently

Spring 2003CS What We Measured Static context assignment –16 input / 8 output Infinite offered load 64-byte (minimum-sized) IP packets Three different queuing disciplines

Spring 2003CS Single Protected Queue Lock synchronization Max 3.47 Mpps Contention lower bound 1.67 Mpps O I I I Output FIFO

Spring 2003CS Multiple Private Queues Output must select queue Max 3.29 Mpps O I I I Output FIFO

Spring 2003CS Multiple Protected Queues Output must select queue Some QoS scheduling (16 priority levels) Max 3.29 Mpps O I I I Output FIFO

Spring 2003CS Data Plane Processing INPUT context loop wait_for_data copy in_fifo  regs Basic_IP_processing copy regs  DRAM if (last_fragment) enqueue  SRAM OUTPUT context loop if (need_data) select_queue dequeue  SRAM copy DRAM  out_fifo

Spring 2003CS Cycles to Waste INPUT context loop wait_for_data copy in_fifo  regs Basic_IP_processing nop … nop copy regs  DRAM if (last_fragment) enqueue  SRAM OUTPUT context loop if (need_data) select_queue dequeue  SRAM copy DRAM  out_fifo

Spring 2003CS How Many “NOPs” Possible? IXP1200 Evalualtion Board 1.2 Mpps = 8x100Mbps

Spring 2003CS Data Plane Extensions

Spring 2003CS Control and Data Plane Smart Dropper Layered Video Analysis (control plane) (data plane) Shared State

Spring 2003CS What About the StrongARM? Shares memory bus with MicroEngines –must respect resource budget What we do –control IXP1200  Pentium DMA –control MicroEngines What might be possible –anything within budget –exploit instruction and data caches We recommend against –running Linux

Spring 2003CS Performance MicroEngines Pentium StrongArm 310Kpps with 1510 cycles/packet 3.47Mpps w/ no VRP or 1.13Mpps w/ VRP buget

Spring 2003CS Pentium Runs protocols in the control plane –e.g., BGP, OSPF, RSVP Run other router extensions –e.g., proxies, active protocols, overlays Implementation –runs Scout OS + Linux IXP driver –CPU scheduler is key

Spring 2003CS Processes Input Port Pentium Output Port P P PP P

Spring 2003CS Performance

Spring 2003CS Performance (cont) Kpps

Spring 2003CS Scheduling Mechanism Proportional share forms the base –each process reserves a cycle rate –provides isolation between processes –unused capacity fairly distributed Eligibility –a process receives its share only when its source queue is not empty and sink queue is not full Batching –to minimize context switch overhead

Spring 2003CS Share Assignment QoS Flows –assume link rate is given, derive cycle rate –conservative rate to input process –keep batching level low Best Effort Flows –may be influenced by admin policy –use shares to balance system (avoid livelock) –keep batching level high

Spring 2003CS Experiment A (BE) B (QoS) C (QoS) A + C B

Spring 2003CS Mixing Best Effort and QoS Increase offered load from A

Spring 2003CS CPU vs Link Fix A at 50Kpps, increase its processing cost

Spring 2003CS Turn Batching Off CPU efficiency: 66.2%

Spring 2003CS Enforce Time Slice CPU efficiency: 81.6% (30us quantum)

Spring 2003CS Batching Throttle Scheduler Granularity: G –flow processes as many packets as possible w/in G Efficiency Index: E, Overhead Threshold: T –keep the overhead under T%, then 1 / (1+T) < E Batch Threshold: B i –don’t consider Flow i active until it has accumulated at least B i packets, where C sw / (B i x C i ) < T Delay Threshold: D i –consider a flow that has waited D i active

Spring 2003CS Dynamic Control Flow specifies delay requirement D Measure context switch overhead offline Record average flow runtime Set E based on workload Calculate batch-level B for flow

Spring 2003CS Packet Trace