Networked & Distributed Computing Systems Lab
Software Packet Processing
- The Click modular router
- netmap: a novel framework for fast packet I/O


Presented by Shinae Woo
EE807 Software-defined Network Computing

The Click modular router
Eddie Kohler, Robert Morris, Benjie Chen, John Jannotti, and M. Frans Kaashoek

Router's functionality
- Routing + forwarding
- Routers do much more than routing and forwarding:
  - Firewall
  - Packet filtering
  - Packet tunneling
  - Traffic prioritization
  - Network address translation

Routers in the early 2000s
- HW routers (pictured: Cisco ASR 1013, Cisco NCS 6008 Single Chassis System, Juniper E120 Broadband Services Router)
  - Specialized HW + proprietary SW
  - Monolithic, closed, static, and inflexible
  - Difficult to add or remove functionality
- SW routers
  - Commodity hardware + router software shipped with the kernel
  - Hard to extend: need to modify monolithic kernel code
- Related systems: x-kernel, Scout, Streams, Netgraph
- Q. How to design an extensible SW router platform?

Requirements
1. Flexible and configurable router design
2. Extensible router design
3. Clearly defined interfaces between router functionalities
No such solution existed in the early 2000s.

Click Modular Router
- Divide: break the router down into individual router functionalities
- Conquer: link individual functionalities to complete a router design
- Example functionalities: packet classification, ARP query, address lookup, switching packets

Click architecture
A directed graph with
- Elements: each a single router function
- Connections: each a possible packet path between two elements

Elements
- A single router function
- Input and output ports
- Configuration strings: set per-element state and tune behavior

Connections
- Possible paths for packet handoff
Building a router configuration
1. Choose a collection of elements
2. Connect them into a directed graph
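The two build steps can be sketched in a few lines of Python. This is only an illustration of the element/connection model, not Click's C++ API; the class names `Element` and `RouterGraph` and the element names (`from_device`, `classifier`, and so on) are made up for this sketch.

```python
from collections import defaultdict


class Element:
    """One router function, identified by name, with a configuration string."""

    def __init__(self, name, config=""):
        self.name = name
        self.config = config


class RouterGraph:
    """Directed graph: nodes are elements, edges are connections between ports."""

    def __init__(self):
        self.elements = {}
        # (source element name, output port) -> [(destination name, input port)]
        self.edges = defaultdict(list)

    def add(self, element):
        # Step 1: choose a collection of elements.
        self.elements[element.name] = element
        return element

    def connect(self, src, out_port, dst, in_port):
        # Step 2: connect them into a directed graph.
        if src.name not in self.elements or dst.name not in self.elements:
            raise KeyError("both elements must be added before connecting")
        self.edges[(src.name, out_port)].append((dst.name, in_port))

    def successors(self, name):
        """Names of the elements reachable in one hop from `name`."""
        return sorted(
            {dst for (src, _), targets in self.edges.items()
             if src == name for dst, _ in targets}
        )


graph = RouterGraph()
src = graph.add(Element("from_device", "eth0"))
cls = graph.add(Element("classifier", "arp, ip"))
arp = graph.add(Element("arp_responder"))
ip = graph.add(Element("ip_lookup"))
graph.connect(src, 0, cls, 0)
graph.connect(cls, 0, arp, 0)   # classifier output 0 -> ARP handling
graph.connect(cls, 1, ip, 0)    # classifier output 1 -> IP lookup
print(graph.successors("classifier"))  # ['arp_responder', 'ip_lookup']
```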

Queue
- Packet storage between two kinds of processing:
  - Processing that starts with packet arrival events
  - Processing that starts with packet transmission availability and scheduling events
- Two design options:
  1. Implicit queue inside elements
  2. Explicit queue outside elements (Click's choice)
- Queue: unit for scheduling (single task or single thread)
- Figure labels: packet storage, NULL

Push and Pull Connections (1)
- Figure: Source -> Queue -> Destination
- Push: the source receives packet p and calls push(p); the Queue enqueues p and returns
- Pull: the destination is ready to transmit and calls pull(); the Queue dequeues packet p and returns it
- Each side of the Queue is a single scheduling unit
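The push and pull handoff around a queue can be sketched as follows. This is a toy model, not Click code: the class names mirror the figure labels, while the method names `receive` and `ready_to_transmit`, the drop counter, and the capacity value are assumptions made for the example.

```python
from collections import deque


class Queue:
    """Explicit packet storage: push input on one side, pull output on the other."""

    def __init__(self, capacity=4):
        self.capacity = capacity
        self.packets = deque()
        self.drops = 0

    def push(self, packet):
        # Called by the upstream element; drop when the queue is full.
        if len(self.packets) >= self.capacity:
            self.drops += 1
            return
        self.packets.append(packet)

    def pull(self):
        # Called by the downstream element; None means "no packet available".
        return self.packets.popleft() if self.packets else None


class Source:
    """Push side: processing starts when a packet arrives."""

    def __init__(self, downstream):
        self.downstream = downstream

    def receive(self, packet):
        self.downstream.push(packet)


class Destination:
    """Pull side: processing starts when the device is ready to transmit."""

    def __init__(self, upstream):
        self.upstream = upstream
        self.sent = []

    def ready_to_transmit(self):
        packet = self.upstream.pull()
        if packet is not None:
            self.sent.append(packet)
        return packet


q = Queue(capacity=2)
source, destination = Source(q), Destination(q)
for p in ["p1", "p2", "p3"]:
    source.receive(p)          # p3 is dropped: the queue holds only 2 packets
for _ in range(3):
    destination.ready_to_transmit()
print(destination.sent, q.drops)  # ['p1', 'p2'] 1
```

The queue is the only place where a packet waits, so the push side and the pull side can each run as their own scheduling unit.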

Push and Pull Connections (2)
- Push connection ( ▶ ■ )
- Pull connection ( ▷ □ )
- Agnostic connection (double outline)
  - Becomes push or pull depending on its peer
  - Either push or pull, not mixed
- Figure label: single scheduling unit

Invalid connections between elements
1. A push output cannot connect to a pull input
2. A push output cannot have more than one connection
3. A pull input cannot have more than one connection
4. An agnostic element cannot be used in a mixed push/pull context
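Rules 1 to 3 are simple enough to check mechanically. The sketch below is a hypothetical checker, not Click's own configuration validation; the tuple layout and port labels are assumptions for illustration, and rule 4 is left out because it requires resolving agnostic ports first.

```python
from collections import Counter

PUSH, PULL = "push", "pull"


def check_connections(connections):
    """Return rule violations for a list of connections.

    Each connection is (src_port, src_kind, dst_port, dst_kind), where the
    port labels are free-form strings and the kinds are PUSH or PULL.
    """
    errors = []
    push_out_count = Counter()
    pull_in_count = Counter()
    for src_port, src_kind, dst_port, dst_kind in connections:
        if src_kind == PUSH and dst_kind == PULL:
            errors.append(f"rule 1: push output {src_port} feeds pull input {dst_port}")
        if src_kind == PUSH:
            push_out_count[src_port] += 1
        if dst_kind == PULL:
            pull_in_count[dst_port] += 1
    errors += [f"rule 2: push output {port} has {n} connections"
               for port, n in push_out_count.items() if n > 1]
    errors += [f"rule 3: pull input {port} has {n} connections"
               for port, n in pull_in_count.items() if n > 1]
    return errors


good = [
    ("FromDevice[0]", PUSH, "Queue[0] in", PUSH),
    ("Queue[0] out", PULL, "ToDevice[0]", PULL),
]
bad = [
    ("FromDevice[0]", PUSH, "ToDevice[0]", PULL),   # violates rule 1
    ("FromDevice[0]", PUSH, "Counter[0]", PUSH),    # second use: violates rule 2
]
print(check_connections(good))
for message in check_connections(bad):
    print(message)
```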

Click implementation
- Two versions
  - Linux in-kernel driver: good for production
  - User-level driver: good for profiling and debugging
- Element
  - C++ class
  - 20 virtual functions, of which an element needs to override 6 or fewer (e.g., push, pull, run_scheduled)
- Connection
  - Virtual function calls between elements
- The configuration file is passed to the driver

Simple language for declarations and connections
- Fully declarative
  - Declarations
  - Connections
  - Does not say how to process packets
- Compound elements (figure labels: Shaper, ShapedQueue)

IP Router
- 16 elements in the push path
- 1 element in the pull path
- Information carried in the packet itself, e.g., TTL
- Information carried with packets as annotations
  - One element creates the information
  - Another element uses it
  - e.g., destination IP address, Paint (marking a packet with a color)

Extensibility
We can easily add functionality to the IP router:
(1) Scheduling
(2) Dropping policy
(3) Differentiated Services
(4) Ethernet switch
Covered in this talk: (1) and (2)

Extensibility (1): Scheduling
- Stochastic Fairness Queueing (SFQ)
  - Provides isolation between competing flows
  - Distributes packets into multiple queues
- Figure label: virtual queue
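A minimal sketch of the SFQ idea: hash each flow into one of several queues, then serve the queues round-robin so a heavy flow cannot starve the others. This is not the Click configuration from the paper; the class name, the `hash_fn` parameter, and the use of CRC32 as the hash are assumptions for illustration.

```python
import zlib
from collections import deque


class StochasticFairQueue:
    """Several internal queues presented to the outside as one virtual queue."""

    def __init__(self, num_queues=4, hash_fn=zlib.crc32):
        self.queues = [deque() for _ in range(num_queues)]
        self.hash_fn = hash_fn
        self.next_queue = 0  # round-robin position

    def push(self, flow_id, packet):
        # Packets of the same flow always land in the same queue.
        index = self.hash_fn(flow_id.encode()) % len(self.queues)
        self.queues[index].append(packet)

    def pull(self):
        # Serve the first non-empty queue at or after the round-robin position.
        for offset in range(len(self.queues)):
            index = (self.next_queue + offset) % len(self.queues)
            if self.queues[index]:
                self.next_queue = (index + 1) % len(self.queues)
                return self.queues[index].popleft()
        return None


sfq = StochasticFairQueue()
for i in range(3):
    sfq.push("10.0.0.1:5000", f"heavy-{i}")
sfq.push("10.0.0.2:6000", "light-0")
# The order depends on which queues the two flows hash into.
print([sfq.pull() for _ in range(4)])
```

Because the mapping is a hash, two flows can collide in one queue; that is the "stochastic" part of the scheme.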

Extensibility (2): Dropping policy
- Weighted Random Early Detection (RED)
  - RED: drops packets under network congestion
- The RED element needs to know its 'nearest downstream queue'
  - S1. Manual configuration to supply the information
  - S2. Flow-based router context
- Figure: RED asks "What is your queue length?" and the queue answers "My queue length is 10"
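To show why RED needs the downstream queue length, here is a simplified sketch of the classic RED drop decision: an exponentially weighted average of the queue length, and a drop probability that rises linearly between two thresholds. It omits the refinement that spaces drops out over time, and the class name, parameter names, and threshold values are assumptions, not Click's RED element.

```python
import random


class RedDropper:
    """Simplified RED: drop probability grows with the average queue length."""

    def __init__(self, min_thresh, max_thresh, max_p, weight=0.2, rng=None):
        self.min_thresh = min_thresh
        self.max_thresh = max_thresh
        self.max_p = max_p
        self.weight = weight
        self.avg = 0.0
        self.rng = rng or random.Random(0)

    def drop_probability(self):
        if self.avg < self.min_thresh:
            return 0.0
        if self.avg >= self.max_thresh:
            return 1.0
        span = self.max_thresh - self.min_thresh
        return self.max_p * (self.avg - self.min_thresh) / span

    def should_drop(self, queue_length):
        # queue_length is what the nearest downstream queue reports.
        self.avg = (1 - self.weight) * self.avg + self.weight * queue_length
        return self.rng.random() < self.drop_probability()


# weight=1.0 makes the average equal to the latest sample, for readability.
red = RedDropper(min_thresh=5, max_thresh=15, max_p=0.1, weight=1.0)
red.should_drop(10)            # "My queue length is 10"
print(red.drop_probability())  # 0.05
```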

Flow-based router context
- Answers questions such as: if I were to emit a packet on my second output,
  - Where might it go?
  - Which queues might it encounter?
- Uses a simple data-flow algorithm on the configuration graph
- RED: where is my nearest downstream queue?
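One way to picture the data-flow search is a breadth-first walk of the configuration graph that stops at the first queue on each path. The sketch below works per element rather than per output port, and the graph, the element names, and the `is_queue` predicate are all hypothetical.

```python
from collections import deque


def nearest_downstream_queues(graph, start, is_queue):
    """Return the first queue reached on each downstream path from `start`.

    graph maps an element name to the list of elements its outputs feed.
    """
    found = []
    seen = {start}
    frontier = deque(graph.get(start, []))
    while frontier:
        element = frontier.popleft()
        if element in seen:
            continue
        seen.add(element)
        if is_queue(element):
            found.append(element)  # stop here: do not search past a queue
            continue
        frontier.extend(graph.get(element, []))
    return found


config = {
    "red": ["classifier"],
    "classifier": ["queue_hi", "counter"],
    "counter": ["queue_lo"],
    "queue_hi": ["sched"],
    "queue_lo": ["sched"],
    "sched": ["to_device"],
}
print(nearest_downstream_queues(config, "red", lambda n: n.startswith("queue")))
# ['queue_hi', 'queue_lo']
```

With the result in hand, a RED element could query those queues for their lengths without any manual configuration.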

Evaluation environment
- Topology: Source -> Click IP router -> Destination
- Machines (figure labels): 700 MHz Intel Pentium III, 733 MHz Intel Pentium III, 200 MHz Intel Pentium Pro
- Traffic: UDP packets, 147,900 64B PPS
- 100 Mbit/s links make a bottleneck

Simple forwarding rate
- Click keeps the baseline performance of Linux
  - Peak: 360,000 PPS
  - Loss-free: 333,000 PPS
- Linux's routing table algorithm is slower than Click's
- Receive livelock
- Click's polling driver avoids interrupts and does efficient I/O

Forwarding rate with richer functionality
- Comparison with a real network
  - Does not involve fragmentation, IP options, or ICMP errors
  - Smaller number of ARP entries
- Click's performance drops only gradually

Overhead of modularity
- Overhead of passing packets between elements
  - Single virtual function call: 70 ns
  - 16 elements in the IP router: 70 ns * 16 = 1.12 us, about 1 us / packet
- Combine multiple elements into one
  - 16 elements -> 8 elements
  - The number of virtual function calls decreases
- Push path latency decreases
  - 1.57 us -> 1.03 us
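The slide's numbers can be cross-checked with a little arithmetic. The sketch assumes one virtual function call per element on the path, which is the slide's own simplification; the function and variable names are made up for this example.

```python
CALL_NS = 70  # cost of one virtual function call, from the slide


def handoff_overhead_ns(num_elements, call_ns=CALL_NS):
    """Packet handoff cost if each element on the path costs one virtual call."""
    return num_elements * call_ns


full_path = handoff_overhead_ns(16)        # 1120 ns, about 1 us per packet
combined_path = handoff_overhead_ns(8)     # 560 ns after merging elements
predicted_saving = full_path - combined_path
measured_saving = 1570 - 1030              # 1.57 us -> 1.03 us, in ns

print(full_path, predicted_saving, measured_saving)  # 1120 560 540
```

The predicted saving of 560 ns is close to the measured 540 ns, which is consistent with the claim that the removed virtual calls account for most of the latency reduction.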

Overhead of modularity
- Unnecessarily general element code: elements implement more generality than necessary
- Classifier
  - Required for the IP router: distinguishing ARP and IP packets
  - Implemented: a small data structure for finding which packet data to examine
  - A specialized classifier uses 24% fewer CPU cycles than the general classifier
  - But the total cost drops by only 4%

Conclusion
Click Modular Router
- An open, extensible, and configurable router framework
- Builds a complex IP router by connecting small, modular elements
- Modularity does not harm the performance of the base Linux system
- Extended functionality is easy to add

netmap: a novel framework for fast packet I/O
Luigi Rizzo

netmap: a novel framework for fast packet I/O
- Possible packet rate of a 10G link, in Mpps (64B packets)
- Packet processing overhead in a commodity OS
  - Per-packet dynamic memory allocation
  - System call overhead
  - Memory copies
- netmap: cycles/packet (67 ns)
- Figure: sendto() execution path and time in FreeBSD
  - The system call path costs about 1 us / packet, i.e., 1.05 Mpps
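The 67 ns figure matches the per-packet time budget of a saturated 10G link. The sketch below derives it from standard Ethernet framing: a 64-byte frame is accompanied on the wire by an 8-byte preamble and start delimiter plus a 12-byte inter-frame gap. The function and variable names are made up for this example.

```python
def line_rate_pps(link_bps, frame_bytes=64, overhead_bytes=20):
    """Maximum packets per second on an Ethernet link.

    overhead_bytes covers the preamble and start-of-frame delimiter (8 bytes)
    and the inter-frame gap (12 bytes) that accompany every frame on the wire.
    """
    bits_per_packet = (frame_bytes + overhead_bytes) * 8
    return link_bps / bits_per_packet


pps_10g = line_rate_pps(10e9)
ns_per_packet = 1e9 / pps_10g

print(f"{pps_10g / 1e6:.2f} Mpps, {ns_per_packet:.1f} ns per packet")
# 14.88 Mpps, 67.2 ns per packet
```

At roughly 1 us per packet, a per-packet system call path is more than ten times too slow to keep up with this budget.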

netmap architecture
netmap's approaches
- Per-packet dynamic memory allocation -> preallocated resources
- System call overhead -> large batching
- Memory copies -> buffers shared between kernel and userspace
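The first two ideas can be modeled with a toy ring of preallocated buffers that is flushed by one call per batch. This is not the netmap API; the class `PacketRing`, its `put` and `sync` methods, and the slot sizes are invented for illustration, and `sync` merely stands in for a single system call.

```python
class PacketRing:
    """Fixed set of preallocated buffers reused for every packet (toy model)."""

    def __init__(self, num_slots=8, slot_size=64):
        # All buffers are allocated once, up front; none per packet.
        self.slots = [bytearray(slot_size) for _ in range(num_slots)]
        self.lengths = [0] * num_slots
        self.head = 0          # next slot the application fills
        self.tail = 0          # next slot the "kernel" side transmits
        self.pending = 0
        self.sync_calls = 0
        self.transmitted = []

    def put(self, payload):
        """Write a packet into the next free slot; False if full or too large."""
        if self.pending == len(self.slots) or len(payload) > len(self.slots[0]):
            return False
        self.slots[self.head][:len(payload)] = payload
        self.lengths[self.head] = len(payload)
        self.head = (self.head + 1) % len(self.slots)
        self.pending += 1
        return True

    def sync(self):
        """Stand-in for one system call that hands over all pending slots."""
        self.sync_calls += 1
        sent = 0
        while self.pending:
            length = self.lengths[self.tail]
            # Copying here only models the NIC reading the buffer.
            self.transmitted.append(bytes(self.slots[self.tail][:length]))
            self.tail = (self.tail + 1) % len(self.slots)
            self.pending -= 1
            sent += 1
        return sent


ring = PacketRing(num_slots=4)
for i in range(4):
    ring.put(f"pkt{i}".encode())
accepted = ring.put(b"overflow")   # False: the ring is full until sync()
sent = ring.sync()                 # one call covers the whole batch
print(accepted, sent, ring.sync_calls)  # False 4 1
```

With a batch of 4, the fixed cost of the call is shared by 4 packets; larger batches amortize it further.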

Evaluation
- Test equipment: i MHz, Intel G NIC
- 64B packet Tx
- netmap: cycles/packet
- Line rate
- 8x

Discussion

Click: implicit dependencies from annotations
- Annotations create implicit dependencies between elements
  - There are 16 annotation types in the paper (plus custom types now)
  - How can those dependencies be resolved?
  - They are not considered at the configuration step
- Figure: Paint sets the paint annotation; CheckPaint uses the paint annotation

Click: extending to support L4/stateful functionality
- How to support TCP functionality
  - Can be built as a single element
  - Cannot support modularity at L4
- How to support IP defragmentation
  - Must hold packets until the other fragments arrive
  - Requires Click elements to store packets

Click: tradeoff between modularity and efficiency
- There is a fundamental tradeoff between modularity and efficiency
- How can we get both benefits?
  - Give the configuration in a modular way
  - Optimize the binary into integrated functionality
- Figure label: IPPush

netmap: comparison with other approaches
- There is much similar work in this area: PSIO, PF_RING, DPDK
- Similar approaches to the same challenges
- Benefits of netmap compared with other approaches
  - Integrated with FreeBSD
  - Does not depend on specific hardware