Named Data Networking on a Router: Fast and DoS-Resistant Forwarding with Hash Tables
Authors: Won So, Ashok Narayanan, David Oran (Cisco Systems)
Publisher: ANCS 2013
Presenter: 楊皓中
Date: 2013/11/20

Introduction
Named data networking (NDN) is a networking paradigm that tries to address issues with the current Internet by using named data instead of named hosts as its communication model.
◦ First, NDN uses variable-length, hierarchical, tokenized names rather than fixed-length addresses (e.g. /com/cisco/ndn).
◦ Second, NDN forwarding uses 3 different data structures: the Pending Interest Table (PIT), the Content Store (CS), and the Forwarding Information Base (FIB).
◦ Lastly, unlike IP, some NDN data-plane data structures require per-packet state updates.

Introduction
Our main contributions are:
1) name lookup via hash tables with fast, collision-resistant hash computation
2) an efficient FIB lookup algorithm that provides good average and bounded worst-case FIB lookup time, enabling high-speed forwarding while achieving DoS (Denial-of-Service) resistance
3) PIT partitioning that enables linear multicore speedup
4) an optimized data structure and software prefetching to maximize data-cache utilization.

Interest forwarding involves:
(1) decoding the packet
(2) parsing the name components
(3) generating the full-name hash (Hn) from the Interest name and matching it against the CS, PIT, then FIB HTs
(4) performing successive lookups with shorter name-prefix hashes (Hn−1, ..., H1) against the FIB, which stores all prefixes in one large HT; a minimal sketch of this sequence follows.
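The sketch below is a minimal illustration of this lookup sequence, not the paper's code: std::unordered_set keyed by 64-bit hashes stands in for the real hash tables, and H[i-1] denotes the hash of the i-component prefix.

```cpp
#include <cstdint>
#include <unordered_set>
#include <vector>

using HashTable = std::unordered_set<uint64_t>;  // stand-in for the real HT

// Returns the matched prefix length, -1 for a CS/PIT hit, 0 for no route.
int forward_interest(const HashTable& cs, const HashTable& pit,
                     const HashTable& fib, const std::vector<uint64_t>& H) {
    uint64_t Hn = H.back();                    // full-name hash
    if (cs.count(Hn) || pit.count(Hn))         // step (3): CS, then PIT
        return -1;                             // served from CS / aggregated in PIT
    for (int i = static_cast<int>(H.size()); i >= 1; --i)
        if (fib.count(H[i - 1]))               // steps (3)-(4): FIB, longest first
            return i;
    return 0;                                  // no matching FIB prefix
}
```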

Hash function
Choosing a hash function involves balancing:
◦ the computational load of the hash function
◦ low hash-collision probability
◦ short average HT chain length
SipHash [2] offers a good balance, as it provides collision resistance and performance comparable to non-cryptographic hashes. To generate the whole set of hashes (Hn, ..., H1) efficiently, we implemented our own SipHash variant that computes all name-prefix hashes in one pass while simultaneously parsing the name components; the sketch below illustrates the one-pass idea.
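A minimal sketch of the one-pass idea, with 64-bit FNV-1a as a stand-in hash (the actual implementation snapshots SipHash state at component boundaries; FNV-1a lacks the collision resistance that motivates SipHash):

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Returns H_1..H_n for a name like "/com/cisco/ndn": one hash per name
// prefix, produced in a single pass over the bytes. At every component
// boundary the running state is emitted as the hash of that prefix.
std::vector<uint64_t> prefix_hashes(const std::string& name) {
    std::vector<uint64_t> H;
    uint64_t h = 1469598103934665603ULL;        // FNV-1a offset basis
    for (size_t i = 1; i < name.size(); ++i) {  // skip the leading '/'
        if (name[i] == '/')                     // component boundary: emit H_k
            H.push_back(h);
        h = (h ^ static_cast<uint8_t>(name[i])) * 1099511628211ULL;
    }
    H.push_back(h);                             // full-name hash H_n
    return H;
}
```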

Hash table design
First, minimize string comparisons, which hurt lookup performance:
◦ the hash value generated from a name is stored in every hash table entry
◦ while searching for a matching entry during an HT chain walk, a string comparison is performed only when an entry with exactly the same hash value as the input lookup key is found.
Second, design a data-cache (D$) friendly data structure that causes fewer cache misses during HT lookup:
◦ instead of linked-list buckets, use a compact array, where each hash bucket holds 7 pre-allocated compact entries (a 32-bit hash and a 32-bit index to the full entry) in 1 cache line (64 bytes)
◦ the key difference is that a D$ miss occurs at every step of a linked-list chain walk, but only once per 7 entries of a compact-array chain walk, provided there is no collision among the 32-bit hashes. A plausible bucket layout is sketched below.
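A plausible layout for such a bucket; the field names and the overflow link are assumptions, not the paper's exact structure:

```cpp
#include <cstdint>

struct CompactEntry {
    uint32_t hash;    // 32-bit hash of the stored name
    uint32_t index;   // 32-bit index of the full entry (name, payload)
};

struct alignas(64) HashBucket {   // exactly one 64-byte cache line
    CompactEntry slots[7];        // 7 pre-allocated compact entries (56 B)
    uint64_t overflow;            // assumed: link to a chained bucket (8 B)
};

static_assert(sizeof(HashBucket) == 64, "bucket must fit one cache line");
```

During a chain walk, only a matching 32-bit hash triggers a fetch of the full entry and the final string comparison.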

Hash table design

Software prefetching also helps reduce the impact of D$ misses by hiding their latencies. Because NDN forwarding requires sequential lookups (Interest: CS → PIT → FIB × n; Data: PIT → CS), prefetching can be scheduled in 2 ways:
1) cascade – prefetch the PIT bucket at CS lookup, prefetch the nth FIB bucket at PIT lookup, and so on
2) all-at-once – issue all prefetch instructions at the beginning, just after computing all hashes (Hn, ..., H1) but before any lookups.
In our evaluation, all-at-once prefetching exhibits better overall performance because the memory latency is much longer than one HT lookup cycle, so we employ all-at-once prefetching in our implementation (sketched below).
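A sketch of all-at-once prefetching, reusing the HashBucket layout sketched above. Indexing buckets by hash modulo table size is an illustrative choice, and __builtin_prefetch is the GCC/Clang intrinsic:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Issue every prefetch the packet will need before doing any lookup, so
// the memory latencies of all bucket fetches overlap with each other.
void prefetch_all_at_once(const std::vector<uint64_t>& H,
                          const HashBucket* cs_pit, size_t n_cs_pit,
                          const HashBucket* fib, size_t n_fib) {
    // Full-name hash H_n -> the (combined) CS/PIT bucket.
    __builtin_prefetch(&cs_pit[H.back() % n_cs_pit]);
    // Every prefix hash H_1..H_n -> its FIB bucket.
    for (uint64_t h : H)
        __builtin_prefetch(&fib[h % n_fib]);
    // ...the actual lookups run after this point.
}
```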

FIB lookup algorithm
In a typical Interest forwarding scenario, the FIB HT lookup can become the bottleneck because the lookup is iterated until it finds the longest prefix match (LPM). The basic HT-based lookup algorithm follows a longest-first strategy, starting from the longest prefix:
◦ it results in a large average lookup time
◦ it makes a router vulnerable to a DoS attack on FIB processing, wherein an adversary injects Interest packets with long, non-matching names containing garbage tokens.
To address this, we propose a 2-stage LPM algorithm using virtual FIB entries. With this scheme, lookup first starts from a certain short (M-component) name prefix and either continues to shorter prefixes or restarts from a longer FIB prefix (still shorter than the longest) if needed. Every M-component FIB entry maintains the maximum depth (MD) of the FIB prefixes that start with the same M components; the control flow is sketched below.
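A hedged sketch of the 2-stage control flow. The FIB is modeled as a plain hash map from prefix hash to an entry carrying a virtual flag and MD; this is an assumed interface, not the paper's code:

```cpp
#include <algorithm>
#include <cstdint>
#include <unordered_map>
#include <vector>

struct FibEntry { bool is_virtual; int max_depth; };  // MD kept per M-prefix
using Fib = std::unordered_map<uint64_t, FibEntry>;

// H[i-1] = hash of the i-component prefix; n = #components.
// Returns the matched prefix length, or 0 for no match.
int two_stage_lpm(const Fib& fib, const std::vector<uint64_t>& H, int n, int M) {
    auto probe = [&](int i) { return fib.find(H[i - 1]); };
    if (n > M) {
        auto it = probe(M);                        // stage 1: the M-component prefix
        if (it != fib.end()) {
            // Stage 2: restart from min(MD, n), walk down toward M+1.
            for (int i = std::min(it->second.max_depth, n); i > M; --i)
                if (probe(i) != fib.end()) return i;
            if (!it->second.is_virtual) return M;  // the M-entry itself is real
        }
        n = M - 1;                                 // otherwise try shorter prefixes
    }
    for (int i = n; i >= 1; --i)                   // short names / fallback walk
        if (probe(i) != fib.end()) return i;
    return 0;
}
```

The worst case is thus bounded by the FIB's own structure, roughly M + (MD − M) probes, rather than by the attacker-controlled name length.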

FIB lookup algorithm
Example: number of HT lookups per name, original longest-first vs. new 2-stage algorithm:

Full name        lookups (org)   lookups (new)
/a/b/c/d/e/f           3               2
/a/c/d/e/f/g           6               3
/a/b/d/e/f/g           3               1
/a/b/c/d               1               2

FIB lookup algorithm

One disadvantage of our method is FIB expansion caused by the additional virtual FIB entries; we expect the impact of this expansion to be insignificant given a good choice of M. Consider 2 simple heuristics that work from the FIB (or RIB) name-component (prefix) distribution (both sketched below):
(1) most-likely – choose the M with the largest number of prefixes (the mode of the component-count distribution)
(2) cumulative-threshold – choose the M at which the cumulative component distribution exceeds a certain threshold, such as 75%.
We used the most-likely heuristic for the experiments shown in Section 4.
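Both heuristics reduce to simple statistics over the FIB's component-count histogram; the implementation below is an assumed, straightforward rendering:

```cpp
#include <map>
#include <vector>

// most-likely: the component count with the most FIB prefixes (the mode).
int choose_m_most_likely(const std::vector<int>& comp_counts) {
    std::map<int, int> hist;                      // #components -> frequency
    for (int c : comp_counts) ++hist[c];
    int best_m = 0, best_freq = -1;
    for (auto& [m, freq] : hist)
        if (freq > best_freq) { best_freq = freq; best_m = m; }
    return best_m;
}

// cumulative-threshold: the smallest M whose cumulative share of prefixes
// reaches the threshold (e.g. 0.75 for 75%).
int choose_m_cumulative(const std::vector<int>& comp_counts, double threshold) {
    std::map<int, int> hist;
    for (int c : comp_counts) ++hist[c];
    long acc = 0;
    for (auto& [m, freq] : hist)
        if ((acc += freq) >= threshold * comp_counts.size()) return m;
    return hist.empty() ? 0 : hist.rbegin()->first;
}
```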

CS and PIT lookup
On top of the basic HT lookup used for the FIB, we devised 2 more techniques to optimize access to the PIT and CS. First, PIT and CS lookups are combined into one HT lookup, because they are adjacent in the pipeline and use the same full-name key; combining CS and PIT into a single hash table reduces 2 separate HT lookups to one. For this to work, a flag bit distinguishes whether an entry belongs to the CS or the PIT, as in the sketch below.
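A sketch of what such a combined entry might look like; the field layout is an assumption:

```cpp
#include <cstdint>

// One table, two logical roles: a single bit says whether the entry is a
// cached Data packet (CS) or a pending Interest (PIT).
struct CsPitEntry {
    uint64_t name_hash;      // full-name hash, the shared lookup key
    uint32_t is_cs : 1;      // 1 = Content Store entry, 0 = PIT entry
    uint32_t reserved : 31;  // illustrative padding/bookkeeping
    // ... name pointer, arrival faces or cached-packet handle, etc.
};
```

A single probe with Hn then answers both questions at once: is the Data cached (CS), or is the Interest already pending (PIT)?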

CS and PIT lookup
The second technique is to partition the PIT (i.e. the combined CS/PIT) across cores (or threads). Sharing PIT entries among parallel packet-processing threads significantly impedes multicore parallelization speedup; partitioning lets each core access its own PIT exclusively, without any locking or data aliasing among cores. This is achieved by distributing packets to cores based on names: in our implementation, we use the full-name hash to distribute packets to the forwarding threads (see the sketch below). This is efficient because the full-name hash must be computed anyway for the CS/PIT HT lookup.
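A sketch of the name-based dispatch; the thread count and the modulo mapping are illustrative choices:

```cpp
#include <cstdint>

constexpr unsigned kNumForwardingThreads = 8;  // A0-A7 in the system overview

// The Dispatcher computes the full-name hash once and uses it to pick the
// forwarding thread; the chosen thread later reuses the same hash for its
// private CS/PIT lookup, so no two threads touch the same PIT partition.
unsigned pick_thread(uint64_t full_name_hash) {
    return full_name_hash % kNumForwardingThreads;
}
```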

Content Store design – there are 2 levels of caching that can be done in an NDN router. Short-term caching will be implemented as a packet-buffer mechanism using the same memory technology as the forwarding data structures. To support caching, the packet buffer has to support at least 3 additional primitives (an interface sketch follows):
◦ (1) tx-and-store – transmit and cache a packet (extend the life of a packet)
◦ (2) delete – evict a packet (end the life of a packet)
◦ (3) tx-from-buffer – transmit a packet directly from the packet buffer.
If caching were implemented as a forwarding-engine-only function without these mechanisms, expensive memory copies would be needed to move large Data packets in and out. Long-term caching of Data packets mainly serves popular content efficiently. The complete design and implementation of a Content Store with hierarchical storage will appear in our future work.
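An interface sketch of these 3 primitives; the names and signatures are hypothetical, since the slide only specifies the semantics:

```cpp
#include <cstdint>

struct PacketBuffer {
    // (1) tx-and-store: transmit and keep the packet cached (extend its life).
    virtual void tx_and_store(uint64_t pkt_id, int out_face) = 0;
    // (2) delete: evict the packet from the buffer (end its life).
    //     Named evict() here only because delete is a C++ keyword.
    virtual void evict(uint64_t pkt_id) = 0;
    // (3) tx-from-buffer: transmit directly from the buffer, no copy-out.
    virtual void tx_from_buffer(uint64_t pkt_id, int out_face) = 0;
    virtual ~PacketBuffer() = default;
};
```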

Target platform
Cisco ASR 9000 router with the Integrated Service Module (ISM). The ISM is currently used for video streaming and NAT applications; it features 2 Intel 2.0 GHz hexa-core Westmere-EP (Xeon) processors with Hyper-Threading (24 logical cores) and 4 Intel Niantic 10GE interfaces, and can include up to 3.2 TB of SSD with 2 modular flash SAMs.

Packet format
We are currently working on defining a flexible, forwarding-friendly binary NDN packet format based on a fixed header and TLV (Type-Length-Value) fields. For the experiments, we use a simple format: a 1-byte header carrying the packet type, followed by a simple name encoding that includes the length of the full name and the encoded name components, each with its component length and characters. A hypothetical parser for this layout is sketched below.
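A hypothetical parser for this experimental format; the field widths (a 2-byte full-name length and 1-byte component lengths) are assumptions, as the slide does not specify them:

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

struct ParsedPacket {
    uint8_t type;                          // 1-byte header: packet type
    std::vector<std::string> components;   // decoded name components
};

bool parse_packet(const uint8_t* p, size_t len, ParsedPacket& out) {
    if (len < 3) return false;
    out.type = p[0];
    size_t name_len = (static_cast<size_t>(p[1]) << 8) | p[2];  // assumed 2-byte length
    size_t off = 3, end = 3 + name_len;
    if (end > len) return false;
    while (off < end) {
        uint8_t clen = p[off++];           // assumed 1-byte component length
        if (off + clen > end) return false;
        out.components.emplace_back(reinterpret_cast<const char*>(p + off), clen);
        off += clen;
    }
    return true;
}
```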

System overview
◦ 10GE transport line cards (LC1-2) and Intel 10GE Niantic Network Interface Cards (NICs)
◦ Dispatcher processes (D0-D3): identify the offset of the name in the packet, compute the hash of the full name, and pick which App process (A0-A7) will handle the packet
◦ App processes (A0-A7): perform all the NDN forwarding operations – CS/PIT/FIB lookups – and return the packet to the Dispatcher, which transmits it to the outgoing interface.

Workload generation
We extracted 13 million HTTP URLs from 16 IRCache traces and converted them into NDN names using the following method (sketched below):
◦ (1) each sub-domain name is converted into a single NDN name component, in reverse order
◦ (2) the entire query string after the ‘?’ character is always treated as one component
◦ (3) every ‘/’ character in the rest of the URL is treated as a name-component delimiter.
(e.g. cisco.com/ndn is converted to /com/cisco/ndn)
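A sketch of these conversion rules; the implementation details are a straightforward, assumed rendering:

```cpp
#include <sstream>
#include <string>
#include <vector>

std::string url_to_ndn(const std::string& url) {   // e.g. "cisco.com/ndn"
    size_t slash = url.find('/');
    std::string host = url.substr(0, slash);
    std::string rest = slash == std::string::npos ? "" : url.substr(slash + 1);

    // (1) reverse the sub-domain order: "cisco.com" -> /com/cisco
    std::vector<std::string> labels;
    std::stringstream hs(host);
    for (std::string l; std::getline(hs, l, '.');) labels.push_back(l);
    std::string name;
    for (auto it = labels.rbegin(); it != labels.rend(); ++it) name += "/" + *it;

    // (2) everything after '?' stays a single component
    size_t q = rest.find('?');
    std::string path = rest.substr(0, q);
    std::string query = q == std::string::npos ? "" : rest.substr(q + 1);

    // (3) '/' delimits the remaining path components
    std::stringstream ps(path);
    for (std::string c; std::getline(ps, c, '/');)
        if (!c.empty()) name += "/" + c;
    if (!query.empty()) name += "/" + query;
    return name;   // url_to_ndn("cisco.com/ndn") == "/com/cisco/ndn"
}
```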

Workload generation
Our hypothesis is that an NDN routing table has a long-tailed component distribution similar to that of NDN object names, but with a smaller average and a more skewed shape. URLblacklist [21] provides a collection of domains and short URLs for access blocking, and its distribution matches our hypothesis: long-tailed, with an average of 4.26 components. Another possible candidate for the FIB distribution was the domain-name distribution, but it was rejected because it is heavily skewed: most (73%) names have 3 name components.

Scalability of HT-based lookup
We evaluated Interest forwarding performance by varying the target FIB size from 256K to 64M entries, using the baseline forwarding code with the PIT HT size fixed at 256K. We measured cycles/packet for Interest forwarding while increasing the HT load factor (LF = FIB size / HT size) from 1 to 16:
◦ at LF = 4, the average HT chain length is 4.07 and the compact arrays hold about 89% of the entries
◦ at LF = 8, the average chain length is 8 and the compact arrays hold only 31% of the entries, meaning most HT chain walks incur multiple D$ misses.

Scalability of HT-based lookup
The HT itself constitutes only about 25% of the total memory consumption.

Impact of FIB lookup algorithm
To examine the effectiveness of our 2-stage FIB lookup algorithm, we ran the forwarding code on the ISM with about 7 million Interest packets generated from all unique-name traces and the same synthetic FIB based on URLBlacklist. The most-likely method chooses M = 4, while the cumulative-threshold method chooses 4, 5 or 6 for thresholds of 50%, 75% and 90% respectively. Since the cumulative-threshold method presents difficulties in choosing a threshold, we instead use the most-likely heuristic for the following performance evaluation (i.e. M = 4, which gives the third-best performance).

Single-core performance
With a 16M-entry FIB, a 1M-entry PIT, an HT load factor of 4, and M = 4, a single core achieves an average forwarding throughput of 1.90 MPPS: 1.32 MPPS for Interest and 3.39 MPPS for Data forwarding respectively.
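Assuming a balanced 1:1 Interest/Data mix (each Interest retrieves one Data packet), the aggregate figure is the harmonic mean of the two per-type rates:

```latex
\[
  \text{avg} = \frac{2}{\frac{1}{1.32} + \frac{1}{3.39}} \approx 1.90\ \text{MPPS}
\]
```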

Multi-core and system performance