Efficient Packet Classification with Digest Caches
Francis Chang, Wu-chang Feng, Wu-chi Feng, Kang Li


Packet classification
- Essential in all network devices: routers, firewalls, NATs, Diffserv/QoS markers, etc.
- But complexity is increasing:
  - Number of rules
  - Number of fields to classify
  - Size of header (IPv6)
  - Number of flows

Packet classification
- Performance-bound by memory: must store and access large headers and many rules quickly
- Lookup algorithms perform better when given more memory
  - Classic space-time trade-off in performance
- Supporting line speeds requires a large amount of fast memory
  - Fast memory is expensive
  - Large memory is slow

Probabilistic networking
- Goal of this work: throw a wrench into the space-time trade-off
- Examine a third axis: accuracy
- Reduce memory requirements by relaxing the accuracy of the packet classification function
- What are the quantifiable benefits of sacrificing accuracy in the packet classification function?

What? Willingly make mistakes?
- Sure. Packet errors and lack of reliability are a fact of life, masked by the application layer or simply ignored.
- Lots of packets are bad, and some are undetectably bad [Stone00]
  - TCP: roughly 1 in 1,100 TCP packets fail the checksum
  - 1 in 16 million to 1 in 10 billion TCP packets are UNDETECTABLY bad
  - UDP: UDP packets are not required to have a valid checksum
  - Even if the checksum is bad, the OS will give the packet to the application (Linux)
- Routing problems occur frequently
  - Transient loops [Hengartner02]
  - Outages

Several places to apply the idea
- Full classification
  - Exact multi-dimensional solutions are still too costly [Baboescu03]
  - Inaccuracy may help (work in progress)
- Classification caches
  - Space requirements grow linearly with the number of flows and fields
  - Use lossy recall in remembering previous classification decisions to reduce cache size (our current work)

Initial approach: Bloom filter
- An approximate data structure for storing flows matching a binary predicate
  - Also used in spell checkers, browser and web caches
- How it works
  - An L x N array of memory, addressed by L independent hash functions, each addressing N buckets
  - Storing a new flow: set the bits corresponding to the results of the L hash functions on the header
  - Looking up a flow: check the bits corresponding to the results of the L hash functions on the header
- Collisions in the filter cause inaccurate classifications

Francis Chang, Wu-chang Feng, Kang Li, "Approximate Caches for Packet Classification", in Proceedings of INFOCOM '04, March 2004.
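The insert/lookup steps above can be sketched in Python. This is a minimal model of the data structure, not the paper's implementation; deriving the L indexes by salting SHA-256 is an illustrative assumption:

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: L hash functions, each addressing its own N-bit row."""
    def __init__(self, num_hashes=4, bins_per_hash=1024):
        self.L = num_hashes
        self.N = bins_per_hash
        # L x N bit array, modeled here as L rows of N booleans
        self.bits = [[False] * bins_per_hash for _ in range(num_hashes)]

    def _indexes(self, header: bytes):
        # Derive L "independent" indexes by salting one hash (assumption)
        for i in range(self.L):
            h = hashlib.sha256(bytes([i]) + header).digest()
            yield int.from_bytes(h[:4], "big") % self.N

    def insert(self, header: bytes):
        # Storing a new flow: set one bit per hash function
        for i, idx in enumerate(self._indexes(header)):
            self.bits[i][idx] = True

    def lookup(self, header: bytes) -> bool:
        # True may be a false positive (collision); False is always correct
        return all(self.bits[i][idx] for i, idx in enumerate(self._indexes(header)))
```

A collision happens when all L bits of an uninserted flow were already set by other flows, which is exactly the inaccuracy the slide refers to.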

[Figure: Bloom filter operation. Hash functions h0 … h(L-1) each map a flow header to one of N bins; inserting a flow sets one bit per row, so a flow occupies L virtual bins out of L*N actual bins.]

The value of making mistakes
- Initial results promising: small, high-performance caches with a 1-in-a-billion error rate
- Storage capacity invariant to header size and fields
  - Size of an approximate cache is determined by the number of flows to store and the desired accuracy
  - Size of an exact cache is determined by the number of flows to store and the header size and fields
    - IPv4-based connection identifier = 13 bytes
    - IPv6-based connection identifier = 37 bytes
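The invariance claim can be checked with simple arithmetic. The per-entry sizes come from the slide; the 4,096-flow cache and the 4-byte (32-bit) digest entry are illustrative assumptions:

```python
# Per-entry cache storage in bytes, ignoring the cached classification result
FLOWS = 4096                       # illustrative flow count (assumption)
ipv4_entry, ipv6_entry = 13, 37    # exact connection identifiers (bytes)
digest_entry = 4                   # 32-bit digest: fixed by accuracy, not header size

exact_v4 = FLOWS * ipv4_entry      # exact IPv4 cache
exact_v6 = FLOWS * ipv6_entry      # exact IPv6 cache grows with the header
digest = FLOWS * digest_entry      # digest cache is the same for IPv4 and IPv6
print(exact_v4, exact_v6, digest)  # 53248 151552 16384
```

The digest cache's footprint is identical whether the flows are IPv4 or IPv6, which is the point of the slide.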

But… some glaring disadvantages
- Large number of levels and memory lookups required
- Not amenable to most NP architectures: requires hardware support and parallel, bit-level memory addressing
- Aging properties
  - Cannot gracefully age the cache; no selective replacement policies possible (e.g., LRU)
  - Must periodically expunge the entire cache, resulting in large variance in the full classifications required

New approach: digest caches
- Use a traditional cache architecture
- Store and use a digest of the classification fields instead of the full header(s)

Digest caches: how it works
- Upon full classification of packet header fields (P)
  - Calculate h1(P) and h2(P)
  - Use h1(P) to select a cache line
  - Insert h2(P) and the classification result into the cache line
- Subsequent packets
  - Calculate h1(P) and h2(P)
  - Use h1(P) to select a cache line
  - Look up h2(P) in the cache line
  - If it matches, follow the cached result; if not, perform a full classification
- Misclassification is caused by hash-signature collisions
  - Increases as the number of bits in the digest (c) decreases
  - Increases as the associativity of the cache (d) increases
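A minimal sketch of a d-way set-associative digest cache following the steps above. The hash construction and the LRU bookkeeping are assumptions for illustration, not the paper's exact design:

```python
import hashlib

class DigestCache:
    """d-way set-associative cache storing h2(P) digests instead of headers."""
    def __init__(self, num_sets=256, ways=4):
        self.num_sets = num_sets
        self.ways = ways
        # Each set holds up to `ways` (digest, result) pairs, LRU-ordered
        self.sets = [[] for _ in range(num_sets)]

    def _hashes(self, header: bytes):
        h = hashlib.sha256(header).digest()
        h1 = int.from_bytes(h[:4], "big") % self.num_sets  # selects cache line
        h2 = int.from_bytes(h[4:8], "big")                 # 32-bit digest
        return h1, h2

    def insert(self, header: bytes, result):
        # Called after a full classification of P
        h1, h2 = self._hashes(header)
        line = self.sets[h1]
        if len(line) >= self.ways:
            line.pop(0)                    # evict LRU entry: graceful aging
        line.append((h2, result))

    def lookup(self, header: bytes):
        h1, h2 = self._hashes(header)
        line = self.sets[h1]
        for i, (digest, result) in enumerate(line):
            if digest == h2:
                line.append(line.pop(i))   # promote to MRU
                return result              # may be wrong on a digest collision
        return None                        # miss: perform full classification
```

One lookup touches a single cache line, and eviction is per-entry, which is what enables the selective replacement the Bloom filter cache could not do.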

Digest caches fix all of the problems of Bloom filter caches
- Fewer memory accesses
- NP-friendly
  - Does not require parallel, bit-addressable memory access
  - Can alleviate the need for associative hardware (more later)
- Ages gracefully: can smoothly remove old entries

Storage comparison between approaches

Evaluation
- Trace-driven simulation using the PCCS simulator
- Packet traces
  - Bell Labs trace: a one-hour trace taken at Bell Labs Research in May 2002
  - OGI trace: a one-hour trace of OGI's OC-3c link on July 26, 2002

Choosing associativity
- Experiment: fix the misclassification probability at 10^-9 and vary the digest width with the associativity
- Results similar to previous studies: a small amount of associativity is ideal for performance
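The relationship between digest width and associativity at a fixed error rate can be estimated with a first-order collision model. The formula p ≈ d * 2^-c (d-way line, c-bit uniform digests, approximating 1 - (1 - 2^-c)^d) is an assumption of this sketch, not taken from the slides:

```python
import math

def digest_bits_needed(ways: int, target_p: float) -> int:
    # Collision probability in a d-way line: p ≈ d * 2^-c, so c ≈ log2(d / p).
    # Assumes uniformly distributed digests and a full cache line.
    return math.ceil(math.log2(ways / target_p))

print(digest_bits_needed(4, 1e-9))  # 4-way line at a 10^-9 error target
```

For 4-way associativity at 10^-9 this gives 32 bits, consistent with the 32-bit, 4-way configuration used in the comparison.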

Comparing approaches
- Digest cache: 32-bit digests, 4-way associativity
- Bloom filter cache: optimal, 30-level filter
- Exact caches: IPv4 and IPv6 flow caches

Hit rates vs. cache size

Miss-rate variance vs. cache size

NP implementation (IXP1200, L3Forwarder)
- 4-way associative digest cache: 803 Mb/s
- Bloom filter cache
  - 1 level = 990 Mb/s
  - 4 levels = 652 Mb/s

A final note to those who hate being wrong
- Digest caches can be used to accelerate exact caches
- Consider an exact cache where associativity is emulated: the entire cache line must be read sequentially to find a match
- Digest cache acceleration
  - Use a smaller digest cache, stored in the fastest (possibly associative) memory, to mirror the entries in the exact cache
  - A lookup in the digest cache gives the exact location of the relevant entry in the exact cache
  - Good for implementing associative caches on NPs that lack hardware support
- Speed-up analysis in the paper
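The acceleration idea can be sketched as follows. This is a hypothetical model: the small digest array stands in for fast memory and the exact entries for slow memory, so a hit costs a scan of the digests plus a single slow-memory read instead of reading the whole line:

```python
import hashlib

class AcceleratedExactCache:
    """Exact cache whose lines are indexed through a mirrored digest array,
    so only the matching entry is read from slow memory on a lookup."""
    def __init__(self, num_sets=256, ways=4):
        self.num_sets = num_sets
        self.ways = ways
        self.digests = [[None] * ways for _ in range(num_sets)]  # fast memory
        self.entries = [[None] * ways for _ in range(num_sets)]  # slow memory

    def _hashes(self, header: bytes):
        h = hashlib.sha256(header).digest()
        return (int.from_bytes(h[:4], "big") % self.num_sets,
                int.from_bytes(h[4:8], "big"))

    def insert(self, header: bytes, result, way: int):
        h1, h2 = self._hashes(header)
        self.digests[h1][way] = h2
        self.entries[h1][way] = (header, result)  # full header kept: still exact

    def lookup(self, header: bytes):
        h1, h2 = self._hashes(header)
        for way, d in enumerate(self.digests[h1]):  # scan small digests only
            if d == h2:
                entry = self.entries[h1][way]       # one slow-memory read
                if entry is not None and entry[0] == header:
                    return entry[1]                 # header match: exact hit
        return None
```

Because the full header is verified before returning, a digest collision here costs only a wasted slow-memory read, never a misclassification.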

Questions?

Extra slides

Misclassification rates for digest caches (4-way associative)

Cache misses over time