Applied Research Laboratory Edward W. Spitznagel 24 October
Packet Classification using Extended TCAMs
Edward W. Spitznagel, Jonathan S. Turner, David E. Taylor
Supported by NSF ANI , DARPA N

Packet Classification Problem
Suppose you are a firewall, a QoS router, or a network monitor...
You are given a list of rules (filters) that determine how to process incoming packets, based on the packet header fields
–Some fields in the rules are specified with bit masks; others with ranges
Goal: when a packet arrives, find the first rule that matches the packet’s header fields

Filter  Source Address  Destination Address  Source Port  Destination Port  Protocol  Action
a       11xx            01xx                                                TCP       fwd 7
b       01xx            0010                 3-15                           UDP       fwd 2
c       0101            xxxx                 3            *                 *         deny
d                       x                    -            -                 ICMP      fwd 5

Packet Classification Problem
Example: packet arrives with header (0101, 0010, 3, 5, UDP)
–classification result: filter b is matched
–filter c also matches, but b occurs before c in the list
Easy to do when we have only a few rules; very difficult when we have 100,000 rules and packets arrive at 40 Gb/s

Geometric Representation
Filters with K fields can be represented geometrically in K dimensions
Example: filter a = (Source Address xxx, Source Port 2-3), filter c = (Source Address xx1, Source Port 7)
[Figure: filters a, b, and c drawn as regions in the plane; axes: Source Address, Source Port]

Related Work
TCAM-based parallel classification
–CoolCAMs (Narlikar, Basu, Zane) for IP lookup
SRAM-based sequential classification
–Recursive Flow Classification (Gupta, McKeown)
–HiCuts (Gupta, McKeown)
–Extended Grid of Tries (Baboescu, Singh, Varghese)
–HyperCuts (Singh, Baboescu, Varghese, Wang)
SRAM: 6 transistors per bit (vs. 16 for TCAM), but the SRAM approaches use more bits per filter

Ternary CAMs
Most popular practical approach to high-performance packet classification
Hardware compares query word (packet header) to all stored words (filters) in parallel
–each bit of a stored word can be 0, 1, or X (don’t care)
Very fast, but not without drawbacks:
–High power consumption limits scalability
–Inefficient representation of ranges

Ternary CAM - Example
Filter  Source Address  Destination Address
a       11xx            xxxx
b       0xxx            01xx
c       xxxx            0110

Query: the packet’s Src. Addr. and Dest. Addr. fields
Address  Contents  Result
0        11xxxxxx  Match!
1        0xxx01xx  Doesn’t Match
2        xxxx0110  Match!
Entry 0 (filter a) is the first matching filter
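
The matching rule on this slide can be modeled in a few lines of software. This is only a sketch: a real TCAM compares all entries in parallel in hardware, whereas the loop below is sequential. The query word `11000110` is a hypothetical value chosen to reproduce the Match / Doesn't Match / Match pattern shown, since the slide's actual query value is not available; the function names are illustrative.

```python
def ternary_match(stored, query):
    """True if every stored bit is 'x' (don't care) or equals the query bit."""
    return len(stored) == len(query) and all(
        s in ("x", q) for s, q in zip(stored, query)
    )

def first_match(tcam, query):
    """Address of the lowest-numbered matching entry, or None if nothing matches."""
    for address, stored in enumerate(tcam):
        if ternary_match(stored, query):
            return address
    return None

# TCAM contents from the slide: source and destination address concatenated.
tcam = ["11xxxxxx", "0xxx01xx", "xxxx0110"]

# Hypothetical query word (not from the slide).
query = "11000110"

per_entry = [ternary_match(entry, query) for entry in tcam]
winner = first_match(tcam, query)
print(per_entry, winner)
```

Returning the lowest matching address mirrors the "first matching filter" rule: filters are stored in priority order, so address order is rule order.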

Range Matching in TCAMs
Convert ranges into sets of prefixes
–1-4 becomes 001, 01*, and 100
–3-5 becomes 011 and 10*
Filter F: Source Port 1-4, Destination Port 3-5
[Figure: filter F drawn as a rectangle; axes: Source Port, Destination Port]
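
The conversion above can be sketched as a greedy split into aligned power-of-two blocks. This is a common way to do range-to-prefix expansion, not necessarily the procedure the authors used; the function name and signature are illustrative.

```python
def range_to_prefixes(lo, hi, width):
    """Cover the integer range [lo, hi] with prefixes over `width`-bit values.

    Each prefix is a string of fixed bits followed by '*' wildcard bits.
    """
    prefixes = []
    while lo <= hi:
        # Largest power-of-two block that is aligned at lo ...
        size = (lo & -lo) if lo else (1 << width)
        # ... shrunk until it no longer overshoots hi.
        while size > hi - lo + 1:
            size >>= 1
        wild = size.bit_length() - 1      # number of wildcard bits
        fixed = width - wild              # number of fixed leading bits
        bits = format(lo >> wild, "0{}b".format(fixed)) if fixed else ""
        prefixes.append(bits + "*" * wild)
        lo += size
    return prefixes

print(range_to_prefixes(1, 4, 3))
print(range_to_prefixes(3, 5, 3))
```

The two calls reproduce the slide's examples for 3-bit port fields.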

Range Matching in TCAMs
With two 16-bit range fields, a single rule could require up to 900 TCAM entries!
Typical case: entire filter set expands by a factor of 2 to
[Figure: filter F split into regions a-f; axes: Source Port, Destination Port]

Filter  Source Port  Destination Port
a       001          10*
b       01*          10*
c       100          10*
d       001          011
e       01*          011
f       100          011
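
The 900 figure is consistent with the standard worst case for prefix expansion: a range over a w-bit field can need as many as 2(w-1) prefixes (for example, the range from 1 to 2^w - 2), and a rule with two range fields needs the cross product of the two prefix sets. The slide does not show this derivation, so the following is a reconstruction:

```latex
\[
2(w-1)\Big|_{w=16} = 30 \ \text{prefixes per range field},
\qquad
30 \times 30 = 900 \ \text{TCAM entries per rule}.
\]
```

The six entries a-f in the table above are the same cross product at small scale: three prefixes for the source port times two for the destination port.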

Extended TCAMs
Extend standard TCAM architecture to enable classification with larger rulesets
Partitioned TCAM, for reduced power
–inspired by CoolCAMs
–differences in indexing, search and partitioning algorithms
Support range matching directly in hardware

Use of Partitioned TCAM
Main component of power use in TCAM search is proportional to number of entries searched
Partitioning the TCAM:
–divide TCAM into blocks of entries
–each block is enabled for search via an associated index filter

Use of Partitioned TCAM
Example: suppose we are given the following filters:
a. 1-13, 001x
b. 2-3, 00xx
c. 9-10, xxx1
d. 11-14, 011x
e. 12-13, 0xxx
f. 0-14, 1010
g. 7-7, 110x
h. 0-5, 1110
i. 1-2, 1x1x
j. 13-14, 11xx
k. 11-15, 111x

index filters:  filter blocks:
0-15, 0xxx      1-13, 001x; 2-3, 00xx; 11-14, 011x; 12-12, 01xx
0-6, 1xxx       0-5, , 11xx
7-15, 1xxx      7-7, 110x; 13-14, 11xx; 11-15, 111x
0-15, xxxx      9-10, xxxx; 0-14, 1010

A real Extended TCAM would have more blocks, and more filters per block.

Use of Partitioned TCAM
Example: classify packet with header values (2, 1010)
–index block: second and fourth filters match
–search second and fourth filter blocks
–find matching filters (1-2, 1x1x) and (0-14, 1010)
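
The two-stage lookup in this example can be sketched as follows. The filter values and the assignment of filters to blocks are taken from the slides; where the slides disagree on a value (filter i, for instance), the version used in this example, (1-2, 1x1x), is assumed. The function and variable names are illustrative, and the sequential loops only model what the hardware does in parallel.

```python
def ternary_match(pattern, bits):
    """True if every pattern bit is 'x' or equals the corresponding header bit."""
    return all(p in ("x", b) for p, b in zip(pattern, bits))

def matches(entry, port, addr):
    """An entry is (lo, hi, pattern): a port range plus a ternary address pattern."""
    lo, hi, pattern = entry
    return lo <= port <= hi and ternary_match(pattern, addr)

# Index filters, one per block, in the order shown on the slides.
INDEX = [(0, 15, "0xxx"), (0, 6, "1xxx"), (7, 15, "1xxx"), (0, 15, "xxxx")]

# Filter blocks; block n is enabled only when INDEX[n] matches the packet.
BLOCKS = [
    {"a": (1, 13, "001x"), "b": (2, 3, "00xx"),
     "d": (11, 14, "011x"), "e": (12, 13, "0xxx")},
    {"h": (0, 5, "1110"), "i": (1, 2, "1x1x")},   # i as given in this example
    {"g": (7, 7, "110x"), "j": (13, 14, "11xx"), "k": (11, 15, "111x")},
    {"c": (9, 10, "xxxx"), "f": (0, 14, "1010")},
]

def classify(port, addr):
    """Return (indices of blocks searched, names of matching filters)."""
    searched = [n for n, idx in enumerate(INDEX) if matches(idx, port, addr)]
    found = [name
             for n in searched
             for name, flt in BLOCKS[n].items()
             if matches(flt, port, addr)]
    return searched, found

searched, found = classify(2, "1010")
print(searched, found)
```

Block indices are zero-based here, so the second and fourth blocks appear as 1 and 3. Only the blocks whose index filter matches are ever examined, which is where the power saving comes from.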

Use of Partitioned TCAM
The key to minimizing power consumption: organize filters so that only a few TCAM blocks must be searched to find the filters matching a packet.
–Use a filter grouping algorithm

Filters:
a. 1-13, 001x
b. 2-3, 00xx
c. 9-10, xxxx
d. 11-14, 011x
e. 12-13, 0xxx
f. 0-14, 1010
g. 7-7, 110x
h. 0-5, 1110
i. 1-2, 11xx
j. 13-14, 11xx
k. 11-15, 111x

Index entry   Filters
0-15, 0xxx    a, b, d, e
[Figure: geometric view of filters a-k]

Index entry   Filters
0-15, 0xxx    a, b, d, e
0-6, 1xxx     h, i
[Figure: geometric view of the remaining filters c, f, g, j, k, with h and i grouped]

Index entry   Filters
0-15, 0xxx    a, b, d, e
0-6, 1xxx     h, i
7-15, 1xxx    g, j, k
[Figure: geometric view of the remaining filters c and f, with g, j, and k grouped]

Index entry   Filters
0-15, 0xxx    a, b, d, e
0-6, 1xxx     h, i
7-15, 1xxx    g, j, k
Next phase:
0-15, xxxx    c, f
[Figure: geometric view of filters c and f]

Creating a set of partitions
At most k filters per region (k = block size)
Regions within the same partition do not overlap
Total number of regions equals the index size
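
The first two constraints above can be checked mechanically. The sketch below is a validity checker only, not the authors' grouping algorithm; it uses the index regions from the running example, and the block size k = 4 is an assumption (the largest block in the example holds four filters). All names are illustrative.

```python
from itertools import combinations

def regions_overlap(r, s):
    """Regions are (lo, hi, pattern). They overlap if the port ranges intersect
    and no bit position has a 0 in one pattern and a 1 in the other."""
    (lo1, hi1, p1), (lo2, hi2, p2) = r, s
    ranges_meet = lo1 <= hi2 and lo2 <= hi1
    bits_meet = all(a == b or "x" in (a, b) for a, b in zip(p1, p2))
    return ranges_meet and bits_meet

def is_valid_partition(regions, k):
    """regions maps each region to the list of filters assigned to it."""
    small_enough = all(len(filters) <= k for filters in regions.values())
    disjoint = not any(regions_overlap(r, s) for r, s in combinations(regions, 2))
    return small_enough and disjoint

# Regions built in the first phase of the example, and in the next phase.
phase1 = {
    (0, 15, "0xxx"): ["a", "b", "d", "e"],
    (0, 6, "1xxx"): ["h", "i"],
    (7, 15, "1xxx"): ["g", "j", "k"],
}
phase2 = {(0, 15, "xxxx"): ["c", "f"]}

K = 4  # assumed block size
print(is_valid_partition(phase1, K), is_valid_partition(phase2, K))
print(len(phase1) + len(phase2))  # total regions, i.e. the index size
```

Merging the two phases into one partition would fail the check, because the region (0-15, xxxx) overlaps every other region; that is why it sits in a separate partition.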

Range Matching
Store a pair of values (lo, hi) for each range match field
Range check circuitry compares query values against lo and hi to determine if query is in range
–Transistors per bit of range field is twice that of ordinary TCAM
–But, for typical IPv4 applications, this results in just a 22% increase in overall transistor count
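
One way to see how an increase of about this size could arise is a rough count under assumptions that the slide does not state: a 144-bit TCAM word, two 16-bit port fields stored as ranges, and the 16 transistors per ordinary TCAM bit quoted under Related Work. Doubling the cost of the 32 range bits then adds 32 × 16 transistors to a word that otherwise costs 144 × 16:

```latex
\[
\frac{(16 + 16) \times 16}{144 \times 16} = \frac{512}{2304} \approx 0.22
\]
```

The word width in particular is a guess that happens to reproduce the quoted figure; a different word width or field layout would give a different percentage.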

Performance Metrics
Power Fraction = $\frac{\text{index size} + (\text{\# of partitions})(\text{block size})}{\text{number of filters}}$
–a measure of power usage, relative to a standard TCAM
–smaller is better
Storage Efficiency = $\frac{\text{number of filters}}{\text{index size} + (\text{\# of blocks})(\text{block size})}$
–higher is better; 1 is optimal
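
As a small worked example of the two metrics, the sketch below evaluates them for a made-up configuration. Every number here is hypothetical and chosen only to show the arithmetic; none of it is a result from the talk. The function names are illustrative.

```python
def power_fraction(index_size, num_partitions, block_size, num_filters):
    """Entries searched per lookup (the index plus one block per partition),
    relative to a standard TCAM that searches every filter."""
    return (index_size + num_partitions * block_size) / num_filters

def storage_efficiency(index_size, num_blocks, block_size, num_filters):
    """Filters stored per TCAM entry provisioned (index plus all blocks)."""
    return num_filters / (index_size + num_blocks * block_size)

# Hypothetical configuration (not from the slides).
num_filters = 10_000
block_size = 64
num_blocks = 200
index_size = num_blocks   # one index entry per block
num_partitions = 3

pf = power_fraction(index_size, num_partitions, block_size, num_filters)
se = storage_efficiency(index_size, num_blocks, block_size, num_filters)
print(round(pf, 4), round(se, 4))
```

Because regions within a partition do not overlap, at most one block per partition needs to be searched, which is why the power fraction depends on the number of partitions rather than the number of blocks.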

Different Block Sizes
[Figure: one panel each for block size = 16, 32, 64, 128, and 256]

Results: Power Fraction
[Figure: power fraction for block size = 32, 64, 128, and 256; series: Basic Algorithm, Refined]

Results: Storage Efficiency
[Figure: storage efficiency for block size = 32, 64, 128, and 256; series: Basic Algorithm, Refined]

Current/Future Work
Computational complexity of filter grouping problem
Filter updates (add/delete operations)
Multi-level indices
Different partitioning algorithms
Application to SRAM/DRAM-based classification techniques

Summary
Packet classification is important for many advanced network services
TCAMs scale poorly due to power consumption and inefficient range match representations
Extended TCAMs solve these issues by using a partitioned TCAM and hardware support for range matching
–power consumption greatly reduced (typically to 5% or less of power used by a standard TCAM)
–range match hardware avoids the inefficiency in representing ranges

Questions?