1 EE384Y: Packet Switch Architectures Part II Address Lookup and Classification Nick McKeown Professor of Electrical Engineering and Computer Science, Stanford University firstname.lastname@example.org http://www.stanford.edu/~nickm
2 Generic Router Architecture (Review from EE384x)
Header Processing: Lookup IP Address, then Update Header.
Address Table (IP Address -> Next Hop): ~1M prefixes, off-chip DRAM.
Queue Packet: Buffer Memory, ~1M packets, off-chip DRAM.
3 Lookups Must be Fast
Year | Line | 40B packets (Mpkt/s)
1997 | 622Mb/s | 1.94
1999 | 2.5Gb/s | 7.81
2001 | 10Gb/s | 31.25
2003 | 40Gb/s | 125
1. Lookup mechanism must be simple and easy to implement.
2. (Surprise?) Memory access time is the long-term bottleneck.
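The packet rates on this slide follow from dividing the line rate by the size of a minimum-length packet. A minimal sketch of that arithmetic, assuming 40-byte packets with no framing overhead (the function name `packets_per_second` is illustrative, not from the slides):

```python
def packets_per_second(line_rate_bps, packet_bytes=40):
    """Back-to-back packets per second at a given line rate."""
    return line_rate_bps / (packet_bytes * 8)

# Line rates taken from the slide; the per-lookup time budget is 1 / packet rate.
for year, rate_bps in [(1997, 622e6), (1999, 2.5e9), (2001, 10e9), (2003, 40e9)]:
    pps = packets_per_second(rate_bps)
    print(f"{year}: {pps / 1e6:.2f} Mpkt/s, {1e9 / pps:.1f} ns per lookup")
```

At 40Gb/s the budget is 8 ns per lookup, which is why memory access time becomes the bottleneck.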
4 Memory Technology (2003-04)
Technology | Single chip density | $/chip ($/MByte) | Access speed | Watts/chip
Networking DRAM | 64 MB | $30-$50 ($0.50-$0.75) | 40-80ns | 0.5-2W
SRAM | 4 MB | $20-$30 ($5-$8) | 4-8ns | 1-3W
TCAM | 1 MB | $200-$250 ($200-$250) | 4-8ns | 15-30W
Note: Price, speed and power are manufacturer and market dependent.
5 Lookup Mechanism is Protocol Dependent
Networking Protocol | Lookup Mechanism | Techniques
MPLS, ATM, Ethernet | Exact match search | Direct lookup; Associative lookup; Hashing; Binary/Multi-way Search Trie/Tree
IPv4, IPv6 | Longest-prefix match search | Radix trie and variants; Compressed trie; Binary search on prefix intervals
7 Exact Matches in ATM/MPLS: Direct Memory Lookup
The VCI/MPLS label is used directly as the memory address; the data read is (outgoing port, new VCI/label).
VCI/Label space is 24 bits: maximum 16M addresses. With 64b data, this is 1Gb of memory.
VCI/Label space is private to one link, therefore table size can be negotiated.
Alternately, use a level of indirection.
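Direct lookup amounts to indexing an array with the label. A minimal sketch, assuming the link has negotiated a 12-bit label space to keep the table small (`NEGOTIATED_LABEL_BITS`, `install` and `lookup` are hypothetical names, and the installed entry is made up):

```python
NEGOTIATED_LABEL_BITS = 12          # assumption: label space negotiated down from 24 bits
table = [None] * (1 << NEGOTIATED_LABEL_BITS)

def install(label, out_port, new_label):
    """Store (outgoing port, new label) at the address given by the label."""
    table[label] = (out_port, new_label)

def lookup(label):
    """Exactly one memory reference: the label is the address."""
    return table[label]

install(42, 3, 1007)                # illustrative entry
print(lookup(42))

# Size of the full, un-negotiated table from the slide: 2^24 entries of 64 bits.
full_table_bits = (1 << 24) * 64
print(full_table_bits == 2 ** 30)   # 1 Gb
```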
8 Exact Matches in Ethernet Switches Layer-2 addresses are usually 48 bits long. The address is global, not just local to the link. The range/size of the address is not negotiable (unlike ATM/MPLS, where it is). 2^48 > 10^12, therefore we cannot hold all addresses in a table and use direct lookup.
9 Exact Matches in Ethernet Switches (Associative Lookup) Associative memory (aka Content Addressable Memory, CAM) compares all entries in parallel against incoming data. Diagram: the 48-bit network address is presented as data to the associative memory (CAM), which returns the match location; that location is used as the address into normal memory, whose data is the port.
10 Exact Matches in Ethernet Switches: Hashing
Use a pseudo-random hash function (relatively insensitive to actual function).
Bucket linearly searched (or could be binary search, etc.).
Leads to unpredictable number of memory references.
Diagram: the 48-bit network address is hashed to a 16-bit (say) memory address; the data there is a pointer to a list/bucket holding the network addresses in this bucket.
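The hashing scheme on this slide can be sketched as a table of buckets that are searched linearly. This is only an illustration: CRC-32 stands in for the unspecified pseudo-random hash function, and the names `hash48`, `insert` and `lookup` are hypothetical. The lookup also returns how many memory references it needed, to show why that number is unpredictable.

```python
import zlib

HASH_BITS = 16                                   # "16, say" on the slide
buckets = [[] for _ in range(1 << HASH_BITS)]    # each slot points to a bucket list

def hash48(mac):
    """Map a 48-bit address to a 16-bit bucket index (CRC-32 is an assumption)."""
    return zlib.crc32(mac.to_bytes(6, "big")) & ((1 << HASH_BITS) - 1)

def insert(mac, port):
    buckets[hash48(mac)].append((mac, port))

def lookup(mac):
    """Return (port, memory_references); port is None if the address is unknown."""
    refs = 1                                     # read the bucket pointer
    for stored, port in buckets[hash48(mac)]:
        refs += 1                                # read one bucket entry
        if stored == mac:
            return port, refs
    return None, refs

insert(0x001122334455, 3)                        # illustrative addresses
insert(0x00A0C9000001, 7)
print(lookup(0x001122334455))
```

The reference count depends on how many addresses happen to share a bucket, so it varies from lookup to lookup.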
11 Exact Matches Using Hashing [Plot: number of memory references]
12 Exact Matches in Ethernet Switches: Perfect Hashing
Diagram: the 48-bit network address is hashed to a 16-bit (say) memory address; the data there is the port.
There always exists a perfect hash function.
Goal: With a perfect hash function, memory lookup always takes O(1) memory references.
Problem: - Finding perfect hash functions (particularly minimal perfect hash functions) is very complex. - Updates?
13 Exact Matches in Ethernet Switches: Hashing
Advantages: –Simple –Expected lookup time is small
Disadvantages: –Inefficient use of memory –Non-deterministic lookup time
Attractive for software-based switches, but decreasing use in hardware platforms.
14 Exact Matches in Ethernet Switches: Trees and Tries
Binary search tree (N entries, depth log2 N): lookup time dependent on table size, but independent of address length; storage is O(N).
Binary search trie (branch on successive address bits, 0 or 1): lookup time bounded and independent of table size; storage is O(NW).
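As a rough illustration of the tree side of this comparison, the sketch below uses binary search over a sorted array as a stand-in for a balanced binary search tree. Both take about log2 N comparisons. The addresses and the `lookup` name are made up for the example.

```python
from bisect import bisect_left

# Sorted (address, port) entries; storage is O(N).
entries = sorted([
    (0x001122334455, 1),
    (0x00A0C9000001, 2),
    (0x080020ABCDEF, 3),
    (0xAABBCCDDEEFF, 4),
])
keys = [mac for mac, _ in entries]
ports = [port for _, port in entries]

def lookup(mac):
    """Exact match in about log2(N) comparisons, independent of address length."""
    i = bisect_left(keys, mac)
    if i < len(keys) and keys[i] == mac:
        return ports[i]
    return None

print(lookup(0x080020ABCDEF))
```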
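15 Exact Matches in Ethernet Switches: Multiway tries
16-ary search trie: each node holds (4-bit value, ptr) entries from (0000, ptr) to (1111, ptr). Ptr=0 means no children.
Diagram: a root node with entries (0000, ptr) ... (1111, ptr), second-level nodes with entries such as (0000, 0) and (1111, ptr), and stored keys 000011110000 and 111111111111.
Q: Why can't we just make it a 2^48-ary trie?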
16 Exact Matches in Ethernet Switches: Multiway tries Table produced from 2^15 randomly generated 48-bit addresses. As degree increases, more and more pointers are 0.
17 Exact Matches in Ethernet Switches: Trees and Tries
Advantages: –Fixed lookup time –Simple to implement and update
Disadvantages: –Inefficient use of memory and/or requires large number of memory references
19 Longest Prefix Matching: IPv4 Addresses 32-bit addresses. Dotted quad notation: e.g. 18.104.22.168. Can be represented as integers on the IP number line [0, 2^32 - 1]: a.b.c.d denotes the integer a*2^24 + b*2^16 + c*2^8 + d. IP number line: 0.0.0.0 to 255.255.255.255.
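The mapping from dotted quad notation to a point on the IP number line can be written out directly. A small sketch (the function names are illustrative):

```python
def dotted_quad_to_int(addr):
    """a.b.c.d -> a*2^24 + b*2^16 + c*2^8 + d"""
    a, b, c, d = (int(part) for part in addr.split("."))
    return (a << 24) | (b << 16) | (c << 8) | d

def int_to_dotted_quad(value):
    """Inverse mapping, from an integer in [0, 2^32 - 1] back to a.b.c.d."""
    return ".".join(str((value >> shift) & 0xFF) for shift in (24, 16, 8, 0))

print(dotted_quad_to_int("0.0.0.0"))                        # left end of the number line
print(dotted_quad_to_int("255.255.255.255") == 2**32 - 1)   # right end
print(int_to_dotted_quad(dotted_quad_to_int("192.2.1.0")))
```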
20 Class-based Addressing
IP number line: class A starts at 0.0.0.0, class B at 128.0.0.0, class C at 192.0.0.0, followed by classes D and E.
Class | Range | MS bits | netid | hostid
A | 0.0.0.0 - 127.255.255.255 | 0 | bits 1-7 | bits 8-31
B | 128.0.0.0 - 191.255.255.255 | 10 | bits 2-15 | bits 16-31
C | 192.0.0.0 - 223.255.255.255 | 110 | bits 3-23 | bits 24-31
D (multicast) | 224.0.0.0 - 239.255.255.255 | 1110 | - | -
E (reserved) | 240.0.0.0 - 255.255.255.255 | 1111 | - | -
21 Lookups with Class-based Addresses
Exact match on the netid, chosen by address class (netid -> port#):
Class A: 23 -> Port 1
Class B: 186.21 -> Port 2
Class C: 192.33.32 -> Port 3
Address shown in the diagram: 126.96.36.199
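With class-based addresses, the leading bits of the address fix the netid length, so the lookup reduces to one exact match. A minimal sketch using the three netids from the slide. The function names and the host addresses in the usage lines are assumptions.

```python
# netid -> port, as on the slide
NETID_TABLE = {"23": "Port 1", "186.21": "Port 2", "192.33.32": "Port 3"}

# Number of leading octets that form the netid for each unicast class.
NETID_OCTETS = {"A": 1, "B": 2, "C": 3}

def address_class(first_octet):
    """Classify by the most significant bits of the first octet."""
    if first_octet >> 7 == 0b0:
        return "A"
    if first_octet >> 6 == 0b10:
        return "B"
    if first_octet >> 5 == 0b110:
        return "C"
    if first_octet >> 4 == 0b1110:
        return "D"
    return "E"

def lookup(addr):
    """Exact match on the class-determined netid; None if no route."""
    octets = addr.split(".")
    cls = address_class(int(octets[0]))
    if cls not in NETID_OCTETS:          # multicast or reserved
        return None
    netid = ".".join(octets[:NETID_OCTETS[cls]])
    return NETID_TABLE.get(netid)

for addr in ["23.1.2.3", "186.21.9.9", "192.33.32.7"]:   # made-up hosts
    print(addr, lookup(addr))
```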
22 Problems with Class-based Addressing Fixed netid-hostid boundaries too inflexible –Caused rapid depletion of address space Exponential growth in size of routing tables
23 Early Exponential Growth in Routing Table Sizes [Plot: number of BGP routes advertised]
24 Classless Addressing (and CIDR) Eliminated class boundaries. Introduced the notion of a variable length prefix between 0 and 32 bits long. Prefixes represented by P/l: e.g., 122/8, 212.128/13, 34.43.32/22, 10.32.32.2/32 etc. An l-bit prefix represents an aggregation of 2^(32-l) IP addresses.
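The claim that an l-bit prefix covers 2^(32-l) addresses can be checked with Python's standard `ipaddress` module. The helper `parse_prefix` below is a hypothetical convenience that pads the slide's shorthand (such as 212.128/13) to four octets.

```python
import ipaddress

def parse_prefix(text):
    """Turn slide shorthand such as '212.128/13' into an IPv4Network."""
    base, length = text.split("/")
    octets = base.split(".")
    octets += ["0"] * (4 - len(octets))      # pad omitted trailing octets with 0
    return ipaddress.ip_network(".".join(octets) + "/" + length)

for text in ["122/8", "212.128/13", "34.43.32/22", "10.32.32.2/32"]:
    net = parse_prefix(text)
    # First and last address of the aggregate, and its size 2^(32-l).
    print(text, net.network_address, net.broadcast_address, net.num_addresses)
```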
25 CIDR: Hierarchical Route Aggregation
Routers: backbone router R1, plus R2, R3 and R4.
ISP P: 192.2.0/22. ISP Q: 200.11.0/22. Site S: 192.2.1/24. Site T: 192.2.2/24.
Backbone routing table: 192.2.0/22, R2.
IP number line: 192.2.1/24 and 192.2.2/24 lie inside 192.2.0/22; 200.11.0/22 is separate.
27 Routing Lookups with CIDR
Routing table: 192.2.0/22, R2; 192.2.2/24, R3; 200.11.0/22, R4.
IP number line: 192.2.2/24 lies inside 192.2.0/22; 200.11.0/22 is separate. Addresses marked on the number line: 188.8.131.52, 184.108.40.206, 220.127.116.11.
LPM: Find the most specific route, or the longest matching prefix among all the prefixes matching the destination address of an incoming packet.
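The definition of LPM can be made concrete with a deliberately naive linear scan over the three routes from this slide. This is a reference sketch for the semantics only, not one of the fast algorithms discussed later; the destination addresses in the usage lines are made up.

```python
import ipaddress

# Routing table from the slide, written out as full CIDR strings.
ROUTES = [("192.2.0.0/22", "R2"), ("192.2.2.0/24", "R3"), ("200.11.0.0/22", "R4")]
TABLE = [(ipaddress.ip_network(prefix), hop) for prefix, hop in ROUTES]

def longest_prefix_match(dst):
    """Return the next hop of the most specific matching prefix, or None."""
    addr = ipaddress.ip_address(dst)
    best_len, best_hop = -1, None
    for net, hop in TABLE:
        if addr in net and net.prefixlen > best_len:
            best_len, best_hop = net.prefixlen, hop
    return best_hop

print(longest_prefix_match("192.2.2.5"))    # matches both /22 and /24
print(longest_prefix_match("192.2.1.5"))    # matches only the /22
print(longest_prefix_match("10.0.0.1"))     # no matching prefix
```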
28 Longest Prefix Match is Harder than Exact Match The destination address of an arriving packet does not carry with it the information to determine the length of the longest matching prefix. Hence, one needs to search the space of all prefix lengths, as well as the space of all prefixes of a given length.
29 LPM in IPv4 Use 32 exact match algorithms for LPM! The network address is matched against prefixes of length 1, prefixes of length 2, ..., prefixes of length 32 (one exact match per length); a priority encoder picks the longest match and outputs the port.
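The "32 exact matches" idea can be sketched in software with one hash table per prefix length. Hardware would probe all lengths in parallel and priority-encode the results; this sketch probes them sequentially from longest to shortest, which yields the same answer. The routes are those of the CIDR example; all function and variable names are assumptions.

```python
import ipaddress

W = 32
tables = {length: {} for length in range(1, W + 1)}   # one exact-match table per length

def to_bits(value):
    """32-bit binary string for an integer address."""
    return format(value, "032b")

def insert(cidr, port):
    net = ipaddress.ip_network(cidr)
    key = to_bits(int(net.network_address))[:net.prefixlen]
    tables[net.prefixlen][key] = port

def lookup(dst):
    bits = to_bits(int(ipaddress.ip_address(dst)))
    for length in range(W, 0, -1):                    # longest match wins
        port = tables[length].get(bits[:length])
        if port is not None:
            return port
    return None

insert("192.2.0.0/22", "R2")
insert("192.2.2.0/24", "R3")
insert("200.11.0.0/22", "R4")
print(lookup("192.2.2.5"), lookup("192.2.3.5"))
```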
30 Metrics for Lookup Algorithms
Speed (= number of memory accesses)
Storage requirements (= amount of memory)
Low update time (support ~5K updates/s)
Scalability
–With length of prefix: IPv4 unicast (32b), Ethernet (48b), IPv4 multicast (64b), IPv6 unicast (128b)
–With size of routing table: (sweet spot for today's designs = 1 million)
Flexibility in implementation
Low preprocessing time
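31 Radix Trie
Prefix table: P1 = 111* -> H1; P2 = 10* -> H2; P3 = 1010* -> H3; P4 = 10101 -> H4.
Trie node: next-hop-ptr (if prefix), left-ptr, right-ptr.
Diagram: binary trie with nodes A to H; edges are labeled 0 or 1, and the nodes for P1 to P4 hold next-hop pointers.
Example operations: Lookup 10111; Add P5 = 1110* (adds node I).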
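A minimal sketch of the binary radix trie, using the node layout from the slide (next hop plus left and right pointers) and the prefix table P1 to P4. Prefixes and addresses are written as bit strings for readability. The class names are illustrative, and the next hop "H5" for P5 is an assumption, since the slide does not give one.

```python
class Node:
    def __init__(self):
        self.next_hop = None          # set only if a prefix ends at this node
        self.child = [None, None]     # left-ptr (bit 0), right-ptr (bit 1)

class RadixTrie:
    def __init__(self):
        self.root = Node()

    def insert(self, prefix, next_hop):
        node = self.root
        for bit in prefix:
            i = int(bit)
            if node.child[i] is None:
                node.child[i] = Node()
            node = node.child[i]
        node.next_hop = next_hop

    def lookup(self, addr):
        """Walk down bit by bit, remembering the last prefix seen on the path."""
        node, best = self.root, self.root.next_hop
        for bit in addr:
            node = node.child[int(bit)]
            if node is None:
                break
            if node.next_hop is not None:
                best = node.next_hop
        return best

trie = RadixTrie()
for prefix, hop in [("111", "H1"), ("10", "H2"), ("1010", "H3"), ("10101", "H4")]:
    trie.insert(prefix, hop)
print(trie.lookup("10111"))       # the slide's example lookup
trie.insert("1110", "H5")         # Add P5 = 1110*; "H5" is a made-up next hop
print(trie.lookup("11100"))
```

The lookup of 10111 follows 1, 0 (remembering P2), then 1, and stops because there is no 1011 branch, so the answer is P2's next hop.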
32 Radix Trie
W-bit prefixes: O(W) lookup, O(NW) storage and O(W) update complexity.
Advantages: Simplicity. Extensible to wider fields.
Disadvantages: Worst case lookup slow. Wastage of storage space in chains.
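33 Leaf-pushed Binary Trie
Prefix table: P1 = 111* -> H1; P2 = 10* -> H2; P3 = 1010* -> H3; P4 = 10101 -> H4.
Trie node: left-ptr or next-hop, right-ptr or next-hop.
Diagram: the binary trie with nodes A to G after leaf pushing; next hops appear only at the leaves, so P2 is replicated.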
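34 PATRICIA
Prefix table: P1 = 111* -> H1; P2 = 10* -> H2; P3 = 1010* -> H3; P4 = 10101 -> H4. Bit positions are numbered 1 to 5.
Patricia tree internal node: bit-position, left-ptr, right-ptr.
Diagram: internal nodes test bit positions 2, 3 and 5; edges are labeled 0 or 1, and the leaves hold P1 to P4.
Example: Lookup 10111.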
35 PATRICIA
W-bit prefixes: O(W^2) lookup, O(N) storage and O(W) update complexity.
Advantages: Decreased storage. Extensible to wider fields.
Disadvantages: Worst case lookup slow. Backtracking makes implementation complex.
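36 Path-compressed Tree
Prefix table: P1 = 111* -> H1; P2 = 10* -> H2; P3 = 1010* -> H3; P4 = 10101 -> H4.
Path-compressed tree node structure: variable-length bitstring, next-hop (if prefix present), bit-position, left-ptr, right-ptr.
Diagram: internal nodes shown as (bitstring, next-hop, bit-position): (1, -, 2), (10, P2, 4), (1010, P3, 5); leaves hold P1 and P4.
Example: Lookup 10111.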
37 Path-compressed Tree
W-bit prefixes: O(W) lookup, O(N) storage and O(W) update complexity.
Advantages: Decreased storage.
Disadvantages: Worst case lookup slow.
38 Multi-bit Tries
Binary trie: depth = W, degree = 2, stride = 1 bit.
Multi-ary trie: depth = W/k, degree = 2^k, stride = k bits.
39 Prefix Expansion with Multi-bit Tries If stride = k bits, prefix lengths that are not a multiple of k need to be expanded. E.g., k = 2: prefix 0* expands to 00*, 01*; prefix 11* stays 11*. Maximum number of expanded prefixes corresponding to one non-expanded prefix = 2^(k-1).
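Prefix expansion can be sketched as padding a prefix with every combination of bits up to the next multiple of the stride. The function name `expand` is illustrative, and prefixes are written as bit strings.

```python
from itertools import product

def expand(prefix, k):
    """Expand a bit-string prefix to the next length that is a multiple of k."""
    pad = (-len(prefix)) % k                  # number of bits to fill in
    return [prefix + "".join(bits) for bits in product("01", repeat=pad)]

print(expand("0", 2))     # 0*  -> 00*, 01*
print(expand("11", 2))    # 11* is already a multiple of k
print(len(expand("1", 4)))  # worst case: 2^(k-1) expanded prefixes
```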
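40 Four-ary Trie (k=2)
Prefix table: P1 = 111* -> H1; P2 = 10* -> H2; P3 = 1010* -> H3; P4 = 10101 -> H4.
A four-ary trie node: next-hop-ptr (if prefix), ptr00, ptr01, ptr10, ptr11.
Diagram: nodes A to H; edges are labeled with 2-bit strides (10, 11); P1 and P4 each appear twice because of prefix expansion.
Example: Lookup 10111.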
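A multi-bit trie with a fixed stride can be sketched as follows. Each node is an array of 2^k slots, prefixes are expanded on insertion, and an expanded entry never overwrites one that came from a longer original prefix. This is an illustrative sketch under two assumptions: prefixes are non-empty, and lookup addresses are bit strings whose length is a multiple of the stride (the 5-bit examples are padded with a 0 here). All names are hypothetical.

```python
from itertools import product

class MultibitTrie:
    def __init__(self, stride):
        self.k = stride
        self.root = self._new_node()

    def _new_node(self):
        # One slot per k-bit value: [next_hop, original_prefix_length, child]
        return [[None, -1, None] for _ in range(1 << self.k)]

    def insert(self, prefix, next_hop):
        if not prefix:
            raise ValueError("sketch assumes non-empty prefixes")
        pad = (-len(prefix)) % self.k
        for tail in product("01", repeat=pad):        # prefix expansion
            self._insert_expanded(prefix + "".join(tail), len(prefix), next_hop)

    def _insert_expanded(self, bits, orig_len, next_hop):
        chunks = [bits[i:i + self.k] for i in range(0, len(bits), self.k)]
        node = self.root
        for chunk in chunks[:-1]:
            slot = node[int(chunk, 2)]
            if slot[2] is None:
                slot[2] = self._new_node()
            node = slot[2]
        slot = node[int(chunks[-1], 2)]
        if orig_len >= slot[1]:                       # longer original prefix wins
            slot[0], slot[1] = next_hop, orig_len

    def lookup(self, addr):
        node, best = self.root, None
        for i in range(0, len(addr) - self.k + 1, self.k):   # one access per stride
            slot = node[int(addr[i:i + self.k], 2)]
            if slot[0] is not None:
                best = slot[0]
            node = slot[2]
            if node is None:
                break
        return best

trie = MultibitTrie(stride=2)
for prefix, hop in [("111", "H1"), ("10", "H2"), ("1010", "H3"), ("10101", "H4")]:
    trie.insert(prefix, hop)
print(trie.lookup("101110"))      # slide's lookup 10111, padded to 6 bits
```

Compared with the binary trie, the same lookup touches at most three nodes instead of five, at the cost of replicated entries for P1 and P4.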
41 Prefix Expansion Increases Storage Consumption Replication of next-hop ptr. Greater number of unused (null) pointers in a node. Time ~ W/k. Storage ~ NW/k * 2^(k-1).
42 Generalization: Different Strides at Each Trie Level Example splits: 16-8-8, 4-10-10-8, 24-8, 21-3-8. Optional Exercise: Why does this not work well for IPv6?
43 Choice of Strides: Controlled Prefix Expansion [Sri98]
Given a forwarding table and a desired number of memory accesses in the worst case (i.e., maximum tree depth, D), a dynamic programming algorithm computes the optimal sequence of strides that minimizes the storage requirements: runs in O(W^2 D) time.
Advantages: Optimal storage under these constraints.
Disadvantages: Updates lead to sub-optimality anyway. Hardware implementation difficult.
47 Multiway Search on Intervals
W-bit N prefixes: O(logN) lookup, O(N) storage.
Advantages: Storage is linear. Can be balanced. Lookup time independent of W.
Disadvantages: But, lookup time is dependent on N. Incremental updates complex. Each node is big in size: requires higher memory bandwidth.
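Search on intervals can be sketched by cutting the address line at every prefix boundary, precomputing the longest matching prefix for each resulting interval, and then searching the sorted interval starts. The sketch below uses plain binary search rather than a multiway search, a 5-bit address space, and the prefix table P1 to P4 from the trie slides; these choices and all names are assumptions made for illustration.

```python
from bisect import bisect_right

W = 5                                            # toy address width
PREFIXES = {"111": "H1", "10": "H2", "1010": "H3", "10101": "H4"}

def prefix_range(prefix):
    """First and last address covered by a bit-string prefix."""
    return int(prefix.ljust(W, "0"), 2), int(prefix.ljust(W, "1"), 2)

def build_intervals(table):
    """Preprocessing: interval start points and the best next hop for each."""
    points = {0}
    for prefix in table:
        lo, hi = prefix_range(prefix)
        points.add(lo)
        if hi + 1 < 2 ** W:
            points.add(hi + 1)
    starts = sorted(points)
    hops = []
    for start in starts:                         # brute force is fine offline
        best_len, best_hop = -1, None
        for prefix, hop in table.items():
            lo, hi = prefix_range(prefix)
            if lo <= start <= hi and len(prefix) > best_len:
                best_len, best_hop = len(prefix), hop
        hops.append(best_hop)
    return starts, hops

STARTS, HOPS = build_intervals(PREFIXES)

def lookup(addr):
    """O(log N) search for the interval containing addr."""
    return HOPS[bisect_right(STARTS, addr) - 1]

print(STARTS)
print(lookup(0b10111))
```

Because every prefix boundary is an interval start, the best match is constant inside each interval. The preprocessing step also shows why incremental updates are awkward: one new prefix can change the answer for many intervals.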
48 Routing Lookups: References
[lulea98] A. Brodnik, S. Carlsson, M. Degermark, S. Pink. Small Forwarding Tables for Fast Routing Lookups, Sigcomm 1997, pp 3-14. [Example of techniques for decreasing storage consumption]
[gupta98] P. Gupta, S. Lin, N. McKeown. Routing lookups in hardware at memory access speeds, Infocom 1998, pp 1241-1248, vol. 3. [Example of hardware-optimized trie with increased storage consumption]
P. Gupta, B. Prabhakar, S. Boyd. Near-optimal routing lookups with bounded worst case performance, Proc. Infocom, March 2000. [Example of deliberately skewing alphabetic trees]
P. Gupta. Algorithms for routing lookups and packet classification, PhD Thesis, Ch 1 and 2, Dec 2000, available at http://yuba.stanford.edu/~pankaj/phd.html [Background and introduction to LPM]
49 Routing Lookups: References (contd)
[lampson98] B. Lampson, V. Srinivasan, G. Varghese. IP lookups using multiway and multicolumn search, Infocom 1998, pp 1248-56, vol. 3.
[LC-trie] S. Nilsson, G. Karlsson. Fast address lookup for Internet routers, IFIP Intl Conf on Broadband Communications, Stuttgart, Germany, April 1-3, 1998.
[sri98] V. Srinivasan, G. Varghese. Fast IP lookups using controlled prefix expansion, Sigmetrics, June 1998.
[wald98] M. Waldvogel, G. Varghese, J. Turner, B. Plattner. Scalable high speed IP routing lookups, Sigcomm 1997, pp 25-36.