Basic Data Structures for IP lookups and Packet Classification

Routing table examples

IPv4 prefixes:
4.0.0.0/8 6.0.0.0/8 9.2.0.0/16 9.20.0.0/17
12.0.0.0/8 13.0.0.0/8 15.0.0.0/8 16.0.0.0/8
17.0.0.0/8 18.0.0.0/8 20.0.0.0/8 24.0.0.0/18
24.0.0.0/14 24.1.0.0/17 24.4.0.0/17 24.48.0.0/18

IPv6 prefixes:
2001:0200:0136::/48 2001:0200:0900::/40 2001:0200:0905::/48 2001:0200:0c00::/40
2001:0200::/32 2001:0200:c000::/35 2001:0200:e000::/35 2001:0208::/32
2001:0218::/32 2001:0218:6002::/48 2001:0220::/35 2001:0238::/32
2001:0240::/32 2001:0250:0204::/48 2001:0250::/32 2001:0250:e000::/36

oix-route-views - Route Views Archive http://archive.routeviews.org/oix-route-views/

* 3.0.0.0 203.181.248.233 0 7660 1 7018 80 i
* 4.0.0.0 203.194.0.5 0 9942 1 i
* 6.1.0.0/16 203.194.0.5 0 9942 1 7170 1455 i
* 6.2.0.0/22 203.194.0.5 0 9942 1 7170 1455 i
* 6.3.0.0/18 203.194.0.5 0 9942 1 7170 1455 i
* 6.4.0.0/16 203.194.0.5 0 9942 1 7170 1455 i
* 6.5.0.0/19 203.194.0.5 0 9942 1 7170 1455 i
* 6.8.0.0/20 203.194.0.5 0 9942 1 7170 1455 i
* 6.9.0.0/20 203.194.0.5 0 9942 1 7170 1455 i
* 6.10.0.0/15 203.194.0.5 0 9942 1 7170 1455 i
* 6.14.0.0/15 203.194.0.5 0 9942 1 7170 1455 i
* 9.2.0.0/16 203.194.0.5 0 9942 1239 701 i
* 9.184.112.0/20 203.194.0.5 0 9942 3786 i
* 9.186.144.0/20 203.194.0.5 0 9942 3786 i
* 12.0.0.0 203.194.0.5 0 9942 1239 7018 i
* 12.0.48.0/20 203.194.0.5 0 9942 16631 16631 16631 1742 i

Routing table format (1/3)
- Destination: The IP address of the packet's final destination.
- Next hop: The IP address to which the packet is forwarded.
- Interface: The outgoing network interface the device should use when forwarding the packet to the next hop or final destination.
- Metric: Assigns a cost to each available route so that the most cost-effective path can be chosen.

Routing table format (2/3)
Routes: Includes (1) directly attached subnets, (2) indirect subnets that are not attached to the device but can be accessed through one or more hops, and (3) default routes to use for certain types of traffic or when information is lacking.

Routing table format (3/3)
Routing tables can be maintained manually or dynamically.
- Tables for static network devices do not change unless an administrator manually changes them.
- In dynamic routing, devices build and maintain their routing tables automatically by using routing protocols to exchange information about the surrounding network topology.
- Dynamic routing tables allow devices to "listen" to the network and respond to occurrences like device failures and network congestion.

Prefix
[Figure: prefix length distribution]

Prefix
- Length format: $b_{n-1} \dots b_0 / l$ ($l$ is the prefix length).
  - In IPv4, $d_3.d_2.d_1.d_0 / l$ can also be used.
- Mask format: $b_{n-1} \dots b_0 / m_{n-1} \dots m_0$ (prefix length is $l$).
  - $m_j = 1$ for all $n - 1 \ge j \ge n - l$, and $m_j = 0$ otherwise.
  - $d_3.d_2.d_1.d_0 / m_3.m_2.m_1.m_0$ for IPv4.
- Ternary format: $b_{n-1} \dots b_{n-l} * \dots *$ (prefix length is $l$).
  - $b_j = 0$ or $1$ for $n - 1 \ge j \ge n - l$.
  - If the ternary digit $t_k$ is *, then $t_j$ must also be * for all $j < k$.
  - A single don't-care bit can be used to denote a series of don't-care bits, e.g., 1* denotes 1**** in the 5-bit address space.
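The three notations describe the same prefix, so converting between them is mechanical. The sketch below is illustrative only: the helper names (`ipv4_bits`, `mask_bits`, `ternary`, `dotted`) are assumptions, and prefixes are handled as bit strings for readability rather than speed.

```python
def ipv4_bits(address: str) -> str:
    """Turn a dotted-decimal IPv4 address into a 32-character bit string."""
    return "".join(f"{int(octet):08b}" for octet in address.split("."))


def mask_bits(n: int, length: int) -> str:
    """Mask format: `length` ones followed by n - length zeros."""
    return "1" * length + "0" * (n - length)


def ternary(bits: str, length: int) -> str:
    """Ternary format: keep the first `length` bits, replace the rest with *."""
    return bits[:length] + "*" * (len(bits) - length)


def dotted(bits32: str) -> str:
    """Render a 32-character bit string as d3.d2.d1.d0."""
    return ".".join(str(int(bits32[i:i + 8], 2)) for i in range(0, 32, 8))


# 9.2.0.0/16 from the routing table examples, in mask and ternary format.
print(dotted(mask_bits(32, 16)))
print(ternary(ipv4_bits("9.2.0.0"), 16))
```

Bit strings keep the correspondence with the slide notation visible; a forwarding engine would normally work on integers and shifts instead.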

Prefix
- (n+1)-bit format: $b_{n-1} \dots b_{n-l} 1 0 \dots 0$ ($l$ is the prefix length).
  - For the prefix $b_{n-1} \dots b_{n-l} *$ of length $l$ in ternary format, there is one trailing '1' followed by $n - l$ 0's.
- Or (n+1)-bit format: $b_{n-1} \dots b_{n-l} 0 1 \dots 1$.
  - For the prefix $b_{n-1} \dots b_{n-l} *$ of length $l$ in ternary format, there is one trailing '0' followed by $n - l$ 1's.

5-bit prefixes in the (n+1)-bit format $b_{n-1} \dots b_{n-l} 1 0 \dots 0$
[Figure: 5-bit prefixes placed in the 6-bit binary address space; 000000 is not used. The figure also marks the prefixes ***** and 0****.]
Prefix -> 6-bit code:
  11111 -> 111111    00011 -> 000111
  11110 -> 111101    00010 -> 000101
  11101 -> 111011    00001 -> 000011
  11100 -> 111001    00000 -> 000001
  1111* -> 111110    0001* -> 000110
  1110* -> 111010    0000* -> 000010
  111** -> 111100    000** -> 000100
  11*** -> 111000    00*** -> 001000

5-bit prefixes in the (n+1)-bit format $b_{n-1} \dots b_{n-l} 0 1 \dots 1$
[Figure: 5-bit prefixes placed in the 6-bit binary address space; 111111 is not used. The figure also marks the prefixes ***** and 0****.]
Prefix -> 6-bit code:
  11111 -> 111110    00011 -> 000110
  11110 -> 111100    00010 -> 000100
  11101 -> 111010    00001 -> 000010
  11100 -> 111000    00000 -> 000000
  1111* -> 111101    0001* -> 000101
  1110* -> 111001    0000* -> 000001
  111** -> 111011    000** -> 000011
  11*** -> 110111    00*** -> 000111
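A small sketch of the two (n+1)-bit encodings may help when checking the figure values by hand. The function names are assumptions; the input is a prefix in ternary format whose width equals the address width n.

```python
def encode_10(ternary_prefix: str) -> str:
    """Prefix bits, then a single 1, then n - l zeros (the 10...0 format)."""
    bits = ternary_prefix.rstrip("*")
    n = len(ternary_prefix)
    return bits + "1" + "0" * (n - len(bits))


def encode_01(ternary_prefix: str) -> str:
    """Prefix bits, then a single 0, then n - l ones (the 01...1 format)."""
    bits = ternary_prefix.rstrip("*")
    n = len(ternary_prefix)
    return bits + "0" + "1" * (n - len(bits))


def decode_10(code: str) -> str:
    """Invert encode_10: the last 1 in the code is the marker bit."""
    n = len(code) - 1
    marker = code.rindex("1")
    return code[:marker] + "*" * (n - marker)


for prefix in ["1110*", "11***", "00000"]:
    print(prefix, encode_10(prefix), encode_01(prefix))
```

Because every prefix becomes a fixed-width binary number, prefixes of different lengths can be stored and compared as ordinary (n+1)-bit integers.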

Prefix properties
- Disjoint prefixes: Two prefixes are said to be disjoint if they do not share any address.
- Prefix enclosure: Let $A = b_{n-1} \dots b_j \dots b_i *$ and $B = b_{n-1} \dots b_j *$ with $j > i$.
  - Prefix A is enclosed by B ($B \supset A$) since the IP address space covered by A is a subset of that covered by B, where $\supset$ is the enclosure operator.
  - Enclosure is a special case of overlapping.
- Prefix comparison: The inequality 0 < * < 1 is used to compare two prefixes in the ternary representation of prefixes.

Prefix properties
- The most specific prefixes (MSP): the prefixes that do not cover any others.
  - They are disjoint, so they can be put in an array for binary search.
- Grouping prefixes in layers based on MSP.
  - Six layers at most for IPv4 tables.
[Figure: prefixes labeled with their layer numbers, 1 to 5]
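One way to read the layering is as repeated peeling: take the prefixes that cover no other remaining prefix as the current layer, remove them, and repeat. The sketch below follows that reading, which is an assumption about how the layers are formed. Prefixes are written as bit strings without the trailing *, and the function names are illustrative.

```python
def covers(a: str, b: str) -> bool:
    """True if prefix a strictly encloses prefix b (a is a proper prefix of b)."""
    return len(a) < len(b) and b.startswith(a)


def msp_layers(prefixes):
    """Peel off the most specific prefixes repeatedly; returns a list of layers."""
    remaining = set(prefixes)
    layers = []
    while remaining:
        layer = {p for p in remaining
                 if not any(covers(p, q) for q in remaining)}
        layers.append(sorted(layer))
        remaining -= layer
    return layers


# Hypothetical input: the prefixes 111*, 10*, 1010* and 10101.
print(msp_layers(["111", "10", "1010", "10101"]))
```

The quadratic scan is only meant to make the definition concrete; it is not how a large table would be processed.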

Prefix properties

Database (year-month)   AS6447 (2000-4)   AS6447 (2002-4)   AS6447 (2005-4)
Number of prefixes      79,530            124,798           163,535
Level-1 prefixes        73,891 (92.9%)    114,745 (91.9%)   150,245 (91.9%)
Level-2 prefixes        4,874 (6.1%)      8,496 (6.8%)      11,135 (6.8%)
Level-3 prefixes        642 (0.8%)        1,290 (1%)        1,775 (1.1%)
Level-4 prefixes        104 (0.1%)        235 (0.2%)        329 (0.2%)
Level-5 prefixes        17                29                45
Level-6 prefixes        2                 3                 6

Prefix properties
[Figure: number of prefixes versus prefix length]

Forwarding table example

      Prefix   Next-hop
P1    111*     H1
P2    10*      H2
P3    1010*    H3
P4    10101    H4

- P1 is disjoint from the other three prefixes.
- $P2 \supset P3 \supset P4$.
- Lookup is a longest prefix match (LPM), not an exact match.
- Enclosure makes (1) sorting prefixes and (2) binary searching prefixes difficult.

Example Forwarding Table

      Prefix   Next-hop
P1    111*     H1
P2    10*      H2
P3    1010*    H3
P4    10101    H4

- Lookup is a longest prefix match (LPM), not an exact match.
- Prefix enclosure makes (1) sorting prefixes and (2) binary searching prefixes difficult.
- So, trie-based schemes emerge naturally.

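Binary Trie (Radix Trie)

      Prefix   Next-hop
P1    111*     H1
P2    10*      H2
P3    1010*    H3
P4    10101    H4

[Figure: binary trie with nodes A to H and edges labeled 0 or 1; P1 to P4 are stored at the nodes reached by their bits. Example operations: lookup 10111; add P5 = 1110*, which creates a new node I on a 0 edge.]
Trie node: next-hop-ptr (if prefix), left-ptr, right-ptr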
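A minimal sketch of such a trie, using the three node fields named on the slide. Class and method names are assumptions; addresses and prefixes are bit strings, and the lookup remembers the last next hop seen on the path, which yields the longest prefix match.

```python
class TrieNode:
    def __init__(self):
        self.next_hop = None  # set only when this node represents a prefix
        self.left = None      # child for bit 0
        self.right = None     # child for bit 1


class BinaryTrie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, prefix_bits: str, next_hop: str) -> None:
        node = self.root
        for bit in prefix_bits:
            if bit == "0":
                if node.left is None:
                    node.left = TrieNode()
                node = node.left
            else:
                if node.right is None:
                    node.right = TrieNode()
                node = node.right
        node.next_hop = next_hop

    def lookup(self, address_bits: str):
        """Walk the address bits and keep the last prefix seen on the way."""
        node = self.root
        best = node.next_hop
        for bit in address_bits:
            node = node.left if bit == "0" else node.right
            if node is None:
                break
            if node.next_hop is not None:
                best = node.next_hop
        return best


trie = BinaryTrie()
for bits, hop in [("111", "H1"), ("10", "H2"), ("1010", "H3"), ("10101", "H4")]:
    trie.insert(bits, hop)
print(trie.lookup("10111"))  # walk stops below 101; the last prefix seen is P2
trie.insert("1110", "H5")    # P5 = 1110*; the next hop name H5 is made up
print(trie.lookup("11101"))
```

The lookup cost is bounded by the address width, which is why wider strides and compression are common refinements of this structure.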

National Cheng Kung University, Department of Computer Science and Information Engineering, CIAL Lab

Binary prefix search
Definition 1 (Prefix comparison): The inequality 0 < * < 1 is used to compare two prefixes in the ternary format.

Binary prefix search
Directly performing a binary search on the list of sorted prefixes may fail.
[Figure: binary search for Dst = 01011000 over the sorted prefix list; search steps 1 to 4 are marked, together with the correct match and the failed match.]

Binary prefix search
- The enclosure relationship between prefixes causes the search failure.
- The remedy is to generate auxiliary prefixes that inherit the routing information of the original LPM (e.g., F) and put them where the binary search operations can find them, for example the auxiliary prefix 01011000.
- Therefore, it is feasible to split prefix F into two parts such that both sides of prefix O are covered.

Binary prefix search
- The full tree expansion: splits the enclosure prefixes into many longer prefixes (leaf pushing).
- Auxiliary prefix merges: many auxiliary prefixes may inherit the same routing information of a common enclosure prefix. These prefixes can be merged into one. The merge operation is defined as follows.
- Prefix merge: The prefix obtained by merging a set of consecutive prefixes is the longest common ancestor (LCA) of these consecutive prefixes in the binary trie.
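The expansion step can be sketched as follows. This is an illustration under assumptions, not the slides' exact procedure: it performs only leaf pushing (no merge step), uses a hypothetical 5-bit table with prefixes 111*, 10*, 1010* and 10101, and all function names are made up. After the expansion the prefixes are disjoint, so an ordinary binary search over their start addresses finds the match.

```python
import bisect


def leaf_push(prefixes):
    """Expand a {prefix_bits: next_hop} table into disjoint (prefix, hop) pairs."""
    result = []

    def walk(path, inherited):
        hop = prefixes.get(path, inherited)
        has_longer = any(p != path and p.startswith(path) for p in prefixes)
        if not has_longer:
            if hop is not None:
                result.append((path, hop))
            return
        walk(path + "0", hop)  # both children inherit the enclosing next hop
        walk(path + "1", hop)

    walk("", None)
    return result


def build(prefixes, n):
    """Turn the disjoint prefixes into sorted (start, finish, hop) ranges."""
    table = [(int(p.ljust(n, "0"), 2), int(p.ljust(n, "1"), 2), hop)
             for p, hop in leaf_push(prefixes)]
    starts = [start for start, _, _ in table]
    return starts, table


def lookup(starts, table, address):
    i = bisect.bisect_right(starts, address) - 1
    if i >= 0 and table[i][0] <= address <= table[i][1]:
        return table[i][2]
    return None


table5 = {"111": "H1", "10": "H2", "1010": "H3", "10101": "H4"}
starts, ranges = build(table5, 5)
print(leaf_push(table5))
print(lookup(starts, ranges, 0b10111))
```

Here 10* is split into 100*, 1011* and the part taken over by 1010*, which is the kind of growth the merge operation is meant to limit.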

Binary prefix search
[Figure: the full tree expansion; F3 = 01011000]

Binary prefix search
[Figure: the full tree after the merge operations; F3 = 01011000]

Binomial spanning tree
[Figure: a 4-cube and its corresponding binomial spanning tree, rooted at 0000 with levels 0 to 3; labeled nodes include 1000, 1100, 1110, and 1111.]

Perfect code: Hamming code (7, 4)
7-cube example: 0000000 with 1000000, 0100000, 0010000, 0001000, 0000100, 0000010, 0000001
$2^4$ (16) one-level binomial spanning trees = 7-cube

Perfect code: Hamming code (7, 4)
r = received code
Syndrome $s = (s_2 s_1 s_0) = r \cdot H_7^T$ (inner product with the transpose of $H_7$)
Corrected code = r + ErrorPattern[s]

$$H_7 = \begin{pmatrix} 1&1&0&1&1&0&0 \\ 1&0&1&1&0&1&0 \\ 0&1&1&1&0&0&1 \end{pmatrix}, \qquad G_7 = \begin{pmatrix} 1&0&0&0&1&1&0 \\ 0&1&0&0&1&0&1 \\ 0&0&1&0&0&1&1 \\ 0&0&0&1&1&1&1 \end{pmatrix}$$

(a) Parity-check and generator matrices of Hamming code (7, 4).

Syndrome   ErrorPattern
000        0000-000
001        0000-001
010        0000-010
011        0010-000
100        0000-100
101        0100-000
110        1000-000
111        0001-000

(c) Decoding table

Perfect code: Hamming code (7, 4)
Generate the 16 codewords as $u \cdot G_7$.

u      Codeword
0000   0000-000
0001   0001-111
0010   0010-011
0011   0011-100
0100   0100-101
0101   0101-010
0110   0110-110
0111   0111-001
1000   1000-110
1001   1001-001
1010   1010-101
1011   1011-010
1100   1100-011
1101   1101-100
1110   1110-000
1111   1111-111
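The encoding and decoding rules can be checked with a few lines of code. The sketch below uses the $G_7$ and $H_7$ matrices of the slides, with all arithmetic modulo 2; the function names are assumptions, and the error-pattern table is derived from the columns of $H_7$ rather than typed in.

```python
G7 = [
    [1, 0, 0, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]
H7 = [
    [1, 1, 0, 1, 1, 0, 0],
    [1, 0, 1, 1, 0, 1, 0],
    [0, 1, 1, 1, 0, 0, 1],
]


def encode(u):
    """Codeword = u . G7 over GF(2)."""
    return [sum(u[i] * G7[i][j] for i in range(4)) % 2 for j in range(7)]


def syndrome(r):
    """s = (s2, s1, s0) = r . H7^T over GF(2)."""
    return tuple(sum(r[j] * row[j] for j in range(7)) % 2 for row in H7)


# Decoding table: the syndrome of each single-bit error selects that error.
ERROR_PATTERN = {(0, 0, 0): [0] * 7}
for position in range(7):
    error = [0] * 7
    error[position] = 1
    ERROR_PATTERN[syndrome(error)] = error


def correct(r):
    """Corrected code = r + ErrorPattern[s] over GF(2)."""
    error = ERROR_PATTERN[syndrome(r)]
    return [(a + b) % 2 for a, b in zip(r, error)]


codeword = encode([1, 0, 1, 1])
received = codeword.copy()
received[2] ^= 1  # flip one bit in transit
print(codeword, syndrome(received), correct(received))
```

Every one of the eight syndromes maps to exactly one error pattern of weight at most one, which is what makes the code perfect.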

Perfect code: Golay code (23, 12)
$2^{12}$ 3-level binomial spanning trees
$C(23,0) + C(23,1) + C(23,2) + C(23,3) = 1 + 23 + 23 \cdot 22/2 + 23 \cdot 22 \cdot 21/(3 \cdot 2) = 24 + 23 \cdot 11 + 23 \cdot 11 \cdot 7 = 24 + 253 \cdot 8 = 24 + 2024 = 2048 = 2^{11}$
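The counting argument can be verified numerically: a code with $2^k$ codewords of length n that corrects t errors is perfect when the spheres of radius t around the codewords exactly fill the n-cube. The helper names below are assumptions.

```python
from math import comb


def sphere_size(n: int, t: int) -> int:
    """Number of n-bit words within Hamming distance t of a given word."""
    return sum(comb(n, i) for i in range(t + 1))


def is_perfect(n: int, k: int, t: int) -> bool:
    """True if 2^k spheres of radius t tile the whole n-cube."""
    return 2 ** k * sphere_size(n, t) == 2 ** n


print(sphere_size(7, 1), is_perfect(7, 4, 1))      # Hamming code (7, 4)
print(sphere_size(23, 3), is_perfect(23, 12, 3))   # Golay code (23, 12)
```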

Ranges
- Why ranges?
  - Prefixes can also be represented by ranges.
  - The source/destination port fields of rule tables for packet classification are ranges.
- Prefixes are special cases of ranges. The prefix $b_{n-1} \dots b_{n-l} *$ of length $l$ is the range of addresses from $b_{n-1} \dots b_{n-l} 0 \dots 0$ to $b_{n-1} \dots b_{n-l} 1 \dots 1$, denoted as $[\, b_{n-1} \dots b_{n-l} 0 \dots 0,\ b_{n-1} \dots b_{n-l} 1 \dots 1 \,]$.
- Overlapping: Two ranges are overlapping if they are not disjoint.
- Partially overlapping: Two ranges are partially overlapping if they are neither disjoint nor enclosing.
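A short sketch of the prefix-to-range conversion and of the three relations between ranges. The function names and the 6-bit sample values are illustrative assumptions.

```python
def prefix_to_range(bits: str, length: int):
    """Range covered by the first `length` bits of an n-bit string."""
    n = len(bits)
    start = int(bits[:length] + "0" * (n - length), 2)
    finish = int(bits[:length] + "1" * (n - length), 2)
    return start, finish


def relation(a, b) -> str:
    """Classify two closed ranges as disjoint, enclosing, or partially overlapping."""
    (e1, f1), (e2, f2) = a, b
    if f1 < e2 or f2 < e1:
        return "disjoint"
    if (e1 <= e2 and f2 <= f1) or (e2 <= e1 and f1 <= f2):
        return "enclosing"
    return "partially overlapping"


print(prefix_to_range("000100", 4))
print(relation((0, 15), (4, 7)))
print(relation((0, 10), (5, 20)))
```

Two prefixes are always either disjoint or enclosing; partial overlap only arises for general ranges such as port ranges.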

Elementary Intervals for Ranges
Definition: Let the set of k elementary intervals constructed from a set R of ranges in the address space 0 ... N - 1 be $X = \{X_i \mid X_i = [e_i, f_i], \text{ for } i = 1 \text{ to } k\}$. X must satisfy the following:
1) $e_1 = 0$ and $f_k = N - 1$,
2) $f_i = e_{i+1} - 1$ for i = 1 to k - 1,
3) all addresses in $X_i$ are covered by the same subset of R (called the range matching set of $X_i$), denoted by $EI_i$, and
4) $EI_i \ne EI_{i+1}$ for i = 1 to k - 1.

Elementary Intervals for Ranges

                             Minus-1            Traditional
ID   Prefix     Range        start   finish     start   finish
P1   000000/2   [0, 15]      -       15         0       15
P2   010000/2   [16, 31]     15      31         16      31
P3   000100/4   [4, 7]       3       7          4       7
P4   100000/1   [32, 63]     31      -          32      63
P5   010110/5   [22, 23]     21      23         22      23
P6   110000/2   [48, 63]     47      -          48      63
P7   110000/4   [48, 51]     47      51         48      51
P8   110111/6   [55, 55]     54      55         55      55
P9   100000/3   [32, 39]     31      39         32      39

Elementary Intervals for Ranges
Graphical view:

X1  [0, 3]     EI1  {P1}
X2  [4, 7]     EI2  {P1, P3}
X3  [8, 15]    EI3  {P1}
X4  [16, 21]   EI4  {P2}
X5  [22, 23]   EI5  {P2, P5}
X6  [24, 31]   EI6  {P2}
X7  [32, 39]   EI7  {P4, P9}
X8  [40, 47]   EI8  {P4}
X9  [48, 51]   EI9  {P4, P6, P7}
X10 [52, 54]   EI10 {P4, P6}
X11 [55, 55]   EI11 {P4, P6, P8}
X12 [56, 63]   EI12 {P4, P6}
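The definition translates into a simple construction: cut the address space at every range start and just after every range finish, compute the matching set of each piece, and merge neighbors with equal sets so that condition 4 holds. The sketch below applies this to the nine ranges P1 to P9; the function and variable names are assumptions.

```python
RANGES = {
    "P1": (0, 15), "P2": (16, 31), "P3": (4, 7),
    "P4": (32, 63), "P5": (22, 23), "P6": (48, 63),
    "P7": (48, 51), "P8": (55, 55), "P9": (32, 39),
}


def elementary_intervals(ranges, n_addresses):
    """Return [((e, f), matching_set), ...] covering 0 .. n_addresses - 1."""
    cuts = {0, n_addresses}
    for start, finish in ranges.values():
        cuts.add(start)        # an interval begins where a range begins
        cuts.add(finish + 1)   # and just after a range ends
    points = sorted(cuts)
    result = []
    for e, nxt in zip(points, points[1:]):
        f = nxt - 1
        matching = frozenset(name for name, (s, t) in ranges.items()
                             if s <= e and f <= t)
        if result and result[-1][1] == matching:
            # Merge with the previous piece so adjacent sets always differ.
            (prev_e, _), _ = result[-1]
            result[-1] = ((prev_e, f), matching)
        else:
            result.append(((e, f), matching))
    return result


for (e, f), matching in elementary_intervals(RANGES, 64):
    print(f"[{e}, {f}]", sorted(matching))
```

Once the intervals are sorted by start address, a lookup reduces to a binary search over at most 2|R| + 1 intervals.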

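Segment Tree
[Figure: segment tree built over the elementary intervals. The leaf nodes are X1 [0,3], X2 [4,7], X3 [8,15], X4 [16,21], X5 [22,23], X6 [24,31], X7 [32,39], X8 [40,47], X9 [48,51], X10 [52,54], X11 [55,55], and X12 [56,63]. The internal nodes are labeled y, w, z, u, v, h, q, g, r, s, t and carry the keys 3, 7, 15, 21, 23, 31, 39, 47, 51, 54, 55. The ranges P1 to P9 are attached to tree nodes.]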

Interval Tree
Each node in an interval tree is associated with a key that must be covered by at least one range. Interval trees are called fat or thin, depending on whether a node may store more than one range or exactly one.
Fat interval tree: each node is allowed to store more than one range.
- The number of nodes in the interval tree is O(N).
- To insert a range R = [e, f]: if R covers the root's key, R is stored in the root. Otherwise, R is inserted into the left subtree of the root when f is smaller than the root's key, or into the right subtree when e is larger than the root's key.
- When R does not cover the key of any traversed node, a new node whose key is selected from the addresses e to f is created and inserted as the left or right child of the last visited node.
- A search takes O(log N + k) time, where k is the number of prefixes that match the given address.
- Prefix insertion and deletion are very expensive because ranges in some nodes may need to be relocated after tree rotations.
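The insertion rule above can be sketched as follows. This is a simplified illustration under assumptions: the tree is not balanced (so the stated time bounds do not apply to it), the key of a new node is taken as the midpoint of the range, each node keeps its ranges in an unsorted list, and all names are made up.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.ranges = []   # every range stored here covers self.key
        self.left = None
        self.right = None


class FatIntervalTree:
    def __init__(self):
        self.root = None

    @staticmethod
    def _new_node(name, e, f):
        node = Node((e + f) // 2)  # assumption: pick the midpoint as the key
        node.ranges.append((name, e, f))
        return node

    def insert(self, name, e, f):
        if self.root is None:
            self.root = self._new_node(name, e, f)
            return
        node = self.root
        while True:
            if e <= node.key <= f:
                node.ranges.append((name, e, f))
                return
            if f < node.key:
                if node.left is None:
                    node.left = self._new_node(name, e, f)
                    return
                node = node.left
            else:
                if node.right is None:
                    node.right = self._new_node(name, e, f)
                    return
                node = node.right

    def search(self, address):
        """Collect every stored range that covers the address."""
        matches = []
        node = self.root
        while node is not None:
            matches += [name for name, e, f in node.ranges if e <= address <= f]
            if address == node.key:
                break  # ranges in either subtree cannot reach the key itself
            node = node.left if address < node.key else node.right
        return matches


tree = FatIntervalTree()
for name, (e, f) in {"P1": (0, 15), "P2": (16, 31), "P3": (4, 7),
                     "P4": (32, 63), "P5": (22, 23), "P6": (48, 63),
                     "P7": (48, 51), "P8": (55, 55), "P9": (32, 39)}.items():
    tree.insert(name, e, f)
print(sorted(tree.search(55)))
```

Only one root-to-leaf path is visited during a search, because ranges in a left subtree end before the node's key and ranges in a right subtree start after it.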

Interval Tree
Thin interval tree: each node of the interval tree stores exactly one range.
Since ranges may overlap, two comparison rules are used to decide whether a range is smaller or larger than another range. For two ranges R1 = [e1, f1] and R2 = [e2, f2]:
- R1 < R2 if e1 < e2. If there is a tie, the second rule applies.
- R1 < R2 if R2 is a subrange of R1 (i.e., e1 = e2 and f2 < f1).
Each node also stores a max value: the maximum of the finish endpoints of all ranges stored in the subtree rooted at that node.
In contrast with the fat interval tree, prefix insertion and deletion take O(log N) time. However, O(min{N, k log N}) time is needed to find the longest matching prefix as well as the highest-priority matching prefix, where k is the number of matched prefixes for a given address.

Hash Table
- Purpose: narrowing down the search space.
- Index = Hash_function(key) % m, where key may be the first few bits of the IP address and m is the size of the hash table.
- Perfect hash: no collisions.
- Minimal perfect hash: a perfect hash whose table size equals the number of distinct hashing keys.

Hash Table
[Figure: keys k1 and k2 are mapped to elements H(k1) % m and H(k2) % m of an array of m elements; keys mapped to the same element collide.]
Difficulties: prefixes and ranges cannot be used directly as the keys of the hash functions.

Hash Table: 8-bit Segmentation table
An 8-bit segmentation table is usually used for IPv4 forwarding tables because there is no prefix of length shorter than 8.
[Figure: H(prefix) % 256, i.e., the 8 most significant bits of the prefix, indexes an array of 256 elements (0 to 255). Each element points to the set of prefixes with the same first 8 bits, e.g., element 0 holds the prefixes 0.x.y.z. A set may be empty.]
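A minimal sketch of an 8-bit segmentation table, assuming each bucket is a plain list that is scanned for the longest match (a real design would put a trie or sorted array behind each bucket). The class name and the next-hop labels A to D are made up; the prefixes come from the routing table examples.

```python
import ipaddress


class SegmentationTable8:
    def __init__(self):
        self.buckets = [[] for _ in range(256)]

    def add(self, cidr: str, next_hop: str) -> None:
        net = ipaddress.ip_network(cidr)
        if net.prefixlen < 8:
            raise ValueError("prefixes shorter than 8 bits need special handling")
        index = int(net.network_address) >> 24   # the 8 most significant bits
        self.buckets[index].append((net, next_hop))

    def lookup(self, address: str):
        ip = ipaddress.ip_address(address)
        best = None
        for net, hop in self.buckets[int(ip) >> 24]:
            if ip in net and (best is None or net.prefixlen > best[0].prefixlen):
                best = (net, hop)
        return best[1] if best else None


table = SegmentationTable8()
table.add("24.0.0.0/14", "A")
table.add("24.0.0.0/18", "B")
table.add("24.1.0.0/17", "C")
table.add("9.2.0.0/16", "D")
print(table.lookup("24.0.1.1"), table.lookup("24.1.200.1"), table.lookup("10.0.0.1"))
```

The first-level index costs one memory access and leaves only the prefixes sharing the first octet to be searched.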

Hash Table: 16-bit Segmentation table
Prefixes of length shorter than 16 must be stored properly.
- For example, duplicate 0.0.b.c/15 into buckets 0 and 1, or store the port of 0.0.b.c/15 in elements 0 and 1.
- Alternatively, put them into another set (good for updates, but two sets must be searched in the worst case).
[Figure: H(prefix) % $2^{16}$, i.e., the 16 most significant bits of the prefix, indexes an array of $2^{16}$ elements (0 to $2^{16} - 1$). Each element points to the set of prefixes of length $\ge$ 16 with the same first 16 bits, e.g., element 0 holds the prefixes 0.0.y.z. A set may be empty.]

Hash Table: Compression
Since there are many empty elements in the segmentation table, we can use a bitmap to compress the segmentation table.
[Figure: a $2^{16}$-bit bitmap (1100...0110011100...011001) containing M 1's, and an array of M elements (0 to M - 1). Each array element points to a set of prefixes with the same first 16 bits, e.g., 0.0.y.z and 0.1.y.z, and must be non-empty.]
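The bitmap lookup can be sketched as a rank query: bit i of the bitmap says whether segment i is non-empty, and the number of 1's below bit i gives the position in the compact array. The class name is an assumption, and a Python integer stands in for the $2^{16}$-bit bitmap; hardware designs typically keep per-word counts so the rank does not require scanning the whole bitmap.

```python
class BitmapCompressedTable:
    def __init__(self, entries):
        """entries maps a segment index to its non-empty prefix set."""
        self.bitmap = 0
        self.compact = []
        for index in sorted(entries):
            self.bitmap |= 1 << index
            self.compact.append(entries[index])

    def get(self, index):
        if not (self.bitmap >> index) & 1:
            return None  # empty segment: no array element is stored for it
        below = self.bitmap & ((1 << index) - 1)
        return self.compact[bin(below).count("1")]  # rank = number of 1's below


segments = {0: ["0.0.y.z"], 1: ["0.1.y.z"], 300: ["1.44.y.z"], 65535: ["255.255.y.z"]}
table = BitmapCompressedTable(segments)
print(table.get(300), table.get(2), len(table.compact))
```

Only M array elements are stored instead of $2^{16}$, at the cost of one population count per lookup.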

Bloom filter
$H_1(key) = P_1$, $H_2(key) = P_2$, $H_3(key) = P_3$, $H_4(key) = P_4$, ..., $H_k(key) = P_k$
$H_i()$ is a hash function, e.g., MD5.
[Figure: a bit vector of m bits in which the bits at positions $P_1, \dots, P_k$ are set to 1.]

Bloom filter
After inserting n keys (kn bits), the probability that a particular bit is still 0 is $(1 - 1/m)^{kn}$.
So, the probability of a false positive is
$$p = \left(1 - (1 - 1/m)^{kn}\right)^k \approx \left(1 - e^{-kn/m}\right)^k .$$
The right-hand side is minimized when $k = \ln 2 \times m/n$.
- m/n = 6, k = 4: p = 0.0561
- m/n = 8, k = 6: p = 0.0215
- m/n = 12, k = 8: p = 0.00314
- m/n = 16, k = 11: p = 0.000458
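A minimal Bloom filter sketch together with the approximate false-positive formula $(1 - e^{-kn/m})^k$. Deriving the k hash functions by salting MD5 with the function index is an assumption made for illustration, as are the class and function names; one byte per bit is used for clarity.

```python
import hashlib
import math


class BloomFilter:
    def __init__(self, m: int, k: int):
        self.m, self.k = m, k
        self.bits = bytearray(m)  # one byte per bit, for readability

    def _positions(self, key: str):
        for i in range(self.k):
            digest = hashlib.md5(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest, "big") % self.m

    def add(self, key: str) -> None:
        for p in self._positions(key):
            self.bits[p] = 1

    def __contains__(self, key: str) -> bool:
        return all(self.bits[p] for p in self._positions(key))


def false_positive_rate(m_over_n: float, k: int) -> float:
    """Approximation (1 - e^{-k n / m})^k of the false-positive probability."""
    return (1 - math.exp(-k / m_over_n)) ** k


bf = BloomFilter(m=1024, k=4)
for key in ["4.0.0.0/8", "6.0.0.0/8", "9.2.0.0/16"]:
    bf.add(key)
print("9.2.0.0/16" in bf)
print(false_positive_rate(6, 4), math.log(2) * 6)
```

A membership test never reports an inserted key as absent; the only error is a false positive, whose rate is governed by m/n and k.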

Bloom filter
Update: update the whole SC.
- Threshold: when the digests differ beyond a threshold, say, 5% or 10%.
- Regular time intervals: say, every 5 minutes.

Counting Bloom filter
Deletion operation for the local digest:
- For each bit in the m-bit vector, use an $l$-bit counter to record the number of times that the bit is turned on by different URLs.
- $l = 4$ by experience.
- If deletion is not supported, the cache summary must be rebuilt from scratch on a periodic basis to erase stale bits and prevent bit pollution.
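A counting Bloom filter can be sketched by replacing each bit with a small counter. In the sketch below, counters saturate at $2^l - 1$ and a saturated counter is never decremented, which is one common way to handle overflow and is an assumption here, as are the salted-MD5 hash functions and all names.

```python
import hashlib


class CountingBloomFilter:
    def __init__(self, m: int, k: int, counter_bits: int = 4):
        self.m, self.k = m, k
        self.max_count = (1 << counter_bits) - 1
        self.counters = [0] * m

    def _positions(self, key: str):
        for i in range(self.k):
            digest = hashlib.md5(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest, "big") % self.m

    def add(self, key: str) -> None:
        for p in self._positions(key):
            if self.counters[p] < self.max_count:
                self.counters[p] += 1

    def remove(self, key: str) -> None:
        """Only call this for keys that were added earlier."""
        for p in self._positions(key):
            if 0 < self.counters[p] < self.max_count:
                self.counters[p] -= 1

    def __contains__(self, key: str) -> bool:
        return all(self.counters[p] > 0 for p in self._positions(key))


cbf = CountingBloomFilter(m=1024, k=4)
cbf.add("http://example.org/a")
cbf.add("http://example.org/b")
cbf.remove("http://example.org/a")
print("http://example.org/b" in cbf)
```

With 4-bit counters the structure uses four times the memory of the plain bit vector, in exchange for deletions that do not require a rebuild.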
