# Spring 2006CS 685 Network Algorithmics1 Longest Prefix Matching Trie-based Techniques CS 685 Network Algorithmics Spring 2006.

## Presentation on theme: "Spring 2006CS 685 Network Algorithmics1 Longest Prefix Matching Trie-based Techniques CS 685 Network Algorithmics Spring 2006."— Presentation transcript:

Spring 2006CS 685 Network Algorithmics1 Longest Prefix Matching Trie-based Techniques CS 685 Network Algorithmics Spring 2006

CS 685 Network Algorithmics2 The Problem Given: –Database of prefixes with associated next hops, say: 1000101*  128.44.2.3 01101100*  4.33.2.1 10001*  124.33.55.12 10*  151.63.10.111 01*  4.33.2.1 1000100101*  128.44.2.3 –Destination IP address, e.g. 120.16.8.211 Find: the longest matching prefix and its next hop

Spring 2006CS 685 Network Algorithmics3 Constraints Handle 150,000 prefixes in database Complete lookup in minimum-sized (40-byte) packet transmission time –OC-768 (40 Gbps): 8 nsec High degree of multiplexing—packets from 250,000 flows interleaved Database updated every few milliseconds  performance  number of memory accesses

Spring 2006CS 685 Network Algorithmics4 Basic ("Unibit") Trie Approach Recursive data structure (a tree) Nodes represent prefixes in the database –Root corresponds to prefix of length zero Node for prefix x has three fields: –0 branch: pointer to node for prefix x0 (if present) –1 branch: pointer to node for prefix x1 (if present) –Next hop info for x (if present) Example Database: a: 0*  x b: 01000*  y c: 011*  z d: 1*  w e: 100*  u f: 1100*  z g: 1101*  u h: 1110*  z i: 1111*  x

Spring 2006CS 685 Network Algorithmics5 0 1 0 1 axax 0 1 dwdw 01 0 1 czcz 0 1 0 1 ewew a: 0*  x b: 01000*  y c: 011*  z d: 1*  w e: 100*  u f: 1100*  z g: 1101*  u h: 1110*  z i: 1111*  x 0 1 0 1 0 1 byby 0 1 0 1 0 1 fzfz 0 1 0 1 gugu 0 1 hzhz 0 1 ixix

Spring 2006CS 685 Network Algorithmics6 Trie Search Algorithm typedef struct foo { struct foo *trie_0, *trie_1; NEXTHOPINFO trie_info; } *TRIENODE; NEXTHOPINFO best = NULL; TRIENODE np = root; unsigned int bit = 0x80000000; while (np != NULL) { if (np->trie_info) best = np->trie_info; // check next bit if (addr&bit) np = np->trie_1; else np = np->trie_0; bit >>= 1; } return best;

Spring 2006CS 685 Network Algorithmics7 Conserving Space Sparse database  wasted space –Long chains of trie nodes with only one non-NULL pointer –Solution: handle "one-way" branches with special nodes encode the bits corresponding to the missing nodes using text strings

Spring 2006CS 685 Network Algorithmics8 0 1 0 1 axax 0 1 dwdw 01 0 1 czcz 0 1 0 1 eueu a: 0*  x b: 01000*  y c: 011*  z d: 1*  w e: 100*  u f: 1100*  z g: 1101*  u h: 1110*  z i: 1111*  x 0 1 0 1 0 1 byby 0 1 0 1 0 1 fzfz 0 1 0 1 gugu 0 1 hzhz 0 1 ixix

Spring 2006CS 685 Network Algorithmics9 0 1 0 1 axax 0 1 dwdw 01 0 1 czcz 0 1 0 1 eueu a: 0*  x b: 01000*  y c: 011*  z d: 1*  w e: 100*  u f: 1100*  z g: 1101*  u h: 1110*  z i: 1111*  x 0 1 byby 0 1 0 1 0 1 fzfz 0 1 0 1 gugu 0 1 hzhz 0 1 ixix 00

Spring 2006CS 685 Network Algorithmics10 Bigger Issue: Slow! Computing one bit at a time is too slow –Worst-case: one memory access per bit (32 accesses!) Solution: compute n bits at a time –n = stride length –Use n-bit chunks of addresses as index into array in each trie node How to handle prefixes which are not a multiple of n in length? –Extend them, replicate entries as needed –E.g. n=3, 1* becomes 100, 101, 110, 111

Spring 2006CS 685 Network Algorithmics11 Extending Prefixes Original Database a: 0*  x b: 01000*  y c: 011*  z d: 1*  w e: 100*  w f: 1100*  z g: 1101*  u h: 1110*  z i: 1111*  x Example: stride length=2 Expanded Database a0: 00*  x a1: 01*  x b0: 010000*  y b1: 010001*  y c0: 0110*  z c1: 0111*  z d0: 10*  w d1: 11*  w e0: 1000*  u e1: 1001*  u f: 1100*  z g: 1101*  u h: 1110*  z i: 1111*  x

Spring 2006CS 685 Network Algorithmics12 Expanded Database a0: 00*  x a1: 01*  x b0: 010000*  y b1: 010001*  y c0: 0110*  z c1: 0111*  z d0: 10*  w d1: 11*  w e0: 1000*  w e1: 1001*  w f: 1100*  z g: 1101*  u h: 1110*  z i: 1111*  x x x w w 00 01 10 11 z z 00 01 10 11 y y 00 01 10 11 u u 00 01 10 11 z u z x 00 01 10 11 Total cost: 40 pointers (22 null) Max #memory accesses: 3

Spring 2006CS 685 Network Algorithmics13 0 1 0 1 x 0 1 w 01 0 1 z 0 1 0 1 u a: 0*  x b: 01000*  y c: 011*  z d: 1*  w e: 100*  u f: 1100*  z g: 1101*  u h: 1110*  z i: 1111*  x 0 1 byby 0 1 0 1 0 1 z 0 1 0 1 u 0 1 z 0 1 x 00 Total cost: 46 pointers (21 null) Max #memory accesses: 5

Spring 2006CS 685 Network Algorithmics14 Choosing Fixed Stride Lengths We are trading space for time: –Larger stride length  fewer memory accesses –Larger stride length  more wasted space Use the largest stride length that will fit in memory and complete required accesses within the time budget

Spring 2006CS 685 Network Algorithmics15 Updating Insertion 1.Keep a unibit version of the trie, with each node labeled with longest matching prefix and its length 2.To insert P, search for P, remembering last node, until 1.Null pointer (not present), or 2.Reach the last stride in P 3.Expand P as needed to match stride length 4.Overwrite any existing entries with length less than P's Deletion is similar 1.Find entry for prefix to be deleted 2.Remove its entry (from unibit copy also!) 3.Expand any entries that were "covered" by the deleted prefix

Spring 2006CS 685 Network Algorithmics16 Variable Stride Lengths It is not necessary that every node have the same stride length Reduce waste by allowing stride length to vary per node –Actual stride length encoded in pointer to the trie node –Nodes with fewer used pointers can have smaller stride lengths

Spring 2006CS 685 Network Algorithmics17 Expanded Database a0: 00*  x a1: 01*  x b: 01000*  y c0: 0110*  z c1: 0111*  z d0: 10*  w d1: 11*  w e: 100*  w f: 1100*  z g: 1101*  u h: 1110*  z i: 1111*  x x x w w 00 01 10 11 z z 00 01 10 11 y 0 1 u 0 1 z u z x 00 01 10 11 Total waste: 16 pointers Max #memory accesses: 3 Note: encoding stride length costs 2 bits/pointer 2 bits 1 bit 2 bits 1 bit

Spring 2006CS 685 Network Algorithmics18 Calculating Stride Lengths How to pick stride lengths? –We have two variables to play with: height and stride length –Trie height determines lookup speed  set max height first Call it h –Then choose strides to minimize storage Define cost of trie T, C(T): –If T is a single node, then number of array locations in the node –Else number of array locations in root +  i C(T i ), where T i 's are children of T Straightforward recursive solution: –Root stride s results in y=2 s subtries T 1,... T y –For each possible s, recursively compute optimal strides for C(T i )'s using height limit h-1 –Choose root stride s to minimize total cost = (2 s +  i C(T i ))

Spring 2006CS 685 Network Algorithmics19 Calculating Stride Lengths Problem: Expensive, repeated subproblems Solution (Srinivasan & Varghese): Dynamic programming Observe that each subtree of a variable-stride trie contains the set of prefixes as some subtree of the original unibit trie For each node of the unibit trie, compute optimal stride and cost for that stride Start at bottom (height = 1), work up Determine optimal grouping of leaves in subtree Given subtree optimal costs, compute parent optimal cost This results in optimal stride length selections for the given set of prefixes

Spring 2006CS 685 Network Algorithmics20 0 1 0 1 x 0 1 w 01 0 1 z 0 1 0 1 u 0 1 byby 0 1 0 1 0 1 z 0 1 0 1 u 0 1 z 0 1 x 00 Stride = 2 Cost = 4 Stride = 1 Cost = 7 Stride = 0 Cost = 1 Stride = 0 Cost = 1 Stride = 0 Cost = 1 Stride = 1 Cost = 2

Spring 2006CS 685 Network Algorithmics21 Alternative Method: Level Compression LC-trie (Nilsson & Karlsson '98) is a variable-stride trie with no empty entries in trie nodes Procedure: –Select largest root stride that allows no empty entries –Do this recursively down through the tree Disadvantage: cannot control height precisely

Spring 2006CS 685 Network Algorithmics22 0 1 0 1 x 0 1 w 01 0 1 z 0 1 0 1 u 0 1 byby 0 1 0 1 0 1 z 0 1 0 1 u 0 1 z 0 1 x 00 Stride = 1 Stride = 2

Spring 2006CS 685 Network Algorithmics23 Performance Comparisons MAE-East database (1997 snapshot) –~ 40K prefixes "Unoptimized" multibit trie: 2003 KB Optimal fixed-stride: 737 KB, computed in 1 msec –Height limit = 4 (  1 Gbps wire speed @ 80 nsec/access) Optimized (S&V) variable-stride: 423 KB, computed in 1.6 sec, Height limit = 4 LC-compressed –Height = 7 –700 KB

Spring 2006CS 685 Network Algorithmics24 Lulea Compressed Tries Goals: –Minimize number of memory accesses –Aggressively compress trie Goal: so it can fit in SRAM (or even cache) Three-level trie with strides of 16, 8, 8 –8 mem accesses typical Main Techniques 1.Leaf-pushing 2.Eliminate duplicate pointers from trie node arrays 3.Efficient bit-counting using precomputation for large bitmaps 4.Use of indices instead of full pointers for next-hop info

Spring 2006CS 685 Network Algorithmics25 1. Leaf-Pushing In general, a trie node entry has associated –A pointer to a next trie node –A prefix (i.e. pointer to next-hop info) –Or both, or neither Observation: we don't need to know about a prefix pointer along the way until we reach a leaf So: "push" prefix pointers down to leaves –Keep only one set of pointers per node

Spring 2006CS 685 Network Algorithmics26 Leaf-Pushing: the Concept Prefixes

Spring 2006CS 685 Network Algorithmics27 Expanded Database a0: 00*  x a1: 01*  x b0: 010000*  y b1: 010001*  y c0: 0110*  z c1: 0111*  z d0: 10*  w d1: 11*  w e0: 1000*  u e1: 1001*  u f: 1100*  z g: 1101*  u h: 1110*  z i: 1111*  x x x w w 00 01 10 11 z z 00 01 10 11 y y 00 01 10 11 u u 00 01 10 11 z u z x 00 01 10 11 Cost: 40 pointers (22 wasted) Before x 00 01 10 11 x z z 00 01 10 11 y y x x 00 01 10 11 u u w w 00 01 10 11 z u z x 00 01 10 11 Cost: 20 pointers After Leaf-Pushing

Spring 2006CS 685 Network Algorithmics28 2. Removing Duplicate Pointers Leaf-pushing results in many consecutive duplicate pointers Would like to remove redundancy and store only one copy in each node Problem: now we can't directly index into array using address bits –Example: k=2, bits 01 = index 1 needs to be converted to index 0 somehow u u w w 00 01 10 11 u w

Spring 2006CS 685 Network Algorithmics29 2. Removing Duplicate Pointers Solution: Add a bitmap: one bit per original entry –1 indicates new value –0 indicates duplicate of previous value To convert index i, count 1's up to position i in the bitmap, and subtract 1 Example: old index 1  new index 0 old index 2  new index 1 u u w w 00 01 10 11 u w 1 0 1 0 00 01 10 11

Spring 2006CS 685 Network Algorithmics30 Bitmap for Duplicate Elimination Prefixes 100000000000100010000100000000000000000010000000000100000000000010001000000100000011000000000000000000

Spring 2006CS 685 Network Algorithmics31 3. Efficient Bit-Counting Lulea first-level 16-bit stride  64K entries Impractical to count bits up to, say, entry 34578 on the fly! Solution: Precompute (P2a) –Divide bitmap into chunks (say, 64 bits each) –Store the number of 1 bits in each chunk in an array B –Compute # 1 bits up to bit k by: chunkNum = k >> 6; posInChunk = k & 0x3f; // k mod 64 numOnes = B[chunkNum] + count1sInChunk(chunkNum,posInChunk) – 1;

Spring 2006CS 685 Network Algorithmics32 Bit-Counting Precomputation Example 10010100000000000111000000001000001100000001 03367 0101 9 Chunk Size = 8 bits Converted index = 7 + 2 – 1 = 8 index = 35 Cost: 2 memory accesses (maybe less)

Spring 2006CS 685 Network Algorithmics33 4. Efficient Pointer Representation Observation: the number of different next-hop pointers is limited –Each corresponds to an immediate neighbor of the router –Most routers have at most a few dozen neighbors –In some cases a router might have a few hundred distinct next hops, even a thousand Apply P7: avoid unnecessary generality –Only a few bits (say 8-12) needed to distinguish between actual next-hop possibilities Store indices into table of next-hops info –E.g., to support up to 1024 next hops: 10 bits –40K prefixes  40K pointers  160KB @ 32 bits, vs 50KB @ 10 bits

Spring 2006CS 685 Network Algorithmics34 Other Lulea Tricks First level of trie uses two levels of bit-counting array –First counts bits before the 64-bit chunk –Second counts bits in the 16-bit word within chunk Second- and third-level trie nodes are laid out differently depending on number of pointers in them –Each node has 256 entries –Categorized by number of pointers 1-8: "sparse" — store 8-bit indices + 8 16-bit pointers (24B) 9-64: "dense" — like first level, but only one bit-counting array (only six bits of count needed) 65-256: "very dense" — like first level, with two bit-counting arrays: 4 64-bit chunks, 16 16-bit words

Spring 2006CS 685 Network Algorithmics35 Lulea Performance Results 1997 MAE-East database –32K entries, 58K leaves, 56 different next hops –Resulting Trie size: 160KB –Build time: 99 msec –Almost all lookups took < 100 clock cycles (333MHz Pentium)

Spring 2006CS 685 Network Algorithmics36 Trie Bitmap (Eatherton, Dittia & Varghese) Goal: storage, speed comparable to Lulea plus fast insertion Main culprit in slow insertion is leaf-pushing So get rid of leaf-pushing –Go back to storing node and prefix pointers explicitly –Use the same compression bitmap trick on both lists Store next-hop information separately, only retrieve at the end –Like leaf-pushing, only in the control plane! Use smaller strides to limit memory accesses to one per trie node (Lulea requires at least two)

Spring 2006CS 685 Network Algorithmics37 Storing Prefixes Explicitly To avoid expansion/leaf pushing, we have to store prefixes in the node explicitly There are 2 k+1 – 1 possible prefixes of length k –Store list of (unique) next hop pointers for each prefix covered by this node –Use same bitmap/bit counting technique as Lulea to find pointer index –Keep trie nodes small (stride 4 or less), exploit hardware (P5) to do prefix matching, bit counting

Spring 2006CS 685 Network Algorithmics38 a: 0*  x b: 01000*  y c: 011*  z d: 1*  w e: 100*  u f: 1100*  z g: 1101*  u h: 1110*  z i: 1111*  x Example: Root node, stride = 3 000 001 010 011 100 101 110 111 0 0 1 0 0 0 1 0 0 1 * 0* 1 0 0 0 0 0 0 0 1 1 0 0 1* 00* 01* 10* 11* 000* 001* 010* 011* 100* 101* 110* 0 111* to child nodes x w z u

Spring 2006CS 685 Network Algorithmics39 Tree Bitmap Results Insertions are as in simple multibit tries May cause complete revamp of trie node, but that requires only one memory allocation Performance comparable to Lulea, but insertion much faster

Spring 2006CS 685 Network Algorithmics40 A Different Lookup Paradigm Can we use binary search to do longest-prefix lookups? Observe that each prefix corresponds to a range of addresses –E.g. 204.198.76.0/24 covers the range 204.198.76.0 – 204.198.76.255 –Each prefix has two range endpoints –N disjoint prefixes divide the entire space into 2N+1 disjoint segments –By sorting range endpoints, and comparing to address, can determine exact prefix match

Spring 2006CS 685 Network Algorithmics41 Prefixes as Ranges

Spring 2006CS 685 Network Algorithmics42 Binary Search on Ranges Store 2N endpoints in sorted order –Including the full address range for * Store two pointers for each entry –">" entry: next-hop info for addresses strictly greater than that value –"=" entry: next-hop info for addresses equal to that value

Spring 2006CS 685 Network Algorithmics43 Example: 6-bit addresses Example Database: a: 0*  x b: 01000*  y c: 011*  z d: 1*  w e: 100*  u f: 1100*  z g: 1101*  u h: 1110*  z i: 1111*  x a: 000000-011111  x b: 010000-010001  y c: 011000-011111  z d: 100000-111111  w e: 100000-100111  u f: 110000-110011  z g: 110100-110111  u h: 111000-111011  z i: 111100-111111  x 000000 010000 010001 011000 011111 100000 100111 110000 110011 110100 110111 111000 111011 111100 111111 x y x x x u w z x u x z x x - x y y z x u u z z u u z z x x >=

Spring 2006CS 685 Network Algorithmics44 Range Binary Search Results N prefixes can be searched in log 2 N + 1 steps –Slow compared to multibit tries –Insertion can also be expensive Memory-expensive –Requires 2 full-size entries per prefix –40K prefixes, 32-bit addresses: 320KB, not counting next- hop info Advantage: no patent restrictions!

Spring 2006CS 685 Network Algorithmics45 Binary Search on Prefix Lengths Waldvogel, et al For same-length prefixes, a hash table gives fast comparisons But linear search on prefix lengths is too expensive Can we do a faster (binary) search on prefix lengths? –Challenge: how do we know whether to move "up" or "down" in length on failure? –Solution: include extra information to indicate presence of a longer prefix that might match –These are called marker entries –Each marker entry also contains best-matching prefix for that node –When searching, remember best-matching prefix when going "up" because of a marker, in case of later failure

Spring 2006CS 685 Network Algorithmics46 Example: Binary Search on Prefix Length Example Database: a: 0*  x b: 01000*  y c: 011*  z d: 1*  w e: 100*  u f: 1100*  z g: 1101*  u h: 1110*  z i: 1111*  x Prefix Lengths: 1, 3, 4, 7 0*1* length 1 BMP a,xd,w 011*100*110M111M010M length 3 BMP c,ze,ud,w a,x length 4 BMP 1100*1101*1110*1111*0100M f,zg,uh,zi,xa,x Example: Search for address 011000 and 101000 length 5 BMP 01000* b,y

Spring 2006CS 685 Network Algorithmics47 Binary Search on Prefix Length Performance Worst-case number of hash-table accesses: 5 However, most prefixes are 16 or 24 bits –Arrange hash tables so these are handled in one or two accesses This technique is very scalable for larger address lengths (e.g. 128 bits for IPv6) –Unibit Trie for IPv6: 128 accesses!

Spring 2006CS 685 Network Algorithmics48 Memory Allocation for Compressed Schemes Problem: when using a compressed scheme (like Lulea), trie nodes are kept at minimal size If a node grows (changes size), it must be reallocated and copied over As we have discussed, memory allocators can perform very badly –Assume M is the size of the largest possible request –Cannot guarantee more than 1/log 2 M of memory will be used! E.g. if M=32, 20% is max guaranteed utilization Router vendors cannot claim to support large databases

Spring 2006CS 685 Network Algorithmics49 Memory Allocation for Compressed Schemes Solution: Compaction –Copy memory from one location to another General-purpose OS's avoid compaction! –Reason: very hard to find and update all pointers to objects in the moved region The good news: –Pointer usage is very constrained in IP lookup algorithms –Most lookup structures are trees  at most one pointer to any node By storing a "parent" pointer, can easily update pointers as needed

Download ppt "Spring 2006CS 685 Network Algorithmics1 Longest Prefix Matching Trie-based Techniques CS 685 Network Algorithmics Spring 2006."

Similar presentations