Memory Compression Algorithms for Networking Features Sailesh Kumar.


1 Memory Compression Algorithms for Networking Features Sailesh Kumar

2 Outline
Sailesh Kumar - 11/28/2015
• Regular expression based packet content inspection (main focus)
  » D²FA
  » CD²FA
• Packet header processing
  » HEXA (History-based Encoding, eXecution and Addressing)

3 Why care about Regular Expressions?
• Widely used
  » Network intrusion detection systems (NIDS)
  » Layer 7 switches, load balancing
  » Firewalls, filtering, authentication and monitoring
  » Content-based traffic management and routing
• Expensive
  » Space: large amount of memory
  » Bandwidth: requires 1+ state traversals per byte
• Performance bottleneck
  » In enterprise switches, etc.
  » Security appliances
    – Use DFAs with 1+ GB of memory, yet still reach only sub-gigabit throughput
  » Need to accelerate RegEx!

4 Can we do Better?
• Well studied in compiler literature
  » What's different in networking?
  » Can we do better?
• Performance metric (grep)
  » Traditionally, (construction + execution) time is the metric
  » In the networking context, execution time is critical
  » Also, there may be thousands of patterns
• DFAs are fast
  » But can have an exponentially large number of states
  » Algorithms exist to minimize the number of states
  » Still 1) low performance and 2) gigabytes of memory
• How to achieve high performance?
  » Use ASIC/FPGA
    – On-chip memories provide ample bandwidth
    – Volume and need for speed justify a custom solution
  » Limited memory, so a space-efficient representation is needed!

5 Introduction to Our Approach
• How to represent DFAs more compactly?
  » Can't reduce number of states
  » How about reducing number of transitions?
    – 256 transitions per state
    – 50+ distinct transitions per state (real world datasets)
    – Need at least 50+ words per state
[Figure: five-state DFA (states 1-5) over the characters a, b, c, d for the three rules a+, b+c, c*d+; 4 transitions per state]
Look at state pairs: there are many common transitions. How to remove them?

6 Introduction to Our Approach
[Figure: the five-state DFA for the three rules a+, b+c, c*d+ (4 transitions per state) next to an alternative representation of the same automaton with the common transitions removed]
Alternative representation: fewer transitions, less memory.

7 D²FA Operation
[Figure: DFA and D²FA for the three rules a+, b+c, c*d+; input stream: a b d]
The DFA and the D²FA visit the same accepting state after consuming each character.
Heavy edges are called default transitions.
Take a default transition whenever a labeled transition is missing.
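The lookup rule on this slide can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the author's implementation: the three-state automaton is a made-up toy (it is not the automaton in the figure), and the names `DFA`, `LABELED`, `DEFAULT`, `d2fa_step`, `run_dfa` and `run_d2fa` are hypothetical.

```python
# Toy DFA over {a, b, c, d} (hypothetical, not the slide's automaton).
DFA = {
    1: {"a": 2, "b": 3, "c": 1, "d": 1},
    2: {"a": 2, "b": 3, "c": 3, "d": 1},
    3: {"a": 2, "b": 3, "c": 1, "d": 3},
}

# Equivalent D2FA: states 2 and 3 keep only the transitions that differ
# from state 1 and reach the rest through a default transition to state 1.
LABELED = {
    1: {"a": 2, "b": 3, "c": 1, "d": 1},  # the root keeps every transition
    2: {"c": 3},
    3: {"d": 3},
}
DEFAULT = {1: None, 2: 1, 3: 1}


def d2fa_step(state, ch):
    """Consume one character; return (next_state, default_hops_taken)."""
    hops = 0
    while ch not in LABELED[state]:
        state = DEFAULT[state]  # labeled transition missing: follow default
        hops += 1
    return LABELED[state][ch], hops


def run_dfa(text, state=1):
    for ch in text:
        state = DFA[state][ch]
    return state


def run_d2fa(text, state=1):
    for ch in text:
        state, _ = d2fa_step(state, ch)
    return state
```

With these toy tables, `run_dfa("abd")` and `run_d2fa("abd")` end in the same state, while the D2FA stores 6 labeled plus 2 default transitions instead of 12. The hop count returned by `d2fa_step` is the extra work per character that a longer default path would cost.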

8 D²FA Operation
[Figure: the D²FA from the previous slide and two alternative default transition trees over states 1-5]
Any set of default transitions will suffice if there are no cycles of default transitions.
Thus, we need to construct trees of default transitions.
The two alternative sets of default transition trees in the figure are also correct.
However, we may traverse 2 default transitions to consume a character.
Thus, we need to do more work => lower performance.
So, how do we construct space-efficient D²FAs while keeping default paths bounded?

9 D²FA Construction
• We present a systematic approach to constructing a D²FA
• Begin with a state-minimized DFA
• Construct the space reduction graph
  » Undirected graph; vertices are the states of the DFA
  » Edges exist between vertices with common transitions
  » Weight of an edge = (# of common transitions) - 1
[Figure: the example DFA and its space reduction graph over states 1-5, with edge weights of 2 and 3]
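The space reduction graph can be computed directly from the transition table. The sketch below assumes a DFA stored as a dict of dicts; the three-state `DFA` is a hypothetical toy (not the five-state example in the figure) and `space_reduction_graph` is an illustrative name.

```python
from itertools import combinations

# Toy DFA over {a, b, c, d} (hypothetical, not the slide's automaton).
DFA = {
    1: {"a": 2, "b": 3, "c": 1, "d": 1},
    2: {"a": 2, "b": 3, "c": 3, "d": 1},
    3: {"a": 2, "b": 3, "c": 1, "d": 3},
}


def space_reduction_graph(dfa):
    """Return {(u, v): weight} for every pair of states sharing transitions."""
    edges = {}
    for u, v in combinations(sorted(dfa), 2):
        # A transition is common when both states go to the same
        # next state on the same character.
        common = sum(1 for ch, nxt in dfa[u].items() if dfa[v].get(ch) == nxt)
        if common > 0:
            # Subtract 1 because the default transition itself takes a slot.
            edges[(u, v)] = common - 1
    return edges


GRAPH = space_reduction_graph(DFA)
```

For the toy automaton, states 1 and 2 agree on a, b and d, so their edge has weight 2; states 2 and 3 agree only on a and b, so their edge has weight 1.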

10 D²FA Construction
• Convert certain edges into default transitions
  » A default transition removes w transitions (w = weight of the edge)
  » Picking high-weight edges => more space reduction
  » Find a maximum weight spanning forest
  » Tree edges become the default transitions
• Problem: the spanning tree may have a very large diameter
  » Longer default paths => lower performance
[Figure: the example DFA and its space reduction graph with a maximum weight spanning tree and its root marked; # of transitions removed = 2+3+3+3 = 11]
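A maximum weight spanning forest can be found with plain Kruskal's algorithm run on edges sorted by decreasing weight. The sketch below shows only that unbounded step; it does not apply any diameter bound and it does not orient the tree edges toward a root. The edge weights are a hypothetical three-state example, and `max_weight_spanning_forest` is an illustrative name.

```python
def max_weight_spanning_forest(nodes, edges):
    """Kruskal's algorithm on edges sorted by decreasing weight.

    edges maps (u, v) -> weight; returns the chosen (u, v, weight) triples.
    """
    parent = {n: n for n in nodes}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    chosen = []
    for (u, v), w in sorted(edges.items(), key=lambda kv: kv[1], reverse=True):
        ru, rv = find(u), find(v)
        # Skip zero-weight edges (no saving) and edges that would close a cycle.
        if w > 0 and ru != rv:
            parent[ru] = rv
            chosen.append((u, v, w))
    return chosen


# Hypothetical space reduction graph over three states.
EDGES = {(1, 2): 2, (1, 3): 2, (2, 3): 1}
FOREST = max_weight_spanning_forest([1, 2, 3], EDGES)
SAVED = sum(w for _, _, w in FOREST)
```

Here the forest keeps the two weight-2 edges, so `SAVED` is 4 transitions. Each chosen edge would still need a direction (toward a chosen root) before it can serve as a default transition.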

11 D²FA Construction
• We need to construct bounded diameter trees
  » NP-hard
  » A small diameter bound leads to low tree weight
    – Less space-efficient D²FA
  » Time-space trade-off
• We propose a heuristic algorithm based on Kruskal's algorithm to create compact bounded diameter D²FAs
  » Details in SIGCOMM 2006 paper
[Figure: the example DFA and its space reduction graph]

12 Results
• We ran experiments on
  » Cisco RegEx rules
  » Linux application protocol classifier rules
  » Bro rules
  » Snort rules (subset of rules)
Size of DFA versus D²FA (no default path length bound applied)

13 Space-Time Tradeoff
Longer default path => more work but less space.
Space-efficient region: default paths have length 4+, which requires 4+ memory accesses per character.
We propose a memory architecture that enables us to consume one character per clock cycle.

14 Outline
• Regular expression based packet content inspection (main focus)
  » D²FA
  » CD²FA
• Packet header processing
  » HEXA (History-based Encoding, eXecution and Addressing)

15 D²FA versus DFA
• D²FAs are compact but require multiple memory accesses
  » Up to 20x more memory accesses
  » Not desirable in an off-chip architecture
• Can D²FAs match the performance of DFAs?
  » YES!!!!
  » Content Addressed D²FAs (CD²FA)
• CD²FAs require only one memory access per byte
  » Match the performance of a DFA in a cacheless system
  » In systems with a data cache, CD²FAs are 2-3x faster
• CD²FAs are 10x more compact than DFAs

16 Introduction to CD²FA, ANCS'06
• How to avoid the multiple memory accesses of D²FAs?
  » Avoid the lookup to decide if the default path needs to be taken
  » Avoid default path traversal
• Solution: assign a label to each state; labels contain:
  » Characters for which it has labeled transitions
  » Information about all of its default states
  » Characters for which its default states have labeled transitions
[Figure: default path V -> U -> R, where V has labeled transitions on a, b, U on c, d, and R on all characters; content labels: V = ab,cd,R; U = cd,R; R = R]
Find node R at location R.
Find node U at hash(c,d,R).
Find node V at hash(a,b,hash(c,d,R)).

17 Introduction to CD²FA
[Figure: two default paths, V (ab,cd,R) -> U (cd,R) -> R and X (pq,lm,Z) -> Y (lm,Z) -> Z, next to a transition memory holding the entries (R, a), (R, b), …, (Z, a), (Z, b), …, (X, p), (X, q), (V, a), (V, b) at locations such as hash(c,d,R), hash(a,b,hash(c,d,R)) and hash(p,q,hash(l,m,Z)); input characters: a b d a]
Current state: V (label = ab,cd,R) → X (label = pq,lm,Z)
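The addressing idea can be sketched as follows: because a state's content label lists the characters handled at each step of its default path, the state that consumes the next input character can be chosen, and its address computed, without touching memory. This is only an illustration under assumptions. The label encoding (a tuple of character groups ending in the root name), the CRC-based `toy_hash`, the table size, and the root locations in `ROOT_ADDR` are all hypothetical; collisions are ignored here.

```python
import zlib

TABLE_SIZE = 1 << 16            # hypothetical memory size
ROOT_ADDR = {"R": 0, "Z": 1}    # hypothetical fixed locations of root states


def toy_hash(chars, parent_addr):
    """Stand-in hash of (labeled characters, address of default state)."""
    key = f"{chars}|{parent_addr}".encode()
    return zlib.crc32(key) % TABLE_SIZE


def address(label):
    """Address of the state described by a content label.

    ("ab", "cd", "R") reads: labeled transitions on a, b; default state
    with labeled transitions on c, d; root state R.
    """
    if len(label) == 1:
        return ROOT_ADDR[label[0]]
    return toy_hash(label[0], address(label[1:]))


def resolve(label, ch):
    """Address of the state on the default path that consumes ch."""
    for i, chars in enumerate(label[:-1]):
        if ch in chars:
            return address(label[i:])
    return address(label[-1:])  # the root has transitions on all characters


V = ("ab", "cd", "R")
```

For the label of V, `resolve(V, "a")` is V's own address, `resolve(V, "d")` is the address of its default state U, and any other character resolves to the root R. A single memory access at the resolved address, indexed by the character, would then return the next state's content label.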

18 Construction of CD²FA
• We seek to keep the content labels small
• Twin objectives:
  » Ensure that states have few labeled transitions
  » Ensure that default paths are as short as possible
• The D²FA construction heuristic based on a maximum weight spanning tree creates long default paths
  » Limiting default paths => less space-efficient D²FAs
• We propose a new heuristic, called CRO, to construct D²FAs
  » Runs in 3 phases: Construction, Reduction and Optimization
  » With a default path bound of 2 edges, the CRO algorithm constructs up to 10x more space-efficient D²FAs
  » CD²FAs are constructed from these D²FAs

19 Memory Mapping in CD²FA
[Figure: the default paths V -> U -> R and X -> Y -> Z with their content labels, mapped into the transition memory at hash(c,d,R), hash(a,b,hash(c,d,R)) and hash(p,q,hash(l,m,Z)); two labels map to the same location, marked COLLISION]
We have assumed that hashing is collision free.

20 Collision-free Memory Mapping
[Figure: four states with labeled transitions on a b c, p q r, l m n and d e f, mapped to 4 memory locations through hash(abc, …), hash(def, …), hash(pqr, …), hash(lmn, …), hash(edf, …) and hash(mln, …)]
Four states, 4 memory locations.
We need a systematic approach.

21 Bipartite Graph Matching
• Bipartite graph
  » Left nodes are state content labels
  » Right nodes are memory locations
  » Map state labels to unique memory locations
  » An edge for every choice of content label
  » Perfect matching problem
• With n left and n right nodes
  » Need O(log n) random edges per node
  » n = 1M implies we need ~20 edges per node
• If we provide slight memory over-provisioning
  » We can uniquely map state labels with far fewer edges
• In our experiments, we found perfect matchings without memory over-provisioning
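The matching step can be sketched with the standard augmenting-path method for bipartite matching. The slides do not say which matching algorithm was used, so this is a swapped-in textbook technique; `find_matching` and the example `CHOICES` (labels A, B, C with candidate memory slots) are hypothetical.

```python
def find_matching(choices):
    """Assign every label a distinct slot using augmenting paths.

    choices maps label -> list of candidate slots (the bipartite edges).
    Returns {label: slot}, or None if some label cannot be placed.
    """
    slot_owner = {}

    def try_assign(label, seen):
        for slot in choices[label]:
            if slot in seen:
                continue
            seen.add(slot)
            # Take a free slot, or evict the owner if it can move elsewhere.
            if slot not in slot_owner or try_assign(slot_owner[slot], seen):
                slot_owner[slot] = label
                return True
        return False

    for label in choices:
        if not try_assign(label, set()):
            return None
    return {label: slot for slot, label in slot_owner.items()}


# Hypothetical example: three labels, three memory slots.
CHOICES = {"A": [0, 1], "B": [0], "C": [1, 2]}
MATCHING = find_matching(CHOICES)
```

In this example label A first takes slot 0 and is later moved to slot 1 so that B, which has only one choice, can use slot 0. More choices per label make such a reassignment more likely to succeed, which is the point of the O(log n) edges bound above.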

22 Memory Reduction Results

23 Throughput Results
[Figure: throughput results; annotations: 3x faster, 4KB cache]

24 Outline
• Regular expression based packet content inspection (main focus)
  » D²FA
  » CD²FA
• Packet header processing
  » HEXA (History-based Encoding, eXecution and Addressing)

25 HEXA, ICNP'07
• HEXA (History-based Encoding, eXecution and Addressing)
  » Challenges the assumption that graph structures must store log₂ n bit pointers to identify successor nodes
  » Requires only 2-bit versus 20-bit pointers (for 1 million nodes)
• Useful for
  » IP lookup tries (directed acyclic graphs)
  » Simple finite automata such as Aho-Corasick string matchers

26 Tries - Traditional Implementation
[Figure: binary trie with nine nodes]
Addr  data
1     0, 2, 3
2     0, 4, 5
3     1, NULL, 6
4     1, NULL, NULL
5     0, 7, 8
6     1, NULL, NULL
7     0, 9, NULL
8     1, NULL, NULL
9
There are nine nodes; we will need 4-bit node identifiers.
A node will require 9 bits: two 4-bit child pointers and one flag that indicates if the node is a prefix.
Total memory = 9 x 9 bits.
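The bit counts on this slide follow from a short calculation, sketched below. The function names are hypothetical, and the sketch assumes the NULL pointer can be encoded within the same pointer width, which holds for nine nodes and 4-bit identifiers.

```python
def pointer_bits(n):
    """Bits needed to tell n nodes apart: ceil(log2(n)) for n >= 1."""
    return (n - 1).bit_length()


def traditional_node_bits(n):
    """Two child pointers plus one prefix flag per node."""
    return 2 * pointer_bits(n) + 1


NODES = 9
PER_NODE = traditional_node_bits(NODES)   # 2 * 4 + 1
TOTAL = NODES * PER_NODE                  # 9 x 9 bits
```

The same formula gives 20-bit pointers for a trie with one million nodes, which is the figure that HEXA's 2-bit discriminators are compared against.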

27 HEXA based Implementation
Define the HEXA identifier of a node as the path which leads to it from the root:
1. -
2. 0
3. 1
4. 00
5. 01
6. 11
7. 010
8. 011
9. 0100
Notice that these identifiers are unique. Thus, they can potentially be mapped to unique memory addresses.

28 HEXA based Implementation
Use hashing to map the HEXA identifier to a memory address.
HEXA identifiers: 1. -  2. 0  3. 1  4. 00  5. 01  6. 11  7. 010  8. 011  9. 0100
If we have a minimal perfect hash function f (a function that maps elements to unique locations), then we can store the trie as shown below.
f(-) = 4, f(0) = 7, f(1) = 9, f(00) = 2, f(01) = 8, f(11) = 1, f(010) = 5, f(011) = 3, f(0100) = 6
Addr  node mem  Prefix
1     1,0,0     P3
2     1,0,0     P2
3     1,0,0     P4
4     0,1,1
5     0,1,0
6     1,0,0     P5
7     0,1,1
8
9     1,0,1     P1
Here we use only 3 bits per node in the fast path.
[Figure: lookup of an IP address beginning 1 1 …, ending at the prefix we were looking for]
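A lookup over this layout can be sketched as below. The values of f and the node memory rows are taken from the slide; a dict stands in for the minimal perfect hash function. Two things are assumptions: the row for address 8 is missing from the transcript and is filled in as (0, 1, 1) because the traditional trie lists node "01" as a non-prefix node with two children, and the function name `longest_prefix_match` is illustrative.

```python
# Stand-in for the minimal perfect hash f: HEXA identifier -> address.
# The empty string is the root, written "-" on the slide.
F = {"": 4, "0": 7, "1": 9, "00": 2, "01": 8, "11": 1,
     "010": 5, "011": 3, "0100": 6}

# Address -> (is_prefix, has_left_child, has_right_child, prefix_name).
MEM = {
    1: (1, 0, 0, "P3"), 2: (1, 0, 0, "P2"), 3: (1, 0, 0, "P4"),
    4: (0, 1, 1, None), 5: (0, 1, 0, None), 6: (1, 0, 0, "P5"),
    7: (0, 1, 1, None),
    8: (0, 1, 1, None),  # assumed: this row is missing in the transcript
    9: (1, 0, 1, "P1"),
}


def longest_prefix_match(bits):
    """Walk the trie using only the path so far as the node identifier."""
    path, best = "", None
    while True:
        is_prefix, has_left, has_right, name = MEM[F[path]]
        if is_prefix:
            best = name
        if len(path) == len(bits):
            break
        nxt = bits[len(path)]
        if (nxt == "0" and not has_left) or (nxt == "1" and not has_right):
            break  # no child in that direction
        path += nxt  # the child's identifier is the parent's path plus one bit
    return best
```

No child pointers are stored: the node at each step is found by hashing the bits consumed so far, so each node needs only its prefix flag and two child-present flags. An address starting with `11` ends at prefix P3 in this table.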

29 Devising One-to-one Mapping
• Finding a minimal perfect hash function is difficult
  » One-to-one mapping is essential for HEXA to work
• Use discriminator bits
  » Attach c bits, which we can modify, to every HEXA identifier
  » Thus a node can have 2^c choices of identifiers
  » We now need to store these c bits for every child instead of a single flag
• With multiple choices of HEXA identifiers for a node, reduce the problem to bipartite graph matching
  » We need to find a perfect matching in the graph to map nodes to unique memory locations

30 Devising One-to-one Mapping
[Figure: bipartite graph. Left: nodes 1-9 with their input labels (-, 0, 1, 00, 01, 11, 010, 011, 0100), each with four choices of HEXA identifier formed with the discriminators 00, 01, 10 and 11. Right: choices of memory locations 0-8 given by the hash h of each choice, e.g. h(00) = 0, h(01) = 4, h(10) = 1, h(11) = 5 for the root and h(000) = 1, h(010) = 5, h(100) = 2, h(110) = 6 for node 0.]
Perfect matching: pick appropriate discriminators.

31 HEXA based Implementation
HEXA identifiers: 1. -  2. 0  3. 1  4. 00  5. 01  6. 11  7. 010  8. 011  9. 0100
Store the discriminator of every left and right child instead of a single flag.
Addr  node mem  Prefix
1     1,xx,xx   P3
2     1,xx,xx   P2
3     1,xx,xx   P4
4     0,xx,xx
5
6     1,xx,xx   P5
7     0,xx,xx
8
9     1,xx,xx   P1
Here we use only 5 bits per node in the fast path.

32 Results
• 3 choices are enough to find a perfect matching
  » Thus 2-bit discriminators (the value 00 is reserved for no child)
    – Significant reduction: 2 bits per node versus log₂ n bits
32 Eatherton tries, each containing 100-120k prefixes.

33 Incremental Updates
• IP table updates are very frequent
  » When a node is removed and another added, we must ensure that only a few memory operations are needed
• In the new bipartite graph, a new perfect matching can be found
  » Quickly (O(n) time in the worst case, typically constant time)
  » The new matching is only slightly different from the previous matching
    – Typically around 10 different edges; experimental worst case: 18
    – Thus at most 18 memory operations are needed for an update

34 Questions???

