
1 Algorithms to Accelerate Multiple Regular Expressions Matching for Deep Packet Inspection
Sailesh Kumar, Sarang Dharmapurikar, Fang Yu, Patrick Crowley, Jonathan Turner
Presented by: Sailesh Kumar

2 - Sailesh Kumar - 12/6/2015
Overview
- Why regular expression acceleration is important
- Introduction to our approach
  » Delayed Input DFA (D²FA)
- D²FA construction
- Simulation results
- Memory mapping algorithm
- Conclusion

3 Why Regular Expressions Acceleration?
- RegEx are now widely used
  » Network intrusion detection systems (NIDS)
  » Layer 7 switches, load balancing
  » Firewalls, filtering, authentication and monitoring
  » Content-based traffic management and routing
- RegEx matching is expensive
  » Space: large amount of memory
  » Bandwidth: requires 1+ state traversals per byte
- RegEx is a performance bottleneck
  » In enterprise switches from Cisco, etc.
  » Cisco security appliances
    - use DFAs and 1+ GB of memory, yet achieve sub-gigabit throughput
  » Need to accelerate RegEx!

4 Can We Do Better?
- Well studied in the compiler literature
  » What's different in networking? Can we do better?
- Construction time versus execution time (grep)
  » Traditionally, (construction + execution) time is the metric
  » In the networking context, execution time is critical
  » Also, there may be thousands of patterns
- DFAs are fast
  » But can have an exponentially large number of states
  » Algorithms exist to minimize the number of states
  » Still: 1) low performance and 2) gigabytes of memory
- How to achieve high performance?
  » Use an ASIC/FPGA
    - On-chip memories provide ample bandwidth
    - Volume and the need for speed justify a custom solution
  » Limited memory; need a space-efficient representation!

5 Introduction to Our Approach
- How to represent DFAs more compactly?
  » Can't reduce the number of states
  » How about reducing the number of transitions?
    - 256 transitions per state
    - 50+ distinct transitions per state (real-world datasets)
    - Need at least 50+ words per state
[Figure: DFA for the three rules a+, b+c and c*d+; 5 states, 4 transitions per state shown. Looking at state pairs, there are many common transitions — how do we remove them?]

6 Introduction to Our Approach
- How to represent DFAs more compactly?
  » Can't reduce the number of states
  » How about reducing the number of transitions?
    - 256 transitions per state
    - 50+ distinct transitions per state (real-world datasets)
    - Need at least 50+ words per state
[Figure: the same DFA for a+, b+c and c*d+ beside an alternative representation that shares common transitions: fewer transitions, less memory.]
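The transition-sharing idea above can be sketched in a few lines. This is a minimal illustration with a hypothetical 4-state DFA over {a, b, c, d}, not the exact example from the slides: states 2, 3 and 4 agree with state 1 on most symbols, so giving each of them a default transition to state 1 lets us drop every transition that state 1 already resolves identically.

```python
# Hypothetical DFA transition tables (illustrative, not from the paper).
DFA = {
    1: {'a': 2, 'b': 3, 'c': 1, 'd': 1},
    2: {'a': 2, 'b': 3, 'c': 1, 'd': 1},
    3: {'a': 2, 'b': 3, 'c': 4, 'd': 1},
    4: {'a': 2, 'b': 3, 'c': 1, 'd': 1},
}

# Keep only the transitions that differ from the default target's;
# the root of the default-transition tree keeps its full row.
labeled = {1: dict(DFA[1])}
for s in (2, 3, 4):
    labeled[s] = {c: t for c, t in DFA[s].items() if DFA[1][c] != t}

full = sum(len(row) for row in DFA.values())
compact = sum(len(row) for row in labeled.values()) + 3   # + 3 default transitions
print(full, compact)   # prints: 16 8
```

Even on this toy example the stored-transition count halves; with 256-symbol alphabets and real rule sets the savings are far larger.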

7 D²FA Operation
[Figure: the example DFA and the corresponding D²FA side by side; in the D²FA, heavy edges mark default transitions.]
- Heavy edges are called default transitions
- Take a default transition whenever a labeled transition is missing
- On the input stream a b d, the DFA and the D²FA visit the same accepting state after consuming each character

8 D²FA Operation
- Any set of default transitions will suffice if there are no cycles of default transitions
  » Thus, we need to construct trees of default transitions
[Figure: two alternative default-transition trees over the same states; both are also correct.]
- However, we may then traverse 2 default transitions to consume a single character
  » More work per character => lower performance
- So, how do we construct space-efficient D²FAs while keeping default paths bounded?
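The traversal rule above can be written down directly. A minimal sketch with hypothetical transition tables (not the slides' exact example): a transition is dropped only when the default target resolves the same character to the same next state, so the two machines stay equivalent — the D²FA just spends extra default hops on some characters.

```python
from itertools import product

# Hypothetical full DFA (illustrative).
DFA = {
    1: {'a': 2, 'b': 3, 'c': 1, 'd': 1},
    2: {'a': 2, 'b': 3, 'c': 1, 'd': 1},
    3: {'a': 2, 'b': 3, 'c': 4, 'd': 1},
    4: {'a': 2, 'b': 3, 'c': 1, 'd': 1},
}
default = {2: 1, 3: 1, 4: 1}        # default-transition tree rooted at state 1
labeled = {1: dict(DFA[1])}          # the root keeps a full row
for s, p in default.items():
    labeled[s] = {c: t for c, t in DFA[s].items() if DFA[p][c] != t}

def run_dfa(state, text):
    for ch in text:
        state = DFA[state][ch]
    return state

def run_d2fa(state, text):
    for ch in text:
        while ch not in labeled[state]:   # labeled transition missing:
            state = default[state]        # ...take the default (consumes no input)
        state = labeled[state][ch]
    return state

# Both automata reach the same state on every input.
for n in range(1, 5):
    for word in product('abcd', repeat=n):
        assert run_dfa(1, word) == run_d2fa(1, word)
```

The exhaustive check at the end is what the slide's equivalence claim amounts to: per character, the D²FA may do more memory accesses (the default hops), but never lands in a different state.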

9 D²FA Construction
- Present a systematic approach to construct D²FAs
- Begin with a state-minimized DFA
- Construct a space reduction graph
  » Undirected graph; vertices are states of the DFA
  » Edges exist between vertices with common transitions
  » Weight of an edge = # of common transitions - 1
[Figure: the example DFA and its space reduction graph, with edge weights of 2 and 3.]
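The edge-weight rule just stated is easy to sketch. Using a hypothetical 4-state DFA (illustrative tables, not the paper's): the weight is common transitions minus one because the default pointer itself still costs one stored word.

```python
from itertools import combinations

# Hypothetical DFA transition tables (illustrative).
DFA = {
    1: {'a': 2, 'b': 3, 'c': 1, 'd': 1},
    2: {'a': 2, 'b': 3, 'c': 1, 'd': 1},
    3: {'a': 2, 'b': 3, 'c': 4, 'd': 1},
    4: {'a': 2, 'b': 3, 'c': 1, 'd': 1},
}

# Space reduction graph: an edge joins two states sharing transitions,
# weighted by (# of common transitions - 1) = net transitions saved.
edges = {}
for s, t in combinations(DFA, 2):
    common = sum(DFA[s][c] == DFA[t][c] for c in DFA[s])
    if common >= 1:
        edges[(s, t)] = common - 1

print(edges)
# prints: {(1, 2): 3, (1, 3): 2, (1, 4): 3, (2, 3): 2, (2, 4): 3, (3, 4): 2}
```

Picking which of these edges become default transitions is then exactly the spanning-forest problem of the next slides.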

10 D²FA Construction
- Convert certain edges into default transitions
  » A default transition removes w transitions (w = weight of the edge)
  » Picking high-weight edges => more space reduction
  » Find a maximum-weight spanning forest
  » Tree edges become the default transitions
- Problem: the spanning tree may have a very large diameter
  » Longer default paths => lower performance
[Figure: a maximum-weight spanning tree (with root marked) of the space reduction graph; # of transitions removed = 2+3+3+3 = 11.]

11 D²FA Construction
- We need to construct bounded-diameter trees
  » NP-hard
  » A small diameter bound leads to low tree weight
    - Less space-efficient D²FA
  » Time-space trade-off
- We propose a heuristic algorithm based upon Kruskal's algorithm to create compact, bounded-diameter D²FAs
[Figure: the example DFA and its space reduction graph.]

12 D²FA Construction
- Our heuristic incrementally builds the spanning tree
  » Whenever there is an opportunity, it keeps the diameter small
  » Based upon Kruskal's algorithm
  » Details in the paper
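One way to sketch a Kruskal-style bounded-diameter heuristic: scan edges in decreasing weight and accept an edge only if it joins two different trees and the merged tree's diameter stays within the bound (the bound caps default-path length, hence memory accesses per character). Checking the diameter by a two-pass BFS is a simplification for illustration; the paper's actual refinement is different and this is not its implementation.

```python
from collections import deque

def diameter(adj, src):
    """Diameter of the tree containing src, via two BFS passes."""
    def bfs(start):
        dist = {start: 0}
        q = deque([start])
        while q:
            u = q.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    q.append(v)
        far = max(dist, key=dist.get)
        return far, dist[far]
    far, _ = bfs(src)
    return bfs(far)[1]

def bounded_spanning_forest(nodes, edges, bound):
    adj = {v: set() for v in nodes}
    parent = {v: v for v in nodes}               # union-find, path halving
    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v
    chosen = []
    for w, u, v in sorted(edges, reverse=True):  # heaviest edges first
        if find(u) != find(v):
            adj[u].add(v)
            adj[v].add(u)
            if diameter(adj, u) <= bound:        # merged tree stays shallow
                parent[find(u)] = find(v)
                chosen.append((u, v, w))
            else:                                # undo: default paths too long
                adj[u].discard(v)
                adj[v].discard(u)
    return chosen

# Hypothetical weighted edges (w, u, v). With bound 3 all three weight-3
# edges fit in one tree; with bound 2 the chain must be broken.
edges = [(3, 1, 2), (3, 2, 3), (3, 3, 4), (1, 1, 4)]
print(bounded_spanning_forest([1, 2, 3, 4], edges, 3))
# prints: [(3, 4, 3), (2, 3, 3), (1, 2, 3)]
```

Rejecting an edge costs its weight in lost space savings, which is the time-space trade-off of the previous slide made concrete.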

13 Results
- We ran experiments on
  » Cisco RegEx rules
  » Linux application protocol classifier rules
  » Bro rules
  » Snort rules (subset of rules)
[Figure: size of DFA versus D²FA, with no default path length bound applied.]

14 Space-Time Tradeoff
- Longer default paths => more work but less space
[Figure: space-time tradeoff curve; in the space-efficient region, default paths have length 4+, requiring 4+ memory accesses per character.]
- We propose a memory architecture which enables us to consume one character per clock cycle

15 Summary of Memory Architecture
- We propose an on-chip ASIC architecture
  » Uses multiple embedded memories to store the D²FA
    - Flexibility
    - Frequent changes to rules
- A D²FA requires multiple memory accesses
  » How to execute a D²FA at memory clock rates?
- We have proposed a deterministic, contention-free memory mapping algorithm
  » Uniform access to memories
  » Enables the D²FA to consume a character per memory access
  » Nearly zero memory fragmentation
    - All memories are uniformly used
- Details and results in the paper
- At 300 MHz we achieve 5 Gbps worst-case throughput

16 Conclusion
- Deep packet inspection has become challenging
  » RegEx are used to specify rules
  » Wire-speed inspection
- We presented an ASIC-based architecture to perform RegEx matching at tens of gigabits per second
- As suggested in the public review, this paper is not the final answer to RegEx matching
  » But it is a good start
- We are presently developing techniques to perform fast RegEx matching using commodity memories
  » Collaborators are welcome!

17 Thank You and Questions?

18 Backup Slides

19 D²FA Construction
- Our heuristic incrementally builds the spanning tree
  » Whenever there is an opportunity, it keeps the diameter small
  » Details in the paper
- Graph with 31 states, maximum-weight default transition tree
  » Our heuristic creates shorter default paths
[Figure: Kruskal's algorithm yields a max. default path of 8 edges; our refined Kruskal's algorithm yields an avg. default path of 5 edges.]

20 Multiple Memories
- To achieve high performance, use multiple memories and D²FA engines
- Multiple memories provide high aggregate bandwidth
- Multiple engines use the bandwidth effectively
  » However, worst-case performance may be low
    - No better than a single memory
  » May need complex circuitry to handle contention
- We propose a deterministic, contention-free memory mapping and compare it to a random mapping

21 Memory Mapping
- The memory mapping algorithm can be modeled as graph coloring
  » The graph is the set of default transition trees
  » Colors represent the memory modules
  » Color the nodes of the trees such that
    - Nodes along a default path are colored with different colors
    - All colors are uniformly used
- We propose two methods: naive and adaptive
[Figure: the same default-transition tree under naive and adaptive coloring.]
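A minimal sketch of the naive scheme, under an assumed children-map tree representation (the names and the depth-mod-k rule are this sketch's choices, not necessarily the paper's exact method): color each node by its depth modulo the number of memories. If every default path is at most (num_memories - 1) edges long, all nodes on a path land in different memories. The adaptive scheme additionally balances how often each color is used; that refinement is not shown here.

```python
def color_tree(children, root, num_memories):
    """Assign each tree node the color (depth mod num_memories)."""
    colors, stack = {root: 0}, [root]
    while stack:
        u = stack.pop()
        for v in children.get(u, []):
            colors[v] = (colors[u] + 1) % num_memories
            stack.append(v)
    return colors

# Hypothetical default-transition tree rooted at 1: paths 4 -> 2 -> 1 and 3 -> 1.
colors = color_tree({1: [2, 3], 2: [4]}, root=1, num_memories=3)
print(colors)   # prints: {1: 0, 2: 1, 3: 1, 4: 2}
```

The default path 4 → 2 → 1 gets colors 2, 1, 0: three different memories, so the three accesses needed to consume one character can proceed without contention.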

22 Results
- Adaptive mapping leads to much more uniform color usage
  » Memories are uniformly used; little fragmentation
  » Up to 20% space saving with adaptive coloring
[Figure: throughput results with 300 MHz dual-port eSRAM.]

