Algorithms to Accelerate Multiple Regular Expressions Matching for Deep Packet Inspection Sailesh Kumar Sarang Dharmapurikar Fang Yu Patrick Crowley Jonathan.

Presentation transcript:

Algorithms to Accelerate Multiple Regular Expressions Matching for Deep Packet Inspection
Sailesh Kumar, Sarang Dharmapurikar, Fang Yu, Patrick Crowley, Jonathan Turner
Presented by: Sailesh Kumar

2 - Sailesh Kumar - 12/6/2015 Overview
• Why is regular expression acceleration important?
• Introduction to our approach
  » Delayed Input DFA (D²FA)
• D²FA construction
• Simulation results
• Memory mapping algorithm
• Conclusion

3 - Why Regular Expressions Acceleration?
• RegEx are now widely used
  » Network intrusion detection systems (NIDS)
  » Layer 7 switches, load balancing
  » Firewalls, filtering, authentication and monitoring
  » Content-based traffic management and routing
• RegEx matching is expensive
  » Space: large amount of memory
  » Bandwidth: requires 1+ state traversals per byte
• RegEx is a performance bottleneck
  » In enterprise switches from Cisco, etc.
  » Cisco security appliances
    – Use DFA, 1+ GB memory, still sub-gigabit throughput
  » Need to accelerate RegEx!

4 - Can we do better?
• Well studied in compiler literature
  » What's different in networking?
  » Can we do better?
• Construction time versus execution time (grep)
  » Traditionally, (construction + execution) time is the metric
  » In the networking context, execution time is critical
  » Also, there may be thousands of patterns
• DFAs are fast
  » But can have an exponentially large number of states
  » Algorithms exist to minimize the number of states
  » Still 1) low performance and 2) gigabytes of memory
• How to achieve high performance?
  » Use ASIC/FPGA
    – On-chip memories provide ample bandwidth
    – Volume and need for speed justify a custom solution
  » Limited memory, need a space-efficient representation!

5 - Introduction to Our Approach
• How to represent DFAs more compactly?
  » Can't reduce the number of states
  » How about reducing the number of transitions?
    – 256 transitions per state
    – 50+ distinct transitions per state (real-world datasets)
    – Need at least 50+ words per state
[Figure: DFA for the three rules a+, b+c, c*d+, with 4 transitions per state]
Look at state pairs: there are many common transitions. How to remove them?

6 - Introduction to Our Approach
• How to represent DFAs more compactly?
  » Can't reduce the number of states
  » How about reducing the number of transitions?
    – 256 transitions per state
    – 50+ distinct transitions per state (real-world datasets)
    – Need at least 50+ words per state
[Figure: DFA for the three rules a+, b+c, c*d+ (4 transitions per state) next to an alternative representation]
Alternative representation: fewer transitions, less memory

7 - D²FA Operation
[Figure: DFA and D²FA side by side; input stream: a b d]
• The DFA and the D²FA visit the same accepting state after consuming a character
• Heavy edges are called default transitions
• Take a default transition whenever a labeled transition is missing
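The lookup rule on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the three-state DFA over {a, b}, the `labeled` and `default` tables, and the function names are all hypothetical and are not the automaton drawn on the slide.

```python
# Hypothetical toy DFA over the alphabet {a, b}; not the slide's example.
dfa = {
    0: {"a": 1, "b": 0},
    1: {"a": 1, "b": 2},
    2: {"a": 1, "b": 0},
}

# A D2FA for the same DFA, with state 0 as the root of the default tree.
# Transitions that a state shares with its default target are dropped.
labeled = {
    0: {"a": 1, "b": 0},   # the root keeps every transition
    1: {"b": 2},           # "a" is inherited from state 0
    2: {},                 # both transitions are inherited from state 0
}
default = {1: 0, 2: 0}     # default transitions (the "heavy edges")


def dfa_run(start, text, table):
    """Run a plain DFA and return the final state."""
    state = start
    for ch in text:
        state = table[state][ch]
    return state


def d2fa_step(state, ch, labeled, default):
    """Consume one character; return (next_state, default_hops_taken)."""
    hops = 0
    # Follow default transitions until some state has a labeled edge on ch.
    # A KeyError on default[state] would mean the root lacks a transition,
    # i.e. the D2FA is malformed.
    while ch not in labeled[state]:
        state = default[state]
        hops += 1
    return labeled[state][ch], hops


def d2fa_run(start, text, labeled, default):
    """Run the D2FA and return the final state."""
    state = start
    for ch in text:
        state, _ = d2fa_step(state, ch, labeled, default)
    return state
```

In this toy table the D²FA stores 3 labeled transitions plus 2 default transitions instead of 6 labeled ones, and each character costs at most one extra hop.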

8 - D²FA Operation
[Figure: DFA and D²FA from the previous slide]
• Any set of default transitions will suffice if there are no cycles of default transitions
• Thus, we need to construct trees of default transitions
• So, how to construct space-efficient D²FAs while keeping default paths bounded?
[Figure: two alternative sets of default transition trees]
• The two sets of default transition trees above are also correct
• However, we may traverse 2 default transitions to consume a character
• Thus, we need to do more work => lower performance

9 - D²FA Construction
• Present a systematic approach to construct a D²FA
• Begin with a state-minimized DFA
• Construct the space reduction graph
  » Undirected graph; vertices are states of the DFA
  » Edges exist between vertices with common transitions
  » Weight of an edge = # of common transitions
[Figure: example DFA]
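As a rough sketch of the space reduction graph described above, the following Python counts, for every pair of states, the input symbols on which both states go to the same next state. The toy `dfa` table and the function name are assumptions made for illustration; they are not the slide's example.

```python
from itertools import combinations


def space_reduction_graph(dfa):
    """Return {(u, v): weight} for state pairs with at least one common
    transition, where weight = number of symbols on which u and v move
    to the same next state."""
    edges = {}
    for u, v in combinations(sorted(dfa), 2):
        common = sum(1 for ch, nxt in dfa[u].items() if dfa[v].get(ch) == nxt)
        if common > 0:
            edges[(u, v)] = common
    return edges


# Hypothetical toy DFA over {a, b}.
dfa = {
    0: {"a": 1, "b": 0},
    1: {"a": 1, "b": 2},
    2: {"a": 1, "b": 0},
}
edges = space_reduction_graph(dfa)
```

For this toy table, states 0 and 2 agree on both symbols (weight 2), while the other two pairs agree only on "a" (weight 1).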

10 - D²FA Construction
• Convert certain edges into default transitions
  » A default transition removes w transitions (w = weight of the edge)
  » If we pick high-weight edges => more space reduction
  » Find a maximum weight spanning forest
  » Tree edges become the default transitions
• Problem: the spanning tree may have a very large diameter
  » Longer default paths => lower performance
[Figure: example DFA with a rooted default transition tree; # of transitions removed = 11]
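The unbounded version of this step can be sketched with standard Kruskal's algorithm run on descending weights, followed by orienting each tree toward a root. This is only an illustration under assumptions: the edge weights are a hypothetical toy graph, and picking the lowest-numbered state of each tree as its root is an arbitrary choice, not something the slides specify.

```python
from collections import deque


def max_weight_spanning_forest(num_states, edges):
    """Kruskal's algorithm on descending weights with union-find.
    edges maps (u, v) -> weight. Returns the chosen (u, v, weight) list."""
    parent = list(range(num_states))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    chosen = []
    for (u, v), w in sorted(edges.items(), key=lambda kv: -kv[1]):
        ru, rv = find(u), find(v)
        if ru != rv:               # the edge joins two different trees
            parent[ru] = rv
            chosen.append((u, v, w))
    return chosen


def orient_toward_roots(num_states, tree_edges):
    """Turn undirected tree edges into default transitions child -> parent.
    Root choice (lowest-numbered state per tree) is an assumption."""
    adj = {s: [] for s in range(num_states)}
    for u, v, _ in tree_edges:
        adj[u].append(v)
        adj[v].append(u)
    default, seen = {}, set()
    for root in range(num_states):
        if root in seen:
            continue
        seen.add(root)
        queue = deque([root])
        while queue:
            node = queue.popleft()
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    default[nb] = node
                    queue.append(nb)
    return default


# Hypothetical space reduction graph for a three-state DFA.
edges = {(0, 1): 1, (0, 2): 2, (1, 2): 1}
tree = max_weight_spanning_forest(3, edges)
default = orient_toward_roots(3, tree)
removed = sum(w for _, _, w in tree)   # transitions saved by the defaults
```

The sum of the chosen edge weights is the number of labeled transitions the default transitions make redundant, which is the quantity the slide reports for its example.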

11 - D²FA Construction
• We need to construct bounded-diameter trees
  » NP-hard
  » A small diameter bound leads to low tree weight
    – Less space-efficient D²FA
  » Time-space trade-off
• We propose a heuristic algorithm based upon Kruskal's algorithm to create compact bounded-diameter D²FAs
[Figure: example DFA]

12 - D²FA Construction
• Our heuristic incrementally builds the spanning tree
  » Whenever there is an opportunity, keep the diameter small
  » Based upon Kruskal's algorithm
  » Details in the paper
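The slides defer the heuristic's details to the paper, so the following is only a plain greedy stand-in and not the authors' refined algorithm: it runs Kruskal's algorithm on descending weights and simply rejects any edge that would push a tree's diameter past a bound. The function names, the `max_diameter` parameter, and the toy edge weights are assumptions.

```python
from collections import deque


def bfs_far(adj, src):
    """Return (farthest_node, its_distance, distance_map) from src."""
    dist = {src: 0}
    far = src
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if dist[node] > dist[far]:
            far = node
        for nb in adj[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return far, dist[far], dist


def tree_diameter(adj, node):
    """Diameter (in edges) of the tree containing node, via double BFS."""
    end, _, _ = bfs_far(adj, node)
    _, diameter, _ = bfs_far(adj, end)
    return diameter


def bounded_diameter_forest(num_states, edges, max_diameter):
    """Greedy Kruskal variant: skip edges that create a cycle or that make
    the merged tree's diameter exceed max_diameter."""
    adj = {s: set() for s in range(num_states)}
    chosen = []
    for (u, v), w in sorted(edges.items(), key=lambda kv: -kv[1]):
        _, _, reachable = bfs_far(adj, u)
        if v in reachable:
            continue                      # same tree already: cycle
        adj[u].add(v)
        adj[v].add(u)
        if tree_diameter(adj, u) > max_diameter:
            adj[u].discard(v)             # undo: the bound would be violated
            adj[v].discard(u)
            continue
        chosen.append((u, v, w))
    return chosen


# Hypothetical weights: a heavy path 0-1-2-3 plus two light shortcuts.
edges = {(0, 1): 5, (1, 2): 4, (2, 3): 3, (0, 2): 1, (1, 3): 1}
loose = bounded_diameter_forest(4, edges, max_diameter=3)
tight = bounded_diameter_forest(4, edges, max_diameter=2)
```

On these toy weights the looser bound keeps the heavy path (total weight 12), while the tighter bound has to swap the weight-3 edge for a weight-1 edge (total weight 10), which mirrors the space versus default path length trade-off described on the slides. Since every default path is a path in the tree, it can never be longer than the diameter, whichever state is chosen as root.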

13 - Results
• We ran experiments on
  » Cisco RegEx rules
  » Linux application protocol classifier rules
  » Bro rules
  » Snort rules (subset of rules)
[Table: size of DFA versus D²FA (no default path length bound applied)]

14 - Space-Time Tradeoff
[Figure: space-time tradeoff plot]
• Longer default path => more work but less space
• Space-efficient region: default paths have length 4+
  » Requires 4+ memory accesses per character
• We propose a memory architecture which enables us to consume one character per clock cycle

15 - Summary of Memory Architecture
• We propose an on-chip ASIC architecture
  » Use multiple embedded memories to store the D²FA
    – Flexibility
    – Frequent changes to rules
• D²FA requires multiple memory accesses
  » How to execute D²FA at memory clock rates?
• We have proposed a deterministic, contention-free memory mapping algorithm
  » Uniform access to memories
  » Enables D²FA to consume a character per memory access
  » Nearly zero memory fragmentation
    – All memories are uniformly used
• Details and results in the paper
• At 300 MHz we achieve 5 Gbps worst-case throughput

16 - Conclusion
• Deep packet inspection has become challenging
  » RegEx are used to specify rules
  » Wire-speed inspection
• We presented an ASIC-based architecture to perform RegEx matching at tens-of-gigabit rates
• As suggested in the public review, this paper is not the final answer to RegEx matching
  » But it is a good start
• We are presently developing techniques to perform fast RegEx matching using commodity memories
  » Collaborators are welcome!

17 - Thank you and Questions?

18 - Backup Slides

19 - D²FA Construction
• Our heuristic incrementally builds the spanning tree
  » Whenever there is an opportunity, keep the diameter small
  » Details in the paper
• Graph with 31 states, max. weight default transition tree
  » Our heuristic creates smaller default paths
[Figure: Kruskal's algorithm, max. default path = 8 edges]
[Figure: our refined Kruskal's algorithm, avg. default path = 5 edges]

20 - Multiple Memories
• To achieve high performance, use multiple memories and D²FA engines
• Multiple memories provide high aggregate bandwidth
• Multiple engines use bandwidth effectively
  » However, worst-case performance may be low
    – No better than a single memory
  » May need complex circuitry to handle contention
• We propose a deterministic, contention-free memory mapping and compare it to a random mapping

21 - Memory Mapping
• The memory mapping algorithm can be modeled as graph coloring
  » The graph is the set of default transition trees
  » Colors represent the memory modules
  » Color the nodes of the trees such that
    – Nodes along a default path are colored with different colors
    – All colors are uniformly used
• We propose two methods, naïve and adaptive
[Figure: naïve coloring and adaptive coloring]
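The slides name the two coloring methods without describing them, so the sketch below is neither of them. It is a simple greedy coloring, written under assumptions, that satisfies the first constraint above (a state never shares a color, i.e. a memory module, with any state on its default path) and tries to balance color usage by always picking the least-used allowed color. The function name and the toy `default` map are hypothetical.

```python
from collections import deque


def color_default_trees(default, num_states, num_colors):
    """Assign each state a color (memory module) so that no state shares a
    color with any ancestor on its default path. Returns (color, usage)."""
    children = {s: [] for s in range(num_states)}
    roots = []
    for s in range(num_states):
        if s in default:
            children[default[s]].append(s)
        else:
            roots.append(s)        # states without a default are tree roots

    color = {}
    usage = [0] * num_colors
    # Each queue entry carries the set of colors used by the node's ancestors.
    queue = deque((root, frozenset()) for root in roots)
    while queue:
        node, used_above = queue.popleft()
        allowed = [c for c in range(num_colors) if c not in used_above]
        if not allowed:
            raise ValueError("a default path is longer than the number of colors")
        best = min(allowed, key=lambda c: usage[c])  # least-used allowed color
        color[node] = best
        usage[best] += 1
        for child in children[node]:
            queue.append((child, used_above | {best}))
    return color, usage


# Hypothetical default transition tree: 0 is the root, depth at most 2.
default = {1: 0, 2: 0, 3: 1, 4: 1, 5: 2}
color, usage = color_default_trees(default, num_states=6, num_colors=3)
```

Because states along one default path sit in different memories, the accesses needed to consume one character never hit the same module twice. The sketch needs at least as many colors as there are states on the longest default path, which is one more reason to bound default path length during construction.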

22 - Results
• Adaptive mapping leads to much more uniform color usage
  » Memories are uniformly used, little fragmentation
  » Up to 20% space saving with adaptive coloring
[Table: throughput results (300 MHz dual-port eSRAM)]