Fast and Memory-Efficient Regular Expression Matching for Deep Packet Inspection

Presentation transcript:

Fast and Memory-Efficient Regular Expression Matching for Deep Packet Inspection
Fang Yu, Microsoft Research, Silicon Valley
Work was done at UC Berkeley, jointly with Zhifeng Chen (Google Inc.), Yanlei Diao (UMass Amherst), T. V. Lakshman (Bell Labs), and Randy H. Katz (UC Berkeley)

Regular Expressions
Flexible way to describe patterns
  – Example: for detecting Yahoo Messenger traffic
    ^(ymsg|ypns|yhoo).?.?.?.?.?.?.?[lwt].*\xc0\x80
Used in many payload scanning applications
  – L7-filter: protocol identifiers
  – Bro: intrusion patterns
  – SNORT: no regular expressions in April […]; […] out of 4867 intrusion rules contain regular expressions as of Jan […]
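As a rough illustration of how such a signature is applied to a payload, the sketch below compiles the Yahoo Messenger pattern with Python's `re` module. The payloads and the helper name `looks_like_yahoo` are made up for this example, and the pattern is written with `[lwt]` directly after the optional bytes, which is an assumed reading of the slide.

```python
import re

# Pattern quoted on the slide, compiled as a bytes pattern because packet
# payloads are raw bytes. re.DOTALL lets "." match any byte, including newline.
YAHOO = re.compile(rb"^(ymsg|ypns|yhoo).?.?.?.?.?.?.?[lwt].*\xc0\x80", re.DOTALL)

def looks_like_yahoo(payload: bytes) -> bool:
    """Return True if the payload matches the Yahoo Messenger signature."""
    return YAHOO.search(payload) is not None

# Hypothetical payloads, invented for illustration only.
sample_hit = b"ymsg" + b"\x00" * 7 + b"w" + b"hello" + b"\xc0\x80"
sample_miss = b"GET /index.html HTTP/1.1\r\n"

print(looks_like_yahoo(sample_hit), looks_like_yahoo(sample_miss))
```

Python's `re` is a backtracking engine, so this only shows what the pattern accepts; it says nothing about the DFA-based speed discussed in the talk.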

Challenges
Features specific to packet scanning applications
  – Large set of patterns, on the order of 100s or 1000s
Pattern statistics (Snort / L7-filter / XML filtering)
  – # of regular expressions analyzed: […] / […] / […]
  – % of patterns with wildcards ".", "+", "?", "*": 74.9% / 75.7% / 50%-100%
  – Average # of wildcards per pattern: […] / […] / […]
  – % of patterns with class "[ ]": 31.6% / 52.8% / 0
  – Average # of classes per pattern: […] / […] / […]
  – % of patterns with length restrictions on classes or wildcards: 56.3% / 21.4% / 0

Design Space
Automata-based approaches: DFA-based vs. NFA-based (example patterns: (A|B)C and (A|D)E)
  – NFA-based: a group of states can be activated simultaneously
      A high percentage of wildcards means NFA-based approaches can be slow, sometimes less than 1 Mb/s
  – DFA-based: only one state is activated
DFA-based: repeated scan vs. one-pass scan
  – Repeated scan: start scanning from one position; if there is no match, start again at the next position
      Good for parsers
      Packets may not contain any patterns
      No guarantee of high speed
  – One-pass scan: scan the input only once
      Fast and deterministic throughput
      Add .* before patterns
      Some patterns generate very large DFAs
One-pass scan: m individual DFAs vs. one composite DFA for m patterns
  – m individual DFAs: O(m) processing complexity for each input character
  – One composite DFA: O(1) processing complexity for each input character (space problem)
Contributions
  – Rewrite techniques to reduce memory usage: make the DFA-based approach feasible
  – Selectively group patterns into k groups (e.g., k=3): avoid exponential memory growth and further speed up the matching process

DFA Sizes of Regular Expressions
Typical patterns in network payload scanning applications
1) Explicit strings with k characters
   Examples: ^ABCD, .*ABCD
   # of states: k+1; % of patterns: 25.1%; average # of states: […]
2) Wildcards
   Examples: ^AB.*CD, .*AB.*CD
   # of states: k[…]; % of patterns: […]%; average # of states: […]
3) Patterns with ^, a wildcard, and a length restriction j
   Examples: ^AB.{j+}CD, ^AB.{0,j}CD, ^AB.{j}CD
   # of states: O(k*j); % of patterns: 44.7%; average # of states: […]
4) Patterns with ^, a class of characters that overlaps with the prefix, and a length restriction j
   Example: ^A+[A-Z]{j}D
   # of states: O(k+j^2), j ~ […]; % of patterns: […]%; average # of states: […]
5) Patterns with a length restriction j, where a wildcard or a class of characters overlaps with the prefix
   Examples: .*AB.{j}CD, .*A[A-Z]{j+}D
   # of states: O(k+2^j), j ~ […]; % of patterns: […]%; average # of states: > […]
Callouts: Focus of this talk; Rewrite Rule 1; Rewrite Rule 2

Design Considerations
Completeness of matching results for one pattern
  – Complete matching
      Report all the possible substrings
      E.g., a pattern ab* and an input abbb
        – Four possible matches, i.e., a, ab, abb, and abbb
  – Non-overlapping matching
      Common practice: left-most longest match, shortest match results
In most payload scanning applications, for one pattern, reporting the non-overlapping matching result is sufficient
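The difference between the two reporting modes can be checked on the slide's own example. This is a minimal sketch using Python's `re`; the function names are chosen for this illustration, and the brute-force enumeration is only meant for tiny inputs.

```python
import re

def complete_matches(pattern: str, text: str) -> list[str]:
    """Every non-empty substring of text that the whole pattern matches."""
    compiled = re.compile(pattern)
    found = []
    for start in range(len(text)):
        for end in range(start + 1, len(text) + 1):
            # fullmatch with pos/endpos tests exactly the slice text[start:end]
            if compiled.fullmatch(text, start, end):
                found.append(text[start:end])
    return found

def non_overlapping_matches(pattern: str, text: str) -> list[str]:
    """Non-overlapping matches; greedy quantifiers give the longest one here."""
    return [m.group(0) for m in re.finditer(pattern, text)]

print(complete_matches("ab*", "abbb"))         # ['a', 'ab', 'abb', 'abbb']
print(non_overlapping_matches("ab*", "abbb"))  # ['abbb']
```

Complete matching reports four results for one occurrence, while the non-overlapping mode reports one, which is the mode the rewrite rules rely on.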

Patterns with Exponential DFA Sizes
Often for detecting buffer overflow attempts, e.g., .*AUTH\s[^\n]{100}
DFA needs to remember all the possible AUTH\s
  – A second AUTH\s can either match [^\n]{100} or be counted as a new match of the start of the pattern AUTH\s
  – Generates a DFA of >100,000 states
Can't be efficiently processed by an NFA-based approach either
Figure: NFA for .*AUTH\s[^\n]{100} (AUTH\s followed by 100 [^\n] states)
Input: AUTH\sAUTH\s AUTH\s\s AUTH\s\s\s …
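The blow-up can be reproduced on a scaled-down version of the pattern. The sketch below is an illustration under stated assumptions, not the authors' tooling: the keyword AUTH\s is shortened to the single letter A, the alphabet is reduced to three symbols, and `dfa_state_count` is a hypothetical helper that runs a plain subset construction on the NFA for .*A[^\n]{j}.

```python
def dfa_state_count(j: int, alphabet: str = "Ax\n") -> int:
    """Count reachable DFA states for the simplified pattern .*A[^\\n]{j}.

    NFA state 0 loops on every character; state 1 means "just read A";
    state i+1 means "read A followed by i non-newline characters".
    """
    def step(active: frozenset, ch: str) -> frozenset:
        nxt = {0}                      # the leading .* keeps state 0 alive
        if ch == "A":
            nxt.add(1)                 # a new candidate match starts here
        if ch != "\n":
            # every counting candidate advances by one position
            nxt.update(s + 1 for s in active if 1 <= s <= j)
        return frozenset(nxt)

    start = frozenset({0})
    seen, frontier = {start}, [start]
    while frontier:                    # standard subset construction
        current = frontier.pop()
        for ch in alphabet:
            target = step(current, ch)
            if target not in seen:
                seen.add(target)
                frontier.append(target)
    return len(seen)

for j in (1, 2, 4, 8):
    print(j, dfa_state_count(j))       # the count doubles with each extra j
```

Each DFA state records which of the recent positions held the keyword, which is why the count grows as a power of two in j.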

Rewriting
Intuition: only the first AUTH\s matters
  – If there is a '\n' within the next 100 bytes, none of the AUTH\s matches the pattern
  – Otherwise, the first AUTH\s and the following characters have already matched the pattern
Rewrite the pattern to:
  ([^A]|A[^U]|AU[^T]|AUT[^H]|AUTH[^\s]|AUTH\s[^\n]{0,99}\n)*AUTH\s[^\n]{100}
  which generates a DFA of only 106 states
This rewritten pattern
  – Reports a different number of matches from the original pattern when identifying complete matches
  – Is equivalent when identifying non-overlapping patterns
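A quick spot check of the rewrite, with the length restriction scaled down from 100 to 3 so the inputs stay readable. The sample strings are invented for this sketch, and agreement on a handful of inputs is not a proof of equivalence.

```python
import re

J = 3  # scaled-down length restriction (the slide uses 100)

# Original pattern; re.DOTALL lets the leading .* skip over newlines.
original = re.compile(r".*AUTH\s[^\n]{%d}" % J, re.DOTALL)

# Rewritten pattern from the slide, with 99 -> J-1 and 100 -> J.
rewritten = re.compile(
    r"([^A]|A[^U]|AU[^T]|AUT[^H]|AUTH[^\s]|AUTH\s[^\n]{0,%d}\n)*"
    r"AUTH\s[^\n]{%d}" % (J - 1, J)
)

samples = [
    "xxAUTH abc",        # one keyword followed by J non-newline characters
    "AUTH a\nbc",        # newline arrives too early, so no match
    "AUTH a\nAUTH xyz",  # first attempt fails, second succeeds
    "AUTH AUTH \nabc",   # only the first AUTH\s matters
    "hello",             # keyword absent
]

results = []
for s in samples:
    a = original.match(s) is not None
    b = rewritten.match(s) is not None
    results.append((a, b))
    print(repr(s), a, b)
```

Both patterns are anchored at the start of the input with `match`, mirroring a one-pass scan that begins at the first byte of the payload.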

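Rewriting Effect on the SNORT Rule Set
1) Explicit strings with k characters
   Examples: ^ABCD, .*ABCD
   # of states: k+1; % of patterns: 25.1%; average # of states: […]
2) Wildcards
   Examples: ^AB.*CD, .*AB.*CD
   # of states: k[…]; % of patterns: […]%; average # of states: […]
3) Patterns with ^, a wildcard, and a length restriction j
   Examples: ^AB.{j+}CD, ^AB.{0,j}CD, ^AB.{j}CD
   # of states: O(k*j); % of patterns: 44.7%; average # of states: […]
4) Patterns with ^, a class of characters that overlaps with the prefix, and a length restriction j
   Example: ^A+[A-Z]{j}D
   # of states: O(k+j^2), reduced to O(k+j) by rewriting; % of patterns: 5.11%; average # of states: […]
5) Patterns with a length restriction j, where a wildcard or a class of characters overlaps with the prefix
   Examples: .*AB.{j}CD, .*A[A-Z]{j+}D
   # of states: O(k+2^j), reduced to O(k+j) by rewriting; % of patterns: 6.27%; average # of states: >2 214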

Rewriting Effect on the SNORT Rule Set
Created scripts to automatically rewrite patterns
Columns: type of rewrite, rule set, number of patterns, average length restriction, DFA reduction rate
  – Rewrite rule for quadratic case, Snort: 17; 370; >98%
  – Rewrite rule for quadratic case, Bro: 0; 0; 0
  – Rewrite rule for exponential case, Snort: 19; 344; >99%
  – Rewrite rule for exponential case, Bro: […]; […]; >99%
After rewriting, patterns in SNORT and Bro can be compiled into DFAs

Design Choices
Automata-based approaches: DFA-based vs. NFA-based
DFA-based: repeated scan vs. one-pass scan
One-pass scan: m individual DFAs vs. one composite DFA for m patterns
  – m individual DFAs: O(m) processing complexity for each input character
  – One composite DFA: O(1) processing complexity for each input character
Contributions
  – Rewrite techniques to reduce memory usage: make the DFA-based approach feasible
  – Selectively group patterns into k groups (e.g., k=3): further speed up the matching process and avoid exponential memory growth

State Explosion Problem
Figure: randomly adding patterns from the L7-filter into one DFA

Interactions of Regular Expressions
Some patterns generate DFAs of exponential size when compiled together
  – E.g., a DFA for the patterns .*AB.*CD and .*EF.*GH
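The interaction on this slide can be made concrete by counting states. The sketch below hand-codes a 5-state DFA for each pattern and explores the product automaton; the reduced alphabet (with x standing in for every other byte) and the helper names are assumptions made for this illustration, and the product count is the number of reachable state pairs, not a minimized DFA.

```python
def make_step(a, b, c, d):
    """Transition function of a 5-state DFA for .*ab.*cd (distinct letters)."""
    def step(state, ch):
        if state == 0:                     # nothing useful seen yet
            return 1 if ch == a else 0
        if state == 1:                     # last character was a
            return 2 if ch == b else (1 if ch == a else 0)
        if state == 2:                     # ab seen, waiting for c
            return 3 if ch == c else 2
        if state == 3:                     # ab seen, last character was c
            return 4 if ch == d else (3 if ch == c else 2)
        return 4                           # accepting state is absorbing
    return step

def reachable_states(steps, alphabet):
    """Reachable states of the composite (product) automaton."""
    start = tuple(0 for _ in steps)
    seen, frontier = {start}, [start]
    while frontier:
        current = frontier.pop()
        for ch in alphabet:
            target = tuple(step(s, ch) for step, s in zip(steps, current))
            if target not in seen:
                seen.add(target)
                frontier.append(target)
    return seen

alphabet = "ABCDEFGHx"                     # x stands in for every other byte
p1 = make_step("A", "B", "C", "D")
p2 = make_step("E", "F", "G", "H")
print(len(reachable_states([p1], alphabet)))      # 5
print(len(reachable_states([p2], alphabet)))      # 5
print(len(reachable_states([p1, p2], alphabet)))  # 21
```

Two 5-state automata combine into 21 reachable states here, roughly the product of the individual sizes instead of the sum, and each further pattern of this shape multiplies the count again.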

Grouping Algorithms
  – Fixed local memory limitation (NPU or multi-core architectures)
      Compute pair-wise interaction results and form a graph
      Keep adding patterns until reaching the limit
        – Pick a pattern with the fewest interactions to the new group
  – Fixed total memory limitation (general single-core CPU architecture)
      First compute the DFAs of individual patterns and compute the leftover memory size
      Distribute the leftover memory evenly among ungrouped expressions
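One possible reading of the fixed-local-memory variant is the greedy loop below. It is a sketch under assumptions: `interacts` is a precomputed interaction graph, `fits` is a hypothetical callback that reports whether a group's composite DFA stays within the memory limit, and the toy `no_interaction_inside` check stands in for actually compiling a DFA.

```python
def group_patterns(patterns, interacts, fits):
    """Greedy grouping sketch.

    patterns  : list of pattern identifiers
    interacts : dict mapping a pattern to the set of patterns it interacts with
    fits      : hypothetical callback, True if the group's composite DFA
                stays within the memory limit
    """
    ungrouped = list(patterns)
    groups = []
    while ungrouped:
        # Seed the new group with the pattern that has the fewest interactions.
        seed = min(ungrouped, key=lambda p: len(interacts[p] & set(ungrouped)))
        ungrouped.remove(seed)
        group = [seed]
        while ungrouped:
            # Candidate with the fewest interactions with the current group.
            cand = min(ungrouped, key=lambda p: len(interacts[p] & set(group)))
            if not fits(group + [cand]):
                break                      # limit reached, close this group
            group.append(cand)
            ungrouped.remove(cand)
        groups.append(group)
    return groups

# Toy data: p1 interacts with p2, and p3 interacts with p4.
interacts = {"p1": {"p2"}, "p2": {"p1"}, "p3": {"p4"}, "p4": {"p3"}}

def no_interaction_inside(group):
    """Toy stand-in for a memory check: reject any interacting pair."""
    return not any(b in interacts[a] for a in group for b in group)

groups = group_patterns(["p1", "p2", "p3", "p4"], interacts, no_interaction_inside)
print(groups)  # [['p1', 'p3'], ['p2', 'p4']]
```

In a real system the `fits` check would compile the candidate group and compare its state count with the available memory, which is far more expensive than the toy predicate used here.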

Experimental Setup
Regular expression pattern sets
  – Linux application layer filter (L7-filter): 70 regular expressions
  – Pattern sets from the Bro intrusion detection system
      HTTP related patterns: 648 patterns
      Payload related patterns: 223 patterns
Packet traces
  – MIT dump: with viruses and worms
  – Berkeley dump: normal traffic
Scanners
  – A generated one-pass scanning DFA scanner
  – An NFA-based scanner, Pcregrep
  – A repeated scanning DFA parser generated by flex

Grouping Results for Patterns in L7-filter (70 patterns)
Results of grouping algorithms for fixed total memory
Table columns: total DFA state limit, groups, compilation time (s)
  – No grouping: sum of individual DFAs
  – No extra memory cost: 70/12 = 5.83 times less processing per character
  – 6.83 MB of memory: 70/3 = 23.3 times less processing per character

Throughput Analysis
For Linux L7-filter (70 patterns), using PCs with a 3 GHz single-core CPU and 4 GB of memory

Comparisons to Other Approaches
Table columns: throughput (Mb/s) on the MIT dump and the Berkeley dump; memory consumption (KB)
  – Linux L7-filter (70 patterns): NFA, DFA RP, DFA OP (3 groups)
  – Bro HTTP (648 patterns): NFA, DFA RP, DFA OP (1 group)
  – Bro Payload (223 patterns): NFA, DFA RP, DFA OP (4 groups)
Legend
  – NFA: Pcregrep
  – DFA RP: Flex-generated DFA-based repeated scan engine
  – DFA OP: our DFA one-pass scanning engine
Results
  – DFA OP is 48 to 704 times faster than the NFA implementation
  – […] times faster than the commonly used DFA-based parser
  – Uses 2.6 to 8.4 times the memory

Conclusions
High-speed regular expression matching scheme
  – Proposed two rewrite rules
      A DFA-based approach is possible with our rewriting rules
      Can rewrite complicated patterns from our pattern sets
      In other pattern sets, there may be patterns not covered by our rewriting rules
  – Developed a grouping algorithm to selectively group patterns together
      Orders of magnitude faster than existing solutions
  – Can be applied to FPGA or ASIC based approaches as well