An Improved Multi-Pattern Matching Algorithm for Large-Scale Pattern Sets Author : Zhan Peng, Yu-Ping Wang and Jin-Feng Xue Conference: IEEE 10th International.

Presentation transcript:

An Improved Multi-Pattern Matching Algorithm for Large-Scale Pattern Sets
Author: Zhan Peng, Yu-Ping Wang and Jin-Feng Xue
Conference: IEEE 10th International Conference on Computational Intelligence and Security (CIS), 2014
Presenter: Kuan-Chieh Feng
Date: 2015/11/18
Department of Computer Science and Information Engineering, National Cheng Kung University

Outline
Introduction
Wu-Manber’s algorithm
The improved algorithm
Experiment results
National Cheng Kung University CSIE Computer & Internet Architecture Lab

Introduction
For single-pattern matching, the two best-known algorithms are the Knuth-Morris-Pratt (KMP) algorithm and the Boyer-Moore (BM) algorithm.
For multi-pattern matching, the two widely used algorithms are the Aho-Corasick (AC) algorithm and the Wu-Manber (WM) algorithm.

Introduction
This paper proposes an improved multi-pattern matching algorithm, built on the framework of the Wu-Manber (WM) algorithm, that deals effectively with large pattern sets.

Wu-Manber’s algorithm
A shift-based algorithm that matches all patterns simultaneously (multi-pattern matching).
It can support a large number of patterns because its data structures do not occupy much memory.
It needs to preprocess the pattern set to construct those data structures.
[4] Sun Wu, Udi Manber, “A Fast Algorithm for Multi-Pattern Searching,” Technical Report TR 94-17, University of Arizona at Tucson, May 1994

Wu-Manber’s algorithm
The algorithm contains two stages: a preprocessing stage and a scanning stage.

Wu-Manber’s algorithm
Preprocessing stage:
LSP: the length of the shortest pattern in the pattern set (the scanning window size).
Feature string (f-string): the first LSP characters of each pattern.
Feature-string set: denoted F.
B-gram: a block of B characters; B is usually set to 2 or 3 (the block size).
Based on F, three tables are built: the SHIFT table, the HASH table and the PREFIX table.

Wu-Manber’s algorithm
Pattern set = {archer}, window size (LSP) = 4, block size (B) = 2, so the f-string is "arch".
B-gram  Shift value
ch      0
rc      1
ar      2

Wu-Manber’s algorithm
LSP = 4 and B = 2 (m = LSP is the window size and k = B is the block size).
Pattern set:
Index  Pattern  f-string
1      such     such
2      rich     rich
3      archer   arch
4      check    chec
Shift table (default value -> final value, with the patterns that contain each B-gram):
B-gram  Shift value  Patterns
ar      3 -> 2       {archer}
ch      3 -> 0       {archer, such, rich, check}
ec      3 -> 0       {check}
he      3 -> 1       {check}
ic      3 -> 1       {rich}
rc      3 -> 1       {archer}
ri      3 -> 2       {rich}
su      3 -> 2       {such}
uc      3 -> 1       {such}
others  3
The default value of every shift table entry is LSP - B + 1 = 3.
For a B-gram that occurs in an f-string, the shift value is LSP - q, where q is the position at which the rightmost occurrence of that B-gram ends within the f-strings.

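Wu-Manber’s algorithm
m = 4, k = 2. Input text = "suctiricheck".
Shift table:
B-gram       ar  ch  ec  he  ic  rc  ri  su  uc  others
Shift value  2   0   0   1   1   1   2   2   1   3
Hash table:
Key     Value
ch      1~3
ec      4
others  0
Pattern table:
Index  Pattern
1      such
2      rich
3      archer
4      check
Prefix table:
Prefix  Pattern
ri      rich
ch      check
……
Scanning trace:
Window "suct": the B-gram suffix "ct" falls under others, so shift 3.
Window "tiri": "ri" has shift value 2.
Window "rich": "ch" has shift value 0; the hash list holds patterns 1~3 and the prefix "ri" selects rich. Matched!
After a full match, shift 1 character.
Window "iche": "he" has shift value 1.
Window "chec": "ec" has shift value 0; the hash list holds pattern 4 and the prefix "ch" selects check. Matched!
The end of the input text is reached and the scan ends.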
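
The scanning stage on this slide can be reproduced with a short Python sketch. It is a simplified illustration under stated assumptions rather than the original implementation: the names `preprocess` and `wm_search` are hypothetical, dictionaries keyed directly by B-grams replace real hash functions, and the PREFIX information is stored as one prefix per pattern.

```python
from collections import defaultdict


def preprocess(patterns, block_size=2):
    """Build simplified SHIFT, HASH and PREFIX structures."""
    lsp = min(len(p) for p in patterns)
    default_shift = lsp - block_size + 1
    shift = {}
    hash_table = defaultdict(list)    # B-gram suffix of f-string -> pattern indices
    prefix = []                       # first B characters of each pattern
    for index, pattern in enumerate(patterns):
        f_string = pattern[:lsp]
        for q in range(block_size, lsp + 1):
            b_gram = f_string[q - block_size:q]
            shift[b_gram] = min(shift.get(b_gram, default_shift), lsp - q)
        hash_table[f_string[-block_size:]].append(index)
        prefix.append(pattern[:block_size])
    return lsp, default_shift, shift, hash_table, prefix


def wm_search(text, patterns, block_size=2):
    """Return (start, pattern) for every occurrence found in text."""
    lsp, default_shift, shift, hash_table, prefix = preprocess(patterns, block_size)
    matches = []
    end = lsp                         # exclusive end of the current scan window
    while end <= len(text):
        b_gram = text[end - block_size:end]
        step = shift.get(b_gram, default_shift)
        if step > 0:                  # no f-string can end here: skip ahead
            end += step
            continue
        start = end - lsp
        window_prefix = text[start:start + block_size]
        for index in hash_table.get(b_gram, ()):
            # The prefix check filters candidates before the full comparison.
            if prefix[index] == window_prefix and text.startswith(patterns[index], start):
                matches.append((start, patterns[index]))
        end += 1                      # after checking candidates, shift 1 character
    return matches


print(wm_search("suctiricheck", ["such", "rich", "archer", "check"]))
# [(5, 'rich'), (7, 'check')]
```

The reported start positions are 0-based offsets into the text; the two matches correspond to the two "Matched!" steps on the slide.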

Improved algorithm
The WM algorithm has two limitations:
Its performance is severely affected by LSP. If LSP is very small, the algorithm has little opportunity to shift far.
As the pattern set grows, the lists tied to the HASH table may become unbalanced (some lists become much longer than others).
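
Both limitations can be observed on the small pattern set from the earlier slides. The sketch below is only illustrative: `max_shift` and `hash_list_lengths` are hypothetical helper names, the pattern "ab" is an invented short pattern added to shrink LSP, and hash lists are keyed directly by the B-gram suffix of each f-string.

```python
from collections import Counter


def max_shift(patterns, block_size=2):
    """Largest possible shift, LSP - B + 1, for a pattern set."""
    lsp = min(len(p) for p in patterns)
    return lsp - block_size + 1


def hash_list_lengths(patterns, block_size=2):
    """Number of patterns in each hash list (original WM f-strings)."""
    lsp = min(len(p) for p in patterns)
    # The hash key is the B-gram suffix of the first LSP characters.
    return Counter(p[lsp - block_size:lsp] for p in patterns)


patterns = ["such", "rich", "archer", "check"]
print(max_shift(patterns))                 # 3
print(max_shift(patterns + ["ab"]))        # 1: one short pattern limits every shift
print(dict(hash_list_lengths(patterns)))   # {'ch': 3, 'ec': 1}: unbalanced lists
```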

Improved algorithm
The improvement has two aspects:
A selection method for choosing f-strings: reduces the number of candidate patterns for a scan window.
An INDEX table: reduces the time needed to find candidate patterns in the hash lists.

Improved algorithm - selection method
The original WM algorithm always chooses the first LSP characters of a pattern as its f-string, without considering any characteristics of the pattern set.

Improved algorithm - selection method
A simple selection strategy that depends only on the pattern set itself has two steps:
Step 1: For every possible B-gram, count and record how many patterns in the pattern set contain that B-gram as a substring.
Step 2: For a given pattern, among all its substrings of length LSP, pick the one whose B-gram suffix has the minimum occurrence count and make it the f-string of the pattern.

Improved algorithm - selection method
Step 1: For example, given the pattern set P = { p1: "abden", p2: "abduct", p3: "abd", p4: "abduce", p5: "abducent" }, the number of patterns that contain each 2-gram is:
2-gram  ab  bd  du  uc  ce  en  nt  ct  de  others
Count   5   5   3   3   2   2   1   1   1   0

Improved algorithm - selection method
Step 2: The corresponding set F of the given pattern set P (LSP = 3) is:
F = { fp1: "bde", fp2: "uct", fp3: "abd", fp4: "uce", fp5: "ent" }
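
The two selection steps can be sketched as follows. This is a minimal illustration, not the paper's code: the function names are hypothetical, B = 2 is taken from the 2-gram example, and ties between equally rare B-gram suffixes are broken by taking the leftmost substring, which is an assumption since the slides do not state a tie-breaking rule.

```python
from collections import Counter


def count_b_grams(patterns, block_size=2):
    """Step 1: for each B-gram, count the patterns that contain it."""
    counts = Counter()
    for pattern in patterns:
        # A set ensures each pattern is counted at most once per B-gram.
        counts.update({pattern[i:i + block_size]
                       for i in range(len(pattern) - block_size + 1)})
    return counts


def select_f_strings(patterns, block_size=2):
    """Step 2: pick the length-LSP substring whose B-gram suffix is rarest."""
    lsp = min(len(p) for p in patterns)
    counts = count_b_grams(patterns, block_size)
    f_strings = []
    for pattern in patterns:
        candidates = [pattern[i:i + lsp] for i in range(len(pattern) - lsp + 1)]
        # min() keeps the leftmost candidate on ties (assumed tie-break).
        f_strings.append(min(candidates, key=lambda s: counts[s[-block_size:]]))
    return f_strings


P = ["abden", "abduct", "abd", "abduce", "abducent"]
print(select_f_strings(P))
# ['bde', 'uct', 'abd', 'uce', 'ent']
```

With the original rule every pattern in P would share the f-string "abd", so all five would fall into one hash list; the selected f-strings end in five different 2-grams.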

Improved algorithm - INDEX table
In the original algorithm, when the B-gram suffix of the current scan window (assume it hashes to i) is encountered during scanning, every pattern in the hash list tied to HASH[i] is checked for candidacy using the PREFIX table. This is inefficient for long hash lists.

Improved algorithm - INDEX table
A simple subordinate data structure called the "INDEX table" is designed to take the place of the PREFIX table.

Experiment Results

Experiment Results - under various LSPs
The size of each pattern set is fixed at 5 × 10^5 patterns and the size of the text is fixed at 100 MB.

Experiment Results - under various numbers of patterns
The LSP of each pattern set is fixed at 7, the size of the text is still 100 MB, and B = 3.