Exact pattern matching on resource-limited network devices. Chien-Chung Su, 2002/12/10.

Presentation transcript:


2 Outline
– Problem definition
– Resource-limited network devices
– Introduction of SEBMH
– Disadvantages of SEBMH
– Adaptive bucket management
– Conclusion

3 Problem definition
Given
– P: pattern(s)
– T: text
General action
– Find all occurrences of P in T
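As a baseline for this definition, here is a minimal brute-force sketch in Python. It is an illustration and not a method from the slides. It reports every occurrence of each pattern in P within T, including overlapping occurrences. The function name `find_all` is hypothetical.

```python
def find_all(patterns: list[bytes], text: bytes) -> list[tuple[int, bytes]]:
    """Return (offset, pattern) for every occurrence of every pattern in text."""
    hits = []
    for p in patterns:
        if not p:
            continue  # an empty pattern would match at every offset; skip it
        start = text.find(p)
        while start != -1:
            hits.append((start, p))
            # restart one byte later so overlapping occurrences are reported
            start = text.find(p, start + 1)
    return sorted(hits)


print(find_all([b"ab", b"bab"], b"ababab"))
# [(0, b'ab'), (1, b'bab'), (2, b'ab'), (3, b'bab'), (4, b'ab')]
```

This approach scans T once per pattern, so its cost grows with the size of the pattern set. Shift-table approaches try to avoid exactly that cost.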

4 Research on exact pattern matching
The exact matching problem is solved for typical word-processing applications. The story changes radically for other, more specialized applications:
– DNA and protein search
– The relation between search performance and database size
– Network intrusion detection

5 Resource-limited network devices
Special issues
– Security: check whether P occurs in T
– Limited resources: try to break the tradeoff between speed and space
Characteristics
– Network-related pattern matching: patterns change occasionally, while texts change constantly
Solutions
– Dynamic hash function
– Adaptive bucket management

6 SEBMH
– Global shift table
– Hash-link-list structure of ASCII patterns
– Hash-link-list structure of non-ASCII patterns
– Input mask
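The slides name the components of SEBMH but do not spell out its algorithm. The following hypothetical Python sketch shows how a global shift table and per-bucket pattern lists can cooperate. It uses a Wu-Manber-style scan with a one-byte block, which is a swapped-in technique and not SEBMH itself. It omits the Set-Exclusive table, the ASCII/non-ASCII split and the input mask. The names `build_tables` and `search` are made up.

```python
from collections import defaultdict


def build_tables(patterns):
    """Global shift table plus hash buckets keyed by a window's last byte."""
    if not patterns or min(len(p) for p in patterns) == 0:
        raise ValueError("need at least one non-empty pattern")
    m = min(len(p) for p in patterns)  # window length = shortest pattern
    shift = [m] * 256                  # default: skip a whole window
    buckets = defaultdict(list)        # byte value -> candidate patterns (the "link list")
    for p in patterns:
        # only the first m bytes of each pattern take part in the shift table
        for j in range(m):
            shift[p[j]] = min(shift[p[j]], m - 1 - j)
        buckets[p[m - 1]].append(p)
    return m, shift, buckets


def search(patterns, text):
    m, shift, buckets = build_tables(patterns)
    hits = []
    pos = 0
    while pos + m <= len(text):
        last = text[pos + m - 1]  # byte at the end of the current window
        s = shift[last]
        if s > 0:
            pos += s              # no pattern can start in the skipped positions
            continue
        # shift 0: verify only the patterns hashed to this bucket
        for p in buckets.get(last, ()):
            if text.startswith(p, pos):
                hits.append((pos, p))
        pos += 1
    return hits


print(search([b"GET", b"POST", b"Host"], b"GET / HTTP/1.1\r\nHost: x\r\n"))
```

The shift table lets the scan jump over windows that no pattern can end in. The bucket lists limit full comparisons to the few patterns that share the window's last byte.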

7 Set-Exclusive table

8 Disadvantages of SEBMH
Because the hash function is static, performance still depends on the pattern set.
– Remedy: a dynamic hash function
In the general pattern matching problem, the global shift values approach 1 as the number of patterns grows.
– Remedy: classify the patterns to reduce this effect
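The following hypothetical Python sketch makes the shift-value argument concrete. It reuses a Wu-Manber-style one-byte shift table, which is an assumption; the actual SEBMH table may differ. It averages the shift over an alphabet and treats every byte as equally likely, which real traffic is not. The sketch shows the mean shift falling as patterns are added. It also shows that a per-class table never shifts less than the global table.

```python
def shift_table(patterns):
    """One-byte shift table over the first m bytes of each pattern."""
    m = min(len(p) for p in patterns)
    shift = [m] * 256
    for p in patterns:
        for j in range(m):
            shift[p[j]] = min(shift[p[j]], m - 1 - j)
    return shift


def mean_shift(patterns, alphabet):
    """Average shift, assuming every byte in `alphabet` is equally likely."""
    table = shift_table(patterns)
    return sum(table[c] for c in alphabet) / len(alphabet)


alphabet = range(ord("A"), ord("Z") + 1)
# hypothetical 4-byte patterns: "ABCD", "BCDE", "CDEF", ...
patterns = [bytes(range(65 + i, 69 + i)) for i in range(12)]

for k in (1, 4, 8, 12):
    print(k, "patterns -> mean shift", round(mean_shift(patterns[:k], alphabet), 2))

# classifying: split the same patterns into two classes with separate tables
class_a, class_b = patterns[:6], patterns[6:]
print("global:", round(mean_shift(patterns, alphabet), 2),
      "class A:", round(mean_shift(class_a, alphabet), 2),
      "class B:", round(mean_shift(class_b, alphabet), 2))
```

When all patterns have the same length, the global table is the entry-wise minimum of the class tables. Each class therefore scans with shifts at least as large as the global ones.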

9 How to improve
– Pattern classifier
– Approximate perfect hash function
– Adaptive bucket management

10 Approximate hash function (1)
Step 1. Sort the class's target patterns by KEY.
Step 2. Distribute the class's target patterns equally into the buckets:
    n = 0;
    while (patterns remain) {
        for (i = 0; i < AVG_P && patterns remain; i++) {
            dispatch the current pattern into bucket(n);
            get the next pattern;
        }
        n++;
    }
Step 3. Handle the exception condition:
    for (i = 1; i < BUCKET_NUM; i++) {
        if (bucket(i-1) and bucket(i) both hold patterns with the same key)
            group these patterns into bucket(i-1) or bucket(i);
    }
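Here is a runnable Python reading of the three steps. It assumes, hypothetically, that KEY is a function of a pattern; here it is the pattern's first byte. Step 3 moves the run of equal keys at the head of a bucket into the previous bucket, so each key lives in exactly one bucket. Querying the result by binary search over each bucket's first key is also an assumption about how this hash would be used.

```python
import bisect
import math


def build_buckets(patterns, key, bucket_num):
    # Step 1: sort the class's target patterns by KEY
    ordered = sorted(patterns, key=key)
    # Step 2: distribute them equally, AVG_P patterns per bucket
    avg_p = max(1, math.ceil(len(ordered) / bucket_num))
    buckets = [ordered[i:i + avg_p] for i in range(0, len(ordered), avg_p)]
    # Step 3: equal keys must not straddle a boundary; move them back
    merged = buckets[:1]
    for b in buckets[1:]:
        while b and key(b[0]) == key(merged[-1][-1]):
            merged[-1].append(b.pop(0))
        if b:  # drop buckets emptied by the regrouping
            merged.append(b)
    return merged


def lookup(buckets, key, k):
    """Index of the only bucket that can hold patterns whose key is k."""
    firsts = [key(b[0]) for b in buckets]
    return max(bisect.bisect_right(firsts, k) - 1, 0)


def first_byte(p):
    return p[0]  # hypothetical KEY


pats = [b"GET", b"GIF", b"HEAD", b"HOST", b"POST", b"PUT", b"USER", b"PASS"]
buckets = build_buckets(pats, first_byte, 3)
print(buckets)
# [[b'GET', b'GIF', b'HEAD', b'HOST'], [b'POST', b'PUT', b'PASS'], [b'USER']]
print(lookup(buckets, first_byte, ord("P")))  # 1
```

Regrouping can leave the buckets slightly uneven, which is presumably why the hash function is described as only approximately perfect.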

11 Approximate hash function (2)

12 Adaptive bucket management
Assumptions
– Resources are limited
– The total number of buckets is fixed
Step 1: classify the patterns
– For example (using a feature as the classification factor): Class A, Class B, Class C

13 Adaptive bucket management
Step 2: allocate buckets
– For example, traffic distribution:
   Class A: 50%
   Class B: 30%
   Class C: 20%
Policy
– SEBMH (Class A) could get more buckets at this time
– The Set-Exclusive table becomes more effective
   » buckets ↑, patterns per bucket ↓, efficacy of the Set-Exclusive table ↑
   » buckets ↓, Set-Exclusive table utilization ↑

14 How to allocate buckets
– Communism
– Fair
– Greedy

15 Basic assumptions
Assumptions
– Φ: matching time for one pattern
– B: total number of buckets
– P: total number of patterns
– C: number of classes
– Bi: number of buckets for class i
– Pi: number of patterns for class i
– Di: traffic share of class i
Known
– P1 + P2 + … + Pc = P
– D1 + D2 + … + Dc = 1
Problem
– Find a sequence (B1, B2, …, Bc) with B1 + B2 + … + Bc = B such that the average matching time is small enough
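The transcript does not preserve the formula for average matching time. One plausible model, stated here purely as an assumption, works as follows. A class-i packet is checked against the patterns in one of class i's buckets, which holds about Pi/Bi patterns, so the packet costs Φ·Pi/Bi. Weighting by traffic gives T = Φ·(D1·P1/B1 + … + Dc·Pc/Bc). Here is a small Python helper for that hypothetical model:

```python
import math


def avg_matching_time(phi, D, P, B):
    """Hypothetical model: T = phi * sum_i D_i * P_i / B_i (not from the slides)."""
    if not (len(D) == len(P) == len(B)):
        raise ValueError("D, P and B need one entry per class")
    if any(b < 1 for b in B):
        raise ValueError("every class needs at least one bucket")
    if not math.isclose(sum(D), 1.0):
        raise ValueError("traffic shares D_i must sum to 1")
    return phi * sum(d * p / b for d, p, b in zip(D, P, B))


# traffic and pattern distributions from the greedy-method example, with B = 10
print(avg_matching_time(1.0, (0.5, 0.3, 0.2), (5, 5, 20), (5, 3, 2)))
```

Under this model, the problem is to choose positive Bi that sum to B and minimize T. More buckets for a class lower its per-packet cost, and heavily used classes weigh more.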

16 Communism method
ABM is not applied.
Without ABM
– No classifier is needed
– Average matching time:
– Other overheads
   Overhead of approximate perfect hashing
   The Global-Shift table brings little benefit
   The Set-Exclusive table brings little benefit

17 Fair method
Always yields at least one solution.
For example
– Traffic distribution: Class A 50%, Class B 30%, Class C 20%
With ABM in the fair method
– Average matching time:
– Example:
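The slide does not define the fair split. Its example lists only traffic shares, so this hypothetical Python sketch reads the method as follows. Every class gets one bucket, and the remaining buckets are divided in proportion to Di using largest-remainder rounding. Both the proportional rule and the rounding scheme are assumptions.

```python
import math


def fair_allocation(total_buckets, D):
    """Hypothetical fair method: B_i roughly proportional to traffic share D_i."""
    c = len(D)
    if total_buckets < c:
        raise ValueError("need at least one bucket per class")
    spare = total_buckets - c  # buckets left after one per class
    quotas = [d * spare for d in D]
    alloc = [1 + math.floor(q) for q in quotas]
    # hand the leftovers to the classes with the largest fractional parts
    order = sorted(range(c), key=lambda i: quotas[i] - math.floor(quotas[i]), reverse=True)
    for k in range(total_buckets - sum(alloc)):
        alloc[order[k % c]] += 1
    return tuple(alloc)


print(fair_allocation(10, (0.5, 0.3, 0.2)))  # (5, 3, 2) for Class A, B, C
```

Reserving one bucket per class first means that no class, however light its traffic, is left without a bucket.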

18 Greedy method
We can find better solutions.
For example
   Class | Traffic distribution | Pattern distribution
   A     | 50%                  | 5
   B     | 30%                  | 5
   C     | 20%                  | 20
With ABM in the greedy method
– Average matching time:
– Example:

19 Experiment report

20 Objective
Observe how the optimal solutions are distributed, in the hope of deriving an algorithm that finds them.
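One way to observe the optimal solutions is brute force. This Python sketch assumes the hypothetical cost model T = Φ·Σ Di·Pi/Bi, which is not the formula from the slides. It enumerates every split of B buckets into C positive parts and keeps the cheapest. The two calls mirror the proportional and inversely proportional cases. The first uses made-up pattern counts; the second uses the counts from the greedy-method example.

```python
from itertools import combinations


def compositions(total, parts):
    """Every tuple of `parts` positive integers summing to `total`."""
    for cuts in combinations(range(1, total), parts - 1):
        bounds = (0, *cuts, total)
        yield tuple(bounds[i + 1] - bounds[i] for i in range(parts))


def cost(D, P, B, phi=1.0):
    # hypothetical model: phi * sum_i D_i * P_i / B_i
    return phi * sum(d * p / b for d, p, b in zip(D, P, B))


def optimal_allocation(total, D, P):
    return min(compositions(total, len(D)), key=lambda B: cost(D, P, B))


D = (0.5, 0.3, 0.2)
print(optimal_allocation(10, D, (15, 9, 6)))  # patterns proportional to traffic
print(optimal_allocation(10, D, (5, 5, 20)))  # patterns inversely related to traffic
```

Under this model, the first case returns (5, 3, 2), which matches a traffic-proportional split. The second returns (3, 3, 4), which shifts buckets toward class C. This is a property of the assumed model only and does not reproduce the experiment's data. Enumeration is feasible for B = 10 or 30 with three classes (36 and 406 candidates).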

21 Traffic distribution proportional to pattern distribution
Plots: Bucket = 10, Bucket = 30

22 Traffic distribution inversely proportional to pattern distribution
Plots: Bucket = 10, Bucket = 30

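23 Conclusion
Adaptive bucket allocation is effective only when the pattern and traffic distributions are inversely proportional. This can serve as a reference for training the classifier.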

24 Greedy algorithm (tentative)
Step 1: get Bi from the fair method
Step 2: borrow 1 bucket from each class
– bonus_bucket = number of classes
Step 3: dispatch the bonus buckets
– Bonus_i = floor(bonus_bucket × (Pi / P))
Step 4: dispatch the remaining buckets
– Add them one at a time: try each class and keep the assignment that gives the best solution
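This is a Python sketch of the four steps, built on two assumptions not stated in the transcript. First, the fair method is a traffic-proportional split with one guaranteed bucket per class. Second, the cost is the hypothetical model T = Φ·Σ Di·Pi/Bi. The sketch also makes one deliberate deviation. Step 2 borrows only from classes that hold more than one bucket, so no class can drop to zero buckets. The slide's version borrows from every class.

```python
import math


def cost(D, P, B, phi=1.0):
    # hypothetical model: phi * sum_i D_i * P_i / B_i
    return phi * sum(d * p / b for d, p, b in zip(D, P, B))


def fair(total, D):
    # hypothetical fair method: one bucket each, the rest split by D_i
    c, spare = len(D), total - len(D)
    quotas = [d * spare for d in D]
    alloc = [1 + math.floor(q) for q in quotas]
    order = sorted(range(c), key=lambda i: quotas[i] - math.floor(quotas[i]), reverse=True)
    for k in range(total - sum(alloc)):
        alloc[order[k % c]] += 1
    return alloc


def greedy(total, D, P):
    B = fair(total, D)                                # Step 1
    lenders = [i for i, b in enumerate(B) if b > 1]   # Step 2 (deviation: keep >= 1)
    for i in lenders:
        B[i] -= 1
    bonus_bucket = len(lenders)
    p_total = sum(P)
    for i in range(len(B)):                           # Step 3: Bonus_i = floor(bonus * Pi/P)
        B[i] += math.floor(bonus_bucket * P[i] / p_total)

    def with_extra(i):
        return B[:i] + [B[i] + 1] + B[i + 1:]

    for _ in range(total - sum(B)):                   # Step 4: one bucket at a time
        best = min(range(len(B)), key=lambda i: cost(D, P, with_extra(i)))
        B[best] += 1
    return tuple(B)


D, P = (0.5, 0.3, 0.2), (5, 5, 20)
print("fair:", tuple(fair(10, D)), "greedy:", greedy(10, D, P))
```

With the example distributions, the fair split is (5, 3, 2) and the sketch returns (4, 2, 4). Under the assumed model, (4, 2, 4) is cheaper than the fair split but still slightly worse than the split (3, 3, 4) found by exhaustive search. This gap is in line with the algorithm being marked tentative.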

25 How to classify patterns (1)
Goals the classifier should achieve
– High priority: reduce how often ABM is performed
– Low priority: enhance the efficacy of ABM

26 How to classify patterns (2)
Reduce how often ABM is performed
– When ABM should not be performed for specific classes:
   … (1)
   … (2)

27 How to classify patterns (3)
Expected effect of … and …
– … ↑
– … ↓
– … ↑
– … ↓

28 How to classify patterns (4)
Enhance the efficacy of ABM
– Try to form classes so that Di decreases as Pi increases

29 How to classify patterns (5)
Operators
– Combination: directly combine two classes in the same domain
– Sibling aggregation: combine two classes in different domains
(Figure: pattern class tree with root "patterns" and branches TCP, UDP, ICMP and Other; HTTP, FTP, … under TCP; TFTP under UDP)
Objective
– Build the class tree so that each class has stable traffic …
Constraint
– Many patterns with the same prefix in the same class should form an independent class

30 How to classify patterns (6)
Mathematical model for training the classifier
– Merge two classes when both the conditions on the means and the conditions on the variances hold
– … have the same meanings as before
– k (≥ 1) is a coefficient that trades off resource use (larger k) against performance (smaller k)

31 How to classify patterns (7)
Conditions on the means

32 How to classify patterns (8)
Conditions on the variances

33 Classifier
Advantages
– Reduces the impact of the complex approximate perfect hash function
– Eliminates unnecessary pattern matching

34 Classifier behavior
Does the input packet belong to any class?
– NO: bypass
– YES: dispatch the input packet to the corresponding handler
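Here is a minimal Python sketch of this decision. It uses a hypothetical class map modelled on the protocol tree from the "How to classify patterns (5)" slide. In that tree, HTTP and FTP run over TCP and TFTP runs over UDP. The sketch keys these classes by the well-known destination ports 80, 21 and 69. The transcript gives no real classification rules, and `classify`, `process` and `http_matcher` are illustrative names.

```python
from typing import Callable, Dict, List, Optional, Tuple

Matcher = Callable[[bytes], List[bytes]]

# hypothetical class map: (transport protocol, destination port) -> class
CLASS_MAP: Dict[Tuple[str, int], str] = {
    ("tcp", 80): "HTTP",
    ("tcp", 21): "FTP",
    ("udp", 69): "TFTP",
}


def classify(proto: str, dst_port: int) -> Optional[str]:
    """Return the packet's class, or None if it belongs to no class."""
    return CLASS_MAP.get((proto, dst_port))


def process(packet: Tuple[str, int, bytes], matchers: Dict[str, Matcher]):
    proto, dst_port, payload = packet
    cls = classify(proto, dst_port)
    if cls is None or cls not in matchers:
        return "bypass", []             # NO: skip pattern matching entirely
    return cls, matchers[cls](payload)  # YES: match only this class's patterns


def http_matcher(payload: bytes) -> List[bytes]:
    # stand-in for a per-class matcher, e.g. a class-specific SEBMH instance
    return [p for p in (b"GET", b"POST") if p in payload]


matchers = {"HTTP": http_matcher}
print(process(("tcp", 80, b"GET /index.html"), matchers))  # ('HTTP', [b'GET'])
print(process(("udp", 53, b"\x12\x34"), matchers))         # ('bypass', [])
```

Bypassed packets never reach any matcher, which corresponds to the advantage of eliminating unnecessary pattern matching.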

35 Next Experiments

36 Conclusion