Krzysztof Fabjański Common string pattern searching.


1 Krzysztof Fabjański Common string pattern searching

2 Presentation outline: ➢ methods of network traffic collection and its representation ➢ process of signature generation ➢ summary and conclusions

3 Methods of network traffic collection and its representation. Collecting network traffic: ➢ PC + tcpdump ➢ PC + Snort ➢ honeynet ➢ Nepenthes (malware collection) ➢ ARAKIS. Representing network traffic: ➢ tcpdump format (with payload)

4 Sample tcpdump output with payload
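Before any comparison, a dump like this has to be turned back into raw payload bytes. A minimal sketch, assuming the hex-only output of `tcpdump -x` (which avoids mistaking the ASCII column of `-X` for hex digits) and IPv4/TCP packets captured in full; the function names are hypothetical.

```python
import re

# One hex line of `tcpdump -x` output, e.g. "\t0x0010:  0a00 0001 0a00 0002"
HEX_LINE = re.compile(r"^\s+0x[0-9a-fA-F]+:\s+(?P<hex>[0-9a-fA-F ]+)$")

def packets_from_tcpdump_x(lines):
    """Group consecutive hex lines into one bytes object per packet."""
    packets, current = [], bytearray()
    for line in lines:
        match = HEX_LINE.match(line.rstrip("\r\n"))
        if match:
            # bytes.fromhex skips the spaces between the 2-byte groups
            current += bytes.fromhex(match.group("hex").strip())
        elif current:  # a non-hex line (next packet summary) closes the packet
            packets.append(bytes(current))
            current = bytearray()
    if current:
        packets.append(bytes(current))
    return packets

def tcp_payload(ip_packet):
    """Strip the IPv4 and TCP headers; returns b"" for anything else."""
    if len(ip_packet) < 20 or ip_packet[0] >> 4 != 4 or ip_packet[9] != 6:
        return b""
    ihl = (ip_packet[0] & 0x0F) * 4                     # IPv4 header length
    if len(ip_packet) < ihl + 20:
        return b""
    tcp_len = (ip_packet[ihl + 12] >> 4) * 4            # TCP data offset
    total_len = int.from_bytes(ip_packet[2:4], "big")   # drops link padding
    return ip_packet[ihl + tcp_len:total_len]
```

The input would be the text produced by something like `tcpdump -nn -x -r capture.pcap`; `-x` prints each packet without its link-level header, so the dump starts at the IP header.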

5 Process of signature generation ➢ identification of the attack ➢ classification of the threat ➢ classification of the vulnerability ➢ network traffic representation ➢ proposal of the signature ➢ normalization and validation ➢ introduction of the new signature into the rule set. Area of interest.
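The last step writes a found common substring as a rule. A minimal sketch, assuming Snort rule syntax as the target (the slides list Snort as a collection tool but do not name the rule language): printable ASCII stays literal in the `content` option, while other bytes and the characters `;`, `\`, `"` and `|` go into `|..|` hex blocks. The `msg` text and `sid` are placeholders; Snort reserves sids of 1000000 and above for local rules.

```python
def snort_content(signature: bytes) -> str:
    """Encode bytes for a Snort content option using |hex| blocks."""
    out, hex_run = [], []
    for b in signature:
        ch = chr(b)
        if 0x20 <= b < 0x7F and ch not in ';\\"|':
            if hex_run:                      # close a pending hex block
                out.append("|" + " ".join(hex_run) + "|")
                hex_run = []
            out.append(ch)
        else:
            hex_run.append("%02X" % b)
    if hex_run:
        out.append("|" + " ".join(hex_run) + "|")
    return "".join(out)

def snort_rule(signature, msg, sid, proto="tcp", dst_port="any"):
    """Hypothetical helper; msg must not itself contain '"' or ';'."""
    return ('alert %s any any -> any %s (msg:"%s"; content:"%s"; sid:%d; rev:1;)'
            % (proto, dst_port, msg, snort_content(signature), sid))
```

For instance, `snort_content(b"GET /\x90\x90")` yields `GET /|90 90|`. Validation against real traffic (the normalization and validation step) is still needed before the rule goes into a rule set.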

6 Problem of a huge amount of information: the web site sport.onet.pl loaded in 3 s. During that time tcpdump captured 195 packets. The file with the captured packets had 5666 lines and a size of 415155 bytes.
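Turning these figures into rates shows the scale of the problem. The hourly value is an extrapolation that assumes the same load is repeated continuously, not a measurement, and all byte counts refer to the dump file rather than to bytes on the wire.

```python
seconds, packets, dump_lines, dump_bytes = 3, 195, 5666, 415_155

rates = {
    "packets_per_second": packets / seconds,             # 65.0
    "bytes_per_packet": dump_bytes / packets,            # 2129.0
    "lines_per_packet": round(dump_lines / packets, 2),  # 29.06
    "bytes_per_second": dump_bytes / seconds,            # 138385.0
    # assumption: the same 3 s load repeated for one hour
    "bytes_per_hour": dump_bytes * 3600 // seconds,      # 498186000
}
for name, value in rates.items():
    print(name, value)
```

Roughly half a gigabyte of dump text per hour from a single page load repeated over and over is why the comparison has to be automated.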

7 Problem of similarities: should we generate three longer signatures (AA|C|HH, KK|WW|II, DD|LL|DD) or one common signature (ABC|C)? [Figure: example payloads encoded as sequences of letter blocks]

8 Types of analysis (pros and cons) Offline (DBSCAN): ➢ good precision ➢ low efficiency ➢ time-consuming Online (suffix tree algorithm): ➢ good precision ➢ good efficiency ➢ very fast
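The slide names DBSCAN for the offline approach without saying what is clustered. A minimal sketch, assuming payload strings compared with Levenshtein edit distance (an assumption, not stated in the slides); `eps` and `min_pts` are the usual DBSCAN parameters. Every pair of strings may be compared, which is where the cost of the offline approach comes from.

```python
NOISE = -1

def levenshtein(a, b):
    """Edit distance with a rolling row of the dynamic-programming table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def dbscan(points, eps, min_pts, dist):
    """Return one cluster label per point; NOISE marks outliers."""
    labels = [None] * len(points)
    cluster = -1

    def neighbours(i):  # includes i itself (distance 0)
        return [j for j in range(len(points)) if dist(points[i], points[j]) <= eps]

    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster        # border point: joins, not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nb = neighbours(j)
            if len(nb) >= min_pts:         # core point: keep expanding
                queue.extend(nb)
    return labels

payloads = ["GET /a.html", "GET /b.html", "GET /c.html", "POST /login", "zzzz"]
print(dbscan(payloads, eps=2, min_pts=2, dist=levenshtein))
```

With these made-up payloads the three similar GET requests form one cluster and the other two strings are noise.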

9 Suffix trees Suffix trees are versatile data structures useful in a variety of string processing problems. Bioinformatics: ➢ aligning entire genomes ➢ detecting repeats in DNA ➢ sequence homology. Traditional text applications: ➢ finding the longest palindrome ➢ finding the longest common substring in a set ➢ exact and approximate substring matching

10 Building the suffix tree with the naive algorithm [Figure: suffix tree of abcabd$, built by inserting the suffixes abcabd$, bcabd$, cabd$, abd$, bd$, d$ and $ one at a time] Running time O(n²)
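The naive construction inserts each suffix separately, walking down from the root and splitting an edge at the first mismatch; each insertion can take O(n) comparisons, hence O(n²) overall (ignoring string-slicing overhead). A minimal sketch that stores edge labels as Python strings for readability, where a real implementation would store index pairs; the input is assumed to end with a unique terminator such as `$`.

```python
class Node:
    def __init__(self):
        # first character of an edge -> (edge label, child node)
        self.children = {}

def insert_suffix(root, suffix):
    """Walk down from the root; split an edge at the first mismatch."""
    node = root
    while True:
        first = suffix[0]
        if first not in node.children:
            node.children[first] = (suffix, Node())      # new leaf
            return
        label, child = node.children[first]
        k = 0
        while k < len(label) and k < len(suffix) and label[k] == suffix[k]:
            k += 1
        if k == len(label):                              # whole edge matched
            node, suffix = child, suffix[k:]
            continue
        middle = Node()                                  # split the edge at k
        middle.children[label[k]] = (label[k:], child)
        middle.children[suffix[k]] = (suffix[k:], Node())
        node.children[first] = (label[:k], middle)
        return

def build_suffix_tree_naive(text):
    """Insert text[0:], text[1:], ... one by one."""
    root = Node()
    for i in range(len(text)):
        insert_suffix(root, text[i:])
    return root

def leaf_paths(node, prefix=""):
    """Yield the string spelled on the way to every leaf."""
    if not node.children:
        yield prefix
    for label, child in node.children.values():
        yield from leaf_paths(child, prefix + label)
```

For `abcabd$` the root gets the edges `ab`, `b`, `cabd$`, `d$` and `$`, and the leaves spell exactly the seven suffixes.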

11 Building the suffix tree with Ukkonen's algorithm in O(n) ➢ online algorithm ➢ uses suffix links, which link the node for xα to the node for α (xα → α) [Figure: suffix tree of abcabd$ with suffix links]
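Suffix links are easiest to see in the uncompressed, online suffix-trie version of the same idea: after each new character c, every suffix of the prefix read so far gets a child c, and the link xα → α lets the construction jump from one suffix to the next shorter one instead of restarting at the root. A minimal sketch with hypothetical names; the trie can grow quadratically, which is what the compressed suffix tree avoids.

```python
class TrieNode:
    def __init__(self):
        self.children = {}        # character -> TrieNode
        self.suffix_link = None   # node for x·alpha points to node for alpha

def build_suffix_trie_online(text):
    """After reading text[:i+1] the trie holds every substring of it."""
    root = TrieNode()
    top = root  # node spelling the whole prefix read so far
    for c in text:
        r, prev_new = top, None
        # Visit suffixes longest first (via suffix links) until one already
        # continues with c; each visited suffix gets a new child for c.
        while r is not None and c not in r.children:
            new = TrieNode()
            r.children[c] = new
            if prev_new is not None:
                prev_new.suffix_link = new   # x·alpha·c -> alpha·c
            prev_new = new
            r = r.suffix_link                # the root's link is None
        # The shortest new node links to alpha·c, or to the root for "c".
        prev_new.suffix_link = root if r is None else r.children[c]
        top = top.children[c]
    return root
```

Every node except the root ends up with a link to the node spelling its label without the first character.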

12 Building the suffix tree with Ukkonen's algorithm: pseudocode
create a root
add a branch and a leaf labelled S[1]
LastExtension = 1
for Phase = 2 to length[S] do
    for Extension = LastExtension to Phase do
        find the end of the path labelled S[Extension..Phase-1]
        extend the path with S[Phase]
        if extension rule 3 was applied then break
    done
    LastExtension = Extension
done
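The phase/extension loop above, combined with suffix links and the skip/count trick, is commonly implemented with an "active point" (active node, edge and length) plus a counter of pending suffixes. The sketch below follows that common formulation rather than the slide's pseudocode line by line. Names are illustrative, leaves share an open end so rule-1 extensions cost nothing, and the input is assumed to end with a unique terminator such as `$`.

```python
class Node:
    def __init__(self, start, end=None):
        self.start = start   # the edge into this node is text[start:end]
        self.end = end       # None marks a leaf whose end grows every phase
        self.children = {}   # first character of an outgoing edge -> Node
        self.link = None     # suffix link, set on internal nodes

def ukkonen(text):
    """Suffix tree of text in O(n); text must end with a unique terminator."""
    root = Node(-1, -1)
    active_node, active_edge, active_len = root, 0, 0
    remainder = 0  # suffixes of the current prefix not yet inserted explicitly

    def edge_len(node, pos):
        return (pos + 1 if node.end is None else node.end) - node.start

    for pos, c in enumerate(text):             # phase: append text[pos]
        remainder += 1
        last_new = None                        # internal node awaiting its link
        while remainder > 0:                   # remaining extensions
            if active_len == 0:
                active_edge = pos
            nxt = active_node.children.get(text[active_edge])
            if nxt is None:                    # rule 2: new leaf below a node
                active_node.children[text[active_edge]] = Node(pos)
                if last_new is not None:
                    last_new.link = active_node
                    last_new = None
            else:
                length = edge_len(nxt, pos)
                if active_len >= length:       # skip/count down the edge
                    active_node = nxt
                    active_edge += length
                    active_len -= length
                    continue
                if text[nxt.start + active_len] == c:   # rule 3: end the phase
                    if last_new is not None and active_node is not root:
                        last_new.link = active_node
                        last_new = None
                    active_len += 1
                    break
                split = Node(nxt.start, nxt.start + active_len)  # rule 2 mid-edge
                split.link = root
                active_node.children[text[active_edge]] = split
                split.children[c] = Node(pos)
                nxt.start += active_len
                split.children[text[nxt.start]] = nxt
                if last_new is not None:
                    last_new.link = split
                last_new = split
            remainder -= 1
            if active_node is root and active_len > 0:
                active_len -= 1
                active_edge = pos - remainder + 1
            elif active_node is not root:
                active_node = active_node.link        # jump via suffix link
    return root

def leaf_suffixes(root, text):
    """Strings spelled from the root to each leaf (should be all suffixes)."""
    out, stack = [], [(root, "")]
    while stack:
        node, path = stack.pop()
        if not node.children and node is not root:
            out.append(path)
        for child in node.children.values():
            end = len(text) if child.end is None else child.end
            stack.append((child, path + text[child.start:end]))
    return sorted(out)
```

`leaf_suffixes(ukkonen("abcabd$"), "abcabd$")` returns the seven suffixes of the slide's example, the same tree the naive algorithm builds.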

13 Comparison of two strings s1 and s2 in steps ➢ building a suffix tree for s1 ➢ matching each suffix of s2 against the suffix tree of s1 as far as possible ➢ returning the longest match, i.e. the longest common substring of s1 and s2
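A minimal sketch of these steps, assuming an uncompressed suffix trie of s1 (nested dictionaries) in place of a compressed suffix tree to keep it short. Each suffix of s2 is walked down the trie as far as it matches, and the longest walk wins. This version restarts at the root for every suffix; a linear-time version would follow suffix links instead. The function names are hypothetical.

```python
def build_suffix_trie(s):
    """Uncompressed suffix trie: one dict level per character."""
    root = {}
    for i in range(len(s)):
        node = root
        for ch in s[i:]:
            node = node.setdefault(ch, {})
    return root

def longest_common_substring(s1, s2):
    trie = build_suffix_trie(s1)
    best = ""
    for j in range(len(s2)):            # each suffix of s2
        node, k = trie, j
        while k < len(s2) and s2[k] in node:
            node = node[s2[k]]
            k += 1
        if k - j > len(best):           # longest match seen so far
            best = s2[j:k]
    return best
```

For example, `longest_common_substring("abcabd", "xxcabz")` returns `"cab"`.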

14 Comparison of more than two strings using a generalized suffix tree ➢ concatenation of the strings {s1, s2, ..., sn}, each followed by its own unique terminator ➢ building a suffix tree for the concatenated string s using Ukkonen's approach ➢ returning the substring most common to {s1, s2, ..., sn}
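A minimal sketch of the concatenation approach. It assumes private-use Unicode characters as the unique separators (they must not occur in the inputs) and, again, an uncompressed trie instead of a Ukkonen-built tree. Each node records which input strings its suffixes came from, and paths are cut at the first separator so no label spans two strings. Here "most common" is read as the longest substring occurring in all n strings, which is one possible reading of the slide.

```python
def common_substring_all(strings):
    """Longest substring occurring in every input string."""
    seps = [chr(0xE000 + k) for k in range(len(strings))]  # assumed unused
    sep_set = set(seps)
    concat = "".join(t + sep for t, sep in zip(strings, seps))
    # owner[i] = index of the input string that position i belongs to
    owner = [k for k, t in enumerate(strings) for _ in range(len(t) + 1)]
    root = {"ids": set(), "kids": {}}
    for i in range(len(concat)):
        node = root
        for ch in concat[i:]:
            if ch in sep_set:           # labels stop at the separator
                break
            node = node["kids"].setdefault(ch, {"ids": set(), "kids": {}})
            node["ids"].add(owner[i])
    best, stack = "", [(root, "")]
    while stack:
        node, label = stack.pop()
        if len(label) > len(best):
            best = label
        for ch, child in node["kids"].items():
            if len(child["ids"]) == len(strings):  # shared by every input
                stack.append((child, label + ch))
    return best
```

For the three strings of the later example (without their terminators), `common_substring_all(["abc", "aba", "cab"])` returns `"ab"`.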

15 Common string pattern searching: main assumptions ➢ online string comparison with O(n) running time ➢ it should find all possible common substrings ➢ it should cluster the results into sets of common strings

16 Common string pattern searching: proposed approach ➢ a generalized suffix tree as the main structure (strings are added online, with no concatenation) ➢ additional variables describing the weight of each node (number of matches) ➢ an additional structure: a list of strings, each with the starting positions of the suffixes in that string (possibly implemented with hash tables)
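A minimal sketch of the proposed structure. It assumes an uncompressed generalized suffix trie in place of a compressed suffix tree (easier to read, but quadratic in size). Strings are added one at a time without concatenation, each node's weight is taken to be the number of suffixes passing through it, and each node keeps a hash table from string index to 1-based starting positions. The class and method names are hypothetical.

```python
from collections import defaultdict

class GSTNode:
    """Node of a generalized suffix trie (one character per edge)."""
    def __init__(self):
        self.children = {}                    # next character -> GSTNode
        self.weight = 0                       # number of suffixes through node
        self.occurrences = defaultdict(list)  # string index -> 1-based starts

class CommonPatternIndex:
    def __init__(self):
        self.root = GSTNode()
        self.strings = []

    def add(self, s):
        """Online addition: insert every suffix of s, no concatenation."""
        sid = len(self.strings)
        self.strings.append(s)
        for start in range(len(s)):
            node = self.root
            for ch in s[start:]:
                node = node.children.setdefault(ch, GSTNode())
                node.weight += 1
                node.occurrences[sid].append(start + 1)

    def common(self, min_strings):
        """Substrings found in at least min_strings of the added strings."""
        found, stack = {}, [(self.root, "")]
        while stack:
            node, label = stack.pop()
            for ch, child in node.children.items():
                if len(child.occurrences) >= min_strings:  # else prune subtree
                    found[label + ch] = dict(child.occurrences)
                    stack.append((child, label + ch))
        return found
```

After adding `cab$`, `aba$` and `abc$`, `common(3)` returns `$`, `a`, `b` and `ab`; lowering the threshold to 2 also returns `c`.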

17 An example for the strings cab$, aba$ and abc$ [Figure: generalized suffix tree of the three strings; each node is annotated with its weight (number of matches) and a list of {starting position, string} pairs, e.g. 3 {1 abc$} {1 aba$} {2 cab$}, 3 {2 abc$} {2 aba$} {3 cab$}, 2 {3 abc$} {1 cab$}, 3 {4 abc$} {4 aba$} {4 cab$}, and weight-1 leaves such as 1 {3 abc$}] Expected result: ab | $
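The annotations in this example can be cross-checked by brute force. The sketch below lists, for every substring, its 1-based starting positions in each input string. The weights and positions it prints for `ab`, `b`, `c` and `$` appear to match the four annotated nodes. The only substrings present in all three strings are `$`, `a`, `b` and `ab`, the longest being `ab` as expected. The function name is hypothetical.

```python
from collections import defaultdict

def occurrence_table(strings):
    """Brute force: substring -> {string: [1-based start positions]}."""
    table = defaultdict(lambda: defaultdict(list))
    for s in strings:
        for i in range(len(s)):
            for j in range(i + 1, len(s) + 1):
                table[s[i:j]][s].append(i + 1)
    return table

strings = ["cab$", "aba$", "abc$"]
table = occurrence_table(strings)
for sub in ("ab", "b", "c", "$"):
    weight = sum(len(p) for p in table[sub].values())
    print(sub, weight, dict(table[sub]))
shared = sorted(t for t, occ in table.items() if len(occ) == len(strings))
print(shared)  # ['$', 'a', 'ab', 'b']
```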

18 Thank you for your attention

