High-throughput Linked-Pattern Matching for Intrusion Detection System Zachary Baker and Viktor K. Prasanna University of Southern California

1 High-throughput Linked-Pattern Matching for Intrusion Detection System Zachary Baker and Viktor K. Prasanna University of Southern California

2 Outline
• Introduction to intrusion detection and hardware pattern matching
• Performance-centered design flow
  – Area and performance over large rule databases are more important
• Methodology
  – Library of architectural options
    – Separate pre-decoded pipelines
    – Basic architecture results
    – Customized performance: partitioning, prefix trees
  – Correlated content
  – Tool details
    – Reducing PAR cost through incremental synthesis
    – Handling multiple streams through re-sending
    – Efficiency of re-sending strategy
• Conclusion

3 IDS: Intrusion Detection Systems
• All incoming packets are filtered for specific characteristics or content
• Current databases have thousands of patterns requiring string matching
  – FPGA allows fine-grained parallelism and computational reuse
• 10 Gb/s and higher rates desired
  – This is a fairly artificial bound; header processing can reduce the overall string matching burden
  – Provided by pipelined, streaming architectures

4 String Matching
• Throughput of units must equal maximum buffered traffic on network
• Current strategies:
  – Naïve approach
  – Hashing à la Bloom filters
  – KMP, Boyer-Moore, Aho-Corasick (especially bit-splitting, Eatherton trie)
  – Hardwired shift-and-compare
    – Very fast and simple units
    – Allows a variety of interesting meta-layer work to be tacked on

5 Performance-Centered Design Flow
• System-centric view
• Several thousand pattern matching rules
  – Unit view of matching units can be inefficient
    – Unnecessary work and hardware
  – Methodology works towards time and area performance metrics
    – System level design
    – Reuse of hardware elements
    – Option to exchange some area efficiency for bandwidth
    – Allows for patterns to be combined together to extract deeper information

6 Tool Flow

7 Library of Design Options
• Basic architecture:
  – Shared decoding
    – Characters are decoded into one-bit pipelines and distributed to units as needed
  – Correlated content linkages
    – Reduces false positives
    – Reduces burden on external controller
• Customized performance
  – Prefix trees
    – Take advantage of partition-generated similarity by creating shared prefixes
  – High throughput architectures
    – Take advantage of low area requirements to replicate hardware
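
The shared-decoding idea can be illustrated in software. The following is a minimal sketch, not the authors' hardware: it assumes each decoded character feeds a one-bit shift register, and each pattern matcher is an AND over taps at the appropriate delays. The function name `match_positions` and the use of Python strings are illustrative assumptions.

```python
from collections import deque


def match_positions(patterns, data):
    """Simulate pre-decoded shift-and-compare matching.

    Returns {pattern: [index of the last byte of each match, ...]}.
    Patterns are assumed to be non-empty.
    """
    # Only characters that occur in some pattern need a decoder output.
    alphabet = set("".join(patterns))
    depth = max(len(p) for p in patterns)
    # One one-bit pipeline (shift register) per decoded character.
    pipes = {c: deque([0] * depth, maxlen=depth) for c in alphabet}
    hits = {p: [] for p in patterns}
    for i, ch in enumerate(data):
        # Decode the input character once and shift the bit into every pipeline.
        for c, pipe in pipes.items():
            pipe.appendleft(1 if c == ch else 0)
        # A matcher ANDs one tap per pattern character; the last character
        # is read at delay 0, the one before it at delay 1, and so on.
        for p in patterns:
            if all(pipes[c][len(p) - 1 - k] for k, c in enumerate(p)):
                hits[p].append(i)
    return hits
```

Because the decoders are shared, adding a pattern that reuses existing characters adds only taps and an AND, which is the property the slide attributes to shared decoding.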

8 Basic Architecture: Partitioned Pipelines

9 Basic Architecture: Pre-Processing
• System-level partitioning of patterns
  – Reduction in pipeline burden through min-cut partitioning
  – Shared characters are grouped into independent pipelines, increasing single-chip throughput and allowing for effective multi-chip partitioning
  – Tool first generates a graph representation of the pattern set, then executes the partitioning routine. The partitioned graphs are then translated into an architecture description
    – Partitioning is also useful in reducing P&R time
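
The graph step can be sketched as follows. The slides do not say how edges are weighted, so this sketch assumes an edge weight equal to the number of distinct characters two patterns share; the names `pattern_graph` and `cut_weight` are hypothetical. The actual tool hands the graph to the Metis library, which this sketch does not call; it only shows the quantity a min-cut partitioner would try to minimize.

```python
from itertools import combinations


def pattern_graph(patterns):
    """Build a weighted graph: one node per pattern, and an edge between two
    patterns weighted by the number of distinct characters they share
    (an assumed weighting, chosen for illustration)."""
    edges = {}
    for a, b in combinations(patterns, 2):
        shared = len(set(a) & set(b))
        if shared:
            edges[(a, b)] = shared
    return edges


def cut_weight(edges, assignment):
    """Total weight of edges whose endpoints fall in different partitions.
    `assignment` maps each pattern to a partition number."""
    return sum(w for (a, b), w in edges.items()
               if assignment[a] != assignment[b])
```

A low cut weight means patterns that share characters sit in the same partition, so each partition's pipelines carry fewer distinct decoded characters.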

10 Results - Basic Architecture
• Partitioning results in 20% increase in clock frequency
• Optimal number of partitions is unpredictable

11 Flow Options: Area-Efficient Tree Architecture
• 4-byte prefixes turn out to be very appropriate for intrusion detection:
    /cgi-bin/bigconf.cgi
    /cgi-bin/common/
    /cgi-sys/addalink.cgi
    /cgi-sys/entropysearch.cgi
• Script searches for 4-byte prefixes and sub-prefixes, then generates prefix matchers in hardware description
  – Essentially a hardware Aho-Corasick tree
  – Average of 15% decrease in area, 5% decrease in clock period over plain unary
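
The prefix-sharing step can be sketched with a tree whose edges are 4-byte blocks. This is an illustration only, not the authors' script: `build_prefix_tree`, `count_matchers`, and `count_unshared` are hypothetical names, and counting block matchers is an assumed stand-in for hardware area.

```python
def build_prefix_tree(patterns, width=4):
    """Insert each pattern into a tree whose edges are `width`-byte blocks,
    so patterns with a common prefix share the same leading nodes."""
    root = {}
    for p in patterns:
        node = root
        for i in range(0, len(p), width):
            node = node.setdefault(p[i:i + width], {})
    return root


def count_matchers(tree):
    """Number of block matchers needed when prefixes are shared."""
    return sum(1 + count_matchers(child) for child in tree.values())


def count_unshared(patterns, width=4):
    """Number of block matchers if every pattern is matched independently."""
    return sum(-(-len(p) // width) for p in patterns)  # ceiling division


rules = [
    "/cgi-bin/bigconf.cgi",
    "/cgi-bin/common/",
    "/cgi-sys/addalink.cgi",
    "/cgi-sys/entropysearch.cgi",
]
tree = build_prefix_tree(rules)
print(count_unshared(rules), count_matchers(tree))  # 22 17
```

For the four example rules, the block `/cgi` is matched once and the blocks `-bin` and `-sys` once each, which is where the saving comes from.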

12 Correlated Content Layer
• Link together pattern matchers
  – Form state machines from low-level comparators
  – Form higher-level ideas; basic regular expressions are available
  – (alert tcp EXTERNAL_NET any -> $HTTP_SERVERS $HTTP_PORTS msg: "attackpattern"; content: "attack"; content: "pattern"; within: 5;)
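
A linked rule such as the one above can be modeled as a small state machine driven by two comparator outputs and one counter. The sketch below is an assumption-laden illustration: it reads `within: 5` as "the second pattern begins no more than 5 bytes after the first one ends", which may differ from the rule language's exact semantics, and it tracks only the most recent match of the first pattern. The function name `linked_match` is hypothetical.

```python
def linked_match(data, first, second, within):
    """Return True if `second` begins at most `within` bytes after the most
    recent occurrence of `first` ends (assumed reading of `within`)."""
    since_first = None  # bytes consumed since `first` ended; None means idle
    for i in range(1, len(data) + 1):
        if since_first is not None:
            since_first += 1
            # Window expired: the counter returns to idle.
            if since_first > len(second) + within:
                since_first = None
        # Comparator output for `second`, gated by the active counter.
        if (since_first is not None
                and since_first >= len(second)
                and data.endswith(second, 0, i)):
            return True
        # Comparator output for `first` (re)starts the counter.
        if data.endswith(first, 0, i):
            since_first = 0
    return False
```

The only per-stream state is the counter, and only while it is active.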

13 Correlated Content Layer
• Benefits:
  – Reduces burden on external controller
  – Reduces number of inputs to priority encoder
  – Basic regular expression functionality
    – AND, OR, !, within, distance, character classes
• Disadvantages:
  – Adds state that has to be maintained when streams switch
    – But only the counters that are active

14 Tool Details
• Implemented in Perl
  – Text-oriented language for batch processing of text (pattern databases) and generation of VHDL outputs
  – Utilizes the Metis partitioning library (U/Minn)
  – Template-based generation of architecture descriptions
• Graph creation, partitioning:
  – Run time: ~30 seconds for < 2000 patterns
  – Insignificant time costs compared to improvements in performance
  – Place and route processing times dwarf architecture generation costs
    – Problem with all hardwired shift-and-compare architectures

15 Small Changes to Rule-set
• In the normal flow, a change to a single character would result in recompilation of the entire design
  – Wasteful and a lengthy process
  – In general, routing tools do not handle small changes well
    – Reduced frequency performance
    – Interaction of interconnect and mapping is highly connected to performance
  – However, if blocks of architecture can be physically separated on the device, interaction is eliminated
    – Creates a smaller place and route problem
    – Small changes can be integrated without full recompilation

16 Increasing Speed: Place & Route
• Key: Predefinition of area constraints for each partition
  – Ideally the partitions are balanced
  – Underutilization of device blocks makes meeting timing constraints easier

17 Increasing Speed: Place & Route
Definitions:
• The optimal partition is selected from the set of partitions P
• Sp* is the set of characters required to represent the new pattern p*
• Sp* \ Pi is the set difference between Sp* and Pi: the characters present in Sp* that are not currently represented in Pi
• The optimal partition Pj is the one that requires the addition of the minimum number of new characters, i.e. the one that minimizes |Sp* \ Pj|
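
The selection rule defined on this slide is small enough to write out directly. In this sketch each partition is represented simply as the set of characters it already decodes; the function name `best_partition` and that representation are assumptions made for illustration.

```python
def best_partition(partitions, new_pattern):
    """Pick the partition that needs the fewest new characters.

    `partitions` is a list of character sets (the P_i);
    returns (index j, number of characters that must be added to P_j).
    """
    needed = set(new_pattern)                               # S_p*
    costs = [len(needed - chars) for chars in partitions]   # |S_p* \ P_i|
    j = min(range(len(partitions)), key=costs.__getitem__)
    return j, costs[j]


parts = [set("/cgi-bn"), set("abcdef")]
print(best_partition(parts, "/cgi-sys/"))  # (0, 2): only 's' and 'y' are new
```

Ties are broken in favor of the lowest index here; the slides do not say how ties are handled, nor whether partition balance is also taken into account.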

18 Incremental Synthesis
• Goal: Reduce place and route costs
  – Using ISE 6.2 Incremental Synthesis support, each partition has independent area constraints on device
  – A change or addition in one partition does not affect the placement of the other partitions
  – Cost for changing rules in one of k partitions: 1/k of the full place and route cost, plus guide file processing overhead

19 System Packet Flow
• Packets are reordered and packet contents are sent as a stream to the string matching units

20 Suffix Resending
• If an attack spans multiple packets, it will not be detected if the system looks at packets one by one
  – Packets must be condensed into a stream
  – If time multiplexing is required, some section of the previous session can be prepended to the new packets
  – Reserved section equal to the length of the longest attack
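
A software sketch of suffix resending, under stated assumptions: segments arrive already reordered as `(stream_id, payload)` pairs, the matcher is modeled with Python substring search, and the reserved tail is as long as the longest pattern, as the slide states. The name `scan_interleaved` is hypothetical.

```python
def scan_interleaved(segments, patterns):
    """Scan time-multiplexed streams, resending each stream's saved tail.

    `segments` is an iterable of (stream_id, payload) in arrival order.
    Returns the set of (stream_id, pattern) alerts.
    """
    reserve = max(len(p) for p in patterns)  # length of the longest attack
    tails = {}      # stream_id -> last `reserve` bytes seen on that stream
    alerts = set()  # a set, so a match seen again in a resent tail is not duplicated
    for stream_id, payload in segments:
        # Prepend the saved section of the previous session to the new data.
        window = tails.get(stream_id, "") + payload
        for p in patterns:
            if p in window:
                alerts.add((stream_id, p))
        tails[stream_id] = window[-reserve:]
    return alerts


traffic = [("A", "GET /cgi-b"), ("B", "hello"), ("A", "in/common/ HTTP")]
print(scan_interleaved(traffic, ["/cgi-bin/common/"]))
```

In the example, the pattern is split across two segments of stream A with a segment of stream B in between; neither segment of A contains it alone, but the resent tail lets the second scan see it.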

21 Suffix Resending
• The necessity to resend packets causes some inefficiency in a multiple stream system
  – However, TCP and IP header overhead does not need to be handled by the string matching system, allowing us to make up the difference
    – Average internet packet is 402 bytes long
    – Longest attack in our survey of the Hogwash database is 257 bytes
    – TCP and IP headers equal 40 bytes
    – Thus, if 7 packets are issued to the string matching units at a time, the overheads are equalized and efficiency is 100%
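
The break-even figure can be checked with a few lines of arithmetic. The definition of efficiency used here (bytes on the wire divided by bytes the matcher must process) is an assumption chosen to make the slide's claim concrete; the three constants are the ones quoted on the slide.

```python
import math

AVG_PACKET = 402   # average packet length in bytes
LONGEST = 257      # longest attack pattern, resent once per burst
HEADERS = 40       # TCP + IP header bytes the matcher can skip


def efficiency(n):
    """Wire bytes per matcher byte when n packets are issued per burst."""
    wire = n * AVG_PACKET
    matched = n * (AVG_PACKET - HEADERS) + LONGEST
    return wire / matched


# Smallest burst for which the skipped headers cover the resent suffix.
break_even = math.ceil(LONGEST / HEADERS)
print(break_even)                # 7
print(round(efficiency(6), 3))   # 0.993
print(round(efficiency(7), 3))   # 1.008
```

With 7 packets, 7 × 40 = 280 header bytes are skipped, which slightly exceeds the 257 resent bytes; with 6 packets, 240 skipped bytes fall short.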

22 Overview
• Variations in tool flow provide customizable performance:
  – Tool options
    – Small: partitioned and pre-decoded architecture
    – Prefix trees
    – Fast: k-way architecture
    – Potential reduction in hardware reconfiguration time
    – Fast reconfiguration KMP architecture (FPGA ’04)
    – Meta-information can be extracted with correlation
Thanks! For more information,
