
Deep Packet Inspection (DPI) Engineering for Enhanced Performance of Network Elements and Security Systems


1 Deep Packet Inspection (DPI) Engineering for Enhanced Performance of Network Elements and Security Systems
PIs: Dr. Anat Bremler-Barr (IDC), Dr. David Hay (HUJI)

2 Deepness Lab was founded in November 2010
Our mission: Deep Packet Inspection (DPI) for next-generation network devices
Funding:
– 5-year ERC Starting Grant (1M Euro)
– 3 years of Kabarnit, a Magnet program ($70K/year)
– A gift from Cisco ($75K)
Main industry collaborations: Commtouch, Radware, Verint

3 People
Faculty: Anat Bremler-Barr (IDC Herzliya), David Hay (The Hebrew University of Jerusalem)
Postdocs: Shimrit Tzur-David, Yaron Koral
Ph.D. students: Liron Schiff (Tel Aviv University), Yotam Harchol (The Hebrew University of Jerusalem)
Collaborators: Yehuda Afek (Tel Aviv University), Isaac Keslassy (Technion), Shir Landau-Feibish (Tel Aviv University)
Past students: Victor Zigdon, M.Sc. (IDC Herzliya), Adam Mor, M.Sc. (IDC Herzliya)

4 People
Dr. Anat Bremler-Barr: Ph.D. with distinction, Tel Aviv University, Israel (2001). Founder and chief scientist of Riverhead Networks (focused on distributed denial-of-service solutions; acquired by Cisco). Senior lecturer (assistant professor) with tenure at IDC.
Dr. David Hay: Ph.D. from the Technion (2007). Postdoc at Columbia University, NY, USA and Politecnico di Torino. Previously also at IBM Research and Cisco San Jose. Senior lecturer (assistant professor) at the Hebrew University.

5 Deep Packet Inspection (DPI)
DPI: identifying signatures (patterns or regular expressions) in the packets’ payload
DPI is the main action taken to inspect traffic and is therefore a critical component in next-generation networks: security, content filtering, traffic monitoring, load balancing, lawful interception, targeted advertising, data leakage prevention, application-aware routing, ...
High-speed DPI is challenging and quickly becomes the bottleneck of the entire packet inspection process.

6 Impact
66% of network equipment vendors define DPI as “a must have” technology today [Heavy Reading Survey, 2011]
DPI market in 2011 estimated at $550 million, with growth of 20% per year [Qosmos report, Heavy Reading, Dec. 2012]

7 Major Challenges
Scalability:
– Rate: greater than 10 or even 100 Gbps
– Memory: handling thousands of signatures
– Power: reducing the high power consumption
Compressed traffic
Security of the NIDS itself:
– Current solutions are vulnerable to Denial of Service attacks
DPI in Software Defined Networks
Signature extraction

8 Classical Algorithms

9 Aho-Corasick Algorithm
Build a Deterministic Finite Automaton (DFA)
Traverse the DFA, byte by byte
Accepting state → pattern found
Example: {E, BE, BD, BCD, CDBCAB, BCAA}
[Figure: the DFA for the example pattern set, states s0 to s14; scanning the input BCDBCAB visits s0, s2, s5, s6, s9, s10, s11, s12]

10 Aho-Corasick Algorithm
Naïve implementation: represent the transition function in a table of |Σ|×|S| entries
– Σ: alphabet
– S: set of states
Lookup time: one memory access per input symbol
Space: in reality, 70MB to gigabytes
Snort has 77K states, ClamAV over 1M
Transition table (first rows):
      A   B   C   D   E
s0    0   2   7   0   1
s1    0   2   7   0   1
s2    0   2   5   4   3
s3    0   2   7   0   1
s4    0   2   7   0   1
s5   13   2   7   6   1
s6    0   9   7   0   1
s7    0   2   7   8   1
s8    0   9   7   0   1
:
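
The full-table variant can be sketched in a few lines of Python. This is a minimal illustration, not the lab's implementation: the function names `build_dfa` and `scan` are assumptions, and the table is a list of dictionaries rather than a packed array. States are numbered in insertion order, which for the example pattern set happens to give the same numbering as the slides.

```python
from collections import deque

def build_dfa(patterns, alphabet):
    """Build a full Aho-Corasick DFA: delta[state][symbol] -> next state."""
    goto, out = [{}], [set()]            # trie edges and matched patterns per state
    for p in patterns:
        s = 0
        for ch in p:
            if ch not in goto[s]:
                goto.append({})
                out.append(set())
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
        out[s].add(p)

    fail = [0] * len(goto)
    delta = [dict() for _ in goto]
    queue = deque()
    for ch in alphabet:                  # root row: missing edges loop back to the root
        nxt = goto[0].get(ch, 0)
        delta[0][ch] = nxt
        if nxt:
            queue.append(nxt)
    while queue:                         # breadth-first, so shallower rows are ready
        s = queue.popleft()
        out[s] |= out[fail[s]]           # inherit matches that end at a suffix
        for ch in alphabet:
            if ch in goto[s]:
                t = goto[s][ch]
                fail[t] = delta[fail[s]][ch]
                delta[s][ch] = t
                queue.append(t)
            else:
                delta[s][ch] = delta[fail[s]][ch]
    return delta, out

def scan(text, delta, out):
    """One table lookup per input symbol; returns (start offset, pattern) pairs."""
    state, hits = 0, []
    for i, ch in enumerate(text):
        state = delta[state][ch]
        hits.extend((i - len(p) + 1, p) for p in sorted(out[state]))
    return hits

PATTERNS = ["E", "BE", "BD", "BCD", "CDBCAB", "BCAA"]
delta, out = build_dfa(PATTERNS, "ABCDE")
hits = scan("BCDBCAB", delta, out)
print(hits)  # [(0, 'BCD'), (1, 'CDBCAB')]
```

The table has |Σ|×|S| entries (here 5×15), which is what makes this variant fast per symbol but memory-hungry for large signature sets.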

11 Alternative Implementation
Failure transition goes to the state that matches the longest suffix of the input so far
Lookup time: at most two memory accesses per input symbol (via amortized analysis)
Space: at most the number of symbols in the pattern set, depending on the implementation
[Figure: the automaton for the example pattern set with forward and failure transitions; states s10, s5, s7, s0 and s1 are highlighted]
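
A minimal sketch of the failure-transition variant follows, assuming the same example pattern set. The names `build_ac` and `scan` and the `accesses` counter are illustrative assumptions; the counter is there only to make the amortized bound of two accesses per symbol visible.

```python
from collections import deque

def build_ac(patterns):
    """Return (goto, fail, out): trie edges, failure links, matches per state."""
    goto, out = [{}], [set()]
    for p in patterns:
        s = 0
        for ch in p:
            if ch not in goto[s]:
                goto.append({})
                out.append(set())
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
        out[s].add(p)

    fail = [0] * len(goto)
    queue = deque(goto[0].values())          # depth-1 states fail to the root
    while queue:
        s = queue.popleft()
        for ch, t in goto[s].items():
            f = fail[s]
            while f and ch not in goto[f]:   # climb until some suffix can extend
                f = fail[f]
            fail[t] = goto[f].get(ch, 0)     # longest proper suffix that is a state
            out[t] |= out[fail[t]]
            queue.append(t)
    return goto, fail, out

def scan(text, goto, fail, out):
    state, hits, accesses = 0, [], 0
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]              # failure transition
            accesses += 1
        state = goto[state].get(ch, 0)       # forward transition (or stay at root)
        accesses += 1
        hits.extend((i - len(p) + 1, p) for p in sorted(out[state]))
    return hits, accesses

goto, fail, out = build_ac(["E", "BE", "BD", "BCD", "CDBCAB", "BCAA"])
hits, accesses = scan("BCDBCAB", goto, fail, out)
print(hits, accesses)
```

Only the forward edges are stored, so space is bounded by the total length of the patterns; the price is the occasional extra hop along a failure link.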

12 Other Alternative: Compress the State Representation
Example state: forward transitions A → 13 and D → 6, failure: 7, match: False
– Lookup table: one forward entry per symbol (A, B, C, D, E): 13, -, -, 6, -
– Bitmap encoded: bitmap 10010 (length = |Σ|), forward: 13, 6. Can count bits using the popcnt instruction
– Linear encoded: size 2, symbols A, D, forward: 13, 6
[Figure: the automaton with forward and failure transitions]
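
The bitmap encoding can be illustrated with a short sketch. The helper names `encode_bitmap` and `lookup` are assumptions, and `bin(...).count("1")` stands in for the hardware popcnt instruction mentioned on the slide.

```python
ALPHABET = "ABCDE"

def encode_bitmap(forward):
    """Pack a sparse forward-transition map into (bitmap, dense target list)."""
    bitmap, targets = 0, []
    for i, sym in enumerate(ALPHABET):
        if sym in forward:
            bitmap |= 1 << i             # bit i is set if symbol i has a forward edge
            targets.append(forward[sym])
    return bitmap, targets

def lookup(bitmap, targets, sym):
    """Return the forward target for sym, or None if the caller must follow the failure link."""
    i = ALPHABET.index(sym)
    if not (bitmap >> i) & 1:
        return None
    # Rank = number of set bits below position i (a popcount on a masked word).
    rank = bin(bitmap & ((1 << i) - 1)).count("1")
    return targets[rank]

# State s5 from the example: forward A -> 13 and D -> 6.
bitmap, targets = encode_bitmap({"A": 13, "D": 6})
print(lookup(bitmap, targets, "A"), lookup(bitmap, targets, "D"), lookup(bitmap, targets, "B"))
# 13 6 None
```

The state stores one bit per alphabet symbol plus one target per existing edge, instead of one target per symbol.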

13 The Boyer-Moore (BM) Algorithm
Shift-based single-pattern search
Main idea by example:
Shift table (m = 6):
Char    b   r   i   g   h   t   otherwise
Shift   5   4   3   2   1   0   6 (m)
Shifts of size m or close to it occur most of the time, leading to a very fast algorithm
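
As a rough sketch, the shift table can drive a search that uses only the bad-character rule. This is a simplification of full Boyer-Moore, which also uses a good-suffix rule. The pattern "bright" is inferred from the characters in the slide's shift table, and the function names are assumptions.

```python
def bad_char_table(pattern):
    """Distance from the last occurrence of each character to the pattern's end."""
    m = len(pattern)
    return {ch: m - 1 - i for i, ch in enumerate(pattern)}  # later occurrences overwrite

def bm_search(text, pattern):
    m, n = len(pattern), len(text)
    if m == 0:
        return []
    table = bad_char_table(pattern)
    hits, s = [], 0                       # s = current alignment of the pattern
    while s <= n - m:
        j = m - 1
        while j >= 0 and text[s + j] == pattern[j]:
            j -= 1                        # compare right to left
        if j < 0:
            hits.append(s)
            s += 1
        else:
            # Align the last occurrence of the mismatching text character with it;
            # characters absent from the pattern use the default shift m.
            shift = table.get(text[s + j], m) - (m - 1 - j)
            s += max(1, shift)
    return hits

print(bad_char_table("bright"))
print(bm_search("a bright light is bright", "bright"))  # [2, 18]
```

When the mismatching text character does not occur in the pattern at all, the window jumps past it entirely, which is where the shifts of size m come from.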

14 Compressed Traffic

15 Compressed HTTP
84.1% of the top 1,000 sites compress their traffic.
19% increase in 8 months!
Data compression is done by adding references to repeated data.
There are two types of compression:
– Intra-response compression: the references point to bytes within the response (Gzip/Deflate)
– Inter-response/connection compression: the references point to bytes in a separate file, called a dictionary (Google’s SDCH)

16 Challenges
Current security tools do not deal with compressed traffic due to the great challenges in time and space

17 Compressed Traffic: Space Challenge
Thousands of concurrent sessions
[Figure labels: Compressed (Mem: 32KB/session), Uncompressed Traffic]
Contribution: improve space by 80% and time by 40%

18 Compressed Traffic: Time Challenge
General belief: decompression + pattern matching >> pattern matching
Our algorithms show how to accelerate the pattern matching using the compression information: decompression + pattern matching < pattern matching

19 High-Level Idea
Compression is done by compressing repeated sequences of bytes
Store information about the pattern matching results → no need to fully repeat pattern matching on repeated sequences that were already scanned → x2-3 time reduction
The buffers needed for decompression are not used most of the time, and can therefore be kept in compressed form most of the time → x5 space reduction

20 General Idea: Keep “compressed” buffer
Keep buffers in a “compressed” form
Uncompress the “active session” only
[Figure labels: Compressed, active session buffer, New Packet]

21 Results
Reduction of space by a factor of 5!
Speedup by a factor of 2 or 3 (in GZIP and SDCH)

22 Experimental Results: DPI + Packing
Plot points, as labeled in the figure:
– Unzip entire session: avg. size = 170KB
– Naïve: 1.1, 29KB
– SOP: 1.39, 5.17KB
– ACCH: 0.36, 37.4KB
– SOP+ACCH: 0.64, 6.19KB

23 The Other Side of the Coin: Acceleration by Identifying Repetitions in Uncompressed Traffic
There are repetitions in uncompressed HTTP traffic
– Entire files (e.g., images)
– Parts of files (e.g., HTML tags, JavaScript)
→ We keep scanning the same thing again and again (and get the same scanning results)
1. Identify frequently repeated data
– Stored in a dictionary
2. Perform DPI on the data once and remember the results
– DPI by pattern matching with the Aho-Corasick algorithm; the result is the state
3. When encountering a repetition, recover the state without rescanning
– Delicate points need to be taken care of, so we won’t miss any pattern
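
The three steps can be sketched with a cache keyed by the automaton state on entry and the repeated chunk. This is a deliberate simplification and not the scheme from the slides: keying on the entry state sidesteps the "delicate points" about patterns that straddle a chunk boundary, at the cost of fewer cache hits. The toy DFA (which recognizes the single pattern AB) and all names are assumptions.

```python
def scan_chunk(delta, accepting, state, chunk):
    """Run the DFA over one chunk; return (end state, offsets where a match ends)."""
    hits = []
    for i, ch in enumerate(chunk):
        state = delta[state][ch]
        if state in accepting:
            hits.append(i)
    return state, tuple(hits)

class CachedScanner:
    def __init__(self, delta, accepting):
        self.delta, self.accepting = delta, accepting
        self.cache = {}          # (entry state, chunk) -> (end state, hits)
        self.cache_hits = 0

    def scan(self, state, chunk):
        key = (state, chunk)
        if key in self.cache:
            self.cache_hits += 1                 # repetition: recover the state, no rescan
        else:
            self.cache[key] = scan_chunk(self.delta, self.accepting, state, chunk)
        return self.cache[key]

# Toy DFA for the single pattern "AB" over the alphabet {A, B}.
DELTA = {0: {"A": 1, "B": 0}, 1: {"A": 1, "B": 2}, 2: {"A": 1, "B": 0}}
ACCEPTING = {2}

scanner = CachedScanner(DELTA, ACCEPTING)
state, results = 0, []
for chunk in ["AAB", "BAB", "AAB", "AAB"]:
    state, hits = scanner.scan(state, chunk)
    results.append(hits)
print(results, scanner.cache_hits)
```

Because a DFA is deterministic, the same chunk entered from the same state always yields the same end state and the same matches, so replaying the cached result is safe.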

24 Securing the NIDS Itself

25 Complexity DoS Attack Over NIDS
Easy to craft, very hard to process packets
Two-step attack:
1. Kill IPS/FW
2. Sneak into the network
[Figure labels: Attacker, Internet]

26 Attack on Security Elements Combined Attack: DDoS on Security Element exposed the network – theft of customers’ information

27 Attack on Snort The most widely deployed IDS/IPS worldwide. Heavy packets rate

28 OUR GOAL: A multi-core system architecture, which is robust against complexity DDoS attacks

29 System Throughput Over Time Reaction time can be smaller

30 System Architecture
Routine Mode: Load balance between cores
[Figure: the NIC feeds per-core queues (Q) for Cores #1 to #10 on the processor chip; the cores detect heavy packets]

31 System Architecture
Alert Mode: Dedicated cores for heavy packets. The other cores detect heavy packets and move them to the dedicated cores.
[Figure: the NIC feeds per-core queues (Q) for Cores #1 to #8, which detect heavy packets; Cores #9 and #10 are dedicated and fed through queues marked B]
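
The two modes can be mimicked with a small dispatcher. This is a loose sketch under several assumptions that are not in the slides: packets are `(id, cost)` tuples, heaviness is decided by a hypothetical cost threshold, and classification happens at dispatch time rather than on the regular cores themselves.

```python
from itertools import cycle

def dispatch(packets, alert, is_heavy, n_cores=10, dedicated=(9, 10)):
    """Assign each packet to a core number; returns {core: [packet, ...]}."""
    cores = list(range(1, n_cores + 1))
    regular = [c for c in cores if c not in dedicated]
    all_rr, regular_rr, dedicated_rr = cycle(cores), cycle(regular), cycle(dedicated)
    assignment = {c: [] for c in cores}
    for pkt in packets:
        if not alert:
            core = next(all_rr)          # routine mode: balance across every core
        elif is_heavy(pkt):
            core = next(dedicated_rr)    # alert mode: isolate heavy packets
        else:
            core = next(regular_rr)      # alert mode: normal traffic keeps flowing
        assignment[core].append(pkt)
    return assignment

# Hypothetical traffic: every fourth packet is expensive to inspect.
packets = [(i, 500 if i % 4 == 0 else 10) for i in range(20)]
heavy = lambda pkt: pkt[1] > 100         # hypothetical threshold

routine = dispatch(packets, alert=False, is_heavy=heavy)
alerted = dispatch(packets, alert=True, is_heavy=heavy)
print({core: len(pkts) for core, pkts in alerted.items()})
```

In alert mode the heavy packets can slow down only the dedicated cores, so the throughput of the remaining cores is not dragged down by the attack traffic.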

32 Cloud solution
The different cores are different (virtual) machines.
Load balancing sends heavy packets to machines that run a special, more efficient processing method.
In SDN, this can be done even faster and more easily.

33 DPI using TCAMs

34 TCAM – Ternary Content-Addressable Memory
De-facto solution for packet classification. Core component of an SDN switch.
[Figure: a search key (0011101010101001110001110001110) is compared in parallel against ternary TCAM entries such as 0011101010101****************** and 1110101010100101001************; the match lines feed an encoder that outputs the index of the first matching entry (3 in the example), which selects the action (deny, accept, log) stored in SRAM]
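
The lookup behaviour can be modeled in software with a short sketch. The 8-bit entries and their actions are made-up examples, not the ones in the figure, and the function names are assumptions. A real TCAM evaluates all entries in parallel in one cycle; the list comprehension only imitates that.

```python
def ternary_match(entry, key):
    """True if every entry bit equals the key bit or is the wildcard '*'."""
    return len(entry) == len(key) and all(e in ("*", k) for e, k in zip(entry, key))

def tcam_lookup(entries, actions, key):
    match_lines = [ternary_match(e, key) for e in entries]  # parallel in hardware
    for index, hit in enumerate(match_lines):               # priority encoder: lowest index wins
        if hit:
            return index, actions[index]                     # action is read from SRAM
    return None, None

# Hypothetical rule set, ordered from highest to lowest priority.
ENTRIES = ["1110****", "0011*0**", "0011****", "********"]
ACTIONS = ["deny", "accept", "log", "deny"]

print(tcam_lookup(ENTRIES, ACTIONS, "00111010"))  # (1, 'accept')
print(tcam_lookup(ENTRIES, ACTIONS, "00110110"))  # (2, 'log')
```

Because the first matching entry wins, more specific rules must be placed before the more general ones they overlap with.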

35 Some Challenges In Using TCAM
Reducing the number of entries → power consumption reduction
Dealing with ranges (how to encode the range [1-6]?)
How to correct errors?
– More about it in the next slide
How to use it for non-traditional tasks
– Traditionally, TCAM is used for IP lookup and header classification (e.g., using 5-tuples)
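
One common answer to the range question is prefix expansion: split the range into aligned power-of-two blocks, each of which is a single ternary entry. The sketch below shows that textbook technique, which is not necessarily the encoding the lab proposes; the function name and the 3-bit width are assumptions.

```python
def range_to_prefixes(lo, hi, width):
    """Cover the integer range [lo, hi] with ternary prefixes of the given bit width."""
    prefixes = []
    while lo <= hi:
        # Largest power-of-two block that starts at lo (alignment) ...
        size = lo & -lo if lo else 1 << width
        # ... and still fits inside the remaining range.
        while size > hi - lo + 1:
            size >>= 1
        bits = size.bit_length() - 1                 # number of wildcard bits
        head = format(lo >> bits, "b").zfill(width - bits) if width > bits else ""
        prefixes.append(head + "*" * bits)
        lo += size
    return prefixes

print(range_to_prefixes(1, 6, 3))  # ['001', '01*', '10*', '110']
```

The range [1-6] needs four entries in a 3-bit field. A range over w bits can need up to 2w - 2 prefixes in the worst case, which is why range encoding is listed as a challenge for reducing the number of entries.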

36 Example: Error Correction in TCAM
In SRAM (or any regular memory)
– Input: address (entry number)
– Output: content of that address
– One can apply an error detecting/correcting code to that content
In TCAM
– Even if the content seems OK, we can still have false-miss or indirect false-miss errors, so TCAM EDC/ECC is harder

37 PEDS: Parallel Error Detection Scheme for TCAM Devices
Detecting all errors using the built-in parallel lookup of the TCAM
The number of lookups is a function of the width of the TCAM word, not of the number of entries in the database
– The number of entries is 3 orders of magnitude larger
Developed and patented at the DEEPNESS lab

38 CompactDFA for DPI
Using TCAM to represent a huge DFA in a compact manner.
Reducing the problem of pattern matching to IP lookup (a much easier problem)
Each byte scanned → one TCAM lookup
– Can be reduced using variable-stride traversal
– Further performance boost with parallelism and pipelining

39 DFA → CompactDFA
[Table: transition table with columns Current (state code, e.g. 0000 for s0), Sym and Next State, 84 rows for the example automaton; figure labels: TCAM, SRAM, Longest Prefix Match]
Snort: 73MB → 0.6MB
ClamAV: 1.5GB → 26MB
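
The role of longest prefix match can be shown on a toy automaton. Everything in this sketch is an assumption made for illustration: the pattern set {E, BE}, the two-bit state codes, and the rule layout. It does not reproduce the CompactDFA state encoding; it only shows how a default rule with a short prefix is overridden by a more specific one, so that the full 4×2 transition table shrinks to three rules.

```python
# Hypothetical 2-bit state codes for the patterns {E, BE} over the alphabet {B, E}:
# s0 = root, s1 = "E", s2 = "B", s3 = "BE".
CODE = {"s0": "00", "s1": "01", "s2": "10", "s3": "11"}
NAME = {code: name for name, code in CODE.items()}

# (state-code prefix, symbol, next-state code); the empty prefix matches every state.
RULES = [
    ("",   "B", "10"),   # any state on B -> s2
    ("",   "E", "01"),   # any state on E -> s1 ...
    ("10", "E", "11"),   # ... except s2 on E -> s3 (the longer prefix wins)
]

def lpm_next(state_code, symbol):
    """Pick the rule for this symbol whose prefix is the longest match of the state code."""
    best = None
    for prefix, sym, nxt in RULES:
        if sym == symbol and state_code.startswith(prefix):
            if best is None or len(prefix) > len(best[0]):
                best = (prefix, nxt)
    if best is None:
        raise KeyError((state_code, symbol))
    return best[1]

def scan(text):
    state, visited = CODE["s0"], []
    for ch in text:
        state = lpm_next(state, ch)      # one lookup per input byte
        visited.append(NAME[state])
    return visited

print(scan("BEE"))  # ['s2', 's3', 's1']
```

In a TCAM the same effect is obtained by ordering entries from longest to shortest prefix, so the first match is the longest one.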

40 Signature Extraction

41 Current DDoS Attack
Zombies on innocent computers
Armies of zombies → many sources
Hard to identify behaviorally
No known signatures
Attack types: server-level DDoS attacks, infrastructure-level DDoS attacks, bandwidth-level DDoS attacks

42 Automated Extraction of Signatures for Zero-day Internet Attacks
Input:
– Sample of attack traffic (high-volume attack)
– Sample of normal traffic
Output: automatically find signatures that appear frequently only during the attack
Where:
– Input collection: in the mitigation apparatus (DDoS guard, firewall, anti-DDoS, etc.) or in the cloud, collecting data from several collectors
– DDoS: power computation saving
– Signatures are used by anti-DDoS devices and firewalls to stop the attack
Mitigation in minutes, good enough for these types of attacks
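
The input/output contract can be illustrated with a naive sketch that counts fixed-length substrings exactly. This is not the extraction algorithm from the slides, which are silent on the method; the fixed length `k`, the two thresholds, and the sample payloads are all assumptions, and exact counting would not scale to a high-volume attack.

```python
from collections import Counter

def kgram_doc_freq(payloads, k):
    """For each k-byte substring, count the number of payloads that contain it."""
    counts = Counter()
    for p in payloads:
        counts.update({p[i:i + k] for i in range(len(p) - k + 1)})
    return counts

def extract_signatures(attack, normal, k=5, min_attack=0.8, max_normal=0.1):
    """Substrings present in most attack payloads and in few normal payloads."""
    if not attack:
        return []
    in_attack, in_normal = kgram_doc_freq(attack, k), kgram_doc_freq(normal, k)
    return sorted(
        gram for gram, count in in_attack.items()
        if count / len(attack) >= min_attack
        and in_normal[gram] / max(1, len(normal)) <= max_normal
    )

# Hypothetical samples: the attack payloads share the token "evil=".
attack = [b"GET /x?evil=1", b"GET /y?evil=2", b"POST /z evil=3"]
normal = [b"GET /index", b"GET /about", b"POST /login"]
print(extract_signatures(attack, normal))  # [b'evil=']
```

Substrings such as `GET /` are frequent in the attack sample but also appear in normal traffic (or in too few attack payloads), so they are filtered out; only content specific to the attack survives as a signature.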
