Deep Packet Inspection(DPI) Engineering for Enhanced Performance of Network Elements and Security Systems PIs: Dr. Anat Bremler-Barr (IDC) Dr. David.

Name: Deep Packet Inspection(DPI) Engineering for Enhanced Performance of Network Elements and Security Systems PIs: Dr. Anat Bremler-Barr (IDC) Dr. David.
Uploaded: 2017-08-24T12:05:43+00:00
Duration: PTM24S18
Channel: Octavio Rodney
Deep Packet Inspection (DPI) Engineering for Enhanced Performance of Network Elements and Security Systems
PIs: Dr. Anat Bremler-Barr (IDC), Dr. David Hay (HUJI)

Deepness Lab was founded in November 2010
Our mission: Deep Packet Inspection (DPI) for Next Generation Network devices
Funding:
  5 years ERC Starting Grant (1M Euro)
  3 years Kabarnit, a Magnet program ($70K/year)
  A gift from Cisco ($75K)
Main Industry Collaborations: Commtouch, Radware, Verint

People
Faculty: Anat Bremler-Barr (IDC Herzliya), David Hay (The Hebrew University of Jerusalem)
Postdocs: Shimrit Tzur-David, Yaron Koral
Ph.D. Students: Liron Schiff (Tel Aviv University), Yotam Harchol (The Hebrew University of Jerusalem)
Collaborators: Yehuda Afek (Tel Aviv University), Isaac Keslassy (Technion), Shir Landau-Feibish (Tel Aviv University)
Past Students: Victor Zigdon, M.Sc. (IDC Herzliya), Adam Mor, M.Sc. (IDC Herzliya)

People
Dr. Anat Bremler-Barr: Ph.D. with distinction, Tel-Aviv University, Israel (2001). Founder and chief scientist of Riverhead Networks (focused on distributed denial of service solutions; acquired by Cisco). Senior lecturer (assistant professor) with tenure at IDC.
Dr. David Hay: Ph.D. from the Technion (2007). Post-doc at Columbia University, NY, USA and Politecnico di Torino. Previously also at IBM Research and Cisco San Jose. Senior lecturer (assistant professor) at the Hebrew U.

Deep Packet Inspection (DPI)
DPI: identifying signatures (patterns or regular expressions) in the packets' payload.
DPI is the main action taken to inspect traffic and is therefore a critical component in next generation networks: security, content filtering, traffic monitoring, load balancing, lawful interception, targeted advertising, data leakage prevention, application-aware routing, …
High-speed DPI is challenging and quickly becomes the bottleneck of the entire packet inspection process, resulting in security holes and/or limited or ineffective functionality.

Impact
66% of network equipment vendors define DPI as “a must have” technology today [Heavy Reading Survey, 2011]
The DPI market in 2011 was estimated at $550 million, with growth of 20%/year [Qosmos report, Heavy Reading, Dec. 2012]

Major Challenges
Scalability:
  Rate: greater than 10 or even 100 Gbps
  Memory: handling thousands of signatures
  Power: reducing the high power consumption
Compressed traffic
Security of the NIDS itself: current solutions are vulnerable to Denial of Service attacks
DPI in Software Defined Networks: SDN needs to determine flows; DPI can help and will play a significant role…
Signature extraction

Classical Algorithms

Aho-Corasick Algorithm
The standard algorithm for exact string matching in DPI is Aho-Corasick.
Build a Deterministic Finite Automaton (DFA) from the pattern set and traverse it byte by byte. Reaching an accepting state means a pattern was found.
Example pattern set: {E, BE, BD, BCD, CDBCAB, BCAA}
Each state corresponds to the longest prefix of some pattern that is also a suffix of the input read so far.
To build the automaton, build a trie over the patterns and then add the remaining transitions, so that every state has a transition for every symbol of the alphabet.
Under real-life traffic the automaton usually stays in the upper levels (about 10% of the states, 85% of the time).
[Figure: Aho-Corasick DFA for the example pattern set, traversed on the input BCDBCAB]
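The construction described on this slide can be sketched as follows. This is a minimal illustration, not the lab's implementation: the function names `build_dfa` and `scan`, the five-letter alphabet `ABCDE`, and the choice of Python dictionaries for the transition function are all assumptions made for the example. Only the pattern set and the input `BCDBCAB` come from the slide.

```python
from collections import deque

def build_dfa(patterns, alphabet):
    """Build a full Aho-Corasick DFA: one transition per (state, symbol)."""
    goto = [{}]      # trie edges; state 0 is the root
    out = [set()]    # patterns recognized on entering each state
    for p in patterns:
        s = 0
        for ch in p:
            if ch not in goto[s]:
                goto.append({})
                out.append(set())
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
        out[s].add(p)

    fail = [0] * len(goto)            # longest proper suffix that is a trie node
    delta = [dict() for _ in goto]    # the complete transition function
    queue = deque()
    for ch in alphabet:
        t = goto[0].get(ch, 0)        # missing root edges loop back to the root
        delta[0][ch] = t
        if t:
            queue.append(t)
    while queue:                      # breadth-first, so shallower states are done first
        s = queue.popleft()
        out[s] |= out[fail[s]]        # inherit patterns that end at a suffix
        for ch in alphabet:
            if ch in goto[s]:
                t = goto[s][ch]
                fail[t] = delta[fail[s]][ch]
                delta[s][ch] = t
                queue.append(t)
            else:
                delta[s][ch] = delta[fail[s]][ch]
    return delta, out

def scan(text, delta, out):
    """Traverse the DFA symbol by symbol and report (start index, pattern)."""
    s, hits = 0, []
    for i, ch in enumerate(text):
        s = delta[s][ch]
        for p in out[s]:
            hits.append((i - len(p) + 1, p))
    return sorted(hits)

PATTERNS = ["E", "BE", "BD", "BCD", "CDBCAB", "BCAA"]
delta, out = build_dfa(PATTERNS, "ABCDE")
print(len(delta), scan("BCDBCAB", delta, out))
```

For this pattern set the trie has 15 states, which agrees with the state labels s0 to s14 that survive from the slide's figure.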

Aho-Corasick Algorithm
Naïve implementation: represent the transition function in a table of |Σ|×|S| entries, where Σ is the alphabet and S is the set of states.
Lookup time: one memory access per input symbol.
Space: in reality, 70MB to gigabytes… Snort has 77K states, ClamAV over 1M.
On paper the lookup time of this method is constant, which would give a constant throughput.
[Table: transition table with one row per state (S0, S1, …) and one column per symbol]
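A quick back-of-the-envelope check of the |Σ|×|S| table size. The state counts are the ones quoted on the slide; the byte-sized alphabet (256 symbols) and the 4-byte table entry are assumptions made for this calculation, and the helper name `table_bytes` is hypothetical.

```python
def table_bytes(num_states, alphabet_size=256, entry_bytes=4):
    """Size of a full transition table with one entry per (state, symbol)."""
    return num_states * alphabet_size * entry_bytes

snort = table_bytes(77_000)        # 77K states, as quoted for Snort
clamav = table_bytes(1_000_000)    # "over 1M" states, as quoted for ClamAV

print(f"Snort:  {snort / 2**20:.0f} MiB")
print(f"ClamAV: {clamav / 2**30:.2f} GiB")
```

Under these assumptions the Snort table lands in the tens of megabytes and the ClamAV table close to a gigabyte, the same range as the slide's "70MB to gigabytes".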

Alternative Implementation
Take away the extra transitions that make the automaton so big and add "failure transitions" instead. A failure transition says where to go when there is no matching forward transition, without consuming an input symbol.
A failure transition goes to the state whose label is the longest proper suffix of the current state's label, i.e. the state that matches the longest suffix of the input so far.
Lookup time: at most two memory accesses per input symbol (via amortized analysis).
Space: at most the number of symbols in the pattern set; the exact size depends on the implementation.
Why two memory accesses? Failure transitions always go up the tree, so we can go up at most as far as we went down before.
[Figure: the example automaton with forward transitions and failure transitions]
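The amortized bound can be checked with a small sketch that counts every forward and failure transition taken during a scan. The names `build` and `scan` and the step counter are illustrative assumptions; the pattern set is the one used in the earlier example.

```python
from collections import deque

def build(patterns):
    """Trie with forward edges only, plus one failure pointer per state."""
    goto, fail, out = [{}], [0], [set()]
    for p in patterns:
        s = 0
        for ch in p:
            if ch not in goto[s]:
                goto.append({})
                fail.append(0)
                out.append(set())
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
        out[s].add(p)
    queue = deque(goto[0].values())   # depth-1 states fail to the root
    while queue:
        s = queue.popleft()
        for ch, t in goto[s].items():
            f = fail[s]
            while f and ch not in goto[f]:
                f = fail[f]           # climb until some state has an edge on ch
            fail[t] = goto[f].get(ch, 0)
            out[t] |= out[fail[t]]
            queue.append(t)
    return goto, fail, out

def scan(text, goto, fail, out):
    """Return the matches and the total number of transitions taken."""
    s, hits, steps = 0, [], 0
    for i, ch in enumerate(text):
        while s and ch not in goto[s]:
            s = fail[s]               # failure transition: no input consumed
            steps += 1
        s = goto[s].get(ch, 0)        # forward transition, or stay at the root
        steps += 1
        hits += [(i - len(p) + 1, p) for p in out[s]]
    return sorted(hits), steps

goto, fail, out = build(["E", "BE", "BD", "BCD", "CDBCAB", "BCAA"])
hits, steps = scan("BCDBCAB", goto, fail, out)
print(hits, steps, "<=", 2 * len("BCDBCAB"))
```

The trie stores one forward edge per non-root state (14 edges for the 15-state example), compared with 15 × 5 entries in the full table over the same five-letter alphabet.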

Other Alternative: Compress the State Representation
How do we represent a state? We want a smaller representation, so that it fits in cache, AND a fast lookup. The known methods are:
Lookup Table: one entry per symbol (A B C D E), plus failure and match fields.
Linear Encoded: store only the existing forward transitions. Example: symbols A, D; forward: 13, 6; failure: 7; match: False; size: 2.
Bitmap Encoded: a bitmap of length |Σ| marks which symbols have a forward transition; forward: 13, 6; failure: 7; match: False. The bits can be counted using the popcnt instruction.
Now, how can we make the automaton even smaller?
[Figure: the example automaton shown next to each of the three state encodings]
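The bitmap encoding can be sketched like this, using the example state from the slide (forward transitions on A and D to states 13 and 6, failure transition to state 7). The class name `BitmapState` is hypothetical, and `bin(x).count("1")` stands in for the hardware popcnt instruction.

```python
ALPHABET = "ABCDE"

class BitmapState:
    """One automaton state: a |alphabet|-bit bitmap plus a dense target array."""

    def __init__(self, forward, failure, match=False):
        self.bitmap = 0
        self.targets = []             # only the existing forward transitions
        for i, sym in enumerate(ALPHABET):
            if sym in forward:
                self.bitmap |= 1 << i
                self.targets.append(forward[sym])
        self.failure = failure
        self.match = match

    def lookup(self, sym):
        """Return the forward target for sym, or None if there is none."""
        i = ALPHABET.index(sym)
        if not (self.bitmap >> i) & 1:
            return None
        below = self.bitmap & ((1 << i) - 1)       # bits of smaller symbols
        return self.targets[bin(below).count("1")]  # popcount gives the index

state = BitmapState({"A": 13, "D": 6}, failure=7)
print(state.lookup("A"), state.lookup("D"), state.lookup("B"))
```

The caller falls back to `state.failure` whenever `lookup` returns None.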

The Boyer-Moore (BM) Algorithm
Shift-based single-pattern search.
Main idea by example: shifts of size m, or close to it, occur most of the time, leading to a very fast algorithm.
Shift table for the pattern "bright" (m = 6):
  Char   b  r  i  g  h  t          otherwise
  Shift  5  4  3  2  1  (missing)  6 (m)
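A minimal sketch of the shift-table idea. Note that this is the Horspool simplification of Boyer-Moore (bad-character shifts only, keyed on the last character of the current window), not the full algorithm with the good-suffix rule. The function names are illustrative.

```python
def shift_table(pattern):
    """Shift for each character of the pattern except its last position."""
    m = len(pattern)
    # Later occurrences overwrite earlier ones, keeping the smallest safe shift.
    return {ch: m - 1 - i for i, ch in enumerate(pattern[:-1])}

def horspool_search(text, pattern):
    """Return the index of the first occurrence of pattern, or -1."""
    m, n = len(pattern), len(text)
    shift = shift_table(pattern)
    pos = 0
    while pos + m <= n:
        if text[pos:pos + m] == pattern:
            return pos
        # Characters that are not in the table shift the window by the full m.
        pos += shift.get(text[pos + m - 1], m)
    return -1

print(shift_table("bright"))
print(horspool_search("the sun is bright today", "bright"))
```

For "bright" the computed shifts for b, r, i, g, h are 5, 4, 3, 2, 1, and any character outside the table moves the window by m = 6.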

Compressed Traffic

Compressed HTTP 84.1% of the top 1,000 sites compress their traffic.
Data compression is done by adding references to repeated data. There are two types of compression:
  Intra-response compression: the references point to bytes within the response (Gzip/Deflate)
  Inter-response/connection compression: the references point to bytes in a separate file, called a dictionary (Google’s SDCH)
19% increase in 8 months!
An earlier paper (INFOCOM 2009) handles the intra-response case. We exploit these repetitions to facilitate the DPI process.

Challenges Current security tools do not deal with compressed traffic due to the great challenges in time and space

Compressed Traffic : Space Challenge
Thousands of concurrent sessions; decompression needs 32KB of memory per session.
[Figure: compressed traffic is unzipped into uncompressed traffic and passed to DPI. Labels: Space 80%, Time 40%. Contribution: improve both]

Compressed Traffic : Time Challenge
General belief: Decompression + pattern matching >> pattern matching
Our algorithms show how to accelerate the pattern matching using the compression information: Decompression + pattern matching < pattern matching

High-Level Idea
Compression is done by compressing repeated sequences of bytes.
Store information about the pattern matching results → no need to fully perform pattern matching again on repeated sequences that were already scanned → x2-3 time reduction
The buffers needed for decompression are not used most of the time, and can therefore be kept in compressed form most of the time → x5 space reduction
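The time-saving idea can be illustrated with a deliberately simplified sketch. It is not the lab's algorithm: the token format (a literal string or a `(distance, length)` back-reference, in the spirit of LZ77), the naive matcher `ends_at`, and the conservative rule based on the longest pattern length are all assumptions made for this example. Inside a back-reference, any position at least `lmax - 1` bytes past the start of the copy can reuse the result stored for the referenced position, because every possible match ending there lies entirely inside the copied bytes.

```python
def ends_at(text, e, patterns):
    """Patterns that end exactly at index e (naive stand-in for a DFA step)."""
    return {p for p in patterns
            if len(p) <= e + 1 and text[e + 1 - len(p):e + 1] == p}

def scan_tokens(tokens, patterns):
    """Decompress toy LZ77 tokens while matching; reuse results inside copies."""
    lmax = max(len(p) for p in patterns)
    text, found, scanned = "", [], 0   # found[e] = patterns ending at index e
    for tok in tokens:
        if isinstance(tok, str):       # literal run: scan every byte
            for ch in tok:
                text += ch
                found.append(ends_at(text, len(text) - 1, patterns))
                scanned += 1
        else:                          # back-reference (distance, length)
            dist, length = tok
            start = len(text)
            for _ in range(length):
                text += text[-dist]
                e = len(text) - 1
                if e - start >= lmax - 1:
                    found.append(found[e - dist])   # reuse the stored result
                else:
                    found.append(ends_at(text, e, patterns))
                    scanned += 1
    return text, found, scanned

PATTERNS = {"E", "BE", "BD", "BCD", "CDBCAB", "BCAA"}
text, found, scanned = scan_tokens(["xxBCDxxBEx", (10, 10), "BD"], PATTERNS)
print(len(text), scanned)
```

Only the first `lmax - 1` bytes of each copied region are rescanned, since matches ending there may start before the copy. A real design would keep the automaton state instead of a per-byte set, which is where the delicate cases arise.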

General Idea: Keep “compressed” buffer
Keep buffers in a “compressed” form.
Uncompress the “active session” only.
[Figure: a new packet arrives, the buffer of its active session is unzipped, and the buffers of all other sessions stay compressed]

Results Reduction of space by factor of 5!
Speedup by factor of 2 or 3 (in GZIP and SDCH)

Experimental Results: DPI +Packing
Unzip entire session: Avg. Size = 170KB
Plotted points (value, size) per method:
  Naïve: 1.1, 29KB
  ACCH: 0.36, 37.4KB
  SOP: 1.39, 5.17KB
  SOP+ACCH: 0.64, 6.19KB

The Other Side of the Coin: Acceleration by Identifying repetitions in uncompressed Traffic
There are repetitions in uncompressed HTTP traffic:
  Entire files (e.g., images)
  Parts of files (e.g., HTML tags, JavaScript)
We keep scanning the same thing again and again (and get the same scanning results).
Identify frequently repeated data and store it in a dictionary.
Perform DPI on the data once and remember the results. DPI is done by pattern matching with the Aho-Corasick algorithm, so the result is the state.
When encountering a repetition, recover the state without re-scanning.
Delicate points need to be taken care of so that we do not miss any pattern.

Securing the NIDS Itself

Complexity DoS Attack Over NIDS
Packets that are easy to craft but very hard to process.
Two-step attack:
  1. Kill the IPS/FW
  2. Sneak into the network
[Figure: attacker on the Internet targeting the IPS/FW in front of the network]

Attack on Security Elements
Combined Attack: DDoS on Security Element exposed the network – theft of customers’ information

Attack on Snort The most widely deployed IDS/IPS worldwide.
Heavy packets rate

OUR GOAL: A multi-core system architecture, which is robust against complexity DDoS attacks

System Throughput Over Time
Reaction time can be smaller

System Architecture Routine Mode: Load balance between cores
[Figure: NIC feeding the per-core queues (Q) of Core #1 to Core #10 on the processor chip. Labels: "Detects heavy packets"; "Routine Mode: Load balance between cores"]

System Architecture Alert Mode: Dedicated cores for heavy packets
Cores #9 and #10 become dedicated cores for heavy packets. The other cores detect heavy packets and move them to the dedicated cores.
[Figure: NIC feeding the per-core queues (Q) of Core #1 to Core #10 on the processor chip; the dedicated cores #9 and #10 hold the heavy packets (B)]
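The two modes can be sketched as a single dispatch function. Everything here is a hypothetical illustration: the name `pick_core`, the use of a flow hash for load balancing, the `is_heavy` flag (the slides do not say how heaviness is measured), and the split of ten cores into eight regular and two dedicated ones, which mirrors the figure.

```python
def pick_core(flow_hash, is_heavy, alert_mode, n_cores=10, n_dedicated=2):
    """Return a zero-based core index for a packet.

    Routine mode: load balance over all cores.
    Alert mode: heavy packets go to the dedicated cores (the last
    n_dedicated indices); everything else shares the remaining cores.
    """
    if not alert_mode:
        return flow_hash % n_cores
    regular = n_cores - n_dedicated
    if is_heavy:
        return regular + flow_hash % n_dedicated
    return flow_hash % regular

# Routine mode uses all ten cores; alert mode isolates heavy packets.
print(pick_core(12345, is_heavy=True, alert_mode=False))
print(pick_core(12345, is_heavy=True, alert_mode=True))
print(pick_core(12345, is_heavy=False, alert_mode=True))
```

Isolating heavy packets this way keeps them from filling the queues of the cores that serve normal traffic.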

Cloud solution The different cores are different (virtual) machines.
Load balancing sends heavy packets to machines that run a special more efficient processing method. In SDN, this can be done even faster and easier.

DPI using TCAMs

TCAM – Ternary Content-Addressable Memory
De-facto solution for packet classification. Core component of an SDN switch.
[Figure: a search key is compared against all TCAM entries in parallel. Entries are ternary patterns such as ********1111, *******11111, *********011 and 1110*********. The match lines feed an encoder, and the action of the matching entry (deny, accept, log) is read from SRAM]
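A software sketch of what the TCAM, encoder, and SRAM do together: compare the key against ternary patterns and return the action of the first entry that matches. The function name `tcam_lookup` and the eight-bit entries are made up for illustration; a real TCAM compares all entries in parallel in one lookup rather than looping.

```python
def tcam_lookup(key, entries):
    """Return the action of the first entry whose ternary pattern matches key.

    Each entry is (pattern, action). A pattern is a string over '0', '1'
    and '*', where '*' matches either bit value.
    """
    for pattern, action in entries:
        if len(pattern) == len(key) and all(
            p in ("*", k) for p, k in zip(pattern, key)
        ):
            return action          # priority encoder: lowest index wins
    return None

ENTRIES = [                        # hypothetical 8-bit rule set
    ("1110****", "accept"),
    ("****1111", "deny"),
    ("********", "log"),
]

for key in ("11100000", "00001111", "11101111", "00000000"):
    print(key, tcam_lookup(key, ENTRIES))
```

Because the first match wins, the order of the entries encodes their priority, which is why the catch-all entry sits last.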

Some Challenges In Using TCAM
Reducing the number of entries → power consumption reduction
Dealing with ranges (how to encode the range [1-6]?)
How to correct errors? More about it in the next slide
How to use it for non-traditional tasks? Traditionally, TCAM is used for IP lookup and header classification (e.g., using 5-tuples)
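One common answer to the range question is prefix expansion: split the range into aligned power-of-two blocks, each of which is a single ternary entry. The sketch below shows that baseline only, not the lab's encoding; the function name and the 3-bit field width are assumptions for the example.

```python
def range_to_prefixes(lo, hi, width):
    """Cover the integer range [lo, hi] with ternary prefixes of width bits."""
    prefixes = []
    while lo <= hi:
        # Largest power-of-two block that is aligned at lo ...
        size = lo & -lo if lo else 1 << width
        # ... and does not run past hi.
        while size > hi - lo + 1:
            size >>= 1
        wild = size.bit_length() - 1          # number of trailing '*' bits
        fixed = width - wild
        head = format(lo >> wild, f"0{fixed}b") if fixed else ""
        prefixes.append(head + "*" * wild)
        lo += size
    return prefixes

print(range_to_prefixes(1, 6, 3))
```

With a 3-bit field, the range [1-6] needs four entries (001, 01*, 10*, 110), which shows how ranges inflate the number of TCAM entries.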

Example: Error Correction in TCAM
In SRAM (or any regular memory):
  Input: address (entry number)
  Output: content of that address
  One can apply an error detecting/correcting code to that content
In TCAM:
  Even if the returned content seems OK, there can still be false miss or indirect false miss errors, so EDC/ECC for TCAM is harder

PEDS: Parallel Error Detection Scheme for TCAM Devices
Detecting all errors using the built-in parallel lookup of the TCAM.
The number of lookups is a function of the width of the TCAM word and not of the number of entries in the database, which is 3 orders of magnitude larger.
Developed and patented in the DEEPNESS lab.

CompactDFA for DPI
Using TCAM to represent a huge DFA in a compact manner.
Reducing the problem of pattern matching to IP lookup (a much easier problem).
Each byte scan → one TCAM lookup. Can be reduced using variable stride traversal.
Further performance boost with parallelism and pipelining.

DFA → CompactDFA (Longest Prefix Match)
Snort: 73MB → 0.6MB
ClamAV: 1.5GB → 26MB
[Table: TCAM entries keyed by (current state, symbol), with the next state stored in SRAM; states are binary codes such as 0000 (s0), 0001 (s1), 0010 (s2), 0110 (s6), 1100 (s12), 1101 (s13), and the lookup is a longest prefix match]
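To give a feel for how a DFA can be driven by prefix-match rules, here is a toy automaton for the two patterns BE and BD. The state codes, the rule list, and the function names are invented for this illustration and are not taken from the slide's table. Rules are ordered from the most specific state prefix to the least specific, so taking the first match emulates a longest prefix match.

```python
# Hypothetical 2-bit state codes: 00 = root, 01 = "B", 10 = "BE", 11 = "BD".
RULES = [
    # (current-state pattern, symbol pattern, next state)
    ("01", "E", "10"),   # from "B" on E: pattern BE found
    ("01", "D", "11"),   # from "B" on D: pattern BD found
    ("**", "B", "01"),   # from any state on B: go to "B"
    ("**", "*", "00"),   # default: back to the root
]

def next_state(state, sym):
    """First matching rule wins, which acts as a longest prefix match here."""
    for state_pat, sym_pat, nxt in RULES:
        state_ok = all(p in ("*", s) for p, s in zip(state_pat, state))
        if state_ok and sym_pat in ("*", sym):
            return nxt
    raise ValueError("no rule matched")

def run(text):
    """Return the state code reached after each input symbol."""
    state, trace = "00", []
    for ch in text:
        state = next_state(state, ch)
        trace.append(state)
    return trace

print(run("ABEBD"))
```

Four rules replace a table of 4 × |Σ| entries in this toy case: every transition into a given state is covered by a single rule.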

Signature Extraction

Current DDoS Attack
Armies of zombies → many sources
Hard to identify behaviorally
No known signatures
Zombies on innocent computers
Attack types: infrastructure-level, server-level, and bandwidth-level DDoS attacks

Automated Extraction of Signatures for Zero-day Internet Attacks
Input: a sample of attack traffic (high volume attack) and a sample of normal traffic
Output: automatically found signatures that appear frequently only during the attack
Where the input is collected:
  In the mitigation apparatus (DDoS Guard/firewall/anti-DDoS etc.)
  In the cloud: collect data from several collectors. DDoS: saves computation power
Signatures are used by anti-DDoS devices and firewalls to stop the attack
Mitigation in minutes, good enough for these types of attacks
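The input/output contract above can be sketched with exact k-gram counting. This is a simplification chosen for clarity: the function names, the fixed signature length `k`, the two thresholds, and the sample payloads are all hypothetical, and a production system would need approximate counting to keep up with line rate.

```python
from collections import Counter

def kgram_counts(payloads, k):
    """For each k-gram, count the number of payloads that contain it."""
    counts = Counter()
    for payload in payloads:
        # A set, so that each k-gram is counted once per payload.
        counts.update({payload[i:i + k] for i in range(len(payload) - k + 1)})
    return counts

def extract_signatures(attack, normal, k=8, min_attack=0.5, max_normal=0.05):
    """k-grams frequent in the attack sample and rare in the normal sample."""
    a, n = kgram_counts(attack, k), kgram_counts(normal, k)
    return sorted(
        g for g, c in a.items()
        if c / len(attack) >= min_attack and n[g] / len(normal) <= max_normal
    )

attack = ["GET /a EVILBOT1", "GET /b EVILBOT1", "GET /c EVILBOT1"]
normal = ["GET /a Mozilla", "GET /b Mozilla"]
print(extract_signatures(attack, normal))
```

Content shared by both samples, such as the `GET /` prefix, is filtered out because it is also frequent in normal traffic.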
