Shift-based Pattern Matching for Compressed Web Traffic Presented by Victor Zigdon 1* Joint work with: Dr. Anat Bremler-Barr 1* and Yaron Koral 2 The SPC.

Presentation transcript:

Shift-based Pattern Matching for Compressed Web Traffic: The SPC Algorithm
Presented by Victor Zigdon (1*). Joint work with Dr. Anat Bremler-Barr (1*) and Yaron Koral (2).
(1) Computer Science Dept., Interdisciplinary Center, Herzliya, Israel. (2) Blavatnik School of Computer Sciences, Tel-Aviv University, Israel.
Supported by European Research Council (ERC) Starting Grant no

Motivation I: Compressed Web Traffic
Compressed web traffic is increasing in popularity. HTTP response content is encoded with gzip.

Motivation II: DPI on Compressed Web Traffic
A DPI engine must handle multiple concurrent compressed sessions and perform multi-pattern matching at line speed. In Snort, pattern matching accounts for 70% of total execution time. Memory constraints are tight (32KB per session). Current security tools bypass GZIP-compressed traffic.

Accelerating Idea
Previous work: ACCH [Infocom 2009]. Compression is done by compressing repeated sequences of bytes, so information about earlier pattern-matching results can be stored and reused. There is no need to fully perform pattern matching on repeated byte sequences that were already scanned for patterns: those bytes are skipped. Outcome: decompression + pattern matching < pattern matching. The idea was implemented on the Aho-Corasick algorithm, a pattern-matching algorithm that scans byte by byte. Throughput improvement: about 60%. Extra information (extra storage): 25%.

Our Contribution: The SPC Algorithm
Apply the same accelerating idea to a pattern-matching algorithm that already skips bytes on its own (WM, a shift-based algorithm). The result is a simpler, more straightforward and more efficient algorithm. Throughput improvement: from about 60% to about 80%. Extra information (extra storage): from 25% to 12%.

Background: GZIP Compressed HTTP
GZIP (or Deflate) is composed of two stages. Stage 1, LZ77: the goal is to reduce text size, and the technique is to compress repeating strings. Stage 2, Huffman coding: the goal is to reduce symbol coding size, and the technique is to represent frequent symbols with fewer bits.
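As a rough illustration of the two-stage compressor, the sketch below runs Python's standard zlib module (which implements DEFLATE, the same LZ77 + Huffman scheme used inside gzip, but with a different container header) over a made-up, highly repetitive HTML fragment. The sample data is an assumption chosen for illustration, not taken from the talk's data set.

```python
import zlib

# Hypothetical repetitive page body; real pages repeat tags and attributes similarly.
page = b'<li class="item">entry</li>\n' * 200

compressed = zlib.compress(page, 9)   # DEFLATE: LZ77 back-references + Huffman coding
restored = zlib.decompress(compressed)

print(len(page), "bytes uncompressed")
print(len(compressed), "bytes compressed")
print("round trip ok:", restored == page)
```

Most of the saving on input like this comes from stage 1, because nearly every repetition can be replaced by a back-reference.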

Background: LZ77 Compression
LZ77 compresses repeated strings within the GZIP 32KB sliding window. Each repetition is represented by a pointer, where pointer = {distance, length}. Example: ABCDEF123ABCDEF is encoded as ABCDEF123{9,6}.
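The pointer notation can be made concrete with a minimal decoder sketch. The token format used here (a Python string for literals, a (distance, length) tuple for a pointer) and the function name lz77_decode are assumptions for illustration; real Deflate streams carry Huffman-coded symbols instead.

```python
def lz77_decode(tokens):
    """Expand literals (str) and (distance, length) pointers into plain text."""
    out = []
    for tok in tokens:
        if isinstance(tok, str):
            out.extend(tok)                  # literal bytes are copied as-is
        else:
            distance, length = tok
            start = len(out) - distance      # where the referred bytes begin
            # Copy one symbol at a time so pointers that overlap their own
            # output (length > distance) are expanded correctly.
            for i in range(length):
                out.append(out[start + i])
    return "".join(out)


print(lz77_decode(["ABCDEF123", (9, 6)]))    # ABCDEF123ABCDEF
```

The pointer {9,6} means "go back 9 bytes and copy 6", which reproduces the second ABCDEF.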

Background: The Boyer-Moore (BM) Algorithm
BM is a shift-based single-pattern search algorithm, named after Prof. Robert Stephen Boyer and Prof. J. Strother Moore. Main idea by example: shifts of size m or close to it occur most of the time, leading to a very fast algorithm.
Shift table:
Char:   b   r   i   g   h   t   otherwise
Shift:  5   4   3   2   1   0   6 (m)
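The shift table can be reproduced with a short sketch. It assumes the example pattern is "bright" (the characters in the table spell it) and implements only the simple per-character shift rule shown on the slide, not the full Boyer-Moore algorithm with its additional heuristics. The function names are hypothetical.

```python
def build_shift_table(pattern):
    """Map each pattern character to its distance from the pattern's end."""
    m = len(pattern)
    table = {}
    for i, ch in enumerate(pattern):
        table[ch] = m - 1 - i        # later occurrences overwrite earlier ones
    return table


def search(text, pattern):
    """Return all start positions of pattern in text using the shift table."""
    table = build_shift_table(pattern)
    m = len(pattern)
    hits = []
    pos = 0
    while pos + m <= len(text):
        # Look only at the text character under the window's last position.
        shift = table.get(text[pos + m - 1], m)   # "otherwise" entry is m
        if shift == 0:
            if text[pos:pos + m] == pattern:      # candidate: verify fully
                hits.append(pos)
            shift = 1
        pos += shift
    return hits


print(build_shift_table("bright"))
# {'b': 5, 'r': 4, 'i': 3, 'g': 2, 'h': 1, 't': 0}
print(search("a bright light, bright", "bright"))   # [2, 16]
```

Characters that do not occur in the pattern let the window jump by the full pattern length, which is why large shifts dominate on typical text.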

Background: The Modified Wu-Manber (MWM) Algorithm
MWM employs BM's shift concept for multi-pattern matching. The underlying Wu-Manber algorithm is co-named after Prof. Udi Manber. Let m be the length of the shortest pattern, and trim all patterns to their m-byte prefix. Use an m-byte virtual ScanWindow to indicate the current position. Determine the shift value using B-byte blocks of each pattern, rather than one byte as in BM; MaxShift = m-B+1. If the B bytes indicate a possible pattern, check whether there is an exact pattern match. Auxiliary data structure: PtrnsHash, in which each entry holds the list of patterns with the same B-byte prefix. We use the m-byte prefix instead, which results in shorter lists.
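A minimal sketch of the scheme described above, under stated assumptions: the patterns and text are invented, Python dictionaries stand in for the shift table and PtrnsHash, and the names build_mwm and mwm_scan are hypothetical. It follows the slide in keying PtrnsHash by the m-byte prefix and assumes B <= m.

```python
def build_mwm(patterns, B=2):
    """Build the B-byte block shift table and the prefix-keyed PtrnsHash."""
    m = min(len(p) for p in patterns)      # length of the shortest pattern
    max_shift = m - B + 1                  # default shift for unknown blocks
    shift = {}
    ptrns_hash = {}
    for p in patterns:
        prefix = p[:m]                     # trim to the m-byte prefix
        ptrns_hash.setdefault(prefix, []).append(p)
        for end in range(B, m + 1):
            block = prefix[end - B:end]
            # Distance from this block to the end of the prefix; keep the
            # smallest value seen across all patterns so no match is skipped.
            shift[block] = min(shift.get(block, max_shift), m - end)
    return m, max_shift, shift, ptrns_hash


def mwm_scan(text, patterns, B=2):
    """Return (position, pattern) for every exact match in text."""
    m, max_shift, shift, ptrns_hash = build_mwm(patterns, B)
    matches = []
    pos = 0
    while pos + m <= len(text):
        window = text[pos:pos + m]
        s = shift.get(window[-B:], max_shift)
        if s == 0:                         # last block ends some prefix
            for p in ptrns_hash.get(window, []):
                if text.startswith(p, pos):   # verify the full pattern
                    matches.append((pos, p))
            s = 1
        pos += s
    return matches


patterns = ["hello", "lower", "world!"]    # m = 5, so MaxShift = 5-2+1 = 4
print(mwm_scan("say hello world! helloworld", patterns))
# [(4, 'hello'), (10, 'world!'), (17, 'hello')]
```

The final "world" in the text is a partial match only (the m-byte prefix matches but the trailing "!" is missing), so it is not reported.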

Modified Wu-Manber (MWM) Example: Simulated Scan
[Figure: simulated scan using the shift table (B=2) and the patterns (m=5). Any block not listed in the shift table gets the default shift of 4 (MaxShift = 5-2+1 = 4).]

Enter SPC: Shift-based Pattern matching for Compressed traffic
Recall that LZ77 compresses data with pointers to past occurrences of strings, so the bytes referred to by pointers were already scanned. If we have prior knowledge that an area does not contain matches, we can skip scanning most of it. General method: perform on-the-fly decompression and scanning; scan uncompressed portions of the data using MWM and skip most of the data represented by LZ77 pointers.

Maintaining Matches Information
A partial match is a match of the m-byte scan window with the m-byte prefix of a pattern. An exact match is a full pattern match. The PartialMatch bit-vector marks partial matches found in the scanned text, maintaining one bit per byte.

Handling Pointer Boundaries
Matches may occur across pointer boundaries: a prefix of the referred bytes may be a suffix of a pattern that started before the pointer, and a suffix of the referred bytes may be a prefix of a pattern that continues after the pointer. Special care must be taken to handle pointer boundaries and maintain MWM characteristics.

SPC = MWM + Pointers
While scanning text, update the PartialMatch bit-vector. As long as the scan window is not fully contained within a pointer's boundaries, perform a regular MWM scan; this handles the boundary case at the start of the pointer. When the m-byte scan window shifts fully into a pointer, check which areas of the pointer can be skipped by consulting the PartialMatch bit-vector. Continue the regular MWM scan at m-1 bytes before the end of the pointer; this handles the boundary case at the end of the pointer.

Scanning and Skipping Pointers
If no partial matches are found in the pointer, safely shift the scan window to m-1 bytes before the pointer end, effectively skipping the internal body of the pointer. Otherwise, for each partial match marked in the referred area, mark the corresponding position as a partial match in the pointer and check for an exact match at that text position.
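The pointer-skipping rule can be sketched end to end as follows. This is a simplified illustration under several assumptions, not the authors' implementation: the input is decoded fully before scanning rather than on the fly, tokens use the hypothetical format of a string for literals and a (distance, length) tuple for pointers, a bytearray stands in for the PartialMatch bit-vector, and all function names are invented.

```python
def build_mwm(patterns, B):
    m = min(len(p) for p in patterns)
    max_shift = m - B + 1
    shift, ptrns_hash = {}, {}
    for p in patterns:
        prefix = p[:m]
        ptrns_hash.setdefault(prefix, []).append(p)
        for end in range(B, m + 1):
            block = prefix[end - B:end]
            shift[block] = min(shift.get(block, max_shift), m - end)
    return m, max_shift, shift, ptrns_hash


def decode(tokens):
    """Return the plain text and a list of (start, end, distance) pointer spans."""
    out, spans = [], []
    for tok in tokens:
        if isinstance(tok, str):
            out.extend(tok)
        else:
            distance, length = tok
            start = len(out)
            spans.append((start, start + length, distance))
            for i in range(length):
                out.append(out[start - distance + i])
    return "".join(out), spans


def spc_scan(tokens, patterns, B=2):
    m, max_shift, shift, ptrns_hash = build_mwm(patterns, B)
    text, spans = decode(tokens)
    partial = bytearray(len(text))     # one flag per byte: partial match here
    matches = []

    def record(pos):
        """Mark a partial match at pos and check for exact matches there."""
        partial[pos] = 1
        for p in ptrns_hash[text[pos:pos + m]]:
            if text.startswith(p, pos):
                matches.append((pos, p))

    pos, k = 0, 0
    while pos + m <= len(text):
        # Advance to the first pointer span that ends at or after the window end.
        while k < len(spans) and spans[k][1] < pos + m:
            k += 1
        if k < len(spans) and spans[k][0] <= pos:
            # Window lies fully inside a pointer: reuse the flags recorded for
            # the referred area instead of rescanning the bytes.
            start, end, dist = spans[k]
            for p in range(pos, end - m + 1):
                if partial[p - dist]:
                    record(p)
            pos = end - m + 1          # resume m-1 bytes before the pointer end
            continue
        window = text[pos:pos + m]     # regular MWM step
        s = shift.get(window[-B:], max_shift)
        if s == 0:
            if window in ptrns_hash:
                record(pos)
            s = 1
        pos += s
    return matches


tokens = ["xx hello world! yy ", (16, 12)]   # pointer repeats "hello world!"
print(spc_scan(tokens, ["hello", "world!", "lower"]))
# [(3, 'hello'), (9, 'world!'), (19, 'hello'), (25, 'world!')]
```

In this sketch the loop over the pointer interior reads one flag per position; it never compares text bytes unless a flag is set, which is where the saving over a plain MWM scan comes from.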

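SPC Simulated Scan Example
[Figure: simulated SPC scan using the shift table (B=2) and the patterns (m=5). Any block not listed in the shift table gets the default shift of 4 (MaxShift = 5-2+1 = 4).]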

The Setup
The platform: Intel Core i5 750 processor with 4 cores. The data set: 6781 HTTP pages encoded with GZIP (Alexa.org top sites), 335MB in uncompressed form (66MB compressed); 92.1% of it is represented by pointers, with an average pointer length of 16.7 bytes. The pattern set: Snort (NIDS), total of patterns 6837 text patterns (resulting in 11M matches, 3.24% of the text). The paper also covers ModSecurity rules.

SPC Characteristics Analysis
The skip ratio is defined as the percentage of characters the algorithm skips. The SPC shift ratio is based on two factors: the MWM shift for scans outside pointers, and the skipping of byte scans inside pointers. For m = B, MWM does not skip at all, so SPC shifts are based solely on pointer skipping (ranging from 60% to 70%).
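The m = B case follows directly from MaxShift = m-B+1 = 1: every window position must be examined. The sketch below measures a skip ratio for plain MWM on invented data. It assumes "skipped" means window positions the scan never examines, which may differ from the exact accounting used in the talk; the function name and sample inputs are hypothetical, and the text is assumed to be at least m characters long.

```python
def mwm_skip_ratio(text, patterns, B):
    """Fraction of window positions that a plain MWM scan never examines."""
    m = min(len(p) for p in patterns)
    max_shift = m - B + 1
    shift = {}
    for p in patterns:
        for end in range(B, m + 1):
            block = p[end - B:end]
            shift[block] = min(shift.get(block, max_shift), m - end)
    examined, pos = 0, 0
    while pos + m <= len(text):
        examined += 1
        # A zero shift means "verify here", after which the window moves by 1.
        pos += max(1, shift.get(text[pos + m - B:pos + m], max_shift))
    positions = len(text) - m + 1
    return 1 - examined / positions


text = "the quick brown fox jumps over the lazy dog"
patterns = ["hello", "world"]              # m = 5
print(mwm_skip_ratio(text, patterns, B=2)) # some positions are skipped
print(mwm_skip_ratio(text, patterns, B=5)) # 0.0, since MaxShift = 1
```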

SPC Run-time Performance: Multi-core Throughput
SPC's throughput on our platform: for Snort, Gbit/sec for m=5 and B=4; for ModSecurity, Gbit/sec for m=5 and B=3. These results were obtained by running 4 threads that perform pattern matching on data loaded in advance into main memory. The algorithms were implemented in C# using general-purpose libraries. Better throughput could be achieved by using optimized software libraries or hardware optimized for networking.

SPC Run-time Performance: Throughput Normalized to ACCH
m=6 gives the best performance; however, we choose m=5 as a tradeoff between performance and pattern-set coverage. SPC's throughput is better than that of ACCH: for m=5 on Snort, we get a throughput improvement of 51.86%. SPC is also faster than MWM for all m and B values: for Snort, the throughput improvement is 73.23%.

SPC Storage Requirements
Our MWM and SPC require only 1.88 bytes per char, so the data structure has a high probability of residing within the cache. The original MWM requires 1.4KB per char.

Conclusion
HTTP compression is gaining popularity, but because of its high processing requirements, compressed traffic is ignored by firewalls (FWs). SPC accelerates the entire pattern-matching process by taking advantage of the information within the compressed traffic. Compared to ACCH, SPC gains a performance boost of over 51% and uses half the space (4KB) for the additional information needed per connection. SPC is simpler, more straightforward and more efficient. We encourage vendors to support inspection of compressed traffic.

Questions?