Boundary Detection in Tokenizing Network Application Payload for Anomaly Detection
Rachna Vargiya and Philip Chan
Department of Computer Sciences, Florida Institute of Technology

Motivation

- Existing anomaly detection techniques rely on information derived only from the packet headers.
- More sophisticated attacks involve the application payload. Example: the Code Red II worm (GET /default.ida?NNNNNNNNN…).
- Parsing the payload is therefore required.
- Problems with hand-coded parsing: a large number of application protocols, and the frequent introduction of new protocols.

Problem Statement

- Parse the application payload into tokens without explicit knowledge of the application protocols.
- These tokens are later used as features for anomaly detection.

Related Work

- Pattern detection (important tokens):
  - Fixed length: Forrest et al. (1998)
  - Variable length: Wespi et al. (2000), Jiang et al. (2002)
- Boundary detection (all tokens):
  - VOTING EXPERTS by Cohen et al. (2002): boundary entropy, frequency, binary votes

Approach

- Boundary-finding algorithms:
  - Boundary Entropy
  - Frequency
  - Augmented Expected Mutual Information
  - Minimum Description Length
- The approach is domain independent (no prior domain knowledge is required).

Combining Boundary-Finding Algorithms

- All of the techniques, or a subset of them (e.g., Frequency + Minimum Description Length), can be combined.
- Each algorithm can cast multiple votes, depending on its confidence measure.

Boundary Entropy (Cohen et al.)

- The entropy at the end of each possible window is calculated.
- Example string "Itisarainyday", with sliding window w; 'x' is the byte that follows the current window.
- High entropy means more variation in the byte that follows.

Voting Using Boundary Entropy

- Within meaningful tokens, the entropy starts with a high value, drops, and peaks again at the end of the token (illustrated on "Itisarainyday").
- Votes are cast for positions with peak entropy.
- A threshold suppresses votes for low entropy values; Threshold = average boundary entropy.
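The transcript contains no code, so the following is a minimal Python sketch of a boundary-entropy expert along the lines described above. The window length, the peak test, and all names are assumptions for illustration, not the authors' implementation.

```python
from collections import Counter, defaultdict
import math


def boundary_entropy_scores(data, window=3):
    """Entropy of the byte that follows each window of length `window`."""
    follow = defaultdict(Counter)                 # window -> distribution of the next byte
    for i in range(len(data) - window):
        follow[data[i:i + window]][data[i + window]] += 1

    scores = []
    for i in range(len(data) - window):
        counts = follow[data[i:i + window]]
        total = sum(counts.values())
        h = -sum((c / total) * math.log2(c / total) for c in counts.values())
        scores.append(h)                          # score of the boundary after this window
    return scores


def boundary_entropy_votes(scores):
    """Vote for local entropy peaks; suppress votes below the average entropy."""
    threshold = sum(scores) / len(scores)
    votes = [0] * len(scores)
    for i in range(1, len(scores) - 1):
        if scores[i] > threshold and scores[i] >= scores[i - 1] and scores[i] > scores[i + 1]:
            votes[i] = 1
    return votes


payload = (b"GET /index.html HTTP/1.0\r\n"
           b"GET /default.ida HTTP/1.0\r\n"
           b"POST /cgi-bin/form HTTP/1.0\r\n")
print(boundary_entropy_votes(boundary_entropy_scores(payload)))
```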

Frequency (Cohen et al.)

- The most frequent tokens are assumed to be the meaningful ones.
- Frequencies are collected for tokens of length 1, 2, 3, ..., 6.
- Shorter tokens are inherently more frequent than longer tokens, so frequencies are normalized within each token length using the standard deviation.
- Boundaries are assigned at the end of the most frequent token in the window.
- Example window over "Itisarainyday", frequency in window: (1) "I" = 3, (2) "It" = 5, (3) "Iti" = 2, (4) "Itis" = 3.
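A similar sketch of the frequency expert, again in Python with assumed names: substring counts for lengths 1 through 6 are standardized within each length, and each window position votes for the boundary that ends its highest-scoring prefix. This is one plausible reading of "the end of the most frequent token in the window", not the authors' exact rule.

```python
from collections import Counter
import statistics


def length_normalized_frequencies(data, max_len=6):
    """Z-score substring frequencies separately for each token length 1..max_len."""
    normalized = {}
    for n in range(1, max_len + 1):
        counts = Counter(data[i:i + n] for i in range(len(data) - n + 1))
        if not counts:
            continue
        mean = statistics.mean(counts.values())
        std = statistics.pstdev(counts.values()) or 1.0
        for token, c in counts.items():
            normalized[token] = (c - mean) / std
    return normalized


def frequency_votes(data, normalized, max_len=6):
    """Each window position votes for the end of its highest-scoring prefix."""
    votes = [0] * (len(data) + 1)
    for i in range(len(data) - max_len):
        best = max(range(1, max_len + 1),
                   key=lambda n: normalized.get(data[i:i + n], float("-inf")))
        votes[i + best] += 1
    return votes
```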

Mutual Information (MI)

- Mutual information is given by MI(a, b) = P(a, b) log [ P(a, b) / (P(a) P(b)) ].
- It measures the reduction of uncertainty about event 'b' in the presence of event 'a'.
- MI does not incorporate the counter-evidence from the cases where 'a' occurs without 'b' and vice versa.

Augmented Expected Mutual Information (AEMI)

- AEMI sums the supporting evidence and subtracts the counter-evidence (the cases where 'a' occurs without 'b' and 'b' occurs without 'a').
- For each window, the location with the minimum AEMI value, i.e. the weakest association between the left segment 'a' and the right segment 'b', suggests a boundary (illustrated on "Itisarainyday").
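A sketch of the AEMI computation, assuming the standard form in which the supporting term P(a,b) log[P(a,b)/(P(a)P(b))] is reduced by the two counter-evidence terms for "a without b" and "b without a". The probability estimates passed in are placeholders; how they are estimated from the payload is not shown here.

```python
import math


def aemi(p_ab, p_a, p_b):
    """Augmented expected mutual information between adjacent events a and b:
    supporting evidence minus counter-evidence."""
    def term(p_xy, p_x, p_y):
        if p_xy <= 0 or p_x <= 0 or p_y <= 0:
            return 0.0
        return p_xy * math.log2(p_xy / (p_x * p_y))

    support = term(p_ab, p_a, p_b)                      # a and b occur together
    counter = (term(p_a - p_ab, p_a, 1.0 - p_b)         # a occurs without b
               + term(p_b - p_ab, 1.0 - p_a, p_b))      # b occurs without a
    return support - counter


# Within each window, the split whose left part (a) and right part (b) have the
# minimum AEMI value is suggested as a boundary.
print(aemi(p_ab=0.05, p_a=0.20, p_b=0.10))
```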

Minimum Description Length (MDL)

- Shorter codes are assigned to frequent tokens to minimize the overall coding length.
- The boundary yielding the shortest coding length is assigned votes.
- Coding length per byte: -lg P(t_i) is the number of bits needed to encode token t_i, and |t_i| is the length of t_i; a candidate boundary splits the window (e.g., within "Itisarainyday") into t_left and t_right.
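A rough Python sketch of the MDL expert under these definitions: token probabilities are estimated from training payloads, and each window is split at the boundary that minimizes the coding length per byte of t_left and t_right. The smoothing floor for unseen tokens and all names are assumptions.

```python
import math
from collections import Counter


def substring_probabilities(data, max_len=6):
    """Estimate P(t) for all substrings of up to max_len bytes from training payloads."""
    counts = Counter()
    for n in range(1, max_len + 1):
        counts.update(data[i:i + n] for i in range(len(data) - n + 1))
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}


def mdl_best_split(window, prob, floor=1e-6):
    """Split the window into t_left | t_right so that the coding length per byte,
    (-lg P(t_left) - lg P(t_right)) / (|t_left| + |t_right|), is minimized."""
    def bits(t):
        return -math.log2(prob.get(t, floor))           # floor is a crude smoothing assumption
    return min(range(1, len(window)),
               key=lambda k: (bits(window[:k]) + bits(window[k:])) / len(window))


prob = substring_probabilities(b"Itisarainyday" * 10)
print(mdl_best_split(b"Itisar", prob))   # index of the proposed boundary inside the window
```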

Normalizing the Scores of Each Algorithm

- Each algorithm produces a list of scores.
- Since the number of votes is proportional to the score, the scores must be normalized.
- Each score is replaced by the number of standard deviations it lies away from the mean value.

Normalizing the Votes of Each Algorithm

- Each algorithm produces a list of votes derived from its scores.
- Normalization ensures that every algorithm votes with the same overall weight.
- The number of votes is replaced by the number of standard deviations it lies away from the mean value.

Normalizing Scores and Votes

[Diagram: for the example string "Itis", each expert's raw scores (s1-s4) are converted to normalized scores (ns1-ns4), then to votes (v1-v4) and normalized votes, which are finally merged into the combined normalized votes.]
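The normalization step shared by the two slides above can be written as a single helper (Python, assumed names): each value is replaced by its signed distance from the mean in standard deviations, applied first to an expert's scores and then to its votes.

```python
import statistics


def z_normalize(values):
    """Replace each value by its signed distance from the mean, in standard deviations."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values) or 1.0      # guard against a zero standard deviation
    return [(v - mean) / std for v in values]


normalized_scores = z_normalize([0.3, 1.1, 0.7, 2.4])   # e.g. one expert's raw scores
print(normalized_scores)
```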

Combined Approach with Weighted Voting

- The lists of votes from all the experts are gathered.
- For each candidate boundary, the final votes are summed.
- A boundary is placed at a position if the votes at that position exceed a threshold; Threshold = average number of votes.
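The final combination can be sketched as below, assuming each expert contributes one list of normalized votes per boundary position; positions whose summed votes exceed the average sum are output as token boundaries.

```python
def combine_votes(vote_lists):
    """Sum the experts' normalized votes per position and keep positions above the average."""
    totals = [sum(position_votes) for position_votes in zip(*vote_lists)]
    threshold = sum(totals) / len(totals)
    return [i for i, total in enumerate(totals) if total > threshold]


# Hypothetical normalized votes from three experts over six boundary positions:
boundaries = combine_votes([[0.1, 1.8, 0.0, 0.2, 1.5, 0.1],
                            [0.4, 1.2, 0.3, 0.1, 1.1, 0.0],
                            [0.0, 0.9, 0.2, 0.3, 1.4, 0.2]])
print(boundaries)   # positions 1 and 4 exceed the average total
```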

Evaluation Criteria

- Evaluation A: percentage of space-separated words retrieved.
- Evaluation B: percentage of keywords in the protocol specification that were retrieved.
- Evaluation C: entropy of the tokens in the output file (lower is better).
- Evaluation D: number of detected attacks in network traffic.
- Evaluations A and B apply only to text-based protocols.
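Evaluation C can be made concrete with a small helper; computing it as the Shannon entropy of the token distribution in the tokenizer's output is an assumption about the exact measure, but it matches "lower is better" for a more regular token stream.

```python
import math
from collections import Counter


def token_entropy(tokens):
    """Shannon entropy (bits) of the token distribution in the output file."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


print(token_entropy(["GET", "/index.html", "HTTP/1.0", "GET", "/default.ida", "HTTP/1.0"]))
```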

Anomaly Detection Algorithm: LERAD (Mahoney and Chan)

- LERAD forms rules over 23 attributes:
  - the first 15 attributes come from the packet header,
  - the next 8 attributes come from the payload.
- Example rule: if port = 80 then word1 = "GET".
- Original payload attributes: space-separated tokens. Our payload attributes: boundary-separated tokens.
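LERAD itself is not reproduced here; the fragment below only illustrates, with hypothetical field names, how a learned rule such as "if port = 80 then word1 = 'GET'" would flag a tokenized record whose consequent attribute falls outside the rule's allowed values.

```python
def rule_violated(record, condition, consequent_attr, allowed_values):
    """True if the record matches the rule's condition but violates its consequent."""
    if all(record.get(attr) == value for attr, value in condition.items()):
        return record.get(consequent_attr) not in allowed_values
    return False


record = {"port": 80, "word1": "QUIT"}            # hypothetical boundary-separated record
print(rule_violated(record, {"port": 80}, "word1", {"GET"}))   # True: anomalous under the rule
```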

Experimental Data

- 1999 DARPA Intrusion Detection Evaluation data set.
- Week 3: attack-free (training) data; Weeks 4 and 5: attack-containing (test) data.
- Evaluations A, B, C (known boundaries) use Week 3 only: trained on the earlier days and tested on days 5-7, which prevents gaining knowledge from Weeks 4 and 5.
- Evaluation D (detected attacks): trained on Week 3, tested on Weeks 4 and 5.

Evaluation A: % of Space-Separated Tokens Recovered

Columns: Port 25, Port 80, Port 21, Port 79, Average.
Methods (rows): Freq+MDL, Frequency, BE+AEMI+MDL+Freq, AEMI, MDL, BE.

Evaluation B: % of Keywords in RFCs Recovered

Columns: Port 25, Port 80, Port 21, Average.
Methods (rows): Freq+MDL, Frequency, BE+AEMI+MDL+Freq, AEMI, MDL, BE.

Evaluation C: Entropy of Output (Lower Is Better), Averaged Across 6 Ports

Method              Average value
Frequency           5.0
MDL                 5.03
Freq+MDL            5.06
BE                  5.25
BE+AEMI+Freq+MDL    5.56
AEMI                6.38

Ranking of Algorithms

Method              Evaluation A   Evaluation B   Evaluation C
Freq+MDL            1              1              3
Frequency           2              2              1
BE+AEMI+MDL+Freq    3              3              5
AEMI                4              4              6
MDL                 5              5              2
BE                  6              6              4

Detection Rate: Space-Separated vs. Boundary-Separated Tokens (Freq+MDL)

Columns, per port: detections at 10 FP/day (Space, Boundary) and at 100 FP/day (Space, Boundary).
Overall % improvement of boundary-separated over space-separated tokens: 5% at 10 FP/day and 8% at 100 FP/day.

Summary of Contributions

- Used payload information, while most IDSs concentrate on header information.
- Proposed AEMI and MDL for boundary detection.
- Combined all of the algorithms as well as subsets of them.
- Used weighted voting to indicate confidence.
- The proposed techniques find boundaries better than spaces do.
- Achieved higher detection rates in an anomaly detection system.

Future Work

- Further evaluation on other ports.
- Pick more useful tokens instead of the first 8.
- The DARPA data set is partially synthetic, so further evaluation on real traffic is needed.
- Evaluation with other anomaly detection algorithms.

Thank you

Experimental Results Table: Results from Additional Ports for Freq+MDL and ALL

Columns: Evaluation A (% words found), Evaluation B (% keywords found), and Evaluation C (entropy), each reported for Freq+MDL and for ALL (all experts combined).