Consensus Extraction from Heterogeneous Detectors to Improve Performance over Network Traffic Anomaly Detection Jing Gao 1, Wei Fan 2, Deepak Turaga 2,

Presentation transcript:

Consensus Extraction from Heterogeneous Detectors to Improve Performance over Network Traffic Anomaly Detection
Jing Gao 1, Wei Fan 2, Deepak Turaga 2, Olivier Verscheure 2, Xiaoqiao Meng 2, Lu Su 1, Jiawei Han 1
1 Department of Computer Science, University of Illinois
2 IBM TJ Watson Research Center
INFOCOM’2011, Shanghai, China
INPUT: multiple simple atomic detectors
OUTPUT: optimization-based combination mostly consistent with all atomic detectors

2 Network Traffic Anomaly Detection
[Figure: a computer network and its network traffic. Question posed: anomalous or normal?]

3 Challenges
– The normal behavior can be too complicated to describe.
– Some normal data can be similar to the true anomalies.
– Labeling current anomalies is expensive and slow.
– Network attacks adapt continuously: what we knew in the past may not work today.

4 The Problem
Simple rules (or atomic rules) are relatively easy to craft.
Problem:
– There can be far too many simple rules.
– Each rule can have a high false alarm (false positive, FP) rate.
Challenge: can we find a non-trivial combination of them (per event, per detector) that significantly improves accuracy?

5 Why Do We Need to Combine Detectors?
[Figure: three pairs of Count and Entropy detector panels shown next to a Label panel]
Too many alarms! A combined view is better than the individual views!

6 Combining Detectors Is Non-trivial
– We aim to find a consolidated solution without any knowledge of the true anomalies (unsupervised).
– But we can improve it with limited supervision and incrementally (semi-supervised and incremental).
– We don’t know which atomic detectors are better and which are worse.
– At any given moment, the right answer could be some non-trivial and dynamic combination of atomic detectors.
– There could be more bad base detectors than good ones, in which case majority voting cannot work.

7 Problem Formulation
Which one is an anomaly?

Record     A1   A2   ...   Ak-1   Ak
Record 1   Y    N    ...   N      N
Record 2   N    Y    ...   Y      N
Record 3   Y    N    ...   N      N
Record 4   Y    Y    ...   N      Y
Record 5   N    N    ...   Y      Y
Record 6   N    N    ...   N      N
Record 7   N    N    ...   N      N
......

Combine atomic detectors into one! We propose a non-trivial combination.
Consensus:
1. mostly consistent with all atomic detectors
2. optimization-based framework
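As a point of reference for the formulation above, the equal-weight baseline (majority voting) can be sketched in a few lines. The vote matrix below assumes that each group of four Y/N letters on the slide is one record's row across the four visible detectors; that reading of the table, and the names `votes`, `majority_vote_scores` and `ranking`, are illustrative assumptions rather than anything taken from the authors' code.

```python
import numpy as np

# Assumed reading of the slide's Y/N table: rows are Records 1-7,
# columns are the four visible atomic detectors. 1 = "Y" (flagged), 0 = "N".
votes = np.array([
    [1, 0, 0, 0],
    [0, 1, 1, 0],
    [1, 0, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 1],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
])

def majority_vote_scores(votes):
    """Fraction of detectors flagging each record (every detector weighs the same)."""
    return votes.mean(axis=1)

scores = majority_vote_scores(votes)
# Rank records from most to least suspicious; a stable sort keeps ties in record order.
ranking = np.argsort(-scores, kind="stable")
print(scores)
print(ranking)
```

Because every detector gets the same weight here, a few poor detectors can dominate the ranking, which is the weakness the consensus approach is meant to address.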

8 How to Combine Atomic Detectors?
Linear Models
– As long as one detector is correct, there always exist weights to combine them linearly.
– Question: how do we figure out these weights?
– The weights are per example and per detector, which is different from majority voting and model averaging.
Principles
– Consensus considers performance over a set of examples and weights each detector by its performance relative to the others, i.e., the examples are no longer treated as i.i.d.
– Consensus: mostly consistent among all atomic detectors.
– Atomic detectors are better than random guessing and systematic flipping.
– Atomic detectors should be weighted according to their detection performance.
– We should rank the records according to their probability of being an anomaly.
Algorithm
– Reach consensus among multiple atomic anomaly detectors: unsupervised, semi-supervised, incremental.
– Automatically derive weights of atomic detectors and records, per detector and per event; no single weight works for all situations.

9 Framework
[Figure: bipartite graph connecting records to detectors A1 ... Ak. Annotations: record i; detector j; probability of anomaly, normal; adjacency; initial probability [1 0], [0 1]]

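10 Objective
Minimize disagreement:
– A record and a detector have similar probabilities of being an anomaly if the record is connected to the detector.
– Probabilities do not deviate much from the initial probability.
[Figure: bipartite graph of records and detectors A1 ... Ak with initial probabilities [1 0], [0 1]]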

11 Methodology
Iterate until convergence:
– Update detector probability
– Update record probability
[Figure: bipartite graph of records and detectors A1 ... Ak with initial probabilities [1 0], [0 1]]
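The two alternating updates can be sketched as follows. The transcript does not preserve the slide's formulas, so the update rules here are an assumption: each detector contributes an "anomaly" group and a "normal" group with initial probabilities [1 0] and [0 1], a group's probability is the average over its member records pulled toward its initial value by a hypothetical penalty `alpha`, and a record's probability is the average over the groups it belongs to. The function name `consensus_combine` is illustrative.

```python
import numpy as np

def consensus_combine(votes, alpha=2.0, max_iter=100, tol=1e-6):
    """Sketch of iterative consensus over a record/detector bipartite graph.

    votes: (n_records, n_detectors) array of 0/1 flags.
    Returns per-record anomaly probabilities and the group probabilities.
    """
    votes = np.asarray(votes)
    n, k = votes.shape
    # Adjacency between records and 2k groups: column 2j is detector j's
    # "anomaly" group, column 2j+1 is its "normal" group.
    A = np.zeros((n, 2 * k))
    A[:, 0::2] = votes
    A[:, 1::2] = 1 - votes
    # Initial group probabilities: [1 0] for anomaly groups, [0 1] for normal groups.
    Y = np.tile(np.array([[1.0, 0.0], [0.0, 1.0]]), (k, 1))
    U = np.full((n, 2), 0.5)               # record probabilities [anomaly, normal]
    group_size = A.sum(axis=0)[:, None]
    degree = A.sum(axis=1)[:, None]        # every record belongs to exactly k groups
    for _ in range(max_iter):
        # Update detector (group) probability: average of members, pulled toward Y.
        Q = (A.T @ U + alpha * Y) / (group_size + alpha)
        # Update record probability: average of the groups the record is linked to.
        U_new = (A @ Q) / degree
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return U[:, 0], Q

votes = np.array([
    [1, 0, 0, 0],
    [0, 1, 1, 0],
    [1, 0, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 1],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
])
scores, group_probs = consensus_combine(votes)
print(np.round(scores, 3))
```

In this sketch a detector's group ends up with a probability that depends on which records it shares with other detectors, so detectors are effectively weighted per record rather than uniformly.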

12 Propagation Process
[Figure: probabilities propagating between detectors and records, starting from the initial probabilities [1 0], [0 1]]

13 Consensus Combination Reduces Expected Error
Detector A
– Has probability P(A)
– Outputs P(y|x,A) for record x regarding y=0 (normal) and y=1 (anomalous)
Expected error of single detector
Expected error of combined detector
Combined detector has a lower expected error
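The slide's formulas are not preserved in the transcript. One standard way to see the claim, assuming squared error against the true conditional P(y|x) and a combined detector that averages the atomic outputs with weights P(A), is the following sketch; the symbols Err_single and Err_combined are introduced here for illustration.

```latex
\begin{align*}
\mathrm{Err}_{\text{single}}
  &= \sum_{A} P(A)\,
     \mathbb{E}_{(x,y)}\!\left[\bigl(P(y \mid x) - P(y \mid x, A)\bigr)^{2}\right], \\
\mathrm{Err}_{\text{combined}}
  &= \mathbb{E}_{(x,y)}\!\left[\Bigl(P(y \mid x) - \sum_{A} P(A)\, P(y \mid x, A)\Bigr)^{2}\right] \\
  &= \mathbb{E}_{(x,y)}\!\left[\Bigl(\sum_{A} P(A)\,\bigl(P(y \mid x) - P(y \mid x, A)\bigr)\Bigr)^{2}\right] \\
  &\le \mathbb{E}_{(x,y)}\!\left[\sum_{A} P(A)\,\bigl(P(y \mid x) - P(y \mid x, A)\bigr)^{2}\right]
   = \mathrm{Err}_{\text{single}} .
\end{align*}
```

The second line uses that the weights P(A) sum to 1, and the inequality is Jensen's inequality applied to the convex function t ↦ t².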

14 Extensions
Semi-supervised
– Know the labels of a few records in advance
– Improve the performance of the combined detector by incorporating this knowledge
Incremental
– Records arrive continuously
– Incrementally update the combined detector

15 Incremental
When a new record arrives:
– Update detector probability
– Update record probability
[Figure: bipartite graph of detectors A1 ... Ak and records with initial probabilities [1 0], [0 1]]
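A one-pass version of these two updates might look like the sketch below. It is an assumption about how the incremental step could be organized, not the authors' implementation: each detector keeps running totals for its "anomaly" and "normal" groups, a new record is scored from the current group probabilities, and only the groups it joins are updated. The class name `IncrementalConsensus` and the penalty `alpha` are hypothetical.

```python
import numpy as np

class IncrementalConsensus:
    """Sketch: score records one at a time without revisiting earlier records."""

    def __init__(self, n_detectors, alpha=2.0):
        self.alpha = alpha
        # Index 2j is detector j's anomaly group, index 2j+1 its normal group.
        self.prior = np.tile([1.0, 0.0], n_detectors)  # initial anomaly probability
        self.mass = np.zeros(2 * n_detectors)          # sum of member anomaly probabilities
        self.size = np.zeros(2 * n_detectors)          # number of member records

    def group_prob(self):
        """Current anomaly probability of every group."""
        return (self.mass + self.alpha * self.prior) / (self.size + self.alpha)

    def update(self, vote):
        """vote: 0/1 flags from each detector for the newly arrived record."""
        vote = np.asarray(vote)
        groups = 2 * np.arange(vote.size) + (1 - vote)
        # Update record probability: average over the groups it is linked to.
        u = self.group_prob()[groups].mean()
        # Update detector probability: add the record to those groups' totals.
        self.mass[groups] += u
        self.size[groups] += 1
        return u

model = IncrementalConsensus(n_detectors=4)
first = model.update([1, 0, 0, 0])
second = model.update([0, 0, 0, 0])
third = model.update([1, 1, 1, 1])
print(first, second, third)
```

Each arrival costs time proportional to the number of detectors, which is the property that makes this style of update attractive for continuously arriving records.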

16 Semi-supervised
Iterate until convergence
[Figure: bipartite graph of detectors A1 ... Ak and records, with records marked as labeled or unlabeled; initial probabilities [1 0], [0 1]]
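One way to fold a few known labels into the iteration is to pull each labeled record toward its label while leaving unlabeled records to follow their detectors. The sketch below does this under assumed update rules; the function name `semi_supervised_consensus` and the strengths `alpha` and `beta` are hypothetical, and the slide's actual formulation is not preserved in the transcript.

```python
import numpy as np

def semi_supervised_consensus(votes, labels, alpha=2.0, beta=8.0, n_iter=200):
    """Sketch of consensus combination with a few labeled records.

    votes:  (n_records, n_detectors) array of 0/1 flags.
    labels: dict mapping record index -> 1 (anomaly) or 0 (normal).
    Returns the anomaly probability of every record.
    """
    votes = np.asarray(votes)
    n, k = votes.shape
    A = np.zeros((n, 2 * k))
    A[:, 0::2] = votes           # anomaly group of each detector
    A[:, 1::2] = 1 - votes       # normal group of each detector
    prior = np.tile([1.0, 0.0], k)
    target = np.zeros(n)         # known label for labeled records
    pull = np.zeros(n)           # beta for labeled records, 0 for unlabeled ones
    for i, y in labels.items():
        target[i] = float(y)
        pull[i] = beta
    u = np.full(n, 0.5)
    for _ in range(n_iter):
        # Update detector (group) probability.
        q = (A.T @ u + alpha * prior) / (A.sum(axis=0) + alpha)
        # Update record probability; labeled records are drawn toward their label.
        u = (A @ q + pull * target) / (k + pull)
    return u

votes = np.array([
    [1, 0, 0, 0],
    [0, 1, 1, 0],
    [1, 0, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 1],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
])
u_plain = semi_supervised_consensus(votes, labels={})
u_labeled = semi_supervised_consensus(votes, labels={0: 1})
print(np.round(u_plain, 3))
print(np.round(u_labeled, 3))
```

Labeling the first record as an anomaly also raises the score of the third record, which received the same votes, because the label propagates through the detectors the two records share.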

17 Benchmark Data Sets
IDN
– Data: a sequence of events (DoS flood, SYN flood, port scanning, etc.) partitioned into intervals
– Detector: set thresholds on two high-level measures describing the probability of observing events during each interval
DARPA
– Data: a series of TCP connection records collected by MIT Lincoln Labs; each record contains 34 continuous derived features, including duration, number of bytes, error rate, etc.
– Detector: randomly select a subset of features and apply an unsupervised distance-based anomaly detection algorithm
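The DARPA-style atomic detectors can be illustrated with a small sketch. The slide only says "distance-based", so the specific choices here are assumptions: the score is the distance to the k-th nearest neighbor, each detector sees a random subset of five features, and the top 5% of scores are flagged. The function names and the synthetic data are purely illustrative.

```python
import numpy as np

def knn_distance_scores(X, n_neighbors=5):
    """Distance from each row to its n_neighbors-th nearest other row."""
    diffs = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diffs ** 2).sum(axis=2))
    dist_sorted = np.sort(dist, axis=1)
    # Column 0 is the zero distance of each row to itself.
    return dist_sorted[:, n_neighbors]

def random_subspace_detectors(X, n_detectors=10, subset_size=5,
                              n_neighbors=5, contamination=0.05, seed=0):
    """Build 0/1 votes from detectors that each see a random feature subset."""
    rng = np.random.default_rng(seed)
    votes = np.zeros((X.shape[0], n_detectors), dtype=int)
    for j in range(n_detectors):
        cols = rng.choice(X.shape[1], size=subset_size, replace=False)
        scores = knn_distance_scores(X[:, cols], n_neighbors)
        # Flag the records whose score exceeds the (1 - contamination) quantile.
        threshold = np.quantile(scores, 1 - contamination)
        votes[:, j] = scores > threshold
    return votes

# Synthetic stand-in for connection records with 34 continuous features.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 34))
X[0] += 10.0                     # one record that is far from all the others
votes = random_subspace_detectors(X)
print(votes[0], votes.sum(axis=0))
```

The resulting `votes` matrix has the same records-by-detectors shape as the table on the Problem Formulation slide, so it could feed either majority voting or a consensus combination.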

18 Benchmark Data Sets
LBNL
– Data: an enterprise traffic data set collected at the edge routers of the Lawrence Berkeley National Lab; the packet traces were aggregated into 1-minute intervals
– Detector: set thresholds on six metrics: the number of TCP SYN packets, the number of distinct IPs in the source or destination, the maximum number of distinct IPs that an IP in the source or destination has contacted, and the maximum pairwise distance between distinct IPs that an IP has contacted

19 Experimental Setup
Baseline methods
– base detectors
– majority voting
– consensus maximization
– semi-supervised (2% labeled)
– stream (30% batch, 70% incremental)
Evaluation measure
– area under the ROC curve (AUC; ranges from 0 to 1, 1 is the best)
– ROC curve: tradeoff between detection rate and false alarm rate
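For the evaluation measure, the area under the ROC curve equals the probability that a randomly chosen anomaly is scored above a randomly chosen normal record, with ties counted as one half. A minimal sketch of that pairwise computation follows; the function name `auc` is illustrative, and the quadratic pairwise comparison is chosen for clarity rather than speed.

```python
import numpy as np

def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of anomaly and normal scores.

    scores: anomaly scores, higher means more suspicious.
    labels: 1 for true anomalies, 0 for normal records (both classes must be present).
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos = scores[labels]
    neg = scores[~labels]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (pos.size * neg.size)

print(auc([0.9, 0.4, 0.6, 0.1], [1, 1, 0, 0]))
```

A detector that ranks every anomaly above every normal record scores 1, and one that scores all records identically gets 0.5.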

20 AUC on Benchmark Data Sets
[Table: AUC on the IDN, DARPA and LBNL data sets; columns worst, best, average, MV, UC, SC, IC]
– worst, best, average: worst, best and average performance of the atomic detectors
– MV: majority voting among detectors
– UC, SC, IC: unsupervised, semi-supervised and incremental versions of consensus combination
Consensus combination improves anomaly detection performance!

21 Stream Computing
Continuous ingestion; continuous complex analysis at low latency

22 Conclusions
Consensus Combination
– Combine multiple atomic anomaly detectors into a more accurate one in an unsupervised way
We give
– Theoretical analysis of the error reduction by detector combination
– Extension of the method to incremental and semi-supervised learning scenarios
– Experimental results on three network traffic data sets

23 Thanks! Any questions? Code available upon request