
Presentation on theme: "Consensus Extraction from Heterogeneous Detectors to Improve Performance over Network Traffic Anomaly Detection Jing Gao 1, Wei Fan 2, Deepak Turaga 2,"— Presentation transcript:

1 Consensus Extraction from Heterogeneous Detectors to Improve Performance over Network Traffic Anomaly Detection
Jing Gao 1, Wei Fan 2, Deepak Turaga 2, Olivier Verscheure 2, Xiaoqiao Meng 2, Lu Su 1, Jiawei Han 1
1 Department of Computer Science, University of Illinois
2 IBM T. J. Watson Research Center
INFOCOM 2011, Shanghai, China
INPUT: multiple simple atomic detectors
OUTPUT: an optimization-based combination that is mostly consistent with all atomic detectors

2 Network Traffic Anomaly Detection
Given the traffic flowing through a computer network, decide for each observation: anomalous or normal?

3 Challenges
–Normal behavior can be too complicated to describe.
–Some normal data can look similar to the true anomalies.
–Labeling current anomalies is expensive and slow.
–Network attacks adapt themselves continuously: what we knew in the past may not work today.

4 The Problem
Simple rules (or atomic rules) are relatively easy to craft.
Problem:
–there can be far too many simple rules
–each rule can have a high false-alarm (FP) rate
Challenge: can we find a non-trivial combination of the rules (per event, per detector) that significantly improves accuracy?

5 Why Do We Need to Combine Detectors?
[Figure: six atomic detectors thresholding Count and Entropy over the ranges 0.1-0.5, 0.3-0.7, and 0.5-0.9, compared against the true Label.]
Each individual detector raises too many alarms! The combined view is better than any individual view!

6 Combining Detectors Is Non-trivial
–We aim to find a consolidated solution without any knowledge of the true anomalies (unsupervised).
–We can improve it with limited supervision, and incrementally (semi-supervised and incremental).
–We do not know which atomic detectors are better and which are worse.
–At any given moment, the right answer can be a non-trivial, dynamic combination of atomic detectors.
–There can be more bad base detectors than good ones, so majority voting cannot work.

7 Problem Formulation
Each record is judged by k atomic detectors A_1, ..., A_k (Y = anomaly, N = normal):

Record     A_1  A_2  ...  A_{k-1}  A_k
Record 1    Y    N   ...     N      N
Record 2    N    Y   ...     Y      N
Record 3    Y    N   ...     N      N
Record 4    Y    Y   ...     N      Y
Record 5    N    N   ...     Y      Y
Record 6    N    N   ...     N      N
Record 7    N    N   ...     N      N

Which records are anomalies? Combine the atomic detectors into one! We propose a non-trivial combination.
Consensus: 1. mostly consistent with all atomic detectors; 2. an optimization-based framework.
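The Y/N table above can be turned into a 0/1 matrix, and the naive baseline the deck argues against, majority voting, simply counts flags per record. A minimal sketch (the variable names are illustrative, not the paper's notation):

```python
# The slide's table as a matrix: each row is a record, each column an
# atomic detector; "Y" means the detector flags the record as anomalous.
votes = ["YNNN", "NYYN", "YNNN", "YYNY", "NNYY", "NNNN", "NNNN"]
flags = [[c == "Y" for c in row] for row in votes]

# Majority voting: a record is an anomaly if a strict majority of the
# detectors flag it.  With k = 4 detectors, that requires 3+ flags.
majority = [sum(row) > len(row) / 2 for row in flags]

# Only Record 4 (YYNY) gets a strict majority of anomaly votes, even
# though Records 1/3/5 each receive some flags -- information that
# majority voting throws away.
assert majority == [False, False, False, True, False, False, False]
```

This illustrates slide 6's point: when bad detectors outnumber good ones, vote counting alone cannot recover the true anomalies, which motivates the weighted, optimization-based consensus that follows.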

8 How to Combine Atomic Detectors?
Linear models
–As long as one detector is correct, there always exist weights that combine the detectors linearly into the right answer.
–Question: how do we figure out these weights?
–Per example and per detector: different from majority voting and model averaging.
Principles
–Consensus considers performance over a set of examples and weights each detector by its performance relative to the others, i.e., examples are no longer treated as i.i.d.
–Consensus: mostly consistent among all atomic detectors.
–Atomic detectors are assumed to be better than random guessing and systematic flipping.
–Atomic detectors should be weighted according to their detection performance.
–Records should be ranked according to their probability of being an anomaly.
Algorithm
–Reach consensus among multiple atomic anomaly detectors: unsupervised, semi-supervised, and incremental.
–Automatically derive weights of atomic detectors and records, per detector and per event: no single weight works for all situations.

9 Framework
A bipartite graph connects the detectors A_1, ..., A_k to the records: the adjacency links record i to detector j whenever detector j judges record i. Each node carries a probability of being an anomaly or normal, and the detector-side nodes start from the initial probabilities [1 0] (anomaly) and [0 1] (normal).

10 Objective
Minimize disagreement over the bipartite graph:
–a record and a detector node it is connected to should have similar probabilities of being an anomaly;
–detector nodes should not deviate much from their initial probabilities [1 0] and [0 1].
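The two requirements above can be written as one quadratic objective. This is a hedged reconstruction in illustrative notation (p_i, q_g, a_{ig}, alpha are names chosen here, and the paper's exact formulation may differ):

```latex
\min_{\{p_i\},\,\{q_g\}} \;
\sum_{i,g} a_{ig}\,\lVert p_i - q_g \rVert^2
\;+\; \alpha \sum_{g} \lVert q_g - y^{0}_{g} \rVert^2
```

where p_i is record i's probability vector, q_g is a detector-side node's probability vector, a_{ig} = 1 if record i is connected to node g (and 0 otherwise), and y^0_g is the node's initial probability ([1 0] or [0 1]). The first term penalizes disagreement along edges; the second keeps detector nodes close to their initial probabilities, with alpha controlling the trade-off.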

11 Methodology
Iterate until convergence:
–update the detector probabilities
–update the record probabilities
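The alternating updates can be sketched as follows. This is an illustrative, stdlib-only sketch rather than the paper's exact algorithm: the two-groups-per-detector encoding, the function names, and the damping constant ALPHA are assumptions made here.

```python
# Each detector contributes two detector-side "group" nodes: the records it
# flags as anomalous (initial probability [1, 0]) and the records it flags
# as normal (initial probability [0, 1]).
ALPHA = 2.0  # pull toward each group's initial probability (assumed value)

def consensus(groups, y0, n_records, n_iter=100):
    """groups[g] = set of record indices in group g; y0[g] = initial prob."""
    q = [list(v) for v in y0]                    # group probabilities
    p = [[0.5, 0.5] for _ in range(n_records)]   # record probabilities
    for _ in range(n_iter):
        # Update each group: average of its records, pulled toward y0.
        for g, members in enumerate(groups):
            for c in (0, 1):
                s = sum(p[i][c] for i in members) + ALPHA * y0[g][c]
                q[g][c] = s / (len(members) + ALPHA)
        # Update each record: average of the groups it belongs to.
        for i in range(n_records):
            gs = [g for g, members in enumerate(groups) if i in members]
            for c in (0, 1):
                p[i][c] = sum(q[g][c] for g in gs) / len(gs)
    return p  # p[i][0] = probability that record i is an anomaly

# Toy example: 2 detectors over 3 records.
# Detector 1 flags record 0; detector 2 flags records 0 and 1.
groups = [{0}, {1, 2},        # detector 1: anomaly group, normal group
          {0, 1}, {2}]        # detector 2: anomaly group, normal group
y0 = [[1, 0], [0, 1], [1, 0], [0, 1]]
p = consensus(groups, y0, n_records=3)
# Record 0 (flagged by both) ranks above record 1 (flagged by one),
# which ranks above record 2 (flagged by neither).
assert p[0][0] > p[1][0] > p[2][0]
```

Because every update is a convex combination of probability vectors, each p[i] remains a valid distribution, and the ALPHA anchor keeps the iteration from drifting away from the detectors' initial labels.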

12 Propagation Process
[Figure: probabilities propagating over the bipartite graph of detectors and records. Starting from detector probabilities [1 0] and [0 1] and record probabilities such as [0.5 0.5] and [0.7 0.3], the updates produce intermediate values such as [0.357 0.643], [0.5285 0.4715], [0.6828 0.3172], [0.7514 0.2486], and [0.304 0.696].]

13 Consensus Combination Reduces Expected Error
Detector A
–occurs with probability P(A)
–outputs P(y|x, A) for record x, where y = 0 (normal) and y = 1 (anomalous)
Comparing the expected error of a single detector with the expected error of the combined detector shows that the combined detector has a lower expected error.
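For squared error, the slide's claim can be sketched with Jensen's inequality. This is a hedged reconstruction (the paper's own proof works with the detector priors P(A) and may differ in detail): writing f_j(x) = P(y = 1 | x, A_j) and the uniform combination as their average, convexity of the map z to (z - y)^2 gives

```latex
\Big(\frac{1}{k}\sum_{j=1}^{k} f_j(x) \;-\; y\Big)^{2}
\;\le\;
\frac{1}{k}\sum_{j=1}^{k} \big(f_j(x) - y\big)^{2}
```

and taking expectations over records x shows that the combined detector's expected squared error is at most the average expected squared error of the single detectors, with strict improvement whenever the detectors disagree.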

14 Extensions
Semi-supervised
–The labels of a few records are known in advance.
–Incorporating this knowledge improves the performance of the combined detector.
Incremental
–Records arrive continuously.
–The combined detector is updated incrementally.

15 Incremental
When a new record arrives:
–update the detector probabilities
–update the new record's probability
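A minimal sketch of the incremental step, under the same illustrative two-groups-per-detector encoding assumed earlier (the function name and data layout are assumptions, not the paper's notation): a newly arriving record's probability can be read off the current detector-side probabilities without restarting the whole iteration.

```python
def score_new_record(member_groups, q):
    """member_groups: indices of the detector-side groups the new record
       connects to; q[g] = current probability vector of group g.
       Returns the new record's [anomaly, normal] probability as the
       average of its groups."""
    k = len(member_groups)
    return [sum(q[g][c] for g in member_groups) / k for c in (0, 1)]

# Two converged group probabilities: one anomaly-leaning, one normal-leaning.
q = [[0.8, 0.2], [0.3, 0.7]]
p_new = score_new_record([0, 1], q)   # record connects to both groups
assert abs(p_new[0] - 0.55) < 1e-9    # (0.8 + 0.3) / 2
```

After scoring, the affected detector probabilities can be refreshed with the same group update used in the batch iteration, which is far cheaper than rerunning it from scratch.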

16 Semi-supervised
Both labeled and unlabeled records participate in the same bipartite graph; iterate until convergence, with the labeled records held at their known labels.
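The semi-supervised record update can be sketched by clamping: labeled records keep their known probability vectors after every update, so the ground truth keeps propagating to the detector-side nodes instead of being averaged away. The names and data layout are illustrative assumptions in the same style as the earlier sketch.

```python
def update_records_with_labels(p, q, memberships, labeled):
    """p: record probability vectors, q: detector-side group probabilities,
       memberships[i]: group indices of record i,
       labeled: {record index: known probability vector}."""
    for i, gs in enumerate(memberships):
        if i in labeled:
            p[i] = list(labeled[i])   # clamp: keep the known label
        else:
            p[i] = [sum(q[g][c] for g in gs) / len(gs) for c in (0, 1)]
    return p

q = [[0.9, 0.1], [0.2, 0.8]]
p = update_records_with_labels([[0.5, 0.5], [0.5, 0.5]], q,
                               [[0], [0, 1]], {0: [0.0, 1.0]})
assert p[0] == [0.0, 1.0]             # labeled record stays clamped
assert abs(p[1][0] - 0.55) < 1e-9     # unlabeled record averages its groups
```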

17 Benchmark Data Sets
IDN
–Data: a sequence of events (DoS flood, SYN flood, port scanning, etc.), partitioned into intervals.
–Detectors: thresholds on two high-level measures describing the probability of observing events during each interval.
DARPA
–Data: a series of TCP connection records collected by MIT Lincoln Labs; each record contains 34 continuous derived features, including duration, number of bytes, error rate, etc.
–Detectors: randomly select a subset of features and apply an unsupervised distance-based anomaly detection algorithm.

18 Benchmark Data Sets (cont.)
LBNL
–Data: an enterprise traffic dataset collected at the edge routers of the Lawrence Berkeley National Lab; the packet traces were aggregated into 1-minute intervals.
–Detectors: thresholds on six metrics, including the number of TCP SYN packets, the number of distinct IPs in the source or destination, the maximum number of distinct IPs that a source or destination IP has contacted, and the maximum pairwise distance between the distinct IPs an IP has contacted.

19 Experiment Setup
Baseline methods
–base detectors
–majority voting
–consensus maximization
–semi-supervised (2% labeled)
–stream (30% batch, 70% incremental)
Evaluation measure
–area under the ROC curve (AUC, range 0-1; 1 is best)
–ROC curve: the tradeoff between detection rate and false-alarm rate

20 AUC on Benchmark Data Sets

Dataset   worst    best     average  MV       UC       SC       IC
IDN       0.5269   0.6671   0.5904   0.7089   0.7255   0.7204   0.7270
          0.2832   0.8059   0.5731   0.6854   0.7711   0.8048   0.7552
          0.3745   0.8266   0.6654   0.8871   0.9076   0.9089   0.9090
DARPA     0.5804   0.6068   0.5981   0.7765   0.7812   0.8005   0.7730
          0.5930   0.6137   0.6021   0.7865   0.7938   0.8173   0.7836
          0.5851   0.6150   0.6022   0.7739   0.7796   0.7985   0.7727
LBNL      0.5005   0.8230   0.7101   0.8165   0.8180   0.8324   0.8160

worst/best/average: performance of the atomic detectors. MV: majority voting among detectors. UC/SC/IC: unsupervised, semi-supervised, and incremental versions of consensus combination.
Consensus combination improves anomaly detection performance!

21 Stream Computing
Continuous ingestion and continuous complex analysis at low latency.

22 Conclusions
Consensus combination
–combines multiple atomic anomaly detectors into a more accurate one in an unsupervised way
We give
–a theoretical analysis of the error reduction achieved by detector combination
–extensions of the method to incremental and semi-supervised learning scenarios
–experimental results on three network traffic datasets

23 Thanks! Any questions? Code available upon request.


