
1 Toward Sophisticated Detection With Distributed Triggers
Ling Huang*  Minos Garofalakis§  Joe Hellerstein*  Anthony Joseph*  Nina Taft§
*UC Berkeley   §Intel Research

2 Outline
What is a distributed triggering system?
Simple Example: State of the Art
Problem Statement
Sophisticated Example: Tomorrow
General Framework

3 Traditional Distributed Monitoring
Large-scale network monitoring systems
 Distributed and collaborative monitoring boxes
 Continuously generating time-series data
Existing research focuses on data streaming
 All data sent to a data fusion center
 Well suited for one-time queries, trend analysis, and continuously recording system state
[Diagram: Monitors 1, 2, and 3 stream their data to a central Data Fusion Center]

4 Distributed Triggering System
Use a distributed monitoring system as infrastructure, but add:
Goal:
 monitor system-wide properties (defined across multiple machines), continuously
 and fire alerts when a system-wide characteristic exceeds an acceptable threshold
 AND avoid pushing all the data to the coordinator
Idea: do system-wide anomaly detection with a limited view of the monitored data
Approach:
 Engage local monitors to do filtering (“triggering”) to avoid streaming all the data to the coordinator.

5 Example
Botnet scenario: an ensemble of machines creates a huge number of connections to a server. Individually, each attacker’s number of connections lies below the host-IDS threshold.
Individual monitors: track the number of TCP connections.
Coordinator tracks: the SUM of TCP connections across all machines.
 Flag a violation when the aggregate exceeds the acceptable threshold C, within error tolerance ε (see the sketch below).
[Plot: the aggregate SUM time series over time, with the threshold separating the “fire” region from the “not fire” region]
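
A minimal sketch of this trigger in Python, assuming a simple “report only when the local count drifts by more than a slack allowance” rule. All class and variable names are illustrative, not the paper’s actual protocol (slide 16 gives the real intuition for the filtering):

# Hedged sketch of the slide-5 trigger. Each monitor counts TCP connections
# and reports to the coordinator only when its count has drifted by more
# than its local slack; the coordinator fires when its estimate of the
# system-wide SUM crosses the threshold C.

class Monitor:
    def __init__(self, slack):
        self.slack = slack          # allowed local drift before reporting
        self.last_reported = 0

    def observe(self, count):
        """Return an update for the coordinator, or None to stay silent."""
        if abs(count - self.last_reported) > self.slack:
            self.last_reported = count
            return count            # send update: drift exceeded the slack
        return None                 # stay silent, saving communication

class Coordinator:
    def __init__(self, n_monitors, threshold_C):
        self.estimates = [0] * n_monitors   # last-reported counts
        self.C = threshold_C

    def receive(self, i, count):
        self.estimates[i] = count

    def check(self):
        """Fire when the estimated system-wide SUM crosses the threshold."""
        return sum(self.estimates) > self.C

Note the point of the example: each attacker stays under the per-host IDS threshold, so no single monitor fires, but the coordinator’s SUM estimate still crosses C.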

6 Streaming vs. Triggering
Streaming protocols
 Goal: estimate system state or signals
 Need to keep data streaming in, which incurs ongoing communication overhead
 ε-guarantee on signal estimation
Triggering protocols
 Goal: detection, a 0/1 system state
 Only need detailed data when close to the detection threshold, so overhead is incurred only when necessary
 ε-guarantee on the ability to detect

7 Distributed Triggering Framework
[Diagram: each monitor i observes an original time series data_i(t) and applies a local filter; only filtered_data_i(t) is sent to the coordinator/aggregator, which checks the constraint, adjusts the filter parameters, and raises alarms. User inputs: the threshold and the error tolerance.]

8 Problem Statements
What kinds of queries can you ask?
 What kinds of system-wide properties can be tracked?
How do you do the filtering at the monitors?
 What do we send to the coordinator? Summarized data? Sampled data?
What kind of detection accuracy can we guarantee?
 The coordinator may make errors with partial data

9 Why do detection with less data?
Scalability!!!
 Enterprise networks are not overprovisioned
 Sensor networks clearly have limited communication bandwidth
 ISPs today are overprovisioned, so do they need this? Yes. Current monitoring (e.g., SNMP) happens on a 5-minute time scale. If this drops to a 1-second time scale, or less, the data volume explodes.
 NIDS are moving to smaller time scales

10 Where we are today
Problem: in order to track SUMs for detection, how do we compute the filtering parameters, with a provable analytical bound on the detection error?
For this query type (SUM, AVERAGE) the problem is solved:
 Huang et al., Intel Tech Report, April 2006
 Keralapura et al., SIGMOD 2006
For other queries (applications), the basic problem has to be re-solved (how to filter and how to derive bounds)

11 Extensions to sophisticated triggers
PCA-based anomaly detection [Lakhina et al., SIGCOMM 2004/2005]
 An example of dependencies across monitors
Constraints defined over time to catch persistent/ongoing violations
 Time window: instantaneous, fixed, and time-varying
Compare groups of machines: is one set of servers more heavily loaded than another set? Load(Set-A) > Load(Set-B)?

12 Detection of Network-wide Anomalies
A volume anomaly is a sudden change in an origin-destination flow (i.e., point-to-point traffic)
Given link traffic measurements, diagnose the volume anomalies in flows
[Diagram: hosts H1 and H2 exchanging traffic across Regional network 1 and Regional network 2]

13 The Subspace Method
Principal Components Analysis (PCA): an approach to separate normal from anomalous traffic
 Normal subspace S_normal: the space spanned by the first k principal components
 Anomalous subspace S_anomalous: the space spanned by the remaining principal components
Then decompose the traffic on all links by projecting onto S_normal and S_anomalous to obtain y = y_normal + y_anomalous: the traffic vector split into a normal traffic vector and an abnormal traffic vector (a small sketch follows).
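
Concretely, with P the n × k matrix whose columns are the top-k principal components, the projections are y_normal = P Pᵀ y and y_anomalous = (I − P Pᵀ) y. A small numpy sketch of the decomposition (variable names are my own):

import numpy as np

def decompose(y, P):
    """Split the link-traffic vector y into normal and anomalous parts.
    y: length-n traffic vector; P: n x k matrix whose columns are the
    top-k principal components (assumed orthonormal)."""
    y_normal = P @ (P.T @ y)      # projection onto the normal subspace
    y_anomalous = y - y_normal    # residual: lies in the anomalous subspace
    return y_normal, y_anomalous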

14 The Centralized Algorithm [Lakhina04]
[Diagram: monitors ship their measurements to an operation center, which assembles the m × n data matrix Y and computes its eigenvalues and eigenvectors]
1) Each link produces a column of data over time.
2) The n links produce a row vector y at each time instant.
The detection: flag an anomaly at time t when the squared norm of the residual, ‖y_anomalous(t)‖², exceeds the Q-statistic threshold Q_α (as sketched below).
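
A sketch of the centralized pipeline under those conventions (rows of Y are time instants over the n links; the subspace dimension k and the residual threshold Q_alpha are assumed to be given — Lakhina et al. derive Q_alpha from the Q-statistic):

import numpy as np

def fit_subspace(Y, k):
    """PCA on the m x n data matrix Y (m time steps, n links): eigendecompose
    the covariance of the centered data and keep the top-k eigenvectors."""
    Yc = Y - Y.mean(axis=0)                 # center each link's time series
    cov = (Yc.T @ Yc) / (len(Yc) - 1)
    _, eigvecs = np.linalg.eigh(cov)        # eigh returns ascending order
    return eigvecs[:, ::-1][:, :k]          # top-k principal components

def detect(y, P, Q_alpha):
    """Flag the time instant y as anomalous when the squared residual norm
    (the squared prediction error) exceeds the threshold Q_alpha."""
    residual = y - P @ (P.T @ y)
    return float(residual @ residual) > Q_alpha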

15 Approximate Detection Procedure
[Diagram: the monitors’ original streams data(t) fill the exact data matrix Y; the filtered streams filtered_data(t) fill an approximate matrix Ŷ]
PCA on Y gives the original constraint; PCA on Ŷ gives a modified constraint.
Difference? How much does detection on the filtered data deviate from detection on the original data?
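
The slide’s question can be phrased as a direct experiment: run the same detector on the exact matrix and on the filtered matrix and count disagreements. A hypothetical sketch, reusing fit_subspace and detect from the slide-14 sketch:

def compare_detections(Y, Y_filtered, k, Q_alpha):
    """Count time steps where detection on the filtered matrix disagrees
    with detection on the original matrix."""
    P_orig = fit_subspace(Y, k)             # original constraint
    P_filt = fit_subspace(Y_filtered, k)    # modified constraint
    return sum(
        detect(y, P_orig, Q_alpha) != detect(y_f, P_filt, Q_alpha)
        for y, y_f in zip(Y, Y_filtered)    # rows are time instants
    )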

16 Intuition on how filtering is done
Slack δ: captures how far the aggregate is from the threshold
Partition δ into a δ_i for each monitor
Compute the marginal impact of monitor i on the global aggregate
Monitors send data whenever their local drift exceeds their slack share δ_i, i.e., whenever their marginal impact could push the system past the threshold (a sketch follows).
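
A hedged sketch of that intuition for the SUM trigger. The even split below is the simplest possible allocation rule, chosen for illustration; the papers cited on slide 10 derive more refined allocations:

def allocate_slack(C, current_sum_estimate, n_monitors):
    """Split the global slack (distance of the estimated SUM from the
    threshold C) evenly across the n monitors as the shares delta_i."""
    slack = max(C - current_sum_estimate, 0.0)
    return [slack / n_monitors] * n_monitors

def should_send(local_value, last_reported, delta_i):
    """Monitor i reports only when its local drift exhausts its share of
    the slack, i.e., when it could push the aggregate past the threshold."""
    return abs(local_value - last_reported) > delta_i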

17 Performance

ε       Missed Detections     False Alarms      Data Reduction
        Week 1   Week 2       Week 1  Week 2    Week 1  Week 2
0.0     0        0            0       0         87%     70%
0.05    0        1            1       0         88%     76%
0.1     0        1            0       0         90%     79%

Data used: Abilene traffic matrix, 2 weeks, 41 links. ε = error tolerance = upper bound on error.

18 Capabilities and Future Work

Application                                      Constraint                                   Query
Botnet attack on a server (web, DNS)             On sum of TCP connections                    SUM
Subspace anomaly detection                       On quadratic function of traffic volumes     Quadratic
FUTURE: Load balance across server-cluster       On sum of loads across different subsets     Set comparison
subsets
FUTURE: Find the k top hottest sensors                                                        Top-k

Future Work: analysis for upper bounds on guarantees

19 Take Aways
For one application, we implemented a large-scale detection system using 70-80% LESS data than the current streaming solution.
You don’t need all the data!
 Accuracy can be preserved
This is good news for scalability: more monitors, smaller time scales.
The approach is applicable to many application domains

20 Thank You
Questions?

