Presentation is loading. Please wait.

Presentation is loading. Please wait.

1 In-Network PCA and Anomaly Detection Ling Huang* XuanLong Nguyen* Minos Garofalakis § Michael Jordan* Anthony Joseph* Nina Taft § *UC Berkeley § Intel.

Similar presentations


Presentation on theme: "1 In-Network PCA and Anomaly Detection Ling Huang* XuanLong Nguyen* Minos Garofalakis § Michael Jordan* Anthony Joseph* Nina Taft § *UC Berkeley § Intel."— Presentation transcript:

1 1 In-Network PCA and Anomaly Detection Ling Huang* XuanLong Nguyen* Minos Garofalakis § Michael Jordan* Anthony Joseph* Nina Taft § *UC Berkeley § Intel Research {hling, xuanlong, jordan, adj}@cs.berkeley.edu {minos.garofalakis, nina.taftg@intel.com

2 2 Detection of Network-wide Anomalies A volume anomaly is a sudden change in an Origin-Destination flow (i.e., point to point traffic) Given only link traffic measurements, efficiently diagnose the volume anomalies H1H1 H2H2 The backbone network Regional network 1 Regional network 2

3 3 An Illustration Observed network link flow = aggregate of application-level flows Anomalies in (unobserved) application-level flow Finding anomalies in high-dimensional, noisy data is difficult !

4 4 The PCA Method An approach to separate normal from anomalous traffic Normal Subspace : space spanned by the top k principal components Anomalous Subspace : space spanned by the remaining principal components Then, decompose traffic on all links by projecting onto and to obtain: Traffic vector of all links at a particular point in time Normal traffic vector Residual traffic vector

5 5 A Geometric Illustration In general, anomalous traffic results in a large value of, where

6 6 Detection Illustration Value of over time at anomalous time points clearly stand out over time (residual part) Value of

7 7 n m Operation center The Centralized Algorithm The Network Eigen values Eigen vectors Data matrix Y (m x n) n links m time points (data) Detection procedure Raise a flag if Periodically (e.g. once a week)

8 8 Scalability Issues of Centralized Approach As the number of monitoring devices grow (up to hundreds or thousands network data features)  central processing site overloaded  certain networks do not overprovision inter-site connectivity When anomalies occur on smaller time scales (down to second or sub-second scales)  “periodic push” has to be applied on second or sub-second scales  the volume of data transmitted through network would explode

9 9 Our Distributed Approach A communication-efficient framework that  detects anomalies at desired accuracy level  minimizes data communication cost A distributed protocol for data processing  local monitors decide when to update data to coordinator  coordinator makes global decision and feedback to monitors An algorithmic frame guide the tradeoff  simple algorithm for determining filtering parameters given desired detection accuracy  stochastic matrix perturbation theory quantify how a perturbed data matrix impacts its eigen structures how perturbed eigen structures impacts the detection accuracy

10 10 Our In-Network Detection Framework Anomaly User inputs Original monitored time series Processed time series Distr. Monitors Coordinator

11 11 The Protocol At Monitors Each monitor updates information to coordinator if its incoming signal where (filtering slacks) are adaptively computed by the coordinator can be based on any prediction model built by on its data at an update time  e.g., the average of last 5 signal values observed locally at

12 12 The coordinator makes a new row where The Protocol At The Coordinator If any element in is updated  Update  Compute new Perform detection using

13 13 The Tradeoff The bigger the, the less communication, but the more the detection error Need an algorithm to related to detection accuracy Difference? Eigen Vectors Eigen Values Raw data y in blue, data available for detection in red

14 14 Parameter Design and Error Control (I) Given upper bound of false alarm , determine the monitor slacks  ’s Perturbation analysis: from deviation of false alarm to monitor slacks

15 15 Let and are eigenvalues of the covariance matrices and Define the perturbation matrix Define the eigen error From matrix perturbation theory, we have So the key point is to estimate in terms of slaks  ’s Parameter Design and Error Control (II)

16 16 Let and Standard assumptions on the filtering error matrix W: Eigen Error   Monitor Slacks  ’s (I)

17 17 Eigen-Error   Monitor Slacks  ’s (II) Where:, n is number of monitors and m is the number of data points.

18 18 Detection Error   Eigen-Error  (I) Basic idea: study how eigen error impacts detection error  With full data, false alarm rate is  With approximate data, we only have perturbed version  Given eigen error, we can compute the false alarm rate (though not in closed-form solution) Inverse dependency: given desired false alarm rate, we can determine tolerable eigen error by fast binary search

19 19 Detection Error   Eigen-Error  (I) Consider normalized random variable For approximate data, we only obtain Let denote an upper bound on The deviation of false alarm rate can be approximate as The upper bound of false alarm rate is

20 20 Evaluation Given a tolerable deviation of false alarm rate, we can determine system parameters Using system parameters, we can evaluate the actual detection accuracy using simulation Experiment setup  Abilene backbone network data  Traffic matrices of size 1008 X 41  Set uniform slack for all monitors

21 21 Results Monitor slacks, communication cost and detection error

22 22 Results (II)


Download ppt "1 In-Network PCA and Anomaly Detection Ling Huang* XuanLong Nguyen* Minos Garofalakis § Michael Jordan* Anthony Joseph* Nina Taft § *UC Berkeley § Intel."

Similar presentations


Ads by Google