Network Anomography Yin Zhang, Zihui Ge, Albert Greenberg, Matthew Roughan Internet Measurement Conference 2005 Berkeley, CA, USA Presented by Huizhong.

Network Anomography
Yin Zhang, Zihui Ge, Albert Greenberg, Matthew Roughan
Internet Measurement Conference 2005, Berkeley, CA, USA
Presented by Huizhong Sun (some slides borrowed from Yin Zhang)

2 Network Anomaly Detection
Is the network experiencing unusual conditions?
– Call these conditions anomalies
– Anomalies can often indicate network problems: DDoS attacks, network worms, flash crowds, misconfigurations, vendor implementation bugs, …
– Need rapid detection and diagnosis: we want to fix the problem quickly
Questions of interest
– Detection: is there an unusual event?
– Identification: what is the best explanation?
– Quantification: how serious is the problem?

3 Network Anomography
What we want
– Volume anomalies [Lakhina04]: significant changes in an origin-destination (OD) flow, i.e., in a traffic matrix element
– Detect volume anomalies
– Identify which OD pair is responsible
[Figure: example network with nodes A, B, C]

4 Network Anomography
Challenge
– It is difficult to measure the traffic matrix directly
– The anomaly detection problem is somewhat more complex still. First, anomaly detection is performed on a series of measurements over a period of time rather than on a single snapshot. Second, in addition to changes in the traffic, the solution must be able to handle changes in routing.
What we have
– Link traffic measurements: Simple Network Management Protocol (SNMP) data on individual link loads is available almost ubiquitously
Network Anomography
– Infer volume anomalies from link traffic measurements

5 An Illustration
[Figure courtesy of Anukool Lakhina [Lakhina04]]

6 Anomography = Anomalies + Tomography

7 Mathematical Formulation
Problem: infer changes in traffic matrix (TM) elements $x_t$ given link measurements $b_t$. Only links are measured.
[Figure: routers connected by links 1–3, carrying routes 1–3]

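8 Mathematical Formulation
$b_t = A_t x_t \quad (t = 1, \ldots, T)$
Typically massively under-constrained: there are far more OD flows than links, so many different $x_t$ are consistent with the same $b_t$.
[Figure: routers connected by links 1–3, carrying routes 1–3]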

9 Static Network Anomography
Time-invariant routing: $A_t = A$. With $B = [b_1 \ldots b_T]$ and $X = [x_1 \ldots x_T]$:
$B = AX$
[Figure: routers connected by links 1–3, carrying routes 1–3]

10 Anomography Strategies
Early inverse
1. Inversion: infer OD flows $X$ by solving $b_t = A x_t$
2. Anomaly extraction: extract volume anomalies $\tilde{X}$ from the inferred $X$
Drawback: errors in step 1 may contaminate step 2
Late inverse
1. Anomaly extraction: extract link traffic anomalies $\tilde{B}$ from $B$
2. Inversion: infer volume anomalies $\tilde{X}$ by solving $\tilde{b}_t = A \tilde{x}_t$
Idea: defer the "lossy" inference to the last step

11 Extracting Link Anomalies $\tilde{B}$
Temporal anomography
– Fourier / wavelet analysis: link anomalies = the high-frequency components
– ARIMA modeling: Diff; EWMA (Exponentially Weighted Moving Average), which is ARIMA(0, 1, 1); Holt-Winters, which is ARIMA(0, 2, 2)
– Temporal PCA (PCA = Principal Component Analysis): project columns onto principal link column vectors
Spatial anomography
– Spatial PCA [Lakhina04]: project rows onto principal link row vectors

12 Extracting Link Anomalies $\tilde{B}$
Fourier analysis
– Fourier analysis decomposes a complex periodic waveform into a set of sinusoids with different amplitudes, frequencies, and phases.
– The sum of these sinusoids exactly reproduces the original waveform.
– To extract anomalous link traffic, Fourier analysis is used to filter out the low-frequency components.
– In general, low-frequency components capture the daily and weekly traffic patterns, while high-frequency components represent sudden changes in traffic behavior.

13 Extracting Link Anomalies $\tilde{B}$
Fourier analysis
– For a discrete-time signal $x_0, x_1, \ldots, x_{N-1}$, the Discrete Fourier Transform (DFT) is defined by
$f_n = \sum_{k=0}^{N-1} x_k \, e^{-j 2\pi k n / N}, \quad n = 0, 1, \ldots, N-1$
– where $f_n$ is a complex number that captures the amplitude and phase of the signal at the $n$-th frequency
– Lower $n$ corresponds to a lower-frequency component, with $f_0$ being the DC component
– $f_n$ with $n$ close to $N/2$ corresponds to high frequencies

14 Extracting Link Anomalies $\tilde{B}$
Fourier analysis
– The Inverse Discrete Fourier Transform (IDFT) reconstructs the signal in the time domain:
$x_k = \frac{1}{N} \sum_{n=0}^{N-1} f_n \, e^{j 2\pi k n / N}, \quad k = 0, 1, \ldots, N-1$
– The Fast Fourier Transform (FFT) is an efficient way to compute the DFT and IDFT.
– The computational complexity of the FFT is $O(N \log N)$.
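To make the two formulas concrete, here is a direct O(N^2) transcription of the DFT and IDFT in Python/NumPy. The function names `dft` and `idft` are hypothetical. The sketch is only meant to show that the definitions match what an FFT computes, just more slowly.

```python
import numpy as np

def dft(x):
    """Direct O(N^2) DFT: f_n = sum_k x_k * exp(-j 2 pi k n / N)."""
    x = np.asarray(x, dtype=complex)
    N = len(x)
    k = np.arange(N)
    W = np.exp(-2j * np.pi * np.outer(k, k) / N)  # W[n, k] = e^{-j 2 pi k n / N}
    return W @ x

def idft(f):
    """Direct O(N^2) IDFT: x_k = (1/N) * sum_n f_n * exp(j 2 pi k n / N)."""
    f = np.asarray(f, dtype=complex)
    N = len(f)
    n = np.arange(N)
    W_inv = np.exp(2j * np.pi * np.outer(n, n) / N)  # conjugate kernel
    return W_inv @ f / N

# Usage: the direct DFT agrees with NumPy's FFT, and the IDFT undoes it.
x = np.array([1.0, 2.0, 0.0, -1.0, 3.0])
f = dft(x)
```

NumPy's `np.fft.fft` uses the same sign convention (minus sign in the forward exponent), so `f` should agree with `np.fft.fft(x)` up to rounding.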

15 Extracting Link Anomalies $\tilde{B}$
FFT-based anomography
1. Transform link traffic $B$ into the frequency domain: $F = \mathrm{FFT}(B)$, applying the FFT to each row of $B$ (a row is the time series of traffic on one link).
2. Remove low-frequency components: set $F_i = 0$ for $i \in [1, c] \cup [N-c, N]$, where $c$ is a cut-off frequency. (For example, with 10-minute aggregated link traffic over one week, $c = 10N/60$ corresponds to a frequency of one cycle per hour.)
3. Transform back into the time domain: $\tilde{B} = \mathrm{IFFT}(F)$. The result is the high-frequency component of the traffic data, which is used as the anomalous link traffic.
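The three steps map directly onto NumPy's FFT routines. Below is a minimal sketch; `fft_anomography` is a hypothetical name. One deliberate deviation: the slide's set $[1, c] \cup [N-c, N]$ uses 1-based indices, whereas this sketch uses 0-based indices. It zeroes every bin whose distance from DC, `min(k, N - k)`, is below `c`. This keeps the mask symmetric, so the output stays real. The demo data are synthetic and purely illustrative.

```python
import numpy as np

def fft_anomography(B, c):
    """High-pass filter each row of B (one link's time series per row).

    Zeroes every 0-based frequency bin k with min(k, N - k) < c, i.e. DC and
    the lowest positive/negative frequencies, then returns to the time domain.
    """
    B = np.asarray(B, dtype=float)
    N = B.shape[1]
    F = np.fft.fft(B, axis=1)            # step 1: FFT of each row
    k = np.arange(N)
    low = np.minimum(k, N - k) < c       # symmetric low-frequency mask
    F[:, low] = 0                        # step 2: remove low frequencies
    return np.fft.ifft(F, axis=1).real   # step 3: back to the time domain

# Hypothetical demo: one week of 10-minute bins (N = 1008).
N = 1008
t = np.arange(N)
daily = 50 + 10 * np.sin(2 * np.pi * t / 144)  # mean level + one cycle per day
spiky = daily.copy()
spiky[500] += 100                              # a sudden, short-lived surge
B_demo = np.vstack([spiky, daily])             # 2 links x N time bins
B_tilde = fft_anomography(B_demo, c=N // 6)    # c = 10N/60: one cycle per hour
```

The daily cycle and the mean level fall below the cut-off and vanish. The surge at bin 500 survives as the largest residual, attenuated by the fraction of bins that were zeroed.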

16 Extracting Link Anomalies $\tilde{B}$
Wavelet analysis
1. Use wavelets to decompose $B$ into different frequency levels: $W = \mathrm{WAVEDEC}(B)$, applying a multi-level 1-D wavelet decomposition to each row of $B$.
2. Remove the low- and mid-frequency components by setting to 0 all coefficients in $W$ at frequency levels higher than $w_c$ (higher levels correspond to lower frequencies), giving $W'$. Here $w_c$ is a cut-off frequency level.
3. Reconstruct the signal: $\tilde{B} = \mathrm{WAVEREC}(W')$. The result is the high-frequency component of the traffic data.
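WAVEDEC and WAVEREC are generic multi-level wavelet decomposition and reconstruction steps, and no particular wavelet is specified here. The sketch below therefore swaps in a hand-written Haar wavelet in plain NumPy, which keeps it self-contained. The names `haar_dec`, `haar_rec`, and `wavelet_anomography` are hypothetical. Each row's length must be divisible by `2**levels`. Detail level 1 is the finest, highest-frequency level, and everything above level `w_c` is zeroed, including the final approximation.

```python
import numpy as np

SQRT2 = np.sqrt(2.0)

def haar_dec(x, levels):
    """Multi-level Haar decomposition: returns (approx, [d1, ..., d_levels]).

    d1 is the finest (highest-frequency) detail level.
    """
    a = np.asarray(x, dtype=float)
    details = []
    for _ in range(levels):
        even, odd = a[0::2], a[1::2]
        details.append((even - odd) / SQRT2)  # detail coefficients
        a = (even + odd) / SQRT2              # coarser approximation
    return a, details

def haar_rec(a, details):
    """Inverse of haar_dec, rebuilding from the coarsest level upward."""
    for d in reversed(details):
        x = np.empty(2 * len(a))
        x[0::2] = (a + d) / SQRT2
        x[1::2] = (a - d) / SQRT2
        a = x
    return a

def wavelet_anomography(B, levels, wc):
    """Keep only detail levels 1..wc of each row of B; zero everything coarser."""
    out = []
    for row in np.asarray(B, dtype=float):
        a, details = haar_dec(row, levels)
        a = np.zeros_like(a)  # drop the lowest-frequency approximation
        details = [d if lvl <= wc else np.zeros_like(d)
                   for lvl, d in enumerate(details, start=1)]
        out.append(haar_rec(a, details))
    return np.array(out)
```

A slowly varying row maps to (near) zero, while an isolated spike survives in the finest levels. A production version would more likely use a wavelet library and a smoother wavelet than Haar.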

17 Extracting Link Anomalies $\tilde{B}$
ARIMA modeling: the Box-Jenkins methodology, or AutoRegressive Integrated Moving Average (ARIMA)
– A class of linear time-series forecasting techniques that capture the linear dependence of future values on past values
– Extensively used for anomaly detection in univariate time series
– For anomaly detection, the forecast errors are taken as the anomalous link traffic: traffic behavior that the model cannot capture well is considered anomalous.

18 Extracting Link Anomalies $\tilde{B}$
An ARIMA(p, d, q) model has three parameters:
– the autoregressive parameter (p),
– the number of differencing passes (d),
– the moving average parameter (q).
– Some models commonly used for detecting anomalies in time series are special cases: for example, the Exponentially Weighted Moving Average (EWMA) is ARIMA(0, 1, 1), and Holt-Winters is ARIMA(0, 2, 2).

19 Extracting Link Anomalies $\tilde{B}$
The ARIMA(p, d, q) model is
$z_k - \sum_{i=1}^{p} \phi_i z_{k-i} = e_k - \sum_{j=1}^{q} \theta_j e_{k-j}$
where $z_k$ is obtained by differencing the original time series $d$ times (when $d \ge 1$) or by subtracting the mean from the original time series (when $d = 0$), $e_k$ is the forecast error at time $k$, and $\phi_i$ ($i = 1, \ldots, p$) and $\theta_j$ ($j = 1, \ldots, q$) are the autoregression and moving average coefficients, respectively.
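Rearranging the model gives the forecast error recursively: $e_k = z_k - \sum_i \phi_i z_{k-i} + \sum_j \theta_j e_{k-j}$. The sketch below assumes the coefficients are already known; fitting them is out of scope here. It also assumes that pre-sample values of $z$ and $e$ are zero. `arima_residuals` is a hypothetical name. Under this sign convention, an EWMA with smoothing weight $\alpha$ corresponds to $\theta_1 = 1 - \alpha$ with $d = 1$.

```python
import numpy as np

def arima_residuals(x, phi, theta, d):
    """Forecast errors e_k of an ARIMA(p, d, q) model with known coefficients.

    Uses z_k - sum(phi_i z_{k-i}) = e_k - sum(theta_j e_{k-j}).
    Pre-sample z and e are taken as zero (a simplifying assumption).
    """
    z = np.asarray(x, dtype=float)
    if d == 0:
        z = z - z.mean()        # d = 0: subtract the mean
    else:
        z = np.diff(z, n=d)     # d >= 1: difference d times
    p, q = len(phi), len(theta)
    e = np.zeros_like(z)
    for k in range(len(z)):
        ar = sum(phi[i] * z[k - 1 - i] for i in range(p) if k - 1 - i >= 0)
        ma = sum(theta[j] * e[k - 1 - j] for j in range(q) if k - 1 - j >= 0)
        e[k] = z[k] - ar + ma   # forecast error = anomalous traffic
    return e

# Usage on a hypothetical link series with an ARIMA(0, 1, 1) model.
link = np.array([1.0, 3.0, 6.0, 10.0])
errors = arima_residuals(link, phi=[], theta=[0.5], d=1)
```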

20 Extracting Link Anomalies $\tilde{B}$
[Figure]

21 Diagnosing Network-Wide Traffic Anomalies
Anukool Lakhina, Mark Crovella, Christophe Diot, SIGCOMM'04 [Lakhina04]

22 Extracting Link Anomalies $\tilde{B}$
Spatial anomography: spatial PCA [Lakhina04]
1. Identify the first principal axis: the direction along which the link traffic data have the greatest variance.
2. Identify the second principal axis: the direction, orthogonal to the first, along which the data have the second-greatest variance, and so on.

23 Extracting Link Anomalies $\tilde{B}$
Spatial anomography: spatial PCA [Lakhina04]
3. Divide the link traffic space into a normal subspace and an anomalous subspace by examining, in order, the projection of the link traffic time series onto each principal axis. As soon as a projection contains a 3σ deviation from the mean, that principal axis and all subsequent axes are assigned to the anomalous subspace. All previous principal axes are assigned to the normal subspace.
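The three steps can be sketched with an SVD in NumPy. Here $B$ is links × time, and each column $b_t$ is one data point in link space. `spatial_pca_split` is a hypothetical name, and the demo data are synthetic. This is a sketch of the 3σ rule as stated above, not a reproduction of the authors' code.

```python
import numpy as np

def spatial_pca_split(B, n_sigma=3.0):
    """Split principal axes into normal / anomalous sets with the 3-sigma rule.

    B: m x T array (one row per link, one column per time bin).
    Returns (V, k): V holds principal axes as columns, ordered by decreasing
    variance; the first k columns span the normal subspace.
    """
    Yc = np.asarray(B, dtype=float)
    Yc = Yc - Yc.mean(axis=1, keepdims=True)   # center each link's series
    # Rows of Yc.T are the vectors b_t; right singular vectors = principal axes.
    _, _, Vt = np.linalg.svd(Yc.T, full_matrices=False)
    V = Vt.T
    for i in range(V.shape[1]):
        proj = Yc.T @ V[:, i]                   # time series projected on axis i
        if np.abs(proj - proj.mean()).max() > n_sigma * proj.std():
            return V, i                         # axis i and later: anomalous
    return V, V.shape[1]

# Hypothetical demo: 3 links share a periodic pattern; link 0 also has a spike.
t = np.arange(200)
B_demo = np.outer(np.ones(3), 100 * np.sin(2 * np.pi * t / 50))
B_demo[0, 100] += 20
V, k = spatial_pca_split(B_demo)
```

A pure sinusoid never deviates more than about 1.4σ from its mean, so the shared pattern stays in the normal subspace. The axis dominated by the one-off spike trips the 3σ test.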

24 Data Collected
– Abilene
– Sprint-Europe

25 Low Intrinsic Dimensionality of Link Traffic
Studied via Principal Component Analysis
Key result: normal traffic is well approximated as occupying a low-dimensional subspace
Reasons:
1. Links share OD flows
2. The set of OD flows is itself low-dimensional

26 The Subspace Method
An approach to separate normal from anomalous traffic
– Normal subspace $\mathcal{S}$: space spanned by the first $k$ principal components
– Anomalous subspace $\tilde{\mathcal{S}}$: space spanned by the remaining principal components
– Then decompose the traffic on all links by projecting onto $\mathcal{S}$ and $\tilde{\mathcal{S}}$ to obtain
$y = \hat{y} + \tilde{y}$
where $y$ is the traffic vector of all links at a particular point in time, $\hat{y}$ is the normal traffic vector, and $\tilde{y}$ is the residual traffic vector.

27 A Geometric Illustration
[Figure: traffic on link 1 vs. traffic on link 2]
In general, anomalous traffic results in a large value of $\tilde{y}$

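28 Detection
[Figure: traffic on link 1 vs. traffic on link 2]
Capture the size of the residual vector $\tilde{y}$ using the squared prediction error (SPE):
$\mathrm{SPE} = \|\tilde{y}\|^2$
Traffic is flagged as anomalous when the SPE exceeds a threshold (the Q-statistic); result due to [Jackson and Mudholkar, 1979]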
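The subspace method and the SPE can be sketched in NumPy. Project each centered link vector onto the normal subspace, keep the residual, and take its squared norm per time bin. `subspace_decompose` and `spe` are hypothetical names. The demo assumes $k = 1$ normal axis and uses synthetic data. The Q-statistic threshold is not reproduced here, so the sketch only ranks time bins by SPE.

```python
import numpy as np

def subspace_decompose(B, V, k):
    """Split each centered link vector y into y_hat (normal) + y_tilde (residual).

    B: m x T (links x time); V: m x r orthonormal principal axes as columns;
    k: number of leading axes spanning the normal subspace.
    """
    Y = np.asarray(B, dtype=float)
    Y = Y - Y.mean(axis=1, keepdims=True)  # center each link
    P = V[:, :k]                           # basis of the normal subspace
    y_hat = P @ (P.T @ Y)                  # projection onto the normal subspace
    y_tilde = Y - y_hat                    # projection onto the anomalous subspace
    return y_hat, y_tilde

def spe(y_tilde):
    """Squared prediction error ||y_tilde||^2 for every time bin (column)."""
    return np.sum(y_tilde ** 2, axis=0)

# Hypothetical demo: 3 links share a periodic pattern; link 0 also has a spike.
t = np.arange(200)
B_demo = np.outer(np.ones(3), 100 * np.sin(2 * np.pi * t / 50))
B_demo[0, 100] += 20
Yc = B_demo - B_demo.mean(axis=1, keepdims=True)
V = np.linalg.svd(Yc, full_matrices=False)[0]  # principal axes in link space
y_hat, y_tilde = subspace_decompose(B_demo, V, k=1)
scores = spe(y_tilde)
```

The shared pattern lies almost entirely in $\mathcal{S}$, so its SPE is close to zero. The spike sits mostly in $\tilde{\mathcal{S}}$, so its time bin has by far the largest SPE.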

29 Detection Illustration
[Figure: value of $\|y\|^2$ over time (all traffic) and value of $\|\tilde{y}\|^2$ over time (SPE)]
The SPE values at anomalous time points clearly stand out

30 Extracting Link Anomalies $\tilde{B}$
Temporal PCA (PCA = Principal Component Analysis)
– Similar to spatial PCA
– Project columns onto principal link column vectors

31 Temporal Anomography: $\tilde{B} = A\tilde{X}$
Given $\tilde{B}$, how do we find the anomalous OD traffic $\tilde{X}$, i.e., solve $\tilde{b}_t = A \tilde{x}_t$?
(1) Pseudoinverse solution
(2) Sparsity maximization

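32 Solving $\tilde{b}_t = A \tilde{x}_t$
Pseudoinverse: $\tilde{x}_t = \mathrm{pinv}(A)\, \tilde{b}_t$
– Minimum $L_2$-norm solution: among all $\tilde{x}_t$ that minimize $\|\tilde{b}_t - A \tilde{x}_t\|_2$, it is the one with the smallest $\|\tilde{x}_t\|_2$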
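A tiny, hypothetical routing matrix shows what the pseudoinverse does with a single anomalous flow. The example has 2 links and 3 OD flows, and flow 1 is the only flow that crosses both links. `np.linalg.pinv` is NumPy's Moore-Penrose pseudoinverse.

```python
import numpy as np

# Hypothetical routing matrix: A[i, j] = 1 if OD flow j traverses link i.
A = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0]])
x_true = np.array([0.0, 5.0, 0.0])  # one anomalous OD flow
b = A @ x_true                      # resulting link anomalies: [5, 5]

x_pinv = np.linalg.pinv(A) @ b      # minimum-L2-norm solution
```

The result is `[5/3, 10/3, 5/3]`. It reproduces the link measurements exactly but smears the anomaly across all three flows, because spreading volume out lowers the $L_2$ norm.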

33 Solving $\tilde{b}_t = A \tilde{x}_t$
Maximize sparsity
– In practice, we expect only a few anomalies at any one time, so $\tilde{x}_t$ typically has only a small number of large values.
– Hence it is natural to maximize the sparsity of $\tilde{x}_t$, i.e., to solve the following $\ell_0$-norm minimization problem:
$\min_{\tilde{x}_t} \|\tilde{x}_t\|_0 \quad \text{subject to} \quad \tilde{b}_t = A \tilde{x}_t$
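For a problem this small, the $\ell_0$ formulation can be solved exactly by brute force. Try supports of increasing size and return the first one that fits $\tilde{b}_t$ exactly. `l0_min` is a hypothetical name, and `tol` is an assumed numerical tolerance. The search is exponential in the number of OD flows, so it only illustrates the objective. The "Sparsity-L1" method named in the conclusions solves a different, tractable problem instead.

```python
import itertools
import numpy as np

def l0_min(A, b, tol=1e-9):
    """Sparsest x with A @ x == b (within tol), by exhaustive support search."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    n = A.shape[1]
    if np.linalg.norm(b) <= tol:
        return np.zeros(n)
    for size in range(1, n + 1):                     # supports of growing size
        for support in itertools.combinations(range(n), size):
            cols = list(support)
            coef, *_ = np.linalg.lstsq(A[:, cols], b, rcond=None)
            if np.linalg.norm(A[:, cols] @ coef - b) <= tol:  # exact fit
                x = np.zeros(n)
                x[cols] = coef
                return x
    return None  # b is not in the range of A

# Hypothetical example: 2 links, 3 OD flows, link anomalies [5, 5].
A_demo = np.array([[1.0, 1.0, 0.0],
                   [0.0, 1.0, 1.0]])
x_sparse = l0_min(A_demo, [5.0, 5.0])
```

Here the sparsest explanation is a single anomalous flow, `[0, 5, 0]`, whereas the minimum-$L_2$-norm solution spreads the same volume over every flow.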

34 Performance Evaluation
Fix one anomaly extraction method, then compare "real" and "inferred" anomalies
– "Real" anomalies: directly from OD flow data
– "Inferred" anomalies: from link data
Order them by size and compare: how many of the top N do we find?
– This gives the detection rate: $|\text{top } N \text{ real} \cap \text{top } N \text{ inferred}| / N$
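The detection rate is a top-N overlap. The sketch below assumes both methods score the same set of candidate events, for example one entry per (time bin, OD pair), laid out in the same order in flat arrays. Events are ranked by absolute size. `detection_rate` is a hypothetical name, and the numbers in the example are made up.

```python
import numpy as np

def detection_rate(real, inferred, N):
    """|top-N real ∩ top-N inferred| / N, ranking events by absolute size."""
    top_real = set(np.argsort(-np.abs(real), kind="stable")[:N].tolist())
    top_inf = set(np.argsort(-np.abs(inferred), kind="stable")[:N].tolist())
    return len(top_real & top_inf) / N

# Hypothetical sizes for five candidate events.
real = np.array([10.0, 1.0, 8.0, 3.0, 7.0])
inferred = np.array([9.0, 2.0, 1.0, 8.0, 6.0])
rate = detection_rate(real, inferred, N=3)
```

The top three real events are {0, 2, 4} and the top three inferred events are {0, 3, 4}. They share two of three, so the rate is 2/3.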

35–38 Performance Evaluation
[Figures]

39 Performance Evaluation: Anomography
Hard to compare performance
– Lack of ground truth: what is an anomaly?
So compare events from different methods
– Compute the top M "benchmark" anomalies: apply an anomaly extraction method directly to OD flow data
– Compute the top N "inferred" anomalies: apply another anomography method to link data
– Report min(M, N) − |top M benchmark ∩ top N inferred|
– M < N: "false negatives", i.e., big "benchmark" anomalies not considered big by anomography
– M > N: "false positives", i.e., big "inferred" anomalies not considered big by the benchmark method
– Choose M and N similar to the number of anomalies a provider is willing to investigate, e.g., per week
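The mismatch count follows the same top-k pattern as the detection rate. The sketch assumes both methods score the same candidate events in the same order, and it ranks them by absolute size. `mismatch_count` is a hypothetical name, and the sizes are made up. With M < N the count reads as false negatives; with M > N it reads as false positives.

```python
import numpy as np

def mismatch_count(benchmark, inferred, M, N):
    """min(M, N) - |top-M benchmark ∩ top-N inferred|, ranking by |size|."""
    top_b = set(np.argsort(-np.abs(benchmark), kind="stable")[:M].tolist())
    top_i = set(np.argsort(-np.abs(inferred), kind="stable")[:N].tolist())
    return min(M, N) - len(top_b & top_i)

# Hypothetical sizes for five candidate events.
benchmark = np.array([10.0, 1.0, 8.0, 3.0, 7.0])
inferred = np.array([9.0, 2.0, 1.0, 8.0, 6.0])
false_negatives = mismatch_count(benchmark, inferred, M=2, N=3)
false_positives = mismatch_count(benchmark, inferred, M=3, N=2)
```

Benchmark event 2 is in the top 2 but missing from the inferred top 3, which gives one false negative. Inferred event 3 is in the top 2 but missing from the benchmark top 3, which gives one false positive.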

40 Anomography: "False Negatives"
Top 50 inferred vs. top 30 benchmark
[Table: "false negative" counts for each pair of methods: Diff, EWMA, Holt-Winters, ARIMA, Fourier, Wavelet, T-PCA, S-PCA; values not recoverable]
1. Diff/EWMA/H.-W./ARIMA/Fourier/Wavelet are all largely consistent
2. PCA methods are not consistent (even with each other)
– PCA cannot detect anomalies in the "normal" subspace
– PCA is insensitive to reordering of $[b_1 \ldots b_T]$, so it cannot use all temporal information
3. Spatial methods (e.g., spatial PCA) are not self-consistent

41 Anomography: "False Positives"
Top 30 inferred vs. top 50 benchmark
[Table: "false positive" counts for each pair of methods: Diff, EWMA, Holt-Winters, ARIMA, Fourier, Wavelet, T-PCA, S-PCA; values not recoverable]
The same three observations hold as for "false negatives": Diff/EWMA/H.-W./ARIMA/Fourier/Wavelet are largely consistent, PCA methods are not consistent (even with each other), and spatial methods are not self-consistent.

42 Conclusions
Anomography = Anomalies + Tomography
– Find anomalies in $\{x_t\}$ given $b_t = A_t x_t$ $(t = 1, \ldots, T)$
Contributions
1. A general framework for anomography methods
– Decouples the anomaly extraction and inference components
2. A number of novel algorithms
– Taking advantage of the range of choices for anomaly extraction and inference components
– Choosing between spatial and temporal approaches
3. Extensive evaluation on real traffic data
– 6 months of Abilene data and 1 month of Tier-1 ISP data
The method of choice: ARIMA + Sparsity-L1

43 Thank you! Questions?

44 Extracting Link Anomalies $\tilde{B}$
Temporal anomography: $\tilde{B} = BT$
– Fourier / wavelet analysis: link anomalies = the high-frequency components
– ARIMA modeling
Diff: $f_t = b_{t-1}$, $\tilde{b}_t = b_t - f_t$
EWMA: $f_t = (1-\alpha) f_{t-1} + \alpha b_{t-1}$, $\tilde{b}_t = b_t - f_t$
– Temporal PCA (PCA = Principal Component Analysis): project columns onto principal link column vectors
Spatial anomography: $\tilde{B} = TB$
– Spatial PCA [Lakhina04]: project rows onto principal link row vectors
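The Diff and EWMA forecasts translate line for line into NumPy. The function names are hypothetical. The EWMA needs an initial forecast $f_0$, which is not specified here; this sketch assumes $f_0 = b_0$ unless the caller passes one. Diff returns one fewer sample because $f_0$ is undefined for it.

```python
import numpy as np

def diff_anomalies(b):
    """Diff: f_t = b_{t-1}, anomaly b~_t = b_t - f_t (for t >= 1)."""
    b = np.asarray(b, dtype=float)
    return b[1:] - b[:-1]

def ewma_anomalies(b, alpha, f0=None):
    """EWMA: f_t = (1 - alpha) f_{t-1} + alpha b_{t-1}, anomaly b~_t = b_t - f_t.

    f0 (initial forecast) defaults to b_0, an assumption not fixed by the slide.
    """
    b = np.asarray(b, dtype=float)
    f = np.empty_like(b)
    f[0] = b[0] if f0 is None else f0
    for t in range(1, len(b)):
        f[t] = (1 - alpha) * f[t - 1] + alpha * b[t - 1]
    return b - f

# Hypothetical link series with a one-bin surge at t = 3.
link = np.array([10.0, 10.0, 10.0, 20.0, 10.0])
diff_res = diff_anomalies(link)
ewma_res = ewma_anomalies(link, alpha=0.5)
```

Both residuals peak at the surge. Diff also reports an equally large negative change when traffic drops back. EWMA's forecast has partly absorbed the surge by then, so its rebound residual is smaller.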