A Framework for Discovering Anomalous Regimes in Multivariate Time-Series Data with Local Models, Stephen Bay, Stanford University and Institute for the Study of Learning and Expertise

1 A Framework for Discovering Anomalous Regimes in Multivariate Time-Series Data with Local Models
Stephen Bay, Stanford University, and Institute for the Study of Learning and Expertise
sbay@apres.stanford.edu
Joint work with Kazumi Saito, Naonori Ueda, and Pat Langley

2 Discovering Anomalous Regimes
Problem: discover when a section of an observed time series has been generated by an anomalous regime.
– Anomalous: extremely rare or unusual
– Regime: the hypothetical true model generating the observed data

3 Motivation
[Figure: traces of charge, voltage, temperature, and current; image sources www.ndi.org and nasa.gov]
– Variables are causally related
– Several different modes

4 Other Categories of Irregularities
– Outliers
– Unusual patterns

5 DARTS Framework (Discovering Anomalous Regimes in Time Series)
1. Reference and test data; estimate local models on windows
2. Local models; map them into parameter space
3. Parameter space; estimate the density of T according to R
4. Anomaly score; compute a threshold

6 Local Models
– Vector autoregressive (VAR) models
– Regression format
– Ridge regression
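
A minimal sketch of how one local model might be fitted, assuming a VAR of lag order `order`, a single target variable, no intercept (the data are assumed already standardized), and a ridge penalty `ridge`. The names `lagged_design`, `fit_local_model`, and `local_models`, the sliding-window parameters `width` and `step`, and all defaults are hypothetical. In regression format each row of Z stacks the lagged values of every variable, and the ridge estimate β = (ZᵀZ + λI)⁻¹Zᵀy becomes that window's point in parameter space.

```python
import numpy as np


def lagged_design(window: np.ndarray, order: int) -> tuple[np.ndarray, np.ndarray]:
    """Put a VAR(order) model into regression format.

    Row r of Z concatenates x_{t-1}, ..., x_{t-order} (all variables) for
    t = r + order; row r of Y is x_t.
    """
    n = len(window)
    Z = np.hstack([window[order - lag:n - lag] for lag in range(1, order + 1)])
    Y = window[order:]
    return Z, Y


def fit_local_model(window: np.ndarray, target: int, order: int = 2,
                    ridge: float = 1.0) -> np.ndarray:
    """Ridge estimate of the VAR coefficients that predict one target variable."""
    Z, Y = lagged_design(window, order)
    y = Y[:, target]
    k = Z.shape[1]
    # Solve (Z^T Z + ridge * I) beta = Z^T y rather than forming an inverse.
    return np.linalg.solve(Z.T @ Z + ridge * np.eye(k), Z.T @ y)


def local_models(series: np.ndarray, target: int, width: int, step: int,
                 order: int = 2, ridge: float = 1.0) -> np.ndarray:
    """Fit one local model per sliding window; each row is a point in parameter space."""
    starts = range(0, len(series) - width + 1, step)
    return np.array([fit_local_model(series[s:s + width], target, order, ridge)
                     for s in starts])
```

Each fit is dominated by a k × k solve with k = order × (number of variables), which is consistent with the slide 9 remark that local-model cost is cubic in the number of variables.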

7 Scoring and Density Estimation
Estimate the density of local models from T relative to R in the parameter space:
– Kernels
– Nearest-neighbor (NN) style
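
Two hedged sketches of the scores the slide names, assuming each local model is a row vector of coefficients and that plain Euclidean distance is meaningful in parameter space (in practice coefficients might need rescaling). `knn_score` uses the distance to the k-th nearest reference model, and `kernel_score` uses the negative log of a Gaussian kernel density; `k` and `bandwidth` are illustrative choices, and both function names are hypothetical. The kernel density is left unnormalized because only the ordering of scores matters for ranking windows.

```python
import numpy as np


def pairwise_sq_dists(ref: np.ndarray, test: np.ndarray) -> np.ndarray:
    """Squared Euclidean distances, shape (n_test, n_ref)."""
    diffs = test[:, None, :] - ref[None, :, :]
    return (diffs ** 2).sum(axis=-1)


def knn_score(ref: np.ndarray, test: np.ndarray, k: int = 5) -> np.ndarray:
    """NN-style score: distance from each test model to its k-th nearest reference model.

    A large distance means low estimated density under R, i.e. more anomalous.
    """
    dists = np.sqrt(pairwise_sq_dists(ref, test))
    return np.partition(dists, k - 1, axis=1)[:, k - 1]


def kernel_score(ref: np.ndarray, test: np.ndarray, bandwidth: float = 1.0) -> np.ndarray:
    """Kernel-style score: negative log of an unnormalized Gaussian kernel density.

    The log-sum-exp trick keeps far-away test points from underflowing to log(0).
    """
    log_kernel = -pairwise_sq_dists(ref, test) / (2.0 * bandwidth ** 2)
    m = log_kernel.max(axis=1, keepdims=True)
    log_density = m[:, 0] + np.log(np.exp(log_kernel - m).sum(axis=1)) - np.log(len(ref))
    return -log_density
```

Both versions compare every test model with every reference model, so they cost O(N_T N_R) time and memory; slide 9 notes that the authors used KD-trees to do better.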

8 Determining a Null Distribution
– The score function provides a continuous estimate, but some tasks require a hard cutoff
– Null distribution: the distribution of anomaly scores we would expect to see if the data were completely normal
– Resample R and generate an empirical distribution using block cross-validation
– Provides a hypothesis-testing framework for sounding alarms
[Figure: anomaly score against its empirical distribution]
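
A minimal sketch of block cross-validation for the null distribution, assuming the reference local models are stored in time order, the k-th-nearest-neighbor score is used, and the alarm threshold is an upper quantile of the pooled null scores. Holding out contiguous blocks keeps a model from being scored against itself; with overlapping windows, models near a block edge still share data with their neighbors in the rest of R (the "overlapping data" limitation on slide 34), so a gap between blocks may be worth adding. `block_cv_null`, `alarm_threshold`, `n_blocks`, and `level` are hypothetical.

```python
import numpy as np


def knn_score(ref: np.ndarray, test: np.ndarray, k: int = 5) -> np.ndarray:
    """Distance from each test row to its k-th nearest row of ref."""
    d = np.sqrt(((test[:, None, :] - ref[None, :, :]) ** 2).sum(axis=-1))
    return np.partition(d, k - 1, axis=1)[:, k - 1]


def block_cv_null(ref_params: np.ndarray, n_blocks: int = 10, k: int = 5) -> np.ndarray:
    """Empirical null: score each contiguous block of R's models against the rest of R."""
    edges = np.linspace(0, len(ref_params), n_blocks + 1).astype(int)
    scores = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        held_out = ref_params[lo:hi]
        rest = np.concatenate([ref_params[:lo], ref_params[hi:]])
        scores.append(knn_score(rest, held_out, k))
    return np.concatenate(scores)


def alarm_threshold(null_scores: np.ndarray, level: float = 0.99) -> float:
    """Score above which a test window is flagged."""
    return float(np.quantile(null_scores, level))
```

With `level=0.99`, roughly 1% of windows drawn from the normal regime would be expected to exceed the threshold, which is the sense in which the null distribution turns the continuous score into a hypothesis test.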

9 Computation Time
Local models
– Linear in N (reference and test)
– Cubic in number of variables (for AR)
– Linear in window size (for AR)
Density estimation
– Implemented with KD-trees
– Potentially N_T log N_R
– Can be worse in higher dimensions
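
To illustrate where the N_T log N_R figure comes from, here is a from-scratch 1-nearest-neighbor KD-tree sketch; it is not the authors' implementation, and production code would more likely use a library tree such as SciPy's `scipy.spatial.cKDTree`. The names `KDNode`, `build_kdtree`, and `nearest_distance` are hypothetical. A query descends toward the query point and skips a subtree whenever its splitting plane is farther away than the best distance found so far; in higher dimensions that test succeeds less often, so queries drift toward a linear scan. Scoring with the k-th neighbor would replace the single best distance with a bounded heap of k candidates.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence

Point = tuple[float, ...]


@dataclass
class KDNode:
    point: Point
    axis: int
    left: Optional["KDNode"]
    right: Optional["KDNode"]


def build_kdtree(points: Sequence[Point], depth: int = 0) -> Optional[KDNode]:
    """Split on the axes in rotation, at the median point of each subtree."""
    if not points:
        return None
    axis = depth % len(points[0])
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    return KDNode(ordered[mid], axis,
                  build_kdtree(ordered[:mid], depth + 1),
                  build_kdtree(ordered[mid + 1:], depth + 1))


def _nearest_sq(node: Optional[KDNode], query: Point, best: float) -> float:
    """Smallest squared distance from query to any point in the subtree."""
    if node is None:
        return best
    best = min(best, sum((a - b) ** 2 for a, b in zip(node.point, query)))
    diff = query[node.axis] - node.point[node.axis]
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    best = _nearest_sq(near, query, best)
    # Cross the splitting plane only if the current best ball reaches it.
    if diff * diff < best:
        best = _nearest_sq(far, query, best)
    return best


def nearest_distance(tree: Optional[KDNode], query: Point) -> float:
    """Euclidean distance from query to its nearest neighbor in the tree."""
    return math.sqrt(_nearest_sq(tree, query, math.inf))
```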

10 Experiments
Why evaluation is difficult
Data sets
– CD Player
– Random Walk
– ECG Arrhythmia
– Financial Time-Series
Comparison algorithms
– Hotelling’s T² statistic

11 Hotelling’s T² Statistic
– Commonly used in statistical process control for monitoring multivariate processes
– Essentially the squared Mahalanobis distance
– Applied with time lags for patient monitoring in multivariate data (Gather et al., 2001)
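
For an observation x, with mean x̄ and sample covariance S estimated from reference data, T² = (x − x̄)ᵀ S⁻¹ (x − x̄). The sketch below computes it and builds the time-lagged variant by placing each observation side by side with its lagged copies before computing T²; the number of lags and the names `lag_embed` and `hotelling_t2` are assumptions, and this is not necessarily the exact procedure of Gather et al.

```python
import numpy as np


def lag_embed(series: np.ndarray, lags: int) -> np.ndarray:
    """For a series of shape (n, d), row t holds x_t, x_{t-1}, ..., x_{t-lags}."""
    n = len(series)
    return np.hstack([series[lags - lag:n - lag] for lag in range(lags + 1)])


def hotelling_t2(reference: np.ndarray, test: np.ndarray) -> np.ndarray:
    """T^2 = (x - mean)^T S^{-1} (x - mean) for each row x of test."""
    mean = reference.mean(axis=0)
    cov = np.atleast_2d(np.cov(reference, rowvar=False))  # sample covariance S
    centered = test - mean
    # Solve S z = (x - mean) instead of inverting S explicitly.
    solved = np.linalg.solve(cov, centered.T).T
    return np.einsum("ij,ij->i", centered, solved)
```

One likely contributor to the threshold problem noted on slide 33: the usual F-based control limit for T² assumes independent observations, whereas lag-embedded rows of a time series overlap and are strongly dependent.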

12 CD Player
Data from a mechanical CD player arm
– Two inputs relating to actuators (u1, u2)
– Two outputs relating to position accuracy (y1, y2)

13 Output variable y1: artificial anomaly

14 Output variable y2: unchanged

15 Hotelling’s T²

16 Random Walk
No anomalies in random walk data

17 DARTS

18 Hotelling’s T²

19 Cardiac Arrhythmia Data
– Electrocardiogram traces from MIT-BIH
– Collected to study cardiac dynamics and arrhythmias
– Every beat annotated by two cardiologists
– 30-minute recording at 360 Hz
– Roughly 650,000 points, 2000 beats
– Points 100–3000 form the reference set; the remainder is test data
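
As a quick consistency check on the slide's figures, the sample count follows from the recording length and sampling rate:

```latex
30\,\text{min} \times 60\,\frac{\text{s}}{\text{min}} \times 360\,\frac{\text{samples}}{\text{s}} = 648{,}000\ \text{samples} \approx 650{,}000
```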

20 Cardiac Reference Data

21 DARTS Vaa

22 Hotelling’s T² Vaa

23 DARTS

24 a

25 TP/FP Statistics
Threshold    TP     TN     FP     FN    Sensitivity   Selectivity
97%         313    984    127     66    82.6%         71.1%
98%         285   1040     71     94    75.2%         80.1%
99%         205   1088     23    174    54.1%         89.9%
Sensitivity = TP / (TP + FN)
Selectivity = TP / (TP + FP)
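
The two ratios on this slide can be recomputed directly from the counts; the sketch below copies the (TP, TN, FP, FN) values from the slide 25 table, and the function names are illustrative.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Fraction of true anomalies that were flagged: TP / (TP + FN)."""
    return tp / (tp + fn)


def selectivity(tp: int, fp: int) -> float:
    """Fraction of flagged items that were true anomalies: TP / (TP + FP)."""
    return tp / (tp + fp)


# (TP, TN, FP, FN) per threshold, from the slide 25 table.
counts = {
    "97%": (313, 984, 127, 66),
    "98%": (285, 1040, 71, 94),
    "99%": (205, 1088, 23, 174),
}

for level, (tp, _tn, fp, fn) in counts.items():
    print(f"{level}: sensitivity {sensitivity(tp, fn):.1%}, selectivity {selectivity(tp, fp):.1%}")
# 97%: sensitivity 82.6%, selectivity 71.1%
# 98%: sensitivity 75.2%, selectivity 80.1%
# 99%: sensitivity 54.1%, selectivity 89.9%
```

Selectivity as defined here is what is often called precision or positive predictive value, and TN enters neither measure. The counts are internally consistent: TP + FN = 379 and TN + FP = 1111 at every threshold, so raising the threshold trades detections for fewer false alarms on the same set of items.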

26 Japanese Financial Data
Monthly data from 1983–2003
Variables:
– Monetary base
– National bond interest rate
– Wholesale price index
– Index of industrial produce
– Machinery orders
– Exchange rate yen/dollar
True anomalies unknown
– Subjective evaluation by an expert

27 DARTS: Bond Rate

28 DARTS: Monetary Base

29 DARTS: Wholesale Price Index

30 DARTS: Index Industrial Produce

31 DARTS: Machinery Orders

32 Hotelling’s T²

33 Hotelling’s T² vs. DARTS
T² can detect multivariate changes, but it:
– Has little selectivity
– Does not distinguish between variables
– Does not handle drifts
– Relies on an F-statistical test that often grossly underestimates the proper threshold

34 Limitations of DARTS
– Suitability of local models
– Window size and sensitivity
– Number of parameters
– Overlapping data
– Efficiency of KD-trees
– Explanation

35 Related Work
– Limit checking
– Discrepancy checking
– Autoregressive models
– Unusual patterns
– HMMs

36 Conclusions
– DARTS framework: data -> local models -> parameter space -> density estimate
– Provides a hypothesis-testing framework for flagging anomalies
– Promising results on a variety of real and synthetic problems

37

38 DARTS Framework
1. Preprocess R and T
2. Select a target variable and create local models from R
3. Create local models from T
4. Compare the models of T to those of R in parameter space P
5. Compute the null distribution
6. Repeat steps 2–5 for each variable
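
The six steps can be strung together in a compact sketch. Every concrete choice is an assumption made for illustration: z-scoring with R's statistics as the preprocessing, a ridge VAR(order) fitted per sliding window as the local model, Euclidean k-th-nearest-neighbor distance as the comparison in P, and an upper quantile of block-cross-validated reference scores as the null threshold. `darts_sketch` and its helpers and defaults are hypothetical, not the authors' implementation.

```python
import numpy as np


def standardize(R: np.ndarray, T: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Step 1 (assumed preprocessing): z-score both sets with R's statistics."""
    mu, sd = R.mean(axis=0), R.std(axis=0)
    sd = np.where(sd > 0, sd, 1.0)
    return (R - mu) / sd, (T - mu) / sd


def window_models(series, target, width, step, order, ridge):
    """Steps 2-3: ridge VAR(order) coefficients for one target, one row per window."""
    params = []
    for start in range(0, len(series) - width + 1, step):
        w = series[start:start + width]
        n = len(w)
        Z = np.hstack([w[order - lag:n - lag] for lag in range(1, order + 1)])
        y = w[order:, target]
        k = Z.shape[1]
        params.append(np.linalg.solve(Z.T @ Z + ridge * np.eye(k), Z.T @ y))
    return np.array(params)


def knn_score(ref, test, k):
    """Step 4: distance to the k-th nearest reference model (large = anomalous)."""
    d = np.sqrt(((test[:, None, :] - ref[None, :, :]) ** 2).sum(axis=-1))
    return np.partition(d, k - 1, axis=1)[:, k - 1]


def block_cv_null(ref, n_blocks, k):
    """Step 5: score each contiguous block of R's models against the remaining blocks."""
    edges = np.linspace(0, len(ref), n_blocks + 1).astype(int)
    return np.concatenate([
        knn_score(np.concatenate([ref[:lo], ref[hi:]]), ref[lo:hi], k)
        for lo, hi in zip(edges[:-1], edges[1:])
    ])


def darts_sketch(R, T, width=50, step=10, order=2, ridge=1.0,
                 k=5, n_blocks=10, level=0.99):
    """Step 6: repeat steps 2-5 with each variable as the target.

    Returns {target: (scores, threshold, flags)} with one score per test window.
    """
    R, T = standardize(R, T)
    results = {}
    for target in range(R.shape[1]):
        ref = window_models(R, target, width, step, order, ridge)
        test = window_models(T, target, width, step, order, ridge)
        scores = knn_score(ref, test, k)
        threshold = float(np.quantile(block_cv_null(ref, n_blocks, k), level))
        results[target] = (scores, threshold, scores > threshold)
    return results
```

A test window is flagged for variable j when its score exceeds that variable's threshold. Running the comparison once per target variable is what lets the approach attribute an anomaly to a particular variable, which slide 33 lists as a shortcoming of T².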
