Streaming Pattern Discovery in Multiple Time-Series. Jimeng Sun, Spiros Papadimitriou, Christos Faloutsos. Parallel Data Laboratory, Carnegie Mellon University.

1 Streaming Pattern Discovery in Multiple Time-Series. Jimeng Sun, Spiros Papadimitriou, Christos Faloutsos. Parallel Data Laboratory, Carnegie Mellon University.

2 © January 16, http://www.pdl.cmu.edu/ Motivation: Co-evolving time series (data streams) appear in many different applications, e.g., disk access traffic in network clusters, Internet flow traffic in a network, temperatures in a large building, and chlorine concentrations in a water distribution network. The values are typically correlated, so it would be very useful if we could summarize them on the fly.

3 Example: water distribution network. [Figure: chlorine concentrations over time across Phase 1, Phase 2, and Phase 3, labeled normal operation; curves for sensors near the leak and sensors away from the leak.]

4 Goals: Discover “hidden” (latent) variables for summarization of main trends for users and for efficient forecasting and spotting of outliers/anomalies, with incremental, real-time computation and limited memory requirements.

5 Example: chlorine measurements in a water distribution network. [Figure: chlorine concentrations across Phase 1, Phase 2, and Phase 3, labeled normal operation and major leak; curves for sensors near the leak and sensors away from the leak.]

6 Example: hidden variable. We would like to discover a few “hidden (latent) variables” that summarize the key trends. [Figure: chlorine concentrations in Phase 1; actual measurements (n streams) next to the k hidden variable(s), with k = 1.]

7 Example: hidden variable tracking. We would like to discover a few “hidden (latent) variables” that summarize the key trends. [Figure: chlorine concentrations in Phase 1 and Phase 2; actual measurements (n streams) next to the k hidden variable(s), with k = 2.]

8 Example: hidden variable tracking. We would like to discover a few “hidden (latent) variables” that summarize the key trends. [Figure: chlorine concentrations in Phase 1, Phase 2, and Phase 3; actual measurements (n streams) next to the k hidden variable(s), with k = 1.]

9 Method outline. Step 1: How to capture correlations? Step 2: How to do it incrementally, when we have a very large number of points? Step 3: How to dynamically adjust the number of hidden variables?

10 1. How to capture correlations? [Figure: temperature T1 of the first sensor over time, on an axis from 20 °C to 30 °C.]

11 1. How to capture correlations? [Figure: temperatures of the first sensor and the second sensor (T2) over time, on an axis from 20 °C to 30 °C.]

12 1. How to capture correlations? Correlations: let’s take a closer look at the first three value-pairs… [Figure: scatter plot of temperature T2 against temperature T1, both axes from 20 °C to 30 °C.]

13 1. How to capture correlations? The first three value-pairs (time = 1, 2, 3) lie (almost) on a line in the space of value-pairs. So we need O(n) numbers for the slope, and one number for each value-pair (its offset on the line). Offset = “hidden variable”. [Figure: scatter plot of temperature T2 against temperature T1, both axes from 20 °C to 30 °C.]

14 1. How to capture correlations? Other pairs also follow the same pattern: they lie (approximately) on this line. [Figure: scatter plot of temperature T2 against temperature T1, both axes from 20 °C to 30 °C.]
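The line idea from slides 13 and 14 can be illustrated with a minimal sketch. It assumes the line passes through the origin and that its direction is already known; the function name `project_onto_line` and the sample readings are hypothetical, not taken from the slides.

```python
import math

def project_onto_line(points, direction):
    """Return the hidden variable (offset along the line) of each point and
    the reconstruction of each point from that single number."""
    norm = math.sqrt(sum(c * c for c in direction))
    w = [c / norm for c in direction]  # unit vector along the line
    hidden = [sum(wi * xi for wi, xi in zip(w, p)) for p in points]
    recon = [[y * wi for wi in w] for y in hidden]
    return hidden, recon

# Hypothetical (T1, T2) readings in degrees C from two nearby sensors.
readings = [(20.0, 20.5), (25.0, 24.6), (30.0, 30.2)]
hidden, recon = project_onto_line(readings, direction=(1.0, 1.0))

# Largest per-coordinate reconstruction error over all readings.
max_err = max(abs(a - b) for p, r in zip(readings, recon) for a, b in zip(p, r))
```

Each value-pair is summarized by one number in `hidden`; the direction is shared by all pairs, so storage grows by one number per time tick rather than n.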

15 Method outline. Step 1: How to capture correlations? Step 2: How to do it incrementally, when we have a very large number of points? Step 3: How to dynamically adjust the number of hidden variables?

16 Experiments: chlorine concentration [CMU Civil Engineering]. 166 streams, 2 hidden variables (~4% error). [Figure: measurements from a sensor compared with the reconstruction from the hidden variables.]

17 Experiments: chlorine concentration [CMU Civil Engineering]. Hidden variables: both capture the global, periodic pattern. The second is similar to the first, but “phase-shifted”. Together they can express any “phase-shift”…

18 Conclusion. Many settings have hundreds of streams, but stream values are, by nature, related. We proposed a method that discovers hidden variables as a summarization of main trends for users, and that requires only incremental computation without buffering any past data. Future work: apply the method to more applications, e.g., performance monitoring for storage systems and network systems.

19 Related work. Stream SVD [Guha, Gunopulos, Koudas / KDD03]. StatStream [Zhu, Shasha / VLDB02]. Clustering [Aggarwal, Han, Yu / VLDB03], [Guha, Meyerson, et al. / TKDE], [Lin, Vlachos, Keogh, Gunopulos / EDBT04]. Classification [Wang, Fan, et al. / KDD03], [Hulten, Spencer, Domingos / KDD01]. Piecewise approximations [Palpanas, Vlachos, Keogh, et al. / ICDE 2004].

20 Experiments: light measurements. 54 sensors, 2-4 hidden variables (~6% error). [Figure: measurement compared with reconstruction.]

21 Experiments: light measurements. Hidden variables 1 & 2: main trend (as before). Hidden variables 3 & 4 (intermittent): potential anomalies and outliers.

22 Stream correlations. Step 1: How to capture correlations? Step 2: How to do it incrementally, when we have a very large number of points? Step 3: How to dynamically adjust the number of hidden variables?

23 2. Incremental update. For each new point: project onto the current line; estimate the error. [Figure: scatter plot of temperature T2 against temperature T1, both axes from 20 °C to 30 °C, showing the new value and its error.]

24 2. Incremental update. For each new point: project onto the current line; estimate the error; rotate the line in the direction of the error and in proportion to its magnitude. This takes O(n) time. [Figure: scatter plot of temperature T2 against temperature T1, both axes from 20 °C to 30 °C, showing the new value and its error.]

25 2. Incremental update. For each new point: project onto the current line; estimate the error; rotate the line in the direction of the error and in proportion to its magnitude. [Figure: scatter plot of temperature T2 against temperature T1, both axes from 20 °C to 30 °C.]

26 Stream correlations: Principal Component Analysis (PCA). The “line” is the first principal component (PC) vector. This line is optimal: it minimizes the sum of squared projection errors.
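The optimality statement on slide 26 can be written out as follows. This is a sketch under the assumption that the points $x_t$ are centered, so that the line passes through the origin; the symbols $x_t$ (the value-tuple at time $t$) and $w$ (a unit direction) are our notation, not the slide's.

```latex
w_1 \;=\; \arg\min_{\|w\| = 1} \sum_{t} \left\| x_t - (w^T x_t)\, w \right\|^2
    \;=\; \arg\max_{\|w\| = 1} \sum_{t} (w^T x_t)^2
```

The two forms agree because, for a unit vector $w$, $\|x_t - (w^T x_t) w\|^2 = \|x_t\|^2 - (w^T x_t)^2$: minimizing the squared projection error is the same as maximizing the energy captured along $w$.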

27 2. Incremental update, given the number of hidden variables k (assuming k is known). We know how to update the slope (detailed equations in paper). For each new point $x$ and for $i = 1, \dots, k$:
$y_i := w_i^T x$ (projection onto $w_i$)
$d_i \leftarrow d_i + y_i^2$ (energy $\propto$ i-th eigenvalue)
$e_i := x - y_i w_i$ (error)
$w_i \leftarrow w_i + (1/d_i)\, y_i e_i$ (update estimate)
$x \leftarrow x - y_i w_i$ (repeat with remainder)
[Figure: the point $x$, its projection $y_1$ onto $w_1$, the error $e_1$, and the updated $w_1$.]
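The update rules on slide 27 can be sketched in plain Python as below. This is an illustration, not the authors' implementation: the function name `incremental_update`, the initial values of `W` and `d`, and the synthetic stream are assumptions, and the slide leaves open whether the remainder step uses the old or the updated w_i (the sketch uses the updated one).

```python
import math

def incremental_update(x, W, d):
    """Apply one streaming update for a new point x.

    W is a list of k weight vectors (each of length n) and d a list of k
    energies; both are modified in place. Returns the k hidden variables.
    """
    x = list(x)  # remainder handed from one component to the next
    ys = []
    for i in range(len(W)):
        w = W[i]
        y = sum(wj * xj for wj, xj in zip(w, x))            # project onto w_i
        d[i] += y * y                                        # accumulate energy
        e = [xj - y * wj for xj, wj in zip(x, w)]            # projection error
        w = [wj + (y / d[i]) * ej for wj, ej in zip(w, e)]   # rotate towards error
        W[i] = w
        x = [xj - y * wj for xj, wj in zip(x, w)]            # remainder (assumed: updated w_i)
        ys.append(y)
    return ys

# Hypothetical stream: two sensors whose values always lie on the
# direction (0.6, 0.8); only the offset along that line changes.
W = [[1.0, 0.0]]   # assumed initial guess for w_1
d = [1e-3]         # assumed small positive initial energy
for t in range(200):
    a = 1.0 + (t % 7)
    incremental_update((0.6 * a, 0.8 * a), W, d)

w = W[0]
cosine = (0.6 * w[0] + 0.8 * w[1]) / math.sqrt(w[0] ** 2 + w[1] ** 2)
```

Each update touches every coordinate a constant number of times per hidden variable, so the cost per new point is O(nk), and no past points are stored.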

28 Stream correlations. Step 1: How to capture correlations? Step 2: How to do it incrementally, when we have a very large number of points? Step 3: How to dynamically adjust k, the number of hidden variables?

29 3. Number of hidden variables. If we had three sensors with similar measurements, the points would again lie on a line (i.e., one hidden variable, k = 1), but in 3-D space. [Figure: value-tuple space with axes T1, T2, T3.]

30 3. Number of hidden variables. Assume one sensor intermittently gets stuck. Now, no line can give a good approximation. [Figure: value-tuple space with axes T1, T2, T3.]

31 3. Number of hidden variables. Assume one sensor intermittently gets stuck. Now, no line can give a good approximation, but a plane will do (two hidden variables, k = 2). [Figure: value-tuple space with axes T1, T2, T3.]

32 Number of hidden variables (PCs). Keep track of the energy maintained by the approximation with k variables (PCs), i.e., the reconstruction accuracy w.r.t. total squared error. Increment (or decrement) k if the fraction of energy maintained goes below (or above) a threshold: if below 95%, $k \leftarrow k + 1$; if above 98%, $k \leftarrow k - 1$.
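The thresholding rule on slide 32 can be sketched as a small function. The name `adjust_k`, its parameters, and the bounds 1 <= k <= n are assumptions made for illustration; the 95% and 98% thresholds are the ones quoted on the slide.

```python
def adjust_k(k, total_energy, kept_energy, n, low=0.95, high=0.98):
    """Return the new number of hidden variables.

    total_energy: running sum of squared stream values
    kept_energy:  running sum of squared hidden-variable values
    n:            number of streams (assumed upper bound on k)
    """
    if total_energy <= 0:
        return k  # nothing observed yet, keep k unchanged
    fraction = kept_energy / total_energy
    if fraction < low and k < n:
        return k + 1  # approximation too coarse: add a hidden variable
    if fraction > high and k > 1:
        return k - 1  # more variables than needed: drop one
    return k

grow = adjust_k(1, total_energy=100.0, kept_energy=90.0, n=3)
shrink = adjust_k(2, total_energy=100.0, kept_energy=99.0, n=3)
steady = adjust_k(2, total_energy=100.0, kept_energy=96.0, n=3)
```

Leaving a gap between the two thresholds keeps k from flipping back and forth when the fraction of energy hovers near a single cut-off.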

