Published by Kaley Colley. Modified about 1 year ago.

1. StatStream: Statistical Monitoring of Thousands of Data Streams in Real Time
Pankaj Kumar Madhukar, Rakesh Kumar Singh, Puspendra Kumar
Project Instructor: Prof. P. K. Reddy

2. Goal
- Given tens of thousands of high-speed time-series data streams, detect high-value correlations, both synchronized and time-lagged, over sliding windows in real time.
- Real time:
  - high update frequency of the data streams
  - fixed response time, online
(Figure: two streams plotted together, labeled "Correlated!")

3. Our approach
- Naive algorithm:
  - N: number of streams
  - w: size of the sliding window
  - Either space O(N) and time O(N^2 w) (recompute every pair over the window), or space O(N^2) and time O(N^2) (keep running statistics for every pair).
- Suppose that the streams are updated every second.
  - On a Pentium 4 PC, the exact method can monitor only 700 streams with a delay of 2 minutes.
- Our approach:
  - Use the Discrete Fourier Transform (DFT) to approximate correlation.
  - Use a grid structure to filter out unlikely pairs.
  - Our approach can monitor 10,000 streams with a delay of 2 minutes.
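For reference, the naive exact method can be sketched as follows: every pair of streams is checked over its last w values, so each update costs O(N^2 w). The names `pearson` and `naive_correlated_pairs`, the dict-of-windows input and the fixed threshold are illustrative assumptions, not part of StatStream.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Exact Pearson correlation of two equal-length windows, O(w).
    Assumes neither window is constant (non-zero variance)."""
    w = len(x)
    mx, my = sum(x) / w, sum(y) / w
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def naive_correlated_pairs(windows, threshold):
    """windows: stream id -> list of its last w values (hypothetical layout).
    Checks all N(N-1)/2 pairs, each in O(w), i.e. O(N^2 w) per update."""
    result = []
    for a, b in combinations(sorted(windows), 2):
        r = pearson(windows[a], windows[b])
        if r >= threshold:
            result.append((a, b, r))
    return result


windows = {"s1": [1, 2, 3, 4], "s2": [2, 4, 6, 8], "s3": [4, 3, 2, 1]}
print(naive_correlated_pairs(windows, 0.9))  # [('s1', 's2', 1.0)]
```

The quadratic number of pairs times the window length is exactly the cost that the DFT digests and the grid structure are meant to cut down.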

4. Roadmap
- Goal
- StatStream
  - Data structure
  - Correlation approximation
  - Grid structure
- Empirical study
- Future work

5. Stream synoptic data structure
- Three-level time interval hierarchy:
  - time point, basic window, sliding window
- Basic window (the key to our technique):
  - The computation for basic window i must finish by the end of basic window i+1.
  - The basic window time is the system response time.
- Digests:
  - Sliding window digests: sum, DFT coefficients
  - Basic window digests: sum, DFT coefficients
(Figure: a sliding window made of consecutive basic windows, each made of time points; every basic window and the sliding window keep their own digests: sum, DFT coefs.)
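The three-level hierarchy can be sketched for a single stream as below. This is a minimal illustration under our own assumptions: the class `StreamSynopsis`, its parameters `b` (basic window size), `k` (basic windows per sliding window, so w = k·b) and `n` (digest size) are hypothetical names, and the slide's sliding-window DFT digest is omitted; only the basic-window digests (sum, first n DFT coefficients) and a sliding-window sum assembled from them are shown. The digest is computed when a basic window fills up, which is the work that must finish before the next basic window ends.

```python
import cmath
from collections import deque


def dft_prefix(values, n):
    """First n (unnormalized) DFT coefficients X_f = sum_t x_t * exp(-2*pi*i*f*t/b)."""
    b = len(values)
    return [sum(v * cmath.exp(-2j * cmath.pi * f * t / b) for t, v in enumerate(values))
            for f in range(n)]


class StreamSynopsis:
    """Hypothetical sketch of the time point / basic window / sliding window hierarchy.

    Time points are buffered until a basic window of b points is full; then its
    digest (sum and first n DFT coefficients) is computed. The sliding window is
    represented by the digests of the last k basic windows (w = k * b).
    """

    def __init__(self, b, k, n):
        self.b, self.n = b, n
        self.buffer = []                # time points of the current basic window
        self.digests = deque(maxlen=k)  # the oldest basic window drops out automatically

    def append(self, value):
        """Add one time point; return True when a basic window was just completed."""
        self.buffer.append(value)
        if len(self.buffer) < self.b:
            return False
        self.digests.append({"sum": sum(self.buffer),
                             "dft": dft_prefix(self.buffer, self.n)})
        self.buffer = []
        return True

    def sliding_sum(self):
        """Sliding-window sum assembled from the basic-window digests."""
        return sum(d["sum"] for d in self.digests)


s = StreamSynopsis(b=4, k=2, n=2)
for v in range(1, 13):
    s.append(v)
print(s.sliding_sum())  # 68  (basic windows [5..8] and [9..12])
```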

6. Roadmap
- Motivation and goal
- Related work
- StatStream
  - Data structure
  - Correlation approximation
  - Grid structure
- Empirical study
- Future work

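7. Synchronized correlation uses basic windows
- Inner product of aligned basic windows: the inner product over the sliding window is the sum of the inner products of the aligned basic windows.
(Figure: streams x and y with aligned basic windows inside a sliding window)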
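Because sums, sums of squares and inner products are additive over aligned basic windows, the sliding-window correlation can be assembled from small per-block digests, and sliding forward by one basic window only drops one digest and adds one. A minimal sketch, assuming exact (non-DFT) block statistics; `block_digest` and `correlation_from_blocks` are hypothetical names:

```python
import math
from collections import deque


def block_digest(xb, yb):
    """Digest of one pair of aligned basic windows: sums, sums of squares, inner product."""
    return (sum(xb), sum(yb),
            sum(a * a for a in xb), sum(c * c for c in yb),
            sum(a * c for a, c in zip(xb, yb)))


def correlation_from_blocks(digests, w):
    """Pearson correlation over the sliding window from its basic-window digests.

    Each sliding-window statistic is the sum of the per-block statistics.
    (The sxx - sx^2/w form is simple but can lose precision on large values.)
    """
    sx, sy, sxx, syy, sxy = (sum(col) for col in zip(*digests))
    cov = sxy - sx * sy / w
    vx = sxx - sx * sx / w
    vy = syy - sy * sy / w
    return cov / math.sqrt(vx * vy)


b, k = 2, 3                      # basic window size and basic windows per sliding window
x = [1, 2, 3, 4, 5, 6]
y = [2, 1, 4, 3, 6, 5]
window = deque(maxlen=k)         # digests of the last k aligned basic windows
for i in range(0, len(x), b):
    window.append(block_digest(x[i:i + b], y[i:i + b]))
print(correlation_from_blocks(window, b * k))  # about 0.8286 (= 14.5 / 17.5)
```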

8. Synchronized correlation (cont.)
- Approximate each basic window with an orthogonal function family (e.g. DFT).
- Inner product of the time series ≈ inner product of the digests.
- The time and space complexity is reduced from O(b) to O(n).
  - b: size of the basic window
  - n: size of the digests (n << b)
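One way to see why the digests suffice is Parseval's theorem: for real series of length b, the inner product equals (1/b)·Σ X_f·conj(Y_f) over all DFT coefficients, and conjugate symmetry lets each kept coefficient f ≥ 1 stand for its mirror b − f. Keeping only the first n coefficients gives an O(n) approximation per pair, while computing the digests is a per-stream cost. This is a sketch under our own assumptions (unnormalized DFT, hypothetical helper names), not the authors' code:

```python
import cmath
import math


def dft_prefix(values, n):
    """First n unnormalized DFT coefficients X_f = sum_t x_t * exp(-2*pi*i*f*t/b)."""
    b = len(values)
    return [sum(v * cmath.exp(-2j * cmath.pi * f * t / b) for t, v in enumerate(values))
            for f in range(n)]


def approx_inner_product(X, Y, b):
    """Approximate sum_t x_t * y_t from the first n DFT coefficients, O(n).

    Parseval: sum_t x_t y_t = (1/b) * sum_{f=0}^{b-1} X_f * conj(Y_f). For real
    series X_{b-f} = conj(X_f), so each kept f >= 1 also accounts for b - f.
    Requires 2 * (n - 1) < b so that no coefficient is counted twice.
    """
    n = len(X)
    assert 2 * (n - 1) < b
    total = (X[0] * Y[0].conjugate()).real
    for f in range(1, n):
        total += 2 * (X[f] * Y[f].conjugate()).real
    return total / b


b = 16
x = [math.cos(2 * math.pi * t / b) for t in range(b)]
y = [math.cos(2 * math.pi * t / b) + 1 for t in range(b)]
exact = sum(a * c for a, c in zip(x, y))
approx = approx_inner_product(dft_prefix(x, 2), dft_prefix(y, 2), b)
print(round(exact, 6), round(approx, 6))  # 8.0 8.0
```

Here two coefficients are exact because both series contain only frequencies 0 and 1; for general data the error shrinks as n grows.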

9. Approximate lagged correlation
- Inner product with unaligned windows
- The time complexity is reduced from O(b) to O(n^2), as opposed to O(n) for synchronized correlation.
(Figure: sliding window)
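The O(n^2) cost can be illustrated as follows: with lag s, x's basic window overlaps the tail of one basic window of y and the head of the next, so the inner product of the digests needs every pair of basis functions (i, j), weighted by an overlap matrix that depends only on s and can be precomputed once for all stream pairs. For simplicity this sketch uses the orthonormal DCT-II basis as the orthogonal function family instead of the DFT; that swap, and all helper names, are our own assumptions:

```python
import math


def dct_basis(b, n):
    """First n vectors of the orthonormal DCT-II basis of length b
    (a stand-in for the orthogonal function family)."""
    return [[math.sqrt((1.0 if k == 0 else 2.0) / b) * math.cos(math.pi * (t + 0.5) * k / b)
             for t in range(b)] for k in range(n)]


def digest(window, basis):
    """Coefficients of a basic window in the (possibly truncated) basis."""
    return [sum(f_t * v for f_t, v in zip(f, window)) for f in basis]


def lag_matrices(basis, s):
    """Precomputed once per lag s (0 <= s < b), shared by all stream pairs.

    m1[i][j]: overlap of function i with function j shifted by s (first window of y)
    m2[i][j]: overlap of function i with function j of the following window of y
    """
    b, n = len(basis[0]), len(basis)
    m1 = [[sum(basis[i][t] * basis[j][t + s] for t in range(b - s)) for j in range(n)]
          for i in range(n)]
    m2 = [[sum(basis[i][t] * basis[j][t + s - b] for t in range(b - s, b)) for j in range(n)]
          for i in range(n)]
    return m1, m2


def lagged_inner_product(cx, cy1, cy2, m1, m2):
    """Approximate sum_t x[t] * y[t + s] from digests: O(n^2) per pair."""
    n = len(cx)
    return sum(cx[i] * (cy1[j] * m1[i][j] + cy2[j] * m2[i][j])
               for i in range(n) for j in range(n))


b, s = 6, 2
x = [3, 1, 4, 1, 5, 9]                     # one basic window of x
y = [2, 6, 5, 3, 5, 8, 9, 7, 9, 3, 2, 3]   # two consecutive basic windows of y
exact = sum(x[t] * y[t + s] for t in range(b))
basis = dct_basis(b, b)                    # full basis, so the result is exact
m1, m2 = lag_matrices(basis, s)
approx = lagged_inner_product(digest(x, basis), digest(y[:b], basis),
                              digest(y[b:], basis), m1, m2)
print(exact, round(approx, 6))  # 154 154.0
```

With a truncated basis (n < b) the same code gives an approximation, at O(n^2) per pair instead of O(n) for aligned windows.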

11. Grid structure (to avoid checking all pairs)
- The DFT coefficients yield a vector. High correlation => closeness in the vector space.
  - We can use a grid structure and look in the neighborhood; this returns a superset of the highly correlated pairs.
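A minimal sketch of the neighborhood search, under our own assumptions: for series normalized to zero mean and unit norm, corr(x, y) = 1 − d(x, y)²/2, so corr ≥ 1 − ε²/2 is the same as d ≤ ε. If the feature vectors are the first few orthonormally scaled DFT coefficients (real and imaginary parts), dropping coefficients can only shrink distances (Parseval), so every truly close pair is also close in digest space. With grid cells of side ε, two such vectors differ by at most one cell per coordinate. `normalize` and `grid_candidates` are hypothetical names:

```python
import itertools
import math
from collections import defaultdict


def normalize(x):
    """Zero mean, unit Euclidean norm; then corr(x, y) = 1 - dist(x, y)**2 / 2."""
    m = sum(x) / len(x)
    centered = [v - m for v in x]
    norm = math.sqrt(sum(v * v for v in centered))
    return [v / norm for v in centered]


def grid_candidates(features, eps):
    """features: stream id -> short real vector (e.g. a digest of the normalized series).

    Each vector is hashed into a grid cell of side eps. Vectors within Euclidean
    distance eps differ by at most one cell per coordinate, so checking the 3^d
    neighboring cells returns a superset of those pairs.
    """
    grid = defaultdict(list)
    for sid, vec in features.items():
        grid[tuple(math.floor(c / eps) for c in vec)].append(sid)
    candidates = set()
    for cell, ids in grid.items():
        for offset in itertools.product((-1, 0, 1), repeat=len(cell)):
            neighbor = tuple(c + o for c, o in zip(cell, offset))
            for a in ids:
                for other in grid.get(neighbor, ()):
                    if a < other:
                        candidates.add((a, other))
    return candidates


features = {"a": (0.0, 0.0), "b": (0.05, 0.02), "c": (0.9, 0.9)}
print(grid_candidates(features, eps=0.1))  # {('a', 'b')}
```

Candidates still have to be verified, since the grid can return pairs that are close in the cell sense but not within ε.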

13. Empirical study
- Response time:
  - Exact (naive method): T = k0·b·N^2

14. Empirical study
- DFT-grid:
  - Updating digests: T1 = k1·b·N
  - Detecting correlation: T2 = k2·N^2
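Putting the two cost models side by side gives the following ratio. This is a worked step under our own assumptions: T1 and T2 add up to the DFT-grid response time per basic window, and the machine-dependent constants k0, k1, k2 are not given in the slides.

```latex
\frac{T_{\text{exact}}}{T_1 + T_2}
  = \frac{k_0\, b\, N^2}{k_1\, b\, N + k_2\, N^2}
  = \frac{k_0\, b}{k_1\, b / N + k_2}
  \;\longrightarrow\; \frac{k_0\, b}{k_2} \qquad (N \to \infty)
```

The DFT-grid method still has a quadratic term in N, but that term no longer carries the factor b, so for many streams the advantage scales roughly with the basic window size.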

15. Empirical study (cont.)
- Approximation errors:
  - A larger digest size, a larger sliding window and a smaller basic window give better approximations.
  - The approximation errors are small for the stock data.
- Precision: measures the quality of the grid structure.
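The slide does not define precision. Assuming its usual meaning here (the fraction of candidate pairs returned by the grid that turn out to be truly highly correlated), a hypothetical helper could be:

```python
def precision(candidates, truly_correlated):
    """Fraction of the grid's candidate pairs that are truly highly correlated.

    Hypothetical helper; an empty candidate set is treated as precision 1.0.
    """
    candidates = set(candidates)
    if not candidates:
        return 1.0
    return len(candidates & set(truly_correlated)) / len(candidates)


print(precision({("a", "b"), ("a", "c")}, {("a", "b")}))  # 0.5
```

Since the grid is meant to return a superset of the correlated pairs, low precision shows up as extra verification work rather than missed pairs.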

17. Future work
- Algorithmic:
  - dynamic clustering of streams
  - outlier detection
    - a stream that becomes less correlated with the other streams in its cluster
- Applications:
  - Data-intensive applications requiring correlation among many streams.
  - Network traffic monitoring:
    - An unusually high correlation between two links in a network might suggest an anomaly.
  - Medical time series:
    - A high correlation between two regions of the human brain during fMRI testing might suggest a functional connection.
  - A domain-specific definition of correlation might be more appropriate.
    - E.g., for fMRI time series, detrend before correlating.
