Published by Kaley Colley. Modified over 3 years ago.

1 StatStream: Statistical Monitoring of Thousands of Data Streams in Real Time
Pankaj Kumar Madhukar Rakesh Kumar Singh Puspendra Kumar
Project Instructor: Prof. P. K. Reddy

2 Goal
- Given tens of thousands of high-speed time series data streams, detect high-value correlations, both synchronized and time-lagged, over sliding windows in real time.
- Real time
  - high update frequency of the data streams
  - fixed response time, online
[Figure: example streams, annotated "Correlated!"]

3 Our approach
- Naive algorithm
  - N : number of streams
  - w : size of sliding window
  - space $O(N)$ and time $O(N^2 w)$ vs. space $O(N^2)$ and time $O(N^2)$
- Suppose that the streams are updated every second.
  - With a Pentium 4 PC, the exact computing method can monitor only 700 streams with a delay of 2 minutes.
- Our approach
  - Use the Discrete Fourier Transform to approximate correlation
  - Use a grid structure to filter out unlikely pairs
  - Our approach can monitor 10,000 streams with a delay of 2 minutes.
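To make the $O(N^2 w)$ cost of the naive algorithm concrete, here is a minimal sketch that recomputes the exact Pearson correlation of every pair of sliding windows from scratch. The function names and the "correlation at or above a threshold" criterion are assumptions made for illustration; the slides do not give the exact definition used.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Exact Pearson correlation of two equal-length windows, O(w) time."""
    w = len(x)
    mean_x = sum(x) / w
    mean_y = sum(y) / w
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # constant window: correlation is undefined, treated as 0 here
    return cov / math.sqrt(var_x * var_y)


def naive_correlated_pairs(windows, threshold):
    """Check all N(N-1)/2 pairs from scratch: O(N^2 * w) work per update."""
    return [
        (i, j)
        for i, j in combinations(range(len(windows)), 2)
        if pearson(windows[i], windows[j]) >= threshold  # assumed criterion
    ]


windows = [[1, 2, 3, 4], [2, 4, 6, 8], [4, 3, 2, 1]]
pairs = naive_correlated_pairs(windows, threshold=0.9)
```

Every new time point repeats this full pass, which is why the exact method stops scaling at a few hundred streams.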

4 Roadmap
- Goal
- StatStream
  - Data Structure
  - Correlation Approximation
  - Grid structure
- Empirical study
- Future work

5 Stream synoptic data structure
- Three-level time interval hierarchy
  - Time point, Basic window, Sliding window
- Basic window (the key to our technique)
  - The computation for basic window i must finish by the end of basic window i+1
  - The basic window time is the system response time.
- Digests
  - Sliding window digests: sum, DFT coefs
  - Basic window digests: sum, DFT coefs
[Figure: a sliding window divided into basic windows, each made up of time points; every basic window carries its own digests]
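The hierarchy above can be sketched as a small per-stream structure. This is only an illustration under stated assumptions: the class name `StreamDigest`, the helper `dft_coefs`, the unitary DFT scaling, and the use of a bounded deque are all hypothetical choices, and only the sliding-window sum is derived here (the slides do not say how the sliding-window DFT coefficients are combined).

```python
import cmath
from collections import deque


def dft_coefs(values, n_coefs):
    """First n_coefs DFT coefficients of one basic window (unitary scaling)."""
    b = len(values)
    return [
        sum(v * cmath.exp(-2j * cmath.pi * f * t / b) for t, v in enumerate(values))
        / b ** 0.5
        for f in range(n_coefs)
    ]


class StreamDigest:
    """Hypothetical digest store: time points -> basic windows -> sliding window."""

    def __init__(self, basic_window_size, num_basic_windows, n_coefs):
        self.b = basic_window_size
        self.n_coefs = n_coefs
        self.buffer = []  # raw time points of the basic window in progress
        # One (sum, DFT coefs) digest per basic window; the oldest digest is
        # dropped automatically when the sliding window moves forward.
        self.digests = deque(maxlen=num_basic_windows)

    def append(self, value):
        self.buffer.append(value)
        if len(self.buffer) == self.b:
            # Basic window complete: keep only its digest, discard raw points.
            self.digests.append(
                (sum(self.buffer), dft_coefs(self.buffer, self.n_coefs))
            )
            self.buffer = []

    def sliding_sum(self):
        """Sliding-window sum, assembled from the basic-window sums."""
        return sum(s for s, _ in self.digests)


digest = StreamDigest(basic_window_size=4, num_basic_windows=2, n_coefs=2)
for value in range(1, 13):  # three basic windows arrive; only the last two are kept
    digest.append(value)
```

Because digests are only produced when a basic window closes, results lag by at most one basic window, which matches the slide's statement that the basic window time is the response time.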

6 Roadmap
- Motivation and Goal
- Related work
- StatStream
  - Data Structure
  - Correlation Approximation
  - Grid structure
- Empirical study
- Future work

7 Synchronized Correlation Uses Basic Windows
- Inner product of aligned basic windows
[Figure: streams x and y over one sliding window, split into aligned basic windows]
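One way to read this slide: the inner product over the sliding window is the sum of the inner products of the aligned basic windows, so the correlation can be assembled from per-basic-window statistics. The sketch below uses exact per-window statistics rather than DFT digests, and the function names are assumptions made for illustration.

```python
import math


def basic_window_stats(x, y):
    """Per-basic-window statistics for one aligned pair of basic windows."""
    return (
        sum(x),
        sum(y),
        sum(a * a for a in x),
        sum(b * b for b in y),
        sum(a * b for a, b in zip(x, y)),  # inner product of the aligned windows
    )


def sliding_correlation(stats, w):
    """Pearson correlation over w time points, from basic-window statistics only."""
    sx, sy, sxx, syy, sxy = (sum(column) for column in zip(*stats))
    cov = sxy - sx * sy / w
    var_x = sxx - sx * sx / w
    var_y = syy - sy * sy / w
    if var_x <= 0 or var_y <= 0:
        return 0.0  # constant window: correlation is undefined, treated as 0 here
    return cov / math.sqrt(var_x * var_y)


x = [1, 2, 3, 4, 5, 6]
y = [2, 4, 6, 8, 10, 12]
b = 3  # basic window size; the sliding window holds two basic windows
stats = [basic_window_stats(x[i:i + b], y[i:i + b]) for i in range(0, len(x), b)]
r = sliding_correlation(stats, w=len(x))
```

When the sliding window advances, only the oldest basic window's tuple is dropped and one new tuple is added; the raw time points are never revisited.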

8
- Approximate with an orthogonal function family (e.g. DFT)
- Inner product of the time series ≈ inner product of the digests
- The time and space complexity is reduced from $O(b)$ to $O(n)$.
  - b : size of basic window
  - n : size of the digests (n < b)
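As a hedged illustration of approximating an inner product from digests: with a unitary DFT, Parseval's theorem gives the exact inner product as a sum over all b coefficients, so keeping only the first n yields an approximation that is good when the energy sits in the low frequencies. The helper names below are assumptions, and the factor of 2 relies on the conjugate symmetry of real-valued input with fewer than b/2 coefficients kept.

```python
import cmath
import math


def dft_coefs(values, n_coefs):
    """First n_coefs DFT coefficients of a window (unitary scaling)."""
    b = len(values)
    return [
        sum(v * cmath.exp(-2j * cmath.pi * f * t / b) for t, v in enumerate(values))
        / b ** 0.5
        for f in range(n_coefs)
    ]


def approx_inner_product(coefs_x, coefs_y):
    """Approximate <x, y> from truncated DFT digests in O(n) time.

    Assumes real-valued input and fewer than b/2 coefficients, so each kept
    non-zero frequency stands in for itself and its mirror image.
    """
    total = (coefs_x[0] * coefs_y[0].conjugate()).real
    total += 2 * sum(
        (cx * cy.conjugate()).real for cx, cy in zip(coefs_x[1:], coefs_y[1:])
    )
    return total


b, n = 16, 3
x = [3 + math.cos(2 * math.pi * t / b) for t in range(b)]
y = [1 + 2 * math.cos(2 * math.pi * t / b) for t in range(b)]
exact = sum(a * c for a, c in zip(x, y))
approx = approx_inner_product(dft_coefs(x, n), dft_coefs(y, n))
```

These two signals contain only frequencies 0 and 1, so three coefficients capture them completely and the approximation matches the exact value up to rounding; real data would leave a residual error from the discarded coefficients.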

9 Approximate lagged Correlation
- Inner product with unaligned windows
- The time complexity is reduced from $O(b)$ to $O(n^2)$, as opposed to $O(n)$ for synchronized correlation.
[Figure: unaligned basic windows within the sliding window]

11 Grid Structure (to avoid checking all pairs)
- The DFT coefficients yield a vector. High correlation => closeness in the vector space
  - We can use a grid structure and look in the neighborhood; this returns a superset of the highly correlated pairs.
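The neighborhood lookup can be sketched as follows. For series normalized to zero mean and unit norm, correlation equals $1 - d^2/2$, where $d$ is their Euclidean distance, so a correlation threshold maps to a distance radius. The function name, the choice of cell side equal to that radius, and the sample coordinates are assumptions for illustration; the slides do not specify these details.

```python
import math
from collections import defaultdict
from itertools import product


def grid_candidates(points, cell_size):
    """Return candidate pairs whose feature vectors fall in the same or adjacent cells.

    points maps a stream id to a tuple of coordinates (e.g. a few DFT
    coefficients). Any two points within cell_size of each other are
    guaranteed to be reported; some farther pairs are reported too.
    """
    grid = defaultdict(list)
    for stream_id, coords in points.items():
        cell = tuple(math.floor(c / cell_size) for c in coords)
        grid[cell].append(stream_id)

    pairs = set()
    for cell, members in grid.items():
        for offset in product((-1, 0, 1), repeat=len(cell)):
            neighbor = tuple(c + o for c, o in zip(cell, offset))
            # .get avoids creating empty cells while iterating over the grid
            for other in grid.get(neighbor, ()):
                for stream_id in members:
                    if stream_id < other:  # report each unordered pair once
                        pairs.add((stream_id, other))
    return pairs


points = {0: (1.2, 1.5), 1: (1.4, 1.1), 2: (9.0, 9.0), 3: (2.9, 1.0)}
candidates = grid_candidates(points, cell_size=1.0)
```

Stream 3 is more than one cell side away from stream 0 yet still shows up as a candidate, which illustrates why the result is a superset that needs a final check on each reported pair.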

13 Empirical Study
- Response time
  - Exact (naïve method): $T = k_0 b N^2$

14 Empirical Study
- DFT-grid:
  - Updating digests: $T_1 = k_1 b N$
  - Detecting correlation: $T_2 = k_2 N^2$
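Assuming the two DFT-grid phases run one after the other, so that the total time is $T_1 + T_2$ (the slides do not state this explicitly), the ratio to the exact method's $T = k_0 b N^2$ can be worked out as:

```latex
\frac{T}{T_1 + T_2}
  = \frac{k_0 b N^2}{k_1 b N + k_2 N^2}
  = \frac{k_0 b}{k_1 b / N + k_2}
  \;\longrightarrow\; \frac{k_0}{k_2}\, b
  \qquad (N \to \infty)
```

Under that assumption, the advantage grows roughly linearly with the basic window size b once N is large. The constants $k_0$, $k_1$, $k_2$ are not given in the slides.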

15 Empirical Study (cont.)
- Approximation errors
  - A larger digest size, a larger sliding window, and a smaller basic window give a better approximation
  - The approximation errors are small for the stock data.
- Precision: the quality of the grid structure

17 Future work
- Algorithmic:
  - dynamic clustering of streams
  - outlier detection
    - a stream that becomes less correlated with the other streams in its cluster
- Applications:
  - Data-intensive applications requiring correlation among many streams.
  - Network traffic monitoring:
    - An unusually high correlation between two links in a network might suggest an anomaly.
  - Medical time series:
    - A high correlation between two regions of the human brain during fMRI testing might suggest a functional connection.
  - Some domain-specific definition of correlation might be more appropriate.
    - E.g., in fMRI time series, detrending before correlating.
