
1 Fair Use Agreement
08/25/2004, KDD ‘04
This agreement covers the use of all slides on this CD-ROM; please read carefully.
– You may freely use these slides for teaching, if you send me an email telling me the class number/university in advance, and my name and email address appear on the first slide (if you are using all or most of the slides) or on each slide (if you are just taking a few slides).
– You may freely use these slides for a conference presentation, if you send me an email telling me the conference name in advance, and my name appears on each slide you use.
– You may not use these slides for tutorials, or in a published work (tech report/conference paper/thesis/journal etc.). If you wish to do this, email me first; it is highly likely I will grant you permission.
(c) Eamonn Keogh, eamonn@cs.ucr.edu

2 Visually Mining and Monitoring Massive Time Series
Jessica Lin*, Eamonn Keogh, Stefano Lonardi (UC Riverside)
Jeffrey Lankford, Donna Nystrom (The Aerospace Corp)

3 Motivation
Before the launch of any unmanned space vehicle, a critical “go/no-go” decision must be made.
– Data from past launches is available to assist in the decision-making.
– Streaming telemetry must be constantly monitored to detect any potential problems.
A single framework is needed to perform these two tasks.
– Existing tools are inadequate for such tasks.

4 Introduction
We introduce VizTree for:
– Mining archival data
  – Pattern discovery: repeated pattern discovery (motif discovery), anomaly detection, query-by-content
– Monitoring incoming streaming data
Why visualization?
– The human eye is often advocated as the ultimate data-mining tool.
– User interaction allows visual data exploration and hypothesis testing.

5 Outline
Introduction
Related Work
VizTree Motivation
VizTree Implementation
– Time Series Discretization
Experimental Evaluation
Diff Tree
Discussion/Conclusion

6 Related Work 1: Time Series Spirals
Spiral axis = serial attributes; radii = periodic attributes; values are encoded as line thickness.
Carlis & Konstan, UIST-98. Independently rediscovered by Weber, Alexa & Müller, InfoVis-01. But the idea dates back to 1888!
[Figure: one year of power demand data, labeled Monday through Sunday]

7 Related Work 2: TimeSearcher (Hochheiser and Shneiderman)
Comments:
– Simple and intuitive
– Highly dynamic exploration
– Query power may be limited and simplistic
– Limited scalability

8 Related Work 3: Calendar-based
Cluster- and calendar-based visualization of employee working-hours data. It shows six clusters, representing different working-day patterns.

9 Motivation of VizTree
Set 1:
10001000101001000101010100001
010100010101110111101011010010
11101001010100111010101010010
10010101011101010100101010101
101010100101100101110111101000
111000010100001001110101000111
00001010101100101110101
Set 2:
010110010111100110100100001000
101001101101011100001010101110
1111100011011011011111101001100
100100011010001111001101101000
101111000101101001101100110100
000010011000100111000001110100
1100101100001010010
Here are two sets of bit strings. Which set was generated by a human and which by a computer?

10 VizTree
[The same two sets of bit strings as on slide 9]
“humans usually try to fake randomness by alternating patterns”
Let's put the sequences into a depth-limited tree, such that the frequencies of all triplets are encoded in the thickness of the branches…
[Figure: depth-limited tree with branches labeled 0 and 1]
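A minimal Python sketch of the triplet tree, under our own assumptions: every overlapping window of three bits is counted, and each tree node is keyed by the bit prefix that leads to it. The helper name `triplet_tree` and the toy input are hypothetical, not the strings from the slide.

```python
from collections import Counter


def triplet_tree(bits: str, depth: int = 3) -> Counter:
    """Count how many length-`depth` windows pass through each tree node.

    A node is identified by its prefix ("0", "01", "010", ...); the count at
    a node is what the branch thickness would be proportional to.
    """
    counts: Counter = Counter()
    for start in range(len(bits) - depth + 1):
        window = bits[start:start + depth]
        for k in range(1, depth + 1):
            counts[window[:k]] += 1
    return counts


alternating = triplet_tree("0101010101")  # hypothetical "human-like" string
print(sorted((node, n) for node, n in alternating.items() if len(node) == 3))
# [('010', 4), ('101', 4)]
```

Only two of the eight possible triplets occur in the alternating string, so its tree would show two very thick leaf branches and six empty ones.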

11 VizTree
The “trick” on the previous slide only works for discrete data, but time series are real-valued. But we can SAX up a time series to make it discrete!
VizTree:
– Convert the time series to SAX.
– Push the data into a depth-limited suffix tree.
– Encode the frequencies as the line thickness.
[Figure panels: Overview, Details 1, Details 2]
Overview, zoom & filter, details on demand
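The three VizTree steps can be sketched in Python as follows, assuming the SAX words have already been computed (one per sliding-window position). `build_tree` and `line_widths` are hypothetical names, and scaling each branch linearly against the busiest branch at the same depth is our choice, not necessarily what VizTree does.

```python
from collections import defaultdict


def build_tree(words, depth):
    """Frequency of each node in a depth-limited tree of SAX words.

    A node is a prefix of length 1..depth; its frequency is the number of
    words whose path through the tree passes that node.
    """
    freq = defaultdict(int)
    for word in words:
        for k in range(1, min(depth, len(word)) + 1):
            freq[word[:k]] += 1
    return dict(freq)


def line_widths(freq, max_width=10.0):
    """Scale each node's frequency to a stroke width in (0, max_width]."""
    busiest = defaultdict(int)
    for node, count in freq.items():
        busiest[len(node)] = max(busiest[len(node)], count)
    return {node: max_width * count / busiest[len(node)]
            for node, count in freq.items()}


words = ["abca", "abcb", "acba", "abca"]  # hypothetical SAX words
tree = build_tree(words, depth=3)
print(tree)  # {'a': 4, 'ab': 3, 'abc': 3, 'ac': 1, 'acb': 1}
```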

12 SAX: Symbolic Aggregate approXimation
[Figure: a time series converted to the SAX word baabccbc]

13 How do we obtain SAX?
First convert the time series to a PAA (Piecewise Aggregate Approximation) representation, then convert the PAA coefficients to symbols. It takes linear time.
[Figure: time series C converted to the SAX word bccbaaba]
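A minimal sketch of the two steps in Python, assuming (as is usual for SAX) that each series is z-normalized first and that its length is a multiple of the number of segments. Breakpoints come from the standard normal distribution via `statistics.NormalDist`; the function names are our own.

```python
from statistics import NormalDist, fmean, pstdev

ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def znormalize(series):
    """Rescale to mean 0 and standard deviation 1 (flat series map to zeros)."""
    mu, sigma = fmean(series), pstdev(series)
    if sigma == 0:
        return [0.0] * len(series)
    return [(x - mu) / sigma for x in series]


def paa(series, n_segments):
    """Piecewise Aggregate Approximation: mean of each equal-width frame.

    Assumes len(series) is divisible by n_segments.
    """
    frame = len(series) // n_segments
    return [fmean(series[i * frame:(i + 1) * frame]) for i in range(n_segments)]


def breakpoints(alphabet_size):
    """Cut points giving equiprobable regions under N(0, 1)."""
    normal = NormalDist()
    return [normal.inv_cdf(i / alphabet_size) for i in range(1, alphabet_size)]


def sax(series, n_segments, alphabet_size):
    """Map each PAA coefficient to a letter by counting breakpoints below it."""
    cuts = breakpoints(alphabet_size)
    symbols = []
    for value in paa(znormalize(series), n_segments):
        index = sum(value > cut for cut in cuts)
        symbols.append(ALPHABET[index])
    return "".join(symbols)


print(sax([1, 2, 3, 4, 5, 6, 7, 8], n_segments=4, alphabet_size=4))  # abcd
```

Normalization and PAA are single passes over the series, and the symbol lookup costs a constant per segment for a fixed alphabet, which is consistent with the linear-time claim.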

14 Time series subsequences tend to have an approximately Gaussian distribution
Why a Gaussian?
[Figure: a normal probability plot of the (cumulative) distribution of values from subsequences of length 128]
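Because z-normalized subsequence values are close to N(0, 1), the SAX breakpoints $\beta_1 < \dots < \beta_{a-1}$ for an alphabet of size $a$ can be placed so that every symbol is equally likely. A worked form, with $\Phi$ the standard normal CDF and $\bar{c}_j$ the $j$-th PAA coefficient:

```latex
\beta_i = \Phi^{-1}\!\left(\frac{i}{a}\right), \quad i = 1, \dots, a-1,
\qquad
\bar{c}_j \mapsto k\text{-th symbol} \iff \beta_{k-1} < \bar{c}_j \le \beta_k,
\quad \beta_0 = -\infty,\ \beta_a = +\infty.
```

For $a = 3$ this gives $\beta \approx (-0.43,\ 0.43)$; for $a = 4$, $\beta \approx (-0.67,\ 0,\ 0.67)$.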

15 Visual Comparison
A raw time series of length 128 is transformed into the word “aaaaaabbbccdeffdcbbdcdefffffdccbb.”
– We can afford to use more symbols than real-valued coefficients, since each symbol requires fewer bits than a real number (float, double).
[Figure: SAX compared with DFT, PLA, Haar and APCA; y-axis from -3 to 3, SAX symbols a through f]
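A back-of-the-envelope check of the storage argument, assuming symbols are bit-packed with ⌈log2 a⌉ bits each and real-valued coefficients are stored as 64-bit doubles (both are our assumptions; the helper names are hypothetical):

```python
import math


def bits_per_symbol(alphabet_size: int) -> int:
    """Bits needed to store one symbol from an alphabet of this size."""
    return math.ceil(math.log2(alphabet_size))


def symbols_in_budget(n_coefficients: int, alphabet_size: int,
                      bits_per_real: int = 64) -> int:
    """How many SAX symbols fit in the space of n real-valued coefficients."""
    return (n_coefficients * bits_per_real) // bits_per_symbol(alphabet_size)


# Six symbols (a to f) need 3 bits each, so the space of just four doubles
# holds 85 symbols.
print(bits_per_symbol(6), symbols_in_budget(4, 6))  # 3 85
```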

16 Subsequence Matching/Motif Discovery
This example demonstrates subsequence matching and motif discovery. We want to find a U-shaped pattern, so we try something that starts high, descends, and then ascends again. Clicking on “abdb” shows such patterns.
Overview, zoom & filter, details on demand (Ben Shneiderman)
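Clicking a branch selects every subsequence whose SAX word follows that path. A hedged Python sketch, assuming the words for all sliding-window offsets are already available and reading the “x” in labels such as “abxx” as a wildcard for any symbol (our interpretation); `match_offsets` is a hypothetical helper.

```python
def match_offsets(words, pattern, wildcard="x"):
    """Offsets of the words that agree with `pattern` symbol by symbol.

    `wildcard` matches any symbol, so "abxx" selects the whole "ab" subtree.
    Assumes the SAX alphabet itself never uses the wildcard letter.
    """
    return [offset for offset, word in enumerate(words)
            if len(word) == len(pattern)
            and all(p == wildcard or p == s for p, s in zip(pattern, word))]


words = ["abdb", "abca", "abdb", "cdba"]  # hypothetical SAX words
print(match_offsets(words, "abdb"))  # [0, 2]
print(match_offsets(words, "abxx"))  # [0, 1, 2]
```

The returned offsets are what a viewer would use to highlight the matching subsequences in the raw series.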

17 Motif Discovery
Clicking on “abxx” shows these repeated patterns.
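Thick branches correspond to frequent words, so one simple way to rank candidate motifs is by word count. A sketch under the same assumption of precomputed SAX words; `top_motifs` is a hypothetical name.

```python
from collections import Counter


def top_motifs(words, k=1):
    """The k most frequent SAX words with their counts and offsets."""
    ranked = Counter(words).most_common(k)
    return [(word, count, [i for i, w in enumerate(words) if w == word])
            for word, count in ranked]


words = ["abca", "abca", "bbcd", "abca", "ccda"]  # hypothetical SAX words
print(top_motifs(words))  # [('abca', 3, [0, 1, 3])]
```

Offsets 0 and 1 are adjacent windows, which are often trivial matches of one another; numerosity reduction (slide 22) is one way to suppress them.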

18 Anomaly Detection 1
Clicking on the branch “acxx” shows the anomalous heartbeat.
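Conversely, thin branches flag rare patterns. A sketch that reports the depth-2 branches (such as “ac”, the branch behind “acxx”) used by at most `max_count` words; the threshold and the helper name are hypothetical.

```python
from collections import defaultdict


def rare_branches(words, depth=2, max_count=1):
    """Map each rarely used prefix of length `depth` to the offsets using it."""
    offsets = defaultdict(list)
    for i, word in enumerate(words):
        offsets[word[:depth]].append(i)
    return {prefix: where for prefix, where in offsets.items()
            if len(where) <= max_count}


words = ["abca", "abcb", "acdd", "abba"]  # hypothetical SAX words
print(rare_branches(words))  # {'ac': [2]}
```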

19 Anomaly Detection 2
Clicking on “bab” shows the anomalous week (Christmas). Instead of a normal week with five working days, it has only three working days because of Christmas.

20 Diff Tree
DiffTree:
– Convert the two time series to SAX.
– Push the data into a depth-limited suffix tree.
– Encode the difference of frequencies as the line thickness.
– Encode the significance of the difference as the line color intensity.
– Rank the surprising patterns.
Blue lines – pattern is more common in A
Green lines – pattern is more common in B
Red lines – surprising patterns
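A rough Python sketch of the DiffTree comparison. We assume frequencies are normalized by the number of words in each series, so that A and B of different lengths are comparable, and we rank nodes simply by the absolute frequency difference. The slides' significance measure (color intensity) and the red “surprising” category are not reproduced, the grey label for equal frequencies is our own, and `diff_tree` is a hypothetical name.

```python
from collections import Counter


def relative_freq(words, depth):
    """Fraction of words that pass through each node at the given depth."""
    counts = Counter(word[:depth] for word in words)
    total = sum(counts.values())
    return {node: n / total for node, n in counts.items()}


def diff_tree(words_a, words_b, depth):
    """Rank nodes by |freq_A - freq_B|, largest difference first."""
    fa, fb = relative_freq(words_a, depth), relative_freq(words_b, depth)
    rows = []
    for node in set(fa) | set(fb):
        diff = fa.get(node, 0.0) - fb.get(node, 0.0)
        color = "blue" if diff > 0 else "green" if diff < 0 else "grey"
        rows.append((node, diff, color))
    # Break ties alphabetically so the output order is deterministic.
    rows.sort(key=lambda row: (-abs(row[1]), row[0]))
    return rows


a = ["aa", "aa", "aa", "ab"]  # hypothetical SAX words from series A
b = ["ab", "ab", "ba", "ba"]  # hypothetical SAX words from series B
for row in diff_tree(a, b, depth=2):
    print(row)
# ('aa', 0.75, 'blue')
# ('ba', -0.5, 'green')
# ('ab', -0.25, 'green')
```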

21 Diff Tree 2

22 Scalability
The pixel space of the tree is determined solely by the number of segments and the alphabet size.
– It is constant and independent of the size of the time series.
– The size of the dataset affects memory usage, since each node in the tree stores the offsets of its subsequences. However, SAX allows efficient numerosity reduction, which reduces the number of subsequences inserted into the tree.
Large amounts of dimensionality reduction do not greatly affect the accuracy of our results (for the power dataset, the dimensionality is reduced from 672 to 3, a compression ratio of 224-to-1).
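Numerosity reduction, as used with SAX, records a subsequence's offset only when its word differs from the word of the immediately preceding window, since neighboring windows of a smooth series usually map to the same word. A sketch of a node index built that way; the helper name is hypothetical.

```python
from collections import defaultdict


def index_offsets(words):
    """Map each SAX word to its offsets, skipping consecutive repeats."""
    index = defaultdict(list)
    previous = None
    for offset, word in enumerate(words):
        if word != previous:  # a new run starts here; record it
            index[word].append(offset)
        previous = word
    return dict(index)


words = ["aab", "aab", "aab", "abb", "aab"]  # hypothetical SAX words
index = index_offsets(words)
print(index)  # {'aab': [0, 4], 'abb': [3]}
print(sum(len(v) for v in index.values()), "offsets stored instead of", len(words))
# 3 offsets stored instead of 5
```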

23 Conclusion
We propose VizTree, a novel time series visualization tool for pattern discovery:
– Frequently occurring patterns
– Anomaly detection
– Query-by-content
A single framework that allows both mining of archival data and monitoring of streaming data.
Highly scalable.

24 Future Work
Researchers in other sectors of industry could also benefit from our system.
– For example, it could potentially be used for indexing and editing video sequences.
Problems that can be indirectly solved:
– Subsequence clustering
– Time series rule discovery
While we mainly focus on the “mining” aspect in this paper, we will extend VizTree to accept online streaming data for monitoring purposes.

