Jessica Lin, Eamonn Keogh, Stefano Lonardi

Presentation transcript:

Visualizing and Discovering Nontrivial Patterns in Large Time Series Databases
Jessica Lin, Eamonn Keogh, Stefano Lonardi
Information Visualization 2005, Vol. 4, No. 2
Presented by Nicholas Chen, Samah Ramadan

Time Series
What? Sequences of values or events that change with time.
Why? Applications:
  Medicine: ECG, EEG
  Finance: stock market, credit cards
  Aerospace: launch telemetry, satellite sensors
  Entertainment: music, movies

Data Mining Time Series
Why? Trend analysis, similarity search.
What tasks?
  Sequence matching: whole, subsequence, chunking
  Anomaly detection: deviation from normal
  Motif discovery: overrepresented patterns

VizTree
What? A visualization tool for time series data, based on subsequence trees.
How? Time series → symbolic representation → tree representation → VizTree

Previous Approaches
Cluster- and calendar-based visualization
  Time series → sequence of day patterns, grouped by a bottom-up clustering algorithm
  Limitations: needs calendar-patterned data and prior knowledge of the patterns

Previous Approaches (cont.)
Spiral
  Periodic section of time → one ring
  Data values → color and line thickness
  Limitations: the data must be periodic, with a known period

Previous Approaches (cont.)
TimeSearcher
  Query-by-content
  Flexible
  The user must specify query regions (and so must know what to look for)
  Scalability issues

VizTree Example
An interesting problem:
  Two sets of binary sequences of length 200 were generated
  One was generated by a computer's pseudo-random number generator
  The other was generated by hand by a group of volunteers

VizTree Example
Can you tell who generated which? VizTree can!
The slide shows subsequence tree representations for all sets of 3 digits in each sequence.
(Presenter note: be sure to point out how you get from the three-digit subsequences to the tree.)
Figure labels: Real random! Fake random!
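
A minimal sketch of the counting behind such a tree, assuming each sequence is given as a string of 0s and 1s. The function name `count_subsequences` and the toy input are illustrative only and do not come from the slides. A truly random sequence should give roughly equal counts for all eight 3-digit patterns, while a hand-made one tends to be skewed.

```python
from collections import Counter


def count_subsequences(bits: str, k: int = 3) -> Counter:
    """Count every length-k sliding window of a binary string."""
    return Counter(bits[i:i + k] for i in range(len(bits) - k + 1))


# Hypothetical toy input; the slides use sequences of length 200.
sequence = "0101101001"
counts = count_subsequences(sequence)
for pattern in sorted(counts):
    print(pattern, counts[pattern])
```

Each count would correspond to the thickness of one branch at depth 3 of the tree.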

Discretizing Time Series
Problem: most time series are not discrete
  Must convert real-valued data to symbols
Symbolic Aggregate approXimation (SAX)
  Lower-bounding symbolic space
  Feasible approximation for large databases
  Normalization before discretization (usually)

SAX
A subsequence C is extracted by a sliding window of length n
Each window is divided into w equal-sized regions
Average the data points in each region
The average will fall into one of α levels (α is the alphabet size)
A symbol corresponds to each level (a, b, c, d, ...)
Example: the window shown on the slide corresponds to a c d c b d b a
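
The steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not the VizTree implementation: the function names are made up, the window length must be a multiple of w, and the level boundaries are taken to be equiprobable breakpoints of a standard normal distribution, which the slide does not spell out.

```python
from statistics import NormalDist

import numpy as np


def sax_word(window, w, alpha):
    """Turn one window into a word of w symbols over an alphabet of size alpha."""
    x = np.asarray(window, dtype=float)
    if len(x) % w != 0:
        raise ValueError("this sketch needs len(window) to be a multiple of w")
    # Normalize the window (zero mean, unit variance) before discretizing.
    std = x.std()
    x = (x - x.mean()) / std if std > 0 else np.zeros_like(x)
    # Average the data points in each of the w equal-sized regions.
    region_means = x.reshape(w, -1).mean(axis=1)
    # Assumption: alpha - 1 breakpoints that split N(0, 1) into equal areas.
    breakpoints = [NormalDist().inv_cdf(i / alpha) for i in range(1, alpha)]
    levels = np.searchsorted(breakpoints, region_means)
    return "".join(chr(ord("a") + int(level)) for level in levels)


def sax_words(series, n, w, alpha):
    """Slide a window of length n over the series, one word per position."""
    return [sax_word(series[i:i + n], w, alpha)
            for i in range(len(series) - n + 1)]


print(sax_word([1, 1, 2, 2, 3, 3, 4, 4], w=4, alpha=4))  # -> abcd
```

A steadily rising window maps to "abcd" because its four region averages land in the four levels from lowest to highest.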

A Sample Tree
α^w leaves (independent of time series length)
α possible levels (branches per node)
w regions (depth of the tree)
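
A small sketch of how such a tree could be built from symbolic words, assuming a nested-dictionary layout that is chosen here for illustration only. Each node stores how many words pass through it, which is what branch thickness would encode. With α branches per node and depth w, the tree can never have more than α^w leaves, however long the time series is.

```python
def build_tree(words):
    """Insert each word into a trie; 'count' is the number of words through a node."""
    root = {"count": 0, "children": {}}
    for word in words:
        node = root
        node["count"] += 1
        for symbol in word:
            node = node["children"].setdefault(
                symbol, {"count": 0, "children": {}}
            )
            node["count"] += 1
    return root


def count_leaves(node):
    """Number of distinct root-to-leaf paths below this node."""
    if not node["children"]:
        return 1
    return sum(count_leaves(child) for child in node["children"].values())


# Hypothetical words with alphabet size 4 and word length 3.
words = ["abc", "abc", "abd", "cba"]
tree = build_tree(words)
alpha, w = 4, 3
print(count_leaves(tree), "leaves used out of at most", alpha ** w)
```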

VizTree in Action: Subsequence Matching Demo
Would like to find patterns that have certain characteristics
  Do this by specifying a path of nodes through the big tree
Demo
  Find certain patterns in heartbeat data
  Notice the interactive detail view

VizTree in Action II: Finding Motifs Demo
VizTree is very good at showing commonly occurring motifs
  Simply look at the thick branches
Demo
  Look at what most weeks look like in power consumption
  Can step through, or go directly to a motif
(Demo idea: use the power dataset and move progressively down the thick branches.)
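
Moving down the thick branches can be imitated in code by repeatedly picking the most frequent next symbol. The sketch below assumes the time series has already been turned into equal-length symbolic words; `thickest_path` and the sample words are hypothetical. Note that this greedy walk mirrors what a user does by eye and is not guaranteed to end at the single most frequent word.

```python
from collections import Counter


def thickest_path(words):
    """Follow the most frequent branch at each depth; return the word reached."""
    prefix = ""
    for depth in range(1, len(words[0]) + 1):
        # Count the one-symbol extensions of the current prefix.
        candidates = Counter(
            word[:depth] for word in words if word.startswith(prefix)
        )
        prefix = candidates.most_common(1)[0][0]
    return prefix


# Hypothetical words, one per sliding-window position.
words = ["abc", "abc", "abd", "cba", "abc"]
motif = thickest_path(words)
positions = [i for i, word in enumerate(words) if word == motif]
print(motif, positions)
```

The positions list plays the role of "stepping through" the occurrences of a motif.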

VizTree in Action III: Finding Simple Anomalies Demo
One way is to do the opposite of finding motifs
  Go through all the thin lines
Demo
  Locate weeks where power consumption is unusual
  Also, locate where the heartbeat is irregular
(Demo idea: show how we can find the Christmas holiday usage pattern, and also find heartbeat irregularities, by clicking on thin lines.)
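
Going through the thin lines amounts to listing the words that occur rarely, together with where they occur. A minimal sketch, assuming the series is already a list of symbolic words; the name `rare_words` and the threshold `max_count` are illustrative choices, not VizTree parameters.

```python
from collections import Counter


def rare_words(words, max_count=1):
    """Map each word seen at most max_count times to its window positions."""
    counts = Counter(words)
    return {
        word: [i for i, other in enumerate(words) if other == word]
        for word, count in counts.items()
        if count <= max_count
    }


# Hypothetical words, one per sliding-window position.
words = ["abc", "abc", "abd", "abc", "cba", "abc"]
print(rare_words(words))
```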

VizTree in Action IV: Diff-Tree Demo
For more complex anomalies
  Compare a time series against a reference time series
Three concepts
  Difference in frequencies (blue or green)
  Confidence (luminosity)
  Difference x Confidence = Surprisingness (red)
Demo
  Two similar data sets, find areas where they differ
  VizTree can rank surprisingness
(Demo idea: use the shuttle data. Show the glaring anomaly that we might find using the previous method. However, also show the second anomaly, which is much harder to locate.)
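
The slide gives only the product rule, Difference x Confidence = Surprisingness. The sketch below fills in the two factors with assumed definitions: difference is the change in relative frequency divided by the larger of the two frequencies, and confidence is that larger frequency divided by the highest frequency seen in either series. These formulas, the function name and the sample words are assumptions made for illustration and may differ from what VizTree computes. Both inputs must be non-empty.

```python
from collections import Counter


def surprisingness(ref_words, test_words):
    """Score every pattern by difference * confidence (assumed definitions)."""
    ref, test = Counter(ref_words), Counter(test_words)
    f_ref = {p: c / len(ref_words) for p, c in ref.items()}
    f_test = {p: c / len(test_words) for p, c in test.items()}
    top = max(max(f_ref.values()), max(f_test.values()))
    scores = {}
    for pattern in set(f_ref) | set(f_test):
        fr = f_ref.get(pattern, 0.0)
        ft = f_test.get(pattern, 0.0)
        larger = max(fr, ft)             # > 0: the pattern occurs somewhere
        difference = (ft - fr) / larger  # sign shows which series has more
        confidence = larger / top        # frequent patterns count for more
        scores[pattern] = difference * confidence
    return scores


# Hypothetical reference and test words.
reference = ["ab", "ab", "ab", "ba"]
test_series = ["ab", "ab", "bb", "bb"]
scores = surprisingness(reference, test_series)
for pattern in sorted(scores, key=lambda p: abs(scores[p]), reverse=True):
    print(pattern, round(scores[pattern], 3))
```

Sorting by absolute score gives a ranking of the kind the demo refers to.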

Numerosity Reduction
Fancy term for reducing the noise by removing trivial patterns
Consecutive windows are often similar or identical
  This results in overcounting and can obscure differences
Need to have bases covered: if you don't know how the data is segmented, you shouldn't set the windows to no overlap
Figure caption: all 3 windows will be "medium-low-high"
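
One simple way to realize this, sketched under the assumption that a word is recorded only when it differs from the word of the previous window. The function name and sample words are hypothetical, with m, l and h standing for medium, low and high.

```python
def numerosity_reduce(words):
    """Keep (position, word) only where the word differs from the previous one."""
    kept = []
    for position, word in enumerate(words):
        if not kept or kept[-1][1] != word:
            kept.append((position, word))
    return kept


# Hypothetical words from overlapping windows.
words = ["mlh", "mlh", "mlh", "hml", "hml", "mlh"]
print(numerosity_reduce(words))
# -> [(0, 'mlh'), (3, 'hml'), (5, 'mlh')]
```

The three identical leading windows are counted once, so a run of trivially similar windows no longer inflates a branch.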

VizTree Criticisms
While exploring VizTree to prepare the demos, we noticed a couple of issues:
  Atrocious time series UI
  Parameter values are somewhat of a black art
  Phase issues
  The hierarchical tree structure is somewhat misleading

VizTree
Advantages:
  Scalable to large data sets
  Good for the tasks it is designed for (finding motifs, anomaly detection, high-level sequence search)
Disadvantages:
  Not so good for other data mining tasks
  Not completely intuitive: you need to think in terms of the program
  Settings are arbitrary and dataset dependent