NSF CAREER Award 0237918 (IIS-0237918-001), University of California Riverside. Eamonn Keogh. Efficient Discovery of Previously Unknown Patterns and Relationships in Massive Time Series Databases.

Research Objectives:
To date, the vast majority of research on time series data mining has focused on similarity search and, to a lesser extent, on clustering. We believe that these problems should now be regarded as essentially solved. In particular, there are now fast exact techniques for searching and clustering patterns under both the Euclidean distance and Dynamic Time Warping (DTW), the two most useful distance measures. However, from a knowledge discovery viewpoint, there are several unsolved problems in time series data mining that are more interesting, important, and challenging. In this project, we address these problems. Our long-term goal is the creation of efficient algorithms that extract knowledge, in the form of patterns, anomalies, regularities and rules, from massive time series datasets.

Significant Results:
Our work has had a large impact on the state of the art in time series indexing and time series data mining. Some concrete examples follow. Our time series motif finding algorithm is being used by Celly and Zordan to find video textures, and by Tanaka and Uehara to find repeated motions in motion capture data. Our time series anomaly detection algorithm is being used by the Aerospace Corporation to monitor spacecraft telemetry, and by a joint Berkeley/Stanford group to monitor computer systems; a NASA white paper noted that "it has great promise for the future". Our time series indexing technique (LB_Keogh indexing) has been extended by us and by many others, including groups that use it for Euclidean indexing, subsequence matching, and indexing handwriting, multidimensional sequences, music and motion capture data.
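The two distance measures named above, and the lower-bounding idea behind LB_Keogh indexing, can be illustrated with a minimal pure-Python sketch. It assumes equal-length sequences and a Sakoe-Chiba warping band of half-width `r`; the function names are hypothetical, and `nn_search` is a plain linear scan that uses the bound to skip DTW computations, not the index structure described in the LB_Keogh work.

```python
import math
from typing import List, Sequence, Tuple


def euclidean(q: Sequence[float], c: Sequence[float]) -> float:
    """Euclidean distance between two equal-length sequences."""
    if len(q) != len(c):
        raise ValueError("sequences must have equal length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(q, c)))


def dtw(q: Sequence[float], c: Sequence[float], r: int) -> float:
    """DTW distance with warping limited to a Sakoe-Chiba band of half-width r."""
    n, m = len(q), len(c)
    if n != m:
        raise ValueError("this sketch assumes equal-length sequences")
    # cost[i][j]: cheapest cumulative squared cost of aligning q[:i] with c[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - r), min(m, i + r) + 1):
            d = (q[i - 1] - c[j - 1]) ** 2
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return math.sqrt(cost[n][m])


def lb_keogh(q: Sequence[float], c: Sequence[float], r: int) -> float:
    """Lower bound on dtw(q, c, r): how far c strays outside the envelope of q."""
    n = len(q)
    if len(c) != n:
        raise ValueError("sequences must have equal length")
    total = 0.0
    for i, x in enumerate(c):
        window = q[max(0, i - r): min(n, i + r + 1)]
        upper, lower = max(window), min(window)
        if x > upper:
            total += (x - upper) ** 2
        elif x < lower:
            total += (x - lower) ** 2
    return math.sqrt(total)


def nn_search(query: Sequence[float], candidates: List[Sequence[float]],
              r: int) -> Tuple[int, float]:
    """1-nearest-neighbour search under DTW, skipping candidates the bound rules out."""
    best_idx, best_dist = -1, math.inf
    for idx, cand in enumerate(candidates):
        if lb_keogh(query, cand, r) >= best_dist:
            continue  # dtw >= lower bound >= best so far, so this candidate cannot win
        d = dtw(query, cand, r)
        if d < best_dist:
            best_idx, best_dist = idx, d
    return best_idx, best_dist
```

For example, with `q = [0, 1, 2, 3, 2, 1, 0, 0]` and `c = [0, 0, 1, 2, 3, 2, 1, 0]` (the same shape shifted by one step), `dtw(q, c, 1)` is 0 because a band of width 1 absorbs the shift, while `euclidean(q, c)` is the square root of 6. Setting `r = 0` forbids warping, so DTW reduces to the Euclidean distance. The pruning in `nn_search` is safe because a lower bound never overestimates the true DTW distance.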
Our papers in this area have been referenced more than 1,000 times. We have been maintaining the UCR Time Series Data Mining Archive and have given test datasets and code to more than 400 research groups and individuals.

Approach:
Many high-level representations of time series have been proposed for data mining; the figure to the right shows a hierarchy of the various time series representations in the literature. One representation that the data mining community has not considered in detail is the discretization of the original data into symbolic strings. At first glance this seems a surprising oversight. There is an enormous wealth of existing algorithms and data structures that allow the efficient manipulation of strings. Such algorithms have received decades of attention in the text retrieval community, and more recent attention from the bioinformatics community. Some simple examples of “tools” that are not defined for real-valued sequences but are defined for symbolic data include hashing, Markov models, suffix trees and decision trees.

The core of our contributions is based on a new symbolic representation of time series called SAX (Symbolic Aggregate ApproXimation). Our representation is unique in that it allows dimensionality/numerosity reduction, and it also allows distance measures to be defined on the symbolic representation that lower bound the corresponding popular distance measures defined on the original data. As we have demonstrated, the latter feature is particularly exciting: because a lower bound never overestimates the true distance, it allows one to run certain data mining algorithms on the efficiently manipulated symbolic representation.
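The SAX pipeline and its lower-bounding distance can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the series length is divisible by the word length `w`, the alphabet size `a` is at most 26, and the breakpoints are the equiprobable cut points of the standard normal distribution computed with `statistics.NormalDist`. The function names are hypothetical and not taken from the authors' code; `mindist` follows the MINDIST definition used in the SAX literature.

```python
import bisect
import math
import statistics
from typing import List, Sequence


def znormalize(ts: Sequence[float]) -> List[float]:
    """Rescale to zero mean and unit (population) standard deviation."""
    mu = statistics.fmean(ts)
    sigma = statistics.pstdev(ts)
    if sigma < 1e-12:  # flat series: avoid dividing by zero
        return [0.0] * len(ts)
    return [(x - mu) / sigma for x in ts]


def paa(ts: Sequence[float], w: int) -> List[float]:
    """Piecewise Aggregate Approximation: mean of each of w equal-length segments."""
    n = len(ts)
    if n % w != 0:
        raise ValueError("this sketch assumes len(ts) is divisible by w")
    seg = n // w
    return [sum(ts[k * seg:(k + 1) * seg]) / seg for k in range(w)]


def gaussian_breakpoints(a: int) -> List[float]:
    """a - 1 cut points that split N(0, 1) into a equiprobable regions."""
    nd = statistics.NormalDist()
    return [nd.inv_cdf(k / a) for k in range(1, a)]


def sax_word(ts: Sequence[float], w: int, a: int) -> str:
    """Convert a series to a SAX word of length w over an alphabet of size a."""
    cuts = gaussian_breakpoints(a)
    # bisect_right counts the cut points <= x, giving a symbol index in 0..a-1
    return "".join(chr(ord("a") + bisect.bisect_right(cuts, x))
                   for x in paa(znormalize(ts), w))


def mindist(w1: str, w2: str, n: int, a: int) -> float:
    """Lower bound on the Euclidean distance between the z-normalized originals."""
    if len(w1) != len(w2):
        raise ValueError("words must have equal length")
    cuts = gaussian_breakpoints(a)
    total = 0.0
    for s1, s2 in zip(w1, w2):
        lo, hi = sorted((ord(s1) - ord("a"), ord(s2) - ord("a")))
        if hi - lo > 1:  # identical or adjacent symbols contribute zero
            total += (cuts[hi - 1] - cuts[lo]) ** 2
    return math.sqrt(n / len(w1)) * math.sqrt(total)
```

For `a = 4` the breakpoints are roughly -0.67, 0 and 0.67, so `sax_word(series, 8, 4)` turns a series of length 32 into an 8-letter word over `abcd`. Because `mindist` never exceeds the Euclidean distance between the z-normalized series, an algorithm can discard candidates using the short words alone and only fetch the raw data for those that survive.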
[Figure: hierarchy of time series representations, grouped into data adaptive, non-data adaptive, model-based and data-dictated representations. Representations shown include the Discrete Fourier Transform, Discrete Cosine Transform, Chebyshev polynomials, wavelets (Haar, Daubechies dbn with n > 1, Coiflets, Symlets; orthonormal and bi-orthonormal), Piecewise Aggregate Approximation, Piecewise Linear Approximation (interpolation and regression), Adaptive Piecewise Constant Approximation, Singular Value Approximation, random mappings, sorted coefficients, trees, Hidden Markov Models, statistical models, clipped data, and symbolic representations (natural language and strings, including Symbolic Aggregate Approximation and non-lower-bounding schemes).]

[Figure: the Winding dataset (the angular speed of reel 2), shown as an overview, a zoomed-in view and two detail panels, with regions labeled A, B and C.]

[Figure: normal versus surprising time series from a motion sequence: a normal sequence, and surprising segments in which the actor misses the holster, briefly swings the gun at the target but does not aim, and laughs while flailing a hand.]

Broader Impact:
We expect this work to have a broad impact because time series are ubiquitous, occurring in virtually every human endeavor, including medicine, finance and entertainment. The proposed work is also very general and has already made contributions to virtually every time series problem, including visualization (VLDB 2004 and SIGKDD 2004a), motif discovery (SIGKDD 2003), anomaly detection (SIGKDD 2004b) and indexing (DMKD 2003).

[Figure: SAX has made contributions to motif discovery (SIGKDD 2003), visualization (VLDB 2004 and SIGKDD 2004a), anomaly detection (SIGKDD 2004b), and indexing, classification and clustering (DMKD 2003).]
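Two of the problems listed above, motif discovery and anomaly detection, have simple brute-force definitions that make the tasks concrete. The sketch below is a hypothetical illustration, not the SAX-based algorithms the poster cites: it scans all pairs of length-`m` subsequences in O(N²) time, compares them with plain Euclidean distance without z-normalization, and treats overlapping subsequences as trivial matches to be ignored. All three choices are assumptions of this sketch.

```python
import math
from typing import Sequence, Tuple


def _dist(x: Sequence[float], y: Sequence[float]) -> float:
    """Euclidean distance between two equal-length subsequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def motif_pair(ts: Sequence[float], m: int) -> Tuple[float, int, int]:
    """Closest pair of non-overlapping length-m subsequences: (distance, i, j)."""
    count = len(ts) - m + 1
    best = (math.inf, -1, -1)
    for i in range(count):
        for j in range(i + m, count):  # j >= i + m rules out trivial (overlapping) matches
            d = _dist(ts[i:i + m], ts[j:j + m])
            if d < best[0]:
                best = (d, i, j)
    return best


def top_discord(ts: Sequence[float], m: int) -> Tuple[float, int]:
    """Subsequence whose nearest non-overlapping neighbour is farthest: (distance, i)."""
    count = len(ts) - m + 1
    best = (-1.0, -1)
    for i in range(count):
        nearest = min((_dist(ts[i:i + m], ts[j:j + m])
                       for j in range(count) if abs(i - j) >= m),
                      default=math.inf)
        if nearest != math.inf and nearest > best[0]:
            best = (nearest, i)
    return best
```

On a perfectly periodic series with one corrupted value, `motif_pair` finds two identical cycles at distance 0, and `top_discord` returns a window that covers the corrupted value, since every clean window has an exact non-overlapping match elsewhere. The quadratic scan is what makes these definitions impractical on massive datasets.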