The 3-class ECG problem: (left) the best clustering was our approach, the second best (right) was Euclidean distance.

Presentation transcript:

Created as in [1], except that the length is 1000 and the transition matrices are hmm1 = [.9 .1; .9 .1]; hmm2 = [.1 .9; .1 .9]. To convert to discrete data, our approach simply treats datapoints above the mean as 1s and all others as 0s.

[1] P. Smyth. Clustering sequences with hidden Markov models. In Advances in Neural Information Processing, volume 9, Cambridge, MA, MIT Press.
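The data generation and thresholding described on this slide can be sketched roughly as follows. This is an illustration only, not the authors' code: the slide gives just the two transition matrices, the length of 1000, and the above-the-mean rule. The Gaussian emission means, the unit standard deviation, the uniform initial state, and all function names are assumptions made for the example.

```python
import random

# Transition matrices as given on the slide (each row sums to 1).
HMM1 = [[0.9, 0.1], [0.9, 0.1]]
HMM2 = [[0.1, 0.9], [0.1, 0.9]]

# Assumption: state-dependent Gaussian emissions. These values are
# illustrative; the slide does not state the emission model.
EMISSION_MEANS = [0.0, 3.0]
EMISSION_STD = 1.0


def simulate_states(transition, length, rng):
    """Sample a hidden-state sequence from a row-stochastic transition matrix."""
    n_states = len(transition)
    state = rng.randrange(n_states)  # assumption: uniform initial state
    states = []
    for _ in range(length):
        states.append(state)
        # Draw the next state from the row belonging to the current state.
        state = rng.choices(range(n_states), weights=transition[state])[0]
    return states


def emit(states, rng):
    """Turn a state sequence into a real-valued series (assumed emissions)."""
    return [rng.gauss(EMISSION_MEANS[s], EMISSION_STD) for s in states]


def binarize_by_mean(series):
    """Map points above the series mean to 1 and all others to 0."""
    mean = sum(series) / len(series)
    return [1 if x > mean else 0 for x in series]


rng = random.Random(0)
series1 = emit(simulate_states(HMM1, 1000, rng), rng)
series2 = emit(simulate_states(HMM2, 1000, rng), rng)
bits1 = binarize_by_mean(series1)
bits2 = binarize_by_mean(series2)
```

Because the threshold is each series' own mean, the discretization needs no tuning parameter, which appears to be the point of the "simply" on the slide.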

DNA: Single Linkage. Leaf labels: Bluewhale, FinbackWhale, Cow, Cat, GreySeal, HarborSeal, Horse, WhiteRhino, HouseMouse, Rat, Chimpanzee, Pygmychimpanzee, Human, Gorilla, Orangutan, SumatranOrangutan, Gibbon, Opossum, Wallaroo, Platypus.

Panels: Our approach; Euclidean distance.
Cluster 1 (datasets 1-5): BIDMC Congestive Heart Failure Database (chfdb), record chf02. Start times at 0, 82, 150, 200, 250, respectively.
Cluster 2 (datasets 6-10): BIDMC Congestive Heart Failure Database (chfdb), record chf15. Start times at 0, 82, 150, 200, 250, respectively.
Cluster 3 (datasets 11-15): Long Term ST Database (ltstdb), record. Start times at 0, 50, 100, 150, 200, respectively.
Cluster 4 (datasets 16-20): MIT-BIH Noise Stress Test Database (nstdb), record 118e6. Start times at 0, 50, 100, 150, 200, respectively.

Datasets shown: L-1u, L-1n, L-1g, L-1t, L-1v. The results of using our algorithm on various datasets from the Aerospace Corp collection. The bolder the line, the stronger the anomaly. Note that because of the way we plotted these, there is a tendency to locate the beginning of the anomaly as opposed to the most anomalous part. These datasets all have their anomalies in the middle. To make sure that our code does not have a bias toward finding things anomalous in the center of a time series, we did experiments where we grafted the anomalies to different places. In all these cases we had equal success.
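The position check described above could be set up along these lines. The slide does not say how the grafting was done, so this sketch assumes the simplest option: the anomalous segment overwrites an equal-length stretch of the series at a chosen offset. The function name `graft` and the toy values are hypothetical.

```python
def graft(background, anomaly, position):
    """Return a copy of `background` with `anomaly` overwriting the
    samples that start at index `position` (assumed grafting scheme)."""
    if position < 0 or position + len(anomaly) > len(background):
        raise ValueError("anomaly does not fit at this position")
    return (
        background[:position]
        + anomaly
        + background[position + len(anomaly):]
    )


# Toy data: a flat series and a short anomalous bump.
background = [0.0] * 12
anomaly = [5.0, 6.0, 5.0]

# Place the same anomaly near the start, the middle, and the end, so a
# detector can be checked for any preference for the center.
variants = {pos: graft(background, anomaly, pos) for pos in (0, 4, 9)}
```

Each variant keeps the original length, so detector output on the variants can be compared position by position.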

Stephen Bay asked how our approach would handle random data and random walks. Here we use ten of each, plus some structured data as a sanity check.
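Test inputs of the two kinds mentioned here can be generated as in the following sketch. The slide gives only the count of ten of each. The series length, the Gaussian step distribution, and the function names are assumptions for illustration.

```python
import random
from itertools import accumulate

SERIES_LENGTH = 256  # assumption: the slide does not give a length


def random_series(length, rng):
    """Independent Gaussian samples (pure noise)."""
    return [rng.gauss(0.0, 1.0) for _ in range(length)]


def random_walk(length, rng):
    """Cumulative sum of independent Gaussian steps."""
    return list(accumulate(rng.gauss(0.0, 1.0) for _ in range(length)))


rng = random.Random(42)
random_data = [random_series(SERIES_LENGTH, rng) for _ in range(10)]
random_walks = [random_walk(SERIES_LENGTH, rng) for _ in range(10)]
```

A random walk is just the running total of a noise series, so the two families differ only in that one integration step.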

Similar to the previous experiment, but with just random and random walk data.

This image is in the paper; this is just a higher-resolution version. Annotations: Hand resting at side; Hand above holster; Aiming at target; Actor misses holster; Briefly swings gun at target, but does not aim; Laughing and flailing hand.