
1 Time Series Bitmap Experiments
This file contains full color, large scale versions of the experiments shown in the paper, and additional experiments which were omitted because of space constraints. Note that in every case, all the data is freely available.

2 Dataset 1: Heterogeneous Data, Part 1
The clustering achieved on 15 pairs of samples from 15 diverse datasets. The red lines in the dendrogram draw attention to objectively incorrect subtrees.
Parameters: Level 3, N = 100, n = 10
Dendrogram leaf order: 1 2 17 18 27 28 29 30 3 4 5 6 21 22 7 8 9 10 11 12 13 14 15 16 19 20 26 25 23 24
Data Key:
1 MotorCurrent:
2 MotorCurrent:
3 Video Surveillance: Ann, gun
4 Video Surveillance: Ann, no gun
5 Video Surveillance: Eamonn, gun
6 Video Surveillance: Eamonn, no gun
7 Power Demand: Jan-March (Italian)
8 Power Demand: April-June (Italian)
9 Great Lakes (Erie)
10 Great Lakes (Ontario)
11 Buoy Sensor: North Salinity
12 Buoy Sensor: East Salinity
13 Koski ECG: slow 1
14 Koski ECG: slow 2
15 Koski ECG: fast 1
16 Koski ECG: fast 2
17 Exchange Rate: Swiss Franc
18 Exchange Rate: German Mark
19 Furnace: heating input
20 Furnace: cooling input
21 Reel 2: angular speed
22 Reel 2: tension
23 Balloon1
24 Balloon2 (lagged)
25 Evaporator: feed flow
26 Evaporator: vapor flow
27 Shuttle Inertia Sensor X
28 Shuttle Inertia Sensor X
29 Shuttle Inertia Sensor Z
30 Shuttle Inertia Sensor Z
Data is in ASCII file “time_series_bitmap_1”.
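The slides list the parameters Level, N and n without defining them. The sketch below shows one way such a bitmap could be computed, under stated assumptions taken from the usual description of time series bitmaps rather than from these slides: N is a sliding-window length, n is the number of SAX symbols per window, the alphabet has four symbols, and Level is the length of the subwords that are counted. The function names and the choice to normalize by the largest count are illustrative, not the authors' code.

```python
import math
from collections import Counter
from itertools import product

ALPHABET = "abcd"
# Breakpoints that split a standard normal into four equiprobable regions (standard SAX table).
BREAKPOINTS = (-0.6745, 0.0, 0.6745)

def znorm(xs):
    """Rescale a window to zero mean and unit variance (flat windows become all zeros)."""
    mean = sum(xs) / len(xs)
    std = math.sqrt(sum((x - mean) ** 2 for x in xs) / len(xs))
    if std < 1e-12:
        return [0.0] * len(xs)
    return [(x - mean) / std for x in xs]

def sax_word(window, n):
    """Turn one window into an n-symbol word: z-normalize, average n segments, discretize."""
    z = znorm(window)
    N = len(z)
    word = []
    for i in range(n):
        lo, hi = i * N // n, (i + 1) * N // n
        seg_mean = sum(z[lo:hi]) / (hi - lo)
        word.append(ALPHABET[sum(seg_mean > b for b in BREAKPOINTS)])
    return "".join(word)

def bitmap(series, level, N, n):
    """Count every subword of length `level` over all sliding windows (assumed definition)."""
    counts = Counter()
    for start in range(len(series) - N + 1):
        word = sax_word(series[start:start + N], n)
        for j in range(n - level + 1):
            counts[word[j:j + level]] += 1
    cells = ["".join(p) for p in product(ALPHABET, repeat=level)]
    peak = max(counts.values(), default=0)
    # Assumption: scale by the largest count so every cell lies in [0, 1].
    return [counts[c] / peak if peak else 0.0 for c in cells]

def bitmap_distance(a, b):
    """Euclidean distance between two bitmaps of the same level."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Synthetic stand-ins for two series; they are not taken from the Data Key above.
slow = [math.sin(0.1 * i) for i in range(300)]
fast = [math.sin(0.45 * i) for i in range(300)]
bm_slow = bitmap(slow, level=3, N=100, n=10)
bm_fast = bitmap(fast, level=3, N=100, n=10)
print(len(bm_slow), bitmap_distance(bm_slow, bm_fast))
```

A pairwise matrix of such distances is the kind of input a hierarchical clustering routine would need to draw a dendrogram like the one on this slide.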

3 Dataset 1: Heterogeneous Data, Part 2
If we do the clustering with only level 2 information, the clustering is very slightly worse, but still quite robust considering that we are only using 1.6% of the information available in the time series.
Parameters: Level 2, N = 100, n = 10
Dendrogram leaf order: 1 2 29 30 21 22 5 6 3 4 9 10 11 12 17 18 28 27 7 8 13 14 15 16 19 26 20 25 23 24
Data Key: the same 30 series as in Part 1.

4 Dataset 1: Heterogeneous Data, Part 3
Changing the parameters by up to 50% either way has little effect on the quality of the clustering. Here are some random examples.
Dendrogram leaf order: 1 2 17 18 27 28 29 30 9 10 11 12 21 22 3 4 5 6 13 14 15 16 7 8 19 20 26 25 23 24
Parameters: Level 3, N = 64, n = 8
Dendrogram leaf order: 1 2 17 18 30 29 27 28 3 4 9 10 11 12 21 22 5 6 7 8 13 14 15 16 19 20 26 25 23 24
Parameters: Level 3, N = 77, n = 11
Dendrogram leaf order: 1 2 17 18 27 28 29 30 3 4 5 6 21 22 7 8 9 10 11 12 13 14 15 16 19 20 26 25 23 24
Parameters: Level 3, N = 54, n = 9
Data Key: the same 30 series as in Part 1.

5 Dataset 1: Heterogeneous Data, Part 4
We compared our approach to a Markov model based approach and an ARIMA based approach. For both competitors we spent one hour of human time trying to find the best parameters.
Dendrogram leaf orders (three panels, in the order they appear in the transcript):
1 28 2 17 18 27 5 6 21 22 29 30 3 4 7 8 9 10 12 11 13 16 15 14 19 23 24 26 20 25
1 17 18 28 2 5 6 22 3 4 10 21 13 14 15 16 7 11 9 19 8 12 20 25 26 23 24 27 29 30
1 2 17 18 27 28 29 30 3 4 5 6 21 22 7 8 9 10 11 12 13 14 15 16 19 20 26 25 23 24
Panel labels: Parameters: Level 3, N = 100, n = 10; Segmental Markov model [1]; Mixtures of ARMA models [2]
Data Key: the same 30 series as in Part 1.

6 Dataset 2: Homogeneous Data, Part 1
Here we cluster 5 randomly chosen subsections from each of 4 different ECG datasets.
Parameters: Level 3, N = 50, n = 10
Dendrogram leaf order: 1 2 3 4 5 11 13 12 14 15 6 9 10 7 8 16 18 17 19 20
Data Key:
Cluster 1 (datasets 1 to 5): BIDMC Congestive Heart Failure Database (chfdb), record chf02. Start times at 0, 82, 150, 200, 250, respectively.
Cluster 2 (datasets 6 to 10): BIDMC Congestive Heart Failure Database (chfdb), record chf15. Start times at 0, 82, 150, 200, 250, respectively.
Cluster 3 (datasets 11 to 15): Long Term ST Database (ltstdb), record 20021. Start times at 0, 50, 100, 150, 200, respectively.
Cluster 4 (datasets 16 to 20): MIT-BIH Noise Stress Test Database (nstdb), record 118e6. Start times at 0, 50, 100, 150, 200, respectively.

7 Dataset 2: Homogeneous Data, Part 2
The bitmap approach is still defined (and very robust) when the time series are of different lengths.
Dendrogram leaf order: 1 5 2 3 4 11 13 12 14 15 16 20 18 19 17 6 9 10 7 8

8 Dataset 3: MIT ECG Arrhythmia Data, Part 1
In Ge and Smyth 2000, this dataset was explored with segmental hidden Markov models. After carefully adjusting the parameters, they reported 98% classification accuracy. Using time series bitmaps with virtually any parameter settings, we get perfect classification and clustering. We can get perfect classification using one nearest neighbor classification, or we can project the data into two-dimensional space (see next slide) and get perfect accuracy using a simple linear classifier, a decision tree or SVD. (Dataset donated by Padhraic Smyth and Seyoung Kim)
Dendrogram leaf order: 1 25 9 24 14 28 8 12 15 13 27 2 3 26 7 5 19 17 18 22 23 20 6 10 11 16 21 4 29 55 32 38 50 40 36 44 52 56 30 34 39 41 31 43 33 37 53 35 45 42 51 46 48 49 47 54
Parameters: Level 1, N = 60, n = 12
Dendrogram leaf order: 1 2 28 7 19 15 3 10 12 25 4 16 9 20 26 14 27 17 24 5 8 22 29 36 6 13 21 11 18 23 34 41 30 39 31 37 35 44 53 46 52 48 50 49 56 32 40 42 45 38 43 55 33 54 47 51
Segmental Markov model [1]
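As a rough illustration of the one nearest neighbor classification mentioned above, the following sketch scores a set of feature vectors by leave-one-out 1-NN accuracy. The leave-one-out protocol is an assumption (the slide does not say how accuracy was measured), and the four toy vectors and their labels "A" and "B" are made up for the example; in practice each vector would be one series' bitmap.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def loo_1nn_accuracy(vectors, labels):
    """Classify each vector by the label of its nearest other vector; return the hit rate."""
    correct = 0
    for i, v in enumerate(vectors):
        others = (j for j in range(len(vectors)) if j != i)
        nearest = min(others, key=lambda j: euclidean(v, vectors[j]))
        correct += labels[nearest] == labels[i]
    return correct / len(vectors)

# Hypothetical Level 1 bitmaps (four cells each) for two classes.
toy_vectors = [
    [0.9, 0.1, 0.0, 0.2],
    [1.0, 0.2, 0.1, 0.1],
    [0.1, 0.9, 1.0, 0.0],
    [0.0, 1.0, 0.8, 0.1],
]
toy_labels = ["A", "A", "B", "B"]
accuracy = loo_1nn_accuracy(toy_vectors, toy_labels)
print(accuracy)
```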

9 Dataset 3: MIT ECG Arrhythmia Data, Part 2
Parameters: Level 1, N = 60, n = 12
[Figure: the 56 series, labeled 1 to 56, plotted in two dimensions; axis ticks run from 0.35 to 0.55]

10 Dataset 4: MotorCurrent, Part 1
The bitmap approach is completely phase independent, which may be useful for certain datasets. Consider the MotorCurrent dataset (donated by Richard J. Povinelli). Here the problem is to distinguish between normal motor operations and “broken connectors”. If we attempt to cluster this dataset with Euclidean distance or DTW, the fact that the samples are out of phase confuses the algorithm (far left). The bitmap approach, however, can easily produce objectively correct clusterings. In this problem the time series bitmaps are very similar between classes, and humans will find it hard to distinguish them. Nevertheless, there is enough information to achieve correct clusterings.
Dendrogram leaf orders (two panels, in the order they appear in the transcript):
1 14 3 16 18 5 7 11 8 10 12 6 15 13 20 2 4 17 19 9 21 34 32 23 36 25 38 27 40 30 22 24 37 26 39 31 33 35 28 29
1 16 6 11 21 36 31 26 4 19 14 9 24 39 29 34 2 17 12 7 22 37 27 32 5 20 10 15 25 40 35 30 3 18 8 13 23 38 33 28
Panel label: Euclidean Distance
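The phase independence claim can be illustrated with a toy example. The sketch below is a simplified stand-in for the bitmap pipeline, not the authors' implementation: it discretizes each z-normalized point directly into one of four symbols (no sliding window or segment averaging) and compares subword frequencies. The synthetic sine waves, the 13-sample shift and all function names are assumptions made for the illustration.

```python
import math
from collections import Counter

BREAKPOINTS = (-0.6745, 0.0, 0.6745)  # four equiprobable regions of a standard normal

def symbolize(series):
    """Map every point of the z-normalized series to one of the symbols a, b, c, d."""
    mean = sum(series) / len(series)
    std = math.sqrt(sum((x - mean) ** 2 for x in series) / len(series))
    return "".join("abcd"[sum((x - mean) / std > b for b in BREAKPOINTS)] for x in series)

def subword_freqs(symbols, level):
    """Relative frequency of every subword of length `level`."""
    counts = Counter(symbols[i:i + level] for i in range(len(symbols) - level + 1))
    total = sum(counts.values())
    return {word: c / total for word, c in counts.items()}

def freq_distance(f, g):
    """Euclidean distance between two frequency tables."""
    return math.sqrt(sum((f.get(k, 0.0) - g.get(k, 0.0)) ** 2 for k in set(f) | set(g)))

def euclidean(a, b):
    """Point-by-point Euclidean distance between two raw series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

period, shift = 40, 13
base = [math.sin(2 * math.pi * (i + 0.5) / period) for i in range(400)]
shifted = [math.sin(2 * math.pi * (i + shift + 0.5) / period) for i in range(400)]

raw_d = euclidean(base, shifted)
bitmap_d = freq_distance(subword_freqs(symbolize(base), 2),
                         subword_freqs(symbolize(shifted), 2))
print(raw_d, bitmap_d)
```

The two series have the same shape and differ only in phase, so the point-by-point distance is large while the subword frequencies are almost identical. This is the effect the slide describes for the out-of-phase MotorCurrent samples.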

11 More information about the Kalpakis_ECG demonstration
This drawing shows the correlation of muscle depolarization and ECG tracings at corresponding times. Phase 0 denotes ventricular depolarization. This is seen on the ECG as the beginning of the QRS complex. Phase 1 denotes the initial rapid repolarization due to closing of fast sodium channels. This is seen as the large drop in mV on the ECG. Phase 2 represents the plateau stage during which inflow and outflow currents are balanced. The ECG returns to baseline. Phase 3 is repolarization. Potassium channels open and calcium channels close. The ECG shows the repolarizing T wave. Phase 4 is the recovery phase. Both the muscle tracing and ECG return to baseline levels.
[Figure: muscle tracing and ECG tracing, each on a horizontal axis from 0 to 500, with the regions labeled ventricular depolarization, initial rapid repolarization, “plateau” stage, repolarization, recovery phase]
Dendrogram leaf order: 1 9 43 22 23 37 19 41 29 28 44 45 47 46 48 2 4 5 35 13 39 11 14 3 8 12 25 17 40 42 6 18 7 10 30 21 32 38 36 34 16 33 31 15 24 26 27 20
A clustering of a subset of the Kalpakis_ECG dataset. Note that while ECGs have incredible variability, the 5 non-ECGs clearly stand out in the bitmap representation.

12 Anomaly detection: MITdb/210 Dataset
http://www.physionet.org/cgi-bin/chart?database=mitdb&record=210&annotator=atr&tstart=21&width=small
[Figure: ECG trace and its Anomaly Score; time axis ticks at 0:19, 0:21, 0:23, 0:25, 0:27; annotation: Fusion of ventricular and normal beat]
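The slide shows an anomaly score but does not say how it is computed. One plausible reading, sketched here purely as an assumption, is that the score at time t is the distance between the bitmap of the window just before t and the bitmap of the window just after t. As a simplification, the code discretizes individual z-normalized points instead of running the full sliding-window SAX step, and it uses a synthetic sine wave with a flat segment planted at samples 300 to 339 rather than the MITdb/210 record.

```python
import math
from collections import Counter

BREAKPOINTS = (-0.6745, 0.0, 0.6745)  # four equiprobable regions of a standard normal

def symbolize(series):
    """Map every point of the z-normalized series to one of the symbols a, b, c, d."""
    mean = sum(series) / len(series)
    std = math.sqrt(sum((x - mean) ** 2 for x in series) / len(series))
    return "".join("abcd"[sum((x - mean) / std > b for b in BREAKPOINTS)] for x in series)

def subword_freqs(symbols, level):
    """Relative frequency of every subword of length `level`."""
    counts = Counter(symbols[i:i + level] for i in range(len(symbols) - level + 1))
    total = sum(counts.values())
    return {word: c / total for word, c in counts.items()}

def freq_distance(f, g):
    """Euclidean distance between two frequency tables."""
    return math.sqrt(sum((f.get(k, 0.0) - g.get(k, 0.0)) ** 2 for k in set(f) | set(g)))

def anomaly_scores(series, window, level):
    """Score each time t by how much the window after t differs from the window before t."""
    symbols = symbolize(series)
    scores = {}
    for t in range(window, len(symbols) - window + 1):
        lag = subword_freqs(symbols[t - window:t], level)
        lead = subword_freqs(symbols[t:t + window], level)
        scores[t] = freq_distance(lag, lead)
    return scores

# Synthetic periodic signal with a planted anomaly (one period replaced by a flat segment).
signal = [math.sin(2 * math.pi * (i + 0.5) / 40) for i in range(600)]
signal[300:340] = [0.2] * 40

scores = anomaly_scores(signal, window=40, level=2)
peak_t = max(scores, key=scores.get)
print(peak_t, round(scores[peak_t], 3))
```

In regular stretches the two windows hold the same pattern and the score stays near zero. The score rises where one window covers the planted segment and the other does not.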

