A Multiresolution Symbolic Representation of Time Series
Vasileios Megalooikonomou, Qiang Wang, Guo Li, Christos Faloutsos
Presented by Rui Li

Abstract
Introduces a new representation of time series, the Multiresolution Vector Quantized (MVQ) approximation.
– MVQ keeps both local and global information about the original time series in a hierarchical mechanism.
– The original time series is processed at multiple resolutions.

Abstract (cont.)
The representation of time series is symbolic. It is built from key subsequences and potentially allows text-based retrieval techniques to be applied to the similarity analysis of time series.

Introduction
Two series should be considered similar if they have enough non-overlapping, time-ordered pairs of subsequences that are similar.

Introduction (cont.)
Instead of calculating the Euclidean distance, the method first extracts key subsequences using the Vector Quantization (VQ) technique and encodes each time series by the frequency of appearance of each key subsequence. Similarity is then calculated in terms of key subsequence matches.

Introduction (cont.)
Hierarchical mechanism: the original time series are processed at several different resolutions, and similarity analysis is performed using a weighted distance function that combines all the resolution levels.

Background
Much previous work focuses on avoiding false dismissals. However, in some cases too many false alarms can decrease the efficiency of retrieval. Moreover, the Euclidean distance is not always the optimal distance measure.

Background (cont.)
For large datasets, the computational complexity of the Euclidean distance calculation is a problem: O(N·n), where N is the number of time series and n is their length. The Euclidean distance (a point-based model) is also vulnerable to shape transformations such as shifting and scaling.
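
The following minimal sketch illustrates the two Euclidean-distance problems named above, with plain Python lists standing in for time series. First, a sequential scan compares the query against all N stored series point by point, which is O(N·n) work. Second, a copy of a series shifted up by a constant ends up farther away than an unrelated flat series. The names `euclidean` and `linear_scan` and the toy series are hypothetical and serve only as an illustration.

```python
import math

def euclidean(a, b):
    """Point-by-point Euclidean distance: O(n) for two series of length n."""
    if len(a) != len(b):
        raise ValueError("series must have equal length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def linear_scan(database, query):
    """Index of the best match among N stored series: O(N * n) work in total."""
    return min(range(len(database)), key=lambda i: euclidean(database[i], query))

# A periodic series and a copy shifted up by 5 have exactly the same shape...
base = [0.0, 1.0, 0.0, -1.0] * 4
shifted = [x + 5.0 for x in base]
# ...yet a flat, shapeless series is closer to `base` in Euclidean terms.
flat = [2.5] * 16

print(euclidean(base, shifted))          # 20.0
print(euclidean(base, flat))             # about 10.39
print(linear_scan([shifted, flat], base))  # 1: the flat series is returned
```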

Background (cont.)
A new framework that utilizes high-level features is proposed:
– Codebook generation
– Time series encoding
– Time series representation and retrieval
To keep both local and global information, multiple codebooks with different resolutions are used.

Background (cont.)
For each resolution, VQ is applied to discover the vocabulary of subsequences (codewords).
– In VQ, a codeword is used to represent a number of similar vectors.
– The Generalized Lloyd Algorithm is used to produce a “locally optimal” codebook from a training set.
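
The Generalized Lloyd Algorithm alternates two steps. It assigns each training vector to its nearest codeword, then moves each codeword to the centroid of the vectors assigned to it. The plain-Python sketch below rests on choices the slides do not state: random initial codewords drawn from the training set, squared Euclidean distortion, and a relative-improvement stopping rule. It is not the authors' implementation, and `generalized_lloyd` is a hypothetical name.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(codebook, v):
    """Index of the codeword closest to v (nearest-neighbour condition)."""
    return min(range(len(codebook)), key=lambda j: sq_dist(codebook[j], v))

def generalized_lloyd(training, k, max_iter=100, tol=1e-9, seed=0):
    """Build a k-codeword codebook from training vectors (lists of floats).

    The result is only locally optimal and depends on the initial codewords.
    """
    rng = random.Random(seed)
    # Assumption: initialise with k distinct training vectors chosen at random.
    codebook = [list(v) for v in rng.sample(training, k)]
    prev = float("inf")
    for _ in range(max_iter):
        # Step 1 (partition): assign every training vector to its nearest codeword.
        cells = [[] for _ in range(k)]
        distortion = 0.0
        for v in training:
            j = nearest(codebook, v)
            cells[j].append(v)
            distortion += sq_dist(codebook[j], v)
        distortion /= len(training)
        # Step 2 (centroid): move each codeword to the mean of its cell.
        for j, cell in enumerate(cells):
            if cell:  # an empty cell keeps its old codeword
                codebook[j] = [sum(c) / len(cell) for c in zip(*cell)]
        # Assumption: stop once the average distortion barely improves.
        if prev - distortion <= tol * max(distortion, 1.0):
            break
        prev = distortion
    return codebook

training = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [10.0, 10.0], [10.1, 10.0], [10.0, 10.1]]
print(sorted(generalized_lloyd(training, 2)))  # one codeword near each cluster
```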

Background (cont.)
To quantitatively measure the similarity between different time series encoded with a VQ codebook, the Histogram Model is employed. Its score is defined in terms of the appearance frequency of each codeword in time series t and in time series q.
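
The sketch below compares two codeword-frequency vectors with a simple L1 distance, in which each codeword contributes the absolute difference between its frequency in t and its frequency in q. This is a stand-in chosen for illustration, not necessarily the Histogram Model's formula, and `histogram_distance` is a hypothetical name.

```python
def histogram_distance(f_t, f_q):
    """L1 distance between two codeword-frequency vectors.

    Assumption: a simple stand-in for the Histogram Model. Codeword i
    contributes |f_t[i] - f_q[i]|, and 0 means identical histograms.
    """
    if len(f_t) != len(f_q):
        raise ValueError("both series must be encoded with the same codebook")
    return sum(abs(a - b) for a, b in zip(f_t, f_q))

print(histogram_distance([3, 0, 1], [2, 1, 1]))  # 2
```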

Proposed Method
MVQ approximation:
– Partitions each time series into equal-length segments and represents each segment with the most similar key subsequence from a codebook.
– Represents each time series by the appearance frequency of each codeword in it.
– Is applied at several resolutions.

Proposed Method (cont.)
Codebook Generation
– The dataset is preprocessed: each time series is partitioned into a number of segments, each of length l, and each segment forms a sample of the training set used to generate the codebook.
– Each codeword corresponds to a key subsequence.
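
Building the training set amounts to cutting every series into consecutive length-l pieces and pooling them. This minimal sketch assumes non-overlapping segments. It also assumes that leftover points at the end of a series are dropped, since the slides do not say how a remainder is handled. `segments` and `training_set` are hypothetical names.

```python
def segments(series, l):
    """Split a series into consecutive non-overlapping segments of length l.

    Assumption: trailing points that do not fill a whole segment are dropped.
    """
    return [list(series[i:i + l]) for i in range(0, len(series) - l + 1, l)]

def training_set(dataset, l):
    """Pool the segments of every series into one VQ training set."""
    return [seg for series in dataset for seg in segments(series, l)]

print(segments([1, 2, 3, 4, 5, 6, 7], 3))  # [[1, 2, 3], [4, 5, 6]]
```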

Example 1 – Codewords of a 2-level codebook

Proposed Method (cont.)
Time Series Encoding
– Every time series is decomposed into segments of length l.
– For each segment, the closest codeword in the codebook is found, and its index is used to represent the segment.
– The appearance frequency of each codeword is counted.

Proposed Method (cont.)
Time Series Encoding (cont.)
– The representation of a time series is a vector giving the appearance frequency of every codeword.
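
The encoding steps can be sketched as follows. The sketch assumes that squared Euclidean distance defines the "closest codeword" and uses a small hypothetical codebook with l = 2. The hypothetical `encode` returns both the index sequence and the frequency vector that serves as the representation.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def encode(series, codebook, l):
    """Return (index sequence, codeword-frequency vector) for one series."""
    indices = []
    for i in range(0, len(series) - l + 1, l):
        seg = series[i:i + l]
        # Replace the segment by the index of its nearest codeword.
        indices.append(min(range(len(codebook)), key=lambda j: sq_dist(codebook[j], seg)))
    # Count how often each codeword appears.
    freq = [0] * len(codebook)
    for j in indices:
        freq[j] += 1
    return indices, freq

codebook = [[0, 0], [1, 1], [0, 1]]  # hypothetical 3-codeword codebook, l = 2
print(encode([0, 0.1, 0.9, 1, 0, 0, 0.1, 0.8], codebook, 2))  # ([0, 1, 0, 2], [2, 1, 1])
```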

Proposed Method (cont.)
Time Series Summarization
– The codewords stand for the most representative subsequences of the entire dataset.
– Checking the appearance frequencies of the codewords gives an overview of a time series.
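
Summarization then amounts to ranking codewords by how often they occur in a series. Below is a short hypothetical helper that takes the frequency vector produced by the encoding step. The `top` cut-off is an illustrative parameter, not one from the slides.

```python
def summarize(freq, top=3):
    """Return up to `top` (codeword index, count) pairs, most frequent first.

    Codewords that never occur are omitted; ties keep index order.
    """
    ranked = sorted(range(len(freq)), key=lambda j: freq[j], reverse=True)
    return [(j, freq[j]) for j in ranked[:top] if freq[j] > 0]

print(summarize([2, 0, 5, 1], top=2))  # [(2, 5), (0, 2)]
```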

Example 2

Proposed Method (cont.)
Distance Measure and Multiresolution Representation
– Using only one codebook (a single resolution) introduces problems: the order among the codeword indices is not kept, so some important global information is lost, and false alarms increase.
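
A toy example makes the loss of order easy to see. Two hypothetical index sequences contain the same codewords in opposite temporal order, so they produce identical frequency vectors. Any measure computed only from those vectors therefore cannot separate them.

```python
from collections import Counter

def histogram(indices, k):
    """Frequency vector over codeword indices 0..k-1; time order is discarded."""
    counts = Counter(indices)
    return [counts[j] for j in range(k)]

# Hypothetical encodings: "codeword 0 then codeword 1" versus the reverse order.
a = [0, 0, 1, 1]
b = [1, 1, 0, 0]
print(histogram(a, 2), histogram(b, 2))  # [2, 2] [2, 2]: indistinguishable
```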

Proposed Method (cont.)
Distance Measure and Multiresolution Representation (cont.)
– A hierarchical mechanism is introduced, involving several different resolutions:
higher resolution → local information
lower resolution → global information
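
The slides do not spell out here how the lower resolutions are produced. One simple possibility, assumed purely for illustration, is to halve the resolution repeatedly by averaging adjacent pairs of points. If the same segment length l is used at every level, each segment at a coarser level spans more of the original time axis and so captures more global shape. `halve` and `pyramid` are hypothetical names.

```python
def halve(series):
    """Average adjacent pairs of points; an odd trailing point is dropped."""
    return [(series[i] + series[i + 1]) / 2 for i in range(0, len(series) - 1, 2)]

def pyramid(series, levels):
    """Return [finest, ..., coarsest] versions of a series, `levels` in total.

    Assumption: each coarser level is obtained by pairwise averaging.
    """
    out = [list(series)]
    for _ in range(levels - 1):
        out.append(halve(out[-1]))
    return out

for level in pyramid([1, 3, 5, 7, 2, 2, 0, 4], 3):
    print(level)
# [1, 3, 5, 7, 2, 2, 0, 4]
# [2.0, 6.0, 2.0, 2.0]
# [4.0, 2.0]
```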

Example 3 – Reconstruction of time series using different resolutions

Proposed Method (cont.)
Distance Measure and Multiresolution Representation (cont.)
– By assigning different weights to different resolutions, a weighted similarity measure, the Hierarchical Histogram Model, is defined.
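
The following is a minimal sketch that makes two assumptions, neither of which is taken from the paper's exact Hierarchical Histogram Model. First, the weighted measure is a weighted sum of per-resolution distances. Second, each level uses an L1 distance between codeword-frequency vectors. Each resolution level has its own codebook, so the vectors at different levels may differ in length. The example shows a coarse level separating two series that look identical at a fine level. `hierarchical_distance` and the equal weights are hypothetical.

```python
def hierarchical_distance(freqs_t, freqs_q, weights):
    """Weighted sum of per-resolution L1 histogram distances.

    freqs_t[r] and freqs_q[r] are the frequency vectors of t and q at
    resolution r; weights[r] >= 0 sets how much that level counts.
    """
    if not (len(freqs_t) == len(freqs_q) == len(weights)):
        raise ValueError("need one frequency vector per series and one weight per level")
    total = 0.0
    for f_t, f_q, w in zip(freqs_t, freqs_q, weights):
        # Assumption: L1 distance between the two histograms at this level.
        total += w * sum(abs(a - b) for a, b in zip(f_t, f_q))
    return total

# Level 0 (fine) sees identical histograms; level 1 (coarse) tells them apart.
print(hierarchical_distance([[2, 2], [1, 0]], [[2, 2], [0, 1]], [0.5, 0.5]))  # 1.0
```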

Experiments
Best Matches Retrieval
– SYNDATA: 6 classes; 100 time series per class; 60 points per time series

Experiments (cont.)
Best Matches Retrieval (cont.)
– CAMMOUSE: 1600 points per time series

Experiments (cont.)
Best Matches Retrieval (cont.)
– Comparisons with other methods

Experiments (cont.)
Clustering
– SYNDATA

Experiments (cont.)
Clustering (cont.)
– CAMMOUSE

Thank you!