General problem: Retrieval of time-series similar to a given pattern.

Presentation transcript:

1

2 General problem Retrieval of time-series similar to a given pattern.

3 Example: Stock charts Database of time-series

4 Example: Stock charts Database of time-series; Pattern

5 Example: Stock charts Database of time-series; Pattern; Retrieval results

6 Example: Stock charts Database of time-series; Pattern; Retrieval results

7 Example: Electrocardiogram Database of time-series

8 Example: Electrocardiogram Database of time-series; Pattern

9 Example: Electrocardiogram Database of time-series; Pattern; Retrieval results

10 Outline Previous work Important points Indexing and retrieval Empirical results Conclusions

11 Outline Previous work Important points Indexing and retrieval Empirical results Conclusions Contributions }

12 Criteria for retrieval methods Gunopulos [2000]: Work for erratic time-series Accept any pattern Find inexact matches Work when some points are missing Work on streaming data

13 Outline Previous work Important points Indexing and retrieval Empirical results Conclusions

14 Previous work Feature choice Similarity metrics Indexing and retrieval

15 Previous work: Feature choice Discrete Fourier transforms Alphabets Statistical features Subsets of points

16 Previous work: Similarity metrics Euclidean distance Bounding rectangles Envelope count Aggregate similarity

17 Previous work: Indexing and retrieval Advanced techniques: B-trees R-trees KD-trees VP-trees Grids Applied techniques: Linear search with compression

18 Outline Previous work Important points Indexing and retrieval Empirical results Conclusions

19 Important points Choose “important” maxima and minima, and discard the other points.

20 Important points Choose “important” maxima and minima, and discard the other points. Original series Example:

21 Important points Choose “important” maxima and minima, and discard the other points. Original series Example:

22 Important points Choose “important” maxima and minima, and discard the other points. Original series Example: Compressed series

23 Definition of important points Important minimum

24 Definition of important points Important minimum: a_m is the minimum among a_i, …, a_j

25 Definition of important points Important minimum: a_m is the minimum among a_i, …, a_j; a_i/a_m ≥ R and a_j/a_m ≥ R

26 Definition of important points Important minimum: a_m is the minimum among a_i, …, a_j; a_i/a_m ≥ R and a_j/a_m ≥ R; R is a knob that determines the compression rate

27 Definition of important points Important maximum: a_m is the maximum among a_i, …, a_j; a_m/a_i ≥ R and a_m/a_j ≥ R; R is a knob that determines the compression rate
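
The two definitions can be checked directly, one point at a time. The Python sketch below only illustrates the definitions and is not the linear-time procedure from the talk. It assumes all values are positive and R > 1, and the function names are made up for this example.

```python
from typing import Sequence


def is_important_minimum(a: Sequence[float], m: int, R: float) -> bool:
    """True if a[m] is the minimum of some window a[i..j] around m
    with a[i]/a[m] >= R and a[j]/a[m] >= R (assumes positive values)."""

    def side_ok(indices) -> bool:
        # Walk away from m; stop as soon as a smaller value appears,
        # because a[m] would no longer be the minimum of the window.
        for k in indices:
            if a[k] < a[m]:
                return False
            if a[k] / a[m] >= R:
                return True
        return False

    return side_ok(range(m - 1, -1, -1)) and side_ok(range(m + 1, len(a)))


def is_important_maximum(a: Sequence[float], m: int, R: float) -> bool:
    """Mirror image of the minimum test: a[m]/a[i] >= R and a[m]/a[j] >= R."""

    def side_ok(indices) -> bool:
        for k in indices:
            if a[k] > a[m]:
                return False
            if a[m] / a[k] >= R:
                return True
        return False

    return side_ok(range(m - 1, -1, -1)) and side_ok(range(m + 1, len(a)))


if __name__ == "__main__":
    series = [15.0, 12.0, 9.0, 20.0, 18.0, 25.0, 11.0, 13.0, 10.0, 30.0]
    R = 1.5
    minima = [m for m in range(len(series)) if is_important_minimum(series, m, R)]
    maxima = [m for m in range(len(series)) if is_important_maximum(series, m, R)]
    print("important minima at", minima)
    print("important maxima at", maxima)
```

A larger R demands a bigger rise or fall on both sides, so fewer points qualify and the compressed series gets shorter.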

28 Compression example Original series

29 Compression example Original series Compressed series

30 Compression example Original series Compressed series

31 Compression example Original series Compressed series

32 Compression algorithm Linear time Constant memory Accepts streaming data For a series with n values, compression time is  n milliseconds (300 MHz PC, Visual Basic 6.0).
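
One way to get linear time, constant memory and streaming input together is a single pass that keeps only the current candidate extremum. The sketch below is a zigzag-style approximation written for this transcript, not the authors' code. It assumes positive values and R > 1, it simplifies the endpoints (the first reported point is confirmed on one side only, and the last pending candidate is never emitted), and `compress_stream` is a hypothetical name.

```python
from typing import Iterable, Iterator, Tuple


def compress_stream(values: Iterable[float], R: float) -> Iterator[Tuple[int, float]]:
    """Yield (index, value) for each confirmed extremum in one pass.

    State is a handful of scalars, so memory does not grow with the input.
    """
    if R <= 1.0:
        raise ValueError("R must be greater than 1")
    lo_i = hi_i = -1          # indices of the candidate minimum / maximum
    lo = hi = 0.0             # their values
    direction = 0             # 0: undecided, +1: tracking a maximum, -1: tracking a minimum
    for i, v in enumerate(values):
        if v <= 0:
            raise ValueError("values must be positive")
        if lo_i < 0:          # first value initialises both candidates
            lo_i = hi_i = i
            lo = hi = v
            continue
        if direction >= 0 and v > hi:
            hi, hi_i = v, i
        if direction <= 0 and v < lo:
            lo, lo_i = v, i
        if direction <= 0 and v / lo >= R:
            # The series rose by a factor R from the candidate minimum.
            yield (lo_i, lo)
            direction = 1
            hi, hi_i = v, i
        elif direction >= 0 and hi / v >= R:
            # The series fell by a factor R from the candidate maximum.
            yield (hi_i, hi)
            direction = -1
            lo, lo_i = v, i


if __name__ == "__main__":
    series = [15.0, 12.0, 9.0, 20.0, 18.0, 25.0, 11.0, 13.0, 10.0, 30.0]
    print(list(compress_stream(series, 1.5)))
```

Because the function is a generator, it can consume a live feed and emit each extremum as soon as the following rise or fall confirms it.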

33 Outline Previous work Important points Indexing and retrieval Empirical results Conclusions

34 Retrieval Retrieval of time-series similar to a given pattern. Intuition: Find a prominent feature in the pattern Find candidate segments with a similar feature Compare similarity of candidates to the pattern

35 Example: Stock charts Database of time-series

36 Example: Stock charts Database of time-series

37 Example: Stock charts Database of time-series; Pattern

38 Example: Stock charts Database of time-series; Pattern

39 Example: Stock charts Database of time-series; Pattern

40 Example: Stock charts Database of time-series; Pattern; Retrieval results

41 Algorithm Identify the prominent leg in the pattern Retrieve similar legs from the database Identify corresponding candidate segments For each candidate segment, compute its similarity to the pattern Output the candidates whose similarity is above the threshold

42 Important details Use compressed pattern and compressed sequences in the retrieval process The prominent feature is the leg having the greatest ratio of right end to left end All legs in the database are indexed by their prominence, using a binary search tree
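
The leg index and the prominent-leg lookup can be sketched as follows. This is a minimal illustration under several assumptions: a leg is taken to be the segment between two consecutive points of a compressed series, a sorted list with `bisect` stands in for the binary search tree named on the slide, and the tolerance factor `tol` and all function names are invented for this example.

```python
from bisect import bisect_left, bisect_right
from typing import List, Sequence, Tuple


def leg_prominences(compressed: Sequence[float]) -> List[float]:
    """Prominence of leg k = right end / left end, as stated on the slide."""
    return [compressed[k + 1] / compressed[k] for k in range(len(compressed) - 1)]


def build_index(compressed: Sequence[float]) -> Tuple[List[float], List[int]]:
    """Return parallel lists (sorted prominences, leg positions)."""
    pairs = sorted((p, k) for k, p in enumerate(leg_prominences(compressed)))
    return [p for p, _ in pairs], [k for _, k in pairs]


def prominent_leg(pattern: Sequence[float]) -> Tuple[int, float]:
    """Position and prominence of the pattern leg with the greatest ratio."""
    proms = leg_prominences(pattern)
    k = max(range(len(proms)), key=proms.__getitem__)
    return k, proms[k]


def similar_legs(index: Tuple[List[float], List[int]], target: float, tol: float) -> List[int]:
    """Legs whose prominence lies in [target / tol, target * tol] (tol is assumed)."""
    keys, positions = index
    lo = bisect_left(keys, target / tol)
    hi = bisect_right(keys, target * tol)
    return positions[lo:hi]


if __name__ == "__main__":
    database = [10.0, 20.0, 8.0, 24.0, 12.0, 15.0]   # a compressed series
    pattern = [5.0, 14.0, 7.0]                        # a compressed pattern
    index = build_index(database)
    _, target = prominent_leg(pattern)
    print(similar_legs(index, target, tol=1.5))
```

Each returned leg position marks where a candidate segment would be aligned with the pattern before the full similarity comparison is run.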

43 Alternative versions Different prominence definitions Different similarity metrics The end-point ratio prominence usually gives the best empirical results.

44 Extended legs Similar sequence

45 Indexing on extended legs Advantage: More accurate retrieval. Disadvantage: Larger index, more memory. If a compressed sequence has n legs: Worst case: n^2/2 extended legs; Average case: on the order of n · lg n extended legs

46 Outline Previous work Important points Indexing and retrieval Empirical results Conclusions

47 Data sets Stock charts Air and sea temperatures Wind speeds Electroencephalograms Electrocardiograms

48 Data sets Stock charts: 60,000 points; Air and sea temperatures: 445,000 points; Wind speeds: 79,000 points; Electroencephalograms: 17,000 points; Electrocardiograms: 2,000 points

49 Patterns Compressed patterns with 4 to 27 legs Examples:

50 Retrieval time Retrieval time: 0.07 · m · k milliseconds, for m legs in a pattern and k candidates
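
As a quick worked example of this estimate, the snippet below evaluates it for the smallest and largest pattern sizes mentioned on the previous slide (4 and 27 legs). The constant 0.07 is the slide's measurement; the candidate count of 1,000 and the function name are made up for illustration.

```python
def estimated_retrieval_ms(m: int, k: int, ms_per_leg_candidate: float = 0.07) -> float:
    """Estimated retrieval time in milliseconds for m pattern legs and k candidates."""
    return ms_per_leg_candidate * m * k


if __name__ == "__main__":
    k = 1000  # hypothetical number of candidate segments
    for m in (4, 27):
        print(f"m = {m:2d} legs, k = {k} candidates: "
              f"about {estimated_retrieval_ms(m, k):.0f} ms")
```

The cost grows with both factors, which is why limiting the share of candidates examined trades accuracy for speed.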

51 Retrieval accuracy: Stock charts [Figure: accuracy plots for 5%, 10% and 20% candidates, with C values from 1.1 to 3]

52 Retrieval accuracy: Wind speeds [Figure: accuracy plots; surviving labels include 20% candidates and C = 1.1]

53 Retrieval candidate quality
Found matches among ten best:
Candidates                               5%   10%   20%
Stock charts (5,400 legs)                4    4     7
Air and sea temperatures (5,500 legs)    4    5     6
Wind speeds (10,500 legs)                3    7     9

54 Outline Previous work Important points Indexing and retrieval Empirical results Conclusions

55 Criteria for retrieval methods Gunopulos [2000]: Work for erratic time-series Accept any pattern Find inexact matches Work when some points are missing Work on streaming data

56 Criteria for retrieval methods Gunopulos [2000]: Work for erratic time-series Accept any pattern Find inexact matches Work when some points are missing Work on streaming data

57 Criteria for retrieval methods Gunopulos [2000]: Work for erratic time-series Accept any pattern Find inexact matches Work when some points are missing Work on streaming data ~

58 Criteria for retrieval methods Gunopulos [2000]: Work for erratic time-series Accept any pattern Find inexact matches Work when some points are missing Work on streaming data ~

59 Criteria for retrieval methods Gunopulos [2000]: Work for erratic time-series Accept any pattern Find inexact matches Work when some points are missing Work on streaming data ~

60 Criteria for retrieval methods Gunopulos [2000]: Work for erratic time-series Accept any pattern Find inexact matches Work when some points are missing Work on streaming data ~ ~

61 Main results Compression Fast compression procedure Preserves similarity Retrieval Works with compressed data Controlled trade-off between speed and accuracy