Identifying Patterns in Time Series Data Daniel Lewis 04/06/06.



Time Series Data
● Definition:
  – “An ordered set of m real-valued variables”
● How can patterns that occur in time series data be located?

Comparing Time Series
● Euclidean Distance: the square root of the sum of squared pointwise differences between two series of equal length
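
As a minimal sketch of the Euclidean distance used to compare two series, assuming both are plain Python lists of equal length (the function name is illustrative and not taken from the slides):

```python
import math


def euclidean_distance(q, c):
    """Euclidean distance between two equal-length sequences of numbers."""
    if len(q) != len(c):
        raise ValueError("series must have the same length")
    # Sum the squared pointwise differences, then take the square root.
    return math.sqrt(sum((qi - ci) ** 2 for qi, ci in zip(q, c)))


print(euclidean_distance([0.0, 3.0], [4.0, 0.0]))  # 5.0
```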

Adaptive Piecewise Constant Approximation
● Each segment of the time series is represented by two values: the mean of all points in the segment and the right endpoint of the segment
● Allows large queries to be compared quickly, but the APCA representations must be created first
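
The representation itself can be sketched as follows. This example assumes the segment boundaries are already known and only shows how each segment collapses to a (mean, right endpoint) pair; how APCA chooses the boundaries adaptively is not covered here, and the function name is hypothetical.

```python
def apca_from_boundaries(series, right_endpoints):
    """Return [(mean, right_endpoint), ...] for a given segmentation.

    right_endpoints holds the 0-based index of the last point of each
    segment, in increasing order; the final entry must be len(series) - 1.
    The boundaries are an input here (an assumption of this sketch).
    """
    if not right_endpoints or right_endpoints[-1] != len(series) - 1:
        raise ValueError("last right endpoint must be the last index")
    segments = []
    start = 0
    for end in right_endpoints:
        if end < start:
            raise ValueError("right endpoints must be strictly increasing")
        chunk = series[start:end + 1]
        segments.append((sum(chunk) / len(chunk), end))
        start = end + 1
    return segments


series = [1.0, 1.0, 1.0, 5.0, 7.0, 2.0, 2.0, 2.0]
print(apca_from_boundaries(series, [2, 4, 7]))
# [(1.0, 2), (6.0, 4), (2.0, 7)]
```

Eight points are stored as three pairs, which is the compression the slides refer to.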

Adaptive Piecewise Constant Approximation (cont'd)

APCA (cont'd)
● Data compression allows for faster search
● Can be used for indexing large series
● Can handle large queries
● But what if we want to identify patterns in streaming time series data?

Pattern Recognition
● For a given query q and a time series s, the two match if the Euclidean distance between q and s is less than r
● r: user-defined, application-specific threshold

Detecting Patterns in Streaming Data
● Brute Force Method:
  – P_1, ..., P_n: set of patterns of length k
  – S: input stream
  – For every possible substring of length k in S, calculate the distance between the substring and all n patterns
  – This method is extremely costly for a large pattern set and a large input stream
    ● O(n(|S| - k)) distance computations, each over k points
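
The brute force scan can be sketched as below, assuming the stream is available as a list and a match means distance < r. The function names and the returned tuple layout are illustrative choices, not part of the slides.

```python
import math


def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def brute_force_matches(stream, patterns, r):
    """Compare every length-k window of the stream against every pattern.

    Returns a list of (offset, pattern_index, distance) for each pair
    whose distance is below r. All patterns are assumed to share length k.
    """
    k = len(patterns[0])
    matches = []
    for offset in range(len(stream) - k + 1):
        window = stream[offset:offset + k]
        for j, pattern in enumerate(patterns):
            d = euclidean_distance(window, pattern)
            if d < r:
                matches.append((offset, j, d))
    return matches


stream = [0.0, 1.0, 2.0, 1.0, 0.0, 5.0]
patterns = [[1.0, 2.0, 1.0], [9.0, 9.0, 9.0]]
print(brute_force_matches(stream, patterns, r=0.5))  # [(1, 0, 0.0)]
```

The two nested loops are the n(|S| - k) factor from the slide; the distance call inside them adds the per-comparison cost over k points.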

Speeding Up Pattern Identification
● Early Abandoning:
  – If at any point the accumulated squared error exceeds r², we can stop the computation
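
A minimal sketch of early abandoning, assuming the comparison works on squared error so that no square root is needed until a candidate survives. Returning None for an abandoned comparison is a convention chosen for this example.

```python
def early_abandon_distance_sq(q, c, r):
    """Squared Euclidean distance, or None once the running sum exceeds r**2."""
    limit = r * r
    total = 0.0
    for qi, ci in zip(q, c):
        total += (qi - ci) ** 2
        if total > limit:
            # The final distance can only grow, so it already exceeds r.
            return None
    return total


print(early_abandon_distance_sq([0, 0, 0], [1, 1, 1], r=2))  # 3.0
print(early_abandon_distance_sq([0, 0, 0], [1, 1, 1], r=1))  # None
```

In the second call the sum reaches 2 after two points, which is already above r² = 1, so the third point is never examined.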

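Wedge Creation
● Combine multiple patterns C_1, ..., C_k into a wedge (k here counts the patterns combined, not their length):
  – Define Upper Limit:
    ● U_i = max(C_1i, ..., C_ki)
  – Define Lower Limit:
    ● L_i = min(C_1i, ..., C_ki)
  – This produces a wedge that encloses every component pattern: L_i ≤ C_ji ≤ U_i for all i and j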
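
The upper and lower limits translate directly into pointwise max and min. This sketch assumes equal-length patterns stored as lists; make_wedge is an illustrative name.

```python
def make_wedge(patterns):
    """Return (upper, lower), the pointwise max and min over all patterns."""
    length = len(patterns[0])
    if any(len(p) != length for p in patterns):
        raise ValueError("all patterns must have the same length")
    # zip(*patterns) yields, for each position i, the values C_1i, ..., C_ki.
    upper = [max(column) for column in zip(*patterns)]
    lower = [min(column) for column in zip(*patterns)]
    return upper, lower


patterns = [[1.0, 2.0, 3.0], [2.0, 2.0, 1.0], [0.0, 4.0, 2.0]]
upper, lower = make_wedge(patterns)
print(upper)  # [2.0, 4.0, 3.0]
print(lower)  # [0.0, 2.0, 1.0]
```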

Wedge Creation (cont'd)

Wedge Comparison
● Distance between a query and a wedge: a lower bound on the distance between the query and every pattern inside the wedge
● If distance > r, then the distance between the query and all component patterns > r, allowing you to eliminate multiple possible matches with a single comparison
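
The slide's own formula is not preserved in this transcript. One standard choice with the required lower-bounding property is the LB_Keogh bound, which counts only the amount by which the query falls outside the wedge; treating it as the formula the slide showed is an assumption. A sketch:

```python
import math


def wedge_distance(query, upper, lower):
    """LB_Keogh-style lower bound between a query and a wedge (upper, lower)."""
    total = 0.0
    for q, u, l in zip(query, upper, lower):
        if q > u:
            total += (q - u) ** 2   # query is above the wedge at this point
        elif q < l:
            total += (q - l) ** 2   # query is below the wedge at this point
        # points inside the wedge contribute nothing
    return math.sqrt(total)


upper = [2.0, 4.0, 3.0]
lower = [0.0, 2.0, 1.0]
print(wedge_distance([1.0, 3.0, 2.0], upper, lower))  # 0.0
print(wedge_distance([5.0, 3.0, 1.0], upper, lower))  # 3.0
```

Because every component pattern lies between lower and upper, each term here is no larger than the corresponding term of the true Euclidean distance to any component pattern, which is what justifies discarding all of them at once.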

Hierarchical Wedges
● The usefulness of a wedge is determined by the similarity of the patterns used in its construction
● More similar patterns create smaller, more useful wedges
● Patterns can be combined in a tree-like structure to produce a hierarchy of wedges

Hierarchical Wedges (cont'd)

Atomic Wedgie
● Preparation:
  – All patterns are clustered by similarity
  – The most similar patterns are combined into wedges
  – The resulting wedges are combined to form larger, less specific wedges
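
The slides do not say which clustering algorithm is used. As one simple possibility, the sketch below builds the hierarchy greedily: it repeatedly merges the two wedges whose combined wedge has the smallest area (the sum of U_i - L_i). The dictionary layout, the area measure, and the function names are all assumptions made for illustration.

```python
def merge(a, b):
    """Combine two wedge nodes into their enclosing parent wedge."""
    return {
        "upper": [max(x, y) for x, y in zip(a["upper"], b["upper"])],
        "lower": [min(x, y) for x, y in zip(a["lower"], b["lower"])],
        "children": (a, b),
        "ids": a["ids"] + b["ids"],
    }


def area(node):
    """Total width of a wedge; smaller means the patterns are more similar."""
    return sum(u - l for u, l in zip(node["upper"], node["lower"]))


def build_hierarchy(patterns):
    """Greedy bottom-up merging; returns the root wedge node."""
    # Each single pattern starts as a zero-width wedge (upper == lower).
    nodes = [{"upper": list(p), "lower": list(p), "children": (), "ids": [i]}
             for i, p in enumerate(patterns)]
    while len(nodes) > 1:
        best = None
        for i in range(len(nodes)):
            for j in range(i + 1, len(nodes)):
                candidate = merge(nodes[i], nodes[j])
                if best is None or area(candidate) < area(best[2]):
                    best = (i, j, candidate)
        i, j, merged = best
        nodes = [n for idx, n in enumerate(nodes) if idx not in (i, j)]
        nodes.append(merged)
    return nodes[0]


root = build_hierarchy([[0, 0, 0], [0, 1, 0], [5, 5, 5]])
print(root["upper"], root["lower"])           # [5, 5, 5] [0, 0, 0]
print([c["ids"] for c in root["children"]])   # [[2], [0, 1]]
```

The two near-identical patterns are merged first into a thin wedge, and the outlier is only joined at the root, which matches the "larger, less specific wedges" described above.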

Atomic Wedgie (cont'd)
● Usage:
  – When streaming data arrives, each substring of length k is first compared to the largest wedge
  – If dist > r, comparison stops; otherwise the substring is compared against the two component wedges, eliminating any branch where the distance exceeds r
  – Eventually, all branches are eliminated or a single (atomic) pattern is matched
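
The top-down search can be sketched as follows. The example builds a small wedge tree by hand, uses an LB_Keogh-style wedge distance (an assumption, since the slide's formula is not preserved), and reports every atomic pattern whose distance is below r. All names are illustrative.

```python
import math


def leaf(pattern_id, pattern):
    """A single pattern as a zero-width wedge."""
    return {"upper": list(pattern), "lower": list(pattern),
            "children": (), "id": pattern_id}


def combine(a, b):
    """Parent wedge enclosing two child wedges."""
    return {"upper": [max(x, y) for x, y in zip(a["upper"], b["upper"])],
            "lower": [min(x, y) for x, y in zip(a["lower"], b["lower"])],
            "children": (a, b), "id": None}


def wedge_distance(window, node):
    total = 0.0
    for q, u, l in zip(window, node["upper"], node["lower"]):
        if q > u:
            total += (q - u) ** 2
        elif q < l:
            total += (q - l) ** 2
    return math.sqrt(total)


def matching_patterns(window, root, r):
    """Walk the wedge tree, pruning every branch whose distance exceeds r."""
    matches = []
    stack = [root]
    while stack:
        node = stack.pop()
        d = wedge_distance(window, node)
        if d > r:
            continue                      # no pattern below this node can match
        if node["children"]:
            stack.extend(node["children"])
        elif d < r:                       # for a leaf, d is the exact distance
            matches.append(node["id"])
    return sorted(matches)


root = combine(combine(leaf(0, [0, 0, 0]), leaf(1, [0, 1, 0])),
               leaf(2, [5, 5, 5]))
print(matching_patterns([0, 0.9, 0], root, r=0.5))  # [1]
print(matching_patterns([9, 9, 9], root, r=0.5))    # []
```

The second window is rejected after a single comparison with the root wedge, which is the saving the hierarchy is meant to provide.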

Atomic Wedgie Optimization
● Optimization:
  – Summation is order independent
  – Wide sections of a wedge are less likely to add error than narrow sections
  – Thus, if the error is summed starting with the narrowest sections, the early-abandoning condition is likely to be met sooner
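
A sketch of this reordering, assuming "section size" means the wedge width U_i - L_i at each position and that the wedge distance is the LB_Keogh-style bound. The function returns how many points were examined so the effect of the ordering is visible; both function names are hypothetical.

```python
def narrow_first_order(upper, lower):
    """Indices sorted so the narrowest parts of the wedge come first."""
    return sorted(range(len(upper)), key=lambda i: upper[i] - lower[i])


def wedge_distance_sq_early_abandon(query, upper, lower, r, order=None):
    """Return (squared bound or None if abandoned, points examined)."""
    if order is None:
        order = range(len(query))
    limit = r * r
    total = 0.0
    for steps, i in enumerate(order, start=1):
        q = query[i]
        if q > upper[i]:
            total += (q - upper[i]) ** 2
        elif q < lower[i]:
            total += (q - lower[i]) ** 2
        if total > limit:
            return None, steps
    return total, len(query)


upper = [5.0, 5.0, 5.0, 1.0]
lower = [0.0, 0.0, 0.0, 1.0]
query = [3.0, 3.0, 3.0, 4.0]

print(wedge_distance_sq_early_abandon(query, upper, lower, r=1.0))
# (None, 4)
order = narrow_first_order(upper, lower)
print(wedge_distance_sq_early_abandon(query, upper, lower, r=1.0, order=order))
# (None, 1)
```

The order depends only on the wedge, so it can be computed once per wedge during preparation rather than once per incoming window.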

Atomic Wedgie (cont'd)
● Advantages:
  – If wedges are well formed, large speed increases can occur
    ● A large number of similar candidate patterns can be analyzed quickly
● Disadvantages:
  – If wedges are poorly formed, the time required can exceed that of the Brute Force Method
    ● Dissimilar patterns are not handled well

Special Considerations
● The choice of r (similarity threshold) is of great importance:
  – If r is too large, a substring can match too many patterns to be useful
  – If r is too small, too few matches may occur
● Good Choice:
  – r = average distance from each pattern to its nearest neighbor
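
The suggested threshold can be computed directly from the pattern set. This sketch assumes Euclidean distance and at least two patterns; suggested_threshold is an illustrative name.

```python
import math


def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def suggested_threshold(patterns):
    """Average, over all patterns, of the distance to the nearest other pattern."""
    if len(patterns) < 2:
        raise ValueError("need at least two patterns")
    nearest = []
    for i, p in enumerate(patterns):
        nearest.append(min(euclidean_distance(p, other)
                           for j, other in enumerate(patterns) if j != i))
    return sum(nearest) / len(nearest)


patterns = [[0.0, 0.0], [3.0, 4.0], [3.0, 5.0]]
print(suggested_threshold(patterns))  # nearest-neighbor distances are 5, 1, 1
```

This is quadratic in the number of patterns, which is acceptable as a one-time preparation step for modest pattern sets.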

Atomic Wedgie Results

References
● Chakrabarti, K., Keogh, E., Mehrotra, S., and Pazzani, M., Locally Adaptive Dimensionality Reduction for Indexing Large Time Series Databases, ACM Transactions on Database Systems, Vol 27.
● Wei, L., Keogh, E., Van Herle, H., and Mafra-Neto, A., Atomic Wedgie: Efficient Query Filtering for Streaming Time Series, Fifth IEEE International Conference on Data Mining, Nov.

Questions?