# Relevance Feedback Retrieval of Time Series Data Eamonn J. Keogh & Michael J. Pazzani Prepared By/ Fahad Al-jutaily Supervisor/ Dr. Mourad Ykhlef IS531.

## Presentation on theme: "Relevance Feedback Retrieval of Time Series Data Eamonn J. Keogh & Michael J. Pazzani Prepared By/ Fahad Al-jutaily Supervisor/ Dr. Mourad Ykhlef IS531."— Presentation transcript:

Relevance Feedback Retrieval of Time Series Data Eamonn J. Keogh & Michael J. Pazzani Prepared By/ Fahad Al-jutaily Supervisor/ Dr. Mourad Ykhlef IS531 Sunday, 16 April 2006

INTRODUCTION(1/2) Time series account for a significant portion of the data stored in business, medical, engineering, and social science databases. There are innumerable statistical tests one can perform on time series, such as determining autocorrelation coefficients, measuring linear trends, etc.

INTRODUCTION(2/2) Much of the utility of collecting this data, comes from the ability of humans to visualize the shape of the (suitably plotted) data. For example: –Cardiologists view ECGs to diagnose arrhythmias. –Chartists examine stock market data, searching for certain shapes that are thought to be indicative of a stock’s future performance. –Astronomers examine star light curves (the changes in frequency over time) to classify stars.

Issues in time series retrieval(1/2) Two important issues in time series retrieval have not yet been explored. 1.Relevance Feedback: 1.Relevance Feedback: In time series domains, as in text domains, users may not initially know how to form a query to retrieve precisely what they are looking for. The paper shows, the relevance feedback may be applied to retrieval of time series data to learn which sections of the time series are most significant.

Issues in time series retrieval(2/2) Subjectivity of similarity:Subjectivity of similarity: Most work on retrieval of time series has used Euclidean distance as the distance measure. However, there is little evidence that Euclidean distance maps onto human intuition of similarity.

Example: Clusters of four small data sets where similarity is determined by Euclidean distance.

Time Series Similarity Solution correct userproblem interactsThe “correct” distance measure depends upon the user and problem, and it should be continuously learned as the user interacts with the system.

REPRESENTING TIME SERIES(1/2) Using the original ‘raw’ data for query by content in time series databases is computationally expensive (especially for an interactive system) and fails to abstract key features of the data. What is needed?!!! is a higher-level representation.

REPRESENTING TIME SERIES(2/2) Piece-wise linear segmentationPiece-wise linear segmentation, which attempts to model the data as sequences of straight lines. It provides a useful form of data compression and noise filtering.

Notation(1/2) A time series, sampled at k points, is represented as an uppercase letter such as A. The segmented version of A, containing K linear segments, is denoted as a bold uppercase letter such as A. –where A is a 5-tuple of vectors of length K. –A => {AXL, AXR, AYL, AYR, AW}

Notation(2/2) The ith segment of sequence A is represented by the line between (AXLi AYLi) and (AXRi AYRi), and AWi, which represents the segment's weight.

Comparing Time Series(1/3) To retrieve time series, we need a distance (or conversely similarity) measure to compare the query with segments stored in the database. Euclidean distanceAdopting a distance measure, DS, that approximates the Euclidean distance measure on the ‘raw’ data.

Comparing Time Series(2/3) It is convenient for notational purposes to assume that the endpoints of the two sequences being compared are aligned in the X-axis.

Comparing Time Series(3/3) The overall distance between two sequences is simply the square root of the summation over all such slices. The weighted distance between two sequences of length i is:

Merging Time Series(1/2) The basic idea is that the merge operator takes two sequences as input and returns a single sequence whose shape is a compromise between the two original sequences.

Merging Time Series (2/2)

QUERY REFINEMENT VIA RELEVANCE FEEDBACK Relevance feedback is the reformulation of a search query in response to feedback provided by the user for the results of previous versions of the query. Query Output Feedback

Formulating a New Query(1/2) 1.An initial query sequence Q is used to rank all sequences in the database. This query may be hand drawn by the user or it may be a sequence or subsequence from the database. 2.The best n sequences are retrieved and shown to the user. 3.The user assigns ratings to the retrieved sequences on a scale from –3 to +3. After 4.the user has evaluated the top n sequences, the query update rule that is used to produce a new query Qnew, and 5.The search process begins again.

Formulating a New Query(2/2)

SUBJECTIVITY OF SIMILARITY(1/2) could failIt is still possible that a query using this representation could fail to retrieve an item that the user would have found relevant. Consider a problem of offset translation, Two similar (or even identical) shapes can have an arbitrarily large dissimilarity because they are separated in the Yaxis.

SUBJECTIVITY OF SIMILARITY(2/2)

Download ppt "Relevance Feedback Retrieval of Time Series Data Eamonn J. Keogh & Michael J. Pazzani Prepared By/ Fahad Al-jutaily Supervisor/ Dr. Mourad Ykhlef IS531."

Similar presentations