Discovering Longest-lasting Correlation in Sequence Databases. Yuhong Li, Department of Computer and Information Science, University of Macau.


1 Discovering Longest-lasting Correlation in Sequence Databases
Yuhong Li (yb27407@umac.mo)
Department of Computer and Information Science, University of Macau, Macau

2 LCS: Motivation
■ Typical Analysis Query
  Find the stock most correlated with GOOG for every 3 months in 2008-2011?
  It is hard to define a proper length.
[Figure: correlation scale from -1 (perfect negative correlation) through 0 (no correlation) to +1 (perfect positive correlation)]
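The slides do not state which correlation measure is used. Assuming it is the Pearson correlation coefficient, which matches the scale in the figure, the correlation of two equal-length sequences can be written as below. The symbols x, y, and m (the common length) are notation introduced here for illustration.

```latex
\rho(x, y) =
  \frac{\sum_{i=1}^{m} (x_i - \bar{x})(y_i - \bar{y})}
       {\sqrt{\sum_{i=1}^{m} (x_i - \bar{x})^2}\,
        \sqrt{\sum_{i=1}^{m} (y_i - \bar{y})^2}},
\qquad -1 \le \rho(x, y) \le +1
```

Because the value depends on the window length m, a query over a fixed window (such as 3 months) can miss correlations that last longer or shorter than the chosen length.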

3 LCS: Motivation
■ Longest-lasting Correlated Subsequences

4 Baseline Solution
[Figure: sequence database, with offsets os=0, os=1, …]

5 Challenges
[Figure: … sequence 1, …, sequence n]

6 Main Idea
■ Time Series are Long → Dimensionality Reduction
  ➢ Thousands of dimensions.
  ➢ Dimensionality reduction obeys the upper bounding lemma.
■ Huge Search Space → Batch Pruning
  ➢ Group similar subsequences (intra-object grouping, inter-object grouping).
■ Unpruned Subsequences → Further Refinement
[Figure: raw subsequences, dim = m, correlation computing costs O(m); PAA representation, dim = 3, correlation computing costs O(3)]
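The dimensionality reduction step can be illustrated with a minimal Python sketch. It assumes Pearson correlation, z-normalization with the population standard deviation, and a sequence length divisible by the number of PAA segments. The function names are hypothetical, and the bound shown is the standard PAA distance bound rewritten for correlation, not necessarily the exact lemma from the talk.

```python
import math


def z_normalize(x):
    """Scale x to zero mean and unit (population) standard deviation."""
    m = len(x)
    mean = sum(x) / m
    std = math.sqrt(sum((v - mean) ** 2 for v in x) / m)
    if std == 0:
        raise ValueError("constant sequence has undefined correlation")
    return [(v - mean) / std for v in x]


def paa(x, k):
    """Piecewise Aggregate Approximation: mean of each of k equal-width segments."""
    m = len(x)
    if m % k != 0:
        raise ValueError("this sketch assumes len(x) is divisible by k")
    w = m // k
    return [sum(x[i * w:(i + 1) * w]) / w for i in range(k)]


def correlation(x, y):
    """Exact Pearson correlation of two equal-length sequences, O(m)."""
    xn, yn = z_normalize(x), z_normalize(y)
    return sum(a * b for a, b in zip(xn, yn)) / len(x)


def correlation_upper_bound(paa_x, paa_y):
    """Upper bound on the correlation, computed in O(k).

    Both inputs must be PAA vectors of z-normalized sequences. For such
    sequences, correlation = 1 - d^2 / (2m), where d is the Euclidean
    distance, and PAA satisfies (m / k) * ||paa_x - paa_y||^2 <= d^2.
    """
    k = len(paa_x)
    d2 = sum((a - b) ** 2 for a, b in zip(paa_x, paa_y))
    return 1.0 - d2 / (2.0 * k)
```

If the bound already falls below the correlation threshold, the O(m) exact computation can be skipped for that pair. With k = 3, as in the slide's figure, each bound costs three subtractions and multiplications regardless of m.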

7 LCS: Diamond Cover Index
■ Intra-object grouping
  ➢ Grouping similar subsequences in a sequence object.
[Figure: … PAA feature space, minDist]

8 LCS: Diamond Cover Index
■ Inter-object Grouping
  ➢ Exploiting similarity between sequence objects.
  ➢ Grouping the diamond MBRs of different objects into higher-level MBRs → compact MBRs.
DCI is the collection of the compact MBRs. It is memory efficient and offers good pruning ability.
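The grouping idea can be sketched with plain axis-aligned minimum bounding rectangles (MBRs) in PAA feature space. This is an assumption-laden illustration: it does not reproduce the diamond-shaped cover from the talk, and the function names are hypothetical. It shows only how one minDist computation against a group's MBR gives a distance lower bound for every member of the group at once.

```python
import math


def mbr_of(points):
    """Minimum bounding rectangle of a group of points: (lower, upper) corners."""
    dims = range(len(points[0]))
    lower = [min(p[d] for p in points) for d in dims]
    upper = [max(p[d] for p in points) for d in dims]
    return lower, upper


def merge_mbrs(mbrs):
    """Higher-level MBR that encloses several lower-level MBRs."""
    lower = [min(vals) for vals in zip(*(m[0] for m in mbrs))]
    upper = [max(vals) for vals in zip(*(m[1] for m in mbrs))]
    return lower, upper


def min_dist(q, mbr):
    """Smallest Euclidean distance from point q to any point inside the MBR.

    No member of the group can be closer to q than this value, so a large
    minDist lets the whole group be pruned without visiting its members.
    """
    lower, upper = mbr
    total = 0.0
    for v, lo, hi in zip(q, lower, upper):
        if v < lo:
            total += (lo - v) ** 2
        elif v > hi:
            total += (v - hi) ** 2
    return math.sqrt(total)
```

A group whose minDist to the query is too large to meet the correlation threshold can be discarded in one step; only the remaining groups need their members examined individually.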

9 LCS: Subsequence Refinement
[Figure: minDist]

10 LCS: Experimental Evaluation
■ Programming
  ➢ Language: C++
  ➢ Machine: Ubuntu 12.04, 4GB RAM
■ Datasets
  ➢ RAND: Randomly generated sequences.
  ➢ STOCK: 2187 quoted companies on the NYSE from 2008 to 2012.
  ➢ TAO: Sea surface temperatures, 28399 sequences of length 1008.

11 LCS: Experimental Evaluation
SOTA: state-of-the-art method in distance calculation.
SKIP: incremental correlation computation.
SOTA+DCI, SKIP+DCI: DCI versions of SOTA and SKIP, respectively.
[Figure: results on the Stock dataset and the TAO dataset]
At least one order of magnitude faster than the SOTA adaptation.
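The slides describe SKIP only as "incremental correlation computation". As a generic illustration of that idea, and not necessarily the SKIP method itself, the sketch below keeps running sums so that extending a pair of aligned subsequences by one point updates the Pearson correlation in O(1). The function names and the prefix-based setup are assumptions made for this example.

```python
import math


def incremental_correlations(x, y):
    """Yield (n, correlation of x[:n] and y[:n]) for n = 2 .. min(len(x), len(y)).

    Five running sums are updated per step, so each new length costs O(1)
    instead of recomputing the correlation from scratch in O(n).
    """
    sx = sy = sxx = syy = sxy = 0.0
    for n, (a, b) in enumerate(zip(x, y), start=1):
        sx += a
        sy += b
        sxx += a * a
        syy += b * b
        sxy += a * b
        if n < 2:
            continue
        cov = n * sxy - sx * sy
        var_x = n * sxx - sx * sx
        var_y = n * syy - sy * sy
        if var_x <= 0 or var_y <= 0:
            continue  # constant prefix: correlation is undefined
        yield n, cov / math.sqrt(var_x * var_y)


def longest_correlated_prefix(x, y, theta):
    """Largest n for which the correlation of x[:n] and y[:n] is at least theta."""
    best = 0
    for n, r in incremental_correlations(x, y):
        if r >= theta:
            best = n
    return best
```

The loop cannot stop at the first length that fails the threshold, because the correlation is not monotone in the length: a longer prefix may qualify again. Running sums of raw values can also lose precision on very long sequences, so a production version would need a more careful update.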

12 Thanks
Q&A

