Rule discovery from time series. Authors: Gautam Das, King-Ip Lin, Heikki Mannila, Gopal Renganathan, Padhraic Smyth. Presented by: Tom Gradel.


1 Rule discovery from time series
Authors: Gautam Das, King-Ip Lin, Heikki Mannila, Gopal Renganathan, Padhraic Smyth
Presented by: Tom Gradel

2 CIS 526: Machine Learning. Copyright (c) December 9, 2003 Thomas Gradel. For: Dr. Slobodan Vucetic
Problem: Examine some data
MSFT for about 4 years
Other applications
  - Traffic patterns
  - Climatic changes
  - Paleoecological data of 36 different taxa of diatoms in sediment

3 Problem: Data, a different viewpoint
MSFT differences
Notice
  - Extremely noisy
  - Might be some periodicity

4 Motivation
Time series occur frequently
  - General solution useful in many applications
Profit incentive
  - Casinos get rich on a 51% biased “coin”
  - Later we see accuracy at the 55-60% level

5 Basic Methodology
Original time series
  - (1,2,1,2,1,2,4,3,3,4,3,4)
Window width = 3
Create alphabet
  - a1, a2, a3
Discretize series
  - (a1, a2, a1, a2, a3, …)
[Figure: the primitive shapes a1, a2, a3]
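
The windowing step on this slide can be sketched as follows. This is a minimal illustration rather than the authors' code; the function name `sliding_windows` is hypothetical, and the series and window width are the ones shown on the slide.

```python
def sliding_windows(series, w):
    """Return every contiguous subsequence of width w, in order."""
    if w <= 0:
        raise ValueError("window width must be positive")
    # A series of length n yields n - w + 1 windows (none if n < w).
    return [tuple(series[i:i + w]) for i in range(len(series) - w + 1)]


series = [1, 2, 1, 2, 1, 2, 4, 3, 3, 4, 3, 4]  # series from the slide
windows = sliding_windows(series, 3)            # window width from the slide

print(len(windows))
print(windows[:4])
```

Each window is then treated as one point to be clustered; the cluster it falls into supplies the symbol used in the discretized series.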

6 Methodology - continued
Use clustering to determine shapes
  - Greedy method treating each subsequence of width w as a point in R^w
  - k-means algorithm finds cluster centroids
Free parameters
  - w = window width
  - d = cluster diameter
  - k = number of clusters
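
A rough sketch of a greedy clustering pass is given below. The slide does not spell out the assignment rule, so the one used here is an assumption: a window joins the nearest existing center if it lies within distance d, and otherwise becomes a new center. The name `greedy_cluster` and the value d = 1.0 are illustrative only.

```python
import math


def greedy_cluster(points, d):
    """One greedy pass: reuse the nearest center within distance d, else open a new cluster."""
    centers = []
    labels = []
    for p in points:
        best, best_dist = None, None
        for idx, c in enumerate(centers):
            dist = math.dist(p, c)  # Euclidean distance in R^w
            if dist <= d and (best_dist is None or dist < best_dist):
                best, best_dist = idx, dist
        if best is None:
            # No center is close enough: this point becomes a new center.
            centers.append(tuple(p))
            best = len(centers) - 1
        labels.append(best)
    return centers, labels


series = [1, 2, 1, 2, 1, 2, 4, 3, 3, 4, 3, 4]
w = 3
windows = [tuple(series[i:i + w]) for i in range(len(series) - w + 1)]

# d = 1.0 is an assumed threshold chosen for this toy series.
centers, labels = greedy_cluster(windows, d=1.0)
print(labels)  # the discretized series, one cluster index per window
```

Unlike k-means, this pass does not need k in advance; the number of clusters falls out of the choice of d.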

7 Symbology and Mathematics
Rule syntax: A => B, with the arrow labeled 20
  - Pattern A results in pattern B
  - Within 20 time units
J-measure (ranking)
$$J(B_T; A) = p(A) \left[ p(B_T \mid A) \log \frac{p(B_T \mid A)}{p(B_T)} + \bigl(1 - p(B_T \mid A)\bigr) \log \frac{1 - p(B_T \mid A)}{1 - p(B_T)} \right]$$
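
The J-measure formula translates directly into a few lines of code. The sketch below assumes base-2 logarithms (the slide writes only "log") and uses made-up probabilities, since the slides do not give p(B_T); the function name `j_measure` is hypothetical.

```python
import math


def j_measure(p_a, p_b, p_b_given_a):
    """J(B_T; A) for a rule A => B, given p(A), p(B_T) and p(B_T | A).

    Assumes 0 < p_b < 1 and base-2 logarithms.
    """
    def term(posterior, prior):
        # By convention, 0 * log(0 / x) is taken to be 0.
        if posterior == 0.0:
            return 0.0
        return posterior * math.log2(posterior / prior)

    return p_a * (term(p_b_given_a, p_b) + term(1 - p_b_given_a, 1 - p_b))


# Hypothetical values for illustration only.
informative = j_measure(p_a=0.05, p_b=0.1, p_b_given_a=0.6)
uninformative = j_measure(p_a=0.05, p_b=0.1, p_b_given_a=0.1)
print(informative, uninformative)
```

When p(B_T | A) equals p(B_T), seeing A says nothing new about B and the measure is zero; the further the two diverge, and the more often A occurs, the higher the rule ranks.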

8 Sample Results
Rule: 18 => 4, within 20 days
w = 13, d = 3.5
Confidence = 59.6%
J-measure = 0.0037
Within 20 days after a gradual decrease, there will be a slight increase, then a big decrease, a dip, then leveling off
[Figure: shapes 18 and 4]

9 More results
w    d     Rule     Sup(%)   Conf(%)   J-meas.
13   3.5   18=>4    2.8      59.6      0.0037
15   4.0   37=>421.3         57.37     0.0087
15   4.5   11=>9    3.5      66.7      0.0031
30   5.5   76=>211.2         57.3      0.0003
Total of 488 rules with J-measure > 0.03
More than 25% of rules relate to 3 pairs of sequences, indicating that only a small set of sequences are closely related
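
Support and confidence figures of this kind can be computed from a discretized series roughly as follows. The slides do not define either quantity, so the definitions here are assumptions: support is the fraction of positions where A occurs, and confidence is the fraction of those occurrences followed by B within T steps. The name `rule_stats` and the toy symbol sequence are illustrative.

```python
def rule_stats(symbols, a, b, t):
    """Return (support, confidence) for the rule a => b within t steps."""
    n = len(symbols)
    a_positions = [i for i, s in enumerate(symbols) if s == a]
    # Count occurrences of a that are followed by b in the next t positions.
    hits = sum(1 for i in a_positions if b in symbols[i + 1:i + t + 1])
    support = len(a_positions) / n if n else 0.0
    confidence = hits / len(a_positions) if a_positions else 0.0
    return support, confidence


# Toy discretized series; the integers stand for cluster indices.
symbols = [0, 1, 0, 1, 2, 3, 4, 5, 3, 4]

sup_01, conf_01 = rule_stats(symbols, a=0, b=1, t=1)
sup_10, conf_10 = rule_stats(symbols, a=1, b=0, t=1)
print(sup_01, conf_01)
print(sup_10, conf_10)
```

Enumerating every pair (a, b) this way and ranking the results by J-measure gives a list of candidate rules like the one tabulated on this slide.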

10 Positive points
Semantics A => B can be used by others
Further methodology (not illustrated) develops symbology for representing combinations of rules
Algorithm works well with different stocks
J-measure is a good way to establish the confidence in the rule

11 Negative Points
Clustering methodology costs more than O(n^2)
Finding meaningful rules depends on a human
  - Many useless rules similar to “after the stock is low it is expected to rise again”
  - Unclear why they use “within 20 days” scenarios
Algorithm doesn’t scale well to global trends
Paper indicates (but does not show) rules that overlap, a problem in my current project

12 Recommendations
Examine algorithms that recognize patterns directly, without the need for clustering
Eliminate nonsense rules
Examine replacing the free parameters with an algorithm that computes their best values automatically

13 Conclusions
Don’t expect to get rich with this algorithm
Rule confidence for this type of algorithm is around 60%, so risk is high
Good symbology for representing time series interactions

