Rule discovery from time series
Authors: Gautam Das, King-Ip Lin, Heikki Mannila, Gopal Renganathan, Padhraic Smyth
Presented by: Tom Gradel


CIS 526: Machine Learning. Copyright (c) December 9, 2003 Thomas Gradel. For: Dr. Slobodan Vucetic

Problem: Examine some data
- MSFT for about 4 years
- Other applications
  - Traffic patterns
  - Climatic changes
  - Paleoecological data of 36 different taxa of diatoms in sediment

Problem: Data from a different viewpoint
- MSFT differences
- Notice
  - Extremely noisy
  - Might be some periodicity

Motivation
- Time series occur frequently
  - General solution useful in many applications
- Profit incentive
  - Casinos get rich on a 51% biased “coin”
  - Results later show accuracy at the 55-60% level

Basic Methodology
- Original time series
  - (1,2,1,2,1,2,4,3,3,4,3,4)
- Window width = 3
- Create alphabet
  - a1, a2, a3 (each symbol stands for a subsequence shape; shape figures not reproduced)
- Discretize series
  - (a1, a2, a1, a2, a3, …)
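The windowing step on this slide can be sketched as follows. This is a minimal illustration, not the authors' code: the helper name `sliding_windows` is hypothetical, and it only extracts the width-3 subsequences that would later be mapped to alphabet symbols.

```python
def sliding_windows(series, w):
    """Return every contiguous subsequence of length w, in order.

    Hypothetical helper for illustration; the name is not from the paper.
    """
    return [tuple(series[i:i + w]) for i in range(len(series) - w + 1)]


# The example series from the slide, with window width 3.
series = [1, 2, 1, 2, 1, 2, 4, 3, 3, 4, 3, 4]
windows = sliding_windows(series, 3)

for position, window in enumerate(windows):
    print(position, window)
```

A series of length n yields n - w + 1 windows, so the 12-point example gives 10 subsequences. The first, (1,2,1), and the second, (2,1,2), are the ones the slide labels a1 and a2.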

Methodology - continued
- Use clustering to determine shapes
  - Greedy method treating each subsequence of width w as a point in R^w
  - k-means algorithm finds cluster centroids
- Free parameters
  - w = window width
  - d = cluster diameter
  - k = number of clusters
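One possible reading of the greedy clustering step is sketched below. It rests on assumptions that the slide does not confirm: a window joins the nearest existing cluster when its Euclidean distance to that cluster's center is below d/2, and otherwise starts a new cluster with itself as the center. The function name `greedy_cluster` and the value d = 2.0 are illustrative only.

```python
import math


def greedy_cluster(points, diameter):
    """Single-pass greedy clustering of points in R^w.

    Assumption: a point joins the nearest center if it lies closer than
    diameter / 2; otherwise it becomes the center of a new cluster.
    """
    centers = []
    labels = []
    for p in points:
        best_idx, best_dist = None, math.inf
        for idx, c in enumerate(centers):
            dist = math.dist(p, c)  # Euclidean distance (Python 3.8+)
            if dist < best_dist:
                best_idx, best_dist = idx, dist
        if best_idx is not None and best_dist < diameter / 2:
            labels.append(best_idx)
        else:
            centers.append(p)
            labels.append(len(centers) - 1)
    return centers, labels


# Width-3 windows of the example series (1,2,1,2,1,2,4,3,3,4,3,4).
series = [1, 2, 1, 2, 1, 2, 4, 3, 3, 4, 3, 4]
windows = [tuple(series[i:i + 3]) for i in range(len(series) - 2)]

centers, labels = greedy_cluster(windows, diameter=2.0)  # d = 2.0 is illustrative
print(labels)
```

With these settings the first five labels are 0, 1, 0, 1, 2, which mirrors the discretized prefix (a1, a2, a1, a2, a3) from the previous slide. This version compares raw values; a shape-based variant would normalize each window first, which is omitted here.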

Symbology and Mathematics
- Rule syntax: A => B with time bound T
  - Pattern A results in Pattern B
  - Within 20 time units (T = 20)
- J-measure (ranking), where B_T denotes B occurring within T time units:

$$J(B_T; A) = p(A)\left[\, p(B_T \mid A)\,\log\frac{p(B_T \mid A)}{p(B_T)} + \bigl(1 - p(B_T \mid A)\bigr)\,\log\frac{1 - p(B_T \mid A)}{1 - p(B_T)} \right]$$
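The J-measure formula above translates directly into code. The sketch below makes two assumptions that the slide does not state: the logarithm is base 2, and a term whose leading probability is 0 contributes 0 (the usual convention 0 * log 0 = 0). The function name `j_measure` is illustrative.

```python
import math


def j_measure(p_a, p_b, p_b_given_a):
    """J(B_T; A) for a rule A => B.

    p_a          -- p(A), probability of the left-hand pattern
    p_b          -- p(B_T), prior probability of B within T (0 < p_b < 1)
    p_b_given_a  -- p(B_T | A), the rule confidence
    Assumption: log base 2; terms with zero probability count as 0.
    """
    def term(posterior, prior):
        if posterior == 0.0:
            return 0.0
        return posterior * math.log2(posterior / prior)

    return p_a * (term(p_b_given_a, p_b) + term(1.0 - p_b_given_a, 1.0 - p_b))


# If A tells us nothing about B, the rule carries no information.
print(j_measure(0.5, 0.3, 0.3))

# If B is a coin flip a priori but certain after A, each firing gives 1 bit.
print(j_measure(0.1, 0.5, 1.0))
```

The bracketed factor is the relative entropy between the posterior and prior distributions of B, so it is never negative. Multiplying by p(A) favors rules that fire often over rules that are informative but rare.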

Sample Results
- Rule: 18 => 4 (within 20 days)
- w = 13, d = 3.5
- Confidence = 59.6%
- J-measure = 0.0037
- Within 20 days after a gradual decrease, there will be a slight increase, then a big decrease, a dip, then leveling off
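How a confidence figure like the one above might be counted from a discretized series is sketched below. The definitions are assumptions for illustration: confidence is the fraction of occurrences of A that are followed by B somewhere in the next T positions, and support is the fraction of all positions at which the rule fires. The paper's exact window convention and support definition may differ, and the symbol sequence is made up.

```python
def rule_stats(symbols, a, b, t):
    """Return (support, confidence) for the rule a => b within t steps.

    Assumption: the rule fires at position i when symbols[i] == a and b
    appears among the next t symbols.
    """
    count_a = sum(1 for s in symbols if s == a)
    count_ab = sum(
        1
        for i, s in enumerate(symbols)
        if s == a and b in symbols[i + 1:i + 1 + t]
    )
    support = count_ab / len(symbols) if symbols else 0.0
    confidence = count_ab / count_a if count_a else 0.0
    return support, confidence


# Made-up discretized series, for illustration only.
symbols = ["a1", "a2", "a1", "a2", "a3", "a1", "a3"]
support, confidence = rule_stats(symbols, "a1", "a3", t=2)
print(f"support={support:.3f} confidence={confidence:.3f}")
```

Here a1 occurs three times and is followed by a3 within two steps on two of those occasions, so the confidence is 2/3 and the support is 2/7.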

More results

w   d    Rule     Sup(%)  Conf(%)  J-meas.
13  3.5  18=>4    2.8     59.6     0.0037
15  4.0  37=>421.3        57.37    0.0087
15  4.5  11=>9    3.5     66.7     0.0031
30  5.5  76=>211.2        57.3     0.0003

- Total of 488 rules with J-measure > 0.03
- More than 25% of rules relate to 3 pairs of sequences, indicating that only a small set of sequences are closely related

Positive points
- Semantics A => B can be used by others
- Further methodology (not illustrated) develops symbology for representing combinations of rules A_i
- Algorithm works well with different stocks
- J-measure is a good way to establish the confidence in the rule

Negative Points
- Clustering methodology is worse than O(n^2)
- Finding meaningful rules depends on a human
  - Many useless rules similar to “after the stock is low it is expected to rise again”
  - Unclear why they use “within 20 days” scenarios
- Algorithm doesn’t scale well to global trends
- Paper indicates (but does not show) rules that overlap, a problem in my current project

Recommendations
- Examine algorithms that recognize patterns directly, without need for clustering
- Eliminate nonsense rules
- Examine replacing free parameters with an algorithm that computes the best values for these parameters automatically

Conclusions
- Don’t expect to get rich with this algorithm
- Rule confidence for this type of algorithm is around 60%, so risk is high
- Good symbology for representing time series interactions
