Presentation on theme: "Time Series Shapelets: A New Primitive for Data Mining"— Presentation transcript:
1 Time Series Shapelets: A New Primitive for Data Mining
KDD 2009. Lexiang Ye and Eamonn Keogh, University of California, Riverside. Presented by: Zhenhui Li.
Hello, everyone. I am Lexiang Ye, and this is joint work with my advisor, Dr. Eamonn Keogh.
2 Classification in Time Series
Applications: finance, medicine.
1-Nearest Neighbor. Pros: accurate, robust, simple. Cons: time and space complexity (lazy learning); results are not interpretable.
As we all know, classification is an important problem in data mining. Classification of time series has attracted great interest over the past decade, with applications in a wide variety of domains such as medicine and finance. Recent research suggests that nearest neighbor classification is very hard to beat for most problems. Extensive empirical tests have shown it to be highly accurate. It is robust to noise, and it is extremely simple and generic.
3 Solution
Shapelets: time series subsequences that are representative of a class and discriminative from other classes.
In this work, we introduce a new time series primitive, the time series shapelet, which addresses these limitations. Informally, shapelets are time series subsequences that are in some sense maximally representative of a class. Similar problems exist in other domains, such as distinguishing substring selection or probe design. However, time series shapelets cover a much wider problem than those.
4 MOTIVATING EXAMPLE
First, let's begin with a toy example on shapes, for visual clarity.
5 Stinging nettles and false nettles
Figure: outlines of stinging nettles and false nettles; a Shapelet Dictionary holding shapelet I with distance threshold 5.1; and a Leaf Decision Tree with a single node (I) whose yes/no branches lead to false nettles and stinging nettles.
Given two species of leaves: the first is the stinging nettle, and the second is colloquially called the “false nettle” because it closely resembles the first and is easily confused with it. We want to build a classifier for these two species based on shape.
First, we convert the outline of each shape to a one-dimensional time series representation. Because the global differences between the two shapes are subtle, we instead use a particularly distinguishing subsection. By “brushing” the subsection back onto the shape, we gain insight into the difference: the stinging nettle has a stem that connects to the leaf at an angle close to 90 degrees, while the false nettle's stem connects to the leaf at a much wider angle. This distinguishing subsection is called a “shapelet”.
Now we build the classifier: an outline time series that contains a subsection similar to this shapelet is classified as a false nettle; otherwise it is a stinging nettle.
So how can we extract these distinguishing shapelets?
6 BRUTE-FORCE ALGORITHM We start with the simple brute-force algorithm of finding the shapelet.
7 Extract subsequences of all possible lengths
Figure: Candidates Pool.
The first step is to construct the pool of shapelet candidates. To search all the possibilities exhaustively, we extract subsequences of all possible lengths. Using a sliding window of a given size, we slide across each time series and extract every subsequence of that length. Then we proceed to a sliding window of a different size, and so on and so forth.
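The sliding-window extraction described on this slide can be sketched in a few lines of Python. This is only an illustration, not the authors' code: the function name `generate_candidates` and the parameters `min_len` and `max_len` are hypothetical, and the transcript does not say which length range the authors actually used.

```python
from typing import Iterator, List, Sequence, Tuple


def generate_candidates(
    dataset: Sequence[Sequence[float]],
    min_len: int,
    max_len: int,
) -> Iterator[Tuple[int, int, List[float]]]:
    """Yield (series_index, start, subsequence) for every window of every length.

    min_len and max_len are illustrative parameters (assumptions), not values
    taken from the presentation.
    """
    for length in range(min_len, max_len + 1):
        for idx, series in enumerate(dataset):
            # Slide a window of the current length across the whole series.
            # If the series is shorter than the window, the range is empty.
            for start in range(len(series) - length + 1):
                yield idx, start, list(series[start:start + length])
```

Written as a generator, the pool never has to be held in memory at once, which matters because the number of candidates grows quadratically with the series length.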
8 Testing the utility of a candidate shapelet
Information gain.
Arrange the time series objects based on their distance from the candidate.
Find the optimal split point (maximal information gain).
Pick the candidate achieving the best utility as the shapelet.
Figure: objects arranged on a line by distance from the candidate, with the split point marked.
Once all the candidates are in the pool, we test and compare the utility of each candidate shapelet. We use information gain as the utility function: the higher, the better.
First, we arrange the objects by the distance from each object to the candidate. For visual illustration, the figure arranges the leaves on the real number line according to their distance from the candidate.
Then we test every possible split point between each pair of adjacent objects and find the one that best divides the leaves into the two original classes. The split point shown is the optimal one for this candidate; it is not perfect, but it is pretty good.
After testing all the candidates, we pick the one achieving the best utility as the shapelet.
Since the brute-force algorithm exhaustively checks every single candidate, it is guaranteed to find the best candidate as the shapelet. However, the brute-force method suffers from a serious problem. To get the distance from an object to the candidate, we need to find the best matching location for the candidate in the time series, which means every distance calculation involves a subsequence search.
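As a rough sketch of the utility test described above, the following Python computes the best split point and its information gain for one candidate, given the distance from each training object to that candidate. The names `entropy` and `best_split`, the use of base-2 logarithms, and the choice of the midpoint between neighbours as the threshold are assumptions made for illustration.

```python
import math
from collections import Counter
from typing import Optional, Sequence, Tuple


def entropy(labels: Sequence[str]) -> float:
    """Shannon entropy (base 2) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_split(
    distances: Sequence[float], labels: Sequence[str]
) -> Tuple[float, Optional[float]]:
    """Return (information_gain, threshold) of the best split on the distance line."""
    order = sorted(zip(distances, labels), key=lambda p: p[0])
    sorted_labels = [lab for _, lab in order]
    n = len(order)
    base = entropy(sorted_labels)
    best_gain, best_thr = 0.0, None
    for i in range(1, n):
        if order[i - 1][0] == order[i][0]:
            continue  # no threshold can separate two equal distances
        left, right = sorted_labels[:i], sorted_labels[i:]
        gain = base - (i / n) * entropy(left) - ((n - i) / n) * entropy(right)
        if gain > best_gain:
            best_gain = gain
            # Assumption: place the threshold midway between the two neighbours.
            best_thr = (order[i - 1][0] + order[i][0]) / 2
    return best_gain, best_thr
```

For example, `best_split([0.1, 0.2, 0.9, 1.1], ["a", "a", "b", "b"])` separates the two classes perfectly, so the gain equals the entropy of the full set.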
9 Problem
Total number of candidates in the Candidates Pool.
Each candidate: compute the distance between this candidate and each training sample.
Trace dataset: 200 instances, each of length 275; 7,480,200 shapelet candidates; approximately three days.
There is a huge number of candidates in the pool, even for a very small dataset. Take the well-known Trace dataset, for example: it contains only 200 instances, each of length 275, yet checking every candidate takes the brute-force algorithm about three days. To extend the shapelet method to larger datasets, we apply some speedups to the brute-force algorithm.
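The candidate count on this slide can be checked with a short calculation. A series of length m contributes m - l + 1 subsequences of length l. The slide's figure of 7,480,200 is reproduced when lengths run from 3 to 275; that minimum length of 3 is an inference from the arithmetic, not something stated in this transcript. The function name `count_candidates` is hypothetical.

```python
def count_candidates(num_series: int, series_len: int, min_len: int, max_len: int) -> int:
    """Count all sliding-window subsequences with lengths in [min_len, max_len]."""
    per_series = sum(series_len - length + 1 for length in range(min_len, max_len + 1))
    return num_series * per_series


# Assumption: minimum candidate length 3 (inferred, not stated on the slide).
print(count_candidates(200, 275, 3, 275))  # 7480200
```

Each of these candidates must then be compared against all 200 training series, and each comparison is itself a subsequence search, which is where the multi-day running time comes from.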
10 Speedup
Distance calculations from time series objects to shapelet candidates are the most expensive part.
Reduce the time in two ways:
Distance Early Abandon: reduce the distance computation time between two time series.
Admissible Entropy Pruning: reduce the number of distance calculations.
We observe that the most expensive part of the brute-force method is the distance calculation. The two speedups reduce the distance calculation time in two ways. One is distance early abandoning, a known idea that has been applied successfully in previous research and that works particularly efficiently in our case. The other is our novel idea, called admissible entropy pruning.
13 Figure: candidate S placed at its best matching location in time series T; Dist = 0.4.
14 Figure: a distance calculation between S and T is abandoned at the point where the running distance exceeds 0.4 (Dist > 0.4).
15 Distance Early Abandon
We only need the minimum Dist.
Method:
Keep the best-so-far distance.
Abandon the calculation if the current distance is larger than the best-so-far.
Consider the subsequences of T in random order, to reduce the best-so-far as much as possible in the early stage.
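The method on this slide can be sketched as follows. This is a minimal illustration under stated assumptions: it uses squared Euclidean distance with no normalization (the transcript does not specify the distance measure), and the function name `subsequence_dist` and the optional `rng` parameter are hypothetical.

```python
import random
from typing import Optional, Sequence


def subsequence_dist(
    series: Sequence[float],
    candidate: Sequence[float],
    rng: Optional[random.Random] = None,
) -> float:
    """Minimum squared Euclidean distance between the candidate and any
    same-length window of the series, with early abandoning."""
    m = len(candidate)
    starts = list(range(len(series) - m + 1))
    if not starts:
        raise ValueError("candidate is longer than the series")
    if rng is not None:
        # Visit windows in random order so a small best-so-far is found early.
        rng.shuffle(starts)
    best = float("inf")
    for s in starts:
        total = 0.0
        for j in range(m):
            diff = series[s + j] - candidate[j]
            total += diff * diff
            if total >= best:
                break  # this window cannot beat the best-so-far: abandon it
        else:
            best = total  # the window completed and is strictly better
    return best
```

Because the running sum only grows, abandoning a window once it reaches the best-so-far never changes the returned minimum; it only skips work.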
16 ADMISSIBLE ENTROPY PRUNING We now explain how this idea works
17 Admissible Entropy Pruning
We only need the best shapelet for each class.
For a candidate shapelet, we don't need to calculate the distance for every training sample.
If, after calculating the distances for some training samples, the upper bound of the information gain is below that of the best candidate shapelet so far: stop the calculation and try the next candidate.
As mentioned before, we use information gain as the utility function because it is the traditional evaluation measure in decision trees and generalizes easily to the multi-class problem. What is more, we can exploit it to greatly reduce the number of distance calculations from objects to candidates.
18 Figure legend: stinging nettles, false nettles.
Suppose we are now considering a candidate. The distances from the first five time series objects to the candidate have been calculated, and their corresponding positions in a one-dimensional representation are shown in the figure.
19 Figure: best-so-far candidate, I = 0.42; current candidate in its best case, I = 0.29.
Before continuing to calculate the remaining distances, we can first compute an upper bound on the information gain. The best case is that the remaining leaves of the two species lie far away from each other, so we place them at the two ends of the line. In this case, the upper bound of the information gain is 0.29. Suppose we have a previous best-so-far candidate with an information gain of 0.42. The upper bound for the current candidate is lower than the best-so-far. Therefore, at this point, we can stop the distance calculations for the remaining objects and prune this candidate from consideration.
So our speedup method still considers every single candidate, just as the brute-force method does, but it avoids as many unnecessary distance calculations as possible.
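The pruning test described here can be sketched for the two-class case. The code places the objects whose distances are still unknown at the two ends of the distance line, tries both arrangements, and takes the best achievable information gain as the optimistic bound. The names `gain_upper_bound` and `should_prune`, the base-2 logarithm, and the restriction to exactly two classes are assumptions for illustration; this is not the authors' implementation.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, str]  # (distance to candidate, class label)


def entropy(labels: Sequence[str]) -> float:
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_gain(points: Sequence[Point]) -> float:
    """Best information gain over all split points on the distance line."""
    pts = sorted(points, key=lambda p: p[0])
    labels = [lab for _, lab in pts]
    n = len(pts)
    base = entropy(labels)
    best = 0.0
    for i in range(1, n):
        if pts[i - 1][0] == pts[i][0]:
            continue  # equal distances cannot be separated
        gain = base - (i / n) * entropy(labels[:i]) - ((n - i) / n) * entropy(labels[i:])
        best = max(best, gain)
    return best


def gain_upper_bound(known: Sequence[Point], remaining: Dict[str, int]) -> float:
    """Optimistic gain if the not-yet-measured objects fall at the two ends.

    `remaining` maps each of the two class labels to the number of objects
    whose distance has not been computed yet (zero counts are allowed).
    """
    if len(remaining) != 2:
        raise ValueError("this sketch handles exactly two classes")
    a, b = sorted(remaining)
    bound = 0.0
    for left_cls, right_cls in ((a, b), (b, a)):
        optimistic: List[Point] = list(known)
        optimistic += [(float("-inf"), left_cls)] * remaining[left_cls]
        optimistic += [(float("inf"), right_cls)] * remaining[right_cls]
        bound = max(bound, best_gain(optimistic))
    return bound


def should_prune(known: Sequence[Point], remaining: Dict[str, int], best_so_far: float) -> bool:
    """True if the candidate cannot beat the best-so-far information gain."""
    return gain_upper_bound(known, remaining) < best_so_far
```

In use, `should_prune` would be called after each new distance is computed for a candidate, and the remaining distance calculations are skipped as soon as it returns true.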
20 Classification
Figure: stinging nettles and false nettles; Shapelet Dictionary (shapelet I, threshold 5.1); Leaf Decision Tree with a single node (I) and yes/no branches.
After we find the shapelet, we can frame it as a decision tree. At each step of the decision tree induction, we determine a shapelet. In the classification stage, we compare the time series with the shapelet at each step. Although shapelet discovery takes a long time in training, classification is very fast, since the number of shapelets a time series must be compared to is bounded by the height of the decision tree.
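A minimal sketch of classification with a shapelet decision tree follows. The `Node` and `Leaf` classes, the `near`/`far` branch names, and the use of unnormalized squared Euclidean distance are all assumptions made for illustration; the transcript only says that a series is compared with one shapelet per tree level.

```python
from dataclasses import dataclass
from typing import Sequence, Union


@dataclass
class Leaf:
    label: str


@dataclass
class Node:
    shapelet: Sequence[float]
    threshold: float
    near: Union["Node", Leaf]  # followed when the distance is within the threshold
    far: Union["Node", Leaf]   # followed otherwise


def min_dist(series: Sequence[float], shapelet: Sequence[float]) -> float:
    """Distance from the shapelet to its best matching location in the series."""
    m = len(shapelet)
    return min(
        sum((series[s + j] - shapelet[j]) ** 2 for j in range(m))
        for s in range(len(series) - m + 1)
    )


def classify(tree: Union[Node, Leaf], series: Sequence[float]) -> str:
    """Walk the tree from the root, doing one shapelet comparison per level."""
    node = tree
    while isinstance(node, Node):
        within = min_dist(series, node.shapelet) <= node.threshold
        node = node.near if within else node.far
    return node.label
```

For instance, a one-node tree such as `Node([1.0, 2.0, 1.0], 0.5, Leaf("false nettle"), Leaf("stinging nettle"))` labels any series containing a close match to the shapelet as a false nettle, mirroring the leaf example in the talk.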
21 EXPERIMENTAL EVALUATION Now I will provide you several interesting examples and case studies.
22 Performance Comparison
Original Lightning Dataset: Length 2000, Training 2000, Testing 18000.
Let's start with a large dataset, the Lightning dataset. In the original dataset, the length of each time series object is 2000, and there are separate training and testing sets: the training set has 2,000 objects and the testing set has 18,000. One sample from each class is plotted in the lower right figure. In our experiment, we use different fractions of the training set and calculate the accuracy on the original testing set.
The left figure shows the performance comparison. The y-axis indicates the computing time and the x-axis represents the size of the training set. As you can see, our pruning method is far more efficient than the brute-force one. When the dataset has 160 objects, the brute-force method takes 5 days to complete the learning procedure, while our method takes only 2 hours.
The right figure shows the testing accuracy. When only 10 objects (out of the original 2,000) are examined, the decision tree is slightly worse than the best known result on this dataset, but after examining just 1% of the training data, it is significantly more accurate.
23 Projectile Points
Projectile point classification is an important and interesting topic in anthropology. The diversity of projectile points, and the fact that many are broken, make them difficult to classify correctly. In this example, we use shapelets to classify three types of projectile points based on their shapes.
24 Arrowhead Decision Tree
Figure: Shapelet Dictionary with shapelets I (Clovis) and II (Avonlea), thresholds 11.24 and 85.47; Arrowhead Decision Tree with two nodes and leaves including Clovis and Avonlea.
We get a shapelet at each node of the decision tree. By looking at the corresponding portion of the shape, we find some interesting explanations of the classifier. Clovis is distinguished from the others by an unnotched hafting area near the bottom, connected by a deep concave bottom end. After Clovis is separated out, Avonlea is differentiated from the mixed class by a small notched hafting area connected by a shallow concave bottom end.
Our method also gives better accuracy and takes much less time than the rotation invariant nearest neighbor method.
Method, Accuracy, Time
Shapelet, 0.80, 0.33
Rotation Invariant Nearest Neighbor, 0.68, 1013
25 Wheat Spectrography
Figure: one sample from each class.
Wheat Dataset: Length 1050, Training 49, Testing 276.
Now let's see a multi-class example, the wheat spectrography problem. The wheat samples were grown in Canada in 1998 and later years, and the data is labeled by the year in which the wheat was grown. In the figure, we show one sample from each class. The objects are separated along the y-axis for visual clarity. As we can see, the global differences are subtle. Thus, we need to rely on the local differences.
26 Wheat Decision Tree
Figure: Shapelet Dictionary with shapelets I to VI; Wheat Decision Tree with six nodes.
Using the shapelet method, we build the decision tree, find shapelets that represent the local differences, and get better accuracy than the nearest neighbor method. So our method works well on the multi-class problem.
Method, Accuracy, Time
Shapelet, 0.720, 0.86
Nearest Neighbor, 0.543, 0.65
28 The Gun/NoGun Problem
Figure: Shapelet Dictionary with shapelet I (No Gun), threshold 38.94; Gun Decision Tree with a single node (I).
Let's consider the well-studied Gun/NoGun problem. In our experiment, we use the same training/testing split as the previous study. The shapelet captures a phenomenon known as “overshoot”. We can see that the NoGun class has a “dip” where the actor puts her hand down by her side. That is because inertia carries her hand a little too far and she is forced to correct for it.
We also achieve better accuracy with less classification time in this well-studied case.
Method, Accuracy, Time
Shapelet, 0.933, 0.016
Rotation Invariant Nearest Neighbor, 0.913, 0.064
29 Gait Analysis
Gait Dataset: Length different lengths, Training 18, Testing 143.
In all the experiments above, we considered only one-dimensional time series. In the gait analysis example, we extend the use of shapelets to multivariate time series. Here, we want to distinguish the normal walk, in the left video, from the abnormal walk, on the right. We record a time series for each foot. The videos are taken from different actors, with different walking styles and different numbers of steps.
30 Reduces the sensitivity of alignment
Figure: Shapelet Dictionary with shapelet I (Normal Walk) over the left toe and right toe series; Walk Decision Tree with a single node (I); accuracy bar chart with values 0.909, 0.902, 0.860 and 0.535.
The resulting shapelet, on the left, represents exactly one walking cycle. The right figure shows the accuracy comparison between two methods: shapelet and rotation invariant nearest neighbor. We have two types of data in the experiment: one is the raw data extracted from the video, and the other is the data after careful alignment and segmentation. Although the nearest neighbor method does not lose much accuracy on the well-segmented data, its accuracy suffers greatly on the raw data. So the shapelet method reduces the sensitivity to alignment and thus saves the human labor of preprocessing the data.
Time for classification on 143 testing objects:
Shapelet, 0.060s
Rotation Invariant Nearest Neighbor, 2.684s
31 Conclusions
Interpretable results; more accurate/robust; significantly faster at classification.
As we have seen, shapelets provide interpretable results, which may help domain practitioners better understand their data. Shapelets give more accurate results and are more robust to noise and misalignment. What is more, shapelets are significantly faster at classification than existing state-of-the-art approaches.
32 Discussions - Comparison
Hong Cheng, Xifeng Yan, Jiawei Han, and Chih-Wei Hsu, “Discriminative Frequent Pattern Analysis for Effective Classification” (ICDE'07)
Hong Cheng, Xifeng Yan, Jiawei Han, and Philip S. Yu, “Direct Discriminative Pattern Mining for Effective Classification” (ICDE'08)
Similarities:
motivation: discriminative frequent pattern = shapelet
technique: use an upper bound of information gain to speed up
Differences:
application: general feature selection vs. time series (no explicit features)
split node: binary (contain/not contain a pattern) vs. numeric value (smaller/larger than a value)
33 Discussions – other topics
Similar ideas could be applied to other research topics: graph, image, spatio-temporal, social network, ...
34 Discussions – other topics
Graph classification: Xifeng Yan, Hong Cheng, Jiawei Han, and Philip S. Yu, “Mining Significant Graph Patterns by Scalable Leap Search”, Proc. ACM SIGMOD Int. Conf. on Management of Data (SIGMOD'08), Vancouver, BC, Canada, June 2008.
35 Discussions – other topics
Moving object classification: discriminative sub-movement.
36 Discussions – other topics
Social network: classify normal/spamming users.