Parsimonious Linear Fingerprinting for Time Series. Lei Li, joint work with B. Aditya Prakash and Christos Faloutsos, School of Computer Science, Carnegie Mellon University.

1 Parsimonious Linear Fingerprinting for Time Series. Lei Li, joint work with B. Aditya Prakash and Christos Faloutsos. School of Computer Science, Carnegie Mellon University. © L. Li, 2010. Machine Learning Lunch, 2010/11/29.

2 Motion capture: understanding human motion; robots to assist the disabled.

3 Network security: anomaly detection in computer network traffic; BGP updates in the network.

4 Data stream monitoring: monitoring a datacenter with 5,000 servers yields 1 TB of data per day from 55 million streams [Reeves+ 2009]. (Figure: temperature in the datacenter.)

5 Find similar motions: SELECT * FROM WHERE data LIKE (query motion shown as an image).

6 Motivation: answering similarity queries in time series databases (TSDB), e.g. SELECT * FROM TSDB WHERE data LIKE “[query time series, shown as an image]”.
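A similarity query of this form can be read as a nearest-neighbor search. The sketch below is a brute-force illustration only: it assumes every stored sequence has the same length and uses plain Euclidean distance on raw values. The function name `like_query` and the toy database are hypothetical, and this is not how PLiF answers the query.

```python
import numpy as np

def like_query(tsdb, query, k=1):
    """Hypothetical helper: indices of the k rows of tsdb closest to query.

    tsdb:  array of shape (num_sequences, length), one sequence per row
    query: array of shape (length,)
    """
    dists = np.linalg.norm(tsdb - query, axis=1)  # Euclidean distance to every stored sequence
    return np.argsort(dists)[:k]                  # k smallest distances first

t = np.arange(500)
tsdb = np.stack([
    np.sin(2 * np.pi * t / 100),
    np.sin(2 * np.pi * t / 30),
    np.cos(2 * np.pi * t / 110),
])
query = np.sin(2 * np.pi * t / 100) + 0.05  # slightly offset copy of row 0
print(like_query(tsdb, query, k=1))
```

Raw Euclidean distance is sensitive to time shifts (a sine and a cosine of the same period are far apart), which is why the goals below ask for lag-independent features.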

7 Research philosophy: Database + Machine learning = “Databased Learning”. Machine learning: statistically effective, finds deeper patterns/functional relations (regression, Bayesian networks, SVMs/kernels, clustering). Databases: efficient and scalable (indexing, hashing, query optimization, buffering/caching).

8 Database + Machine learning = “Databased Learning”. Example: find similar motions/time series.

9 Beyond similarity queries: feature extraction supports querying & indexing, similarity functions, clustering/classification, visualization, forecasting, and compression.

11 CMU SCS Outline: Motivation; Proposed Method: Intuition & Example; Experiments & Results; PLiF: Insight Details; Conclusion.

12 Intuition: Goals. G1: good features/similarity function; G2: good compression; G3: ability to forecast; G4: scalability.

13 Intuition: Goals. G1: good features/similarity function, namely (1a) lag independent, (1b) frequency proximity, (1c) grouping harmonics; G2: good compression; G3: ability to forecast; G4: scalability.

14 Example: synthetic signals. (a) sin(2πt/100); (b) cos(2πt/100); (c) sin(2πt/98 + π/6); (d) sin(2πt/110) + 0.2 sin(2πt/30); (e) cos(2πt/110) + 0.2 sin(2πt/30 + π/4).
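For experimenting with the ideas that follow, the five signals can be generated directly from their equations. This sketch assumes integer time ticks t = 0, ..., 499 (the figures appear to span 500 ticks); the function name `synthetic_signals` is ours.

```python
import numpy as np

def synthetic_signals(T=500):
    """Return signals (a)-(e) from the slide, one per row, over t = 0..T-1."""
    t = np.arange(T)
    two_pi = 2 * np.pi
    return np.stack([
        np.sin(two_pi * t / 100),                                              # (a)
        np.cos(two_pi * t / 100),                                              # (b)
        np.sin(two_pi * t / 98 + np.pi / 6),                                   # (c)
        np.sin(two_pi * t / 110) + 0.2 * np.sin(two_pi * t / 30),              # (d)
        np.cos(two_pi * t / 110) + 0.2 * np.sin(two_pi * t / 30 + np.pi / 4),  # (e)
    ])

X = synthetic_signals()
print(X.shape)  # (5, 500)
```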

15 Intuition (1a): time shift. Signals that differ only by a time shift, such as (a) sin(2πt/100) and (b) cos(2πt/100), should be considered similar; e.g. left-foot-start walking vs. right-foot-start walking.
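To see why lag independence needs something other than raw distance, the sketch compares (a) and (b). It projects each signal onto a single complex exponential of the known period and keeps only the magnitude, which discards the phase. This is an illustration of the idea, not PLiF itself; `harmonic_magnitude` is a hypothetical helper.

```python
import numpy as np

def harmonic_magnitude(x, period):
    """Magnitude of the projection of x onto exp(-2*pi*i*t/period)."""
    t = np.arange(len(x))
    return np.abs(np.sum(x * np.exp(-2j * np.pi * t / period)))

t = np.arange(500)
a = np.sin(2 * np.pi * t / 100)
b = np.cos(2 * np.pi * t / 100)

print(np.linalg.norm(a - b))                                   # about 22.4: raw values differ a lot
print(harmonic_magnitude(a, 100), harmonic_magnitude(b, 100))  # both 250 up to rounding
```

Over a whole number of periods the phase drops out of the magnitude, so the time-shifted pair looks identical.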

16 Intuition (1b): nearby frequency & time shift, such as (a) sin(2πt/100) vs. (c) sin(2πt/98 + π/6); e.g. running vs. fast running.

17 Intuition (1c): groups of harmonics, such as (d) and (e), which both combine the frequencies 1/110 and 1/30; similar to human voices.

18 Q: only two numbers to represent each signal! Proposed: PLiF. (Figure: PLiF features, color scale from − to +; time axis up to 500.)

19 Intuition: how it works. Find hidden variables/patterns: HV1 (f = 1/100), HV2 (f = 1/110), HV3 (f = 1/30). (Figure: time axis up to 500.)

20 Intuition: how it works. HV2 (f = 1/110) and HV3 (f = 1/30) co-occur, so they are grouped into one hidden pattern, HV2’ = {HV2, HV3}.

21 (Figure: weights of each signal on HV1 and HV2’; recoverable entries include 1.0, 0.9, and 0.)

22 (Figure: the same weights on HV1 and HV2’.)

23 Why does it work, and how do we interpret it? In the proposed PLiF features (color scale from − to +), one feature corresponds to the harmonic 1/100 and the other to the group of harmonics 1/110 & 1/30.

24 Basic idea: projection onto harmonics (aka frequencies). Pattern/harmonic 1/100 ~ “walking”; pattern/harmonics 1/110 & 1/30 ~ “running”.

25 Why not SVD/PCA? (Figure: PCA vs. PLiF features, color scale from − to +.) PCA shows no clear grouping: it is confused!

26 Outline: Motivation; Proposed Method: Intuition & Example; Experiments & Results; PLiF: Insight Details; Conclusion.

27 Experiment: goals to verify. G1: good features (low dimensional); G2: good compression; G3: ability to forecast; G4: scalability.

28 Experiments. Datasets (sequences × length): BGP: 10 × 103k; Chlorine: 166 × 4k; Mocap: 49 × 100–500.

29 Result – Visualization. Mocap: first two PLiF “fingerprints”. With PLiF, we can now visualize very high-dimensional time sequences.

30 Result – Clustering. Mocap: first two PLiF “fingerprints”, walking vs. running.
PLiF + thresholding: confusion matrix (rows: predicted walk, run) [26 3; 0 20]; accuracy = 46/49.
PCA + k-means: confusion matrix (rows: predicted walk, run) [15 13; 11 10]; accuracy = 25/49.

31 Result – Clustering. BGP data: PLiF + hierarchical clustering.

32 Intuition: Goals. G1: good features/similarity function; G2: good compression; G3: ability to forecast; G4: scalability.

33 Result – Compression. Chlorine (166 × 4k): storing only the PLiF features & a sampling of the hidden variables. (Figure: error vs. compression ratio, with the ideal region marked.)

34 Result – Compression. Mocap (93 × 300): storing only the PLiF features & a sampling of the hidden variables. (Figure: error vs. compression ratio, with the ideal region marked.)

35 Intuition: Goals. G1: good features/similarity function; G2: good compression; G3: ability to forecast (later); G4: scalability.

36 Scalability: wall-clock time grows linearly with sequence length. (Figures: wall-clock time (s) vs. sequence length.)

37 Scalability: optimized algorithm (details later). (Figure: wall-clock time of PLiF-basic vs. PLiF; slope = 1/3.)

38 Intuition: Goals. G1: good features/similarity function; G2: good compression; G3: ability to forecast (later); G4: scalability.

39 Outline: Motivation; Proposed Method: Intuition & Example; Experiments & Results; PLiF: Insight Details; Conclusion.

40 Proposed Method: PLiF. S1: learning dynamics; S2: finding the canonical form; S3: handling the lag; S4: grouping harmonics.

41 Step 1: Learning dynamics. Use machine learning to find: – the “transition” of the hidden variables (HVs) from one time tick to the next; – the “mixing” weights: HVs → observed data. (Figure: time series of hidden variables.)

42 Underlying model: Linear Dynamical Systems (details).
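The model equations on this slide did not survive extraction. For reference, the standard linear dynamical system written with the parameters named on the next slide (μ0, Q0, A, Q, C, R) is shown below; the symbols z_n (hidden state at tick n), x_n (observation), ω_n and ε_n (noise) are our notation.

```latex
\begin{aligned}
z_1 &\sim \mathcal{N}(\mu_0,\, Q_0) \\
z_{n+1} &= A\, z_n + \omega_n, & \omega_n &\sim \mathcal{N}(0,\, Q) \\
x_n &= C\, z_n + \epsilon_n, & \epsilon_n &\sim \mathcal{N}(0,\, R)
\end{aligned}
```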

43 Linear Dynamical Systems: parameters (details).
– μ0: initial state of the hidden variables, e.g. initial position, velocity & acceleration.
– A: transition matrix; how the states move forward, e.g. a soccer ball flying through the air.
– C: transmission/projection/output matrix; hidden state → observation, e.g. a camera taking pictures of the soccer ball.
– Q0: initial covariance.
– Q: transition covariance; how precise the soccer-ball motion is.
– R: transmission/projection covariance, i.e. observation noise, e.g. how accurate the camera is.
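To make the role of each parameter concrete, here is a minimal sampler, assuming Gaussian noise as in the standard LDS. The function name `sample_lds` and the toy parameters (a single rotation playing the role of the 1/100 harmonic, with C reading out signals (a) and (b)) are illustrative choices, not values from the talk.

```python
import numpy as np

def sample_lds(mu0, Q0, A, Q, C, R, T, rng=None):
    """Draw hidden states z (T x k) and observations x (T x m) from a linear dynamical system."""
    rng = np.random.default_rng() if rng is None else rng
    k, m = A.shape[0], C.shape[0]
    z = np.empty((T, k))
    x = np.empty((T, m))
    z[0] = rng.multivariate_normal(mu0, Q0)  # initial state ~ N(mu0, Q0)
    for n in range(T):
        if n > 0:
            z[n] = A @ z[n - 1] + rng.multivariate_normal(np.zeros(k), Q)  # transition
        x[n] = C @ z[n] + rng.multivariate_normal(np.zeros(m), R)          # observation
    return z, x

# Hypothetical toy parameters: hidden state (cos wt, sin wt) rotated one tick at a time.
w = 2 * np.pi / 100
A = np.array([[np.cos(w), -np.sin(w)], [np.sin(w), np.cos(w)]])
C = np.array([[0.0, 1.0],   # row 0 reads out sin(wt), signal (a)
              [1.0, 0.0]])  # row 1 reads out cos(wt), signal (b)
z, x = sample_lds(mu0=np.array([1.0, 0.0]), Q0=1e-6 * np.eye(2), A=A, Q=1e-6 * np.eye(2),
                  C=C, R=1e-4 * np.eye(2), T=500, rng=np.random.default_rng(0))
```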

44 Dynamics/transition in hidden variables: HV(t+1) = A · HV(t), with A the transition matrix; this enables forecasting.
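A minimal sketch of how the transition matrix enables forecasting, assuming A and C are already known (in PLiF they are learned) and ignoring noise, so the forecast is the model's mean prediction. The helper `forecast` and the single-harmonic parameters are hypothetical.

```python
import numpy as np

def forecast(A, C, z_last, steps):
    """Noise-free forecast: roll the hidden state forward with A, map it to observations with C."""
    preds = []
    z = z_last
    for _ in range(steps):
        z = A @ z              # HV(t+1) = A HV(t)
        preds.append(C @ z)    # observed values = C HV(t+1)
    return np.array(preds)

w = 2 * np.pi / 100
A = np.array([[np.cos(w), -np.sin(w)], [np.sin(w), np.cos(w)]])
C = np.array([[0.0, 1.0]])                             # hypothetical: observe sin(wt) only
t0 = 499
z_last = np.array([np.cos(w * t0), np.sin(w * t0)])   # hidden state at the last observed tick
pred = forecast(A, C, z_last, steps=50)
print(np.allclose(pred[:, 0], np.sin(w * np.arange(500, 550))))  # True
```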

45 Mixing weights: the mixing/output matrix C. (Figure, color scale from − to +.)

46 Learning the parameters: Expectation-Maximization, maximizing the expected log-likelihood (details). Standard EM is expensive! Further speed optimization in PLiF: matrix inversion using the Woodbury matrix identity.
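The Woodbury identity, (M + U W V)⁻¹ = M⁻¹ − M⁻¹ U (W⁻¹ + V M⁻¹ U)⁻¹ V M⁻¹, replaces an n × n inversion by a k × k one when M⁻¹ is cheap and k ≪ n. The slide does not say which matrix PLiF applies it to; one place this diagonal-plus-low-rank pattern appears is the covariance C P Cᵀ + R in Kalman filtering with diagonal R. The sketch below only checks the identity numerically; `woodbury_inverse` and the random test matrices are our own.

```python
import numpy as np

def woodbury_inverse(d, U, W, V):
    """Inverse of diag(d) + U @ W @ V via the Woodbury identity.

    d: (n,) nonzero diagonal; U: (n, k); W: (k, k); V: (k, n).
    Only a k x k system is solved; forming the full n x n result still
    costs O(n^2 k), versus O(n^3) for a direct inverse.
    """
    Minv_U = U / d[:, None]                 # M^{-1} U, with M = diag(d)
    V_Minv = V / d[None, :]                 # V M^{-1}
    inner = np.linalg.inv(W) + V @ Minv_U   # k x k "capacitance" matrix
    return np.diag(1.0 / d) - Minv_U @ np.linalg.solve(inner, V_Minv)

rng = np.random.default_rng(0)
n, k = 200, 5
d = rng.uniform(1.0, 2.0, size=n)
U = rng.normal(size=(n, k))
W = np.eye(k)
full = np.diag(d) + U @ W @ U.T
print(np.allclose(woodbury_inverse(d, U, W, U.T), np.linalg.inv(full)))  # True
```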

47 Step 2: Canonicalization. But hidden variables are: – hard to interpret; – non-unique: many combinations are essentially the same. Intuition: make the hidden variables compact and “uniquely” identified.

48 Canonicalization adds interpretability. (Figure: time series of HVs before and after canonicalization (real part); afterwards they are “harmonics” with f = 1/110, 1/100, and 1/30; note a subtle frequency scaling.)

49 Step 2: Canonicalization, again. Estimate how each signal is composed of “harmonics”/patterns, but in complex space. (Figure: complex-valued mixing matrix.)
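One way to see where the “harmonics” come from: if the transition matrix A has a complex eigenvalue λ = r·e^{iθ}, the corresponding canonical hidden variable rotates by θ per tick, i.e. has frequency θ/(2π) cycles per tick (r controls growth or decay). The sketch builds a hypothetical A from two rotation blocks at f = 1/100 and f = 1/30, recovers those frequencies from its eigenvalues, and forms the complex mixing matrix C_h = C V, following the eigendecomposition idea in the backup slide on the canonical form. The helpers and the random C are illustrative, not the authors' code.

```python
import numpy as np

def rotation(freq):
    """2x2 transition block advancing a (cos, sin) pair by one tick at the given frequency."""
    w = 2 * np.pi * freq
    return np.array([[np.cos(w), -np.sin(w)],
                     [np.sin(w),  np.cos(w)]])

def block_diag(*blocks):
    """Place square blocks along the diagonal of a zero matrix."""
    n = sum(b.shape[0] for b in blocks)
    out = np.zeros((n, n))
    i = 0
    for b in blocks:
        k = b.shape[0]
        out[i:i + k, i:i + k] = b
        i += k
    return out

A = block_diag(rotation(1 / 100), rotation(1 / 30))  # hypothetical learned transition matrix
C = np.random.default_rng(0).normal(size=(5, 4))    # hypothetical mixing matrix: 5 signals, 4 HVs

lam, V = np.linalg.eig(A)                     # A = V diag(lam) V^{-1}
freqs = np.abs(np.angle(lam)) / (2 * np.pi)   # cycles per tick, one per eigenvalue
C_h = C @ V                                   # complex-valued mixing matrix for the canonical HVs
print(np.sort(freqs))                         # two copies each of 1/100 and 1/30
```

Each frequency appears twice because the eigenvalues of a real A come in conjugate pairs; Step 3 deals with that redundancy.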

50 Step 3: Handling lag. Intuition: – groups emerge; – reduce redundancy; – eliminate the phase shift. (Figure: complex-valued mixing matrix; note the conjugate pairs.)

51 Step 3: Handling lag. Idea: – only the magnitude counts; – remove duplicates. (Figure, color scale from − to +.)
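A minimal sketch of the two ideas on this slide, applied to the time-shifted pair (a) and (b): take element-wise magnitudes of the complex mixing matrix (dropping the phase) and keep one column from each conjugate pair (here, the eigenvalue with non-negative imaginary part). The helper `lag_free_features` and the toy matrices are our own illustration.

```python
import numpy as np

def lag_free_features(C_h, lam, tol=1e-9):
    """Drop phase and one column of each conjugate pair from a complex mixing matrix.

    Keeps eigenvalues with non-negative imaginary part (real eigenvalues are kept too)
    and returns the element-wise magnitudes |C_h| of the kept columns.
    """
    keep = lam.imag >= -tol
    return np.abs(C_h[:, keep]), lam[keep]

# One harmonic with period 100: hidden state (cos wt, sin wt) rotated by A each tick.
w = 2 * np.pi / 100
A = np.array([[np.cos(w), -np.sin(w)], [np.sin(w), np.cos(w)]])
C = np.array([[0.0, 1.0],    # signal (a): sin(wt)
              [1.0, 0.0]])   # signal (b): cos(wt)

lam, V = np.linalg.eig(A)
F, kept = lag_free_features(C @ V, lam)
print(F)  # both rows about 0.7071: the time shift between (a) and (b) is gone
```

Dropping the conjugate column loses nothing here, since conjugate columns have identical magnitudes.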

52 Step 3: Handling lag improves interpretability: the features separate into harmonics 1/100, 1/110, and 1/30. (Figure, color scale from − to +.)

53 Step 4: Grouping harmonics. Intuition: there is still a little redundancy among harmonics 1/100, 1/110, and 1/30. Think Minimum Description Length.

54 Step 4: Grouping harmonics by dimensionality reduction (SVD/PCA): find U, S, V minimizing ‖X − U S Vᵀ‖². (Figure, color scale from − to +.)
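A small sketch of this step, assuming X is the (signals × harmonics) magnitude matrix from Step 3. The matrix below is hypothetical, with magnitudes proportional to the amplitudes in the example equations for (a), (b), (d), (e). A rank-2 SVD leaves two numbers per signal, and the harmonics 1/110 and 1/30 load on the same component, which is the grouping described above.

```python
import numpy as np

def group_harmonics(F, rank):
    """Rank-`rank` SVD of the (signals x harmonics) matrix F.

    Returns per-signal fingerprints U_r * S_r and harmonic loadings V_r^T.
    """
    U, S, Vt = np.linalg.svd(F, full_matrices=False)
    return U[:, :rank] * S[:rank], Vt[:rank]

# Hypothetical magnitudes on harmonics 1/100, 1/110, 1/30.
F = np.array([
    [1.0, 0.0, 0.0],   # (a)
    [1.0, 0.0, 0.0],   # (b)
    [0.0, 1.0, 0.2],   # (d)
    [0.0, 1.0, 0.2],   # (e)
])
fingerprints, loadings = group_harmonics(F, rank=2)
print(np.round(fingerprints, 3))
print(np.allclose(fingerprints @ loadings, F))  # True: two numbers per signal suffice here
```

The SVD signs are arbitrary, so only relative positions of the fingerprints are meaningful.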

55 Step 4: Grouping harmonics. Result: harmonic 1/100, and the group of harmonics 1/110 & 1/30. (Figure, color scale from − to +.)

56 Parsimonious Linear Fingerprinting: goals → steps. PLiF goals: G1 good features/similarity function ((1a) lag independent, (1b) frequency proximity, (1c) grouping harmonics); G2 good compression; G3 ability to forecast; G4 scalability. PLiF algorithm steps: S1 learning dynamics; S2 canonical form; S3 handling lag; S4 grouping harmonics.

57 Outline: Motivation; Proposed Method: Intuition & Example; Experiments & Results; PLiF: Insight Details; Conclusion.

58 Conclusion. – We need a compact representation of time series data. – Intuition & insights of PLiF. – Interpretation of PLiF & how it works. – Experiments on a diverse set of data: it really works, and it is fast & scalable.

59 Take-away message. – We need good features for time series, serving similarity functions, compression, and forecasting. – Design the method to match time-series characteristics, e.g. phase shift/lag correlation. – When to use PLiF: near-periodic & relatively smooth signals.

60 References:
– Lei Li, B. Aditya Prakash, Christos Faloutsos. Parsimonious Linear Fingerprinting for Time Series. VLDB 2010.
– Lei Li, Jim McCann, Christos Faloutsos, Nancy Pollard. DynaMMo: Mining and Summarization of Coevolving Sequences with Missing Values. ACM KDD 2009.

61 Questions? Thanks! Christos Faloutsos, Lei Li, B. Aditya Prakash. http://www.cs.cmu.edu/~leili/, leili@cs.cmu.edu

62 Backup: appendix.

63–66 Why not Fourier (DFT)?
1. FT cannot do forecasting.
2. No arbitrary frequency: the DFT spectrum only has discrete bins, which may miss the true frequency. (Figure: FT spectrum vs. true frequency.)
3. Nearby frequencies are treated differently (e.g. freq. = 5 vs. freq. = 5.1), so the FT is not suited for comparison across signals.
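To make points 2 and 3 concrete, the sketch computes the DFT of two sines with nearby frequencies over N = 100 samples (an assumed length). The DFT only has bins at integer multiples of 1/N, so 5 cycles lands exactly in bin 5, while 5.1 cycles still peaks at bin 5 but leaks energy into neighboring bins: two nearly identical signals get noticeably different spectra. The helper `peak_energy_fraction` is our own.

```python
import numpy as np

def peak_energy_fraction(freq_cycles, N=100):
    """Share of one-sided DFT energy in the strongest bin, and that bin, for a sine
    with freq_cycles cycles over N samples."""
    t = np.arange(N)
    x = np.sin(2 * np.pi * freq_cycles * t / N)
    power = np.abs(np.fft.rfft(x)) ** 2
    return power.max() / power.sum(), int(np.argmax(power))

print(peak_energy_fraction(5.0))  # essentially all energy in bin 5
print(peak_energy_fraction(5.1))  # still peaks at bin 5, but energy leaks into neighbors
```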

67 Handling missing values:
– Lei Li, Jim McCann, Nancy Pollard, Christos Faloutsos. BoLeRO: A Principled Technique for Including Bone Length Constraints in Motion Capture Occlusion Filling. ACM SIGGRAPH / Eurographics Symposium on Computer Animation, 2010.
– Lei Li, Jim McCann, Christos Faloutsos, Nancy Pollard. DynaMMo: Mining and Summarization of Coevolving Sequences with Missing Values. ACM KDD 2009.
– Lei Li, Jim McCann, Christos Faloutsos, Nancy Pollard. Laziness is a virtue: Motion stitching using effort minimization. Eurographics 2008.

68 Details for implementation: read this only if you want to implement it.

69 Modelling the data: Linear Dynamical Systems (details).

70 Linear Dynamical Systems: parameters (details; same table as slide 43: μ0, A, C, Q0, Q, R).

71 Learning the dynamics: Expectation-Maximization, maximizing the expected log-likelihood (details).

72 Finding the canonical form (details). Intuition: find the canonical dynamics by taking the eigenvalue decomposition of the transition matrix A, then compensate C with C_h. C_h is a projection of the data onto the dynamics, but...
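A short derivation of why C must be compensated, assuming A is diagonalizable and omitting the noise terms: changing the hidden coordinates by the eigenvector matrix V makes the dynamics diagonal, and the observations stay the same only if C is replaced by C_h = CV. The notation z_n, x_n, V, Λ is ours, chosen to match the standard LDS form (z_{n+1} = A z_n, x_n = C z_n).

```latex
\begin{aligned}
A &= V \Lambda V^{-1}, \qquad \Lambda = \operatorname{diag}(\lambda_1, \dots, \lambda_k) \\
z^{h}_n &= V^{-1} z_n
  \;\Rightarrow\; z^{h}_{n+1} = V^{-1} A\, z_n = \Lambda V^{-1} z_n = \Lambda\, z^{h}_n \\
x_n &= C z_n = C V V^{-1} z_n = C_h\, z^{h}_n, \qquad C_h = C V
\end{aligned}
```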

