1 CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #25: Time series mining and forecasting Christos Faloutsos

2 CMU SCS (c) C. Faloutsos Must-Read Material Byoung-Kee Yi, Nikolaos D. Sidiropoulos, Theodore Johnson, H.V. Jagadish, Christos Faloutsos and Alex Biliris, Online Data Mining for Co-Evolving Time Sequences, ICDE 2000. Chungmin Melvin Chen and Nick Roussopoulos, Adaptive Selectivity Estimation Using Query Feedbacks, SIGMOD 1994

3 Thanks Deepayan Chakrabarti (CMU), Spiros Papadimitriou (CMU), Prof. Byoung-Kee Yi (Pohang U.)

4 Outline Motivation; Similarity search - distance functions; Linear Forecasting; Bursty traffic - fractals and multifractals; Non-linear forecasting; Conclusions

5 Problem definition Given: one or more sequences x1, x2, …, xt, … (y1, y2, …, yt, …) Find: similar sequences; forecasts; patterns; clusters; outliers

6 Motivation - Applications Financial, sales, economic series. Medical: ECGs; blood pressure etc. monitoring; reactions to new drugs; elderly care

7 Motivation - Applications (cont’d) ‘Smart house’: sensors monitor temperature, humidity, air quality; video surveillance

8 Motivation - Applications (cont’d) Civil/automobile infrastructure: bridge vibrations [Oppenheim+02]; road conditions / traffic monitoring

9 Motivation - Applications (cont’d) Weather, environment/anti-pollution: volcano monitoring; air/water pollutant monitoring

10 Motivation - Applications (cont’d) Computer systems: ‘Active Disks’ (buffering, prefetching); web servers (ditto); network traffic monitoring; …

11 Stream Data: Disk accesses (#bytes over time)

12 Problem #1: Goal: given a signal (e.g., #packets over time) Find: patterns, periodicities, and/or compress (e.g., lynx caught per year; packets per day; temperature per day)

13 Problem #2: Forecast Given x[t-1], x[t-2], …, forecast x[t] (number of packets sent at the next time tick)

14 Problem #2’: Similarity search E.g., find a 3-tick pattern similar to the last one

15 Problem #3: Given: a set of correlated time sequences. Forecast ‘Sent(t)’

16 Important observations Patterns, rules, forecasting and similarity indexing are closely related: to do forecasting, we need to find patterns/rules and to find similar settings in the past; to find outliers, we need to have forecasts (outlier = too far away from our forecast)

17 Outline Motivation; Similarity Search and Indexing; Linear Forecasting; Bursty traffic - fractals and multifractals; Non-linear forecasting; Conclusions

18 Outline Motivation; Similarity search and distance functions: Euclidean; Time-warping; …

19 Importance of distance functions Subtle, but absolutely necessary: a ‘must’ for similarity indexing (-> forecasting); a ‘must’ for clustering. Two major families: Euclidean and Lp norms; time warping and variations

20 Euclidean and Lp norms L1: city-block = Manhattan; L2: Euclidean; L∞

21 Observation #1 Time sequence -> n-d vector (Day-1, Day-2, …, Day-n)

22 Observation #2 Euclidean distance is closely related to: cosine similarity; dot product; ‘cross-correlation’ function

23 Time Warping Allow accelerations - decelerations (with or w/o penalty); THEN compute the (Euclidean) distance (+ penalty); related to the string-editing distance

24 Time Warping ‘stutters’:

25 Time warping Q: how to compute it? A: dynamic programming. D(i, j) = cost to match prefix of length i of first sequence x with prefix of length j of second sequence y

26 Time warping Thus, with no penalty for stutter, for sequences x1, x2, …, xi; y1, y2, …, yj: D(i, j) = dist(xi, yj) + min{ D(i, j-1) (x-stutter), D(i-1, j) (y-stutter), D(i-1, j-1) (no stutter) }

27 Time warping VERY SIMILAR to the string-editing distance (same x-stutter / y-stutter / no-stutter recurrence)

28 Time warping Complexity: O(M*N) - quadratic in the length of the strings. Many variations (penalty for stutters; limit on the number/percentage of stutters; …). Popular in voice processing [Rabiner + Juang]
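The recurrence above can be sketched as a small dynamic program. This is a minimal illustration, assuming absolute difference as the local mismatch cost and no stutter penalty, exactly as in the no-penalty case on the slides:

```python
import numpy as np

def dtw(x, y):
    """Time-warping distance by dynamic programming (no stutter penalty).

    D[i, j] = cost to match the length-i prefix of x with the
    length-j prefix of y; each cell adds the local mismatch cost to
    the best of the x-stutter, y-stutter and no-stutter cases.
    """
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            D[i, j] = cost + min(D[i - 1, j],      # y-stutter
                                 D[i, j - 1],      # x-stutter
                                 D[i - 1, j - 1])  # no stutter
    return D[n, m]
```

Note how free stutters make a sequence and its "stuttered" copy match at zero cost, which plain Euclidean distance cannot do. The double loop is the O(M*N) complexity quoted on the next slide.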

29 Other Distance functions Piece-wise linear/flat approximation; compare pieces [Keogh+01] [Faloutsos+97]. ‘Cepstrum’ (for voice [Rabiner + Juang]): do DFT; take log of amplitude; do DFT again! Allow for small gaps [Agrawal+95]. See tutorial by [Gunopulos + Das, SIGMOD01]

30 Other Distance functions In [Keogh+, KDD’04]: parameter-free, MDL-based

31 Conclusions Prevailing distances: Euclidean and time-warping

32 Outline Motivation; Similarity search and distance functions; Linear Forecasting; Bursty traffic - fractals and multifractals; Non-linear forecasting; Conclusions

33 Linear Forecasting

34 Forecasting "Prediction is very difficult, especially about the future." - Niels Bohr

35 Outline Motivation; … Linear Forecasting: Auto-regression: Least Squares; RLS; Co-evolving time sequences; Examples; Conclusions

36 Reference [Yi+00] Byoung-Kee Yi et al.: Online Data Mining for Co-Evolving Time Sequences, ICDE 2000. (Describes MUSCLES and Recursive Least Squares)

37 Problem #2: Forecast Example: given x[t-1], x[t-2], …, forecast x[t]

38 Forecasting: Preprocessing MANUALLY: remove trends; spot periodicities (e.g., a 7-day period)

39 Problem #2: Forecast Solution: try to express x[t] as a linear function of the past: x[t-1], x[t-2], … (up to a window of w)

40 (Problem: Back-cast; interpolate) Solution - interpolate: try to express x[t] as a linear function of the past AND the future: x[t+1], x[t+2], … x[t+wfuture]; x[t-1], … x[t-wpast] (up to windows of wpast, wfuture). EXACTLY the same algorithms

41 Linear Regression: idea Express what we don’t know (= ‘dependent variable’, e.g., body weight) as a linear function of what we know (= ‘indep. variable(s)’, e.g., body height)

42 Linear Auto Regression:

43 Linear Auto Regression ‘Lag-plot’, lag w=1: dependent variable = # of packets sent S[t]; independent variable = # of packets sent S[t-1]

44 Outline Motivation; … Linear Forecasting: Auto-regression: Least Squares; RLS; Co-evolving time sequences; Examples; Conclusions

45 More details Q1: Can it work with window w>1? A1: YES!

46 More details Q1: Can it work with window w>1? A1: YES! (we’ll fit a hyper-plane, then!)

47 More details Q1: Can it work with window w>1? A1: YES! (we’ll fit a hyper-plane, then!)

48 More details Q1: Can it work with window w>1? A1: YES! The problem becomes: X[N×w] · a[w×1] = y[N×1] (OVER-CONSTRAINED): a is the vector of the regression coefficients; X has the N values of the w indep. variables; y has the N values of the dependent variable

49 More details X[N×w] · a[w×1] = y[N×1] (columns of X: ind-var-1 … ind-var-w; rows: time)

50 More details X[N×w] · a[w×1] = y[N×1] (columns of X: ind-var-1 … ind-var-w; rows: time)

51 More details Q2: How to estimate a1, a2, … aw = a? A2: with Least Squares fit (Moore-Penrose pseudo-inverse); a is the vector that minimizes the RMSE from y: a = (X^T · X)^(-1) · (X^T · y)
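The least-squares formula above can be sketched directly in code. This is an illustrative example only: the AR(2) series and its coefficients 0.6 and 0.3 are made up, and the noise level is an assumption.

```python
import numpy as np

# Build a toy AR(2) series x[t] = 0.6*x[t-1] + 0.3*x[t-2] + noise.
rng = np.random.default_rng(0)
x = [1.0, 2.0]
for _ in range(500):
    x.append(0.6 * x[-1] + 0.3 * x[-2] + 0.1 * rng.standard_normal())
x = np.array(x)

# Sample matrix X (N x w) and dependent vector y (N x 1), w = 2.
X = np.column_stack([x[1:-1], x[:-2]])   # columns: x[t-1], x[t-2]
y = x[2:]

# a = (X^T X)^{-1} (X^T y), via the Moore-Penrose pseudo-inverse.
a = np.linalg.pinv(X.T @ X) @ (X.T @ y)
```

With enough samples, `a` comes back close to the generating coefficients (0.6, 0.3), which is the over-constrained fit the slide describes.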

52 More details Q2: How to estimate a1, a2, … aw = a? A2: with Least Squares fit: a = (X^T · X)^(-1) · (X^T · y). Identical to the earlier formula (proof?): a = V · Λ^(-1) · U^T · y, where X = U · Λ · V^T

53 More details Straightforward solution: a = (X^T · X)^(-1) · (X^T · y) (a: regression coefficient vector; X: N×w sample matrix). Observations: sample matrix X grows over time; needs matrix inversion; O(N×w^2) computation; O(N×w) storage

54 Even more details Q3: Can we estimate a incrementally? A3: Yes, with the brilliant, classic method of ‘Recursive Least Squares’ (RLS) (see, e.g., [Yi+00], for details). We can do the matrix inversion, WITHOUT inversion! (How is that possible?!)

55 Even more details Q3: Can we estimate a incrementally? A3: Yes, with the brilliant, classic method of ‘Recursive Least Squares’ (RLS) (see, e.g., [Yi+00], for details). We can do the matrix inversion, WITHOUT inversion! (How is that possible?!) A: our matrix has special form: (X^T X)

56 More details At the N+1 time tick: the new 1×w row x[N+1] is appended to the N×w sample matrix X_N, giving X_{N+1}

57 More details Let G_N = (X_N^T · X_N)^(-1) (the w×w ``gain matrix’’); G_{N+1} can be computed recursively from G_N

58 EVEN more details Let’s elaborate (VERY IMPORTANT, VERY VALUABLE!): x[N+1] is a 1×w row vector

59 EVEN more details: (derivation, continued)

60 EVEN more details: dimensions: a [w×1] = ( X_{N+1}^T [w×(N+1)] · X_{N+1} [(N+1)×w] )^(-1) · X_{N+1}^T [w×(N+1)] · y_{N+1} [(N+1)×1]

61 EVEN more details: X_{N+1}^T · X_{N+1}: [w×(N+1)] · [(N+1)×w]

62 EVEN more details: x[N+1] is a 1×w row vector; G_{N+1} is the new ‘gain matrix’

63 EVEN more details: (derivation, continued)

64 EVEN more details: dimensions: [w×w] [w×1] [1×w] [w×w], and the denominator [1×1] is a SCALAR!

65 Altogether: G_{N+1} = G_N - c · (G_N · x[N+1]^T) · (x[N+1] · G_N), where the scalar c = 1 / (1 + x[N+1] · G_N · x[N+1]^T)

66 Altogether: start from G_0 = δ · I, where I: w×w identity matrix; δ: a large positive number (say, 10^4). IMPORTANT!
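The RLS recursion can be sketched as follows. This is a hedged illustration of the standard rank-one (Sherman-Morrison) update behind the slides, not the exact code of [Yi+00]; the test data and coefficients are made up.

```python
import numpy as np

def rls_update(G, a, x_new, y_new):
    """One Recursive Least Squares step.

    G: current w x w 'gain matrix' (X^T X)^{-1};
    x_new: new 1 x w row of independent variables;
    y_new: new dependent value.
    No matrix inversion - the update's denominator is a scalar.
    """
    x = x_new.reshape(-1, 1)                       # w x 1 column
    Gx = G @ x
    G = G - (Gx @ Gx.T) / (1.0 + float(x.T @ Gx))  # scalar denominator
    a = a + (G @ x).ravel() * (y_new - float(x_new @ a))
    return G, a

# Initialization from the slide: G_0 = delta * I, delta large (10^4).
w = 2
G = 1e4 * np.eye(w)
a = np.zeros(w)

# Feed synthetic rows with known coefficients (0.6, 0.3).
rng = np.random.default_rng(1)
true_a = np.array([0.6, 0.3])
for _ in range(50):
    row = rng.standard_normal(w)
    G, a = rls_update(G, a, row, float(row @ true_a))
```

After a few dozen ticks `a` converges to the generating coefficients; each step costs O(w^2) time and O(w^2) space, matching the comparison slide that follows.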

67 Comparison Straightforward Least Squares: needs a huge matrix (growing in size, O(N×w)); costly matrix operation, O(N×w^2). Recursive LS: needs a much smaller, fixed-size matrix, O(w×w); fast, incremental computation, O(1×w^2); no matrix inversion. (E.g., N = 10^6, w = 1-100)

68 Pictorially: Given: (scatter plot of dependent vs. independent variable, with fitted line)

69 Pictorially: a new point arrives

70 Pictorially: RLS: quickly compute new best fit

71 Even more details Q4: can we ‘forget’ the older samples? A4: Yes - RLS can easily handle that [Yi+00]

72 Adaptability - ‘forgetting’ (e.g., independent variable: #packets sent; dependent variable: #bytes sent)

73 Adaptability - ‘forgetting’ Trend change: (R)LS with no forgetting keeps the stale fit

74 Adaptability - ‘forgetting’ Trend change: (R)LS with forgetting tracks the new trend; RLS can *trivially* handle ‘forgetting’

75 How to choose ‘w’? Goal: capture arbitrary periodicities with NO human intervention on a semi-infinite stream (Details)

76 Reference [Papadimitriou+ vldb2003] Spiros Papadimitriou, Anthony Brockwell and Christos Faloutsos, Adaptive, Hands-Off Stream Mining, VLDB 2003, Berlin, Germany, Sept. 2003 (Details)

77 Answer: ‘AWSOM’ (Arbitrary Window Stream fOrecasting Method) [Papadimitriou+, vldb2003] Idea: do AR on each wavelet level. In detail: (Details)

78 AWSOM Wavelet decomposition of x[t]: coefficients W1,1 … W1,4; W2,1, W2,2; W3,1; V4,1, laid out over time and frequency (Details)

79 AWSOM Same decomposition, time vs. frequency view (Details)

80 AWSOM - idea Fit an AR model per wavelet level l: W[l,t] ≈ β[l,1]·W[l,t-1] + β[l,2]·W[l,t-2] + …, and likewise W[l’,t’] ≈ β[l’,1]·W[l’,t’-1] + β[l’,2]·W[l’,t’-2] + … (Details)

81 More details… Update of wavelet coefficients (incremental); update of linear models (incremental; RLS); feature selection (single-pass): not all correlations are significant - throw away the insignificant ones (“noise”) (Details)

82 Results - Synthetic data Triangle pulse; mix (sine + square). AR captures the wrong trend (or none); seasonal AR estimation fails. (Panels: AWSOM, AR, Seasonal AR) (Details)

83 Results - Real data Automobile traffic: daily periodicity; bursty “noise” at smaller scales. AR fails to capture any trend; seasonal AR estimation fails (Details)

84 Results - Real data Sunspot intensity: slightly time-varying “period”. AR captures the wrong trend; seasonal ARIMA: wrong downward trend, despite help by a human! (Details)

85 Complexity Model update - Space: O(lg N + m·k^2) = O(lg N); Time: O(k^2) = O(1). Where N: number of points (so far); k: number of regression coefficients (fixed); m: number of linear models, O(lg N) (Details)

86 Outline Motivation; … Linear Forecasting: Auto-regression: Least Squares; RLS; Co-evolving time sequences; Examples; Conclusions

87 Co-Evolving Time Sequences Given: a set of correlated time sequences. Forecast ‘Repeated(t)’

88 Solution: Q: what should we do?

89 Solution: Least Squares, with Dep. variable: Repeated(t); Indep. variables: Sent(t-1) … Sent(t-w); Lost(t-1) … Lost(t-w); Repeated(t-1), … (named: ‘MUSCLES’ [Yi+00])
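The design matrix for this multi-sequence regression can be sketched as below. This is a MUSCLES-style illustration only, not the actual [Yi+00] code; the sequence names (Sent, Lost, Repeated) come from the slide, and the helper is hypothetical.

```python
import numpy as np

def muscles_matrix(seqs, w):
    """Stack lagged values of ALL co-evolving sequences as the
    independent variables: one row per time tick t >= w, with columns
    s[t-1] ... s[t-w] for each sequence s in seqs."""
    n = len(seqs[0])
    return np.array([[s[t - j] for s in seqs for j in range(1, w + 1)]
                     for t in range(w, n)])

# E.g., with seqs = [sent, lost, repeated] and y = repeated[w:],
# the coefficients come from the same least-squares machinery:
#   a = (X^T X)^{-1} (X^T y)   (or, incrementally, RLS).
```

The point is that nothing new is needed beyond the earlier machinery: co-evolving sequences just widen the sample matrix.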

90 Forecasting - Outline Auto-regression; Least Squares; recursive least squares; Co-evolving time sequences; Examples; Conclusions

91 Examples - Experiments Datasets: modem pool traffic (14 modems, 1500 time-ticks; #packets per time unit); AT&T WorldNet internet usage (several data streams; 980 time-ticks). Measure of success - Accuracy: Root Mean Square Error (RMSE)

92 Accuracy - “Modem” MUSCLES outperforms AR & “yesterday”

93 Accuracy - “Internet” MUSCLES consistently outperforms AR & “yesterday”

94 Linear forecasting - Outline Auto-regression; Least Squares; recursive least squares; Co-evolving time sequences; Examples; Conclusions

95 Conclusions - Practitioner’s guide AR(IMA) methodology: prevailing method for linear forecasting. Brilliant method of Recursive Least Squares for fast, incremental estimation - see [Box-Jenkins]. (AWSOM: no human intervention)

96 Resources: software and urls Free-ware: ‘R’ for statistical analysis (clone of Splus); python script for RLS

97 Books George E.P. Box, Gwilym M. Jenkins and Gregory C. Reinsel, Time Series Analysis: Forecasting and Control, Prentice Hall, 1994 (the classic book on ARIMA, 3rd ed.). Brockwell, P. J. and R. A. Davis (1987). Time Series: Theory and Methods. New York, Springer Verlag.

98 Additional Reading [Papadimitriou+ vldb2003] Spiros Papadimitriou, Anthony Brockwell and Christos Faloutsos, Adaptive, Hands-Off Stream Mining, VLDB 2003, Berlin, Germany, Sept. 2003. [Yi+00] Byoung-Kee Yi et al.: Online Data Mining for Co-Evolving Time Sequences, ICDE 2000. (Describes MUSCLES and Recursive Least Squares)

99 Outline Motivation; Similarity search and distance functions; Linear Forecasting; Bursty traffic - fractals and multifractals; Non-linear forecasting; Conclusions

100 Bursty Traffic & Multifractals

101 SKIP, if you have done HW2, Q3: foils use D1 (‘information fractal dimension’), while HW2-Q3 uses D2 (‘correlation’ f.d.)

102 Outline Motivation; … Linear Forecasting; Bursty traffic - fractals and multifractals: Problem; Main idea (80/20, Hurst exponent); Results (HW2-Q3)

103 Reference: [Wang+02] Mengzhi Wang, Tara Madhyastha, Ngai Hang Chang, Spiros Papadimitriou and Christos Faloutsos, Data Mining Meets Performance Evaluation: Fast Algorithms for Modeling Bursty Traffic, ICDE 2002, San Jose, CA. Full thesis: CMU-CS, Performance Modeling of Storage Devices using Machine Learning, Mengzhi Wang, Ph.D. Thesis (HW2-Q3)

104 Recall: Problem #1 Goal: given a signal (e.g., #bytes over time) Find: patterns, periodicities, and/or compress (bytes per 30’; packets per day; earthquakes per year) (HW2-Q3)

105 Problem #1 Model bursty traffic; generate realistic traces (Poisson does not work) (HW2-Q3)

106 Motivation Predict queue length distributions (e.g., to give probabilistic guarantees); “learn” traffic, for buffering, prefetching, ‘active disks’, web servers (HW2-Q3)

107 Q: any ‘pattern’? Not Poisson: spike; silence; more spikes; more silence… any rules? (HW2-Q3)

108 Solution: self-similarity (#bytes vs. time, at two zoom levels) (HW2-Q3)

109 But: Q1: How to generate realistic traces; extrapolate; give guarantees? Q2: How to estimate the model parameters? (HW2-Q3)

110 Outline Motivation; … Linear Forecasting; Bursty traffic - fractals and multifractals: Problem; Main idea (80/20, Hurst exponent); Results (HW2-Q3)

111 Approach Q1: How to generate a sequence that is bursty, self-similar, and has similar queue length distributions? (HW2-Q3)

112 Approach A: ‘binomial multifractal’ [Wang+02] ~ 80/20 ‘law’: 80% of bytes/queries etc. on the first half; repeat recursively. b: bias factor (e.g., 80%) (HW2-Q3)
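The recursive 80/20 split can be sketched in a few lines. This is a minimal deterministic version (always biasing the first half); practical generators often randomize which half gets the bias, which this sketch omits.

```python
import numpy as np

def b_model(b, levels, total=1.0):
    """Binomial multifractal ('b-model') trace: recursively give a
    fraction b of each interval's traffic to its first half,
    (1-b) to the second half, for `levels` levels of recursion."""
    trace = np.array([total])
    for _ in range(levels):
        # each interval splits into (b * x, (1-b) * x)
        trace = np.column_stack([b * trace, (1 - b) * trace]).ravel()
    return trace

trace = b_model(0.8, 10)   # 2^10 time ticks; 80/20 at every scale
```

The resulting trace is bursty at every scale: the largest tick carries b^levels of the total traffic, while most ticks carry almost nothing.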

113 Binary multifractals (first levels of the recursive 80/20 split) (HW2-Q3)

114 Binary multifractals (more levels) (HW2-Q3)

115 Parameter estimation Q2: How to estimate the bias factor b? (HW2-Q3)

116 Parameter estimation Q2: How to estimate the bias factor b? A: MANY ways [Crovella+96]: Hurst exponent; variance plot; even the DFT amplitude spectrum (‘periodogram’)! More robust: ‘entropy plot’ [Wang+02] (HW2-Q3)

117 Entropy plot Rationale: burstiness is the inverse of uniformity; entropy measures the uniformity of a distribution; find the entropy at several granularities, to see whether/how close our distribution is to uniform (HW2-Q3)

118 Entropy plot Entropy E(n) after n levels of splits. n=1: E(1) = -p1·log2(p1) - p2·log2(p2), where p1, p2 = % of bytes in each half (HW2-Q3)

119 Entropy plot n=2: E(2) = -Σi p2,i · log2(p2,i), over p2,1, p2,2, p2,3, p2,4 (HW2-Q3)
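The E(n) computation above can be sketched directly. A minimal illustration, assuming the trace length is a power of two so the dyadic splits come out even:

```python
import numpy as np

def entropy_plot(trace):
    """E(n) for n = 1..levels: Shannon entropy (base 2) of the
    fraction of total traffic falling in each of the 2^n
    equal-length intervals of the trace."""
    trace = np.asarray(trace, dtype=float)
    levels = int(np.log2(len(trace)))
    E = []
    for n in range(1, levels + 1):
        p = trace.reshape(2 ** n, -1).sum(axis=1) / trace.sum()
        p = p[p > 0]                      # convention: 0 * log 0 = 0
        E.append(float(-(p * np.log2(p)).sum()))
    return E
```

A perfectly uniform trace gives E(n) = n (slope 1); a single spike gives E(n) = 0 (slope 0), matching the intuition slides that follow.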

120 Real traffic Has a linear entropy plot (-> self-similar); slope ≈ 0.73 (Entropy E(n) vs. # of levels n) (HW2-Q3)

121 Observation - intuition: slope = intrinsic dimensionality = info-bits per coordinate-bit. Uniform dataset: slope = ? Multi-point (all points at one position): slope = ? (HW2-Q3)

122 Observation - intuition: slope = intrinsic dimensionality = info-bits per coordinate-bit. Uniform dataset: slope = 1; multi-point: slope = 0 (HW2-Q3)

123 Entropy plot - Intuition Slope ~ intrinsic dimensionality (in fact, ‘information fractal dimension’) = info bits per coordinate bit. E.g., Dim = 1: pick a point; reveal its coordinate bit-by-bit - how much info is each bit worth to me? (HW2-Q3)

124 Entropy plot Is the MSB 0? ‘Info’ value = E(1): 1 bit (HW2-Q3)

125 Entropy plot Is the MSB 0? Is the next MSB = 0? (HW2-Q3)

126 Entropy plot Is the MSB 0? Is the next MSB = 0? Info value = 1 bit = E(2) - E(1) = slope! (HW2-Q3)

127 Entropy plot Repeat, for all points at the same position: Dim = 0 (HW2-Q3)

128 Entropy plot Repeat, for all points at the same position: we need 0 bits of info to determine the position -> slope = 0 = intrinsic dimensionality (HW2-Q3)

129 Entropy plot Real (and 80/20) datasets can be in-between: bursts, gaps, smaller bursts, smaller gaps, at every scale (0 < Dim < 1)

130 (Fractals, again) What set of points could have behavior between point and line? (HW2-Q3)

131 Cantor dust Eliminate the middle third. Recursively! (HW2-Q3)

132 Cantor dust (HW2-Q3)

133 Cantor dust (HW2-Q3)

134 Cantor dust (HW2-Q3)

135 Cantor dust (HW2-Q3)

136 Cantor dust Dimensionality? (no length; infinite # of points!) Answer: log 2 / log 3 ≈ 0.6 (HW2-Q3)

137 Some more entropy plots: Poisson vs. real. Poisson: slope ≈ 1 -> uniformly distributed (HW2-Q3)

138 b-model b-model traffic gives a perfectly linear entropy plot. Lemma: its slope is slope = -b·log2(b) - (1-b)·log2(1-b). Fitting: do the entropy plot; get the slope; solve for b (HW2-Q3)
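The fitting step of the lemma - solve slope = -b log2(b) - (1-b) log2(1-b) for b - has no closed form, but a one-sided bisection works, since the binary entropy is decreasing for b in [0.5, 1). A minimal sketch:

```python
import numpy as np

def bias_from_slope(slope):
    """Invert  slope = -b log2(b) - (1-b) log2(1-b)  for b in [0.5, 1)
    by bisection (binary entropy is decreasing on that interval)."""
    def H(b):
        return -b * np.log2(b) - (1 - b) * np.log2(1 - b)
    lo, hi = 0.5, 1.0 - 1e-12
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if H(mid) > slope:     # entropy still too high -> more bias
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

b = bias_from_slope(0.73)      # slope of the real-traffic entropy plot
```

For the slope 0.73 quoted on the real-traffic slide, this gives b ≈ 0.8, i.e., roughly 80/20 traffic.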

139 Outline Motivation; … Linear Forecasting; Bursty traffic - fractals and multifractals: Problem; Main idea (80/20, Hurst exponent); Experiments - Results (HW2-Q3)

140 Experimental setup Disk traces (from HP [Wilkes 93]); web traces from LBL (lbl-conn-7.tar.Z) (HW2-Q3)

141 Model validation Linear entropy plots. Bias factors b: smallest b / smoothest: nntp traffic (HW2-Q3)

142 Web traffic - results LBL; CCDF of queue lengths, Prob(>l) vs. queue length l (log-log scales). How to give guarantees? (HW2-Q3)

143 Web traffic - results LBL; CCDF of queue lengths, Prob(>l) vs. queue length l (log-log scales): 20% of the requests will see queue lengths < 100 (HW2-Q3)

144 Conclusions Multifractals (80/20, ‘b-model’, Multiplicative Wavelet Model (MWM)) for analysis and synthesis of bursty traffic

145 Books Fractals: Manfred Schroeder: Fractals, Chaos, Power Laws: Minutes from an Infinite Paradise, W.H. Freeman and Company, 1991 (probably the BEST book on fractals!)

146 Further reading: Crovella, M. and A. Bestavros (1996). Self-Similarity in World Wide Web Traffic, Evidence and Possible Causes. Sigmetrics. [ieeeTN94] W. E. Leland, M. S. Taqqu, W. Willinger, D. V. Wilson, On the Self-Similar Nature of Ethernet Traffic, IEEE Transactions on Networking, 2, 1, pp. 1-15, Feb. 1994

147 Further reading [Riedi+99] R. H. Riedi, M. S. Crouse, V. J. Ribeiro, and R. G. Baraniuk, A Multifractal Wavelet Model with Application to Network Traffic, IEEE Special Issue on Information Theory, 45 (April 1999). [Wang+02] Mengzhi Wang, Tara Madhyastha, Ngai Hang Chang, Spiros Papadimitriou and Christos Faloutsos, Data Mining Meets Performance Evaluation: Fast Algorithms for Modeling Bursty Traffic, ICDE 2002, San Jose, CA. (Entropy plots)

148 Outline Motivation; … Linear Forecasting; Bursty traffic - fractals and multifractals; Non-linear forecasting; Conclusions

149 Chaos and non-linear forecasting

150 Reference: [Chakrabarti+02] Deepayan Chakrabarti and Christos Faloutsos, F4: Large-Scale Automated Forecasting using Fractals, CIKM 2002, Washington DC, Nov. 2002

151 Detailed Outline Non-linear forecasting: Problem; Idea; How-to; Experiments; Conclusions

152 Recall: Problem #1 Given a time series {x[t]}, predict its future course, that is, x[t+1], x[t+2], …

153 Datasets Logistic Parabola: x[t] = a·x[t-1]·(1 - x[t-1]) + noise. Models population of flies [R. May/1976]. (Lag-plot; ARIMA fails)
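The logistic parabola series is easy to generate for experimentation. The parameter a = 3.8 (chaotic regime), the starting value and the noise level are assumptions for illustration; the slide does not give them.

```python
import numpy as np

# Logistic parabola: x[t] = a * x[t-1] * (1 - x[t-1]) + noise.
rng = np.random.default_rng(2)
a = 3.8                       # assumed: a value in the chaotic regime
x = [0.4]
for _ in range(500):
    x.append(a * x[-1] * (1 - x[-1]) + 1e-3 * rng.standard_normal())
x = np.array(x)
```

The series looks like noise over time, yet its lag-plot (x[t] vs. x[t-1]) traces a clean parabola - exactly the structure that linear ARIMA misses and lag-plot methods exploit.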

154 How to forecast? ARIMA - but: linearity assumption (on the lag-plot, ARIMA fails)

155 How to forecast? ARIMA - but: linearity assumption. ANSWER: ‘Delayed Coordinate Embedding’ = Lag Plots [Sauer92] ~ nearest-neighbor search for past incidents

156 General Intuition (Lag Plot) Plot x[t] vs. x[t-1]; for a new point, find its 4 nearest neighbors and interpolate these to get the final prediction (lag = 1, k = 4 NN)
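The lag-plot forecast can be sketched as a small nearest-neighbor routine. This is an illustration under the simplest choices from the slides that follow (plain average for interpolation, Euclidean distance); the function name is ours.

```python
import numpy as np

def lag_forecast(x, L=1, k=4):
    """Forecast the next value by delayed-coordinate embedding:
    take the latest length-L delay vector, find its k nearest
    neighbors among past delay vectors, and average (A3.1) the
    values that followed each neighbor."""
    x = np.asarray(x, dtype=float)
    # past delay vectors x[i-L+1 .. i], for i = L-1 .. len(x)-2
    vecs = np.array([x[i - L + 1: i + 1] for i in range(L - 1, len(x) - 1)])
    query = x[-L:]                                   # newest delay vector
    nn = np.argsort(np.linalg.norm(vecs - query, axis=1))[:k]
    return float(x[nn + L].mean())                   # x[i+1] per neighbor
```

E.g., on an alternating 0/1 sequence ending in 1, every past neighbor of 1 was followed by 0, so the forecast is 0 - the "find similar settings in the past" idea from the opening slides.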

157 Questions: Q1: How to choose the lag L? Q2: How to choose k (the # of NN)? Q3: How to interpolate? Q4: Why should this work at all?

158 Q1: Choosing lag L Manually (16, in the award-winning system by [Sauer94])

159 Q2: Choosing number of neighbors k Manually (typically ~ 1-10)

160 Q3: How to interpolate? How do we interpolate between the k nearest neighbors? A3.1: Average. A3.2: Weighted average (weights drop with distance - how?)

161 Q3: How to interpolate? A3.3: Using SVD - seems to perform best ([Sauer94] - first place in the Santa Fe forecasting competition)

162 Q4: Any theory behind it? A4: YES!

163 Theoretical foundation Based on ‘Takens’ theorem’ [Takens81], which says that long-enough delay vectors can do prediction, even if there are unobserved variables in the dynamical system (= diff. equations)

164 Theoretical foundation Example: Lotka-Volterra equations dH/dt = r·H - a·H·P; dP/dt = b·H·P - m·P. H is the count of prey (e.g., hare); P is the count of predators (e.g., lynx). Suppose only P(t) is observed (t = 1, 2, …). (Skip)

165 Theoretical foundation But the delay vector space (P(t-1), P(t)) is a faithful reconstruction of the internal system state (H, P), so prediction in delay vector space is as good as prediction in state space. (Skip)

166 Detailed Outline Non-linear forecasting: Problem; Idea; How-to; Experiments; Conclusions

167 Datasets Logistic Parabola: x[t] = a·x[t-1]·(1 - x[t-1]) + noise. Models population of flies [R. May/1976] (lag-plot)

168 Datasets Logistic Parabola: x[t] = a·x[t-1]·(1 - x[t-1]) + noise. Models population of flies [R. May/1976] (lag-plot; ARIMA fails)

169 Logistic Parabola (value vs. timesteps; our prediction from here)

170 Logistic Parabola Comparison of prediction to correct values

171 Datasets LORENZ: models convection currents in the air: dx/dt = a(y - x); dy/dt = x(b - z) - y; dz/dt = xy - c·z
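The Lorenz system from the slide can be simulated with a few lines of forward-Euler integration. The classic parameters a=10, b=28, c=8/3, the step size and the starting point are assumptions; the slide does not specify them.

```python
# Forward-Euler integration of the Lorenz system:
#   dx/dt = a(y - x);  dy/dt = x(b - z) - y;  dz/dt = xy - cz.
a, b, c = 10.0, 28.0, 8.0 / 3.0   # assumed: the classic chaotic values
dt = 0.01                          # assumed step size
x, y, z = 1.0, 1.0, 1.0            # assumed starting point
xs = []
for _ in range(2000):
    # all three right-hand sides use the OLD (x, y, z)
    x, y, z = (x + dt * a * (y - x),
               y + dt * (x * (b - z) - y),
               z + dt * (x * y - c * z))
    xs.append(x)
```

The x-coordinate of the resulting trajectory is the kind of bounded, aperiodic "chaotic" signal on which the lag-plot method is compared to the correct values in the next slide.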

172 LORENZ Comparison of prediction to correct values (value vs. timesteps)

173 Datasets LASER: fluctuations in a laser over time (used in the Santa Fe competition)

174 Laser Comparison of prediction to correct values (value vs. timesteps)

175 Conclusions Lag plots for non-linear forecasting (Takens’ theorem); suitable for ‘chaotic’ signals

176 References Deepayan Chakrabarti and Christos Faloutsos, F4: Large-Scale Automated Forecasting using Fractals, CIKM 2002, Washington DC, Nov. 2002. Sauer, T. (1994). Time series prediction using delay coordinate embedding. (In the book by Weigend and Gershenfeld, below.) Addison-Wesley. Takens, F. (1981). Detecting strange attractors in fluid turbulence. Dynamical Systems and Turbulence. Berlin: Springer-Verlag.

177 References Weigend, A. S. and N. A. Gershenfeld (1994). Time Series Prediction: Forecasting the Future and Understanding the Past, Addison Wesley. (Excellent collection of papers on chaotic/non-linear forecasting, describing the algorithms behind the winners of the Santa Fe competition.)

178 Overall conclusions Similarity search: Euclidean/time-warping; feature extraction and SAMs

179 Overall conclusions Similarity search: Euclidean/time-warping; feature extraction and SAMs. Signal processing: DWT is a powerful tool

180 Overall conclusions Similarity search: Euclidean/time-warping; feature extraction and SAMs. Signal processing: DWT is a powerful tool. Linear Forecasting: AR (Box-Jenkins) methodology; AWSOM

181 Overall conclusions Similarity search: Euclidean/time-warping; feature extraction and SAMs. Signal processing: DWT is a powerful tool. Linear Forecasting: AR (Box-Jenkins) methodology; AWSOM. Bursty traffic: multifractals (80-20 ‘law’)

182 Overall conclusions Similarity search: Euclidean/time-warping; feature extraction and SAMs. Signal processing: DWT is a powerful tool. Linear Forecasting: AR (Box-Jenkins) methodology; AWSOM. Bursty traffic: multifractals (80-20 ‘law’). Non-linear forecasting: lag-plots (Takens)

