Clustering Seasonality Patterns in the Presence of Errors. Advisor: Dr. Hsu; Graduate: You-Cheng Chen; Authors: Mahesh Kumar, Nitin R. Patel, Jonathan Woo.


1 Clustering Seasonality Patterns in the Presence of Errors. Advisor: Dr. Hsu; Graduate: You-Cheng Chen; Authors: Mahesh Kumar, Nitin R. Patel, Jonathan Woo

2 Outline: Motivation, Objective, Introduction, Seasonality Estimation, Distance Function, Experimental Results, Conclusions, Personal Opinion

3 Motivation: Most traditional clustering algorithms assume that the data are provided without measurement error.

4 Objective: To present a clustering method that incorporates the information contained in the error estimates of the data, together with a new distance function based on the distribution of those errors.

5 Introduction: Defining a good distance or dissimilarity function is a critical step in any distance-based clustering method. Problem: most traditional clustering methods assume that data are free of error, but errors are natural in any data measurement. Example: a sample average, which always comes with a standard error.
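A minimal Python sketch of the example, assuming independent observations: the sample average is reported together with its standard error s/√n, giving the kind of (value, error) pair the method works with. The function name `estimate_with_error` and the sample values are hypothetical.

```python
import math
import statistics
from typing import List, Tuple


def estimate_with_error(samples: List[float]) -> Tuple[float, float]:
    """Return (sample average, standard error of that average).

    Assumes independent observations; the standard error is s / sqrt(n),
    where s is the sample standard deviation.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two observations to estimate an error")
    return statistics.mean(samples), statistics.stdev(samples) / math.sqrt(n)


# Hypothetical weekly observations of one quantity.
value, error = estimate_with_error([9.0, 11.0, 10.0, 12.0, 8.0])
print(f"estimate = {value:.2f} +/- {error:.2f}")
```

Treating `value` as exact would discard `error`; the distance function in this presentation keeps both.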

6 Introduction: The study and its results focus on time-series clustering in the retail industry. The study assumes that each data point comes from a multidimensional Gaussian distribution.

7 Seasonality Estimation (1/4): Seasonality is defined as the normalized underlying demand of a group of similar merchandise as a function of the time of year, after taking into account other factors that affect sales, such as discounts, inventory, promotions, and random effects. Here $\text{PLC}_i(t - t_i^0)$ is the product life cycle (PLC) of item i:
$\text{Sale}_{it} = f_I(I_{it}) \cdot f_P(P_{it}) \cdot f_Q(Q_{it}) \cdot f_R(R_{it}) \cdot \text{PLC}_i(t - t_i^0) \cdot \text{Seas}_{it}$ (1)
After removing the effects of all these nonseasonal factors from (1):
$\text{Sale}_{it} = \text{PLC}_i(t - t_i^0) \cdot \text{Seas}_{it}$
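A minimal sketch of the removal step, assuming the four factor effects $f_I, f_P, f_Q, f_R$ are already known for each week (in practice they must be estimated, which these slides do not cover). Because model (1) is multiplicative, dividing them out leaves $\text{PLC}_i(t - t_i^0) \cdot \text{Seas}_{it}$. The function name and all values are hypothetical.

```python
from typing import List


def remove_nonseasonal(sales: List[float], f_i: List[float], f_p: List[float],
                       f_q: List[float], f_r: List[float]) -> List[float]:
    """Divide one item's weekly sales by the four factor effects of model (1).

    The factor values are assumed to be known already. Because the model is
    multiplicative, what remains is PLC_i(t - t_i^0) * Seas_it for each week t.
    """
    return [s / (a * b * c * d) for s, a, b, c, d in zip(sales, f_i, f_p, f_q, f_r)]


# Hypothetical two-week example with made-up factor values.
adjusted = remove_nonseasonal(sales=[20.0, 10.0],
                              f_i=[1.0, 0.5], f_p=[2.0, 1.0],
                              f_q=[1.0, 1.0], f_r=[1.0, 1.0])
print(adjusted)
```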

8 Seasonality Estimation (2/4): S is a set of items that follow similar seasonality. S therefore consists of items having a variety of PLCs that differ in shape and time duration. Theorem 1:

9 Seasonality Estimation (3/4): Taking the average of the weekly sales of all items in S nullifies the effect of the individual PLCs.
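One way to see why, assuming every item in S has the same seasonality ($\text{Seas}_{it} = \text{Seas}_t$ for all $i \in S$) and that the PLCs, with their varied shapes and start times, average out to a roughly constant level $c$ (a simplifying assumption; this is not the slide's Theorem 1):

$$\text{Sale}_t = \frac{1}{|S|}\sum_{i \in S}\text{Sale}_{it} = \text{Seas}_t \cdot \frac{1}{|S|}\sum_{i \in S}\text{PLC}_i(t - t_i^0) \approx c \cdot \text{Seas}_t$$

The weekly average is then proportional to the seasonality, so rescaling it recovers $\text{Seas}_t$.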

10 Seasonality Estimation (4/4): The seasonality values $\text{Seas}_t$ can be estimated by appropriately scaling the weekly sales average $\text{Sale}_t$. This procedure yields a large number of seasonal patterns, one for each set S, along with estimates of their associated errors.
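A minimal Python sketch of this estimation step for one set S. Two assumptions are not from the slides: the error at week t is taken to be the standard error of the across-item average, and the "appropriate scaling" is taken to be a normalization that makes the seasonality average to 1 over the weeks. The function name is hypothetical.

```python
import math
import statistics
from typing import List, Tuple


def estimate_seasonality(sales: List[List[float]]) -> List[Tuple[float, float]]:
    """Estimate (Seas_t, error_t) for each week t from one set S of items.

    sales[i][t] is Sale_it after the nonseasonal factors have been removed,
    for item i in S (at least two items) and week t.
    Assumptions: the error is the standard error of the across-item mean,
    and the scaling makes the seasonality average to 1 over the weeks.
    """
    n_items = len(sales)
    weeks = list(zip(*sales))  # weeks[t] holds Sale_it for every item i
    avg = [statistics.mean(col) for col in weeks]  # Sale_t
    se = [statistics.stdev(col) / math.sqrt(n_items) for col in weeks]
    scale = statistics.mean(avg)
    return [(a / scale, s / scale) for a, s in zip(avg, se)]


# Hypothetical set S with two items observed over three weeks.
print(estimate_seasonality([[1.0, 2.0, 3.0], [3.0, 4.0, 5.0]]))
```

Each returned pair is one $(x_t, \sigma_t)$ entry of a seasonality, the form the distance function below expects. The errors are divided by the same scale factor as the averages so that the two stay consistent.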

11 Distance Function (1/4): Consider two seasonalities $A_i = \{(x_{i1}, \sigma_{i1}), (x_{i2}, \sigma_{i2}), \ldots, (x_{iT}, \sigma_{iT})\}$ and $A_j = \{(x_{j1}, \sigma_{j1}), (x_{j2}, \sigma_{j2}), \ldots, (x_{jT}, \sigma_{jT})\}$, where $x_{it}$ is the estimated seasonality at time t and $\sigma_{it}$ is its estimated error. We define the similarity between two seasonalities through the null hypothesis $H_0: A_i \sim A_j$ (the two seasonalities are the same): the similarity between $A_i$ and $A_j$ is the probability of accepting $H_0$. The distance $d_{ij}$ between $A_i$ and $A_j$ is defined as $1 - \text{similarity}$, which is the probability of rejecting $H_0$.

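12 Distance Function (2/4): Consider the t-th samples of both seasonalities, $A_{it} = (x_{it}, \sigma_{it})$ and $A_{jt} = (x_{jt}, \sigma_{jt})$. Then $x_{it} - x_{jt} \sim N\left(\mu_{it} - \mu_{jt}, \sqrt{\sigma_{it}^2 + \sigma_{jt}^2}\right)$ (1), where the second parameter is the standard deviation. If $A_i \sim A_j$, then $\mu_{it} = \mu_{jt}$, and consequently the statistic $(x_{it} - x_{jt}) / \sqrt{\sigma_{it}^2 + \sigma_{jt}^2}$ follows a standard normal distribution.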

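13 Distance Function (3/4): Final distance: under $H_0$, with independent errors across time points, $\sum_{t=1}^{T} \frac{(x_{it} - x_{jt})^2}{\sigma_{it}^2 + \sigma_{jt}^2} \sim \chi^2_T$, so the probability of rejecting $H_0$ is $d_{ij} = \Pr\left(\chi^2_T \le \sum_{t=1}^{T} \frac{(x_{it} - x_{jt})^2}{\sigma_{it}^2 + \sigma_{jt}^2}\right)$. Comparison with Euclidean distance: $d_{ij}$ is monotonically increasing with respect to $\sum_{t=1}^{T} \frac{(x_{it} - x_{jt})^2}{\sigma_{it}^2 + \sigma_{jt}^2}$.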
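A minimal Python sketch of the distance, assuming independent Gaussian errors so that, under $H_0$, $\sum_t (x_{it} - x_{jt})^2 / (\sigma_{it}^2 + \sigma_{jt}^2)$ follows a $\chi^2$ distribution with T degrees of freedom and $d_{ij}$ is that distribution's CDF evaluated at the observed sum. The chi-square CDF is computed with the standard library only, via the incomplete-gamma recurrence for integer or half-integer shape; where SciPy is available, `scipy.stats.chi2.cdf(x, df)` does the same job. Function names and data are hypothetical.

```python
import math
from typing import List, Tuple

Seasonality = List[Tuple[float, float]]  # [(x_t, sigma_t), ...] for t = 1..T


def chi2_cdf(x: float, k: int) -> float:
    """CDF of the chi-square distribution with k >= 1 degrees of freedom.

    Uses Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1) for the
    regularized upper incomplete gamma Q, with y = x / 2 and target a = k / 2.
    """
    if x <= 0.0:
        return 0.0
    y = x / 2.0
    if k % 2 == 0:
        a, q = 1.0, math.exp(-y)             # Q(1, y) = exp(-y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))  # Q(1/2, y) = erfc(sqrt(y))
    while a < k / 2.0:
        q += math.exp(a * math.log(y) - y - math.lgamma(a + 1.0))
        a += 1.0
    return min(1.0, max(0.0, 1.0 - q))


def error_distance(a_i: Seasonality, a_j: Seasonality) -> float:
    """d_ij: probability of rejecting H0 (A_i ~ A_j) under the chi-square test."""
    if len(a_i) != len(a_j):
        raise ValueError("seasonalities must have the same length T")
    stat = sum((xi - xj) ** 2 / (si ** 2 + sj ** 2)
               for (xi, si), (xj, sj) in zip(a_i, a_j))
    return chi2_cdf(stat, len(a_i))


# Hypothetical 3-week seasonalities with estimated errors.
A = [(0.8, 0.10), (1.0, 0.10), (1.2, 0.10)]
B = [(0.9, 0.20), (1.0, 0.15), (1.1, 0.20)]
print(error_distance(A, B))
```

Multiplying every $x$ and $\sigma$ by the same positive factor leaves each term of the sum, and hence $d_{ij}$, unchanged, which is the scale invariance claimed in the conclusions.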

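14 Distance Function (4/4): Comparison with Euclidean distance. If all σ's were the same and equal to σ, then $\sum_{t=1}^{T} \frac{(x_{it} - x_{jt})^2}{\sigma_{it}^2 + \sigma_{jt}^2}$ would become $\frac{1}{2\sigma^2} \sum_{t=1}^{T} (x_{it} - x_{jt})^2$ (1), whose rank order is the same as that of the Euclidean distance $\sqrt{\sum_{t=1}^{T} (x_{it} - x_{jt})^2}$ (2).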
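A small Python illustration with hypothetical values, comparing the Euclidean distance with the sum that $d_{ij}$ increases with, $\sum_t (x_{it} - x_{jt})^2 / (\sigma_{it}^2 + \sigma_{jt}^2)$. With equal errors the two orderings agree; when one seasonality is much noisier than the others, they can disagree.

```python
import math
from typing import List, Tuple

Seasonality = List[Tuple[float, float]]  # [(x_t, sigma_t), ...]


def error_stat(a: Seasonality, b: Seasonality) -> float:
    """Sum over t of (x_at - x_bt)^2 / (sigma_at^2 + sigma_bt^2)."""
    return sum((xa - xb) ** 2 / (sa ** 2 + sb ** 2) for (xa, sa), (xb, sb) in zip(a, b))


def euclidean(a: Seasonality, b: Seasonality) -> float:
    """Euclidean distance between the estimates, ignoring their errors."""
    return math.sqrt(sum((xa - xb) ** 2 for (xa, _), (xb, _) in zip(a, b)))


# Hypothetical seasonalities: A and B are precise, C is very noisy.
A = [(1.0, 0.1), (1.0, 0.1)]
B = [(2.0, 0.1), (2.0, 0.1)]
C = [(3.0, 2.0), (3.0, 2.0)]
print("Euclidean:   A-B", euclidean(A, B), " A-C", euclidean(A, C))
print("With errors: A-B", error_stat(A, B), " A-C", error_stat(A, C))

# Same estimates with every sigma equal to 1: both measures now rank pairs alike.
A1, B1, C1 = ([(x, 1.0) for x, _ in s] for s in (A, B, C))
print("Equal sigma: A-B", error_stat(A1, B1), " A-C", error_stat(A1, C1))
```

Euclidean distance puts B closer to A. The error-aware sum puts C closer, because the gap between A and C is small relative to C's large error, while the gap between A and B is large relative to their small errors.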

15 Clustering: Clustering Algorithm
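The clustering algorithm itself appears only as a figure on the original slide. The following minimal Python sketch is not the authors' hError procedure. It is a generic agglomerative (hierarchical) scheme built on the error-based distance, under two assumptions not taken from the slides: merging stops when k clusters remain, and the seasonalities of two merged clusters are combined by inverse-variance weighting, which gives a new (value, error) pair at each time point. All names and data are hypothetical.

```python
from typing import List, Tuple

Seasonality = List[Tuple[float, float]]  # [(x_t, sigma_t), ...]


def error_stat(a: Seasonality, b: Seasonality) -> float:
    # d_ij increases with this sum, so for a common length T the closest
    # pair under d_ij is the pair that minimizes it.
    return sum((xa - xb) ** 2 / (sa ** 2 + sb ** 2) for (xa, sa), (xb, sb) in zip(a, b))


def combine(a: Seasonality, b: Seasonality) -> Seasonality:
    # Assumed merge rule: inverse-variance weighted average at each time point.
    merged = []
    for (xa, sa), (xb, sb) in zip(a, b):
        wa, wb = 1.0 / sa ** 2, 1.0 / sb ** 2
        merged.append(((wa * xa + wb * xb) / (wa + wb), (1.0 / (wa + wb)) ** 0.5))
    return merged


def agglomerate(series: List[Seasonality], k: int) -> List[List[int]]:
    """Merge the closest pair of clusters until k remain; return member indices."""
    if k < 1:
        raise ValueError("k must be at least 1")
    clusters = [([i], s) for i, s in enumerate(series)]
    while len(clusters) > k:
        best = None
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                d = error_stat(clusters[p][1], clusters[q][1])
                if best is None or d < best[0]:
                    best = (d, p, q)
        _, p, q = best
        merged = (clusters[p][0] + clusters[q][0], combine(clusters[p][1], clusters[q][1]))
        clusters = [c for r, c in enumerate(clusters) if r not in (p, q)] + [merged]
    return [sorted(members) for members, _ in clusters]


# Hypothetical data: two seasonalities near 1 and two near 5, all with small errors.
data = [[(1.0, 0.1), (1.0, 0.1)], [(1.1, 0.1), (0.9, 0.1)],
        [(5.0, 0.1), (5.0, 0.1)], [(5.1, 0.1), (4.9, 0.1)]]
print(sorted(agglomerate(data, k=2)))
```

Inverse-variance weighting lets a merged cluster's seasonality carry a smaller error than either member, so later merges are judged against a more precise estimate.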

16 Experimental Results (1/6): Simulated data. Figure 5: Individual (prior to clustering) seasonality estimates with associated errors

17 Experimental Results (2/6): Figure 6: Seasonalities obtained by hError

18 Experimental Results (3/6): Figure 7: Seasonalities obtained by k-means and Ward's method using Euclidean distances

19 Experimental Results (4/6):
Clustering method | Average # misclassifications | Average estimation error
hError | 0.87 | 2.0182
Ward's method | 2.63 | 4.7021
k-means | 2.94 | 5.0337
Table 1: Average number of misclassifications and average estimation error for different clustering methods

20 Experimental Results (5/6):
Clustering method | Average forecast error (%)
hError | 18.7
Ward's method | 23.9
k-means | 24.2
No clustering | 31.5
Table 2: Average forecast error (retailer data)

21 Experimental Results (6/6)

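22 Conclusions: The distance function $d_{ij}$ is invariant to the scale of the data (multiplying both the values and their errors by the same factor leaves $d_{ij}$ unchanged), and the proposed clustering method obtains better clusters than Ward's method and k-means in the reported experiments.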

23 Personal Opinion: The idea of incorporating information about errors into the distance function is very good and could be applied in many other clustering applications.

