2002/1/17 IDS Lab Seminar. Evaluating a clustering solution: An application in the tourism market. Advisor: Dr. Hsu. Graduate: Yung-Chu Lin.


1 Evaluating a clustering solution: An application in the tourism market. Advisor: Dr. Hsu. Graduate: Yung-Chu Lin

2 Outline: Motivation; Objective; The various paradigms; The number of clusters; Utility concepts; Proposed approach; A tourism market application; Conclusion

3 Motivation: To evaluate a clustering solution

4 Objective: Propose a framework for evaluating a clustering solution; advocate a multimethodological approach

5 The various paradigms. Statistical methods: measures of association, association tests, Automatic Interaction Detection (AID), Classification and Regression Trees (CART), Discriminant Analysis and Logistic Regression. Machine Learning: the tree classification algorithm C4.5 and the propositional rule learner CN2. Combining methodologies sets the stage for dealing with rich and complex problems.

6 Statistical methodologies: Association between two nominal variables; Cramér statistic
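The formula on this slide did not survive in the transcript. As a minimal sketch, assuming the slide refers to the standard Cramér's V, the statistic can be computed from a contingency table of cluster membership against a nominal variable as V = sqrt(chi2 / (n * (k - 1))), where k is the smaller of the number of rows and columns. The function name and the example tables are illustrative assumptions, not taken from the presentation, and the sketch assumes no row or column total is zero.

```python
from math import sqrt


def cramers_v(table):
    """Cramér's V for an r x c contingency table given as a list of rows.

    Assumes every row total and column total is greater than zero.
    """
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]

    # Pearson chi-square: sum of (observed - expected)^2 / expected
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected

    k = min(len(row_totals), len(col_totals))
    return sqrt(chi2 / (n * (k - 1)))


# Hypothetical tables: rows are clusters, columns are categories of a variable.
perfect = [[10, 0], [0, 10]]   # the variable determines the cluster
unrelated = [[5, 5], [5, 5]]   # the variable carries no information
print(cramers_v(perfect), cramers_v(unrelated))
```

V ranges from 0 (no association) to 1 (perfect association), which makes it comparable across tables of different sizes.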

7 Statistical methodologies (cont'd): Uncertainty Coefficient
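The slide's formula is missing from the transcript. A minimal sketch, assuming the usual definition of the uncertainty coefficient (Theil's U), U(X|Y) = (H(X) - H(X|Y)) / H(X), where H is Shannon entropy: it gives the share of the uncertainty about the row variable that is removed by knowing the column variable. The numerator H(X) - H(X|Y) is the mutual information between the two variables. Function names and the example tables are hypothetical.

```python
from math import log


def entropy(counts):
    """Shannon entropy (natural log) of a list of non-negative counts."""
    total = sum(counts)
    return -sum(c / total * log(c / total) for c in counts if c > 0)


def uncertainty_coefficient(table):
    """U(row | column) for a contingency table given as a list of rows.

    Assumes the row variable is not constant, so that H(row) > 0.
    """
    n = sum(sum(row) for row in table)
    h_rows = entropy([sum(row) for row in table])

    # Conditional entropy H(row | column): weighted entropy within each column
    h_cond = 0.0
    for col in zip(*table):
        col_total = sum(col)
        if col_total > 0:
            h_cond += col_total / n * entropy(col)

    return (h_rows - h_cond) / h_rows


# Hypothetical tables: rows are clusters, columns are categories of a variable.
print(uncertainty_coefficient([[10, 0], [0, 10]]))  # column fully predicts row
print(uncertainty_coefficient([[5, 5], [5, 5]]))    # column tells nothing
```

Unlike Cramér's V, this measure is asymmetric: U(row | column) and U(column | row) generally differ.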

8 Statistical methodologies (cont'd): Mutual Information; ANOVA; MANOVA; CART; Discriminant Analysis; Logistic Regression

9 Machine learning methodologies. Decision Trees: provide a hierarchical process and model of classification; non-backtracking, greedy optimisation algorithm. Propositional Rules: provide logic models; represented by "if condition then cluster". Neural Networks. Naive Bayes.
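To make the "if condition then cluster" representation concrete, here is a minimal sketch of an ordered rule list applied to one record. It only shows how such rules classify once they exist; it does not implement CN2's rule search. The attribute name, thresholds, and cluster labels are hypothetical and not taken from the presentation.

```python
# Hypothetical ordered rule list: the first rule whose conditions all hold wins.
# Each rule is (conditions, cluster); conditions map an attribute to a predicate.
RULES = [
    ({"stays_per_year": lambda v: v >= 10}, "cluster_A"),
    ({"stays_per_year": lambda v: v >= 2}, "cluster_B"),
]
DEFAULT_CLUSTER = "cluster_C"  # used when no rule fires


def classify(record, rules=RULES, default=DEFAULT_CLUSTER):
    """Return the cluster of the first rule matched by `record` (a dict)."""
    for conditions, cluster in rules:
        if all(attr in record and test(record[attr])
               for attr, test in conditions.items()):
            return cluster
    return default


print(classify({"stays_per_year": 12}))
print(classify({"stays_per_year": 3}))
print(classify({"stays_per_year": 1}))
```

Because each rule is a readable condition on named attributes, a rule list doubles as a description of what distinguishes the clusters.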

10 The number of clusters: May be set a priori; may be an outcome of the clustering process itself. The best number is obtained by comparing measures of model fit for alternative numbers of clusters.

11 The number of clusters (cont'd): Mixture Model; Akaike Information Criterion (AIC)
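The slide names AIC without showing how it is used. A minimal sketch, assuming the standard definition AIC = 2k - 2 ln L (k free parameters, L the maximised likelihood): fit a mixture model for several candidate numbers of clusters and keep the one with the lowest AIC. The log-likelihoods and parameter counts below are made-up illustrative values, not results from the presentation.

```python
def aic(log_likelihood, n_params):
    """Akaike Information Criterion; lower values indicate a better trade-off."""
    return 2 * n_params - 2 * log_likelihood


def best_number_of_clusters(fits):
    """`fits` maps a cluster count to (log_likelihood, n_params)."""
    return min(fits, key=lambda k: aic(*fits[k]))


# Hypothetical fits for 2, 3 and 4 clusters.
fits = {
    2: (-520.0, 5),
    3: (-480.0, 8),
    4: (-478.5, 11),
}
for k, (ll, p) in fits.items():
    print(k, aic(ll, p))
print("chosen:", best_number_of_clusters(fits))
```

In this example the fourth cluster improves the likelihood only slightly, so the penalty for its extra parameters outweighs the gain.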

12 Utility concepts: The main question in evaluating a clustering → a question about utility. Utility is evaluated by judgement.

13 Proposed approach: preprocess

14 Proposed approach (cont'd): The choice of discriminant and classification methodologies → the nature of the variables. Regarding discrimination, complementary dimensions offer a new perspective and understanding. An integration of methodologies and techniques based on the Statistical and Machine Learning paradigms.

15 A tourism market application: The clustering solution; Evaluation of the clustering solution

16 Database: The answers to a questionnaire: Portuguese clients of Pousadas de Portugal; 49 questions → 200 variables; 2500 Portuguese clients

17 Clustering: Model sample: 1647 clients (65%); Validation sample: 897 clients (35%). Use a priori and a K-Means procedure. 4 variables expressing the frequency and type of Pousadas: CH, CSUP, C and B type. 3 clusters (First-time users, Regular users and Heavy users): Model: 18%, 60% and 22%; Validation: 16%, 62% and 22%.
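The model/validation procedure on this slide can be sketched as follows, assuming a plain K-Means (Lloyd's algorithm) fitted on the model sample, with validation clients then assigned to the nearest fitted centroid. The two-dimensional points and initial centroids are invented for illustration; the presentation's own data has 4 clustering variables and is not reproduced here.

```python
def nearest(point, centroids):
    """Index of the centroid closest to `point` (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda k: sum((a - b) ** 2 for a, b in zip(point, centroids[k])),
    )


def kmeans(points, centroids, max_iter=100):
    """Lloyd's algorithm; `centroids` holds the initial centres."""
    centroids = [tuple(float(x) for x in c) for c in centroids]
    labels = [nearest(p, centroids) for p in points]
    for _ in range(max_iter):
        # Update step: move each centroid to the mean of its members
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centroids.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members))
                )
            else:
                new_centroids.append(old)  # leave an empty cluster's centre as is
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
        # Assignment step: relabel every point with its nearest centroid
        labels = [nearest(p, centroids) for p in points]
    return labels, centroids


# Hypothetical model sample and validation sample.
model_sample = [(1, 0), (2, 0), (9, 1), (10, 1)]
validation_sample = [(0, 1), (11, 0)]

labels, centroids = kmeans(model_sample, centroids=[(0, 0), (10, 0)])
validation_labels = [nearest(p, centroids) for p in validation_sample]
print(labels, centroids, validation_labels)
```

Comparing cluster shares between the two samples, as the slide does, is a simple check that the solution is stable rather than an artefact of the model sample.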

18 Clustering (cont'd): 2 clusters (Heavy users and Regular users). Model: 16 Pousadas and 5 Pousadas; Validation: 17 Pousadas and 4 Pousadas.

19 A tourism market application: The clustering solution; Evaluation of the clustering solution

20 Evaluation of the clustering solution

21 Analysis of association between clusters and clustering base: Measure the degree of correct classification: Model: 82.6%; Validation: 91.5%. The linear combinations of the clustering base variables that maximise the ratio of between-cluster to within-cluster variation.
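As a minimal sketch of how a "degree of correct classification" is usually obtained, assuming it means the share of cases a classifier assigns to their actual cluster: cross-tabulate actual against predicted clusters and divide the diagonal total by the number of cases. The confusion matrix below is hypothetical and does not reproduce the figures on the slide.

```python
def hit_rate(confusion):
    """Share of correctly classified cases in a square confusion matrix.

    Rows are actual clusters, columns are predicted clusters.
    """
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return correct / total


# Hypothetical two-cluster example: 85 of 100 cases on the diagonal.
confusion = [
    [40, 10],
    [5, 45],
]
print(hit_rate(confusion))
```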

22 Analysis of association between clusters and clustering base (cont'd)

23 Analysis of association between clusters and other variables: Chi-square → the strength of association between clusters and variables. Rule induction procedures → discriminate and classify on the basis of attributes significantly associated with clusters. Rule induction provides a better comprehension of the facts discriminating the clusters. C4.5 and CN2 evaluate both the Model sample and the Validation sample.

24 Analysis of association between clusters and other variables (cont'd): Memorize a group/beam of the best solutions

25 Analysis of association between clusters and other variables (cont'd)

26 Analysis of association between clusters and other variables (cont'd)

27 Analysis of association between clusters and other variables (cont'd)

28 Global evaluation: Discriminant Analysis and Logistic Regression → clearly the differences between clusters. Chi-square tests → association between variables and clusters. C4.5 and CN2 → provide a more complex and richer perspective.

29 Conclusion: Identifying significant associations characterising the clustered entities guided discriminant and classification analysis. Propositional rule induction is suitable for discriminating purposes. A multimethodological approach should consider not only inference but also descriptive analysis.

