2002/1/17 IDS Lab Seminar
Evaluating a clustering solution: An application in the tourism market
Advisor: Dr. Hsu
Graduate: Yung-Chu Lin

Outline
Motivation
Objective
The various paradigms
The number of clusters
Utility concepts
Proposed approach
A tourism market application
Conclusion

Motivation
To evaluate a clustering solution.

Objective
Propose a framework for evaluating a clustering solution.
Advocate a multimethodological approach.

The various paradigms
Statistical methods: measures of association, association tests, Automatic Interaction Detection (AID), Classification and Regression Trees (CART), Discriminant Analysis and Logistic Regression.
Machine Learning: the tree classification algorithm C4.5 and the propositional rule learner CN2.
Combining methodologies sets the stage for dealing with rich and complex problems.

Statistical methodologies
Association between two nominal variables: the Cramér statistic (Cramér's V).
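
The slide's formula for the Cramér statistic did not survive extraction. A minimal sketch, assuming the standard definition V = sqrt(χ² / (n·(min(r, c) − 1))) for an r × c contingency table of n cases. The function name `cramers_v` is illustrative, and the inputs are assumed to be two equal-length lists of category labels, such as cluster memberships and the answers to one questionnaire item.

```python
import math
from collections import Counter


def cramers_v(x, y):
    """Cramér's V between two nominal variables given as equal-length label lists."""
    n = len(x)
    joint = Counter(zip(x, y))            # observed cell counts
    row_tot, col_tot = Counter(x), Counter(y)
    chi2 = 0.0
    for r in row_tot:
        for c in col_tot:
            expected = row_tot[r] * col_tot[c] / n
            chi2 += (joint[(r, c)] - expected) ** 2 / expected
    k = min(len(row_tot), len(col_tot))
    if k < 2:                             # a single category carries no association
        return 0.0
    return math.sqrt(chi2 / (n * (k - 1)))


# Perfectly aligned labels give V = 1; independent labels give V = 0.
print(cramers_v(["a", "a", "b", "b"], ["u", "u", "v", "v"]))  # prints 1.0
```

V ranges from 0 (no association) to 1 (perfect association), which makes it comparable across tables of different sizes, unlike the raw χ² statistic.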

Statistical methodologies (cont'd)
Uncertainty coefficient
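
The uncertainty coefficient measures how much knowing X reduces the uncertainty about Y: U(Y|X) = (H(Y) − H(Y|X)) / H(Y), where the numerator is the mutual information I(X; Y). A small sketch under that standard (Theil) definition, with hypothetical helper names and natural logarithms (the base cancels in the ratio):

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy H in nats of a list of category labels."""
    n = len(values)
    return -sum((c / n) * math.log(c / n) for c in Counter(values).values())


def conditional_entropy(y, x):
    """H(Y|X): entropy of y within each group of x, weighted by group size."""
    n = len(x)
    groups = {}
    for xi, yi in zip(x, y):
        groups.setdefault(xi, []).append(yi)
    return sum(len(g) / n * entropy(g) for g in groups.values())


def uncertainty_coefficient(y, x):
    """U(Y|X): 0 when x says nothing about y, 1 when x determines y."""
    h_y = entropy(y)
    if h_y == 0:                          # y is constant, nothing to explain
        return 0.0
    return (h_y - conditional_entropy(y, x)) / h_y


clusters = ["heavy", "heavy", "regular", "regular"]
answers = ["often", "often", "rarely", "rarely"]
print(uncertainty_coefficient(clusters, answers))
```

Unlike Cramér's V, the coefficient is asymmetric: U(Y|X) and U(X|Y) generally differ, so it states which variable helps predict which.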

Statistical methodologies (cont'd)
Mutual information, ANOVA, MANOVA, CART, Discriminant Analysis, Logistic Regression
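
The slide only lists these methods. As one concrete instance, a one-way ANOVA checks whether the mean of a numeric variable differs across clusters via F = (SS_between / (k − 1)) / (SS_within / (n − k)). MANOVA extends the idea to several variables at once and is not shown. A sketch, assuming plain lists of values and cluster labels. The name `one_way_anova_f` is hypothetical, and the p-value lookup is left out.

```python
def one_way_anova_f(values, labels):
    """Return (F, df_between, df_within) for a one-way ANOVA across clusters."""
    groups = {}
    for v, g in zip(values, labels):
        groups.setdefault(g, []).append(v)
    n, k = len(values), len(groups)
    grand = sum(values) / n
    means = {g: sum(vs) / len(vs) for g, vs in groups.items()}
    # Variation of cluster means around the grand mean, weighted by cluster size
    ss_between = sum(len(vs) * (means[g] - grand) ** 2 for g, vs in groups.items())
    # Variation of individual values around their own cluster mean
    ss_within = sum((v - means[g]) ** 2 for g, vs in groups.items() for v in vs)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


print(one_way_anova_f([1, 2, 3, 7, 8, 9], ["A"] * 3 + ["B"] * 3))  # prints (54.0, 1, 4)
```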

Machine learning methodologies
Decision trees: provide a hierarchical process and model of classification, built by a non-backtracking, greedy optimisation algorithm.
Propositional rules: provide logic models, represented as "if condition then cluster".
Neural networks
Naive Bayes
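
C4.5 grows its tree greedily. At each node it picks the attribute with the highest gain ratio (information gain divided by split information) and never revisits that choice. The sketch below shows only this greedy selection step for nominal attributes. It omits pruning, numeric thresholds, missing values and C4.5's restriction to attributes with at least average gain. The function names are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Entropy in bits of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(rows, labels, attr):
    """Information gain of splitting on attr, divided by the split information."""
    n = len(rows)
    parts = {}
    for row, y in zip(rows, labels):
        parts.setdefault(row[attr], []).append(y)
    remainder = sum(len(p) / n * entropy(p) for p in parts.values())
    gain = entropy(labels) - remainder
    split_info = -sum((len(p) / n) * math.log2(len(p) / n) for p in parts.values())
    return gain / split_info if split_info > 0 else 0.0


def best_split(rows, labels, attrs):
    """Greedy step: choose the attribute with the highest gain ratio, no backtracking."""
    return max(attrs, key=lambda a: gain_ratio(rows, labels, a))
```

Applying `best_split` recursively to each partition, until a node is pure or no attribute helps, yields the hierarchical model the slide describes.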

The number of clusters
May be set a priori.
May be an outcome of the clustering process itself.
The best number is obtained by comparing measures of model fit for alternative numbers of clusters.

The number of clusters (cont'd)
Mixture models: Akaike Information Criterion (AIC)
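
Comparing fit across alternative numbers of clusters with AIC can be sketched as follows. The sketch assumes a one-dimensional Gaussian mixture fitted by EM, since the study's exact mixture model is not given here. AIC = 2p − 2 ln L, where p = 3K − 1 for K components (K means, K variances, K − 1 free mixing weights), and the K with the smallest AIC is preferred. All names are illustrative.

```python
import math
import random


def normal_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def fit_gmm_1d(xs, k, iters=100):
    """Fit a k-component 1-D Gaussian mixture by EM; return the final log-likelihood."""
    n = len(xs)
    ordered = sorted(xs)
    means = [ordered[int((j + 0.5) * n / k)] for j in range(k)]  # spread starts over quantiles
    mu = sum(xs) / n
    variances = [sum((x - mu) ** 2 for x in xs) / n] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each point
        resp = []
        for x in xs:
            dens = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means and variances from the responsibilities
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            variances[j] = max(sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj, 1e-6)
    return sum(math.log(sum(w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)))
               for x in xs)


def aic(log_likelihood, k):
    n_params = 3 * k - 1  # k means + k variances + (k - 1) free mixing weights
    return 2 * n_params - 2 * log_likelihood


def choose_k(xs, candidates=(1, 2, 3)):
    """Return the candidate K with the smallest AIC, plus all scores."""
    scores = {k: aic(fit_gmm_1d(xs, k), k) for k in candidates}
    return min(scores, key=scores.get), scores


if __name__ == "__main__":
    rng = random.Random(0)
    data = [rng.gauss(0, 1) for _ in range(200)] + [rng.gauss(10, 1) for _ in range(200)]
    print(choose_k(data))
```

EM can converge to a local optimum, so in practice the fit is usually repeated from several starting points. This sketch uses a single quantile-based start.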

Utility concepts
The main question in evaluating a clustering is a question about utility.
Utility is evaluated by judgement.

Proposed approach
[Figure: preprocessing]

Proposed approach (cont'd)
The choice of discriminant and classification methodologies depends on the nature of the variables.
Regarding discrimination, complementary dimensions offer a new perspective and understanding.
The approach integrates methodologies and techniques from the Statistical and Machine Learning paradigms.

A tourism market application
The clustering solution
Evaluation of the clustering solution

Database
Answers to a questionnaire from Portuguese clients of Pousadas de Portugal
49 questions, coded as 200 variables
2500 Portuguese clients

Clustering
Model sample: 1647 clients (65%); validation sample: 897 clients (35%).
Uses an a priori procedure and a K-Means procedure.
Clustering base: 4 variables expressing the frequency and type of Pousadas (types CH, CSUP, C and B).
3 clusters: first-time users, regular users and heavy users.
Cluster sizes in the model sample: 18%, 60% and 22%; in the validation sample: 16%, 62% and 22%.
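
A K-Means (Lloyd's algorithm) sketch for the clustering step. It also includes a check that mirrors the slide's comparison of cluster shares in the model and validation samples. The sketch assumes that the initial centroids are supplied by the analyst (one possible reading of "a priori"), that cases are numeric feature vectors such as the 4 clustering-base variables, and that validation cases are assigned to the nearest model centroid. All function names are hypothetical.

```python
import math


def nearest(point, centroids):
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda j: math.dist(point, centroids[j]))


def kmeans(points, init_centroids, iters=100):
    """Lloyd's algorithm: alternate nearest-centroid assignment and centroid update."""
    centroids = [list(c) for c in init_centroids]
    for _ in range(iters):
        labels = [nearest(p, centroids) for p in points]
        updated = []
        for j, c in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                updated.append([sum(col) / len(members) for col in zip(*members)])
            else:
                updated.append(c)  # keep an empty cluster's centroid where it was
        if updated == centroids:   # converged: assignments no longer move centroids
            break
        centroids = updated
    labels = [nearest(p, centroids) for p in points]
    return centroids, labels


def proportions(labels, k):
    """Share of cases in each of the k clusters."""
    n = len(labels)
    return [labels.count(j) / n for j in range(k)]
```

For the validation check, one would cluster the model sample, assign each validation case with `nearest(case, centroids)`, and compare the two `proportions` vectors. Similar shares, as with 18/60/22 versus 16/62/22, suggest a stable solution.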

Clustering (cont'd)
2 clusters (heavy users and regular users)
Model sample: 16 Pousadas and 5 Pousadas
Validation sample: 17 Pousadas and 4 Pousadas

A tourism market application
The clustering solution
Evaluation of the clustering solution

Evaluation of the clustering solution

Analysis of association between clusters and clustering base
Degree of correctness of classification: model sample 82.6%; validation sample 91.5%.
Discriminant functions: the linear combinations of the clustering base variables that maximise the ratio of between-cluster to within-cluster variation.
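
The percentages above are hit rates, meaning the share of cases whose predicted cluster equals their actual cluster. The sketch below computes a hit rate and the between/within ratio that discriminant functions maximise, evaluated for a given weight vector w. Finding the optimal w (an eigenvalue problem) is not shown. All names are hypothetical.

```python
def hit_rate(true_labels, predicted_labels):
    """Share of cases whose predicted cluster equals their actual cluster."""
    hits = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return hits / len(true_labels)


def between_within_ratio(points, labels, weights):
    """Between- over within-cluster sum of squares of the linear score w·x."""
    scores = [sum(w * x for w, x in zip(weights, p)) for p in points]
    grand = sum(scores) / len(scores)
    groups = {}
    for s, lab in zip(scores, labels):
        groups.setdefault(lab, []).append(s)
    means = {lab: sum(g) / len(g) for lab, g in groups.items()}
    between = sum(len(g) * (means[lab] - grand) ** 2 for lab, g in groups.items())
    within = sum((s - means[lab]) ** 2 for lab, g in groups.items() for s in g)
    return between / within
```

A larger ratio means the projected scores separate the clusters better. Discriminant analysis searches for the weights that make this ratio as large as possible.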

Analysis of association between clusters and clustering base (cont'd)

Analysis of association between clusters and other variables
Chi-square tests: measure the strength of association between clusters and variables.
Rule induction procedures: discriminate and classify on the basis of the attributes significantly associated with the clusters.
Rule induction provides a better understanding of the factors that discriminate the clusters.
C4.5 and CN2 are evaluated on both the model sample and the validation sample.
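
A sketch of chi-square screening of candidate variables before rule induction. It assumes nominal variables stored as lists aligned with the cluster labels and a 5% significance level, since the level used in the study is not stated here. The p-value uses the closed-form chi-square tail for integer degrees of freedom, built with the recurrence Q(x; ν + 2) = Q(x; ν) + (x/2)^(ν/2) e^(−x/2) / Γ(ν/2 + 1), so no statistics library is needed. Function names are hypothetical.

```python
import math
from collections import Counter


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) of a chi-square variable with integer df."""
    if df % 2 == 0:
        q, nu = math.exp(-x / 2), 2            # Q(x; 2) = e^(-x/2)
    else:
        q, nu = math.erfc(math.sqrt(x / 2)), 1  # Q(x; 1) = erfc(sqrt(x/2))
    while nu < df:
        if x > 0:
            q += math.exp((nu / 2) * math.log(x / 2) - x / 2 - math.lgamma(nu / 2 + 1))
        nu += 2
    return q


def chi_square_test(clusters, variable):
    """Pearson chi-square test of independence: (statistic, df, p-value)."""
    n = len(clusters)
    joint = Counter(zip(clusters, variable))
    row_tot, col_tot = Counter(clusters), Counter(variable)
    stat = 0.0
    for r in row_tot:
        for c in col_tot:
            expected = row_tot[r] * col_tot[c] / n
            stat += (joint[(r, c)] - expected) ** 2 / expected
    df = (len(row_tot) - 1) * (len(col_tot) - 1)
    return stat, df, chi2_sf(stat, df)


def significant_variables(clusters, variables, alpha=0.05):
    """Names of the variables significantly associated with the clusters at level alpha."""
    return [name for name, values in variables.items()
            if chi_square_test(clusters, values)[2] < alpha]
```

The chi-square approximation becomes unreliable when many expected cell counts fall below about 5, which is common with sparse questionnaire categories.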

Analysis of association between clusters and other variables (cont'd)
Keep in memory a group (beam) of the best solutions.
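
CN2 searches for rules by keeping a beam of the best partial solutions. At each step it specialises every rule in the beam by one more attribute test and keeps only the best few. A simplified sketch of that search for a single rule of the form "if condition then cluster". It assumes nominal attributes and uses the Laplace accuracy estimate as rule quality (as in the revised CN2). It has no significance test and no covering loop. All names are hypothetical.

```python
from collections import Counter


def laplace_quality(covered_labels, n_classes):
    """Laplace accuracy (n_c + 1) / (n + k) of the majority class among covered cases."""
    cls, n_c = Counter(covered_labels).most_common(1)[0]
    return (n_c + 1) / (len(covered_labels) + n_classes), cls


def covers(conditions, row):
    return all(row[attr] == value for attr, value in conditions)


def beam_search_rule(rows, labels, beam_width=3, max_conditions=2):
    """Return (quality, conditions, predicted_cluster) for the best rule found."""
    n_classes = len(set(labels))
    selectors = sorted({(attr, row[attr]) for row in rows for attr in row})
    beam, best = [()], None
    for _ in range(max_conditions):
        candidates = set()
        for conditions in beam:               # specialise every rule in the beam
            used = {attr for attr, _ in conditions}
            for attr, value in selectors:
                if attr not in used:
                    candidates.add(tuple(sorted(conditions + ((attr, value),))))
        scored = []
        for conditions in sorted(candidates):
            covered = [y for row, y in zip(rows, labels) if covers(conditions, row)]
            if not covered:
                continue
            quality, cls = laplace_quality(covered, n_classes)
            scored.append((quality, conditions, cls))
            if best is None or quality > best[0]:
                best = (quality, conditions, cls)
        scored.sort(key=lambda t: (-t[0], t[1]))
        beam = [conditions for _, conditions, _ in scored[:beam_width]]  # keep the best few
    return best
```

The full algorithm repeats this search, removes the examples covered by each accepted rule, and collects the rules into a rule list.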

Analysis of association between clusters and other variables (cont'd)

Global evaluation
Discriminant Analysis and Logistic Regression clearly show the differences between clusters.
Chi-square tests show the association between variables and clusters.
C4.5 and CN2 provide a more complex and richer perspective.

Conclusion
Identifying significant associations characterising the clustered entities guided the discriminant and classification analysis.
Propositional rule induction is suitable for discrimination purposes.
A multimethodological approach should consider not only inference but also descriptive analysis.