Statistics for Marketing & Consumer Research. Copyright © 2008 Mario Mazzocchi.
Cluster Analysis (from Chapter 12)



Cluster analysis
Cluster analysis is a class of techniques used to classify cases into groups that are relatively homogeneous within themselves and heterogeneous between each other. These groups are called clusters.

Market segmentation
Cluster analysis is especially useful for market segmentation. Segmenting a market means dividing its potential consumers into separate sub-sets where:
- consumers in the same group are similar with respect to a given set of characteristics;
- consumers belonging to different groups are dissimilar with respect to the same set of characteristics.
This allows one to calibrate the marketing mix differently according to the target consumer group.

Other uses of cluster analysis
- Clustering similar brands or products according to their characteristics allows one to identify competitors, potential market opportunities and available niches.
- Data reduction. Factor analysis and principal component analysis reduce the number of variables; cluster analysis reduces the number of observations by grouping them into homogeneous clusters.
- Maps that profile consumers and products, market opportunities and preferences simultaneously, as in preference or perceptual mapping.

Steps to conduct a cluster analysis
1. Select a distance measure
2. Select a clustering algorithm
3. Define the distance between two clusters
4. Determine the number of clusters
5. Validate the analysis

Distance measures for individual observations
To measure the similarity between two observations, a distance measure is needed. When there are multiple variables, an aggregate distance measure is required. The best-known measure of distance is the Euclidean distance, which is the concept we use in everyday life for spatial coordinates.

Examples of distances
Euclidean distance: $D_{ij} = \sqrt{\sum_{k} (x_{ki} - x_{kj})^2}$
City-block (Manhattan) distance: $D_{ij} = \sum_{k} |x_{ki} - x_{kj}|$
where $D_{ij}$ is the distance between cases i and j, and $x_{kj}$ is the value of variable $x_k$ for case j.
[Figure: the Euclidean and city-block distances between two points A and B]
Problems:
- Variables measured in different units receive different weights
- Correlation between variables (double counting)
Solution: standardization, rescaling, principal component analysis.
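The two distances, and the effect of standardization on them, can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function names and the age/income data are hypothetical, and standardization here uses the population standard deviation.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two cases (equal-length sequences)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def city_block(a, b):
    """City-block (Manhattan) distance between two cases."""
    return sum(abs(x - y) for x, y in zip(a, b))

def standardize(cases):
    """Rescale every variable to mean 0 and standard deviation 1.

    Assumption: population standard deviation, and no constant variable.
    """
    columns = list(zip(*cases))
    means = [sum(col) / len(col) for col in columns]
    stds = [
        math.sqrt(sum((v - m) ** 2 for v in col) / len(col))
        for col, m in zip(columns, means)
    ]
    return [
        [(v - m) / s for v, m, s in zip(case, means, stds)]
        for case in cases
    ]

# Hypothetical cases: (age in years, income in euros).
cases = [[25, 20000], [45, 21000], [30, 40000]]

# On raw data the income variable dominates the distance
# simply because it is measured on a much larger scale.
raw = euclidean(cases[0], cases[1])

# After standardization both variables contribute on a comparable scale.
z = standardize(cases)
scaled = euclidean(z[0], z[1])

print(raw, scaled)
```

Standardization addresses the unit problem only; correlated variables still count twice, which is why principal component analysis is listed as a further remedy.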

Clustering procedures
Hierarchical procedures:
- Agglomerative (start from n clusters to get to 1 cluster)
- Divisive (start from 1 cluster to get to n clusters)
Non-hierarchical procedures:
- K-means clustering (knowledge of the number of clusters, c, is required)

Distance between clusters
Algorithms vary according to the way the distance between two clusters is defined. The most common algorithms for hierarchical methods include:
- single linkage method
- complete linkage method
- average linkage method
- Ward algorithm
- centroid method

Linkage methods
- Single linkage method (nearest neighbour): the distance between two clusters is the minimum of all possible distances between observations belonging to the two clusters.
- Complete linkage method (furthest neighbour): the distance between two clusters is the maximum distance between observations belonging to the two clusters.
- Average linkage method: the distance between two clusters is the average of all distances between observations in the two clusters.
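The three linkage definitions above differ only in how they summarize the set of pairwise distances between two clusters. A minimal Python sketch, assuming Euclidean distance between observations and using hypothetical clusters and function names:

```python
import math
from itertools import product

def euclidean(a, b):
    """Euclidean distance between two observations."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def pairwise(cluster_a, cluster_b):
    """All distances between an observation in one cluster and one in the other."""
    return [euclidean(p, q) for p, q in product(cluster_a, cluster_b)]

def single_linkage(cluster_a, cluster_b):
    """Nearest neighbour: the minimum pairwise distance."""
    return min(pairwise(cluster_a, cluster_b))

def complete_linkage(cluster_a, cluster_b):
    """Furthest neighbour: the maximum pairwise distance."""
    return max(pairwise(cluster_a, cluster_b))

def average_linkage(cluster_a, cluster_b):
    """The mean of all pairwise distances."""
    distances = pairwise(cluster_a, cluster_b)
    return sum(distances) / len(distances)

# Hypothetical clusters of two-dimensional observations.
cluster_a = [(0.0, 0.0), (0.0, 1.0)]
cluster_b = [(3.0, 0.0), (4.0, 0.0)]

print(single_linkage(cluster_a, cluster_b))
print(complete_linkage(cluster_a, cluster_b))
print(average_linkage(cluster_a, cluster_b))
```

For any pair of clusters the single linkage distance is never larger than the average linkage distance, which in turn is never larger than the complete linkage distance.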

Hierarchical vs. non-hierarchical methods
Hierarchical methods:
- No decision about the number of clusters is required
- Problems when data contain a high level of error
- Can be very slow, preferable with small data sets
- Initial decisions are more influential (one-step only)
- At each step they require computation of the full proximity matrix
Non-hierarchical methods:
- Faster, more reliable, work with large data sets
- Need to specify the number of clusters
- Need to set the initial seeds
- Only cluster distances to seeds need to be computed in each iteration
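The non-hierarchical column can be made concrete with a bare-bones k-means loop. The sketch below is an illustration under stated assumptions, not the procedure of any particular package: the function name `k_means`, the seeds and the data are hypothetical, distances are Euclidean, and an empty cluster simply keeps its previous centroid.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two observations."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def k_means(cases, seeds, max_iter=100):
    """Assign cases to the nearest centroid, update centroids, repeat.

    The number of clusters is fixed in advance by the number of seeds.
    """
    centroids = [list(seed) for seed in seeds]
    assignment = None
    for _ in range(max_iter):
        # Only distances from each case to the current centroids are needed,
        # not the full case-by-case proximity matrix.
        new_assignment = [
            min(range(len(centroids)), key=lambda c: euclidean(case, centroids[c]))
            for case in cases
        ]
        if new_assignment == assignment:
            break  # no case changed cluster: converged
        assignment = new_assignment
        for c in range(len(centroids)):
            members = [case for case, a in zip(cases, assignment) if a == c]
            if members:  # assumption: an empty cluster keeps its old centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids

# Hypothetical data with two visibly separate groups, and two initial seeds.
cases = [(1, 1), (1.5, 2), (1, 0), (8, 8), (9, 9), (8, 9.5)]
seeds = [(0, 0), (10, 10)]

assignment, centroids = k_means(cases, seeds)
print(assignment)
print(centroids)
```

The result depends on the initial seeds, which is why setting them appears among the requirements of non-hierarchical methods.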

The number of clusters c
There are two alternatives: c is either determined by the analysis or fixed by the researchers. In segmentation studies, c represents the number of potential separate segments.
- Preferable approach: “let the data speak”. Use a hierarchical approach and identify the optimal partition through statistical tests (a stopping rule for the algorithm).
- However, the detection of the optimal number of clusters is subject to a high degree of uncertainty.
- If the research objectives allow the number of clusters to be chosen rather than estimated, non-hierarchical methods are the way to go.

Example: fixed number of clusters
- A retailer wants to identify several shopping profiles in order to open new, targeted retail outlets.
- The budget only allows him to open three types of outlets.
- A partition into three clusters follows naturally, although it is not necessarily the optimal one.
- A fixed number of clusters calls for a non-hierarchical (k-means) approach.

Determining the optimal number of clusters from hierarchical methods (in SPSS)
- Agglomeration schedule
- Icicle plot
- Dendrogram
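To show how an agglomeration schedule can suggest a number of clusters, the Python sketch below builds a simplified schedule and looks for the largest jump in merge distance. Several things here are assumptions rather than content from the slides: the "largest jump" rule is one common heuristic among many, average linkage is an arbitrary choice, the function names and data are hypothetical, and the schedule format does not reproduce SPSS output.

```python
import math
from itertools import combinations, product

def euclidean(a, b):
    """Euclidean distance between two observations."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def average_linkage(cluster_a, cluster_b):
    """Mean of all pairwise distances between two clusters."""
    distances = [euclidean(p, q) for p, q in product(cluster_a, cluster_b)]
    return sum(distances) / len(distances)

def agglomeration_schedule(cases, linkage=average_linkage):
    """Merge the two closest clusters until one is left.

    Returns a list of (clusters remaining after the merge, merge distance).
    """
    clusters = [[case] for case in cases]  # start from n single-case clusters
    schedule = []
    while len(clusters) > 1:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        distance = linkage(clusters[i], clusters[j])
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
        schedule.append((len(clusters), distance))
    return schedule

def suggest_number_of_clusters(schedule):
    """Heuristic (an assumption): stop just before the largest jump in distance."""
    best_jump, best_c = -1.0, 1
    for (c_before, d_before), (_, d_after) in zip(schedule, schedule[1:]):
        jump = d_after - d_before
        if jump > best_jump:
            best_jump, best_c = jump, c_before
    return best_c

# Hypothetical data with two visibly separate groups.
cases = [(1, 1), (1.5, 2), (1, 0), (8, 8), (9, 9), (8, 9.5)]

schedule = agglomeration_schedule(cases)
for remaining, distance in schedule:
    print(remaining, round(distance, 3))
print(suggest_number_of_clusters(schedule))
```

A dendrogram displays the same information graphically: the merge distances become the heights at which branches join, and a long vertical gap corresponds to a large jump in the schedule.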