Statistics for Marketing & Consumer Research Copyright © 2008 - Mario Mazzocchi 1 Cluster Analysis (from Chapter 12)


2 Cluster analysis
Cluster analysis is a class of techniques used to classify cases into groups that are relatively homogeneous within themselves and heterogeneous between each other. These groups are called clusters.
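To make "homogeneous within, heterogeneous between" concrete, the Python sketch below splits the total variability of a given partition into a within-cluster and a between-cluster sum of squares. The data and helper names (`centroid`, `within_between`) are hypothetical, and measuring variability with squared Euclidean distances to the cluster means is an assumption chosen for illustration.

```python
from statistics import mean


def centroid(cluster):
    """Mean of each variable over the observations in one cluster."""
    return [mean(values) for values in zip(*cluster)]


def sq_dist(a, b):
    """Squared Euclidean distance between two observations."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def within_between(clusters):
    """Split total variability into (within-cluster SS, between-cluster SS).

    clusters: list of clusters, each a list of observations (lists of numbers).
    """
    everything = [p for c in clusters for p in c]
    grand = centroid(everything)
    within = sum(sq_dist(p, centroid(c)) for c in clusters for p in c)
    between = sum(len(c) * sq_dist(centroid(c), grand) for c in clusters)
    return within, between


# Hypothetical data: two tight groups of observations far apart.
groups = [[[1, 1], [1, 2], [2, 1]], [[8, 8], [8, 9], [9, 8]]]
w, b = within_between(groups)
print(w, b)  # small within-cluster SS, large between-cluster SS
```

A good partition keeps the first number small relative to the second.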

3 Market segmentation
Cluster analysis is especially useful for market segmentation. Segmenting a market means dividing its potential consumers into separate subsets where:
- consumers in the same group are similar with respect to a given set of characteristics;
- consumers belonging to different groups are dissimilar with respect to the same set of characteristics.
This allows one to calibrate the marketing mix differently according to the target consumer group.

4 Other uses of cluster analysis
- Clustering similar brands or products according to their characteristics allows one to identify competitors, potential market opportunities and available niches.
- Data reduction:
  - number of variables: factor analysis and principal component analysis reduce the number of variables;
  - number of observations: cluster analysis reduces the number of observations by grouping them into homogeneous clusters.
- Maps that simultaneously profile consumers and products, market opportunities and preferences, as in preference or perceptual mapping.

5 Steps to conduct a cluster analysis
1. Select a distance measure.
2. Select a clustering algorithm.
3. Define the distance between two clusters.
4. Determine the number of clusters.
5. Validate the analysis.

6 Distance measures for individual observations
To measure the similarity between two observations, a distance measure is needed. With multiple variables, an aggregate distance measure is required. The best-known distance measure is the Euclidean distance, the same concept we use in everyday life for spatial coordinates.
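A minimal sketch of the Euclidean distance between two observations measured on several variables, written in Python; the function name and sample values are illustrative only. Since Python 3.8 the standard library's `math.dist` computes the same quantity.

```python
import math


def euclidean(a, b):
    """Euclidean distance: square root of the summed squared differences
    across all variables (an aggregate distance for multivariate cases)."""
    if len(a) != len(b):
        raise ValueError("observations must have the same number of variables")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Two hypothetical observations described by three variables.
print(euclidean([3, 4, 0], [0, 0, 0]))  # → 5.0
```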

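7 Examples of distances
Notation: $D_{ij}$ is the distance between cases $i$ and $j$; $x_{kj}$ is the value of variable $x_k$ for case $j$.

Euclidean distance:
$$D_{ij} = \sqrt{\sum_{k} \left( x_{ki} - x_{kj} \right)^2}$$

City-block (Manhattan) distance:
$$D_{ij} = \sum_{k} \left| x_{ki} - x_{kj} \right|$$

[Figure: Euclidean vs. city-block (Manhattan) distance between two points A and B]

Problems:
- Variables measured in different units or on different scales implicitly receive different weights.
- Correlation between variables leads to double counting.
Solutions: standardization, rescaling, principal component analysis.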
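The scaling problem listed above can be shown with a small Python sketch. It assumes hypothetical cases described by income (euros) and age (years), computes the city-block distance on the raw data, and then recomputes it after standardizing each variable to mean 0 and population standard deviation 1. The helper names are hypothetical.

```python
from statistics import mean, pstdev


def city_block(a, b):
    """City-block (Manhattan) distance: sum of absolute differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def standardize(rows):
    """Rescale every variable (column) to mean 0 and population std. dev. 1."""
    stats = [(mean(col), pstdev(col)) for col in zip(*rows)]
    return [[(x - m) / s if s > 0 else 0.0 for x, (m, s) in zip(row, stats)]
            for row in rows]


# Hypothetical cases: [income in euros, age in years]
cases = [[30000, 25], [31000, 60], [33000, 26], [50000, 40]]
z = standardize(cases)

raw_01 = city_block(cases[0], cases[1])
raw_02 = city_block(cases[0], cases[2])
std_01 = city_block(z[0], z[1])
std_02 = city_block(z[0], z[2])
print(raw_01, raw_02)  # → 1035 3001
print(round(std_01, 2), round(std_02, 2))
```

On the raw data, case 1 looks closer to case 0 than case 2 does, only because income differences are counted in euros. After standardization, the 35-year age gap dominates and case 2 becomes the nearer one.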

8 Clustering procedures
- Hierarchical procedures
  - Agglomerative: start from n clusters (one per observation) and merge until 1 cluster remains.
  - Divisive: start from 1 cluster and split until n clusters are obtained.
- Non-hierarchical procedures
  - K-means clustering: the number of clusters (c) must be known in advance.
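The agglomerative procedure can be sketched in a few lines of Python: every observation starts as its own cluster, and the two closest clusters are merged until only one is left. This is a naive illustration, not how SPSS or other packages implement it. It assumes single linkage (nearest neighbour) as the between-cluster distance and Euclidean distance between observations, and the data are hypothetical.

```python
import math
from itertools import combinations


def single_linkage(c1, c2, points):
    """Nearest-neighbour distance between two clusters of point indices."""
    return min(math.dist(points[i], points[j]) for i in c1 for j in c2)


def agglomerate(points):
    """Start from n singleton clusters and merge the closest pair until one
    cluster is left. Returns the merges as (cluster_a, cluster_b, distance)."""
    clusters = [frozenset([i]) for i in range(len(points))]
    history = []
    while len(clusters) > 1:
        # Recompute all between-cluster distances and pick the smallest.
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: single_linkage(pair[0], pair[1], points))
        d = single_linkage(a, b, points)
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
        history.append((sorted(a), sorted(b), d))
    return history


# Hypothetical one-dimensional data, to keep the merges easy to follow.
pts = [[1.0], [2.0], [6.0], [7.0], [15.0]]
for step, (a, b, d) in enumerate(agglomerate(pts), start=1):
    print(step, a, b, d)
```

With these five points the merge distances are 1.0, 1.0, 4.0 and 8.0. The list of merges plays the role of an agglomeration schedule.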

9 Distance between clusters
Algorithms vary according to how the distance between two clusters is defined. The most common algorithms for hierarchical methods include:
- single linkage method
- complete linkage method
- average linkage method
- Ward algorithm
- centroid method
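The slide names the Ward algorithm and the centroid method without defining them. The following Python sketch rests on stated assumptions. The centroid method is taken as the Euclidean distance between the two cluster mean vectors, although some software uses the squared Euclidean distance instead. Ward's method merges the pair of clusters whose union gives the smallest increase in the total within-cluster sum of squares. For clusters with sizes n_a and n_b, that increase equals n_a * n_b / (n_a + n_b) times the squared distance between their centroids. The code checks this identity on hypothetical data, and the function names are illustrative.

```python
import math


def centroid(cluster):
    """Mean vector of a cluster (a list of observations)."""
    return [sum(col) / len(cluster) for col in zip(*cluster)]


def centroid_distance(a, b):
    """Centroid method: Euclidean distance between the two cluster centroids."""
    return math.dist(centroid(a), centroid(b))


def within_ss(cluster):
    """Sum of squared distances of the members from their centroid."""
    c = centroid(cluster)
    return sum(math.dist(p, c) ** 2 for p in cluster)


def ward_cost(a, b):
    """Ward criterion: increase in within-cluster SS caused by merging a and b."""
    na, nb = len(a), len(b)
    return na * nb / (na + nb) * math.dist(centroid(a), centroid(b)) ** 2


a = [[0.0, 0.0], [2.0, 0.0]]
b = [[4.0, 4.0]]
print(centroid_distance(a, b))  # → 5.0
# The closed-form Ward cost matches the direct computation.
print(ward_cost(a, b), within_ss(a + b) - within_ss(a) - within_ss(b))
```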

10 Linkage methods
- Single linkage method (nearest neighbour): the distance between two clusters is the minimum of all distances between pairs of observations belonging to the two clusters.
- Complete linkage method (furthest neighbour): the distance between two clusters is the maximum distance between observations belonging to the two clusters.
- Average linkage method: the distance between two clusters is the average of all distances between pairs of observations in the two clusters.
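The three linkage rules differ only in how they summarize the distances between members of the two clusters. Here is a minimal Python sketch that assumes Euclidean distance between observations. The function names and data are illustrative.

```python
import math
from statistics import mean


def pairwise(a, b):
    """All distances between an observation in cluster a and one in cluster b."""
    return [math.dist(p, q) for p in a for q in b]


def single_link(a, b):    # nearest neighbour: smallest pairwise distance
    return min(pairwise(a, b))


def complete_link(a, b):  # furthest neighbour: largest pairwise distance
    return max(pairwise(a, b))


def average_link(a, b):   # mean of all between-cluster pairwise distances
    return mean(pairwise(a, b))


a = [[0.0], [1.0]]
b = [[4.0], [6.0]]
print(single_link(a, b), complete_link(a, b), average_link(a, b))  # → 3.0 6.0 4.5
```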

11 Hierarchical vs. non-hierarchical methods
Hierarchical methods:
- No decision about the number of clusters is needed in advance
- Problems when data contain a high level of error
- Can be very slow; preferable with small data sets
- Initial decisions are more influential (one step only: once made, a merge or split is never revisited)
- At each step they require computation of the full proximity matrix
Non-hierarchical methods:
- Faster, more reliable, work with large data sets
- Need to specify the number of clusters
- Need to set the initial seeds
- Only the distances between observations and the cluster seeds need to be computed at each iteration
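A rough count shows why the proximity matrix is the bottleneck for hierarchical methods. The hypothetical helpers below compare the number of distinct pairwise distances in a full proximity matrix, n(n-1)/2, with the n*c observation-to-seed distances that one k-means iteration needs. This is a simplified count: real hierarchical implementations usually update the matrix after each merge rather than rebuilding it from scratch.

```python
def proximity_matrix_size(n):
    """Distinct pairwise distances among n observations (full proximity matrix)."""
    return n * (n - 1) // 2


def kmeans_distances_per_iteration(n, c):
    """Distances from each of n observations to each of c seeds."""
    return n * c


for n in (100, 10_000):
    print(n, proximity_matrix_size(n), kmeans_distances_per_iteration(n, c=3))
```

For 10,000 observations this gives 49,995,000 pairwise distances, compared with 30,000 distances per k-means iteration when c = 3.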

12 The number of clusters c
Two alternatives:
- determined by the analysis;
- fixed by the researchers.
In segmentation studies, c represents the number of potential separate segments.
Preferable approach: “let the data speak”. A hierarchical approach is used, and the optimal partition is identified through statistical tests (a stopping rule for the algorithm). However, the detection of the optimal number of clusters is subject to a high degree of uncertainty.
If the research objectives allow the number of clusters to be chosen rather than estimated, non-hierarchical methods are the way to go.
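One simple, informal stopping rule looks at the agglomeration coefficients and stops just before the largest jump between consecutive merges, because that merge joins two clusters that are far apart. This rule is an assumption here and is not one of the formal statistical tests the slide refers to. Below is a Python sketch with a hypothetical schedule and a hypothetical function name.

```python
def clusters_by_largest_jump(coefficients):
    """Suggest a number of clusters from agglomeration coefficients.

    coefficients: merge distances in the order the merges happened
    (n - 1 values for n observations). Informal rule: stop just before
    the largest increase between two consecutive merges.
    """
    if len(coefficients) < 2:
        raise ValueError("need at least two merges")
    n = len(coefficients) + 1
    jumps = [later - earlier for earlier, later in zip(coefficients, coefficients[1:])]
    merges_done = jumps.index(max(jumps)) + 1  # merges kept before stopping
    return n - merges_done


# Hypothetical schedule for 6 observations: the last merge is a big jump.
print(clusters_by_largest_jump([0.5, 0.7, 1.0, 1.2, 6.0]))  # → 2
```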

13 Example: fixed number of clusters
A retailer wants to identify several shopping profiles in order to open new, targeted retail outlets. The budget only allows three types of outlets to be opened. A partition into three clusters follows naturally, although it is not necessarily the optimal one.
Fixed number of clusters: a non-hierarchical (k-means) approach.
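For a retailer example like this one, a fixed c = 3 calls for k-means. The Python sketch below is a plain implementation built on several assumptions. The shopper data are hypothetical. The variables are left unstandardized for brevity, although in practice they would usually be standardized first. The initial seeds come from a simple farthest-first rule, which is not the seeding method of any particular package.

```python
import math


def farthest_first_seeds(points, c):
    """Pick c initial seeds: the first observation, then repeatedly the
    observation farthest from all seeds chosen so far."""
    seeds = [points[0]]
    while len(seeds) < c:
        seeds.append(max(points, key=lambda p: min(math.dist(p, s) for s in seeds)))
    return [list(s) for s in seeds]


def kmeans(points, c, max_iter=100):
    """k-means with a fixed number of clusters c."""
    centers = farthest_first_seeds(points, c)
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: each observation joins its nearest centre
        # (only the observation-to-centre distances are needed).
        labels = [min(range(c), key=lambda j: math.dist(p, centers[j])) for p in points]
        # Update step: each centre moves to the mean of its members.
        new_centers = []
        for j in range(c):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(centers[j])  # keep an empty cluster's centre
        if new_centers == centers:  # converged: no centre moved
            break
        centers = new_centers
    return labels, centers


# Hypothetical shoppers: [average spend per visit (euros), visits per month]
shoppers = [[20, 2], [22, 3], [25, 2], [80, 4], [85, 5], [90, 4],
            [40, 12], [45, 14], [42, 13]]
labels, centers = kmeans(shoppers, c=3)
print(labels)  # → [0, 0, 0, 1, 1, 1, 2, 2, 2]
```

Here the three groups are low-spend occasional shoppers, high-spend shoppers and frequent mid-spend shoppers. With real data, different seeds can lead to different partitions.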

14 Determining the optimal number of clusters from hierarchical methods (in SPSS)
- Agglomeration schedule
- Icicle plot
- Dendrogram
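Reading a dendrogram or an agglomeration schedule amounts to cutting the hierarchy at a chosen number of clusters. In the Python sketch below, the merges are replayed in the order they occurred, and stopping after the first n - c of them gives the memberships for c clusters. The input format is an assumption for illustration and is not SPSS's output format: each merge is a pair of observation indices, one from each cluster being joined. The function name is hypothetical.

```python
def cut_schedule(n, merges, c):
    """Cluster memberships (1..c) when the hierarchy is cut at c clusters.

    n: number of observations, labelled 0..n-1.
    merges: (i, j) pairs in agglomeration order; each pair names one
            observation from each of the two clusters being joined.
    """
    if not 1 <= c <= n:
        raise ValueError("c must be between 1 and n")
    parent = list(range(n))  # union-find forest: each observation starts alone

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in merges[: n - c]:  # replay only the first n - c merges
        parent[find(i)] = find(j)

    label_of = {}  # relabel each root as 1, 2, ... in order of first appearance
    return [label_of.setdefault(find(i), len(label_of) + 1) for i in range(n)]


# Hypothetical schedule for 5 observations (four merges).
merges = [(0, 1), (2, 3), (0, 2), (4, 0)]
print(cut_schedule(5, merges, c=2))  # → [1, 1, 1, 1, 2]
```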

