1
Cluster analysis and segmentation of customers
2
What is Clustering analysis?
Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar, in some sense, to each other than to those in other groups (clusters). It is a collection of algorithms used in Knowledge Discovery in Databases (KDD). It is a main task of exploratory data mining and a common technique for statistical data analysis, used in many fields. The difference from classification and other supervised techniques is that we do not know the identity of the clusters a priori.
3
Why Clustering? A very wide range of marketing research applications:
Customer segmentation, profiling etc.
Market segmentation based on demographics, consumer behaviour or other dimensions
Very useful for the design of recommender systems and search engines
4
Examples
A chain of radio stores uses cluster analysis to identify three different customer types with varying needs.
An insurance company uses cluster analysis to classify customers into segments like the “self-confident customer”, the “price-conscious customer” etc.
A producer of copying machines succeeds in classifying industrial customers into “satisfied” and “non-satisfied or quarrelling” customers.
5
Dependence and Independence methods
Dependence Methods: We assume that a variable (e.g. Y) depends on (is caused or determined by) other variables (X1, X2 etc.). Examples: regression, ANOVA, discriminant analysis.
Independence Methods: We do not assume that any variable is caused by or determined by the others. Basically, we only have X1, X2 … Xn (but no Y). Examples: cluster analysis, factor analysis etc.
When using independence methods we let the data speak for themselves!
6
Relatedness of multivariate methods: cluster analysis and factor analysis. (Diagram: both start from the same observations-by-variables input matrix (Obs. 1 … Obs. m by X1 … Xn). Cluster analysis classifies the rows (observations) into clusters; factor analysis classifies the columns (variables) into factors.)
7
Multiple Regression: the primary focus is on the variables! (Diagram: a data matrix with the dependent variable Y (Sales) and the predictors X1 (Price), X2 (Competitor’s price) and X3 (Advertising), measured across Obs1–Obs10.)
8
Cluster analysis: the primary focus is on the observations! (Diagram: a data matrix with X1, X2 and X3 across Obs1–Obs10, with the observations grouped into Cluster 1, Cluster 2 and Cluster 3.)
9
Cluster analysis output: A new cluster-variable with a cluster-number on each respondent
(Diagram: the data matrix with an added Cluster column that holds each observation’s cluster number.)
10
Cluster analysis: a cross-tab between the cluster variable and background + opinion variables is established

Cluster                                     Age   %-Females   Household size   Opinion 1   Opinion 2   Opinion 3
“Younger male nerds”                         32       31           1.4            3.2         2.1         2.2
“Core-families with traditional values”      44       54           2.9            4.0         3.4         3.3
“Senior-relaxers”                            56       46           2.6            3.0          …           …
11
Cluster profiling: (hypothetical)
(Chart: Cluster 1, the “Ecological shopper”, versus Cluster 2, the “Traditional shopper”, profiled on statements such as “Buy ecological food”, “Advertisements are funny” and “Low price is important”; scale: 1 = Totally agree.)
12
Governing principle: maximization of homogeneity within clusters and, simultaneously, maximization of heterogeneity across clusters.
13
Clustering Steps
1. Data cleaning and processing
2. Select a similarity measurement
3. Select and apply a clustering method
4. Interpret the result
14
Data Cleaning and Processing
The data can be numerical, binary, categorical, text, text and numbers, images etc.
A typical marketing data set may contain:
How recently the customer has purchased from our company (POS data)
The value of the items purchased (POS data)
The customer’s zip code (derived from loyalty cards, credit cards, manual input at check-outs)
The median house value for that zip code (secondary data found online)
Promo variables (e.g. whether the customer redeemed a coupon, rebate or offer)
Other demographic and geographic data (age, gender, income, country, etc.)
Other supplemental survey data, e.g. the customer’s favourite TV shows, Internet sites etc.
Do we need all of that? Can we calculate a new figure and add it to our data set (see the sketch below)?
GIGO: Garbage In, Garbage Out
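A minimal pandas sketch of this kind of pre-processing; the column names and values below are hypothetical, not taken from the slides.

```python
import pandas as pd

customers = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "days_since_last_purchase": [12, 180, 45],      # recency (POS data)
    "total_value_purchased": [540.0, 80.0, 210.0],  # value of items purchased (POS data)
    "redeemed_coupon": [1, 0, 1],                   # promo variable
})

# Derive a new figure from the raw columns before clustering.
customers["value_per_day"] = (
    customers["total_value_purchased"] / customers["days_since_last_purchase"]
)

# Keep only the columns judged relevant for clustering (GIGO: garbage in, garbage out).
clustering_input = customers[
    ["days_since_last_purchase", "total_value_purchased", "redeemed_coupon", "value_per_day"]
]
print(clustering_input)
```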
15
Similarity: the success of clustering depends on the measurement of similarity (or, conversely, dissimilarity). We need to establish a distance metric, and the choice depends on the type of data we have available.
16
Euclidean distance (Default in SPSS):
(Diagram: points A = (x1, y1) and B = (x2, y2) in the plane, with legs x2 − x1 and y2 − y1.)
d = √((x2 − x1)² + (y2 − y1)²)
Other distances available in SPSS: City-block uses absolute differences instead of squared differences of coordinates. Moreover: Minkowski distance, Cosine distance, Chebychev distance, Pearson correlation.
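A short Python sketch of the two distances named above, Euclidean and city-block, written out explicitly; the points A and B anticipate the worked example on the next slide.

```python
import math

def euclidean(p, q):
    # d = sqrt((x2 - x1)^2 + (y2 - y1)^2), generalised to any number of coordinates
    return math.sqrt(sum((qi - pi) ** 2 for pi, qi in zip(p, q)))

def city_block(p, q):
    # City-block (Manhattan) distance: sum of absolute coordinate differences
    return sum(abs(qi - pi) for pi, qi in zip(p, q))

A, B = (1, 2), (3, 5)
print(euclidean(A, B))   # 3.605...
print(city_block(A, B))  # 5
```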
17
Euclidean distance, example: A = (1, 2) and B = (3, 5), so d = √((3 − 1)² + (5 − 2)²) = √13 ≈ 3.61.
18
Which two pairs of points are to be clustered first?
(Scatter plot of eight points, A–H.)
19
Maybe A/B and D/E (depending on algorithm!)
20
Quo vadis, C? (Scatter plot: with A/B and D/E now clustered, where should point C go?)
21
Quo vadis, C? (Continued)
22
How does one decide which cluster a newly arriving point should join?
Measuring distances from a point to clusters (or points):
“Farthest neighbour” (complete linkage)
“Nearest neighbour” (single linkage)
“Neighbourhood centre” (average linkage)
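A minimal sketch of these three rules expressed as point-to-cluster distances; the coordinates below are hypothetical stand-ins, not values read off the plots.

```python
import math

def euclidean(p, q):
    return math.sqrt(sum((qi - pi) ** 2 for pi, qi in zip(p, q)))

def point_to_cluster(point, cluster, linkage):
    distances = [euclidean(point, member) for member in cluster]
    if linkage == "single":      # nearest neighbour
        return min(distances)
    if linkage == "complete":    # farthest neighbour
        return max(distances)
    if linkage == "average":     # average distance to the cluster's members
        return sum(distances) / len(distances)
    raise ValueError(f"unknown linkage: {linkage}")

cluster_ab = [(1.0, 2.0), (3.0, 5.0)]   # hypothetical coordinates for points A and B
c = (6.0, 3.0)                          # the newly arriving point C
for rule in ("single", "complete", "average"):
    print(rule, round(point_to_cluster(c, cluster_ab, rule), 2))
```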
23
Quo vadis, C? (Continued)
(Plot: the distances from C to the surrounding points and clusters are marked: 7.0, 8.5, 8.5, 9.0, 9.5, 10.5, 11.0 and 12.0.)
24
Complete linkage: minimize the longest distance from the cluster to the point. (Plot: C’s farthest-point distances are 10.5 to cluster A/B and 9.5 to cluster D/E, so under complete linkage C joins D/E.)
25
Average linkage: minimize the average distance from the cluster to the point. (Plot: C’s average distances are 8.5 to cluster A/B and 9.0 to cluster D/E, so under average linkage C joins A/B.)
26
Single linkage: minimize the shortest distance from the cluster to the point. (Plot: C’s nearest-point distances are 7.0 to cluster A/B and 8.5 to cluster D/E, so under single linkage C joins A/B.)
27
Single linkage: Pitfall
Cluster formation begins with the closest pair, and at every step the closest remaining observation is added to an existing cluster. This can produce chaining, or snake-like clusters: A and C may end up merged into the same cluster while omitting B. (Illustrative scatter plot.)
28
Single linkage: Advantage
Single linkage is a good outlier detection and removal procedure in cases with “noisy” data sets. (Illustrative scatter plot: outliers and a diffuse “entropy group” stay outside the main clusters.)
29
Similarity in textual or categorical data
Edit distance. Assume that we want to find the distance between the words Garamond and Gautami. Comparing the words letter by letter, we need to perform 5 changes and 1 deletion, giving a distance of 6.
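A small dynamic-programming sketch of edit (Levenshtein) distance. Note that the algorithm searches over all alignments, including insertions, and for Garamond/Gautami it finds a slightly cheaper sequence of edits, 5, than the letter-by-letter count of 6 above.

```python
def edit_distance(a: str, b: str) -> int:
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i                                   # i deletions
    for j in range(n + 1):
        d[0][j] = j                                   # j insertions
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1   # 0 if the letters match
            d[i][j] = min(d[i - 1][j] + 1,            # deletion
                          d[i][j - 1] + 1,            # insertion
                          d[i - 1][j - 1] + cost)     # match or substitution
    return d[m][n]

print(edit_distance("GARAMOND", "GAUTAMI"))  # 5
```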
30
Similarity in textual or categorical data
Hamming distance. Let’s assume that we want to compare “GOR234” with “GHT335”. The strings differ in 4 positions, so the distance is 4.
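A minimal sketch of Hamming distance for equal-length strings, reproducing the example above.

```python
def hamming(a: str, b: str) -> int:
    if len(a) != len(b):
        raise ValueError("Hamming distance is only defined for strings of equal length")
    # Count the positions at which the two strings differ
    return sum(ch_a != ch_b for ch_a, ch_b in zip(a, b))

print(hamming("GOR234", "GHT335"))  # 4
```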
31
Single linkage: minimum distance
Complete linkage: maximum distance
Average linkage: average distance
Centroid method: distance between centres
Ward’s method: minimization of within-cluster variance
32
Which linkage method to select?
Average linkage: tends to combine clusters with small within-cluster variance; tends to be biased towards establishing clusters with about the same variance.
Single linkage: often produces big chain-like clusters; does not require metric data.
Complete linkage: highly sensitive to outliers; does not require metric data.
Centroid: sometimes produces muddled results, but is less affected by outliers.
Median: appropriate for non-metric data.
Ward: tends to produce small clusters of about the same size.
33
(Diagram: clustering methods divide into (1) non-overlapping and (2) overlapping methods; non-overlapping methods split into hierarchical, which can be agglomerative or divisive, and non-hierarchical.)
34
Overview of clustering methods (in SPSS)
Non-overlapping (exclusive) methods:
Hierarchical, agglomerative: linkage methods (average: between-groups (1), within-groups (2), weighted; single: ordinary (3), density, two-stage density; complete (4)); centroid methods (centroid (5), median (6)); variance methods (Ward (7))
Hierarchical, divisive
Non-hierarchical / partitioning / k-means methods (8): sequential threshold, parallel threshold, neural networks, optimized partitioning
Overlapping methods: overlapping k-centroids, overlapping k-means, latent class techniques, fuzzy clustering, Q-type factor analysis (9)
Names in SPSS: (1) Between-groups linkage, (2) Within-groups linkage, (3) Nearest neighbour, (4) Furthest neighbour, (5) Centroid clustering, (6) Median clustering, (7) Ward’s method, (8) K-means cluster, (9) (Factor)
35
Cluster analysis More potential pitfalls & problems:
Do our data permit the use of means at all? Some methods (e.g. Ward’s) are biased towards producing clusters with approximately the same number of observations. Other methods (e.g. centroid) require metric-scaled input data; strictly speaking it is therefore not allowable to use such algorithms when clustering data measured on Likert or semantic-differential scales.
36
Hierarchical Clustering Example I
37
Steps
1. Let’s assume that each point is a cluster.
2. We find the closest pair of clusters and merge them.
3. We recalculate the distances (many alternatives here!).
4. We continue until all points belong to one cluster.
5. We draw the dendrogram and select where to cut (see the sketch below).
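A sketch of these steps with SciPy, using a small hypothetical two-dimensional data set: linkage() performs the successive merges, dendrogram() draws the tree, and fcluster() corresponds to cutting it at a chosen distance.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster

X = np.array([[1.0, 2.0], [1.2, 2.1], [5.0, 6.0],
              [5.2, 5.8], [9.0, 1.0], [8.8, 1.2]])

# Agglomerative merging with average linkage and Euclidean distance
Z = linkage(X, method="average", metric="euclidean")

dendrogram(Z, labels=[f"Obs {i + 1}" for i in range(len(X))])
plt.ylabel("Distance")
plt.show()

# Cut the dendrogram at distance 3.0 to obtain a cluster number per observation
labels = fcluster(Z, t=3.0, criterion="distance")
print(labels)
```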
38
Dendrogram
39
Clustering cities in Italy (a sequence of image-only slides working through the example step by step)
45
Hierarchical Clustering Example II
46
Cluster analysis: Small artificial example
(Plot of six observations, 1–6, with selected pairwise distances marked: 0.42, 0.58, 0.68 and 0.92.)
48
Dendrogram. Step 0: each observation (OBS 1 – OBS 6) is treated as a separate cluster. (Distance axis: 0.2, 0.4, 0.6, 0.8, 1.0.)
49
Dendrogram (Continued)
Step 1: the two observations with the smallest pairwise distance (OBS 1 and OBS 2) are joined into Cluster 1.
50
Dendrogram (Continued)
Step 2: the two other observations with the smallest distance amongst the remaining points/clusters (OBS 4 and OBS 5) are joined into Cluster 2.
51
Dendrogram (Continued)
Step 3: observation 3 joins Cluster 1.
52
Dendrogram (Continued)
Step 4: Clusters 1 and 2 from Step 3 are joined into a “supercluster”. A single observation (OBS 6) remains unclustered: an outlier.
53
Hierarchical Clustering Example III
54
Market research for exporting activities
Manager   Risk aversion (X1)   Propensity to internationalize (X2)
A         2                    6
B         3                    9
C         7                    1
D         8                    …
E         …                    …
55
Running hierarchical cluster analysis in SPSS
56
SPSS hierarchical cluster analysis dialogue boxes
Note the above deviations from the default settings. For instance, if squared Euclidean distance is used, the clustering will look different from the clustering displayed on the next slide.
57
Output from SPSS between-groups (average) linkage
58
Single linkage. (Scatter plot of managers A–E on Risk aversion (X1) versus Propensity to internationalize (X2), with Cluster 2 = {A, B} and Cluster 1 = {C, D, E}; marked distances: 2.236, 3.162, 4.000, 5.099, 6.083, 7.071.)
59
Complete linkage. (Same scatter plot; the decisive maximum distances 6.083 and 7.071 are marked.)
60
Average linkage (between groups). (Same scatter plot; marked distances: 4.000, 5.099, 6.083, 7.071.)
61
Average linkage (within groups). (Same scatter plot; marked distances: 2.236, 3.162, 4.000, 5.099, 6.083, 7.071.)
62
Ward’s method. (Same scatter plot; marked distances: 2.236, 3.162, 4.000, 5.099, 6.083, 7.071.)
63
Centroid method. (Same scatter plot; the distances between cluster centres, 5.522 and 5.500, are marked.)
64
(Dendrogram from SPSS showing clusters CL1 and CL2 and one “outlier”. Settings: Cluster Method = Between-Groups Linkage, Measure = Squared Euclidean Distance, Plots = Dendrogram flagged.)
65
Non-hierarchical Clustering
K-means clustering algorithm
66
Concept: At first we select the number of groups and randomly assign each point to a group. Next, we revisit the assignments, and the process is repeated until a stopping criterion is met. We need to determine the number of clusters a priori. A popular algorithm in this category (non-hierarchical and non-overlapping) is k-means. Synonyms: partitioning and nearest-centroid method.
67
K-means Algorithm
1. We select k cluster centres from the points of the data set.
2. We assign each point to the closest centre.
3. We recalculate the centres.
4. If the centres remain the same, we stop; otherwise, we continue from step 2.
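A from-scratch Python sketch of this loop, assuming Euclidean distance and seeds drawn at random from the data, just to make the four steps concrete; in practice a library implementation would be used. The example points are a handful of the industry observations from the example that follows.

```python
import math
import random

def kmeans(points, k, max_iter=100, seed=0):
    rng = random.Random(seed)
    centres = [tuple(p) for p in rng.sample(points, k)]      # 1. pick k centres from the data
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for p in points:                                      # 2. assign each point to the closest centre
            idx = min(range(k), key=lambda i: math.dist(p, centres[i]))
            clusters[idx].append(p)
        new_centres = [
            tuple(sum(c) / len(members) for c in zip(*members)) if members else centres[i]
            for i, members in enumerate(clusters)             # 3. recalculate the centres
        ]
        if new_centres == centres:                            # 4. stop when the centres no longer move
            break
        centres = new_centres
    return centres, clusters

data = [(1.19, 2.33), (1.25, 4.50), (5.45, 5.40), (5.07, 4.67), (1.47, 0.95)]
print(kmeans(data, k=2)[0])
```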
68
Example: A study has been made among export managers within twenty industries (pharmaceuticals, heavy machinery, banking etc.). Within each industry, interviews were carried out with 10 to 20 managers. Each manager was asked to comment on 2 statements.
69
Numeric example of k-means cluster analysis
Table: Managers’ responses to two statements. Observations on X1 and X2 are averages at the industry level; within each industry, between 10 and 20 interviews were carried out with managers.
X1 = Expected market development (0 = Very negative, 6 = Very positive)
X2 = Willingness to export (0 = None, 6 = Very high)

Industry   X1     X2
1          1.19   2.33
2          1.25   4.50
3          1.17   5.06
4          4.72   5.35
5          1.47   0.95
6          0.82   3.24
7          1.50   2.64
8          1.60   4.86
9          0.51   2.20
10         3.89   2.89
11         2.07   4.53
12         3.06   5.30
13         5.07   4.67
14         1.54   4.35
15         1.71   1.56
16         0.83   5.12
17         2.78   5.27
18         2.73   4.03
19         1.04   4.02
20         5.45   5.40
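A sketch of running k-means on these 20 observations with scikit-learn. The slides iterate from specific starting seeds, while scikit-learn chooses its own, so the resulting partition and centroids can differ slightly from those shown on the following slides.

```python
import numpy as np
from sklearn.cluster import KMeans

X = np.array([
    [1.19, 2.33], [1.25, 4.50], [1.17, 5.06], [4.72, 5.35], [1.47, 0.95],
    [0.82, 3.24], [1.50, 2.64], [1.60, 4.86], [0.51, 2.20], [3.89, 2.89],
    [2.07, 4.53], [3.06, 5.30], [5.07, 4.67], [1.54, 4.35], [1.71, 1.56],
    [0.83, 5.12], [2.78, 5.27], [2.73, 4.03], [1.04, 4.02], [5.45, 5.40],
])  # columns: X1 (expected market development), X2 (willingness to export)

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)           # cluster number for each of the 20 industries
print(km.cluster_centers_)  # final centroids
```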
70
(Scatter plot of the 20 industries: X1, expected market development, from Very negative to Very positive on the horizontal axis; X2, Willingness to export, from None to Very high on the vertical axis. Observation 1 is labelled with its coordinates (1.19, 2.33).)
71
Introduction of cluster Seeds (Start of iteration 1)
(Same scatter plot with the two initial cluster centres (seeds) marked.)
72
Measurement of distances between observations and seeds (average linkage). (Plot: distances from selected observations to the seeds CL1 and CL2 are marked, e.g. 1.41, 2.96, 3.10 and 5.25.)
73
Allocation of observations to closest cluster seed
(Plot: each observation is allocated to the closest cluster seed, CL1 or CL2.)
74
Recalculation of seeds (cluster centroids) for Iteration 2. After the first allocation, CL2 contains observations 4, 10, 11, 12, 13, 17, 18 and 20:

Obs.       X1      X2
4          4.72    5.35
10         3.89    2.89
11         2.07    4.53
12         3.06    5.30
13         5.07    4.67
17         2.78    5.27
18         2.73    4.03
20         5.45    5.40
Sum        29.77   37.44
Centroid   3.72    4.68

For example, the CL2 centroid’s X1 coordinate is 29.77 / 8 = 3.72. The CL1 centroid is recalculated in the same way as (1.22, 3.40).
75
Change of seeds between iterations may cause observations to change from one cluster to another
(Plot: the moving centroids CL1 and CL2, with distances 1.50, 1.66, 3.49 and 3.63 marked around observation 11, illustrating how its nearest centroid can switch between iterations.)
76
(Plot: final positions of the centroids CL1 and CL2.)
Change in centroid coordinates:

            CL1             CL2
Iteration   X1      X2      X1      X2
1           1.47    0.95    5.45    5.40
2           1.22    3.40    3.72    4.68
3           1.28    3.49    3.96    4.70
4           …       …       …       …
77
Running k-means cluster analysis in SPSS
78
SPSS k-means cluster analysis dialogue boxes
79
Determining the best number of clusters: ESS (Error Sum of Squares)
ESS and the number of clusters. The ESS is the sum, over all observations, of the squared distance from each observation to the centroid of its cluster. For 2 clusters the ESS is 26.42. The reduction in ESS is big between 2 and 3 clusters, so 3 clusters probably comprises the best solution (see the sketch below).
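A sketch of this ESS reasoning with scikit-learn: the fitted model’s inertia_ is the within-cluster error sum of squares, so computing it for k = 1 to 5 and looking for the “elbow” reproduces the argument above (exact figures depend on the starting seeds, so they need not equal the 26.42 quoted here).

```python
import numpy as np
from sklearn.cluster import KMeans

# The 20 industry observations (X1, X2) from the earlier table
X = np.array([
    [1.19, 2.33], [1.25, 4.50], [1.17, 5.06], [4.72, 5.35], [1.47, 0.95],
    [0.82, 3.24], [1.50, 2.64], [1.60, 4.86], [0.51, 2.20], [3.89, 2.89],
    [2.07, 4.53], [3.06, 5.30], [5.07, 4.67], [1.54, 4.35], [1.71, 1.56],
    [0.83, 5.12], [2.78, 5.27], [2.73, 4.03], [1.04, 4.02], [5.45, 5.40],
])

for k in range(1, 6):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    print(k, round(km.inertia_, 2))  # ESS: squared distances to the closest centroid
# Pick the k after which the drop in ESS flattens out (the "elbow").
```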
80
Two clusters. (Scatter plot: the 20 industries partitioned into CL1 and CL2.)
81
Three clusters. (Scatter plot: the 20 industries partitioned into CL1, CL2 and CL3.)
82
Four clusters. (Scatter plot: the 20 industries partitioned into CL1–CL4.)
83
The principle of replicated clustering
Clustering of the numeric example using the SPSS default seed and 1 iteration; then a new clustering of the data, this time using observations 10 and 16 as initial seeds (1 iteration), see Table 12.10, column V; then a listing of each observation’s cluster assignment in both cluster runs. Panel 4: cross-tab of the clusterings, Replication 1 versus 2 and Replication 1 versus 3. Notice the consistency between the first and the third clustering: the cluster membership numbers are reversed, but the cluster-number variable is nominal scaled, so the cluster number is arbitrary across runs (a sketch of this check follows below).
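A sketch of the replication idea: cluster the same data twice with different starting seeds and cross-tabulate the two membership variables; the data below are simulated stand-ins, not the numeric example. A consistent solution shows up as a cross-tab in which each cluster from one run maps almost entirely onto a single cluster from the other run, even if the cluster numbers are swapped.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, size=(50, 2)),   # stand-in data: two well-separated blobs
               rng.normal(5, 1, size=(50, 2))])

run1 = KMeans(n_clusters=2, n_init=1, random_state=1).fit_predict(X)
run2 = KMeans(n_clusters=2, n_init=1, random_state=2).fit_predict(X)

# Cross-tab of the two cluster-membership variables (Panel 4 style)
print(pd.crosstab(pd.Series(run1, name="replication 1"),
                  pd.Series(run2, name="replication 2")))
```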
84
Non-hierarchical method: Recap
It is an iterative procedure, so, unlike the hierarchical methods, there is no guarantee that the optimal solution is found (however, it usually comes quite close).
The analyst must subjectively select the “best” number of clusters (some clues may help, though).
The procedure is fast.
Applying (running) it is straightforward.
It is rather easy to understand (interpret).
However, in applied settings, differences between clusters may be small, thereby disappointing the analyst.
85
Non-hierarchical cluster analysis
How does one determine the optimal number of clusters? Unfortunately, no formal help (i.e. no built-in option) is presently available in SPSS for determining the best number of clusters. So: try 2, 3 and 4 clusters and choose the one that “feels” best. It is recommended to use the ESS procedure covered above in combination with sound reasoning.
86
Textbooks in Cluster Analysis
Brian S. Everitt, Cluster Analysis
Maurice Lorr, Cluster Analysis for Social Scientists, 1983
Charles Romesburg, Cluster Analysis for Researchers, 1984
Aldenderfer and Blashfield, Cluster Analysis, 1984