CLUSTERING (Segmentation)

Name: CLUSTERING (Segmentation)
Uploaded: 2017-10-08T16:02:53+00:00
Duration: PT9M15S
Channel: Vincent Peters
Description: CLUSTERING (Segmentation)

CLUSTERING (Segmentation)
Saed Sayad

Data Mining Steps
1. Problem Definition
2. Data Preparation
3. Data Exploration
4. Modeling
5. Evaluation
6. Deployment
www.ismartsoft.com

What is Clustering?
Given a set of records, organize the records into clusters. A cluster is a subset of records which are similar.
(Figure: records plotted by Age and Income.)

Clustering Requirements
The ability to discover some or all of the hidden clusters.
Within-cluster similarity and between-cluster dissimilarity.
The ability to deal with various types of attributes.
The ability to deal with noise and outliers.
The ability to handle high dimensionality.
Scalability, interpretability and usability.

Similarity - Distance Measure
To measure similarity or dissimilarity between objects, we need a distance measure. The usual axioms for a distance measure D are:
D(x, x) = 0
D(x, y) = D(y, x)
D(x, y) ≤ D(x, z) + D(z, y) (the triangle inequality)

Similarity - Distance Measure
Euclidean Manhattan Minkowski
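The formulas on this slide did not come through in the transcript. As a minimal sketch using the standard textbook definitions (the function names are illustrative, not from the slides), Manhattan and Euclidean distance can be written as the p = 1 and p = 2 cases of the Minkowski distance:

```python
def minkowski(x, y, p):
    """Minkowski distance of order p between two equal-length numeric sequences."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def manhattan(x, y):
    """Manhattan (city-block) distance: Minkowski with p = 1."""
    return minkowski(x, y, 1)


def euclidean(x, y):
    """Euclidean (straight-line) distance: Minkowski with p = 2."""
    return minkowski(x, y, 2)


# Hypothetical two-dimensional records, chosen only for illustration.
a = (0.0, 0.0)
b = (3.0, 4.0)
print(manhattan(a, b))  # 7.0
print(euclidean(a, b))  # 5.0
```

Larger values of p give more weight to the attribute with the largest difference.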

Similarity - Correlation
(Figure: plots of Credit$ against Age; one panel is labeled "Dissimilar".)
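The slide names correlation as a similarity measure without giving a formula in the transcript. One common choice, assumed here, is the Pearson correlation coefficient; the sketch below uses made-up Age and Credit$ values purely for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical data: credit rising with age, and credit falling with age.
age = [25.0, 35.0, 45.0, 55.0]
credit_rising = [100.0, 200.0, 300.0, 400.0]
credit_falling = [400.0, 300.0, 200.0, 100.0]

print(pearson(age, credit_rising))   # close to +1: the profiles move together
print(pearson(age, credit_falling))  # close to -1: the profiles move in opposite directions
```

Unlike the distance measures, correlation compares the shape of two profiles rather than their magnitudes, so two records can be far apart in Euclidean terms and still be highly correlated.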

Similarity – Hamming Distance
(Figure: Gene 1 = A T C G compared position by position with Gene 2; Hamming distance = 1.)
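The Hamming distance counts the positions at which two equal-length sequences differ. In the sketch below, the second sequence is hypothetical (the transcript does not preserve Gene 2) and was chosen so that exactly one position differs.

```python
def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(1 for x, y in zip(a, b) if x != y)


gene_1 = "ATCG"
gene_2 = "ATCC"  # hypothetical sequence, not taken from the slide
print(hamming(gene_1, gene_2))  # 1
```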

Clustering Methods
Exclusive vs. Overlapping
Hierarchical vs. Partitive
Deterministic vs. Probabilistic
Incremental vs. Batch learning

Exclusive vs. Overlapping
(Figure: two Age/Income scatter plots contrasting exclusive and overlapping clusters.)

Hierarchical vs. Partitive
(Figure: Age/Income scatter plot.)

Hierarchical Clustering
Hierarchical clustering involves creating clusters that have a predetermined ordering from top to bottom. For example, all files and folders on the hard disk are organized in a hierarchy. There are two types of hierarchical clustering: agglomerative and divisive.

Hierarchical Clustering
(Figure: agglomerative and divisive clustering.)

Hierarchical Clustering - Agglomerative
1. Assign each observation to its own cluster.
2. Compute the similarity (e.g., distance) between each of the clusters.
3. Join the two most similar clusters.
4. Repeat steps 2 and 3 until there is only a single cluster left.
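The agglomerative procedure can be sketched in a few lines of plain Python. This is an illustrative, unoptimized version: it assumes Euclidean distance and measures cluster-to-cluster distance by the closest pair of points (single linkage); the function names and sample points are hypothetical.

```python
import math


def cluster_distance(c1, c2):
    """Distance between two clusters as the smallest point-to-point distance."""
    return min(math.dist(p, q) for p in c1 for q in c2)


def agglomerative(points):
    """Merge clusters bottom-up and return the list of merges in order."""
    clusters = [[p] for p in points]  # each observation starts in its own cluster
    merges = []
    while len(clusters) > 1:
        # Compute the distance between every pair of clusters, keeping the closest pair.
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = cluster_distance(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        # Join the two most similar clusters and record the merge.
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges


# Hypothetical points forming two tight pairs.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
for left, right, d in agglomerative(points):
    print(left, "+", right, "at distance", round(d, 3))
```

The recorded merge order and distances are the information a dendrogram displays; cutting the sequence before the last merges yields a chosen number of clusters.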

Hierarchical Clustering - Divisive
1. Assign all of the observations to a single cluster.
2. Partition the cluster into the two least similar clusters.
3. Proceed recursively on each cluster until there is one cluster for each observation.

Hierarchical Clustering – Single Linkage

Hierarchical Clustering – Complete Linkage

Hierarchical Clustering – Average Linkage
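The diagrams for the three linkage slides are not part of the transcript. As a sketch based on the standard definitions (assuming Euclidean distance between points; function names are illustrative), the linkage criteria differ only in how they summarize the pairwise distances between two clusters:

```python
import math


def pairwise_distances(c1, c2):
    """All point-to-point Euclidean distances between two clusters."""
    return [math.dist(p, q) for p in c1 for q in c2]


def single_linkage(c1, c2):
    """Distance between the closest pair of points."""
    return min(pairwise_distances(c1, c2))


def complete_linkage(c1, c2):
    """Distance between the farthest pair of points."""
    return max(pairwise_distances(c1, c2))


def average_linkage(c1, c2):
    """Mean of all point-to-point distances."""
    d = pairwise_distances(c1, c2)
    return sum(d) / len(d)


# Hypothetical clusters used only to compare the three criteria.
left = [(0.0, 0.0), (0.0, 1.0)]
right = [(3.0, 0.0), (3.0, 1.0)]
print(single_linkage(left, right))
print(complete_linkage(left, right))
print(average_linkage(left, right))
```

For any pair of clusters, the single-linkage value is the smallest of the three and the complete-linkage value the largest, with the average in between.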

K Means Clustering
Clusters the data into k groups where k is predefined.
1. Select k points at random as cluster centers.
2. Assign observations to their closest cluster center according to the Euclidean distance function.
3. Calculate the centroid or mean of all instances in each cluster (this is the "mean" part).
4. Repeat steps 2 and 3 until the same points are assigned to each cluster in consecutive rounds.
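A minimal sketch of the k-means loop in plain Python follows. The `seed` and `max_rounds` parameters, the function name, and the sample points are assumptions added for illustration; a cluster that loses all of its members simply keeps its previous center.

```python
import math
import random


def k_means(points, k, seed=0, max_rounds=100):
    """Return (centers, assignment) for a list of numeric tuples."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # pick k observations at random as initial centers
    assignment = None
    for _ in range(max_rounds):
        # Assign each observation to its closest center (Euclidean distance).
        new_assignment = [
            min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points
        ]
        if new_assignment == assignment:
            break  # same assignments in consecutive rounds: converged
        assignment = new_assignment
        # Recompute each center as the mean of the observations assigned to it.
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centers, assignment


# Hypothetical data: two well-separated groups of three points.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (8.0, 9.0), (9.0, 8.0)]
centers, assignment = k_means(points, k=2)
print(centers)
print(assignment)
```

Because the starting centers are random, different seeds can produce different cluster labels and, on less clearly separated data, different clusterings; running the procedure several times and keeping the best result is a common precaution.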

K Means Clustering
(Figure: scatter plot with axes Income and Age.)

K Means Clustering - Sum of Squares function
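The formula on this slide is not preserved in the transcript. A common form of the k-means objective, assumed here, is the sum over all observations of the squared Euclidean distance to the center of the assigned cluster. The sketch below computes it for hypothetical points, centers, and assignments.

```python
def sum_of_squares(points, centers, assignment):
    """Total squared Euclidean distance from each point to its assigned center."""
    return sum(
        sum((a - b) ** 2 for a, b in zip(p, centers[j]))
        for p, j in zip(points, assignment)
    )


# Hypothetical clustering: two clusters with two points each.
points = [(1.0, 1.0), (3.0, 1.0), (10.0, 5.0), (10.0, 7.0)]
centers = [(2.0, 1.0), (10.0, 6.0)]
assignment = [0, 0, 1, 1]
print(sum_of_squares(points, centers, assignment))  # 4.0
```

Lower values indicate tighter clusters, but the value always decreases as k grows, so it is mainly useful for comparing solutions with the same k.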

Clustering Evaluation
Sarle’s Cubic Clustering Criterion
The Pseudo-F Statistic
The Pseudo-T² Statistic
Beale’s F-Type Statistic
Target-based

Clustering Evaluation
Target Variable
Categorical: Chi2 Test, K-S Test
Numerical: ANOVA, H Test

Chi2 Test
               Actual Y   Actual N
Predicted Y    n11        n12
Predicted N    n21        n22
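For a 2x2 table such as this one, the Pearson chi-square statistic compares each observed count with the count expected if rows and columns were independent. The sketch below assumes the standard statistic without continuity correction; the counts in the example are made up.

```python
def chi_square_2x2(n11, n12, n21, n22):
    """Pearson chi-square statistic for a 2x2 contingency table."""
    observed = [[n11, n12], [n21, n22]]
    row_totals = [n11 + n12, n21 + n22]
    col_totals = [n11 + n21, n12 + n22]
    total = sum(row_totals)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed[i][j] - expected) ** 2 / expected
    return chi2


# Hypothetical counts, chosen only for illustration.
print(chi_square_2x2(30, 10, 10, 30))  # 20.0
```

A 2x2 table has one degree of freedom, so a statistic above about 3.84 is significant at the 5% level.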

Analysis of Variance (ANOVA)
Source of Variation   Sum of Squares   Degrees of Freedom   Mean Square      F             P
Between Groups        SSB              dfB                  MSB = SSB/dfB    F = MSB/MSW   P(F)
Within Groups         SSW              dfW                  MSW = SSW/dfW
Total                 SST              dfT
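The quantities in the ANOVA table can be computed directly from the values of a numerical target grouped by cluster. This is a sketch of a one-way ANOVA under that assumption; the function name and sample values are hypothetical, and the P value is left out because it requires the F distribution, which the Python standard library does not provide.

```python
def one_way_anova(groups):
    """Return the one-way ANOVA table entries for a list of numeric groups."""
    all_values = [v for g in groups for v in g]
    n = len(all_values)
    grand_mean = sum(all_values) / n
    means = [sum(g) / len(g) for g in groups]
    # Between-groups and within-groups sums of squares.
    ssb = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ssw = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_b = len(groups) - 1
    df_w = n - len(groups)
    msb = ssb / df_b
    msw = ssw / df_w
    return {
        "SSB": ssb, "SSW": ssw, "SST": ssb + ssw,
        "dfB": df_b, "dfW": df_w, "dfT": n - 1,
        "MSB": msb, "MSW": msw, "F": msb / msw,
    }


# Hypothetical target values for three clusters.
table = one_way_anova([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]])
print(table["F"])  # 27.0
```

A large F means the cluster means differ by much more than the spread inside the clusters, which suggests the clustering separates the target variable well.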

Clustering - Applications
Marketing: finding groups of customers with similar behavior.
Insurance & Banking: identifying fraud.
Biology: classification of plants and animals given their features.
Libraries: book ordering.
City-planning: identifying groups of houses according to their house type, value and geographical location.
World Wide Web: document classification; clustering weblog data to discover groups with similar access patterns.

Summary
Clustering is the process of organizing objects (records or variables) into groups whose members are similar in some way.
Hierarchical and K-Means are the two most widely used clustering techniques.
The effectiveness of the clustering method depends on the similarity function.
The result of the clustering algorithm can be interpreted and evaluated in different ways.

Questions?
