Published by Raymond Daniel. Modified over 5 years ago.
1
Assignment-2 Consider a suitable dataset. To cluster the data instances into different groups, apply at least two different clustering techniques. Visualize the clusters using a suitable tool.
2
Clustering Clustering is an unsupervised machine learning task that automatically divides the data into clusters, or groupings of similar items. It does this without having been told what the groups should look like ahead of time. As we may not even know what we're looking for, clustering is used for knowledge discovery rather than prediction. It provides an insight into the natural groupings found within data.
3
Applications
Segmenting customers into groups with similar demographics or buying patterns for targeted marketing campaigns and/or detailed analysis of purchasing behavior by subgroup.
Detecting anomalous behavior, such as unauthorized intrusions into computer networks, by identifying patterns of use falling outside known clusters.
Simplifying extremely large datasets by grouping a large number of features with similar values into a much smaller number of homogeneous categories.
4
Example:
5
Clustering
6
K-means clustering K-means clustering is an unsupervised learning algorithm that tries to cluster data based on their similarity. Unsupervised learning means that there is no outcome to be predicted, and the algorithm just tries to find patterns in the data. In k-means clustering, we have to specify the number of clusters we want the data to be grouped into. The algorithm randomly assigns each observation to a cluster and finds the centroid of each cluster. Then the algorithm iterates through two steps until the assignments stop changing:
1. Reassign each data point to the cluster whose centroid is closest.
2. Calculate the new centroid of each cluster.
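The two steps above can be sketched in a few lines of R. This is only an illustration under stated assumptions, not how the built-in kmeans() is implemented: simple_kmeans and its iter_max argument are hypothetical names, distances are squared Euclidean, and a cluster that loses all its points simply keeps its previous centroid.

```r
# Minimal k-means sketch; simple_kmeans is a hypothetical helper, not part of base R.
simple_kmeans <- function(x, k, iter_max = 100) {
  x <- as.matrix(x)
  # Randomly assign each observation to one of k clusters.
  # Shuffling rep_len() guarantees every cluster starts non-empty when nrow(x) >= k.
  cluster <- sample(rep_len(seq_len(k), nrow(x)))
  centers <- matrix(NA_real_, nrow = k, ncol = ncol(x))
  for (iter in seq_len(iter_max)) {
    # Calculate the centroid (column means) of each cluster.
    for (j in seq_len(k)) {
      members <- cluster == j
      # Assumption: an emptied cluster keeps its previous centroid.
      if (any(members)) centers[j, ] <- colMeans(x[members, , drop = FALSE])
    }
    # Squared Euclidean distance from every point to every centroid (n x k matrix).
    d <- sapply(seq_len(k), function(j) rowSums(sweep(x, 2, centers[j, ])^2))
    # Reassign each point to the cluster whose centroid is closest.
    new_cluster <- max.col(-d, ties.method = "first")
    # Stop once no assignment changes.
    if (all(new_cluster == cluster)) break
    cluster <- new_cluster
  }
  list(cluster = cluster, centers = centers)
}

# Usage on the petal measurements of the built-in iris data
set.seed(1)
fit <- simple_kmeans(iris[, 3:4], k = 3)
table(fit$cluster)
```

Because the starting assignment is random, different seeds can give different final clusters, which is why running the algorithm from several random starts is usually recommended.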
7
kmeans()
kmeans(x, centers, iter.max = 10, nstart = 1)
Where,
x: numeric matrix of data, or an object that can be coerced to such a matrix (such as a numeric vector or a data frame with all numeric columns).
centers: either the number of clusters, say k, or a set of initial (distinct) cluster centres. If a number, a random set of (distinct) rows in x is chosen as the initial centres.
iter.max: the maximum number of iterations allowed.
nstart: if centers is a number, how many random sets should be chosen?
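As a rough illustration of these arguments, the sketch below calls kmeans() once with centers given as a number and once with centers given as a matrix of initial centres. The choice of the iris petal columns and of rows 1, 51 and 101 as starting centres is an assumption made only for this example.

```r
# Numeric matrix of data: petal length and width from the built-in iris data
x <- as.matrix(iris[, 3:4])

# centers as a number: 3 clusters, best result out of 5 random starts
set.seed(1)
fit_k <- kmeans(x, centers = 3, iter.max = 10, nstart = 5)

# centers as a set of initial centres: three distinct rows of x
# (rows 1, 51 and 101 are an arbitrary choice for illustration)
init <- x[c(1, 51, 101), ]
fit_init <- kmeans(x, centers = init, iter.max = 10)

# Components of the returned object
fit_k$size          # number of points in each cluster
fit_k$centers       # final cluster centres
fit_k$tot.withinss  # total within-cluster sum of squares
fit_init$iter       # number of iterations actually used
```

When centers is a matrix, no random choice is involved, so nstart is left at its default.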
8
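Example:
library(ggplot2)
# Plot petal length against petal width, coloured by the true species
ggplot(iris, aes(Petal.Length, Petal.Width, color = Species)) + geom_point()
# Fix the seed so the random starting centres are reproducible
set.seed(20)
# Cluster on petal length and width (columns 3 and 4) into 3 groups, using 20 random starts
irisCluster <- kmeans(iris[, 3:4], 3, nstart = 20)
irisCluster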
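One possible way to inspect the result, sketched here as an assumption about what the following slides show, is to cross-tabulate the cluster labels against the known species and then plot the points coloured by cluster. The names cluster_vs_species and iris_plot are introduced only for this illustration, and the plot is skipped if ggplot2 is not installed.

```r
# Re-run the clustering from the example so this block is self-contained
set.seed(20)
irisCluster <- kmeans(iris[, 3:4], 3, nstart = 20)

# Cross-tabulate cluster labels against the true species
cluster_vs_species <- table(irisCluster$cluster, iris$Species)
print(cluster_vs_species)

# Plot the same points, now coloured by assigned cluster (needs ggplot2)
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  iris_plot <- iris
  iris_plot$cluster <- factor(irisCluster$cluster)
  p <- ggplot(iris_plot, aes(Petal.Length, Petal.Width, color = cluster)) +
    geom_point()
  print(p)
}
```

The cluster numbers themselves are arbitrary labels, so the table should be read by which species dominates each row rather than by the row number.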
9
Example:
10
Example:
11
Example-2
Mall_Customers.csv
# Clients subscribe to a membership card
# The mall maintains their purchase history
# Score depends on income, the number of times per week they show up in the mall, and total expense in the same mall
# You are to segment clients into different groups based on income and score
# This is a clustering problem
12
Steps
# We have no idea what to look for
# We don't know the optimal number of clusters
# Use the elbow method
# Apply k-means to the mall data
See the output of the same.
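The elbow method runs k-means for a range of k values, records the total within-cluster sum of squares for each, and picks the k where the curve stops dropping sharply. The sketch below shows the idea on made-up data: the customers data frame, its income and score columns, and the three simulated groups are assumptions standing in for Mall_Customers.csv, whose actual column names are not given here.

```r
# Synthetic stand-in for the mall data (NOT the real Mall_Customers.csv):
# three made-up groups of customers with an income and a score column
set.seed(42)
customers <- data.frame(
  income = c(rnorm(50, 30, 5), rnorm(50, 60, 5), rnorm(50, 90, 5)),
  score  = c(rnorm(50, 20, 5), rnorm(50, 50, 5), rnorm(50, 80, 5))
)

# Total within-cluster sum of squares for k = 1..10
k_values <- 1:10
wss <- sapply(k_values, function(k) {
  kmeans(customers, centers = k, nstart = 10)$tot.withinss
})

# Elbow plot: look for the k after which the curve flattens
plot(k_values, wss, type = "b",
     xlab = "Number of clusters k",
     ylab = "Total within-cluster sum of squares")

# Apply k-means with the k read off the plot (3 for this synthetic data)
fit <- kmeans(customers, centers = 3, nstart = 10)
table(fit$cluster)
```

With the real file, the same steps would apply after reading it with read.csv() and selecting the income and score columns.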
14