CLUSTERING EE 7000-1 Class Presentation. TOPICS  Clustering basic and types  K-means, a type of Unsupervised clustering  Supervised clustering type.


1 CLUSTERING EE 7000-1 Class Presentation

2 TOPICS
- Clustering basics and types
- K-means, a type of unsupervised clustering
- Supervised clustering types: vector quantization, fuzzy identification, artificial neural networks, fuzzy-neuro systems

3 What is clustering?
- A technique for extracting more information from data
- Clustering groups data points together according to some measure of similarity
- A large data set is divided into clusters, smaller sets of similar data

4 The usage of clustering
- Engineering disciplines such as pattern recognition and artificial intelligence use the concepts of cluster analysis.
- In the life sciences (biology, botany, zoology, entomology, cytology, microbiology), the objects of analysis are life forms such as plants, animals, and insects. Cluster analysis there ranges from developing complete taxonomies to classifying species into subspecies, which can in turn be subdivided further.
- Cluster analysis is also widely used in the information, policy, and decision sciences. Its applications to documents include votes on political issues, market surveys, product surveys, surveys of sales programs, and R&D.

5 A Clustering Example
[Figure: data grouped into four clusters (Cluster 1 to Cluster 4). Profiles shown: "Income: High, Children: 1, Car: Luxury"; "Income: Low, Children: 0, Car: Compact"; "Car: Sedan and Children: 3"; "Income: Medium, Children: 2, Car: Truck".]

6 Clustering in FDI
- Used mainly to cluster, and thereby identify, data as faulty or non-faulty
- Can also distinguish between different fault conditions
- Data from the system is first processed (creating residuals, Fourier transform, ...)
- A clustering algorithm then identifies the different conditions of the data

7 Properties of clustering
- Hierarchical: proceeds in multiple steps, fusing data until the desired number of clusters is reached
- Flat: all clusters are at the same level
- Non-hierarchical or iterative: assume a number of clusters, then assign instances to them
- Hard: each instance belongs to exactly one cluster
- Soft: each instance is assigned a probability of belonging to each cluster
- Disjunctive: an instance can be part of more than one cluster

8 Properties of Clustering
[Figure: four panels showing instances a to k grouped in different ways.
(a) Hard, non-hierarchical
(b) Non-hierarchical, disjunctive
(c) Soft, non-hierarchical, disjunctive: table of membership probabilities in clusters 1, 2, 3 (a: 0.4, 0.1, 0.5; b: 0.1, 0.8, 0.1; ...)
(d) Hierarchical, hard, non-disjunctive]
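The difference between hard and soft assignment can be illustrated with a small sketch. The slides do not say how the membership probabilities are computed, so the weighting used here (a softmax over negative squared Euclidean distances, with a hypothetical sharpness parameter `beta`) is only an assumption made for illustration, as are the function names.

```python
import numpy as np

def distances(points, centers):
    """Euclidean distance from each point to each center, shape (n, k)."""
    return np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)

def hard_assign(points, centers):
    """Hard clustering: each instance gets the index of exactly one cluster."""
    return distances(points, centers).argmin(axis=1)

def soft_assign(points, centers, beta=1.0):
    """Soft clustering: each instance gets one probability per cluster.

    Assumption: probabilities are a softmax of -beta * distance**2.
    """
    logits = -beta * distances(points, centers) ** 2
    logits -= logits.max(axis=1, keepdims=True)  # avoid underflow in exp
    weights = np.exp(logits)
    return weights / weights.sum(axis=1, keepdims=True)

points = np.array([[0.0, 0.0], [4.0, 0.0], [2.0, 0.0]])
centers = np.array([[0.0, 0.0], [4.0, 0.0]])
print(hard_assign(points, centers))
print(soft_assign(points, centers).round(3))
```

Each row of the soft result sums to 1, like the rows of the table in panel (c). The third point lies midway between the two centers, so its soft membership is split evenly, while the hard assignment has to pick one of the two clusters.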

9 Types of clustering
- Supervised clustering: the task is to learn to assign instances to pre-defined classes (classification). Example: cluster the data given the classes blue, red, and yellow.
- Unsupervised clustering: the task is to learn a classification from the data itself, discovering natural groupings. Example: cluster the data given only the number of clusters, 3.

10 K-means algorithm (a type of unsupervised clustering)
- Specify k, the number of clusters
- Choose k points at random as the initial cluster centers
- Assign each instance to its closest cluster center, using Euclidean distance
- Calculate the mean (centroid) of each cluster and use it as the new cluster center
- Reassign all instances to the closest cluster center
- Iterate until the cluster centers no longer change

11 Select the k cluster centers randomly. Store the k cluster centers. Loop until the change in cluster means is less than the amount specified by the user.
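The K-means steps can be sketched in a few lines of NumPy. This is a minimal illustration rather than the presenter's implementation: the function name, the `tol`, `max_iter`, and `seed` parameters, and the choice to leave an empty cluster's center unchanged are all assumptions.

```python
import numpy as np

def nearest_center(points, centers):
    """Index of the closest center (Euclidean distance) for each point."""
    dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1)

def kmeans(points, k, tol=1e-4, max_iter=100, seed=0):
    """Cluster the rows of `points` into k groups; return (centers, labels)."""
    rng = np.random.default_rng(seed)
    # Choose k distinct data points at random as the initial cluster centers.
    chosen = rng.choice(len(points), size=k, replace=False)
    centers = points[chosen].astype(float)
    for _ in range(max_iter):
        labels = nearest_center(points, centers)
        # New center = mean of the points assigned to it.
        # Assumption: a cluster that received no points keeps its old center.
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        # Largest distance moved by any center in this iteration.
        shift = np.linalg.norm(new_centers - centers, axis=1).max()
        centers = new_centers
        if shift < tol:  # user-specified threshold on the change in means
            break
    return centers, nearest_center(points, centers)

data = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                 [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
centers, labels = kmeans(data, k=2)
print(centers)
print(labels)
```

On this toy data set the two well-separated groups of three points end up in separate clusters. Which group receives label 0 depends on the random initial choice, and on less cleanly separated data the result itself can depend on the initial centers.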

12 Initial K cluster centers, calculation of centers in first iteration

13 Changed cluster centers after first iteration

14 Change in clusters during second iteration

15 Final positions of cluster centers

16 Supervised Clustering

17 Vector Quantization
- Originated from Shannon's coding theory
- Instead of using continuous levels, quantize the codes
- The quantized levels are called codewords; the collection of codewords is the codebook
- For transmission, approximate each code by its nearest codeword (Euclidean distance)
- Divide the space containing the codewords by the perpendicular bisectors of the lines joining pairs of codewords
- The region surrounding a codeword is called its Voronoi region
- In essence, a mapping of k-dimensional vectors in the vector space R^k onto a finite set of vectors
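The nearest-codeword rule can be sketched as follows. The codebook values and the function name `quantize` are made up for illustration; how the codebook is obtained is not covered in this sketch.

```python
import numpy as np

def quantize(vectors, codebook):
    """Map each vector to its nearest codeword (Euclidean distance).

    Returns the codeword indices and the approximated vectors.
    """
    dists = np.linalg.norm(vectors[:, None, :] - codebook[None, :, :], axis=2)
    indices = dists.argmin(axis=1)
    return indices, codebook[indices]

# Hypothetical codebook of three 2-dimensional codewords.
codebook = np.array([[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
vectors = np.array([[0.1, 0.2], [0.9, 0.8], [0.2, 0.9]])

indices, approx = quantize(vectors, codebook)
print(indices)
print(approx)
```

Only the index of the codeword needs to be transmitted; the receiver looks it up in the same codebook. The index also identifies the Voronoi region that the input vector falls into.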

18 Voronoi region formation illustration
