CLUSTERING EE 7000-1 Class Presentation. TOPICS: Clustering basics and types; K-means, a type of unsupervised clustering; Supervised clustering types.

Presentation transcript:

CLUSTERING EE Class Presentation

TOPICS
- Clustering basics and types
- K-means, a type of unsupervised clustering
- Supervised clustering types:
  - Vector quantization
  - Fuzzy identification
  - Artificial neural net
  - Fuzzy-neuro system

What is clustering?
- A technique that helps to extract more out of data.
- Clustering involves grouping data points together according to some measure of similarity.
- Clustering is a method by which large sets of data are grouped into smaller sets (clusters) of similar data.

The usage of Clustering
- Engineering fields such as pattern recognition and artificial intelligence use the concepts of cluster analysis.
- In the life sciences (biology, botany, zoology, entomology, cytology, microbiology), the objects of analysis are life forms such as plants, animals, and insects. The cluster analysis may range from developing complete taxonomies to classifying species into subspecies, which can in turn be divided further.
- Cluster analysis is also widely used in the information, policy and decision sciences. Applications to documents include votes on political issues, market surveys, product surveys, surveys of sales programs, and R&D.

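A Clustering Example
[Figure: four clusters, labeled Cluster 1 to Cluster 4, each described by a set of attribute values. The assignment of labels to groups is not recoverable from the transcript.]
- Income: High, Children: 1, Car: Luxury
- Income: Low, Children: 0, Car: Compact
- Car: Sedan and Children: 3
- Income: Medium, Children: 2, Car: Truck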

Clustering in FDI
- Basically used to cluster, and thereby identify, data as faulty or non-faulty.
- Also used to distinguish different fault conditions.
- Data from the system is first processed (creating residuals, Fourier transform, ...).
- A clustering algorithm then identifies the different conditions of the data.

Properties of clustering
- Hierarchical: proceeds in multiple steps, fusing data until the desired number of clusters is reached.
- Flat: all clusters are at the same level.
- Non-hierarchical or iterative: assumes a number of clusters and assigns instances to them.
- Hard: each instance belongs to exactly one cluster.
- Soft: each instance is assigned a probability of belonging to each cluster.
- Disjunctive: an instance can be part of more than one cluster.
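The difference between hard and soft assignment can be illustrated with a small sketch. The slides do not say how soft memberships are computed; the exponential weighting of squared distance used here, and the names `hard_assign`, `soft_assign` and `stiffness`, are assumptions made only for illustration.

```python
import math

def hard_assign(point, centers):
    """Hard clustering: return the index of the single closest center."""
    dists = [math.dist(point, c) for c in centers]
    return dists.index(min(dists))

def soft_assign(point, centers, stiffness=1.0):
    """Soft clustering: return one membership weight per center (sums to 1).

    Assumed weighting: exp(-stiffness * squared distance), normalized.
    """
    sq = [math.dist(point, c) ** 2 for c in centers]
    base = min(sq)  # shift by the smallest value to avoid underflow to all zeros
    scores = [math.exp(-stiffness * (s - base)) for s in sq]
    total = sum(scores)
    return [s / total for s in scores]

centers = [(0.0, 0.0), (4.0, 0.0)]
point = (1.0, 0.0)
hard = hard_assign(point, centers)   # one cluster index
soft = soft_assign(point, centers)   # one weight per cluster
```

With a hard assignment the point belongs to one cluster only; with a soft assignment it keeps a (possibly very small) degree of membership in every cluster.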

Properties of Clustering
[Figure: four panels showing instances a to k grouped in different ways: (a) hard, non-hierarchical; (b) non-hierarchical, disjunctive; (c) soft, non-hierarchical, disjunctive; (d) hierarchical, hard, non-disjunctive.]

Types of Clustering
- Supervised clustering (classification): the task is to learn to assign instances to predefined classes. Example: cluster the data given the classes blue, red and yellow.
- Unsupervised clustering: the task is to learn a classification from the data itself; it discovers natural groupings. Example: cluster the data given only the number of clusters, 3.

K-means algorithm (a type of unsupervised clustering)
- Specify k, the number of clusters.
- Choose k points at random as the initial cluster centers.
- Assign each instance to its closest cluster center, using Euclidean distance.
- Calculate the mean (centroid) of each cluster and use it as that cluster's new center.
- Reassign all instances to the closest cluster center.
- Iterate until the cluster centers no longer change.

Select the k cluster centers randomly. Store the k cluster centers. Loop until the change in cluster means is less than the amount specified by the user.
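The k-means loop described above can be sketched in plain Python as follows. This is a minimal illustration, not the presenter's implementation: the function name `kmeans`, the parameters `tol`, `max_iter` and `seed`, the sample data, and the choice to keep the old center when a cluster becomes empty are all assumptions.

```python
import math
import random

def kmeans(points, k, tol=1e-6, max_iter=100, seed=0):
    """Cluster a list of equal-length tuples into k groups."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # choose k data points as initial centers
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center (Euclidean).
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for i, members in enumerate(clusters):
            if members:
                mean = tuple(sum(axis) / len(members) for axis in zip(*members))
                new_centers.append(mean)
            else:
                new_centers.append(centers[i])  # assumption: keep an empty cluster's center
        # Stop once no center moved more than the user-specified tolerance.
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    labels = [min(range(k), key=lambda i: math.dist(p, centers[i])) for p in points]
    return centers, labels

# Hypothetical data: two well-separated groups of three points each.
data = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.3)]
centers, labels = kmeans(data, k=2)
```

Because the initial centers are chosen at random, different seeds can give different cluster numbering and, on less clearly separated data, different final clusters.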

Initial K cluster centers, calculation of centers in first iteration

Changed cluster centers after first iteration

Change in clusters during second iteration

Final positions of cluster centers

Supervised Clustering

Vector Quantization
- Originated from Shannon's coding theory.
- Instead of continuous levels, the codes are quantized.
- The quantized levels are called codewords; the collection of codewords is the codebook.
- For transmission, each code is approximated by its nearest codeword (Euclidean distance).
- The space containing the codewords is divided by the perpendicular bisectors of the lines joining pairs of codewords.
- The region of the space nearest to a codeword is called its Voronoi region.
- In essence, a mapping of k-dimensional vectors in the vector space R^k onto a finite set of vectors.
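The nearest-codeword rule described above can be sketched as a pair of encode and decode functions. The codebook and input vectors below are made-up values, and the names `encode` and `decode` are chosen for illustration; the slides do not specify how the codebook itself is designed.

```python
import math

def encode(vector, codebook):
    """Return the index of the codeword nearest to the vector (Euclidean distance)."""
    return min(range(len(codebook)), key=lambda i: math.dist(vector, codebook[i]))

def decode(index, codebook):
    """Return the codeword that stands in for every vector in its Voronoi region."""
    return codebook[index]

# Hypothetical 2-dimensional codebook with four codewords.
codebook = [(0.0, 0.0), (0.0, 4.0), (4.0, 0.0), (4.0, 4.0)]
signal = [(0.3, 0.2), (3.6, 0.5), (1.1, 3.7), (3.9, 4.2)]

indices = [encode(v, codebook) for v in signal]        # what would be transmitted
reconstructed = [decode(i, codebook) for i in indices]  # what the receiver recovers
```

Only the indices need to be transmitted; every vector that falls inside the same Voronoi region is reconstructed as the same codeword.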

Voronoi region formation illustration