Data clustering: Topics of Current Interest. Boris Mirkin (1,2): 1 National Research University Higher School of Economics, Moscow, RF; 2 Birkbeck University of London.

Presentation transcript:

Data clustering: Topics of Current Interest
Boris Mirkin (1,2)
1 National Research University Higher School of Economics, Moscow, RF
2 Birkbeck University of London, UK
Supported by:
- "Teacher-Student" grants from the Research Fund of NRU HSE, Moscow ( );
- International Lab for Decision Analysis and Choice, NRU HSE, Moscow (2008 – present);
- Laboratory of Algorithms and Technologies for Networks Analysis, NRU HSE, Nizhniy Novgorod, Russia (2010 – present).

Data clustering: Topics of Current Interest
1. K-Means clustering and two issues
   1.1 Finding the right number of clusters
       (a) before clustering (Anomalous clusters)
       (b) while clustering (divisive: no minima of the density function)
   1.2 Weighting features (3-step iterations)
2. K-Means at similarity clustering (kernel K-Means)
3. Semi-average similarity clustering
4. Consensus clustering
5. Spectral clustering, Threshold clustering and Modularity clustering
6. Laplacian pseudo-inverse transformation
7. Conclusion

Batch K-Means: a generic clustering method. Entities are presented as multidimensional points.
0. Put K hypothetical centroids (seeds), e.g. K = 3.
1. Assign points to the centroids according to the minimum-distance rule.
2. Put the centroids at the gravity centres of the clusters thus obtained.
3. Iterate 1 and 2 until convergence.
4. Output the final centroids and clusters.
[Figure: points (*) and K = 3 hypothetical centroids (@), updated over successive iterations.]
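A minimal sketch of this batch procedure in Python/NumPy. Assumed here, and not taken from the slides: data points sit in the rows of a matrix Y, seeds are K randomly chosen data points, and the distance is squared Euclidean; the function name batch_kmeans is illustrative.

```python
import numpy as np

def batch_kmeans(Y, K, max_iter=100, rng=None):
    """Batch K-Means: alternate the minimum-distance rule and the centroid update."""
    rng = np.random.default_rng(rng)
    n = Y.shape[0]
    # 0. Put K hypothetical centroids (seeds): here, K distinct random data points
    centroids = Y[rng.choice(n, size=K, replace=False)].copy()
    labels = None
    for _ in range(max_iter):
        # 1. Assign points to centroids by the minimum-distance rule
        d = ((Y[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        new_labels = d.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # 3. convergence: assignments no longer change
        labels = new_labels
        # 2. Put centroids at the gravity centres of the obtained clusters
        for k in range(K):
            mask = labels == k
            if mask.any():
                centroids[k] = Y[mask].mean(axis=0)
    # 4. Output the final centroids and clusters
    return centroids, labels
```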

K-Means criterion: the summary distance of the entities to their cluster centroids, to be minimised (written out below).
[Figure: points (*) grouped around their cluster centroids.]
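Written out as a formula (a standard reconstruction; the symbols W, S_k, c_k, y_i and the squared-Euclidean choice of d are notation added here, consistent with the later slides):

```latex
W(S, c) = \sum_{k=1}^{K} \sum_{i \in S_k} d(y_i, c_k),
\qquad d(y_i, c_k) = \lVert y_i - c_k \rVert^2 .
```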

Advantages of K-Means:
- models typology building;
- simple "data recovery" criterion;
- computationally effective;
- can be utilised incrementally, 'on-line'.
Shortcomings of K-Means:
- initialisation: no advice on K or on the initial centroids;
- no deep minima: the algorithm only reaches a local optimum;
- no defence against irrelevant features.

Preprocess the data by centering it to a reference point, typically the grand mean, so that 0 is the grand mean from then on. Build just one Anomalous cluster.

Preprocess the data by centering it to a reference point, typically the grand mean, so that 0 is the grand mean from then on. Build an Anomalous cluster S:
1. The initial centre c is the entity farthest away from 0.
2. Cluster update: if d(y_i, c) < d(y_i, 0), assign y_i to S.
3. Centroid update: compute the within-S mean c'; if c' ≠ c, go to 2 with c ← c'; otherwise, halt.
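A minimal Python/NumPy sketch of this Anomalous cluster step, assuming the data matrix Y has already been centred so that the reference point is the origin (the function name is illustrative):

```python
import numpy as np

def anomalous_cluster(Y):
    """One Anomalous cluster: grow S around the entity farthest from the origin (reference point)."""
    # 1. Initial centre c: the entity farthest away from 0
    norms = np.linalg.norm(Y, axis=1)
    c = Y[norms.argmax()].copy()
    while True:
        # 2. Cluster update: assign y_i to S if it is closer to c than to 0
        d_to_c = np.linalg.norm(Y - c, axis=1)
        S = d_to_c < norms
        # 3. Centroid update: within-S mean; halt when it no longer changes
        c_new = Y[S].mean(axis=0)
        if np.allclose(c_new, c):
            return S, c
        c = c_new
```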

The Anomalous cluster procedure is (almost) K-Means, up to:
(i) the number of clusters is K = 2: the "anomalous" cluster and the "main body" of entities around 0;
(ii) the centre of the "main body" cluster is forcibly kept at 0;
(iii) the entity farthest away from 0 initialises the anomalous cluster.

Anomalous Cluster → iK-Means is superior to a range of alternative methods for choosing the number of clusters (Chiang, Mirkin, 2010).

Issue: weighting features according to relevance, and the Minkowski β-distance (Amorim, Mirkin, 2012). Here w denotes the feature weights, which act as scale factors. 3-step K-Means, iterated until convergence:
- given s (clusters) and c (centroids), find w (weights);
- given w and c, find s (clusters);
- given s and w, find c (centroids).
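A hedged reconstruction of the Minkowski weighted K-Means criterion of Amorim and Mirkin (2012), with cluster-specific feature weights w_kv acting as scale factors; the exact notation on the slide is not preserved, so this follows the published formulation:

```latex
W_\beta(S, c, w) = \sum_{k=1}^{K} \sum_{i \in S_k} \sum_{v \in V}
  w_{kv}^{\beta} \, \lvert y_{iv} - c_{kv} \rvert^{\beta},
\qquad \sum_{v \in V} w_{kv} = 1, \; w_{kv} \ge 0 .
```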

Issue: weighting features according to relevance, and the Minkowski β-distance (2). Minkowski centres: minimise d(c) = Σ_i |y_i − c|^β over c. At β > 1, d(c) is convex, so a gradient method applies (a sketch follows below).
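A minimal sketch of computing a one-dimensional Minkowski centre (one per feature) by gradient descent, assuming β > 1 so that d(c) is convex; the step size, stopping rule and function name are illustrative, not from the slides:

```python
import numpy as np

def minkowski_center(y, beta, lr=0.1, tol=1e-7, max_iter=10000):
    """Minimise d(c) = sum_i |y_i - c|**beta over c by gradient descent (convex for beta > 1)."""
    y = np.asarray(y, dtype=float)
    c = y.mean()  # start from the ordinary mean
    for _ in range(max_iter):
        # derivative of d(c), scaled by 1/n to keep the step size well behaved
        grad = -beta * np.mean(np.sign(y - c) * np.abs(y - c) ** (beta - 1))
        c_new = c - lr * grad
        if abs(c_new - c) < tol:
            break
        c = c_new
    return c
```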

Issue: weighting features according to relevance, and the Minkowski β-distance (3). Effects of the Minkowski metric:
- the more uniform the distribution of the entities over a feature, the smaller its weight; a uniform distribution gives w = 0;
- the best Minkowski power β is data dependent;
- the best β can be learnt from the data in a semi-supervised manner (with clustering of all objects);
- example: on Fisher's Iris data, iMWK-Means makes only 5 errors (a record).

K-Means kernelized (1)

K-Means kernelized (2). K-Means equivalent criterion: find a partition {S_1, …, S_K} maximising G(S_1, …, S_K) = Σ_k a(S_k)|S_k|, where a(S_k) is the within-cluster mean similarity. Mirkin (1976, 1996, 2012): build the partition {S_1, …, S_K} by finding one cluster at a time.

K-Means kernelized (3). The K-Means equivalent criterion, one cluster S at a time: maximise g(S) = a(S)|S|, where a(S) is the within-cluster mean similarity, by the AddRemAdd(i) algorithm, which adds/removes one entity at a time.
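Written out, assuming a similarity (kernel) matrix A = (a_ij) and writing a(S) for the within-cluster mean similarity (the Greek symbol used on the original slide did not survive extraction), the criteria above read:

```latex
G(S_1, \dots, S_K) = \sum_{k=1}^{K} a(S_k)\,\lvert S_k \rvert,
\qquad
g(S) = a(S)\,\lvert S \rvert,
\qquad
a(S) = \frac{1}{\lvert S \rvert^2} \sum_{i \in S} \sum_{j \in S} a_{ij} .
```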

K-Means kernelized (4). Semi-average criterion: maximise g(S) = a(S)|S|, where a(S) is the within-cluster mean similarity, with AddRemAdd(i). Properties of an optimal cluster S:
(1) Spectral: a relaxation of the criterion maximises the Rayleigh quotient u^T A u / u^T u;
(2) Tight: the average similarity of an entity j to S is greater than a(S)/2 if j ∈ S and less than a(S)/2 if j ∉ S.

Three extensions to the entire data set (a sketch in Python follows this list).
Partitional: take the set of all entities I;
1. compute S(i) = AddRemAdd(i) for all i ∈ I;
2. take S = S(i*) for the i* maximising f(S(i)) over all i ∈ I;
3. remove S from I; if I is not empty, go to 1; else halt.
Additive: take the set of all entities I;
1. compute S(i) = AddRemAdd(i) for all i ∈ I;
2. take S = S(i*) for the i* maximising f(S(i)) over all i ∈ I;
3. subtract a(S) s s^T from A; if the stop condition does not hold, go to 1; else halt.
Explorative: take the set of all entities I;
1. compute S(i) = AddRemAdd(i) for all i ∈ I;
2. leave those S(i) that do not overlap much.
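A minimal Python/NumPy sketch of AddRemAdd(i) for the semi-average criterion g(S) = a(S)|S| and of the partitional extension, assuming a symmetric similarity matrix A; function names are illustrative and details such as tie-breaking and stop conditions are simplified here:

```python
import numpy as np

def g(A, S):
    """Semi-average criterion g(S) = a(S)|S| = (sum of within-S similarities) / |S|."""
    idx = np.flatnonzero(S)
    return A[np.ix_(idx, idx)].sum() / len(idx) if len(idx) else 0.0

def add_rem_add(A, i):
    """Grow a cluster around entity i, adding/removing one entity at a time while g(S) improves."""
    n = A.shape[0]
    S = np.zeros(n, dtype=bool)
    S[i] = True
    improved = True
    while improved:
        improved = False
        best_gain, best_j = 0.0, None
        for j in range(n):
            T = S.copy()
            T[j] = not T[j]  # try adding j (or removing it, if already in S)
            if not T.any():
                continue
            gain = g(A, T) - g(A, S)
            if gain > best_gain:
                best_gain, best_j = gain, j
        if best_j is not None:
            S[best_j] = not S[best_j]
            improved = True
    return S

def partitional(A):
    """Partitional extension: extract the best cluster, remove it, and repeat on the rest."""
    remaining = list(range(A.shape[0]))
    clusters = []
    while remaining:
        B = A[np.ix_(remaining, remaining)]
        candidates = [add_rem_add(B, i) for i in range(len(remaining))]
        S = max(candidates, key=lambda s: g(B, s))
        clusters.append([remaining[t] for t in np.flatnonzero(S)])
        remaining = [remaining[t] for t in range(len(remaining)) if not S[t]]
    return clusters
```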

Consensus partition (1): given partitions R_1, R_2, …, R_n, find an "average" partition R.

Consensus partition (2): given partitions R_1, R_2, …, R_n, find an "average" R. This is equivalent to maximising a criterion over the consensus similarities between entities (see below).
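A hedged reconstruction of the criterion this slide refers to, assuming the usual consensus-matrix formulation in which a_ij counts how many of the partitions R_1, …, R_n place i and j in the same cluster, and the "average" partition maximises the semi-average criterion over this matrix (the formula on the slide itself is not preserved in the transcript):

```latex
a_{ij} = \sum_{t=1}^{n} \bigl[\, i \text{ and } j \text{ belong to the same cluster of } R_t \,\bigr],
\qquad
\max_{S_1, \dots, S_K} \; \sum_{k=1}^{K} \frac{1}{\lvert S_k \rvert}
  \sum_{i \in S_k} \sum_{j \in S_k} a_{ij} .
```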

Consensus partition (3): given partitions R_1, R_2, …, R_n, find an "average" R. Mirkin and Shestakov (2013):
(1) this approach is superior to a number of contemporary consensus-clustering approaches;
(2) consensus clustering of the results of multiple runs of K-Means recovers clusters better than the best single K-Means run.

Additive clustering (1). Given a similarity matrix A = (a_ij), find clusters u^1 = (u_i^1), u^2 = (u_i^2), …, u^K = (u_i^K), where u_i^k is either 1 or 0 for crisp clusters, or 0 ≤ u_i^k ≤ 1 for fuzzy clusters, together with intensities λ_1, λ_2, …, λ_K. Additive model: a_ij = λ_1² u_i^1 u_j^1 + … + λ_K² u_i^K u_j^K + e_ij, with ||E||² to be minimised. Shepard, Arabie 1979 (presented in 1973); Mirkin 1987 (1976 in Russian).

Additive clustering (2)

Additive clustering (3)

Different criteria (1).
Summary uniform criterion (Mirkin 1976, in Russian): maximise the within-S sum of the similarities A(i,j) − π, where π is a constant threshold; it relates to the criteria considered above.
Summary modularity criterion (Newman 2004): maximise the within-S sum of A(i,j) − B(i,j), where B(i,j) = A(i,+)A(+,j)/A(+,+).
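Written out, with π standing for the subtracted constant (the original symbol did not survive extraction) and S the cluster sought:

```latex
f_{\text{uniform}}(S) = \sum_{i \in S} \sum_{j \in S} \bigl( A(i,j) - \pi \bigr),
\qquad
f_{\text{modularity}}(S) = \sum_{i \in S} \sum_{j \in S}
  \Bigl( A(i,j) - \frac{A(i,+)\,A(+,j)}{A(+,+)} \Bigr) .
```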

Different criteria (2)

FADDIS: Fuzzy Additive Spectral Clustering.
Spectral: B is the pseudo-inverse Laplacian of A; clusters are found one at a time.
One cluster to find: minimise ||B − λ² u u^T||²; then take the residual similarity B ← B − λ² u u^T; stopping conditions apply.
Equivalent: a Rayleigh quotient to maximise, max u^T B u / u^T u [this follows from the model, in contrast to the very popular, yet purely heuristic, approach of Shi and Malik (2000)].
Experimentally demonstrated to be competitive over:
- ordinary graphs for community detection;
- conventional (dis)similarity data;
- affinity data (kernel transformations of feature-space data);
- in-house synthetic data generators.

Competitive at:
- community detection in ordinary graphs;
- conventional similarity data;
- affinity similarity data;
- Lapin-transformed similarity data: D = diag(B 1_N), L = I − D^(−1/2) B D^(−1/2), L⁺ = pinv(L).
There are examples at which Lapin does not work.
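A minimal Python/NumPy sketch of the Lapin transformation as defined on this slide (the function name is illustrative; it assumes every row of B has a positive sum):

```python
import numpy as np

def lapin(B):
    """Laplacian pseudo-inverse (Lapin) transformation of a similarity matrix B."""
    d = B.sum(axis=1)                    # D = diag(B * 1_N)
    d_inv_sqrt = 1.0 / np.sqrt(d)        # D^{-1/2}, assuming positive row sums
    L = np.eye(B.shape[0]) - d_inv_sqrt[:, None] * B * d_inv_sqrt[None, :]  # L = I - D^{-1/2} B D^{-1/2}
    return np.linalg.pinv(L)             # L+ = pinv(L)
```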

An example at which Lapin does work, but the square-error criterion does not.

Conclusion. Clustering is still far from a mathematical theory; however, it is getting meaty:
+ Gaussian kernels bring in distributions;
+ the Laplacian transformation brings in dynamics.
To make it into a theory, there is still a way to go:
- modelling dynamics;
- compatibility across multiple data and metadata;
- interpretation.