The K-Means Method in Cluster Analysis and Its Intelligentization. B.G. Mirkin, Professor, Department of Data Analysis and Artificial Intelligence, National Research University Higher School of Economics, Moscow, Russian Federation.

Presentation transcript:

The K-Means Method in Cluster Analysis and Its Intelligentization. B.G. Mirkin, Professor, Department of Data Analysis and Artificial Intelligence, National Research University Higher School of Economics, Moscow, Russian Federation; Professor Emeritus, School of Computer Science & Information Systems, Birkbeck College, University of London, UK.

Outline:
- Clustering as empirical classification
- K-Means and its issues: (1) determining K and initialization; (2) weighting variables
- Addressing (1): data recovery clustering and K-Means (Mirkin 1987, 1990); one-by-one clustering: Anomalous Patterns and iK-Means; other approaches; computational experiment
- Addressing (2): three-stage K-Means; Minkowski K-Means; computational experiment
- Conclusion

WHAT IS CLUSTERING; WHAT IS DATA
K-MEANS CLUSTERING: Conventional K-Means; Initialization of K-Means; Intelligent K-Means; Mixed Data; Interpretation Aids
WARD HIERARCHICAL CLUSTERING: Agglomeration; Divisive Clustering with Ward Criterion; Extensions of Ward Clustering
DATA RECOVERY MODELS: Statistics Modelling as Data Recovery; Data Recovery Model for K-Means; for Ward; Extensions to Other Data Types; One-by-One Clustering
DIFFERENT CLUSTERING APPROACHES: Extensions of K-Means; Graph-Theoretic Approaches; Conceptual Description of Clusters
GENERAL ISSUES: Feature Selection and Extraction; Similarity on Subsets and Partitions; Validity and Reliability

Referred recent work:
B.G. Mirkin, M. Chiang (2010) Intelligent choice of the number of clusters in K-Means clustering: An experimental study with different cluster spreads, Journal of Classification, 27(1), 3-41.
B.G. Mirkin (2011) Choosing the number of clusters, WIREs Data Mining and Knowledge Discovery, 1(3).
B.G. Mirkin, R. Amorim (2012) Minkowski metric, feature weighting and anomalous cluster initializing in K-Means clustering, Pattern Recognition, 45.

What is clustering? Finding homogeneous fragments, mostly sets of entities, in datasets for further analysis.

Example: W. Jevons (1857) planet clusters (updated by Mirkin 1996). Pluto doesn't fit in the two clusters of planets: it originated another cluster (September 2006).

Example: a few clusters. Clustering interface to WEB search engines (Grouper). Query: Israel (after O. Zamir and O. Etzioni 2001).

Cluster   # sites   Interpretation
1         24        Society, religion (Israel and Judaism; Judaica collection)
2         12        Middle East, war, history (The state of Israel; Arabs and Palestinians)
3         31        Economy, travel (Israel Hotel Association; Electronics in Israel)

Clustering algorithms:
- Nearest neighbour
- Agglomerative clustering
- Divisive clustering
- Conceptual clustering
- K-Means
- Kohonen SOM
- Spectral clustering
- …

Batch K-Means: a generic clustering method. Entities are represented as multidimensional points (a code sketch follows below).
0. Put K hypothetical centroids (seeds).
1. Assign points to the centroids according to the minimum-distance rule.
2. Put the centroids at the gravity centres of the clusters thus obtained.
3. Iterate steps 1 and 2 until convergence.
4. Output the final centroids and clusters.
(Illustration over four slides: K = 3 hypothetical centroids, marked @, moving among the data points, marked *, until convergence.)

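To make the four steps concrete, here is a minimal NumPy sketch of Batch K-Means (my own illustration, not code from the talk; seeding by K random data points is one common choice):

    import numpy as np

    def batch_kmeans(Y, K, max_iter=100, seed=0):
        # 0. Put K hypothetical centroids: here, K distinct data points at random
        rng = np.random.default_rng(seed)
        centroids = Y[rng.choice(len(Y), size=K, replace=False)].astype(float)
        labels = None
        for _ in range(max_iter):
            # 1. Minimum-distance rule: each point goes to its nearest centroid
            dists = ((Y[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
            new_labels = dists.argmin(axis=1)
            if labels is not None and np.array_equal(new_labels, labels):
                break  # 3. Convergence: assignments no longer change
            labels = new_labels
            # 2. Move each centroid to the gravity centre (mean) of its cluster
            for k in range(K):
                members = Y[labels == k]
                if len(members) > 0:
                    centroids[k] = members.mean(axis=0)
        return centroids, labels  # 4. Final centroids and clusters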

K-Means criterion: summary distance to cluster centroids. Minimize W(S, c) = Σ_k Σ_{i in S_k} d(y_i, c_k), the summary (squared Euclidean) distance of entities to the centroids of their clusters.

Advantages of K-Means:
- Models typology building
- Simple "data recovery" criterion
- Computationally effective
- Can be utilised incrementally, 'on-line'
Shortcomings of K-Means:
- Initialisation: no advice on K or on initial centroids
- Only local optimisation: no guarantee of a deep minimum
- No defence against irrelevant features

Initial Centroids: Correct. Two-cluster case.

Initial Centroids: Correct. (Plots: initial and final configurations.)

Different Initial Centroids.

Different Initial Centroids: Wrong. (Plots: initial and final configurations.)

(1) To address: the number of clusters (issue: the criterion always improves with more clusters, W_K < W_{K-1}, so it cannot choose K by itself); the initial setting; a deeper minimum. The two are interrelated: a good initial setting leads to a deeper minimum.

Number K: conventional approach. Take a range RK of K, say K = 3, 4, ..., 15. For each K in RK, run K-Means a number of times from randomly chosen initial centroids and choose the best of the runs, W(S, c) = W_K. Compare W_K over all K in RK in a special way and choose the best, using, for example:
- Gap statistic (2001)
- Jump statistic (2003)
- Hartigan (1975): in the ascending order of K, pick the first K at which H_K = [W_K / W_{K+1} - 1](N - K - 1) <= 10
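A sketch of this procedure with Hartigan's rule, reusing the batch_kmeans function from the earlier sketch (the K range and the threshold of 10 follow the slide; the run count and the names are my own):

    def within_cluster_sum(Y, centroids, labels):
        # W(S, c): summary squared distance of entities to their cluster centroids
        return sum(((Y[labels == k] - c) ** 2).sum() for k, c in enumerate(centroids))

    def choose_k_hartigan(Y, k_range=range(3, 16), runs=10, threshold=10.0):
        N = len(Y)
        # the deepest minimum found over several random initialisations, for each K
        W = {K: min(within_cluster_sum(Y, *batch_kmeans(Y, K, seed=r))
                    for r in range(runs))
             for K in k_range}
        ks = sorted(W)
        for K, K_next in zip(ks, ks[1:]):
            if (W[K] / W[K_next] - 1.0) * (N - K - 1) <= threshold:
                return K  # the first K at which H_K drops to the threshold
        return ks[-1]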

(1) Addressing the number of clusters and the initial setting with a PCA-like method in the data recovery approach.

Representing a partition. Cluster k: centroid c_kv (v, a feature); binary 1/0 membership z_ik (i, an entity).

Basic equations (same as for PCA, but the score vectors z_k are constrained to be binary): y_iv = Σ_k c_kv z_ik + e_iv, where y is a data entry, z a 1/0 membership (not a score), c a cluster centroid, and N a cardinality; i indexes entities, v features/categories, k clusters.

Quadratic data scatter decomposition (Pythagorean); K-Means performs alternating least-squares minimisation of the residual part. Here y is a data entry, z a 1/0 membership, c a cluster centroid, N a cardinality; i indexes entities, v features/categories, k clusters.
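In this notation, the decomposition can be written as follows (my reconstruction from the model above; the slide's own formula is not preserved):

    \sum_{i}\sum_{v} y_{iv}^{2} \;=\; \sum_{k}\sum_{v} N_{k}\, c_{kv}^{2} \;+\; \sum_{i}\sum_{v} e_{iv}^{2}

The first right-hand term is the part of the data scatter explained by the clustering; K-Means alternately minimises the residual term over the memberships z (clusters S) and the centroids c.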

Equivalent criteria (1):
A. Bilinear residuals squared, MIN: minimizing the difference between the data and the cluster structure.
B. Distance-to-centre squared, MIN: minimizing the summary squared distance of entities to their cluster centroids.

Equivalent criteria (2):
C. Within-group error squared, MIN: minimizing the difference between the data and the cluster structure.
D. Within-group variance, MIN: minimizing within-cluster variance.

Equivalent criteria (3):
E. Semi-averaged within-cluster distance squared, MIN: minimizing dissimilarities within clusters.
F. Semi-averaged within-cluster similarity, MAX: maximizing similarities within clusters.

Equivalent criteria (4):
G. Distant centroids, MAX: finding anomalous types.
H. Consensus partition, MAX: maximizing correlation between the sought partition and the given variables.

Equivalent criteria (5):
I. Spectral clusters, MAX: maximizing the summary Rayleigh quotient over binary vectors.

PCA-inspired Anomalous Pattern clustering: y_iv = c_v z_i + e_iv, where z_i = 1 if i ∈ S and z_i = 0 if i ∉ S. With the Euclidean distance squared, the centroid c of S must be anomalous, that is, interesting.

Initial setting with an Anomalous Pattern cluster. (Illustration: Tom Sawyer.)

Anomalous Pattern clusters: iterate. (Illustration: Tom Sawyer.)

iK-Means: Anomalous clusters + K-Means. After extracting 2 clusters (but how can one know that 2 is right?). (Plot: final configuration.)

iK-Means: defining K and the initial setting with iterative Anomalous Pattern clustering (sketched in code below):
1. Find all Anomalous Pattern clusters.
2. Remove smaller (e.g., singleton) clusters.
3. Put the number of remaining clusters as K and initialise K-Means with their centres.
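A minimal sketch of this procedure, continuing the earlier code (numpy as np), assuming squared Euclidean distances and the grand mean as the reference point; the function names and the min_size parameter are my own illustration:

    def anomalous_pattern(Y, idx, ref):
        # Seed the pattern with the entity farthest from the reference point
        d_ref = ((Y[idx] - ref) ** 2).sum(axis=1)
        c = Y[idx[d_ref.argmax()]].astype(float)
        while True:
            # The pattern collects the entities closer to c than to the reference
            closer = ((Y[idx] - c) ** 2).sum(axis=1) < d_ref
            closer[d_ref.argmax()] = True   # the seed always belongs to the pattern
            S = idx[closer]
            new_c = Y[S].mean(axis=0)
            if np.allclose(new_c, c):
                return S, new_c             # converged: one Anomalous Pattern cluster
            c = new_c

    def ik_means_init(Y, min_size=2):
        ref = Y.mean(axis=0)                # reference point: grand mean of the data
        idx = np.arange(len(Y))
        centres = []
        while len(idx) > 0:
            S, c = anomalous_pattern(Y, idx, ref)
            if len(S) >= min_size:          # remove smaller (e.g., singleton) clusters
                centres.append(c)
            idx = np.setdiff1d(idx, S)
        return np.array(centres)            # K = len(centres); seeds for K-Means

K-Means is then run from the returned centres, so both K and the initialisation come from the data rather than from the user.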

Study of eight number-of-clusters methods (joint work with Mark Chiang):
- Variance based: Hartigan (HK); Calinski & Harabasz (CH); Jump Statistic (JS)
- Structure based: Silhouette Width (SW)
- Consensus based: Consensus Distribution area (CD); Consensus Distribution mean (DD)
- Sequential extraction of APs (iK-Means): Least Squares (LS); Least Moduli (LM)

Experimental results at 9 Gaussian clusters (3 spread patterns), 1000 x 15 data size. (Table: for each of HK, CH, JS, SW, CD, DD, LS and LM, the estimated number of clusters and the Adjusted Rand Index under large-spread and small-spread conditions; cells mark 1-, 2- and 3-time winners, with two winners counted each time.)


(2) To address: weighting features according to relevance, with w the feature weights (scale factors). Three-step K-Means, iterated till convergence (the criterion behind these steps is sketched below):
- Given s and c, find w (weights)
- Given w and c, find s (clusters)
- Given s and w, find c (centroids)
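As far as I can reconstruct it from the referred Mirkin and Amorim (2012) paper (the paper also has a cluster-specific weight variant; details may differ), the criterion minimised by these three steps is roughly:

    W_{\beta}(S, c, w) \;=\; \sum_{k=1}^{K} \sum_{i \in S_k} \sum_{v} w_{v}^{\beta}\, |y_{iv} - c_{kv}|^{\beta}, \qquad \sum_{v} w_{v} = 1,

with the optimal weights, for fixed clusters and centroids,

    w_{v} \;=\; \Bigl[ \sum_{u} \bigl( D_{v} / D_{u} \bigr)^{1/(\beta - 1)} \Bigr]^{-1}, \qquad D_{v} \;=\; \sum_{k} \sum_{i \in S_k} |y_{iv} - c_{kv}|^{\beta},

so that features with larger within-cluster dispersion D_v get smaller weights.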

Minkowski centres: minimize d(c) = Σ_{i in S} |y_i - c|^β over c. At β > 1, d(c) is convex, so a gradient method applies.
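A per-feature sketch of that gradient method, continuing the earlier code (the step size and iteration count are my own illustrative choices):

    def minkowski_center(x, beta, steps=500, lr=0.01):
        # Minimize d(c) = sum_i |x_i - c|**beta over a scalar c; convex for beta > 1
        c = float(x.mean())                 # the mean is the exact answer at beta = 2
        for _ in range(steps):
            # gradient: sum_i beta * |x_i - c|**(beta - 1) * sign(c - x_i)
            grad = (beta * np.abs(x - c) ** (beta - 1) * np.sign(c - x)).sum()
            c -= lr * grad / len(x)
        return c

At beta = 2 the gradient vanishes at the mean, so the update leaves it in place; within Minkowski K-Means this computation is applied to every feature of every cluster.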

Minkowski metric effects:
- The more uniform the distribution of the entities over a feature, the smaller its weight; a uniform distribution gives w = 0.
- The best Minkowski power β is data dependent.
- The best β can be learnt from data in a semi-supervised manner (with clustering of all objects).
- Example: on Fisher's Iris data, iMWK-Means gives only 5 errors (a record).

Conclusion: the data-recovery, K-Means-wise model of clustering is a tool that involves a wealth of interesting criteria for mathematical investigation and application projects. Further work: extending the approach to other data types (text, sequence, image, web page); upgrading K-Means to address the interpretation of the results. (Diagram: data clustering as a Coder producing clusters from data, a Model, and a Decoder recovering the data from the clusters.)

HEFCE survey of students' satisfaction. HEFCE method: ALL: 93 of highest mark. STRATA: 43 best, ranging 71.8 to