Constrained Clustering (Semi-Supervised Clustering)

Presentation transcript:

Constrained Clustering (Semi-Supervised Clustering). Thanks to Jieping Ye and Bojun Yan.

Introduction In many learning tasks there is a large supply of unlabeled data but only limited labeled data, since labeled data can be expensive to generate. Semi-supervised learning combines labeled and unlabeled data during training to improve performance, and it is applicable to both classification and clustering.

Supervised classification vs. semi-supervised classification In supervised classification, there is a known, fixed set of categories, and category-labeled training data is used to induce a classification function. In semi-supervised classification, training also exploits additional unlabeled data, frequently resulting in a more accurate classification function (Blum & Mitchell, 1998; Ghahramani & Jordan, 1994).

Clustering algorithms are generally used in an unsupervised fashion. The algorithm has access only to the set of features describing each object; it is not given any information (e.g., labels) as to where each instance should be placed within the partition. However, in real application domains, the experimenter often possesses some background knowledge (about the domain or the dataset) that could be useful in clustering the data.

Traditional clustering algorithms have no way to take advantage of this information even when it does exist. Semi-supervised clustering integrates background information into the clustering algorithm; its focus is on clustering large amounts of unlabeled data in the presence of a small amount of supervised data.

Some real-world tasks: (a) similar-text search, (b) image retrieval, (c) speaker identification in a conversation, (d) visual correspondence in multi-view image processing.

Outline Overview of clustering and classification. What is semi-supervised learning (semi-supervised classification and semi-supervised clustering)? What is semi-supervised clustering? Why semi-supervised clustering? Semi-supervised clustering algorithms.

Supervised classification versus unsupervised clustering Unsupervised clustering: group similar objects together to find clusters, minimizing intra-cluster distance and maximizing inter-cluster distance. Supervised classification: a class label for each training sample is given; build a model from the training data and predict class labels for unseen future data points.

What is clustering? Finding groups of objects such that the objects in a group are similar (or related) to one another and different from (or unrelated to) the objects in other groups; inter-cluster distances are maximized and intra-cluster distances are minimized.

What is Classification?

Clustering algorithms: K-Means, hierarchical clustering, graph-based clustering (spectral clustering), and bi-clustering.

Classification algorithms: K-Nearest-Neighbor classifiers, the Naïve Bayes classifier, Linear Discriminant Analysis (LDA), Support Vector Machines (SVM), Logistic Regression, and Neural Networks.

Supervised Classification Example (figure)

Unsupervised Clustering Example (figure)

Semi-Supervised Learning Combines labeled and unlabeled data during training to improve performance. Semi-supervised classification: training on labeled data exploits additional unlabeled data, frequently resulting in a more accurate classifier. Semi-supervised clustering: uses a small amount of labeled data to aid and bias the clustering of unlabeled data. Semi-supervised learning sits between unsupervised clustering and supervised classification.

Semi-Supervised Classification Example (figure)

Semi-Supervised Classification Algorithms: semi-supervised EM [Ghahramani:NIPS94, Nigam:ML00], co-training [Blum:COLT98], transductive SVMs [Vapnik:98, Joachims:ICML99], and graph-based algorithms. Assumptions: a known, fixed set of categories is given in the labeled data; the goal is to improve classification of examples into these known categories.

Semi-Supervised Clustering Example (figure)

Second Semi-Supervised Clustering Example (figure)

Semi-supervised clustering: problem definition Input: a set of unlabeled objects, each described by a set of attributes (numeric and/or categorical), plus a small amount of domain knowledge. Output: a partitioning of the objects into k clusters (possibly with some discarded as outliers). Objective: maximum intra-cluster similarity, minimum inter-cluster similarity, and high consistency between the partitioning and the domain knowledge.

Why semi-supervised clustering? Why not plain clustering? The clusters produced may not be the ones required, and sometimes there are multiple possible groupings. Why not classification? Sometimes there is insufficient labeled data. Potential applications: bioinformatics (gene and protein clustering), document hierarchy construction, news/email categorization, and image categorization.

Semi-Supervised Clustering Domain knowledge: partial label information is given, or constraints (must-links and cannot-links) are applied. Approaches: search-based semi-supervised clustering (alter the clustering algorithm using the constraints), similarity-based semi-supervised clustering (alter the similarity measure based on the constraints), or a combination of both.

Search-Based Semi-Supervised Clustering Alter the clustering algorithm that searches for a good partitioning by: modifying the objective function to give a reward for obeying labels on the supervised data [Demeriz:ANNIE99]; enforcing constraints (must-link, cannot-link) on the labeled data during clustering [Wagstaff:ICML00, Wagstaff:ICML01]; or using the labeled data to initialize clusters in an iterative refinement algorithm such as K-Means [Basu:ICML02].

Overview of K-Means Clustering K-Means is a partitional clustering algorithm based on iterative relocation that partitions a dataset into K clusters. Algorithm: initialize K cluster centers randomly, then repeat until convergence: (1) Cluster Assignment Step: assign each data point x to the cluster X_l whose center has minimum L2 distance from x; (2) Center Re-estimation Step: re-estimate each cluster center as the mean of the points in that cluster.
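A minimal sketch of this relocation loop in Python with NumPy, assuming Euclidean (L2) distance and centers initialized from randomly chosen data points; the names kmeans, n_clusters, and n_iter are illustrative, not from the slides.

import numpy as np

def kmeans(X, n_clusters, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # Initialize K cluster centers by picking random data points.
    centers = X[rng.choice(len(X), size=n_clusters, replace=False)]
    for _ in range(n_iter):
        # Cluster assignment step: each point goes to its nearest center (L2 distance).
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Center re-estimation step: each center becomes the mean of its assigned points.
        new_centers = np.array([X[labels == k].mean(axis=0) if np.any(labels == k) else centers[k]
                                for k in range(n_clusters)])
        if np.allclose(new_centers, centers):
            break  # assignments have stabilized
        centers = new_centers
    return labels, centers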

K-Means Objective Function K-Means locally minimizes the sum of squared distances between the data points and their corresponding cluster centers (written out below). Initialization of the K cluster centers can be totally random, a random perturbation from the global mean, or a heuristic that ensures well-separated centers.
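The objective formula itself did not survive the transcript; it is the standard K-Means sum-of-squared-errors criterion, written here in LaTeX:

J = \sum_{l=1}^{K} \sum_{x_i \in X_l} \lVert x_i - \mu_l \rVert^2

where X_l is the set of points assigned to cluster l and \mu_l is that cluster's center.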

K-Means Example

K-Means Example Randomly Initialize Means

K-Means Example Assign Points to Clusters

K-Means Example Re-estimate Means

K-Means Example Re-assign Points to Clusters

K-Means Example Re-estimate Means

K-Means Example Re-assign Points to Clusters

K-Means Example Re-estimate Means and Converge

Semi-Supervised K-Means When partial label information is given: Seeded K-Means and Constrained K-Means. When constraints (must-link, cannot-link) are given: COP K-Means.

Semi-Supervised K-Means for partially labeled data Seeded K-Means: labeled data provided by the user are used for initialization; the initial center for cluster i is the mean of the seed points having label i. Seed points are only used for initialization, not in subsequent steps. Constrained K-Means: labeled data provided by the user are used to initialize the K-Means algorithm; cluster labels of the seed data are kept unchanged in the cluster assignment steps, and only the labels of the non-seed data are re-estimated.
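A minimal sketch of both variants in Python with NumPy, mirroring the relocation loop above; seed_idx and seed_labels are hypothetical names for the indices and labels of the user-provided seed points (assumed to be NumPy integer arrays), and constrained=True simply re-imposes the seed labels after every assignment step.

import numpy as np

def semi_supervised_kmeans(X, n_clusters, seed_idx, seed_labels, constrained=False, n_iter=100):
    # Initialization: the center of cluster i is the mean of the seed points with label i.
    centers = np.array([X[seed_idx[seed_labels == k]].mean(axis=0) for k in range(n_clusters)])
    for _ in range(n_iter):
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        if constrained:
            # Constrained K-Means: seed points keep their given labels.
            labels[seed_idx] = seed_labels
        new_centers = np.array([X[labels == k].mean(axis=0) if np.any(labels == k) else centers[k]
                                for k in range(n_clusters)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

With constrained=False this behaves as Seeded K-Means (the seeds only fix the initialization, so their labels may later change); with constrained=True the seed labels are held fixed throughout, as described above.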

Seeded K-Means Use labeled data to find the initial centroids and then run K-Means. The labels for seeded points may change.

Seeded K-Means Example

Seeded K-Means Example Initialize Means Using Labeled Data

Seeded K-Means Example Assign Points to Clusters

Seeded K-Means Example Re-estimate Means

Seeded K-Means Example Assign Points to Clusters and Converge (note that the label of one seed point has changed)

Exercise Given the one-dimensional points 0, 1, 3, 6, 9, 15, 20, 21, 22, compute the clustering using seeded K-Means.

Constrained K-Means Use labeled data to find the initial centroids and then run K-Means. The labels for seeded points will not change.

Constrained K-Means Example The labels will not change.

Constrained K-Means Example Initialize Means Using Labeled Data The labels will not change.

Constrained K-Means Example Assign Points to Clusters The labels will not change.

Constrained K-Means Example Re-estimate Means and Converge The labels will not change.

Exercise Given the one-dimensional points 0, 1, 3, 6, 9, 15, 20, 21, 22, compute the clustering using constrained K-Means.

COP-KMeans algorithm In COP-KMeans, the initial background knowledge, provided in the form of constraints between instances in the dataset, is used in the clustering process. There are two types of constraints: must-link (two instances have to be together in the same cluster) and cannot-link (two instances have to be in different clusters).

COP K-Means COP K-Means [Wagstaff et al.: ICML01] is K-Means with must-link (must be in the same cluster) and cannot-link (cannot be in the same cluster) constraints on data points. Initialization: cluster centers are chosen randomly, but as each one is chosen, any must-link constraints that it participates in are enforced (so that those points cannot later be chosen as the center of another cluster). Algorithm: during the cluster assignment step in COP-K-Means, a point is assigned to its nearest cluster without violating any of its constraints; if no such assignment exists, abort.

COP K-Means Algorithm
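The algorithm figure itself did not survive the transcript, so here is a minimal sketch of the constrained assignment step in Python; must_link and cannot_link are assumed to be lists of index pairs, and returning None plays the role of the "abort" case described above.

import numpy as np

def violates(i, k, labels, must_link, cannot_link):
    # Putting point i in cluster k is a violation if a must-link partner was already
    # placed in a different cluster, or a cannot-link partner was placed in cluster k.
    for a, b in must_link:
        if i in (a, b):
            j = b if a == i else a
            if labels[j] != -1 and labels[j] != k:
                return True
    for a, b in cannot_link:
        if i in (a, b):
            j = b if a == i else a
            if labels[j] == k:
                return True
    return False

def cop_assign(X, centers, must_link, cannot_link):
    # One constrained assignment pass; -1 means "not yet assigned".
    dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    labels = np.full(len(X), -1)
    for i in range(len(X)):
        for k in np.argsort(dists[i]):            # try the nearest center first
            if not violates(i, k, labels, must_link, cannot_link):
                labels[i] = k
                break
        else:
            return None                           # no feasible cluster: COP-KMeans aborts
    return labels

The full algorithm alternates this assignment step with the usual center re-estimation step until the assignments stop changing.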

Illustration Determine the label of a point: a must-link constraint leads to assigning it to the red class (figure).

Illustration Determine the label of a point: a cannot-link constraint leads to assigning it to the red class (figure).

Illustration Determine the label of a point with conflicting must-link and cannot-link constraints: no assignment satisfies all of them, so the clustering algorithm fails (figure).

Constraints

Transitive closure How do we take the transitive closure of the constraints? If di must-link dj, dj must-link dh, dl must-link dk, and di cannot-link dl, then the must-links close transitively: di must-link dj and dj must-link dh imply di must-link dh, while dl must-link dk stays as given.

The cannot-links are then propagated across the must-link groups: di cannot-link dl implies dj cannot-link dl and dh cannot-link dl, and likewise di cannot-link dk, dj cannot-link dk, and dh cannot-link dk (since dl must-link dk).
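A minimal sketch of this closure in Python, treating the connected components of the must-link graph as groups and expanding every cannot-link to all pairs across the two groups; the function and variable names are illustrative.

from itertools import product

def close_constraints(n, must_link, cannot_link):
    # Union-find over the must-link pairs gives the must-link components.
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    for a, b in must_link:
        parent[find(a)] = find(b)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)

    # Transitive closure of must-links: every pair inside a component is must-linked.
    closed_must = {(i, j) for g in groups.values() for i in g for j in g if i < j}

    # A cannot-link between two points extends to every pair across their two components.
    closed_cannot = set()
    for a, b in cannot_link:
        for i, j in product(groups[find(a)], groups[find(b)]):
            closed_cannot.add((min(i, j), max(i, j)))
    return closed_must, closed_cannot

For the slide's example (di, dj, dh, dl, dk mapped to indices 0-4, must_link = [(0, 1), (1, 2), (3, 4)], cannot_link = [(0, 3)]), this returns exactly the six cannot-link pairs listed above.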

Similarity-based semi-supervised clustering Alter the similarity measure based on the constraints. Paper: "From Instance-level Constraints to Space-Level Constraints: Making the Most of Prior Knowledge in Data Clustering", D. Klein et al. Two types of constraints: must-links and cannot-links. Clustering algorithm: hierarchical clustering.

Overview of Hierarchical Clustering Algorithms are either agglomerative or divisive. Basic agglomerative algorithm: compute the distance matrix and let each data point be a cluster; repeat (merge the two closest clusters, then update the distance matrix) until only a single cluster remains. The key operation is the update of the distance between two clusters.
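A minimal sketch of the agglomerative loop in Python, with the inter-cluster distance passed in as a function so that the options on the next slide (MIN, MAX, group average) can be swapped in; stopping at n_clusters rather than at a single cluster is a small convenience, not from the slides.

def agglomerative(D, n_clusters=1, linkage=max):
    # D is a full pairwise distance matrix; linkage maps a list of point-to-point
    # distances to one inter-cluster distance (max = complete link, min = single link).
    clusters = [[i] for i in range(len(D))]           # every point starts as its own cluster
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = linkage([D[i][j] for i in clusters[a] for j in clusters[b]])
                if best is None or d < best[0]:
                    best = (d, a, b)                  # the two closest clusters so far
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]       # merge them
        del clusters[b]
    return clusters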

How to Define Inter-Cluster Distance? Options: MIN, MAX, group average, or the distance between centroids (figure: points p1-p5 and their distance matrix).

Must-link constraints Set the distance between each must-linked pair to zero, then derive a new metric by running an all-pairs-shortest-distances algorithm. The result is still a metric and is faithful to the original metric. Computational complexity: O(N^2 C), where C is the number of points involved in must-link constraints and N is the total number of points.
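A minimal sketch of that adjustment in Python with NumPy: zero out the must-linked pairs and then relax paths through the constrained points only, which gives the O(N^2 C) cost mentioned above; the function and argument names are illustrative.

import numpy as np

def must_link_metric(D, must_link):
    # D is the original symmetric distance matrix; must_link is a list of index pairs.
    D = D.copy()
    for a, b in must_link:
        D[a, b] = D[b, a] = 0.0
    constrained = sorted({i for pair in must_link for i in pair})
    # Floyd-Warshall with intermediate nodes restricted to the constrained points:
    # only paths passing through a must-linked point can become shorter.
    for k in constrained:
        D = np.minimum(D, D[:, k][:, None] + D[k, :][None, :])
    return D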

New distance matrix based on must-link constraints (figure: the updated distance matrix over p1-p5). Hierarchical clustering can then be carried out based on the new distance matrix.

Cannot-link constraints Run hierarchical clustering with complete link (MAX), where the distance between two clusters is determined by the largest pairwise distance, and set the distance between each cannot-linked pair to be effectively infinite (larger than any other distance). The new distance matrix does not define a metric, but this works very well in practice.

Constrained complete-link clustering algorithm Derive a new distance matrix based on both types of constraints, then run complete-link hierarchical clustering on it.
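Putting the two steps together, a sketch under the same assumptions: must-links handled with the must_link_metric function above, cannot-links as a large penalty distance, and SciPy's complete-link hierarchical clustering standing in for the constrained complete-link algorithm.

import numpy as np
from scipy.spatial.distance import squareform
from scipy.cluster.hierarchy import linkage, fcluster

def constrained_complete_link(D, must_link, cannot_link, n_clusters):
    D = must_link_metric(D, must_link)        # must-links: zero distance + shortest paths
    penalty = 10.0 * D.max() + 1.0            # stand-in for an "effectively infinite" distance
    for a, b in cannot_link:
        D[a, b] = D[b, a] = penalty           # cannot-links: push the pair far apart
    np.fill_diagonal(D, 0.0)
    Z = linkage(squareform(D, checks=False), method='complete')
    return fcluster(Z, t=n_clusters, criterion='maxclust')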

Illustration Initial distance matrix over the five points 1-5 (figure).

New distance matrix Must-links: 1-2 and 3-4; cannot-link: 2-3 (figure: the updated distance matrix over {1, 2}, {3, 4}, and 5).

Hierarchical clustering 1 and 2 form a cluster, and 3 and 4 form another cluster (figure: dendrogram over {1, 2}, {3, 4}, and 5).

Summary Seeded and Constrained K-Means use partially labeled data; COP K-Means uses constraints (must-link and cannot-link). Constrained K-Means and COP K-Means require all the constraints to be satisfied, which may not be effective if the seeds contain noise. Seeded K-Means uses the seeds only in the first step, to determine the initial centroids, and is therefore less sensitive to noise in the seeds. Semi-supervised hierarchical clustering instead alters the distance matrix based on the constraints.