Lecture 21: Spectral Clustering April 22, 2010

Last Time: GMM Model Adaptation
MAP (Maximum A Posteriori)
MLLR (Maximum Likelihood Linear Regression)
UBM-MAP for speaker recognition

Today: Graph-Based Clustering and the Minimum Cut

Partitional Clustering: How do we partition a space to make the best clusters? By proximity to a cluster centroid.

Difficult Clusterings: Some clusterings don't lend themselves to a centroid-based definition of a cluster. Spectral clustering allows us to address these sorts of clusters.

Difficult Clusterings: These kinds of clusters are defined by points that are close to any member of the cluster, rather than to the average member of the cluster.

Graph Representation: We can represent the relationships between data points in a graph, weighting the edges by the similarity between points.

Representing Data in a Graph: What is the best way to calculate similarity between two data points? Distance-based: a common choice is the Gaussian similarity w_ij = exp(-||x_i - x_j||^2 / (2σ^2)).
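
A minimal sketch of this distance-based similarity in NumPy, assuming the Gaussian form above; the bandwidth sigma, the function name, and the toy height/weight points are illustrative choices, not values from the slides:

```python
import numpy as np

def gaussian_affinity(X, sigma=1.0):
    """Pairwise Gaussian similarity: w_ij = exp(-||x_i - x_j||^2 / (2 sigma^2))."""
    # Squared Euclidean distances between all pairs of rows of X.
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)  # no self-loops
    return W

# Five toy 2-D points (height, weight), loosely echoing the slides' examples.
X = np.array([[5.0, 20.0], [6.0, 9.0], [8.0, 8.0], [4.0, 6.0], [9.0, 4.0]])
W = gaussian_affinity(X, sigma=4.0)
```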

Graphs: Nodes and edges. Edges can be directed or undirected and can have weights associated with them; here the weights correspond to pairwise affinity.

Graphs: The degree of a node is d_i = Σ_j w_ij; the volume of a set A is vol(A) = Σ_{i∈A} d_i.

Graph Cuts: The cut between two subgraphs A and B is calculated as cut(A, B) = Σ_{i∈A, j∈B} w_ij.
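
The degree, volume, and cut definitions translate directly to NumPy. A sketch, reusing the affinity matrix W from the earlier example; A and B are lists of node indices, and the function names are illustrative:

```python
import numpy as np

def degrees(W):
    """Degree of each node: d_i = sum_j w_ij."""
    return W.sum(axis=1)

def volume(W, A):
    """Volume of a node set A: the sum of the degrees of its nodes."""
    return degrees(W)[A].sum()

def cut(W, A, B):
    """cut(A, B): total weight of edges crossing from A to B."""
    return W[np.ix_(A, B)].sum()

A, B = [0, 1], [2, 3, 4]           # one candidate bipartition of the 5 nodes
print(cut(W, A, B), volume(W, A))  # W from the Gaussian affinity sketch above
```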

Graph Examples - Distance: [Figure: five points A-E plotted in height-weight space, with edges labeled by pairwise distances.]

Graph Examples - Similarity: [Figure: the same five points A-E, with edges labeled by pairwise similarities instead of distances.]

Intuition: The minimum cut of a graph identifies an optimal partitioning of the data. Spectral clustering recursively partitions the data set:
1. Identify the minimum cut
2. Remove edges
3. Repeat until k clusters are identified

Graph Cuts: Minimum (bipartitional) cut, i.e., the bipartition (A, B) minimizing cut(A, B).

Graph Cuts: Minimal (bipartitional) normalized cut, Ncut(A, B) = cut(A, B)/vol(A) + cut(A, B)/vol(B). Unnormalized cuts are attracted to outliers: isolating a single point cuts few edges, but its tiny volume makes the normalized objective large.
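
Building on the helpers above, the normalized objective divides the cut by each side's volume, which is what penalizes outlier-only partitions (a sketch):

```python
def ncut(W, A, B):
    """Ncut(A, B) = cut(A,B)/vol(A) + cut(A,B)/vol(B) (Shi & Malik)."""
    c = cut(W, A, B)
    return c / volume(W, A) + c / volume(W, B)

# A side containing only an outlier has tiny volume, so its Ncut term
# blows up; this is why normalized cuts resist splitting off outliers.
print(ncut(W, [0], [1, 2, 3, 4]), ncut(W, [0, 1], [2, 3, 4]))
```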

Graph Definitions:
ε-neighborhood graph: identify a threshold value, ε, and include an edge if the affinity between two points is greater than ε.
k-nearest neighbors: insert edges between a node and its k nearest neighbors; each node will be connected to (at least) k nodes.
Fully connected: insert an edge between every pair of nodes.
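
A minimal sketch of the k-nearest-neighbor construction, symmetrized so the graph stays undirected; the ε-neighborhood and fully connected variants amount to a different thresholding rule on the same affinity matrix:

```python
import numpy as np

def knn_graph(W_full, k=2):
    """Keep each node's k strongest affinities, then symmetrize."""
    n = W_full.shape[0]
    W = np.zeros_like(W_full)
    for i in range(n):
        # Indices of the k largest affinities for node i (the diagonal was
        # zeroed when W_full was built, so i never selects itself).
        nbrs = np.argsort(W_full[i])[-k:]
        W[i, nbrs] = W_full[i, nbrs]
    return np.maximum(W, W.T)  # undirected: keep an edge if either endpoint chose it

W_knn = knn_graph(W, k=2)  # W from the Gaussian affinity sketch
```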

Spectral Clustering Example - Minimum Cut: [Figure: the five-node height-weight similarity graph with its minimum cut indicated.]

Spectral Clustering Example - Normalized Minimum Cut: [Figure: the same graph with its normalized minimum cut indicated.]

Problem: Identifying a minimum normalized cut is NP-hard. There are efficient approximations using linear algebra, based on the Laplacian matrix, or graph Laplacian.

Spectral Clustering: Construct an affinity matrix. [Figure: a four-node graph (A, B, C, D) with weighted edges and its corresponding 4x4 affinity matrix.]

Spectral Clustering: Construct the graph Laplacian and identify the eigenvectors of the Laplacian.
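
A sketch of both steps with NumPy, using the unnormalized Laplacian L = D - W defined on a later slide:

```python
import numpy as np

def graph_laplacian(W):
    """Unnormalized graph Laplacian L = D - W."""
    D = np.diag(W.sum(axis=1))
    return D - W

L = graph_laplacian(W_knn)  # W_knn from the kNN graph sketch
# eigh handles symmetric matrices; eigenvalues come back sorted ascending.
eigvals, eigvecs = np.linalg.eigh(L)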

Spectral Clustering: Run k-means on the eigenvector transformation of the data, then project back to the initial data representation. Stacking the first k eigenvectors as columns gives an n x k matrix (n data points by k eigenvectors); each row represents a data point in the eigenvector space.

Overview: what are we doing?
1. Define the affinity matrix
2. Identify eigenvalues and eigenvectors
3. Run k-means on the transformed data
4. Project back to the original space
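
Putting the four steps together, a minimal end-to-end sketch; scikit-learn's KMeans is assumed for step 3, and sigma is again a free parameter:

```python
import numpy as np
from sklearn.cluster import KMeans

def spectral_clustering(X, k, sigma=1.0):
    """Unnormalized spectral clustering: affinity -> Laplacian -> eigenvectors -> k-means."""
    # 1. Affinity matrix (Gaussian similarity, as defined earlier).
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    # 2. Graph Laplacian and its eigendecomposition (eigenvalues ascending).
    L = np.diag(W.sum(axis=1)) - W
    _, vecs = np.linalg.eigh(L)
    # 3. Embed: rows of the first k eigenvectors form the transformed data (n x k).
    U = vecs[:, :k]
    # 4. K-means in the embedded space; the labels map back to the original points.
    return KMeans(n_clusters=k, n_init=10).fit_predict(U)

labels = spectral_clustering(X, k=2, sigma=4.0)  # X from the affinity sketch
```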

Why does this work? Ideal case: when the graph splits into k disconnected clusters, the Laplacian is block-diagonal and the indicator vectors of the components are eigenvectors with eigenvalue 0. What are we optimizing, and why do the eigenvectors of the Laplacian include cluster identification information?

Why does this work? How does the eigenvector decomposition address this? For any vector f, f^T L f = (1/2) Σ_{i,j} w_ij (f_i - f_j)^2. If f encodes a cluster assignment, this quadratic form is a cluster objective function, so if we let f be eigenvectors of L, the eigenvalues are the values of that cluster objective. With the right normalization, the objective is the normalized cut!
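
The identity f^T L f = (1/2) Σ_{i,j} w_ij (f_i - f_j)^2 is easy to verify numerically; a sketch on a random symmetric weight matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
W_rand = rng.random((n, n))
W_rand = (W_rand + W_rand.T) / 2     # symmetric weights
np.fill_diagonal(W_rand, 0.0)
L_rand = np.diag(W_rand.sum(axis=1)) - W_rand

f = rng.standard_normal(n)
lhs = f @ L_rand @ f
rhs = 0.5 * np.sum(W_rand * (f[:, None] - f[None, :]) ** 2)
assert np.isclose(lhs, rhs)  # holds for any f: separating heavy edges costs more
```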

Normalized Graph Cuts view: minimal (bipartitional) normalized cut. The eigenvectors of the Laplacian are approximate (relaxed) solutions to the normalized min-cut problem.

The Laplacian Matrix: L = D - W. It is positive semi-definite; the lowest eigenvalue is 0, and its eigenvector is the constant vector of all ones. The second-lowest eigenvalue contains the solution: the corresponding eigenvector contains the cluster indicator for each data point.
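
Each property can be checked directly (a sketch, reusing L_rand from the previous check):

```python
import numpy as np

vals, vecs = np.linalg.eigh(L_rand)     # eigenvalues sorted ascending
assert np.all(vals >= -1e-10)           # positive semi-definite (up to float error)
assert np.isclose(vals[0], 0.0)         # smallest eigenvalue is 0...
ones = np.ones(L_rand.shape[0])
assert np.allclose(L_rand @ ones, 0.0)  # ...with the constant vector as eigenvector
fiedler = vecs[:, 1]                    # second eigenvector: the relaxed cluster indicator
```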

Using eigenvectors to partition: Each eigenvector partitions the data set into two clusters. The sign of each entry in the second eigenvector determines the first cut. Subsequent eigenvectors can be used to further partition the data into more sets.
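
The first cut is then a one-line threshold on the second (Fiedler) eigenvector; thresholding at zero is shown here, though the median is a common alternative (a sketch):

```python
import numpy as np

def fiedler_bipartition(W):
    """Split a graph in two using the sign of the second Laplacian eigenvector."""
    L = np.diag(W.sum(axis=1)) - W
    _, vecs = np.linalg.eigh(L)
    return vecs[:, 1] >= 0  # boolean cluster assignment per node

print(fiedler_bipartition(W_knn))  # W_knn from the kNN graph sketch
```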

Example: Dense clusters with some sparse connections between them.

3-Class Example: affinity matrix → eigenvectors → row normalization → output. See http://ranger.uta.edu/~chqding/Spectral/spectralA.pdf, page 8.

Example [Ng et al. 2001]

k-means vs. Spectral Clustering

Random Walk View of Clustering: In a random walk, you start at a node and move to another node with some probability. The intuition is that if two nodes are in the same cluster, a random walk starting at one is likely to reach the other.

Random Walk View of Clustering: Transition matrix P = D^{-1} W, i.e., p_ij = w_ij / d_i. The transition probability is the weight of the given transition scaled by the inverse degree of the current node.
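
Concretely, a sketch of the transition matrix P = D^{-1} W (assumes every node has at least one edge, as in the kNN graph):

```python
import numpy as np

def transition_matrix(W):
    """Random-walk transition matrix P = D^{-1} W; each row sums to 1."""
    d = W.sum(axis=1)
    return W / d[:, None]

P = transition_matrix(W_knn)
assert np.allclose(P.sum(axis=1), 1.0)
```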

Using minimum cut for semi-supervised classification? Construct a graph representation of the unseen data. Insert imaginary nodes s and t connected to the labeled points with infinite similarity. Treat the min cut as a maximum-flow problem from s to t.
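
A sketch using NetworkX's min-cut routine; "infinite" similarity is approximated with a very large capacity, each undirected edge becomes two directed arcs, and the function name and seed labels are illustrative:

```python
import networkx as nx

def mincut_classify(W, pos_nodes, neg_nodes, big=1e9):
    """Semi-supervised 2-class labeling via an s-t minimum cut (max-flow duality)."""
    n = W.shape[0]
    G = nx.DiGraph()
    for i in range(n):
        for j in range(n):
            if i != j and W[i, j] > 0:
                G.add_edge(i, j, capacity=float(W[i, j]))  # undirected edge -> two arcs
    for i in pos_nodes:
        G.add_edge('s', i, capacity=big)  # "infinite" similarity to the source
    for j in neg_nodes:
        G.add_edge(j, 't', capacity=big)  # "infinite" similarity to the sink
    _, (s_side, t_side) = nx.minimum_cut(G, 's', 't')
    return s_side, t_side  # nodes on the s side inherit the positive label

pos_side, neg_side = mincut_classify(W_knn, pos_nodes=[0], neg_nodes=[4])
```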

Kernel Method: The weight between two nodes is defined as a function of the two data points. Whenever we have this, we can use any valid kernel.
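
For instance, a polynomial kernel can stand in for the Gaussian weights; a sketch (the degree and offset c are illustrative, and for non-negative features this yields non-negative weights):

```python
import numpy as np

def polynomial_affinity(X, degree=2, c=1.0):
    """Kernel-as-affinity: w_ij = (x_i . x_j + c)^degree."""
    W = (X @ X.T + c) ** degree
    np.fill_diagonal(W, 0.0)
    return W

W_poly = polynomial_affinity(X)  # drop-in replacement for the Gaussian affinity
```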

Today: Graph representations of data sets for clustering; spectral clustering.

Next Time: Evaluation of classification and clustering.