Published by Dale Moreno. Modified over 2 years ago.

1
The Stability of a Good Clustering
Marina Meila
University of Washington
mmp@stat.washington.edu

2
"Optimizing these criteria is NP-hard" (worst case)
...but "spectral clustering, K-means work well when a good clustering exists" (interesting case)
Data: similarities
Objective: [...]
Algorithm: spectral clustering, K-means
This talk:
  If a "good" clustering exists, it is "unique"
  If a "good" clustering is found, it is provably good

3
Results summary
Given:
  objective = NCut, K-means distortion
  data [...]
  clustering Y with K clusters
Spectral lower bound on distortion: [...]
If [...] is small
Then [...] is small
where [...] = best clustering with K clusters

4
A graphical view
[Figure: distortion plotted over clusterings, with the lower bound marked]

5
Overview
  Introduction
  Matrix representations for clusterings
  Quadratic representation for clustering cost
  The misclassification error distance
  Results for NCut (easier)
  Results for K-means distortion (harder)
  Discussion

6
Clusterings as matrices
Clustering of {1, 2, ..., n} with K clusters (C_1, C_2, ..., C_K)
Represented by an n x K matrix:
  unnormalized: [...]
  normalized: [...]
All matrices have orthogonal columns
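The matrix definitions on this slide did not survive extraction. As a minimal sketch, assuming the common convention that the unnormalized matrix has a 1 in row i, column k when point i belongs to C_k, and that the normalized matrix scales column k by 1/sqrt(n_k), the orthogonality claim can be checked as follows. The function name `indicator_matrices` and the example labels are hypothetical, and this unweighted scaling may differ from the volume-weighted version the talk uses for NCut.

```python
import numpy as np

def indicator_matrices(labels, K):
    """Return (unnormalized, normalized) n x K cluster indicator matrices.

    Assumes labels are integers in 0..K-1 and every cluster is non-empty.
    """
    labels = np.asarray(labels)
    n = labels.shape[0]
    X_unnorm = np.zeros((n, K))
    X_unnorm[np.arange(n), labels] = 1.0   # row i has a single 1, in column labels[i]
    sizes = X_unnorm.sum(axis=0)           # cluster sizes n_k
    X_norm = X_unnorm / np.sqrt(sizes)     # scale column k by 1/sqrt(n_k)
    return X_unnorm, X_norm

labels = [0, 0, 1, 2, 2, 2]                # hypothetical clustering of 6 points
X_unnorm, X_norm = indicator_matrices(labels, K=3)
gram_unnorm = X_unnorm.T @ X_unnorm        # diagonal matrix of cluster sizes
gram_norm = X_norm.T @ X_norm              # identity: columns are orthonormal
```

Under this convention the unnormalized columns are orthogonal and the normalized columns are orthonormal.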

7
Distortion is quadratic in X
  NCut: [...] (data: similarities)
  K-means: [...]
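The formulas on this slide were lost. For the K-means case, one standard identity (stated here as an assumption about what the slide showed) is distortion = trace(A) - trace(X^T A X), where A is the Gram matrix of the data and X is the normalized indicator matrix. The sketch below checks that identity numerically on hypothetical random data; the names `Z`, `direct`, and `quadratic` are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
Z = rng.normal(size=(8, 3))                # hypothetical data: 8 points in 3 dimensions
labels = np.array([0, 0, 0, 1, 1, 2, 2, 2])
K = 3

# Direct definition: sum of squared distances from each point to its cluster mean.
direct = sum(
    ((Z[labels == k] - Z[labels == k].mean(axis=0)) ** 2).sum() for k in range(K)
)

# Quadratic form: trace(A) - trace(X^T A X), with A the Gram matrix of the data
# and X the normalized indicator matrix (column k scaled by 1/sqrt(n_k)).
A = Z @ Z.T
X = np.zeros((len(labels), K))
X[np.arange(len(labels)), labels] = 1.0
X /= np.sqrt(X.sum(axis=0))
quadratic = np.trace(A) - np.trace(X.T @ A @ X)
```

The two values agree up to floating-point error, which is what "quadratic in X" means for the K-means cost in this convention.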

8
The Confusion Matrix
Two clusterings:
  (C_1, C_2, ..., C_K) with [...]
  (C'_1, C'_2, ..., C'_K') with [...]
Confusion matrix (K x K') = [m_kk'], rows indexed by k, columns by k'
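The entry definition was lost in extraction. A minimal sketch, assuming the usual definition m_kk' = |C_k ∩ C'_k'| (the number of points shared by cluster k of the first clustering and cluster k' of the second), is given below. The function name `confusion_matrix` and the example labels are hypothetical, and the counts are unweighted.

```python
def confusion_matrix(labels_a, labels_b, K, K_prime):
    """m[k][k'] = number of points in cluster k of the first clustering
    and in cluster k' of the second clustering."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both clusterings must label the same n points")
    m = [[0] * K_prime for _ in range(K)]
    for a, b in zip(labels_a, labels_b):
        m[a][b] += 1
    return m

C = [0, 0, 0, 1, 1, 2]          # hypothetical clustering with K = 3
C_prime = [0, 0, 1, 1, 1, 1]    # hypothetical clustering with K' = 2
m = confusion_matrix(C, C_prime, K=3, K_prime=2)
row_sums = [sum(row) for row in m]   # sizes of the clusters in C
```

Row sums recover the cluster sizes of the first clustering and column sums those of the second.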

9
The Misclassification Error distance
[...]
computed by the maximal bipartite matching algorithm between clusters
[Figure: confusion matrix with rows k and columns k'; matched entries give the classification error]
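The defining formula is missing from the transcript. As a sketch, assume the common unweighted form d_CE = 1 - (1/n) max over matchings of the sum of matched confusion-matrix entries, with K = K'. The code below finds the best matching by brute force over permutations, which is a stand-in for the maximal bipartite matching algorithm the slide names and is only practical for small K. The function name and labels are hypothetical.

```python
from itertools import permutations

def misclassification_error(labels_a, labels_b, K):
    """Unweighted misclassification error between two K-clusterings.

    Brute-force search over cluster matchings; an assignment solver would
    replace this loop for larger K.
    """
    n = len(labels_a)
    m = [[0] * K for _ in range(K)]
    for a, b in zip(labels_a, labels_b):
        m[a][b] += 1
    # Best one-to-one matching of clusters: maximize the number of agreeing points.
    best = max(
        sum(m[k][perm[k]] for k in range(K)) for perm in permutations(range(K))
    )
    return 1.0 - best / n

C = [0, 0, 0, 1, 1, 2]
C_prime = [1, 1, 0, 0, 0, 2]    # same structure up to relabeling, one point moved
d = misclassification_error(C, C_prime, K=3)
d_same = misclassification_error(C, C, K=3)
```

Here five of the six points agree under the best matching, so the distance is 1/6, and a clustering is at distance 0 from itself.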

10
Results for NCut
Given: data A (n x n), clustering X (n x K)
Lower bound for NCut (M02, YS03, BJ03): [...]
Upper bound for [...] (MSX'05): [...] whenever [...]
where [...] are the largest e-values of A

11
Relaxed minimization for [...] s.t. X = n x K orthogonal matrix
Solution: X* = K principal e-vectors of A
Proof outline:
  [...] small w.r.t. eigengap λ_{K+1} - λ_K  =>  X close to X*
  Two clusterings X, X' close to X*  =>  trace X^T X' large  =>  [...] small (convexity proof)
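The relaxation can be illustrated numerically. The sketch below assumes A = D^{-1/2} S D^{-1/2} for a symmetric similarity matrix S with degree matrix D, under which NCut(X) = K - trace(X^T A X) and the relaxed minimum is K minus the sum of the K largest eigenvalues of A. The transcript does not confirm that this is the exact A used in the talk, and the function names and the toy matrix S are hypothetical.

```python
import numpy as np

def ncut(S, labels, K):
    """Normalized cut: sum over clusters of cut(C_k, rest) / vol(C_k)."""
    deg = S.sum(axis=1)
    total = 0.0
    for k in range(K):
        inside = labels == k
        cut = S[inside][:, ~inside].sum()      # similarity leaving cluster k
        total += cut / deg[inside].sum()       # divided by the cluster volume
    return total

def spectral_lower_bound(S, K):
    """Relaxed minimum K - (sum of K largest eigenvalues of A), and X*."""
    deg = S.sum(axis=1)
    A = S / np.sqrt(np.outer(deg, deg))        # D^{-1/2} S D^{-1/2}
    eigvals, eigvecs = np.linalg.eigh(A)       # eigenvalues in ascending order
    X_star = eigvecs[:, -K:]                   # K principal eigenvectors
    return K - eigvals[-K:].sum(), X_star

rng = np.random.default_rng(0)
labels = np.array([0, 0, 0, 0, 1, 1, 1, 1])
noise = 0.05 * rng.random((8, 8))
S = (noise + noise.T) / 2                      # small symmetric noise
S += (labels[:, None] == labels[None, :]).astype(float)  # two strong blocks

bound, X_star = spectral_lower_bound(S, K=2)
value = ncut(S, labels, K=2)
```

For this nearly block-diagonal S, `value` sits just above `bound`, which is the regime the talk's stability results address.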

12
Distances between clusterings
The "χ²" distance
Pearson's χ² functional: [...]
  1 ≤ χ² ≤ K
  χ²(C, C') = K iff C = C'
  minimum at independence
define "distance" (not a metric): [...]
a variant used by Bach & Jordan 03, Hubert & Arabie 85
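The formula for the functional is missing from the transcript. A minimal sketch, assuming the standard unweighted form χ²(C, C') = Σ_{k,k'} m_kk'² / (n_k n'_k') built from the confusion matrix, reproduces the two extremes listed on the slide: the value K for identical clusterings and the minimum 1 at independence. The function name and example labels are hypothetical, and the talk's volume-weighted version may differ.

```python
def chi2_functional(labels_a, labels_b, K, K_prime):
    """Sum over (k, k') of m_kk'^2 / (n_k * n'_k'), assuming non-empty clusters."""
    m = [[0] * K_prime for _ in range(K)]
    for a, b in zip(labels_a, labels_b):
        m[a][b] += 1
    n_a = [sum(row) for row in m]                                      # sizes n_k
    n_b = [sum(m[k][kp] for k in range(K)) for kp in range(K_prime)]   # sizes n'_k'
    return sum(
        m[k][kp] ** 2 / (n_a[k] * n_b[kp])
        for k in range(K)
        for kp in range(K_prime)
    )

# Identical clusterings: the functional reaches its maximum K.
same = chi2_functional([0, 0, 1, 1, 2, 2], [0, 0, 1, 1, 2, 2], 3, 3)

# Independent clusterings: every cell holds n_k * n'_k' / n points, giving 1.
indep = chi2_functional([0, 0, 1, 1], [0, 1, 0, 1], 2, 2)
```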

14
"Stability" of the best clustering
χ² is Pearson's statistic
  0 ≤ χ² ≤ K-1
  χ²(C, C') = K-1 iff C = C'
  measures how "close" two clusterings are
define "distance": [...]
Theorem (M & Xu, 03): For any S and any clusterings C, C' with K clusters, [...]

15
Stability Theorem 2
Let [...] be two clusterings with [...]. Then [...], with [...].
Proof: linear algebra, convexity of χ²
Tighter bounds possible
[Figure: axes d_CE and d_χ²]

16
Tighter bounds
[Figure: two panels, "C non-uniform" and "C uniform"; axes d_CE and d_χ²]

17
Why the eigengap matters
Example: A has 3 diagonal blocks, K = 2
gap(C) = gap(C') = 0, but C, C' are not close
[Figure: the clusterings C and C']

18
Remarks on stability results
No explicit conditions on S
  Different flavor from other stability results, e.g., Kannan et al. 00 and Ng et al. 01, which assume S is "almost" block diagonal
But... results apply only if a good clustering is found
  There are S matrices for which no clustering satisfies the theorem
Bound depends on aggregate quantities like K and cluster sizes (= probabilities)
Points are weighted by their volumes (degrees)
  good in some applications
  bounds for unweighted distances can be obtained

19
Is the bound ever informative? An experiment: S perfect + additive noise

20
K-means distortion
We can do the same...
...but the K-th principal subspace is typically not stable
[Figure: K = 4, dim = 30]

21
New approach: Use K-1 vectors
Non-redundant representation Y: [...]
Distortion – new expression: [...]
...and new (relaxed) optimization problem: [...]

22
Solution of the new problem
Relaxed optimization problem: [...] given [...]
Solution: [...]
  U = K-1 principal e-vectors of A
  W = K x K orthogonal matrix with [...] on first row

23
Solve relaxed minimization
  [...] small  =>  Y close to Y*
  Clusterings Y, Y' close to Y*  =>  ||Y^T Y'||_F large
  ||Y^T Y'||_F large  =>  [...] small

24
Theorem
For any two clusterings Y, Y' with [...](Y), [...](Y') > 0: [...] whenever [...]
Corollary: Bound for d(Y, Y_opt)

25
Experiments
20 replicates, K = 4, dim = 30
[Figure: true error and bound; axis label p_min]

27
[Figure: labels B, A, D]

28
Conclusions
First (?) distribution independent bounds on the clustering error
  data dependent
  hold when data well clustered (this is the case of interest)
Tight? – not yet...
In addition
  Improved variational bound for the K-means cost
  Showed local equivalence between "misclassification error" distance and "Frobenius norm distance" (also known as χ² distance)
Related work
  Bounds for mixtures of Gaussians (Dasgupta, Vempala)
  Nearest K-flat to n points (Tseng)
  Variational bounds for sparse PCA (Moghaddam)
