Published by Dale Moreno. Modified over 2 years ago.

1
The Stability of a Good Clustering
Marina Meila
University of Washington
mmp@stat.washington.edu

2
Pipeline: data (similarities) → objective → algorithm (spectral clustering, K-means)
Optimizing these criteria is NP-hard (the worst case)...
...but “spectral clustering and K-means work well when a good clustering exists” (the interesting case).
This talk:
If a “good” clustering exists, it is “unique”.
If a “good” clustering is found, it is provably good.

3
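Results summary
Given: an objective $\mathcal D$ (NCut or K-means distortion), the data, and a clustering $Y$ with $K$ clusters.
Compute a spectral lower bound $\mathcal D^{low}$ on the distortion.
If $\delta = \mathcal D(Y) - \mathcal D^{low}$ is small, then $d(Y, Y_{opt})$ is small, where $Y_{opt}$ = the best clustering with $K$ clusters.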

4
A graphical view (figure: distortion over the space of clusterings, with the spectral lower bound)

5
Overview
Introduction
Matrix representations for clusterings
Quadratic representation for clustering cost
The misclassification error distance
Results for NCut (easier)
Results for K-means distortion (harder)
Discussion

6
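Clusterings as matrices
A clustering $\mathcal C = (C_1, C_2, \dots, C_K)$ of $\{1, 2, \dots, n\}$ with $K$ clusters is represented by an $n \times K$ matrix:
unnormalized: $\tilde X_{ik} = 1$ if $i \in C_k$, and 0 otherwise;
normalized: columns of unit length, e.g. $X_{ik} = 1/\sqrt{|C_k|}$ for K-means and $X_{ik} = \sqrt{d_i / \mathrm{vol}(C_k)}$ for NCut (points weighted by their degrees $d_i$), for $i \in C_k$, and 0 otherwise.
All these matrices have orthogonal columns.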
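A minimal NumPy sketch of the two representations, assuming uniform point weights (the K-means case) and integer labels $0, \dots, K-1$ with no empty cluster. The function name `clustering_matrices` is hypothetical.

```python
import numpy as np

def clustering_matrices(labels, K):
    """Return (X_tilde, X) for a clustering given as integer labels 0..K-1.

    X_tilde: n x K 0/1 indicator matrix (unnormalized).
    X:       columns of X_tilde rescaled to unit length, X_ik = 1/sqrt(|C_k|).
    """
    labels = np.asarray(labels)
    n = labels.size
    X_tilde = np.zeros((n, K))
    X_tilde[np.arange(n), labels] = 1.0
    sizes = X_tilde.sum(axis=0)          # |C_1|, ..., |C_K|; assumed all > 0
    X = X_tilde / np.sqrt(sizes)         # broadcast: divide column k by sqrt(|C_k|)
    return X_tilde, X

labels = [0, 0, 1, 2, 2, 2]
X_tilde, X = clustering_matrices(labels, K=3)
# Columns are orthogonal; after normalization they are orthonormal: X^T X = I_K.
print(np.allclose(X.T @ X, np.eye(3)))   # True
print(X_tilde.T @ X_tilde)               # diagonal matrix of cluster sizes 2, 1, 3
```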

7
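Distortion is quadratic in X (using the normalized X for each objective)
NCut: $\mathrm{NCut}(\mathcal C) = K - \operatorname{trace}(X^T A X)$, with $A = D^{-1/2} S D^{-1/2}$, where $S$ is the matrix of similarities and $D = \operatorname{diag}(d_1, \dots, d_n)$ holds the degrees $d_i = \sum_j S_{ij}$.
K-means: $\mathcal D(\mathcal C) = \operatorname{trace} A - \operatorname{trace}(X^T A X)$, with $A$ the Gram matrix of the data (similarities = inner products).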
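A numerical check of the two quadratic identities as reconstructed here: NCut $= K - \operatorname{trace}(X^T A X)$ with $A = D^{-1/2} S D^{-1/2}$ and $X_{ik} = \sqrt{d_i/\mathrm{vol}(C_k)}$, and K-means distortion $= \operatorname{trace} A - \operatorname{trace}(X^T A X)$ with $A$ the Gram matrix and $X_{ik} = 1/\sqrt{|C_k|}$. The random data and variable names are illustrative assumptions; the sketch compares each direct definition with its trace form.

```python
import numpy as np

rng = np.random.default_rng(0)
n, K = 12, 3
labels = np.arange(n) % K                      # every cluster non-empty
ind = np.zeros((n, K)); ind[np.arange(n), labels] = 1.0

# NCut: S symmetric nonnegative similarities, d_i = sum_j S_ij.
S = rng.random((n, n)); S = (S + S.T) / 2
d = S.sum(axis=1)
vol = ind.T @ d                                # vol(C_k) = sum of degrees in C_k
A = S / np.sqrt(np.outer(d, d))                # A = D^{-1/2} S D^{-1/2}
X = ind * np.sqrt(d)[:, None] / np.sqrt(vol)   # X_ik = sqrt(d_i / vol(C_k))
cut = np.array([S[labels == k][:, labels != k].sum() for k in range(K)])
ncut_direct = np.sum(cut / vol)                # sum_k cut(C_k, rest) / vol(C_k)
ncut_quadratic = K - np.trace(X.T @ A @ X)

# K-means: data Z (n x 5); the Gram matrix Z Z^T plays the role of A.
Z = rng.normal(size=(n, 5))
Xk = ind / np.sqrt(ind.sum(axis=0))            # X_ik = 1/sqrt(|C_k|)
centers = (ind.T @ Z) / ind.sum(axis=0)[:, None]
dist_direct = np.sum((Z - centers[labels]) ** 2)
G = Z @ Z.T
dist_quadratic = np.trace(G) - np.trace(Xk.T @ G @ Xk)

print(np.isclose(ncut_direct, ncut_quadratic), np.isclose(dist_direct, dist_quadratic))
```

Both printed values should be `True`; the identities are exact, so any mismatch would point to a bug in the representation.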

8
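The Confusion Matrix
Two clusterings: $\mathcal C = (C_1, C_2, \dots, C_K)$ with cluster proportions $p_k = |C_k|/n$, and $\mathcal C' = (C'_1, C'_2, \dots, C'_{K'})$ with $p'_{k'} = |C'_{k'}|/n$.
Confusion matrix $M$ ($K \times K'$): $m_{kk'} = |C_k \cap C'_{k'}| / n$.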
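A short sketch of the confusion matrix, assuming uniform point weights so that entries are proportions $|C_k \cap C'_{k'}|/n$ and the row and column sums recover $p_k$ and $p'_{k'}$. The helper name `confusion_matrix` is hypothetical (it is not the scikit-learn function of the same name, which returns raw counts).

```python
import numpy as np

def confusion_matrix(labels, labels2, K, K2):
    """m[k, k2] = |C_k ∩ C'_k2| / n (uniform point weights assumed)."""
    labels, labels2 = np.asarray(labels), np.asarray(labels2)
    M = np.zeros((K, K2))
    np.add.at(M, (labels, labels2), 1.0)   # count co-occurring label pairs
    return M / labels.size

M = confusion_matrix([0, 0, 1, 1, 2, 2], [1, 1, 0, 0, 0, 2], 3, 3)
p, p2 = M.sum(axis=1), M.sum(axis=0)       # cluster proportions p_k and p'_k'
print(M)
print(p, p2)
```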

9
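The Misclassification Error distance
$d_{CE}(\mathcal C, \mathcal C') = 1 - \max_{\pi} \sum_k m_{k\pi(k)}$, the maximum taken over one-to-one matchings $\pi$ between the clusters of $\mathcal C$ and those of $\mathcal C'$.
It is computed by a maximal bipartite matching algorithm on the confusion matrix, and equals the classification error when one clustering is taken as the true labels.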
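A sketch of $d_{CE}$ from a confusion matrix of proportions. For clarity it enumerates all one-to-one matchings by brute force, which is exact but only practical for small $K$; for larger $K$ a Hungarian-type solver such as `scipy.optimize.linear_sum_assignment` computes the same maximum-weight matching in polynomial time. The function name is hypothetical.

```python
import itertools
import numpy as np

def misclassification_error(M):
    """d_CE = 1 - max over one-to-one matchings pi of sum_k M[k, pi(k)].

    M is a K x K' confusion matrix of proportions. Brute force over all
    injective matchings: exact, but only practical for small K.
    """
    M = np.asarray(M)
    if M.shape[0] > M.shape[1]:
        M = M.T                              # match the smaller side into the larger
    K, K2 = M.shape
    best = max(sum(M[k, pi[k]] for k in range(K))
               for pi in itertools.permutations(range(K2), K))
    return 1.0 - best

# Rows: clustering C; columns: clustering C' (entries sum to 1).
M = np.array([[0.30, 0.05, 0.00],
              [0.00, 0.05, 0.25],
              [0.05, 0.25, 0.05]])
print(misclassification_error(M))   # approximately 0.2 = 1 - (0.30 + 0.25 + 0.25)
```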

10
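Results for NCut
Given: data $A$ ($n \times n$) and a clustering $X$ ($n \times K$).
Lower bound for NCut (M02, YS03, BJ03): $\mathrm{NCut}(X) \ge K - \sum_{k=1}^{K} \lambda_k$, where $\lambda_1 \ge \lambda_2 \ge \dots$ are the largest eigenvalues of $A$.
Upper bound for $d_{CE}(X, X_{opt})$ (MSX’05), valid whenever $\delta = \mathrm{NCut}(X) - (K - \sum_{k=1}^{K} \lambda_k)$ is small relative to the eigengap $\lambda_K - \lambda_{K+1}$.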
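A sketch that computes the quantities the NCut result is phrased in: the NCut of a given clustering, the spectral lower bound $K - \sum_{k \le K} \lambda_k(A)$, the gap $\delta$ between them, and the eigengap $\lambda_K - \lambda_{K+1}$. It does not reproduce the MSX’05 upper bound itself. The near-block-diagonal similarity matrix is made-up test data, and `ncut_and_bound` is a hypothetical name.

```python
import numpy as np

def ncut_and_bound(S, labels, K):
    """Return (NCut, spectral lower bound, delta, eigengap) for a clustering of S.

    Lower bound: K - (lambda_1 + ... + lambda_K), lambda = eigenvalues of
    A = D^{-1/2} S D^{-1/2} in decreasing order. Eigengap: lambda_K - lambda_{K+1}.
    """
    d = S.sum(axis=1)
    A = S / np.sqrt(np.outer(d, d))
    lam = np.linalg.eigvalsh(A)[::-1]          # eigvalsh returns ascending order
    lower = K - lam[:K].sum()
    ncut = 0.0
    for k in range(K):
        in_k = labels == k
        ncut += S[in_k][:, ~in_k].sum() / d[in_k].sum()   # cut(C_k, rest) / vol(C_k)
    return ncut, lower, ncut - lower, lam[K - 1] - lam[K]

rng = np.random.default_rng(1)
labels = np.repeat([0, 1, 2], 20)
S = np.where(labels[:, None] == labels[None, :], 1.0, 0.05)   # near block-diagonal
S = S + 0.02 * rng.random(S.shape); S = (S + S.T) / 2          # small symmetric noise
ncut, lower, delta, gap = ncut_and_bound(S, labels, K=3)
print(f"NCut={ncut:.4f}  bound={lower:.4f}  delta={delta:.2e}  eigengap={gap:.3f}")
```

On this well-clustered example $\delta$ is tiny compared with the eigengap, which is the regime in which the talk’s bound is meant to apply.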

11
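Relaxed minimization: $\min_X \; K - \operatorname{trace}(X^T A X)$ s.t. $X$ is an $n \times K$ orthogonal matrix. Solution: $X^*$ = the $K$ principal eigenvectors of $A$.
$\delta$ small w.r.t. the eigengap $\lambda_K - \lambda_{K+1}$ ⇒ $X$ close to $X^*$.
Two clusterings $X, X'$ close to $X^*$ ⇒ $\operatorname{trace}(X^T X')$ large ⇒ $d_{CE}(X, X')$ small (convexity proof).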
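The relaxed problem is solved by the principal eigenvectors because of Ky Fan’s maximum principle: over all $n \times K$ matrices with orthonormal columns, $\operatorname{trace}(X^T A X)$ is at most $\lambda_1 + \dots + \lambda_K$, with equality at $X^*$. A small numerical sketch on an arbitrary symmetric matrix (random data, chosen only for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
n, K = 40, 3
B = rng.normal(size=(n, n)); A = (B + B.T) / 2        # any symmetric matrix

lam, V = np.linalg.eigh(A)                            # ascending eigenvalues
X_star = V[:, -K:]                                    # K principal eigenvectors
best = np.trace(X_star.T @ A @ X_star)                # = lambda_1 + ... + lambda_K

# Random n x K matrices with orthonormal columns never do better (Ky Fan).
for _ in range(1000):
    Q, _ = np.linalg.qr(rng.normal(size=(n, K)))
    assert np.trace(Q.T @ A @ Q) <= best + 1e-9

print(np.isclose(best, lam[-K:].sum()))               # True
```

Minimizing $K - \operatorname{trace}(X^T A X)$ is the same as maximizing the trace, so the minimum of the relaxed NCut objective is exactly the spectral lower bound.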

12
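Distances between clusterings
The “χ²” distance: Pearson’s χ² functional $\chi^2(\mathcal C, \mathcal C') = \sum_{k,k'} \frac{m_{kk'}^2}{p_k \, p'_{k'}}$
$1 \le \chi^2 \le K$; $\chi^2(\mathcal C, \mathcal C') = K$ iff $\mathcal C = \mathcal C'$; minimum at independence ($m_{kk'} = p_k p'_{k'}$).
Define the “distance” $d_{\chi^2}(\mathcal C, \mathcal C') = K - \chi^2(\mathcal C, \mathcal C')$ (not a metric).
A variant is used by Bach & Jordan 03 and Huber & Arabie 85.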
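A sketch of the χ² functional, assuming uniform point weights. It also checks an identity that follows from the definitions (with normalized $X_{ik} = 1/\sqrt{|C_k|}$): $\chi^2(\mathcal C, \mathcal C') = \|X^T X'\|_F^2$, which is why χ² can be read as a Frobenius-norm overlap of the two cluster matrices. Helper names are hypothetical.

```python
import numpy as np

def chi2_functional(M):
    """Pearson chi^2 functional sum_{k,k'} m_kk'^2 / (p_k p'_k') of a confusion matrix M of proportions."""
    p, p2 = M.sum(axis=1), M.sum(axis=0)
    return np.sum(M**2 / np.outer(p, p2))

def normalized_matrix(labels, K):
    """n x K matrix with X_ik = 1/sqrt(|C_k|) for i in C_k."""
    X = np.zeros((len(labels), K)); X[np.arange(len(labels)), labels] = 1.0
    return X / np.sqrt(X.sum(axis=0))

a = np.array([0, 0, 0, 1, 1, 2, 2, 2])
b = np.array([0, 0, 1, 1, 1, 2, 2, 0])
M = np.zeros((3, 3)); np.add.at(M, (a, b), 1.0 / len(a))

chi2 = chi2_functional(M)                      # lies in [1, K]
Xa, Xb = normalized_matrix(a, 3), normalized_matrix(b, 3)
print(np.isclose(chi2, np.linalg.norm(Xa.T @ Xb, "fro") ** 2))  # True
d_chi2 = 3 - chi2                              # the "distance" K - chi^2 (not a metric)
print(chi2, d_chi2)
```

Subtracting 1 from the functional gives Pearson’s statistic, which ranges over $[0, K-1]$.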

14
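Here χ² is Pearson’s statistic: $0 \le \chi^2 \le K-1$, and $\chi^2(\mathcal C, \mathcal C') = K-1$ iff $\mathcal C = \mathcal C'$; it measures how “close” two clusterings are.
Define the “distance” $d_{\chi^2}(\mathcal C, \mathcal C') = K - 1 - \chi^2(\mathcal C, \mathcal C')$.
Theorem (M & Xu, 03): for any $S$ and any clusterings $\mathcal C, \mathcal C'$ with $K$ clusters, …
“Stability” of the best clustering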

15
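Stability Theorem 2
For two clusterings $\mathcal C, \mathcal C'$ satisfying the theorem’s conditions, $d_{CE}(\mathcal C, \mathcal C')$ is bounded above in terms of $d_{\chi^2}(\mathcal C, \mathcal C')$.
Proof: linear algebra and convexity of χ². Tighter bounds are possible.
(Figure: $d_{CE}$ versus $d_{\chi^2}$.)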

16
Tighter bounds (figure: $d_{CE}$ versus $d_{\chi^2}$, for clusterings $\mathcal C$ with uniform and with non-uniform cluster sizes)

17
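Why the eigengap matters
Example: $A$ has 3 diagonal blocks and $K = 2$.
gap($\mathcal C$) = gap($\mathcal C'$) = 0, i.e. both clusterings attain the spectral lower bound, but $\mathcal C$ and $\mathcal C'$ are not close. Here $A$ has eigenvalue 1 with multiplicity 3, so the eigengap $\lambda_2 - \lambda_3$ is 0.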
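The example can be checked numerically. The sketch below assumes three disconnected, equal-sized blocks with constant within-block similarity (sizes and values chosen for illustration). Two different 2-clusterings, each merging a different pair of blocks, both have NCut exactly equal to the lower bound, yet disagree on a whole block.

```python
import numpy as np

# Similarity with 3 disconnected, equal diagonal blocks (10 points each).
blocks = np.repeat([0, 1, 2], 10)
S = (blocks[:, None] == blocks[None, :]).astype(float)
d = S.sum(axis=1)
A = S / np.sqrt(np.outer(d, d))
lam = np.sort(np.linalg.eigvalsh(A))[::-1]
K = 2
lower = K - lam[:K].sum()                       # spectral lower bound on NCut

def ncut(labels):
    return sum(S[labels == k][:, labels != k].sum() / d[labels == k].sum()
               for k in range(K))

C = np.array([0, 0, 1])[blocks]                 # merge blocks 1 and 2
C_prime = np.array([0, 1, 1])[blocks]           # merge blocks 2 and 3
print(lam[:3].round(6))                         # three eigenvalues equal to 1: eigengap 0
print(ncut(C) - lower, ncut(C_prime) - lower)   # both 0 up to rounding
# With K = 2 there are only two matchings, so d_CE is a max over two agreements.
d_ce = 1 - max(np.mean(C == C_prime), np.mean(C == 1 - C_prime))
print(d_ce)                                     # one whole block of 30 points differs
```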

18
Remarks on stability results
No explicit conditions on S.
Different flavor from other stability results, e.g. Kannan et al. 00 and Ng et al. 01, which assume S is “almost” block diagonal.
But… the results apply only if a good clustering is found: there are S matrices for which no clustering satisfies the theorem.
The bound depends on aggregate quantities like K and the cluster sizes (= probabilities).
Points are weighted by their volumes (degrees): good in some applications; bounds for unweighted distances can be obtained.

19
Is the bound ever informative?
An experiment: S = “perfect” similarity matrix + additive noise
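A hypothetical version of such an experiment, not the talk’s actual setup: start from a perfect block-indicator similarity matrix, add symmetric nonnegative noise of increasing amplitude `sigma`, and track $\delta$ (NCut of the true clustering minus the spectral lower bound) and the eigengap. It only shows how these two inputs to the bound move with noise; it does not evaluate the bound itself.

```python
import numpy as np

def delta_and_gap(S, labels, K):
    """delta = NCut(labels) - (K - sum of top-K eigenvalues); eigengap = lambda_K - lambda_{K+1}."""
    d = S.sum(axis=1)
    A = S / np.sqrt(np.outer(d, d))
    lam = np.sort(np.linalg.eigvalsh(A))[::-1]
    ncut = sum(S[labels == k][:, labels != k].sum() / d[labels == k].sum()
               for k in range(K))
    return ncut - (K - lam[:K].sum()), lam[K - 1] - lam[K]

rng = np.random.default_rng(0)
K, m = 4, 25                                   # 4 clusters of 25 points (assumed sizes)
labels = np.repeat(np.arange(K), m)
S_perfect = (labels[:, None] == labels[None, :]).astype(float)
results = []
for sigma in [0.05, 0.2, 0.5]:
    N = rng.random(S_perfect.shape) * sigma
    S = S_perfect + (N + N.T) / 2              # symmetric, nonnegative noise
    delta, gap = delta_and_gap(S, labels, K)
    results.append((sigma, delta, gap))
    print(f"sigma={sigma:4.2f}  delta={delta:.4f}  eigengap={gap:.4f}  ratio={delta / gap:.4f}")
```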

20
K-means distortion
We can do the same...
...but the K-th principal subspace is typically not stable.
(Figure: K = 4, dim = 30.)

21
New approach: use K−1 vectors
Non-redundant representation $Y$ ($n \times (K-1)$)
Distortion: a new expression
...and a new (relaxed) optimization problem
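One way to see why $K-1$ vectors suffice, assuming centered data (so the Gram matrix $G$ maps the constant vector to 0). Since $X\sqrt{p} = \mathbf 1/\sqrt n$, rotating $X$ by an orthogonal $K \times K$ matrix whose first column is $\sqrt p$ splits off the constant column and leaves an $n \times (K-1)$ orthonormal block $Y$ with $Y^T \mathbf 1 = 0$; the distortion then becomes $\operatorname{trace} G - \operatorname{trace}(Y^T G Y)$. This is a reconstruction under those assumptions, and the talk’s exact expression may differ; the QR construction of the rotation is one convenient choice among many.

```python
import numpy as np

rng = np.random.default_rng(3)
n, K = 30, 4
labels = np.arange(n) % K
Z = rng.normal(size=(n, 5)); Z -= Z.mean(axis=0)      # centered data (assumed)
G = Z @ Z.T                                            # Gram matrix; G @ 1 = 0

ind = np.zeros((n, K)); ind[np.arange(n), labels] = 1.0
p = ind.sum(axis=0) / n                                # cluster proportions
X = ind / np.sqrt(ind.sum(axis=0))                     # X_ik = 1/sqrt(|C_k|)

# Orthogonal K x K matrix Q whose first column is sqrt(p) (up to sign), via QR.
Q, _ = np.linalg.qr(np.column_stack([np.sqrt(p), np.eye(K)[:, 1:]]))
XQ = X @ Q                                             # = [±1/sqrt(n), Y]
Y = XQ[:, 1:]                                          # n x (K-1), orthonormal, Y^T 1 = 0

centers = (ind.T @ Z) / ind.sum(axis=0)[:, None]
distortion = np.sum((Z - centers[labels]) ** 2)
print(np.allclose(np.abs(XQ[:, 0]), 1 / np.sqrt(n)),
      np.allclose(Y.T @ np.ones(n), 0),
      np.isclose(distortion, np.trace(G) - np.trace(Y.T @ G @ Y)))
```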

22
Solution of the new problem
Relaxed optimization problem, given …
Solution: $U$ = the $K-1$ principal eigenvectors of $A$; $W$ = a $K \times K$ orthogonal matrix with $(\sqrt{p_1}, \dots, \sqrt{p_K})$ on the first row.
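A sketch of the plain eigenvector relaxation for K-means, assuming centered data and $A$ = the Gram matrix: since any clustering’s $Y$ is $n \times (K-1)$ with orthonormal columns, $\operatorname{trace}(Y^T A Y) \le \lambda_1 + \dots + \lambda_{K-1}$, so the distortion is at least $\operatorname{trace} A - \sum_{k<K} \lambda_k$, attained by $U$. The Gaussian clusters are made-up data (the sizes loosely echo K = 4, dim = 30 from the experiment slides); the talk’s improved variational bound is not modeled here.

```python
import numpy as np

rng = np.random.default_rng(4)
K, m, dim = 4, 50, 30                                  # sizes chosen for illustration
labels = np.repeat(np.arange(K), m)
means = rng.normal(scale=3.0, size=(K, dim))
Z = means[labels] + rng.normal(size=(K * m, dim))
Z -= Z.mean(axis=0)                                    # center the data
G = Z @ Z.T

lam = np.sort(np.linalg.eigvalsh(G))[::-1]
lower = np.trace(G) - lam[:K - 1].sum()                # trace A - (lambda_1 + ... + lambda_{K-1})

ind = np.zeros((K * m, K)); ind[np.arange(K * m), labels] = 1.0
centers = (ind.T @ Z) / m                              # every cluster has m points
distortion = np.sum((Z - centers[labels]) ** 2)
delta = distortion - lower
print(f"distortion={distortion:.1f}  lower bound={lower:.1f}  delta={delta:.2f}  "
      f"eigengap={lam[K - 2] - lam[K - 1]:.1f}")
```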

23
Solve the relaxed minimization ⇒ $Y^*$.
$\delta$ small ⇒ $Y$ close to $Y^*$.
Clusterings $Y, Y'$ close to $Y^*$ ⇒ $\|Y^T Y'\|_F$ large ⇒ $d_{CE}(Y, Y')$ small.
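Why $\|Y^T Y'\|_F$ measures closeness: under uniform point weights, $\|X^T X'\|_F^2$ equals the χ² functional, and splitting off the shared constant column removes exactly 1, so $\|Y^T Y'\|_F^2 = \chi^2 - 1$, Pearson’s statistic in $[0, K-1]$. The sketch checks this numerically (my derivation from the earlier definitions; `y_representation` is a hypothetical helper, and the second clustering is a random perturbation of the first).

```python
import numpy as np

def y_representation(labels, K):
    """Return Y (n x (K-1)): X rotated so that the constant column 1/sqrt(n) is split off."""
    n = len(labels)
    ind = np.zeros((n, K)); ind[np.arange(n), labels] = 1.0
    p = ind.sum(axis=0) / n
    X = ind / np.sqrt(ind.sum(axis=0))
    Q, _ = np.linalg.qr(np.column_stack([np.sqrt(p), np.eye(K)[:, 1:]]))
    return (X @ Q)[:, 1:]

rng = np.random.default_rng(5)
K, n = 3, 60
a = np.arange(n) % K
b = a.copy()
flip = rng.choice(n, size=10, replace=False)
b[flip] = rng.integers(0, K, size=10)                  # perturb 10 labels; no cluster empties

M = np.zeros((K, K)); np.add.at(M, (a, b), 1.0 / n)
pearson = np.sum(M**2 / np.outer(M.sum(axis=1), M.sum(axis=0))) - 1   # in [0, K-1]
Ya, Yb = y_representation(a, K), y_representation(b, K)
print(np.isclose(np.linalg.norm(Ya.T @ Yb, "fro") ** 2, pearson))    # True
```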

24
Theorem: for any two clusterings $Y, Y'$ with $\delta_Y, \delta_{Y'} > 0$, … whenever …
Corollary: a bound for $d(Y, Y_{opt})$.

25
Experiments (figure: true error and bound versus $p_{min}$; 20 replicates, K = 4, dim = 30)


28
Conclusions
First (?) distribution-independent bounds on the clustering error:
data dependent;
hold when the data are well clustered (this is the case of interest).
Tight? Not yet...
In addition:
Improved variational bound for the K-means cost.
Showed local equivalence between the “misclassification error” distance and the “Frobenius norm distance” (also known as the χ² distance).
Related work:
Bounds for mixtures of Gaussians (Dasgupta, Vempala)
Nearest K-flat to n points (Tseng)
Variational bounds for sparse PCA (Moghaddam)
