
1 Learning in Spectral Clustering. Susan Shortreed, Department of Statistics, University of Washington. Joint work with Marina Meilă.

2 Outline: Background (Clustering, Learning); Spectral Clustering; Spectral Learning (Problem Set-Up, Algorithm); Supervised Experimental Results; Unsupervised Experimental Results; Summary and Future Directions

4 Clustering. Goal: find natural groupings in data.

5 Classification. Supervised: labeled training data. Use the training data to decide on classification rules for future data points. (Panels: training data, test data.)

7 Clustering. Unsupervised: no labeled data. Find natural groups in the data.

8 Semi-supervised Clustering. Semi-supervised: a subset of the data is labeled. Use both labeled and unlabeled data to cluster.

9 Clustering Applications. Genetics: group patients using disease type and genetic information. Social networks: group actors to learn about social structure. Document sorting: group documents based on citations and keywords.

11 Variable Weighting/Selection. Data points have many features. Distinguish between features which provide information about the clustering and those which do not. (Panels: X vs. Y scatterplot, density of Y, density of X.)

12 Learning. Supervised: cluster membership is known; use it to learn which features give information about the clustering. Unsupervised: cluster memberships are unknown; learn the clustering as well as the important features. Semi-supervised: cluster memberships are known on a subset of the data; use these along with the unlabeled points to learn the clustering and the importance of features.

14 Spectral Clustering Overview. Construct a similarity matrix; normalize it and obtain its spectrum; cluster the eigenvectors.

15 Pairwise Clustering. Pairwise features between data points. Example features. Social networks: friendship tie, same gender. Image segmentation: intervening contours. (Figure: a four-node graph. Picture Copyright © 1995 Saint Mary's College of California.)

16 Pairwise Clustering. Use the pairwise features to construct pairwise similarities. View the data as a graph: each data point is a node, and edge weights represent similarities. A good clustering assigns similar points to the same cluster and dissimilar points to different clusters.

17 Random Walk Over Data. Volume (degree) of node i: d_i = Σ_j S_ij. Transition probability: P_ij = S_ij / d_i, i.e., P = D^{-1} S with D = diag(d_1, ..., d_n).
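These definitions can be sketched directly; the toy 1-D data and the Gaussian kernel width below are illustrative choices:

```python
import numpy as np

# Toy data: five points on a line, forming two obvious groups.
x = np.array([0.0, 0.1, 0.2, 5.0, 5.1])

# Gaussian pairwise similarity S_ij = exp(-(x_i - x_j)^2 / (2 sigma^2)).
sigma = 0.5
S = np.exp(-(x[:, None] - x[None, :]) ** 2 / (2 * sigma**2))

# Degree (volume) of each node: d_i = sum_j S_ij.
d = S.sum(axis=1)

# Random-walk transition matrix: P = D^{-1} S.
P = S / d[:, None]

print(P.sum(axis=1))  # every row sums to 1: P is a stochastic matrix
```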

18 Cluster Transition Probabilities

19 Out-of-Cluster Transition Probabilities. W_oc sums, over clusters, the probability that the random walk leaves the cluster in one step; we want a clustering that minimizes W_oc.

20 Minimizing W_oc. The exact solution minimizing W_oc is given by the indicator vectors of cluster membership, but finding it is computationally difficult. Relax the constraint to allow continuous vectors: the relaxed problem is minimized by the eigenvectors corresponding to the largest eigenvalues of P. Because this solves only the relaxed problem, the eigenvectors must themselves be clustered to obtain the final clustering.

21 Conditions for Minimization. The eigenvectors of P exactly minimize W_oc when P is block stochastic. P is called block stochastic with respect to a clustering if the transition probability from a point into a cluster is constant over the points of each cluster (Σ_{j∈C_l} P_ij is the same for all i ∈ C_k) and the resulting K×K matrix of cluster-to-cluster transition probabilities is non-singular.

22 Picturing Block Stochastic

23 Spectral Clustering Example. A clustering problem that is difficult for many standard algorithms. Features: the horizontal and vertical axes.

24 Normalizing Similarity

25 Using the Eigenvectors

26 Clustering Results. Spectral algorithm vs. k-means clustering.
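A minimal numpy sketch of a ring example of this kind (the radii, kernel width, and the sign-split on the second eigenvector are illustrative choices; the symmetrized matrix D^{-1/2} S D^{-1/2} has the same eigenvalues as P):

```python
import numpy as np

# Two concentric rings: k-means on raw coordinates splits them by halves,
# while the spectral embedding separates them cleanly.
def ring(radius, n):
    t = np.linspace(0, 2 * np.pi, n, endpoint=False)
    return np.c_[radius * np.cos(t), radius * np.sin(t)]

X = np.vstack([ring(1.0, 40), ring(3.0, 40)])  # points 0..39 inner, 40..79 outer

# Gaussian similarity; sigma small enough that the rings are weakly coupled.
sigma = 0.5
D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
S = np.exp(-D2 / (2 * sigma**2))

# Symmetric normalization: N = D^{-1/2} S D^{-1/2} shares P's spectrum.
d = S.sum(axis=1)
N = S / np.sqrt(d[:, None] * d[None, :])

# Second-largest eigenvector of N, mapped back to an eigenvector of P;
# it is nearly piecewise constant when the clusters are well separated.
w, U = np.linalg.eigh(N)      # eigenvalues in ascending order
v = U[:, -2] / np.sqrt(d)

labels = (v > 0).astype(int)  # sign split stands in for k-means on the embedding
```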

28 Supervised Spectral Learning. The reverse of the clustering problem: the clustering is known and we want to learn the similarities. Learn which features are important to the clustering, use them to create similarities, and input those into the spectral algorithm to get a "good" clustering. A "good" clustering is one close to the optimal clustering which minimizes W_oc.

29 Notation. n data points, K clusters; pairwise features f_ij; θ, the vector of parameters (feature weights); S, the similarity matrix. (Example S shown.)
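A common log-linear parameterization, assumed here for illustration, builds S from weighted pairwise features; setting θ_m = 0 switches feature m off:

```python
import numpy as np

def similarity(F, theta):
    """S_ij = exp(-sum_m theta_m * F[i, j, m]).

    F: (n, n, M) tensor of pairwise features (e.g. absolute attribute
    differences, as in the wine and dermatology experiments).
    theta: (M,) nonnegative feature weights.
    """
    return np.exp(-np.tensordot(F, theta, axes=([2], [0])))

X = np.array([[0.0, 7.0], [0.1, 3.0], [2.0, 5.0]])  # 3 points, 2 attributes
F = np.abs(X[:, None, :] - X[None, :, :])           # pairwise |x_im - x_jm|

S_uniform = similarity(F, np.array([1.0, 1.0]))
S_learned = similarity(F, np.array([1.0, 0.0]))     # second attribute ignored
```

With the second attribute down-weighted, points 0 and 1 (close in attribute 0) become much more similar.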

30 Clustering Quality. Lower bound [1]. [1] Meilă, M. and Shi, J., A Random Walks View of Spectral Segmentation, AISTATS 2001.

31 Clustering Stability. Assume K = 2.

32 Stability Intuition. Define the eigengap Δ_K = λ_K - λ_(K+1). When the eigengap is large: any two clusterings with a small gap will be close, and a clustering with a small gap will be close to the optimal clustering.

33 Stability Theorem and Corollary

35 Cost Function. Find θ which minimizes J, which combines clustering quality and clustering stability.

36 Supervised Learning. The optimal clustering is known; use it and minimize J with respect to θ using line search.
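Gradient descent with a backtracking (Armijo) line search can be sketched as follows; the quadratic J below is only a stand-in for the actual cost:

```python
import numpy as np

def minimize_with_line_search(J, grad_J, theta, steps=50):
    """Gradient descent with backtracking (Armijo) line search."""
    for _ in range(steps):
        g = grad_J(theta)
        t = 1.0
        # Shrink the step until it gives a sufficient decrease in J.
        while J(theta - t * g) > J(theta) - 0.5 * t * (g @ g):
            t *= 0.5
            if t < 1e-12:
                return theta  # no descent direction left (near a minimum)
        theta = theta - t * g
    return theta

# Stand-in objective: J(theta) = ||theta - target||^2, minimized at target.
target = np.array([2.0, -1.0])
J = lambda th: float(((th - target) ** 2).sum())
grad_J = lambda th: 2 * (th - target)

theta_hat = minimize_with_line_search(J, grad_J, np.zeros(2))
```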

37 Computing the Gradient

42 Computing the Gradient. Must compute the derivative of the eigenvalues.

43 Computing the Gradient. Fact: L is symmetric. Theorem: for a simple eigenvalue λ of the symmetric matrix L(θ) with unit eigenvector u, ∂λ/∂θ = u^T (∂L/∂θ) u.
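This first-order identity is easy to verify with a finite-difference check; the small symmetric matrix below is an illustrative stand-in for the normalized similarity matrix:

```python
import numpy as np

def L_of(theta):
    """A small symmetric matrix depending smoothly on a scalar theta."""
    return np.array([[2.0, theta, 0.0],
                     [theta, 1.0, theta],
                     [0.0, theta, 3.0]])

def dL_of(theta):
    """Entrywise derivative of L_of with respect to theta."""
    return np.array([[0.0, 1.0, 0.0],
                     [1.0, 0.0, 1.0],
                     [0.0, 1.0, 0.0]])

theta = 0.3
w, U = np.linalg.eigh(L_of(theta))
u = U[:, -1]                      # unit eigenvector of the largest eigenvalue
analytic = u @ dL_of(theta) @ u   # dlambda/dtheta = u^T (dL/dtheta) u

eps = 1e-6                        # central finite difference for comparison
numeric = (np.linalg.eigh(L_of(theta + eps))[0][-1]
           - np.linalg.eigh(L_of(theta - eps))[0][-1]) / (2 * eps)
```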

44 Choosing α. Grid search over possible α values: for each α, learn the parameters, then choose the α whose learned parameters give the smallest gap-to-eigengap ratio. The algorithm is robust to the choice of α.

46 Experiments: Bull's Eye. Two meaningful features, with noisy features added.

47 Bull's Eye Experiments. Average over 10 test samples of 1000 points each:

n_t | N_dim | CE | gap_θ  | Δ_K    | W_oc
150 |   1   | 0% | 4.4e-5 | 1.1e-3 | 4.9e-3
200 |   2   | 0% | 9.3e-5 | 3.0e-4 | 7.0e-3
400 |   4   | 0% | 7.9e-5 | 5.4e-4 | 6.7e-3
700 |  16   | 0% | 1.1e-4 | 4.0e-4 | 7.2e-3

48 Dermatology. 298 samples [2]; 5 types of erythemato-squamous disease; 34 attributes. Clinical: age, family history, itching. Histopathological: characterize the skin. Pairwise features are absolute differences of individual attribute values. [2] Guvenier, H. and Ilter, N. (1998), UCI Repository of machine learning databases.

49 Dermatology Results. Parameters learned on a training set; mean (std) over 25 test sets:

                | CE          | gap_θ       | Δ_K
Before learning | 0.42 (0.05) | 0.08 (0.02) | 0.02 (0.01)
After learning  | 0.12 (0.18) | 0.03 (0.03) | 0.08 (0.09)

50 Dermatology Parameters

51 Wine Data. 178 samples; 3 types of wine [3]; 13 attributes measured on each wine (chemical properties and heuristics). Pairwise features are absolute differences of individual attribute values. Noisy attributes are permutations of the true attributes. [3] Aeberhard, S., UCI Repository of machine learning databases, July 1991.

52 Wine Supervised. Mean (std) over 25 test sets:

            | Before learning                         | After learning
            | CE          | gap_θ       | Δ_K         | CE          | gap_θ       | Δ_K
No noise    | 0.03 (0.02) | 0.10 (0.01) | 0.13 (0.05) | 0.03 (0.02) | 0.12 (0.02) | 0.25 (0.06)
Noise added | 0.34 (0.12) | 0.06 (0.02) | 0.05 (0.03) | 0.04 (0.05) | 0.08 (0.03) | 0.21 (0.10)

53 Learned Parameters

55 Cost Function. Optimize over θ and clusterings C with an iterative algorithm. C-step: update the clustering. S-step: update the similarity by learning θ.

56 Unsupervised Overview. Initialize θ ← θ_0. While J can still be reduced: Step C: update C^(k) using θ^(k-1); Step S: update θ^(k) using C^(k). Output θ^(k), C^(k). Both Step C and Step S reduce J.
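The alternation can be illustrated with a toy coordinate-descent analogue (not the actual C-step and S-step of this algorithm): a weighted k-means objective in which the C-step reassigns points and the S-step re-learns feature weights, each step non-increasing in J. The entropy regularizer and its closed-form weight update are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data: feature 0 separates two groups, feature 1 is pure noise.
X = np.vstack([np.c_[rng.normal(0.0, 0.3, 50), rng.normal(0, 3, 50)],
               np.c_[rng.normal(5.0, 0.3, 50), rng.normal(0, 3, 50)]])

lam = 1.0  # entropy regularizer keeping the weights strictly positive

def cost(C, centers, w):
    # J = mean weighted within-cluster distance + entropy term on w.
    return ((X - centers[C]) ** 2 @ w).mean() + lam * (w @ np.log(w))

w = np.array([0.5, 0.5])
centers = X[[0, 50]].copy()  # one seed point from each group
J_trace = []
for _ in range(10):
    # C-step: reassign each point to its nearest center under weights w.
    dists = ((X[:, None, :] - centers[None, :, :]) ** 2) @ w
    C = dists.argmin(axis=1)
    centers = np.array([X[C == k].mean(axis=0) if (C == k).any() else centers[k]
                        for k in (0, 1)])
    # S-step: closed-form update w_m ∝ exp(-D_m / lam), where D_m is the
    # mean within-cluster dispersion along feature m (noisy feature shrinks).
    Dm = ((X - centers[C]) ** 2).mean(axis=0)
    w = np.exp(-Dm / lam)
    w = w / w.sum()
    J_trace.append(cost(C, centers, w))
```

Each update is a block-wise minimizer of J, so the recorded J values are non-increasing, mirroring the claim that both steps reduce J.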

57 Unsupervised Adjustments. Only guaranteed to find a small W_oc in a neighborhood around a block stochastic P. Average over sets of clusterings in the beginning, narrowing down the target clusterings as learning proceeds. Because of the uncertainty in the early clustering(s), take fewer learning steps for θ with them.

59 Four Gaussians. Two meaningful features, with noisy features added.

60 Gaussian Results. Average (std) over 15 test sets:

             | Before learning                         | After learning
# noisy dims | CE          | gap_θ       | Δ_K         | CE          | gap_θ       | Δ_K
2            | 0.04 (0.06) | 0.13 (0.01) | 0.09 (0.04) | 0.02 (0.01) | 0.09 (0.01) | 0.31 (0.03)
4            | 0.43 (0.10) | 0.11 (0.05) | 0.04 (0.03) | 0.02 (0.01) | 0.06 (0.01) | 0.32 (0.03)
8            | 0.68 (0.07) | 0.03 (0.04) | 0.02 (0.02) | 0.02 (0.01) | 0.04 (0.01) | 0.29 (0.03)

61 Wine Unsupervised. Mean (std) over 25 test sets:

            | Uniform weights                         | After learning
            | CE          | gap_θ       | Δ_K         | CE          | gap_θ       | Δ_K
No noise    | 0.02 (0)    | 0.11 (0)    | 0.23 (0)    | 0.02 (0)    | 0.11 (0.11) | 0.33 (0.01)
Noise added | 0.08 (0.13) | 0.10 (0.01) | 0.05 (0.03) | 0.03 (0.02) | 0.09 (0.04) | 0.21 (0.10)

62 Learned Parameters

63 Comparing Parameters. Supervised vs. unsupervised.

64 Image Segmentation. Sample of 1780 pixels; 13 features: color, texture, intervening contours, distance.

65 Image Segmentation

67 Summary. Random-walks spectral algorithm. Defined a cost function based on clustering quality and clustering stability. Developed a method for supervised learning; experiments show a reduction in clustering error and the selection of meaningful weights. Extended the learning algorithm to the unsupervised setting with an iterative approach reminiscent of the EM method; experiments show promising results.

68 Future Directions. Semi-supervised learning (may help with local optima). Learning the number of clusters. Optimizing for large data sets. Outliers. Modeling the uncertainty in the clusterings which come from the spectral algorithm.

69 Thank You

71 Distance Between Clusterings. Measures the amount of overlap between two clusterings in relation to the cluster sizes.
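One such distance, and presumably what the CE columns in the experiment tables report, is the classification-error distance: the minimum fraction of disagreeing points over all ways of matching the K cluster labels. A direct implementation:

```python
import numpy as np
from itertools import permutations

def classification_error(c1, c2, K):
    """Misclassification-error distance between two clusterings:
    smallest fraction of disagreements over all relabelings of c2."""
    c1, c2 = np.asarray(c1), np.asarray(c2)
    return min(np.mean(c1 != np.array(perm)[c2])
               for perm in permutations(range(K)))

# Identical clusterings up to a label swap have distance 0.
ce_same = classification_error([0, 0, 1, 1, 2, 2], [1, 1, 0, 0, 2, 2], K=3)
```

The permutation search is fine for small K; for large K a bipartite matching (Hungarian algorithm) would replace it.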

72 Minimizing MNCut: Details

73 Details Continued. Rayleigh quotient: minimized by the eigenvectors of the k smallest eigenvalues of L.
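This fact is easy to check numerically: for symmetric L, the Rayleigh quotient x^T L x / x^T x is minimized over nonzero x by the eigenvector of the smallest eigenvalue. The symmetric test matrix below is arbitrary:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(6, 6))
L = A + A.T                       # an arbitrary symmetric matrix

w, V = np.linalg.eigh(L)          # eigenvalues in ascending order
rayleigh = lambda x: (x @ L @ x) / (x @ x)

lam_min = w[0]
# The smallest eigenvector attains the minimum of the Rayleigh quotient...
at_eigvec = rayleigh(V[:, 0])
# ...and random vectors never do better.
samples = [rayleigh(rng.normal(size=6)) for _ in range(200)]
```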

74 L's Link to P. The eigenvectors which minimize the MNCut are the eigenvectors corresponding to the K largest eigenvalues of P.

75 Previous Work. Meilă and Shi (2001): learn parameters which minimize the Kullback-Leibler distance between the observed S and a target S*; drawback: over-constrains the learning problem. Bach and Jordan (2003): minimize the angle between the subspace spanned by the true cluster indicator vectors and the subspace spanned by the eigenvectors of P(θ); drawback: calculating the derivative of the eigenvectors is numerically unstable. Cour, Gogin and Shi (2005): minimize the distance between the true indicator vector and the eigenvector of interest; drawback: a fixed number of parameters, dependent on n.

76 C-Step: More Detail

77 S-Step: More Detail

