Jianping Fan Dept of Computer Science UNC-Charlotte


1 Jianping Fan Dept of Computer Science UNC-Charlotte

2 Key issues for Clustering
- Similarity or distance function: inter-cluster similarity/distance and intra-cluster similarity/distance
- Number of clusters
- Decision criterion for data clustering
- Objective function: inter-cluster distances are maximized; intra-cluster distances are minimized (one standard formulation is sketched below)
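One standard way to write this objective (a common textbook formulation consistent with the slide, not quoted from it) is to minimize the within-cluster sum of squares,

    \min_{C_1,\dots,C_K} \sum_{k=1}^{K} \sum_{x_i \in C_k} \lVert x_i - \mu_k \rVert^2 , \qquad \mu_k = \frac{1}{|C_k|} \sum_{x_i \in C_k} x_i ,

which tightens each cluster (small intra-cluster distances) and thereby keeps the cluster centers well separated.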

3 Summary of K-means
Experiences:
- Centers: random initialization & density scan
- K: start from a small K and split, or start from a large K and merge
- Outliers
Problems of K-means:
- Center locations
- Number of clusters K
- Sensitivity to outliers
- Data manifolds

4 Problems of K-Means
Objective: inter-cluster distances are maximized; intra-cluster distances are minimized
Distance function: geometric distance
The algorithm alternates two steps (a sketch follows):
- Assignment step
- Optimization step
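A minimal Python sketch of these two alternating steps; the function and variable names are mine, and the random initialization mirrors the "random centers" option from the previous slide:

    import numpy as np

    def kmeans(X, k, n_iter=100, seed=0):
        """Minimal K-means sketch: alternate assignment and optimization steps."""
        rng = np.random.default_rng(seed)
        centers = X[rng.choice(len(X), size=k, replace=False)]  # random initialization
        for _ in range(n_iter):
            # Assignment step: each point joins its nearest center (geometric distance)
            d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
            labels = d.argmin(axis=1)
            # Optimization step: each center moves to the mean of its assigned points
            new_centers = np.array([
                X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                for j in range(k)
            ])
            if np.allclose(new_centers, centers):  # converged
                break
            centers = new_centers
        return centers, labels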

5 Problems of K-Means & Spectral Clustering
One-way decision: the centers and the distance function alone decide the clustering. Data points and clusters (centers) should stand on an equal footing, so a two-way decision is expected!

6 Outliers
[Figure: clusters with two points marked "outlier"; objective shown: inter-cluster distances are maximized, intra-cluster distances are minimized]

7 Outliers
How to identify outliers?
[Figure: same setting, with the outlier points marked]

8

9 … of AP Clustering: Two-way decision is used!

10 Affinity Propagation
- A clustering algorithm that works by finding a set of exemplars (prototypes) in the data and assigning the other data points to those exemplars [Frey07]
- Input: pair-wise similarities (negative squared error) and per-point preferences (larger = more likely to become an exemplar)
- Approximately maximizes the sum of similarities to the exemplars
- Mechanism: message passing in a factor graph (a usage sketch follows)
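A short usage sketch with scikit-learn's AffinityPropagation, which implements this scheme; the toy data and the parameter values shown are illustrative assumptions, not taken from the slides:

    import numpy as np
    from sklearn.cluster import AffinityPropagation

    X = np.random.default_rng(0).normal(size=(60, 2))  # toy data
    ap = AffinityPropagation(
        affinity="euclidean",   # similarities = negative squared Euclidean error
        preference=None,        # None defaults to the median input similarity
        damping=0.5,
        random_state=0,
    ).fit(X)
    print(ap.cluster_centers_indices_)  # indices of the chosen exemplars
    print(ap.labels_)                   # exemplar assignment for every point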

11

12

13 r(i,k) is initialized as s(i,k)
Sending responsibility r(i,k) from data point i to candidate exemplar k: how well-suited data point k is to serve as the exemplar for data point i. The standard update is given below.
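The standard responsibility update from [Frey07], with s(i,k) the input similarity and a(i,k) the current availability:

    r(i,k) \leftarrow s(i,k) - \max_{k' \neq k} \{ a(i,k') + s(i,k') \}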

14 a(i,k) is initialized as 0
Sending availability a(i,k) from candidate exemplar k to data point i: how appropriate it would be for data point i to choose data point k as its exemplar. The standard updates are given below.
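The standard availability updates from [Frey07]: for i ≠ k,

    a(i,k) \leftarrow \min\Bigl\{ 0,\; r(k,k) + \sum_{i' \notin \{i,k\}} \max\{0, r(i',k)\} \Bigr\}

and the self-availability

    a(k,k) \leftarrow \sum_{i' \neq k} \max\{0, r(i',k)\}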

15

16

17 a(i,k) = 0

18 S(i,k) is the similarity function between data points i and k
- P(s(i,k)) is an exemplar-dependent probability model
- Data points with larger values of s(i,i) are more likely to be chosen as exemplars
- The number of clusters is controlled by (a) the values of the input preferences and (b) the message-passing procedure itself (competition); a common preference choice is sketched below
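A minimal sketch of building the input similarities with a shared preference; per [Frey07], setting every s(k,k) to the median input similarity gives a moderate number of clusters, while smaller (more negative) preferences give fewer clusters. The helper name is mine:

    import numpy as np

    def build_similarities(X):
        """s(i,k) = negative squared error; s(k,k) = shared median preference."""
        S = -np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)
        off_diag = S[~np.eye(len(X), dtype=bool)]
        np.fill_diagonal(S, np.median(off_diag))  # shared preference s(k,k)
        return S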

19

20

21 Summary
1. The competitive procedure for updating responsibilities and availabilities is data-driven and does not take into account how many other points favor each candidate exemplar.
2. At any point during affinity propagation, availabilities and responsibilities can be combined to identify exemplars: for data point i, the value of k that maximizes a(i,k) + r(i,k) either identifies data point i itself as an exemplar (if k = i) or identifies the data point that is the exemplar for data point i.
3. Each iteration of the AP procedure consists of (a) updating all responsibilities given the availabilities, (b) updating all availabilities given the responsibilities, and (c) combining availabilities and responsibilities to monitor the exemplar decisions, terminating the algorithm when these decisions do not change for 10 iterations. A complete loop is sketched below.
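A compact NumPy sketch of the loop in point 3; the message damping (factor 0.5) is the conventional safeguard against oscillations and is an assumption, not something stated on this slide:

    import numpy as np

    def affinity_propagation(S, damping=0.5, max_iter=200, convergence_iter=10):
        """Sketch of one full AP run on similarity matrix S (preferences on its diagonal)."""
        n = S.shape[0]
        R = np.zeros((n, n))  # responsibilities r(i,k)
        A = np.zeros((n, n))  # availabilities  a(i,k)
        last, stable = None, 0
        for _ in range(max_iter):
            # (a) update all responsibilities given the availabilities
            AS = A + S
            idx = AS.argmax(axis=1)
            first = AS[np.arange(n), idx]
            AS[np.arange(n), idx] = -np.inf
            second = AS.max(axis=1)
            R_new = S - first[:, None]
            R_new[np.arange(n), idx] = S[np.arange(n), idx] - second
            R = damping * R + (1 - damping) * R_new
            # (b) update all availabilities given the responsibilities
            Rp = np.maximum(R, 0)
            np.fill_diagonal(Rp, np.diag(R))           # keep r(k,k) itself
            A_new = Rp.sum(axis=0)[None, :] - Rp
            diag = A_new[np.diag_indices(n)].copy()    # a(k,k): sum of positive r(i',k)
            A_new = np.minimum(A_new, 0)
            A_new[np.diag_indices(n)] = diag
            A = damping * A + (1 - damping) * A_new
            # (c) combine a+r to monitor exemplar decisions; stop once stable for 10 iterations
            labels = (A + R).argmax(axis=1)
            stable = stable + 1 if np.array_equal(labels, last) else 0
            last = labels
            if stable >= convergence_iter:
                break
        return last  # last[i] is the exemplar chosen for data point i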

22

23

24

25

26

27 Semi-supervised Learning
- Large amounts of unlabeled training data
- Some limited amount of side information: partial labels, equivalence constraints
(Example: half-moon data)

28 Some Motivating examples

29 AP with partial labels
- All points sharing the same label should be in the same cluster.
- Points with different labels should not be in the same cluster.
- Imposing the constraints: via the similarity matrix, or via explicit function nodes.

30 Same label constraints
Set the similarity among all similarly labeled data points to be maximal, and propagate to other points ("teleportation"): without teleportation, local neighborhoods do not 'move closer', e.g. [Klein02].
[Figure: labeled points x1, x2 (labels y1, y2) with S(x1,x2) = 0]

31 Different labels
We can still do a similar trick and set the similarity among all pair-wise differently labeled data points to be minimal, but there is no equivalent notion of anti-teleportation. A sketch of both constraint types follows.
[Figure: differently labeled points x1, x2]
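A hedged sketch of imposing both constraint types directly on the similarity matrix, as slides 29-31 describe; the clamp values (the current maximum similarity, and -inf) are illustrative stand-ins for "maximal" and "minimal":

    import numpy as np

    def impose_label_constraints(S, labels):
        """labels[i] is a class id for labeled points and None for unlabeled ones."""
        S = S.copy()
        hi = S.max()  # stand-in for "maximal" similarity
        for i, yi in enumerate(labels):
            for j, yj in enumerate(labels):
                if i != j and yi is not None and yj is not None:
                    # same label -> maximal similarity; different labels -> minimal
                    S[i, j] = hi if yi == yj else -np.inf
        return S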

32 Adding explicit constraints to account for side-information

33 Adding explicit constraints to account for side-information

34 Problems
- Let's call all the labeled points "portals"; they induce the ability to teleport.
- At test time, determining a label for a new point means evaluating its closest exemplar, possibly via all pairs of portals, which is expensive.
- Pair-wise not-in-class function nodes for each pair of differently labeled points are also expensive.
- Introducing...

35 Meta-Portals
An alternative way of propagating neighborhood information: meta-portals are 'dummy' points, constructed using the similarities of all portals of a certain label. We add N new entries to the similarity matrix, where N is the number of unique labels. A hedged construction sketch follows.
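A heavily hedged sketch of the meta-portal construction: the slides only say each 'dummy' point is built from the similarities of that label's portals, so the aggregation rule used here (a max over the portals) and the placeholder preference are my assumptions:

    import numpy as np

    def add_meta_portals(S, portals_by_label):
        """portals_by_label maps each unique label to the indices of its portals."""
        n, N = S.shape[0], len(portals_by_label)  # one meta-portal per unique label
        S_big = np.full((n + N, n + N), -np.inf)  # mtp-to-mtp similarities stay -inf
        S_big[:n, :n] = S
        for m, ids in enumerate(portals_by_label.values()):
            S_big[n + m, :n] = S[ids, :].max(axis=0)   # mtp-to-point similarity
            S_big[:n, n + m] = S[:, ids].max(axis=1)   # point-to-mtp similarity
            S_big[n + m, n + m] = S[ids, ids].max()    # placeholder mtp preference
        return S_big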

36 Meta-portals
mtp's can be exemplars: unlike regular exemplars, mtp's can be exemplars for other points but choose a different exemplar themselves.

37 These function nodes force the MTPs to choose other data points as their exemplars.
Similarities alone are not enough: even with -inf similarity between them, both MTPs could still choose the same exemplar.

38 Some toy data results

