
1 Consensus Partition Liang Zheng 5.21

2 Outline
Introduction
Problem formulation
Optimization method
Experiment study
Conclusion

3 Introduction
W is a set of properties. It has a public alignment, e.g. an equivalence relation R on W; an equivalence relation can also be represented by a partition.
At some moments, a user might:
see a list of a subset of W
align elements of W (move/remove an item to/from a partition)
The problem:
How to preserve the personal alignment of each user?
How to improve (optimize) the public alignment according to users' alignments?

4 Introduction Notations and definitions
V = {v1, v2, ..., vn} is the set of objects (1 ≤ i ≤ n).
P = {P1, P2, ..., Pm} is a set of partitions, where each Pi = {Ci,1, Ci,2, ..., Ci,d} is a partition of the set of objects V into d clusters; Ci,j is the jth cluster of the ith partition (1 ≤ i ≤ m).
Pi(v) denotes the label of the cluster to which the object v belongs, i.e. Pi(v) = j iff v ∈ Ci,j.
PV is the set of all possible partitions of the set V (each Pi ∈ PV).
P* ∈ PV is the consensus partition, which best represents the properties of every partition in P.

5 Problem statement: Clustering Aggregation (Cluster Ensemble)
Input: m partitions P = {P1, P2, ..., Pm} over n items.
Output: a consensus partition P* that minimizes the total number of disagreements with the m partitions (equivalently, maximizes the similarity with the m partitions).

6 Problem formulation What makes a good consensus partition P*?
The distance between P* and the input partitions.
Partition-distance (1) (Gusfield, 2002)
Two partitions P1 and P2 of V are identical if and only if every cluster in P1 is a cluster in P2 (the converse is then forced).
The partition-distance d(P1, P2) is the minimum number of elements that must be deleted from V so that the two induced partitions (P1 and P2 restricted to the remaining elements) are identical.
The problem of computing the partition-distance can be cast naturally as a node-cover problem on a graph derived from the partitions.

7 Example: V = {1,2,3,4,5,6,7,8,9}; P = {P1, P2};
P1 = {{1,2}, {4,5,6,7}, {3,8,9}}; P2 = {{1,2,4,5}, {8,9}, {3,6,7}}
d(P1, P2) = N(G(P1, P2)), the size of a minimum node cover of the graph G(P1, P2) derived from the two partitions; here d(P1, P2) = 3 (deleting, e.g., elements 3, 4 and 5 makes the induced partitions identical).
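Besides the node-cover view, the partition-distance can equivalently be computed as n minus the maximum total overlap over a one-to-one matching of clusters, which is a standard assignment problem. A minimal Python sketch of this equivalent formulation (scipy assumed; the function name is mine, not from the slides):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def partition_distance(p1, p2, n):
    # Overlap matrix between clusters of the two partitions.
    overlap = np.array([[len(a & b) for b in p2] for a in p1])
    # linear_sum_assignment minimizes, so negate to maximize overlap.
    rows, cols = linear_sum_assignment(-overlap)
    # d(P1, P2) = n minus the best achievable total overlap.
    return n - overlap[rows, cols].sum()

P1 = [{1, 2}, {4, 5, 6, 7}, {3, 8, 9}]
P2 = [{1, 2, 4, 5}, {8, 9}, {3, 6, 7}]
print(partition_distance(P1, P2, n=9))  # -> 3
```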

8 Problem formulation Partition-distance (2) (Gionis, 2005)
Symmetric difference distance, or Mirkin distance, d(P1, P2).
Consider two objects u and v in V. The following simple 0/1 distance function checks whether the two partitions P1 and P2 agree on the clustering of u and v:
du,v(P1, P2) = 1 if u and v are in the same cluster in one partition but in different clusters in the other, and 0 otherwise.
d(P1, P2) = Σ_{u,v ∈ V} du,v(P1, P2)
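For concreteness, a small sketch of the Mirkin distance, representing each partition as a dict mapping object to cluster label (the example reuses P1 and P2 from the slide-11 example below; representation and name are mine):

```python
from itertools import combinations

def mirkin_distance(p1, p2):
    # Count object pairs on which the two partitions disagree:
    # together in one partition, apart in the other.
    return sum(1 for u, v in combinations(p1, 2)
               if (p1[u] == p1[v]) != (p2[u] == p2[v]))

P1 = {1: 0, 2: 0, 3: 1, 4: 1, 5: 2, 6: 3}  # {{1,2},{3,4},{5},{6}}
P2 = {1: 0, 2: 0, 4: 0, 3: 1, 5: 1, 6: 2}  # {{1,2,4},{3,5},{6}}
print(mirkin_distance(P1, P2))  # -> 4 disagreeing pairs
```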

9 Problem formulation Formal definition of Clustering Aggregation (CA)
Given a set of objects V and m partitions {P1, P2, ..., Pm} on V, find a consensus partition P* that minimizes
Σ_{i=1..m} d(P*, Pi)
This optimization problem is NP-complete (Barthelemy et al., 1995), so we have to use heuristic algorithms.

10 Problem formulation Generalization of CA: Correlation Clustering (CC)
Given a set of objects V and distances Xuv ∈ [0,1] for all pairs u, v ∈ V, find a partition P* of the objects in V that minimizes the score function
d(P*) = Σ_{P*(u)=P*(v)} Xuv + Σ_{P*(u)≠P*(v)} (1 − Xuv)
This optimization problem is NP-complete.
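As a sketch (not from the slides), the CC score of a candidate partition can be evaluated directly; here X is assumed to be a dict keyed by ordered pairs (u, v) with u < v:

```python
from itertools import combinations

def cc_cost(labels, X):
    # Charge X[u, v] for pairs placed in the same cluster and
    # 1 - X[u, v] for pairs placed in different clusters.
    return sum(X[u, v] if labels[u] == labels[v] else 1.0 - X[u, v]
               for u, v in combinations(sorted(labels), 2))
```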

11 Example: V = {1,2,3,4,5,6}; P = {P1, P2, P3, P4} (n = 6, m = 4)
P1 = {{1,2}, {3,4}, {5}, {6}}; P2 = {{1,2,4}, {3,5}, {6}};
P3 = {{1,2,6}, {3,4}, {5}}; P4 = {{1,2,5}, {4,6}, {3}}
Objects co-occurrence matrix M = MP1 + MP2 + MP3 + MP4, where muv counts the partitions that place u and v in the same cluster.
The input distance matrix [xuv], with xuv = (m − muv)/m:
     1    2    3    4    5    6
1    0    0    1   3/4  3/4  3/4
2    0    0    1   3/4  3/4  3/4
3    1    1    0   1/2  3/4   1
4   3/4  3/4  1/2   0    1   3/4
5   3/4  3/4  3/4   1    0    1
6   3/4  3/4   1   3/4   1    0
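A small sketch building M and [xuv] for this example (representation and function name are mine):

```python
import numpy as np

def distance_matrix(partitions, objects):
    # m_uv = number of input partitions that co-cluster u and v;
    # the pairwise distance is x_uv = (m - m_uv) / m.
    m = len(partitions)
    idx = {v: i for i, v in enumerate(objects)}
    M = np.zeros((len(objects), len(objects)))
    for p in partitions:
        for cluster in p:
            for u in cluster:
                for v in cluster:
                    if u != v:
                        M[idx[u], idx[v]] += 1
    X = (m - M) / m
    np.fill_diagonal(X, 0.0)
    return X

partitions = [[{1, 2}, {3, 4}, {5}, {6}],
              [{1, 2, 4}, {3, 5}, {6}],
              [{1, 2, 6}, {3, 4}, {5}],
              [{1, 2, 5}, {4, 6}, {3}]]
X = distance_matrix(partitions, objects=[1, 2, 3, 4, 5, 6])
print(X[0, 1], X[2, 3])  # -> 0.0 0.5, matching the matrix above
```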

12 Optimization Method Clique partitioning problem
The CA problem can be cast as a clique partitioning problem: on the complete graph over V, give each edge (u, v) a weight measuring net agreement (e.g. wuv = muv − (m − muv), the number of partitions that join u and v minus the number that separate them), and find a partition of the vertices into cliques having a maximum total weight.

13 Optimization Method Factor-2 approximation algorithm [Filkov04]
Given an instance of CA, select the partition p ∈ {P1, P2, ..., Pm} that minimizes S = Σi d(p, Pi) (sketched below).
This algorithm is a factor-2 approximation for CA. Its time complexity is O(m²n): computing the distance between two partitions takes O(n), and there are O(m²) pairs.
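A direct sketch of this "best of the inputs" rule, reusing for instance the mirkin_distance function from the earlier sketch as the distance:

```python
def best_of_k(partitions, dist):
    # Return the input partition whose total distance to all
    # input partitions, S = sum_i d(p, P_i), is smallest.
    return min(partitions,
               key=lambda p: sum(dist(p, q) for q in partitions))

# e.g. best_of_k([P1, P2, P3, P4], mirkin_distance)
```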

14 Optimization Method
Agglomerative clustering (bottom-up) [FrJa02, GMT07]
It starts by placing every node into a singleton cluster. If the average distance of the closest pair of clusters is less than 1/2, the two clusters are merged into a single cluster; if no two clusters have average distance smaller than 1/2, the algorithm stops (a sketch follows after this slide).
e.g. {{1},{2},{3},{4},{5},{6}} → {{1,2},{3,4},{5},{6}}
Divisive clustering (top-down) [GMT07]
It starts by placing all nodes into a single cluster. It then finds the pair of nodes that are furthest apart and places them into different clusters; these two nodes become the centers of the clusters, and the remaining nodes are assigned to the center that incurs the least cost. This procedure is repeated iteratively; at the end of each step, the cost of the new solution is computed, and if it is lower than that of the previous step, the algorithm continues.
e.g. {1,2,3,4,5,6} → {{1,2,6},{3,4,5}} → {{1,2},{6},{3,4},{5}}
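A minimal sketch of the bottom-up rule (dist(u, v) is assumed to return the pairwise distance xuv, e.g. a lookup into the matrix built in the slide-11 sketch):

```python
def agglomerative(objects, dist):
    # Start from singletons; merge the closest pair of clusters
    # (by average pairwise distance) while that distance is < 1/2.
    clusters = [{v} for v in objects]
    while len(clusters) > 1:
        pairs = [(sum(dist(u, w) for u in a for w in b) / (len(a) * len(b)),
                  i, j)
                 for i, a in enumerate(clusters)
                 for j, b in enumerate(clusters) if i < j]
        d, i, j = min(pairs)
        if d >= 0.5:
            break
        clusters[i] |= clusters[j]
        del clusters[j]
    return clusters
```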

15 Optimization Method LocalSearch [GMT07] // One-element move [Filkov04]
Starting from some initial partition, the algorithm goes through the nodes and considers placing each of them into a different cluster, or creating a new singleton cluster with that node. The node is placed in the cluster that yields the minimum cost, and the process is iterated until no move can improve the cost. LocalSearch can be used as a clustering algorithm on its own, but also as a post-processing step to improve an existing solution.
e.g. {{1,2},{3},{4},{5},{6}} → {{1,2},{3,4},{5},{6}}
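A sketch of the one-element-move loop under my earlier assumptions (integer cluster labels; X a dict keyed by ordered pairs (u, v) with u < v):

```python
def local_search(labels, X):
    def cost_of(v, c):
        # CC cost contributed by v's pairs if v were given label c.
        return sum(X[min(u, v), max(u, v)] if labels[u] == c
                   else 1.0 - X[min(u, v), max(u, v)]
                   for u in labels if u != v)
    improved = True
    while improved:
        improved = False
        for v in labels:
            # Candidate moves: every existing label plus a fresh singleton.
            options = set(labels.values()) | {max(labels.values()) + 1}
            best = min(options, key=lambda c: cost_of(v, c))
            if cost_of(v, best) + 1e-12 < cost_of(v, labels[v]):
                labels[v] = best
                improved = True
    return labels
```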

16 Optimization Method The Fusion-Transfer (FT) method [Guénoche2011]
Fusion, a hierarchical ascending method: starting from the atomic partition P0, at each step the two classes whose merger maximizes the score value of the resulting partition are joined.
Transfer, a best-one-element-move method: the weight of assigning any element to any class is computed.
In short: bottom-up + LocalSearch.
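Note that Guénoche's fusion step joins the pair maximizing the score of the resulting partition, which is not identical to the average-distance merge rule sketched earlier; still, as a rough approximation, the two earlier sketches can be chained to convey the structure:

```python
# Rough FT-style pipeline built from the earlier sketches:
clusters = agglomerative(objects, dist)  # fusion stage (approximated)
labels = {v: i for i, c in enumerate(clusters) for v in c}
labels = local_search(labels, X)         # transfer stage
```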

17 Optimization Method Relabeling + voting
Find the correspondence between the labels in the partitions and fuse the clusters with the same labels by voting [DuFr03, DWH01].
[Example table: objects v1..v6 are first re-labeled so that cluster labels C1, C2, C3 correspond across partitions; each object's consensus label C* is then chosen by majority vote.]
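A sketch of the voting step, assuming label correspondence has already been established (e.g. with the same assignment solver used for the partition-distance sketch above); each partition maps object to label:

```python
from collections import Counter

def vote(relabelled):
    # Majority vote per object across the re-labeled partitions.
    return {v: Counter(p[v] for p in relabelled).most_common(1)[0][0]
            for v in relabelled[0]}
```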

18 Experiment study
Dataset: synthetic + real
Comparison of the average sum of distances (varying n and k, plus noisy data)
Comparison of running time

19 Conclusion
Two main approaches: objects co-occurrence and median partition.
Preserving the personal partition in Sview.
Future work:
Compute a consensus partition based on various methods.
Fuse personal partitions with the consensus partition.

20 Thanks!

