The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL Flexible and Robust Co-Regularized Multi-Domain Graph Clustering Wei Cheng 1 Xiang Zhang 2 Zhishan Guo.


1 Flexible and Robust Co-Regularized Multi-Domain Graph Clustering
Wei Cheng (1), Xiang Zhang (2), Zhishan Guo (1), Yubao Wu (2), Patrick F. Sullivan (1), Wei Wang (3)
(1) University of North Carolina at Chapel Hill, (2) Case Western Reserve University, (3) University of California, Los Angeles
Speaker: Wei Cheng
The 19th ACM Conference on Knowledge Discovery and Data Mining (SIGKDD'13)

2 Outline
- Introduction
- Motivation
- Co-regularized multi-domain graph clustering
  - Single-domain graph clustering
  - Cross-domain co-regularization
    - Residual sum of squares (RSS) loss
    - Clustering disagreement (CD) loss
- Re-evaluating cross-domain relationships
- Experimental study
- Conclusion

3 Graphs and Graph Clustering
- Graphs are ubiquitous:
  - social networks
  - biological interaction networks
  - literature citation networks, etc.
- Graph clustering:
  - decomposes a network into sub-networks based on topological properties
  - usually looks for dense sub-networks

4 Example: detecting protein functional modules in a PPI network (figure from Nataša Pržulj, Introduction to Bioinformatics, 2011)

5 Example: community detection in a collaboration network between scientists (figure from Santo Fortunato, "Community detection in graphs")

6 Multi-View Graph Clustering
- Graphs are collected from multiple sources/domains.
- Multi-view graph clustering uses them jointly to:
  - refine the clustering
  - resolve ambiguity

7 Motivation
- Multi-view graph clustering assumes:
  - an exact one-to-one mapping
  - a complete mapping
  - graphs of the same size
- More common cases:
  - many-to-many mappings
  - partial mappings that must be tolerated
  - graphs of different sizes
  - mappings associated with weights (confidence)

8 Motivation
Objective: design an algorithm that offers
- Flexibility: suitable for the common cases, i.e., many-to-many, weighted, partial mappings
- Robustness: noisy graphs have little influence on the others

9 Problem Formulation
Each domain $D_\pi$ is represented by an affinity matrix $A^{(\pi)}$ (the figure shows three domains with $A^{(1)}$, $A^{(2)}$, $A^{(3)}$). Cross-domain relationships are given by matrices $S^{(i,j)}$, where $S^{(i,j)}_{a,b}$ denotes the weight between the a-th instance in $D_j$ and the b-th instance in $D_i$.
Goal: partition each $A^{(\pi)}$ into $k_\pi$ clusters while taking into account the co-regularizing constraints implicitly encoded in the cross-domain relationships $S$.
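
To make the shapes concrete, here is a minimal toy instance in NumPy. Everything in it is hypothetical: two domains of different sizes, random symmetric affinities, and a hand-written $S^{(1,2)}$ that mixes one-to-one, many-to-many, weighted and missing links. The only convention taken from the slide is that $S^{(i,j)}$ has one row per instance of $D_j$ and one column per instance of $D_i$.

```python
import numpy as np

def random_affinity(n, rng):
    """Symmetric, non-negative affinity matrix with zero diagonal (toy data)."""
    X = rng.random((n, n))
    A = (X + X.T) / 2
    np.fill_diagonal(A, 0.0)
    return A

rng = np.random.default_rng(0)

# Two hypothetical domains of different sizes: n_1 = 5, n_2 = 4.
A = {1: random_affinity(5, rng), 2: random_affinity(4, rng)}

# S[(i, j)] has shape (n_j, n_i); S[(i, j)][a, b] is the weight between the
# a-th instance of D_j and the b-th instance of D_i (0 means "no mapping").
S = {(1, 2): np.zeros((4, 5))}
S[(1, 2)][0, 0] = 1.0   # a confident one-to-one link
S[(1, 2)][1, 1] = 0.7   # instance 1 of D_2 maps to two instances of D_1 ...
S[(1, 2)][1, 2] = 0.3   # ... with different confidences
S[(1, 2)][2, 2] = 0.5   # instance 2 of D_1 is also linked to instance 2 of D_2
# Row 3 and columns 3-4 stay zero: those instances have no cross-domain
# counterpart (partial mapping).

# The number of clusters k_pi may differ across domains.
k = {1: 2, 2: 3}
```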

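10 Co-regularized multi-domain graph clustering (CGC)
Single-domain clustering: symmetric non-negative matrix factorization (NMF), minimizing
$$\left\| A^{(\pi)} - H^{(\pi)} \big(H^{(\pi)}\big)^{T} \right\|_F^2 \quad \text{subject to } H^{(\pi)} \ge 0.$$
Here $H^{(\pi)} \in \mathbb{R}_+^{n_\pi \times k_\pi}$, where $n_\pi$ is the number of instances in $D_\pi$ and each row $H^{(\pi)}_{a\cdot}$ represents the cluster assignment of the a-th instance in domain $D_\pi$.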
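
A small sketch of single-domain symmetric NMF. It uses the damped multiplicative update $H \leftarrow H \circ \big(1-\beta + \beta\, (AH) \oslash (HH^{T}H)\big)$ with $\beta = 0.5$, a common heuristic for symmetric NMF; the slide does not say which solver CGC uses, so treat the update rule, the initialization scale and the names `symnmf`/`symnmf_loss` as assumptions.

```python
import numpy as np

def symnmf(A, k, n_iter=500, beta=0.5, seed=0, eps=1e-12):
    """Approximately minimize ||A - H H^T||_F^2 subject to H >= 0.

    Damped multiplicative update (a standard heuristic, not necessarily
    CGC's exact rule):  H <- H * (1 - beta + beta * (A H) / (H H^T H)).
    """
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    # Scale the random start so that H H^T is roughly on the scale of A.
    H = rng.random((n, k)) * np.sqrt(A.mean() / k)
    for _ in range(n_iter):
        AH = A @ H
        HHtH = H @ (H.T @ H)
        H *= 1 - beta + beta * AH / (HHtH + eps)
    return H

def symnmf_loss(A, H):
    """Squared Frobenius reconstruction error ||A - H H^T||_F^2."""
    return float(np.linalg.norm(A - H @ H.T, "fro") ** 2)
```

Hard cluster labels can then be read off row by row, e.g. `labels = symnmf(A, k).argmax(axis=1)`, since row $a$ of $H$ holds the soft assignment of instance $a$.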

11 Co-regularized multi-domain graph clustering (CGC)
Cross-domain co-regularization:
- Residual sum of squares (RSS) loss: used when the number of clusters is the same in all domains.
- Clustering disagreement (CD) loss: used whether the numbers of clusters are the same or different.

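12 Co-regularized multi-domain graph clustering (CGC)
Residual sum of squares (RSS) loss
- Directly compares the $H^{(\pi)}$ inferred in different domains.
- To penalize inconsistent cross-domain cluster partitions, for the l-th cluster in $D_i$ the loss for the a-th instance in $D_j$ is
$$\Big( H^{(j)}_{a,l} - \frac{1}{|\mathcal{N}^{(i,j)}(a)|} \sum_{b \in \mathcal{N}^{(i,j)}(a)} S^{(i,j)}_{a,b}\, H^{(i)}_{b,l} \Big)^2,$$
  where $\mathcal{N}^{(i,j)}(a)$ denotes the set of indices of instances in $D_i$ that are mapped to the a-th instance in $D_j$, and $|\mathcal{N}^{(i,j)}(a)|$ is its cardinality.
- The RSS loss sums this quantity over all mapped instances a and all clusters l.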
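
The RSS term can be sketched in NumPy as follows, assuming the per-instance form in which $H^{(j)}_{a,l}$ is compared with the $S$-weighted average of $H^{(i)}_{b,l}$ over the instances $b$ mapped to $a$. Two assumptions are made explicit here: $\mathcal{N}(a)$ is taken to be the set of $b$ with $S_{a,b} > 0$, and instances of $D_j$ with no mapping contribute nothing. The vectorized version is checked against a direct loop over the definition.

```python
import numpy as np

def rss_loss(S, H_i, H_j):
    """RSS co-regularization loss between domains D_i and D_j.

    Assumed shapes: S is (n_j, n_i), H_i is (n_i, k), H_j is (n_j, k), with the
    same k in both domains. For each instance a of D_j with a non-empty set
    N(a) = {b : S[a, b] > 0}, H_j[a, l] is compared with
    (1 / |N(a)|) * sum_{b in N(a)} S[a, b] * H_i[b, l].
    Instances of D_j without any mapping (partial mapping) are skipped.
    """
    counts = (S > 0).sum(axis=1)              # |N(a)| for every a in D_j
    mapped = counts > 0
    averaged = (S[mapped] @ H_i) / counts[mapped][:, None]
    return float(np.sum((H_j[mapped] - averaged) ** 2))

def rss_loss_loop(S, H_i, H_j):
    """The same quantity written as the explicit per-instance, per-cluster sum."""
    total = 0.0
    for a in range(H_j.shape[0]):
        N_a = [b for b in range(S.shape[1]) if S[a, b] > 0]
        if not N_a:
            continue
        for l in range(H_j.shape[1]):
            avg = sum(S[a, b] * H_i[b, l] for b in N_a) / len(N_a)
            total += (H_j[a, l] - avg) ** 2
    return total
```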

13 Co-regularized multi-domain graph clustering (CGC)
Clustering disagreement (CD) loss
- Indirectly measures the inconsistency of cross-domain cluster partitions.
- Intuition: instances A and B are mapped to instance 2, and C is mapped to instance 4. If the similarity between the cluster assignments of 2 and 4 is small, then the similarity between the cluster assignments of A and C, and that between B and C, should also be small.
- The CD loss measures this disagreement using a linear kernel as the similarity between cluster assignments.
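
The transcript does not preserve the CD formula, so the sketch below encodes the stated intuition under an explicitly assumed form: pairwise similarities in $D_i$, measured with the linear kernel $K(H) = HH^{T}$, are carried over to $D_j$ through $S$ and compared with $D_j$'s own similarities, $\| S K(H^{(i)}) S^{T} - K(H^{(j)}) \|_F^2$, restricted to mapped instances. Because only $n_j \times n_j$ similarity matrices are compared, $H^{(i)}$ and $H^{(j)}$ may have different numbers of columns. This is an illustration, not the paper's exact loss.

```python
import numpy as np

def linear_kernel(H):
    """Pairwise cluster-assignment similarity: K[a, c] = <H[a], H[c]>."""
    return H @ H.T

def cd_loss(S, H_i, H_j):
    """Hypothetical clustering-disagreement loss between D_i and D_j.

    Assumed form: || S_m K(H_i) S_m^T - K(H_j[m]) ||_F^2, where m selects the
    instances of D_j that have at least one mapping. S is (n_j, n_i);
    H_i is (n_i, k_i) and H_j is (n_j, k_j), and k_i may differ from k_j.
    """
    mapped = S.sum(axis=1) > 0
    S_m = S[mapped]
    propagated = S_m @ linear_kernel(H_i) @ S_m.T   # similarities carried over
    target = linear_kernel(H_j[mapped])             # D_j's own similarities
    return float(np.linalg.norm(propagated - target, "fro") ** 2)
```

With the slide's example (A and B mapped to instance 2, C mapped to instance 4, and 2 and 4 in different clusters), an assignment that also separates C from A and B gives zero loss, while putting A, B and C in one cluster is penalized.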

14 Co-regularized multi-domain graph clustering (CGC)
Objective function (joint matrix optimization): combines the single-domain losses with the cross-domain co-regularization losses. It can be solved with an alternating scheme: optimize the objective with respect to one $H^{(\pi)}$ while fixing the others.
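
The alternating scheme can be illustrated on two domains. The sketch assumes a joint objective made of the two symmetric-NMF losses plus `lam` times the RSS term, and updates one block at a time with a projected-gradient step plus backtracking. The weight `lam`, the step rule and the function names are hypothetical choices for illustration; the slide does not show CGC's actual update rules.

```python
import numpy as np

def block_step(f, H, grad, step=1.0, shrink=0.5, max_tries=30):
    """One projected-gradient step on a single block. The step is halved until
    f does not increase; if no such step is found, H is returned unchanged."""
    f0 = f(H)
    for _ in range(max_tries):
        H_new = np.maximum(H - step * grad, 0.0)   # project onto H >= 0
        if f(H_new) <= f0:
            return H_new
        step *= shrink
    return H

def cgc_rss_sketch(A1, A2, S, k, lam=1.0, n_iter=200, seed=0):
    """Alternating minimization of an assumed two-domain objective:
    ||A1 - H1 H1^T||^2 + ||A2 - H2 H2^T||^2 + lam * ||P H1 - H2[mapped]||^2,
    where S is (n2, n1) and P averages the mapped D_1 rows for each D_2 row."""
    rng = np.random.default_rng(seed)
    H1 = rng.random((A1.shape[0], k))
    H2 = rng.random((A2.shape[0], k))
    mapped = (S > 0).sum(axis=1) > 0
    P = S[mapped] / (S[mapped] > 0).sum(axis=1, keepdims=True)

    def obj(H1, H2):
        return (np.linalg.norm(A1 - H1 @ H1.T) ** 2
                + np.linalg.norm(A2 - H2 @ H2.T) ** 2
                + lam * np.linalg.norm(P @ H1 - H2[mapped]) ** 2)

    history = [obj(H1, H2)]
    for _ in range(n_iter):
        # Update H1 while H2 is fixed.
        g1 = -4 * (A1 - H1 @ H1.T) @ H1 + 2 * lam * P.T @ (P @ H1 - H2[mapped])
        H1 = block_step(lambda H: obj(H, H2), H1, g1)
        # Update H2 while H1 is fixed.
        g2 = -4 * (A2 - H2 @ H2.T) @ H2
        g2[mapped] -= 2 * lam * (P @ H1 - H2[mapped])
        H2 = block_step(lambda H: obj(H1, H), H2, g2)
        history.append(obj(H1, H2))
    return H1, H2, history
```

Because each block update only accepts non-increasing steps, the recorded objective never goes up. This is the usual appeal of block-coordinate schemes, even though they only reach a local optimum.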

15 Re-Evaluating Cross-Domain Relationships
Cross-domain instance relationships based on prior knowledge may contain noise. It is therefore crucial to let users evaluate whether the provided relationships violate any single-domain clustering structure.

16 Re-Evaluating Cross-Domain Relationships
We only need to slightly modify the co-regularization loss functions by multiplying them by a confidence matrix $W^{(i,j)}$, which is optimized together with the cluster assignments. We then sort the values of $W^{(i,j)}$ and report the smallest elements, i.e., the least trustworthy relationships, to users.
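
The reporting step can be sketched as follows: once a confidence matrix $W^{(i,j)}$ has been learned, rank only the pairs that were actually provided in $S^{(i,j)}$ and return the lowest-confidence ones. How $W$ is optimized is not shown here, and the function name and the parameter `m` are hypothetical.

```python
import numpy as np

def least_confident_pairs(W, S, m):
    """Return up to m cross-domain pairs (a, b, W[a, b]) with the smallest
    learned confidence, considering only pairs that were mapped in S
    (S[a, b] > 0). As for S^{(i,j)}, a indexes D_j and b indexes D_i."""
    rows, cols = np.nonzero(S > 0)
    order = np.argsort(W[rows, cols], kind="stable")[:m]
    return [(int(rows[t]), int(cols[t]), float(W[rows[t], cols[t]])) for t in order]
```

Restricting the ranking to mapped pairs matters. Entries of $W$ for pairs that were never provided carry no information about noisy prior knowledge, so they should not crowd the report.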

17 Experimental Study
Data sets:
- UCI (Iris, Wine, Ionosphere, WDBC)
  - Two cross-domain relationships are constructed, Iris-Wine and Ionosphere-WDBC: positive/negative instances are mapped only to positive/negative instances in the other domain.
- Newsgroup data (6 groups from 20 Newsgroups)
  - comp.os.ms-windows.misc, comp.sys.ibm.pc.hardware, comp.sys.mac.hardware (3 comp)
  - rec.motorcycles, rec.sport.baseball, rec.sport.hockey (3 rec)
- Protein-protein interaction (PPI) networks (from BioGRID), gene co-expression networks (from Gene Expression Omnibus), and a genetic interaction network (from TEAM)
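
One simple way to build such label-consistent relationships, assuming that every same-label pair is linked with weight 1. The slide does not say whether all such pairs or only a sample are used, and `label_consistent_mapping` is a hypothetical helper.

```python
import numpy as np

def label_consistent_mapping(labels_i, labels_j):
    """Binary S^{(i,j)} of shape (n_j, n_i): S[a, b] = 1 exactly when the a-th
    instance of D_j and the b-th instance of D_i carry the same class label."""
    labels_i = np.asarray(labels_i)
    labels_j = np.asarray(labels_j)
    return (labels_j[:, None] == labels_i[None, :]).astype(float)
```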

18 Experimental Study: Effectiveness (UCI data sets)

19 Experimental Study: Robustness Evaluation (UCI)

20 Experimental Study: Re-Evaluating Cross-Domain Relationships (UCI)

21 Experimental Study: Binary vs. Weighted Relationships

22 Experimental Study: Binary vs. Weighted Relationships

23 Experimental Study: Protein Module Detection by Integrating Multi-Domain Heterogeneous Data
Figure: 5412 genes; 490,032 genetic markers across 4,890 samples (1,952 disease and 2,938 healthy).
We use the 1 million top-ranked genetic marker pairs to construct the network, with the test statistics as the edge weights.
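
A sketch of the network-construction step, assuming that "top-ranked" means the largest test statistic and that the per-pair statistics are already computed (the slide does not name the statistical test). `pair_stats`, `top_k` and the dictionary-of-dictionaries layout are illustrative choices.

```python
import heapq

def top_pairs_network(pair_stats, top_k):
    """Build a weighted, undirected adjacency dict from the top_k marker pairs.

    pair_stats: iterable of (marker_u, marker_v, statistic) tuples.
    Pairs are ranked by statistic (largest first), and the statistic becomes
    the edge weight: adj[u][v] == adj[v][u] == statistic.
    """
    top = heapq.nlargest(top_k, pair_stats, key=lambda t: t[2])
    adj = {}
    for u, v, stat in top:
        adj.setdefault(u, {})[v] = stat
        adj.setdefault(v, {})[u] = stat
    return adj
```

`heapq.nlargest` keeps only `top_k` items in memory while scanning, which suits selecting 1 million pairs from a much larger stream of candidate marker pairs.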

24 Experimental Study: Protein Module Detection
Evaluation: standard Gene Set Enrichment Analysis (GSEA)
- We identify the most significantly enriched Gene Ontology (GO) categories.
- Significance (p-value) is determined by Fisher's exact test.
- Raw p-values are further calibrated to correct for multiple testing.
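
For a single module and GO category, the one-sided Fisher's exact test reduces to a hypergeometric upper tail. With $N$ genes in total, $K$ of them in the category, a module of $n$ genes and $x$ genes in the overlap, $p = \sum_{i \ge x} \binom{K}{i}\binom{N-K}{n-i} / \binom{N}{n}$. The slide does not say which multiple-testing correction is applied. The sketch uses the Benjamini-Hochberg procedure purely as an example, and the function names are hypothetical.

```python
from math import comb

def enrichment_pvalue(N, K, n, x):
    """One-sided Fisher's exact test (hypergeometric upper tail): probability of
    seeing at least x category genes in a module of n genes, when K of the N
    genes belong to the category."""
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(x, min(n, K) + 1))
    return hits / comb(N, n)

def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda t: pvals[t])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):            # from the largest p-value down
        t = order[rank - 1]
        running_min = min(running_min, pvals[t] * m / rank)
        adjusted[t] = running_min
    return adjusted
```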

25 Experimental Study: Protein Module Detection
Comparison of CGC and single-domain graph clustering (k = 100)

26 Experimental Study: Protein Module Detection

27 Conclusion
In this paper:
- We propose a flexible co-regularized method, CGC, to handle many-to-many, weighted, partial mappings in multi-domain graph clustering.
- CGC uses cross-domain relationships as co-regularizing penalties to guide the search for a consensus clustering structure.
- CGC is robust even when the cross-domain relationships based on prior knowledge are noisy.

28 Thank You! Questions?

29 Experimental Study: Performance Evaluation

