The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL Flexible and Robust Co-Regularized Multi-Domain Graph Clustering Wei Cheng 1 Xiang Zhang 2 Zhishan Guo.

Presentation transcript:

The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL
Flexible and Robust Co-Regularized Multi-Domain Graph Clustering
Wei Cheng (1), Xiang Zhang (2), Zhishan Guo (1), Yubao Wu (2), Patrick F. Sullivan (1), Wei Wang (3)
(1) University of North Carolina at Chapel Hill, (2) Case Western Reserve University, (3) University of California, Los Angeles
Speaker: Wei Cheng
The 19th ACM Conference on Knowledge Discovery and Data Mining (SIGKDD'13)

Outline
- Introduction
- Motivation
- Co-regularized multi-domain graph clustering
  - Single-domain graph clustering
  - Cross-domain co-regularization
    - Residual sum of squares (RSS) loss
    - Clustering disagreement (CD) loss
- Re-evaluating cross-domain relationships
- Experimental study
- Conclusion

Graph and Graph Clustering
- Graphs are ubiquitous
  - social networks
  - biological interaction networks
  - literature citation networks, etc.
- Graph clustering
  - Decompose a network into sub-networks based on some topological properties
  - Usually we look for dense sub-networks

E.g., detecting protein functional modules in a PPI network (figure from Nataša Pržulj, Introduction to Bioinformatics)

E.g., community detection: collaboration network between scientists (figure from Santo Fortunato, Community detection in graphs)

Multi-view Graph Clustering
- Graphs collected from multiple sources/domains
- Multi-view graph clustering
  - Refine clustering
  - Resolve ambiguity

Motivation
- Multi-view
  - Exact one-to-one
  - Complete mapping
  - The same size
- More common cases
  - Many-to-many
  - Tolerate partial mapping
  - Different sizes
  - Mappings are associated with weights (confidence)

Motivation
Objective: design an algorithm that is flexible and robust
- Flexibility: suitable for common cases (many-to-many, weighted, partial mappings)
- Robustness: noisy graphs have little influence on others

Problem Formulation
- A^(1), A^(2), A^(3): affinity matrices
- S^(i,j)_{a,b} denotes the weight between the a-th instance in D_j and the b-th instance in D_i.
- Goal: partition each A^(π) into k_π clusters while considering the co-regularized constraints implicitly encoded in the cross-domain relationships in S.

Co-regularized multi-domain graph clustering (CGC)
Single-domain clustering
- Symmetric non-negative matrix factorization (NMF).
- Minimizing: L^(π) = || A^(π) - H^(π) (H^(π))^T ||_F^2, subject to H^(π) >= 0
- Here, H^(π) is the non-negative factor matrix, where the a-th row represents the cluster assignment of the a-th instance in domain D_π.
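To make the single-domain step concrete, here is a minimal sketch of symmetric NMF on a toy affinity matrix. The damped multiplicative update, the function names `symnmf` and `symnmf_loss`, and the parameters `beta`, `n_iter`, and `seed` are assumptions chosen for illustration; the slides do not say which optimizer the authors use.

```python
import numpy as np

def symnmf_loss(A, H):
    """Squared Frobenius reconstruction error ||A - H H^T||_F^2."""
    R = A - H @ H.T
    return float(np.sum(R * R))

def symnmf(A, k, n_iter=200, beta=0.5, eps=1e-12, seed=0):
    """Approximate a symmetric non-negative matrix A (n x n) by H H^T.

    Uses a damped multiplicative update (an assumed optimizer, not
    necessarily the one used in the talk). Returns H (n x k) with H >= 0.
    """
    rng = np.random.default_rng(seed)
    H = rng.random((A.shape[0], k)) + 0.1    # strictly positive start
    for _ in range(n_iter):
        numer = A @ H
        denom = H @ (H.T @ H) + eps          # eps avoids division by zero
        H = H * (1.0 - beta + beta * numer / denom)
    return H

# Toy affinity matrix with two obvious blocks: {0, 1} and {2, 3}.
A = np.array([[1.0, 1.0, 0.0, 0.0],
              [1.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 1.0],
              [0.0, 0.0, 1.0, 1.0]])
H = symnmf(A, k=2)
labels = H.argmax(axis=1)   # hard assignment: largest entry in each row
```

Each row of `H` can be read as a soft cluster assignment; taking the arg-max per row is one simple way to obtain hard labels.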

Co-regularized multi-domain graph clustering (CGC)
Cross-domain co-regularization
- Residual sum of squares (RSS) loss (when the number of clusters is the same for different domains).
- Clustering disagreement (CD) loss (when the number of clusters is the same or different).

Co-regularized multi-domain graph clustering (CGC)
Residual sum of squares (RSS) loss
- Directly compare the H^(π) inferred in different domains.
- To penalize the inconsistency of cross-domain cluster partitions for the l-th cluster in D_i, the loss for the b-th instance is defined by a formula (not preserved in this transcript) involving N, the set of indices of instances in D_i that are mapped to that instance, and |N|, its cardinality.
- The RSS loss is [equation not preserved in this transcript]
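As an illustration of the RSS idea, the sketch below compares each domain-j assignment with the weighted average of the domain-i assignments mapped to it, divided by the number of mapped instances. This form is an assumption pieced together from the slide's wording, not a transcription of the authors' equation; the name `rss_loss` and the choice to skip unmapped instances are likewise hypothetical.

```python
import numpy as np

def rss_loss(S, H_i, H_j):
    """RSS-style disagreement between two domains with the same number of clusters.

    S   : (n_j, n_i) non-negative weights; S[b, a] > 0 means instance a of
          domain i is mapped to instance b of domain j.
    H_i : (n_i, k) cluster assignments in domain i.
    H_j : (n_j, k) cluster assignments in domain j.
    """
    counts = (S > 0).sum(axis=1)          # |N(b)|: number of instances mapped to b
    mapped = counts > 0                   # assumption: unmapped instances are skipped
    expected = (S[mapped] @ H_i) / counts[mapped][:, None]
    diff = expected - H_j[mapped]
    return float(np.sum(diff * diff))

# Many-to-many, partial mapping: the third instance of domain j has no mapping.
S = np.array([[1.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [0.0, 0.0, 0.0]])
H_i = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
H_agree = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]])
H_disagree = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 0.0]])

print(rss_loss(S, H_i, H_agree))      # consistent partitions
print(rss_loss(S, H_i, H_disagree))   # inconsistent partitions
```

Because the two assignment matrices are subtracted entry by entry, this loss only makes sense when both domains use the same number of clusters, which matches the restriction stated on the slide.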

Co-regularized multi-domain graph clustering (CGC)
Clustering disagreement (CD) loss
- Indirectly measures the clustering inconsistency of cross-domain cluster partitions.
- Intuition: instances A and B are mapped to instance 2, and C is mapped to instance 4. If the similarity between the cluster assignments of 2 and 4 is small, then the similarity between the cluster assignments of A and C, and that between B and C, should also be small.
- The CD loss is [equation not preserved in this transcript], computed with a linear kernel.
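The CD intuition can be sketched by comparing pairwise similarity matrices rather than the assignments themselves. The code below is a hedged illustration assuming a linear kernel K(x, y) = x . y and an unnormalized mapping matrix `S`; the function name `cd_loss` and the exact form of the loss are assumptions, not the authors' equation.

```python
import numpy as np

def cd_loss(S, H_i, H_j):
    """CD-style loss: compare pairwise similarities instead of raw assignments.

    S   : (n_j, n_i) cross-domain weights.
    H_i : (n_i, k_i) assignments in domain i.
    H_j : (n_j, k_j) assignments in domain j; k_j may differ from k_i.
    """
    P = S @ H_i          # domain-i assignments carried over to domain-j instances
    K_i = P @ P.T        # linear-kernel similarity implied by domain i
    K_j = H_j @ H_j.T    # linear-kernel similarity within domain j
    D = K_i - K_j
    return float(np.sum(D * D))

S = np.eye(3)                                    # toy one-to-one mapping
H_i = np.array([[1.0, 0.0, 0.0],
                [1.0, 0.0, 0.0],
                [0.0, 0.0, 1.0]])                # three clusters in domain i
H_same = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])   # two clusters, same grouping
H_diff = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])   # two clusters, other grouping

print(cd_loss(S, H_i, H_same))
print(cd_loss(S, H_i, H_diff))
```

Since both kernels are n_j x n_j regardless of the number of clusters, this comparison remains defined when the domains use different cluster counts.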

Co-regularized multi-domain graph clustering (CGC)
Objective function (joint matrix optimization): [equation not preserved in this transcript]
- Can be solved with an alternating scheme: optimize the objective with respect to one variable while fixing the others.
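One plausible way to write such a joint objective, combining the single-domain factorization terms with weighted cross-domain penalties, is sketched below. The number of domains d, the set of related domain pairs \mathcal{I}, and the trade-off weights \lambda^{(i,j)} are assumed notation introduced here for illustration; \mathcal{J}^{(i,j)} stands for either the RSS or the CD loss.

```latex
\min_{H^{(\pi)} \ge 0,\ \pi = 1, \dots, d} \;
\mathcal{O} \;=\; \sum_{\pi=1}^{d} \bigl\| A^{(\pi)} - H^{(\pi)} (H^{(\pi)})^{\top} \bigr\|_F^2
\;+\; \sum_{(i,j) \in \mathcal{I}} \lambda^{(i,j)} \, \mathcal{J}^{(i,j)}
```

Under this reading, the alternating scheme updates one H^{(\pi)} at a time while every other factor matrix is held fixed.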

Re-Evaluating Cross-Domain Relationship
- The cross-domain instance relationship based on prior knowledge may contain noise.
- It is crucial to allow users to evaluate whether the provided relationships violate any single-domain clustering structures.

Re-Evaluating Cross-Domain Relationship
- We only need to slightly modify the co-regularization loss functions by multiplying in a confidence matrix W^(i,j).
- Optimize: [equation not preserved in this transcript]
- Sort the values of W^(i,j) and report the smallest elements to users.
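The reporting step can be sketched as follows, assuming a confidence matrix `W` has already been learned and has the same shape as the mapping matrix `S`. The function name `least_confident_mappings`, the `top` parameter, and the toy values are hypothetical.

```python
import numpy as np

def least_confident_mappings(W, S, top=5):
    """Return (row, col, confidence) for the mapped pairs with the smallest confidence.

    Only pairs with a non-zero entry in S are considered, since a confidence
    value is meaningful only where a mapping was actually provided.
    """
    rows, cols = np.nonzero(S)
    order = np.argsort(W[rows, cols], kind="stable")[:top]
    return [(int(rows[t]), int(cols[t]), float(W[rows[t], cols[t]])) for t in order]

# Toy example: four provided mappings, two of them with low confidence.
S = np.array([[1.0, 1.0, 0.0],
              [1.0, 0.0, 1.0]])
W = np.array([[0.9, 0.1, 0.7],
              [0.5, 0.8, 0.2]])
suspicious = least_confident_mappings(W, S, top=2)
print(suspicious)
```

Entries of `W` where no mapping exists (such as the 0.7 and 0.8 above) are ignored, so only relationships the user actually supplied are flagged.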

Experimental Study
Data sets:
- UCI (Iris, Wine, Ionosphere, WDBC)
  - Construct two cross-domain relationships: Iris-Wine and Ionosphere-WDBC (positive/negative instances are mapped only to positive/negative instances in the other domain)
- Newsgroup data (6 groups from 20 Newsgroups)
  - comp.os.ms-windows.misc, comp.sys.ibm.pc.hardware, comp.sys.mac.hardware (3 comp)
  - rec.motorcycles, rec.sport.baseball, rec.sport.hockey (3 rec)
- Protein-protein interaction (PPI) networks (from BioGRID), gene co-expression networks (from Gene Expression Omnibus), genetic interaction network (from TEAM)

Experimental Study: Effectiveness (UCI data set)

Experimental Study: Robustness Evaluation (UCI)

Experimental Study: Re-Evaluating Cross-Domain Relationship (UCI)

Experimental Study: Binary vs. Weighted Relationship

Experimental Study
Protein Module Detection by Integrating Multi-Domain Heterogeneous Data
- 5412 genes
- Genetic markers across 4890 (1952 disease and 2938 healthy) samples. We use the 1 million top-ranked genetic marker pairs to construct the network, with the test statistics as the weights on the edges.

Experimental Study
Protein Module Detection: Evaluation
- Standard Gene Set Enrichment Analysis (GSEA)
- We identify the most significantly enriched Gene Ontology categories
- Significance (p-value) is determined by Fisher's exact test
- Raw p-values are further calibrated to correct for the multiple testing problem
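The enrichment test described above can be sketched with the standard library alone. A one-sided Fisher's exact test for over-representation equals the upper tail of the hypergeometric distribution. The function names, the toy counts, and the use of a Bonferroni correction are assumptions; the slide does not say which multiple-testing correction was applied.

```python
from math import comb

def enrichment_p_value(module_size, hits, population, annotated):
    """One-sided Fisher's exact test for over-representation.

    Probability of seeing at least `hits` annotated genes in a module of
    `module_size` genes drawn from `population` genes, of which `annotated`
    carry the Gene Ontology term.
    """
    upper = min(module_size, annotated)
    total = comb(population, module_size)
    tail = sum(comb(annotated, x) * comb(population - annotated, module_size - x)
               for x in range(hits, upper + 1))
    return tail / total

def bonferroni(p_values):
    """Bonferroni correction (an assumed choice of multiple-testing correction)."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

# Toy numbers: a 5-gene module, 4 of which carry a term held by 5 of 20 genes.
p = enrichment_p_value(module_size=5, hits=4, population=20, annotated=5)
adjusted = bonferroni([p, 0.2, 0.6])
print(p, adjusted)
```

In practice the test would be repeated for every Gene Ontology category, and the correction applied across all of the resulting p-values.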

Experimental Study: Protein Module Detection: Comparison of CGC and single-domain graph clustering (k = 100)

Experimental Study: Protein Module Detection

Conclusion
In this paper:
- We propose a flexible co-regularized method, CGC, to tackle many-to-many, weighted, partial mappings for multi-domain graph clustering.
- CGC utilizes cross-domain relationships as a co-regularizing penalty to guide the search for a consensus clustering structure.
- CGC is robust even when the cross-domain relationships based on prior knowledge are noisy.

Thank You! Questions?

Experimental Study: Performance Evaluation