

The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL
Graph Regularized Dual Lasso for Robust eQTL Mapping
Wei Cheng (1), Xiang Zhang (2), Zhishan Guo (1), Yu Shi (3), Wei Wang (4)
(1) University of North Carolina at Chapel Hill, (2) Case Western Reserve University, (3) University of Science and Technology of China, (4) University of California, Los Angeles
Speaker: Wei Cheng
The 22nd Annual International Conference on Intelligent Systems for Molecular Biology (ISMB’14)

eQTL (Expression QTL)
Goal: identify genomic locations where genotype significantly affects gene expression.

Statistical Test
Partition individuals into groups according to the genotype of a SNP.
Run a statistical test (t-test, ANOVA) on the gene expression levels of the groups.
Repeat for each SNP.
Figure: SNPs (X) and gene expression levels (Z) across individuals; gene expression level plotted against the genotype of one SNP.
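The per-SNP scan described on this slide can be sketched as follows. This is a minimal illustration, assuming binary genotypes (as in a cross of two strains) and a Welch t statistic; the function names `welch_t` and `scan_snps` and the toy data are hypothetical, and p-values are left out (a library routine such as `scipy.stats.ttest_ind` could supply them).

```python
from math import sqrt
from statistics import mean, variance

def welch_t(a, b):
    """Welch t statistic for two samples (unequal variances allowed)."""
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))

def scan_snps(X, z):
    """X: list of SNP rows (0/1 genotype per individual).
    z: expression level of one gene for the same individuals.
    Returns one t statistic per SNP (None if a group is too small)."""
    stats = []
    for snp in X:
        # Partition individuals by their genotype at this SNP.
        g0 = [e for g, e in zip(snp, z) if g == 0]
        g1 = [e for g, e in zip(snp, z) if g == 1]
        # A sample variance needs at least two individuals per group.
        if len(g0) < 2 or len(g1) < 2:
            stats.append(None)
        else:
            stats.append(welch_t(g1, g0))
    return stats

# Toy data (hypothetical): 2 SNPs, 6 individuals, 1 gene.
X = [[0, 0, 0, 1, 1, 1],   # genotype tracks expression
     [0, 1, 0, 1, 0, 1]]   # genotype unrelated to expression
z = [1.0, 1.2, 0.9, 3.1, 2.9, 3.0]
t_stats = scan_snps(X, z)
```

The first SNP yields a much larger statistic than the second, which is the signal a single-SNP scan looks for. Each SNP is tested in isolation, which is the limitation the following slides address.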

Lasso-based feature selection
- X: the SNP matrix (each row is one SNP)
- Z: the gene expression matrix (each row is one gene expression level)
- Objective:
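The objective equation on this slide is not preserved in the transcript. A standard Lasso form consistent with the definitions of X and Z above would look like the following; the dimensions K (SNPs), N (genes), M (individuals) and the weight η are notation assumed here, not taken from the slide.

```latex
% Assumed notation: X \in \mathbb{R}^{K \times M},\ Z \in \mathbb{R}^{N \times M},\ W \in \mathbb{R}^{N \times K}
\min_{W}\ \frac{1}{2}\,\lVert Z - WX \rVert_F^2 \;+\; \eta\,\lVert W \rVert_1
```

Entry W_{ij} is the coefficient linking SNP j to gene i, and the l1 penalty drives most of these coefficients to zero so that only a few SNP-gene associations are selected.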

Incorporating prior knowledge
SNPs (and genes) are usually not independent.
The interplay among SNPs and the interplay among genes can be represented as networks and used as prior knowledge.
- Prior knowledge: genetic interaction network, PPI network, gene co-expression network, etc.
E.g., group lasso, multi-task, SIOL, MTLasso2G.

Limitations of current methods
A clustering step is usually needed to obtain the grouping information.
They do not take into consideration the incompleteness of the prior knowledge and the noise in it.
- E.g., PPI networks may contain many false interactions and miss true interactions.
Other prior knowledge, such as location and gene pathway information, is not considered.

Motivation
Examples of prior knowledge: the genetic interaction network S, and gene-gene interactions represented by a PPI network (or gene co-expression network) G. W is the matrix of regression coefficients to be learned.

GD-Lasso: Graph-regularized Dual Lasso
Objective, with its parts labeled:
- The Lasso objective considering confounding factors (L); ||L||_* is the nuclear norm, which keeps L low-rank.
- The graph regularizer.
- The fitting constraint for the prior knowledge.
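The objective itself is not preserved in the transcript. One form that matches the three labeled parts is sketched below. It is written under assumptions: the weights η, λ, α, β, γ, ρ, the degree matrices D_S and D_G, and the exact scaling of each term are illustrative, and any intercept term is omitted. S_0 and G_0 denote the prior networks, as on the simulation slides.

```latex
% Assumed notation: W \in \mathbb{R}^{N \times K}, S \in \mathbb{R}^{K \times K} (SNP graph),
% G \in \mathbb{R}^{N \times N} (gene graph), D_S and D_G the degree matrices of S and G.
\min_{W,\,L,\,S \ge 0,\,G \ge 0}\;
  \underbrace{\tfrac{1}{2}\lVert Z - WX - L \rVert_F^2
              + \eta\lVert W \rVert_1 + \lambda\lVert L \rVert_*}_{\text{Lasso with confounders}}
  \;+\; \underbrace{\alpha\,\mathrm{tr}\!\big(W (D_S - S) W^{\top}\big)
              + \beta\,\mathrm{tr}\!\big(W^{\top} (D_G - G) W\big)}_{\text{graph regularizer}}
  \;+\; \underbrace{\gamma\lVert S - S_0 \rVert_F^2
              + \rho\lVert G - G_0 \rVert_F^2}_{\text{fit to prior knowledge}}
```

The "dual" aspect is that S and G are optimization variables too: the graph regularizer pulls coefficients of connected SNPs (columns of W) and connected genes (rows of W) together, while the last two terms keep the refined graphs close to the noisy priors.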

GGD-Lasso: Generalized Graph-regularized Dual Lasso
Further incorporates location and pathway information.
Objective:
D(·, ·) is a nonnegative distance measure.
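The generalized objective is also missing from the transcript. A plausible form, assuming the same notation and weights as an ordinary graph-regularized Lasso and replacing the Laplacian penalty by the distance D(·, ·), is sketched here; w_{*i} denotes column i of W (one SNP) and w_{i*} denotes row i of W (one gene), both notational assumptions.

```latex
\min_{W,\,L,\,S \ge 0,\,G \ge 0}\;
  \tfrac{1}{2}\lVert Z - WX - L \rVert_F^2
  + \eta\lVert W \rVert_1 + \lambda\lVert L \rVert_*
  + \alpha \sum_{i,j} D(w_{*i}, w_{*j})\, S_{ij}
  + \beta  \sum_{i,j} D(w_{i*}, w_{j*})\, G_{ij}
```

With D chosen as the squared Euclidean distance and S symmetric, the first sum equals 2 tr(W (D_S - S) W^T), so the Laplacian-based graph regularizer is recovered as a special case up to a constant factor.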

GGD-Lasso: Optimization
Execute the following two steps iteratively until the termination condition is met:
- 1) update W while fixing S and G;
- 2) update S and G according to W, while decreasing: … and …
A fixed number of edges can be maintained in S and G. E.g., to update G, edge (i’, j’) and edge (i, j) can be swapped when …
This further integrates location and pathway information.
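The edge-swapping idea in step 2 can be illustrated with a small sketch. The swap condition is not preserved in the transcript, so the rule used here is an assumption: an existing edge is exchanged for a non-edge when the non-edge's endpoints have closer coefficient rows. The squared Euclidean distance stands in for D(·, ·), and `swap_one_edge` is a hypothetical helper name.

```python
def sq_dist(u, v):
    """Squared Euclidean distance; a stand-in for the slide's D(., .)."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def swap_one_edge(W, edges):
    """W: list of coefficient rows, one per gene.
    edges: set of frozenset({i, j}) pairs currently in the gene graph G.
    Exchanges the farthest-apart edge for the closest non-edge pair when
    that lowers the distance. The number of edges never changes."""
    n = len(W)
    pairs = [frozenset((i, j)) for i in range(n) for j in range(i + 1, n)]
    dist = {p: sq_dist(*(W[k] for k in sorted(p))) for p in pairs}
    non_edges = [p for p in pairs if p not in edges]
    if not edges or not non_edges:
        return set(edges)
    worst = max(edges, key=dist.get)      # edge least supported by W
    best = min(non_edges, key=dist.get)   # non-edge most supported by W
    if dist[best] < dist[worst]:          # assumed swap condition
        return (set(edges) - {worst}) | {best}
    return set(edges)

# Toy example (hypothetical): genes 0 and 1 have similar coefficients,
# but the prior graph only links genes 0 and 2.
W = [[1.0, 0.0], [1.1, 0.0], [0.0, 2.0]]
prior_edges = {frozenset((0, 2))}
new_edges = swap_one_edge(W, prior_edges)
```

Here the prior edge (0, 2) is replaced by (0, 1). In the full procedure such updates would alternate with re-estimating W, and the same idea applies to S using the columns of W.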

Experimental Study: simulation
10 gene expression profiles are generated by ~ ~ ~

Experimental Study: simulation
Figure: the ROC curve. The black solid line denotes what random guessing would have achieved.

Experimental Study: simulation
Figure: AUCs of Lasso, LORS, G-Lasso and GD-Lasso. In each panel, the percentage of noise in the prior networks S_0 and G_0 is varied.
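For readers unfamiliar with the metric, an AUC of this kind can be computed as sketched below. The setup is an assumption about how the simulation is scored: each candidate SNP-gene pair gets a score (for example, the magnitude of its learned coefficient) and a 0/1 label saying whether the association is truly present. The function name `auc` and the toy values are hypothetical.

```python
def auc(scores, labels):
    """Probability that a randomly chosen true association is scored above a
    randomly chosen false one (ties count one half). This equals the area
    under the ROC curve."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Toy data (hypothetical): six candidate SNP-gene pairs.
truth = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.1, 0.4, 0.35, 0.5, 0.8]
value = auc(scores, truth)
```

Random guessing gives an AUC of 0.5, which corresponds to the diagonal reference line on a ROC plot.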

Experimental Study: Yeast
Yeast eQTL dataset:
- 112 yeast segregants generated from a cross of two inbred strains, BY and RM;
- SNP markers with a fraction of NAs larger than 0.1 are removed (the incomplete SNPs are imputed), markers with the same genotypes are merged, and genes with missing values are dropped;
- this gives 1017 SNP markers and 4474 expression profiles.
Prior networks: genetic interaction network and PPI network (S and G).
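The marker filtering and merging steps can be sketched as follows. This is an illustration only: the data layout (a dict of genotype lists with `None` for missing calls), the helper name `filter_and_merge`, and the toy markers are assumptions, and the imputation step mentioned on the slide is not shown.

```python
def filter_and_merge(markers, max_na_frac=0.1):
    """markers: dict mapping marker name -> list of genotype calls,
    with None marking a missing call (NA).
    Drops markers whose NA fraction exceeds max_na_frac, then merges markers
    with identical genotype vectors under the first marker's name.
    Returns dict: representative name -> (genotypes, merged marker names)."""
    groups = {}  # genotype tuple -> names of markers sharing it
    for name, calls in markers.items():
        na_frac = sum(c is None for c in calls) / len(calls)
        if na_frac > max_na_frac:
            continue  # too many missing calls
        groups.setdefault(tuple(calls), []).append(name)
    return {names[0]: (list(geno), names) for geno, names in groups.items()}

# Toy data (hypothetical): 4 markers genotyped in 10 individuals.
markers = {
    "m1": [0, 1, 0, 1, 0, 1, 0, 1, 0, 1],
    "m2": [0, 1, 0, 1, 0, 1, 0, 1, 0, 1],           # same genotypes as m1
    "m3": [0, None, 0, 1, None, 1, 0, 1, 0, 1],     # 20% NA, dropped
    "m4": [1, 1, 0, 0, None, 1, 0, 1, 0, 0],        # 10% NA, kept
}
kept = filter_and_merge(markers)
```

Merging identical markers matters for sparse regression because perfectly correlated columns cannot be told apart by the model.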

Experimental Study: Yeast
cis-enrichment analysis:
- (1) a one-tailed Mann-Whitney test on each SNP for cis hypotheses;
- (2) a paired Wilcoxon signed-rank test on the p-values obtained from (1).
trans-enrichment:
- Similar strategy: genes regulated by transcription factors (TFs) are used as trans-acting signals.
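Step (1) can be sketched as below, under an assumed reading of the slide: for one SNP, the model's scores for cis hypotheses are compared against its scores for trans hypotheses, and a one-tailed Mann-Whitney test asks whether the cis scores tend to be larger. The p-value uses the normal approximation without tie or continuity correction, so it is only approximate for small samples; `scipy.stats.mannwhitneyu` with `alternative="greater"` is a library alternative. The function name and toy scores are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def mann_whitney_greater(x, y):
    """One-tailed Mann-Whitney test of 'values in x tend to exceed values in y'.
    Returns (U, p), where U counts the (x, y) pairs in which x wins
    (ties count one half) and p comes from the normal approximation."""
    u = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)
    n, m = len(x), len(y)
    mu = n * m / 2                          # mean of U under the null
    sigma = sqrt(n * m * (n + m + 1) / 12)  # sd of U under the null (no ties)
    return u, 1.0 - NormalDist().cdf((u - mu) / sigma)

# Toy scores (hypothetical) assigned by one model to one SNP's hypotheses.
cis_scores = [0.8, 0.9, 0.7, 0.95]
trans_scores = [0.1, 0.2, 0.3, 0.15, 0.25]
u_stat, p_value = mann_whitney_greater(cis_scores, trans_scores)
```

Repeating this for every SNP gives one p-value per SNP per model. Step (2) would then compare two models by running a paired Wilcoxon signed-rank test (for example `scipy.stats.wilcoxon`) on their matched per-SNP p-values.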

Experimental Study: Yeast
Pairwise comparison of different models using cis-enrichment and trans-enrichment analysis.

Experimental Study: Yeast
Summary of the top-15 hotspots detected by GGD-Lasso. Hotspot (12) in bold cannot be detected by G-Lasso. Hotspot (6) in italic cannot be detected by SIOL. Hotspot (3) in teletype cannot be detected by LORS.

Experimental Study: Yeast
Hotspots detected by different methods.

Conclusion
In this paper:
- We propose novel and robust graph regularized regression models that take into account the prior networks of SNPs and genes simultaneously.
- Exploiting the duality between the learned coefficients and the incomplete prior networks enables a more robust model.
- We also generalize our model to integrate other types of information, such as location and gene pathway information.

Thank You!
Questions?
Travel funding to ISMB 2014 was generously provided by DOE.