HUMAN-MOUSE CONSERVED COEXPRESSION NETWORKS PREDICT CANDIDATE DISEASE GENES Ala U., Piro R., Grassi E., Damasco C., Silengo L., Brunner H., Provero P.


HUMAN-MOUSE CONSERVED COEXPRESSION NETWORKS PREDICT CANDIDATE DISEASE GENES Ala U., Piro R., Grassi E., Damasco C., Silengo L., Brunner H., Provero P. and Di Cunto F. Molecular Biotechnology Center, University of Torino

Introduction
Massive repositories of gene expression data obtained with microarray technology are an extremely rich source of biological information. Since genes involved in the same functions tend to show very similar expression profiles, co-expression analysis of these datasets could be a powerful approach for inferring functional relationships among genes and for predicting the involvement of specific sequences in human genetic diseases. So far, however, gene co-expression has not proved to be a particularly useful criterion for disease gene identification.

Reasons
1. Microarray data are noisy.
2. Many genes showing very similar expression profiles are not functionally related (Spellman et al, 2002).
Functional relationships inferred on the basis of co-expression in a single species contain a large majority of false positive predictions.

A powerful help: phylogenetic conservation
Since gene regulatory regions evolve faster than coding regions, if the co-expression of two genes is evolutionarily conserved, it is much more likely that the genes are functionally related. The confidence level increases with the phylogenetic distance between the species compared.
Figure: a gene co-expression network constructed with expression data from distant species (H. sapiens, C. elegans, D. melanogaster, S. cerevisiae) (Stuart et al, 2003).

A powerful help: phylogenetic conservation
Human-mouse conserved co-expression represents an excellent compromise between sensitivity and specificity for predicting functional relationships among mammalian genes (Pellegrino et al, 2004).

Construction of human-mouse conserved coexpression networks for disease gene prediction
Step one: single species networks (Homo sapiens, Mus musculus)
1. Start from single-species datasets of microarray experiments, based on probes which can be linked to Entrez Gene IDs.
2. Evaluate the gene expression profile correlation among all the probes by Pearson's coefficient.
3. Link every probe with the probes which are in the first percentile of its ranked list.
4. Merge links between probes by Entrez Gene identifiers.
Result: human gene co-expression networks (H-GCN) and mouse gene co-expression networks (M-GCN).
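
The single-species steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function names, the dictionary inputs, and the choice to make a link undirected as soon as either probe ranks the other in its top 1% are all hypothetical.

```python
from math import sqrt, ceil

def pearson(x, y):
    """Pearson correlation of two equally long expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)  # assumes neither profile is constant

def single_species_network(profiles, probe_to_gene, top_fraction=0.01):
    """profiles: probe -> list of expression values (hypothetical input format).
    probe_to_gene: probe -> Entrez Gene ID; probes without an ID are skipped.
    Returns a set of undirected gene-gene links (frozensets of two IDs)."""
    probes = list(profiles)
    # Size of the "first percentile" of each ranked list (at least one probe).
    k = max(1, ceil(top_fraction * (len(probes) - 1)))
    edges = set()
    for p in probes:
        # Rank all other probes by decreasing correlation with p.
        ranked = sorted((q for q in probes if q != p),
                        key=lambda q: pearson(profiles[p], profiles[q]),
                        reverse=True)
        for q in ranked[:k]:
            gp, gq = probe_to_gene.get(p), probe_to_gene.get(q)
            # Merge probe links by gene ID; drop links within one gene.
            if gp is not None and gq is not None and gp != gq:
                edges.add(frozenset((gp, gq)))
    return edges
```

Sorting every ranked list is quadratic in the number of probes, which is acceptable for a sketch but would need a vectorised correlation matrix on real array data.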

Construction of human-mouse conserved coexpression networks for disease gene prediction
Step two: human-mouse networks
Select the links found in both the human (H-GCN) and the mouse (M-GCN) gene co-expression networks, matching genes between the two species according to Homologene. The result is the human-mouse conserved co-expression network.
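
As a rough sketch of this second step, the conserved network is the intersection of the two edge sets after mouse genes are translated to their human orthologs. The function name and the one-to-one `mouse_to_human` dictionary are assumptions made for illustration; the slides only say that orthology comes from Homologene.

```python
def conserved_network(human_edges, mouse_edges, mouse_to_human):
    """human_edges, mouse_edges: sets of frozensets holding two gene IDs.
    mouse_to_human: mouse gene ID -> human ortholog ID (assumed one-to-one).
    Returns the human links that are also present in mouse."""
    mapped = set()
    for edge in mouse_edges:
        a, b = tuple(edge)
        ha, hb = mouse_to_human.get(a), mouse_to_human.get(b)
        # Keep only links whose two ends both have a distinct human ortholog.
        if ha is not None and hb is not None and ha != hb:
            mapped.add(frozenset((ha, hb)))
    return human_edges & mapped
```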

Conserved co-expression networks
Data retrieval
Experiments based on cDNA platforms and performed mostly on tumor cell lines:
4129 experiments for EST probes for human
467 experiments for EST probes for mouse
Experiments based on Affymetrix platforms and performed on normal tissues:
353 experiments for probesets for human (Roth et al, 2006)
122 experiments for probesets for mouse (Su et al, 2004)

Conserved co-expression networks
Results
8512 nodes (genes); edges; nodes (genes); edges;
We concentrate our network analysis on CCs (co-expression clusters), defined as the nearest neighbors of each node of the networks, thus obtaining one CC for each gene.
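
A minimal sketch of how one CC per gene could be derived from a list of links, assuming the network is stored as a set of two-gene frozensets. The function name and the `include_center` switch are hypothetical: the slide defines a CC as the nearest neighbors of a node and does not say whether the node itself belongs to its own cluster.

```python
def coexpression_clusters(edges, include_center=False):
    """edges: set of frozensets, each holding two gene IDs.
    Returns a dict: gene -> set of its direct neighbors (its CC)."""
    clusters = {}
    for edge in edges:
        a, b = tuple(edge)
        clusters.setdefault(a, set()).add(b)
        clusters.setdefault(b, set()).add(a)
    if include_center:
        # Optional reading of the definition: the central gene joins its CC.
        for gene, members in clusters.items():
            members.add(gene)
    return clusters
```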

Conserved co-expression networks
Comparison with other networks
Both networks exhibit a highly significant overlap with protein-protein interactions reported in the Human Protein Reference Database.
[Figure: overlap with in vivo, in vitro and yeast-two-hybrid interactions for A-GCN, S-GCN and Random]
Good protein-protein predictors

Conserved co-expression networks
GO Analysis
A-CCN and S-CCN show a strong enrichment for functional annotation, compared with random permutations.
[Figure: A-CCN, S-CCN, Random]
Good criterion to identify functionally related genes

Predicting human disease genes
MimMiner (Van Driel et al, 2006), a text-mining database of phenotype similarity relationships, is a very useful way to merge co-expression data with disease information.
A-CCN and S-CCN also show a strong enrichment for OMIM IDs characterizing disease phenotypes.
[Figure: A-CCN, S-CCN, Random]

How the algorithm works (1)
[Diagram: OMIM locus (phenotype description); CCs (Conserved Co-expression clusters)]

How the algorithm works (2)
[Diagram: OMIM locus (phenotype description); CCs (Conserved Co-expression clusters); DRCCs (Disease Related Co-expression Clusters)]

How the algorithm works (3)
[Diagram: OMIM locus (phenotype description); DRCCs (Disease Related Co-expression Clusters)]
These genes become our candidate disease genes
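
The slides show the algorithm only as diagrams, so the following is a loose sketch of one way the three stages could fit together, not the published procedure. It assumes that a CC counts as disease related (a DRCC) when it contains enough genes already known for phenotypes similar to the one under study, and that the candidates are the genes of the mapped locus that fall inside a DRCC. The function name, the `similarity` callable (standing in for a MimMiner-like score), and both thresholds are placeholders that do not come from the slides.

```python
def predict_candidates(locus_genes, phenotype, clusters, gene_phenotypes,
                       similarity, min_similarity=0.4, min_support=2):
    """locus_genes: set of genes lying in the mapped OMIM locus.
    clusters: gene -> set of genes in its conserved co-expression cluster.
    gene_phenotypes: gene -> list of phenotypes it is known to cause.
    similarity: function(phenotype_a, phenotype_b) -> score (placeholder).
    min_similarity, min_support: placeholder thresholds, not from the slides."""
    candidates = set()
    for members in clusters.values():
        # Cluster genes already linked to a phenotype similar to ours.
        support = {g for g in members
                   if any(similarity(phenotype, p) >= min_similarity
                          for p in gene_phenotypes.get(g, ()))}
        if len(support) >= min_support:
            # Treated as a DRCC: its genes inside the locus become candidates.
            candidates |= members & locus_genes
    return candidates
```

A real implementation would presumably also attach a significance estimate to each DRCC; that step is omitted here because the slides do not describe it.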

Leave-one-out
Leave-one-out cross-validation tests over all known disease genes have shown good performance.
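
In outline, a leave-one-out test hides one known gene-phenotype association at a time and checks whether the predictor recovers the hidden gene. The sketch below is generic and hypothetical: `predict` and `locus_of` are caller-supplied stand-ins, and the slides do not say how the test loci were built or which performance measure was reported.

```python
def leave_one_out(known, predict, locus_of):
    """known: gene -> list of phenotypes it is known to cause.
    predict: function(locus_genes, phenotype, associations) -> set of genes.
    locus_of: function(gene) -> set of genes in a test locus around it.
    Returns the fraction of hidden associations that were recovered."""
    hits = 0
    trials = 0
    for gene, phenotypes in known.items():
        for phenotype in phenotypes:
            # Hide exactly this gene-phenotype association.
            held_out = {g: [p for p in ps if (g, p) != (gene, phenotype)]
                        for g, ps in known.items()}
            candidates = predict(locus_of(gene), phenotype, held_out)
            trials += 1
            hits += gene in candidates
    return hits / trials if trials else 0.0
```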

Predicting human disease genes
Results
We applied our procedure to 850 OMIM phenotype entries with unknown molecular basis (but mapped to one or more genetic loci). This yielded 321 candidate genes, covering a set of 81 loci (65 from A-CCN, 6 from S-CCN and 10 from both networks).

Examples and discussion of some candidates

Conclusions
Our approach, based on conserved co-expression analysis, has proved particularly successful in providing reliable predictions of potential disease-causing genes because of two main factors:
1. the phylogenetic filter
2. the integration with quantitative phenotype correlation data
In conclusion, we propose that our method and our list of candidates will provide useful support for the identification of new disease-causing genes.

Our real network … Ala U. Piro R. Silengo L. Damasco C. Grassi E. Provero P. Di Cunto F. Brunner H.