Comparative Expression
Moran Yassour

Goal
- Build a multi-species gene-coexpression network
- Find functions of unknown genes
- Discover how the genes interact
- Distinguish accidentally regulated genes from those that are physiologically important

Construction of a gene-coexpression network
Evolutionarily diverse organisms with extensive microarray data: Homo sapiens, Drosophila melanogaster, Caenorhabditis elegans and Saccharomyces cerevisiae. We first associated genes from one organism with their orthologous counterparts in other organisms.

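Evolution 101: Paralogs vs. Orthologs
Orthologs are genes in different species that descend from a single ancestral gene through speciation; paralogs are genes related by a duplication event within a genome.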

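Constructing metagenes
- Link each gene to its best BLAST hit in each of the other organisms.
- Ignore non-reciprocal hits.
- Identify connected components; each component is a metagene (MEG).
Using this method, we assigned each gene to at most a single metagene.
[Figure: human, fly, worm and yeast genes joined by best BLAST hits into one metagene (MEG)]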
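A minimal sketch of the metagene construction, assuming best BLAST hits were computed beforehand and are stored in a hypothetical nested dict `best_hit[(organism, gene)][other_organism] = (other_organism, hit_gene)`. Only reciprocal pairs are kept; they are joined with a union-find, and each connected component becomes one metagene, so every gene ends up in at most one metagene. This illustrates the idea on the slide and is not the original pipeline.

```python
from collections import defaultdict

def reciprocal_pairs(best_hit):
    """Keep only pairs of genes that are each other's best BLAST hit."""
    pairs = set()
    for query, hits in best_hit.items():
        for target in hits.values():
            # the target's best hit back in the query's organism must be the query
            if best_hit.get(target, {}).get(query[0]) == query:
                pairs.add(frozenset((query, target)))
    return pairs

def metagenes(best_hit):
    """Group reciprocal best hits into connected components (metagenes)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for pair in reciprocal_pairs(best_hit):
        a, b = tuple(pair)
        parent[find(a)] = find(b)

    # genes without any reciprocal hit never enter `parent`: no metagene for them
    components = defaultdict(set)
    for gene in parent:
        components[find(gene)].add(gene)
    return list(components.values())
```

For example, a yeast gene whose best human hit does not point back to it is simply left out, while mutually best-hitting human, fly and worm genes collapse into one metagene.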

Some numbers
In total we have 6307 metagenes (6591 human genes, 5180 worm genes, 5802 fly genes and 2434 yeast genes). We sought to identify pairs of metagenes that were not only coexpressed in one experiment and in one organism, but also correlated across diverse experiments in multiple organisms.

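Edges in the graph
- For a metagene such as MEG1, the other metagenes are ranked by how strongly their expression correlates with MEG1, separately in each organism (human, fly, worm).
- MEG2's ranks in the three lists (in the example, {2, 4, 2}) are combined into a single P-value.
- If the combination is significant (P-value < 0.05), an edge is drawn between MEG1 and MEG2.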
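To make the edge test concrete, here is a sketch that ranks metagenes by Pearson correlation in each organism and combines the ranks with a simple order statistic. Under the null hypothesis each normalized rank is uniform and independent, so the probability that all k ranks are at least as good as the worst observed one is worst^k. This max-rank test is a stand-in chosen for simplicity, since the slide does not give the exact statistic; the input format (one hypothetical `profiles` dict of metagene to expression vector per organism) is also an assumption.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant expression vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def rank_among(query, target, profiles):
    """1-based rank of `target` among all other metagenes sorted by decreasing
    correlation with `query` in one organism, plus the length of that list."""
    others = [m for m in profiles if m != query]
    others.sort(key=lambda m: pearson(profiles[query], profiles[m]), reverse=True)
    return others.index(target) + 1, len(others)

def edge_pvalue(query, target, organisms):
    """Max-rank order statistic: P(all k normalized ranks <= worst) = worst**k.
    Assumes both metagenes have a profile in every organism passed in."""
    ranks = [rank_among(query, target, profiles) for profiles in organisms]
    worst = max(r / n for r, n in ranks)
    return worst ** len(ranks)

def has_edge(query, target, organisms, alpha=0.05):
    """Draw an edge when the combined P-value is below alpha."""
    return edge_pvalue(query, target, organisms) < alpha
```

With the slide's ranks {2, 4, 2} among, say, 6306 other metagenes per organism (a hypothetical list length), this formula gives (4/6306)^3, far below 0.05.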

Statistical tests (1) – permuted metagenes
We constructed a network from a set of permuted metagenes (each a random collection of genes from each organism). At P < 0.05, the real networks contained 3.5 ± 0.03 times as many interactions as the random networks.
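A sketch of how permuted metagenes could be generated, assuming for simplicity that each metagene is a dict from organism to a single gene (real metagenes can hold several genes per organism). Within each organism the genes are shuffled across metagenes, so every random metagene keeps the same set of organisms but its genes are no longer orthologs. `permute_metagenes` and `fold_over_random` are hypothetical helpers.

```python
import random

def permute_metagenes(metagenes, rng):
    """Random metagenes: each slot keeps its organisms, but within each
    organism the genes are shuffled across slots."""
    permuted = [dict(m) for m in metagenes]
    for org in sorted({o for m in metagenes for o in m}):
        slots = [i for i, m in enumerate(metagenes) if org in m]
        genes = [metagenes[i][org] for i in slots]
        rng.shuffle(genes)
        for i, gene in zip(slots, genes):
            permuted[i][org] = gene
    return permuted

def fold_over_random(real_edges, random_edge_counts):
    """Ratio of real interactions to the mean count over random networks."""
    return real_edges / (sum(random_edge_counts) / len(random_edge_counts))
```

Building a network from many such permutations, counting its edges at P < 0.05 and dividing the real count by the mean gives a ratio of the kind reported on the slide.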

Statistical tests (2) – half the data
We split the microarray data into two halves and built a network from each half. We then counted the fraction of interactions that were significant in one network (P < 0.05), given that they were significant in the other network at P < p, for various values of p. At p = 0.05, 41% of the interactions significant in one network were also significant in the other.
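The split-half check can be sketched as follows, assuming each network is summarized as a hypothetical dict mapping a metagene pair to its P-value. `split_conditions` makes the random halves of the arrays, and `conditional_reproducibility` computes the conditional fraction described above for one value of p.

```python
import random

def split_conditions(n_conditions, rng):
    """Randomly split array (condition) indices into two halves."""
    idx = list(range(n_conditions))
    rng.shuffle(idx)
    half = n_conditions // 2
    return sorted(idx[:half]), sorted(idx[half:])

def conditional_reproducibility(p_first, p_second, p, alpha=0.05):
    """Fraction of pairs significant in the first network (P < alpha) among
    the pairs that are significant in the second network at P < p."""
    given = [pair for pair, pv in p_second.items() if pv < p]
    if not given:
        return float("nan")
    hits = sum(1 for pair in given if p_first.get(pair, 1.0) < alpha)
    return hits / len(given)
```

Sweeping p over several values traces the curve summarized on the slide.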

Statistical tests (3) – noise stability
We added increasing levels of Gaussian noise to the entire data set for each of the organisms.
[Figure: negative log P-value in the real network vs. negative log P-value after adding noise]
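A sketch of the perturbation step, assuming each organism's data is a plain list of rows (genes) by columns (arrays); `add_gaussian_noise` and `noisy_versions` are hypothetical helpers. Each noisy copy would be passed through the same network construction, and its negative log P-values compared with those of the real network.

```python
import random

def add_gaussian_noise(expr, sigma, rng):
    """Copy of an expression matrix with independent N(0, sigma^2) noise
    added to every entry."""
    return [[x + rng.gauss(0.0, sigma) for x in row] for row in expr]

def noisy_versions(expr, sigmas, seed=0):
    """One noisy copy of the data per noise level (seeded for repeatability)."""
    rng = random.Random(seed)
    return {s: add_gaussian_noise(expr, s, rng) for s in sigmas}
```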

Visualization
- x-y plane: layout based on the negative logarithm of the P-values
- z axis: density of genes in the region
- Components identified by K-means clustering

Example – Component 5
- A total of 241 metagenes, 110 of which were previously known to be involved in the cell cycle.
- There are 202 cell cycle metagenes in the whole network. P-value < …
- Of the 241 metagenes in the component: 30 regulate the cell cycle, 80 have terminal cell cycle functions and 131 are of unknown function.
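One standard way to obtain an enrichment P-value like the one on this slide is the hypergeometric tail probability. With N metagenes in the network, K known cell cycle metagenes, and a component of n metagenes containing k known ones, P = P(X ≥ k). The slide states neither the test used nor the total network size N, so both are assumptions in this sketch.

```python
from math import comb

def hypergeom_tail(k, n, K, N):
    """P(X >= k) for X ~ Hypergeometric(population N, K successes, n draws)."""
    top = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, top + 1))
    return hits / comb(N, n)
```

For component 5 the call would be `hypergeom_tail(110, 241, 202, N)`, with N set to the number of metagenes in the network.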

Experimental validation (1) – expression data
We chose five metagenes with a significant number of links to known cell proliferation genes and measured their expression levels in dividing pancreatic cancer cells and in nondividing normal cells.

Experimental validation (2) – loss-of-function mutant
We examined the loss-of-function mutant phenotype of one of these genes, the C. elegans gene ZK652.1. RNA interference (RNAi) of ZK652.1 resulted in excess nuclei in the germ line, suggesting that the wild-type function of this gene is to suppress germline proliferation.

Multi-species vs. single species (1)
For each of the five genes, we also constructed an organism-specific neighborhood. On average, the neighborhoods of these five genes were over four times more enriched for cell proliferation and cell cycle genes in the multiple-species network than in the best single-species neighborhood.

Multi-species vs. single species (2)
The networks were compared on how well they link together genes previously known to be involved in a single function (coverage) while excluding genes not known to participate in that function (accuracy).
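The slide does not define exactly how coverage and accuracy were computed. One plausible reading, used in this hypothetical sketch: coverage is the fraction of known genes linked to at least one other known gene, and accuracy is the fraction of links leaving known genes that land on other known genes. The network is assumed to be an undirected adjacency dict.

```python
def coverage_and_accuracy(adjacency, known):
    """adjacency: dict gene -> set of neighbours (undirected network).
    known: genes previously annotated with one function."""
    known = set(known)
    # coverage: known genes linked to at least one other known gene
    linked = [g for g in known if adjacency.get(g, set()) & (known - {g})]
    coverage = len(linked) / len(known)
    # accuracy: share of links leaving known genes that reach known genes
    links = [(g, h) for g in known for h in adjacency.get(g, set())]
    accuracy = sum(h in known for _, h in links) / len(links) if links else 0.0
    return coverage, accuracy
```

Computing both numbers for the multiple-species network and for each single-species network gives a side-by-side comparison for one functional category.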

Huge data
The multiple-species network was built from more DNA microarray data (3182 arrays). Constructing the network from only 979 DNA microarrays (the size of the worm data set) gave similar results.

Summary - Multi is good
- We map only genes that have orthologs in other species, so the network focuses strongly on core, conserved biological processes.
- Interactions in the multiple-species network imply a functional relationship based on evolutionary conservation.
- Nice to have: analysis of the other components.

Goal
Comparative study of large datasets of expression profiles from six evolutionarily distant organisms.

Goal
- Coexpression is often conserved.
- Comparing the regulatory relationships between particular functional groups in the different organisms.
- Comparing global topological properties of the transcription networks derived from the expression data, using a graph theoretical approach.

Homologous gene with preserved function

Coexpression conservation
Coexpressed groups: yeast transcription modules. For each yeast module we constructed five “homologue modules”, one in each of the other organisms.
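A sketch of building homologue modules, assuming a hypothetical homology table per organism that maps each yeast gene to its set of homologues there. The homologue module in one organism is the union of the homologues of the yeast module's genes; nothing is refined at this stage.

```python
def homologue_module(yeast_module, homologs):
    """homologs: dict yeast gene -> set of homologous genes in ONE other
    organism. Returns the union of homologues of the module's genes."""
    module = set()
    for gene in yeast_module:
        module |= homologs.get(gene, set())
    return module

def homologue_modules(yeast_module, homologs_by_organism):
    """One homologue module per other organism."""
    return {org: homologue_module(yeast_module, table)
            for org, table in homologs_by_organism.items()}
```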

Refining homologue modules
The signature algorithm identifies those homologues that are coexpressed under a subset of the experimental conditions. Furthermore, it reveals additional genes that are not homologous to any of the original genes but display a similar expression pattern under those conditions.
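A simplified, single-pass sketch in the spirit of the signature algorithm, assuming an already normalized expression table (for example log-ratios). Conditions are scored by the average expression of the input (homologue) genes and kept if their standardized score is large; then every gene is scored on those conditions, which is how genes without homologues can enter the refined module. The thresholds `t_cond` and `t_gene` and the normalization are assumptions, not the published settings; the iterative signature algorithm repeats such a step until the gene set stops changing.

```python
from statistics import fmean, pstdev

def zscores(values):
    """Standardize a list of numbers to mean 0 and unit (population) SD."""
    mu, sd = fmean(values), pstdev(values)
    return [(v - mu) / sd if sd > 0 else 0.0 for v in values]

def signature(expr, input_genes, t_cond=1.0, t_gene=0.5):
    """expr: dict gene -> expression values over the same conditions.
    input_genes: e.g. a homologue module.
    Returns (refined module, indices of the selected conditions)."""
    n_cond = len(next(iter(expr.values())))
    # condition scores: mean expression of the input genes, standardized
    cond_z = zscores([fmean(expr[g][j] for g in input_genes) for j in range(n_cond)])
    selected = [j for j, z in enumerate(cond_z) if abs(z) > t_cond]
    # gene scores for ALL genes: condition-score-weighted sum on selected conditions
    names = list(expr)
    gene_z = zscores([sum(cond_z[j] * expr[g][j] for j in selected) for g in names])
    module = {g for g, z in zip(names, gene_z) if z > t_gene}
    return module, selected
```

In a toy table where input genes A and B peak in the first two conditions, a third gene C with the same peak joins the module even though it was not in the input set.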

Correlation distribution
Distribution of the Z-scores of the average gene–gene correlation for all the “homologue modules”.
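A sketch of one way to compute such a Z-score: compare a module's average pairwise correlation with that of random gene sets of the same size. Using random gene sets as the null, and the defaults `n_random=200` and `seed=0`, are assumptions made for illustration; the slide does not describe the reference distribution.

```python
import math
import random
from itertools import combinations
from statistics import fmean, pstdev

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def mean_pairwise_corr(genes, expr):
    """Average gene-gene correlation within a gene set (size >= 2)."""
    return fmean(pearson(expr[a], expr[b]) for a, b in combinations(genes, 2))

def module_zscore(module, expr, n_random=200, seed=0):
    """Z-score of a module's average correlation against random same-size sets."""
    rng = random.Random(seed)
    observed = mean_pairwise_corr(module, expr)
    pool = sorted(expr)
    null = [mean_pairwise_corr(rng.sample(pool, len(module)), expr)
            for _ in range(n_random)]
    return (observed - fmean(null)) / pstdev(null)
```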

Higher-order regulatory structures

Cell Cycle Experiments

Subsets of the data
Correlations between the sets of conditions for randomly selected subsets of the data. Although the data are sparse, the findings reflect real properties of the expression network.

Decomposition of the expression data
The expression data were decomposed into a set of transcription modules using the iterative signature algorithm (ISA). Modules are colored according to the fraction of their genes that have homologues in the other organism.
[Figure label: Protein synthesis]

Power-law connectivity distribution
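A power-law connectivity distribution means P(k) ∝ k^(−γ), where k is the number of links of a gene. One quick way to estimate γ from a list of connectivities is the maximum-likelihood approximation for discrete data, γ ≈ 1 + n / Σ ln(k_i / (k_min − 1/2)), summed over the n genes with k_i ≥ k_min. This estimator is chosen here for illustration and is not taken from the slides; k_min must be picked by the user, and the edge-list input is an assumption.

```python
import math
from collections import Counter

def degrees(edges):
    """Connectivity (number of links) of every gene in an undirected edge list."""
    deg = Counter()
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return deg

def powerlaw_exponent(degree_values, k_min=1):
    """Approximate discrete MLE of gamma in P(k) ~ k**-gamma for k >= k_min."""
    tail = [k for k in degree_values if k >= k_min]
    return 1 + len(tail) / sum(math.log(k / (k_min - 0.5)) for k in tail)
```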

Connections & Connectivity
Connections between genes of similar connectivity are enhanced (red regions); connections between highly and weakly connected genes are suppressed (blue regions).
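The red/blue pattern compares how often genes of two connectivity classes are linked with how often they would be linked in a random network with the same connectivities. This hypothetical sketch uses logarithmic connectivity bins and the configuration-model expectation (ignoring self-loops and multi-edges); both are choices made here, since the slide does not say how the expectation was obtained.

```python
import math
from collections import Counter

def log2_bin(k):
    """Logarithmic connectivity class: 1 -> 0, 2-3 -> 1, 4-7 -> 2, ..."""
    return int(math.log2(k))

def connectivity_enrichment(edges, bin_of=log2_bin):
    """Observed / expected links between connectivity classes.
    Ratios > 1 are enhanced (red), ratios < 1 suppressed (blue)."""
    deg = Counter()
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    m = len(edges)
    # number of link ends sitting on genes of each class
    ends = Counter()
    for k in deg.values():
        ends[bin_of(k)] += k
    observed = Counter(tuple(sorted((bin_of(deg[a]), bin_of(deg[b])))) for a, b in edges)
    ratio = {}
    classes = sorted(ends)
    for i, c1 in enumerate(classes):
        for c2 in classes[i:]:
            q1, q2 = ends[c1] / (2 * m), ends[c2] / (2 * m)
            expected = m * q1 * q2 * (1 if c1 == c2 else 2)
            ratio[(c1, c2)] = observed[(c1, c2)] / expected
    return ratio
```

In a star network, for example, every link joins the hub class to the leaf class, so that pair comes out enriched and hub-hub and leaf-leaf pairs come out suppressed.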

Essentiality & Connectivity
The likelihood that a gene is essential increases with its connectivity.

Homology & Connectivity
The highly connected genes are more likely to have homologues in the other organisms.
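Both trends (essentiality and homology increasing with connectivity) can be measured the same way: bin genes by connectivity and report, per bin, the fraction that has the property. The logarithmic binning and the input formats (a connectivity dict and a set of genes with the property) are assumptions in this hypothetical sketch.

```python
import math
from collections import defaultdict

def fraction_by_connectivity(degree, has_property, bin_of=lambda k: int(math.log2(k))):
    """degree: dict gene -> connectivity (>= 1).
    has_property: set of genes that are, e.g., essential, or that have
    homologues in the other organisms.
    Returns {connectivity bin: fraction of genes in the bin with the property}."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for gene, k in degree.items():
        b = bin_of(k)
        totals[b] += 1
        hits[b] += gene in has_property
    return {b: hits[b] / totals[b] for b in sorted(totals)}
```

A fraction that rises from low to high bins corresponds to the trend stated on each slide.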

Summary
Similarities at low resolution, differences at high resolution:
- All expression networks share common topological properties (scale-free connectivity distribution, high degree of modularity).
- The modular components of each transcription program, as well as their higher-order organization, appear to vary significantly between organisms and are likely to reflect organism-specific requirements.

Future
- Gene expression studies
- Evolution studies

Thank you …