How a cell is wired (figure labels: DNA, mRNA, Protein, Small molecules, Environment, Regulatory RNA). The dynamics of such interactions emerge as cellular processes and functions.



Molecular interaction networks
How do the genes and their products interact to collectively perform a function?
(Figure labels: A, B, Gene G, 35 RPM, Inhibitor, U2AF.)

A network containing genes connected to each other whenever they physically or functionally interact:
- Proteins that interact/co-complex (ribosomal, polymerase, etc.)
- Transcription factors and their targets
- Enzymes catalyzing different steps in the same metabolic pathway
- Genes with correlated expression
- Genes with similar phylogenetic profiles
Functional ^

Arabidopsis is the primary model organism for plants
- Complex organization from the molecular to the whole-organism level.
- A key challenge: understanding the cellular machinery that sustains this complexity.
- In the current post-genomic era, a main aspect of this challenge is ‘gene function prediction’: identification of the functions of all the (~30,000) genes in the genome.

Extent of gene annotations in Arabidopsis
Total of ~30,000 genes in the genome:
- ~15% with some experimental annotation
- ~8% with ‘expert’ annotation
- ~13% with annotations based on manually curated computational analysis
- ~14% with electronic annotations
- Leaving ~50% of the genome without any annotation
Ashburner et al. (2000) Nat. Gen.; Swarbreck et al. (2008) Nuc. Acids Res.

Exploit high-throughput data
- Integrating functional genomic data could lead to network models of gene interactions that resemble the underlying cellular map.
- Typically these networks contain gene functional interactions, connecting pairs of genes that participate in the same biological processes.
- In such a network, the very place of a gene establishes the functional context of that gene.
- ‘Guilt-by-association’: genes of unknown function can be imputed with the functions of their annotated neighbors.
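The guilt-by-association idea can be sketched in a few lines. This is only an illustration, assuming a weighted edge list and a simple "sum of edge weights to annotated neighbors" score; the function name `neighbor_scores`, the gene names, and the weights are all hypothetical and not taken from the study.

```python
from collections import defaultdict

def neighbor_scores(edges, annotated):
    """Rank unannotated genes by total edge weight to annotated neighbors.

    edges: iterable of (gene_u, gene_v, weight) for an undirected network.
    annotated: set of genes already known to have the function.
    """
    scores = defaultdict(float)
    for u, v, w in edges:
        if u in annotated and v not in annotated:
            scores[v] += w
        elif v in annotated and u not in annotated:
            scores[u] += w
    # Highest score first; ties broken by gene name for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))

# Hypothetical toy network (weights chosen to be exact in floating point).
edges = [
    ("g1", "g2", 1.0),
    ("g1", "g3", 0.5),
    ("g2", "g3", 2.0),
    ("g3", "g4", 0.25),
    ("g4", "g5", 1.0),
]
annotated = {"g1", "g4"}
ranking = neighbor_scores(edges, annotated)
print(ranking)
```

Genes with no annotated neighbor simply do not appear in the ranking, which is one reason direct-neighbor methods struggle when few genes are annotated.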

Functional interaction networks  Functional interaction network models have been developed for Arabidopsis.  Lee et al. (2010) Rational association of genes with traits using a genome-scale gene network for Arabidopsis thaliana.  Very comprehensive in terms of using and integrating datasets in other organisms for application in plants.  Integrated 24 datasets: 5 datasets from Arabidopsis and the rest from other models.  AraNet: 19,647 genes, 1,062,222 interactions.

Goal of this study …  We examine the state of network-based gene function prediction in Arabidopsis.  Evaluate the performance of multiple prediction algorithms on AraNet.  Assesses the influence of the number of genes annotated to a function and the source of annotation evidence.  Compute the correlation of prediction performance with network properties.  Evaluate prediction performance for plant-specific functions.

Network-based gene function prediction algorithms
Approach: propagation of functional annotations across the network; guilt-by-association using direct interactions
Training examples: use positive and negative examples; use only positive examples
Algorithms: SinkSource; Hopfield; FunctionalFlow – multiple phases; Local; FunctionalFlow – 1 phase; Local+
Each gene in the network
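To make the propagation idea concrete, here is a minimal sketch in the spirit of SinkSource as it is commonly described: positive examples are held at score 1, negative examples at 0, and every other gene repeatedly takes the weighted average of its neighbors' scores. The function name, the fixed iteration count, and the toy network are assumptions for illustration, not the implementation used in the study.

```python
def sinksource_sketch(edges, positives, negatives, iterations=100):
    """Iteratively average neighbor scores with positives fixed at 1, negatives at 0."""
    # Build a weighted adjacency list for the undirected network.
    nbrs = {}
    for u, v, w in edges:
        nbrs.setdefault(u, []).append((v, w))
        nbrs.setdefault(v, []).append((u, w))

    score = {g: 0.0 for g in nbrs}
    for g in positives:
        score[g] = 1.0
    for g in negatives:
        score[g] = 0.0

    unknown = [g for g in nbrs if g not in positives and g not in negatives]
    for _ in range(iterations):
        updated = dict(score)
        for g in unknown:
            total_weight = sum(w for _, w in nbrs[g])
            updated[g] = sum(w * score[n] for n, w in nbrs[g]) / total_weight
        score = updated
    return {g: score[g] for g in unknown}

# Hypothetical path: pos - a - b - neg, all weights 1.
edges = [("pos", "a", 1.0), ("a", "b", 1.0), ("b", "neg", 1.0)]
scores = sinksource_sketch(edges, positives={"pos"}, negatives={"neg"})
print(scores)
```

On this path the scores settle near 2/3 for `a` and 1/3 for `b`, so the gene closer to the positive example ranks higher even though neither gene is told anything beyond its neighbors.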

Network-based gene function prediction

Network-based gene function prediction
(Figure legend: Function A, Function B.)

In this study … (figure labels: Sink, Source)
Recall: fraction of known examples predicted correctly = TP / (TP + FN)
Precision: fraction of predictions that are correct = TP / (TP + FP)
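The two definitions above translate directly into code. The sketch below assumes predictions and known examples are given as collections of gene names; the function name and the example genes are hypothetical.

```python
def precision_recall(predicted, known):
    """Return (precision, recall) for a set of predicted genes against known genes."""
    predicted, known = set(predicted), set(known)
    tp = len(predicted & known)   # predicted and truly annotated
    fp = len(predicted - known)   # predicted but not annotated
    fn = len(known - predicted)   # annotated but missed
    precision = tp / (tp + fp) if predicted else 0.0
    recall = tp / (tp + fn) if known else 0.0
    return precision, recall

# Hypothetical example: 4 predictions, 3 known genes, 2 in common.
p, r = precision_recall(["g1", "g2", "g3", "g4"], ["g1", "g3", "g5"])
print(p, r)
```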

Performance of different algorithms
- Computational gene function prediction precedes and guides experimental validation.
- What we get is a ranked list of novel predictions.
- An experimenter would choose a manageable number of top-scoring predictions to pursue.
- What matters, therefore, is precision at the top of the prediction list.
- We choose precision at 20% recall (P20R) as the measure of performance.
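One plausible way to compute a precision-at-recall measure such as P20R is to walk down the ranked list and report precision at the first rank where recall reaches the chosen level. The slides do not spell out the exact procedure, so treat the function below (including its name and the handling of unreachable recall) as an assumption.

```python
def precision_at_recall(ranked, positives, recall_level=0.2):
    """Precision at the first rank where recall >= recall_level.

    ranked: genes ordered from highest to lowest prediction score.
    positives: genes known (e.g. held out) to have the function.
    """
    positives = set(positives)
    if not positives:
        raise ValueError("need at least one positive example")
    tp = 0
    for k, gene in enumerate(ranked, start=1):
        if gene in positives:
            tp += 1
            if tp / len(positives) >= recall_level:
                return tp / k
    return 0.0  # the requested recall is never reached in this list

# Hypothetical ranking: positives p0..p9 interleaved with negatives x0..x9.
positives = [f"p{i}" for i in range(10)]
ranked = [g for i in range(10) for g in (f"p{i}", f"x{i}")]
p20r = precision_at_recall(ranked, positives)
print(p20r)
```

With ten positives, 20% recall means recovering two of them; here that happens at rank 3, giving a precision of 2/3.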

Performance of different algorithms
SS (SinkSource) seems to be better than the other algorithms.
What about the influence of the number of genes in a function?
(Plot labels: 3rd quartile; median; 1st quartile; using only annotations based on experimental/expert evidence.)

Performance of different algorithms
(Figure axes: number of genes annotated with a function; number of functions. Functions are split into a first, second, and third group, each containing ~125 functions.)

Performance of different algorithms
For ‘small’ functions:
- The algorithm does not matter!
- Using just experimental annotations is better when you know little about a function.
For ‘medium’ functions:
- SS is a little better.
- The effect of using ‘electronic’ evidence is mixed.
For ‘large’ functions:
- SS is clearly the best.
- Using all annotations is better.

Performance of different algorithms
Wilcoxon test: SS vs. other algorithms (columns: all ECs; sans IEA/ISS)
Overall, SinkSource appears to be the best algorithm.

Correlation of performance with network properties
- Performance on a particular function might depend on how its genes are organized/connected among themselves in the network.
- Number of nodes
- Number of components
- Fraction of nodes in the largest connected component
- Total edge weight
- Weighted density
- Average weighted degree
- Average segregation

Correlation of performance with network properties

- Number of nodes = 9
- Number of components = 3
- Fraction of nodes in the largest connected component = 4/9
- Total edge weight = 8
- Weighted density = 8/36
- Average weighted degree = 16/9
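The values above are consistent with weighted density defined as total edge weight divided by the number of possible gene pairs (8 / C(9,2) = 8/36) and average weighted degree as twice the total edge weight divided by the number of nodes (2·8/9 = 16/9). The sketch below assumes those definitions and uses a hypothetical 9-node graph (a 4-cycle, a triangle, and a single edge, all with unit weights) chosen only to reproduce the numbers; it is not the graph drawn on the slide.

```python
from fractions import Fraction

def network_properties(nodes, edges):
    """Compute simple topological properties of an undirected weighted graph."""
    adj = {n: set() for n in nodes}
    for u, v, _ in edges:
        adj[u].add(v)
        adj[v].add(u)

    # Connected components via depth-first search.
    seen, sizes = set(), []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:
            node = stack.pop()
            size += 1
            for nbr in adj[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    stack.append(nbr)
        sizes.append(size)

    n = len(nodes)
    total_weight = sum(Fraction(w) for _, _, w in edges)
    return {
        "nodes": n,
        "components": len(sizes),
        "largest_component_fraction": Fraction(max(sizes), n),
        "total_edge_weight": total_weight,
        "weighted_density": total_weight / Fraction(n * (n - 1), 2),
        "average_weighted_degree": 2 * total_weight / n,
    }

# Hypothetical graph: 4-cycle a-b-c-d, triangle e-f-g, single edge h-i.
nodes = list("abcdefghi")
edges = [("a", "b", 1), ("b", "c", 1), ("c", "d", 1), ("d", "a", 1),
         ("e", "f", 1), ("f", "g", 1), ("g", "e", 1),
         ("h", "i", 1)]
props = network_properties(nodes, edges)
print(props)
```

`Fraction` keeps the ratios exact, so 8/36 is reported in lowest terms as 2/9.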

Correlation of performance with network properties Functional modularity: Average Segregation

Correlation of performance with network properties
Functional modularity: Average Segregation
- Avg. seg = 8/22
- Avg. seg = 12/15

Correlation of performance with network properties
We have …
- Vector of SS P20R values for each function
- Vector of values of a particular topological property for each function
- Spearman rank correlation
(Scatter plot axes: weighted density; P20R.)
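The correlation step can be illustrated with a small pure-Python Spearman implementation: rank both vectors (averaging ranks for ties) and take the Pearson correlation of the ranks. The numbers below are made up for illustration and are not results from the study.

```python
def average_ranks(values):
    """Return 1-based ranks, assigning tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Hypothetical per-function values (one entry per function).
weighted_density = [0.1, 0.4, 0.2, 0.8, 0.5]
p20r_values = [0.05, 0.30, 0.20, 0.90, 0.25]
rho = spearman(weighted_density, p20r_values)
print(rho)
```

A rank-based measure is a natural fit here because topological properties and P20R live on very different scales and need not be linearly related.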

Correlation of performance with network properties Spearman rank correlation

Performance on plant-specific functions
For ‘conserved’ functions:
- Performance is better than that for all functions
- Using all annotations is better
For ‘plant-specific’ functions:
- Performance is much worse compared to ‘conserved’ functions
- Using only experimental annotations is better
- The underlying network is built based on data from multiple non-plant species
(Plot labels: 3rd quartile; median; 1st quartile; using only annotations based on experimental/expert evidence.)

Most predictable ‘conserved’ functions
- protein folding
- nucleotide transport
- innate immunity
- cytoskeleton organization, and
- cell cycle

Least predictable ‘conserved’ functions
- regulation of …
Specialized functions

Most predictable ‘plant-specific’ functions
- cell wall modification
- auxin/cytokinin signaling, and
- photosynthesis
Contribution from Arabidopsis datasets

Least predictable ‘plant-specific’ functions
- development, morphogenesis
- pattern formation
- phase transitions of various tissues, organs / growth stages

Conclusions  Evaluated the performance of various prediction algorithms on AraNet.  SinkSource is the overall best prediction algorithm.  Measured the influence of the number of genes annotated to a function and the source of annotation evidence.  All algorithms perform poorly when only a small number of genes are ‘known’ or when annotating very specific functions.  When only a small number of genes are ‘known’, use only experimentally verified annotations to make new predictions.  When a considerable number of genes are ‘known’, use all annotations to make new predictions.

Conclusions  Measured the correlation of performance with network properties  Several topological properties correlate well with performance.  ‘Average segregation’ has the strongest correlation.

Conclusions  Assessed performance on conserved/plant-specific functions  Performance on basic ‘conserved’ functions is better than that for all the functions.  Specialized ‘conserved’ functions are hard to predict.  Performance on ‘plant-specific’ functions is very poor.  Also a consequence of the fact that ‘plant-specific’ functions generally have small number of annotations.

Conclusions  Avenues for improvement in functional interaction networks  Build functional interaction networks that are based on a larger collection of plant datasets.  If possible, rely as little as possible on data from other species.  Avenues for future experimental work  ‘Plant-specific’ functions and  Specialized ‘conserved’ functions.

Acknowledgements
- Arjun Krishnan
- Brett Tyler
- Andy Pereira