
DNA, mRNA, Protein, Small molecules, Environment, Regulatory RNA: How a cell is wired. The dynamics of such interactions emerge as cellular processes and functions.


1

2 How a cell is wired
[Figure labels: DNA, mRNA, Protein, Small molecules, Environment, Regulatory RNA]
The dynamics of such interactions emerge as cellular processes and functions.

3 Molecular interaction networks
How do the genes and their products interact to collectively perform a function?
[Figure labels: A, B, Gene G, 35 RPM, Inhibitor, U2AF]

4 A network containing genes connected to each other whenever they physically or functionally interact
- Proteins that interact/co-complex (ribosomal, polymerase, etc.)
- Transcription factors and their targets
- Enzymes catalyzing different steps in the same metabolic pathway
- Genes with correlation in expression
- Genes with similar phylogenetic profiles
[Slide annotation: "Functional", with an insertion caret]

5 Arabidopsis is the primary model organism for plants
- Complex organization from molecular to whole organism level.
- A key challenge …
  - Understanding the cellular machinery that sustains this complexity.
- In the current post-genomic times, a main aspect of this challenge is ‘gene function prediction’:
  - Identification of functions of all the (~30,000) genes in the genome.

6 Extent of gene annotations in Arabidopsis
Total of ~30,000 genes in the genome:
- ~15% with some experimental annotation
- ~8% with ‘expert’ annotation
- ~13% with annotations based on manually curated computational analysis
- ~14% with electronic annotations
- Leaving ~50% of the genome without any annotation
Ashburner et al. (2000) Nat. Genet.; Swarbreck et al. (2008) Nucleic Acids Res.

7 Exploit high-throughput data
- Integrating functional genomic data could lead to
  - Network models of gene interactions that resemble the underlying cellular map.
- Typically these networks contain gene functional interactions
  - Connecting pairs of genes that participate in the same biological processes.
- In such a network, the very place of a gene establishes the functional context of that gene.
- ‘Guilt-by-association’: genes of unknown function can be imputed with the functions of their annotated neighbors.
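As a rough illustration of guilt-by-association, the sketch below scores each unannotated gene by the weighted fraction of its neighbors that already carry a function. The toy network, the gene names, and the `build_graph` and `neighbor_vote` helpers are hypothetical and are not taken from the slides or from any particular network.

```python
from collections import defaultdict

def build_graph(edges):
    """Build an undirected weighted adjacency map from (gene, gene, weight) triples."""
    graph = defaultdict(dict)
    for u, v, w in edges:
        graph[u][v] = w
        graph[v][u] = w
    return graph

def neighbor_vote(graph, annotated):
    """Score each unannotated gene by the weighted fraction of annotated neighbors."""
    scores = {}
    for gene, nbrs in graph.items():
        if gene in annotated:
            continue
        total = sum(nbrs.values())
        hits = sum(w for n, w in nbrs.items() if n in annotated)
        scores[gene] = hits / total if total else 0.0
    return scores

# Hypothetical toy network: g1 and g2 are annotated with the function.
edges = [("g1", "g3", 1.0), ("g2", "g3", 1.0), ("g3", "g4", 2.0), ("g4", "g5", 1.0)]
graph = build_graph(edges)
scores = neighbor_vote(graph, annotated={"g1", "g2"})
# Genes sorted from most to least likely to share the function.
ranked = sorted(scores, key=scores.get, reverse=True)
```

Here g3 is ranked first because half of its edge weight leads to annotated genes, while g4 and g5 have no annotated direct neighbors and score zero.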

8 Functional interaction networks
- Functional interaction network models have been developed for Arabidopsis.
- Lee et al. (2010) Rational association of genes with traits using a genome-scale gene network for Arabidopsis thaliana.
  - Very comprehensive in terms of using and integrating datasets from other organisms for application in plants.
  - Integrated 24 datasets: 5 datasets from Arabidopsis and the rest from other model organisms.
  - AraNet: 19,647 genes, 1,062,222 interactions.

9 Goal of this study …
- We examine the state of network-based gene function prediction in Arabidopsis.
  - Evaluate the performance of multiple prediction algorithms on AraNet.
  - Assess the influence of the number of genes annotated to a function and the source of annotation evidence.
  - Compute the correlation of prediction performance with network properties.
  - Evaluate prediction performance for plant-specific functions.

10 Network-based gene function prediction algorithms
How annotations are used:
- Propagation of functional annotations across the network
- Guilt-by-association using direct interactions
Which examples are used:
- Use positive and negative examples
- Use only positive examples
Algorithms: SinkSource, Hopfield, FunctionalFlow (multiple phases), Local, FunctionalFlow (1 phase), Local+
[Slide fragment: Each gene in the network]
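The slides name SinkSource but do not give its update rule. The sketch below assumes the commonly described formulation, in which positive examples act as sources fixed at 1, negative examples act as sinks fixed at 0, and every other gene repeatedly takes the weighted average of its neighbors' scores. The `sink_source` function, its parameters, and the toy graph are hypothetical illustrations, not the study's implementation.

```python
def sink_source(graph, positives, negatives, iterations=100):
    """Iteratively average neighbor scores; positives stay at 1, negatives at 0.

    Assumed formulation (not stated in the slides): fixed-value sources and
    sinks, with all other genes updated to the weighted mean of their neighbors.
    """
    scores = {g: 0.0 for g in graph}
    for g in positives:
        scores[g] = 1.0
    for _ in range(iterations):
        updated = dict(scores)
        for gene, nbrs in graph.items():
            if gene in positives or gene in negatives:
                continue  # sources and sinks keep their fixed values
            total = sum(nbrs.values())
            if total > 0:
                updated[gene] = sum(w * scores[n] for n, w in nbrs.items()) / total
        scores = updated
    return scores

# Hypothetical chain: p (positive) - a - b - n (negative), unit weights.
graph = {
    "p": {"a": 1.0},
    "a": {"p": 1.0, "b": 1.0},
    "b": {"a": 1.0, "n": 1.0},
    "n": {"b": 1.0},
}
scores = sink_source(graph, positives={"p"}, negatives={"n"})
```

On this chain the scores settle near 2/3 for `a` and 1/3 for `b`, so genes closer to the positive example rank higher than genes closer to the negative one.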

11 Network-based gene function prediction

12 Network-based gene function prediction
[Figure legend: Function A, Function B]

13 In this study …
[Figure labels: Sink, Source]
- Recall: fraction of known examples predicted correctly, $\frac{TP}{TP + FN}$
- Precision: fraction of predictions that are correct, $\frac{TP}{TP + FP}$
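The two definitions can be checked on a small example. This is a minimal sketch with made-up gene identifiers; `precision_recall` is a hypothetical helper name.

```python
def precision_recall(predicted, known_positives):
    """Return (precision, recall) for a set of predictions against known positives."""
    predicted = set(predicted)
    known_positives = set(known_positives)
    tp = len(predicted & known_positives)   # predicted and truly annotated
    fp = len(predicted - known_positives)   # predicted but not annotated
    fn = len(known_positives - predicted)   # annotated but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical example: 4 predictions, 3 known positives, 2 in common.
precision, recall = precision_recall(
    predicted=["g1", "g2", "g3", "g4"],
    known_positives=["g1", "g2", "g5"],
)
```

With TP = 2, FP = 2 and FN = 1, precision is 2/4 and recall is 2/3.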

14 Performance of different algorithms
- Computational gene function prediction precedes and guides experimental validation
  - What we get is a ranked list of novel predictions
  - An experimenter would choose a manageable number of top-scoring predictions to pursue
- Precision at the top of the prediction list
  - We choose precision at 20% recall (P20R) as the measure of performance
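One way to read P20R is to walk down the ranked list until 20% of the known positives have been recovered and report the precision at that rank. The sketch below follows that reading; the slides do not specify how ties or interpolation are handled, so those details, the `precision_at_recall` name, and the example data are assumptions.

```python
def precision_at_recall(ranked_genes, known_positives, recall_level=0.2):
    """Precision at the first rank where recall reaches recall_level."""
    known_positives = set(known_positives)
    if not known_positives:
        return 0.0
    tp = 0
    for rank, gene in enumerate(ranked_genes, start=1):
        if gene in known_positives:
            tp += 1
            if tp / len(known_positives) >= recall_level:
                return tp / rank
    return 0.0  # the requested recall is never reached

# Hypothetical held-out evaluation: 5 known positives, 10 ranked predictions.
ranked = ["g7", "g1", "g9", "g2", "g3", "g8", "g4", "g6", "g5", "g0"]
positives = {"g1", "g2", "g3", "g4", "g5"}
p20r = precision_at_recall(ranked, positives)
```

In this example 20% recall means one of the five positives; the first positive appears at rank 2, so the precision at that point is 1/2.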

15 Performance of different algorithms
[Box plot legend: 1st quartile, median, 3rd quartile. Using only annotations based on experimental/expert evidence]
- SinkSource (SS) seems to be better than the other algorithms
- What about the influence of the number of genes in a function?

16 Performance of different algorithms
[Figure axes: Number of genes annotated with a function, Number of functions; regions marked First group, Second group, Third group]
Each group contains ~125 functions

17 Performance of different algorithms
For ‘small’ functions
- The algorithm does not matter!
- Using just experimental annotations is better when you know little about a function.
For ‘medium’ functions
- SS is a little better
- The effect of using ‘electronic’ evidence is mixed.
For ‘large’ functions
- SS is clearly the best
- Using all annotations is better

18 Performance of different algorithms
[Panel titles: All ECs; Sans IEA/ISS]
Wilcoxon test: SS vs. other algorithms
Overall, SinkSource appears to be the best algorithm.

19 Correlation of performance with network properties
- Performance on a particular function might depend on how its genes are organized / connected among themselves in the network.
  - Number of nodes
  - Number of components
  - Fraction of nodes in the largest connected component
  - Total edge weight
  - Weighted density
  - Average weighted degree
  - Average segregation

20 Correlation of performance with network properties

21

22
- Number of nodes = 9
- Number of components = 3
- Fraction of nodes in the largest connected component = 4/9
- Total edge weight = 8
- Weighted density = 8/36
- Average weighted degree = 16/9
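The figure behind these numbers is not in the transcript, so the sketch below uses a hypothetical unit-weight network built to reproduce them. It assumes weighted density is total edge weight divided by the number of possible gene pairs (36 for 9 nodes) and average weighted degree is twice the total edge weight divided by the number of nodes, which is consistent with the values on the slide. `topology_summary` is an illustrative name.

```python
from fractions import Fraction

def topology_summary(nodes, edges):
    """Summarize the subnetwork induced by one function's genes.

    Uses exact fractions, so edge weights are assumed to be integers here.
    """
    adj = {n: {} for n in nodes}
    for u, v, w in edges:
        adj[u][v] = w
        adj[v][u] = w
    # Connected components via depth-first search.
    seen, components = set(), []
    for start in nodes:
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            node = stack.pop()
            if node in comp:
                continue
            comp.add(node)
            stack.extend(adj[node])
        seen |= comp
        components.append(comp)
    n = len(nodes)
    total_weight = sum(w for _, _, w in edges)
    return {
        "nodes": n,
        "components": len(components),
        "largest_fraction": Fraction(max(len(c) for c in components), n),
        "total_weight": total_weight,
        "weighted_density": Fraction(total_weight, n * (n - 1) // 2),
        "avg_weighted_degree": Fraction(2 * total_weight, n),
    }

# Hypothetical unit-weight network chosen to match the slide's numbers.
nodes = list("abcdefghi")
edges = [
    ("a", "b", 1), ("b", "c", 1), ("c", "d", 1), ("d", "a", 1),  # 4-node component
    ("e", "f", 1), ("f", "g", 1), ("g", "e", 1),                  # 3-node component
    ("h", "i", 1),                                                # 2-node component
]
summary = topology_summary(nodes, edges)
```

Exact fractions keep the output directly comparable with the values listed on the slide; real edge weights would normally be floats.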

23 Correlation of performance with network properties
Functional modularity: Average Segregation

24 Correlation of performance with network properties
Functional modularity: Average Segregation
- Avg. seg = 8/22
- Avg. seg = 12/15

25 Correlation of performance with network properties
- We have …
  - Vector of SS P20R values for each function
  - Vector of values of a particular topological property for each function
- Spearman rank correlation
[Plot axes: Weighted density, P20R]
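The comparison of the two vectors can be sketched as follows: Spearman rank correlation is the Pearson correlation of the ranks, with tied values sharing their average rank. The numbers are invented for illustration, and `average_ranks` and `spearman` are hypothetical helper names; in practice a library routine such as `scipy.stats.spearmanr` would do the same job.

```python
def average_ranks(values):
    """Rank values from 1, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the two rank vectors.

    Assumes neither vector is constant (otherwise the denominator is zero).
    """
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical values, one entry per function.
p20r = [0.10, 0.35, 0.50, 0.80, 0.65]
weighted_density = [0.01, 0.04, 0.03, 0.20, 0.09]
rho = spearman(weighted_density, p20r)
```

Only the second and third functions swap order between the two rankings, so the correlation comes out high (0.9) but not perfect.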

26 Correlation of performance with network properties Spearman rank correlation

27 Performance on plant-specific functions
[Box plot legend: 1st quartile, median, 3rd quartile. Using only annotations based on experimental/expert evidence]
For ‘conserved’ functions
- Performance is better than that for all functions
- Using all annotations is better
For ‘plant-specific’ functions
- Performance is much worse compared to ‘conserved’ functions
- Using only experimental annotations is better
- The underlying network is built based on data from multiple non-plant species

28 Most predictable ‘conserved’ functions
- protein folding
- nucleotide transport
- innate immunity
- cytoskeleton organization, and
- cell cycle

29 Least predictable ‘conserved’ functions
- regulation of …
Specialized functions

30 Most predictable ‘plant-specific’ functions
- cell wall modification
- auxin/cytokinin signaling, and
- photosynthesis
Contribution from Arabidopsis datasets

31 Least predictable ‘plant-specific’ functions
- development, morphogenesis
- pattern formation
- phase transitions of various tissues, organs / growth stages

32 Conclusions
- Evaluated the performance of various prediction algorithms on AraNet.
  - SinkSource is the overall best prediction algorithm.
- Measured the influence of the number of genes annotated to a function and the source of annotation evidence.
  - All algorithms perform poorly when only a small number of genes are ‘known’ or when annotating very specific functions.
  - When only a small number of genes are ‘known’, use only experimentally verified annotations to make new predictions.
  - When a considerable number of genes are ‘known’, use all annotations to make new predictions.

33 Conclusions
- Measured the correlation of performance with network properties
  - Several topological properties correlate well with performance.
  - ‘Average segregation’ has the strongest correlation.

34 Conclusions
- Assessed performance on conserved/plant-specific functions
  - Performance on basic ‘conserved’ functions is better than that for all the functions.
  - Specialized ‘conserved’ functions are hard to predict.
  - Performance on ‘plant-specific’ functions is very poor.
  - This is also a consequence of the fact that ‘plant-specific’ functions generally have a small number of annotations.

35 Conclusions
- Avenues for improvement in functional interaction networks
  - Build functional interaction networks that are based on a larger collection of plant datasets.
  - If possible, rely as little as possible on data from other species.
- Avenues for future experimental work
  - ‘Plant-specific’ functions and
  - Specialized ‘conserved’ functions.

36 Acknowledgements
- Arjun Krishnan
- Brett Tyler
- Andy Pereira

