Joshua M. Stuart, Eran Segal, Daphne Koller, Stuart K. Kim


1 A Gene Coexpression Network for Global Discovery of Conserved Genetic Modules
Joshua M. Stuart, Eran Segal, Daphne Koller, Stuart K. Kim Presented by Carri-Lyn Mead

2 Investigating Gene Function
Genome sequences for human, fly, worm, and yeast
DNA microarrays
Coregulated genes
Functionally related genes
Correlated expression patterns
Cross-species comparison of gene expression
The fundamental purpose of this paper is to look at gene function from the perspective of evolutionary conservation of coregulated genes. Full genome sequences for a range of diverse organisms are now being completed, which makes large-scale analysis possible. The paper takes advantage of this newly available information in an attempt to discover gene function.
Thanks to the recent emergence of DNA microarray technology, large sets of microarray data are now available for humans, flies, worms, and yeast. We can use microarray data to look at coregulated genes. Coregulated genes often participate in the same pathway, so we can infer that functionally related genes will tend to exhibit expression patterns that are correlated under a large number of diverse conditions. However, coregulation does not necessarily mean that genes are functionally related (cis-regulatory DNA motifs).
The researchers used data from several species to determine evolutionary conservation, which allowed them to identify the functionally important genes within a set of coregulated genes. Coregulation of a pair of genes over large evolutionary distances implies that coregulation of those genes confers a selective advantage.

3 To measure evolutionarily conserved coexpression on a genome-wide scale, create a gene coexpression network

4 Step 1: Find Meta-genes
6307 total meta-genes: 6591 human, 5180 worm, 5802 fly, 2434 yeast
Step 1: Associate genes from one organism with orthologs in other organisms.
Identify orthologs by an all-against-all BLAST between every pair of protein sequences from each organism.
Identify the meta-gene sets. Meta-genes are sets of genes across multiple organisms whose protein sequences are each other's best reciprocal BLAST hits. Each gene is therefore assigned to at most one meta-gene.
Over 78% of meta-genes are transitive. Transitivity means that if human gene A is a reciprocal best BLAST hit of worm gene B and fly gene C, then worm gene B and fly gene C must also be reciprocal best BLAST hits for genes A, B, and C to be grouped as orthologs.
Some of the links may identify close homologs rather than true orthologs; this might be the case for the 22% of meta-genes that do not exhibit transitivity. To determine whether using a close homolog rather than an ortholog would significantly affect the network relationships, we calculated the Pearson correlation between close homologs and found the correlations to be significantly high. This result indicates that the meta-gene network would yield similar results using close homologs rather than true orthologs.
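The reciprocal best hit rule and the transitivity check described on this slide can be illustrated with a minimal sketch. It assumes the best BLAST hits have already been computed and stored in a dictionary; the `best_hit` layout, the gene names, and both function names are hypothetical and are not taken from the paper.

```python
from itertools import combinations

def reciprocal_pairs(best_hit):
    """Return all gene pairs that are each other's best hit.

    best_hit maps (gene, target_species) -> best-scoring gene in target_species,
    where every gene is a (species, name) tuple (an assumed layout).
    """
    pairs = set()
    for (gene, _target), hit in best_hit.items():
        # Look up the hit's best match back in the query gene's own species.
        if best_hit.get((hit, gene[0])) == gene:
            pairs.add(frozenset((gene, hit)))
    return pairs

def is_transitive(genes, pairs):
    """True if every pair of genes in the group is a reciprocal best hit."""
    return all(frozenset((a, b)) in pairs for a, b in combinations(genes, 2))

# Toy data: A-B and A-C are reciprocal, but fly gene C prefers worm gene D.
A, B, C, D = ("human", "A"), ("worm", "B"), ("fly", "C"), ("worm", "D")
best_hit = {
    (A, "worm"): B, (B, "human"): A,
    (A, "fly"): C, (C, "human"): A,
    (B, "fly"): C, (C, "worm"): D,
}
pairs = reciprocal_pairs(best_hit)
group_is_transitive = is_transitive([A, B, C], pairs)
```

In this toy case the group {A, B, C} is linked through the human gene but is not transitive, which is the situation the notes describe for the roughly 22% of meta-genes that may contain close homologs.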

5 Step 2: Identify Meta-genes with correlated coexpression
3182 DNA microarrays: 1202 human, 979 worm, 155 fly, 643 yeast
The DNA microarrays contain expression profiles showing how gene expression is perturbed by:
developmental stage
different growth conditions
stress
disease
specific mutations
Correlation of expression profiles for a set of genes across different experimental conditions suggests a functional relationship.

6 Step 2: Identify Meta-genes with correlated coexpression
Pearson correlation of gene pairs
Rank genes by Pearson correlation
Generate P-value of rank configuration
P < 0.05 cutoff indicates coexpression
Link coexpressed meta-genes
Correlation of expression profiles for a set of genes across different experimental conditions suggests that the genes are functionally related.
Computed the Pearson correlation of expression profiles between every pair of genes in the microarray data sets for each organism.
Ranked all genes according to their Pearson correlations.
Used a probabilistic method based on order statistics to generate a P-value for the probability of observing a particular configuration of ranks across the different organisms by chance.
Used P < 0.05 as a cutoff to indicate that two meta-genes are coexpressed.
Combined all links between pairs of coexpressed meta-genes to construct the network.
Pearson correlation measures the relative shape of the expression profiles rather than their absolute levels. It is a natural choice because it is widely used to measure gene correlations.
Let g_ms be a gene belonging to meta-gene m in species s. We ranked all of the other genes relative to g_ms based on their Pearson correlation and then divided the rank by the total number of genes with meta-genes (and with data) in organism s, yielding n rank ratios for the (m, m') pair: r_1, r_2, ..., r_n. To find out how significant the gene correlations of the pair are, we computed the probability of getting the observed rank ratios by chance, where the order of the species did not matter.
To correct for the multiple tests performed, we used an adjusted P-value cutoff. Specifically, for a significance level of α = 0.05, we included any meta-gene interactions with P-values less than α/N, where N was the total number of meta-genes containing data in at least two organisms. In our case, N = 4725, giving a P-value cutoff of 1.05 × 10^-5. Using this cutoff, we expected Nα = 236 false positives.
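As a rough illustration of the rank-ratio step and the adjusted cutoff, here is a small sketch on made-up expression profiles for a single organism. The gene names and profile values are invented, and dividing by the number of other genes in the toy data set is an assumption standing in for "genes with meta-genes (and with data)". The order-statistics P-value itself is not reproduced, since the slides do not give its formula.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def rank_ratio(profiles, query, partner):
    """Rank of `partner` among all other genes, ordered by decreasing
    correlation with `query`, divided by the number of other genes."""
    others = [g for g in profiles if g != query]
    ordered = sorted(others,
                     key=lambda g: pearson(profiles[query], profiles[g]),
                     reverse=True)
    return (ordered.index(partner) + 1) / len(others)

# Hypothetical profiles for four genes in one organism.
profiles = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.0, 4.0, 6.0, 8.5],   # tracks g1 closely
    "g3": [4.0, 3.0, 2.0, 1.0],   # anti-correlated with g1
    "g4": [1.0, 3.0, 2.0, 4.0],   # moderately correlated with g1
}
ratio = rank_ratio(profiles, "g1", "g2")   # g2 ranks first of three

# Adjusted cutoff, using the numbers quoted in the notes.
alpha = 0.05
n_metagenes = 4725
cutoff = alpha / n_metagenes
expected_false_positives = n_metagenes * alpha
```

A small rank ratio in every organism is what makes a meta-gene pair look coexpressed; the P-value then asks how likely such a set of ratios would be by chance.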

7 Gene Coexpression Network
Result: Network of 3416 metagenes Connected by 22,163 expression interactions

8 3-D Terrain Map
Used to visualize the interconnectivity of the network in order to gain insights into the evolutionarily conserved patterns of expression and the coregulation architecture common to all four organisms.
Used a 3D layout program called VxInsight.
Meta-genes are placed near each other in the X-Y plane according to the negative log of their P-value.
The density of genes in a region is shown by the altitude in the Z direction.
Highly interconnected areas of the network appear as peaks in the map.
Each link in the terrain map suggests a potential interaction between two genes that has been conserved across evolution, so the linked genes are likely to be functionally related.
Used K-means clustering on the X-Y coordinates to define 12 regions of the terrain map, each containing a large number of highly interconnected meta-genes; these regions are referred to as components.
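The clustering step can be sketched with a plain K-means (Lloyd's algorithm) on 2-D points. This is only an illustration on toy coordinates with k = 2; it is not the VxInsight layout or the authors' code, and the function name, seed, and point values are all assumptions.

```python
import random

def kmeans_xy(points, k, iterations=50, seed=0):
    """Cluster (x, y) points into k groups with Lloyd's algorithm."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)   # start from k distinct data points

    def assign(current_centers):
        # Put each point in the cluster of its nearest center.
        clusters = [[] for _ in range(k)]
        for x, y in points:
            nearest = min(
                range(k),
                key=lambda i: (x - current_centers[i][0]) ** 2
                              + (y - current_centers[i][1]) ** 2,
            )
            clusters[nearest].append((x, y))
        return clusters

    for _ in range(iterations):
        clusters = assign(centers)
        # Move each center to the mean of its cluster (keep it if empty).
        new_centers = [
            (sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c))
            if c else centers[i]
            for i, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, assign(centers)

# Two well-separated toy "peaks" in the X-Y plane.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, clusters = kmeans_xy(points, k=2)
```

On the real terrain map the same idea would be applied with k = 12 to the meta-gene coordinates, giving the 12 components.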

9 Component 5
Strongly enriched for meta-genes involved in cell cycle processes
Contains 241 meta-genes
110 previously known to be involved in cell cycle
131 not previously known to be involved in cell cycle
Linking the 131 remaining genes to known cell cycle meta-genes in the network suggests new cell cycle functions for these genes.

10 Testing Significance of Results
Rule out random pairs of meta-genes having significant coexpression interactions
Ensure broad and diverse microarray data
Test network stability with added noise
1. It is possible that the set of meta-genes exhibited only a few simple types of expression patterns, so that even random pairs of meta-genes might appear to have significant coexpression interactions. To rule this out, the authors generated a set of permuted meta-genes, each consisting of a random collection of genes from each organism, and compared the number of interactions in the random network to the real network for a wide range of P-values. This was repeated five times with different random permutations of orthologs. They always found significantly more links in the real network than in the random network. At P < 0.05, the real networks contained / times more interactions than the random networks.
2. A significant fraction of the gene expression links should be present in networks built using only a random half of the data. The authors randomly split the data in each organism's data set into two halves and built two coexpression networks, each using only half the data. They then counted the fraction of interactions that were significant in one network (P < 0.05), given that they were significant in the other network at P < p for various p. This was repeated five times. At P = 0.05, 41% of significant expression interactions were in both networks.
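Both checks lend themselves to a short sketch. The code below assumes each meta-gene is stored as a dictionary from species to gene name and each network as a collection of meta-gene pairs; those layouts, the function names, and the toy data are hypothetical. Building the networks themselves is left out.

```python
import random

def permute_metagenes(metagenes, seed=0):
    """Build random meta-genes by shuffling genes within each species.

    Each permuted meta-gene keeps the same set of species as the original,
    but its genes are drawn at random from that species' genes.
    """
    rng = random.Random(seed)
    species = sorted({s for m in metagenes for s in m})
    permuted = [dict() for _ in metagenes]
    for s in species:
        slots = [i for i, m in enumerate(metagenes) if s in m]
        genes = [metagenes[i][s] for i in slots]
        rng.shuffle(genes)
        for i, gene in zip(slots, genes):
            permuted[i][s] = gene
    return permuted

def shared_fraction(links_a, links_b):
    """Fraction of the links in links_a that also occur in links_b.

    Links are unordered pairs of meta-gene identifiers.
    """
    a = {frozenset(link) for link in links_a}
    b = {frozenset(link) for link in links_b}
    return len(a & b) / len(a) if a else 0.0

# Toy meta-genes and two toy half-data networks.
metagenes = [
    {"human": "h1", "worm": "w1"},
    {"human": "h2", "worm": "w2", "fly": "f2"},
    {"human": "h3", "fly": "f3"},
]
permuted = permute_metagenes(metagenes, seed=1)
overlap = shared_fraction([("m1", "m2"), ("m2", "m3")], [("m2", "m1")])
```

For the first check, one would build a network from `permuted` and compare its link count with the real network; for the second, `shared_fraction` is applied to the two half-data networks.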

11 Verify Results Experimentally validate predicted gene functions
Select 5 meta-genes:
MEG1503 (snRNP protein involved in splicing)
MEG342 (nucleoporin-interaction component)
MEG4513 (novel protein, unknown function)
MEG1192 (novel protein, unknown function)
MEG1146 (novel protein, unknown function)
If a gene is linked in the network to many genes that participate in the same biological process, it is reasonable to hypothesize that it too participates in that process.
Selected 5 meta-genes that showed conserved coexpression with genes known to be involved in cell proliferation / cell cycle, but were not previously known to be involved in these processes. All five show a significant number of links in the coexpression networks to known cell proliferation genes.

12 Test gene expression levels:
In dividing pancreatic cancer cells
In non-dividing normal cells
All 5 genes are overexpressed in human pancreatic cancers relative to normal tissue, to the same extent as the genes known to be involved in cell proliferation.

13 Second test: Test the loss-of-function mutant phenotype for one of the meta-genes, MEG1503, which includes the C. elegans gene ZK652.1
Induced a loss-of-function mutant phenotype for ZK652.1 by feeding worms double-stranded ZK652.1 RNA.
Found that RNA interference of the gene resulted in excess nuclei in the germ line, suggesting that the wild-type function of this gene is to suppress germ line proliferation.
Shown are wild-type gonads and gonads from worms that were fed bacteria producing ZK652.1 dsRNA for two days. Gonads were stained with DAPI to show DNA in nuclei. ZK652.1(RNAi) gonads have more nuclei than wild-type and lack oocytes (ooc.).
Oocyte: A cell from which an egg or ovum develops by meiosis; a female gametocyte.

14 Further Analyses Single Species Networks vs Multi-species Network
Accuracy and % coverage are plotted for different models at differing P-value cutoffs.
% Coverage (x-axis), the percentage of meta-genes in a functional category that are connected to at least one other meta-gene in that category, is plotted against % Accuracy (y-axis), the percentage of links that connect two members of the category.
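The two axes can be made concrete with a small sketch. The slide does not state the denominator for accuracy, so the code assumes it is the set of links that touch at least one member of the category; that choice, the function name, and the toy links are assumptions rather than the paper's definition.

```python
def coverage_and_accuracy(links, category):
    """Return (% coverage, % accuracy) for one functional category.

    links: iterable of (metagene, metagene) pairs in the network.
    category: meta-genes annotated with the function of interest.
    """
    category = set(category)
    links = [tuple(link) for link in links]
    # Links whose two ends are both in the category.
    within = [l for l in links if l[0] in category and l[1] in category]
    # Assumed denominator: links with at least one end in the category.
    touching = [l for l in links if l[0] in category or l[1] in category]
    # Category members linked to at least one other category member.
    covered = {gene for link in within for gene in link}
    coverage = 100.0 * len(covered) / len(category) if category else 0.0
    accuracy = 100.0 * len(within) / len(touching) if touching else 0.0
    return coverage, accuracy

# Toy network: a, b, c, d belong to the category; x and y do not.
links = [("a", "b"), ("b", "c"), ("a", "x"), ("c", "y")]
coverage, accuracy = coverage_and_accuracy(links, ["a", "b", "c", "d"])
```

Sweeping the P-value cutoff changes which links are in the network, which is how each model traces out a curve in the coverage-accuracy plot.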

15 Further Analyses
Is the accuracy of the multi-species network related to having more data?
Repeated the experiment with only 979 sets of microarray data (the same number as in worm), and came up with virtually identical results.

16 Conservation of Genetic Modules
Wanted to look at conservation of genetic modules.
Split the set of meta-genes into a set of 2969 that contain a yeast ortholog and a set of 3338 that were animal-specific (worms, flies, or humans, but not yeast).
Determined the degree to which the gene expression links have been conserved for each meta-gene by defining a set-theoretic quantity called the expression conservation index (ECI), where larger values indicate stronger conservation.
Columns in each rectangle are fly, human, worm, yeast, whole meta-gene.
Each box in the heat map represents the percentage of links connected to a meta-gene in one organism that are also present in the multi-species coexpression network. Gray indicates where a meta-gene lacked an ortholog in a particular organism.
Components 1, 7, 11 (signalling, ?, neuronal) are most enriched for animal-specific meta-genes and also showed the lowest degree of evolutionary conservation of their gene expression links.
Component 9 (ribosomal function) is the least enriched for animal-specific meta-genes and shows the highest degree of evolutionary conservation.

17 Conclusions
Gene coexpression networks can be used as a powerful tool for generating hypotheses about genes whose functions are unknown.
Gene coexpression networks can be used to describe the evolution of genetic interactions.
Multi-species networks perform better than single-species networks overall.
570 novel meta-genes have 3943 connections to other meta-genes in the network (many with known functions), potentially allowing these novel meta-genes to be characterized.

18 Discussion Topics
What other model organisms would be useful to expand the multi-species network?
Would the multi-species network be as useful for species that are more closely related?
Gene orthology is based on protein sequence similarity. Does sequence conservation equate to conserved function?
Are 12 clusters of meta-genes sufficient to hypothesize function for 3416 meta-genes?
How can gene function for genes without known orthologs be investigated?

