
Functional prediction methods. The usual troubles of the molecular and cellular biology labs What are the functions of a previously non characterized.


1 Functional prediction methods

2 The usual troubles of molecular and cellular biology labs: What are the functions of a previously uncharacterized gene? Are there new functions for a previously characterized gene? With which cellular structures is my favorite protein associated? What are its molecular partners?

3 The answers from “wet” technologies: expression studies; genetic manipulation of expression levels and structure (knockout, overexpression of wild-type and mutant isoforms); genetic screens; subcellular localization; biochemical characterization of molecular complexes; two-hybrid system.

4 How can bioinformatics help solve these problems? Homology searches; the Rosetta stone approach; detection of conserved synteny; phylogenetic footprinting; analysis of massive gene expression and protein interaction data.

5 Homology searches: finding orthologs and paralogs of your gene in other species [Figure: sequence alignment with 62% sequence identity]
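Percent identity figures like the one on this slide depend on how identity is counted. A minimal Python sketch, assuming identity = identical residues divided by the aligned columns where neither sequence has a gap (one common convention among several; other tools divide by the alignment length or by the shorter sequence), with a hypothetical toy alignment:

```python
def percent_identity(aln_a: str, aln_b: str) -> float:
    """Percent identity of two equal-length aligned sequences ('-' marks a gap).

    Assumed convention: identical residues / columns where neither sequence
    has a gap. Other tools use other denominators, so values can differ.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have the same length")
    aligned = [(x, y) for x, y in zip(aln_a, aln_b) if x != "-" and y != "-"]
    if not aligned:
        return 0.0
    identical = sum(1 for x, y in aligned if x == y)
    return 100.0 * identical / len(aligned)


# Hypothetical toy alignment (not taken from the slide): 8 of 10 gap-free columns match.
print(percent_identity("MKTAYIAK-QR", "MKSAYLAKEQR"))  # 80.0
```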

6 Homology searches: finding orthologs and paralogs of your gene in other species [Figure: gene A in a common ancestor gives rise to orthologs B and C in species 1 and species 2; labels: sequence homology, functional conservation]

7 Homology searches: finding orthologs and paralogs of your gene in other species [Figure: gene duplication within species 1 turns gene A into the paralogs A and A’; label: sequence homology]

8 “Rosetta stone” approach [Figure: genes A, B and C in species 1 and species 2]

9 Conservation of synteny

10 Phylogenetic footprinting

11 Analysis of massive gene expression and protein interaction datasets

12 Analysis of massive gene expression and protein interaction datasets

13 From gene-by-gene to modular biology: the amount of primary data is no longer the limiting factor in obtaining biological knowledge. Today the bottlenecks are the ability to integrate the primary data into functional models, to make predictions, and to test them in the lab.

14 What do people do with microarray data? They use them to answer the specific questions of their paper, and they put them in a database, since journals ask for that. Stanford Microarray Database (not the only one): 3573 experiments for human, 198 for mouse, 361 for C. elegans, 170 for Drosophila, 806 for yeast.

15 Is it possible to use this enormous amount of data to extract useful functional information? Genes that are involved in common biological processes and/or physically interact in protein-protein complexes very frequently display similar expression patterns. So, if two genes display similar expression patterns under a very high number of conditions, they are likely related. Systematic studies have shown that the correlation is quite good; however, it is also clear that co-expression of two genes in one species does not necessarily mean that they are functionally related. If this criterion alone is used to predict a link between two genes, a very high number of false positives must be expected.

16 Pearson's Correlation Coefficient. Definition: measures the strength of the linear relationship between two variables. Characteristics: Pearson's correlation coefficient is usually denoted r (ρ, rho, for the population value) and can take values from -1.0 to 1.0, where -1.0 is a perfect negative (inverse) correlation, 0.0 is no linear correlation, and 1.0 is a perfect positive correlation.
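For two expression profiles x and y measured over the same n conditions, r = Σ(xᵢ − x̄)(yᵢ − ȳ) / √[Σ(xᵢ − x̄)² · Σ(yᵢ − ȳ)²]. A plain-Python sketch of that formula follows; library routines such as numpy.corrcoef or scipy.stats.pearsonr compute the same quantity. The gene profiles are hypothetical.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two expression profiles over the same conditions."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length, at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    covariance = sum(a * b for a, b in zip(dx, dy))
    spread = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if spread == 0:
        # A flat profile has zero variance, so r is undefined.
        raise ValueError("constant profile: correlation undefined")
    return covariance / spread


# Hypothetical profiles over four conditions.
gene_1 = [1.0, 2.0, 3.0, 4.0]
print(pearson_r(gene_1, [2.0, 4.0, 6.0, 8.0]))  # 1.0 (perfect positive)
print(pearson_r(gene_1, [8.0, 6.0, 4.0, 2.0]))  # -1.0 (perfect negative)
```

Because r only measures linear association, two genes with a strong but non-linear relationship can still give r close to 0.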

17 [Figure: expression data plot, Gene 1 vs. Gene 2, r = 0.0]

18 [Figure: expression data plot, Gene 1 vs. Gene 2, r = 0.9]

19 [Figure: expression data plot, Gene 1 vs. Gene 2, r = -0.8]

20

21 Regulatory information can easily be lost by random mutation… [Figure: TATA box (TATAAA) bound by TBP, coding sequence, and a regulatory element TATGCATAGATGCCTC bound by TF-1 and TF-2; after a single substitution the element reads TATTCATAGATGCCTC]

22 …or gained by the same mechanism [Figure: TATA box (TATAAA) bound by TBP, coding sequence, and the element TATGCATAGATGCCTC with TF-1 and TF-2; after a single substitution the element reads TATGCATAGAGGCCTC]
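Both slides turn on a single-base change in the regulatory element. This short sketch only locates the substitution between the sequences shown; the slides do not say which bases TF-1 or TF-2 contact, so it makes no claim about binding, and it assumes equal-length sequences (substitutions only, no insertions or deletions):

```python
def point_substitutions(ref: str, alt: str) -> list[tuple[int, str, str]]:
    """Return (1-based position, ref base, alt base) for every mismatch."""
    if len(ref) != len(alt):
        raise ValueError("sequences must be the same length (no indels)")
    return [(i + 1, r, a) for i, (r, a) in enumerate(zip(ref, alt)) if r != a]


original = "TATGCATAGATGCCTC"  # element before mutation (slides 21 and 22)
print(point_substitutions(original, "TATTCATAGATGCCTC"))  # [(4, 'G', 'T')]
print(point_substitutions(original, "TATGCATAGAGGCCTC"))  # [(11, 'T', 'G')]
```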

23 The sloppiness of transcriptional regulation [Figure: a strong transcriptional element, a critical gene, and several genes shown in gray] The gray genes are probably affected by the strong element and are consequently co-regulated with the critical gene; however, this co-regulation has no functional meaning (Spellman & Rubin, 2002, Journal of Biology, 1:5).

24 A powerful help: phylogenetic conservation. Since gene regulatory regions evolve faster than coding regions, if the co-expression of two genes is evolutionarily conserved, it is much more likely that the genes are functionally related. The confidence level naturally increases with the phylogenetic distance between the species.
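A minimal sketch of the conservation filter described on this slide, assuming a single Pearson cutoff applied in two species and a one-to-one ortholog map. Stuart et al. used more species and their own scoring, so this is a simplification, not their method; gene names and correlation values are hypothetical.

```python
def conserved_coexpression(
    corr_sp1: dict[frozenset, float],
    corr_sp2: dict[frozenset, float],
    orthologs: dict[str, str],
    cutoff: float,
) -> set[frozenset]:
    """Species-1 gene pairs co-expressed (r >= cutoff) in both species.

    corr_sp1 / corr_sp2 map an unordered gene pair, frozenset({g1, g2}),
    to its Pearson r; orthologs maps each species-1 gene to its species-2
    ortholog (assumed one-to-one here).
    """
    conserved = set()
    for pair, r1 in corr_sp1.items():
        if r1 < cutoff:
            continue
        g1, g2 = sorted(pair)
        if g1 not in orthologs or g2 not in orthologs:
            continue  # no ortholog: conservation cannot be tested
        r2 = corr_sp2.get(frozenset((orthologs[g1], orthologs[g2])))
        if r2 is not None and r2 >= cutoff:
            conserved.add(pair)
    return conserved


# Hypothetical correlations: A-C is co-expressed in species 1 only.
sp1 = {frozenset("AB"): 0.90, frozenset("AC"): 0.85, frozenset("BC"): 0.10}
sp2 = {frozenset("ab"): 0.88, frozenset("ac"): 0.20, frozenset("bc"): 0.00}
print(conserved_coexpression(sp1, sp2, {"A": "a", "B": "b", "C": "c"}, 0.8))
```

Only the A-B pair survives: A-C would be predicted from species 1 alone, which is exactly the kind of false positive the single-organism strategy produces.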

25 Stuart et al. (2003), Science 302, 249-255: a gene co-expression network constructed with expression data from distant species (human, C. elegans, Drosophila, yeast)

26 Stuart et al. (2003), Science 302, 249-255: a gene co-expression network constructed with expression data from distant species (human, C. elegans, Drosophila, yeast)

27 If you are not studying core biological processes, you are very unlikely to obtain useful information on your genes of interest, given the very stringent criteria of this study, and it is impossible to find information about mammalian-specific genes. Is a compromise possible between the low sensitivity of this approach and the low specificity of the single-organism strategy? We think so! Our strategy:

28 A new, EST-centric strategy for expression profiling-based annotation of orthologous transcriptomes. M. Pellegrino (1), P. Provero (1), L. Silengo (1), F. Di Cunto (1)*. (1) University of Torino, Dept. of Genetics, Biology and Biochemistry, Italy. *ferdinando.dicunto@unito.it

29 Features of CLOE. 1. Concentrate on pairwise species comparison; in particular, we focused on the human-mouse comparison. Orthologous genes are identified with the INPARANOID approach. [Figure: human and mouse protein families (members A to G) compared in a first search (I) and a second search (II)]
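INPARANOID relies on searches in both directions (I and II on the slide): a human protein and a mouse protein that are each other's best hit seed an ortholog group, to which in-paralogs are then added. This sketch covers only the mutual-best-hit step, assuming precomputed similarity scores (for example, BLAST bit scores); protein names and scores are hypothetical.

```python
def reciprocal_best_hits(
    human_vs_mouse: dict[str, dict[str, float]],
    mouse_vs_human: dict[str, dict[str, float]],
) -> set[tuple[str, str]]:
    """(human, mouse) pairs that are each other's best-scoring hit.

    Each dict maps a query protein to {subject protein: similarity score}.
    Ties are broken arbitrarily by max(); in-paralog clustering is not done.
    """
    best_h = {h: max(hits, key=hits.get) for h, hits in human_vs_mouse.items() if hits}
    best_m = {m: max(hits, key=hits.get) for m, hits in mouse_vs_human.items() if hits}
    return {(h, m) for h, m in best_h.items() if best_m.get(m) == h}


# Hypothetical scores for two human (A, B) and two mouse (a, b) proteins.
h2m = {"A": {"a": 300.0, "b": 120.0}, "B": {"a": 150.0, "b": 200.0}}
m2h = {"a": {"A": 310.0, "B": 140.0}, "b": {"A": 110.0, "B": 205.0}}
print(sorted(reciprocal_best_hits(h2m, m2h)))  # [('A', 'a'), ('B', 'b')]
```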

30 Features of CLOE. 2. Focus on single EST probes contained in cDNA microarray databases, with no averaging across probes. [Figure: probes 1, 2 and 3 on a single mRNA give coherent signals; on alternative transcripts 1 and 2, which differ in exon content, the same probes may give discordant signals]

31 The procedure: choosing the right ESTs. The choice is left to the end user, but we developed a simple tool to help in the decision process. It offers the following information: 1) a list of the ESTs in the database belonging to the UniGene cluster of interest; 2) a list of the ESTs of the orthologous UniGene clusters found in the database of the second species; 3) the number of experimental points for each of the above ESTs; 4) the number of points in common for every EST pair in the single-organism dataset; 5) the Pearson correlation coefficient between the expression profiles of all EST pairs belonging to the same UniGene cluster.
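Items 3 to 5 of this list depend on handling missing measurements: two ESTs can only be compared over the experiments they share. A minimal sketch, assuming each EST profile is stored as a mapping from experiment ID to value and that r is computed over the shared experiments only; the `min_points` threshold is a hypothetical parameter, not a documented CLOE setting.

```python
import math
from typing import Optional

# One EST profile: experiment ID -> measured value; missing points are absent.
Profile = dict[str, float]


def points_in_common(p: Profile, q: Profile) -> list[str]:
    """Experiments for which both ESTs have a measurement."""
    return sorted(p.keys() & q.keys())


def pearson_on_shared(p: Profile, q: Profile, min_points: int = 3) -> Optional[float]:
    """Pearson r over shared experiments, or None if too few points or a flat profile."""
    shared = points_in_common(p, q)
    if len(shared) < min_points:
        return None
    x = [p[e] for e in shared]
    y = [q[e] for e in shared]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    return sxy / math.sqrt(sxx * syy)


# Hypothetical ESTs measured in partly overlapping experiment sets.
est_1 = {"exp1": 0.5, "exp2": 1.0, "exp3": -0.2, "exp4": 0.8}
est_2 = {"exp2": 2.0, "exp3": -0.4, "exp4": 1.6, "exp5": 0.3}
print(points_in_common(est_1, est_2))             # ['exp2', 'exp3', 'exp4']
print(round(pearson_on_shared(est_1, est_2), 3))  # 1.0
```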

32 The procedure [Figure: Gene A in the human database and its ortholog Gene A’ in the mouse database, each with a list of EST probes (HS…, MM…) and their correlation coefficients]

33 The procedure
Gene A, human database | Gene A’, mouse database
HS01 +1.00 | MM01 +1.00
HS10 +0.35 | MM85 +0.32
HS05 +0.45 | MM25 +0.50
HS22 +0.20 | MM10 +0.20
HS02 +0.85 | MM02 +0.98
HS65 +0.64 | MM34 +0.70
HS34 +0.77 | MM96 +0.85
HS25 +0.08 | MM20 +1.00
HS11 +0.00 | MM32 +0.00
HS20 -0.89 | MM28 -0.82
HS15 -0.68 | MM20 -0.68
HS32 -0.90 | MM98 -0.90
HS32 -0.20 | MM44 -0.05
HS55 -0.55 | MM12 -0.55
HS44 -0.35 | MM05 -0.30
HS35 -1.00 | MM11 -1.00
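The slides do not spell out how the two lists are combined. One plausible reading, sketched here under explicit assumptions, keeps a gene as a candidate partner only if its profile correlates with the query above a cutoff in human and its ortholog's profile does so in mouse. The function name, the minimum-of-two-correlations score, and all values are hypothetical.

```python
def cloe_like_candidates(
    r_human: dict[str, float],
    r_mouse: dict[str, float],
    orthologs: dict[str, str],
    cutoff: float,
) -> list[tuple[str, str, float]]:
    """Rank candidate partners co-expressed with the query in both species.

    r_human: human gene -> Pearson r of its chosen EST with the query EST (gene A)
    r_mouse: mouse gene -> Pearson r with the orthologous query EST (gene A')
    orthologs: human gene -> mouse ortholog
    Only positive correlations at or above the cutoff are considered here.
    """
    candidates = []
    for h_gene, r_h in r_human.items():
        m_gene = orthologs.get(h_gene)
        if m_gene is None or m_gene not in r_mouse:
            continue
        r_m = r_mouse[m_gene]
        if r_h >= cutoff and r_m >= cutoff:
            # Score by the weaker of the two correlations (an assumed choice).
            candidates.append((h_gene, m_gene, min(r_h, r_m)))
    return sorted(candidates, key=lambda c: c[2], reverse=True)


# Hypothetical correlations with the query gene in each species.
r_hs = {"G1": 0.90, "G2": 0.75, "G3": 0.65, "G4": -0.90}
r_mm = {"g1": 0.80, "g2": 0.40, "g3": 0.70, "g4": -0.85}
pairs = {"G1": "g1", "G2": "g2", "G3": "g3", "G4": "g4"}
print(cloe_like_candidates(r_hs, r_mm, pairs, cutoff=0.6))
# [('G1', 'g1', 0.8), ('G3', 'g3', 0.65)]
```

Scoring by the weaker correlation means a pair ranks high only when both species agree, which matches the spirit of requiring conserved co-expression; other combinations (for example, rank products) would be equally defensible.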

34 What cutoff is more reasonable? [Figure: comparison of cutoffs; p = 1.6·10^-94 and p = 1.3·10^-10]

35 Does CLOE work? Percent of correctly predicted protein-protein interactions:
Set | Single organism | Multiple organisms | Human/mouse CLOE
Centrosome | 1.4 | 4.2 | 6.2
PSD | 0.9 | 5.5 | 6.5
TNF-α/NF-κB | 1.6 | 6.1 | 6.8
Average | 1.3 | 5.7 | 6.6

36 Does CLOE work? Percent of compatible functional predictions:
Set | Single organism | Multiple organisms | Human/mouse CLOE
Centrosome | 19.5 | 36 | 26.3
PSD | 33.8 | 47.8 | 41.3
TNF-α/NF-κB | 47.2 | 47.4 | 44.8
Average | 33.5 | 43.7 | 37.4
Average number of candidate partners: single organism ~300; multiple organisms 8; CLOE 17.

37 WHAT ARE THE POTENTIAL APPLICATIONS OF CLOE? 1. Finding new potential functional partners for the gene(s) of interest. 2. Making testable predictions about the function(s) of non-annotated genes. 3. Finding new potential functional roles for annotated genes/proteins.

38 AN EXAMPLE OF OUTPUT: Putative partners for FAD104

39 AN EXAMPLE OF OUTPUT: Putative annotations for FAD104
Keyword | Organizing principle | p-value
Endoplasmic reticulum | Cellular Component | 9.3·10^-3
Protein binding | Molecular Function | 6.5·10^-3
Peptidyl-prolyl cis-trans isomerase | Molecular Function | 6.5·10^-3
Structural constituent of muscle | Molecular Function | 3.4·10^-3
Collagen binding | Molecular Function | 3.1·10^-3
Structural molecule | Molecular Function | 1.7·10^-3
Tropomyosin binding | Molecular Function | 8.9·10^-4
Basement membrane | Cellular Component | 5.7·10^-4
Cytoskeleton | Cellular Component | 5.6·10^-4
Cell adhesion | Biological Process | 6.4·10^-5
Actin binding | Molecular Function | 4.6·10^-8
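The slides do not say which test produced these p-values. A one-sided hypergeometric test for over-representation of a keyword among the putative partners is a common choice; the sketch below assumes it, and all counts are hypothetical. `scipy.stats.hypergeom` provides the same upper tail through its `sf` method, with different parameter names.

```python
from math import comb


def enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """One-sided hypergeometric p-value, P(X >= k).

    N: genes in the background, K: background genes carrying the keyword,
    n: putative partners, k: partners carrying the keyword.
    """
    upper = min(n, K)
    # math.comb returns 0 when asked for more items than exist, so impossible terms vanish.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


# Hypothetical counts: 3 of 4 partners share a keyword that 5 of 20 genes carry.
print(round(enrichment_p(k=3, n=4, K=5, N=20), 4))  # 0.032
```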

40 The results strongly suggest that this protein could be involved in some aspects of the functional interaction between the cytoskeleton and the extracellular matrix.

41 Conclusion: CLOE is a simple and effective data-mining approach that can easily be used for meta-analysis of cDNA microarray experiments with very heterogeneous coverage. Importantly, for the genes of interest it produces a reasonable number of high-confidence putative partners, within the range that standard experimental validation techniques can handle.

