Anis Karimpour-Fard ‡, Ryan T. Gill †,


1 Investigation of factors affecting prediction of protein-protein interaction networks by phylogenetic profiling
Anis Karimpour-Fard ‡, Ryan T. Gill †, and Lawrence Hunter ‡
‡ University of Colorado School of Medicine
† Department of Chemical and Biological Engineering, University of Colorado, Boulder
Dec 1, 2007
anis.karimpour-fard@uchsc.edu
http://www.colorado.edu/che/research/faculty/gill/
http://compbio.uchsc.edu/Hunter

2 The meaning of protein function (Eisenberg, D. et al. Nature 2000)
Biochemical view: the function of protein A is its action on a substrate (S) to form a product (P).
Post-genomic view: the function of protein A is the context of its interactions with other proteins in the cell.
[Figure: protein A drawn within a network of interacting proteins]
The problem: more than 500 microbial genomes are fully sequenced, and a high percentage of their genes have unknown function. For example: E. coli K12 15%, P. aeruginosa 45% (http://www.genomesonline.org/).

3 Predicting protein function
Homology-based methods (give only a partial understanding of a protein's role)
–Simple sequence similarity searches (BLAST)
–Profile searches (PSI-BLAST)
–Databases of conserved domains (Pfam, SMART)
Prediction from genomic context
–Phylogenetic profile
–Gene cluster
–Gene neighbor
–Rosetta Stone
Prediction from high-throughput experimental data
–Microarray gene expression data
–Protein-protein interaction screens
–...

4 Phylogenetic Profile
Pellegrini et al. PNAS 96, 4285 (1999); Marcotte et al. PNAS 97, 12115 (2000)
1- Select a set of genomes as the reference set.
2- Create the phylogenetic profile matrix for the target organism: do a one-against-all BLAST search to identify homologs of all target genes in diverse reference organisms, and score each homolog as present or absent using a BLAST E-value threshold.
Questions:
–Does the selection of the reference genomes influence the prediction? If so, how?
–How does the E-value threshold affect the prediction of protein-protein interactions?
[Figure: pipeline of reference selection, BLAST E-value threshold (present or absent), and measuring profile similarities]
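Step 2 can be illustrated with a minimal sketch. It assumes the BLAST results have already been reduced to the best E-value per target protein and reference genome; the function name, the data layout, the toy values, and the default threshold are all hypothetical and are not taken from the slides.

```python
def build_profiles(best_evalues, reference_genomes, threshold=1e-5):
    """Turn best BLAST E-values into present/absent (1/0) profiles.

    best_evalues: {protein: {genome: best E-value}} (assumed layout);
    a genome missing from the inner dict means no hit was found.
    """
    profiles = {}
    for protein, hits in best_evalues.items():
        profiles[protein] = [
            # Present (1) only if the best hit is at or below the threshold.
            1 if hits.get(genome, float("inf")) <= threshold else 0
            for genome in reference_genomes
        ]
    return profiles


# Hypothetical toy data: two target proteins, three reference genomes.
reference_genomes = ["genome_A", "genome_B", "genome_C"]
best_evalues = {
    "protX": {"genome_A": 1e-30, "genome_B": 1e-3},
    "protY": {"genome_A": 1e-12, "genome_C": 1e-8},
}

strict = build_profiles(best_evalues, reference_genomes, threshold=1e-5)
loose = build_profiles(best_evalues, reference_genomes, threshold=1e-2)
print(strict)
print(loose)
```

Relaxing the threshold flips the protX entry for genome_B from absent to present, which is the kind of sensitivity the second question on this slide asks about.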

5 3- Measure profile similarities
Protein X: 110001111001001110001111
Protein Y: 111000111100000110001111
19 matching bits out of 24
4- Generate the protein-protein interaction network (two nodes are connected if the two proteins have similar profiles)
5- Create clusters from the set of protein-protein interactions
6- Visualize the network
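Steps 3 to 5 can be sketched as follows, using the two profiles shown on this slide. Several choices here are assumptions made for illustration only: the third profile (Protein Z, the bitwise complement of Protein X), the cutoff of 18 matching bits, and the use of connected components as the clustering step.

```python
from itertools import combinations


def matching_bits(a, b):
    """Count positions where two equal-length profiles agree."""
    return sum(x == y for x, y in zip(a, b))


def build_network(profiles, min_matches):
    """Connect two proteins when their profiles agree in >= min_matches positions."""
    edges = []
    for (p, a), (q, b) in combinations(profiles.items(), 2):
        if matching_bits(a, b) >= min_matches:
            edges.append((p, q))
    return edges


def connected_components(nodes, edges):
    """Group nodes into clusters (assumed here: connected components)."""
    adjacency = {n: set() for n in nodes}
    for p, q in edges:
        adjacency[p].add(q)
        adjacency[q].add(p)
    seen, components = set(), []
    for start in nodes:
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        components.append(component)
    return components


x = "110001111001001110001111"  # Protein X, from the slide
y = "111000111100000110001111"  # Protein Y, from the slide
z = "".join("1" if c == "0" else "0" for c in x)  # hypothetical Protein Z

profiles = {"Protein X": x, "Protein Y": y, "Protein Z": z}
edges = build_network(profiles, min_matches=18)  # cutoff is an assumption
clusters = connected_components(list(profiles), edges)

print(matching_bits(x, y))  # 19, as stated on the slide
print(edges)
print(clusters)
```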

6 Measure profile similarities
Protein X: 110001111001001110001111
Protein Y: 111000111100000110001111
(Two nodes are connected if the two proteins have similar profiles.)
Mutual information
MI(X, Y) = H(X) + H(Y) - H(X, Y)
H(Y) = -∑_i p(i) ln p(i), where p(i), i = 0, 1, is the fraction of genomes in which protein Y is in state i
Pearson correlation coefficient
Inverse homology
Calculate the homology between two genomes: H_i,j is the ratio of the number of homologs in reference organism j to the number of proteins in the target genome i. P_ij = 1/H_i,j where a homolog is present; otherwise P_ij = 0.
Karimpour-Fard et al. BMC Genomics. 2007;8(1):393
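The mutual information and Pearson measures can be computed directly from binary profiles. The sketch below follows the formulas on this slide, with natural logarithms and H(X, Y) taken as the entropy of the paired states; the function names are hypothetical, and the Pearson function is undefined for a profile that is all 0s or all 1s.

```python
import math
from collections import Counter


def entropy(states):
    """H = -sum p(i) ln p(i) over the observed states."""
    n = len(states)
    return -sum((c / n) * math.log(c / n) for c in Counter(states).values())


def mutual_information(x, y):
    """MI(X, Y) = H(X) + H(Y) - H(X, Y), with H(X, Y) over paired states."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))


def pearson(x, y):
    """Pearson correlation of two 0/1 profiles given as strings."""
    xs = [int(c) for c in x]
    ys = [int(c) for c in y]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in xs))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in ys))
    return cov / (norm_x * norm_y)  # division by zero for constant profiles


x = "110001111001001110001111"  # Protein X, from the slide
y = "111000111100000110001111"  # Protein Y, from the slide

print(mutual_information(x, y))
print(pearson(x, y))
```

A profile compared with itself gives MI equal to its own entropy and a Pearson coefficient of 1, which is a quick sanity check for either function.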

7 c) Comparison of different combinations of reference genomes and E-value thresholds using COG
PPV = TP/(TP+FP)
–TP = number of predicted pairs in the same functional category
–FP = number of predicted pairs in which both proteins were classified but were not in the same functional category
[Figure: reference sets compared: Random sets, All, Low GC, Aerobic]
Karimpour-Fard et al. BMC Genomics. 2007;8(1):393
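The PPV definition can be made concrete with a short sketch. It assumes each classified protein carries exactly one functional category, which is a simplification, and it skips any pair with an unclassified protein, following the FP definition on this slide. The protein names and category letters are hypothetical.

```python
def ppv(predicted_pairs, category):
    """PPV = TP / (TP + FP) over pairs where both proteins are classified.

    category: {protein: functional category}; one category per protein
    is an assumption made for this sketch.
    """
    tp = fp = 0
    for a, b in predicted_pairs:
        if a not in category or b not in category:
            continue  # pairs with an unclassified protein are not scored
        if category[a] == category[b]:
            tp += 1
        else:
            fp += 1
    return tp / (tp + fp) if (tp + fp) else float("nan")


# Hypothetical toy data: gene_e has no functional category.
category = {"gene_a": "E", "gene_b": "E", "gene_c": "K", "gene_d": "K"}
predicted_pairs = [
    ("gene_a", "gene_b"),  # same category: TP
    ("gene_a", "gene_c"),  # different categories: FP
    ("gene_c", "gene_d"),  # same category: TP
    ("gene_a", "gene_e"),  # unclassified partner: skipped
]

score = ppv(predicted_pairs, category)
print(score)  # 2 TP and 1 FP give 2/3
```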

8 Co-evolution can be used to assign function to unstudied genes
Hypothetical proteins YcgB, YeaH, and YeaG are co-conserved across different species. Comparison of sub-graphs across species (CS-CCC) suggested that a previously unstudied S. typhimurium gene, ycgB, is functionally related to yeaH. Experimental data support the hypothesis that both genes are important for antimicrobial peptide resistance.
Edge color code: E. coli K12 (green), E. coli O157 (blue), Shigella flexneri (black), S. typhimurium LT2 (purple), P. aeruginosa (mustard)
Karimpour-Fard et al. Genome Biology 2007 8:R185

