Using structure in protein function annotation: predicting protein interactions
Donald Petrey, Cliff Qiangfeng Zhang, Raquel Norel, Barry Honig
Howard Hughes Medical Institute
Department of Biochemistry and Molecular Biophysics
Center for Computational Biology and Bioinformatics
Columbia University
Classification: Fold, Superfamily, Family
Discrete islands
Thioredoxin     L-VVVDFS-A-----TWCGPCKMI-KPFFH-SLSEK
Q8L5D4          KSSLVVLY-A-----PWCSFSQAM-DESYN-DVAEK
Glutaredoxin-4  P--ILLYM-KGSPKLPSCGFSAQA-VQALA-AC---
Function labels: protein disulfide oxidoreductase; iron-sulfur cluster assembly
P22 Cro repressor; λ Cro repressor. Figure labels: Afe1, Xfaso 1, Pfl6. Sequence identities shown: 25%, 42%, 39%, 44%, 42%.
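To make the alignment on this slide concrete, the following minimal sketch computes pairwise percent identity over the aligned columns. The pairing of names with rows assumes the labels and sequences appear in the same order on the slide, and the choice to count only columns where both sequences have a residue is an assumption of this sketch, not a convention stated in the talk.

```python
from itertools import combinations

# Aligned rows copied from the slide; '-' marks an alignment gap.
# ASSUMPTION: names are paired with rows in the order they appear on the slide.
ALIGNMENT = {
    "Thioredoxin":    "L-VVVDFS-A-----TWCGPCKMI-KPFFH-SLSEK",
    "Q8L5D4":         "KSSLVVLY-A-----PWCSFSQAM-DESYN-DVAEK",
    "Glutaredoxin-4": "P--ILLYM-KGSPKLPSCGFSAQA-VQALA-AC---",
}


def percent_identity(a: str, b: str) -> float:
    """Percent of identical residues over columns where neither row has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns in which both sequences carry a residue.
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    matches = sum(x == y for x, y in pairs)
    return 100.0 * matches / len(pairs)


if __name__ == "__main__":
    for (name_a, seq_a), (name_b, seq_b) in combinations(ALIGNMENT.items(), 2):
        print(f"{name_a} vs {name_b}: {percent_identity(seq_a, seq_b):.1f}%")
```

Other identity definitions (for example, dividing by the full alignment length) give different numbers, so the denominator should be stated whenever such values are compared.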
Continuous space
Putative active site (SCREEN)
Formyl-CoA transferase from O. formigenes; NESG target TM1055 from T. maritima; Coenzyme A
CoA from formyl-CoA transferase; SAH from DNA methyltransferase; tyrosine from tyrosyl tRNA synthetase; thiamin diphosphate from DXP synthetase; TM1055
Structural neighbors of TM proteins: 70 SCOP folds; 3 CATH architectures; 10 CATH topologies; 48 CATH homologous superfamilies; ~500 distinct ligands
“jelly roll”, “β-propeller”, “β-prism”
Figure labels: virus, cell; bacterium, cell; “jelly roll”, “β-propeller”; phagosome, lysosome; “β-prism”
Experimental interactions (from BIND + Cellzome); modeled interactions. Davis FP, Braberg H, et al. (2006). Nucleic Acids Research 34(10).
Workflow figure: target sequences; sequence similarity or structural similarity to a template complex; modeled complex.
Structures from the same SCOP family (non-redundant): 8 (SCOP domain d )
Structures from the same SCOP superfamily (non-redundant): 23 (SCOP domain d.17.4)
Structures from the same SCOP fold (non-redundant): 44 (SCOP domain d.17)
Structural neighbors by structure alignment: 420 (PSD < 0.8; the SCOP domain id of the green structure here is d )
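The last count on this slide comes from a distance cutoff rather than from a classification label. A minimal sketch of that kind of selection is shown below; the `Neighbor` record, the domain identifiers, and the PSD values are all hypothetical placeholders, and only the `PSD < 0.8` cutoff is taken from the slide.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Neighbor:
    domain_id: str  # hypothetical identifier, for illustration only
    psd: float      # structural distance to the query; lower means more similar


def structural_neighbors(candidates: Iterable[Neighbor],
                         cutoff: float = 0.8) -> List[Neighbor]:
    """Return candidates with PSD strictly below the cutoff, closest first."""
    kept = [n for n in candidates if n.psd < cutoff]
    return sorted(kept, key=lambda n: n.psd)


if __name__ == "__main__":
    # Made-up values; real PSD scores would come from a structure alignment program.
    candidates = [
        Neighbor("domain_A", 0.15),
        Neighbor("domain_B", 0.79),
        Neighbor("domain_C", 0.80),
        Neighbor("domain_D", 1.40),
    ]
    for n in structural_neighbors(candidates):
        print(n.domain_id, n.psd)
```

Because the cutoff is applied to a continuous score, the neighbor set can include structures that a fold, superfamily, or family label would place in different groups.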
Structure model: the overlap of the modeled interface with the predicted interface (shown in red). Panels: good, bad.
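One simple way to quantify the overlap described on this slide is the fraction of modeled interface residues that also fall in the predicted interface. The sketch below assumes interfaces are given as sets of residue numbers; the scoring formula, the 0.5 cutoff, and the residue numbers are illustrative assumptions, since the slide does not state how overlap was scored.

```python
from typing import Iterable


def interface_overlap(modeled: Iterable[int], predicted: Iterable[int]) -> float:
    """Fraction of modeled interface residues that are also predicted interface residues."""
    modeled_set, predicted_set = set(modeled), set(predicted)
    if not modeled_set:
        return 0.0
    return len(modeled_set & predicted_set) / len(modeled_set)


def classify_model(modeled: Iterable[int], predicted: Iterable[int],
                   cutoff: float = 0.5) -> str:
    """Label a model 'good' or 'bad' using a hypothetical overlap cutoff."""
    return "good" if interface_overlap(modeled, predicted) >= cutoff else "bad"


if __name__ == "__main__":
    # Hypothetical residue numbers, not taken from any structure in the talk.
    modeled_interface = {12, 13, 15, 40, 41, 44}
    predicted_interface = {13, 15, 40, 41, 44, 70}
    print(interface_overlap(modeled_interface, predicted_interface))
    print(classify_model(modeled_interface, predicted_interface))
```

Normalizing by the modeled interface size rewards models whose contacts sit inside the predicted patch; a symmetric measure such as the Jaccard index would also penalize predicted residues that the model leaves unused.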
B. subtilis lethal factor
Pelle; B. subtilis lethal factor
Evidence sources shown: gene co-expression profiles; GeneWays (literature); structures. Example gene-pair relations shown: RGS4 block RASD1; CKS1A interact SKP2; CD4 bind TFAP2A; GPNMB contain PPFIBP1; TACR1 require PARP1.
Figure 8. A Bayesian method is used to integrate PPI evidence from various sources. The likelihood ratio of an interaction between two proteins (x and y) is inferred from different pieces of evidence (c_i), using the probabilities that a “clue”, c_i, is observed for proteins x and y that are known to interact and for proteins that are known not to interact.
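The evidence integration described in the caption can be sketched as a naive Bayes combination, in which each clue contributes the ratio P(c_i | interact) / P(c_i | not interact) and the ratios are multiplied. The independence of clues is an assumption of this sketch, since the slide says only that a Bayesian method is used; the function names and all probabilities below are hypothetical.

```python
import math
from typing import Iterable, Tuple


def likelihood_ratio(p_given_interact: float, p_given_not: float) -> float:
    """LR for one clue: P(c_i | interact) / P(c_i | not interact)."""
    if p_given_not <= 0.0:
        raise ValueError("P(c_i | not interact) must be positive")
    return p_given_interact / p_given_not


def combined_likelihood_ratio(clues: Iterable[Tuple[float, float]]) -> float:
    """Multiply per-clue LRs, ASSUMING the clues are conditionally independent."""
    return math.prod(likelihood_ratio(p, q) for p, q in clues)


def posterior_probability(prior: float, lr: float) -> float:
    """Convert a prior probability and a combined LR into a posterior probability."""
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must be strictly between 0 and 1")
    posterior_odds = (prior / (1.0 - prior)) * lr
    return posterior_odds / (1.0 + posterior_odds)


if __name__ == "__main__":
    # Hypothetical (P(c_i | interact), P(c_i | not interact)) pairs for three clues,
    # e.g. co-expression, literature, and structure.
    clues = [(0.60, 0.20), (0.30, 0.10), (0.50, 0.05)]
    lr = combined_likelihood_ratio(clues)
    print(f"combined LR = {lr:.1f}")
    print(f"posterior = {posterior_probability(0.001, lr):.4f}")
```

A clue that is equally likely for interacting and non-interacting pairs has a ratio of 1 and leaves the combined score unchanged, which is why weak evidence sources can be included without harm under this scheme.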
Conclusions
Structural information needs to be leveraged.
Interactively combining overall function annotation with analysis that depends on local bioinformatic/biophysical features.
Infrastructure applies equally to analyzing subtle differences within families.
Acknowledgements
NIH grant U54-GM
Honig Lab
Markus Fischer
Cliff Zhang
Kely Norel