Functional Site Prediction Selects Correct Protein Models
Vijayalakshmi Chelliah (email@example.com)
Division of Mathematical Biology, National Institute for Medical Research, Mill Hill, London
Sixth International Conference on Bioinformatics (InCoB2007), HKUST, Hong Kong, 27-30 August 2007
Functional site prediction: applications
To predict the function of a protein (Pazos & Sternberg, 2004; PNAS 101:14754-9).
In protein-protein docking, to select the near-native docked solution (Chelliah et al., 2006; JMB 357:1669-82).
In sequence-structure homology recognition, and to improve alignment accuracy (Chelliah et al., 2005; Proteins 61:722-31).
Figure: gene sequence → protein sequence → protein structure, either determined experimentally (X-ray/NMR) or predicted (de novo/ab initio); functional site prediction is then used to select the correct models.
Overview
De-novo protein structure prediction method (decoy generation)
Functional site prediction method
Evaluating models
Conclusions
De-novo protein structure prediction method (Taylor, 2002; Nature 416:657-660)
Figure: flowchart of decoy generation. Inputs: sequence alignment, predicted secondary structure, predicted residue burial, ideal forms and structure patterns. Stages: fold generation and scoring, threading, and refinement, working at the secondary-structure 'stick' level, the residue level and the main-chain level. Model selection: top 1/3 C models, top 100+N, top 200 models.
Functional site prediction method
Biochemically important residues are typically found close together in space and are also highly conserved. Functional sites are predicted using CRESCENDO (Chelliah et al., 2004; J Mol Biol 342(5):1487-504), which gives a score for each residue position.
CRESCENDO: functional site prediction method
Input: a multiple sequence alignment of homologous sequences (a structure-based sequence alignment), with alignment positions 1, 2, 3, ...
At each alignment position t, the observed substitution pattern of each amino acid (p) is compared with an expected substitution pattern (q) derived from environment-specific substitution tables (Overington et al., 1992; Protein Science 1:216-26). With $\mathrm{sp}_1, \dots, \mathrm{sp}_N$ the table-derived substitution patterns for the N sequences at that position, the expected pattern is their average:
$q_t = \frac{1}{N}\sum_{i=1}^{N} \mathrm{sp}_i = \frac{\mathrm{sp}_1 + \mathrm{sp}_2 + \dots + \mathrm{sp}_N}{N}$
The score of position t is the divergence between the observed (p) and expected (q) substitution patterns.
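The per-position scoring can be sketched in Python as follows. This is a minimal sketch, not the CRESCENDO implementation: it assumes each sp_i is a 20-element probability row taken from an environment-specific substitution table for the residue in sequence i, it estimates the observed pattern p from amino-acid counts in the alignment column with a small pseudocount, and it uses the Kullback-Leibler divergence as the divergence measure, which the slide does not name. All function names are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def observed_pattern(column, pseudocount=0.5):
    """Observed distribution p at one alignment position (gaps and unknowns ignored)."""
    counts = Counter(res for res in column if res in AMINO_ACIDS)
    total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
    return [(counts[aa] + pseudocount) / total for aa in AMINO_ACIDS]

def expected_pattern(sp_rows):
    """Expected distribution q = (sp_1 + sp_2 + ... + sp_N) / N."""
    n = len(sp_rows)
    return [sum(row[k] for row in sp_rows) / n for k in range(len(AMINO_ACIDS))]

def divergence(p, q):
    """Kullback-Leibler divergence D(p || q) in nats (an assumed measure); q must be > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def position_score(column, sp_rows):
    """Hypothetical per-position score: divergence of observed from expected pattern."""
    return divergence(observed_pattern(column), expected_pattern(sp_rows))

# Toy usage: a flat (uniform) expected pattern stands in for real table rows.
uniform = [1 / len(AMINO_ACIDS)] * len(AMINO_ACIDS)
rows = [uniform] * 5
conserved = position_score("CCCCC", rows)  # fully conserved column
variable = position_score("ACDEF", rows)   # five different residues
print(conserved > variable)  # True
```

With real substitution tables the expected pattern already reflects structural constraints (burial, secondary structure), so a high divergence points to conservation beyond what the structural environment explains, which is the signal used for functional sites.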
Assumptions
Correct or near-native models will have the critical binding residues identified by CRESCENDO in close proximity to one another, i.e. the functional residues in correct models form clusters, whereas in incorrect models they are likely to be scattered.
Question: can correct and incorrect models be distinguished by looking at how the functional residues are packed in the models?
Clustering of models
The 200 decoy models are classified by fold type (F1, F2, F3, F4, ..., Fn). Within each fold type, models are compared with SAP (Taylor, 1999; Prot. Sci. 8:654-665) and clustered using cut-offs of RMSD ≤ 2 Å and PID ≥ 60%. The average Cα coordinates of the models in each cluster are used to compute the pairwise distances between residues.
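The clustering step can be sketched as below. Assumptions, none of which come from the slides: the pairwise RMSD and PID values have already been computed (for example by a structural comparison such as SAP) and are passed in as matrices; models are grouped by a simple greedy "leader" scheme; and the models in a cluster are already superposed, so their Cα coordinates can be averaged directly. Function names are hypothetical.

```python
def cluster_models(rmsd, pid, max_rmsd=2.0, min_pid=60.0):
    """Greedy leader clustering (assumed scheme): a model joins the first cluster
    whose leader is within max_rmsd and has at least min_pid; otherwise it starts
    a new cluster. clusters[k][0] is the leader of cluster k."""
    clusters = []
    for i in range(len(rmsd)):
        for members in clusters:
            leader = members[0]
            if rmsd[i][leader] <= max_rmsd and pid[i][leader] >= min_pid:
                members.append(i)
                break
        else:  # no existing cluster accepted model i
            clusters.append([i])
    return clusters

def average_ca(models, members):
    """Average C-alpha coordinates, residue by residue, over superposed cluster members."""
    n_res = len(models[members[0]])
    return [
        tuple(sum(models[m][r][d] for m in members) / len(members) for d in range(3))
        for r in range(n_res)
    ]

# Toy usage: models 0 and 1 are similar, model 2 is not.
rmsd = [[0.0, 1.2, 5.0], [1.2, 0.0, 4.8], [5.0, 4.8, 0.0]]
pid = [[100.0, 85.0, 20.0], [85.0, 100.0, 25.0], [20.0, 25.0, 100.0]]
models = [
    [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)],
    [(0.0, 2.0, 0.0), (2.0, 2.0, 0.0)],
    [(9.0, 9.0, 9.0), (9.0, 9.0, 9.0)],
]
clusters = cluster_models(rmsd, pid)
print(clusters)                          # [[0, 1], [2]]
print(average_ca(models, clusters[0]))   # [(0.0, 1.0, 0.0), (2.0, 1.0, 0.0)]
```

Leader clustering is order-dependent; a hierarchical or single-linkage scheme would avoid that at some extra cost, and either choice is an assumption here.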
Model score
For every pair of residues at least 8 positions apart in the sequence, the spatial distance and the product of their CRESCENDO scores are calculated. Pairs are ranked by this product, and the percentage of pairs whose residues lie within 12 Å of each other is calculated among the top 40 pairs, in steps of 5 pairs (top 5, top 10, ..., top 40). These percentages are added up to give the final score of the model.
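A sketch of the model score, reading the slide as: rank pairs by the product of CRESCENDO scores, then sum the within-12 Å percentages for the top 5, 10, ..., 40 pairs. The function name and argument layout are hypothetical; `coords` would be the cluster-averaged Cα coordinates and `scores` the per-residue CRESCENDO scores. With eight steps the maximum possible score under this reading is 800.

```python
import math
from itertools import combinations

def model_score(coords, scores, min_sep=8, cutoff=12.0, top_n=40, step=5):
    """Sum over k = step, 2*step, ..., top_n of the % of the top-k pairs within cutoff."""
    pairs = [
        (scores[i] * scores[j], math.dist(coords[i], coords[j]))
        for i, j in combinations(range(len(coords)), 2)
        if j - i >= min_sep  # skip residues that are close in sequence
    ]
    # Rank residue pairs by the product of their functional-site scores.
    pairs.sort(key=lambda pair: pair[0], reverse=True)
    total = 0.0
    for k in range(step, top_n + 1, step):
        top = pairs[:k]
        if not top:
            break
        within = sum(1 for _, dist in top if dist <= cutoff)
        total += 100.0 * within / len(top)
    return total

# Toy usage with 20 residues and equal scores.
scores = [1.0] * 20
compact = [(0.0, 0.0, 0.0)] * 20                     # every pair within 12 Å
extended = [(3.8 * i, 0.0, 0.0) for i in range(20)]  # pairs >= 8 apart are >= 30.4 Å apart
print(model_score(compact, scores), model_score(extended, scores))  # 800.0 0.0
```

Summing over several cut-offs rewards models in which the very highest-scoring pairs are close, not just a majority of the top 40.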
Why is clustering within a fold type needed?
For 2trxA, 34 clusters (RMSD ≤ 2 Å, PID ≥ 60%) were obtained from 81 correct models. (Figure: good and poor models of the same fold type.)
Functional site prediction differs between models of the same fold type because of (a) differences in loop conformation and (b) shifts of a β-strand or helix by even a single residue. So even models with the correct fold may be poor models, as judged by site prediction.
Figure: proximity plot for 3chy, showing the native structure, a correct model, and the best model in each fold type.
Decoy fold distribution for 3chy
Fold type | Strand and helix order | No. of models (of 200) | No. of clusters (RMSD ≤ 2 Å, PID ≥ 60%) | Score of the best model
native | H1(1,5); S2(2,1,3,4,5); H3(2,3,4) | - | - | 330.96
F1 | H1(1,5); S2(2,1,3,4,5); H3(2,3,4) | 161 | 61 | 314.76
F2 | H1(1*,5); S2(2,1*,3,4,5); H3(2,3,4) | 3 | 2 | 202.21
F3 | H1(1,5); S2(2,3,1,4,5); H3(2,3,4) | 16 | 11 | 145.19
F4 | H1(1,3*,4*); S2(2,1,3*,4*,5*); H3(2*,5*) | 2 | 2 | 150.83
F5 | H1(1,4); S2(2,3,1,4,5*); H3(2,3,5*) | 1 | 1 | 108.62
F6 | H1(1,3,5); S2(2,1,4,3,5); H3(2,4) | 11 | 7 | 250.20
F7 | H1(1,5); S2(2,1,3,4,5); H3(2,3,4) | 5 | 4 | 260.29
F8 | H1(1,5*); S2(2,1,3,4,5); H3(2,3,4) | 1 | 1 | 67.24
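The selection step amounts to ranking fold types by the score of their best model. The sketch below simply re-enters the values from the 3chy table; the dictionary layout is an illustrative choice. It checks that the model counts add up to 200 and that F1, whose strand and helix order matches the native structure, scores highest among the decoy folds.

```python
# (models in fold type, clusters, score of best model), copied from the 3chy table
FOLDS = {
    "F1": (161, 61, 314.76),
    "F2": (3, 2, 202.21),
    "F3": (16, 11, 145.19),
    "F4": (2, 2, 150.83),
    "F5": (1, 1, 108.62),
    "F6": (11, 7, 250.20),
    "F7": (5, 4, 260.29),
    "F8": (1, 1, 67.24),
}
NATIVE_SCORE = 330.96

total_models = sum(n_models for n_models, _, _ in FOLDS.values())
# Rank fold types by the score of their best model, highest first.
ranking = sorted(FOLDS, key=lambda fold: FOLDS[fold][2], reverse=True)
print(total_models)  # 200
print(ranking[:3])   # ['F1', 'F7', 'F6']
```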
Conclusions
The requirement for proteins to form functional sites can be used to select the correct protein fold.
In larger proteins this is more difficult, because of the conformation of longer loops.
The competing incorrect folds are mostly strand-swapped models.
The method discriminates efficiently between correct and incorrect folds when the direction of a secondary structure element that contains functional residues is altered, and when the fold is messy.
Thanks to
Dr Willie Taylor, National Institute for Medical Research, Mill Hill, London, UK.
Prof Sir Tom Blundell, Department of Biochemistry, University of Cambridge, Cambridge, UK.