Nadia Léonard, Unité de Recherche en Biologie Moléculaire, F.U.N.D.P. Developing a reliable methodology to align a sequence of known structure and a sequence with low homology, to model it
Introduction. 3D structure: information needed to understand function and to plan directed mutagenesis. The number of known structures (8000) is far smaller than the number of known sequences (500000). Experimental techniques are long and expensive. Alternative: modeling. Homology modeling: two homologues adopt the same structure.
Figure (scale of % identity, %id.). Zones and approaches: homology modeling (reliable); Twilight zone; Midnight zone; fold recognition (not very reliable). Alignment methods along the scale: pairwise alignment (most features well predicted); multiple alignment; consensus of alignments (some features well predicted). Not homologous, BUT proteins of different sequences can adopt the same structure.
Sequence alignment is the critical step for homology modeling. Below 30% identity, no automatic method allows reliable protein modeling.
Aim of our work: to propose a reliable alignment method for proteins sharing a low percentage of identity with their template (<30%).
General strategy for homology modeling: search databanks (PSI-BLAST); multiple alignment of sequences; target-template alignment (critical step); modeling; theoretical model evaluation; comparison of model to real structure. Diagram label: PDB template.
Our methodology. 1. Target selection: PDB proteins whose template shares between 10 and 30% identity (ALIGN). 2. Improvement of the sequence-structure alignment: building of 3 alignments, 2 from our method (consensus 1 and 2) and the PSI-BLAST pairwise alignment (best alignment method for Twilight Zone proteins). 3. Homology modeling from each target-template alignment. 4. Evaluation: geometrical features of the models. 5. Comparison of each model to the real structure.
Our approach consists in building a consensus of several alignment programs. Diagram: target and template are aligned by several programs (multiple alignment and pairwise alignment); the resulting several pairwise alignments are combined into a consensus (consensus building).
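The slides do not say how the consensus is computed. As a minimal sketch of one possible scheme (an assumption, not the authors' procedure), each program's target-template alignment can be reduced to a set of residue pairs, and only the pairs proposed by a strict majority of programs are kept. The function names, the "-" gap character and the `min_fraction` parameter are hypothetical.

```python
from collections import Counter


def aligned_pairs(target_row, template_row):
    """Return the (target_index, template_index) pairs aligned in one pairwise alignment.

    Both rows are gapped strings of equal length; "-" marks a gap (assumed convention).
    """
    if len(target_row) != len(template_row):
        raise ValueError("alignment rows must have equal length")
    pairs = set()
    i = j = 0  # residue counters for target and template
    for a, b in zip(target_row, template_row):
        if a != "-" and b != "-":
            pairs.add((i, j))
        if a != "-":
            i += 1
        if b != "-":
            j += 1
    return pairs


def consensus_pairs(alignments, min_fraction=0.5):
    """Keep residue pairs proposed by more than min_fraction of the alignments.

    Returns a dict mapping each retained pair to its number of votes.
    """
    votes = Counter()
    for target_row, template_row in alignments:
        votes.update(aligned_pairs(target_row, template_row))
    threshold = min_fraction * len(alignments)
    return {pair: n for pair, n in votes.items() if n > threshold}


if __name__ == "__main__":
    # Three hypothetical alignments of the same target and template.
    alignments = [
        ("ACDE-", "AC-EF"),
        ("ACDE-", "A-CEF"),
        ("ACDE", "ACEF"),
    ]
    for pair, n in sorted(consensus_pairs(alignments).items()):
        print(pair, n)
```

With `min_fraction` at 0.5 or above, the retained pairs cannot conflict with each other, because any two of them must co-occur in at least one input alignment. Positions where the programs disagree are simply left unaligned, which flags them as low-confidence regions.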
1) Alignment and modeling. Databank searching: PSI-BLAST. Multiple alignments (8 alignments), then 8 pairwise alignments, then Consensus 1, then Model 1. Multiple alignments (12 alignments), then 13 pairwise alignments, then Consensus 2, then Model 2. PSI-BLAST pairwise alignment, then Model PSI-BLAST.
2) Comparison of models to real structure. Global RMSD between model and structure after superposition. Local RMSD: percentage of well predicted residues. The lower the distance, the closer the model is to the real structure. A wrongly modeled region can dramatically increase the global RMSD.
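As an illustration of these two measures, the sketch below superposes a model onto the real structure with the Kabsch algorithm (a standard choice; the slides do not name the superposition method used), then computes the global RMSD and the percentage of residues within a distance cutoff. It assumes one coordinate per residue (for example C-alpha atoms) in matching order. The `cutoff` parameter is hypothetical, since the slides do not define "well predicted", and the per-residue distance test is a simplification of whatever local RMSD was actually used.

```python
import numpy as np


def superpose(model, reference):
    """Return model coordinates optimally rotated and translated onto reference (Kabsch)."""
    model_c = model - model.mean(axis=0)
    ref_c = reference - reference.mean(axis=0)
    u, _, vt = np.linalg.svd(model_c.T @ ref_c)
    # Avoid an improper rotation (reflection).
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    return model_c @ rotation.T + reference.mean(axis=0)


def global_rmsd(a, b):
    """Root-mean-square deviation between two already superposed coordinate sets."""
    return float(np.sqrt(np.mean(np.sum((a - b) ** 2, axis=1))))


def percent_well_predicted(a, b, cutoff):
    """Percentage of residues whose distance after superposition is at most cutoff."""
    distances = np.linalg.norm(a - b, axis=1)
    return float(100.0 * np.mean(distances <= cutoff))


if __name__ == "__main__":
    real = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 3.0]])
    # A hypothetical model: the real structure rotated by 90 degrees about z and shifted.
    rot = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
    model = real @ rot.T + np.array([5.0, -3.0, 2.0])
    fitted = superpose(model, real)
    print(global_rmsd(fitted, real))
    print(percent_well_predicted(fitted, real, cutoff=1.0))
```

Because the global RMSD averages squared distances, a single badly placed residue dominates it, while the percentage measure only loses that one residue. This is why the two measures are reported together.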
3pte: D-alanyl-D-alanine carboxypeptidase from Streptomyces sp. R61. Figure: Model 2, PSI-BLAST model, real structure.
Results. 9 proteins have been modelled. We can distinguish: 3 proteins of the midnight zone (<20% id.) and 6 proteins of the twilight zone (20-30% id.).
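The zone assignment can be sketched as follows. The thresholds (<20% and 20-30%) come from the slides. The way percent identity is counted here (identical residues over columns where both sequences have a residue) is an assumption, since the exact definition used by ALIGN is not given, and so are the function names and the treatment of the 30% boundary as inclusive.

```python
def percent_identity(target_row, template_row):
    """Percent of identical residues over columns where neither row has a gap ("-")."""
    if len(target_row) != len(template_row):
        raise ValueError("alignment rows must have equal length")
    aligned = [(a, b) for a, b in zip(target_row, template_row) if a != "-" and b != "-"]
    if not aligned:
        return 0.0
    identical = sum(a == b for a, b in aligned)
    return 100.0 * identical / len(aligned)


def zone(pid):
    """Classify a percent identity using the thresholds given on the slides."""
    if pid < 20.0:
        return "midnight zone"
    if pid <= 30.0:  # assumed inclusive upper bound
        return "twilight zone"
    return "above the twilight zone"


if __name__ == "__main__":
    # Hypothetical 10-residue alignment with 2 identical positions.
    pid = percent_identity("ACDEFGHIKL", "AYYYYGYYYY")
    print(pid, zone(pid))
```

Different denominators (alignment length, shorter sequence length) give different percentages for the same alignment, so a protein near 20% or 30% may change zone depending on the convention.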
Comparison of models to the real structure. Midnight Zone proteins (<20% id.): for all methods (models 1, 2 and PSI-BLAST), the results are very bad; most of the residues have been badly modeled. No reliable alignment method currently exists below 20%, and our method (models 1 and 2) cannot lower this threshold. Modeling of these 3 proteins confirms the limits of alignment methods below 20%.
Twilight Zone proteins (20-30% id.). Global and local RMS: the most accurate models (4/6 and 5/6) come from our method (consensus 1 and 2). In general, model 2 gives better results than model 1 and the PSI-BLAST model, so it is better to use many alignment programs. Models built with our methodology seem to be better than PSI-BLAST models.
Comparison to CASP (Critical Assessment of techniques for protein Structure Prediction). CASP: modeling of proteins whose structure is unknown to the entrants (revealed after the competition), followed by comparison to the real structure (global RMS). The best CASP models are taken as reference.
Conclusions. The limit of our method lies at 20% identity: below it, the method is not reliable. Above 20% id., our alignment method appears to be better than PSI-BLAST. Our results are comparable to the best CASP performances (cf. graph). Consensus-based sequence alignment has a future for homology modeling of Twilight Zone proteins.
Perspectives (1). Test our approach on a large set of proteins. Improve our method: giving more weight to better alignment programs; increasing the number of alignment programs; using several templates; using SSP and fold recognition.
Perspectives (2). Evaluate the confidence of regions predicted by many programs. Take part in the CASP competition. Automate: expert system (PhD thesis).