EMBL-EBI MSDfold (SSM) A web service for protein structure comparison and structure searches Eugene Krissinel


1 EMBL-EBI MSDfold (SSM) A web service for protein structure comparison and structure searches Eugene Krissinel http://www.ebi.ac.uk/msd-srv/ssm/ssmstart.html

2 EMBL-EBI Structure alignment
Structure alignment may be defined as the identification of residues occupying “equivalent” geometrical positions.
• Unlike in sequence alignment, residue type is neglected
• Used for
  • measuring structural similarity
  • protein classification and functional analysis
  • database searches

3 EMBL-EBI Methods
• Many methods are known:
  • Distance matrix alignment (DALI, Holm & Sander, EBI)
  • Vector alignment (VAST, Bryant et al., NCBI)
  • Depth-first recursive search on SSEs (DEJAVU, Madsen & Kleywegt, Uppsala)
  • Combinatorial extension (CE, Shindyalov & Bourne, SDSC)
  • Dynamic programming on Cα (Gerstein & Levitt)
  • Dynamic programming on SSEs (SSA, Singh & Brutlag, Stanford University)
  • many others
• SSM employs a 2-step procedure:
  A. Initial structure alignment and superposition using SSE graph matching
  B. Cα-alignment

4 EMBL-EBI Graph representation of SSEs
[Figure: SSE graph with labelled vertices and edges; of the label symbols only L survived extraction]
• SSE graphs differ from conventional chemical graphs only in that they are labelled by vectors of properties. In graph matching, the labels are compared with empirically chosen tolerances.
E. M. Mitchell et al. (1990) J. Mol. Biol. 212:151

5 EMBL-EBI SSE graph matching
[Figure: SSE graphs of structure A (helices H1-H2, strands S1-S4) and structure B (helices H1-H6, strands S1-S7)]
Matching the SSE graphs yields a correspondence between secondary structure elements, that is, between groups of residues. The correspondence may be used as an initial guess for structure superposition and for the alignment of individual residues.
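The matching idea on this slide can be illustrated with a small backtracking search for a common subgraph. This is only a sketch under assumptions that are not in the talk: each SSE is described by a hypothetical type, length and centre position, the only edge label is the distance between centres, and the tolerances LENGTH_TOL and DISTANCE_TOL are invented for the example (the talk only says that tolerances are chosen empirically). The search is exhaustive, so it is suitable for toy inputs only.

```python
import math

# Hypothetical tolerances for comparing labels (not taken from the talk).
LENGTH_TOL = 3        # residues
DISTANCE_TOL = 1.0    # angstroms


def edge_label(s, t):
    """Edge label used in this sketch: distance between SSE centres."""
    return math.dist(s["centre"], t["centre"])


def compatible(a, b, pairs, graph_a, graph_b):
    """Can vertex a of A be matched to vertex b of B, given earlier pairs?"""
    sa, sb = graph_a[a], graph_b[b]
    if sa["type"] != sb["type"]:
        return False
    if abs(sa["length"] - sb["length"]) > LENGTH_TOL:
        return False
    # Every edge to an already matched vertex must agree within tolerance.
    return all(
        abs(edge_label(graph_a[p], sa) - edge_label(graph_b[q], sb)) <= DISTANCE_TOL
        for p, q in pairs
    )


def match_sse_graphs(graph_a, graph_b):
    """Return the largest list of (index in A, index in B) pairs found."""
    best = []

    def extend(pairs, start):
        nonlocal best
        if len(pairs) > len(best):
            best = list(pairs)
        for a in range(start, len(graph_a)):
            for b in range(len(graph_b)):
                used = any(b == q for _, q in pairs)
                if not used and compatible(a, b, pairs, graph_a, graph_b):
                    extend(pairs + [(a, b)], a + 1)

    extend([], 0)
    return best


# Toy data: B holds A's three SSEs (shifted in space) plus one long helix.
structure_a = [
    {"type": "H", "length": 12, "centre": (0, 0, 0)},
    {"type": "S", "length": 6, "centre": (10, 0, 0)},
    {"type": "S", "length": 7, "centre": (10, 5, 0)},
]
structure_b = [
    {"type": "H", "length": 20, "centre": (50, 50, 50)},
    {"type": "H", "length": 11, "centre": (1, 1, 1)},
    {"type": "S", "length": 6, "centre": (11, 1, 1)},
    {"type": "S", "length": 8, "centre": (11, 6, 1)},
]
matches = match_sse_graphs(structure_a, structure_b)
print(matches)
```

The returned index pairs play the role of the SSE correspondence that seeds the superposition. Distances alone cannot distinguish a structure from its mirror image, which is one reason a real implementation would need richer labels.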

6 EMBL-EBI Cα-alignment
[Figure: chains A and B with matched helices and matched strands]
• The SSE alignment is used as an initial guess for the Cα-alignment
• Cα-alignment is an iterative procedure based on the expansion of shortest contacts at the best superposition of the structures
• Cα-alignment is a compromise between the alignment length N_align and the r.m.s.d. The longest contacts are unmapped in order to maximise the Q-score (the formula is not preserved in this transcript)
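For reference, the Q-score can be sketched as below. The formula is taken from the Krissinel & Henrick (2004) paper cited later in the talk, not from this slide: Q = N_align^2 / ((1 + (RMSD/R0)^2) * N1 * N2), with R0 = 3 Å assumed from that paper and N1, N2 the numbers of residues in the two chains. The function and parameter names are illustrative.

```python
def q_score(n_align, rmsd, n1, n2, r0=3.0):
    """Quality score balancing alignment length against r.m.s.d.

    n_align: number of aligned residue pairs
    rmsd:    r.m.s.d. of the aligned C-alpha atoms, in angstroms
    n1, n2:  number of residues in each chain
    r0:      scaling constant in angstroms (3.0 assumed from the 2004 paper)
    """
    return n_align ** 2 / ((1.0 + (rmsd / r0) ** 2) * n1 * n2)


# Values reported on slide 16 for d1di2a_ (69 residues) against 1KNO:A (184 residues).
q = q_score(n_align=67, rmsd=2.43, n1=69, n2=184)
print(round(q, 3))
```

With the slide 16 values for d1di2a_ this gives about 0.213, which agrees with the Q-score quoted there. Identical chains aligned over their full length with zero r.m.s.d. give Q = 1.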

7 EMBL-EBI Multiple structure alignment
• More than 2 structures are aligned simultaneously
• Multiple alignment is not equal to the set of all-to-all pairwise alignments
• Helps to identify common structure motifs for a whole family of structures

8 EMBL-EBI Iterative removal of non-aligning SSEs
[Figure: best pairwise alignments between structures A, B and C]
Helices may be multiply aligned from pairwise relations. Strands do not multiply align, but one can still try to align them by probing alternative (not best) alignments.

9 EMBL-EBI Iterative removal of non-aligning SSEs
[Figure: structures A, B and C with alternative alignments numbered 1 and 2]
4 alternative pairwise alignments make up to 4 multiple alignments:
A1 - B1 - C1
A1 - B2 - C1
A2 - B1 - C1
A2 - B2 - C1
Complexity prohibitive for structures
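The enumeration on this slide is a Cartesian product over the alternatives available for each structure, which is why the number of candidates grows multiplicatively. A minimal sketch, with the dictionary layout chosen only for illustration:

```python
from itertools import product
from math import prod

# Alternatives per structure, as labelled on the slide.
alternatives = {"A": ["A1", "A2"], "B": ["B1", "B2"], "C": ["C1"]}


def enumerate_multiple_alignments(options):
    """List every combination of one alternative per structure."""
    return [" - ".join(choice) for choice in product(*options.values())]


candidates = enumerate_multiple_alignments(alternatives)
count = prod(len(v) for v in alternatives.values())
for line in candidates:
    print(line)
```

With k alternatives in each of n structures the count is k^n, which motivates the heuristic on the next slide.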

10 EMBL-EBI Iterative removal of non-aligning SSEs
Heuristics: remove the non-aligning SSE with the lowest alignment score and reiterate all alignments.
[Figure: structures A, B and C with alternative alignments numbered 1 and 2]
[Flowchart]
1. Start: calculate all-to-all pairwise alignments.
2. Are there non-aligning SSEs?
   Yes: remove the one non-aligning SSE with the lowest score and return to step 1.
   No: quit.
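The loop in the flowchart can be sketched as follows. Everything concrete here is an assumption made for illustration: toy_pairwise_alignment is a hypothetical stand-in for real SSE graph matching (it pairs SSEs that carry the same made-up motif label), an SSE counts as "non-aligning" when it is left unmatched in at least one pairwise alignment, and its score is the sum of its pairwise match scores.

```python
from itertools import combinations


def toy_pairwise_alignment(sses_a, sses_b):
    """Hypothetical stand-in for pairwise SSE matching.

    Each structure maps an SSE name to (motif label, length). Two SSEs are
    paired when they carry the same label; the score is the shorter length.
    """
    by_label = {label: name for name, (label, _) in sses_b.items()}
    pairs = {}
    for name, (label, length) in sses_a.items():
        partner = by_label.get(label)
        if partner is not None:
            pairs[name] = (partner, min(length, sses_b[partner][1]))
    return pairs


def iterative_removal(structures, align=toy_pairwise_alignment):
    """Drop non-aligning SSEs one at a time, lowest score first."""
    structures = {name: dict(sses) for name, sses in structures.items()}
    while True:
        score = {n: {s: 0 for s in sses} for n, sses in structures.items()}
        hits = {n: {s: 0 for s in sses} for n, sses in structures.items()}
        # Step 1: all-to-all pairwise alignments.
        for a, b in combinations(structures, 2):
            for x, (y, value) in align(structures[a], structures[b]).items():
                score[a][x] += value
                hits[a][x] += 1
                score[b][y] += value
                hits[b][y] += 1
        # Step 2: an SSE is non-aligning if some partner leaves it unmatched.
        partners = len(structures) - 1
        non_aligning = [
            (score[n][s], n, s)
            for n, sses in structures.items()
            for s in sses
            if hits[n][s] < partners
        ]
        if not non_aligning:
            return structures
        _, n, s = min(non_aligning)   # lowest score; ties broken by name
        del structures[n][s]


toy = {
    "A": {"H1": ("helix-core", 12), "S1": ("strand-x", 5), "S2": ("strand-y", 6)},
    "B": {"H1": ("helix-core", 11), "S1": ("strand-x", 5)},
    "C": {"H1": ("helix-core", 13), "S1": ("strand-y", 6)},
}
remaining = iterative_removal(toy)
print({name: sorted(sses) for name, sses in remaining.items()})
```

In this toy case only the helix shared by all three structures survives. Each pass removes exactly one SSE, so the loop runs at most as many times as there are SSEs, instead of enumerating all combinations of alternatives.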

11 EMBL-EBI Multiple Cα refinement
Central star & consensus
[Figure: structures A, B, C and consensus structure X]
[Flowchart; step order reconstructed from the scrambled transcript]
1. Multiple SSE alignment
2. Initial Cα alignment
3. Superpose structures and calculate consensus structure X
4. Choose the structure closest to X as the central star and align all the rest to it
5. Unmap groups of atoms with the highest distance score D in order to maximise the score
6. Score improved? Yes: reiterate. No: quit.
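The consensus and central-star steps can be sketched on already superposed coordinates. The talk does not say how the consensus X or "closest" are computed, so this sketch assumes a plain coordinate average for X and r.m.s.d. to X as the closeness measure; the superposition itself is left out.

```python
import math


def consensus(structures):
    """Average position of each multiply aligned atom across structures.

    structures: list of equally long lists of (x, y, z) tuples, one list
    per structure, already superposed and in aligned order.
    """
    n = len(structures)
    length = len(structures[0])
    return [
        tuple(sum(s[i][k] for s in structures) / n for k in range(3))
        for i in range(length)
    ]


def rmsd(a, b):
    """Root mean square distance between two aligned coordinate lists."""
    return math.sqrt(sum(math.dist(p, q) ** 2 for p, q in zip(a, b)) / len(a))


def central_star(structures):
    """Index of the structure closest to the consensus (assumed: by r.m.s.d.)."""
    x = consensus(structures)
    return min(range(len(structures)), key=lambda i: rmsd(structures[i], x))


# Toy coordinates: three two-atom "structures" offset along y.
a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
b = [(0.0, 0.3, 0.0), (1.0, 0.3, 0.0)]
c = [(0.0, 0.9, 0.0), (1.0, 0.9, 0.0)]
x = consensus([a, b, c])
star = central_star([a, b, c])
print(star)
```

Here the consensus lies at y = 0.4, so the middle structure (index 1) is chosen as the central star.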

12 EMBL-EBI Pairwise Alignment vs. Multiple Alignment
The best pairwise alignment of 1SAR:A and 1D1F:B includes only the β-sheet.
Adding 1MGW:A (a close neighbour of 1SAR:A) reveals a common motif of β-sheet and α-helix.

13 EMBL-EBI SSM server map
http://www.ebi.ac.uk/msd-srv/ssm

14 EMBL-EBI SSM output
• Table of matched Secondary Structure Elements
• Table of matched backbone Cα-atoms with distances between them at best structure superposition
• Rotation-translation matrix of best structure superposition
• Visualisation in Jmol and Rasmol
• r.m.s.d. of Cα-alignment
• Length of Cα-alignment, N_align
• Number of gaps in Cα-alignment
• Quality score Q
• Statistical significance scores P(S), Z
• Sequence identity
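Two of the listed outputs, the rotation-translation matrix and the r.m.s.d., are related in a simple way, sketched below. The convention x' = R·x + t applied to the second structure is an assumption; the layout in which the server actually reports the matrix is not given in the talk.

```python
import math


def transform(point, rotation, translation):
    """Apply x' = R * x + t to one 3D point (assumed convention)."""
    return tuple(
        sum(rotation[r][c] * point[c] for c in range(3)) + translation[r]
        for r in range(3)
    )


def superposed_distances(fixed, moving, rotation, translation):
    """Distances between matched atoms after moving the second structure."""
    moved = [transform(p, rotation, translation) for p in moving]
    return [math.dist(p, q) for p, q in zip(fixed, moved)]


def rmsd_from_distances(distances):
    """Root mean square of the per-pair distances."""
    return math.sqrt(sum(d * d for d in distances) / len(distances))


# Toy example: rotate by 90 degrees about z, then shift by 1 along x.
rotation = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
translation = (1.0, 0.0, 0.0)
fixed = [(1.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
moving = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]

distances = superposed_distances(fixed, moving, rotation, translation)
value = rmsd_from_distances(distances)
print(distances, value)
```

The per-pair distances correspond to the table of matched Cα-atoms, and their root mean square is the reported r.m.s.d.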

15 EMBL-EBI Statistical significance of alignments
• The P-value is estimated using Q-scores of SSE deviations
• P(S) is the probability of getting a score equal to S or higher when picking structures from the PDB at random
[Figure: labels x_1, x_i, x_n]
• P(S) is calibrated on SCOP folds
• P(S) is often expressed through the Z-score
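The talk does not give the relation between P(S) and Z. A common convention, assumed here and possibly different from the one SSM uses, takes Z as the point of the standard normal distribution whose upper tail has probability P. A minimal sketch under that assumption:

```python
from statistics import NormalDist

_STANDARD_NORMAL = NormalDist()   # mean 0, standard deviation 1


def z_from_p(p):
    """Z whose upper Gaussian tail has probability p (assumed convention).

    Uses the symmetry of the normal distribution, Z = -inv_cdf(p), which
    keeps precision for very small p. Requires 0 < p < 1.
    """
    return -_STANDARD_NORMAL.inv_cdf(p)


def p_from_z(z):
    """Upper-tail probability for a given Z (inverse of z_from_p)."""
    return _STANDARD_NORMAL.cdf(-z)


for p in (0.5, 0.05, 1e-6):
    print(p, z_from_p(p))
```

Under this convention P = 0.5 maps to Z = 0, and smaller P-values map to larger Z-scores.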

16 EMBL-EBI Scoring at low structural similarity - 1KNO:A vs SCOP 1.61
Criterion        Hit                Q-score  RMSD  N_align  P
Maximal Q-score  d1di2a_ (69 res)   0.213    2.43  67/184   0.55
Lowest RMSD      d1emn_1 (43 res)   0.019    0.9   13/184   0.075
Highest N_align  d1elxb_ (449 res)  0.02     5.82  89/184   ~1

17 EMBL-EBI Performance data
[Figure: performance chart; surviving labels: 4, 1, 50 s]

18 EMBL-EBI Sequence and Structure Alignments
Sequence alignment
Based on residue identity, sometimes with a modified alphabet
--AARNEDDDGKMPSTF-L
E-AARNFG-DGK--STFIL
Used for:
• evolution studies
• protein function analysis
• guessing at structure similarity
Algorithms: dynamic programming + heuristics
Applications: BLAST, FASTA, FLASH and others
Structure alignment
Based on geometrical equivalence of residue positions, residue type disregarded
Used for:
• protein function analysis
• some aspects of evolution studies
Algorithms: dynamic programming, graph theory, Monte Carlo (MC), geometric hashing and others
Applications: DALI, VAST, CE, MASS, SSM and others
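Sequence identity for the example alignment on this slide can be counted as below. Normalising by the number of columns in which both rows have a residue is one common choice and is an assumption here; other denominators (for example the shorter sequence length) are also in use.

```python
def alignment_identity(row_a, row_b, gap="-"):
    """Count identical and aligned (gap-free) columns of a pairwise alignment."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    aligned = [(a, b) for a, b in zip(row_a, row_b) if a != gap and b != gap]
    identical = sum(a == b for a, b in aligned)
    return identical, len(aligned)


identical, aligned = alignment_identity(
    "--AARNEDDDGKMPSTF-L",
    "E-AARNFG-DGK--STFIL",
)
print(identical, aligned, identical / aligned)
```

For the two rows shown on the slide this counts 11 identical residues in 13 gap-free columns.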

19 EMBL-EBI Sequence and Structure Identity
[Figure: plot with a region marked "Good structure similarity"]
20% of identical residues are very often sufficient for chains to be structurally similar.
E. Krissinel & K. Henrick (2004), Acta Cryst. D60, 2256-2268

20 EMBL-EBI Sequence identity within structure families
Given that A is similar to B at 20%, and B is similar to C at 20%, is A similar to C at 20% or more?
[Figure: triangle A, B, C with two sides labelled 20% and the third labelled "? 20%"]
Naively, [expression lost in extraction].
OK, 20% sequence identity is not a necessary condition for structural similarity. How distant may the sequences within a structure family be?

21 EMBL-EBI Sequence identity within structure families: case A
[Figure: structures A, B and C with aligned residues HIS, CYS, TRP]
Aligned residues are structurally conserved through the family. This is a typical assumption for multiple sequence alignment.
Implications:
• Protein folds are controlled by certain residue types and/or subsequences.
• Protein structure, and therefore function, are clearly sequence-related.

22 EMBL-EBI Sequence identity within structure families: case B
[Figure: structures A, B and C with residues HIS, CYS, TRP]
Aligned residues are not conserved through the family.
Implications:
• Protein folds are not controlled by any particular residue types and/or subsequences.
• Many different sequences may fold into similar structures.
• Protein structure, and therefore function, are not clearly sequence-related.

23 EMBL-EBI Sequence identity within structure families: case B
[Figure: structures A, B and C with residues HIS, CYS, TRP]
This case may be identified by multiple structure alignment only. Multiple sequence alignment will always find and superpose short fragments:
-----AFRNEDDDGGKPSTFKL
EAARNAF-------GKKSTFIL
EAARNAFDGKMTBIGK------

24 EMBL-EBI Multiple alignment of SCOP folds
SCOP
• Structure-related hierarchy
• Manually curated
SCOP database: 11 classes, 945 folds, 1539 superfamilies, 2845 families, 70859 domains
Multiple structure alignment of domains in SCOP folds
• Sound structure resemblance within folds
• Wide sequence variations
• Sequence redundancy cut-off at 50%

25 EMBL-EBI Sequence identity in SCOP folds
Average multiple sequence identity (A): 12%
Average pairwise sequence identity (B): 19%
[Figure: distributions of multiple sequence conservation (case A) and pairwise sequence conservation (case B)]
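The difference between the two averages can be illustrated on a toy alignment. The definitions used here are assumptions, since the talk does not spell them out: multiple identity is the fraction of gap-free columns in which all rows carry the same residue, and pairwise identity is the mean, over all pairs of rows, of the fraction of identical gap-free columns. The three rows are invented for the example.

```python
from itertools import combinations


def multiple_identity(rows, gap="-"):
    """Fraction of gap-free columns in which every row has the same residue."""
    columns = [col for col in zip(*rows) if gap not in col]
    return sum(len(set(col)) == 1 for col in columns) / len(columns)


def average_pairwise_identity(rows, gap="-"):
    """Mean identity over all pairs of rows, each over its gap-free columns."""
    values = []
    for a, b in combinations(rows, 2):
        cols = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
        values.append(sum(x == y for x, y in cols) / len(cols))
    return sum(values) / len(values)


# Hypothetical multiple alignment of three short fragments.
rows = ["AKLVDG", "AKMVEG", "ARLVEG"]
multi = multiple_identity(rows)
pair = average_pairwise_identity(rows)
print(multi, pair)
```

Each pair of rows agrees in four of six columns, but only three columns are identical across all rows, so the multiple identity (0.5) is lower than the average pairwise identity (about 0.67). The same ordering appears in the 12% and 19% figures on the slide.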

26 EMBL-EBI Residue conservation
Odds are calculated as the ratio of the observed to the expected probability of an identity residue substitution:
[Equation not preserved in the transcript]
Henikoff, S. and Henikoff, J. G. (1992) Proc. Natl. Acad. Sci. 89, p. 10915.
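A sketch of such odds, following the BLOSUM construction of the cited Henikoff & Henikoff paper: observed pair frequencies q_ij come from counts of unordered residue pairs, the background frequency is p_i = q_ii + (1/2) Σ_{j≠i} q_ij, and the identity odds are q_ii / p_i². Whether the talk used exactly this normalisation is an assumption, and the counts below are invented.

```python
def identity_odds(pair_counts):
    """Observed/expected odds of identity substitutions.

    pair_counts maps an unordered residue pair, written as a tuple
    (a, b) with a <= b, to the number of times it was observed.
    """
    total = sum(pair_counts.values())
    q = {pair: n / total for pair, n in pair_counts.items()}

    # Background frequency of each residue among all pair members.
    p = {}
    for (a, b), value in q.items():
        if a == b:
            p[a] = p.get(a, 0.0) + value
        else:
            p[a] = p.get(a, 0.0) + value / 2
            p[b] = p.get(b, 0.0) + value / 2

    # Expected probability of an identical pair is p_i squared.
    return {a: q.get((a, a), 0.0) / p[a] ** 2 for a in p}


# Hypothetical counts for a two-letter alphabet.
counts = {("L", "L"): 6, ("L", "W"): 2, ("W", "W"): 2}
odds = identity_odds(counts)
print(odds)
```

Odds above 1 mean the residue is conserved more often than chance would predict. In the toy counts the rarer residue W has the higher odds (about 2.2 against about 1.2 for L).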

27 EMBL-EBI Residue conservation
Reference data from Naor D. et al. (1996). J. Mol. Biol. 256, p. 924.

28 EMBL-EBI Log odds matrix for SCOP folds Hydropathy index by Kyte, J. and Doolittle, R. F. (1982). J. Mol. Biol. 157, p. 105.

29 EMBL-EBI Sequence vs “hydropathy” identity in SCOP folds
Average pairwise sequence identity: 19%
Average multiple sequence identity: 12%
Average “hydropathy” identity: 68%
[Figure: distributions of hydropathy conservation, pairwise sequence conservation (case B) and multiple sequence conservation (case A)]

30 EMBL-EBI What is 20% sequence identity?
Consider an idealized model, where all residues are indiscriminately substituted by like-hydropathic residues only:
• 10 hydrophilic residues
• 10 hydrophobic residues
[Figure: count matrix]
[Equations not preserved in the transcript: total counts (in upper triangle) and expected sequence identity]
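The slide's own numbers are lost, but the model can be worked through under one explicit reading, which is an assumption: the count matrix holds one count for every unordered pair of like-hydropathic residues, identical pairs included, and zero for every cross-class pair. The expected identity is then the diagonal counts divided by the total counts in the upper triangle.

```python
from fractions import Fraction


def expected_identity(class_sizes):
    """Expected identity when each within-class pair is equally counted.

    class_sizes: number of residue types in each hydropathy class.
    Assumes one count per unordered within-class pair, diagonal included.
    """
    identical = sum(class_sizes)                          # diagonal entries
    total = sum(n * (n + 1) // 2 for n in class_sizes)    # upper triangle
    return Fraction(identical, total)


value = expected_identity([10, 10])
print(value, float(value))
```

Under this reading the two classes contribute 20 identical pairs out of 110, about 18%, so a sequence identity near 20% would be what purely hydropathy-preserving substitution produces.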

31 EMBL-EBI Conclusion
• It is quite possible that residue identity plays a much less significant role in protein structure than is often believed
• As a consequence, the role of residue identity in protein function may often be overestimated
• Using sequence identity to assess structural or functional features may give more false negatives than expected
• Physical-chemical properties of residues should be given preference over residue identity in structure and function analysis
• Modern methods for structure alignment are efficient; there is little sense in using sequence alignment in structure-related studies
Acknowledgement. This work has been supported by research grant No. 721/B19544 from the Biotechnology and Biological Sciences Research Council (BBSRC) UK.

