PDBe-fold (SSM) A web-based service for protein structure comparison and structure searches Gaurav Sahni, Ph.D.

2 Structure alignment
Structure alignment may be defined as the identification of residues occupying “equivalent” geometrical positions. Unlike in sequence alignment, residue type is neglected.
Used for: measuring structural similarity; protein classification and functional analysis; database searches.

The similarity analysis of protein structures is a vital step in understanding a protein's role in different cellular processes. The three-dimensional structure of a protein has a major impact on its ability to bind other proteins and ligands, as well as on its stability under in vivo and in vitro conditions. Currently there are more than 50,000 protein structures, and the number is growing daily, so analysing the known structures requires an efficient tool for protein structure alignment in three dimensions. 3D alignment is based on residues occupying equivalent geometrical positions rather than on their biochemical properties: two residues are considered aligned if they satisfy certain distance and orientation criteria at the best superposition of the two structures.
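A minimal sketch of the residue-equivalence idea, assuming the two structures are already optimally superposed and each residue is represented by its Cα coordinates. The 3.0 Å cutoff is a hypothetical value chosen for illustration (the slide only refers to "certain distance criteria"), and the orientation criteria are left out.

```python
import math

# Hypothetical cutoff in Å; not a value taken from SSM/PDBefold.
CUTOFF = 3.0

def equivalent_pairs(coords_a, coords_b, candidate_pairs, cutoff=CUTOFF):
    """Keep candidate (i, j) residue pairs whose Cα atoms lie within `cutoff`.

    Assumes coords_a and coords_b are already superposed.
    """
    return [(i, j) for i, j in candidate_pairs
            if math.dist(coords_a[i], coords_b[j]) <= cutoff]

def rmsd(coords_a, coords_b, pairs):
    """Root-mean-square deviation over the aligned (i, j) pairs."""
    if not pairs:
        raise ValueError("no aligned pairs")
    total = sum(math.dist(coords_a[i], coords_b[j]) ** 2 for i, j in pairs)
    return math.sqrt(total / len(pairs))
```

For example, with three residues per chain and pairwise Cα distances of 1, 0 and 5 Å, only the first two pairs survive the cutoff, and their RMSD is √0.5 Å.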

3 Sequence and Structure Alignments
Sequence alignment
Based on residue identity, sometimes with a modified alphabet.
Used for: evolution studies; protein function analysis; guessing at structural similarity.
Algorithms: dynamic programming + heuristics.
Applications: BLAST, FASTA, FLASH and others.

Structure alignment
Based on geometrical equivalence of residue positions; residue type disregarded.
Used for: protein function analysis; some aspects of evolution studies.
Algorithms: dynamic programming, graph theory, Monte Carlo (MC), geometric hashing and others.
Applications: DALI, VAST, CE, MASS, SSM and others.

Example alignment (gaps shown as -):
--AARNEDDDGKMPSTF-L
E-AARNFG-DGK--STFIL
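For contrast, the dynamic programming that underlies sequence alignment can be sketched as a plain Needleman-Wunsch global alignment scored by residue identity alone. The scoring values (match 1, mismatch 0, gap -1) are illustrative assumptions; real tools such as BLAST and FASTA use substitution matrices, local alignment and heuristic seeding rather than this exact scheme.

```python
def global_align(a, b, match=1, mismatch=0, gap=-1):
    """Needleman-Wunsch global alignment using residue identity as the score.

    Returns (score, gapped_a, gapped_b).
    """
    def pair_score(x, y):
        return match if x == y else mismatch

    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right cell to recover one optimal alignment.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]):
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))
```

Calling `global_align("AARNEDDDGKMPSTFL", "EAARNFGDGKSTFIL")` returns a score and two gapped strings of equal length; with these toy scores the gaps need not fall exactly where the example alignment above places them.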

4 Methods
Many methods are known:
Distance matrix alignment (DALI, Holm & Sander, EBI)
Vector alignment (VAST, Bryant et al., NCBI)
Depth-first recursive search on SSEs (DEJAVU, Madsen & Kleywegt, Uppsala)
Combinatorial extension (CE, Shindyalov & Bourne, SDSC)
Dynamic programming on Cα (Gerstein & Levitt)
Dynamic programming on SSEs (SSA, Singh & Brutlag, Stanford University)
many more …
SSM employs a 2-step procedure:
1) initial structure alignment and superposition using SSE graph matching;
2) Cα-alignment.

Several approaches to protein structure alignment have been explored previously, including comparison of distance matrices (DALI), analysis of differences in vector distance plots (VAST), and dynamic programming on pairwise distances between protein residues or SSEs. None of these methods gives an exact solution to the problem: they agree relatively well on highly similar structures but often disagree on structures with low similarity. So how is SSM different from the previously known methods? The answer lies in its graph matching algorithm, which delivers high-quality protein structure alignments and database searches in less than a minute.
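Distance-matrix methods such as DALI rely on the fact that intra-molecular Cα-Cα distances do not change under rotation or translation, so two structures can be compared without first superposing them. The sketch below shows only that invariance, using a simple mean absolute difference between the distance matrices of two fragments whose residue correspondence is assumed to be known; it is not DALI's actual similarity score.

```python
import math

def distance_matrix(coords):
    """Intra-molecular distance matrix: entry [i][j] is |r_i - r_j|."""
    return [[math.dist(p, q) for q in coords] for p in coords]

def matrix_difference(coords_a, coords_b):
    """Mean absolute difference between the distance matrices of two
    equal-length fragments with a known one-to-one residue correspondence."""
    if len(coords_a) != len(coords_b):
        raise ValueError("fragments must have equal length")
    da, db = distance_matrix(coords_a), distance_matrix(coords_b)
    n = len(coords_a)
    total = sum(abs(da[i][j] - db[i][j]) for i in range(n) for j in range(n))
    return total / (n * n)

def rotate_z(coords, angle):
    """Rotate points about the z axis by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y, z) for x, y, z in coords]

def translate(coords, shift):
    """Shift every point by the vector `shift`."""
    return [tuple(a + b for a, b in zip(p, shift)) for p in coords]
```

Rotating and translating a fragment leaves `matrix_difference` at (numerically) zero, whereas a fragment with a different shape gives a positive value.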

5 Three-dimensional graph matching
Protein secondary structure elements (SSEs) are natural and convenient objects for building three-dimensional graphs. Secondary structure provides most of a protein's functionality and is conserved through evolution. The details of a protein fold are expressed in terms of two SSE types: helices and strands.
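To illustrate SSEs as building blocks, the following sketch splits a per-residue secondary-structure string (one letter per residue: H for helix, E for strand, anything else for coil, in the style of DSSP output) into helix and strand segments. The minimum segment lengths are hypothetical; SSM's own SSE assignment rules are not reproduced here.

```python
from dataclasses import dataclass

# Hypothetical minimum run lengths for a segment to count as an SSE.
MIN_LENGTH = {"H": 4, "E": 3}

@dataclass
class SSE:
    kind: str   # "H" (helix) or "E" (strand)
    start: int  # index of the first residue (0-based)
    end: int    # index of the last residue (inclusive)

def extract_sses(ss):
    """Split a per-residue secondary-structure string into helix/strand runs."""
    sses = []
    i = 0
    while i < len(ss):
        kind = ss[i]
        j = i
        # Extend j to the last residue of the current run of identical letters.
        while j + 1 < len(ss) and ss[j + 1] == kind:
            j += 1
        if kind in MIN_LENGTH and j - i + 1 >= MIN_LENGTH[kind]:
            sses.append(SSE(kind, i, j))
        i = j + 1
    return sses
```

With these thresholds, short runs such as a two-residue strand are dropped rather than becoming graph vertices.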

6 Graph representation of SSEs
[Figure: two SSE vectors, Vi and Vj, joined by an edge L; α1 marks the angle between the edge and a vertex]
SSE graphs are built from vectors. Each SSE is used as a graph vertex, labelled (Ti, ρi). Any two vertices are connected by an edge, L, which describes the relative position and orientation of the connected SSEs. Each edge is labelled with a property vector: the angles α1 and α2 between the edge and each of the two vertices, the torsion angle between the vertices, and the length of the edge L.
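A sketch of how such an edge label could be computed, assuming each SSE is reduced to an axis vector given by a start and an end point, and that the edge joins the midpoints of the two SSEs. These geometric conventions are assumptions for illustration; SSM's exact definitions may differ. The torsion angle uses the standard signed dihedral formula, so it changes sign under mirror reflection while the length and the two α angles do not.

```python
import math

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def norm(a):
    return math.sqrt(dot(a, a))

def angle_deg(a, b):
    """Unsigned angle between two vectors, in degrees."""
    c = dot(a, b) / (norm(a) * norm(b))
    return math.degrees(math.acos(max(-1.0, min(1.0, c))))

def torsion_deg(v1, axis, v2):
    """Signed dihedral angle (degrees) between v1 and v2 about `axis`."""
    y = norm(axis) * dot(v1, cross(axis, v2))
    x = dot(cross(v1, axis), cross(axis, v2))
    return math.degrees(math.atan2(y, x))

def midpoint(seg):
    start, end = seg
    return tuple((s + e) / 2 for s, e in zip(start, end))

def edge_label(sse_i, sse_j):
    """Property vector of the edge between two SSEs given as (start, end) points."""
    vi = sub(sse_i[1], sse_i[0])
    vj = sub(sse_j[1], sse_j[0])
    edge = sub(midpoint(sse_j), midpoint(sse_i))
    return {"length": norm(edge),
            "alpha1": angle_deg(vi, edge),
            "alpha2": angle_deg(vj, edge),
            "torsion": torsion_deg(vi, edge, vj)}
```

Reflecting the second SSE through the yz plane (x → -x) in the usage below leaves the length and both α angles unchanged but flips the torsion from -90° to +90°.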

7 Torsion angle comparison: distinguishing mirror-symmetry mates
The sets of vertices, edges and their labels provide a full definition of the graph. A graph matching algorithm requires a set of rules for comparing individual vertices and edges, with tolerances chosen empirically. Both relative and absolute vertex and edge lengths are used in the comparison, which allows larger absolute differences for longer vertices and edges. Torsion angle comparison distinguishes mirror-symmetry mates.
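The comparison rules might be sketched as follows, assuming edge labels consisting of a length, the angles α1 and α2, and a signed torsion angle, all in degrees. All tolerance values are hypothetical. The length test combines an absolute term with a relative one, so the same absolute difference can be accepted for a long edge but rejected for a short one; comparing the signed torsion rejects a mirror-image edge whose torsion has the opposite sign.

```python
ABS_LENGTH_TOL = 1.0   # Å, hypothetical
REL_LENGTH_TOL = 0.15  # fraction of the longer length, hypothetical
ANGLE_TOL = 30.0       # degrees, hypothetical

def lengths_match(l1, l2):
    """Absolute + relative tolerance: longer objects may differ more."""
    return abs(l1 - l2) <= ABS_LENGTH_TOL + REL_LENGTH_TOL * max(l1, l2)

def angle_diff(a, b):
    """Smallest difference between two angles in degrees (handles wrap-around)."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)

def edges_match(e1, e2):
    """e1, e2: dicts with 'length', 'alpha1', 'alpha2' and signed 'torsion'."""
    return (lengths_match(e1["length"], e2["length"])
            and all(angle_diff(e1[k], e2[k]) <= ANGLE_TOL
                    for k in ("alpha1", "alpha2", "torsion")))
```

A 2 Å difference passes for edges of 10 and 12 Å but fails for edges of 4 and 6 Å, and an edge with torsion -90° does not match its mirror image with torsion +90°.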

8 SSE graph matching
[Figure: SSE graphs of two proteins, A and B, with helices (H1, H2, ...) and strands (S1, S2, ...) labelled]
Here we see two protein molecules, A and B. Matching their SSE graphs yields a correspondence between secondary structure elements, that is, between groups of residues. This correspondence may be used as an initial guess for structure superposition and for the alignment of individual residues.
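Matching two SSE graphs amounts to finding a largest common subgraph. The exhaustive backtracking search below illustrates the idea on small graphs; it is not SSM's graph matching algorithm. As simplifying assumptions, vertices are compared by SSE type only, each edge is labelled only by the distance between SSE centres, and the tolerance is hypothetical.

```python
def match_graphs(types_a, dist_a, types_b, dist_b, tol=1.0):
    """Largest mapping of SSEs of A onto SSEs of B such that matched SSEs
    share a type and all pairwise inter-SSE distances agree within `tol`.

    Exhaustive backtracking: only practical for small graphs.
    Returns a list of (index_in_A, index_in_B) pairs.
    """
    best = []

    def extend(i, mapping, used):
        nonlocal best
        # Prune: even matching every remaining SSE of A cannot beat `best`.
        if len(mapping) + (len(types_a) - i) <= len(best):
            return
        if i == len(types_a):
            best = list(mapping)
            return
        for j in range(len(types_b)):
            if j in used or types_a[i] != types_b[j]:
                continue
            # Every edge to an already-matched vertex must be compatible.
            if all(abs(dist_a[i][k] - dist_b[j][l]) <= tol for k, l in mapping):
                mapping.append((i, j))
                used.add(j)
                extend(i + 1, mapping, used)
                mapping.pop()
                used.discard(j)
        extend(i + 1, mapping, used)  # option: leave SSE i of A unmatched

    extend(0, [], set())
    return best
```

In the usage below, A has the SSEs H, E, E and B has E, H, E in a different order; the search still pairs them up because only types and inter-SSE distances are compared.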

9 What next?
So far we have considered the three-dimensional arrangement of secondary structure elements (SSEs) regardless of their order in the protein chain. Connectivity of SSEs is significant, although it can be neglected when comparing mutated or engineered proteins. In previous methods, connectivity was either preserved or neglected.

10 PDBefold (SSM) approach: a more flexible way
There are three options:
1) Connectivity of SSEs neglected: the SSEs may be connected in a different order, while the SSE graphs are geometrically identical.

11
2) Soft connectivity: the general order of SSEs along their protein chains is the same in both structures, but any number of missing or unmatched SSEs is allowed between matched ones.
3) Strict connectivity: matched SSEs follow the same order along their protein chains and are separated only by an equal number of matched or unmatched SSEs in both structures.
To obtain a 3D alignment of individual residues, represent them by their Cα atoms and use the results of graph matching as a starting point.
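The soft and strict connectivity options can be expressed as checks on a list of matched SSE pairs, where each pair holds the SSE's position along chain A and along chain B. This is one reading of the definitions above, sketched for illustration; with connectivity neglected, any set of matched pairs is acceptable and no check is needed.

```python
def is_soft_connected(pairs):
    """Soft connectivity: matched SSEs appear in the same order along both
    chains; any number of unmatched SSEs may lie between them."""
    pairs = sorted(pairs)  # order by position along chain A
    return all(b1 < b2 for (_, b1), (_, b2) in zip(pairs, pairs[1:]))

def is_strict_connected(pairs):
    """Strict connectivity: same order, and consecutive matched SSEs are
    separated by the same number of SSEs in both chains."""
    pairs = sorted(pairs)
    return is_soft_connected(pairs) and all(
        a2 - a1 == b2 - b1 for (a1, b1), (a2, b2) in zip(pairs, pairs[1:]))
```

A matching such as (0, 0), (1, 2), (3, 4) keeps the order, so it passes the soft check, but the gaps differ between the chains (1 vs 2), so it fails the strict check.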

12 Cα-alignment
SSE-alignment is used as an initial guess for Cα-alignment. Cα-alignment is an iterative procedure based on the expansion of shortest contacts at the best superposition of the structures.
[Figure: chains A and B superposed, with matched helices and matched strands highlighted]
Cα-alignment is a compromise between the alignment length Nalign and the r.m.s.d. The longest contacts are unmapped in order to maximise the Q-score.
Found contacts are expanded in both directions, starting from the shortest contact, such that the distance between newly mapped atoms undergoes the minimum possible increase. If the expansion encounters an unmappable pair of atoms, it stops advancing in that direction. The Q-score is the quality function of the Cα-alignment in the SSM algorithm; it takes into account both the alignment length and the RMSD. In general, more meaningful 3D alignments correspond to a lower RMSD and a higher number of aligned residues. Q is 1 for identical structures and drops with increasing RMSD or decreasing alignment length.
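The Q-score as defined for SSM by Krissinel and Henrick is Q = Nalign² / ((1 + (RMSD/R0)²) · N1 · N2), where N1 and N2 are the numbers of residues in the two structures and R0 = 3 Å; it equals 1 only when all residues are aligned with zero RMSD. The sketch below computes Q and imitates the "unmap the longest contacts" step under a strong simplification: contact distances are treated as fixed, whereas the real procedure re-superposes the structures as the alignment changes.

```python
import math

R0 = 3.0  # Å, empirical constant in the Q-score definition

def q_score(n_align, rmsd, n1, n2, r0=R0):
    """Q = Nalign^2 / ((1 + (rmsd/r0)^2) * n1 * n2)."""
    return n_align ** 2 / ((1.0 + (rmsd / r0) ** 2) * n1 * n2)

def rmsd_of(distances):
    """RMSD computed from a non-empty list of contact distances."""
    if not distances:
        raise ValueError("no contacts")
    return math.sqrt(sum(d * d for d in distances) / len(distances))

def trim_longest_contacts(distances, n1, n2):
    """Drop the longest contact repeatedly while doing so increases Q.

    Simplification: no re-superposition after each removal.
    Returns (kept_distances, best_q).
    """
    kept = sorted(distances)
    best_q = q_score(len(kept), rmsd_of(kept), n1, n2)
    while len(kept) > 1:
        trial = kept[:-1]  # unmap the current longest contact
        q = q_score(len(trial), rmsd_of(trial), n1, n2)
        if q <= best_q:
            break
        kept, best_q = trial, q
    return kept, best_q
```

With nine short contacts of 0.5 Å and one outlier of 10 Å in two 10-residue chains, removing the outlier raises Q from about 0.47 to about 0.79, while removing a further short contact would lower it, so trimming stops there.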

13 Multiple structure alignment
More than two structures are aligned simultaneously. A multiple alignment is not equal to the set of all-to-all pairwise alignments. Multiple alignment helps to identify common structural motifs for a whole family of structures.
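Why a multiple alignment is not just the collection of pairwise alignments can be illustrated with residue maps: independently computed pairwise alignments A-B, B-C and A-C need not be transitively consistent, whereas a multiple alignment must place each residue in a single column. The maps used in the usage example are hypothetical.

```python
def compose(ab, bc):
    """Compose residue maps A->B and B->C into the implied A->C map."""
    return {a: bc[b] for a, b in ab.items() if b in bc}

def inconsistencies(ab, bc, ac):
    """Residues of A whose direct A->C partner differs from the partner
    implied by going through B."""
    implied = compose(ab, bc)
    return sorted(a for a in implied if a in ac and ac[a] != implied[a])
```

If A-B maps residue 1 to 1 and B-C maps 1 to 2, but the direct A-C alignment maps residue 1 of A to residue 1 of C, no single set of columns can satisfy all three pairwise alignments at once.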

14 If you have to ask... use PDBefold.
Are there any structures in the PDB that are similar to mine? What SCOP and/or CATH family could my structure belong to? Can I get some idea about the possible function of my protein based on its structural similarity to others? Do I need a multiple alignment of many of my structures? Use PDBefold. Upload your own PDB file for analysis!
Macromolecular Structure Database
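Before uploading a file, it can be useful to check which Cα atoms it contains. The sketch below reads Cα coordinates from ATOM records using the fixed-column layout of the PDB format (atom name in columns 13-16, residue name 18-20, chain 22, residue number 23-26, coordinates 31-54). It ignores alternate locations, insertion codes and multiple models, and it is not part of PDBefold itself.

```python
def read_ca_atoms(pdb_text, chain=None):
    """Return (residue_name, residue_number, (x, y, z)) for each Cα atom.

    Only ATOM records are read, so e.g. calcium ions (HETATM 'CA') are skipped.
    Column slices are 0-based versions of the PDB format's 1-based columns.
    """
    atoms = []
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue
        if line[12:16].strip() != "CA":      # atom name, columns 13-16
            continue
        if chain is not None and line[21] != chain:  # chain ID, column 22
            continue
        # x, y, z occupy columns 31-38, 39-46 and 47-54.
        x, y, z = (float(line[c:c + 8]) for c in (30, 38, 46))
        atoms.append((line[17:20].strip(), int(line[22:26]), (x, y, z)))
    return atoms
```

Passing `chain="A"` restricts the result to one chain, which mirrors choosing a single chain for comparison.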

15 SSM output
Table of matched secondary structure elements
Table of matched backbone Cα atoms, with the distances between them at the best structure superposition
Rotation-translation matrix of the best structure superposition
Visualisation in Jmol and RasMol
r.m.s.d. of the Cα-alignment
Length of the Cα-alignment, Nalign
Number of gaps in the Cα-alignment
Quality score Q
Statistical significance scores P(S) and Z
Sequence identity
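Two of the listed outputs can be illustrated directly: applying a rotation-translation to coordinates, and computing sequence identity over the Cα-alignment. The sketch assumes the convention x' = R·x + t and normalises identity by the number of aligned pairs; PDBefold's exact matrix layout and normalisation are assumptions here and should be checked against its documentation.

```python
def apply_transform(rotation, translation, coords):
    """Apply x' = R x + t to each point; R is a 3x3 nested list, t a 3-vector."""
    return [tuple(sum(rotation[r][c] * p[c] for c in range(3)) + translation[r]
                  for r in range(3))
            for p in coords]

def sequence_identity(seq_a, seq_b, pairs):
    """Fraction of aligned residue pairs (i, j) with identical residue types."""
    if not pairs:
        return 0.0
    same = sum(1 for i, j in pairs if seq_a[i] == seq_b[j])
    return same / len(pairs)
```

For instance, a 90° rotation about z followed by a shift of 1 along x maps the point (1, 0, 0) to (1, 1, 0), and aligning ACDE with ACFE residue by residue gives an identity of 0.75.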

16 The PDBefold Search Interface

17 The Results Page For Pairwise Alignment

18 Analyzing the result from a particular pairwise alignment

19 Residue by Residue Structural alignment result

20 Multiple 3D alignment using PDBefold

21 Results from multiple 3D alignment

22 Conclusion
It is quite possible that residue identity plays a much less significant role in protein structure than is often believed.
As a consequence, the role of residue identity in protein function may often be overestimated.
Using sequence identity to assess structural or functional features may give more false negatives than expected.
Physico-chemical properties of residues should be given preference over residue identity in structure and function analysis.
Modern methods for structure alignment are efficient; there is little sense in using sequence alignment in structure-related studies.

