Presentation is loading. Please wait.

Presentation is loading. Please wait.

FSSP uses DALI alignments to classify structures all PDB entries representative set of structures representative set of domains group domains into fold.

Similar presentations


Presentation on theme: "FSSP uses DALI alignments to classify structures all PDB entries representative set of structures representative set of domains group domains into fold."— Presentation transcript:

1 FSSP uses DALI alignments to classify structures all PDB entries representative set of structures representative set of domains group domains into fold types (clusters of similar structures) and make set of representatives of each fold eliminate similar sequences divide into domains align domains with DALI! 8320 947 1484 540

2 Judging DALI alignments Z-score: how much better than average is the alignment, i.e. how many standard deviations from the mean of a distribution of alignments of random pairs of proteins. >16 very close, 8-16 pretty close, <8 not so close. RMSD: root mean square deviation of alpha carbons for the matching portion of the structures. LALI: length of alignment (recognizably matching portion of the structures) LSEQ2: total length of the sequence being matched. %IDE: % sequence identity between the two sequences

3 if you go into FSSP, and search for a particular structure, you’ll get an output of its best DALI alignments with other structures STRID2Z RMSDLALI LSEQ2 %IDE PROTEIN 1plc 24.4 0.0 99 99 100 Plastocyanin (cu2+, ph 6.0) 2pcy 23.4 0.2 99 99 100 Apo-plastocyanin (pH 6.0) 1bqk 12.1 2.0 89 124 29 pseudoazurin 1aac 11.0 1.9 84 104 24 amicyanin 1ibzA 9.1 2.5 83 111 19 nitrosocyanin 1qhqA 8.3 2.4 87 139 29 auracyanin 1rcy 8.2 2.5 90 151 17 rusticyanin biological_unit 1qniA 7.7 2.2 78 572 19 nitrous-oxide reductase 1kcw 7.1 2.4 81 1017 17 ceruloplasmin biological_unit 2cuaA 7.0 2.2 80 122 15 cua fragment 1nwpA 6.7 3.1 85 128 24 azurin

4 Divergence of sequence vs. divergence of structure one of the things that structural classification systems allow us to do is to assess how structure changes as the sequences of related proteins diverge. as we will see, this is extremely useful information when trying to predict structure from sequence.

5 Structures of closely related sequences (>50% identity) at this level, usually >90% of residues retain a qualitatively similar structure (common core) 90% of the identical residues would have similar sidechain conformations. 50% of the nonidentical residues would also show some similarity in sidechain conformation. Backbone atoms for common core would have 1.0 Å RMSD or less.

6 More distant relatives: below 25% sequence identity plastocyaninazurin common core of 7-stranded beta-sandwich is 50-60% of the total residues. Sequence identity within this common core is ~20%. RMSD of common core residues is ~2.2Å. both have two- histidine copper binding site

7 so RMSD between the common core of two structures (portion that retains the same qualitative fold) varies as a function of sequence identity...

8 ...as does the percentage of the total residues which belong to the common core... why two points per pair of proteins?

9 Homology modelling the relationship between divergence of sequence and structure forms the basis for homology modelling--model the structure of a protein based on its sequence similarity to protein(s) of known structure. homology modelling is the only consistently accurate structure prediction method known, but... its accuracy depends critically on the similarity between the sequences--best when they are >50% identical, not very useful below about 30%. (Although if there are multiple templates one can do a little better)

10 Homology modelling and structural genomics homology modelling is one of the stated justifications for many structural genomics projects. structural genomics projects take several forms. Some of them try to solve as many structures as possible for a given model organism, and hope that this will yield information on the structures of related proteins in other organisms. another approach is to choose target structures to solve based on how many sequences can be homology modelled from them. another approach is to solve structures for sequences which are deemed least likely to have a fold which is already known.

11 Choosing the template(s) the target sequence (sequence to be modelled) is used in a sequence similarity search of the PDB to determine if it has recognizable similarity to sequences for which a structure is known. if it does, then sequence alignments are constructed between the sequence to be modelled and the template sequences (sequence similarity searches and sequence alignments are an interesting subject in themselves, which we can’t cover here, unfortunately) in the case of multidomain proteins, there may be multiple regions to model separately.

12 Alignment of target and templates: example of FASL receptor precursor Running pair-wise alignments with target sequence Sequence identity of templates with target: 11DDF.pdb: 100 % identity 11CDF.pdb: 37.33 % identity 11EXT.pdb: 26.4 % identity 21EXT.pdb: 26.4 % identity 21NCF.pdb: 26.4 % identity 21TNR.pdb: 26.4 % identity 11NCF.pdb: 26.4 % identity one exact match: has the structure already been solved? these five are different chains within the same crystal structure or the same protein bound to different ligands

13 Alignment of target and templates Looking for template groups Global alignment overview: Taget Sequence: |=======================================================| 11DDF.pdb | ------------------- 11CDF.pdb | --------------- 11EXT.pdb | ------------- 21EXT.pdb | ------------- 21NCF.pdb | ------------- 21TNR.pdb | ------------- 11NCF.pdb | ------------- AlignMaster found 2 regions to model separately: 1: Using template(s) 11DDF.pdb 2: Using template(s) 11CDF.pdb 11EXT.pdb 11NCF.pdb 21EXT.pdb 21NCF.pdb 21TNR.pdb this was our exact match: intracellular DEATH domain has been solved six matches for extracellular ligand-binding domain

14 Target-Template Sequence Alignment part of alignment to go here

15 Align the structures of the templates so here are our six ligand-binding domain template structures, aligned the more similar regions of the sequence are used to establish structurally equivalent residues in the set of templates, and these correspondences are then used to superpose the protein structures the structures are less similar in some places than others-->loops

16 Generating a framework for the target 1. An initial set of coordinates is generated for the target based a local similarity weighted average of the template structures. 2. In other words, the target may be more similar to one of the templates than to the others over a particular range of sequence, so this template will be favored in generating the framework over that range. For instance, if your target has a lysine at a certain position in the alignment, and one of the templates also has a lysine at that position, but none of the others do, it will essentially use that template to model the target lysine sidechain conformation. The bottom line is that the conformations of target sidechains which have a corresponding identical residue in at least one of the templates will tend to mimic the conformation found in the template. 3. This averaging method for placing the atoms of the target structure may lead to some of the residues having ridiculous geometries. Those are thrown out at this stage, to be rebuilt later.

17 Framework generation a framework for just the target backbone is shown at right in yellow against the template structures

18 Loop building there are likely to be some segments of sequence in the target for which there is no corresponding sequence in the template(s), or for which there is such poor agreement among the templates that a reasonable framework cannot be generated. Most often these will be in loop regions. in these cases, the backbone of the target framework will essentially end at some position and then resume abruptly a few residues later. So you have “stems” that need to be connected somehow.

19 Loop libraries what is done is to scan a database of structural fragments derived from the PDB, and look for fragments which have the right conformation to properly connect the stems without colliding with anything else in the structure. The red loop at right is discarded because it collides with part of the framework. framework is shown in orange

20 Side chain building some of the side chains will be missing or somewhat distorted (but not enough to have been thrown out earlier) at this point. this is where rotamer libraries, which consist of sidechain conformations found in known protein structures, come into the picture! we want to use the rotamer library to correct distorted sidechains, and rebuild absent ones.

21 More sidechain building the rotamers in a rotamer library are sorted according to frequency of occurrence in known structures--some are more probable than others, and we want to take that into account. if some sidechains are somewhat distorted, they can be corrected by matching to a rotamer in the library, with more probable rotamers being given a higher weight. if sidechains are altogether missing, they can be rebuilt in the same way, with the restriction that the sidechain not have any steric clashes with other atoms.

22 Profiling: is our model “reasonable”? a method called profiling is used to assess whether a given residue is in a reasonable three-dimensional environment if a stretch of residues has a low profile score, that region is probably modelled wrong. this method can be used not only to judge the quality of models, but also to judge the quality of experimentally determined structures as well. further, it can be used to predict whether a given sequence is compatible with a particular structure--this is known as fold recognition.

23 How profiling works

24 Judging the Quality of Models by Profiling x axis would be the sequence, and the y-axis is the profile score for each residue.

25 Surfaces: does our model look like a real protein? a solvent-accessible surface calculation (slightly different from the one we learned) can now be done to identify any cavities (holes in the middle of the protein) these cavities are compared with any present in the template crystal structures in order to try to uncover poorly modelled regions.

26 A slight digression:“Accessible” vs. “molecular” surface note: these are alternative ways of representing the same thing: the surface which is accessible to solvent!

27 Usefulness of molecular surface molecular and accessible surfaces are both useful representations, but molecular surface is more closely related to the actual atomic surfaces. This makes it somewhat better for visualizing the texture of the outer surface, as well as for assessing the shape and volume of any internal cavities. you will hear the term Connolly surface used often. A Connolly surface is a particular way of calculating the molecular surface.

28 Energy minimization at the end, some molecular dynamics is done (John Rupley may teach you more about this subject), which can slightly reposition the atoms to achieve a lower energy. This can fix tiny little problems, such as slightly unrealistic bond angles and lengths, but it can’t fix anything bigger than that.


Download ppt "FSSP uses DALI alignments to classify structures all PDB entries representative set of structures representative set of domains group domains into fold."

Similar presentations


Ads by Google