3.3b1 Protein Structure Threading (Fold recognition) Boris Steipe University of Toronto (Slides evolved from original material.

1 3.3b1 Protein Structure Threading (Fold recognition) Boris Steipe University of Toronto boris.steipe@utoronto.ca (Slides evolved from original material by Chris Hogue, David Wishart, Gary Van Domselaar and Boris Steipe)

2 3.3b2 Concept 1: Threading methods can sometimes find similar folds.

3 3.3b3 Definition Threading - A protein fold recognition technique that involves incrementally replacing the sequence of a known protein structure with a query sequence of unknown structure. The new “model” structure is evaluated using a simple heuristic measure of protein fold quality. The process is repeated against all known 3D structures until an optimal fit is found.

4 3.3b4 Why Threading? Secondary structure is more conserved than primary structure. Tertiary structure is more conserved than secondary structure. Therefore, very remote relationships can be detected better through 2° or 3° structural homology than through sequence homology.

5 3.3b5 Fold recognition ("Threading") Template Structure Query Sequence

6 3.3b6 Threading Database of 3D structures and sequences –Protein Data Bank (or non-redundant subset). Query sequence –Sequence with < 25% identity to known structures. Alignment protocol –Dynamic programming. Evaluation protocol –Distance-based potential or secondary structure. Ranking protocol.

7 3.3b7 Concept 2: Threading can be done in 2D and 3D (even 1D)

8 3.3b8 2 Kinds of Threading 2D Threading or Prediction Based Methods (PBM) –Predict secondary structure (SS) or accessible surface area (ASA) of query –Evaluate on basis of SS and/or ASA matches. 3D Threading or Distance Based Methods (DBM) –Create a 3D model of the structure –Evaluate using a distance-based “hydrophobicity” or pseudo-thermodynamic (empirical) potential.
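As a rough illustration of the distance-based (DBM) evaluation, the sketch below scores a sequence mounted residue-for-residue on template C-alpha coordinates with a toy contact term. The hydrophobic residue set, the 8 Å cutoff, the energy values and the name contact_energy are all assumptions made for illustration; they are not the potential used by any program named in these slides.

```python
import math

# Assumption: a crude hydrophobic alphabet. Real empirical potentials are
# derived from residue-pair statistics in known structures.
HYDROPHOBIC = set("AVILMFWC")

def contact_energy(seq, ca_coords, cutoff=8.0, min_sep=3):
    """Toy pseudo-energy for a sequence placed one-to-one on template C-alpha positions."""
    if len(seq) != len(ca_coords):
        raise ValueError("sequence and coordinates must have equal length")
    energy = 0.0
    for i in range(len(seq)):
        # Skip near neighbours along the chain; they are always close in space.
        for j in range(i + min_sep, len(seq)):
            if math.dist(ca_coords[i], ca_coords[j]) > cutoff:
                continue  # this pair is not in contact
            hi = seq[i] in HYDROPHOBIC
            hj = seq[j] in HYDROPHOBIC
            if hi and hj:
                energy -= 1.0  # hydrophobic pair in contact: favourable
            elif hi != hj:
                energy += 0.5  # hydrophobic residue against a polar one: penalised
    return energy

# Hypothetical four-residue template; residues 0 and 3 are 5 Å apart.
template = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (5.0, 3.0, 0.0), (3.0, 4.0, 0.0)]
print(contact_energy("AGGV", template))
print(contact_energy("AGGK", template))
```

Lower values mean a better fit, so a sequence whose hydrophobic residues land on contacting template positions ranks ahead of one that places a polar residue there.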

9 3.3b9 2D Threading Algorithm Convert PDB to a database containing sequence, SS and ASA information. Predict the SS and ASA for the query sequence using a “high-end” algorithm. Perform a dynamic programming alignment using the query against the database (include sequence, SS & ASA). Rank the alignments and select the most probable fold.
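The alignment and ranking steps can be sketched as a global dynamic programming alignment whose per-position score adds up sequence, SS and ASA agreement. This is a minimal sketch under stated assumptions: the Profile class, the equal weights, the linear gap penalty of -1 and the function names are hypothetical, and a real implementation would use a substitution matrix rather than simple identity.

```python
from dataclasses import dataclass

@dataclass
class Profile:
    seq: str  # amino acid sequence
    ss: str   # secondary structure state per residue (H, E or C)
    asa: str  # accessibility state per residue (B, P or E)

def position_score(q, i, t, j, w_seq=1.0, w_ss=1.0, w_asa=1.0):
    """Weighted count of agreeing features at query position i and template position j."""
    return (w_seq * (q.seq[i] == t.seq[j])
            + w_ss * (q.ss[i] == t.ss[j])
            + w_asa * (q.asa[i] == t.asa[j]))

def alignment_score(q, t, gap=-1.0):
    """Needleman-Wunsch global alignment score, keeping only two rows of the matrix."""
    n, m = len(q.seq), len(t.seq)
    prev = [j * gap for j in range(m + 1)]
    for i in range(1, n + 1):
        cur = [i * gap] + [0.0] * m
        for j in range(1, m + 1):
            cur[j] = max(prev[j - 1] + position_score(q, i - 1, t, j - 1),  # align i with j
                         prev[j] + gap,                                      # gap in template
                         cur[j - 1] + gap)                                   # gap in query
        prev = cur
    return prev[m]

def rank_folds(query, library):
    """Score the query against every template and sort best first."""
    scored = [(alignment_score(query, t), name) for name, t in library.items()]
    return sorted(scored, reverse=True)

# Hypothetical two-entry fold library.
query = Profile("QHTAW", "HHHHH", "BBPPB")
library = {
    "fold_a": Profile("QHTAW", "HHHHH", "BBPPB"),
    "fold_b": Profile("GGGGG", "CCCCC", "EEEEE"),
}
print(rank_folds(query, library))
```

Raising w_ss or w_asa relative to w_seq shifts the search toward structural agreement, which is the point of the method when sequence identity is low.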

10 3.3b10 2° Structure Identification DSSP - Database of Secondary Structures for Proteins (swift.embl-heidelberg.de/dssp) VADAR - Volume Area Dihedral Angle Reporter (redpoll.pharmacy.ualberta.ca) PDB - Protein Data Bank (www.rcsb.org)
QHTAWCLTSEQHTAAVIWDCETPGKQNGAYQEDCA
HHHHHHCCEEEEEEEEEEECCHHHHHHHCCCCCCC
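The three-letter string above (H, E, C) is a reduced alphabet, whereas DSSP itself reports more detailed state codes. A minimal sketch of one widely used reduction follows; the grouping chosen here (H, G, I to helix; E, B to strand; everything else to coil) is one common convention and an assumption, since the slides do not say which reduction was used.

```python
# One common reduction of DSSP state codes to three states (an assumed convention).
DSSP_TO_3STATE = {
    "H": "H", "G": "H", "I": "H",  # alpha, 3-10 and pi helix -> helix
    "E": "E", "B": "E",            # extended strand and isolated bridge -> strand
}

def reduce_dssp(states):
    """Map a DSSP state string to H/E/C; turns, bends and blanks become coil."""
    return "".join(DSSP_TO_3STATE.get(s, "C") for s in states)

print(reduce_dssp("HHGGTT EEBS-"))
```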

11 3.3b11 ASA Calculation DSSP - Database of Secondary Structures for Proteins (swift.embl-heidelberg.de/dssp) VADAR - Volume Area Dihedral Angle Reporter (www.redpoll.pharmacy.ualberta.ca/vadar/) GetArea - www.scsb.utmb.edu/getarea/area_form.html
QHTAWCLTSEQHTAAVIWDCETPGKQNGAYQEDCAMD
BBPPBEEEEEPBPBPBPBBPEEEPBPEPEEEEEEEEE
1056298799415251510478941496989999999
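The digit row and the B/P/E row above appear to encode the same accessibility information at two resolutions. The sketch below assumes that each digit is relative accessibility in tenths and that the letters are bins over those digits; the bin edges are inferred from the example rows themselves, and digit 3 never occurs there, so assigning it to B is a guess. The function names are hypothetical.

```python
def asa_digit(relative_asa):
    """Assumed encoding: relative accessibility (0.0 to 1.0) in tenths, clamped to 0-9."""
    return max(0, min(9, int(relative_asa * 10)))

def asa_state(digit):
    """Bin a 0-9 accessibility digit into B, P or E (edges inferred from the slide's rows)."""
    if digit <= 3:   # 3 is unobserved in the example; B is an assumption
        return "B"
    if digit <= 6:
        return "P"
    return "E"

digits = "1056298799415251510478941496989999999"
states = "".join(asa_state(int(d)) for d in digits)
print(states)
```

With these bin edges the digit row reproduces the slide's letter row exactly, which supports the inferred edges at 4 and 7 but says nothing about where digit 3 belongs.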

12 3.3b12 Other ASA sites Connolly Molecular Surface Home Page –http://www.biohedron.com/ Naccess Home Page –http://sjh.bi.umist.ac.uk/naccess.html ASA Parallelization –http://cmag.cit.nih.gov/Asa.htm Protein Structure Database –http://www.psc.edu/biomed/pages/research/PSdb/

13 3.3b13 2D Threading Algorithm Convert PDB to a database containing sequence, SS and ASA information. Predict the SS and ASA for the query sequence using a “high-end” algorithm. Perform a dynamic programming alignment using the query against the database (include sequence, SS & ASA). Rank the alignments and select the most probable fold.

14 3.3b14 ASA Prediction PredictProtein-PHDacc (58%) –http://cubic.bioc.columbia.edu/predictprotein PredAcc (70%?) –condor.urbb.jussieu.fr/PredAccCfg.html QHTAW...
QHTAWCLTSEQHTAAVIW
BBPPBEEEEEPBPBPBPB
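The percentages quoted for the predictors are presumably per-residue accuracies, although the slide does not define them. Under that assumption, such a figure can be computed by comparing a predicted state string against the states calculated from a known structure; state_accuracy is a hypothetical helper name.

```python
def state_accuracy(predicted, observed):
    """Percentage of residues whose predicted state equals the observed state."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("state strings must be non-empty and of equal length")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)

# Hypothetical four-residue example: three of four states agree.
print(state_accuracy("BBPP", "BBPE"))
```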

15 3.3b15 2D Threading Algorithm Convert PDB to a database containing sequence, SS and ASA information. Predict the SS and ASA for the query sequence using a “high-end” algorithm. Perform a dynamic programming alignment using the query against the database (include sequence, SS & ASA). Rank the alignments and select the most probable fold.

16 3.3b16 2D Threading Performance In test sets, 2D threading methods can identify 30-40% of proteins having very remote homologues (i.e. not detected by BLAST) using “minimal” non-redundant databases (<700 proteins). If the database is expanded ~4x, the performance jumps to 70-75%. Performs best on true homologues as opposed to postulated analogues.

17 3.3b17 2D Threading Advantages Algorithm is easy to implement. Algorithm is much faster than 3D threading approaches. The 2D database is small ( 1.5 Gbytes). Appears to be just as accurate as DBM or other 3D threading approaches. Very amenable to web servers.

18 3.3b18 Servers - PredictProtein

19 3.3b19 Servers - 123D

20 3.3b20 Servers - GenThreader

21 3.3b21 More Servers - www.bronco.ualberta.ca

22 3.3b22 2D Threading Disadvantages Reliability is not 100%, making most threading predictions suspect unless experimental evidence can be used to support the conclusion. Does not produce a 3D model at the end of the process. Doesn’t include all aspects of 2° and 3° structure features in the prediction process. PSI-BLAST may be just as good (faster too!) (PSI-BLAST: 1D "threading").

23 3.3b23 Making it Better Include 3D threading analysis as part of the 2D threading process; it offers another layer of information. Include more information about the “coil” state (3-state prediction isn’t good enough). Include other biochemical (ligands, function, binding partners, motifs) or phylogenetic (origin, species) information.

24 3.3b24 3D Threading Servers Generate 3D models or coordinates of possible models based on input sequence Loopp (version 2) –http://ser-loopp.tc.cornell.edu/loopp.html 3D-PSSM –http://www.sbg.bio.ic.ac.uk/~3dpssm/ Prospect –http://compbio.ornl.gov/prospect/ All require email addresses since the process may take hours to complete

25 3.3b25 Threading Database Search Premise is that most sequences match some 3-D structure that is already known (1/2). Given a database of known 3-D protein folds: –align the test sequence to each known protein, either in real 3-D coordinate space (slow but exact) or in parameterized 1-D space (fast but approximate) –optimize some scoring function –sort out the best sequence-structure alignment –assess alignments: are they statistically significant?

26 3.3b26 Threading Statistics Z score (sequence composition correction): the number of standard deviations by which the found alignment score differs from the mean score of a randomized version of the structure or profile. P value (sequence length correction): shuffle the sequence and make a distribution of random threads. Is the unscrambled thread any better than a randomly optimized sequence? Z score of Z scores. Look for P values as a criterion for choosing a threading method.
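The shuffling idea can be sketched as follows: score the real query, score many composition-preserving shuffles of it, and express the real score in standard deviations of the shuffled distribution. This sketch uses the mean and standard deviation of the shuffled scores, which is the usual Z-score definition; score_fn stands in for whatever threading score is being assessed, and the identity-count scorer in the example is purely hypothetical.

```python
import random
import statistics

def z_score(query, score_fn, n_shuffles=200, seed=0):
    """Z score of the query's score against scores of shuffled copies of the query."""
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    real = score_fn(query)
    residues = list(query)
    null_scores = []
    for _ in range(n_shuffles):
        rng.shuffle(residues)  # same composition, scrambled order
        null_scores.append(score_fn("".join(residues)))
    mu = statistics.mean(null_scores)
    sigma = statistics.pstdev(null_scores)
    if sigma == 0:
        return 0.0  # no spread in the null distribution: the score is uninformative
    return (real - mu) / sigma

# Hypothetical scorer: count identical residues against a fixed template sequence.
template = "QHTAWCLTSEQHTAAVIW"

def identity_score(seq):
    return sum(a == b for a, b in zip(seq, template))

print(z_score(template, identity_score))
```

Because shuffling keeps the amino acid composition fixed, a high Z score indicates that the residue order, not just the composition, fits the template.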

27 3.3b27 Database Searching... Sensitivity –High sensitivity implies finding all possible true positive matches in the database. Specificity –High specificity implies finding no false positive matches in the search.
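In numbers, sensitivity is TP / (TP + FN) and specificity is TN / (TN + FP). A minimal sketch that derives both from a hit list, assuming the set of true family members in the database is known (as it would be in a benchmark); the function name and the entry identifiers are hypothetical.

```python
def search_quality(hits, true_members, database):
    """Return (sensitivity, specificity) for a search over a database of entry names."""
    hits, truth, db = set(hits), set(true_members), set(database)
    tp = len(hits & truth)       # true members that were found
    fn = len(truth - hits)       # true members that were missed
    fp = len(hits - truth)       # reported hits that are not true members
    tn = len(db - hits - truth)  # non-members correctly left out
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return sensitivity, specificity

# Hypothetical ten-entry database with two true members, one found plus one false hit.
database = [f"fold{i}" for i in range(10)]
print(search_quality(["fold0", "fold5"], ["fold0", "fold1"], database))
```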

28 3.3b28 Interpret Threading Accordingly... In a ranked list of 10 matches, expect that only one might be correct. Expect that none may be correct. Expect that the top ranked hit is a false positive...

29 3.3b29 How then does Threading find things? If there is a true positive in a threading search hit list, people find it. It is most often found by FUNCTIONAL similarity: –Similar enzymatic mechanisms –Motifs, DART... –Similar roles, cellular distributions...

