Protein Structure Databases Databases of three dimensional structures of proteins, where structure has been solved using X-ray crystallography or nuclear.


1 Protein Structure Databases
- Databases of three-dimensional structures of proteins, where the structure has been solved using X-ray crystallography or nuclear magnetic resonance (NMR) techniques
- Protein databases:
  - PDB (Protein Data Bank)
  - Swiss-Prot
  - PIR (Protein Information Resource)
  - SCOP (Structural Classification of Proteins)

2 Protein Structure Databases
- The most extensive database for 3-D structures is PDB

3 Visualization of Proteins
- A number of programs convert the atomic coordinates of 3-D structures into views of the molecule
- They allow the user to manipulate the molecule by rotation, zooming, etc.
- Critical in drug design: yields insight into how the protein might interact with ligands at active sites

4 Visualization of Proteins
Most popular programs for viewing 3-D structures:
- Protein Explorer: http://www.umass.edu/microbio/chime/pe/protexpl/frntdoor.htm
- Rasmol: http://www.umass.edu/microbio/rasmol/
- Chime: http://www.umass.edu/microbio/chime/
- Cn3D: http://www.ncbi.nlm.nih.gov/Structure/
- Mage: http://kinemage.biochem.duke.edu/website/kinhome.html
- Swiss 3D viewer: http://www.expasy.ch/spdbv/mainpage.html

5 Alignment of Protein Structure
- Compare the 3D structure of one protein against the 3D structure of a second protein
- Compare positions of atoms in the three-dimensional structures
- Look for positions of secondary structural elements (helices and strands) within a protein domain
- Examine distances between carbon atoms to determine the degree to which the structures may be superimposed
- Side chain information can be incorporated
  - Buried; visible
- Structural similarity between proteins does not necessarily mean an evolutionary relationship

6 Alignment of Protein Structure

7 Structure alignment
- Simple case: two closely related proteins with the same number of amino acids
- Find a transformation T that achieves the best superposition

8 Transformations
- Translation
- Translation and rotation: rigid motion (Euclidean space)
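
A rigid motion can be illustrated with a few lines of Python. This is only a sketch under stated assumptions: the rotation is restricted to the z-axis for brevity, and the function name `rigid_motion` and the sample coordinates are made up for illustration. The point it shows is that rotation plus translation moves the atoms without changing any interatomic distance.

```python
import math

def rigid_motion(points, angle_rad, translation):
    """Rotate each (x, y, z) point about the z-axis, then translate it.

    Hypothetical helper: a general rigid motion would use a full 3x3
    rotation matrix; a single axis keeps the sketch short.
    """
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    tx, ty, tz = translation
    moved = []
    for x, y, z in points:
        moved.append((c * x - s * y + tx, s * x + c * y + ty, z + tz))
    return moved

def distance(p, q):
    """Euclidean distance between two 3-D points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

# Two made-up atom positions, rotated by 90 degrees and shifted by (1, 1, 1).
atoms = [(1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
moved = rigid_motion(atoms, math.pi / 2, (1.0, 1.0, 1.0))

# The distance between the two atoms is the same before and after.
before = distance(atoms[0], atoms[1])
after = distance(moved[0], moved[1])
```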

9 Types of Structure Comparison
- Sequence-dependent vs. sequence-independent structural alignment
- Global vs. local structural alignment
- Pairwise vs. multiple structural alignment

10 Sequence-dependent Structure Comparison
1234567
ASCRKLE
|||||||
ASCRKLE
Residues 1 to 7 of one structure are paired with residues 1 to 7 of the other. Minimize the RMSD of the distances 1-1, ..., 7-7.

11 Can be solved in O(n) time. Useful for comparing structures of the same protein solved by different methods, in different conformations, or through dynamics. Also useful for evaluating protein structure predictions.
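
When the residue pairing is fixed (1-1, ..., n-n), the RMSD itself is a single pass over the pairs. The sketch below is a simplification: it removes only the translation by centering both structures on their centroids, and it omits the rotation fit that a full superposition would also optimize (for example with the Kabsch algorithm, which the slides do not name). The function names and coordinates are illustrative assumptions.

```python
import math

def centered(coords):
    """Shift a list of (x, y, z) points so that their centroid is the origin."""
    n = len(coords)
    cx, cy, cz = (sum(p[k] for p in coords) / n for k in range(3))
    return [(x - cx, y - cy, z - cz) for x, y, z in coords]

def rmsd_fixed_pairs(a, b):
    """RMSD over pairs a[i] <-> b[i] after removing translation only.

    Assumption: no rotational fit is performed, so this is an upper
    bound on the RMSD after an optimal rigid superposition.
    """
    if len(a) != len(b) or not a:
        raise ValueError("structures must be non-empty and of equal length")
    a0, b0 = centered(a), centered(b)
    total = sum((p[k] - q[k]) ** 2 for p, q in zip(a0, b0) for k in range(3))
    return math.sqrt(total / len(a))

# Made-up coordinates: 'shifted' is a pure translation of 'reference'.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
shifted = [(x + 5.0, y + 5.0, z + 5.0) for x, y, z in reference]
bent = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 3.0, 0.0)]

rmsd_same = rmsd_fixed_pairs(reference, shifted)
rmsd_bent = rmsd_fixed_pairs(reference, bent)
```

A pure translation gives an RMSD of zero, while the bent copy gives a positive value.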

12 Sequence-independent Structure Comparison Given two configurations of points in three-dimensional space, find the transformation T that produces the “largest” superimposition of corresponding 3-D points.

13 Evaluating Structural Alignments
1. Number of amino acid correspondences created
2. RMSD of corresponding amino acids
3. Percent identity in aligned residues
4. Number of gaps introduced
5. Size of the two proteins
6. Conservation of known active site environments
7. …
There are no universally agreed-upon criteria. The choice depends on what the alignment is used for.
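
Several of these measures can be read directly off the aligned residue rows. The following is a minimal sketch, assuming the alignment is given as two equal-length strings with `-` marking a gap. The function name, the example rows, and the choice to count gap columns (rather than gap runs) are assumptions made for illustration.

```python
def alignment_stats(row_a, row_b, gap="-"):
    """Summarize a pairwise alignment given as two equal-length rows."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    columns = list(zip(row_a, row_b))
    # A correspondence is a column in which neither row has a gap.
    pairs = [(x, y) for x, y in columns if x != gap and y != gap]
    identical = sum(1 for x, y in pairs if x == y)
    gap_columns = len(columns) - len(pairs)
    percent_identity = 100.0 * identical / len(pairs) if pairs else 0.0
    return {
        "correspondences": len(pairs),
        "percent_identity": percent_identity,
        "gap_columns": gap_columns,
    }

# Made-up example: one gap and one substitution (L aligned with I).
stats = alignment_stats("ASCRKLE", "ASC-KIE")
```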

14 Protein Secondary Structure Prediction

15 Why secondary structure prediction?
- Accurate secondary structure prediction can provide important information for tertiary structure prediction
- Protein function prediction
- Protein classification
- Predicting structural change
- An easier problem than 3D structure prediction (more than 40 years of history)

16 α-helix
- α-helix (30-35%)
- Hydrogen bond between C=O (carbonyl) and NH (amide) groups within the strand (4 positions apart)
- 3.6 residues per turn, 1.5 Å rise per residue
- Typically a right-handed turn
- Most abundant secondary structure
- α-helix formers: A, C, L, M, E, Q, H, K

17 β-sheet & β-turn
β-sheet / β-strand (20-25%)
- Hydrogen bond between groups across strands
- Forms parallel and antiparallel pleated sheets
- Amino acids less compact: 3.5 Å between adjacent residues
- Residues alternate above and below the β-sheet
- β-sheet formers: V, I, P, T, W
β-turn
- Short turn (4 residues)
- Hydrogen bond between C=O and NH groups within the strand (3 positions apart)
- Usually polar, found near the surface
- β-turn formers: S, D, N, P, R

18 Others
Loop
- Regions between α-helices and β-sheets
- On the surface; vary in length and 3D configuration
- Do not have regular periodic structures
- Loop formers: small polar residues
Coil (40-50%)
- Generally speaking, anything besides α-helix, β-sheet, and β-turn

19 Assigning Secondary Structure
- Defining features:
  - Dihedral angles
  - Hydrogen bonds
  - Geometry
- Assigned manually by crystallographers, or
- Automatically:
  - DSSP (Definition of Secondary Structure of Proteins; Kabsch & Sander, 1983)
  - STRIDE (Frishman & Argos, 1995)
  - Continuum (Claus Andersen, Burkhard Rost, 2001)

20 Definition of secondary structure of proteins (DSSP)
The DSSP code
- H = alpha helix
- B = residue in isolated beta-bridge
- E = extended strand, participates in beta ladder
- G = 3-helix (3/10 helix)
- I = 5-helix (pi helix)
- T = hydrogen-bonded turn
- S = bend
CASP standard
- H = (H, G, I), E = (E, B), C = (T, S)
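
The CASP reduction from DSSP codes to three states amounts to a lookup table. A minimal sketch follows; the names `DSSP_TO_3STATE` and `reduce_dssp` are made up for illustration, and mapping any unlisted character (such as a blank) to C is an assumption, not something the slide specifies.

```python
# Reduction stated on the slide: H = (H, G, I), E = (E, B), C = (T, S).
DSSP_TO_3STATE = {
    "H": "H", "G": "H", "I": "H",
    "E": "E", "B": "E",
    "T": "C", "S": "C",
}

def reduce_dssp(codes):
    """Map a string of DSSP codes to the three-state H/E/C alphabet.

    Assumption: characters not listed on the slide (for example a
    blank for "no assignment") are treated as coil.
    """
    return "".join(DSSP_TO_3STATE.get(code, "C") for code in codes)

three_state = reduce_dssp("HGIEBTS")
```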

21 Secondary Structure Prediction
- Given a protein sequence (primary structure):
  GHWIATRGQLIREAYEDYRHFSSECPFIP
- Predict its secondary structure content (C = coil, H = alpha helix, E = beta strand):
  CEEEEECHHHHHHHHHHHCCCHHCCCCCC
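
Once a per-residue prediction string is available, the secondary structure content is just the fraction of residues in each state. A small sketch, using the prediction string from the slide; the function name `ss_content` is a hypothetical helper.

```python
from collections import Counter

def ss_content(prediction, states="HEC"):
    """Return the fraction of residues assigned to each state."""
    counts = Counter(prediction)
    total = len(prediction)
    return {state: counts.get(state, 0) / total for state in states}

# Prediction string as given on the slide.
prediction = "CEEEEECHHHHHHHHHHHCCCHHCCCCCC"
content = ss_content(prediction)
```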

22 Algorithm
- Chou-Fasman method
- Examines windows of 5-6 residues to predict structure

23 Secondary structure propensity
- From the PDB database, calculate the propensity for a given amino acid to adopt a certain ss-type:
  $P(\alpha, aa_i) = \dfrac{p(\alpha, aa_i)}{p(\alpha)\, p(aa_i)}$
  ($aa_i$: amino acid i; $\alpha$: ss type)
- Example: #Ala = 2,000, #residues = 20,000, #helix = 4,000, #Ala in helix = 500. P = ?

24 Secondary structure propensity
- From the PDB database, calculate the propensity for a given amino acid to adopt a certain ss-type
- Example: #Ala = 2,000, #residues = 20,000, #helix = 4,000, #Ala in helix = 500
  $p(\alpha, aa_i) = 500/20{,}000$, $p(\alpha) = 4{,}000/20{,}000$, $p(aa_i) = 2{,}000/20{,}000$
  $P = \dfrac{p(\alpha, aa_i)}{p(\alpha)\, p(aa_i)} = \dfrac{500}{4{,}000/10} = 1.25$
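
The worked example can be checked in a few lines. This sketch assumes the propensity is the ratio of the joint frequency to the product of the two marginal frequencies, as in the example above; the function and parameter names are made up for illustration.

```python
def propensity(n_aa_in_ss, n_ss, n_aa, n_total):
    """Propensity of an amino acid for a secondary structure type.

    Equivalent to p(ss, aa) / (p(ss) * p(aa)) with
    p(ss, aa) = n_aa_in_ss / n_total, p(ss) = n_ss / n_total and
    p(aa) = n_aa / n_total; written with counts to avoid rounding.
    """
    return (n_aa_in_ss * n_total) / (n_ss * n_aa)

# Counts from the slide: 500 Ala in helix, 4,000 helix residues,
# 2,000 Ala, 20,000 residues overall.
p_ala_helix = propensity(500, 4000, 2000, 20000)
```

A value above 1 means the residue occurs in that structure type more often than expected by chance.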

25 Chou-Fasman parameters Note: The parameters given in the textbook are 100 × P(α, aa_i).

26 Chou-Fasman algorithm: Helix
- Scan through the peptide and identify regions where 4 out of 6 contiguous residues have P(H) > 1.00. That region is declared an alpha-helix.
- Extend the helix in both directions until a set of four contiguous residues with an average P(H) < 1.00 is reached. That is declared the end of the helix.
- If the segment defined by this procedure is longer than 5 residues and the average P(H) > P(E) for that segment, the segment can be assigned as a helix.
- Repeat this procedure to locate all of the helical regions in the sequence.

27 Initiation
Residue   T    S    P    T    A    E    L    M    R    S    T    G
P(H)      69   77   57   69   142  151  121  145  98   77   69   57
Identify regions where 4 of 6 residues have P(H) > 1.00 (that is, above 100 on the scale used in the table): the “alpha-helix nucleus”.
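
The initiation step can be sketched as a sliding window over the P(H) values. This is an illustrative sketch, not a full Chou-Fasman implementation: it covers only nucleus detection, uses the 100 × P scale of the slide's table, and the function name `helix_nuclei` is hypothetical.

```python
def helix_nuclei(p_h, window=6, min_formers=4, threshold=100):
    """Return the start indices (0-based) of windows that qualify as nuclei.

    A window qualifies when at least `min_formers` of its `window`
    residues have P(H) above `threshold` (100 corresponds to P = 1.00).
    """
    starts = []
    for i in range(len(p_h) - window + 1):
        formers = sum(1 for value in p_h[i:i + window] if value > threshold)
        if formers >= min_formers:
            starts.append(i)
    return starts

# Sequence and P(H) values from the slide.
sequence = "TSPTAELMRSTG"
p_h = [69, 77, 57, 69, 142, 151, 121, 145, 98, 77, 69, 57]

nuclei = helix_nuclei(p_h)
nucleus_windows = [sequence[i:i + 6] for i in nuclei]
```

For this peptide the qualifying windows all contain the A-E-L-M stretch, which is where the four strong helix formers sit.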

28 Propagation
Extend the helix in both directions until a set of four residues has an average P(H) < 1.00.
Residue   T    S    P    T    A    E    L    M    R    S    T    G
P(H)      69   77   57   69   142  151  121  145  98   77   69   57
If the average P(H) > P(E) for that segment, the segment can be assigned as a helix. Here P(H) = 107.5% > P(E) = 85.9%.

29 Prediction
Residue   T    S    P    T    A    E    L    M    R    S    T    G
P(H)      69   77   57   69   142  151  121  145  98   77   69   57
Predicted helix: HHHHHHHH

30 Chou-Fasman algorithm: β-strand
- Scan through the peptide and identify a region where 3 out of 5 residues have P(E) > 1.00. That region is declared a beta-sheet.
- Extend the sheet in both directions until a set of four contiguous residues with an average P(E) < 1.00 is reached. That is declared the end of the beta-sheet.
- Any segment of the region located by this procedure is assigned as a beta-sheet if the average P(E) > 1.05 and the average P(E) > P(H) for that region.
- Any region containing overlapping alpha-helical and beta-sheet assignments is taken to be helical if the average P(H) > P(E) for that region. It is a beta-sheet if the average P(E) > P(H) for that region.

31 Chou-Fasman algorithm: Beta-turn
- To identify a bend at residue number j, calculate the following value:
  $p(t) = f(j)\, f(j+1)\, f(j+2)\, f(j+3)$
- If (1) p(t) > 0.000075, (2) the average value of P(turn) > 1.00 in the tetrapeptide, and (3) the averages for the tetrapeptide obey the inequality P(H) < P(turn) > P(E), then a beta-turn is predicted at that location.
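
The three turn conditions translate directly into a boolean test on one tetrapeptide. In this sketch the function name and all numeric inputs are hypothetical placeholders, since the slides do not list the positional frequencies f; propensities are on the 1.00 scale used in the rule.

```python
def predicts_turn(f, p_turn, p_h, p_e, cutoff=0.000075):
    """Apply the three beta-turn conditions to residues j..j+3.

    f      : four positional bend frequencies f(j)..f(j+3)
    p_turn : four P(turn) values; p_h, p_e: four P(H) and P(E) values
    """
    if not all(len(values) == 4 for values in (f, p_turn, p_h, p_e)):
        raise ValueError("each argument must hold exactly four values")
    p_t = f[0] * f[1] * f[2] * f[3]
    mean_turn = sum(p_turn) / 4
    mean_h = sum(p_h) / 4
    mean_e = sum(p_e) / 4
    # Condition (3) is the chained inequality P(H) < P(turn) > P(E).
    return p_t > cutoff and mean_turn > 1.00 and mean_h < mean_turn > mean_e

# Hypothetical values chosen only to exercise the rule.
turn_like = predicts_turn(
    f=[0.1, 0.1, 0.1, 0.1],
    p_turn=[1.5, 1.4, 1.5, 1.0],
    p_h=[0.6, 0.8, 0.7, 0.6],
    p_e=[0.5, 0.7, 0.9, 0.8],
)
low_frequency = predicts_turn(
    f=[0.05, 0.05, 0.05, 0.05],
    p_turn=[1.5, 1.4, 1.5, 1.0],
    p_h=[0.6, 0.8, 0.7, 0.6],
    p_e=[0.5, 0.7, 0.9, 0.8],
)
```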

32 Exercise
Predict the secondary structure of the following protein sequence:
Residue   Ala  Pro  Ala  Phe  Ser  Val  Ser  Leu  Ala  Ser  Gly  Ala
P(H)      142  57   142  113  77   106  77   121  142  77   57   142
P(E)      83   55   83   138  75   170  75   130  83   75   75   83
P(turn)   66   152  66   60   143  50   143  59   66   143  156  66

33 Exercise
Predict the secondary structure of the following protein sequence:
Residue     Ala  Pro  Ala  Phe  Ser  Val  Ser  Leu  Ala  Ser  Gly  Ala
P(H)        142  57   142  113  77   106  77   121  142  77   57   142
P(E)        83   55   83   138  75   170  75   130  83   75   75   83
P(turn)     66   152  66   60   143  50   143  59   66   143  156  66
Prediction  H    H    E    E    E    E    E    E    T    T    T    T
Intermediate marks shown on the slide: helix H H H H H H; strand E E E E E E E E E E E E E E; turn T T T T T T T T T T.

34 Prediction Methods
Single sequence
- Examine a single protein sequence
- Base the prediction on:
  - Statistics: composition of amino acids
  - Neural networks: patterns of amino acids
Multiple sequence alignment
- First create an MSA
  - Use sequences from PSI-BLAST, CLUSTALW, etc.
  - Align the sequence with related proteins in the family
- Predict secondary structure based on the consensus/profile
- Generally improves prediction by 8-9%

35 Accuracy
Statistical method (single sequence)
  1974  Chou & Fasman      ~50-53%
  1978  Garnier            63%
Statistical method (multiple sequences)
  1987  Zvelebil           66%
  1993  Yi & Lander        68%
Neural network
  1988  Qian & Sejnowski   64.3%
  1993  Rost & Sander      70.8-72.0%
  1997  Frishman & Argos   <75%
  1999  Cuff & Barton      72.9%
  1999  Jones              76.5%
  2000  Petersen et al.    77.9%

36 Neural network
- Input signals are summed and turned into zero or one
- Feed-forward multilayer network: input layer, hidden layer, output layer of neurons
[Figure labels: J1, J2, J3, J4]
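
The idea of summing input signals and turning the sum into zero or one can be sketched with a threshold unit, and a feed-forward network is then just layers of such units. Everything below is a toy under stated assumptions: the function names, weights, and thresholds are made up, and the example computes XOR on two bits rather than anything protein-specific.

```python
def threshold_neuron(inputs, weights, threshold):
    """Sum the weighted inputs and output 1 if the sum exceeds the threshold."""
    total = sum(x * w for x, w in zip(inputs, weights))
    return 1 if total > threshold else 0

def feed_forward(inputs, hidden_layer, output_layer):
    """Pass inputs through one hidden layer and one output layer.

    Each layer is a list of (weights, threshold) pairs, one per neuron.
    """
    hidden = [threshold_neuron(inputs, w, t) for w, t in hidden_layer]
    return [threshold_neuron(hidden, w, t) for w, t in output_layer]

# Hand-picked toy weights: the hidden neurons compute OR and AND,
# and the output fires when OR is on but AND is off (XOR).
hidden_layer = [([1, 1], 0.5), ([1, 1], 1.5)]
output_layer = [([1, -1], 0.5)]

outputs = [
    feed_forward(bits, hidden_layer, output_layer)[0]
    for bits in [(0, 0), (0, 1), (1, 0), (1, 1)]
]
```

The hidden layer is what lets the network represent a pattern that no single threshold unit can.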

37 Neural network training
1. Enter sequences
2. Compare prediction to reality
3. Adjust weights
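
The training loop (present an example, compare the prediction with the known answer, adjust the weights) can be sketched with the classic perceptron update rule. This is a deliberately small stand-in: the slides do not say which learning rule is used, real secondary structure networks are trained with multilayer methods on windows of residues, and the AND data set and function names here are illustrative assumptions.

```python
def predict(weights, bias, x):
    """Threshold unit: 1 if the weighted sum plus bias is positive, else 0."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else 0

def train(examples, epochs=10):
    """Perceptron rule: nudge the weights by the prediction error."""
    weights = [0] * len(examples[0][0])
    bias = 0
    for _ in range(epochs):
        for x, target in examples:
            # Compare prediction to reality...
            error = target - predict(weights, bias, x)
            # ...and adjust the weights in the direction that reduces the error.
            weights = [w + error * xi for w, xi in zip(weights, x)]
            bias += error
    return weights, bias

# Toy training set: logical AND of two bits.
examples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train(examples)
learned = [predict(weights, bias, x) for x, _ in examples]
```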

38 Neural net for secondary structure

39 Neural net for SS Prediction
Jury decisions: use multiple neural networks and combine their results
- Average output
- Majority decision
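
Both ways of combining networks can be sketched briefly. The code assumes each network yields either a per-residue state string (for the majority decision) or a per-residue triple of scores for H, E, and C (for the averaged output). The function names and all numbers are made up for illustration.

```python
from collections import Counter

def majority_decision(predictions):
    """Pick, at each residue, the state predicted by most networks."""
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*predictions)
    )

def average_output(network_scores, states="HEC"):
    """Average the per-state scores over networks, then take the best state.

    network_scores[n][r] is a tuple of scores (one per state) that
    network n assigns to residue r.
    """
    n_networks = len(network_scores)
    result = []
    for residue_scores in zip(*network_scores):
        means = [
            sum(scores[k] for scores in residue_scores) / n_networks
            for k in range(len(states))
        ]
        result.append(states[means.index(max(means))])
    return "".join(result)

# Three hypothetical networks voting on four residues.
voted = majority_decision(["HHEC", "HHCC", "HEEC"])

# Two hypothetical networks scoring two residues as (H, E, C).
averaged = average_output([
    [(0.6, 0.3, 0.1), (0.2, 0.5, 0.3)],
    [(0.5, 0.1, 0.4), (0.1, 0.2, 0.7)],
])
```

Averaging keeps information about how confident each network was, whereas a majority vote treats every network's call as equally certain.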

40 Neural net for SS Prediction
- JPRED [Cuff+ 1998]: finds a consensus from PHD, PREDATOR, DSC, NNSSP, etc.

