CAP5510 – Bioinformatics Protein Structures


1 CAP5510 – Bioinformatics Protein Structures
Tamer Kahveci, CISE Department, University of Florida

2 What and Why?
Proteins fold into a three-dimensional shape. Structure can reveal functional information that we cannot find from sequence. Misfolded proteins can cause diseases: sickle cell anemia, mad cow disease. Structures are used in drug design. Figures: hemoglobin; normal vs. sickled blood cells (E → V substitution); HIV protease inhibitor.

3 Goals
Understand protein structures: primary, secondary, tertiary. Learn how protein shapes are determined and predicted. Structure comparison (?)

4 A Protein Sequence >gi| |ref|NP_ | unknown protein; protein id: At1g [Arabidopsis thaliana] MPSESSYKVHRPAKSGGSRRDSSPDSIIFTPESNLSLFSSASVSVDRCSSTSDAHDRDDSLISAWKEEFEVKKDDESQNL DSARSSFSVALRECQERRSRSEALAKKLDYQRTVSLDLSNVTSTSPRVVNVKRASVSTNKSSVFPSPGTPTYLHSMQKGW SSERVPLRSNGGRSPPNAGFLPLYSGRTVPSKWEDAERWIVSPLAKEGAARTSFGASHERRPKAKSGPLGPPGFAYYSLY SPAVPMVHGGNMGGLTASSPFSAGVLPETVSSRGSTTAAFPQRIDPSMARSVSIHGCSETLASSSQDDIHESMKDAATDA QAVSRRDMATQMSPEGSIRFSPERQCSFSPSSPSPLPISELLNAHSNRAEVKDLQVDEKVTVTRWSKKHRGLYHGNGSKM

5 Amino Acid Composition
Basic Amino Acid Structure: The side chain, R, varies for each of the 20 amino acids. Figure: a central carbon bonded to a hydrogen, the side chain R, the amino group, and the carboxyl group.

6 The Peptide Bond
Dehydration synthesis. Repeating backbone: N–Cα–C–N–Cα–C. Convention: start at the amino terminus and proceed to the carboxy terminus.

7 Peptidyl polymers A few amino acids in a chain are called a polypeptide. A protein is usually composed of 50 to 400+ amino acids. We call the units of a protein amino acid residues. Figure labels: amide nitrogen, carbonyl carbon.

8 Side chain properties Carbon does not make hydrogen bonds with water easily: hydrophobic. O and N are generally more likely than C to h-bond to water: hydrophilic. We group the amino acids into three general groups: hydrophobic, charged (positive/basic and negative/acidic), and polar.

9 The Hydrophobic Amino Acids

10 The Charged Amino Acids

11 The Polar Amino Acids

12 More Polar Amino Acids And then there’s…

13 Planarity of the Peptide Bond
Phi (φ): the angle of rotation about the N–Cα bond. Psi (ψ): the angle of rotation about the Cα–C bond. The planar bond angles and bond lengths are fixed.
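The φ and ψ rotations above are dihedral (torsion) angles, each defined by four consecutive backbone atoms: C(i−1), N, Cα, C for φ and N, Cα, C, N(i+1) for ψ. The following is a minimal sketch of how such an angle could be computed from 3-D coordinates. The function name `dihedral` and the toy coordinates in the usage note are illustrative assumptions, not part of the lecture.

```python
import math

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle (degrees) about the p1-p2 bond.

    Positive values mean a clockwise rotation from p0 to p3
    when looking from p1 toward p2.
    """
    def sub(a, b):
        return tuple(x - y for x, y in zip(a, b))

    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    def cross(a, b):
        return (a[1] * b[2] - a[2] * b[1],
                a[2] * b[0] - a[0] * b[2],
                a[0] * b[1] - a[1] * b[0])

    b0 = sub(p1, p0)          # first bond
    b1 = sub(p2, p1)          # central bond (rotation axis)
    b2 = sub(p3, p2)          # last bond
    n1 = cross(b0, b1)        # normal of the plane p0, p1, p2
    n2 = cross(b1, b2)        # normal of the plane p1, p2, p3
    length = math.sqrt(dot(b1, b1))
    axis = tuple(x / length for x in b1)
    x = dot(n1, n2)
    y = dot(axis, cross(n1, n2))
    return math.degrees(math.atan2(y, x))
```

For example, `dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))` gives 90 degrees, and moving the last point to `(1, 0, 1)` gives 0 degrees (a cis arrangement).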

14 Primary & Secondary Structure
Primary structure = the linear sequence of amino acids comprising a protein: AGVGTVPMTAYGNDIQYYGQVT… Secondary structure: regular patterns of hydrogen bonding in proteins result in two patterns that emerge in nearly every protein structure known: the α-helix and the β-sheet. The location and direction of these periodic, repeating structures is known as the secondary structure of the protein.

15 The Alpha Helix φ ≈ ψ ≈ −60°

16 Properties of the Alpha Helix
φ ≈ ψ ≈ −60°. Hydrogen bonds between C=O of residue n and NH of residue n+4. 3.6 residues/turn. 1.5 Å/residue rise. 100°/residue turn.

17 Properties of α-helices
4 – 40+ residues in length. Often amphipathic or “dual-natured”: half hydrophobic and half hydrophilic. If we examine many α-helices, we find trends… Helix formers: Ala, Glu, Leu, Met. Helix breakers: Pro, Gly, Tyr, Ser.
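The helix parameters quoted on the slides are tied together by simple arithmetic: 3.6 residues per turn at 1.5 Å rise per residue gives the pitch of one turn, and 360° divided by 3.6 gives the 100° rotation per residue. A small sketch of that arithmetic follows. The function name `helix_geometry` is an illustrative assumption.

```python
RESIDUES_PER_TURN = 3.6   # from the slide
RISE_PER_RESIDUE = 1.5    # angstroms per residue, from the slide

def helix_geometry(n_residues):
    """Return (pitch in angstroms, rotation per residue in degrees,
    helix length in angstroms) for an ideal alpha helix."""
    pitch = RESIDUES_PER_TURN * RISE_PER_RESIDUE    # rise of one full turn
    rotation = 360.0 / RESIDUES_PER_TURN            # turn per residue
    length = n_residues * RISE_PER_RESIDUE          # end-to-end rise
    return pitch, rotation, length
```

Under these numbers one turn rises about 5.4 Å, and a 40-residue helix (the upper end of the range on the slide) is about 60 Å long.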

18 The beta strand (& sheet)
φ ≈ −135°, ψ ≈ +135°

19 Properties of beta sheets
Formed of stretches of 5-10 residues in extended conformation. Parallel/antiparallel, contiguous/non-contiguous.

20 Anti-Parallel Beta Sheets

21 Parallel Beta Sheets

22 Mixed Beta Sheets

23 Turns and Loops Secondary structure elements are connected by regions of turns and loops. Turns: short regions of non-α, non-β conformation. Loops: larger stretches with no secondary structure. Their sequences vary much more than secondary structure regions.

24 Ramachandran Plot

25 Levels of Protein Structure
Secondary structure elements combine to form tertiary structure Quaternary structure occurs in multienzyme complexes

26 Protein Structure Example
Figure labels: beta sheet, helix, loop. ID: 12as, 2 chains.

27 Views of a Protein Wireframe Ball and stick

28 Views of a protein: Spacefill, Cartoon. CPK colors:
Carbon = green, black, or grey; Nitrogen = blue; Oxygen = red; Sulfur = yellow; Hydrogen = white.

29 Common Protein Motifs

30 Mostly Helical Folding Motifs
Four helical bundle: Globin domain:

31 α/β Motifs α/β barrel:

32 Open Twisted Beta Sheets

33 Beta Barrels

34 Determining the Structure of a Protein
Experimental Methods: X-ray, NMR. As of August 2013, the structures of more than 85,000 proteins have been determined.

35 X-Ray Crystallography
Discovery of X-rays (Wilhelm Conrad Röntgen, 1895). Crystals diffract X-rays in regular patterns (Max von Laue, 1912). The first X-ray diffraction pattern from a protein crystal (Dorothy Hodgkin, 1934).

36 X-Ray Crystallography
Grow millions of protein crystals (takes months). Expose to radiation beam. Analyze the image with computer. Average over many copies of images. PDB. Not all proteins can be crystallized!

37 NMR Nuclear Magnetic Resonance
Nuclei of atoms vibrate when exposed to an oscillating magnetic field. The vibrations are detected by external sensors and used to compute inter-atomic distances. Requires complex analysis. NMR can be used for short sequences (<200 residues). More than one model can be derived from NMR.

38 Determining the Structure of a Protein
Computational Methods

39 The Protein Folding Problem
Central question of molecular biology: “Given a particular sequence of amino acid residues (primary structure), what will the secondary/tertiary/quaternary structure of the resulting protein be?” Input: AAVIKYGCAL… Output: φ1ψ1, φ2ψ2…

40 Structure vs. Sequence Observation: A protein with the same sequence (under the same circumstances) yields the same shape. A protein folds into a shape that minimizes the energy needed to stay in that shape. Protein folds in ~10-15 seconds.

41 Secondary Structure Prediction

42 Chou-Fasman methods Use statistically obtained Chou-Fasman parameters. Each amino acid has four parameters: P(a) for alpha helix, P(b) for beta sheet, P(t) for turn, and f(), an additional turn parameter.

43 Chou-Fasman Parameters

44 C.-F. Alpha Helix Prediction (1)
Residue  A    E    T    L    C    M    Q    S    Y    V
P(a)     142  151  83   121  70   145  111  77   69   106
P(b)     83   37   119  130  119  105  110  75   147  170
Find P(a) for all letters. Find 6 contiguous letters, at least 4 of which have P(a) > 100. Declare these regions as alpha helix.

45 C.-F. Alpha Helix Prediction (2)
(Same example sequence and P(a), P(b) values as in part 1.) Extend in both directions until 4 consecutive letters with P(a) < 100 are found.

46 C.-F. Alpha Helix Prediction (3)
(Same example sequence and P(a), P(b) values as in part 1.) Find the sum of P(a) (Sa) and the sum of P(b) (Sb) in the extended region. If the region is long enough (>= 5 letters) and Sa > Sb, declare the extended region an alpha helix.
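The three helix-prediction steps above (nucleation, extension, Sa versus Sb check) can be sketched as follows. This is one possible reading of the slides, not a reference implementation: the parameter tables cover only the ten residues of the slide's example, and the handling of sequence ends (extension also stops when all remaining residues within the look-ahead are weak) is an assumption of this sketch. The name `predict_helix` is hypothetical.

```python
# Chou-Fasman P(a) and P(b) for the ten residues in the slide's example only.
P_A = {"A": 142, "E": 151, "T": 83, "L": 121, "C": 70,
       "M": 145, "Q": 111, "S": 77, "Y": 69, "V": 106}
P_B = {"A": 83, "E": 37, "T": 119, "L": 130, "C": 119,
       "M": 105, "Q": 110, "S": 75, "Y": 147, "V": 170}

def predict_helix(seq, window=6, min_formers=4, stop_run=4, min_len=5):
    """Return a string with 'H' at predicted helix positions, '-' elsewhere."""
    pa = [P_A[r] for r in seq]
    pb = [P_B[r] for r in seq]
    n = len(seq)
    mask = [False] * n

    def weak(values):
        # True when every residue in the slice has P(a) < 100.
        return all(p < 100 for p in values)

    for start in range(n - window + 1):
        # Step 1: nucleation, at least 4 of 6 residues with P(a) > 100.
        if sum(p > 100 for p in pa[start:start + window]) < min_formers:
            continue
        lo, hi = start, start + window          # half-open region [lo, hi)
        # Step 2: extend until the next 4 residues all have P(a) < 100
        # (assumption: a shorter all-weak tail at a sequence end also stops).
        while hi < n and not weak(pa[hi:hi + stop_run]):
            hi += 1
        while lo > 0 and not weak(pa[max(0, lo - stop_run):lo]):
            lo -= 1
        # Step 3: accept if long enough and Sa > Sb.
        sa, sb = sum(pa[lo:hi]), sum(pb[lo:hi])
        if hi - lo >= min_len and sa > sb:
            for i in range(lo, hi):
                mask[i] = True
    return "".join("H" if m else "-" for m in mask)
```

On the slide's sequence `AETLCMQSYV` the nucleus extends over the whole sequence, where Sa = 1075 is below Sb = 1095, so this sketch predicts no helix. A made-up sequence such as `SYSYAEMLAESYSY` yields `--HHHHHHHHHH--`; note that the flanking weak residues inside a nucleation window are carried along, which is a known coarseness of this simplified rule.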

47 C.-F. Beta Sheet Prediction
Same as alpha helix; replace P(a) with P(b). Resolving overlapping alpha helix & beta sheet: compute the sum of P(a) (Sa) and the sum of P(b) (Sb) in the overlap. If Sa > Sb => alpha helix. If Sb > Sa => beta sheet.
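The overlap rule can be sketched in a few lines. The tables again hold only the ten residues from the slide's example, and the function name `resolve_overlap` is hypothetical. The slide does not say what to do on a tie, so this sketch returns `None` in that case.

```python
# Chou-Fasman P(a) and P(b) for the ten residues in the slide's example only.
P_A = {"A": 142, "E": 151, "T": 83, "L": 121, "C": 70,
       "M": 145, "Q": 111, "S": 77, "Y": 69, "V": 106}
P_B = {"A": 83, "E": 37, "T": 119, "L": 130, "C": 119,
       "M": 105, "Q": 110, "S": 75, "Y": 147, "V": 170}

def resolve_overlap(segment):
    """Decide whether an overlapping segment is helix ('H') or sheet ('E')."""
    sa = sum(P_A[r] for r in segment)   # Sa over the overlap
    sb = sum(P_B[r] for r in segment)   # Sb over the overlap
    if sa > sb:
        return "H"
    if sb > sa:
        return "E"
    return None  # tie: not specified on the slide
```

For instance, the segment `AEM` has Sa = 438 and Sb = 225 and resolves to helix, while `YV` has Sa = 175 and Sb = 317 and resolves to sheet.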

48 C.-F. Turn Prediction
Residue  A    E    T    L    C    M    Q    S    Y    V
P(a)     142  151  83   121  70   145  111  77   69   106
P(b)     83   37   119  130  119  105  110  75   147  170
P(t)     66   74   96   59   119  60   98   143  114  50
An amino acid at position i is predicted as a turn if all of the following hold over the window i, i+1, i+2, i+3: (1) f(i)*f(i+1)*f(i+2)*f(i+3) exceeds a threshold (the threshold value is missing from this transcript); (2) Avg(P(t)) > 100 over positions i+k, k = 0, 1, 2, 3; (3) Sum(P(t)) > Sum(P(a)) and Sum(P(t)) > Sum(P(b)) over positions i+k, k = 0, 1, 2, 3.
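The turn test can be sketched as a single predicate over a four-residue window. Two things here are assumptions rather than content from the slides: the f() values are not preserved in the transcript, so the caller must supply them, and the default threshold of 0.000075 is the value commonly quoted for the Chou-Fasman method, not one taken from this lecture. The name `is_turn` is hypothetical.

```python
# Chou-Fasman P(a), P(b), P(t) for the ten residues in the slide's example only.
P_A = {"A": 142, "E": 151, "T": 83, "L": 121, "C": 70,
       "M": 145, "Q": 111, "S": 77, "Y": 69, "V": 106}
P_B = {"A": 83, "E": 37, "T": 119, "L": 130, "C": 119,
       "M": 105, "Q": 110, "S": 75, "Y": 147, "V": 170}
P_T = {"A": 66, "E": 74, "T": 96, "L": 59, "C": 119,
       "M": 60, "Q": 98, "S": 143, "Y": 114, "V": 50}

def is_turn(window, f_values, threshold=7.5e-5):
    """Apply the three turn conditions to residues i..i+3.

    window   -- four-letter string (residues i, i+1, i+2, i+3)
    f_values -- four caller-supplied f() values, one per window position
                (assumption: not available from the transcript)
    threshold -- assumed default; commonly quoted value, not from the slides
    """
    if len(window) != 4 or len(f_values) != 4:
        raise ValueError("window and f_values must both have length 4")
    bend = f_values[0] * f_values[1] * f_values[2] * f_values[3]
    sum_t = sum(P_T[r] for r in window)
    sum_a = sum(P_A[r] for r in window)
    sum_b = sum(P_B[r] for r in window)
    return (bend > threshold          # condition 1
            and sum_t / 4 > 100       # condition 2
            and sum_t > sum_a         # condition 3, helix
            and sum_t > sum_b)        # condition 3, sheet
```

With placeholder f() values of 0.1 at each position, the window `CMQS` passes (sum of P(t) is 420 against 403 and 409), while `QSYV` fails because its P(b) sum of 502 exceeds its P(t) sum of 405.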

49 Other Methods for SSE Prediction
Similarity searching: Predator. Markov chain. Neural networks: PHD. ~65% to 80% accuracy.

50 Tertiary Structure Prediction

51 Forces driving protein folding
It is believed that hydrophobic collapse is a key driving force for protein folding: hydrophobic core; polar surface interacting with solvent. Minimum volume (no cavities). Disulfide bond formation stabilizes. Hydrogen bonds. Polar and electrostatic interactions.

52 Fold Optimization Simple lattice models (HP-models or Hydrophobic-Polar models). Two types of residues: hydrophobic and polar. 2-D or 3-D lattice. The only force is hydrophobic collapse. Score = number of HH contacts.

53 Scoring Lattice Models
H/P model scoring: count noncovalent hydrophobic interactions. Sometimes: Penalize for buried polar or surface hydrophobic residues
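A minimal sketch of HP-model scoring on a 2-D square lattice: the fold is given as absolute directions (U, D, L, R), and the score counts pairs of H residues that sit on neighboring lattice points without being adjacent in the chain. The function names and the choice to return `None` for self-intersecting folds are assumptions of this sketch.

```python
MOVES = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}

def fold_coords(directions):
    """Lattice coordinates visited by a chain that starts at the origin."""
    x, y = 0, 0
    coords = [(x, y)]
    for d in directions:
        dx, dy = MOVES[d]
        x, y = x + dx, y + dy
        coords.append((x, y))
    return coords

def hp_score(hp, directions):
    """Number of non-covalent H-H contacts, or None if the fold bumps."""
    coords = fold_coords(directions)
    if len(coords) != len(hp):
        raise ValueError("need exactly len(hp) - 1 directions")
    if len(set(coords)) != len(coords):
        return None                      # two residues on one lattice point
    index_at = {c: i for i, c in enumerate(coords)}
    score = 0
    for i, (x, y) in enumerate(coords):
        if hp[i] != "H":
            continue
        # Look only right and up so each pair is counted once.
        for dx, dy in ((1, 0), (0, 1)):
            j = index_at.get((x + dx, y + dy))
            if j is not None and hp[j] == "H" and abs(i - j) > 1:
                score += 1
    return score
```

For example, `hp_score("HPPH", "URD")` is 1: the chain folds into a unit square and its two H ends become lattice neighbors.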

54 Can we use lattice models?
For smaller polypeptides, exhaustive search can be used. Looking at the “best” fold, even in such a simple model, can teach us interesting things about the protein folding process. For larger chains, other optimization and search methods must be used: greedy, branch and bound, evolutionary computing, simulated annealing.
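For short chains the exhaustive search mentioned above can be written directly: enumerate every string of absolute directions, discard folds that revisit a lattice point, and keep the one with the most H-H contacts. This is a sketch for illustration only. It tries 4^(n−1) folds, so it is practical only for roughly a dozen residues or fewer, and the names `contacts` and `best_fold` are hypothetical.

```python
from itertools import product

MOVES = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}

def contacts(hp, directions):
    """H-H contact count for a fold, or None if it is not self-avoiding."""
    x, y = 0, 0
    index_at = {(0, 0): 0}
    for i, d in enumerate(directions, start=1):
        dx, dy = MOVES[d]
        x, y = x + dx, y + dy
        if (x, y) in index_at:
            return None                  # bump
        index_at[(x, y)] = i
    score = 0
    for (px, py), i in index_at.items():
        if hp[i] != "H":
            continue
        for dx, dy in ((1, 0), (0, 1)):  # count each pair once
            j = index_at.get((px + dx, py + dy))
            if j is not None and hp[j] == "H" and abs(i - j) > 1:
                score += 1
    return score

def best_fold(hp):
    """Return (best score, one fold achieving it) by brute force."""
    best_score, best_dirs = -1, None
    for dirs in product("UDLR", repeat=len(hp) - 1):
        s = contacts(hp, dirs)
        if s is not None and s > best_score:
            best_score, best_dirs = s, "".join(dirs)
    return best_score, best_dirs
```

Many folds are equivalent under rotation and reflection, so a real search would fix the first move to prune the space; that refinement is left out to keep the sketch short.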

55 The “hydrophobic zipper” effect
Ken Dill ~ 1997

56 Representing a lattice model
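Absolute directions: UURRDLDRRU. Relative directions: LFRFRRLLFFL. Advantage of relative directions: we can't have the immediate reversals (UD or RL) that are possible in absolute directions, and only three directions are needed: L, R, F. What about bumps? LFRRR: bad score. Use a better representation.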
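A short sketch of how a relative-direction string could be decoded and checked for bumps. The starting point (the origin) and the initial heading (up) are assumptions; any other choice only rotates or shifts the fold. The function names are hypothetical.

```python
def relative_to_coords(rel, heading=(0, 1)):
    """Decode L/F/R moves into lattice coordinates, starting at the origin.

    Each symbol first turns the current heading (L = left, R = right,
    F = keep going forward) and then takes one step.
    """
    x, y = 0, 0
    dx, dy = heading
    coords = [(x, y)]
    for turn in rel:
        if turn == "L":
            dx, dy = -dy, dx      # rotate 90 degrees counter-clockwise
        elif turn == "R":
            dx, dy = dy, -dx      # rotate 90 degrees clockwise
        elif turn != "F":
            raise ValueError(f"unknown move: {turn!r}")
        x, y = x + dx, y + dy
        coords.append((x, y))
    return coords

def has_bump(rel):
    """True if the decoded chain visits any lattice point twice."""
    coords = relative_to_coords(rel)
    return len(set(coords)) != len(coords)
```

With this decoding, the slide's `LFRRR` does bump: three right turns in a row walk around a unit square and land back on an occupied point, whereas `LFRFRRLLFFL` is self-avoiding.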

57 Preference-order representation
Each position has two “preferences”. If it can’t have either of the two, it will take the “least favorite” path if possible. Example: {LR},{FL},{RL},{FR},{RL},{RL},{FL},{RF}. Can still cause bumps: {LF},{FR},{RL},{FL},{RL},{FL},{RF},{RL},{FL}.

58 More realistic models
Higher resolution lattices (45° lattice, etc.). Off-lattice models: local moves. Optimization/search methods and φ/ψ representations: greedy search; branch and bound; EC, Monte Carlo, simulated annealing, etc.

59 How to Evaluate the Result?
Now that we have a more realistic off-lattice model, we need a better energy function to evaluate a conformation (fold). Theoretical force field: G = G_{van der Waals} + G_{h-bonds} + G_{solvent} + G_{coulomb}. Empirical force fields: start with a database; look at neighboring residues: are they similar to known protein folds?

60 Comparative Modeling Identify similar protein sequences from a database of known proteins (BLAST). Find conserved regions by aligning these proteins (CLUSTAL-W). Predict alpha helices and beta sheets from conserved regions, backbone. Predict loops. Predict side chain positions. Evaluate.

61 Threading: Fold recognition
Given: a sequence (IVACIVSTEYDVMKAAR…) and a database of molecular coordinates. Map the sequence onto each fold. Evaluate. Objective 1: improve scoring function. Objective 2: folding.

62 Folding: still a hard problem
Levinthal’s paradox: Consider a 100 residue protein. If each residue can take only 3 positions, there are 3^100 ≈ 5 × 10^47 possible conformations. If it takes 10^-13 s to convert from one structure to another, exhaustive search would take 1.6 × 10^27 years.
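The Levinthal estimate can be checked with a few lines of arithmetic. The only assumption added here is the length of a year, taken as 365.25 days.

```python
# Numbers from the slide.
residues = 100
states_per_residue = 3
seconds_per_step = 1e-13

# Assumption of this sketch: one year = 365.25 days.
SECONDS_PER_YEAR = 365.25 * 24 * 3600

conformations = states_per_residue ** residues        # exact integer, 3^100
total_seconds = conformations * seconds_per_step
total_years = total_seconds / SECONDS_PER_YEAR

print(f"conformations: {conformations:.2e}")
print(f"years for exhaustive search: {total_years:.2e}")
```

The results come out at roughly 5 × 10^47 conformations and about 1.6 × 10^27 years, matching the figures on the slide.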

63 Protein Classification
Class: similar secondary structure properties (all alpha, all beta, alpha/beta, alpha+beta). Fold: major secondary structure similarity, e.g. globin-like (6 helices, folded leaf, partly opened). Superfamily: distant homologs % sequence identity. Family: close homologs, evolved from the same ancestor, high identity.

