Protein Structure Prediction Graham Wood Charlotte Deane.


1 Protein Structure Prediction Graham Wood Charlotte Deane

2 The problem - in brief MVLSEGEWQL VLHVWAKVEA DVAGHGQDIL … AKYKELGYQG Databases Algorithms Software +=

3 Why is protein structure prediction needed? Essential functioning of cells is mediated by proteins It is protein structure that leads to protein function 3D structure determination is expensive, slow and difficult (by X-ray crystallography or NMR) Assists in the engineering of new proteins

4 Terminology Target - the unknown structure you are trying to model Parent - a known structure which provides a basis for modelling

5 The problem - more detail Configuration space Energy EKGPDLYLIPLT Protein databases EKGPDLYLIPLT Biologist Physicist

6 CASP Critical Assessment of Structure Prediction Jan-Apr May Jun Jul Aug Sept Oct Nov Dec Biologists Caspers Organisers Call for structures Publish seqs on web Give sequences to organisers Structure determination Give structures to organisers Predict structure from sequence Expert assessment 4 day mtg

7 Degree of evolutionary conservation Less conserved Information poor More conserved Information rich DNA seq Protein seq Structure Function ACAGTTACAC CGGCTATGTA CTATACTTTG HDSFKLPVMS KFDWEMFKPC GKFLDSGKLG

8 Three main approaches (in order of current success) 1. Comparative modelling 2. Fold recognition 3. De novo

9 Comparative modelling Conserved backbone Energy EKGPDLYLIPLT Target Close homologues Variable backbone Side chains

10 Comparative modelling (protein building) 1. Prepare the raw materials 2. Build the model (two methods) 3. Check the model 4. Accept or reject the model

11 C1: Preparing the raw materials Structurally align parents Align target to parents EKGPDLYLIPLT Given target AA sequence Identify parents (homologues)

12 loop region secondary structure region Structurally conserved regions and structurally variable regions SCR SVR

13 C2: Building (choice of two methods) Attach and orient side-chains Refine model Determine SCRs and build associated backbone Determine SVRs and build rest of backbone Assemble fragments Use spatial restraints

14

15 C2: Building (choice of two methods) Orient side-chains Refine model Determine SCRs and build associated backbone Determine SVRs and build rest of backbone Assemble fragments Use spatial restraints Optimally satisfy spatial restraints

16 D T N V A Y C N K D

17 C3: Test model (C4: then accept or reject) Examine the model in the light of all experimental data PROCHECK, VERIFY3D, PROSA II, Visual inspection using 3D software, JOY

18 Problems in comparative modelling Aligning the target to the parents The packing of secondary structure elements in the core The long insertions and deletions in the structurally variable regions

19 Fold Recognition ? Target

20 Fold recognition Energy EKGPDLYLIPLT Target Structurally similar proteins

21 Fold recognition (protein finding) 1. Obtain library of non-duplicate folds 2. Perform sequence-structure alignment 3. Assess success of alignment Biologist – use substitution matrix Physicist – use potentials 4. Accept or reject the model

22 Sequence-structure alignment 1. Construct sequence profile 2. Use profile to score the sequence Target Parent BLASTP OWL MULTAL Dynamic programming algorithm Score
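The dynamic programming step on this slide can be illustrated with a minimal sketch. It assumes a global (Needleman-Wunsch style) alignment with a linear gap penalty, and a hypothetical profile format of one score dictionary per parent position. The gap and mismatch values are illustrative assumptions, not values from the presentation.

```python
def align_to_profile(target, profile, gap=-2.0, unseen=-1.0):
    """Global alignment score of a target sequence against a profile.

    profile: list with one dict per parent position, mapping an amino acid
    letter to its score at that position (hypothetical format).
    gap, unseen: assumed penalties for a gap and for a residue that the
    profile does not list at a position.
    """
    n, m = len(target), len(profile)
    # dp[i][j] = best score for aligning target[:i] with profile[:j]
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            match = dp[i - 1][j - 1] + profile[j - 1].get(target[i - 1], unseen)
            dp[i][j] = max(match, dp[i - 1][j] + gap, dp[i][j - 1] + gap)
    return dp[n][m]


# Toy profile with three positions that each favour one residue.
toy_profile = [{"A": 2.0}, {"C": 2.0}, {"D": 2.0}]
full_score = align_to_profile("ACD", toy_profile)
gapped_score = align_to_profile("AD", toy_profile)
```

In the toy case the full match scores higher than the target that needs one gap, which is the kind of ranking signal a fold recognition method would use to compare candidate parents.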

23 Amino acid substitutions are constrained by local environments Different substitution patterns Environment-specific substitution tables

24 Definition of local environments Main-chain conformation and secondary structure (α-helix, β-strand, coil and positive φ) Solvent accessibility (accessible and inaccessible) Hydrogen bonds (side-chain to main-chain NH, side-chain to main-chain CO and side-chain to side-chain)

25 Substitution scores Background probability of observing amino acid b, match occurring by chance Log odds score scaled to the nearest integer Probability that amino acid a in environment E is replaced by amino acid b Frequency of observing amino acid a in environment E replaced by b
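The quantities named on this slide can be combined in a short sketch: the conditional probability of a being replaced by b in environment E, divided by the background probability of b, then log-scaled and rounded to an integer. The input format, the base-2 logarithm, the scale factor, and the way the background is estimated from the same observations are all assumptions made for illustration.

```python
import math
from collections import Counter


def log_odds_table(substitutions, scale=2.0):
    """Integer log-odds scores from observed substitutions.

    substitutions: list of (environment, a, b) tuples, one per observed
    replacement of amino acid a by b in that environment (hypothetical format).
    scale: assumed multiplier applied before rounding.
    """
    pair_counts = Counter(substitutions)
    from_counts = Counter((env, a) for env, a, _ in substitutions)
    b_counts = Counter(b for _, _, b in substitutions)
    total = len(substitutions)
    scores = {}
    for (env, a, b), n in pair_counts.items():
        p_cond = n / from_counts[(env, a)]   # P(b | a, E)
        p_background = b_counts[b] / total   # assumed estimate of P(b)
        scores[(env, a, b)] = round(scale * math.log2(p_cond / p_background))
    return scores


# Made-up counts, only to exercise the function.
observations = (
    [("helix", "A", "A")] * 6
    + [("helix", "A", "G")] * 2
    + [("strand", "G", "G")] * 8
)
table = log_odds_table(observations)
```

A positive score means the replacement is seen more often in that environment than chance would predict, and a negative score means it is seen less often.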

26 Scoring with potentials Energy potential Solvation potential

27 The Novel Fold Problem ? asdghklprtwecvmnasetyasdghklprtwecvmnasety

28 De novo – new fold methods Energy EKGPDLYLIPLT Segment configurations Sets of local configurations

29 Defining a “New Fold” CATH – Somewhat objective SCOP – No objective definition – Tends towards evolutionary relationships Ask A. Murzin

30 New fold approach All structure information is in the AA sequence (Anfinsen, Science, 1973) Seek “lowest free energy conformation” Tactic is to simplify the problem, for example Simplified model of protein (one atom per residue) Simple or knowledge based potential function Assist in detecting distant homologues
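A one-atom-per-residue model with a simple potential can be sketched as follows. This is a toy contact energy, not a potential from the presentation: the hydrophobic residue set, the distance cutoff, and the minimum chain separation are all assumptions chosen for illustration.

```python
import math


def contact_energy(coords, sequence, hydrophobic="AVLIMFWC", cutoff=7.0, min_sep=3):
    """Toy energy for a one-point-per-residue chain.

    coords: list of (x, y, z) tuples, one per residue.
    Each pair of hydrophobic residues closer than `cutoff` and at least
    `min_sep` positions apart along the chain contributes -1.
    """
    energy = 0.0
    for i in range(len(coords)):
        for j in range(i + min_sep, len(coords)):
            both_hydrophobic = sequence[i] in hydrophobic and sequence[j] in hydrophobic
            if both_hydrophobic and math.dist(coords[i], coords[j]) <= cutoff:
                energy -= 1.0
    return energy


# Four residues on the corners of a square with 3.8 unit sides.
square = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 0.0)]
buried_pair = contact_energy(square, "LGGV")
no_pair = contact_energy(square, "LGGK")
```

A search for the "lowest free energy conformation" would then vary the coordinates to minimise a function of this kind.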

31 New fold recognition (structure discovery) 1. Set up domain and objective function 2. Perform optimisation 3. Check the model 4. Accept or reject the model

32 De Novo (biologist) ROSETTA (Baker et al.) Domain of objective function sequence 9 residues... Set of local structures consistent with local sequence

33 De Novo (biologist) ROSETTA Objective function to be maximised constant Function of energy
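The equation on this slide did not survive in the transcript. One plausible reading, assuming the objective is the Bayesian posterior P(structure|sequence) that the next slide names, is:

```latex
P(\text{structure} \mid \text{sequence})
  = \frac{P(\text{sequence} \mid \text{structure})\, P(\text{structure})}
         {P(\text{sequence})}
```

Under this reading P(sequence) would be the term labelled "constant", since the target sequence is fixed. Which remaining factor the slide labels "function of energy" cannot be recovered from the transcript.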

34 De Novo (biologist) ROSETTA Maximising the probability of the sequence 1. Choose each local conformation and start with a fully extended chain 2. Generate a neighbouring conformation 3. Accept in simulated annealing style, using P(structure|sequence) 4. Do this many times and cluster results – use centre of largest cluster as prediction
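Steps 2 and 3 above can be illustrated with a generic simulated annealing loop. This is a sketch of the technique only, not ROSETTA code: the toy conformation (one integer choice per segment), the toy log-probability, the cooling schedule, and all function names are hypothetical. The clustering in step 4 is not shown.

```python
import math
import random


def anneal(initial, neighbour, log_prob, steps=2000, t_start=2.0, t_end=0.05, seed=0):
    """Maximise log_prob with Metropolis acceptance and geometric cooling."""
    rng = random.Random(seed)
    current, current_lp = initial, log_prob(initial)
    best, best_lp = current, current_lp
    for k in range(steps):
        # Temperature falls geometrically from t_start to t_end.
        t = t_start * (t_end / t_start) ** (k / max(steps - 1, 1))
        candidate = neighbour(current, rng)
        cand_lp = log_prob(candidate)
        # Always accept improvements; accept worse moves with a probability
        # that shrinks as the temperature drops.
        if cand_lp >= current_lp or rng.random() < math.exp((cand_lp - current_lp) / t):
            current, current_lp = candidate, cand_lp
            if current_lp > best_lp:
                best, best_lp = current, current_lp
    return best, best_lp


# Toy problem: 8 segments, each with 5 candidate local structures.
N_CHOICES = 5
HIDDEN_BEST = [3, 1, 4, 1, 0, 2, 4, 3]  # hypothetical optimum, for the toy score only


def toy_log_prob(conf):
    # Stand-in for log P(structure|sequence): penalise each wrong choice.
    return -float(sum(1 for c, h in zip(conf, HIDDEN_BEST) if c != h))


def toy_neighbour(conf, rng):
    # Neighbouring conformation: change the choice at one random segment.
    new = list(conf)
    new[rng.randrange(len(new))] = rng.randrange(N_CHOICES)
    return new


start = [0] * len(HIDDEN_BEST)
best, best_lp = anneal(start, toy_neighbour, toy_log_prob)
```

Running the loop many times with different seeds and clustering the resulting conformations would correspond to step 4.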

35

36 De Novo (physicist) ASTROFOLD (Floudas et al.) 1. Predict α-helices and β-strands 2. Predict β-sheets and disulphide bridges using ILP 3. Use deterministic global optimisation, with energy function and constraints to predict tertiary structure

37 Testing of prediction servers - LiveBench
Server         Type       Sensitivity     Specificity     Added Value
                          Easy    Hard    All     Hard    Easy    Hard
Pcons2         Consensus  6       4       2       2       3       3
ShotGun on 5   Consensus  1       2       4       4       7       5
ShotGun on 3   Consensus  2       1       1       1       2       2
Shotgun-INBGU  Threading  3       3       3       3       4       1
INBGU          Threading  7       5       6       9       5       6
Fugue3         Threading  14      8       9       8       15      9
Fugue2         Threading  12      7       8       7       10      8
Fugue1         Threading  17      14      -       11      16      15
mGenTHREADER   Threading  8       11      16      13      6       11
GenTHREADER    Threading  13      12      17      15      8       13
3D-PSSM        Threading  5       10      12      -       -       10
ORFeus         Sequence   4       6       7       6       1       4
FFAS           Sequence   9       9       5       5       9       7
Sam-T99        Sequence   10      15      13      16      11      16
Superfamily    Sequence   15      13      11      10      17      12
ORF-BLAST      BLAST      11      16      10      14      -       -
PDB-BLAST      BLAST      16      17      15      17      13      17
BLAST          -          18      -       -       -       -       -

38 Review - comparative modelling Conserved backbone Energy EKGPDLYLIPLT Target Close homologues Variable backbone Side chains

39 Review - fold recognition Energy EKGPDLYLIPLT Target Structurally similar proteins

40 Review - new fold methods Energy EKGPDLYLIPLT Segment configurations Sets of local configurations

41 Summary: Prediction Methods Comparative modelling – There exists a protein with clear homology – PSI-BLAST Fold recognition – There exists a protein of similar fold (analogy) – DALI (CATH & SCOP) Novel Fold methods – The sequence has a new fold Better methods are still needed for it all to be useful!

