Probabilistic Methods for Interpreting Electron-Density Maps Frank DiMaio University of Wisconsin – Madison Computer Sciences Department


1 Probabilistic Methods for Interpreting Electron-Density Maps Frank DiMaio University of Wisconsin – Madison Computer Sciences Department dimaio@cs.wisc.edu

2 3D Protein Structure backbone sidechain backbone sidechain C-alpha

3 3D Protein Structure ALA LEU PRO VAL ARG …… ???

4 High-Throughput Structure Determination Protein-structure determination important Understanding function of a protein Understanding mechanisms Targets for drug design Some proteins produce poor density maps Interpreting poor electron-density maps is very (human) laborious I aim to automatically interpret poor-quality electron-density maps

5 Electron-Density Map Interpretation … … GIVEN: 3D electron-density map, (linear) amino-acid sequence

6 Electron-Density Map Interpretation … … FIND: All-atom Protein Model

7 My focus Density Map Resolution [Figure: density-map resolution axis from 1.0Å to 4.0Å, marking the ranges addressed by Morris et al. (2003), Ioerger et al. (2002), Terwilliger (2003), and my focus]

8 Thesis Contributions A probabilistic approach to protein-backbone tracing DiMaio et al., Intelligent Systems for Molecular Biology (2006) Improved template matching in electron-density maps DiMaio et al., IEEE Conference on Bioinformatics and Biomedicine (2007) Creating all-atom protein models using particle filtering DiMaio et al. (under review) Pictorial structures for atom-level molecular modeling DiMaio et al., Advances in Neural Information Processing Systems (2004) Improving the efficiency of belief propagation DiMaio and Shavlik, IEEE International Conference on Data Mining (2006) Iterative phase improvement in ACMI

9 ACMI Overview Phase 1: Local pentapeptide search (ISMB 2006, BIBM 2007) Independent amino-acid search Templates model 5-mer conformational space Phase 2: Coarse backbone model (ISMB 2006, ICDM 2006) Protein structural constraints refine local search Markov field (MRF) models pairwise constraints Phase 3: Sample all-atom models Particle filtering samples high-prob. structures Probs. from MRF guide particle trajectories

11 5-mer Lookup …SAW C VKFEKPADKNGKTE… Protein DB ACMI searches map for each template independently Spherical-harmonic decomposition allows rapid search of all template rotations

12 Spherical-Harmonic Decomposition f (θ,φ)

13 5-mer Fast Rotation Search pentapeptide fragment from PDB (the “template”): calculated (expected) density in 5Å sphere, template density sampled in spherical shells; electron density map: sampled region of density in 5Å sphere, map region sampled in spherical shells

14 5-mer Fast Rotation Search map region sampled in spherical shells gives map-region spherical-harmonic coefficients; template density sampled in spherical shells gives template spherical-harmonic coefficients; fast-rotation function (Navaza 2006, Risbo 1996) gives correlation coefficient as function of rotation

15 Convert Scores to Probabilities scan density map for fragment: correlation coefficients over density map $t_i(u_i)$; Bayes’ rule: probability distribution over density map $P(\text{5-mer at } u_i \mid \text{EDM})$

16 ACMI Overview Phase 1: Local pentapeptide search (ISMB 2006, BIBM 2007) Independent amino-acid search Templates model 5-mer conformational space Phase 2: Coarse backbone model (ISMB 2006, ICDM 2006) Protein structural constraints refine local search Markov field (MRF) models pairwise constraints Phase 3: Sample all-atom models Particle filtering samples high-prob. structures Probs. from MRF guide particle trajectories

17 Probabilistic Backbone Model Trace assigns a position and orientation $u_i = \{x_i, q_i\}$ to each amino acid $i$ The probability of a trace $U = \{u_i\}$ is a full joint probability, which is intractable to compute Approximate it using a pairwise Markov field

18 Pairwise Markov-Field Model Joint probabilities defined on a graph as product of vertex and edge potentials [Figure: graph over GLY, LYS, LEU, SER, ALA]

19 ACMI’s Backbone Model Observational potentials tie the map to the model [Figure: graph over GLY, LYS, LEU, SER, ALA]

20 ACMI’s Backbone Model Adjacency constraints ensure adjacent amino acids are ~3.8 Å apart and in proper orientation Occupancy constraints ensure nonadjacent amino acids do not occupy same 3D space [Figure: graph over GLY, LYS, LEU, SER, ALA]
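
As a rough illustration of the two constraint types on this slide, the sketch below writes them as functions of two Cα positions. Only the ~3.8 Å spacing comes from the slide; the Gaussian shape, the `SIGMA` tolerance, the `MIN_SEPARATION` clash radius, and the function names are assumptions made for the example, and the orientation part of the adjacency constraint is left out.

```python
import math

CA_CA_DISTANCE = 3.8   # Å, spacing of adjacent C-alphas (from the slide)
SIGMA = 0.3            # Å, assumed tolerance around 3.8 Å (hypothetical)
MIN_SEPARATION = 3.0   # Å, assumed clash radius (hypothetical)


def adjacency_potential(x_i, x_j):
    """High when two sequence-adjacent C-alphas are about 3.8 Å apart."""
    d = math.dist(x_i, x_j)
    return math.exp(-0.5 * ((d - CA_CA_DISTANCE) / SIGMA) ** 2)


def occupancy_potential(x_i, x_j):
    """Zero when two nonadjacent C-alphas would occupy the same space."""
    return 0.0 if math.dist(x_i, x_j) < MIN_SEPARATION else 1.0
```

The occupancy function is nearly flat (1 almost everywhere), which is the sense in which such a potential is "weak".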

21 Backbone Model Potential

22 Constraints between adjacent amino acids × =

23 Backbone Model Potential Constraints between all other amino acid pairs

24 Backbone Model Potential Observational (“template-matching”) probabilities
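
The three kinds of potential named on slides 22 to 24 combine in the usual pairwise Markov-field factorization. The equation below is a sketch of that standard form, not a copy of the slide's formula; the symbol names $\psi_i$, $\psi_{\text{adj}}$, and $\psi_{\text{occ}}$ are chosen here for illustration.

```latex
P(U \mid \text{Map}) \;\propto\;
  \prod_{i=1}^{N} \psi_i(u_i)
  \;\times\; \prod_{i=1}^{N-1} \psi_{\text{adj}}(u_i, u_{i+1})
  \;\times\; \prod_{\substack{i<j \\ j-i>1}} \psi_{\text{occ}}(u_i, u_j)
```

Here $\psi_i$ is the observational ("template-matching") potential, $\psi_{\text{adj}}$ covers adjacent amino acids, and $\psi_{\text{occ}}$ covers all other pairs.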

25 Inferring Backbone Locations Want to find backbone layout that maximizes the probability of the trace

26 Inferring Backbone Locations Want to find backbone layout that maximizes the probability of the trace Exact methods are intractable Use belief propagation (Pearl 1988) to approximate marginal distributions

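27 Belief Propagation Example LYS 31 sends message $m_{\text{LYS31} \to \text{LEU32}}$ to LEU 32; approximate marginals $\hat{p}_{\text{LYS31}}$, $\hat{p}_{\text{LEU32}}$

28 Belief Propagation Example LEU 32 sends message $m_{\text{LEU32} \to \text{LYS31}}$ to LYS 31; approximate marginals $\hat{p}_{\text{LYS31}}$, $\hat{p}_{\text{LEU32}}$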
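
The two-node exchange on these slides can be sketched with the standard sum-product update. The potentials below are made-up numbers over three candidate locations per amino acid, and the function names are hypothetical. On a two-node graph the resulting beliefs are exact marginals; on the full backbone model they would only be approximations.

```python
def normalize(values):
    total = sum(values)
    return [v / total for v in values]


def message(phi_src, psi, incoming_to_src):
    """Sum-product message from a source node to one neighbor.

    phi_src: observational potential of the source node.
    psi[a][b]: edge potential for source state a and target state b.
    incoming_to_src: messages into the source from its other neighbors.
    """
    pre = list(phi_src)
    for m in incoming_to_src:
        pre = [p * x for p, x in zip(pre, m)]
    n_dst = len(psi[0])
    return normalize([sum(pre[a] * psi[a][b] for a in range(len(psi)))
                      for b in range(n_dst)])


# Toy potentials (illustrative values only).
phi_lys31 = [0.7, 0.2, 0.1]
phi_leu32 = [0.3, 0.3, 0.4]
psi = [[0.0, 1.0, 0.2],
       [1.0, 0.0, 1.0],
       [0.2, 1.0, 0.0]]
psi_T = [list(col) for col in zip(*psi)]  # same edge, viewed from LEU32

m_lys_to_leu = message(phi_lys31, psi, [])
m_leu_to_lys = message(phi_leu32, psi_T, [])

# Belief = own potential times incoming messages, renormalized.
belief_leu32 = normalize([p * m for p, m in zip(phi_leu32, m_lys_to_leu)])
belief_lys31 = normalize([p * m for p, m in zip(phi_lys31, m_leu_to_lys)])
```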

29 Scaling BP to Proteins (DiMaio and Shavlik, ICDM 2006) Naïve implementation $O(N^2 G^2)$ N = the number of amino acids in the protein G = # of points in discretized density map $O(G^2)$ computation for each message passed, reduced to $O(G \log G)$ as Fourier-space multiplication $O(N^2)$ messages computed & stored: approximate (N-3) occupancy msgs with 1 message, giving $O(N)$ messages using a message accumulator Improved implementation $O(NG \log G)$
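
The step from $O(G^2)$ to $O(G \log G)$ relies on the convolution theorem. The sketch below shows the idea on a one-dimensional periodic grid, under the assumption that the edge potential depends only on the offset between two grid points. The real maps are three-dimensional, and the small radix-2 `fft` here is a teaching implementation rather than anything from ACMI.

```python
import cmath


def fft(x, inverse=False):
    """Recursive radix-2 FFT (unnormalized); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def message_direct(psi, f):
    """O(G^2): m[j] = sum over i of psi[(j - i) mod G] * f[i]."""
    g = len(f)
    return [sum(psi[(j - i) % g] * f[i] for i in range(g)) for j in range(g)]


def message_fft(psi, f):
    """O(G log G): the same circular convolution as a Fourier-space product."""
    g = len(f)
    product = [a * b for a, b in zip(fft(psi), fft(f))]
    return [v.real / g for v in fft(product, inverse=True)]


# Illustrative values on an 8-point grid.
psi = [1.0, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.5]   # potential by offset
f = [0.1, 0.4, 0.2, 0.0, 0.0, 0.3, 0.0, 0.0]     # pre-message at source
slow = message_direct(psi, f)
fast = message_fft(psi, f)
```

Both functions return the same message up to floating-point error; only the cost differs.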

31 Occupancy Message Approximation To pass a message from i to j, combine the occupancy edge potential with the product of incoming msgs to i except from j

32 Occupancy Message Approximation “Weak” potentials between nonadjacent amino acids let us approximate: combine the occupancy edge potential with the product of all incoming msgs to i

33 Occupancy Message Approximation [Figure: example graph with nodes 1 to 6]

34 Occupancy Message Approximation [Figure: example graph with nodes 1 to 6]

35 Occupancy Message Approximation Send outgoing occupancy message product to a central accumulator (ACC)

36 Occupancy Message Approximation Then, each node’s incoming message product is computed in constant time
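
A minimal sketch of the accumulator idea, assuming each node sends one shared occupancy message to every other node. Messages are kept in log space so that products become sums and a node's own contribution can be subtracted back out. The function names and values are hypothetical, and the special handling of sequence-adjacent nodes is omitted for brevity.

```python
import math


def incoming_naive(log_msgs):
    """O(N^2): for each node j, sum the log-messages of every other node."""
    n, g = len(log_msgs), len(log_msgs[0])
    return [[sum(log_msgs[i][u] for i in range(n) if i != j) for u in range(g)]
            for j in range(n)]


def incoming_with_accumulator(log_msgs):
    """O(N): build one accumulator, then remove each node's own message."""
    g = len(log_msgs[0])
    acc = [sum(m[u] for m in log_msgs) for u in range(g)]
    return [[acc[u] - m[u] for u in range(g)] for m in log_msgs]


# Four nodes, three grid points, made-up positive message values.
msgs = [[0.9, 0.5, 1.0], [1.0, 0.2, 0.8], [0.7, 1.0, 0.6], [1.0, 1.0, 0.1]]
log_msgs = [[math.log(v) for v in m] for m in msgs]
naive = incoming_naive(log_msgs)
accumulated = incoming_with_accumulator(log_msgs)
```

Once the accumulator exists, each node needs only one subtraction per grid point, independent of N.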

37 BP Output After some number of iterations, BP gives probability distributions over Cα locations ALA LEU PRO VAL ARG …… ………

38 ACMI’s Backbone Trace Independently choose Cα locations that maximize approximate marginal distribution … …

39 Example: 1XRI 3.3Å resolution density map 39° mean phase error 0.9009Å RMSd 93% complete [Figure: backbone trace colored by prob(AA at location), color scale labeled LOW/HIGH with ticks 0.1 and 0.9]

40 Testset Density Maps (raw data) [Figure: scatter plot; axes: density-map resolution (Å), 1.0 to 4.0, and density-map mean phase error (deg.), 15 to 75]

41 Experimental Accuracy [Figure: bar chart, 0 to 100, comparing ACMI, ARP/wARP, Textal, and Resolve; % Cα’s located within 2Å of some Cα / correct Cα; % backbone correctly placed; % amino acids correctly identified]

42 Experimental Accuracy on a Per-Protein Basis [Figure: three scatter plots of ACMI % Cα’s located against ARP/wARP, Resolve, and Textal % Cα’s located; all axes 0 to 100]

43 ACMI Overview Phase 1: Local pentapeptide search (ISMB 2006, BIBM 2007) Independent amino-acid search Templates model 5-mer conformational space Phase 2: Coarse backbone model (ISMB 2006, ICDM 2006) Protein structural constraints refine local search Markov field (MRF) models pairwise constraints Phase 3: Sample all-atom models Particle filtering samples high-prob. structures Probs. from MRF guide particle trajectories

44 Problems with ACMI Biologists want location of all atoms All Cα’s lie on a discrete grid Maximum-marginal backbone model may be physically unrealistic Ignoring a lot of information Multiple models may better represent conformational variation within crystal [Figure: three structures with Probability=0.4, Probability=0.35, Probability=0.25, and the maximum-marginal structure]

45 ACMI with Particle Filtering (ACMI-PF) Idea: Represent protein using a set of static 3D all-atom protein models

46 Particle Filtering Overview (Doucet et al. 2000) Given some Markov process $x_{1:K} \in X$ with observations $y_{1:K} \in Y$ Particle filtering approximates some posterior probability distribution over $X$ using a set of $N$ weighted point estimates

47 Particle Filtering Overview Markov process gives recursive formulation Use importance fn. $q(x_k \mid x_{0:k-1}, y_k)$ to grow particles Recursive weight update
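
The weight-update formula did not survive in this transcript. For reference, the standard sequential-importance-sampling update for a Markov process, written with this slide's importance function $q$, has the form below; the particle index $(i)$ is notation added here.

```latex
w_k^{(i)} \;\propto\; w_{k-1}^{(i)} \,
  \frac{p\!\left(y_k \mid x_k^{(i)}\right)\, p\!\left(x_k^{(i)} \mid x_{k-1}^{(i)}\right)}
       {q\!\left(x_k^{(i)} \mid x_{0:k-1}^{(i)},\, y_k\right)}
```

Each particle's weight is its previous weight times the ratio of the true one-step probability to the probability under the importance function that proposed the step.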

48 Particle Filtering for Protein Structures Particle refers to one specific 3D layout of some subsequence of the protein At each iteration advance particle’s trajectory by placing an additional amino-acid’s atoms

49 Particle Filtering for Protein Structures Alternate extending chain left and right

50 Particle Filtering for Protein Structures Alternate extending chain left and right An iteration alternately places: Cα position $b_{k+1}$ given $b_k$; all sidechain atoms $s_k$ given $b_{k-1:k+1}$

51 Particle Filtering for Protein Structures Key idea: Use the conditional distribution $p(b_k \mid b^i_{k-1}, \text{Map})$ to advance particle trajectories Construct this conditional distribution from BP’s marginal distributions

52 Particle Filtering for Protein Structures Algorithm
    place “seeds” $b^i_k$ for each particle i = 1…N
    while amino acids remain
        place $b^i_{k+1}$ / $b^i_{j-1}$ given $b^i_{j:k}$ for each i = 1…N
        place $s^i_k$ given $b^i_{k-1:k+1}$ for each i = 1…N
        optionally resample N particles
    end while

53 Backbone Step (for particle i): place $b^i_{k+1}$ given $b^i_k$ for each i = 1…N (1) Sample L candidates $b^1_{k+1}, \ldots, b^L_{k+1}$ from the $b_{k-1}$–$b_k$–$b_{k+1}$ pseudoangle distribution

54 Backbone Step (for particle i) (2) Weight each sample by its ACMI-computed approximate marginal: $p_{k+1}(b^1_{k+1}), p_{k+1}(b^2_{k+1}), \ldots, p_{k+1}(b^L_{k+1})$

55 Backbone Step (for particle i) (3) Select $b_{k+1}$ with probability proportional to sample weight

56 Backbone Step (for particle i) (4) Update particle weight as sum of sample weights
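
The four backbone sub-steps can be sketched as a single function. Several pieces here are stand-ins: candidates are drawn 3.8 Å from $b_k$ in uniformly random directions rather than from the pseudoangle distribution, `marginal` is any callable playing the role of the ACMI-computed approximate marginal, and the particle weight is scaled by the sum of sample weights, which is one reading of step (4).

```python
import math
import random

CA_CA_DISTANCE = 3.8  # Å


def sample_candidates(b_k, num_samples, rng):
    """Stand-in proposal: points 3.8 Å from b_k in random directions."""
    candidates = []
    for _ in range(num_samples):
        z = rng.uniform(-1.0, 1.0)
        theta = rng.uniform(0.0, 2.0 * math.pi)
        r = math.sqrt(1.0 - z * z)
        direction = (r * math.cos(theta), r * math.sin(theta), z)
        candidates.append(tuple(c + CA_CA_DISTANCE * d
                                for c, d in zip(b_k, direction)))
    return candidates


def backbone_step(b_k, weight, marginal, num_samples, rng):
    candidates = sample_candidates(b_k, num_samples, rng)      # (1) sample
    sample_weights = [marginal(c) for c in candidates]         # (2) weight
    total = sum(sample_weights)
    if total == 0.0:
        return None, 0.0                                       # no viable move
    chosen = rng.choices(candidates, weights=sample_weights)[0]  # (3) select
    return chosen, weight * total                              # (4) update


# Toy marginal peaked at (3.8, 0, 0); purely illustrative.
def toy_marginal(p):
    return math.exp(-math.dist(p, (3.8, 0.0, 0.0)) ** 2)


b_next, new_weight = backbone_step((0.0, 0.0, 0.0), 1.0, toy_marginal,
                                   50, random.Random(0))
```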

57 Sidechain Step (for particle i): place $s^i_k$ given $b^i_{k-1:k+1}$ for each i = 1…N (1) Sample $s_k$ from a database of sidechain conformations (Protein Data Bank)

58 Sidechain Step (for particle i) (2) For each sidechain conformation, compute probability of density map given the sidechain: $p_k(\text{EDM} \mid s^1_k), p_k(\text{EDM} \mid s^2_k), p_k(\text{EDM} \mid s^3_k)$

59 Sidechain Step (for particle i) (3) Select sidechain conformation from this weighted distribution

60 Sidechain Step (for particle i) (4) Update particle weight as sum of sample weights

61 Particle Resampling [Figure: particles with weights between 0.1 and 0.4, shown before and after resampling]
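
The slide does not say which resampling scheme is used. The sketch below shows plain multinomial resampling, a common choice: draw N particles with probability proportional to weight, then reset all weights to be equal. The particle labels and weights are made up.

```python
import random


def resample(particles, weights, rng):
    """Multinomial resampling: high-weight particles are duplicated,
    low-weight particles tend to disappear; weights become uniform."""
    n = len(particles)
    survivors = rng.choices(particles, weights=weights, k=n)
    return survivors, [1.0 / n] * n


particles = ["A", "B", "C", "D", "E"]
weights = [0.1, 0.4, 0.3, 0.0, 0.2]   # illustrative values
new_particles, new_weights = resample(particles, weights, random.Random(0))
```

A particle with zero weight, such as "D" here, can never be drawn, so it is dropped by the resampling.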

62 Amino-Acid Sampling Order Begin at some amino acid k with probability At each step, move left to right with probability j k

63 Experimental Methodology Run ACMI-PF 10 times with 100 particles each Return highest-weight particle from each run Each run samples amino acids in a different order Refine each structure for 10 iterations in Refmac5 Compare 10-structure model to others using $R_{\text{free}}$

64 ACMI-PF Versus ACMI-Naïve [Figure: refined $R_{\text{free}}$ versus number of ACMI-PF runs] Additionally, ACMI-PF’s models have … Fewer gaps (10 vs. 28) Lower sidechain RMS error (2.1Å vs. 2.3Å)

65 ACMI-PF Versus Others [Figure: three scatter plots of ACMI-PF $R_{\text{free}}$ against ARP/wARP, Resolve, and Textal $R_{\text{free}}$; all axes 0.25 to 0.65]

66 ACMI-PF Example: 2A3Q 1.79Å RMSd 92% complete 2.3Å resolution 66° phase err.

67 ACMI Overview Phase 1: Local pentapeptide search (ISMB 2006, BIBM 2007) Independent amino-acid search Templates model 5-mer conformational space Phase 2: Coarse backbone model (ISMB 2006, ICDM 2006) Protein structural constraints refine local search Markov field (MRF) models pairwise constraints Phase 3: Sample all-atom models Particle filtering samples high-prob. structures Probs. from MRF guide particle trajectories Phase 4: Iterative phase improvement Use particle-filtering models to improve density-map quality Rerun entire pipeline on improved density map Repeat until convergence

68 Phase Problem Intensities: measured by X-ray crystallography Phases: experimentally estimated (e.g. MAD, MIR)

69 Density-Map Phasing [Figure: density maps at 0°, 30°, 60°, and 75° mean phase error]

70 Iterative Phase Improvement Predicted 3D model Initial density map Revised density map

71 ACMI-PF’s Phase Improvement [Figure: scatter plot of error in initial phases against error in ACMI-PF’s phases, both in deg. mean phase error, 0 to 75]

72 Two-Iteration ACMI [Figure: scatter plot of % backbone located in iteration 1 against % backbone located in iteration 2; both axes 50 to 100]

73 Future Work: Many-iteration ACMI [Figure: two plots against number of ACMI iterations: average % uninterpreted AAs and average mean phase error]

74 Conclusions ACMI’s three steps construct a set of all-atom protein models from a density map Novel message approximation allows inference on large, highly connected models Resulting protein models are more accurate than those of other methods

75 Ongoing and Future Work Incorporate additional structural biology background knowledge Incorporate more complex potential functions Further work on iterative phase improvement Generalize my algorithms to other 3D image data

76 Acknowledgements Advisor Jude Shavlik Committee George Phillips Charles Dyer David Page Mark Craven Collaborators Ameet Soni Dmitry Kondrashov Eduard Bitto Craig Bingman 6th floor MSCers Center for Eukaryotic Structural Genomics Funding UW-Madison Graduate School NLM 1T15 LM007359 NLM 1R01 LM008796

