
Using Pictorial Structures to Identify Proteins in X-ray Crystallographic Electron Density Maps Frank DiMaio Jude Shavlik


1 Using Pictorial Structures to Identify Proteins in X-ray Crystallographic Electron Density Maps
Frank DiMaio (dimaio@cs.wisc.edu), Jude Shavlik (shavlik@cs.wisc.edu), George N. Phillips, Jr. (phillips@biochem.wisc.edu)
ICML Bioinformatics Workshop, 21 August 2003

2 Task Overview
Given:
- Electron density for a region in a protein
- The protein’s topology
Find:
- Positions of the individual atoms in the density map

3 Pictorial Structures
A pictorial structure is a collection of image parts together with a deformable conformation of these parts.

4 Pictorial Structures
Formally, a model consists of:
- A set of parts $V = \{v_1, \ldots, v_n\}$
- A configuration $L = (l_1, \ldots, l_n)$
- Edges $e_{ij} \in E$ that connect neighboring parts $v_i$, $v_j$
  - Explicit dependency between $l_i$ and $l_j$
  - $G = (V, E)$ forms a Markov Random Field
- Appearance parameters $A_i$ for each part
- Connection parameters $C_{ij}$ for each edge
[Figure: example graph with parts v1 to v6 and edges e13, e23, e34, e35, e46]

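5 Matching Algorithm Overview
We want the configuration $L$ of model $\Theta$ that maximizes
$P(L \mid I, \Theta) \propto P(I \mid L, \Theta) \cdot P(L \mid \Theta)$
where
$P(I \mid L, \Theta) = \prod_i P(I \mid l_i, \Theta) = \frac{1}{Z_1} e^{-\sum_i \mathrm{match}_i(l_i)}$
$P(L \mid \Theta) = \prod_{(v_i, v_j) \in E} P(l_i, l_j \mid C_{ij}) = \frac{1}{Z_2} e^{-\sum_{(v_i, v_j) \in E} d_{ij}(l_i, l_j)}$
This is equivalent to minimizing
$\sum_i \mathrm{match}_i(l_i) + \sum_{(v_i, v_j) \in E} d_{ij}(l_i, l_j)$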
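The minimization on this slide can be illustrated with a small min-sum dynamic program over a tree. This is a minimal sketch, not the authors' implementation: the function name `match_tree`, the integer location set, and the toy `match` and `d` cost functions are all assumptions made for illustration. It uses the plain (quadratic in the number of locations) recurrence rather than the linear-time variant.

```python
from typing import Callable, Dict, List, Tuple


def match_tree(
    children: Dict[int, List[int]],
    root: int,
    locations: List[int],
    match: Callable[[int, int], float],
    d: Callable[[int, int, int, int], float],
) -> Tuple[float, Dict[int, int]]:
    """Minimize sum_i match(i, l_i) + sum_(i,j) d(i, j, l_i, l_j) on a tree."""
    best_cost: Dict[int, Dict[int, float]] = {}  # part -> location -> subtree cost
    best_child: Dict[Tuple[int, int], Dict[int, int]] = {}  # (parent, child) -> parent loc -> child loc

    def up(i: int) -> None:
        # Solve all subtrees first, then combine them at part i.
        for c in children.get(i, []):
            up(c)
        best_cost[i] = {}
        for li in locations:
            total = match(i, li)
            for c in children.get(i, []):
                lc = min(locations, key=lambda l: best_cost[c][l] + d(i, c, li, l))
                best_child.setdefault((i, c), {})[li] = lc
                total += best_cost[c][lc] + d(i, c, li, lc)
            best_cost[i][li] = total

    up(root)
    root_loc = min(locations, key=lambda l: best_cost[root][l])

    # Walk back down the tree to read off the minimizing configuration.
    config = {root: root_loc}
    stack = [root]
    while stack:
        i = stack.pop()
        for c in children.get(i, []):
            config[c] = best_child[(i, c)][config[i]]
            stack.append(c)
    return best_cost[root][root_loc], config


# Toy example (hypothetical costs): a three-part chain 1 - 2 - 3 rooted at part 2.
example_cost, example_config = match_tree(
    {2: [1, 3]}, 2, [0, 1, 2],
    match=lambda i, l: abs(l - i + 1),        # part i prefers location i - 1
    d=lambda i, j, li, lj: (lj - li) ** 2,    # neighbors prefer to coincide
)
```

Each part's table is filled once per location, and each entry scans every child location, which is where the quadratic cost in the number of locations comes from.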

6 Linear-Time Matching Algorithm
A dynamic programming implementation runs in quadratic time
- Requires a tree configuration of parts
Felzenszwalb & Huttenlocher (2000) developed a linear-time matching algorithm
- Additional constraint on the part-to-part cost function $d_{ij}$
- Basic “trick”: parallelize the minimization computation over the entire grid using a generalized distance transform
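To make the distance-transform idea concrete, here is a minimal one-dimensional sketch. It assumes a part-to-part cost of the form slope * |p - q| (an L1 distance), which is a simplification chosen for illustration; the slides do not state the exact form of $d_{ij}$. Under that assumption, the inner minimization over all child locations can be computed for every parent location at once in two linear passes, instead of one scan per parent location.

```python
from typing import List


def distance_transform_l1(f: List[float], slope: float = 1.0) -> List[float]:
    """Return D with D[p] = min_q (f[q] + slope * |p - q|), using two linear passes."""
    out = list(f)
    for p in range(1, len(out)):            # forward pass: best q at or left of p
        out[p] = min(out[p], out[p - 1] + slope)
    for p in range(len(out) - 2, -1, -1):   # backward pass: best q at or right of p
        out[p] = min(out[p], out[p + 1] + slope)
    return out


def distance_transform_brute(f: List[float], slope: float = 1.0) -> List[float]:
    """Quadratic-time reference: scan every q for every p."""
    n = len(f)
    return [min(f[q] + slope * abs(p - q) for q in range(n)) for p in range(n)]


child_costs = [5.0, 1.0, 4.0, 0.0, 7.0, 9.0, 9.0, 2.0]  # hypothetical per-location costs
fast = distance_transform_l1(child_costs)
slow = distance_transform_brute(child_costs)
```

Both functions return the same values; only the running time differs, which is the point of the "trick" named on the slide.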

7 Pictorial Structures for Map Interpretation
Basic idea: build a pictorial structure that can model all configurations of a molecule
- Each part in the “collection of parts” corresponds to an atom
- The model has a low-cost conformation for low-energy states of the molecule

8 The Screw-Joint Model
Ideally, the cost function would equal the atomic energy.
Problem: the atomic energy function cannot be represented using pairwise potentials while maintaining a tree structure.
Solution: the screw-joint model
- Ignore non-bonded interactions
- Edges correspond to covalent bonds
- Allow free rotation around bonds

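9 Screw-Joint Model Details
Each part’s configuration has six parameters $(x, y, z, \alpha, \beta, \gamma)$, where
- $(x, y, z)$ is the part’s position
- $\alpha$ is the part’s rotation (about the bond connecting $v_i$ and $v_j$)
- $(\beta, \gamma)$ is the part’s orientation
Cost functions
- The part-to-part cost function $d_{ij}$ is based on the child’s deviation from ideal
- The matching cost function $\mathrm{match}_i$ is based on a 3x3x3 template match
[Figure: two bonded parts $v_i$ and $v_j$, labeled with positions $(x_i, y_i, z_i)$ and $(x_j, y_j, z_j)$, rotations $\alpha_i$ and $\alpha_j$, orientations $(\beta_i, \gamma_i)$ and $(\beta_j, \gamma_j)$, and $(x_{ij}, y_{ij}, z_{ij})$]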
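A 3x3x3 template match could look like the sketch below. The slides do not say which similarity measure is used, so the sum of squared differences here is an assumption, as are the function name `template_match_cost` and the nested-list grid representation. Rotating the template according to $(\alpha, \beta, \gamma)$ is left out.

```python
from typing import List

Grid = List[List[List[float]]]


def template_match_cost(density: Grid, template: Grid, cx: int, cy: int, cz: int) -> float:
    """Sum of squared differences between a 3x3x3 template and the density around (cx, cy, cz)."""
    nx, ny, nz = len(density), len(density[0]), len(density[0][0])
    # The 3x3x3 window must fit inside the grid; Python would silently wrap negative indices.
    if not (1 <= cx < nx - 1 and 1 <= cy < ny - 1 and 1 <= cz < nz - 1):
        raise ValueError("template window does not fit inside the density grid")
    cost = 0.0
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            for dz in (-1, 0, 1):
                diff = density[cx + dx][cy + dy][cz + dz] - template[dx + 1][dy + 1][dz + 1]
                cost += diff * diff
    return cost
```

A lower cost means the local density looks more like the part's template, which matches the role of $\mathrm{match}_i$ as a term to be minimized.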

10 Pictorial Structures for Map Interpretation
Ideally, we would
- Build a pictorial structure for the entire protein
- Run the matching algorithm to get the best layout
However, this is computationally infeasible.
Instead, we use a two-phase algorithm that
a) computes the best backbone trace
b) computes the best sidechain conformation (current focus)

11 Sidechain Refinement
Assume we have a rough Cα trace of the protein.
Next, use pictorial structure matching to place sidechains.
Walk along the chain one residue at a time, placing individual atoms.
[Figure: Cα trace with labeled positions Cα MET_80, Cα ARG_81, Cα ALA_82, Cα PRO_83]

12 Sidechain Refinement
Given:
- residue type
- approximate Cα locations
Find: the most likely location of the sidechain atoms in the residue
[Figure: alanine example, with atoms N, Cα, Cβ, C, O and neighboring atoms C-1, Cα-1, O-1, N+1, Cα+1 labeled, passed to the matching algorithm]

13 Learning Model Parameters
[Figure: alanine (atoms N, C-1, Cα, C, Cβ, O, N+1) in canonic orientation, with panels "Averaged 3D Template" and "Averaged Bond Geometry"]
Averaged bond geometry values shown in the figure:
- r = 1.53, θ = 0.0°, φ = -19.3°
- r = 1.51, θ = 118.4°, φ = -19.7°
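The two learned quantities named on this slide, an averaged 3D template and averaged bond geometry, might be computed along the lines below. This is only a sketch under stated assumptions: the slides do not describe the averaging procedure, the training cubes are assumed to be already rotated into the canonic orientation, and the use of a circular mean for angles is a choice made here, not something taken from the talk. The names `mean_template` and `circular_mean_deg` are hypothetical.

```python
import math
from typing import List

Grid = List[List[List[float]]]


def mean_template(cubes: List[Grid]) -> Grid:
    """Element-wise mean of equally sized, pre-aligned 3D density cubes."""
    n = len(cubes)
    first = cubes[0]
    return [
        [
            [sum(c[x][y][z] for c in cubes) / n for z in range(len(first[x][y]))]
            for y in range(len(first[x]))
        ]
        for x in range(len(first))
    ]


def circular_mean_deg(angles_deg: List[float]) -> float:
    """Mean of angles in degrees that respects wrap-around (result lies in [-180, 180])."""
    s = sum(math.sin(math.radians(a)) for a in angles_deg)
    c = sum(math.cos(math.radians(a)) for a in angles_deg)
    return math.degrees(math.atan2(s, c))
```

A plain arithmetic mean would give 180° for the angles 350° and 10°, whereas the circular mean gives a value at 0°, which is why angles are treated separately from bond lengths here.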

14 Soft Maximums
Sometimes we get an optimal match like the one shown in the figure.
When this occurs, explore the space of non-optimal solutions via soft maximums in the DP.
Basic idea: take a path with probability inversely proportional to its cost.
[Figure: ACTUAL vs. PREDICTED 1]

15 Soft Maximums
The figure shows soft maximums:
- The red molecule is eventually found
- Annealing increases “softness” until a legal structure is found
- The legal structure may not be the “right” one
[Figure: ACTUAL, PREDICTED 1, PREDICTED 2]
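One way to realize a soft maximum with annealing is sketched below. The slides only say that a path is taken with probability inversely related to its cost and that annealing increases "softness"; the Boltzmann weighting exp(-cost / temperature), the temperature schedule, and the helper names `soft_choice` and `anneal_until_legal` are assumptions for illustration.

```python
import math
import random
from typing import Callable, Iterable, List, Optional, Tuple, TypeVar

T = TypeVar("T")


def soft_choice(costs: List[float], temperature: float,
                rng: Optional[random.Random] = None) -> int:
    """Sample an index with probability proportional to exp(-cost / temperature)."""
    rng = rng or random.Random()
    lowest = min(costs)  # shift so the best option has weight 1 (avoids underflow to all zeros)
    weights = [math.exp(-(c - lowest) / temperature) for c in costs]
    return rng.choices(range(len(costs)), weights=weights, k=1)[0]


def anneal_until_legal(sample: Callable[[float], T],
                       is_legal: Callable[[T], bool],
                       temperatures: Iterable[float]) -> Optional[Tuple[T, float]]:
    """Try increasing temperatures until a sampled structure passes the legality check."""
    for t in temperatures:
        candidate = sample(t)
        if is_legal(candidate):
            return candidate, t
    return None
```

At a very low temperature `soft_choice` behaves like the ordinary minimum, and as the temperature rises, higher-cost alternatives are chosen more often. In a full implementation the sampling would replace each `min` inside the dynamic program.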

16 Results
Only sidechain refinement has been implemented and tested.
Experimental methodology
- Assume Cα positions are known to within 2 Å
- Trained on a 1.7 Å resolution protein, tested on a 1.9 Å resolution protein
- Templates built for ALA, VAL, TYR, LYS
Model parameters
- Grid spacing of 0.5 Å within a sphere of diameter 10 Å
- Rotational discretization: 12 rotational steps, 84 orientations
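The model parameters above determine how many discrete configurations each part can take. The sketch below counts them under two assumptions that the slides do not confirm: the grid is centered on the sphere, and the 12 rotational steps combine multiplicatively with the 84 orientations. The function name `grid_points_in_sphere` is hypothetical.

```python
def grid_points_in_sphere(diameter: float, spacing: float) -> int:
    """Count points of a cubic grid (centered on the sphere) that lie inside the sphere."""
    radius = diameter / 2.0
    steps = int(radius / spacing + 1e-9)  # whole grid steps from the center to the surface
    count = 0
    for i in range(-steps, steps + 1):
        for j in range(-steps, steps + 1):
            for k in range(-steps, steps + 1):
                if (i * i + j * j + k * k) * spacing * spacing <= radius * radius + 1e-9:
                    count += 1
    return count


positions = grid_points_in_sphere(diameter=10.0, spacing=0.5)  # values from the slide
rotations = 12
orientations = 84
configurations_per_part = positions * rotations * orientations
print(positions, configurations_per_part)
```

The count of positions is close to the sphere volume divided by the volume of one grid cell, and the product gives a sense of the table size the dynamic program must fill for every part.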

17 Sidechain Placement
Compared predicted vs. actual locations for 599 atoms on the test-set protein
- 29.9% of atoms within 0.5 Å
- 72.3% of atoms within 1.0 Å
- 93.0% of atoms within 2.0 Å
Recall the 0.5 Å grid spacing.
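The percentages on this slide are fractions of atoms whose predicted position falls within a distance threshold of the true position. A minimal sketch of that evaluation follows; the function name `fraction_within` and the coordinates are made up for illustration and are not data from the talk.

```python
import math
from typing import List, Sequence


def fraction_within(predicted: List[Sequence[float]],
                    actual: List[Sequence[float]],
                    threshold: float) -> float:
    """Fraction of atoms whose predicted position is within `threshold` of the actual one."""
    hits = sum(1 for p, a in zip(predicted, actual) if math.dist(p, a) <= threshold)
    return hits / len(actual)


# Hypothetical coordinates in angstroms: errors of 0.4, 0.9, and 3.0.
predicted = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 3.0, 0.0)]
actual = [(0.0, 0.0, 0.4), (1.0, 0.9, 0.0), (0.0, 0.0, 0.0)]
for cutoff in (0.5, 1.0, 2.0):
    print(cutoff, fraction_within(predicted, actual, cutoff))
```

Because positions are searched on a 0.5 Å grid, errors below that spacing are at the limit of what the discretization can resolve.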

18 Predictive Accuracy Task
We used the DP matching score as a predictor of amino acid type
- Tested 49 ALA, LYS, TYR, VAL residues
- The highest-scoring normalized template determined the type
- 61.2% accuracy (majority classification = 33%)
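The classification rule and the majority baseline can be sketched as follows. How the scores were normalized is not stated on the slide, so the function below simply takes already normalized scores as input; the names `predict_type`, `accuracy`, and `majority_baseline`, and the example scores, are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def predict_type(normalized_scores: Dict[str, float]) -> str:
    """Pick the residue type whose template has the highest normalized score."""
    return max(normalized_scores, key=normalized_scores.get)


def accuracy(predictions: List[str], truths: List[str]) -> float:
    """Fraction of residues whose predicted type equals the true type."""
    return sum(p == t for p, t in zip(predictions, truths)) / len(truths)


def majority_baseline(truths: List[str]) -> float:
    """Accuracy obtained by always guessing the most common type."""
    return Counter(truths).most_common(1)[0][1] / len(truths)


# Made-up scores for a single residue, one entry per template.
example = {"ALA": 0.2, "LYS": 0.9, "TYR": 0.1, "VAL": 0.4}
print(predict_type(example))
```

For scale, 61.2% of 49 residues corresponds to 30 correct predictions, since 30 / 49 ≈ 0.612.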

19 The Good…
[Figure: PREDICTED, ACTUAL, and PREDICTED vs. ACTUAL panels for LYSINE, VALINE, TYROSINE]

20 … and the Bad
[Figure: PREDICTED, ACTUAL, and PREDICTED vs. ACTUAL panels for LYSINE, ALANINE, TYROSINE, VALINE]

21 Future Work
- Implement and integrate the backbone tracing algorithm to create a complete two-tiered solution
- Better strategies to handle illegal molecule configurations
  - perturbation of branches involved in collisions
  - more accurate representation of the atomic energy function, e.g. torsion angle
- Better match function … make use of previous work?
- More tests (larger training set, higher resolution)

22 Acknowledgements
- NLM grant 1T15 LM007359-01
- NLM grant 1R01 LM07050-01
- NIH grant P50 GM64598

