Dr. Thomas R. Ioerger Department of Computer Science


1 TEXTAL: Applications of Pattern Recognition to Macromolecular Crystallography
Dr. Thomas R. Ioerger, Department of Computer Science, Texas A&M University. Collaboration with: Dr. James C. Sacchettini, Center for Structural Biology, Texas A&M. 7/2/2019

2 Automating Structure Determination
Typical steps: 1) obtain crystals; 2) collect data (e.g., MAD, at a synchrotron); 3) determine an initial set of phases; 4) generate an electron density map; 5) density modification/phase refinement; 6) construct a model (atomic coordinates).

3 Automating Structure Determination
Existing computational routines: heavy-atom search, Patterson correlation, solvent flattening, maximum-likelihood phase combination. Few methods exist to interpret electron density maps, so this step requires humans and is a potential bottleneck. Difficulties: low resolution, phase errors, weak density. Automating it is essential for structural genomics and rational drug design.

4 Overview of TEXTAL
Apply pattern recognition techniques. Exploit a database of previously solved maps. Model molecular structures in local regions (e.g., spheres of 5 Å radius). Intuitive principles: 1) Have I ever seen a region with a pattern of density like this before? 2) If so, what were the local atomic coordinates in that region?

5 Overview (cont’d)
Divide-and-conquer: 1) identify alpha-carbon positions (chain-tracing); 2) model the regions around the alpha-carbons (CAs), including backbone and side-chain atoms; 3) concatenate the local models back together and resolve any conflicts. The database contains many regions centered on CAs from previously solved maps; a radius of ~5 Å is about right for capturing “structural repetition”.

6 Overview (cont’d)
Database: ~10^5 regions from ~100 maps. How can the closest match be identified efficiently? Calculate numerical features that represent the pattern in each region. The features must be rotation-invariant. The search can then be very fast: just compare features.

7 Overview (cont’d)

8 Database Construction
Ideally the database would use solved MAD/MIR maps, but “back-transformed” maps work well: PDB → structure factors (including B-factors), keeping reflections down to 2.8 Å; Fourier transform → electron density map. 50 proteins from PDBSelect (non-homologous), about 50,000 regions. Feature extraction is done offline.
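The following is a minimal sketch of the back-transformation idea. It is written in Python with NumPy, which is an assumption because the slides name no language or tools. It is a simplified stand-in rather than the authors' pipeline. It assumes a cubic P1 cell and models each atom as a unit-weight Gaussian whose per-axis variance is B/(8π²), instead of using element-specific scattering factors. It applies the 2.8 Å cutoff by zeroing FFT coefficients with |s| > 1/2.8 Å⁻¹. The function names `atoms_to_density` and `truncate_resolution` are hypothetical.

```python
import numpy as np


def atoms_to_density(coords, b_factors, cell_edge, n):
    """Toy density on an n^3 grid for a cubic P1 cell (hypothetical simplification).

    Each atom is a unit-weight 3-D Gaussian with per-axis variance B / (8 pi^2),
    the isotropic thermal displacement implied by its B-factor.
    """
    axis = np.arange(n) * cell_edge / n
    gx, gy, gz = np.meshgrid(axis, axis, axis, indexing="ij")
    rho = np.zeros((n, n, n))
    for (x, y, z), b in zip(coords, b_factors):
        var = b / (8.0 * np.pi**2)
        # Minimum-image offsets so each atom's density wraps around the periodic cell.
        dx = (gx - x + cell_edge / 2) % cell_edge - cell_edge / 2
        dy = (gy - y + cell_edge / 2) % cell_edge - cell_edge / 2
        dz = (gz - z + cell_edge / 2) % cell_edge - cell_edge / 2
        r2 = dx**2 + dy**2 + dz**2
        rho += np.exp(-r2 / (2.0 * var)) / (2.0 * np.pi * var) ** 1.5
    return rho


def truncate_resolution(rho, cell_edge, d_min=2.8):
    """Keep only Fourier terms with resolution d = 1/|s| >= d_min (in Å), then transform back."""
    n = rho.shape[0]
    coeffs = np.fft.fftn(rho)  # structure-factor-like coefficients (up to scaling)
    s_axis = np.fft.fftfreq(n, d=cell_edge / n)  # reciprocal coordinates in 1/Å
    sx, sy, sz = np.meshgrid(s_axis, s_axis, s_axis, indexing="ij")
    s = np.sqrt(sx**2 + sy**2 + sz**2)
    coeffs[s > 1.0 / d_min] = 0.0  # drop reflections beyond the cutoff; F(000) is kept
    return np.fft.ifftn(coeffs).real
```

For example, `truncate_resolution(atoms_to_density(coords, b_factors, 20.0, 32), 20.0)` would give a 2.8 Å map for atoms in a 20 Å cell. The F(000) term is never removed, so the mean density is preserved.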

9 Rotation-Invariant Features
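Average density: $\mu = \frac{1}{n}\sum_i \rho_i$, where $\rho_i$ is the density at each lattice point in the region. Other statistical features: standard deviation, kurtosis, etc. Distance to center of mass: $\langle x_c, y_c, z_c \rangle = \frac{1}{n}\left\langle \sum_i \frac{x_i \rho_i}{\mu}, \sum_i \frac{y_i \rho_i}{\mu}, \sum_i \frac{z_i \rho_i}{\mu} \right\rangle$ and $d_{cen} = \sqrt{x_c^2 + y_c^2 + z_c^2}$.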
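The feature formulas on this slide translate directly into code. This sketch is in Python with NumPy and uses the hypothetical helper name `statistical_features`. It assumes coordinates are measured from the region's center. It also assumes densities are non-negative, so μ > 0; real maps with F(000) omitted have zero mean and would need an offset first. Kurtosis is taken as the non-excess fourth standardized moment, which is one common convention.

```python
import numpy as np


def statistical_features(rho, coords):
    """Rotation-invariant statistics for one region (hypothetical helper).

    rho:    densities at the n lattice points of the region (assumed non-negative,
            so the mean mu is positive and the center-of-mass formula is defined)
    coords: (n, 3) lattice-point positions relative to the region's center
    """
    rho = np.asarray(rho, dtype=float)
    coords = np.asarray(coords, dtype=float)
    n = rho.size
    mu = rho.sum() / n  # average density, mu = (1/n) sum_i rho_i
    sigma = rho.std()  # population standard deviation
    # Kurtosis (non-excess): fourth central moment over sigma^4; undefined for flat density.
    kurtosis = ((rho - mu) ** 4).mean() / sigma**4 if sigma > 0 else float("nan")
    # Center of mass: (1/n) * sum_i x_i rho_i / mu, component-wise.
    center = (coords * rho[:, None]).sum(axis=0) / (n * mu)
    d_cen = float(np.linalg.norm(center))  # distance from region center to center of mass
    return {"mean": mu, "std": sigma, "kurtosis": kurtosis, "d_cen": d_cen}
```

Each value depends only on densities and distances. Rotating `coords` about the region center therefore leaves all four features unchanged.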

10 More Features
Moments of inertia measure the dispersion of a density distribution around its axes of symmetry. Calculate the 3×3 inertia matrix, diagonalize it to get the eigenvalues, sort them from largest to smallest, and take the magnitudes and ratios of the moments.
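Below is a sketch of the moment-of-inertia features, again in Python with NumPy. Two details are assumptions, since the slide does not specify them. First, the moments are taken about the density's center of mass. Second, the ratios are those of consecutive sorted moments. `inertia_features` is a hypothetical name.

```python
import numpy as np


def inertia_features(rho, coords):
    """Principal moments of inertia of a region's density and their ratios.

    Assumption: moments are taken about the density's center of mass; the slide does
    not say which origin TEXTAL used. Densities are assumed non-negative.
    """
    rho = np.asarray(rho, dtype=float)
    coords = np.asarray(coords, dtype=float)
    com = (coords * rho[:, None]).sum(axis=0) / rho.sum()
    r = coords - com
    r2 = (r**2).sum(axis=1)
    # 3x3 inertia matrix: I = sum_i rho_i (|r_i|^2 * Identity - r_i r_i^T)
    inertia = (rho * r2).sum() * np.eye(3) - np.einsum("i,ij,ik->jk", rho, r, r)
    # Diagonalize (the matrix is symmetric) and sort eigenvalues largest first.
    moments = np.sort(np.linalg.eigvalsh(inertia))[::-1]
    # Ratios of consecutive moments; assumes the smaller moment is nonzero.
    ratios = moments[:-1] / moments[1:]
    return moments, ratios
```

The eigenvalues of the inertia matrix do not change when the region is rotated or translated, which makes them usable as rotation-invariant features.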

11 More Features
Spoke angles: if the region is centered on a CA, it should have 3 “spokes” of density emanating from the center; find best-fit vectors and calculate the angles among them. Surface area of contours. Connectivity of density/bones in the region. Other geometrical features...
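The spoke-angle feature might be computed as below. The sketch assumes that the lattice points belonging to each spoke have already been identified (the slide does not say how) and that the region center is at the origin. The best-fit vector here is the leading singular vector of each spoke's points. That is a standard least-squares choice but not necessarily TEXTAL's. The function names are hypothetical.

```python
from itertools import combinations

import numpy as np


def best_fit_direction(points):
    """Unit vector from the region center (origin) that best fits one spoke's points.

    The first right-singular vector minimizes the summed squared perpendicular
    distances of the points to a line through the origin.
    """
    p = np.asarray(points, dtype=float)
    _, _, vt = np.linalg.svd(p)
    v = vt[0]
    # The SVD sign is arbitrary; orient the vector toward the spoke's points.
    if p.mean(axis=0) @ v < 0:
        v = -v
    return v


def spoke_angles(spokes):
    """Sorted pairwise angles (degrees) between best-fit spoke vectors.

    Sorting makes the feature independent of the order in which spokes were found.
    """
    dirs = [best_fit_direction(s) for s in spokes]
    angles = []
    for a, b in combinations(dirs, 2):
        cos_ab = np.clip(a @ b, -1.0, 1.0)  # guard arccos against rounding
        angles.append(float(np.degrees(np.arccos(cos_ab))))
    return sorted(angles)
```

Angles between vectors are preserved by rotation, so the sorted angle list is rotation-invariant as the matching scheme requires.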

12 Details of Matching Process
Feature-based matching uses a weighted Euclidean distance metric between feature vectors: $\mathrm{dist}(R_1, R_2) = \sum_i w_i \left( F_i(R_1) - F_i(R_2) \right)^2$. Features must be weighted by relevance, because less-relevant features add noise. Slider algorithm: optimize the weights by comparing features in matching regions versus mismatches. Selections are verified by density correlation, which requires a search for the optimal rotation.
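The distance formula and a brute-force nearest-region lookup can be sketched as follows. The sketch is in Python with NumPy, and the names are hypothetical. The weights are simply passed in. The Slider algorithm that optimizes them is not described on the slide in enough detail to reproduce.

```python
import numpy as np


def weighted_distance(f1, f2, w):
    """dist(R1, R2) = sum_i w_i (F_i(R1) - F_i(R2))^2 for two feature vectors."""
    f1, f2, w = (np.asarray(x, dtype=float) for x in (f1, f2, w))
    return float(np.sum(w * (f1 - f2) ** 2))


def closest_regions(query, db_features, w, k):
    """Indices of the k database regions with the smallest weighted distance to `query`.

    db_features: (m, p) array, one row of p features per stored region.
    """
    db = np.asarray(db_features, dtype=float)
    d = (np.asarray(w, dtype=float) * (db - np.asarray(query, dtype=float)) ** 2).sum(axis=1)
    # Stable sort so ties keep database order.
    return np.argsort(d, kind="stable")[:k]
```

Setting a weight to zero removes that feature from the comparison entirely. This shows in miniature why down-weighting noisy features changes which regions are retrieved.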

13 Experiments
Goal: evaluate the potential of pattern matching. Assumption: CA positions are known. Procedure: 1. extract features for each region; 2. collect the top K = 400 feature-based matches in the database; 3. calculate density correlation and take the best match; 4. rotate the matched backbone and side-chain atoms into position. Run time: ~30 sec/residue on an SGI Origin 2000.
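The four-step procedure can be sketched as a two-stage lookup. First, rank the database by weighted feature distance and keep the top K = 400. Then choose the candidate with the best density correlation over a set of trial rotations. This is a hypothetical Python/NumPy sketch. Each stored region is represented by a callable that returns its density at arbitrary points, standing in for interpolation on the stored map. The rotation search is a crude sampling of random rotations, not whatever optimizer TEXTAL used. Step 4, placing the matched atoms, would apply the winning rotation to the stored coordinates and is not shown.

```python
import numpy as np


def density_correlation(a, b):
    """Pearson correlation between two density samples taken at the same points."""
    a = np.asarray(a, dtype=float) - np.mean(a)
    b = np.asarray(b, dtype=float) - np.mean(b)
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0


def random_rotations(count, rng):
    """Rotation matrices from uniformly random unit quaternions (w, x, y, z)."""
    mats = []
    for q in rng.normal(size=(count, 4)):
        w, x, y, z = q / np.linalg.norm(q)
        mats.append(np.array([
            [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
            [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
            [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
        ]))
    return mats


def match_region(query_feat, query_rho, points, db_feat, db_density, w, rotations, k=400):
    """Two-stage lookup: top-k by weighted feature distance, then best density correlation.

    db_density[i] is a (hypothetical) callable returning region i's density at given
    points; each candidate is scored at its best rotation among `rotations`.
    """
    db_feat = np.asarray(db_feat, dtype=float)
    diff = db_feat - np.asarray(query_feat, dtype=float)
    d = (np.asarray(w, dtype=float) * diff**2).sum(axis=1)
    top = np.argsort(d, kind="stable")[:k]
    scores = [
        max(density_correlation(query_rho, db_density[i](points @ rot.T)) for rot in rotations)
        for i in top
    ]
    return int(top[int(np.argmax(scores))])
```

The cheap feature comparison prunes the database first. The expensive rotation search then runs only on the K survivors, which is what makes the per-residue cost manageable.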

14 Feature Weights

15 Results
1gcn = glucagon; 1fnb = ferredoxin reductase; 1tup = p53 tumor suppressor; IFABP = intestinal fatty acid binding protein; BT = back-transformed.

16 Results
Structural similarity groups: {Ala}; {Asp, Asn, Leu}; {Gly}; {Glu, Gln}; {Pro}; {Arg, Lys, Met}; {Cys, Ser}; {Phe, Trp, Tyr, His}; {Ile, Val, Thr}.

17 Results

18 Example: Portion of 1tup

19 Example: Glucagon

20 Post-Processing Routines
Concatenate the local models for each amino acid into a PDB file. Detect and repair flips using the majority chain direction. Use amino acid sequence information: map chains onto the known sequence (alignment), then re-look-up residues based on their identity. Real-space refinement.
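One plausible reading of "detect and repair flips by majority chain direction" is sketched below. Each residue's local model votes on whether its N→C direction runs along or against the order of the traced CAs. Residues that disagree with the majority are flagged. The representation and the name `chain_direction_flips` are assumptions. The repair itself, for example re-looking up the flagged residues, is left out.

```python
import numpy as np


def chain_direction_flips(ca, n_atoms, c_atoms):
    """Flag residues whose local model runs against the chain's majority direction.

    ca, n_atoms, c_atoms: (L, 3) coordinates per residue in trace order (L >= 2).
    Each local model's N->C vector is compared with the trace direction there.
    Returns (majority, flipped): majority is +1 if most models run along the trace
    order and -1 otherwise; flipped lists the indices of disagreeing residues.
    """
    ca, n_atoms, c_atoms = (np.asarray(x, dtype=float) for x in (ca, n_atoms, c_atoms))
    # Local trace direction: central differences inside, one-sided at the ends.
    trace = np.gradient(ca, axis=0)
    along = np.einsum("ij,ij->i", c_atoms - n_atoms, trace)
    votes = np.where(along >= 0, 1, -1)
    majority = 1 if votes.sum() >= 0 else -1
    flipped = np.nonzero(votes != majority)[0].tolist()
    return majority, flipped
```

A majority vote tolerates isolated mistakes. A few wrongly oriented local models cannot overturn the direction implied by the rest of the chain.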

21 CAPRA
Need to find CAs automatically and accurately. Bones does not identify CAs (except at branch points). Use pattern recognition again: extract features for all lattice points inside the 1σ contour, or along the trace, and use a neural net to predict the distance to the true CA. Training set: examples of $\{\langle F_1, F_2, \ldots \rangle, D_i\}$. Status: currently 1 Å rms, need to get
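Below is a sketch of how CAPRA's training examples {⟨F1, F2, …⟩, Di} might be assembled, in Python with NumPy; the names are hypothetical. It assumes the 1σ contour means density at least one standard deviation above the map mean. It takes Di as the distance from each selected lattice point to the nearest CA of the known model. The feature function is supplied by the caller.

```python
import numpy as np


def capra_training_set(rho, coords, true_ca, feature_fn):
    """Build (features, D) pairs for lattice points inside an assumed 1-sigma contour.

    rho:        densities at the lattice points
    coords:     (n, 3) lattice-point positions
    true_ca:    (m, 3) CA positions from the known model
    feature_fn: hypothetical callable mapping a point to its feature vector
    """
    rho = np.asarray(rho, dtype=float)
    coords = np.asarray(coords, dtype=float)
    true_ca = np.asarray(true_ca, dtype=float)
    # Assumption: the "1-sigma contour" encloses points at least one standard
    # deviation above the map mean.
    inside = coords[rho >= rho.mean() + rho.std()]
    # D_i: distance from each selected point to its nearest true CA.
    diff = inside[:, None, :] - true_ca[None, :, :]
    dist = np.sqrt((diff**2).sum(axis=2)).min(axis=1)
    features = np.array([feature_fn(p) for p in inside])
    return features, dist
```

The resulting `features` and `dist` arrays could then be fed to a regression model. The slide specifies a neural net, which is not reproduced here.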

22 Example

23 See our forthcoming paper in: Acta Cryst. D
Acknowledgements: Dr. James C. Sacchettini, Center for Structural Biology, Texas A&M. Graduate students/post-docs: Dr. Jon Christopher, Tom Holton, Lydia Tapia. Funding provided by: NIH (GM-59398).

