Dr. Thomas R. Ioerger Department of Computer Science

TEXTAL: Applications of Pattern Recognition to Macromolecular Crystallography
Dr. Thomas R. Ioerger, Department of Computer Science, Texas A&M University
Collaboration with: Dr. James C. Sacchettini, Center for Structural Biology, Texas A&M
7/2/2019

Automating Structure Determination
Typical steps:
  obtain crystals
  collect data (e.g. MAD, at a synchrotron)
  determine an initial set of phases
  generate an electron density map
  density modification/phase refinement
  construct the model (atomic coordinates)

Automating Structure Determination
Existing computational routines: heavy atom search, Patterson correlation, solvent flattening, maximum likelihood phase combination
Few methods exist to interpret electron density maps
  requires humans: potential bottleneck
  difficulty: low resolution, phase errors, weak density
Must automate for structural genomics and rational drug design

Overview of TEXTAL
Apply pattern recognition techniques
Exploit a database of previously solved maps
Model molecular structures in local regions (e.g. spheres of 5 Å radius)
Intuitive principles:
  1) Have I ever seen a region with a pattern of density like this before?
  2) If so, what were the local atomic coordinates in that earlier region?

Overview (cont’d)
Divide-and-conquer:
  1) identify alpha-carbon positions (chain tracing)
  2) model regions around alpha-carbons (CAs), including backbone and side-chain atoms
  3) concatenate local models back together, resolve any conflicts
Database contains many regions centered on CAs from previous maps
A radius of ~5 Å is about right for “structural repetition”

Overview (cont’d)
Database: ~10^5 regions from ~100 maps
How to identify the closest match efficiently?
Calculate numerical features that represent the pattern in each region
  must be rotation-invariant
Search can be very fast: just compare features

Overview (cont’d)

Database Construction
Ideally would use solved MAD/MIR maps
Using “back-transformed” maps works well:
  PDB → structure factors (include B-factors)
  keep reflections down to 2.8 Å
  Fourier transform → electron density map
50 proteins from PDBSelect (non-homologous)
  about 50,000 regions
Feature extraction done offline
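For reference, the back-transformation outlined above corresponds to the standard structure-factor sum followed by a Fourier synthesis. The notation here is an assumption, not taken from the slides: $f_j$ is the atomic scattering factor, $B_j$ the B-factor, $(x_j, y_j, z_j)$ the fractional coordinates of atom $j$, $V$ the unit-cell volume, and $d_{hkl}$ the resolution of a reflection.

```latex
F(hkl) = \sum_{j} f_j \,
         \exp\!\left(-B_j \frac{\sin^2\theta}{\lambda^2}\right)
         \exp\!\left[2\pi i\,(h x_j + k y_j + l z_j)\right]

\rho(x,y,z) = \frac{1}{V} \sum_{hkl,\; d_{hkl} \ge 2.8\,\text{\AA}}
              F(hkl)\, \exp\!\left[-2\pi i\,(h x + k y + l z)\right]
```

Restricting the second sum to reflections with $d_{hkl} \ge 2.8$ Å is one way to read "keep reflections down to 2.8 Å".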

Rotation-Invariant Features
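Average density: $m = \frac{1}{n}\sum_i \rho_i$, where $\rho_i$ is the density at each lattice point in the region
Other statistical features: standard deviation, kurtosis…
Distance to center of mass (coordinates taken relative to the region center):
  $\langle x_c, y_c, z_c \rangle = \frac{1}{n}\left\langle \sum_i x_i\rho_i/m,\ \sum_i y_i\rho_i/m,\ \sum_i z_i\rho_i/m \right\rangle$
  $d_{cen} = \sqrt{x_c^2 + y_c^2 + z_c^2}$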
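A minimal Python sketch of the statistical features above, assuming a region is given as a list of (x, y, z, rho) lattice samples with coordinates measured from the region center and a positive total density. The names region_features and rotate_z90 and the sample values are hypothetical. It also checks that the features do not change when the region is rotated.

```python
import math

def region_features(points):
    """points: list of (x, y, z, rho) lattice samples, with coordinates
    measured from the region center (an assumption of this sketch)."""
    n = len(points)
    rho = [p[3] for p in points]
    mean = sum(rho) / n                       # average density m
    var = sum((r - mean) ** 2 for r in rho) / n
    std = math.sqrt(var)
    # Kurtosis as fourth central moment over squared variance.
    kurt = (sum((r - mean) ** 4 for r in rho) / n) / var ** 2 if var > 0 else 0.0
    total = n * mean                          # assumed to be positive
    # Density-weighted center of mass: (1/n) * sum(x_i * rho_i) / m.
    xc, yc, zc = (sum(p[k] * p[3] for p in points) / total for k in range(3))
    d_cen = math.sqrt(xc * xc + yc * yc + zc * zc)
    return {"mean": mean, "std": std, "kurtosis": kurt, "d_cen": d_cen}

def rotate_z90(points):
    """Rotate every sample by 90 degrees about the z axis."""
    return [(-y, x, z, rho) for x, y, z, rho in points]

region = [(1, 0, 0, 2.0), (0, 1, 0, 1.0), (0, 0, 1, 1.0), (-1, 0, 0, 0.0)]
before = region_features(region)
after = region_features(rotate_z90(region))
```

Because the density values and the length of the center-of-mass vector are unchanged by rotation, before and after agree on every feature.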

More Features
Moments of inertia
  measure dispersion around axes of symmetry in a density distribution
  calculate the 3x3 inertia matrix
  diagonalize to get eigenvalues
  sort from largest to smallest
  take magnitudes and ratios of moments
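The moment-of-inertia steps above could be sketched as follows, assuming a density-weighted inertia matrix taken about the density-weighted center of mass. The eigenvalues come from the standard closed-form solution for symmetric 3x3 matrices, so no external library is needed. The function names and the particular ratios (each smaller moment divided by the largest) are illustrative assumptions; the slides do not say which ratios are used.

```python
import math

def inertia_matrix(points):
    """points: list of (x, y, z, rho). Returns the density-weighted 3x3
    inertia matrix about the density-weighted center of mass."""
    total = sum(p[3] for p in points)
    c = [sum(p[k] * p[3] for p in points) / total for k in range(3)]
    m = [[0.0] * 3 for _ in range(3)]
    for x, y, z, rho in points:
        r = (x - c[0], y - c[1], z - c[2])
        r2 = r[0] ** 2 + r[1] ** 2 + r[2] ** 2
        for a in range(3):
            for b in range(3):
                m[a][b] += rho * ((r2 if a == b else 0.0) - r[a] * r[b])
    return m

def det3(m):
    """Determinant of a 3x3 matrix."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def symmetric_eigenvalues(m):
    """Closed-form eigenvalues of a symmetric 3x3 matrix, largest first."""
    p1 = m[0][1] ** 2 + m[0][2] ** 2 + m[1][2] ** 2
    if p1 == 0.0:  # already diagonal
        return sorted((m[0][0], m[1][1], m[2][2]), reverse=True)
    q = (m[0][0] + m[1][1] + m[2][2]) / 3.0
    p2 = sum((m[i][i] - q) ** 2 for i in range(3)) + 2.0 * p1
    p = math.sqrt(p2 / 6.0)
    b = [[(m[i][j] - (q if i == j else 0.0)) / p for j in range(3)]
         for i in range(3)]
    r = max(-1.0, min(1.0, det3(b) / 2.0))  # clamp against round-off
    phi = math.acos(r) / 3.0
    e1 = q + 2.0 * p * math.cos(phi)
    e3 = q + 2.0 * p * math.cos(phi + 2.0 * math.pi / 3.0)
    e2 = 3.0 * q - e1 - e3
    return [e1, e2, e3]

def moment_features(points):
    """Sorted moments plus ratios to the largest moment (assumed choice)."""
    e1, e2, e3 = symmetric_eigenvalues(inertia_matrix(points))
    ratios = (e2 / e1, e3 / e1) if e1 > 0 else (0.0, 0.0)
    return {"moments": (e1, e2, e3), "ratios": ratios}

# Two unit-density points on a diagonal, and the same pair rotated onto x.
s = math.sqrt(2.0)
diagonal = moment_features([(1, 1, 0, 1.0), (-1, -1, 0, 1.0)])
on_axis = moment_features([(s, 0, 0, 1.0), (-s, 0, 0, 1.0)])
```

The two point sets differ only by a rotation, so their sorted moments agree, which is the property that makes the eigenvalues usable as rotation-invariant features.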

More Features
Spoke angles
  if the region is centered on a CA, it should have 3 “spokes” of density emanating from the center
  find best-fit vectors; calculate angles among them
Surface area of contours
Connectivity of density/bones in region
Other geometrical features...
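As a small illustration of the spoke-angle feature, the sketch below takes three spoke vectors as given (the best-fit step that finds them is not shown and is not described in the slides) and returns the pairwise angles. Sorting the angles is an assumption made here so that the result does not depend on the order in which the spokes are listed. The function names are hypothetical.

```python
import math
from itertools import combinations

def angle_deg(u, v):
    """Angle between two 3D vectors, in degrees."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    cosine = max(-1.0, min(1.0, dot / (nu * nv)))  # clamp against round-off
    return math.degrees(math.acos(cosine))

def spoke_angles(spokes):
    """Sorted pairwise angles among the spoke vectors."""
    return sorted(angle_deg(u, v) for u, v in combinations(spokes, 2))

orthogonal = spoke_angles([(1, 0, 0), (0, 1, 0), (0, 0, 1)])
t_shaped = spoke_angles([(1, 0, 0), (-1, 0, 0), (0, 2, 0)])
```

Angles between vectors are unchanged by rotating the whole region, so they fit the rotation-invariance requirement stated earlier.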

Details of Matching Process
Feature-based matching: Euclidean distance metric between feature vectors
  $\mathrm{dist}(R_1,R_2) = \sum_i w_i\,\bigl(F_i(R_1) - F_i(R_2)\bigr)^2$
Must weight features by relevance
  less-relevant features add noise
Slider algorithm: optimize weights by comparing features in matching regions versus mismatches
Verify selections by density correlation
  requires search for optimal rotation
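The weighted distance above can be written directly in Python. The sketch uses the formula as stated (a weighted sum of squared differences, with no square root, which gives the same ranking). The toy database, the query, and the weight values are made up for illustration; the Slider weight optimization itself is not shown.

```python
def weighted_distance(f1, f2, weights):
    """Weighted sum of squared feature differences between two regions."""
    return sum(w * (a - b) ** 2 for w, a, b in zip(weights, f1, f2))

# Hypothetical two-feature database; the second feature is treated as noisy.
query = [1.0, 0.2]
database = {"a": [1.0, 0.0], "b": [0.0, 5.0], "c": [1.1, 9.0]}

def rank(weights):
    """Region names ordered from closest to farthest from the query."""
    return sorted(database,
                  key=lambda name: weighted_distance(query, database[name], weights))

equal_weights = rank([1.0, 1.0])   # noisy feature dominates the ordering
tuned_weights = rank([1.0, 0.0])   # noisy feature switched off
```

With equal weights, region c is pushed to the end by the second feature alone; with that feature down-weighted, c moves ahead of b. This is the effect the slide describes as less-relevant features adding noise.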

Experiments
Goal: evaluate potential of pattern matching
Assumption: CA positions known
Procedure:
  1. extract features for each region
  2. collect top K=400 feature-based matches in DB
  3. calculate density correlation, take best match
  4. rotate backbone + side-chain atoms into position
~30 sec/residue on SGI Origin 2000
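Steps 2 and 3 of the procedure form a filter-then-verify lookup, which might be sketched as below. This is a simplified illustration: each region is a dictionary with hypothetical keys "features" and "density", the density samples are assumed to be already rotated into a common frame (the search for the optimal rotation is omitted), and the correlation is a plain Pearson coefficient.

```python
import heapq
import math

def feature_distance(f1, f2, weights):
    """Weighted sum of squared feature differences."""
    return sum(w * (a - b) ** 2 for w, a, b in zip(weights, f1, f2))

def density_correlation(a, b):
    """Pearson correlation between two equally sampled density grids."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / math.sqrt(va * vb)

def lookup(query, database, weights, k=400):
    """Shortlist the k nearest regions by features, then keep the one whose
    density correlates best with the query."""
    shortlist = heapq.nsmallest(
        k, database,
        key=lambda r: feature_distance(query["features"], r["features"], weights))
    return max(shortlist,
               key=lambda r: density_correlation(query["density"], r["density"]))

query = {"features": [1.0], "density": [0.0, 1.0, 2.0, 3.0]}
database = [
    {"id": "A", "features": [1.1], "density": [3.0, 2.0, 1.0, 0.0]},
    {"id": "B", "features": [0.8], "density": [0.0, 1.0, 2.0, 2.5]},
    {"id": "C", "features": [9.0], "density": [0.0, 1.0, 2.0, 3.0]},
]
best_k2 = lookup(query, database, weights=[1.0], k=2)["id"]
best_k3 = lookup(query, database, weights=[1.0], k=3)["id"]
```

In this toy data, region C has the best density correlation but poor features, so it is only found when the shortlist is large enough to include it. That trade-off is the reason for a generous K before the more expensive correlation step.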

Feature Weights

Results
  1gcn = glucagon
  1fnb = ferredoxin reductase
  1tup = p53 tumor suppressor
  IFABP = intestinal fatty acid binding protein
  BT = back-transformed

Results
Structural similarity groups:
  Ala
  Asp, Asn, Leu
  Gly
  Glu, Gln
  Pro
  Arg, Lys, Met
  Cys, Ser
  Phe, Trp, Tyr, His
  Ile, Val, Thr

Results

Example: Portion of 1tup

Example: Glucagon

Post-Processing Routines
Concatenate local models per amino acid into a PDB file
Detect and repair flips by majority chain direction
Utilize amino acid sequence information
  map chains into known sequence (alignment)
  re-lookup residues based on identity
Real-space refinement
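The flip-detection step could look roughly like the sketch below. The representation is an assumption made for illustration: each residue's local model is reduced to +1 or -1 according to whether its backbone runs with or against the order of the traced chain, and residues that disagree with the majority are flagged. How a flagged residue is then repaired is not covered here.

```python
def detect_flips(directions):
    """directions: list of +1/-1 values, one per residue along a chain.
    Returns the majority direction and the indices that disagree with it."""
    majority = 1 if sum(directions) >= 0 else -1
    flipped = [i for i, d in enumerate(directions) if d != majority]
    return majority, flipped

# Hypothetical chain of five residues with one local model placed backwards.
majority, flipped = detect_flips([1, 1, -1, 1, 1])
```

Here the third residue (index 2) is the only one running against the majority and would be the candidate for repair.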

CAPRA
Need to find CAs automatically and accurately
Bones doesn’t identify CAs (except branches)
Use pattern recognition again
  extract features for all lattice points inside the 1σ contour, or along the trace
  use a neural net to predict distance to true CA
  training set: examples of {<F1,F2…>,Di}
Status: currently 1 Å rms, need to get
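To make the training-set construction concrete, the sketch below pairs each lattice point's feature vector with its distance to the nearest true CA, giving the {<F1,F2…>,Di} examples mentioned above. The feature function is a placeholder supplied by the caller, and the coordinates are invented; the neural network itself is not shown. Python 3.8 or later is assumed for math.dist.

```python
import math

def distance_to_nearest_ca(point, ca_positions):
    """Target value Di: distance from a lattice point to the closest true CA."""
    return min(math.dist(point, ca) for ca in ca_positions)

def build_training_set(lattice_points, feature_fn, ca_positions):
    """Return (feature_vector, Di) pairs, one per lattice point.
    feature_fn is a hypothetical callable mapping a point to its features."""
    return [(feature_fn(p), distance_to_nearest_ca(p, ca_positions))
            for p in lattice_points]

# Invented coordinates in Angstroms; the placeholder feature is just x.
true_cas = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0)]
lattice = [(1.0, 0.0, 0.0), (3.8, 1.0, 0.0)]
training_set = build_training_set(lattice, lambda p: [p[0]], true_cas)
```

A regressor trained on such pairs would then be applied to points in a new map, where low predicted distances mark likely CA positions.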

Example

Acknowledgements
Dr. James C. Sacchettini, Center for Structural Biology, Texas A&M
Graduate students/post-docs: Dr. Jon Christopher, Tom Holton, Lydia Tapia
Funding provided by: NIH (GM-59398)
See our forthcoming paper in: Acta Cryst. D
