The TEXTAL System: Automated Model-Building Using Pattern Recognition Techniques Dr. Thomas R. Ioerger Department of Computer Science Texas A&M University.

The TEXTAL System: Automated Model-Building Using Pattern Recognition Techniques
Dr. Thomas R. Ioerger, Department of Computer Science, Texas A&M University
Collaboration with: Dr. James C. Sacchettini, Center for Structural Biology, Texas A&M Univ.
With support from: National Institutes of Health

Automated Structure Determination
- Key step toward high-throughput structural genomics, structure-based drug design, etc.
- Many computational tools exist to generate a map, but given an electron density map, how can atomic coordinates be extracted automatically?
- Currently requires humans (+O): a potential bottleneck
- Sources of difficulty: complexity, low resolution, phase errors, weak density
- Related methods: Shake&Bake, ARP/wARP, X-Powerfit, template convolution...

Overview of TEXTAL
- Apply pattern recognition techniques
- Exploit a database of previously solved maps
- Model molecular structures in local regions (e.g. spheres of 5 Angstrom radius)
- Intuitive principles:
  1) Have I ever seen a region with a pattern of density like this before?
  2) If so, what were the local atomic coordinates in that previous case?

Overview (cont’d)
- Divide-and-conquer:
  1) identify alpha-carbon positions (chain tracing)
  2) model regions around alpha-carbons (CAs), including backbone and side-chain atoms
  3) concatenate the local models back together and resolve any conflicts
- Database contains many regions centered on CAs from previous maps
- A radius of ~5 Å is about right for “structural repetition”

Main Stages of TEXTAL
electron density map → CAPRA → C-alpha chains → LOOKUP → model (initial coordinates) → post-processing routines → model (final coordinates)
- LOOKUP: builds in side-chain and main-chain atoms locally around each CA
- Post-processing example: real-space refinement
- Also shown in the diagram: reciprocal-space refinement/ML DM; human crystallographer (editing)

Feature Extraction
- Database: ~10^5 regions from ~100 maps
- How to identify the closest match efficiently?
- Calculate numerical features that represent the pattern of density in each region
- Features must be rotation-invariant
- Search can then be very fast: just compare features

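Rotation-Invariant Features
- Average density: $\bar{\rho} = (1/n) \sum_i \rho_i$, where $\rho_i$ is the density at each lattice point in the region
- Other statistical features: standard deviation, kurtosis...
- Distance to center of mass:
  - $\langle x_c, y_c, z_c \rangle = (1/n) \langle \sum_i x_i \rho_i / \bar{\rho},\ \sum_i y_i \rho_i / \bar{\rho},\ \sum_i z_i \rho_i / \bar{\rho} \rangle$
  - $d_{cen} = \sqrt{x_c^2 + y_c^2 + z_c^2}$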
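The statistical features and the center-of-mass distance on this slide can be sketched in a few lines of NumPy. This is only an illustrative sketch, not TEXTAL's code: the function name `basic_features`, the array layout, and the choice of non-excess kurtosis are assumptions made here, and coordinates are assumed to be given relative to the region center.

```python
import numpy as np

def basic_features(coords, rho):
    """Hypothetical helper: simple rotation-invariant features of one region.

    coords: (n, 3) lattice-point positions relative to the region center.
    rho:    (n,) density value at each lattice point.
    """
    coords = np.asarray(coords, dtype=float)
    rho = np.asarray(rho, dtype=float)
    mean = rho.mean()                       # average density
    std = rho.std()                         # standard deviation
    # Kurtosis as fourth central moment over variance squared (assumed form).
    kurt = ((rho - mean) ** 4).mean() / std ** 4 if std > 0 else 0.0
    # Density-weighted center of mass: sum(x_i rho_i) / sum(rho_i),
    # which equals (1/n) sum(x_i rho_i) / mean density.
    center = (coords * rho[:, None]).sum(axis=0) / rho.sum()
    d_cen = float(np.linalg.norm(center))   # distance from region center
    return {"mean": float(mean), "std": float(std),
            "kurtosis": float(kurt), "d_cen": d_cen}
```

Because each value depends only on the density values and on distances from the region center, rotating the coordinates about that center should leave the features unchanged.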

More Features
Moments of inertia
- measure dispersion around axes of symmetry in a density distribution
- calculate the 3x3 inertia matrix
- diagonalize to get eigenvalues
- sort from largest to smallest
- take magnitudes and ratios of moments
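The moment-of-inertia steps listed above might look roughly like the following NumPy sketch. The function name `inertia_features`, the decision to take moments about the density-weighted center of mass, and the particular ratios returned are assumptions for illustration; the slide does not specify them.

```python
import numpy as np

def inertia_features(coords, rho, eps=1e-12):
    """Hypothetical helper: sorted principal moments and their ratios.

    coords: (n, 3) lattice-point positions; rho: (n,) densities used as weights.
    """
    coords = np.asarray(coords, dtype=float)
    rho = np.asarray(rho, dtype=float)
    # Positions relative to the density-weighted center of mass (assumed origin).
    r = coords - (coords * rho[:, None]).sum(axis=0) / rho.sum()
    # Inertia matrix: sum_i rho_i (|r_i|^2 * Identity - r_i r_i^T).
    sq = (r ** 2).sum(axis=1)
    inertia = (rho * sq).sum() * np.eye(3) - (r * rho[:, None]).T @ r
    # Symmetric matrix, so eigvalsh applies; sort from largest to smallest.
    moments = np.sort(np.linalg.eigvalsh(inertia))[::-1]
    # Ratios of moments, guarded against division by zero.
    ratios = np.array([
        moments[0] / max(moments[1], eps),
        moments[0] / max(moments[2], eps),
        moments[1] / max(moments[2], eps),
    ])
    return moments, ratios
```

The eigenvalues of the inertia matrix do not change when the region is rotated, which is what makes them usable as rotation-invariant features.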

More Features
Spoke angles
- if the region is centered on a CA, there should be 3 “spokes” of density emanating from the center
- find best-fit vectors; calculate the angles among them
Surface area of contours
Connectivity of density/bones in the region
Other geometrical features...
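For the spoke-angle feature, the angle calculation itself is straightforward once the spoke vectors are known. The sketch below assumes the best-fit vectors have already been found by some other procedure (not shown, and not described on the slide); `spoke_angles` is a hypothetical name, and sorting the angles is an assumption made here so the result does not depend on the order of the spokes.

```python
import numpy as np

def spoke_angles(vectors):
    """Hypothetical helper: pairwise angles (degrees) among spoke vectors.

    vectors: (k, 3) array of best-fit spoke directions from the region center.
    Returns the k*(k-1)/2 pairwise angles sorted in ascending order.
    """
    v = np.asarray(vectors, dtype=float)
    v = v / np.linalg.norm(v, axis=1, keepdims=True)   # normalize to unit length
    cosines = np.clip(v @ v.T, -1.0, 1.0)              # guard against round-off
    i, j = np.triu_indices(len(v), k=1)                # each unordered pair once
    return np.sort(np.degrees(np.arccos(cosines[i, j])))
```

Angles between vectors are preserved under rotation, so these values fit the rotation-invariance requirement stated earlier.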

Feature Weights

CAPRA: C-Alpha Pattern-Recognition Algorithm
- Tracer: remove lattice points from the map (lowest density first) without breaking connectivity
- Neural network: for each pseudo-atom, extract features, input them to the network, and predict distances to CAs (1:10 in trace); trained on example points in real maps
- Linking: desire long chains, good CA predictions (not in side-chains), and “structurally plausible” geometry (e.g. linear, helical)
Pipeline: map → density trace → pseudo-atoms → neural network → predictions of distance to true CA → linking into C-alpha chains → C-alpha coordinates

Example of the CAPRA Process

Example of CAPRA chains

The LOOKUP Process

Database Construction
- Ideally would use solved MAD/MIR maps
- Using “back-transformed” maps works well:
  - PDB → structure factors (include B-factors)
  - keep reflections down to 2.8 Å
  - Fourier transform → electron density map
- 50 proteins from PDBSelect (non-homologous)
- About 50,000 regions
- Feature extraction done offline
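The resolution cutoff in the back-transformation can be illustrated with a toy NumPy example. This is a simplified sketch under strong assumptions: it starts from a density already sampled on a cubic grid in a cubic cell (rather than computing structure factors from PDB coordinates), ignores B-factors and symmetry, and uses the hypothetical function name `truncate_resolution`. It only shows the idea of discarding Fourier terms beyond 2.8 Å before transforming back to a map.

```python
import numpy as np

def truncate_resolution(density, cell_edge, d_min=2.8):
    """Hypothetical helper: low-pass a gridded density to resolution d_min.

    density:   (n, n, n) array sampled over a cubic cell (toy assumption).
    cell_edge: cell edge length in Angstroms.
    d_min:     resolution limit in Angstroms; terms with d < d_min are dropped.
    """
    n = density.shape[0]
    coeffs = np.fft.fftn(density)            # stand-in for structure factors
    h = np.fft.fftfreq(n, d=1.0 / n)         # integer indices along each axis
    hx, hy, hz = np.meshgrid(h, h, h, indexing="ij")
    inv_d = np.sqrt(hx ** 2 + hy ** 2 + hz ** 2) / cell_edge   # 1/d in 1/Angstrom
    keep = inv_d <= 1.0 / d_min              # keep reflections down to d_min
    # Inverse transform of the truncated coefficients gives the smoothed map.
    return np.fft.ifftn(np.where(keep, coeffs, 0)).real
```

A real pipeline would compute structure factors from atomic coordinates with a crystallographic package; the sketch is meant only to show why a map limited to 2.8 Å looks smoother than the underlying model density.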

Details of Matching Process
- Feature-based matching:
  - Euclidean distance metric between feature vectors
  - $\mathrm{dist}(R_1, R_2) = \sum_i w_i (F_i(R_1) - F_i(R_2))^2$
- Must weight features by relevance:
  - less-relevant features add noise
  - Slider algorithm: optimize weights by comparing features in matching regions versus mismatches
- Verify selections by density correlation
  - requires a search for the optimal rotation
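The two-step matching described here (a fast feature-based filter followed by a density-correlation check) can be sketched as follows. The function names, the brute-force search over all database rows, and the use of a plain Pearson correlation are assumptions for illustration. The weights are taken as given (the Slider algorithm is not reproduced), and the rotational search needed before correlating real regions is omitted.

```python
import numpy as np

def weighted_distances(query, database, weights):
    """dist(R1, R2) = sum_i w_i (F_i(R1) - F_i(R2))^2 against every database row."""
    diff = np.asarray(database, dtype=float) - np.asarray(query, dtype=float)
    return (np.asarray(weights, dtype=float) * diff ** 2).sum(axis=1)

def top_matches(query, database, weights, k=3):
    """Indices and distances of the k database regions closest to the query."""
    d = weighted_distances(query, database, weights)
    order = np.argsort(d)[:k]
    return order, d[order]

def density_correlation(rho_a, rho_b):
    """Pearson correlation of two density arrays sampled on the same points.

    Assumes the two regions have already been brought into the same orientation.
    """
    a = np.ravel(rho_a) - np.mean(rho_a)
    b = np.ravel(rho_b) - np.mean(rho_b)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0
```

In use, `top_matches` would shortlist candidates cheaply from precomputed feature vectors, and `density_correlation` would then be evaluated only for those few candidates, since it is the expensive step.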

Post-Processing Routines
- Imperfections in the initial model:
  - backbone atoms of adjacent residues are not necessarily juxtaposed, or oriented in the same direction
  - side-chains occasionally “flipped” into the backbone
  - residue identities often incorrect (based on density)
- Fixing “flips” and direction: take the candidate match with the next-highest correlation
- Real-space refinement: regularizes the backbone
- Use sequence alignment to fix identities?

New Results on Real MAD Maps
Footnotes to the results table:
a) CZRA: missed a 5-residue loop (weak density) and the C-terminus
b) M01: missed a 17-residue helix, 9 deletions, 5 due to breaks, 3-residue false backbone

Histograms of Distances Between Matched Atoms

Analysis of Amino Acid Types
Confusion matrix for CZRA (axes: amino acid in true structure vs. amino acid in TEXTAL model)
