CAPRA: C-Alpha Pattern Recognition Algorithm
Thomas R. Ioerger
Department of Computer Science, Texas A&M University


Overview of CAPRA
goal: predict CA chains from density map
not just “tracing” - more than Bones
desire 1:1 correspondence, ~3.8Å apart
based on principles of pattern recognition
  –use neural net to estimate which pseudo-atoms in trace “look” closest to true C-alphas
  –use feature extraction to capture 3D patterns in density for input to neural net
  –use other heuristics for “linking” together into chains, including geometric analysis (secondary structure)

What can you do with CA chains?
build-in side-chain and backbone atoms
  –TEXTAL, Segment-Match Modeling (Levitt), Holm and Sander
recognize fold from secondary structure
  –identify candidates for molecular replacement
evaluate map quality (num/len of chains)
density modification
  –create poly-alanine backbone and use it to do phase recombination

Role in Automated Model Building
Model building is one of the bottlenecks in high-throughput Structural Genomics
Automation is needed
[Diagram labels: PHENIX, TEXTAL, CAPRA; reflections, (ha/dm/ncs), map, CA chains, model, refinement]

Steps in CAPRA

Examples of CAPRA Steps

Tracer

Neural Network

Feature Extraction
characterize 3D patterns in local density
must be “rotation invariant”
examples:
  –average density in region
  –standard deviation, kurtosis...
  –distance to center of mass
  –moments of inertia, ratios of moments
  –“spoke angles”
calculated over spheres of 3Å and 4Å radius
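A minimal sketch of a few of the rotation-invariant features listed above, assuming the density is available as a list of sampled grid points. The function name `local_features`, the input layout, and the choice of which features to compute are illustrative assumptions, not CAPRA's actual code; moments of inertia and spoke angles are left out for brevity.

```python
import math

def local_features(points, center, radius):
    """Rotation-invariant statistics of density inside a sphere.

    points: list of ((x, y, z), density) samples (hypothetical layout).
    center: (x, y, z) of the pseudo-atom being characterized.
    radius: sphere radius in angstroms (the slide mentions 3 and 4).
    """
    inside = [(p, d) for p, d in points if math.dist(p, center) <= radius]
    if not inside:
        raise ValueError("no grid points inside the sphere")
    n = len(inside)
    dens = [d for _, d in inside]
    mean = sum(dens) / n
    var = sum((d - mean) ** 2 for d in dens) / n
    # Kurtosis as the fourth central moment divided by the squared variance.
    kurt = (sum((d - mean) ** 4 for d in dens) / n) / var ** 2 if var > 0 else 0.0
    # Density-weighted center of mass; only its distance from the sphere
    # center is kept, because the distance does not change under rotation.
    total = sum(dens)
    com = tuple(sum(p[i] * d for p, d in inside) / total for i in range(3))
    return {
        "mean": mean,
        "std": math.sqrt(var),
        "kurtosis": kurt,
        "com_dist": math.dist(com, center),
    }
```

Each value depends only on densities and on distances from the sphere center, so rotating the local density about the center leaves the feature vector unchanged.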

Forward Propagation: Backward Propagation:
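A generic sketch of forward and backward propagation for a small regression network, of the kind that could map a feature vector to a predicted distance to the nearest true C-alpha. The single hidden layer, sigmoid activation, squared-error loss, learning rate, and all names (`init_net`, `forward`, `train_step`) are assumptions for illustration, not CAPRA's actual network.

```python
import math
import random

def init_net(n_in, n_hidden, seed=0):
    """Small random weights for a one-hidden-layer network (hypothetical layout)."""
    rng = random.Random(seed)
    return {
        "W1": [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)],
        "b1": [0.0] * n_hidden,
        "w2": [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)],
        "b2": 0.0,
    }

def forward(net, x):
    """Forward propagation: sigmoid hidden units, one linear output."""
    h = [1.0 / (1.0 + math.exp(-(sum(w * xi for w, xi in zip(row, x)) + b)))
         for row, b in zip(net["W1"], net["b1"])]
    y = sum(w * hi for w, hi in zip(net["w2"], h)) + net["b2"]
    return h, y

def train_step(net, x, target, lr=0.1):
    """Backward propagation of the squared error, then one gradient step."""
    h, y = forward(net, x)
    err = y - target  # dE/dy for E = 0.5 * (y - target)^2
    for j, hj in enumerate(h):
        # Error signal at hidden unit j, computed before w2[j] is updated.
        delta = err * net["w2"][j] * hj * (1.0 - hj)
        net["w2"][j] -= lr * err * hj
        for i, xi in enumerate(x):
            net["W1"][j][i] -= lr * delta * xi
        net["b1"][j] -= lr * delta
    net["b2"] -= lr * err
    return 0.5 * err * err  # loss before the update
```

Calling `train_step` repeatedly on (feature vector, true distance) pairs reduces the squared error on those examples.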

Selection of Candidate C-alpha’s
method:
  –pick candidates in order of lowest predicted distance first,
  –among all pseudo-atoms in trace,
  –as long as not closer than 2.5Å
notes:
  –no 3.8Å constraint; distance can be as high as 5Å
  –don’t rely on branch points (though often near)
  –picked in random order throughout map
  –initially covers whole map, including side-chains and disconnected regions (e.g. noise in solvent)
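The selection method above reads as a greedy procedure, which can be sketched as follows. The function name `select_candidates` and the input layout are hypothetical; only the ordering rule and the 2.5 angstrom minimum separation come from the slide.

```python
import math

def select_candidates(pseudo_atoms, min_sep=2.5):
    """Greedy pick of candidate C-alphas.

    pseudo_atoms: list of ((x, y, z), predicted_distance), where the
    predicted distance to a true C-alpha comes from the neural net.
    """
    chosen = []
    # Visit pseudo-atoms from lowest predicted distance to highest.
    for pos, pred in sorted(pseudo_atoms, key=lambda a: a[1]):
        # Keep a pseudo-atom only if no already-chosen candidate is
        # closer than min_sep angstroms.
        if all(math.dist(pos, c[0]) >= min_sep for c in chosen):
            chosen.append((pos, pred))
    return chosen
```

Because atoms are visited by predicted distance rather than by position, the picks land in scattered places across the map, which matches the note that candidates are picked in random order throughout the map.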

Linking into Chains
initial connectivity of CA candidates based on the trace
“over-connected” graph - branches, cycles...
start by computing connected components (islands, or clusters)
two strategies:
  –for small clusters (<=20 candidates), find longest internal chain with “good” atoms
  –for large clusters (>20 candidates), incrementally clip branch points using heuristics
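A sketch of the first step, assuming the candidate connectivity is stored as an adjacency dictionary. The names `connected_components` and `split_by_size` are hypothetical; the size limit of 20 is taken from the slide.

```python
from collections import deque

def connected_components(adj):
    """adj: dict mapping each candidate to its neighbours (undirected graph)."""
    seen, comps = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        comp, queue = [], deque([start])
        # Breadth-first search collects every candidate reachable from start.
        while queue:
            node = queue.popleft()
            comp.append(node)
            for nbr in adj[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    queue.append(nbr)
        comps.append(comp)
    return comps

def split_by_size(comps, limit=20):
    """Separate clusters for the two strategies: exhaustive search vs. clipping."""
    small = [c for c in comps if len(c) <= limit]
    large = [c for c in comps if len(c) > limit]
    return small, large
```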

Extracting Chains from Small Clusters
exhaustive depth-first search of all paths
scoring function:
  –length
  –penalty for inclusion of points with high predicted distance to true CA by neural net
  –preference for following secondary structure (locally straight or helical)
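A sketch of the exhaustive search, assuming a simple score of chain length minus a weighted sum of predicted distances. The function name `best_chain`, the `penalty` weight, and the linear form of the score are assumptions; the secondary-structure preference is omitted here.

```python
def best_chain(adj, pred_dist, penalty=1.0):
    """Exhaustive depth-first search over all simple paths in a small cluster.

    adj: dict mapping each candidate to a list of neighbours.
    pred_dist: dict mapping each candidate to its predicted distance
    to a true C-alpha (lower is better).
    """
    best_path, best_score = [], float("-inf")

    def score(path):
        # Reward length, penalize atoms the network considers far from a CA.
        return len(path) - penalty * sum(pred_dist[n] for n in path)

    def dfs(path, visited):
        nonlocal best_path, best_score
        s = score(path)
        if s > best_score:
            best_path, best_score = list(path), s
        for nxt in adj[path[-1]]:
            if nxt not in visited:
                visited.add(nxt)
                path.append(nxt)
                dfs(path, visited)
                path.pop()
                visited.remove(nxt)

    for start in adj:
        dfs([start], {start})
    return best_path, best_score
```

The number of simple paths grows quickly with cluster size, which is consistent with restricting this strategy to clusters of at most 20 candidates.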

Secondary Structure Analysis
generate all 7-mers (connected fragments of candidate CAs of length 7)
evaluate “straightness”
  –ratio of end-to-end distance to sum of link lengths
  –straightness>0.8 ==> potential beta-strand
evaluate “helicity”
  –average absolute deviation of angles and torsions along 7-mer from ideal values (95° and 50°)
  –helicity below threshold ==> potential alpha-helix
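A sketch of the two measures for a single 7-mer of CA coordinates. Straightness is computed as end-to-end distance divided by the summed link lengths, so that values near 1 indicate a straight fragment and the 0.8 cutoff applies; averaging angle and torsion deviations together in one mean is an assumption, as are the function names.

```python
import math

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def straightness(ca):
    """End-to-end distance over path length; 1.0 for a perfectly straight chain."""
    path = sum(math.dist(p, q) for p, q in zip(ca, ca[1:]))
    return math.dist(ca[0], ca[-1]) / path

def bond_angle(a, b, c):
    """Angle at b in degrees, between the links b-a and b-c."""
    u, v = sub(a, b), sub(c, b)
    cosv = dot(u, v) / math.sqrt(dot(u, u) * dot(v, v))
    return math.degrees(math.acos(max(-1.0, min(1.0, cosv))))

def torsion(a, b, c, d):
    """Signed dihedral angle in degrees about the b-c link."""
    b1, b2, b3 = sub(b, a), sub(c, b), sub(d, c)
    n1, n2 = cross(b1, b2), cross(b2, b3)
    y = math.sqrt(dot(b2, b2)) * dot(b1, n2)
    return math.degrees(math.atan2(y, dot(n1, n2)))

def helicity(ca, ideal_angle=95.0, ideal_torsion=50.0):
    """Mean absolute deviation of CA angles and torsions from the ideal values."""
    devs = [abs(bond_angle(*ca[i:i + 3]) - ideal_angle) for i in range(len(ca) - 2)]
    devs += [abs(torsion(*ca[i:i + 4]) - ideal_torsion) for i in range(len(ca) - 3)]
    return sum(devs) / len(devs)
```

Lower helicity values mean a closer match to an ideal helix, so a helix test compares the value against an upper threshold.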

Handling Large Clusters
start by breaking cycles (near “bad” atoms)
clip links at branch points till only linear chains remain
clip the most “obvious” links first, e.g.
  –if other two links are part of sec. struct.
  –if clipped branch has “bad” atom nearby
  –if clipped branch is small and other 2 are large
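A sketch of branch clipping that uses only the last heuristic above (clip the smallest branch). The function names are hypothetical, and the secondary-structure and “bad” atom criteria are not modeled, so this is a simplification rather than the procedure CAPRA uses.

```python
def branch_size(adj, node, nbr):
    """Number of candidates reachable through nbr without passing back through node."""
    seen, stack = {node, nbr}, [nbr]
    while stack:
        cur = stack.pop()
        for nxt in adj[cur]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return len(seen) - 1  # exclude the branch point itself

def clip_branches(adj):
    """Remove links at branch points until every candidate has at most 2 links."""
    adj = {n: set(ns) for n, ns in adj.items()}  # work on a copy
    while True:
        branch_points = [n for n in adj if len(adj[n]) > 2]
        if not branch_points:
            return adj
        node = branch_points[0]
        # Clip the link leading into the smallest branch (ties broken by label).
        smallest = min(adj[node], key=lambda m: (branch_size(adj, node, m), m))
        adj[node].discard(smallest)
        adj[smallest].discard(node)
```

Each pass removes one link, so the loop terminates; once cycles have been broken, what remains is a set of linear chains.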

Results

Analysis of RMS by Sec. Struct. (DSSP)

Example of CA-chains for CzrA fit by CAPRA

Results for MVK

Effect of Resolution
IF5a
  –initial map: 2.1Å, RMS error: 1.23Å
  –limited map: 2.8Å, RMS error: 0.86Å
PcaA (2Fo-Fc)
  –initial map: 2.0Å, RMS error: 1.1Å
  –limited map: 2.8Å, RMS error: 0.82Å

Effect of Density Modification
anecdotal evidence from ICL
  –before DM: many short, broken chains
  –after DM: longer chains, reasonable model
hard to quantify, but the moral is:
  –the accuracy of CAPRA results depends on “quality” of density, and CAPRA might not give useful results in noisy maps
experiments with “blurring” maps
  –convolution with Gaussian by FFT
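A one-dimensional sketch of Gaussian blurring by FFT, using the convolution theorem on a periodic density profile. A real map is three-dimensional and would normally use an optimized FFT library; the radix-2 transform, the function names, and measuring `sigma` in grid steps are assumptions made to keep the example self-contained.

```python
import cmath
import math

def fft(a, invert=False):
    """Recursive radix-2 FFT; len(a) must be a power of two. Inverse is unscaled."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def gaussian_blur_periodic(density, sigma):
    """Circular convolution of a density profile with a normalized Gaussian."""
    n = len(density)
    if n & (n - 1):
        raise ValueError("length must be a power of two")
    # Kernel centered on index 0, wrapping around the periodic boundary.
    kernel = [math.exp(-0.5 * (min(i, n - i) / sigma) ** 2) for i in range(n)]
    total = sum(kernel)
    kernel = [k / total for k in kernel]
    # Convolution theorem: multiply the transforms, then transform back.
    prod = [x * y for x, y in zip(fft(density), fft(kernel))]
    return [v.real / n for v in fft(prod, invert=True)]
```

Periodic wrap-around is the natural boundary condition here, since a crystallographic map repeats from one unit cell to the next.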

Future Work
build poly-alanine
  –must determine directionality
  –currently done as part of TEXTAL (fits backbone carbonyls as well as side-chain atoms)
connect ends of chains
  –improve robustness to breaks in density
use partial models to improve phases and hence make better maps (iteratively)
  –a new form of density modification?

Related Approaches
Resolve (Terwilliger)
  –template convolution search, max. likelihood
MAID (D. Levitt)
  –density correlation search, grow ends
Critical-point analysis (Glasgow/Fortier)
ARP/wARP (Perrakis and Lamzin)
MAIN (D. Turk)
  –chiral carbons; iterate: extend ends, phase recomb.
X-Powerfit (T. Oldfield, MSI)

Availability
on pompano, add /xray/textal/bin/capra to your path
run ‘capra ’ where .xplor is your map in X-PLOR format
map should cover at least one whole molecule, though smaller=faster
takes minutes to an hour (especially for feature calculations)
any space group & unit cell
resolution: A, 2.8Å recommended
remember: quality of density must be high, e.g. post-solvent-flattening, etc.

Acknowledgements
Funding
  –National Institutes of Health
  –Welch Foundation
People
  –Dr. James C. Sacchettini
  –The TEXTAL Group! Tod Romo, Kreshna Gopal, Reetal Pai