The TEXTAL System: Automated Model-Building Using Pattern Recognition Techniques Dr. Thomas R. Ioerger Department of Computer Science Texas A&M University.

Presentation transcript:

The TEXTAL System: Automated Model-Building Using Pattern Recognition Techniques
Dr. Thomas R. Ioerger, Department of Computer Science, Texas A&M University
Collaboration with: Dr. James C. Sacchettini, Center for Structural Biology, Texas A&M Univ.
With support from: National Institutes of Health

Automated Structure Determination
Key step to high-throughput Structural Genomics, structure-based drug design, etc.
Many computational tools to generate a map, but...
Given an electron density map, how to extract atomic coordinates automatically?
Currently requires humans (+O): potential bottleneck
Sources of difficulty: complexity, low resolution, phase errors, weak density
Related methods: Shake&Bake, ARP/wARP, X-Powerfit, template convolution...

Overview of TEXTAL
Apply pattern recognition techniques
Exploit database of previously-solved maps
Model molecular structures in local regions (e.g. spheres of 5 Angstrom radius)
Intuitive principles:
1) Have I ever seen a region with a pattern of density like this before?
2) If so, what were the previous local atomic coordinates?

Overview (cont’d)
Divide-and-Conquer:
1) identify alpha-carbon positions (chain-tracing)
2) model regions around alpha-carbons (CAs), including backbone and side-chain atoms
3) concatenate local models back together, resolve any conflicts
Database contains many regions centered on CAs from previous maps
~5A radius is about right for “structural repetition”

Main Stages of TEXTAL
[Flow diagram: electron density map → CAPRA → C-alpha chains → LOOKUP → model (initial coordinates) → post-processing routines → model (final coordinates)]
LOOKUP: build in side-chain and main-chain atoms locally around each CA
Post-processing routines, example: real-space refinement
Other boxes in the diagram: reciprocal-space refinement/ML DM; human crystallographer (editing)

Feature Extraction
Database: ~10^5 regions from ~100 maps
How to identify the closest match (efficiently)?
Calculate numerical features that represent the pattern in each region
Must be rotation-invariant
Search can be very fast: just compare features

F=

Rotation-Invariant Features
Average density: $\bar{\rho} = (1/n)\sum_i \rho_i$, where $\rho_i$ is the density at each lattice point in the region
Other statistical features: standard deviation, kurtosis...
Distance to center of mass:
– $\langle x_c, y_c, z_c \rangle = (1/n)\,\langle \sum_i x_i\rho_i/\bar{\rho},\ \sum_i y_i\rho_i/\bar{\rho},\ \sum_i z_i\rho_i/\bar{\rho} \rangle$
– $d_{cen} = \sqrt{x_c^2 + y_c^2 + z_c^2}$
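
The statistical features on this slide can be sketched in a few lines of Python. This is only an illustration, not the TEXTAL code: the function names, the `(x, y, z, rho)` tuple layout, and the choice to measure coordinates from the region centre are all assumptions made for the example. The center of mass is computed as the density-weighted mean position, and `rotate_z` is a hypothetical helper used only to show that the features do not change when the region is rotated.

```python
import math

def density_features(points):
    """Rotation-invariant statistics for one region.

    points: list of (x, y, z, rho) with coordinates taken relative to the
    region centre (an assumption of this sketch).
    """
    n = len(points)
    total = sum(rho for _, _, _, rho in points)
    mean = total / n
    std = math.sqrt(sum((rho - mean) ** 2 for _, _, _, rho in points) / n)
    # Density-weighted mean position (centre of mass of the region).
    xc = sum(x * rho for x, _, _, rho in points) / total
    yc = sum(y * rho for _, y, _, rho in points) / total
    zc = sum(z * rho for _, _, z, rho in points) / total
    d_cen = math.sqrt(xc * xc + yc * yc + zc * zc)
    return {"mean": mean, "std": std, "d_cen": d_cen}

def rotate_z(points, degrees):
    """Rotate every lattice point about the z axis (hypothetical helper)."""
    c, s = math.cos(math.radians(degrees)), math.sin(math.radians(degrees))
    return [(c * x - s * y, s * x + c * y, z, rho) for x, y, z, rho in points]

# Toy region with four lattice points; the values are made up.
region = [(1, 0, 0, 2.0), (-1, 0, 0, 1.0), (0, 1, 0, 1.0), (0, -1, 0, 0.0)]
original = density_features(region)
rotated = density_features(rotate_z(region, 37.0))
```

Comparing `original` and `rotated` shows the same mean, standard deviation, and distance to the center of mass (up to floating-point error), which is the property the matching step relies on.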

More Features
Moments of inertia
– measure dispersion around axes of symmetry in a density distribution
– calculate 3x3 inertia matrix
– diagonalize to get eigenvalues
– sort from largest to smallest
– take magnitudes and ratios of moments
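
The moments-of-inertia steps can be sketched as follows. This is a hedged illustration rather than the TEXTAL implementation: the density-weighted inertia tensor about the region centre, the cyclic Jacobi eigenvalue routine (chosen only to keep the example free of external libraries), and the particular ratios returned are assumptions of this sketch.

```python
import math

def inertia_matrix(points):
    """3x3 density-weighted inertia tensor; points are (x, y, z, rho)
    with coordinates relative to the region centre (assumed)."""
    m = [[0.0] * 3 for _ in range(3)]
    for x, y, z, rho in points:
        r = (x, y, z)
        r2 = x * x + y * y + z * z
        for a in range(3):
            for b in range(3):
                m[a][b] += rho * ((r2 if a == b else 0.0) - r[a] * r[b])
    return m

def symmetric_eigenvalues(m, sweeps=50, tol=1e-24):
    """Eigenvalues of a symmetric 3x3 matrix via cyclic Jacobi rotations,
    sorted from largest to smallest."""
    a = [row[:] for row in m]
    scale = sum(v * v for row in a for v in row) or 1.0
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(3) for j in range(3) if i != j)
        if off <= tol * scale:
            break
        for p in range(2):
            for q in range(p + 1, 3):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle that zeroes the (p, q) entry.
                theta = 0.5 * math.atan2(2.0 * a[p][q], a[q][q] - a[p][p])
                c, s = math.cos(theta), math.sin(theta)
                for k in range(3):  # columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(3):  # rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted((a[i][i] for i in range(3)), reverse=True)

def inertia_features(points):
    """Sorted moments plus two ratios (the choice of ratios is illustrative)."""
    moments = symmetric_eigenvalues(inertia_matrix(points))
    largest = moments[0] if moments[0] != 0.0 else 1.0
    return moments, [moments[1] / largest, moments[2] / largest]

def rotate_z(points, degrees):
    """Rotate every lattice point about the z axis (hypothetical helper)."""
    c, s = math.cos(math.radians(degrees)), math.sin(math.radians(degrees))
    return [(c * x - s * y, s * x + c * y, z, rho) for x, y, z, rho in points]

# Toy region with four lattice points of equal density; the values are made up.
region = [(1, 0, 0, 1.0), (-1, 0, 0, 1.0), (0, 2, 0, 1.0), (0, -2, 0, 1.0)]
moments, ratios = inertia_features(region)
rot_moments, rot_ratios = inertia_features(rotate_z(region, 45.0))
```

Because the eigenvalues of the inertia matrix are unchanged by rotating the region, `moments` and `rot_moments` agree up to floating-point error, so the sorted moments and their ratios can serve as rotation-invariant features.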

More Features
Spoke angles
– if region is centered on a CA, it should have 3 “spokes” of density emanating from the center
– find best-fit vectors; calculate angles among them
Surface area of contours
Connectivity of density/bones in region
Other geometrical features...
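
As a rough illustration of the spoke-angle feature, the sketch below computes the pairwise angles among three spoke vectors. How TEXTAL finds the best-fit vectors is not described on the slide, so the vectors are simply taken as given here; the function names and the sorting of the angles (to make the result independent of spoke order) are assumptions of this example.

```python
import math
from itertools import combinations

def angle_deg(u, v):
    """Angle in degrees between two 3D vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Clamp to [-1, 1] to guard against rounding just outside acos's domain.
    cos_angle = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.degrees(math.acos(cos_angle))

def spoke_angles(spokes):
    """Sorted pairwise angles among the spoke vectors."""
    return sorted(angle_deg(u, v) for u, v in combinations(spokes, 2))

# Three made-up spoke directions emanating from the region centre.
angles = spoke_angles([(1, 0, 0), (-1, 0, 0), (0, 2, 0)])
```

Angles between vectors are preserved under rotation, so this feature is rotation-invariant by construction.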

Feature Weights

CAPRA: C-Alpha Pattern-Recognition Algorithm
Tracer - remove lattice points from map (lowest density first) without breaking connectivity
Neural network - for each pseudo atom, extract features, input to network, predict distances to CAs (1:10 in trace), trained on example points in real maps
Linking - desire long chains, good CA predictions (not in side-chains), “structurally plausible” (e.g. linear, helical)
[Flow diagram: map → Density Trace → pseudo atoms → Neural Network → predictions of distance to true CA → Linking into C-alpha chains → C-alpha coordinates]

Example of the CAPRA Process

Example of CAPRA chains

The LOOKUP Process

Database Construction
Ideally would use solved MAD/MIR maps
Using “back-transformed” maps works well
PDB → structure factors (include B-factors)
keep reflections down to 2.8A
Fourier transform → electron density map
50 proteins from PDBSelect (non-homol.)
about 50,000 regions
Feature extraction done offline

Details of Matching Process
Feature-based matching:
– Euclidean distance metric between feature vectors
– $\mathrm{dist}(R_1,R_2) = \sum_i w_i\,(F_i(R_1) - F_i(R_2))^2$
Must weight features by relevance
– less-relevant features add noise
– Slider algorithm: optimize weights by comparing features in matching regions versus mismatches
Verify selections by density correlation
– requires search for optimal rotation
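
The feature-based matching step can be sketched as a weighted nearest-neighbour lookup. This is a minimal illustration under stated assumptions: the dictionary-based database, the region names, the feature values, and the weights are all made up, and the weights here are not produced by the Slider algorithm.

```python
def weighted_distance(f1, f2, weights):
    """Weighted sum of squared feature differences between two regions."""
    return sum(w * (a - b) ** 2 for w, a, b in zip(weights, f1, f2))

def closest_regions(query, database, weights, k=3):
    """Return the k database regions nearest to the query as
    (distance, name) pairs, closest first."""
    scored = sorted(
        (weighted_distance(query, feats, weights), name)
        for name, feats in database.items()
    )
    return scored[:k]

# Hypothetical two-feature database of previously solved regions.
database = {"A": [1.0, 0.0], "B": [0.0, 1.0], "C": [1.0, 1.0]}
query = [1.0, 0.2]
best = closest_regions(query, database, weights=[1.0, 1.0], k=2)
```

In the process described on the slide, the top candidates from a lookup like this would then be verified by density correlation, which needs a search over rotations and is not shown here.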

Post-Processing Routines
Imperfections in the initial model:
– backbone atoms not necessarily juxtaposed between adjacent residues, or in same direction
– side-chains occasionally “flipped” into backbone
– residue identities often incorrect (based on density)
Fixing “flips” and direction - take candidate match with next highest correlation
Real-space refinement: regularizes backbone
Use sequence alignment to fix identities?

New Results on Real MAD Maps
a CZRA: missed a 5-res loop (weak density) and C-terminus
b M01: missed a 17-res helix, 9 deletions, 5 due to breaks, 3-res false backbone

Histograms of Distances Between Matched Atoms

Analysis of Amino Acid Types
Confusion matrix for CZRA (axes: amino acid in true structure vs. amino acid in TEXTAL model)