Zhijun Wu Department of Mathematics Program on Bio-Informatics and Computational Biology Iowa State University Joint Work with Tauqir Bibi, Feng Cui, Qunfeng.

Presentation transcript:

A Novel Geometric Build-Up Algorithm for Solving the Distance Geometry Problem and Its Application to Multidimensional Scaling
Zhijun Wu
Department of Mathematics, Program on Bio-Informatics and Computational Biology, Iowa State University
Joint work with Tauqir Bibi, Feng Cui, Qunfeng Dong, Peter Vedell, Di Wu

S. Multidimensional Scaling: data classification; geometric mapping of data
T. Distance Geometry: mapping from semi-metric to metric spaces; Euclidean and non-Euclidean
B. Molecular Conformation: embedding in 3D Euclidean space; protein structure prediction and determination
Fundamental problem: find the coordinates for a set of points, given the distances for all pairs of points.
Related topics: Cayley-Menger determinant; necessary and sufficient conditions of embedding; singular-value decomposition method; strain/stress minimization; sparse, inexact distances; bounds on the distances; probability distributions

Proteins are building blocks of life and key ingredients of biological processes. A biological system may have up to hundreds of thousands of different proteins, each with a specific role in the system. A protein is formed by a polypeptide chain with typically several hundred amino acids and tens of thousands of atoms. A protein has a unique 3D structure, which determines in many ways the function of the protein.
An example: HIV Retrotranscriptase, 554 amino acids, 4200 atoms.

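Molecular Distance Geometry Problem
Given $n$ atoms $a_1, \ldots, a_n$ and a set of distances $d_{i,j}$ between $a_i$ and $a_j$, $(i,j) \in S$, find the coordinates $x_1, \ldots, x_n$ of the atoms such that $\|x_i - x_j\| = d_{i,j}$ for all $(i,j) \in S$.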

Problems and Complexity
problems with all distances: solvable in $O(n^3)$ using SVD
problems with sparse sets of distances: NP-complete (Saxe 1979)
problems with distance ranges (NMR results): NP-complete (Moré and Wu 1997), if the ranges are small
problems with probability distributions of distances: stochastic multidimensional scaling, structure prediction
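A minimal sketch of the all-distances case, assuming NumPy. It uses the classical multidimensional scaling construction: double-center the squared distances to get a Gram matrix, then factor it. The eigendecomposition of the symmetric Gram matrix is used here in place of a general SVD; the function names and the random test data are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def pairwise_distances(X):
    """Euclidean distance matrix for the rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def embed_from_full_distances(D, dim=3):
    """Recover coordinates (up to a rigid motion or reflection) from a
    complete, exact distance matrix D."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n       # centering matrix
    G = -0.5 * J @ (D ** 2) @ J               # Gram matrix of centered points
    w, V = np.linalg.eigh(G)                  # eigenvalues in ascending order
    idx = np.argsort(w)[::-1][:dim]           # keep the dim largest
    w_top = np.clip(w[idx], 0.0, None)        # guard against tiny negatives
    return V[:, idx] * np.sqrt(w_top)

# Illustrative data: 20 random points in 3D.
rng = np.random.default_rng(0)
X_true = rng.normal(size=(20, 3))
D = pairwise_distances(X_true)
X = embed_from_full_distances(D)
max_err = np.abs(pairwise_distances(X) - D).max()
```

The dense factorization is what makes this route cubic in the number of points.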

Current Approaches
Embed Algorithm by Crippen and Havel
CNS Partial Metrization by Brünger et al.
Graph Reduction by Hendrickson
Alternating Projection by Glunt and Hayden
Global Optimization by Moré and Wu
Multidimensional Scaling by Trosset et al.

Embed Algorithm
Crippen and Havel 1988 (DGII, DGEOM); Brünger et al. 1992, 1998 (XPLOR, CNS)
1. bound smoothing: keep distances consistent
2. distance metrization: estimate the missing distances
3. repeat (say 1000 times):
4.    randomly generate D between L and U
5.    find X using SVD with D
6.    if X is found, stop
7. select the best approximation X
8. refine X with simulated annealing
9. final optimization
Cost annotations on the slide: time consuming, $O(n^3)$ to $O(n^4)$; costly, $O(n^2)$ to $O(n^3)$.
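The bound-smoothing step can be illustrated with the standard triangle-inequality tightening: upper bounds shrink through shortest paths, and lower bounds grow through the inverse triangle inequality. This is a hedged sketch assuming NumPy and symmetric bound matrices L and U with zero diagonals; it is not the DGII or CNS code, and `smooth_bounds` is a hypothetical name.

```python
import numpy as np

def smooth_bounds(L, U):
    """Tighten symmetric lower/upper distance bounds with triangle inequalities.

    Upper bounds: u_ij <= u_ik + u_kj (Floyd-Warshall style pass).
    Lower bounds: l_ij >= l_ik - u_kj and l_ij >= l_jk - u_ik.
    """
    L = np.array(L, dtype=float)
    U = np.array(U, dtype=float)
    n = U.shape[0]
    for k in range(n):
        U = np.minimum(U, U[:, [k]] + U[[k], :])
    for k in range(n):
        L = np.maximum(L, L[:, [k]] - U[[k], :])              # l_ik - u_kj
        L = np.maximum(L, L[:, k][None, :] - U[:, k][:, None])  # l_jk - u_ik
    return L, U

# Illustrative input: d(0,1) = 1 and d(1,2) = 2 are known, d(0,2) is unknown.
inf = np.inf
L0 = np.array([[0.0, 1.0, 0.0], [1.0, 0.0, 2.0], [0.0, 2.0, 0.0]])
U0 = np.array([[0.0, 1.0, inf], [1.0, 0.0, 2.0], [inf, 2.0, 0.0]])
L_s, U_s = smooth_bounds(L0, U0)
# The unknown distance d(0,2) is now confined to the interval [1, 3].
```

The triple loop hidden in the matrix updates is cubic in the number of atoms.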

Geometric Build-Up
Blumenthal 1953: Theory and Applications of Distance Geometry
Independent Points: A set of $k+1$ points in $R^k$ is called independent if it is not a set of points in $R^{k-1}$.
Metric Basis: A set of points $B$ in a space $S$ is a metric basis of $S$ provided each point of $S$ is uniquely determined by its distances from the points in $B$.
Fundamental Theorem: Any $k+1$ independent points in $R^k$ form a metric basis for $R^k$.
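Independence of k+1 points can be checked from their distances alone through the Cayley-Menger determinant, which gives the squared volume of the simplex they span; the points are independent when that volume is nonzero. A small sketch assuming NumPy; the function names and the tolerance are illustrative assumptions, and the tolerance depends on the length scale of the data.

```python
import numpy as np
from math import factorial

def simplex_volume_sq(D):
    """Squared volume of the simplex on k+1 points with distance matrix D,
    computed from the Cayley-Menger determinant."""
    m = D.shape[0]                 # number of points, m = k + 1
    k = m - 1
    B = np.ones((m + 1, m + 1))    # bordered matrix of squared distances
    B[0, 0] = 0.0
    B[1:, 1:] = D ** 2
    return (-1) ** (k + 1) * np.linalg.det(B) / (2 ** k * factorial(k) ** 2)

def is_independent(D, tol=1e-9):
    """True if the k+1 points do not lie in a (k-1)-dimensional subspace."""
    return simplex_volume_sq(D) > tol

def pairwise_distances(X):
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

# A tetrahedron with volume 1/6, and four coplanar points.
tetra = np.array([[0.0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]])
flat = np.array([[0.0, 0, 0], [1, 0, 0], [0, 1, 0], [1, 1, 0]])
vol_sq_tetra = simplex_volume_sq(pairwise_distances(tetra))
vol_sq_flat = simplex_volume_sq(pairwise_distances(flat))
```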

Geometric Build-Up in two dimensions

in three dimensions

Geometric Build-Up in three dimensions

Geometric Build-Up
Determined points: $x_1 = (u_1, v_1, w_1)$, $x_2 = (u_2, v_2, w_2)$, $x_3 = (u_3, v_3, w_3)$, $x_4 = (u_4, v_4, w_4)$
Unknown points: $x_i = (u_i, v_i, w_i)$, $x_j = (u_j, v_j, w_j)$
Equations for $x_i$: $\|x_i - x_1\| = d_{i,1}$, $\|x_i - x_2\| = d_{i,2}$, $\|x_i - x_3\| = d_{i,3}$, $\|x_i - x_4\| = d_{i,4}$
Equations for $x_j$: $\|x_j - x_1\| = d_{j,1}$, $\|x_j - x_2\| = d_{j,2}$, $\|x_j - x_3\| = d_{j,3}$, $\|x_j - x_4\| = d_{j,4}$
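One way to solve the four distance equations for an unknown point: square each equation and subtract the first from the other three. The quadratic term cancels, leaving the 3x3 linear system $2(x_k - x_1)^T x = \|x_k\|^2 - \|x_1\|^2 - d_k^2 + d_1^2$ for $k = 2, 3, 4$. The sketch below assumes NumPy and four independent base points; `locate_point` is a hypothetical name, not one from the slides.

```python
import numpy as np

def locate_point(base, dists):
    """Coordinates of a point from its distances to four independent base points.

    base  : (4, 3) array of known coordinates x_1..x_4
    dists : (4,) array of distances from the unknown point to x_1..x_4
    """
    base = np.asarray(base, dtype=float)
    dists = np.asarray(dists, dtype=float)
    A = 2.0 * (base[1:] - base[0])                  # rows 2 (x_k - x_1)^T
    sq = (base ** 2).sum(axis=1)                    # ||x_k||^2
    b = sq[1:] - sq[0] - dists[1:] ** 2 + dists[0] ** 2
    return np.linalg.solve(A, b)                    # singular if base is coplanar

# Illustrative check with a known target point.
base = np.array([[0.0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]])
target = np.array([0.3, -1.2, 2.0])
dists = np.linalg.norm(base - target, axis=1)
found = locate_point(base, dists)
```

Each new point costs one small fixed-size solve, independent of the total number of atoms.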

The geometric build-up algorithm solves a molecular distance geometry problem in $O(n)$ when distances between all pairs of atoms are given, while the singular value decomposition algorithm requires $O(n^2)$ to $O(n^3)$ computing time!
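A sketch of how a linear-time build-up could look when all distances are exact, assuming NumPy and assuming the first four atoms are independent (not coplanar). The first four atoms are placed in a canonical frame from their mutual distances; every other atom is then found from its four distances to that base, so only 4 distances per atom are read. The function names are hypothetical, and the result is determined only up to a rigid motion or reflection.

```python
import numpy as np

def solve_from_base(base, dists):
    """Solve the linearized distance equations for one point."""
    A = 2.0 * (base[1:] - base[0])
    sq = (base ** 2).sum(axis=1)
    b = sq[1:] - sq[0] - dists[1:] ** 2 + dists[0] ** 2
    return np.linalg.solve(A, b)

def build_up(D):
    """Coordinates for all atoms from a full, exact distance matrix D."""
    n = D.shape[0]
    X = np.zeros((n, 3))
    d12, d13, d14 = D[0, 1], D[0, 2], D[0, 3]
    d23, d24, d34 = D[1, 2], D[1, 3], D[2, 3]
    # Atom 1 at the origin, atom 2 on the u-axis, atom 3 in the u-v plane.
    X[1] = (d12, 0.0, 0.0)
    u3 = (d12 ** 2 + d13 ** 2 - d23 ** 2) / (2.0 * d12)
    v3 = np.sqrt(d13 ** 2 - u3 ** 2)
    X[2] = (u3, v3, 0.0)
    u4 = (d12 ** 2 + d14 ** 2 - d24 ** 2) / (2.0 * d12)
    v4 = (d14 ** 2 - d34 ** 2 + d13 ** 2 - 2.0 * u3 * u4) / (2.0 * v3)
    w4 = np.sqrt(d14 ** 2 - u4 ** 2 - v4 ** 2)   # sign choice fixes the mirror image
    X[3] = (u4, v4, w4)
    base = X[:4].copy()
    for i in range(4, n):                        # one 3x3 solve per atom
        X[i] = solve_from_base(base, D[i, :4])
    return X

# Illustrative data: a non-coplanar base followed by 26 random points.
rng = np.random.default_rng(1)
X_true = np.vstack([[[0.0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]],
                    rng.normal(size=(26, 3))])
diff = X_true[:, None, :] - X_true[None, :, :]
D = np.sqrt((diff ** 2).sum(axis=-1))
X = build_up(D)
diff_x = X[:, None, :] - X[None, :, :]
max_err = np.abs(np.sqrt((diff_x ** 2).sum(axis=-1)) - D).max()
```

Reusing one fixed base for every atom keeps the sketch short; it lets rounding errors depend on how well conditioned that single base is.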

The X-ray crystallography structure (left) of the HIV-1 RT p66 protein (4200 atoms) and the structure (right) determined by the geometric build-up algorithm using the distances for all pairs of atoms in the protein. The algorithm took only 188,859 floating-point operations to obtain the structure, while a conventional singular-value decomposition algorithm required 1,268,200,000 floating-point operations. The RMSD of the two structures is ~$10^{-4}$ Å.
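Because distances fix a structure only up to a rigid motion, an RMSD between two structures is usually computed after optimal superposition. The slides do not say how the RMSD was obtained; the sketch below assumes NumPy and uses the Kabsch algorithm (rotation from an SVD of the covariance matrix), with illustrative names and test data.

```python
import numpy as np

def kabsch_rmsd(P, Q):
    """RMSD between point sets P and Q (both (n, 3)) after the optimal
    translation and proper rotation of P onto Q."""
    P0 = P - P.mean(axis=0)
    Q0 = Q - Q.mean(axis=0)
    H = P0.T @ Q0                                  # 3x3 covariance matrix
    U, S, Vt = np.linalg.svd(H)
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T        # proper rotation, det = +1
    P_rot = P0 @ R.T
    return np.sqrt(((P_rot - Q0) ** 2).sum(axis=1).mean())

# Illustrative check: Q is a rotated and shifted copy of P.
rng = np.random.default_rng(2)
P = rng.normal(size=(50, 3))
c, s = np.cos(0.7), np.sin(0.7)
Rz = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
Q = P @ Rz.T + np.array([1.0, -2.0, 0.5])
rmsd = kabsch_rmsd(P, Q)
```

A proper rotation cannot undo a reflection, so a mirror-image solution would need its coordinates reflected before this comparison.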

Problems with Sparse Sets of Distances

Control of Rounding Errors

Tolerate Distance Errors

Tolerate Distance Errors (condition on the slide's formula: $(i,j) \in S$, $x_j$ are determined)

Condition on the slide's formula: $(i,j) \in S$, $x_j$ are determined. The objective function is convex, and the problem can be solved using a standard Newton method. Each function evaluation requires on the order of $n$ floating-point operations, where $n$ is the number of atoms. In the ideal case, when every atom can be determined, $n$ atoms require $O(n^2)$ floating-point operations.
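The slide's objective function did not survive in the transcript, so the following is not the Newton formulation it refers to. As a hedged stand-in, assuming NumPy, an atom with more than four determined neighbours and slightly inexact distances can be fitted by linear least squares on the linearized distance equations, which also gives a convex problem. `locate_point_lsq` and the noise level are illustrative assumptions.

```python
import numpy as np

def locate_point_lsq(base, dists):
    """Least-squares position of a point from distances to m >= 4 known points.

    Squaring the distance equations and subtracting the first one gives an
    overdetermined linear system, solved here in the least-squares sense.
    """
    base = np.asarray(base, dtype=float)
    dists = np.asarray(dists, dtype=float)
    A = 2.0 * (base[1:] - base[0])
    sq = (base ** 2).sum(axis=1)
    b = sq[1:] - sq[0] - dists[1:] ** 2 + dists[0] ** 2
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    return x

# Illustrative data: the 8 corners of a cube as determined atoms.
corners = 2.0 * np.array([[i, j, k] for i in (0, 1)
                          for j in (0, 1) for k in (0, 1)], dtype=float)
target = np.array([0.4, 1.1, 1.7])
exact = np.linalg.norm(corners - target, axis=1)
rng = np.random.default_rng(3)
noisy = exact + 1e-3 * rng.normal(size=exact.shape)   # small distance errors
x_exact = locate_point_lsq(corners, exact)
x_noisy = locate_point_lsq(corners, noisy)
```

With m determined neighbours the solve costs on the order of m operations per atom, which is consistent with the quadratic total quoted above when m grows with n.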

NMR Structure Determination: the distances are given with their possible ranges.

(i, j) in S

Computational Results: The structure of 4MBA (red lines) determined by using a geometric build-up algorithm with a subset of all pairs of inter-atomic distances. The X-ray crystallography structure is shown in blue lines.

Computational Results: The total distance errors (red) for the partial structures of a polypeptide chain obtained by using a geometric build-up are all smaller than 1 Å, while those (blue) obtained by using CNS (Brünger et al.) grow quickly with increasing numbers of atoms in the chain.

Extension to Statistical Distance Data: the distributions of the distances in a structure database; structure prediction