Zhijun Wu Department of Mathematics Program on Bio-Informatics and Computational Biology Iowa State University Joint Work with Tauqir Bibi, Feng Cui, Qunfeng.


1 A Novel Geometric Build-Up Algorithm for Solving the Distance Geometry Problem and Its Application to Multidimensional Scaling
Zhijun Wu, Department of Mathematics, Program on Bio-Informatics and Computational Biology, Iowa State University
Joint work with Tauqir Bibi, Feng Cui, Qunfeng Dong, Peter Vedell, Di Wu

2 S Multidimensional Scaling: data classification; geometric mapping of data
T Distance Geometry: mapping from semi-metric to metric spaces; Euclidean and non-Euclidean
B Molecular Conformation: embedding in 3D Euclidean space; protein structure prediction and determination
Fundamental problem: find the coordinates for a set of points, given the distances for all pairs of points.
Cayley-Menger determinant: necessary and sufficient conditions of embedding
singular-value decomposition method
strain/stress minimization
sparse, inexact distances; bounds on the distances; probability distributions

3 Proteins are building blocks of life and key ingredients of biological processes. A biological system may have up to hundreds of thousands of different proteins, each with a specific role in the system. A protein is formed by a polypeptide chain, typically with several hundred amino acids and tens of thousands of atoms. A protein has a unique 3D structure, which in many ways determines the function of the protein.
An example: HIV Retrotranscriptase, 554 amino acids, 4200 atoms

4 Molecular Distance Geometry Problem
Given $n$ atoms $a_1, \dots, a_n$ and a set of distances $d_{i,j}$ between $a_i$ and $a_j$, $(i,j) \in S$
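The rest of this slide's statement did not survive in the transcript. A plausible completion, consistent with the "fundamental problem" on slide 2 (find the coordinates given the distances), is sketched below. The embedding dimension 3 and the coordinate symbols $x_1, \dots, x_n$ are assumptions taken from the molecular setting and from slide 12, not from this slide.

```latex
\text{find } x_1, \dots, x_n \in \mathbb{R}^3
\quad \text{such that} \quad
\| x_i - x_j \| = d_{i,j}, \qquad (i,j) \in S .
```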

5 Problems and Complexity
problems with all distances: solvable in $O(n^3)$ using SVD
problems with sparse sets of distances: NP-complete (Saxe 1979)
problems with distance ranges (NMR results): NP-complete if the ranges are small (Moré and Wu 1997)
problems with probability distributions of distances: stochastic multidimensional scaling, structure prediction

6 Current Approaches
Embed Algorithm by Crippen and Havel
CNS Partial Metrization by Brünger et al.
Graph Reduction by Hendrickson
Alternating Projection by Glunt and Hayden
Global Optimization by Moré and Wu
Multidimensional Scaling by Trosset et al.

7 Embed Algorithm
1. bound smoothing; keep distances consistent
2. distance metrization; estimate the missing distances
3. repeat (say 1000 times):
4.    randomly generate D in between L and U
5.    find X using SVD with D
6.    if X is found, stop
7. select the best approximation X
8. refine X with simulated annealing
9. final optimization
Crippen and Havel 1988 (DGII, DGEOM)
Brünger et al. 1992, 1998 (XPLOR, CNS)
time consuming in $O(n^3 \sim n^4)$
costly in $O(n^2 \sim n^3)$
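Step 5 ("find X using SVD with D") can be illustrated with a minimal sketch of the standard classical multidimensional scaling construction, assuming a complete and exact distance matrix. The function names `coords_from_full_distances` and `pairwise_distances` are hypothetical helpers introduced here, and this is not the DGII or CNS implementation.

```python
import numpy as np

def pairwise_distances(X):
    """Matrix of Euclidean distances between all rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def coords_from_full_distances(D, dim=3):
    """Recover coordinates (up to rotation, translation, reflection)
    from a complete matrix of exact distances."""
    n = D.shape[0]
    # Double-center the squared distances to obtain the Gram matrix
    # of the centered coordinates.
    J = np.eye(n) - np.ones((n, n)) / n
    G = -0.5 * J @ (D ** 2) @ J
    # G is symmetric positive semidefinite for exact Euclidean distances,
    # so its SVD yields the leading eigenpairs in descending order.
    U, s, _ = np.linalg.svd(G)
    return U[:, :dim] * np.sqrt(s[:dim])

# Small demonstration on synthetic points (not protein data).
rng = np.random.default_rng(0)
X_true = rng.normal(size=(20, 3))
D = pairwise_distances(X_true)
X = coords_from_full_distances(D)
max_err = np.abs(pairwise_distances(X) - D).max()
```

The decomposition of the dense n-by-n matrix is what drives the cubic cost quoted for SVD on slide 5.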

8 Geometric Build-Up
Independent Points: A set of $k+1$ points in $R^k$ is called independent if it is not a set of points in $R^{k-1}$.
Metric Basis: A set of points $B$ in a space $S$ is a metric basis of $S$ provided each point of $S$ is uniquely determined by its distances from the points in $B$.
Fundamental Theorem: Any $k+1$ independent points in $R^k$ form a metric basis for $R^k$.
Blumenthal 1953: Theory and Applications of Distance Geometry
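The independence condition can be checked numerically. The sketch below assumes the points are given as coordinates and tests whether the k edge vectors from the first point span $R^k$. The function name `is_independent` and the absolute tolerance `tol` are assumptions made for illustration.

```python
import numpy as np

def is_independent(points, tol=1e-9):
    """Return True if k+1 points in R^k do not lie in a (k-1)-dimensional
    affine subspace, i.e. if they can serve as a metric basis."""
    P = np.asarray(points, dtype=float)
    k = P.shape[1]
    if P.shape[0] != k + 1:
        raise ValueError("expected k+1 points in R^k")
    # The k edge vectors from the first point form a k-by-k matrix;
    # the points are independent exactly when it is nonsingular.
    edges = P[1:] - P[0]
    # Note: tol is an absolute threshold and is scale dependent.
    return abs(np.linalg.det(edges)) > tol

tetrahedron = [[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]]
coplanar = [[0, 0, 0], [1, 0, 0], [0, 1, 0], [1, 1, 0]]
tetra_ok = is_independent(tetrahedron)
coplanar_ok = is_independent(coplanar)
```

Four non-coplanar atoms pass the check, while four atoms in one plane fail it and cannot uniquely fix a fifth atom in 3D.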

9 Geometric Build-Up in two dimensions

10 Geometric Build-Up in three dimensions

11 Geometric Build-Up in three dimensions

12 Geometric Build-Up
Determined points:
$x_1 = (u_1, v_1, w_1)$, $x_2 = (u_2, v_2, w_2)$, $x_3 = (u_3, v_3, w_3)$, $x_4 = (u_4, v_4, w_4)$
Points to be determined:
$x_i = (u_i, v_i, w_i)$ = ?, $x_j = (u_j, v_j, w_j)$ = ?
Distance equations for $x_i$:
$\|x_i - x_1\| = d_{i,1}$, $\|x_i - x_2\| = d_{i,2}$, $\|x_i - x_3\| = d_{i,3}$, $\|x_i - x_4\| = d_{i,4}$
Distance equations for $x_j$:
$\|x_j - x_1\| = d_{j,1}$, $\|x_j - x_2\| = d_{j,2}$, $\|x_j - x_3\| = d_{j,3}$, $\|x_j - x_4\| = d_{j,4}$
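One way to see why four determined points suffice is to square the four distance equations for $x_i$ and subtract the first from the others, which removes the quadratic term. The following is a sketch of that standard manipulation, assuming $x_1, \dots, x_4$ are independent (not coplanar). The index $k$ is introduced here for brevity and does not appear on the slide.

```latex
\|x_i - x_k\|^2 = d_{i,k}^2
\;\Longrightarrow\;
\|x_i\|^2 - 2\, x_k^{T} x_i + \|x_k\|^2 = d_{i,k}^2,
\qquad k = 1, 2, 3, 4,

2\,(x_k - x_1)^{T} x_i
  = \|x_k\|^2 - \|x_1\|^2 - d_{i,k}^2 + d_{i,1}^2,
\qquad k = 2, 3, 4.
```

The second line is a 3-by-3 linear system in the unknown $x_i = (u_i, v_i, w_i)$. Its matrix is nonsingular exactly when the four determined points are independent, so $x_i$ is unique. The same system with $d_{j,k}$ in place of $d_{i,k}$ gives $x_j$.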

13 The geometric build-up algorithm solves a molecular distance geometry problem in $O(n)$ when distances between all pairs of atoms are given, while the singular value decomposition algorithm requires $O(n^2 \sim n^3)$ computing time!
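The linear cost can be made concrete with a minimal sketch of a build-up pass over a complete, exact distance matrix. It assumes the first four atoms are independent and uses them as the fixed metric basis. The names `place_first_four` and `build_up` are hypothetical, and the authors' implementation may differ, for example in how the base atoms are chosen.

```python
import numpy as np

def pairwise_distances(X):
    """Matrix of Euclidean distances between all rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def place_first_four(D):
    """Fix a frame: atom 0 at the origin, atom 1 on the u-axis, atom 2 in
    the (u, v)-plane, atom 3 with w >= 0 (one of two mirror images)."""
    d01, d02, d03 = D[0, 1], D[0, 2], D[0, 3]
    d12, d13, d23 = D[1, 2], D[1, 3], D[2, 3]
    u2 = (d01**2 + d02**2 - d12**2) / (2 * d01)
    v2 = np.sqrt(max(d02**2 - u2**2, 0.0))
    u3 = (d01**2 + d03**2 - d13**2) / (2 * d01)
    v3 = (d03**2 - d23**2 + d02**2 - 2 * u2 * u3) / (2 * v2)
    w3 = np.sqrt(max(d03**2 - u3**2 - v3**2, 0.0))
    return np.array([[0.0, 0.0, 0.0], [d01, 0.0, 0.0],
                     [u2, v2, 0.0], [u3, v3, w3]])

def build_up(D):
    """Determine all atoms from their distances to the four base atoms."""
    n = D.shape[0]
    X = np.zeros((n, 3))
    X[:4] = place_first_four(D)
    base = X[:4]
    A = 2.0 * (base[1:] - base[0])       # 3x3 matrix, shared by all atoms
    base_sq = (base ** 2).sum(axis=1)
    for i in range(4, n):
        # Only four distances per atom are read; each solve is constant work.
        b = base_sq[1:] - base_sq[0] - D[i, 1:4] ** 2 + D[i, 0] ** 2
        X[i] = np.linalg.solve(A, b)
    return X

# Synthetic test: a well-conditioned base tetrahedron plus random points.
rng = np.random.default_rng(0)
X_true = np.vstack([
    np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]),
    rng.normal(size=(26, 3)),
])
D = pairwise_distances(X_true)
X = build_up(D)
max_err = np.abs(pairwise_distances(X) - D).max()
```

Each atom after the first four costs one small linear solve, so the total work grows linearly with the number of atoms. The sketch never forms or decomposes an n-by-n matrix.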

14 The X-ray crystallography structure (left) of the HIV-1 RT p66 protein (4200 atoms) and the structure (right) determined by the geometric build-up algorithm using the distances for all pairs of atoms in the protein. The algorithm took only 188,859 floating-point operations to obtain the structure, while a conventional singular-value decomposition algorithm required 1,268,200,000 floating-point operations. The RMSD of the two structures is ~$10^{-4}$ Å.

15 Problems with Sparse Sets of Distances

16 Control of Rounding Errors

17

18 Tolerate Distance Errors

19 Tolerate Distance Errors
$(i,j) \in S$, $x_j$ are determined.

20 $(i,j) \in S$, $x_j$ are determined. The objective function is convex and the problem can be solved using a standard Newton method. Each function evaluation requires on the order of $n$ floating-point operations, where $n$ is the number of atoms. In the ideal case when every atom can be determined, $n$ atoms require $O(n^2)$ floating-point operations.
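The objective function on this slide did not survive in the transcript, so it is not reproduced here. As a stand-in that illustrates the same idea (using all already-determined neighbors of an atom so that distance errors average out), the sketch below solves the linearized distance equations in the least-squares sense. This is a substituted technique, not the Newton formulation on the slide, and the name `locate_least_squares` is hypothetical.

```python
import numpy as np

def locate_least_squares(neighbors, dists):
    """Estimate one atom's position from four or more determined
    neighbors and (possibly inexact) distances to them."""
    P = np.asarray(neighbors, dtype=float)
    d = np.asarray(dists, dtype=float)
    # Subtract the first squared-distance equation from the others to
    # obtain an overdetermined linear system A x = b.
    A = 2.0 * (P[1:] - P[0])
    sq = (P ** 2).sum(axis=1)
    b = sq[1:] - sq[0] - d[1:] ** 2 + d[0] ** 2
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    return x

# Eight determined neighbors at the corners of a unit cube.
corners = np.array([[i, j, k] for i in (0, 1) for j in (0, 1) for k in (0, 1)],
                   dtype=float)
x_true = np.array([0.3, 0.4, 0.5])
exact = np.linalg.norm(corners - x_true, axis=1)
# Deterministic perturbation of 0.001 with alternating sign.
noisy = exact + 1e-3 * np.array([1, -1, 1, -1, 1, -1, 1, -1])

err_exact = np.linalg.norm(locate_least_squares(corners, exact) - x_true)
err_noisy = np.linalg.norm(locate_least_squares(corners, noisy) - x_true)
```

With exact distances the estimate matches the true position up to rounding. With perturbed distances the estimate stays close to the true position instead of failing because the equations are inconsistent.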

21 NMR Structure Determination
The distances are given with their possible ranges.

22 $(i, j) \in S$

23 Computational Results
The structure of 4MBA (red lines) determined by using a geometric build-up algorithm with a subset of all pairs of inter-atomic distances. The X-ray crystallography structure is shown in blue lines.

24 Computational Results
The total distance errors (red) for the partial structures of a polypeptide chain obtained by using a geometric build-up are all smaller than 1 Å, while those (blue) obtained by using CNS (Brünger et al.) grow quickly with increasing numbers of atoms in the chain.

25 Extension to Statistical Distance Data
the distributions of the distances in structure database
structure prediction

