Conformational Space. Conformation of a molecule: specification of the relative positions of all atoms in 3D-space. Typical parameterizations: List.


1 Conformational Space

2
- Conformation of a molecule: specification of the relative positions of all atoms in 3D space
- Typical parameterizations:
  - List of coordinates of atom centers
  - List of torsional angles (e.g., the dihedral angles of a protein, such as φ and ψ)
- Conformational space: space of all conformations

3 Conformational Space [Figure: coordinate axes $q_1, q_2, \dots, q_i, q_j, \dots, q_{N-1}, q_N$]

4 Conformational Space [Figure labels: $q_0, q_1, q_3, q_4, \dots, q_n$]

5 Relation to Robotics/Graphics [Figure labels: $q_0, q_1, q_2, q_3, q_4, \dots, q_n$; a curve parameterized by $t$; "Configuration space"]

6 Need for a Metric
- Simulation and sampling techniques can produce millions of conformations
- Which conformations are similar?
- Which ones are close to the folded one?
- Do some conformations form small clusters (e.g., key intermediates during folding)?

7 Metric in Conformational Space
- A metric over conformational space $C$ is a function $d : (c, c') \in C \times C \mapsto d(c, c') \in \mathbb{R}^{+} \cup \{0\}$ such that:
  - $d(c, c') = 0 \iff c = c'$ (non-degeneracy)
  - $d(c, c') = d(c', c)$ (symmetry)
  - $d(c, c') + d(c', c'') \ge d(c, c'')$ (triangle inequality)

8 But not all metrics are “good”
- Euclidean metric over torsional angles: $d(c, c') = \left[ \sum_{i=1}^{n} \left( |\phi_i - \phi_i'|^2 + |\psi_i - \psi_i'|^2 \right) \right]^{1/2}$

9

10

11 Metric in Conformational Space
- A “good” metric should measure how well the atoms in two conformations can be aligned
- Usual metrics: cRMSD, dRMSD

12 RMSD
- Given two sets of $n$ points in $\mathbb{R}^3$: $A = \{a_1, \dots, a_n\}$ and $B = \{b_1, \dots, b_n\}$
- The RMSD between $A$ and $B$ is $\mathrm{RMSD}(A, B) = \left[ \frac{1}{n} \sum_{i=1}^{n} \|a_i - b_i\|^2 \right]^{1/2}$, where $\|a_i - b_i\|$ denotes the Euclidean distance between $a_i$ and $b_i$ in $\mathbb{R}^3$
- $\mathrm{RMSD}(A, B) = 0$ iff $a_i = b_i$ for all $i$
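The RMSD definition above translates almost directly into code. The following is a minimal sketch, assuming the two point sets are given as arrays of shape (n, 3) with matching row order; the function name `rmsd` is chosen here for illustration and does not come from the slides.

```python
import numpy as np

def rmsd(A, B):
    """Root-mean-square deviation between two (n, 3) point sets paired by index."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    if A.shape != B.shape:
        raise ValueError("A and B must have the same shape")
    sq_dists = np.sum((A - B) ** 2, axis=1)  # ||a_i - b_i||^2 for each i
    return float(np.sqrt(sq_dists.mean()))   # [ (1/n) * sum ]^(1/2)
```

Note that this plain RMSD performs no alignment, so it changes when one point set is merely translated or rotated.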

13 cRMSD
- Molecule $M$ with $n$ atoms $a_1, \dots, a_n$
- Two conformations $c$ and $c'$ of $M$
- $a_i(c)$ is the position of $a_i$ when $M$ is at $c$
- $\mathrm{cRMSD}(c, c')$ is the minimized RMSD between the two sets of atom centers: $\min_T \left[ \frac{1}{n} \sum_{i=1}^{n} \|a_i(c) - T(a_i(c'))\|^2 \right]^{1/2}$, where the minimization is over all possible rigid-body transforms $T$
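The slides do not say how the minimization over rigid-body transforms is carried out. One standard choice is the Kabsch algorithm: center both point sets, then obtain the optimal rotation from the SVD of a 3 × 3 covariance matrix. The sketch below assumes that approach and (n, 3) coordinate arrays; the name `crmsd` is illustrative.

```python
import numpy as np

def crmsd(P, Q):
    """Minimum RMSD between (n, 3) arrays P and Q over rotations and translations of Q.

    Assumption: Kabsch algorithm; reflections are excluded from the search.
    """
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    Pc = P - P.mean(axis=0)            # optimal translation: superimpose centroids
    Qc = Q - Q.mean(axis=0)
    H = Qc.T @ Pc                      # 3x3 covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    D = np.diag([1.0, 1.0, d])         # flip one axis if needed to avoid a reflection
    R = Vt.T @ D @ U.T                 # proper rotation taking Qc onto Pc
    Q_aligned = Qc @ R.T
    return float(np.sqrt(np.mean(np.sum((Pc - Q_aligned) ** 2, axis=1))))
```

Apart from the fixed-size 3 × 3 SVD, every step is a single pass over the n atoms.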

14

15

16 cRMSD
- cRMSD satisfies the triangle inequality
- cRMSD takes linear time to compute
- Often, cRMSD is restricted to a subset of atoms, e.g., the Cα atoms on a protein’s backbone

17 Representation Restricted to Cα Atoms (protein 1tph)
- The positions of the amino-acid residue centers (Cα atoms) mainly determine the structure of a protein.
- In structural comparison, people usually work only on the backbone of Cα atoms and neglect the other atoms.

18 Possible project: Design a method for efficiently finding nearest neighbors in a sampled conformation space of a protein, using the cRMSD metric.

19 dRMSD
- Molecule $M$ with $n$ atoms $a_1, \dots, a_n$
- Two conformations $c$ and $c'$ of $M$
- $\{d_{ij}(c)\}$: $n \times n$ symmetric intra-molecular distance matrix in $M$ at $c$
- $\mathrm{dRMSD}(c, c') = \left[ \frac{2}{n(n-1)} \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \left( d_{ij}(c) - d_{ij}(c') \right)^2 \right]^{1/2}$
- $\{d_{ij}\}$ is usually restricted to a subset of atoms, e.g., the Cα atoms on a protein’s backbone

20 Intra-Molecular Distance Matrix: distances between Cα pairs of a protein with 142 residues. Darker squares represent shorter distances.

21 Intra-Molecular Distance Matrix: distances between Cα pairs of a protein with 142 residues. Darker squares represent shorter distances.

22 Intra-Molecular Distance Matrix

23 dRMSD
- Molecule $M$ with $n$ atoms $a_1, \dots, a_n$
- Two conformations $c$ and $c'$ of $M$
- $\{d_{ij}(c)\}$: $n \times n$ symmetric intra-molecular distance matrix in $M$ at $c$
- $\mathrm{dRMSD}(c, c') = \left[ \frac{2}{n(n-1)} \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \left( d_{ij}(c) - d_{ij}(c') \right)^2 \right]^{1/2}$
- $\{d_{ij}\}$ is usually restricted to a subset of atoms, e.g., the Cα atoms on a protein’s backbone

24 dRMSD
- Molecule $M$ with $n$ atoms $a_1, \dots, a_n$
- Two conformations $c$ and $c'$ of $M$
- $\{d_{ij}(c)\}$: $n \times n$ symmetric intra-molecular distance matrix in $M$ at $c$
- $\mathrm{dRMSD}(c, c') = \left[ \frac{2}{n(n-1)} \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \left( d_{ij}(c) - d_{ij}(c') \right)^2 \right]^{1/2}$
- $\{d_{ij}\}$ is usually restricted to a subset of atoms, e.g., the Cα atoms on a protein’s backbone
- Advantage: no aligning transform
- Drawback: takes quadratic time to compute
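As a rough illustration of the dRMSD formula, the sketch below builds both intra-molecular distance matrices and averages the squared differences over the n(n-1)/2 pairs with i < j. It assumes (n, 3) coordinate arrays; `distance_matrix` and `drmsd` are names introduced here.

```python
import numpy as np

def distance_matrix(X):
    """Symmetric n x n matrix of pairwise Euclidean distances for (n, 3) coordinates."""
    X = np.asarray(X, dtype=float)
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt(np.sum(diff ** 2, axis=-1))

def drmsd(X, Y):
    """dRMSD between two conformations given as (n, 3) arrays with the same atom order."""
    DX, DY = distance_matrix(X), distance_matrix(Y)
    n = DX.shape[0]
    iu = np.triu_indices(n, k=1)  # index pairs with i < j: n(n-1)/2 of them
    # The mean over those pairs equals (2 / (n(n-1))) * sum of squared differences.
    return float(np.sqrt(np.mean((DX[iu] - DY[iu]) ** 2)))
```

No alignment step appears anywhere, but the work grows with the number of atom pairs, matching the advantage and drawback listed above.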

25 Is dRMSD a metric?
- $\mathrm{dRMSD}(c, c') = \left[ \frac{2}{n(n-1)} \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \left( d_{ij}(c) - d_{ij}(c') \right)^2 \right]^{1/2}$ is a metric in the $n(n-1)/2$-dimensional space, where a conformation $c$ is represented by $\{d_{ij}(c)\}$
- But, in this representation, the same point represents both a conformation and its mirror image
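The mirror-image remark can be checked in one line. As a sketch, let $\sigma$ denote a reflection of 3D space (notation introduced here, not in the slides) and $\sigma(c)$ the conformation obtained by reflecting every atom of $c$:

```latex
\begin{aligned}
d_{ij}(\sigma(c)) &= \|\sigma(a_i(c)) - \sigma(a_j(c))\| = \|a_i(c) - a_j(c)\| = d_{ij}(c)
  \quad \text{for all } i < j, \\
\Rightarrow\quad \mathrm{dRMSD}(c, \sigma(c)) &= 0
  \quad \text{even when } \sigma(c) \text{ cannot be superimposed on } c \text{ by a rigid motion.}
\end{aligned}
```

So, taken over conformations rather than over distance vectors, dRMSD fails non-degeneracy for chiral molecules. cRMSD separates the two, assuming reflections are not counted among the rigid-body transforms.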

26 k-Nearest-Neighbors Problem
Given a set S of conformations of a protein and a query conformation c, find the k conformations in S most similar to c (w.r.t. cRMSD, dRMSD, or another metric).
Can be done in time O(N(log k + L)), where:
- N = size of S
- L = time to compare two conformations
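One way to reach the O(N(log k + L)) bound is a single scan over S that keeps the k best candidates in a bounded heap: each conformation costs one comparison (L) plus at most one heap update (O(log k)). The sketch below assumes this scheme; `k_nearest` and the `dist` callback are illustrative names, and `dist` could be any of the measures above.

```python
import heapq

def k_nearest(S, c, k, dist):
    """Indices of the k items of S closest to c under dist, nearest first."""
    if k <= 0:
        return []
    heap = []  # max-heap on distance, emulated by storing negated distances
    for idx, s in enumerate(S):
        d = dist(c, s)                          # one comparison: cost L
        if len(heap) < k:
            heapq.heappush(heap, (-d, idx))     # O(log k)
        elif -d > heap[0][0]:                   # closer than the current k-th best
            heapq.heapreplace(heap, (-d, idx))  # O(log k)
    return [idx for _, idx in sorted(heap, reverse=True)]
```

For example, `k_nearest([0.0, 5.0, 1.0, 3.5, 2.5], 2.0, 3, lambda a, b: abs(a - b))` scans five scalar "conformations" and returns the indices of the three closest to 2.0.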

27 k-Nearest-Neighbors Problem
The total time needed to compute the k nearest neighbors of every conformation in S is $O(N^2 (\log k + L))$.
Much too long for large datasets, where N ranges from tens of thousands to millions!
Can be improved by:
1. Reducing L
2. Using a more efficient algorithm (e.g., kd-tree)

28 kd-Tree
In a d-dimensional space, where d > 2, range searching for a point takes $O(d\,n^{1-1/d})$ time.
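For readers unfamiliar with the data structure, here is a minimal kd-tree sketch with a k-nearest-neighbor query. It is not the implementation used in the cited experiments; it assumes each conformation has already been reduced to a short numeric vector compared with Euclidean distance, and the names `build_kdtree` and `knn_query` are hypothetical.

```python
import heapq

def build_kdtree(points, depth=0):
    """points: list of (vector, label) pairs. Returns nested tuples, or None if empty."""
    if not points:
        return None
    axis = depth % len(points[0][0])           # cycle through the coordinates
    points = sorted(points, key=lambda p: p[0][axis])
    mid = len(points) // 2                     # median point splits the set
    return (points[mid], axis,
            build_kdtree(points[:mid], depth + 1),
            build_kdtree(points[mid + 1:], depth + 1))

def knn_query(node, q, k, heap=None):
    """Return a heap of (-squared_distance, label) for the k points nearest to q."""
    if heap is None:
        heap = []
    if node is None:
        return heap
    (vec, label), axis, left, right = node
    d2 = sum((a - b) ** 2 for a, b in zip(vec, q))
    if len(heap) < k:
        heapq.heappush(heap, (-d2, label))
    elif d2 < -heap[0][0]:
        heapq.heapreplace(heap, (-d2, label))
    delta = q[axis] - vec[axis]
    near, far = (left, right) if delta < 0 else (right, left)
    knn_query(near, q, k, heap)
    # Visit the far side only if the splitting plane is closer than the k-th best so far.
    if len(heap) < k or delta ** 2 < -heap[0][0]:
        knn_query(far, q, k, heap)
    return heap
```

The pruning test is what saves work compared with a full scan: whole subtrees are skipped when the splitting plane is already farther away than the current k-th nearest candidate.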

29 k-Nearest-Neighbors Problem
Idea: simplify the protein’s description

30
- cRMSD → $O(n)$ time
- dRMSD → $O(n^2)$ time
- Assume that each conformation is described by the coordinates of the n Cα atoms

31 This representation is highly redundant
- Proximity along the chain entails spatial proximity
- Atoms can’t bunch up, hence atoms that are far apart along the chain are on average spatially distant
[Figure labels: $c_i$, $c_j$]

32 m-Averaged Approximation
- Cut the backbone into fragments of m Cα atoms
- Replace each fragment by the centroid of the m Cα atoms
- Simplified cRMSD and dRMSD
- 3n coordinates → 3n/m coordinates
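The averaging step can be sketched in a few lines. The version below assumes the Cα coordinates are an (n, 3) array in chain order. The slides do not say what happens when n is not a multiple of m; averaging the shorter trailing fragment on its own is an assumption made here, as is the name `m_average`.

```python
import numpy as np

def m_average(X, m):
    """Replace each run of m consecutive rows of the (n, 3) array X by its centroid."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    n_full = (n // m) * m                       # atoms covered by complete fragments
    centroids = X[:n_full].reshape(n // m, m, 3).mean(axis=1)
    if n_full < n:
        # Assumption: a leftover fragment shorter than m is averaged by itself.
        centroids = np.vstack([centroids, X[n_full:].mean(axis=0)])
    return centroids
```

The output can be passed unchanged to any cRMSD or dRMSD routine that accepts coordinate arrays, which then runs on roughly n/m points instead of n.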

33 Evaluation: Test Sets [Lotan and Schwarzer, 2003]
- 8 diverse proteins (54-76 residues)
- Decoy sets of N = 10,000 conformations from the Park-Levitt set [Park et al, 1997]
- Correlation:

m    cRMSD       dRMSD
3    0.99        0.96-0.98
4    0.98-0.99   0.94-0.97
6    0.92-0.99   0.78-0.93
9    0.81-0.98   0.65-0.96
12   0.54-0.92   0.52-0.69

- Higher correlation for random sets (→ greater savings)

34 Running Times

35 Further Reduction for dRMSD 1) Stack m-averaged distance matrices as vectors of a matrix A

36 [Figure: matrix A with r rows and N columns; column $a_i$ is the vector of elements of the distance matrix of the $i$-th conformation ($i = 1$ to $N$)]

37 Further Reduction for dRMSD
1) Stack m-averaged distance matrices as vectors of a matrix A
2) Compute the SVD $A = U D V^T$

38 SVD Decomposition
$A\,(r \times N) = U\,(r \times r)\; D\,(r \times r)\; V^T\,(r \times N)$
- Column $a_j$ of $A$: vector of elements of the distance matrix of the $j$-th conformation ($j = 1$ to $N$)
- $U$: orthonormal (rotation) matrix
- $D$: diagonal matrix

39 SVD Decomposition
- $D = \mathrm{diag}(s_1, s_2, \dots, s_r)$, with zeros off the diagonal and $s_1 \ge s_2 \ge \dots \ge s_r \ge 0$ (singular values)

40 SVD Decomposition
- $V^T$: matrix with orthonormal rows $v_j^T$, $v_k^T$
- $v_j$ and $v_k$ are orthogonal unit $N \times 1$ vectors

41 SVD Decomposition
- r-dimensional space [Figure: coordinate systems (x, y) and (X, Y)]
- Representation of A in space (X, Y) does not depend on the coordinate system!

42 SVD Decomposition
- Rows $v_1^T, v_2^T, \dots$ of $V^T$ are scaled by the singular values $s_1, s_2, s_3, \dots, s_r$: $\|s_1 v_1\| \ge \|s_2 v_2\| \ge \dots$

43 SVD Decomposition
- Among $s_1, s_2, s_3, \dots, s_r$, the rows $v_1^T, v_2^T, \dots, v_p^T$ give the p principal components

44 SVD Decomposition
- Keep $s_1, s_2, \dots, s_p$ with rows $v_1^T, v_2^T, \dots, v_p^T$ (p principal components); the remaining singular values are set to 0

45 Further Reduction for dRMSD
1) Stack m-averaged distance matrices as vectors of a matrix A
2) Compute the SVD $A = U D V^T$
3) Project onto p principal components
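The three steps can be sketched with NumPy as follows. This is an illustration under assumptions, not the authors' code: A is taken to be an r × N array with one flattened distance matrix per column, no mean-centering is applied (the slides do not mention any), and `pc_reduce` is a name introduced here.

```python
import numpy as np

def pc_reduce(A, p):
    """Project the columns of the (r, N) matrix A onto its first p principal directions.

    Returns a (p, N) array; column j is the reduced description of conformation j.
    """
    A = np.asarray(A, dtype=float)
    U, s, Vt = np.linalg.svd(A, full_matrices=False)  # A = U @ diag(s) @ Vt
    # U[:, :p].T @ A equals diag(s[:p]) @ Vt[:p]: the rows s_i * v_i^T that are kept.
    return U[:, :p].T @ A
```

Since dRMSD is, up to a constant factor, the Euclidean distance between two columns of A, the Euclidean distance between the corresponding reduced columns approximates it using only p terms. The projection can only shrink distances, and with p = r it preserves them exactly.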

46 [Figure: correlation between dRMSD and its reduced approximation] The computation is reduced to summing up 12 to 20 terms (instead of ~80 to 200, since the proteins have 54 to 76 amino acids)

47 Complexity of SVD
- SVD of an r × N matrix, where N > r, takes $O(r^2 N)$ time
- Here $r \sim (n/m)^2$
- So the time complexity is $O((n/m)^4 N)$, i.e., $O(n^4 N)$ for constant $m$
- Would be too costly without m-averaging
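A worked instance makes the saving concrete. It assumes a 76-residue protein (the upper end of the test-set range) with $m = 4$, and counts $r$ exactly as the number of distinct pairs rather than using the $(n/m)^2$ estimate:

```latex
\begin{aligned}
\text{with 4-averaging:}\quad & n/m = 76/4 = 19, \qquad r = \tfrac{19 \cdot 18}{2} = 171 \\
\text{without averaging:}\quad & r = \tfrac{76 \cdot 75}{2} = 2850 \\
\text{ratio of } r^2 N \text{ costs:}\quad & \left( \tfrac{2850}{171} \right)^2 \approx 278
  \qquad (\text{compare } m^4 = 256)
\end{aligned}
```

For the same number of conformations N, the SVD on 4-averaged distance matrices is therefore cheaper by a factor of a few hundred.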

48 Evaluation for 1CTF Decoy Sets [Lotan and Schwarzer, 2003]
- N = 100,000, k = 100, 4-averaging, 16 PCs
- 70% correct, with furthest NN off by 20%
- Brute-force: 84 h
- Brute-force + m-averaging: 4.8 h
- Brute-force + m-averaging + PC: 41 min
- kd-tree + m-averaging + PC: 19 min
- Speedup greater than ×200
- The 6k approximate NNs contain all k true NNs
- Use m-averaging and PC reduction as fast filters

