Presentation is loading. Please wait.

Presentation is loading. Please wait.

Finding Compact Structural Motifs Presented By: Xin Gao Authors: Jianbo Qian, Shuai Cheng Li, Dongbo Bu, Ming Li, and Jinbo Xu University of Waterloo,

Similar presentations


Presentation on theme: "Finding Compact Structural Motifs Presented By: Xin Gao Authors: Jianbo Qian, Shuai Cheng Li, Dongbo Bu, Ming Li, and Jinbo Xu University of Waterloo,"— Presentation transcript:

1 Finding Compact Structural Motifs Presented By: Xin Gao Authors: Jianbo Qian, Shuai Cheng Li, Dongbo Bu, Ming Li, and Jinbo Xu University of Waterloo, Ontario, Canada j3qian@cs.uwaterloo.ca

2 Outline Introduction to Structural Motif Related Work Compact Motif-finding Problem Formulation NP-Hard of the Compact Motif-finding Problem A Polynomial Time Approximate Scheme

3 Outline Introduction to Structural Motif Related Work Compact Motif-finding Problem Formulation NP-Hard of the Compact Motif-finding Problem A Polynomial Time Approximate Scheme

4 Introduction Protein is a sequence of amino acids. A protein always folds into a specific 3-D shape. Structures are important to proteins:  The functional properties of proteins depend on their 3-D structures.  Structures are more conserved than sequence during the evolution of proteins.

5 Structural Motif Structural motif is a frequently occurring substructure of proteins. Motifs are thought to be tightly related to protein functions. Identifying motifs from a set of proteins can help us to know their evolutionary history and functions.

6 Structural Motif Finding Problem Given a set of protein structures, to find the frequently occurring substructure. Informally, to find one substructure from each protein, that exhibit the highest degree of similarity.

7 How to measure the similarity of two substructures? Two popular measurements:  dRMSD: measure the root mean square Euclidean distance between the corresponding residues from different protein structures.  cRMSD: calculate the internal distance matrix for each protein, and compare the distance matrices for input structures.

8 Outline Introduction to Structural Motif Related Work Compact Motif-finding Problem Formulation NP-Hard of the Compact Motif-finding Problem A Polynomial Time Approximate Scheme

9 Related Work L.P.Chew proposed an iterative algorithm to compute the conserved shape and proved its convergence. (2002) D. Bandyopadhyay applied graph-based data- mining tools to find the family-specific fingerprints. (2006) M. Shatsky presented an algorithm to uncover the binding pattern. (2006) DALI and CE attempt to identify structural alignment with minimal dRMSD. STRUCTRAL and TM-Align employ heuristics to detect the alignment with minimal cRMSD.

10 Related Work (continued) However, these methods are all heuristic; the solutions are not guaranteed to be optimal or near optimal. The first PTAS for pairwise structural alignment:  R. Kolodny explored the Lipschitz property of the scoring function. (2004) Though this algorithm can be extended to the case of multiple structure alignment, the simple extension has a time complexity exponential in the number of proteins. Is there a PTAS to multiple structure motif finding?

11 Outline Introduction to Structural Motif Related Work Compact Motif-finding Problem Formulation NP-Hard of the Compact Motif-finding Problem A Polynomial Time Approximate Scheme

12 We focus on (R, C)-Compact Motif. What is (R, C)-compact motif?  A motif is bounded in a minimum ball with radius R.  In this ball, at most C residues do not belong to this motif. (R,C)-compact motif is biologically meaningful since  We focus on globular proteins.  We allows at most C exceptions.

13 (R, C)-Compact Motif Finding Problem Input: protein structures S 1 …, S n, and length l Output: a consensus consists of l 3D points  q=(q 1, …, q l )  a substructure u i from each protein Si Objective:  min (  1  i  n d 2 (q, u i )) 1/2 Here, we adopt the dRMSD distance function, i.e.,  d(q, u i )=min  ||q-  (u i )|| 2  consists of a rotation and a translation ||*|| 2 is the Euclidean metric.

14 Outline Introduction to Structural Motif Related Work Compact Motif-finding Problem Formulation NP-Hard of the Compact Motif-finding Problem A Polynomial Time Approximate Scheme

15 (R,C)-compact motif finding is still NP-Hard. Reduction from the Sequence Consensus Problem  Input: n binary strings S 1, …, S n, each is of length m  Output: A substring t i of length l from each string S i, 1  i  n,  Objective: minimize  1  i <i’  n d H (t i, t i’ ), where d H is Hamming distance. Basic Idea:  Try to find a way of reduction to make: dRMSD=Hamming Distance

16 (R,C)-compact motif finding is still NP-Hard. Each l-mer is transformed into 6l 3D points. 110  110 001 000000 111111 0  (0, 2i, 0), 1  (1, 2i, 0)

17 (R,C)-compact motif finding is still NP-Hard. Each l-mer is transformed into 6l 3D points. 110  110 001 000000 111111 0  (0, 2i, 0), 1  (1, 2i, 0)  The centroid will be (1/2, 2i, 0) (Easy translation)  Large “tail”  no rotation  RMSD = Hamming Distance Small distortion to each point to make it protein- like. Sequence Consensus Problem  (1,0)- Compact Motif Finding Problem

18 Outline Introduction to Structural Motif Related Work Compact Motif-finding Problem Formulation NP-Hard of the Compact Motif-finding Problem A Polynomial Time Approximate Scheme

19 The Basic Idea of Our PTAS There are always a few “important” sub- structures, whose consensus holds most of the “secrets” of the true optimal motif. Therefore, if we can simply do exhaustive search to find these few sub-structures, then the trivial optimal solution for these sub- structures is a good approximation to the real optimal solution.

20 Technique 1: Sampling We sample only r proteins, consider each motif in a sampled protein, we can say we almost know the optimal solution.

21 Sampling will introduce only a bit of error. There is at least one selection schema, whose consensus has a cost value less than (1+1/r)OPT. So, we can find this schema by simply enumerating operation.

22 Technique 2: Discretize the Rotation Space Each rotation is parameterized by three angles   1,  2,  3  [0, 2  ) Discretize the angles with step size  ’  we get an  ’-rotation net.

23 Discretized rotation will not introduce a large error, either. A parameterized algorithm for protein structure alignment. J. Xu, F. Jiao, and B. Berger. RECOMB2006.

24 PTAS

25

26 Performance Ratio Analysis

27 Running Time Each protein contains M motifs  M is a polynomial of protein length Each motif can adopt W rotations  W depends on the constant  So the number of consensus is less than  O(n r (MW) r )= O((nMW) r )

28 Conclusion and Future Work We prove the (R,C)-compact motif finding problem is NP-hard We obtain a PTAS for this problem. Future Work:  Further reduce the time complexity  Design some practical algorithms.  Solve a more general case.

29 Thank You. Questions…


Download ppt "Finding Compact Structural Motifs Presented By: Xin Gao Authors: Jianbo Qian, Shuai Cheng Li, Dongbo Bu, Ming Li, and Jinbo Xu University of Waterloo,"

Similar presentations


Ads by Google