Department of Computer Science, University of California, Santa Barbara August 11-14, 2003 CTSS: A Robust and Efficient Method for Protein Structure Alignment.


1 CTSS: A Robust and Efficient Method for Protein Structure Alignment Based on Local Geometrical and Biological Features
Tolga Can and Yuan-Fang Wang
Department of Computer Science, University of California, Santa Barbara
August 11-14, 2003

2 Introduction (CSB2003, August 11-14, 2003)
- Importance of discovering structural relationships between proteins
- Structural alignment: NP-hard
- Protein structure representation: no standard as in sequence alignment
- Many algorithms
  - Inter-atomic distances (CE, DALI)
  - SSE vectors (VAST, 3D-Lookup)
- Different similarity measures
  - RMSD, p-value, etc.

3 Problem Definition
Given a protein structure, find similar protein structures in a database of protein structures.
(Figure: a query "?" is matched against a set of chains. Chains shown: 1fse:A, 1jek:B, 1alu:_, 2spc:A, 1l3l:C, 1k61:D, 1kzu:B, 1et1:A, 1jig:A, 1wdc:A, 1nkd:_, 1fmh:A, 1gl2:A. Result shown after "=": 1l3l:C, 1kzu:B, 1jig:A, 1nkd:_.)

4 Protein Structure?
PDB file (excerpt):
    HEADER PHEROMONE 20-DEC-95 2ERL
    ...
    SEQRES 1 40 ASP ALA CYS GLU GLN ALA
    ...
    ATOM 1 N ASP 1 -1.115 8.537 7.075
    ATOM 2 CA ASP 1 -1.925 7.470 6.547
    ATOM 3 C ASP 1 -2.009 6.333 7.522
    ATOM 4 O ASP 1 -1.467 6.394 8.624
    ATOM 5 CB ASP 1 -1.526 6.993 5.163
    ATOM 6 N ALA 2 -2.745 5.280 7.165
    ATOM 7 CA ALA 2 -2.945 4.152 7.987
    ATOM 8 C ALA 2 -1.606 3.448 8.305
    ATOM 9 O ALA 2 -1.440 3.010 9.454
    ATOM 10 CB ALA 2 -3.966 3.256 7.436
    ATOM 11 N CYS 3 -0.777 3.267 7.329
    ATOM 12 CA CYS 3 0.570 2.624 7.511
    ATOM 13 C CYS 3 1.328 3.308 8.626
    ATOM 14 O CYS 3 1.802 2.679 9.562
    ATOM 15 CB CYS 3 1.351 2.667 6.209
    ATOM 16 SG CYS 3 2.981 1.901 6.318
    ...
We use Cα coordinates to represent the protein structure.

5 Protein Structure
(Same PDB file excerpt as on slide 4.)
The Cα coordinates of a protein define a curve in 3D space.
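A minimal sketch of extracting the Cα trace from ATOM records. It assumes the simplified, whitespace-separated layout shown on the slide (record name, serial, atom name, residue name, residue number, x, y, z); real PDB files use fixed-width columns with additional fields, so a production parser would slice columns instead. The function name `ca_trace` is hypothetical and not taken from the presentation.

```python
def ca_trace(lines):
    """Return [(residue_name, residue_number, (x, y, z)), ...] for CA atoms."""
    trace = []
    for line in lines:
        fields = line.split()
        # Assumed field order: ATOM serial atom_name res_name res_num x y z
        if len(fields) < 8 or fields[0] != "ATOM" or fields[2] != "CA":
            continue
        x, y, z = (float(value) for value in fields[5:8])
        trace.append((fields[3], int(fields[4]), (x, y, z)))
    return trace


# A few records copied from the 2ERL excerpt on the slide.
records = [
    "ATOM 1 N ASP 1 -1.115 8.537 7.075",
    "ATOM 2 CA ASP 1 -1.925 7.470 6.547",
    "ATOM 6 N ALA 2 -2.745 5.280 7.165",
    "ATOM 7 CA ALA 2 -2.945 4.152 7.987",
    "ATOM 12 CA CYS 3 0.570 2.624 7.511",
]

for name, number, xyz in ca_trace(records):
    print(name, number, xyz)
```

The resulting list of points, one per residue, is the polyline that the following slides treat as a space curve.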

6 Spline Approximation
(Same PDB file excerpt as on slide 4.)
We smooth the Cα curve based on secondary structure information.

7 Spline Approximation
(Same PDB file excerpt as on slide 4.)
We smooth the Cα curve based on secondary structure information.
(Figure labels: Helix, Turn.)
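The slides do not give the spline formulation or say how the secondary structure information controls the smoothing, so the following is only an illustrative stand-in, not the authors' scheme. It treats the Cα points as control points of a uniform cubic B-spline and evaluates the spline at its knots, which replaces each interior point by a (1/6, 4/6, 1/6) weighted average of itself and its neighbors. The name `bspline_smooth` is hypothetical.

```python
def bspline_smooth(points):
    """Approximate a polyline by a uniform cubic B-spline evaluated at its knots.

    Interior point i becomes (P[i-1] + 4*P[i] + P[i+1]) / 6; endpoints are kept.
    """
    if len(points) < 3:
        return list(points)
    smoothed = [points[0]]
    for prev, cur, nxt in zip(points, points[1:], points[2:]):
        smoothed.append(tuple((a + 4 * b + c) / 6 for a, b, c in zip(prev, cur, nxt)))
    smoothed.append(points[-1])
    return smoothed


# A zigzag polyline: smoothing pulls the peaks and valleys toward each other.
zigzag = [(0.0, 0.0, 0.0), (1.0, 1.0, 0.0), (2.0, 0.0, 0.0), (3.0, 1.0, 0.0), (4.0, 0.0, 0.0)]
for point in bspline_smooth(zigzag):
    print(point)
```

A scheme that follows the slide more closely would presumably vary the amount of smoothing by secondary structure type (for example helix versus turn), but those details are not recoverable from the transcript.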

8 Matching Two Curves
Are they similar?

9 Curvature and Torsion
Curvature: a measure of how far the curve deviates from being linear.
Torsion: a measure of how far the curve deviates from being planar.
Fundamental Theorem of Space Curves: If two single-valued continuous functions κ(s) and τ(s) are given for s > 0, then there exists exactly one space curve, determined except for orientation and position in space (i.e., up to a Euclidean motion), where s is the intrinsic arc length, κ is the curvature, and τ is the torsion.
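The formulas on this slide were images and did not survive in the transcript. For reference, these are the standard textbook definitions for a regular curve r(t) with an arbitrary parameter t, followed by the Frenet-Serret equations with respect to arc length s; the notation (r, T, N, B) is the usual one and is not taken from the slide.

```latex
\kappa(t) = \frac{\lVert \mathbf{r}'(t) \times \mathbf{r}''(t) \rVert}{\lVert \mathbf{r}'(t) \rVert^{3}},
\qquad
\tau(t) = \frac{\bigl(\mathbf{r}'(t) \times \mathbf{r}''(t)\bigr) \cdot \mathbf{r}'''(t)}{\lVert \mathbf{r}'(t) \times \mathbf{r}''(t) \rVert^{2}}

\frac{d\mathbf{T}}{ds} = \kappa\,\mathbf{N},
\qquad
\frac{d\mathbf{N}}{ds} = -\kappa\,\mathbf{T} + \tau\,\mathbf{B},
\qquad
\frac{d\mathbf{B}}{ds} = -\tau\,\mathbf{N}
```

For a straight line κ = 0, and for a planar curve τ = 0, which matches the two descriptions above.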

10 Curvature and Torsion
They are invariant to rotation and translation.
They are localized.
(Figure panels: Curvature, Torsion.)
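A small sketch that checks the invariance claim numerically. It estimates curvature and torsion with central finite differences on sampled points, which is an assumption made here for illustration; the presentation derives them from the fitted spline. The test curve is a circular helix with radius 1 and pitch parameter 0.5, whose exact curvature and torsion are 0.8 and 0.4. All function names are hypothetical.

```python
import math


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def curvature_torsion(points):
    """Finite-difference (curvature, torsion) at interior points 2 .. n-3."""
    features = []
    for i in range(2, len(points) - 2):
        pm2, pm1, p0, pp1, pp2 = points[i - 2:i + 3]
        # Central differences with unit spacing in the sample index; the
        # curvature and torsion formulas do not depend on the parametrization.
        d1 = tuple((a - b) / 2 for a, b in zip(pp1, pm1))
        d2 = tuple(a - 2 * b + c for a, b, c in zip(pp1, p0, pm1))
        d3 = tuple((a - 2 * b + 2 * c - d) / 2 for a, b, c, d in zip(pp2, pp1, pm1, pm2))
        c12 = cross(d1, d2)
        speed = math.sqrt(dot(d1, d1))
        c_norm2 = dot(c12, c12)
        kappa = math.sqrt(c_norm2) / speed ** 3 if speed > 0 else 0.0
        tau = dot(c12, d3) / c_norm2 if c_norm2 > 1e-18 else 0.0
        features.append((kappa, tau))
    return features


def rotate_z_and_shift(p, angle, shift):
    c, s = math.cos(angle), math.sin(angle)
    return (c * p[0] - s * p[1] + shift[0], s * p[0] + c * p[1] + shift[1], p[2] + shift[2])


helix = [(math.cos(0.1 * k), math.sin(0.1 * k), 0.5 * 0.1 * k) for k in range(60)]
moved = [rotate_z_and_shift(p, 0.7, (3.0, -2.0, 5.0)) for p in helix]

original_features = curvature_torsion(helix)
moved_features = curvature_torsion(moved)
print(original_features[0])
print(moved_features[0])
```

Both printed tuples should agree up to floating-point rounding and lie close to (0.8, 0.4); the small deviation from the exact values comes from the finite-difference approximation.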

11 Feature Extraction
For each amino acid, a (curvature, torsion) tuple is computed, and the secondary structure assignment is gathered from the PDB web site.
Together these constitute a sequence of n three-dimensional feature vectors, where n is the number of amino acids in the protein.
(Figure: curvature and torsion per residue; the 3rd dimension, secondary structure information, is not shown.)

12 Indexing the Features
Why is indexing necessary?
Hash table (shown in 2D below; the 3rd dimension is the SSE type).
(Figure: grid with axes Torsion and Curvature; one cell is labeled "A Hash Bin".)
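One way such a hash table could look, as a sketch under assumptions: curvature and torsion are quantized into fixed-width bins and combined with the SSE type to form the key, and each bin stores (protein, residue position) entries. The bin widths, the function names, and the toy feature values are all made up for illustration; the slides do not specify them.

```python
import math
from collections import defaultdict


def bin_key(curvature, torsion, sse, c_width=0.1, t_width=0.1):
    """Quantize one (curvature, torsion, SSE type) feature into a hash key."""
    return (math.floor(curvature / c_width), math.floor(torsion / t_width), sse)


def build_index(database):
    """Map each hash bin to the (protein_id, position) entries that fall in it."""
    table = defaultdict(list)
    for protein_id, features in database.items():
        for position, (curvature, torsion, sse) in enumerate(features):
            table[bin_key(curvature, torsion, sse)].append((protein_id, position))
    return table


# Toy database with hypothetical proteins; "H" = helix, "T" = turn.
database = {
    "protA": [(0.82, 0.41, "H"), (0.85, 0.43, "H"), (0.12, -0.33, "T")],
    "protB": [(0.81, 0.47, "H")],
}
index = build_index(database)
for key, entries in index.items():
    print(key, entries)
```

Because the features are invariant to rotation and translation, proteins can be inserted without first superimposing them, which is what makes this kind of lookup possible.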

13 Query Execution
Hierarchical approach:
- Pruning before detailed pairwise alignment
- Accumulate vote: vote[protein]++
- Normalize vote: vote[protein] / length[protein]
- Threshold
(Figure label: hash table.)
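The screening steps listed above can be sketched as follows. This is an illustration under assumptions: every index entry that shares a bin with a query feature contributes one vote, and the threshold value is arbitrary. The slides do not say whether repeated hits in one bin are counted separately. The index layout, protein names, and the function name `screen` are hypothetical.

```python
from collections import Counter


def screen(query_keys, index, lengths, threshold):
    """Return (protein_id, normalized_vote) pairs that pass the threshold."""
    votes = Counter()
    for key in query_keys:
        for protein_id, _position in index.get(key, []):
            votes[protein_id] += 1  # accumulate vote
    # Normalize by protein length so long chains are not favored.
    scores = {pid: count / lengths[pid] for pid, count in votes.items()}
    passed = [(pid, score) for pid, score in scores.items() if score >= threshold]
    return sorted(passed, key=lambda item: item[1], reverse=True)


# Toy index: hash key -> list of (protein_id, residue position).
index = {
    (8, 4, "H"): [("protA", 0), ("protA", 1), ("protB", 3)],
    (1, -4, "T"): [("protB", 0)],
}
lengths = {"protA": 4, "protB": 10}
query_keys = [(8, 4, "H"), (8, 4, "H"), (1, -4, "T")]

print(screen(query_keys, index, lengths, threshold=0.5))
```

Only the candidates that survive this step are passed on to the more expensive pairwise alignment.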

14 Query Execution
Pairwise alignment by the Smith-Waterman (SW) dynamic programming technique is performed after the screening process.
(Figure: Distance Matrix, SW; chains 1fse:A and 1l3l:C; label "Gap".)
length: 63, RMSD: 1.61 Å
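A compact sketch of Smith-Waterman local alignment over two feature sequences. The scoring is an assumption for illustration: the similarity of two residues is a cutoff minus the Euclidean distance between their (curvature, torsion) tuples, and gaps carry a linear penalty. The slides only state that a distance matrix is fed to SW, so the cutoff, the gap penalty, and the function names are hypothetical.

```python
import math


def feature_score(u, v, cutoff=0.5):
    """Positive when two (curvature, torsion) tuples are closer than cutoff."""
    return cutoff - math.sqrt((u[0] - v[0]) ** 2 + (u[1] - v[1]) ** 2)


def smith_waterman(a, b, score=feature_score, gap=1.0):
    """Return (best_score, aligned index pairs) for the best local alignment."""
    n, m = len(a), len(b)
    H = [[0.0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0.0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            H[i][j] = max(
                0.0,
                H[i - 1][j - 1] + score(a[i - 1], b[j - 1]),  # match or mismatch
                H[i - 1][j] - gap,  # gap in b
                H[i][j - 1] - gap,  # gap in a
            )
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    # Trace back from the best cell until the score drops to zero.
    pairs = []
    i, j = best_pos
    while i > 0 and j > 0 and H[i][j] > 0:
        if H[i][j] == H[i - 1][j - 1] + score(a[i - 1], b[j - 1]):
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] - gap:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return best, pairs


# Toy feature sequences: the helix-like stretch of `a` matches all of `b`.
a = [(0.1, 0.0), (0.8, 0.4), (0.8, 0.4), (0.8, 0.4), (0.2, -0.3)]
b = [(0.8, 0.4), (0.8, 0.4), (0.8, 0.4)]
print(smith_waterman(a, b))
```

The aligned residue pairs could then be superimposed to compute an RMSD like the one reported on the slide; that step is not shown here.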

15 SW Alignment Result
(Figure: 1fse:A and 1l3l:C.)

16 Sample Query Results
Query: 1faz:A, database: 1938 protein chains
Screening time: 18 seconds
Pairwise alignment time: 29 seconds
- length: 42, RMSD: 2.8 Å, 1faz:A & 1ytf:D
- length: 38, RMSD: 3.68 Å, 1faz:A & 1dj7:A

17 Sample Query Results
Query: 1b16:A, database: 1938 protein chains
Screening time: 25 seconds
Pairwise alignment time: 68 seconds
- length: 35, RMSD: 3.26 Å, 1b16:A & 1h05:A
- length: 35, RMSD: 1.58 Å, 1b16:A & 1qp8:A

18 Current and Future Work
- Evaluation of
  - Accuracy: comparison with SCOP classification
  - Efficiency: comparison with other techniques such as CE or DALI
- Better index structures
  - Faster and more accurate screening of candidates
- Incorporating biological and chemical properties of amino acids into the structure signatures of proteins

19 Conclusions
A new method for protein structure alignment is presented:
- Extracted structural features are:
  - Compact: O(n)
  - Localized: computed for each amino acid
  - Robust: error handling by spline approximation
  - Invariant: suitable for indexing
  - Meaningful: biological and chemical properties can be incorporated easily
- An indexing technique is deployed to avoid an exhaustive scan of the structure database
Experimental results show that this method is suitable for finding structural motifs.

20 Thank you for your attention!
Tolga Can
Department of Computer Science
University of California at Santa Barbara
Santa Barbara, CA 93106, U.S.
Email: tcan@cs.ucsb.edu
For more information: http://www.cs.ucsb.edu/~tcan/CTSS/

