Agenda A brief introduction The MASS algorithm The pairwise case Extension to the multiple case Experimental results.


1

2 Agenda: A brief introduction; The MASS algorithm (the pairwise case, extension to the multiple case); Experimental results

3 Introduction

4 Importance. Protein analysis: protein classification; detecting functional units that share similar geometrical configurations. Applications to: docking; protein engineering; drug design (pharmacophore searching).

5 The LCP Problem. Given a collection of M point-sets in 3D space, find the largest common subset. This is known as the largest common point-set (LCP) problem. The LCP problem is NP-hard, so all solutions are based on heuristics.

6 The Multiple Alignment by Secondary Structures (MASS) Algorithm

7 Motivation. The MUSTA algorithm (Leibowitz, Fligelman, Nussinov, and Wolfson 1999): a truly multiple-based approach. Desired improvements: efficiency; finding partial solutions, i.e. alignments between a subset of the input molecules.

8 Partial Alignments. Two types of partial alignments: B & C. [Figure: example molecules with common substructures labelled A, B, and C]

9 General Strategy. Pivot scheme. Based on a two-level alignment: local secondary structure superposition, then global atomic superposition. Geometric hashing paradigm.

10 Why Secondary Structure? Stability: secondary structures are conserved during evolution. Robustness: proteins are dense molecules. Efficiency: introduces great savings in structural description.

11 The Pairwise Case. Outline: (1) SSE assignment; (2) SSE representation; (3) detection of seed matches; (4) clustering the seed matches; (5) global extension & refinement. The early steps work on the SSE representation and the final step on the atomic representation.

12 Step 1: SSE assignment The proteins are represented by their secondary structure elements.

13 Secondary Structure Element (SSE). Helix: alpha (abundant), 3-10 (infrequent), π (rare). Strand: abundant.

14 Secondary Structure Assignment: PDB (Bernstein et al. 1977); DSSP (Kabsch & Sander 1983); DSSPCont (Andersen et al. 2002); STICK (Taylor 2001)

15 Step 2: SSE representation. An SSE is represented by a 3D line segment with fuzzy endpoints. Helix representation: [figure]

16 Strand representation: the least-squares line, running from the N-terminus to the C-terminus. [figure]

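17 The SSE least-squares line minimizes $\sum_i d_i^2$, where $d_i$ is the distance from Cα atom $i$, located at $(x_i, y_i)$ in the 2D illustration, to the line.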
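The least-squares line can be sketched in a few lines of plain Python. This is only an illustration, assuming the distances $d_i$ are orthogonal distances in 3D: in that case the line passes through the centroid of the Cα atoms along the dominant eigenvector of their scatter matrix, found here by power iteration. The function name `fit_sse_axis` and the iteration count are hypothetical choices, not part of the presentation.

```python
import math

def fit_sse_axis(ca_atoms, iterations=100):
    """Return (centroid, unit direction) of the orthogonal least-squares line.

    ca_atoms: list of (x, y, z) tuples ordered from N- to C-terminus.
    """
    if len(ca_atoms) < 2 or tuple(ca_atoms[0]) == tuple(ca_atoms[-1]):
        raise ValueError("need at least two atoms with distinct end positions")
    n = len(ca_atoms)
    centroid = [sum(p[k] for p in ca_atoms) / n for k in range(3)]
    centered = [[p[k] - centroid[k] for k in range(3)] for p in ca_atoms]
    # 3x3 scatter matrix of the centered coordinates.
    scatter = [[sum(q[a] * q[b] for q in centered) for b in range(3)]
               for a in range(3)]
    # Start from the N->C vector so the result keeps the chain direction.
    v = [ca_atoms[-1][k] - ca_atoms[0][k] for k in range(3)]
    for _ in range(iterations):
        w = [sum(scatter[a][b] * v[b] for b in range(3)) for a in range(3)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    norm = math.sqrt(sum(x * x for x in v))
    return centroid, [x / norm for x in v]

# Hypothetical, perfectly collinear "strand" along the direction (1, 2, 3).
atoms = [(t, 2.0 * t, 3.0 * t) for t in range(5)]
centroid, direction = fit_sse_axis(atoms)
```

The segment endpoints could then be taken as the projections of the first and last Cα atoms onto this line, which is one simple way to respect the "fuzzy endpoints" idea.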

18 Step 3: detection of seed matches. A base is a pair of SSEs. Find bases whose configuration appears in both proteins. A base configuration is represented by a fingerprint.

19 A base fingerprint is a 5D vector composed of: the types of the two SSEs (helix or strand), the line distance, the midpoint distance, and the angle between the SSEs.

20 [Figure: the midpoint distance and the line distance between the two SSEs of a base]
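A minimal sketch of how such a fingerprint might be computed. It assumes each SSE is given as a hypothetical `(type, start, end)` triple, that "line distance" means the shortest distance between the two infinite lines carrying the segments, and that the angle is the one between the two segment directions; the slides do not spell out these definitions.

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def _norm(a):
    return math.sqrt(_dot(a, a))

def fingerprint(sse1, sse2):
    """Return (type1, type2, line distance, midpoint distance, angle in degrees)."""
    t1, s1, e1 = sse1
    t2, s2, e2 = sse2
    d1, d2 = _sub(e1, s1), _sub(e2, s2)
    m1 = tuple((a + b) / 2 for a, b in zip(s1, e1))
    m2 = tuple((a + b) / 2 for a, b in zip(s2, e2))
    between = _sub(s2, s1)
    normal = _cross(d1, d2)
    if _norm(normal) > 1e-9:
        # Skew or intersecting lines: project onto the common normal.
        line_dist = abs(_dot(between, normal)) / _norm(normal)
    else:
        # Parallel lines: distance from a point on line 2 to line 1.
        line_dist = _norm(_cross(between, d1)) / _norm(d1)
    midpoint_dist = _norm(_sub(m2, m1))
    cos_angle = _dot(d1, d2) / (_norm(d1) * _norm(d2))
    angle = math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))
    return (t1, t2, line_dist, midpoint_dist, angle)

# Hypothetical base: a helix along x and a strand along y, 5 units above it.
helix = ("H", (0.0, 0.0, 0.0), (10.0, 0.0, 0.0))
strand = ("E", (0.0, 0.0, 5.0), (0.0, 10.0, 5.0))
fp = fingerprint(helix, strand)
```

All three numeric entries depend only on differences and angles between the two segments, so they do not change when both SSEs are moved by the same rigid transformation.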

21 The fingerprint is invariant under 3D rigid transformations. Bases with a similar fingerprint can be aligned in different ways: axis-system superposition; midpoint-to-midpoint alignment; RMSD minimization.

22 Axis-system superposition: define an axis system (X, Y, and Z axes) on each base from its two SSEs, SSE 1 and SSE 2. [figure]

23 Superimpose the axis systems of matched bases. [figure]

24 Based on the assumption that the line-distance segments are conserved. Pros: makes no use of the SSE lengths and endpoints. Cons: the assumption is not always correct. Pathological example in 2D (figure): line distances d and d = 0.

25 Midpoint-to-midpoint alignment: align the middle Cα atoms, then expand to the sides.

26 Based on the assumptions that SSE endpoints are fuzzy and SSE midpoints are conserved. Pros: simplicity. Cons: the SSE midpoints are not always conserved; DSSP sometimes splits an SSE in two.

27 RMSD minimization: iterate over all the possible atomic alignments between the matched SSEs and choose the alignment that minimizes the RMSD.

28 Pros: makes no assumptions. Cons: may converge to a local minimum instead of the global one.

29 To find congruent bases efficiently, all bases are stored in a geometric hash (GH) according to their fingerprint.

30 Bases that reside in the same hash bin or in adjacent bins are congruent. [Figure: 2D cut of the hash; ε is the tolerance]
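The bin lookup can be illustrated with a dictionary keyed by quantized fingerprints. This is a sketch under assumptions: the class name `FingerprintHash`, the per-coordinate bin widths, and the choice to require identical SSE types while allowing ±1 bin in each numeric coordinate are all illustrative, not taken from the presentation.

```python
import math
from collections import defaultdict
from itertools import product

class FingerprintHash:
    """Toy geometric hash: bins fingerprints with one tolerance per numeric coordinate."""

    def __init__(self, tolerances):
        self.tolerances = tolerances  # bin width (epsilon) per numeric entry
        self.bins = defaultdict(list)

    def _key(self, fp):
        type1, type2, *values = fp
        cells = tuple(math.floor(v / eps)
                      for v, eps in zip(values, self.tolerances))
        return (type1, type2) + cells

    def insert(self, fp, base_id):
        self.bins[self._key(fp)].append(base_id)

    def query(self, fp):
        """Return bases in the fingerprint's own bin and in all adjacent bins."""
        key = self._key(fp)
        types, cells = key[:2], key[2:]
        found = []
        for offset in product((-1, 0, 1), repeat=len(cells)):
            neighbour = types + tuple(c + o for c, o in zip(cells, offset))
            found.extend(self.bins.get(neighbour, []))
        return found

# Hypothetical fingerprints: (type1, type2, line dist, midpoint dist, angle).
table = FingerprintHash(tolerances=(1.0, 1.0, 10.0))
table.insert(("H", "E", 5.0, 8.7, 90.0), "a")
table.insert(("H", "E", 5.9, 9.2, 95.0), "b")   # adjacent bin
table.insert(("H", "E", 9.0, 8.7, 90.0), "c")   # too far in line distance
table.insert(("H", "H", 5.0, 8.7, 90.0), "d")   # different SSE types
hits = table.query(("H", "E", 5.0, 8.7, 90.0))
```

Checking adjacent bins as well as the exact bin avoids missing two near-identical fingerprints that happen to fall on opposite sides of a bin boundary.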

31 For each hash bin: retrieve all the bases in the bin and in the adjacent bins; insert the bases into a combinatorial bucket with one column per protein; any two bases from different columns define a seed match. Example: 3 bases from Protein 1 and 2 bases from Protein 2 give 3 x 2 = 6 seed matches.
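The combinatorial bucket amounts to a Cartesian product between columns. A small sketch, assuming the bucket is a hypothetical dictionary that maps each protein index to the list of its bases found in the bin neighbourhood:

```python
from itertools import combinations, product

def seed_matches(bucket):
    """Pair every base with every base from a different protein column.

    bucket: dict mapping protein index -> list of base identifiers.
    Each match is ((protein_a, base_a), (protein_b, base_b)) with
    protein_a < protein_b.
    """
    matches = []
    columns = sorted(bucket.items())
    for (pa, bases_a), (pb, bases_b) in combinations(columns, 2):
        matches.extend(((pa, a), (pb, b)) for a, b in product(bases_a, bases_b))
    return matches

# Hypothetical bucket: 3 bases from protein 1, 2 bases from protein 2.
bucket = {1: ["a", "b", "c"], 2: ["x", "y"]}
matches = seed_matches(bucket)
```

With two columns this reproduces the 3 x 2 count from the slide; with more columns it enumerates every pair of proteins, listing the lower-indexed protein first.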

32 Step 4: clustering the seed matches. Detect matches with similar transformations and join them into clusters, using RMSD clustering: similar to (Rarey 1996); works in an iterative manner.
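To make the idea of comparing transformations by RMSD concrete, here is a deliberately simplified sketch. It is a single-pass greedy grouping, not the iterative scheme of Rarey 1996 that the slide refers to; the representation of a transformation as a `(rotation, translation)` pair, the reference points, and the threshold are all assumptions.

```python
import math

def apply(transform, point):
    """Apply a rigid transformation (3x3 rotation rows, translation) to a point."""
    rotation, translation = transform
    return tuple(sum(rotation[r][c] * point[c] for c in range(3)) + translation[r]
                 for r in range(3))

def transform_rmsd(t1, t2, points):
    """RMSD between the images of the same points under two transformations."""
    total = 0.0
    for p in points:
        a, b = apply(t1, p), apply(t2, p)
        total += sum((x - y) ** 2 for x, y in zip(a, b))
    return math.sqrt(total / len(points))

def cluster_transforms(transforms, points, threshold):
    """Greedy clustering: join the first cluster whose representative is close."""
    clusters = []  # lists of indices; the first index is the representative
    for index, transform in enumerate(transforms):
        for cluster in clusters:
            if transform_rmsd(transforms[cluster[0]], transform, points) <= threshold:
                cluster.append(index)
                break
        else:
            clusters.append([index])
    return clusters

identity = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
transforms = [
    (identity, (0.0, 0.0, 0.0)),
    (identity, (0.5, 0.0, 0.0)),   # nearly the same as the first
    (identity, (10.0, 0.0, 0.0)),  # clearly different
]
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
clusters = cluster_transforms(transforms, reference, threshold=1.0)
```

Measuring the distance between transformations on actual points, rather than on rotation and translation parameters separately, gives a single value in the same units as the final alignment RMSD.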

33 [Figure: transformations T1 to T6 grouped into numbered clusters]

34 Step 5: global extension & refinement. For each match: apply its transformation; find corresponding atoms that lie close enough to each other after the superposition; use least-squares fitting to refine the transformation; iterate until the RMSD converges.
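The correspondence and RMSD parts of this loop can be sketched as follows. The example covers only those two parts: it assumes the second atom set has already been transformed into the frame of the first, uses a simple greedy nearest-neighbour pairing, and leaves out the least-squares refit of the transformation. The function names and the distance cutoff are hypothetical.

```python
import math

def close_pairs(atoms_a, atoms_b, max_dist):
    """Greedily pair each atom of A with its nearest unused atom of B within max_dist."""
    used = set()
    pairs = []
    for i, a in enumerate(atoms_a):
        best, best_dist = None, max_dist
        for j, b in enumerate(atoms_b):
            if j in used:
                continue
            dist = math.dist(a, b)
            if dist <= best_dist:
                best, best_dist = j, dist
        if best is not None:
            used.add(best)
            pairs.append((i, best))
    return pairs

def rmsd(atoms_a, atoms_b, pairs):
    """Root-mean-square distance over the matched atom pairs."""
    if not pairs:
        raise ValueError("no matched pairs")
    total = sum(math.dist(atoms_a[i], atoms_b[j]) ** 2 for i, j in pairs)
    return math.sqrt(total / len(pairs))

# Hypothetical C-alpha coordinates after superposition.
atoms_a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
atoms_b = [(0.0, 0.0, 1.0), (3.8, 0.0, 1.0), (20.0, 0.0, 0.0)]
pairs = close_pairs(atoms_a, atoms_b, max_dist=2.0)
score = rmsd(atoms_a, atoms_b, pairs)
```

In a full refinement loop, the matched pairs would feed a least-squares rigid fit, the new transformation would be applied, and the pairing repeated until the RMSD stops changing.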

35 The Multiple Case. Outline: (1) SSE assignment & representation; (2) detection of seed matches; (3) clustering the seed pairwise matches; (4) global extension of pairwise matches; (5) computing multiple matches; (6) refinement; (7) selecting high-scoring multiple matches.

36 Finding bases whose configuration appears in a sufficient number of molecules: all bases are stored in a geometric hash according to their fingerprint. Bases that reside in the same bin or in adjacent bins are congruent.

37 For each hash bin: retrieve all the bases in the bin and in the adjacent bins and insert them into a combinatorial bucket (CB) with one column per protein (Protein i, Protein j, Protein k, Protein r, Protein s, with i < j < k < r < s).

38 Construct pairwise seed matches; the reference protein is the one with the smaller index. Cluster the pairwise matches. Globally extend the pairwise matches.

39 Recursively construct multiple alignments over the proteins in the bucket (Protein i, Protein j, Protein k, Protein r, Protein s, with i < j < k < r < s). [figure]

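40 Refinement. Selecting high-scoring multiple matches: the score of a multiple match depends on its number of proteins n and its number of atoms k. Example: n = 3, k = 4, score = 12 (consistent with score = n × k; the formula itself is missing from this transcript).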

41 Experimental Results

42 MASS vs. MUSTA

43 Partial Solutions

44 All-alpha Class. The common core of ten proteins that belong to 4 different folds of the all-alpha class.

45

46 TIM-barrel Fold. The common core of 6 out of 7 proteins taken from different superfamilies of the TIM-barrel fold.

47 Calcium Binding. The common core of 6 proteins belonging to 3 different families of the EF hand-like superfamily.

48 Lipase Family. An alignment of four structures from different species of the lipase family. Two of the conformations are open and two are closed.

