Agenda: A brief introduction; The MASS algorithm; The pairwise case; Extension to the multiple case; Experimental results.


Introduction

Importance. Protein analysis: protein classification; detecting functional units which share similar geometrical configurations. Applications to: docking; protein engineering; drug design (pharmacophore searching).

The LCP Problem: Given a collection of M point sets in 3D space, find the largest common subset. This is known as the largest common point set (LCP) problem. The LCP problem is NP-hard, so all practical solutions rely on heuristics.

The Multiple Alignment by Secondary Structures (MASS) Algorithm

Motivation: The MUSTA algorithm (Leibowitz, Fligelman, Nussinov, and Wolfson 1999): a truly multiple-based approach. Desired improvements: efficiency; finding partial solutions, i.e. alignments between a subset of the input molecules.

Partial Alignments: Two types of partial alignments: B & C. (Figure: panels labeled A, B, and C.)

General Strategy: Pivot scheme. Based on a two-level alignment: local secondary structure superposition; global atomic superposition. Geometric hashing paradigm.

Why Secondary Structure? Stability: secondary structures are conserved during evolution. Robustness: proteins are dense molecules. Efficiency: introduces great savings in structural description.

The Pairwise Case. Outline: (1) SSE assignment; (2) SSE representation; (3) detection of seed matches; (4) clustering the seed matches; (5) global extension & refinement. The slide groups these steps under two labels: SSE representation and atomic representation.

Step 1: SSE assignment. The proteins are represented by their secondary structure elements.

Secondary Structure Element (SSE): Helix: alpha (abundant), 3-10 (infrequent), π (rare). Strand (abundant).

Secondary Structure Assignment: PDB (Bernstein et al. 1977); DSSP (Kabsch & Sander 1983); DSSPCont (Andersen et al. 2002); STICK (Taylor 2001).

Step 2: SSE representation. An SSE is represented by a 3D line segment with fuzzy endpoints. Helix representation: (figure).

Strand representation: least-squares line. (Figure labels: N-terminus, C-terminus.)

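The SSE least-squares line minimizes the sum of squared distances $\sum_i d_i^2$, where $d_i$ is the distance from Cα atom $i$ to the line. (Figure labels: Cα atom $(x_i, y_i)$, distance $d_i$.)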
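The line fit described above can be sketched in a few lines. This is a minimal illustration, not the MASS implementation: it assumes orthogonal distances in 3D, takes the line through the centroid of the Cα atoms, and finds its direction by power iteration on the scatter matrix. The names `fit_line` and `squared_distance_sum` are hypothetical.

```python
import math

def fit_line(points, iterations=200):
    """Return (centroid, unit direction) of the orthogonal least-squares line."""
    n = len(points)
    c = [sum(p[k] for p in points) / n for k in range(3)]
    # 3x3 scatter matrix of the centered coordinates.
    s = [[sum((p[a] - c[a]) * (p[b] - c[b]) for p in points)
          for b in range(3)] for a in range(3)]
    # Start from the first-to-last atom direction (assumed N- to C-terminus order).
    v = [points[-1][k] - points[0][k] for k in range(3)]
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("first and last points coincide")
    v = [x / norm for x in v]
    # Power iteration converges to the dominant eigenvector of s,
    # which is the direction that minimizes the sum of squared distances.
    for _ in range(iterations):
        w = [sum(s[a][b] * v[b] for b in range(3)) for a in range(3)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            raise ValueError("all points coincide")
        v = [x / norm for x in w]
    return c, v

def squared_distance_sum(points, c, v):
    """Sum of squared orthogonal distances d_i from the points to the line."""
    total = 0.0
    for p in points:
        r = [p[k] - c[k] for k in range(3)]
        along = sum(r[k] * v[k] for k in range(3))
        total += sum(x * x for x in r) - along * along
    return total

# Usage: five collinear points give a residual of (numerically) zero.
pts = [(float(t), 2.0 * t, 2.0 * t) for t in range(5)]
center, direction = fit_line(pts)
residual = squared_distance_sum(pts, center, direction)
```

Seeding the iteration with the end-to-end vector keeps the fitted direction oriented from the first atom toward the last.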

Step 3: detection of seed matches. A base is a pair of SSEs. The goal is to find bases whose configuration appears in both proteins. A base configuration is represented by a fingerprint.

A base fingerprint is a 5D vector composed of: the two SSE types (helix or strand); the line distance; the midpoint distance; the angle.

(Figure labels: midpoint distance, line distance.)
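A fingerprint of this kind could be computed as follows. The sketch makes assumptions the slides do not spell out: each SSE is given as a type plus the two endpoints of its line segment, "line distance" is taken to be the shortest distance between the two infinite lines, and "angle" is the angle between the segment directions. The function name `fingerprint` is hypothetical.

```python
import math

def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def norm(a):
    return math.sqrt(dot(a, a))

def fingerprint(sse1, sse2):
    """Each SSE is (type, start, end). Returns the assumed 5D fingerprint:
    (type1, type2, line distance, midpoint distance, angle in radians)."""
    t1, a1, b1 = sse1
    t2, a2, b2 = sse2
    d1, d2 = sub(b1, a1), sub(b2, a2)
    m1 = tuple((a1[k] + b1[k]) / 2.0 for k in range(3))
    m2 = tuple((a2[k] + b2[k]) / 2.0 for k in range(3))
    midpoint_distance = norm(sub(m2, m1))
    cos_angle = dot(d1, d2) / (norm(d1) * norm(d2))
    angle = math.acos(max(-1.0, min(1.0, cos_angle)))
    n = cross(d1, d2)
    if norm(n) > 1e-9:
        # Skew or intersecting lines: project the offset onto the common normal.
        line_distance = abs(dot(sub(a2, a1), n)) / norm(n)
    else:
        # Parallel lines: distance from a point on line 2 to line 1.
        line_distance = norm(cross(sub(a2, a1), d1)) / norm(d1)
    return (t1, t2, line_distance, midpoint_distance, angle)

# Usage: a helix along x and a strand along y, three units apart in z.
helix = ("helix", (0.0, 0.0, 0.0), (2.0, 0.0, 0.0))
strand = ("strand", (0.0, 0.0, 3.0), (0.0, 2.0, 3.0))
fp = fingerprint(helix, strand)
```

All three numeric entries depend only on distances and angles, so they do not change when both SSEs are rotated and translated together.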

The fingerprint is invariant to 3D rigid transformations. Bases with a similar fingerprint can be aligned in different ways: axis system superposition; midpoint-to-midpoint alignment; RMSD minimization.

Axis system superposition: Define an axis system on each base. (Figure labels: SSE 1, SSE 2, X-axis, Y-axis, Z-axis.)

Superimpose the axis systems of matched bases. (Figure labels: X-axis, Y-axis, Z-axis.)

Based on the assumption: the line distance segments are conserved. Pros: no use of the SSE length and endpoints. Cons: the assumption is not always correct. Pathological example in 2D (figure labels: d, d = 0).

Midpoint-to-midpoint alignment: Align the mid Cα atoms, then expand to the sides.

Based on the assumptions: SSE endpoints are fuzzy; SSE midpoints are conserved. Pros: simplicity. Cons: the SSE midpoints are not always conserved; DSSP sometimes splits an SSE in two.

RMSD minimization: Iterate over all possible atomic alignments between the matched SSEs. Choose the alignment that minimizes the RMSD.

Pros: no assumptions. Cons: convergence to a local minimum instead of a global one.

To find congruent bases efficiently: all bases are stored in a geometric hash (GH) according to their fingerprint.

Bases that reside in the same hash bin or in adjacent bins are congruent. (Figure: 2D cut of the hash; ε is the tolerance.)

For each hash bin: retrieve all the bases in the bin and in the adjacent bins; insert the bases into a combinatorial bucket; two bases from different columns define a seed match. Example (figure): a bucket with columns for Protein 1 and Protein 2 yields 3 x 2 seed matches.
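The hashing and bucket steps above can be sketched as below. This is an illustration under assumptions, not the MASS code: fingerprints are tuples `(type1, type2, line_distance, midpoint_distance, angle)`, bins are cubes of side `eps_dist` (distances) and `eps_angle` (angle), and "adjacent" means differing by at most one bin index in each numeric dimension. The names `bin_key` and `seed_matches` and both tolerance values are hypothetical.

```python
import math
from collections import defaultdict
from itertools import product

def bin_key(fp, eps_dist, eps_angle):
    """Quantize a fingerprint into a hash-bin key; SSE types must match exactly."""
    t1, t2, line_d, mid_d, angle = fp
    return (t1, t2,
            math.floor(line_d / eps_dist),
            math.floor(mid_d / eps_dist),
            math.floor(angle / eps_angle))

def seed_matches(bases, eps_dist=1.0, eps_angle=0.2):
    """bases: list of (protein_index, base_id, fingerprint).
    Returns a set of seed matches ((protein_a, base_a), (protein_b, base_b))."""
    table = defaultdict(list)
    for protein, base_id, fp in bases:
        table[bin_key(fp, eps_dist, eps_angle)].append((protein, base_id))
    seeds = set()
    for t1, t2, i, j, k in list(table):
        # Combinatorial bucket: one column per protein, filled from the
        # bin itself and its 26 neighbors.
        bucket = defaultdict(list)
        for di, dj, dk in product((-1, 0, 1), repeat=3):
            for protein, base_id in table.get((t1, t2, i + di, j + dj, k + dk), ()):
                bucket[protein].append(base_id)
        proteins = sorted(bucket)
        for pos, pa in enumerate(proteins):
            for pb in proteins[pos + 1:]:
                # Two bases from different columns define a seed match.
                for ba, bb in product(bucket[pa], bucket[pb]):
                    seeds.add(((pa, ba), (pb, bb)))
    return seeds

# Usage: three similar bases in protein 1 and two in protein 2 give 3 x 2 seeds.
bases = [
    (1, "a", ("helix", "strand", 5.0, 7.0, 1.05)),
    (1, "b", ("helix", "strand", 5.4, 7.2, 1.10)),
    (1, "c", ("helix", "strand", 4.8, 6.9, 1.05)),
    (2, "x", ("helix", "strand", 5.9, 7.5, 1.10)),
    (2, "y", ("helix", "strand", 5.1, 7.1, 1.05)),
    (2, "z", ("helix", "strand", 20.0, 30.0, 2.5)),
]
seeds = seed_matches(bases)
```

Collecting seeds in a set avoids reporting the same pair twice when it is reached from two neighboring bins.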

Step 4: clustering the seed matches. Detect matches with similar transformations and join them into clusters, using RMSD clustering: similar to (Rarey 1996); works in an iterative manner.

(Figure: transformations T1 to T6 grouped into numbered clusters.)
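One simple way to group similar transformations is sketched below. It is a greedy stand-in chosen for brevity, not the procedure of Rarey 1996 and not necessarily what MASS does: each transformation is a pair `(rotation, translation)`, the distance between two transformations is the RMSD between the images of a fixed set of reference points, and a transformation joins the first cluster whose representative lies within `threshold`. All names and the threshold are hypothetical.

```python
import math

def apply(transform, p):
    """Apply a rigid transformation (3x3 rotation rows, translation) to point p."""
    rot, shift = transform
    return tuple(sum(rot[r][c] * p[c] for c in range(3)) + shift[r]
                 for r in range(3))

def transform_rmsd(t1, t2, points):
    """RMSD between the images of the same points under t1 and t2."""
    total = 0.0
    for p in points:
        a, b = apply(t1, p), apply(t2, p)
        total += sum((a[k] - b[k]) ** 2 for k in range(3))
    return math.sqrt(total / len(points))

def cluster(transforms, points, threshold):
    """Greedy clustering: the first member of each cluster is its representative.
    Returns a list of clusters, each a list of indices into transforms."""
    clusters = []
    for idx, tr in enumerate(transforms):
        for members in clusters:
            if transform_rmsd(transforms[members[0]], tr, points) <= threshold:
                members.append(idx)
                break
        else:
            clusters.append([idx])
    return clusters

# Usage: two nearly identical transformations and one far away.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
transforms = [
    (identity, (0.0, 0.0, 0.0)),
    (identity, (0.1, 0.0, 0.0)),
    (identity, (10.0, 0.0, 0.0)),
]
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
groups = cluster(transforms, reference, threshold=1.0)
```

An iterative scheme would repeat the pass with updated representatives until the clusters stop changing; that loop is omitted here.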

Step 5: global extension & refinement. For each match: apply its transformation; find corresponding atoms that lie close enough to each other after the superposition; use least-squares fitting to refine the transformation; iterate until the RMSD converges.
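The correspondence search and the RMSD that drives the convergence test can be sketched as follows. The sketch covers only those two parts: the least-squares refit of the transformation is left out, and the greedy closest-first pairing and the 3.0 cutoff are assumptions rather than details from the slides. The names `match_atoms` and `rmsd` are hypothetical.

```python
import math

def match_atoms(ref_atoms, moved_atoms, max_dist=3.0):
    """Greedy one-to-one pairing of atoms that lie within max_dist of each
    other after superposition; the closest pairs are accepted first."""
    candidates = []
    for i, a in enumerate(ref_atoms):
        for j, b in enumerate(moved_atoms):
            d = math.dist(a, b)
            if d <= max_dist:
                candidates.append((d, i, j))
    candidates.sort()
    used_ref, used_moved, pairs = set(), set(), []
    for _, i, j in candidates:
        if i not in used_ref and j not in used_moved:
            used_ref.add(i)
            used_moved.add(j)
            pairs.append((i, j))
    return pairs

def rmsd(ref_atoms, moved_atoms, pairs):
    """Root-mean-square deviation over the matched atom pairs."""
    total = sum(math.dist(ref_atoms[i], moved_atoms[j]) ** 2 for i, j in pairs)
    return math.sqrt(total / len(pairs))

# Usage: two of three atoms find a partner; the third is too far away.
ref = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (8.0, 0.0, 0.0)]
moved = [(0.5, 0.0, 0.0), (4.0, 0.5, 0.0), (20.0, 0.0, 0.0)]
pairs = match_atoms(ref, moved)
score = rmsd(ref, moved, pairs)
```

In a full refinement loop, the matched pairs would feed a least-squares superposition, the new transformation would be applied, and the two functions would be called again until the RMSD stops changing.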

The Multiple Case. Outline: SSE assignment & representation; detection of seed matches; clustering the pairwise seed matches; global extension of pairwise matches; computing multiple matches; refinement; selecting high-scoring multiple matches.

Finding bases whose configuration appears in a sufficient number of molecules: all bases are stored in a geometric hash according to their fingerprint. Bases that reside in the same bin or in adjacent bins are congruent.

For each hash bin: retrieve all the bases in the bin and in the adjacent bins; insert them into a combinatorial bucket (CB). (Figure: bucket columns for Protein i, Protein j, Protein k, Protein r, Protein s, with i<j<k<r<s.)

Construct pairwise seed matches; the reference protein is the one with the smaller index. Cluster the pairwise matches. Globally extend the pairwise matches.
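The construction of pairwise seed matches from one combinatorial bucket, with the lower-indexed protein as the reference, might look like this. It is a sketch under assumptions: the bucket is a dictionary from protein index to the bases in that column, and `min_proteins` is a hypothetical stand-in for the "sufficient number of molecules" requirement.

```python
from itertools import combinations, product

def pairwise_seeds(bucket, min_proteins=2):
    """bucket: dict mapping protein index -> list of base ids (one CB column each).
    Returns dict mapping reference protein -> list of
    (reference base, other protein, other base)."""
    if len(bucket) < min_proteins:
        return {}  # configuration appears in too few molecules
    seeds = {}
    # combinations over sorted indices yields i < j, so i is the reference.
    for i, j in combinations(sorted(bucket), 2):
        for base_i, base_j in product(bucket[i], bucket[j]):
            seeds.setdefault(i, []).append((base_i, j, base_j))
    return seeds

# Usage: three proteins share a configuration; protein 1 has two such bases.
bucket = {3: ["c1"], 1: ["a1", "a2"], 2: ["b1"]}
seeds = pairwise_seeds(bucket)
```

Grouping the seeds by reference protein keeps together the pairwise matches that can later be combined into one multiple match around the same reference.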

Recursively construct the multiple alignment. (Figure: Protein i, Protein j, Protein k, Protein r, Protein s, with i<j<k<r<s.)

Refinement. Selecting high-scoring multiple matches: a multiple match is scored from its number of proteins n and its number of atoms k. Example: n = 3, k = 4, score = 12 (consistent with score = n x k).

Experimental Results

MASS vs. MUSTA

Partial Solutions

All-alpha Class: The common core of ten proteins that belong to 4 different folds of the all-alpha class.

TIM-barrel Fold: The common core of 6 out of 7 proteins, taken from different superfamilies of the TIM-barrel fold.

Calcium Binding: The common core of 6 proteins belonging to 3 different families of the EF-hand-like superfamily.

Lipase Family: An alignment of four structures from different species of the lipase family. Two of the conformations are open and two are closed.
