Order independent structural alignment of circularly permutated proteins
T. Andrew Binkowski, Bhaskar DasGupta, Jie Liang

Order independent structural alignment of circularly permutated proteins
T. Andrew Binkowski (Bioengineering, UIC), Bhaskar DasGupta† (Computer Science, UIC), Jie Liang‡ (Bioengineering, UIC)
† Supported by NSF grants CCR , CCR , CCR and CAREER IIS
‡ Supported by NSF grants CAREER DBI , DBI and NIH grant GM-68958

Circular Permutations
Ligation of the N and C termini of a protein and a concurrent cleavage elsewhere in the chain
Structurally similar, stable, and retain function
Occur in nature:
–Tandem repeats via duplication of the C-terminal of one repeat with the N-terminal of the next repeat
–Transposable elements lead to rearrangement of segments within the same gene
–Ligation and cleavage of the peptide chains during post-translational modification
Artificially created in lab:
–Protein folding studies
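At the sequence level, the ligation-plus-cleavage definition above amounts to rotating the chain. The sketch below illustrates only that idea; the function names and the 10-residue sequence are hypothetical and do not come from the slides.

```python
def circular_permutation(sequence: str, cut: int) -> str:
    """Join the two termini and cleave so that residue `cut` (0-based) becomes the new N-terminus."""
    if not 0 <= cut < len(sequence):
        raise ValueError("cut must index a residue in the sequence")
    return sequence[cut:] + sequence[:cut]


def is_circular_permutation(a: str, b: str) -> bool:
    """True if b is an exact rotation of a (sequence identity only, no structural information)."""
    return len(a) == len(b) and b in a + a


original = "MKTAYIAKQR"  # hypothetical sequence, for illustration only
permuted = circular_permutation(original, 4)
print(permuted)  # YIAKQRMKTA
print(is_circular_permutation(original, permuted))  # True
```

Real circular permutants rarely keep identical sequences, which is why an exact rotation test like this one is not a detection method.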

Why study them?
Important mechanism to generate new folds
Many inserted domains are circular permutations of homologues
Different domain orientations expose different surface regions for substrate binding
Circular permutations offer an efficient way to generate biologically important functional diversity

Current Methods of Identifying Circular Permutations
Sequence alignment:
–Post-processing dynamic programming
–Customized algorithms
–Miss distantly related proteins
–Many false positives from tandem repeats
Structure alignment:
–No current methods of identification
–Current structural alignment methods do not work
Continuous fragment assembly

Difficulty in Identifying Circular Permutations
Similar domains
Similar spatial arrangements
Discontinuity of primary sequence and domain ordering
Problems:
–“Breaks”
–Reverse ordering (N->C)

Basic Methodology
Fragments of the protein structure
Looking for fragment pair sets that maximize the total similarity
Our approach to providing an approximate solution to the BSSI_{Λ,σ} problem is to adopt the approximation algorithm for scheduling split-interval graphs, which is based on a fractional version of the local-ratio approach.

Non-overlapping fragments and define neighbors
Define linear programming variables for each fragment pair set
Substructure pairs are disjoint
Ensure consistency between set pairs and substructures
Non-negative values
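Read together, the labels on this slide suggest a linear programming relaxation of roughly the following shape. The slide's own equations did not survive in the transcript, so every symbol here is an assumption: χ = (λ_a, λ_b) is a fragment pair drawn from the candidate set Λ, σ(χ) is its similarity score, x_χ is the variable for the pair, and y_λ is the variable for a single fragment.

```latex
\begin{align*}
\text{maximize}\quad & \sum_{\chi \in \Lambda} \sigma(\chi)\, x_{\chi} \\
\text{subject to}\quad
 & \sum_{\lambda_a \ni a_t} y_{\lambda_a} \le 1 && \forall\, a_t \in S_a
   && \text{(fragments on } S_a \text{ are disjoint)} \\
 & \sum_{\lambda_b \ni b_t} y_{\lambda_b} \le 1 && \forall\, b_t \in S_b
   && \text{(fragments on } S_b \text{ are disjoint)} \\
 & y_{\lambda_a} - \sum_{\chi = (\lambda_a, \cdot)} x_{\chi} \ge 0 && \forall\, \lambda_a
   && \text{(pairs consistent with fragments)} \\
 & y_{\lambda_b} - \sum_{\chi = (\cdot, \lambda_b)} x_{\chi} \ge 0 && \forall\, \lambda_b \\
 & x_{\chi},\; y_{\lambda_a},\; y_{\lambda_b} \ge 0
\end{align*}
```

Each constraint family corresponds to one label on the slide: disjointness per residue, consistency between pair variables and fragment variables, and non-negativity.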

Compute local conflict and solve recursively
Identify non-overlapping fragment pair substructures that maximize the total similarity
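The recursion named on this slide can be sketched along the lines of the general fractional local-ratio scheme for weighted independent set. This is an illustration under assumptions, not the authors' code: the fractional values `x` are taken as given (in practice they would come from an LP solver), and the function and argument names are hypothetical.

```python
def fractional_local_ratio(weights, neighbors, x):
    """Select a conflict-free set of vertices (fragment pairs).

    weights:   dict vertex -> similarity score
    neighbors: dict vertex -> set of conflicting vertices
    x:         dict vertex -> fractional LP value in [0, 1] (assumed given)
    """
    # Delete all vertices whose weight is no longer positive.
    live = {v: w for v, w in weights.items() if w > 0}
    if not live:
        return set()

    def local_conflict(v):
        # LP mass on v and on its still-live neighbors.
        return x[v] + sum(x[u] for u in neighbors[v] if u in live)

    # Pick the vertex with the smallest local conflict.
    v = min(live, key=local_conflict)
    closed = {v} | {u for u in neighbors[v] if u in live}

    # Update: subtract w(v) from v and its neighbors, then solve recursively.
    reduced = {u: (w - live[v] if u in closed else w) for u, w in live.items()}
    chosen = fractional_local_ratio(reduced, neighbors, x)

    # Add v back only if none of its neighbors was selected.
    if not (neighbors[v] & chosen):
        chosen.add(v)
    return chosen


# Hypothetical instance: four fragment pairs, pair 0 conflicts with pairs 1 and 2.
weights = {0: 5, 1: 4, 2: 3, 3: 2}
neighbors = {0: {1, 2}, 1: {0}, 2: {0}, 3: set()}
x = {0: 0.0, 1: 1.0, 2: 1.0, 3: 1.0}  # an optimal LP solution for this toy instance
print(fractional_local_ratio(weights, neighbors, x))  # {1, 2, 3}
```

Each call removes at least one vertex, so the recursion depth is bounded by the number of fragment pairs. A large instance would need an iterative rewrite to stay within Python's recursion limit.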

[Algorithm overview flowchart; box labels: Delete all vertices with 0 weight; LP formulation; Algorithm guarantees; Update; Substructures with no neighbors; Superposition; Exhaustively fragment and compare; Threshold; Simplified Example]

Fragment and Compare
Two protein structures, Sa and Sb
Systematically cut Sb into fragments (length 7-25)
Exhaustively compare to Sa fragments of equal length
Fragment pair represented as a vertex in a graph
Threshold 6
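The exhaustive fragment-and-compare step can be sketched as below. Two assumptions are worth flagging. First, the slides mention superposition, but this sketch swaps in a distance-matrix RMSD, which needs no superposition and keeps the example dependency-free. Second, the threshold value, the function names, and the toy coordinates are hypothetical.

```python
import math
from itertools import combinations


def drmsd(frag_a, frag_b):
    """Distance-matrix RMSD between two equal-length lists of (x, y, z) points."""
    idx_pairs = list(combinations(range(len(frag_a)), 2))
    total = sum(
        (math.dist(frag_a[i], frag_a[j]) - math.dist(frag_b[i], frag_b[j])) ** 2
        for i, j in idx_pairs
    )
    return math.sqrt(total / len(idx_pairs))


def fragment_pairs(sa, sb, min_len=7, max_len=25, threshold=1.0):
    """Return (start_a, start_b, length, score) for every fragment pair under the threshold."""
    hits = []
    longest = min(max_len, len(sa), len(sb))
    for length in range(min_len, longest + 1):
        # Cut Sb into every fragment of this length...
        for j in range(len(sb) - length + 1):
            frag_b = sb[j:j + length]
            # ...and compare it to every Sa fragment of equal length.
            for i in range(len(sa) - length + 1):
                score = drmsd(sa[i:i + length], frag_b)
                if score <= threshold:
                    hits.append((i, j, length, score))
    return hits


# Toy data: sb holds the same points as sa, but in circularly permuted chain order.
sa = [(float(k), float(k * k % 7), float(k % 3)) for k in range(12)]
sb = sa[5:] + sa[:5]
hits = fragment_pairs(sa, sb, min_len=7, max_len=7, threshold=1e-9)
print(hits)
```

Each hit would become one vertex of the graph, carrying its score as the vertex weight.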

Simplified Example
Similarity score for aligned fragments
Problem of identifying the best fragments

LP Formulation
Conflict graph for the set of fragments
Sweep line determines which vertices (fragments) overlap
A conflict is shown as an edge between vertices
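A sweep over sorted interval start points is one straightforward way to build such a conflict graph. The sketch below assumes each vertex is a fragment pair `(a_start, b_start, length)` and that two pairs conflict when their residue ranges overlap on either protein. That representation and the function names are assumptions made for illustration.

```python
def overlaps_by_sweep(intervals):
    """intervals: list of (start, end, vertex_id) with inclusive ends.
    Returns the set of (u, v) id pairs, u < v, whose intervals overlap."""
    edges = set()
    active = []  # (end, vertex_id) of intervals the sweep line currently crosses
    for start, end, vid in sorted(intervals):
        # Drop intervals that ended before the sweep line reached `start`.
        active = [(e, v) for e, v in active if e >= start]
        for _, other in active:
            edges.add((min(vid, other), max(vid, other)))
        active.append((end, vid))
    return edges


def conflict_graph(pairs):
    """pairs: list of (a_start, b_start, length); vertex ids are list indices."""
    on_a = [(a, a + n - 1, k) for k, (a, b, n) in enumerate(pairs)]
    on_b = [(b, b + n - 1, k) for k, (a, b, n) in enumerate(pairs)]
    # A conflict on either protein puts an edge between the two vertices.
    return overlaps_by_sweep(on_a) | overlaps_by_sweep(on_b)


pairs = [(0, 10, 7), (5, 20, 7), (12, 12, 7), (30, 40, 7)]  # hypothetical fragment pairs
print(sorted(conflict_graph(pairs)))  # [(0, 1), (0, 2)]
```

Pairs 0 and 1 overlap on the first protein (residues 5-6), and pairs 0 and 2 overlap on the second (residues 12-16). Pair 3 conflicts with nothing.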

Simplified Example
Linear programming equations (MPS)
Solve using BPMPD
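For readers unfamiliar with MPS, the sketch below writes a small LP in that layout. It is not the formulation from the slides. It uses a deliberately simplified relaxation with one `x_u + x_v <= 1` row per conflict edge, negates the weights because MPS readers conventionally minimize, and separates fields with whitespace, so a solver that insists on strict fixed columns might need different spacing. The function name and the row and column labels are hypothetical.

```python
def to_mps(weights, edges, name="FRAGLP"):
    """Serialize max sum(w_v * x_v) s.t. x_u + x_v <= 1 per edge, 0 <= x_v <= 1, as MPS text."""
    rows = {edge: f"C{k}" for k, edge in enumerate(sorted(edges))}
    lines = [f"NAME          {name}", "ROWS", " N  OBJ"]
    lines += [f" L  {row}" for row in rows.values()]

    lines.append("COLUMNS")
    for v in sorted(weights):
        col = f"X{v}"
        # Negated weight: minimizing -w*x is the same as maximizing w*x.
        lines.append(f"    {col:<8}  {'OBJ':<8}  {-weights[v]:>12g}")
        for (a, b), row in rows.items():
            if v in (a, b):
                lines.append(f"    {col:<8}  {row:<8}  {1:>12g}")

    lines.append("RHS")
    lines += [f"    {'RHS':<8}  {row:<8}  {1:>12g}" for row in rows.values()]

    lines.append("BOUNDS")
    # The default lower bound in MPS is 0; only the upper bound is written.
    lines += [f" UP {'BND':<8}  X{v:<7}  {1:>12g}" for v in sorted(weights)]
    lines.append("ENDATA")
    return "\n".join(lines)


mps_text = to_mps({0: 5, 1: 4, 2: 3, 3: 2}, {(0, 1), (0, 2)})
print(mps_text)
```

The resulting text can be written to a file and handed to any LP solver that reads MPS input.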

Results
Extracted known examples from literature
Natural and artificial (below line)

Lectins
Plant lectins interact with glycoproteins and glycolipids through the binding of various carbohydrates
The structures of lectin from garden pea (1rin) (a) and concanavalin A (2cna) (b)
–The permutation is a result of post-translational modifications
3 fragments align over 45 residues; 0.82 Å

C2 Domains
The C2 domain is a Ca2+-binding module involved mainly in signal transduction
Phospholipase Cγ C2 domain (1qas) (a) and synaptotagmin I C2 domain (1rsy) (b)
4 fragments, 44 residues at a root mean square distance of 1.1 Å

Aldolase
Transaldolase is one of the enzymes in the non-oxidative branch of the pentose phosphate pathway
Transaldolase (1onr) and fructose-1,6-bisphosphate aldolase (1fba); 7 fragments; 77 residues; 2.4 Å
In agreement with the manual alignments of Jia et al., the best alignments occur when the first β strand of transaldolase is aligned to the third β strand of aldolase
Timing affected by many different factors:
–72 seconds to run

Conclusion, Future Work
The approximation algorithm introduced in this work can find good solutions for the problem of detecting circularly permuted proteins
Future work:
–Optimize the similarity scoring system for different tasks
–Improve the sensitivity and specificity of detecting matched protein substructures
–Statistical measurement of significance of matched substructures