Exploring Folding Landscapes with Motion Planning Techniques Bonnie Kirkpatrick 2, Xinyu Tang 1, Shawna Thomas 1, Dr. Nancy Amato 1 1 Texas A&M University.

Slides:



Advertisements
Similar presentations
NUS CS5247 Motion Planning for Camera Movements in Virtual Environments By Dennis Nieuwenhuisen and Mark H. Overmars In Proc. IEEE Int. Conf. on Robotics.
Advertisements

By Lydia E. Kavraki, Petr Svestka, Jean-Claude Latombe, Mark H. Overmars Emre Dirican
Presented By: Aninoy Mahapatra
Probabilistic Roadmap
By Guang Song and Nancy M. Amato Journal of Computational Biology, April 1, 2002 Presentation by Athina Ropodi.
Protein Threading Zhanggroup Overview Background protein structure protein folding and designability Protein threading Current limitations.
Iterative Relaxation of Constraints (IRC) Can’t solve originalCan solve relaxed PRMs sample randomly but… start goal C-obst difficult to sample points.
Geometric Algorithms for Conformational Analysis of Long Protein Loops J. Cortess, T. Simeon, M. Remaud- Simeon, V. Tran.
Randomized Motion Planning for Car-like Robots with C-PRM Guang Song and Nancy M. Amato Department of Computer Science Texas A&M University College Station,
Predicting RNA Structure and Function. Non coding DNA (98.5% human genome) Intergenic Repetitive elements Promoters Introns mRNA untranslated region (UTR)
RNA Folding Xinyu Tang Bonnie Kirkpatrick. Overview Introduction to RNA Previous Work Problem Hofacker ’ s Paper Chen and Dill ’ s Paper Modeling RNA.
Planning Paths for Elastic Objects Under Manipulation Constraints Florent Lamiraux Lydia E. Kavraki Rice University Presented by: Michael Adams.
Using Motion Planning to Study Ligand Binding and Protein Folding Nancy Amato,Guang Song and Burchan Bayazit Department of Computer Science Texas A&M University.
Probabilistic Roadmaps for Path Planning in High-Dimensional Configuration Spaces Kavraki, Svestka, Latombe, Overmars 1996 Presented by Dongkyu, Choi.
Graphical Models for Protein Kinetics Nina Singhal CS374 Presentation Nov. 1, 2005.
Improving Free Energy Functions for RNA Folding RNA Secondary Structure Prediction.
Using Motion Planning to Map Protein Folding Landscapes
Mossbauer Spectroscopy in Biological Systems: Proceedings of a meeting held at Allerton House, Monticello, Illinois. Editors: J. T. P. DeBrunner and E.
Randomized Motion Planning for Car-like Robots with C-PRM Guang Song, Nancy M. Amato Department of Computer Science Texas A&M University College Station,
Robot Motion Planning Bug 2 Probabilistic Roadmaps Bug 2 Probabilistic Roadmaps.
CISC667, F05, Lec19, Liao1 CISC 467/667 Intro to Bioinformatics (Fall 2005) RNA secondary structure.
RNA Folding Kinetics Bonnie Kirkpatrick Dr. Nancy Amato, Faculty Advisor Guang Song, Graduate Student Advisor.
Stochastic Roadmap Simulation: An Efficient Representation and Algorithm for Analyzing Molecular Motion Mehmet Serkan Apaydin, Douglas L. Brutlag, Carlos.
Motion Planning: From Intelligent CAD to Computer Animation to Protein Folding Nancy M. Amato Parasol Lab,Texas A&M University.
CS 326A: Motion Planning Probabilistic Roadmaps: Sampling and Connection Strategies.
Probabilistic Roadmaps for Path Planning in High-Dimensional Configuration Spaces Kavraki, Svestka, Latombe, Overmars 1996 Presented by Chris Allocco.
Exploring Folding Landscapes with Motion Planning Techniques Bonnie Kirkpatrick Montana State University Dr. Nancy Amato Guang Song Xinyu Tang Texas A&M.
Probabilistic Roadmaps for Path Planning in High-Dimensional Configuration Spaces Lydia E. Kavraki Petr Švetka Jean-Claude Latombe Mark H. Overmars Presented.
Dynamic Programming (cont’d) CS 466 Saurabh Sinha.
Inverse Kinematics for Molecular World Sadia Malik April 18, 2002 CS 395T U.T. Austin.
1 Protein Folding Atlas F. Cook IV & Karen Tran. 2 Overview What is Protein Folding? Motivation Experimental Difficulties Simulation Models:  Configuration.
Protein Tertiary Structure Prediction
Deterministic Sampling Methods for Spheres and SO(3) Anna Yershova Steven M. LaValle Dept. of Computer Science University of Illinois Urbana, IL, USA.
DNA, RNA & Proteins Transcription Translation Chapter 3, 15 & 16.
Transcription Transcription is the synthesis of mRNA from a section of DNA. Transcription of a gene starts from a region of DNA known as the promoter.
Biology 10.1 How Proteins are Made:
Quiz tiiiiime What 3 things make up a nucleotide?
Generating Better Conformations for Roadmaps in Protein Folding PARASOL Lab, Department of Computer Science, Texas A&M University,
Intelligent Systems for Bioinformatics Michael J. Watts
Using Motion Planning to Study Protein Folding Pathways Susan Lin, Guang Song and Nancy M. Amato Department of Computer Science Texas A&M University
RNA Secondary Structure Prediction Spring Objectives  Can we predict the structure of an RNA?  Can we predict the structure of a protein?
From Structure to Function. Given a protein structure can we predict the function of a protein when we do not have a known homolog in the database ?
Behrouz Haji Soleimani Dr. Moradi. Outline What is uncertainty? Some examples Solutions to uncertainty Ignoring uncertainty Markov Decision Process (MDP)
Probabilistic Roadmaps for Path Planning in High-Dimensional Configuration Spaces (1996) L. Kavraki, P. Švestka, J.-C. Latombe, M. Overmars.
CSCI 6900/4900 Special Topics in Computer Science Automata and Formal Grammars for Bioinformatics Bioinformatics problems sequence comparison pattern/structure.
RNA Secondary Structure Prediction. 16s rRNA RNA Secondary Structure Hairpin loop Junction (Multiloop)Bulge Single- Stranded Interior Loop Stem Image–
Some Mathematical Challenges from Life Sciences Part III Peter Schuster, Universität Wien Peter F.Stadler, Universität Leipzig Günter Wagner, Yale University,
Protein Folding and Modeling Carol K. Hall Chemical and Biomolecular Engineering North Carolina State University.
Deterministic Sampling Methods for Spheres and SO(3) Anna Yershova Steven M. LaValle Dept. of Computer Science University of Illinois Urbana, IL, USA.
Introduction to Motion Planning
RNA Structure Prediction RNA Structure Basics The RNA ‘Rules’ Programs and Predictions BIO520 BioinformaticsJim Lund Assigned reading: Ch. 6 from Bioinformatics:
PROTEIN FOLDING: H-P Lattice Model 1. Outline: Introduction: What is Protein? Protein Folding Native State Mechanism of Folding Energy Landscape Kinetic.
This seems highly unlikely.
Events in protein folding. Introduction Many proteins take at least a few seconds to fold, but almost all proteins undergo major structural transitions.
Chapter 3 The Interrupted Gene.
Randomized Kinodynamics Planning Steven M. LaVelle and James J
Motif Search and RNA Structure Prediction Lesson 9.
RNA DBP: modeling and dynamics of RNA Russ Altman Vijay Pande.
RNAs. RNA Basics transfer RNA (tRNA) transfer RNA (tRNA) messenger RNA (mRNA) messenger RNA (mRNA) ribosomal RNA (rRNA) ribosomal RNA (rRNA) small interfering.
CS 326A: Motion Planning Probabilistic Roadmaps for Path Planning in High-Dimensional Configuration Spaces (1996) L. Kavraki, P. Švestka, J.-C. Latombe,
PRM based Protein Folding
Stochastic Context-Free Grammars for Modeling RNA
RNA Secondary Structure Prediction
Stochastic Context-Free Grammars for Modeling RNA
Presented By: Aninoy Mahapatra
Sampling and Connection Strategies for Probabilistic Roadmaps
RNA 2D and 3D Structure Craig L. Zirbel October 7, 2010.
Path Planning using Ant Colony Optimisation
CISC 467/667 Intro to Bioinformatics (Spring 2007) RNA secondary structure CISC667, S07, Lec19, Liao.
A Comparison of Genotype-Phenotype Maps for RNA and Proteins
Presentation transcript:

Exploring Folding Landscapes with Motion Planning Techniques Bonnie Kirkpatrick 2, Xinyu Tang 1, Shawna Thomas 1, Dr. Nancy Amato 1 1 Texas A&M University 2 Montana State University 6 Aug 2003

Outline l Motivation: Biopolymers l Goal: Folding Landscapes l Method: Motion Planning l Application: RNA Folding

Outline l Motivation: Biopolymers l Goal: Folding Landscapes l Method: Motion Planning l Application: RNA Folding

What is Ribonucleic Acid (RNA) ? l Is composed of a sequence of nucleotides l Folds in 3D into energetically optimal conformations l Is essential to the process of carrying out a gene functions in cells. l Performs specific functions including protein synthesis, acting as catalysts, and splicing introns, and regulating activities l The folding behaviors of the molecule tell us much about their structure and function.

Ribonucleic Acid (RNA) Molecules l Primary Structure –Sequence of bases –Each base is one of: l {A, C, G, U} –e.g. ACGUGCCAUCG –Obtained by experiment l Secondary Structure –A 2D, planar representation l Tertiary Structure –The sequence loops back on itself and folds in 3D.

RNA Molecules l Primary Structure –Sequence of bases –Bases: A, C, G, U –e.g. ACGUGCCAUCG –Obtained by experiment l Secondary Structure –A 2D, planar representation –Base Pair: l A-U l G-C l G-U l Tertiary Structure –The sequence loops back on itself and folds in 3D.

RNA Conformations l Chemical bonds (or contacts) form between complementary residues in close proximity. l There are many possible conformations of the primary sequence. –e.g. CACAGAGUGU l Potential energy calculations based on number and types of bonds are used to classify conformations. –The stable, low-energy conformation is known as the native structure. –Conformations with few bonds and high energy are referred to as unfolded.

Planar Representations l Bonds between base pairs are lines or parentheses All representations are equivalent (((((.(((....))))))))

Planar Representations l Bonds between base pairs are lines or parentheses All representations are equivalent (((((.(((....)))))))) Bulge

Representations (cont.) l Contact Map l A dot is placed in the i th row and jth column of a triangular array to represent the intra- chain contact [i, j]

Violates criteria (2) Violates criteria (1) Secondary Structure Formalized l A secondary structure conformation is specified by a set of intra-chain contacts (bonds between base pairs) that follow certain rules. l Given any two intra-chain contacts [i, j] with i < j and [k, l] with k < l, then: 1) If i = k, then j = l Each base can appear in only one contact pair 2) If k < j, then i < k < l < j No pseudo-knots Pseudo-knot

Secondary Structure Summarized l 2D representation of the tertiary structure l Planar representation l Nested pairs l Sufficient structural information l Pseudo knots are considered a tertiary structure, rather than a secondary structure

Outline l Motivation: Biopolymers l Goal: Folding Landscapes l Method: Motion Planning l Application: RNA Folding

Folding Landscapes l A “grand challenge” problem in biology l Study the kinetics of folding l Each RNA has a unique folding landscape l Assumption: Native state ↔ Lowest energy conformation l Different from the structure prediction problem –Prediction of the native conformation

The Folding Process (a.k.a. the black box) AGGCUACUGGGAGCCUUCUCCCC Physical Laws cause folding Unfolded Conformation (high energy) Native Conformation (low energy)

Folding Landscapes l Description of the “black box” l A space in which every point corresponds to a conformation (or set of conformations) and its associated potential energy value (C-space). l A complete folding landscape contains a point for every possible conformation of a given sequence. Tetrahymena Ribozyme Landscape [Russell, Zhuang, Babcock, Millett, Doniach, Chu, and Herschlag, 2002] Native State Conformation Space Potential

Folding Landscapes (cont.) l Conformational changes describe how a molecule changes physically to fold from one conformation to another –Discrete l RNA Folding Model l Bonds either exist or do not exist

Features of Folding Landscapes l Folding pathways consist of the set of conformational changes a molecule is likely to fold though when moving from one conformation to another. –N to X to Y l Energy barriers are areas of the landscape with high energy that separate groups of conformations. –Y is separated from X and N l Intermediate states are conformations lying on the folding pathway represent local minimums of potential. –Y and X Mutant α mRNA fragment [Chen and Dill, 2000] Native State

An RNA Folding Pathway Phenylalanine tRNA [Hofacker, 1998] Energy Barrier Native State Unfolded

Mapping Folding Landscapes Existing techniques for mapping landscapes are limited to relatively short sequences (~200 nucleotides). A robotics motion planning technique called PRM has successfully been applied to protein folding.

Outline l Motivation: Biopolymers l Goal: Folding Landscapes l Method: Motion Planning l Application: RNA Folding

Motion Planning start goal obstacles (Basic) Motion Planning (in a nutshell): Given a movable object, find a sequence of valid configurations that moves the object from the start to the goal. Motion Planning for Foldable Objects: Given a foldable object, find a valid folding sequence that transforms the object from one folded state to another.

Probabilistic Roadmap Method (PRM): Robotics

Native state Construct the roadmap: 1. Generate nodes. 2. Connect to form roadmap The Roadmap is like a net being laid down on the RNA’s potential landscape. A conformation Conformation space Potential Now the roadmap can be used: 1.To extract multiple paths 2.To compute population kinetics Probabilistic Roadmap Method (PRM): Folding Landscapes

Outline l Motivation: Biopolymers l Goal: Folding Landscapes l Method: Motion Planning l Application: RNA Folding

Outline l Conformation Space l Node Generation l Node Evaluation l Node Connection l Edge Weights

PRM: Conformation Space l C is less than the set of every possible combination of contact pairs. l C contains only valid secondary structures. l C-space is very large. –Sequence: (ACGU) 10 –Length: 40 nucleotides –C-Space: 1.6x10 8 structures –Smaller than the conformation space for protein folding. l Purpose: generate a roadmap in C-space that describe the space without covering it Where n is the number of possible contact pairs C-space

PRM: Node Generation l Enumeration of C-space –Only feasible for small RNA l Enumeration of stack-based conformations –A stack is any two base pairs that occur sequentially in the secondary structure:.((.....)) l Maximal pair random sampling –Every generated node has the maximal number of contacts possible

PRM: Node Generation l Random Node Generation Algorithm –Starting with an empty configuration, c, random contacts are added to c one at a time. –Each step preserves the condition that c contains a valid set of base pair contacts. –Contacts are added until there are no more contacts that do not conflict with the contact set of c. l Every node generated has valid secondary structure and is a member of C-space. l Since every generated node has the maximal number of contacts, the sampling is biased toward the area of C-space near the native state.

PRM: Collision Detection l Evaluation of Nodes –Potential energy determines how good a node is. –Only add a node to the roadmap if it has a low energy. –Probability of a node q being added to the roadmap:

PRM: Node Connection l Choosing nodes to connect –K-closest –Fixed Radius l Distance Metric: Base Pair Distance –The number of contact pairs that differ between the two conformations –This is the number of contacts that must be opened or closed to move from one conformation to the next..(..(.)..)....(.((.)).)..

PRM: Node Connection l Given two nodes in C-Space, C 1 and C 2, find a path between them consisting of a sequence of nodes: { C 1 = S 1, S 2, …, S n-1, S n = C 2 } l The path must have the property that for each i, 1 < i < n, the set of contact pair of S i differs from that of S i-1 by the application of one transformation operation: (1) open or (2) close a single contact pair. C 1 = S 1 S2S2 S n-1 S n = C 2 … Discrete move-set.

Local Planner (cont.) l There exists a path between any two nodes in C-Space. l Not just any path will do; we want a good one. l Bad paths have high energy nodes in them. l How do we find the lowest energy path?

Connection Algorithm l more contacts  less potential energy l Heuristic: if a contact is opened by the transition from one node to another, try to close a contact in the next transition c1 = s1:..(.((..))).. s2:..(.(....)).. s3:..(.((.).)).. s4:..(..(.)..).. C2 = s5:..(.((.)).).. Bases involved in the change are marked in red. open close open close

Connection Algorithm Dependency Graph Open Close Start End

Edge Weight l Weights indicate the energetic feasibility of the edge. l ΔE i = E(s i+1 ) – E(s i ) C 1 = S 1 S2S2 S n-1 S n = C 2 … Up-hill SiSi S i+1

Roadmap Analysis l What does the roadmap tell us? l Folding Pathways A conformation

Roadmap Analysis l Population Kinetics A conformation

Roadmap Analysis l Population Kinetics A conformation

Roadmap Analysis l Population Kinetics A conformation

Roadmap Analysis l Population Kinetics A conformation

Roadmap Analysis l Population Kinetics A conformation

Roadmap Analysis l Population Kinetics A conformation

Roadmap Analysis l Population Kinetics –Solved using a differential equation

Future Work l Validation –Compare other statistical mechanical models –Compare with experimental results –Compare our sampled landscapes with complete landscapes l Explore the limits of our model l Try different sampling methods l Experiment with different distance metrics

References Shi-Jie Chen and Ken A. Dill. Rna folding energy landscapes. PNAS, 97: , Ivo L. Hofacker. Rna secondary structures: A tractable model of biopolymer folding. J.Theor.Biol., 212:35-46, Ivo L. Hofacker Jan Cupal and Peter F. Stadler. Dynamic programming algorithm for the density of states of rna secondry structures. Computer Science and Biology 96, 96: , L. Kavraki, P. Svestka, J. C. Latombe, and M. Overmars. Probabilistic roadmaps for path planning in high-dimensional conguration spaces. IEEE Trans. Robot. Automat., 12(4): , August J. C. Latombe. Robot Motion Planning. Kluwer Academic Publishers, Boston, MA, R. Li and C. Woodward. The hydrogen exchange core and protein folding. Protein Sci., 8: , John S. McCaskill. The equilibrium partition function and base pair binding probabilities for rna secondary structure. Biopolymers, 29: , Ruth Nussinov, George Piecznik, Jerrold R. Griggs, and Danel J. Kleitman. Algorithms for loop matching. SIAM J. Appl. Math., 35:68-82, R. Russell, X. Zhuang, H. Babcock, I. Millet, S. Doniach, S. Chu, and D. Herschlag. Exploring the folding landscape of a structured RNA. Proc. Natl. Acad. Sci., 99: , Proc. Natl. Acad. Sci. U.S.A. 99, D. Sanko and J.B. Kruskal. Time warps, string edits and macromolecules: the theory and practice of sequence comparison. Addison Wesley, London, A.P. Singh, J.C. Latombe, and D.L. Brutlag. A motion planning approach to exible ligand binding. In 7th Int. Conf. on Intelligent Systems for Molecular Biology (ISMB), pages , G. Song and N. M. Amato. Using motion planning to study protein folding pathways. In Proc. Int. Conf. Comput. Molecular Biology (RECOMB), pages , Stefan Wuchty. Suboptimal secondary structures of rna. Master Thesis, 1998.