RNA Secondary Structure Prediction

Introduction
RNA is a single-stranded chain of the nucleotides A, C, G, and U. The string of nucleotides specifies the linear structure of the RNA strand. When RNA folds, complementary nucleotides form base pairs (C-G and A-U). The tertiary (three-dimensional) structure is too complicated for us to calculate, so we calculate only secondary structures, which are lists of base pairs. Knowing the base pairs tells us a lot about the three-dimensional structure.

Chemical Structure of RNA
Four base types. Distinguishable ends.

Partial Tertiary Structure
One illustration.

Yet Another Tertiary Structure
Found via Google.

Our Final Tertiary Picture
Very complex.

A Partial RNA Secondary Structure

Pure Secondary Structure

Our Basic Model
RNA linear structure: $R = r_1 r_2 \ldots r_n$ with each $r_i \in \{A, C, G, U\}$.
RNA secondary structure: a set of pairs $(r_i, r_j)$ such that $0 < i < j < n+1$.
Goal: find secondary structures with minimum free energy.

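Implementing Model Restrictions
No knots: the model excludes two pairs $(r_i, r_j)$ and $(r_k, r_l)$ with $i < k < j < l$, even though real RNA does contain knots. Implemented through the program's loop structure.
No “close” base pairs: $j - i > t$ for some $t > 0$. Implemented by assigning a high free energy.
Complementary base pairs only: A-U and C-G. Implemented by assigning a high free energy to any other pair.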
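The three restrictions can be checked mechanically for a candidate structure. The sketch below is only an illustration: the function name `violations`, the 1-based `(i, j)` pair convention, and the default threshold `t=3` are assumptions and do not come from the slides.

```python
COMPLEMENTARY = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C")}


def violations(rna, pairs, t=3):
    """List the model restrictions broken by `pairs` (1-based (i, j) tuples)."""
    n = len(rna)
    problems = []
    for i, j in pairs:
        if not 0 < i < j < n + 1:
            problems.append(f"({i},{j}) is out of range")
            continue
        if j - i <= t:  # "close" base pair
            problems.append(f"({i},{j}) is too close")
        if (rna[i - 1], rna[j - 1]) not in COMPLEMENTARY:
            problems.append(f"({i},{j}) is not complementary")
    # A knot is any two pairs with i < k < j < l.
    for i, j in pairs:
        for k, l in pairs:
            if i < k < j < l:
                problems.append(f"({i},{j}) and ({k},{l}) form a knot")
    return problems


print(violations("GGGAAAUCCC", [(1, 10), (2, 9), (3, 8)]))  # nested pairs
print(violations("GGGAAAUCCC", [(1, 8), (3, 10)]))          # crossing pairs
```

The slides enforce the same restrictions implicitly, through loop bounds and large energies, so an explicit checker like this is mainly useful for testing.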

Our Two Algorithms
1. Independent base pairs: quite easy, but inaccurate.
2. Loop free energy: the best we can do for today’s class.

Independent Base Pair Algorithm
Assumption: base pairs are independent.
Advantage 1: simpler calculations.
Advantage 2: illustrates ideas for a much more accurate algorithm.
Disadvantage: unrealistic answers.

Independent Base Pairs: What Makes It “Easy”?
Assumption: the energy of each base pair is independent of all of the other pairs and of the loop structure.
Consequence: the total free energy is the sum of all of the base pair free energies.

Independent Base Pairs: Basic Approach
Use solutions for smaller strings to determine solutions for larger strings. The independence assumption provides precisely the kind of decoupling that dynamic programming algorithms require.

Independent Base Pairs: Notation
$a(r_i, r_j)$: the free energy of a base pair joining $r_i$ and $r_j$. We define $a(r_i, r_j)$ to be large when constraints are violated.
$S_{i,j}$: the secondary structure of the RNA strand from base $r_i$ to base $r_j$, i.e., the set of base pairs between $r_i$ and $r_j$ inclusive.
$E(S_{i,j})$: the free energy associated with the secondary structure $S_{i,j}$.
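A minimal sketch of the pair energy function $a$ follows. The numeric energies, the penalty `BIG`, and the threshold `t=3` are made-up placeholders chosen for illustration; the slides do not give values.

```python
BIG = 10**6  # assumed penalty standing in for "large"
PAIR_ENERGY = {frozenset("AU"): -2, frozenset("CG"): -3}  # illustrative values only


def a(rna, i, j, t=3):
    """Free energy of pairing r_i with r_j (1-based); BIG if a constraint is violated."""
    if j - i <= t:  # "close" base pairs are forbidden
        return BIG
    # Non-complementary pairs are missing from the table and get the penalty.
    return PAIR_ENERGY.get(frozenset((rna[i - 1], rna[j - 1])), BIG)


print(a("GGGAAAUCCC", 1, 10))  # G-C pair
print(a("GGGAAAUCCC", 4, 7))   # A-U pair, but too close
```

Encoding violations as a large energy means the minimization never selects them, so no separate validity check is needed in the main loop.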

Independent Base Pairs: Calculating Free Energy
Consider the RNA strand from position $i$ to position $j$, and consider whether $r_j$ is paired.
If $r_j$ is paired, then $E(S_{i,j}) = E(S_{i,k-1}) + a(r_k, r_j) + E(S_{k+1,j-1})$ for some $k$ with $i - 1 < k < j$.
If $r_j$ isn’t paired, then $E(S_{i,j}) = E(S_{i,j-1})$.

Independent Base Pairs: Algorithm
We search for intervals with minimum free energy. For each interval, the free energy is given by this formula:
$E(S_{i,j}) = \min\bigl( E(S_{i+1,j-1}) + a(r_i, r_j),\ \min_{i-1<k<j+1} [\, E(S_{i,k-1}) + a(r_k, r_j) + E(S_{k+1,j-1}) \,] \bigr)$
The free energy of the RNA strand is $E(S_{1,n})$.
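The recurrence can be sketched as a table fill over intervals of increasing length. This is one possible reading and not a definitive implementation: it handles the unpaired case explicitly through $E(S_{i,j-1})$ and lets $k$ run over the possible partners of $r_j$. The energy values, `BIG`, `t=3`, and the function names are assumptions.

```python
BIG = 10**6  # assumed penalty for violated constraints
PAIR_ENERGY = {frozenset("AU"): -2, frozenset("CG"): -3}  # illustrative values only


def a(rna, i, j, t=3):
    """Pair energy for (r_i, r_j), 1-based; BIG when a constraint is violated."""
    if j - i <= t:
        return BIG
    return PAIR_ENERGY.get(frozenset((rna[i - 1], rna[j - 1])), BIG)


def fill(rna, t=3):
    """Return the table E with E[i][j] = minimum free energy of S_{i,j}."""
    n = len(rna)
    # Indices 0 and n+1 pad the table so empty intervals read as energy 0.
    E = [[0] * (n + 2) for _ in range(n + 2)]
    for length in range(2, n + 1):          # smaller intervals first
        for i in range(1, n - length + 2):
            j = i + length - 1
            best = E[i][j - 1]              # r_j is not paired
            for k in range(i, j):           # r_j is paired with r_k
                best = min(best, E[i][k - 1] + a(rna, k, j, t) + E[k + 1][j - 1])
            E[i][j] = best
    return E


rna = "GGGAAAUCCC"
print(fill(rna)[1][len(rna)])  # free energy of the whole strand, E(S_{1,n})
```

With these toy energies the strand folds into three nested G-C pairs, which is the only way to use all of the G and C bases without violating the distance constraint.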

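Independent Base Pairs: Question 1
How does this formula deal with the case where $r_j$ isn’t paired with any base?
It is the special case $k = j$ of the term $E(S_{i,k-1}) + a(r_k, r_j) + E(S_{k+1,j-1})$, $i - 1 < k < j + 1$. For this term to reduce to $E(S_{i,j-1})$, both $a(r_j, r_j)$ and the energy of the empty interval $S_{j+1,j-1}$ must be taken to be zero.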

Independent Base Pairs: Question 2
What is the high-level algorithm flow?
1. Advance from smaller to larger intervals, calculating free energy costs.
2. Trace back the path that corresponds to the minimum free energy cost.

Independent Base Pairs: Question 3
In what orders can the intervals’ free energy costs be evaluated?
1. Major = lower bound, minor = upper bound
2. Major = upper bound, minor = lower bound
3. Diagonally
4. Any order (e.g., random) that respects the partial order induced by inclusion
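The first three orders can be checked against the dependencies of the independent-base-pair recurrence, where interval $(i,j)$ needs $(i,j-1)$, $(i,k-1)$, and $(k+1,j-1)$ for $i \le k < j$. The sketch below is illustrative; the function names are hypothetical, and the loop directions (for example, descending lower bound in the first order) are one choice that happens to satisfy the dependencies.

```python
def by_lower(n):
    """Major = lower bound (descending), minor = upper bound (ascending)."""
    return [(i, j) for i in range(n, 0, -1) for j in range(i, n + 1)]


def by_upper(n):
    """Major = upper bound (ascending), minor = lower bound."""
    return [(i, j) for j in range(1, n + 1) for i in range(j, 0, -1)]


def by_diagonal(n):
    """All intervals of width d before any interval of width d + 1."""
    return [(i, i + d) for d in range(n) for i in range(1, n - d + 1)]


def respects_dependencies(order):
    """True if every interval appears after all subintervals it depends on."""
    seen = set()
    for i, j in order:
        needed = {(i, j - 1)}
        for k in range(i, j):
            needed.add((i, k - 1))
            needed.add((k + 1, j - 1))
        # Empty intervals (p > q) have energy 0 and need no prior visit.
        if any(p <= q and (p, q) not in seen for p, q in needed):
            return False
        seen.add((i, j))
    return True


for make_order in (by_lower, by_upper, by_diagonal):
    print(make_order.__name__, respects_dependencies(make_order(6)))
```

Reversing any of these orders visits the largest interval first and fails the check, which is the sense in which the order must respect inclusion.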

Independent Base Pairs: Question 4
What are the time and storage requirements of this algorithm? Express your answer in terms of the number of bases in the RNA strand.
Since the number of intervals is quadratic, the storage requirements are quadratic. Since the time requirement for each interval is linear, the total time is cubic.

Independent Base Pairs: Question 5
Why not simply calculate free energies as they are needed? Why store them at all?
Because the repeated recursive calls would turn our polynomial algorithm into an exponential one.
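The effect of storing results can be seen by counting how often the recurrence body runs with and without a cache. This is a toy sketch: every pair is given an assumed energy of -1, only the call structure matters, and `count_calls` is a hypothetical helper name.

```python
from functools import lru_cache


def count_calls(n, memoize):
    """Count executions of the recurrence body when solving interval (1, n)."""
    calls = 0

    def solve(i, j):
        nonlocal calls
        calls += 1
        if j <= i:                      # empty or single-base interval
            return 0
        best = solve(i, j - 1)          # r_j is not paired
        for k in range(i, j):           # r_j is paired with r_k (toy energy -1)
            best = min(best, solve(i, k - 1) - 1 + solve(k + 1, j - 1))
        return best

    if memoize:
        # Rebinding the name makes the recursive calls go through the cache.
        solve = lru_cache(maxsize=None)(solve)
    solve(1, n)
    return calls


print("without storage:", count_calls(12, memoize=False))
print("with storage:   ", count_calls(12, memoize=True))
```

With the cache, the body runs at most once per distinct interval, so the count is bounded by the number of intervals; without it, the same subintervals are recomputed many times over.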

Independent Base Pairs: Question 6
How does traceback work for this algorithm? Two options:
1. Recalculate which subinterval yields the minimum free energy.
2. Save traceback paths.
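Recalculating the winning subinterval during traceback might look like the sketch below. It assumes a table filled by the independent-base-pair recurrence; the energy values, `BIG`, `t=3`, and all function names are illustrative assumptions. Integer energies are used so that the equality tests are exact.

```python
BIG = 10**6  # assumed penalty for violated constraints
PAIR_ENERGY = {frozenset("AU"): -2, frozenset("CG"): -3}  # illustrative values only


def a(rna, i, j, t=3):
    """Pair energy for (r_i, r_j), 1-based; BIG when a constraint is violated."""
    if j - i <= t:
        return BIG
    return PAIR_ENERGY.get(frozenset((rna[i - 1], rna[j - 1])), BIG)


def fill(rna, t=3):
    """E[i][j] = minimum free energy of S_{i,j}; padded so empty intervals are 0."""
    n = len(rna)
    E = [[0] * (n + 2) for _ in range(n + 2)]
    for length in range(2, n + 1):
        for i in range(1, n - length + 2):
            j = i + length - 1
            best = E[i][j - 1]
            for k in range(i, j):
                best = min(best, E[i][k - 1] + a(rna, k, j, t) + E[k + 1][j - 1])
            E[i][j] = best
    return E


def traceback(rna, E, t=3):
    """Recover one minimum-energy set of base pairs by re-deriving each choice."""
    pairs, stack = [], [(1, len(rna))]
    while stack:
        i, j = stack.pop()
        if j <= i:
            continue
        if E[i][j] == E[i][j - 1]:       # r_j unpaired explains the minimum
            stack.append((i, j - 1))
            continue
        for k in range(i, j):            # find a partner k that explains it
            if E[i][j] == E[i][k - 1] + a(rna, k, j, t) + E[k + 1][j - 1]:
                pairs.append((k, j))
                stack.extend([(i, k - 1), (k + 1, j - 1)])
                break
    return sorted(pairs)


rna = "GGGAAAUCCC"
print(traceback(rna, fill(rna)))
```

Recalculation costs linear time per recovered interval and needs no extra storage, whereas saving traceback pointers trades a second quadratic table for constant-time steps.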

Loop Free Energy Algorithm
A base pair’s free energy is not independent of all other base pairs. An RNA molecule’s free energy actually depends on its loop structure. What do we mean by loops?

Types of Loops
Each base pair $(r_i, r_j)$ encloses a loop:
1. Hairpin loop
2. Bulge on $i$ or $j$
3. Interior loop
4. Helical region

Hairpin Loop
There are no base pairs $(r_k, r_l)$ for $i < k < l < j$.

Bulge on i and j
Bulge on $i$: $(r_i, r_j)$ and $(r_k, r_{j-1})$ are base pairs with $k > i + 1$; $r_{i+1}$ is not paired. The bulge on $j$ is symmetric.

Interior Loop
$(r_i, r_j)$ and $(r_k, r_l)$ are base pairs with $i + 1 < k < l < j - 1$. $r_{i+1}$ and $r_{j-1}$ are not in base pairs.

Helical Region
$(r_i, r_j)$ and $(r_{i+1}, r_{j-1})$ are base pairs.
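The four loop definitions translate into a small classifier for the loop closed by a given pair. This is a sketch under assumptions: pairs are 1-based tuples, the structure has no knots, and a loop that directly encloses more than one pair falls outside the four cases and is reported separately. The function name and return strings are hypothetical.

```python
def classify_loop(pairs, i, j):
    """Classify the loop closed by base pair (i, j) in a knot-free structure."""
    inner = [(k, l) for k, l in pairs if i < k < l < j]
    if not inner:
        return "hairpin"
    # Keep only pairs directly enclosed by (i, j), not nested in another inner pair.
    direct = [(k, l) for k, l in inner
              if not any(p < k and l < q for p, q in inner)]
    if len(direct) != 1:
        return "not one of the four cases"
    k, l = direct[0]
    if k == i + 1 and l == j - 1:
        return "helical"
    if l == j - 1:            # k > i + 1: unpaired bases next to r_i
        return "bulge on i"
    if k == i + 1:            # l < j - 1: unpaired bases next to r_j
        return "bulge on j"
    return "interior"


structure = [(1, 10), (2, 9), (3, 8)]
for i, j in structure:
    print((i, j), classify_loop(structure, i, j))
```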

Free Energy Analysis
$E(S_{i,j}) = E(S_{i+1,j})$ when $r_i$ isn’t paired.
$E(S_{i,j}) = E(S_{i,j-1})$ when $r_j$ isn’t paired.
$E(S_{i,j}) = \min_k \bigl( E(S_{i,k}) + E(S_{k+1,j}) \bigr)$ for $i < k < j$, with $k$ between the partner of $r_i$ and the partner of $r_j$, when $r_i$ and $r_j$ are paired but not to each other.
$E(S_{i,j}) = E(L_{i,j})$, where $E(L_{i,j})$ is the loop energy, when $r_i$ and $r_j$ are paired to each other.

Free Energy Functions
$a(r_i, r_j)$: free energy of the base pair $(r_i, r_j)$.
$H(k)$: destabilizing free energy of a hairpin loop of size $k$.
$R$: stabilizing free energy of adjacent base pairs (helical region).
$B(k)$: destabilizing free energy of a bulge of size $k$.
$I(k)$: destabilizing free energy of an interior loop of size $k$.

Loop Energy Formulas
$H(j-i-1)$ for a hairpin loop
$R + E(S_{i+1,j-1})$ for a helical region
$B(k) + E(S_{i+k+1,j-1})$ for a bulge on $i$
$B(k) + E(S_{i+1,j-k-1})$ for a bulge on $j$
$I(k_1+k_2) + E(S_{i+k_1+1,j-k_2-1})$ for an interior loop
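The five formulas can be collected into one function that selects the case from the position of the enclosed pair. The sketch below is hypothetical throughout: the forms chosen for `H`, `B`, and `I` and the value of `R` are placeholders, and the pair energy $a(r_i, r_j)$ is not included.

```python
R = -5  # assumed stabilizing energy of a helical region


def H(k):
    return 4 + k  # assumed hairpin penalty


def B(k):
    return 2 + k  # assumed bulge penalty


def I(k):
    return 3 + k  # assumed interior-loop penalty


def loop_energy(i, j, inner=None, inner_energy=0):
    """Energy of the loop closed by (i, j).

    `inner` is the enclosed pair (k, l), or None for a hairpin;
    `inner_energy` is the energy of the structure closed by `inner`.
    """
    if inner is None:
        return H(j - i - 1)
    k, l = inner
    k1, k2 = k - i - 1, j - l - 1      # unpaired bases on the i side and the j side
    if k1 == 0 and k2 == 0:
        return R + inner_energy        # helical region
    if k2 == 0:
        return B(k1) + inner_energy    # bulge on i
    if k1 == 0:
        return B(k2) + inner_energy    # bulge on j
    return I(k1 + k2) + inner_energy   # interior loop


print(loop_energy(3, 8))                 # hairpin with 4 unpaired bases
print(loop_energy(1, 10, (2, 9), 7))     # helical region over an inner energy of 7
```

Writing the cases in terms of `k1` and `k2` shows that the helical region and the two bulges are the boundary cases of the interior loop in which one or both counts are zero.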

Free Energy Calculation for Interval (i,j)
Minimize over:
1. The case where $(r_i, r_j)$ is not a pair.
2. The case where $(r_i, r_j)$ is a pair. Add $a(r_i, r_j)$ to the loop energy formulas and minimize over $k$, $k_1$, and $k_2$.

What is the Apparent Complexity?
The interior loop calculations are given by $I(k_1+k_2) + E(S_{i+k_1+1,j-k_2-1})$. The number of interior loop possibilities is quadratic in the interval size. The number of intervals is quadratic in the size of the problem. The complexity appears to grow as $n^4$.
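The quartic growth can be confirmed by counting the interior-loop candidates directly. The sketch below assumes a candidate is any $(i, j, k_1, k_2)$ with $k_1, k_2 \ge 1$ whose inner pair $(i+k_1+1, j-k_2-1)$ is a non-empty interval, and it ignores the minimum-distance threshold. Under that assumption the brute-force count agrees with the binomial coefficient $\binom{n-2}{4}$, which grows as $n^4/24$.

```python
from math import comb


def interior_candidates(n):
    """Brute-force count of (i, j, k1, k2) interior-loop candidates for n bases."""
    count = 0
    for i in range(1, n + 1):
        for j in range(i + 1, n + 1):
            for k1 in range(1, j - i):
                for k2 in range(1, j - i):
                    if i + k1 + 1 < j - k2 - 1:  # inner pair is a valid interval
                        count += 1
    return count


def closed_form(n):
    """The same count as a binomial coefficient (zero for very short strands)."""
    return comb(max(n - 2, 0), 4)


for n in (6, 7, 10, 14):
    print(n, interior_candidates(n), closed_form(n))
```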

What is the Actual Complexity?
An overall reduction from $n^4$ to $n^3$ is possible, by reducing the work per interval from $n^2$ to linear. Store the minimum free energy $V_{i,j,k}$, where the interval $(i,j)$ contains an interior loop of size $k$.

Multiple Solutions
Care must be taken to define the issues. Multiple solutions can be obtained by adding flexibility to the traceback logic. The number of solutions can grow exponentially.

References
M. Zuker, “The use of dynamic programming in RNA secondary structure prediction,” in M. S. Waterman, editor, Mathematical Methods for DNA Sequences. Boca Raton, FL: CRC Press, 1989.
J. Setubal and J. Meidanis, Introduction to Computational Molecular Biology, Ch. 8.1. Pacific Grove, CA: Brooks/Cole Publishing Co., 1997.