RNA Secondary Structure Prediction

Slides:



Advertisements
Similar presentations
B. Knudsen and J. Hein Department of Genetics and Ecology
Advertisements

RNA structure prediction. RNA functions RNA functions as –mRNA –rRNA –tRNA –Nuclear export –Spliceosome –Regulatory molecules (RNAi) –Enzymes –Virus –Retrotransposons.
Pairwise Sequence Alignment Sushmita Roy BMI/CS 576 Sushmita Roy Sep 10 th, 2013 BMI/CS 576.
Hidden Markov Model in Biological Sequence Analysis – Part 2
Blast to Psi-Blast Blast makes use of Scoring Matrix derived from large number of proteins. What if you want to find homologs based upon a specific gene.
Hidden Markov Models (1)  Brief review of discrete time finite Markov Chain  Hidden Markov Model  Examples of HMM in Bioinformatics  Estimations Basic.
Chapter 7 Dynamic Programming.
6 - 1 Chapter 6 The Secondary Structure Prediction of RNA.
RNA Structure Prediction
6 -1 Chapter 6 The Secondary Structure Prediction of RNA.
Predicting RNA Structure and Function. Non coding DNA (98.5% human genome) Intergenic Repetitive elements Promoters Introns mRNA untranslated region (UTR)
Predicting RNA Structure and Function
RNA structure prediction. RNA functions RNA functions as –mRNA –rRNA –tRNA –Nuclear export –Spliceosome –Regulatory molecules (RNAi) –Enzymes –Virus –Retrotransposons.
RNA Folding Xinyu Tang Bonnie Kirkpatrick. Overview Introduction to RNA Previous Work Problem Hofacker ’ s Paper Chen and Dill ’ s Paper Modeling RNA.
Hidden Markov Models I Biology 162 Computational Genetics Todd Vision 14 Sep 2004.
Non-coding RNA William Liu CS374: Algorithms in Biology November 23, 2004.
RNA Secondary Structure Prediction
Predicting RNA Structure and Function. Nobel prize 1989Nobel prize 2009 Ribozyme Ribosome RNA has many biological functions The function of the RNA molecule.
Presenting: Asher Malka Supervisor: Prof. Hermona Soreq.
7 -1 Chapter 7 Dynamic Programming Fibonacci Sequence Fibonacci sequence: 0, 1, 1, 2, 3, 5, 8, 13, 21, … F i = i if i  1 F i = F i-1 + F i-2 if.
Predicting RNA Structure and Function. Following the human genome sequencing there is a high interest in RNA “Just when scientists thought they had deciphered.
. Class 5: RNA Structure Prediction. RNA types u Messenger RNA (mRNA) l Encodes protein sequences u Transfer RNA (tRNA) l Adaptor between mRNA molecules.
Finding Common RNA Pseudoknot Structures in Polynomial Time Patricia Evans University of New Brunswick.
CISC667, F05, Lec19, Liao1 CISC 467/667 Intro to Bioinformatics (Fall 2005) RNA secondary structure.
Protein Multiple Sequence Alignment Sarah Aerni CS374 December 7, 2006.
Predicting RNA Structure and Function
Predicting RNA Structure and Function. Nobel prize 1989 Nobel prize 2009 Ribozyme Ribosome.
Dynamic Programming (cont’d) CS 466 Saurabh Sinha.
RNA-Seq and RNA Structure Prediction
RNA: Secondary Structure Prediction and Analysis
Multiple Sequence Alignment CSC391/691 Bioinformatics Spring 2004 Fetrow/Burg/Miller (Slides by J. Burg)
Pairwise & Multiple sequence alignments
RNA informatics Unit 12 BIOL221T: Advanced Bioinformatics for Biotechnology Irene Gabashvili, PhD.
Non-coding RNA gene finding problems. Outline Introduction RNA secondary structure prediction RNA sequence-structure alignment.
Genomics and Personalized Care in Health Systems Lecture 9 RNA and Protein Structure Leming Zhou, PhD School of Health and Rehabilitation Sciences Department.
Sequence Analysis CSC 487/687 Introduction to computing for Bioinformatics.
From Structure to Function. Given a protein structure can we predict the function of a protein when we do not have a known homolog in the database ?
RNA folding & ncRNA discovery I519 Introduction to Bioinformatics, Fall, 2012.
Applied Bioinformatics Week 11. Topics Protein Secondary Structure RNA Secondary Structure.
CSCI 6900/4900 Special Topics in Computer Science Automata and Formal Grammars for Bioinformatics Bioinformatics problems sequence comparison pattern/structure.
RNA Secondary Structure Prediction. 16s rRNA RNA Secondary Structure Hairpin loop Junction (Multiloop)Bulge Single- Stranded Interior Loop Stem Image–
© Wiley Publishing All Rights Reserved. RNA Analysis.
Lecture 9 CS5661 RNA – The “REAL nucleic acid” Motivation Concepts Structural prediction –Dot-matrix –Dynamic programming Simple cost model Energy cost.
Function preserves sequences Christophe Roos - MediCel ltd Similarity is a tool in understanding the information in a sequence.
RNA Structure Prediction
HMMs for alignments & Sequence pattern discovery I519 Introduction to Bioinformatics.
BLAST: Basic Local Alignment Search Tool Altschul et al. J. Mol Bio CS 466 Saurabh Sinha.
Exploiting Conserved Structure for Faster Annotation of Non-Coding RNAs without loss of Accuracy Zasha Weinberg, and Walter L. Ruzzo Presented by: Jeff.
Prediction of Secondary Structure of RNA
Doug Raiford Lesson 7.  RNA World Hypothesis  RNA world evolved into the DNA and protein world  DNA advantage: greater chemical stability  Protein.
Motif Search and RNA Structure Prediction Lesson 9.
V diagonal lines give equivalent residues ILS TRIVHVNSILPSTN V I L S T R I V I L P E F S T Sequence A Sequence B Dot Plots, Path Matrices, Score Matrices.
V diagonal lines give equivalent residues ILS TRIVHVNSILPSTN V I L S T R I V I L P E F S T Sequence A Sequence B Dot Plots, Path Matrices, Score Matrices.
Rapid ab initio RNA Folding Including Pseudoknots via Graph Tree Decomposition Jizhen Zhao, Liming Cai Russell Malmberg Computer Science Plant Biology.
4.2 - Algorithms Sébastien Lemieux Elitra Canada Ltd.
RNAs. RNA Basics transfer RNA (tRNA) transfer RNA (tRNA) messenger RNA (mRNA) messenger RNA (mRNA) ribosomal RNA (rRNA) ribosomal RNA (rRNA) small interfering.
AAA AAAU AAUUC AUUC UUCCG UCCG CCGG G G Karen M. Pickard CISC889 Spring 2002 RNA Secondary Structure Prediction.
bacteria and eukaryotes
Stochastic Context-Free Grammars for Modeling RNA
Predicting RNA Structure and Function
RNA Secondary Structure Prediction
RNA Secondary Structure Prediction
Stochastic Context-Free Grammars for Modeling RNA
Dynamic Programming (cont’d)
Predicting the Secondary Structure of RNA
Comparative RNA Structural Analysis
RNA Secondary Structure Prediction
RNA 2D and 3D Structure Craig L. Zirbel October 7, 2010.
CISC 467/667 Intro to Bioinformatics (Spring 2007) RNA secondary structure CISC667, S07, Lec19, Liao.
Computational Genomics of Noncoding RNA Genes
Presentation transcript:

RNA Secondary Structure Prediction Dynamic Programming Approaches Sarah Aerni http://www.tbi.univie.ac.at/

Outline RNA folding Dynamic programming for RNA secondary structure prediction Covariance model for RNA structure prediction

RNA Basics RNA bases A,C,G,U Canonical Base Pairs A-U G-C G-U 2 Hydrogen Bonds 3 Hydrogen Bonds – more stable RNA bases A,C,G,U Canonical Base Pairs A-U G-C G-U “wobble” pairing Bases can only pair with one other base. Image: http://www.bioalgorithms.info/

RNA Basics transfer RNA (tRNA) messenger RNA (mRNA) ribosomal RNA (rRNA) small interfering RNA (siRNA) micro RNA (miRNA) small nucleolar RNA (snoRNA) http://www.genetics.wustl.edu/eddy/tRNAscan-SE/

RNA Secondary Structure Pseudoknot Stem Interior Loop Single-Stranded Bulge Loop Junction (Multiloop) Hairpin loop Image– Wuchty

Sequence Alignment as a method to determine structure Bases pair in order to form backbones and determine the secondary structure Aligning bases based on their ability to pair with each other gives an algorithmic approach to determining the optimal structure

Base Pair Maximization – Dynamic Programming Algorithm S(i,j) is the folding of the subsequence of the RNA strand from index i to index j which results in the highest number of base pairs Simple Example: Maximizing Base Pairing Bifurcation Unmatched at i Umatched at j Base pair at i and j Images – Sean Eddy

S(i, j – 1) S(i + 1, j) S(i + 1, j – 1) +1 Base Pair Maximization – Dynamic Programming Algorithm Alignment Method Align RNA strand to itself Score increases for feasible base pairs Each score independent of overall structure Bifurcation adds extra dimension S(i, j – 1) S(i + 1, j) Initialize first two diagonal arrays to 0 Fill in squares sweeping diagonally Bases cannot pair, similar to unmatched alignment Bases can pair, similar to matched alignment Dynamic Programming – possible paths S(i + 1, j – 1) +1 Images – Sean Eddy

k = 0 : Bifurcation max in this case Base Pair Maximization – Dynamic Programming Algorithm Alignment Method Align RNA strand to itself Score increases for feasible base pairs Each score independent of overall structure Bifurcation adds extra dimension k = 0 : Bifurcation max in this case S(i,k) + S(k + 1, j) Reminder: For all k S(i,k) + S(k + 1, j) Reminder: For all k S(i,k) + S(k + 1, j) Initialize first two diagonal arrays to 0 Fill in squares sweeping diagonally Bases cannot pair, similar Bases can pair, similar to matched alignment Dynamic Programming – possible paths Bifurcation – add values for all k Images – Sean Eddy

Base Pair Maximization - Drawbacks Base pair maximization will not necessarily lead to the most stable structure May create structure with many interior loops or hairpins which are energetically unfavorable Comparable to aligning sequences with scattered matches – not biologically reasonable

Energy Minimization Thermodynamic Stability Estimated using experimental techniques Theory : Most Stable is the Most likely No Pseudknots due to algorithm limitations Uses Dynamic Programming alignment technique Attempts to maximize the score taking into account thermodynamics MFOLD and ViennaRNA

Energy Minimization Results Images – David Mount Linear RNA strand folded back on itself to create secondary structure Circularized representation uses this requirement Arcs represent base pairing All loops must have at least 3 bases in them Equivalent to having 3 base pairs between all arcs Exception: Location where the beginning and end of RNA come together in circularized representation

Trouble with Pseudoknots Images – David Mount Pseudoknots cause a breakdown in the Dynamic Programming Algorithm. In order to form a pseudoknot, checks must be made to ensure base is not already paired – this breaks down the recurrence relations

Energy Minimization Drawbacks Compute only one optimal structure Usual drawbacks of purely mathematical approaches Similar difficulties in other algorithms Protein structure Exon finding

Alternative Algorithms - Covariaton Incorporates Similarity-based method Evolution maintains sequences that are important Change in sequence coincides to maintain structure through base pairs (Covariance) Cross-species structure conservation example – tRNA Manual and automated approaches have been used to identify covarying base pairs Models for structure based on results Ordered Tree Model Stochastic Context Free Grammar Covariation ensures ability to base pair is maintained and RNA structure is conserved Mutation in one base yields pairing impossible and breaks down structure Expect areas of base pairing in tRNA to be covarying between various species Base pairing creates same stable tRNA structure in organisms

Binary Tree Representation of RNA Secondary Structure Representation of RNA structure using Binary tree Nodes represent Base pair if two bases are shown Loop if base and “gap” (dash) are shown Pseudoknots still not represented Tree does not permit varying sequences Mismatches Insertions & Deletions Images – Eddy et al.

Covariance Model HMM which permits flexible alignment to an RNA structure – emission and transition probabilities Model trees based on finite number of states Match states – sequence conforms to the model: MATP – State in which bases are paired in the model and sequence MATL & MATR – State in which either right or left bulges in the sequence and the model Deletion – State in which there is deletion in the sequence when compared to the model Insertion – State in which there is an insertion relative to model Transitions have probabilities Varying probability – Enter insertion, remain in current state, etc Bifurcation – no probability, describes path

Covariance Model (CM) Training Algorithm S(i,j) = Score at indices i and j in RNA when aligned to the Covariance Model Frequency of seeing the symbols (A, C, G, T) together in locations i and j depending on symbol. Independent frequency of seeing the symbols (A, C, G, T) in locations i or j depending on symbol. Frequencies obtained by aligning model to “training data” – consists of sample sequences Reflect values which optimize alignment of sequences to model

Alignment to CM Algorithm Calculate the probability score of aligning RNA to CM Three dimensional matrix – O(n³) Align sequence to given subtrees in CM For each subsequence calculate all possible states Subtrees evolve from Bifurcations For simplicity Left singlet is default Images – Eddy et al.

Alignment to CM Algorithm Images – Eddy et al. For each calculation take into account the Transition (T) to next state Emission probability (P) in the state as determined by training data Deletion – does not have an emission probability (P) associated with it Bifurcation – does not have a probability associated with the state

Covariance Model Drawbacks Needs to be well trained Not suitable for searches of large RNA Structural complexity of large RNA cannot be modeled Runtime Memory requirements

References How Do RNA Folding Algorithms Work?. S.R. Eddy. Nature Biotechnology, 22:1457-1458, 2004.