Dynamic Programming (cont’d)


Dynamic Programming (cont’d) CS 466 Saurabh Sinha

Spliced Alignment
  Begins by selecting either all putative exons between potential acceptor and donor sites, or all substrings similar to the target protein (as in the Exon Chaining Problem).
  This set is then filtered in such a way that it retains all true exons, along with some false ones.
  Then find the chain of exons whose sequence similarity to the target protein sequence is maximized.

Spliced Alignment Problem: Formulation
  Input: genomic sequence G, target sequence T, and a set of candidate exons (blocks) B.
  Output: a chain of exons Γ such that the global alignment score between Γ* and T is maximized, where Γ* is the concatenation of all exons from chain Γ.

The DAG
  Vertices: one vertex for each block in B.
  Edges: a directed edge connects non-overlapping blocks (from the earlier block to the later one).
  Label of a vertex = string of the block it represents.
  A path through the DAG spells out the string obtained by concatenating that particular chain of blocks.
  Weight of a path = score of the optimal alignment between the string it spells out and the target sequence.

Dynamic programming
  Genomic sequence G = g_1 g_2 … g_n
  Target sequence T = t_1 t_2 … t_m
  As usual, we want to find the optimal alignment score of the i-prefix of G and the j-prefix of T.
  The problem is that many i-prefixes are possible, since multiple blocks may include position i.

Idea
  Find the optimal alignment score of the i-prefix of G and the j-prefix of T, assuming that this alignment uses a particular block B at position i.
  Call this score S(i, j, B), and compute it for every block B that includes position i.

Recurrence
  If i is not the starting vertex of block B:
  \[
  S(i, j, B) = \max \begin{cases}
    S(i-1, j, B) - \text{indel penalty} \\
    S(i, j-1, B) - \text{indel penalty} \\
    S(i-1, j-1, B) + \delta(g_i, t_j)
  \end{cases}
  \]
  If i is the starting vertex of block B:
  \[
  S(i, j, B) = \max \begin{cases}
    S(i, j-1, B) - \text{indel penalty} \\
    \max_{B' \text{ preceding } B} S(\mathrm{end}(B'), j, B') - \text{indel penalty} \\
    \max_{B' \text{ preceding } B} S(\mathrm{end}(B'), j-1, B') + \delta(g_i, t_j)
  \end{cases}
  \]
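
The recurrence can be sketched in Python as follows. This is a minimal illustration, not the lecture's implementation. Several details are assumptions that the slides do not fix: the function name `spliced_alignment`, blocks given as 0-based inclusive `(start, end)` positions in G, a +1/-1 match/mismatch score for δ with indel penalty 1, "B' precedes B" read as end(B') < start(B), and a base case in which any block may open the chain (the empty chain aligned to the j-prefix of T scores -j times the indel penalty).

```python
def spliced_alignment(G, T, blocks, match=1, mismatch=-1, indel=1):
    """Best global alignment score between a chain of blocks of G and T.

    blocks: list of (start, end), 0-based inclusive positions in G (assumed format).
    """
    m = len(T)

    def delta(a, b):
        return match if a == b else mismatch

    # last_row[B][j] = S(end(B), j, B) for j = 0..m
    last_row = {}
    # Process blocks by increasing end, so every preceding block is done first.
    for start, end in sorted(blocks, key=lambda b: b[1]):
        # pred[j]: best score of any chain ending before `start`, aligned to T[:j].
        # The empty chain (assumed base case) scores -j * indel.
        pred = [-j * indel for j in range(m + 1)]
        for (_, e2), row2 in last_row.items():
            if e2 < start:
                pred = [max(p, r) for p, r in zip(pred, row2)]
        # At i == start the "previous row" is pred (second case of the
        # recurrence); afterwards it is the row for i - 1 (first case).
        prev = pred
        for i in range(start, end + 1):
            row = [prev[0] - indel]
            for j in range(1, m + 1):
                row.append(max(row[j - 1] - indel,
                               prev[j] - indel,
                               prev[j - 1] + delta(G[i], T[j - 1])))
            prev = row
        last_row[(start, end)] = prev
    return max(row[m] for row in last_row.values())


if __name__ == "__main__":
    G = "CCACGAATTGCC"
    # Candidate blocks: "ACG" at 2..4, "AA" at 5..6, "TTG" at 7..9.
    print(spliced_alignment(G, "ACGTTG", [(2, 4), (5, 6), (7, 9)]))
```

Only the last row of each block is stored, since the recurrence for a later block reads S(end(B'), j, B') and nothing else from B'.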

RNA secondary structure prediction

RNA
  RNA is chemically similar to DNA.
  It is usually only a single strand.
  T(hymine) is replaced by U(racil).
  Some forms of RNA can form secondary structures by “pairing up” with themselves. This can change their properties dramatically.
  tRNA linear and 3D view: http://www.cgl.ucsf.edu/home/glasfeld/tutorial/trna/trna.gif

RNA
  There’s more to RNA than mRNA.
  RNA can adopt interesting non-linear structures, and catalyze reactions.
  tRNAs (transfer RNAs) are the “adapters” that implement translation.

Secondary structure
  Several interesting RNAs have a conserved secondary structure (resulting from base-pairing interactions).
  Sometimes the sequence itself need not be conserved for the function to be retained.
  Predicting the secondary structure is therefore important for homology detection.

Conserved secondary structure
  [Figure: consensus binding site for R17 phage coat protein]
  N = A/C/G/U; N’ is a base complementary to N; Y is C/U; R is A/G.
  Source: DEKM

Basics of secondary structure
  G-C pairing: three bonds (strong)
  A-U pairing: two bonds (weaker)
  Base pairs are approximately coplanar
  Base pairs are stacked onto other base pairs (arranged side by side): “stems”

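Secondary structure elements
  [Figure: an RNA secondary structure with its elements labeled]
  Hairpin loop: loop at the end of a stem
  Interior loop: single-stranded bases within a stem, on both sides of the stem
  Bulge: single-stranded bases within a stem, only on one side of the stem
  Loop (in general): single-stranded subsequence bounded by base pairs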

Non-canonical base pairs
  G-C and A-U are the canonical base pairs.
  G-U is also possible, and almost as stable.

Nesting
  Base pairs almost always occur in a nested fashion.
  If positions i and j are paired (i < j), and positions i’ and j’ are paired (i’ < j’), then these two base pairings are said to be nested if i < i’ < j’ < j or i’ < i < j < j’.
  Non-nested (crossing) base pairing, for example i < i’ < j < j’: pseudoknot.

Pseudoknot
  [Figure: base pairs (9, 18) and (2, 11) drawn on a sequence]
  (9, 18) and (2, 11) are NOT NESTED: 2 < 9 < 11 < 18.
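
A small check for crossing base pairs makes the pseudoknot condition concrete. This is an illustrative sketch; the function names `crossing_pairs` and `is_pseudoknot_free` are hypothetical, and a structure is assumed to be given as a list of (i, j) position pairs.

```python
def crossing_pairs(pairs):
    """Return every two base pairs that interleave: i < i' < j < j'."""
    # Normalise each pair so that its first position is the smaller one,
    # then sort so that the pair with the smaller start comes first.
    ordered = sorted((min(i, j), max(i, j)) for i, j in pairs)
    crossings = []
    for a in range(len(ordered)):
        i, j = ordered[a]
        for b in range(a + 1, len(ordered)):
            i2, j2 = ordered[b]
            if i < i2 < j < j2:
                crossings.append(((i, j), (i2, j2)))
    return crossings


def is_pseudoknot_free(pairs):
    """True if no two base pairs cross."""
    return not crossing_pairs(pairs)


if __name__ == "__main__":
    print(crossing_pairs([(9, 18), (2, 11)]))
    print(is_pseudoknot_free([(1, 10), (2, 9), (12, 15)]))
```

Pairs that are nested, or that lie side by side without overlapping, are both reported as pseudoknot-free; only interleaved pairs count as crossings.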

Pseudoknot problems
  Pseudoknots are not handled by the algorithms we shall see.
  Pseudoknots do occur in many important RNAs.
  But the total number of pseudoknotted base pairs is typically relatively small.

Secondary structure prediction
  Approach 1: find the secondary structure with the most base pairs (Nussinov’s algorithm).
  Recursive: finds the best structure for small subsequences, and works its way outwards to larger subsequences.

Nussinov’s algorithm: idea
  There are only four possible ways of getting the best structure for subsequence (i,j) from the best structures of the smaller subsequences:
  (1) Add unpaired position i onto the best structure for subsequence (i+1,j)
  (2) Add unpaired position j onto the best structure for subsequence (i,j-1)
  (3) Add the (i,j) pair onto the best structure for subsequence (i+1,j-1)
  (4) Combine two optimal substructures (i,k) and (k+1,j)

Nussinov RNA folding algorithm
  Given a sequence s of length L with symbols s_1 … s_L.
  Let δ(i,j) = 1 if s_i and s_j are a complementary base pair, and 0 otherwise.
  We recursively calculate scores g(i,j), the maximal number of base pairs that can be formed for subsequence s_i … s_j.
  This is done by dynamic programming.

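Recursion
  Starting with all subsequences of length 2, up to length L:
  \[
  g(i,j) = \max \begin{cases}
    g(i+1, j) \\
    g(i, j-1) \\
    g(i+1, j-1) + \delta(i,j) \\
    \max_{i < k < j} \left[ g(i,k) + g(k+1,j) \right]
  \end{cases}
  \]
  Here δ(i,j) is 1 if s_i and s_j are a complementary base pair, and 0 otherwise.
  Initialization: g(i,i-1) = 0, g(i,i) = 0.
  Running time: O(n^2)? No. O(n^3): the table has O(n^2) entries, and each takes O(n) time because of the max over k.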
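
The fill step can be sketched in Python as below. This is a minimal sketch under stated assumptions: indices are 0-based rather than the 1-based indices of the slide, only G-C and A-U count as complementary, no minimum hairpin loop length is enforced (the slide does not impose one), and the names `delta` and `nussinov_table` are hypothetical.

```python
COMPLEMENTARY = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def delta(a, b):
    """1 if bases a and b are a complementary pair, 0 otherwise."""
    return 1 if (a, b) in COMPLEMENTARY else 0


def nussinov_table(s):
    """Return g, where g[i][j] is the maximal number of base pairs in s[i..j]."""
    n = len(s)
    # Zero-filled table: covers the initialization g(i,i) = 0 and g(i,i-1) = 0.
    g = [[0] * n for _ in range(n)]
    # Fill by increasing subsequence length (span = j - i).
    for span in range(1, n):
        for i in range(n - span):
            j = i + span
            best = max(g[i + 1][j],                              # i unpaired
                       g[i][j - 1],                              # j unpaired
                       g[i + 1][j - 1] + delta(s[i], s[j]))      # i pairs with j
            for k in range(i + 1, j):                            # bifurcation
                best = max(best, g[i][k] + g[k + 1][j])
            g[i][j] = best
    return g


if __name__ == "__main__":
    s = "GGGAAAUCC"
    print(nussinov_table(s)[0][len(s) - 1])
```

The three nested loops (span, i, k) are where the cubic running time comes from.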

Traceback
  As usual in sequence alignment? Not quite.
  An optimal sequence alignment is a linear path in the dynamic programming table.
  An optimal secondary structure can have “bifurcations”.
  Traceback therefore uses a pushdown stack.

Traceback
  Push (1,L) onto stack
  Repeat until stack is empty:
    pop (i,j)
    if i >= j: continue
    else if g(i+1,j) = g(i,j): push (i+1,j)
    else if g(i,j-1) = g(i,j): push (i,j-1)
    else if g(i+1,j-1) + δ(i,j) = g(i,j): record (i,j) base pair; push (i+1,j-1)
    else for k = i+1 to j-1:
      if g(i,k) + g(k+1,j) = g(i,j): push (k+1,j); push (i,k); break (for loop)
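
The stack-based traceback can be sketched in Python as follows. To keep the example self-contained it includes a compact version of the fill step. Assumptions: 0-based indices, only G-C and A-U pairs, no minimum loop length, and hypothetical function names `delta`, `fill` and `traceback`.

```python
def delta(a, b):
    """1 if bases a and b are a complementary pair, 0 otherwise."""
    return 1 if a + b in {"AU", "UA", "GC", "CG"} else 0


def fill(s):
    """Nussinov table: g[i][j] = maximal number of base pairs in s[i..j]."""
    n = len(s)
    g = [[0] * n for _ in range(n)]
    for span in range(1, n):
        for i in range(n - span):
            j = i + span
            g[i][j] = max(
                [g[i + 1][j], g[i][j - 1], g[i + 1][j - 1] + delta(s[i], s[j])]
                + [g[i][k] + g[k + 1][j] for k in range(i + 1, j)]
            )
    return g


def traceback(s, g):
    """Recover one optimal set of base pairs from the filled table."""
    pairs = []
    stack = [(0, len(s) - 1)]
    while stack:
        i, j = stack.pop()
        if i >= j:
            continue
        if g[i + 1][j] == g[i][j]:            # i left unpaired
            stack.append((i + 1, j))
        elif g[i][j - 1] == g[i][j]:          # j left unpaired
            stack.append((i, j - 1))
        elif g[i + 1][j - 1] + delta(s[i], s[j]) == g[i][j]:
            pairs.append((i, j))              # i pairs with j
            stack.append((i + 1, j - 1))
        else:
            for k in range(i + 1, j):         # bifurcation: two subproblems
                if g[i][k] + g[k + 1][j] == g[i][j]:
                    stack.append((k + 1, j))
                    stack.append((i, k))
                    break
    return sorted(pairs)


if __name__ == "__main__":
    s = "GGGAAAUCC"
    print(traceback(s, fill(s)))
```

The stack is needed only for the bifurcation case, where two subintervals remain to be explored; the other three cases each continue with a single interval. Several optimal structures may exist, and the order of the tests determines which one is returned.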

Secondary structure prediction
  Approach 2: based on minimization of ∆G, the equilibrium free energy, rather than maximization of the number of base pairs.
  Better fit to real (experimental) ∆G.
  Energy of a stem is the sum of “stacking” contributions from the interfaces between neighboring base pairs.

Neighboring base pairs: stack
  Single bulges OK in stacking
  Longer bulges: no stacking term
  Loop destabilisation energy
  Loop terminal mismatch energy
  [Figure: example hairpin structure annotated with energy contributions: 4nt loop +5.9; terminal mismatch of hairpin -1.1; stack -2.9; 1nt bulge +3.3; stack (special case) -2.9; stacks -1.8, -0.9, -1.8, -2.1; dangle -0.3]
  Source: DEKM
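
Under this model the free energy of a structure is the sum of its individual contributions. The sketch below adds up the values shown in the figure. Pairing each label with its number is a reading of the flattened figure text and so is an assumption, and the slide does not state the units.

```python
# Energy contributions as read from the figure (label/value pairing assumed).
contributions = [
    ("4nt hairpin loop", +5.9),
    ("terminal mismatch of hairpin", -1.1),
    ("stack", -2.9),
    ("1nt bulge", +3.3),
    ("stack across bulge (special case)", -2.9),
    ("stack", -1.8),
    ("stack", -0.9),
    ("stack", -1.8),
    ("stack", -2.1),
    ("dangle", -0.3),
]

# Loops and bulges destabilise (positive terms); stacks stabilise (negative terms).
total = sum(value for _, value in contributions)

if __name__ == "__main__":
    print(f"total free energy: {total:.1f}")
```

The two positive terms (+9.2 in total) are outweighed by the stacking, mismatch and dangle terms (-13.8 in total), which gives a net -4.6.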

hairpin loop: exactly one base pair
internal loop: exactly two base pairs
bulge: internal loop with one base from each base pair being adjacent
multibranched loop: more than 2 base pairs in a loop. One base pair is closest to the ends of the RNA: this is the “exterior” or “closing” base pair. All other base pairs are “interior”.
Source: Martin Tompa’s lecture notes

Energy contributions
  eS(i,j): free energy of stacked pair (i,j) and (i+1,j-1)
  eH(i,j): free energy of a hairpin loop closed by (i,j); depends on the length of the loop, the bases at i,j, and the bases adjacent to them
  eL(i,j,i’,j’): free energy of an internal loop or bulge, with (i,j) and (i’,j’) being the bordering base pairs; depends on the bases at these positions, and the unpaired bases adjacent to them
  eM(i,j,i_1,j_1,…,i_k,j_k): free energy of a multibranch loop with (i,j) as the closing base pair and (i_1,j_1) etc. as the internal base pairs

Zuker’s algorithm: Dynamic programming
  (FE = free energy)
  W(j): FE of optimal structure of s[1..j]
  V(i,j): FE of optimal structure of s[i..j] assuming i,j form a base pair
  VBI(i,j): FE of optimal structure of s[i..j] assuming i,j closes a bulge or internal loop
  VM(i,j): FE of optimal structure of s[i..j] assuming i,j closes a multibranch loop
  WM(i,j): used to compute VM

Dynamic programming recurrences
  W(j): FE of optimal structure of s[1..j]
  W(0) = 0
  \[
  W(j) = \min \Big( W(j-1),\ \min_{1 \le i < j} \big[ V(i,j) + W(i-1) \big] \Big)
  \]
  First term: s[j] is an external base (a base not in any loop).
  Second term: s[j] pairs with s[i], for some i < j.
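
Given the V values, the W recurrence is a short loop. The sketch below assumes V is supplied as a dictionary keyed by 1-based (i, j), with missing entries treated as infinity (positions that cannot pair). The function name `exterior_energies` and the toy V values are made up for illustration.

```python
import math


def exterior_energies(n, V):
    """Return W[0..n], where W[j] is the optimal free energy of s[1..j].

    V: dict mapping 1-based (i, j) to the free energy of s[i..j] given that
       i and j pair; absent keys are treated as +infinity (assumed format).
    """
    W = [0.0] * (n + 1)                 # W(0) = 0
    for j in range(1, n + 1):
        best = W[j - 1]                 # s[j] is an external base
        for i in range(1, j):           # s[j] pairs with s[i], 1 <= i < j
            best = min(best, V.get((i, j), math.inf) + W[i - 1])
        W[j] = best
    return W


if __name__ == "__main__":
    # Toy values: a helix closed by (1,4), and two alternatives ending at 9.
    V = {(1, 4): -2.0, (5, 9): -3.0, (2, 9): -4.0}
    print(exterior_energies(9, V))
```

In the toy example the best structure for s[1..9] combines the (1,4) and (5,9) helices (-2.0 + -3.0), which beats the single (2,9) helix (-4.0).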

Dynamic programming recurrences
  V(i,j): FE of optimal structure of s[i..j] assuming i,j form a base pair
  V(i,j) = infinity if i >= j
  \[
  V(i,j) = \min \big( \mathrm{eH}(i,j),\ \mathrm{eS}(i,j) + V(i+1,j-1),\ \mathrm{VBI}(i,j),\ \mathrm{VM}(i,j) \big) \quad \text{if } i < j
  \]
  eH(i,j): i,j is the exterior base pair of a hairpin loop.
  eS(i,j) + V(i+1,j-1): i,j is the exterior pair of a stacked pair; i+1,j-1 is therefore a pair too.
  VBI(i,j): i,j is the exterior pair of a bulge or interior loop.
  VM(i,j): i,j is the exterior pair of a multibranch loop.

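Dynamic programming recurrences
  VBI(i,j): FE of optimal structure of s[i..j] assuming i,j closes a bulge or internal loop
  \[
  \mathrm{VBI}(i,j) = \min_{i',j' \,:\, i < i' < j' < j} \big[ \mathrm{eL}(i,j,i',j') + V(i',j') \big]
  \]
  eL(i,j,i’,j’) is the energy of the bulge or internal loop itself.
  Slow! The minimum ranges over O(n^2) pairs (i’,j’) for each (i,j).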
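
A direct transcription of the VBI recurrence shows where the cost comes from: two nested loops over (i', j') for every (i, j). This is a sketch under assumptions: V is a dictionary keyed by 1-based (i, j) with missing entries treated as infinity, `eL` is passed in as a function, the stacked case i' = i+1, j' = j-1 is skipped on the assumption that the eS term covers it, and the toy energy function in the demo is invented purely for illustration.

```python
import math


def vbi(i, j, V, eL):
    """min over i < i' < j' < j of eL(i, j, i', j') + V(i', j')."""
    best = math.inf
    for i2 in range(i + 1, j - 1):          # i' = i+1 .. j-2
        for j2 in range(i2 + 1, j):         # j' = i'+1 .. j-1
            if i2 == i + 1 and j2 == j - 1:
                continue                    # stacked pair: assumed handled by eS
            best = min(best, eL(i, j, i2, j2) + V.get((i2, j2), math.inf))
    return best


def toy_eL(i, j, i2, j2):
    """Made-up loop energy: +1 per unpaired base in the loop (illustration only)."""
    return float((i2 - i - 1) + (j - j2 - 1))


if __name__ == "__main__":
    V = {(3, 8): -5.0, (2, 8): -4.5}
    print(vbi(1, 10, V, toy_eL))
```

Called once for each of the O(n^2) pairs (i, j), this gives O(n^4) work overall, which is why the slide flags the recurrence as slow.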

Dynamic programming recurrences
  VM(i,j): FE of optimal structure of s[i..j] assuming i,j closes a multibranch loop
  \[
  \mathrm{VM}(i,j) = \min_{k \ge 2;\ i_1,j_1,\ldots,i_k,j_k} \Big[ \mathrm{eM}(i,j,i_1,j_1,\ldots,i_k,j_k) + \sum_{h} V(i_h,j_h) \Big]
  \]
  eM(i,j,i_1,j_1,…,i_k,j_k) is the energy of the loop itself.
  Very slow!

Order of computation
  What order to fill the DP table in?
  Increasing order of (j-i).
  VBI(i,j) and VM(i,j) before V(i,j).
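
The fill order can be sketched as a generator that yields cells by increasing j-i. The name `fill_order` is hypothetical, and indices are 1-based to match the slides.

```python
def fill_order(n):
    """Yield 1-based cells (i, j) with i < j in increasing order of j - i."""
    for span in range(1, n):
        for i in range(1, n - span + 1):
            yield (i, i + span)


if __name__ == "__main__":
    for i, j in fill_order(4):
        # For each cell: compute VBI(i,j) and VM(i,j) first, then V(i,j),
        # since V(i,j) is a minimum that includes both of them.
        print((i, j))
```

Every cell that V(i,j), VBI(i,j) or VM(i,j) refers to lies strictly inside the interval (i, j), so it has a smaller j-i and has already been filled when (i, j) is reached.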