RNA structure analysis Jurgen Mourik & Richard Vogelaars Utrecht University.


1 RNA structure analysis Jurgen Mourik & Richard Vogelaars Utrecht University

2 Overview
Introduction to RNA
RNA secondary structure prediction
– Nussinov folding algorithm
– Zuker folding algorithm
Demonstration
Questions

3 Introduction to RNA (1)
Ribonucleic acid
To many people:
– “RNA is the passive intermediary messenger between DNA genes and the protein translation machinery”
But:
– Many non-coding RNAs exist
– They adopt sophisticated 3D structures
– They catalyse biochemical reactions

4 Introduction to RNA (2)
Three major types of RNA:
Messenger RNA (mRNA)
– Serves as a temporary copy of genes that is used as a template for protein synthesis.
Transfer RNA (tRNA)
– Functions as adaptor molecules that decode the genetic code.
Ribosomal RNA (rRNA)
– Catalyses the synthesis of proteins.

5 RNA world hypothesis
RNA is the only biological polymer that serves both as a catalyst (like proteins) and as information storage (like DNA).
For this reason some people think that an RNA-like molecule was the basis of life early in evolution.

6 Terminology of RNA (1)
Four nucleotides:
– Adenine
– Cytosine
– Guanine
– Uracil
Canonical base pairs:
– G-C
– A-U
Non-canonical base pairs:
– G-U

7 Terminology of RNA (2)
Base pairs are approximately coplanar and almost always stacked onto other base pairs in an RNA structure
– Contiguous stacked base pairs are called stems
– In 3D, RNA stems generally form a regular double helix

8 RNA secondary structure
Unlike DNA, RNA is typically produced as a single-stranded molecule, which then folds intramolecularly to form a number of short base-paired stems.
This base-paired structure is called the secondary structure of the RNA.

9 Elements of an RNA secondary structure (1)
Loop: single-stranded subsequence bounded by base pairs
Hairpin loop: a loop at the end of a stem
Bulge (loop): single-stranded bases occurring within a stem
Interior loop: single-stranded bases interrupting both sides of a stem
Multi-branched loop: a loop from which three or more stems radiate

10 Elements of an RNA secondary structure (2)
[Figure: drawing of an example RNA secondary structure with the 5’ and 3’ ends marked and stems built from G●C, U●A, A●U and C●G base pairs.]

11 Pseudoknots (1)
Base pairs almost always occur in a nested fashion in RNA secondary structure.
A base pair between positions i and j and a base pair between positions i’ and j’ (with i < j and i’ < j’) are nested if and only if i < i’ < j’ < j or i’ < i < j < j’.
Non-nested (crossing) base pairs, for example i < i’ < j < j’, are called pseudoknots.
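The nesting condition can be checked mechanically. The following is a minimal Python sketch, not part of the original slides; the function name `crossing_pairs` and the representation of a structure as a list of 1-based `(i, j)` tuples are assumptions made for illustration.

```python
def crossing_pairs(pairs):
    """Return every pair of base pairs that cross (i < k < j < l).

    `pairs` is a list of (i, j) positions; the name and the input
    format are illustrative assumptions.
    """
    # Normalise so that i < j, then sort by the opening position.
    ordered = sorted((min(p), max(p)) for p in pairs)
    crossings = []
    for a in range(len(ordered)):
        i, j = ordered[a]
        for b in range(a + 1, len(ordered)):
            k, l = ordered[b]
            # After sorting, k >= i, so only this pattern is a crossing.
            if i < k < j < l:
                crossings.append(((i, j), (k, l)))
    return crossings


# A fully nested hairpin has no crossings; (1,5) and (3,8) cross.
print(crossing_pairs([(2, 9), (3, 8), (6, 7)]))
print(crossing_pairs([(1, 5), (3, 8)]))
```

An empty result means the structure is pseudoknot-free and therefore within reach of the nested dynamic programming algorithms discussed in this talk.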

12 Pseudoknots (2)
None of the dynamic programming algorithms discussed here, including the Zuker and Nussinov RNA folding algorithms, can deal with pseudoknots.
Pseudoknots occur in many important RNAs:
– The algorithms therefore ignore biologically important information.
For database searching for RNA homologues, it is acceptable to sacrifice the information in pseudoknots.

13 RNA sequence evolution (1)
The sequence evolution of RNA is constrained by the structure.
It is possible to have two different RNA sequences with the same secondary structure.
Drastic changes in sequence can often be tolerated as long as compensatory mutations maintain base-pairing complementarity.

14 RNA sequence evolution (2)
Suppose we want to search a nucleotide sequence for occurrences of the consensus binding site of the R17 coat protein:
– It is useless to use standard sequence alignment
R17 coat protein binds and represses translation of its replicase:
– It blinds most of the primary sequence positions
Legend of the consensus figure: {A, C, G, U} (any base), {A, G} (purine), {C, U} (pyrimidine), complement of base N.

15 RNA sequence evolution (3)
How to solve this problem?
– RNA pattern-matching program (RNAMOT).
Searches for deterministic (non-stochastic) motifs, but with secondary structure constraints as extra terms.
Works fine for small, well-defined patterns but is somewhat insensitive and problematic for finding matches to less well conserved structures.

16 Inferring structure by comparative sequence analysis
In a structurally correct multiple alignment of RNAs, conserved base pairs are often revealed by the presence of frequent correlated compensatory mutations.
RNA secondary structure prediction method: comparative sequence analysis.
The accepted consensus structures of most well-studied RNAs have been derived by comparative analysis.

17 How does comparative sequence analysis work? (1)
Inferring the correct structure by comparative analysis requires knowing a structurally correct alignment.
Inferring a structurally correct multiple alignment requires knowing the correct structure.
Problem!

18 How does comparative sequence analysis work? (2)
Solution: make use of an iterative refinement process of:
– Guessing the structure based on the current best guess of the alignment
– Realigning based on the new guess at the structure
The sequences to be compared must be:
– Sufficiently similar to start the process
– Sufficiently dissimilar that a number of co-varying substitutions can be detected

19 Mutual information (1)
A quantitative measure of pairwise sequence covariation.
Given two aligned columns i, j, the mutual information is given by:
$$M_{ij} = \sum_{x_i, x_j} f_{x_i x_j} \log_2 \frac{f_{x_i x_j}}{f_{x_i} f_{x_j}}$$
– f_{x_i}: the frequency of one of the four bases observed in column i.
– f_{x_i x_j}: the joint (pairwise) frequency of one of the sixteen possible base pairs observed in columns i, j.
M_ij varies between 0 and 2 bits.

20 Mutual information (2)
M_ij tells us how much information we get about the identity of the residue in one position if we are told the identity of the residue in the other position
– If the two columns always form a Watson-Crick pair and all four bases are equally frequent, knowing that i is a G collapses the uncertainty about j from four different possibilities to just one (C) → 2 bits of information
– If i and j are uncorrelated, the mutual information is zero
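The two limiting cases above (2 bits for perfectly covarying columns, 0 bits for uncorrelated columns) can be reproduced with a short Python sketch. It is an illustration only: the function name `mutual_information` and the toy alignment columns are assumptions, not data from the talk.

```python
from collections import Counter
from math import log2


def mutual_information(col_i, col_j):
    """Mutual information (in bits) between two aligned columns.

    col_i and col_j are equal-length sequences of bases, one entry
    per aligned sequence. Name and input format are illustrative.
    """
    n = len(col_i)
    f_i = Counter(col_i)               # single-column base counts
    f_j = Counter(col_j)
    f_ij = Counter(zip(col_i, col_j))  # joint counts of observed pairs
    total = 0.0
    for (a, b), count in f_ij.items():
        joint = count / n
        total += joint * log2(joint / ((f_i[a] / n) * (f_j[b] / n)))
    return total


# Perfect Watson-Crick covariation, all four bases equally frequent.
print(mutual_information("GCAU", "CGUA"))
# Columns that vary independently of each other.
print(mutual_information("GGCC", "AUAU"))
```

Only pairs that actually occur contribute to the sum, which avoids evaluating the logarithm of zero for unobserved pairs.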

21 RNA secondary structure prediction (1)
Many plausible secondary structures can be drawn for a sequence.
But: the number of secondary structures increases exponentially with sequence length
– An RNA of 200 bases has over 10^50 possible base-paired structures
Goal: distinguish the biologically correct structure from all the incorrect structures.

22 RNA secondary structure prediction (2)
We need:
– A function that assigns the correct structure the highest score
– An algorithm for evaluating the scores of all possible structures
Two methods:
– Nussinov folding algorithm
– Zuker folding algorithm

23 Need a break? Well here it is!

24 Nussinov folding algorithm (1)
Goal: find the structure with the most base pairs.
Nussinov introduced an efficient dynamic programming algorithm for this problem.
A recursive algorithm that
– calculates the best structure for small subsequences and
– works its way outwards to larger and larger subsequences

25 Nussinov folding algorithm (2)
Key idea of the recursion:
– There are only four possible ways of getting the best structure for i,j from the best structures of the smaller subsequences
Two stages:
– Fill stage of the algorithm
– Traceback stage of the algorithm

26 Nussinov folding algorithm (3)
The four possible ways:
1. Add unpaired position i onto the best structure for subsequence i+1,j
2. Add unpaired position j onto the best structure for subsequence i,j-1
3. Add the i,j pair onto the best structure found for subsequence i+1,j-1
4. Combine two optimal substructures i,k and k+1,j

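27 Nussinov folding algorithm (4)
Formal description of the algorithm:
– Given a sequence x of length L with symbols x_1,…,x_L
– Let δ(i,j) = 1 if x_i and x_j are a complementary base pair, else δ(i,j) = 0
– Recursively calculate scores γ(i,j), which are the maximum number of base pairs that can be formed for subsequence x_i,…,x_j

28 Nussinov algorithm: fill stage
– Initialisation:
$$\gamma(i, i-1) = 0 \quad \text{for } i = 2, \dots, L; \qquad \gamma(i, i) = 0 \quad \text{for } i = 1, \dots, L$$
– Recursion: starting with all subsequences of length 2, up to length L:
$$\gamma(i,j) = \max \begin{cases} \gamma(i+1, j) \\ \gamma(i, j-1) \\ \gamma(i+1, j-1) + \delta(i,j) \\ \max_{i<k<j} \left[ \gamma(i,k) + \gamma(k+1,j) \right] \end{cases}$$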
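The fill stage can be sketched in a few lines of Python. This is a minimal illustration rather than the presenters' implementation: it assumes only the canonical pairs G-C and A-U score 1, uses 0-based indices, imposes no minimum hairpin loop length, and the names `PAIRS`, `delta` and `nussinov_fill` are made up for the example.

```python
# Assumption: only canonical Watson-Crick pairs count as complementary.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def delta(a, b):
    """1 if bases a and b can pair, else 0."""
    return 1 if (a, b) in PAIRS else 0


def nussinov_fill(seq):
    """Return the matrix gamma, where gamma[i][j] is the maximum number
    of base pairs for seq[i..j] (0-based, inclusive)."""
    L = len(seq)
    # Zero initialisation covers gamma(i,i) = 0 and gamma(i,i-1) = 0.
    gamma = [[0] * L for _ in range(L)]
    for span in range(1, L):            # span = j - i
        for i in range(L - span):
            j = i + span
            best = max(
                gamma[i + 1][j],                               # i unpaired
                gamma[i][j - 1],                               # j unpaired
                gamma[i + 1][j - 1] + delta(seq[i], seq[j]),   # i,j paired
            )
            for k in range(i + 1, j):                          # bifurcation
                best = max(best, gamma[i][k] + gamma[k + 1][j])
            gamma[i][j] = best
    return gamma


gamma = nussinov_fill("GGGAAAUCC")
print(gamma[0][-1])  # 3
```

The three nested loops make the fill stage cubic in the sequence length, with quadratic memory for the matrix.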

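29 Example sequence: GGGAAAUCC
Matrix after initialisation (rows i, columns j):
       1  2  3  4  5  6  7  8  9
       G  G  G  A  A  A  U  C  C
1 G    0
2 G    0  0
3 G       0  0
4 A          0  0
5 A             0  0
6 A                0  0
7 U                   0  0
8 C                      0  0
9 C                         0  0

30 Example sequence: GGGAAAUCC
Matrix after filling all subsequences of length 2 (rows i, columns j):
       1  2  3  4  5  6  7  8  9
       G  G  G  A  A  A  U  C  C
1 G    0  0
2 G    0  0  0
3 G       0  0  0
4 A          0  0  0
5 A             0  0  0
6 A                0  0  1
7 U                   0  0  0
8 C                      0  0  0
9 C                         0  0
The 1 in row 6, column 7 marks the A*U base pair.

31 Example sequence: GGGAAAUCC
Completely filled matrix (rows i, columns j):
       1  2  3  4  5  6  7  8  9
       G  G  G  A  A  A  U  C  C
1 G    0  0  0  0  0  0  1  2  3
2 G    0  0  0  0  0  0  1  2  3
3 G       0  0  0  0  0  1  2  2
4 A          0  0  0  0  1  1  1
5 A             0  0  0  1  1  1
6 A                0  0  1  1  1
7 U                   0  0  0  0
8 C                      0  0  0
9 C                         0  0
The value in row 1, column 9 gives the maximum number of base pairs.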

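32 Nussinov algorithm: traceback stage
Initialisation: push (1,L) onto the stack.
Recursion: repeat until the stack is empty:
– Pop (i,j)
– If i ≥ j, continue
– Else if γ(i+1,j) = γ(i,j), push (i+1,j)
– Else if γ(i,j-1) = γ(i,j), push (i,j-1)
– Else if γ(i+1,j-1) + δ(i,j) = γ(i,j), record the i,j base pair and push (i+1,j-1)
– Else, for k = i+1 to j-1: if γ(i,k) + γ(k+1,j) = γ(i,j), push (k+1,j) and (i,k), then stop searching k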
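A stack-based traceback for the example sequence might look like the Python sketch below. It is an illustration under stated assumptions: canonical G-C and A-U pairs only, no minimum loop length, and hypothetical names (`fill`, `traceback`). The fill stage is repeated in compact form only so that the block runs on its own.

```python
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}  # assumed scoring


def delta(a, b):
    return 1 if (a, b) in PAIRS else 0


def fill(seq):
    """Compact Nussinov fill stage (0-based, inclusive indices)."""
    L = len(seq)
    g = [[0] * L for _ in range(L)]
    for span in range(1, L):
        for i in range(L - span):
            j = i + span
            g[i][j] = max(
                [g[i + 1][j], g[i][j - 1],
                 g[i + 1][j - 1] + delta(seq[i], seq[j])]
                + [g[i][k] + g[k + 1][j] for k in range(i + 1, j)]
            )
    return g


def traceback(seq, g):
    """Recover one optimal set of base pairs as 1-based (i, j) tuples."""
    pairs = []
    stack = [(0, len(seq) - 1)]
    while stack:
        i, j = stack.pop()
        if i >= j:
            continue
        if g[i + 1][j] == g[i][j]:                # i unpaired
            stack.append((i + 1, j))
        elif g[i][j - 1] == g[i][j]:              # j unpaired
            stack.append((i, j - 1))
        elif g[i + 1][j - 1] + delta(seq[i], seq[j]) == g[i][j]:
            pairs.append((i + 1, j + 1))          # i,j paired
            stack.append((i + 1, j - 1))
        else:                                     # bifurcation
            for k in range(i + 1, j):
                if g[i][k] + g[k + 1][j] == g[i][j]:
                    stack.append((k + 1, j))
                    stack.append((i, k))
                    break
    return pairs


seq = "GGGAAAUCC"
print(traceback(seq, fill(seq)))  # [(2, 9), (3, 8), (6, 7)]
```

The traceback returns only one of possibly several structures with the maximum number of base pairs; which one depends on the order in which the four cases are tested.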

33 Example sequence: GGGAAAUCC
Traceback, initialisation: push (1,L). [Figure: the completely filled matrix of slide 31.]

34 Example sequence: GGGAAAUCC
Traceback, recursion. [Figure: the completely filled matrix of slide 31.]

35 Example sequence: GGGAAAUCC
[Figure: the completely filled matrix of slide 31.]

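36 Example sequence: GGGAAAUCC
Columns 6 to 9 of the filled matrix (rows i, columns j):
       6  7  8  9
       A  U  C  C
1 G    0  1  2  3
2 G    0  1  2  3
3 G    0  1  2  2
4 A    0  1  1  1
5 A    0  1  1  1
6 A    0  1  1  1
7 U    0  0  0  0
[Figure: drawing of the resulting structure; the recoverable labels are G, G●C, A, A, A●U.]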

37 SCFG version of the Nussinov algorithm
Stochastic Context-Free Grammars (SCFGs)
– Will be discussed next Wednesday
Makes use of production rules:
– S → aS | cS | gS | uS (i unpaired)
Every production rule has an associated probability parameter.
The maximum probability parse is equivalent to the maximum probability secondary structure.

38 Needed terminology
The inside-outside algorithm (a recursive dynamic programming algorithm) for SCFGs in Chomsky normal form is the natural counterpart of the forward-backward algorithm for HMMs.
The best-path variant of the inside-outside algorithm is the Cocke-Younger-Kasami (CYK) algorithm. It finds the maximum probability alignment of the SCFG to the sequence, just as the Viterbi algorithm does for HMMs.
Chomsky normal form: all context-free grammar production rules are of the form S → SS or S → a.

39 CYK for Nussinov-style RNA SCFG (1)
Initialisation and recursion: an addition to the fill stage of the Nussinov algorithm (the equations of this slide are not preserved in the transcript).
The principal difference is that the SCFG description is a probabilistic model.

40 CYK for Nussinov-style RNA SCFG (2)
The score γ(1,L) is the log likelihood of the optimal structure given the SCFG model.
The traceback to find the secondary structure corresponding to the best score is performed analogously to the traceback in the Nussinov algorithm.
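To make the probabilistic variant concrete, here is a hedged Python sketch of a CYK-style fill in log space. Everything beyond the rule S → aS | cS | gS | uS shown on the slides is an assumption: the remaining productions (S → Sa, S → aSu, S → SS, S → ε) follow the usual Nussinov-style grammar, the probabilities are invented so that they sum to 1, and the names are hypothetical.

```python
import math

WC = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}

# Hypothetical rule probabilities, chosen only so that they sum to 1.
P_LEFT = 0.05   # each of S -> aS | cS | gS | uS   (i unpaired)
P_RIGHT = 0.05  # each of S -> Sa | Sc | Sg | Su   (j unpaired)
P_PAIR = 0.10   # each of S -> aSu | uSa | gSc | cSg (i,j pair)
P_BIF = 0.10    # S -> SS                          (bifurcation)
P_END = 0.10    # S -> epsilon                     (assumed end rule)


def cyk_log_score(seq):
    """Log probability of the best parse of seq under the toy grammar.

    g[i][j] scores the half-open slice seq[i:j], so g[i][i] is the
    empty string.
    """
    L = len(seq)
    g = [[-math.inf] * (L + 1) for _ in range(L + 1)]
    for i in range(L + 1):
        g[i][i] = math.log(P_END)
    for span in range(1, L + 1):
        for i in range(L - span + 1):
            j = i + span
            best = max(math.log(P_LEFT) + g[i + 1][j],
                       math.log(P_RIGHT) + g[i][j - 1])
            if span >= 2 and (seq[i], seq[j - 1]) in WC:
                best = max(best, math.log(P_PAIR) + g[i + 1][j - 1])
            # Bifurcation into two non-empty halves.
            for k in range(i + 1, j):
                best = max(best, math.log(P_BIF) + g[i][k] + g[k][j])
            g[i][j] = best
    return g[0][L]


print(cyk_log_score("GGGAAAUCC"))
```

Compared with the counting version, sums of pair counts become sums of log probabilities, so the same fill order and traceback logic apply.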

41 CYK for Nussinov-style RNA SCFG (3)
Good starting example (10.2), but it is too simple to be an accurate RNA folder.
The algorithm does not consider important structural features like preferences for certain:
– Loop lengths
– Nearest neighbours in the structure, caused by stacking interactions between neighbouring base pairs in a stem.

42 Zuker folding algorithm (1)
Most sophisticated secondary structure prediction method for single RNAs
– An energy minimisation algorithm which assumes that the correct structure is the one with the lowest equilibrium free energy
The free energy of an RNA secondary structure is approximated as the sum of individual contributions from loops, base pairs and other secondary structure elements.

43 Zuker folding algorithm (2)
Difference from the Nussinov folding algorithm:
– Energies of stems are calculated by adding stacking contributions for the interface between neighbouring base pairs instead of individual contributions for each pair.
Advantage:
– Better fit to experimentally observed values for RNA structures, but it complicates the dynamic programming algorithm

44 Zuker folding algorithm (3)
Freier energy rules
The energies in the tables are from the older ‘Freier rules’ at 37 ºC.
For more information see the article “Improved free-energy parameters for predictions of RNA duplex stability” by Freier et al.

45 Zuker folding algorithm (4)

46 Zuker folding algorithm (5)

47 Zuker folding algorithm (6)
The minimum energy structure can be calculated recursively by a dynamic programming algorithm, very similar to the way the maximum base-paired structure is calculated in the Nussinov algorithm.
Now we keep two matrices:
– W(i,j) is the energy of the best structure on i,j
– V(i,j) is the energy of the best structure on i,j given that i,j are paired.
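For orientation, the two matrices are commonly coupled by recursions of the following shape. This is a sketch in the form usually given in textbook treatments, not a formula taken from the slides; the energy terms eh (hairpin loop), es (stacked pair), VBI (best bulge or interior loop closed by i,j) and VM (best multi-branched loop closed by i,j) are names assumed here.

```latex
\begin{aligned}
W(i,j) &= \min
  \begin{cases}
    W(i+1,\,j) \\
    W(i,\,j-1) \\
    V(i,j) \\
    \min_{i<k<j}\,\bigl[\,W(i,k) + W(k+1,\,j)\,\bigr]
  \end{cases} \\[6pt]
V(i,j) &= \min
  \begin{cases}
    eh(i,j) \\
    es(i,j) + V(i+1,\,j-1) \\
    VBI(i,j) \\
    VM(i,j)
  \end{cases}
\end{aligned}
```

The W recursion mirrors the four Nussinov cases with maximisation replaced by minimisation, while V is where loop-dependent and stacking energies enter.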

48 Suboptimal RNA folding
(The CYK algorithm will be explained next Wednesday)
The original Zuker algorithm finds only the optimal structure.
The biologically correct structure is often not the calculated optimal structure.
Zuker introduced a suboptimal folding algorithm.
– It is similar to running the CYK algorithm in both the inside and outside directions.
The algorithm samples one base pair suboptimally. The rest of the structure is the optimal structure given that base pair.

49 Demonstration: RNAstructure
By David H. Mathews, Michael Zuker, Douglas H. Turner

50 Demo: RNAstructure (1)
The core of RNAstructure is a dynamic programming algorithm to predict RNA or DNA secondary structures from sequence, based on the principle of minimizing free energy.

51 Demo: RNAstructure (2)
The prediction of a secondary structure is based on the Zuker algorithm for free energy minimization, using the nearest neighbour parameters of Doug Turner and co-workers.
A recursive algorithm is used that generates an optimal structure and a series of structures that are called sub-optimal structures (structures with free energy similar to the lowest free energy structure).

52 Demo: RNAstructure (3)
The number of sub-optimal structures generated is controlled by two parameters entered by the user:
– Max % Energy Difference: sets the percent difference from the lowest free energy allowed for the structures output. For example, if the lowest free energy structure is -100 kcal/mol and the Max % Energy Difference is 10, any structures with an energy of -90 kcal/mol or higher are rejected (higher means less negative).
– Max number of structures: sets an absolute upper limit on the number of structures that can be generated. A maximum of 1000 structures can be generated.
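The effect of the two parameters can be illustrated with a small Python sketch. It is not RNAstructure code or its API: the function `filter_suboptimal`, its arguments and the example energies are hypothetical, and it only mirrors the rule described on the slide.

```python
def filter_suboptimal(energies, max_percent_diff, max_structures):
    """Keep energies (kcal/mol) allowed by the two user parameters.

    Hypothetical helper: a structure is kept if its energy is strictly
    lower than the cutoff derived from the lowest free energy.
    """
    lowest = min(energies)
    # For lowest = -100 and 10 %, the cutoff is -90 kcal/mol.
    cutoff = lowest + abs(lowest) * max_percent_diff / 100
    kept = sorted(e for e in energies if e < cutoff)
    return kept[:max_structures]


candidates = [-100.0, -95.0, -90.0, -85.0, -99.5]
print(filter_suboptimal(candidates, 10, 1000))  # [-100.0, -99.5, -95.0]
print(filter_suboptimal(candidates, 10, 2))     # [-100.0, -99.5]
```

The structure at exactly -90 kcal/mol is dropped, matching the "-90 kcal/mol or higher" wording of the slide.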

53 Demo: RNAstructure (4)
A third parameter entered is Window Size.
This controls how different the sub-optimal structures must be from each other.
A small window size allows very similar structures to be generated, while a larger window size requires them to be more different.

54 Demonstration

55 Questions?

