
RNA structure prediction. RNA functions as: mRNA, rRNA, tRNA, nuclear export, spliceosome, regulatory molecules (RNAi), enzymes, virus, retrotransposons.


1 RNA structure prediction

2 RNA functions
RNA functions as
–mRNA
–rRNA
–tRNA
–Nuclear export
–Spliceosome
–Regulatory molecules (RNAi)
–Enzymes
–Virus
–Retrotransposons
–Medicine

3 Base pairs
C-G is stronger than U-A.
Non-standard pair: G-U.

4 Stacking
Base pairs are usually coplanar and are almost always stacked.
Stems – continuous stacks.
The 3D structure of a stack is a helix.
(Figure label: hairpin.)

5 Predictable structures

6 Hard-to-predict structures
Pseudoknots, kissing hairpins, hairpin-bulge

7 Secondary structure notations

8 Tertiary structure

9

10

11 RNAi

12

13

14 Structure representation

15

16

17 Main approaches to RNA secondary structure prediction
Energy minimization
–dynamic programming approach
–does not require prior sequence alignment
–requires estimation of the energy terms contributing to secondary structure
Comparative sequence analysis
–uses sequence alignment to find conserved residues and covariant base pairs
–most trusted

18 Dotplot

19 Think!
Make a dotplot of an RNA molecule
–Sequence: GGGAAAUCC
What is the secondary structure?
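One way to try the exercise is to print the dotplot as text. The sketch below marks every cell whose row base and column base could pair. It assumes that Watson-Crick pairs and the G-U wobble pair all count as a dot; the function names `can_pair` and `dotplot` are illustrative, not from the slides.

```python
def can_pair(a, b):
    """True for Watson-Crick pairs (G-C, A-U) and the G-U wobble pair.

    Counting G-U as pairable is an assumption made for this sketch.
    """
    return {a, b} in ({"G", "C"}, {"A", "U"}, {"G", "U"})


def dotplot(seq):
    """Return a text dotplot of seq against itself.

    '*' marks a cell whose row base and column base can pair, '.' otherwise.
    """
    rows = ["  " + " ".join(seq)]  # header row with the sequence
    for i in range(len(seq)):
        cells = ["*" if can_pair(seq[i], seq[j]) else "." for j in range(len(seq))]
        rows.append(seq[i] + " " + " ".join(cells))
    return "\n".join(rows)


if __name__ == "__main__":
    print(dotplot("GGGAAAUCC"))
```

Runs of `*` along an anti-diagonal are candidate stems, because consecutive pairs (i, j), (i+1, j-1), ... stack on each other.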

20 Dynamic programming approach Nussinov algorithm

21 Dynamic programming approach
Let $E(i,j)$ = minimum energy for the subchain starting at $i$ and ending at $j$
$\alpha(r_i,r_j)$ = energy of the pair $r_i$, $r_j$ ($r_j$ = base at position $j$)
a) $i,j$ is paired: $E(i,j) = E(i+1,j-1) + \alpha(r_i,r_j)$
b) $i$ is unpaired: $E(i,j) = E(i+1,j)$
c) $j$ is unpaired: $E(i,j) = E(i,j-1)$
d) bifurcation: $E(i,j) = E(i,k) + E(k+1,j)$
(Figure: panels a) to d) illustrate the four cases.)

22 RNA secondary structure algorithm
Given: RNA sequence $x_1, x_2, x_3, x_4, x_5, x_6, \ldots, x_L$
Initialization:
$E(i,i-1) = 0$ for $i = 2$ to $L$
$E(i,i) = 0$ for $i = 1$ to $L$
Recursion: for $n = 2$ to $L$ (iteration over the subchain length; for each start $i$, $j = i+n-1$)
$E(i,j) = \min\{\, E(i+1,j),\ E(i,j-1),\ E(i+1,j-1) + \alpha(r_i,r_j),\ \min_{i<k<j}\{E(i,k) + E(k+1,j)\} \,\}$
Cost: $O(n^3)$
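The recursion above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: it assumes a pair energy of -1 for Watson-Crick pairs only and 0 otherwise, uses 0-based indices, and the names `pair_energy` and `nussinov_table` are made up for the example.

```python
def pair_energy(a, b):
    """Assumed scoring: -1 for a Watson-Crick pair (G-C or A-U), 0 otherwise."""
    return -1 if {a, b} in ({"G", "C"}, {"A", "U"}) else 0


def nussinov_table(seq):
    """Fill E[i][j] = minimum energy of the subchain seq[i..j] (0-based, inclusive).

    Cells with j <= i stay 0, which covers both initialization rules:
    E(i, i) = 0 and E(i, i-1) = 0.
    """
    L = len(seq)
    E = [[0] * L for _ in range(L)]
    for n in range(2, L + 1):          # subchain length
        for i in range(L - n + 1):     # subchain start
            j = i + n - 1              # subchain end
            best = min(
                E[i + 1][j],                                  # i unpaired
                E[i][j - 1],                                  # j unpaired
                E[i + 1][j - 1] + pair_energy(seq[i], seq[j]),  # i pairs with j
            )
            for k in range(i + 1, j):  # bifurcation, i < k < j
                best = min(best, E[i][k] + E[k + 1][j])
            E[i][j] = best
    return E


if __name__ == "__main__":
    table = nussinov_table("GGAAAUCC")
    print(table[0][-1])  # minimum energy of the whole sequence
```

The three nested loops over n, i and k are what give the cubic cost.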

23 Example
Let $\alpha(r_i,r_j) = -1$ if $r_i, r_j$ form a base pair and 0 otherwise.
Input: GGAAAUCC
$E(i,j)$ = lowest energy conformation for the subchain from $i$ to $j$ (row $i$, column $j$). Table after initialization:

    G  G  A  A  A  U  C  C
 G  0
 G  0  0
 A     0  0
 A        0  0
 A           0  0
 U              0  0
 C                 0  0
 C                    0  0

The cell at $i=3$, $j=7$ should hold the min energy for AAAUC.

24 Example-continued
Table after filling the subchains of length 2:

    G  G  A  A  A  U  C  C
 G  0  0
 G  0  0  0
 A     0  0  0
 A        0  0  0
 A           0  0 -1
 U              0  0  0
 C                 0  0  0
 C                    0  0

GA ($i=2$, $j=3$): $\min\{0,\ 0,\ 0+\alpha(G,A)\} = 0$
AU ($i=5$, $j=6$): $\min\{0,\ 0,\ 0+\alpha(A,U)\} = -1$

25 Recovering the structure from the DP table
Complexity $O(n^3)$
Main difference from sequence alignment – we are tracing back a tree-like structure, not a single optimal path (bifurcation introduces branch points).
Method 1: Leave pointers as you compute the table: for each element of the table, store (at most two) pointers to the subsequences used in the solution.
Method 2: Recover the history based on the numerical values in the table.
–Stacking – check the value along the diagonal
–Bifurcation – find $k$ such that $E(i,k)+E(k+1,j) = E(i,j)$
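Method 2 can be sketched as follows. The example refills the table so that it runs on its own, then walks back through it using only the stored values, pushing two subchains on a stack at each bifurcation. It assumes the same simple scoring as the slide example (-1 per Watson-Crick pair, 0 otherwise); `fill`, `traceback` and `dot_bracket` are illustrative names, and the order in which the cases are tested is an arbitrary choice that decides which of several equally good structures is returned.

```python
def pair_energy(a, b):
    """Assumed scoring: -1 for a Watson-Crick pair (G-C or A-U), 0 otherwise."""
    return -1 if {a, b} in ({"G", "C"}, {"A", "U"}) else 0


def fill(seq):
    """Fill the minimum-energy table E[i][j] for seq[i..j] (0-based, inclusive)."""
    L = len(seq)
    E = [[0] * L for _ in range(L)]
    for n in range(2, L + 1):
        for i in range(L - n + 1):
            j = i + n - 1
            best = min(E[i + 1][j], E[i][j - 1],
                       E[i + 1][j - 1] + pair_energy(seq[i], seq[j]))
            for k in range(i + 1, j):
                best = min(best, E[i][k] + E[k + 1][j])
            E[i][j] = best
    return E


def traceback(seq, E):
    """Recover one optimal set of base pairs from the numerical values in E."""
    pairs = []
    stack = [(0, len(seq) - 1)]        # subchains still to be explained
    while stack:
        i, j = stack.pop()
        if i >= j:
            continue
        e = pair_energy(seq[i], seq[j])
        if E[i][j] == E[i + 1][j]:                 # i unpaired
            stack.append((i + 1, j))
        elif E[i][j] == E[i][j - 1]:               # j unpaired
            stack.append((i, j - 1))
        elif e < 0 and E[i][j] == E[i + 1][j - 1] + e:  # i pairs with j
            pairs.append((i, j))
            stack.append((i + 1, j - 1))
        else:                                      # bifurcation: find the split k
            for k in range(i + 1, j):
                if E[i][j] == E[i][k] + E[k + 1][j]:
                    stack.append((i, k))
                    stack.append((k + 1, j))
                    break
    return sorted(pairs)


def dot_bracket(length, pairs):
    """Render the pairs in dot-bracket notation."""
    chars = ["."] * length
    for i, j in pairs:
        chars[i], chars[j] = "(", ")"
    return "".join(chars)


if __name__ == "__main__":
    seq = "GGAAAUCC"
    print(dot_bracket(len(seq), traceback(seq, fill(seq))))
```

The explicit stack plays the role of the tree: a bifurcation adds two entries, so both branches get traced.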

26

27

28

29 More realistic energy function

30 Stacking energies

31 Even more realistic energy function
Loops have a destabilizing effect: structure (d) should have lower energy than (b).
The destabilizing contribution of loops should depend on the loop length ($k$).
Stacking has an additional stabilizing contribution $\eta$.
(Figure labels: $\xi(k)$, $\beta(k)$, $\gamma(k)$, $\eta$.)

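32 More realistic energy function requires slightly more involved recurrence
$E(i,j) = \min\{\, E(i+1,j),\ E(i,j-1),\ \min_{i<k<j}\{E(i,k)+E(k+1,j)\},\ L(i,j) \,\}$
where $L(i,j)$ =
–$\alpha(r_i,r_j) + \xi(j-i-1)$ if hairpin loop
–$\alpha(r_i,r_j) + \eta + E(i+1,j-1)$ if stacking (helix)
–$\min_k \{\alpha(r_i,r_j) + \beta(k) + E(i+k+1,j-1)\}$ if i-bulge
–$\min_k \{\alpha(r_i,r_j) + \beta(k) + E(i+1,j-k-1)\}$ if j-bulge
–$\min_{k_1,k_2} \{\alpha(r_i,r_j) + \gamma(k_1+k_2) + E(i+k_1+1,j-k_2-1)\}$ if internal loop
Here $\xi$, $\beta$ and $\gamma$ are the length-dependent destabilizing terms for hairpin loops, bulges and internal loops, and $\eta$ is the stabilizing stacking term.
Extra “min” gives an $O(n^4)$ algorithm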

33

34 Covariance method
In a correct multiple alignment of RNAs, conserved base pairs are often revealed by the presence of frequent correlated compensatory mutations.
The two boxed positions (columns 2 and 9) covary to maintain Watson-Crick complementarity. This covariation implies a base pair, which may then be extended in both directions.
GCCUUCGGGC
GACUUCGGUC
GGCUUCGGCC

35 Alignment

36

37 Quantitative measure of pairwise sequence covariation
Mutual information $M_{ij}$ between two aligned columns $i$, $j$:
$M_{ij} = \sum_{x_i,x_j} f_{x_i x_j} \log_2 \frac{f_{x_i x_j}}{f_{x_i} f_{x_j}}$
where
$f_{x_i x_j}$ = frequency of the pair (observed)
$f_{x_i}$ = frequency of nucleotide $x_i$ at position $i$
Observations:
$0 \le M_{ij} \le 2$
$i,j$ uncorrelated: $M_{ij} = 0$

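38 MI: examples
Example 1: column $i$ = A A C G, column $j$ = U U G C
$f_{A_i} = 0.5$, $f_{C_i} = 0.25$, $f_{G_i} = 0.25$
$f_{U_j} = 0.5$, $f_{C_j} = 0.25$, $f_{G_j} = 0.25$
$f_{AU} = 0.5$, $f_{CG} = 0.25$, $f_{GC} = 0.25$
$M_{ij} = \sum_{x_i,x_j} f_{x_i x_j} \log_2 \frac{f_{x_i x_j}}{f_{x_i} f_{x_j}} = 0.5 \log_2 \frac{0.5}{0.5 \cdot 0.5} + 2 \cdot 0.25 \log_2 \frac{0.25}{0.25 \cdot 0.25} = 0.5 \cdot 1 + 0.5 \cdot 2 = 1.5$
Example 2: column $i$ = A A A A, column $j$ = U U U U
$M_{ij} = 1 \cdot \log_2 1 = 0$
Example 3: column $i$ = U A C G, column $j$ = A U G C
$M_{ij} = 4 \cdot 0.25 \log_2 4 = 2$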
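The three examples can be checked numerically. The sketch below computes the mutual information of two alignment columns given as strings of equal length; the function name `mutual_information` is an illustrative choice, and gaps or pseudocounts are deliberately not handled.

```python
from collections import Counter
from math import log2


def mutual_information(col_i, col_j):
    """Mutual information (in bits) between two aligned columns.

    col_i and col_j are equal-length strings: one nucleotide per sequence.
    """
    n = len(col_i)
    f_i = Counter(col_i)                 # nucleotide counts in column i
    f_j = Counter(col_j)                 # nucleotide counts in column j
    f_ij = Counter(zip(col_i, col_j))    # observed pair counts
    mi = 0.0
    for (a, b), count in f_ij.items():
        p_ab = count / n
        mi += p_ab * log2(p_ab / ((f_i[a] / n) * (f_j[b] / n)))
    return mi


if __name__ == "__main__":
    print(mutual_information("AACG", "UUGC"))  # example 1
    print(mutual_information("AAAA", "UUUU"))  # example 2
    print(mutual_information("UACG", "AUGC"))  # example 3
```

Only observed pairs enter the sum, so no term with zero frequency is ever evaluated.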

39 Other methods
HMMs
Stochastic context-free grammars

40 Conclusion
RNA secondary structure prediction
–Single sequence:
  Dot-plot
  Nussinov dynamic programming
  Energy function
–Covariance analysis
  Mutual information
  Hidden Markov Models
  SCFGs

41

42

43

44

45 Finding “most probable structure”
$S$ – structure, $E(S)$ – free energy of $S$; then
$p(S) = \exp(-E(S)/kT)/Q$
$Q = \sum_x \exp(-E(x)/kT)$ (partition function)
Problem: computing $Q$
Method to compute $Q$ – dynamic programming (similar to the one presented before, but scores are replaced with probabilities and min energy with a sum of probabilities).
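A minimal sketch of this idea for the simple pair-energy model: the "min" of the energy recursion becomes a sum of Boltzmann weights. It uses a decomposition in which each structure is counted exactly once (the last base is either unpaired or paired with some earlier base k). The assumptions are an energy of -1 per Watson-Crick pair, no minimum loop length, and kT = 1 in arbitrary units; `partition_function` and `structure_probability` are illustrative names, not from the slides.

```python
import math


def pair_energy(a, b):
    """Assumed model: energy -1.0 for a Watson-Crick pair, None if a and b cannot pair."""
    return -1.0 if {a, b} in ({"G", "C"}, {"A", "U"}) else None


def partition_function(seq, kT=1.0):
    """Q = sum over all non-crossing structures x of exp(-E(x)/kT)."""
    n = len(seq)
    Q = [[1.0] * n for _ in range(n)]    # a single base has one (empty) structure

    def q(i, j):
        # An empty subchain contributes the weight of the empty structure, 1.
        return Q[i][j] if i <= j else 1.0

    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            total = q(i, j - 1)          # case 1: base j is unpaired
            for k in range(i, j):        # case 2: base j pairs with base k
                e = pair_energy(seq[k], seq[j])
                if e is not None:
                    total += q(i, k - 1) * math.exp(-e / kT) * q(k + 1, j - 1)
            Q[i][j] = total
    return q(0, n - 1)


def structure_probability(energy, Q, kT=1.0):
    """p(S) = exp(-E(S)/kT) / Q for a structure S with free energy `energy`."""
    return math.exp(-energy / kT) / Q


if __name__ == "__main__":
    Q = partition_function("GGAAAUCC")
    # Probability of a minimum-energy structure with three pairs (E = -3).
    print(structure_probability(-3.0, Q))
```

The table has the same shape and the same cubic cost as the minimum-energy table; only the operations change.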

46 tRNA

47 Answer
http://ludwig-sun2.unil.ch/~bsondere/nussinov/form.html#CYK

