RNA Secondary Structure Prediction
16S rRNA
RNA Secondary Structure (figure; image: Wuchty). Labeled elements: stem, hairpin loop, bulge, interior loop, junction (multiloop), single-stranded region, dangling end, pseudoknot.
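The loop types labeled in the figure (stem, hairpin loop, bulge, interior loop, multiloop) can be recovered mechanically from a structure. The sketch below assumes the structure is written in dot-bracket notation (`(` and `)` for paired bases, `.` for unpaired ones) and is pseudoknot-free; the function names `pair_table` and `classify_loops` are hypothetical. Each loop closed by a pair (i, j) is classified by how many pairs it directly encloses and how many unpaired bases sit on either side. The exterior loop (where dangling ends live) is not classified here.

```python
def pair_table(structure):
    """Map each position to its partner index (-1 if unpaired) for a dot-bracket string."""
    stack, partner = [], [-1] * len(structure)
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            j = stack.pop()
            partner[i], partner[j] = j, i
    return partner


def classify_loops(structure):
    """Return [((i, j), kind)] for every loop closed by the base pair (i, j)."""
    partner = pair_table(structure)
    loops = []
    for i, j in enumerate(partner):
        if j <= i:  # unpaired base, or the 3' side of a pair
            continue
        enclosed, k = [], i + 1
        while k < j:  # walk the loop, jumping over directly enclosed helices
            if partner[k] > k:
                enclosed.append((k, partner[k]))
                k = partner[k] + 1
            else:
                k += 1
        if not enclosed:
            kind = "hairpin loop"
        elif len(enclosed) >= 2:
            kind = "multiloop (junction)"
        else:
            p, q = enclosed[0]
            left, right = p - i - 1, j - q - 1  # unpaired bases on each side
            if left == 0 and right == 0:
                kind = "stacked pair (stem)"
            elif left == 0 or right == 0:
                kind = "bulge"
            else:
                kind = "interior loop"
        loops.append(((i, j), kind))
    return loops


for s in ["((.((...))))", "(.(...).)", "((...)(...))"]:
    print(s, classify_loops(s))
```

A bulge has unpaired bases on one side only, an interior loop on both sides, and a multiloop encloses two or more helices.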
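RNA secondary structure (figure: a hairpin with its loop and stem labeled; the stem contains the pairs A-U, U-G, C-G, A-U and G-C, with U-G marked as a wobble pair and the others as canonical pairs).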
RNA secondary structure representation (figure panels: a legitimate structure; pseudoknots).
Non-canonical interactions of RNA secondary-structure elements (figure panels: pseudoknot; kissing hairpins; hairpin-bulge contact). These patterns are excluded from the prediction schemes because computing them is too expensive.
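Pseudoknots and kissing hairpins both show up as crossing base pairs: two pairs (i, j) and (k, l) with i < k < j < l. A structure is a legitimate (nested) secondary structure only if no two of its pairs cross. A minimal check, assuming the structure is given as a list of 0-based index pairs (the function names are hypothetical):

```python
from itertools import combinations


def pairs_cross(p, q):
    """True if base pairs p and q cross, i.e. i < k < j < l after ordering."""
    (i, j), (k, l) = sorted([tuple(sorted(p)), tuple(sorted(q))])
    return i < k < j < l


def is_pseudoknot_free(pairs):
    """A pair set is nested (pseudoknot-free) if no two pairs cross."""
    return not any(pairs_cross(p, q) for p, q in combinations(pairs, 2))


nested = [(0, 9), (1, 8), (2, 7)]           # a single hairpin stem
knotted = [(0, 6), (1, 5), (3, 9), (4, 8)]  # hairpin-loop bases pair downstream
print(is_pseudoknot_free(nested))   # True
print(is_pseudoknot_free(knotted))  # False
```

The pairwise check is quadratic in the number of pairs, which is fine for illustration.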
“Rules for 2D RNA prediction”:
Base pairs in stems: good (an additional possible assumption is, e.g., that G:C is better than A:U).
Bulges and loops: bad.
Canonical interactions (base pairs, stems, bulges, loops): OK.
Non-canonical interactions (pseudoknots, kissing hairpins): forbidden.
The more interactions, the better.
Predicting RNA Secondary Structure: The allowed base-pairing rules are the Watson-Crick pairs A:U and G:C plus the G:U wobble pair. A sequence may form many different structures, and a free energy value is associated with each possible structure. The goal is to predict the structure with the minimum free energy (MFE).
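The pairing rules (A:U, G:C, G:U) determine which candidate structures are allowed at all. A minimal sketch, assuming structures are written in dot-bracket notation; `parse_dot_bracket` and `is_allowed` are hypothetical names:

```python
# Watson-Crick A:U, G:C and the G:U wobble pair, in both orientations
ALLOWED_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def parse_dot_bracket(structure):
    """Return sorted (i, j) pairs from a dot-bracket string, validating brackets."""
    stack, pairs = [], []
    for pos, ch in enumerate(structure):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {pos}")
            pairs.append((stack.pop(), pos))
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at position {pos}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)


def is_allowed(sequence, structure):
    """True if every base pair in the structure obeys the pairing rules."""
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure lengths differ")
    return all((sequence[i], sequence[j]) in ALLOWED_PAIRS
               for i, j in parse_dot_bracket(structure))


print(is_allowed("GGGAAAUCC", "(((...)))"))  # True: G-C, G-C, G-U (wobble)
print(is_allowed("GGGAAAACC", "(((...)))"))  # False: innermost pair is G-A
```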
Simplifying Assumptions for Structure Prediction: RNA folds into a single minimum free-energy structure. There are no non-canonical interactions. The energy of a particular base pair in a double-stranded region is sequence independent: neighbors have no influence. This problem was solved by dynamic programming (Zuker and Stiegler, 1981).
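Under these assumptions every allowed pair contributes the same energy, so minimizing free energy reduces to maximizing the number of nested base pairs. The sketch below is a Nussinov-style dynamic program, a simpler relative of the Zuker-Stiegler algorithm (which uses stacking and loop energies instead of a constant per pair). `PAIR_ENERGY` and `MIN_LOOP` are assumed, illustrative values, and `fold` is a hypothetical name.

```python
ALLOWED = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
PAIR_ENERGY = -1.0  # hypothetical: identical energy for every allowed pair
MIN_LOOP = 3        # assumption: a hairpin loop needs at least 3 unpaired bases


def fold(seq):
    """Return (energy, dot-bracket) minimizing the summed PAIR_ENERGY over nested pairs."""
    n = len(seq)
    E = [[0.0] * n for _ in range(n)]  # E[i][j]: min energy of seq[i..j]

    def pair_options(i, j):
        """Yield (k, energy) for each allowed pairing of seq[j] with seq[k], k >= i."""
        for k in range(i, j - MIN_LOOP):
            if (seq[k], seq[j]) in ALLOWED:
                left = E[i][k - 1] if k > i else 0.0
                yield k, left + E[k + 1][j - 1] + PAIR_ENERGY

    for span in range(MIN_LOOP + 1, n):
        for i in range(n - span):
            j = i + span
            # either j stays unpaired, or j pairs with some k
            E[i][j] = min([E[i][j - 1]] + [e for _, e in pair_options(i, j)])

    pairs = []

    def trace(i, j):
        """Recover one optimal set of pairs for seq[i..j]."""
        if j - i <= MIN_LOOP:
            return
        if E[i][j] == E[i][j - 1]:
            trace(i, j - 1)
            return
        for k, e in pair_options(i, j):
            if e == E[i][j]:
                pairs.append((k, j))
                if k > i:
                    trace(i, k - 1)
                trace(k + 1, j - 1)
                return

    if n:
        trace(0, n - 1)
    structure = ["."] * n
    for i, j in pairs:
        structure[i], structure[j] = "(", ")"
    return (E[0][n - 1] if n else 0.0), "".join(structure)


print(fold("GGGAAAUCC"))  # (-3.0, '(((...)))')
```

The table fill is O(N^3) in time and O(N^2) in memory, which is what makes exhaustive enumeration unnecessary.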
Sequence-dependent free energy (the nearest-neighbor model) (figure: two hairpin diagrams, ending in UCGAC 3', that differ only in one stem base pair, G-C in the first and U-A in the second). Example nearest-neighbor stacks: GC/GC, AU/GC, CG/UA.
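In the nearest-neighbor model a helix's energy is a sum over adjacent (stacked) pairs rather than over single pairs, so changing one pair changes two stacking terms. A minimal sketch; the stacking values below are illustrative placeholders, not the published Turner parameters, and `helix_stacking_energy` is a hypothetical name.

```python
def helix_stacking_energy(pairs, stack_table):
    """Sum nearest-neighbor stacking energies (kcal/mol) along a helix.

    pairs: base pairs listed 5'->3' along one strand, e.g. [("G", "C"), ("A", "U")]
    means 5'-GA-3' on the top strand paired with 3'-CU-5' on the bottom strand.
    """
    return sum((stack_table[(a, b)] for a, b in zip(pairs, pairs[1:])), 0.0)


# PLACEHOLDER values for illustration only (NOT measured parameters)
STACKS = {
    (("G", "C"), ("G", "C")): -3.0,
    (("G", "C"), ("A", "U")): -2.0,
    (("A", "U"), ("G", "C")): -2.0,
    (("A", "U"), ("A", "U")): -1.0,
}

all_gc = [("G", "C"), ("G", "C"), ("G", "C")]
middle_au = [("G", "C"), ("A", "U"), ("G", "C")]
print(helix_stacking_energy(all_gc, STACKS))     # -6.0
print(helix_stacking_energy(middle_au, STACKS))  # -4.0
```

Swapping the middle G-C for A-U alters both stacks it participates in, which is exactly the neighbor dependence the simplified model ignores.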
Free energy computation (figure: a hairpin annotated with its free-energy contributions in kcal/mol: stacking terms -2.9, -2.9, -1.8, -0.9, -1.8 and -2.1; a 1-nt bulge, +3.3; a 4-nt hairpin loop, +5.9; plus terms for the hairpin terminal mismatch and a 5' dangling end). Total: ΔG = -4.6 kcal/mol.
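The total ΔG is the sum of independent contributions: favorable (negative) stacking terms and unfavorable (positive) loop and bulge penalties. A minimal sketch of that bookkeeping; the contribution values are hypothetical and chosen only to show that a short stem may not pay for its loop penalty. `total_free_energy` is a hypothetical name.

```python
from collections import defaultdict


def total_free_energy(contributions):
    """Sum (kind, kcal/mol) contributions; also return a per-kind breakdown."""
    by_kind = defaultdict(float)
    for kind, value in contributions:
        by_kind[kind] += value
    return round(sum(by_kind.values()), 2), dict(by_kind)


# Hypothetical values, for illustration only
three_stacks = [("stacking", -2.0), ("stacking", -1.5), ("stacking", -2.5),
                ("hairpin loop", 4.5)]
two_stacks = three_stacks[1:]  # drop one stacking term

print(total_free_energy(three_stacks))  # (-1.5, {'stacking': -6.0, 'hairpin loop': 4.5})
print(total_free_energy(two_stacks))    # (0.5, {...}): positive, so unfavorable
```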
Prediction Programs: Mfold; Vienna RNA Secondary Structure Prediction
Mfold - Suboptimal Folding: For any sequence of N nucleotides, the expected number of structures is greater than 1.8^N, so a sequence of 100 nucleotides has more than ~3 × 10^25 possible folds. If a computer could evaluate 1000 folds per second, enumerating them would take about 10^15 years (the age of the universe is ~10^10 years)! Mfold instead generates suboptimal folds whose free energies fall within a certain range of the minimum free energy. Many of these structures differ only in trivial ways, but they can still be useful for designing experiments.
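The arithmetic behind the enumeration estimate, plus the energy-window idea behind suboptimal folding, fits in a few lines. This is a sketch only: `suboptimal` simply filters an existing list of (energy, structure) candidates, which is not how Mfold computes its suboptimal folds, and the candidate energies below are hypothetical.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def years_to_enumerate(n, base=1.8, folds_per_second=1000):
    """Years needed to enumerate base**n structures at a given rate."""
    return base ** n / folds_per_second / SECONDS_PER_YEAR


def suboptimal(structures, window):
    """Keep (energy, structure) pairs within `window` kcal/mol of the minimum."""
    mfe = min(e for e, _ in structures)
    return sorted((e, s) for e, s in structures if e <= mfe + window)


print(f"{1.8 ** 100:.2e}")                  # 3.37e+25 structures
print(f"{years_to_enumerate(100):.1e}")     # 1.1e+15 years
candidates = [(-10.2, "A"), (-9.8, "B"), (-7.0, "C")]  # hypothetical energies
print(suboptimal(candidates, window=1.0))   # [(-10.2, 'A'), (-9.8, 'B')]
```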