Predicting the Secondary Structure of RNA


1 Predicting the Secondary Structure of RNA
Mitra Shokat, Spring 2017
For my project, I chose to write an algorithm that predicts the optimal secondary structure of any given string of RNA.

2 Optimizing RNA Secondary Structure
Goal: given a sequence of RNA, determine the most probable secondary structure.
What determines secondary structure? Base pairings and thermodynamics.
The secondary structure of RNA is formed by pairing interactions between bases on the same strand of RNA. A few algorithms already exist for predicting optimal secondary structures. The challenge is that many thermodynamic factors have to be considered when deciding which set of base-pair interactions is most favorable, and it is difficult to incorporate all of them into one algorithm. For my project, I chose to implement a variation of the Nussinov algorithm, which simplifies the problem by defining the optimal structure as the one with the most base pairings and no pseudoknots.

3 Nussinov Algorithm
Dynamic programming approach to predicting RNA secondary structure
Input: sequence of RNA (string)
Output: graphical representation of base pairings in the optimal secondary structure
Simplifying assumptions: the optimal structure is the one that contains the maximum number of base pairings; pseudoknots are not allowed.
The Nussinov algorithm is a dynamic programming approach to predicting the optimal secondary structure of a given strand of RNA. It breaks the RNA sequence into smaller subsequences, finds the optimal structure of each subsequence, and then combines them to determine the overall structure. My program takes a sequence of RNA as input and outputs a graphical representation of the optimal structure by showing which bases are paired. Again, this optimal structure is based only on maximizing the number of paired bases. Dataset: a short test sequence, then alanine tRNA.

4 Pseudocode
Initialize matrix: fill the main diagonal and the diagonal below it with zeros.
Fill matrix: for each index, choose the option that yields the maximum score. There are 4 options:
1. pair rna[i] and rna[j] and attach to best structure for rna[i+1:j-1]
2. add rna[i] to best structure of rna[i+1:j]
3. add rna[j] to best structure of rna[i:j-1]
4. combine two optimal structures for rna[i:k] and rna[k+1:j]
Backtrack: for each index in the matrix, backtrack to the source of the maximum score.
First, create a scoring matrix that compares the sequence of RNA to itself. For each index i, j, we fill in the table with the maximum number of base pairs that can be formed in the substring from index i to j. At each position, there are four options for filling in the score (they're shown here in the picture), and at each index we choose the option with the highest score. After the scoring matrix is complete, the algorithm uses a backtracking method to retrace its steps in order to find the optimal structure. I used the Python graphics library to convert the program's output into a visual representation of which bases are paired with which in the optimal structure.
S. Will. "RNA Structure and RNA Structure Prediction." MIT, 2011.
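The pseudocode on this slide can be sketched in Python roughly as follows. This is a minimal illustration, not the program from the talk: the function names (`can_pair`, `nussinov_fill`, `traceback`, `dot_bracket`) are hypothetical, only Watson-Crick pairs (A-U, G-C) are assumed to count, no minimum hairpin loop length is enforced, and the result is printed in dot-bracket notation instead of being drawn with a graphics library.

```python
def can_pair(a, b):
    """Assumption: only Watson-Crick pairs (A-U, G-C) are allowed."""
    return {a, b} in ({"A", "U"}, {"G", "C"})


def nussinov_fill(rna):
    """Return score, where score[i][j] is the maximum number of base
    pairs in rna[i..j] (inclusive). Cells on the main diagonal and the
    diagonal below it stay at zero."""
    n = len(rna)
    score = [[0] * n for _ in range(n)]
    for span in range(1, n):                 # distance between i and j
        for i in range(n - span):
            j = i + span
            # options 2 and 3: leave rna[i] or rna[j] unpaired
            best = max(score[i + 1][j], score[i][j - 1])
            # option 1: pair rna[i] with rna[j]
            if can_pair(rna[i], rna[j]):
                best = max(best, score[i + 1][j - 1] + 1)
            # option 4: combine two adjacent substructures
            for k in range(i + 1, j):
                best = max(best, score[i][k] + score[k + 1][j])
            score[i][j] = best
    return score


def traceback(rna, score):
    """Retrace which option produced each score; return the paired indices."""
    pairs = []
    stack = [(0, len(rna) - 1)]
    while stack:
        i, j = stack.pop()
        if i >= j:
            continue
        if score[i][j] == score[i + 1][j]:
            stack.append((i + 1, j))
        elif score[i][j] == score[i][j - 1]:
            stack.append((i, j - 1))
        elif can_pair(rna[i], rna[j]) and score[i][j] == score[i + 1][j - 1] + 1:
            pairs.append((i, j))
            stack.append((i + 1, j - 1))
        else:
            for k in range(i + 1, j):
                if score[i][j] == score[i][k] + score[k + 1][j]:
                    stack.append((k + 1, j))
                    stack.append((i, k))
                    break
    return sorted(pairs)


def dot_bracket(n, pairs):
    """Render the pairs as a dot-bracket string of length n."""
    out = ["."] * n
    for i, j in pairs:
        out[i], out[j] = "(", ")"
    return "".join(out)


if __name__ == "__main__":
    seq = "GGGAAACCC"
    table = nussinov_fill(seq)
    print(dot_bracket(len(seq), traceback(seq, table)))
```

Several structures can share the same maximum score, so the traceback returns just one of them; which one depends on the order in which the four options are checked.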

5 Results
Input: 'GACACGACGA'
Predicted output: [figure] Actual output: [figure]
So with my short test sequence, the program ran well. This is an image I drew of the predicted optimal secondary structure given this particular strand of RNA. And here is the program’s output. And as you can see, they match.

6 Results
Input: alanine tRNA sequence
Predicted output: [figure] Actual output: [figure]
However, I had some more problems when I tried to run the program on the sequence of alanine tRNA. This is what the secondary structure of this particular tRNA looks like. And this is the output structure my program gave. So as you can see, they do not match. But this result did reveal some ways I might be able to improve the program. First, I could allow for the wobble pair, which is G paired with U. Second, I could somehow change the weights of certain scores to take into account the number of consecutively paired bases, because having more of these is more favorable.
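As an illustration of the first proposed improvement, the sketch below shows one possible way to make the wobble pair optional in a maximum-pairing score. It is an assumption about how this could be done, not the program from the talk: `can_pair`, `max_pairs`, and the `allow_wobble` flag are hypothetical names, and the scoring still just counts pairs without weighting consecutive ones.

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def can_pair(a, b, allow_wobble=False):
    """True if bases a and b may pair; G-U is accepted only on request."""
    if (a, b) in WATSON_CRICK:
        return True
    return allow_wobble and (a, b) in WOBBLE


def max_pairs(rna, allow_wobble=False):
    """Maximum number of non-crossing base pairs in rna (pair count only)."""
    n = len(rna)
    if n == 0:
        return 0
    score = [[0] * n for _ in range(n)]
    for span in range(1, n):
        for i in range(n - span):
            j = i + span
            # leave rna[i] or rna[j] unpaired
            best = max(score[i + 1][j], score[i][j - 1])
            # pair rna[i] with rna[j] if the pairing rule allows it
            if can_pair(rna[i], rna[j], allow_wobble):
                best = max(best, score[i + 1][j - 1] + 1)
            # combine two adjacent substructures
            for k in range(i + 1, j):
                best = max(best, score[i][k] + score[k + 1][j])
            score[i][j] = best
    return score[0][n - 1]


if __name__ == "__main__":
    seq = "GGGUUU"
    print(max_pairs(seq), max_pairs(seq, allow_wobble=True))
```

Keeping the pairing rule in a single predicate means the matrix fill and backtracking steps would not need to change when the set of allowed pairs does.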

