Finding Common RNA Pseudoknot Structures in Polynomial Time Patricia Evans University of New Brunswick.


1 Finding Common RNA Pseudoknot Structures in Polynomial Time Patricia Evans University of New Brunswick

2 6-Jul-2006 Ribonucleic Acid (RNA)
• RNA is an organic molecule that forms long chains
• Each position in the chain holds one of 4 bases: A, G, C, U
• RNA can code gene information (messenger RNA, viral RNA)
• RNA can also form structures and take on many functions within a cell (e.g. tRNA, rRNA and other RNA-protein complexes)

3 RNA Bonds and Structures
• RNA bases can form bonds, in a largely pairwise fashion (A-U, G-C, some exceptions)
• RNA is single-stranded; its bonds form mostly within a single chain, folding it into a complex structure held together by those bonds
• RNA function is affected by its structure
• If two bases are paired, it often does not matter what they are; only unpaired bases are 'available'
• Common substructures can help investigate functional relationships

4 RNA Structural Complexity
• Deceptively simple, since bases are usually paired
• Stems are formed from two bonded strands, in an antiparallel orientation
• These simple bonds can, however, combine to form complex structures
• Some are nested (stems within loops)
• Some are knotted (stems effectively crossing)
• RNA molecules can be very long (e.g. > 1000 bases), confounding exhaustive comparison techniques

5 Arc Representation
• At a bond level, the bond structure of an RNA molecule can be represented as arcs superimposed onto the "stretched" RNA sequence.
• Each arc represents a bonded pair, and the structure is a set of pairs.
[Figure panels: Nested Structure; Pseudoknot]

6 Maximum Common Ordered Substructure
Input: structures $S_1$ and $S_2$, where $S_1$ is a set of pairs over $n_1$ positions and $S_2$ is a set of pairs over $n_2$ positions
Output: a maximum substructure $S_c$ with $n_c$ positions, such that there exist 1-1 functions $f_1$ and $f_2$ where:

7 General Structures are Hard
• The general MCOS problem, allowing positions to bond multiple times, is NP-hard (Goldman et al., 1999)
• Comparing two RNA (pair-bond) structures is polynomial if they do not have knots (Bafna et al., 1995)
• A structure $S$ has a knot if and only if there are pairs $(i_1, j_1)$ and $(i_2, j_2)$ in $S$ where $i_1 < i_2 < j_1 < j_2$, i.e. the arcs interleave as ( [ ) ]
• Comparing knotted arc structures is NP-hard for arbitrary pair-bond structures (Evans 1999, and others)
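
The knot condition above can be checked directly on a set of arcs. The following is a minimal sketch, assuming each arc is stored as a tuple (i, j) with i < j; the function names `crosses` and `has_knot` are illustrative and do not come from the talk.

```python
from itertools import combinations


def crosses(a, b):
    """Return True if arcs a and b interleave as ( [ ) ]."""
    # Order the two arcs by left endpoint so that i1 <= i2.
    (i1, j1), (i2, j2) = sorted([a, b])
    return i1 < i2 < j1 < j2


def has_knot(structure):
    """Return True if any two arcs of the structure cross."""
    # Brute-force check over all pairs of arcs (quadratic in the arc count).
    return any(crosses(a, b) for a, b in combinations(structure, 2))


nested = {(0, 3), (1, 2)}        # ( ( ) )
pseudoknot = {(0, 2), (1, 3)}    # ( [ ) ]
print(has_knot(nested), has_knot(pseudoknot))
```

The pairwise scan is chosen for clarity only; it is not meant as an efficient test.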

8 Comparing Knot-Free Structures
If the two structures are composed only of nested bonds, they can be compared in $O(n^4)$ time using a dynamic programming algorithm that computes:
$$M[i_1, j_1, i_2, j_2] = \max \begin{cases} M[i_1, j_1 - 1, i_2, j_2] \\ M[i_1, j_1, i_2, j_2 - 1] \\ M[i_1, k_1 - 1, i_2, k_2 - 1] + M[k_1 + 1, j_1 - 1, k_2 + 1, j_2 - 1] + 1 & \text{if } (k_1, j_1) \in S_1 \text{ and } (k_2, j_2) \in S_2 \end{cases}$$
The answer is in $M[0, |A| - 1, 0, |B| - 1]$, where $|A|$ and $|B|$ are the lengths of the two sequences (result: Bafna et al. 1995)
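
The recurrence above can be sketched as a memoised recursion. This is an illustration under stated assumptions, not the author's implementation: arcs are tuples (i, j) with i < j over positions 0..n-1, each position is bonded at most once, empty segments score 0, and an arc is only used when its left endpoint lies inside the current segment. The name `max_common_nested` is hypothetical.

```python
from functools import lru_cache


def max_common_nested(s1, n1, s2, n2):
    """Largest number of arcs in a common substructure of two knot-free structures."""
    # Map each right endpoint to its left partner (assumes one bond per position).
    left1 = {j: i for i, j in s1}
    left2 = {j: i for i, j in s2}

    @lru_cache(maxsize=None)
    def M(i1, j1, i2, j2):
        # Assumed base case: an empty segment in either structure matches nothing.
        if j1 < i1 or j2 < i2:
            return 0
        # Drop the last position of one segment or the other.
        best = max(M(i1, j1 - 1, i2, j2), M(i1, j1, i2, j2 - 1))
        k1, k2 = left1.get(j1), left2.get(j2)
        # Match arc (k1, j1) with arc (k2, j2) when both lie inside the segments.
        if k1 is not None and k2 is not None and k1 >= i1 and k2 >= i2:
            best = max(best,
                       M(i1, k1 - 1, i2, k2 - 1)
                       + M(k1 + 1, j1 - 1, k2 + 1, j2 - 1) + 1)
        return best

    return M(0, n1 - 1, 0, n2 - 1)


# ( ( ) ) against ( ( ) ( ) ): the two nested arcs can be matched.
print(max_common_nested({(0, 3), (1, 2)}, 4, {(0, 5), (1, 2), (3, 4)}, 6))
```

Plain recursion keeps the sketch close to the formula; long inputs would need an iterative fill or a raised recursion limit.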

9 Limited Context
• The polynomial time DP algorithm for nested bond structures works due to the context-free nature of segments in the nested structures.
• Knotted structures have segments that are not context-free, but we can limit the context that they need if we consider special cases that cover most known RNA structures.

10 Pseudoknot Observations
• Three mutually crossing arcs (a 3-knot) generally do not occur in RNA structures
• A structure without 3-knots can be separated into 2 layers of non-crossing arcs (2-colourable)
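
A possible way to compute such a 2-layer separation is sketched below. It tests 2-colourability directly by breadth-first colouring of the "crossing graph" (arcs as nodes, an edge between any two arcs that cross), which is an assumption of this sketch rather than a method described in the talk. Arcs are assumed to be tuples (i, j) with i < j, and the names `crosses` and `two_layers` are illustrative.

```python
from collections import deque


def crosses(a, b):
    """Return True if arcs a and b interleave as ( [ ) ]."""
    (i1, j1), (i2, j2) = sorted([a, b])
    return i1 < i2 < j1 < j2


def two_layers(structure):
    """Split arcs into two layers with no crossings inside a layer, or return None."""
    arcs = sorted(structure)
    colour = {}
    for start in arcs:
        if start in colour:
            continue
        # Colour one connected component of the crossing graph at a time.
        colour[start] = 0
        queue = deque([start])
        while queue:
            a = queue.popleft()
            for b in arcs:
                if not crosses(a, b):
                    continue
                if b not in colour:
                    colour[b] = 1 - colour[a]
                    queue.append(b)
                elif colour[b] == colour[a]:
                    # Two crossing arcs were forced into the same layer.
                    return None
    layer0 = [a for a in arcs if colour[a] == 0]
    layer1 = [a for a in arcs if colour[a] == 1]
    return layer0, layer1


print(two_layers({(0, 2), (1, 3)}))          # a simple pseudoknot
print(two_layers({(0, 3), (1, 4), (2, 5)}))  # three mutually crossing arcs
```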

11 Pseudoknot Observations
• Crossing arcs tend to be grouped into crossing stems, though there can be some nesting
• Interleaving between left and right endpoints does not usually occur, and would be biochemically unstable

12 Forming LSPs
To take advantage of these restrictions, we will consider that bonds group into stems, and that a stem can break the RNA sequence into linked segment pairs (LSPs): a matched pair of segments that are, or may be, linked by bonds.
[Figure: a segment (i, j) and an LSP (h, l, i, j); an LSP is an ordered segment pair]

13 Merging LSPs
The key to the use of LSPs is our ability to merge them to construct a larger LSP, as shown. The restrictions allow us to consider only pairwise LSP merges: we can always fill at least one existing "hole" when we merge.

14 Structure Pieces
We can then consider two types of comparison cases, and build up our results from them:
• Segment-to-segment (4 dimensions)
• LSP-to-LSP (8 dimensions)
We do not need to match LSPs to segments, as long as we allow both segments and LSPs to be broken into parts.

15 Segment Cases
Segment cases are based on the BMR95 algorithm (Bafna et al. 1995).
s1: value of matching segment $(i_1, j_1 - 1)$ to $(i_2, j_2)$
s2: value of matching segment $(i_1, j_1)$ to $(i_2, j_2 - 1)$
s3: if $j_1$ links to $k_1$ and $j_2$ links to $k_2$: 1 + (value of matching segment $(i_1, k_1 - 1)$ to $(i_2, k_2 - 1)$) + (value of matching segment $(k_1 + 1, j_1 - 1)$ to $(k_2 + 1, j_2 - 1)$)

16 Creating an LSP
While a matched arc can break a segment into two (as in case s3), it can also create an LSP, if we allow the segments to be linked.
s4: 1 + (value of matching LSP $(i_1, k_1 - 1, k_1 + 1, j_1 - 1)$ to $(i_2, k_2 - 1, k_2 + 1, j_2 - 1)$)

17 LSP Cases: Simple
The first cases for matching LSPs are based on the segment matching: two paring cases and one split.
a1: value of matching LSP $(h_1, l_1, i_1, j_1 - 1)$ to $(h_2, l_2, i_2, j_2)$
a2: value of matching LSP $(h_1, l_1, i_1, j_1)$ to $(h_2, l_2, i_2, j_2 - 1)$
a3: (value of matching segment $(h_1, l_1)$ to $(h_2, l_2)$) + (value of matching segment $(i_1, j_1)$ to $(i_2, j_2)$)
Case a3 can be used with s4 to allow new LSPs to be made from right segments of matched LSPs.

18 LSP Cases: Within Right
If the arcs link to positions within the right side of the LSPs, then the segments within the arcs can be the right sides of new LSPs.
a4: 1 + (value of matching LSP $(h_1, l_1, k_1 + 1, j_1 - 1)$ to $(h_2, l_2, k_2 + 1, j_2 - 1)$) + (value of matching segment $(i_1, k_1 - 1)$ to $(i_2, k_2 - 1)$)

19 LSP Cases: Within Right
Alternatively, the arcs could bound segments that are within the structure of the right side of the LSPs.
a5: 1 + (value of matching LSP $(h_1, l_1, i_1, k_1 - 1)$ to $(h_2, l_2, i_2, k_2 - 1)$) + (value of matching segment $(k_1 + 1, j_1 - 1)$ to $(k_2 + 1, j_2 - 1)$)

20 LSP Cases: Cross Left
If the arcs cross to the left side of the LSPs, then their left endpoints ($k$) can form a hole to start new LSPs.
a6: 1 + (value of matching LSP $(h_1, k_1 - 1, k_1 + 1, l_1)$ to $(h_2, k_2 - 1, k_2 + 1, l_2)$) + (value of matching segment $(i_1, j_1 - 1)$ to $(i_2, j_2 - 1)$)

21 LSP Cases: Cross Left
The arcs can instead separate the LSP within them from initial segments.
a7: 1 + (value of matching LSP $(k_1 + 1, l_1, i_1, j_1 - 1)$ to $(k_2 + 1, l_2, i_2, j_2 - 1)$) + (value of matching segment $(h_1, k_1 - 1)$ to $(h_2, k_2 - 1)$)
We do not try to link the first and third segments, as they would form part of a 3-knot.

22 LSP Cases: Cross Left
Matched arcs can break the LSPs into three segments.
a8: 1 + (value of matching segment $(h_1, k_1 - 1)$ to $(h_2, k_2 - 1)$) + (value of matching segment $(k_1 + 1, l_1)$ to $(k_2 + 1, l_2)$) + (value of matching segment $(i_1, j_1 - 1)$ to $(i_2, j_2 - 1)$)

23 LSP Cases: Crossed LSPs
Arcs crossing existing LSPs could need a merging of the LSP types in a6 and a7, but then we need to consider all places where the split can occur.
a9: 1 + max over all $s_1, s_2$ with $k_1 < s_1 < l_1$ and $k_2 < s_2 < l_2$ of [(value of matching LSP $(h_1, k_1 - 1, s_1 + 1, l_1)$ to $(h_2, k_2 - 1, s_2 + 1, l_2)$) + (value of matching LSP $(k_1 + 1, s_1, i_1, j_1 - 1)$ to $(k_2 + 1, s_2, i_2, j_2 - 1)$)]

24 Dynamic Programming
• These cases take care of all possibilities for how LSPs and segments can be broken down, and their results merged.
• They can be turned straightforwardly into a dynamic programming algorithm that uses two tables (one for segments, one for LSPs)
• The algorithm will need to weave between these two tables in a way consistent with the data

25 Making It Feasible
This algorithm makes very heavy use of multidimensional dynamic programming tables, and looks more of theoretical interest than practical use.
• Time complexity is high at $O(n^{10})$
• Space complexity is even more crucial at $O(n^8)$
Careful implementation is needed to avoid these theoretical worst cases.
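
To give a feel for these bounds, here is a small arithmetic sketch. It assumes both structures span n positions, counts every index combination of the 4-dimensional segment table and the 8-dimensional LSP table (ignoring ordering constraints), and uses a hypothetical 4 bytes per cell; the function names are illustrative only.

```python
def worst_case_cells(n):
    """Return (segment table cells, LSP table cells) for two structures over n positions."""
    # 4 indices for segment-to-segment matching, 8 for LSP-to-LSP matching.
    return n ** 4, n ** 8


def worst_case_bytes(n, bytes_per_cell=4):
    """Upper bound on table memory; bytes_per_cell is an assumed value."""
    seg, lsp = worst_case_cells(n)
    return (seg + lsp) * bytes_per_cell


for n in (50, 100, 800):
    seg, lsp = worst_case_cells(n)
    print(f"n={n}: segment cells={seg:.3e}, LSP cells={lsp:.3e}, "
          f"bytes={worst_case_bytes(n):.3e}")
```

Even at n = 100 the full LSP table has 10^16 cells, which is why the full table cannot simply be allocated up front.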

26 Engineering Space and Time
• Space and time usage can be minimised by eliminating those computations that are not needed.
• The recurrence should be computed recursively (using memoisation) to enable the data to help this pruning
• Note that most segment pairs will not correspond to LSPs consistent with a given arc structure
• The table can be allocated dynamically, in layers, so that a hyperplane of the table is only allocated if it will contain an entry (and note $h < l < i < j$)
• We can reduce this further by limiting hyperplane sizes to the corresponding segment within an arc
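
One way to picture layered, on-demand allocation is a table whose hyperplanes are created only when a value is first stored in them. The sketch below is an assumption-laden illustration, not the implementation used in the talk: the class name `LayeredTable`, the choice of the two left segments (h, l) as the layer key, and the use of Python dictionaries are all hypothetical.

```python
class LayeredTable:
    """Sparse memo table for LSP-to-LSP values, allocated one layer at a time."""

    def __init__(self):
        # Outer key: left segments of both LSPs; inner key: right segments.
        self._layers = {}

    def get(self, lsp1, lsp2):
        """Return the stored value for the LSP pair, or None if not computed."""
        layer = self._layers.get((lsp1[:2], lsp2[:2]))
        if layer is None:
            return None
        return layer.get((lsp1[2:], lsp2[2:]))

    def set(self, lsp1, lsp2, value):
        """Store a value, creating the layer only when it is first needed."""
        layer = self._layers.setdefault((lsp1[:2], lsp2[:2]), {})
        layer[(lsp1[2:], lsp2[2:])] = value

    def layers(self):
        """Number of hyperplanes that have actually been allocated."""
        return len(self._layers)

    def cells(self):
        """Number of entries that have actually been stored."""
        return sum(len(layer) for layer in self._layers.values())


table = LayeredTable()
table.set((0, 3, 6, 9), (1, 4, 7, 10), 2)   # LSPs given as (h, l, i, j)
table.set((0, 3, 7, 9), (1, 4, 8, 10), 1)   # same left segments, same layer
print(table.get((0, 3, 6, 9), (1, 4, 7, 10)), table.layers(), table.cells())
```

A memoised recursion would call `get` before computing a case and `set` afterwards, so only LSP pairs reached from the actual arc structure ever occupy memory.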

27 Experiments
• Having reduced the space, experiments were run on a variety of RNA structural data to determine if the algorithm is of practical use
  • Large Subunit ribosomal RNA structures
  • RNase P structures
  • Mosaic Virus structures
• Structures of up to 400 arcs were compared effectively in 4 GB of space, with correct substructures found
  • allocating about $10^{-14}$ of the theoretical table
• Even the $O(n^4)$ recurrence for unknotted structures would need too much space without the space saving technique

28 Conclusion and Future Work
• Under these restrictions, RNA bond structures can be compared in polynomial time
• With careful case pruning, the algorithm is feasible and produces useful results
• The problem of comparing general 2-colourable bond structures (allowing endpoint interleaving) is still open
• Extensions to pattern discovery for multiple structures can be explored
• Weights can be added to model RNA more accurately

