Finding Common RNA Pseudoknot Structures in Polynomial Time
Patricia Evans, University of New Brunswick



6-Jul Ribonucleic Acid (RNA)
- RNA is an organic molecule that forms long chains.
- Each position in the chain holds one of 4 bases: A, G, C, U.
- RNA can encode gene information (messenger RNA, viral RNA).
- RNA can also form structures and perform many functions within a cell (e.g. tRNA, rRNA and other RNA-protein complexes).

RNA Bonds and Structures
- RNA bases can form bonds, largely in pairs (A-U, G-C, with some exceptions).
- RNA is single-stranded; its bonds form mostly within a single chain, folding it into a complex structure held together by those bonds.
- RNA function is affected by its structure.
- If two bases are paired, it often does not matter which bases they are; only unpaired bases are 'available'.
- Common substructures can help investigate functional relationships.

RNA Structural Complexity
- Deceptively simple, since bases are usually paired.
- Stems are formed from two bonded strands in an antiparallel orientation.
- These simple bonds can, however, combine to form complex structures.
- Some are nested (stems within loops).
- Some are knotted (stems effectively crossing).
- RNA molecules can be very long (e.g. > 1000 bases), confounding exhaustive comparison techniques.

Arc Representation
- At the bond level, the structure of an RNA molecule can be represented as arcs superimposed on the "stretched" RNA sequence.
- Each arc represents a bonded pair, and the structure is a set of pairs.
[Figure: arc diagrams of a nested structure and a pseudoknot]

Maximum Common Ordered Substructure
Input: structures $S_1$ and $S_2$, where each structure is a set of pairs over $n_1$ and $n_2$ positions, respectively.
Output: a maximum substructure $S_c$ with $n_c$ positions, such that there exist 1-1 functions $f_1$ and $f_2$ where:

General Structures are Hard
- The general MCOS problem, which allows positions to bond multiple times, is NP-hard (Goldman et al., 1999).
- Comparing two RNA (pair-bond) structures takes polynomial time if they have no knots (Bafna et al., 1995).
- A structure $S$ has a knot if and only if there are pairs $(i_1, j_1)$ and $(i_2, j_2)$ in $S$ with $i_1 < i_2 < j_1 < j_2$, as in the bracket pattern ( [ ) ].
- Comparing knotted arc structures is NP-hard for arbitrary pair-bond structures (Evans 1999, and others).
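The knot condition can be checked directly on a set of arcs. The following is a minimal sketch, assuming each arc is a tuple `(i, j)` with `i < j`; the function names `crosses` and `has_knot` are illustrative and do not come from the slides. It uses a simple quadratic scan over all pairs, which is enough to show the definition at work.

```python
from itertools import combinations

def crosses(a, b):
    """Return True if arcs a and b interleave: i1 < i2 < j1 < j2."""
    # Sort so that the arc with the smaller left endpoint comes first.
    (i1, j1), (i2, j2) = sorted((a, b))
    return i1 < i2 < j1 < j2

def has_knot(arcs):
    """Return True if any two arcs in the structure cross."""
    return any(crosses(a, b) for a, b in combinations(arcs, 2))

# Assumed toy structures: one nested, one with a single crossing.
nested = {(0, 9), (1, 4), (5, 8)}
pseudoknot = {(0, 5), (3, 8)}
```

Here `has_knot(nested)` is false, because every pair of arcs is either nested or disjoint, while `has_knot(pseudoknot)` is true since 0 < 3 < 5 < 8.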

Comparing Knot-Free Structures
If the two structures are composed only of nested bonds, they can be compared in $O(n^4)$ time using a dynamic programming algorithm that computes:
$$M[i_1, j_1, i_2, j_2] = \max \begin{cases} M[i_1, j_1 - 1, i_2, j_2] \\ M[i_1, j_1, i_2, j_2 - 1] \\ M[i_1, k_1 - 1, i_2, k_2 - 1] + M[k_1 + 1, j_1 - 1, k_2 + 1, j_2 - 1] + 1 & \text{if } (k_1, j_1) \in S_1 \text{ and } (k_2, j_2) \in S_2 \end{cases}$$
The answer is in $M[0, |A| - 1, 0, |B| - 1]$, with $|A|$ and $|B|$ being the two sequence lengths (result: Bafna et al., 1995).
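The recurrence can be sketched as a memoised recursive function. This is an illustration under stated assumptions, not the authors' implementation: arcs are `(k, j)` tuples with `k < j`, each position takes part in at most one arc, the score is the number of matched arcs, and an empty segment scores 0. The name `mcos_nested` is hypothetical.

```python
from functools import lru_cache

def mcos_nested(s1, n1, s2, n2):
    """Size (in arcs) of a maximum common ordered substructure of two
    knot-free structures s1 and s2 over n1 and n2 positions."""
    # Map each right endpoint to its left endpoint (pair-bond assumption).
    left1 = {j: k for k, j in s1}
    left2 = {j: k for k, j in s2}

    @lru_cache(maxsize=None)
    def M(i1, j1, i2, j2):
        # Base case: an empty segment on either side matches nothing.
        if j1 < i1 or j2 < i2:
            return 0
        # Drop the last position of the first or second segment.
        best = max(M(i1, j1 - 1, i2, j2), M(i1, j1, i2, j2 - 1))
        # Match arc (k1, j1) with arc (k2, j2) if both lie inside the segments.
        k1, k2 = left1.get(j1), left2.get(j2)
        if k1 is not None and k2 is not None and k1 >= i1 and k2 >= i2:
            best = max(best,
                       M(i1, k1 - 1, i2, k2 - 1)
                       + M(k1 + 1, j1 - 1, k2 + 1, j2 - 1) + 1)
        return best

    return M(0, n1 - 1, 0, n2 - 1)
```

For example, comparing three nested arcs `{(0, 5), (1, 4), (2, 3)}` with two nested arcs `{(0, 3), (1, 2)}` gives 2, while comparing two side-by-side arcs `{(0, 1), (2, 3)}` with two nested arcs gives 1, because ordering must be preserved. The recursion depth grows with the combined sequence length, so long inputs would need an iterative table fill instead.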

Limited Context
- The polynomial-time DP algorithm for nested bond structures works because segments in nested structures are context-free.
- Knotted structures have segments that are not context-free, but we can limit the context they need by considering special cases that cover most known RNA structures.

Pseudoknot Observations
- Three mutually crossing arcs (a 3-knot) generally do not occur in RNA structures.
- A structure without 3-knots can be separated into 2 layers of non-crossing arcs (it is 2-colourable).
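One way to obtain the two layers is to 2-colour the graph whose vertices are arcs and whose edges join crossing arcs. The sketch below does this with a breadth-first search and tests 2-colourability directly rather than assuming it; it returns `None` when no such split exists, as happens for a 3-knot. The function name `two_layers` and the tuple representation of arcs are assumptions made for illustration.

```python
from collections import deque

def two_layers(arcs):
    """Split arcs into two layers of mutually non-crossing arcs,
    or return None if the crossing graph is not 2-colourable."""
    arcs = sorted(arcs)

    def crosses(a, b):
        (i1, j1), (i2, j2) = sorted((a, b))
        return i1 < i2 < j1 < j2

    colour = {}
    for start in arcs:
        if start in colour:
            continue
        colour[start] = 0
        queue = deque([start])
        while queue:
            a = queue.popleft()
            for b in arcs:
                if b == a or not crosses(a, b):
                    continue
                if b not in colour:
                    # A crossing arc must go in the other layer.
                    colour[b] = 1 - colour[a]
                    queue.append(b)
                elif colour[b] == colour[a]:
                    return None  # two crossing arcs forced into one layer
    return ([a for a in arcs if colour[a] == 0],
            [a for a in arcs if colour[a] == 1])
```

With the arcs `(0, 5)`, `(1, 4)`, `(3, 8)` and `(6, 9)`, the arc `(3, 8)` crosses each of the others, so it ends up alone in the second layer. The three mutually crossing arcs `(0, 4)`, `(2, 6)`, `(3, 8)` yield `None`.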

Pseudoknot Observations
- Crossing arcs tend to be grouped into crossing stems, though there can be some nesting.
- Interleaving between left and right endpoints does not usually occur, and would be biochemically unstable.

Forming LSPs
To take advantage of these restrictions, we assume that bonds group into stems, and that a stem can break the RNA sequence into linked segment pairs (LSPs): matched pairs of segments that are, or may be, linked by bonds.
[Figure: a segment (i, j) and an LSP (h, l, i, j), an ordered segment pair]

Merging LSPs
The key to using LSPs is that they can be merged to construct a larger LSP. The restrictions allow us to consider only pairwise LSP merges: we can always fill at least one existing "hole" when we merge.

Structure Pieces
We can then consider two types of comparison cases, and build up our results from them:
- Segment-to-segment (4 dimensions)
- LSP-to-LSP (8 dimensions)
We do not need to match LSPs to segments, as long as we allow both segments and LSPs to be broken into parts.

Segment Cases
Segment cases are based on the BMR95 algorithm (Bafna et al., 1995).
s1: value of matching segment $(i_1, j_1 - 1)$ to $(i_2, j_2)$
s2: value of matching segment $(i_1, j_1)$ to $(i_2, j_2 - 1)$
s3: if $j_1$ links to $k_1$ and $j_2$ links to $k_2$: 1 + (value of matching segment $(i_1, k_1 - 1)$ to $(i_2, k_2 - 1)$) + (value of matching segment $(k_1 + 1, j_1 - 1)$ to $(k_2 + 1, j_2 - 1)$)

Creating an LSP
While a matched arc can break a segment into two (as in case s3), it can also create an LSP if we allow the segments to be linked.
s4: 1 + (value of matching LSP $(i_1, k_1 - 1, k_1 + 1, j_1 - 1)$ to $(i_2, k_2 - 1, k_2 + 1, j_2 - 1)$)
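The index bookkeeping behind cases s3 and s4 can be made concrete with two small record types. This is only a sketch: the type names `Segment` and `LSP` and the helpers `split_by_arc` and `link_by_arc` are hypothetical, and the arc is assumed to run from position `k` to the segment's right end `j`.

```python
from typing import NamedTuple

class Segment(NamedTuple):
    i: int  # first position
    j: int  # last position (inclusive)

class LSP(NamedTuple):
    h: int  # left segment start
    l: int  # left segment end
    i: int  # right segment start
    j: int  # right segment end

    @property
    def left(self):
        return Segment(self.h, self.l)

    @property
    def right(self):
        return Segment(self.i, self.j)

def split_by_arc(seg, k):
    """Case s3: arc (k, seg.j) breaks the segment into two independent
    segments, one before the arc and one enclosed by it."""
    return Segment(seg.i, k - 1), Segment(k + 1, seg.j - 1)

def link_by_arc(seg, k):
    """Case s4: the same two pieces, kept together as one LSP so that
    later arcs may link them."""
    return LSP(seg.i, k - 1, k + 1, seg.j - 1)
```

For a segment `(2, 10)` with an arc from 5 to 10, case s3 gives the segments `(2, 4)` and `(6, 9)`, and case s4 gives the LSP `(2, 4, 6, 9)` whose left and right parts are those same two segments.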

LSP Cases – Simple
The first cases for matching LSPs follow the segment matching: two that pare a position off the right end and one that splits the LSP.
a1: value of matching LSP $(h_1, l_1, i_1, j_1 - 1)$ to $(h_2, l_2, i_2, j_2)$
a2: value of matching LSP $(h_1, l_1, i_1, j_1)$ to $(h_2, l_2, i_2, j_2 - 1)$
a3: (value of matching segment $(h_1, l_1)$ to $(h_2, l_2)$) + (value of matching segment $(i_1, j_1)$ to $(i_2, j_2)$)
Case a3 can be used with s4 to allow new LSPs to be made from the right segments of matched LSPs.

LSP Cases – Within Right
If the arcs link to positions within the right side of the LSPs, then the segments within the arcs can be the right sides of new LSPs.
a4: 1 + (value of matching LSP $(h_1, l_1, k_1 + 1, j_1 - 1)$ to $(h_2, l_2, k_2 + 1, j_2 - 1)$) + (value of matching segment $(i_1, k_1 - 1)$ to $(i_2, k_2 - 1)$)

LSP Cases – Within Right
Alternatively, the arcs could bound segments that are within the structure of the right side of the LSPs.
a5: 1 + (value of matching LSP $(h_1, l_1, i_1, k_1 - 1)$ to $(h_2, l_2, i_2, k_2 - 1)$) + (value of matching segment $(k_1 + 1, j_1 - 1)$ to $(k_2 + 1, j_2 - 1)$)

LSP Cases – Cross Left
If the arcs cross to the left side of the LSPs, then their left endpoints ($k$) can form a hole to start new LSPs.
a6: 1 + (value of matching LSP $(h_1, k_1 - 1, k_1 + 1, l_1)$ to $(h_2, k_2 - 1, k_2 + 1, l_2)$) + (value of matching segment $(i_1, j_1 - 1)$ to $(i_2, j_2 - 1)$)

LSP Cases – Cross Left
The arcs can instead separate the LSP within them from initial segments.
a7: 1 + (value of matching LSP $(k_1 + 1, l_1, i_1, j_1 - 1)$ to $(k_2 + 1, l_2, i_2, j_2 - 1)$) + (value of matching segment $(h_1, k_1 - 1)$ to $(h_2, k_2 - 1)$)
We do not try to link the first and third segments, as they would form part of a 3-knot.

LSP Cases – Cross Left
Matched arcs can break the LSPs into three segments.
a8: 1 + (value of matching segment $(h_1, k_1 - 1)$ to $(h_2, k_2 - 1)$) + (value of matching segment $(k_1 + 1, l_1)$ to $(k_2 + 1, l_2)$) + (value of matching segment $(i_1, j_1 - 1)$ to $(i_2, j_2 - 1)$)

LSP Cases – Crossed LSPs
Arcs crossing existing LSPs may require merging the LSP types of a6 and a7, but then we need to consider every place where the split can occur.
a9: 1 + max over all $s_1, s_2$ with $k_1 < s_1 < l_1$ and $k_2 < s_2 < l_2$ of [(value of matching LSP $(h_1, k_1 - 1, s_1 + 1, l_1)$ to $(h_2, k_2 - 1, s_2 + 1, l_2)$) + (value of matching LSP $(k_1 + 1, s_1, i_1, j_1 - 1)$ to $(k_2 + 1, s_2, i_2, j_2 - 1)$)]
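Case a9 is the only case that loops over split points, which is where the extra quadratic factor on top of the 8-dimensional table comes from. The sketch below isolates that loop. It assumes a hypothetical callable `lsp_value(lsp1, lsp2)` that returns the value of matching two LSPs, each given as a tuple `(h, l, i, j)`; how that value is computed is outside the sketch.

```python
def case_a9(lsp_value, lsp1, k1, lsp2, k2):
    """Evaluate case a9 for arcs (k1, j1) and (k2, j2) crossing into the
    left segments. Returns None if no split point exists."""
    h1, l1, i1, j1 = lsp1
    h2, l2, i2, j2 = lsp2
    best = None
    # Try every split with k1 < s1 < l1 and k2 < s2 < l2.
    for s1 in range(k1 + 1, l1):
        for s2 in range(k2 + 1, l2):
            outer = lsp_value((h1, k1 - 1, s1 + 1, l1),
                              (h2, k2 - 1, s2 + 1, l2))
            inner = lsp_value((k1 + 1, s1, i1, j1 - 1),
                              (k2 + 1, s2, i2, j2 - 1))
            if best is None or outer + inner > best:
                best = outer + inner
    # The leading 1 counts the matched pair of arcs itself.
    return None if best is None else 1 + best
```

Each call makes two table lookups per pair of split points, so an LSP pair with three candidate splits on each side costs 18 lookups.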

Dynamic Programming
- These cases cover all the ways in which LSPs and segments can be broken down and their results merged.
- They translate directly into a dynamic programming algorithm that uses two tables (one for segments, one for LSPs).
- The algorithm needs to weave between these two tables in a way consistent with the data.

Making It Feasible
This algorithm makes very heavy use of multidimensional dynamic programming tables, and appears to be of more theoretical interest than practical use.
- Time complexity is high, at $O(n^{10})$.
- Space complexity is even more critical, at $O(n^8)$.
Careful implementation is needed to avoid these theoretical worst cases.
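To get a feel for the $O(n^8)$ bound, the following sketch counts index tuples for the LSP table. It compares a naive dense table over all 8 indices with one restricted to tuples that respect the LSP index order $h < l < i < j$ in each structure. The function name `table_entries` is hypothetical, and the strict ordering is taken as an assumption about valid LSP indices.

```python
from math import comb

def table_entries(n1, n2):
    """Return (dense, ordered) entry counts for the 8-D LSP table.

    dense:   every combination of 4 indices per structure
    ordered: only index tuples with h < l < i < j in each structure
    """
    dense = (n1 ** 4) * (n2 ** 4)
    ordered = comb(n1, 4) * comb(n2, 4)
    return dense, ordered
```

For `n1 = n2 = 10` this gives 10^8 dense entries against 210 × 210 = 44,100 ordered ones. The ordering alone removes a large constant factor but not the eighth-power growth, which is why further pruning is still needed.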

Engineering Space and Time
- Space and time usage can be minimised by eliminating computations that are not needed.
- The recurrence should be computed recursively (using memoisation) so that the data can drive this pruning.
- Note that most segment pairs will not correspond to LSPs consistent with a given arc structure.
- The table can be allocated dynamically, in layers, so that a hyperplane of the table is only allocated if it will contain an entry (and note that $h < l < i < j$).
- We can reduce this further by limiting hyperplane sizes to the corresponding segment within an arc.
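The layered allocation idea can be illustrated with a sparse table built from nested dictionaries. This is a sketch of the general technique, not the implementation used in the talk: the class name `LayeredTable` is hypothetical, and keying each layer by the two left segments `(h1, l1, h2, l2)` is an assumed choice of hyperplane.

```python
class LayeredTable:
    """Sparse 8-D memo table. A layer (hyperplane) is created only when
    the first entry belonging to it is stored."""

    def __init__(self):
        self._layers = {}  # left_key -> {right_key -> value}

    def get(self, left_key, right_key, default=None):
        layer = self._layers.get(left_key)
        if layer is None:
            return default  # layer never allocated
        return layer.get(right_key, default)

    def set(self, left_key, right_key, value):
        # Allocate the layer lazily on first write.
        self._layers.setdefault(left_key, {})[right_key] = value

    def allocated_layers(self):
        return len(self._layers)

    def allocated_entries(self):
        return sum(len(layer) for layer in self._layers.values())
```

A memoised recurrence would call `get` before computing a value and `set` afterwards, so only the index combinations actually reached from the input structures ever occupy memory.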

Experiments
- With the space reduced, experiments were run on a variety of RNA structural data to determine whether the algorithm is of practical use:
  - Large Subunit ribosomal RNA structures
  - RNase P structures
  - Mosaic Virus structures
- Structures of up to 400 arcs were compared effectively in 4 GB of space, with correct substructures found, allocating only a fraction of the theoretical table.
- Even the $O(n^4)$ recurrence for unknotted structures would need too much space without the space-saving technique.

Conclusion and Future Work
- Under these restrictions, RNA bond structures can be compared in polynomial time.
- With careful case pruning, the algorithm is feasible and produces useful results.
- The problem of comparing general 2-colourable bond structures (allowing endpoint interleaving) is still open.
- Extensions to pattern discovery for multiple structures can be explored.
- Weights can be added to model RNA more accurately.