Alain Denise, Bioinformatics, LRI Orsay (UMR CNRS 8623), Université Paris-Sud 11. Algorithms for comparing RNA secondary structures.

Presentation transcript:

Algorithms for comparing RNA secondary structures
Alain Denise, Bioinformatics, LRI Orsay (UMR CNRS 8623), Université Paris-Sud 11

The multiple roles of RNA [figures © Ebbe Sloth Andersen]

Why RNA?
- Present in all cellular processes
- The only molecule that can act both as a genome and as a catalyst
- Origin of life (?): the RNA world
- A frequent target for antibiotics
© E. Westhof 2005

RNA structure: tRNA
- Primary structure: GCGGAUUUAGCUCAGUUGGGAGAGCGCCAGACUGAAUAUCUGGAGGUCCUGUGUUCGAUCCCACAGAAUUCGCACCA
- Secondary structure [figure]
- Tertiary structure [figure]

RNA structure levels
RNA structure ~ graph of bounded degree containing a (known) Hamiltonian path.
Arc-annotated sequences:
- General (tertiary structure)
- Crossing (secondary structure with pseudoknots)
- Nested (secondary structure without pseudoknots)
- Plain (primary structure)
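The four levels can be told apart mechanically from the set of arcs. The sketch below assumes the usual definitions from the arc-annotated-sequence literature, which the slide does not spell out: plain means no arcs, nested means no shared endpoints and no crossing arcs, crossing means no shared endpoints, and general means no restriction. The function name `classify` and the representation of arcs as position pairs are illustrative choices.

```python
from typing import Iterable, Tuple

Arc = Tuple[int, int]  # (i, j): positions of the two bases joined by the arc


def classify(arcs: Iterable[Arc]) -> str:
    """Return the most restrictive level that the arc set belongs to."""
    # Normalise each arc so that its left endpoint comes first, then sort.
    arcs = sorted((min(i, j), max(i, j)) for i, j in arcs)
    if not arcs:
        return "plain"
    # A position used by two arcs (or twice by one arc) is only allowed
    # at the general level.
    endpoints = [p for arc in arcs for p in arc]
    if len(endpoints) != len(set(endpoints)):
        return "general"
    # Endpoints are distinct and arcs are sorted by left endpoint, so a
    # later arc (k, l) crosses (i, j) exactly when k < j < l.
    for a, (i, j) in enumerate(arcs):
        for k, l in arcs[a + 1:]:
            if k < j < l:
                return "crossing"
    return "nested"


print(classify([(0, 5), (1, 4)]))  # two stacked arcs
print(classify([(0, 3), (1, 4)]))  # a pseudoknot-like pair of arcs
```

The quadratic pairwise test keeps the sketch short; a stack-based scan would do the same check in linear time.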

RNA "bio-algorithmics"
- Structure prediction (given a sequence)
- Design: sequence prediction (given a structure)
- Structural pattern matching
- Comparison of two or several structures

Why compare RNA structures?
- How similar (or different) are they? Applications: classification, phylogeny.
- Which parts are the most similar between the two structures?
- Is the small one similar to a part of the large one?
Output: a comparison score plus a correspondence between the structures.

Editing and alignment
We are given a set of basic operations, each with an associated score.
Data: two structures S1 and S2.
- Edit(S1, S2): find a best-scoring sequence of operations that transforms S1 into S2.
- Align(S1, S2): find a structure S that contains S1 and S2 as substructures and maximizes Score(Edit(S1, S) + Edit(S2, S)).

Example: sequence comparison
Two sequences v = v1 v2 … vn and w = w1 w2 … wm.
Edit operations: ins(x, i), del(x, i), subs(x, y, i).
CHAT -> del(C, 1) -> HAT -> subs(H, R, 1) -> RAT
(For sequences, editing ~ alignment:
CHAT
-RAT)
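For plain sequences the best edit script can be found with the classical quadratic dynamic program. The sketch below is a standard unit-cost edit distance (every insertion, deletion or substitution costs 1, which is an assumption; the slide leaves the scores open), so it minimizes a cost rather than maximizing a score. The name `edit_distance` is illustrative.

```python
def edit_distance(v: str, w: str) -> int:
    """Minimum number of ins/del/subs operations turning v into w."""
    n, m = len(v), len(w)
    # d[i][j] = distance between the prefixes v[:i] and w[:j].
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i                      # delete the i first letters of v
    for j in range(1, m + 1):
        d[0][j] = j                      # insert the j first letters of w
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            same = v[i - 1] == w[j - 1]
            d[i][j] = min(
                d[i - 1][j - 1] + (0 if same else 1),  # match or subs
                d[i - 1][j] + 1,                       # del
                d[i][j - 1] + 1,                       # ins
            )
    return d[n][m]


print(edit_distance("CHAT", "RAT"))  # 2: del(C, 1) then subs(H, R, 1)
```

Tracing back through the table from d[n][m] yields the alignment itself, which is why editing and alignment coincide for sequences.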

Example: tree comparison

Editing vs. alignment
[Figure: the same pair of trees compared by alignment and by editing, with the Ins, Del and Subs operations marked.]
Ancestor relations are conserved.

The nested case
Secondary structures (without pseudoknots) -> tree comparison
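One way to make the reduction to trees concrete is to read a nested structure written in dot-bracket notation and build an ordered forest from it. The notation, the function name `to_forest` and the node labels (a base for an unpaired position, "X-Y" for a base pair) are assumptions made for this sketch; the slides do not fix an encoding.

```python
def to_forest(seq: str, structure: str):
    """Build an ordered forest from a sequence and its dot-bracket string.

    A tree is (label, children) and a forest is a tuple of trees.
    """
    if len(seq) != len(structure):
        raise ValueError("sequence and structure must have the same length")
    levels = [[]]   # forests under construction, innermost one last
    opened = []     # positions of the '(' that are still unmatched
    for pos, (base, sym) in enumerate(zip(seq, structure)):
        if sym == "(":
            opened.append(pos)
            levels.append([])
        elif sym == ")":
            if not opened:
                raise ValueError("unbalanced ')' at position %d" % pos)
            children = tuple(levels.pop())
            left = opened.pop()
            # The base pair becomes an internal node above what it encloses.
            levels[-1].append((seq[left] + "-" + base, children))
        elif sym == ".":
            levels[-1].append((base, ()))   # unpaired base: a leaf
        else:
            raise ValueError("unexpected symbol %r" % sym)
    if opened:
        raise ValueError("unbalanced '('")
    return tuple(levels[0])


print(to_forest("GAAAC", "(...)"))
```

Because nested arcs never cross, a single stack is enough; structures with pseudoknots cannot be encoded this way.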

Tree edit algorithm [Zhang, Shasha 1989]

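Tree edit algorithm [Zhang, Shasha 1989]; O(n^3 log n) [Klein 1998]
Notation: x(f) is the tree with root x and child forest f; ∘ concatenates trees into a forest. The root symbols were drawn as nodes on the slide and are written x and y here.

\[
\mathrm{Score}\big(x(f),\, y(f')\big) = \max
\begin{cases}
\mathrm{Subs}(x, y) + \mathrm{Score}(f,\, f') \\
\mathrm{Ins}(y) + \mathrm{Score}\big(x(f),\, f'\big) \\
\mathrm{Del}(x) + \mathrm{Score}\big(f,\, y(f')\big)
\end{cases}
\]

\[
\mathrm{Score}\big(x(f) \circ t_1 \circ \dots \circ t_p,\; y(f') \circ t'_1 \circ \dots \circ t'_q\big) = \max
\begin{cases}
\mathrm{Score}\big(x(f),\, y(f')\big) + \mathrm{Score}\big(t_1 \circ \dots \circ t_p,\; t'_1 \circ \dots \circ t'_q\big) \\
\mathrm{Ins}(y) + \mathrm{Score}\big(x(f) \circ t_1 \circ \dots \circ t_p,\; f' \circ t'_1 \circ \dots \circ t'_q\big) \\
\mathrm{Del}(x) + \mathrm{Score}\big(f \circ t_1 \circ \dots \circ t_p,\; y(f') \circ t'_1 \circ \dots \circ t'_q\big)
\end{cases}
\]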
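The recurrence above can be turned into a short memoized function. This is a minimal sketch, not the Zhang-Shasha or Klein algorithm: it caches on whole forests, so it does not reach their complexity bounds. The scores (0 for a match, -1 for any substitution, insertion or deletion) and the names `subs`, `INS`, `DEL` and `edit_score` are assumptions chosen for illustration.

```python
from functools import lru_cache

# A tree is (label, children); a forest is a tuple of trees.
INS = DEL = -1                      # illustrative scores (assumption)


def subs(x, y):
    return 0 if x == y else -1      # illustrative scores (assumption)


@lru_cache(maxsize=None)
def edit_score(F, G):
    """Best edit score between two ordered forests F and G."""
    if not F and not G:
        return 0
    if not F:                       # only insertions remain
        (y, g), U = G[0], G[1:]
        return INS + edit_score(F, g + U)
    if not G:                       # only deletions remain
        (x, f), T = F[0], F[1:]
        return DEL + edit_score(f + T, G)
    (x, f), T = F[0], F[1:]         # F = x(f) o T
    (y, g), U = G[0], G[1:]         # G = y(g) o U
    return max(
        # x is mapped to y: children with children, siblings with siblings.
        subs(x, y) + edit_score(f, g) + edit_score(T, U),
        # y is inserted: its children join the remaining forest of G.
        INS + edit_score(F, g + U),
        # x is deleted: its children join the remaining forest of F.
        DEL + edit_score(f + T, G),
    )


def leaf(label):
    return (label, ())


T1 = (("a", (("e", (leaf("b"), leaf("c"))), leaf("d"))),)
T2 = (("a", (leaf("b"), ("f", (leaf("c"), leaf("d"))))),)
print(edit_score(T1, T2))           # delete e, insert f
```

The first case inlines the substitution branch of the tree-to-tree recurrence; its insertion and deletion branches are already covered by the two other cases of the forest recurrence.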

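Tree alignment algorithm [Jiang, Wang, Zhang 1995]; O(n^4)
Notation: x(f) is the tree with root x and child forest f; ∘ concatenates trees into a forest. The root symbols were drawn as nodes on the slide and are written x and y here.

\[
\mathrm{Score}\big(x(f),\, y(f')\big) = \max
\begin{cases}
\mathrm{Subs}(x, y) + \mathrm{Score}(f,\, f') \\
\mathrm{Ins}(y) + \mathrm{Score}\big(x(f),\, f'\big) \\
\mathrm{Del}(x) + \mathrm{Score}\big(f,\, y(f')\big)
\end{cases}
\]

\[
\mathrm{Score}\big(x(f) \circ t_1 \circ \dots \circ t_p \,;\; y(f') \circ t'_1 \circ \dots \circ t'_q\big) = \max
\begin{cases}
\mathrm{Score}\big(x(f) \,;\, y(f')\big) + \mathrm{Score}\big(t_1 \circ \dots \circ t_p \,;\; t'_1 \circ \dots \circ t'_q\big) \\
\mathrm{Ins}(y) + \max_i \Big\{ \mathrm{Score}\big(x(f) \circ \dots \circ t_i \,;\; f'\big) + \mathrm{Score}\big(t_{i+1} \circ \dots \circ t_p \,;\; t'_1 \circ \dots \circ t'_q\big) \Big\} \\
\mathrm{Del}(x) + \max_j \Big\{ \mathrm{Score}\big(f \,;\; y(f') \circ t'_1 \circ \dots \circ t'_j\big) + \mathrm{Score}\big(t_1 \circ \dots \circ t_p \,;\; t'_{j+1} \circ \dots \circ t'_q\big) \Big\}
\end{cases}
\]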
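The alignment recurrence differs from the edit recurrence only in how an inserted or deleted root is handled, and a memoized sketch makes that visible. As before, the scores (0 for a match, -1 otherwise) and the names `align_score`, `size`, `INS`, `DEL` are illustrative assumptions, and caching on whole forests does not achieve the O(n^4) bound of Jiang, Wang and Zhang. The sketch also lets the split index range over the empty prefix, which is needed when the inserted node sits above nothing from the other forest.

```python
from functools import lru_cache

# A tree is (label, children); a forest is a tuple of trees.
INS = DEL = -1                      # illustrative scores (assumption)


def subs(x, y):
    return 0 if x == y else -1      # illustrative scores (assumption)


def size(F):
    """Number of nodes in a forest."""
    return sum(1 + size(children) for _, children in F)


@lru_cache(maxsize=None)
def align_score(F, G):
    """Best alignment score between two ordered forests F and G."""
    if not F:
        return INS * size(G)        # everything in G is inserted
    if not G:
        return DEL * size(F)        # everything in F is deleted
    (x, f), T = F[0], F[1:]         # F = x(f) o T
    (y, g), U = G[0], G[1:]         # G = y(g) o U
    both = subs(x, y) + align_score(f, g) + align_score(T, U)
    # y is inserted: its children g are aligned with a prefix F[:i],
    # and the rest of F with the rest of G.
    ins = INS + max(
        align_score(F[:i], g) + align_score(F[i:], U)
        for i in range(len(F) + 1)
    )
    # x is deleted: symmetric, splitting G at position j.
    dele = DEL + max(
        align_score(f, G[:j]) + align_score(T, G[j:])
        for j in range(len(G) + 1)
    )
    return max(both, ins, dele)


def leaf(label):
    return (label, ())


T1 = (("a", (("e", (leaf("b"), leaf("c"))), leaf("d"))),)
T2 = (("a", (leaf("b"), ("f", (leaf("c"), leaf("d"))))),)
print(align_score(T1, T2))
```

On this pair the best alignment needs four unit operations, whereas two edit operations (delete e, insert f) suffice: no single tree can contain both e above b, c and f above c, d, so alignment is the more constrained problem.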

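Editing vs. alignment (insertion case only; x and y denote the two root nodes)

Editing:
\[
\mathrm{Score}\big(x(f) \circ t_1 \circ \dots \circ t_p,\; y(f') \circ t'_1 \circ \dots \circ t'_q\big) = \max
\begin{cases}
\dots \\
\mathrm{Ins}(y) + \mathrm{Score}\big(x(f) \circ t_1 \circ \dots \circ t_p,\; f' \circ t'_1 \circ \dots \circ t'_q\big) \\
\dots
\end{cases}
\]

Alignment:
\[
\mathrm{Score}\big(x(f) \circ t_1 \circ \dots \circ t_p,\; y(f') \circ t'_1 \circ \dots \circ t'_q\big) = \max
\begin{cases}
\dots \\
\mathrm{Ins}(y) + \max_i \Big\{ \mathrm{Score}\big(x(f) \circ \dots \circ t_i,\; f'\big) + \mathrm{Score}\big(t_{i+1} \circ \dots \circ t_p,\; t'_1 \circ \dots \circ t'_q\big) \Big\} \\
\dots
\end{cases}
\]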

Editing vs. alignment
[Figure: the two insertion cases drawn as forests, the alignment case being split between positions i and i+1.]
Can be inserted anywhere.

Complexity
- Editing [Zhang, Shasha 1989; Klein 1998]
  - Worst case: O(n^4) [Zhang-Shasha 1989]; O(n^3 log n) [Klein 1998, Dulucq-Touzet 2003]
  - On average: O(n^3) [Dulucq-Tichit 2003]
- Alignment [Jiang, Wang, Zhang 1995]
  - Worst case: O(n^4)

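Edit operations: the problem
[Figure: the helix A-U / U-A / G-C / C-U compared with A-U / U / G-C / C-U; sequences AUGG…….UCAU and AUGG…….UCUU; tree nodes labelled AU, GC, GU, UA, UU; operations Delete( ) and Insert( ).]
3 operations!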

Edit operations on RNA
Operations on bases:
- Substitution
- Deletion / insertion
Operations on arcs:
- Arc-substitution
- Arc-deletion / arc-insertion
- Arc-breaking and its inverse (new)
- Arc-altering and its inverse (new)
[Figure: one example of each operation on the bases A, C, G, U.]

A first solution [Höchsmann, Töller, Giegerich, Kurtz 2003] (RNAforester)
[Figure: the helices A-U / U-A / G-C / C-U and A-U / U A / G-C / C-U, sequence AUGG…….UCAU, redrawn with the bases A U G C U A C U as separate nodes.]
But this implies some constraints on the scores. For example:
Arc-deletion = Arc-breaking + 2 × Base-deletion

Edit operations on RNA
Operations on bases:
- Substitution
- Deletion / insertion
Operations on arcs:
- Arc-substitution
- Arc-deletion / arc-insertion
- Arc-breaking and its inverse
- Arc-altering and its inverse
[Figure: one example of each operation on the bases A, C, G, U.]

Complexity of the edit problem
Rows and columns give the structure levels of the two inputs; only the upper triangle is filled in.

            General        Crossing       Nested         Plain
General     NP-complete    NP-complete    NP-complete    NP-complete
Crossing                   NP-complete    NP-complete    NP-complete
Nested                                    NP-complete    O(nm^3)
Plain                                                    O(nm / log n)

References: [Jiang, Lin, Ma, Zhang 2002], [Blin, Fertin, Rusu, Sinoquet 2003], [Crochemore, Landau, Ziv-Ukelson 2002].
If 2 Score(Arc-altering) = Score(Arc-breaking) + Score(Arc-removing), then there is an O(n^3 m) algorithm for Edit(crossing, nested) and Edit(nested, nested).
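Whether a given scoring scheme falls into the polynomial special case can be checked directly from its three arc scores. The sketch below only tests the stated identity; the class `ArcScores`, the function `has_polynomial_case` and the tolerance parameter are hypothetical names introduced for illustration, and the numeric values are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArcScores:
    """Scores of the three arc operations in the condition (hypothetical container)."""
    arc_breaking: float
    arc_altering: float
    arc_removing: float


def has_polynomial_case(s: ArcScores, tol: float = 1e-9) -> bool:
    """True when 2 * Score(arc-altering) = Score(arc-breaking) + Score(arc-removing)."""
    return abs(2 * s.arc_altering - (s.arc_breaking + s.arc_removing)) <= tol


# Made-up values: altering sits exactly halfway between breaking and removing.
print(has_polynomial_case(ArcScores(arc_breaking=1.0, arc_altering=1.5, arc_removing=2.0)))
print(has_polynomial_case(ArcScores(arc_breaking=1.0, arc_altering=2.0, arc_removing=2.0)))
```

Read this way, the condition says that altering an arc must score exactly the average of breaking it and removing it.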

Complexity of secondary structure comparison

            Tree operations                                  RNA operations
Editing     O(n^3 log n) [Zhang-Shasha 1989, Klein 1998]     NP-complete [Blin, Fertin, Sinoquet, Rusu 2003]
Alignment   O(n^4) [Jiang, Wang, Zhang 1995]                 ?

Secondary structure alignment
Inputs: ABCDEFG and ABBDFFG

Editing:    A-BCD-EFG
            ABB-DF-FG

Alignment:  AB---CDEFG
            ABBDF---FG

New edit operations on trees
- Arc-breaking and its inverse
- Arc-altering and its inverse
[Figure: examples on a C-G pair.]

Alignment algorithm (steps 1/5 to 5/5)
[Figures only: animated construction on a forest f and a tree t.]

Complexity of secondary structure comparison

            Tree operations                                  RNA operations
Editing     O(n^3 log n) [Zhang-Shasha 1989, Klein 1998]     NP-complete [Blin, Fertin, Sinoquet, Rusu 2003]
Alignment   O(n^4) [Jiang, Wang, Zhang 1995]                 O(n^4) [Herrbach, Denise, Dulucq, Touzet 2005]

Complexity of the alignment problem for the other structure levels: [Blin, Touzet 2006]

Example: two tRNAs (Homo sapiens and Bacillus subtilis)
Drawing: Tulip (David Auber et al., LaBRI)
Legend: base-subs / arc-subs; deletions / insertions; arc-breaking; arc-altering

And in real life?

Alignment of RNases P

To do…
- Biological validation: test on real data
- Comparison with other software (RNAforester, MiGal [J. Allali, M.-F. Sagot])
- Combined approaches ([J. Allali, A. Ouangraoua-P. Ferraro])
- Parameters: substitution matrices, etc.
- Statistical evaluation of results
- Relevant algorithms and parameters
- Useful and user-friendly programs
- Sequence/structure alignment
- Multiple alignment
- …

Credits: Julien Allali, David Auber, Serge Dulucq, Claire Herrbach, Rym Kachouri, Yann Ponty, Michel Termier, Laurent Tichit, Hélène Touzet, Eric Westhof