
Beyond ab initio modelling… Comparative and Boltzmann equilibrium Yann Ponty, CNRS/Ecole Polytechnique with invaluable help from Alain Denise, LRI/IGM, Université Paris-Sud M2 Bioinfo Paris-Saclay

Prediction by homology
• Data: several homologous RNA sequences.
• Output: a consensus structure for this set of sequences.

Prediction by Homology: From sequence alignment

Detecting covariations
• We start from a sequence alignment:
GAGGACTGAGCTCAGTTAAAGTGCCTG
AAGGGCCCCGCTGGGCAAAG--GCTG-
AAGGGGTCGGCTGACCTAAAGTAGTTG
GAGGGGTGAG-GCAUCTAAAGTGTTTG
GAGGACTGTGCTCAGTTAAAGTGTTTG
....((((....))))
• We search for sequence covariations.
• They come from compensatory mutations during evolution.

Detecting covariations
• We start from a sequence alignment:
GAGGACTGAGCTCAGTTAAAGTGCCTG
AAGGGCCCCGCTGGGCAAAG--GCTG-
AAGGGGTCGGCTGACCTAAAGTAGTTG
GAGGGGTGAG-GCAUCTAAAGTGTTTG
GAGGACTGTGCTCAGTTAAAGTGTTTG
....((((....))))
• Measure: mutual information between positions i and j:
$-\sum_{a,b} \Pr(i=a)\,\Pr(j=b)\,\log \Pr(i=a \mid j=b)$
where a and b range over the different nucleotides.
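A minimal sketch of how covariation between two alignment columns could be scored. It uses the textbook joint-frequency form of mutual information, $MI(i,j) = \sum_{a,b} f_{ij}(a,b)\,\log_2 \frac{f_{ij}(a,b)}{f_i(a)\,f_j(b)}$; that this is the measure intended on the slide is an assumption. The function name `mutual_information`, the gap handling (rows with a gap in either column are skipped) and the toy alignment are illustrative choices, not taken from the slides.

```python
from collections import Counter
from math import log2

def mutual_information(alignment, i, j):
    """Mutual information (in bits) between columns i and j (0-based)."""
    # Keep only rows where neither column holds a gap (an illustrative choice).
    pairs = [(row[i], row[j]) for row in alignment
             if row[i] != "-" and row[j] != "-"]
    n = len(pairs)
    if n == 0:
        return 0.0
    f_i = Counter(a for a, _ in pairs)   # single-column counts for column i
    f_j = Counter(b for _, b in pairs)   # single-column counts for column j
    f_ij = Counter(pairs)                # joint counts
    return sum((c / n) * log2((c / n) / ((f_i[a] / n) * (f_j[b] / n)))
               for (a, b), c in f_ij.items())

# Hypothetical toy alignment: columns 0 and 4 always form a complementary pair,
# but the pair itself changes from row to row (compensatory mutations).
toy = ["GAAAC", "CAAAG", "AAAAU", "UAAAA"]

print(mutual_information(toy, 0, 4))  # covarying columns: 2.0 bits
print(mutual_information(toy, 1, 2))  # fully conserved columns: 0.0 bits
```

Fully conserved columns score zero: a perfectly conserved base pair carries no covariation signal, so the measure rewards compensatory changes rather than conservation.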

Two software tools based on this approach
• RNAalifold (Hofacker et al. 2000)
• RNAz (Washietl et al. 2005)

RNAalifold

Application: tRNA Alanine
>Artibeus_jamaicensis
AAGGGCTTAGCTTAATTAAAGTAGTTGATTTGCATTCAGCAGCTGTAGGATAAAGTCTTGCAGTCCTTA
>Balaenoptera_musculus
GAGGATTTAGCTTAATTAAAGTGTTTGATTTGCATTCAATTGATGTAAGATATAGTCTTGCAGTCCTTA
>Bos_taurus
GAGGATTTAGCTTAATTAAAGTGGTTGATTTGCATTCAATTGATGTAAGGTGTAGTCTTGCAATCCTTA
>Canis_familiaris
GAGGGCTTAGCTTAATTAAAGTGTTTGATTTGCATTCAATTGATGTAAGATAGATTCTTGCAGCCCTTA
>Ceratotherium_simum
GAGGGTTTAGCTTAATTAAAGTGTTTGATTTGCATTCAGTTGATGTAAGATAGAGTCTTGCAGCCCTTA
>Dasypus_novemcinctus
GAGGACTTAGCTTAATTAAAGTGCCTGATTTGCGTTCAGGAGATGTGGGGCTAAATCTTGCAGTCCTTA
>Equus_asinus
AAGGGCTTAGCTTAATGAAAGTGTTTGATTTGCGTTCAATTGATGTGAGATAGAGTCTTGCAGTCCTTA
>Erinaceus_europeus
GAGGATTTAGCTTAAAAAAAGTGGTTGATTTGCATTCAATTGATATAGGAAATATAATCTTGTAATCCTTA
>Felis_catus
GAGGACTTAGCTTAATTAAAGTGTTTGATTTGCAATCAATTGATGTAAGATAGATTCTTGCAGTCCTTA
>Hippopotamus_amphibius
AGGGACTTAGCTTAATAAAAGCAGTTGAGTTGCATTCAATTGATGTGAGGTGCGGTCTTGCAGTCTCTA
>Homo_sapiens
AAGGGCTTAGCTTAATTAAAGTGGCTGATTTGCGTTCAGTTGATGCAGAGTGGGGTTTTGCAGTCCTTA

Exercise
1. Compute an alignment of the previous sequences using MAFFT (do not forget to set the Nucleic Acid option).
2. Copy/paste the result into RNAalifold.
3. Look at the result.

MAFFT alignment
>Artibeus_jamaicensis
AAGGGCTTAGCTTAATTAAAGTAGTTGATTTGCATTCAGCAGCTGTAGG--ATAAAGTCTTGCAGTCCTTA
>Balaenoptera_musculus
GAGGATTTAGCTTAATTAAAGTGTTTGATTTGCATTCAATTGATGTAAG--ATATAGTCTTGCAGTCCTTA
>Bos_taurus
GAGGATTTAGCTTAATTAAAGTGGTTGATTTGCATTCAATTGATGTAAG--GTGTAGTCTTGCAATCCTTA
>Canis_familiaris
GAGGGCTTAGCTTAATTAAAGTGTTTGATTTGCATTCAATTGATGTAAG--ATAGATTCTTGCAGCCCTTA
>Ceratotherium_simum
GAGGGTTTAGCTTAATTAAAGTGTTTGATTTGCATTCAGTTGATGTAAG--ATAGAGTCTTGCAGCCCTTA
>Felis_catus
GAGGACTTAGCTTAATTAAAGTGTTTGATTTGCAATCAATTGATGTAAG--ATAGATTCTTGCAGTCCTTA
>Equus_asinus
AAGGGCTTAGCTTAATGAAAGTGTTTGATTTGCGTTCAATTGATGTGAG--ATAGAGTCTTGCAGTCCTTA
>Homo_sapiens
AAGGGCTTAGCTTAATTAAAGTGGCTGATTTGCGTTCAGTTGATGCAGA--GTGGGGTTTTGCAGTCCTTA
>Hippopotamus_amphibius
AGGGACTTAGCTTAATAAAAGCAGTTGAGTTGCATTCAATTGATGTGAG--GTGCGGTCTTGCAGTCTCTA
>Dasypus_novemcinctus
GAGGACTTAGCTTAATTAAAGTGCCTGATTTGCGTTCAGGAGATGTGGG--GCTAAATCTTGCAGTCCTTA
>Erinaceus_europeus
GAGGATTTAGCTTAAAAAAAGTGGTTGATTTGCATTCAATTGATATAGGAAATATAATCTTGTAATCCTTA

RNAalifold

Application: tRNA H. sapiens
>Homo_sapiensArg
TGGTATATAGTTTAAACAAAACGAATGATTTCGACTCATTAAATTATGATAATCATATTTACCAA
>Homo_sapiensAsn
TAGATTGAAGCCAGTTGATTAGGGTGCTTAGCTGTTAACTAAGTGTTTGTGGGTTTAAGTCCCATTGGTCTAG
>Homo_sapiensAsp
AAGGTATTAGAAAAACCATTTCATAACTTTGTCAAAGTTAAATTATAGGCTAAATCCTATATATCTTA
>Homo_sapiensCys
AGCTCCGAGGTGATTTTCATATTGAATTGCAAATTCGAAGAAGCAGCTTCAAACCTGCCGGGGCTT
>Homo_sapiensGln
TAGGATGGGGTGTGATAGGTGGCACGGAGAATTTTGGATTCTCAGGGATGGGTTCGATTCTCATAGTCCTAG
>Homo_sapiensGlu
GTTCTTGTAGTTGAAATACAACGATGGTTTTTCATATCATTGGTCGTGGTTGTAGTCCGTGCGAGAATA
>Homo_sapiensGly
ACTCTTTTAGTATAAATAGTACCGTTAACTTCCAATTAACTAGTTTTGACAACATTCAAAAAAGAGTA
>Homo_sapiensHis
GTAAATATAGTTTAACCAAAACATCAGATTGTGAATCTGACAACAGAGGCTTACGACCCCTTATTTACC
>Homo_sapiensIso
AGAAATATGTCTGATAAAAGAGTTACTTTGATAGAGTAAATAATAGGAGCTTAAACCCCCTTATTTCTA
>Homo_sapiensLeuCun
ACTTTTAAAGGATAACAGCTATCCATTGGTCTTAGGCCCCAAAAATTTTGGTGCAACTCCAAATAAAAGTA

Exercise
The same as previously, but with these new sequences.
1. Compute an alignment of the previous sequences using ClustalW or ClustalO (do not forget to set the « DNA » option).
2. Copy/paste the result into RNAalifold.
3. Look at the result. What happened? Why?

MAFFT alignment
>Homo_sapiensArg
TGGTATATAGT---TTAAACAAAACGAATGATTTCGACTCATTAAAT---TATGATAA---TCATATTTACCAA
>Homo_sapiensGly
ACTCTTTTAGT---ATAAATAGTACCGTTAACTTCCAATTAACTAGT---TTTGACAACATTCAAAAAAGAGTA
>Homo_sapiensHis
GTAAATATAGT---TTAACCAAAACATCAGATTGTGAATCTGACAAC--AGAGGCTTACGACCCCTTATTTACC
>Homo_sapiensIso
AGAAATATGTC---TGATAAAAGAGTTACTTTGATAGAGTAAATAAT--AGGAGCTTAAACCCCCTTATTTCTA
>Homo_sapiensGlu
GTTCTTGTAGT---TGAAATACAACGATGGTTTTTCATATCATTGGT--CGTGGTTGTAGTCCGTGCGAGAATA
>Homo_sapiensLeuCun
ACTTTTAAAGG---ATAACAGCTATCCATTGGTCTTAGGCCCCAAAAATTTTGGTGCAACTCCAAATAAAAGTA
>Homo_sapiensAsn
TAGATTGAAGCCAGTTGATTAGGGTGCTTAGCTGTTAACTAAGTGTT-TGTGGGTTTAAGTCCCATTGGTCTAG
>Homo_sapiensGln
TAGGATGGGGTGTGATAGGTGGCACGGAGAATTTTGGATTCTCAGGG--ATGGGTTCGATTCTCATAGTCCTAG
>Homo_sapiensCys
AGCTCCGAGGT-----GATTTTCATATTGAATTGCAAATTCGAAGAA---GCAGCTTCAAACCTGCCGGGGCTT
>Homo_sapiensAsp
AAGGTATTAGA---AAAACCATTTCATAACTTTGTCAAAGTTAAATT---ATAGGCTAAATCCTATATATCTTA

RNAalifold
RNAalifold finds a common but much less conserved structure.

Prediction by Homology: Simultaneous folding and alignment

Problem specification
• Data: a set of sequences
• Output: a sequence alignment, and a common secondary structure.

Approaches
• The reference approach: Sankoff’s algorithm (1985)
  • Algorithmic approach: dynamic programming
  • Complexity: $O(n^{3k})$ for k sequences of length n
• There are several implementations; here are two of them (with constraints):
  • Foldalign (Gorodkin, Heyer, Stormo 1997; Havgaard, Lyngso, Stormo, Gorodkin 2005)
  • Dynalign (Mathews, Turner 2002)
• Heuristics based on this algorithm:
  • LocARNA (freiburg.de:8080/LocARNA.jsp)
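A small worked illustration of why constrained implementations and heuristics are needed: the snippet evaluates the $n^{3k}$ bound quoted above for a few values of k. The function name `sankoff_time_bound` and the length n = 70 (roughly tRNA-sized, like the sequences used in the exercises) are assumptions made for illustration, and constant factors are ignored.

```python
def sankoff_time_bound(n: int, k: int) -> int:
    """Return n**(3k), the order of the running time for k sequences of length n."""
    return n ** (3 * k)

# Growth of the bound for tRNA-sized sequences (n = 70 is an illustrative choice).
for k in (2, 3, 4):
    print(f"k = {k}: about {sankoff_time_bound(70, k):.2e} elementary steps")
```

Each additional sequence multiplies the bound by $n^3$, so exact simultaneous folding and alignment quickly becomes impractical beyond two sequences.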

Exercise
Take the two previous sets of sequences (one after the other) and run LocARNA. Look at the results.
Consider the first set only. Run LocARNA with the first two sequences, then the first three, and so on. How many sequences do you need to get the right tRNA structure?

Sankoff’s algorithm in a few words:
• Data: a set of sequences
• Parameters: a score matrix, giving a score $S_{ij,kl}$ for each alignment of pairs of nucleotides.
• Output: a sequence alignment, and a common secondary structure.
• Method: dynamic programming.
• It is a bit complicated, so we will study a simplified version of the algorithm: Foldalign.
  • Two sequences only
  • No multiloop allowed in the secondary structure
  • Simplified score matrix


Recurrence relation for Foldalign


From energy minimization to Boltzmann equilibrium?

Optimization methods can be overly sensitive to fluctuations of the energy model
Example:
• Get the Rfam seed alignment for the D1-D4 domain of the Group II intron
• Extract the A. capsulatum (Acidobacterium_capsu.1) sequence
• Run RNAfold on the sequence using default parameters
• Rerun RNAfold using the latest energy parameters
[Figure: RNA sequence ACGAUCGCGA CUACGUGCAU CGCGGCACGA CUGCGAUCUG CAUCGGA... with its stability under the Turner 1999 and Turner 2004 energy models]

Probabilistic approaches in RNA folding
• RNA in silico paradigm shift:
  • From single-structure, minimal free-energy folding…
  • … to ensemble approaches.
…CAGUAGCCGAUCGCAGCUAGCGUA…
Ensemble diversity? Structure likelihood? Evolutionary robustness?
UNAFold, RNAfold, Sfold…
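A minimal sketch of the ensemble view, assuming the usual Boltzmann distribution $P(S) = e^{-E(S)/RT} / Z$ with $Z = \sum_{S'} e^{-E(S')/RT}$. The two structures "A" and "B", their energies, and the names `boltzmann`, `model_a` and `model_b` are hypothetical; they only illustrate how a small change in the energy model can flip the minimum free-energy structure while barely moving the probabilities.

```python
from math import exp

# Gas constant (kcal/mol/K) times temperature (37 °C in kelvin).
RT = 0.0019872 * 310.15

def boltzmann(energies):
    """Map each structure to its Boltzmann probability, given energies in kcal/mol."""
    weights = {s: exp(-e / RT) for s, e in energies.items()}
    z = sum(weights.values())            # partition function over the toy ensemble
    return {s: w / z for s, w in weights.items()}

# Hypothetical energies of two competing structures under two energy models
# that differ by only 0.1 kcal/mol.
model_a = {"A": -10.0, "B": -9.9}
model_b = {"A": -9.9, "B": -10.0}

for name, model in (("model_a", model_a), ("model_b", model_b)):
    mfe = min(model, key=model.get)      # the single structure an optimizer reports
    probs = boltzmann(model)
    print(name, "MFE:", mfe, "P(A) = %.3f" % probs["A"], "P(B) = %.3f" % probs["B"])
```

The optimizer's answer switches from A to B, whereas the ensemble reports both structures as nearly equally likely under either model.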

Probabilistic approaches indicate uncertainty and suggest alternative conformations
Example:
>ENA|M10740|M Saccharomyces cerevisiae Phe-tRNA. : Location:1..76
GCGGATTTAGCTCAGTTGGGAGAGCGCCAGACTGAAGATTTGGAGGTCCTGTGTTCGATCCACAGAATTCGCACCA
[Figure: native structure compared with the RNAfold -p « dot-plot »]
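A sketch of what a dot-plot summarizes: the probability of a base pair is the total Boltzmann probability of the structures that contain it. The three-structure ensemble and its probabilities are hypothetical, and the helper names `base_pairs` and `pair_probabilities` are illustrative; real tools compute these values by dynamic programming over the full ensemble rather than by enumeration.

```python
def base_pairs(dot_bracket):
    """Return the set of 0-based (i, j) pairs encoded by a dot-bracket string."""
    stack, found = [], set()
    for pos, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            found.add((stack.pop(), pos))
    return found

def pair_probabilities(ensemble):
    """Sum structure probabilities over every base pair they contain."""
    probs = {}
    for structure, p in ensemble.items():
        for bp in base_pairs(structure):
            probs[bp] = probs.get(bp, 0.0) + p
    return probs

# Hypothetical ensemble: structure -> Boltzmann probability.
ensemble = {"((...))": 0.6, "(.....)": 0.3, ".......": 0.1}

for (i, j), p in sorted(pair_probabilities(ensemble).items()):
    print(f"pair ({i}, {j}): probability {p:.2f}")
```

Pairs shared by several alternative conformations accumulate probability, which is why well-supported helices stand out in a dot-plot even when the single best structure is uncertain.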

Nussinov’s algorithm (1978)
[Figure: decomposition of the interval (i, j) into the sub-intervals (i+1, j-1), (i+1, j), (i, j-1), and (i, k) with (k+1, j)]
Partition function algorithms can be adapted from a non-ambiguous* DP scheme.
Is this decomposition ambiguous?
* Ambiguous = multiple ways to generate a structure
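One way to explore the question is to turn the recursion into a counting scheme (replace max by sum, and sum by product) and compare it with a decomposition known to generate each structure exactly once. The sketch below assumes the classic four-case Nussinov recursion (i and j paired, i unpaired, j unpaired, split at k) and, as the unambiguous alternative, a decomposition on the partner of the leftmost base. The set of allowed pairs, the `min_loop` parameter and all function names are assumptions made for illustration.

```python
from functools import lru_cache

# Watson-Crick and wobble pairs (an assumption; the slide does not list them).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

def count_structures(seq, ambiguous, min_loop=0):
    """Count secondary structures of seq with a sum-over-cases DP.

    ambiguous=True sums over the four Nussinov cases; ambiguous=False
    decomposes on the pairing partner of the leftmost base.
    """
    def can_pair(i, j):
        # At least min_loop unpaired bases must separate i and j.
        return j - i > min_loop and (seq[i], seq[j]) in PAIRS

    @lru_cache(maxsize=None)
    def nussinov_sum(i, j):
        if j - i < 1:                      # empty interval or single base
            return 1
        total = nussinov_sum(i + 1, j) + nussinov_sum(i, j - 1)
        if can_pair(i, j):
            total += nussinov_sum(i + 1, j - 1)
        for k in range(i + 1, j):          # split into (i, k) and (k+1, j)
            total += nussinov_sum(i, k) * nussinov_sum(k + 1, j)
        return total

    @lru_cache(maxsize=None)
    def unambiguous_sum(i, j):
        if j - i < 1:
            return 1
        total = unambiguous_sum(i + 1, j)  # base i left unpaired
        for k in range(i + 1, j + 1):      # base i paired with base k
            if can_pair(i, k):
                total += unambiguous_sum(i + 1, k - 1) * unambiguous_sum(k + 1, j)
        return total

    top = nussinov_sum if ambiguous else unambiguous_sum
    return top(0, len(seq) - 1)

print(count_structures("GC", ambiguous=False))  # 2: ".." and "()"
print(count_structures("GC", ambiguous=True))   # 3: ".." is generated twice
```

For "GC" the four-case scheme produces the empty structure both through "i unpaired" and through "j unpaired", so a partition function built on it would give that structure twice its Boltzmann weight. Maximization is unaffected by such duplicates, which is why the same recursion remains valid for finding an optimal structure.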