Motif Search and RNA Structure Prediction Lesson 9.

Finding short motifs in biological data (DNA, RNA and protein sequences)
- Scenario 1: the binding motif is known (easier case)
- Scenario 2: the binding motif is unknown (hard case)

Scenario 2: the binding motif is unknown (“ab initio motif finding”). Why is it hard?

Are common motifs the right thing to search for?

Solutions:
- Search for motifs that are enriched in one set but not in a random set
- Use experimental information to rank the sequences according to their binding affinity, and search for motifs enriched at the top of the list

ChIP-Seq: sequencing the regions in the genome to which a protein (e.g. a transcription factor) binds.

ChIP-Seq example: finding the p53 binding motif in a set of p53 target sequences that are ranked according to binding affinity, from best binders to weak binders.

A word-search approach to finding enriched motifs in a ranked list
[Figure: ranked sequence list alongside candidate k-mers: CTACGC, ACTTGA, ACGTGA, ACGTGC, CTGTGC, CTGTGA, CTGTAC, ATGTGC, ATGTGA, CTATGC; occurrences of CTGTGA are marked along the ranked list]

Uses the minimal hypergeometric (mHG) statistic to find enriched motifs. The statistic depends on four counts:
- the total number of input sequences
- the number of sequences containing the motif
- the number of sequences at the top of the list
- the number of sequences containing the motif among the top sequences
[Figure: ranked sequence list with occurrences of CTGTGA marked]
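
The four counts on this slide can be combined as in the following minimal sketch. It assumes the usual definition of the statistic: for every cutoff at the top of the ranked list, compute a hypergeometric tail probability and keep the smallest one. The symbols N, B, n and b and the function names are illustrative choices, not taken from the slides.

```python
from math import comb

def hypergeom_tail(N, B, n, b):
    """P(at least b motif-containing sequences among the top n).

    N: total number of input sequences
    B: number of sequences containing the motif
    n: number of sequences at the top of the list
    b: number of motif-containing sequences among the top n
    """
    upper = min(n, B)
    favourable = sum(comb(B, i) * comb(N - B, n - i) for i in range(b, upper + 1))
    return favourable / comb(N, n)

def mhg(occurrence):
    """Minimal hypergeometric score of a ranked 0/1 occurrence vector.

    occurrence[k] is 1 if the k-th ranked sequence contains the motif.
    Returns (smallest tail probability, cutoff n at which it is reached).
    """
    N, B = len(occurrence), sum(occurrence)
    best_p, best_n, b = 1.0, 0, 0
    for n, hit in enumerate(occurrence, start=1):
        b += hit
        p = hypergeom_tail(N, B, n, b)
        if p < best_p:
            best_p, best_n = p, n
    return best_p, best_n

# A motif found only in the three best binders out of eight sequences
score, cutoff = mhg([1, 1, 1, 0, 0, 0, 0, 0])
print(score, cutoff)
```

Because the minimum is taken over many cutoffs, the returned score is smaller than a true p-value and would still need a correction before being reported as one.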

Detected enriched motifs: the enriched motifs are combined into a PSSM (position-specific scoring matrix) that represents the binding motif.
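
The slide does not say how the enriched motifs are combined. One simple possibility, sketched below, is to treat equal-length k-mers as already aligned, count bases per column, and convert the counts to log-odds scores. The pseudocount, the uniform background of 0.25 and all function names are assumptions made for illustration.

```python
import math

ALPHABET = "ACGT"

def build_pssm(kmers, pseudocount=1.0, background=0.25):
    """Log-odds PSSM from equal-length k-mers (assumed to be pre-aligned)."""
    length = len(kmers[0])
    if any(len(k) != length for k in kmers):
        raise ValueError("all k-mers must have the same length")
    total = len(kmers) + pseudocount * len(ALPHABET)
    pssm = []
    for pos in range(length):
        column = [k[pos] for k in kmers]
        pssm.append({
            base: math.log2(((column.count(base) + pseudocount) / total) / background)
            for base in ALPHABET
        })
    return pssm

def score(pssm, word):
    """Sum of per-position log-odds scores for a word of the motif length."""
    return sum(column[base] for column, base in zip(pssm, word))

def pssm_consensus(pssm):
    """Highest-scoring base at each position."""
    return "".join(max(column, key=column.get) for column in pssm)

# k-mers taken from the candidate list shown on the earlier slide
enriched = ["CTGTGA", "CTGTGC", "ATGTGA", "CTGTAC", "CTGTGA"]
pssm = build_pssm(enriched)
print(pssm_consensus(pssm), round(score(pssm, "CTGTGA"), 2))
```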

Protein Motifs
Protein motifs are usually 6-20 amino acids long and can be represented as a consensus/profile, for example P[ED]XK[RW][RK]X[ED], or as a PWM (position weight matrix).
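
A consensus written in this notation maps almost directly onto a regular expression: bracketed groups are already valid character classes, and X (any amino acid) becomes a wildcard. The sketch below assumes exactly the notation shown on the slide; the protein sequence is a made-up example and the function names are illustrative.

```python
import re

def consensus_to_regex(consensus):
    """Translate slide-style consensus notation into a compiled regex.

    Assumes X means "any residue" and never appears inside brackets.
    """
    return re.compile(consensus.replace("X", "."))

def find_motif(consensus, protein):
    """Return (start, matched text) for each non-overlapping match."""
    regex = consensus_to_regex(consensus)
    return [(m.start(), m.group()) for m in regex.finditer(protein)]

# Hypothetical protein fragment, used only to exercise the pattern
print(find_motif("P[ED]XK[RW][RK]X[ED]", "MAPEAKRKLEQ"))
```

Note that `finditer` reports non-overlapping matches only, which is usually acceptable for a first scan.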

From Sequence to Structure: Predicting RNA Structure

According to the central dogma of molecular biology, the main role of RNA is to transfer genetic information from DNA to protein (DNA → RNA → protein).

RNA has many other biological functions
- Protein synthesis (ribosome)
- Control of mRNA stability (UTR)
- Control of splicing (snRNP)
- Control of translation (microRNA)
- Control of transcription (long non-coding RNA)
The function of an RNA molecule depends on its folded structure.

Nobel prize 2009 Ribosome

RNA structural levels (example: tRNA): secondary structure and tertiary structure

RNA Secondary Structure
RNA bases are G, C, A and U. The RNA molecule folds on itself, and bases pair through hydrogen bonds: G-C, A-U and G-U. The resulting structural elements are the stem and the loop.
[Figure: the sequence 5’ GAUCUUGAUC 3’ shown linear and folded into a stem-loop]

Predicting RNA secondary structure
Most common approach: search for the RNA structure with the minimal free energy (MFE).
[Figure: a low-energy fold compared with a high-energy fold]

Free energy model
The free energy of a structure is the sum of the energies of all its interactions:
Free Energy (E) = E(CG) + E(CG) + …
The aim: to find the structure with the minimal free energy (MFE).

Why is MFE secondary structure prediction hard?
The MFE structure could be found by calculating the free energy of all possible structures, BUT the number of potential structures grows exponentially with the number of bases.
Solution: dynamic programming (Zuker and Stiegler)
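
To show how dynamic programming avoids enumerating every structure, here is a minimal sketch of a simpler relative of the Zuker and Stiegler method: Nussinov-style base-pair maximization. It swaps the free-energy model for a plain count of base pairs, so it illustrates the recursion only and is not the energy-based algorithm named on the slide. The minimum loop size of 3 and the function name are assumptions.

```python
def max_base_pairs(seq, min_loop=3):
    """Maximum number of nested base pairs in seq (Nussinov-style DP).

    dp[i][j] holds the best pair count for the subsequence seq[i..j].
    At least min_loop unpaired bases must separate two paired bases.
    """
    allowed = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A"), ("G", "U"), ("U", "G")}
    n = len(seq)
    if n == 0:
        return 0
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case 1: base j stays unpaired
            for k in range(i, j - min_loop):  # case 2: base j pairs with base k
                if (seq[k], seq[j]) in allowed:
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + dp[k + 1][j - 1])
            dp[i][j] = best
    return dp[0][n - 1]

print(max_base_pairs("GGGAAACCC"))
```

Each subsequence is solved once and reused, so the work grows roughly with the cube of the sequence length instead of exponentially.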

Simplifying assumptions for RNA structure prediction
- RNA folds into one minimum free-energy structure.
- The energy of a particular base pair can be calculated independently: neighbors do not influence the energy.

Sequence-dependent free energy: the nearest-neighbor model
The free energy of a base pair is influenced by the previous base pair (not by the base pairs further down).
[Figure: two hairpins that differ in one stacked base pair, G-C in one and U-A in the other]

Sequence-dependent free-energy values of the base pairs (nearest-neighbor model)
These energies are estimated experimentally from small synthetic RNAs.
[Figure: the same two hairpins, with example energy values for neighboring base pairs built from GC, AU, CG and UA]
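
In a nearest-neighbor model the energy of a helix is a sum over adjacent base-pair stacks rather than over single pairs. The sketch below shows only that bookkeeping. The energy table is a set of hypothetical placeholder numbers chosen for the example, not measured values, and the function name is illustrative.

```python
def helix_energy(pairs, stacking_energy):
    """Sum stacking terms over consecutive base pairs of one helix.

    pairs: base pairs listed along the stem, e.g. ["GC", "CG", "AU"]
    stacking_energy: dict mapping (pair, next_pair) to an energy term
    """
    total = 0.0
    for current, following in zip(pairs, pairs[1:]):
        total += stacking_energy[(current, following)]
    return total

# Hypothetical placeholder values, NOT experimental parameters
PLACEHOLDER_STACKING = {
    ("GC", "CG"): -2.0,
    ("CG", "AU"): -1.0,
    ("GC", "UA"): -1.5,
    ("UA", "AU"): -0.5,
}

# Changing the middle pair changes both stacks it takes part in
print(helix_energy(["GC", "CG", "AU"], PLACEHOLDER_STACKING))
print(helix_energy(["GC", "UA", "AU"], PLACEHOLDER_STACKING))
```

A real calculation would replace the placeholder table with a published parameter set and add loop, bulge and dangling-end terms.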

Improvements to the MFE approach
- Positive energy is added for destabilizing regions such as bulges, loops, etc.
- More than one structure can be predicted

Free energy computation
[Figure: worked example for a hairpin with a bulge. Labeled contributions include stacking terms, a hairpin mismatch, a bulge, a 5’ dangling end and the loop; the values that appear are -2.9, -2.9, -1.8, -0.9, -1.8 and -2.1. Total: ΔG = -4.6 kcal/mol]

Improvements to the MFE approach
- Positive energy is added for destabilizing regions such as bulges, loops, etc.
- Look for an ensemble of low-energy structures and generate a consensus structure
Why? RNA is dynamic and doesn’t always fold to the lowest-energy structure.
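
One simple way to turn an ensemble of low-energy structures into a consensus is to keep the base pairs that occur in most of them. The sketch below assumes the structures are given in dot-bracket notation (a common text encoding, not something the slides specify) and that brackets are balanced. The majority threshold and the function names are illustrative.

```python
from collections import Counter

def parse_pairs(dot_bracket):
    """Set of (i, j) base pairs encoded by a balanced dot-bracket string."""
    stack, pairs = [], set()
    for position, symbol in enumerate(dot_bracket):
        if symbol == "(":
            stack.append(position)
        elif symbol == ")":
            pairs.add((stack.pop(), position))
    return pairs

def consensus_pairs(structures, min_fraction=0.5):
    """Base pairs present in more than min_fraction of the structures."""
    counts = Counter(pair for s in structures for pair in parse_pairs(s))
    return sorted(
        pair for pair, count in counts.items()
        if count / len(structures) > min_fraction
    )

# Three hypothetical low-energy structures for one 7-nt sequence
ensemble = ["((...))", "(.....)", "......."]
print(consensus_pairs(ensemble))
```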

RNA fold prediction based on multiple alignment
Information from a multiple sequence alignment (MSA) can help to predict the probability that positions i and j are base-paired.
GCCUUCGGGC
GACUUCGGUC
GGCUUCGGCC

Compensatory substitutions
Mutations that maintain the secondary structure can help predict the fold.
[Figure: hairpin in which one base pair of the stem is replaced by a different base pair]

RNA secondary structure can be revealed by identifying compensatory mutations
GCCUUCGGGC
GACUUCGGUC
GGCUUCGGCC
[Figure: the aligned sequences folded into a common hairpin in which the varying positions N and N’ pair with each other]

Insight from multiple alignment
Information from a multiple sequence alignment (MSA) can help to predict the probability that positions i and j are base-paired.
- Conservation: no additional information
- Consistent mutations (GC → GU): support the stem
- Inconsistent mutations: do not support the stem
- Compensatory mutations: support the stem
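
The four categories can be checked mechanically for any pair of alignment columns. The sketch below uses the three aligned sequences from the earlier slide and applies a deliberately strict rule: a single non-pairing sequence marks the column pair as inconsistent. That rule, the 0-based column indices and the function name are assumptions made for illustration.

```python
CANONICAL = {"GC", "CG", "AU", "UA", "GU", "UG"}

def classify_columns(alignment, i, j):
    """Classify columns i and j (0-based) of an RNA alignment."""
    pairs = {seq[i] + seq[j] for seq in alignment}
    if not pairs <= CANONICAL:
        return "inconsistent"   # some sequence cannot pair at (i, j)
    if len(pairs) == 1:
        return "conserved"      # no variation, so no extra information
    left = {pair[0] for pair in pairs}
    right = {pair[1] for pair in pairs}
    if len(left) > 1 and len(right) > 1:
        return "compensatory"   # both sides changed and pairing is kept
    return "consistent"         # one side changed and pairing is kept

alignment = ["GCCUUCGGGC", "GACUUCGGUC", "GGCUUCGGCC"]
for i, j in [(0, 9), (1, 8), (2, 7)]:
    print(i, j, classify_columns(alignment, i, j))
```

In this alignment, columns 1 and 8 read C-G, A-U and G-C across the three sequences, which is the pattern that supports a stem.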

From RNA structure to function
Many families of non-coding RNAs with unique functions are characterized by a combination of conserved sequence and conserved structure.

MicroRNAs
[Figure labels: miRNA gene, mature miRNA, target gene]

MicroRNA in Cancer Sun et al, 2012

The challenge for bioinformatics:
- Identifying new microRNA genes
- Identifying the targets of specific microRNAs

How to find microRNA genes?
- Search for sequences that fold into a hairpin of ~70 nt
  - RNAfold
  - other efficient algorithms for identifying stem-loops
- Concentrate on intergenic regions and introns
  - filter out coding regions
- Filter out non-conserved candidates
  - mature and pre-miRNAs are usually evolutionarily conserved

How to find microRNA genes?
A. Structure prediction
B. Evolutionary conservation

Predicting microRNA targets
MicroRNA targets are located in 3’ UTRs and are complementary to mature microRNAs.
Why is it hard to find them?
- Base pairing is required only in the seed sequence (7-8 nt)
- Many known miRNAs have similar seed sequences
As a result, the probability of finding a match by chance is very high.
[Figure: mature miRNA paired with the 3’ UTR of a target gene]
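
A seed search reduces to finding the reverse complement of a short miRNA segment in the 3’ UTR, and a quick estimate shows why chance hits are so common. The sketch below assumes the seed is nucleotides 2-8 of the mature miRNA (one common convention within the 7-8 nt range on the slide), perfect Watson-Crick matching, and a uniform random sequence model for the chance estimate. The miRNA and UTR strings are example inputs for illustration.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")

def reverse_complement(rna):
    """Reverse complement of an RNA string over A, C, G, U."""
    return rna.translate(COMPLEMENT)[::-1]

def seed_sites(mirna, utr, seed_start=1, seed_len=7):
    """0-based UTR positions that perfectly match the miRNA seed.

    seed_start=1 and seed_len=7 select nucleotides 2-8 (an assumed convention).
    """
    seed = mirna[seed_start:seed_start + seed_len]
    site = reverse_complement(seed)
    return [
        pos for pos in range(len(utr) - len(site) + 1)
        if utr[pos:pos + len(site)] == site
    ]

def expected_chance_sites(utr_length, site_len=7):
    """Expected perfect matches in a uniform random sequence (assumed model)."""
    return max(utr_length - site_len + 1, 0) / 4 ** site_len

mirna = "UGAGGUAGUAGGUUGUAUAGUU"      # example mature miRNA sequence
utr = "AAACUACCUCAAAGGGCUACCUCUU"     # made-up 3' UTR fragment
print(seed_sites(mirna, utr))
print(expected_chance_sites(1000))
```

Under this model a 7-nt site is expected about once per 16 kb of sequence for each miRNA, so across thousands of UTRs many matches arise by chance alone.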

Predicting microRNA target genes
General methods:
- Find motifs that complement the seed sequence (allowing mismatches)
- Look for conserved target sites
- Consider the MFE of the RNA-RNA pairing: ∆G(miRNA+target)
- Consider the difference between the MFE of the RNA-RNA pairing and that of the folded target: ∆G(miRNA+target) - ∆G(target)