Motif Search and RNA Structure Prediction Lesson 9.

Presentation on theme: "Motif Search and RNA Structure Prediction Lesson 9."— Presentation transcript:

1 Motif Search and RNA Structure Prediction Lesson 9

2 Finding short motifs in biological data (DNA, RNA and protein sequences). Scenario 1: the binding motif is known (easier case). Scenario 2: the binding motif is unknown (hard case).

3 Scenario 2: the binding motif is unknown ("ab initio motif finding"). Why is it hard?

4 Are common motifs the right thing to search for?

5 ?

6 Solutions: (1) Search for motifs that are enriched in one set but not in a random set. (2) Use experimental information to rank the sequences according to their binding affinity, and search for motifs enriched at the top of the list.

7 ChIP-Seq: sequencing the regions in the genome to which a protein (e.g. a transcription factor) binds.

8 ChIP-Seq: finding the p53 binding motif in a set of p53 target sequences that are ranked according to binding affinity, from best binders to weak binders.

9 DRIMust (http://drimust.technion.ac.il/): a word-search approach to finding enriched motifs in a ranked list. [Figure: ranked sequences list with candidate k-mers CTACGC, ACTTGA, ACGTGA, ACGTGC, CTGTGC, CTGTGA, CTGTAC, ATGTGC, ATGTGA, CTATGC; the k-mer CTGTGA recurs through the list.]

10 The method uses the minimal hypergeometric (mHG) statistic to find enriched motifs. The statistic depends on four counts: the total number of input sequences; the number of sequences containing the motif; the number of sequences at the top of the list; and the number of sequences containing the motif among the top sequences. [Figure: ranked sequences list with occurrences of CTGTGA marked.]
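The four counts on this slide can be turned into a score as follows. This is a minimal sketch, not the tool's actual implementation: the symbols N, B, n and b and the function names are assumptions introduced here, and the input is a 0/1 list saying whether each ranked sequence contains the motif. For every cutoff n the sketch computes a hypergeometric tail probability and keeps the smallest one.

```python
from math import comb

def hypergeom_tail(N, B, n, b):
    """P(X >= b) when n sequences are drawn from N, of which B contain the motif."""
    total = comb(N, n)
    hits = sum(comb(B, k) * comb(N - B, n - k) for k in range(b, min(B, n) + 1))
    return hits / total

def mhg_score(occurrence):
    """Minimal hypergeometric score over all cutoffs of a ranked 0/1 list.

    occurrence[i] is 1 if the i-th ranked sequence contains the motif.
    Returns (smallest tail probability, cutoff n where it is reached).
    """
    N = len(occurrence)   # total number of input sequences
    B = sum(occurrence)   # number of sequences containing the motif
    best_p, best_n = 1.0, 0
    b = 0                 # motif-containing sequences among the top n
    for n in range(1, N + 1):
        b += occurrence[n - 1]
        p = hypergeom_tail(N, B, n, b)
        if p < best_p:
            best_p, best_n = p, n
    return best_p, best_n

# Toy ranked list: the motif occurs only in the three best binders.
score, cutoff = mhg_score([1, 1, 1, 0, 0, 0, 0, 0, 0, 0])
print(score, cutoff)
```

Because the minimum is taken over many cutoffs, the score is not itself a p-value and needs a multiple-testing correction before it is interpreted as one.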

11 Detected enriched motifs: the enriched motifs are combined to get a PSSM, which represents the binding motif.
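One way to combine a set of equal-length k-mers into a PSSM is sketched below. The k-mers are taken from the candidate list on slide 9 purely for illustration; the pseudocount, the uniform background of 0.25 and the function names are assumptions, not details given in the slides.

```python
from collections import Counter
from math import log2

def build_pssm(kmers, alphabet="ACGT", pseudocount=1.0, background=0.25):
    """Return one dict per position mapping base -> log2(frequency / background)."""
    length = len(kmers[0])
    total = len(kmers) + pseudocount * len(alphabet)
    pssm = []
    for pos in range(length):
        counts = Counter(kmer[pos] for kmer in kmers)
        column = {
            base: log2(((counts[base] + pseudocount) / total) / background)
            for base in alphabet
        }
        pssm.append(column)
    return pssm

def score(pssm, seq):
    """Sum the per-position log-odds scores of seq (same length as the PSSM)."""
    return sum(column[base] for column, base in zip(pssm, seq))

# Illustrative input only: which k-mers are actually enriched is not stated.
kmers = ["CTGTGA", "CTGTGC", "ATGTGA", "CTATGC"]
pssm = build_pssm(kmers)
print(score(pssm, "CTGTGA"), score(pssm, "AAAAAA"))
```

Sequences that resemble the input k-mers get a positive score, while unrelated sequences score lower.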

12

13 Protein motifs: protein motifs are usually 6-20 amino acids long and can be represented as a consensus/profile, e.g. P[ED]XK[RW][RK]X[ED], or as a PWM.
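A consensus such as P[ED]XK[RW][RK]X[ED] maps almost directly onto a regular expression. The sketch below assumes that X stands for any amino acid and that X never appears inside brackets; the protein sequence and the function name are made up for illustration.

```python
import re

def find_motif(consensus, protein):
    """Return (start, matched text) for each non-overlapping consensus match."""
    # Bracket groups are already valid regex character classes; X becomes "any residue".
    regex = re.compile(consensus.replace("X", "."))
    return [(m.start(), m.group()) for m in regex.finditer(protein)]

# Hypothetical protein fragment containing one instance of the motif.
print(find_motif("P[ED]XK[RW][RK]X[ED]", "MAPEAKRKGDLL"))
```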

14 From Sequence to Structure Predicting RNA structure

15 DNA -> RNA -> protein. According to the central dogma of molecular biology, the main role of RNA is to transfer genetic information from DNA to protein.

16 RNA has many other biological functions: protein synthesis (ribosome); control of mRNA stability (UTR); control of splicing (snRNP); control of translation (microRNA); control of transcription (long non-coding RNA). The function of the RNA molecule depends on its folded structure.

17 Nobel prize 2009 Ribosome

18 RNA Structural levels tRNA Secondary Structure Tertiary Structure

19 RNA secondary structure: RNA bases are G, C, A, U. The RNA molecule folds on itself. The base pairing (by hydrogen bonds) is as follows: G-C, A-U, G-U. [Figure: the sequence 5’ GAUCUUGAUC 3’ folded into a stem and a loop.]

20 Predicting RNA secondary structure. Most common approach: search for an RNA structure with Minimal Free Energy (MFE). [Figure: two alternative folds of the same sequence, one with low energy and one with high energy.]

21 Free energy model: the free energy of a structure is the sum of all interaction energies, Free Energy (E) = E(CG) + E(CG) + ... The aim: to find the structure with the minimal free energy (MFE).

22 Why is MFE secondary structure prediction hard? The MFE structure can be found by calculating the free energy of all possible structures, BUT the number of potential structures grows exponentially with the number of bases. Solution: dynamic programming (Zuker and Stiegler).

23 Simplifying assumptions for RNA structure prediction: RNA folds into one minimum free-energy structure. The energy of a particular base pair can be calculated independently, i.e. neighbors do not influence the energy.
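Under these simplifying assumptions, the dynamic-programming idea can be sketched as below. This is a simplified Nussinov-style recursion that minimizes a sum of independent base-pair energies; it is not the Zuker and Stiegler algorithm, which also scores loops and stacking. The energy values, the `min_loop` parameter and the function name are assumptions made for illustration.

```python
# Illustrative per-pair energies in arbitrary integer units (assumed, not measured).
PAIR_ENERGY = {
    ("G", "C"): -3, ("C", "G"): -3,
    ("A", "U"): -2, ("U", "A"): -2,
    ("G", "U"): -1, ("U", "G"): -1,
}

def fold(seq, min_loop=3):
    """Return (minimal energy, dot-bracket structure) for an RNA sequence."""
    n = len(seq)
    if n == 0:
        return 0, ""
    # E[i][j] = minimal energy of the subsequence seq[i..j]
    E = [[0] * n for _ in range(n)]

    def pair_score(i, j):
        """Energy if i pairs with j, or None if that pair is not allowed."""
        e = PAIR_ENERGY.get((seq[i], seq[j]))
        if e is None or j - i <= min_loop:
            return None
        return e + E[i + 1][j - 1]

    for span in range(1, n):
        for i in range(n - span):
            j = i + span
            best = min(E[i + 1][j], E[i][j - 1])      # i or j left unpaired
            paired = pair_score(i, j)
            if paired is not None:
                best = min(best, paired)              # i pairs with j
            for k in range(i + 1, j):                 # two independent substructures
                best = min(best, E[i][k] + E[k + 1][j])
            E[i][j] = best

    # Trace back one optimal structure.
    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if i >= j:
            continue
        if E[i][j] == E[i + 1][j]:
            stack.append((i + 1, j))
        elif E[i][j] == E[i][j - 1]:
            stack.append((i, j - 1))
        elif pair_score(i, j) == E[i][j]:
            structure[i], structure[j] = "(", ")"
            stack.append((i + 1, j - 1))
        else:
            for k in range(i + 1, j):
                if E[i][j] == E[i][k] + E[k + 1][j]:
                    stack.extend([(i, k), (k + 1, j)])
                    break
    return E[0][n - 1], "".join(structure)

# The stem-loop sequence from slide 19; its drawn loop has only 2 bases.
print(fold("GAUCUUGAUC", min_loop=2))
```

The table has one cell per pair of positions and each cell scans the positions between them, so the work grows polynomially (cubically) with sequence length rather than exponentially.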

24 Sequence-dependent free energy: the nearest neighbor model. The free energy of a base pair is influenced by the previous base pair (not by the base pairs further down). [Figure: two hairpins that differ in one base pair of the stem.]

25 Sequence-dependent free-energy values of the base pairs (nearest neighbor model). Example values: GC GC AU GC CG UA -2.3 -2.9 -3.4 -2.1. These energies are estimated experimentally from small synthetic RNAs. [Figure: two hairpins that differ in one base pair of the stem.]

26 Improvements to the MFE approach: positive energy is added for destabilizing regions such as bulges, loops, etc. More than one structure can be predicted.

27 Free energy computation, contributions in kcal/mol: -0.3 5’ dangling; -1.1 mismatch of hairpin; -2.9 stacking; +3.3 1 nt bulge; -2.9 stacking; -1.8 stacking; -0.9 stacking; -1.8 stacking; -2.1 stacking; +5.9 4 nt loop. Total: ∆G = -4.6 kcal/mol. [Figure: hairpin with a 4 nt loop and a 1 nt bulge.]

28 Improvements to the MFE approach: positive energy is added for destabilizing regions such as bulges, loops, etc. Look for an ensemble of low-energy structures and generate a consensus structure. Why? RNA is dynamic and doesn’t always fold into the lowest-energy structure.
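The total on this slide is simply the sum of the individual contributions. The short check below uses the values from the slide; pairing the unlabeled -0.3 with the 5’ dangling end is an assumption based on the slide layout, and the function name is introduced here.

```python
# Contributions in kcal/mol, as listed on slide 27.
CONTRIBUTIONS = [
    ("5' dangling end", -0.3),   # label assignment assumed from the slide layout
    ("hairpin mismatch", -1.1),
    ("stacking", -2.9),
    ("1 nt bulge", +3.3),
    ("stacking", -2.9),
    ("stacking", -1.8),
    ("stacking", -0.9),
    ("stacking", -1.8),
    ("stacking", -2.1),
    ("4 nt loop", +5.9),
]

def total_free_energy(contributions):
    """Sum the energy terms, rounded to one decimal like the slide values."""
    return round(sum(value for _, value in contributions), 1)

print(total_free_energy(CONTRIBUTIONS), "kcal/mol")
```

Stacking terms are negative (stabilizing), while the bulge and the loop add positive (destabilizing) energy.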

29 RNA fold prediction based on multiple alignment: information from a multiple sequence alignment (MSA) can help to predict the probability of positions i,j being base-paired. Example alignment: GCCUUCGGGC; GACUUCGGUC; GGCUUCGGCC.

30 Compensatory substitutions: mutations that maintain the secondary structure can help predict the fold. [Figure: hairpin with a substituted G-C base pair in the stem.]

31 RNA secondary structure can be revealed by identification of compensatory mutations. Example alignment: GCCUUCGGGC; GACUUCGGUC; GGCUUCGGCC. [Figure: stem-loop with the covarying base pair marked N-N’.]

32 Insight from multiple alignment: information from a multiple sequence alignment (MSA) can help to predict the probability of positions i,j being base-paired. Conservation: no additional information. Consistent mutations (GC -> GU): support the stem. Inconsistent mutations: do not support the stem. Compensatory mutations: support the stem.

33 From RNA structure to function: many families of non-coding RNAs that have unique functions are characterized by a combination of conserved sequence and conserved structure.
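The four categories can be illustrated with a small classifier for a pair of alignment columns. This is a sketch under simple assumptions: an ungapped alignment, G-C, A-U and G-U as the only allowed pairs, and a column pair counted as inconsistent as soon as one sequence cannot pair. The function name is hypothetical; the alignment is the one shown on slides 29 and 31.

```python
CANONICAL = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A"), ("G", "U"), ("U", "G")}

def classify_column_pair(alignment, i, j):
    """Classify alignment columns i and j (0-based) as evidence for a base pair."""
    pairs = [(seq[i], seq[j]) for seq in alignment]
    if not all(pair in CANONICAL for pair in pairs):
        return "inconsistent"    # at least one sequence cannot pair here
    left = {pair[0] for pair in pairs}
    right = {pair[1] for pair in pairs}
    if len(left) == 1 and len(right) == 1:
        return "conserved"       # pairs everywhere, but no extra information
    if len(left) > 1 and len(right) > 1:
        return "compensatory"    # both sides changed and pairing is kept
    return "consistent"          # one side changed (e.g. GC -> GU), still pairs

alignment = ["GCCUUCGGGC", "GACUUCGGUC", "GGCUUCGGCC"]
for i, j in [(0, 9), (1, 8), (2, 7)]:
    print(i, j, classify_column_pair(alignment, i, j))
```

In this alignment the second and ninth columns change together (C-G, A-U, G-C), which is the compensatory pattern that supports a stem.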

34 MicroRNAs miRNA gene Target gene mature miRNA

35 MicroRNA in Cancer Sun et al, 2012

36 The challenge for bioinformatics: identifying new microRNA genes; identifying the targets of specific microRNAs.

37 How to find microRNA genes? Search for sequences that fold into a hairpin of ~70 nt (RNAfold, or other efficient algorithms for identifying stem-loops). Concentrate on intergenic regions and introns (filter out coding regions). Filter out non-conserved candidates (mature and pre-miRNAs are usually evolutionarily conserved).
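The hairpin filter can be sketched on top of a predicted structure in dot-bracket notation, such as the output of a folding tool like RNAfold. The function names and the length window of 60 to 80 nt are hypothetical choices made for illustration, since the slide only says "~70 nt".

```python
def is_single_hairpin(structure):
    """True if a dot-bracket structure is one stem-loop (no branching)."""
    opens = structure.count("(")
    if opens == 0 or opens != structure.count(")"):
        return False
    # In a single hairpin every "(" comes before every ")".
    return structure.rfind("(") < structure.find(")")

def is_candidate_pre_mirna(structure, min_len=60, max_len=80):
    """Length window and hairpin shape; the thresholds are illustrative only."""
    return min_len <= len(structure) <= max_len and is_single_hairpin(structure)

print(is_single_hairpin("((((...))))"))    # one stem-loop
print(is_single_hairpin("((..))((..))"))   # two stem-loops
```

Candidates that pass this structural filter would still need the genomic-location and conservation filters listed on the slide.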

38 How to find microRNA genes? A. Structure prediction B. Evolutionary Conservation

39 Predicting microRNA targets: microRNA targets are located in 3’ UTRs and are complementary to mature microRNAs. Why is it hard to find them? Base pairing is required only in the seed sequence (7-8 nt), and many known miRNAs have similar seed sequences, so there is a very high probability of finding a match by chance. [Figure: mature miRNA paired with the 3’ UTR of a target gene.]

40 Predicting microRNA target genes, general methods: find motifs that complement the seed sequence (allowing mismatches); look for conserved target sites; consider the MFE of the RNA-RNA pairing, ∆G(miRNA+target); consider the difference between the MFE of the RNA-RNA pairing and that of the folding of the target alone, ∆G(miRNA+target) - ∆G(target).
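The first of these methods, searching a 3’ UTR for sites that complement the seed, can be sketched as follows. Taking the seed as 7 nt starting at the second base of the miRNA is a common convention assumed here, not something stated in the slides. The UTR is a made-up toy sequence, and the function names and the `max_mismatches` parameter are hypothetical.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

def reverse_complement(rna):
    """Reverse complement of an RNA string (Watson-Crick pairs only)."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))

def find_seed_sites(mirna, utr, seed_start=1, seed_len=7, max_mismatches=0):
    """Return (position, site, mismatches) for UTR windows matching the seed.

    seed_start is 0-based, so the default takes miRNA bases 2-8.
    """
    seed = mirna[seed_start:seed_start + seed_len]
    target = reverse_complement(seed)   # what the UTR must contain to pair with the seed
    hits = []
    for pos in range(len(utr) - seed_len + 1):
        window = utr[pos:pos + seed_len]
        mismatches = sum(a != b for a, b in zip(window, target))
        if mismatches <= max_mismatches:
            hits.append((pos, window, mismatches))
    return hits

mirna = "UGAGGUAGUAGGUUGUAUAGUU"     # example miRNA sequence
utr = "AAACUACCUCAGGGCUACGUCAA"      # toy 3' UTR, for illustration only
print(find_seed_sites(mirna, utr))
print(find_seed_sites(mirna, utr, max_mismatches=1))
```

Allowing even one mismatch already adds a second hit in this short toy UTR, which shows why seed matching alone produces many chance matches and why conservation and ∆G criteria are used as additional filters.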

