
Lecture 8. Molecular structures The Chinese University of Hong Kong BMEG3102 Bioinformatics.


1 Lecture 8. Molecular structures The Chinese University of Hong Kong BMEG3102 Bioinformatics

2 Lecture outline
1. From sequences to functions
2. RNA secondary structures
Last update: 22-Mar-2016. BMEG3102 Bioinformatics | Kevin Yip-cse-cuhk | Spring 2016

3 FROM SEQUENCES TO FUNCTIONS Part 1

4 From sequences to functions
One of the biggest questions in molecular biology: can one tell the function of a molecule (DNA/RNA/protein) from its sequence alone?
– Sometimes, but usually not (yet)
– Easier if we also know the structure
– Common belief: sequence → structure → function
– Of course, function also depends on the environment

5 Molecular structures
Four levels:
– Primary structures: the sequence
– Secondary structures: formed first; local
– Tertiary structures: global; sometimes called "folds" and "domains"
– Quaternary structures: multiple molecules
Image credit: http://www.personal.psu.edu/jms5704/blogs/simmons/levels_of_protein_s_c_la_784.jpg

6 Primary structures
Connections (strong covalent bonds vs. weak hydrogen bonds):
– Which molecules are connected
– Which atoms are connected
– First-level constraints on the possible structures
Example: molecules that are close in the primary structure must also be close in the secondary, tertiary and quaternary structures
Image credit: Wikibooks

7 Primary structures
Orientation:
– DNA, RNA: 5' to 3'
– Amino acid chains: amino (N) terminus to carboxyl (C) terminus
"Residue": what remains of a nucleotide or amino acid after a water molecule is expelled as it joins the chain
Image credit: http://bealbio.wikispaces.com/file/view/dsDNA.jpg, http://attentionmanagement.ca/userfiles/image/DNA-RNA%20directions.gif, http://www.phschool.com/science/biology_place/biocoach/images/translation/peptbond.gif, http://www.cystinuria.org/resources/education/aminoacids/peptide.gif

8 DNA secondary structures
Double helix:
– A-DNA (dehydrated samples): right-handed, 11 bp per turn
– B-DNA (most common): right-handed, 10.5 bp per turn
– Z-DNA (some methylated DNA): left-handed, 12 bp per turn
Image credit: Wikipedia

9 DNA secondary structures
[Figure: A-DNA, B-DNA and Z-DNA]
Image credit: Wikipedia

10 RNA secondary structures
Can largely be projected onto a 2D plane (much more on RNA secondary structures later in this lecture)
Image source: http://rna.urmc.rochester.edu/NNDB/RS1141-edited.gif

11 RNA secondary structures
Pseudoknots: complex structures
Image credit: Wikipedia; Sperschneider and Datta, RNA 14(4):630-640, (2008)

12 Protein secondary structures
Three main types of sub-structures:
– α-helices (A)
– β-sheets (B)
– Coils/connectors (C)
Image credit: http://calcium.uhnres.utoronto.ca/cadherin/images/pub_pages/general/ribbon.jpg, http://www.mun.ca/biology/scarr/MGA2-03-25.jpg

13 DNA tertiary structures
– Wrapped around nucleosomes formed by histone proteins
– Condensed form at the beginning of mitosis and meiosis
Image credit: http://micro.magnet.fsu.edu/cells/nucleus/images/chromatinstructurefigure1.jpg, Wikipedia

14 RNA tertiary structures
Overall structure of an RNA
– More studied for RNAs that are not translated into proteins ("non-coding" RNAs)
– Example: tRNA
Image credit: Wikipedia

15 Protein tertiary structures
Complex structures
– Mainly caused by weak forces (hydrogen bonds and hydrophobic interactions)
– Occasionally stronger forces (disulfide bonds between cysteines)
The CATH hierarchy:
– Class: composition of secondary structures
– Architecture: overall shape
– Topology: connection of secondary structures
– Homologous superfamily: with a common ancestor
Image credit: CATH

16 Quaternary structures
Types:
– Protein subunit-protein subunit
– Protein-protein
– Protein-DNA
– Protein-RNA
– (Protein-small molecules)
– RNA-RNA
– ...
Figures: protein-DNA interaction; protein-subunit interaction (hemoglobin)
Image credit: Wikipedia, http://serrano.crg.es/images/protein_dna1.jpg

17 Structure and function
Why does function depend on structure?
1. The structure itself is the function (e.g., tubulins)
2. Binding
– Complementarity of interacting structures
– Formation of special bonds
Image credit: http://www.nigms.nih.gov/NR/rdonlyres/54BEAC37-47A9-454A-BC4F-B94EA127FA1E/0/fig1a_large.jpg, http://upload.wikimedia.org/wikimedia/en-labs/7/7f/Protein_Protein_Docking.JPG

18 Structure and function
Why does function depend on structure? (cont'd)
3. Functional groups (e.g., catalytic sites)
4. Determining localization (e.g., transporter membrane proteins)
Image credit: http://www.catalysis-ed.org.uk/principles/images/enzyme_substrate.gif, Spudich, Science 288(5470):1358-1359, 2000

19 RNA SECONDARY STRUCTURES Part 2

20 Important RNA classes
Coding:
– Messenger RNAs (mRNAs): translated into proteins
Non-coding:
– Ribosomal RNAs (rRNAs): parts of the ribosome complex
– Transfer RNAs (tRNAs): deliver free amino acids during translation
– MicroRNAs (miRNAs): bind mRNA targets to promote RNA degradation or repress translation
– Small nucleolar RNAs (snoRNAs): guide chemical modifications of other RNAs
– Small nuclear RNAs (snRNAs): involved in mRNA splicing
– Long non-coding RNAs (lncRNAs): some involved in gene regulation
– ...
Image source: http://legacy.hopkinsville.kctcs.edu/sitecore/instructors/Jason-Arnold/VLI/Module%201/m1DNAfunction/m1DNAfunction3.html

21 Importance of RNA structures
Structure is important to many classes of RNA. Examples: tRNA, snoRNA
Image sources: http://www.bio.miami.edu/dana/pix/tRNA.jpg, http://lowelab.ucsc.edu/images/CDBox.jpg

22 Representing RNA secondary structures
Formats (see http://projects.binf.ku.dk/pgardner/bralibase/RNAformats.html):
– Dot-bracket format
– Stockholm format
– ...

23 Dot-bracket format
Sequence (on the slide, nucleotides 10, 20, 30, etc. are marked in red):
GUGAAUGAUGAAUUUAAUUCUUUGGUCCGUGUUUAUGAUGGGAAGUAAGACCCCCGAUAUGAGUGACAAAAGAGAUGUGGUUGACUAUCACAGUAUCUGACG
Structure:
......((((.......((((((.(((....((((((.((((..........)))).)))))).))).)))))).((((((.....)))))).)))).....
Image credit: Xihao Hu
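To make the dot-bracket notation concrete, here is a minimal Python sketch that turns a dot-bracket string into 1-based (i, j) pairs using a stack: every ')' closes the most recent unmatched '('. The function name `dot_bracket_to_pairs` is hypothetical and does not come from any library. The structure string is the one on this slide.

```python
def dot_bracket_to_pairs(structure):
    """Return the base pairs (i, j), 1-based with i < j, encoded by a dot-bracket string."""
    stack, pairs = [], []
    for pos, ch in enumerate(structure, start=1):
        if ch == "(":
            stack.append(pos)                 # remember where the pair opens
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {pos}")
            pairs.append((stack.pop(), pos))  # the most recent '(' closes first
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at position {pos}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)


STRUCTURE = "......((((.......((((((.(((....((((((.((((..........)))).)))))).))).)))))).((((((.....)))))).))))....."
pairs = dot_bracket_to_pairs(STRUCTURE)   # pairs[:2] == [(7, 97), (8, 96)]
```

On this structure the sketch recovers the pairs listed on slide 27, such as (7, 97), (8, 96), (18, 74) and (81, 87).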

24 Predicting RNA secondary structures
A basic assumption in structure prediction:
– The real structure has the lowest free energy
– In a simplified view, more stable bonds → lower free energy
In the case of RNA secondary structures:
– It is good to form more pairs
  Canonical pairs: A-U, C-G
  Sometimes G-U (a "wobble" base pair)
– It is good to form more stable pairs. Stability: C-G > A-U > G-U
– It is good to have stable sub-structures, e.g., stacking pairs
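The pairing rules above can be written as a small lookup. This sketch accepts uppercase or lowercase RNA letters. The names `pair_kind` and `STABILITY_RANK` are hypothetical. The ranks only encode the ordering C-G > A-U > G-U from the slide; they are not energies.

```python
# Pairing rules from the slide: canonical A-U and C-G, plus the G-U wobble pair.
CANONICAL = {frozenset("AU"), frozenset("CG")}
WOBBLE = frozenset("GU")
# Rank only (higher = more stable): C-G > A-U > G-U. These are not energies.
STABILITY_RANK = {frozenset("CG"): 3, frozenset("AU"): 2, frozenset("GU"): 1}


def _key(a, b):
    return frozenset((a.upper(), b.upper()))


def pair_kind(a, b):
    """Return 'canonical', 'wobble', or None if bases a and b cannot pair."""
    key = _key(a, b)
    if key in CANONICAL:
        return "canonical"
    if key == WOBBLE:
        return "wobble"
    return None


def most_stable_first(base_pairs):
    """Sort (a, b) base pairs from most to least stable; reject non-pairs."""
    for a, b in base_pairs:
        if pair_kind(a, b) is None:
            raise ValueError(f"{a}-{b} is not a valid RNA base pair")
    return sorted(base_pairs, key=lambda ab: STABILITY_RANK[_key(*ab)], reverse=True)
```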

25 Predicting RNA secondary structures
We will assume there are no pseudoknots
– With pseudoknots, there is currently no known algorithm that can find the optimal solution efficiently
We need two things:
1. A thermodynamic model for computing the free energy of a structure
2. A method for finding the structure with the minimum free energy
– Does this setting sound familiar?
Image credit: Wikipedia (a pseudoknot)
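A pseudoknot is usually defined as a pair of crossing base pairs: (i, j) and (k, l) with i < k < j < l. Taking that definition as given, a brute-force check over every two base pairs is enough to test the no-pseudoknot assumption. The names `crossing_pairs` and `is_pseudoknot_free` are illustrative.

```python
from itertools import combinations


def crossing_pairs(pairs):
    """Return every two base pairs (i, j), (k, l) with i < k < j < l, i.e. pairs that cross."""
    ordered = sorted((min(p), max(p)) for p in pairs)
    return [(p, q) for p, q in combinations(ordered, 2) if p[0] < q[0] < p[1] < q[1]]


def is_pseudoknot_free(pairs):
    """True if no two base pairs cross (a nested structure)."""
    return not crossing_pairs(pairs)
```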

26 Further assumptions
1. The free energy of a secondary structure is the sum of the free energies of its sub-structures.
– It is not the sum over individual bases or base pairs, because one base pair can participate in multiple sub-structures.
– We count each sub-structure exactly once. For example, to count a hairpin loop, we consider the base pair that closes the loop.
2. The free energies of the sub-structures are independent of one another.

27 Problem definition
Given an RNA sequence, find a set of base pairs, with each base paired at most once, that has the minimum free energy
Example:
– Input sequence: GUGAAUGAUGAAUUU...ACG
– Output set of base pairs: (7, 97), (8, 96), ..., (18, 74), ..., (81, 87)
Image credit: Xihao Hu
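The output side of the problem can be sketched as follows: encode a set of 1-based base pairs as a dot-bracket string, and reject inputs in which a base is paired more than once. Plain brackets cannot express crossing pairs, so the sketch also rejects pseudoknots by re-reading its own output. `pairs_to_dot_bracket` is a hypothetical helper.

```python
def pairs_to_dot_bracket(n, pairs):
    """Encode 1-based base pairs on a length-n RNA as a dot-bracket string."""
    chars = ["."] * n
    wanted = set()
    for i, j in pairs:
        i, j = min(i, j), max(i, j)
        if not 1 <= i < j <= n:
            raise ValueError(f"pair {(i, j)} is outside 1..{n}")
        if chars[i - 1] != "." or chars[j - 1] != ".":
            raise ValueError(f"a base in pair {(i, j)} is already paired")
        chars[i - 1], chars[j - 1] = "(", ")"
        wanted.add((i, j))
    # A nested (pseudoknot-free) pair set reads back unchanged; crossing pairs do not.
    stack, found = [], set()
    for pos, ch in enumerate(chars, start=1):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            found.add((stack.pop(), pos))
    if found != wanted:
        raise ValueError("pairs cross (pseudoknot); plain dot-bracket cannot encode them")
    return "".join(chars)
```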

28 Linear view
[Figure: the example structure drawn as arcs above sequence positions 1, 7-25 and 67-97]

29 Thermodynamic model
We will consider four types of sub-structures here:
– Stacking pairs: both (i, j) and (i+1, j-1) are in the set
– Hairpin loop: there is a pair (i, j), where all bases from i+1 to j-1 are unpaired
– Bulge/internal loop: there are two pairs (i, j) and (i₁, j₁), where i < i₁ < j₁ < j, and all bases from i+1 to i₁-1 and from j₁+1 to j-1 are unpaired (with at least one unpaired base; otherwise it is a stacking pair)
– Multi-loop: there are pairs (i, j), (i₁, j₁), ..., (iₖ, jₖ) with k ≥ 2, where i < i₁ < j₁ < ... < iₖ < jₖ < j, and all bases from i+1 to i₁-1, from j₁+1 to i₂-1, ..., from jₖ₋₁+1 to iₖ-1, and from jₖ+1 to j-1 are unpaired
Note: one base pair can participate in multiple sub-structures
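The four definitions can be checked mechanically. For each pair (i, j), collect the pairs it directly encloses and note which sides have unpaired bases. This minimal sketch assumes a valid, pseudoknot-free dot-bracket string. It labels each pair with the loop that pair closes, using the bulge/internal distinction from slide 32. The function name is hypothetical.

```python
def classify_loops(structure):
    """Map each base pair (i, j), 1-based, to the type of loop it closes."""
    partner, stack = {}, []
    for pos, ch in enumerate(structure, start=1):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            i = stack.pop()
            partner[i], partner[pos] = pos, i
    loops = {}
    for i in sorted(p for p in partner if p < partner[p]):
        j = partner[i]
        # Walk from i+1 to j-1, jumping over each directly enclosed pair.
        inner, k = [], i + 1
        while k < j:
            if k in partner:
                inner.append((k, partner[k]))
                k = partner[k] + 1
            else:
                k += 1
        if not inner:
            kind = "hairpin"        # everything between i and j is unpaired
        elif len(inner) == 1:
            i1, j1 = inner[0]
            if (i1, j1) == (i + 1, j - 1):
                kind = "stacking"   # (i, j) stacks directly on (i+1, j-1)
            elif i1 == i + 1 or j1 == j - 1:
                kind = "bulge"      # unpaired bases on one side only
            else:
                kind = "internal"   # unpaired bases on both sides
        else:
            kind = "multi"          # two or more directly enclosed pairs
        loops[(i, j)] = kind
    return loops
```

On the example structure from slide 23, this labels (20, 72) as stacking, (81, 87) as a hairpin, (23, 69) as an internal loop and (10, 94) as a multi-loop. These match the examples on slides 30 to 33.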

30 Stacking pairs
Both (i, j) and (i+1, j-1) are in the set
E.g., i: 20, j: 72
[Figure: linear view with positions i, i+1, j-1 and j marked]

31 Hairpin loop
There is a pair (i, j), where all bases from i+1 to j-1 are unpaired
E.g., i: 81, j: 87
[Figure: linear view with positions i and j marked]
Image source: http://img.ehowcdn.com/article-new/ds-photo/getty/article/151/226/87820768_XS.jpg

32 Bulge/Internal loop
Internal loop: there are two pairs (i, j) and (i₁, j₁), where i < i₁ < j₁ < j, and all bases from i+1 to i₁-1 and from j₁+1 to j-1 are unpaired
– Called a bulge if only one side has unpaired bases
E.g., i: 23, j: 69, i₁: 25, j₁: 67
[Figure: linear view with positions i, i₁, j₁ and j marked]

33 Multi-loop
Multi-loop: there are pairs (i, j), (i₁, j₁), ..., (iₖ, jₖ) with k ≥ 2, where i < i₁ < j₁ < ... < iₖ < jₖ < j, and all bases from i+1 to i₁-1, from j₁+1 to i₂-1, ..., from jₖ₋₁+1 to iₖ-1, and from jₖ+1 to j-1 are unpaired
E.g., k = 2, i: 10, j: 94, i₁: 18, j₁: 74, i₂: 76, j₂: 92
[Figure: linear view with positions i, i₁, j₁, i₂, j₂ and j marked]

34 One possible thermodynamic model
Unpaired bases have 0 free energy, and all the terms below have negative free energy:
– eS(i, j): for the stacking pairs (i, j) and (i+1, j-1)
– eH(i, j): for the hairpin loop closed by (i, j)
– eBI(i, j, i₁, j₁): for a bulge or internal loop enclosed by the pairs (i, j) and (i₁, j₁)
– eM(i, j, i₁, j₁, ..., iₖ, jₖ): for a multi-loop that consists of the pairs (i, j), (i₁, j₁), ..., (iₖ, jₖ), satisfying i < i₁ < j₁ < ... < iₖ < jₖ < j
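Under the additivity assumption from slide 26, the total free energy of a structure S is a sum over its sub-structures, with each one counted once through the pair that closes it. The symbol E(S) for that total is introduced here and does not appear on the slides. Unpaired bases contribute 0.

```latex
E(S) = \sum_{\substack{(i,j)\in S \\ (i+1,j-1)\in S}} eS(i,j)
     + \sum_{\substack{(i,j)\in S \\ \text{closes a hairpin}}} eH(i,j)
     + \sum_{\substack{(i,j),(i_1,j_1)\in S \\ \text{bulge/internal loop}}} eBI(i,j,i_1,j_1)
     + \sum_{\substack{(i,j),(i_1,j_1),\dots,(i_k,j_k)\in S \\ \text{multi-loop},\ k \ge 2}} eM(i,j,i_1,j_1,\dots,i_k,j_k)
```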

35 Finding the optimal structure
Dynamic programming
Let s be the RNA sequence with n nucleotides
Tables:
– V(j): free energy of the optimal structure for s[1..j]; the final answer is based on V(n)
– V^P(i, j): free energy of the optimal structure for s[i..j] with i and j forming a pair
– V^BI(i, j): free energy of the optimal structure for s[i..j] with i and j forming a pair that closes a bulge or internal loop
– V^M(i, j): free energy of the optimal structure for s[i..j] with i and j forming a pair that closes a multi-loop

36 Update formulas
V(j): free energy of the optimal structure for s[1..j]
V(1) = 0
For j > 1, V(j) is the lower of two cases shown in the figure: j is unpaired, or j pairs with some i < j
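The slide's formula was an image and did not survive in this transcript. The equation here is a reconstruction consistent with the two cases in its figure (j unpaired, or j paired with some i) and with the standard form of this recurrence. The convention V(0) = 0, which is needed when i = 1, is an assumption.

```latex
V(0) = V(1) = 0, \qquad
V(j) = \min\Bigl\{\, V(j-1),\; \min_{1 \le i < j} \bigl[\, V(i-1) + V^{P}(i,j) \,\bigr] \Bigr\} \quad (j > 1)
```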

37 Update formulas
V^P(i, j): free energy of the optimal structure for s[i..j] with i and j forming a pair
We require that i < j
[Figure: cases for the pair (i, j): stacking pair with (i+1, j-1); hairpin loop with all bases in between unpaired; ...]
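This is a reconstruction of the missing formula. It assumes the cases are the four sub-structure types that (i, j) can close; the figure shows stacking and hairpin and then "...". Setting V^P(i, j) = +∞ when i ≥ j, or when s[i] and s[j] cannot pair, is an added convention.

```latex
V^{P}(i,j) = \min\bigl\{\, eH(i,j),\; eS(i,j) + V^{P}(i+1,j-1),\; V^{BI}(i,j),\; V^{M}(i,j) \,\bigr\}, \qquad i < j
```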

38 Update formulas
V^BI(i, j): free energy of the optimal structure for s[i..j] with i and j forming a pair that closes a bulge or internal loop (i.e., (i, j) is the enclosing pair in eBI(i, j, i₁, j₁), and (i₁, j₁) is the inner pair)
[Figure: pair (i, j) enclosing pair (i₁, j₁), with all bases between them unpaired]
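This is a reconstruction of the missing formula. It assumes (i, j) is the outer pair and (i₁, j₁) the inner pair, as in eBI. The side condition rules out the case with no unpaired bases, because that case is a stacking pair rather than a loop.

```latex
V^{BI}(i,j) = \min_{\substack{i < i_1 < j_1 < j \\ (i_1 - i - 1) + (j - j_1 - 1) \ge 1}} \bigl[\, eBI(i,j,i_1,j_1) + V^{P}(i_1,j_1) \,\bigr]
```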

39 Update formulas
V^M(i, j): free energy of the optimal structure for s[i..j] with i and j forming a pair that closes a multi-loop
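This is a reconstruction of the missing formula. It assumes the minimum is taken over every choice of k ≥ 2 enclosed pairs. Choosing those 2k indices is what gives the O(n²ᵏ) cost per entry stated on slides 41 and 42.

```latex
V^{M}(i,j) = \min_{\substack{k \ge 2 \\ i < i_1 < j_1 < i_2 < j_2 < \cdots < i_k < j_k < j}}
  \Bigl[\, eM(i,j,i_1,j_1,\dots,i_k,j_k) + \sum_{h=1}^{k} V^{P}(i_h,j_h) \,\Bigr]
```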

40 Time and space requirements
– V: n entries, each takes O(n) time
– V^P(i, j): O(n²) entries, each takes constant time

41 Time and space requirements
– V^BI: O(n²) entries, each takes O(n²) time
– V^M: O(n²) entries, each takes O(n²ᵏ) time

42 Time and space requirements
Summary:
– V: n entries, each takes O(n) time
– V^P: O(n²) entries, each takes constant time
– V^BI: O(n²) entries, each takes O(n²) time
– V^M: O(n²) entries, each takes O(n²ᵏ) time
Total: O(n²) space, O(n²ᵏ⁺²) time
– Exponential if k is unbounded
– Some approximations can bring the time down to O(n⁴), which is still huge for large n but feasible for small or medium n
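The tables and recurrences can be sketched in Python. This is a minimal illustration that rests on assumptions not taken from the slides:
1. All energy values are made-up toy numbers. They are negative, as in the model on slide 34, but they are not measured parameters.
2. Instead of the general eM, the sketch uses an affine multi-loop energy: a constant plus a fixed amount per pair in the loop. This is a common simplification that lets an auxiliary table replace the search over all k branches. The table is called WM here, which is a hypothetical name.
3. Bulges and internal loops are capped at MAX_LOOP unpaired bases.
4. A hairpin needs at least MIN_HAIRPIN unpaired bases.

V^BI and V^M are evaluated inline for each V^P(i, j) rather than stored as separate tables. Only the energy V(n) is returned; recovering the structure itself would need a traceback, which is omitted.

```python
import math

INF = math.inf

# Hypothetical toy parameters (NOT measured nearest-neighbour energies).
# eS depends only on the outer pair, ordered by the slide's stability C-G > A-U > G-U.
STACK = {"GC": -3.0, "CG": -3.0, "AU": -2.0, "UA": -2.0, "GU": -1.0, "UG": -1.0}
HAIRPIN = -1.0          # eH(i, j) for any hairpin with >= MIN_HAIRPIN unpaired bases
BULGE_INTERNAL = -0.5   # eBI(i, j, i1, j1), independent of loop size here
MULTI_A = -1.0          # affine multi-loop: MULTI_A + MULTI_B * (pairs in the loop)
MULTI_B = -0.5
MIN_HAIRPIN = 3         # fewest unpaired bases allowed in a hairpin loop
MAX_LOOP = 30           # most unpaired bases allowed in a bulge/internal loop


def min_free_energy(seq):
    """Fill V^P (with V^BI and V^M inline), WM and V; return V(n) under the toy model."""
    n = len(seq)
    s = " " + seq.upper()                          # s[1..n], 1-based as on the slides
    VP = [[INF] * (n + 2) for _ in range(n + 2)]
    WM = [[INF] * (n + 2) for _ in range(n + 2)]   # s[i..j] holds >= 1 multi-loop branch

    for d in range(MIN_HAIRPIN + 1, n):            # d = j - i, shortest spans first
        for i in range(1, n - d + 1):
            j = i + d
            if s[i] + s[j] in STACK:               # only A-U, C-G and G-U can pair
                best = HAIRPIN                                          # hairpin loop
                best = min(best, STACK[s[i] + s[j]] + VP[i + 1][j - 1])  # stacking pair
                for i1 in range(i + 1, j):                              # V^BI(i, j)
                    if i1 - i - 1 > MAX_LOOP:
                        break
                    for j1 in range(j - 1, i1, -1):
                        unpaired = (i1 - i - 1) + (j - j1 - 1)
                        if unpaired > MAX_LOOP:
                            break
                        if unpaired > 0:
                            best = min(best, BULGE_INTERNAL + VP[i1][j1])
                # V^M(i, j): the closing pair plus at least two branches, split at k.
                branches = min((WM[i + 1][k] + WM[k + 1][j - 1] for k in range(i + 1, j - 1)),
                               default=INF)
                best = min(best, MULTI_A + MULTI_B + branches)
                VP[i][j] = best
            # WM(i, j): one branch (i, j), or an unpaired end (cost 0), or two parts.
            wm = min(VP[i][j] + MULTI_B, WM[i + 1][j], WM[i][j - 1])
            for k in range(i, j):
                wm = min(wm, WM[i][k] + WM[k + 1][j])
            WM[i][j] = wm

    V = [0.0] * (n + 1)                            # V[0] = V[1] = 0
    for j in range(2, n + 1):
        V[j] = min([V[j - 1]] + [V[i - 1] + VP[i][j] for i in range(1, j)])
    return V[n]
```

With the loop cap and the affine multi-loop, the sketch runs in O(n³) time and O(n²) space. For example, on GGGGAAAACCCC it finds four stacked G-C pairs closing an AAAA hairpin. That is three stacking terms of -3 plus a hairpin term of -1, for a total of -10.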

43 Some remarks
If we allow general pseudoknots, there is currently no efficient way to find the optimal RNA secondary structure with the minimum free energy
Other methods to predict RNA secondary structures:
– Conservation and covariation
  High conservation: columns 2 and 4
  Strong covariation: columns 1 and 5
– Experimental methods (e.g., RNA footprinting)
Example alignment:
12345
ACGGU
ACUGU
CCAGG
UCCGA
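The conservation/covariation idea can be tried on the five-column alignment above. This is a simple sketch, not a standard scoring method such as mutual information. A column counts as conserved if every sequence has the same base there. Two columns count as covarying if both vary and yet every sequence can form a base pair (A-U, C-G or G-U) between them. The function names are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Alignment from the slide (one string per sequence, columns 1-5).
ALIGNMENT = ["ACGGU", "ACUGU", "CCAGG", "UCCGA"]
PAIRABLE = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C"), ("G", "U"), ("U", "G")}


def conservation(column):
    """Fraction of sequences carrying the most common base in this column."""
    return Counter(column).most_common(1)[0][1] / len(column)


def covarying_columns(alignment):
    """1-based column pairs that both vary but can base-pair in every sequence."""
    columns = list(zip(*alignment))
    hits = []
    for a, b in combinations(range(len(columns)), 2):
        always_pairs = all((x, y) in PAIRABLE for x, y in zip(columns[a], columns[b]))
        both_vary = conservation(columns[a]) < 1 and conservation(columns[b]) < 1
        if always_pairs and both_vary:
            hits.append((a + 1, b + 1))
    return hits


conserved = [c + 1 for c, col in enumerate(zip(*ALIGNMENT)) if conservation(col) == 1]
```

The sketch reports columns 2 and 4 as fully conserved and (1, 5) as the only covarying pair, which matches the slide.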

44 Representing pseudoknots
Without pseudoknots, RNA secondary structures can be unambiguously represented by dots (unpaired bases) and brackets (base pairs)
– What if there are pseudoknots?
– We need more types of brackets
Example (positions 1-19):
Sequence: GAAGUACAAUAUGUAACCG
Structure: .{.((((.....))})).. 
Image source: http://ultrastudio.org/upload/RNAPseudoKnot-25005810.jpg
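Giving each bracket type its own stack is enough to read structures like the example above. The bracket set (), [], {}, <> is an assumption; the slide's example only uses () and {}. The function names are hypothetical.

```python
BRACKETS = {")": "(", "]": "[", "}": "{", ">": "<"}   # closer -> opener


def parse_extended_dot_bracket(structure):
    """Return sorted 1-based pairs; each bracket type has its own stack, so types may cross."""
    stacks = {opener: [] for opener in BRACKETS.values()}
    pairs = []
    for pos, ch in enumerate(structure, start=1):
        if ch in stacks:
            stacks[ch].append(pos)
        elif ch in BRACKETS:
            opener_stack = stacks[BRACKETS[ch]]
            if not opener_stack:
                raise ValueError(f"unmatched {ch!r} at position {pos}")
            pairs.append((opener_stack.pop(), pos))
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at position {pos}")
    leftover = [p for st in stacks.values() for p in st]
    if leftover:
        raise ValueError(f"unmatched opening bracket at position {min(leftover)}")
    return sorted(pairs)


def has_crossing(pairs):
    """True if some (i, j), (k, l) satisfy i < k < j < l, i.e. a pseudoknot is present."""
    return any(i < k < j < l for i, j in pairs for k, l in pairs)
```

For the example, the parser returns the {} pair (2, 15) together with (4, 17), (5, 16), (6, 14) and (7, 13). `has_crossing` then confirms that (2, 15) and (4, 17) cross.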

45 CASE STUDY, SUMMARY AND FURTHER READINGS Epilogue

46 Case study: Drug finding/design
Drugs are mostly chemicals with a specific structure that interacts with some biological target
Examples:
– Inhibiting the activity of an important bacterial protein
– Blocking the interaction between a virus and the receptors of a host cell
– Stimulating the production of a hormone

47 Case study: Drug finding/design
Suppose we want to identify or design a chemical that targets a particular object (e.g., a protein). We need to make sure that the two bind tightly, which is checked through a process called docking
Image source: http://vds.cm.utexas.edu/

48 Case study: Drug finding/design
Computational problem:
– Input: a target protein and a list of chemicals
– Goal: find a chemical that binds the target well
  Try different locations and orientations
  Binding depends on structure and chemistry
– Output: one or more chemicals that bind the target well
Difficulties:
– Computational complexity
  Large search space for each protein-chemical combination
  Need to try many chemicals
– Need to ensure specificity (not targeting other proteins and causing side effects)

49 Case study: Drug finding/design
There is a game called FoldIt (http://fold.it/) in which players try to fold proteins
– Score based on free energy
– Real-time update of scores and ranks
– Players can discuss and share solutions
– Has produced some remarkably good folds compared with automatic predictions by computer programs
Image source: http://fold.it/portal/site_files/theme/science/competition.png

50 Summary
Functions depend on structures
Different levels of structures:
– Primary (sequence)
– Secondary (local)
– Tertiary (global)
– Quaternary (interactions)
RNA secondary structures can be predicted by dynamic programming based on a thermodynamic model
Important sub-structures:
– Stacking pairs
– Hairpin loops
– Internal loops/bulges
– Multi-loops
– Pseudoknots

51 Further readings
Chapter 11 of Algorithms in Bioinformatics: A Practical Introduction
– Speed-up of the algorithm
– Algorithm for RNA structure prediction with pseudoknots
– Free slides available
Parts VII and VIII of Fundamental Concepts of Bioinformatics
– Protein folding and protein structure prediction
– Docking

52 Further readings
Dekker et al., Exploring the three-dimensional organization of genomes: interpreting chromatin interaction data. Nature Reviews Genetics 14(6):390-403, (2013)
– Describes how information about the 3D organization of genomes can be obtained

