CS5263 Bioinformatics Lecture 2: Introduction to molecular biology.



2 CS5263 Bioinformatics Lecture 2: Introduction to molecular biology

3 Polymer    Monomer
  DNA        Deoxyribonucleotides
  RNA        Ribonucleotides
  Protein    Amino acids

4 DNA DNA: forms the genetic material of all living organisms A string made from the alphabet {A, C, G, T} –e.g. ACAGAACGTAGTGCCGTGAGCG Each letter is called a base –A deoxyribonucleotide

5 DNA strand 5’-AGCGACTG-3’ (figure labels: phosphate, sugar, base) Many biological processes go from 5’ to 3’, e.g. DNA replication and transcription.

6 Base-pair: A = T, G = C
  Forward (+) strand:  5’-AGCGACTG-3’
  Backward (-) strand: 3’-TCGCTGAC-5’
One strand is said to be reverse-complementary to the other
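The reverse-complement relationship on this slide can be sketched in a few lines of Python. This is only an illustration: the function name `reverse_complement` and the `COMPLEMENT` table are names chosen here, not part of the lecture.

```python
# Watson-Crick pairing for DNA, as on the slide: A pairs with T, G pairs with C.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the opposite strand of `seq`, written 5' to 3'."""
    # Reverse first (the strands are antiparallel), then complement each base.
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


forward = "AGCGACTG"                   # 5'-AGCGACTG-3' from the slide
reverse = reverse_complement(forward)  # the (-) strand, written 5' to 3'
print(reverse)
print(reverse[::-1])                   # the same strand written 3' to 5'
```

Writing the result backwards gives the 3’-to-5’ form shown under the forward strand on the slide.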

7 DNA double helix

8 RNA Carries information from DNA to protein –Other functions have been found A string made from the alphabet {A, C, G, U} –e.g. ACAGAACGUAGUGCCGUGAGCG Each letter is called a base –A ribonucleotide

9 RNA strand 5’-AGUGACUG-3’ (figure labels: phosphate, sugar, base) Many biological processes go from 5’ to 3’, e.g. transcription.

10 RNA secondary structures RNAs are normally single-stranded Can form complex structures by self-base-pairing: A=U, C=G

11 Protein The actual “worker” for almost all processes in the cell A string built from 20 letters –E.g. MGDVEKGKKIFIMKCSQCHTVEKGGKH Each letter is called an amino acid

12 Protein zoom-in Composed of a chain of amino acids.
       R          (side chain)
       |
  H2N--C--COOH
       |
       H
  H2N: amino group   COOH: carboxyl group

13 Amino acid 20 amino acids, only differ at side chains –Each can be expressed by three letters –Or a single letter: A-Y, except B, J, O, U, X –Alanine = Ala = A –Arginine = Arg = R –Asparagine = Asn = N –Lysine = Lys = K

14 Amino acids => peptide
       R              R
       |              |
  H2N--C--COOH   H2N--C--COOH
       |              |
       H              H
  =>
       R          R
       |          |
  H2N--C--CO--NH--C--COOH
       |          |
       H          H
  The CO--NH link is the peptide bond

15 Protein Has orientation Usually recorded from N-terminal to C-terminal: H2N-R-R-R-R-R-…-COOH Peptide vs protein: basically the same thing Conventions –Peptide is shorter (< 50 aa), while protein is longer –Peptide refers to the sequence, while protein has 2D/3D structure

16 Protein structure Linear sequence of amino acids folds to form a complex 3-D structure. The structure of a protein is intimately connected to its function.

17 Genome and chromosome Genome: the complete DNA sequence of an organism –May contain one (in prokaryotes) or more (in eukaryotes) chromosomes Chromosome: a single large DNA molecule in an organism –May be circular or linear –Contains genes as well as “junk DNA” –Highly packed!

18 Formation of chromosome

19 50,000 times shorter than extended DNA

20 Gene Gene: unit of heredity in living organisms –A segment of DNA with information to make a protein

21 Some statistics
  Organism           Chromosomes   Bases         Genes
  Human              46            3 billion     20k-25k
  Dog                78            2.4 billion   ~20k
  Corn               20            2.5 billion   50-60k
  Yeast              16            20 million    ~7k
  E. coli            1             4 million     ~4k
  Marbled lungfish   ?             130 billion   ?

22 Human genome 46 chromosomes: 22 pairs + 2 sex chromosomes (X, Y) In each pair: 1 from mother, 1 from father Female: X + X Male: X + Y

23 Human genome Every cell contains the same genomic information –Except sperm and eggs –They contain only half of the genome Otherwise your children would have 92 chromosomes How does biology achieve that?

24 Cell division: meiosis A reproductive cell divides into four cells, each containing only half of the genome –Diploid => haploid Two haploid cells (sperm + egg) form a zygote –Which will then develop into a multi-cellular organism by mitosis

25 Cell division: mitosis A cell duplicates its genome and divides into two identical cells These cells build up different parts of your body

26 Central dogma of molecular biology DNA replication is critical in both mitosis and meiosis

27 DNA Replication The process of copying a double-stranded DNA molecule –Semi-conservative 5’-ACATGATAA-3’ 3’-TGTACTAT-5’ → 5’-ACATGATAA-3’ 3’-TGTACTATT-5’

28 Mutation: changes in DNA base-pairs Proofreading and error-correcting mechanisms exist to ensure extremely high fidelity

29 DNA synthesis Creating DNA synthetically in a laboratory Chemical synthesis –Chemical reactions –Arbitrary sequences –Limited maximum length Cloning: make copies based on a DNA template –Biological reactions –Requires template –Many copies of a long DNA in a short time

30 in vivo Cloning Connect a piece of DNA to bacterial DNA, which can then be replicated together with the host DNA

31 In vitro cloning Polymerase chain reaction (PCR) (figure labels: denature, primer (< 30 bases), dNTP, DNA polymerase)

32 Property   Chemical synthesis   In vivo cloning            In vitro cloning
   Reaction   Chemical             Biological                 Biological
   Template   No                   Yes                        Yes
   Speed      Fast                 Vary (rely on host cell)   Fast
   Length     Very short           Long                       Medium

33 Some terms Denaturation: a DNA double strand is separated into two single strands –By raising the temperature Renaturation: the process by which two denatured DNA strands re-form a double strand –By cooling down slowly Hybridization: two heterogeneous DNAs form a double strand –May have mismatches –The rationale behind many molecular biology techniques, including DNA microarrays
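As a rough illustration of "may have mismatches", the sketch below counts the positions at which two equal-length strands fail to pair when lined up antiparallel. The function name `count_mismatches` and the second target sequence are made up for this example; real hybridization also depends on temperature and strand length, which this toy model ignores.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def count_mismatches(strand_5to3: str, other_5to3: str) -> int:
    """Count non-pairing positions between two strands, both given 5' to 3'."""
    if len(strand_5to3) != len(other_5to3):
        raise ValueError("this sketch only handles strands of equal length")
    # Strands pair antiparallel, so read the second strand backwards.
    return sum(
        COMPLEMENT[a] != b
        for a, b in zip(strand_5to3, reversed(other_5to3))
    )


probe = "AGCGACTG"
perfect_target = "CAGTCGCT"   # exact reverse complement of the probe
variant_target = "CAGTAGCT"   # hypothetical target differing at one base

print(count_mismatches(probe, perfect_target))
print(count_mismatches(probe, variant_target))
```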

34 Central dogma of molecular biology

35 Transcription The process by which a DNA sequence is copied to produce a complementary RNA –Called messenger RNA (mRNA) if the RNA carries instructions on how to make a protein –Called non-coding RNA if the RNA does not carry instructions on how to make a protein –Only consider mRNA for now Similar to replication, but –Only one strand is copied

36 Transcription
  Coding strand (where genetic information is stored): 5’-ACGTAGACGTATAGAGCCTAG-3’
  Template strand (for making mRNA):                   3’-TGCATCTGCATATCTCGGATC-5’
  mRNA:                                                5’-ACGUAGACGUAUAGAGCCUAG-3’
Coding strand and mRNA have the same sequence, except that T’s in DNA are replaced by U’s in mRNA.
DNA-RNA pair: A=U, C=G, T=A, G=C
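The two views of transcription on this slide (pairing along the template strand, or copying the coding strand with T replaced by U) can be checked against each other. The sketch below uses the slide's sequences; the function names are chosen here for illustration.

```python
# DNA-to-RNA pairing from the slide: A=U, C=G, T=A, G=C.
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe_template(template_3to5: str) -> str:
    """Build the mRNA (5' to 3') by pairing along a template written 3' to 5'."""
    return "".join(DNA_TO_RNA[base] for base in template_3to5.upper())


def transcribe_coding(coding_5to3: str) -> str:
    """Shortcut: the mRNA equals the coding strand with T replaced by U."""
    return coding_5to3.upper().replace("T", "U")


coding = "ACGTAGACGTATAGAGCCTAG"     # 5' to 3'
template = "TGCATCTGCATATCTCGGATC"   # 3' to 5'

mrna = transcribe_template(template)
print(mrna)
print(mrna == transcribe_coding(coding))
```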

37 The genetic code There are four bases in DNA (A, C, G, T), and four in RNA (A, C, G, U), but 20 amino acids in protein How are amino acids encoded in mRNA? –4^1 = 4 –4^2 = 16 –4^3 = 64 The actual genetic code used by the cell is a triplet. –Each triplet is called a codon –Redundancy –Universal
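The counting argument on this slide can be written out as a small calculation: find the shortest word length over a 4-letter alphabet that yields at least 20 distinct words. The helper name `min_codon_length` is an illustrative choice.

```python
def min_codon_length(alphabet_size: int, n_symbols: int) -> int:
    """Smallest word length giving at least `n_symbols` distinct words."""
    length = 1
    while alphabet_size ** length < n_symbols:
        length += 1
    return length


for length in (1, 2, 3):
    print(length, 4 ** length)   # number of distinct words of each length

print(min_codon_length(4, 20))
```

With 64 triplets available for 20 amino acids, several codons must map to the same amino acid, which is the redundancy the slide mentions.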

38 The Genetic Code (figure: codon table; surviving axis label: third letter)

39 Translation The sequence of codons is translated to a sequence of amino acids Gene: -GCT TGT TTA CGA ATT- mRNA: -GCU UGU UUA CGA AUU- Peptide: -Ala-Cys-Leu-Arg-Ile- Start codon: AUG –Also codes for Met Stop codons: UGA, UAA, UAG
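A minimal translation sketch follows, assuming the standard genetic code. The table is built from the conventional 64-letter string in U, C, A, G order, with "*" marking the three stop codons UAA, UAG and UGA; the names `CODON_TABLE`, `THREE_LETTER` and `translate` are chosen here for illustration.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ... GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}

THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}


def translate(mrna: str) -> str:
    """Translate an mRNA read from its first base, stopping at a stop codon."""
    peptide = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":      # stop codon: end of the peptide
            break
        peptide.append(amino_acid)
    return "".join(peptide)


peptide = translate("GCUUGUUUACGAAUU")   # mRNA from the slide
print(peptide)
print("-".join(THREE_LETTER[aa] for aa in peptide))
```

The second line of output uses the three-letter names, matching the peptide written on the slide.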

40 Translation Transfer RNA (tRNA) – a different type of RNA. –Floats freely in the cytoplasm. –Every amino acid has its own type of tRNA that binds to it alone. Anticodon-codon binding is crucial.

41 tRNA


43 More complexity (figure labels: gene, promoter, transcription starting site, RNA polymerase, transcription factor) RNA polymerase binds to a certain location on the promoter to initiate transcription Transcription factors bind to specific sequences on the promoter to regulate transcription –Recruit RNA polymerase: induce –Block RNA polymerase: repress –Multiple transcription factors may coordinate

44 More complexity Pre-mRNA needs to be “edited” to form mature mRNA (figure labels: gene, promoter, transcription starting site, transcription, pre-mRNA, 5’ UTR, exon, intron, 3’ UTR, splice, mature mRNA, start codon, stop codon, open reading frame (ORF))

45 DNA sequencing: Basic idea PCR primer extension
  5’-TTACAGGTCCATACTA →
  3’-AATGTCCAGGTATGATACATAGG-5’
We need to supply A, C, G, T for the synthesis to continue Besides A, C, G, T, we add some A*, C*, G*, and T* –Very similar to A, C, G, T in all aspects, except that –The extension will stop if one is used
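One way to see why stopping bases are useful is to enumerate every product that could result when extension halts at a starred base, then read off the last base of each product in order of length. The slide does not spell out this read-out step, so treat it as an assumption of the sketch; the function names are illustrative, and the primer and template are the ones on the slide.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def full_extension(primer: str, template_3to5: str) -> str:
    """Bases added after the primer if extension runs to the template's end."""
    # The primer pairs with the start of the template, so synthesis
    # continues from the first template position the primer does not cover.
    unpaired = template_3to5[len(primer):]
    return "".join(COMPLEMENT[base] for base in unpaired)


def terminated_products(primer: str, template_3to5: str) -> list[str]:
    """One product per stopping position (extension halts at a starred base)."""
    extension = full_extension(primer, template_3to5)
    return [primer + extension[:k] for k in range(1, len(extension) + 1)]


def read_sequence(products: list[str], primer_len: int) -> str:
    """Order products by length and read the final base of each."""
    ordered = sorted(products, key=len)
    return "".join(p[-1] for p in ordered if len(p) > primer_len)


primer = "TTACAGGTCCATACTA"            # 5' to 3'
template = "AATGTCCAGGTATGATACATAGG"   # 3' to 5'

products = terminated_products(primer, template)
print(len(products))
print(read_sequence(products, len(primer)))
```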

46 DNA sequencing, cont


