Introduction to Molecular Biology for Computer Scientists


1 Introduction to Molecular Biology for Computer Scientists
Dr. Suzanne Gollery – Sierra Nevada College Martin Gollery – Active Motif

2 Who are we?
Suzanne – Assistant Professor, Sierra Nevada College Formerly at Baylor College of Medicine, UC Berkeley
Marty – Senior Scientist, ActiveMotif, Inc. Formerly at University of Nevada, Reno, TimeLogic etc

3 Why Do Bioinformaticists need it?
Avoiding mistakes Understanding the purpose Appreciating the difficulties We will look at some of the programs applicable to these concepts

4 Introduction to Molecular Biology for Computer Scientists
Protein structure and function Protein structure Protein function Nucleic acid structure and function The central dogma Nucleic acid structure Genetic code Control of gene expression Mutation

5 Introduction to Molecular Biology for Computer Scientists
The centrality of evolution by natural selection in biology Universal genetic code Types of mutations and their effects Genomes Natural selection acts on random variation to produce evolution

6 Introduction to Molecular Biology for Computer Scientists
Protein structure and function Protein structure Protein function

7 Protein structure – Amino acids
An amino acid has four functional groups attached to a central carbon atom An amino group A carboxyl group A hydrogen atom A variable side chain (R group) Proteins use L isomers of amino acids

8 Protein structure – 20 amino acids are used in proteins
Polar amino acid side chains have partial negative or positive electrostatic charges O and N atoms “hog” electrons and have partial negative charges Atoms attached to N and O have partial positive charges

9 Protein structure – 20 amino acids are used in proteins
Non-polar amino acid side chains have no electrostatic charge

10 Protein structure – 20 amino acids are used in proteins
Charged amino acids are acidic or basic, and donate or accept a H+ in cells Some charged amino acids have a positive electrostatic charge Other charged amino acids have a negative electrostatic charge

11 Protein structure – 20 amino acids are used in proteins
Aromatic amino acids have a bulky carbon ring structure in their side chains

12 Protein structure – Analysis
X-ray Crystallography – interference patterns Nuclear Magnetic Resonance (NMR) ~31,000 structures in Protein Data Bank (PDB) PDB is organized by other databases Tends to emphasize certain types of proteins

13 Protein structure - Polypeptides
Amino acids are joined to produce polypeptides A water molecule is removed as the amino group of an amino acid reacts with the carboxyl group at the end of a polypeptide The peptide bond (yellow) is a planar structure

14 Protein structure – Protein folding
Chemical interactions among amino acids determine the final 3D shape (conformation) of a protein

15 Protein structure – Protein folding

16 Protein structure
Primary structure: the linear order of amino acids in a polypeptide Secondary structure: regions of α-helix or β-sheet Tertiary structure: globular folded polypeptide Quaternary structure: multiple folded polypeptides form a complete protein – hemoglobin in this figure

17 Protein Structure- Primary
Sequence yields structure/function Homology searching paradigm- similar primary structures yield similar functions Needleman-Wunsch, Smith-Waterman, FASTA, BLAST Similarity scoring- amino acids with similar properties can replace each other without breaking structure
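
The similarity-scoring idea above can be sketched with a small global aligner in the style of Needleman-Wunsch. This is a minimal illustration, not the scoring used by FASTA or BLAST: the property groups in GROUPS and the score values (match 2, similar 1, mismatch -1, gap -2) are assumptions made up for this example, standing in for a real substitution matrix.

```python
# Hypothetical property groups, for illustration only (not a published matrix).
GROUPS = [set("AVLIMC"), set("FWY"), set("STNQ"), set("KRH"), set("DE"), set("GP")]


def score(a, b, match=2, similar=1, mismatch=-1):
    """Score a residue pair: identical > same property group > unrelated."""
    if a == b:
        return match
    if any(a in g and b in g for g in GROUPS):
        return similar
    return mismatch


def needleman_wunsch(s, t, gap=-2):
    """Global alignment with a linear gap penalty.

    Returns (best score, aligned s, aligned t).
    """
    n, m = len(s), len(t)
    # F[i][j] = best score for aligning s[:i] with t[:j]
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap
    for j in range(1, m + 1):
        F[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(
                F[i - 1][j - 1] + score(s[i - 1], t[j - 1]),  # align two residues
                F[i - 1][j] + gap,                            # gap in t
                F[i][j - 1] + gap,                            # gap in s
            )
    # Trace back from the bottom-right corner to recover one optimal alignment.
    a, b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + score(s[i - 1], t[j - 1]):
            a.append(s[i - 1]); b.append(t[j - 1]); i -= 1; j -= 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            a.append(s[i - 1]); b.append("-"); i -= 1
        else:
            a.append("-"); b.append(t[j - 1]); j -= 1
    return F[n][m], "".join(reversed(a)), "".join(reversed(b))


result = needleman_wunsch("KDGII", "KDGVI")
print(result)
```

Because I and V sit in the same assumed group, the I/V column is scored as a conservative replacement rather than a full mismatch, which is the point the slide makes about similar amino acids.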

18 Secondary structure
Prediction of secondary structure from primary sequence is tractable Programs – Coils, PHD, PREDATOR, JPRED Secondary structure may be used to improve alignments

19 Protein Folding- Prediction
Very Computationally Intensive A Short Protein (100 amino acids) would take ~20 days straight on a Petaflop computer Force Field approximation programs CHARMM, AMBER Accelerated versions based on Field Programmable Gate Array (FPGA) technology

20 Protein structural Motifs

21 Structural Motifs
Repeated or combined motifs form functional domains Domains predict protein function NAD(P)-binding domain of proteins that bind to NAD, an electron carrier β-sandwich domain of cell surface recognition proteins (Ig, MHC, CD4)

22 Functional motifs
eMATRIX/eMOTIF MEME/MetaMEME FingerPrintScan PHI-BLAST

23 Hidden Markov Models
Represent domains, motifs or proteins Major programs include HMMer (hmmer.wustl.edu), SAM, Wise tools, Meta-MEME (metameme.sdsc.edu/), PSI-BLAST, DeCypherHMM

24 What are HMMs, anyway?
Statistical description of a protein family's consensus sequence Conserved regions receive highest scores Can be seen as a Finite State Machine

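25 Hidden Markov Models
Aligned sequences:
yciH     KDGII
ZyciH    KDGVI
VCA0570  KDGDI
HI1225   KNGII
sll      KEDCV
Residue frequency at each alignment position:
Position   C    D    E    G    I    K    N    V
1          -    -    -    -    -    1.0  -    -
2          -    0.6  0.2  -    -    -    0.2  -
3          -    0.2  -    0.8  -    -    -    -
4          0.2  0.2  -    -    0.4  -    -    0.2
5          -    -    -    -    0.8  -    -    0.2
Contrast with regular-expression (RE) type motif, K[DEN][DG][CDIV][IV]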
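
The contrast between a frequency-based description and a regular-expression motif can be sketched from the five aligned sequences on this slide. This is only a position-specific frequency profile, which is an assumption-laden simplification: a real profile HMM (as built by the programs named earlier) also has insert and delete states, transition probabilities, and pseudocounts. The function names are hypothetical.

```python
import re
from collections import Counter

# The five aligned sequences shown on the slide.
alignment = ["KDGII", "KDGVI", "KDGDI", "KNGII", "KEDCV"]


def column_frequencies(seqs):
    """Return one {residue: frequency} dict per alignment column."""
    n = len(seqs)
    return [{aa: count / n for aa, count in Counter(col).items()}
            for col in zip(*seqs)]


def profile_score(profile, seq):
    """Product of per-position frequencies; 0.0 if any residue was never seen."""
    p = 1.0
    for freqs, aa in zip(profile, seq):
        p *= freqs.get(aa, 0.0)
    return p


profile = column_frequencies(alignment)
motif = re.compile(r"K[DEN][DG][CDIV][IV]")

for candidate in ["KDGII", "KEDCV", "KAGII"]:
    print(candidate, bool(motif.fullmatch(candidate)), profile_score(profile, candidate))
```

The regular expression only answers yes or no, so KDGII and KEDCV look equally good to it. The profile ranks KDGII well above KEDCV because it is built from the most common residue at every position, which is the sense in which conserved regions receive the highest scores.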

26 HMM databases
Pfam TIGRfam Superfamily SMART COG KinFam PirSF Panther KOG …etc

27 Protein structure – Modifications after protein synthesis
Prosthetic groups associate (Heme of cytochrome c) Polypeptides are trimmed or cut Sugars are attached Other chemical groups are attached Chaperone proteins assist in polypeptide folding

28 Protein function – Binding to ligand
Ligand (antigen peptide) fits into a cleft on a protein (MHC) like two puzzle pieces fit together

29 Protein function – Binding to ligand
SitesBase –information on known ligand binding sites from the PDB LigBase adds related sequences and structures

30 Protein function – Chymotrypsin binding to ligand: Complementarity of electrostatic charge and 3D shape

31 Protein function – proteins bind to other proteins
Myosin head binds to actin during muscle contraction Protein shapes are complementary like puzzle pieces Binding is reversible Goodness of fit (shape, charge, hydrophobicity) determines affinity of protein/ligand binding

32 Protein function – changes in shape are essential to protein function
Proteins are dynamic machines Two or a few protein conformations may be of similar stability Ligand binding can act as a switch to change protein conformation Induced fit: binding of glucose (red) to hexokinase changes enzyme conformation

33 Protein function – changes in shape are essential to protein function
Hemoglobin switches between a T (taut) deoxygenated and an R (relaxed) oxygenated state

34 Protein function – changes in shape are essential to protein function
Binding of lactose to the lactose transport protein changes the shape of the protein. Lactose (red) binds to the protein on one side of a cell membrane and is released on the other side Movement across the membrane is reversible

35 Protein function – changes in shape are essential to protein function
The Na+ -K+ pump switches between two conformations when a phosphate group is added or removed Attaching phosphate to the Na+ -K+ pump switches the protein’s shape, moving Na+ outside the cell Removing phosphate switches the protein’s shape again, moving K+ into the cell

36 Protein function – changes in shape are essential to protein function
Myosin heads cycle between multiple conformations during muscle contraction Binding and release of ATP, ADP, and phosphate (Pi) trigger changes in myosin head conformation Myosin heads pull on actin to make muscle fibers shorter

37 Protein function – changes in shape are essential to protein function
Intrinsically Disordered Proteins have roles in signalling, etc. Some take shape only when interacting Tend to form hubs in interaction networks Predict with PONDR, Spritz, Wiggle, FoldUnfold DisProt database of intrinsically disordered proteins

38 Protein function - Phosphorylation
Phosphorylation changes a protein’s shape Phosphorylation may turn a protein on or off Kinases: enzymes that attach phosphates to proteins Phosphatases: enzymes that remove phosphates Other charged molecules and ions (cAMP, Ca2+…) also bind to proteins to switch them on or off

39 Protein structure – Modifications after protein synthesis
Post-Translational Modifications (PTM) Phosphorylation takes place on S, T or Y, but only in certain situations NetPhosK uses Artificial Neural Networks KinasePhos uses HMMs to predict Phosphorylation sites

40 Protein Interaction Networks

41 Introduction to Molecular Biology for Computer Scientists
Nucleic acid structure and function The central dogma Nucleic acid structure Genetic code Control of gene expression Mutation

42 The central dogma of molecular biology
DNA contains instructions for making RNA and protein DNA is transcribed (copied) to make messenger RNA (mRNA) mRNA is translated (instructions read) to make proteins

43 Central dogma - Transcription
Transcription occurs in the nucleus mRNAs are modified before transport to the cytoplasm Other RNAs are also transcribed: tRNAs: “read” genetic code in mRNA rRNAs: backbone of ribosomes small RNAs: enzymes and regulators of gene function

44 Central dogma - Translation
Translation occurs in the cytoplasm on ribosomes tRNAs match the correct amino acids to codons on the mRNA Ribosomal enzymes join amino acids to the growing polypeptide The emerging polypeptide folds

45 Nucleic acid structure - Nucleotides

46 Nucleic acid structure - Polynucleotides
The 5’ phosphate of one nucleotide is joined to the 3’ hydroxyl group of the preceding nucleotide The beginning of a nucleic acid has a free 5’ phosphate group The end of a nucleic acid has a free 3’ hydroxyl group

47 Nucleic acid structure - DNA
Double stranded DNA forms a helix DNA strands are joined by hydrogen bonds between complementary bases A always pairs with T; two hydrogen bonds G always pairs with C; three hydrogen bonds
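
The base-pairing rules above can be sketched in a few lines. This is a minimal illustration assuming an uppercase DNA string with only A, C, G and T; the function names are hypothetical.

```python
# Watson-Crick pairing: A<->T, G<->C.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Hydrogen bonds contributed by the pair each base takes part in:
# A-T pairs have two, G-C pairs have three.
H_BONDS = {"A": 2, "T": 2, "G": 3, "C": 3}


def reverse_complement(strand):
    """Return the partner strand, written 5' to 3'."""
    return strand.translate(COMPLEMENT)[::-1]


def hydrogen_bonds(strand):
    """Count hydrogen bonds holding this strand to its complement."""
    return sum(H_BONDS[base] for base in strand)


print(reverse_complement("ATGC"))  # GCAT
print(hydrogen_bonds("ATGC"))      # 10
```

The reversal matters because the two strands run antiparallel: reading the partner strand 5' to 3' means reading the complement backwards.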

48 Nucleic acid structure - DNA
Each DNA strand serves as a template for replication or repair of the other strand, and for RNA synthesis

49 Nucleic acid structure - RNA
RNA is single stranded RNAs fold to form regions of internal double helix with complementary base pairs Many functional RNAs have globular shapes (green: sugar-phosphate backbone; gray: paired bases)

50 Genetic code – Linear code
One DNA strand serves as a template for mRNA synthesis The linear order of nucleotides in the mRNA corresponds to the amino acid sequence of the polypeptide Triplet codons specify insertion of amino acids The DNA coding strand is complementary to the template strand, so its sequence is comparable to the mRNA (with T instead of U)

51 Genetic code – triplet codons
tRNA “reads” genetic code: anticodon is complementary to the mRNA codon Redundant genetic code: some amino acids are specified by multiple codons First codon is AUG → Met Three stop codons specify termination of polypeptide synthesis
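
Reading triplet codons can be sketched as a table lookup. The example below builds the standard genetic code from the usual compact 64-letter listing (bases in the order T, C, A, G, with * marking stop codons), transcribes a coding strand to mRNA by swapping T for U, and translates from the first AUG to the first stop. The function names and the example sequence are assumptions made for illustration, and real translation initiation is more involved than finding the first AUG.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, one letter per codon in TCAG order; '*' marks a stop.
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")

# Map RNA codons (U instead of T) to one-letter amino acid codes.
CODON_TABLE = {
    "".join(codon).replace("T", "U"): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(coding_strand):
    """mRNA has the coding-strand sequence with U in place of T."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna):
    """Translate from the first AUG until a stop codon or the end of the mRNA."""
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


mrna = transcribe("ATGAAAGATGGTATTATTTAA")
print(mrna, translate(mrna))
```

The redundancy of the code is visible in the table itself: 61 codons cover 20 amino acids, and three codons are stops.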

52 Control of Gene Expression – Gene Structure
RNA polymerase binds to DNA at the promoter to initiate transcription Other proteins bind to sequences near the exons to regulate transcription The information in genes (exons) is interrupted by non-coding sequences (introns) that are removed from RNA by splicing 5’ cap and poly (A) tail are added for mRNA stability
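
Splicing can be pictured as keeping the exon intervals of a transcript and discarding everything between them. The sketch below assumes the exon coordinates are already known (half-open, 0-based intervals); the toy transcript, its coordinates, and the function name are made up for illustration, and finding splice sites in real sequence is a much harder prediction problem.

```python
def splice(pre_mrna, exons):
    """Join the exon intervals (start, end), in order, dropping the introns."""
    return "".join(pre_mrna[start:end] for start, end in exons)


# Toy transcript: exon 1, one intron, exon 2.
exon1 = "AUGGCC"
intron = "GUAAGUUUUCAG"   # begins with GU and ends with AG, as most introns do
exon2 = "AAAUAA"
pre_mrna = exon1 + intron + exon2

exons = [(0, 6), (18, 24)]
mature = splice(pre_mrna, exons)
print(mature)  # AUGGCCAAAUAA
```

Passing a different list of intervals to the same function gives a different mature mRNA from the same transcript, which is one way to think about alternative splicing.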

53 Control of gene expression – DNA binding proteins can activate or inhibit transcription
Multiple proteins must bind to DNA to initiate transcription Some proteins bind near the promoter, while others bind farther away at enhancers DNA bends so that enhancer-binding proteins help RNA polymerase assemble on the promoter Some proteins bind to DNA or to activator proteins to block transcription initiation

54 Control of gene expression – how proteins bind to specific DNA sequences
DNA binding proteins insert an α-helix into the major groove of DNA Helix-turn-helix domain

55 Control of gene expression – how proteins bind to specific DNA sequences
The protein stalls on the DNA where amino acids form the maximum number of hydrogen bonds with nucleotide bases in the major groove

56 Control of gene expression – DNA binding protein domains
Homeodomain Zinc finger Leucine zipper

57 Control of gene expression – other points of control
Whether or not a gene is “expressed” can be controlled at any step that affects protein concentration and function Alternative splicing (post-transcriptional processing) produces multiple proteins from one gene Control of translation (siRNAs block translation) Control of protein activity (phosphorylation switches proteins on or off) mRNA or protein longevity (how quickly it is degraded)

58 Control of gene expression – Alternative splicing

59 Analysis of gene expression
Microarrays, GeneChips Three classes of software- Reading the images Clustering the data, building associations GeneSpring, GeneSifter, Bioconductor Warehousing the data GEO, SMD, YMD, others Meta-analysis is difficult due to variability

60 Mutations
Although DNA is replicated and repaired accurately, rare mistakes are made, which alter the nucleotide sequence Exposure to some chemicals and radiation damages DNA, increasing the likelihood of mutation Although rare, a few nucleotide sequence changes occur with each generation Mutations introduce genetic variability in a population of individuals

61 Introduction to Molecular Biology for Computer Scientists
The centrality of evolution by natural selection in biology Universal genetic code Types of mutations and their effects Genomes Natural selection acts on random variation to produce evolution

62 Universal genetic code
All living organisms use the same genetic code: CCC encodes proline in all cells All organisms are descended from a common ancestor – all life on earth evolved from a common point of origin The impressive variation in living organisms arose through random changes in nucleotide sequence (mutation) acted upon by natural selection over billions of years

63 Mutation – nucleotide substitutions
Mutations are random changes in nucleotide sequence Mutations in non-coding sequences and codon third position are often silent Some nucleotide substitutions change the amino acid Nonsense mutations introduce stop codons and truncate a polypeptide
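
The three outcomes of a substitution inside a codon can be sketched by translating the codon before and after the change. The example uses DNA codons and the standard genetic code in its usual compact listing; the function name, the extra "stop-loss" label for a stop codon that becomes an amino acid, and the example codons are assumptions for illustration.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, one letter per codon in TCAG order; '*' marks a stop.
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def classify_substitution(codon, position, new_base):
    """Classify a single-base change at position 0, 1 or 2 of a DNA codon."""
    mutated = codon[:position] + new_base + codon[position + 1:]
    before, after = CODON_TABLE[codon], CODON_TABLE[mutated]
    if before == after:
        return "silent"
    if after == "*":
        return "nonsense"
    if before == "*":
        return "stop-loss"
    return "missense"


print(classify_substitution("CCC", 2, "A"))  # silent: CCA still encodes proline
print(classify_substitution("GAG", 1, "T"))  # missense: glutamate to valine
print(classify_substitution("TGG", 2, "A"))  # nonsense: TGA is a stop codon
```

Trying all three alternative bases at the third position of a few codons shows why third-position changes are so often silent: many amino acids are encoded by four codons that differ only there.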

64 Mutations - Frameshifts
Insertion or deletion of a nucleotide shifts the reading frame and drastically alters amino acid sequence
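
The effect of a frameshift can be seen by translating a short coding sequence before and after deleting one nucleotide. This is a toy sketch: the sequence is invented, translation simply starts at the first base, and the function name is hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, one letter per codon in TCAG order; '*' marks a stop.
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate_frame(dna):
    """Translate every complete codon from the first base; '*' marks a stop."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


original = "ATGAAAGATGGTATTATT"
# Delete the single nucleotide at index 4; every later codon is now read
# one base out of step.
shifted = original[:4] + original[5:]

print(translate_frame(original))  # MKDGII
print(translate_frame(shifted))   # MKMVL
```

Everything downstream of the deletion changes, whereas a single substitution would alter at most one amino acid.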

65 Mutations - Frameshifts
Frameshift tolerant matching programs add additional states corresponding to transitions to other reading frames FrameSearch, Wise2, BLAST with OOF option

66 Mutations – Chromosomal mutations
Large duplications generate multiple copies of genes Large deletions remove genes from the genome Duplications, deletions, inversions, and translocations are preserved in future generations, so can be used to trace evolutionary history

67 Mutations – effects on organisms
Random mutation produces genetic variability that is acted upon by natural selection Most mutations are deleterious: mutations occur randomly, and are more likely to disrupt protein function than alter it in a positive way Deleterious mutations are eliminated by natural selection Rare mutations that introduce altered, even beneficial functions are positively selected

68 Genomes – Eukaryotic cells
Animals, Plants, Fungi, and some other organisms (Protists) have eukaryotic cells Eukaryotic organisms have existed on earth for more than a billion years Eukaryotic genomes are in linear pieces of DNA called chromosomes Eukaryotic genes usually have introns Eukaryotic genes are separated by lots of spacer DNA that does not encode proteins Many repetitive sequences are present: Tandem repeats of short sequences, like GAC Transposons (for example, LINEs and SINEs)

69 Genomes – Prokaryotic cells
Bacteria have prokaryotic cells Prokaryotic organisms have existed on earth for billions of years Most prokaryotic genomes consist of one circular DNA molecule Most prokaryotic genes lack introns Prokaryotic genomes have little spacer DNA; most DNA encodes known RNAs or proteins There is much less repetitive DNA than for eukaryotic genomes Plasmids, tiny circular DNA molecules separate from the main genome, are used in recombinant DNA technology to introduce genes into bacteria

70 Genomes – evolution through chromosomal mutation
Gene duplications result in multigene families After duplication, one gene can provide the original function while the other may evolve (through mutation and selection) a different function The globin gene family evolved through duplication, mutation, and selection

71 Genomes – evolution through chromosomal mutation
α- and β-globin genes vary in amino acid sequence, yet share the same conformation Sequence differences are fairly conservative

72 Multiple Sequence Alignment

73 Multiple Sequence Alignment

74 Genomes – evolution through chromosomal mutation
As species diverge, chromosomal mutations shuffle genome content The more recently two species diverged, the more similar their genome organization Both chromosomal and single nucleotide mutations can be used to trace the evolutionary history of species Genetic information in human chromosomes (blue) compared to dog chromosomes

75 Credits
Many figures were borrowed from three sources:
Nelson and Cox, Lehninger Principles of Biochemistry, 4e, WH Freeman and Co., ISBN:
Raven, Johnson, Losos, Mason, and Singer, Biology, 8e, McGraw-Hill, ISBN:
Klug, Cummings, and Spencer, Essentials of Genetics, 6e, Pearson/Prentice Hall, ISBN:
These texts and the accompanying on-line materials are excellent resources for learning more molecular biology

76 Thank You!
Marty is at gollery@activemotif.com
Suzanne is at

