Introduction to Molecular Biology for Computer Scientists Dr. Suzanne Gollery – Sierra Nevada College Martin Gollery – Active Motif.


1 Introduction to Molecular Biology for Computer Scientists Dr. Suzanne Gollery – Sierra Nevada College Martin Gollery – Active Motif

2 Who are we? Suzanne – Assistant Professor, Sierra Nevada College Formerly at Baylor College of Medicine, UC Berkeley Marty- Senior Scientist, ActiveMotif, Inc. Formerly at University of Nevada, Reno TimeLogic etc

3 Why Do Bioinformaticists need it? Avoiding mistakes Understanding the purpose Appreciating the Difficulties We will look at some of the programs applicable to these concepts

4 Introduction to Molecular Biology for Computer Scientists I. Protein structure and function A. Protein structure B. Protein function II. Nucleic acid structure and function A. The central dogma B. Nucleic acid structure C. Genetic code D. Control of gene expression E. Mutation

5 Introduction to Molecular Biology for Computer Scientists III. The centrality of evolution by natural selection in biology A. Universal genetic code B. Types of mutations and their effects C. Genomes D. Natural selection acts on random variation to produce evolution

6 Introduction to Molecular Biology for Computer Scientists I. Protein structure and function A. Protein structure B. Protein function

7 Protein structure – Amino acids An amino acid has four functional groups attached to a central carbon atom An amino group A carboxyl group A hydrogen atom A variable side chain (R group) Proteins use L isomers of amino acids

8 Protein structure – 20 amino acids are used in proteins Polar amino acid side chains have partial negative or positive electrostatic charges O and N atoms hog electrons and have partial negative charges Atoms attached to N and O have partial positive charges

9 Protein structure – 20 amino acids are used in proteins Non-polar amino acid side chains have no electrostatic charge

10 Protein structure – 20 amino acids are used in proteins Charged amino acids are acidic or basic, and donate or accept an H+ in cells Some charged amino acids have a positive electrostatic charge Other charged amino acids have a negative electrostatic charge

11 Protein structure – 20 amino acids are used in proteins Aromatic amino acids have a bulky carbon ring structure in their side chains

12 Protein structure – Analysis X-ray Crystallography- interference patterns Nuclear Magnetic Resonance (NMR) ~31,000 structures in Protein Databank (PDB) PDB is organized by other databases Tends to emphasize certain types of proteins

13 Protein structure - Polypeptides Amino acids are joined to produce polypeptides A water molecule is removed as the amino group of an amino acid reacts with the carboxyl group at the end of a polypeptide The peptide bond (yellow) is a planar structure

14 Protein structure – Protein folding Chemical interactions among amino acids determine the final 3D shape (conformation) of a protein

15 Protein structure – Protein folding

16 Protein structure Primary structure: the linear order of amino acids in a polypeptide Secondary structure: regions of α-helix or β-sheet Tertiary structure: globular folded polypeptide Quaternary structure: multiple folded polypeptides form a complete protein – hemoglobin in this figure

17 Protein Structure- Primary Sequence yields structure/function Homology searching paradigm- similar primary structures yield similar functions Needleman-Wunsch, Smith-Waterman, FASTA, BLAST Similarity scoring- amino acids with similar properties can replace each other without breaking structure
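
The similarity-scoring idea on this slide can be sketched with a small Needleman-Wunsch global alignment scorer. This is a minimal illustration, not the scoring used by FASTA or BLAST: the residue groupings in `SIMILAR_GROUPS`, the scores (2, 1, -1) and the gap penalty are assumptions chosen for readability, where real tools use substitution matrices such as BLOSUM or PAM.

```python
# Toy residue groupings (an assumption for illustration, not a published matrix).
SIMILAR_GROUPS = ["AVLIM", "FWY", "ST", "NQ", "DE", "KRH"]


def toy_score(x, y):
    """Score a residue pair: identical > same property group > unrelated."""
    if x == y:
        return 2
    if any(x in group and y in group for group in SIMILAR_GROUPS):
        return 1
    return -1


def needleman_wunsch_score(a, b, score=toy_score, gap=-2):
    """Return the optimal global alignment score of sequences a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    table = [[0] * cols for _ in range(rows)]
    # Aligning a prefix against nothing costs one gap per residue.
    for i in range(1, rows):
        table[i][0] = i * gap
    for j in range(1, cols):
        table[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            table[i][j] = max(
                table[i - 1][j - 1] + score(a[i - 1], b[j - 1]),  # pair residues
                table[i - 1][j] + gap,                            # gap in b
                table[i][j - 1] + gap,                            # gap in a
            )
    return table[-1][-1]
```

Under these assumed scores, replacing I with V (both in the same toy group) costs only one point relative to an exact match, which is the sense in which similar residues "can replace each other" without being treated as a full mismatch.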

18 Secondary structure Prediction of Secondary structure from primary sequence is tractable Programs- Coils, PHD, predator, JPRED Secondary Structure may be used to improve alignments

19 Protein Folding- Prediction Very Computationally Intensive A Short Protein (100 amino acids) would take ~20 days straight on a Petaflop computer Force Field approximation programs CHARMM, AMBER Accelerated versions based on Field Programmable Gate Array (FPGA) technology

20 Protein structural Motifs

21 Structural Motifs Repeated or combined motifs form functional domains Domains predict protein function NAD(P)-binding domain of proteins that bind to NAD, an electron carrier β-sandwich domain of cell surface recognition proteins (Ig, MHC, CD4)

22 Functional motifs eMATRIX/eMOTIF MEME/MetaMEME FingerPrintScan PHI-BLAST

23 Hidden Markov Models Represent Domains, Motifs or proteins Major programs include HMMer (hmmer.wustl.edu) SAM (www.cse.ucsc.edu/research/compbio/sam) Wise tools (www.ebi.ac.uk/Wise2/) Meta-MEME (metameme.sdsc.edu/) PSI-BLAST (www.ncbi.nlm.nih.gov/blast) DeCypherHMM (www.timelogic.com)

24 What are HMMs, anyway? Statistical description of a protein family's consensus sequence Conserved regions receive highest scores Can be seen as a Finite State Machine

25 Hidden Markov Models
yciH     KDGII
ZyciH    KDGVI
VCA0570  KDGDI
HI1225   KNGII
sll0546  KEDCV
CDEGIKNV
Contrast with RE (regular expression) type motif, K[DEN][DG][CDIV][IV]
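
The contrast drawn on this slide can be sketched in code. The regular expression treats every allowed residue at a position as equally good, whereas a position-specific profile weights residues by how often they occur in the family. The profile below is only a per-column frequency table, a simplification that omits the insert and delete states of a real profile HMM such as those built by HMMer; the function names and the pseudo-probability of 0.01 are assumptions for illustration.

```python
import re
from collections import Counter

# The five aligned sequences from the slide.
ALIGNMENT = {
    "yciH": "KDGII",
    "ZyciH": "KDGVI",
    "VCA0570": "KDGDI",
    "HI1225": "KNGII",
    "sll0546": "KEDCV",
}

# The regular-expression motif from the slide.
MOTIF = re.compile(r"K[DEN][DG][CDIV][IV]")


def column_frequencies(seqs):
    """Return one {residue: frequency} dict per alignment column."""
    n = len(seqs)
    return [
        {res: count / n for res, count in Counter(column).items()}
        for column in zip(*seqs)
    ]


def profile_score(seq, freqs, pseudo=0.01):
    """Multiply per-column frequencies; unseen residues get a small pseudo value."""
    score = 1.0
    for res, column in zip(seq, freqs):
        score *= column.get(res, pseudo)
    return score


FREQS = column_frequencies(list(ALIGNMENT.values()))
```

Both `KDGII` and `KEDCV` match the regular expression, but `profile_score` ranks `KDGII` far higher because its residues are the common ones in each column. The expression also accepts combinations such as `KNDCV` that never occur in the alignment, which is one reason profile methods tend to discriminate better.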

26 HMM databases Pfam TIGRfam Superfamily SMART COG KinFam PirSF Panther KOG …etc

27 Protein structure – Modifications after protein synthesis Prosthetic groups associate (Heme of cytochrome c) Polypeptides are trimmed or cut Sugars are attached Other chemical groups are attached Chaperone proteins assist in polypeptide folding

28 Protein function – Binding to ligand Ligand (antigen peptide) fits into a cleft on a protein (MHC) like two puzzle pieces fit together

29 Protein function – Binding to ligand SitesBase –information on known ligand binding sites from the PDB LigBase adds related sequences and structures

30 Protein function – Chymotrypsin binding to ligand: Complementarity of electrostatic charge and 3D shape

31 Protein function – proteins bind to other proteins Myosin head binds to actin during muscle contraction Protein shapes are complementary like puzzle pieces Binding is reversible Goodness of fit (shape, charge, hydrophobicity) determines affinity of protein/ligand binding

32 Protein function – changes in shape are essential to protein function Proteins are dynamic machines Two or a few protein conformations may be of similar stability Ligand binding can act as a switch to change protein conformation Induced fit: binding of glucose (red) to hexokinase changes enzyme conformation

33 Protein function – changes in shape are essential to protein function Hemoglobin switches between a T (taut) deoxygenated and an R (relaxed) oxygenated state

34 Protein function – changes in shape are essential to protein function Binding of lactose to the lactose transport protein changes the shape of the protein. Lactose (red) binds to the protein on one side of a cell membrane and is released on the other side Movement across the membrane is reversible

35 Protein function – changes in shape are essential to protein function The Na+-K+ pump switches between two conformations when a phosphate group is added or removed Attaching phosphate to the Na+-K+ pump switches the protein's shape, moving Na+ outside the cell Removing phosphate switches the protein's shape again, moving K+ into the cell

36 Protein function – changes in shape are essential to protein function Myosin heads cycle between multiple conformations during muscle contraction Binding and release of ATP, ADP, and phosphate (Pi) trigger changes in myosin head conformation Myosin heads pull on actin to make muscle fibers shorter

37 Protein function – changes in shape are essential to protein function Intrinsically Disordered Proteins have roles in signalling, etc. Some take shape only when interacting Tend to form hubs in interaction networks Predict with PONDR, Spritz, Wiggle, FoldUnfold Disprot database of ID proteins

38 Protein function - Phosphorylation Phosphorylation changes a protein's shape Phosphorylation may turn a protein on or off Kinases: enzymes that attach phosphates to proteins Phosphatases: enzymes that remove phosphates Other charged chemical groups (cAMP, Ca++ …) are also attached to proteins to switch them on or off

39 Protein structure – Modifications after protein synthesis Post-Translational Modifications (PTM) Phosphorylation takes place on S, T or Y, but only in certain situations NetPhosK uses Artificial Neural Networks KinasePhos uses HMMs to predict Phosphorylation sites
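
As a starting point for the prediction problem described here, the candidate sites can be listed by scanning for S, T and Y. This sketch only enumerates candidates; it does not implement the neural-network or HMM scoring of NetPhosK or KinasePhos, which decide which candidates are likely to be phosphorylated. The function name is hypothetical.

```python
def candidate_phosphosites(protein):
    """Return (1-based position, residue) for every S, T or Y in the sequence."""
    return [
        (position, residue)
        for position, residue in enumerate(protein.upper(), start=1)
        if residue in "STY"
    ]
```

A predictor would then score the sequence context around each candidate, since, as the slide notes, phosphorylation occurs at these residues only in certain situations.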

40 Protein Interaction Networks

41 Introduction to Molecular Biology for Computer Scientists II. Nucleic acid structure and function A. The central dogma B. Nucleic acid structure C. Genetic code D. Control of gene expression E. Mutation

42 The central dogma of molecular biology DNA contains instructions for making RNA and protein DNA is transcribed (copied) to make messenger RNA (mRNA) mRNA is translated (instructions read) to make proteins

43 Central dogma - Transcription Transcription occurs in the nucleus mRNAs are modified before transport to the cytoplasm Other RNAs are also transcribed: o tRNAs: read genetic code in mRNA o rRNAs: backbone of ribosomes o small RNAs: enzymes and regulators of gene function

44 Central dogma - Translation Translation occurs in the cytoplasm on ribosomes tRNAs match the correct amino acids to codons on the mRNA Ribosomal enzymes join amino acids to the growing polypeptide The emerging polypeptide folds

45 Nucleic acid structure - Nucleotides

46 Nucleic acid structure - Polynucleotides The 5′ phosphate of one nucleotide is joined to the 3′ hydroxyl group of the preceding nucleotide The beginning of a nucleic acid has a free 5′ phosphate group The end of a nucleic acid has a free 3′ hydroxyl group

47 Nucleic acid structure - DNA Double stranded DNA forms a helix DNA strands are joined by hydrogen bonds between complementary bases A always pairs with T; two hydrogen bonds G always pairs with C; three hydrogen bonds
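
The base-pairing rules on this slide translate directly into two small helpers: one that builds the complementary strand (read 5′ to 3′, hence reversed) and one that counts the hydrogen bonds holding a duplex together. The function names are illustrative, and the sketch assumes an unambiguous sequence containing only A, C, G and T.

```python
# Watson-Crick partners and hydrogen bonds per base pair, as stated on the slide.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
BONDS = {"A": 2, "T": 2, "G": 3, "C": 3}


def reverse_complement(dna):
    """Return the complementary strand written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(dna.upper()))


def hydrogen_bonds(dna):
    """Count hydrogen bonds between this strand and its complement."""
    return sum(BONDS[base] for base in dna.upper())
```

Because G-C pairs contribute three bonds and A-T pairs two, GC-rich duplexes of the same length are held together by more hydrogen bonds.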

48 Nucleic acid structure - DNA Each DNA strand serves as a template for replication or repair of the other strand, and for RNA synthesis

49 Nucleic acid structure - RNA RNA is single stranded RNAs fold to form regions of internal double helix with complementary base pairs Many functional RNAs have globular shapes (green: sugar-phosphate backbone; gray: paired bases)

50 Genetic code – Linear code One DNA strand serves as a template for mRNA synthesis The linear order of nucleotides in the mRNA corresponds to the amino acid sequence of the polypeptide Triplet codons specify insertion of amino acids The DNA coding strand is complementary to the template strand, so its sequence is comparable to the mRNA (with T instead of U)
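
The relationship between the template strand, the coding strand and the mRNA can be sketched as follows. The sketch assumes the template strand is supplied written 5′ to 3′ and contains only A, C, G and T; the function names are illustrative.

```python
# Translation table mapping each DNA base to its Watson-Crick partner.
DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def coding_strand(template):
    """Return the coding strand (5' to 3') for a template strand given 5' to 3'."""
    return template.upper().translate(DNA_COMPLEMENT)[::-1]


def transcribe(template):
    """Return the mRNA: the coding-strand sequence with U in place of T."""
    return coding_strand(template).replace("T", "U")
```

For the template `TTACAT`, the coding strand is `ATGTAA` and the mRNA is `AUGUAA`, showing that the mRNA reads like the coding strand with T replaced by U.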

51 Genetic code – triplet codons tRNA reads genetic code: anticodon is complementary to the mRNA codon Redundant genetic code: some amino acids are specified by multiple codons First codon is AUG Met Three stop codons specify termination of polypeptide synthesis
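
A minimal translation sketch makes these points concrete. The codon table is the standard genetic code, written in the compact form used by NCBI (bases ordered U, C, A, G; `*` marks a stop codon). The `translate` function is a simplification: it starts at the first AUG it finds and stops at the first in-frame stop codon, ignoring everything else a ribosome takes into account.

```python
from itertools import product

# Standard genetic code, one letter per codon in UCAG order ("*" = stop).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product("UCAG", repeat=3), AMINO)
}


def translate(mrna):
    """Translate from the first AUG to the first in-frame stop codon."""
    mrna = mrna.upper()
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon found
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def codons_for(amino_acid):
    """List every codon that specifies the given amino acid (or '*' for stop)."""
    return [codon for codon, aa in CODON_TABLE.items() if aa == amino_acid]
```

`codons_for("L")` returns six codons and `codons_for("*")` returns the three stop codons, which illustrates the redundancy described on the slide.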

52 Control of Gene Expression – Gene Structure RNA polymerase binds to DNA at the promoter to initiate transcription Other proteins bind to sequences near the exons to regulate transcription The information in genes (exons) is interrupted by non-coding sequences (introns) that are removed from RNA by splicing A 5′ cap and poly(A) tail are added for mRNA stability

53 Control of gene expression – DNA binding proteins can activate or inhibit transcription Multiple proteins must bind to DNA to initiate transcription Some proteins bind near the promoter, while others bind farther away at enhancers DNA bends so that enhancer-binding proteins help RNA polymerase assemble on the promoter Some proteins bind to DNA or to activator proteins to block transcription initiation

54 Control of gene expression – how proteins bind to specific DNA sequences DNA binding proteins insert an α-helix into the major groove of DNA Helix-turn-helix domain

55 Control of gene expression – how proteins bind to specific DNA sequences The protein stalls on the DNA where amino acids form the maximum number of hydrogen bonds with nucleotide bases in the major groove

56 Control of gene expression – DNA binding protein domains Homeodomain Zinc finger Leucine zipper

57 Control of gene expression – other points of control Whether or not a gene is expressed can be controlled at any step that affects protein concentration and function Alternative splicing (post-transcriptional processing) produces multiple proteins from one gene Control of translation (siRNAs block translation) Control of protein activity (phosphorylation switches proteins on or off) mRNA or protein longevity (how quickly it is degraded)

58 Control of gene expression – Alternative splicing

59 Analysis of gene expression Microarrays, GeneChips Three classes of software- Reading the images Clustering the data, building associations GeneSpring, GeneSifter, Bioconductor Warehousing the data GEO, SMD, YMD, others Meta-analysis is difficult due to variability
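
The "clustering the data, building associations" step usually begins with a similarity measure between expression profiles. The sketch below uses Pearson correlation, one common choice; it is not a description of how GeneSpring, GeneSifter or Bioconductor work internally. The gene names and expression values are hypothetical, made up for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (neither may be constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical expression levels across four conditions (not real data).
EXPRESSION = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.1, 3.9, 6.2, 8.0],
    "geneC": [4.0, 3.0, 2.0, 1.0],
}


def most_similar(gene, data):
    """Return the other gene whose profile correlates best with the given gene."""
    return max(
        (other for other in data if other != gene),
        key=lambda other: pearson(data[gene], data[other]),
    )
```

Clustering methods build on such pairwise similarities to group genes that rise and fall together across conditions.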

60 Mutations Although DNA is replicated and repaired accurately, rare mistakes are made, which alter nucleotide sequence Exposure to some chemicals and radiation damages DNA, increasing the likelihood of mutation Although rare, a few nucleotide sequence changes occur with each generation Mutations introduce genetic variability in a population of individuals

61 Introduction to Molecular Biology for Computer Scientists III. The centrality of evolution by natural selection in biology A. Universal genetic code B. Types of mutations and their effects C. Genomes D. Natural selection acts on random variation to produce evolution

62 Universal genetic code All living organisms use the same genetic code: CCC encodes proline in all cells All organisms are descended from a common ancestor – all life on earth evolved from a common point of origin The impressive variation in living organisms arose through random changes in nucleotide sequence (mutation) acted upon by natural selection over billions of years

63 Mutation – nucleotide substitutions Mutations are random changes in nucleotide sequence Mutations in non-coding sequences and codon third position are often silent Some nucleotide substitutions change the amino acid Nonsense mutations introduce stop codons and truncate a polypeptide
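
The three outcomes of a substitution within a codon can be sketched with the standard genetic code. The function names are illustrative, and the sketch handles coding-sequence changes only; substitutions in non-coding sequence are outside its scope.

```python
from itertools import product

# Standard genetic code for DNA codons, in TCAG order ("*" = stop).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product("TCAG", repeat=3), AMINO)
}


def classify_substitution(codon, position, new_base):
    """Classify a single-base change at position 0, 1 or 2 of a codon."""
    mutated = codon[:position] + new_base + codon[position + 1:]
    before, after = CODON_TABLE[codon], CODON_TABLE[mutated]
    if before == after:
        return "silent"  # note: a stop-to-stop change also lands here
    if after == "*":
        return "nonsense"
    return "missense"


def silent_fraction(position):
    """Fraction of all possible substitutions at a codon position that are silent."""
    total = silent = 0
    for codon in CODON_TABLE:
        for base in "TCAG":
            if base != codon[position]:
                total += 1
                silent += classify_substitution(codon, position, base) == "silent"
    return silent / total
```

Comparing `silent_fraction(2)` with `silent_fraction(0)` shows why third-position changes are so often silent: most of the code's redundancy sits in the last base of the codon.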

64 Mutations - Frameshifts Insertion or deletion of a nucleotide shifts the reading frame and drastically alters amino acid sequence

65 Mutations - Frameshifts Frameshift tolerant matching programs add additional states corresponding to transitions to other reading frames FrameSearch, Wise2, BLAST with OOF option
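
The effect of a frameshift, and the reason a frame-tolerant search can still recover the match, can be shown with a toy example. This sketch does not implement the extra-state models used by FrameSearch or Wise2; it only translates a sequence in each forward reading frame. The example sequence is made up for illustration.

```python
from itertools import product

# Standard genetic code for DNA codons, in TCAG order ("*" = stop).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product("TCAG", repeat=3), AMINO)
}


def translate_frame(dna, frame=0):
    """Translate every complete codon starting at offset `frame` (0, 1 or 2)."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE[dna[i:i + 3]] for i in range(frame, len(dna) - 2, 3)
    )


ORIGINAL = "ATGGCTGAACGTTAC"           # hypothetical coding sequence
MUTANT = ORIGINAL[:4] + ORIGINAL[5:]   # delete one base (the C at index 4)
```

`translate_frame(ORIGINAL)` gives `MAERY`, while `translate_frame(MUTANT)` gives `MVNV`: everything after the deletion changes. Reading the mutant in frame 2 gives `GERY`, which recovers the original `ERY` tail. A frameshift-tolerant aligner exploits this by allowing the alignment to jump between frames at a penalty.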

66 Mutations – Chromosomal mutations Large duplications generate multiple copies of genes Large deletions remove genes from the genome Duplications, deletions, inversions, and translocations are preserved in future generations, so can be used to trace evolutionary history

67 Mutations – effects on organisms Random mutation produces genetic variability that is acted upon by natural selection Most mutations are deleterious: mutations occur randomly, and are more likely to disrupt protein function than alter it in a positive way Deleterious mutations are eliminated by natural selection Rare mutations that introduce altered, even beneficial functions are positively selected

68 Genomes – Eukaryotic cells Animals, Plants, Fungi, and some other organisms (Protists) have eukaryotic cells Eukaryotic organisms have existed on earth for millions of years Eukaryotic genomes are in linear pieces of DNA called chromosomes Eukaryotic genes usually have introns Eukaryotic genes are separated by lots of spacer DNA that does not encode proteins Many repetitive sequences are present: o Tandem repeats of short sequences, like GAC o Transposons (for example, LINEs and SINEs)

69 Genomes – Prokaryotic cells Bacteria have prokaryotic cells Prokaryotic organisms have existed on earth for billions of years Most prokaryotic genomes consist of one circular DNA molecule Most prokaryotic genes lack introns Prokaryotic genomes have little spacer DNA; most DNA encodes known RNAs or proteins There is much less repetitive DNA than for eukaryotic genomes Plasmids, tiny circular DNA molecules separate from the main genome, are used in recombinant DNA technology to introduce genes into bacteria

70 Genomes – evolution through chromosomal mutation Gene duplications result in multigene families After duplication, one gene can provide the original function while the other may evolve (through mutation and selection) a different function The globin gene family evolved through duplication, mutation, and selection

71 Genomes – evolution through chromosomal mutation α- and β-globin genes vary in amino acid sequence, yet share the same conformation Sequence differences are fairly conservative
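
Statements like "sequence differences are fairly conservative" can be quantified once two sequences are aligned. The sketch below computes percent identity and counts how many of the differences stay within one residue property group. The groupings in `SIMILAR_GROUPS` are a toy assumption rather than a published substitution matrix, and the sequences in the usage note are short made-up examples, not globin sequences.

```python
# Toy residue groupings (an assumption for illustration only).
SIMILAR_GROUPS = ["AVLIM", "FWY", "ST", "NQ", "DE", "KRH"]


def aligned_pairs(a, b):
    """Yield residue pairs from two aligned sequences, skipping gap columns."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    return [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]


def percent_identity(a, b):
    """Percentage of ungapped columns in which both residues are identical."""
    pairs = aligned_pairs(a, b)
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def conservative_differences(a, b):
    """Return (differences within one toy group, total differences)."""
    diffs = [(x, y) for x, y in aligned_pairs(a, b) if x != y]
    conservative = sum(
        any(x in group and y in group for group in SIMILAR_GROUPS)
        for x, y in diffs
    )
    return conservative, len(diffs)
```

For example, `percent_identity("KDGII", "KDGVI")` is 80.0, and the single difference (I versus V) counts as conservative under the assumed grouping.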

72 Multiple Sequence Alignment

73

74 Genomes – evolution through chromosomal mutation As species diverge, chromosomal mutations shuffle genome content The more recently two species diverged, the more similar their genome organization Both chromosomal and single nucleotide mutations can be used to trace the evolutionary history of species Genetic information in human chromosomes (blue) compared to dog chromosomes

75 Credits Many figures were borrowed from three sources: o Nelson and Cox, Lehninger Principles of Biochemistry, 4e, WH Freeman and Co., 2005 ISBN: o Raven, Johnson, Losos, Mason, and Singer, Biology, 8e, McGraw-Hill, 2008 ISBN: o Klug, Cummings, and Spencer, Essentials of Genetics, 6e, Pearson/Prentice Hall, 2007 ISBN: These texts and the accompanying on-line materials are excellent resources for learning more molecular biology

76 Thank You! Marty is at Suzanne is at

