Molecular Biology and Recombinant DNA Technology Montarop Yamabhai
Schedule: Tue, Thu 10 am - 1 pm; July: Tue 24, Thu 26; August: Tue 7, Thu 9
Topic
Recombinant DNA Technology (July 24, 26): Brief Review of Molecular Biology (movie); History; DNA and Protein Electrophoresis; DNA sequencing; PCR; Basic Bioinformatics; Genetic Engineering; Plasmids (vector); DNA Library; Southern Blot Analysis; PCR cloning; Mutagenesis
The Study of Gene Expression (August 7): Northern / Western Blot Analysis; Microarray; Reporter-fusion Proteins / Immunolocalization; Transgenic Animal
Modern Methods in Molecular Biology (August 9): Production of Recombinant Proteins; Molecular Evolution; RNAi Technology; Real time PCR
Reading
Molecular Cloning: A Laboratory Manual. Sambrook & Russell. Cold Spring Harbor Laboratory Press, c2001
Modern Genetic Analysis. Griffiths, Anthony J.F.; Gelbart, William M.; Miller, Jeffrey H.; Lewontin, Richard C. New York: W. H. Freeman & Co., c1999
Molecular Biology of the Cell. Alberts, Bruce; Johnson, Alexander; Lewis, Julian; Raff, Martin; Roberts, Keith; Walter, Peter. New York and London: Garland Science, c2002
Molecular Cell Biology. Lodish, Harvey; Berk, Arnold; Zipursky, S. Lawrence; Matsudaira, Paul; Baltimore, David; Darnell, James E. New York: W. H. Freeman & Co., c1999
Cell and Molecular Biology: Concepts and Experiments, 5th Edition. Gerald Karp, formerly of the Univ. of Florida, Gainesville. ©2008, 864 pages
Evaluation Assignments 50% Select a company that sells any biotech product that you like. Then write a “product review” describing this product in as much detail as possible, but no more than one A4 page Exam 50% Closed book, 3 hrs.
Recombinant DNA Technology »Brief Review of Molecular Biology (movie) »DNA and Protein Electrophoresis »DNA sequencing »PCR »Basic Bioinformatics »Genetic Engineering »Plasmids (vector) »PCR cloning »DNA Library »Southern Blot Analysis »Site-directed mutagenesis
Review
Chapter 18: Techniques in Cell and Molecular Biology. Videos: Mechanism of Recombination; Polymerase Chain Reaction
Chapter 10: The Nature of the Gene and the Genome. Videos: Chargaff's Ratios; Base Pairing; Public Project Sequencing
Chapter 11: Expression of Genetic Material: From Transcription to Translation. Videos: Transcription; Triplet Code; Translation
Chapter 12: The Cell Nucleus and the Control of Gene Expression. Videos: How Much DNA Codes for Protein; How DNA is Packaged; Microarray
Chapter 13: DNA Replication and Repair. Videos: Replicating the Helix; Mechanism of Replication
Chapter 16: Cancer. Videos: Microarray; Tumor Growth
2006: Craig C. Mello and Andrew Fire received the Nobel Prize for RNAi
DNA and Protein Electrophoresis Agarose gel electrophoresis Polyacrylamide gel electrophoresis (PAGE)
Gel Electrophoresis
DNA Sequencing Dideoxy Method Automated DNA Sequencing
PCR
Bioinformatics What is Bioinformatics Useful Websites Tools Biological Databases Sequence Alignment Structural Bioinformatics Molecular Phylogenetics Genomics/Proteomics
What is Bioinformatics Interdisciplinary subject involving computer and biological sciences
The science of informatics as applied to biological research. Informatics is the management and analysis of data using advanced computing techniques. Bioinformatics is particularly important as an adjunct to genomics research, because of the large amount of complex data this research generates.
The collection, organization and analysis of large amounts of biological data, using networks of computers and databases.
The process of developing tools and processes to quantify and collect data to study biological systems logically.
The assembly of data from genomic analysis into accessible forms. It involves the application of information technology to analyze and manage large data sets resulting from gene sequencing or related techniques.
The use of computers in solving information problems in the life sciences. It mainly involves the creation of extensive electronic databases on genomes, protein sequences etc. Also involves techniques such as three-dimensional modelling of biomolecules and biological systems.
The use of computers to handle biological information. The term is often used to describe computational molecular biology – the use of computers to store, search and characterize the genetic code of genes, the proteins linked to each gene and their associated functions.
A broad term to describe applications of computer technology and information science to organize, interpret, and predict biological structure and function. Bioinformatics is usually applied in the context of analyzing DNA sequence data.
The field of science in which biology, computer science, and information technology merge into a single discipline.
The field of biology specializing in developing hardware and software to store and analyze the huge amounts of data being generated by life scientists.
Information about human and other animal genes and related biological structures and processes (pharmacy.ucsf.edu/glossary/i/)
Research, development or application of mathematical tools and approaches for expanding the use of biological, medical, behavioral or health data. This includes methods to acquire, store, organize, archive, analyze or visualize data.
The use of computers to collect, analyze and store genomics information. (pbi-ibp.nrc-cnrc.gc.ca/en/media/glossary.htm)
The use of computers, laboratory robots and software to create, manage and interpret massive sets of complex biological data.
The collection and storage of information about genomics in databases.
The management and analysis of data from biological research.
A scientific discipline that comprises all aspects of the gathering, storing, handling, analysing, interpreting and spreading of biological information. Involves powerful computers and innovative programmes which handle vast amounts of coding information on genes and proteins from genomics programmes.... (europa.eu.int/comm/research/biosociety/library/glossarylist_en.cfm)
The discipline of obtaining information about genomic or protein sequence data. This may involve similarity searches of databases, comparing your unidentified sequence to the sequences in a database, or making predictions about the sequence based on current knowledge of similar sequences. Databases are frequently made publicly available through the Internet, or locally at your institution. (bioinfo.cnio.es/docus/courses/SEK2003Filogenias/seq_analysis/Glossary.html)
An interdisciplinary area at the intersection of biological, computer, and information sciences necessary to manage, process, and understand large amounts of data, for instance from the sequencing of the human genome, or from large databases containing information about plants and animals for use in discovering and developing new drugs.
The use of computers in biological research.
Use of computers in the acquisition and analysis of information relating to genes, proteins (and their structures), biological pathways and drugs.
The organisation and use of information on biological and molecular subjects. This includes organising biomolecular databases, managing the quality of data input, getting useful information out of such databases, and integrating information from disparate sources. One application of bioinformatics is to bring together gene-sequence data with that about the physiological functions of the proteins whose production they stimulate.
The use of computers and information technology to store and analyze nucleotide and amino acid sequences and related information.
A collective term that designates the use of computers and specialized software to analyze and retrieve data from genomic and scientific databases.
The study of collecting, sorting, and analyzing DNA and protein sequence information using computers and statistical techniques.
The science of managing and analyzing biological data using advanced computing techniques.
Bioinformatics or computational biology is the use of techniques from applied mathematics, informatics, statistics, and computer science to solve biological problems. Research in computational biology often overlaps with systems biology. Major research efforts in the field include sequence alignment, gene finding, genome assembly, protein structure alignment, protein structure prediction, prediction of gene expression and protein-protein interactions, and the modeling of evolution.... (en.wikipedia.org/wiki/Bioinformatics)
Useful Server Websites USA: NCBI Europe: EBI Google search
Tools Bioinformatic Software – $$$ buy from company – Free web-based software DNA sequence analysis Primer design DNA translation tools Structure prediction Restriction site analysis Sequence alignments etc
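As an illustration of what a "DNA translation tool" in the list above does, here is a minimal Python sketch. It assumes the standard genetic code, reads only the first forward frame, and marks any codon it cannot resolve as X; the names `CODON_TABLE` and `translate` are hypothetical and not taken from any particular package.

```python
# Standard genetic code, built from the conventional TCAG ordering of bases.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate a DNA coding strand in frame 1; '*' marks a stop codon."""
    dna = dna.upper()
    protein = []
    # Step through complete codons only; a trailing 1-2 bases are ignored.
    for pos in range(0, len(dna) - 2, 3):
        # Codons containing ambiguous bases (e.g. N) become 'X'.
        protein.append(CODON_TABLE.get(dna[pos:pos + 3], "X"))
    return "".join(protein)


if __name__ == "__main__":
    print(translate("TGGGATACTGCTGGTCAAGAA"))
```

Real tools usually report all six reading frames and support alternative genetic codes; this sketch deliberately does neither.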
Biological Databases Primary Database Secondary Database Specialized Database
Primary Database Raw nucleic acid sequence –GenBank –EMBL –DDBJ Use different formats to present data These databases are closely connected and exchange data daily 3D structure : PDB Protein and nucleic acid structure Atomic coordinates from X-ray and NMR
Secondary Database Computationally processed information Provide sequence annotation SWISS-PROT TrEMBL (translated nucleic acid sequence from EMBL) Other : UniProt, Pfam, Blocks, DALI
Specialized Database Focus on particular organisms –Flybase –Wormbase –AceDB, TAIR Focus on functional analysis –Genbank EST –Microarray gene expression database
Important Many databases are connected –NCBI is the most integrated Reliability ! There are many errors in the databases –Sequencing errors (especially before the 1990s) –Redundancy Non-redundant database UniGene (coalesces ESTs)
Information Retrieval Use Boolean operations = join a series of keywords Text-based search Provide access to multiple databases for retrieval of integrated search results Entrez (NCBI) SRS (Sequence Retrieval System from EBI)
Sequence Alignment Heart of bioinformatic analysis A consequence of evolution Helps to identify evolutionary relationships
Sequence homology ≠ Sequence similarity Sequence homology is a “qualitative term” indicating common evolutionary origin Sequence similarity is a “quantitative term” calculated from a sequence alignment (% similarity) From % similarity one can infer whether the sequences are homologous or non-homologous
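Sequence similarity & Sequence identity Same for DNA Different for protein –Protein similarity means % of aligned residues with similar physicochemical characteristics –Protein identity means % of aligned residues that are the same amino acid Formula (%) –[Ls(i) × 2 / (La + Lb)] × 100, where Ls(i) is the number of aligned similar (identical) residues and La, Lb are the lengths of the two sequences –[Ls(i) / La] × 100, where La is the length of the shorter sequence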
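The two percentage formulas can be sketched in Python as follows. This is a minimal illustration for identity only, assuming the two sequences are already aligned and use "-" for gaps; the function name `percent_identity` is hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> tuple[float, float]:
    """Return (% identity over both lengths, % identity over the shorter sequence)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Li: number of aligned positions carrying the same residue (gaps never count).
    identical = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    # La and Lb: ungapped lengths of the two sequences.
    len_a = len(aligned_a.replace("-", ""))
    len_b = len(aligned_b.replace("-", ""))
    over_both = identical * 2 / (len_a + len_b) * 100
    over_shorter = identical / min(len_a, len_b) * 100
    return over_both, over_shorter


if __name__ == "__main__":
    print(percent_identity("ACGT-A", "ACGTTA"))
```

Percent similarity for proteins would follow the same pattern, but counting a position whenever the two residues fall in the same physicochemical group; which grouping to use is a separate choice not made here.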
Similarity Searching Submit a “query sequence” to perform pairwise comparison using a computational process Use heuristic methods BLAST –Developed in the 1990s –Variations: BLASTN, BLASTP, BLASTX, TBLASTN, TBLASTX, NBLAST FASTA Significance is determined from the E-value
Important Protein sequence is better (more sensitive) Not guaranteed to find all homologs Must be followed by independent alignment programs Must filter out LCRs (low complexity regions, e.g. repetitive sequences)
Multiple Sequence Alignment Reveals more information than pairwise alignment Identifies conserved or critical amino acids Required for phylogenetic analysis and prediction of protein structure Used for designing degenerate primers for PCR cloning
Methods / Program Dynamic programming Heuristic approach –Clustal; ClustalW, ClustalX –T-Coffee –Poa –PRALINE –PRRN, etc Editing –BioEdit –Rascal
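To make "dynamic programming" concrete, the sketch below scores a global pairwise alignment in the style of Needleman-Wunsch. It is a toy for two sequences only, not what the multiple-alignment programs listed above do internally; the scoring values (match +1, mismatch -1, gap -2) and the function name are assumptions chosen for illustration.

```python
def global_alignment_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -2) -> int:
    """Best global alignment score of a and b with a linear gap penalty."""
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] = best score for aligning a[:i] with b[:j].
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap          # a[:i] aligned against gaps only
    for j in range(1, cols):
        score[0][j] = j * gap          # b[:j] aligned against gaps only
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + pair,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,       # gap in b
                score[i][j - 1] + gap,       # gap in a
            )
    return score[-1][-1]


if __name__ == "__main__":
    print(global_alignment_score("ACGT", "AGT"))
```

The table needs time and memory proportional to the product of the two lengths, which is why exhaustive dynamic programming over many sequences is impractical and heuristic programs are used instead.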
Important No alignment program is perfect Combine results from multiple programs The alignment should be refined manually Protein sequence alignment is more accurate and should be done first
Prediction of Gene and Promoter Very difficult Prokaryotic genomes are much easier to predict Good programs are being developed –GeneMark –Glimmer
Molecular Phylogenetics
Evolution Development of biological forms from preexisting forms through natural selection and mutation Protein or DNA sequences are molecular fossils
Nature 392: , 1998
Major Assumptions Molecular sequences used in phylogenetic construction are homologous Evolutionary tree is always binary Each position in a sequence evolves independently
Types of Phylogenetic trees
Steps Choosing molecular markers Performing multiple sequence alignment Choosing a model of evolution Determining a tree building method Assessing tree reliability
Selection of molecular markers For closely related organisms (individuals within populations), DNA sequence is used, as it evolves fast For more widely divergent groups (different species of bacteria, fungi), use slowly evolving sequences such as ribosomal RNA or protein For greatly different organisms (bacteria and eukaryotes), use conserved protein sequences –DNA sequence can be biased –Protein sequence allows more sensitive alignment
Choosing a model of evolution Select a proper substitution model that provides an estimate of the true evolutionary events For DNA: Jukes-Cantor and Kimura For protein: PAM and JTT
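As an example of what a substitution model does, the Jukes-Cantor correction converts the observed fraction of differing sites p into an estimated number of substitutions per site, d = -(3/4) ln(1 - 4p/3). The Python sketch below assumes two pre-aligned DNA sequences and skips gapped columns; the function name is hypothetical.

```python
import math


def jukes_cantor_distance(seq_a: str, seq_b: str) -> float:
    """Jukes-Cantor distance between two aligned DNA sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    # Compare only columns where neither sequence has a gap.
    pairs = [(x, y) for x, y in zip(seq_a.upper(), seq_b.upper())
             if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable positions")
    p = sum(x != y for x, y in pairs) / len(pairs)
    if p >= 0.75:
        # The correction is undefined once sites look fully randomized.
        raise ValueError("sequences too divergent for Jukes-Cantor")
    return -0.75 * math.log(1 - 4 * p / 3)


if __name__ == "__main__":
    print(jukes_cantor_distance("ACGTACGTAC", "ACGTACGTAT"))
```

The corrected distance is always at least as large as p, because it accounts for multiple substitutions at the same site that a simple count cannot see.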
Phylogenetic method Distance-based –UPGMA, NJ : fast but not accurate –Fitch-Margoliash, minimum evolution : accurate but not fast Character-based –MP maximum parsimony : popular –ML maximum likelihood : slowest but based on a solid statistical foundation
Assessing tree reliability Statistically evaluate the reliability of the tree after it is constructed Bootstrapping Jackknifing
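The resampling step behind bootstrapping can be sketched in Python as below: alignment columns are drawn with replacement to build a pseudo-replicate of the same width, and a tree would then be rebuilt from each replicate. Only the resampling is shown; tree building is left out, and the function name is hypothetical.

```python
import random


def bootstrap_alignment(alignment: list[str], rng: random.Random) -> list[str]:
    """Resample alignment columns with replacement, keeping the original width."""
    n_cols = len(alignment[0])
    if any(len(seq) != n_cols for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    # Pick n_cols column indices at random; the same column may appear twice.
    chosen = [rng.randrange(n_cols) for _ in range(n_cols)]
    return ["".join(seq[c] for c in chosen) for seq in alignment]


if __name__ == "__main__":
    replicate = bootstrap_alignment(["ACGT", "ACGA", "TCGA"], random.Random(0))
    print(replicate)
```

Jackknifing differs in that it deletes a fraction of the columns without replacement, so each replicate is shorter than the original alignment.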
Phylogenetic Programs Knowing background, capability, and limitation is important Felsenstein’s collection lists hundreds of freely available programs For example : PAUP, Phylip, PHYML
Important Phylogenetic tree construction is a complicated process None of the methods is guaranteed to find the correct tree At least 2 methods should be used for any phylogenetic analysis
Structural Bioinformatics Protein functions are determined by their structures Essential elements in bioinformatics 20 amino acids are building blocks of protein Amino acids are linked by peptide bond Conformation (folding) of protein is determined by dihedral angle (phi and psi)
Ramachandran Plot
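A Ramachandran plot is a scatter of phi against psi, and each of those is a dihedral angle defined by four consecutive backbone atoms (C, N, CA, C for phi; N, CA, C, N for psi). The Python sketch below computes a dihedral angle from four 3D points using only the standard library. The helper names are hypothetical, and the sign of the result depends on the handedness convention, so it should be checked against a trusted tool before use.

```python
import math

Vec = tuple[float, float, float]


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral_degrees(p0: Vec, p1: Vec, p2: Vec, p3: Vec) -> float:
    """Dihedral angle (degrees) about the p1-p2 bond, in the range -180..180."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    axis = (b1[0] / norm, b1[1] / norm, b1[2] / norm)
    # Project b0 and b2 onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, axis) * c for c in axis))
    w = _sub(b2, tuple(_dot(b2, axis) * c for c in axis))
    x = _dot(v, w)
    y = _dot(_cross(axis, v), w)
    return math.degrees(math.atan2(y, x))


if __name__ == "__main__":
    print(dihedral_degrees((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1)))
```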
3D structure of protein Can be determined by X-ray crystallography –Protein needs to be grown into a large crystal; this is the bottleneck –The X-rays are deflected by the electron clouds surrounding the atoms; diffraction patterns are converted into an electron density map –2 methods are used to solve the structures: molecular replacement and multiple isomorphous replacement –R factor is used to determine the quality of the model, ranging from 0.0 – 0.59 NMR (nuclear magnetic resonance) –Detects spinning patterns of atomic nuclei in a magnetic field –Proteins are in solution, so they are mobile and vibrating; thus a number of different models will be constructed –Limited to <200 amino acid residues; requires labeling with stable isotopes (e.g. 13C, 15N)
Protein visualization PDB only contains x, y, z coordinate of atoms Widely used and freely available software –RasMol, RasTop –Swiss-PDBViewer –WebMol, Chime, Cn3D Unix software –Molscript –Ribbons –Grasp
Other software Software for structure comparison –DALI Software for protein classification –SCOP –CATH
3D Structure Prediction Theoretical alternative to experimental approaches There are 3 computational approaches –Homology modeling : most accurate Divided into 6 steps : 1) template recognition, 2) sequence alignment, 3) backbone generation, 4) loop building, 5) side chain building, 6) model refinement and evaluation –Threading or fold recognition –Ab initio prediction Comprehensive modeling programs –Modeller, Swiss-Model, 3D-JIGSAW
Genomics and Proteomics Genomics is the study of genome involving simultaneous analysis of a large number of genes, using automated high-throughput machine Genomic study can be divided into 2 parts –Structural genomics –Functional genomics
Structural Genomics Genome mapping –Low resolution : using genetic markers –Highest resolution : complete sequence of the whole genome Genome sequencing; assembly –Full shotgun –Hierarchical approach Genome annotation –Gene finding, naming –Assigning function to the gene –The exact number of genes in the human genome is not known Comparative genomics –Helps to discover potential operons and assign putative function –Conserved gene order among prokaryotes often indicates physical interaction between proteins (e.g. proteins in the same metabolic pathway) –BLASTZ and LAGAN are the two best programs for genome comparison
Level of Analysis Cytological map –Pattern on the chromosome Genetic map –Genetic marker Physical map (Restriction Map) DNA sequence
Sequencing Approaches Shotgun sequencing approach –Genomic DNA → many short fragments → cloned → sequencing each clone → assemble sequence by aligning and removing overlaps (needs high-capacity software) Hierarchical sequencing approach –Genomic DNA → long fragments cloned into BAC library with completed physical map → subclone library from each BAC clone → sequencing each subclone → assemble sequence by aligning and removing overlaps
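The "assemble by aligning and removing overlaps" step can be illustrated with a toy greedy assembler in Python. It assumes error-free reads from one strand and exact suffix-prefix overlaps, which real assemblers cannot assume; the function names and the minimum overlap of 3 bases are illustrative choices only.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that equals a prefix of b (0 if < min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of reads with the longest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(contigs) > 1:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_len:
                        best_len, best_i, best_j = k, i, j
        if best_len == 0:
            break  # no remaining overlaps: leave the contigs separate
        # Join the two reads, writing the shared overlap only once.
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for idx, c in enumerate(contigs)
                   if idx not in (best_i, best_j)] + [merged]
    return contigs


if __name__ == "__main__":
    print(greedy_assemble(["ATGGCGT", "GCGTACC", "TACCTGA"]))
```

Repeats longer than a read make greedy merging ambiguous, which is one reason the hierarchical approach first anchors BAC clones on a physical map.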
Functional Genomics Study gene function at the whole-genome level using high-throughput approaches Simultaneous analysis of all genes in a genome Transcriptome : all expressed genes 2 approaches for analysis Sequence-based (ESTs) Microarray-based : most popular method to study gene expression
Proteomics Proteome: the entire set of expressed proteins in a cell Simultaneous study of all the translated proteins in a cell High-throughput analysis of the proteins –Protein expression –Posttranslational modification –Protein sorting –Protein-protein interaction
Traditional proteomic analysis
Genetic Engineering Restriction Enzyme Ligation
Restriction Map
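A restriction map records where an enzyme cuts and how long the resulting fragments are. The Python sketch below finds recognition sites on one strand of a linear molecule and derives fragment lengths. EcoRI (recognition site GAATTC, cutting after the first G) is used as the example; the function names and the `cut_offset` parameter are hypothetical.

```python
def find_sites(sequence: str, site: str) -> list[int]:
    """0-based start positions of every occurrence of site (overlaps allowed)."""
    sequence, site = sequence.upper(), site.upper()
    positions = []
    start = sequence.find(site)
    while start != -1:
        positions.append(start)
        start = sequence.find(site, start + 1)
    return positions


def fragment_lengths(sequence: str, site: str, cut_offset: int) -> list[int]:
    """Fragment sizes after complete digestion of a linear molecule.

    cut_offset is the number of bases from the start of the recognition
    site to the cut on the strand shown (1 for EcoRI, G^AATTC).
    """
    cuts = [p + cut_offset for p in find_sites(sequence, site)]
    edges = [0] + cuts + [len(sequence)]
    return [right - left for left, right in zip(edges, edges[1:])]


if __name__ == "__main__":
    dna = "AAGAATTCCCGGGAATTCTT"
    print(find_sites(dna, "GAATTC"), fragment_lengths(dna, "GAATTC", 1))
```

For a circular plasmid the first and last fragments would be joined into one, which this linear sketch does not handle.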
Joining or Ligation
Chemical synthesis of oligonucleotides
Plasmids (vector) Origin of Replication (Ori) Marker Multiple Cloning Site (MCS)
Origin of Replication
Different Types of Replicons
Plasmid    Replicon             Copy number
pBR322     pMB
pUC        modified from pMB
pMOB45     PKN
pACYC      p15A                 18-22
pSC101     pSC101               ~5
ColE1      ColE1                15-20
Types of Cloning Vector
Type            Size of cloned DNA (kb)
Plasmid         20
Lambda phage    25
Cosmid          45
P1 phage        100
BAC             300
YAC             1000
Multiple Cloning Sites (MCS) Poly Linker
DNA Library Genomic DNA Library cDNA Library Phage Displayed Library
Select for ampicillin-resistant colonies
Phage Display Library
DNA Hybridization
Membrane hybridization assay
Southern blot
In situ hybridization
Radiolabeling of DNA
PCR Cloning Primer Design –Specific primers –Degenerate primers –Nested primers Amplification –High-fidelity DNA polymerase –Hot start –Touchdown PCR Clone into appropriate vector –Compatible restriction sites –Poly T (pGEM-T Easy) –No ligation (TOPO cloning)
NESTED PCR A powerful method to amplify specific sequences of DNA from a large COMPLEX mixture of DNA.
Touch Down / Step down one-step procedure for optimizing PCRs
Degenerate Primers
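Example a protein motif: W D T A G Q E (Trp Asp Thr Ala Gly Gln Glu) 5' TGG GAY ACN GCN GGN CAR GAR 3' where Y = C or T, R = G or A, N = G, A, T or C. (As written, this gives a mix of 1 × 2 × 4 × 4 × 4 × 2 × 2 = 512 different oligonucleotides; leaving off the final degenerate base, i.e. ending the primer in GA, reduces the mix to 256.)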
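The size of a degenerate primer mix is the product of the number of bases allowed at each position. The Python sketch below counts and, for short primers, enumerates the mix using the IUPAC ambiguity codes; the function names are hypothetical.

```python
from itertools import product
from math import prod

# IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(primer: str) -> int:
    """Number of distinct oligonucleotides in a degenerate primer mix."""
    return prod(len(IUPAC[base]) for base in primer.upper())


def expand(primer: str) -> list[str]:
    """List every sequence in the mix (only sensible for low degeneracy)."""
    choices = (IUPAC[base] for base in primer.upper())
    return ["".join(combo) for combo in product(*choices)]


if __name__ == "__main__":
    print(degeneracy("TGGGAYACNGCNGGNCARGAR"))  # all seven codons of W D T A G Q E
    print(degeneracy("TGGGAYACNGCNGGNCARGA"))   # final degenerate base left off
    print(expand("GAY"))
```

For the W D T A G Q E motif, the full 21-base primer has a degeneracy of 512, and dropping the last base halves it to 256.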
Align Design
pGEM-T (Promega) Thermostable DNA polymerases compared: Taq, Tfl, Tth, Tli (Vent®), Deep Vent®, Pfu, Pwo, "Long PCR" enzyme mixes † Resulting DNA ends: 3′-A, >95% Blunt, Blunt, N.A., Varies 5′->3′ exonuclease activity: Yes, No, Yes 3′->5′ exonuclease activity: No, Yes T-vector cloning relies on the template-independent addition of a single A at the 3′-end of PCR products by some thermostable DNA polymerases.
No Ligation / TOPO Cloning
Mutagenesis Purpose –Study regulatory regions of genes –Study structure-function relationships of proteins –Alter activity of enzymes or proteins Types –Random Mutagenesis –Site-directed Mutagenesis
Random Mutagenesis Chemical treatment Error-prone PCR DNA shuffling
Error Prone PCR Non-proofreading polymerase, i.e. Taq Low annealing temperature Low/unequal dNTP concentration High Mg 2+ High cycle number Incorporation of Mn 2+ ion ( mM)
DNA Shuffling
Site-Directed Mutagenesis Kits from various companies PCR-based mutagenesis
Site-directed Mutagenesis by PCR
The Study of Gene Expression cDNA library Northern / Western Blot Analysis Microarray Reporter-fusion Proteins / Immunolocalization Transgenic Organisms
DNA Microarray
Gene Expression Proteins can be made in large amounts in various organisms through the use of expression vectors Expression of proteins is used to –Produce large amounts of proteins –Study the biological function of different proteins Various organisms (systems) can be used to express foreign proteins
inducer
Gene Expression Systems Bacterial expression systems –E. coli –Bacillus subtilis Yeast expression systems –S. cerevisiae –Pichia pastoris Mammalian expression systems –Primary cells or cell lines (human, mouse) –Transgenic animals Insect expression system Plant expression system
Bacterial Gene Expression
Lac Operon
Eukaryotic Gene Expression
Putting genes into cells Transformation - Bacteria, Yeast Transfection - metazoa –Chemical –Electrical –Viral infection
Expression Vector: promoter | RBS …. ATG | MCS | Tag | stop
Expression of Fusion Proteins Tagged proteins for purification –GST –6xHis, myc epitope, Flag Reporter proteins –Green fluorescent protein (GFP) –LacZ
Animal Cloning NT Nuclear Transfer
Modern Methods Recombinant Proteins Molecular Evolution RNAi Technology RT-PCR
Production of Recombinant Proteins E. coli expression system Other bacterial expression systems Eukaryotic expression systems
Molecular Evolution Phage Display Technology Directed Evolution
Phage Display Technology Principle of phage display technology Applications of phage display technology
1 Pan library with immobilized targets 2 Wash off unbound phage 3 Elute bound phage 4 Amplify overnight 5 Pan eluted phage 6 Isolate individual colony 3-4 © 2004 Montarop Yamabhai
Phage ELISA HRP
Directed Evolution Principle of directed evolution techniques Applications of directed evolution techniques
RNAi RNA interference A mechanism for RNA-guided regulation of gene expression in which double-stranded RNA inhibits the expression of genes with complementary nucleotide sequences A powerful tool in molecular biology with many potential applications in medicine and biotechnology
History First observed in plants, but the cause was not understood. Craig C. Mello and Andrew Fire's 1998 Nature paper reported a potent gene silencing effect after injecting double-stranded RNA into C. elegans. They observed that neither mRNA nor antisense RNA injections had an effect on protein production, but double-stranded RNA successfully silenced the targeted gene. This work represented the first identification of the causative agent of a previously inexplicable phenomenon. They were awarded the Nobel Prize in Physiology or Medicine in 2006 for their work.
The most interesting aspects of RNAi are the following: dsRNA, rather than single-stranded antisense RNA, is the interfering agent; it is highly specific; it is remarkably potent (only a few dsRNA molecules per cell are required for effective interference); the interfering activity (and presumably the dsRNA) can cause interference in cells and tissues far removed from the site of introduction
RNAi focus from Nature Reviews: RNA interference – Animations PDF
Real-Time PCR Here "RT-PCR" means real-time PCR, not reverse transcription PCR Both quantitative and qualitative Two types of detection –Double-stranded DNA dyes such as SYBR Green –Fluorescent reporter probes: more specific, more expensive
Ct Cycle threshold
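The cycle threshold (Ct) is the cycle at which the fluorescence signal first crosses a chosen threshold. The Python sketch below estimates a fractional Ct by linear interpolation between the two cycles that bracket the threshold. The interpolation scheme, the function name and the example readings are all assumptions for illustration; instrument software may use a different method.

```python
from typing import Optional


def cycle_threshold(fluorescence: list[float], threshold: float) -> Optional[float]:
    """Fractional cycle at which fluorescence first reaches the threshold.

    fluorescence[0] is the reading after cycle 1. Returns None if the
    threshold is never crossed from below within the run.
    """
    for idx in range(1, len(fluorescence)):
        prev, curr = fluorescence[idx - 1], fluorescence[idx]
        if prev < threshold <= curr:
            # prev was read at cycle idx, curr at cycle idx + 1.
            return idx + (threshold - prev) / (curr - prev)
    return None


if __name__ == "__main__":
    readings = [1.0, 2.0, 4.0, 8.0, 16.0]  # made-up values doubling each cycle
    print(cycle_threshold(readings, threshold=6.0))
```

A sample with more starting template crosses the threshold earlier, so a lower Ct indicates a higher initial amount of target.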
Real-time PCR Animation: PCR and real-time PCR principles and comparison
Real Time PCR Tutorial by Dr Margaret Hunt, University of South Carolina, September 5, 2006
Real-Time PCR Vs. Traditional PCR tutorial from Applied Biosystems (link opens a PDF document)
The study of protein-protein interactions Yeast two-hybrid system Phage display technology Pull-down experiments