Optimal Combinatorial Biology & Genome Engineering. BU BME retreat, 23-Jun-2004, 9:45-10:30, Seacrest, N. Falmouth, MA. Thanks to: Broad Inst., DARPA-BioComp, DOE-GTL, EU-MolTools, NHGRI-CEGS, NHLBI-PGA, NIGMS-CECBSR, PhRMA, Lipper Foundation; Agencourt, Ambergen, Atactic, BeyondGenomics, Caliper, Genomatica, Genovoxx, Helicos, MJR, NEN, Nimblegen, ThermoFinnigan, Xeotron/Invitrogen. For more info see: arep.med.harvard.edu
Exponential technologies. Shendure J, Mitra R, Varma C, Church GM (May 2004) Advanced Sequencing Technologies: Methods & Goals. Nature Reviews Genetics 5, 335-344. ABI
Programming cells with DNA vs. Digital computers simulating cells. Cells simulating digital computers. Drugs & devices simulating human systems.
Engineering complex systems (comparative genomics). Stedman et al. (2004) [Masticatory] Myosin gene mutation correlates with anatomical changes in the human lineage. Nature 428, 415-418
DNA RNA Proteins Metabolites Replication rate Environment Biosystems Engineering Integrating Measures & Models Microbes Cancer & stem cells Darwinian optima In vitro replication Small multicellular organisms RNAi Insertions SNPs interactions
Now that we have 200 genomes, why sequence? Once per organism: phylogenetic footprinting, biodiversity; RNA splicing & chromatin modification patterns; cell-lineage during development; NA "aptamers" & Ab for any protein. Once per person: preventative medicine & genotype–phenotype associations. Frequently: cancer (mutation sets for individual clones, loss-of-heterozygosity); B & T-cell receptor diversity (temporal profiling, clinical); new & old pathogen "weather map", biowarfare sensors; DNA computing & lab selections. Shendure et al. 2004 Nature Rev Gen 5, 335.
Why 'single molecule' sequencing? (1) Single-cell analyses, e.g. preimplantation (PGD). (2) Co-occurrence on a molecule, complex, or cell, e.g. RNA splice-forms. (3) Cost: $1K-100K "personal genomes" http://grants.nih.gov/grants/guide/rfa-files/RFA-HG-04-003.html (4) Precision: counting 10^9 RNA tags (to reduce variance) (~5e5 RNAs per human cell). Tags at fixed costs: EST 5e3; SAGE 5e4; MPSS 5e6; Polony-FISSeq (polymerase colony) 5e9 (goal).
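The precision argument in point (4) can be made concrete with a small sketch. It assumes tag counts are Poisson-distributed (an assumption, not stated on the slide), so the relative standard deviation of a count with mean n is 1/sqrt(n). The pairing of tag totals with methods follows the slide's "fixed costs" row; the function names are illustrative only.

```python
import math

RNAS_PER_CELL = 5e5  # from the slide: ~5e5 RNAs per human cell

# Tags counted at fixed cost, as read from the slide.
TAGS_AT_FIXED_COST = {
    "EST": 5e3,
    "SAGE": 5e4,
    "MPSS": 5e6,
    "Polony-FISSeq (goal)": 5e9,
}


def expected_tags(copies_per_cell: float, total_tags: float) -> float:
    """Expected tag count for a transcript present at copies_per_cell."""
    return total_tags * copies_per_cell / RNAS_PER_CELL


def counting_cv(copies_per_cell: float, total_tags: float) -> float:
    """Relative standard deviation of the count, assuming Poisson sampling."""
    return 1.0 / math.sqrt(expected_tags(copies_per_cell, total_tags))


if __name__ == "__main__":
    for method, tags in TAGS_AT_FIXED_COST.items():
        n = expected_tags(1, tags)
        cv = counting_cv(1, tags)
        print(f"{method:22s} expected tags for a 1-copy/cell RNA: {n:g}, CV: {cv:.3g}")
```

Under these assumptions, a one-copy-per-cell RNA is usually missed entirely at EST depth, while the 5e9-tag goal would count it about 1e4 times.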
Polony Fluorescent In Situ Sequencing Libraries Greg Porreca Abraham Rosenbaum 1 to 100kb Genomic L R M L R PCR bead Sequencing primers Selector bead 2x20bp after MmeI Dressman et al PNAS 2003 emulsion
Cleavable dNTP-Fluorophore (& terminators): reduce or photo-cleave. Mitra, RD, Shendure, J, Olejnik, J, Olejnik, EK, and Church, GM (2003) Fluorescent in situ Sequencing on Polymerase Colonies. Analyt. Biochem. 320:55-65
Polony-FISSeq: up to 2 billion beads/slide. White = Fe-core pixels; Cy5 primer (570nm); Cy3 dNTP (666nm). Jay Shendure
Polony FISSeq Stats (Shendure). # of bases sequenced (total): 23,703,953; # of bases sequenced (unique): 73; Avg fold coverage: 324,711X; Pixels used per bead (analysis): ~3.6; Read length per primer: 14-15 bp; Insertions: 0.5%; Deletions: 0.7%; Substitutions (raw): 4e-5; Throughput: 360,000 bp/min. Current capillary sequencing: 1400 bp/min (600X speed/cost ratio, ~$5K/1X). (This may omit: PCR, homopolymer, context errors.)
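As a quick consistency check on these stats, the sketch below recomputes fold coverage and the raw throughput ratio from the slide's own numbers. The function names are illustrative. Note that the raw bp/min ratio comes out lower than the slide's 600X, which is quoted as a combined speed/cost ratio rather than speed alone.

```python
def fold_coverage(total_bases: int, unique_bases: int) -> float:
    """Average number of times each unique base was read."""
    return total_bases / unique_bases


def throughput_ratio(new_bp_per_min: float, old_bp_per_min: float) -> float:
    """Raw speed ratio between two sequencing platforms."""
    return new_bp_per_min / old_bp_per_min


if __name__ == "__main__":
    cov = fold_coverage(23_703_953, 73)
    ratio = throughput_ratio(360_000, 1_400)
    print(f"fold coverage: {cov:,.0f}X")
    print(f"raw throughput ratio vs capillary: {ratio:.0f}X")
```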
CD44 Exon Combinatorics (Zhu & Shendure) Alternatively Spliced Cell Adhesion Molecule Specific variable exons are up-or-down-regulated in various cancers (>2000 papers) v6 & v7 enable direct binding to chondroitin sulfate, heparin… Zhu,J, et al. Science. 301:836-8.
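Metabolite X_i: membrane transport (V_transport), synthesis (V_syn), degradation (V_deg), growth (V_growth). Growth: c_1 X_1 + c_2 X_2 + ... + c_m X_m -> Biomass. At steady state X_i = const., so the fluxes through X_i sum to zero (Σ v_j = 0). Flux ratios at each branch point yield the optimal polymer composition for replication.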
Biomass composition: X_i = metabolites; c_i = coeff. in growth reaction. Optimize flow from input C, N, P to biomass. Biomass components: AcCoA, CoA, ATP, FAD, NADH, GTP, CTP, UTP, SucCoA, dACGT, Trp, Leu, Ala, Arg, Gly, Cys, Ser, Asn, Asp, His, Val, Glu, Gln, Phe, Pro, Ile, Lys, Met, Tyr, Thr. Edwards & Palsson, PNAS 2000, BMC Bioinf. 2000
Minimization of Metabolic Adjustment (MoMA). Linear Programming (LP) to find optima, Quadratic (QP) to find closest points. x, y are two of the 100s of flux dimensions. [Figure labels: wild-type and mutant feasible flux polyhedra; wild-type optimum; mutant optimum; mutant initially (closest point); objective function = growth flux hyperplanes.] Segre, Vitkup, & Church PNAS 99: 15112-7
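The two optimizations named on this slide can be written out as follows. This is a sketch in standard flux-balance notation, not copied from the slide: the stoichiometric matrix S, the flux bounds α_j and β_j, the wild-type optimum w, and the knockout set K are symbols introduced here for illustration.

```latex
% LP (flux balance): find the growth-optimal flux vector
\max_{v}\; v_{\mathrm{growth}}
\quad \text{s.t.} \quad S\,v = 0, \qquad \alpha_j \le v_j \le \beta_j

% QP (MoMA): find the mutant flux vector closest to the wild-type optimum w
\min_{x}\; \sum_{j} (x_j - w_j)^2
\quad \text{s.t.} \quad S\,x = 0, \qquad \alpha_j \le x_j \le \beta_j, \qquad x_k = 0 \;\; \forall k \in K
```

The LP objective is linear (a growth-flux hyperplane), so its optimum sits on the boundary of the feasible polyhedron. The QP objective is a Euclidean distance, so its solution is the projection of w onto the mutant's smaller feasible polyhedron.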
Reproducibility of mass competition. Correlation between two selection experiments. Badarinarayana, et al. Nature Biotech. 19: 1060
Competitive growth data. χ² p-values: 4x10^-3 (LP), 1x10^-5 (QP). Position effects. Novel redundancies. On minimal media: negative; small selection effect. Hypothesis: next optima are achieved by regulation of activities.
Synthetic testing of DNA motif combinations 1.3 2.4 (1.3 in argR) 1.1 1.3 0.7 2.5 0.2 1.4 1.4 3.5 RNA Ratio (motif- to wild type) for each flanking gene Bulyk, McGuire,Masuda,Church Genome Res. 14:201–208
Systems Biology Loop Synthesis / Perturbation Model Experimental design (Systematic) Data Proteasome targeting Genome Engineering
Engineering BioSystems
Perturbations | Action | Specificity | %KO | "Design"
Small molecules (drugs) | Fast | Varies | Varies | Hard
Antibodies | Fast | Varies | Varies | Hard
RNAi | Slow | Varies | Medium | OK
Insertion "traps" | Slow | Yes | Varies | Random
Proteasome targeting | Fast | Excellent | Medium | Easy
Homologous recombination | Slow | Perfect | Complete | Easy
Programming proteasome targeting Janse, DM, Crosas,B Finley,D & Church, GM (2004) Localization to the Proteasome is Sufficient for Degradation.
Synthetic Genomes & Proteomes. Why? Test or engineer cis-DNA/RNA-elements. Access to any protein (complex) including post-transcriptional modifications. Affinity agents for the above. Mass spectrometry standards, protein design. Utility of molecular biology DNA-RNA-Protein in vitro "kits" (e.g. PCR, SP6, Roche). Toward these goals design a chassis: 115 kbp genome. 150 genes. Nearly all 3D structures known. Comprehensive functional data.
PURE translation utility (yet room for improvement). Removing tRNA-synthetases, RNases & proteases makes feasible: optimal mRNA structure & codon usage. Lee et al. 2004 J Immunol Methods. 284:147-57. Selection of scFvs specific for HBV DNA polymerase using ribosome display. Forster et al. 2003 Programming peptidomimetic syntheses by translating genetic codes designed de novo. PNAS 100:6353-7. Klammt et al. 2004 Eur J Biochem. 271:568-80. High level cell-free expression & specific labeling of integral membrane proteins. Shimizu et al. 2001 Nat Biotechnol. 19:751-5. Cell-free translation reconstituted with purified components.
in vitro genetic codes. [Figure: mRNA 5'...3' with its translated peptide, and a codon table (second base U, C, A, G) assigning mS, yU and eU to reprogrammed codons.] Forster, et al. (2003) PNAS 100:6353-7. 80% average yield per unnatural coupling. bK = biotinyllysine, mS = O-methylserine, eU = 2-amino-4-pentenoic acid, yU = 2-amino-4-pentynoic acid
Mirror world : resistant to enzymes, parasites, predators L-amino acids & D-ribose (rNTPs, dNTPs) Transition: EF-Tu, peptidyl transferase, DNA-ligase D-amino acids & L-ribose (rNTPs, dNTPs) Dedkova, et al. (2003) Enhanced D-amino acid incorporation into protein by modified ribosomes. J Am Chem Soc 125, 6616-7
Forster & Church. Oligos for 150 & 776 synthetic genes (for E. coli minigenome & M. mobile whole genome respectively)
Up to 760K Oligos/Chip: 18 Mbp for $700 raw (6-18K genes). <1K: Oxamer, electrolytic acid/base; 8K: Atactic/Xeotron/Invitrogen, photo-generated acid (Sheng, Zhou, Gulari, Gao, U. Houston); 24K: Agilent, ink-jet standard reagents; 48K: Febit; 100K: Metrigen; 380K: Nimblegen, photolabile 5' protection (Nuwaysir, Smith, Albert). Tian, Gong, Church
Improve DNA Synthesis Cost. Synthesis on chips in pools is 5000X less expensive per oligonucleotide, but amounts are low (1e6 molecules rather than the usual 1e12) & bimolecular kinetics slow with the square of the concentration decrease! Solution: amplify the oligos, then release them. 10 + 50 + 10 => ss-70-mer (chip) => ds-90-mer (20-mer PCR primers with restriction sites at the 50-mer junctions) => ds-50-mer. Tian, Gong, Sheng, Zhou, Gulari, Gao, Church
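The kinetics point can be checked with a few lines of arithmetic. This sketch assumes both annealing strands are diluted by the same factor in the same reaction volume, so that rate = k[A][B] scales with the square of the dilution, and it assumes ideal doubling per PCR cycle. Both are simplifications, and the function names are illustrative.

```python
import math


def bimolecular_rate_fold_change(concentration_fold_change: float) -> float:
    """Rate = k[A][B]: if both partners change by the same factor, rate changes by its square."""
    return concentration_fold_change ** 2


def pcr_cycles_to_restore(chip_molecules: float, usual_molecules: float) -> int:
    """Ideal-doubling PCR cycles needed to go from chip amounts to usual amounts."""
    return math.ceil(math.log2(usual_molecules / chip_molecules))


if __name__ == "__main__":
    chip, usual = 1e6, 1e12  # molecules per oligo, from the slide
    print(f"annealing rate fold change: {bimolecular_rate_fold_change(chip / usual):.0e}")
    print(f"ideal PCR cycles to recover: {pcr_cycles_to_restore(chip, usual)}")
```

A million-fold drop in amount therefore costs roughly a trillion-fold in bimolecular rate, which is why amplifying the oligos before assembly matters more than the raw amounts alone suggest.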
Improve DNA Synthesis Accuracy via mismatch selection Tian & Church
Genome assembly. Challenges: 1. Tandem, inverted and dispersed repeats (hierarchical assembly, size-selection and/or scaffolding). 2. Reduce mutations (goal <1e-6 errors) to reduce # of intermediates. 3. >30 kbp homologous recombination (Nick Reppas). Stemmer et al. 1995. Gene 164:49-53. Single-step assembly of a gene and entire plasmid from large numbers of oligodeoxyribonucleotides. Fragment sizes: 50, 75, 125, 225, 425, 825, … 100*2^(n-1)
All 30S-Ribosomal-protein DNAs & mRNAs synthesized in vitro. [Gel figure: lanes M, 1-21; DNA templates and RNA transcripts; size markers 0.5kb and 0.3kb; s19; Nimblegen, Xeotron/Atactic, wild-type DNA templates.] Tian, Gong, Sheng, Zhou, Gulari, Gao, Church
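The size series on this slide (50, 75, 125, 225, 425, 825) is consistent with pairwise joining in which two fragments of length L overlap by 25 nt, giving 2L - 25 per round. The 25-nt overlap is inferred from the numbers, not stated on the slide, and the function names below are illustrative. The slide's closing term 100*2^(n-1) roughly tracks the same doubling for later rounds.

```python
def assembly_sizes(start: int = 50, overlap: int = 25, rounds: int = 5) -> list[int]:
    """Fragment length after each round of pairwise joining with a fixed overlap."""
    sizes = [start]
    for _ in range(rounds):
        sizes.append(2 * sizes[-1] - overlap)
    return sizes


def rounds_to_reach(target_bp: int, start: int = 50, overlap: int = 25) -> int:
    """Number of pairwise-joining rounds until the fragment is at least target_bp long."""
    size, rounds = start, 0
    while size < target_bp:
        size = 2 * size - overlap
        rounds += 1
    return rounds


if __name__ == "__main__":
    print(assembly_sizes())
    # 115 kbp is the chassis genome size quoted earlier in the talk.
    print(rounds_to_reach(115_000))
```

Under this idealized doubling, each round is one more set of intermediates that must be error-free, which is the link to challenge 2 above.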
Improving synthesis accuracy 9-fold (Tian & Church)
Method | Total bp | # Clones | Transition | Transversion | Deletion | Addition | Bp/error
Hyb selection, PCR | 23641 | 9 | 7 | 3 | 5 | 2 | 1391
Gel selection, PCR | 24546 | 35 | 28 | 12 | 11 | 3 | 455
No selection, ligation + PCR | 6093 | 25 | 6 | 6 | 22 | 4 | 160
No selection, PCR | 9243 | 21 | 25 | 13 | 19 | 1 | 159
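The "9-fold" headline can be reproduced from the table's error counts. In the sketch below the per-row totals and error counts are as parsed from the slide's flattened table, with column boundaries chosen so that total bp divided by summed errors matches the printed bp/error. The error-free fraction additionally assumes errors are independent and uniformly distributed, which the slide does not claim.

```python
# (total bp sequenced, (transitions, transversions, deletions, additions))
ROWS = {
    "Hyb selection, PCR": (23641, (7, 3, 5, 2)),
    "Gel selection, PCR": (24546, (28, 12, 11, 3)),
    "No selection, ligation + PCR": (6093, (6, 6, 22, 4)),
    "No selection, PCR": (9243, (25, 13, 19, 1)),
}


def bp_per_error(total_bp: int, errors: tuple[int, ...]) -> float:
    """Average number of base pairs sequenced per observed error."""
    return total_bp / sum(errors)


def error_free_fraction(bp_per_err: float, length_bp: int) -> float:
    """Expected fraction of clones of length_bp with no errors (independent errors assumed)."""
    return (1.0 - 1.0 / bp_per_err) ** length_bp


if __name__ == "__main__":
    for method, (total, errs) in ROWS.items():
        rate = bp_per_error(total, errs)
        frac = error_free_fraction(rate, 500)
        print(f"{method:30s} {rate:7.0f} bp/error, error-free 500-mers: {frac:.1%}")
    best = bp_per_error(*ROWS["Hyb selection, PCR"])
    base = bp_per_error(*ROWS["No selection, PCR"])
    print(f"improvement: {best / base:.1f}-fold")
```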
Extreme mRNA makeover for protein expression in vitro. RS-2, 4, 5, 6, 9, 10, 12, 13, 15, 16, 17, and 21 detectable initially. RS-1, 3, 7, 8, 11, 14, 18, 19, 20 initially weak or undetectable. Solution: Iteratively resynthesize all mRNAs with less mRNA structure. Western blot based on His-tags. Tian & Church
Improve DNA Synthesis Accuracy. Synthesis on a chip: pools of "construction" ~50-mers and two complementary "selection" ~26-mers (Left & Right). 10 50 10 => ss-70-mer (chip) => ds/ss-50-mer (amplif/restrict). 10 26 10 => ss-56-mer (chip) => ss-76-mer (amplif/avidin), using 20-mer PCR primers (one biotinylated). Tian, Gong, Sheng, Zhou, Gulari, Gao, Church
Improve DNA Synthesis Accuracy via D-HPLC or MutS. Smith & Modrich (1997) PNAS 94: 6847–50. Removal of polymerase-produced mutant sequences from PCR products. MutHLS cleaves at GATC near mismatches. Lowers error rate from 6e-6 to 6e-7. Bellanne-Chantelot et al. (1997) Mutat Res. 382:35-43. Search for DNA sequence variations using a MutS-based technology. Mulligan & Tabone (2002) US Patent 6,664,112. Methods for improving the sequence fidelity of synthetic double-stranded oligonucleotides.