Presentation is loading. Please wait.

Presentation is loading. Please wait.

Life with four billion atoms Tom Knight Ginkgo Bioworks.

Similar presentations


Presentation on theme: "Life with four billion atoms Tom Knight Ginkgo Bioworks."— Presentation transcript:

1 Life with four billion atoms Tom Knight Ginkgo Bioworks

2 Energy in the 1800s The steam engine has given more to science than
science has given to the steam engine Lord Kelvin

3 Information in the 1900s

4

5 Core of simple universal parts (central metabolites) (Energy carriers)
Catabolism Anabolism Natural Complexity (Food) Specified Complexity (Organisms) Architecture – raw materials to building materials to new construction Design Information (Genome) Core of simple universal parts (central metabolites) (Energy carriers)

6 Harold Morowitz 1962 & 1984

7 Some history… Confusion over what PPLO/Mycoplasma were
“The Microbe of pleuorpneumonia” Nocard 1896 1932 isolation of “PPLO” Koch postulates. 1958 Klieneberger-Nobel: free living bacterial species Morowitz 1962 SciAm: “the smallest living cell” 1980 Gilbert effort to sequence M. capricolum 1982 Morowitz “complete understanding of life” 1996 Fraser et al. M. genitalium sequence 1999 Hutchison et al. Minimal genome set for M. genitalium 2009 Gibson & Lartigue: Genome transplantation 2012 Serrano et al. Comprehensive model

8 Complexity, minimality & simplicity
Complex systems have many parts Reducing the part count leads generally to simpler system (minimal part count) Extreme part count reduction leads to shared parts These systems are ironically less simple There is an optimal part count for modular design Conservation of complexity Stratified design Structure function

9 Complexity Reduction 100’s of OS calls 100 statements
User Application software Operating system, user interface Programming language Instruction set architecture Virtual machine Computer hardware design Functional computing units Logic synthesis Logic gates Circuit design Transistors Mask geometry Fabrication technologies Semiconductor physics Quantum physics 100’s of OS calls 100 statements 100’s of instructions 10’s of units 10’s of gate types 4 types of transistors 15 mask layers 6 materials

10 Biology is modular and abstract
Complexity Reduction Good News: Biology is modular and abstract Evolution needs modular design as much as we do We can discover the modular designs, modify them, and use them

11 Engineered Simple Organisms
modular understood malleable low complexity Start with a simple existing organism Remove structure until failure Rationalize the infrastructure Learn new biology along the way The chassis and power supply for our computing

12 Relative Complexity Alive Autotroph Log Genome Size, base pairs
Mycoplasma genitalium (580 kB) Mesoplasma florum (793 kB) S. cerevisiae (12 MB) T7 Phage (36 kB) Lilly (12 GB) E. coli (4.6 MB) Human (3.3 GB) Plasmid Gene 100 1K 10K 100K 1M 10M 100M 1G 10G 100G Alive Autotroph Log Genome Size, base pairs

13 Choosing an organism: Mesoplasma florum
Isolation from the flower of a lemon tree, Florida (McCoy84) Safe BSL-1 organism -- an insect commensal Not a human or plant or animal pathogen No growth at 37C Fast growing 40 minute vs. six hours for doubling in M. genitalium Convenient to work with Facultative anaerobe Small genome: 793,244 bp 682 coding regions

14 Tomographic EM of Me. florum
Grant Jensen – Caltech 3-D TEM image of Mesoplasma florum Reconstructed from angled TEM images 300 x 400 nm 6 nm membranes 5 nm ribosomes False colored DNA

15 How many atoms? Cell diameter is about 400 nm
Approximately 2000 atoms in diameter About four billion atoms 70% of these are in water molecules 1.2 billion atoms in biomolecules DNA is about 40 million atoms, 3%

16

17 Genome characteristics
base pairs 26.52% G + C 682 protein coding regions UGA for tryptophan No CGG codon or corresponding tRNA Classic circular genome oriC, terminator region, gene orientation 39 stable RNAs 29 tRNAs 2x 16S, 23S, 5S RNAse-P, tmRNA, SRP One inactive insertion sequence Gene direction largely oriented with replication fork

18 Understand the metabolism
Identify major metabolic pathways by finding critical genes coding for known enzymes Predict necessary enzymes which may not have been found Evaluate the list of unknown function genes for candidates Build the major metabolic pathway map of the organism Consider elimination of entire pathways

19 Identified Metabolic Pathways in Mesoplasma florum
G. Fournier 02/23/04 PTS II System Mfl519, Mfl565 sucrose trehalose xylose beta-glucoside glucose unknown ribose ABC transporter Mfl516, Mfl527, Mfl187 Mfl500 Mfl669 Mfl009, Mfl033, Mfl318, Mfl312 fructose Mfl214, Mfl187 Mfl619, Mfl431, Mfl426 Mfl181 ATP Synthase Complex Mfl497 Mfl515, Mfl526 Mfl499 Mfl317?, Mfl313? Mfl617, Mfl430, Mfl313? Mfl425, Mfl615, Mfl034, Mfl009, Mfl011, Mfl012, ? Mfl112, Mfl113, Mfl114, Mfl109, Mfl110, Mfl111, Mfl115, Mfl116 Mfl666, Mfl667, Mfl668 glucose-6-phosphate chitin degradation Mfl558 Mfl347, ATP ADP sn-glycerol-3-phosphate ABC transporter Pentose-Phosphate Pathway Glycolysis Mfl023, Mfl024, Mfl025, Mfl026 L-lactate, acetate Mfl223, Mfl640, Mfl642, Mfl105, Mfl349 glyceraldehyde-3-phosphate Mfl254, Mfl180, Mfl514, Mfl174, Mfl644, Mfl200, Mfl504, Mfl578, Mfl577, Mfl502, Mfl120, Mfl468, Mfl175, Mfl259 Mfl039, Mfl040, Mfl041, Mfl042, Mfl043, Mfl044, Mfl596, Mfl281 Lipid Synthesis unknown substrate transporters Mfl046, Mfl052 Mfl384, Mfl593, fatty acid/lipid transporter ribose-5-phosphate acetyl-CoA Mfl230, Mfl382, Mfl286, Mfl663, Mfl465, Mfl626 Mfl590, Mfl591 Mfl325,Mfl482 Mfl099, Mfl474,Mfl315, x13+ PRPP cardiolipin/ phospholipids membrane synthesis Purine/Pyrimidine Salvage phospholipid membrane Identified Metabolic Pathways in Mesoplasma florum Mfl074, Mfl075, Mfl276, Mfl665, Mfl463, Mfl144, Mfl342, Mfl343, Mfl419, Mfl676, Mfl635, Mfl119, Mfl107, Mfl679, Mfl306, Mfl648, Mfl170, Mfl195, Mfl372 Mfl076, Mfl121, Mfl639, Mfl528, Mfl530, Mfl529, Mfl547, Mfl375 Mfl143, Mfl466, Mfl198, Mfl556, Mfl385 niacin? Mfl038, Mfl065, Mfl063, Mfl388 xanthine/uracil permease Pyridine Nucleotide Cycling variable surface lipoproteins Mfl413, Mfl658 Mfl444, Mfl446, Mfl451 Mfl340, Mfl373, Mfl521, Mfl588 Mfl583, Mfl288, Mfl002, Mfl678, Mfl675, Mfl582, Mfl055, Mfl328 Mfl150, Mfl598, Mfl597, Mfl270, Mfl649 hypothetical lipoproteins DNA Polymerase RNA Polymerase x22 competence/ DNA transport Mfl047, Mfl048, Mfl475 Electron Carrier Pathways DNA RNA K+, Na+ transporter Mfl027, Mfl369 Flavin Synthesis Nfl289, Mfl037, Mfl653, Mfl193 Mfl064, Mfl178 NAD+ Mfl165, Mfl166 ribosomal RNA transfer RNA degradation FMN, FAD Mfl193 Mfl541, Mfl005, Mfl647, Mfl258, Mfl329, Mfl374, Mfl563, Mfl548, Mfl088, Mfl231, Mfl209 Mfl282, Mfl387, Mfl682, Mfl014, Mfl196,Mfl156, Mfl029, Mfl412, Mfl540, Mfl673, Mfl077, rnpRNA Mfl283, Mfl334 malate transporter? hypothetical transmembrane proteins NADP Mfl378 x57 Ribosome metal ion transporter Signal Recognition Particle (SRP) riboflavin? Mfl356, Mfl496, Mfl217 messenger RNA tRNA aminoacylation NADPH NADH protein secretion (ftsY) srpRNA, Mfl479 23sRNA, 16sRNA, 5sRNA, Mfl122, Mfl149, Mfl624, Mfl148, Mfl136, Mfl284, Mfl542, Mfl132, Mfl082, Mfl127, Mfl561, Mfl368.1, Mfl362.1, Mfl129, Mfl586, Mfl140, Mfl080, Mfl623, Mfl137, Mfl492, Mfl406 Mfl608, Mfl602, Mfl609, Mfl493, Mfl133, Mfl141, Mfl130, Mfl151, Mfl139, Mfl539, Mfl126, Mfl190, Mfl441, Mfl128, Mfl125, Mfl134, Mfl439, Mfl227, Mfl131, Mfl123, Mfl638, Mfl396, Mfl089, Mfl380, Mfl682.1, Mfl189, Mfl147, Mfl124, Mfl135, Mfl138, Mfl601, Mfl083, Mfl294, Mfl440? cobalt ABC transporter Mfl237 Mfl152, Mfl153, Mfl154 proteins Formyl-THF Synthesis Export Mfl613, Mfl554, Mfl480, Mfl087, Mfl651, Mfl268, Mfl366, Mfl389, Mfl490, Mfl030, Mfl036, Mfl399, Mfl398, Mfl589, Mfl017, Mfl476, Mfl177, Mfl192, Mfl587, Mfl355 Mfl086, Mfl162, Mfl163, Mfl161 phosphonate ABC transporter met-tRNA formylation Mfl571, Mfl572 Mfl060, Mfl167, Mfl383, Mfl250 protein translocation complex (Sec) Mfl057, Mfl068, Mfl142,Mfl090, Mfl275 Mfl409, Mfl569 phosphate ABC transporter Mfl233, Mfl234, Mfl235 degradation THF? Mfl186 formate/nitrate transporter amino acids intraconversion? Mfl094, Mfl095, Mfl096, Mfl097, Mfl098 Mfl418, Mfl404, Mfl241, Mfl287, Mfl659, Mfl263, Mfl402, Mfl484, Mfl494, Mfl210, tmRNA Mfl509, Mfl510, Mfl511 spermidine/putrescine ABC transporter oligopeptide ABC transporter Mfl016, Mfl664 putrescine/ornithine APC transporter Mfl015 Mfl182, Mfl183, Mfl184 Mfl019 Mfl605 arginine/ornithine antiporter Mfl557 Mfl652 unknown amino acid ABC transporter glutamine ABC transporter lysine APC transporter alanine/Na+ symporter glutamate/Na+ symporter Amino Acid Transport

20 How Simple is this? Missing cell wall, outer membrane
Missing TCA cycle Missing amino acid synthesis Missing fatty acid synthesis One sigma factor Small number of dna binding proteins One insertion sequence, probably not active One restriction system (Sau3AI-like) CTG/CAG methylation (function?) Evidence for shared protein function MDH/LDH (Pollack 97 Crit rev microbiol 23:269)

21 Proteome Collaboration with Steve Tannenbaum / Yingwu Wang
2-D gels + MS spot ID LC/LC/MS/MS ID of trypsin digests

22 Proteome Results 180 spots picked and analyzed
Mudpit LC/LC/MS/MS also carried out 369 proteins identified by trypsin digestion and mass spec out of 682 annotated coding regions Transcription of 16S ribosomal RNA Stops don’t always stop Instead they cause frame shifts into other frames

23 Transposome insertions
Engineered tetM tetracycline resistance gene Promoter from Tn4001 tetM gene Outward directed primers for insertion site verification Unique I-SceI cut site In vitro binding of Tn5 transposase Electroporation of Tn5 transposome Selection with tetracycline Genomic DNA prep Cut with MboI frequent cutter & religate PCR with outward directed primers Sequence to identify insertion site Locate disrupted genes Alternatively: directly sequence from genomic DNA

24 Custom Transposon

25 Tn5 Transposomes Transposon design issues Codon usage Promoter design
Restriction site avoidance Cell transformation Electroporation voltage Selection medium

26 Transposome insertion events
2700 currently picked, saved, and sequenced 337 Essential Genes + 29 tRNA + 7 essential RNA genes Most are unsurprising surface lipoproteins and “unknown function” Some surprises: inessential ftsZ, mreB, many ribosomal & tRNA modification proteins, But the Sau3AI homologous restriction system appears essential About 80 unknown function genes (many GTPases) are essential Compare with Dybvig08 results on M. arthritidis French08 results on M. pulmonis Glass06 results on M. genitalium Ordered library of cells with 330 inactivated genes

27 Functional Categories of Essential Genes
Category Number DNA Replication 22 Cell Division 5 Transcription 12 Nucleotide Transport & metabolism 20 Protein translation 112 Post translational modification 6 Protein secretion 7 Lipid metabolism Coenzyme metabolism Energy production 11 Transporters 35

28 Transposome insertion events

29 Essential is not absolute
Multi-copy genes are not identified as essential NADH oxidase Acyl carrier protein Essentiality is defined by the culture conditions Genes with stability and reliability function are marked as dispensable DNA repair Chaparones Some RNA modification enzymes This is a much more important effect in larger genomes

30 Next in Analysis and Tools
Genome re-engineering with knock-in/knock-out Resequencing Whole cell metabolic models Plug and play modules for additional function Biosafety issues

31 Genome re-engineering tools
Plasmid: S. citri pSci2 PE protein based (Breton08) J70302 registry part, under test now recET recombination system (S. citri recT gene) J70007 recT part, DNA available, being mutated Chloramphenicol resistance gene cassette PheS mutant gene cassette Phase 1: Turn on recombination Insert PheS/cat cassette in the target location Select with Chloramphenicol Phase 2: Insert final modification Select with p-chlorophenylalanine Result: seamless editing of the chromosome

32 Resequencing Illumina sequencing is cheap and very high throughput
Relatively straightforward with a pre-existing scaffold sequence We get millions of reads of limited length (35-70 bp) Paired ends, bp fragments Bar coded samples can multiplex the sequencing effort Allows many samples to be sequenced in a single run Resequence the Mesoplasma florum genome De novo sequence for sixteen additional strains Collection of Robert Whitcomb De novo sequencing for several closely related species Mesoplasma entomophilum Mesoplasma lactucae

33 Whole cell modeling Approximately 2000 chemical reactions
About 300 small molecule species Faster implementations of stochastic models Faster computers Comparison against reality Mass spec quantitation of metabolites

34 Open Cell Modules Energy sources Arginine vs. glucose
Photosynthesis pathway Citric acid cycle (reverse?) Amino acid synthesis Add unnatural AAs Nucleotide synthesis Lipid synthesis Cofactor synthesis Measurement structures Environmental niche Halobacterium Sulfur reducer Temperature optimum Membrane export / import Membrane structure Sensing of chemical environment Flagellar motion Light sensing Light production Cell cycle control (sporulation) Biosafety modules

35 Biosafety Barriers Codon isolation
CGG containing genes are unusable inside TGA containing genes are unusable outside Extend this idea with more codons Pairs of required essential nutrients Reduces likelihood of gradual evolution of workarounds Explicit “kill” switches Otherwise benign chemicals lethal to this organism Shared function with critical metabolism reduces drift

36 Recoding the genome of entire organisms
Engineered phage Natural Phage X X X Engineered organism No transfer: UGA not translated No transfer: CGG not translated X Natural organism

37 Kit Part the genome Make Biobrick parts from each gene, tRNA, promoter, other part-like genome element Develop techniques for recombining parts into coherent modules YAC editing and assembly, e.g. Lambda RED or RecET recombination Enable the bootstrapping of cells based on the redesigned genome Liposome fusion, e.g. Learn the design rules for chromosomes

38 Thanks to… Harold Morowitz Greg Fournier Gail Gasparich Bob Whitcomb
Eric Lander Bruce Birren Nicole Stange-Thomann George Church Roger Brent Grant Jensen Yingwu Wang Samantha Burke PJ Steiner Nick Papadakis Ron Weiss Drew Endy Randy Rettberg Austin Che Reshma Shetty MIT Synthetic biology working group DARPA, NTT, NSF, Microsoft Colleagues at Ginkgo Bioworks

39 Thank you for your attention

40 Our Plan Completely understand a simple organism
Build excellent models and predictive tools Simplify the organism further Remove inessential genes Replace dual function genes with single function equivalents Abstract useful modules from other living systems Understand and create good models for these modules Selectively add these modules to the existing simple cell The code’s 4 billion years old; it’s time for a rewrite

41 The Mollicute Bibliome
Complete collection of mycoplasma related papers: 6,411 and counting Books and book chapters also Endnote file: mycoplasmas.enl Downloaded .pdfs for articles > 1995 Scanned articles and books, OCR Collaboration for “shallow semantic” understanding people.csail.mit.edu/tk/mfpapers/ user=meso, pass=meso


Download ppt "Life with four billion atoms Tom Knight Ginkgo Bioworks."

Similar presentations


Ads by Google