Presentation is loading. Please wait.

Presentation is loading. Please wait.

Biosimplicity: Engineering Simple Life Tom Knight MIT Computer Science and Artificial Intelligence Laboratory Simplicity is the ultimate sophistication.

Similar presentations


Presentation on theme: "Biosimplicity: Engineering Simple Life Tom Knight MIT Computer Science and Artificial Intelligence Laboratory Simplicity is the ultimate sophistication."— Presentation transcript:

1 Biosimplicity: Engineering Simple Life Tom Knight MIT Computer Science and Artificial Intelligence Laboratory Simplicity is the ultimate sophistication -- Leonardo da Vinci

2 Measuring Complexity Number of components Size of description Size of functional model Have you ever thought... about whatever man builds, that all of man's industrial efforts, all his calculations and computations, all the nights spent over working draughts and blueprints, invariably culminate in the production of a thing whose sole and guiding principle is the ultimate principle of simplicity? In any thing at all, perfection is finally attained, not when there is no longer anything to add, but when there is no longer anything to take away. -- Antoine de Sainte Exupery, in "Wind, Sand, and Stars"

3 Engineered Simple Organisms modular understood malleable low complexity Start with a simple existing organism Remove structure until failure Rationalize the infrastructure Learn new biology along the way The chassis and power supply for our computing

4 Some history… Confusion over what PPLO/Mycoplasma were  “The Microbe of pleuorpneumonia” Nocard 1896 1932 isolation of “PPLO.” Koch postulates. 1958 Klieneberger-Nobel identifies them as free living bacterial species Morowitz 1962 SciAm: “the smallest living cell” 1980 Gilbert effort to sequence M. capricolum 1982 Morowitz “complete understanding of life” 1996 Fraser et al. M. genitalium sequence 1999 Hutchison et al. Minimal genome set for M. genitalium

5 Relative Complexity 100 1K10K100K 1M 10M100M1G10G100G Gene Plasmid Mycoplasma genitalium (580 kB) Mesoplasma florum (793 kB) E. Coli (4.6 MB) S. Cerevisiae (12 MB) Human (3.3 GB) Lilly (12 GB) T7 Phage (36 kB) Log Genome Size, base pairs Alive Autotroph

6 From Giovannoni et al. 2005

7 Choosing an organism Safe  BSL-1 organism -- insect commensal Un-regulated  Not a crop plant or domesticated animal pathogen Fast growing  40 minute doubling time  vs. six hours for M. genitalium Convenient to work with  Facultative anaerobe Small genome Known sequence Complete annotation

8 The Mollicute Bibliome Complete collection of mycoplasma related papers: 6,418 and counting All books and book chapters also Endnote & Refworks Downloaded.pdfs for articles > 1995 Scanned articles and books, OCR with Abbyy Finereader Plans for a Google appliance search engine Collaboration for “shallow semantic” understanding people.csail.mit.edu/tk/mfpapers/ user=meso, pass=meso

9

10 Mesoplasma florum 1 u

11 Tomographic TEM Courtesy Jensen Lab, Caltech

12 Culture Medium 1161 Beef Heart infusion 4% Sucrose Fresh yeast culture broth 20% horse serum Penicillin Phenol red

13 Defined medium Sodium phosphate KCl Magnesium sulfate Glycerol Spermine Nicotinic acid Thiamine Riboflavin Pyridoxamine Thioctic acid Coenzyme A Amino acids (minus asp, glu) Guanine Uracil Thymine Adenine Glucose BSA Palmitic acid Oleic acid

14 Synthesis vs. Import Mycoplasma import virtually all small biochemical molecules Each import is done with a specific membrane protein – some are capable of importing a class Complexity is reduced if the import is simpler than the synthesis Example of the opposite:  Glutamine  Glutamic acid  Asparagine  Aspartic acid

15 Sequencing the genome First sequenced small portions of the genome to test that we had the correct species  Compared the results to Genbank entries  Sequenced PTS system gene, identical to reported sequence  Sequenced 16S rRNA (unreported)  Discovered identical to Mesoplasma entomophilum 16S rRNA sequence – probably the same species  Genbank entry Measured genome size with pulse field gels Sequenced 12% of genome to see what we were up against

16 PFGE of Mesoplasma genomic DNA 1.2% agarose 9C 6V/cm Ramped 90 - 120 sec 48 hours y yeast marker lambda marker me Mesoplasma entomophilum mf Mesoplasma florum ml Mesoplasma lactucae

17 I-CeuI digestion Special restriction enzymes cut only at 23S rRNA sites 5´..TAACTATAACGGTC CTAA^GGTAGCGA..3´ 3´..ATTGATATTGCCAG^GATT CCATCGCT..5´ Calculate the number of rRNA sites in the genome from the number of cut fragments

18 I-CeuI digests

19 rRNA sequences Degenerate primers The two rRNA sequences are different:

20

21 Library creation Randomly cut genomic DNA with EcoRI Shotgun cloned into pUC18 vector Sequenced the inserts (0.1 – 8 Kb) Sheared the genomic DNA with a needle End repaired Cloned into defective lambda phage vector Packaged vector into phage heads Infected E. coli cells with phage  ~ 40 Kb inserts

22 What we learned from partial sequence Almost all “old friends” Little or no extra junk Inter-gene sequences small (-4 to 30 bp) Little transcriptional control High AT vs. GC content – 27% average GC  20% in typical genes, even less in control regions  40% in rRNA regions

23 Origin of Replication

24 Whitehead Agreement Whitehead agreed in January 2002 to sequence the organism Estimated to take about two hours of time on their sequencers  “Sure, we can do it Tom, but what do we do with the rest of the day after the coffee break?” “How many other organisms like this are there?”  300 “Why don’t we sequence them all?” Good draft available November 2002 Gaps closed July 2003 -- final November 2003

25 Gap closure 9 gaps remained Long range PCR Primer walking One difficult sequence  Poly A region 16-17 bp long  Sequencing stuttered  Reprime with aaaaaaaaaaaaaaaag Repeat region 186 bp in surface lipoprotein  Give up on accurate sequence, PCR for length Final assembly verification

26 Genome characteristics 793281 base pairs 26.52% G + C 682 protein coding regions  UGA for tryptophan  No CGG codon or corresponding tRNA  Classic circular genome  oriC, terminator region, gene orientation 39 stable RNAs  29 tRNAs  2x 16S, 23S, 5S  RNAse-P, tmRNA, SRP

27 Standard Motifs -10 present usually very highly conserved  Often preceded by a “TG” 1-2 bp upstream Seldom a conserved -35 region RBS is standard Shine-Dalgarno Alternate RBS matches complementary region of 16S rRNA UAACAACAU (Loechel 91) Standard stem-loop terminators with loop TTAA  6-8 poly T tail in forward direction dnaA box TTATCCACA Four ribo-box sequences (Thi, Ile, Val, Guanine)

28

29 Understand the metabolism Identify major metabolic pathways by finding critical genes coding for known enzymes Predict necessary enzymes which may not have been found Evaluate the list of unknown function genes for candidates Build the major metabolic pathway map of the organism Consider elimination of entire pathways

30 Mfl214, Mfl187 Mfl516, Mfl527, Mfl187 Mfl500Mfl669 Mfl009, Mfl033, Mfl318, Mfl312 Mfl666, Mfl667, Mfl668 Mfl023, Mfl024, Mfl025, Mfl026 ribose ABC transporter glucose sucrosetrehalosexylose unknown fructose sn-glycerol-3-phosphate ABC transporter Mfl254, Mfl180, Mfl514, Mfl174, Mfl644, Mfl200, Mfl504, Mfl578, Mfl577, Mfl502, Mfl120, Mfl468, Mfl175, Mfl259 Mfl039, Mfl040, Mfl041, Mfl042, Mfl043, Mfl044, Mfl596, Mfl281 Glycolysis Mfl497Mfl515, Mfl526Mfl499Mfl317?, Mfl313?? Mfl181 beta-glucoside Mfl009, Mfl011, Mfl012, Mfl425, Mfl615, Mfl034, Mfl617, Mfl430, Mfl313? PTS II System Mfl519, Mfl565 chitin degradation Mfl223, Mfl640, Mfl642, Mfl105, Mfl349 Pentose-Phosphate Pathway glyceraldehyde-3-phosphate Mfl619, Mfl431, Mfl426 Mfl074, Mfl075, Mfl276, Mfl665, Mfl463, Mfl144, Mfl342, Mfl343, Mfl170, Mfl195, Mfl372 Mfl419, Mfl676, Mfl635, Mfl119, Mfl107, Mfl679, Mfl306, Mfl648, Mfl143, Mfl466, Mfl198, Mfl556, Mfl385 Mfl076, Mfl121, Mfl639, Mfl528, Mfl530, Mfl529, Mfl547, Mfl375 Purine/Pyrimidine Salvage glucose-6-phosphate ribose-5-phosphate Mfl413, Mfl658 xanthine/uracil permease DNARNA Mfl027, Mfl369 competence/ DNA transport DNA Polymerase degradation RNA Polymerase Mfl047, Mfl048, Mfl475 Mfl237 protein translocation complex (Sec) protein secretion (ftsY) srpRNA, Mfl479 Signal Recognition Particle (SRP) Ribosome Export Mfl182, Mfl183, Mfl184 Mfl509, Mfl510, Mfl511 Mfl652Mfl557 Mfl605Mfl019 Mfl094, Mfl095, Mfl096, Mfl097, Mfl098 Mfl015 spermidine/putrescine ABC transporter unknown amino acid ABC transporter glutamine ABC transporter oligopeptide ABC transporter arginine/ornithine antiporter lysine APC transporter alanine/Na+ symporter glutamate/Na+ symporter Mfl016, Mfl664 putrescine/ornithine APC transporter 23sRNA, 16sRNA, 5sRNA, Mfl122, Mfl149, Mfl624, Mfl148, Mfl136, Mfl284, Mfl542, Mfl132, Mfl082, Mfl127, Mfl561, Mfl368.1, Mfl362.1, Mfl129, Mfl586, Mfl140, Mfl080, Mfl623, Mfl137, Mfl492, Mfl406 Mfl608, Mfl602, Mfl609, Mfl493, Mfl133, Mfl141, Mfl130, Mfl151, Mfl139, Mfl539, Mfl126, Mfl190, Mfl441, Mfl128, Mfl125, Mfl134, Mfl439, Mfl227, Mfl131, Mfl123, Mfl638, Mfl396, Mfl089, Mfl380, Mfl682.1, Mfl189, Mfl147, Mfl124, Mfl135, Mfl138, Mfl601, Mfl083, Mfl294, Mfl440? proteins degradation Mfl418, Mfl404, Mfl241, Mfl287, Mfl659, Mfl263, Mfl402, Mfl484, Mfl494, Mfl210, tmRNA tRNA aminoacylation ribosomal RNAtransfer RNA messenger RNA Mfl029, Mfl412, Mfl540, Mfl014, Mfl196,Mfl156, Mfl282, Mfl387, Mfl682, Mfl673, Mfl077, rnpRNA Mfl563, Mfl548, Mfl088, Mfl258, Mfl329, Mfl374, Mfl541, Mfl005, Mfl647, Mfl231, Mfl209 Mfl613, Mfl554, Mfl480, Mfl087, Mfl651, Mfl268, Mfl366, Mfl389, Mfl490, Mfl030, Mfl036, Mfl399, Mfl398, Mfl589, Mfl017, Mfl476, Mfl177, Mfl192, Mfl587, Mfl355 Mfl086, Mfl162, Mfl163, Mfl161 amino acids Amino Acid Transport intraconversion? Mfl590, Mfl591 Lipid Synthesis Mfl230, Mfl382, Mfl286, Mfl663, Mfl465, Mfl626 fatty acid/lipid transporter Identified Metabolic Pathways in Mesoplasma florum Mfl384, Mfl593, Mfl046, Mfl052 L-lactate, acetate Mfl099, Mfl474,Mfl315, Mfl325,Mfl482 cardiolipin/ phospholipids membrane synthesis x22 Mfl444, Mfl446, Mfl451 variable surface lipoproteins hypothetical lipoproteins phospholipid membrane Mfl063, Mfl065, Mfl038, Mfl388 Mfl186 formate/nitrate transporter Mfl060, Mfl167, Mfl383, Mfl250 Formyl-THF Synthesis THF? x57 hypothetical transmembrane proteins met-tRNA formylation Mfl409, Mfl569 Mfl152, Mfl153, Mfl154 Mfl233, Mfl234, Mfl235 Mfl571, Mfl572 Mfl356, Mfl496, Mfl217 Mfl064, Mfl178 Nfl289, Mfl037, Mfl653, Mfl193 Mfl109, Mfl110, Mfl111, Mfl112, Mfl113, Mfl114, Mfl115, Mfl116 ATP Synthase Complex ATPADP phosphate ABC transporter phosphonate ABC transporter metal ion transporter Mfl583, Mfl288, Mfl002, Mfl678, Mfl675, Mfl582, Mfl055, Mfl328 Mfl150, Mfl598, Mfl597, Mfl270, Mfl649 acetyl-CoA cobalt ABC transporter Mfl165, Mfl166 K+, Na+ transporter Mfl378 malate transporter? Mfl340, Mfl373, Mfl521, Mfl588 Pyridine Nucleotide Cycling NAD+ Electron Carrier Pathways NADHNADPH NADP Flavin Synthesis riboflavin? FMN, FAD Mfl283, Mfl334 Mfl193 Mfl057, Mfl068, Mfl142,Mfl090, Mfl275 Mfl347, Mfl558 G. Fournier 02/23/04 x13+ unknown substrate transporters PRPP niacin?

31 How Simple is this? Missing cell wall, outer membrane Missing TCA cycle Missing amino acid synthesis Missing fatty acid synthesis One sigma factor Small number of dna binding proteins One insertion sequence, probably not active One restriction system (Sau3AI-like) CTG/CAG methylation (function?) Evidence for shared protein function  MDH/LDH (Pollack 97 Crit rev microbiol 23:269)

32 Minimal is not always simple Shared function of parts Overlapped genes Tradeoff of import vs. synthesis Example:  Television set design  Shared deflection coil, high voltage power supply, isolated filament supply  Three functions, a single circuit, a difficult engineering, modeling, debugging, and repair task how many genes have multiple functions

33 DNA Methylation Bisulfite conversion of genomic DNA Sequencing of converted DNA to identify methylated C positions Results: GATC sequences, as expected Unexpected: CAG and CTG sequences

34 Current work Array experiments  Close species  Transcriptional units, pseudo-genes  TRASH Protein species by LC/LC/MS/MS Elimination of the restriction system Plasmid system  pBG7AU based Recombination system  Positive/negative selection Yeast chromosome transfer Genome edits to reduce size Genome edits to modularize Genome edits to eliminate complexity Use as a construction chassis

35 Reconstruct the Genome Use recombination techniques to edit the genome Eliminate unnecessary genes Remove overlaps Standardize promoters, ribosomal binding sites Identify transcriptional and translational regulators Recode proteins to use a reduced portion of the coding space The code is 4 billion years old, it’s time for a rewrite

36 YAC mutagenesis Bring up YAC technology  Spheroplasts, old YAC plasmid sequencing Triple transform with MF chromosome and  pRML1 (Spencer 92)  pRML2  Genome inactive except for ARS, telomeres, selection markers Use yeast recombination systems for genome editing Isolate and recircularize YAC to form a new genome Lipid encapsulate genome into vesicles Fuse vesicles with genome-killed wild type cells

37 Proteome Collaboration with Steve Tannenbaum / Yingwu Wang 2-D gels + MS spot ID LC/LC/MS/MS ID of trypsin digests

38 Riboswitch Analysis Collaboration with Ron Breaker / Adam Roth Discovery of unique riboswitches specific for GTP rather than dGTP Found in no other sequenced genomes Analysis of close relatives under way

39 Engineer plasmids No known plasmids for this class of organism Renaudin has made plasmids using the chromosomal OriC as the replicative element  Lartigue 03 M. mycoides has pADB201 and pBG7AU rolling circle plasmids similar to pE194 (1082 bp) We know the antibiotic sensitivities and have working resistance genes

40 Kit Part the genome Make Biobrick parts from each gene, tRNA, promoter, other part-like genome element Attempt to develop techniques for recombining parts into coherent modules Develop techniques for assembling and modeling the resulting structures

41 Thanks to… Harold Morowitz Greg Fournier Gail Gasparich Bob Whitcomb Eric Lander Bruce Birren Nicole Stange-Thomann George Church Roger Brent Grant Jensen Yingwu Wang Nick Papadakis Ron Weiss Drew Endy Randy Rettberg Austin Che Reshma Shetty MIT Synthetic biology working group DARPA, NTT, NSF, Microsoft

42 Synthetic Biology An alternative to understanding complexity is to remove it This complements rather than replaces standard approaches Engineering synthetic constructs will be easier  Enabling quicker more facile experiments  Enabling deeper understanding of the basic mechanisms  Enabling applications in nanotechnology, medicine and agriculture Simplicity is the ultimate sophistication -- Leonardo da Vinci


Download ppt "Biosimplicity: Engineering Simple Life Tom Knight MIT Computer Science and Artificial Intelligence Laboratory Simplicity is the ultimate sophistication."

Similar presentations


Ads by Google