Presentation on theme: "What Have We Learned From Unicellular Genomes?"— Presentation transcript:
1 What Have We Learned From Unicellular Genomes? Propionibacterium acnes, Bacteroides thetaiotaomicron, Mycoplasma genitalium, Mimivirus, Cyanobacteria, Plasmodium, Yeast
2 Why do I get so many pimples? The genome of Propionibacterium acnes was sequenced in July of 2004. P. acnes lives in sebaceous cysts and sometimes stimulates an immune response. A group in Paris, along with two groups in Germany, sequenced P. acnes. They found 2,333 genes in its 2.6 Mb genome. 68% of these had orthologs in other species, 20% had none, and 12% encoded only RNA.
3 Anatomy of a pimple Figure 2.9 Anatomy of a pimple. a) A pimple is caused by a blocked skin pore through which a hair should emerge. b) Bacteria in the pore begin to grow and obstruct the secretion of sebum. c) When blocked, the associated sebaceous glands continue to secrete sebum and the follicle pore enlarges and becomes inflamed.
4 Figure 2.10 Map of P. acnes genome. The length of the chromosome is shown on the outside circle. Numbers in the format X e Y mean X × 10^Y. Genes are illustrated in yellow or green to indicate the two strands and orientations. “Alien” gene clusters (red) indicate atypical use of codons. GC content (blue) increased (pointing outward) or decreased (inward) relative to the whole-genome average value of 60%. The origin of replication is predicted to occur at 0 where the GC skew is greatest and the gene DnaA is located (not shown).
5 Genome-wide evaluations A first step following bacterial genome sequencing is finding the origin (ori) and terminus of replication. This is done with GC skew, the non-uniform distribution of G’s and C’s between the two strands: oris tend to have the lowest cumulative skew, while termini have the highest. Genes that originated by horizontal transfer (HT) are identified using a sliding window to find segments with abnormal GC content. Codon bias is also used to detect HT. Immunogenic and metabolic genes were also detected.
6 Transcriptional Phase Variation During finishing, it was found that P. acnes has a variable number of G’s associated with some genes. It is hypothesized that the initiation of transcription depends on the number of consecutive G’s. As runs of G’s are replicated, the number can change. This leads to a mixed population of bacteria with varying levels of protein production. This diverse population is optimized to respond differentially to various skin treatments.
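The two genome-wide scans on this slide can be sketched in a few lines. This is a minimal illustration, not the pipeline used for P. acnes: the window size, step, and 0.05 GC threshold are hypothetical defaults. The skew is computed as (G − C)/(G + C) per window and summed along the chromosome, so the ori is taken at the minimum of that running total and the terminus at its maximum.

```python
def gc_content(seq: str) -> float:
    """Fraction of bases that are G or C (0.0 for an empty string)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def cumulative_gc_skew(genome: str, window: int = 1000) -> list[float]:
    """Running total of (G - C) / (G + C), one value per window boundary.

    curve[k] is the cumulative skew at position k * window; curve[0] is 0.
    """
    curve = [0.0]
    for start in range(0, len(genome), window):
        chunk = genome[start:start + window].upper()
        g, c = chunk.count("G"), chunk.count("C")
        curve.append(curve[-1] + ((g - c) / (g + c) if g + c else 0.0))
    return curve


def predict_ori_ter(genome: str, window: int = 1000) -> tuple[int, int]:
    """Positions of the cumulative-skew minimum (ori) and maximum (terminus)."""
    curve = cumulative_gc_skew(genome, window)
    ori = min(range(len(curve)), key=curve.__getitem__)
    ter = max(range(len(curve)), key=curve.__getitem__)
    return min(ori * window, len(genome)), min(ter * window, len(genome))


def atypical_gc_windows(genome: str, window: int = 5000, step: int = 1000,
                        threshold: float = 0.05) -> list[int]:
    """Start positions of sliding windows whose GC content differs from the
    genome-wide average by more than `threshold` (candidate HT segments)."""
    average = gc_content(genome)
    return [start
            for start in range(0, len(genome) - window + 1, step)
            if abs(gc_content(genome[start:start + window]) - average) > threshold]
```

Windows flagged this way are only candidates; as the slide notes, codon bias is used as a second, independent signal.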
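The idea that copying a run of G’s produces a mixed population can be illustrated with a toy simulation. Everything here is an assumption for illustration: the starting run length of 13, the 5% per-copy slip rate, and the rule that a slip adds or removes exactly one G are hypothetical values, not measurements from P. acnes.

```python
import random
from collections import Counter


def replicate_run(length: int, slip_rate: float, rng: random.Random) -> int:
    """Copy a poly-G run; with probability slip_rate the copy gains or loses one G."""
    if rng.random() < slip_rate:
        return max(1, length + rng.choice((-1, 1)))
    return length


def simulate_population(start_length: int = 13, generations: int = 10,
                        slip_rate: float = 0.05, seed: int = 1) -> Counter:
    """Grow a clone from one cell and tally the poly-G run lengths it ends up with."""
    rng = random.Random(seed)
    population = [start_length]
    for _ in range(generations):
        # every cell divides; each daughter inherits an independently copied run
        population = [replicate_run(length, slip_rate, rng)
                      for length in population for _ in range(2)]
    return Counter(population)


print(sorted(simulate_population().items()))
```

Even starting from a single cell, ten rounds of division leave several run lengths in the population; if transcription initiation depends on run length, each length class would express the gene at a different level.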
7 Figure 2.11 Polycytosine chromats reveal transcription regulatory mechanism. Four sequencing runs of the noncoding strand for gene PPA1880. Each chromat represents a different clone in the genome shotgun plasmid library made from a population of cells. The number of cytosines (determined by the number of guanines on the complementary strand) increases from top to bottom: a = 12, b = 13, c = 14, and d = 16.
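Reading off the homopolymer length from each clone amounts to finding the longest uninterrupted run of one base. The read fragments below are hypothetical stand-ins for clones a to d, built to carry the cytosine counts given in the caption.

```python
import re


def longest_run(seq: str, base: str = "C") -> int:
    """Length of the longest uninterrupted run of `base` in a read."""
    runs = re.findall(re.escape(base.upper()) + "+", seq.upper())
    return max((len(run) for run in runs), default=0)


# Hypothetical read fragments standing in for the four clones a to d
reads = {label: "GTA" + "C" * n + "TTG" for label, n in zip("abcd", (12, 13, 14, 16))}
print({label: longest_run(read) for label, read in reads.items()})
```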
9 Digesting Our Cells For Food P. acnes was found to be able to grow anaerobically as well as aerobically. Cells produce many enzymes that are able to degrade lipids, esters, and amino acids. Some of these degradation products increase adhesion to our cells. Many of the digestive enzymes contain a motif (LPXTG) that targets them to the cell wall. Hyaluronate lyase is also found on the surface of the bacteria; this enzyme destroys the extracellular matrix that binds our cells together.
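In the LPXTG motif, X stands for any amino acid, so a regular expression is enough to locate candidate cell-wall-targeting signals in a protein sequence. This sketch only finds the motif; real sorting-signal predictions also check that it sits near the C terminus, followed by a hydrophobic stretch and a charged tail.

```python
import re

# L, P, any one of the 20 standard amino acids (the "X"), T, G
LPXTG = re.compile(r"LP[ACDEFGHIKLMNPQRSTVWY]TG")


def find_lpxtg(protein: str) -> list[int]:
    """0-based start positions of LPXTG motifs in a protein sequence."""
    return [match.start() for match in LPXTG.finditer(protein.upper())]


print(find_lpxtg("MKKLPETGEE"))
```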
10 Figure 2.12 Activity of P. acnes leading to pimple formation. Skin keratinocytes lining the pore are digested by enzymes secreted from the centrally located bacterium. Additional proteins secreted by the bacterium induce an immune response that contributes to the redness, swelling, and soreness. HSP= heat-shock protein; SP= surface-associated protein; PTRP= proline-threonine repetitive protein; Ag = antigen.
11 Stimulating the Immune Response P. acnes produces 5 CAMP factors (secreted proteins that bind antibodies) that can form pores in the cell membrane. A dipeptide motif (PT) is present in certain proteins; this motif is also found in M. tuberculosis. The bacterium also has at least 7 heat shock protein genes. P. acnes also secretes porphyrins, which generate toxic forms of oxygen, further stimulating the immune response.
12 Withstanding the Environment P. acnes can signal nearby cells that something has changed in the environment. Sensors called two-component systems (one component to sense and one to signal) exist in some bacteria; P. acnes has 10 pairs. Quorum sensing is the ability to detect conditions of overcrowding. The LuxS gene is expressed in these instances, producing a universal signal for interspecies communication among bacteria. Cells meshed together in biofilms protect themselves.
13 Are all bacteria living in us bad for us? An average human body is composed of about 10^13 cells. Our intestines have about 10^10 microbes/ml and contain at least 1,000 ml. A majority of the cells in our bodies may be bacteria! ( ,000 different species) This accounts for 2-4 million non-human genes. Bacteroides thetaiotaomicron constitutes a substantial portion of our intestinal flora. A group from Washington University in St. Louis sequenced its genome.
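The claim that most of our cells may be bacterial follows from multiplying the two slide figures. Because 1,000 ml is a lower bound on intestinal volume, the result is a lower bound too.

```python
human_cells = 10**13          # cells in an average human body (slide value)
microbes_per_ml = 10**10      # microbial density in the intestines (slide value)
intestinal_volume_ml = 1_000  # "at least 1,000 ml" of intestinal contents (slide value)

gut_microbes = microbes_per_ml * intestinal_volume_ml
bacterial_share = gut_microbes / (gut_microbes + human_cells)
print(f"gut microbes: {gut_microbes:.0e}; share of all cells: at least {bacterial_share:.0%}")
```

Gut microbes alone already match the number of human cells, so any additional intestinal volume tips the balance toward a bacterial majority.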
14 Figure 2.13 B. thetaiotaomicron genome map. The lengths of the circles are indicated on the outside with genes on the leading (outer) and lagging (inner) strands color-coded by COG functional category. GC content (black) increased (pointing outward) or decreased (inward) relative to the whole-genome average. GC skew (two-toned inner graph) is distinctive in this species; the origin of replication is indicated by the arrow near base 1 million.
15 Overview of the Genome B. thetaiotaomicron’s genome is 6.3 Mb and contains 4,779 genes (plus a 33 kb plasmid). 58% of ORFs have known function, 18% have orthologs of no known function, and 24% have no homology with known proteins. COGs (Clusters of Orthologous Groups, used as functional categories of genes) are assigned after sequencing to give an overview of a given genome. Many of the genes specialize in sugar uptake, cell wall synthesis, environmental sensing and signaling, and transposition.
16 Major COGs Sugar metabolism: 170 genes fit into this category, whereas most bacteria have a set of 23. 61% of these appear to be secreted, which benefits not only other bacteria but us as well. 163 paralogs of 2 genes (SusC & SusD) import sugars into the cytoplasm of the microbe. Many two-component genes are present for signaling; some of these interact with σ factors. 63 transposons are present, which may help spread antibiotic resistance.
17 Figure 2.14 B. thetaiotaomicron has unusually large ORFs. Average coding sequences (CDS) are displayed as bar graphs for prokaryotes with widely distributed genome sizes (purple diamonds). CDS size does not correlate with genome size; B. thetaiotaomicron has the largest average CDS size.
18 Does Size Matter? The coding capacity for this genome is very high (89% coding DNA), but it has a lower ratio of gene number to genome size than expected. This was a paradox until it was determined that the ORFs of this microbe are unusually large; it is unclear why this is the case. Summary: Gut symbionts provide us with predigested sugars, stimulate blood vessel formation, crowd out pathogens, sequester limited resources, and stimulate our mucosal layer.
19 Can Microbial Genomes Become Dependent Upon Us? In the microbial world, if you don’t use it, you lose it. Mycoplasma genitalium has one of the most reduced microbial genomes, at 580 kb; the archaeon Nanoarchaeum equitans is smaller still, at 490 kb. TIGR sequenced its genome in 1995. 470 ORFs were found, 96 of which have no known orthologs. M. genitalium has an 88% coding capacity.
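A rough calculation shows how a high coding capacity and a low gene density fit together. Assuming the coding DNA is split evenly among the genes (a simplification that ignores RNA genes and the plasmid), B. thetaiotaomicron packs fewer genes per Mb than M. genitalium yet ends up with a longer average gene.

```python
def genes_per_mb(genome_bp: int, n_genes: int) -> float:
    """Gene density per megabase."""
    return n_genes / (genome_bp / 1_000_000)


def mean_gene_length(genome_bp: int, n_genes: int, coding_fraction: float) -> float:
    """Average length (bp) if the coding DNA were split evenly among the genes."""
    return genome_bp * coding_fraction / n_genes


# (genome size in bp, gene count, coding fraction), as quoted on these slides
genomes = {
    "B. thetaiotaomicron": (6_300_000, 4_779, 0.89),
    "M. genitalium": (580_000, 470, 0.88),
}
for name, (size, genes, coding) in genomes.items():
    print(f"{name}: {genes_per_mb(size, genes):.0f} genes/Mb, "
          f"~{mean_gene_length(size, genes, coding):.0f} bp per gene")
```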
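Coding capacity is the fraction of the genome covered by genes. Given gene coordinates, overlapping genes must be merged so that shared bases are not counted twice. The (start, end) coordinate format below is a hypothetical convention chosen for the sketch.

```python
def coding_capacity(genome_length: int, genes: list[tuple[int, int]]) -> float:
    """Fraction of the genome covered by at least one gene.

    `genes` holds (start, end) coordinates, 1-based and inclusive; either order is
    accepted, so minus-strand genes can be given as (end, start). Overlapping
    genes are merged so shared bases are counted only once.
    """
    intervals = sorted((min(s, e), max(s, e)) for s, e in genes)
    covered = 0
    cur_start = cur_end = None
    for start, end in intervals:
        if cur_end is None or start > cur_end + 1:
            if cur_end is not None:
                covered += cur_end - cur_start + 1
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)  # overlapping or adjacent: extend the block
    if cur_end is not None:
        covered += cur_end - cur_start + 1
    return covered / genome_length


print(coding_capacity(100, [(1, 40), (31, 60), (90, 81)]))
```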
20 Figure 2.15 M. genitalium genome map. The circular chromosome is depicted linearly for clarity of genes, which are color-coded by functional category; the direction of transcription is indicated by the arrow shape of genes. Note that genes on two ends of the genome tend to transcribe toward the center. Each line of the genome is about 24 kb and predicted genes are numbered from 1 to 470. MgPa is an especially abundant adhesion protein gene.
21 Genes that have been lost: M. genitalium has presumably lost many genes involved in the synthesis of amino acids, cofactors, cell envelope, and regulatory factors. It has only 1 σ factor. The microbe has retained genes for energy metabolism, fatty acid and phospholipid metabolism, nucleotide production, replication, transcription, and protein transport. The only category overrepresented is translation, namely rRNA and tRNA genes.
23 What is the Minimum # of Genes? Craig Venter, along with Hamilton O. Smith, is trying to construct an organism with the fewest possible genes. A new field called synthetic biology seeks to synthesize a functioning genome de novo. A better understanding of evolutionary principles and genome circuitry is sought. Japanese & European scientists have tried to identify the essential genes of B. subtilis. They have found that only 192 genes are indispensable to life.
24 Do all Viruses have Small Genomes? Most viral genomes are much smaller than bacterial ones: HIV, 9,200 nt; WNV, 10,962 nt; SARS, 29,727 nt; T7, 39,900 nt; λ, 48,502 nt. In 2003, a new virus that infects amoebae was reported with a 1.2 Mb genome! A group in Marseille, France, sequenced this virus, called Mimivirus.
25 Figure 2.16 Mimivirus genome map. The length of chromosome is indicated on the outside with genes on the leading (outer) and lagging (inner) strands color-coded by COG functional category. Red arrows indicate location and direction of tRNA genes. AT content (blue) increased relative to the whole-genome average of 72%, with a peak near 380 kb. Although sequenced as a linear chromosome, it may form a circle in vivo for at least part of its life cycle.
26 Mimivirus Genome 1,262 ORFs were identified, and the coding capacity is 90.5%. Like most viruses, the genome is linear, but it has inverted repeats at both ends by which it may circularize, perhaps during replication. Isoleucine is used twice as often as usual, and there is a strong codon bias toward codons lacking G or C. The genome is 28% GC. The Mimivirus genome is overrepresented in genes for translation, posttranslational modification, and amino acid transport and metabolism.
27 Is Mimivirus Alive? The genome of Mimivirus resembles a bacterial genome, and Mimivirus even stains Gram-positive. Is it a virus? In 1957, the following definition of a virus was proposed: 1) smaller than 0.2 microns; 2) possesses DNA or RNA, not both; 3) not able to synthesize its own proteins; 4) cannot generate energy from substrates; 5) cannot grow by binary fission. Mimivirus satisfies only the 4th criterion; we are not sure about the 5th.
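Codon bias of the kind described here can be summarized directly from an in-frame coding sequence. For comparison, if all 64 codons were used equally, only 8 of them (1/8) would consist solely of A and T; an AT-rich genome such as Mimivirus (28% GC) would be expected to sit well above that baseline. The three-statistic summary is an illustrative choice, not the method used in the Mimivirus analysis.

```python
ISOLEUCINE = {"ATT", "ATC", "ATA"}


def split_codons(cds: str) -> list[str]:
    """Split an in-frame coding sequence into codons, ignoring a trailing partial codon."""
    cds = cds.upper()
    return [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]


def codon_summary(cds: str) -> dict[str, float]:
    """GC content, share of codons lacking G and C, and share of isoleucine codons."""
    codons = split_codons(cds)
    n = len(codons) or 1  # avoid dividing by zero for an empty sequence
    return {
        "gc": sum(cds.upper().count(b) for b in "GC") / max(len(cds), 1),
        "at_only_codons": sum(set(c) <= {"A", "T"} for c in codons) / n,
        "isoleucine": sum(c in ISOLEUCINE for c in codons) / n,
    }


print(codon_summary("ATGATTATAGCCTAA"))
```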
28 Figure 2.17 Phylogenetic tree of Mimivirus with three domains of life. Seven universally conserved proteins totaling 3,164 amino acids were used to produce this unrooted tree. Bootstrap values are placed at the branch points (see Math Minute 3.3).
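Bootstrap values come from resampling the alignment: columns are drawn with replacement to make many pseudo-alignments of the same length, a tree is built from each, and the value at a branch is the percentage of replicates that recover that group. The slide does not say which tree-building method was used, so `infer_splits` below is a hypothetical placeholder for any method that returns the groups (splits) of its tree.

```python
import random
from collections import Counter
from typing import Callable, FrozenSet, Set

Split = FrozenSet[str]


def bootstrap_replicate(alignment: dict[str, str], rng: random.Random) -> dict[str, str]:
    """Resample alignment columns with replacement, keeping the original length."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("sequences must all be aligned to the same length")
    length = lengths.pop()
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[i] for i in columns) for name, seq in alignment.items()}


def bootstrap_support(alignment: dict[str, str],
                      infer_splits: Callable[[dict[str, str]], Set[Split]],
                      replicates: int = 100, seed: int = 0) -> dict[Split, float]:
    """Percentage of replicates in which each split (group of taxa) is recovered."""
    rng = random.Random(seed)
    counts: Counter = Counter()
    for _ in range(replicates):
        counts.update(infer_splits(bootstrap_replicate(alignment, rng)))
    return {split: 100 * n / replicates for split, n in counts.items()}
```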
29 What is it then? Mimivirus has blurred the distinction between prokaryotes and viruses. It is hypothesized that, like M. genitalium, Mimivirus has lost genes over time. We will learn of more obligate intracellular parasites later in class. Mimivirus may resemble some of the earliest forms of life, which were able to replicate independently until they became parasites.
30 Genomes Reflect an Organism’s Ecological Niche Cyanobacteria are the most productive phytoplankton in the world. The two most abundant genera of cyanobacteria are Prochlorococcus and Synechococcus. 3 genomes in the former group and 1 in the latter were sequenced in 2003. Individual cells from both genera are referred to using a numbering system to indicate different ecotypes. Species designations are still difficult to assign; Prochlorococcus was discovered in the 1990s.
31 Prochlorococcus Figure 2.18 Anatomy of Prochlorococcus. Transmission electron micrograph of unicellular prokaryotes with one cell caught in the act of cell division. Note the internal thylakoid membranes that are required for photosynthesis; some prokaryotes do have internal, membrane-bound structures.
32 Figure 2.19 Ecology of four distinctive cyanobacteria. Open ocean ecosystem with vertical gradients of light, nutrients, and temperature, as well as zones of mixing (top) and periodic upwelling of lower water layers (from bottom up). Nutrient labels represent the location of highest concentrations, but the nutrients are not restricted to indicated layers. Location of cyanobacteria names indicates relative depths where each is maximally concentrated.
34 Dot Plot Alignment Figure 2.20 Global genome dot plot alignment of MED4 and MIT9313. Genes present in one genome but not the other are positioned on the appropriate axis. The broken-X pattern indicates numerous inversions, with the intersection located near the origins of replication.
35 Prochlorococcus MED4 vs. MIT9313 These ecotypes share 1,352 orthologs. Short diagonal segments indicate synteny. A negative slope indicates that the segment was inverted in one ecotype relative to the other. Segments with positive slope but located off the diagonal indicate chromosomal recombination events. Genes along an axis are missing from the other ecotype: MED4 has 364 genes not found in MIT9313, which has 923 genes not found in MED4.
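The reading rules on this slide can be applied to gene orders directly. In this simplified sketch (genes are placed by their rank among shared orthologs rather than by base-pair position, and the gene names are hypothetical), each shared ortholog becomes a point, genes unique to one genome are set aside as axis genes, and consecutive points are grouped into runs with slope +1 (synteny) or -1 (inversion).

```python
def gene_order_dot_plot(order_a: list[str], order_b: list[str]):
    """Points (rank in A, rank in B) for shared orthologs, plus genes unique to each."""
    set_a, set_b = set(order_a), set(order_b)
    shared_a = [g for g in order_a if g in set_b]
    shared_b = [g for g in order_b if g in set_a]
    rank_b = {g: j for j, g in enumerate(shared_b)}
    points = [(i, rank_b[g]) for i, g in enumerate(shared_a)]
    only_a = [g for g in order_a if g not in set_b]
    only_b = [g for g in order_b if g not in set_a]
    return points, only_a, only_b


def syntenic_segments(points: list[tuple[int, int]]):
    """Group consecutive points into runs whose B-rank steps by +1 or -1.

    Returns (slope, points) pairs: slope +1 means conserved order (synteny),
    -1 means an inverted block, and 0 marks an isolated single gene.
    """
    segments = []
    for point in points:
        if segments:
            slope, run = segments[-1]
            step = point[1] - run[-1][1]
            if step in (1, -1) and (slope == step or len(run) == 1):
                segments[-1] = (step, run + [point])
                continue
        segments.append((0, [point]))
    return segments
```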
36 pcb gene family A major difference between the ecotypes is in the pcb gene family, which encodes chlorophyll-binding, light-harvesting antenna complex proteins that help capture a wider spectrum of light. MED4 (high light) has only 1 pcb gene; MIT9313 (medium light) has 2 (A and B); SS120 (low light) has 8 (A to H). MED4’s gene does not respond to changes in Fe3+, but MIT9313’s is induced 7-fold and SS120’s is induced 23-fold.
37 MED4’s Small Genome MED4’s genome is the smallest known for a photoautotroph and may represent the minimum for a photosynthetic organism. MED4 appears to have lost genes over time. A more streamlined genome means the organism is adapted to a narrower ecological range. Synechococcus has the largest genome of this group and the largest ecological range as well. People have proposed seeding the ocean with Fe3+ to help stimulate CO2 consumption.
38 Gene deletions in Cyanobacteria Figure 2.21 Deletion, acquisition, and rearrangement of nitrogen usage genes. MIT9313 lost 25 genes, including the nitrate/nitrite transporter (nrtP/napA), nitrate reductase (narB), and carbonic anhydrase. MED4 retained the cyanate transporter and cyanate lyase (cynS) at a different locus; these were lost from MIT9313, presumably after the two ecotypes diverged. MIT9313 retained nitrite reductase (nirA) and acquired a nitrite transporter. MED4 lost nirA but the urea transporter (urt cluster) and urease (ure cluster) genes relocated elsewhere in the MED4 genome (dotted line). Gene colors identify genes but do not indicate functional symbolism.
39 Malaria Malaria, although it rarely makes news headlines, is a daily threat to the 3 billion people who live in tropical climates. In 2002, about 500 million people were infected. About 2.7 million people die each year (about 90% of these are under 5 years old). The cause of malaria has been known for 100 years, but we still can’t stop its spread. The most lethal form of malaria is caused by Plasmodium falciparum.
40 Lifecycle of Plasmodium Figure 2.22 Life cycle of Plasmodium. a) When an infected Anopheles mosquito bites a human, the sporozoite form of Plasmodium enters its new host. Sporozoites travel to the liver and mature to merozoites, which infect RBCs; pass through the trophozoite stage; and return to the merozoite stage to begin the cycle again. Some infected RBCs produce gametocytes that enter mosquitoes that ingest new blood; there the gametes fuse to form diploid zygotes that eventually emerge from the insect gut to produce sporozoites and begin the entire life cycle again. b) Many human RBCs are visible, some of which have been infected by Plasmodium. The stages are: 1 = trophozoite; 2 = schizont; 3 = merozoite; 4 = gametocyte.
41 RBC Infection The most vulnerable time for Plasmodium is during the RBC infection stage. The parasite must force its way into an RBC without rupturing any plasma membranes. Three structures are important during infection: 1) an extracellular coating that makes cells sticky; 2) the apical end of the cell, which must be oriented toward the RBC surface; 3) the apicoplast, an internalized algal symbiont.
42 Figure 2.23 Invasion of RBC by Plasmodium merozoite. The eukaryotic Plasmodium has a nucleus, mitochondria, and a unique but vital organelle called an apicoplast. Infection takes place in three stages. a) The parasite binds loosely to the surface of the RBC. b) The merozoite orients itself to place the apical surface with specialized binding proteins directly adjacent to the RBC surface. c) Plasmodium pulls itself inside the RBC by myosin motors pulling on the merozoite proteins attached to the RBC surface proteins. Note that the internalized parasite does not cross the RBC plasma membrane; it surrounds itself with the host cell membrane.
43 Plasmodium Genomes Plasmodium actually has three genomes: nuclear, mitochondrial, and apicoplastic. Pulsed-field gel electrophoresis was used to separate the chromosomes, followed by shotgun sequencing. This proved to be the most AT-rich genome sequenced so far (19.4% GC). The 22.9 Mb genome has a 52.6% coding capacity and 5,268 ORFs (60% of which have no known function, the largest proportion of any genome).
44 Tricking the Immune System The genes of Plasmodium that are responsible for binding to RBCs and for avoiding the immune system are located near the telomeres of this eukaryote. Genes near Plasmodium telomeres have been duplicated many times, and all three gene families in these categories (var, rif, and stevor) are polymorphic. There are 59 var paralogs, 149 rif, and 28 stevor. This may account for our immune system’s inability to deal with this parasite.
45 The Plasmodium Proteome 1% of proteins are used for host cell invasion; 4% help evade the immune response; 31% are integral to the membrane; 14% are enzymes (about 4x fewer than in most proteomes); 10% are transported to the apicoplast; 60% have unknown function. The Krebs cycle is present, but the organism grows anaerobically and only uses this cycle for heme biosynthesis (heme it could get from us).
46 Apicoplast Proteome Similar to a chloroplast in origin but now used for a different purpose. Only two photosynthetic orthologs remain. This organelle synthesizes fatty acids, isoprenoids, and heme groups. Nuclear-encoded proteins sent here assist in DNA replication and repair, transcription, translation, posttranslational glycosylation, protein import, and protein degradation.
47 Comparing Plasmodia The P. falciparum sequencing project took 45 people 6 years to complete. At the same time, other groups were working on P. yoelii, which infects rats and is used as a model organism for malaria research. Unfortunately, this latter genome was never finished, making comparisons difficult. P. yoelii has 600 additional ORFs, and the two have 3,310 genes in common (56%). Is this similar enough to make a good model organism?
48 Malaria Treatment Options? Recently, a German and American team used reverse genetics (starting with a gene sequence and deducing its function) to knock out a target gene, producing a knockout strain that is expected to be less pathogenic than wild type. Mice injected with this strain were protected for 30 days. Even if a better drug were produced, funding and health care infrastructure are lacking in many problem areas. Very little money is spent on malaria research.
49 Yeast Figure 2.25 Baker’s yeast Saccharomyces cerevisiae. a) Yeast is easily grown in the lab by streaking it onto nutrient-containing agar plates; single colonies are easily isolated. The strain in this photo is S288C, which is the same strain used in the genome sequencing project. b) Yeast cells grow rapidly through mitosis, as shown in this transmission electron micrograph.
50 Yeast Genome The S. cerevisiae genome was sequenced in 1996. It took over 600 scientists in Europe, North America, and Japan working together to sequence the 12 Mb genome. Yeast has a 70.3% coding capacity, higher than Plasmodium but lower than all bacteria. There is a gene every 2 kb in yeast, one every 6 kb in C. elegans, and one every 30 kb in humans. Eukaryotes have more junk DNA than prokaryotes, and enhancers, promoters, and introns add substantially to the size of eukaryotic genes.
52 Chromosome Structure in Yeast The 4 smallest chromosomes in yeast have a unique structure. It was known from work with YACs that chromosomes smaller than 150 kb were not stable in yeast. These chromosomes are relatively gene-poor and undergo recombination at high frequencies, perhaps to protect the larger ones from the same fate. Transcriptionally silent genes are found in the subtelomeric regions of many chromosomes; this may help identify the right and left sides of a chromosome.
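Gene spacing is simply genome size divided by gene count. Applying it to the genome sizes and gene or ORF counts quoted across these slides (for yeast, the original 5,885 annotated ORFs) reproduces the roughly 2 kb spacing in yeast and places it between the compact bacterial genomes and the sparser Plasmodium genome.

```python
def kb_per_gene(genome_bp: int, n_genes: int) -> float:
    """Average spacing between genes, in kilobases."""
    return genome_bp / n_genes / 1_000


# Genome sizes and gene/ORF counts quoted in these slides
genomes = {
    "M. genitalium": (580_000, 470),
    "B. thetaiotaomicron": (6_300_000, 4_779),
    "S. cerevisiae": (12_000_000, 5_885),
    "P. falciparum": (22_900_000, 5_268),
}
for name, (size, genes) in genomes.items():
    print(f"{name:<20} one gene per {kb_per_gene(size, genes):.1f} kb")
```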
53 Yeast Chromosomes Figure 2.26 S. cerevisiae ideogram. Drawings of the 16 yeast chromosomes numbered with Roman numerals (I—XVI) plus the mitochondrial chromosome (MT). Centromeres are represented by small white gaps in the black chromosomes.
54 Evolutionary History of Yeast A substantial number of genes were found in duplicate copies in yeast. It was proposed that yeast had undergone “duplication events” at some point in its history. Many regions of chromosomes are syntenic with regions on other chromosomes. Such paralogs are seen as evolutionary experiments in which one copy can drift to provide new specialized functions. Some genes were initially thought to be extra copies, but experiments showed them to be functionally distinct.
55 Predictions for the Future The authors of the landmark 1996 yeast sequencing publication made the following predictions: 1) they described plans to produce a collection of single, double, and even triple knockout (KO) mutations; 2) they addressed the value of making all genome sequences publicly available; 3) they felt whole-genome shotgun (WGS) sequencing of large genomes was not feasible; 4) they looked forward to comparing yeast with S. pombe as well as with the human genome.
57 Better Annotation A number of yeast genomes have been sequenced since then. With these, the need to annotate genes using the Gene Ontology (GO) became clear. Improvements in computers, search algorithms, and the increased volume of genes in the databases led to better annotation. The original 5,885 annotated ORFs have been increased to 6,672, many below the original cutoff of 100 codons.
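A codon cutoff like the original 100-codon threshold is a single parameter in a basic ORF finder, which is why lowering it adds short ORFs. This sketch is deliberately minimal: it scans the forward strand only, uses ATG as the sole start codon, takes the first ATG before each stop, and counts the ATG but not the stop toward the cutoff (an assumption). Real annotation also scans the reverse complement and weighs homology and other evidence.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 100) -> list[tuple[int, int]]:
    """Forward-strand ORFs running from ATG to the first in-frame stop codon.

    Returns 0-based (start, end) pairs with `end` exclusive and including the stop.
    An ORF is kept when it has at least `min_codons` codons, counting the ATG but
    not the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)


short_gene = "ATG" + "GCT" * 5 + "TAA"  # 6 codons plus a stop
print(find_orfs(short_gene), find_orfs(short_gene, min_codons=6))
```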