PacBio Meets the Microbiome
George Weinstock
PacBio Users Group Meeting, September 18, 2013
Diverse interest in medical metagenomics
Acne; Antibiotics, gut microbiome, and obesity; Antibiotic resistance; Asthma, allergies; Acute RSV infection; Vitamin D; Bacterial vaginosis; Cancer microbiomes; Conjunctiva – trachoma microbiome; Crohn's disease; Cystic fibrosis; Diabetes; Oral microbiome; Skin microbiome; Dietary effects on gut microbiome; Fecal transplant; HIV and lung microbiome; Infection control; C. difficile; VRE; MRSA; E. coli O157:H7; NICU bacteremia; Intestinal fat uptake; Necrotizing enterocolitis; Non-Alcoholic Fatty Liver Disease; Oral microbiome; Periodontitis; Caries; Parasitic infection and the microbiome; Post-transplant Lymphoproliferative Disorders; Pre-term birth; Maternal microbiome; Vitamin D; Respiratory microbiome; Influenza infection; Pre-term babies; Childhood vaccination; Sepsis; ICU; NICU; Short-bowel syndromes; Urethritis; Virus discovery; Kawasaki Disease; Fever of unknown origin in children; Transplantation: CMV, BK; Immuno-suppression/-compromised
Approaches to study the microbiome
Bacteria; Viruses; Fungi; Yeasts; Protists; Enzymes
Studying communities - 16S rRNA genes
Major “enterotypes” of the stool biomes
[Figure: each row is a different sample, shown as a histogram of the genera in that sample (Bacteroides, Prevotella, Ruminococcus). Sample annotation labels: women, men; St. Louis, Houston; BMI < 25, 25 <= BMI < 30, BMI >= 30; Hispanic/Latino/Spanish, not Hispanic/Latino/Spanish; NA.]
Some Metagenomic Effects
Community structure: e.g. content; ecological parameters (biodiversity)
Specific organism: e.g. C. difficile
Multiple specific organisms: beneficial ↓, detrimental ↑
Genes or pathways: e.g. lactic acid
Pathogen relative abundance in clinical samples (16S read abundance)
Subject  Clinical findings                    C. difficile  Campylobacter  Salmonella
A        C dif + high TcdB + Noro II CT 24    7.15          0.64           0.00
B        NC (SE meds?)                        [0.02, 0.00; only two values survive in the transcript]
C        C dif + high TcdB + Sapo CT 35       [45.40, 0.00; only two values survive in the transcript]
D        NC IBD                               [0.00, 0.01; only two values survive in the transcript]
E        C dif + low TcdB                     [0.90, 0.00; only two values survive in the transcript]
F        NC Campy + Sapo CT 34                0.00          6.24           0.00
G        C dif + low TcdB                     0.10          0.00           0.01
H        NC ?                                 0.00          0.05           0.00
I        NC Salmonella                        0.05          0.00           5.02
J        C dif + average TcdB                 2.12          0.02           2.58
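The abundances in the table above are shares of 16S reads assigned to each pathogen. A minimal sketch of that calculation follows, assuming each read has already been assigned a taxon label and that the values are percentages of all reads in the sample (the slide does not state the units). The function name and the read counts are hypothetical, not taken from the talk.

```python
from collections import Counter


def relative_abundance(taxon_assignments, taxa_of_interest):
    """Return the percentage of reads assigned to each taxon of interest.

    taxon_assignments: one taxon label per 16S read (hypothetical input).
    taxa_of_interest: labels to report; taxa with no reads report 0.0.
    """
    counts = Counter(taxon_assignments)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample contains no reads")
    # Counter returns 0 for labels it has never seen.
    return {taxon: 100.0 * counts[taxon] / total for taxon in taxa_of_interest}


# Made-up sample of 1000 reads, for illustration only.
example_reads = (
    ["C. difficile"] * 20
    + ["Campylobacter"] * 5
    + ["other"] * 975
)
example_result = relative_abundance(
    example_reads, ["C. difficile", "Campylobacter", "Salmonella"]
)
print(example_result)
```

Reporting a fixed list of taxa, rather than whatever happens to be present, keeps pathogen-free samples in the same table layout with explicit zeros.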
The bacterial 16S rRNA gene (ssu)
Evaluation of 16S rDNA-based community profiling for human microbiome research. Jumpstart Consortium Human Microbiome Project Data Generation Working Group. PLoS One. 2012;7(6):e39315.
Trends in 16S rRNA gene sequencing
Full-length Sanger sequencing: PCR => clone => sequence; all 9 hypervariable regions; expensive, time-consuming, accurate taxa ID
1/3-length 454 sequencing: PCR 500bp regions => sequence; 2-4 hypervariable regions; inexpensive, high-throughput, less accurate taxa ID
1/10-length Illumina sequencing: PCR 500bp regions => sequence; 1 hypervariable region; very cheap, very high-throughput, less accurate taxa ID
Full-length PB: PCR => sequence; 9 hypervariable regions
Full-length 16S CCS sequencing on single organisms
Organism                  Length (reference or cluster)  % identity
Enterococcus faecalis     1543                           99.6
Staphylococcus aureus     1537                           99.9
Escherichia coli          1528                           99.7
Rhodobacter sphaeroides   1456                           99.9
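The slide does not say how % identity was computed. One common definition is the share of matching columns in a pairwise alignment between the CCS read (or cluster consensus) and the reference. The sketch below assumes that definition and assumes the two sequences are already aligned, with gaps written as "-". The function name and the toy sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of alignment columns where both sequences carry the same base.

    Both inputs must be the same length (already aligned, gaps as '-').
    Columns that are gaps in both sequences are ignored; a gap opposite a
    base counts as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (a, b) for a, b in zip(aligned_a, aligned_b)
        if not (a == "-" and b == "-")
    ]
    if not columns:
        raise ValueError("alignment has no informative columns")
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


# Toy alignment: 10 columns, one gap and one substitution.
toy_identity = percent_identity("ACGT-ACGTA", "ACGTTACGAA")
print(toy_identity)
```

Under this definition, 99.6% identity over an alignment of about 1543 columns corresponds to roughly six differing positions.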
Large-scale single isolate typing
Have ~8000 isolates (in microtiter plates) from the hospital; each well has one strain
Looking for unsequenced species from humans
Need full-length (FL) 16S in order to make a species call for typing
400-base reads from 454 do not give enough specificity
Sanger sequencing of FL PCR products => single sequence without cloning
Can PacBio compete: cheaper, higher throughput?
Goal: find what species these isolates are; choose novel isolates; perform whole-genome (WG) sequencing
Large-scale single isolate typing with PacBio
Sanger: does not see alleles of multiple 16S genes per strain
PacBio: can see different alleles, since it sequences single molecules
PacBio: can see low-level contaminants
Hospital isolates (82):
  70 samples agree between Sanger and PacBio
  4 samples have minor species seen with both platforms
  5 samples have strain differences seen with PacBio, not with Sanger
  2 samples failed with Sanger, not with PacBio
  1 sample disagrees
7 DNA sample controls agree between platforms
4 known culture sample controls agree between platforms
99% agreement between Sanger and PacBio (90/91); only 1 disagreement between the platforms
More information from PacBio
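The 90/91 figure can be reproduced from the tallies on this slide under one reading, which is an inference and not stated in the talk: the 2 samples that failed with Sanger are excluded because there is nothing to compare, and the minor-species and strain-difference samples count as agreeing at the species level. A short sketch of that arithmetic, with hypothetical key names:

```python
# Counts copied from the slide; the dictionary keys are hypothetical labels.
hospital_isolates = {
    "agree": 70,
    "minor_species_both_platforms": 4,
    "strain_differences_pacbio_only": 5,
    "failed_sanger": 2,
    "disagree": 1,
}
controls = {"dna_sample_controls": 7, "known_culture_controls": 4}


def platform_agreement(hospital, control_counts):
    """Return (concordant, comparable, percent) under the stated assumptions."""
    total = sum(hospital.values()) + sum(control_counts.values())
    # Assumption: samples that failed with Sanger cannot be compared.
    comparable = total - hospital["failed_sanger"]
    # Assumption: everything comparable except the one disagreement agrees.
    concordant = comparable - hospital["disagree"]
    return concordant, comparable, 100.0 * concordant / comparable


concordant, comparable, percent = platform_agreement(hospital_isolates, controls)
print(f"{concordant}/{comparable} = {percent:.1f}%")
```

The hospital counts sum to 82, which matches the slide, and the result is 90 of 91, or about 98.9%, which the slide rounds to 99%.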
Cost is an issue
With 96 samples per SMRT cell, the fully loaded cost of PacBio is about 2x Sanger.
Fully loaded cost includes: SMRT cell; sequencing reagents; library kit and labor; instrument; computation (storage, labor, CPU)
Would need to pool more samples per SMRT cell
Need more barcodes
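How many samples per SMRT cell would be needed for parity depends on how much of the PacBio cost is shared across the pool. The slide gives no breakdown, so the sketch below introduces a hypothetical parameter, `shared_fraction`: the share of today's per-sample PacBio cost that is divided among the pooled samples (SMRT cell, reagents, instrument time), with the rest treated as a fixed per-sample cost. Sanger cost is normalised to 1 and assumed constant. None of these numbers come from the talk apart from 96 samples and the 2x ratio.

```python
def break_even_samples(current_pool=96, cost_ratio=2.0, shared_fraction=1.0):
    """Samples per SMRT cell at which PacBio per-sample cost equals Sanger's.

    current_pool: samples per cell today (96 on the slide).
    cost_ratio: PacBio per-sample cost relative to Sanger today (2x on the slide).
    shared_fraction: hypothetical share of that cost split across the pool.
    Returns None when pooling alone cannot reach parity.
    """
    shared = cost_ratio * shared_fraction          # shrinks as the pool grows
    fixed = cost_ratio * (1.0 - shared_fraction)   # paid for every sample
    if fixed >= 1.0:
        return None
    # Solve shared * current_pool / n + fixed = 1 for n.
    return current_pool * shared / (1.0 - fixed)


print(break_even_samples())                      # all cost shared
print(break_even_samples(shared_fraction=0.75))  # a quarter is per-sample
print(break_even_samples(shared_fraction=0.5))   # parity unreachable
```

If every cost were shared, doubling the pool to 192 samples would reach parity. Any fixed per-sample cost pushes the break-even pool size higher, which is consistent with the slide's point that more barcodes are needed.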
Sequencing communities of microbes en masse
16S rRNA gene sequencing for community profiling:
  Full-length gives species-level definition
  454 500bp reads give genus-level definition
Shotgun sequencing:
  Longer reads give better assembly (of unknown, uncultured organisms)
  Bacteria, viruses, fungi and other eukaryotes described
Simulated community 16S sequencing
A mock community of 24 species; only 22 amplified with the primers used
Organisms range over 300-fold in abundance
Make 4 different batches
Aim for 5000 sequences/sample (454 protocol)
Results for Pool 1, Pool 2, Pool 3, Pool 4:
  Reads after filtering: 3557, 5055, 10331, 9798
  Species found: 20, 21, 22 (only three values survive in the transcript for the four pools)
  % reads hitting species: 99.9, 92.0, 90.8 (only three values survive in the transcript for the four pools)
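With a 300-fold abundance range and a target of about 5000 reads per sample, whether the rarest members are seen at all is partly a matter of sampling depth. The sketch below assumes reads are drawn independently and in proportion to template abundance, which ignores primer bias (the slide itself notes that 2 of the 24 species did not amplify). The function names and the example abundance are hypothetical.

```python
def expected_reads(rel_abundance, n_reads):
    """Expected number of reads from a taxon at the given relative abundance."""
    return rel_abundance * n_reads


def detection_probability(rel_abundance, n_reads):
    """Probability of observing at least one read from the taxon.

    Assumes each read independently comes from the taxon with probability
    rel_abundance, so P(no reads) = (1 - rel_abundance) ** n_reads.
    """
    return 1.0 - (1.0 - rel_abundance) ** n_reads


# Hypothetical rare member at 1 in 10,000 templates, sequenced to 5000 reads.
rare = 1e-4
print(expected_reads(rare, 5000))
print(round(detection_probability(rare, 5000), 3))
```

At that abundance only about 0.5 reads are expected and the chance of seeing the taxon is roughly 39%, so a species can be missing from one pool and present in another purely through sampling.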
Mock community analysis with Sanger, 454
Evaluation of 16S rDNA-based community profiling for human microbiome research. Jumpstart Consortium Human Microbiome Project Data Generation Working Group. PLoS One. 2012;7(6):e39315.
Consistent recognition of each organism in the pool across 4 replicate 16S amplifications. 300-fold difference in prevalence of 16S genes between the organisms in the pool. Methanobrevibacter (an archaeon) and Collinsella do not amplify with the 16S primers used.
INFECTION: replace culture-based analysis with metagenomic analysis
[Workflow diagram. Labels: Sample; Traditional culture-based analysis; Culture single species; Metagenomic analysis (culture-independent); 16S; WGS; Alignment; Assembly, Annotation; Species present; Variants of a species; Genes of interest; Strains/Subspecies based on SNP/indel content; Strains/Subspecies based on gene content.]
Acknowledgments
Clinical: Greg Storch, WU; Susan Haake, UCLA; Phil Tarr, WU; Martin Blaser, NYU; Barb Warner, WU; Richard Hotchkiss, WU; J. Dennis Fortenberry, Indiana U; Scott Weiss, Harvard; Ellen Li, SUNY-Stony Brook; Katherine Gregory, Harvard; Huiying Li, UCLA; Catherine O’Brien, Toronto; Brad Warner, WU; Homer Twigg, Indiana U; many others
Washington University Genome Institute: Makedonka Mitreva, Erica Sodergren, Sahar Abubucker, Karthik Kota, John Martin, Bruce Rosa, Yanjiao Zhou, Kristine Wylie, Kathie Mihindukulasuriya, Hongyu Gao, Bill Shannon, Patricio La Rosa; great Production & Informatics teams
Funding: NIH, Gates Foundation
Peer Bork Group: Siegfried Schloissnig, Manimozhiyan Arumugam, Shinichi Sunagawa, Julien Tap, Ana Zhu, Alison S. Waller, Daniel R. Mende, Shamil R. Sunyaev
Thank you to the subjects and their families