Metagenomics: From Bench to Data Analysis 19-23rd September 2016 16S rRNA-based surveys for Community Analysis: How Quantitative are they? Dr.

1 Metagenomics: From Bench to Data Analysis, 19-23rd September 2016
16S rRNA-based surveys for Community Analysis: How Quantitative are they?
Dr Mark Alston, Computational Biologist, Organisms and Ecosystems Group

2 Outline
Compare sequencing platforms and 16S rRNA regions
Amplicon choice: amplicons vs. full-length rRNA sequencing
Bias and quantification
Comparison to WGS approaches

3 16S Microbial Community Profiling
Figure: 16S rRNA gene sequence, showing conserved (green) and hypervariable (blue) regions
Most common phylogenetic marker: the ‘gold standard’ in molecular surveys of bacterial and archaeal diversity
Pros: ubiquitous, highly conserved, evolutionarily stable
Cons: often present in multiple copies, little resolution at or below species level

4 Comparing Different Platforms and Target Regions
‘A comprehensive benchmarking study of protocols and sequencing platforms for 16S rRNA community profiling’ DOI: /s
Compare sequencing platforms: MiSeq (Illumina), Pacific Biosciences RSII, 454 GS-FLX/+ (Roche), IonTorrent (Life Technologies)
Compare target regions
Assess performance via synthetic microbial communities: mix gDNA from 49 bacterial and 10 archaeal species, in even and uneven distributions
Table: summary of primers and platforms used

5 Ability of Different Platforms and Regions to Reconstruct the Synthetic Community
Even synthetic community
Platform had a significant effect
Species’ frequencies highly unbalanced
Possible causes: primer mismatches, rRNA copy number, amplification bias (associated with target length)
Figure axes: Bacterial Species, Target Region
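
Note: a minimal sketch of how rRNA copy number alone can unbalance observed frequencies, assuming reads are drawn in proportion to the number of 16S copies present. The taxon names, copy numbers, and the function name `expected_read_fractions` are hypothetical and are not taken from the benchmarking study.

```python
def expected_read_fractions(genome_fractions, copies_per_genome):
    """Expected share of 16S reads per taxon if reads scale with gene copies.

    genome_fractions: true share of genomes per taxon (sums to 1).
    copies_per_genome: number of 16S rRNA gene copies in each genome.
    """
    weighted = {
        taxon: genome_fractions[taxon] * copies_per_genome[taxon]
        for taxon in genome_fractions
    }
    total = sum(weighted.values())
    return {taxon: w / total for taxon, w in weighted.items()}


# Hypothetical two-taxon community with equal numbers of genomes.
genomes = {"taxon_A": 0.5, "taxon_B": 0.5}
# Hypothetical copy numbers: one copy vs. seven copies per genome.
copies = {"taxon_A": 1, "taxon_B": 7}

reads = expected_read_fractions(genomes, copies)
for taxon, fraction in reads.items():
    print(f"{taxon}: {fraction:.3f}")
```

Even with no primer mismatch or amplification bias, the multi-copy taxon dominates the reads in this toy case although both taxa are equally abundant.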

7 How do Different rRNA Regions reflect Composition?
‘Comparative metagenomic and rRNA microbial diversity characterization using archaeal and bacterial synthetic communities’ DOI: /
Synthetic Bacteria community
Heat map represents accuracy ratio: perfect agreement has a value of 1
Heat map legend: underestimated abundance, overestimated abundance
Regions suffer from substantial bias
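
Note: a rough illustration of an accuracy ratio of the kind shown in the heat map, assuming it is the observed relative abundance divided by the expected relative abundance. The cited paper's exact calculation may differ, and the taxon names, read counts, and the function name `accuracy_ratios` are made up for this example.

```python
def accuracy_ratios(observed_counts, expected_fractions):
    """Return observed/expected relative abundance for each taxon.

    1 means perfect agreement, below 1 means the taxon is underestimated,
    above 1 means it is overestimated.
    """
    total_reads = sum(observed_counts.values())
    ratios = {}
    for taxon, expected in expected_fractions.items():
        observed = observed_counts.get(taxon, 0) / total_reads
        ratios[taxon] = observed / expected
    return ratios


# Hypothetical even community of four taxa (25% each) with made-up read counts.
expected = {"taxon_A": 0.25, "taxon_B": 0.25, "taxon_C": 0.25, "taxon_D": 0.25}
observed = {"taxon_A": 500, "taxon_B": 250, "taxon_C": 125, "taxon_D": 125}

ratios = accuracy_ratios(observed, expected)
for taxon, ratio in ratios.items():
    print(f"{taxon}: {ratio:.2f}")
```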

8 Which Region Should I Choose?
Figure: 16S rRNA gene sequence, showing conserved (green) and hypervariable (blue) regions
Most common approach: V4, V3–V4 or V4–V5 primers on Illumina platforms, ~250–430 bp read length
e.g. 16S for V4 on MiSeq

10 Full-length vs. Amplicon 16S Sequencing
Factors affecting taxon abundance estimates and tree-placement: sequencing platform, primer choice, read length, environmental source, reference database, assignment method [or a combination]
New technologies: short reads sequence ~15-30% of the full 16S rRNA gene
More quantitative information
Reduced taxonomic resolution: species level assignment can be elusive, with implications for inferring metabolic traits in various ecosystems
Use full-length 16S rRNA sequencing?

11 Full-length 16S rRNA Sequencing
PacBio: long-read, single-molecule real-time (SMRT) technology; average read lengths > 8 kb at ~87% read accuracy; only been used for a few environmental surveys
‘High-resolution phylogenetic microbial community profiling’ DOI: /ismej
MinION™: USB stick-sized device; per-base sequencing accuracy ~85% for 2D reads; additional read length helps resolve 16S rRNA to species level
‘Species level resolution of 16S rRNA gene amplicons sequenced through MinION™ portable nanopore sequencer’ DOI: /s z

13 Full-length 16S rRNA Sequencing and Gene Variability
Non-homogeneous distribution of mutations
Varies across different phylogenetic groups
Leads to both over- and underestimation of community diversity

15 Full-length 16S rRNA Sequencing and Gene Variability
Non-homogeneous distribution of mutations
Varies across different phylogenetic groups
Leads to both over- and underestimation of community diversity
Example: 2 Salmonella spp. are 97.4% identical across the gene but 100% identical across the V4 region, so a V4 survey would underestimate community diversity

16 Full-length 16S rRNA Sequencing and Gene Variability
Non-homogeneous distribution of mutations
Varies across different phylogenetic groups
Leads to both over- and underestimation of community diversity
Example: where mutations have accumulated in the V4 region, a V4 survey would overestimate community diversity
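
Note: a minimal sketch of why a single region can under- or overestimate diversity, comparing percent identity across a whole aligned gene with percent identity inside one window. The 20-nt sequences and the window coordinates are toy values invented for illustration; they are not real 16S or V4 data, and `percent_identity` is a hypothetical helper name.

```python
def percent_identity(seq_a, seq_b, start=0, end=None):
    """Percent of identical positions between two aligned, equal-length sequences.

    `start` and `end` select a window (Python slice semantics); by default the
    whole alignment is compared.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    window_a, window_b = seq_a[start:end], seq_b[start:end]
    matches = sum(1 for a, b in zip(window_a, window_b) if a == b)
    return 100.0 * matches / len(window_a)


# Toy "genes"; positions 8-11 stand in for a hypervariable window.
gene_1 = "ACGTACGTACGTACGTACGT"
gene_2 = "ACCTACGTACGTACGTACTT"  # differs from gene_1 only outside the window
gene_3 = "ACGTACGTTGCTACGTACGT"  # differs from gene_1 only inside the window

# Case 1: distinct genes look identical in the window (diversity underestimated).
full_12 = percent_identity(gene_1, gene_2)
window_12 = percent_identity(gene_1, gene_2, 8, 12)

# Case 2: changes concentrated in the window exaggerate the difference
# (diversity overestimated).
full_13 = percent_identity(gene_1, gene_3)
window_13 = percent_identity(gene_1, gene_3, 8, 12)

print(full_12, window_12)
print(full_13, window_13)
```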

17 Compare FL vs. V4 [Sakinaw lake samples]
Community composition profile at genus level
Colour pairs denote samples of the same depth
Bubble sizes indicate read abundance

18 Compare FL vs. V4 [Sakinaw lake samples]
BUT it looks possible to draw the same conclusions because the two profiles have a lot in common!
FL vs. V4 discrepancies highlighted by boxes, e.g. Bacillus greatly underrepresented by V4 c.f. PB [50m samples]
‘High-resolution phylogenetic microbial community profiling’ DOI: /ismej

19 Platforms and Regions Suffer from Substantial Bias
The observed relative frequencies do not reflect the true species frequencies in the community

21 Platforms and Regions Suffer from Substantial Bias
The observed relative frequencies do not reflect the true species frequencies in the community
But the observed differences between samples could still reflect true differences
Can we have a quantitative method despite the bias?

22 Can 16S rRNA Sequencing be Quantitative?
‘A comprehensive benchmarking study of protocols and sequencing platforms for 16S rRNA community profiling’ DOI: /s
Assembled 2 synthetic communities: one with even distribution, one uneven
Take pairs of samples
Sequence on MiSeq and PacBio platforms

24 Can 16S rRNA Sequencing be Quantitative?
Compare for each species: true ratio of frequencies [known mixtures] and observed ratio of frequencies
Highly significant correlation between the two ratios [blue line] and a slope of 1 [red line]
Implies 16S rRNA sequencing is strongly quantitative despite being biased
MiSeq more quantitative than PacBio
Figure panels: MiSeq, PacBio; axes: Ratio of Observed Freq., Ratio of True Freq.
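
Note: a toy model of why between-sample ratios can stay quantitative even when each sample's frequencies are biased. It assumes the bias is a fixed multiplicative factor per species that is the same in every sample, which is an assumption of this sketch and not a claim from the benchmarking study. The frequencies, bias factors, use of log2 ratios, and all function names are illustrative choices.

```python
import math


def observe(true_freqs, bias):
    """Apply a per-species multiplicative bias, then renormalise to frequencies."""
    biased = {sp: freq * bias[sp] for sp, freq in true_freqs.items()}
    total = sum(biased.values())
    return {sp: value / total for sp, value in biased.items()}


def log2_ratios(sample_a, sample_b):
    """log2 of each species' frequency in sample A over sample B."""
    return [math.log2(sample_a[sp] / sample_b[sp]) for sp in sorted(sample_a)]


def slope_and_correlation(x, y):
    """Least-squares slope of y on x, and the Pearson correlation."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx, sxy / math.sqrt(sxx * syy)


# Hypothetical known mixtures: one even, one uneven.
true_even = {"sp1": 0.25, "sp2": 0.25, "sp3": 0.25, "sp4": 0.25}
true_uneven = {"sp1": 0.50, "sp2": 0.25, "sp3": 0.125, "sp4": 0.125}
# Hypothetical bias factors (e.g. copy number or primer efficiency).
bias = {"sp1": 2.0, "sp2": 0.5, "sp3": 1.0, "sp4": 4.0}

obs_even = observe(true_even, bias)
obs_uneven = observe(true_uneven, bias)

true_lr = log2_ratios(true_uneven, true_even)
obs_lr = log2_ratios(obs_uneven, obs_even)
slope, r = slope_and_correlation(true_lr, obs_lr)
print(f"slope = {slope:.3f}, correlation = {r:.3f}")
```

In this model the observed frequencies in each sample are far from the true ones, yet the per-species bias cancels in the ratio between samples, leaving only a constant offset on the log scale. Real data will scatter around the line because bias is not perfectly constant.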

25 MiSeq more quantitative than PacBio
Species responsible for this difference?
Which are more accurately quantified on one platform relative to the other?
Figure panels: MiSeq, PacBio; axes: Ratio of Observed Freq., Ratio of True Freq.

28 MiSeq vs. PacBio
Species with significantly different quantification accuracies: MiSeq the better platform
Except for strain resolution: full-length 16S rRNA sequencing of benefit (Shewanella baltica OS223, Shewanella baltica OS185)

29 16S Microbial Community Profiling
Figure: 16S rRNA gene sequence, showing conserved (green) and hypervariable (blue) regions
Most common approach: V4, V3–V4 or V4–V5 primers on Illumina platforms, ~250–430 bp read length
Economy of scale: single MiSeq run > 10 million reads
High base-calling accuracy
e.g. 16S for V4 on MiSeq

30 Compare Error Rates Across Platforms
Even synthetic community
Platform had a significant effect
MiSeq has the most accurate sequence reads

31 Impact of Overlapping Reads on MiSeq V4 Error Rates
Even synthetic community
Overlapping forward and reverse reads greatly reduces errors
Figure labels: MiSeq Dual Index barcode (Illumina barcodes on both reads); ‘stitched’ reads
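
Note: a simplified sketch of how stitching overlapping reads can remove errors. It assumes the reverse read has already been reverse-complemented and that the overlap length is known; real read-merging tools find the overlap by alignment and also recompute quality scores. The sequences, quality values, and the function name `merge_overlap` are hypothetical.

```python
def merge_overlap(fwd, fwd_qual, rev_rc, rev_qual, overlap):
    """Stitch a forward read to a reverse-complemented reverse read.

    The last `overlap` bases of `fwd` are assumed to cover the same positions
    as the first `overlap` bases of `rev_rc`. Where both reads cover a
    position, the base with the higher quality score is kept.
    """
    if not 0 < overlap <= min(len(fwd), len(rev_rc)):
        raise ValueError("overlap must be between 1 and the shorter read length")
    offset = len(fwd) - overlap
    merged = list(fwd[:offset])  # forward-only part
    for i in range(overlap):  # shared part: pick the better-supported base
        if fwd_qual[offset + i] >= rev_qual[i]:
            merged.append(fwd[offset + i])
        else:
            merged.append(rev_rc[i])
    merged.extend(rev_rc[overlap:])  # reverse-only part
    return "".join(merged)


# Hypothetical 14-nt amplicon and two reads overlapping by 4 bases.
amplicon = "ACGTTGCAAGGCTA"
fwd = "ACGTTGCAAT"  # last base is a sequencing error (G read as T)
fwd_qual = [38] * 9 + [12]  # the erroneous base has a low quality score
rev_rc = "CAAGGCTA"  # reverse read, already reverse-complemented
rev_qual = [36] * 8

stitched = merge_overlap(fwd, fwd_qual, rev_rc, rev_qual, overlap=4)
print(stitched == amplicon)
```

Errors tend to concentrate at the low-quality 3' ends of reads, which is exactly where the two reads overlap, so each read can correct the other.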

33 Shotgun Metagenomics vs. Amplicon Sequencing
‘Comparative metagenomic and rRNA microbial diversity characterization using archaeal and bacterial synthetic communities’ DOI: /
Compare amplicon sequencing to Illumina [HiSeq] and 454 metagenomics sequencing
Metagenomic data tends to outperform amplicon sequencing

34 Shotgun Metagenomics vs. Amplicon Sequencing
‘A comprehensive benchmarking study of protocols and sequencing platforms for 16S rRNA community profiling’ DOI: /s
Figure labels: MiSeq, MG sample, expected
Metagenome sample benchmark should be relatively unbiased, as there are fewer PCR amplification steps in library construction
WGS gives the most accurate species estimations

35 Is 16S “Metagenomics”?
Many papers talk about “metagenomics analysis based on microbial 16S rRNA gene sequencing”, “16S metagenomic studies” etc.
But rRNA surveys focus on a single gene, not genomes
Is this due to a fear of not getting funded if you don’t include a word containing ‘Meta*omics’?
“Referring to 16S surveys as metagenomics is misleading and annoying #badomics #OmicMimicry”

36 In Summary
Many sources of bias when we sequence 16S rRNA, e.g. platform, region etc.
Can still be quantitative
MiSeq V4 a good ‘all round bet’, but prior knowledge of taxa may suggest otherwise; combinations of primers?; full-length for strain resolution
Whole genome shotgun: better estimations of species abundances

37 Metagenomics: From Bench to Data Analysis 19-23rd September 2016 Thank You for Listening
Dr Mark Alston Computational Biologist Organisms and Ecosystems Group

