Advancing Science with DNA Sequence Natalia Ivanova MGM Workshop September 12, 2012 Metagenome analysis: use case.

1 Advancing Science with DNA Sequence Natalia Ivanova MGM Workshop September 12, 2012 Metagenome analysis: use case

2 Minoan eruption and metagenomics
“…it seemed as though the sea was being sucked backwards, as if it were being pushed back by the shaking of the land… Behind us were frightening dark clouds, rent by lightning twisted and hurled, opening to reveal huge figures of flame. These were like lightning, but bigger.”
From Pliny the Younger’s letter

3 Apart from Minoan eruption…
[Figures: from Chernicoff & Stanley, Geology, 2007; diagram by Gary Massoth/PMEL]

4 Sampling sites: white mat and red mat
Key gradients, white vs red:
- Temperature: 60 vs 18 °C
- CO2 tension: >99% vs <1%

5 This is what it looks like

6 Chimney material may be of biological origin

7 Standard JGI metagenome pipeline
DNA sample => DNA QC, then two tracks:
- SSU pyrotags (http://pyrotagger.jgi-psf.org): community composition; semi-quantitative OTU abundance
- Shotgun libraries (Illumina standard, Illumina long mate pair, 454 standard, 454 long mate pair) => Assembly => Metagenome (contigs + unassembled reads) => Analysis in IMG/M-ER: community composition; functional analysis

8 Pyrotag results – BLASTn against Greengenes database

9 PhyloDistribution results – BLASTp of metagenome CDSs against isolates in IMG

10 Pyrotags vs PhyloDistribution – white mat
Big differences in abundance (an order of magnitude or more) of Bacteroidetes and Thermotogae
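
One way to flag such discrepancies, sketched here under stated assumptions: compare the relative abundance of each clade in the two profiles on a log10 scale and report clades that differ by at least one order of magnitude. The function names, the pseudocount, and the percentages in the usage example are hypothetical placeholders, not values from the slides.

```python
import math

def log10_fold_differences(pyrotag_pct, metagenome_pct, pseudo=0.01):
    """Return {clade: log10(pyrotag / metagenome)} for every clade seen in either profile.

    A small pseudocount (an assumption of this sketch) avoids division by zero
    for clades that are missing from one of the two profiles.
    """
    clades = set(pyrotag_pct) | set(metagenome_pct)
    return {
        clade: math.log10(
            (pyrotag_pct.get(clade, 0.0) + pseudo)
            / (metagenome_pct.get(clade, 0.0) + pseudo)
        )
        for clade in clades
    }

def discordant_clades(pyrotag_pct, metagenome_pct, min_log10=1.0):
    """List clades whose abundance differs by at least 10**min_log10 between methods."""
    diffs = log10_fold_differences(pyrotag_pct, metagenome_pct)
    return sorted(clade for clade, d in diffs.items() if abs(d) >= min_log10)

# Hypothetical percentages, for illustration only.
pyrotags = {"Bacteroidetes": 30.0, "Thermotogae": 0.5, "Proteobacteria": 20.0}
metagenome = {"Bacteroidetes": 2.0, "Thermotogae": 12.0, "Proteobacteria": 25.0}
print(discordant_clades(pyrotags, metagenome))
```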

11 Possible explanations
- Amplification artifacts in pyrotags – well known for metagenome data
- Sequencing GC bias in the metagenome – low-GC and high-GC (>65%) sequences are underrepresented in Illumina data
- K-mer assembler problems: abundant populations may be underrepresented in the assembly if incorrect k-mer/coverage parameters are selected
- Primer bias in pyrotags (against Proteobacteria)?

12 PCR artifacts in metagenome data
454 technology includes an emulsion PCR (emPCR) step, which may lead to artificial overrepresentation of certain sequences
Reason: free beads are present during the library prep step; escaped emPCR products bind to the free beads and are disproportionately amplified

13 What about GC bias?
[Figure panels: low GC (Brachyspira), medium GC (Arcanobacterium), high GC (Cellulomonas)]
Question: how do you find average/max/min GC content for a clade?
Answer: IMG=>Genome Browser=>View Phylogenetically=>click on green + to select the clade, then “Add selected to Genome Cart”=>Compare Genomes=>Genome Statistics
Result:
Thermotogae GC percent: 41 average / 47 max / 31 min
Bacteroidetes GC percent: 42.5 average / 66 max / 31 min
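
Outside IMG, the same average/max/min summary can be computed directly from genome sequences. The following is a minimal sketch, assuming the genomes of a clade are already loaded as plain strings; the function names and the toy sequences are illustrative and not part of IMG.

```python
def gc_percent(seq):
    """GC content in percent, counting only unambiguous A/C/G/T bases."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return 100.0 * gc / (gc + at)

def clade_gc_summary(genomes):
    """Average, max and min GC percent over a {genome_name: sequence} mapping."""
    values = [gc_percent(seq) for seq in genomes.values()]
    if not values:
        raise ValueError("no genomes given")
    return {
        "average": sum(values) / len(values),
        "max": max(values),
        "min": min(values),
    }

# Toy sequences, for illustration only.
toy_clade = {"genome_a": "GGCC", "genome_b": "AATT", "genome_c": "GCAT"}
print(clade_gc_summary(toy_clade))
```

Note that this averages per-genome GC values, so a large genome and a small one carry equal weight.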

14 Let’s take a closer look at the unassembled reads
                                               White mat     Red mat
454 reads, total                                 299,975   1,429,091
Illumina reads, total                         49,227,146  45,337,178
Assembled contigs                                195,590      88,776
N50, bp                                              659         869
Longest contig, bp                                28,145      75,483
Illumina reads mapped to assembly, % total          42.3        12.5
454 reads mapped to assembly, % total               62.1        15.3
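
For readers unfamiliar with the N50 statistic in the table, here is a minimal sketch of the usual definition: the length of the contig at which the running sum of contig lengths, taken from longest to shortest, first reaches half of the total assembly length. The function name and the toy lengths are illustrative; the assembler used for these samples may report N50 with slightly different conventions.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths (in bp)."""
    if not contig_lengths:
        raise ValueError("no contigs given")
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        # Stop at the first contig that brings the running sum to >= half the total.
        if 2 * running >= total:
            return length

# Toy contig lengths, for illustration only.
print(n50([2, 3, 4, 5, 6]))
```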

15 It’s pyrotag bias after all!
JGI uses primer pair 946F-1492R
1492R primer:               TACGCYTACCTTGTTACGACTT
Sequence in the metagenome: TACGGTTACCTTGTTACGACTT
C/G mismatch (position 5)
JGI did extensive testing on artificial communities; this problem was not detected
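
The mismatch can be checked programmatically. This sketch compares the two sequences exactly as written on the slide, position by position, treating degenerate primer bases by their standard IUPAC meaning (Y = C or T). The function name is illustrative, and the comparison assumes both sequences are given on the same strand and already aligned.

```python
# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def primer_mismatches(primer, target):
    """Return (1-based position, primer base, target base) for every mismatch."""
    if len(primer) != len(target):
        raise ValueError("primer and target must have the same length")
    return [
        (i + 1, p, t)
        for i, (p, t) in enumerate(zip(primer.upper(), target.upper()))
        if t not in IUPAC[p]
    ]

primer_1492r = "TACGCYTACCTTGTTACGACTT"  # primer as shown on the slide
metagenome_site = "TACGGTTACCTTGTTACGACTT"  # sequence in the metagenome, from the slide
print(primer_mismatches(primer_1492r, metagenome_site))  # [(5, 'C', 'G')]
```

The degenerate Y at position 6 accepts the T in the metagenome sequence, so position 5 is the only mismatch.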

16 Functional analysis: metagenome as a bag of functions
Red mat is taxonomically more diverse. Is it more diverse functionally?
                   White mat     Red mat
COG clusters            3631        3402
Pfam clusters           3847        3505
Question: where do you find this information?
Answer: IMG=>Taxon Details=>Metagenome Statistics; Genes with Pfam=>Display as a list=>Export
Rarefaction curves: white mat is expected to have ~4000 different Pfams; red mat ~3600
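
A rarefaction curve can be sketched from the exported per-Pfam gene counts. The example below uses the classical analytic formula for the expected number of distinct families in a random subsample of n genes drawn without replacement; it is an assumption that this matches how the curves on the slide were produced, and extrapolating the curve to its plateau (the ~4000 and ~3600 estimates) is a separate step not shown here. The function name and counts are illustrative.

```python
from math import comb

def expected_richness(family_counts, n):
    """Expected number of distinct families among n genes sampled without replacement.

    family_counts: number of genes observed for each family.
    Uses E[S_n] = sum_i (1 - C(N - N_i, n) / C(N, n)), where N is the total gene count.
    """
    total = sum(family_counts)
    if not 0 <= n <= total:
        raise ValueError("n must be between 0 and the total gene count")
    denom = comb(total, n)
    # comb(total - c, n) is 0 when n > total - c, i.e. the family is certain to be sampled.
    return sum(1 - comb(total - c, n) / denom for c in family_counts)

# Toy counts, for illustration only: four families with 5, 3, 1 and 1 genes.
toy_counts = [5, 3, 1, 1]
curve = [expected_richness(toy_counts, n) for n in range(sum(toy_counts) + 1)]
print([round(x, 2) for x in curve])
```

Exact binomial coefficients become slow for counts in the hundreds of thousands; for real metagenomes, repeated random subsampling is a common substitute.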

17 Abundance Comparisons
Motility and chemotaxis genes are overrepresented in white mat (detected by both Pfams and COG Categories)
[Figure panels: white mat, red mat]

18 Is motility/chemotaxis common to all organisms in white mat?
Scenario 1: the function/pathway is overrepresented because it is present in all members of the community, possibly at higher copy number
Scenario 2: the function/pathway is overrepresented because it is present in one clade, which is absent from the second sample
Question: can we distinguish between the two scenarios?
Answer: click on the gene count for protein family/functional category, add all genes to Gene Cart=>add scaffolds to Scaffold Cart=>PhyloDistribution of all scaffolds in the Scaffold Cart
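
The logic behind the answer can be sketched offline: given one clade assignment per scaffold that carries a gene of the function of interest, tally the share contributed by each clade. A share dominated by a single clade points toward scenario 2, while an even spread points toward scenario 1. The function names, the 0.8 threshold, and the labels are hypothetical choices for this sketch, not IMG outputs.

```python
from collections import Counter

def clade_shares(scaffold_clades):
    """Fraction of function-carrying scaffolds assigned to each clade, largest first."""
    counts = Counter(scaffold_clades)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no scaffolds given")
    return {clade: n / total for clade, n in counts.most_common()}

def dominated_by_one_clade(scaffold_clades, threshold=0.8):
    """True if a single clade accounts for at least `threshold` of the scaffolds."""
    return max(clade_shares(scaffold_clades).values()) >= threshold

# Hypothetical assignments, for illustration only.
assignments = ["clade_A"] * 9 + ["clade_B"]
print(clade_shares(assignments))
print(dominated_by_one_clade(assignments))
```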

19 Carbon fixation pathways

20 Conclusions
- The two communities differ in composition; the white mat, sampled next to the hydrothermal vent, has lower complexity
- Community composition as sampled by pyrotags and by the metagenome may differ substantially because of several biases
- Some protein families/functional categories are more abundant because of differences in community composition, not because they are more important

