Issues, Concepts and Tools for Analysis of Fungal Environmental ITS Sequences D. Lee Taylor Shawn Houston Institute.


1 Issues, Concepts and Tools for Analysis of Fungal Environmental ITS Sequences D. Lee Taylor ltaylor@iab.alaska.edu Shawn Houston houston@alaska.edu Institute of Arctic Biology University of Alaska Fairbanks Photograph: Roger Ruess

2 Coupling Diversity with Function: Metagenomics of Boreal Forest Fungi
USDA-NSF Microbial Genome Sequencing Program, 2003-2007
IPY: A Community Genomics Investigation of Fungal Adaptation to Cold
NSF OPP International Polar Year, 2007-2011
Major Clone Datasets To Date:
- Upland successional stages, Bonanza Creek LTER site: 30,000
- Various black spruce community types, Interior Alaska: 40,000
- Two individual floodplain black spruce soil cores: 20,000
- Seasonal study in single white spruce site, Interior Alaska: 9,200
- Moist sites along North American Arctic Transect, bioclimatic subzones A-E: 9,200
- Moist sites at Svalbard, subzones B, C: 3,000
- Snow addition experiment at Toolik Lake LTER tundra site: 3,800

3 Bioinformatic Processing of Fungal ITS Sequences from the Environment
I. Initial sequence cleanup -> Quality Scores -> Masking if Sanger Sequences
II. Bar-coding/tagging -> Long, bias tested, edit distance -> Tag-finder script
III. Chimeras -> Uclust
IV. Defining OTUs
   I. Introns -> TGICL/Cap3 Genome Assemblers
   II. Percent Identity Thresholds -> Phylobinning
   III. Pseudogenes -> ???
V. Identifying OTUs
   I. Curated databases with masking of conserved seqs
VI. SIMPLE DEMO/TUTORIAL

4 ftp://folders.inbre.alaska.edu/FMP/ http://www.borealfungi.uaf.edu/pipeline/

5

6 V Kunin, A Engelbrektson, H Ochman and P Hugenholtz. 2010. Wrinkles in the rare biosphere: pyrosequencing errors can lead to artificial inflation of diversity estimates. Environmental microbiology 12:118–123.

7 Bioinformatic Processing of Fungal ITS Sequences from the Environment
I. Initial sequence cleanup -> Quality Scores -> Masking if Sanger Sequences
II. Bar-coding/tagging -> Long, bias tested, edit distance -> Tag-finder script
III. Chimeras -> Uclust
IV. Defining OTUs
   I. Introns -> TGICL/Cap3 Genome Assemblers
   II. Percent Identity Thresholds -> Phylobinning
   III. Pseudogenes -> ???
V. Identifying OTUs -> Curated databases with masking of conserved seqs

8

9 Design of Pig-Tagged Primers Taylor DL, Booth MG, McFarland JW, Herriott IC, Lennon NJ, Nusbaum C & Marr TG. 2008. Increasing ecological inference from high throughput sequencing of fungi in the environment through a tagging approach. Molecular Ecology Resources 8(4): 742 - 752.

10 4 Taxon Test for Biases Taylor DL, Booth MG, McFarland JW, Herriott IC, Lennon NJ, Nusbaum C & Marr TG. 2008. Increasing ecological inference from high throughput sequencing of fungi in the environment through a tagging approach. Molecular Ecology Resources 8(4): 742 - 752.

11 Soil Sample Tests for Biases
       Soil 1            Soil 2
OTU    Tag064  Tag102    Tag067  Tag126    Grand Total
1      13      14        37      35        99
34     9       8         10      3         30
32     7       7         5       4         23
29     12      10        0       0         22
25     9       9         0       0         18
14     2       0         7       5         14
26     6       4         0       3         13
4      2       2         5       3         12
30     0       0         8       4         12
17     3       3         2       2         10
2      0       2         1       5         8
5      2       0         2       3         7
19     5       2         0       0         7
23     0       0         3       4         7
40     2       0         2       3         7
3      0       0         3       3         6
31     4       1         1       0         6
7      1       2         0       2         5
18     0       0         1       4         5
11     1       1         1       1         4
12     2       0         1       1         4
21     1       3         0       0         4
41     1       3         0       0         4

12

13 Bioinformatic Processing of Fungal ITS Sequences from the Environment
I. Initial sequence cleanup -> Quality Scores -> Masking if Sanger Sequences
II. Bar-coding/tagging -> Long, bias tested, edit distance -> Tag-finder script
III. Chimeras -> Uclust
IV. Defining OTUs
   I. Introns -> TGICL/Cap3 Genome Assemblers
   II. Percent Identity Thresholds -> Phylobinning
   III. Pseudogenes -> ???
V. Identifying OTUs
   I. Curated databases with masking of conserved seqs
VI. SIMPLE DEMO/TUTORIAL

14 Challenges: Chimeras
- Reports in literature up to 30% of clone datasets
- 3% in our earliest clone libraries
- <1% in a 30,000 clone black spruce dataset*
- Currently used detection methods depend upon global MSA and/or library of clean reference sequences

15 STEP 1: Identify 97% contigs that are represented in multiple libraries. Sequences belonging to these contigs are deemed to be real and non-chimeric.
STEP 2: BLAST sequences against all known databases of fungi (including GenBank and lab databases) and identify passing matches (queries).
STEP 3: BLAST ITS1 and ITS2 of remaining sequences against curated database, hunting for 97+% matches of both sides to same species. Sequences for which both the ITS1 and ITS2 regions match the same species at 97+% over 200+ bp are considered real and non-chimeric.
STEP 4: BLAST ITS1 and ITS2 sequences against database from which they came (including all libraries), hunting for matches to possible chimera parents.
STEP 5: Align full length queries against best ITS1 and ITS2 matches, examine by eye.
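The STEP 3 criterion can be written as a small predicate. The following is only a sketch in Python, not the authors' script: the `Hit` record and the function names are hypothetical, and it assumes the best BLAST hit for each half has already been parsed into species name, percent identity and alignment length.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Hit:
    """Best BLAST hit for one half of a read (hypothetical record layout)."""
    species: str
    percent_identity: float
    alignment_length: int


def passes(hit: Optional[Hit], min_identity: float = 97.0, min_length: int = 200) -> bool:
    """True if the hit meets the 97+% identity over 200+ bp criterion."""
    return (
        hit is not None
        and hit.percent_identity >= min_identity
        and hit.alignment_length >= min_length
    )


def halves_agree(its1: Optional[Hit], its2: Optional[Hit]) -> bool:
    """STEP 3: both ITS1 and ITS2 pass and point to the same species."""
    return passes(its1) and passes(its2) and its1.species == its2.species


if __name__ == "__main__":
    a = Hit("Piloderma fallax", 98.5, 240)
    b = Hit("Piloderma fallax", 97.2, 210)
    print(halves_agree(a, b))
```

Reads that fail this test are not declared chimeric; they simply continue to STEP 4 and STEP 5 for closer inspection.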

16

17 First Uclust test runs on fungal ITS sequences:
1) dataset of 45 OTUs with ITS plus 600bp LSU
   - 10 out of 10 synthetic chimeras detected, including intrageneric
   - only 2 real sequences suggested as possible chimeras, with low probability
2) examined another dataset of 547 real, relatively reliable sequences
   - spits out 3 way alignments that can be examined
   - Bellerophon suggested 53% chimeric, Uclust found ZERO

18 Bioinformatic Processing of Fungal ITS Sequences from the Environment
I. Initial sequence cleanup -> Quality Scores -> Masking if Sanger Sequences -> Orienting to fix direction
II. Bar-coding/tagging -> Long, bias tested, edit distance -> Tag-finder script
III. Chimeras -> Uclust
IV. Defining OTUs
   I. Introns -> TGICL/Cap3 Genome Assemblers
   II. Percent Identity Thresholds -> Phylobinning
   III. Pseudogenes -> ???
V. Identifying OTUs -> Curated databases with masking of conserved seqs

19 Challenges: Arbitrary % Identity Thresholds
- Multicopy
- Intra-individual variation (including pseudogenes)
- Intra-specific variation
- Different rates of evolution in different lineages
- How does 97% identity threshold perform?
SSU | ITS-1 | 5.8S | ITS-2 | LSU

20 Groupings Differ Depending on Alignment Program and Parameter Settings

21 X. Huang & A. Madan. 1999. Genome Research 9: 868-877

22 ITS phylogram of Lactarius ML tree, thick branches have >0.95 Bayesian Posterior Probability OTU 13 (97% ITS sim.) Geml J, Laursen GA, Timling I, McFarland J, Booth MG, Lennon N, Nusbaum HC, Taylor DL. 2009. Molecular Ecology 18: 2213–2227.

23 OTU 19 (97% ITS sim.) OTU 12 (97% ITS sim.) Geml J, Laursen GA, Timling I, McFarland J, Booth MG, Lennon N, Nusbaum HC, Taylor DL. 2009. Molecular Ecology 18: 2213–2227.

24 Our Phylobinning Approach:
- cluster with Cap3 at low % identity (90%)
- extract sequences from clusters
- find related sequences in GenBank (everything & uncultured excluded)
- generate alignments for each cluster using Muscle
- feed alignments to RAxML
- use fast-bootstrapping method and find best tree using maximum likelihood
- parse tree to determine phylobins:
  If branch length > 0.001 AND bootstrap >= 98, then name new phylobin
  If branch length < 0.01 AND bootstrap < 98, move to next cluster
  If branch length >= 0.01 AND bootstrap < 70, then move to next cluster
  If branch length >= 0.01 AND bootstrap >= 70, then name new phylobin
  If branch length >= 0.03, then name new phylobin (even if individual sequence)
All sequences from a contig that are not assigned to a phylobin at this point go into a last, default phylobin
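The tree-parsing rules on this slide can be read as one decision per branch. The sketch below is a reading of the slide text, not the pipeline's own code, and the function name is hypothetical. Two assumptions go beyond the slide: the ">= 0.03" rule is taken to override the bootstrap requirement (suggested by "even if individual sequence"), and a branch of length 0.001 or less never names a new phylobin, a case the slide leaves unstated.

```python
def names_new_phylobin(branch_length: float, bootstrap: float) -> bool:
    """Return True if a branch should start a new phylobin (slide 24 rules)."""
    # Assumption: very long branches always form a phylobin, whatever the support.
    if branch_length >= 0.03:
        return True
    # Moderately long branches need bootstrap support of at least 70.
    if branch_length >= 0.01:
        return bootstrap >= 70
    # Short branches (< 0.01) need near-complete support and must exceed 0.001.
    # Assumption: branches of length <= 0.001 are never split off.
    return branch_length > 0.001 and bootstrap >= 98


if __name__ == "__main__":
    for bl, bs in [(0.005, 99), (0.005, 90), (0.02, 60), (0.02, 75), (0.05, 10)]:
        print(bl, bs, names_new_phylobin(bl, bs))
```

A False result corresponds to "move to next cluster" on the slide; sequences never assigned this way fall into the default phylobin.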

25 Systematic Biology 57(5): 758–771, 2008 {RAxML version 7.0.4 released by Alexandros Stamatakis in April 2008}

26

27

28

29 Meliniomyces bicolor Cistella acuum Uncultured fungus clone TD9_OTU5 Uncultured fungus clone G20_OTU5 Uncultured fungus clone IH_Tag102_3331 Mycorrhizal fungal sp. pkc09 Mycorrhizal fungal sp. pkc12 Mycorrhizal fungal sp. pkc22 Mycorrhizal fungal sp. pkc18 Mycorrhizal fungal sp. pkc33 Mycorrhizal fungal sp. pkc38
TKN7_3179P22                                        phylobin18
*gi|133753088| Uncultured fungus clone G20_OT       phylobin18
*gi|133753170| Uncultured fungus clone TD9_OT       phylobin18
TKN12_3255J11                                       phylobin19
TKN12_3258A12                                       phylobin20
TKN9_3238J10                                        phylobin21
*gi|37624773| Mycorrhizal fungal sp. pkc18 1        phylobin21
*gi|37624772| Mycorrhizal fungal sp. pkc33 1        phylobin21
*gi|37624762| Mycorrhizal fungal sp. pkc38 1        phylobin21
TKN10_3235I22                                       phylobin21
TKN11_3260O3                                        phylobin21
*gi|37624759| Mycorrhizal fungal sp. pkc12 1        phylobin21
*gi|37624763| Mycorrhizal fungal sp. pkc22 1        phylobin21
*gi|162311725| Uncultured fungus clone IH_Tag       phylobin22
TKN12_3249H16                                       phylobin22
18 21 22 TKN12_3255J11 TKN9_3238J10 TKN12_3258A12 TKN10_3235I22 TKN11_3260O3 TKN12_3249H16 TKN7_3179P2 Hyalodendriella betulae 19 20

30 Bioinformatic Processing of Fungal ITS Sequences from the Environment
I. Initial sequence cleanup -> Quality Scores -> Masking if Sanger Sequences -> Orienting to fix direction
II. Bar-coding/tagging -> Long, bias tested, edit distance -> Tag-finder script
III. Chimeras -> Uclust
IV. Defining OTUs
   I. Introns -> TGICL/Cap3 Genome Assemblers
   II. Percent Identity Thresholds -> Phylobinning
   III. Pseudogenes -> ???
V. Identifying OTUs -> Curated databases with masking of conserved seqs

31 Challenges: Pseudogenes
- Sequences from cultures and fruitbodies for phylogenetics are rarely cloned; they are usually averages of variants that equate with the dominant sequence type
- Pseudogenes found in ITS clone libraries of Zooxanthellae

32 Thornhill, Lajeunesse & Santos. 2007. Molecular Ecology 16: 5326-5340.

33 “Based on these results, we conclude that artefacts due to Taq polymerase and cloning error only account for a small percentage of our clones while the remaining sequence diversity and divergence originates from ribosomal operon variation within the Symbiodinium Genome.”

34 Challenges: Non-fungal Sequences 30,000 black spruce clones Primers ITS1-F and TW13 5.8S LSU

35 Bioinformatic Processing of Fungal ITS Sequences from the Environment
I. Initial sequence cleanup -> Quality Scores -> Masking if Sanger Sequences -> Orienting to fix direction
II. Bar-coding/tagging -> Long, bias tested, edit distance -> Tag-finder script
III. Chimeras -> Uclust
IV. Defining OTUs
   I. Introns -> TGICL/Cap3 Genome Assemblers
   II. Percent Identity Thresholds -> Phylobinning
   III. Pseudogenes -> ???
V. Identifying OTUs -> Curated databases with masking of conserved seqs

36

37 >TKN14_3314_P9 CTTGGTCATTTAGAGGAAGTAAAAGTCGTAACAAGGTTTCCGTAGGTGAACCTGCGGAAGGATCATTATTGAAATTATAG GTGAGGGTTGTAGCTGGCCTCTCGGGGCATGTGCACGCCCGAGCCCTTAATCCACACACACCTGTGAACCTATTGTAAGG GCCCTTAAAAAAGGCCTTTACGTCTTATCATCAACCCATCGTATGTCTCATAGAATGTAAATATATGTCCTCGCCTTAAA AAGCGTTGATAAACTTATACAACTTTCAACAACGGATCTCTTGGCTCTCGCATCGATGAAGAACGCAGCGAAATGCGATA AGTAATGTGAATTGCAGATTTTCAGTGAATCATCGAATCTTTGAACGCACCTTGCGCTCCTTGGTATTCCGAGGAGCATG CCTGTTTGAGTGTCATTAAATTCTCAACTCTGATCGATTTGTTTCGACTTCGGAGCTTGGATTTGGAGCGTGCTGGCGTC GGTCGGCTCCTCTTAAATGCATCAGCGGAATCTAACGTTTCGGACGTCAGTGTGATAATCATGTTGCGCTGTCTGCCTGA TCTGAAAGCCCGCTCACAATGGTCTTCGGACAACTTCATATCAAATTTGACCTCAAATCAGGTAGGACTACCCGCTGAAC TTAAGCATATCAATAAGCGGAGGAAAAGAAACTAACAAGGATTCCCCTAGTAACTGCGAGTGAAGCGGGAAAAGCTCAAA TTTAAAATCTGGCGGTCTTGCGGCCGTCCGAGTTGTAATCTGGAGAAGCGTTTATCCGCGTCGGACCGTGTACAAGTCTT CTGGAAGGGAGCGTCGTAGAGGGTGAGAATCCCGTCTTTGACACGGACAACCGGTGCTTTTGTGATGCGCTCTCGAAGAG TCGAGTTGTTTGGGAATGCAGCTCAAAATGGGTGGTAAATTCCATCTAAAGCTAAATATTGGCGAGAGACCGATAGCGAA CAAGTACCGTGAGGGAAAGATGAAAAGCACTTTGGAAAGAGAGTTAAACAGTACGTGAAATTGTTGAAAGGGAAACGTTT GAAGTCAGTCGCGTCGGCCGAGACTCAACCTTGCTTCTGCTCGGTGCACTTCTCGGTTGACGGGTCAGCATCAATTTTGA CCGCCGGATAAAGGTCGGGGGAATGTGGCATCCTTCGGGATGTGTTATAGACCTCGATTCGGATACGGCGATTGGGATTG AGGAACTCGGCGCTTTGCGTCCAGGATGCTGGCATAATGGCTTTAAGCGACCCGTCTTGAAACACGGANC >TKN14_3314_P9 CTTGGTCATTTAGAGGAAGTAAAAGTCGTAACAAGGTTTCCGTAGGTGAA CCTGCGGAAGGATCATTATTGAAATTATAGGTGAGGGTTGTAGCTGGCCT CTCGGGGCATGTGCACGCCCGAGCCCTTAATCCACACACACCTGTGAACC TATTGTAAGGGCCCTTAAAAAAGGCCTTTACGTCTTATCATCAACCCATC GTATGTCTCATAGAATGTAAATATATGTCCTCGCCTTAAAAAGCGTTGAT AAACTTATACAACTTTCAACAACGGATCTCTTGGCTCTCGCATCGATGAA GAACGCAGCGAAATGCGATAAGTAATGTGAATTGCAGATTTTCAGTGAAT CATCGAATCTTTGAACGCACCTTGCGCTCCTTGGTATTCCGAGGAGCATG CCTGTTTGAGTGTCATTAAATTCTCAACTCTGATCGATTTGTTTCGACTT CGGAGCTTGGATTTGGAGCGTGCTGGCGTCGGTCGGCTCCTCTTAAATGC ATCAGCGGAATCTAACGTTTCGGACGTCAGTGTGATAATCATGTTGCGCT GTCTGCCTGATCTGAAAGCCCGCTCACAATGGTCTTCGGACAACTTCATA TCAAATTTNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN 
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNCGTCTTGAAACACGGANC

38

39
query          match                                              bit score  E value
--------       --------                                           ---------- ----------
TKN14_3314_P9  gi|56126498|gb|AY822743.1| Uncultured ectomycor    1168       0.0
TKN14_3314_P9  gi|296184581|gb|AY884238.2| Ectomycorrhizal fun    1168       0.0
TKN14_3314_P9  gi|299778250|gb|HM069482.1| Uncultured fungus c    1128       0.0
TKN14_3314_P9  gi|13470319|gb|AY010281.1| Piloderma fallax iso    1100       0.0
TKN14_3314_P9  gi|104295534|gb|DQ474631.1| Uncultured ectomyco    1100       0.0

query          match                                              bit score  E value
--------       --------                                           ---------- ----------
TKN14_3314_P9  gi|296184581|gb|AY884238.2| Ectomycorrhizal fun    1168       0.0
TKN14_3314_P9  gi|13470319|gb|AY010281.1| Piloderma fallax iso    1100       0.0
TKN14_3314_P9  gi|13470320|gb|AY010282.1| Piloderma fallax iso    1066       0.0
TKN14_3314_P9  gi|86610857|gb|DQ365660.1| Piloderma fallax iso    1025       0.0
TKN14_3314_P9  gi|86610864|gb|DQ365667.1| Piloderma fallax iso    1025       0.0

query: TKN14_3314_P9
The best scores are:
gi|296184581|gb|AY884238.2| Ectomycorrhizal fun: cellular organisms; Eukaryota; Fungi/Metazoa group; Fungi; unclassified Fungi; ectomycorrhizal fungal sp. AR-Ny2
gi|13470319|gb|AY010281.1| Piloderma fallax iso: cellular organisms; Eukaryota; Fungi/Metazoa group; Fungi; Dikarya; Basidiomycota; Agaricomycotina; Agaricomycetes; Agaricomycetidae; Atheliales; Atheliaceae; Piloderma; Piloderma fallax
gi|13470320|gb|AY010282.1| Piloderma fallax iso: cellular organisms; Eukaryota; Fungi/Metazoa group; Fungi; Dikarya; Basidiomycota; Agaricomycotina; Agaricomycetes; Agaricomycetidae; Atheliales; Atheliaceae; Piloderma; Piloderma fallax
gi|86610857|gb|DQ365660.1| Piloderma fallax iso: cellular organisms; Eukaryota; Fungi/Metazoa group; Fungi; Dikarya; Basidiomycota; Agaricomycotina; Agaricomycetes; Agaricomycetidae; Atheliales; Atheliaceae; Piloderma; Piloderma fallax
gi|86610864|gb|DQ365667.1| Piloderma fallax iso: cellular organisms; Eukaryota; Fungi/Metazoa group; Fungi; Dikarya; Basidiomycota; Agaricomycotina; Agaricomycetes; Agaricomycetidae; Atheliales; Atheliaceae; Piloderma; Piloderma fallax

40 Funding Sources and Supporting Agencies

41 Thanks! Michelle Augustyn Michael Booth Dan Cardin József Geml Hope Gray Ian Herriott Scott Hillard Teresa Hollingsworth Sarah Hopkins Jason Hunt Tom Marr Jack McFarland Chad Nusbaum Gary Laursen Niall Lennon Jim Long Mitali Patil Ina Timling

42

43 Mask (marking low quality base calls)
Tag-Finder (identifying primer bar-codes)
Orient (fixing sequence directions)
Trim-Seq (removing low quality bases at ends)
Purge (removing low quality sequences)
Flag Non-Fungals (BLAST + Organism Lookup)
Prepare_contigs (TGICL/Cap3 broad clusters) (BLAST to add close relatives) (Muscle cluster alignments)
Flag Chimeras (Uclust)
Phylo_table (RAxML bootstrap trees) (Tree parsing)
Final Phylobin Table (Closest BLAST Relatives) (Abundances of Phylobins across Samples) (Any Flags)

44

45 ftp://folders.inbre.alaska.edu/FMP/ http://www.borealfungi.uaf.edu/pipeline/

46 ftp://folders.inbre.alaska.edu/FMP/ http://www.borealfungi.uaf.edu/pipeline/

47 ftp://folders.inbre.alaska.edu/FMP/ http://www.borealfungi.uaf.edu/pipeline/
Upload sequences in .fasta format here
Upload quality file here (.qual file)
Place phred threshold here (phred = 20 is conservative)

48 >R_UP1_3168_P8 CAAACTTGGTCATTTAGAGGAAGTAAAAGTCGTAACAAGGTCTCCGTTGG TGAACCAGCGGAGGGATCATTACCGAGTTTACAAACTCCCAAACCCTTTG TGAACCTTACCTATCGTTGCTTCGGCGGGACCGCCCCGACGGCCACCTCG GTGGTCCCGGAACCAGGCGCCCGCCGAAGGCCCCAAACTCTTTGTTTCCT ATGGTTTTCTCCTCTGAGTGGAAAATAAACAAATAAATAAAAACTTTCAA CAACGGATCTCTTGGTTCTGGCATCGATGAAGAACGCAGCGAAATGCGAT AAGTAATGTGAATTGCAGAATTCAGTGAATCATCGAATCTTTGAACGCAC ATTGCGCCCGCCAGTATTCTGGCGGGCATGCCTGTTCGAGCGTCATTTCA ACCCTCAGGCCCCCAGTGCCTGGCGTTGGGGATCGGCCGCTGGCGTCCTT CGGGGGCGCCTGGCCGGCCCCGAAATCTAGTGGCGGTCTCGCTGTAGTCC TCCTCTGCGTAGTAGCACAACCTCGCAGTTGGAACGCGGCGGTGGGCCAT GCCGTTAAACACCCCACTTCTGAAAGTTGACCTCGGATCAGG >R_UP1_3168_P8 36 22 41 51 51 51 51 45 20 26 22 21 27 33 43 41 61 61 61 45 61 61 61 51 57 61 61 61 61 57 45 61 57 61 61 57 49 61 61 61 57 61 61 61 61 61 57 52 61 27 51 57 61 61 55 52 45 61 61 61 61 39 25 61 61 43 33 61 61 61 61 61 61 61 61 61 61 61 61 55 57 61 51 61 61 61 61 61 61 43 41 61 61 61 61 61 61 61 61 61 57 61 61 57 51 61 61 61 61 61 61 55 61 61 61 61 51 61 61 61 61 61 61 61 43 42 61 61 61 45 39 61 61 61 61 61 61 61 47 47 61 52 61 61 51 61 61 61 61 61 61 32 55 61 51 61 61 61 61 61 61 51 42 61 52 61 61 52 61 61 61 61 52 61 61 61 61 52 42 51 51 52 51 47 49 61 61 51 61 61 52 61 61 61 42 61 47 47 51 47 61 55 61 52 51 61 61 61 61 51 61 61 51 52 52 61 52 47 49 61 51 51 61 61 47 31 55 61 51 49 40 61 55 47 61 61 52 41 61 61 52 61 55 52 61 61 61 52 41 61 44 47 52 47 61 49 51 40 51 55 51 61 43 61 40 32 55 51 49 61 52 34 49 51 61 61 47 51 61 47 52 47 61 61 40 51 49 49 49 51 51 52 61 52 38 55 39 61 55 31 61 51 51 46 61 61 45 47 25 52 43 24 25 55 43 27 47 40 55 46 39 51 49 29 49 47 55 51 51 37 51 34 49 55 49 49 52 39 51 46 55 47 40 44 55 47 51 46 51 49 41 51 55 52 47 51 49 43 41 37 30 40 39 52 37 49 39 55 43 51 55 51 32 55 51 49 39 51 44 49 38 44 27 52 30 32 40 44 51 41 43 51 23 39 31 55 49 49 32 23 37 46 41 35 47 40 47 31 32 33 52 41 41 44 35 45 27 40 35 47 34 47 47 31 45 15 24 39 37 36 38 39 20 28 44 21 26 39 40 29 28 24 20 29 47 26 25 27 40 39 26 29 34 35 
36 8 24 31 32 47 30 21 10 8 36 36 35 26 47 41 29 34 47 29 35 30 45 29 46 27 19 44 11 16 39 31 27 45 36 40 30 20 31 31 30 45 31 21 32 22 22 22 27 33 24 23 28 27 17 24 29 37 28 29 5 6 20 7 7 14 24 30 29 29 28 28 30 25 7 10 21 29 35 32 14 24 22 22 27 23 29 21 16 27 26 8 22 30 23 29 29 10 16 28 26 29 29 29 23 22 26 26 30 24 17 15 18 18 25 21 20 25 23 40 28 17 18 19 23 28 18 28 17 24 28 22 29 31 30 8 11 26

49 >R_UP1_3168_P8_Original CAAACTTGGTCATTTAGAGGAAGTAAAAGTCGTAACAAGGTCTCCGTTGG TGAACCAGCGGAGGGATCATTACCGAGTTTACAAACTCCCAAACCCTTTG TGAACCTTACCTATCGTTGCTTCGGCGGGACCGCCCCGACGGCCACCTCG GTGGTCCCGGAACCAGGCGCCCGCCGAAGGCCCCAAACTCTTTGTTTCCT ATGGTTTTCTCCTCTGAGTGGAAAATAAACAAATAAATAAAAACTTTCAA CAACGGATCTCTTGGTTCTGGCATCGATGAAGAACGCAGCGAAATGCGAT AAGTAATGTGAATTGCAGAATTCAGTGAATCATCGAATCTTTGAACGCAC ATTGCGCCCGCCAGTATTCTGGCGGGCATGCCTGTTCGAGCGTCATTTCA ACCCTCAGGCCCCCAGTGCCTGGCGTTGGGGATCGGCCGCTGGCGTCCTT CGGGGGCGCCTGGCCGGCCCCGAAATCTAGTGGCGGTCTCGCTGTAGTCC TCCTCTGCGTAGTAGCACAACCTCGCAGTTGGAACGCGGCGGTGGGCCAT GCCGTTAAACACCCCACTTCTGAAAGTTGACCTCGGATCAGG >R_UP1_3168_P8_Masked CAAACTTGGTCATTTAGAGGAAGTAAAAGTCGTAACAAGGTCTCCGTTGG TGAACCAGCGGAGGGATCATTACCGAGTTTACAAACTCCCAAACCCTTTG TGAACCTTACCTATCGTTGCTTCGGCGGGACCGCCCCGACGGCCACCTCG GTGGTCCCGGAACCAGGCGCCCGCCGAAGGCCCCAAACTCTTTGTTTCCT ATGGTTTTCTCCTCTGAGTGGAAAATAAACAAATAAATAAAAACTTTCAA CAACGGATCTCTTGGTTCTGGCATCGATGAAGAACGCAGCGAAATGCGAT AAGTAATGTGAATTGCAGAATTCAGTGAATCATCGAATCTTTGAACGCAC ATTGCGCCCGCCAGTATTCTGGCGGGCATGCCTGTTCGAGCGTCATTTCA ACCCTCAGGCCCCCAGTGCCTGGCGNTGGGGATCGGCCGCTGGCGTCCTT CGGGGNCGCCTGNNCGGCCCCGAAATCTAGNGNNGGTCTCGCTGTAGTCC TCCTCTGCNTAGTANNANNNCCTCGCAGNNGGAANGCGGCGGNGGNCCAT GNNGTTAAACACCCNNNNTCTGAAANNNGANCNCGGATCNNG
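The masking step shown on slides 48 and 49 replaces low-quality base calls with N. A minimal sketch in Python follows, assuming the sequence and its per-base phred scores have already been read from the .fasta and .qual files. The function name is hypothetical, and whether the pipeline masks scores equal to the threshold is not stated on the slides; here only scores strictly below the threshold are masked.

```python
from typing import Sequence


def mask_low_quality(seq: str, quals: Sequence[int], threshold: int = 20) -> str:
    """Replace every base whose phred score is below `threshold` with 'N'.

    Assumption: a score equal to the threshold is kept, not masked.
    """
    if len(seq) != len(quals):
        raise ValueError("sequence and quality lists must have the same length")
    return "".join(
        base if q >= threshold else "N" for base, q in zip(seq, quals)
    )


if __name__ == "__main__":
    # Toy read, not taken from the slide data.
    print(mask_low_quality("ACGTAC", [36, 22, 8, 51, 19, 20]))
```

Masking keeps the read at its original length, so downstream alignment coordinates are unchanged; trimming and purging are separate steps in the pipeline outline.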

50 primer = TTTCTT pigtail = TTGGTC Upload your sequences here Upload list of tags as text file here
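The outline describes tags as long, bias tested and separated by edit distance, which lets a read carrying a sequencing error still be assigned to its tag. The sketch below illustrates that idea in Python and is not the Tag-Finder script itself. The tag names and sequences are invented for illustration, the tag is assumed to sit at the very start of the read, and the one-error tolerance is an assumption.

```python
from typing import Dict, Optional


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]


def assign_tag(read: str, tags: Dict[str, str], max_distance: int = 1) -> Optional[str]:
    """Return the name of the closest tag, or None if no tag is close enough
    or the two best tags are equally close (ambiguous)."""
    scored = sorted(
        (edit_distance(read[: len(seq)], seq), name) for name, seq in tags.items()
    )
    if not scored or scored[0][0] > max_distance:
        return None
    if len(scored) > 1 and scored[1][0] == scored[0][0]:
        return None
    return scored[0][1]


if __name__ == "__main__":
    # Hypothetical tags; the real tag list is uploaded as a text file.
    tags = {"TagA": "ACGTACGTAC", "TagB": "TGCATGCATG"}
    print(assign_tag("ACGAACGTACCTTGGTCATT", tags))
```

Comparing only a fixed-length prefix is a simplification: an insertion or deletion inside the tag shifts the boundary, so a fuller implementation would also try prefixes one base longer and shorter.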

51 Upload sequences here Upload text file “Orient_Motifs” here
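Orienting can be sketched as a motif search on both strands: a read is kept if a known forward-strand motif is found, and reverse-complemented if the motif is found only on the opposite strand. The Python below is an illustration under assumptions, not the Orient script. The contents of the "Orient_Motifs" file are not shown in the slides, so the motif used here is simply copied from the start of the slide 37 read, and the function names are hypothetical.

```python
from typing import Iterable, Optional

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Reverse-complement a DNA string (A, C, G, T and N only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def orient(seq: str, forward_motifs: Iterable[str]) -> Optional[str]:
    """Return the read in forward orientation, or None if no motif is found
    on either strand."""
    motifs = list(forward_motifs)
    if any(m in seq for m in motifs):
        return seq
    flipped = reverse_complement(seq)
    if any(m in flipped for m in motifs):
        return flipped
    return None


if __name__ == "__main__":
    # Illustrative motif taken from the 5' end of the slide 37 read.
    motifs = ["CTTGGTCATTTAGAGGAAGTAA"]
    read = reverse_complement("CTTGGTCATTTAGAGGAAGTAAAAGTCG")
    print(orient(read, motifs))
```

Exact substring matching fails on reads with an error inside the motif, so several short motifs from conserved regions, or an approximate match, would make this more tolerant.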

52

53

54

55

56

57

58

59 Challenges: Introns -best fungal-selective primer is ITS1F, but it is 5’ of intron insertion site in 3’ end of SSU for many Ascomycetes

60 Challenges: Introns

61 Mask (marking low quality base calls)
Tag-Finder (identifying primer bar-codes)
Orient (fixing sequence directions)
Trim-Seq (removing low quality bases at ends)
Purge (removing low quality sequences)
Flag Non-Fungals (BLAST + Organism Lookup)
Prepare_contigs (TGICL/Cap3 broad clusters) (BLAST to add close relatives) (Muscle cluster alignments)
Flag Chimeras (Uclust)
Phylo_table (RAxML bootstrap trees) (Tree parsing)
Final Phylobin Table (Closest BLAST Relatives) (Abundances of Phylobins across Samples) (Any Flags)
Only for Sanger; 454 & Illumina software do these steps

