ESPRIT. Taxonomy ● Works very well and gives accurate results ● Requires a previous blast search that may take long to complete ● When in doubt goes one.


1 ESPRIT

2 Taxonomy ● Works very well and gives accurate results ● Requires a prior BLAST search that may take a long time to complete ● When in doubt, goes one level up in the hierarchy ● Assignment is as accurate as possible ● Species detail is lost ● Not good enough to measure genetic diversity and species richness

3 MOTU / OTU ● Molecular Operational Taxonomic Unit ● Operational Taxonomic Unit ● 3%: species ● 5%: genus ● 10%: phylum ● Controversial ● Practical ● As long as you remember there is no real association

4 Computing OTUs ● Measure the distance between two sequences ● If < cutoff ● They belong to the same group (OTU) ● If > cutoff ● They belong to different groups ● Each sequence must be checked against all others ● Requires a distance matrix ● Distances are calculated by sequence comparison
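
The cutoff rule on this slide can be sketched in a few lines of Python. This is only an illustration, not the method of any tool in the deck: the function names are hypothetical, the distance is a toy mismatch fraction that assumes equal-length sequences, and any chain of below-cutoff pairs is merged into one group.

```python
from itertools import combinations

def mismatch_distance(a: str, b: str) -> float:
    """Toy distance (assumption): fraction of differing positions."""
    if len(a) != len(b):
        raise ValueError("this toy distance expects equal-length sequences")
    return sum(x != y for x, y in zip(a, b)) / len(a)

def group_by_cutoff(seqs, cutoff=0.03):
    """Group sequence indices so that pairs closer than cutoff share a group."""
    parent = list(range(len(seqs)))  # union-find forest over sequence indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Every sequence is checked against every other one.
    for i, j in combinations(range(len(seqs)), 2):
        if mismatch_distance(seqs[i], seqs[j]) < cutoff:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(seqs)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

reads = ["A" * 100, "A" * 99 + "C", "C" * 100]
print(group_by_cutoff(reads, cutoff=0.03))
```

With a 3% cutoff the first two reads (1% apart) end up in the same group and the third stays alone.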

5 Multiple Sequence Alignment ● Slow ● New developments: MAFFT, MUSCLE, Clustal Omega ● Slow for hundreds of thousands of sequences ● MSA leads to inflated estimates ● Arguable results for 16S hypervariable regions ● Some regions may not have enough conservation (e.g., V6, V3) ● Distance tables can become huge

6 Better than MSA: NW ● Needleman-Wunsch aligns two sequences globally ● Pairwise distances can be computed simultaneously ● Does not require reading a huge distance matrix ● Gives more accurate results
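
As a rough illustration of a global pairwise alignment, here is a minimal Needleman-Wunsch sketch. The scoring values (match 1, mismatch -1, linear gap -1) and the distance definition (differing columns divided by alignment length) are assumptions chosen for the example; real tools typically use other scoring schemes and gap models.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Return (aligned_a, aligned_b, score) for a global alignment."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    # Fill the dynamic-programming table.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]

def alignment_distance(a, b):
    """Assumed distance: fraction of alignment columns that differ."""
    x, y, _ = needleman_wunsch(a, b)
    if not x:
        return 0.0
    return sum(p != q for p, q in zip(x, y)) / len(x)

print(needleman_wunsch("ACGT", "AGT"))
print(alignment_distance("ACGT", "AGT"))
```

Because each pair is aligned on its own, pairs can be processed independently, which is what makes it possible to compute them in parallel.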

7 Pairwise alignments ● Are a combinatorial problem: ● (N · (N – 1)) / 2 ● Needleman-Wunsch is expensive on sequence size ● Can take forever if not reduced to the minimum needed ● Combined with a suitable clustering method, can avoid computing a distance matrix.
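
To get a feel for how quickly the number of pairs grows, the sketch below counts them and streams close pairs lazily instead of storing a full matrix. The helper names and the `distance` callback are hypothetical and only serve the illustration.

```python
from itertools import combinations

def pair_count(n: int) -> int:
    """Number of unordered pairs among n sequences: n * (n - 1) / 2."""
    return n * (n - 1) // 2

def close_pairs(seqs, distance, cutoff):
    """Yield (i, j, d) for pairs below cutoff, one at a time, with no matrix."""
    for i, j in combinations(range(len(seqs)), 2):
        d = distance(seqs[i], seqs[j])
        if d < cutoff:
            yield i, j, d

for n in (1_000, 100_000):
    print(n, "sequences ->", pair_count(n), "pairs")
```

For 1,000 sequences this gives 499,500 pairs; for 100,000 sequences it is already 4,999,950,000, which is why the number of full alignments has to be cut down first.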

8 Reducing problem size ● Remove low-quality and low-information reads ● Remove reads containing ambiguous nucleotides (N) ● Eliminate reads with atypical sequence lengths ● If two sequences are identical or one is a subset of the other, they are combined and the frequency count is incremented ● Estimate distances among pairs with <0.10 distance ● Use k-mer distance of 0.5 for initial filtering.
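
The filtering steps above might look roughly like this in Python. Everything here is a sketch under assumptions: the function names are made up, the length bounds are supplied by the caller, the k-mer distance uses one common definition (1 minus the shared k-mer count divided by the number of k-mers in the shorter sequence), and k = 6 is only an example value.

```python
from collections import Counter

def clean_reads(reads, min_len, max_len):
    """Drop reads with ambiguous bases (N) or atypical lengths."""
    return [r for r in reads
            if "N" not in r.upper() and min_len <= len(r) <= max_len]

def dereplicate(reads):
    """Merge identical reads and reads contained in a longer one.

    Returns {representative_sequence: frequency_count}.
    """
    counts = Counter(reads)
    kept = {}
    # Longest first, so a shorter read can be folded into a longer one.
    for seq in sorted(counts, key=len, reverse=True):
        parent = next((k for k in kept if seq in k), None)
        if parent is None:
            kept[seq] = counts[seq]
        else:
            kept[parent] += counts[seq]
    return kept

def kmer_distance(a, b, k=6):
    """Alignment-free distance based on shared k-mers (assumed definition)."""
    if min(len(a), len(b)) < k:
        raise ValueError("sequences must be at least k bases long")
    ka = Counter(a[i:i + k] for i in range(len(a) - k + 1))
    kb = Counter(b[i:i + k] for i in range(len(b) - k + 1))
    shared = sum((ka & kb).values())  # multiset intersection
    return 1 - shared / (min(len(a), len(b)) - k + 1)

reads = ["ACGTACGT", "ACGTACGT", "CGTAC", "ACNTACGT", "TTTTTTTT"]
unique = dereplicate(clean_reads(reads, min_len=5, max_len=8))
print(unique)
```

Only pairs whose k-mer distance falls below the chosen threshold would then be passed on to the more expensive pairwise alignment.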

9 Hierarchical clustering ● First sort pairwise distances in ascending order ● Process distances on the fly ● Classify clusters into active or inactive ● Active: not enough information to merge with other cluster ● Inactive: cluster with no information or already merged ● Gives same results as mothur clustering method
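
One way to cluster from a stream of sorted distances without holding a matrix is sketched below. It implements furthest-neighbor (complete-linkage) merging by counting how many links between two clusters have been seen so far; it is an illustration of the general idea, not the actual ESPRIT implementation, and it does not model the active/inactive bookkeeping from the slide. The function name and input format are assumptions.

```python
from collections import defaultdict

def furthest_neighbor_stream(n, sorted_pairs, cutoff):
    """Cluster n sequences from (distance, i, j) tuples in ascending order.

    Two clusters merge once every pair between them has been seen,
    so the last distance seen is their furthest-neighbor distance.
    """
    cluster_of = list(range(n))             # sequence index -> cluster id
    members = {i: [i] for i in range(n)}    # cluster id -> sequence indices
    links = defaultdict(int)                # {a, b} -> pairs seen so far
    merges = []

    for dist, i, j in sorted_pairs:
        if dist >= cutoff:
            break                           # remaining pairs are too distant
        a, b = cluster_of[i], cluster_of[j]
        if a == b:
            continue
        key = frozenset((a, b))
        links[key] += 1
        if links[key] < len(members[a]) * len(members[b]):
            continue                        # not all links seen yet
        # Merge cluster b into cluster a and carry over b's link counts.
        del links[key]
        for other in list(members):
            if other in (a, b):
                continue
            moved = links.pop(frozenset((b, other)), 0)
            if moved:
                links[frozenset((a, other))] += moved
        for s in members[b]:
            cluster_of[s] = a
        members[a].extend(members.pop(b))
        merges.append((dist, sorted(members[a])))

    return sorted(sorted(m) for m in members.values()), merges

pairs = [(0.01, 0, 1), (0.01, 2, 3), (0.09, 1, 2),
         (0.10, 0, 2), (0.10, 1, 3), (0.11, 0, 3)]
print(furthest_neighbor_stream(4, pairs, cutoff=0.05)[0])
```

With the made-up distances above and a cutoff of 0.05 the result is two clusters, `[[0, 1], [2, 3]]`; raising the cutoff to 0.2 merges everything at distance 0.11, the largest distance between the two groups.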

10 Calculations ● Observed species ● Rarefaction analysis ● CHAO1 ● ACE
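
Two of these calculations are short enough to sketch directly from a list of per-OTU read counts. The code below uses the bias-corrected form of Chao1 and the standard hypergeometric expectation for rarefaction; the function names are hypothetical, the example counts are made up, and ACE is left out for brevity. `math.comb` requires Python 3.8 or later.

```python
from math import comb

def chao1(counts):
    """Bias-corrected Chao1: S_obs + F1 * (F1 - 1) / (2 * (F2 + 1))."""
    observed = sum(1 for c in counts if c > 0)
    singletons = sum(1 for c in counts if c == 1)   # F1
    doubletons = sum(1 for c in counts if c == 2)   # F2
    return observed + singletons * (singletons - 1) / (2 * (doubletons + 1))

def rarefy(counts, depth):
    """Expected number of OTUs seen when subsampling `depth` reads."""
    total = sum(counts)
    if not 0 <= depth <= total:
        raise ValueError("depth must be between 0 and the total read count")
    # For each OTU: 1 - P(none of its reads are drawn).
    return sum(1 - comb(total - c, depth) / comb(total, depth)
               for c in counts if c > 0)

otu_counts = [10, 5, 1, 1, 2]            # made-up example data
print("observed:", len(otu_counts))
print("Chao1:", chao1(otu_counts))
print("rarefied to 5 reads:", round(rarefy(otu_counts, 5), 3))
```

Evaluating `rarefy` over a range of depths gives the points of a rarefaction curve.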

11 OTUPIPE

12 About Otupipe ● Bash script ● Requires USEARCH and UCHIME ● Calculate OTUs from single-region experiments ● Designed for 454 sequencing ● Can be adapted for Illumina reads ● Appears to show higher error rates for 16S gene ● No effective denoising/error-correction solution has been published ● Increase MINSIZE

13 Basic usage ● otupipe.bash input.file.fas outdir ● Creates outdir ● Writes chimeras.fa, otus.fa and readmap.uc ● readmap.uc – One line per read – Hit (chimera or OTU) – No match (new species or, more likely, an error) ● User-settable parameters as environment variables – MINSIZE, PCTID_ERR, PCTID_OTU, PCTID_BIN
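
Processing readmap.uc programmatically could look like the sketch below. It assumes the usual USEARCH .uc layout: tab-separated fields with the record type in the first column (H for a hit, N for no match), the read label in the ninth and the target label in the tenth. The function name and the sample lines are made up for illustration; check the actual file before relying on this.

```python
from collections import Counter

def summarize_readmap(lines):
    """Tally record types and map each hit read to its target label."""
    tally = Counter()
    hits = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue                       # skip blanks and comment lines
        fields = line.split("\t")
        record_type = fields[0]
        tally[record_type] += 1
        # Assumed layout: field 9 = read label, field 10 = target label.
        if record_type == "H" and len(fields) >= 10:
            hits[fields[8]] = fields[9]
    return tally, hits

# Made-up sample records, only to show the shape of the input.
sample = [
    "H\t0\t250\t98.4\t+\t0\t0\t250M\tread1\tOTU_1",
    "N\t*\t250\t*\t*\t*\t*\t*\tread2\t*",
]
tally, hits = summarize_readmap(sample)
print(dict(tally), hits)
```

For a real run, the same function can be fed an open file handle, for example `summarize_readmap(open("outdir/readmap.uc"))`.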

14 Practical usage ● Windows: use Cygwin ● Embed in shell scripts ● Process results programmatically

15 What it does ● Remove duplicates ● Sort sequences by decreasing length ● Detect chimeras (UCHIME) ● Abundance ● Gold database ● Set chimeras aside ● Cluster chimeras ● Cluster remaining reads ● Generate readmap.uc

16 MOTHUR

17 A general tool ● Can do most common tasks ● In several ways ● Evolves rapidly ● Join the forum ● Trace changes ● Well documented ● function(help) ● Good tutorials

18 Denoising ● sffinfo (get information on sff file) ● shhh.flows (PyroNoise) ● trim.seqs (select by properties such as size and ambiguity; remove barcodes, primers...) ● unique.seqs (select unique sequences) ● screen.seqs (remove sequences aligning outside a desired range) ● filter.seqs (remove common gaps, trump, etc...) ● pre.cluster (merge sequences below threshold) ● chimera.uchime (remove chimeras using UCHIME) ● classify.seqs/remove.lineage (remove contaminants)

19 Multiple sequence alignments ● Use an external alignment in FASTA format ● Use a reference-guided alignment ● Kmer, blastn, suffix tree ● Pairwise alignment between candidate and de-gapped sequences (Needleman-Wunsch, Gotoh, blastn) ● Reinsert gaps (NAST) ● References: Greengenes, SILVA, user-provided

20 Cluster ● pre.cluster (collate reads with less than X changes) ● cluster.seqs (cluster reads by furthest, average or nearest neighbor) ● hcluster (hierarchical clustering, very slow for average neighbor, good for furthest and nearest) ● cluster.split (fastest, new, works by taxon level and should give same output as cluster.seqs)

21 Measures ● Large array of options ● OTUs and rarefactions ● Estimators (ACE, CHAO1, Shannon) ● Phylogeny ● Alpha and beta diversity (one or many groups) ● Venn diagrams ● UniFrac ● PCoA (Principal Coordinates Analysis) ● NMDS (non-metric multidimensional scaling), etc...
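
As a small example of an alpha-diversity measure, the Shannon index can be computed from per-OTU read counts as shown below. This is a plain-Python sketch with a hypothetical function name, not mothur's own code, and it uses the natural logarithm.

```python
from math import log

def shannon(counts):
    """Shannon diversity H = -sum(p_i * ln(p_i)) over OTUs with reads."""
    total = sum(counts)
    if total == 0:
        raise ValueError("no reads to compute diversity from")
    return -sum((c / total) * log(c / total) for c in counts if c > 0)

print(shannon([25, 25, 25, 25]))   # four equally abundant OTUs
print(shannon([97, 1, 1, 1]))      # one dominant OTU
```

An even community scores higher than one dominated by a single OTU; with four equally abundant OTUs the value equals ln 4.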

22 Usage ● Command line ● Batch (mothur file) ● Parallel (processors=x) ● Distributed (MPI) ● See the SOP on the mothur website ● Monitor the website ● Most versatile

