AMOS tools for assembly validation: automatically scan an assembly to locate misassembly signatures for further analysis and correction


1 AMOS tools for assembly validation
Automatically scan an assembly to locate misassembly signatures for further analysis and correction:
- Load Assembly Data into Bank
- Evaluate Mate Pairs & Libraries
- Evaluate Read Alignments
- Evaluate Read Breakpoints
- Analyze Depth of Coverage
- Identify “Surrogates”
- Load Misassembly Signatures into Bank
[Figure: AMOS Bank]
http://amos.sourceforge.net

2 Assembly QC: mate happiness
Evaluate mate “happiness” across the assembly
Happy = Correct orientation and distance
Finds regions with multiple:
- Compressed Mates (too close together)
- Expanded Mates (too far apart)
- Invalid same orientation (→ →)
- Invalid “outie” orientation (← →)
- Missing Mates
  - Linking mates (mate in a different scaffold)
  - Singleton mates (mate is not in any contig)
Also finds regions with high C/E statistic

3 Mate happiness
Excision: Skip reads between flanking repeats
[Figure: read layout in the Truth and in the Misassembly]
Misassembly shows: Compressed Mates, Missing Mates

4 Mate happiness
Insertion: Additional reads between flanking repeats
[Figure: read layout in the Truth and in the Misassembly]
Misassembly shows: Expanded Mates, Missing Mates

5 Mate happiness
Rearrangement: Reordering of reads
[Figure: the Truth has regions in the order A, B; the Misassembly has them in the order B, A]
Misassembly shows: Misoriented Mates
Note: if A and B are too far apart, mates may all be “happy”

6 Compression/Expansion (C/E) Statistic
The presence of individual compressed or expanded mates is rare but expected
Do the inserts spanning a given position differ from the rest of the library?
- Flag large differences as potential misassemblies
- Even if each individual mate is “happy”
Compute the statistic at all positions:
(Local Mean - Global Mean) / Scaling Factor
Introduced by Jim Yorke’s group at UMD

7 Library size variation
[Figure: inserts drawn along an axis from 0 kb to 6 kb]
- 8 inserts: 3 kb-6 kb
- Local Mean: 4048
- C/E Stat: (4048 - 4000) / (400 / √8) ≈ +0.34
Near 0 indicates overall happiness

8 C/E statistic: Compression
[Figure: inserts drawn along an axis from 0 kb to 6 kb]
- 8 inserts: 3.2 kb-4.8 kb
- Local Mean: 3488
- C/E Stat: (3488 - 4000) / (400 / √8) = -3.62
C/E Stat ≤ -3.0 indicates Compression
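The arithmetic on the two C/E slides can be reproduced with a short sketch. It assumes the scaling factor is the library standard deviation divided by the square root of the number of spanning inserts, which is what the worked numbers (400 / √8) imply. The function names are hypothetical, and the symmetric +3.0 cutoff for expansion is an assumption: the slides only state the -3.0 cutoff for compression.

```python
import math


def ce_statistic(local_mean, n_inserts, library_mean, library_sd):
    """(local mean - global mean) / (library sd / sqrt(number of inserts))."""
    if n_inserts <= 0:
        raise ValueError("need at least one insert spanning the position")
    scaling_factor = library_sd / math.sqrt(n_inserts)
    return (local_mean - library_mean) / scaling_factor


def classify_ce(value, threshold=3.0):
    """Label a C/E value; the expansion cutoff (+threshold) is an assumption."""
    if value <= -threshold:
        return "compression"
    if value >= threshold:
        return "expansion"
    return "ok"


# Library mean 4000 and sd 400, as read off the slide formulas.
balanced = ce_statistic(4048, 8, 4000, 400)
compressed = ce_statistic(3488, 8, 4000, 400)

print(f"{balanced:+.2f}", classify_ce(balanced))
print(f"{compressed:+.2f}", classify_ce(compressed))
```

Because the denominator shrinks as more inserts span a position, a modest shift in the local mean becomes significant when it is shared by many inserts, which is why a region can be flagged even if each individual mate is “happy”.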

9 Read Alignment
Multiple reads with the same conflicting base are unlikely to be base calling errors
- 1x QV 30: 1/1000 base calling error
- 2x QV 30: 1/1,000,000 base calling error
- 3x QV 30: 1/1,000,000,000 base calling error
Correlated SNPs are likely to be assembly errors, usually collapsed repeats
AMOS Tools: analyzeSNPs & clusterSNPs
- Locate regions with high rate of correlated SNPs
- Parameterized thresholds:
  - Multiple positions within 100bp sliding window
  - 2+ conflicting reads
  - Cumulative QV >= 40 (1/10000 base calling error)
[Figure: multiple alignment of reads showing correlated base differences]
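The quality-value arithmetic and the thresholds above can be sketched as follows. This is an illustration only, not the code of analyzeSNPs or clusterSNPs: the function names are hypothetical, and the clustering is a simple greedy pass (each window is anchored at the first SNP of a cluster) standing in for the sliding window named on the slide.

```python
def phred_to_error(qv):
    """Base calling error probability for a Phred quality value."""
    return 10 ** (-qv / 10)


def is_supported_snp(conflicting_qvs, min_reads=2, min_cumulative_qv=40):
    """True if enough reads, with enough summed quality, disagree with the consensus.

    Summing quality values corresponds to multiplying the independent
    error probabilities (QV 30 + QV 30 -> 1/1,000,000).
    """
    return (len(conflicting_qvs) >= min_reads
            and sum(conflicting_qvs) >= min_cumulative_qv)


def cluster_snps(positions, window=100, min_positions=2):
    """Greedily group SNP positions that fall within `window` bp of a cluster's first SNP."""
    clusters, current = [], []
    for pos in sorted(positions):
        if current and pos - current[0] >= window:
            if len(current) >= min_positions:
                clusters.append(current)
            current = []
        current.append(pos)
    if len(current) >= min_positions:
        clusters.append(current)
    return clusters


print(phred_to_error(30), phred_to_error(30 + 30), phred_to_error(30 + 30 + 30))
print(is_supported_snp([30, 30]), is_supported_snp([40]), is_supported_snp([15, 20]))
print(cluster_snps([10, 60, 500, 1000, 1040, 1090]))
```

A single QV 40 read does not pass the check because the slide requires at least two conflicting reads; two QV 30 reads do, since their cumulative QV is 60.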

10 Read breakpoints: compression error
[Figure: ribosomal RNA repeats, B. anthracis, showing mates and “chimeric” reads]
QC METHOD:
- Align singleton reads to consensus assembly
- Find any breakpoints shared by multiple reads
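The second step of the QC method, finding breakpoints shared by multiple reads, might look like the sketch below. It assumes the alignment step has already produced (read id, consensus position) pairs marking where each partial alignment ends. The input format, the function name, and the 5 bp tolerance are all hypothetical.

```python
def shared_breakpoints(breakpoints, min_reads=2, tolerance=5):
    """Group alignment breakpoints that lie within `tolerance` bp of the previous one.

    `breakpoints` is an iterable of (read_id, position) pairs.
    Returns (first_position, last_position, read_ids) for every group
    supported by at least `min_reads` reads.
    """
    groups = []
    for read_id, pos in sorted(breakpoints, key=lambda b: b[1]):
        if groups and pos - groups[-1]["last"] <= tolerance:
            groups[-1]["reads"].append(read_id)
            groups[-1]["last"] = pos
        else:
            groups.append({"start": pos, "last": pos, "reads": [read_id]})
    return [(g["start"], g["last"], g["reads"])
            for g in groups if len(g["reads"]) >= min_reads]


# Three reads break near position 1200; one isolated breakpoint is ignored.
example = [("r1", 1200), ("r2", 1202), ("r3", 5000), ("r4", 1203)]
print(shared_breakpoints(example))
```

A breakpoint seen in only one read is more likely a chimeric read or a poor alignment, so only groups backed by several reads are reported.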

11 “Uncompress” by creating new repeat copy
Tandem duplication
Reference: B. anthracis Ames ‘ancestor’ strain
B. anthracis Ames Porton Down strain

12 Read Coverage
Find regions of contigs where the depth of coverage is unusually high
AMOS Tool: analyzeReadDepth
[Figure: coverage threshold at 2.5x mean coverage; layouts labeled A, R1 + R2, B and A, R1, B, R2]
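The idea of flagging regions above 2.5x mean coverage can be sketched in a few lines. This is not the analyzeReadDepth implementation: it assumes per-base depth is already available as a list, and the function name and the `min_length` parameter are hypothetical.

```python
def high_coverage_regions(depth, factor=2.5, min_length=1):
    """Return half-open (start, end) intervals where depth exceeds factor * mean depth."""
    if not depth:
        raise ValueError("depth must not be empty")
    threshold = factor * (sum(depth) / len(depth))
    regions, start = [], None
    for i, d in enumerate(depth):
        if d > threshold:
            if start is None:
                start = i          # a high-coverage run begins here
        elif start is not None:
            if i - start >= min_length:
                regions.append((start, i))
            start = None
    if start is not None and len(depth) - start >= min_length:
        regions.append((start, len(depth)))
    return regions


# A contig at 10x with a 5 bp spike at 40x, as a collapsed repeat might produce.
depth = [10] * 20 + [40] * 5 + [10] * 20
print(high_coverage_regions(depth))
```

Note that the mean here includes the spike itself; for contigs dominated by repeats, a median or a genome-wide mean may be a more robust baseline.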

13 Hawkeye: assembly viewer and debugger

14 Launch Pad

15 Histograms & Statistics
- Insert Size
- GC Content
- Read Length
- Overall Statistics
Bird’s eye view of data and assembly quality

16 Scaffold View
a. Statistical Plots
b. Scaffold
c. Features
d. Clone inserts
e. Overview
f. Control Panel
g. Details

17 Standard Feature Types
- [B] Breakpoint: Alignment ends at this position
- [C] Coverage: Location of unusual mate coverage (asmQC)
- [S] SNPs: Location of Correlated SNPs
- [U] Unitig: Used to report location of surrogate unitigs in CA assemblies
- [X] Other: All other Features

18 Insert (mate) Happiness
- Happy: Oriented Correctly && |Insert Size - Library.mean| <= Happy-Distance * Library.sd
- Stretched: Oriented Correctly && Insert Size > Library.mean + Happy-Distance * Library.sd
- Compressed: Oriented Correctly && Insert Size < Library.mean - Happy-Distance * Library.sd
- Misoriented: Same or Outies
- Linking: Read’s mate is in some other scaffold
- Singleton: Read’s mate is a singleton
- Unmated: No mate was provided for read
[Figure legend: Both mates present; Only 1 read present]
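The first four categories translate directly into a small classifier, sketched here for inserts whose two reads sit in the same scaffold. The function and parameter names are hypothetical, and no default is given for the happy distance because the slide does not state one.

```python
def classify_insert(insert_size, oriented_correctly, library_mean, library_sd,
                    happy_distance):
    """Classify an insert whose two mates are both placed in the same scaffold."""
    if not oriented_correctly:
        return "Misoriented"      # same orientation or "outies"
    if abs(insert_size - library_mean) <= happy_distance * library_sd:
        return "Happy"
    if insert_size > library_mean:
        return "Stretched"
    return "Compressed"


# Library of mean 4000 and sd 400, with a happy distance of 2 standard deviations.
for size, ok in [(4100, True), (5000, True), (3000, True), (4000, False)]:
    print(size, ok, classify_insert(size, ok, 4000, 400, happy_distance=2.0))
```

Linking, Singleton, and Unmated reads are left out because they depend on where (or whether) the mate was placed, not on the insert size.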

19 Contig View: detailed alignment of reads to contigs
- Consensus & Position
- Scrollable Read Tiling
- Read Orientation
- Discrepancy Highlight
- Discrepancy Summary
- Discrepancy Navigation
- Contig Quick Select
- Regular Expression Consensus Search

20 SNP View
- SNP Sorted Reads
- Polymorphism View
- Zoom Out

21 SNP Barcode
SNP Sorted Reads
Colored rectangles indicate the positions and composition of the SNPs

22 Scaffold View
[Figure labels: CE Statistic, Coverage, SNP Feature, Happy, Stretched, Compressed, Misoriented, Linking]

23 Collapsed Repeat
- 68 Correlated SNPs
- -5.5 CE Dip
- Compressed Mates Cluster
- Read Coverage Spike

24 Example 1: Compression in Prevotella intermedia 17 assembly, found by the CE statistic
- Green inserts are 2 standard deviations.
- Vertical yellow line shows the most likely place of a compression misassembly.
- Only one insert in this case is compressed by > 3 standard deviations.

25 Example 2: Compression in Prevotella intermedia 17 assembly, found by the CE statistic

26 Fixing collapsed repeats with AMOS
[Figure labels: Before, After, Resolved “Stitched” Contig, Original Contig, Compression Point, Patch Contig]

27 Assemblies can be preserved at NCBI’s Assembly Archive
http://www.ncbi.nlm.nih.gov/Traces/assembly/assmbrowser.cgi

