PE-Assembler: De novo assembler using short paired-end reads Pramila Nuwantha Ariyaratne.


1 PE-Assembler: De novo assembler using short paired-end reads Pramila Nuwantha Ariyaratne

2 Outline Method – Read screening – Seed building – Contig extension – Scaffolding – Gap filling Result

3 Data Sets Used Single-end reads. Paired-end reads: – ReadLength (from 25 bp to 100 bp). – Insert size varies from MinSpan to MaxSpan. – The information comes mainly from this data set.

4 Overview The read screening step selects a set of reads as starting points. The seed building step extends these reads using single-end reads until they are longer than MaxSpan. Successfully extended regions are called seeds. The contig extension step tries to extend all seeds using paired-end reads; the resulting sequences are called contigs.

5 Read screening Get all k-mers from all the reads. – A k-mer that is expected to occur in the actual genome is called a ‘solid’ k-mer. – A k-mer that is expected to occur within a repeat region is called a ‘repeat’ k-mer. Repeat Region: – ACTTTGACACACACACAC……ACACACACGTTGAG
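The first bullet, collecting every k-mer from every read, can be sketched as a simple frequency count. This is a minimal illustration only: the function name `count_kmers` is hypothetical, and the sketch assumes reads are plain strings and ignores reverse complements, which a real assembler would have to handle.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count how often each k-mer occurs across all reads (forward strand only)."""
    counts = Counter()
    for read in reads:
        # A read of length n contains n - k + 1 overlapping k-mers.
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


if __name__ == "__main__":
    # A short AC repeat: its k-mers are counted several times within one read.
    print(count_kmers(["ACACAC"], 2))
```

K-mers from a repeat region such as the ACACAC... example accumulate much higher counts than k-mers from unique sequence, which is what makes a frequency cut-off usable for screening.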

6 Read screening

7 A read is a solid read if: – All its k-mers are within the two threshold cut-offs. Example: – Two cut-offs [42, 120] from the previous graph. – K=5 – Read: ACCGTATA – k-mers: ACCGT, CCGTA, CGTAT, GTATA – Counts: 100, 70, 90, 140 – Not a solid read, because the count of GTATA (140) exceeds the upper cut-off of 120.

8 Read screening Example: – Two cut-offs [42, 120] from the previous graph. – K=5 – Read: ACCGTATG – k-mers: ACCGT, CCGTA, CGTAT, GTATG – Counts: 100, 70, 90, 70 – A solid read, because every count lies within [42, 120].
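The two worked examples can be checked with a few lines of code. This is a sketch under assumptions: the name `is_solid_read` is hypothetical, the cut-offs are treated as inclusive, and a k-mer missing from the count table is treated as having count 0. The counts are the ones given on the slides.

```python
def kmers(read, k):
    """Return the overlapping k-mers of a read, left to right."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def is_solid_read(read, k, counts, low, high):
    """A read is solid if every one of its k-mers has a count within [low, high]."""
    return all(low <= counts.get(kmer, 0) <= high for kmer in kmers(read, k))


# k-mer counts taken from the two slide examples.
counts = {"ACCGT": 100, "CCGTA": 70, "CGTAT": 90, "GTATA": 140, "GTATG": 70}

if __name__ == "__main__":
    print(is_solid_read("ACCGTATA", 5, counts, 42, 120))  # False: GTATA has count 140
    print(is_solid_read("ACCGTATG", 5, counts, 42, 120))  # True
```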

9 Seed Building Try to extend the solid read using all overlapping reads.

10 Seed Building Because of sequencing errors or small repeats, there may be multiple feasible candidates.

11 Seed Building To resolve ambiguities due to sequencing errors, we extend every candidate base up to ReadLength. – If only one candidate path reaches the full distance ReadLength, that path is assumed to be the correct extension. – If no path or more than one path is found, try extending from the other side.
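The look-ahead rule on this slide can be illustrated with a small sketch. It is not the PE-Assembler implementation: the names `next_bases` and `extend_unique`, the exact-match overlap test, and the `min_overlap` parameter are all assumptions made for the illustration, and the number of paths can grow quickly in repetitive sequence.

```python
def next_bases(seq, reads, min_overlap):
    """Bases proposed by reads whose prefix exactly matches the end of seq."""
    bases = set()
    for read in reads:
        longest = min(len(read) - 1, len(seq))
        for ov in range(min_overlap, longest + 1):
            if seq.endswith(read[:ov]):
                bases.add(read[ov])  # first base of the read beyond the overlap
    return bases


def extend_unique(seq, reads, distance, min_overlap):
    """Follow every candidate base for `distance` steps.

    Returns the extension if exactly one path survives the full distance,
    otherwise None (no path, or an unresolved ambiguity).
    """
    paths = [""]
    for _ in range(distance):
        grown = []
        for path in paths:
            for base in sorted(next_bases(seq + path, reads, min_overlap)):
                grown.append(path + base)
        paths = grown  # paths with no supporting read are dropped here
        if not paths:
            return None
    return paths[0] if len(paths) == 1 else None
```

In this sketch a branch caused by a sequencing error typically dies after a few bases because no other read supports it, leaving a single surviving path. A `None` result corresponds to the "no path or more than one path" case, where the caller would try the other end of the sequence.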

12 Seed Building Finally, when the sequence reaches MaxSpan in length (it is then called a seed), verify it: at least one paired-end read must overlap this seed with a span within the expected range [MinSpan, MaxSpan].
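The verification rule might look like the following sketch. Several simplifications are assumed here and are not from the slides: both mates are given in the same orientation as the seed (real paired-end data needs one mate reverse-complemented), matches are exact, only the first occurrence of each mate is considered, and the span is measured from the start of the first mate to the end of the second. The name `seed_is_verified` is hypothetical.

```python
def seed_is_verified(seed, read_pairs, min_span, max_span):
    """Return True if at least one read pair maps onto the seed with a valid span."""
    for left, right in read_pairs:
        start = seed.find(left)
        if start == -1:
            continue  # first mate does not map to this seed
        right_pos = seed.find(right, start)
        if right_pos == -1:
            continue  # second mate does not map downstream of the first
        span = right_pos + len(right) - start
        if min_span <= span <= max_span:
            return True
    return False
```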

13 Contig Extension This step aims to extend each verified seed into a longer contig using paired-end reads. Multiple feasible candidates may arise for 3 reasons. – First, sequencing errors. – Second, short tandem repeats, which are handled in the gap filling step. – Third, long repeats, which are longer than MaxSpan.

14 Scaffolding Find the correct ordering of the resulting set of contigs. Gao Song is currently working on it.

15 Gap filling The gap filling step assembles the gap region between two adjacent contigs to form a longer contig.

16 Gap filling


19 Simulated data results. Results are compared using: – Average length of all contigs. – N50 and N90 of contigs (larger is better). – Coverage. – Large misassemblies: accuracy is much more important than the other measures.
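For reference, N50 is the contig length L such that contigs of length at least L together contain at least 50% of the total assembled bases; N90 uses 90% instead. A minimal sketch of that standard definition follows. The function name `nxx` and the sample lengths are made up for illustration and are not taken from the results.

```python
def nxx(lengths, percent):
    """Return the Nxx statistic (e.g. percent=50 for N50) of a list of contig lengths."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Integer comparison avoids floating-point rounding at the threshold.
        if running * 100 >= percent * total:
            return length
    return 0  # empty input


if __name__ == "__main__":
    contigs = [80, 70, 50, 40, 30, 20]  # illustrative lengths, total 290
    print(nxx(contigs, 50))  # 70: 80 + 70 = 150 >= 145
    print(nxx(contigs, 90))  # 30: 80 + 70 + 50 + 40 + 30 = 270 >= 261
```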

20 Simulated data results.

                        E. coli                                                S. pombe                                               HG18 chr10
                        200bp + 10kbp                    200bp + 1kbp + 10kbp  200bp + 10kbp                    200bp + 1kbp + 10kbp
                        PA         Allpaths2  Velvet     PA         Allpaths2  PA         Allpaths2  Velvet     PA         Allpaths2  PA
Contig statistics
Contigs (>200bp)        23         31         37         6          44         53         190        193        31         164        3158
Average length (kb)     202.7      155.4      125.6      777.4      107.6      231.8      65.2       63.7       394.7      75.3       39.1
Maximum length (kb)     2109.4     732.7      1506.8     2492.6     593.7      3500.9     868.4      1062.8     3519.6     851.0      514.5
Contig N50 size (kb)    883.9      357.1      1413.7     2492.6     362.7      1499.4     236.6      602.2      1487.7     226.8      89.0
Contig N90 size (kb)    355.7      92.4       597.4      2146.0     83.2       210.0      65.7       148.6      507.6      76.4       24.3
Coverage                99.89%     99.83%     99.60%     100.00%    99.85%     97.56%     98.62%     98.95%     97.78%     98.60%     94.20%
Evaluation
Large misassemblies     0          0          2          0          0          0          1          1          0          0          1
Segment maps            99.20%     99.27%     95.00%     99.68%     99.18%     96.02%     96.78%     93.38%     96.42%     96.83%     90.48%
Performance 1
Execution time (min)    1221582122776853261017341682
Memory usage (GB)       1.315.422.329.73.8455.34.56616

21 Thank you for your attention.

