Aligning Reads Ramesh Hariharan Strand Life Sciences IISc.


1 Aligning Reads Ramesh Hariharan Strand Life Sciences IISc

2 What is Read Alignment?

3 Subject’s Genome: AGGCTACGCATTTCCCATAAAGACCCACGCTTAAGTTC. Reference Genome: AGGCTACGCATGTCCCATAATGACCCACACTTAAGTTC (close to, but not quite the same as, the Subject’s Genome). Where do the reads match in the Reference?

4 What does “Match” mean?

5 Reference Genome: AGGCTACGCATGTCCCATAATGACCCACACTTAAGTTC. Exact Match: GCTACGCA. With Mismatches: CATAAAGAC. With Gaps: CACTT_AGT.
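As a rough illustration of the three kinds of match on this slide, the Python sketch below counts mismatches and gaps when a read is laid onto the reference at a known position. The function names, the 0-based positions 2, 15 and 27 (worked out by hand from the slide's sequences), and the treatment of `_` as "the read skips one reference base" are assumptions made for this example, not part of the talk.

```python
REFERENCE = "AGGCTACGCATGTCCCATAATGACCCACACTTAAGTTC"  # reference from the slide


def mismatches_at(reference, read, pos):
    """Count mismatching bases when `read` is laid on `reference` at `pos` (no gaps)."""
    window = reference[pos:pos + len(read)]
    if len(window) != len(read):
        raise ValueError("read runs off the end of the reference")
    return sum(r != w for r, w in zip(read, window))


def gapped_mismatches_at(reference, gapped_read, pos, gap="_"):
    """Return (mismatches, gaps) for a read in which `gap` marks a skipped reference base."""
    window = reference[pos:pos + len(gapped_read)]
    if len(window) != len(gapped_read):
        raise ValueError("read runs off the end of the reference")
    gaps = gapped_read.count(gap)
    mismatches = sum(r != w for r, w in zip(gapped_read, window) if r != gap)
    return mismatches, gaps


print(mismatches_at(REFERENCE, "GCTACGCA", 2))          # exact match
print(mismatches_at(REFERENCE, "CATAAAGAC", 15))        # match with mismatches
print(gapped_mismatches_at(REFERENCE, "CACTT_AGT", 27)) # match with a gap
```

This only scores a placement that is already known; finding the placement is the job of the aligner discussed in the following slides.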

6 Why mismatches and gaps?

7 The subject genome could be different from the reference

8 Mismatches and Gaps [Figure: Reads aligned to the Reference Genome, with a SNP and a Deletion marked]

9 The reading process could be erroneous

10 How many mismatches and gaps?

11 Short reads (~50 bases): few mismatches and gaps. Long reads (~1000 bases): many more mismatches and gaps.

12 How do aligners fare?

13 BWA: very few mismatches and gaps. CoBWeb. BWA-SW: many mismatches and gaps. BowTie: only mismatches, no gaps. No paired read handling. No handling of adaptor trimming for small RNA. Separate handling for RNASeq. BowTie2.

14 How does an Aligner work?

15 For simplicity, assume Exact Match

16 For each read, scan the entire reference genome sequence: SLOW!
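The brute-force approach on this slide can be sketched in Python as follows, assuming exact matching only. `naive_scan` is a hypothetical name, and the toy strings are chosen for illustration. For a reference of length n and a read of length m, this does on the order of n·m character comparisons per read, which is why it does not scale to a whole genome and millions of reads.

```python
def naive_scan(reference, read):
    """Return every 0-based position where `read` occurs exactly in `reference`."""
    hits = []
    n, m = len(reference), len(read)
    for pos in range(n - m + 1):
        # Compare the read against the window starting at `pos`.
        if reference[pos:pos + m] == read:
            hits.append(pos)
    return hits


print(naive_scan("CGACG", "CG"))
```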

17 Index the Reference [Figure: a tree whose edges are labelled with bases (A, C, G, T); other labels: CGACG, The Reference]

18 How can we find Exact Matches of a read quickly with this index?

19 [Figure: the same tree index, with the labels CGACG, The Reference, CG and C]
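One way to picture such a tree index is a suffix trie: every suffix of the reference is inserted as a path from the root, and looking up a read means walking down one edge per base. The Python sketch below assumes this structure (the slides do not say exactly which tree is drawn), and `build_suffix_trie` and `find_exact` are hypothetical names.

```python
def _new_node():
    return {"children": {}, "positions": []}


def build_suffix_trie(reference):
    """Insert every suffix of `reference`; each node records where its path starts."""
    root = _new_node()
    for start in range(len(reference)):
        node = root
        for base in reference[start:]:
            if base not in node["children"]:
                node["children"][base] = _new_node()
            node = node["children"][base]
            node["positions"].append(start)
    return root


def find_exact(trie, read):
    """Walk one edge per base of the read; the node reached lists all match positions."""
    node = trie
    for base in read:
        node = node["children"].get(base)
        if node is None:
            return []
    return list(node["positions"])


trie = build_suffix_trie("CGACG")
print(find_exact(trie, "CG"))
```

The lookup time depends on the read length rather than the reference length, but a naive trie like this one stores a node for every base of every suffix, so its memory use grows very quickly with the reference.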

20 The problem: the index takes 24GB

21 Can this structure be compressed?

22 The Burrows-Wheeler based Index. The Reference: CGAC$. All its circular shifts, sorted lexicographically: AC$CG, CGAC$, C$CGA, GAC$C, $CGAC. One column of these sorted shifts (marked in the figure) is the BWT. The Index: now an array instead of a tree. Sampled to reduce memory at the expense of speed (Ferragina and Manzini).
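The construction on this slide, and the exact-match lookup it enables, can be sketched in Python as below. This is a minimal, unoptimized sketch under stated assumptions: it takes the BWT to be the last column of the sorted shifts (the standard definition), it uses the common convention that `$` sorts before A, C, G and T (so the row order may differ from the figure), and it counts occurrences on the fly instead of using the sampled tables a real index would store. `build_bwt_index` and `find_exact_bwt` are hypothetical names.

```python
from collections import Counter


def build_bwt_index(reference):
    """Sort all circular shifts of reference + '$'; return (start positions, BWT)."""
    text = reference + "$"
    order = sorted(range(len(text)), key=lambda i: text[i:] + text[:i])
    # The last character of the shift starting at i is text[i - 1].
    bwt = "".join(text[i - 1] for i in order)
    return order, bwt


def find_exact_bwt(reference, read):
    """Backward search: narrow a range of sorted shifts, one read base at a time."""
    if not read:
        return []
    order, bwt = build_bwt_index(reference)
    # first_row[ch] = index of the first sorted shift that starts with ch.
    first_row, total = {}, 0
    for ch, count in sorted(Counter(bwt).items()):
        first_row[ch] = total
        total += count
    lo, hi = 0, len(bwt)
    for ch in reversed(read):
        if ch not in first_row:
            return []
        lo = first_row[ch] + bwt[:lo].count(ch)
        hi = first_row[ch] + bwt[:hi].count(ch)
        if lo >= hi:
            return []
    return sorted(order[lo:hi])


print(build_bwt_index("CGAC"))
print(find_exact_bwt("CGAC", "C"))
```

The `bwt[:lo].count(ch)` calls are the slow part here; storing these counts only at sampled rows is one form of the memory-versus-speed trade-off the slide mentions.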

23 How about Mismatches and Gaps?

24 BWA, BWA-SW and BowTie force mismatches and gaps into the BW Index searching procedure

25 CoBWeb uses the BW Index to find a ‘seed’ exact match and does Smith-Waterman around this seed. This 15-mer occurs at locations x1, x2… This 15-mer occurs at locations x3, x4… This whole 30-mer occurs at location x5.
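The seeding step can be sketched in Python as below. This is not CoBWeb's implementation: `seed_candidates` is a hypothetical name, `str.find` stands in for the BW Index lookup, and the seed length of 5 and the toy read are chosen only to keep the example small. Each seed hit votes for the reference position where the whole read would start; candidates with votes would then be passed on to Smith-Waterman.

```python
def seed_candidates(reference, read, seed_len):
    """Split the read into non-overlapping seeds and map each exact seed hit
    to the implied start of the whole read. Returns {read_start: votes}."""
    candidates = {}
    for offset in range(0, len(read) - seed_len + 1, seed_len):
        seed = read[offset:offset + seed_len]
        hit = reference.find(seed)  # stand-in for an index lookup
        while hit != -1:
            start = hit - offset
            candidates[start] = candidates.get(start, 0) + 1
            hit = reference.find(seed, hit + 1)
    return candidates


REFERENCE = "AGGCTACGCATGTCCCATAATGACCCACACTTAAGTTC"
print(seed_candidates(REFERENCE, "TGTCCCATAA", 5))  # both seeds agree
print(seed_candidates(REFERENCE, "TGTCCCATTA", 5))  # second seed has a mismatch
```

Even when one seed contains an error, the other seed can still locate the read, which is the point of searching the halves separately.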

26 Dynamic Programming: given a location in the reference with a read anchor, how well does the read match here? Smith-Waterman (optimized for large gaps). [Figure labels: Reference, Read, Anchor 14-mer]
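For readers unfamiliar with the dynamic program, here is a minimal Python sketch of textbook Smith-Waterman local alignment with a linear gap penalty. It is not the large-gap-optimized variant the slide refers to, and the scoring values (match +2, mismatch -1, gap -2) and the name `smith_waterman` are assumptions made for this example.

```python
def smith_waterman(reference, read, match=2, mismatch=-1, gap=-2):
    """Return (best local alignment score, (read_end, reference_end)) using
    1-based end coordinates into the score matrix."""
    rows, cols = len(read) + 1, len(reference) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_cell = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (
                match if read[i - 1] == reference[j - 1] else mismatch
            )
            up = score[i - 1][j] + gap    # read base aligned to a gap
            left = score[i][j - 1] + gap  # reference base aligned to a gap
            score[i][j] = max(0, diag, up, left)
            if score[i][j] > best:
                best, best_cell = score[i][j], (i, j)
    return best, best_cell


REFERENCE = "AGGCTACGCATGTCCCATAATGACCCACACTTAAGTTC"
print(smith_waterman(REFERENCE, "GCTACGCA"))  # exact match
print(smith_waterman(REFERENCE, "CACTTAGT"))  # read missing one reference base
```

In a seed-and-extend aligner this table would be filled only for a window of the reference around the anchor, not for the whole genome.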

27 Comparison with BWA. Panels: Read Length 50, Read Length 150. 20% faster than BWA with comparable results. CoBWeb: 3 mismatches and 2 gaps. BWA: 2 mismatches + 1 gap of possibly multiple length.

28 Comparison with BWA-SW. Read Length 400, 8 mismatches plus 10 gaps. Columns: CoBWeb, BWA-SW. Reads: 1m. Time taken: 1130s, 2242s. Incorrectly Mapped: 125989819. 5650 mapped incorrectly by BWA-SW. The remainder has poor BWA mapping quality.

29 Avadis NGS

30 Alignment, DNA Var Detection, RNASeq, ChIPSeq, Small RNASeq

31 Thank You

