CSC2431 February 3rd 2010 Alecia Fowler


1 CSC2431 February 3rd 2010 Alecia Fowler
Fast and Accurate Short Read Alignment with Burrows-Wheeler Transform
Heng Li and Richard Durbin

2 Short Read Alignment
SPEED AND ACCURACY

3 Burrows Wheeler Aligner
OVERVIEW: based on backward search and the Burrows-Wheeler Transform (BWT)
FEATURES: performs gapped alignment for single-end reads, supports paired-end mapping, generates mapping quality
PLATFORM: Illumina; SOLiD; 454; Sanger
PROS: fast
CONS: the short read algorithm is slow for long reads and for reads with a high error rate

4 Prefix trie
X = GOOGOL$
“G” “GO” “GOO” “GOOG” “GOOGO” “GOOGOL”

5 Burrows-Wheeler Transform (BWT)
The BWT is an algorithm used in data compression, in particular text compression.
It takes a block of data and rearranges it using a sorting algorithm; the output is easier to compress because similar symbols end up grouped together.
The BWT string is built by sorting all circular shifts of the input string and concatenating the last character of each sorted shift.
The key feature is the first-last property: the k-th occurrence of a character in the BWT string (the last column of the sorted shifts) corresponds to the k-th occurrence of that character in the first column of the sorted circular shifts.
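The construction described on this slide can be sketched in a few lines of Python, using the X = GOOGOL$ example from the prefix trie slide. This is a minimal illustration, not BWA's implementation: the function names `bwt_by_rotations` and `invert_bwt` are hypothetical, the input is assumed to end in a unique `$` that sorts before every other character, and sorting full rotations is only practical for toy strings.

```python
def bwt_by_rotations(text):
    """Return (bwt, suffix_array) for text, assumed to end in a unique '$'."""
    n = len(text)
    # Sort the start offsets of all circular shifts lexicographically.
    order = sorted(range(n), key=lambda i: text[i:] + text[:i])
    # The last character of the shift starting at i is the one just before i
    # (text[-1] wraps around for i == 0).
    bwt = "".join(text[i - 1] for i in order)
    return bwt, order


def invert_bwt(bwt):
    """Rebuild the original text from its BWT using the first-last property."""
    # A stable sort of BWT positions by character lists, for each row of the
    # first column, the BWT position holding the same occurrence of that
    # character (k-th occurrence maps to k-th occurrence).
    first = sorted(range(len(bwt)), key=lambda i: bwt[i])
    lf = [0] * len(bwt)
    for row, bwt_pos in enumerate(first):
        lf[bwt_pos] = row
    # Row 0 is the shift that starts with '$'; walk backwards through the text.
    out, row = [], 0
    for _ in range(len(bwt) - 1):
        out.append(bwt[row])
        row = lf[row]
    return "".join(reversed(out)) + "$"


bwt, sa = bwt_by_rotations("GOOGOL$")
print(bwt)              # LO$OOGG
print(sa)               # [6, 3, 0, 5, 2, 4, 1]
print(invert_bwt(bwt))  # GOOGOL$
```

The round trip through `invert_bwt` works only because of the first-last property: each step jumps from a character in the last column to the same occurrence of that character in the first column.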

6 Suffix array interval and sequence alignment

7 Exact and Inexact Matching
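W = LOL
X = GOOGOL$
Exact matching finds the suffix array interval of W in X by backward search.
Inexact matching also has to account for mismatches or gaps in the reads.
The BWT index of the reverse reference sequence is used to narrow the search space.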
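The W = LOL, X = GOOGOL$ example can be traced with a small Python sketch of backward search. Several things here are assumptions made for illustration rather than details taken from the slides: all function names are hypothetical, intervals are half-open ranges over a suffix array that includes the `$` row, only mismatches are handled (no gaps), and the index is built by naive sorting. The lower-bound array computed from the reverse-reference index stands in for the search-space narrowing mentioned above.

```python
def build_index(text):
    """Suffix array, C table and occurrence counts for text ending in '$'."""
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    bwt = [text[i - 1] for i in sa]
    alphabet = sorted(set(text))
    c_table, total = {}, 0
    for ch in alphabet:
        c_table[ch] = total          # characters strictly smaller than ch
        total += text.count(ch)
    occ = {ch: [0] for ch in alphabet}
    for b in bwt:
        for ch in alphabet:          # occ[ch][i] = count of ch in bwt[:i]
            occ[ch].append(occ[ch][-1] + (b == ch))
    return sa, c_table, occ


def step(index, ch, lo, hi):
    """Prepend ch to the current match: one backward-search step."""
    _, c_table, occ = index
    if ch not in c_table:
        return 0, 0
    return c_table[ch] + occ[ch][lo], c_table[ch] + occ[ch][hi]


def exact_positions(index, pattern):
    sa = index[0]
    lo, hi = 0, len(sa)
    for ch in reversed(pattern):
        lo, hi = step(index, ch, lo, hi)
        if lo >= hi:
            return []
    return sorted(sa[lo:hi])


def lower_bounds(rev_index, pattern):
    """bounds[i]: lower bound on the differences needed to align pattern[:i+1]."""
    n = len(rev_index[0])
    bounds, z, lo, hi = [], 0, 0, n
    for ch in pattern:
        lo, hi = step(rev_index, ch, lo, hi)
        if lo >= hi:                 # current piece is not a substring of X
            z += 1
            lo, hi = 0, n
        bounds.append(z)
    return bounds


def mismatch_positions(text, pattern, max_mismatches):
    index = build_index(text)
    rev_index = build_index(text[:-1][::-1] + "$")   # reverse reference
    bounds = lower_bounds(rev_index, pattern)
    sa, c_table, _ = index
    letters = [ch for ch in c_table if ch != "$"]
    hits = set()

    def recur(i, z, lo, hi):
        if z < (bounds[i] if i >= 0 else 0):
            return                   # not enough mismatches left: prune
        if i < 0:
            hits.update(sa[lo:hi])
            return
        for ch in letters:
            new_lo, new_hi = step(index, ch, lo, hi)
            if new_lo < new_hi:
                recur(i - 1, z - (ch != pattern[i]), new_lo, new_hi)

    recur(len(pattern) - 1, max_mismatches, 0, len(sa))
    return sorted(hits)


index = build_index("GOOGOL$")
print(exact_positions(index, "GO"))           # [0, 3]
print(exact_positions(index, "LOL"))          # []
print(mismatch_positions("GOOGOL$", "LOL", 1))  # [3]
```

With one mismatch allowed, LOL aligns to the GOL at offset 3. The bounds for LOL are [0, 1, 1], so any branch that spends its only mismatch on the last character is cut off immediately.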

8 Evaluation: Simulated Data
Simulated reads from the human genome
One million pairs of reads of different lengths
Mapped to the human genome
BWA was found to be more accurate than Bowtie and SOAPv2
Would need to sacrifice mapping quality in order to increase speed

9 Evaluation: Real Data
12.2 million pairs of 51bp reads from a male genome
Mapped to the human genome and to a human-chicken hybrid reference
BWA had high speed and accuracy on both references

