John Dorband, Yaacov Yesha, and Ashwin Ganesan Analysis of DNA Sequence Alignment Tools
Project Goal The goal of our project is to analyze DNA sequence alignment tools, such as SHRiMP [1], Bowtie [2], BWA [3], and BFAST [4], to explain why different tools produce different results, and to find ways of improving the tools.
Alignment of Short Reads A common task is aligning short reads of DNA to a reference genome (database). A common technique used by DNA alignment tools is creating a searchable index.
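To make the idea of a searchable index concrete, here is a minimal sketch that uses a plain hash table of k-mers. It is a simplified stand-in chosen for illustration and is not how any of the tools named in this presentation build their indexes. The function names, the toy reference string, and the choice of k = 4 are all assumptions.

```python
from collections import defaultdict


def build_kmer_index(reference: str, k: int) -> dict:
    """Map every k-mer of the reference to the list of positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def candidate_positions(read: str, index: dict, k: int) -> set:
    """Return reference start positions suggested by exact k-mer hits of the read."""
    candidates = set()
    for offset in range(len(read) - k + 1):
        # .get avoids inserting empty entries into the defaultdict on a miss.
        for hit in index.get(read[offset:offset + k], []):
            start = hit - offset  # where the read would begin on the reference
            if start >= 0:
                candidates.add(start)
    return candidates


# Toy data, invented for this example.
reference = "ACGTACGTTAGCCGATAGCTAGGCTA"
index = build_kmer_index(reference, k=4)
print(sorted(candidate_positions("TAGCCGAT", index, k=4)))
```

Each candidate position would still need to be verified by a full alignment of the read against the reference at that position; the index only narrows down where to look.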
Transitions Vs. Transversions As mentioned in [5], transition mutations (A↔G and C↔T) have a higher probability than transversion mutations (all other substitutions). The authors of [5] used this fact to improve DNA alignment. We introduced the following technique: in situations where the mutation rate is sufficiently high compared with the sequencing error rate, use different penalties for transition mismatches and transversion mismatches in algorithms related to the Burrows-Wheeler transform [6], such as those used in Bowtie [2] and BWA [3]. We plan to test our technique.
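The penalty scheme can be sketched as follows for a simple ungapped comparison of a read against a reference segment of the same length. This is only an illustration of the scoring idea, not the search procedure of Bowtie or BWA. The function names and the penalty values (1 for a transition, 2 for a transversion) are hypothetical placeholders; the presentation does not specify them.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(a: str, b: str) -> bool:
    """True if a and b differ and are both purines or both pyrimidines."""
    return a != b and ({a, b} <= PURINES or {a, b} <= PYRIMIDINES)


def mismatch_penalty(read: str, ref: str,
                     transition_penalty: int = 1,
                     transversion_penalty: int = 2) -> int:
    """Sum mismatch penalties over an ungapped, equal-length comparison.

    The default penalty values are illustrative assumptions only.
    """
    if len(read) != len(ref):
        raise ValueError("read and reference segment must have equal length")
    total = 0
    for r, g in zip(read, ref):
        if r == g:
            continue
        total += transition_penalty if is_transition(r, g) else transversion_penalty
    return total


print(mismatch_penalty("ACGT", "GCGT"))  # one A/G mismatch (transition)
print(mismatch_penalty("ACGT", "CCGT"))  # one A/C mismatch (transversion)
```

With a lower transition penalty, a candidate alignment whose mismatches are mostly transitions ranks ahead of one with the same number of transversions, which matches the intuition that such differences are more likely to be real mutations.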
Comparing DNA Alignment Tools Our work also includes comparing several DNA alignment tools. We compared Bowtie and SHRiMP and found that SHRiMP mapped 74.18% of the reads, while Bowtie mapped 35.79%. We plan to use simulated data, as was used in , in order to compare the sensitivity and specificity of different DNA alignment tools.
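A mapped percentage alone does not say whether the reads were placed correctly, which is why simulated reads with known origins are useful. The sketch below shows one way such an evaluation might be computed. The function name, the dictionary-based input format, and the tolerance parameter are assumptions, and the exact definitions of sensitivity and specificity vary between papers: here sensitivity is taken as the fraction of all reads mapped to the correct position, and the fraction of mapped reads that are correct is reported separately.

```python
def evaluate_mappings(true_positions: dict, reported_positions: dict,
                      tolerance: int = 0) -> dict:
    """Compare reported mapping positions against known simulated origins.

    true_positions:     read id -> true start position (from the simulator)
    reported_positions: read id -> reported start position (unmapped reads absent)
    tolerance:          maximum allowed offset for a mapping to count as correct
    """
    total = len(true_positions)
    mapped = 0
    correct = 0
    for read_id, true_pos in true_positions.items():
        reported = reported_positions.get(read_id)
        if reported is None:
            continue  # the tool did not map this read
        mapped += 1
        if abs(reported - true_pos) <= tolerance:
            correct += 1
    return {
        "mapped_fraction": mapped / total if total else 0.0,
        "sensitivity": correct / total if total else 0.0,
        "correct_among_mapped": correct / mapped if mapped else 0.0,
    }


# Toy data, invented for this example.
truth = {"r1": 100, "r2": 200, "r3": 300, "r4": 400}
reported = {"r1": 100, "r2": 205, "r3": 300}
print(evaluate_mappings(truth, reported))
```

Running the same evaluation on the output of each tool, with the same simulated reads, would put the tools on a common footing.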
A Performance Issue At IGS it was found that BWA was performing an enormous number of file opens and closes, which resulted in extremely poor performance. We analyzed the problem and concluded that it is likely caused by system file locks. We recommend that the BWA code be checked and likely modified in order to eliminate this problem.
Polymorphism One claimed strength of SHRiMP [1] is handling substantial polymorphism. We plan on using simulated test data that will include substantial polymorphism in addition to sequencing errors. We plan to run SHRiMP and also other mapping tools on that data and compare sensitivity and specificity.
References
[1] Stephen M. Rumble, Phil Lacroute, Adrian V. Dalca, Marc Fiume, Arend Sidow, Michael Brudno, SHRiMP: Accurate Mapping of Short Color-space Reads, PLoS Computational Biology, May 2009.
[2] Ben Langmead, Cole Trapnell, Mihai Pop and Steven L Salzberg, Ultrafast and memory-efficient alignment of short DNA sequences to the human genome, Genome Biology, 2009.
[3] Heng Li and Richard Durbin, Fast and accurate short read alignment with Burrows–Wheeler transform, Bioinformatics, 2009.
References (continued)
[4] Nils Homer, Barry Merriman, Stanley F. Nelson, BFAST: An Alignment Tool for Large Scale Genome Resequencing, PLoS ONE, 2009.
[5] Laurent Noé and Gregory Kucherov, Improved hit criteria for DNA local alignment, BMC Bioinformatics 2004, 5:149.
[6] M. Burrows and D.J. Wheeler, A Block-sorting Lossless Data Compression Algorithm, SRC Research Report 124, May 10, 1994, Digital Systems Research Center, 130 Lytton Avenue, Palo Alto, California 94301.