Ultrafast and memory-efficient alignment of short DNA sequences to the human genome
Ben Langmead, Cole Trapnell, Mihai Pop, Steven L. Salzberg
林恩羽 宋曉亞 陳翰平



Rationale
- Existing methods (Maq, SOAP, …): the computational cost of aligning many short reads is large
- Aligning 140 billion bases would take:
  - Maq: 5 CPU-months
  - SOAP: 3 CPU-years
- There is a clear need for new tools that consume less time and fewer computational resources

Bowtie
- Ultrafast and memory-efficient
- Uses Burrows-Wheeler indexing
- Aligns 25 million reads per CPU hour with a memory footprint of approximately 1.3 GB
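
To make the idea of Burrows-Wheeler indexing concrete, here is a minimal sketch of exact matching by backward search over a Burrows-Wheeler index. It is a toy illustration, not Bowtie's implementation: the function names `bwt_index` and `find_exact` are hypothetical, the index is built by naive suffix sorting, and nothing is compressed, so it only suits short example strings.

```python
def bwt_index(text):
    """Build a toy Burrows-Wheeler index for `text` using naive suffix sorting."""
    text += "$"  # sentinel; sorts before A, C, G and T
    # Suffix array: start positions of all suffixes in lexicographic order.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # BWT: the character that precedes each sorted suffix (wraps to "$" for i == 0).
    bwt = "".join(text[i - 1] for i in sa)
    # first[c]: number of characters in the text that are strictly smaller than c.
    first, total = {}, 0
    for c in sorted(set(text)):
        first[c] = total
        total += text.count(c)
    # occ[c][i]: number of occurrences of c in bwt[:i].
    occ = {c: [0] * (len(bwt) + 1) for c in first}
    for i, b in enumerate(bwt):
        for c in occ:
            occ[c][i + 1] = occ[c][i] + (b == c)
    return sa, first, occ


def find_exact(read, index):
    """Return the sorted 0-based positions where `read` occurs exactly."""
    sa, first, occ = index
    lo, hi = 0, len(sa)  # half-open range of suffix-array rows that still match
    for c in reversed(read):  # extend the match one character to the left
        if c not in first:
            return []
        lo = first[c] + occ[c][lo]
        hi = first[c] + occ[c][hi]
        if lo >= hi:
            return []
    return sorted(sa[lo:hi])


index = bwt_index("GATTACAGATTACA")
print(find_exact("ATTA", index))  # [1, 8]
print(find_exact("GGG", index))   # []
```

Each character of the read narrows the range of matching rows with two table lookups, so the search cost depends on the read length rather than on the length of the reference.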

Bowtie
- Aligns 35-bp reads at 25 million reads per CPU hour (35 times faster than Maq, 300 times faster than SOAP)
- Its small memory footprint allows Bowtie to run on a PC with 2 GB of RAM
- Can use multiple processor cores to achieve greater alignment speed
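
As a back-of-the-envelope check, the quoted rate and speedup factors can be turned into implied throughputs and CPU time. This is only a sketch: the helper names are hypothetical, and the CPU-hour estimate assumes that the 140 billion bases mentioned under Rationale consist entirely of 35-bp reads, which the slides do not state.

```python
BOWTIE_READS_PER_CPU_HOUR = 25_000_000   # rate quoted on the slide
SPEEDUP_OVER = {"Maq": 35, "SOAP": 300}  # speedup factors quoted on the slide


def implied_reads_per_cpu_hour(tool):
    """Throughput of `tool` implied by Bowtie's rate and the quoted speedup."""
    return BOWTIE_READS_PER_CPU_HOUR / SPEEDUP_OVER[tool]


def bowtie_cpu_hours(total_bases, read_length=35):
    """CPU hours for Bowtie, assuming every read is `read_length` bases long."""
    n_reads = total_bases / read_length
    return n_reads / BOWTIE_READS_PER_CPU_HOUR


print(round(implied_reads_per_cpu_hour("Maq")))   # 714286
print(round(implied_reads_per_cpu_hour("SOAP")))  # 83333
print(bowtie_cpu_hours(140_000_000_000))          # 160.0
```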

Compromise
- Bowtie may fail to align a small number of reads that have valid alignments, if those reads have multiple mismatches
- Options are available that increase accuracy at the cost of some performance

Environment
        PC                       Server
CPU     Intel Core 2, 2.4 GHz    AMD Opteron (4-core)
RAM     2 GB                     32 GB
OS      Red Hat Enterprise Linux AS Release 4 (both machines)

Evaluation
- Measured: wall-clock time and CPU time
- Time to build the index is excluded
- Pre-computed indexes are reused:
  - Human
  - Chimp
  - Mouse
  - Dog
  - Rat
  - Arabidopsis thaliana

Comparison to SOAP and Maq
- Reads from the 1000 Genomes Project (NCBI Short Read Archive: SRR001115)
- 8.84 million reads
- Trimmed to 35 bp
- Aligned to the human reference genome (build 36.3)

Bowtie vs. SOAP
- 99.7% were aligned by both
- 0.2% were aligned only by Bowtie
- 0.1% were aligned only by SOAP
Bowtie vs. Maq
- 96.0% were aligned by both
- 0.1% were aligned only by Bowtie
- 3.9% were aligned only by Maq
- Maq is more flexible: it allows 3 mismatches
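
The three percentages in each comparison sum to 100%, which is consistent with taking the reads aligned by at least one of the two tools as the denominator. A minimal sketch of that calculation follows. The denominator is an assumption, since the slide does not state it, and the function name and the toy read IDs are hypothetical.

```python
def overlap_percentages(aligned_a, aligned_b):
    """Split the reads aligned by tool A or tool B into both / only A / only B.

    Percentages are relative to the reads aligned by at least one tool
    (an assumption; the slide does not state the denominator).
    """
    a, b = set(aligned_a), set(aligned_b)
    union = a | b
    if not union:
        raise ValueError("neither tool aligned any read")

    def pct(subset):
        return 100.0 * len(subset) / len(union)

    return {"both": pct(a & b), "only_a": pct(a - b), "only_b": pct(b - a)}


# Toy read IDs, not data from the presentation.
print(overlap_percentages({1, 2, 3, 4}, {3, 4, 5}))
# {'both': 40.0, 'only_a': 40.0, 'only_b': 20.0}
```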

Maq with filtered read set
- Reads with 'poly-A' artifacts are filtered out
- Filtering uses Maq's 'catfilter' command
- 438,145 reads are removed

Read length and performance
- Maximum read lengths supported:
  - Bowtie: 1024 bp
  - Maq: 127 bp (v0.7.0)
  - SOAP: 60 bp
- Three read sets are aligned to the human genome:
  - 36-bp set (SRR003084)
  - 50-bp set (SRR003092)
  - 76-bp set (SRR003196)

Parameters
- Bowtie options used:
  - '-v 2': allows 2 mismatches [SOAP]
  - '--maxns 5': filters out reads with 5 or more no-confidence bases [SOAP]
  - '-z': only one index (forward or mirror) is resident in memory at a time [Maq]
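
As an illustration, the three options quoted on this slide could be assembled into a command line as sketched below. Several things here are assumptions: the executable name `bowtie`, the argument order (index basename followed by reads file), and the names `hs_ref` and `reads.fq` are not given in the slides, and the options come from the Bowtie version used in the presentation, so they may not exist in current releases. The sketch only builds the argument list and does not run anything.

```python
import shlex

# Options exactly as quoted on the slide.
SLIDE_OPTIONS = ["-v", "2", "--maxns", "5", "-z"]


def bowtie_argv(index_basename, reads_file, options=SLIDE_OPTIONS):
    """Build an argument list; executable name and argument order are assumed."""
    return ["bowtie", *options, index_basename, reads_file]


# 'hs_ref' and 'reads.fq' are hypothetical placeholder names.
print(shlex.join(bowtie_argv("hs_ref", "reads.fq")))
# bowtie -v 2 --maxns 5 -z hs_ref reads.fq
```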

Parallel performance
- Bowtie allows the user to specify a desired number of threads
- The memory image of the index is shared by all threads
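
The pattern of several threads sharing one read-only index can be sketched as follows. This is not Bowtie's code: `find_first` stands in for a real aligner, the plain reference string stands in for the index, and all names are hypothetical. Note that in CPython the global interpreter lock prevents a speedup for pure-Python work like this, so the sketch shows only the sharing pattern, not the performance gain.

```python
from concurrent.futures import ThreadPoolExecutor


def find_first(read, reference):
    """Stand-in for an aligner: first exact match position, or -1 if none."""
    return reference.find(read)


def align_all(reads, reference, n_threads):
    """Align reads with `n_threads` workers that share one read-only reference."""
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        # Every worker reads the same `reference` object; it is never copied.
        return list(pool.map(lambda read: find_first(read, reference), reads))


print(align_all(["ATTA", "ACA", "GGG"], "GATTACA", n_threads=2))  # [1, 4, -1]
```

Because the shared structure is never modified after it is built, the workers need no locking around it.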

Index building
- Bowtie uses a flexible indexing algorithm that can be configured to trade off memory usage against running time

Discussion
- Unlike SOAP, Bowtie has a memory footprint small enough (1.3 GB) to run on a typical PC with 2 GB of RAM
- Unlike many other short-read aligners, Bowtie creates a permanent index of the reference that can be reused across alignment runs

Discussion
- Bowtie's speed and small memory footprint are due chiefly to its use of the Burrows-Wheeler index
- Bowtie does not yet support paired-end alignment or alignments with insertions or deletions