SOAP3-dp Workflow.

SOAP3-dp workflow for paired-end alignment

Step 1: Use SOAP3 (2-mismatch) to align the paired-end reads.
  Input: paired-end reads.
  Output: paired alignments, e.g. chr 6, +4,059, -4,369.

Step 2: For reads with one end mapped but the other not, use Default-DP to align the unmapped ends.
  Input: one end's alignments (e.g. chr 9, +49,538) plus the unmapped ends.
  The mapped region of one end gives the candidate region for the unmapped end; use DP to align the unmapped end within it.
  Output: paired alignments, e.g. chr 9, +49,538, -49,829.

Step 3: For reads with both ends unaligned, use SOAP3 (1-mismatch) to align the seeds, and then use Deep-DP to align both ends.
  Input: seed alignments of the first end (e.g. chr 18, +349,683) and seed alignments of the second end (e.g. chr 18, -349,998).
  Pair up the seed alignments. The paired seed alignments (e.g. chr 18, +349,683, -349,998) give the candidate region on chr 18; use DP to align both ends within it.
  Output: paired alignments, e.g. chr 18, +349,664, -349,923.

Step 1: SOAP3

SOAP3 (2-mismatch) produces one of five outcomes for each read pair:
  - Both ends can be mapped and paired properly: report the alignments.
  - Only one end can be mapped, with not too many hits (i.e. <= 30): store the readID (of the aligned end) and hits to ARRAY A.
  - Only one end can be mapped, with too many hits (i.e. > 30): store the readID (of the aligned end) and hits to ARRAY B.
  - Neither end can be mapped: store the readID (of the first read of the pair) and hits to ARRAY C.
  - Both ends can be mapped but not paired properly: store the readID and hits to ARRAY A or B (described on the next slide).

A read pair is paired properly if:
  - Both ends are mapped within the insert size (i.e. a range of distances between the two ends, supplied by the user).
  - The ends are in proper orientation (for Illumina reads, the end aligned to the left side is on the forward strand, while the other, aligned to the right, is on the reverse strand).
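The routing above can be sketched in a few lines of Python. This is a minimal illustration, not SOAP3-dp's code: the `Hit` record, the function names, and the definition of insert size as the outer distance between the two ends are all assumptions made for the example.

```python
from dataclasses import dataclass

MAX_HITS = 30  # from the slide: "too many hits" means > 30


@dataclass(frozen=True)
class Hit:
    """Hypothetical alignment record (not SOAP3-dp's own structure)."""
    chrom: str
    pos: int      # leftmost reference coordinate of the aligned end
    strand: str   # "+" or "-"


def is_proper_pair(a, b, min_insert, max_insert, read_len):
    """Assumed reading of 'paired properly': same chromosome, left end on '+',
    right end on '-', and outer distance inside the user-supplied range."""
    if a.chrom != b.chrom:
        return False
    left, right = (a, b) if a.pos <= b.pos else (b, a)
    if left.strand != "+" or right.strand != "-":
        return False
    outer = right.pos + read_len - left.pos
    return min_insert <= outer <= max_insert


def route_step1(hits1, hits2, min_insert, max_insert, read_len):
    """Return 'report', 'A', 'B', 'C', or 'A_or_B' for one read pair."""
    if hits1 and hits2:
        for a in hits1:
            for b in hits2:
                if is_proper_pair(a, b, min_insert, max_insert, read_len):
                    return "report"
        return "A_or_B"          # both mapped, not paired properly
    if not hits1 and not hits2:
        return "C"               # neither end mapped
    mapped = hits1 or hits2      # exactly one end mapped
    return "A" if len(mapped) <= MAX_HITS else "B"
```

The `A_or_B` outcome is deliberately left unresolved here, since the choice between the two arrays depends on the hit counts of both ends.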

Step 1: SOAP3 -- Both ends can be mapped but not paired properly

Both read 1 and read 2 are mapped (YES, YES) but not paired properly.

Let x = # of all valid hits of read 1.
Let y = # of all valid hits of read 2.
If x > 30, retain only the best hits of read 1 and reset x = # of best hits of read 1.
If y > 30, retain only the best hits of read 2 and reset y = # of best hits of read 2.

Store the read ID and hits of the read marked YES to ARRAY A or B:

  Case               read 1   read 2   Stored to
  a) x, y <= 30      YES      NO       ARRAY A
                     NO       YES      ARRAY A
  b) x <= 30 < y     YES      NO       ARRAY A
  c) y <= 30 < x     NO       YES      ARRAY A
  d) 30 < x < y      YES      NO       ARRAY B
  e) 30 < y <= x     NO       YES      ARRAY B
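The case table can be expressed as a small function. This is a sketch under stated assumptions: hits are modelled as hypothetical `(position, mismatches)` tuples, and "best hits" is taken to mean the hits with the fewest mismatches, which the slide does not define.

```python
MAX_HITS = 30  # threshold from the slide


def cap_to_best(hits):
    """If a read has more than MAX_HITS hits, keep only its best hits.
    Assumption: 'best' means fewest mismatches; hits are (position, mismatches)."""
    if len(hits) <= MAX_HITS:
        return hits
    fewest = min(mm for _, mm in hits)
    return [h for h in hits if h[1] == fewest]


def assign_unpaired(hits1, hits2):
    """Return a list of (array_name, read_number, hits) entries to store."""
    h1, h2 = cap_to_best(hits1), cap_to_best(hits2)
    x, y = len(h1), len(h2)
    if x <= MAX_HITS and y <= MAX_HITS:   # case a: each end is stored in turn
        return [("A", 1, h1), ("A", 2, h2)]
    if x <= MAX_HITS:                     # case b: x <= 30 < y
        return [("A", 1, h1)]
    if y <= MAX_HITS:                     # case c: y <= 30 < x
        return [("A", 2, h2)]
    if x < y:                             # case d: 30 < x < y
        return [("B", 1, h1)]
    return [("B", 2, h2)]                 # case e: 30 < y <= x
```

In every case other than (a), the read with fewer hits is the one stored; the other end is then re-aligned by DP in the following steps.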

Step 2 and step 3: Default DP and New default DP

Array A -> Default DP
  - Both ends can be mapped and paired properly: report the alignments.
  - Otherwise: store the readID of the first read of the pair to ARRAY C.

Array B -> New default DP
  - Both ends can be mapped and paired properly: report the alignments.
  - Otherwise: store the readID of the first read of the pair to ARRAY C.

Detailed picture of Default DP and New Default DP

Default-DP: for reads with one end mapped but the other not, AND the number of hits is not too many, use Default-DP to align the unmapped ends.
  Input: one end's alignments (e.g. chr 9, +49538) plus the unmapped ends.
  The mapped region of one end gives the candidate region for the unmapped end; use DP to align the unmapped end within it.
  Output: paired alignments, e.g. chr 9, +49538, -49829.

New-Default-DP: for reads with one end mapped but the other not, AND the number of hits is too many, use New-Default-DP to align the unmapped ends.
  Input: one end's alignments (e.g. chr 18, +349683) plus the unmapped ends.
  1. Extract seeds from the unmapped ends and align them with SOAP3 (1-mismatch), giving seed alignments of the unmapped end (e.g. chr 18, -349998).
  2. Pair up the seed alignments with the alignments of the other end (e.g. chr 18, +349683, -349998).
  3. The pairing gives a candidate region on chr 18 alongside the mapped region of one end; use DP to align.
  Output: paired alignments, e.g. chr 18, +349683, -349923.
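One way to picture the candidate region for the unmapped end is as a window bounded by the maximum insert size. The slides do not give the exact formula, so the function below is only an assumed, conservative version: the name `candidate_region`, the half-open coordinates, and the use of a single `max_insert` bound are all illustrative.

```python
def candidate_region(pos, strand, read_len, max_insert):
    """Return a half-open (start, end) window on the same chromosome in which
    the mate of a mapped end is expected. Assumed geometry: a '+' end is the
    left end of the fragment, a '-' end is the right end."""
    if strand == "+":
        start = pos                  # fragment starts at the mapped end
        end = pos + max_insert       # mate must finish within the insert size
    else:
        end = pos + read_len         # fragment ends where the mapped end ends
        start = end - max_insert     # mate must start within the insert size
    return max(0, start), end        # clamp at the start of the chromosome
```

With the slide's example of an end mapped at chr 9, +49538, a read length of 100 and a maximum insert size of 500 (both assumed values), the window is 49538 to 50038, which contains the reported mate position 49829.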

Step 4: 2-level Deep DP

Input: ARRAY C

ROUND 1 SEEDING for both ends
  Seed length: 26
  Sample rate: 1/13
  Max # of hits allowed: 100
  - If there exist pairs of hits within the insert size: perform DP for those pairs of hits.
  - If (1) there exists a seed with too many hits, AND (2) there are no pairs of hits within the insert size: go to round 2.

ROUND 2 SEEDING for both ends
  Seed length: 30
  Sample rate: 1/15
  Max # of hits allowed: 1000
  - If there exist pairs of hits within the insert size: perform DP for those pairs of hits.

Result
  Case 1: Valid paired alignments found: report the alignments.
  Case 2: No valid paired alignment found: store the readID of both ends to ARRAY D.
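The two seeding rounds differ only in their parameters, which suggests a small sketch. It assumes that a sample rate of 1/13 means one seed starting every 13 bases; the slide does not spell this out, and `SeedRound`, `extract_seeds`, and `needs_round2` are hypothetical names.

```python
from typing import NamedTuple


class SeedRound(NamedTuple):
    seed_len: int   # seed length from the slide
    step: int       # assumed meaning of "sample rate 1/step"
    max_hits: int   # max number of hits allowed per seed


ROUND1 = SeedRound(seed_len=26, step=13, max_hits=100)
ROUND2 = SeedRound(seed_len=30, step=15, max_hits=1000)


def extract_seeds(read, rnd):
    """Return (offset, seed) pairs sampled every rnd.step bases."""
    last_start = len(read) - rnd.seed_len
    return [(i, read[i:i + rnd.seed_len])
            for i in range(0, last_start + 1, rnd.step)]


def needs_round2(seed_hit_counts, n_pairs_within_insert, rnd=ROUND1):
    """Round 2 is triggered only when both slide conditions hold:
    (1) some seed exceeds the hit limit, and
    (2) no pair of hits lies within the insert size."""
    too_many = any(count > rnd.max_hits for count in seed_hit_counts)
    return too_many and n_pairs_within_insert == 0
```

Under this reading, a 100-base read yields 6 overlapping seeds in round 1 and 5 in round 2; the longer seeds and higher hit limit of round 2 target reads whose shorter seeds were too repetitive.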

Step 5: Single DP

Array D -> Single DP
  - The end can be mapped: report the alignments.
  - Otherwise: report that the ends cannot be aligned.

Detailed picture of Single DP

  1. Extract seeds and align them with SOAP3 (1-mismatch), giving seed alignments (e.g. chr 18, +349,683).
  2. Each seed alignment gives a candidate region on chr 18 around the seed position (349,683, +); use DP to align the read within it.
  3. Report the alignments, e.g. chr 18, +349,664.
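The "use DP to align" step can be illustrated with a textbook semi-global alignment, in which the whole read must align but the candidate region may be entered and left anywhere. The slides do not state which DP formulation or scoring SOAP3-dp uses, so the recurrence and the score values below are assumptions chosen only for the example.

```python
def fit_align(read, region, match=1, mismatch=-2, gap=-3):
    """Semi-global DP: align all of `read` against any substring of `region`.
    Returns (best_score, end), where `end` is the exclusive end offset of the
    alignment inside `region`. Scores are illustrative, not SOAP3-dp's."""
    n, m = len(read), len(region)
    prev = [0] * (m + 1)                 # row 0: starting anywhere is free
    for i in range(1, n + 1):
        cur = [prev[0] + gap] + [0] * m  # column 0: read bases against nothing
        for j in range(1, m + 1):
            sub = match if read[i - 1] == region[j - 1] else mismatch
            cur[j] = max(prev[j - 1] + sub,   # match or mismatch
                         prev[j] + gap,       # read base aligned to a gap
                         cur[j - 1] + gap)    # region base aligned to a gap
        prev = cur
    end = max(range(m + 1), key=lambda j: prev[j])  # ending anywhere is free
    return prev[end], end
```

For example, `fit_align("ACGT", "TTACGTTT")` finds the exact occurrence of the read and returns a score of 4 with end offset 6. Only the last two DP rows are kept, so memory grows with the region length rather than with read length times region length.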

Paired-end alignment (overall workflow)

  1. Load 6M reads (3M pairs).
  2. SOAP3 (2-mismatch). At the same time, create a new CPU thread to load the next 6M reads.
  3. New default DP.
  4. Default DP.
  5. 2-level deep DP.
  6. Single DP.
  7. More reads to process? Yes: go back to step 2 with the next batch. No: END.

Note: New-default DP needs the 2BWT index in the GPU, while default DP does not. Thus we run new-default DP before default DP, because after SOAP3 the 2BWT index is already inside the GPU.
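The idea of loading the next batch on a separate CPU thread while the current batch is aligned can be sketched as follows. This is an illustration of the overlap only: `batches`, `run_pipeline`, and the `stages` callables are hypothetical stand-ins, and the real alignment stages are not modelled.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def batches(reads, size):
    """Yield successive lists of at most `size` reads."""
    it = iter(reads)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def run_pipeline(reads, batch_size, stages):
    """Run each batch through `stages` in order while a loader thread
    fetches the next batch in the background."""
    results = []
    source = batches(reads, batch_size)
    with ThreadPoolExecutor(max_workers=1) as loader:
        pending = loader.submit(next, source, None)      # load the first batch
        while True:
            batch = pending.result()
            if batch is None:                            # no more reads: END
                break
            pending = loader.submit(next, source, None)  # prefetch next batch
            for stage in stages:                         # e.g. SOAP3, then the DP steps
                batch = stage(batch)
            results.extend(batch)
    return results
```

The stage order would follow the workflow above, with new-default DP placed before default DP. Only one prefetch is ever in flight, so the generator is never advanced by two threads at once.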

SOAP3 Architecture

Memory-resident data structures
  Host (CPU): 2BWT + SA
  Device (GPU): 2BWT

Execution (host and device, repeated for each batch of 1M reads)
  1. Process 1M reads for round 1 and round 2 alignments.
  2. Process round 3 alignment and report results.
  3. Continue with the next 1M reads.

Notes: hard reads may be processed in multiple GPU rounds; there are many cases.

DP with seeding

Memory-resident data structures
  Host (CPU): 2BWT + SA
  Device (GPU): 2BWT / DP tables

Execution
  1. Copy the 2BWT index to the GPU and extract seeds of the reads in Array C.
  2. SOAP3 (1-mismatch): process 1M seeds at a time for round 1 and round 2 alignments, then process round 3 alignment.
  3. Pair up the seed alignments, clear the 2BWT index in the GPU, and create DP tables in the GPU.
  4. Perform DP between the reads and the candidate regions.

Notes: hard reads may be processed in multiple GPU rounds; there are many cases.

Default DP

Memory-resident data structures
  Host (CPU): 2BWT + SA
  Device (GPU): DP tables

Execution
  1. Create DP tables in the GPU.
  2. Perform DP between the reads and the candidate regions.

Notes: hard reads may be processed in multiple GPU rounds; there are many cases.

Single-end alignment (overall workflow)

  1. Load 6M single-end reads.
  2. SOAP3 (2-mismatch). At the same time, create a new CPU thread to load the next 6M reads.
  3. Single DP.
  4. More reads to process? Yes: go back to step 2 with the next batch. No: END.

Paired-end alignment (for read length > 150)

  1. Load 6M reads (3M pairs).
  2. 2-level deep DP. At the same time, create a new CPU thread to load the next 6M reads.
  3. Single DP.
  4. More reads to process? Yes: go back to step 2 with the next batch. No: END.