A Fast, Accurate, and Efficient Way to Identify DNA


DNA Overview. Sequence Alignment. Problem & Previous Solutions. GPU & CUDA. Implemented Solution. GUI (Ribbon). Results.

DNA describes the genetic information for cell growth, division, and function. It can be used to diagnose the condition of an organism or a human, for example to check whether someone has a certain disease such as cancer, and it determines features of the human body such as height, eye color, the shape of the nose, hair, skin color, and gender.

DNA is organized into chromosomes, which contain genes, which are in turn built from four nucleotide bases: Adenine (A), Guanine (G), Cytosine (C), and Thymine (T).

Gene structure

The FASTA format is a text-based format used to represent biological sequences such as DNA.
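As an illustration, a minimal C++ sketch of reading a single FASTA record (a ">" header line followed by possibly wrapped sequence lines). The record text and names here are hypothetical, and real tools also handle multiple records and validation:

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Minimal FASTA record parser: a header line starting with '>' followed by
// one or more sequence lines. Returns the concatenated sequence; the header
// text (without '>') is written to the output parameter.
std::string parse_fasta(const std::string& text, std::string& header) {
    std::istringstream in(text);
    std::string line, seq;
    while (std::getline(in, line)) {
        if (!line.empty() && line[0] == '>')
            header = line.substr(1);  // description line
        else
            seq += line;              // sequence may be wrapped over lines
    }
    return seq;
}
```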

DNA Overview. Sequence Alignment. Problem & Previous Solutions. GPU & CUDA. Implemented Solution. GUI (Ribbon). Results.

Biological sequences develop from preexisting sequences rather than being invented by nature from scratch. Three types of changes can occur at any given position within a sequence: point mutations, insertions, and deletions. In an alignment, two identical characters produce a match, two different non-blank characters produce a mismatch, and a blank is called an indel (insertion/deletion) or gap.

Global sequence alignment: the Needleman-Wunsch algorithm. Local sequence alignment: the Smith-Waterman algorithm.
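For reference, a minimal serial Smith-Waterman scoring sketch; the scores (match +2, mismatch -1, gap -1) are illustrative choices, not necessarily the parameters used in this project:

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Serial Smith-Waterman: returns the best local-alignment score of a and b.
int smith_waterman(const std::string& a, const std::string& b,
                   int match = 2, int mismatch = -1, int gap = -1) {
    size_t n = a.size(), m = b.size();
    std::vector<std::vector<int>> H(n + 1, std::vector<int>(m + 1, 0));
    int best = 0;
    for (size_t i = 1; i <= n; ++i)
        for (size_t j = 1; j <= m; ++j) {
            int s = (a[i - 1] == b[j - 1]) ? match : mismatch;
            H[i][j] = std::max({0,
                                H[i - 1][j - 1] + s,  // match / mismatch
                                H[i - 1][j] + gap,    // deletion (gap in b)
                                H[i][j - 1] + gap});  // insertion (gap in a)
            best = std::max(best, H[i][j]);           // local: best cell anywhere
        }
    return best;
}
```

The zero floor in the max is what makes the alignment local; dropping it (and tracking the bottom-right cell instead of the best cell) turns this into Needleman-Wunsch global alignment.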

DNA Overview. Sequence Alignment. Problem & Previous Solutions. GPU & CUDA. Implemented Solution. GUI (Ribbon). Results.

The computational cost is very high: the number of operations is proportional to the product of the lengths of the two sequences, i.e., the algorithm has complexity O(N×M). Previous solutions: FPGAs (high cost, not suitable for all users) and approximate algorithms (less accurate). Current solution: parallelization on graphics cards.

DNA Overview. Sequence Alignment. Problem & Previous Solutions. GPU & CUDA. Implemented Solution. GUI (Ribbon). Results.

GPU (Graphics Processing Unit) The GPU is viewed as a compute device operating as a coprocessor to the main CPU (the host). The CPU and GPU are separate devices with separate memories.

CUDA (Compute Unified Device Architecture) CUDA is NVIDIA's scalable parallel programming model and a software environment for parallel computing. Language: CUDA C, a minor extension of C/C++. It is a heterogeneous serial-parallel programming model.

CUDA A CUDA program = serial code + parallel kernels (all in CUDA C). Serial C code executes in a host thread (CPU thread); parallel kernel code executes in many device threads (GPU threads).

CUDA Architecture Blocks and grids may be 1-D, 2-D, or 3-D. Built-in variables: gridDim, blockIdx, blockDim, threadIdx. Threads and blocks have unique IDs.

CUDA Kernels A kernel is a function executed on the CUDA device. Threads are grouped into warps of 32 threads; warps are grouped into thread blocks; thread blocks are grouped into grids. Each kernel has access to built-in variables that define its position: threadIdx.x, blockIdx.x, blockDim.x, and gridDim.x.
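The indexing scheme above can be simulated on the CPU: each (block, thread) pair yields the unique global index blockIdx.x * blockDim.x + threadIdx.x. A plain C++ sketch of that mapping:

```cpp
#include <cassert>
#include <vector>

// CPU simulation of CUDA's 1-D thread indexing: every (block, thread) pair
// maps to the unique global index  blockIdx.x * blockDim.x + threadIdx.x.
std::vector<int> global_indices(int gridDim, int blockDim) {
    std::vector<int> ids;
    for (int blockIdx = 0; blockIdx < gridDim; ++blockIdx)          // blocks in the grid
        for (int threadIdx = 0; threadIdx < blockDim; ++threadIdx)  // threads in a block
            ids.push_back(blockIdx * blockDim + threadIdx);
    return ids;
}
```

On the GPU all these (block, thread) pairs execute concurrently; the nested loops here only enumerate the same index space serially.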

Kernel Call Syntax Kernels are called with the <<< >>> syntax: Function_name<<<Dg, Db>>>(arg[1], arg[2], …), where Dg is the dimensions of the grid (type dim3) and Db is the dimensions of the block (type dim3).

Function Type Qualifiers The kernel was defined as __global__. This specifies that the function runs on the device and is callable from the host only. __device__ and __host__ are other available qualifiers. __device__ - executed on device, callable only from device. __host__ - default if not specified. Executed on host, callable from host only.

CUDA Programming: Basic Steps 1. Transfer data from the CPU to the GPU. 2. Explicitly call the designed GPU kernel; CUDA implicitly assigns threads to each multiprocessor and allocates resources for the computations. 3. Transfer the results back from the GPU to the CPU.

DNA Overview. Sequence Alignment. Problem & Previous Solutions. GPU & CUDA. Implemented Solution. GUI (Ribbon). Results.

Parallelization The sequence alignment algorithm consumes a large amount of processing time, so we exploit the parallelization capabilities of GPUs: parallelization = performance. There are two levels of parallelization. Level 1: parallelizing the database comparison; assume 14 sequences in the database.
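The level-1 idea can be sketched in plain C++: every database sequence is scored against the query independently of the others, which is what allows the GPU to process them concurrently. The compare function here is a hypothetical stand-in for the full alignment:

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Stand-in for one sequence comparison (the real project runs a full
// alignment here); counts positions where the two strings agree.
int compare(const std::string& query, const std::string& db_seq) {
    int score = 0;
    for (size_t i = 0; i < std::min(query.size(), db_seq.size()); ++i)
        if (query[i] == db_seq[i]) ++score;
    return score;
}

// Level-1 parallelism: each iteration writes only scores[i] and reads no
// other iteration's result, so every database comparison is independent.
// On the GPU, each comparison can therefore run in its own block/thread.
std::vector<int> score_database(const std::string& query,
                                const std::vector<std::string>& db) {
    std::vector<int> scores(db.size());
    for (size_t i = 0; i < db.size(); ++i)
        scores[i] = compare(query, db[i]);
    return scores;
}
```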

Parallelization Level 2: parallelization inside a single sequence comparison. 1. Initializing the data matrix and pointers.

PARALLELIZATION

Parallelization There is a data dependency in the calculation steps: each matrix cell depends on its left, upper, and upper-left neighbors, so only cells on the same anti-diagonal can be computed in parallel.
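Because every cell depends only on cells from earlier anti-diagonals, the matrix can be swept one anti-diagonal (wavefront) at a time, with all cells of a wavefront computable in parallel by separate GPU threads. A serial C++ sketch that keeps this loop order (the match/mismatch/gap scores are illustrative):

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Smith-Waterman scored along anti-diagonals: H(i,j) depends only on
// H(i-1,j-1), H(i-1,j), and H(i,j-1), which all lie on earlier
// anti-diagonals, so every cell of one anti-diagonal is independent.
int sw_wavefront(const std::string& a, const std::string& b) {
    const int match = 2, mismatch = -1, gap = -1;  // illustrative scores
    size_t n = a.size(), m = b.size();
    std::vector<std::vector<int>> H(n + 1, std::vector<int>(m + 1, 0));
    int best = 0;
    for (size_t d = 2; d <= n + m; ++d) {       // one wavefront at a time
        for (size_t i = 1; i <= n; ++i) {       // cells with i + j == d
            if (d < i + 1 || d - i > m) continue;
            size_t j = d - i;
            int s = (a[i - 1] == b[j - 1]) ? match : mismatch;
            H[i][j] = std::max({0, H[i - 1][j - 1] + s,
                                H[i - 1][j] + gap, H[i][j - 1] + gap});
            best = std::max(best, H[i][j]);
        }
    }
    return best;
}
```

The inner loop over i is the part that maps to concurrent GPU threads; only the outer loop over wavefronts must stay sequential.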

Implementation of this parallelized part

DNA Overview. Sequence Alignment. Problem & Previous Solutions. GPU & CUDA. Implemented Solution. GUI (Ribbon). Results.

Performance

Speed Up

Any questions?