
A Hardware Accelerator for the Fast Retrieval of DIALIGN Biological Sequence Alignments in Linear Space Author: Azzedine Boukerche, Jan M. Correa, Alba.


1 A Hardware Accelerator for the Fast Retrieval of DIALIGN Biological Sequence Alignments in Linear Space
Authors: Azzedine Boukerche, Jan M. Correa, Alba C. M. A. Melo, and Ricardo P. Jacobi
Publisher: IEEE Transactions on Computers, 2010
Presenter: Chin-Chung Pan
Date: 2011/04/20

2 Outline
• Introduction
• The DIALIGN Algorithm
• Related Work
• Design of the FPGA-Based Architectures
• The DIALIGN-Score Architecture
• Executing DIALIGN in Linear Space
• The DIALIGN-Alignment Architecture
• Experimental Results

3 Introduction
• Smith-Waterman (SW) is the most widely used exact method for locally aligning two sequences, and it is very accurate if the sequences have a single common region of high similarity. However, if the sequences share more than one region of high similarity, SW is not very effective.
• DIALIGN can be used for either local or global alignment, and for either pairwise or multiple sequence alignment. One drawback of DIALIGN is that it is slower than SW. To overcome this, alternatives have been proposed that run DIALIGN in parallel and combine it with a fast local similarity search tool.
• We propose and evaluate two FPGA-based accelerators that execute DIALIGN in linear space: one to obtain the optimal DIALIGN score and one to retrieve the DIALIGN alignment.

4 The DIALIGN Algorithm

5
• DIALIGN (DIAgonal ALIGNment) is a sequence alignment method that searches for gap-free fragments (also called diagonals) and aligns them.
• For each DIALIGN pairwise alignment, the relevance of each diagonal found must be calculated before attempting to align it. This is done through the equation E(l, m) = -ln(P(l, m)), where P(l, m) is the probability that a diagonal D of length l has at least m matches.
• Weighting the significance of diagonals: with p denoting the probability of a match at a single position, one may assume p = 0.25 for nucleic acid sequences and p = 0.05 for proteins.
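The weight equation above can be sketched in Python. This is a minimal illustration, not the authors' implementation. It assumes that P(l, m) is the binomial tail used in the original DIALIGN formulation, P(l, m) = sum over i = m..l of C(l, i) p^i (1 - p)^(l - i), which the slide itself does not spell out. The function name `diagonal_weight` is hypothetical.

```python
import math


def diagonal_weight(l, m, p=0.25):
    """Return E(l, m) = -ln(P(l, m)) for a diagonal of length l with m matches.

    Assumption: P(l, m) is the probability of at least m matches in l
    independent positions, each matching with probability p (binomial tail).
    """
    tail = sum(
        math.comb(l, i) * p**i * (1 - p) ** (l - i)
        for i in range(m, l + 1)
    )
    return -math.log(tail)


if __name__ == "__main__":
    # Nucleic acid default p = 0.25, as suggested on the slide.
    print(round(diagonal_weight(1, 1), 3))
    print(round(diagonal_weight(2, 1), 3))
```

Under this assumption and p = 0.25, a single matching pair gets a weight of about 1.386 and a length-2 diagonal with one match gets about 0.827, the same values that appear in the X = CTG, Y = CG example.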

6 The DIALIGN Algorithm
• For every pair of positions (i, j) with 1 ≤ i ≤ L_1 and 1 ≤ j ≤ L_2, one considers all integers k ≥ 0 with k ≤ min(i-1, j-1) for which the diagonal (X_{i-k}, Y_{j-k}; ...; X_i, Y_j) from (i-k, j-k) to (i, j) has a positive weight.
• Next, for every pair (i, j) as above, one defines a value "score(i, j)", which is the score of a maximum alignment of the prefixes (X_1, ..., X_i) and (Y_1, ..., Y_j).
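The definition of score(i, j) can be sketched as a plain quadratic-space dynamic program. Several things here are assumptions that the slides do not state: the weight is the binomial-tail form of -ln(P(l, m)), and score(i, j) is taken as the maximum of score(i-1, j), score(i, j-1), and score(i-k-1, j-k-1) plus the weight of each positive-weight diagonal ending at (i, j), as in the original DIALIGN description. The names `weight` and `dialign_score` are hypothetical, and this sketch is neither the linear-space variant nor the hardware design.

```python
import math


def weight(l, m, p=0.25):
    """-ln of the probability of at least m matches in a diagonal of length l."""
    tail = sum(math.comb(l, i) * p**i * (1 - p) ** (l - i) for i in range(m, l + 1))
    return -math.log(tail)


def dialign_score(x, y, p=0.25):
    """Score of a maximum chain of gap-free diagonals for sequences x and y."""
    n, m = len(x), len(y)
    # score[i][j] holds the best score for prefixes x[:i] and y[:j].
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Option 1: no diagonal ends exactly at (i, j).
            best = max(score[i - 1][j], score[i][j - 1])
            matches = 0
            # Option 2: a diagonal of length k + 1 ends at (i, j).
            for k in range(min(i, j)):
                if x[i - 1 - k] == y[j - 1 - k]:
                    matches += 1
                w = weight(k + 1, matches, p)
                if w > 0:  # only positive-weight diagonals are considered
                    best = max(best, score[i - 1 - k][j - 1 - k] + w)
            score[i][j] = best
    return score[n][m]


if __name__ == "__main__":
    print(round(dialign_score("CTG", "CG"), 3))
```

For X = CTG and Y = CG this returns about 2.772, which is the sum of the two single-match weights for C/C and G/G.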

7 The DIALIGN Algorithm
• The last fragment D_k that is aligned in position (i, j) is recovered by the function prec(i, j) = D_k. For each fragment D_k aligned in position (i, j), prec(i, j) chooses the chain of fragments with the greatest score to date.

8 The DIALIGN Algorithm
• X = C T G, Y = C G.

Possible diagonals for every position (rows: X = C, T, G; columns: Y = C, G)
  C: {C, C}  {C, G}
  T: {T, C}  {T, G} {CT, CG}
  G: {G, C}  {G, G} {TG, CG}

Diagonal weights of each position
  C: 1.386  0.288
  T: 0.288  0.827
  G: 0.288  1.386 0.827

Scores at each position
  C: 1.386
  T: 1.674  0.827
  G: 1.386  2.772 0.827

Result:
  CTG
  C─G

9 Related Work

10 Design of the FPGA-Based Architectures
• In the case of DIALIGN, the recurrence relations are more complex and involve a set of conditional statements. For this reason, the time each processing element (PE) needs to complete its operations can vary greatly.
• We propose the use of wavefront array processors instead of systolic arrays for our FPGA-based architectures that execute DIALIGN.
• We claim that wavefront array processors are better suited to our problem, since communication between processing elements is asynchronous and occurs exactly when output data are available.
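As a software illustration of the data dependencies that make an array of PEs possible, the cells of the score matrix can be grouped by anti-diagonal: every cell with the same i + j depends only on cells with a smaller i + j, so the cells in one group could be computed concurrently. This sketch only shows that ordering. It does not model the asynchronous handshaking of a wavefront array or the authors' architecture, and the name `wavefront_order` is hypothetical.

```python
def wavefront_order(rows, cols):
    """Yield one list of 1-based (i, j) cells per anti-diagonal of a rows x cols matrix."""
    for d in range(2, rows + cols + 1):
        first_i = max(1, d - cols)
        last_i = min(rows, d - 1)
        yield [(i, d - i) for i in range(first_i, last_i + 1)]


if __name__ == "__main__":
    # 3 x 2 matrix, the shape of the X = CTG, Y = CG example.
    for step, cells in enumerate(wavefront_order(3, 2), start=1):
        print(step, cells)
```

In a synchronous systolic array each of these steps would take one fixed clock period, whereas the slide argues that data-driven communication suits DIALIGN better because the work per cell varies.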

11 Design of the FPGA-Based Architectures
• Wavefront array processors

12 The DIALIGN-Score Architecture

13 Executing DIALIGN in Linear Space
• For each column j, only the maximum score, the row where it occurs, and the ending position of the fragment that precedes the highest-scoring fragment in that column are stored.
• The area that comprises rows 1 to 15 and columns 1 to 17 needs to be reprocessed.

14 The DIALIGN-Alignment Architecture

15 Experimental Results - Dataset

16 Experimental Results - Results for DIALIGN-Score

17 Experimental Results - Results for DIALIGN-Alignment

