Challenging Cloning Related Problems with GPU-Based Algorithms

1 Challenging Cloning Related Problems with GPU-Based Algorithms
Authors: Thierry Lavoie, Michael Eilers-Smith, Ettore Merlo
Publisher: ACM IWSC'10
Presenter: Ye-Zhi Chen
Date: 2011/12/21

2 Introduction
This paper describes an implementation of the Smith-Waterman algorithm for proper clone filtering.

3 Algorithm
The goal is to address the false-positive problem of clone detection with an appropriate filtering technique; DP-matching seemed to be an interesting choice.
[Figure: DP matrix; recoverable labels: - A B C X 1 2 3]
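
As a point of reference, here is a minimal sequential sketch of Smith-Waterman style DP-matching in Python. The scoring values (`match`, `mismatch`, `gap`) and the function name are assumptions chosen for illustration; the slides do not state the scores the authors used.

```python
def smith_waterman_score(s1, s2, match=2, mismatch=-1, gap=-1):
    """Return the best local-alignment score between s1 and s2.

    Row 0 and column 0 of D stand for the leading gap and stay at 0.
    The scoring parameters are illustrative assumptions.
    """
    rows, cols = len(s1) + 1, len(s2) + 1
    D = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if s1[i - 1] == s2[j - 1] else mismatch
            D[i][j] = max(
                0,                      # start a new local alignment
                D[i - 1][j - 1] + sub,  # upper-left neighbour: (mis)match
                D[i - 1][j] + gap,      # top neighbour: gap in s2
                D[i][j - 1] + gap,      # left neighbour: gap in s1
            )
            best = max(best, D[i][j])
    return best
```

A high score relative to the string lengths suggests a genuine clone pair, while a low score suggests a false positive to filter out.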

4 Algorithm

5 Algorithm
GPU DP-matching:
Find which cells of the matrix are free of computational dependencies, so that their values can be computed simultaneously on separate cores. It is easy to check that all cells on an anti-diagonal become free of computational dependencies at the same moment, because their values depend solely on cells of the previous anti-diagonals.
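
The anti-diagonal ordering can be sketched on the CPU as follows. This is only an illustration in Python, not the authors' GPU kernel: the inner loop over `i` is the part that a GPU would hand out to separate threads, and the scoring values are assumptions.

```python
def _cell(D, s1, s2, i, j, match=2, mismatch=-1, gap=-1):
    # Value of one cell from its upper-left, top and left neighbours.
    sub = match if s1[i - 1] == s2[j - 1] else mismatch
    return max(0, D[i - 1][j - 1] + sub, D[i - 1][j] + gap, D[i][j - 1] + gap)


def fill_row_major(s1, s2):
    """Classical row-by-row fill, used here as the reference."""
    rows, cols = len(s1) + 1, len(s2) + 1
    D = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        for j in range(1, cols):
            D[i][j] = _cell(D, s1, s2, i, j)
    return D


def fill_by_antidiagonals(s1, s2):
    """Fill the same matrix one anti-diagonal (i + j == k) at a time."""
    rows, cols = len(s1) + 1, len(s2) + 1
    D = [[0] * cols for _ in range(rows)]
    for k in range(2, rows + cols - 1):
        i_lo = max(1, k - (cols - 1))
        i_hi = min(rows - 1, k - 1)
        # Cells in this loop only read anti-diagonals k-1 and k-2,
        # so they are independent of each other.
        for i in range(i_lo, i_hi + 1):
            D[i][k - i] = _cell(D, s1, s2, i, k - i)
    return D
```

Both functions produce the same matrix; only the visiting order differs.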

6 Algorithm
Let Vk represent the linear buffer computed at step k. Let fk be the following map between the indexes of V and those of the matrix D:
[Equation not recoverable from the transcript]
u can be seen as the thread index. The first character of both s1 and s2 is a gap.

7 Algorithm
[Figure: DP matrix; recoverable labels: - A B C X 1 2 3]

8 The characters that are compared
[Figure labels: top, left, upper left]

9 Algorithm
Worst-case problem:
The classical DP-matching algorithm has a quadratic worst-case running time. In the general worst case, the GPU-based implementation also has a quadratic running time. However, since a large number of cores perform the computation at the same time, the hidden constant of the quadratic term can be divided by a large factor.
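
To make the effect of the core count concrete, the sketch below counts sequential steps under a simplified model, assumed here for illustration only: every anti-diagonal of an m × n matrix is processed in batches of at most `cores` cells, and memory costs are ignored.

```python
import math


def sequential_steps(m, n, cores):
    """Count sequential steps to fill an m x n matrix by anti-diagonals.

    Simplified model (an assumption): one step computes up to `cores`
    cells of the current anti-diagonal; memory traffic is ignored.
    """
    steps = 0
    for k in range(m + n - 1):              # anti-diagonal index, i + j == k
        lo = max(0, k - (n - 1))            # smallest valid row index
        hi = min(k, m - 1)                  # largest valid row index
        length = hi - lo + 1                # cells on this anti-diagonal
        steps += math.ceil(length / cores)
    return steps
```

With one core the count equals m · n; once `cores` reaches the longest anti-diagonal it drops to m + n − 1. For core counts that stay fixed while the strings grow, the count remains quadratic but with a smaller constant, which matches the statement above.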

10 Algorithm
On very small instances of DP-matching problems, the CPU might outrun the GPU, mostly because of memory bandwidth limitations. If such very small instances arise from matching one string against a set of strings, the data can be packed on the GPU to make the total computation more efficient.

11 Algorithm
Let C be a set of strings and let c0 be an element of C. Define C' as C' = C − {c0}. The problem is then to match c0 against every ci in C'. Practical implementations need to pad the strings to be matched, which forces the number of computational steps k to be the same in each sub-matrix. The padding length p of a string ci is defined as:
p = max(len(cj) | cj ∈ C) − len(ci)
The padded strings ci of C' are then concatenated, separated by a special blank character.
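
The packing step can be sketched in Python as follows. The function name, the pad and blank characters, and the choice to append the padding at the end of each string are assumptions made for illustration; the pad length is taken as the non-negative difference between the longest string in C and len(ci).

```python
def pack_strings(c0, others, pad="-", blank="#"):
    """Pad every string of C' = `others` to the longest length in C
    (C includes c0) and join them with a separator character.

    `pad` and `blank` are hypothetical placeholder characters.
    """
    width = max(len(c) for c in [c0, *others])
    padded = [c + pad * (width - len(c)) for c in others]
    return blank.join(padded)
```

For example, `pack_strings("abcd", ["ab", "xyz"])` pads both strings to width 4 and yields one buffer in which every block has the same length.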

12 The initial value of k is not 0; it is
(|C'| − 1) ∗ (max(len(ci) | ci ∈ C) + 1)
The number of computational steps k is reduced to
2 ∗ max(len(ci) | ci ∈ C) − 1

13

14 The indexes γ corresponding to these cells can be evaluated with this equation:
γ = x ∗ (max(len(ci) | ci ∈ C) + 1) ∀ x ∈ {0..|C| − 1}
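
Read literally, the equation gives one index per string, spaced by the longest string length plus one. A minimal Python sketch, with a hypothetical function name:

```python
def gamma_indexes(strings):
    """Evaluate gamma = x * (max(len(ci)) + 1) for x in 0..|C|-1.

    `strings` plays the role of the set C in the equation.
    """
    stride = max(len(c) for c in strings) + 1
    return [x * stride for x in range(len(strings))]
```

With strings of lengths 2, 3 and 4, the stride is 5 and the indexes are 0, 5 and 10.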

15 Experimental equipment
Intel Core 2 Duo computer at 3.00 GHz with 6 MB of cache, 3 GB of RAM and a GeForce 8800 GT

