One Flip per Clock Cycle Martin Henz, Edgar Tan, Roland Yap.

1 One Flip per Clock Cycle Martin Henz, Edgar Tan, Roland Yap

2 SAT Problems
Find an assignment of n variables that satisfies all m clauses (disjunctions of literals of variables).
Notation:
V: array of boolean values; V[3] is the value of the third variable in assignment V
EVAL_i(V): evaluation function of clause i; returns the boolean value resulting from evaluating clause i under assignment V
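
The notation above can be made concrete with a small sketch. The encoding is an assumption, not something the slides specify: a clause is a list of non-zero integers in DIMACS style, and the names `eval_clause` and `satisfies` are hypothetical.

```python
from typing import List

# Assumed encoding (not from the slides): literal k means variable k,
# literal -k means the negation of variable k.
Clause = List[int]

def eval_clause(clause: Clause, V: List[bool]) -> bool:
    """EVAL_i(V): true if at least one literal of the clause holds under V.

    V is 1-indexed to match the slides, so V[0] is an unused placeholder.
    """
    return any(V[abs(lit)] == (lit > 0) for lit in clause)

def satisfies(cnf: List[Clause], V: List[bool]) -> bool:
    """True if V satisfies all m clauses of the formula."""
    return all(eval_clause(clause, V) for clause in cnf)

# (x1 or not x2) and (x2 or x3)
cnf = [[1, -2], [2, 3]]
V = [False, True, False, True]  # V[1] = True, V[2] = False, V[3] = True
print(satisfies(cnf, V))
```

Keeping index 0 unused costs one list slot but lets V[3] mean the third variable, exactly as in the slide.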

3 GenSAT
procedure GenSAT(cnf, maxtries, maxflips)
  for i = 1 to maxtries do
    INITASSIGN(V);
    for j = 1 to maxflips do
      if V satisfies cnf then return V
      else
        f = CHOOSEFLIP();
        V := V with variable f flipped
    end
  end
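
As a rough software counterpart, the GenSAT loop might look as follows in Python. This is a sketch under assumptions: the DIMACS-style clause encoding, the extra parameters `n` and `rng`, and the placeholder strategy `random_flip` are all hypothetical and do not come from the slides.

```python
import random

def satisfies(cnf, V):
    # A clause holds if any literal agrees with V (1-indexed assignment).
    return all(any(V[abs(l)] == (l > 0) for l in c) for c in cnf)

def gensat(cnf, n, maxtries, maxflips, choose_flip, rng):
    """Generic local search: restart maxtries times, flip at most maxflips times."""
    for _ in range(maxtries):
        # INITASSIGN: random assignment; V[0] is an unused placeholder.
        V = [False] + [rng.random() < 0.5 for _ in range(n)]
        for _ in range(maxflips):
            if satisfies(cnf, V):
                return V
            f = choose_flip(cnf, V, rng)
            V[f] = not V[f]  # V := V with variable f flipped
    return None  # no satisfying assignment found within the budget

def random_flip(cnf, V, rng):
    # Placeholder CHOOSEFLIP (assumption): pick any variable uniformly.
    return rng.randrange(1, len(V))

print(gensat([[1], [2]], 2, 50, 50, random_flip, random.Random(0)))
```

Passing `choose_flip` as a parameter mirrors the way the GenSAT family differs only in CHOOSEFLIP.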

4 Instances of GenSAT
GSAT: CHOOSEFLIP randomly chooses a flip that produces maximal score
WSAT: CHOOSEFLIP randomly chooses a violated clause, and randomly chooses among the variables of that clause a flip that produces maximal score
GWSAT: choose randomly whether to do GSAT flip or WSAT flip
GSAT/Tabu: prevent quick flipping back
HSAT: use history for tie breaking: choose least recently flipped variable
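
The GSAT and WSAT choices described above could be sketched like this. The score is taken to be the number of satisfied clauses after the flip, which is an assumption about what "score" means here; all function names are hypothetical.

```python
import random

def clause_sat(clause, V):
    return any(V[abs(l)] == (l > 0) for l in clause)

def score_after_flip(cnf, V, i):
    # Number of satisfied clauses if variable i were flipped.
    V[i] = not V[i]
    s = sum(clause_sat(c, V) for c in cnf)
    V[i] = not V[i]  # undo the trial flip
    return s

def best_of(cnf, V, candidates, rng):
    # Random choice among the candidates with maximal score.
    scores = {i: score_after_flip(cnf, V, i) for i in candidates}
    top = max(scores.values())
    return rng.choice([i for i in candidates if scores[i] == top])

def gsat_flip(cnf, V, rng):
    return best_of(cnf, V, list(range(1, len(V))), rng)

def wsat_flip(cnf, V, rng):
    # Only meaningful while at least one clause is violated.
    violated = [c for c in cnf if not clause_sat(c, V)]
    clause = rng.choice(violated)
    return best_of(cnf, V, sorted({abs(l) for l in clause}), rng)

cnf = [[1, 2], [-1, 2]]
V = [False, False, False, False]  # three variables, all false
print(gsat_flip(cnf, V, random.Random(0)), wsat_flip(cnf, V, random.Random(0)))
```

In this example flipping variable 2 satisfies both clauses, so both strategies pick it regardless of the random seed.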

5 FPGAs
ASICs: application-specific integrated circuits
– customer describes logic behavior in a hardware description language such as VHDL
– vendor designs and produces integrated circuit with this behavior
Masked gate arrays
– ASIC with transistors arranged in a grid-like manner
– initially unconnected; mass produced
– add final conductor layers for connecting components
FPGAs: field programmable gate arrays

6 Current Line of FPGAs: Example Xilinx XCV1000
4 MBytes on-board RAM
max clock rate 300 MHz
max clock rate using on-board RAM 33 MHz
6144 CLBs (configurable logic blocks)
roughly 1M system gates
1 Mbit of distributed RAM
each CLB is divided into 2 slices
thus 12,288 slices available

7 Programming FPGAs
Massively parallel computer with random access memory
Instructions are compiled into hardware; no runtime stacks; no functions; no recursion…
In practice, hardware description languages like VHDL are used to program FPGAs
Newer development: Handel C

8 NESL-like Syntax for Parallelism
P                   gates for P              depth of P
x := y + z          g(P) = O(1)              d(P) = O(1)
Q ; R               g(P) = g(Q) + g(R)       d(P) = d(Q) + d(R)
{ e(i) : i ∈ S }    g(P) = Σ_i g(e(i))       d(P) = max_i d(e(i))
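
The cost rules can be turned into a tiny calculator. This sketch reads the sequential rule as adding both gates and depth, treats every O(1) as exactly 1, and uses hypothetical names (`Cost`, `assign`, `seq`, `par`).

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Cost:
    gates: int  # g(P): amount of hardware
    depth: int  # d(P): length of the longest dependency chain

def assign() -> Cost:
    # x := y + z, counted as one gate and one level (O(1) taken as 1).
    return Cost(1, 1)

def seq(q: Cost, r: Cost) -> Cost:
    # Q ; R: hardware adds up, and R has to wait for Q.
    return Cost(q.gates + r.gates, q.depth + r.depth)

def par(costs) -> Cost:
    # { e(i) : i in S }: all copies exist side by side and run together.
    costs = list(costs)
    return Cost(sum(c.gates for c in costs), max(c.depth for c in costs))

row = par(assign() for _ in range(8))
print(row, seq(row, row))
```

Eight parallel assignments cost eight gates but only one level of depth; running two such rows in sequence doubles both figures.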

9 Example
Let S be an array of statically known size n, where n is a power of 2.
macro SUM(S, n):
  if n = 1 then S[0]
  else SUM({ S[2i] + S[2i + 1] : i ∈ [0..n/2-1] }, n/2)
g(SUM(S, n)) = O(n)
d(SUM(S, n)) = O(log n)
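
A short Python sketch of the same pairwise reduction, counting adders and levels along the way, illustrates where O(n) and O(log n) come from. The function name `tree_sum` and the returned triple are hypothetical.

```python
def tree_sum(S):
    """Pairwise reduction of S; returns (total, adders used, levels used).

    len(S) must be a power of 2, as on the slide.
    """
    n = len(S)
    assert n >= 1 and n & (n - 1) == 0, "n must be a power of 2"
    gates = depth = 0
    while len(S) > 1:
        # One level of the adder tree: all pairs are added in parallel.
        S = [S[2 * i] + S[2 * i + 1] for i in range(len(S) // 2)]
        gates += len(S)  # one adder per pair
        depth += 1
    return S[0], gates, depth

print(tree_sum(list(range(8))))  # → (28, 7, 3)
```

For n = 8 the tree uses 4 + 2 + 1 = 7 adders on 3 levels, that is n - 1 gates and log2(n) depth.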

10 Previous GSAT/FPGA Work
Hamadi/Merceron: first non-software design of a local search algorithm; CP 97
Yung/Seung/Lee/Leong: runtime reconfigurable version of Hamadi/Merceron work; first implementation; Conference on Field-Programmable Logic and Applications, 1999

11 Naïve Parallel GSAT (Ham/Merc)
macro CHOOSEFLIP(f):
  max := -1; f := -1;
  for i = 1 to n do
    score := SUM({ EVAL_j(V[¬V[i]/i]) : j ∈ [1…m] });
    if score > max ∨ (score = max ∧ RANDOMBIT()) then
      max := score; f := i
  end
g(CHOOSEFLIP(f)) = O(n m)
d(CHOOSEFLIP(f)) = n * (O(log m) + O(log n)) = O(n log m)

12 Step 1: Naïve Random GSAT
macro CHOOSEFLIP(f):
  max := -1; f := -1; MaxV := { 0 : k ∈ [1…n] };
  for i = 1 to n do
    score := SUM({ EVAL_j(V[¬V[i]/i]) : j ∈ [1…m] });
    if score > max then
      max := score; MaxV := { 0 : k ∈ [1…n] }[1/i]
    else if score = max then
      MaxV := MaxV[1/i]
  end
  f := CHOOSE_ONE(MaxV)
g and d are unchanged; d(CHOOSE_ONE) = O(log n), g(CHOOSE_ONE) = O(n)
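
The MaxV bookkeeping can be followed in a sequential Python sketch. CHOOSE_ONE is modelled here as a uniform random pick among the marked positions, which is an assumption; the slide gives only its cost. The function names are hypothetical.

```python
import random

def count_sat(cnf, V):
    return sum(any(V[abs(l)] == (l > 0) for l in c) for c in cnf)

def choose_flip_maxv(cnf, V, rng):
    """Return (f, MaxV): the chosen variable and the mask of best-scoring ones."""
    n = len(V) - 1
    best = -1
    max_v = [0] * (n + 1)  # index 0 unused, as V is 1-indexed
    for i in range(1, n + 1):
        V[i] = not V[i]  # evaluate under V[not V[i] / i]
        score = count_sat(cnf, V)
        V[i] = not V[i]
        if score > best:
            best = score
            max_v = [0] * (n + 1)  # new maximum: reset the mask
            max_v[i] = 1
        elif score == best:
            max_v[i] = 1  # tie: add i to the mask
    candidates = [i for i in range(1, n + 1) if max_v[i]]
    return rng.choice(candidates), max_v  # CHOOSE_ONE (assumed uniform)

cnf = [[1, 2], [-1, 2], [3]]
V = [False, False, False, False]
print(choose_flip_maxv(cnf, V, random.Random(0)))
```

Collecting ties in a mask and drawing once at the end gives every best variable the same chance, unlike a per-iteration random bit, which favours later variables.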

13 Step 2: Parallel Variable Scoring
macro CHOOSEFLIP(f):
  Scores := { SUM({ EVAL_j(V[¬V[i]/i]) : j ∈ [1…m] }) : i ∈ [1…n] };
  f := CHOOSE_MAX(Scores);
d(CHOOSEFLIP(f)) = O(log m + log n) = O(log m)
g(CHOOSEFLIP(f)) = O(m n^2)

14 Step 3: Relative Scoring
Selman/Levesque/Mitchell use a technique of relative scoring in their implementation.
First thorough analysis of relative scoring in Hoos' Diplomarbeit (diploma thesis).
Idea: After every flip, update the score of those variables that are affected by the flip. Since clauses are small, the number of affected variables is much smaller than the overall number of variables.

15 Some Notation
NCl[i] is the number of clauses that contain the variable i
MaxClauses = max_i NCl[i]; usually MaxClauses << m
MaxVariables = max_j (number of vars in clause j)
EVAL_j^{C(i)} evaluates the j-th clause from the set of clauses that contain the variable i

16 Relative Scoring
macro CHOOSE_FLIP(f):
  NewS := { SUM({ EVAL_j^{C(i)}(V[¬V[i]/i]) : j ∈ [1…NCl[i]] }) : i ∈ [1…n] };
  OldS := { SUM({ EVAL_j^{C(i)}(V) : j ∈ [1…NCl[i]] }) : i ∈ [1…n] };
  Diff := { NewS[i] - OldS[i] : i ∈ [1…n] };
  f := CHOOSE_MAX(Diff)
g(CHOOSE_FLIP(f)) = O(MaxVariables MaxClauses n)
d(CHOOSE_FLIP(f)) = O(log MaxClauses + log MaxVariables)
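
The following sequential sketch computes the same Diff values in Python. Only the clauses that contain variable i are evaluated for position i, which is the point of relative scoring. The helper names `occurrences` and `diffs` and the DIMACS-style encoding are assumptions.

```python
def sat(clause, V):
    return any(V[abs(l)] == (l > 0) for l in clause)

def occurrences(cnf, n):
    """occ[i] = C(i), the clauses containing variable i; len(occ[i]) = NCl[i]."""
    occ = [[] for _ in range(n + 1)]
    for clause in cnf:
        for v in {abs(l) for l in clause}:
            occ[v].append(clause)
    return occ

def diffs(cnf, V):
    """Diff[i] = NewS[i] - OldS[i], evaluated over C(i) only."""
    n = len(V) - 1
    occ = occurrences(cnf, n)
    diff = [0] * (n + 1)  # index 0 unused
    for i in range(1, n + 1):
        old_s = sum(sat(c, V) for c in occ[i])
        V[i] = not V[i]  # evaluate under V[not V[i] / i]
        new_s = sum(sat(c, V) for c in occ[i])
        V[i] = not V[i]
        diff[i] = new_s - old_s
    return diff

cnf = [[1, 2], [-1, 2], [3]]
V = [False, False, False, False]
print(diffs(cnf, V))  # → [0, 0, 1, 1]
```

Clauses that do not mention variable i evaluate identically before and after flipping i, so they cancel in the difference and the variable with the largest Diff is also the one with the largest full score.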

17 Step 4: Pipelining
procedure GenSAT(cnf, maxtries, maxflips)
  for i = 1 to maxtries do
    INITASSIGN(V);
    for j = 1 to maxflips do
      if V satisfies cnf then return V
      else
        f = CHOOSEFLIP();
        V := V with variable f flipped
    end
  end

18 Pipelining Outer Loop
macro CHOOSE_FLIP(f):
  NewS := { SUM({ EVAL_j^{C(i)}(V[¬V[i]/i]) : j ∈ [1…NCl[i]] }) : i ∈ [1…n] };
  OldS := { SUM({ EVAL_j^{C(i)}(V) : j ∈ [1…NCl[i]] }) : i ∈ [1…n] };
  Diff := { NewS[i] - OldS[i] : i ∈ [1…n] };
  f := CHOOSE_MAX(Diff)
[Figure: pipeline diagram. The macro is annotated with STAGE I to STAGE IV; staggered rows repeat S I, S II, S III, S IV for Try 1, Try 2, Try 3 and Try 4.]
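
One way to read the staggered diagram is that four independent tries share the four stages, so that every stage is busy in every clock cycle once the pipeline has filled. The sketch below prints such a schedule; this interpretation and the function name `schedule` are assumptions, not details given on the slide.

```python
def schedule(num_stages, num_cycles):
    """table[cycle][stage] = 0-based try occupying that stage, or None while filling.

    Assumes as many interleaved tries as there are stages.
    """
    table = []
    for cycle in range(num_cycles):
        row = []
        for stage in range(num_stages):
            entered = cycle - stage  # cycle in which this work item entered stage 0
            row.append(entered % num_stages if entered >= 0 else None)
        table.append(row)
    return table

for cycle, row in enumerate(schedule(4, 6)):
    cells = ["Try %d" % (t + 1) if t is not None else "  -  " for t in row]
    print("cycle %d: %s" % (cycle, " | ".join(cells)))
```

From cycle 3 onward the last stage completes one flip per cycle, each for a different try, which matches the title of the talk.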

19 Preliminary Experiments
Conducted on hill-climbing variant of GSAT
Comparing software implementation by Selman/Kautz with Hamadi/Merceron and Step 4
Software: running on Pentium II at 400 MHz
FPGA: running on Xilinx XCV1000 at 20 MHz; programmed using Handel C by Celoxica

20 Flips per Second
DIMACS Problems   Software Sel/Kau   FPGA Ham/Mer   FPGA Step 4   Speedup vs H/M
50-80-1.6         128.5 K            520 K          25 M          48
50-100-2.0        107.4 K            520 K          25 M          48
100-160-1.6       139.6 K            284 K          22 M          77.5
100-200-2.0       110.9 K            284 K          22 M          77.5

21 Flips per Slice Second
DIMACS Problems   Slices Ham/Mer   f / sl sec Ham/Mer   Slices Step 4   f / sl sec Step 4   Improvement
50-80-1.6         651              800                  1671            14950               18.7
50-100-2.0        704              740                  1697            14700               19.9
100-160-1.6       1136             250                  3154            6975                27.9
100-200-2.0       1240             230                  3186            6900                30
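
The derived columns of the two results tables can be recomputed from the raw figures. The sketch below does that in Python; the dictionary layout and function names are hypothetical, and the slide values appear to be rounded, so small deviations are expected.

```python
# Flips per second and slice counts transcribed from the two results tables.
results = {
    "50-80-1.6":   {"hm_flips": 520e3, "s4_flips": 25e6, "hm_slices": 651,  "s4_slices": 1671},
    "50-100-2.0":  {"hm_flips": 520e3, "s4_flips": 25e6, "hm_slices": 704,  "s4_slices": 1697},
    "100-160-1.6": {"hm_flips": 284e3, "s4_flips": 22e6, "hm_slices": 1136, "s4_slices": 3154},
    "100-200-2.0": {"hm_flips": 284e3, "s4_flips": 22e6, "hm_slices": 1240, "s4_slices": 3186},
}

def speedup(r):
    # Step 4 flips per second relative to Hamadi/Merceron.
    return r["s4_flips"] / r["hm_flips"]

def flips_per_slice_second(flips, slices):
    # Throughput normalised by the FPGA area used.
    return flips / slices

for name, r in results.items():
    hm = flips_per_slice_second(r["hm_flips"], r["hm_slices"])
    s4 = flips_per_slice_second(r["s4_flips"], r["s4_slices"])
    print("%-12s speedup %.1f  f/sl sec %.0f -> %.0f  improvement %.1f"
          % (name, speedup(r), hm, s4, s4 / hm))
```

Normalising by slices shows that the Step 4 design is faster per unit of hardware as well as in absolute terms, even though it occupies roughly 2.5 times as many slices.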

22 Conclusions
Fastest known one-chip implementation of GSAT, using parallel relative scoring plus pipelining
Current size and speed make it feasible to use FPGAs as platforms for parallel algorithms
FPGAs are one-chip parallel machines with serious limitations of programmability
Higher-level languages needed
Stack support needed: towards compiling parallel languages to hardware

