One Flip per Clock Cycle Martin Henz, Edgar Tan, Roland Yap.



SAT Problems
Find an assignment of n variables that satisfies all m clauses (disjunctions of literals of variables).
Notation:
  V: array of boolean values; V[3] is the value of the third variable in assignment V
  EVAL_i(V): evaluation function of clause i; returns the boolean value resulting from evaluating clause i under assignment V
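
The notation above can be sketched in Python. This is a minimal sketch, assuming a DIMACS-style encoding that the slides do not specify: a clause is a list of nonzero integers, where k stands for variable k and -k for its negation. V carries an unused entry at index 0 so that V[3] is the third variable. The names `eval_clause` and `satisfies` are hypothetical.

```python
from typing import List

Clause = List[int]  # assumed encoding: k means variable k, -k means NOT variable k


def eval_clause(clause: Clause, V: List[bool]) -> bool:
    """EVAL_i(V): a clause is true if at least one of its literals is true under V."""
    return any(V[abs(lit)] == (lit > 0) for lit in clause)


def satisfies(cnf: List[Clause], V: List[bool]) -> bool:
    """True when assignment V satisfies all m clauses."""
    return all(eval_clause(c, V) for c in cnf)


# (x1 OR NOT x2) AND (x2 OR x3); V[0] is unused padding
cnf = [[1, -2], [2, 3]]
V = [False, True, True, False]
```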

GenSAT
procedure GenSAT(cnf, maxtries, maxflips)
  for i = 1 to maxtries do
    INITASSIGN(V);
    for j = 1 to maxflips do
      if V satisfies cnf then return V
      else
        f = CHOOSEFLIP();
        V := V with variable f flipped
    end
  end
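
A runnable Python sketch of the GenSAT loop follows. It assumes the DIMACS-style clause encoding (k for variable k, -k for its negation) and a uniformly random INITASSIGN; neither is stated on the slide. The helper `flip_first_violated` is a hypothetical placeholder for CHOOSEFLIP, included only so the loop can be run.

```python
import random
from typing import List

Clause = List[int]  # assumed encoding: k is variable k, -k is its negation


def satisfies(cnf, V):
    return all(any(V[abs(l)] == (l > 0) for l in c) for c in cnf)


def flip_first_violated(cnf, V, rng):
    """Placeholder CHOOSEFLIP: first variable of the first violated clause."""
    for c in cnf:
        if not any(V[abs(l)] == (l > 0) for l in c):
            return abs(c[0])
    raise ValueError("no violated clause")


def gensat(cnf, n, maxtries, maxflips, chooseflip=flip_first_violated, rng=None):
    """Return a satisfying assignment (V[0] unused) or None if none was found."""
    rng = rng or random.Random()
    for _ in range(maxtries):
        # INITASSIGN: assumed here to be a uniformly random assignment
        V = [False] + [rng.random() < 0.5 for _ in range(n)]
        for _ in range(maxflips):
            if satisfies(cnf, V):
                return V
            f = chooseflip(cnf, V, rng)
            V[f] = not V[f]  # V := V with variable f flipped
    return None
```

As in the pseudocode, the assignment is tested before each flip, so the result of the last flip of a try is not checked.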

Instances of GenSAT
GSAT: CHOOSEFLIP randomly chooses a flip that produces maximal score
WSAT: CHOOSEFLIP randomly chooses a violated clause, and randomly chooses among the variables of that clause a flip that produces maximal score
GWSAT: choose randomly whether to do GSAT flip or WSAT flip
GSAT/Tabu: prevent quick flipping back
HSAT: use history for tie breaking: choose least recently flipped variable
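
The difference between the GSAT and WSAT choices can be sketched as follows. This is an illustrative sketch only: it assumes the score of an assignment is the number of satisfied clauses and the DIMACS-style encoding (k for variable k, -k for its negation). All function names are hypothetical.

```python
import random


def score(cnf, V):
    """Assumed score: number of clauses satisfied by V."""
    return sum(any(V[abs(l)] == (l > 0) for l in c) for c in cnf)


def score_after_flip(cnf, V, i):
    W = list(V)
    W[i] = not W[i]
    return score(cnf, W)


def best_of(cnf, V, candidates, rng):
    """Pick uniformly at random among the candidates with maximal score."""
    scores = {i: score_after_flip(cnf, V, i) for i in candidates}
    best = max(scores.values())
    return rng.choice([i for i in candidates if scores[i] == best])


def gsat_flip(cnf, V, rng):
    # GSAT: consider every variable
    return best_of(cnf, V, list(range(1, len(V))), rng)


def wsat_flip(cnf, V, rng):
    # WSAT: pick a violated clause first, then consider only its variables.
    # The caller must ensure that at least one clause is violated.
    violated = [c for c in cnf if not any(V[abs(l)] == (l > 0) for l in c)]
    clause = rng.choice(violated)
    return best_of(cnf, V, sorted({abs(l) for l in clause}), rng)
```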

FPGAs
ASICs: application-specific integrated circuits
  – customer describes logic behavior in a hardware description language such as VHDL
  – vendor designs and produces integrated circuit with this behavior
Masked gate arrays
  – ASIC with transistors arranged in a grid-like manner
  – initially unconnected; mass produced
  – add final conductor layers for connecting components
FPGAs: field programmable gate arrays

Current Line of FPGAs: Example Xilinx XCV1000
4 MBytes on-board RAM
max clock rate 300 MHz
max clock rate using on-board RAM 33 MHz
6144 CLBs (configurable logic blocks)
roughly 1M system gates
1 Mbit of distributed RAM
each CLB is divided into 2 slices, thus 12,288 slices available

Programming FPGAs
Massively parallel computer with random access memory
Instructions are compiled into hardware; no runtime stacks; no functions; no recursion…
In practice, hardware description languages like VHDL are used to program FPGAs
Newer development: Handel C

NESL-like Syntax for Parallelism

P                    gates for P               depth of P
x := y + z           g(P) = O(1)               d(P) = O(1)
Q ; R                g(P) = g(Q) + g(R)        d(P) = d(Q) + d(R)
{ e(i) : i ∈ S }     g(P) = Σ_i g(e(i))        d(P) = max_i d(e(i))

Example
Let S be an array of statically known size n, where n is a power of 2.
macro SUM(S, n):
  if n = 1 then S[0]
  else SUM({ S[2i] + S[2i + 1] : i ∈ [0..n/2-1] }, n/2)
g(SUM(S, n)) = O(n)
d(SUM(S, n)) = O(log n)
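
The SUM macro is a pairwise (tree) reduction. The Python sketch below mimics it sequentially and counts the levels of the adder tree, which makes the O(log n) depth visible. The function name `tree_sum` and the returned depth counter are illustrative additions, not part of the slides.

```python
def tree_sum(S):
    """Pairwise reduction of S; returns (sum, depth), where depth counts adder levels."""
    n = len(S)
    assert n >= 1 and n & (n - 1) == 0, "n must be a power of 2"
    depth = 0
    while len(S) > 1:
        # One level of the tree: the n/2 additions are independent of each other,
        # so hardware can perform them all at once.
        S = [S[2 * i] + S[2 * i + 1] for i in range(len(S) // 2)]
        depth += 1
    return S[0], depth
```

For n = 8 the reduction uses 4 + 2 + 1 = 7 adders in 3 levels, in line with g = O(n) and d = O(log n).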

Previous GSAT/FPGA Work
Hamadi/Merceron: first non-software design of a local search algorithm; CP 97
Yung/Seung/Lee/Leong: runtime reconfigurable version of Hamadi/Merceron work; first implementation; Conference on Field-programmable Logic and Applications, 1999

Naïve Parallel GSAT (Ham/Merc)
macro CHOOSEFLIP(f):
  max := -1; f := -1;
  for i = 1 to n do
    score := SUM({ EVAL_j(V[¬V[i]/i]) : j ∈ [1…m] });
    if score > max ∨ (score = max ∧ RANDOMBIT()) then
      max := score; f := i
  end
g(CHOOSEFLIP(f)) = O(n m)
d(CHOOSEFLIP(f)) = n * (O(log m) + O(log n)) = O(n log m)
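
A sequential Python rendering of this macro is sketched below, assuming the DIMACS-style clause encoding (k for variable k, -k for its negation) and modelling RANDOMBIT() as a fair coin. The names are hypothetical.

```python
import random


def score_after_flip(cnf, V, i):
    """Number of satisfied clauses under V[¬V[i]/i], i.e. V with variable i flipped."""
    W = list(V)
    W[i] = not W[i]
    return sum(any(W[abs(l)] == (l > 0) for l in c) for c in cnf)


def chooseflip_naive(cnf, V, rng):
    best, f = -1, -1
    for i in range(1, len(V)):  # the n candidate flips are scored one after another
        s = score_after_flip(cnf, V, i)
        # RANDOMBIT() is modelled as a fair coin (an assumption)
        if s > best or (s == best and rng.random() < 0.5):
            best, f = s, i
    return f
```

With a coin flip per tie, the choice among k tied variables is not uniform: the last tied variable wins with probability 1/2, the first with probability 1/2^(k-1).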

Step 1: Naïve Random GSAT
macro CHOOSEFLIP(f):
  max := -1; f := -1;
  MaxV := { 0 : k ∈ [1…n] };
  for i = 1 to n do
    score := SUM({ EVAL_j(V[¬V[i]/i]) : j ∈ [1…m] });
    if score > max then
      max := score; MaxV := { 0 : k ∈ [1…n] }[1/i]
    else if score = max then
      MaxV := MaxV[1/i]
  end
  f := CHOOSE_ONE(MaxV)
g and d are unchanged; d(CHOOSE_ONE) = O(log n), g(CHOOSE_ONE) = O(n)
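
The MaxV bit vector can be sketched in Python as below. The sketch assumes the DIMACS-style clause encoding (k for variable k, -k for its negation) and implements CHOOSE_ONE as a uniform choice among the set bits; the slide gives only its cost, not its implementation. The names are hypothetical.

```python
import random


def score_after_flip(cnf, V, i):
    W = list(V)
    W[i] = not W[i]
    return sum(any(W[abs(l)] == (l > 0) for l in c) for c in cnf)


def tied_mask(cnf, V):
    """MaxV: entry i is 1 exactly when flipping variable i gives the maximal score."""
    n = len(V) - 1
    best = -1
    max_v = [0] * (n + 1)  # index 0 is unused padding
    for i in range(1, n + 1):
        s = score_after_flip(cnf, V, i)
        if s > best:
            best = s
            max_v = [0] * (n + 1)  # reset: a strictly better score was found
            max_v[i] = 1
        elif s == best:
            max_v[i] = 1
    return max_v


def choose_one(max_v, rng):
    """CHOOSE_ONE, assumed here to pick uniformly among the set bits."""
    return rng.choice([i for i, bit in enumerate(max_v) if bit])
```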

Step 2: Parallel Variable Scoring
macro CHOOSEFLIP(f):
  Scores := { SUM({ EVAL_j(V[¬V[i]/i]) : j ∈ [1…m] }) : i ∈ [1…n] };
  f := CHOOSE_MAX(Scores);
d(CHOOSEFLIP(f)) = O(log m + log n) = O(log m)
g(CHOOSEFLIP(f)) = O(m n^2)
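
In Python, the parallel scoring step corresponds to a comprehension whose entries are mutually independent, followed by a reduction. The sketch below assumes the DIMACS-style clause encoding (k for variable k, -k for its negation) and implements CHOOSE_MAX as a pairwise tournament, which is one plausible way to reach logarithmic depth; the slides do not show how CHOOSE_MAX is built. Ties are broken by a coin flip, which is not exactly uniform in general.

```python
import random


def all_scores(cnf, V):
    """Scores[i-1] is the score after flipping variable i; entries are independent."""
    def flipped_score(i):
        W = list(V)
        W[i] = not W[i]
        return sum(any(W[abs(l)] == (l > 0) for l in c) for c in cnf)
    return [flipped_score(i) for i in range(1, len(V))]


def choose_max(scores, rng):
    """Tournament over (score, variable) pairs; returns a variable with maximal score."""
    layer = [(s, i + 1) for i, s in enumerate(scores)]  # variables are 1-based
    while len(layer) > 1:
        nxt = []
        for k in range(0, len(layer) - 1, 2):
            a, b = layer[k], layer[k + 1]
            if a[0] != b[0]:
                nxt.append(a if a[0] > b[0] else b)
            else:
                nxt.append(a if rng.random() < 0.5 else b)  # coin-flip tie break
        if len(layer) % 2:
            nxt.append(layer[-1])  # odd element advances unchallenged
        layer = nxt
    return layer[0][1]
```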

Step 3: Relative Scoring
Selman/Levesque/Mitchell use a technique of relative scoring in their implementation.
First thorough analysis of relative scoring in Hoos' Diplomarbeit (diploma thesis).
Idea: after every flip, update the score of only those variables that are affected by the flip.
Since clauses are small, the number of affected variables is much smaller than the overall number of variables.

Some Notation
NCl[i] is the number of clauses that contain the variable i
MaxClauses = max_i NCl[i]; usually MaxClauses << m
MaxVariables = max_j (number of variables in clause j)
EVAL_j^{C(i)} evaluates the j-th clause from the set of clauses that contain the variable i

Relative Scoring
macro CHOOSE_FLIP(f):
  NewS := { SUM({ EVAL_j^{C(i)}(V[¬V[i]/i]) : j ∈ [1…NCl[i]] }) : i ∈ [1…n] };
  OldS := { SUM({ EVAL_j^{C(i)}(V) : j ∈ [1…NCl[i]] }) : i ∈ [1…n] };
  Diff := { NewS[i] – OldS[i] : i ∈ [1…n] };
  f := CHOOSE_MAX(Diff)
g(CHOOSE_FLIP(f)) = O(MaxVariables MaxClauses n)
d(CHOOSE_FLIP(f)) = O(log MaxClauses + log MaxVariables)
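
The relative score of a variable depends only on the clauses that contain it. The Python sketch below computes Diff this way, assuming the DIMACS-style clause encoding (k for variable k, -k for its negation). The occurrence lists `occ[i]` play the role of C(i), so `len(occ[i])` is NCl[i]; the function names are hypothetical.

```python
def eval_clause(clause, V):
    return any(V[abs(l)] == (l > 0) for l in clause)


def occurrences(cnf, n):
    """occ[i] lists the clauses containing variable i (the set C(i))."""
    occ = [[] for _ in range(n + 1)]
    for c in cnf:
        for v in {abs(l) for l in c}:
            occ[v].append(c)
    return occ


def diff_scores(cnf, V):
    """Diff[i] = NewS[i] - OldS[i], computed over the clauses in C(i) only."""
    n = len(V) - 1
    occ = occurrences(cnf, n)
    diff = [0] * (n + 1)  # index 0 is unused padding
    for i in range(1, n + 1):
        old_s = sum(eval_clause(c, V) for c in occ[i])
        V[i] = not V[i]  # temporarily flip variable i
        new_s = sum(eval_clause(c, V) for c in occ[i])
        V[i] = not V[i]  # restore V
        diff[i] = new_s - old_s
    return diff
```

Clauses that do not contain variable i evaluate the same before and after the flip, so Diff[i] equals the change in the full score.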

Step 4: Pipelining
procedure GenSAT(cnf, maxtries, maxflips)
  for i = 1 to maxtries do
    INITASSIGN(V);
    for j = 1 to maxflips do
      if V satisfies cnf then return V
      else
        f = CHOOSEFLIP();
        V := V with variable f flipped
    end
  end

Pipelining Outer Loop
macro CHOOSE_FLIP(f):
  NewS := { SUM({ EVAL_j^{C(i)}(V[¬V[i]/i]) : j ∈ [1…NCl[i]] }) : i ∈ [1…n] };
  OldS := { SUM({ EVAL_j^{C(i)}(V) : j ∈ [1…NCl[i]] }) : i ∈ [1…n] };
  Diff := { NewS[i] – OldS[i] : i ∈ [1…n] };
  f := CHOOSE_MAX(Diff)
[Figure: pipeline diagram with stages STAGE I to STAGE IV (S I, S II, S III, S IV), scheduled in staggered fashion across Try 1, Try 2, Try 3 and Try 4]
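
One plausible reading of the diagram is that independent tries are interleaved so that every pipeline stage is busy in every clock cycle. The Python sketch below prints such a schedule. It is an assumption-based illustration, not the authors' design: it supposes four stages and four tries, with try t entering stage I one cycle after try t-1 and re-entering as soon as it leaves stage IV. Tries and stages are numbered from 0 in the code.

```python
def pipeline_schedule(num_stages, num_tries, cycles):
    """rows[c][s] is the try occupying stage s in clock cycle c (None while filling)."""
    rows = []
    for c in range(cycles):
        row = []
        for s in range(num_stages):
            t = c - s  # whatever is in stage s now entered stage 0 at cycle c - s
            row.append(t % num_tries if t >= 0 else None)
        rows.append(row)
    return rows


for cycle, row in enumerate(pipeline_schedule(4, 4, 6)):
    print(cycle, row)
```

Under these assumptions the last stage delivers a flip for some try in every cycle once the pipeline is full, and each try sees its own previous flip before it re-enters the first stage.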

Preliminary Experiments
Conducted on hill-climbing variant of GSAT
Comparing software implementation by Selman/Kautz with Hamadi/Merceron and Step 4
Software: running on Pentium II at 400 MHz
FPGA: running on Xilinx XCV1000 at 20 MHz; programmed using Handel C by Celoxica

Flips per Second
DIMACS Problems | Software Sel/Kau | FPGA Ham/Mer | FPGA Step 4 | Speedup vs H/M
[Table values garbled in extraction; surviving fragments: K520 K25 M K520 K25 M K284 K22 M K284 K22 M77.5]

Flips per Slice Second
DIMACS Problems | Slices Ham/Mer | f / sl sec Ham/Mer | Slices Step 4 | f / sl sec Step 4 | Improvement

Conclusions
Fastest known one-chip implementation of GSAT, using parallel relative scoring plus pipelining
Current size and speed make it feasible to use FPGAs as platforms for parallel algorithms
FPGAs are one-chip parallel machines with serious limitations of programmability
  – higher-level languages needed
  – stack support needed: towards compiling parallel languages to hardware