1 COMPUTER SCIENCE & ENGINEERING
Compiled code acceleration on FPGAs
W. Najjar, B. Buyukkurt, Z. Guo, J. Villareal, J. Cortes, A. Mitra
Computer Science & Engineering, University of California, Riverside
(Future of Computing, 28 September 2007)

2 Why? Are FPGAs a New HPC Platform?
Comparison of a dual-core Opteron (2.5 GHz) to Virtex-4 and Virtex-5 FPGAs on double-precision floating point:
- Balanced allocation of adders, multipliers and registers
- Both DSPs and logic used for multipliers, run at a lower speed
- Logic & wires reserved for I/O interfaces

Double-precision GFLOP/s:
         Opteron   Virtex-4   Virtex-5
  MAc      10        15.9       28.0
  Mult      5        12.0       19.9
  Add       5        23.9       55.3

Power (watts):
  Opteron: 95    Virtex-4: 25    Virtex-5: ~35

Source: David Strenski, "FPGAs Floating-Point Performance -- a pencil and paper evaluation," HPCwire.com
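Read as performance per watt (a back-of-envelope calculation from the numbers above, not part of the original slide): on double-precision adds the Virtex-5 delivers roughly 55.3 / 35 ≈ 1.6 GFLOP/s per watt against the Opteron's 5 / 95 ≈ 0.05 GFLOP/s per watt, about a 30x gap; on multiply-accumulate the gap is still close to 8x (28.0 / 35 vs. 10 / 95).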

3 ROCCC: Riverside Optimizing Compiler for Configurable Computing
Code acceleration
- By mapping circuits onto the FPGA
- Achieves the same speed as hand-written VHDL code
Improved productivity
- Allows design- and algorithm-space exploration
Keeps the user fully in control
- We automate only what is very well understood

4 Challenges
An FPGA is an amorphous mass of logic
- Structure is provided by the code being accelerated
- Repeatedly applied to a large data set: streams
Languages reflect the von Neumann execution model:
- Highly structured and sequential (control driven)
- Vast, randomly accessible, uniform memory

  CPUs (& GPUs)          FPGAs
  Temporal computing     Spatial computing
  Sequential             Parallel
  Centralized storage    Distributed storage
  Control flow driven    Data flow driven
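To make the temporal/spatial contrast concrete, here is a trivial streaming kernel (a hand-written illustration with made-up names, not ROCCC output):

    /* One multiply per input element.  On a CPU the iterations execute  */
    /* one after another in time (temporal, control-flow driven).  On an */
    /* FPGA the loop body becomes a pipelined datapath: a new element    */
    /* enters every cycle and many iterations are in flight at once      */
    /* (spatial, data-flow driven).                                      */
    void scale_stream(const int *in, int *out, int n, int k)
    {
        for (int i = 0; i < n; i++)
            out[i] = k * in[i];
    }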

5 ROCCC Overview
Limitations on the code: no recursion, no pointers.
Compilation flow (from the slide's diagram): source code (C/C++, Java) goes through high-level transformations (procedure, loop and array optimizations) into Hi-CIRRF, then through low-level transformations (instruction scheduling, pipelining and storage optimizations) into Lo-CIRRF, and finally through code generation into SystemC, VHDL or binary, targeting an FPGA, CPU, GPU, DSP or custom unit.
CIRRF: Compiler Intermediate Representation for Reconfigurable Fabrics
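As a flavor of the loop transformations involved, a generic 2x unrolling of an accumulation loop (a hand-written sketch, not actual ROCCC output):

    /* Two loop bodies per trip; the hardware back end can schedule      */
    /* them as concurrent, pipelined datapath operations.                */
    int sum_unrolled(const int *a, int n)
    {
        int acc = 0;
        int i;
        for (i = 0; i + 1 < n; i += 2) {
            acc += a[i];
            acc += a[i + 1];
        }
        if (i < n)
            acc += a[i];      /* remainder when n is odd */
        return acc;
    }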

6 A Decoupled Execution Model
[Diagram: input memory (on or off chip) feeds a memory fetch unit and an input (smart) buffer; multiple loop bodies, unrolled and pipelined, form the datapath; an output buffer and a memory store unit write to output memory (on or off chip).]
- Memory access is decoupled from the datapath
- Parallel loop iterations
- Pipelined datapath
- Smart buffer (input) does data reuse
- Memory fetch and store units and the datapath are configured by the compiler
- Off-chip accesses are platform specific
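A minimal software model of the decoupling (an illustration only, with made-up names; in hardware the three units run concurrently, here they alternate):

    /* Fetch unit fills an input FIFO from memory, the datapath consumes */
    /* it and fills an output FIFO, and the store unit drains that FIFO  */
    /* back to memory.                                                    */
    #define FIFO_DEPTH 8

    typedef struct { int data[FIFO_DEPTH]; int head, tail, count; } fifo_t;

    static void fifo_push(fifo_t *f, int v)
    { f->data[f->tail] = v; f->tail = (f->tail + 1) % FIFO_DEPTH; f->count++; }

    static int fifo_pop(fifo_t *f)
    { int v = f->data[f->head]; f->head = (f->head + 1) % FIFO_DEPTH; f->count--; return v; }

    void decoupled_scale(const int *in_mem, int *out_mem, int n)
    {
        fifo_t in = {0}, out = {0};
        int fetched = 0, computed = 0, stored = 0;
        while (stored < n) {
            if (fetched < n && in.count < FIFO_DEPTH)            /* mem fetch unit */
                fifo_push(&in, in_mem[fetched++]);
            if (computed < n && in.count > 0 && out.count < FIFO_DEPTH) {
                fifo_push(&out, 2 * fifo_pop(&in));              /* datapath       */
                computed++;
            }
            if (out.count > 0)
                out_mem[stored++] = fifo_pop(&out);              /* mem store unit */
        }
    }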

7 So far, a working compiler with …
Extensive optimizations and transformations
- Traditional and FPGA-specific
- Systolic arrays, pipelined unrolling, look-up tables
Compiler + hardware support for data reuse
- > 98% reduction in memory fetches on image codes
Efficient code generation and pipelining
- Within 10% of hand-optimized HDL codes
Import of existing IP cores
- Leverages a huge wealth of cores, integrated with the C source code
Support for dynamic partial reconfiguration
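The size of the data-reuse win on windowed image codes follows from a simple count (a back-of-envelope illustration; the window sizes used in the original experiments are not given here): a k x k window filter naively fetches k^2 pixels per output, while full reuse fetches each input pixel only about once, a reduction of 1 - 1/k^2. An 8 x 8 window already gives roughly 98.4%.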

8 Example: 3-tap FIR
(The slide's annotations point to the indices of A[] and to the coefficients T[].)

    #define N 516

    void begin_hw();   /* markers delimiting the region compiled to hardware */
    void end_hw();

    int main()
    {
        int i;
        const int T[3] = {3, 5, 7};   /* filter coefficients */
        int A[N], B[N];

        begin_hw();
    L1: for (i = 0; i <= (N - 3); i = i + 1) {
            B[i] = T[0]*A[i] + T[1]*A[i+1] + T[2]*A[i+2];
        }
        end_hw();
        return 0;
    }
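For comparison, a software rendering of how the smart buffer of slide 6 would service this loop (a hand-written illustration, not ROCCC output): each iteration fetches one new element of A and reuses the two already buffered, so the three memory reads per output of the naive loop drop to about one.

    /* Sliding-window model of the smart buffer for the 3-tap FIR above. */
    /* Assumes n >= 3.                                                    */
    void fir3_window(const int A[], int B[], int n, const int T[3])
    {
        int w0 = A[0], w1 = A[1], w2;
        for (int i = 0; i <= n - 3; i = i + 1) {
            w2 = A[i + 2];                  /* the only new fetch this trip */
            B[i] = T[0]*w0 + T[1]*w1 + T[2]*w2;
            w0 = w1;                        /* shift the window by one      */
            w1 = w2;
        }
    }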

9 RC Platform Models
[Diagram: three models of coupling CPU and FPGA. In one, the CPU and the FPGA share a memory interface; in another, the CPU reaches, through its memory interface, an FPGA that has its own SRAM; in the third, nodes pairing a CPU and memory with an FPGA and SRAM are connected by a fast network. The slide labels them Models 1, 2 and 3.]

10 What we have learned so far
Big speedups are possible
- 10x to 1,000x on application codes over Xeon and Itanium: molecular dynamics, bio-informatics, etc.
- Works best with streaming data
New paradigms and tools
- For spatio-temporal concurrency
- Algorithms, languages, compilers, run-time systems, etc.

11 Future? Very wide use of FPGAs
Why?
- High throughput (> 10x) AND low power (< 25%)
How?
- Mostly in Models 2 and 3, initially
- Model 2: see Intel QuickAssist, XtremeData & DRC
- Model 3: SGI, SRC & Cray
Contingency
- Market brings the price of FPGAs down
- Availability of some software stack, for savvy programmers initially
Potential
- Multiple "killer apps" (to be discovered)

12 Conclusion
We as a research community should be ready.
Stamatis was
Thank you

