1 COMPUTER SCIENCE & ENGINEERING
Compiled code acceleration on FPGAs
W. Najjar, B. Buyukkurt, Z. Guo, J. Villareal, J. Cortes, A. Mitra
Computer Science & Engineering, University of California, Riverside
(Future of Computing, 28 September 2007)

2 Why? Are FPGAs a New HPC Platform?
Comparison of a dual-core Opteron (2.5 GHz) to Virtex-4 and Virtex-5 FPGAs on double-precision floating point:
- Balanced allocation of adders, multipliers and registers
- Both DSPs and logic used for multipliers, run at a lower speed
- Logic & wires reserved for I/O interfaces

Double-precision GFLOP/s:
         Opteron   Virtex-4   Virtex-5
  MAc      10        15.9       28.0
  Mult      5        12.0       19.9
  Add       5        23.9       55.3

Power (watts):
  Opteron: 95    Virtex-4: 25    Virtex-5: ~35

Source: David Strenski, "FPGAs Floating-Point Performance -- a pencil and paper evaluation," HPCwire.com
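Read as performance per watt (a back-of-envelope calculation from the numbers above, not part of the original slide): on double-precision adds the Virtex-5 delivers roughly 55.3 / 35 ≈ 1.6 GFLOP/s per watt against the Opteron's 5 / 95 ≈ 0.05 GFLOP/s per watt, about a 30x gap; on multiply-accumulate the gap is still close to 8x (28.0 / 35 vs. 10 / 95).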

3 ROCCC: Riverside Optimizing Compiler for Configurable Computing
Code acceleration
- By mapping circuits onto the FPGA
- Achieves the same speed as hand-written VHDL code
Improved productivity
- Allows design- and algorithm-space exploration
Keeps the user fully in control
- We automate only what is very well understood

4 Challenges
An FPGA is an amorphous mass of logic
- Structure is provided by the code being accelerated
- Repeatedly applied to a large data set: streams
Languages reflect the von Neumann execution model:
- Highly structured and sequential (control driven)
- Vast, randomly accessible, uniform memory

  CPUs (& GPUs)          FPGAs
  Temporal computing     Spatial computing
  Sequential             Parallel
  Centralized storage    Distributed storage
  Control flow driven    Data flow driven
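To make the temporal/spatial contrast concrete, here is a trivial streaming kernel (a hand-written illustration with made-up names, not ROCCC output):

    /* One multiply per input element.  On a CPU the iterations execute  */
    /* one after another in time (temporal, control-flow driven).  On an */
    /* FPGA the loop body becomes a pipelined datapath: a new element    */
    /* enters every cycle and many iterations are in flight at once      */
    /* (spatial, data-flow driven).                                      */
    void scale_stream(const int *in, int *out, int n, int k)
    {
        for (int i = 0; i < n; i++)
            out[i] = k * in[i];
    }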

5 ROCCC Overview
Limitations on the code: no recursion, no pointers.
Compilation flow (from the slide's diagram): source code (C/C++, Java) goes through high-level transformations (procedure, loop and array optimizations) into Hi-CIRRF, then through low-level transformations (instruction scheduling, pipelining and storage optimizations) into Lo-CIRRF, and finally through code generation into SystemC, VHDL or binary, targeting an FPGA, CPU, GPU, DSP or custom unit.
CIRRF: Compiler Intermediate Representation for Reconfigurable Fabrics
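As a flavor of the loop transformations involved, a generic 2x unrolling of an accumulation loop (a hand-written sketch, not actual ROCCC output):

    /* Two loop bodies per trip; the hardware back end can schedule      */
    /* them as concurrent, pipelined datapath operations.                */
    int sum_unrolled(const int *a, int n)
    {
        int acc = 0;
        int i;
        for (i = 0; i + 1 < n; i += 2) {
            acc += a[i];
            acc += a[i + 1];
        }
        if (i < n)
            acc += a[i];      /* remainder when n is odd */
        return acc;
    }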

6 A Decoupled Execution Model
[Diagram: input memory (on or off chip) feeds a memory fetch unit and an input (smart) buffer; multiple loop bodies, unrolled and pipelined, form the datapath; an output buffer and a memory store unit write to output memory (on or off chip).]
- Memory access is decoupled from the datapath
- Parallel loop iterations
- Pipelined datapath
- Smart buffer (input) does data reuse
- Memory fetch and store units and the datapath are configured by the compiler
- Off-chip accesses are platform specific
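A minimal software model of the decoupling (an illustration only, with made-up names; in hardware the three units run concurrently, here they alternate):

    /* Fetch unit fills an input FIFO from memory, the datapath consumes */
    /* it and fills an output FIFO, and the store unit drains that FIFO  */
    /* back to memory.                                                    */
    #define FIFO_DEPTH 8

    typedef struct { int data[FIFO_DEPTH]; int head, tail, count; } fifo_t;

    static void fifo_push(fifo_t *f, int v)
    { f->data[f->tail] = v; f->tail = (f->tail + 1) % FIFO_DEPTH; f->count++; }

    static int fifo_pop(fifo_t *f)
    { int v = f->data[f->head]; f->head = (f->head + 1) % FIFO_DEPTH; f->count--; return v; }

    void decoupled_scale(const int *in_mem, int *out_mem, int n)
    {
        fifo_t in = {0}, out = {0};
        int fetched = 0, computed = 0, stored = 0;
        while (stored < n) {
            if (fetched < n && in.count < FIFO_DEPTH)            /* mem fetch unit */
                fifo_push(&in, in_mem[fetched++]);
            if (computed < n && in.count > 0 && out.count < FIFO_DEPTH) {
                fifo_push(&out, 2 * fifo_pop(&in));              /* datapath       */
                computed++;
            }
            if (out.count > 0)
                out_mem[stored++] = fifo_pop(&out);              /* mem store unit */
        }
    }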

7 So far, a working compiler with …
Extensive optimizations and transformations
- Traditional and FPGA-specific
- Systolic arrays, pipelined unrolling, look-up tables
Compiler + hardware support for data reuse
- > 98% reduction in memory fetches on image codes
Efficient code generation and pipelining
- Within 10% of hand-optimized HDL codes
Import of existing IP cores
- Leverages a huge wealth of cores, integrated with the C source code
Support for dynamic partial reconfiguration
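The size of the data-reuse win on windowed image codes follows from a simple count (a back-of-envelope illustration; the window sizes used in the original experiments are not given here): a k x k window filter naively fetches k^2 pixels per output, while full reuse fetches each input pixel only about once, a reduction of 1 - 1/k^2. An 8 x 8 window already gives roughly 98.4%.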

8 Example: 3-tap FIR
(The slide's annotations point to the indices of A[] and to the coefficients T[].)

    #define N 516

    void begin_hw();   /* markers delimiting the region compiled to hardware */
    void end_hw();

    int main()
    {
        int i;
        const int T[3] = {3, 5, 7};   /* filter coefficients */
        int A[N], B[N];

        begin_hw();
    L1: for (i = 0; i <= (N - 3); i = i + 1) {
            B[i] = T[0]*A[i] + T[1]*A[i+1] + T[2]*A[i+2];
        }
        end_hw();
        return 0;
    }
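For comparison, a software rendering of how the smart buffer of slide 6 would service this loop (a hand-written illustration, not ROCCC output): each iteration fetches one new element of A and reuses the two already buffered, so the three memory reads per output of the naive loop drop to about one.

    /* Sliding-window model of the smart buffer for the 3-tap FIR above. */
    /* Assumes n >= 3.                                                    */
    void fir3_window(const int A[], int B[], int n, const int T[3])
    {
        int w0 = A[0], w1 = A[1], w2;
        for (int i = 0; i <= n - 3; i = i + 1) {
            w2 = A[i + 2];                  /* the only new fetch this trip */
            B[i] = T[0]*w0 + T[1]*w1 + T[2]*w2;
            w0 = w1;                        /* shift the window by one      */
            w1 = w2;
        }
    }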

9 RC Platform Models
[Diagram: three models of coupling CPU and FPGA. In one, the CPU and the FPGA share a memory interface; in another, the CPU reaches, through its memory interface, an FPGA that has its own SRAM; in the third, nodes pairing a CPU and memory with an FPGA and SRAM are connected by a fast network. The slide labels them Models 1, 2 and 3.]

10 What we have learned so far
Big speedups are possible
- 10x to 1,000x on application codes over Xeon and Itanium: molecular dynamics, bio-informatics, etc.
- Works best with streaming data
New paradigms and tools
- For spatio-temporal concurrency
- Algorithms, languages, compilers, run-time systems, etc.

11 Future? Very wide use of FPGAs
Why?
- High throughput (> 10x) AND low power (< 25%)
How?
- Mostly in Models 2 and 3, initially
- Model 2: see Intel QuickAssist, XtremeData & DRC
- Model 3: SGI, SRC & Cray
Contingency
- Market brings the price of FPGAs down
- Availability of some software stack, for savvy programmers initially
Potential
- Multiple "killer apps" (to be discovered)

12 Conclusion
We as a research community should be ready.
Stamatis was
Thank you

