Slide 1
Compiled code acceleration on FPGAs
W. Najjar, B. Buyukkurt, Z. Guo, J. Villareal, J. Cortes, A. Mitra
Computer Science & Engineering, University of California, Riverside
Slide 2
Why? Are FPGAs a New HPC Platform?
Comparison of a dual-core Opteron (2.5 GHz) to Virtex-4 and Virtex-5 FPGAs on double-precision floating point, assuming:
- A balanced allocation of adders, multipliers and registers
- Both DSP blocks and logic used for multipliers, running at lower speed
- Logic and wires reserved for I/O interfaces

Double-precision Gflop/s:
             Opteron   Virtex-4   Virtex-5
  MAc          10        15.9       28.0
  Mult          5        12.0       19.9
  Add           5        23.9       55.3

Power (Watts): Opteron 95; Virtex-4 / Virtex-5 25~35

Source: David Strensky, "FPGAs Floating-Point Performance -- a pencil and paper evaluation," HPCwire.com
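A rough performance-per-watt reading of the MAc row, taking 30 W as the middle of the 25~35 W range (an illustrative back-of-the-envelope, not a figure from the slide):

  Opteron:  10 Gflop/s / 95 W ~ 0.11 Gflop/s per watt
  Virtex-5: 28 Gflop/s / 30 W ~ 0.93 Gflop/s per watt

Roughly an 8-9x gap in energy efficiency, which is the kind of margin behind the "high throughput and low power" argument made near the end of the talk.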
Slide 3
ROCCC: Riverside Optimizing Compiler for Configurable Computing
- Code acceleration, by mapping it to circuits on the FPGA
  - Achieves the same speed as hand-written VHDL code
- Improved productivity
  - Allows design and algorithm space exploration
- Keeps the user fully in control
  - We automate only what is very well understood
Slide 4
Challenges
- The FPGA is an amorphous mass of logic
  - Structure is provided by the code being accelerated
  - That code is repeatedly applied to a large data set: streams
- Languages reflect the von Neumann execution model:
  - Highly structured and sequential (control driven)
  - A vast, randomly accessible, uniform memory

  CPUs (& GPUs)          FPGAs
  Temporal computing     Spatial computing
  Sequential             Parallel
  Centralized storage    Distributed storage
  Control flow driven    Data flow driven
Slide 5
ROCCC Overview
- Limitations on the input code: no recursion, no pointers
- Compilation flow: C/C++ or Java in; high-level transformations (procedure, loop and array optimizations) produce Hi-CIRRF; low-level transformations (instruction scheduling, pipelining and storage optimizations) produce Lo-CIRRF; code generation emits SystemC, VHDL or binary, targeting an FPGA, CPU, GPU, DSP or custom unit
- CIRRF: Compiler Intermediate Representation for Reconfigurable Fabrics
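The no-recursion, no-pointer restriction mostly means writing loops over arrays with explicit indices. A hypothetical before/after illustration (not from the slides; sum_ptr and sum_array are made-up names):

    /* Pointer-walking loop: legal C, but outside a no-pointer subset
       because it traverses memory through a pointer. */
    int sum_ptr(const int *p, int n) {
        int s = 0;
        while (n-- > 0)
            s += *p++;
        return s;
    }

    /* The same reduction as a counted loop over an array with explicit
       indices: the form that maps naturally onto a streaming datapath. */
    #define LEN 516
    int sum_array(const int A[LEN]) {
        int s = 0;
        for (int i = 0; i < LEN; i = i + 1)
            s += A[i];
        return s;
    }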
Slide 6
A Decoupled Execution Model
Datapath organization: input memory (on or off chip) -> Mem Fetch Unit -> Input Buffer -> multiple loop bodies, unrolled and pipelined -> Output Buffer -> Mem Store Unit -> output memory (on or off chip)
- Memory access is decoupled from the datapath
- Parallel loop iterations, pipelined datapath
- The smart (input) buffer does data reuse
- The memory fetch and store units and the datapath are configured by the compiler
- Off-chip accesses are platform specific
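As a software analogy of this organization (an illustrative sketch, not ROCCC output; fir_decoupled is a made-up name), using the 3-tap FIR that appears as the example two slides below: a fetch stage brings in one new element per iteration, a small window plays the role of the smart buffer, and the datapath computes from the buffered values, so each input element is read from memory only once.

    #define N    516
    #define TAPS 3

    /* Sketch of the decoupled model in software. In hardware the fetch,
     * datapath and store stages run concurrently; here they are folded
     * into one sequential loop to show the data movement. */
    void fir_decoupled(const int A[N], int B[N], const int T[TAPS])
    {
        int window[TAPS];              /* the "smart buffer" */

        window[0] = A[0];              /* prime the window */
        window[1] = A[1];

        for (int i = 0; i <= N - TAPS; i = i + 1) {
            window[2] = A[i + 2];      /* fetch unit: one new element per output */
            B[i] = T[0]*window[0]      /* datapath: multiply-accumulate */
                 + T[1]*window[1]
                 + T[2]*window[2];
            window[0] = window[1];     /* data reuse: shift instead of refetching */
            window[1] = window[2];
        }
    }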
Slide 7
So far, a working compiler with ...
- Extensive optimizations and transformations
  - Traditional and FPGA-specific: systolic arrays, pipelined unrolling, look-up tables
- Compile-time plus hardware support for data reuse
  - > 98% reduction in memory fetches on image codes
- Efficient code generation and pipelining
  - Within 10% of hand-optimized HDL codes
- Import of existing IP cores
  - Leverages a huge wealth of cores, integrated with the C source code
- Support for dynamic partial reconfiguration
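To put the data-reuse figure in perspective (an illustrative calculation with assumed window sizes, not a measurement from the slides): a k x k image kernel naively fetches k^2 input pixels per output pixel, while a smart buffer that fetches each pixel only once needs about one fetch per output, a reduction of 1 - 1/k^2. That is 96% for a 5x5 window and over 98% for an 8x8 window.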
Slide 8
Example: 3-tap FIR

    #define N 516

    void begin_hw();
    void end_hw();

    int main()
    {
        int i;
        const int T[3] = {3, 5, 7};   /* the three filter coefficients */
        int A[N], B[N];

        begin_hw();
    L1: for (i = 0; i <= (N - 3); i = i + 1) {
            /* the indices of A[] and the coefficients T[] are what the
               compiler turns into the streaming datapath */
            B[i] = T[0]*A[i] + T[1]*A[i+1] + T[2]*A[i+2];
        }
        end_hw();

        return 0;
    }
Slide 9
RC Platform Models
[Diagram of three reconfigurable-computing platform models:]
1. FPGA attached to the CPU through the memory interface
2. FPGA with its own SRAM, attached through the CPU's memory interface
3. Nodes combining CPU + memory and FPGA + SRAM, connected by a fast network
Slide 10
What we have learned so far
- Big speedups are possible
  - 10x to 1,000x on application codes over Xeon and Itanium: molecular dynamics, bio-informatics, etc.
  - Works best with streaming data
- New paradigms and tools are needed for spatio-temporal concurrency
  - Algorithms, languages, compilers, run-time systems, etc.
Slide 11
Future? Very wide use of FPGAs
- Why? High throughput (> 10x) AND low power (< 25%)
- How? Mostly in Models 2 and 3, initially
  - Model 2: see Intel QuickAssist, Xtremedata & DRC
  - Model 3: SGI, SRC & Cray
- Contingencies
  - The market brings the price of FPGAs down
  - Availability of some software stack, for savvy programmers initially
- Potential
  - Multiple "killer apps" (to be discovered)
Slide 12
Conclusion
We as a research community should be ready. Stamatis was.
Thank you