Presentation is loading. Please wait.

Presentation is loading. Please wait.

Presented by Field-Programmable Gate Array Research Speeds HPC up to 100X Olaf O. Storaasli Future Technologies Group Computer Science and Mathematics.

Similar presentations


Presentation on theme: "Presented by Field-Programmable Gate Array Research Speeds HPC up to 100X Olaf O. Storaasli Future Technologies Group Computer Science and Mathematics."— Presentation transcript:

1 Presented by Field-Programmable Gate Array Research Speeds HPC up to 100X Olaf O. Storaasli Future Technologies Group Computer Science and Mathematics Division

2 2 Storaasli_FPGA_SC07 Contents Background: Why FPGAs? ORNL success: FPGA systems, tools and up to 100X speedup Partners: Research Lab,, SRC, THE SUPERCOMPUTER COMPANY Explore FPGAs for future ORNL HPC Virtex4 FPGA blades accelerate mission-critical applications > 100X. Steve Scott, CTO HPCWire 24/3/2006 After exhaustive analysis, Cray concluded that, although multi-core commodity processors will deliver some improvement, exploiting parallelism through a variety of processor technologies using scalar, vector, multithreading and hardware accelerators (e.g., FPGAs or ClearSpeed co-processors) creates the greatest opportunity for application acceleration. ORNL benefit: Exceed petaflops and reduce power Why HPC vendors offer FPGAs,,

3 3 Storaasli_FPGA_SC07 FPGA Logic slice Whats an FPGA? Your custom chip Xilinx Virtex4 FPGA: 25K slices (miniCPUs) Logic array: user-tailored to application On-chip RAM, multipliers and PowerPCs Gigabit transceivers/DSP blocks => FastIO/precision 100–1000 operations/clock cycle

4 4 Storaasli_FPGA_SC Computation (GOPS) Memory Bandwidth (GB/sec) IO Bandwidth (Gbps) Pentium Virtex-4 FPGA Virtex4 Pentium Why FPGAs? Performanceoptimal silicon use (maximize parallel ops/cycle) Rapid growthcells, speed, I/O Power1/10th CPUs Flexibletailor to application Thousands Logic Cells MHz Clock speed (MHz)

5 5 Storaasli_FPGA_SC07 Cray XD1 ORNL FPGA hardware/tools SRC-6 (Carte), Digilent (Viva, VHDL), Nallatech (Viva) Cray XD1 (MitrionC, VHDL): 6 FPGAs Opterons SGI RASC-Altix/Virtex4s (MitrionC) CHiMPS (Bee2 => Cray XD1 => DRC => XT4) RASC sgi

6 6 Storaasli_FPGA_SC07 Find parallelism: 80% FFTs More GF/$ GF/Watt Goal Profile Model faster Ported HPC code spectral transform shallow water model (STSWM) to FPGAs FTRNDE FTRNPE FTTdd UV FFT SHTRNS FFT COMP1 STEP FTRNEX FTRNVX 8 calls in parallel 3 functions in parallel 2 calls in parallel HLL compiler CHiMPS, Mitrion (FPGA Tools Inside) FPGA speedup HLL developer profiles

7 7 Storaasli_FPGA_SC07 Viva: Graphical icons3-dimensionalMitrionC: Text/flow1-dimensional Exploring programming options Gauss matrix solver Compiler, simulator, and debugger + Carte/SRC, CHiMPS-VHDL/Xilinx,

8 8 Storaasli_FPGA_SC07 *FPGA vs 2.2 GHz Opteron First mixed-precision LU and solver for FPGAs Benefits: High performance of LP arithmetic High precision accuracy Speedup increases with matrix size (as LU dominates calculations) DesignDouble FPSingle FPS10e5 PE Amount81632 Max size Achievable frequency 120 MHz150 MHz Slices27,005 (57%)14,792 (59%)14,730 (62%) BRAMs68 (29%)129 (55%)65 (28%) MULT18X18128 (55%)64 (27%)32 (13%) Execution time (us) Matrix size S10e5 Single Double Speedup doublesingleS10e LU Solver Design data type X* LU decomposition speedup 10X for matrix equation solver

9 9 Storaasli_FPGA_SC07 FPGA Speedup 8 hrs => 5 min *Virtex-4 FPGA vs 2.2 GHz Opteron on Cray XD Genome sequence 8K w/align 16K w/align 8K w/o align 16K w/o align 100X* DNA sequence speedup Bacillus anthracis human DNA comparison 24 # # 24= Sequence AE17024

10 10 Storaasli_FPGA_SC07 FPGA speedup grows with query size

11 11 Storaasli_FPGA_SC07 Acknowledgement: This research is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR Summary ORNL FPGA research: Increasing HPC relevence FPGA systems: Cray, SRC, Nallatech, Digilent, SGI Compilers: Mitrion-C, Carte, Viva, DSPlogic, CHiMPS Speedup: 10X eqn soln, 100X DNA sequencing Partners: Xilinx, UT, Mitrion, Cray, SGI Next: Explore DRC, more FPGAs and CHiMPS

12 12 Storaasli_FPGA_SC07 Contact 12 Storaasli_ReconfigHPC_SC07 Olaf Storaasli Future Technologies Group Computer Science and Mathematics Division Google Olaf ORNL


Download ppt "Presented by Field-Programmable Gate Array Research Speeds HPC up to 100X Olaf O. Storaasli Future Technologies Group Computer Science and Mathematics."

Similar presentations


Ads by Google