JIT FPGA Ideas


1 JIT FPGA Ideas
Frank Vahid
Dept. of CS&E, University of California, Riverside
Associate Director, Center for Embedded Computer Systems, UC Irvine
Contributing Ph.D. students:
- Roman Lysecky (Ph.D. 2005, now Asst. Prof. at Univ. of Arizona)
- Greg Stitt (Ph.D. 2007, now Asst. Prof. at Univ. of Florida, Gainesville)
- Scotty Sirowy (current)
- David Sheldon (current)
- Chen Huang (current)
This research was supported in part by the National Science Foundation, the Semiconductor Research Corporation, Intel, Freescale, IBM, and Xilinx.

2 SystemC Bytecode for FPGAs: Demo
Frank Vahid, UC Riverside

3 FPGA Common Presence
- Caches, FPUs, GPUs, FPGAs
- App developers may expect FPGA presence
- How to create and distribute apps that make good use of an FPGA if present?
[Figure labels: Binary, µP, Cache, FPU, FPGA, GPU]

4 "Spatial" Algorithms for FPGAs
Example: count patterns
Sequential algorithm
- Hash table
- Tens of cycles per pattern
    int patterns[1000];
    int counts[1000];
    while (1) {
      WaitForPattern();
      CurrPattern = X;
      hash = HashFct(CurrPattern);
      item = Find(patterns, CurrPattern, hash);
      if (item) { counts[item]++; }
    }
Spatial algorithm
- Pipelined stages
- Essence is the connectivity of components, not the sequencing of instructions
[Figure: CurrPattern on a bus feeding Level 1, Level 2, ..., Level m; each level has its own logic, pattern, and count]
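
To make the contrast with the hash-table loop concrete, here is a minimal software model of the spatial design, written as a sketch rather than as the author's circuit. It assumes each level stores one pattern and one counter and compares against the broadcast CurrPattern; the names `Stage`, `makeStages`, and `presentPattern` are hypothetical. In hardware all comparisons would happen concurrently, while the loop below visits them one at a time.

```cpp
#include <vector>

// One level of the spatial design: a stored pattern, its counter,
// and (implicitly) the compare logic. Assumed structure, not from the slide.
struct Stage {
    int pattern;
    int count;
};

// Builds one stage per known pattern, with all counters at zero.
std::vector<Stage> makeStages(const std::vector<int>& patterns) {
    std::vector<Stage> stages;
    for (int p : patterns) {
        stages.push_back(Stage{p, 0});
    }
    return stages;
}

// Models one presentation of CurrPattern to every stage. Each stage
// touches only its own counter, so there is no shared state to sequence.
void presentPattern(std::vector<Stage>& stages, int currPattern) {
    for (Stage& s : stages) {
        if (s.pattern == currPattern) {
            ++s.count;
        }
    }
}
```

Calling `presentPattern` once per incoming pattern updates the matching stage's counter; what defines the design is the set of stages and how they are connected, not an instruction sequence.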

5 Bytecode
- Modern portability approach: Java, C#
- Virtual Machine (VM): program that executes bytecode
- May JIT compile to native architecture
[Figure: Compiler -> bytecode -> VM, running on Pentium, Atom, Opteron]

6 SystemC Bytecode?
[Figure: SystemC -> Compiler -> SystemC bytecode -> VM, running on Pentium, FPGA, and Opteron + FPGA]

7 UCR SystemC Bytecode and Compiler
SystemC source (input to UCR's SystemC-to-bytecode compiler):
    class EDGE_DETECTOR : public sc_module {
      // signal declarations
      …
      EDGE_DETECTOR() {
        SC_method(mainComp);
        sensitive << dataReady;
        SC_method(getPixel);
        sensitive << clock.pos();
      }
      void getPixel() {
        …
        dataReady.write(1);
      }
      void mainComp() {
        int i, j;
        for (i = 0; i < 3; i++) {
          for (j = 0; j < 3; j++) {
            sumX = sumX + mem.read() * GX[i][j];
          }
          …
        }
        edge.write(sumX + sumY);
      }
    };
UCR's SystemC bytecode (output of the compiler):
    --header
    signal clock : 1
    signal reset : 1
    signal memory_in : 32
    signal fb_data : 32
    signal leds : 4
    process(clock)
      READ $1 memory_in
      ADD $2 $0 3
      ADD $3 $2 $1
      WRITE $3 s1
      ADDI $1 $0 1
      WRITE $1 dataReady
    END
    process(dataReady)
      READ $5 val6
      SW $5 24($0)
      READ $5 val7
      …
      ADDI $10 $0 0
      ADDI $7 $0 0
      ADDI $13 $0 8
      …
    END
Annotations:
- Spatial constructs: the signal declarations and process(...) blocks
- MIPS-like sequential instructions: the process bodies
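
A bytecode VM for the listing above could execute each process body sequentially whenever a signal in its sensitivity list changes. The sketch below interprets only the four mnemonics in the `process(clock)` body. Their semantics are assumptions inferred from the MIPS-like names, since the slide does not define them: READ loads a signal into a register, WRITE stores a register to a signal, and `$0` reads as zero. `Instr`, `runProcess`, and `clockProcess` are hypothetical names.

```cpp
#include <array>
#include <map>
#include <string>
#include <vector>

// Hypothetical opcode subset; names mirror mnemonics shown on the slide.
enum class Op { READ, WRITE, ADD, ADDI };

struct Instr {
    Op op;
    int rd;              // destination register (source register for WRITE)
    int rs;              // first source register
    int rt;              // second source register (ADD) or immediate (ADDI)
    std::string signal;  // signal name for READ/WRITE, empty otherwise
};

using Signals = std::map<std::string, int>;

// Runs one process body from top to bottom against the signal store.
// Assumption: register $0 is hard-wired to zero, as in MIPS.
void runProcess(const std::vector<Instr>& body, Signals& signals) {
    std::array<int, 32> reg{};
    for (const Instr& in : body) {
        switch (in.op) {
            case Op::READ:  reg[in.rd] = signals[in.signal]; break;
            case Op::WRITE: signals[in.signal] = reg[in.rd]; break;
            case Op::ADD:   reg[in.rd] = reg[in.rs] + reg[in.rt]; break;
            case Op::ADDI:  reg[in.rd] = reg[in.rs] + in.rt; break;
        }
        reg[0] = 0;  // discard any write to $0
    }
}

// The process(clock) body from the slide. The slide's "ADD $2 $0 3" takes
// an immediate, so this sketch models it as ADDI.
std::vector<Instr> clockProcess() {
    return {
        {Op::READ,  1, 0, 0, "memory_in"},
        {Op::ADDI,  2, 0, 3, ""},
        {Op::ADD,   3, 2, 1, ""},
        {Op::WRITE, 3, 0, 0, "s1"},
        {Op::ADDI,  1, 0, 1, ""},
        {Op::WRITE, 1, 0, 0, "dataReady"},
    };
}
```

Under these assumptions, running `clockProcess()` with `memory_in` set to 10 leaves `s1` at 13 and `dataReady` at 1. In a fuller emulator, the write to `dataReady` would trigger `process(dataReady)`.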

8 SystemC Bytecode Emulator
- Bytecode uploadable via USB drive
- Accelerators speed up emulation
[Figure: FPGA containing the emulator: Main Processor, Instruction Memory, Input Memory, Output Memory, Read Signal Memory, Write Signal Memory, UART, Buttons, LEDs, USB Interface; SystemC bytecode is loaded into the emulator]

9 SystemC Bytecode Accelerators
Implementation
- MIPS-like multicycle RISC datapath
- 100 MHz clock
- ~33 million instr/sec
- Communicates with the core emulator through memory-mapped registers
- Area: ~5000 slices
- Number of accelerators limited to the number of masters allowed on the bus
- ~1200 lines of VHDL
[Figure: the emulator from the previous slide plus Accelerator 1, Accelerator 2, Accelerator 3 on the FPGA. Each accelerator contains a RISC datapath, register file, local memory, and bus, start, and load logic]
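
The clock rate and instruction throughput quoted on this slide imply an average cycle count per instruction, which fits the "multicycle" description. The slide does not state per-instruction cycle counts, so the figure below is only the average that follows from its two numbers:

```latex
\text{average cycles per instruction}
  \approx \frac{100 \times 10^{6}\ \text{cycles/s}}{33 \times 10^{6}\ \text{instr/s}}
  \approx 3
```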

10 Dynamic SystemC Accelerator Management
- Only a limited number of SystemC accelerators can fit on an FPGA fabric
- Dynamically map processes to accelerators based on process usage
- Involves online algorithms
[Figure: the emulator with Accelerator 1, Accelerator 2, Accelerator 3 on the FPGA, running SystemC bytecode. Image Filter Example]
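
The slide does not say which online algorithm is used. As one illustration of usage-based mapping, the sketch below keeps a usage counter per process and gives an accelerator slot to a process once it has been used more than the least-used resident process. The class `AcceleratorManager` and this eviction rule are assumptions chosen for brevity, not the UCR policy.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical manager mapping SystemC processes onto a fixed number of
// accelerator slots, driven only by past usage (an online decision).
class AcceleratorManager {
public:
    explicit AcceleratorManager(std::size_t slots) : slots_(slots) {}

    bool isResident(const std::string& process) const {
        return std::find(resident_.begin(), resident_.end(), process) !=
               resident_.end();
    }

    // Records one execution. Returns true if the process was already on an
    // accelerator for this execution; any remapping affects later executions.
    bool execute(const std::string& process) {
        const int uses = ++usage_[process];
        if (isResident(process)) return true;
        if (resident_.size() < slots_) {
            resident_.push_back(process);  // free slot: map it
            return false;
        }
        if (resident_.empty()) return false;  // no accelerator slots at all
        // Find the least-used resident process and replace it only if the
        // newcomer has strictly more recorded uses.
        auto victim = std::min_element(
            resident_.begin(), resident_.end(),
            [this](const std::string& a, const std::string& b) {
                return usage_.at(a) < usage_.at(b);
            });
        if (usage_.at(*victim) < uses) *victim = process;
        return false;
    }

private:
    std::size_t slots_;
    std::map<std::string, int> usage_;
    std::vector<std::string> resident_;
};
```

With two slots, a process that keeps executing eventually displaces a resident process that ran only once. A real policy would also weigh the cost of loading an accelerator, which this sketch ignores.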

11 Just-in-Time Synthesis
- Send SystemC bytecode to synthesis server
- Server returns an FPGA-specific bitstream
- Dynamically reconfigure some or all of the FPGA
- Possible to even perform synthesis on-chip: "warp processing" (previous UCR work)
[Figure: the emulator with Accelerator 1, Accelerator 2, Accelerator 3 on the FPGA, running SystemC bytecode]

12 Spatial Algorithms for FPGAs
- Even better spatial algorithm for pattern counting
- Pipelined binary tree
[Figure: CurrPattern enters Level 1 and flows through Level 2, Level 3, ..., Level n. Each level has its own logic, a pattern memory (1, 2, 4, ..., 2^n patterns) and a count memory (1, 2, 4, ..., 2^n counts)]
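
One way to read the pipelined binary tree is as a binary search tree stored level by level, with each level's memory holding twice as many patterns as the one before, so a lookup makes one comparison per level. The sketch below models that in software under those assumptions. The level-order array layout and the names `PatternTree`, `buildTree`, and `countPattern` are hypothetical. In hardware each level would be a pipeline stage, so a new pattern could enter every cycle.

```cpp
#include <cstddef>
#include <vector>

// Patterns in level-order layout: level k (from 0) occupies indices
// 2^k - 1 .. 2^(k+1) - 2, matching one memory per pipeline level.
struct PatternTree {
    std::vector<int> pattern;
    std::vector<int> count;
};

// In-order traversal of the implicit tree, assigning sorted values so the
// array satisfies the binary-search-tree ordering.
void fillInOrder(const std::vector<int>& sorted, std::size_t& next,
                 std::size_t node, PatternTree& tree) {
    if (node >= tree.pattern.size()) return;
    fillInOrder(sorted, next, 2 * node + 1, tree);
    tree.pattern[node] = sorted[next++];
    fillInOrder(sorted, next, 2 * node + 2, tree);
}

// Expects the patterns sorted in ascending order with no duplicates.
PatternTree buildTree(const std::vector<int>& sorted) {
    PatternTree tree;
    tree.pattern.assign(sorted.size(), 0);
    tree.count.assign(sorted.size(), 0);
    std::size_t next = 0;
    fillInOrder(sorted, next, 0, tree);
    return tree;
}

// Walks one level per iteration, as one pipeline stage would per cycle.
// Increments the matching counter if found; returns the levels visited.
int countPattern(PatternTree& tree, int currPattern) {
    std::size_t node = 0;
    int levels = 0;
    while (node < tree.pattern.size()) {
        ++levels;
        if (currPattern == tree.pattern[node]) {
            ++tree.count[node];
            break;
        }
        node = 2 * node + (currPattern < tree.pattern[node] ? 1 : 2);
    }
    return levels;
}
```

For seven sorted patterns the tree has three levels, so any lookup visits at most three stages. The earlier linear chain needs one stage per pattern.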

13 Study of Spatial Algorithms in FCCM (Sirowy FPGA'2008)
FCCM 2001-2006
- 70 papers describing fast application on FPGA
- Examined 35 in depth (every other one)
- 6 used device-specific features
- 9 represented expected synthesized circuit from the obvious sequential algorithm
- 20 were spatially-oriented applications akin to earlier pipelined binary tree
Year  Application              Type
2001  3D Vec. Normalization    Spatial
2001  Efficient CAM            --
2001  Automated Sensor         Temporal
2001  Regular Expression       Spatial
2002  Hyperspectral Image      Spatial
2002  Machine Vision           Spatial
2002  RC4                      Temporal
2002  Set Covering             Spatial
2002  Template Matching        Spatial
2002  Triangle Mesh            Spatial
2003  Congruential Sieves      Temporal
2003  Content Scanning         Temporal
2003  F.P and Square Root      Spatial
2003  Gaussian Noise           Spatial
2003  TRNG                     --
2004  3D FDTD Method           Spatial
2004  Deep Packet Filter       --
2004  Online Floating Point    --
2004  Molecular Dynamics       Spatial
2004  Pattern Matching         Spatial
2004  Seismic Migration        Spatial
2004  Software Deceleration    --
2004  V.M Window               --
2005  Data Mining              Spatial
2005  Cell Automata            Temporal
2005  Particle Graphics        Spatial
2005  Radiosity                Temporal
2005  Transient Waves          Spatial
2005  Road Traffic             Temporal
2006  All Pairs Shortest Path  Spatial
2006  Apriori Data Mining      Spatial
2006  Molecular Dynamics       Spatial
2006  Gaussian Elimination     Spatial
2006  Radiation Dose           Temporal
2006  Random Variates          Spatial

14 Portable Spatial Applications?
- Current portable microprocessor binaries: sequential
  - Extensions for threads, processes, ...
- How to support spatial constructs?
  - Ports, connections, timing model, ...
- SystemC (www.systemc.org)
  - Adds libraries and macros, still standard C++
  - Sequential and spatial constructs
  - Compiling links in the simulation kernel
  - Self-executing simulation
  - Intended for SoC simulation

15 Transmuting Coprocessors: Demo

16 FPGA is a Size-Limited Coprocessing Resource
- FPGA implements coprocessors
- App executions change. Must decide which coprocessors should be FPGA-resident at a given time: transmuting coprocessors
Flow (from figure):
- Upload app profile info
- Select coproc. set, generate new FPGA bitstream
- Send back new bitstream, reprogram FPGA
- Speedup with previous apps

17 Transmuting Coprocessor Demo
Three image filters:
- Blur filter (S/L): blur the image
- Sobel filter (S/L): find the edges of the image
- Emboss filter (S/L): emboss the image
Platform: Virtex 2P (XC2VP30): PPC + coprocessors
- PPC frequency: 100 MHz
- Coproc. frequency: 50 MHz
Size (slices)  Small  Large
Blur           30     120
Sobel          228    912
Emboss         81     324
Speedup        30x    120x

18 Demo architecture
- Image (128*128 pixels and 24-bit color): 24 BRAMs
- Soft version: Read (Image BRAM) -> Execution (PPC) -> Write (Display BRAM)
- Coprocessor version: Read (Image BRAM) -> Execution (Coproc) -> Write (Display BRAM)
- Dock: send the profile information through UART
[Figure labels: PPC, Peripherals, Instruction BRAM, PLB, Interface to external, Image BRAM, Coproc, Display BRAM, VGA control, VGA display, UART, Push button; tool regions marked EDK and ISE]

19 Coprocessor configurations
- Microprocessor only
- Small blur + small sobel
- Small blur + small emboss
- Small sobel + small emboss
- Large blur
- Large sobel
- Large emboss
Choose the configuration according to app profile info.
[Figure: Virtex2P with PPC, Peripherals, Memory, and a coprocessor region holding one of: Blur (S) + Sobel (S), Blur (S) + Emboss (S), Sobel (S) + Emboss (S), Blur (L), Sobel (L), Emboss (L)]
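
Choosing among the seven configurations from profile information can be sketched as a small exhaustive search. The example below assumes the profile is a count of 128-frame batches per filter. It also assumes the timings given on the video demo slide (30 s on the PPC, 1 s on a small coprocessor, 0.25 s on a large one, each per 128 frames) apply equally to all three filters. The deck says that different objectives and heuristics are possible, so minimizing total estimated time is only one choice. `Config`, `estimateSeconds`, and `chooseConfig` are hypothetical names.

```cpp
#include <array>
#include <cstddef>
#include <string>
#include <vector>

enum class Impl { Software, Small, Large };

struct Config {
    std::string name;
    std::array<Impl, 3> impl;  // order: blur, sobel, emboss
};

// Seconds per 128 frames; assumed identical for every filter.
double secondsPerBatch(Impl impl) {
    switch (impl) {
        case Impl::Small: return 1.0;
        case Impl::Large: return 0.25;
        default:          return 30.0;  // runs on the PPC
    }
}

// The seven configurations listed on the slide.
std::vector<Config> allConfigs() {
    const Impl SW = Impl::Software, SM = Impl::Small, LG = Impl::Large;
    return {
        {"Microprocessor only",        {SW, SW, SW}},
        {"Small blur + small sobel",   {SM, SM, SW}},
        {"Small blur + small emboss",  {SM, SW, SM}},
        {"Small sobel + small emboss", {SW, SM, SM}},
        {"Large blur",                 {LG, SW, SW}},
        {"Large sobel",                {SW, LG, SW}},
        {"Large emboss",               {SW, SW, LG}},
    };
}

// Estimated filtering time for a profile of 128-frame batches per filter.
double estimateSeconds(const Config& config, const std::array<int, 3>& batches) {
    double total = 0.0;
    for (std::size_t f = 0; f < 3; ++f) {
        total += batches[f] * secondsPerBatch(config.impl[f]);
    }
    return total;
}

// Picks the configuration with the lowest estimate; ties keep the earlier entry.
Config chooseConfig(const std::array<int, 3>& batches) {
    const std::vector<Config> configs = allConfigs();
    std::size_t best = 0;
    for (std::size_t i = 1; i < configs.size(); ++i) {
        if (estimateSeconds(configs[i], batches) <
            estimateSeconds(configs[best], batches)) {
            best = i;
        }
    }
    return configs[best];
}
```

A profile dominated by one filter favors that filter's large coprocessor, while a mixed profile favors a pair of small ones. The estimate leaves out the cost of reprogramming the FPGA.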

20 Video demo program flow
Flow steps (from figure):
- Execution
- Read profile info from UART
- Update profile information
- Dock
- Select new program file
- Reprogram FPGA
Different objectives and different heuristics.
Time information
Step                                 Time
Dock + CP selection                  0.001 s
Start IMPACT + FPGA reprogramming    12 s
Filter, PPC only (128 frames)        30 s
Filter, CP small (128 frames)        1 s
Filter, CP large (128 frames)        0.25 s
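
The timing table implies a break-even point: reprogramming costs about 12 s, so a new configuration pays off only if enough frames are filtered afterward. The sketch below computes that point from the table's numbers. It assumes the overhead is the sum of the first two rows and that filtering time scales linearly with the number of 128-frame batches. `breakEvenBatches` is a hypothetical helper.

```cpp
#include <limits>

// Overhead of switching configurations, in seconds, from the time table:
// dock + CP selection (0.001 s) plus starting IMPACT and reprogramming (12 s).
const double kSwitchOverheadSeconds = 0.001 + 12.0;

// Number of 128-frame batches after which switching has paid for itself.
// Arguments are seconds per batch before and after the switch.
// Returns infinity if the new configuration is not faster.
double breakEvenBatches(double currentSecPerBatch, double newSecPerBatch) {
    const double savedPerBatch = currentSecPerBatch - newSecPerBatch;
    if (savedPerBatch <= 0.0) {
        return std::numeric_limits<double>::infinity();
    }
    return kSwitchOverheadSeconds / savedPerBatch;
}
```

Under these assumptions, moving from PPC only (30 s) to a small coprocessor (1 s) pays off in well under one batch. Moving from a small (1 s) to a large coprocessor (0.25 s) needs roughly 16 batches, about 2,000 frames, to recover the reprogramming time.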

