Portable SystemC-on-a-Chip
Scott Sirowy, Bailey Miller, and Frank Vahid†
Department of Computer Science and Engineering, University of California, Riverside
{ssirowy, bmiller, vahid}@cs.ucr.edu
†Also with the Center for Embedded Computer Systems at UC Irvine
This work was supported in part by the National Science Foundation and the Office of Naval Research.
Introduction: Prototyping Circuits and Systems
Task: Create a custom ASIC/FPGA circuit to detect edges in an image.
[Figure: Edge Detector circuit (signals go, data, address; registers s1-s4 and s6-s9; adder, subtractor, MIN with 255; output Pixel Value) connected to a Memory Controller]
Introduction: Prototyping Circuits and Systems
Capture in HDL (VHDL/Verilog):

    -- VHDL/Verilog File
    entity Edge_Detector is
      port (
        clk  : in std_logic;
        rst  : in std_logic;
        data : in std_logic_vec …
      );

[Figure: Edge Detector circuit (signals go, data, address; registers s1-s4 and s6-s9; adders, subtractors, MIN with 255) connected to a Memory Controller]
Introduction: Prototyping Circuits and Systems
SystemC
- C++ based
- Creation, instantiation, and connection of components
- Precisely timed communication and execution among concurrently executing components
- Supports both "software" and "hardware" constructs and semantics
Capture in HDL:

    class EDGE_DETECTOR : public sc_module {
        // signal declarations
        …
        EDGE_DETECTOR() {
            SC_METHOD(mainComp); sensitive << dataReady;
            SC_METHOD(getPixel); sensitive << clock.pos();

[Figure: Edge Detector circuit (signals go, data, address; registers s1-s4 and s6-s9; adders, subtractors, MIN with 255; output Pixel Value) connected to a Memory Controller]
Introduction: Prototyping Circuits and Systems
Simulation
- Requires environment modeling, which is sometimes hard
- Does not interact with real I/O
[Figure: Edge Detector captured in SystemC (class EDGE_DETECTOR), then simulated on a desktop PC]
Introduction: Prototyping Circuits and Systems
Implementation
- Mapping to a microprocessor / coprocessor system
- Interfacing issues
- Synthesis issues
- Size constraints
[Figure: Edge Detector captured in SystemC (class EDGE_DETECTOR), then passed through mapping and synthesis]
Introduction: Prototyping Circuits and Systems
In-System Emulation
- Simulation-like interaction with real I/O, obtained quickly
- Available prior to time-consuming mapping and synthesis
- But slower
[Figure: Edge Detector captured in SystemC (class EDGE_DETECTOR), then emulated in-system]
In-System Emulation of SystemC
How? Port publicly available SystemC libraries to target platforms
- SystemC executable has a built-in event kernel
- Libraries are large and require OS support
[Figure: one SystemC Description targeting an FPGA and two processors]
Bytecode
- Modern portability approach: Java, C#
- Virtual Machine (VM): program that executes bytecode
- May JIT compile to native architecture
[Figure: Java, C# -> Bytecode Compiler -> Bytecode -> VMs running on Opteron, Pentium, and Atom]
SystemC Bytecode?
[Figure: SystemC -> SystemC Bytecode Compiler -> VMs running on Pentium, Opteron + FPGA, and FPGA]
Portable SystemC-on-a-Chip
Task: Create a custom circuit to detect edges in an image.
SystemC bytecode can run on any platform that supports the SystemC emulation engine, without the need for recompilation or synthesis.
[Figure: SystemC Description -> SystemC Bytecode Compiler -> SystemC Bytecode -> Emulation Engine on processor-only platforms and on processor + FPGA platforms with Emulation Accelerators]
SystemC Bytecode Compiler
Pinapa Front End (Moy, EMSOFT'05)
- Extracts architectural features and behavior of each process
- Uses modified versions of GCC and the SystemC kernel
Bytecode Back End
- Flattens the original SystemC circuit
- Generates SystemC bytecode that preserves architectural and behavioral information
- Output is a human-readable text file
Input SystemC description:

    class EDGE_DETECTOR : public sc_module {
        // signal declarations
        …
        EDGE_DETECTOR() {
            SC_METHOD(mainComp); sensitive << dataReady;
            SC_METHOD(getPixel); sensitive << clock.pos();
        }

[Figure: SystemC Description -> Pinapa Front End (ELAB, AST, Link) -> Bytecode Back End (Register Allocation, Code Generation) -> SystemC Bytecode]
SystemC Bytecode
Sequential Instructions
- Based on the RISC MIPS instruction set
- Efficient emulation (Davis 2003)
Spatial Instructions
- Include meta instructions for defining architectural features, bit-width-specific computations, and reading and writing signals
Example SystemC bytecode (spatial constructs mixed with MIPS-like sequential instructions):

    --header
    signal clock : 1
    signal reset : 1
    signal memory_in : 32
    signal fb_data : 32
    signal leds : 4
    process(clock)
    READ $1 memory_in
    ADD $2 $0 3
    ADD $3 $2 $1
    WRITE $3 s1
    ADDI $1 $0 1
    WRITE $1 dataReady
    END
    process(dataReady)
    READ $5 val6
    SW $5 24($0)
    READ $5 val7
    …
    ADDI $10 $0 0
    ADDI $7 $0 0
    ADDI $13 $0 8
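To make the split between spatial and sequential instructions concrete, here is a minimal sketch of how a virtual machine might execute one process body from the listing above. It is not the authors' engine: the names `Signals` and `runProcess` are hypothetical, and the instruction semantics (32 registers with `$0` fixed at zero, `ADD` accepting either a register or an immediate as its last operand, as in `ADD $2 $0 3`) are assumptions read off the listing.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical signal store: signal name -> current value.
using Signals = std::map<std::string, std::uint32_t>;

// Executes the instructions of one process body until END.
// Assumed semantics (not taken from the authors' engine):
//   READ  $r sig    : $r <- value of signal sig
//   WRITE $r sig    : signal sig <- $r
//   ADD   $d $a x   : $d <- $a + x, where x is a register or an immediate
//   ADDI  $d $a imm : $d <- $a + imm
// Any other opcode is ignored by this sketch.
void runProcess(const std::vector<std::string>& body, Signals& signals) {
    std::array<std::uint32_t, 32> reg{};  // all registers start at zero

    // Index of a register operand such as "$3".
    auto regIndex = [](const std::string& tok) -> std::size_t {
        return std::stoul(tok.substr(1));
    };
    // Value of a register operand ("$3") or an immediate ("3").
    auto value = [&](const std::string& tok) -> std::uint32_t {
        if (!tok.empty() && tok[0] == '$') return reg.at(regIndex(tok));
        return static_cast<std::uint32_t>(std::stol(tok));
    };

    for (const std::string& line : body) {
        std::istringstream in(line);
        std::string op, a, b, c;
        in >> op >> a >> b >> c;
        if (op == "END") break;
        if (op == "READ") {
            reg.at(regIndex(a)) = signals[b];
        } else if (op == "WRITE") {
            signals[b] = value(a);
        } else if (op == "ADD" || op == "ADDI") {
            reg.at(regIndex(a)) = value(b) + value(c);
        }
        reg[0] = 0;  // writes to $0 are discarded
    }
}
```

Under these assumed semantics, running the `process(clock)` body with `memory_in` set to 10 leaves `s1` at 13 and `dataReady` at 1.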
SystemC Emulation Engine
Must support a basic SystemC interface
- Clock
- Reset
- 16 I/O pins
- 8 KB Input Memory
- 8 KB Output Memory
- UART
Platforms with more advanced I/O might support more features
- Increased memory
- Extended general-purpose I/O
[Figure: SystemC Circuit with ports Clock, Reset, Input I/O, Output I/O, UART Tx, UART Rx, Input Mem Addr, Input Mem Data, Input Mem Stream, Output Mem Addr, Output Mem Data]
SystemC Emulation Engine
Real I/O Peripherals
- Representative of many systems
Emulation Engine Kernel
- Virtual Machine
- Discrete Event Kernel
- Peripheral Access and Hooks
Optional USB Download Interface
[Figure: Emulation Engine with Main Processor, Instruction Memory, Read Signal Memory, Write Signal Memory, Input Memory, Output Memory, UART, Buttons, LEDs, and USB Interface, grouped into kernel and support peripherals, I/O peripherals, and USB download interface]
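The discrete event kernel named above can be illustrated with a small sketch. It assumes SystemC-style evaluate/update delta cycles: signal writes are buffered, committed in an update phase, and any process sensitive to a changed signal runs in the next evaluate phase. The class `EventKernel` and its methods are hypothetical names for illustration, not the engine's actual code, and the sketch ignores time advance and clock edges.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

class EventKernel {
public:
    using Process = std::function<void(EventKernel&)>;

    // Registers a process that runs whenever `signal` changes value.
    void addProcess(const std::string& signal, Process p) {
        processes_.push_back(std::move(p));
        sensitive_[signal].push_back(processes_.size() - 1);
    }

    // Current (committed) value of a signal; unknown signals read as 0.
    std::uint32_t read(const std::string& signal) const {
        std::map<std::string, std::uint32_t>::const_iterator it = values_.find(signal);
        return it == values_.end() ? 0 : it->second;
    }

    // Writes are buffered and only become visible in the update phase.
    void write(const std::string& signal, std::uint32_t v) { pending_[signal] = v; }

    // Runs delta cycles until no writes are pending; returns the cycle count.
    int run() {
        int deltas = 0;
        while (!pending_.empty()) {
            // Update phase: commit buffered writes, collect triggered processes.
            std::set<std::size_t> runnable;
            for (const auto& kv : pending_) {
                if (read(kv.first) == kv.second) continue;  // no change, no event
                values_[kv.first] = kv.second;
                auto it = sensitive_.find(kv.first);
                if (it != sensitive_.end())
                    runnable.insert(it->second.begin(), it->second.end());
            }
            pending_.clear();
            // Evaluate phase: run each triggered process once.
            for (std::size_t id : runnable) processes_[id](*this);
            ++deltas;
        }
        return deltas;
    }

private:
    std::vector<Process> processes_;
    std::map<std::string, std::vector<std::size_t>> sensitive_;
    std::map<std::string, std::uint32_t> values_;
    std::map<std::string, std::uint32_t> pending_;
};
```

In the emulation engine, each registered process would presumably be a bytecode process executed by the virtual machine; here plain C++ callables stand in for them. Processes that keep toggling each other's signals would loop forever in this sketch, so a real kernel would need a guard.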
Emulation Engine Acceleration
- For some SystemC applications, emulation can be slow
- An edge detection circuit required ~10 minutes to process a 320x240 image*
* On a 100 MHz/SRAM Microblaze SystemC Emulation Engine implementation
[Figure: Emulation Engine with Main Processor executing SystemC bytecode, Instruction Memory, Read Signal Memory, Write Signal Memory, Input Memory, Output Memory, UART, USB Interface, Buttons, LEDs]
Emulation Engine Acceleration
- For some SystemC applications, emulation can be slow
- An edge detection circuit required ~10 minutes to process a 320x240 image*
- If available, use the platform FPGA to create bytecode accelerators
- Accelerators execute SystemC bytecode natively
- Accelerators speed up emulation
* On a 100 MHz Microblaze SystemC Emulation Engine implementation
[Figure: Emulation Engine (Main Processor, memories, UART, USB Interface, Buttons, LEDs) with Accelerator 1, Accelerator 2, and Accelerator 3 on the FPGA]
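To put the ~10 minute figure in perspective, the following back-of-the-envelope sketch converts it into processor cycles per pixel. The helper `cyclesPerPixel` is a hypothetical name, and since the run time is quoted only approximately, the result is an order-of-magnitude estimate.

```cpp
#include <cstdint>

// Processor cycles available per pixel for a run of `seconds` at `clockHz`
// over a width x height image (integer division, rounds down).
constexpr std::uint64_t cyclesPerPixel(std::uint64_t clockHz, std::uint64_t seconds,
                                       std::uint64_t width, std::uint64_t height) {
    return clockHz * seconds / (width * height);
}
```

With the quoted numbers, `cyclesPerPixel(100000000, 600, 320, 240)` gives 781,250 cycles (about 7.8 ms) per pixel across 76,800 pixels, which suggests why running bytecode natively in hardware is attractive.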
SystemC Bytecode Accelerators
- MIPS-like multicycle RISC datapath
- Communicates with the core emulator via memory-mapped registers
- Number of accelerators is limited to the number of masters allowed on the bus
[Figure: Emulation Engine (Main Processor executing SystemC bytecode, memories, UART, USB Interface, Buttons, LEDs) with Accelerators 1-3 on the FPGA; each Accelerator contains a RISC Datapath, Register File, Local Mem, and bus, start, and load logic]
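Communication through memory-mapped registers could look roughly like the sketch below. The register layout (`entry`, `start`, `done`) and the function `runOnAccelerator` are assumptions for illustration; the slides do not give the actual register map. The `step` callback stands in for hardware progress so the sketch can be exercised on a desktop machine.

```cpp
#include <cstdint>

// Hypothetical register block; the real layout is not given in the slides.
struct AcceleratorRegs {
    volatile std::uint32_t entry;  // bytecode address of the process to run
    volatile std::uint32_t start;  // host writes 1 to launch the accelerator
    volatile std::uint32_t done;   // accelerator writes 1 when it has finished
};

// Host side: launch one process and poll until the accelerator reports
// completion. Returns the number of polls that were needed.
template <typename Step>
std::uint32_t runOnAccelerator(AcceleratorRegs& regs, std::uint32_t entry, Step step) {
    regs.done = 0;
    regs.entry = entry;
    regs.start = 1;
    std::uint32_t polls = 0;
    while (regs.done == 0) {
        step(regs);  // on real hardware the accelerator advances on its own
        ++polls;
    }
    regs.start = 0;
    return polls;
}
```

On a real platform the register block would sit at a fixed bus address chosen at system build time, and the polling loop would have an empty body or be replaced by an interrupt.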
SystemC-on-a-Chip Implementation
Test platforms:
- Platform: Xilinx Spartan 3E, Virtex4 ML403, Virtex5 VLX110T*
- Main Processor: Microblaze (50 MHz), PowerPC (50 MHz), Microblaze (100 MHz)
- Bus Platform: OPB, PLB, PLB
- Main Memory: SRAM, SRAM+BRAM, BRAM
- # Emulation Accelerators: 0-1, 1-2, >3
* Currently building
* Demo
SystemC-on-a-Chip Implementation
SystemC Bytecode Compiler
- 3,500 lines of code + Pinapa (20,000 lines)
SystemC Emulation Engine
- 3,000 lines of C + 8,000 lines of VHDL
[Figure: SystemC Bytecode Compiler (Pinapa: ELAB, AST, Link; Back End) and the Emulation Engine (Main Processor, memories, UART, USB Interface, Buttons, LEDs) with Accelerators 1-3 on the FPGA]
SystemC-on-a-Chip Implementation
SystemC Bytecode Accelerator
- 2,000 lines of VHDL
- Area: ~3000 slices
- Clock frequency: 50-100 MHz
[Figure: Emulation Engine with Accelerators 1-3 on the FPGA; each Accelerator contains a RISC Datapath, Register File, Local Mem, and bus, start, and load logic]
SystemC-on-a-Chip Experiments
Competitive with SystemC PC simulation, but with the benefits of real I/O.
[Chart: execution time, normalized to SystemC running on a 2.8 GHz Intel Xeon. Series: Base Emulation on Virtex 4, Base Emulation on Virtex 5, Emulation + Accelerators (Virtex 4), Emulation + Accelerators (Virtex 5)]
Conclusions
- Introduced SystemC bytecode as a means to emulate SystemC for prototyping
- For platforms with FPGA resources, introduced bytecode accelerators to speed up SystemC emulation
  - Accelerators outperform base emulation by over 100X
- As proof of concept, built 3 test platforms and tested multiple SystemC circuits without having to recompile or synthesize
Future Directions
- Emulation architecture improvements
- Synthesizing SystemC just-in-time?