Portable SystemC-on-a-Chip


Scott Sirowy, Bailey Miller, and Frank Vahid†
Department of Computer Science and Engineering, University of California, Riverside
{ssirowy, bmiller, vahid}@cs.ucr.edu
†Also with the Center for Embedded Computer Systems at UC Irvine
This work was supported in part by the National Science Foundation and the Office of Naval Research.

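Introduction: Prototyping Circuits and Systems
Task: Create a custom ASIC/FPGA circuit to detect edges in an image
[Figure: edge detector datapath with pixel inputs s1-s4 and s6-s9, adders, a subtractor, and a MIN stage against 255 producing the pixel value; it connects to a memory controller through data, address, and go signals]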
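The slide's figure only shows that the edge detector combines eight neighboring pixels (s1-s4, s6-s9) with adders and subtractors and then clamps the result with a MIN against 255; the exact arithmetic is not recoverable from the transcript. As a rough software sketch, assuming a Sobel-style operator (an assumption, not something the deck confirms), the computation for one pixel might look like this. The names `Window` and `edgeValue` are hypothetical.

```cpp
#include <algorithm>
#include <array>
#include <cstdint>
#include <cstdlib>

// 3x3 neighborhood in row-major order: s1 s2 s3 / s4 s5 s6 / s7 s8 s9.
// The center pixel s5 (index 4) is unused, matching the inputs in the figure.
using Window = std::array<std::uint8_t, 9>;

// ASSUMPTION: Sobel-style gradients; the deck does not give the exact operator.
std::uint8_t edgeValue(const Window& s) {
    // Horizontal gradient: right column minus left column.
    int gx = (s[2] + 2 * s[5] + s[8]) - (s[0] + 2 * s[3] + s[6]);
    // Vertical gradient: bottom row minus top row.
    int gy = (s[6] + 2 * s[7] + s[8]) - (s[0] + 2 * s[1] + s[2]);
    int magnitude = std::abs(gx) + std::abs(gy);
    // MIN stage: saturate to the 8-bit pixel range.
    return static_cast<std::uint8_t>(std::min(magnitude, 255));
}
```

A uniform window yields 0, and a sharp vertical edge saturates at 255, which is the role the MIN stage appears to play in the figure.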

Introduction: Prototyping Circuits and Systems
Capture in HDL:

    -- VHDL/Verilog file
    entity Edge_Detector is
      port (
        clk  : in std_logic;
        rst  : in std_logic;
        data : in std_logic_vec …
      );

[Figure: edge detector datapath and memory controller, as on the previous slide]

Introduction: Prototyping Circuits and Systems
SystemC
- C++ based
- Creation, instantiation, and connection of components
- Precisely timed communication and execution among concurrently executing components
- Supports both "software" and "hardware" constructs and semantics
Capture in HDL:

    class EDGE_DETECTOR : public sc_module {
      // signal declarations
      …
      EDGE_DETECTOR() {
        SC_METHOD(mainComp);
        sensitive << dataReady;
        SC_METHOD(getPixel);
        sensitive << clock.pos();

[Figure: edge detector datapath and memory controller, as on the previous slides]

Introduction: Prototyping Circuits and Systems
Simulation (on a desktop PC)
- Requires environment modeling, which is sometimes hard
- Does not interact with real I/O

Introduction: Prototyping Circuits and Systems
Implementation (mapping and synthesis)
- Mapping to a microprocessor/coprocessor system
- Interfacing issues
- Synthesis issues
- Size constraints

Introduction: Prototyping Circuits and Systems
In-System Emulation
- Quickly obtained, simulation-like interaction with real I/O
- Available before time-consuming mapping and synthesis
- But slower

In-System Emulation of SystemC
How?
- Port publicly available SystemC libraries to target platforms
- SystemC executable has a built-in event kernel
- Libraries are large and require OS support
[Figure: a SystemC description targeting processor and FPGA platforms]

Bytecode
- Modern portability approach (Java, C#)
- Virtual Machine (VM): program that executes bytecode
- May JIT compile to native architecture
[Figure: Java/C# source -> bytecode compiler -> bytecode -> VMs running on Opteron, Pentium, and Atom]

SystemC Bytecode?
[Figure: SystemC -> SystemC bytecode compiler -> VMs running on Pentium, Opteron + FPGA, and FPGA]

Portable SystemC-on-a-Chip
Task: Create a custom circuit to detect edges in an image
- SystemC bytecode can run on any platform that supports the SystemC emulation engine, without the need for recompilation or synthesis
[Figure: SystemC description -> SystemC bytecode compiler -> SystemC bytecode, which runs on emulation engines hosted on processors and on an FPGA with emulation accelerators]

SystemC Bytecode Compiler
Example input:

    class EDGE_DETECTOR : public sc_module {
      // signal declarations
      …
      EDGE_DETECTOR() {
        SC_METHOD(mainComp);
        sensitive << dataReady;
        SC_METHOD(getPixel);
        sensitive << clock.pos();
      }

Pinapa Front End (Moy, EMSOFT'05)
- Extracts architectural features and behavior of each process
- Uses modified versions of GCC and the SystemC kernel
Bytecode Back End
- Flattens the original SystemC circuit
- Generates SystemC bytecode that preserves architectural and behavioral information
- Output is a human-readable text file
[Figure: SystemC description -> Pinapa front end (ELAB, AST, Link) -> bytecode back end (register allocation, code generation) -> SystemC bytecode]
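To make the "human-readable text file" output concrete, here is a minimal sketch of a final emission step, assuming the back end already holds a flattened list of signals and processes. The `Signal` and `Process` structs and the `emitBytecode` function are hypothetical and are not taken from the authors' compiler; only the textual layout (`--header`, `signal name : width`, `process(...)`, `END`) follows the bytecode listing shown in the deck.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical flattened description produced by earlier compiler stages.
struct Signal {
    std::string name;
    int width;  // bit width
};

struct Process {
    std::string sensitivity;                // signal that triggers the process
    std::vector<std::string> instructions;  // already-generated bytecode lines
};

// Emit the textual bytecode: a header of signal declarations followed by
// one block per process, each terminated by END.
std::string emitBytecode(const std::vector<Signal>& signals,
                         const std::vector<Process>& processes) {
    std::ostringstream out;
    out << "--header\n";
    for (const Signal& s : signals) {
        out << "signal " << s.name << " : " << s.width << "\n";
    }
    for (const Process& p : processes) {
        out << "process(" << p.sensitivity << ")\n";
        for (const std::string& insn : p.instructions) {
            out << insn << "\n";
        }
        out << "END\n";
    }
    return out.str();
}
```

Keeping the output as plain text means the same file can be inspected by hand and loaded unchanged by any emulation engine.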

SystemC Bytecode
Sequential Instructions
- Based on the RISC MIPS instruction set
- Efficient emulation (Davis 2003)
Spatial Instructions
- Include meta instructions for defining architectural features, bit-width-specific computations, and reading and writing signals

Example SystemC bytecode (spatial constructs plus MIPS-like sequential instructions):

    --header
    signal clock : 1
    signal reset : 1
    signal memory_in : 32
    signal fb_data : 32
    signal leds : 4
    process(clock)
    READ $1 memory_in
    ADD $2 $0 3
    ADD $3 $2 $1
    WRITE $3 s1
    ADDI $1 $0 1
    WRITE $1 dataReady
    END
    process(dataReady)
    READ $5 val6
    SW $5 24($0)
    READ $5 val7
    …
    ADDI $10 $0 0
    ADDI $7 $0 0
    ADDI $13 $0 8
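The mix of spatial and sequential instructions can be illustrated with a small interpreter for one process body. This is only a sketch under stated assumptions: `READ` copies a signal into a register, `WRITE` copies a register into a signal truncated to the signal's declared width, `$0` always reads as zero, and `ADD`/`ADDI` accept either a register or an immediate as the last operand (the listing uses both forms). The `Machine`, `Signal`, and `runProcess` names are hypothetical, and opcodes other than the four handled here are silently skipped.

```cpp
#include <cstdint>
#include <map>
#include <sstream>
#include <string>
#include <vector>

struct Signal {
    int width;            // declared bit width
    std::uint32_t value;  // current value
};

struct Machine {
    std::uint32_t reg[32] = {};
    std::map<std::string, Signal> signals;
};

// "$7" -> 7
static int regIndex(const std::string& tok) { return std::stoi(tok.substr(1)); }

// A register token yields the register's value; anything else is an immediate.
static std::uint32_t operand(const Machine& m, const std::string& tok) {
    if (!tok.empty() && tok[0] == '$') return m.reg[regIndex(tok)];
    return static_cast<std::uint32_t>(std::stol(tok));
}

// Execute one process body until END (assumed semantics, see lead-in).
void runProcess(Machine& m, const std::vector<std::string>& body) {
    for (const std::string& line : body) {
        std::istringstream in(line);
        std::string op, a, b, c;
        in >> op >> a >> b >> c;
        if (op == "END") break;
        if (op == "READ") {            // spatial: signal -> register
            m.reg[regIndex(a)] = m.signals.at(b).value;
        } else if (op == "WRITE") {    // spatial: register -> signal
            Signal& s = m.signals.at(b);
            std::uint32_t mask =
                s.width >= 32 ? 0xFFFFFFFFu : ((1u << s.width) - 1u);
            s.value = m.reg[regIndex(a)] & mask;
        } else if (op == "ADD" || op == "ADDI") {  // sequential, MIPS-like
            m.reg[regIndex(a)] = operand(m, b) + operand(m, c);
        }
        m.reg[0] = 0;  // keep $0 hardwired to zero
    }
}
```

Running the first process from the listing with `memory_in` set to 5 leaves 8 in `s1` and 1 in `dataReady`, provided both signals are declared beforehand (the listing's header does not show them, so their widths are assumptions too).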

SystemC Emulation Engine
Must support a basic SystemC interface:
- Clock
- Reset
- 16 I/O pins
- 8KB input memory
- 8KB output memory
- UART
Platforms with more advanced I/O might support more features:
- Increased memory
- Extended general-purpose I/O
[Figure: SystemC circuit ports: Clock, Reset, Input I/O, Output I/O, UART Rx, UART Tx, Input Mem Addr, Input Mem Data, Input Mem Stream, Output Mem Addr, Output Mem Data]
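One way to picture the basic interface is as a plain data structure that an emulation engine exposes to the emulated circuit. The sketch below is an illustration only: the `BasicInterface` type, its field names, and the bounds-checked accessors are hypothetical, and treating the 16 I/O pins as a single 16-bit word is an assumption. Only the sizes (16 pins, 8KB input memory, 8KB output memory) come from the slide.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kIoPins = 16;             // from the slide
constexpr std::size_t kMemoryBytes = 8 * 1024;  // 8KB each, from the slide

// Hypothetical view of the minimal interface every platform must provide.
struct BasicInterface {
    bool clock = false;
    bool reset = false;
    std::uint16_t ioPins = 0;  // ASSUMPTION: one bit per I/O pin
    std::array<std::uint8_t, kMemoryBytes> inputMemory{};
    std::array<std::uint8_t, kMemoryBytes> outputMemory{};
    std::uint8_t uartRx = 0;   // last byte received
    std::uint8_t uartTx = 0;   // next byte to transmit

    // Read one byte of input memory; returns false if addr is out of range.
    bool readInput(std::size_t addr, std::uint8_t& value) const {
        if (addr >= kMemoryBytes) return false;
        value = inputMemory[addr];
        return true;
    }

    // Write one byte of output memory; returns false if addr is out of range.
    bool writeOutput(std::size_t addr, std::uint8_t value) {
        if (addr >= kMemoryBytes) return false;
        outputMemory[addr] = value;
        return true;
    }
};

static_assert(sizeof(std::uint16_t) * 8 == kIoPins, "one bit per I/O pin");
```

A platform with more memory or extended I/O could grow these fields without changing the bytecode that targets the basic subset.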

SystemC Emulation Engine
- Real I/O peripherals
  - Representative of many systems
- Emulation engine kernel
  - Virtual machine
  - Discrete event kernel
  - Peripheral access and hooks
- Optional USB download interface
[Figure: emulation engine block diagram: main processor, instruction memory, input memory, output memory, read signal memory, write signal memory, UART, USB interface, buttons, and LEDs]
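The discrete event kernel is the part that decides which processes run when a signal changes. The following is a minimal sketch of a delta-cycle loop in the general style of SystemC's evaluate/update scheme; it is not the authors' kernel. The `Kernel` class and its methods are hypothetical, signals are plain integers, and each process is sensitive to exactly one signal.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

class Kernel {
public:
    using Process = std::function<void(Kernel&)>;

    // Register a process that runs whenever `trigger` changes value.
    void addProcess(const std::string& trigger, Process body) {
        sensitive_[trigger].push_back(processes_.size());
        processes_.push_back(std::move(body));
    }

    // Signals that were never written read as 0.
    int read(const std::string& name) const {
        auto it = current_.find(name);
        return it == current_.end() ? 0 : it->second;
    }

    // Writes are deferred until the next update phase.
    void write(const std::string& name, int value) { pending_[name] = value; }

    // Run delta cycles until no writes remain (or maxDeltas is reached).
    // Returns the number of delta cycles executed.
    int settle(int maxDeltas = 1000) {
        int deltas = 0;
        while (!pending_.empty() && deltas < maxDeltas) {
            // Update phase: commit writes and collect triggered processes.
            std::set<std::size_t> runnable;
            for (const auto& update : pending_) {
                if (read(update.first) == update.second) continue;  // no change
                current_[update.first] = update.second;
                auto it = sensitive_.find(update.first);
                if (it != sensitive_.end()) {
                    runnable.insert(it->second.begin(), it->second.end());
                }
            }
            pending_.clear();
            // Evaluate phase: run each triggered process once.
            for (std::size_t id : runnable) processes_[id](*this);
            ++deltas;
        }
        return deltas;
    }

private:
    std::vector<Process> processes_;
    std::map<std::string, std::vector<std::size_t>> sensitive_;
    std::map<std::string, int> current_;
    std::map<std::string, int> pending_;
};
```

Because writes only become visible in the update phase, a process triggered by `dataReady` sees the value of `s1` written in the same delta cycle, which mirrors the two-process pattern in the deck's bytecode listing.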

Emulation Engine Acceleration
- For some SystemC applications, emulation can be slow
  - An edge detection circuit required ~10 minutes to process a 320x240 image*
* On a 100 MHz/SRAM Microblaze SystemC Emulation Engine implementation
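A quick back-of-the-envelope calculation shows how heavy pure software emulation is here. Taking "~10 minutes" as exactly 600 seconds (an approximation of the slide's figure) on a 100 MHz processor, a 320x240 image works out to roughly 780,000 processor cycles per pixel. The helper name `cyclesPerPixel` is hypothetical.

```cpp
#include <cstdint>

// Processor cycles spent per pixel for a run of `seconds` at `clockHz`
// over a width x height image (integer division).
constexpr std::uint64_t cyclesPerPixel(std::uint64_t clockHz,
                                       std::uint64_t seconds,
                                       std::uint64_t width,
                                       std::uint64_t height) {
    return (clockHz * seconds) / (width * height);
}

// 100 MHz * 600 s = 6e10 cycles; 320 * 240 = 76,800 pixels.
static_assert(cyclesPerPixel(100000000ULL, 600, 320, 240) == 781250,
              "about 0.78 million cycles per pixel");
```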

Emulation Engine Acceleration
- If available, use the platform FPGA to create bytecode accelerators
  - Execute SystemC bytecode natively
- FPGA accelerators speed up emulation
[Figure: emulation engine with Accelerators 1-3 implemented in the FPGA]

SystemC Bytecode Accelerators
- MIPS-like multicycle RISC datapath
- Communicates with the core emulator via memory-mapped registers
- Number of accelerators is limited to the number of masters allowed on the bus
[Figure: each accelerator contains a RISC datapath, register file, local memory, and bus/start/load logic; Accelerators 1-3 sit in the FPGA alongside the emulation engine]
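The slide says only that the accelerators talk to the core emulator through memory-mapped registers and that they contain start and load logic. The sketch below shows what a driver for such an interface could look like. The register layout, bit assignments, and function names are all hypothetical; the deck does not document them. On real hardware the `AcceleratorRegs` reference would be obtained from the accelerator's bus address, while here it can simply point at ordinary memory for experimentation.

```cpp
#include <cstdint>

// HYPOTHETICAL register block; the actual layout is not given in the deck.
struct AcceleratorRegs {
    volatile std::uint32_t control;   // bit 0: start
    volatile std::uint32_t status;    // bit 0: done
    volatile std::uint32_t loadAddr;  // word address in accelerator local memory
    volatile std::uint32_t loadData;  // word to store at loadAddr
};

constexpr std::uint32_t kStartBit = 1u << 0;
constexpr std::uint32_t kDoneBit = 1u << 0;

// Load one bytecode word into the accelerator's local memory.
void loadWord(AcceleratorRegs& regs, std::uint32_t addr, std::uint32_t word) {
    regs.loadAddr = addr;
    regs.loadData = word;
}

// Ask the accelerator to begin executing the loaded process.
void start(AcceleratorRegs& regs) {
    regs.control = regs.control | kStartBit;
}

// Poll the done flag up to maxPolls times; returns true once it is set.
bool pollDone(const AcceleratorRegs& regs, unsigned maxPolls) {
    for (unsigned i = 0; i < maxPolls; ++i) {
        if (regs.status & kDoneBit) return true;
    }
    return false;
}
```

The `volatile` qualifiers keep the compiler from caching or reordering away the register accesses, which matters once the structure maps onto real device registers.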

SystemC-on-a-Chip Implementation
Test platforms (values listed in platform order):
- Platform: Xilinx Spartan 3E; Virtex4 ML403; Virtex5 VLX110T*
- Main processor: Microblaze (50 MHz); PowerPC (50 MHz); Microblaze (100 MHz)
- Bus: OPB; PLB; PLB
- Number of emulation accelerators: 0-1; 1-2; >3
Main memory types used: SRAM, SRAM+BRAM, BRAM
* Currently building
* Demo

SystemC-on-a-Chip Implementation
- SystemC bytecode compiler: 3,500 lines of code + Pinapa (20,000 lines)
- SystemC emulation engine: 3,000 lines of C + 8,000 lines of VHDL

SystemC-on-a-Chip Implementation
SystemC bytecode accelerator:
- 2,000 lines of VHDL
- Area: ~3000 slices
- Clock frequency: 50-100 MHz

SystemC-on-a-Chip Experiments
- Competitive with SystemC PC simulation, but with the benefits of real I/O
[Figure: execution time, normalized to SystemC running on a 2.8 GHz Intel Xeon, for four configurations: base emulation on Virtex 4, base emulation on Virtex 5, emulation + accelerators (Virtex 4), and emulation + accelerators (Virtex 5)]

Conclusions
- Introduced SystemC bytecode as a means to emulate SystemC for prototyping
- For platforms with FPGA resources, introduced bytecode accelerators to speed up SystemC performance
  - Outperforms emulation by over 100X
- As proof of concept, built 3 test platforms and tested multiple SystemC circuits without having to recompile or synthesize
Future Directions
- Emulation architecture improvements
- Synthesizing SystemC just-in-time?