
CoDeveloper Overview Updated February 19, 2004

2 Introducing CoDeveloper™
- Targeting hardware/software programmable platforms
  - Target platforms feature a mix of traditional processor elements and programmable hardware elements
  - Altera, Xilinx, and other key platforms
- Emphasizing software-oriented design methods
  - Compatible with familiar software development frameworks
  - Including Visual Studio, CodeWarrior, GCC, Eclipse, etc.
- Providing crucial support for hardware/software interfaces
  - Tools efficiently map the application to the platform
  - Programming model supports highly parallel hw/sw systems
- Providing more than just C-to-hardware compilation!

3 CoDeveloper Design Flow
[Diagram labels: Impulse C design files; CoBuilder™ RTL generator; CoBuilder™ architecture generator; CoBuilder™ library generator; Impulse Platform Libraries; Impulse hardware libraries; Impulse software libraries; Generated HDL files; Generated software libraries; FPGA synthesis tool; Target cross compiler]
Result: complete software/hardware application ready to implement on target platform

4 Accelerating Applications: How?
1. Identify and exploit low-level parallelism in standard C
   - Examples: pipeline and/or unroll loops, schedule assignments, generate hardware for C processes
   - Advantage: can accelerate legacy C algorithms, inner code loops, etc. and reduce the need for manual optimization
2. Encourage and assist programmers in the use of alternate, more efficient parallel programming techniques
   - Support the language they are most familiar with (ANSI C)
   - Provide libraries, tools and examples for parallel programming
   - Advantage: best way to address system-level optimizations
   - This is where the biggest performance gains are found!
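As an illustration of the "unroll loops" idea in item 1, here is a minimal sketch in plain ANSI-style C. It is not CoBuilder output and uses no Impulse C features; the function names `dot_rolled` and `dot_unrolled4` are hypothetical. The unrolled version keeps four independent partial sums, which is the kind of structure a hardware compiler could plausibly map to four parallel multiply-accumulate units.

```c
#include <assert.h>
#include <stddef.h>

/* Baseline: one multiply-accumulate per loop iteration. */
long dot_rolled(const int *a, const int *b, size_t n)
{
    assert(a != NULL && b != NULL);
    long sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += (long)a[i] * b[i];
    return sum;
}

/* Unrolled by 4 with independent partial sums (hypothetical example).
 * The four accumulations in the loop body do not depend on each other,
 * so they can be computed side by side. */
long dot_unrolled4(const int *a, const int *b, size_t n)
{
    assert(a != NULL && b != NULL);
    long s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += (long)a[i]     * b[i];
        s1 += (long)a[i + 1] * b[i + 1];
        s2 += (long)a[i + 2] * b[i + 2];
        s3 += (long)a[i + 3] * b[i + 3];
    }
    /* Handle the leftover 0..3 elements. */
    for (; i < n; i++)
        s0 += (long)a[i] * b[i];
    return s0 + s1 + s2 + s3;
}
```

Both functions return the same value for any input; only the shape of the loop differs. On a processor the gain is modest, whereas in hardware the independent sums are what make parallel evaluation possible.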

5 Application Domains
- Applications requiring repetitive computations at very high speed
  - Dataflow-oriented, high degrees of parallelism
  - Pipelined algorithms (e.g. filters)
- For processing streams of data in real time
  - Imaging
  - Communications
  - Digital Signal Processing (DSP)

6 Targeting Programmable Platforms
Two categories:
- Standard RISC processor plus FPGA
- FPGA with embedded (“soft”) RISC processor
[Diagram: Impulse C applications targeting both RISC/FPGA configurations]

7 Who Makes These Platforms?
- Altera: Nios™ processor
  - Cyclone and Stratix FPGA families
- Xilinx: MicroBlaze™, PowerPC™ processors
  - Virtex and Spartan FPGA families
- QuickLogic: QuickMIPS™
- Others yet to be announced
- This is a HOT area in the FPGA industry

8 Mapping Applications to Platforms
[Diagram: S/W processes and H/W processes mapped onto an FPGA-based platform made up of an embedded processor, a bus interface, and FPGA hardware resources]
This is not a trivial task!

9 Example: Altera Nios
[Diagram: Impulse C application (S/W processes and H/W processes) mapped onto a Cyclone or Stratix FPGA with a Nios processor, an Avalon interface, and FPGA hardware resources]

10 Example: Xilinx MicroBlaze
[Diagram: Impulse C application (S/W processes and H/W processes) mapped onto a Virtex or Spartan FPGA with a MicroBlaze or PowerPC processor, an OPB and/or FSL interface, and FPGA hardware resources]

11 How Is This Done Today?
- Software engineer writes code (typically C) for embedded processor
  - High design productivity using high-level languages
  - But low-performance results (processors are SLOW)
- Hardware engineer writes low-level HDL code for FPGA or ASIC portion
  - Very low design productivity (1/10th or less of the software engineer's)
  - But high-performance results (hardware runs FAST)
- System designer is the hardware/software mediator
  - Specifies hardware/software interfaces and locks down the design to reflect hardware design lead-times

12 How Does CoDeveloper Help?
- Reduces or eliminates the need to re-implement software routines as low-level hardware
- Automates the process of creating hardware/software interfaces
- Provides a software programming model appropriate for the target platform

13 Impulse for HW/SW Codesign
[Diagram labels: Impulse C design files; Impulse Platform Libraries; Generate RTL; Generate hardware interfaces; Generate software interfaces; HDL files; Software libraries; Altera Quartus, Xilinx ISE, other FPGA tools; Nios compiler, MicroBlaze compiler, others; Visual Studio™, CodeWarrior™, GCC, etc.]

14 From C Application to FPGA Platform
1. Platform libraries support existing embedded compiler environments
2. Automatic generation of hardware/software interfaces is optimized for target platforms
3. FPGA hardware is automatically created from C language processes
The result? Accelerated software with minimal need for hardware or FPGA design knowledge
[Diagram: S/W processes and H/W processes]

15 CoDeveloper™ Tool Flow
[Diagram labels: Impulse C design files; Legacy C algorithms; Impulse Design Assistant; Impulse Simulation Libraries; Impulse Platform Libraries; Host standard C compiler; Host simulation executable; Host C debugger; CoMonitor™ application monitor; CoBuilder™ RTL generator; Generated HDL files; HDL design files; CoValidator™ HDL simulator; CoWave™ waveform viewer; FPGA synthesis tool; Target cross compiler; Target download binary; Target ISS and/or debugger]

16 Impulse C
- Based on Streams-C from Los Alamos National Labs
  - Dataflow-oriented C language extensions for highly parallel systems
  - HDL generation for FPGA elements
  - C language or other code generation for processor elements
- Impulse improvements include:
  - Language redesign (standard ANSI C with Impulse C library functions)
  - Redesigned scheduling and code generation for FPGA targets
  - Application Manager, Application Monitor and full compatibility with Visual Studio and other standard development/debugging environments

17 Impulse C Programming Model
- Modified Communicating Sequential Processes (CSP) model
- Buffered communication channels (FIFOs) to implement streams
- Supports dataflow and message-based communications between functional units and local or shared memories
- Supports parallelism at the application level and at the level of individual processes (via automated scheduling/pipelining)
[Diagram: S/W process, H/W process, S/W process]
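The buffered-stream idea can be sketched in plain C as a bounded FIFO sitting between processes. This is a host-side illustration only, not the Impulse C library: the names `stream_t`, `stream_write`, `stream_read`, `double_process_step`, and `run_pipeline` are hypothetical, and the three "processes" are stepped in turn by a simple loop rather than running concurrently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define STREAM_DEPTH 4  /* FIFO depth; an arbitrary choice for this sketch */

/* Bounded FIFO standing in for a stream (hypothetical type). */
typedef struct {
    int data[STREAM_DEPTH];
    size_t head;   /* index of the oldest element */
    size_t count;  /* number of buffered elements */
} stream_t;

/* Returns false instead of blocking when the FIFO is full. */
bool stream_write(stream_t *s, int v)
{
    if (s->count == STREAM_DEPTH)
        return false;
    s->data[(s->head + s->count) % STREAM_DEPTH] = v;
    s->count++;
    return true;
}

/* Returns false instead of blocking when the FIFO is empty. */
bool stream_read(stream_t *s, int *v)
{
    if (s->count == 0)
        return false;
    *v = s->data[s->head];
    s->head = (s->head + 1) % STREAM_DEPTH;
    s->count--;
    return true;
}

/* One step of a stand-in "hardware" process: read a value, double it,
 * and pass it on. It stalls if input is empty or output is full. */
bool double_process_step(stream_t *in, stream_t *out)
{
    int v;
    if (out->count == STREAM_DEPTH)
        return false;
    if (!stream_read(in, &v))
        return false;
    bool ok = stream_write(out, 2 * v);
    assert(ok);
    (void)ok;
    return true;
}

/* Producer -> process -> consumer, stepped round-robin.
 * Returns the number of values delivered to dst. */
size_t run_pipeline(const int *src, int *dst, size_t n)
{
    stream_t in = {0}, out = {0};
    size_t produced = 0, consumed = 0;
    while (consumed < n) {
        if (produced < n && stream_write(&in, src[produced]))
            produced++;
        double_process_step(&in, &out);
        int v;
        if (stream_read(&out, &v))
            dst[consumed++] = v;
    }
    return consumed;
}
```

Because each process touches only its own streams, the producer, the doubling step, and the consumer share no other state. That separation is what lets a tool place one process on a processor and another in FPGA logic without changing the communication pattern.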

18 Impulse C Processes
- Software processes set up data and perform non-time-critical functions
- Hardware processes are independently synchronized and perform most of the work
[Diagram: a C language process with stream inputs, stream outputs, signal inputs, signal outputs, shared memory reads/writes, and App Monitor outputs]

19 CoBuilder Compilation Process
[Diagram labels: Impulse C design files; CoBuilder™ RTL generator; CoBuilder™ architecture generator; CoBuilder™ library generator; Impulse Platform Libraries; Impulse hardware libraries; Impulse software libraries; Generated HDL files; Generated software libraries; FPGA synthesis tool; Target cross compiler]
Result: complete software/hardware application ready to implement on target platform

20 Impulse C H/W-S/W interfaces
[Diagram: layered hardware/software interface stack, with callouts]
- Application software (embedded processor application): Impulse C application, compiled using standard C cross-compiler and related tools
- Impulse C runtime library: Impulse C function library for stream and signal h/w-s/w interfaces
- Processor/FPGA interface (s/w driver generated by CoBuilder): interface routines (drivers) automatically generated by CoBuilder for the selected platform
- FPGA platform bus: standard bus specified for target platform
- FPGA hardware wrapper (generated by CoBuilder as HDL): bus interface wrapper automatically generated by CoBuilder for the selected platform
- Impulse C hardware library: library of h/w interface components
- Impulse C hardware processes (generated by CoBuilder as HDL): Impulse C h/w processes, compiled to HDL by CoBuilder and synthesized using standard tools

21 Impulse C Case Study: 3DES Encryption (Xilinx MicroBlaze platform)

22 Case Study: 3DES Encryption
- Use publicly available 3DES source code
  - Written by Phil Karn (
  - Not written to take advantage of parallel programming techniques (optimized for typical processor targets)
- Make minimal changes in support of streams-based communication
  - Use stream reads/writes for data blocks and keys
- Simulate legacy and Impulse C versions using same data
  - Build one simulation executable containing both versions
- Compile to hardware, compare results again
  - Legacy version runs in uP, Impulse C version runs in hardware
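The "one executable containing both versions" approach can be sketched as a small comparison harness. Everything here is a stand-in: `legacy_encrypt` is a toy 64-bit transform and NOT 3DES, and `word_stream`, `ws_read`, `ws_write`, `streamed_encrypt`, and `count_mismatches` are hypothetical names, not Impulse C calls. The sketch assumes the key arrives as the first word on the input stream, followed by the data blocks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_BLOCKS 16  /* harness buffer size; arbitrary for this sketch */

/* Stand-in for the legacy routine: a toy transform, NOT 3DES. */
uint64_t legacy_encrypt(uint64_t block, uint64_t key)
{
    uint64_t x = block ^ key;
    return (x << 13) | (x >> 51);  /* rotate left by 13 bits */
}

/* Minimal word stream: a cursor over a caller-provided buffer. */
typedef struct {
    uint64_t *buf;
    size_t len;
    size_t pos;
} word_stream;

int ws_read(word_stream *s, uint64_t *w)
{
    if (s->pos >= s->len)
        return 0;
    *w = s->buf[s->pos++];
    return 1;
}

int ws_write(word_stream *s, uint64_t w)
{
    if (s->pos >= s->len)
        return 0;
    s->buf[s->pos++] = w;
    return 1;
}

/* Stream-based wrapper: read the key, then encrypt blocks until the
 * input runs dry. Returns the number of blocks written. */
size_t streamed_encrypt(word_stream *in, word_stream *out)
{
    uint64_t key, block;
    size_t n = 0;
    if (!ws_read(in, &key))
        return 0;
    while (ws_read(in, &block) && ws_write(out, legacy_encrypt(block, key)))
        n++;
    return n;
}

/* Run both versions on the same data and count differing outputs. */
size_t count_mismatches(uint64_t key, const uint64_t *blocks, size_t n)
{
    assert(n <= MAX_BLOCKS);
    uint64_t in_buf[MAX_BLOCKS + 1], out_buf[MAX_BLOCKS];
    in_buf[0] = key;
    for (size_t i = 0; i < n; i++)
        in_buf[i + 1] = blocks[i];
    word_stream in = { in_buf, n + 1, 0 };
    word_stream out = { out_buf, n, 0 };
    size_t done = streamed_encrypt(&in, &out);
    size_t bad = n - done;  /* missing outputs count as mismatches */
    for (size_t i = 0; i < done; i++)
        if (out_buf[i] != legacy_encrypt(blocks[i], key))
            bad++;
    return bad;
}
```

A result of zero from `count_mismatches` means the stream plumbing (key ordering, block ordering, block count) preserved the legacy behavior. The same check can then be repeated once the streamed version runs in hardware.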

23 Application Structure (HW Test)
[Diagram: reference/prototype board with a “platform” FPGA. Blocks: Producer (random data); H/W encrypt; S/W encrypt (legacy C); Consumer (compare results and speed). Regions: embedded processor; FPGA gates]

24 3DES Desktop Simulation
- Complete 3DES application compiled using standard tools
  - Legacy C and Impulse C compiled as one application
  - Verified using both Visual Studio .NET and GCC/GDB
- Application Monitor used to debug and improve stream data movement
  - Example: identified throughput increase possible through simple increase in buffer sizes (double-buffer inputs)

25 3DES Desktop Simulation
[Diagram blocks: Producer (from file); H/W encrypt (Impulse C); S/W encrypt (legacy C); H/W decrypt (Impulse C); S/W decrypt (legacy C); Consumer (display results)]

26 Desktop Simulation
[Screenshot: CoDeveloper Application Monitor]

27 Compiler Flow
[Diagram labels: Impulse C design files; Impulse Platform Libraries; Generate RTL; Generate hardware interfaces; Generate software interfaces; HDL files; Software libraries; Altera Quartus, Xilinx ISE, other FPGA tools; Nios compiler, MicroBlaze compiler, others; Visual Studio™, CodeWarrior™, GCC, etc.]

28 Results (Virtex II FPGA)
- MicroBlaze clock running at 100 MHz
- 3DES algorithm clocking at 24 MHz
- Performance (1000 blocks):
  - Clock cycles to process one block: 149 (not including I/O overhead)
  - 36X speedup over software-only version

  3DES in MicroBlaze    3DES in FPGA
  252 ms                6.9 ms*

  *Includes I/O overhead: 11%
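The reported figures can be cross-checked with a few lines of arithmetic. The helper names `cycles_to_us`, `add_overhead`, and `speedup` are hypothetical, and reading the 11% as overhead added on top of the compute time is an assumption; the slide does not say how the percentage is defined.

```c
#include <assert.h>

/* Microseconds needed for `cycles` clock cycles at `clock_mhz` MHz. */
double cycles_to_us(double cycles, double clock_mhz)
{
    assert(clock_mhz > 0.0);
    return cycles / clock_mhz;
}

/* Assumption: overhead is a fraction added on top of compute time. */
double add_overhead(double compute_time, double overhead_fraction)
{
    return compute_time * (1.0 + overhead_fraction);
}

/* Ratio of baseline time to accelerated time. */
double speedup(double baseline_ms, double accelerated_ms)
{
    assert(accelerated_ms > 0.0);
    return baseline_ms / accelerated_ms;
}
```

With the slide's numbers, `cycles_to_us(149, 24)` gives roughly 6.2 µs per block, so 1000 blocks take about 6.2 ms of compute. Adding 11% brings that to about 6.9 ms, matching the reported FPGA time, and `speedup(252, 6.9)` comes to about 36.5, consistent with the stated 36X.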

29 Summary
- CoDeveloper and Impulse C provide:
  - Programming model and tools supporting hardware/software codesign for programmable hardware targets
  - Compiler technologies (CoBuilder) allowing software processes to be compiled directly to hardware
- CoDeveloper is a software-oriented solution
  - Designed (and priced) to appeal to software developers, not just leading-edge FPGA experts
  - Designed to operate in conjunction with existing software development tools and tool flows
  - Designed to accelerate hardware/software prototypes and products