Workshop - November 2011 - Toulouse A.BERJAOUI (AKKA IS for Astrium) A.LEFEVRE & C. LE LANN (Astrium) SystemC/TLM virtual platforms Use of SystemC/TLM.

SystemC/TLM virtual platforms
Use of SystemC/TLM virtual platforms for the exploration, the specification and the validation of critical embedded SoCs
Workshop - November 2011 - Toulouse
A. BERJAOUI (AKKA IS for Astrium), A. LEFEVRE & C. LE LANN (Astrium)

Overview
- Context
- Separation of time & functionality presentation
- Timed TLM models vs. CABA models
- Design Space Exploration with SystemC/TLM 2.0
- HW in the loop – Use of CHIPit®
- Future prospects
- Open questions

Context
- Define a proper method to use SystemC/TLM for SoC modelling
- Use SystemC/TLM for design space exploration (DSE): performance estimation, bottleneck identification…
- Use SystemC/TLM models for HW specification
- Evaluate the selected methodology

Programmer's View (PV) or functional simulation
- Time is not represented; only functionality is modelled.
- Functional synchronization is still necessary. It is done at System Synchronization Points (SSPs): configuration register accesses, interrupts and all state-altering accesses.
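To make the SSP idea concrete, here is a minimal sketch in plain C++ (not SystemC). It assumes, for illustration only, that configuration-register accesses and interrupts are the only state-altering accesses; `AccessKind`, `Access` and `segments_between_ssps` are hypothetical names. An untimed initiator can run each segment back-to-back and only needs to synchronize with the rest of the platform at the SSP that closes it.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical access categories for an untimed (PV) initiator.
enum class AccessKind { Data, ConfigRegister, Interrupt };

struct Access {
    AccessKind kind;
    std::uint32_t addr;
};

// Assumption: configuration-register accesses and interrupts stand in for all
// state-altering accesses, i.e. they are the System Synchronization Points.
bool is_ssp(const Access& a) { return a.kind != AccessKind::Data; }

// Cut an initiator's access stream into segments that can run back-to-back
// without synchronizing with other modules; each segment ends at an SSP
// (or at the end of the stream).
std::vector<std::vector<Access>> segments_between_ssps(const std::vector<Access>& stream) {
    std::vector<std::vector<Access>> segments(1);
    for (const Access& a : stream) {
        segments.back().push_back(a);
        if (is_ssp(a)) {
            segments.emplace_back();  // synchronize here, then open a new segment
        }
    }
    if (segments.back().empty()) segments.pop_back();  // drop a trailing empty segment
    return segments;
}
```

For a stream "data, data, config register, data, interrupt", this yields two segments: the initiator synchronizes once after the register access and once after the interrupt, and never for plain data accesses.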

The need for time
- Performance measurements
- Design Space Exploration
…but how? With what precision? At what modelling granularity? With what simulation performance?

The obvious solution: mixing time and functionality
It works! …but:
- Functional modifications cannot be verified without also re-verifying all timed aspects
- Modelling granularity is hard to modify once it has been set
- Modules cannot easily be reused for other platforms

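Separation of time & functionality
[Figure: ISS, Router and Memory columns; each component exists as a PV model, a T model and a combined PVT model (ISS PV/T/PVT, Memory PV/T/PVT); the Router column contains a PV router and a detailed bus model; legend: initiator port, target port]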

[Figure: a functional simulation phase followed by a timed simulation phase, using the ISS, router and memory models (ISS PV/T/PVT, PV router, detailed bus model, Memory PV/T/PVT); simulation time advances from T = 0 ns to T = 12 ns in 1 ns steps]
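The PV/T/PVT split can be illustrated with a small plain-C++ sketch (it does not use the SystemC/TLM-2.0 API). `MemoryPV`, `MemoryT` and `MemoryPVT` are hypothetical names and the latencies are placeholder values. The functional part only moves data, the timing part only computes durations, and the PVT model composes the two, so the timing granularity changes by swapping the T model alone.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Functional (PV) part: moves data, knows nothing about time.
struct MemoryPV {
    std::map<std::uint32_t, std::uint32_t> cells;
    std::uint32_t read(std::uint32_t addr) const {
        auto it = cells.find(addr);
        return it == cells.end() ? 0u : it->second;  // unwritten words read as 0
    }
    void write(std::uint32_t addr, std::uint32_t data) { cells[addr] = data; }
};

// Timing (T) part: only computes how long an access takes (placeholder values).
struct MemoryT {
    unsigned read_latency_ns;
    unsigned write_latency_ns;
    unsigned latency_ns(bool is_write) const {
        return is_write ? write_latency_ns : read_latency_ns;
    }
};

// PVT model: composes one PV and one T model; either can be replaced
// without touching the other (e.g. a finer T model for more precise timing).
class MemoryPVT {
public:
    MemoryPVT(MemoryPV& pv, const MemoryT& t) : pv_(pv), t_(t) {}

    std::uint32_t read(std::uint32_t addr) {
        now_ns_ += t_.latency_ns(false);
        return pv_.read(addr);
    }
    void write(std::uint32_t addr, std::uint32_t data) {
        now_ns_ += t_.latency_ns(true);
        pv_.write(addr, data);
    }
    unsigned long long now_ns() const { return now_ns_; }

private:
    MemoryPV& pv_;
    const MemoryT& t_;
    unsigned long long now_ns_ = 0;  // simulated time consumed by this model's accesses
};
```

The same `MemoryPV` instance can be wrapped by two different `MemoryT` models: data results stay identical, only the accumulated time differs. This is the reuse argument made for separated PV and T models.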

Advantages and limitations
PV & T mixed:
- Modelling is natural
- Platforms are simple
- Interrupts can be modelled easily
- Granularity is fixed
- Mixed debugging and no control over simulation performance
- Reuse problem
PV & T separated:
- Parallel development and debug of reusable PV and T models
- Granularity can be controlled easily (by changing the T model)
- Modelling is more abstract
- Platforms are complex
- Interrupts are harder to model

TTP in the industry
In its current form, TTP cannot be used on an industrial scale:
- Modelling is too complex to be used by architects
- Modules are not reused enough to justify such a modelling effort
- Traffic generators are enough for DSE: detailed functionality does not need to be specified for performance estimation
- HW specification is easier using cycle-approximate/bit-accurate models

Timed TLM vs CABA models
Different time modelling granularities:
- CABA (cycle-accurate, bit-accurate) in HDL => available, but slow simulations
- CABA in SystemC => not interesting (not available, and slow simulations)
- Timed TLM (SystemC AT, approximately-timed) => preferred
A timed TLM model of an existing RTL IP has been built to evaluate the methodology and assess the necessary effort.
RTL IP chosen = SDRAM memory controller, because:
- it is a central module in SoC architecture explorations
- its timing behaviour is harder to determine than that of other modules (AHB buses, for example)

SDRAM Memory Controller
The Memory Controller is the interface between the SoC bus and the external (on-board) memories.
The latency of one access depends on:
- the access parameters
- the controller internal state
Objective for the timed model: the model should be pessimistic (latencies longer than the RTL), with a timing accuracy of +0 to +20 %.

Time analysis methodology
RTL analysis:
- The RTL is composed of intricate cycle-based state machines
- Requires manual extraction of timing rules
- May require duplicating the RTL FSM in the TLM model
- Not interesting
Macroscopic analysis (elected method):
- Uses RTL simulations to produce timing information
- Either guided statistics choice, or semi-automated using scripts

Macroscopic time analysis
Guided time analysis:
- Timing data is extracted from RTL simulations (traces of all the timings + relevant parameters)
- Rules are guessed by manually analyzing the traces…
- …and then automatically tested against a calibration test set
- This process iterates until the timing accuracy is satisfactory
Results of the time analysis iterations:
- The parameters of the previous access also have a major impact (in addition to the parameters of the current access)
- Some features interfere (refresh and automatic scrubbing)
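One iteration of the guess-then-calibrate loop might look like the following plain-C++ sketch. Everything here is hypothetical: the access parameters, the rule structure (row change and read/write turnaround against the previous access) and all cycle constants are placeholders, not the rules actually derived for the Astrium controller. Refresh and scrubbing interference are not modelled.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Hypothetical parameters of one SDRAM access, as extracted from an RTL trace.
struct AccessParams {
    bool is_write;
    unsigned burst_len;   // number of data beats
    std::uint32_t row;    // SDRAM row addressed
};

struct TraceEntry {
    AccessParams params;
    unsigned rtl_cycles;  // latency measured in the RTL simulation
};

// Guessed rule set; every constant is a placeholder refined at each iteration.
// The previous access matters (row change, read/write turnaround).
struct LatencyRules {
    unsigned base = 4;        // fixed controller overhead
    unsigned row_miss = 6;    // extra cycles when the row differs from the previous access
    unsigned turnaround = 2;  // extra cycles when switching between read and write

    unsigned predict(const AccessParams& cur, const AccessParams* prev) const {
        unsigned cycles = base + cur.burst_len;
        if (prev == nullptr || prev->row != cur.row) cycles += row_miss;
        if (prev != nullptr && prev->is_write != cur.is_write) cycles += turnaround;
        return cycles;
    }
};

// Calibration step: replay a trace and return the worst relative error (percent).
double worst_error_percent(const LatencyRules& rules, const std::vector<TraceEntry>& trace) {
    double worst = 0.0;
    const AccessParams* prev = nullptr;
    for (const TraceEntry& e : trace) {
        assert(e.rtl_cycles > 0);
        const double model = rules.predict(e.params, prev);
        const double err = 100.0 * (model - e.rtl_cycles) / e.rtl_cycles;
        worst = std::max(worst, std::fabs(err));
        prev = &e.params;
    }
    return worst;
}
```

If the worst error on the calibration set is too large, a new rule or constant is guessed from the traces and the check is rerun, which is the iteration described above.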

Timed Model Validation
This timing model has been checked against the RTL on an extensive test set (more than … transactions, coming from the RTL validation test suite).
Frequency | Mistimed transactions | Latency error
32 MHz | 18% | 12%
48 MHz | 14% | 17%
64 MHz | 14% | 18%
96 MHz | 17% (column not recoverable)
Validation results:
- The model is pessimistic (longer than the RTL)
- Latency error between 12% and 18%
- The model is too simple to be 100% exact, but the goal is to keep a high level of abstraction
- The accuracy can be increased if necessary
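The validation figures can be computed from paired RTL/model latencies. The plain-C++ sketch below assumes definitions that the slides do not state: a transaction is "mistimed" when the model latency differs from the RTL latency, and the latency error is the mean relative error over mistimed transactions. It also counts violations of the pessimism objective (+0 to +20 %). `Sample`, `ValidationSummary` and `summarize` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One validated transaction: RTL latency vs timed-model latency (cycles).
struct Sample {
    unsigned rtl_cycles;
    unsigned model_cycles;
};

struct ValidationSummary {
    std::size_t total = 0;
    std::size_t mistimed = 0;           // model latency != RTL latency (assumed definition)
    std::size_t optimistic = 0;         // model faster than RTL: breaks the pessimism objective
    std::size_t outside_objective = 0;  // error outside the +0 % .. +20 % window
    double mean_error_percent = 0.0;    // mean relative error over mistimed transactions
};

ValidationSummary summarize(const std::vector<Sample>& samples) {
    ValidationSummary s;
    double error_sum = 0.0;
    for (const Sample& x : samples) {
        assert(x.rtl_cycles > 0);
        ++s.total;
        if (x.model_cycles == x.rtl_cycles) continue;
        ++s.mistimed;
        const double err =
            100.0 * (static_cast<double>(x.model_cycles) - x.rtl_cycles) / x.rtl_cycles;
        if (err < 0.0) ++s.optimistic;
        if (err < 0.0 || err > 20.0) ++s.outside_objective;
        error_sum += err;
    }
    if (s.mistimed > 0) s.mean_error_percent = error_sum / static_cast<double>(s.mistimed);
    return s;
}
```

A pessimistic model, as required, should report `optimistic == 0`; the other counters then show how far it stays from the +20 % bound.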

Design Space Exploration with SystemC/TLM 2.0
A simple image processing platform has been designed to assess the use of SystemC/TLM for design space exploration.

Algorithm
Image spectral-compression platform:
- Performs subsampling on incoming data packets
- Subsampled packets are then transferred to an auxiliary processing unit, which performs a 2D-FFT (using a co-processor) and data encoding
[Figure: Input (10N) → Subsampling → 5N → 2D-FFT → 5N → Encoding → N → Output]

Processing platform
[Figure: platform blocks Mem_a, DMA_a, Leon_a, Mem_b, Leon_b, DMA_b, FFT, IO]

Processing platform (contd)
1. The IO module generates an interrupt, causing DMA_a to transfer the input packet (size 10N) to Mem_a.
2. At the end of the transfer, Leon_a subsamples the data and writes the result to Mem_a.
3. Leon_a configures DMA_b to transfer the result to Mem_b.
4. At the end of the transfer, Leon_b configures the FFT module to perform a 2D-FFT.
5. Leon_b encodes the result and programs DMA_b to send it to the IO module.

SystemC implementation
- TLM-2 compliant (time & functionality are mixed)
- Data exchange is AMBA-bus accurate (single/burst transactions, split)
- Data sizes are respected and packets are identified by a packet ID
- The Leon processor modules act as smart traffic generators: they generate transactions in the correct order towards the appropriate targets
- OS tasks are simulated using SC_THREADs
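A "smart traffic generator" carries no real data, only who moves how much data where, for which packet. The plain-C++ sketch below lists the data movements of one packet following the processing-platform sequence and the 10N/5N/N sizes. It then splits a transfer into AHB-style bursts, assuming a 32-bit bus and at most 16 beats per burst; both values are assumptions, not taken from the slides. The source of the final DMA_b transfer (Mem_b) is also assumed. FFT and encoding memory traffic is omitted because the slides do not say where those results are stored. `Transfer`, `packet_traffic` and `burst_beats` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical transfer record emitted by a traffic generator.
struct Transfer {
    std::string initiator;
    std::string source;
    std::string destination;
    std::size_t bytes;
    unsigned packet_id;
};

// Ordered data movements for one input packet (n = unit size N in bytes).
std::vector<Transfer> packet_traffic(unsigned packet_id, std::size_t n) {
    return {
        {"DMA_a", "IO", "Mem_a", 10 * n, packet_id},      // input packet
        {"Leon_a", "Leon_a", "Mem_a", 5 * n, packet_id},  // subsampled result written back
        {"DMA_b", "Mem_a", "Mem_b", 5 * n, packet_id},    // result moved to Mem_b
        {"DMA_b", "Mem_b", "IO", n, packet_id},           // encoded result sent out (source assumed)
    };
}

// Split one transfer into bursts; returns the number of beats of each burst.
std::vector<std::size_t> burst_beats(std::size_t bytes, std::size_t bytes_per_beat = 4,
                                     std::size_t max_beats = 16) {
    assert(bytes_per_beat > 0 && max_beats > 0);
    std::vector<std::size_t> bursts;
    std::size_t beats = (bytes + bytes_per_beat - 1) / bytes_per_beat;  // round up
    while (beats > 0) {
        const std::size_t b = beats < max_beats ? beats : max_beats;
        bursts.push_back(b);
        beats -= b;
    }
    return bursts;
}
```

Summing `bytes` per bus segment and per unit of simulated time over many packets is one way to obtain the bus-occupation figures mentioned on the next slide.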

SystemC implementation (contd)
- No actual processing is performed; processing time is simulated
- Bus occupation and processing loads of all processing units were measured accurately
- A system synchronization bug was identified => a lock register has been added to lock DMA_b during its configuration
- It was possible to observe the impact of modifying HW parameters and the input data rate
- DMA_a was identified as a bottleneck
- ABV (assertion-based verification) could also be implemented using ISIS
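The lock-register fix can be sketched in plain C++ as follows. The register semantics (a processor takes the lock if it is free, only the owner may release it, and writes from a non-owner are ignored) and the three DMA_b registers are assumptions for illustration; `LockRegister` and `DmaB` are hypothetical names. The point is that Leon_a and Leon_b can no longer interleave their register writes while DMA_b is being configured.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical lock register: a processor must own it before programming DMA_b.
class LockRegister {
public:
    static constexpr int kFree = -1;

    // Takes the lock if it is free; returns true if `cpu` owns it afterwards.
    bool try_acquire(int cpu) {
        if (owner_ == kFree) owner_ = cpu;
        return owner_ == cpu;
    }
    void release(int cpu) {
        assert(owner_ == cpu);  // only the owner may release the lock
        owner_ = kFree;
    }
    int owner() const { return owner_; }

private:
    int owner_ = kFree;
};

// DMA_b configuration registers (hypothetical: source, destination, length).
class DmaB {
public:
    LockRegister lock;

    // A write from a processor that does not hold the lock is ignored.
    bool write_reg(int cpu, std::size_t index, std::uint32_t value) {
        if (lock.owner() != cpu) return false;
        regs_.at(index) = value;
        return true;
    }
    std::uint32_t reg(std::size_t index) const { return regs_.at(index); }

private:
    std::array<std::uint32_t, 3> regs_{};
};
```

Usage: Leon_a acquires the lock, writes source, destination and length, then releases it; a Leon_b attempt in between fails to acquire and its register writes are rejected, so it must retry later.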

Example

HW in the loop – use of CHIPit
- CHIPit: Virtex-based development platform
- Custom extension boards (SDRAM, Flash, IO, …)
- UMRBus = practical & fast ready-made PC-CHIPit interface

HW in the loop – use of CHIPit
CHIPit can be used for an incremental validation flow:
- SC/TLM testbench composed of multiple sub-blocks
- Some sub-blocks may run on hardware (FPGA)
- The others still run in software as SC functional models
- Soft-hard inter-block transactions go via the UMRBus + extra SystemC/VHDL
Improved simulation speed:
- … times faster is possible
- fewer soft-hard transactions = better improvement
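Why fewer soft-hard transactions give a better speed-up can be shown with a toy cost model. This is an assumption, not a measurement from the slides: co-simulation wall time is the time of the blocks still simulated in software plus a fixed overhead per UMRBus transaction, and time spent in the FPGA itself is neglected. `cosim_speedup` is a hypothetical name.

```cpp
#include <cassert>

// Toy cost model (assumption): wall time = remaining software time
// + transactions * per-transaction overhead; FPGA time neglected.
double cosim_speedup(double full_soft_s, double remaining_soft_s,
                     unsigned long transactions, double overhead_per_tx_s) {
    const double cosim_s =
        remaining_soft_s + static_cast<double>(transactions) * overhead_per_tx_s;
    assert(cosim_s > 0.0);
    return full_soft_s / cosim_s;
}
```

With illustrative values of 100 s for the full-software run, 10 s for the blocks left in software and 50 µs per transaction, cutting the transaction count from 100 000 to 10 000 raises the speed-up from about 6.7 to about 9.5. Under this model, the per-transaction overhead, not the FPGA speed, bounds the gain.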

HW in the loop – use of CHIPit
What happens on a transaction?
Uncontrolled clock mode:
- The HW clock keeps running during a transaction
- SW clock and HW clock are not synchronised
- Easy to implement
Controlled clock mode:
- The HW clock is stopped upon each transaction, waiting for the software
- SW clock and HW clock are synchronised on transaction boundaries
- Needed if inputs/outputs must observe precise relative timings
- Harder to implement, more timing issues
- Not possible for all designs: complex designs require extra care (the SDRAM controller needs constant auto-refresh; inputs from extension boards may need immediate treatment)
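The difference between the two clock modes can be captured in a toy plain-C++ model. All names and units are hypothetical. In uncontrolled mode, the number of clock periods the design sees depends on how long the software takes. In controlled mode, it is exactly what the testbench requests, but the clock is stopped meanwhile. That is also why an SDRAM controller, whose auto-refresh stops with the clock, makes controlled mode risky.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

enum class ClockMode { Uncontrolled, Controlled };

// One co-simulation step: software needs sw_time_us of wall-clock time to produce
// the next transaction, and wants the hardware to advance by hw_cycles periods.
struct Step {
    double sw_time_us;
    double hw_cycles;
};

struct Outcome {
    double hw_cycles_elapsed;  // clock periods actually seen by the design
    double longest_stop_us;    // longest wall-clock interval with the HW clock stopped
};

Outcome run(ClockMode mode, const std::vector<Step>& steps, double hw_clock_mhz) {
    assert(hw_clock_mhz > 0.0);
    Outcome o{0.0, 0.0};
    for (const Step& s : steps) {
        if (mode == ClockMode::Uncontrolled) {
            // Free-running clock: the design advances by however long software takes.
            o.hw_cycles_elapsed += s.sw_time_us * hw_clock_mhz;
        } else {
            // Clock stopped while software works, then released for exactly hw_cycles.
            o.hw_cycles_elapsed += s.hw_cycles;
            o.longest_stop_us = std::max(o.longest_stop_us, s.sw_time_us);
        }
    }
    return o;
}

// A stopped clock also stops the SDRAM controller's auto-refresh: controlled mode
// is only safe if no stop exceeds the refresh interval required by the memory.
bool refresh_safe(const Outcome& o, double refresh_interval_us) {
    return o.longest_stop_us <= refresh_interval_us;
}
```

Running the same steps in both modes shows the trade-off: controlled mode gives reproducible cycle counts, while uncontrolled mode never stops the clock and so never starves the refresh logic.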

HW in the loop – use of CHIPit
Uncontrolled clock example: whole system overview
- Electronic board with inputs/outputs to other electronic systems
- SDRAM for internal data storage
- ASIC/FPGA for data processing

HW in the loop – use of CHIPit
Uncontrolled clock example: ASIC internal view
- Data processing composed of several sub-blocks
- Sub-blocks perform independent tasks
- Sequenced together with very few signals (e.g. req/ack)

HW in the loop – use of CHIPit
Uncontrolled clock example: ASIC re-modelling for HW
- Sequencer control signals re-modelled as APB transactions
- Inter-block FIFOs split (FIFO->SDRAM and SDRAM->FIFO)
- FIFOs mapped on AHB buses at fixed addresses
- DMAs added to handle pipeline inputs and outputs from/to memory
- DMA channels can perform any AHB transfer (e.g. between SDRAM and a FIFO)
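Mapping a FIFO at a fixed bus address and letting a generic DMA channel move data can be sketched in plain C++ as follows. The addresses, the `Bus` class and `dma_copy` are hypothetical. Reading the FIFO address pops a word and writing it pushes one. The DMA channel takes a separate increment for source and destination, with 0 keeping the fixed FIFO address, so the same channel covers both FIFO->SDRAM and SDRAM->FIFO.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <map>

// Hypothetical address map: the FIFO is a single fixed address, SDRAM is a plain range.
constexpr std::uint32_t kFifoAddr = 0x80000000u;
constexpr std::uint32_t kSdramBase = 0x40000000u;

class Bus {
public:
    std::deque<std::uint32_t> fifo;                // FIFO behind kFifoAddr
    std::map<std::uint32_t, std::uint32_t> sdram;  // sparse SDRAM contents

    std::uint32_t read(std::uint32_t addr) {
        if (addr == kFifoAddr) {  // reading the FIFO address pops one word
            assert(!fifo.empty());
            const std::uint32_t v = fifo.front();
            fifo.pop_front();
            return v;
        }
        return sdram[addr];
    }
    void write(std::uint32_t addr, std::uint32_t v) {
        if (addr == kFifoAddr) fifo.push_back(v);  // writing it pushes one word
        else sdram[addr] = v;
    }
};

// Generic DMA channel: word-by-word copy with independent address increments.
void dma_copy(Bus& bus, std::uint32_t src, std::uint32_t src_inc,
              std::uint32_t dst, std::uint32_t dst_inc, std::size_t words) {
    for (std::size_t i = 0; i < words; ++i) {
        bus.write(dst, bus.read(src));
        src += src_inc;
        dst += dst_inc;
    }
}
```

Usage: `dma_copy(bus, kFifoAddr, 0, kSdramBase, 4, n)` drains n words from the FIFO into consecutive SDRAM words, and swapping the two address/increment pairs refills a FIFO from SDRAM.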

HW in the loop – use of CHIPit
Uncontrolled clock example: ASIC re-modelling for SC
- Use of TLM2 transactions between blocks
- SDRAM + controller merged into a memory abstraction model
- SDRAM access ports re-modelled as AHB buses

HW in the loop – use of CHIPit
Benefits:
- The same C file is used for both Gaut VHDL generation and SystemC full-soft emulation => intrinsic algorithm consistency between model and hardware
- Few steps are necessary from Gaut regeneration to FPGA synthesis and SC model compilation; they can be scripted for process automation => handy for fast algorithm exploration
Outcome: an executable SystemC model, allowing a choice at runtime between the full-soft functional model and soft+hard co-simulation:
    $> scmodel SIMU input.bin output_simu.bin > log_simu.txt
    $> scmodel CHIPit input.bin output_hard.bin > log_hard.txt
    $> diff output_simu.bin output_hard.bin
    $>

HW in the loop – use of CHIPit
Limitations:
- SystemC+VHDL still have to be developed for each new transactor
  - limits whole-process automation
  - encourages the use of common transactor types (AMBA, etc.)
- Controlled clock mode is much more complex to implement
  - encourages the design of independent blocks, interconnected via a few FIFOs or via a common memory
  - blocks with strong timing requirements on IO are hardly compatible with uncontrolled clock mode (better to design with intelligent IO behaviour: req+ack, handshake, etc.)
- Implementation is limited to the actual CHIPit resources
  - SDRAM bus width is static (a larger bus than available cannot be tested)
  - custom extension boards are required as early as algorithm exploration

HW in the loop – use of CHIPit
SceMi: the would-be standard for co-simulation
- Formerly proposed by Cadence, now transferred to Accellera
- Defines a C++ API for HW-SW co-simulation
  - controlled clock / uncontrolled clock modes
  - function-based interface
  - pipe-based interface (C++ stream = hardware FIFO)
  - multi-threaded operation on the software side
- CHIPit SceMi library available
  - needs a supplementary licence
  - just a wrapper over the UMRBus libraries to provide clock control
  - all transactors still need to be coded by hand (SystemC+VHDL) => still a lot of work to do before getting co-simulation working
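The "C++ stream = hardware FIFO" idea behind a pipe-based interface can be illustrated with a bounded circular buffer. This is a plain-C++ sketch of the semantics only and is NOT the SCE-MI API: `BoundedPipe`, `try_send` and `try_receive` are hypothetical names. A full pipe refuses data, as a hardware FIFO would deassert its ready signal, and an empty pipe has nothing to deliver.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical bounded pipe modelling a hardware FIFO seen from software.
class BoundedPipe {
public:
    explicit BoundedPipe(std::size_t depth) : buf_(depth), head_(0), count_(0) {
        assert(depth > 0);
    }

    bool try_send(std::uint32_t word) {
        if (count_ == buf_.size()) return false;  // full: caller must retry later
        buf_[(head_ + count_) % buf_.size()] = word;
        ++count_;
        return true;
    }
    bool try_receive(std::uint32_t& word) {
        if (count_ == 0) return false;  // empty: nothing to read yet
        word = buf_[head_];
        head_ = (head_ + 1) % buf_.size();
        --count_;
        return true;
    }
    std::size_t size() const { return count_; }

private:
    std::vector<std::uint32_t> buf_;  // circular buffer storage
    std::size_t head_;                // index of the oldest word
    std::size_t count_;               // number of stored words
};
```

The depth plays the role of the hardware FIFO depth: with a depth of 2, a third `try_send` fails until the consumer side has received a word. This back-pressure is what the hand-coded transactors have to reproduce on the hardware side.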

Space industry applicability
- SystemC/TLM is suitable for DSE, with the use of HLS (high-level synthesis)
- The specification flow still needs to be sorted out

Future prospects
Important need for development infrastructure:
- Abstraction layer (architects are not TLM2 experts)
- Interrupt and streaming modelling (TLM is currently a protocol oriented towards memory-mapped platforms)
- Build and assembly tools are needed
- Well-defined modelling guidelines should be established

Workshop - November 2011
Thank you
Any questions?

Open questions
- Who does the modelling? System, HW or SW architect?
- SW validation uses paper specs => towards validation using HW-based models in SystemC/TLM?
- Towards a TLM3 standard? With embedded-systems industrial partners such as Airbus and Astrium? (Business model?)