- 1 -  P. Marwedel, Univ. Dortmund, Informatik 12, 2003 Universität Dortmund Actual design flows and tools.

Slides:



Advertisements
Similar presentations
Fakultät für informatik informatik 12 technische universität dortmund Optimizations - Compilation for Embedded Processors - Peter Marwedel TU Dortmund.
Advertisements

Fakultät für informatik informatik 12 technische universität dortmund Specifications and Modeling Peter Marwedel TU Dortmund, Informatik 12 Graphics: ©
Technische universität dortmund fakultät für informatik informatik 12 Discrete Event Models Peter Marwedel TU Dortmund, Informatik 12 Germany
Embedded Systems & Parallel Programming P. Marwedel, Univ. Dortmund/Informatik 12 + ICD/ES, 2007 Universität Dortmund A view on embedded systems.
Embedded System, A Brief Introduction
© 2004 Wayne Wolf Topics Task-level partitioning. Hardware/software partitioning.  Bus-based systems.
Technische universität dortmund fakultät für informatik informatik 12 Discrete Event Models Jian-Jia Chen (slides are based on Peter Marwedel) TU Dortmund,
Hardware/ Software Partitioning 2011 年 12 月 09 日 Peter Marwedel TU Dortmund, Informatik 12 Germany Graphics: © Alexandra Nolte, Gesine Marwedel, 2003 These.
ECE-777 System Level Design and Automation Hardware/Software Co-design
ECOE 560 Design Methodologies and Tools for Software/Hardware Systems Spring 2004 Serdar Taşıran.
ECE Synthesis & Verification - Lecture 2 1 ECE 667 Spring 2011 ECE 667 Spring 2011 Synthesis and Verification of Digital Circuits High-Level (Architectural)
*time Optimization Heiko, Diego, Thomas, Kevin, Andreas, Jens.
- 1 -  P. Marwedel, Univ. Dortmund, Informatik 12, 05/06 Universität Dortmund Hardware/Software Codesign.
Reference: Message Passing Fundamentals.
Behavioral Synthesis Outline –Synthesis Procedure –Example –Domain-Specific Synthesis –Silicon Compilers –Example Tools Goal –Understand behavioral synthesis.
1 HW/SW Partitioning Embedded Systems Design. 2 Hardware/Software Codesign “Exploration of the system design space formed by combinations of hardware.
CS244-Introduction to Embedded Systems and Ubiquitous Computing Instructor: Eli Bozorgzadeh Computer Science Department UC Irvine Winter 2010.
Behavioral Design Outline –Design Specification –Behavioral Design –Behavioral Specification –Hardware Description Languages –Behavioral Simulation –Behavioral.
Mahapatra-Texas A&M-Fall'001 cosynthesis Introduction to cosynthesis Rabi Mahapatra CPSC498.
Review of “Embedded Software” by E.A. Lee Katherine Barrow Vladimir Jakobac.
Architectural-Level Synthesis Giovanni De Micheli Integrated Systems Centre EPF Lausanne This presentation can be used for non-commercial purposes as long.
Models of Computation for Embedded System Design Alvise Bonivento.
HW/SW Co-Synthesis of Dynamically Reconfigurable Embedded Systems HW/SW Partitioning and Scheduling Algorithms.
MOBIES Project Progress Report Engine Throttle Controller Design Using Multiple Models of Computation Edward Lee Haiyang Zheng with thanks to Ptolemy Group.
Mahapatra-Texas A&M-Fall'001 Codesign Framework Parts of this lecture are borrowed from lectures of Johan Lilius of TUCS and ASV/LL of UC Berkeley available.
1 Presenter: Chien-Chih Chen Proceedings of the 2002 workshop on Memory system performance.
1 Embedded Computer System Laboratory RTOS Modeling in Electronic System Level Design.
Universität Dortmund  P. Marwedel, Univ. Dortmund, Informatik 12, 2003 Hardware/software partitioning  Functionality to be implemented in software.
What is Software Architecture?
- 1 - EE898-HW/SW co-design Hardware/Software Codesign “Finding right combination of HW/SW resulting in the most efficient product meeting the specification”
Course Outline DayContents Day 1 Introduction Motivation, definitions, properties of embedded systems, outline of the current course How to specify embedded.
- 1 -  P. Marwedel, Univ. Dortmund, Informatik 12, 2005/6 Universität Dortmund Specifications.
EECE **** Embedded System Design
Composing Models of Computation in Kepler/Ptolemy II
ICOM 5995: Performance Instrumentation and Visualization for High Performance Computer Systems Lecture 7 October 16, 2002 Nayda G. Santiago.
Parallel Programming Models Jihad El-Sana These slides are based on the book: Introduction to Parallel Computing, Blaise Barney, Lawrence Livermore National.
CAD Techniques for IP-Based and System-On-Chip Designs Allen C.-H. Wu Department of Computer Science Tsing Hua University Hsinchu, Taiwan, R.O.C {
Software Pipelining for Stream Programs on Resource Constrained Multi-core Architectures IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEM 2012 Authors:
Extreme Makeover for EDA Industry
Section 10: Advanced Topics 1 M. Balakrishnan Dept. of Comp. Sci. & Engg. I.I.T. Delhi.
1 2-Hardware Design Basics of Embedded Processors (cont.)
Hardware/Software Co-design Design of Hardware/Software Systems A Class Presentation for VLSI Course by : Akbar Sharifi Based on the work presented in.
Evaluation and Validation Peter Marwedel TU Dortmund, Informatik 12 Germany 2013 年 12 月 02 日 These slides use Microsoft clip arts. Microsoft copyright.
A Methodology for Architecture Exploration of heterogeneous Signal Processing Systems Paul Lieverse, Pieter van der Wolf, Ed Deprettere, Kees Vissers.
A Graph Based Algorithm for Data Path Optimization in Custom Processors J. Trajkovic, M. Reshadi, B. Gorjiara, D. Gajski Center for Embedded Computer Systems.
3 rd Nov CSV881: Low Power Design1 Power Estimation and Modeling M. Balakrishnan.
Ihr Logo Operating Systems Internals & Design Principles Fifth Edition William Stallings Chapter 2 (Part II) Operating System Overview.
CS244-Introduction to Embedded Systems and Ubiquitous Computing Instructor: Eli Bozorgzadeh Computer Science Department UC Irvine Winter 2010.
Fall 2004EE 3563 Digital Systems Design EE 3563 VHSIC Hardware Description Language  Required Reading: –These Slides –VHDL Tutorial  Very High Speed.
6. A PPLICATION MAPPING 6.3 HW/SW partitioning 6.4 Mapping to heterogeneous multi-processors 1 6. Application mapping (part 2)
Presentation by Tom Hummel OverSoC: A Framework for the Exploration of RTOS for RSoC Platforms.
CprE 588 Embedded Computer Systems Prof. Joseph Zambreno Department of Electrical and Computer Engineering Iowa State University Lecture #5 – System-Level.
System-level power analysis and estimation September 20, 2006 Chong-Min Kyung.
1 Copyright  2001 Pao-Ann Hsiung SW HW Module Outline l Introduction l Unified HW/SW Representations l HW/SW Partitioning Techniques l Integrated HW/SW.
Gedae, Inc. Gedae: Auto Coding to a Virtual Machine Authors: William I. Lundgren, Kerry B. Barnes, James W. Steed HPEC 2004.
High Performance Embedded Computing © 2007 Elsevier Chapter 7, part 3: Hardware/Software Co-Design High Performance Embedded Computing Wayne Wolf.
Architecture View Models A model is a complete, simplified description of a system from a particular perspective or viewpoint. There is no single view.
System-on-Chip Design Hao Zheng Comp Sci & Eng U of South Florida 1.
Hardware/Software Co-Design of Complex Embedded System NIKOLAOS S. VOROS, LUIS SANCHES, ALEJANDRO ALONSO, ALEXIOS N. BIRBAS, MICHAEL BIRBAS, AHMED JERRAYA.
Multi-cellular paradigm The molecular level can support self- replication (and self- repair). But we also need cells that can be designed to fit the specific.
System-on-Chip Design
Chapter 1 Introduction.
EEE2135 Digital Logic Design Chapter 1. Introduction
Chapter 1 Introduction.
IP – Based Design Methodology
Introduction to cosynthesis Rabi Mahapatra CSCE617
Retargetable Model-Based Code Generation in Ptolemy II
Architectural-Level Synthesis
Transaction Level Modeling: An Overview
Mapping DSP algorithms to a general purpose out-of-order processor
Presentation transcript:

- 1 -  P. Marwedel, Univ. Dortmund, Informatik 12, 2003 Universität Dortmund Actual design flows and tools

- 2 -  P. Marwedel, Univ. Dortmund, Informatik 12, 2003 Universität Dortmund Comprises allocation (selecting components from a library), partitioning (mapping of parts of the system specification onto the components) & scheduling (serialization of execution). SpecC

- 3 -  P. Marwedel, Univ. Dortmund, Informatik 12, 2003 Universität Dortmund SpecC abstract busses replaced by actual wires in a series of refinements

- 4 -  P. Marwedel, Univ. Dortmund, Informatik 12, 2003 Universität Dortmund SpecC Compilers are used to generate binary machine code and hardware synthesis tools are used to generate custom hardware.

- 5 -  P. Marwedel, Univ. Dortmund, Informatik 12, 2003 Universität Dortmund An actual example - Getting started with the SCE window - Copyright © 2002 by Center of Embedded Computing Systems (CECS), UC Irvine

- 6 -  P. Marwedel, Univ. Dortmund, Informatik 12, 2003 Universität Dortmund Browsing the specification Copyright © 2002 by Center of Embedded Computing Systems (CECS), UC Irvine

- 7 -  P. Marwedel, Univ. Dortmund, Informatik 12, 2003 Universität Dortmund An actual example - Validation by simulation - Copyright © 2002 by Center of Embedded Computing Systems (CECS), UC Irvine

- 8 -  P. Marwedel, Univ. Dortmund, Informatik 12, 2003 Universität Dortmund Analyze profiling results Copyright © 2002 by Center of Embedded Computing Systems (CECS), UC Irvine

- 9 -  P. Marwedel, Univ. Dortmund, Informatik 12, 2003 Universität Dortmund PE selection Copyright © 2002 by Center of Embedded Computing Systems (CECS), UC Irvine

 P. Marwedel, Univ. Dortmund, Informatik 12, 2003 Universität Dortmund The top level behavior is mapped to the DSP Copyright © 2002 by Center of Embedded Computing Systems (CECS), UC Irvine

 P. Marwedel, Univ. Dortmund, Informatik 12, 2003 Universität Dortmund Estimate the performance (too slow) Copyright © 2002 by Center of Embedded Computing Systems (CECS), UC Irvine

 P. Marwedel, Univ. Dortmund, Informatik 12, 2003 Universität Dortmund Selecting additional custom HW including datapath and controller Copyright © 2002 by Center of Embedded Computing Systems (CECS), UC Irvine

 P. Marwedel, Univ. Dortmund, Informatik 12, 2003 Universität Dortmund Binding „codebook“ to the custom datapath Copyright © 2002 by Center of Embedded Computing Systems (CECS), UC Irvine

 P. Marwedel, Univ. Dortmund, Informatik 12, 2003 Universität Dortmund Execution time of the DSP Copyright © 2002 by Center of Embedded Computing Systems (CECS), UC Irvine

 P. Marwedel, Univ. Dortmund, Informatik 12, 2003 Universität Dortmund Execution time for hardware Assuming that sec is acceptable Copyright © 2002 by Center of Embedded Computing Systems (CECS), UC Irvine

 P. Marwedel, Univ. Dortmund, Informatik 12, 2003 Universität Dortmund 2. IMEC tool flow IMEC = Interuniversitair Micro-Electronica Centrum, Leuven, Belgium (Large research facility)

 P. Marwedel, Univ. Dortmund, Informatik 12, 2003 Universität Dortmund Imec tool flow - Global view - Considers the system at the concurrent process level as a set of concurrent and dynamic processes, whose specification consists of algorithms, abstract data types, communication primitives, and real-time requirements. Tools can perform source code transformations on the dynamic data types & provide also a memory pool organization in the virtual memory space.

 P. Marwedel, Univ. Dortmund, Informatik 12, 2003 Universität Dortmund Imec tool flow - Matador/TCM - Again considering a system of concurrent processes. For these tools, the emphasis is on mapping tasks to processors. Different configurations of multi- processor systems are evaluated and curves of designs that are non- inferior to others are generated. These curves provide a view of the design space, and are the basis for final design decisions.

 P. Marwedel, Univ. Dortmund, Informatik 12, 2003 Universität Dortmund Matador/ Task Concurrency Management (TCM) Wong et al. [Wong et al., 2001]: configurations for a personal MPEG-4 player: combination of StrongArm processors and custom accelerators. Found 4 configurations satisfying timing constraint of 30 ms. Processor combination Number of high speed processors Number of low speed processors Total number of processors For combinations 1 and 4, only one allocation of tasks to processors meets the timing constraints. For combinations 2 and 3, different time budgets lead to different task to processor mappings and different energy consumptions. Wong et al. [Wong et al., 2001]: configurations for a personal MPEG-4 player: combination of StrongArm processors and custom accelerators. Found 4 configurations satisfying timing constraint of 30 ms. Processor combination Number of high speed processors Number of low speed processors Total number of processors For combinations 1 and 4, only one allocation of tasks to processors meets the timing constraints. For combinations 2 and 3, different time budgets lead to different task to processor mappings and different energy consumptions.

 P. Marwedel, Univ. Dortmund, Informatik 12, 2003 Universität Dortmund Pareto points Multiobjective optimization: – several conflicting criteria –eg. performance vs. cost vs. power consumption Definition: A (design) point J i is dominated by point J k, if J k is equal or better than J i in each criterion ( J i  J k ). Definition: A (design) point is Pareto-optimal or a Pareto point, if it is not dominated by any other point. Multiobjective optimization: – several conflicting criteria –eg. performance vs. cost vs. power consumption Definition: A (design) point J i is dominated by point J k, if J k is equal or better than J i in each criterion ( J i  J k ). Definition: A (design) point is Pareto-optimal or a Pareto point, if it is not dominated by any other point.

 P. Marwedel, Univ. Dortmund, Informatik 12, 2003 Universität Dortmund Pareto curves

 P. Marwedel, Univ. Dortmund, Informatik 12, 2003 Universität Dortmund Data Transfer and Storage Exploration (DTSE) Reduction of the data transfers between processing components and at a reduction of the storage requirements. DTSE has proven to be highly effective in minimizing the energy cost related to data memory hierarchies. For several typical embedded applications, reductions of main memory accesses, cache accesses and energy consumption between 50% and 80% have been reported. Reduction of the data transfers between processing components and at a reduction of the storage requirements. DTSE has proven to be highly effective in minimizing the energy cost related to data memory hierarchies. For several typical embedded applications, reductions of main memory accesses, cache accesses and energy consumption between 50% and 80% have been reported. © Falk, 2004

 P. Marwedel, Univ. Dortmund, Informatik 12, 2003 Universität Dortmund Data Transfer and Exploration (DTSE) A memory oriented data flow analysis is performed, followed by global data flow and loop transformations to reduce the amount of background memories; data reuse transformations exploit a distributed memories; storage cycle budget distribution determines the bandwidth requirements and the balancing of the available cycle budget over the different memory accesses. A memory oriented data flow analysis is performed, followed by global data flow and loop transformations to reduce the amount of background memories; data reuse transformations exploit a distributed memories; storage cycle budget distribution determines the bandwidth requirements and the balancing of the available cycle budget over the different memory accesses.

 P. Marwedel, Univ. Dortmund, Informatik 12, 2003 Universität Dortmund Data Transfer and Exploration (DTSE) The memory hierarchy layer assignment produces a netlist of memory units & an assignment of variables to memories. For multi-processor systems, task- / data-parallelism exploitation minimizes communication & storage overhead caused by the parallel execution of subsystems; Data layout transformations map variables with non- overlapping lifetimes in the same physical memory location. The memory hierarchy layer assignment produces a netlist of memory units & an assignment of variables to memories. For multi-processor systems, task- / data-parallelism exploitation minimizes communication & storage overhead caused by the parallel execution of subsystems; Data layout transformations map variables with non- overlapping lifetimes in the same physical memory location.

 P. Marwedel, Univ. Dortmund, Informatik 12, 2003 Universität Dortmund Address optimization (ADOPT) Addressing is simplified in address optimization (ADOPT) tools. DTSE steps increase the addressing complexity of transformed applications, by introducing more complex index expressions, conditions.. This overhead is neglected by the DTSE methodology. The optimized memory system, will be power efficient but slow. Parts of the additional complexity can be removed by source code optimizations for address optimization (ADOPT): algebraic cost minimization first minimizes operation instances. The goal is to find factorizations of addressing expressions in order to be able to reuse computations as much as possible. The reuse of expressions is based on common subexpression elimination techniques. Addressing is simplified in address optimization (ADOPT) tools. DTSE steps increase the addressing complexity of transformed applications, by introducing more complex index expressions, conditions.. This overhead is neglected by the DTSE methodology. The optimized memory system, will be power efficient but slow. Parts of the additional complexity can be removed by source code optimizations for address optimization (ADOPT): algebraic cost minimization first minimizes operation instances. The goal is to find factorizations of addressing expressions in order to be able to reuse computations as much as possible. The reuse of expressions is based on common subexpression elimination techniques.

 P. Marwedel, Univ. Dortmund, Informatik 12, 2003 Universität Dortmund Address optimization (ADOPT) High penalty by DTSE addressing using modulo (%) & /. Optimization of this addressing is a major goal in ADOPT. / and % can be replaced by variables as follows: for (j=0; j<n; j++) a[j/6][j%6] =...;  addressing only consists of ++ and -- operators. In many examples, ADOPT reduces runtimes by about 3. High penalty by DTSE addressing using modulo (%) & /. Optimization of this addressing is a major goal in ADOPT. / and % can be replaced by variables as follows: for (j=0; j<n; j++) a[j/6][j%6] =...;  addressing only consists of ++ and -- operators. In many examples, ADOPT reduces runtimes by about 3. int jdiv6=0, jmod6=0; for (j=0; j<n; j++, jmod6 ++) { if (jmod6 >= 6) { jmod6 -= 6; jdiv6 ++; } a[jdiv6] [jmod6] =...; }

 P. Marwedel, Univ. Dortmund, Informatik 12, 2003 Universität Dortmund Backend Compilers Mapping to configurable hardware using OCAPI-XL Compilers Mapping to configurable hardware using OCAPI-XL

 P. Marwedel, Univ. Dortmund, Informatik 12, 2003 Universität Dortmund Cosynthesis for embedded micro-architectures (COSYMA) Ernst et al., Univ. Braunschweig

 P. Marwedel, Univ. Dortmund, Informatik 12, 2003 Universität Dortmund Cosynthesis for embedded micro-architectures (COSYMA) C + process header Based on SUIF Simulation + analytical 1 process or integrated in part. Granularity=basic block

 P. Marwedel, Univ. Dortmund, Informatik 12, 2003 Universität Dortmund Ptolemy II Ptolemy II supports specifications using different models of computation. In particular, it supports: 1.Communicating sequential processes (CSP). 2.Continuous time (CT): appropriate for ME, analog circuits. Supported by extensible differential equation solvers. 3.Discrete event model (DE): model used by many (e.g. VHDL) simulators. 4.Distributed discrete events (DDE). 5.Finite state machines (FSM). 6.Process networks (PN), using Kahn process networks 7.Synchronous dataflow (SDF) 8.Synchronous/reactive (SR) MoC. Discrete time, signals do not need to have a value at every clock tick. Esterel used. Ptolemy II supports specifications using different models of computation. In particular, it supports: 1.Communicating sequential processes (CSP). 2.Continuous time (CT): appropriate for ME, analog circuits. Supported by extensible differential equation solvers. 3.Discrete event model (DE): model used by many (e.g. VHDL) simulators. 4.Distributed discrete events (DDE). 5.Finite state machines (FSM). 6.Process networks (PN), using Kahn process networks 7.Synchronous dataflow (SDF) 8.Synchronous/reactive (SR) MoC. Discrete time, signals do not need to have a value at every clock tick. Esterel used.

 P. Marwedel, Univ. Dortmund, Informatik 12, 2003 Universität Dortmund Ptolemy II Source … ptII3.0/ptII3.0.2/doc/index.htm Source … ptII3.0/ptII3.0.2/doc/index.htm Show Ptolemy II

 P. Marwedel, Univ. Dortmund, Informatik 12, 2003 Universität Dortmund

 P. Marwedel, Univ. Dortmund, Informatik 12, 2003 Universität Dortmund

 P. Marwedel, Univ. Dortmund, Informatik 12, 2003 Universität Dortmund

 P. Marwedel, Univ. Dortmund, Informatik 12, 2003 Universität Dortmund Octopus Addressing the poor match between the focus of object- oriented design techniques on the software object structure and the need to allocate operations to tasks. This poor match was the main concern that was addressed in the design of OCTOPUS. Totally software-oriented flow used at Nokia Addressing the poor match between the focus of object- oriented design techniques on the software object structure and the need to allocate operations to tasks. This poor match was the main concern that was addressed in the design of OCTOPUS. Totally software-oriented flow used at Nokia

 P. Marwedel, Univ. Dortmund, Informatik 12, 2003 Universität Dortmund Octopus 1.In the systems requirement phase, behavior is described by use case diagrams and use cases. The structure of the environment is described by a so-called context diagram. 2 In the system architecture phase, the structure is broken down into subsystems. Major interfaces between the subsystems are identified, but their behavior is not. 3 The subsystem analysis phase is done  subsystems. Class diagrams for the subsystems are generated. Behaviors of subsystems can be defined in various ways, including StateCharts, so-called event lists and event sheets. 4 The subsystem design phase generates outlines for processes/threads, classes and interprocess messages. 5 The subsystem implementation phase generates actual code of the selected programming language. 1.In the systems requirement phase, behavior is described by use case diagrams and use cases. The structure of the environment is described by a so-called context diagram. 2 In the system architecture phase, the structure is broken down into subsystems. Major interfaces between the subsystems are identified, but their behavior is not. 3 The subsystem analysis phase is done  subsystems. Class diagrams for the subsystems are generated. Behaviors of subsystems can be defined in various ways, including StateCharts, so-called event lists and event sheets. 4 The subsystem design phase generates outlines for processes/threads, classes and interprocess messages. 5 The subsystem implementation phase generates actual code of the selected programming language.