6. APPLICATION MAPPING (part 2)
6.3 HW/SW partitioning
6.4 Mapping to heterogeneous multi-processors



6.3 HW/SW PARTITIONING

Introduction
By hardware/software partitioning we mean the mapping of task graph nodes to either hardware or software. Hardware/software partitioning lets us decide which parts must be implemented in hardware and which in software.

COOL (COdesign toOL)
For COOL, the input consists of three parts:
① Target technology: This part of the input comprises information about the available hardware platform components. The type of the processors used must be included here.
② Design constraints: The second part of the input comprises design constraints such as the required throughput, latency, maximum memory size, or maximum area for application-specific hardware.
③ Behavior: The third part of the input describes the required overall behavior. Hierarchical task graphs are used for this. COOL uses two kinds of edges: communication edges and timing edges.

For partitioning, COOL uses the following steps:
① Translation of the behavior into an internal graph model.
② Translation of the behavior of each node from VHDL into C.
③ Compilation of all C programs for the selected target processor type, computation of the resulting program size, and estimation of the resulting execution time.
④ Synthesis of hardware components: For each leaf node, application-specific hardware is synthesized.
⑤ Flattening the hierarchy: The next step is to extract a flat graph from the hierarchical flow graph.
⑥ Generating and solving a mathematical model of the optimization problem: COOL uses integer linear programming (ILP) to solve the optimization problem.
⑦ Iterative improvements: In order to work with good estimates of the communication time, adjacent nodes mapped to the same hardware component are now merged.
⑧ Interface synthesis: After partitioning, the glue logic required for interfacing processors, application-specific hardware and memories is created.

The following index sets will be used in the description of the ILP model:
Index set $V$ denotes task graph nodes. Each $v \in V$ corresponds to one task graph node.
Index set $L$ denotes task graph node types. Each $l \in L$ corresponds to one task graph node type.
Index set $M$ denotes hardware component types. Each $m \in M$ corresponds to one hardware component type.
For each of the hardware components, there may be multiple copies, or "instances". Each instance is identified by an index $j \in J$.
Index set $KP$ denotes processors. Each $k \in KP$ identifies one of the processors.

The following decision variables are required by the model:
$X_{v,m}$: this variable will be 1 if node $v$ is mapped to hardware component type $m \in M$, and 0 otherwise.
$Y_{v,k}$: this variable will be 1 if node $v$ is mapped to processor $k \in KP$, and 0 otherwise.
$NY_{l,k}$: this variable will be 1 if at least one node of type $l$ is mapped to processor $k \in KP$, and 0 otherwise.
$Type$ is a mapping $V \to L$ from task graph nodes to their corresponding types.

The cost function accumulates the total cost of all hardware units:
C = processor costs + memory costs + cost of application-specific hardware

We can now present a brief description of some of the constraints of the ILP model.

Operation assignment constraints: These constraints guarantee that each operation is implemented either in hardware or in software:
$$\forall v \in V: \sum_{m \in M} X_{v,m} + \sum_{k \in KP} Y_{v,k} = 1$$

Additional constraints ensure that decision variables $X_{v,m}$ and $Y_{v,k}$ have 1 as an upper bound and, hence, are in fact 0/1-valued variables:
$$\forall v \in V, \forall m \in M: X_{v,m} \le 1$$
$$\forall v \in V, \forall k \in KP: Y_{v,k} \le 1$$

If the functionality of a certain node of type $l$ is mapped to some processor $k$, then the processor's instruction memory must include a copy of the software for this function:
$$\forall l \in L, \forall v \in V \text{ with } Type(v) = l, \forall k \in KP: NY_{l,k} \ge Y_{v,k}$$

Additional constraints ensure that decision variables $NY_{l,k}$ are also 0/1-valued variables:
$$\forall l \in L, \forall k \in KP: NY_{l,k} \le 1$$

Further constraints:
Resource constraints
Precedence constraints
Design constraints
Timing constraints
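The operation assignment constraints and the instruction memory variables can be checked mechanically for a given mapping. The following is a minimal Python sketch, not COOL's implementation: the function names `check_assignment` and `derive_NY`, the dictionary encoding of $X$ and $Y$, and the three-node example are all assumptions made for illustration.

```python
def check_assignment(nodes, hw_types, processors, X, Y):
    """Operation assignment: every node is mapped to exactly one component."""
    for v in nodes:
        total = sum(X.get((v, m), 0) for m in hw_types)
        total += sum(Y.get((v, k), 0) for k in processors)
        if total != 1:
            return False
    return True


def derive_NY(nodes, node_type, processors, Y):
    """Smallest NY with NY[l, k] >= Y[v, k] for every v whose type is l."""
    NY = {}
    for v in nodes:
        for k in processors:
            key = (node_type[v], k)
            NY[key] = max(NY.get(key, 0), Y.get((v, k), 0))
    return NY


# Hypothetical instance: three nodes, two node types, one hardware
# component type and one processor. Missing dictionary entries mean 0.
nodes = ["v1", "v2", "v3"]
node_type = {"v1": "a", "v2": "b", "v3": "a"}
hw_types = ["m1"]
processors = ["k1"]
X = {("v1", "m1"): 1}
Y = {("v2", "k1"): 1, ("v3", "k1"): 1}

ok = check_assignment(nodes, hw_types, processors, X, Y)
NY = derive_NY(nodes, node_type, processors, Y)
print(ok, NY)
```

Here node v1 runs in hardware, yet type "a" still needs a software copy on k1 because v3 has the same type and is mapped to the processor.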

Example: In the following, we will show how these constraints can be generated for the task graph in the figure. Suppose that we have a hardware component library containing three component types $H_1$, $H_2$ and $H_3$ with costs of 20, 25 and 30 cost units, respectively. Furthermore, suppose that we can also use a processor $P$ of cost 5.

[Figure: task graph with nodes T1, T2, T3, T4 and T5]
[Table: execution times of tasks T1 to T5 on components H1, H2, H3 and P]

The following operation assignment constraints must be generated, assuming that a maximum of one processor ($P_1$) is to be used:
$X_{1,1} + Y_{1,1} = 1$ (task 1 is mapped either to $H_1$ or to $P_1$)
$X_{2,2} + Y_{2,1} = 1$ (task 2 is mapped either to $H_2$ or to $P_1$)
$X_{3,3} + Y_{3,1} = 1$ (task 3 is mapped either to $H_3$ or to $P_1$)
$X_{4,3} + Y_{4,1} = 1$ (task 4 is mapped either to $H_3$ or to $P_1$)
$X_{5,1} + Y_{5,1} = 1$ (task 5 is mapped either to $H_1$ or to $P_1$)

Furthermore, assume that the types of tasks $T_1$ to $T_5$ are $l$ = 1, 2, 3, 3 and 1, respectively. Then, the following additional resource constraints are required:
$NY_{1,1} \ge Y_{1,1}$ (6.17)
$NY_{2,1} \ge Y_{2,1}$
$NY_{3,1} \ge Y_{3,1}$
$NY_{3,1} \ge Y_{4,1}$
$NY_{1,1} \ge Y_{5,1}$ (6.18)

The total cost function is:
$$C = 20 \cdot \#(H_1) + 25 \cdot \#(H_2) + 30 \cdot \#(H_3) + \text{cost of processor} + \text{cost of memory}$$
where $\#()$ denotes the number of instances of hardware components. This number can be computed from the variables introduced so far if the schedule is also taken into account.
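Because the example has only five 0/1 choices, the optimization can be illustrated by exhaustive search instead of an ILP solver. The sketch below is written under loudly stated assumptions: the execution times `T_HW` and `T_SW` are assumed values (the table values are not given in the text), tasks are assumed to run strictly one after another, one instance per hardware type is assumed to suffice, and memory cost is ignored. Only the component costs, the task-to-hardware options and the deadline of 100 time units come from the example.

```python
from itertools import product

HW_COST = {"H1": 20, "H2": 25, "H3": 30}                   # from the example
PROC_COST = 5                                              # processor P
HW_OPTION = {1: "H1", 2: "H2", 3: "H3", 4: "H3", 5: "H1"}  # X[v, m] choices
DEADLINE = 100                                             # timing constraint

# ASSUMED execution times, chosen for illustration only.
T_HW = {1: 20, 2: 20, 3: 12, 4: 12, 5: 20}
T_SW = {1: 100, 2: 100, 3: 10, 4: 10, 5: 100}


def solve():
    """Enumerate all mappings; return (cost, hardware types, software tasks)."""
    tasks = sorted(HW_OPTION)
    best = None
    for choice in product((0, 1), repeat=len(tasks)):
        # y[v] == 1 means Y[v, 1] = 1 (software); otherwise X[v, m] = 1.
        y = dict(zip(tasks, choice))
        # ASSUMPTION: purely sequential execution, so times simply add up.
        total_time = sum(T_SW[v] if y[v] else T_HW[v] for v in tasks)
        if total_time > DEADLINE:
            continue
        # ASSUMPTION: one instance of each used hardware type is enough.
        used_hw = {HW_OPTION[v] for v in tasks if not y[v]}
        cost = sum(HW_COST[h] for h in used_hw)
        if any(y.values()):
            cost += PROC_COST  # the processor is only paid for if used
        if best is None or cost < best[0]:
            best = (cost, sorted(used_hw), [v for v in tasks if y[v]])
    return best


cost, hardware, software_tasks = solve()
print(cost, hardware, software_tasks)
```

With these assumed numbers the search selects H1, H2 and the processor at a total cost of 50, with tasks 3 and 4 in software. A real ILP formulation would replace the sequential-time assumption with the precedence and timing constraints of the model.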

For a timing constraint of 100 time units, the minimum cost design comprises components $H_1$, $H_2$ and $P$. This means that tasks $T_3$ and $T_4$ are implemented in software and all others in hardware.

6.4 Mapping to heterogeneous multi-processors

The different approaches for this mapping can be classified by two criteria: mapping tools may either assume a fixed execution platform or design such a platform during the mapping, and they may or may not include automatic parallelization of the source code.

The DOL tools from ETH incorporate:
Automatic selection of computation templates
Automatic selection of communication techniques
Automatic selection of scheduling and arbitration

The input to DOL consists of a set of tasks together with use cases. The output describes the execution platform and the mapping of tasks to processors, together with task schedules. This output is expected to meet constraints and to maximize objectives.

[Figure: DOL problem graph]
[Figure: DOL architecture graph with components RISC, HWM1, HWM2, PTP bus and shared bus]

[Figure: DOL specification graph]

DOL implementation
An allocation $\alpha$: $\alpha$ is a subset of the architecture graph, representing hardware components allocated (selected) for a particular design.
A binding $\beta$: a selected subset of the edges between specification and architecture identifies a relation between the two. Selected edges are called bindings.
A schedule $\tau$: $\tau$ assigns start times to each node $v$ in the problem graph.
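The three parts of an implementation can be written down as plain data and checked for consistency. The following Python sketch is an illustration only and does not reflect the DOL tools' actual data model: the component names come from the architecture graph figure, while the problem graph nodes `t1` to `t3`, the candidate sets, the durations and the function `is_valid` are hypothetical. The precedence rule used for the schedule is also an assumption, and resource conflicts and communication nodes are not modeled.

```python
# Component names are taken from the architecture graph figure.
ARCHITECTURE = {"RISC", "HWM1", "HWM2", "PTP bus", "shared bus"}

# HYPOTHETICAL problem graph: (predecessor, successor) pairs.
PROBLEM_EDGES = [("t1", "t2"), ("t2", "t3")]

# HYPOTHETICAL specification edges: components each node may be bound to.
CANDIDATES = {
    "t1": {"RISC"},
    "t2": {"RISC", "HWM1"},
    "t3": {"RISC", "HWM2"},
}

# HYPOTHETICAL execution times, independent of the chosen component.
DURATION = {"t1": 2, "t2": 3, "t3": 1}


def is_valid(allocation, binding, schedule):
    """Check an (allocation, binding, schedule) triple for consistency."""
    # The allocation must be a subset of the architecture graph.
    if not allocation <= ARCHITECTURE:
        return False
    # Every problem graph node needs exactly one binding and a start time.
    if set(binding) != set(CANDIDATES) or set(schedule) != set(CANDIDATES):
        return False
    for node, component in binding.items():
        # Bindings are selected specification edges onto allocated components.
        if component not in CANDIDATES[node] or component not in allocation:
            return False
    # ASSUMPTION: a successor may start only after its predecessor finishes.
    for pred, succ in PROBLEM_EDGES:
        if schedule[succ] < schedule[pred] + DURATION[pred]:
            return False
    return True


allocation = {"RISC", "HWM1"}                          # alpha
binding = {"t1": "RISC", "t2": "HWM1", "t3": "RISC"}   # beta
schedule = {"t1": 0, "t2": 2, "t3": 5}                 # tau
print(is_valid(allocation, binding, schedule))
```

Binding `t3` to HWM2 with the same allocation would be rejected, because HWM2 is not part of the allocated subset.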