Universität Dortmund - 1 -  P. Marwedel, Univ. Dortmund, Informatik 12, 2003 Hardware/software partitioning  Functionality to be implemented in software.

Slides:



Advertisements
Similar presentations
Fakultät für informatik informatik 12 technische universität dortmund Optimizations - Compilation for Embedded Processors - Peter Marwedel TU Dortmund.
Advertisements

Fakultät für informatik informatik 12 technische universität dortmund Standard Optimization Techniques Peter Marwedel Informatik 12 TU Dortmund Germany.
© 2004 Wayne Wolf Topics Task-level partitioning. Hardware/software partitioning.  Bus-based systems.
- 1 -  P. Marwedel, Univ. Dortmund, Informatik 12, 2003 Universität Dortmund Hardware/Software Codesign.
Hardware/ Software Partitioning 2011 年 12 月 09 日 Peter Marwedel TU Dortmund, Informatik 12 Germany Graphics: © Alexandra Nolte, Gesine Marwedel, 2003 These.
ECE-777 System Level Design and Automation Hardware/Software Co-design
EECE **** Embedded System Design
ECOE 560 Design Methodologies and Tools for Software/Hardware Systems Spring 2004 Serdar Taşıran.
Introduction to Computer Science 2 Lecture 7: Extended binary trees
Winter 2005ICS 252-Intro to Computer Design ICS 252 Introduction to Computer Design Lecture 5-Scheudling Algorithms Winter 2005 Eli Bozorgzadeh Computer.
ISBN Chapter 3 Describing Syntax and Semantics.
- 1 -  P. Marwedel, Univ. Dortmund, Informatik 12, 05/06 Universität Dortmund Hardware/Software Codesign.
1 HW/SW Partitioning Embedded Systems Design. 2 Hardware/Software Codesign “Exploration of the system design space formed by combinations of hardware.
CS244-Introduction to Embedded Systems and Ubiquitous Computing Instructor: Eli Bozorgzadeh Computer Science Department UC Irvine Winter 2010.
Courseware High-Level Synthesis an introduction Prof. Jan Madsen Informatics and Mathematical Modelling Technical University of Denmark Richard Petersens.
Mahapatra-Texas A&M-Fall'001 cosynthesis Introduction to cosynthesis Rabi Mahapatra CPSC498.
Process Scheduling for Performance Estimation and Synthesis of Hardware/Software Systems Slide 1 Process Scheduling for Performance Estimation and Synthesis.
CS 104 Introduction to Computer Science and Graphics Problems
System Partitioning Kris Kuchcinski
1 EE249 Discussion A Method for Architecture Exploration for Heterogeneous Signal Processing Systems Sam Williams EE249 Discussion Section October 15,
Penn ESE535 Spring DeHon 1 ESE535: Electronic Design Automation Day 5: February 2, 2009 Architecture Synthesis (Provisioning, Allocation)
Mahapatra-Texas A&M-Fall'001 Partitioning - I Introduction to Partitioning.
A Tool for Partitioning and Pipelined Scheduling of Hardware-Software Systems Karam S Chatha and Ranga Vemuri Department of ECECS University of Cincinnati.
On the Task Assignment Problem : Two New Efficient Heuristic Algorithms.
ICS 252 Introduction to Computer Design
Describing Syntax and Semantics
Penn ESE535 Spring DeHon 1 ESE535: Electronic Design Automation Day 5: February 2, 2009 Architecture Synthesis (Provisioning, Allocation)
Hardware/Software Partitioning Witawas Srisa-an Embedded Systems Design and Implementation.
Technische Universität Dortmund Classical scheduling algorithms for periodic systems Peter Marwedel TU Dortmund, Informatik 12 Germany 2007/12/14.
- 1 -  P. Marwedel, Univ. Dortmund, Informatik 12, 2003 Universität Dortmund Actual design flows and tools.
- 1 - EE898-HW/SW co-design Hardware/Software Codesign “Finding right combination of HW/SW resulting in the most efficient product meeting the specification”
Task Alloc. In Dist. Embed. Systems Murat Semerci A.Yasin Çitkaya CMPE 511 COMPUTER ARCHITECTURE.
Data Structures & AlgorithmsIT 0501 Algorithm Analysis I.
May 2004 Department of Electrical and Computer Engineering 1 ANEW GRAPH STRUCTURE FOR HARDWARE- SOFTWARE PARTITIONING OF HETEROGENEOUS SYSTEMS A NEW GRAPH.
Types of IP Models All-integer linear programs Mixed integer linear programs (MILP) Binary integer linear programs, mixed or all integer: some or all of.
SOFTWARE / HARDWARE PARTITIONING TECHNIQUES SHaPES: A New Approach.
1 Exploring Custom Instruction Synthesis for Application-Specific Instruction Set Processors with Multiple Design Objectives Lin, Hai Fei, Yunsi ACM/IEEE.
IEEE ICECS 2010 SysPy: Using Python for processor-centric SoC design Evangelos Logaras Elias S. Manolakos {evlog, Department of Informatics.
1 Towards Optimal Custom Instruction Processors Wayne Luk Kubilay Atasu, Rob Dimond and Oskar Mencer Department of Computing Imperial College London HOT.
Hardware/Software Co-design Design of Hardware/Software Systems A Class Presentation for VLSI Course by : Akbar Sharifi Based on the work presented in.
Evaluation and Validation Peter Marwedel TU Dortmund, Informatik 12 Germany 2013 年 12 月 02 日 These slides use Microsoft clip arts. Microsoft copyright.
A Graph Based Algorithm for Data Path Optimization in Custom Processors J. Trajkovic, M. Reshadi, B. Gorjiara, D. Gajski Center for Embedded Computer Systems.
Chapter 5B: Hardware/Software Codesign / Partitioning EECE **** Embedded System Design.
- 1 - EE898_HW/SW Partitioning Hardware/software partitioning  Functionality to be implemented in software or in hardware? No need to consider special.
Hardware/Software Codesign of Embedded Systems
© Janice Regan, CMPT 300, May CMPT 300 Introduction to Operating Systems Memory: Relocation.
CS244-Introduction to Embedded Systems and Ubiquitous Computing Instructor: Eli Bozorgzadeh Computer Science Department UC Irvine Winter 2010.
Resource Mapping and Scheduling for Heterogeneous Network Processor Systems Liang Yang, Tushar Gohad, Pavel Ghosh, Devesh Sinha, Arunabha Sen and Andrea.
1 Memory Management Chapter 7. 2 Memory Management Subdividing memory to accommodate multiple processes Memory needs to be allocated to ensure a reasonable.
Exact and heuristics algorithms
C OMPARING T HREE H EURISTIC S EARCH M ETHODS FOR F UNCTIONAL P ARTITIONING IN H ARDWARE -S OFTWARE C ODESIGN Theerayod Wiangtong, Peter Y. K. Cheung and.
1 SYNTHESIS of PIPELINED SYSTEMS for the CONTEMPORANEOUS EXECUTION of PERIODIC and APERIODIC TASKS with HARD REAL-TIME CONSTRAINTS Paolo Palazzari Luca.
6. A PPLICATION MAPPING 6.3 HW/SW partitioning 6.4 Mapping to heterogeneous multi-processors 1 6. Application mapping (part 2)
System-level power analysis and estimation September 20, 2006 Chong-Min Kyung.
Operational Research & ManagementOperations Scheduling Economic Lot Scheduling 1.Summary Machine Scheduling 2.ELSP (one item, multiple items) 3.Arbitrary.
CSCI1600: Embedded and Real Time Software Lecture 33: Worst Case Execution Time Steven Reiss, Fall 2015.
Winter-Spring 2001Codesign of Embedded Systems1 Co-Synthesis Algorithms: Distributed System Co- Synthesis Part of HW/SW Codesign of Embedded Systems Course.
Physically Aware HW/SW Partitioning for Reconfigurable Architectures with Partial Dynamic Reconfiguration Sudarshan Banarjee, Elaheh Bozorgzadeh, Nikil.
Production Scheduling Lorena Kawas lk2551 Raul Galindo rg2802.
Jamie Unger-Fink John David Eriksen.  Allocation and Scheduling Problem  Better MPSoC optimization tool needed  IP and CP alone not good enough  Communication.
1 Chapter 5 Branch-and-bound Framework and Its Applications.
Multi-cellular paradigm The molecular level can support self- replication (and self- repair). But we also need cells that can be designed to fit the specific.
Integer Programming An integer linear program (ILP) is defined exactly as a linear program except that values of variables in a feasible solution have.
The Hardware / Software Tradeoff -John Burnette-
Introduction to cosynthesis Rabi Mahapatra CSCE617
Standard Optimization Techniques
CSCI1600: Embedded and Real Time Software
Integer Programming (정수계획법)
Integer Programming (정수계획법)
CSCI1600: Embedded and Real Time Software
Presentation transcript:

Universität Dortmund  P. Marwedel, Univ. Dortmund, Informatik 12, 2003 Hardware/software partitioning  Functionality to be implemented in software or in hardware? No need to consider special purpose hardware in the long run? Correct for fixed functionality, but wrong in general, since “By the time MPEG-n can be implemented in software, MPEG- n+1 has been invented” [de Man]

Universität Dortmund  P. Marwedel, Univ. Dortmund, Informatik 12, 2003 Functionality to be implemented in software or in hardware? Decision based on hardware/ software partitioning, a special case of hardware/ software codesign.

Universität Dortmund  P. Marwedel, Univ. Dortmund, Informatik 12, 2003 Codesign Tool (COOL) as an example of HW/SW partitioning Inputs to COOL: 1.Target technology 2.Design constraints 3.Required behavior Inputs to COOL: 1.Target technology 2.Design constraints 3.Required behavior

Universität Dortmund  P. Marwedel, Univ. Dortmund, Informatik 12, 2003 Hardware/software codesign: approach [Niemann, Hardware/Software Co-Design for Data Flow Dominated Embedded Systems, Kluwer Academic Publishers, 1998 (Comprehensive mathematical model)] Processor P1 Processor P2 Hardware Specification Mapping

Universität Dortmund  P. Marwedel, Univ. Dortmund, Informatik 12, 2003 Steps of the COOL partitioning algorithm (1) 1.Translation of the behavior into an internal graph model 2.Translation of the behavior of each node from VHDL into C 3.Compilation All C programs compiled for the target processor, Computation of the resulting program size, estimation of the resulting execution time (simulation input data might be required) 4.Synthesis of hardware components:  leaf node, application-specific hardware is synthesized. High-level synthesis sufficiently fast. 1.Translation of the behavior into an internal graph model 2.Translation of the behavior of each node from VHDL into C 3.Compilation All C programs compiled for the target processor, Computation of the resulting program size, estimation of the resulting execution time (simulation input data might be required) 4.Synthesis of hardware components:  leaf node, application-specific hardware is synthesized. High-level synthesis sufficiently fast.

Universität Dortmund  P. Marwedel, Univ. Dortmund, Informatik 12, 2003 Steps of the COOL partitioning algorithm (2) 4.Flattening of the hierarchy: Granularity used by the designer is maintained. Cost and performance information added to the nodes. Precise information required for partitioning is pre- computed 5.Generating and solving a mathematical model of the optimization problem: Integer programming IP model for optimization. Optimal with respect to the cost function (approximates communication time) 4.Flattening of the hierarchy: Granularity used by the designer is maintained. Cost and performance information added to the nodes. Precise information required for partitioning is pre- computed 5.Generating and solving a mathematical model of the optimization problem: Integer programming IP model for optimization. Optimal with respect to the cost function (approximates communication time)

Universität Dortmund  P. Marwedel, Univ. Dortmund, Informatik 12, 2003 Steps of the COOL partitioning algorithm (3) 6.Iterative improvements: Adjacent nodes mapped to the same hardware component are now merged.

Universität Dortmund  P. Marwedel, Univ. Dortmund, Informatik 12, 2003 Steps of the COOL partitioning algorithm (4) 7.Interface synthesis: After partitioning, the glue logic required for interfacing processors, application-specific hardware and memories is created.

Universität Dortmund  P. Marwedel, Univ. Dortmund, Informatik 12, 2003 Integer programming models Ingredients: 1.Cost function 2.Constraints Ingredients: 1.Cost function 2.Constraints Involving linear expressions of integer variables from a set X Def.: The problem of minimizing (1) subject to the constraints (2) is called an integer programming (IP) problem. If all x i are constrained to be either 0 or 1, the IP problem said to be a 0/1 integer programming problem. Cost function Constraints: ℕ ℝ

Universität Dortmund  P. Marwedel, Univ. Dortmund, Informatik 12, 2003 Example Optimal C

Universität Dortmund  P. Marwedel, Univ. Dortmund, Informatik 12, 2003 Remarks on integer programming  Maximizing the cost function can be done by setting C‘=-C  Integer programming is NP-complete.  In practice, running times can increase exponentially with the size of the problem, but problems of some thousands of variables can still be solved with commercial solvers, depending on the size and structure of the problem.  IP models can be a good starting point for modeling, even if in the end heuristics have to be used to solve them.  Maximizing the cost function can be done by setting C‘=-C  Integer programming is NP-complete.  In practice, running times can increase exponentially with the size of the problem, but problems of some thousands of variables can still be solved with commercial solvers, depending on the size and structure of the problem.  IP models can be a good starting point for modeling, even if in the end heuristics have to be used to solve them.

Universität Dortmund  P. Marwedel, Univ. Dortmund, Informatik 12, 2003 An IP model for HW/SW partitioning  Notation:  Index set I denotes task graph nodes.  Index set L denotes task graph node types e.g. square root, DCT or FFT  Index set KH denotes hardware component types. e.g. hardware components for the DCT or the FFT.  Index set J of hardware component instances  Index set KP denotes processors. All processors are assumed to be of the same type  Notation:  Index set I denotes task graph nodes.  Index set L denotes task graph node types e.g. square root, DCT or FFT  Index set KH denotes hardware component types. e.g. hardware components for the DCT or the FFT.  Index set J of hardware component instances  Index set KP denotes processors. All processors are assumed to be of the same type

Universität Dortmund  P. Marwedel, Univ. Dortmund, Informatik 12, 2003 An IP model for HW/SW partitioning  X i,k : =1 if node v i is mapped to hardware component type k  KH and 0 otherwise.  Y i,k : =1 if node v i is mapped to processor k  KP and 0 otherwise.  NY l,k =1 if at least one node of type l is mapped to processor k  KP and 0 otherwise.  T is a mapping from task graph nodes to their types: T: I  L  The cost function accumulates the cost of hardware units: C = cost(processors) + cost(memories) + cost(application specific hardware)  X i,k : =1 if node v i is mapped to hardware component type k  KH and 0 otherwise.  Y i,k : =1 if node v i is mapped to processor k  KP and 0 otherwise.  NY l,k =1 if at least one node of type l is mapped to processor k  KP and 0 otherwise.  T is a mapping from task graph nodes to their types: T: I  L  The cost function accumulates the cost of hardware units: C = cost(processors) + cost(memories) + cost(application specific hardware)

Universität Dortmund  P. Marwedel, Univ. Dortmund, Informatik 12, 2003 Constraints  Operation assignment constraints All task graph nodes have to be mapped either in software or in hardware. Variables are assumed to be integers. Additional constraints to guarantee they are either 0 or 1:

Universität Dortmund  P. Marwedel, Univ. Dortmund, Informatik 12, 2003 Operation assignment constraints (2)  l  L,  i:T(v i )=c l,  k  KP: NY l,k  Y i,k For all types l of operations and for all nodes i of this type: if i is mapped to some processor k, then that processor must implement the functionality of l. Decision variables must also be 0/1 variables:  l  L,  k  KP: NY l,k  1.  l  L,  i:T(v i )=c l,  k  KP: NY l,k  Y i,k For all types l of operations and for all nodes i of this type: if i is mapped to some processor k, then that processor must implement the functionality of l. Decision variables must also be 0/1 variables:  l  L,  k  KP: NY l,k  1.

Universität Dortmund  P. Marwedel, Univ. Dortmund, Informatik 12, 2003 Resource & design constraints  k  KH, the cost (area) used for components of that type is calculated as the sum of the costs of the components of that type. This cost should not exceed its maximum.  k  KP, the cost for associated data storage area should not exceed its maximum.  k  KP the cost for storing instructions should not exceed its maximum. The total cost (  k  KH ) of HW components should not exceed its maximum The total cost of data memories (  k  KP ) should not exceed its maximum The total cost instruction memories (  k  KP ) should not exceed its maximum  k  KH, the cost (area) used for components of that type is calculated as the sum of the costs of the components of that type. This cost should not exceed its maximum.  k  KP, the cost for associated data storage area should not exceed its maximum.  k  KP the cost for storing instructions should not exceed its maximum. The total cost (  k  KH ) of HW components should not exceed its maximum The total cost of data memories (  k  KP ) should not exceed its maximum The total cost instruction memories (  k  KP ) should not exceed its maximum

Universität Dortmund  P. Marwedel, Univ. Dortmund, Informatik 12, 2003 Scheduling Processor p 1 ASIC h 1 FIR 1 FIR 2 v1v1 v2v2 v3v3 v4v4 v9v9 v 10 v 11 v5v5 v6v6 v7v7 v8v8 e3e3 e4e4 t p1p1 v8v8 v7v7 v7v7 v8v8 or... t c1c1 or... e3e3 e3e3 e4e4 e4e4 t FIR 2 on h 1 v4v4 v3v3 v3v3 v4v4 or... Communication channel c 1

Universität Dortmund  P. Marwedel, Univ. Dortmund, Informatik 12, 2003 Scheduling / precedence constraints For all nodes v i1 and v i2 that are potentially mapped to the same processor or hardware component instance, introduce a binary decision variable b i1,i2 with b i1,i2 =1 if v i1 is executed before v i2 and = 0 otherwise. Define constraints of the type (end-time of v i1 )  (start time of v i2 ) if b i1,i2 =1 and (end-time of v i2 )  (start time of v i1 ) if b i1,i2 =0 Ensure that the schedule for executing operations is consistent with the precedence constraints in the task graph. For all nodes v i1 and v i2 that are potentially mapped to the same processor or hardware component instance, introduce a binary decision variable b i1,i2 with b i1,i2 =1 if v i1 is executed before v i2 and = 0 otherwise. Define constraints of the type (end-time of v i1 )  (start time of v i2 ) if b i1,i2 =1 and (end-time of v i2 )  (start time of v i1 ) if b i1,i2 =0 Ensure that the schedule for executing operations is consistent with the precedence constraints in the task graph.

Universität Dortmund  P. Marwedel, Univ. Dortmund, Informatik 12, 2003 Other constraints Timing constraints These constraints can be used to guarantee that certain time constraints are met. Some less important constraints omitted.. Timing constraints These constraints can be used to guarantee that certain time constraints are met. Some less important constraints omitted..

Universität Dortmund  P. Marwedel, Univ. Dortmund, Informatik 12, 2003 Example HW types H1, H2 and H3 with costs of 20, 25, and 30. Processors of type P. Tasks T1 to T5. Execution times: HW types H1, H2 and H3 with costs of 20, 25, and 30. Processors of type P. Tasks T1 to T5. Execution times: TH1H2H3P

Universität Dortmund  P. Marwedel, Univ. Dortmund, Informatik 12, 2003 Operation assignment constraints (1) TH1H2H3P X 1,1 +Y 1,1 =1 (task 1 mapped to H1 or to P) X 2,2 +Y 2,1 =1 X 3,3 +Y 3,1 =1 X 4,3 +Y 4,1 =1 X 5,1 +Y 5,1 =1

Universität Dortmund  P. Marwedel, Univ. Dortmund, Informatik 12, 2003 Operation assignment constraints (2) Assume types of tasks are l=1, 2, 3, 3, and 1.  l  L,  i:T(v i )=c l,  k  KP: NY l,k  Y i,k Assume types of tasks are l=1, 2, 3, 3, and 1.  l  L,  i:T(v i )=c l,  k  KP: NY l,k  Y i,k Functionality 3 to be implemented on processor if node 4 is mapped to it.

Universität Dortmund  P. Marwedel, Univ. Dortmund, Informatik 12, 2003 Other equations Time constraints leading to: Application specific hardware required for time constraints under 100 time units. TH1H2H3P Cost function: C=20 #(H1) + 25 #(H2) + 30 # (H3) + cost(processor) + cost(memory)

Universität Dortmund  P. Marwedel, Univ. Dortmund, Informatik 12, 2003 Result For a time constraint of 100 time units and cost(P)<cost(H3): TH1H2H3P Solution (educated guessing) : T1  H1 T2  H2 T3  P T4  P T5  H1

Universität Dortmund  P. Marwedel, Univ. Dortmund, Informatik 12, 2003 Separation of scheduling and partitioning Combined scheduling/partitioning very complex;  Heuristic: 1.Compute estimated schedule 2.Perform partitioning for estimated schedule 3.Perform final scheduling 4.If final schedule does not meet time constraint, go to 1 using a reduced overall timing constraint. Combined scheduling/partitioning very complex;  Heuristic: 1.Compute estimated schedule 2.Perform partitioning for estimated schedule 3.Perform final scheduling 4.If final schedule does not meet time constraint, go to 1 using a reduced overall timing constraint. 2. Iteration t specification Actual execution time 1. Iteration approx. execution time t Actual execution time approx. execution time New specification

Universität Dortmund  P. Marwedel, Univ. Dortmund, Informatik 12, 2003 Application example Audio lab (mixer, fader, echo, equalizer, balance units) slow SPARC processor 1µ ASIC library Allowable delay of µs (~ 44.1 kHz) Audio lab (mixer, fader, echo, equalizer, balance units) slow SPARC processor 1µ ASIC library Allowable delay of µs (~ 44.1 kHz) SPARC processor ASIC (Compass, 1 µ) External memory Outdated technology; just a proof of concept.

Universität Dortmund  P. Marwedel, Univ. Dortmund, Informatik 12, 2003 Running time for COOL optimization  Only simple models can be solved optimally.

Universität Dortmund  P. Marwedel, Univ. Dortmund, Informatik 12, 2003 Deviation from optimal design  Hardly any loss in design quality.

Universität Dortmund  P. Marwedel, Univ. Dortmund, Informatik 12, 2003 Running time for heuristic

Universität Dortmund  P. Marwedel, Univ. Dortmund, Informatik 12, 2003 Design space for audio lab Everything in software: 72.9 µs, 0 2 Everything in hardware: 3.06 µs, 457.9x Lowest cost for given sample rate: 18.6 µs, 78.4x10 6 2,

Universität Dortmund  P. Marwedel, Univ. Dortmund, Informatik 12, 2003 Final remarks COOL approach: shows that formal model of hardware/SW codesign is beneficial; IP modeling can lead to useful implementation even if optimal result is available only for small designs. Other approaches for HW/SW partitioning: starting with everything mapped to hardware; gradually moving to software as long as timing constraint is met. starting with everything mapped to software; gradually moving to hardware until timing constraint is met. Binary search. COOL approach: shows that formal model of hardware/SW codesign is beneficial; IP modeling can lead to useful implementation even if optimal result is available only for small designs. Other approaches for HW/SW partitioning: starting with everything mapped to hardware; gradually moving to software as long as timing constraint is met. starting with everything mapped to software; gradually moving to hardware until timing constraint is met. Binary search.