© 2004 Wayne Wolf
Topics
Task-level partitioning.
Hardware/software partitioning.
Bus-based systems.

System partitioning
Lagnese et al.: partition a large description based on functional information, not initial allocation.
Thomas et al.:
  developed a Verilog-based simulation system for performance evaluation
  assumes a bus-based CPU-ASIC model
  provides several types of communication primitives
  design evaluation is based on both static evaluation (time for a single execution) and dynamic evaluation

Hardware-software partitioning
Partitioning methods usually allow more than one ASIC.
Typically ignore CPU memory traffic in bus utilization estimates.
Typically assume that the CPU process blocks while waiting for the ASIC.
[Figure: CPU, ASIC, and mem blocks.]

Gupta and De Micheli
Target architecture: CPU + ASICs on a bus.
Break behavior into threads at nondeterministic delay points; the delay of each thread is bounded.
Software threads run under an RTOS; threads communicate via queues.

Specification and modeling
Specified in HardwareC.
Spec divided into threads at non-deterministic delay points.
Hardware properties: size, number of clock cycles.
CPU/software thread properties:
  thread latency
  thread reaction rate
  processor utilization
  bus utilization
CPU and ASIC execution are non-overlapping.

HW/SW allocation
Start with unbounded-delay threads in the CPU, rest of the threads in the ASIC.
Optimization:
  test one thread for a move
  if the move to SW does not violate the performance requirement, move the thread
  feasibility depends on SW and HW run times and bus utilization
  if the thread is moved, immediately try moving its successor threads
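
The allocation loop above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the `Thread` fields, the `feasible` test (simple CPU-time and bus-load budgets), and all numbers are hypothetical stand-ins for the real performance model.

```python
from dataclasses import dataclass, field

@dataclass(eq=False)  # compare threads by identity, not by field values
class Thread:
    name: str
    sw_time: float            # hypothetical software run time
    bus_load: float           # hypothetical bus utilization if run in software
    unbounded: bool = False   # True for unbounded-delay threads
    successors: list = field(default_factory=list)

def feasible(sw_threads, cpu_budget, bus_budget):
    """Assumed feasibility test: CPU time and bus load stay within budgets."""
    return (sum(t.sw_time for t in sw_threads) <= cpu_budget
            and sum(t.bus_load for t in sw_threads) <= bus_budget)

def allocate(threads, cpu_budget, bus_budget):
    # Initial allocation: unbounded-delay threads in software, the rest in hardware.
    sw = [t for t in threads if t.unbounded]
    hw = [t for t in threads if not t.unbounded]

    def try_move(t):
        if t not in hw:
            return
        if feasible(sw + [t], cpu_budget, bus_budget):
            hw.remove(t)
            sw.append(t)
            # A successful move immediately triggers attempts on the successors.
            for s in t.successors:
                try_move(s)

    for t in list(hw):
        try_move(t)
    return sw, hw

c = Thread("c", sw_time=1, bus_load=0.1)
b = Thread("b", sw_time=3, bus_load=0.2, successors=[c])
a = Thread("a", sw_time=2, bus_load=0.1, unbounded=True)
d = Thread("d", sw_time=10, bus_load=0.5)
sw, hw = allocate([a, b, c, d], cpu_budget=7, bus_budget=0.6)
print([t.name for t in sw], [t.name for t in hw])
```

With these invented numbers, thread d stays in hardware because moving it would exceed the assumed CPU budget.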

COSYMA
Ernst et al.: moves operations from software to hardware.
Operations are moved to hardware in units of basic blocks.
Estimates communication overhead based on bus operations and register allocation.
Hardware and software communicate by shared memory.

COSYMA design flow
[Figure: design flow with blocks C*, ES graph, partitioning, cost estimation, gnu C, run time analysis, CDFG, high-level synthesis.]

Cost estimation
Speedup estimate for basic block b:
  $c(b) = w \, \bigl(t_{HW}(b) - t_{SW}(b) + t_{com}(Z) - t_{com}(Z + b)\bigr) \cdot It(b)$
  $w$ = weight, $It(b)$ = number of iterations taken on b
Sources of estimates:
  Software execution time ($t_{SW}$) is estimated from source code.
  Hardware execution time ($t_{HW}$) is estimated by list scheduling.
  Communication time ($t_{com}$) is estimated by data flow analysis of adjacent basic blocks.
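
As a small worked instance of the estimate above, the function below evaluates the formula term by term. The function and parameter names are hypothetical, and the numbers are invented purely for illustration; they do not come from COSYMA measurements.

```python
def speedup_estimate(w, t_hw, t_sw, t_com_z, t_com_z_plus_b, iterations):
    """Evaluate c(b) = w * (t_HW(b) - t_SW(b) + t_com(Z) - t_com(Z + b)) * It(b)."""
    return w * (t_hw - t_sw + t_com_z - t_com_z_plus_b) * iterations

# Invented example values (arbitrary time units).
c_b = speedup_estimate(w=1, t_hw=4, t_sw=20, t_com_z=6, t_com_z_plus_b=9,
                       iterations=100)
print(c_b)  # -1900
```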

COSYMA optimization
Goal: satisfy execution time.
User specifies the maximum number of function units in the co-processor.
Start with all basic blocks in software.
Estimate the potential speedup of moving a basic block to hardware using execution profiling.
Search using simulated annealing.
Impose a high cost penalty for solutions that don’t meet execution time.
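
A minimal sketch of an annealing search with a deadline penalty is shown below. It is not COSYMA's actual optimizer: the block timings, the deadline, the penalty weight, the cooling schedule, and the use of the hardware block count as a stand-in for hardware cost are all assumptions made for illustration.

```python
import math
import random

# Hypothetical per-block estimates: name -> (software time, hardware time).
BLOCKS = {"b0": (40, 10), "b1": (25, 5), "b2": (10, 8), "b3": (30, 12)}
DEADLINE = 70     # assumed execution-time requirement
PENALTY = 1000    # assumed large weight for missing the deadline

def exec_time(in_hw):
    """Total time when the blocks named in in_hw run in hardware."""
    return sum(hw if name in in_hw else sw for name, (sw, hw) in BLOCKS.items())

def cost(in_hw):
    """Hardware block count plus a heavy penalty per unit of deadline overrun."""
    overrun = max(0, exec_time(in_hw) - DEADLINE)
    return len(in_hw) + PENALTY * overrun

def anneal(steps=2000, temp=50.0, cooling=0.995, seed=0):
    rng = random.Random(seed)
    current = frozenset()          # start with all basic blocks in software
    best = current
    for _ in range(steps):
        block = rng.choice(sorted(BLOCKS))
        candidate = current ^ {block}      # toggle one block between SW and HW
        delta = cost(candidate) - cost(current)
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current = candidate
            if cost(current) < cost(best):
                best = current
        temp *= cooling
    return best

best = anneal()
print(sorted(best), exec_time(best), cost(best))
```

Because the penalty dominates the cost, any partition that misses the deadline is far more expensive than one that meets it, which steers the search toward feasible solutions first.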

Two-phase optimization
Inner loop uses estimates to search through the design space quickly.
Outer loop uses detailed measurements to check the validity of inner loop assumptions:
  code is compiled and measured
  ASIC is synthesized
Results of the detailed estimate are used to apply a correction to the current solution for the next run of the inner loop.

Vahid et al.
Uses binary search to minimize hardware cost while satisfying performance.
Cost and performance compete; to reduce competition, accept any solution with cost below C_size.
Cost function:
  k_perf (performance violations) + k_area (hardware size). k

Kalavade et al.
Uses both local and global measures to meet performance objectives and minimize cost.
Global criterion: degree to which performance is critically affected by a component.
Local criterion: heterogeneity of a node = implementation cost.
  a function which has a high cost in one mapping but a low cost in the other is an extremity
  two functions which have very different implementation requirements (precision, etc.) repel each other into different implementations

GCLP algorithm
Schedule one node at a time:
  compute the critical path
  select a node on the critical path for assignment
  evaluate the effect of a change in allocation of this node
  if performance is critical, reallocate for performance, else reallocate for cost
Extremity value helps avoid assigning an operation to a partition where it clearly doesn’t belong.
Repellers help reduce implementation cost.
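
The node-at-a-time loop above can be illustrated with a heavily simplified sketch. It is not the GCLP algorithm itself: extremity and repeller measures are omitted, "performance is critical" is reduced to an assumed test (the current critical path exceeds the deadline), and the task graph, timings, and areas are hypothetical.

```python
# Hypothetical task graph: node -> (sw_time, hw_time, hw_area, successors).
GRAPH = {
    "a": (10, 3, 8, ["b", "c"]),
    "b": (20, 5, 15, ["d"]),
    "c": (6, 4, 5, ["d"]),
    "d": (12, 4, 10, []),
}
DEADLINE = 30  # assumed performance requirement

def node_time(n, mapping):
    sw, hw, _, _ = GRAPH[n]
    return hw if mapping.get(n) == "hw" else sw  # unmapped nodes count as software

def critical_path(mapping):
    """Return (length, nodes) of the longest path under the current mapping."""
    memo = {}
    def longest(n):
        if n not in memo:
            tail = max((longest(s) for s in GRAPH[n][3]), default=(0, []))
            memo[n] = (node_time(n, mapping) + tail[0], [n] + tail[1])
        return memo[n]
    return max(longest(n) for n in GRAPH)

def assign_nodes():
    mapping = {}
    while len(mapping) < len(GRAPH):
        length, path = critical_path(mapping)
        on_path = [n for n in path if n not in mapping]
        # Prefer an unmapped node on the critical path; otherwise take any unmapped node.
        node = on_path[0] if on_path else next(n for n in GRAPH if n not in mapping)
        sw, hw, _, _ = GRAPH[node]
        if length > DEADLINE:
            mapping[node] = "hw" if hw < sw else "sw"  # allocate for performance
        else:
            mapping[node] = "sw"                       # allocate for cost (no hardware area)
    return mapping

mapping = assign_nodes()
print(mapping, critical_path(mapping)[0])
```

With these invented numbers, nodes a and b go to hardware while the path is still too long, and c and d stay in software once the deadline is met.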

D’Ambrosio et al.
Use a general-purpose optimizer for HW/SW assignment.
Can model both hard and soft deadlines.
Measure expandability of the system as the difference between upper and lower performance bounds.
A loose upper bound on CPU utilization leads to excessive hardware cost in the final result.
Use simulation to estimate the execution time of each process.

Binary search algorithm
If a zero-cost solution is found for a given hardware size, a zero-cost solution is guaranteed to exist for any larger hardware size.
Therefore, binary search can be used to select a satisfying solution.
Evaluate the cost of a point when it is tested, rather than generating the costs of all points in advance.
Sufficient to look for a zero-cost solution:
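
The monotonicity argument above can be sketched as a binary search over the hardware size limit. Everything concrete here is an assumption for illustration: the block data, the deadline, the weights `K_PERF` and `K_AREA`, the exact form of the cost function, and the exhaustive enumeration that stands in for a real partitioner.

```python
from itertools import combinations

# Hypothetical blocks: name -> (software time, hardware time, hardware area).
BLOCKS = {"b0": (40, 10, 30), "b1": (25, 5, 20), "b2": (10, 8, 5), "b3": (30, 12, 25)}
DEADLINE = 70
K_PERF, K_AREA = 100, 1   # assumed weights

def cost(in_hw, size_limit):
    """Assumed cost: weighted deadline overrun plus weighted area over the limit."""
    time = sum(hw if n in in_hw else sw for n, (sw, hw, _) in BLOCKS.items())
    area = sum(BLOCKS[n][2] for n in in_hw)
    return K_PERF * max(0, time - DEADLINE) + K_AREA * max(0, area - size_limit)

def has_zero_cost_solution(size_limit):
    """Stand-in partitioner: try every subset of blocks (fine for tiny examples)."""
    names = sorted(BLOCKS)
    return any(cost(set(c), size_limit) == 0
               for r in range(len(names) + 1)
               for c in combinations(names, r))

def min_hardware_size(predicate, lo, hi):
    """Smallest size in [lo, hi] where predicate holds, assuming it is monotone."""
    if not predicate(hi):
        return None
    while lo < hi:
        mid = (lo + hi) // 2
        if predicate(mid):   # cost is evaluated only when a point is tested
            hi = mid
        else:
            lo = mid + 1
    return lo

smallest = min_hardware_size(has_zero_cost_solution, 0, 80)
print(smallest)  # 45
```

Each size limit is evaluated only when the search reaches it, so the number of partitioning runs grows with the logarithm of the size range rather than with the range itself.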