Published by Jessica Widmer
Modified over 2 years ago
© 2004 Wayne Wolf

Topics
- Task-level partitioning.
- Hardware/software partitioning.
- Bus-based systems.

System partitioning
- Lagnese et al.: partition a large description based on functional information, not initial allocation.
- Thomas et al.: developed a Verilog-based simulation system for performance evaluation:
  - assumes a bus-based CPU-ASIC model;
  - provides several types of communication primitives;
  - design evaluation is based on both static evaluation (time for a single execution) and dynamic evaluation.

Hardware-software partitioning
- Partitioning methods usually allow more than one ASIC.
- Typically ignore CPU memory traffic in bus utilization estimates.
- Typically assume that the CPU process blocks while waiting for the ASIC.

[Figure: block diagram with CPU, ASIC, and mem]

Gupta and De Micheli
- Target architecture: CPU + ASICs on a bus.
- Break behavior into threads at nondeterministic delay points; the delay of each thread is bounded.
- Software threads run under an RTOS; threads communicate via queues.

Specification and modeling
- Specified in HardwareC.
- Spec is divided into threads at nondeterministic delay points.
- Hardware properties: size, number of clock cycles.
- CPU/software thread properties:
  - thread latency;
  - thread reaction rate;
  - processor utilization;
  - bus utilization.
- CPU and ASIC execution do not overlap.

HW/SW allocation
- Start with unbounded-delay threads in the CPU, the rest of the threads in the ASIC.
- Optimization:
  - test one thread for a move;
  - if the move to SW does not violate the performance requirement, move the thread;
  - feasibility depends on SW and HW run times and on bus utilization;
  - if a thread is moved, immediately try moving its successor threads.
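
The allocation loop on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the `Thread` class, the `sw_time` field, and the single `cpu_budget` feasibility test are hypothetical stand-ins for the real check, which the slide says depends on SW and HW run times and on bus utilization.

```python
from dataclasses import dataclass, field

@dataclass
class Thread:
    name: str
    sw_time: float                      # hypothetical software run time
    bounded: bool = True                # False for unbounded-delay threads
    successors: list = field(default_factory=list)

def allocate(threads, cpu_budget):
    """Greedy sketch: unbounded-delay threads start in SW, the rest in HW;
    threads are then moved to SW while an assumed CPU-time budget holds."""
    by_name = {t.name: t for t in threads}
    software = {t.name for t in threads if not t.bounded}
    load = sum(by_name[n].sw_time for n in software)

    def try_move(t):
        nonlocal load
        # Assumed feasibility test: total software time must fit the budget.
        if t.name in software or load + t.sw_time > cpu_budget:
            return
        software.add(t.name)
        load += t.sw_time
        for s in t.successors:          # immediately try the successors
            try_move(by_name[s])

    for t in threads:
        try_move(t)
    hardware = {t.name for t in threads} - software
    return software, hardware

threads = [
    Thread("io", 2, bounded=False),
    Thread("a", 3, successors=["b"]),
    Thread("b", 4),
    Thread("c", 5),
]
sw, hw = allocate(threads, cpu_budget=10)
print(sorted(sw), sorted(hw))   # ['a', 'b', 'io'] ['c']
```

Thread `c` stays in hardware because moving it would push the assumed software load to 14, above the budget of 10.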

COSYMA
- Ernst et al.: moves operations from software to hardware.
- Operations are moved to hardware in units of basic blocks.
- Estimates communication overhead based on bus operations and register allocation.
- Hardware and software communicate by shared memory.

COSYMA design flow

[Figure: design flow diagram with the stages C*, ES graph, partitioning, cost estimation, GNU C, run-time analysis, CDFG, and high-level synthesis]

Cost estimation
- Speedup estimate for basic block $b$:

  $c(b) = w \, \bigl( t_{HW}(b) - t_{SW}(b) + t_{com}(Z) - t_{com}(Z + b) \bigr) \cdot It(b)$

  where $w$ = weight and $It(b)$ = number of iterations taken on $b$.
- Sources of estimates:
  - Software execution time ($t_{SW}$) is estimated from source code.
  - Hardware execution time ($t_{HW}$) is estimated by list scheduling.
  - Communication time ($t_{com}$) is estimated by data flow analysis of adjacent basic blocks.
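
To make the estimate concrete, the expression can be evaluated directly. The function below is only a transcription of the formula on this slide; the parameter names and all numbers are made up for illustration.

```python
def block_estimate(w, t_hw, t_sw, t_com_z, t_com_z_plus_b, iterations):
    """c(b) = w * (t_HW(b) - t_SW(b) + t_com(Z) - t_com(Z + b)) * It(b)."""
    return w * (t_hw - t_sw + t_com_z - t_com_z_plus_b) * iterations

# Hypothetical block: 20 cycles in hardware, 100 in software, executed 50 times.
c = block_estimate(w=1.0, t_hw=20, t_sw=100,
                   t_com_z=30, t_com_z_plus_b=40, iterations=50)
print(c)   # -4500.0
```

With these numbers the value is negative because $t_{HW}(b)$ is much smaller than $t_{SW}(b)$, and its magnitude scales with the iteration count, so frequently executed blocks dominate the ranking.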

COSYMA optimization
- Goal: satisfy the execution time requirement.
- User specifies the maximum number of function units in the co-processor.
- Start with all basic blocks in software.
- Estimate the potential speedup of moving a basic block to hardware using execution profiling.
- Search using simulated annealing.
- Impose a high cost penalty on solutions that don't meet the execution time.
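
A toy version of this search is sketched below. It is a generic simulated-annealing loop, not COSYMA's actual code: the block data, `DEADLINE`, `PENALTY`, the cooling schedule, and the use of hardware area as the quantity to minimize (instead of a function-unit limit) are all assumptions made for illustration.

```python
import math
import random

# Hypothetical per-block data: (software time, hardware time, hardware area).
BLOCKS = {
    "b0": (40, 10, 5),
    "b1": (30, 12, 4),
    "b2": (20, 15, 6),
    "b3": (10, 8, 3),
}
DEADLINE = 70     # assumed execution-time requirement
PENALTY = 1000    # high cost per time unit over the deadline

def exec_time(hw):
    """Total time when the blocks in `hw` run in hardware, the rest in software."""
    return sum(h if name in hw else s for name, (s, h, _) in BLOCKS.items())

def cost(hw):
    """Hardware area plus a large penalty for missing the deadline."""
    area = sum(BLOCKS[name][2] for name in hw)
    return area + PENALTY * max(0, exec_time(hw) - DEADLINE)

def anneal(steps=2000, temp=50.0, cooling=0.995, seed=0):
    rng = random.Random(seed)
    current = frozenset()            # start with all basic blocks in software
    best = current
    names = sorted(BLOCKS)
    for _ in range(steps):
        flip = rng.choice(names)     # move one block across the boundary
        candidate = current ^ {flip}
        delta = cost(candidate) - cost(current)
        # Always accept improvements; accept worse moves with a probability
        # that shrinks as the temperature drops.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current = candidate
            if cost(current) < cost(best):
                best = current
        temp *= cooling
    return best, cost(best)

best, best_cost = anneal()
print(sorted(best), best_cost, exec_time(best))
```

The penalty term is what steers the search: any partition that misses the deadline costs at least 1000, so it loses to every partition that meets it.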

Two-phase optimization
- Inner loop uses estimates to search through the design space quickly.
- Outer loop uses detailed measurements to check the validity of the inner loop's assumptions:
  - code is compiled and measured;
  - ASIC is synthesized.
- Results of the detailed estimate are used to apply a correction to the current solution for the next run of the inner loop.
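
The inner/outer structure might look like the sketch below. The slide does not say how the correction is applied, so the single multiplicative `correction` factor is an assumption, and `estimate`, `measure`, and `search` are hypothetical callables supplied by the caller.

```python
def two_phase(estimate, measure, search, rounds=5, tol=0.05):
    """Alternate a fast estimate-driven search with a slow detailed check."""
    correction = 1.0
    solution = None
    for _ in range(rounds):
        # Inner loop: search the design space using corrected estimates.
        solution = search(lambda s: correction * estimate(s))
        # Outer loop: detailed measurement (compile code, synthesize ASIC).
        measured = measure(solution)
        predicted = correction * estimate(solution)
        if abs(measured - predicted) <= tol * measured:
            break                     # estimates agree with the measurement
        correction = measured / estimate(solution)
    return solution, correction

# Toy problem: s = number of blocks in hardware, time must not exceed 50.
estimate = lambda s: 100 - 20 * s            # optimistic quick estimate
measure = lambda s: 1.25 * (100 - 20 * s)    # "real" time is 25% higher
search = lambda est: next((s for s in range(1, 5) if est(s) <= 50), 4)

print(two_phase(estimate, measure, search))  # (3, 1.25)
```

In this toy run the first measurement disagrees with the estimate (50 versus 40), the correction becomes 1.25, and the second inner search confirms that three blocks in hardware still meet the limit.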

Vahid et al.
- Uses binary search to minimize hardware cost while satisfying performance.
- Cost and performance compete; to reduce competition, accept any solution with cost below $C_{size}$.
- Cost function: $k_{perf} \cdot (\text{performance violations}) + k_{area} \cdot f(\text{hardware size})$.
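
One way to read this cost function is sketched below. The slide does not define how the violations are computed, so this sketch assumes that a performance violation is the amount by which an execution time exceeds its deadline, and that the size term is zero whenever the hardware fits within the size limit. The weights and argument names are hypothetical.

```python
def partition_cost(exec_times, deadlines, hw_size, size_limit,
                   k_perf=10.0, k_area=1.0):
    """Weighted sum of performance violations and hardware-size excess."""
    perf_violation = sum(max(0.0, t - d) for t, d in zip(exec_times, deadlines))
    area_violation = max(0.0, hw_size - size_limit)   # assumed form of f
    return k_perf * perf_violation + k_area * area_violation

print(partition_cost([5, 6], [6, 7], hw_size=90, size_limit=100))   # 0.0
print(partition_cost([5, 8], [6, 7], hw_size=90, size_limit=100))   # 10.0
print(partition_cost([5, 6], [6, 7], hw_size=120, size_limit=100))  # 20.0
```

Under this reading a cost of zero means that all deadlines are met and the hardware fits within the limit.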

Kalavade et al.
- Uses both local and global measures to meet performance objectives and minimize cost.
- Global criterion: degree to which performance is critically affected by a component.
- Local criterion: heterogeneity of a node = implementation cost.
  - A function which has a high cost in one mapping but a low cost in the other is an extremity.
  - Two functions which have very different implementation requirements (precision, etc.) repel each other into different implementations.

GCLP algorithm
- Schedule one node at a time:
  - compute the critical path;
  - select a node on the critical path for assignment;
  - evaluate the effect of a change in allocation of this node;
  - if performance is critical, reallocate for performance, else reallocate for cost.
- Extremity value helps avoid assigning an operation to a partition where it clearly doesn’t belong.
- Repellers help reduce implementation cost.
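
The per-node decision can be illustrated with a heavily simplified sketch. It assumes the task graph is a simple chain (so every node lies on the critical path), uses a hypothetical criticality test based on an all-software completion estimate, and leaves out extremity and repeller measures entirely, so it shows only the "performance or cost" switch, not the full GCLP algorithm.

```python
def gclp_chain(nodes, deadline):
    """nodes: list of (name, sw_time, hw_time, hw_area) forming a chain."""
    mapping = {}
    elapsed = 0
    total_area = 0
    for i, (name, sw, hw, area) in enumerate(nodes):
        # Assumed global criterion: would the rest, run in software, be late?
        remaining_sw = sum(n[1] for n in nodes[i:])
        time_critical = elapsed + remaining_sw > deadline
        if time_critical and hw < sw:
            mapping[name] = "HW"      # optimize for performance
            elapsed += hw
            total_area += area
        else:
            mapping[name] = "SW"      # optimize for cost: no hardware area
            elapsed += sw
    return mapping, elapsed, total_area

nodes = [("a", 10, 4, 5), ("b", 8, 3, 4), ("c", 6, 5, 2)]
print(gclp_chain(nodes, deadline=18))
# ({'a': 'HW', 'b': 'SW', 'c': 'SW'}, 18, 5)
```

Only node `a` goes to hardware: once it is mapped there, the remaining nodes fit the deadline in software, so the objective switches to cost.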

D’Ambrosio et al.
- Use a general-purpose optimizer for HW/SW assignment.
- Can model both hard and soft deadlines.
- Measure expandability of the system as the difference between upper and lower performance bounds.
- A loose upper bound on CPU utilization leads to excessive hardware cost in the final result.
- Use simulation to estimate the execution time of each process.

Binary search algorithm
- If a zero-cost solution is found for a given hardware size, a zero-cost solution is guaranteed to exist for any larger hardware size.
- Therefore, can use binary search to select a satisfying solution.
- Evaluate the cost of a point when it is tested, rather than generating the costs of all points in advance.
- Sufficient to look for a zero-cost solution:
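
The monotonicity property stated on this slide is what makes binary search valid. A minimal sketch follows, assuming integer hardware sizes and a hypothetical predicate `is_zero_cost(size)` that runs the partitioner for one size limit and reports whether it found a zero-cost solution.

```python
def min_feasible_size(is_zero_cost, lo, hi):
    """Smallest size in [lo, hi] for which is_zero_cost(size) is True,
    or None if even the largest size has no zero-cost solution."""
    if not is_zero_cost(hi):
        return None
    while lo < hi:
        mid = (lo + hi) // 2
        if is_zero_cost(mid):   # evaluated lazily, only when the point is tested
            hi = mid            # a zero-cost solution exists; try smaller sizes
        else:
            lo = mid + 1        # too small; the answer lies above mid
    return lo

calls = []
def toy_predicate(size):
    calls.append(size)
    return size >= 37           # stand-in for running the partitioner

print(min_feasible_size(toy_predicate, 0, 100))   # 37
print(len(calls) < 101)                           # True
```

The predicate is called only for the sizes the search actually visits, which matches the slide's point about not generating the costs of all points in advance.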