Published by Jessica Widmer
Modified over 4 years ago
© 2004 Wayne Wolf. Topics: task-level partitioning; hardware/software partitioning; bus-based systems.
System partitioning. Lagnese et al.: partition a large description based on functional information, not initial allocation. Thomas et al.: developed a Verilog-based simulation system for performance evaluation; it assumes a bus-based CPU-ASIC model, provides several types of communication primitives, and bases design evaluation on both static evaluation (time for a single execution) and dynamic evaluation.
Hardware-software partitioning. Partitioning methods usually allow more than one ASIC. They typically ignore CPU memory traffic in bus utilization estimates, and typically assume that the CPU process blocks while waiting for the ASIC. [Figure: CPU, ASIC, mem]
Gupta and De Micheli. Target architecture: CPU + ASICs on a bus. Behavior is broken into threads at nondeterministic delay points; the delay of each thread is bounded. Software threads run under an RTOS; threads communicate via queues.
Specification and modeling. Specified in HardwareC. The spec is divided into threads at nondeterministic delay points. Hardware properties: size, number of clock cycles. CPU/software thread properties: thread latency, thread reaction rate, processor utilization, bus utilization. CPU and ASIC execution are non-overlapping.
HW/SW allocation. Start with unbounded-delay threads in the CPU and the rest of the threads in the ASIC. Optimization: test one thread at a time for a move; if moving it to SW does not violate the performance requirement, move the thread. Feasibility depends on SW and HW run times and on bus utilization. If a thread is moved, immediately try moving its successor threads.
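The allocation loop described on this slide can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `Thread` record, the `feasible` check (a single CPU time budget), and all numbers are hypothetical stand-ins for the real feasibility test, which depends on SW and HW run times and bus utilization.

```python
from dataclasses import dataclass, field

@dataclass
class Thread:
    name: str
    sw_time: float                 # hypothetical software run time
    unbounded: bool = False        # contains a nondeterministic-delay operation
    successors: list = field(default_factory=list)  # names of successor threads

def feasible(sw_set, threads, cpu_budget):
    """Toy feasibility check (assumption): total software time fits a CPU budget."""
    return sum(threads[n].sw_time for n in sw_set) <= cpu_budget

def allocate(threads, cpu_budget):
    # Initial allocation: unbounded-delay threads in software, the rest in hardware.
    sw = {n for n, t in threads.items() if t.unbounded}
    hw = set(threads) - sw

    def try_move(name):
        if name not in hw:
            return
        if feasible(sw | {name}, threads, cpu_budget):
            hw.remove(name)
            sw.add(name)
            # A successful move immediately triggers attempts on the successors.
            for succ in threads[name].successors:
                try_move(succ)

    for name in sorted(hw):        # sorted() takes a snapshot, so hw may shrink safely
        try_move(name)
    return sw, hw

threads = {
    "io": Thread("io", 2.0, unbounded=True),
    "a": Thread("a", 3.0, successors=["b"]),
    "b": Thread("b", 1.0),
    "c": Thread("c", 5.0),
}
sw, hw = allocate(threads, cpu_budget=7.0)
print(sorted(sw), sorted(hw))      # prints ['a', 'b', 'io'] ['c']
```

With a budget of 7, threads a and b fit alongside io, while moving c would exceed the budget, so c stays in hardware.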
COSYMA. Ernst et al.: moves operations from software to hardware. Operations are moved to hardware in units of basic blocks. Communication overhead is estimated based on bus operations and register allocation. Hardware and software communicate by shared memory.
COSYMA design flow. [Figure: flow diagram with blocks labeled C*, ES graph, partitioning, cost estimation, gnu C, run time analysis, CDFG, high-level synthesis]
Cost estimation. Speedup estimate for basic block b: $c(b) = w \, \bigl( t_{HW}(b) - t_{SW}(b) + t_{com}(Z) - t_{com}(Z + b) \bigr) \times It(b)$, where w = weight and It(b) = number of iterations taken on b. Sources of estimates: software execution time ($t_{SW}$) is estimated from source code; hardware execution time ($t_{HW}$) is estimated by list scheduling; communication time ($t_{com}$) is estimated by data flow analysis of adjacent basic blocks.
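As a worked illustration of the estimate above, the sketch below evaluates c(b) for one basic block. The function and parameter names, and all numeric values, are made up for the example. With the formula as written, a negative result appears to indicate a net time reduction, since the hardware time is subtracted from nothing and the software time is subtracted from it; that reading is an assumption.

```python
def speedup_estimate(w, t_hw, t_sw, t_com_z, t_com_z_plus_b, iterations):
    """Evaluate c(b) = w * (t_HW(b) - t_SW(b) + t_com(Z) - t_com(Z + b)) * It(b)."""
    return w * (t_hw - t_sw + t_com_z - t_com_z_plus_b) * iterations

# Hypothetical block: 10 cycles in software, 4 in hardware, and moving it
# raises the estimated communication time from 2 to 3 cycles per iteration.
c_b = speedup_estimate(w=1.0, t_hw=4.0, t_sw=10.0,
                       t_com_z=2.0, t_com_z_plus_b=3.0, iterations=100)
print(c_b)  # prints -700.0
```

Here the block runs 6 cycles faster in hardware but costs 1 extra cycle of communication, so each of the 100 iterations contributes -7.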
COSYMA optimization. Goal: satisfy execution time. The user specifies the maximum number of function units in the co-processor. Start with all basic blocks in software. Estimate the potential speedup of moving a basic block to hardware using execution profiling. Search using simulated annealing. Impose a high cost penalty for solutions that don't meet the execution time.
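A minimal sketch of this kind of search is shown below. It is not COSYMA's actual cost function or annealing schedule: the block table, the use of hardware area as the quantity to minimize, the penalty weight, and the cooling parameters are all assumptions chosen for illustration. Only the overall shape follows the slide: start with everything in software, toggle one basic block at a time, and penalize solutions that miss the execution time heavily.

```python
import math
import random

def cost(in_hw, blocks, deadline, penalty=1000.0):
    """Hardware area plus a large penalty per time unit over the deadline (assumed form)."""
    time = sum(b["t_hw"] if n in in_hw else b["t_sw"] for n, b in blocks.items())
    area = sum(blocks[n]["area"] for n in in_hw)
    return area + penalty * max(0.0, time - deadline)

def anneal(blocks, deadline, steps=2000, temp=50.0, cooling=0.995, seed=0):
    rng = random.Random(seed)
    names = sorted(blocks)
    current = frozenset()                      # start with all basic blocks in software
    current_cost = cost(current, blocks, deadline)
    best, best_cost = current, current_cost
    for _ in range(steps):
        candidate = current ^ {rng.choice(names)}   # move one block SW->HW or HW->SW
        cand_cost = cost(candidate, blocks, deadline)
        delta = cand_cost - current_cost
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current, current_cost = candidate, cand_cost
            if current_cost < best_cost:
                best, best_cost = current, current_cost
        temp *= cooling
    return best, best_cost

# Hypothetical basic blocks: software time, hardware time, hardware area.
blocks = {
    "b0": {"t_sw": 10.0, "t_hw": 2.0, "area": 30.0},
    "b1": {"t_sw": 6.0, "t_hw": 3.0, "area": 10.0},
    "b2": {"t_sw": 4.0, "t_hw": 1.0, "area": 25.0},
}
best, best_cost = anneal(blocks, deadline=13.0)
print(sorted(best), best_cost)
```

Tracking the best solution seen so far guarantees the result is never worse than the all-software starting point, whatever the random moves turn out to be.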
Two-phase optimization. The inner loop uses estimates to search through the design space quickly. The outer loop uses detailed measurements to check the validity of inner loop assumptions: code is compiled and measured, and the ASIC is synthesized. Results of the detailed estimate are used to apply a correction to the current solution for the next run of the inner loop.
Vahid et al. Uses binary search to minimize hardware cost while satisfying performance. Cost and performance compete; to reduce competition, accept any solution with cost below $C_{size}$. Cost function: $k_{perf} \cdot (\text{performance violations}) + k_{area} \cdot f(\text{hardware size})$.
Kalavade et al. Uses both local and global measures to meet performance objectives and minimize cost. Global criterion: degree to which performance is critically affected by a component. Local criterion: heterogeneity of a node = implementation cost. A function that has a high cost in one mapping but a low cost in the other is an extremity. Two functions that have very different implementation requirements (precision, etc.) repel each other into different implementations.
GCLP algorithm. Schedule one node at a time: compute the critical path; select a node on the critical path for assignment; evaluate the effect of a change in allocation of this node; if performance is critical, reallocate for performance, else reallocate for cost. The extremity value helps avoid assigning an operation to a partition where it clearly doesn't belong. Repellers help reduce implementation cost.
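The objective-switching step above can be illustrated with a heavily simplified sketch. It is not the GCLP algorithm itself: it assumes the nodes execute strictly in sequence (so every node is on the critical path), uses a hypothetical criticality test (would the deadline be missed if all remaining nodes stayed in software?), and omits extremity and repeller measures entirely. The node tuples and the deadline are invented for the example.

```python
def gclp_like(nodes, deadline):
    """nodes: list of (name, t_sw, t_hw, area_hw) in execution order (assumed serial)."""
    mapping = {}
    elapsed = 0.0
    for i, (name, t_sw, t_hw, area_hw) in enumerate(nodes):
        # Global measure (assumed): finish time if this and all later nodes run in software.
        remaining_sw = sum(n[1] for n in nodes[i:])
        time_critical = elapsed + remaining_sw > deadline
        if time_critical and t_hw < t_sw:
            mapping[name] = "HW"       # performance is critical: map for speed
            elapsed += t_hw
        else:
            mapping[name] = "SW"       # otherwise map for cost (no hardware area)
            elapsed += t_sw
    return mapping, elapsed

nodes = [("a", 4.0, 1.0, 10.0), ("b", 3.0, 1.0, 8.0), ("c", 2.0, 1.0, 5.0)]
mapping, finish = gclp_like(nodes, deadline=6.0)
print(mapping, finish)  # prints {'a': 'HW', 'b': 'SW', 'c': 'SW'} 6.0
```

Only node a is mapped to hardware: once it has been accelerated, the remaining nodes can meet the deadline in software, so the objective switches from performance to cost.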
D’Ambrosio et al. Use a general-purpose optimizer for HW/SW assignment. Can model both hard and soft deadlines. Measure expandability of the system as the difference between upper and lower performance bounds. A loose upper bound on CPU utilization leads to excessive hardware cost in the final result. Use simulation to estimate the execution time of each process.
Binary search algorithm. If a zero-cost solution is found for a given hardware size, a zero-cost solution is guaranteed to exist for any larger hardware size. Therefore, binary search can be used to select a satisfying solution. Evaluate the cost of a point when it is tested, rather than generating the costs of all points in advance. It is sufficient to look for a zero-cost solution.
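The monotonicity argument above is what makes a binary search over hardware size valid. A minimal sketch follows, assuming integer hardware sizes and a hypothetical `partition_cost(size)` callback that runs the partitioner for one size constraint and returns its cost; the toy cost function is invented for the example.

```python
def smallest_zero_cost_size(lo, hi, partition_cost):
    """Return the smallest size in [lo, hi] whose partition cost is zero, or None.

    Relies on monotonicity: if a size has a zero-cost solution, so does every larger size.
    Costs are computed only for the sizes actually tested.
    """
    if partition_cost(hi) != 0:
        return None                    # even the largest size has no zero-cost solution
    while lo < hi:
        mid = (lo + hi) // 2
        if partition_cost(mid) == 0:
            hi = mid                   # feasible: try smaller hardware
        else:
            lo = mid + 1               # infeasible: need more hardware
    return lo

tested = []

def toy_cost(size):
    """Hypothetical partitioner: zero cost once at least 37 units of hardware are allowed."""
    tested.append(size)
    return max(0, 37 - size)

print(smallest_zero_cost_size(0, 100, toy_cost))  # prints 37
print(len(tested))                                # prints 8
```

Only 8 of the 101 candidate sizes are evaluated, which matters when each evaluation is a full partitioning run.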