Design & Co-design of Embedded Systems Introduction to Co-synthesis Algorithms + HW/SW Partitioning Algorithms Maziar Goudarzi.



Fall 2005, Design & Co-design of Embedded Systems

Today's Program
- Introduction
- Preliminaries
- Hardware/Software Partitioning
- Distributed System Co-Synthesis (next session)

Reference: Wayne Wolf, “Hardware/Software Co-Synthesis Algorithms,” Chapter 2 in Hardware/Software Co-Design: Principles and Practice, eds. J. Staunstrup and W. Wolf, Kluwer Academic Publishers, 1997.

Introduction to HW/SW Co-Synthesis Algorithms: Introduction

Introduction
- Implementing a system? Why use a CPU?
  - Easier implementation
  - Easier (and cheaper) to change and debug
- Why use hardware modules?
  - To meet other constraints
    - Performance, power consumption, etc.
- Found a CPU that meets all non-functional constraints?
  - Yes: what could be better? Use the CPU.
  - No: design custom logic, or a combination of both.

Introduction (cont’d)
- Why more than one CPU or custom logic?
- Why not use the fastest available CPU?

Introduction (cont’d)
- Reason 1:
  - Cost grows exponentially with CPU performance
  - Figure: late-1996 retail prices of Pentium processors (price vs. clock speed in MHz)

Introduction (cont’d)
- Exponential price/performance implies
  - Paying for performance in a uniprocessor is very expensive
    - Using multiple small CPUs is cheaper
    - Communication overhead is added, but this is still the economical choice
    - Processors need not be CPUs; they can be special-function units
    - Special-purpose PEs can be even cheaper than a dedicated CPU
      (measured in system manufacturing cost, not necessarily in design cost)

Introduction (cont’d)
- Reason 2:
  - Scheduling overhead
    - More than 31% overhead, under reasonable assumptions, when executing multiple processes
      - Reason: uncertainty in the times at which the processes will need to execute
      - Result: we have to reserve extra CPU horsepower, which comes at exponential cost
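
The 31% figure matches the classic rate-monotonic utilization bound of Liu and Layland. That this bound is the source of the slide's number is an assumption here, although the deck later ties the same figure to RMS. The symbols below (C_i for worst-case execution time, T_i for period, n for the number of processes) are introduced for this sketch only.

```latex
% Sufficient condition for n periodic processes to meet all deadlines
% under static rate-monotonic priorities:
U \;=\; \sum_{i=1}^{n} \frac{C_i}{T_i} \;\le\; n\left(2^{1/n} - 1\right)

% As the number of processes grows, the bound approaches ln 2:
\lim_{n \to \infty} n\left(2^{1/n} - 1\right) \;=\; \ln 2 \;\approx\; 0.693

% So up to about 1 - 0.693 = 0.307 (roughly 31%) of the CPU
% may have to be left idle to guarantee the deadlines.
```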

Introduction (cont’d)
- Definition
  - HW/SW co-synthesis: the process of simultaneously designing the SW architecture of an application and the HW architecture on which that SW executes.

Introduction (cont’d)
- Figure: co-synthesis block diagram. Labels: Problem Specification; CoSynthesis; SW (app.) Arch.; HW Engine (PE, Mem); Communication Channels.

Introduction (cont’d)
- Problem specification includes
  - Functionality
  - Non-functional requirements
    - Performance goals, physical constraints, etc.

Introduction (cont’d)
- Hardware architecture
  - One or more processing elements (PEs)
- Software (application) architecture includes
  - Process structure
    - Each process executes sequentially
    - Determines
      - The amount of parallelism
      - The amount of communication
    - Proper process structure is crucial for cost-effective implementation
  - Allocation of the processes onto PEs in the HW engine
- Communication channels
  - Hardware elements
  - Software primitives

Introduction (cont’d)
- HW/SW co-synthesis
  - Allows trade-offs between the SW architecture and the HW on which it executes
  - Where is such a trade-off important?
    - Everyday processing applications vs. embedded applications
    - Embedded computing: computing with limited resources
  - Different co-synthesis styles, depending on
    - The specification
    - The system components
    - The system elements to synthesize

Introduction (cont’d)
- Two broad implementation styles
  - HW/SW partitioning
    - Target HW architecture: a CPU and multiple ASICs
  - Distributed system co-synthesis
    - Target HW architecture: arbitrary hardware topologies

Introduction to HW/SW Co-Synthesis Algorithms: Preliminaries

Preliminaries
- Rate (execution rate)
  - Maximum frequency at which a processing task must be performed
- Single-rate vs. multi-rate
  - Example of a multi-rate system
    - Audio/video decoder

Preliminaries (cont’d)
- Latency
  - Required maximum time between starting and finishing a processing task

Behavior Models
- DFG: Data Flow Graph
  - Suitable for data-processing algorithms
- CFG: Control Flow Graph
  - Suitable for process control algorithms
- CDFG: Control Data Flow Graph
  - Combination of the two above

Behavior Models (cont’d)
- Single-rate systems
  - Standard model: Control-Data Flow Graph (CDFG)
    - Implies a program counter or system state
    - Not suitable for modeling multi-rate tasks
      - Due to the unified system state

Behavior Models (cont’d)
- Multi-rate systems
  - Common model: task graph
- Task graph
  - Each node: a process
  - Each edge: a communication
  - Each set of connected nodes: a sub-task
  - Figure: example task graph with processes P1 to P6
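
A minimal Python sketch of the task-graph model, assuming a plain edge-list representation. The slide's figure shows processes P1 to P6 but its edges did not survive, so the edges below are hypothetical and chosen only to produce two sub-tasks. The function name `subtasks` is also invented for this illustration.

```python
from collections import defaultdict

def subtasks(nodes, edges):
    """Group processes into sub-tasks, i.e. sets of connected nodes."""
    neighbours = defaultdict(set)
    for src, dst in edges:
        # Connectivity ignores edge direction.
        neighbours[src].add(dst)
        neighbours[dst].add(src)

    seen, groups = set(), []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        stack, group = [start], []
        while stack:  # depth-first walk over one connected component
            node = stack.pop()
            group.append(node)
            for nxt in neighbours[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        groups.append(sorted(group))
    return groups

# Hypothetical edges: the original figure's connections are not recoverable.
PROCESSES = ["P1", "P2", "P3", "P4", "P5", "P6"]
EDGES = [("P1", "P2"), ("P1", "P3"), ("P4", "P5"), ("P5", "P6")]

print(subtasks(PROCESSES, EDGES))
```

Each sub-task can then carry its own rate, which is what lets a task graph describe a multi-rate system.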

Behavior Models (cont’d)
- SDFG: Synchronous Data Flow Graph
  - Suitable for signal processing applications
  - A DFG that may also be cyclic
  - Lee and Messerschmitt:
    - Algorithm to check the feasibility of an SDFG and schedule it on a uniprocessor or multiprocessor
  - Figure: example SDFG with nodes a, b, and c
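
One part of an SDFG feasibility check is rate consistency: for every edge, the tokens produced per iteration must equal the tokens consumed. The sketch below solves these balance equations for a connected graph. It is an illustration under stated assumptions, not the full Lee and Messerschmitt procedure (it does not check for deadlock or build a schedule). The token rates on the a, b, c example are hypothetical.

```python
from fractions import Fraction
from math import lcm

def repetition_vector(actors, edges):
    """Solve q[src] * produced == q[dst] * consumed for every edge.

    edges: list of (src, dst, produced, consumed).
    Returns the smallest positive integer firing counts per actor,
    or None if the rates are inconsistent. Assumes a connected graph.
    """
    adj = {a: [] for a in actors}
    for src, dst, prod, cons in edges:
        adj[src].append((dst, Fraction(prod, cons)))
        adj[dst].append((src, Fraction(cons, prod)))

    rate = {actors[0]: Fraction(1)}
    stack = [actors[0]]
    while stack:
        a = stack.pop()
        for b, ratio in adj[a]:
            expected = rate[a] * ratio
            if b not in rate:
                rate[b] = expected
                stack.append(b)
            elif rate[b] != expected:
                return None  # balance equations have no solution
    if len(rate) != len(actors):
        raise ValueError("graph is not connected")

    # Scale the rational solution up to the smallest integer solution.
    scale = lcm(*(r.denominator for r in rate.values()))
    return {a: int(r * scale) for a, r in rate.items()}

# Hypothetical rates: a produces 2, b consumes 3; b produces 1, c consumes 2.
ACTORS = ["a", "b", "c"]
CHAIN = [("a", "b", 2, 3), ("b", "c", 1, 2)]
print(repetition_vector(ACTORS, CHAIN))
```

With these rates, firing a three times, b twice, and c once returns every edge to its initial token count, so the graph can run periodically with bounded buffers.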

Behavior Models (cont’d)
- Co-design Finite-State Machine (CFSM)
  - POLIS project at UC Berkeley
  - Used for control-dominated systems
    - e.g., an ECU (Engine Control Unit)
  - Event-driven FSM
    - Transitions occur on events (instead of a periodic clock signal)
  - Figure: example CFSM with states idle, test, and error; transitions labeled Go / start_timer, Done / stop_time, Timeout / alarm=ON, Reset / alarm=OFF
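
A minimal sketch of an event-driven FSM in Python, using the state and transition labels from the slide's figure. The figure's arrows are lost, so which transition leaves which state is an assumption made here. The table layout and the `run` function are hypothetical, and this is not the POLIS CFSM semantics in full.

```python
# (current state, event) -> (next state, emitted action)
# Source and destination states are assumed; only the labels come from the slide.
TRANSITIONS = {
    ("idle", "Go"): ("test", "start_timer"),
    ("test", "Done"): ("idle", "stop_time"),
    ("test", "Timeout"): ("error", "alarm=ON"),
    ("error", "Reset"): ("idle", "alarm=OFF"),
}

def run(events, state="idle"):
    """Feed events to the machine; return the final state and emitted actions."""
    actions = []
    for event in events:
        key = (state, event)
        if key in TRANSITIONS:
            state, action = TRANSITIONS[key]
            actions.append(action)
        # An event with no matching transition is ignored in this sketch.
    return state, actions

print(run(["Go", "Timeout", "Reset"]))
```

The machine only reacts when an event arrives, which is the contrast with a clocked FSM that the slide draws.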

Architectural Models
- The hardware engine also needs a description
- Here, only basic models for cost estimation

Architectural Models (cont’d)
- The HW engine is another graph
  - Generally:
    - Processing elements (PEs) as nodes, communication channels as edges
    - Problem: how to model buses?
    - Solution: nodes are also used for channels
      - Edges represent nets connecting PEs and channels
      - Nodes are labeled with their type
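
The bus-as-node idea can be sketched as a typed graph. All node names and the topology below are hypothetical; the point is only that a bus shared by several PEs becomes a single node, with one net per attached PE.

```python
# Every node carries a type label; channels are nodes too.
NODE_TYPES = {
    "cpu0": "CPU",
    "asic0": "ASIC",
    "asic1": "ASIC",
    "bus0": "BUS",
    "link0": "LINK",
}
CHANNEL_TYPES = {"BUS", "LINK"}

# Edges are nets that connect a PE to a channel.
NETS = [("cpu0", "bus0"), ("asic0", "bus0"), ("asic1", "link0"), ("cpu0", "link0")]

def attached(channel, nets):
    """Return the nodes wired to the given channel node."""
    return sorted({a if b == channel else b for a, b in nets if channel in (a, b)})

def shared_channels(pe_a, pe_b, node_types, nets):
    """Return the channels over which two PEs can talk directly."""
    channels = [n for n, t in node_types.items() if t in CHANNEL_TYPES]
    return [c for c in channels
            if pe_a in attached(c, nets) and pe_b in attached(c, nets)]

print(attached("bus0", NETS))
print(shared_channels("cpu0", "asic1", NODE_TYPES, NETS))
```

Modeling the bus as an edge would need one edge per PE pair and would hide the fact that all of them contend for the same resource.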

Architectural Models (cont’d)
- Component technology library
  - Used when pre-designed components constitute the HW engine
  - Includes
    - General parameters
      - e.g., manufacturing cost, average power consumption, clock rate
    - Information regarding functional elements (behaviors)
      - A table giving the execution time of each behavior on that PE
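
A component technology library can be pictured as a lookup table. In this sketch every component name, behavior name, and number is hypothetical; only the kinds of fields (cost, average power, clock rate, per-behavior execution time) come from the slide.

```python
# Hypothetical library: all names and figures are illustrative only.
LIBRARY = {
    "small_cpu": {
        "cost": 5.0, "avg_power_mw": 50, "clock_mhz": 20,
        "exec_time_us": {"filter": 400, "control": 30},
    },
    "fast_cpu": {
        "cost": 40.0, "avg_power_mw": 900, "clock_mhz": 200,
        "exec_time_us": {"filter": 90, "control": 8},
    },
    "filter_asic": {
        "cost": 12.0, "avg_power_mw": 120, "clock_mhz": 50,
        "exec_time_us": {"filter": 10},  # an ASIC implements one behavior
    },
}

def cheapest_pe(behavior, deadline_us, library=LIBRARY):
    """Pick the lowest-cost PE that runs the behavior within the deadline."""
    feasible = [
        (pe["cost"], name)
        for name, pe in library.items()
        if behavior in pe["exec_time_us"]
        and pe["exec_time_us"][behavior] <= deadline_us
    ]
    return min(feasible)[1] if feasible else None

print(cheapest_pe("filter", 100))
print(cheapest_pe("control", 50))
```

A co-synthesis tool would query such a table many times while exploring allocations, which is why the slide restricts it to basic cost-estimation data.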

Architectural Models (cont’d)
- CPU scheduling
  - Process vs. thread (lightweight process)
    - We use these terms interchangeably
  - Scheduling policies to run multiple processes on a single CPU
    - Non-preemptive vs. preemptive (prioritized)
    - Time-slicing is not normally used in embedded systems

Architectural Models (cont’d)
- Scheduling policies (cont’d)
  - Priority can be static or dynamic
    - A well-known static priority scheme: RMS (Rate-Monotonic Scheduling)
      - Optimal among static-priority schedules
      - Guarantees all deadlines when CPU utilization stays below its bound
      - Needs about 31% extra CPU horsepower in the worst case
    - A well-known dynamic priority scheme: EDF (Earliest Deadline First)
      - Can reach 100% CPU utilization
      - May miss deadlines (for example, when the CPU is overloaded)
  - More on this later
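
The two policies can be compared with simple utilization tests. This sketch assumes independent periodic processes on one preemptive CPU with deadlines equal to periods. The task set is hypothetical, and the RMS test shown is sufficient but not necessary, so a task set that fails it may still be schedulable.

```python
def utilization(tasks):
    """tasks: list of (worst_case_execution_time, period) pairs."""
    return sum(c / t for c, t in tasks)

def rms_bound(n):
    """Utilization below which RMS is guaranteed to meet every deadline."""
    return n * (2 ** (1 / n) - 1)

def rms_priorities(tasks):
    """Rate-monotonic rule: the shorter the period, the higher the priority."""
    return sorted(tasks, key=lambda task: task[1])

def rms_guaranteed(tasks):
    return utilization(tasks) <= rms_bound(len(tasks))

def edf_feasible(tasks):
    """EDF meets all deadlines as long as utilization does not exceed 100%."""
    return utilization(tasks) <= 1.0

# Hypothetical task set, times in milliseconds.
TASKS = [(3, 12), (1, 4), (2, 6)]

print(rms_priorities(TASKS))
print(round(utilization(TASKS), 3), round(rms_bound(len(TASKS)), 3))
print(rms_guaranteed(TASKS), edf_feasible(TASKS))
```

For this task set the utilization is about 0.83, above the three-task RMS bound of about 0.78 but below 1, so the simple RMS test gives no guarantee while EDF is feasible.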

Topics
- Introduction
- Preliminaries
- Hardware/Software Partitioning
- Distributed System Co-Synthesis

Topics
- Introduction
- A Classification
- Examples
  - Vulcan
  - Cosyma

Introduction to HW/SW Partitioning
- The first variety of co-synthesis applications
- Definition
  - A HW/SW partitioning algorithm implements a specification on some sort of multiprocessor architecture
- Usually
  - The multiprocessor architecture is one CPU plus some ASICs on the CPU bus

Introduction to HW/SW Partitioning (cont’d)
- Terminology
  - Allocation
    - Synthesis methods that design the multiprocessor topology along with the PEs and the SW architecture
  - Scheduling
    - The process of assigning PE (CPU and/or ASIC) time to the processes that must execute

Introduction to HW/SW Partitioning (cont’d)
- In most partitioning algorithms
  - The type of CPU is fixed and given
  - The ASICs must be synthesized
    - What function should be implemented on each ASIC?
    - What characteristics should the implementation have?
  - The problems solved are single-rate synthesis problems
    - A CDFG is the starting model

HW/SW Partitioning (cont’d)
- Normal use of architectural components
  - The CPU performs the less computationally intensive functions
  - ASICs are used to accelerate core functions
- Where to use?
  - High-performance applications
    - No CPU is fast enough for the operations
  - Low-cost applications
    - ASIC accelerators allow the use of a much smaller, cheaper CPU

A Classification
- Criterion: optimization strategy
  - Trade-off between performance and cost
  - Primal approach
    - Performance is the primary goal
    - First, all functionality in ASICs. Progressively move more to the CPU to reduce cost.
  - Dual approach
    - Cost is the primary goal
    - First, all functions in the CPU. Move operations to the ASIC to meet the performance goal.

A Classification (cont’d)
- Classification by optimization strategy (cont’d)
  - Example co-synthesis systems
    - Vulcan (Stanford): primal strategy
    - Cosyma (Braunschweig, Germany): dual strategy
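
The primal and dual strategies can be contrasted with two small greedy loops. This is a sketch under heavy simplifying assumptions, not the actual Vulcan or Cosyma algorithm: functions run one after another, total time is the sum of per-function times, communication cost is ignored, and every name and number is hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Func:
    name: str
    sw_time: float   # execution time on the CPU
    hw_time: float   # execution time on an ASIC
    hw_cost: float   # ASIC cost if moved to hardware (assumed > 0)

def total_time(funcs, in_hw):
    return sum(f.hw_time if f.name in in_hw else f.sw_time for f in funcs)

def hw_cost(funcs, in_hw):
    return sum(f.hw_cost for f in funcs if f.name in in_hw)

def dual_partition(funcs, deadline):
    """Start all-software; move functions to HW until the deadline is met."""
    in_hw = set()
    while total_time(funcs, in_hw) > deadline:
        candidates = [f for f in funcs
                      if f.name not in in_hw and f.sw_time > f.hw_time]
        if not candidates:
            return None  # deadline cannot be met
        # Greedy choice: most time saved per unit of hardware cost.
        best = max(candidates, key=lambda f: (f.sw_time - f.hw_time) / f.hw_cost)
        in_hw.add(best.name)
    return in_hw

def primal_partition(funcs, deadline):
    """Start all-hardware; move functions to the CPU while the deadline holds."""
    in_hw = {f.name for f in funcs}
    if total_time(funcs, in_hw) > deadline:
        return None
    # Greedy choice: try to drop the most expensive hardware first.
    for f in sorted(funcs, key=lambda f: f.hw_cost, reverse=True):
        trial = in_hw - {f.name}
        if total_time(funcs, trial) <= deadline:
            in_hw = trial
    return in_hw

FUNCS = [Func("fir", 40, 5, 30), Func("fft", 60, 10, 50), Func("ctrl", 10, 8, 20)]
for strategy in (dual_partition, primal_partition):
    result = strategy(FUNCS, deadline=70)
    print(strategy.__name__, sorted(result), hw_cost(FUNCS, result))
```

On this example the two greedy loops settle on different partitions with different hardware cost, which illustrates why the starting point of the search matters.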

Co-Synthesis Algorithms: HW/SW Partitioning
HW/SW Partitioning Examples: Vulcan

Partitioning Examples: Vulcan
- Gupta, De Micheli, Stanford University
- Primal approach
  1. All-HW initial implementation.
  2. Iteratively move functionality to the CPU to reduce cost.
- System specification language
  - HardwareC
    - Is compiled into a flow graph

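Partitioning Examples: Vulcan (cont’d)
- Figure: two HardwareC fragments and their flow graphs
  - HardwareC: x=a; y=b;
    Flow graph: a nop node with edges (each labeled 1) to the nodes x=a and y=b
  - HardwareC: if (c>d) x=e; else y=f;
    Flow graph: a cond node with an edge labeled c>d to x=e and an edge labeled c<=d to y=f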

Partitioning Examples: Vulcan (cont’d)
- Flow graph definition
  - A variation of a (single-rate) task graph
  - Nodes
    - Represent operations
    - Typically low-level operations: mult, add
  - Edges
    - Represent data dependencies
    - Each contains a Boolean condition under which the edge is traversed
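
A flow graph with Boolean edge conditions can be sketched as follows, using the HardwareC fragment `if (c>d) x=e; else y=f;` from the deck. The dictionary layout and the `execute` function are hypothetical and handle only simple graphs without join nodes. This is not how Vulcan represents flow graphs internally.

```python
# node -> list of (edge condition, successor node)
FLOW_GRAPH = {
    "cond": [
        (lambda env: env["c"] > env["d"], "x=e"),
        (lambda env: env["c"] <= env["d"], "y=f"),
    ],
    "x=e": [],
    "y=f": [],
}

# The operation each node performs on the variable environment.
OPERATIONS = {
    "cond": lambda env: None,
    "x=e": lambda env: env.update(x=env["e"]),
    "y=f": lambda env: env.update(y=env["f"]),
}

def execute(start, env):
    """Run the graph once, following only edges whose condition holds."""
    visited = []
    frontier = [start]
    while frontier:
        node = frontier.pop(0)
        visited.append(node)
        OPERATIONS[node](env)
        for condition, successor in FLOW_GRAPH[node]:
            if condition(env):
                frontier.append(successor)
    return visited

env = {"c": 3, "d": 1, "e": 7, "f": 9}
print(execute("cond", env), env)
```

An unconditional edge, such as those labeled 1 in the `x=a; y=b;` example, would simply carry a condition that is always true.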

Partitioning Examples: Vulcan (cont’d)
- Flow graph
  - Is executed repeatedly at some rate
  - Can have initiation-time constraints for each node: $t(v_i) + l_{ij} \le t(v_j) \le t(v_i) + u_{ij}$
  - Can have rate constraints on each node: $m_i \le R_i \le M_i$
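
Both kinds of constraint are simple interval checks once a candidate schedule exists. The sketch below assumes start times t(v) and achieved rates R are already known. The function names, node names, and numbers are hypothetical.

```python
def timing_ok(start, constraints):
    """Check t(v_i) + l_ij <= t(v_j) <= t(v_i) + u_ij for every constraint.

    start: dict mapping node -> initiation time t(v)
    constraints: list of (i, j, l_ij, u_ij)
    """
    return all(start[i] + low <= start[j] <= start[i] + high
               for i, j, low, high in constraints)

def rate_ok(rates, bounds):
    """Check m_i <= R_i <= M_i for every constrained node.

    rates: dict mapping node -> achieved execution rate R_i
    bounds: dict mapping node -> (m_i, M_i)
    """
    return all(low <= rates[node] <= high
               for node, (low, high) in bounds.items())

# Hypothetical schedule: v2 must start 2 to 5 time units after v1.
START = {"v1": 0, "v2": 3}
print(timing_ok(START, [("v1", "v2", 2, 5)]))
print(rate_ok({"v1": 100}, {"v1": (50, 200)}))
```

A partitioner can run checks like these after each move to reject partitions that break a timing or rate constraint.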