High Performance Embedded Computing © 2007 Elsevier Chapter 7, part 3: Hardware/Software Co-Design High Performance Embedded Computing Wayne Wolf.

© 2006 Elsevier
Topics
Multi-objective optimization. Co-synthesis for control. Co-synthesis for caches. Co-synthesis for reconfigurable platforms. Hardware/software co-simulation.

Multi-objective optimization
Operations research provides concepts for optimizing functions with multiple objectives. Pareto optimality: a solution is Pareto-optimal if no objective can be improved without making another objective worse.
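A minimal sketch of the Pareto idea, assuming every objective is minimized and each design is a tuple of objective values. The names `dominates` and `pareto_front` and the sample (cost, power) numbers are illustrative and do not come from the slides.

```python
def dominates(a, b):
    """Return True if design a dominates design b (all objectives minimized)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(designs):
    """Keep the designs that no other design dominates."""
    return [d for d in designs
            if not any(dominates(other, d) for other in designs)]


# Hypothetical (cost, power) pairs for four candidate designs.
designs = [(10, 5), (8, 7), (12, 4), (11, 6)]
front = pareto_front(designs)
```

Here (11, 6) drops out because (10, 5) is better on both objectives. The remaining three designs trade cost against power, so none can be improved on one axis without losing on the other.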

GOPS
The feasibility factor is computed from throughput factors.
Upper-bound throughput for rate-monotonic scheduling (RMS):
Upper-bound throughput for earliest-deadline-first scheduling (EDF):

Upper bound feasibility
Upper-bound feasibility tests:

Lower bound feasibility test
Lower bound:

Feasibility factor
Feasibility factor P:
Use the feasibility factor to prune the search space and as an optimization objective.

Genetic algorithms
Modeled as:
- Genes = strings of symbols.
- Mutations = changes to strings.
Types of moves:
- Reproduction makes a copy of a string.
- Mutation changes a string.
- Crossover interchanges parts of two strings.
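The three move types can be sketched on lists of symbols. This is a minimal illustration, assuming single-point crossover and single-symbol mutation. Those specific choices, the function names, and the two-letter alphabet are assumptions, not details from the slides.

```python
import random


def reproduce(s):
    """Reproduction: return a copy of the string."""
    return list(s)


def mutate(s, alphabet, rng):
    """Mutation: change one randomly chosen symbol to a different one."""
    child = list(s)
    i = rng.randrange(len(child))
    child[i] = rng.choice([sym for sym in alphabet if sym != child[i]])
    return child


def crossover(a, b, point):
    """Crossover: interchange the tails of two strings at a cut point."""
    return a[:point] + b[point:], b[:point] + a[point:]


rng = random.Random(0)  # fixed seed so runs are repeatable
parent_a = list("AAAA")
parent_b = list("BBBB")
copy_a = reproduce(parent_a)
child_a, child_b = crossover(parent_a, parent_b, 2)
mutant = mutate(parent_a, "AB", rng)
```

With a cut point of 2, crossover yields AABB and BBAA. The mutant differs from its parent in exactly one position.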

MOGAC
Technology tables characterize hardware components.
Genetic model:
- Processing element allocation string lists all PEs and types.
- Task allocation string shows assignment of tasks to PEs.
- Link allocation string maps communication to links.
- IC allocation string maps tasks to chips.

MOGAC optimization procedure
Forms an initial solution. Repeats an evolve/evaluate cycle. Evaluation determines the noninferior solutions. Some noninferior solutions may not survive evolution. Clusters solutions to reduce run time.
[Dic98] © 1998 IEEE

MOGAC constraints
nis(x): noninferior solutions in x.
dom(a,b) = 1 if a is not dominated by b.
Cluster rank:

Energy-aware task scheduling
Yang et al. schedule multiprocessors for energy. They combine design-time and run-time scheduling:
- At design time, the scheduler evaluates scheduling/allocation choices, optimizes with genetic algorithms, and generates a table.
- At run time, heuristics use the table to choose the best scheduling/allocation pattern.
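The design-time/run-time split can be sketched as follows. This is only an illustration: an exhaustive filter stands in for the genetic-algorithm search, and the lookup rule (lowest energy that meets the deadline) is an assumed heuristic. The candidate names and numbers are hypothetical.

```python
def build_table(candidates):
    """Design time: drop candidates beaten in both time and energy.

    An exhaustive filter stands in for the genetic-algorithm search here.
    """
    table = []
    for c in candidates:
        beaten = any(o["time"] <= c["time"] and o["energy"] <= c["energy"]
                     and o != c for o in candidates)
        if not beaten:
            table.append(c)
    return sorted(table, key=lambda c: c["energy"])


def choose_at_runtime(table, deadline):
    """Run time: pick the lowest-energy table entry that meets the deadline."""
    for entry in table:  # table is sorted by increasing energy
        if entry["time"] <= deadline:
            return entry
    return None  # no stored pattern meets this deadline


# Hypothetical scheduling/allocation patterns (arbitrary units).
candidates = [
    {"name": "1 PE, low V", "time": 40, "energy": 10},
    {"name": "2 PEs, low V", "time": 25, "energy": 14},
    {"name": "2 PEs, high V", "time": 15, "energy": 30},
    {"name": "1 PE, high V", "time": 30, "energy": 20},
]
table = build_table(candidates)
choice = choose_at_runtime(table, deadline=30)
```

The expensive search runs once, offline. The run-time step is only a short scan of a small table, which keeps the on-line overhead low.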

Co-synthesis for wireless
Wireless systems are bandwidth and energy limited.
COWLS uses parallel recombinative simulated annealing.
- Ranked by communication time, computation time, utilization.
Scheduling influences both power consumption and timing.
- Slack determines idle time.

Control and I/O synthesis
Control finite-state machine (CFSM) model describes control-dominated systems. Event-driven model. Finite, non-zero, unbounded reaction times.
Implementations:
- Hardware is logic guarded by latches.
- Software is synthesized from an s-graph that models the control flow graph.
Can be used as an intermediate representation for Esterel, etc.
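A minimal sketch of an event-driven state machine in the spirit of the CFSM model. It shows only the event-triggered transitions and emitted events. Reaction times, hardware latches, and s-graph synthesis are not modeled. The `Cfsm` class and the seat-belt controller are hypothetical examples.

```python
class Cfsm:
    """A tiny event-driven finite-state machine (illustrative only)."""

    def __init__(self, initial, transitions):
        # transitions maps (state, input_event) -> (next_state, output_event)
        self.state = initial
        self.transitions = transitions

    def react(self, event):
        """Consume one input event; return the emitted event, or None."""
        key = (self.state, event)
        if key not in self.transitions:
            return None  # event is ignored in this state
        self.state, output = self.transitions[key]
        return output


# Hypothetical seat-belt alarm controller.
belt = Cfsm("off", {
    ("off", "key_on"): ("wait", "start_timer"),
    ("wait", "belt_on"): ("off", None),
    ("wait", "timer_end"): ("alarm", "alarm_on"),
    ("alarm", "belt_on"): ("off", "alarm_off"),
})
outputs = [belt.react(e) for e in ["key_on", "timer_end", "belt_on"]]
```

The machine does nothing until an event arrives, which is what makes the model a natural fit for control-dominated behavior.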

Modal process model
Chou et al. use modal models:
- I/O behavior depends on current mode and on inputs.
Abstract control types define control operations with known properties.

Interface synthesis
Chou et al. represent I/O as control flow graphs.
- Generate tasks, allocate I/O ports, split wide-word operations, use memory-mapped I/O where necessary, generate I/O sequencer.
Daveau et al. synthesize communication by allocating operations to units in a library.
- Communication unit must provide the required services, use the right protocol, and run at the required data rate.
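The library-based allocation step can be sketched as a filter over candidate communication units. The three feasibility checks follow the slide. Picking the cheapest feasible unit, and the library entries and field names, are assumptions made for illustration.

```python
def select_unit(library, services, protocol, rate):
    """Return the cheapest unit offering the services, protocol and data rate."""
    feasible = [u for u in library
                if services <= u["services"]   # provides every required service
                and u["protocol"] == protocol  # uses the right protocol
                and u["rate"] >= rate]         # runs at the required data rate
    return min(feasible, key=lambda u: u["cost"]) if feasible else None


# Hypothetical communication-unit library (rates and costs in arbitrary units).
library = [
    {"name": "fifo", "services": {"send", "receive"},
     "protocol": "handshake", "rate": 50, "cost": 3},
    {"name": "shared_bus", "services": {"send", "receive", "broadcast"},
     "protocol": "handshake", "rate": 100, "cost": 5},
    {"name": "serial", "services": {"send", "receive"},
     "protocol": "serial", "rate": 10, "cost": 1},
]
fast = select_unit(library, {"send", "receive"}, "handshake", 80)
slow = select_unit(library, {"send", "receive"}, "handshake", 40)
```

At a required rate of 80 only the shared bus qualifies. At 40 both handshake units qualify and the cheaper FIFO is chosen.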

Cache modeling for co-synthesis
Cache state affects task execution time. Li and Wolf used a two-state model for processes in the cache:
- One execution time if the process is in the cache.
- Another execution time if it is not in the cache.
This model is more abstract than a cache line model.
[Li99] © 1999 IEEE
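A minimal sketch of how a two-state model makes schedule length depend on process order. The replacement rule (the cache holds the most recently run processes) and all numbers are assumptions for illustration, not details of the Li and Wolf model.

```python
def run_schedule(schedule, times, cache_slots):
    """Total time for a process sequence under a two-state cache model.

    times[p] = (time_if_in_cache, time_if_not_in_cache).
    The cache is assumed to hold the cache_slots most recently run processes.
    """
    in_cache = []  # most recently run process is last
    total = 0
    for p in schedule:
        hit_time, miss_time = times[p]
        total += hit_time if p in in_cache else miss_time
        if p in in_cache:
            in_cache.remove(p)
        in_cache.append(p)
        if len(in_cache) > cache_slots:
            in_cache.pop(0)  # evict the least recently run process
    return total


# Hypothetical (in-cache, not-in-cache) execution times.
times = {"A": (5, 9), "B": (3, 7)}
grouped = run_schedule(["A", "A", "B", "B"], times, cache_slots=1)
interleaved = run_schedule(["A", "B", "A", "B"], times, cache_slots=1)
```

With room for one process in the cache, the grouped order takes 24 time units and the interleaved order takes 32, so the scheduler has a reason to track cache state.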

Co-synthesis with caches
System cost:
Hierarchical scheduling algorithm:
- Schedule tasks (each task is one or more processes) over the hyperperiod.
- Refine the schedule by moving processes within a task.
Dynamic urgency models how a process uses the cache:
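The hyperperiod is the least common multiple of the task periods, so one hyperperiod covers every distinct alignment of task releases. The sketch below computes it and lists the releases that a scheduler would have to place. The task names and periods are hypothetical, and the refinement step and dynamic urgency are not modeled.

```python
from functools import reduce
from math import gcd


def hyperperiod(periods):
    """Least common multiple of the task periods."""
    return reduce(lambda a, b: a * b // gcd(a, b), periods)


def releases(periods):
    """List (release_time, task) pairs over one hyperperiod, in time order."""
    h = hyperperiod(periods.values())
    jobs = [(t, task) for task, p in periods.items() for t in range(0, h, p)]
    return sorted(jobs)


# Hypothetical task periods in milliseconds.
periods = {"t1": 4, "t2": 6}
h = hyperperiod(periods.values())
jobs = releases(periods)
```

For periods of 4 and 6 the hyperperiod is 12, giving three releases of t1 and two of t2 to schedule.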

Wuytack et al.
Methodology for dynamic memory management:
1. Define the application using abstract data types (ADTs).
2. Refine ADTs into concrete data structures.
3. Divide virtual memory among several memory managers.
4. Split virtual memory segments into groups to parallelize data accesses.
5. Order background memory accesses to optimize bandwidth.
6. Allocate physical memories.

Co-synthesis for reconfigurable systems
FPGA fabric can hold different accelerators at different times.
Combinations of accelerators may be limited.
- Must take the floorplan into account.
Schedule must take reconfiguration time and energy into account.
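A minimal sketch of why the schedule must account for reconfiguration time, assuming a single reconfigurable region that holds one accelerator configuration at a time. The task names, configurations, and delays are hypothetical, and floorplan limits and energy are not modeled.

```python
def schedule_length(order, exec_time, config_of, reconfig_time):
    """Total time to run accelerator tasks in order on one FPGA region.

    A reconfiguration delay is paid whenever the required configuration
    differs from the one currently loaded.
    """
    loaded = None
    total = 0
    for task in order:
        needed = config_of[task]
        if needed != loaded:
            total += reconfig_time[needed]
            loaded = needed
        total += exec_time[task]
    return total


# Hypothetical tasks, accelerator configurations, and delays.
exec_time = {"a": 2, "b": 3, "c": 2, "d": 1}
config_of = {"a": "fir", "b": "fft", "c": "fir", "d": "fft"}
reconfig_time = {"fir": 4, "fft": 6}
alternating = schedule_length(["a", "b", "c", "d"], exec_time, config_of, reconfig_time)
grouped = schedule_length(["a", "c", "b", "d"], exec_time, config_of, reconfig_time)
```

The alternating order reconfigures four times and takes 28 time units. Grouping tasks that share a configuration reconfigures twice and takes 18.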

CORDS
CORDS uses evolutionary algorithms similar to MOGAC.
- Adds reconfiguration delay to costs based on current schedule state.
- Dynamic priority of a task depends on slack + reconfiguration delay.
- Increases dynamic priority of tasks with low reconfiguration time to group together several reconfigurations and save energy.

Nimble
Performs fine-grained partitioning for instruction-level parallelism. Platform is described in an architecture description language. Program is represented as a control flow graph. Selects interesting parts of loops by analyzing the control dependence graph.
[Li00] © 2000 IEEE

Hardware/software co-simulation
Must connect models with different models of computation and different time scales. A simulation backplane manages communication. Becker et al. used PLI in Verilog-XL to add C code that communicates with software models, and UNIX networking to connect the hardware simulator.
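The role of a simulation backplane can be sketched as a shared event queue that converts each simulator's local time to a common time base and delivers messages in global time order. This is an illustrative toy, not the mechanism used by Becker et al. or by any commercial tool. The `Backplane` class, the tick sizes, and the message names are assumptions.

```python
import heapq


class Backplane:
    """Orders messages from several simulators on one shared time base."""

    def __init__(self):
        self.queue = []      # heap of (time_ns, sequence, target, payload)
        self.sequence = 0    # tie-breaker that keeps posting order
        self.handlers = {}   # simulator name -> callback(time_ns, payload)

    def attach(self, name, handler):
        """Register the callback that receives messages for a simulator."""
        self.handlers[name] = handler

    def post(self, local_time, ns_per_tick, target, payload):
        """Convert a simulator's local time to nanoseconds and enqueue."""
        heapq.heappush(self.queue,
                       (local_time * ns_per_tick, self.sequence, target, payload))
        self.sequence += 1

    def run(self):
        """Deliver every queued message in global time order."""
        while self.queue:
            time_ns, _, target, payload = heapq.heappop(self.queue)
            self.handlers[target](time_ns, payload)


log = []
bp = Backplane()
bp.attach("hw", lambda t, msg: log.append((t, "hw", msg)))
bp.attach("sw", lambda t, msg: log.append((t, "sw", msg)))
bp.post(3, 10, "hw", "bus_write")   # software model: 1 tick = 10 ns
bp.post(12, 1, "sw", "interrupt")   # hardware model: 1 tick = 1 ns
bp.run()
```

Although the bus write was posted first, it lands at 30 ns on the shared time base, so the interrupt at 12 ns is delivered before it.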

Mentor Graphics Seamless
Hardware modules are described using standard HDLs. Software can be loaded as C or binary. A bus interface module connects hardware models to the processor instruction set simulator. A coherent memory server manages shared memory.