High Performance Embedded Computing © 2007 Elsevier. Chapter 7, part 1: Hardware/Software Co-Design. Wayne Wolf.
© 2006 Elsevier Topics Platforms. Performance analysis. Design representations.
© 2006 Elsevier Design platforms Different levels of integration: PC + board. Custom board with CPU + FPGA or ASIC. Platform FPGA. System-on-chip.
© 2006 Elsevier CPU/accelerator architecture The CPU is sometimes called the host. The accelerator communicates with the CPU via shared memory, and may use DMA. (diagram: CPU, memory, accelerator)
© 2006 Elsevier Example: Xilinx Virtex-4 System-on-chip: FPGA fabric. PowerPC. On-chip RAM. Specialized I/O devices. FPGA fabric is connected to PowerPC bus. MicroBlaze CPU can be added in FPGA fabric.
© 2006 Elsevier Example: WILDSTAR II Pro
© 2006 Elsevier Performance analysis Must analyze accelerator performance to determine system speedup. High-level synthesis helps: Use as estimator for accelerator performance. Use to implement accelerator.
© 2006 Elsevier Data path/controller architecture The data path performs regular operations and stores data in registers. The controller provides the required sequencing. (diagram: data path plus controller)
© 2006 Elsevier High-level synthesis High-level synthesis creates register-transfer description from behavioral description. Schedules and allocates: Operators. Variables. Connections. Control step or time step is one cycle in system controller. Components may be selected from technology library.
© 2006 Elsevier Models Model as data flow graph. Critical path is set of nodes on path that determines schedule length.
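The critical-path idea above can be sketched in a few lines of Python; the graph, node names, and unit delays below are hypothetical, not from the slides.

```python
# Hypothetical unit-delay data flow graph: node -> list of successors.
succ = {"a": ["c"], "b": ["c"], "c": ["d"], "d": []}
delay = {n: 1 for n in succ}  # one control step per operator

def critical_path_length(succ, delay):
    """Longest delay-weighted path in the DAG; it bounds the schedule length."""
    memo = {}
    def dist(n):  # longest path starting at n, including n's own delay
        if n not in memo:
            memo[n] = delay[n] + max((dist(s) for s in succ[n]), default=0)
        return memo[n]
    return max(dist(n) for n in succ)

print(critical_path_length(succ, delay))  # a -> c -> d takes 3 steps
```

Any schedule must be at least this long, which is why the critical path determines schedule length.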
© 2006 Elsevier Schedules As-soon-as-possible (ASAP) scheduling pushes all nodes to the start of their slack regions. As-late-as-possible (ALAP) scheduling pushes all nodes to the end of their slack regions. The two together are useful for bounding the schedule.
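A minimal sketch of ASAP and ALAP on a unit-delay data flow graph; the example graph and the chosen schedule length are hypothetical.

```python
# Hypothetical unit-delay DFG: node -> successors; predecessors derived below.
succ = {"a": ["c"], "b": ["c"], "c": ["d"], "d": []}
pred = {n: [] for n in succ}
for n, ss in succ.items():
    for s in ss:
        pred[s].append(n)

def topo_order(succ):
    order, seen = [], set()
    def visit(n):
        if n not in seen:
            seen.add(n)
            for s in succ[n]:
                visit(s)
            order.append(n)
    for n in succ:
        visit(n)
    return order[::-1]

def asap(succ, pred):
    """Each node starts as soon as all its predecessors have finished."""
    t = {}
    for n in topo_order(succ):
        t[n] = max((t[p] + 1 for p in pred[n]), default=0)
    return t

def alap(succ, pred, length):
    """Each node starts as late as the given schedule length allows."""
    t = {}
    for n in reversed(topo_order(succ)):
        t[n] = min((t[s] - 1 for s in succ[n]), default=length - 1)
    return t

print(asap(succ, pred))     # a, b at step 0; c at 1; d at 2
print(alap(succ, pred, 4))  # d at step 3; c at 2; a, b at 1
```

The difference ALAP − ASAP for a node is its slack (mobility); nodes with zero slack lie on the critical path.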
© 2006 Elsevier First-come first-served, critical path FCFS walks through data flow graph from sources to sinks. Schedules each operator in first available slot based on available resources. Critical-path scheduling walks through critical nodes first.
© 2006 Elsevier List scheduling Improvement on critical path scheduling. Estimates importance of nodes off the critical path. Estimates how close node is to being critical. D, number of descendants, estimates criticality. Node with fewer descendants is less likely to become critical. Traverse graph from sources to sinks. For nodes at a given depth, order nodes by criticality.
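The list-scheduling loop above can be sketched as follows, using descendant count as the criticality estimate; the graph and the single-function-unit resource limit are hypothetical.

```python
# Hypothetical unit-delay DFG; predecessors derived from successors.
succ = {"a": ["c"], "b": ["c"], "c": ["d"], "d": []}
pred = {n: [] for n in succ}
for n, ss in succ.items():
    for s in ss:
        pred[s].append(n)

def descendant_counts(succ):
    """Criticality estimate: nodes with more descendants go first."""
    memo = {}
    def desc(n):
        if n not in memo:
            ds = set()
            for s in succ[n]:
                ds |= {s} | desc(s)
            memo[n] = ds
        return memo[n]
    return {n: len(desc(n)) for n in succ}

def list_schedule(succ, pred, n_units):
    crit = descendant_counts(succ)
    sched, step = {}, 0
    while len(sched) < len(succ):
        # Ready: unscheduled nodes whose predecessors finished before this step.
        ready = [n for n in succ if n not in sched
                 and all(p in sched and sched[p] < step for p in pred[n])]
        ready.sort(key=lambda n: -crit[n])  # most-critical first
        for n in ready[:n_units]:           # fill the available units
            sched[n] = step
        step += 1
    return sched

print(list_schedule(succ, pred, 1))  # a:0, b:1, c:2, d:3
print(list_schedule(succ, pred, 2))  # a:0, b:0, c:1, d:2
```

With one function unit the independent nodes a and b are serialized; with two units they run in the same step and the schedule shortens by one step.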
© 2006 Elsevier Force-directed scheduling Forces model the connections to other operators. The forces on an operator change as the schedules of related operators change. Forces are a linear function of displacement. Predecessor/successor forces relate an operator to nearby operators. Place each operator at the minimum-force location in the schedule.
© 2006 Elsevier Distribution graph Bound schedule using ASAP, ALAP. Count number of operators of a given type at each point in the schedule. Weight by how likely each operator is to be at that time in the schedule.
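A sketch of building a distribution graph from ASAP/ALAP bounds; the two multiply operators and their bounds are hypothetical. Each operator is assumed equally likely to land anywhere in its slack range, so it contributes weight 1/(mobility + 1) to each step.

```python
# Hypothetical ASAP/ALAP bounds for two multiply operators.
asap_t = {"m1": 0, "m2": 0}
alap_t = {"m1": 1, "m2": 0}  # m1 has one step of slack, m2 has none
op_type = {"m1": "mul", "m2": "mul"}

def distribution_graph(asap_t, alap_t, op_type, which):
    """Expected number of 'which'-type operators active at each control
    step, weighting each operator uniformly over its slack range."""
    dg = {}
    for n, t in op_type.items():
        if t != which:
            continue
        lo, hi = asap_t[n], alap_t[n]
        w = 1.0 / (hi - lo + 1)
        for step in range(lo, hi + 1):
            dg[step] = dg.get(step, 0.0) + w
    return dg

print(distribution_graph(asap_t, alap_t, op_type, "mul"))  # {0: 1.5, 1: 0.5}
```

Force-directed scheduling then tries to flatten this graph: placing an operator where the distribution is already high incurs a large force.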
© 2006 Elsevier Path-based scheduling Minimizes the number of control states in controller. Schedules each path independently, then combines paths into a system schedule. Schedule path combinations using minimum clique covering.
© 2006 Elsevier Accelerator estimation How do we use high-level synthesis, etc. to estimate the performance of an accelerator? We have a behavioral description of the accelerator function. Need an estimate of the number of clock cycles. Need to evaluate a large number of candidate accelerator designs. Can’t afford to synthesize them all.
© 2006 Elsevier Estimation methods Hermann et al. used numerical methods: estimate the incremental cost of adding blocks to the accelerator. Henkel and Ernst used path-based scheduling: cut the CDFG into subgraphs (reduce loop iteration counts, cut at large joins, divide into equal-sized pieces), then schedule each subgraph independently.
© 2006 Elsevier Henkel and Ernst path-based estimation [Hen01] © 2001 IEEE
© 2006 Elsevier Fast incremental evaluation Vahid and Gajski estimate controller and data path costs incrementally. Hardware cost: FU = function units. SU = storage units. M = multiplexers. C = control logic. W = wiring. [Vah95] © 1995 IEEE
© 2006 Elsevier Vahid and Gajski estimation procedure Compile information on data path inputs and outputs, function and storage units, controller states, etc. Update algorithm changes tables based on incremental hardware changes. Executes in constant time for reasonable design characteristics. [Vah95] © 1995 IEEE
© 2006 Elsevier Single- vs. multi-threaded One critical factor is available parallelism: single-threaded/blocking: CPU waits for accelerator; multithreaded/non-blocking: CPU continues to execute along with accelerator. To multithread, CPU must have useful work to do. But software must also support multithreading.
© 2006 Elsevier Total execution time (timing diagrams comparing single-threaded and multi-threaded execution of processes P1–P4 and accelerator run A1)
© 2006 Elsevier Execution time analysis Single-threaded: Count execution time of all component processes. Multi-threaded: Find longest path through execution.
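The two analyses can be sketched with hypothetical process times; the assumption that P2 is independent of the accelerator result while P3 consumes it is mine, chosen to make the overlap visible.

```python
# Hypothetical CPU process times and accelerator time (arbitrary units).
cpu = {"P1": 5, "P2": 3, "P3": 4, "P4": 2}
acc_A1 = 6

# Single-threaded: the CPU blocks on the accelerator, so everything adds up.
t_single = cpu["P1"] + acc_A1 + cpu["P2"] + cpu["P3"] + cpu["P4"]

# Multi-threaded: assume P2 has no data dependence on A1, so it overlaps
# with the accelerator; P3 consumes A1's result and must wait for both.
t_multi = cpu["P1"] + max(acc_A1, cpu["P2"]) + cpu["P3"] + cpu["P4"]

print(t_single, t_multi)  # 20 17: overlapping P2 with A1 hides 3 time units
```

The multi-threaded total is a longest-path computation: the max() models the join where the CPU and accelerator timelines reconverge.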
© 2006 Elsevier Hardware-software partitioning Partitioning methods usually allow more than one ASIC. They typically ignore CPU memory traffic in bus utilization estimates, and typically assume that the CPU process blocks while waiting for the ASIC. (diagram: CPU, ASIC, memory)
© 2006 Elsevier Synthesis tasks Scheduling: make sure that data is available when it is needed. Allocation: make sure that processes don’t compete for the PE. Partitioning: break operations into separate processes to increase parallelism, put serial operations in one process to reduce communication. Mapping: take PE, communication link characteristics into account.
© 2006 Elsevier Scheduling and allocation Must schedule and allocate both computation and communication. Performance may vary greatly with the allocation choice. (diagram: processes P1–P3 allocated to CPU1 and ASIC1)
© 2006 Elsevier Problems in scheduling/allocation Can multiple processes execute concurrently? Is the performance granularity of available components fine enough to allow efficient search of the solution space? Do computation and communication requirements conflict? How accurately can we estimate performance?
© 2006 Elsevier Partitioning example Before: one process computes r = p1(a,b); s = p2(c,d); z = r + s. After: r = p1(a,b) and s = p2(c,d) are split into separate processes that can run in parallel, with z = r + s computed afterward.
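The example can be sketched in Python, with threads standing in for processes mapped to separate PEs; the bodies of p1 and p2 and their argument values are hypothetical.

```python
import threading

def p1(a, b):  # hypothetical operation
    return a * b

def p2(c, d):  # hypothetical operation
    return c + d

# After partitioning, r and s have no mutual dependence, so they can
# run concurrently; only the final addition is inherently serial.
results = {}
t1 = threading.Thread(target=lambda: results.update(r=p1(2, 3)))
t2 = threading.Thread(target=lambda: results.update(s=p2(4, 5)))
t1.start(); t2.start()
t1.join(); t2.join()

z = results["r"] + results["s"]  # the serial combining step
print(z)  # 15
```

This is the trade-off the previous slide names: splitting exposes parallelism, but each cut introduces communication (here, the shared results dictionary and the joins).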
© 2006 Elsevier Problems in partitioning At what level of granularity must partitioning be performed? How well can you partition the system without an allocation? How does communication overhead figure into partitioning?
© 2006 Elsevier Problems in mapping Mapping and allocation are strongly connected when the components vary widely in performance. Software performance depends on bus configuration as well as CPU type. Mappings of PEs and communication links are closely related.
© 2006 Elsevier Program representations CDFG: single-threaded, executable, can extract some parallelism. Task graph: task-level parallelism, no operator-level detail. TGFF generates random task graphs. UNITY: based on parallel programming language.
© 2006 Elsevier Platform representations A technology table describes PE and channel characteristics: CPU time, communication time, cost, power. A multiprocessor connectivity graph describes the PEs and channels (diagram: PE 1, PE 2, PE 3). Example table: Type ARM7, speed 50E6, cost 10; Type MIPS, speed 50E6, cost 8.
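A sketch of using such a technology table to evaluate a candidate mapping; the cycle counts, the one-PE-per-type assumption, and the serial-execution assumption are mine, making this a deliberately crude first estimate rather than a real co-synthesis cost function.

```python
# Hypothetical technology table (speeds in Hz, costs in arbitrary units).
tech = {
    "ARM7": {"speed": 50e6, "cost": 10},
    "MIPS": {"speed": 50e6, "cost": 8},
}

def evaluate_mapping(mapping, cycles, tech):
    """mapping: process -> PE type; cycles: per-process cycle counts.
    Returns (total time in seconds, total PE cost), assuming one PE per
    type in use and serial execution of all processes."""
    time = sum(cycles[p] / tech[pe]["speed"] for p, pe in mapping.items())
    cost = sum(tech[pe]["cost"] for pe in set(mapping.values()))
    return time, cost

time, cost = evaluate_mapping(
    {"P1": "ARM7", "P2": "MIPS"},
    {"P1": 1_000_000, "P2": 500_000},
    tech,
)
print(time, cost)  # about 0.03 s, cost 18
```

A co-synthesis tool would evaluate many such mappings, which is why the table lookup must be cheap.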