1
Analysis of Quasi-Static Scheduling Techniques in a Virtualized Reconfigurable Machine
Yury Markovskiy, Eylon Caspi, Randy Huang, Joseph Yeh, Michael Chu, John Wawrzynek (UC Berkeley BRASS Group)
André DeHon (California Institute of Technology)
2
BRASS, February 26, 2002, FPGA 2002
Outline
Hardware virtualization
SCORE model
Run-time scheduler
  Fully dynamic
  Quasi-static
Results
  7x reduction in scheduling overhead
  Application performance improved by a factor of 2-7
Conclusion
3
Hardware Virtualization
Traditional mapping tools expose resource constraints to the designer
Hardware virtualization enables:
  Application compatibility and longevity across a device family
  Automatic performance scaling on larger devices
4
Stream Computation Organized for Reconfigurable Execution (SCORE) (1)
Data-flow based framework: programming model, execution environment, hardware platform
Programming model
  Streaming dataflow graph of operators (FSM + datapath)
  Dynamic, data-dependent behavior
  Arbitrary-size operators
Run-time representation
  Graph of fixed-size compute pages
  Akin to virtual memory pages
Run-time scheduling is required to handle dynamic page behavior
5
Stream Computation Organized for Reconfigurable Execution (SCORE) (2)
Hardware platform
  uP/reconfigurable array hybrid
  Array: compute pages (CPs) and configurable memory blocks (CMBs)
  Stream interface between resources
  Global controller manages reconfiguration
Array reconfiguration
Scheduler operation
  Temporal partitioning
    Buffer intermediate results
  Resource allocation/mapping
    Compute pages
    Memory segments
    Communication channels
6
Run-time Scheduler
Run-time scheduling (late binding of resources)
  Benefit: automatic performance scaling
  Extra burden: the scheduler
    Complex optimization with multiple simultaneous constraints (CPs, CMBs, and network)
    NP-hard problem
What is the right timeslice size?
  Depends on an application's run-time behavior
  Affected by the scheduler overhead (lower bound)
Space of scheduling solutions
  Range in quality and complexity
  Tradeoffs: timeslice vs. asynchronous, or dynamic vs. static
7
Problem Statement
SCORE micro-architecture
  Parallel reconfiguration of independent CPs/CMBs
  Reconfiguration time is thousands of cycles
Problem
  Investigate scheduling cost
  Reduce it to a minimum (comparable to reconfiguration time)
  Understand its effect on application run-times
8
Initial Scheduling Solution
Version of priority-list scheduling
  Availability of input tokens and output space determines the priority
  Candidates are chosen by BFS
  Fixed timeslice size
Fully dynamic scheduler
  Performs the scheduling operation each timeslice
  Large critical loop
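The selection step described on this slide can be illustrated with a minimal Python sketch. It is not the SCORE scheduler itself: the graph encoding, the function names `bfs_order` and `schedule_timeslice`, and the priority formula (the smaller of available input tokens and free output space) are all assumptions made for illustration. The slide only states that token availability and output space determine priority and that candidates are found by BFS.

```python
from collections import deque

def bfs_order(graph, roots):
    """Visit pages breadth-first from the root pages (hypothetical encoding:
    graph maps each page name to the list of its downstream pages)."""
    seen, order, queue = set(roots), [], deque(roots)
    while queue:
        page = queue.popleft()
        order.append(page)
        for succ in graph.get(page, []):
            if succ not in seen:
                seen.add(succ)
                queue.append(succ)
    return order

def schedule_timeslice(graph, roots, input_tokens, output_space, num_cps):
    """Pick up to num_cps pages for one timeslice.

    Assumed priority: min(input tokens, output space). A page with no
    input tokens or no output space cannot make progress and is skipped.
    """
    candidates = [p for p in bfs_order(graph, roots)
                  if input_tokens[p] > 0 and output_space[p] > 0]
    # sort() is stable, so pages with equal priority keep their BFS order.
    candidates.sort(key=lambda p: min(input_tokens[p], output_space[p]),
                    reverse=True)
    return candidates[:num_cps]

# Illustrative four-page graph; all numbers are made up.
graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
input_tokens = {"A": 8, "B": 0, "C": 4, "D": 2}
output_space = {"A": 4, "B": 4, "C": 8, "D": 1}
chosen = schedule_timeslice(graph, ["A"], input_tokens, output_space, num_cps=2)
print(chosen)
```

Because this selection is repeated in full every timeslice, its cost sits on the critical loop, which is the overhead the following slides quantify.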
9
Fully Dynamic Scheduler (1)
Two types of overhead:
  Scheduler (avg. 124 Kcycles)
  Reconfiguration [array global controller] (avg. 3.5 Kcycles)
Average overhead per timeslice > 127 Kcycles
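The per-timeslice figure follows from adding the two averages on this slide. The short Python sketch below redoes that arithmetic; the variable and function names are illustrative, and only the two input averages come from the slide.

```python
# Average overheads reported on the slide, in Kcycles.
SCHEDULER_KCYCLES = 124.0
RECONFIG_KCYCLES = 3.5

def overhead_per_timeslice(scheduler_kcycles, reconfig_kcycles):
    """Total overhead paid each timeslice: scheduling plus reconfiguration."""
    return scheduler_kcycles + reconfig_kcycles

dynamic_overhead = overhead_per_timeslice(SCHEDULER_KCYCLES, RECONFIG_KCYCLES)
# Share of the overhead that is due to the scheduler alone.
scheduler_share = SCHEDULER_KCYCLES / dynamic_overhead

print(f"{dynamic_overhead} Kcycles per timeslice")
print(f"scheduler share of overhead: {scheduler_share:.1%}")
```

The sum is 127.5 Kcycles, consistent with the "> 127 Kcycles" on the slide, and the scheduler accounts for nearly all of it, which is why the talk targets scheduler cost rather than reconfiguration time.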
10
Fully Dynamic Scheduler (2)
Total execution time
  Scheduler overhead is on average 36% of execution time
  Timeslice size = 250 Kcycles
11
Quasi-Static Scheduler
Small run-time critical loop:
  Query array
  Issue script commands
Pre-compute schedule from:
  Graph topology
  Back annotations (I/O rates)
Generate script of configuration commands
Timeslice size
  Dynamically controlled by array hardware stall detect
  Hardware continuously (or at small intervals) monitors array activity
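The split between offline schedule generation and a small run-time loop can be sketched as follows. This is a simplified illustration under stated assumptions, not the SCORE implementation: `precompute_script` just cuts a topologically ordered page list into groups of at most `num_cps` pages (the real schedule also uses I/O-rate back annotations), and the hardware stall detector is modeled as a sequence of boolean samples. Both function names are hypothetical.

```python
def precompute_script(topo_pages, num_cps):
    """Offline step (assumed policy): split a topologically ordered page
    list into timeslices of at most num_cps pages each."""
    return [topo_pages[i:i + num_cps]
            for i in range(0, len(topo_pages), num_cps)]

def run_time_loop(script, stall_samples):
    """Run-time critical loop: query the array, and on a detected stall
    issue the next precomputed script entry, wrapping around at the end.

    stall_samples stands in for polling the array's stall-detect hardware.
    Returns the sequence of script entries that were issued.
    """
    position = 0
    issued = [script[position]]      # load the first timeslice
    for stalled in stall_samples:    # "query array"
        if stalled:                  # a stall ends the current timeslice
            position = (position + 1) % len(script)
            issued.append(script[position])  # "issue script commands"
    return issued

script = precompute_script(["A", "B", "C", "D", "E"], num_cps=2)
issued = run_time_loop(script, [False, True, False, True, True])
print(script)
print(issued)
```

The run-time loop does no graph traversal or sorting; it only indexes into the precomputed script, which is what keeps the critical loop small. Timeslice length is not fixed here: it is whatever interval elapses between stalls.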
12
Results (1)
A low-overhead scheduling solution
  Scheduler overhead (avg. 14 Kcycles)
  Reconfiguration (avg. 4 Kcycles)
7x average reduction in overhead
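The 7x figure can be checked against the averages reported in the talk for the two schedulers (124 and 3.5 Kcycles for fully dynamic, 14 and 4 Kcycles for quasi-static). The Python sketch below is only a sanity check on those averages; the names are illustrative, and a ratio of averages need not equal the per-application average the authors measured.

```python
def total_overhead(scheduler_kcycles, reconfig_kcycles):
    """Per-timeslice overhead in Kcycles: scheduler plus reconfiguration."""
    return scheduler_kcycles + reconfig_kcycles

dynamic = total_overhead(124.0, 3.5)      # fully dynamic scheduler
quasi_static = total_overhead(14.0, 4.0)  # quasi-static scheduler

reduction = dynamic / quasi_static
print(f"dynamic: {dynamic} Kcycles, quasi-static: {quasi_static} Kcycles")
print(f"overall reduction: {reduction:.2f}x")
```

The ratio comes out just above 7, in line with the slide. Reconfiguration time is roughly unchanged between the two schedulers, so the reduction comes from the scheduler term.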
13
Results (2)
4.5x average application speedup
  Reduction in overhead AND
  Improvement in scheduling quality
14
Results Summary
Tested applications:
  Image de/compression: consist of both dynamic-rate and static-rate operators
  All demonstrate similar speedups under the quasi-static scheduler
Performance improvements can be attributed to:
  Reduced scheduler overhead
  Improved scheduling quality: global view rather than the local (BFS) view of the dynamic scheduler
Reduction of the lower bound of timeslice size
  Expands the space of apps well suited for execution under virtualized hardware
Retained powerful semantics of dynamic data-dependent dataflow
15
Conclusion
Run-time scheduler
  Required for automatic scaling under hardware virtualization
  Run-time overhead sets a lower bound on the size of the scheduling step (response time):
    Restricts applicability of virtualized hardware
    Makes this model impractical for some apps
Low-overhead run-time scheduling is achievable:
  Without semantic restrictions
  With higher (or comparable) scheduling quality
7x reduction in overhead and simultaneous performance improvement of 2-7x
OS is a viable alternative to manual scheduling
16
Thank You
Thanks to: DARPA, Xilinx, and STMicro
For more information: http://brass.cs.berkeley.edu/SCORE