
Slide 1: Analysis of Quasi-Static Scheduling Techniques in a Virtualized Reconfigurable Machine
Yury Markovskiy, Eylon Caspi, Randy Huang, Joseph Yeh, Michael Chu, John Wawrzynek (UC Berkeley, BRASS Group)
André DeHon (California Institute of Technology)

Slide 2: Outline (BRASS, FPGA 2002, February 26, 2002)
- Hardware virtualization
- SCORE model
- Run-time scheduler
  - Fully dynamic
  - Quasi-static
- Results
  - 7x reduction in scheduling overhead
  - Application performance improved by a factor of 2-7
- Conclusion

Slide 3: Hardware Virtualization
- Traditional mapping tools
  - Expose resource constraints to the designer
- Hardware virtualization enables:
  - Application compatibility and longevity across a device family
  - Automatic performance scaling on larger devices

Slide 4: Stream Computation Organized for Reconfigurable Execution (SCORE) (1)
- Data-flow based framework
  - Programming model
  - Execution environment
  - Hardware platform
- Programming model
  - Streaming dataflow graph of operators (FSM + datapath)
  - Dynamic, data-dependent behavior
  - Operators of arbitrary size
- Run-time representation
  - Graph of fixed-size compute pages, akin to virtual memory pages
- Run-time scheduling is required to handle dynamic page behavior
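
To make the streaming programming model concrete, here is a minimal Python sketch of operators connected by bounded streams. It assumes a simple firing rule (an operator runs only when every input stream holds a token and every output stream has space) and models data-dependent output by letting the operator function return None to emit nothing. The names Stream, Operator and run_until_quiescent are hypothetical and are not SCORE's actual API; the sketch also ignores the fixed size of compute pages.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Stream:
    """Bounded FIFO connecting two operators (capacity is an assumed parameter)."""
    capacity: int
    fifo: deque = field(default_factory=deque)

    def has_token(self) -> bool:
        return len(self.fifo) > 0

    def has_space(self) -> bool:
        return len(self.fifo) < self.capacity


@dataclass
class Operator:
    """A dataflow operator; fn returns an output token, or None to emit nothing."""
    name: str
    inputs: List[Stream]
    outputs: List[Stream]
    fn: Callable[..., Optional[int]]

    def can_fire(self) -> bool:
        # Fire only when every input has a token and every output has room.
        return all(s.has_token() for s in self.inputs) and all(
            s.has_space() for s in self.outputs)

    def fire(self) -> None:
        args = [s.fifo.popleft() for s in self.inputs]
        result = self.fn(*args)
        if result is not None:  # data-dependent output rate
            for s in self.outputs:
                s.fifo.append(result)


def run_until_quiescent(ops: List[Operator]) -> None:
    """Keep firing ready operators until none can make progress."""
    progress = True
    while progress:
        progress = False
        for op in ops:
            if op.can_fire():
                op.fire()
                progress = True


# Usage: a filter whose output rate depends on the data it sees.
src, out = Stream(capacity=8), Stream(capacity=8)
src.fifo.extend(range(1, 7))
double_evens = Operator("double_evens", [src], [out],
                        lambda x: 2 * x if x % 2 == 0 else None)
run_until_quiescent([double_evens])
print(list(out.fifo))  # [4, 8, 12]
```

Because the filter consumes one token per firing but emits only for even inputs, its output rate cannot be known before run time, which is the kind of dynamic behavior the slide says the run-time scheduler must handle.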

Slide 5: Stream Computation Organized for Reconfigurable Execution (SCORE) (2)
- Hardware platform
  - Microprocessor (uP) / reconfigurable array hybrid
  - Array: compute pages (CPs) and configurable memory blocks (CMBs)
  - Stream interface between resources
  - Global controller manages reconfiguration
- Array reconfiguration
- Scheduler operation
  - Temporal partitioning
  - Buffering of intermediate results
  - Resource allocation/mapping:
    - Compute pages
    - Memory segments
    - Communication channels
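
As a rough illustration of temporal partitioning and buffering of intermediate results, the following sketch greedily cuts a page graph, in dataflow order, into groups that each fit on a given number of compute pages, and reports which streams cross a cut and therefore need a memory buffer (such as a CMB). It is a simplified stand-in, not the SCORE scheduler's algorithm: it assumes an acyclic graph (graphlib raises CycleError on feedback loops), counts only CPs, and ignores CMB capacity and network constraints. The function name temporal_partition is hypothetical.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def temporal_partition(edges, num_cps):
    """Greedy sketch: cut a page graph into groups that each fit on num_cps compute pages.

    edges: list of (producer_page, consumer_page) streams.
    Returns (partitions, buffered_streams).
    """
    predecessors = {}
    for src, dst in edges:
        predecessors.setdefault(src, set())
        predecessors.setdefault(dst, set()).add(src)
    # Visit pages in dataflow order so producers run no later than their consumers.
    order = list(TopologicalSorter(predecessors).static_order())
    partitions = [order[i:i + num_cps] for i in range(0, len(order), num_cps)]
    slot = {page: idx for idx, part in enumerate(partitions) for page in part}
    # A stream cut by the partitioning must be buffered in memory between the
    # timeslice that produces its tokens and the timeslice that consumes them.
    buffered = [(s, d) for s, d in edges if slot[s] != slot[d]]
    return partitions, buffered


parts, buffers = temporal_partition([("a", "b"), ("b", "c"), ("c", "d")], num_cps=2)
print(parts)    # [['a', 'b'], ['c', 'd']]
print(buffers)  # [('b', 'c')]
```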

Slide 6: Run-time Scheduler
- Run-time scheduling (late binding of resources)
  - Benefit: automatic performance scaling
  - Extra burden: the scheduler
    - Complex optimization with multiple simultaneous constraints (CPs, CMBs, and network)
    - NP-hard problem
- What is the right timeslice size?
  - Depends on an application's run-time behavior
  - Bounded from below by the scheduler overhead
- Space of scheduling solutions
  - Solutions range in quality and complexity
  - Tradeoffs: timesliced vs. asynchronous, dynamic vs. static
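
Why overhead bounds the timeslice from below can be made concrete with a simple model. Assuming the per-timeslice overhead O is serialized with the timeslice T (not overlapped with array execution), the useful fraction of time is T / (T + O), so reaching a target efficiency e requires T >= e * O / (1 - e). The sketch below evaluates this with the average per-timeslice overheads reported later in this talk; the 90% target and the function name min_timeslice are illustrative assumptions.

```python
def min_timeslice(overhead_cycles: float, target_efficiency: float) -> float:
    """Smallest timeslice T (cycles) such that T / (T + overhead) >= target_efficiency.

    Assumes the per-timeslice overhead is not overlapped with array execution.
    """
    if not 0.0 <= target_efficiency < 1.0:
        raise ValueError("target_efficiency must be in [0, 1)")
    return overhead_cycles * target_efficiency / (1.0 - target_efficiency)


# Average per-timeslice overheads reported in this talk (cycles):
# fully dynamic is about 124K + 3.5K, quasi-static about 14K + 4K.
for label, overhead in [("fully dynamic", 127.5e3), ("quasi-static", 18e3)]:
    print(label, round(min_timeslice(overhead, 0.9)))
```

Under this model, 90% efficiency needs timeslices of roughly 1.15 Mcycles with the fully dynamic scheduler but only about 162 Kcycles with the quasi-static one, which is the sense in which lower overhead widens the range of usable timeslice sizes.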

Slide 7: Problem Statement
- SCORE micro-architecture
  - Parallel reconfiguration of independent CPs/CMBs
  - Reconfiguration takes thousands of cycles
- Problem
  - Investigate the cost of scheduling
  - Reduce it to a minimum (comparable to the reconfiguration time)
  - Understand its effect on application run times

Slide 8: Initial Scheduling Solution
- A version of priority-list scheduling
  - Priority is determined by the availability of input tokens and output space
  - Candidates are chosen by breadth-first search (BFS)
- Fixed timeslice size
- Fully dynamic scheduler
  - Performs a scheduling operation every timeslice
  - Large critical loop
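
One step of such a priority-list scheduler might look like the sketch below: a breadth-first search enumerates candidate pages, pages with no input tokens or no output space are skipped, and the rest are ranked by how much work they can do before blocking. The exact priority formula used here, min(input tokens, output space), and the function name select_pages are assumptions for illustration only; the slide does not give the scheduler's actual formula.

```python
from collections import deque


def select_pages(succ, roots, tokens_in, space_out, num_cps):
    """Sketch of one fully dynamic scheduling step (priority rule is an assumption).

    succ:      page -> list of downstream pages (graph topology)
    roots:     pages where the breadth-first search starts
    tokens_in: page -> input tokens currently available
    space_out: page -> free output buffer space
    """
    # 1. Breadth-first search to enumerate candidate pages in graph order.
    seen, order, queue = set(roots), [], deque(roots)
    while queue:
        page = queue.popleft()
        order.append(page)
        for nxt in succ.get(page, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    # 2. A page is ready only if it has input tokens and output space.
    ready = [p for p in order if tokens_in.get(p, 0) > 0 and space_out.get(p, 0) > 0]
    # 3. Priority: how much work the page can do before blocking.
    #    Python's sort is stable (also with reverse=True), so ties keep BFS order.
    ready.sort(key=lambda p: min(tokens_in[p], space_out[p]), reverse=True)
    # 4. Fill the available compute pages in priority order.
    return ready[:num_cps]


succ = {"src": ["f"], "f": ["g"], "g": ["sink"]}
tokens = {"src": 5, "f": 3, "g": 0, "sink": 2}
space = {"src": 4, "f": 8, "g": 8, "sink": 1}
print(select_pages(succ, ["src"], tokens, space, num_cps=2))  # ['src', 'f']
```

Because every step re-runs the search and the ranking from the current array state, this loop sits on the critical path each timeslice, which is where the large overhead on the next slides comes from.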

Slide 9: Fully Dynamic Scheduler (1)
- Two types of overhead:
  - Scheduler (avg. 124 Kcycles)
  - Reconfiguration by the array's global controller (avg. 3.5 Kcycles)
- Average overhead per timeslice > 127 Kcycles

Slide 10: Fully Dynamic Scheduler (2)
- Total execution time (timeslice size = 250 Kcycles)
- Scheduler overhead is on average 36% of execution time
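
A back-of-the-envelope check of how large the overhead share should be, using the averages from slides 9 and 10. It assumes the scheduling and reconfiguration overhead is serialized with each 250 Kcycle timeslice; the function name overhead_fraction is hypothetical.

```python
def overhead_fraction(timeslice_kc: float, scheduler_kc: float, reconfig_kc: float) -> float:
    """Share of wall-clock time spent in overhead, assuming it is serialized with each timeslice."""
    overhead = scheduler_kc + reconfig_kc
    return overhead / (timeslice_kc + overhead)


# Slide figures: 250 Kcycle timeslice, 124 Kcycles scheduler, 3.5 Kcycles reconfiguration.
print(f"{overhead_fraction(250, 124, 3.5):.1%}")  # 33.8%
```

This ratio of averages lands in the same range as the reported 36%; the slide's figure is presumably averaged over applications and runs, so it need not match exactly.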

Slide 11: Quasi-Static Scheduler
- Small run-time critical loop:
  - Query the array
  - Issue script commands
- Pre-compute the schedule from:
  - Graph topology
  - Back annotations (I/O rates)
- Generate a script of configuration commands
- Timeslice size
  - Dynamically controlled by the array hardware's stall detection
  - Hardware continuously (or at small intervals) monitors array activity
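
The split between offline and run-time work can be sketched as follows: the schedule is turned into a script of configuration commands ahead of time, and the run-time loop only waits for the array to report a stall and then issues the next command. This is a simplified illustration under stated assumptions: the command format, the callbacks issue, array_stalled and app_done, and both function names are hypothetical, and the offline step here ignores the back-annotated I/O rates a real quasi-static scheduler would use.

```python
from typing import Callable, List, Sequence, Tuple

Command = Tuple[str, Tuple[str, ...]]


def precompute_script(partitions: Sequence[Sequence[str]]) -> List[Command]:
    """Offline: turn a precomputed partitioning into per-timeslice configuration commands."""
    return [("load", tuple(part)) for part in partitions]


def run_quasi_static(script: List[Command],
                     issue: Callable[[Command], None],
                     array_stalled: Callable[[], bool],
                     app_done: Callable[[], bool]) -> int:
    """Run time: replay the script, advancing only when the array reports a stall."""
    issued, step = 0, 0
    while not app_done():
        issue(script[step])             # no scheduling decision is made here
        issued += 1
        while not array_stalled():      # timeslice length comes from stall detection
            pass
        step = (step + 1) % len(script)
    return issued


# Usage with a toy "array" that stalls immediately and an app that needs five timeslices.
log: List[Command] = []
script = precompute_script([["a", "b"], ["c", "d"]])
count = run_quasi_static(script, log.append, lambda: True, lambda: len(log) >= 5)
print(count, log[-1])  # 5 ('load', ('a', 'b'))
```

The design point is that all expensive decisions live in precompute_script, so the per-timeslice cost is reduced to a status query plus issuing commands.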

Slide 12: Results (1)
- A low-overhead scheduling solution
  - Scheduler overhead (avg. 14 Kcycles)
  - Reconfiguration (avg. 4 Kcycles)
- 7x average reduction in overhead
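
A quick check of the 7x figure from the per-timeslice averages on slides 9 and 12. Since this is a ratio of averages rather than an average of per-application ratios, it only approximates the reported number.

```python
# Average per-timeslice overheads in Kcycles (scheduler + reconfiguration).
fully_dynamic = 124 + 3.5   # slide 9
quasi_static = 14 + 4       # slide 12
reduction = fully_dynamic / quasi_static
print(f"{reduction:.2f}x")  # 7.08x
```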

Slide 13: Results (2)
- 4.5x average application speedup, due to both:
  - Reduced overhead
  - Improved scheduling quality

Slide 14: Results Summary
- Tested applications:
  - Image compression/decompression, consisting of both dynamic-rate and static-rate operators
  - All show similar speedups under the quasi-static scheduler
- Performance improvements can be attributed to:
  - Reduced scheduler overhead
  - Improved scheduling quality: a global view of the graph rather than the local (BFS) view of the dynamic scheduler
- Reduction of the lower bound on timeslice size
  - Expands the space of applications well suited to execution on virtualized hardware
  - Retains the powerful semantics of dynamic, data-dependent dataflow

Slide 15: Conclusion
- A run-time scheduler is required for automatic scaling under hardware virtualization
- Run-time overhead sets a lower bound on the size of the scheduling step (response time), which:
  - Restricts the applicability of virtualized hardware
  - Makes this model impractical for some applications
- Low-overhead run-time scheduling is achievable:
  - Without semantic restrictions
  - With comparable or higher scheduling quality
  - 7x reduction in overhead with a simultaneous 2-7x performance improvement
- OS-level run-time scheduling is a viable alternative to manual scheduling

Slide 16: Thank You
- Thanks to DARPA, Xilinx, and STMicro
- For more information: http://brass.cs.berkeley.edu/SCORE

