BRASS Eylon Caspi, Michael Chu, Randy Huang, Joseph Yeh, John Wawrzynek University of California, Berkeley – BRASS.


1 Stream Computations Organized for Reconfigurable Execution (SCORE)
Eylon Caspi, Michael Chu, Randy Huang, Joseph Yeh, John Wawrzynek
University of California, Berkeley – BRASS group
André DeHon
California Institute of Technology – Dept. Computer Science

2 BRASS FPL 2000 (8/30/00) Goal: Software Survival
- Software for microprocessors survives on new devices
  - Binary compatibility
  - Automatic improvement
- Software for reconfigurable devices does not
  - Substantial effort to port/redeploy

3 Outline
- Problem: Software Survival
- A New Compute Model
- SCORE Components
- Preliminary Results
- Future Work

4 Why Can’t Reconfigurable Software Survive?
- Resource constraints/sizes are exposed:
  - to programmer
  - in low-level representation (netlist)
- Design revolves around device size
  - Algorithmic structure
  - Exploited parallelism

5 The SCORE Approach
- A compute model with unbounded resources
- Efficient hardware virtualization
- Demand paging

6 Page-Compatible Devices
- Family of devices with:
  - Common page definition
  - Varying number of pages
- Binary Compatibility
- Automatic Performance Improvement

7 Virtualizing a Netlist (is bad)
- Netlist is sensitive to timing
  - Disallow asynchronous features (e.g. busses)
  - Synchronous
- WASMII [Ling+Amano, FCCM ’93]
  - Page I/O via registers
  - Execute each cycle of every page
  - Huge reconfiguration overhead!
[Figure: page execution timeline with Execute and Reconfigure phases over time]

8 Previous Attempts at Virtualization
- Multi-context
  - DPGA [DeHon, FPGA ’94]
  - TM-FPGA [Xilinx, FCCM ’97]
  - Configuration Cache
- Striped
  - PipeRench [CMU, FPGA ’98]
  - Pipelined reconfiguration
  - Restricted to feed-forward pipelines

9 Streams
- Goal
  - Less frequent reconfiguration
  - Batch process block of inputs
  - Amortize reconfiguration cost over large data set
- Stream is:
  - Unidirectional page-to-page link
  - FIFO queue of data tokens
  - Unbounded depth
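The stream abstraction on this slide can be sketched in a few lines of Python. This is only an illustration, not SCORE's actual API: the `Stream` class and its `write`/`read` methods are hypothetical names, and a `deque` stands in for the unbounded FIFO.

```python
from collections import deque


class Stream:
    """Hypothetical model of a stream: a one-way FIFO of tokens with no fixed depth."""

    def __init__(self):
        # A deque grows on demand, which models "unbounded depth".
        self._tokens = deque()

    def write(self, token):
        """Producer side: append a token at the tail."""
        self._tokens.append(token)

    def read(self):
        """Consumer side: remove and return the oldest token."""
        return self._tokens.popleft()

    def __len__(self):
        return len(self._tokens)


# The producer can run ahead and write a whole block before the consumer runs,
# which is what lets a page process a batch of inputs per reconfiguration.
s = Stream()
for token in [10, 20, 30]:
    s.write(token)
first = s.read()
```

Tokens come out in the order they went in, and the producer never blocks on a full queue in this model.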

10 Stream Implementation
- Only one endpoint (page) loaded
  - Stream = memory buffer
  - Desire distributed, on-chip memory
- Both endpoints (pages) loaded
  - Stream = wire
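The rule on this slide amounts to a small decision function. The sketch below uses hypothetical names (`Binding`, `bind_stream`). It also assumes that a stream with neither endpoint loaded keeps its tokens in a memory buffer, a case the slide does not cover.

```python
from enum import Enum


class Binding(Enum):
    WIRE = "wire"
    MEMORY_BUFFER = "memory buffer"


def bind_stream(producer_loaded: bool, consumer_loaded: bool) -> Binding:
    """Choose how a stream is realized, given which endpoint pages are loaded."""
    if producer_loaded and consumer_loaded:
        # Both pages are resident, so tokens can flow directly between them.
        return Binding.WIRE
    # One endpoint is missing (or both, by assumption): tokens wait in memory.
    return Binding.MEMORY_BUFFER
```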

11 Execution Example: Spatial
[Figure: pipeline of pages DCT, Zig-zag, Quantize / ZLE, Huffman Enc.]

12 Execution Example: Time-Multiplexed
[Figure: pipeline of pages DCT, Zig-zag, Quant / ZLE, Huffman Enc.]

13 SCORE Components
- Graph-based Compute Model
- Hardware Support
- Scheduler
- Run-time Support

14 SCORE Compute Model
- Computation = graph of compute nodes
  - Concretely: compute pages
  - Abstractly: operators with local state (FSM)
- Communication = streaming data flow
- Storage =
  - Streams
  - Memory segments, accessed through streams
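As a rough illustration of this model, the sketch below builds a two-node graph in Python. All names (`Quantize`, `ZeroRunEncoder`, `run`) are hypothetical and the operators are simplified stand-ins, not the project's actual operators. Streams are modeled as `deque` objects, and the encoder carries local state (its zero count) between firings, in the spirit of an FSM operator.

```python
from collections import deque


class Quantize:
    """Stateless operator: integer-divides each input token by a fixed step."""

    def __init__(self, inp, out, step):
        self.inp, self.out, self.step = inp, out, step

    def can_fire(self):
        return len(self.inp) > 0

    def fire(self):
        self.out.append(self.inp.popleft() // self.step)


class ZeroRunEncoder:
    """Stateful operator: local state is the count of zeros seen since the last output."""

    def __init__(self, inp, out):
        self.inp, self.out = inp, out
        self.zero_run = 0

    def can_fire(self):
        return len(self.inp) > 0

    def fire(self):
        token = self.inp.popleft()
        if token == 0:
            self.zero_run += 1      # stay silent and remember the run length
        else:
            self.out.append((self.zero_run, token))
            self.zero_run = 0       # trailing zeros would remain in state (no flush here)


def run(operators):
    """Fire any operator that has input until no operator can fire."""
    progress = True
    while progress:
        progress = False
        for op in operators:
            if op.can_fire():
                op.fire()
                progress = True


source, mid, sink = deque([8, 1, 2, 16, 3, 0, 24]), deque(), deque()
graph = [Quantize(source, mid, step=8), ZeroRunEncoder(mid, sink)]
run(graph)
encoded = list(sink)  # [(0, 1), (2, 2), (2, 3)]
```

The operators only see their own state and their streams, so the same graph could in principle be run with all nodes resident or with nodes swapped in one at a time.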

15 SCORE Hardware Model
- Paged FPGA
  - Compute Page (CP)
    - Fixed-size slice of RC hardware
    - Fixed number of I/O ports
  - Distributed, on-chip memory
    - Configurable Memory Block (CMB)
    - Stream access
  - High-level interconnect
- Microprocessor
  - Run-time support + user code

16 SCORE Run-Time Support
- Mechanics of run-time reconfiguration
  - Page swap [context save/load]
  - Reconfigure interconnect
- Page Scheduling
  - Which page to run where, when
  - Static … Dynamic
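A toy version of page scheduling is sketched below in Python. It makes several assumptions: the graph is a feed-forward pipeline, pages are loaded in a fixed order, and each loaded group drains its whole input batch before the next swap. The function name `run_time_multiplexed` is hypothetical, and the dynamic, timer-driven scheduler described in this talk is not modeled.

```python
def run_time_multiplexed(stages, tokens, physical_pages):
    """Run a linear pipeline of virtual pages on a device with fewer physical pages.

    stages: list of (name, function) pairs in pipeline order (the virtual pages).
    Returns (outputs, number of time slices used).
    """
    buffered = list(tokens)   # stream contents held in memory between time slices
    slices = 0
    for start in range(0, len(stages), physical_pages):
        loaded = stages[start:start + physical_pages]   # pages swapped in for this slice
        slices += 1
        for _name, fn in loaded:
            # Streams between co-loaded pages behave like wires here;
            # the result after the last loaded page is buffered again.
            buffered = [fn(t) for t in buffered]
    return buffered, slices


pipeline = [
    ("double", lambda x: x * 2),
    ("increment", lambda x: x + 1),
    ("square", lambda x: x * x),
]
small_out, small_slices = run_time_multiplexed(pipeline, [1, 2, 3], physical_pages=1)
large_out, large_slices = run_time_multiplexed(pipeline, [1, 2, 3], physical_pages=3)
```

Both calls produce the same outputs. Only the number of time slices changes with the device size, which is the behavior the page-compatible device family is meant to provide.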

17 Functional Simulation
- FPGA based on HSRA [Berkeley, FPGA ’99]
  - CP: 512 4-LUTs
  - CMB: 2 Mbit DRAM
  - Area for CP-CMB pair:
    - 0.25 µm: 12.9 mm² (1/9 of PII-450)
    - 0.18 µm: 6.7 mm² (1/16 of PIII-600)
  - Page reconfiguration: 5000 cycles (from CMB)
  - Synchronous operation (same clock speed as processor)
- x86 microprocessor
- Page Scheduler task
  - Swap on timer interrupt (every 250,000 cycles)
  - Fully dynamic scheduling
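The two timing parameters on this slide give a rough upper bound on reconfiguration overhead. The Python sketch below assumes that one 5000-cycle page reconfiguration falls inside each 250,000-cycle timer period and that the rest of the period is useful execution. It ignores scheduler run time and any serialization of page loads, which the slide does not quantify. The function name is hypothetical.

```python
RECONFIG_CYCLES = 5_000       # page reconfiguration time from the slide
TIMER_PERIOD_CYCLES = 250_000  # swap interval from the slide


def overhead_fraction(reconfig_cycles, useful_cycles):
    """Fraction of total time spent reconfiguring rather than executing."""
    return reconfig_cycles / (reconfig_cycles + useful_cycles)


# Assumption: the reconfiguration is counted inside the timer period.
useful = TIMER_PERIOD_CYCLES - RECONFIG_CYCLES
per_slice = overhead_fraction(RECONFIG_CYCLES, useful)   # 5000 / 250000

# For contrast, reconfiguring after every single cycle of useful work
# (a hypothetical extreme, using the same reconfiguration time):
per_cycle = overhead_fraction(RECONFIG_CYCLES, 1)
```

Under these assumptions about 2% of each period goes to reconfiguration, whereas swapping after every cycle would leave almost no time for computation.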

18 Applications
- Multimedia processing applications
  - Hand-partitioned into 512-LUT pages
- Good applications
  - Primarily feed-forward (feedback loops fit in HW)
- Bad applications
  - Large, tight feedback loops (e.g. ADPCM)

Application        Pages  Segments
JPEG Encode           13         6
JPEG Decode           13         4
MPEG Encode           45       102
Wavelet Encode        14         6
Wavelet Decode        15         6

19 Application: JPEG Encode

20 Scaling Results: JPEG Encode
[Figure: total time (makespan, in millions of cycles) versus number of physical compute pages]

21 Summary
- SCORE enables software survival on reconfigurable systems
  - Binary compatibility
  - Automatic performance scaling
- Virtual Hardware
- Requirements:
  - Graph-based compute model
  - Paged FPGA hardware
  - Run-time support for RTR/Scheduling

22 Future Work
- Compilation/CAD
  - Partitioning FSM operators into pages
- Study architectural parameters
  - Page size
  - CMB size
  - Tolerable reconfiguration time
- Scheduling
  - Static scheduling

23 More Info on the Web
- SCORE project:
- Tutorial: score_tutorial.html

