The DARPA Dynamic Programming Benchmark on a Reconfigurable Computer

Luis E. Cordova, Duncan A. Buell, and Sreesa Akella
Department of Computer Science and Engineering, University of South Carolina

Justification
High-performance computing benchmarking: compare and improve the performance of reconfigurable computers against other supercomputers and against distributed and massively parallel processing machines.

Objectives
Design and benchmark a dynamic programming problem to:
- understand the advantages of reconfigurable supercomputing machines;
- study and devise a methodology for optimal mapping of algorithms;
- study the scalability of algorithms with the size of the input and with the size of the system;
- explore the limitations of reconfigurable computers;
- justify and propose architectural performance improvements for important problems.

Contributions of research
We have:
- designed and implemented benchmark 2 of the DARPA HPCS Discrete Mathematics problems;
- developed a methodology to map similar algorithms to reconfigurable hardware platforms;
- explored architectures and their trade-offs in area, bandwidth, power consumption, and parallelism.

Dynamic Programming Problem
[Figure: high-level view of the hardware platform. Two FPGA user logic chips (Chip1, maximizing; Chip2, sequencing) are connected to SRAM on-board memory banks OBM A-F. The chip diagrams show adders for the matrix elements A(1,1)...A(10,10), comparator (x >) and MUX trees, LUTs, and a tracking FIFO.]
[Figures: maximizing loop scheme; sequencing loop scheme; fully registered matrix architecture; Transformation 1: column-wise reading; Transformation 2: row-wise reading.]

Reconfigurable Computing Methodology
The entire design is based on standard high-level programming languages, ANSI C or Fortran. There is a seamless path from the naïve version of the algorithm coded in C to a version mapped to the specific SRC platform (a C sketch of such a naïve kernel, with its maximizing and sequencing loops, is given below). The methodology is based on transformations of the initial architecture into architectures that better exploit the parallelism of the problem. Effective utilization of the hardware resources is assisted by the SRC high-level compiler, which aids in code debugging and in eliminating slowdowns in suboptimal architectures. Storing all the matrices on-chip yields the top performance; that architecture is detailed in the 3-D figure above for the first matrix.

Architecture exploration
* Two sequencing architectures, optimized for area (a) and for memory bandwidth (b).
* Two maximizing architectures. The architecture reading in row-wise fashion (Transformation 2) offers higher performance than the column-wise one (Transformation 1); see the loop-ordering sketch below.
The specification of the top-performing design is an ANSI C file. We are able to explore a large number of possible architectures offering different trade-offs among parallelism, economy of resources, and throughput.

Limitation
We find a need to automate higher-level compilation steps at the problem level; this step requires specialized or expert knowledge of the field of application being studied.
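To make the starting point of the methodology concrete, the following is a minimal ANSI C sketch of a naïve dynamic-programming kernel with a fill-in ("maximizing") loop and a traceback ("sequencing") loop. The recurrence shown here (each score is the matrix entry plus the maximum of its three already-computed neighbors) is an illustrative assumption, not the benchmark's actual specification; the 10x10 dimension follows the A(1,1)...A(10,10) labels in the chip diagram.

#define N 10  /* matrix dimension taken from the A(1,1)..A(10,10) labels */

/* maximum of three values */
static int max3(int a, int b, int c)
{
    int m = a > b ? a : b;
    return m > c ? m : c;
}

/* Hypothetical kernel: fill-in ("maximizing") loop followed by a
 * traceback ("sequencing") loop that records the chosen path. */
void dp_kernel(const int A[N][N], int S[N][N], int path[2 * N][2])
{
    int i, j, k;

    /* Maximizing loop: each cell adds its input to the best of the
     * three previously computed neighbors. */
    for (i = 0; i < N; i++) {
        for (j = 0; j < N; j++) {
            int up   = (i > 0) ? S[i - 1][j] : 0;
            int left = (j > 0) ? S[i][j - 1] : 0;
            int diag = (i > 0 && j > 0) ? S[i - 1][j - 1] : 0;
            S[i][j] = A[i][j] + max3(up, left, diag);
        }
    }

    /* Sequencing loop: retrace the maximizing decisions from the
     * bottom-right corner, as the tracking FIFO does in hardware. */
    i = N - 1;
    j = N - 1;
    k = 0;
    while (i > 0 || j > 0) {
        int up, left, diag;
        path[k][0] = i;
        path[k][1] = j;
        k++;
        up   = (i > 0) ? S[i - 1][j] : -1;
        left = (j > 0) ? S[i][j - 1] : -1;
        diag = (i > 0 && j > 0) ? S[i - 1][j - 1] : -1;
        if (diag >= up && diag >= left) { i--; j--; }
        else if (up >= left)            { i--; }
        else                            { j--; }
    }
    path[k][0] = 0;
    path[k][1] = 0;
}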
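The two read transformations named in the figures amount to a loop interchange at the C source level. The sketch below is an assumption about how the two orderings look in software; consume() is a placeholder standing in for the hardware datapath, not part of the original design. On the platform described above, the row-wise order of Transformation 2 streams each on-board memory bank sequentially, which is consistent with the higher performance reported under Architecture exploration.

#define N 10

static long checksum;  /* stand-in for the hardware datapath */
static void consume(int value) { checksum += value; }

/* Transformation 1: column-wise reading.  Successive accesses stride
 * across rows, which limits how many matrix words arrive per clock. */
void read_column_wise(const int A[N][N])
{
    int i, j;
    for (j = 0; j < N; j++)
        for (i = 0; i < N; i++)
            consume(A[i][j]);
}

/* Transformation 2: row-wise reading.  Successive accesses are
 * sequential within a row, so the on-board memory banks can be
 * streamed at full bandwidth into the fully registered matrix. */
void read_row_wise(const int A[N][N])
{
    int i, j;
    for (i = 0; i < N; i++)
        for (j = 0; j < N; j++)
            consume(A[i][j]);
}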
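The comparator (x >) and MUX ranks drawn in the chip diagrams suggest a tree reduction for finding the maximum. The following is a hypothetical software stand-in for that idea rather than the poster's actual circuit: each pass compares pairs in place, halving the number of candidates until one maximum remains, much as a rank of comparators feeding MUXes would per clock cycle.

/* Comparator-tree maximum over v[0..n-1], n >= 1.
 * Each pass models one rank of comparators and MUXes; the candidate
 * count halves until a single value is left.  Reduces in place. */
int tree_max(int v[], int n)
{
    int level, i;
    for (level = n; level > 1; level = (level + 1) / 2) {
        for (i = 0; i < level / 2; i++)
            v[i] = (v[2 * i] > v[2 * i + 1]) ? v[2 * i] : v[2 * i + 1];
        if (level % 2)                 /* carry an unpaired value forward */
            v[level / 2] = v[level - 1];
    }
    return v[0];
}

In hardware the passes are unrolled into parallel comparator levels, so a row of matrix values can be reduced in a logarithmic number of pipeline stages instead of a sequential scan.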