
Introduction to: Reconfigurable Hardware
Shervin Vakili (shervinv@gmail.com)
December 22, 2007
All materials are copyrights of their respective authors as listed in the references.

Reconfigurable Hardware
Reconfigurable computing refers to systems that incorporate some form of hardware programmability, customizing how the hardware is used through a number of physical control points. These control points can be changed periodically in order to execute different applications on the same hardware. Modern applications place conflicting demands on flexibility and implementation efficiency that neither conventional instruction-set processors nor application-specific circuits can satisfy on their own; reconfigurable hardware offers a good balance between the two.

Reconfigurable Hardware (cont’d)
This is because reconfigurable hardware combines post-fabrication programmability with the parallel computation style of application-specific circuits, which is more efficient than the sequential computation style of instruction-set processors. There are additional reasons for using reconfigurable resources in System-on-Chip (SoC) design. Increasing non-recurring engineering (NRE) costs push designers to reuse the same SoC across several applications and products to achieve a low cost per chip. The presence of reconfigurable resources allows the chip to be fine-tuned for different products or product variations.

Reconfigurable Hardware (cont’d)
Also, the increasing complexity of future designs raises the likelihood of design flaws, which can require a costly and slow redesign of the chip. Reconfigurable resources reduce this risk:
- Reconfigurable elements are often homogeneous arrays, which can be pre-verified to minimize the possibility of design errors.
- Post-manufacturing programmability of reconfigurable elements allows problems to be corrected after fabrication.

Reconfigurable Hardware (cont’d)
[Figure; source: [6]]

Types of Reconfiguration
- Logic reconfiguration
- Instruction-set reconfiguration
- Static or dynamic reconfiguration
- Full or partial reconfiguration
- Fine-grained, medium-grained and coarse-grained reconfiguration

Logic Reconfiguration
A typical block for logic reconfiguration contains a look-up table (LUT), an optional D flip-flop (latch) and additional combinational logic. The LUT can implement any logic function of its inputs, providing generic logic. The flip-flop can be used for pipelining, as a register for holding logic values, or in any other situation where clocking is required. The additional combinational logic is usually ‘carry logic’, used to speed up carry-based computations (e.g., additions). In addition to operating as a function generator, each LUT can provide RAM functionality. Furthermore, two or more logic blocks can be combined to implement more complex functions.
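The behavior of such a block can be sketched in Python. This is a behavioral model under stated assumptions, not a description of any vendor's circuit: the 16-bit configuration word, the ordering of the LUT inputs, and the names `configure`, `LogicBlock`, `carry_logic` and `ripple_add` are all hypothetical. It shows the LUT implementing an arbitrary function (here the sum bit of a full adder), separate carry logic producing the carry, and the optional flip-flop adding a one-cycle delay.

```python
from dataclasses import dataclass
from typing import Callable


def configure(func: Callable[[int, int, int, int], int]) -> int:
    """Build a 16-bit LUT configuration word for any 4-input Boolean function.

    Bit i of the word stores func(a, b, c, d), where a is bit 0 of i,
    b is bit 1, c is bit 2 and d is bit 3 (an assumed ordering).
    """
    word = 0
    for i in range(16):
        a, b, c, d = ((i >> k) & 1 for k in range(4))
        word |= (func(a, b, c, d) & 1) << i
    return word


@dataclass
class LogicBlock:
    """Toy logic block: a 4-input LUT plus an optional D flip-flop."""
    truth_table: int          # 16-bit configuration word
    registered: bool = False  # True routes the LUT output through the flip-flop
    q: int = 0                # flip-flop state

    def lut(self, a: int, b: int, c: int, d: int) -> int:
        # The inputs form an address into the stored truth table.
        index = a | (b << 1) | (c << 2) | (d << 3)
        return (self.truth_table >> index) & 1

    def clock(self, a: int, b: int, c: int, d: int) -> int:
        """Evaluate one cycle. In registered mode the output is the value
        captured on the previous call (a one-cycle delay)."""
        value = self.lut(a, b, c, d)
        if not self.registered:
            return value
        out, self.q = self.q, value
        return out


def carry_logic(a: int, b: int, cin: int) -> int:
    # Dedicated carry path next to the LUT: carry-out of a full adder.
    return (a & b) | (cin & (a ^ b))


def ripple_add(x: int, y: int, width: int = 4) -> int:
    """Add two unsigned numbers using one logic block per bit position."""
    sum_config = configure(lambda a, b, c, d: a ^ b ^ c)  # input d is unused
    blocks = [LogicBlock(sum_config) for _ in range(width)]
    carry, result = 0, 0
    for i, block in enumerate(blocks):
        a, b = (x >> i) & 1, (y >> i) & 1
        result |= block.clock(a, b, carry, 0) << i
        carry = carry_logic(a, b, carry)
    return result | (carry << width)


print(ripple_add(9, 7))  # prints 16
```

Keeping the carry outside the LUT mirrors the point made above: the LUT stays fully generic, while the fixed carry path handles the one operation that every adder needs.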

Logic Reconfiguration (cont’d)
Example of a basic logic block (Xilinx Virtex FPGA): each FPGA slice contains two basic reconfigurable logic blocks. The 4-input look-up table (LUT) is implemented with a multiplexer whose select lines are the inputs of the LUT and whose data inputs are constants (the configuration bits).
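The multiplexer structure can be sketched as a tree of 2:1 multiplexers. This is an idealized behavioral model (no timing, function names are hypothetical): the stored constants are the data inputs, and each LUT input drives the select line of one level of the tree. A k-input LUT therefore needs 2^k configuration bits and 2^k - 1 two-input multiplexers, so the 4-input case uses 16 bits and 15 multiplexers.

```python
def mux2(sel: int, in0: int, in1: int) -> int:
    """2:1 multiplexer: returns in1 when sel is 1, otherwise in0."""
    return in1 if sel else in0


def lut_mux_tree(config_bits: list[int], inputs: list[int]) -> int:
    """Evaluate a LUT built as a tree of 2:1 multiplexers.

    config_bits: the 2**k constant data inputs (the stored configuration).
    inputs: the k LUT inputs used as select lines; inputs[0] drives the leaf
    level, so it acts as the least significant address bit.
    """
    if len(config_bits) != 2 ** len(inputs):
        raise ValueError("need exactly 2**k configuration bits for k inputs")
    level = list(config_bits)
    for sel in inputs:
        # Each level halves the number of candidates using one select line.
        level = [mux2(sel, level[j], level[j + 1]) for j in range(0, len(level), 2)]
    return level[0]


# Configure a 4-input LUT as 4-input parity and check it against direct lookup.
parity_bits = [bin(i).count("1") % 2 for i in range(16)]
for i in range(16):
    ins = [(i >> k) & 1 for k in range(4)]
    assert lut_mux_tree(parity_bits, ins) == parity_bits[i]
```

Changing the function means changing only `config_bits`; the multiplexer tree itself is fixed hardware.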

Instruction-Set Reconfiguration
Instruction-set reconfiguration refers to architectures consisting of a microprocessor and reconfigurable logic. The key benefit is the combination of software flexibility with hardware efficiency. One promising approach is the reconfigurable instruction-set processor (RISP), which can adapt its instruction set to the application being executed by reconfiguring its hardware. Through this adaptation, specialized hardware accelerates the execution of the application. By moving the execution of some application tasks to the reconfigurable part of the processor, a significant performance improvement can be achieved.
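The idea of adapting the instruction set can be illustrated with a toy interpreter. This is a hypothetical sketch, not a real RISP: the `ToyRISP` class, the `ADD`/`CUSTOM` opcodes, the 8-register file and the 32-bit wraparound are assumptions. "Reconfiguring" the hardware is modeled as loading a new Python function into the `CUSTOM` slot, so the same opcode executes a different specialized operation for each application.

```python
from typing import Callable, Optional


class ToyRISP:
    """Toy processor: fixed ADD instruction plus one reconfigurable CUSTOM slot."""

    def __init__(self) -> None:
        self.regs = [0] * 8
        # Function currently "loaded" into the reconfigurable hardware.
        self.custom: Optional[Callable[[int, int], int]] = None

    def reconfigure(self, func: Callable[[int, int], int]) -> None:
        """Adapt the instruction set by loading new hardware behavior."""
        self.custom = func

    def execute(self, program: list[tuple[str, int, int, int]]) -> list[int]:
        """Run (opcode, rd, rs1, rs2) instructions and return the registers."""
        for op, rd, rs1, rs2 in program:
            a, b = self.regs[rs1], self.regs[rs2]
            if op == "ADD":
                self.regs[rd] = (a + b) & 0xFFFFFFFF
            elif op == "CUSTOM":
                if self.custom is None:
                    raise RuntimeError("reconfigurable slot is not configured")
                self.regs[rd] = self.custom(a, b) & 0xFFFFFFFF
            else:
                raise ValueError(f"unknown opcode {op!r}")
        return self.regs


cpu = ToyRISP()
cpu.regs[1], cpu.regs[2] = 0b1011, 0b0110
# Application 1: Hamming distance as a single custom instruction.
cpu.reconfigure(lambda a, b: bin(a ^ b).count("1"))
cpu.execute([("CUSTOM", 3, 1, 2)])
# Application 2: same opcode, new hardware behavior (multiply).
cpu.reconfigure(lambda a, b: a * b)
cpu.execute([("CUSTOM", 4, 1, 2)])
print(cpu.regs[3], cpu.regs[4])  # prints 3 66
```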

Instruction-Set Reconfiguration (cont’d)
One important issue is the type of interface between the microprocessor and the reconfigurable logic.
- Option 1: a reconfigurable functional unit (RFU) inside the processor. The instruction decoder issues instructions to the RFU as if it were one of the processor's functional units. The communication cost is very small and the speed improvement is significant.
- Option 2: the reconfigurable logic is placed next to the processor, operating as a co-processor. Communication is performed through a protocol.
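The communication cost of Option 2 can be made visible with a small sketch. Everything here is hypothetical: the register offsets, the START/DONE handshake and the names `CoprocessorBus` and `call_coprocessor` are assumptions, not a real device's register map. Each register access counts as one bus transaction.

```python
from typing import Callable


class CoprocessorBus:
    """Hypothetical memory-mapped link to a reconfigurable co-processor."""

    ARG0, ARG1, START, DONE, RESULT = range(5)  # assumed register offsets

    def __init__(self, accelerator: Callable[[int, int], int]) -> None:
        self.regs = [0] * 5
        self.accelerator = accelerator  # function mapped onto the reconfigurable logic
        self.transactions = 0

    def write(self, addr: int, value: int) -> None:
        self.transactions += 1
        self.regs[addr] = value
        if addr == self.START and value == 1:
            # Co-processor side: compute, publish the result, then raise DONE.
            self.regs[self.RESULT] = self.accelerator(self.regs[self.ARG0], self.regs[self.ARG1])
            self.regs[self.DONE] = 1

    def read(self, addr: int) -> int:
        self.transactions += 1
        return self.regs[addr]


def call_coprocessor(bus: CoprocessorBus, a: int, b: int) -> int:
    """Processor side of the protocol: send operands, start, poll, fetch result."""
    bus.write(bus.DONE, 0)          # clear the completion flag
    bus.write(bus.ARG0, a)
    bus.write(bus.ARG1, b)
    bus.write(bus.START, 1)
    while bus.read(bus.DONE) == 0:  # poll until the co-processor finishes
        pass
    return bus.read(bus.RESULT)


bus = CoprocessorBus(lambda x, y: x * y)
print(call_coprocessor(bus, 6, 7), bus.transactions)  # prints 42 6
```

In this toy model the co-processor finishes instantly, so polling ends after one read; a real co-processor would add waiting time on top. With an RFU (Option 1), the same operation would be a single instruction reading operands directly from the register file, which is why its communication cost is described as very small.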

Static Reconfiguration
Static reconfiguration (often referred to as compile-time reconfiguration) is the simplest and most common approach for implementing applications with reconfigurable logic. It involves hardware changes at a relatively slow rate and consists of a single system-wide configuration. Prior to the execution of an application, the reconfigurable resources are loaded with their respective configurations, and during execution they remain in the same configuration (i.e., remain static) until the application finishes.
Advantages: higher performance than a pure software implementation, and lower cost than application-specific hardware.

Static Reconfiguration (cont’d)
To reconfigure a statically reconfigurable architecture, the system has to be halted while the reconfiguration is in progress and then restarted with the new configuration. Traditional FPGA architectures are primarily statically programmed devices that allow only one configuration to be loaded at a time. FPGAs of this type are programmed using a serial stream of configuration information (stored in SRAM), so any change requires a full reconfiguration.
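The halt, reload, restart cycle can be sketched as a toy device model. The `StaticFPGA` class and the 1024-byte bitstream are hypothetical; the point is only that loading is refused while the device runs and that every load replaces the entire configuration, even when a single byte changes.

```python
from typing import Optional


class StaticFPGA:
    """Toy statically reconfigurable device holding one full configuration."""

    def __init__(self) -> None:
        self.bitstream: Optional[bytes] = None
        self.running = False

    def halt(self) -> None:
        self.running = False

    def start(self) -> None:
        if self.bitstream is None:
            raise RuntimeError("no configuration loaded")
        self.running = True

    def load(self, bitstream: bytes) -> int:
        """Serially load a complete bitstream and return the bytes written."""
        if self.running:
            raise RuntimeError("device must be halted before reconfiguration")
        self.bitstream = bytes(bitstream)  # always a full replacement
        return len(self.bitstream)


def reconfigure(device: StaticFPGA, new_bitstream: bytes) -> int:
    """Halt, reload the whole configuration, restart."""
    device.halt()
    written = device.load(new_bitstream)
    device.start()
    return written


fpga = StaticFPGA()
fpga.load(bytes(1024))
fpga.start()
patched = bytearray(fpga.bitstream)
patched[10] = 0xFF                        # a one-byte change...
print(reconfigure(fpga, bytes(patched)))  # ...still rewrites all 1024 bytes
```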

Dynamic Reconfiguration
Whereas static reconfiguration allocates logic for the duration of an application, dynamic reconfiguration (often referred to as run-time reconfiguration) uses a dynamic allocation scheme that re-allocates hardware at run time (i.e., during execution of the application). The physical hardware is smaller than the sum of the required resources: a number of configurations are swapped in and out of the actual hardware as they are needed.
Problems: the algorithm must be divided into time-exclusive segments that do not need to run concurrently, and the transmission of intermediate results from one configuration to the next must be managed.
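Both problems can be illustrated with a small sketch: a pipeline of stages whose total area exceeds the fabric is split into time-exclusive configurations, and the intermediate data is carried from one configuration to the next. The greedy partitioning, the area units and the stage functions are assumptions chosen for illustration, not a method from the cited sources.

```python
def partition(stages, capacity):
    """Greedily group consecutive stages into configurations that fit the fabric.

    stages: list of (name, area, function) tuples, executed in order.
    capacity: area of the physical fabric (hypothetical units).
    """
    configs, current, used = [], [], 0
    for name, area, func in stages:
        if area > capacity:
            raise ValueError(f"stage {name!r} does not fit on the fabric")
        if used + area > capacity:   # fabric full: close this configuration
            configs.append(current)
            current, used = [], 0
        current.append((name, area, func))
        used += area
    if current:
        configs.append(current)
    return configs


def run_time_multiplexed(stages, capacity, data):
    """Swap configurations in one at a time, carrying intermediate results."""
    configs = partition(stages, capacity)
    for config in configs:            # (re)configure the fabric with this segment
        for _, _, func in config:
            data = func(data)         # result is kept for the next segment
    return data, len(configs)


stages = [
    ("scale", 3, lambda xs: [2 * x for x in xs]),
    ("offset", 2, lambda xs: [x + 1 for x in xs]),
    ("clip", 4, lambda xs: [min(x, 5) for x in xs]),
]
# Total area is 9, but the fabric only has 5 units: two configurations.
print(run_time_multiplexed(stages, capacity=5, data=[1, 2, 3]))  # prints ([3, 5, 5], 2)
```

The result is the same as running all stages at once on a fabric of area 9; the cost is the time spent swapping configurations, which is the time/space trade-off described next.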

Dynamic Reconfiguration (cont’d)
Advantages: the benefits of static reconfiguration are retained, and an efficient trade-off between time and space (cost) can be achieved.
There are two different configuration memory styles that can be used with dynamically reconfigurable systems:
- A single-context device is a serially programmed device that requires a complete reconfiguration in order to change any of the programming bits.
- A multi-context device has multiple layers of programming bits, each of which can be active at a different point in time.

Dynamic Reconfiguration (cont’d)
To implement run-time reconfiguration on a single-context device (FPGA), the different full configurations must be grouped into layers within the configuration memory, and each layer is swapped in and out of the FPGA as needed. Although reconfiguring a single-context device is simple, the overhead is high when only a small part of the configuration memory needs to be changed. Because such devices allow only full reconfigurations, a good partitioning of the different configurations between layers is essential.

Dynamic Reconfiguration (cont’d)
Multi-context architectures include multiple memory bits for each programming bit location. Only one layer of configuration information can be active at a given moment, but the device can quickly switch between different layers (contexts) of already-programmed configurations. However, this method requires more area than single-context structures, since there must be several storage units per programming location.
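The trade-off between the two memory styles can be sketched by counting configuration bits written versus storage bits needed. The two classes and the 1000-bit configurations are hypothetical, and bits written is used as a stand-in for reconfiguration time.

```python
class SingleContextDevice:
    """Serially programmed: every switch rewrites the whole configuration."""

    def __init__(self, n_bits: int) -> None:
        self.active = [0] * n_bits
        self.bits_written = 0

    def switch_to(self, config: list[int]) -> None:
        self.active = list(config)
        self.bits_written += len(config)  # full reload, even if few bits differ

    def storage_bits(self) -> int:
        return len(self.active)


class MultiContextDevice:
    """Several configuration planes per programming location; one is active."""

    def __init__(self, contexts: list[list[int]]) -> None:
        self.contexts = [list(c) for c in contexts]
        self.active_index = 0
        self.bits_written = sum(len(c) for c in contexts)  # one-time preload

    @property
    def active(self) -> list[int]:
        return self.contexts[self.active_index]

    def switch_to(self, index: int) -> None:
        self.active_index = index  # context switch: no bits rewritten

    def storage_bits(self) -> int:
        # The area cost: one storage bit per location per context.
        return sum(len(c) for c in self.contexts)


configs = [[0] * 1000, [1] * 1000]
single, multi = SingleContextDevice(1000), MultiContextDevice(configs)
for step in range(10):                 # alternate between two configurations
    single.switch_to(configs[step % 2])
    multi.switch_to(step % 2)
print(single.bits_written, multi.bits_written)      # prints 10000 2000
print(single.storage_bits(), multi.storage_bits())  # prints 1000 2000
```

The multi-context device avoids repeated reloads at the price of doubling the configuration storage, matching the area penalty noted above.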

Partial Reconfiguration
In some cases, configurations do not occupy the full reconfigurable hardware, or only part of a configuration requires modification. In both situations, a partial reconfiguration of the reconfigurable resources is desirable, rather than the full reconfiguration supported by serial architectures (those programmed using serial streams of configuration information). Partially reconfigurable architectures use addresses (like a RAM device) to specify the target location of the configuration data, allowing selective reconfiguration of the reconfigurable resources.

Partial Reconfiguration (cont’d)
The undisturbed portions of the reconfigurable resources may continue execution, allowing computation (execution) to overlap with reconfiguration. Care is required to manage the transmission of data between the unchanged and changed portions of the reconfigurable resources. Partially run-time reconfigurable architectures may allow complete reconfiguration flexibility (Xilinx 6200) or may require a full array column to be reconfigured at once (Xilinx Virtex).
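The difference between the two granularities can be sketched with an addressable configuration memory. This is a simplification under stated assumptions: the `PartialFabric` class and the 16 x 16 array are hypothetical, and real devices address configuration frames rather than individual cells. Changing one cell costs one write with cell-level addressing but a full column read-modify-write with column-level addressing; all other columns stay untouched and could keep running.

```python
class PartialFabric:
    """Addressable configuration memory organised as rows x columns of cells."""

    def __init__(self, rows: int, cols: int) -> None:
        self.rows, self.cols = rows, cols
        self.config = [[0] * cols for _ in range(rows)]
        self.cells_written = 0

    def write_cell(self, row: int, col: int, value: int) -> None:
        """Cell-granular write (6200-style flexibility)."""
        self.config[row][col] = value
        self.cells_written += 1

    def write_column(self, col: int, values: list[int]) -> None:
        """Column-granular write (Virtex-style): the whole column is rewritten."""
        if len(values) != self.rows:
            raise ValueError("a column write must supply one value per row")
        for row, value in enumerate(values):
            self.config[row][col] = value
        self.cells_written += self.rows

    def update_via_column(self, row: int, col: int, value: int) -> None:
        """Change one cell on a column-granular device: read-modify-write."""
        column = [self.config[r][col] for r in range(self.rows)]
        column[row] = value
        self.write_column(col, column)


fine, coarse = PartialFabric(16, 16), PartialFabric(16, 16)
fine.write_cell(3, 5, 7)
coarse.update_via_column(3, 5, 7)
print(fine.cells_written, coarse.cells_written)  # prints 1 16
```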

Five Ways to Design & Implement Custom Logic
1. Hardwired implementation
2. Processor + software
   - Traditional embedded processor or DSP
   - Homogeneous multiprocessor: MIT Raw
   - Heterogeneous multiprocessor: QuickSilver ACM
3. Configurable processor + software
   - Tensilica Xtensa
   - PDI VUPU
4. Reconfigurable hardware
   - NEC DRP
5. Reconfigurable processor + software
   - IP Flex DAP/DNA
   - Stretch

A Taxonomy [2] - HW vs. SW, Configurability, and Reconfigurability

Reconfigurable Hardware vs. Reconfigurable Processor

Architectural Model Characterization: The Systolic Ring
- Architectural model based on a coarse-grained configurable PE (the Dnode)
- Circular datapaths
- 3 parameters: C = number of layers, N = number of Dnodes per layer, S = number of rings
[Figure: Dnodes and switches arranged in layers 1 to 4 (C = 4, N = 2); a second view shows 4 Systolic Rings (S = 4)] [6]
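The three parameters can be related to the structure with a small topology sketch. The `(ring, layer, index)` naming and the assumption that each switch connects every Dnode of a layer to every Dnode of the next layer are hypothetical simplifications; the wrap-around from the last layer to the first is what makes the datapath circular.

```python
def ring_topology(C: int, N: int, S: int = 1):
    """Enumerate Dnodes and the layer-to-layer links of S Systolic Rings.

    A Dnode is identified as (ring, layer, index). Assumption: the switch
    between two layers lets any Dnode reach any Dnode of the next layer.
    """
    nodes = [(s, c, n) for s in range(S) for c in range(C) for n in range(N)]
    links = []
    for s in range(S):
        for c in range(C):
            nxt = (c + 1) % C  # wrap-around from the last layer to the first
            for i in range(N):
                for j in range(N):
                    links.append(((s, c, i), (s, nxt, j)))
    return nodes, links


nodes, links = ring_topology(C=4, N=2)
print(len(nodes), len(links))                # prints 8 16
print(((0, 3, 0), (0, 0, 1)) in links)       # prints True: layer 4 feeds layer 1
print(len(ring_topology(C=4, N=2, S=4)[0]))  # prints 32
```

Layers are numbered from 0 in the code, so layer index 3 corresponds to "layer 4" in the figure.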

MorphoSys Project
The MorphoSys project at the University of California, Irvine.
Goal: design and build a processor with an accompanying reconfigurable circuit chip that is allowed to operate much slower than the processor. It is targeted at image processing applications. It consists of:
- a control processor with I-cache/D-cache,
- a reconfigurable array with an associated control memory,
- a data buffer (usually acting as a frame buffer),
- and a DMA controller.

MorphoSys Project [6]
The MorphoSys platform contains a Tiny RISC processor: a 4-stage pipelined, MIPS-like RISC machine with 16 32-bit registers, a 32-bit ALU/shift unit and on-chip data cache memory. The reconfigurable array consists of an 8x8 matrix of Reconfigurable Cells (RCs). Each RC comprises an ALU-multiplier, a shift unit, input multiplexers, and a register file with five 16-bit registers. The array is based on a coarse-grained architecture that allows dynamic reconfiguration.
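A toy model of the RC array can make the 16-bit datapath concrete. It is a deliberate simplification: the operation names are hypothetical, and broadcasting the same operation to all 64 cells is an assumption for brevity, not a description of how MorphoSys distributes its configuration. Results wrap to 16 bits to mirror the width of the RC register file.

```python
MASK16 = 0xFFFF


class ReconfigurableCell:
    """Toy RC: ALU-multiplier, shift unit and a register file of five 16-bit registers."""

    def __init__(self) -> None:
        self.regs = [0] * 5

    def execute(self, op: str, dst: int, a: int, b: int) -> None:
        x, y = self.regs[a], self.regs[b]
        if op == "add":
            result = x + y
        elif op == "mul":
            result = x * y
        elif op == "shl":
            result = x << (y & 0xF)  # shift amount limited to 0..15
        else:
            raise ValueError(f"unknown operation {op!r}")
        self.regs[dst] = result & MASK16  # wrap to the 16-bit register width


class RCArray:
    """8x8 matrix of reconfigurable cells."""

    def __init__(self, rows: int = 8, cols: int = 8) -> None:
        self.cells = [[ReconfigurableCell() for _ in range(cols)] for _ in range(rows)]

    def broadcast(self, op: str, dst: int, a: int, b: int) -> None:
        """Simplification: every RC executes the same configured operation."""
        for row in self.cells:
            for rc in row:
                rc.execute(op, dst, a, b)


array = RCArray()
for row in array.cells:
    for rc in row:
        rc.regs[0], rc.regs[1] = 300, 300
array.broadcast("mul", 2, 0, 1)
print(array.cells[7][7].regs[2])  # prints 24464 (90000 wrapped to 16 bits)
```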

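Operative Density
Definition: $OD = N_{PE} / A$, where $N_{PE}$ is the number of PEs and $A$ is the core area (relative unit²).
Characterizes:
- Fixed $N_{PE}$: the number of operators per relative area unit.
- Variable $N_{PE}$: OD as a function of $N_{PE}$, where
$$A(N_{PE}) = N_{PE} \cdot A_{PE} + A_{interconnect}(N_{PE}) + A_{memory}(N_{PE}) + A_{sequencer}(N_{PE})$$
If $OD(N_{PE}) = k$ (constant), then $A(N_{PE}) = N_{PE} / k$, i.e., the area grows linearly with $N_{PE}$ and the architectural model is scalable [6].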

Operative Density

| Name | Type | $N_{PE}$ | Area (Mλ²) | $OD(N_{PE})$ |
|---|---|---|---|---|
| ARDOISE | Fine Grain RA | 26 | 12300 | 0.2 |
| Systolic Ring (S=1, C=6, N=2) | Coarse Grain RA | 24 | 500 | 4.8 |
| Systolic Ring (S=1, C=16, N=4) | Coarse Grain RA | 128 | 7600 | 1.7 |
| DART | Coarse Grain RA | 24 | 300 | 8.0 |
| MorphoSys | Coarse Grain RA | 128 | 5500 | 2.3 |
| TMS320C62 | DSP VLIW | 8 | 12300 | 0.1 |
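The table values can be checked against the definition $OD = N_{PE} / A$. The reported figures are consistent with $100 \cdot N_{PE} / A$, i.e., OD expressed per 100 Mλ². That scale factor is inferred from the numbers in the table, not stated in the source, so treat it as an assumption.

```python
# (name, N_PE, area in M lambda^2, OD as reported in the table)
TABLE = [
    ("ARDOISE", 26, 12300, 0.2),
    ("Systolic Ring (S=1, C=6, N=2)", 24, 500, 4.8),
    ("Systolic Ring (S=1, C=16, N=4)", 128, 7600, 1.7),
    ("DART", 24, 300, 8.0),
    ("MorphoSys", 128, 5500, 2.3),
    ("TMS320C62", 8, 12300, 0.1),
]


def operative_density(n_pe: int, area: float, scale: float = 100) -> float:
    """OD = N_PE / A; scale=100 is inferred from the table, not from the source."""
    return scale * n_pe / area


for name, n_pe, area, reported in TABLE:
    print(f"{name:32s} OD = {operative_density(n_pe, area):.2f} (reported {reported})")
```

Rounded to one decimal, every computed value matches the reported one, which is why the coarse-grained arrays (DART, the small Systolic Ring) stand out with far more operators per unit area than the fine-grained array or the VLIW DSP.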

Reconfigurable Instruction Cell Array
A project at the University of Edinburgh. The architecture is a dynamically reconfigurable fabric with coarse-grained heterogeneous functional units (cells) connected to each other through a reconfigurable interconnect structure. The functional units support primitive operators that perform addition/subtraction, multiplication, logic, multiplexing, shift and register operations. Additional functional units (cells) are provided to handle control/branch operations. There are three major optimization techniques for increasing throughput: Loop Unrolling, Split Computation and Multi Sampling [1].
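Of the three techniques, loop unrolling is the easiest to illustrate generically. The sketch below is not the scheduling method of [1]: it uses a hypothetical step model in which the datapath holds `unroll` copies of the loop body, ignores the final reduction and any cell or interconnect limits, and does not cover Split Computation or Multi Sampling, whose details are not given here.

```python
def dot_unrolled(a, b, unroll):
    """Dot product whose loop body is replicated `unroll` times.

    Each step models one pass through a datapath holding `unroll` multiply
    cells that work in parallel on independent iterations; the per-copy
    partial sums are combined at the end. Returns (result, steps).
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    partial = [0] * unroll           # one accumulator per copy of the body
    steps = 0
    for base in range(0, len(a), unroll):
        for k in range(unroll):      # independent products: parallel in hardware
            i = base + k
            if i < len(a):
                partial[k] += a[i] * b[i]
        steps += 1
    return sum(partial), steps


x = list(range(1, 9))
y = [1] * 8
print(dot_unrolled(x, y, 1))  # prints (36, 8)
print(dot_unrolled(x, y, 4))  # prints (36, 2)
```

Unrolling trades cells for time: four copies of the body cut the step count from 8 to 2 while producing the same result.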

Reconfigurable Instruction Cell Array
The arrangement of the cells is important for efficiency, and evolutionary algorithms are used to find the best placement [3]. For example, 1-D and 2-D DCTs were developed, and three methods were used for their placement. The results show that 41 cells are needed in a 5×9 array [4].
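A minimal evolutionary placer illustrates the kind of search involved. It is a swapped-in, deliberately simple technique (mutation plus elitist truncation selection, with no crossover), not the genetic or hybrid engines of [3] and [4]. The chain netlist, the Manhattan wirelength cost and all parameter values are hypothetical; only the 41-cell, 5×9 size is taken from the slide.

```python
import random


def wirelength(placement, nets, cols):
    """Total Manhattan length of two-pin nets; placement[cell] is a slot index."""
    total = 0
    for u, v in nets:
        ru, cu = divmod(placement[u], cols)
        rv, cv = divmod(placement[v], cols)
        total += abs(ru - rv) + abs(cu - cv)
    return total


def evolve_placement(n_cells, nets, rows, cols, pop_size=30, generations=200, seed=0):
    """Mutation-based evolutionary search with elitist truncation selection."""
    rng = random.Random(seed)
    slots = rows * cols
    if n_cells > slots:
        raise ValueError("more cells than array slots")

    def cost(ind):
        return wirelength(ind, nets, cols)

    def mutate(ind):
        child = list(ind)
        i = rng.randrange(n_cells)
        free = sorted(set(range(slots)) - set(child))
        if free and rng.random() < 0.5:
            child[i] = rng.choice(free)              # move a cell to an empty slot
        else:
            j = rng.randrange(n_cells)
            child[i], child[j] = child[j], child[i]  # swap two cells
        return child

    # Each individual assigns a distinct slot to every cell.
    population = [rng.sample(range(slots), n_cells) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=cost)
        parents = population[: pop_size // 2]        # keep the best half unchanged
        children = [mutate(rng.choice(parents)) for _ in range(pop_size - len(parents))]
        population = parents + children
    best = min(population, key=cost)
    return best, cost(best)


# Hypothetical netlist: 41 cells connected in a chain, placed on a 5 x 9 array.
chain = [(k, k + 1) for k in range(40)]
best, length = evolve_placement(41, chain, rows=5, cols=9)
print(length)  # at least 40, since each of the 40 nets spans at least one hop
```

Because the best half of each generation survives unchanged, the best wirelength never gets worse from one generation to the next.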

References
[1] Y. Ying, I. Nousias, M. Milward, S. Khawam and T. Arslan, “System-level scheduling on Instruction Cell-Based Reconfigurable Systems,” Proc. Design, Automation and Test in Europe, Mar. 2006.
[2] J. Becker, A. Alsolaim, J. Starzyk and M. Glesner, “A parallel dynamically reconfigurable architecture designed for application-specific hardware/software systems in future mobile communication,” Journal of Supercomputing, Kluwer Academic Publishers, Oct. 2000.
[3] W. Fung, T. Arslan and S. Khawam, “Genetic Algorithm based Engine for Domain-Specific Reconfigurable Arrays,” Proc. 1st NASA/ESA Conference on Adaptive Hardware and Systems, pp. 200-206, Jun. 2006.
[4] W. Fung, T. Arslan and S. Khawam, “A hybrid engine for the placement of domain-specific reconfigurable arrays,” Second NASA/ESA Conference on Adaptive Hardware and Systems, 2007.
[5] L. Bisdounis, “Reconfigurable Systems,” presentation. Available online: http://support.inf.uth.gr/courses/CE654/07b%20Reconfigurable%20systems.pdf
[6] “Reconfigurable computation and communication architectures.” Available online: http://vada.skku.ac.kr/ClassInfo/system_level_design/sdr_slides/lec5-reconfigurable-architecture.ppt
