A Reconfigurable Processor Architecture and Software Development Environment for Embedded Systems Andrea Cappelli F. Campi, R.Guerrieri, A.Lodi, M.Toma,

1 A Reconfigurable Processor Architecture and Software Development Environment for Embedded Systems
Andrea Cappelli, F. Campi, R. Guerrieri, A. Lodi, M. Toma, A. La Rosa, L. Lavagno, C. Passerone, R. Canegallo
Nice, France, April 22, 2003

2 Outline
- Motivations
- XiRisc: a VLIW processor
- PiCoGA: a Pipelined Configurable Gate Array
- Software development environment
- Results and measurements
- Conclusions

3 Motivations
- Increased on-chip transistor density
- Increased integration costs
- Strong limitations in power supply and severe power consumption constraints
- Increased algorithmic complexity
- Quest for performance and flexibility
[Figures: millions of transistors per chip and technology node (nm), 1997 to 2009; algorithm complexity compared with Moore's law and battery capacity, 1997 to 2009]

4 Embedded Systems: Algorithm Analysis
- About 90% of the computational complexity is concentrated in small kernels that make up a small part of the overall code.
- Many algorithms exhibit significant instruction-level parallelism, so performance can be improved with multiple parallel data paths.
- Operand granularity typically differs from 32 bits, which makes a traditional 32-bit ALU power-inefficient.
- Significant improvements can be obtained by extending embedded processors with application-specific function units.
- Reconfigurable computing provides the maximum flexibility for such extensions.
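Why accelerating a few small kernels pays off can be quantified with Amdahl's law, which the slide does not name. The minimal C sketch below assumes that the 90% figure can be read as the fraction of execution time spent in the kernels (the slide speaks of computational complexity, so this is an interpretation); the function name is illustrative.

```c
/* Amdahl's law: overall speed-up when a fraction `kernel_fraction` of the
 * execution time is accelerated by `kernel_speedup` and the remaining code
 * runs unchanged. */
double amdahl_speedup(double kernel_fraction, double kernel_speedup)
{
    double untouched   = 1.0 - kernel_fraction;           /* code left as is */
    double accelerated = kernel_fraction / kernel_speedup; /* sped-up kernels */
    return 1.0 / (untouched + accelerated);
}
```

For example, with 90% of the time in kernels accelerated 10x, `amdahl_speedup(0.9, 10.0)` gives about 5.3x overall, and no kernel speed-up can push the total past 10x. This is why the fraction of time covered by the accelerated kernels matters as much as the per-kernel gain.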

5 Existing Architectures
A standard processor is coupled with embedded programmable logic, onto which application-specific functions are dynamically remapped depending on the algorithm being executed:
1: Coprocessor model
2: Function unit model

6 Extended Instruction Set RISC Architecture
- 32-bit load/store RISC architecture with a 5-stage pipeline
- VLIW elaboration: concurrent fetch and execution of two 32-bit instructions per cycle
- Set of specialized function units implementing DSP-specific operations
- Function unit approach: the reconfigurable device fits in a classical RISC pipeline, giving low communication overhead and exploiting very high resource parallelism

7 Architecture
- Duplicated instruction decode logic (two symmetrical data channels)
- Duplicated commonly used function units (ALU and shifter)
- All other function units are shared (DSP operations, memory handler)
- A tightly coupled pipelined configurable gate array (PiCoGA)
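The duplicated/shared split determines which instruction pairs can issue together. The sketch below is a hypothetical C model of that structural constraint only: the enum names are illustrative, data dependencies are ignored, and the PiCoGA is not modeled. Two instructions can share a cycle unless both need the same shared unit.

```c
#include <stdbool.h>

/* Hypothetical function-unit classes for a two-slot XiRisc-like bundle. */
typedef enum {
    FU_ALU,      /* duplicated: one copy per data channel */
    FU_SHIFTER,  /* duplicated: one copy per data channel */
    FU_DSP,      /* shared DSP unit */
    FU_MEM       /* shared memory handler */
} fu_class;

/* True if each data channel has its own copy of the unit. */
static bool fu_is_duplicated(fu_class fu)
{
    return fu == FU_ALU || fu == FU_SHIFTER;
}

/* Structural check only: the two slots conflict when they request the
 * same unit and that unit is shared. */
bool can_dual_issue(fu_class slot0, fu_class slot1)
{
    if (slot0 != slot1)
        return true;
    return fu_is_duplicated(slot0);
}
```

Under this model, two ALU operations issue together, while two memory accesses must be split across cycles, which is the trade-off the slide describes: duplicate the cheap, frequently used units and share the rest.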

8 Dynamic Instruction Set Extension
- pGA-load: a specific operation that transfers configuration data from a configuration cache to the PiCoGA. Fields: configuration specification, region specification.
- pGA-op: 32-bit and 64-bit operations that launch execution inside the PiCoGA, exchanging data through the register file.
  32-bit pGA-op fields: operation specification, Source 1, Source 2, Dest 1, Dest 2
  64-bit pGA-op fields: operation specification, Source 1, Source 2, Source 3, Source 4, Dest 1, Dest 2
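A software view of the two pGA-op formats may help: both name an operation and two destination registers, but the 64-bit form reads four source registers instead of two. The C sketch below is hypothetical throughout: the struct, field names and the 32-entry register file are assumptions, and it does not reproduce the real bit encoding, which the slide does not give.

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_REGS 32  /* assumed register-file size for this sketch */

/* Hypothetical decoded pGA-op (not the real instruction encoding). */
typedef struct {
    uint32_t op_spec;   /* operation specification */
    bool     is_64bit;  /* 64-bit format reads 4 sources, 32-bit reads 2 */
    uint8_t  src[4];    /* source register indices (src[0..1] in 32-bit form) */
    uint8_t  dst[2];    /* destination register indices: Dest 1, Dest 2 */
} pga_op;

/* Number of register-file sources read by this pGA-op. */
int pga_op_num_sources(const pga_op *op)
{
    return op->is_64bit ? 4 : 2;
}

/* Copy the source registers into the PiCoGA input words (up to 4 x 32 bit).
 * Returns the number of words written. */
int pga_op_gather_inputs(const pga_op *op, const uint32_t regfile[NUM_REGS],
                         uint32_t inputs[4])
{
    int n = pga_op_num_sources(op);
    for (int i = 0; i < n; i++)
        inputs[i] = regfile[op->src[i] % NUM_REGS];  /* wrap out-of-range indices */
    return n;
}
```

The point of the two formats shows up directly: a 64-bit pGA-op can feed the array with 128 bits of operands per issue, at the cost of a wider instruction.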

9 PiCoGA: a Pipelined Configurable Gate Array
- Two-dimensional array of LUT-based reconfigurable logic cells
- Each row can implement one stage of a customized pipeline, independent of and concurrent with the processor
- Up to 4x32-bit input data from, and up to 2x32-bit output data to, the register file
- Embedded function unit for dynamic extension of the instruction set
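To illustrate what "LUT-based" means, here is a generic 4-input look-up table in C. The slide does not give the size or internal structure of a PiCoGA cell, so the 4-input LUT is an assumption and not a description of the actual cell. The configuration is a 16-bit truth table, and evaluating the cell is just indexing into it.

```c
#include <stdint.h>

/* Generic 4-input LUT: bit i of `truth_table` is the output for the input
 * combination whose binary value (d c b a) equals i. Any Boolean function of
 * four inputs is "programmed" by choosing these 16 bits. */
unsigned lut4_eval(uint16_t truth_table,
                   unsigned a, unsigned b, unsigned c, unsigned d)
{
    unsigned index = (a & 1u) | ((b & 1u) << 1) | ((c & 1u) << 2) | ((d & 1u) << 3);
    return (truth_table >> index) & 1u;
}
```

For instance, `0x6666` configures the cell as `a XOR b` (ignoring `c` and `d`) and `0x8000` as a 4-input AND; reconfiguring the cell means loading a different 16-bit word.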

10 DFG-Based Elaboration
- Row elaboration is activated by an embedded control unit, which provides an execution enable signal for each pipeline stage
- PiCoGA operation latency depends on the operation performed
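One way to see why latency depends on the operation is to tie it to the depth of the operation's data-flow graph. The C sketch below uses a simplified model that is an assumption, not the PiCoGA mapper: every DFG node occupies exactly one row (pipeline stage), and independent nodes at the same depth share a row. Under that model, latency in rows equals the longest path through the DFG.

```c
#define MAX_PREDS 4
#define MAX_NODES 64

/* Hypothetical DFG node: which earlier nodes produce its operands. */
typedef struct {
    int num_preds;          /* operands coming from other nodes */
    int preds[MAX_PREDS];   /* indices of earlier nodes feeding this one */
} dfg_node;

/* Nodes must be in topological order (every pred index < own index).
 * Returns the longest path length in nodes, i.e. the number of rows/stages
 * under the one-node-per-row assumption. */
int dfg_latency_rows(const dfg_node *nodes, int n)
{
    int depth[MAX_NODES];
    int longest = 0;
    if (n > MAX_NODES)
        n = MAX_NODES;  /* sketch limit: nodes beyond MAX_NODES are ignored */
    for (int i = 0; i < n; i++) {
        int d = 0;
        for (int p = 0; p < nodes[i].num_preds && p < MAX_PREDS; p++) {
            int pd = depth[nodes[i].preds[p]];
            if (pd > d)
                d = pd;
        }
        depth[i] = d + 1;  /* one more row after the deepest operand */
        if (depth[i] > longest)
            longest = depth[i];
    }
    return longest;
}
```

A shift feeding an add is two rows deep, while two independent operations feeding a third are also two rows deep, because the independent pair executes in the same row.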

11 PiCoGA Configuration
Goal: to reduce cache misses due to PiCoGA configuration
- Multi-context programming: 4 configuration layers (planes) inside the array
- Dedicated configuration cache with a high-bandwidth (192-bit) bus to the PiCoGA
- Partial run-time reconfiguration: one region is configured while another is active
- Configuration is completely concurrent with processor elaboration
[Figure: configuration cache feeding the PiCoGA, with configuration Layers 1 to 4]
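The benefit of four configuration layers can be sketched as a tiny context cache: if the requested configuration is already resident in a layer, switching to it is a hit; otherwise it is loaded into a layer other than the active one. Everything in this C model is an assumption for illustration: the struct and function names, the use of integer configuration IDs, and the round-robin replacement policy; the slide does not describe how layers are selected or replaced.

```c
#include <stdbool.h>

#define NUM_LAYERS 4  /* four configuration layers, as on the slide */

/* Hypothetical model: each layer holds one configuration ID (-1 = empty). */
typedef struct {
    int layer_cfg[NUM_LAYERS];
    int active;       /* layer currently driving the array (-1 = none) */
    int next_victim;  /* round-robin replacement pointer (assumption) */
} pga_contexts;

void pga_contexts_init(pga_contexts *c)
{
    for (int i = 0; i < NUM_LAYERS; i++)
        c->layer_cfg[i] = -1;
    c->active = -1;
    c->next_victim = 0;
}

/* Make configuration `cfg` active. Returns true on a hit (already resident,
 * only a layer switch is needed) and false on a miss (cfg is loaded from the
 * configuration cache into a layer that is not currently active). */
bool pga_select(pga_contexts *c, int cfg)
{
    for (int i = 0; i < NUM_LAYERS; i++) {
        if (c->layer_cfg[i] == cfg) {
            c->active = i;
            return true;
        }
    }
    int victim = c->next_victim;
    if (victim == c->active)  /* never overwrite the running configuration */
        victim = (victim + 1) % NUM_LAYERS;
    c->next_victim = (victim + 1) % NUM_LAYERS;
    c->layer_cfg[victim] = cfg;
    c->active = victim;
    return false;
}
```

With up to four kernels alternating, only the first use of each one misses; later switches between them are hits, which is the effect multi-context programming targets.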

12 The Software Development Environment
Flow shown in the diagram: initial C code -> profiling -> computation kernel extraction -> PiCoGA mapping (pGA-ops with their latency information) -> assembler-level scheduler -> executable code
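The profiling and kernel-extraction steps can be illustrated with one plausible heuristic: sort the profiled functions or loops by cycles and take the hottest ones until they cover a chosen share of the total. This C sketch is not the toolchain's actual algorithm; the record type, the function name and the coverage threshold are all hypothetical.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical profile record: cycles spent in one function or loop. */
typedef struct {
    const char   *name;
    unsigned long cycles;
} profile_entry;

/* qsort comparator: hottest entries first. */
static int by_cycles_desc(const void *a, const void *b)
{
    const profile_entry *x = a, *y = b;
    if (x->cycles < y->cycles) return 1;
    if (x->cycles > y->cycles) return -1;
    return 0;
}

/* Sorts `entries` hottest-first and returns how many of them are needed to
 * cover at least `coverage` (e.g. 0.9) of the total profiled cycles. */
size_t select_kernels(profile_entry *entries, size_t n, double coverage)
{
    unsigned long total = 0, covered = 0;
    for (size_t i = 0; i < n; i++)
        total += entries[i].cycles;
    qsort(entries, n, sizeof entries[0], by_cycles_desc);
    for (size_t k = 0; k < n; k++) {
        if ((double)covered >= coverage * (double)total)
            return k;
        covered += entries[k].cycles;
    }
    return n;
}
```

If one loop accounts for 900 of 1000 profiled cycles, it alone is selected for a 0.85 coverage target, matching the observation that a few small kernels dominate the computation.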

13 Software Simulation
Goals: check the correctness of the algorithm and evaluate performance.
In the source code, a pGA-op is described using a pragma directive:

#pragma pGA shift_add 0x12 5 c a b
c = ( a << 2 ) + b
#pragma end

/**************************************/
/* Shift_add mapped on PiCoGA         */
/**************************************/
#if defined(PiCoGA)
...
asm("pGA-op 0x12...");
...
#else
/**************************************/
/* Emulation function _shift_add      */
/**************************************/
void _shift_add() {
    ...
    c = ( a << 2 ) + b;
    ...
}
#endif
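A host-compilable sketch of the emulation side of the shift_add example follows. The name `shift_add` and its by-value `uint32_t` interface are illustrative assumptions: the slide's `_shift_add()` takes no arguments, its body is elided, and identifiers beginning with an underscore are reserved at file scope in C. Unsigned 32-bit arithmetic mirrors a 32-bit register and keeps overflow well defined. The PiCoGA branch stops the build with `#error`, because the real inline assembly is elided on the slide and is not reconstructed here.

```c
#include <stdint.h>

#if defined(PiCoGA)
/* Target build: the toolchain would emit the pGA-op here (not reproduced). */
#error "pGA-op inline assembly is target-specific; build without -DPiCoGA to emulate"
#else
/* Emulation build: same semantics as the pragma body c = (a << 2) + b,
 * computed modulo 2^32 like a 32-bit register. */
uint32_t shift_add(uint32_t a, uint32_t b)
{
    return (a << 2) + b;
}
#endif
```

Keeping both variants behind one preprocessor switch lets the same source be checked for correctness on the host and then built for the target, which is the workflow the slide describes.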

14 Software Simulation
Two special instructions are defined to support emulation:

...
topga
jal _shift_add
fmpga
...

- topga saves the current state and passes the arguments to the emulation function; the clock cycle count is halted while the function runs.
- fmpga copies the emulation function's result(s) and restores the registers; the cycle count is incremented by the latency of the pGA-op.
Overall performance is evaluated by counting elaboration cycles.
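The cycle-accounting rule can be modeled in a few lines of C: cycles are not charged while the emulation function runs, and `fmpga` charges the pGA-op's declared latency instead. This is a hypothetical simulator fragment; the state struct and function names are assumptions, and register saving/restoring and result copying are omitted.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical simulator state: only what the cycle accounting needs. */
typedef struct {
    uint64_t cycles;    /* elaboration cycles charged to the program */
    bool     counting;  /* false while an emulation function runs */
} sim_state;

/* Called once per simulated instruction cycle. */
void sim_tick(sim_state *s)
{
    if (s->counting)
        s->cycles++;
}

/* topga: stop charging cycles before jumping to the emulation function. */
void sim_topga(sim_state *s)
{
    s->counting = false;
}

/* fmpga: resume counting and charge the pGA-op's latency rather than the
 * cycles the emulation function actually took. */
void sim_fmpga(sim_state *s, unsigned pga_latency)
{
    s->counting = true;
    s->cycles += pga_latency;
}
```

For example, two ordinary cycles, an emulation call that takes 100 simulated cycles, and a pGA-op latency of 5 give a total of 7 charged cycles, so the reported count reflects the hardware, not the emulation.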

15 Results and Measurements
Speed-ups for several signal processing cores:
  DES                 13.5x
  CRC                  4.3x
  Median Filter        7.7x
  Motion Estimation   12.4x
  Motion Prediction    4.5x
  Turbo Codes         12x
[Figure: normalized energy histogram]
- 75% of the energy consumption of a VLIW architecture is due to accesses to instruction and data memory
- Strong reduction of accesses to instruction memory
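For reuse in scripts or plots, the reported speed-ups can be kept as a small C table, with helpers that return the range quoted in the conclusions. The pairing of cores and values follows the order in which the slide lists them (an assumption, since the original layout was lost); the type and function names are illustrative.

```c
#include <stddef.h>

typedef struct {
    const char *core;
    double      speedup;
} reported_speedup;

/* Speed-ups as listed on the slide. */
static const reported_speedup reported[] = {
    { "DES",               13.5 },
    { "CRC",                4.3 },
    { "Median Filter",      7.7 },
    { "Motion Estimation", 12.4 },
    { "Motion Prediction",  4.5 },
    { "Turbo Codes",       12.0 },
};

#define NUM_REPORTED (sizeof reported / sizeof reported[0])

/* Smallest reported speed-up. */
double min_speedup(void)
{
    double m = reported[0].speedup;
    for (size_t i = 1; i < NUM_REPORTED; i++)
        if (reported[i].speedup < m)
            m = reported[i].speedup;
    return m;
}

/* Largest reported speed-up. */
double max_speedup(void)
{
    double m = reported[0].speedup;
    for (size_t i = 1; i < NUM_REPORTED; i++)
        if (reported[i].speedup > m)
            m = reported[i].speedup;
    return m;
}
```

The two helpers reproduce the 4.3x to 13.5x range stated in the conclusions.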

16 Conclusions
- XiRisc: a VLIW RISC architecture enhanced by a run-time reconfigurable function unit
- PiCoGA: a pipelined, run-time configurable, row-oriented array of LUT-based cells
- A specific software development toolchain
- Speed-ups range from 4.3x to 13.5x
- Up to 93% reduction in energy consumption

