
1 A RISC ARCHITECTURE EXTENDED BY AN EFFICIENT TIGHTLY COUPLED RECONFIGURABLE UNIT
Nikolaos Vassiliadis, N. Kavvadias, G. Theodoridis, S. Nikolaidis
Section of Electronics and Computers, Department of Physics, Aristotle University of Thessaloniki, 54124 Thessaloniki, Greece
nivas@skiathos.physics.auth.gr
Algarve, Portugal, February 22-23, 2005

2 Outline
- Motivations
- Proposed Architecture
- Software Development Environment
- Demonstration
- Results
- Conclusions

3 Motivations
Quest for Performance and Flexibility
- A large portion of the computational complexity is concentrated in small kernels covering small parts of the overall code => Performance improved by accelerating these kernels
- Many algorithms show relevant Instruction Level Parallelism (ILP) => Performance improved by parallel execution
- Traditional processors have computation clock slack => Performance improved by chaining of operations (Spatial Computation)
Extending Embedded Processors with Application-Specific Function Units
Reconfigurable Instruction Set Processors for Performance with Maximum Flexibility

4 Proposed Architecture
Reconfigurable Instruction Set Processor (RISP)
Core Processor
- 32-bit load/store RISC architecture
- 5 pipeline stages
- Single-issue elaboration
Reconfigurable Logic Coupling
- Reconfigurable Function Unit (RFU) approach => Low communication overhead
- Tightly coupled => RFU fits in two RISC pipeline stages => Better utilization of the pipeline stages
RFU
- 1-D array of coarse-grain Processing Elements (PEs)
- PE functionality configurable at design time to meet application requirements
- Exploits Instruction Level Parallelism: spatial & temporal computation

5 Proposed Architecture
Core Processor
- Commonly used function units
- Control logic properly extended to handle reconfigurable instructions
- 4-read-1-write register file
Core / RFU Interface
- Receives & delivers control and data signals
Tightly Coupled RFU
- Configuration, Processing, and Interconnection layers
- Operates & delivers results in two concurrent pipeline stages

6 Standard and Reconfigurable Instructions
32-Bit Instruction Word Format
Re=‘0’ => Standard Instruction
- Control Logic: configure core datapath
- Operands: Source1-2 & Destination
- ReOpCode = “nop”
Re=‘1’ => Reconfigurable Instruction
- Control Logic: configure interface
- Operands: Source1-4 & Destination
- ReOpCode = “OpCode”
Three Types of Reconfigurable Instructions
- Complex computational operations
- Complex addressing modes
- Complex control flow operations
Each instruction can be multicycle
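The Re / ReOpCode split above can be sketched as a small decoder. The slide gives only the 32-bit word size, the Re flag, the ReOpCode field, and the operand counts; the bit positions, the 5-bit register fields, the 6-bit ReOpCode width, and the encoding of "nop" as 0 are all assumptions made for illustration, not the authors' actual format.

```python
from dataclasses import dataclass
from typing import Tuple

# Hypothetical layout (NOT from the slides):
#   bit 31      Re flag
#   bits 30-25  ReOpCode (assumed 6 bits)
#   bits 24-20  destination register (assumed 5 bits)
#   bits 19-0   four 5-bit source register fields
REOP_NOP = 0  # assumed encoding of the "nop" ReOpCode
SOURCE_SHIFTS = (15, 10, 5, 0)


@dataclass(frozen=True)
class Decoded:
    reconfigurable: bool
    reopcode: int
    dest: int
    sources: Tuple[int, ...]


def encode(reconfigurable: bool, reopcode: int, dest: int, sources: Tuple[int, ...]) -> int:
    """Pack the fields into a 32-bit word using the assumed layout."""
    padded = tuple(sources) + (0,) * (4 - len(sources))
    word = (int(reconfigurable) << 31) | ((reopcode & 0x3F) << 25) | ((dest & 0x1F) << 20)
    for shift, reg in zip(SOURCE_SHIFTS, padded):
        word |= (reg & 0x1F) << shift
    return word


def decode(word: int) -> Decoded:
    """Split a 32-bit word into standard or reconfigurable instruction fields."""
    if not 0 <= word < 2**32:
        raise ValueError("instruction word must fit in 32 bits")
    reconfigurable = bool((word >> 31) & 1)
    dest = (word >> 20) & 0x1F
    sources = tuple((word >> shift) & 0x1F for shift in SOURCE_SHIFTS)
    if reconfigurable:
        # Re='1': four sources plus destination, ReOpCode selects the RFU operation.
        return Decoded(True, (word >> 25) & 0x3F, dest, sources)
    # Re='0': only Source1-2 are meaningful and the ReOpCode is treated as "nop".
    return Decoded(False, REOP_NOP, dest, sources[:2])
```

In this sketch a standard instruction simply ignores the two extra source fields, which mirrors the slide's point that the same word format serves both instruction classes.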

7 Reconfigurable Function Unit (RFU)
Embedded RFU for Dynamic Extension of the Instruction Set
Executes Multiple-Input-Single-Output (MISO) Reconfigurable Instructions
1-D Array of Coarse-Grain Reconfigurable Blocks
Comprised of Three Layers
- Processing Layer
- Interconnection Layer
- Configuration Layer

8 RFU-Processing Layer
PE Basic Structure
- Configurable PE functionality for the targeted application
- Unregistered output => Spatial computation
- Registered output => Temporal computation
- Floating PEs => Can operate in both core pipeline stages on demand
- Local memory for read-only values
- Executes long chains of operations in one processor cycle
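The difference between spatial and temporal computation above can be illustrated with a toy model of a PE chain. This is only a sketch under stated assumptions: the `PE` class, the `run_chain` helper, and the rule "a registered output ends the current cycle" are hypothetical simplifications, and the model does not enforce the two-pipeline-stage limit of the real RFU.

```python
import operator
from dataclasses import dataclass
from typing import Callable, List, Tuple

MASK32 = 0xFFFFFFFF  # 32-bit granularity, as stated for the evaluated model


@dataclass
class PE:
    """Hypothetical processing element: one binary operation per element."""
    op: Callable[[int, int], int]
    operand: int              # second input, e.g. a register value or local-memory constant
    registered: bool = False  # True => output is latched (temporal computation)


def run_chain(value: int, pes: List[PE]) -> Tuple[int, int]:
    """Feed `value` through the PEs in order; return (result, cycles used)."""
    cycles = 1
    for index, pe in enumerate(pes):
        value = pe.op(value, pe.operand) & MASK32
        is_last = index == len(pes) - 1
        # An unregistered output is consumed by the next PE in the same cycle
        # (spatial chaining); a registered output delays the consumer by one cycle.
        if pe.registered and not is_last:
            cycles += 1
    return value, cycles


# (x + 3) << 2 chained spatially, then with a register between the two PEs.
spatial = run_chain(5, [PE(operator.add, 3), PE(operator.lshift, 2)])
temporal = run_chain(5, [PE(operator.add, 3, registered=True), PE(operator.lshift, 2)])
print(spatial, temporal)  # (32, 1) (32, 2)
```

Both configurations compute the same value; only the number of cycles differs, which is the trade-off the slide attributes to unregistered versus registered PE outputs.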

9 RFU-Interconnection Layer
- 1-D array of PEs
- Operands from the register file
- Constant values from local memory
- Input network, operand select
- Output network => Delivers results to the corresponding pipeline stages

10 RFU-Configuration Layer
- Configuration bits local storage structure
- Multi-context configuration layer
- Coarse grain => Small number of configuration bits => Negligible overhead to download new contexts

11 Architecture Synthesis & Evaluation
A hardware model (VHDL) was designed for evaluation purposes.

Configuration                          Value
Granularity                            32 bits
Number of Processing Elements          8
Processing Elements Functionality      ALU, Shifter, Multiplier
Configuration Contexts                 16 words of 134 bits
Local Memory Size                      8 constants of 32 bits
Number of Provided Local Operands      4

Component                              Area (mm^2)
Processor Core                         0.134
RFU Processing Layer                   0.186
RFU Interconnection Layer              0.125
RFU Configuration Layer                0.137
RFU Total                              0.448

- The model was synthesized with an STM 0.13um process
- The RFU area overhead is 3.3x the area of the core processor
- No caches were taken into account
- No overhead to the core critical path
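The figures above can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers from the two tables; the variable names are illustrative, and the storage totals are derived values, not figures reported on the slide.

```python
# Area figures from the synthesis table (mm^2, STM 0.13um process).
core_area = 0.134
rfu_layers = {
    "processing": 0.186,
    "interconnection": 0.125,
    "configuration": 0.137,
}

# The three layers should add up to the reported RFU total of 0.448 mm^2.
rfu_total = round(sum(rfu_layers.values()), 3)

# Ratio of RFU area to core area, reported on the slide as 3.3x.
area_ratio = rfu_total / core_area

# Derived storage sizes (not stated on the slide).
config_bits = 16 * 134      # 16 contexts of 134 bits each
local_memory_bits = 8 * 32  # 8 constants of 32 bits each

print(f"RFU total area: {rfu_total:.3f} mm^2")
print(f"RFU / core area: {area_ratio:.2f}x")
print(f"Configuration storage: {config_bits} bits")
print(f"Local memory: {local_memory_bits} bits")
```

The layer areas sum to the stated RFU total, and the ratio of about 3.34 agrees with the rounded 3.3x on the slide.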

12 Software Development Environment

13 Demonstration-RFU Elaboration
Largest MaxMISO for a Quantization Kernel
- Execution on the Core => six cycles
- Execution on the Core+RFU => one cycle
- Performance improvements
- Reduced instruction memory accesses

14 Results
Speed-Ups for Several Kernels, Core vs. Core+RFU

Kernel      CRC     FIR     FFT     QUANT   VLC
Speed-up    1.6x    1.8x    2.8x    1.9x    1.7x

Energy consumption dominated by memory accesses
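The speed-ups above can be restated as the fraction of execution cycles saved. This is a sketch under one assumption: that each speed-up is a cycle-count ratio at an unchanged clock (the slides state only that the RFU adds no overhead to the core critical path). The percentages are derived here and do not appear in the presentation.

```python
# Speed-ups reported on the results slide (Core vs. Core+RFU).
speedups = {"CRC": 1.6, "FIR": 1.8, "FFT": 2.8, "QUANT": 1.9, "VLC": 1.7}


def cycle_reduction(speedup: float) -> float:
    """Fraction of cycles saved for a given speed-up: 1 - 1/speedup."""
    return 1.0 - 1.0 / speedup


for kernel, speedup in speedups.items():
    print(f"{kernel:5s} {speedup:.1f}x  {cycle_reduction(speedup):.1%} fewer cycles")

# Kernel that benefits most from the RFU according to the reported figures.
best = max(speedups, key=speedups.get)
print("largest speed-up:", best)
```

Under that assumption, a 1.6x speed-up corresponds to 37.5% fewer cycles, and the FFT kernel shows the largest gain.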

15 Conclusions
- A RISC processor enhanced by a run-time Reconfigurable Function Unit
- 1-D reconfigurable array of coarse-grain Processing Elements
- Multiple-Input-Single-Output reconfigurable instructions
- Specific software development environment
- Low-cost performance and energy consumption improvements
- Next step => Expand to VLIW elaboration to boost achieved speed-ups

