
1 An Integrated Temporal Partitioning and Mapping Framework for Handling Custom Instructions on a Reconfigurable Functional Unit
Farhad Mehdipour †, Hamid Noori ††, Morteza Saheb Zamani †, Kazuaki Murakami ††, Mehdi Sedighi †, Koji Inoue ††
† Computer and IT Engineering Department, Amirkabir University of Technology, {mehdipur,szamani,msedighi}@aut.ac.ir
†† Department of Informatics, Graduate School of Information Science and Electrical Engineering, Kyushu University, noori@c.csce.kyushu-u.ac.jp, {murakami,inoue}@i.kyushu-u.ac.jp

2 ACSAC 2006 - Shanghai, China Kyushu University
Agenda
- Introduction
- General overview of the architecture
- Generating Custom Instructions
- Reconfigurable Functional Unit (RFU)
- Tool Chain used for our quantitative approach
- Integrated Temporal Partitioning and Mapping
  - The Integrated Framework
  - Incremental Temporal Partitioning Algorithm
  - Mapping Procedure
- Experimental Results

3 Introduction
Approaches for designing embedded SoCs
- Application Specific Integrated Circuits (ASICs)
  - Higher performance
  - Lower power consumption
  - Not flexible
  - Expensive and time-consuming design process
- General Purpose Processors (GPPs)
  - Availability of tools
  - Programmability
  - Low performance
  - High power consumption
- Application Specific Instruction-set Processors (ASIPs)
  - More flexible than ASICs
  - Higher performance than GPPs
  - Long and costly design and verification
- Extensible Processors
  - More flexibility
  - Significant non-recurring engineering costs

4 General overview of the architecture
Adaptive Dynamic Extensible Processor
- Base Processor: N-way in-order general RISC (Reg File; Fetch, Decode, Execute, Memory, Write)
- Augmented Hardware
  - Profiler: detects start addresses of Hot Basic Blocks (HBBs)
  - RFU: executes Custom Instructions
  - Sequencer: switches between main processor and RFU

5 Operation modes
Training Mode
- Profiler: binary-level profiling, detecting start addresses of HBBs
- Running tools for generating custom instructions, generating configuration data for the RFU and initializing the sequencer table
- Binary rewriting of the applications
Normal Mode
- Sequencer: monitors the PC and switches between main processor and RFU
- RFU: executes CIs

6 Integrating base processor with other components

7 Generation of Custom Instructions
Custom instructions
- Limited to one Hot Basic Block (HBB)
- Exclude floating point, multiply, divide and load instructions
- Include at most one STORE, at most one BRANCH/JUMP and all other fixed point instructions
Simple algorithm for generating custom instructions
- HBBs usually include 10~40 instructions for MiBench
- The custom instruction generator is going to be executed on the base processor (in online training mode)

8 Generating Custom Instructions
  4052c0  addiu $29,$29,-32
  4052c8  mov.d $f0,$f12
  4052d0  sw    $18,24($29)
  4052d8  addu  $18,$0,$6
  4052e0  sw    $31,28($29)
  4052e8  sw    $16,16($29)
  4052f0  mfc1  $16,$f0
  4052f8  mfc1  $17,$f1
  405300  srl   $6,$17,0x14
  405308  andi  $6,$6,2047
  405310  sltiu $2,$6,2047
  405318  addu  $6,$6,$18
  405320  sltiu $2,$6,2047
  405328  lui   $2,32783
  405330  and   $17,$17,$2
  405338  andi  $2,$6,2047
  405340  sll   $2,$2,0x14
  405348  or    $17,$17,$2
  405350  mtc1  $16,$f0
  405358  mtc1  $17,$f1
  405360  lw    $31,28($29)
  405370  lw    $16,16($29)
  405378  addiu $29,$29,32
  405380  jr    $31
- Find the biggest sequence of instructions in the HBB that can be executed on the ACC.
- Move instructions and append supportable instructions to the head of the detected instruction sequence, after checking flow-dependency and anti-dependency.
- Move instructions and append supportable instructions to the tail of the detected instruction sequence, after checking flow-dependency and anti-dependency.
- Rewrite the object code if instructions have been moved.
Moving instructions must not modify the logic of the application.
Custom instruction generation is done without considering any other constraints.
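The first step of this slide (finding the biggest supportable sequence in an HBB) can be sketched as follows. This is only an illustration under stated assumptions: the opcode sets `UNSUPPORTED`, `STORES` and `BRANCHES` and the function `longest_ci_candidate` are hypothetical names, the classification of `mfc1`/`mtc1`/`mov.d` as unsupported is an assumption, and the instruction-moving steps with their dependency checks are not modelled.

```python
# Opcode classes are illustrative assumptions, not the authors' exact lists.
UNSUPPORTED = {"lw", "mult", "div", "mov.d", "mfc1", "mtc1"}  # loads, multiply/divide, FP
STORES = {"sw"}
BRANCHES = {"jr", "j", "beq", "bne"}


def longest_ci_candidate(opcodes):
    """Return (start, end), end exclusive, of the longest contiguous run that has
    no unsupported opcode, at most one store and at most one branch/jump."""
    best = (0, 0)
    for start in range(len(opcodes)):
        stores = branches = 0
        for end in range(start, len(opcodes)):
            op = opcodes[end]
            stores += op in STORES
            branches += op in BRANCHES
            if op in UNSUPPORTED or stores > 1 or branches > 1:
                break  # the run starting at `start` cannot grow any further
            if end + 1 - start > best[1] - best[0]:
                best = (start, end + 1)
    return best


# Opcodes of the HBB listed on the slide, in order.
hbb = ["addiu", "mov.d", "sw", "addu", "sw", "sw", "mfc1", "mfc1", "srl", "andi",
       "sltiu", "addu", "sltiu", "lui", "and", "andi", "sll", "or", "mtc1", "mtc1",
       "lw", "lw", "addiu", "jr"]
start, end = longest_ci_candidate(hbb)
print(hbb[start:end])
```

Under these assumed opcode classes, the selected run is the ten-instruction sequence from `srl` to `or`.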

9 Reconfigurable Functional Unit (RFU)
- RFU is a matrix of Functional Units (FUs)
- RFU has configuration memory
- FUs support only logical operations, add/subtract, shifts and compare
- RFU updates the PC after executing each CI
- RFU has a variable delay, which depends on the depth of the DFG of the Custom Instruction

10 RFU Architecture: A Quantitative Approach
- 22 programs of MiBench were chosen
- The SimpleScalar toolset was utilized for simulation
- RFU is a matrix of FUs
  - No. of inputs
  - No. of outputs
  - No. of FUs
  - Width
  - Depth
  - Connections
  - Location of inputs and outputs
- Coverage (mapping) rate: percentage of generated CIs that can be mapped on the RFU considering the constraints
- Frequency and weight are considered in the measurement
  - CI execution frequency
  - Weight (equal to the number of executed instructions)
  - Average over all CIs: Σ(Freq × Weight)
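One possible reading of the frequency-and-weight measurement is a coverage rate in which each CI counts in proportion to Freq × Weight. The sketch below follows that reading; the function name `coverage_rate`, the tuple layout and the sample values are hypothetical, and the slides do not state the exact formula.

```python
def coverage_rate(cis):
    """cis: list of (frequency, weight, mappable) tuples, one per generated CI.
    Returns the percentage of Freq*Weight that belongs to mappable CIs."""
    total = sum(freq * weight for freq, weight, _ in cis)
    mapped = sum(freq * weight for freq, weight, ok in cis if ok)
    return 100.0 * mapped / total if total else 0.0


# Hypothetical CIs: (execution frequency, weight, mappable on the RFU?)
sample = [(1000, 8, True), (200, 12, False), (50, 5, True)]
print(round(coverage_rate(sample), 2))
```

With this weighting, a rarely executed CI that cannot be mapped lowers the rate far less than a hot one would.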

11 Tool Chain

12 RFU Inputs (no constraint) (chart; surviving data labels: 96.37, 89.37, 98.48, 8)

13 RFU Outputs (no constraint) (chart; surviving data labels: 6, 96.58)

14 RFU Architecture
Distributing inputs in different rows
- Row 1 = 7
- Row 2 = 2
- Row 3 = 2
- Row 4 = 2
- Row 5 = 1
Connections with variable length
- row1 → row3 = 1
- row1 → row4 = 1
- row1 → row5 = 1
- row2 → row4 = 1
Synthesis results using Hitachi 0.18 μm
- Area: 1.1534 mm²
- Delay: 9.66 ns

15 Generating Custom Instruction for the Target RFU
In our primary CI generator we did not consider any constraints for the generated CIs and tried to generate CIs as large as possible. Therefore, once the architecture was fixed, some of the generated CIs could not be mapped on the proposed RFU because of its constraints.

16 Customizing CI generator for the Target RFU: First Approach (CIGen)
- Some primary constraints of the RFU (number of inputs, number of outputs and number of nodes) were added to our CI generator tool to generate CIs that are mappable.
- In this approach the CI generator is unaware of the results of the mapping process.
- Some CIs may still not be mapped to the RFU in the end because of the routing and connection constraints.
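The primary-constraint filter described for CIGen can be sketched as a simple bounds check. Everything named here (`RfuLimits`, `passes_primary_constraints`, and the numeric limits) is a hypothetical placeholder rather than the authors' tool or the RFU's actual parameters.

```python
from dataclasses import dataclass


@dataclass
class RfuLimits:
    max_inputs: int
    max_outputs: int
    max_nodes: int


def passes_primary_constraints(n_inputs, n_outputs, n_nodes, limits):
    """True if a candidate CI respects the input, output and node-count limits.
    Routing and connection constraints are deliberately not checked here."""
    return (n_inputs <= limits.max_inputs
            and n_outputs <= limits.max_outputs
            and n_nodes <= limits.max_nodes)


# Placeholder limits chosen only for illustration.
limits = RfuLimits(max_inputs=8, max_outputs=6, max_nodes=16)
print(passes_primary_constraints(5, 2, 12, limits))
print(passes_primary_constraints(9, 2, 12, limits))
```

A CI that passes this filter can still fail during mapping, which is the gap the second approach addresses.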

17 Customizing CI generator for the Target RFU: Second Approach
Integrated Framework
- Performs an integrated temporal partitioning and mapping process
- Takes rejected CIs as input
- Partitions them into appropriate mappable CIs
Advantages
- All generated CIs are mappable
- Uses a mapping-aware temporal partitioning process

18 Integrated Framework: Incremental Temporal Partitioning Algorithm
Incremental Temporal Partitioning
- The node with the highest ASAP level is selected and moved to the subsequent partition.
- Node selection and moving order: 15, 13, 11, 9, 14, 12, 10, 8, 3 and 7.
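The incremental step on this slide can be illustrated with a small sketch: compute ASAP levels for the DFG, then keep moving the highest-level node of the current partition to the next one until the current partition is accepted. The names `asap_levels` and `incremental_partition`, the tie-break by node id, the sample DFG and the `is_mappable` stand-in (a plain size limit instead of the real mapping procedure) are all assumptions, so the resulting order need not match the one shown for the slide's example.

```python
def asap_levels(preds):
    """preds maps each node to the list of nodes it depends on.
    Nodes without predecessors get level 1."""
    levels = {}

    def level(n):
        if n not in levels:
            levels[n] = 1 + max((level(p) for p in preds[n]), default=0)
        return levels[n]

    for n in preds:
        level(n)
    return levels


def incremental_partition(preds, is_mappable):
    """Split a DFG into an ordered list of node sets, each accepted by is_mappable.
    A single node that is still rejected is kept alone so the loop terminates."""
    levels = asap_levels(preds)
    partitions = []
    current = set(preds)
    while current:
        moved = set()
        while len(current) > 1 and not is_mappable(current):
            # Highest ASAP level first; ties broken by node id (assumption).
            node = max(current, key=lambda n: (levels[n], n))
            current.remove(node)
            moved.add(node)
        partitions.append(current)
        current = moved
    return partitions


# Hypothetical six-node DFG and a stand-in mappability test.
dfg = {1: [], 2: [], 3: [1, 2], 4: [3], 5: [3], 6: [4, 5]}
parts = incremental_partition(dfg, lambda part: len(part) <= 3)
print(parts)
```

Because the moved node always has the highest ASAP level in its partition, none of its consumers stay behind, so each partition only depends on earlier ones.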

19 Mapping Custom Instructions
- Mapping is the same as the well-known placement problem: determining the appropriate positions for DFG nodes on the RFU.
- Assigning CI instructions to FUs is done based on the priority of the nodes.
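A priority-driven placement of DFG nodes on a row-organized FU matrix might look like the sketch below. The slides do not define the priority function, so ASAP level followed by node id is used here as an assumption; `greedy_place` and the row capacities are hypothetical, and routing and connection constraints are ignored entirely.

```python
def greedy_place(preds, row_capacity):
    """Place each node in the first row below all of its predecessors that still
    has a free FU. Returns {node: (row, column)}, or None if some node cannot
    be placed. Routing resources are not modelled."""
    level = {}

    def asap(n):
        if n not in level:
            level[n] = 1 + max((asap(p) for p in preds[n]), default=0)
        return level[n]

    # Assumed priority: lower ASAP level first, then lower node id.
    order = sorted(preds, key=lambda n: (asap(n), n))
    used = [0] * len(row_capacity)
    position = {}
    for n in order:
        first_row = 1 + max((position[p][0] for p in preds[n]), default=-1)
        row = next((r for r in range(first_row, len(row_capacity))
                    if used[r] < row_capacity[r]), None)
        if row is None:
            return None  # no free FU in any admissible row
        position[n] = (row, used[row])
        used[row] += 1
    return position


# Hypothetical four-node CI and hypothetical row capacities.
ci = {1: [], 2: [], 3: [1, 2], 4: [3]}
print(greedy_place(ci, [2, 1, 1]))
print(greedy_place(ci, [1, 1, 1]))
```

In the second call the narrower first row pushes node 2 down a row, which lengthens the chain until node 4 no longer fits; a rejection of this kind is what would send a CI back to temporal partitioning.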

20 An Example: Mapping of a CI on the RFU

21 Customizing Mapping Tool
Spiral-shaped mapping is possible thanks to the horizontal connections in the third and fourth rows of the RFU.

22 CI length for MiBench applications

23 Percentage of rejected CIs for CIGen

24 Initial and final number of partitions

25 Maximum critical path length for CIs

26 Performance Evaluation
Issue              1-way
L1-I cache         32K, 2-way, 1 cycle latency
L1-D cache         32K, 4-way, 1 cycle latency
Unified L2         1M, 6 cycle latency
Execution units    1 integer, 1 floating point
RUU size           64
Fetch queue size   64
- SimpleScalar was configured to behave as a MIPS32 4K processor.
- The base processor supports the MIPS instruction set.
- 22 applications of MiBench

27 Delay of RFU according to CI length
CI Length   RFU Delay (ns)
1           1.38
2           2.28
3           3.12
4           4.89
5           6.47
6           7.57
7           8.65
8           9.66
Synopsys Tools + Hitachi 0.18 μm
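Since the RFU delay varies with CI length, a simulator could turn the delay table above into a cycle count per CI. The sketch below assumes a hypothetical processor clock period; `RFU_DELAY_NS` copies the delays from the slide, while `rfu_cycles` and the 2.0 ns period are illustrative and not taken from the presentation.

```python
import math

# RFU delay in ns, indexed by CI length (values from the slide).
RFU_DELAY_NS = {1: 1.38, 2: 2.28, 3: 3.12, 4: 4.89,
                5: 6.47, 6: 7.57, 7: 8.65, 8: 9.66}


def rfu_cycles(ci_length, clock_period_ns):
    """Number of whole clock cycles needed to cover the RFU delay of a CI."""
    return math.ceil(RFU_DELAY_NS[ci_length] / clock_period_ns)


# Hypothetical 2.0 ns clock period (500 MHz).
for length in (1, 4, 8):
    print(length, rfu_cycles(length, 2.0))
```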

28 Speedup

29 Conclusions
- Proposing a reconfigurable functional unit for an Adaptive Dynamic Extensible Processor using a quantitative approach.
- Developing an integrated framework for partitioning and mapping custom instructions for the proposed RFU.

30 Thank you for your attention.

