1 Exploring Custom Instruction Synthesis for Application-Specific Instruction Set Processors with Multiple Design Objectives Lin, Hai Fei, Yunsi ACM/IEEE.

1 Exploring Custom Instruction Synthesis for Application-Specific Instruction Set Processors with Multiple Design Objectives  Lin, Hai; Fei, Yunsi  ACM/IEEE International Symposium on Low-Power Electronics and Design (ISLPED), 2010  Date: 2010/05/20  吳俊雄

2 OUTLINE  INTRODUCTION  MULTI-OBJECTIVE ASIP DESIGN  Two Algorithms for Custom Instruction Synthesis 1. Mixed Integer Linear Programming 2. Simulated Annealing Method  EXPERIMENTAL RESULTS

3 INTRODUCTION  Traditional custom instruction synthesis flows for ASIPs mainly target performance improvement.  We examine the existing custom instruction exploration algorithms: 1. Mixed Integer Linear Programming (MILP) 2. Simulated Annealing Method  and the cost estimation methods for: 1. Performance improvement 2. Energy efficiency 3. Area overhead

4 INTRODUCTION  Our work presented in this paper has three major contributions: 1. We address the importance of energy and resource efficiency in ASIP design. 2. We discuss a set of key factors in custom instruction selection. 3. We show that traditional design space exploration algorithms are either infeasible or inefficient when all the necessary factors must be estimated.  Since the theoretical complexity of exploring the design space thoroughly is $O(2^n)$, most practical techniques adopt heuristics to prune the design space during the search.  We present a holistic ASIP synthesis and simulation flow that allows the optimization goal to be adjusted between energy efficiency, area overhead and performance.

5 MULTI-OBJECTIVE ASIP DESIGN  There are two major energy factors: 1. Instruction fetch consumes a considerable portion of the total energy within a processor. 2. The data communication between operations is originally implemented through register file accesses within the base processor.  The dynamic energy consumption is affected by the reduction of the number of instructions and data register file accesses.

6 MULTI-OBJECTIVE ASIP DESIGN  Custom processor 1 with CFU1 achieves better performance improvement, because it utilizes operation parallelism in the DFG to reduce the total execution cycles.  Custom processor 2 with CFU2 achieves larger energy saving, because it realizes a sub-graph covering more operations and data transfer edges.

7 MULTI-OBJECTIVE ASIP DESIGN  We show that generating custom instructions from a DFG can be viewed as solving an operation scheduling problem.  The scheduling scheme should respect data dependencies and ensure that the input/output edges of each software stage satisfy the I/O constraint set by the register file ports.  For a scheduling scheme, the number of software stages that contain operations represents the number of instructions for the customized processor. The edges that cross between different software stages represent register file accesses.
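The two counts described on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the dictionary-based schedule and the four-operation DFG are hypothetical, and accesses for the DFG's external inputs and outputs are ignored.

```python
from typing import Dict, List, Tuple


def count_instructions_and_accesses(
    stage_of: Dict[str, int],
    edges: List[Tuple[str, str]],
) -> Tuple[int, int]:
    """Return (non-empty software stages, edges crossing between stages)."""
    # Each software stage that holds at least one operation is one instruction.
    num_instructions = len(set(stage_of.values()))
    # An edge whose endpoints sit in different stages goes through the register file.
    cross_stage_edges = sum(
        1 for src, dst in edges if stage_of[src] != stage_of[dst]
    )
    return num_instructions, cross_stage_edges


# Hypothetical DFG: a -> c, b -> c, c -> d
example_edges = [("a", "c"), ("b", "c"), ("c", "d")]
# Hypothetical schedule: a, b, c share stage 0; d is alone in stage 1.
example_schedule = {"a": 0, "b": 0, "c": 0, "d": 1}

print(count_instructions_and_accesses(example_schedule, example_edges))
```

Packing more operations into one stage lowers both counts, which is the effect the energy-oriented selection relies on.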

8 Two Algorithms for Custom Instruction Synthesis  Mixed Integer Linear Programming (MILP)  Primary variable definition: $S_{i,l}$, where i is the index of the operations and l is the index of software stages; for example, $S_{3,4} = 1$ places operation 3 in software stage 4.  Parameter definition: hardware execution delay; k is the index of operation types.

9 Two Algorithms for Custom Instruction Synthesis  Auxiliary variable definition: execution cycle delay, for example $Sd_6 = 0.8$.  Constraints: 1. Data dependency constraint (between operations i and j) 2. I/O constraint
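As a rough illustration of the two constraints named on this slide, the following Python check tests a candidate schedule. It is a sketch, not the MILP formulation itself: the function name and data layout are hypothetical, the default limits assume a register file with 4 read ports and 2 write ports, and only values passed between stages are counted (external DFG inputs and outputs are ignored).

```python
from typing import Dict, List, Set, Tuple


def satisfies_constraints(
    stage_of: Dict[str, int],
    edges: List[Tuple[str, str]],
    max_inputs: int = 4,   # assumed number of register file read ports
    max_outputs: int = 2,  # assumed number of register file write ports
) -> bool:
    """Check data dependency and per-stage I/O limits for a schedule."""
    # Data dependency: a producer must not be placed after its consumer.
    for src, dst in edges:
        if stage_of[src] > stage_of[dst]:
            return False

    # I/O: collect the distinct values entering and leaving each stage.
    inputs: Dict[int, Set[str]] = {}
    outputs: Dict[int, Set[str]] = {}
    for src, dst in edges:
        if stage_of[src] != stage_of[dst]:
            outputs.setdefault(stage_of[src], set()).add(src)
            inputs.setdefault(stage_of[dst], set()).add(src)

    inputs_ok = all(len(v) <= max_inputs for v in inputs.values())
    outputs_ok = all(len(v) <= max_outputs for v in outputs.values())
    return inputs_ok and outputs_ok


# Hypothetical DFG: a -> c, b -> c, c -> d
dfg_edges = [("a", "c"), ("b", "c"), ("c", "d")]
print(satisfies_constraints({"a": 0, "b": 0, "c": 1, "d": 2}, dfg_edges))
```

A producer and its consumer may share a stage here, since a custom instruction can cover dependent operations.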

10 Two Algorithms for Custom Instruction Synthesis  SN: the number of instructions.  SE: the total number of data accesses.  For multi-issue, out-of-order processors, the execution cycle count equals the longest execution path delay of the DFG.  The number of functional modules (operators) of type k needed in the final custom hardware extension is the largest number of operations of that type found in any one software stage.

11 Two Algorithms for Custom Instruction Synthesis  The unit hardware area of functional module type k.  Cost terms: energy consumption, area overhead, execution cycles.  The advantage of applying MILP to the scheduling problem is that, theoretically, it can find the optimum solution given sufficient search time.
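The area term can be illustrated with a short Python sketch that follows the definitions above: the module count for an operation type is the largest number of operations of that type in any one software stage, and each module contributes its unit area. The function names, the operation types and the unit-area numbers are hypothetical placeholders, not values from the paper.

```python
from collections import Counter
from typing import Dict


def module_counts(stage_of: Dict[str, int], type_of: Dict[str, str]) -> Dict[str, int]:
    """For each operation type, the largest count found in any one stage."""
    per_stage = Counter((stage_of[op], type_of[op]) for op in stage_of)
    counts: Dict[str, int] = {}
    for (_stage, op_type), n in per_stage.items():
        counts[op_type] = max(counts.get(op_type, 0), n)
    return counts


def area_overhead(counts: Dict[str, int], unit_area: Dict[str, float]) -> float:
    """Sum of (modules of type k) x (unit area of type k)."""
    return sum(n * unit_area[op_type] for op_type, n in counts.items())


# Hypothetical schedule and operation types.
schedule = {"a": 0, "b": 0, "c": 1, "d": 1}
op_types = {"a": "add", "b": "add", "c": "mul", "d": "add"}
# Hypothetical unit areas in arbitrary units.
areas = {"add": 1.0, "mul": 4.0}

needed = module_counts(schedule, op_types)
print(needed, area_overhead(needed, areas))
```

Stage 0 needs two adders and stage 1 needs one adder and one multiplier, so the sketch reserves two adders and one multiplier in total.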

12 Two Algorithms for Custom Instruction Synthesis  Simulated Annealing Method  Solution vector definition: OPv = {op1, op2, op3, ..., opn}  Solution variation mechanism: In each iteration, we randomly select n operations and move each of them to a different software stage to generate a new solution. n represents the maximum distance between the current solution and the one it evolves to. t is the current temperature, T is the starting temperature and N is the total number of operations.

13 Two Algorithms for Custom Instruction Synthesis  The allowable range within which a given operation can move is determined by the locations of its parent and child nodes.  In our algorithm, the actual moving range for an operation is further tightened by the current temperature: range = R * sqr(t/T). We randomly move the operation to a software stage within this range. R is set between 3 and 8.
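The temperature-limited move can be sketched as follows. Several things here are assumptions rather than details from the slides: `sqr` is read as a square root, the range is truncated to an integer number of stages, an operation may share a stage with its parents or children, and all names are hypothetical.

```python
import math
import random
from typing import Dict, List, Tuple


def allowed_stages(
    op: str,
    stage_of: Dict[str, int],
    parents: Dict[str, List[str]],
    children: Dict[str, List[str]],
    num_stages: int,
    t: float,
    T: float,
    R: int = 5,
) -> Tuple[int, int]:
    """Return the inclusive (lowest, highest) stage that `op` may move to."""
    # Bounds from data dependencies: stay between parents and children.
    low = max((stage_of[p] for p in parents.get(op, [])), default=0)
    high = min((stage_of[c] for c in children.get(op, [])), default=num_stages - 1)
    # Temperature-dependent reach, assuming sqr() means square root.
    reach = int(R * math.sqrt(t / T))
    current = stage_of[op]
    return max(low, current - reach), min(high, current + reach)


def move_operation(op, stage_of, parents, children, num_stages, t, T, R=5):
    """Return a copy of the schedule with `op` moved to a random allowed stage."""
    low, high = allowed_stages(op, stage_of, parents, children, num_stages, t, T, R)
    new_schedule = dict(stage_of)
    new_schedule[op] = random.randint(low, high)
    return new_schedule


# Hypothetical chain a -> b -> c spread over 10 software stages.
chain = {"a": 0, "b": 5, "c": 9}
chain_parents = {"b": ["a"], "c": ["b"]}
chain_children = {"a": ["b"], "b": ["c"]}
print(allowed_stages("b", chain, chain_parents, chain_children, 10, t=100.0, T=100.0, R=3))
```

As the temperature falls the reach shrinks toward zero, so late iterations make only small local changes to the schedule.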

14 Two Algorithms for Custom Instruction Synthesis  Solution acceptance mechanism: A new solution is accepted when its cost is smaller than that of the current solution; when its cost is larger, it can still be accepted with a probability p.  The Simulated Annealing algorithm balances the trade-off between solution quality and search time.
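A minimal Python sketch of such an acceptance rule is shown below. The expression for p did not survive in this transcript, so the sketch assumes the standard Metropolis form p = exp(-(new cost - current cost) / t); the function name and the injectable random source are hypothetical.

```python
import math
import random
from typing import Callable


def accept(
    current_cost: float,
    new_cost: float,
    t: float,
    rng: Callable[[], float] = random.random,
) -> bool:
    """Decide whether to move to the new solution at temperature t."""
    # Always accept a strictly better (lower-cost) solution.
    if new_cost < current_cost:
        return True
    # Assumed Metropolis criterion: worse solutions pass with probability p.
    p = math.exp(-(new_cost - current_cost) / t)
    return rng() < p


# A worse solution is accepted more readily at a high temperature.
print(accept(5.0, 10.0, t=100.0), accept(10.0, 5.0, t=0.1))
```

At high temperature p is close to 1 and the search wanders freely; as t drops, worse solutions are rejected almost always and the search settles.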

15 Two Algorithms for Custom Instruction Synthesis

16 MULTI-OBJECTIVE ASIP SYNTHESIS FLOW

17 EXPERIMENTAL RESULTS  CPLEX is used to solve the MILP problem for design space exploration.  The baseline processor is an out-of-order, MIPS-style processor.  The ratio between the weight variables g1 and g2 is set to 12.2 : 1.  The register file I/O constraints are set to 4/2.  We run the experiments for energy reduction with the variables å2 and å3 set to zero, and the experiments for performance improvement with å1 and å2 set to zero.

18 EXPERIMENTAL RESULTS  The average speedup is 1.42 for Binary Tree, 1.64 for MILP (p.) and 1.56 for MILP (e.).  The average energy consumption reductions are 18.1%, 22.7% and 29.8%, respectively.

19 EXPERIMENTAL RESULTS  The custom instruction templates presented in (b) and (c) are targeting performance and energy efficiency, respectively. There are more operations in the templates identified for energy efficiency, shown in (c), and they include longer critical paths than the sub-graphs shown in (b).

20 EXPERIMENTAL RESULTS  For different designs, the ratio between å1 and å2 can be varied to find the best trade-off between them.  Settings shown, with å3 = 0: å1 = 1, å2 = 0; and å1 = å2 = 0.5.
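One way to picture how the weights steer the selection is a weighted-sum cost, sketched below in Python. The slides do not give the exact cost expression, so the weighted sum, the mapping of the three weights to energy, area and execution cycles, and the two candidate designs with their normalized numbers are all assumptions made for illustration.

```python
from typing import Dict, Tuple

# Hypothetical candidates: (energy, area, cycles), each normalized to the baseline.
candidates: Dict[str, Tuple[float, float, float]] = {
    "design_A": (0.6, 0.9, 0.7),
    "design_B": (0.7, 0.3, 0.8),
}


def weighted_cost(
    metrics: Tuple[float, float, float],
    w_energy: float,
    w_area: float,
    w_cycles: float,
) -> float:
    """Assumed weighted-sum cost over energy, area and execution cycles."""
    energy, area, cycles = metrics
    return w_energy * energy + w_area * area + w_cycles * cycles


def best_design(w_energy: float, w_area: float, w_cycles: float) -> str:
    """Name of the candidate with the lowest weighted cost."""
    return min(
        candidates,
        key=lambda name: weighted_cost(candidates[name], w_energy, w_area, w_cycles),
    )


# Energy-only weighting versus an even energy/area split.
print(best_design(1.0, 0.0, 0.0), best_design(0.5, 0.5, 0.0))
```

With these made-up numbers the energy-only setting prefers the design with the lower energy, while the even split prefers the smaller design, which is the kind of shift the slide describes.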

21 EXPERIMENTAL RESULTS  The SA algorithm achieves an average performance speedup of 1.46, slightly lower than the 1.64 achieved by the MILP algorithm.
