Presentation is loading. Please wait.

Presentation is loading. Please wait.

Adding custom instructions to Simplescalar/GCC architecture Somasundaram.

Similar presentations


Presentation on theme: "Adding custom instructions to Simplescalar/GCC architecture Somasundaram."— Presentation transcript:

1 Adding custom instructions to Simplescalar/GCC architecture Somasundaram

2 Agenda  Motivation  GCC overall architecture  Simplescalar architecture  Adding a custom instruction  Conclusion  Motivation  GCC overall architecture  Simplescalar architecture  Adding a custom instruction  Conclusion

3 Motivation  Extensible processors What regular ISA instructions can be combined?What regular ISA instructions can be combined? Which regular ISA instructions are to be combined into a CFU instruction?Which regular ISA instructions are to be combined into a CFU instruction? Retarget the compiler to produce optimised code with CFU instructionsRetarget the compiler to produce optimised code with CFU instructions Simulate the extended processor with CFU instructionsSimulate the extended processor with CFU instructions

4 GNU Compiler Collection  Many front-ends C FortranFortran C++/Java/AdaC++/Java/Ada  Backend targeted at many processors x86, Alpha, Sparcx86, Alpha, Sparc ARC, ARM, MIPS...ARC, ARM, MIPS...

5 GCC Compiler Flow RTL? Are we interested in everything? Combine small RISC ISA like patterns into bigger CISC ISA like patterns

6 GCC – Low Level Optimisation  Uses Lisp like RTL as IR Example: Tip: use –da compiler option to get the IR output (insn (set (reg/v:SI 36) (mult:SI (reg:SI 42) (mult:SI (reg:SI 42) (reg:SI 41))) 41 {mulsi3} (nil) (reg:SI 41))) 41 {mulsi3} (nil) (nil)) (nil)) (call_insn (parallel[ (set (reg:SI 0 r0) (set (reg:SI 0 r0) (call (mem:SI (symbol_ref:SI ("printf")) 0) (call (mem:SI (symbol_ref:SI ("printf")) 0) (const_int 0 [0x0]))) (const_int 0 [0x0]))) (clobber (reg:SI 14 lr)) (clobber (reg:SI 14 lr)) ] ) -1 (nil) ] ) -1 (nil) (nil) (nil) (expr_list (use (reg:SI 1 r1)) (expr_list (use (reg:SI 1 r1)) (expr_list (use (reg:SI 0 r0)) (expr_list (use (reg:SI 0 r0)) (nil)))) (nil))))

7 GCC - Target Machine Description  Use a similar language in md [machine description] file (define_insn "mulsi3" [(set (match_operand:SI 0 "s_register_operand" "=&r,&r") [(set (match_operand:SI 0 "s_register_operand" "=&r,&r") (mult:SI (match_operand:SI 2 "s_register_operand" "r,r") (match_operand:SI 1 "s_register_operand" "%?r,0")))] (match_operand:SI 1 "s_register_operand" "%?r,0")))] "" "" "mul%?\\t%0, %2, %1" "mul%?\\t%0, %2, %1" [(set_attr "type" "mult")])

8 GCC Combine Phase  Combines some standard IR pattern into a single user-defined IR pattern  User-defined IR patterns are defined in the target.md file  Operand constraints should be satisfied  Example: MAC (Multiply-Accumulate) Merge mulsi3 and addsi3  mulsi3addsi Merge mulsi3 and addsi3  mulsi3addsi

9 GCC Combine Phase  How is it done? Let us assume that the following patterns are defined in the machine description addsi3  Matches C=A+B (all 32-bit regs) mulsi3  Matches C=A*B (all 32-bit regs) mulsi3addsi  Matches D=A*B+C (all 32-bit regs) mulsi4addsi  Matches E=A*B+C*D (all 32-bit regs)

10 GCC Combine Phase Assume this DDG sub-graph

11 GCC Combine Phase Try 55,45:No matching pattern Try 55,47:No matching pattern Try 55,53:We have a match

12 GCC Combine phase Try 55,52:No matching pattern Try 55,50:No matching pattern Try 55,45:No matching pattern Try 55,47:No matching pattern Try 55,52,50:No matching pattern Try 55,52,45:No matching pattern Try 55,52,47:No matching pattern Try 55,50,45:No matching pattern Try 55,50,47:No matching pattern Try 55,47,45:No matching pattern Cannot try to combine more than 3 patterns! Hence, stop!

13 GCC Combine phase: Summary  Can combine upto 3 instructions together  Can recursively combine more instructions  Deletes a smaller instruction once combined  Always works on a function

14 Retargetting GCC for CFU  Build a better Combiner phase Write a new combiner with better pattern merger which works on inputs from RTLWrite a new combiner with better pattern merger which works on inputs from RTL Replace existing combiner with this combinerReplace existing combiner with this combiner  New patterns for the CFU instruction in the target.md file  Changes in GAS (included in binutils package) to generate insn. word

15 SimpleScalar is  Instruction Set simulator  Profiles programs  Simulates micro-architectural features  Different levels of speed of simulation Vs accuracy trade-off  Written in C  Easily retargettable

16 Simplescalar: CFU issues  More arguments than used by RISC instructions Out-of-order execution needs to take care of the increase in dependenciesOut-of-order execution needs to take care of the increase in dependencies  New instructions in decode tree Easy to add new instructions to the decode tree (machine.def)Easy to add new instructions to the decode tree (machine.def)

17 Let us add a new instruction  Achieve the operation E=A*B+C*D using one instruction  4 input operands and 1 output operand  Extension to ARM ISA  Provide CompilerCompiler AssemblerAssembler SimulatorSimulator

18 Pattern for the instruction  gcc/config/arm/arm.md (define_insn "*mulsi4addsi" [(set (match_operand:SI 0 "s_register_operand" "=r") [(set (match_operand:SI 0 "s_register_operand" "=r")(plus:SI (mult:SI (match_operand:SI 2 "s_register_operand" "r") (mult:SI (match_operand:SI 2 "s_register_operand" "r") (match_operand:SI 1 "s_register_operand" "r")) (match_operand:SI 1 "s_register_operand" "r")) (mult:SI (match_operand:SI 4 "s_register_operand" "r") (mult:SI (match_operand:SI 4 "s_register_operand" "r") (match_operand:SI 3 "s_register_operand" "r"))))] (match_operand:SI 3 "s_register_operand" "r"))))] "" "" "ml2a%?\\t%0, %2, %1, %4, %3" "ml2a%?\\t%0, %2, %1, %4, %3" [(set_attr "type" "mult")])

19 Simplescalar changes  Instruction Decode Tree Chain of decoders: Each looking at a set of bitsChain of decoders: Each looking at a set of bits  target-arm/arm.def New chain of decoder macros for CFU class of instructionsNew chain of decoder macros for CFU class of instructions Increase the number of input dependencies in all the instructio macros from 5 to 6 (predication in ARM)Increase the number of input dependencies in all the instructio macros from 5 to 6 (predication in ARM)

20 Simplescalar changes  sim-outorder.c Increase the number of input dependencies to be monitored in the reservation unitIncrease the number of input dependencies to be monitored in the reservation unit Both macros and code has to be changedBoth macros and code has to be changed  Other files need to be changed for the same purpose  Compile ‘test program’ and verify!

21 Summary  Identify the ways to add new instructions to Simplescalar and GCC  Determine the capabilities of the current combiner in GCC  Demonstrate the addition of a new custom instruction  Understand GCC to some extent!


Download ppt "Adding custom instructions to Simplescalar/GCC architecture Somasundaram."

Similar presentations


Ads by Google