Presentation is loading. Please wait.

Presentation is loading. Please wait.

Adapting Compilation Techniques to Enhance the Packing of Instructions into Registers Stephen Hines, David Whalley and Gary Tyson Computer Science Dept.

Similar presentations


Presentation on theme: "Adapting Compilation Techniques to Enhance the Packing of Instructions into Registers Stephen Hines, David Whalley and Gary Tyson Computer Science Dept."— Presentation transcript:

1 Adapting Compilation Techniques to Enhance the Packing of Instructions into Registers Stephen Hines, David Whalley and Gary Tyson Computer Science Dept. Florida State University October 23, 2006

2 Adapting Compilation Techniques to Enhance the Packing of Instructions into Registers 2/17 Instruction Packing Store frequently occurring instructions as specified by the compiler in a small, low- power Instruction Register File (IRF) Allow multiple instruction fetches from the IRF by packing instruction references together  Tightly packed – multiple IRF references  Loosely packed – piggybacks an IRF reference onto an existing instruction Facilitate parameterization of some instructions using an Immediate Table (IMM)

3 Adapting Compilation Techniques to Enhance the Packing of Instructions into Registers 3/17 Instruction Cache PC IF/ID IRF IMM IRWP packed instruction insn1 insn2 imm3 insn3 imm3 insn4 Execution of IRF Instructions Instruction Fetch StageFirst Half of Instruction Decode Stage To Instruction Decoder Executing a Tightly Packed Param4c Instruction

4 Adapting Compilation Techniques to Enhance the Packing of Instructions into Registers 4/17 Outline Introduction Improved Promotion to the IRF Compiler Optimizations  Instruction Selection  Register Re-assignment  Instruction Scheduling Experimental Evaluation Conclusions & Future Work

5 Adapting Compilation Techniques to Enhance the Packing of Instructions into Registers 5/17 Improved Promotion to the IRF Different classes of instructions can consume 1 – 5 slots More accurately model the benefits of promoting from one class of instruction to another  Original IRF papers did not promote multiple I-type instructions with different default immediate values  addi $3, $3, 4 and addi $3, $3, 1 would not both reside in the IRF, no matter how frequently they occurred

6 Adapting Compilation Techniques to Enhance the Packing of Instructions into Registers 6/17 Mixed Profiling Static profiling is best for decreasing code size Dynamic profiling is best for reducing energy consumption Can simultaneously weight static and dynamic profile data to obtain a mixed result that has both good code compression and reduced energy consumption Can obtain most of the benefits of individual static/dynamic profiling

7 Adapting Compilation Techniques to Enhance the Packing of Instructions into Registers 7/17 Compiler Optimizations Instruction Selection  Choose beneficial encodings for increasing redundancy Register Re-assignment  Attempts to rename registers such that instructions can be accessed via IRF Instruction Scheduling  Intra-block – focus on reordering instructions so that dense packs are formed (both tight and loose)  Inter-block – attempt to move instructions between blocks to fill up packs ending with branches/jumps Code duplication Predication

8 Adapting Compilation Techniques to Enhance the Packing of Instructions into Registers 8/17 Intra-block Instruction Scheduling 31 2 4 5 Instruction Dependence DAG 12344’51234 5 2 5 1 54 Without Instruction Scheduling With Instruction Scheduling

9 Adapting Compilation Techniques to Enhance the Packing of Instructions into Registers 9/17 a c Code Duplication to Reduce Code Size 5 b 5’ 2 W XY Z 11 33’44’ 33’44’ 6 slots is too many to fit in a single packed instruction … but we can duplicate a single instruction … resulting in the ability to pack the remaining 5 slots together.

10 Adapting Compilation Techniques to Enhance the Packing of Instructions into Registers 10/17 a Predication – Forward Branches Cond Branch X 44’ Y Z Fall-through Branch taken path 232’b 44’1 Instructions packed after forward branches will only be executed when the branch is not taken 44’2’32

11 Adapting Compilation Techniques to Enhance the Packing of Instructions into Registers 11/17 Predication – Backward Branches abc 22’ Branch def 112 Branch offset 2’ Instructions packed after backward branches will only be executed when the branch is taken

12 Adapting Compilation Techniques to Enhance the Packing of Instructions into Registers 12/17 Predication Advantages with IRF IRF facilitates a form of predication for the MIPS – a baseline architecture that traditionally does not support predication No need to waste instruction encoding space specifying predicate bits for most/all instructions (even ARM traded away general predication for reducing code size with Thumb and Thumb2) No need to fetch, decode and possibly execute instructions that are annulled after the branch within a pack (reducing energy consumption and execution time)

13 Adapting Compilation Techniques to Enhance the Packing of Instructions into Registers 13/17 Experimental Evaluation MiBench embedded benchmark suite – 6 categories representing common tasks for various domains SimpleScalar MIPS/PISA architectural simulator  Out-of-order, single issue embedded machine with 8KB 4- way set associative L1 instruction and data caches and 128-entry bimodal branch predictor  Wattch/Cacti extensions for modeling energy consumption (inactive portions of pipeline only dissipate 10% of normal energy when using cc3 clock gating) VPO – Very Portable Optimizer targeted for SimpleScalar MIPS/PISA

14 Adapting Compilation Techniques to Enhance the Packing of Instructions into Registers 14/17 Energy Consumption

15 Adapting Compilation Techniques to Enhance the Packing of Instructions into Registers 15/17 Static Code Size

16 Adapting Compilation Techniques to Enhance the Packing of Instructions into Registers 16/17 IRF Promotion with Mixed Profiling

17 Adapting Compilation Techniques to Enhance the Packing of Instructions into Registers 17/17 Conclusions & Future Work Compiler optimizations targeted specifically for IRF can further reduce energy (12.2%  15.8%), code size (16.8%  28.8%) and execution time Unique transformation opportunities exist due to IRF, such as code duplication for code size reduction and predication As processor designs become more idiosyncratic, it is increasingly important to explore the possibility of evolving existing compiler optimizations Register targeting and loop unrolling should also be explored with instruction packing Enhanced parameterization techniques

18 Adapting Compilation Techniques to Enhance the Packing of Instructions into Registers 18/17

19 Adapting Compilation Techniques to Enhance the Packing of Instructions into Registers 19/17 Tightly Packed Instruction Format New opcodes for this T-format of MISA instructions Supports sequential execution of up to 5 RISA instructions from the IRF  Unnecessary fields are padded with nop Supports up to 2 parameters replacing instruction slots  Parameters can come from 32-entry IMM  Each IRF entry also retains a default immediate value as well  Branches use these 5 bits for displacements  R-type RISA instructions can use parameter to replace RD field

20 Adapting Compilation Techniques to Enhance the Packing of Instructions into Registers 20/17 MIPS Instruction Format Modifications Creating Loosely Packed instructions  R-type: Removed shamt field and merged with rs  I-type: Shortened immediate values (16-bit  11bit) Lui now uses 21-bit immediate values, hence no loose packing  J-type: Unchanged

21 Adapting Compilation Techniques to Enhance the Packing of Instructions into Registers 21/17

22 Adapting Compilation Techniques to Enhance the Packing of Instructions into Registers 22/17 Introduction Embedded Processor Design Constraints  Energy Consumption  Static Code Size  Execution Time Fetch logic is responsible for 36% of total processor power on StrongARM Two Primary Areas for Improvement in Instruction Fetch  Better fetch mechanism and storage Instruction Cache and/or ROM – Lower power than main memory, but still a fairly large, flat storage method  Better instruction encodings Instruction encodings are wasteful with bits  Maximize functionality, but simplify decoding (fixed length)  Most applications only apply a subset of available instructions  Nowhere near theoretical compression limits

23 Adapting Compilation Techniques to Enhance the Packing of Instructions into Registers 23/17 Instruction Redundancy Profiled largest benchmark in each of six MiBench categories Most frequent 32 instructions comprise 66.5% of total dynamic and 31% of total static instructions

24 Adapting Compilation Techniques to Enhance the Packing of Instructions into Registers 24/17 Access of Data & Instructions Each lower layer is designed to improve accessibility of current/frequent items, albeit at a reduction in number of available items. Caching is beneficial, but compilers can do better for the “most frequently” accessed data items (e.g. Register Allocation). Instructions have no analogue to the Data Register File (RF). L1 Data Cache ????? Data Register File L1 Instruction Cache L2 Cache Main Memory

25 Adapting Compilation Techniques to Enhance the Packing of Instructions into Registers 25/17 Instruction Packing Store frequently occurring instructions as specified by the compiler in a small, low- power Instruction Register File (IRF) Allow multiple instruction fetches from the IRF by packing instruction references together Facilitate parameterization of some instructions using an Immediate Table (IMM)

26 Adapting Compilation Techniques to Enhance the Packing of Instructions into Registers 26/17 Outline Introduction IRF Instruction Set Architecture IRF Register Windowing Compiler Optimizations Experimental Framework Results Interaction with Other Techniques Proposed Enhancements for Instruction Packing Conclusions Publication Plan

27 Adapting Compilation Techniques to Enhance the Packing of Instructions into Registers 27/17 IRF Instruction Set Architecture MIPS ISA – commonly known and provides simple encoding RISA (Register ISA) – instructions available via IRF access MISA (Memory ISA) – instructions available in memory  New instruction formats that can reference multiple RISA instructions – Tightly Packed  Original instructions modified to pack an additional RISA reference – Loosely Packed

28 Adapting Compilation Techniques to Enhance the Packing of Instructions into Registers 28/17 Instruction Cache PC IF/ID IRF IMM IRWP packed instruction insn1 insn2 imm3 insn3 imm3 insn4 Execution of IRF Instructions Instruction Fetch StageFirst Half of Instruction Decode Stage To Instruction Decoder Executing a Tightly Packed Param4c Instruction

29 Adapting Compilation Techniques to Enhance the Packing of Instructions into Registers 29/17 Tightly Packed Instruction Format New opcodes for this T-format of MISA instructions Supports sequential execution of up to 5 RISA instructions from the IRF  Unnecessary fields are padded with nop Supports up to 2 parameters replacing instruction slots  Parameters can come from 32-entry IMM  Each IRF entry also retains a default immediate value as well  Branches use these 5 bits for displacements  R-type RISA instructions can use parameter to replace RD field

30 Adapting Compilation Techniques to Enhance the Packing of Instructions into Registers 30/17 MIPS Instruction Format Modifications Creating Loosely Packed instructions  R-type: Removed shamt field and merged with rs  I-type: Shortened immediate values (16-bit  11bit) Lui now uses 21-bit immediate values, hence no loose packing  J-type: Unchanged

31 Adapting Compilation Techniques to Enhance the Packing of Instructions into Registers 31/17 Compilation Framework

32 Adapting Compilation Techniques to Enhance the Packing of Instructions into Registers 32/17 Packing Instructions Greedy selection of instructions based on static or dynamic profile data Examine a sliding window of instructions for each basic block, attempting to pack adjacent RISA instructions together  Denser packs are more favorable (e.g. tight5)  Branch offset distances can slip into range when we apply an iterative packing algorithm

33 Adapting Compilation Techniques to Enhance the Packing of Instructions into Registers 33/17 IRF Register Windowing Simply increasing the number of IRF registers  reduced number of RISA instructions available in a single MISA instruction Instead, keep format the same and provide multiple IRF windows to switch between on a per-function basis Can dynamically switch IRF contents on calls/returns or provide actual physically separate register windows

34 Adapting Compilation Techniques to Enhance the Packing of Instructions into Registers 34/17 Software Windowing Add new instructions to dynamically save/restore instruction registers Calculate cost/benefit analysis of creating a new window/partition for a given function Insert appropriate instructions for saving/restoring IRs between call/return sequences Drawbacks  Requires the complete call graph for the application  Extra instructions for saving/restoring can add to execution time

35 Adapting Compilation Techniques to Enhance the Packing of Instructions into Registers 35/17 Hardware Windowing IRF windows are shared amongst functions with similar instruction mix Function addresses are modified to incorporate an instruction register window pointer (IRWP)  Call instruction – transfers to new function and switches the window by adjusting the IRWP; also saves the return address and current IRWP  Return Instruction – transfers control back to the caller and restores the previous IRWP Inactive windows can be kept in a low-power state

36 Adapting Compilation Techniques to Enhance the Packing of Instructions into Registers 36/17 Compiler Optimizations Improved Instruction Promotion Adapt existing techniques to improve instruction packing  Instruction Selection  Register Re-assignment  Instruction Scheduling Increase application redundancy Eliminate constraints on packing

37 Adapting Compilation Techniques to Enhance the Packing of Instructions into Registers 37/17 Improving Instruction Promotion Different Classes of instructions – can consume between one and five of the potentially available slots for a tight pack Goal is to more accurately model the benefits of promoting from one class of instruction to another  Original IRF papers did not consider the promotion of multiple I- type instructions with different default immediate values  addi $3, $3, 4 and addi $3, $3, 1 would not both reside in the IRF, no matter how frequently they occurred Can simultaneously weight static and dynamic profile data to obtain a mixed result that has both good code compression and reduced energy consumption Can obtain most of the benefits of individual static/dynamic profiling

38 Adapting Compilation Techniques to Enhance the Packing of Instructions into Registers 38/17 Instruction Selection Encode short jumps as branches (now parameterizable)  j label  beq $0, $0, rel_label Replace simple instructions with equivalent parameterizable forms  addu $2, $3, $0  addiu $2, $3, 0 Ensure that commutative operations always have the same order of operands  addu $2, $4, $2  addu $2, $2, $4

39 Adapting Compilation Techniques to Enhance the Packing of Instructions into Registers 39/17 Register Re-assignment Performed after registers have been assigned in the second compilation with packing Use a Register Interference Graph to represent register live ranges in the function Re-assign live ranges to other registers if this improves packing density  Live ranges ordered by dynamic frequency  Modify register in conflicting live ranges if beneficial

40 Adapting Compilation Techniques to Enhance the Packing of Instructions into Registers 40/17 Instruction Scheduling Intra-block – focus on reordering instructions so that dense packs are formed (both tight and loose) Inter-block – attempt to move instructions between blocks to fill up packs ending with branches/jumps  Code duplication  Predication

41 Adapting Compilation Techniques to Enhance the Packing of Instructions into Registers 41/17 Experimental Framework MiBench embedded benchmark suite – 6 categories representing common tasks for various domains SimpleScalar MIPS/PISA architectural simulator  Out-of-order, single issue embedded machine with 8KB 4- way set associative L1 instruction and data caches and 128-entry bimodal branch predictor  Wattch/Cacti extensions for modeling energy consumption (inactive portions of pipeline only dissipate 10% of normal energy when using cc3 clock gating) VPO – Very Portable Optimizer targeted for SimpleScalar MIPS/PISA

42 Adapting Compilation Techniques to Enhance the Packing of Instructions into Registers 42/17 Results – Processor Energy

43 Adapting Compilation Techniques to Enhance the Packing of Instructions into Registers 43/17 Results – Static Code Size

44 Adapting Compilation Techniques to Enhance the Packing of Instructions into Registers 44/17 Results – Execution Cycles

45 Adapting Compilation Techniques to Enhance the Packing of Instructions into Registers 45/17 Results – Mixed Profiling

46 Adapting Compilation Techniques to Enhance the Packing of Instructions into Registers 46/17 Interaction with Other Techniques Loop Cache – small buffer that captures innermost loops dynamically  IRF can operate synergistically, increasing utilization of loop cache when innermost loops are larger than normal  Fetch Energy reduced by 56% with 8-entry loop cache and 4 window IRF L0 (Filter) Cache – small instruction cache with low energy consumption, but can negatively impact execution time due to cache misses  IRF reduces working set size and can mask a portion of the cache miss penalty due to overlapped fetch  Fetch energy reduced by 70% with 256-byte L0 cache and 4 window IRF

47 Adapting Compilation Techniques to Enhance the Packing of Instructions into Registers 47/17 Proposed Enhancements for Instruction Packing Splitting of MISA/RISA Split Opcode/Operand Encoding in RISA Improved Parameterization Link-Time Instruction Packing Instruction Promotion as a Side Effect

48 Adapting Compilation Techniques to Enhance the Packing of Instructions into Registers 48/17 Splitting of MISA/RISA No need to arbitrarily limit RISA to same instructions as MISA Tailor each ISA to particular design goals  MISA – reduced code size (small instructions)  RISA – improved performance & expressiveness (large instructions) Baseline ISA choices  ARM/Thumb – 32/16 bit dual width ISA  FITS – opcodes mapped to 16 bit instructions  ARM/Thumb2 – 32/16 bit variable length ISA

49 Adapting Compilation Techniques to Enhance the Packing of Instructions into Registers 49/17 Split MISA Encodings

50 Adapting Compilation Techniques to Enhance the Packing of Instructions into Registers 50/17 Splitting Opcodes/Operands Existing code compression schemes have benefited from separating the encoding of opcodes and operands Can re-encode RISA to separate opcode and operand streams Loosely packed instructions can use traditional or new split encodings Tightly packed instructions could even continue supporting parameterization

51 Adapting Compilation Techniques to Enhance the Packing of Instructions into Registers 51/17 Split Opcode/Operand Encodings

52 Adapting Compilation Techniques to Enhance the Packing of Instructions into Registers 52/17 Improved Parameterization Current IRF parameterization is limited to immediate values and destination registers  Branches use parameter for offset distance  Other I-type instructions can index into IMM based on the parameter value  R-type instructions use parameter to replace RD Extend parameterization to any field: RS, RT, RD, immediate, and even Opcode!!!

53 Adapting Compilation Techniques to Enhance the Packing of Instructions into Registers 53/17 Parameterization Encoding

54 Adapting Compilation Techniques to Enhance the Packing of Instructions into Registers 54/17 Link-Time Instruction Packing Current implementation cannot pack statically linked library routines Incremental packing strategy with a more precise evaluation of exact benefits of packing particular instructions Integration with existing tools like DIABLO for ARM/Thumb or MIPS Support for new inter-procedural optimizations like procedural abstraction and dead code elimination

55 Adapting Compilation Techniques to Enhance the Packing of Instructions into Registers 55/17 Promotion as a Side Effect Current implementation uses a statically allocated IRF that does not change at run- time  Prior dynamic scheme experienced a great deal of overhead for saving/restoring IRF entries Change promotion mechanism to make saves and restores simpler Capture simple loop behaviors in the same manner as other techniques like Zero Overhead Loop Buffer (ZOLB)

56 Adapting Compilation Techniques to Enhance the Packing of Instructions into Registers 56/17 Loosely Packed Promotion Use entire 5-bit field to specify IRF entry to write into  All RISA executions must come from tightly packed instructions Reserve 1-bit in the loosely packed field to represent read/write of IRF entry  Write this instruction (after executing it) into the corresponding 4-bit IRF entry  Reduces available number of IRs

57 Adapting Compilation Techniques to Enhance the Packing of Instructions into Registers 57/17 IRF Calling Conventions Maintain some IRs as non-scratch (callee-save) and some as scratch (caller-save) Use appropriate registers for varying behavior (tight loops with and without function calls)

58 Adapting Compilation Techniques to Enhance the Packing of Instructions into Registers 58/17 Dynamic Promotion Peel loop and then pack instructions in the actual loop  Special handling for conditional control flow  Can alleviate huge code size increases when combined with loop unrolling Added code size and potentially extra instructions for save/restore, but huge potential for energy savings in tight loops  Link-time analysis can eliminate saves/restores  Potential for actual rotating register file (like SPARC) Surpasses existing ZOLB and loop cache techniques since loops with conditional control flow can be captured

59 Adapting Compilation Techniques to Enhance the Packing of Instructions into Registers 59/17 Conclusions Efficient fetch mechanism that simultaneously reduces  Processor Energy Consumption (15.8% reduction)  Static Code Size (28.8% reduction)  Execution Time (1.3% reduction) Improvements via parameterization, windowing, compiler optimizations and integration with other processor enhancements Future work is rich and diverse:  Specialized MISA and RISA encodings  Improved parameterization  Link-time analysis and instruction packing  Improved methods for IRF promotion Demonstrates the need for careful design/balance of instruction sets and instruction fetch mechanisms

60 Adapting Compilation Techniques to Enhance the Packing of Instructions into Registers 60/17 Previous Publications 1. Hines, S., Green, J., Tyson, G., and Whalley, D. Improving program efficiency by packing instructions into registers. In Proceedings of the 2005 ACM/IEEE International Symposium on Computer Architecture (June 2005), IEEE Computer Society, pp. 260–271. 2. Hines, S., Tyson, G., and Whalley, D. Reducing instruction fetch cost by packing instructions into register windows. In Proceedings of the 38th annual ACM/IEEE International Symposium on Microarchitecture (November 2005), IEEE Computer Society, pp. 19–29. 3. Hines, S., Tyson, G., and Whalley, D. Improving the energy and execution efficiency of a small instruction cache by using an instruction register file. In Proceedings of the 2nd Watson Conference on Interaction between Architecture, Circuits, and Compilers (September 2005), pp. 160–169.

61 Adapting Compilation Techniques to Enhance the Packing of Instructions into Registers 61/17 Publications to be Submitted 4. Hines, S., Whalley, D., and Tyson, G. Adapting compilation techniques to enhance the packing of instructions into registers. Submitting to the International Conference on Compilers, Architecture, and Synthesis for Embedded Systems, October 2006. 5. Hines, S., Tyson, G., and Whalley, D. Improving fetch efficiency by using an instruction register file. Planning to submit.

62 Adapting Compilation Techniques to Enhance the Packing of Instructions into Registers 62/17 Other Publications 6. Hines, S., Kulkarni, P., Whalley, D., and Davidson, J. Using de-optimization to re-optimize code. In Proceedings of the 5th International Conference on Embedded Software (September 2005), ACM Press, pp. 114–123. 7. Kulkarni, P., Hines, S., Hiser, J., Whalley, D., Davidson, J., and Jones, D. Fast searches for effective optimization phase sequences. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (June 2004), pp. 171–182. 8. Kulkarni, P. A., Hines, S. R., Whalley, D. B., Hiser, J. D., Davidson, J. W., and Jones, D. Fast and efficient searches for effective optimization phase sequences. ACM Transactions on Architecture and Code Optimization (June 2005), 165–198. 9. Kulkarni, P., Zhao, W., Hines, S., Whalley, D., Yuan, X., van Engelen, R., Gallivan, K., Hiser, J., Davidson, J., Cai, B., Bailey, M., Moon, H., Cho, K., Paek, Y., and Jones, D. VISTA: VPO interactive system for tuning applications. ACM Transactions on Embedded Computing Systems, 2006. 10. Kleffner, M., Jones, D., Hiser, J., Kulkarni, P., Parent, J., Hines, S., Whalley, D., Davidson, J., and Gallivan, K. On the use of compilers in DSP laboratory instruction. In Proceedings of the 2006 IEEE International Conference on Acoustics, Speech, and Signal Processing (May 2006). 11. Kreahling, W., Hines, S., Whalley, D., and Tyson, G. Reducing the cost of conditional transfers of control by using comparison specifications. In Proceedings of the 2006 ACM SIGPLAN conference on Languages, Compilers, and Tools for Embedded Systems (June 2006), ACM Press.

63 Adapting Compilation Techniques to Enhance the Packing of Instructions into Registers 63/17 Conferences/Journals Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS) Conference on Languages, Compilers and Tools for Embedded Systems (LCTES) Conference on Programming Language Design and Implementation (PLDI) International Symposium on Computer Architecture (ISCA) International Symposium on Microarchitecture (MICRO) ACM Transactions on Architecture and Code Optimizations (TACO) IEEE Transactions on Computers (TOC) ACM Transactions on Computer Systems (TOCS) ACM Transactions in Embedded Computing Systems (TECS)

64 Adapting Compilation Techniques to Enhance the Packing of Instructions into Registers 64/17 Publication Plan

65 Adapting Compilation Techniques to Enhance the Packing of Instructions into Registers 65/17 The End Questions ???

66 Adapting Compilation Techniques to Enhance the Packing of Instructions into Registers 66/17

67 Adapting Compilation Techniques to Enhance the Packing of Instructions into Registers 67/17

68 Adapting Compilation Techniques to Enhance the Packing of Instructions into Registers 68/17 Compiler Optimization Legend

69 Adapting Compilation Techniques to Enhance the Packing of Instructions into Registers 69/17 Intra-block Instruction Scheduling

70 Adapting Compilation Techniques to Enhance the Packing of Instructions into Registers 70/17 Code Duplication to Reduce Code Size

71 Adapting Compilation Techniques to Enhance the Packing of Instructions into Registers 71/17 Predication – Forward Branches

72 Adapting Compilation Techniques to Enhance the Packing of Instructions into Registers 72/17 Predication – Backward Branches

73 Adapting Compilation Techniques to Enhance the Packing of Instructions into Registers 73/17 Predication Advantages with IRF IRF facilitates a form of predication for the MIPS – a baseline architecture that traditionally does not support predication No need to waste instruction encoding space specifying predicate bits for most/all instructions (even ARM traded away general predication for reducing code size with Thumb and Thumb2) No need to fetch, decode and possibly execute instructions that are annulled after the branch within a pack (reducing energy consumption and execution time)

74 Adapting Compilation Techniques to Enhance the Packing of Instructions into Registers 74/17 Energy Consumption

75 Adapting Compilation Techniques to Enhance the Packing of Instructions into Registers 75/17 Static Code Size

76 Adapting Compilation Techniques to Enhance the Packing of Instructions into Registers 76/17 Execution Cycles

77 Adapting Compilation Techniques to Enhance the Packing of Instructions into Registers 77/17 Conclusions IRF provides dense encodings for frequently occurring instructions Compiler optimizations targeted specifically for IRF can further reduce energy (12.2%  15.8%), code size (16.8%  28.8%) and execution time Unique transformation opportunities exist due to IRF, such as code duplication for code size reduction and predication As processor designs become more idiosyncratic, it is increasingly important to explore the possibility of evolving existing compiler optimizations

78 Adapting Compilation Techniques to Enhance the Packing of Instructions into Registers 78/17


Download ppt "Adapting Compilation Techniques to Enhance the Packing of Instructions into Registers Stephen Hines, David Whalley and Gary Tyson Computer Science Dept."

Similar presentations


Ads by Google