# Compiler-Based Register Name Adjustment for Low-Power Embedded Processors

Discussion by Garo Bournoutian



Introduction

- Problem: high power consumption due to bit transitions on the instruction bus
  - Current compilers allocate registers to minimize spill/fill code, not bit transitions
- Importance: growing number of mobile, battery-powered devices
  - Solution allows for longer battery life, larger die sizes
- Approach: rearrange and rename registers in the code to minimize bit transitions

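What is a bit transition?

A bit transition occurs whenever a bit on the instruction bus changes value between two consecutive instruction words.

| Assembly code | Instruction word |
| --- | --- |
| add r3, r2, r4 | … 0011 0010 0100 … |
| sub r6, r3, r5 | … 0110 0011 0101 … |
| sub r3, r2, r6 | … 0011 0010 0110 … |
| mul r4, r4, r5 | … 0100 0100 0101 … |
| Transitions / field | 7, 4, 5 |

Total transitions: 16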
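The per-field counts on this slide can be reproduced with a few lines of Python. This is a minimal sketch that assumes only the three register fields matter (opcode bits are ignored, as on the slide); the helper names `hamming` and `field_transitions` are illustrative and do not come from the presentation.

```python
def hamming(a: int, b: int) -> int:
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


def field_transitions(instrs):
    """Bit transitions per register field between consecutive instructions.

    instrs is a list of equal-length tuples of register numbers,
    one tuple per instruction (assumed layout: dst, src1, src2).
    """
    totals = [0] * len(instrs[0])
    for prev, cur in zip(instrs, instrs[1:]):
        for f, (a, b) in enumerate(zip(prev, cur)):
            totals[f] += hamming(a, b)
    return totals


# Register fields of the four instructions shown on the slide.
code = [(3, 2, 4), (6, 3, 5), (3, 2, 6), (4, 4, 5)]
per_field = field_transitions(code)
print(per_field, sum(per_field))  # [7, 4, 5] 16
```

Each field is compared only with the same field of the previous instruction, because those are the bus lines that would actually toggle.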

An Example

The original code had a total of 16 transitions; the optimized code has a total of 10.

| Original code | Optimized code |
| --- | --- |
| add r3, r2, r4 | add r6, r2, r4 |
| sub r6, r3, r5 | sub r7, r6, r5 |
| sub r3, r2, r6 | sub r6, r2, r7 |
| mul r4, r4, r5 | mul r4, r4, r5 |

Simply renaming r3 to r6 and r6 to r7 yields a 37.5% reduction in bit transitions.
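The renaming in this example can be checked mechanically. The sketch below assumes that r7 is free to use in this block and that only register fields contribute to the count; `rename` and `total_transitions` are hypothetical helper names chosen for illustration.

```python
def hamming(a: int, b: int) -> int:
    """Number of differing bits between two register numbers."""
    return bin(a ^ b).count("1")


def total_transitions(instrs):
    """Sum of bit transitions over all register fields of consecutive instructions."""
    return sum(
        hamming(a, b)
        for prev, cur in zip(instrs, instrs[1:])
        for a, b in zip(prev, cur)
    )


def rename(instrs, mapping):
    """Apply a register renaming to every field; unmapped registers are kept."""
    return [tuple(mapping.get(r, r) for r in ins) for ins in instrs]


original = [(3, 2, 4), (6, 3, 5), (3, 2, 6), (4, 4, 5)]
optimized = rename(original, {3: 6, 6: 7})  # r3 -> r6 and r6 -> r7, applied simultaneously

before = total_transitions(original)
after = total_transitions(optimized)
reduction = (before - after) / before
print(before, after, f"{reduction:.1%}")  # 16 10 37.5%
```

The mapping is applied in a single pass per field, so r3 becoming r6 does not get renamed a second time to r7.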

Formulation

Code in basic blocks must be mapped to numerical structures. Example basic block:

- ld r5,(r1)0
- add r3, r2, r5
- add r4, r3, r2
- mul r3, r4, r3
- st r3, (r7)10
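The slide's figure showing the numerical structure is not preserved in the transcript. As one possible illustration (an assumption, not the presentation's own formulation), each instruction can be reduced to its opcode and the list of register numbers it names. The function name `encode_block` is hypothetical.

```python
import re


def encode_block(lines):
    """Map each assembly line to (opcode, [register numbers]).

    Literals such as the 0 in "(r1)0" are dropped in this sketch.
    """
    rows = []
    for line in lines:
        opcode = line.split()[0]
        regs = [int(n) for n in re.findall(r"r(\d+)", line)]
        rows.append((opcode, regs))
    return rows


block = [
    "ld r5,(r1)0",
    "add r3, r2, r5",
    "add r4, r3, r2",
    "mul r3, r4, r3",
    "st r3, (r7)10",
]
for row in encode_block(block):
    print(row)
```

A fuller encoding would also keep the literal fields, since the slides mention literals as part of the problem.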

Heuristic Solution

- Solving this problem for multiple basic blocks and literals is NP-complete (Traveling Salesman Problem)
- An effective, efficient heuristic solution for register name adjustment (RNA) requires two steps:
  - Register PerturBation (RPB): maximizes the distribution skew of register pairs
  - Register PermuTation (RPT): uses the frequencies of register pairs to minimize Hamming distance

Register PerturBation

- Commutativity Transformation
  - Applies to add, mul, and, or operations
  - No side effects on code performance
  - Linear time complexity
- Dead Register Reassignment
  - Linear time complexity

| Before | After |
| --- | --- |
| r1 ← r2, r3 | r1 ← r2, r3 |
| r4 ← r1, r2 | r2 ← r1, r2 |
| r2 ← r3, r4 | r2 ← r3, r2 |
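The commutativity transformation can be sketched as a single linear pass. This is a greedy illustration under stated assumptions, not the presentation's exact algorithm: instructions are modeled as `(opcode, dst, src1, src2)` tuples, and each commutative instruction's sources are swapped when that lowers the Hamming distance to the previous instruction's source fields. The name `apply_commutativity` is hypothetical.

```python
COMMUTATIVE = {"add", "mul", "and", "or"}  # operations listed on the slide


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two register numbers."""
    return bin(a ^ b).count("1")


def apply_commutativity(instrs):
    """Swap source operands of commutative instructions when it reduces
    bit transitions relative to the previously emitted instruction."""
    out = [instrs[0]]
    for op, dst, s1, s2 in instrs[1:]:
        _, _, p1, p2 = out[-1]
        keep = hamming(p1, s1) + hamming(p2, s2)
        swap = hamming(p1, s2) + hamming(p2, s1)
        if op in COMMUTATIVE and swap < keep:
            s1, s2 = s2, s1  # semantics unchanged: a op b == b op a
        out.append((op, dst, s1, s2))
    return out


code = [
    ("add", 3, 2, 5),
    ("add", 4, 5, 2),  # sources can be swapped to match the line above
    ("sub", 6, 5, 2),  # not commutative, must stay as written
]
print(apply_commutativity(code))
```

Because each instruction is visited once and compared only with its predecessor, the pass runs in time linear in the number of instructions, consistent with the complexity stated on the slide.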

Register PermuTation

- Captures the utilization frequency of register/literal pairs by means of a Register Histogram Graph (RHG)
  - Directed graph
  - Nodes = registers/literals
  - An edge connects two nodes whose registers are consecutive in the code
- Iterative search finds the optimal encoding between each pair
  - Complexity of $O(|E| \cdot |R|^2)$, where $R$ is the set of all registers and $E$ is the set of edges
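A small sketch of the RHG idea follows. It counts how often one register follows another in the same instruction field, then looks for a better register encoding. The search shown is a simple pairwise-swap hill climb chosen for illustration; the slides do not spell out the authors' iterative search, so the function names and the search strategy are assumptions.

```python
from collections import Counter


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two encodings."""
    return bin(a ^ b).count("1")


def build_rhg(instrs):
    """Directed edge (a, b) -> how often register a is followed by b in the same field."""
    edges = Counter()
    for prev, cur in zip(instrs, instrs[1:]):
        for a, b in zip(prev, cur):
            if a != b:
                edges[(a, b)] += 1
    return edges


def cost(edges, encoding):
    """Total bit transitions implied by the edge frequencies under an encoding."""
    return sum(w * hamming(encoding[a], encoding[b]) for (a, b), w in edges.items())


def permute(edges, registers):
    """Hill climb: swap the encodings of two registers whenever that lowers the cost."""
    encoding = {r: r for r in registers}  # start from the identity encoding
    best = cost(edges, encoding)
    improved = True
    while improved:
        improved = False
        for i, a in enumerate(registers):
            for b in registers[i + 1:]:
                encoding[a], encoding[b] = encoding[b], encoding[a]
                c = cost(edges, encoding)
                if c < best:
                    best, improved = c, True
                else:  # undo a swap that did not help
                    encoding[a], encoding[b] = encoding[b], encoding[a]
    return encoding, best


code = [(3, 2, 4), (6, 3, 5), (3, 2, 6), (4, 4, 5)]
rhg = build_rhg(code)
start = cost(rhg, {r: r for r in range(8)})
encoding, final = permute(rhg, list(range(8)))
print(start, final, encoding)
```

One pass over all register pairs evaluates on the order of |R|² swaps at a cost of |E| each, which is where a bound of the form O(|E|·|R|²) per pass comes from in this sketch. A real tool would also have to exclude registers with fixed roles from the permutation.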

Application of Heuristic

- Applied primarily to major application loops
- Special care is taken to preserve def-use chains between loops
  - Adds a trivial number of instructions at "hot spots"
- Profile information may be used to prioritize the order in which basic blocks are visited
- Can be used within a compilation system or as a stand-alone tool operating on binary code

Experimental Results

- Used a modified version of SimpleScalar
  - Made a control flow graph (CFG) for each "hot spot"
  - Generated basic block frequencies
- Encapsulated RPB and RPT into a stand-alone module
  - Input the generated CFG into this module
- Ran the module on six different benchmarks
  - RPT improvement as high as 25%
  - RPB improvement as high as 44%

(How they supported their findings)

Conclusions

- Presented a compiler-driven, power-aware register name adjustment (RNA) algorithm
  - Formally defined as NP-complete
  - Two efficient heuristics for attacking the problem
    - RPB: commutativity and dead register reassignment
    - RPT: register pair frequencies and remappings
- Significant power improvements resulting from compiler-based optimization (no additional hardware support needed)
  - Independent of ISA
  - Easily integrated within any compilation framework