Presentation is loading. Please wait.

Presentation is loading. Please wait.

Extensions to Structure Layout Optimizations in the Open64 Compiler Michael Lai AMD.

Similar presentations


Presentation on theme: "Extensions to Structure Layout Optimizations in the Open64 Compiler Michael Lai AMD."— Presentation transcript:

1 Extensions to Structure Layout Optimizations in the Open64 Compiler Michael Lai AMD

2 Related Work Structure splitting, structure peeling, structure field reordering (Hagog & Tice, Hundt, Mannarswamy & Chakrabarti) Above implemented in the Open64 Compiler (Chakrabarti & Chow) Structure instance interleaving (Truong, Bodin & Seznec) Data splitting (Curial, Zhao & Amaral) Array reshaping (Zhao, Cui, Gao, Silvera & Amaral)

3 Current Framework source WHIRL.o WHIRL ipa_link frontend ipl

4 Instance Interleaving a[0].field_1 a[0].field_2 an “instance” of the structure a[0].field_3 a[1].field_1 a[1].field_2 another “instance” of the structure a[1].field_3...

5 Instance Interleaving a[0].field_1 field_1 of all the instances are a[1].field_1 interleaved together … a[0].field_2 field_2 of all the instances are a[1].field_2 interleaved together … a[0].field_3 field_3 of all the instances are a[1].field_3 interleaved together...

6 Instance Interleaving array[0].field_1 array[0].field_2 array[0].field_3 … array[0].field_m array[1].field_1 array[1].field_2 array[1].field_3 … array[1].field_m … array[n-1].field_1 array[n-1].field_2 array[n-1].field_3 … array[n-1].field_m array[0].field_1 array[1].field_1 array[2].field_1 … array[n-1].field_1 array[0].field_2 array[1].field_2 array[2].field_2 … array[n-1].field_2 … array[0].field_m array[1].field_m array[2].field_m … array[n-1].field_m

7 Implementation Profitability analysis (done in ipl) – During ipl compilation of each source file, access patterns of structure fields are analyzed and their usage statistics recorded – After all the functions have been compiled by ipl, the “most likely to benefit” structure (if any) is marked and passed to ipo – (By way of illustration, the ideal structure is one with many fields, each of which appearing in its own hot loop)

8 Implementation Legality analysis (done in ipo) – Usual checking for address taken, escaped types, etc. Code transformation (done in ipo) – Create internal pointers ptr_1, ptr_2, …, ptr_m to keep track of the m locations array[0].field_1, array[0].field_2, …, array[0].field_m – Rewrite array[i].field_j to ptr_j[i], if “i” is known; otherwise, incur additional overhead to compute “i”

9 Instance Interleaving array[0].field_1 array[0].field_2 array[0].field_3 … array[0].field_m array[1].field_1 array[1].field_2 array[1].field_3 … array[1].field_m … array[n-1].field_1 array[n-1].field_2 array[n-1].field_3 … array[n-1].field_m array[0].field_1 array[1].field_1 array[2].field_1 … array[n-1].field_1 array[0].field_2 array[1].field_2 array[2].field_2 … array[n-1].field_2 … array[0].field_m array[1].field_m array[2].field_m … array[n-1].field_m = ptr_1 = ptr_2 = ptr_m array[i].field_j becomes ptr_j[i]

10 Array Remapping field_1 field_2 field_3 … field_m field_1 field_2 field_3 … field_m … field_1 field_2 field_3 … field_m field_1 … field_1 field_2 … field_2 … field_m … field_m iteration 0 iteration 1 iteration n-1 iteration 0 iteration 1 iteration 2 iteration n-1 iteration 0 iteration 1 iteration 2 iteration n-1 iteration 0 iteration 1 iteration 2 iteration n-1 a[0] a[1] a[2] … a[m-1] a[m] a[m+1] a[m+2] … a[2m-1] … a[(n-1)m] a[(n-1)m+1] a[(n-1)m+2] … a[nm-1] a[0] a[1] a[2] … a[n-1] a[n] a[n+1] a[n+2] … a[2n-1] … a[(m-1)n] a[(m-1)n+1] a[(m-1)n+2] … a[mn-1]

11 Implementation Profitability analysis (done in ipl) – During ipl compilation of each source file, discover if there are arrays that behave like structures and suffer poor data cache utilization at the same time – After all the functions have been compiled by ipl, the “most likely to benefit” arrays (if any) are marked and passed to ipo – For each of these arrays, record the stride, group size, and array size associated with it

12 Implementation Legality analysis (done in ipo) – Check for array aliasing, address taken, argument passing, etc. Code transformation (done in ipo) – Construct the array remapping permutation alpha(i) = (i % m) * n + (i / m), where m is the group size and n is the number of such groups – Rewrite a[i] to a[alpha(i)]

13 Array Remapping field_1 field_2 field_3 … field_m field_1 field_2 field_3 … field_m … field_1 field_2 field_3 … field_m field_1 … field_1 field_2 … field_2 … field_m … field_m iteration 0 iteration 1 iteration n-1 iteration 0 iteration 1 iteration 2 iteration n-1 iteration 0 iteration 1 iteration 2 iteration n-1 iteration 0 iteration 1 iteration 2 iteration n-1 a[0] a[1] a[2] … a[m-1] a[m] a[m+1] a[m+2] … a[2m-1] … a[(n-1)m] a[(n-1)m+1] a[(n-1)m+2] … a[nm-1] a[0] a[1] a[2] … a[n-1] a[n] a[n+1] a[n+2] … a[2n-1] … a[(m-1)n] a[(m-1)n+1] a[(m-1)n+2] … a[mn-1] a[i] becomes a[(i%m)*n+(i/m)]

14 Performance Results AMD systemspeed (1-copy) runrate (12-copy) run 462.libquantum (structure peeling) +6.35%+43.43% 429.mcf (instance interleaving) +2.43%+38.38% 470.lbm (array remapping)-16.35% (degradation)+138.55% Intel systemspeed (1-copy) runrate (4-copy) run 462.libquantum (structure peeling) +7.01%+24.30% 429.mcf (instance interleaving) -6.04% (degradation)+34.62% 470.lbm (array remapping)-23.28% (degradation)+119.51%

15 Future Work Integrate existing structure layout optimizations with the new structure instance interleaving work Combine profitability heuristics of all structure layout optimizations Extend structure instance interleaving optimization to more than one structure Extend array remapping optimization to multi- dimensional arrays

16 References 1.G. Chakrabarti and F. Chow. “Structure Layout Optimizations in the Open64 Compiler.” Proceedings of the Open64 Workshop, Boston, 2008. 2.M. Hagog and C. Tice. “Cache Aware Data Layout Reorganization Optimization in gcc.” Proceedings of the gcc Developers Summit, 2005. 3.R. Hundt, S. Mannarswamy, and D.R. Chakrabarti. “Practical Structure Layout Optimization and Advice.” Proceedings of the International Symposium on Code Generation and Optimization, New York, 2006. 4.D.N. Truong, F. Bodin, and A. Seznec. “Improving Cache Behavior of Dynamically Allocated Data Structures.” Proceedings of the International Conference on Parallel Architectures and Compilation Techniques, Washington D.C., 1998.


Download ppt "Extensions to Structure Layout Optimizations in the Open64 Compiler Michael Lai AMD."

Similar presentations


Ads by Google