Structure Layout Optimizations in the Open64 Compiler: Design, Implementation and Measurements
Gautam Chakrabarti and Fred Chow
PathScale, LLC.


2 Outline (Open64 Workshop 2008)
- Motivation
- Types of structure layout optimizations
- Criteria for structure layout optimizations
- Implementation details
- Performance results
- Future work
- Conclusion

3 Motivation
- Poor data locality in many applications
- High data cache miss rates
- Growing gap between processor and memory speeds
Our Approach
- Change layout of data structures
- Requires whole-program optimization
- Use Inter-Procedural Analysis and Optimizations (IPA)
Our Aim
- Make applications more cache-friendly

4 IPA
- Summarization
- Analysis
- Optimization

5 Types of Structure Layout Optimizations
- Structure splitting
    struct struct_A {
        double d1;
        double d2;
        int i;
        float f;
        long long l;
        char c;
        struct struct_A * next;
    };
- Structure peeling
    struct struct_A {
        double d1;
        double d2;
        int i;
        float f;
        long long l;
        char c;
    };

6 Structure Splitting Example
Original:
    struct struct_A {
        double d1;
        double d2;
        int i;
        float f;
        long long l;
        char c;
        struct struct_A * next;
    };
After splitting:
    struct new_struct_A {
        double d1;
        int i;
        long long l;
        struct new_struct_A * next;
        struct cold_sub_struct_A * p;
    };
    struct cold_sub_struct_A {
        double d2;
        float f;
        char c;
    };
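As a rough illustration of why the split layout helps, here is a minimal C sketch that uses the two structures from the slide. The helpers new_node and sum_hot are hypothetical names introduced for this example and are not part of Open64. The sketch assumes that d1, i and l are the hot fields and that the cold part is allocated separately and reached through p.

```c
#include <stdlib.h>

/* Cold fields, moved out of the hot node (layout taken from the slide). */
struct cold_sub_struct_A {
    double d2;
    float f;
    char c;
};

/* Hot fields stay together; p links each node to its cold part. */
struct new_struct_A {
    double d1;
    int i;
    long long l;
    struct new_struct_A *next;
    struct cold_sub_struct_A *p;
};

/* Hypothetical helper: allocate a hot node together with its cold part. */
struct new_struct_A *new_node(double d1, int i, long long l,
                              struct new_struct_A *next)
{
    struct new_struct_A *n = malloc(sizeof *n);
    if (n == NULL)
        return NULL;
    n->p = calloc(1, sizeof *n->p);
    if (n->p == NULL) {
        free(n);
        return NULL;
    }
    n->d1 = d1;
    n->i = i;
    n->l = l;
    n->next = next;
    return n;
}

/* Hypothetical hot loop: walks the list and reads only hot fields,
 * so it never dereferences p or touches d2, f or c. */
double sum_hot(const struct new_struct_A *head)
{
    double s = 0.0;
    for (; head != NULL; head = head->next)
        s += head->d1 + head->i + (double)head->l;
    return s;
}
```

Each list node visited by sum_hot is smaller than an original struct_A node, so more nodes fit in each cache line; the cost is one extra pointer per node and an extra indirection whenever a cold field is needed.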

7 Structure Peeling Example
Original:
    struct struct_A {
        double d1;
        double d2;
        int i;
        float f;
        long long l;
        char c;
    };
After peeling:
    struct new_struct_A {
        double d1;
        int i;
        long long l;
    };
    struct cold_sub_struct_A {
        double d2;
        float f;
        char c;
    };
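The peeled layout can be pictured as two parallel arrays. The following is a small sketch under that assumption: element k of the original array becomes the pair hot[k] and cold[k]. The function sum_d1 is a hypothetical example loop, not code from the presentation.

```c
#include <stddef.h>

/* Original layout, kept here only for the size comparison. */
struct struct_A {
    double d1;
    double d2;
    int i;
    float f;
    long long l;
    char c;
};

/* Peeled layout from the slide: no link pointer is needed because
 * the struct is not recursive. */
struct new_struct_A {
    double d1;
    int i;
    long long l;
};

struct cold_sub_struct_A {
    double d2;
    float f;
    char c;
};

/* Hypothetical hot loop over the peeled array: it streams through
 * hot elements only and never loads d2, f or c. */
double sum_d1(const struct new_struct_A *hot, size_t n)
{
    double s = 0.0;
    for (size_t k = 0; k < n; k++)
        s += hot[k].d1;
    return s;
}
```

Because new_struct_A is smaller than struct_A, the same loop pulls fewer bytes through the cache per element. Exact sizes depend on the ABI, so the sketch does not state them.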

8 Criteria for structure layout optimizations
Legality Analysis
- Type cast
- Address of a field is taken
- Escaped types
- Parameter types
- Full visibility to IPA
- Alignment restrictions
Profitability Analysis
- Hotness
- Affinity
- Field accesses at loop level
- Size
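To make two of the legality criteria concrete, the sketch below shows ordinary C code that depends on the original layout of struct_A. The function names are hypothetical. Whether Open64 rejects exactly these patterns is an assumption; the slide only names the categories.

```c
#include <stddef.h>

struct struct_A {
    double d1;
    double d2;
    int i;
    float f;
    long long l;
    char c;
};

/* Type cast: the struct pointer is cast to a byte pointer and d2 is
 * read at its original offset. If d2 moved to another struct, this
 * address computation would no longer find it. */
double read_d2_by_offset(const struct struct_A *a)
{
    const char *bytes = (const char *)a;
    return *(const double *)(bytes + offsetof(struct struct_A, d2));
}

/* Address of a field is taken: the returned pointer can be stored or
 * passed anywhere, so the compiler can no longer see every access
 * to f as a field access. */
float *field_addr(struct struct_A *a)
{
    return &a->f;
}
```

Both functions are valid C for the original layout, which is the point: a layout transformation has to either prove such code is absent or leave the type alone.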

9 Implementation Details
Step 1: Type information summarization (IPL)
Step 2: Symbol table merging (IPA)
Step 3: Legality and profitability analysis (IPA analysis)
Step 4: Transforming the program (IPA optimization)

10 Implementation Details: Type information summarization
- Information summarization in IPL
- Framework for computing static profiles using heuristics
- New TY flag TY_NO_SPLIT
- SUMMARY_TY_INFO
- SUMMARY_LOOP
    - For each DO_LOOP, WHILE_DO, DO_WHILE
    - Bit-vector to track field accesses of up to N structures for each loop
    - Considers field accesses immediately inside loop
    - These fields are considered affine to each other
    - Execution count of statements immediately inside loop
    - From statically estimated profiles or from runtime feedback
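A per-loop bit-vector of field accesses could be represented roughly as follows. This is a sketch only: struct loop_summary and both functions are hypothetical and are merely modeled on the slide's description of SUMMARY_LOOP. The actual Open64 data structure is not shown in the presentation. The sketch assumes one structure per record and at most 32 fields, numbered from 1.

```c
#include <stdint.h>

/* Hypothetical per-loop record, loosely modeled on SUMMARY_LOOP. */
struct loop_summary {
    uint32_t field_bv;    /* bit (j - 1) set: field Fj is accessed
                             immediately inside the loop */
    uint64_t exec_count;  /* estimated or measured execution count */
};

/* Mark field number field_no (1-based, at most 32) as accessed. */
void record_field_access(struct loop_summary *ls, unsigned field_no)
{
    ls->field_bv |= (uint32_t)1 << (field_no - 1);
}

/* Two fields are treated as affine when the same loop accesses both. */
int fields_affine(const struct loop_summary *ls, unsigned a, unsigned b)
{
    uint32_t mask = ((uint32_t)1 << (a - 1)) | ((uint32_t)1 << (b - 1));
    return (ls->field_bv & mask) == mask;
}
```

Recording accesses to F1 and F3 in one loop yields the bit pattern 0101 when written in the order F4 F3 F2 F1.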

11 Implementation Details: IPA Analysis
- Inter-procedurally update statically estimated execution count of PUs
- Update statically estimated loop frequencies in SUMMARY_LOOP
- Consider SUMMARY_LOOP from the hottest P PUs
- Determine candidates for structure-layout transformation
- Determine new layout of structures

12 Implementation Details: IPA Analysis Example
Loop   Count   BV (F4 F3 F2 F1)
L1     22      0  1  0  1
L2     14      0  0  1  0
L3     12      0  1  0  1
L4     88      1  1  0  0
L5     6       0  1  0  1
Group  Count   BV (F4 F3 F2 F1)
AG1    40      0  1  0  1
AG2    14      0  0  1  0
AG3    88      1  1  0  0
Li: loops; Fj: fields in a struct; AGk: affinity groups
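The example can be reproduced with a short C sketch. It assumes that an affinity group collects all loops with an identical bit-vector and that the group's count is the sum of those loops' counts. The example's numbers are consistent with that rule, but the slide does not state it. All type and function names here are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

struct loop_info {
    uint32_t bv;     /* field bit-vector, bit 0 = F1 */
    unsigned count;  /* loop execution count */
};

struct affinity_group {
    uint32_t bv;      /* fields that belong to the group */
    unsigned weight;  /* summed count of the contributing loops */
};

/* Merge loops that share a bit-vector. Groups are written to out in
 * order of first appearance; out must have room for n entries.
 * Returns the number of groups produced. */
size_t build_groups(const struct loop_info *loops, size_t n,
                    struct affinity_group *out)
{
    size_t ngroups = 0;
    for (size_t i = 0; i < n; i++) {
        size_t g = 0;
        while (g < ngroups && out[g].bv != loops[i].bv)
            g++;
        if (g == ngroups) {          /* first loop with this bit-vector */
            out[g].bv = loops[i].bv;
            out[g].weight = 0;
            ngroups++;
        }
        out[g].weight += loops[i].count;
    }
    return ngroups;
}
```

Applied to five loops with bit-vectors 0101, 0010, 0101, 1100 and 0101, this produces three groups, one per distinct bit-vector, which corresponds to AG 1 through AG 3 in the example.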

13 Implementation Details: Transforming the program
- New type definitions
- Field table update
- Field access statements
- New symbols
- Assignment statements
Example:
    struct S {
        // N fields
        struct T * p;
        // M fields
    };
    struct T {
        // AG1 fields
        // AG2 fields
    };
    // peel T
    struct S {
        // N fields
        struct T1 * p1;
        struct T2 * p2;
        // M fields
    };
    struct T1 {
        // AG1 fields
    };
    struct T2 {
        // AG2 fields
    };

14 Implementation Details: Transforming the program (continued)
Function calls to memory management routines
Example:
    p = (T *) malloc (N * sizeof (T));
    if (p == NULL) exit (1);
- Detect memory management routine calls involving transformed type T
- Replicate call, assignment statements
- Update size of memory being allocated
- Handle comparisons involving pointer p
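As an illustration of what the replicated allocation might look like after T is peeled into T1 and T2, consider the sketch below. It is written by hand, not produced by the compiler. The field contents of T1 and T2 are stand-ins, alloc_peeled is a hypothetical name, and the function returns an error code where the slide's example calls exit (1), so that the caller decides how to react.

```c
#include <stdlib.h>

struct T1 { double d1; int i; };   /* stand-in for the AG1 fields */
struct T2 { double d2; char c; };  /* stand-in for the AG2 fields */

/* Hand-written equivalent of
 *     p = (T *) malloc (N * sizeof (T));
 *     if (p == NULL) exit (1);
 * after peeling: the call is replicated once per new type, each size
 * uses the new element type, and the NULL comparison on p becomes a
 * comparison on both new pointers. Returns 0 on success, -1 on failure. */
int alloc_peeled(size_t n, struct T1 **p1, struct T2 **p2)
{
    *p1 = malloc(n * sizeof **p1);
    *p2 = malloc(n * sizeof **p2);
    if (*p1 == NULL || *p2 == NULL) {
        free(*p1);   /* free(NULL) is a no-op */
        free(*p2);
        *p1 = NULL;
        *p2 = NULL;
        return -1;
    }
    return 0;
}
```

An access that was written p[k].i in the original program would then read p1[k].i, and p[k].c would read p2[k].c.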

15 Performance Results
Compilation options: -Ofast at 32-bit ABI
Speedup due to structure layout optimizations
Benchmark        AMD Opteron™        AMD Barcelona         Intel® EM64T        Intel® Core™        SiCortex MIPS®        Geometric
                 (2.8GHz, 4GB, 1MB)  (2.0GHz, 8GB, 512KB)  (3.4GHz, 4GB, 1MB)  (3.0GHz, 4GB, 4MB)  (500MHz, 4GB, 256KB)  Mean
179.art          134%                66%                   56%                 47%                 41%                   62.5%
181.mcf          24%                 23%                   23%                 31%                 13%                   22.0%
462.libquantum   32%                 17%                   40%                 72%                 62%                   39.6%
Geometric Mean   46.9%               29.6%                 37.2%               47.2%               32.1%                 37.9%

16 Performance Results (continued)
Compilation options: -Ofast at 64-bit ABI
Speedup due to structure layout optimizations
Benchmark        AMD Opteron™        AMD Barcelona         Intel® EM64T        Intel® Core™        SiCortex MIPS®        Geometric
                 (2.8GHz, 4GB, 1MB)  (2.0GHz, 8GB, 512KB)  (3.4GHz, 4GB, 1MB)  (3.0GHz, 4GB, 4MB)  (500MHz, 4GB, 256KB)  Mean
179.art          169%                66%                   53%                 60%                 45%                   69.3%
181.mcf          25%                 35%                   12%                 30%                 7%                    18.6%
462.libquantum   82%                 51%                   75%                 70%                 69%                   68.6%
Geometric Mean   70.2%               49.0%                 36.3%               50.1%               27.9%                 44.6%

17 Performance Results (continued)
Compilation options: -Ofast at 64-bit ABI
Multiple copies of 462.libquantum running on multi-core chip
Platform: Quad-core AMD Barcelona (2.0GHz, 8GB, 512KB, 2MB)
3rd-level cache shared among 4 cores
Speedup from structure layout optimizations
Benchmark        1 copy   2 copies   4 copies
462.libquantum   51%      69%        123%

18 Future Work
- Tune static profile estimation
- Fewer restrictions
- Integrate with field-reordering

19 Conclusion
- A framework for performing structure layout transformations is now available in the Open64 compiler.
- The superior infrastructure in the Open64 compiler helped us implement the optimizations cleanly and with relatively little effort.
- Substantial speedups are possible on some of the SPEC CPU2000 and CPU2006 benchmarks.
- Structure layout optimization is a required feature for a compiler to remain competitive.

