1 Evaluation of Offset Assignment Heuristics Johnny Huynh, Jose Nelson Amaral, Paul Berube University of Alberta, Canada Sid-Ahmed-Ali Touati Universite de Versailles, France

2 Outline Background; Traditional Approach to Offset Assignment (Simple Offset Assignment, Address-Register Assignment); Improving the Problem Model (Optimal Address-Code Generation, Memory Layout Permutations); Evaluating Current Heuristics (Methodology, Results); Conclusions and Future Work

4 Background Digital Signal Processors (DSPs) have few general-purpose registers. Program variables are kept in memory. Address Registers (ARs) are used to access variables. After a variable is accessed, the AR can be auto-incremented (or auto-decremented) by one word in the same cycle.

5 Processor Model Texas Instruments TMS320C54X DSP family: an accumulator-based DSP with 8 address registers. Initializing an address register requires 2 cycles of overhead. Explicit address computations require 1 cycle of overhead. Using auto-increment (or auto-decrement) has no overhead.

6 Processor Model Example: add ‘A’ and ‘B’, store in accumulator
Explicit address computation (memory layout A, C, B at 0x1000, 0x1001, 0x1002):
    $AR0 = &A
    $ACC = *$AR0
    $AR0 = $AR0 + 2
    $ACC += *$AR0
Auto-increment (memory layout A, B, C at 0x1000, 0x1001, 0x1002):
    $AR0 = &A
    $ACC = *$AR0++
    $ACC += *$AR0
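As a rough illustration of this cost model, the sketch below counts overhead cycles for a single address register walking an access sequence over a given layout. It assumes the cycle costs stated on the processor-model slide (2 to initialize, 1 per explicit address computation, 0 for auto-increment or auto-decrement) and additionally assumes that re-accessing the same word is free. The function name `single_register_overhead` and the encoding of a layout as a string of variable names are illustrative choices, not part of the presentation.

```python
INIT_CYCLES = 2      # initializing an address register
EXPLICIT_CYCLES = 1  # explicit address computation (jump of more than one word)


def single_register_overhead(accesses, layout):
    """Overhead cycles when one address register serves every access.

    accesses: sequence of variable names in access order
    layout:   sequence of variable names in memory order (one word each)
    """
    address = {var: i for i, var in enumerate(layout)}
    overhead = INIT_CYCLES              # point the register at the first variable
    current = address[accesses[0]]
    for var in accesses[1:]:
        target = address[var]
        # A neighbouring word is reached by auto-increment/decrement for free;
        # anything further away needs an explicit address computation.
        if abs(target - current) > 1:
            overhead += EXPLICIT_CYCLES
        current = target
    return overhead


# The two layouts from the example: B is two words away from A in "ACB",
# but adjacent to A in "ABC".
print(single_register_overhead("AB", "ACB"))  # 3
print(single_register_overhead("AB", "ABC"))  # 2
```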

8 The Offset-Assignment Problem Given k address registers and a basic block accessing n variables, find a memory layout that minimizes address-computation overhead. How should the variables be placed in memory? Which register should access each variable?

10 Traditional Approach to Offset Assignment [Flow diagram: Basic Block -> Generate Access Sequence -> Access Sequence -> Address Register Assignment -> Sub-Sequences -> Simple Offset Assignment (one per sub-sequence) -> Sub-Layouts -> Address-Code Generation -> Address-Computation Overhead]

11 Traditional Approach: Simple Offset Assignment (SOA) In 1992, Bartley introduced the simplest form of the offset assignment problem: given a single address register and a basic block with n variables, find a memory layout that minimizes overhead. This is equivalent to finding a maximum-weight path cover (NP-complete). Many researchers have proposed heuristics for this problem: Liao et al. (1996), Leupers and Marwedel (1996), Sugino et al. (1996).

12 Simple Offset Assignment (SOA) Fix the access sequence. Assume only one address register (k = 1). Find an ordering of variables in memory (memory layout) that has minimum overhead. Ex. Access Sequence: ‘a d b e c f b e c f a d’

14 Simple Offset Assignment (SOA) Create the access graph G = (V, E), where V is the set of variables and the weight of an edge is the frequency of consecutive accesses to its two endpoints. A path defines a memory layout, so the task is to find the maximum-weight path cover (NP-complete!). Ex. Access Sequence: ‘a d b e c f b e c f a d’ Memory Layout: d a f c e b [Figure: access graph over variables a to f; the edges drawn have weight 2.]
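The access-graph construction above can be sketched as follows. The path cover here is built with a simple greedy rule (take the heaviest edges first, skipping any edge that would give a variable three neighbours or close a cycle), which is in the spirit of the greedy SOA heuristics cited earlier but is not claimed to reproduce any one of them exactly. All function names, and the alphabetical tie-breaking, are assumptions made for this illustration. Cost is counted as the number of consecutive accesses that are not adjacent in the layout, ignoring register initialization.

```python
from collections import Counter


def access_graph(accesses):
    """Edge weights: how often two distinct variables are accessed consecutively."""
    weights = Counter()
    for u, v in zip(accesses, accesses[1:]):
        if u != v:
            weights[frozenset((u, v))] += 1
    return weights


def greedy_layout(accesses):
    """Greedy path cover: heaviest edges first, keeping every component a path."""
    variables = sorted(set(accesses))
    neighbours = {v: [] for v in variables}
    parent = {v: v for v in variables}  # union-find, used to reject cycles

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    edges = sorted(access_graph(accesses).items(),
                   key=lambda item: (-item[1], sorted(item[0])))
    for edge, _weight in edges:
        u, v = sorted(edge)
        if len(neighbours[u]) < 2 and len(neighbours[v]) < 2 and find(u) != find(v):
            parent[find(u)] = find(v)
            neighbours[u].append(v)
            neighbours[v].append(u)

    # Walk every path from one of its endpoints and concatenate the paths.
    layout, seen = [], set()
    for start in variables:
        if start in seen or len(neighbours[start]) == 2:
            continue
        prev, cur = None, start
        while cur is not None:
            layout.append(cur)
            seen.add(cur)
            onward = [n for n in neighbours[cur] if n != prev]
            prev, cur = cur, (onward[0] if onward else None)
    return "".join(layout)


def soa_cost(accesses, layout):
    """Number of transitions that need an explicit address computation."""
    pos = {v: i for i, v in enumerate(layout)}
    return sum(1 for u, v in zip(accesses, accesses[1:])
               if abs(pos[u] - pos[v]) > 1)


sequence = "adbecfbecfad"
print(soa_cost(sequence, "dafceb"))  # 2
```

On the example sequence, the layout from the slide leaves only the transitions d to b and f to b non-adjacent, which is where the cost of 2 comes from.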

15 Traditional Approach: General Offset Assignment (GOA) Problem presented by Liao et al. in 1996. Given k address registers and a basic block with n variables, find an assignment of variables to address registers that minimizes the total overhead of all registers. This problem formulation is more accurately described as Address-Register Assignment (ARA). It consists of SOA problems, and is therefore at least NP-hard. Many researchers have proposed heuristics for address-register assignment: Leupers and Marwedel (1996), Sugino et al. (1996), Zhuang et al. (2003).

16 General Offset Assignment (GOA) Fix the access sequence. Allow multiple address registers (k > 1). Find an ordering of variables in memory (memory layout) that has minimum overhead. Assign each variable to an address register to form access sub-sequences. Ex. Access Sequence: ‘a d b e c f b e c f a d’ Sub-sequence 1: ‘a b c b c a’ Sub-sequence 2: ‘d e f e f d’

18 General Offset Assignment (GOA) Ex. Access Sequence: ‘a d b e c f b e c f a d’ Memory Layouts: abc, def. Each sub-sequence can be viewed, and solved, as an independent SOA problem. It is more appropriate to call this problem the Address Register Assignment (ARA) problem. It requires solving SOA instances, so it is at least NP-hard.
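The decomposition described here can be sketched in a few lines: project the access sequence onto each register's variables, then cost each sub-sequence against its own sub-layout. The register names `AR0` and `AR1`, the dictionary encoding of the assignment, and both function names are illustrative assumptions; the cost counts explicit address computations only and leaves register initialization out.

```python
def split_by_register(accesses, register_of):
    """Project the access sequence onto each register, preserving order."""
    subsequences = {}
    for var in accesses:
        subsequences.setdefault(register_of[var], []).append(var)
    return {reg: "".join(seq) for reg, seq in subsequences.items()}


def explicit_computations(subsequence, sub_layout):
    """Transitions in one sub-sequence that are not adjacent in its sub-layout."""
    position = {var: i for i, var in enumerate(sub_layout)}
    return sum(1 for u, v in zip(subsequence, subsequence[1:])
               if abs(position[u] - position[v]) > 1)


accesses = "adbecfbecfad"
register_of = {"a": "AR0", "b": "AR0", "c": "AR0",
               "d": "AR1", "e": "AR1", "f": "AR1"}
sub_layouts = {"AR0": "abc", "AR1": "def"}

subs = split_by_register(accesses, register_of)
total = sum(explicit_computations(subs[reg], sub_layouts[reg]) for reg in subs)
print(subs)   # {'AR0': 'abcbca', 'AR1': 'defefd'}
print(total)  # 2
```

Each register pays for exactly one non-adjacent jump in this example (c back to a, and f back to d).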

26 Address-Code Generation Recall that variables are assigned to address registers. There is nothing left to decide: each address register has a defined sequence of accesses. This imposes the restriction that all accesses to a variable are performed by a single address register. Ex. Access Sequence: ‘a d b e c f b e c f a d’ Memory Layouts: abc (accessed by AR0), def (accessed by AR1). [Figure: step-by-step walk of AR0 and AR1 through the access sequence; steps marked * require explicit address computations.]

27 Traditional Approach to Offset Assignment (example) Access sequence ‘a d b e c f b e c f a d’ -> Address Register Assignment -> sub-sequences ‘a b c b c a’ and ‘d e f e f d’ -> Simple Offset Assignment on each sub-sequence -> [a, b, c] (sub-sequence and memory layout accessed by AR0) and [d, e, f] (sub-sequence and memory layout accessed by AR1)

29 Optimal Address-Code Generation Given a fixed access sequence and memory layout, it is possible to generate optimal addressing-code in polynomial time: Minimum-Cost Circulation (Gebotys, 1997) Minimum-Weight Perfect Matching (Udayanarayanan, 2000)

30 Optimal Address-Code Generation Build a network-flow graph. Vertices represent variable accesses. For each access $a_i$ that occurs before another access $a_j$, there is an edge $(a_i, a_j)$ (not all are shown in the graph). Edges represent an opportunity for a register to access variables. Each unit of flow represents the accesses performed by one address register. Optimal address code is found by finding a minimum-cost circulation.
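For very small instances, the quantity that the circulation minimizes can be cross-checked by brute force. The sketch below is NOT the minimum-cost circulation formulation; it is an exhaustive dynamic program over register positions, offered only as a stand-in under an assumed cost model: 2 cycles to initialize a register, 1 cycle for an explicit address computation, and 0 cycles when the target is the same or a neighbouring word. Any register may serve any access. The function name `min_overhead` and the use of -1 for an uninitialized register are illustrative choices.

```python
INIT_COST = 2      # cycles to initialize an address register
EXPLICIT_COST = 1  # cycles for an explicit address computation


def min_overhead(accesses, layout, k):
    """Minimum overhead for a fixed access sequence and layout with k registers."""
    pos = {var: i for i, var in enumerate(layout)}

    # State: sorted tuple of register positions; -1 marks "not yet initialized".
    frontier = {tuple([-1] * k): 0}
    for var in accesses:
        target = pos[var]
        next_frontier = {}
        for state, cost in frontier.items():
            for r in range(k):
                current = state[r]
                if current == -1:
                    step = INIT_COST
                elif abs(current - target) <= 1:
                    step = 0          # auto-increment/decrement or same word
                else:
                    step = EXPLICIT_COST
                moved = state[:r] + (target,) + state[r + 1:]
                new_state = tuple(sorted(moved))  # registers are interchangeable
                total = cost + step
                if total < next_frontier.get(new_state, float("inf")):
                    next_frontier[new_state] = total
        frontier = next_frontier
    return min(frontier.values())


sequence = "adbecfbecfad"
print(min_overhead(sequence, "bcadef", k=2))  # 4
print(min_overhead(sequence, "abcdef", k=2))  # 5
```

With the layout b, c, a, d, e, f, two registers can cover the whole sequence using only auto-increment and auto-decrement, so the overhead is just the two initializations. The state space grows quickly with the number of registers and variables, which is why a polynomial-time formulation matters in practice.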

31 Traditional Approach to Offset Assignment [Flow diagram, annotated: Access Sequence -> Address Register Assignment (NP-hard) -> Sub-Sequences -> Simple Offset Assignment, one per sub-sequence (NP-complete) -> Sub-Layouts -> Address-Code Generation (solved, but not used!) -> Address-Computation Overhead]

32 Memory Layout Permutations (MLP) Since optimal address-code generation algorithms exist, they can be applied after a memory layout is formed (by traditional approaches). However, the traditional approach generates multiple sub-layouts that were originally assumed to be independent. How is a single memory layout formed from a set of sub-layouts?

33 Memory Layout Permutations Let $M_i$ be a memory sub-layout, and let $M_i^r$ be the reciprocal of $M_i$ (the same sub-layout in reverse order). Given an access sequence and $m$ memory sub-layouts, arrange $\{(M_1 \mid M_1^r), \ldots, (M_m \mid M_m^r)\}$ such that overhead is minimum when the sub-layouts are placed contiguously in memory.

34 Memory Layout Permutations Example
Access sequence: ‘a d b e c f b e c f a d’
Address Register Assignment (this is an optimal address register assignment): ‘a b c b c a’ and ‘d e f e f d’
Simple Offset Assignment (these are optimal simple offset assignments): {a, b, c} and {d, e, f}
Memory Layout Permutations (all possible permutations; all have cost > 4):
    [a, b, c, d, e, f], [f, e, d, c, b, a]
    [c, b, a, d, e, f], [f, e, d, a, b, c]
    [a, b, c, f, e, d], [d, e, f, c, b, a]
    [c, b, a, f, e, d], [d, e, f, a, b, c]
The optimal layout {b, c, a, d, e, f}, with cost = 4, is not found.
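Enumerating the memory layout permutations for a set of sub-layouts is mechanical: choose an order for the sub-layouts, then choose for each one whether it is placed forwards or reversed. The sketch below does this with the standard library; the function name `layout_permutations` and the string encoding of sub-layouts are assumptions made for illustration.

```python
from itertools import permutations, product


def layout_permutations(sub_layouts):
    """All contiguous layouts built from the sub-layouts or their reversals."""
    layouts = []
    for order in permutations(sub_layouts):
        for flips in product((False, True), repeat=len(order)):
            pieces = (sub[::-1] if flip else sub
                      for sub, flip in zip(order, flips))
            layouts.append("".join(pieces))
    return layouts


candidates = layout_permutations(["abc", "def"])
print(len(candidates))         # 8
print("bcadef" in candidates)  # False
```

For m sub-layouts this yields m! * 2^m candidates (fewer distinct ones if a sub-layout reads the same in both directions). The layout b, c, a, d, e, f is absent because it breaks up the sub-layout [a, b, c], which is exactly why the traditional approach cannot reach it.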

36 Experimental Methodology Evaluating the Solution Space Testcases are DSP code kernels from the UTDSP benchmark suite. gcc is used to obtain access sequences. The quality of a memory layout is evaluated using the minimum-cost circulation technique. The entire solution space is found for each access sequence, to be used as a point of reference.
[Flow diagram: Basic Block -> Compile with gcc -> Access Sequence -> Compute Overhead of All Layouts using Minimum-Cost Flow]
Kernel            Accesses  Variables  Possible # of layouts
iir_arr           21        8          20,160
iir_arr_swp       33        12         239,500,800
latnrm_arr_swp    30        10         1,814,400
latnrm_ptr        30        10         1,814,400
latnrm_ptr_swp    30        10         1,814,400
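The layout counts for 8 and 12 variables match n!/2, which is what one would expect if a layout and its mirror image are counted once (auto-increment and auto-decrement are symmetric, so reversing a layout should not change its overhead). That reading is an inference from the numbers, not something the slide states. A quick check, with the hypothetical helper name `distinct_layouts`:

```python
from math import factorial


def distinct_layouts(n_variables):
    """Number of memory layouts when a layout and its reverse count once."""
    return factorial(n_variables) // 2


for n in (8, 10, 12):
    print(n, distinct_layouts(n))
# 8 20160
# 10 1814400
# 12 239500800
```

The growth from roughly 20 thousand to roughly 240 million layouts between 8 and 12 variables shows why exhaustive evaluation is only feasible for small kernels.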

37 Experimental Methodology Evaluating Current Heuristics Identified and implemented three Address-Register Assignment heuristic algorithms: Leupers, Sugino, and Zhuang. [Flow diagram: Access Sequence -> ARA (Leupers, Sugino, Zhuang) -> Sub-Sequences -> SOA (Liao, Leupers, ALOMA, OFU, B&B) -> Sub-Layouts -> Memory Layout Permutations -> Memory Layouts -> Compute Overhead for each layout via Minimum-Cost Circulation -> Distribution of Overhead values]

38 Experimental Methodology Evaluating Current Heuristics Identified and implemented five Simple Offset Assignment heuristic algorithms: Liao, Leupers, ALOMA, Order-First Use (OFU), and Branch and Bound (B&B).

39 Experimental Methodology Evaluating Current Heuristics Each combination of ARA and SOA algorithm generates a set of sub-layouts. All possible memory layout permutations are generated, forming a set of memory layouts. Each memory layout is evaluated using the Minimum-Cost Circulation technique.

40 Results The 15 combinations of algorithms produce 15 distributions of overhead values. The distributions are aggregated into one distribution. The aggregate distribution represents the solution space of all current algorithms.

41 Results Memory layouts have a significant impact on overhead. Some layouts have 100% higher overhead than the minimum. Over 99% of all layouts have an overhead that is at least 50% higher than the minimum.

42 Results Memory layouts produced by traditional approaches have a large range of possible overhead values -- sometimes the same as the entire solution space itself. In some cases, no combination of ARA and SOA heuristics can produce an optimal layout.

44 Distribution of Overhead Values Testcase: iir_arr_swp -- infinite impulse response filter
Overhead (cycles)   Exhaustive     Algorithmic
6                   144            0
7                   19,557         72
8                   1,514,917      2,240
9                   21,757,157     6,516
10                  90,478,895     10,496
11                  104,101,226    2,565
12                  21,628,904     0
Average Overhead    10.51          9.6
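The summary figures can be recomputed from the distribution. The sketch below reads the Exhaustive column as the per-overhead layout counts shown in the dictionary (they sum to 239,500,800, the number of layouts listed for this kernel) and derives the average overhead and the share of layouts that are at least 50% above the minimum. The variable names are illustrative.

```python
# Exhaustive distribution for iir_arr_swp: overhead (cycles) -> number of layouts
exhaustive = {
    6: 144,
    7: 19_557,
    8: 1_514_917,
    9: 21_757_157,
    10: 90_478_895,
    11: 104_101_226,
    12: 21_628_904,
}

total_layouts = sum(exhaustive.values())
average = sum(cost * count for cost, count in exhaustive.items()) / total_layouts

minimum = min(exhaustive)
threshold = 1.5 * minimum  # "50% higher than the minimum"
share_above = sum(count for cost, count in exhaustive.items()
                  if cost >= threshold) / total_layouts

print(total_layouts)          # 239500800
print(round(average, 2))      # 10.51
print(round(share_above, 4))  # 0.9936
```

Only 144 of roughly 240 million layouts reach the minimum overhead of 6 cycles, while the worst layouts cost 12 cycles, twice the minimum.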

45 Exhaustive Solution Space Testcase: iir_arr_swp -- infinite impulse response filter

46 Algorithmic Solution Space Testcase: iir_arr_swp -- infinite impulse response filter

47 Efficiency of SOA Algorithms For each SOA algorithm, combine with each of the 3 ARA algorithms to generate 3 distributions of overhead values. The distributions can be aggregated to form a single distribution.

52 Efficiency of SOA Algorithms Testcase: iir_arr_swp -- infinite impulse response filter
Overhead (cycles)   Liao    Leupers     Sugino   B&B     OFU
6                   0       0           0        0       0
7                   6       6           10       6       44
8                   293     [missing]   357      293     1004
9                   960     [missing]   1187     960     2448
10                  2154    [missing]   2124     2154    1910
11                  619     [missing]   354      619     354
12                  0       0           0        0       0

53 [Figure: frequency of each overhead value (6 to 11 cycles) for the SOA algorithms Liao, Leupers, Sugino, B&B, and OFU]

54 Evaluating SOA Algorithms Testcase: latnrm_ptr -- normalized lattice filter [Figure: frequency of each overhead value (6 to 10 cycles) for the SOA algorithms Liao, Leupers, Sugino, B&B, and OFU]

55 Efficiency of ARA Algorithms For each ARA algorithm, combine with each of the 5 SOA algorithms to generate 5 distributions of overhead values. The distributions can be aggregated to form a single distribution.

58 Efficiency of ARA Algorithms Testcase: iir_arr_swp -- infinite impulse response filter
Overhead (cycles)   Leupers   Sugino   Zhuang
6                   0         0        0
7                   2         61       9
8                   204       1483     553
9                   2089      1018     3408
10                  4740      126      5630
11                  2565      0        0
12                  0         0        0

59 Efficiency of ARA Algorithms Testcase: iir_arr_swp -- infinite impulse response filter [Figure: frequency of each overhead value (6 to 12 cycles) for the ARA algorithms Leupers, Sugino, and Zhuang]

60 Evaluating ARA Algorithms Testcase: latnrm_ptr -- normalized lattice filter [Figure: frequency of each overhead value (6 to 10 cycles) for the ARA algorithms Leupers, Sugino, and Zhuang]

61 Evaluating Offset Assignment Algorithms There is low variability between SOA algorithms, which may be attributed to the small problem sizes. The choice of ARA algorithm has more impact on overhead; much of the variability is attributed to the different number of address registers used. For all combinations of SOA and ARA algorithms, the permutation of sub-layouts affects the overhead.

63 Conclusions The objective is to minimize address-computation overhead. Given a fixed access sequence and memory layout, the minimum-cost circulation (MCC) technique can minimize overhead. Offset assignment algorithms should be evaluated with MCC. Offset assignment still has a significant impact on overhead. To be effective, current offset assignment algorithms (ARA,SOA) must address the Memory Layout Permutation problem.

64 Future Work A new algorithm is needed to generate memory layouts that will minimize overhead as computed by the Minimum-Cost Flow technique. Address-computation overhead must be minimized for loop bodies and for variables that are live between basic blocks and procedures.

65 References
Gebotys, C.: DSP address optimization using a minimum cost circulation technique. Proceedings of the 1997 IEEE/ACM International Conference on Computer-Aided Design. 100-103.
Leupers, R., Marwedel, P.: Algorithms for address assignment in DSP code generation. Proceedings of the 1996 IEEE/ACM International Conference on Computer-Aided Design. 109-112.
Liao, S., Devadas, S., Keutzer, K., Tjiang, S., Wang, A.: Storage assignment to decrease code size. ACM Transactions on Programming Languages and Systems 18(3) (1996). 235-253.
Sugino, N., Iimuro, S., Nishihara, A., Fujii, N.: DSP code optimization utilizing memory addressing operation. IEICE Transactions on Fundamentals 8 (1996). 1217-1223.
Zhuang, X., Lau, C., Pande, S.: Storage assignment optimizations through variable coalescence for embedded processors. Proceedings of the 2003 ACM SIGPLAN Conference on Languages, Compilers, and Tools for Embedded Systems. 220-231.
Bartley, D.H.: Optimizing stack frame accesses for processors with restricted addressing modes. Software: Practice & Experience 22(2) (1992). 158-172.

66 Questions?

