New Algorithms for SIMD Alignment Liza Fireman - Technion Ayal Zaks – IBM Haifa Research Lab Erez Petrank – Microsoft Research & Technion.


1 New Algorithms for SIMD Alignment Liza Fireman - Technion Ayal Zaks – IBM Haifa Research Lab Erez Petrank – Microsoft Research & Technion

2 SIMD (Single Instruction Multiple Data)
Supports packed vector operations.
[Figure: two packed vectors combined element-wise (+) into a result vector (=).]
Fireman, Petrank & Zaks, SIMD Alignment Alg's

3 SIMD (Single Instruction Multiple Data)
Supports packed vector operations.
Widely used with multimedia extensions.
–AltiVec (IBM, Motorola), SSE (Intel).
Manual programming for SIMD is error prone.
Automatically generating optimized code for SIMD (auto-vectorization or simdization) is challenging, but promising.
One challenge: satisfy the alignment constraint imposed by the hardware.
–AltiVec: 16-byte registers are loaded from 16 consecutive bytes of memory that start at a 16-byte-aligned address.

4 Misaligned Streams
Example:
for (i = 0; i < 1000; i++)
    a[i] = b[i+1] + c[i+2];
The above code requires additional realignment operations.
[Figure: the stream b[i+1], b[i+2], b[i+3], b[i+4] straddles two aligned vectors, b[i] ... b[i+3] and b[i+4] ... b[i+7], and has to be assembled from both.]
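The realignment step on this slide can be pictured with a scalar emulation. The sketch below is not real SIMD code: the `vec` type, `VLEN`, and the function names are hypothetical, alignment is measured in elements from an array base that is assumed to be 16-byte aligned, and the array is assumed to be padded so that the second aligned block can always be read. On real hardware the selection step would be a permute or shift instruction.

```c
#include <assert.h>
#include <string.h>

#define VLEN 4   /* assumed: 4 four-byte elements per 16-byte vector */

typedef struct { int lane[VLEN]; } vec;   /* hypothetical stand-in for a vector register */

/* Aligned load: fetches the whole VLEN-element block that contains element idx. */
static vec load_aligned(const int *base, int idx) {
    vec v;
    memcpy(v.lane, base + (idx / VLEN) * VLEN, sizeof v.lane);
    return v;
}

/* One "shift": pick VLEN consecutive lanes, starting at offset off,
   from the concatenation of two adjacent aligned vectors lo and hi. */
static vec shift_pair(vec lo, vec hi, int off) {
    vec r;
    for (int j = 0; j < VLEN; j++) {
        int k = off + j;
        r.lane[j] = (k < VLEN) ? lo.lane[k] : hi.lane[k - VLEN];
    }
    return r;
}

/* Emulates reading base[start .. start+VLEN-1] using only aligned loads.
   Assumes base is padded so the block after start is readable. */
vec load_stream(const int *base, int start) {
    assert(start >= 0);
    vec lo = load_aligned(base, start);
    vec hi = load_aligned(base, start + VLEN);
    return shift_pair(lo, hi, start % VLEN);
}
```

With `b` holding 0, 1, 2, ..., a call such as `load_stream(b, 1)` yields the lanes b[1] through b[4], which is what the access b[i+1] needs at i = 0; the extra `shift_pair` is the per-iteration cost that the alignment problem tries to minimize.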

5 The SIMD Alignment Problem
Given an expression, execute it with the minimum number of "shifts".
Requirements:
–Input and output operands come with a specified alignment.
–The inputs to each operation have the same alignment.
Usually shifts are executed inside a loop and have a noticeable cost.

6 A Graph Abstraction
Represent expressions as graphs (standard).
Annotate alignments of inputs and outputs.
A solution provides alignments for the inner vertices (the operations): the alignment of the operation's inputs.
Mapping of graph solutions to expression solutions (and vice versa) is easy if each array appears in a single alignment only.

7 A (Tree) Example
a[i+3] = b[i+1]*c[i+3] + d[i+3]*e[i+2]
[Figure: expression tree with leaves b, c, d, e at alignments 1, 3, 3, 2, operations *, *, + each labeled 3, and output a at alignment 3.]

8 Previous Heuristics
Several simple heuristics have been proposed to solve the alignment problem.
–The Zero-Shift Policy
–The Eager-Shift Policy
–The Lazy-Shift Policy
–The Majority Policy

9 Talk Outline
Introduction: SIMD alignment, graph abstraction, heuristics.
Tree expressions: dynamic programming.
Expressions with two alignments: node multiway cuts.
The general case
Measurements
Conclusions

10 Two Interesting Special Cases
Single-appearance tree expressions
–Each array appears once in the input.
Expressions with only two alignments
–Each array appears with only one of the alignments.
We present two efficient algorithms that solve the problem optimally for these two cases.

11 A Tree Example
a[i+3] = b[i+1]*c[i+3] + d[i+3]*e[i+2]
[Figure: expression tree with leaves b, c, d, e at alignments 1, 3, 3, 2, operations *, *, +, and output a at alignment 3.]

12 Optimal Algorithm for a Tree
Dynamic programming.
–Progressive local computations for the global optimum.
[Figure: tree nodes labeled l, m, and j.]

13 Optimal Algorithm for a Tree
Dynamic programming.
–Progressive local computations for the global optimum.
[Figure: tree nodes labeled l, m, and j, each annotated with a table indexed by the possible alignments 1, 2, …, k.]

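14 [Figure: the tree example (leaves b, c, d, e; operations *, *, +; output a) with a cost table over alignments 1, 2, 3 at every node. Leaf tables hold 0 at the leaf's own alignment and ∞ elsewhere; inner tables hold accumulated shift counts.]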

15 Complexity – Tree Expressions
Traverse the tree nodes; at each node, for each possible alignment, do work for each incoming edge.
Overall O(k|V|), where k is the number of possible alignments.
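The tree algorithm can be sketched as follows. This is an illustrative reconstruction rather than the authors' code: the `Node` layout, the function names, the choice of K = 4 alignments, and the cost model (every shift costs 1, and a value may be shifted once on the edge to its consumer) are assumptions. Each node gets a table `cost[a]`, the fewest shifts needed in its subtree if its value is produced at alignment `a`; taking the minimum of a child's table once per edge keeps the work at O(k) per edge.

```c
#include <assert.h>

#define K 4          /* assumed number of possible alignments */
#define MAX_KIDS 2   /* binary operations only, for brevity */

typedef struct Node {
    int align;                         /* leaf alignment; unused for operations */
    int nkids;                         /* 0 marks a leaf */
    const struct Node *kid[MAX_KIDS];
} Node;

enum { INF = 1000000 };                /* "impossible" marker for leaf tables */

static int min_of(const int c[K]) {
    int best = c[0];
    for (int a = 1; a < K; a++)
        if (c[a] < best) best = c[a];
    return best;
}

/* Fills cost[a] with the fewest shifts in v's subtree when v's value
   is produced at alignment a. */
static void solve(const Node *v, int cost[K]) {
    if (v->nkids == 0) {               /* a leaf exists only at its own alignment */
        for (int a = 0; a < K; a++) cost[a] = (a == v->align) ? 0 : INF;
        return;
    }
    for (int a = 0; a < K; a++) cost[a] = 0;
    for (int i = 0; i < v->nkids; i++) {
        int kc[K];
        solve(v->kid[i], kc);
        int shifted = min_of(kc) + 1;  /* cheapest child result plus one shift */
        for (int a = 0; a < K; a++)
            cost[a] += (kc[a] < shifted) ? kc[a] : shifted;
    }
}

/* Minimum number of shifts when the result must be stored at store_align. */
int min_shifts(const Node *root, int store_align) {
    assert(store_align >= 0 && store_align < K);
    int c[K];
    solve(root, c);
    int shifted = min_of(c) + 1;
    return (c[store_align] < shifted) ? c[store_align] : shifted;
}

/* The tree example: a[i+3] = b[i+1]*c[i+3] + d[i+3]*e[i+2]. */
int example_tree_shifts(void) {
    Node b = {1, 0, {0}}, c = {3, 0, {0}}, d = {3, 0, {0}}, e = {2, 0, {0}};
    Node m1 = {0, 2, {&b, &c}}, m2 = {0, 2, {&d, &e}};
    Node plus = {0, 2, {&m1, &m2}};
    return min_shifts(&plus, 3);
}
```

Under these assumptions `example_tree_shifts()` evaluates to 2: b and e are each shifted to alignment 3, after which every operation and the store agree.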

16 It Doesn’t Work on DAGs
[Figure: a small DAG with + and * operations over alignments 0 and 1, annotated with per-node costs, on which the tree algorithm fails.]

17 Two-Alignments Expressions
Not necessarily trees.
Only two alignments in the expression.

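18 for (i = 0; i < 1000; i++)
    f[i] = (b[i+1]*a[i+1] + c[i+1]*a[i+1]) + (d[i]*a[i+1] + e[i]*a[i+1]);
[Figure: expression DAG with inputs b, c, and a at alignment 1, inputs d and e at alignment 0, and output f at alignment 0; a feeds all four multiplications, and each operation is labeled with an alignment.]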

19 for (i = 0; i < 1000; i++)
    f[i] = (b[i+1]*a[i+1] + c[i+1]*a[i+1]) + (d[i]*a[i+1] + e[i]*a[i+1]);
[Figure: the same expression DAG, with inputs b, c, and a at alignment 1, inputs d and e at alignment 0, and output f at alignment 0.]
Only a single shift required here.

20 Node Multi-way Cut
[Figure: a graph over nodes a through f labeled with alignments 1, 2, and 3, attached to terminals S1, S2, and S3.]

21 Using Node Multiway Cuts
Choosing a node for the cut means: shift after executing it.
To make sure that all inputs of an operation get aligned: link them all to each other!
–Moral graphs
[Figure: a * operation whose inputs, drawn from a, b, c, d at alignments 0 and 1, are linked to each other.]
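The "link them all to each other" step is the usual moralization of a DAG. A minimal sketch, assuming an adjacency-matrix representation; the names `moralize`, `dag`, `und`, and the size limit `MAXN` are hypothetical, and the terminals for the alignments still have to be attached separately:

```c
#include <assert.h>

#define MAXN 16   /* illustrative size limit */

/* dag[u][v] != 0 means value u is an input of operation v.
   und receives the undirected moral graph: every DAG edge without its
   direction, plus an edge between every two inputs of the same operation. */
void moralize(int n, int dag[MAXN][MAXN], int und[MAXN][MAXN]) {
    assert(n >= 0 && n <= MAXN);
    for (int u = 0; u < n; u++)
        for (int v = 0; v < n; v++)
            und[u][v] = (dag[u][v] || dag[v][u]) ? 1 : 0;
    for (int v = 0; v < n; v++)              /* for each operation v */
        for (int p = 0; p < n; p++)
            for (int q = p + 1; q < n; q++)
                if (dag[p][v] && dag[q][v])  /* p and q are both inputs of v */
                    und[p][q] = und[q][p] = 1;
}
```

After moralization, two inputs of one operation that are attached to different terminals are directly connected, so any cut separating the terminals must pick at least one of them, which corresponds to shifting that input.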

22 [Figure: the two-alignment example (inputs a, b, c, d, e; operations *, *, *, *, +, +, +; output f) with terminals S0 and S1 attached and every node labeled 0 or 1.]

23 for (i = 0; i < 1000; i++)
    f[i] = (b[i+1]*a[i+1] + c[i+1]*a[i+1]) + (d[i]*a[i+1] + e[i]*a[i+1]);
[Figure: the same expression DAG with the cut nodes marked.]
Cost: 2 shifts
Each cut node implies one shift.

24 The Algorithm
Create the modified graph.
Find a 2-way min-cut via max-flow algorithms.
Complexity: min-cut on modified graph O(|V|^4).
3-way and up node-cuts are NP-Hard.
The node-cut and edge-cut problems are different from the SIMD alignment problem. But relations exist.
Derive approximation alg's for SIMD from approx. alg's for node-cut and edge-cut. (See paper.)
No NP-Completeness result known for this problem.
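A 2-way minimum node cut can be computed with any max-flow routine after splitting each vertex into an "in" and an "out" copy joined by a unit-capacity arc. The sketch below uses plain augmenting paths on a capacity matrix, which is a textbook construction and not necessarily the one behind the O(|V|^4) bound above. The function names and the limit `MAXV` are hypothetical, and building the modified graph from an expression is left to the caller.

```c
#include <assert.h>
#include <string.h>

#define MAXV 16               /* illustrative limit on original vertices */
#define N2 (2 * MAXV)         /* each vertex v becomes in = 2v, out = 2v+1 */
#define BIG 1000000           /* stands in for infinite capacity */

static int cap[N2][N2];
static int seen[N2];

/* Depth-first search for one augmenting path; returns the flow pushed. */
static int augment(int u, int t, int flow) {
    if (u == t) return flow;
    seen[u] = 1;
    for (int v = 0; v < N2; v++) {
        if (!seen[v] && cap[u][v] > 0) {
            int f = augment(v, t, flow < cap[u][v] ? flow : cap[u][v]);
            if (f > 0) { cap[u][v] -= f; cap[v][u] += f; return f; }
        }
    }
    return 0;
}

/* Fewest non-terminal vertices whose removal disconnects s from t in an
   undirected graph with m edges. Assumes s and t are not adjacent. */
int min_node_cut(int n, int m, int edges[][2], int s, int t) {
    assert(n <= MAXV && s != t);
    memset(cap, 0, sizeof cap);
    for (int v = 0; v < n; v++)               /* cutting v costs 1; terminals are uncuttable */
        cap[2 * v][2 * v + 1] = (v == s || v == t) ? BIG : 1;
    for (int e = 0; e < m; e++) {             /* edges themselves cannot be cut */
        int u = edges[e][0], v = edges[e][1];
        cap[2 * u + 1][2 * v] = BIG;
        cap[2 * v + 1][2 * u] = BIG;
    }
    int total = 0, f;
    do {
        memset(seen, 0, sizeof seen);
        f = augment(2 * s, 2 * t + 1, BIG);
        total += f;
    } while (f > 0);
    return total;
}
```

In the two-alignment setting, s and t would play the roles of the terminals S0 and S1, and the returned value would be the number of cut nodes, each implying one shift.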

25 Measurements
Part 1: How much does it cost to shift?
Part 2: Generate random graphs and check OPT/HEU for:
–Single-appearance trees
–2-alignment DAGs
OPT = # shifts used by the optimal solution
HEU = # shifts used by the best heuristic

26 Part 1: Cost of Shift
[Figure: example graph with a, b, c, d at alignment 1 and e, f at alignment 2.]
Cost of best heuristic: 2.
Cost of optimal solution: 1.

27 Part 1: Cost of Shift
[Figure: example graph with a, b, c, d at alignment 1 and e, f at alignment 2.]
Cost of best heuristic: 2.
Cost of optimal solution: 1.
6% runtime improvement.

28 Random Trees: OPT/HEU

29 Random Layered Graphs: OPT/HEU
w – width of the layered graph
d – the graph's depth

30 Related Work
[Eichenberger-Wu-O’Brien PLDI’04]:
–Set of alignment heuristics.
–Code generation (can use our algorithms).
[Wu-Eichenberger-Wang CGO 2005]:
–Runtime alignments.
Several compilers (e.g., GCC, VAST, compilers for SSE) use the zero-shift policy.
[Ren-Wu-Padua PLDI 2006] handle strides > 1.
Much literature on distributing data to processors.

31 Summary
The SIMD alignment problem is important.
Previously only heuristics were used.
We propose optimal algorithms for:
–single-appearance tree expressions
–expressions with only two alignments
Guaranteed approximation ratio for the general case.
Measurements show that optimizations are effective.
Future work: is SIMD-Alignment NP-Complete, or can you solve it?
–More special cases?

