
1 Spiral: an empirical search system for program generation and optimization
David Padua, Department of Computer Science, University of Illinois at Urbana-Champaign

2 Program optimization today
The optimization phase of a compiler applies a series of transformations to achieve its objectives.
The compiler uses the outcome of program analysis to determine which transformations are correctness-preserving.
Compiler transformation and analysis techniques are reasonably well understood.
Since many compiler optimization problems have exponential complexity, heuristics are needed to drive the application of transformations.

3 Optimization drivers
Developing driving heuristics is laborious.
One reason for this is the lack of methodologies and tools to build optimization drivers.
As a result, although there is much in common among compilers, their optimization phases are usually reimplemented from scratch.

4 Optimization drivers (Cont.)
A consequence: machines and languages that are not widely popular usually lack good compilers. (So do some popular systems.)
– DSP, network processor, and embedded system programming is often done in assembly language.
– Evaluation of new architectural features requiring compiler involvement is not always meaningful.
– Languages such as APL, MATLAB, LISP, … suffer from chronic low performance.
– New languages are difficult to introduce (although compilers are only a part of the problem).

5 A methodology based on the notion of search space
Program transformations often have several possible target versions.
– Loop unrolling: how many times to unroll.
– Loop tiling: size of the tile.
– Loop interchanging: order of loop headers.
– Register allocation: which register values are stored in memory to make room for new values.
The process of optimization can be seen as a search in the space of possible program versions.
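As a rough illustration of the search-space view, the following Python sketch enumerates every combination of a few transformation parameters. The parameter values and the names `unroll_factors`, `tile_sizes`, `loop_orders`, and `search_space` are hypothetical and chosen only for this example; they do not come from the slides.

```python
from itertools import permutations, product

# Hypothetical parameter choices; the concrete values are illustrative only.
unroll_factors = [1, 2, 4, 8]                      # loop unrolling: how many times
tile_sizes = [16, 32, 64]                          # loop tiling: size of the tile
loop_orders = list(permutations(("i", "j", "k")))  # loop interchange: header order


def search_space():
    """Yield one dictionary per candidate program version."""
    for unroll, tile, order in product(unroll_factors, tile_sizes, loop_orders):
        yield {"unroll": unroll, "tile": tile, "order": order}


versions = list(search_space())
print(len(versions))  # 4 unroll factors * 3 tile sizes * 6 loop orders = 72
```

Even with three small parameter lists the space already holds 72 versions, which hints at how quickly it grows when more transformations are combined.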

6 Empirical search: iterative compilation
Perhaps the simplest application of the search space model is empirical search, where several versions are generated and executed on the target machine. The fastest version is selected.
T. Kisuki, P.M.W. Knijnenburg, M.F.P. O'Boyle, and H.A.G. Wijshoff. Iterative compilation in program optimization. In Proc. CPC2000, pages 35-44, 2000.
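The generate, run, and select loop can be sketched in a few lines of Python. The three summation variants below stand in for compiler-generated versions of the same computation; they, and the helper names `measure` and `empirical_search`, are assumptions made for this example rather than anything taken from the cited paper.

```python
import time


def sum_loop(data):
    total = 0
    for value in data:
        total += value
    return total


def sum_builtin(data):
    return sum(data)


def sum_unrolled(data):
    # Manually unrolled by a factor of 2, with a cleanup loop for odd lengths.
    total = 0
    even_len = len(data) - len(data) % 2
    for i in range(0, even_len, 2):
        total += data[i] + data[i + 1]
    for i in range(even_len, len(data)):
        total += data[i]
    return total


def measure(fn, data, repeats=5):
    """Return the best wall-clock time of fn(data) over several runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(data)
        best = min(best, time.perf_counter() - start)
    return best


def empirical_search(candidates, data):
    """Time every candidate on this machine and return the fastest one's name."""
    timings = {fn.__name__: measure(fn, data) for fn in candidates}
    winner = min(timings, key=timings.get)
    return winner, timings


if __name__ == "__main__":
    winner, timings = empirical_search(
        [sum_loop, sum_builtin, sum_unrolled], list(range(100_000))
    )
    print(winner, timings)
```

The winner depends on the machine and interpreter the script runs on, which is the point of measuring rather than predicting.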

7 Empirical search and traditional compilers
Searching is not a new approach. Compilers have applied it in the past, but using architectural prediction models instead of actual runs:
– KAP searched for the best loop header order.
– SGI’s MIPSpro and IBM PowerPC compilers select the best degree of unrolling.

8 Limitations of empirical search
Empirical search is conceptually simple and portable. However,
– the search space tends to be too large, especially when several transformations are combined.
– it is not clear how to apply this method when program behavior is a function of the input data set.
Heuristics and search strategies are needed.
Availability of performance “formulas” could help evaluate transformations across input data sets and facilitate search.

9 Compilers and Library Generators
[Diagram labels: Source Program, Internal representation, Algorithm, Program Transformation, Program Generation]

10 Empirical search in program/library generators
Examples:
– FFTW [M. Frigo, S. Johnson]
– Spiral (FFT/signal processing) [J. Moura (CMU), M. Veloso (CMU), J. Johnson (Drexel), …]
– ATLAS (linear algebra) [R. Whaley, A. Petitet, J. Dongarra]
– PHiPAC [J. Demmel et al.]


12 SPIRAL
The approach:
– Mathematical formulation of signal processing algorithms
– Automatically generate algorithm versions
– A generalization of the well-known FFTW
– Use compiler techniques to translate formulas into implementations
– Adapt to the target platform by searching for the optimal version


14 Fast DSP Algorithms As Matrix Factorizations
Computing $y = F_4 x$ is carried out as:
$t_1 = A_4 x$ (permutation)
$t_2 = A_3 t_1$ (two $F_2$'s)
$t_3 = A_2 t_2$ (diagonal scaling)
$y = A_1 t_3$ (two $F_2$'s)
The cost is reduced because $A_1$, $A_2$, $A_3$ and $A_4$ are structured sparse matrices.
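The four stages can be checked numerically with a small Python sketch. It assumes the sign convention $\omega_4 = e^{-2\pi i/4} = -i$ for the DFT, which the slide does not specify, and the function names `dft_direct` and `f4_factored` are introduced here for illustration only.

```python
import cmath


def dft_direct(x):
    """Dense matrix-vector product y = F_n x (assumed convention: exp(-2*pi*i/n))."""
    n = len(x)
    return [
        sum(cmath.exp(-2j * cmath.pi * i * j / n) * x[j] for j in range(n))
        for i in range(n)
    ]


def f4_factored(x):
    """Compute y = F_4 x as four sparse stages, applied right to left."""
    w = -1j  # omega_4 under the assumed sign convention
    x0, x1, x2, x3 = x
    # t1 = A4 x: permutation that gathers even-indexed, then odd-indexed entries
    t1 = [x0, x2, x1, x3]
    # t2 = A3 t1: two F_2 butterflies on adjacent pairs
    t2 = [t1[0] + t1[1], t1[0] - t1[1], t1[2] + t1[3], t1[2] - t1[3]]
    # t3 = A2 t2: diagonal scaling, only the last entry is non-trivial
    t3 = [t2[0], t2[1], t2[2], w * t2[3]]
    # y = A1 t3: two F_2 butterflies on entries two apart
    return [t3[0] + t3[2], t3[1] + t3[3], t3[0] - t3[2], t3[1] - t3[3]]


x = [1, 2, 3, 4]
print(f4_factored(x))
print(dft_direct(x))
```

The factored version uses 8 complex additions and one multiplication by $-i$, compared with 16 multiply-adds for the dense product, which is the cost reduction the slide refers to.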

15 Tensor Product Formulation of Cooley-Tukey
[Theorem and example formulas not preserved; the surviving annotations identify one factor as a diagonal matrix and another as a stride permutation.]
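For reference, the standard tensor-product statement of the Cooley-Tukey factorization is given below. This is presumably what the slide showed, but that is an assumption; the symbols $T^{rs}_s$ (diagonal matrix of twiddle factors) and $L^{rs}_r$ (stride permutation) follow common usage in the literature and are not taken from the transcript.

```latex
% Cooley-Tukey theorem in tensor-product form (standard statement)
F_{rs} = (F_r \otimes I_s)\, T^{rs}_{s}\, (I_r \otimes F_s)\, L^{rs}_{r}

% Example with r = s = 2
F_4 = (F_2 \otimes I_2)\, T^{4}_{2}\, (I_2 \otimes F_2)\, L^{4}_{2}
```

Read right to left, the example has the same shape as the four-stage computation of $y = F_4 x$ on the previous slide: a permutation, two $F_2$'s, a diagonal scaling, and two more $F_2$'s.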

16 Formulas for Matrix Factorizations
R1: [formula not preserved], where $n = n_1 \cdots n_k$, $n_{i-} = n_1 \cdots n_{i-1}$, $n_{i+} = n_{i+1} \cdots n_k$
R2: [formula not preserved]

17 Factorization Trees
[Figure: three factorization trees for $F_8$, each with three $F_2$ leaves. Two trees are labeled $F_8$ : R1 with an $F_4$ : R1 subtree; the third is labeled $F_8$ : R2.]
Different computation order
Different data access pattern
Different performance

18 Walsh-Hadamard Transform

19 Optimal Factorization Trees
Depend on the platform
Difficult to deduce
Can be found by empirical search
– The search space is very large
– Different search algorithms: random, DP, GA, hill-climbing, exhaustive
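Of the search algorithms listed, dynamic programming (DP) is the easiest to sketch. The Python code below is a minimal illustration, not Spiral's implementation: it only considers binary splits of $F_{2^k}$, it uses $F_2$ as the only leaf, and `measured_cost` is a made-up cost model standing in for timing generated code on the target machine.

```python
from functools import lru_cache


def measured_cost(k, left, right):
    """Hypothetical stand-in for a timing run of the top-level split of F_{2^k}."""
    n = 2 ** k
    return n * (1.0 + 0.1 * abs(left - right))  # toy model: unbalanced splits cost more


@lru_cache(maxsize=None)
def best_tree(k):
    """Return (cost, tree) for F_{2^k}; a tree is 1 (an F_2 leaf) or (left, right)."""
    if k == 1:
        return (2.0, 1)  # assumed cost of an F_2 leaf
    best = None
    for left in range(1, k):
        right = k - left
        left_cost, left_tree = best_tree(left)
        right_cost, right_tree = best_tree(right)
        # The left factor is applied 2^right times and the right factor 2^left times.
        cost = (2 ** right) * left_cost + (2 ** left) * right_cost
        cost += measured_cost(k, left, right)
        if best is None or cost < best[0]:
            best = (cost, (left_tree, right_tree))
    return best


def leaf_count(tree):
    """Count F_2 leaves; for a tree of F_{2^k} this equals k."""
    return 1 if tree == 1 else leaf_count(tree[0]) + leaf_count(tree[1])


print(best_tree(8))
```

DP assumes that the best tree for a size is built from the best trees of its parts, so it evaluates far fewer candidates than exhaustive search, at the risk of missing trees whose subtrees are only good in context.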


22 Size of Search Space
N       # of formulas     N       # of formulas
2^1     1                 2^9     20793
2^2     1                 2^10    103049
2^3     3                 2^11    518859
2^4     11                2^12    2646723
2^5     45                2^13    13649969
2^6     197               2^14    71039373
2^7     903               2^15    372693519
2^8     4279              2^16    1968801519
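One way to arrive at counts of this size is to count factorization trees in which every internal node for $F_{2^k}$ splits the exponent $k$ into an ordered sequence of at least two positive parts, with $F_2$ as the only leaf. This reading is an assumption, not something the slide states; under it, the Python sketch below yields 1, 1, 3, 11, 45, 197, 903, 4279 for exponents 1 through 8. The function name `count_formulas` is introduced for this example.

```python
def count_formulas(max_k):
    """Return a list where entry k is the number of trees for F_{2^k} (entry 0 unused).

    Assumption: an internal node splits its exponent into an ordered sequence
    of two or more positive parts, and each part is expanded recursively.
    """
    trees = [0, 1]  # trees[1] = 1: the F_2 leaf
    seqs = [0, 1]   # seqs[k]: ordered sequences of one or more trees with exponents summing to k
    for k in range(2, max_k + 1):
        # Pick the first part (exponent < k), then any non-empty sequence for the rest.
        t = sum(trees[first] * seqs[k - first] for first in range(1, k))
        trees.append(t)
        # A sequence is either a single tree (t ways) or has two or more parts (also t ways).
        seqs.append(2 * t)
    return trees


print(count_formulas(8)[1:])  # [1, 1, 3, 11, 45, 197, 903, 4279]
```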


25 More Search Choices
Programming:
– Loop unrolling
– Memory allocation
– In-lining
Platform choices:
– Compiler optimization options

26 The SPIRAL System
[Diagram labels: DSP Transform, Formula Generator, SPL Program, SPL Compiler, C/FORTRAN Programs, Performance Evaluation, Target machine, Search Engine, DSP Library]

27 Spiral
Spiral does the factorization at installation time and generates one library routine for each size.
FFTW only generates codelets (input size ≤ 64) and performs the factorization at run time.

28 A Simple SPL Program
[Callout labels: Definition, Directive, Formula, Comment]
; This is a simple SPL program
(define A (matrix (1 2) (2 1)))
(define B (diagonal (3 3)))
#subname simple
(tensor (I 2) (compose A B))
;; This is an invisible comment
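To make the meaning of the formula concrete, the Python sketch below evaluates the matrix that `(tensor (I 2) (compose A B))` denotes, namely $I_2 \otimes (A \cdot B)$. The helper functions `identity`, `diagonal`, `matmul`, and `kron` are plain-Python stand-ins written for this example; they are not part of SPL or its compiler.

```python
def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


def diagonal(values):
    n = len(values)
    return [[values[i] if i == j else 0 for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Matrix product, corresponding to SPL's compose."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def kron(a, b):
    """Tensor (Kronecker) product, corresponding to SPL's tensor."""
    rows_b, cols_b = len(b), len(b[0])
    out = [[0] * (len(a[0]) * cols_b) for _ in range(len(a) * rows_b)]
    for i, row_a in enumerate(a):
        for j, a_ij in enumerate(row_a):
            for k in range(rows_b):
                for m in range(cols_b):
                    out[i * rows_b + k][j * cols_b + m] = a_ij * b[k][m]
    return out


A = [[1, 2], [2, 1]]   # (define A (matrix (1 2) (2 1)))
B = diagonal([3, 3])   # (define B (diagonal (3 3)))
result = kron(identity(2), matmul(A, B))
for row in result:
    print(row)
```

The result is block diagonal with two copies of $A \cdot B$, which is why a tensor product with an identity matrix on the left translates naturally into a loop over independent blocks.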

29 Templates
[Callout labels: Pattern, Condition, I-code]
(template (F n) [ n >= 1 ]
  ( do i=0,n-1
      y(i)=0
      do j=0,n-1
        y(i)=y(i)+W(n,i*j)*x(j)
      end
    end ))
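The I-code in this template is the direct $O(n^2)$ definition of the DFT. A line-by-line Python rendering is sketched below. It assumes that `W(n, k)` denotes the $k$-th power of a primitive $n$-th root of unity with the sign convention $e^{-2\pi i/n}$; the template itself does not show how `W` is defined, so the sign is an assumption.

```python
import cmath


def W(n, k):
    """Assumed meaning of the template's W(n, k): exp(-2*pi*i*k/n)."""
    return cmath.exp(-2j * cmath.pi * k / n)


def F(n, x):
    """Direct translation of the template body for the pattern (F n), n >= 1."""
    if n < 1:
        raise ValueError("template condition n >= 1 not met")
    y = [0] * n
    for i in range(n):            # do i=0,n-1
        y[i] = 0                  #   y(i)=0
        for j in range(n):        #   do j=0,n-1
            y[i] = y[i] + W(n, i * j) * x[j]
    return y


print(F(4, [1, 1, 1, 1]))  # approximately [4, 0, 0, 0], up to rounding error
```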

30 SPL Compiler
[Diagram labels]
Stages: Parsing, Intermediate Code Generation, Intermediate Code Restructuring, Optimization, Target Code Generation
Data: SPL Formula, Template Definition, Abstract Syntax Tree, Template Table, I-Code, FORTRAN/C

31 Intermediate Code Restructuring
Loop unrolling
– Degree of unrolling can be controlled globally or case by case
Scalar function evaluation
– Replace scalar functions with constant values or array accesses
Type conversion
– Type of input data: real or complex
– Type of arithmetic: real or complex
– Same SPL formula, different C/Fortran programs


33 Optimizations
[Diagram components: Formula Generator, SPL Compiler, C/Fortran Compiler]
* High-level scheduling
* Loop transformation
* High-level optimizations
  - Constant folding
  - Copy propagation
  - CSE
  - Dead code elimination
* Low-level optimizations
  - Instruction scheduling
  - Register allocation

34 Basic Optimizations (FFT, $N = 2^5$, SPARC, f77 -fast -O5)

35 Basic Optimizations (FFT, $N = 2^5$, MIPS, f77 -O3)

36 Basic Optimizations (FFT, $N = 2^5$, PII, g77 -O6 -malign-double)

37 Performance Evaluation
Evaluating the performance of the code generated by the SPL compiler
Platforms: SPARC, MIPS, PII
Search strategy: dynamic programming

38 Pseudo MFlops
Estimation of the # of FP operations:
– FFT (radix-2): $5n\log_2 n - 10n + 16$
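A pseudo MFlops figure divides an estimated operation count by the measured run time, so that runs of different sizes can be compared on one scale. The Python sketch below assumes the radix-2 estimate is read as $5n\log_2 n - 10n + 16$ and that the rate is reported in millions of operations per second; both the reading of the formula and the function names are assumptions made for this example.

```python
import math


def radix2_flop_estimate(n):
    """Estimated floating-point operations for a radix-2 FFT of size n (n a power of 2, n >= 4)."""
    return 5 * n * math.log2(n) - 10 * n + 16


def pseudo_mflops(n, seconds):
    """Estimated operations divided by measured time, in millions of operations per second."""
    return radix2_flop_estimate(n) / (seconds * 1e6)


# Hypothetical measurement: a size-1024 FFT timed at 20 microseconds.
print(pseudo_mflops(1024, 20e-6))
```

The figure is "pseudo" because the operation count is an estimate for one algorithm, not the number of operations the generated code actually executes.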

39 FFT Performance ($N = 2^1$ to $2^6$)
[Chart panels: SPARC, MIPS, PII]

40 FFT Performance ($N = 2^7$ to $2^{20}$)
[Chart panels: SPARC, MIPS, PII]

41 Important Questions
What lessons can be learned from this work?
Can this approach be used in other domains?
