Presentation is loading. Please wait.

Presentation is loading. Please wait.

Static Single Assignment Form in the COINS Compiler Infrastructure - Current Status and Background - Masataka Sassa, Toshiharu Nakaya, Masaki Kohama (Tokyo.

Similar presentations


Presentation on theme: "Static Single Assignment Form in the COINS Compiler Infrastructure - Current Status and Background - Masataka Sassa, Toshiharu Nakaya, Masaki Kohama (Tokyo."— Presentation transcript:

1 Static Single Assignment Form in the COINS Compiler Infrastructure - Current Status and Background - Masataka Sassa, Toshiharu Nakaya, Masaki Kohama (Tokyo Institute of Technology) Takeaki Fukuoka, Masahito Takahashi and Ikuo Nakata

2 0. COINS infrastructure and the SSA form 1. Current optimization using SSA form in COINS 2.A comparison of SSA translation algorithms 3.A comparison of SSA back translation algorithms 4.A survey of compiler infrastructures Outline Background Static single assignment (SSA) form facilitates compiler optimizations. Compiler infrastructure facilitates compiler development.

3 0. COINS infrastructure and Static Single Assignment Form (SSA Form)

4 COINS compiler infrastructure Multiple source languages Retargetable Two intermediate form, HIR and LIR Optimizations Parallelization C generation, source-to- source translation Written in Java 2000~ developed by Japanese institutions under Grant of the Ministry High Level Intermediate Representation (HIR) Basic analyzer & optimizer HIR to LIR Low Level Intermediate Representation (LIR) SSA optimizer Code generator C frontend Advanced optimizer Basic parallelizer frontend C generation SPARC SIMD parallelizer x86 New machine Fortran C New language COpenMP Fortran frontend C generation C

5 1: a = x + y 2: a = a + 3 3: b = x + y Static Single Assignment (SSA) Form (a)Normal (conventional) form (source program or internal form) 1: a 1 = x 0 + y 0 2: a 2 = a 1 + 3 3: b 1 = x 0 + y 0 (b) SSA form SSA form is a recently proposed internal representation where each use of a variable has a single definition point. Indices are attached to variables so that their definitions become unique.

6 1: a = x + y 2: a = a + 3 3: b = x + y Optimization in Static Single Assignment (SSA) Form (a) Normal form 1: a 1 = x 0 + y 0 2: a 2 = a 1 + 3 3: b 1 = x 0 + y 0 (b) SSA form 1: a 1 = x 0 + y 0 2: a 2 = a 1 + 3 3: b 1 = a 1 (c) After SSA form optimization 1: a1 = x0 + y0 2: a2 = a1 + 3 3: b1 = a1 (d) Optimized normal form SSA translation Optimization in SSA form (common subexpression elimination) SSA back translation SSA form is becoming increasingly popular in compilers, since it is suited for clear handling of dataflow analysis and optimization.

7 x1 = 1x2 = 2 x3 =   (x1;L1, x2:L2) … = x3 x = 1x = 2 … = x L3 L2L1 L2 L3 (b) SSA form(a) Normal form Translating into SSA form (SSA translation)

8 x1 = 1x2 = 2 x3 =   (x1;L1, x2:L2) … = x3 x1 = 1 x3 = x1 x2 = 2 x3 = x2 … = x3 L3 L2L1 L2 L3 (a) SSA form(b) Normal form Translating back from SSA form (SSA back translation)

9 1. SSA form module in the COINS compiler infrastructure

10 High Level Intermediate Representation (HIR) Basic analyzer & optimizer HIR to LIR Low Level Intermediate Representation (LIR) SSA optimizer Code generator C frontend Advanced optimizer Basic parallelizer frontend C generation SPARC SIMD parallelizer x86 New machine Fortran C New language COpenMP Fortran frontend COINS compiler infrastructure C generation C

11 Low level Intermediate Representation (LIR) SSA optimization module Code generation Source program object code LIR to SSA translation (3 variations) LIR in SSA SSA basic optimization com subexp elimination copy propagation cond const propagation dead code elimination Optimized LIR in SSA SSA to LIR back translation (2 variations) + 2 coalescing 12,000 lines transformation on SSA copy folding dead phi elim edge splitting SSA optimization module in COINS

12 Outline of SSA module in COINS Translation into and back from SSA form on Low Level Intermediate Representation (LIR) ‐ SSA translation: Use dominance frontier [Cytron et al. 91] ‐ SSA back translation: [Sreedhar et al. 99] ‐ Basic optimization on SSA form: dead code elimination, copy propagation, common subexpression elimination, conditional constant propagation Useful transformation as an infrastructure for SSA form optimization ‐ Copy folding at SSA translation time, critical edge removal on control flow graph … ‐ Each variation and transformation can be made selectively Preliminary result ‐ 1.43 times faster than COINS w/o optimization ‐ 1.25 times faster than gcc w/o optimization

13 2. A comparison of two major algorithms for SSA translation Algorithm by Cytron [1991] Dominance frontier Algorithm by Sreedhar [1995] DJ-graph Comparison made to decide the algorithm to be included in COINS

14 x1 = 1x2 = 2 x3 =   (x1;L1, x2:L2) … = x3 x = 1x = 2 … = x L3 L2L1 L2 L3 (b) SSA form(a) Normal form Translating into SSA form (SSA translation)

15 0 100 200 300 400 500 600 700 800 900 01000200030004000 No. of nodes of control flow graph Translation time (milli sec) Cytron Sreedhar SSA translation time (usual programs) (The gap is due to the garbage collection)

16 (b) ladder graph(a) nested loop Peculiar programs

17 0 1000 2000 3000 4000 5000 6000 7000 8000 9000 01000200030004000 No. of nodes of control flow graph Translation time (milli sec) Cytron Sreedhar SSA translation time (nested loop programs)

18 0 500 1000 1500 2000 2500 3000 3500 01000200030004000 No. of nodes of control flow graph Translation time (milli sec) Cytron Sreedhar SSA translation time (ladder graph programs)

19 3. A comparison of two major algorithms for SSA back translation Algorithm by Briggs [1998] Insert copy statements Algorithm by Sreedhar [1999] Eliminate interference There have been no studies of comparison Comparison made on COINS

20 x1 = 1x2 = 2 x3 =   (x1;L1, x2:L2) … = x3 x1 = 1 x3 = x1 x2 = 2 x3 = x2 … = x3 L3 L2L1 L2 L3 (a) SSA form(b) Normal form Translating back from SSA form (SSA back translation)

21 x0 = 1 x1 =  (x0, x2) y = x1 x2 = 2 return y x0 = 1 x1 =  (x0, x2) x2 = 2 return x1 x0 = 1 x1 = x0 x2 = 2 x1 = x2 return x1 not correct block1 block3 block2 block1 block3 block2 block1 block3 block2 Copy propagation Back translation by na ï ve method Problems of naïve SSA back translation (lost copy problem)

22 x0 = 1 x1 =  (x0, x2) x2 = 2 return x1 x0 = 1 x1 = x0 x2 = 2 x1 = x2 return temp block1 block3 block2 block1 block3 block2 temp = x1 (a) SSA form(b) normal form after back translation live range of x1 live range of temp To remedy these problems... (i) SSA back translation algorithm by Briggs

23 x0 = 1 x1 =  (x0, x2) x2 = 2 return x1 live range of x0 x1 x2 x0 = 1 x1’ =  (x0, x2) x1 = x1’ x2 = 2 return x1 {x0, x1 ’, x2}  A x1 = A A = 2 A = 1 block1 block3 block2 block1 block3 block2 return x1 (a) SSA form(b) eliminating interference (c) normal form after back translation live range of x0 x1' x2 block1 block3 block2 (ii) SSA back translation algorithm by Sreedhar

24 SSA form BriggsBriggs + Coalescing Sreedhar Lost copy031 [1] Simple ordering052 [2] Swap075 [5]3 [3] Swap-lost0107 [7]4 [4] do096 [4]4 [2] fib040 [0] GCD095 [2] Selection Sort090 [0] Hige Swap083 [3]4 [4] No. of copies [no. of copies in loops] Empirical comparison of SSA back translation

25 Summary SSA form module of the COINS infrastructure Empirical comparison of algorithms for SSA translation gave criterion to make a good choice Empirical comparison of algorithms for SSA back translation clarified there is no single algorithm which gives optimal result

26 4.A Survey of Compiler Infrastructures SUIF * Machine SUIF * Zephyr * Scale gcc COINS Saiki & Gondow * National Compiler Infrastructure (NCI) project

27 An Overview of the SUIF2 System Monica Lam Stanford University http://suif.stanford.edu/ [PLDI 2000 tutorial]

28 OSUIF The SUIF System Interprocedural Analysis Parallelization Locality Opt Scalar opt Inst. Scheduling Register Allocation Alpha x86 CMachSUIF PGI Fortran SUIF2 JavaEDG C++ * C++ OSUIF to SUIF is incomplete EDG C *

29 Overview of SUIF Components (I) Basic Infrastructure Extensible IR and utilities Hoof: Suif object specification lang Standard IR Modular compiler system Pass submodule Data structures (e.g. hash tables) FE: PGI Fortran, EDG C/C++, Java SUIF1 / SUIF2 translators, S2c Interactive compilation: suifdriver Statement dismantlers SUIF IR consistency checkers Suifbrowser, TCL visual shell Linker Backend Infrastructure MachSUIF program representation Optimization framework Scalar optimizations common subexpression elimination deadcode elimination peephole optimizations Graph coloring register allocation Alpha and x86 backends Object-oriented Infrastructure OSUIF representationJava OSUIF -> SUIF lowering object layout and method dispatch

30 Overview of SUIF Components (II) High-Level Analysis Infrastructure Graphs, sccs Iterated dominance frontier Dot graph output Region framework Interprocedural analysis framework Presburger arithmetic (omega) Farkas lemma Gaussian elimination package Intraprocedural analyses copy propagation deadcode elimination Steensgaard’s alias analysis Call graph Control flow graphs Interprocedural region-based analyses: array dependence & privatization scalar reduction & privatization Interprocedural parallelization Affine partitioning for parallelism & locality unifies: unimodular transform (interchange, reversal, skewing) fusion, fission statement reindexing and scaling Blocking for nonperfectly nested loops

31 Memory/Memory vs File/File Passes Suif-file1 Suif-file2 Suif-file3 Suif-file4 driver+module1 driver+module2 driver+module3 Suif-file1 Suif-file4 Suifdriver imports/executes module1 module2 module3 A series of stand-alone programs A driver that imports & applies modules to program in memory COMPILER

32 Machine SUIF Michael D. Smith Harvard University Division of Engineering and Applied Sciences June 2000 © Copyright by Michael D. Smith 2000 All rights reserved.

33 Typical Backend Flow lower optimize realize optimize finalize layout output Object, assembly, or C code SUIF intermediate form Parameter bindings from dynamically-linked target libraries Machine-SUIF IR for real machine Machine-SUIF IR for idealized machine (suifvm)

34 parameterization libcfglibutil libbvd dcecse layout alpha lib x86 lib alpha lib libcfglibutil libbvd dcecse layout parameterization x86 lib alpha lib libcfglibutil libbvd dcecse layout parameterization x86 lib yfc mlib suif machsuif mlib suifvm lib deco mlib opi Target Parameterization Analysis/optimization passes written without direct encoding of target details Target details encapsulated in OPI functions and data structures Machine-SUIF passes work without modification on disparate targets

35 suif machsuif mlib opi parameterization alpha lib suifvm lib x86 lib libcfglibutil libbvd yfc mlib deco mlib dcecse layout Substrate Independence Optimizations, analyses, and target libraries are substrate-independent Machine SUIF is built on top of SUIF You could replace SUIF with Your Favorite Compiler Deco project at Harvard uses this approach

36 Fortran program C analyzer Java analyzer High level Intermediate Representation (HIR) Basic optimizer Data flow analyzer Common subexp elim Dead code elim HIR to LIR Low level intermediate representation (LIR) SSA optimizer LIR-SSA transformation Basic optimizer Advanced optimizer Code generator Sparc code generator New language analyzer C program Advanced optimizer Alias analyzer Loop optimizer Parallelization Java program New language program C program OpenMP program SPARC code SIMD parallelizer SIMD instruction selection based on machine descr x86 description Machine dependent optimizer SPARC description PowerPC description Register allocator Instruction scheduler X86 code New machine code Compiler control (schedule module invocation) Symbol table X86 description for code generation SPARC description for code generation New machine description for code generation Fortran analyzer Loop analyzer Coarse grain parallelizer Loop parallelizer C generation Phase 1 2000-2002 Phase 2 2003-2004 by COINS’s users source/ target Code generation based on machine description Overall structure of COINS

37 Features of COINS Multiple source languages Multiple target architectures HIR: abstract syntax tree with attributes LIR: register transfer level with formal specification Enabling source-to-source translation and application to software engineering Scalar analysis & optimization (in usual form and in SSA form) Basic parallelization (e.g. OpenMP) SIMD parallelization Code generators generated from machine description Written in Java (early error detection), publicly available [http://www.coins-project.org/]

38

39 x1=… = x1 y1=… z1=… x2=… = x2 y2=… z2=… = y1 = y2 x3=  (x1,x2) y3=  (y1,y2) z3=  (z1,z2) = z3 x1=… = x1 y1=… z1=… x2=… = x2 y2=… z2=… = y1 = y2 y3=  (y1,y2) z3=  (z1,z2) = z3 x1=… = x1 y1=… z1=… x2=… = x2 y2=… z2=… = y1 = y2 z3=  (z1,z2) = z3 x =… = x y =… z =… x =… = x y =… z =… = y = z Minimal SSA form Semi-pruned SSA formPruned SSA form Normal form Translating into SSA form (SSA translation)

40 Previous work: SSA form in compiler infrastructure SUIF (Stanford Univ.): no SSA form machine SUIF (Harvard Univ.): only one optimization in SSA form Scale (Univ. Massachusetts): a couple of SSA form optimizations. But it generates only C programs, and cannot generate machine code like in COINS. GCC: some attempts but experimental Only COINS will have full support of SSA form as a compiler infrastructure

41 Example of a Hoof Definition Uniform data access functions (get_ & set_) Automatic generation of meta class information etc. class New : public SuifObject { public: int get_x(); void set_x(int the_value); ~New(); void print(…); static const Lstring get_class_name(); … } concrete New { int x; } hoof C++

42 Input program for (i=0; i<10; i=i+1) { a[i]=i;... } HIR (abstract syntax tree with attributes) (for (assign ) (cmpLT ) (block (assign (subs > ).... ) (assign (add ) ) ) HIR (high-level intermediate representation)

43 LIR (set (mem (static (var i))) (const 0)) (labeldef _lab5) (jumpc (tstlt (mem (static (var i))) (const 10)) (list (label _lab6) (label _lab4)))) (labeldef _lab6) (set (mem (add (static (var a)) (mul (mem (static (var i))) (const 4)))) (mem (static (var i)))... Source program for (i=0; i<10; i=i+1){ a[i]=i;... } LIR (low-level intermediate representation)


Download ppt "Static Single Assignment Form in the COINS Compiler Infrastructure - Current Status and Background - Masataka Sassa, Toshiharu Nakaya, Masaki Kohama (Tokyo."

Similar presentations


Ads by Google