
1 Basic Block and Trace Chapter 8

2 Tree IR
Why is the Tree IR not used directly?
(1) There is a semantic gap between the Tree IR and machine languages.
(2) The IR as produced is not suitable for optimization and analysis.
Eg:
- Some expressions have side effects: ESEQ, CALL.
- The tree representation assumes no execution order among subtrees.
- Semantic gap: CJUMP has 2 targets, but a machine "jump on condition" has 1 target plus "fall through".

3 Semantic Gap (continued)
- An ESEQ within an expression is inconvenient: evaluation order matters.
- A CALL node within an expression causes a side effect!
- A CALL node within the argument expression of another CALL node causes a problem if results are passed in the same (one) register.
- Solution: rewrite the tree into an equivalent tree in canonical form, e.g. a nest of SEQs over S1..S5 becomes the list S1; S2; S3; S4; S5.

4 Transformation
Step 1: The tree is rewritten into a list of "canonical trees" without SEQ or ESEQ nodes.
  -> Tree.StmList linearize(Tree.Stm s);
Step 2: The list is grouped into a set of "basic blocks", which contain no internal jumps or labels.
  -> BasicBlocks
Step 3: The basic blocks are ordered into a set of "traces", in which every CJUMP is immediately followed by its false label.
  -> TraceSchedule(BasicBlocks b)

5 8.1 Canonical Trees
Definition: canonical trees have the following properties:
1. No SEQ or ESEQ.
2. The parent of each CALL is either EXP(..) or MOVE(TEMP t, ...).
=> Statements (SEQs) are separated from expressions.

6 Transformations on ESEQ — move the ESEQ to a higher level.
Case 1: ESEQ(s1, ESEQ(s2, e)) => ESEQ(SEQ(s1, s2), e)
Case 2:
  BINOP(op, ESEQ(s, e1), e2) => ESEQ(s, BINOP(op, e1, e2))
  MEM(ESEQ(s, e1)) => ESEQ(s, MEM(e1))
  JUMP(ESEQ(s, e1)) => SEQ(s, JUMP(e1))
  CJUMP(op, ESEQ(s, e1), e2, l1, l2) => SEQ(s, CJUMP(op, e1, e2, l1, l2))

7 Case 3 (when s and e1 do not commute):
  BINOP(op, e1, ESEQ(s, e2)) => ESEQ(MOVE(TEMP t, e1), ESEQ(s, BINOP(op, TEMP t, e2)))
  CJUMP(op, e1, ESEQ(s, e2), l1, l2) => SEQ(MOVE(TEMP t, e1), SEQ(s, CJUMP(op, TEMP t, e2, l1, l2)))

8 Case 4 (when s does not affect e1, and s and e1 have no I/O — i.e. s and e1 commute):
  BINOP(op, e1, ESEQ(s, e2)) => ESEQ(s, BINOP(op, e1, e2))
  CJUMP(op, e1, ESEQ(s, e2), l1, l2) => SEQ(s, CJUMP(op, e1, e2, l1, l2))
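The case 3/case 4 rewrites can be sketched in Java, the language of the book's later code fragments. The tiny IR classes and the trivial commute test below are illustrative stand-ins for the real Tree package, not its actual API:

```java
// Sketch of lifting an ESEQ out of a BINOP's right operand (cases 3 and 4).
// Stand-in IR classes; names are illustrative only.
public class EseqLift {
    static class Exp {}
    static class Stm {}
    static class Const extends Exp {}
    static class Temp extends Exp {}
    static class Eseq extends Exp { Stm s; Exp e; Eseq(Stm s, Exp e) { this.s = s; this.e = e; } }
    static class Binop extends Exp { String op; Exp l, r;
        Binop(String op, Exp l, Exp r) { this.op = op; this.l = l; this.r = r; } }
    static class Move extends Stm { Exp dst, src; Move(Exp d, Exp s) { dst = d; src = s; } }

    // Deliberately conservative: only a CONST is known to commute with anything.
    static boolean commute(Stm s, Exp e) { return e instanceof Const; }

    // BINOP(op, e1, ESEQ(s, e2)) rewrites to either
    //   ESEQ(s, BINOP(op, e1, e2))                     (case 4: s, e1 commute)
    //   ESEQ(MOVE(t, e1), ESEQ(s, BINOP(op, t, e2)))   (case 3: otherwise)
    public static Exp lift(Binop b) {
        if (!(b.r instanceof Eseq)) return b;
        Eseq es = (Eseq) b.r;
        if (commute(es.s, b.l))
            return new Eseq(es.s, new Binop(b.op, b.l, es.e));
        Temp t = new Temp();
        return new Eseq(new Move(t, b.l), new Eseq(es.s, new Binop(b.op, t, es.e)));
    }

    public static void main(String[] args) {
        Stm side = new Move(new Temp(), new Const());
        Exp lifted = lift(new Binop("+", new Const(), new Eseq(side, new Const())));
        System.out.println(lifted instanceof Eseq);   // true: ESEQ hoisted to the top
    }
}
```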

9 How can we tell if two expressions commute?
MOVE(MEM(x), y) vs. MEM(z):
- if x = z (aliased): they do not commute;
- if x ≠ z: they commute;
- but in general we don't know yet at compile time => be conservative!
CONST(n) can commute with any expression!
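A minimal sketch of this conservative policy: a store and an expression are treated as commuting only when the expression clearly cannot observe memory (a CONST, or a NAME denoting a label). Class names are stand-ins, not the book's API:

```java
// Conservative commute test for MOVE(MEM(x), y) vs. MEM(z): since we cannot
// prove x != z, a MEM read never commutes with a store; CONST/NAME always do.
public class Commute {
    static class Exp {}
    static class ConstExp extends Exp {}   // CONST(n): commutes with anything
    static class NameExp extends Exp {}    // NAME(l): a label address, no memory access
    static class TempExp extends Exp {}
    static class MemExp extends Exp { Exp addr; MemExp(Exp a) { addr = a; } }

    static class Stm {}
    static class MoveMemStm extends Stm {  // MOVE(MEM(x), y): a store
        Exp x, y; MoveMemStm(Exp x, Exp y) { this.x = x; this.y = y; } }

    public static boolean commute(Stm s, Exp e) {
        // "Be conservative": say yes only when e cannot interact with memory.
        return e instanceof ConstExp || e instanceof NameExp;
    }

    public static void main(String[] args) {
        Stm store = new MoveMemStm(new TempExp(), new TempExp());
        System.out.println(commute(store, new ConstExp()));            // true
        System.out.println(commute(store, new MemExp(new TempExp()))); // false
    }
}
```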

10 General Rewriting Rules
1. Identify the subexpressions.
2. Pull the ESEQs out of the stm or exp.
Ex: [e1, e2, ESEQ(s, e3)]
  -> (s; [e1, e2, e3])  if s, e1, e2 commute
  -> (SEQ(MOVE(t1, e1), SEQ(MOVE(t2, e2), s)); [TEMP(t1), TEMP(t2), e3])  otherwise
  -> (SEQ(MOVE(t1, e1), s); [TEMP(t1), e2, e3])  if only s and e2 commute
=> reorder(ExpList exps) returns a pair (stms; ExpList).

11 Moving CALLs to Top Level
Every CALL returns its result in the same register, TEMP(RV).
Problem: in BINOP(+, CALL(...), CALL(...)), the second call overwrites TEMP(RV) before the first result is used.
Solution: rewrite CALL(fun, args) -> ESEQ(MOVE(TEMP t, CALL(fun, args)), TEMP t), then eliminate the ESEQ. This needs an extra TEMP t (register) per call.
do_stm(MOVE(TEMP tnew, CALL(f, args))) and do_stm(EXP(CALL(f, args)))
- will not reorder on the CALL node itself,
- will reorder on f and args as the children of the MOVE.
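The CALL-lifting rewrite is small enough to sketch directly; again the IR classes are stand-ins for the real Tree package:

```java
// Sketch of moving a CALL to top level: the result is moved into a fresh
// TEMP immediately, before any later CALL can clobber the return-value
// register. Stand-in IR classes; names are illustrative.
public class CallLift {
    static class Exp {}
    static class Stm {}
    static class Call extends Exp { String fun; Call(String f) { fun = f; } }
    static class Temp extends Exp {}
    static class Move extends Stm { Exp dst, src; Move(Exp d, Exp s) { dst = d; src = s; } }
    static class Eseq extends Exp { Stm s; Exp e; Eseq(Stm s, Exp e) { this.s = s; this.e = e; } }

    // CALL(f, args)  =>  ESEQ(MOVE(TEMP t, CALL(f, args)), TEMP t)
    public static Exp lift(Call c) {
        Temp t = new Temp();
        return new Eseq(new Move(t, c), t);
    }

    public static void main(String[] args) {
        Exp e = lift(new Call("f"));
        // The CALL now sits directly under a MOVE, as the canonical form requires.
        System.out.println(e instanceof Eseq
            && ((Eseq) e).s instanceof Move
            && ((Move) ((Eseq) e).s).src instanceof Call);   // true
    }
}
```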

12 A Linear List of Statements
Repeatedly rotate SEQs: SEQ(SEQ(a, b), c) => SEQ(a, SEQ(b, c)), so that a nest of SEQs reads as the flat list a; b; c.
linear(Stm s) produces the flattened statement list.
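The flattening itself is a simple recursive walk; a sketch with a toy Stm class (names are illustrative):

```java
// Flatten a nest of SEQ nodes into a left-to-right statement list:
// SEQ(SEQ(a, b), c) becomes [a, b, c].
import java.util.ArrayList;
import java.util.List;

public class Linearize {
    static class Stm { String name; Stm(String n) { name = n; } }
    static class Seq extends Stm {
        Stm left, right;
        Seq(Stm l, Stm r) { super("SEQ"); left = l; right = r; }
    }

    public static void linear(Stm s, List<Stm> out) {
        if (s instanceof Seq) {
            linear(((Seq) s).left, out);    // left subtree first
            linear(((Seq) s).right, out);
        } else {
            out.add(s);                     // a non-SEQ statement is emitted as-is
        }
    }

    public static void main(String[] args) {
        Stm a = new Stm("a"), b = new Stm("b"), c = new Stm("c");
        List<Stm> out = new ArrayList<>();
        linear(new Seq(new Seq(a, b), c), out);
        for (Stm s : out) System.out.print(s.name);   // prints abc
    }
}
```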

13 Taming Conditional Branches
Basic block: a sequence of statements that is entered at the beginning and exited at the end.
- The first stmt is a LABEL.
- The last stmt is a JUMP or a CJUMP.
- There are no other LABELs, JUMPs, or CJUMPs inside.

14 Algorithm
Scan from beginning to end:
- when a LABEL is found, begin a new block;
- when a JUMP or CJUMP is found, the current block is ended;
- if a block ends without a JUMP or CJUMP, insert a JUMP to the next block's LABEL;
- if a block begins without a LABEL, invent a fresh LABEL for it.
Epilogue block of the function: label it DONE, and put JUMP DONE at the end of the body of the function.
-> Canon.BasicBlocks
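The scan can be sketched over a simplified representation in which statements are strings and only "LABEL x", "JUMP x", and "CJUMP t f" are control statements (an assumption for illustration; the real Canon module works on Tree statements):

```java
// Partition a statement list into basic blocks: each block starts with a
// LABEL (invented if missing) and ends with a JUMP or CJUMP (inserted on
// fall-through, and JUMP DONE at the end of the function body).
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class Blocks {
    public static List<List<String>> partition(List<String> stms) {
        List<List<String>> blocks = new ArrayList<>();
        List<String> cur = null;
        int fresh = 0;
        for (String s : stms) {
            if (s.startsWith("LABEL")) {
                if (cur != null) {                      // fall through: add explicit JUMP
                    cur.add("JUMP " + s.split(" ")[1]);
                    blocks.add(cur);
                }
                cur = new ArrayList<>();
                cur.add(s);
            } else {
                if (cur == null) {                      // block must start with a LABEL
                    cur = new ArrayList<>();
                    cur.add("LABEL L" + fresh++);
                }
                cur.add(s);
                if (s.startsWith("JUMP") || s.startsWith("CJUMP")) {
                    blocks.add(cur);                    // JUMP/CJUMP ends the block
                    cur = null;
                }
            }
        }
        if (cur != null) { cur.add("JUMP DONE"); blocks.add(cur); }  // epilogue
        return blocks;
    }

    public static void main(String[] args) {
        List<List<String>> bs = partition(Arrays.asList(
            "s1", "CJUMP t f", "LABEL t", "s2", "LABEL f", "s3"));
        System.out.println(bs.size());   // prints 3
    }
}
```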

15 Traces
Trace: a sequence of stmts that could be consecutively executed during the execution of the program.
We want a set of traces that exactly covers the program: each block appears in exactly one trace.
To reduce JUMPs, fewer traces are preferred!

16 Idea
Order the blocks so that each CJUMP is followed by its false label: the JUMP to the false label can then be removed, and the CJUMP maps directly onto the machine's "jump on condition" (branch on true, fall through on false). If the true label follows instead, negate the condition and swap the labels.

17 Algorithm 8.2 (Canon.TraceSchedule)
Put all the blocks of the program into a list Q.
while Q is not empty:
  Start a new (empty) trace, call it T.
  Remove the head element b from Q.
  while b is not marked:
    Mark b; append b to T.
    Examine the successors of b.
    if there is any unmarked successor c:
      b <- c
  End the current trace T.
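Algorithm 8.2 can be sketched with blocks named by their labels and a successor map (for a CJUMP, listing the false label first so the trace prefers to fall through to it). This simplified representation is an assumption for illustration:

```java
// Trace scheduling (Algorithm 8.2 sketch): grow each trace by following an
// unmarked successor until none remains, then start a new trace from the
// next unmarked block in the queue.
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class TraceSchedule {
    public static List<List<String>> traces(List<String> q,
                                            Map<String, List<String>> succ) {
        Set<String> marked = new HashSet<>();
        List<List<String>> out = new ArrayList<>();
        Deque<String> queue = new ArrayDeque<>(q);
        while (!queue.isEmpty()) {
            List<String> t = new ArrayList<>();          // start a new trace T
            String b = queue.removeFirst();
            while (!marked.contains(b)) {
                marked.add(b);
                t.add(b);                                // append b to T
                String next = null;
                for (String c : succ.getOrDefault(b, List.of()))
                    if (!marked.contains(c)) { next = c; break; }
                if (next == null) break;                 // end the current trace
                b = next;
            }
            if (!t.isEmpty()) out.add(t);
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, List<String>> succ = Map.of(
            "a", List.of("b", "c"),    // CJUMP in a: false label b listed first
            "b", List.of("d"),
            "c", List.of("d"));
        System.out.println(traces(new ArrayList<>(List.of("a", "b", "c", "d")), succ));
        // prints [[a, b, d], [c]]
    }
}
```

Note how the false branch b lands immediately after a, so the CJUMP in a needs no JUMP to its false label.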

18 Finishing Up
- Analysis and optimization are efficient at the basic-block level (not at the individual-stmt level).
- Some local rearrangement remains:
(1) CJUMP immediately followed by its false label => OK.
(2) CJUMP immediately followed by its true label => reverse the condition and swap the labels.
(3) CJUMP lt, lf followed by neither => rewrite as CJUMP lt, lf'; LABEL lf'; JUMP lf.
- On a machine with only "jump on true": lt; JUMP lf => a chance to optimize!
- Finding the optimal trace is not easy!

19 Instruction Selection Chapter 9

20 What we are going to do
Translate Tree IR into machine instructions (for the Jouette architecture, or SPARC, MIPS, Pentium, ...).
Ex: MEM(BINOP(+, ea, CONST c)) => LOAD r1 <- M[ea + c]

21 Machine Example — the Jouette Architecture
Register r0 always contains zero.
ADD   ri <- rj + rk    tile: BINOP(+)
MUL   ri <- rj * rk    tile: BINOP(*)
SUB   ri <- rj - rk    tile: BINOP(-)
DIV   ri <- rj / rk    tile: BINOP(/)
ADDI  ri <- rj + c     tiles: +(e, CONST), +(CONST, e), CONST
SUBI  ri <- rj - c     tile: -(e, CONST)
LOAD  ri <- M[rj + c]  tiles: MEM(+(e, CONST)), MEM(+(CONST, e)), MEM(CONST), MEM(e)
These instructions produce a result in a register => Exp tiles. TEMP by itself is also a tile (the value is already in a register).

22 STORE M[rj + c] <- ri  tiles: MOVE(MEM(+(e, CONST)), e), MOVE(MEM(+(CONST, e)), e), MOVE(MEM(CONST), e), MOVE(MEM(e), e)
MOVEM M[rj] <- M[ri]      tile: MOVE(MEM(e), MEM(e))
Execution of these instructions produces side effects on memory => Stm tiles.

23 Tiling the IR Tree
Ex: a[i] := x, where i is in a register and a, x are frame variables:
MOVE(MEM(+(MEM(+(FP, CONST a)), *(TEMP i, CONST 4))), MEM(+(FP, CONST x)))
2 LOAD  r1 <- M[fp + a]
4 ADDI  r2 <- r0 + 4
5 MUL   r2 <- ri * r2
6 ADD   r1 <- r1 + r2
8 LOAD  r2 <- M[fp + x]
9 STORE M[r1 + 0] <- r2

24 Another Solution
Same tree (a[i] := x), this time ending with a memory-to-memory move:
2 LOAD  r1 <- M[fp + a]
4 ADDI  r2 <- r0 + 4
5 MUL   r2 <- ri * r2
6 ADD   r1 <- r1 + r2
8 ADDI  r2 <- fp + x
9 MOVEM M[r1] <- M[r2]

25 Or Another
The same tree tiled with a different set of tile patterns (smaller tiles, more instructions):
1 ADDI  r1 <- r0 + a
2 ADD   r1 <- fp + r1
3 LOAD  r1 <- M[r1 + 0]
4 ADDI  r2 <- r0 + 4
5 MUL   r2 <- ri * r2
6 ADD   r1 <- r1 + r2
7 ADDI  r2 <- r0 + x
8 ADD   r2 <- fp + r2
9 LOAD  r2 <- M[r2 + 0]
10 STORE M[r1 + 0] <- r2

26 Optimal and Optimum Tilings
Optimum tiling: one whose tile costs sum to the lowest possible value (cost of a tile: instruction execution time, number of bytes, ...).
Optimal tiling: one where no two adjacent tiles can be combined into a single tile of lower cost.
Every optimum tiling is optimal, but not every optimal tiling is optimum. Then why keep optimal algorithms? Because they are simpler and usually good enough.

27 Algorithms for Instruction Selection
1. Optimal vs. optimum: optimal is simple; optimum may be hard.
2. CISC (Complex Instruction Set Computer) vs. RISC:
   - tile size: large (CISC) vs. small (RISC)
   - optimal >= optimum (CISC) vs. optimal ~= optimum (RISC)
   - instruction cost: varies with addressing mode (CISC) vs. almost the same (RISC)

28 Maximal Munch — an optimal tiling algorithm
1. Starting at the root, find the largest tile that fits.
2. Repeat step 1 for each subtree that remains uncovered.
3. Generate the instruction for each tile. The instructions come out in reverse order, so traverse the tree of tiles in post-order.
When several tiles match, select the largest tile (the one that covers the most nodes). If matching tiles have the same size, choose an arbitrary one.
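The algorithm can be sketched on a toy expression tree with a small Jouette-like tile set: the largest matching tile is tried first at each node, and the instruction is emitted after its uncovered subtrees (post-order). The node representation is an assumption for illustration:

```java
// Maximal munch sketch: largest tile first at each node. Tiles here:
// MEM(+(e, CONST)) -> LOAD (3 nodes), then 1-node fallbacks for MEM, +, CONST.
import java.util.ArrayList;
import java.util.List;

public class Munch {
    static class Node {
        String op; Node[] kids; int val;
        Node(String op, Node... kids) { this.op = op; this.kids = kids; }
    }

    static List<String> code = new ArrayList<>();
    static int temps = 0;

    // Returns the name of the register holding the value of e.
    static String munch(Node e) {
        if (e.op.equals("MEM") && e.kids[0].op.equals("+")
                && e.kids[0].kids[1].op.equals("CONST")) {   // largest tile first
            String r = munch(e.kids[0].kids[0]);             // uncovered subtree
            String d = "r" + (temps++);
            code.add("LOAD " + d + " <- M[" + r + "+" + e.kids[0].kids[1].val + "]");
            return d;
        }
        if (e.op.equals("MEM")) {
            String r = munch(e.kids[0]);
            String d = "r" + (temps++);
            code.add("LOAD " + d + " <- M[" + r + "+0]");
            return d;
        }
        if (e.op.equals("+")) {
            String a = munch(e.kids[0]), b = munch(e.kids[1]);
            String d = "r" + (temps++);
            code.add("ADD " + d + " <- " + a + "+" + b);
            return d;
        }
        if (e.op.equals("CONST")) {
            String d = "r" + (temps++);
            code.add("ADDI " + d + " <- r0+" + e.val);
            return d;
        }
        return e.op;   // TEMP/FP: the value is already in a register
    }

    public static void main(String[] args) {
        Node c = new Node("CONST"); c.val = 8;
        munch(new Node("MEM", new Node("+", new Node("FP"), c)));
        System.out.println(code);   // prints [LOAD r0 <- M[FP+8]]
    }
}
```

Because the 3-node LOAD tile wins over the 1-node MEM, +, and CONST tiles, the whole tree becomes a single instruction.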

29 Implementation
See Program 9.3 for an example (p. 181).
- A case statement for each root node type.
- There is at least one tile for each type of root node!

30 Dynamic Programming — finding the optimum tiling
Find optimum solutions based on the optimum solutions of each subproblem:
1. Assign a cost to every node in the tree, bottom-up.
2. Find the tiles that match at the node.
3. Compute the cost of each match: the tile's cost plus the costs of the subtrees it leaves uncovered.
4. Choose the best one.
5. Let that cost be the value of the node.
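The bottom-up cost assignment can be sketched with a tiny Jouette-like tile subset in which every instruction costs 1 (the node representation and tile set are assumptions for illustration):

```java
// Dynamic-programming tiling sketch: each node's cost is the minimum over
// all matching tiles of (1 + costs of the subtrees the tile leaves uncovered).
public class DpTile {
    static class Node {
        String op; Node[] kids; int cost = Integer.MAX_VALUE; String best;
        Node(String op, Node... kids) { this.op = op; this.kids = kids; }
    }

    static void assign(Node n) {
        for (Node k : n.kids) assign(k);            // children first (bottom-up)
        switch (n.op) {
            case "TEMP":
                n.cost = 0; n.best = "(register)"; break;
            case "CONST":
                n.cost = 1; n.best = "ADDI"; break; // ADDI r <- r0 + c
            case "+":
                n.cost = 1 + n.kids[0].cost + n.kids[1].cost; n.best = "ADD";
                if (n.kids[1].op.equals("CONST")) { // ADDI tile absorbs the CONST
                    int c = 1 + n.kids[0].cost;
                    if (c < n.cost) { n.cost = c; n.best = "ADDI"; }
                }
                break;
            case "MEM":
                n.cost = 1 + n.kids[0].cost; n.best = "LOAD";
                if (n.kids[0].op.equals("+") && n.kids[0].kids[1].op.equals("CONST")) {
                    int c = 1 + n.kids[0].kids[0].cost;  // LOAD ri <- M[rj + c]
                    if (c < n.cost) { n.cost = c; n.best = "LOAD (reg+const)"; }
                }
                break;
        }
    }

    public static void main(String[] args) {
        // MEM(+(TEMP fp, CONST 8)): the 3-node LOAD tile gives cost 1, not 2.
        Node tree = new Node("MEM",
            new Node("+", new Node("TEMP"), new Node("CONST")));
        assign(tree);
        System.out.println(tree.cost + " " + tree.best);   // prints 1 LOAD (reg+const)
    }
}
```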

31 Example: a MEM node
For MEM(+(CONST1, CONST2)), the candidate tiles at the MEM node include:
  LOAD ri <- M[rj]      matches MEM(e):            cost = 1 + cost(e)
  LOAD ri <- M[rj + c]  matches MEM(+(e, CONST)):  cost = 1 + cost(e)
The costs of the subtrees (e.g. an ADDI for a CONST) are already known, so the cheapest match is chosen.

32 Tree Grammars
Example: the Schizo-Jouette machine — a generalization of dynamic programming for machines with a complex instruction set, several classes of registers, and addressing modes.
ai: address register; dj: data register.
ADD   di <- dj + dk
MUL   di <- dj * dk
SUB   di <- dj - dk
DIV   di <- dj / dk
ADDI  di <- dj + c
SUBI  di <- dj - c
MOVEA dj <- ai
MOVED aj <- di

33 LOAD  di <- M[aj + c]
STORE M[aj + c] <- di
MOVEM M[aj] <- M[ai]

34 Use a context-free grammar to describe the tiles.
Nonterminals: s (statement), d (data register), a (address register).
d -> MEM(+(a, CONST))            LOAD
d -> MEM(+(CONST, a))            LOAD
d -> MEM(CONST)                  LOAD
d -> MEM(a)                      LOAD
d -> a                           MOVEA
a -> d                           MOVED
s -> MOVE(MEM(+(a, CONST)), d)   STORE
s -> MOVE(MEM(a), MEM(a))        MOVEM
=> The grammar is ambiguous! -> Parse based on the minimum cost.

35 Efficiency of Tiling Algorithms
Execution-cost measures for maximal munch and dynamic programming:
T:  number of different tiles.
K:  number of non-leaf nodes per tile (on average).
K': largest number of nodes that must be examined to choose the right tile ~= the size of the largest tile.
T': average number of tile patterns that match at each tree node.
Ex: for a RISC machine, T = 50, K = 2, K' = 4, T' = 5.

36 N: number of input nodes in the tree.
Complexity of maximal munch = (N/K) * (K' + T'): about N/K tiles are matched, and at each one K' nodes are examined and T' patterns tried to find the matching pattern.
Complexity of dynamic programming = N * (K' + T'): every node is visited to find its minimum cost.
Both are "linear in N". With the RISC numbers above, maximal munch does about (N/2) * (4 + 5) = 4.5N pattern examinations, dynamic programming about 9N.

37 RISC vs. CISC
RISC:
1. 32 registers.
2. Only one class of integer/pointer registers.
3. Arithmetic operations only between registers.
4. "Three-address" instruction form: r1 <- r2 op r3.
5. Load and store instructions with only the M[reg + const] addressing mode.
6. Every instruction exactly 32 bits long.
7. One result or effect per instruction.

38 CISC (Complex Instruction Set Computers)
Complex addressing modes:
1. Few registers (16 or 8 or 6).
2. Registers divided into different classes.
3. Arithmetic operations can access registers or memory through "addressing modes".
4. "Two-address" instructions of the form r1 <- r1 op r2.
5. Several different addressing modes.
6. Variable-length instruction format.
7. Instructions with side effects, e.g. auto-increment/decrement.

39 Solutions for CISC
1. Few registers: handle it in the register allocation phase.
2. Classes of registers: specify the operands and result explicitly.
   Ex: the left operand of an arithmetic op (e.g. mul) must be eax.
   t1 <- t2 * t3 ==>
     mov eax, t2    ; eax <- t2
     mul t3         ; eax <- eax * t3; edx <- garbage
     mov t1, eax    ; t1 <- eax
3. Two-address instructions: add an extra move instruction (-> register allocation may coalesce it).
   t1 <- t2 + t3 ==>
     mov t1, t2     ; t1 <- t2
     add t1, t3     ; t1 <- t1 + t3

40 4. Arithmetic operations can address memory:
- actually handled by the "register spill" phase;
- load the memory operand into a register, then store the result back into memory -> may trash registers!
Ex: add [ebp - 8], ecx is equivalent to:
  mov eax, [ebp - 8]
  add eax, ecx
  mov [ebp - 8], eax

41 5. Several addressing modes:
- a complex mode takes time to execute (no faster than the equivalent multi-instruction sequence), but it "trashes" fewer registers and gives a shorter instruction sequence;
- select the appropriate patterns for each addressing mode.
6. Variable-length instructions: let the assembler generate the binary code.
7. Instructions with side effects, e.g. r2 <- M[r1]; r1 <- r1 + 4, are difficult to model!
(a) ignore the auto-increment -> forget it!
(b) try to match special idioms;
(c) try to invent new algorithms.

42 Abstract Assembly-Language Instructions
Assembly-language instructions without register assignment:

package assem;
public abstract class Instr {
  public String assem;                   // instr template
  public abstract temp.TempList use();   // return src list
  public abstract temp.TempList def();   // return dst list
  public abstract Targets jumps();       // return jump targets
  public String format(temp.TempMap m);  // text of assem instr
}
public Targets(temp.LabelList labels);

43 // dst, src and jump can be null.
public OPER(String assem, TempList dst, TempList src, temp.LabelList jump);
public OPER(String assem, TempList dst, TempList src);
public MOVE(String assem, Temp dst, Temp src);
public LABEL(String assem, temp.Label label);

44 Example
assem.Instr is independent of the target machine assembly.
Ex: MEM(+(fp, CONST(8))) ==>
new OPER("LOAD 'd0 <- M['s0 + 8]",
         new TempList(new Temp(), null),
         new TempList(frame.FP(), null));
Calling format(...) on the above Instr, we get
  LOAD r1 <- M[r27 + 8]
assuming the register allocator assigns r1 to the new Temp, and r27 is the frame-pointer register.

45 Another Example
MUL(+(TEMP(t87), CONST(3)), MEM(TEMP(t92))):
  assem                     dst     src
  ADDI 'd0 <- 's0 + 3       t908    t87
  LOAD 'd0 <- M['s0 + 0]    t909    t92
  MUL  'd0 <- 's0 * 's1     t910    t908, t909
After register allocation, the instrs look like:
  ADDI r1 <- r12 + 3        t908/r1   t87/r12
  LOAD r2 <- M[r13 + 0]     t909/r2   t92/r13
  MUL  r1 <- r1 * r2        t910/r1
Two-address instructions:
  t1 <- t1 + t2 ==>
  assem                     dst     src
  add 'd0, 's1              t1      t1, t2