1 Languages and Compilers (SProg og Oversættere) Lecture 10 Bent Thomsen Department of Computer Science Aalborg University With acknowledgement to Norm Hutchinson whose slides this lecture is based on.

2 Where are we (going)? Dependency diagram of a typical multi-pass compiler: the Compiler Driver calls the Syntactic Analyzer, the Contextual Analyzer, and the Code Generator in turn. A multi-pass compiler makes several passes over the program. The output of a preceding phase is stored in a data structure and used by subsequent phases.
    Syntactic Analyzer:  input Source Text,   output AST
    Contextual Analyzer: input AST,           output Decorated AST
    Code Generator:      input Decorated AST, output Object Code

3 Code Generation A compiler translates a program from a high-level language into an equivalent program in a low-level language.
    Triangle Program --compile--> TAM Program --run--> Result
    Java Program     --compile--> JVM Program --run--> Result
    C Program        --compile--> x86 Program --run--> Result
We shall look at this in more detail in the next couple of lectures.

4 Triangle Abstract Machine Architecture TAM is a stack machine –There are no data registers as in register machines. –The temporary data are stored in the stack. But, there are special registers (Table C.1 of page 407) TAM Instruction Set –Instruction Format (Figure C.5 of page 408) –op: opcode (4 bits) r: special register number (4 bits) n: size of the operand (8 bits) d: displacement (16 bits) Instruction Set –Table C.2 of page 409

5 TAM Registers

6 TAM Code Machine code consists of 32-bit instructions in the code store:
    op (4 bits):  type of instruction
    r  (4 bits):  register
    n  (8 bits):  size
    d  (16 bits): displacement
Example: LOAD (1) 3[LB]
    op = 0 (0000)
    r  = 8 (1000)
    n  = 1 (00000001)
    d  = 3 (0000000000000011)
    encoded: 0000 1000 0000 0001 0000 0000 0000 0011
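The field layout above can be checked with a few lines of Python. This is only a sketch: the function name encode_tam is made up for illustration, and treating d as a signed 16-bit field is an assumption based on the negative displacements used later in the lecture.

```python
def encode_tam(op: int, r: int, n: int, d: int) -> int:
    """Pack the four TAM instruction fields into one 32-bit word."""
    assert 0 <= op < 16 and 0 <= r < 16 and 0 <= n < 256
    assert -(1 << 15) <= d < (1 << 15)      # assumption: d is signed
    # op occupies the top 4 bits, then r (4), n (8), d (16).
    return (op << 28) | (r << 24) | (n << 16) | (d & 0xFFFF)


word = encode_tam(op=0, r=8, n=1, d=3)       # LOAD (1) 3[LB]
print(f"{word:032b}")
```

The printed bit string can be compared group by group with the encoding shown on the slide.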

7 TAM Instruction set

8 TAM Architecture Two Storage Areas –Code Store (32 bit words) Code Segment: to store the code of the program to run –Pointed to by CB and CT Primitive Segment: to store the code for primitive operations –Pointed to by PB and PT –Data Store (16 bit words) Stack –global segment at the base of the stack »Pointed to by SB –stack area for stack frames of procedure and function calls »Pointed to by LB and ST Heap –heap area for the dynamic allocation of variables »Pointed to by HB and HT

9 TAM Architecture

10 Expression Evaluation on a Stack Machine (SM) On a stack machine, the intermediate results are stored on a stack. Operations take their arguments from the top of the stack and put the result back on the stack. Typical instructions: STORE a, LOAD x, MULT, SUB, ADD. A stack machine is very natural for expression evaluation (see the examples on the next two slides). Compared with a register machine, it requires more instructions for the same expression, but the instructions are simpler.

11 Expression Evaluation on a Stack Machine Example 1: Computing (a * b) + (1 - (c * 2)) on a stack machine.
    LOAD a      // stack: a
    LOAD b      // stack: a b
    MULT        // stack: (a*b)
    LOAD #1     // stack: (a*b) 1
    LOAD c      // stack: (a*b) 1 c
    LOAD #2     // stack: (a*b) 1 c 2
    MULT        // stack: (a*b) 1 (c*2)
    SUB         // stack: (a*b) (1-(c*2))
    ADD         // stack: (a*b)+(1-(c*2))
Note the correspondence between the instructions and the expression written in postfix notation: a b * 1 c 2 * - +
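To see the evaluation order in action, here is a minimal interpreter sketch for the instruction sequence of Example 1. The tuple encoding of instructions and the name LOADL for "LOAD #n" are choices made for this illustration, not part of the slide.

```python
def run(program, env):
    """Execute (opcode, operand) pairs on an evaluation stack."""
    stack = []
    for op, arg in program:
        if op == "LOAD":            # push the value of a variable
            stack.append(env[arg])
        elif op == "LOADL":         # push a literal (LOAD #n on the slide)
            stack.append(arg)
        else:                       # binary operator: pop two, push one
            right = stack.pop()
            left = stack.pop()
            if op == "ADD":
                stack.append(left + right)
            elif op == "SUB":
                stack.append(left - right)
            elif op == "MULT":
                stack.append(left * right)
            else:
                raise ValueError(f"unknown opcode {op}")
    return stack.pop()


# (a * b) + (1 - (c * 2)), in the order a b * 1 c 2 * - +
EXAMPLE_1 = [("LOAD", "a"), ("LOAD", "b"), ("MULT", None),
             ("LOADL", 1), ("LOAD", "c"), ("LOADL", 2), ("MULT", None),
             ("SUB", None), ("ADD", None)]

print(run(EXAMPLE_1, {"a": 2, "b": 3, "c": 4}))
```

Note that SUB pops the right operand first: the operand pushed last is the one on top.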

12 Expression Evaluation on a Stack Machine Example 2: Computing (0 < n) && odd(n) on a stack machine.
    LOAD #0     // stack: 0
    LOAD n      // stack: 0 n
    LT          // stack: (0<n)
    LOAD n      // stack: (0<n) n
    CALL odd    // stack: (0<n) odd(n)
    AND         // stack: (0<n)&&odd(n)
This example illustrates that calling functions/procedures fits in just as naturally with the stack machine evaluation model as operations that correspond to machine instructions. In register machines this is much more complicated, because a stack must be created in memory for managing subroutine calls/returns.

13 Global Variables and Assignment Commands
Triangle source code:
    ! simple expression and assignment
    let
      var n: Integer
    in begin
      n := 5;
      n := n + 1
    end
TAM assembler code:
    0: PUSH 1
    1: LOADL 5
    2: STORE (1) 0[SB]
    3: LOAD (1) 0[SB]
    4: LOADL 1
    5: CALL add
    6: STORE (1) 0[SB]
    7: POP (0) 1
    8: HALT

14 The “Phases” of a Compiler Source Program -> Syntax Analysis -> Abstract Syntax Tree -> Contextual Analysis -> Decorated Abstract Syntax Tree -> Code Generation -> Object Code. Syntax Analysis and Contextual Analysis may also produce Error Reports. Code Generation: next lecture.

15 Storage Allocation A compiler translates a program from a high-level language into an equivalent program in a low-level language. The low level program must be equivalent to the high-level program. => High-level concepts must be modeled in terms of the low-level machine. This lecture is not about the code generation phase itself, but about the way we represent high-level structures in terms of a typical low- level machine’s memory architecture and machine instructions. => We need to know this before we can talk about code generation.

16 What This Lecture is About How to model high-level computational structures and data structures in terms of low-level memory and machine instructions.
    High-level program: Procedures, Expressions, Variables, Arrays, Records, Objects, Methods
    Low-level language processor: Registers, Machine Instructions, Bits and Bytes, Machine Stack

17 Data Representation Data Representation: how to represent values of the source language on the target machine. High-level data structures (Records, Arrays, Strings, Integer, Char) must be mapped onto a low-level memory model: a sequence of words at addresses 0, 1, 2, 3, ..., each holding a bit pattern. Note: addressing schema and size of “memory units” may vary.

18 Data Representation Important properties of a representation schema: non-confusion: different values of a given type should have different representations uniqueness: Each value should always have the same representation. These properties are very desirable, but in practice they are not always satisfied: Example: confusion: approximated floating point numbers. non-uniqueness: one’s complement representation of integers +0 and -0

19 Data Representation Important issues in data representation: constant-size representation: The representation of all values of a given type should occupy the same amount of space. direct versus indirect representation: in a direct representation of a value x, the variable holds the bit pattern itself; in an indirect representation of a value x, the variable holds a handle (pointer) to the bit pattern, which is stored elsewhere.

20 Indirect Representation Q: What reasons could there be for choosing indirect representations? To make the representation “constant size” even if the representation requires different amounts of memory for different values. A small x and a big x have bit patterns of different sizes, but both are represented by pointers => same size.

21 Indirect versus Direct The choice between indirect and direct representation is a key decision for a language designer/implementer. Direct representations are often preferable for efficiency: more efficient access (no need to follow pointers) and a more efficient “storage class” (e.g. stack rather than heap allocation). For types with widely varying size of representation it is almost a must to use indirect representation (see previous slide). Languages like Pascal, C, C++ try to use direct representation wherever possible. Languages like Scheme and ML use indirect representation almost everywhere (because of polymorphic higher-order functions). Java: primitive types direct, “reference types” (e.g. objects and arrays) indirect.

22 Data Representation We now survey representation of the data types found in Triangle, assuming direct representations wherever possible. We will discuss representation of values of: Primitive Types Record Types Static Array Types We will use the following notations (if T is a type): #[T] The cardinality of the type (i.e. the number of possible values) size[T] The size of the representation (in number of bits/bytes)

23 Data Representation: Primitive Types What is a primitive type? The primitive types of a programming language are those types that cannot be decomposed into simpler types, for example integer, boolean, char, etc. Type: boolean has two values, true and false => #[boolean] = 2 => size[boolean] ≥ 1 bit. Possible representations:
    Value   1 bit   byte (option 1)   byte (option 2)
    false   0       00000000          00000000
    true    1       00000001          11111111
Note: In general, if #[T] = n then size[T] ≥ $\log_2 n$ bits.

24 Data Representation: Primitive Types Type: integer Fixed-size representation, usually chosen based on what is efficiently supported by the target machine. Typically uses one word (16 bits, 32 bits, or 64 bits) of storage. size[integer] = word (= 16 bits) => #[integer] ≤ $2^{16}$ = 65536. Modern processors use two’s complement representation of integers. Example: 1000010010010111. The leftmost bit is multiplied by $-(2^{15})$; every other bit is multiplied by $2^{n}$, where n is the bit position counted from the right, starting at 0:
    $\text{Value} = -1\cdot 2^{15} + 0\cdot 2^{14} + \dots + 0\cdot 2^{3} + 1\cdot 2^{2} + 1\cdot 2^{1} + 1\cdot 2^{0}$
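The weighting rule for two's complement can be written out directly. The sketch below takes the bit pattern as a string, which is an illustrative choice; the function name is hypothetical.

```python
def twos_complement_value(bits: str) -> int:
    """Value of a two's complement bit string, leftmost bit first."""
    width = len(bits)
    value = 0
    for i, bit in enumerate(bits):
        weight = 1 << (width - 1 - i)   # 2^n, n counted from the right
        if i == 0:
            weight = -weight            # the sign bit has negative weight
        value += int(bit) * weight
    return value


print(twos_complement_value("1000010010010111"))
```

For a 16-bit word the result always lies between -32768 and 32767, which matches #[integer] ≤ 65536.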

25 Data Representation: Primitive Types Example: Primitive types in TAM
    Type      Representation        Size
    Boolean   00...00 and 00...01   1 word
    Char      Unicode               1 word
    Integer   Two’s complement      1 word
Example: A (possible) representation of primitive types on a Pentium
    Type      Representation        Size
    Boolean   00...00 and 11...11   1 byte
    Char      ASCII                 1 byte
    Integer   Two’s complement      1 word

26 Data Representation: Composite Types Composite types are types which are not “atomic”, but which are constructed from more primitive types. Records (called structs in C) Aggregates of several values of several different types Arrays Aggregates of several values of the same type Variant Records or Disjoint Unions (Pointers or References) (Objects) (Functions)

27 Data Representation: Records Example: Triangle Records
    type Date = record
      y : Integer,
      m : Integer,
      d : Integer
    end;
    type Details = record
      female : Boolean,
      dob : Date,
      status : Char
    end;
    var today: Date;
    var my: Details

28 Data Representation: Records Example: Triangle Record Representation (each cell is 1 word)
    today:  today.y = 2002 | today.m = 2 | today.d = 5
    my:     my.female = false | my.dob.y = 1970 | my.dob.m = 5 | my.dob.d = 17 | my.status = ‘u’

29 Data Representation: Records Records occur in some form or other in most programming languages: Ada, Pascal, Triangle (here they are actually called records); C, C++, C# (here they are called structs). The usual representation of a record type is just the concatenation of individual representations of each of its component types:
    r.I1 (value of type T1) | r.I2 (value of type T2) | ... | r.In (value of type Tn)

30 Data Representation: Records Q: How much space does a record take up? And how to access record elements? Example:
    size[Date] = 3*size[integer] = 3 words
    address[today.y] = address[today]+0
    address[today.m] = address[today]+1
    address[today.d] = address[today]+2
    address[my.dob.m] = address[my.dob]+1 = address[my]+2
Note: these formulas assume that addresses are indexes of words (not bytes) in memory (otherwise multiply offsets by 2).
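The size and offset rules for records can be computed mechanically. The following sketch models a record type as a list of (field, type) pairs, which is an encoding chosen for this example; the one-word primitive sizes follow the TAM table earlier in the lecture.

```python
PRIMITIVE_SIZE = {"Integer": 1, "Boolean": 1, "Char": 1}   # in words


def size_of(t):
    """Size in words of a primitive (str) or record (list of pairs)."""
    if isinstance(t, str):
        return PRIMITIVE_SIZE[t]
    return sum(size_of(field_type) for _, field_type in t)


def offset_of(t, path):
    """Word offset of a nested field, e.g. path ["dob", "m"]."""
    offset = 0
    for name in path:
        for field, field_type in t:
            if field == name:
                t = field_type          # descend into the selected field
                break
            offset += size_of(field_type)   # skip fields stored before it
        else:
            raise KeyError(name)
    return offset


Date = [("y", "Integer"), ("m", "Integer"), ("d", "Integer")]
Details = [("female", "Boolean"), ("dob", Date), ("status", "Char")]

print(size_of(Date), size_of(Details), offset_of(Details, ["dob", "m"]))
```

The offset of my.dob.m comes out as the offset of dob within Details plus the offset of m within Date, exactly as in the last formula on the slide.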

31 Data Representation: Disjoint Unions What are disjoint unions? Like a record, a disjoint union has elements of different types, but the elements never exist at the same time. A “type tag” determines which of the elements is currently valid. Example: Pascal variant records
    type Number = record
      case discrete: Boolean of
        true: (i: Integer);
        false: (r: Real)
    end;
    var num: Number
Mathematically we write disjoint union types as: T = T1 | … | Tn

32 Data Representation: Disjoint Unions Example: Pascal variant records representation
    type Number = record
      case discrete: Boolean of
        true: (i: Integer);
        false: (r: Real)
    end;
    var num: Number
Assuming size[Integer] = size[Boolean] = 1 and size[Real] = 2, then size[Number] = size[Boolean] + MAX(size[Integer], size[Real]) = 1 + MAX(1, 2) = 3
    discrete = true:   num.discrete = true  | num.i = 15 | unused
    discrete = false:  num.discrete = false | num.r = 3.14 (2 words)

33 Data Representation: Disjoint Unions
    type T = record
      case Itag : Ttag of
        v1 : (I1 : T1);
        v2 : (I2 : T2);
        ...
        vn : (In : Tn);
    end;
    var u: T
Representation: u.Itag = v1 followed by u.I1 (type T1), or u.Itag = v2 followed by u.I2 (type T2), ..., or u.Itag = vn followed by u.In (type Tn).
    size[T] = size[Ttag] + MAX(size[T1], ..., size[Tn])
    address[u.Itag] = address[u]
    address[u.I1] = address[u] + size[Ttag]
    ...
    address[u.In] = address[u] + size[Ttag]
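A small sketch of the size and address formulas for disjoint unions, applied to the Number type. The function name union_layout and the dictionary encoding of variants are illustrative; the sizes are the ones assumed on the previous slide.

```python
SIZE = {"Boolean": 1, "Integer": 1, "Real": 2}   # in words, as assumed above


def union_layout(tag_type, variants):
    """Return (total size, offsets) for a tagged union.

    variants maps each variant field name to its type name.
    """
    tag_size = SIZE[tag_type]
    total = tag_size + max(SIZE[t] for t in variants.values())
    # The tag sits at offset 0; every variant starts right after it.
    offsets = {"tag": 0}
    offsets.update({name: tag_size for name in variants})
    return total, offsets


print(union_layout("Boolean", {"i": "Integer", "r": "Real"}))
```

All variants share the same starting offset, which is why the space of the largest variant is enough for the whole union.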

34 Arrays An array is a composite data type, an array value consists of multiple values of the same type. Arrays are in some sense like records, except that their elements all have the same type. The elements of arrays are typically indexed using an integer value (In some languages such as for example Pascal, also other “ordinal” types can be used for indexing arrays). Two kinds of arrays (with different runtime representation schemas): static arrays: their size (number of elements) is known at compile time. dynamic arrays: their size can not be known at compile time because the number of elements may vary at run-time. Q: Which are the “cheapest” arrays? Why?

35 Static Arrays Example:
    type Name = array 6 of Char;
    var me: Name;
    var names: array 2 of Name
Representation (one Char per word):
    me:        me[0] .. me[5] hold ‘K’ ‘r’ ‘i’ ‘s’ followed by blanks
    names[0]:  names[0][0] .. names[0][5] hold ‘J’ ‘o’ ‘h’ ‘n’ followed by blanks (a Name)
    names[1]:  names[1][0] .. names[1][5] hold ‘S’ ‘o’ ‘p’ ‘h’ ‘i’ ‘a’ (a Name)

36 Static Arrays Example:
    type Coding = record
      c: Char,
      n: Integer
    end;
    var code: array 3 of Coding
Representation (each element is a Coding):
    code[0].c = ‘K’ | code[0].n = 5
    code[1].c = ‘i’ | code[1].n = 22
    code[2].c = ‘d’ | code[2].n = 4

37 Static Arrays
    type T = array n of TE;
    var a : T;
Layout: a[0] | a[1] | a[2] | ... | a[n-1]
    size[T] = n * size[TE]
    address[a[0]] = address[a]
    address[a[1]] = address[a] + size[TE]
    address[a[2]] = address[a] + 2*size[TE]
    …
    address[a[i]] = address[a] + i*size[TE]
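The indexing formula composes for nested arrays and for arrays of records. Here is a sketch using the names and code arrays from the earlier slides; element_offset is a hypothetical helper, and the bounds check is an addition for illustration only.

```python
def element_offset(index, elem_size, length):
    """Offset of a[index] from address[a], for array `length` of TE."""
    if not 0 <= index < length:
        raise IndexError(index)
    return index * elem_size


CHAR, INTEGER = 1, 1                 # sizes in words
NAME = 6 * CHAR                      # type Name = array 6 of Char
CODING = CHAR + INTEGER              # record c: Char, n: Integer end

# names: array 2 of Name, so names[1][3] is one whole Name plus 3 Chars in.
names_1_3 = element_offset(1, NAME, 2) + element_offset(3, CHAR, 6)

# code: array 3 of Coding, field n sits 1 word after field c.
code_2_n = element_offset(2, CODING, 3) + 1

print(names_1_3, code_2_n)
```

Because every size is known at compile time, a compiler can fold these offsets into constants whenever the indices are literals.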

38 Dynamic Arrays Example: Java arrays (all arrays in Java are dynamic)
    char[] buffer;
    buffer = new char[buffersize];
    ...
    for (int i = 0; i < buffer.length; i++)
        buffer[i] = ' ';
Dynamic arrays are arrays whose size is not known until run time. Dynamic array: no size given in declaration. Array creation at run time determines size. Can ask for size of an array at run time. Q: How could we represent Java arrays?

39 Dynamic Arrays Java arrays:
    char[] buffer;
    buffer = new char[len];
A possible representation for Java arrays: the variable holds a pair (buffer.origin, buffer.length), here with buffer.length = 7; buffer.origin points to the separately stored elements buffer[0] .. buffer[6] = ‘C’ ‘o’ ‘m’ ‘p’ ‘i’ ‘l’ ‘e’.

40 Dynamic Arrays Java arrays:
    char[] buffer;
    buffer = new char[len];
Another possible representation for Java arrays: the variable buffer holds a single pointer to a block that stores buffer.length (= 7) followed by the elements buffer[0] .. buffer[6] = ‘C’ ‘o’ ‘m’ ‘p’ ‘i’ ‘l’ ‘e’. Note: In reality Java also stores a type in its representation for arrays, because Java arrays are objects (instances of classes).

41 Static Storage Allocation Example: Global variables in Triangle
    let
      type Date = record y: Integer, m: Integer, d: Integer end;
      var a: array 3 of Integer;
      var b: Boolean;
      var c: Char;
      var t: Date
    in ...
Global variables exist as long as the program is running. The compiler can: compute exactly how much memory is needed for globals; allocate memory at a fixed position for each global variable.

42 Static Storage Allocation Example: Global variables in Triangle
    let
      type Date = record y: Integer, m: Integer, d: Integer end;
      var a: array 3 of Integer;
      var b: Boolean;
      var c: Char;
      var t: Date
Layout: a[0] a[1] a[2] | b | c | t.y t.m t.d
    address[a] = 0
    address[b] = 3
    address[c] = 4
    address[t] = 5
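Static allocation amounts to a running sum over the declarations. A minimal sketch, assuming the sizes (in words) have already been computed for each variable; allocate_globals is a name invented for this example.

```python
def allocate_globals(declarations):
    """Assign consecutive addresses to (name, size) pairs, starting at 0."""
    addresses = {}
    next_free = 0
    for name, size in declarations:
        addresses[name] = next_free
        next_free += size
    return addresses, next_free      # next_free is the total space needed


GLOBALS = [("a", 3), ("b", 1), ("c", 1), ("t", 3)]
addresses, total = allocate_globals(GLOBALS)
print(addresses, total)
```

The total is what the compiler would reserve at the base of the stack before the program body runs.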

43 Stack Storage Allocation Now we will look at allocation of local variables. Example: When do the variables in this program “exist”?
    let
      var a: array 3 of Integer;
      var b: Boolean;
      var c: Char;
      proc Y() ~
        let
          var d: Integer;
          var e: ...
        in ... ;
      proc Z() ~
        let
          var f: Integer
        in begin ...; Y(); ... end
    in begin ...; Y(); ...; Z(); end
    a, b, c: as long as the program is running
    d, e:    when procedure Y is active
    f:       when procedure Z is active

44 Stack Storage Allocation A “picture” of our program running (call depth plotted against time, from start of program to end of program): the global code calls Y, Y returns, the global code calls Z, and Z calls Y. 1) Procedure activation behaves like a stack (LIFO). 2) The local variables “live” as long as the procedure they are declared in. 1+2 => Allocation of locals on the “call stack” is a good model.

45 Stack Storage Allocation: Accessing locals/globals First time around, we assume that in a procedure only local variables declared in that procedure and global variables are accessible. We will extend on this later to include nested scopes and parameters. A stack allocation model (under the above assumption): Globals are allocated at the base of the stack. Stack contains “frames”. Each frame corresponds to a currently active procedure. (It is often called an “activation frame”) When a procedure is called (activated) a frame is pushed on the stack When a procedure returns, its frame is popped from the stack.

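46 Stack Storage Allocation: Accessing locals/globals SB = Stack base, LB = Locals base, ST = Stack top. Stack layout from base to top: globals (starting at SB), then one call frame per active procedure; the topmost call frame lies between LB and ST. The dynamic link in each frame points to the previous frame.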

47 What’s in a Frame? A frame contains: a dynamic link, to the previous frame on the stack (the frame of the caller); a return address; local variables for the current activation. Layout from LB to ST: link data (dynamic link, return address), then local data (locals).

48 What happens when a procedure is called? SB = Stack base, LB = Locals base, ST = Stack top. When procedure f() is called: push a new f() call frame on top of the stack; make the dynamic link in the new frame point to the old LB; update LB (it becomes the old ST).

49 What happens when a procedure returns? When procedure f() returns: update ST (to the current LB, the base of the frame for f()); update LB (from the dynamic link). Note: updating ST implicitly “destroys” or “pops” the frame.
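The call and return steps can be simulated on a plain list. This is a sketch under the frame layout of the current slides (dynamic link, return address, then locals, with no static link yet); the class name and the numeric return addresses are made up for illustration.

```python
class Machine:
    """Data store with globals at the base and call frames above them."""

    def __init__(self, n_globals):
        self.store = [0] * n_globals    # SB is address 0
        self.LB = 0                     # no procedure active yet

    @property
    def ST(self):
        return len(self.store)          # first free address

    def call(self, return_address, n_locals):
        new_lb = self.ST                # LB becomes the old ST
        self.store += [self.LB, return_address] + [0] * n_locals
        self.LB = new_lb

    def ret(self):
        dynamic_link = self.store[self.LB]
        return_address = self.store[self.LB + 1]
        del self.store[self.LB:]        # ST := LB pops the frame
        self.LB = dynamic_link          # LB := dynamic link
        return return_address


m = Machine(n_globals=5)
m.call(return_address=100, n_locals=1)   # e.g. the global code calls Z
m.call(return_address=200, n_locals=2)   # Z calls Y
print(m.LB, m.ST, m.store)
```

With two words of link data, the first local of every frame sits at offset 2 from LB, wherever the frame happens to be on the stack.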

50 Accessing global/local variables Q: Is the stack frame for a procedure always at the same position on the stack? A: No, look at the picture of procedure activation (call depth against time: the global code calls Y, then calls Z, which calls Y). Imagine what the stack looks like at each point in time. At point 1 the stack holds G Y; at point 2 it holds G Z Y, so the frame of Y is at a different position.

51 Accessing global/local variables RECAP: We are still working under the assumption of a “flat” block structure. How do we access global and local variables on the stack? The global frame is always at the same place in the stack. => Address global variables relative to SB. A typical instruction to access a global variable: LOAD 4[SB]. Frames are not always at the same position in the stack; it depends on the number of frames already on the stack. => Address local variables relative to LB. A typical instruction to access a local variable: LOAD 3[LB].

52 Accessing global/local variables Example: Compute the addresses of the variables in this program
    let
      var a: array 3 of Integer;
      var b: Boolean;
      var c: Char;
      proc Y() ~
        let
          var d: Integer;
          var e: ...
        in ... ;
      proc Z() ~
        let
          var f: Integer
        in begin ...; Y(); ... end
    in begin ...; Y(); ...; Z(); end
    Var   Size   Address
    a     3      0[SB]
    b     1      3[SB]
    c     1      4[SB]
    d     1      2[LB]
    e     ?      3[LB]
    f     1      2[LB]

53 Accessing non-local variables RECAP: We have discussed stack allocation of locals in the call frames of procedures Some other things stored in frames: A dynamic link: pointing to the previous frame on the stack => corresponds to the “caller” of the current procedure. A return address: points to the next instruction of the caller. Addressing global variables relative to SB (stack base) Addressing local variables relative to LB (locals base) Now… we will look at accessing non-local variables. Or in other words. We will look into the question: How does lexical scoping work?

54 Accessing non-local variables: what is this about? Example: How to access p1, p2 from within Q or S?
    let
      var g1: array 3 of Integer;
      var g2: Boolean;
      proc P() ~
        let
          var p1,p2
          proc Q() ~ let var q:... in... ;
          proc S() ~ let var s: Integer; in...
Scope structure: the global level declares var g1, g2 and proc P(); P() declares var p1, p2, proc Q() and proc S(); Q() declares var q; S() declares var s. Q: When inside Q, does the dynamic link always point to the frame of P?

55 Accessing non-local variables Q: When inside Q, does the dynamic link always point to a frame of P? A: No! Consider the following scenarios (scope structure as on the previous slide): if P calls Q directly, the dynamic link of Q points to the frame of P; if P calls S and S calls Q, the dynamic link of Q points to the frame of S.

56 Accessing non-local variables We cannot rely on the dynamic links to get to the lexically scoped frame(s) of a procedure. => Another item is added to the link data: the static link. The static link in a frame points to the frame of the lexically enclosing procedure, which is an older frame elsewhere on the stack. Registers L1, L2, etc. are used to point to the lexically enclosing frames (L1 is the most local). Typical instructions for accessing a non-local variable look like:
    LOAD 4[L1]
    LOAD 3[L2]
These, together with LB and SB, are called the display registers.

57 What’s in a Frame (revised)? A frame contains: a static link, to the frame of the lexically enclosing procedure; a dynamic link, to the previous frame on the stack (the frame of the caller); a return address; local variables for the current activation. Layout from LB to ST: link data (static link, dynamic link, return address), then local data (locals).

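58 Accessing non-local variables Scenario: proc P() declares proc Q() and proc S(); over time G calls P, P calls S, and S calls Q. Stack from base to top: globals (at SB), P() frame, S() frame, Q() frame (between LB and ST). In the frame of Q the dynamic link points to the S() frame, while the static link points to the P() frame, which is what L1 refers to.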

59 Accessing variables, addressing schemas overview We now have a complete picture of the different kinds of addresses that are used for accessing variables stored on the stack.
    Type of variable          Load instruction
    Global                    LOAD offset[SB]
    Local                     LOAD offset[LB]
    Non-local, 1 level up     LOAD offset[L1]
    Non-local, 2 levels up    LOAD offset[L2]
    ...
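One way to picture L1, L2, ... is as the result of following static links from LB. The sketch below hand-builds the stack for the scenario G calls P, P calls S, S calls Q. Placing the static link at offset 0 of each frame, the concrete values, and the helper names are assumptions of this illustration.

```python
def frame_base(store, LB, levels_up):
    """Base of the frame `levels_up` lexical levels out (0 means LB)."""
    base = LB
    for _ in range(levels_up):
        base = store[base]       # assumed: static link at offset 0
    return base


def load(store, LB, levels_up, offset):
    """Models LOAD offset[LB], LOAD offset[L1], LOAD offset[L2], ..."""
    return store[frame_base(store, LB, levels_up) + offset]


# Each frame: static link, dynamic link, return address (-1 here), locals.
STORE = [
    1, 2,                # globals g1, g2 (simplified to one word each)
    0, 0, -1, 11, 22,    # P frame at 2: p1 = 11, p2 = 22
    2, 2, -1, 33,        # S frame at 7: static -> P, dynamic -> P, s = 33
    2, 7, -1, 44,        # Q frame at 11: static -> P, dynamic -> S, q = 44
]
LB = 11                  # Q is the active procedure

print(load(STORE, LB, 0, 3), load(STORE, LB, 1, 3), load(STORE, LB, 2, 0))
```

The frame of Q has S as its dynamic predecessor but P as its static one, so p1 is found through the static link, not the dynamic link.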

60 Routines We call the assembly language equivalent of procedures “routines”. In the preceding material we already learned some things about the implementation of routines in terms of the stack allocation model: Addressing local and globals through LB,L1,L2,… and SB Link data: static link, dynamic link, return address. We have yet to learn how the static link and the L1, L2, etc. registers are set up. We have yet to learn how routines can receive arguments and return results from/to their caller.

61 Routines We call the assembly language equivalent of procedures “routines”. What are routines? Unlike procedures/functions in higher-level languages, they are not directly supported by language constructs. Instead they are modeled in terms of how to use the low-level machine to “emulate” procedures. What behavior needs to be “emulated”? Calling a routine and returning to the caller after completion. Passing arguments to a called routine. Returning a result from a routine. Local and non-local variables.

62 Routines Transferring control to and from a routine: Most low-level processors have CALL and RETURN instructions for transferring control from caller to callee and back. Transmitting arguments and return values: Caller and callee must agree on a method to transfer argument and return values. => This is called the “routine protocol”. There are many possible ways to pass argument and return values. => A routine protocol is like a “contract” between the caller and the callee. Note: the routine protocol is often dictated by the operating system.

63 Routine Protocol Examples The routine protocol depends on the machine architecture (e.g. stack machine versus register machine). Example 1: A possible routine protocol for a register machine (RM) - Passing of arguments: first argument in R1, second argument in R2, etc. - Passing of return value: return the result (if any) in R0. Note: this example is simplistic: - What if there are more arguments than registers? - What if the representation of an argument is larger than can be stored in a register? For RM protocols, the protocol usually also specifies who (caller or callee) is responsible for saving the contents of registers.

64 Routine Protocol Examples Example 2: A possible routine protocol for a stack machine - Passing of arguments: pass arguments on the top of the stack. - Passing of return value: leave the return value on the stack top, in place of the arguments. Note: this protocol puts no boundary on the number of arguments and the size of the arguments. Most micro-processors, have registers as well as a stack. Such “mixed” machines also often use a protocol like this one. The Triangle Abstract Machine also adopts this routine protocol. We now look at it in detail (in TAM).

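65 TAM: Routine Protocol Just before the call: the stack holds the globals (at SB) and the frames up to LB, with the args on the stack top (below ST). Just after the call: the same stack, with the result in place of the args. What happens in between?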

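66 TAM: Routine Protocol (1) Just before the call: the args are on the stack top; LB still refers to the frame of the caller. (2) Just after entry: the args are followed by the link data of the new frame; LB points to the link data and ST lies just above it. Note: Going from (1) -> (2) in TAM is the execution of a single CALL instruction.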

67 TAM: Routine Protocol (2) Just after entry: args, then link data (at LB), with ST just above the link data. (3.1) During execution of the routine: args, link data (at LB), then local data up to ST; the local data shrinks and grows during execution.

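68 TAM: Routine Protocol (3.2) Just before return: args, link data (at LB), local data, and the result on the stack top (below ST). (4) Just after return: the result has replaced the args, and LB again refers to the frame of the caller. Note: Going from (3.2) -> (4) in TAM is the execution of a single RETURN instruction.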

69 TAM: Routine Protocol, Example Triangle program:
    let
      var g: Integer;
      func F(m: Integer, n: Integer) : Integer ~ m*n ;
      proc W(i: Integer) ~
        let const s ~ i*i
        in begin
          putint(F(i,s));
          putint(F(s,s))
        end
    in begin
      getint(var g);
      W(g+1)
    end

70 TAM: Routine Protocol, Example
    let var g: Integer; ... in begin getint(var g); W(g+1) end
TAM assembly code:
    PUSH 1          -- expand globals, make place for g
    LOADA 0[SB]     -- push address of g
    CALL getint     -- read integer into g
    LOAD 0[SB]      -- push value of g
    CALL succ       -- add 1
    CALL(SB) W      -- call W (using SB as static link)
    POP 1           -- contract globals
    HALT

71 TAM: Routine Protocol, Example
    func F(m: Integer, n: Integer) : Integer ~ m*n ;
    F:  LOAD -2[LB]     -- push value of argument m
        LOAD -1[LB]     -- push value of argument n
        CALL mult       -- multiply m and n
        RETURN(1) 2     -- return, replacing 2 word argument pair by 1 word result
Arguments are addressed relative to LB (negative offsets!). The size of the return value and of the argument space is needed for updating the stack on return from the call.

72 TAM: Routine Protocol, Example
    proc W(i: Integer) ~ let const s~i*i in... F(i,s)...
    W:  LOAD -1[LB]     -- push value of argument i
        LOAD -1[LB]     -- push value of argument i
        CALL mult       -- multiply: result is value of s
        LOAD -1[LB]     -- push value of argument i
        LOAD 3[LB]      -- push value of local var s
        CALL(SB) F      -- call F (use SB as static link)
        …
        RETURN(0) 1     -- return, replacing 1 word argument by 0 word result

73 TAM: Routine Protocol, Example
    let var g: Integer;... in begin getint(var g); W(g+1) end
Stack contents from SB to ST:
    after reading g:          g = 3
    just before call to W:    g = 3 | arg #1 = 4

74 TAM: Routine Protocol, Example
    proc W(i: Integer) ~ let const s~i*i in... F(i,s)...
Stack contents from SB to ST (LB points to the link data of W, which includes the static link and the dynamic link):
    just after entering W:    g = 3 | arg #1 = 4 | link data
    just after computing s:   g = 3 | arg i = 4 | link data | s = 16
    just before calling F:    g = 3 | arg i = 4 | link data | s = 16 | arg #1 = 4 | arg #2 = 16

75 TAM: Routine Protocol, Example
    func F(m: Integer, n: Integer) : Integer ~ m*n ;
Stack contents from SB to ST:
    just before calling F (LB at the link data of W):      g = 3 | arg i = 4 | link data | s = 16 | arg #1 = 4 | arg #2 = 16
    just after entering F (LB at the link data of F):      g = 3 | arg i = 4 | link data | s = 16 | arg m = 4 | arg n = 16 | link data
    just before return from F (LB at the link data of F):  g = 3 | arg i = 4 | link data | s = 16 | arg m = 4 | arg n = 16 | link data | 64

76 TAM: Routine Protocol, Example
    func F(m: Integer, n: Integer) : Integer ~ m*n ;
Stack contents from SB to ST:
    just before return from F (LB at the link data of F):  g = 3 | arg i = 4 | link data | s = 16 | arg m = 4 | arg n = 16 | link data | 64
    after return from F (LB at the link data of W):        g = 3 | arg i = 4 | link data | s = 16 | 64

77 TAM Routine Protocol: Frame Layout Summary From the bottom of the frame to ST: arguments (arguments for the current procedure; they were put here by the caller); link data, starting at LB (dynamic link, static link, return address); local data (local variables and intermediate results), which grows and shrinks during execution.
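The protocol can be traced with a toy model of CALL and RETURN(n) d. This is a sketch only: the class Tam, the helper call_F, the return address 99, and the order of the three link words (static link, dynamic link, return address at offsets 0, 1, 2 from LB) are assumptions made for this illustration.

```python
class Tam:
    """Toy model of the TAM stack for tracing the routine protocol."""

    def __init__(self, global_values):
        self.stack = list(global_values)
        self.LB = 0

    def push(self, value):
        self.stack.append(value)

    def load(self, offset):
        """LOAD offset[LB]; negative offsets reach the arguments."""
        self.push(self.stack[self.LB + offset])

    def call(self, static_link, return_address):
        new_lb = len(self.stack)
        self.stack += [static_link, self.LB, return_address]  # link data
        self.LB = new_lb

    def ret(self, result_size, args_size):
        """RETURN(result_size) args_size."""
        result = self.stack[len(self.stack) - result_size:]
        dynamic_link = self.stack[self.LB + 1]
        return_address = self.stack[self.LB + 2]
        del self.stack[self.LB - args_size:]   # pop frame and arguments
        self.stack += result                   # result replaces the args
        self.LB = dynamic_link
        return return_address


def call_F(m, a, b):
    """Caller pushes the args, then the body of F runs and returns."""
    m.push(a)
    m.push(b)
    m.call(static_link=0, return_address=99)
    m.load(-2)                                  # argument m
    m.load(-1)                                  # argument n
    m.push(m.stack.pop() * m.stack.pop())       # CALL mult
    return m.ret(result_size=1, args_size=2)


machine = Tam([3])          # g = 3
call_F(machine, 4, 16)
print(machine.stack)
```

After the call the two argument words are gone and a single result word sits in their place, which is the situation shown in the "after return from F" picture.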

78 Accessing variables, addressing schemas overview (Revised) We now have a complete picture of the different kinds of addresses that are used for accessing variables and formal parameters stored on the stack.
    Type of variable          Load instruction
    Global                    LOAD +offset[SB]
    Local                     LOAD +offset[LB]
    Parameter                 LOAD -offset[LB]
    Non-local, 1 level up     LOAD +offset[L1]
    Non-local, 2 levels up    LOAD +offset[L2]
    ...

79 Arguments: by value or by reference Some programming languages allow two kinds of function/procedure parameters. Example: in Triangle (similar in Pascal)
    let
      proc S(var n: Integer, i: Integer) ~ n := n + i;
      var b: record y: Integer, m: Integer, d: Integer end
    in begin
      b := {y ~ 2002, m ~ 2, d ~ 22};
      S(var b.m, 6);
    end
Here n is a var/reference parameter and i is a constant/value parameter.

80 Arguments: by value or by reference Value parameters: At the call site the argument is an expression, the evaluation of that expression leaves some value on the stack. The value is passed to the procedure/function. A typical instruction for putting a value parameter on the stack: LOADL 6 Var parameters: Instead of passing a value on the stack, the address of a memory location is pushed. This implies a restriction that only “variable-like” things can be passed to a var parameter. In Triangle there is an explicit keyword var at the call-site, to signal passing a var parameter. In Pascal and C++ the reference is created implicitly (but the same restrictions apply). Typical instructions: LOADA 5[LB] LOADA 10[SB]
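The difference between the two parameter kinds shows up when the data store is modeled as a list and a var parameter as an address into it. The function names and the addresses 0..2 for the record fields are assumptions of this sketch.

```python
def S(store, n_addr, i):
    """proc S(var n: Integer, i: Integer) ~ n := n + i

    n is passed as an address (what LOADA pushes); i is passed as a value.
    """
    store[n_addr] = store[n_addr] + i


def S_by_value(store, n, i):
    """Same body, but n is a copy of the value: the caller sees no change."""
    n = n + i


store = [2002, 2, 22]        # fields y, m, d of the record at addresses 0..2
S_by_value(store, store[1], 6)
after_value_call = list(store)
S(store, 1, 6)               # corresponds to S(var b.m, 6)
print(after_value_call, store)
```

This also illustrates the restriction on var arguments: only something that has an address in the store can be passed.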

81 Recursion How are recursive functions and procedures supported on a low-level machine? => Surprise! The stack memory allocation model already works! Example:
    let
      func fac(n: Integer) : Integer ~
        if (n = 1) then 1 else n * fac(n-1)
    in begin
      putint(fac(6));
    end
Why does it work? Because every activation of a function gets its own activation record on the stack, with its own parameters, locals etc. => procedures and functions are “reentrant”. Older languages (e.g. early versions of FORTRAN) which use static allocation for locals have problems with recursion.

82 Recursion: General Idea Why the stack allocation model works for recursion: Like other function/procedure calls, lifetimes of local variables and parameters for recursive calls behave like a stack. (Figure: successive snapshots of the stack during the evaluation of fac(4); frames for fac(4), fac(3), fac(2) and fac(1) are pushed in turn and then popped in reverse order.)

83 Recursion: In Detail
    let
      func fac(n: Integer) : Integer ~
        if (n = 1) then 1 else n * fac(n-1)
    in begin
      putint(fac(6));
    end
Stack contents from SB to ST:
    before call to fac:                   arg 1 = 6
    right after entering fac:             arg 1 = 6 | link data (LB)
    right before recursive call to fac:   arg n = 6 | link data (LB) | value of n = 6 | arg 1: value of n-1 = 5

84 Recursion Stack contents from SB to ST (LB points to the link data of the topmost frame):
    right before recursive call to fac:        arg n = 6 | link data | value of n = 6 | arg = 5
    right before next recursive call to fac:   arg n = 6 | link data | value of n = 6 | arg n = 5 | link data | value of n = 5 | arg = 4
    right before next recursive call to fac:   arg n = 6 | link data | value of n = 6 | arg n = 5 | link data | value of n = 5 | arg n = 4 | link data | value of n = 4 | arg = 3

85 Recursion LB ST Is the spaghetti of static and dynamic links getting confusing? Let’s zoom in on just a single activation of the fac procedure. The pattern is always the same: argument n link data to caller context (= previous LB) to lexical context (=SB) n n-1 Intermediate results in the computation of n*fac(n-1); ? just before recursive call in fac

86 Recursion Just before the return from the “deepest call” (n=1), from the bottom to ST: caller frame (what’s in here?) | argument n=2 | link data | n=2 | argument n=1 | link data (LB) | result = 1. After return from the deepest call: caller frame | argument n=2 | link data (LB) | n=2 | result = 1. Next step: multiply.

87 Recursion After return from the deepest call and multiply, from the bottom to ST: caller frame (what’s in here?) | argument n=2 | link data (LB; with links to the caller context and to the lexical context (=SB)) | result 2*fac(1) = 2. Next step: return. From here on down the stack is shrinking, multiplying each time with a bigger n.

88 Recursion Just before the recursive call in fac: argument n | link data (LB) | n | recursive arg: n-1 (below ST). After completion of the recursive call: argument n | link data (LB) | n | fac(n-1) (below ST). Calling a recursive function is just like calling any other function. After completion it just leaves its result on the top of the stack! A recursive call can happen in the midst of expression evaluation. Intermediate results, local variables, etc. simply remain on the stack and computation proceeds when the recursive call is completed.

89 Summary Data Representation: how to represent values of the source language on the target machine. Storage Allocation: How to organize storage for variables (considering different lifetimes of global, local and heap variables) Routines: How to implement procedures, functions (and how to pass their parameters and return values)
