Code Generation CS 480

Can be complex
To do a good job of teaching about code generation, I could easily spend ten weeks. But I don't have ten weeks, so I'll do it in one lecture. (Obviously, I'll omit a lot of material.)

Remember Reverse Polish Notation
Remember RPN? To evaluate y * (x + 4):
Push y on the stack
Push x on the stack
Push 4 on the stack
Do the addition
Do the multiplication
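The push-and-evaluate steps above fit in a tiny RPN interpreter. This is a sketch for illustration; the bindings x = 3 and y = 5 are made up, not from the lecture.

```python
def eval_rpn(tokens, env):
    """Evaluate a list of RPN tokens with an explicit stack."""
    stack = []
    for tok in tokens:
        if tok == "+":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)            # operator: pop two, push result
        elif tok == "*":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif isinstance(tok, int):
            stack.append(tok)              # constant: push it
        else:
            stack.append(env[tok])         # variable: look up, push value
    return stack.pop()

# y * (x + 4)  ->  RPN: y x 4 + *
print(eval_rpn(["y", "x", 4, "+", "*"], {"x": 3, "y": 5}))  # 5 * (3 + 4) = 35
```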

Code for (x + 12) * 7
Push fp
Push -8
Add
Get value (i.e., the address is on the stack; fetch the value at that address)
Push 12
Add
Push 7
Multiply

Addressing modes
An early discovery: some patterns, such as adding fp + constant, occur very often. Why not make them part of the instruction?
Common addressing modes:
Absolute address (i.e., a global)
#constant (small integer values)
Register – the contents of the register
(reg) – the value at the address held in the register
con(reg) – the value at the address formed by a constant plus the register (e.g., 8(fp))
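One way to see what each mode means is a small interpreter over a register file and memory. This is a sketch; the tuple encoding of modes and all names here are invented for illustration.

```python
def operand_value(mode, regs, mem):
    """Return the value an operand in the given addressing mode denotes."""
    kind = mode[0]
    if kind == "abs":                        # absolute address (a global)
        return mem[mode[1]]
    if kind == "imm":                        # #constant
        return mode[1]
    if kind == "reg":                        # contents of a register
        return regs[mode[1]]
    if kind == "ind":                        # (reg): register holds the address
        return mem[regs[mode[1]]]
    if kind == "disp":                       # con(reg), e.g. -8(fp)
        return mem[mode[1] + regs[mode[2]]]
    raise ValueError(f"unknown mode {mode}")

regs = {"fp": 1000}
mem = {992: 42}                              # the local variable at fp - 8
print(operand_value(("disp", -8, "fp"), regs, mem))  # -8(fp) -> 42
```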

More unusual addressing modes
The PDP-11 had a number of unusual addressing modes:
-(reg) decrement the register, then use it as an address
(reg)+ use the register as an address, then increment it
Look familiar? These can be used to build a great stack system.

Code for (x + 12) * 7, again
Move -8(fp), -(sp)
Move #12, -(sp)
Add (sp)+, (sp)
Move #7, -(sp)
Mult (sp)+, (sp)
Still easy to generate, but still lots of memory accesses.

Generating stack-style code is easy
We can use the AR stack just like the stack in an RPN calculator. Simply do a post-order traversal of the AST:
Operands (variables, constants) are pushed on the stack
Operators generate code to take their arguments from the stack, compute the result, and push it back on the stack
Simple. This is what you are doing in prog 6.
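That post-order scheme fits in a few lines. Here is a sketch (not the actual prog 6 starter code); the fp offset for x and the instruction spellings follow the earlier slide, and the AST encoding as nested tuples is an assumption for illustration.

```python
def gen(node, out):
    """Post-order walk: operands push; operators pop, compute, push."""
    if isinstance(node, tuple):              # operator node: (op, left, right)
        op, left, right = node
        gen(left, out)                       # children first (post-order)
        gen(right, out)
        out.append({"+": "Add", "*": "Mult"}[op] + " (sp)+, (sp)")
    elif isinstance(node, int):              # constant: push immediate
        out.append(f"Move #{node}, -(sp)")
    else:                                    # local variable: push via fp offset
        offsets = {"x": -8}                  # assumed frame layout
        out.append(f"Move {offsets[node]}(fp), -(sp)")

code = []
gen(("*", ("+", "x", 12), 7), code)          # (x + 12) * 7
print("\n".join(code))
```

The output reproduces the five-instruction sequence from the previous slide.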

Then what's the problem?
Remember the von Neumann machine design? The CPU is separated from memory by a thin, slow wire that is costly to traverse. Every time you do a memory access, things slow down.

Memory access in stack code generation
One memory access to get the instruction
One or two memory accesses to get the arguments
Then you can do the operation
One more memory access to write the result back to memory
That's a lot of memory accesses.

A few things you can do to speed this up
There are a few things you can do with hardware to speed this up a bit:
Caching (speeds up instruction fetch; operand fetch, not so much)
Pipelines (but results coming from memory cause frequent stalls)

Can be done much faster
Move -8(fp), r1
Add #12, r1
Mult #7, r1
Registers live on-chip and allow much faster access; this code will run much faster.

Bottom line for today's lecture

                  Pro              Con
Stack style       Easy to do       Slow execution
Register style    Fast execution   Hard to do

Registers can be used to hold intermediate results
Example: (a + b) * (c + d)
Move a, r1
Add b, r1
Move c, r2
Add d, r2
Mult r2, r1
But notice we need to use two registers.

No matter how many you have
You can always come up with an expression more complicated than the number of registers you have: (a + b) * (c + d) * (e + f) * ...
But are such things common? Still, you need to handle them; this is called a register spill.
Solution: save the temporary to memory, and load it when you need it (basically going back to stack style).
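A spill can be sketched as a fallback path in the generator: when the register pool is empty, park the left result in a stack slot and reuse its register. Illustrative Python only — it handles commutative operators only, and the register names, slot layout, and AST encoding are all assumptions.

```python
def gen(node, free, out, spills):
    """Emit code for node; the result ends up in the returned register."""
    if not isinstance(node, tuple):           # leaf: variable or constant
        r = free.pop()
        out.append(f"Move {node}, {r}")
        return r
    op, left, right = node
    rl = gen(left, free, out, spills)
    if free:                                  # enough registers: stay register-style
        rr = gen(right, free, out, spills)
        out.append(f"{op} {rr}, {rl}")        # rl := rl op rr
        free.append(rr)                       # rr is free again
    else:                                     # register spill: park rl in memory
        slot = f"{4 * len(spills)}(sp)"
        spills.append(slot)
        out.append(f"Move {rl}, {slot}")
        gen(right, [rl], out, spills)         # reuse rl for the right subtree
        out.append(f"{op} {slot}, {rl}")      # rl := rl op spilled value
    return rl

code = []
expr = ("Mult", ("Add", "a", "b"), ("Mult", ("Add", "c", "d"), ("Add", "e", "f")))
gen(expr, ["r2", "r1"], code, [])             # only two registers available
print("\n".join(code))
```

With only two registers, (a + b) * (c + d) * (e + f) forces the generator through the spill path: the `Move rX, N(sp)` instructions are exactly the "going back to stack style" the slide describes.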

Making effective use of addressing modes
Addressing modes give rise to two classes of arguments:
Those that can be represented as addressing modes (local variables, constants)
And those that cannot (expressions)
The code generator needs to make effective use of the first class.

Four cases for addition
x + y (that is, two addressing-mode expressions):
move x, r1
add y, r1
x + (...) (that is, one addressing-mode expression):
compute (...) and place it into r1
add x, r1
(...) + x:
compute (...) and place it into r1
add x, r1
(...) + (...) (that is, neither side is an addressing mode):
compute left (...) and place it into r1
compute right (...) and place it into r2
add r2, r1 (and now r2 is free)
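The four cases map directly onto a small recursive generator. This is a sketch: registers are allocated by depth (`r1`, `r2`, ...), which suffices for pure addition trees, and the tuple AST encoding is an assumption.

```python
def is_leaf(n):
    """True for operands an addressing mode can express: variables, constants."""
    return not isinstance(n, tuple)

def gen_add(node, out, next_reg=1):
    """Emit code for an addition tree; the result lands in r<next_reg>."""
    _, left, right = node
    r1 = f"r{next_reg}"
    if is_leaf(left) and is_leaf(right):          # x + y
        out.append(f"Move {left}, {r1}")
        out.append(f"Add {right}, {r1}")
    elif is_leaf(left):                           # x + (...)
        gen_add(right, out, next_reg)             # compute (...) into r1
        out.append(f"Add {left}, {r1}")
    elif is_leaf(right):                          # (...) + x
        gen_add(left, out, next_reg)
        out.append(f"Add {right}, {r1}")
    else:                                         # (...) + (...)
        gen_add(left, out, next_reg)              # left into r1
        gen_add(right, out, next_reg + 1)         # right into r2
        out.append(f"Add r{next_reg + 1}, {r1}")  # and now r2 is free

code = []
gen_add(("+", ("+", "a", "b"), ("+", "c", "d")), code)   # (a + b) + (c + d)
print("\n".join(code))
```

Note how the leaf cases never load the addressing-mode operand into a register of its own: it rides along as an instruction operand, which is the whole point of the slide.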

What about subtraction? It is not commutative.
x - y (that is, two addressing-mode expressions):
move x, r1
sub y, r1
x - (...) (that is, one addressing-mode expression):
compute (...) and place it into r1
sub x, r1 (NO! this computes (...) - x, the inverse of the result we want)
(...) - x:
compute (...) and place it into r1
sub x, r1
(...) - (...) (that is, neither side is an addressing mode):
compute left (...) and place it into r1
compute right (...) and place it into r2
sub r2, r1 (and now r2 is free)

Choices on subtraction
Use two registers, or
Use one register and a negate (invert) instruction
Which is better depends on whether you have enough registers.
There are lots of similar special cases (division, multiplication, shifts).
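The negate option for the troublesome x - (...) case can be sketched as follows, assuming the target has a Neg instruction and using the slides' "op src, dst" convention (dst := dst op src); the helper name is invented.

```python
def gen_sub_with_neg(x, subexpr_code, out):
    """Emit x - (...) using one register plus a negate instruction."""
    out.extend(subexpr_code)        # (...) is now in r1
    out.append(f"Sub {x}, r1")      # r1 := (...) - x   (inverse of what we want)
    out.append("Neg r1")            # r1 := x - (...)   (fixed, still one register)

code = []
gen_sub_with_neg("x", ["Move a, r1", "Add b, r1"], code)   # x - (a + b)
print("\n".join(code))
```

The two-register alternative avoids the extra Neg at the cost of tying up r2, which is exactly the trade-off the slide describes.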

But registers have many other uses
Registers can hold variables that are accessed a lot (such as loop index values)
Registers can be used for passing parameters (saving memory accesses)
So a good machine design would have lots of registers, right? Not so fast.

Using registers for variables
This can make your code dramatically faster. But you need to discover WHICH variables are commonly used, which requires complicated data-flow analysis. Or (as C does) you can let the user give you hints (which are frequently wrong).

Using registers for parameters
If the callee can use the values directly in registers, the code can be much faster.
If not, since the callee was going to save the register values anyway, nothing is lost.
But the caller might not know whether the callee wants a parameter in memory or in a register.

Cost to save and restore
Remember the parameter calling sequence? The more registers you have, the more you need to save and restore on procedure entry and exit.
So there is a trade-off: save few registers for faster function invocation but slower execution of the function body, or save many for faster execution but slower invocation.

Worse yet – you don't know ahead of time
You don't know how many registers you will need until you have seen the entire function body.
Save them all – too much execution time.
Don't save enough – you run out of registers.
(An old C compiler trick: branch to the end of the function, and generate the saves after you have seen everything else.)

Interesting idea: register windows
Put a huge number of registers (say, 256) on-chip, but only allow the process to see a few (say, 16) at a time.
When you execute a function, simply move the window up the register file.
Saves and restores are automatic, making function invocation faster.

Even better, overlapping windows
What if six of those registers are shared with the caller, and ten are new? What might you use those six for?
This can make for really fast function invocation.
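Overlapping windows can be modeled in a few lines: one big physical register file, and a visible window that slides by (visible - overlap) on each call, so the caller's outgoing registers become the callee's incoming ones. The sizes follow the slides (256 physical, 16 visible, 6 shared); the class and its API are invented for illustration.

```python
class RegisterWindows:
    """Model of an overlapping register-window file."""
    def __init__(self, physical=256, visible=16, overlap=6):
        self.regs = [0] * physical
        self.base = 0
        self.step = visible - overlap        # how far the window slides per call

    def __getitem__(self, i):                # read visible register i
        return self.regs[(self.base + i) % len(self.regs)]

    def __setitem__(self, i, value):         # write visible register i
        self.regs[(self.base + i) % len(self.regs)] = value

    def call(self):                          # function entry: slide the window up
        self.base += self.step

    def ret(self):                           # function return: slide it back
        self.base -= self.step

w = RegisterWindows()
w[10] = 42        # caller places an argument in an "outgoing" register
w.call()          # no saves, no restores: just move the window
print(w[0])       # callee sees the same value as an "incoming" register: 42
```

Those six shared registers are the natural home for parameters and return values: the caller writes them, the window slides, and the callee reads them with no memory traffic at all.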

Downside of register windows
Eventually you run out, and windows must be spilled (and restored) as well.
A context switch is now really slow, as you need to save and restore ALL 256 registers.
Still an interesting idea.

How is it really done?
You can spend a lot of time worrying about things that hardly ever happen (expressions with 32 temporary values in the middle, functions with 17 arguments).
And programming styles change (OOP has smaller functions and fewer arguments).
So there is never a clear answer, and it keeps changing.