
Computer Architecture Lecture 3 Coverage: Appendix A



Presentation on theme: "Computer Architecture Lecture 3 Coverage: Appendix A"— Presentation transcript:

1 Computer Architecture Lecture 3 Coverage: Appendix A
EECS 470 Computer Architecture Lecture 3 Coverage: Appendix A

2 Role of the Compiler
The compiler is the primary user of the instruction set.
Exceptions (getting less common): some device drivers; specialized library routines; some small embedded systems (synthesized architectures).
Compilers must: generate a correct translation into machine code.
Compilers should: compile quickly; generate fast code.
While we are at it: reasonable code size; good debug support.

3 Structure of Compilers
Front-end: translates high-level semantics into a generic intermediate form.
- The intermediate form has no resource constraints, but uses simple instructions.
Back-end: translates the intermediate form into assembly/machine code for the target architecture.
- Resource allocation; code optimization under resource constraints.
Architects are mostly concerned with the optimizations.

4 Typical optimizations: CSE
Common sub-expression elimination:
Before: c = array1[d+e] / array2[d+e];
After:  i = d+e; c = array1[i] / array2[i];
Purpose: fewer instructions / faster code.
Architectural issue: more register pressure (the common sub-expression must stay in a register).
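The transformation on this slide can be sketched as two C functions; a minimal illustration, where the function names are hypothetical and the variable names (array1, array2, d, e) follow the slide:

```c
/* Before CSE: d+e is computed twice. */
int divide_naive(const int *array1, const int *array2, int d, int e) {
    return array1[d + e] / array2[d + e];
}

/* After CSE: the common sub-expression is computed once and held in a
   register, which is exactly where the extra register pressure comes from. */
int divide_cse(const int *array1, const int *array2, int d, int e) {
    int i = d + e;
    return array1[i] / array2[i];
}
```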

5 Typical optimization: LICM
Loop-invariant code motion:
Before: for (i=0; i<100; i++) { t = 5; array1[i] = t; }
After:  t = 5; for (i=0; i<100; i++) { array1[i] = t; }
Purpose: move statements or expressions out of loops when they need only be executed once (they are invariant across iterations).
Architectural issue: more register pressure (t is now live across the whole loop).
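The slide's loop can be sketched before and after hoisting; a minimal illustration with hypothetical function names:

```c
/* Before LICM: the loop-invariant assignment runs every iteration. */
void fill_naive(int *array1, int n) {
    for (int i = 0; i < n; i++) {
        int t = 5;          /* same value every iteration */
        array1[i] = t;
    }
}

/* After LICM: t is computed once, but stays live across the loop
   (the register-pressure cost the slide mentions). */
void fill_hoisted(int *array1, int n) {
    int t = 5;              /* hoisted out of the loop */
    for (int i = 0; i < n; i++)
        array1[i] = t;
}
```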

6 Other transformations
Procedure inlining: better instruction schedule; greater code size; more register pressure.
Loop unrolling: better loop schedule.
Software pipelining: better loop schedule; greater code size; more register pressure.
In general, "global" optimization: faster code.
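Loop unrolling can be sketched as follows; a hypothetical illustration (the function names are mine) of unrolling by 4, which gives the scheduler four independent partial sums per iteration at the cost of more code and four live accumulators instead of one:

```c
int sum_simple(const int *a, int n) {
    int s = 0;
    for (int i = 0; i < n; i++)
        s += a[i];
    return s;
}

int sum_unrolled(const int *a, int n) {
    int s0 = 0, s1 = 0, s2 = 0, s3 = 0;   /* extra register pressure */
    int i = 0;
    for (; i + 4 <= n; i += 4) {          /* unrolled body: 4 elements per trip */
        s0 += a[i];
        s1 += a[i + 1];
        s2 += a[i + 2];
        s3 += a[i + 3];
    }
    int s = s0 + s1 + s2 + s3;
    for (; i < n; i++)                    /* cleanup loop for leftover elements */
        s += a[i];
    return s;
}
```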

7 Compiled code characteristics
Optimized code has different characteristics than unoptimized code:
- Fewer memory references, but it is generally the "easy" ones that are eliminated. Example: better register allocation retains active data in the register file; these accesses would have been cache hits in unoptimized code.
- Removing redundant memory and ALU operations leaves a higher ratio of branches in the code, so branch prediction becomes more important.
- Many optimizations provide better instruction scheduling at the cost of an increase in hardware resource pressure.

8 What do compiler writers want in an instruction set architecture?
More resources: better optimization tradeoffs.
Regularity: the same behavior in all contexts; no special cases (e.g., flags set differently for immediates).
Orthogonality: data type independent of addressing mode; addressing mode independent of the operation performed.
Primitives, not solutions: keep instructions simple; it is easier to compose than to fit (e.g., MMX operations).

9 What do architects want in an instruction set architecture?
Simple instruction decode: tends to increase orthogonality.
Small structures: more resource constraints.
Small data bus fanout: tends to reduce orthogonality and regularity.
Small instructions: make things implicit; non-regular, non-orthogonal, non-primitive.

10 To make faster processors
Make the compiler team unhappy:
- more aggressive optimization over the entire program
- more resource constraints; caches; HW schedulers
- higher expectations: increase IPC
Make the hardware design team unhappy:
- tighter design constraints (clock)
- execute optimized code with more complex execution characteristics
- make all stages bottlenecks (Amdahl's law)

11 Review of basic pipelining
5-stage "RISC" load-store architecture: about as simple as things get.
- Instruction fetch: get the instruction from memory/cache.
- Instruction decode: translate the opcode into control signals and read registers.
- Execute: perform the ALU operation.
- Memory: access memory if load/store.
- Writeback/retire: update the register file.

12 Pipelined implementation
Break the execution of the instruction into cycles (5 in this case).
Design a separate datapath stage for the execution performed during each cycle.
Build pipeline registers to communicate between the stages.
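The pipeline-register idea can be sketched in a few lines of C; a minimal model (the integer instruction ids and array representation are my simplification), where every register advances at once on the clock edge, so stage k+1 in cycle t sees stage k's results from cycle t-1:

```c
enum { N_STAGES = 5 };          /* IF, ID, EX, MEM, WB */
int stage[N_STAGES];            /* instruction id occupying each stage, 0 = bubble */

void clock_edge(int fetched) {
    /* Shift every in-flight instruction one stage forward, WB-side first
       so no register's input is overwritten before it is copied. */
    for (int s = N_STAGES - 1; s > 0; s--)
        stage[s] = stage[s - 1];
    stage[0] = fetched;         /* the newly fetched instruction enters IF */
}
```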

13 Stage 1: Fetch
Design a datapath that can fetch an instruction from memory every cycle.
- Use the PC to index memory to read the instruction.
- Increment the PC (assume no branches for now).
Write everything needed to complete execution to the pipeline register (IF/ID).
- The next stage will read this pipeline register.
- Note that the pipeline register must be edge-triggered.
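The fetch stage described above can be sketched as follows, under the slide's assumptions (word-addressed instruction memory, no branches); the imem array is a hypothetical stand-in for the instruction memory/cache:

```c
unsigned fetch(const unsigned *imem, unsigned *pc) {
    unsigned instr = imem[*pc];   /* use the PC to index instruction memory */
    *pc += 1;                     /* increment the PC: next PC is always PC+1 */
    return instr;                 /* instr and PC+1 would be written to IF/ID */
}
```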

14 [Figure: Stage 1 fetch datapath — the PC indexes the instruction memory/cache; an adder computes PC+1 (a MUX selects the next PC); the instruction bits and PC+1 are written to the IF/ID pipeline register, which feeds the rest of the pipelined datapath.]

15 Stage 2: Decode
Design a datapath that reads the IF/ID pipeline register, decodes the instruction, and reads the register file (ports specified by the regA and regB fields of the instruction bits).
- Decode can be easy: just pass on the opcode and let later stages derive their own control signals for the instruction.
Write everything needed to complete execution to the pipeline register (ID/EX).
- Pass on the offset field and both destination register specifiers (or simply pass on the whole instruction!).
- Include PC+1 even though decode didn't use it.

16 [Figure: Stage 2 decode datapath — instruction bits and PC+1 arrive from the IF/ID register; regA and regB index the register file (which also has a destination-register write port); the contents of regA and regB, PC+1, and control signals are written to the ID/EX pipeline register.]

17 Stage 3: Execute
Design a datapath that performs the proper ALU operation for the specified instruction using the values present in the ID/EX pipeline register.
- The inputs are the contents of regA and either the contents of regB or the offset field of the instruction.
- Also calculate PC+1+offset in case this is a branch.
Write everything needed to complete execution to the pipeline register (EX/Mem):
- the ALU result, the contents of regB, and PC+1+offset
- instruction bits for the opcode and destReg specifiers
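The execute stage can be sketched as a pure function of the ID/EX values; the use_offset and is_add flags are hypothetical simplifications of the real decoded control signals, and only add/nand ALU operations are modeled:

```c
typedef struct {
    int alu_result;
    int branch_target;
} ExecOut;

ExecOut execute(int pc_plus_1, int valA, int valB, int offset,
                int use_offset, int is_add) {
    ExecOut out;
    int opB = use_offset ? offset : valB;    /* MUX on the ALU's B input */
    out.alu_result = is_add ? valA + opB     /* add */
                            : ~(valA & opB); /* nand */
    out.branch_target = pc_plus_1 + offset;  /* always computed, in case of branch */
    return out;
}
```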

18 [Figure: Stage 3 execute datapath — the ALU takes the contents of regA and, via a MUX, either the contents of regB or the offset; an adder computes PC+1+offset; the ALU result, contents of regB, PC+1+offset, and control signals are written to the EX/Mem pipeline register.]

19 Stage 4: Memory Operation
Design a datapath that performs the proper memory operation for the specified instruction using the values present in the EX/Mem pipeline register.
- The ALU result contains the address for ld and st instructions.
- Opcode bits control the memory R/W and enable signals.
Write everything needed to complete execution to the pipeline register (Mem/WB):
- the ALU result and MemData
- instruction bits for the opcode and destReg specifiers

20 [Figure: Stage 4 memory datapath — the ALU result from EX/Mem addresses the data memory, whose R/W and enable signals come from the control signals; the memory read data, ALU result, and control signals are written to the Mem/WB pipeline register; PC+1+offset (and the MUX control for the PC input) go back to the MUX before the PC in stage 1.]

21 Stage 5: Write back
Design a datapath that completes the execution of this instruction, writing to the register file if required.
- Write MemData to destReg for the ld instruction.
- Write the ALU result to destReg for add or nand instructions.
- Opcode bits also control the register write enable signal.

22 [Figure: Stage 5 writeback datapath — the Mem/WB pipeline register (fed by the stage 4 memory datapath) holds the memory read data and the ALU result; a MUX selects between them and drives the data input of the register file; another MUX over instruction bits 0-2 and 16-18 selects the destination register specifier; control signals drive the register write enable.]

23 Sample Code (Simple)
Run the following code on a pipelined datapath:
add 1 2 3   ; reg 3 = reg 1 + reg 2
nand 4 5 6  ; reg 6 = ~(reg 4 & reg 5)
lw 2 4 20   ; reg 4 = Mem[reg 2 + 20]
add 2 5 5   ; reg 5 = reg 2 + reg 5
sw 3 7 10   ; Mem[reg 3 + 10] = reg 7
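Before tracing this program through the pipeline, it helps to know its architectural (one-instruction-at-a-time) result. A sketch in C, starting from the register values on the "Initial State" slide; Mem[29] = 99 is an assumption chosen so the lw loads the value the later slides show:

```c
#include <string.h>

int reg[8];
int mem[64];

void run_sample(void) {
    int init[8] = {0, 36, 9, 12, 18, 7, 41, 22};
    memcpy(reg, init, sizeof init);
    memset(mem, 0, sizeof mem);
    mem[29] = 99;                      /* assumed memory contents */

    reg[3] = reg[1] + reg[2];          /* add 1 2 3:  r3 = r1 + r2      */
    reg[6] = ~(reg[4] & reg[5]);       /* nand 4 5 6: r6 = ~(r4 & r5)   */
    reg[4] = mem[reg[2] + 20];         /* lw 2 4 20:  r4 = Mem[r2 + 20] */
    reg[5] = reg[2] + reg[5];          /* add 2 5 5:  r5 = r2 + r5      */
    mem[reg[3] + 10] = reg[7];         /* sw 3 7 10:  Mem[r3 + 10] = r7 */
}
```

The final values (R3=45, R4=99, R5=16, R6=-3) are what the register file should hold once the pipeline drains.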

24 [Figure: complete 5-stage pipelined datapath — the PC and instruction memory feed IF/ID; the register file's read ports (regA, regB) feed ID/EX with valA, valB, PC+1, and the offset; the ALU and target adder feed EX/Mem with the ALU result, eq?, valB, and the branch target; the data memory feeds Mem/WB with the ALU result and load data; MUXes over instruction bits 0-2, 16-18, and 22-24 route the destination specifier and write data back to the register file and the next-PC MUX.]

25 [Figure: Initial State — register file: R1=36, R2=9, R3=12, R4=18, R5=7, R6=41, R7=22; all pipeline stages hold noops.]

26 [Figure: Time 1 — Fetch: add The add enters IF/ID; the rest of the pipeline still holds noops.]

27 [Figure: Time 2 — Fetch: nand add is in decode, reading valA=36 and valB=9 from the register file.]

28 [Figure: Time 3 — Fetch: lw nand is in decode (valA=18, valB=7); add is in execute, producing 36+9=45.]

29 [Figure: Time 4 — Fetch: add lw 2 4 20 is in decode; nand is in execute, producing ~(18&7) = -3; add 1 2 3 is in memory with its result 45.]

30 [Figure: Time 5 — Fetch: sw add 2 5 5 is in decode; lw is in execute, computing address 9+20=29; nand is in memory (result -3); add 1 2 3 writes 45 back to R3.]

31 [Figure: Time 6 — no more instructions; sw is in decode; add 2 5 5 is in execute, producing 9+7=16; lw is in memory, reading Mem[29]=99; nand writes -3 back to R6.]

32 [Figure: Time 7 — sw is in execute, computing address 45+10=55; add 2 5 5 is in memory (result 16); lw writes 99 back to R4.]

33 [Figure: Time 8 — sw is in memory, writing R7's value 22 to Mem[55]; add 2 5 5 writes 16 back to R5.]

34 [Figure: Time 9 — sw completes writeback (no register write) and the pipeline drains; final registers: R1=36, R2=9, R3=45, R4=99, R5=16, R6=-3, R7=22.]

35 Time graphs
Time:       1      2      3       4       5         6         7         8         9
add 1 2 3   fetch  decode execute memory  writeback
nand 4 5 6         fetch  decode  execute memory    writeback
lw 2 4 20                 fetch   decode  execute   memory    writeback
add 2 5 5                         fetch   decode    execute   memory    writeback
sw 3 7 10                                 fetch     decode    execute   memory    writeback
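The diagonal in the time graph implies a simple cycle count, sketched here: with no stalls, the first instruction takes one cycle per stage, and each later instruction retires one cycle after its predecessor, so N instructions on a k-stage pipeline take N + k - 1 cycles.

```c
/* Total cycles for n_instructions on an n_stages-deep pipeline, no stalls. */
int pipeline_cycles(int n_instructions, int n_stages) {
    return n_instructions + n_stages - 1;
}
```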

36 What can go wrong?
Data hazards: since register reads occur in stage 2 and register writes occur in stage 5, it is possible to read the wrong value if it is about to be written.
Control hazards: a branch instruction may change the PC, but not until stage 4. What do we fetch before that?
Exceptions: how do you handle exceptions in a pipelined processor with 5 instructions in flight?

