Presentation is loading. Please wait.

Presentation is loading. Please wait.

Chapter 4: Instruction Set Architectures

Similar presentations


Presentation on theme: "Chapter 4: Instruction Set Architectures"— Presentation transcript:

1 Chapter 4: Instruction Set Architectures
CS140 Computer Organization Chapter 4: Instruction Set Architectures These slides are derived from those of Null & Lobur + the work of others. Considerable material from previous years gleaned from Patterson & Hennessy. Chapter 4: ISA

2 Chapter 4 Objectives & Introduction
Learn the components common to every modern computer system. Be able to explain how each component contributes to program execution. Understand an ISA and how it relates to a real architecture. Know how the program assembly process works. Introduction: Chapter 1 presented a general overview of computer systems. Chapter 2 discussed how data is formatted, stored and manipulated. Chapter 3 described the fundamentals of digital circuits. With this, we can understand how computer components work, and how they fit together to create useful computer systems. Chapter 4: ISA

3 Computer Components Processor Chipset Memory Video Controller
Chapters 4 & 5 Chapter 6 FSB FSB Processor Chipset Memory Video Controller PCI Controller Clock PCI Bus Chapter 7 Chapter 4: ISA

4 4.2 CPU Basics The computer’s CPU fetches, decodes, and executes program instructions. The two principal parts of the CPU are the datapath and the control unit. The datapath consists of an arithmetic-logic unit and storage units (registers) that are interconnected by a data bus that is also connected to main memory. Various CPU components perform sequenced operations according to signals provided by its control unit. The control unit determines which actions to carry out according to the values in a program counter register and a status register. It’s like plumbing – the datapath is the pipes, the control units are the faucets and valves. Chapter 4: ISA

5 4.3 The Bus PARALLEL BUS The CPU shares data with other system components by way of a data bus. A bus is a set of wires that simultaneously convey a single bit along each line. Two types of buses are commonly found in computer systems: point-to-point, and multipoint buses. This is a point-to-point bus configuration: SERIAL BUS Deferred to later when we discuss IO Chapter 4: ISA

6 4.3 The Bus Buses have data lines, control lines, and address lines.
The data lines convey bits from one device to another, Control lines determine the direction of data flow, and when each device can access the bus. Address lines determine the location of the source or destination of the data. Chapter 4: ISA

7 4.3 The Bus A multipoint bus is shown below.
A multipoint bus is a shared resource, access to it is controlled through protocols, which are built into the hardware and handled by the control lines. Chapter 4: ISA

8 4.3 The Bus In a master-slave configuration, where more than one device can be the bus master, concurrent bus master requests must be arbitrated. Four categories of bus arbitration are: Daisy chain: Permissions are passed from the highest-priority device to the lowest. Centralized parallel: Each device is directly connected to an arbitration circuit. Distributed using self-detection: Devices decide which gets the bus among themselves. Distributed using collision-detection: Any device can try to use the bus. If its data collides with the data of another device, it tries again. Chapter 4: ISA

9 4.4 Clocks Every computer contains at least one clock that synchronizes the activities of its components. A fixed number of clock cycles are required to carry out each data movement or computational operation. The clock frequency, measured in megahertz or gigahertz, determines the speed with which all operations are carried out. Clock cycle time is the reciprocal of clock frequency. Typical Intel and typical PIC clocks look like this: A 2 GHz clock has a cycle time of 0.5 nanoseconds. A 8 MHz clock has a cycle time of microseconds. One master clock has multiple frequencies used for various parts of the system. Chapter 4: ISA

10 4.4 Clocks Clock speed should not be confused with CPU performance.
The CPU time required to run a program is given by the general performance equation: We can improve CPU performance when we reduce the number of instructions in a program, reduce the number of cycles per instruction, or reduce the number of nanoseconds per clock cycle. Chapter 4: ISA

11 Digression: Sequential Logic, Clocking
4.4 Clocks Digression: Sequential Logic, Clocking Combinational circuits: no memory Output depends only on the inputs Sequential circuits: have memory How to ensure memory element is updated neither too soon, nor too late? Recall hardware circuits Flip/flop register is the writable memory element Gate propagation delay means result takes time to stabilize; Delay varies with inputs Must wait until result stable before we can write that output register to the next stage - otherwise garbage results. How to be certain ALU output is stable? Solution: let the inputs chatter and stabilize – THEN apply the clock. Chapter 4: ISA

12 Digression: Sequential Logic, Clocking
4.4 Clocks Digression: Sequential Logic, Clocking Clock: free running signal with fixed cycle time (clock period) period rising edge falling edge high (1) low (0) Clock determines when to write memory element level-triggered - store clock high (low) edge-triggered - store only on clock edge We will use negative (falling) edge-triggered methodology Chapter 4: ISA

13 Role of Clock in Processors
4.4 Clocks Role of Clock in Processors single-cycle machine: does everything in one clock cycle instruction execution = up to 5 steps must complete 5th step before cycle ends clock signal instruction execution step 1/step 2/step 3/step 4/step 5 datapath stable register(s) written falling clock edge rising clock edge Chapter 4: ISA

14 4.5 The Input/Output Subsystem
A computer communicates with the outside world through its input/output (I/O) subsystem. I/O devices connect to the CPU through various interfaces. I/O can be memory-mapped-- where the I/O device behaves like main memory from the CPU’s point of view. Or I/O can be instruction-based, where the CPU has a specialized I/O instruction set. Chapter 4: ISA

15 4.6 Memory Organization Computer memory is a linear array of addressable storage cells that are similar to registers. Memory can be byte-addressable (most common), or word-addressable, where a word consists of two or more bytes. Memory is constructed of RAM chips, often referred to in terms of length  width. If the addressable-unit of the machine is 8 bits, then a 4M  8 RAM chip gives us 4 megabytes of 8-bit memory locations. Chapter 4: ISA

16 Power 2 ^ Power 1 2 4 3 8 16 5 32 6 64 7 128 256 9 512 10 1,024 11 2,048 12 4,096 13 8,192 14 16,384 15 32,768 65,536 17 131,072 18 262,144 19 524,288 20 1,048,576 21 2,097,152 22 4,194,304 4.6 Memory Organization How does the computer access a memory location corresponds to a particular address? We see that 4 Megabytes = 222 bytes. The memory locations for this memory are numbered 0 through Thus, the memory bus of this system requires at least 22 address lines. The address lines “count” from 0 to in binary. Each line is either “on” or “off” indicating the location of the desired memory element. Chapter 4: ISA

17 Low-Order Interleaving
4.6 Memory Organization Physical memory usually consists of more than one RAM chip. Access is more efficient when memory is organized into banks of chips with the addresses interleaved across the chips With low-order interleaving, the low order bits of the address specify which memory bank contains the address of interest. Low-Order Interleaving Byte Addresses Chapter 4: ISA

18 4.7 Interrupts The normal execution of a program is altered when an event of higher-priority occurs. The CPU is alerted to such an event through an interrupt. Interrupts can be triggered by I/O requests, arithmetic errors (such as division by zero), or when an invalid instruction is encountered. For general-purpose systems, it is common to disable all interrupts during the time in which an interrupt is being processed. Typically, this is achieved by setting a bit in the flags register. Interrupts that are ignored in this case are called maskable. Nonmaskable interrupts are high-priority interrupts that cannot be ignored. (the CPU is on fire is nonmaskable.) Each interrupt is associated with a procedure that directs the actions of the CPU when an interrupt occurs. In Chapter 7 we’ll look at interrupts in more detail. Chapter 4: ISA

19 4.7 Interrupts Interrupt processing involves adding another step to the fetch-decode-execute cycle as shown below. The next slide shows a flowchart of “Process the interrupt.” Chapter 4: ISA

20 4.7 Interrupts Chapter 4: ISA

21 4.9 Instruction Processing
Detour looking at Instruction Sets How are instructions laid out so they can be simply decoded? Going from PIC to MIPS Detour looking at Hardware Multiplexers, registers, and ALUs The datapath Looking at each stage The datapath and some sample instructions. Chapter 4: ISA

22 4.9 Instruction Processing
Detour looking at Instruction Sets These are the 35 instructions available on the PIC processors we’ve used. There are four types of instructions and the formats for these instructions are defined here. Byte oriented instructions have a 00 here. Bit oriented instructions have a 01 here. Goto and Call have a 10 Literal instructions have a 11 Chapter 4: ISA

23 4.9 Instruction Processing
Detour looking at Instruction Sets There are four types of instructions and the formats for these instructions are defined here. PIC is a RISC (Reduced Instruction Set Computer.) A relatively simple set of rules can decipher these instructions. Conversely, Intel is a CISC (Complex Instruction Set Computer.) The wiring needed to decipher the thousands of Intel instructions is overwhelming. Chapter 4: ISA

24 The MIPS Instruction Formats
4.9 Instruction Processing Detour looking at Instruction Sets The MIPS Instruction Formats All MIPS instructions are 32 bits long. The three instruction formats: R-type I-type J-type The different fields are: op: operation of the instruction rs, rt, rd: the source and destination register specifiers shamt: shift amount funct: selects the variant of the operation in the “op” field address / immediate: address offset or immediate value target address: target address of the jump instruction op rs rt rd shamt funct 6 11 16 21 26 31 6 bits 5 bits op rs rt immediate 16 21 26 31 6 bits 16 bits 5 bits One of the most important thing you need to know before you start designing a processor is how the instructions look like. Or in more technical term, you need to know the instruction format. One good thing about the MIPS instruction set is that it is very simple. First of all, all MIPS instructions are 32 bits long and there are only three instruction formats: (a) R-type, (b) I-type, and (c) J-type. The different fields of the R-type instructions are: (a) OP specifies the operation of the instruction. (b) Rs, Rt, and Rd are the source and destination register specifiers. (c) Shamt specifies the amount you need to shift for the shift instructions. (d) Funct selects the variant of the operation specified in the “op” field. For the I-type instruction, bits 0 to 15 are used as an immediate field. I will show you how this immediate field is used differently by different instructions. Finally for the J-type instruction, bits 0 to 25 become the target address of the jump. +3 = 10 min. (X:50) op target address 26 31 6 bits 26 bits Hey – these are the same flavors as the PIC instructions. MIPS is a RISC It has about 75 instructions. Chapter 4: ISA

25 The MIPS-lite Subset for Today
4.9 Instruction Processing Detour looking at Instruction Sets The MIPS-lite Subset for Today ADD and SUB addu rd, rs, rt subu rd, rs, rt OR Immediate: ori rt, rs, imm16 LOAD / STORE Word lw rt, rs, imm16 sw rt, rs, imm16 BRANCH: beq rs, rt, imm16 op rs rt rd shamt funct 6 11 16 21 26 31 6 bits 5 bits op rs rt immediate 16 21 26 31 6 bits 16 bits 5 bits In today’s lecture, I will show you how to implement the following subset of MIPS instructions: add, subtract, or immediate, load, store, branch, and the jump instruction. The Add and Subtract instructions use the R format. The Op together with the Func fields together specified all the different kinds of add and subtract instructions. Rs and Rt specifies the source registers. And the Rd field specifies the destination register. The Or immediate instruction uses the I format. It only uses one source register, Rs. The other operand comes from the immediate field. The Rt field is used to specified the destination register. (Note that dest is the Rt field!) Both the load and store instructions use the I format and both add the Rs and the immediate filed together to from the memory address. The difference is that the load instruction will load the data from memory into Rt while the store instruction will store the data in Rt into the memory. The branch on equal instruction also uses the I format. Here Rs and Rt are used to specified the registers we need to compare. If these two registers are equal, we will branch to a location offset by the immediate field. Finally, the jump instruction uses the J format and always causes the program to jump to a memory location specified in the address field. I know I went over this rather quickly and you may have missed something. But don’t worry, this is just an overview. You will keep seeing these (point to the format) all day today. +3 = 13 min. (X:53) MIPS–Lite has 6 instructions. op rs rt immediate 16 21 26 31 6 bits 16 bits 5 bits op rs rt immediate 16 21 26 31 6 bits 16 bits 5 bits Chapter 4: ISA

26 The MIPS-lite Subset for Today
4.9 Instruction Processing Detour looking at Instruction Sets The MIPS-lite Subset for Today Things to Note: Since each instruction is 32 bits, instruction addresses are mod 4. These instructions access 3 registers. The ori can use a 16 bit immediate. Each register needs 5 bits to specify it – how many registers does this machine have? In today’s lecture, I will show you how to implement the following subset of MIPS instructions: add, subtract, or immediate, load, store, branch, and the jump instruction. The Add and Subtract instructions use the R format. The Op together with the Func fields together specified all the different kinds of add and subtract instructions. Rs and Rt specifies the source registers. And the Rd field specifies the destination register. The Or immediate instruction uses the I format. It only uses one source register, Rs. The other operand comes from the immediate field. The Rt field is used to specified the destination register. (Note that dest is the Rt field!) Both the load and store instructions use the I format and both add the Rs and the immediate filed together to from the memory address. The difference is that the load instruction will load the data from memory into Rt while the store instruction will store the data in Rt into the memory. The branch on equal instruction also uses the I format. Here Rs and Rt are used to specified the registers we need to compare. If these two registers are equal, we will branch to a location offset by the immediate field. Finally, the jump instruction uses the J format and always causes the program to jump to a memory location specified in the address field. I know I went over this rather quickly and you may have missed something. But don’t worry, this is just an overview. You will keep seeing these (point to the format) all day today. +3 = 13 min. (X:53) Chapter 4: ISA

27 The MIPS-lite Subset for Today
4.9 Instruction Processing Detour looking at Instruction Sets The MIPS-lite Subset for Today In today’s lecture, I will show you how to implement the following subset of MIPS instructions: add, subtract, or immediate, load, store, branch, and the jump instruction. The Add and Subtract instructions use the R format. The Op together with the Func fields together specified all the different kinds of add and subtract instructions. Rs and Rt specifies the source registers. And the Rd field specifies the destination register. The Or immediate instruction uses the I format. It only uses one source register, Rs. The other operand comes from the immediate field. The Rt field is used to specified the destination register. (Note that dest is the Rt field!) Both the load and store instructions use the I format and both add the Rs and the immediate filed together to from the memory address. The difference is that the load instruction will load the data from memory into Rt while the store instruction will store the data in Rt into the memory. The branch on equal instruction also uses the I format. Here Rs and Rt are used to specified the registers we need to compare. If these two registers are equal, we will branch to a location offset by the immediate field. Finally, the jump instruction uses the J format and always causes the program to jump to a memory location specified in the address field. I know I went over this rather quickly and you may have missed something. But don’t worry, this is just an overview. You will keep seeing these (point to the format) all day today. +3 = 13 min. (X:53) Chapter 4: ISA

28 4.9 Instruction Processing
Detour looking at Instruction Sets How are instructions laid out so they can be simply decoded? Going from PIC to MIPS Detour looking at Hardware Multiplexers, registers, and ALUs The datapath Looking at each stage The datapath and some sample instructions. Chapter 4: ISA

29 4.9 Instruction Processing
Detour looking at Hardware Basic Building Blocks D-Latches D-latch based on SR-Latch with NAND Gates and control input C C = 0, no change of state; Q (t + t ) = Q (t ) C = 1, change is allowed; Q (t + t ) = D (t ) No Indetermined Output Chapter 4: ISA

30 4.9 Instruction Processing
Detour looking at Hardware Basic Building Blocks 32 A B Sum Carry Adder CarryIn Adder Based on the Register Transfer Language examples we have so far, we know we will need the following combinational logic elements. We will need an adder to update the program counter. A MUX to select the results. And finally, an ALU to do various arithmetic and logic operation. +1 = 30 min. (Y:10) 32 A B Result OP ALU 32 A B Y Select MUX MUX ALU Chapter 4: ISA

31 Storage Element: Register File
4.9 Instruction Processing Detour looking at Hardware Basic Building Blocks Storage Element: Register File Register File consists of 32 registers: Two 32-bit output busses: busA and busB One 32-bit input bus: busW Register is selected by: RA (number) selects the register to put on busA (data) RB (number) selects the register to put on busB (data) RW (number) selects the register to be written via busW (data) when Write Enable is 1 Clock input (CLK) The CLK input is a factor ONLY during write operation During read operation, behaves as a combinational logic block: RA or RB valid  busA or busB valid after “access time.” Write Enable Clk busW 32 busA busB 5 RW RA RB 32 32-bit Registers We will also need a register file that consists of bit registers with two output busses (busA and busB) and one input bus. The register specifiers Ra and Rb select the registers to put on busA and busB respectively. When Write Enable is 1, the register specifier Rw selects the register to be written via busW. In our simplified version of the register file, the write operation will occurs at the clock tick. Keep in mind that the clock input is a factor ONLY during the write operation. During read operation, the register file behaves as a combinational logic block. That is if you put a valid value on Ra, then bus A will become valid after the register file’s access time. Similarly if you put a valid value on Rb, bus B will become valid after the register file’s access time. In both cases (Ra and Rb), the clock input is not a factor. +2 = 33 min. (Y:13) Chapter 4: ISA

32 4.9 Instruction Processing
Detour looking at Hardware Basic Building Blocks Multiplexer Decoder 31 Input bits Selector Output Selector Outputs Input 31 The 5 selector wires can choose one of the 32 inputs voltages and send it to the output. The 5 selector wires choose which of the 32 outputs will get the input voltage. Chapter 4: ISA

33 4.9 Instruction Processing
Detour looking at Hardware Basic Building Blocks This multiplexer is equivalent to 32 of those on the previous page Multiplexer Bit 0 Bit 31 Bit 0 Bit 31 Selector Input words …………… Word 0 …………… Output Word 31 31 End View End View Now, each of the 32 inputs has 32 bits. There are 32 x 32 bits in and 1 x 32 bits out. Side View Each of these input words COULD be a register! Chapter 4: ISA

34 4.9 Instruction Processing
Detour looking at Hardware Basic Building Blocks Multiplexer 31 Input words Selector Output Write Enable Clk busW 32 busA busB 5 RW RA RB 32 32-bit Registers So this register file is just the multiplexer shown here. Chapter 4: ISA

35 Storage Element: Idealized Memory
4.9 Instruction Processing Detour looking at Hardware Basic Building Blocks Storage Element: Idealized Memory Memory (idealized) One input bus: Data In One output bus: Data Out Memory word is selected by: Address selects the word to put on Data Out Write Enable = 1: address selects the memory word to be written via the Data In bus Clock input (CLK) The CLK input is a factor ONLY during write operation During read operation, behaves as a combinational logic block: Address valid  Data Out valid after “access time.” Clk Data In Write Enable 32 DataOut Address The last storage element you will need for the datapath is the idealized memory to store your data and instructions. This idealized memory block has just one input bus (DataIn) and one output bus (DataOut). When Write Enable is 0, the address selects the memory word to put on the Data Out bus. When Write Enable is 1, the address selects the memory word to be written via the DataIn bus at the next clock tick. Once again, the clock input is a factor ONLY during the write operation. During read operation, it behaves as a combinational logic block. That is if you put a valid value on the address lines, the output bus DataOut will become valid after the access time of the memory. +2 = 35 min. (Y:15) Chapter 4: ISA

36 4.9 Instruction Processing
Detour looking at Instruction Sets How are instructions laid out so they can be simply decoded? Going from PIC to MIPS Detour looking at Hardware Multiplexers, registers, and ALUs The datapath Looking at each stage The datapath and some sample instructions. Chapter 4: ISA

37 4.9 Instruction Processing
The fetch-decode-execute-store cycle is the series of steps that a computer carries out when it runs a program. We first have to fetch an instruction from memory, and place it into the Instruction Register (IR). Once in the IR, it is decoded to determine what needs to be done next. If an immediate operand is involved in the operation, it is retrieved and prepared for execution. With everything in place, the instruction is executed. If a result is to be stored in memory, that’s done next. If a result is placed in a register, that’s the last stage. The next slide shows a flowchart of this process. Chapter 4: ISA

38 Generic Steps: Datapath
4.9 Instruction Processing Generic Steps: Datapath PC instruction memory +4 rt rs rd registers ALU Data imm 1. Instruction Fetch 2. Decode/ Register Read 3. Execute 4. Memory 5. Reg. Write Chapter 4: ISA

39 Stages of the Datapath (1/6)
4.9 Instruction Processing PC instruction memory +4 rt rs rd registers ALU Data imm 1. Instruction Fetch 2. Decode/ Register Read 3. Execute 4. Memory 5. Reg. Write Stages of the Datapath (1/6) Problem: a single, atomic block which “executes an instruction” (performs all necessary operations beginning with fetching the instruction) would be too bulky and inefficient Solution: break up the process of “executing an instruction” into stages, and then connect the stages to create the whole datapath Smaller stages are easier to design Easy to optimize (change) one stage without touching the others Chapter 4: ISA

40 Stages of the Datapath (2/6)
4.9 Instruction Processing PC instruction memory +4 rt rs rd registers ALU Data imm 1. Instruction Fetch 2. Decode/ Register Read 3. Execute 4. Memory 5. Reg. Write Stages of the Datapath (2/6) There is a wide variety of MIPS instructions: so what general steps do they have in common? Stage 1: instruction fetch No matter what the instruction, the 32-bit instruction word must first be fetched from memory (the cache- memory hierarchy) Also, this is where we increment PC (that is, PC = PC + 4, to point to the next instruction: byte addressing so + 4) Chapter 4: ISA

41 Stages of the Datapath (3/6)
4.9 Instruction Processing PC instruction memory +4 rt rs rd registers ALU Data imm 1. Instruction Fetch 2. Decode/ Register Read 3. Execute 4. Memory 5. Reg. Write Stages of the Datapath (3/6) Stage 2: Instruction Decode upon fetching the instruction, we next gather data from the fields (decode all necessary instruction data) first, read the Opcode to determine instruction type and field lengths second, read in data from all necessary registers for add, read two registers for addi, read one register for jal, no reads necessary Chapter 4: ISA

42 Stages of the Datapath (4/6)
4.9 Instruction Processing PC instruction memory +4 rt rs rd registers ALU Data imm 1. Instruction Fetch 2. Decode/ Register Read 3. Execute 4. Memory 5. Reg. Write Stages of the Datapath (4/6) Stage 3: ALU (Arithmetic-Logic Unit) the real work of most instructions is done here: arithmetic (+, -, *, /), shifting, logic (&, |), comparisons (slt) what about loads and stores? lw $t0, 40($t1) the address we are accessing in memory = the value in $t1 + the value 40 so we do this addition in this stage Chapter 4: ISA

43 Stages of the Datapath (5/6)
4.9 Instruction Processing PC instruction memory +4 rt rs rd registers ALU Data imm 1. Instruction Fetch 2. Decode/ Register Read 3. Execute 4. Memory 5. Reg. Write Stages of the Datapath (5/6) Stage 4: Memory Access actually only the load and store instructions do anything during this stage; the others remain idle since these instructions have a unique step, we need this extra stage to account for them as a result of the cache system, this stage is expected to be just as fast (on average) as the others Chapter 4: ISA

44 Stages of the Datapath (6/6)
4.9 Instruction Processing PC instruction memory +4 rt rs rd registers ALU Data imm 1. Instruction Fetch 2. Decode/ Register Read 3. Execute 4. Memory 5. Reg. Write Stages of the Datapath (6/6) Stage 5: Register Write most instructions write the result of some computation into a register examples: arithmetic, logical, shifts, loads, slt what about stores, branches, jumps? don’t write anything into a register at the end these remain idle during this fifth stage Chapter 4: ISA

45 Generic Steps: Datapath
4.9 Instruction Processing Generic Steps: Datapath PC instruction memory +4 rt rs rd registers ALU Data imm 1. Instruction Fetch 2. Decode/ Register Read 3. Execute 4. Memory 5. Reg. Write Chapter 4: ISA

46 4.9 Instruction Processing
Detour looking at Instruction Sets How are instructions laid out so they can be simply decoded? Going from PIC to MIPS Detour looking at Hardware Multiplexers, registers, and ALUs The datapath Looking at each stage The datapath and some sample instructions. Chapter 4: ISA

47 Datapath Walkthrough #1 - add
4.9 Instruction Processing Datapath Walkthrough #1 - add add $r3, $r1, $r2 # r3 = r1+r2 Stage 1: fetch this instruction, incr. PC; Stage 2: decode to find it’s an add, read registers $r1 and $r2; Stage 3: add the two values retrieved in Stage 2; Stage 4: idle (nothing to write to memory); Stage 5: write result of Stage 3 into register $r3; Chapter 4: ISA

48 4.9 Instruction Processing
Datapath Walkthrough #1 add $r3, $r1, $r2 # r3 = r1+r2 PC instruction memory +4 registers ALU Data imm 2 1 3 add r3, r1, r2 reg[1]+reg[2] reg[2] reg[1] Chapter 4: ISA

49 Datapath Walkthroughs #2 - sw
4.9 Instruction Processing Datapath Walkthroughs #2 - sw sw $r3, 17($r1) Stage 1: fetch this instruction, inc. PC Stage 2: decode to find it’s a sw, then read registers $r1 and $r3 Stage 3: add 17 to value in register $41 (retrieved in Stage 2) Stage 4: write value in register $r3 (retrieved in Stage 2) into memory address computed in Stage 3 Stage 5: go idle (nothing to write into a register) Chapter 4: ISA

50 4.9 Instruction Processing Datapath Walkthroughs #2 sw $r3, 17($r1)
PC instruction memory +4 registers ALU Data imm 3 1 x SW r3, 17(r1) reg[1]+17 17 reg[1] MEM[r1+17]<=r3 reg[3] Chapter 4: ISA

51 4.9 Instruction Processing
Datapath Walkthroughs #3 – lw NOTE: This is the one instruction that requires all 5 stages lw $r3, 17($r1) Stage 1: fetch this instruction, inc. PC Stage 2: decode to find it’s a lw, then read register $r1 Stage 3: add 17 to value in register $r1 (retrieved in Stage 2) Stage 4: read value from memory address compute in Stage 3 Stage 5: write value found in Stage 4 into register $r3 Chapter 4: ISA

52 4.9 Instruction Processing Datapath Walkthroughs #3 lw $r3, 17($r1)
PC instruction memory +4 registers ALU Data imm 3 1 x LW r3, 17(r1) reg[1]+17 17 reg[1] MEM[r1+17] Chapter 4: ISA

53 Datapath Summary Controller
The datapath based on data transfers required to perform instructions A controller causes the right transfers to happen PC instruction memory +4 rt rs rd registers ALU Data imm Controller opcode, funct Chapter 4: ISA

54 4.13 Decoding & Control An overview of how control works  from instructions to electronics How are instructions laid out so they can be simply decoded? Going from PIC to MIPS Control operations Samples of load, store and branch The fetch unit in detail using “add” as example. Operation of controls using or immediate, store, branch Chapter 4: ISA

55 4.13 Decoding Mapping Code onto the DataPath: How does it all work? We Start With Code ############################################################# # This is code that mimics the following C program. # main( ) # { # printf( "Hello World\n" ); # } # ########################################################### .text .globl main main: lui $a0, hello ori $v0, $0, 4 # li $v0, 4 add $t0, $t1, $t2 syscall jr $ra .data hello: .asciiz "Hello World\n" Chapter 4: ISA

56 From that code we get a 32-bit Equivalent Binary Representation.
4.13 Decoding From that code we get a 32-bit Equivalent Binary Representation. Address Hex Op-code Mnemonic [0x ] 0x3c lui $2, ; 12: la $a0, hello [0x ] 0x ori $4, $2, ; [0x ] 0x ori $2, $0, ; 14: ori $v0, $0, 4 [0x c] 0x012a4020 add $8, $9, $10 ; 14: add $t0,$t1,$t2 [0x ] 0x c syscall ; 15: syscall [0x ] 0x03e jr $ ; 16: jr $ra a Chapter 4: ISA

57 What the hardware looks like.
4.13 Decoding Registers What the hardware looks like. Op Select 5 wires R0 6 wires MUX To ALU R8 A B MUX To ALU ALU c ovf Out MUX From ALU R31 Chapter 4: ISA

58 What the hardware looks like.
4.13 Decoding Op What the hardware looks like. Registers R0 MUX To ALU R8 A B MUX To ALU ALU Out MUX From ALU R31 Chapter 4: ISA

59 The hardware reads each of those fields.
4.13 Decoding The hardware reads each of those fields. Registers R0 MUX To ALU R8 A B MUX To ALU ALU Out MUX From ALU Chapter 4: ISA R31

60 An Overview of the Implementation
4.13 Decoding & Control An Overview of the Implementation Data Out Clk 5 Rw Ra Rb 32 32-bit Registers Rd ALU Data In Address Ideal Memory Instruction PC Rs Rt 32 A B Next Address Control Datapath Control Signals Conditions One thing you may noticed from our last slide is that almost all instructions, except Jump, require reading some registers, do some computation, and then do something else. Therefore our datapath will look something like this. For example, if we have an add instruction (points to the output of Instruction Memory), we will read the registers from the register file (Ra, Rb and then busA and busB). Add the two numbers together (ALU) and then write the result back to the register file. On the other hand, if we have a load instruction, we will first use the ALU to calculate the memory address. Once the address is ready, we will use it to access the Data Memory. And once the data is available on Data Memory’s output bus, we will write the data to the register file. Well, this is simple enough. But if it is this simple, you probably won’t need to take this class. So in today’s lecture, I will show you how to turn this abstract datapath into a real datapath by making it slightly (JUST slightly) more complicated so it can do real work for you. But before we do that, let’s do a quick review of the clocking methodology +3 = 16 (X:56) Chapter 4: ISA

61 4.13 Decoding & Control An overview of how control works  from instructions to electronics How are instructions laid out so they can be simply decoded? Going from PIC to MIPS Control operations Samples of load, store and branch The fetch unit in detail using “add” as example. Operation of controls using or immediate, store, branch Chapter 4: ISA

62 Overview of the Instruction Fetch Unit
4.13 Decoding & Control Control Samples The common operations Fetch the Instruction: mem[PC] Update the program counter: Sequential Code: PC  PC + 4 Branch and Jump: PC  “something else” Overview of the Instruction Fetch Unit 32 Instruction Word Address Instruction Memory PC Clk Next Address Logic Now let’s take a look at the first major component of the datapath: the instruction fetch unit. The common RTL operations for all instructions are: (a) Fetch the instruction using the Program Counter (PC) at the beginning of an instruction’s execution (PC -> Instruction Memory -> Instruction Word). (b) Then at the end of the instruction’s execution, you need to update the Program Counter (PC -> Next Address Logic -> PC). More specifically, you need to increment the PC by 4 if you are executing sequential code. For Branch and Jump instructions, you need to update the program counter to “something else” other than plus 4. I will show you what is inside this Next Address Logic block when we talked about the Branch and Jump instructions. For now, let’s focus our attention to the Add and Subtract instructions. +2 = 37 min. (Y:17) Chapter 4: ISA

63 4.13 Decoding & Control Control Samples
Add & Subtract R[rd]  R[rs] op R[rt]; Example: addu rd, rs, rt Ra, Rb, and Rw come from instruction’s rs, rt, and rd fields ALUctr and RegWr: control logic after decoding the instruction op rs rt rd shamt funct 6 11 16 21 26 31 6 bits 5 bits And here is the datapath that can do the trick. First of all, we connect the register file’s Ra, Rb, and Rw input to the Rd, Rs, and Rt fields of the instruction bus (points to the format diagram). Then we need to connect busA and busB of the register file to the ALU. Finally, we need to connect the output of the ALU to the input bus of the register file. Conceptually, this is how it works. The instruction bus coming out of the Instruction memory will set the Ra and Rb to the register specifiers Rs and Rt. This causes the register file to put the value of register Rs onto busA and the value of register Rt onto busB, respectively. By setting the ALUctr appropriately, the ALU will perform either the Add and Subtract for us. The result is then fed back to the register file where the register specifier Rw should already be set to the instruction bus’s Rd field. Since the control, which we will design in our next lecture, should have already set the RegWr signal to 1, the result will be written back to the register file at the next clock tick (points to the Clk input). +3 = 42 min. (Y:22) Rd Rs Rt RegWr ALUctr 5 5 5 busA Rw Ra Rb busW 32 Result bit Registers ALU 32 32 Clk busB 32 Chapter 4: ISA

64 Logical Operations With Immediate
4.13 Decoding & Control Control Samples Logical Operations With Immediate R[rt]  R[rs] op ZeroExt[ imm16 ] 11 op rs rt immediate 16 21 26 31 6 bits 16 bits 5 bits rd? immediate 16 15 31 16 bits 32 Result ALUctr Clk busW RegWr busA busB 5 Rw Ra Rb 32 32-bit Registers Rs ZeroExt Mux Rt Rd RegDst 16 imm16 ALUSrc ALU Rt? Here is the datapath for the Or immediate instructions. We cannot use the Rd field here (Rw) because in this instruction format, we don’t have a Rd field. The Rd field in the R-type is used here as part of the immediate field. For this instruction type, Rw input of the register file, that is the address of the register to be written, comes from the Rt field of the instruction. Recalled from earlier slide that for R-type instruction, the Rw comes from the Rd field. That’s why we need a MUX here to put Rd onto Rw for R-type instructions and to put Rt onto Rw for the I-type instruction. Since the second operation of this instruction will be the immediate field zero extended to 32 bits, we also need a MUX here to block off bus B from the register file. Since bus B is blocked off by the MUX, the value on bus B is don’t care. Therefore we do not have to worry about what ends up on the register file’s Rb register specifier. To keep things simple, we may just as well keep it the same as the R-type instruction and put the Rt field here. So to summarize, this is how this datapath works. With Rs on Register File’s Ra input, bus A will get the value of Rs as the first ALU operand. The second operand will come from the immediate field of the instruction. Once the ALU complete the OR operation, the result will be written into the register specified by the instruction’s Rt field. +3 = 50 min. (Y:30) Chapter 4: ISA

65 4.13 Decoding & Control Control Samples
Load Operations R[rt]  Mem[R[rs] + SignExt[imm16]]; Example: lw rt, rs, imm16 11 op rs rt immediate 16 21 26 31 6 bits 16 bits 5 bits rd 32 ALUctr Clk busW RegWr busA busB 5 Rw Ra Rb 32 32-bit Registers Rs Rt Rd RegDst Extender Mux 16 imm16 ALUSrc ExtOp Data In WrEn Adr Data Memory ALU MemWr W_Src ?? Rt? Once again we cannot use the instruction’s Rd field for the Register File’s Rw input because load is a I-type instruction and there is no such thing as the Rd field in the I format. So instead of Rd, the Rt field is used to specify the destination register through this two to one multiplexor. The first operand of the ALU comes from busA of the register file which contains the value of Register Rs (points to the Ra input of the register file). The second operand, on the other hand, comes from the immediate field of the instruction. Instead of using the Zero Extender I used in datapath for the or immediate datapath, I have to use a more general purpose Extender that can do both Sign Extend and Zero Extend. The ALU then adds these two operands together to form the memory address. Consequently, the output of the ALU has to go to two places: (a) First the address input of the data memory. (b) And secondly, also to the input of this two-to-one multiplexer. The other input of this multiplexer comes from the output of the data memory so we can place the output of the data memory onto the register file’s input bus for the load instruction. For Add, Subtract, and the Or immediate instructions, the output of the ALU will be selected to be placed on the input bus of the register file. In either case, the control signal RegWr should be asserted so the register file will be written at the end of the cycle. +3 = 60 min. (Y:40) Chapter 4: ISA

66 4.13 Decoding & Control Control Samples Store Operations
Mem[ R[rs] + SignExt[imm16]  R[rt] ]; Example: sw rt, rs, imm16 op rs rt immediate 16 21 26 31 6 bits 16 bits 5 bits 32 ALUctr Clk busW RegWr busA busB 5 Rw Ra Rb 32 32-bit Registers Rs Rt Rd RegDst Extender Mux 16 imm16 ALUSrc ExtOp Data In WrEn Adr Data Memory MemWr ALU W_Src And here is the datapath for the store instruction. The Register File, the ALU, and the Extender are the same as the datapath for the load instruction because the memory address has to be calculated the same exact way: (a) Put the register selected by Rs onto bus A and sign extend the 16 bit immediate field. (b) Then make the ALU (ALUctr) adds these two (busA and output of Extender) together. The new thing we added here is busB extension (DataIn). More specifically, in order to send the register selected by the Rt field (Rb of the register file) to data memory, we need to connect bus B to the data memory’s Data In bus. Finally, the store instruction is the first instruction we encountered that does not do any register write at the end. Therefore the control unit must make sure RegWr is zero for this instruction. +2 = 64 min. (Y:44) Chapter 4: ISA

67 The Branch Instruction
4.13 Decoding & Control Control Samples The Branch Instruction op rs rt immediate 16 21 26 31 6 bits 16 bits 5 bits How does the branch on equal instruction work? Well it calculates the branch condition by subtracting the register selected by the Rt field from the register selected by the Rs field. If the result of the subtraction is zero, then these two registers are equal and we take a branch. Otherwise, we keep going down the sequential path (PC <- PC +4). +1 = 65 min. (Y:45) beq rs, rt, imm16 mem[PC] Fetch the instruction from memory Equal  R[rs] == R[rt] Calculate the branch condition if (Equal) Calculate the next instruction’s address PC  PC ( SignExt(imm16)  4 ) else PC  PC + 4 Chapter 4: ISA

68 Datapath for Branch Operations
4.13 Decoding & Control Control Samples Datapath for Branch Operations op rs rt immediate 16 21 26 31 6 bits 16 bits 5 bits beq rs, rt, imm16 Datapath generates condition (equal) 32 imm16 PC Clk 00 Adder Mux 4 nPC_sel busW RegWr busA busB 5 Rw Ra Rb bit Registers Rs Rt Equal? Cond PC Ext Inst Address The datapath for calculating the branch condition is rather simple. All we have to do is feed the Rs and Rt fields of the instruction into the Ra and Rb inputs of the register file. Bus A will then contain the value from the register selected by Rs. And bus B will contain the value from the register selected by Rt. The next thing to do is to ask the ALU to perform a subtract operation and feed the output Zero to the next address logic. How does the next address logic block look like? Well, before I show you that, let’s take a look at the binary arithmetics behind the program counter (PC). +2 = 67 min. (Y:47) Chapter 4: ISA

69 Summary: A Single Cycle Datapath
4.13 Decoding & Control Rs, Rt, Rd and Imed16 hardwired into datapath from Fetch Unit We have everything except control signals (the underlined pieces) Summary: A Single Cycle Datapath Adr Inst Memory Instruction<31:0> <21:25> <16:20> <11:15> <0:15> Rs Rt Rd Imm16 nPC_sel RegDst ALUctr MemWr MemtoReg Equal So here is the single cycle datapath we just built. If you push into the Instruction Fetch Unit, you will see the last slide showing the PC, the next address logic, and the Instruction Memory. Here I have shown how we can get the Rt, Rs, Rd, and Imm16 fields out of the 32-bit instruction word. The Rt, Rs, and Rd fields will go to the register file as register specifiers while the Imm16 field will go to the Extender where it is either Zero and Sign extended to 32 bits. The signals ExtOp, ALUSrc, ALUctr, MemWr, MemtoReg, RegDst, RegWr, Branch, and Jump are control signals. And I will show you how to generate them on Friday. +2 = 80 min. (Z:00) Rd Rt 1 Rs Rt 4 Adder RegWr 5 5 5 busA Mux Rw Ra Rb 00 busW = 32 32 32-bit Registers 32 ALU busB PC 32 Adder 32 Mux Mux Clk 32 WrEn Adr 1 1 Data In PC Ext imm16 Clk Extender 32 Data Memory 16 imm16 Clk ExtOp ALUSrc Chapter 4: ISA

70 Summary: Meaning of the Control Signals
4.13 Decoding & Control Summary: Meaning of the Control Signals MemWr: 1 Þ write memory MemtoReg: 0 Þ ALU; 1 Þ Mem RegDst: 0 Þ “rt”; 1 Þ “rd” RegWr: 1 Þ write register ExtOp: “zero”, “sign” ALUsrc: 0 Þ regB; 1 Þ immed ALUctr: “add”, “sub”, “or” 32 ALUctr Clk busW RegWr busA busB 5 Rw Ra Rb 32 32-bit Registers Rs Rt Rd RegDst Extender Mux 16 imm16 ALUSrc ExtOp MemtoReg Data In WrEn Adr Data Memory MemWr ALU Equal 1 = Chapter 4: ISA

71 4.13 Decoding & Control An overview of how control works  from instructions to electronics How are instructions laid out so they can be simply decoded? Going from PIC to MIPS Control operations Samples of load, store and branch The fetch unit in detail using “add” as example. Operation of controls using or immediate, store, branch Chapter 4: ISA

72 4.13 Decoding & Control The add Instruction add rd, rs, rt
op rs rt rd shamt funct 6 11 16 21 26 31 6 bits 5 bits add rd, rs, rt mem[PC] Fetch the instruction from memory R[rd]  R[rs] + R[rt] The actual operation PC  PC + 4 Calculate next instruction address OK, let’s get on with today’s lecture by looking at the simple add instruction. In terms of Register Transfer Language, this is what the Add instruction need to do. First, you need to fetch the instruction from Memory. Then you perform the actual add operation. More specifically: (a) You add the contents of the register specified by the Rs and Rt fields of the instruction. (b) Then you write the results to the register specified by the Rd field. And finally, you need to update the program counter to point to the next instruction. Now, let’s take a detail look at the datapath during various phase of this instruction. +2 = 10 min. (X:50) Chapter 4: ISA

73 4.13 Decoding & Control The Fetch Unit
nPC_sel: 0 Þ PC  PC Þ PC  PC SignExt(Im16) || 00 Adr Inst Memory Adder PC Clk 00 Mux 4 nPC_sel PC Ext imm16 Chapter 4: ISA

74 Fetch Unit at Beginning (and end) of add
4.13 Decoding & Control Fetch Unit at Beginning (and end) of add Fetch the instruction from Instruction memory: Instruction  mem[PC] (This is the same for all instructions) But wait until we get to branch instructions!! Adr Inst Memory Adder PC Clk 00 Mux 4 nPC_sel imm16 Instruction<31:0> 1 Chapter 4: ISA

75 4.13 Decoding & Control An overview of how control works  from instructions to electronics How are instructions laid out so they can be simply decoded? Going from PIC to MIPS Control operations Samples of load, store and branch The fetch unit in detail using “add” as example. Operation of controls using or immediate, store, branch Chapter 4: ISA

76 The Single Cycle Datapath during Or Immediate
4.13 Decoding & Control The Single Cycle Datapath during Or Immediate op rs rt immediate 16 21 26 31 R[rt]  R[rs] or ZeroExt(Imm16) 32 ALUctr = Clk busW RegWr = busA busB 5 Rw Ra Rb 32 32-bit Registers Rs Rt Rd RegDst = Extender Mux 16 imm16 ALUSrc = ExtOp = MemtoReg = Data In WrEn Adr Data Memory MemWr = ALU Instruction Fetch Unit Zero Instruction<31:0> 1 <21:25> <16:20> <11:15> <0:15> Imm16 nPC_sel = Now let’s look at the control signals setting for the Or immediate instruction. The OR immediate instruction OR the content of the register specified by the Rs field to the Zero Extended Immediate field and write the result to the register specified in Rt. This is how it works in the datapath. The Rs field is fed to the Ra address port to cause the contents of register Rs to be placed on busA. The other operand for the ALU will come from the immediate field. In order to do this, the controller need to set ExtOp to 0 to instruct the extender to perform a Zero Extend operation. Furthermore, ALUSrc must set to 1 such that the MUX will block off bus B from the register file and send the zero extended version of the immediate field to the ALU. Of course, the ALUctr has to be set to OR so the ALU can perform an OR operation. The rest of the control signals (MemWr, MemtoReg, Branch, and Jump) are the same as theAdd and Subtract instructions. One big difference is the RegDst signal. In this case, the destination register is specified by the instruction’s Rt field, NOT the Rd field because we do not have a Rd field here. Consequently, RegDst must be set to 0 to place Rt onto the Register File’s Rw address port. Finally, in order to accomplish the register write, RegWr must be set to 1. +3 = 20 min. (X:60) Chapter 4: ISA

77 The Single Cycle Datapath during Or Immediate
4.13 Decoding & Control The Single Cycle Datapath during Or Immediate op rs rt immediate 16 21 26 31 R[rt]  R[rs] or ZeroExt(Imm16) 32 ALUctr = Or Clk busW RegWr = 1 busA busB 5 Rw Ra Rb 32 32-bit Registers Rs Rt Rd RegDst = 0 Extender Mux 16 imm16 ALUSrc = 1 ExtOp = 0 MemtoReg = 0 Data In WrEn Adr Data Memory MemWr = 0 ALU Instruction Fetch Unit Zero Instruction<31:0> 1 <21:25> <16:20> <11:15> <0:15> Imm16 nPC_sel= +4 Now let’s look at the control signals setting for the Or immediate instruction. The OR immediate instruction OR the content of the register specified by the Rs field to the Zero Extended Immediate field and write the result to the register specified in Rt. This is how it works in the datapath. The Rs field is fed to the Ra address port to cause the contents of register Rs to be placed on busA. The other operand for the ALU will come from the immediate field. In order to do this, the controller need to set ExtOp to 0 to instruct the extender to perform a Zero Extend operation. Furthermore, ALUSrc must set to 1 such that the MUX will block off bus B from the register file and send the zero extended version of the immediate field to the ALU. Of course, the ALUctr has to be set to OR so the ALU can perform an OR operation. The rest of the control signals (MemWr, MemtoReg, Branch, and Jump) are the same as theAdd and Subtract instructions. One big difference is the RegDst signal. In this case, the destination register is specified by the instruction’s Rt field, NOT the Rd field because we do not have a Rd field here. Consequently, RegDst must be set to 0 to place Rt onto the Register File’s Rw address port. Finally, in order to accomplish the register write, RegWr must be set to 1. +3 = 20 min. (X:60) Chapter 4: ISA

78 The Single Cycle Datapath during Store
4.13 Decoding & Control The Single Cycle Datapath during Store op rs rt immediate 16 21 26 31 Data Memory {R[rs] + SignExt[imm16]}  R[rt] 32 ALUctr = Clk busW RegWr = busA busB 5 Rw Ra Rb 32 32-bit Registers Rs Rt Rd RegDst = Extender Mux 16 imm16 ALUSrc = ExtOp = MemtoReg = Data In WrEn Adr Data Memory MemWr = ALU Instruction Fetch Unit Zero Instruction<31:0> 1 <21:25> <16:20> <11:15> <0:15> Imm16 nPC_sel = The store instruction performs the inverse function of the load. Instead of loading data from memory, the store instruction sends the contents of register specified by Rt to data memory. Similar to the load instruction, the store instruction needs to read the contents of register Rs (points to Ra port) and add it to the sign extended verion of the immediate filed (Imm16, ExtOp = 1, ALUSrc = 1) to form the data memory address (ALUctr = add). However unlike the Load instructoion where busB is not used, the store instruction will use busB to send the data to the Data memory. Consequently, the Rt field of the instruction has to be fed to the Rb port of the register file. In order to write the Data Memory properly, the MemWr signal has to be set to 1. Notice that the store instruction does not update the register file. Therefore, RegWr must be set to zero and consequently control signals RegDst and MemtoReg are don’t cares. And once again we need to set the control signals Branch and Jump to zero to ensure proper Program Counter updataing. Well, by now, you are probably tied of these boring stuff where Branch and Jump are zero so let’s look at something different--the bracnh instruction. +3 = 31 min. (Y:11) Chapter 4: ISA

79 The Single Cycle Datapath during Store
4.13 Decoding & Control The Single Cycle Datapath during Store op rs rt immediate 16 21 26 31 Data Memory {R[rs] + SignExt[imm16]}  R[rt] Instruction<31:0> 32 ALUctr = Add Clk busW RegWr = 0 busA busB 5 Rw Ra Rb 32 32-bit Registers Rs Rt Rd RegDst = x Extender Mux 16 imm16 ALUSrc = 1 ExtOp = 1 MemtoReg = x Data In WrEn Adr Data Memory MemWr = 1 ALU Instruction Fetch Unit Zero 1 <21:25> <16:20> <11:15> <0:15> Imm16 nPC_sel= +4 The store instruction performs the inverse function of the load. Instead of loading data from memory, the store instruction sends the contents of register specified by Rt to data memory. Similar to the load instruction, the store instruction needs to read the contents of register Rs (points to Ra port) and add it to the sign extended verion of the immediate filed (Imm16, ExtOp = 1, ALUSrc = 1) to form the data memory address (ALUctr = add). However unlike the Load instructoion where busB is not used, the store instruction will use busB to send the data to the Data memory. Consequently, the Rt field of the instruction has to be fed to the Rb port of the register file. In order to write the Data Memory properly, the MemWr signal has to be set to 1. Notice that the store instruction does not update the register file. Therefore, RegWr must be set to zero and consequently control signals RegDst and MemtoReg are don’t cares. And once again we need to set the control signals Branch and Jump to zero to ensure proper Program Counter updataing. +3 = 31 min. (Y:11) Chapter 4: ISA

80 The Single Cycle Datapath during Branch
4.13 Decoding & Control The Single Cycle Datapath during Branch op rs rt immediate 16 21 26 31 if (R[rs] – R[rt] == 0) then Zero  1; else Zero  0 32 ALUctr =Sub Clk busW RegWr = 0 busA busB 5 Rw Ra Rb 32 32-bit Registers Rs Rt Rd RegDst = x Extender Mux 16 imm16 ALUSrc = 0 ExtOp = x MemtoReg = x Data In WrEn Adr Data Memory MemWr = 0 ALU Instruction Fetch Unit Zero Instruction<31:0> 1 <21:25> <16:20> <11:15> <0:15> Imm16 nPC_sel= “Br” So how does the branch instruction work? As far as the main datapath is concerned, it needs to calculate the branch condition. That is, it subtracts the register specified in the Rt field from the register specified in the Rs field and set the condition Zero accordingly. In order to place the register values on busA and busB, we need to feed the Rs and Rt fields of the instruction to the Ra and Rb ports of the register file and set ALUSrc to 0. Then we have to instruction the ALU to perform the subtract (ALUctr = sub) operation and set the Zero bit accordingly. The Zero bit is sent to the Instruction Fetch Unit. I will show you the internal of the Instruction Fetch Unit in a second. But before we leave this slide, I want you to notice that ExtOp, MemtoReg, and RegDst are don’t cares but RegWr and MemWr have to be ZERO to prevent any write to occur. And finally, the controller needs to set the Branch signal to 1 so the Instruction Fetch Unit knows what to do. So now let’s take a look at the Instruction Fetch Unit. +2 = 33 min. (Y:13) Chapter 4: ISA

81 The instruction fetch unit at end of branch
4.13 Decoding & Control The instruction fetch unit at end of branch op rs rt immediate 16 21 26 31 if (Zero == 1) then PC = PC SignExt(imm16)4 ; else PC = PC + 4 Adr Inst Memory Adder PC Clk 00 Mux 4 nPC_sel imm16 Instruction<31:0> 1 Zero What is encoding of nPC_sel? Direct MUX select? Branch / not branch Let’s choose second option Let’s look at the case where the branch condition Zero is true (Zero = 1). If Zero is not asserted, we will have our boring case where PC + 1 is selected. With Branch = 1 and Zero = 1, the output of the second adder will be selected. That is, we will add the seqential address, that is output of the first adder, to the sign extended version of the immediate field, to form the branch target address (output of 2nd adder). With the control signal Jump set to zero, this branch target address will be written into the Program Counter register (PC) at the end of the clock cycle. +2 = 35 min. (Y:15) Chapter 4: ISA

82 Summary: A Single Cycle Processor
4.13 Decoding & Control Summary: A Single Cycle Processor 32 ALUctr Clk busW RegWr busA busB 5 Rw Ra Rb 32 32-bit Registers Rs Rt Rd RegDst Extender Mux 16 imm16 ALUSrc ExtOp MemtoReg Data In WrEn Adr Data Memory MemWr ALU Instruction Fetch Unit Zero Instruction<31:0> 1 <21:25> <16:20> <11:15> <0:15> Imm16 Main Control op 6 func 3 ALUop : Instr<5:0> Instr<31:26> Instr<15:0> nPC_sel OK, now that we have the Main Control implemented, we have everything we needed for the single cycle processor and here it is. The Instruction Fetch Unit gives us the instruction. The OP field is fed to the Main Control for decode and the Func field is fed to the ALU Control for local decoding. The Rt, Rs, Rd, and Imm16 fields of the instruction are fed to the data path. Bsed on the OP field of the instruction, the Main Control of will set the control signals RegDst, ALUSrc, .... etc properly as I showed you earlier using separate slides. Furthermore, the ALUctr is use the ALUop from the Main conrol and the func field of the instruction to generate the ALUctr signals to ask the ALU to do the right thing: Add, Subtract, Or, and so on. This processor will execute each of the MIPS instruction in the subset in one cycle. +2 = 72 min (Y:52) Chapter 4: ISA

83 Drawback of this Single Cycle Processor
4.13 Decoding & Control Drawback of this Single Cycle Processor Long cycle time: Cycle time must be long enough for the load instruction: PC’s Clock -to-Q + Instruction Memory Access Time + Register File Access Time + ALU Delay (address calculation) + Data Memory Access Time + Register File Setup Time + Clock Skew Cycle time for load is much longer than needed for all other instructions Well, the last slide pretty much illustrate one of the biggest disadvantage of the single cycle implementation: it has a long cycle time. More specifically, the cycle time must be long enough for the load instruction which has the following components: Clock to Q time of the PC, .... Having a long cycle time is a big problem but not the the only problem. Another problem of this single cycle implementation is that this cycle time, which is long enough for the load instruction, is too long for all other instructions. We will show you why this is bad and what we can do about it in the next few lectures. That’s all for today. +2 = 79 min (Y:59) Chapter 4: ISA

84 4.14 Real World Architectures
We will look at an Intel architecture, which is a CISC machine and MIPS, which is a RISC machine. CISC is an acronym for complex instruction set computer. RISC stands for reduced instruction set computer. Chapter 4: ISA

85 4.14 Real World Architectures
The classic Intel architecture, the 8086, was born in It is a CISC architecture. It was adopted by IBM for its famed PC, which was released in 1981. The 8086 operated on 16-bit data words and supported 20-bit memory addresses. Later, to lower costs, the 8-bit 8088 was introduced. Like the 8086, it used 20-bit memory addresses. What was the largest memory that the 8086 could address? Chapter 4: ISA

86 4.14 Real World Architectures
The 8086 had four 16-bit general-purpose registers that could be accessed by the half-word. It also had a flags register, an instruction register, and a stack accessed through the values in two other registers, the base pointer and the stack pointer. The 8086 had no built in floating-point processing. In 1980, Intel released the 8087 numeric coprocessor, but few users elected to install them because of their cost. Chapter 4: ISA

87 4.14 Real World Architectures
In 1985, Intel introduced the 32-bit It also had no built-in floating-point unit. The 80486, introduced in 1989, was an that had built-in floating-point processing and cache memory. The and offered downward compatibility with the 8086 and 8088. Software written for the smaller word systems was directed to use the lower 16 bits of the 32-bit registers. Chapter 4: ISA

88 4.14 Real World Architectures
Currently, Intel’s most advanced 32-bit microprocessor is the Pentium 4. It can run as fast as 3.8 GHz. This clock rate is nearly 800 times faster than the 4.77 MHz of the 8086. Speed enhancing features include multilevel cache and instruction pipelining. Intel, along with many others, is marrying many of the ideas of RISC architectures with microprocessors that are largely CISC. Chapter 4: ISA

89 4.14 Real World Architectures
The MIPS family of CPUs has been one of the most successful in its class. In 1986 the first MIPS CPU was announced. It had a 32-bit word size and could address 4GB of memory. Over the years, MIPS processors have been used in general purpose computers as well as in games. The MIPS architecture now offers 32- and 64-bit versions. Chapter 4: ISA

90 4.14 Real World Architectures
MIPS was one of the first RISC microprocessors. The original MIPS architecture had only 55 different instructions, as compared with the 8086 which had over 100. MIPS was designed with performance in mind: It is a load/store architecture, meaning that only the load and store instructions can access memory. The large number of registers in the MIPS architecture keeps bus traffic to a minimum. How does this design affect performance? Chapter 4: ISA

91 Chapter 4 Conclusion The major components of a computer system are its control unit, registers, memory, ALU, and data path. A built-in clock keeps everything synchronized. Control units can be microprogrammed or hardwired. Hardwired control units give better performance, while microprogrammed units are more adaptable to changes. Computers run programs through iterative fetch-decode-execute cycles. Computers can run programs that are in machine language. The Intel architecture is an example of a CISC architecture; MIPS is an example of a RISC architecture. Chapter 4: ISA


Download ppt "Chapter 4: Instruction Set Architectures"

Similar presentations


Ads by Google