Computer Architecture, Fall 2007 © October 3rd, 2007 Majd F. Sakr
CS-447 Computer Architecture, M,W 10-11:20am
Lecture 11: Single Cycle Datapath
Lecture Objectives
° Learn what a datapath is and how it provides the required functions.
° Appreciate why different implementation strategies affect the clock rate and CPI of a machine.
° Understand how the ISA determines many aspects of the hardware implementation.
Implementation vs. Performance
Performance of a processor is determined by:
° Instruction count of a program
° CPI
° Clock cycle time (clock rate)
The compiler & the ISA determine the instruction count.
The implementation of the processor determines the CPI and the clock cycle time.
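The three factors above multiply to give execution time. A minimal sketch, assuming the usual relation time = instruction count x CPI x clock cycle time; the function name and the sample numbers are hypothetical and chosen only for illustration:

```python
def cpu_time_ns(instruction_count, cpi, clock_cycle_ns):
    """Execution time = instruction count x CPI x clock cycle time."""
    return instruction_count * cpi * clock_cycle_ns


# Hypothetical machine: 1000 instructions, CPI of 1, 8 ns clock cycle.
print(cpu_time_ns(instruction_count=1000, cpi=1, clock_cycle_ns=8))  # 8000
```

The compiler and ISA influence only the first argument; the implementation influences the other two.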
Possible Execution Steps of Any Instruction
° Instruction Fetch
° Instruction Decode and Register Fetch
° Execution of the Memory Reference Instruction
° Execution of Arithmetic-Logical Operations
° Branch Instruction
° Jump Instruction
Instruction Processing
° Five steps:
    Instruction fetch (IF)
    Instruction decode and operand fetch (ID)
    ALU/execute (EX)
    Memory access, not required by every instruction (MEM)
    Write-back (WB)
Datapath & Control
Datapath Elements
The datapath contains 2 types of logic elements:
° Combinational (e.g., ALU): elements that operate on data values. Their outputs depend only on their current inputs.
° State (e.g., registers & memory): elements with internal storage. Their state is defined by the values they contain.
State Elements
Pentium Processor Die
° State: Registers, Memory
° Control ROM
° Combinational logic (Compute)
Abstract View of the Datapath
Single Cycle Implementation
° This simple processor can execute an ALU instruction, access memory, or compute the next instruction's address in a single cycle.
Program Counter
If each instruction needs 4 memory locations, then
    Next PC <= PC + 4
PC Datapath: Branch Offset
    PC <= PC + Branch Offset
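The two PC updates shown so far (sequential and branch) can be sketched as a single function. This is an illustration only: the function name, the 32-bit wrap-around, and the `take_branch` flag are assumptions, and the branch case follows the slide's simplified form PC + Branch Offset with the offset given in bytes.

```python
MASK32 = 0xFFFFFFFF  # assumption: 32-bit program counter


def next_pc(pc, take_branch=False, branch_offset=0):
    """Return the next PC: PC + 4 normally, PC + offset on a taken branch."""
    if take_branch:
        return (pc + branch_offset) & MASK32
    return (pc + 4) & MASK32


print(hex(next_pc(0x100)))             # 0x104
print(hex(next_pc(0x100, True, -16)))  # 0xf0
```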
Abstract View After PC Basic Implementation
The Register File
° Arithmetic & logical instructions (R-type) read the contents of 2 registers, perform an ALU operation, and write the result back to a register.
° Registers are stored in the register file. The register file has inputs to specify the registers, outputs for the data read, an input for the data to be written, and 1 control signal that decides whether the data is written. In addition, we need an ALU to perform the operations.
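The register file interface described above (two read ports, one write port, one write-enable control signal) can be modeled in a few lines. This is a behavioral sketch, not a hardware description; the class name, the 32-register size, the 32-bit width, and the `reg_write` signal name are assumptions.

```python
class RegisterFile:
    """Behavioral model: two read ports, one write port gated by a control signal."""

    def __init__(self, num_regs=32):  # assumption: 32 registers
        self.regs = [0] * num_regs

    def read(self, read_reg1, read_reg2):
        # Both read ports are combinational: they always return current contents.
        return self.regs[read_reg1], self.regs[read_reg2]

    def write(self, write_reg, write_data, reg_write):
        # The register is updated only when the control signal is asserted.
        if reg_write:
            self.regs[write_reg] = write_data & 0xFFFFFFFF  # assumption: 32-bit registers


rf = RegisterFile()
rf.write(3, 7, reg_write=False)  # ignored: control signal is low
rf.write(4, 9, reg_write=True)
print(rf.read(3, 4))  # (0, 9)
```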
The Register File
R-Type Instructions
° Assembly (e.g., register-register signed addition)
    ADD rd_reg rs_reg rt_reg
° Machine encoding
° Semantics
    if MEM[PC] == ADD rd rs rt
      GPR[rd] ← GPR[rs] + GPR[rt]
      PC ← PC + 4
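The ADD semantics above can be traced in software. The sketch below assumes the standard MIPS32 R-type field layout (opcode, rs, rt, rd, shamt, funct), which the slide's encoding figure presumably showed; the function names are hypothetical, and the result simply wraps to 32 bits instead of modeling an overflow trap.

```python
def decode_r_type(instr):
    """Split a 32-bit word into R-type fields (assumed MIPS32 layout)."""
    return {
        "opcode": (instr >> 26) & 0x3F,
        "rs": (instr >> 21) & 0x1F,
        "rt": (instr >> 16) & 0x1F,
        "rd": (instr >> 11) & 0x1F,
        "shamt": (instr >> 6) & 0x1F,
        "funct": instr & 0x3F,
    }


def execute_add(gpr, pc, instr):
    """GPR[rd] <- GPR[rs] + GPR[rt]; PC <- PC + 4. Returns the new PC."""
    f = decode_r_type(instr)
    gpr[f["rd"]] = (gpr[f["rs"]] + gpr[f["rt"]]) & 0xFFFFFFFF  # wraps, no trap
    return pc + 4


gpr = [0] * 32
gpr[1], gpr[2] = 5, 7
new_pc = execute_add(gpr, 0, 0x00221820)  # add $3, $1, $2
print(gpr[3], new_pc)  # 12 4
```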
ADD rd rs rt
Datapath for Add
I-Type ALU Instructions
° Assembly (e.g., register-immediate signed addition)
    ADDI rt_reg rs_reg immediate_16
° Machine encoding
° Semantics
    if MEM[PC] == ADDI rt rs immediate
      GPR[rt] ← GPR[rs] + sign-extend(immediate)
      PC ← PC + 4
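The sign-extension step in the ADDI semantics is easy to get wrong, so a small sketch may help. The helper names are hypothetical, and the sum wraps to 32 bits rather than modeling an overflow trap.

```python
def sign_extend16(imm16):
    """Interpret the low 16 bits as a two's-complement signed value."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def execute_addi(gpr, pc, rt, rs, imm16):
    """GPR[rt] <- GPR[rs] + sign-extend(immediate); PC <- PC + 4."""
    gpr[rt] = (gpr[rs] + sign_extend16(imm16)) & 0xFFFFFFFF  # wraps, no trap
    return pc + 4


gpr = [0] * 32
gpr[1] = 10
execute_addi(gpr, 0, rt=2, rs=1, imm16=0xFFFF)  # immediate encodes -1
print(gpr[2])  # 9
```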
ADDI rt_reg rs_reg immediate_16
Datapath for R and I-Type ALU Instructions
Data Memory
° The element needed to implement load and store instructions is the data memory. In addition, we use the existing ALU to compute the address to access.
° The data memory has 2 x-bit inputs (the address and the write data) and 1 x-bit output (the read data). In addition, it has 2 control lines: MemWrite and MemRead.
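The data memory interface above (address, write data, read data, MemRead, MemWrite) can be modeled behaviorally. In this sketch the class name, the dictionary-backed storage, and the choice of x = 32 bits are assumptions made for illustration.

```python
class DataMemory:
    """Behavioral model of a data memory with MemRead and MemWrite control lines."""

    def __init__(self):
        self.words = {}  # assumption: sparse storage keyed by address

    def access(self, address, write_data, mem_read, mem_write):
        if mem_write:  # store: write data goes to the addressed location
            self.words[address] = write_data & 0xFFFFFFFF
        if mem_read:   # load: read data comes from the addressed location
            return self.words.get(address, 0)
        return None    # neither control line asserted: output is unused


dmem = DataMemory()
dmem.access(0x1000, 42, mem_read=False, mem_write=True)
print(dmem.access(0x1000, 0, mem_read=True, mem_write=False))  # 42
```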
Data Memory
Load Instruction
° Assembly (e.g., load 4-byte word)
    LW rt_reg offset_16(base_reg)
° Machine encoding
° Semantics
    if MEM[PC] == LW rt offset16(base)
      EA = sign-extend(offset) + GPR[base]
      GPR[rt] ← MEM[ translate(EA) ]
      PC ← PC + 4
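The load semantics above combine sign extension, an ALU addition, and a memory read. A minimal sketch, assuming memory is a dictionary of words and that `translate` is the identity (no virtual memory); all function names are hypothetical.

```python
def sign_extend16(imm16):
    """Interpret the low 16 bits as a two's-complement signed value."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def translate(effective_address):
    # Assumption: no address translation, physical address == effective address.
    return effective_address


def execute_lw(gpr, mem, pc, rt, base, offset16):
    """EA = sign-extend(offset) + GPR[base]; GPR[rt] <- MEM[translate(EA)]."""
    ea = (sign_extend16(offset16) + gpr[base]) & 0xFFFFFFFF
    gpr[rt] = mem[translate(ea)]
    return pc + 4


gpr = [0] * 32
gpr[4] = 0x1000
mem = {0x0FFC: 42}
execute_lw(gpr, mem, 0, rt=5, base=4, offset16=0xFFFC)  # offset encodes -4
print(gpr[5])  # 42
```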
LW Datapath
Branch Equal
° The beq (branch if equal) instruction has 3 operands: two registers that are compared for equality and an n-bit offset used to compute the branch address relative to the PC.
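The beq behavior can be sketched as a compare followed by a PC selection. The slide leaves the offset width and scaling open; this sketch assumes the MIPS32 convention, where n = 16 and the offset counts words relative to PC + 4. The function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(imm16):
    """Interpret the low 16 bits as a two's-complement signed value."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def execute_beq(gpr, pc, rs, rt, offset16):
    """Return the next PC for beq rs, rt, offset (assumed MIPS32 convention)."""
    pc_plus_4 = (pc + 4) & MASK32
    if gpr[rs] == gpr[rt]:  # the ALU subtracts and checks for zero in hardware
        return (pc_plus_4 + (sign_extend16(offset16) << 2)) & MASK32
    return pc_plus_4


gpr = [0] * 32
print(hex(execute_beq(gpr, 0x100, 1, 2, 3)))  # 0x110 (taken: both registers are 0)
gpr[1] = 1
print(hex(execute_beq(gpr, 0x100, 1, 2, 3)))  # 0x104 (not taken)
```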
Branch Equal
Unconditional Jump
° Assembly
    J immediate_26
° Machine encoding
° Semantics
    if MEM[PC] == J immediate26
      target = { PC[31:28], immediate26, 2'b00 }
      PC ← target
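The bit concatenation in the jump semantics translates directly into masks and shifts. A short sketch that follows the slide's formula literally (upper 4 bits taken from the PC); the function name is hypothetical.

```python
def jump_target(pc, imm26):
    """target = { PC[31:28], immediate26, 2'b00 }."""
    upper = pc & 0xF0000000             # PC[31:28], kept in place
    middle = (imm26 & 0x03FFFFFF) << 2  # 26-bit immediate followed by two zero bits
    return upper | middle


print(hex(jump_target(0x10000000, 0x100)))  # 0x10000400
```

Because only 28 bits come from the instruction, a jump cannot leave the 256 MB region selected by the upper four PC bits.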
Unconditional Jump Datapath
Combining ALU and Memory Instructions
° The ALU datapath and the memory datapath are similar. The differences are:
    The second input to the ALU is a register (R-type) or the sign-extended offset (I-type).
    The value stored into the destination register comes from the ALU (R-type) or from memory (I-type load).
° Using 2 multiplexers (Mux), we can combine both datapaths.
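The two multiplexers can be sketched as follows. The select-signal names `alu_src` and `mem_to_reg` are assumptions borrowed from common textbook naming, not taken from the slides, and the ALU is fixed to addition to keep the example short.

```python
def mux2(select, in0, in1):
    """2-to-1 multiplexer: pass in1 when select is asserted, otherwise in0."""
    return in1 if select else in0


def combined_datapath(read_data1, read_data2, imm, memory, alu_src, mem_to_reg):
    """Return the value that would be written to the destination register."""
    # Mux 1: second ALU input is a register (R-type) or the offset (I-type).
    alu_b = mux2(alu_src, read_data2, imm)
    alu_result = (read_data1 + alu_b) & 0xFFFFFFFF
    # Mux 2: write-back value comes from the ALU or from data memory.
    mem_data = memory.get(alu_result, 0) if mem_to_reg else 0
    return mux2(mem_to_reg, alu_result, mem_data)


print(combined_datapath(5, 7, 100, {}, alu_src=0, mem_to_reg=0))               # 12
print(combined_datapath(0x1000, 7, 4, {0x1004: 99}, alu_src=1, mem_to_reg=1))  # 99
```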
Combining ALU and Memory Instructions
The Complete Datapath
Complete Datapath
What's Wrong with Single Cycle?
° All instructions run at the speed of the slowest instruction.
° Adding a long instruction can hurt performance.
    What if you wanted to include multiply?
° You cannot reuse any parts of the processor.
    We have 3 different adders: one to calculate PC+4, one for PC+4+offset, and the ALU.
° No profit in making the common case fast,
    since every instruction runs at the slowest instruction's speed.
    - This is particularly important for loads, as we will see later.
What's Wrong with Single Cycle?
Component delays:
    1 ns: Register read/write time
    2 ns: ALU/adder
    2 ns: Memory access
    0 ns: MUX, PC access, sign extend, ROM

          Get      Read    ALU                  Write
          instr    reg     operation    Mem     reg       Total
    add:  2 ns  +  1 ns  + 2 ns               + 1 ns   =  6 ns
    beq:  2 ns  +  1 ns  + 2 ns                        =  5 ns
    sw:   2 ns  +  1 ns  + 2 ns      +  2 ns           =  7 ns
    lw:   2 ns  +  1 ns  + 2 ns      +  2 ns  + 1 ns   =  8 ns
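The per-instruction latencies on this slide follow from summing the delays of the units each instruction uses. The sketch below reproduces that arithmetic with the slide's numbers; the dictionary and function names are hypothetical.

```python
# Component delays in ns, taken from the slide.
DELAY_NS = {"get_instr": 2, "read_reg": 1, "alu": 2, "mem": 2, "write_reg": 1}

# Units on each instruction's critical path, in order.
UNITS_USED = {
    "add": ["get_instr", "read_reg", "alu", "write_reg"],
    "beq": ["get_instr", "read_reg", "alu"],
    "sw": ["get_instr", "read_reg", "alu", "mem"],
    "lw": ["get_instr", "read_reg", "alu", "mem", "write_reg"],
}


def instruction_latency_ns(op):
    """Sum the delays of every unit the instruction passes through."""
    return sum(DELAY_NS[unit] for unit in UNITS_USED[op])


latency_ns = {op: instruction_latency_ns(op) for op in UNITS_USED}
clock_cycle_ns = max(latency_ns.values())  # single cycle: set by the slowest instruction

print(latency_ns)      # {'add': 6, 'beq': 5, 'sw': 7, 'lw': 8}
print(clock_cycle_ns)  # 8
```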
Computing Execution Time
Assume:
    100 instructions executed
    25% of instructions are loads,
    10% of instructions are stores,
    45% of instructions are adds, and
    20% of instructions are branches.
Single-cycle execution: 100 * 8 ns = 800 ns
Optimal execution: 25*8 ns + 10*7 ns + 45*6 ns + 20*5 ns = 640 ns
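The comparison above can be checked with a few lines of arithmetic. The instruction mix and latencies are the slide's; the variable names are hypothetical, and "optimal" here means each instruction takes only as long as it needs.

```python
latency_ns = {"lw": 8, "sw": 7, "add": 6, "beq": 5}  # per-instruction latency
mix = {"lw": 25, "sw": 10, "add": 45, "beq": 20}     # counts out of 100 instructions

# Single cycle: every instruction takes one cycle of the slowest latency.
single_cycle_ns = sum(mix.values()) * max(latency_ns.values())

# Optimal: each instruction is charged only its own latency.
optimal_ns = sum(count * latency_ns[op] for op, count in mix.items())

print(single_cycle_ns)               # 800
print(optimal_ns)                    # 640
print(single_cycle_ns / optimal_ns)  # 1.25
```

With this mix, the single-cycle design is 1.25 times slower than the variable-latency ideal.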