Computer Architecture: A Constructive Approach One-Cycle Implementation Joel Emer Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology February 29, 2012L5-1
The MIPS ISA Processor State bit GPRs, R0 always contains a 0 PC, the program counter some other special registers Data types 8-bit byte, 16-bit half word 32-bit word for integers Load/Store style instruction set data addressing modes- immediate & indexed branch addressing modes- PC relative & register indirect All instructions are 32 bits Byte addressable memory- big endian mode February 29, 2012 L4-2http://csg.csail.mit.edu/6.s078 r0 - r31 PC Registers Floating point, memory management and other systems instructions are not includes in the SMIPS subset
Instruction formats Only three formats but the fields are used differently by different types of instructions opcode rsrt immediate I-type opcode rsrt rd shamt func R-type 6 26 opcode target J-type February 29, 2012 L4-3http://csg.csail.mit.edu/6.s078
Instruction formats cont Computational Instructions Load/Store Instructions opcode rsrt immediate rt (rs) op immediate rsrt rd 0 func rd (rs) func (rt) rs is the base register rt is the destination of a Load or the source for a Store addressing mode opcode rsrt displacement (rs) + displacement February 29, 2012 L4-4http://csg.csail.mit.edu/6.s078
Control Instructions Conditional (on GPR) PC-relative branch target address = (offset in words)4 + (PC+4) range: 128 KB range Unconditional register-indirect jumps Unconditional absolute jumps target address = {PC, target4} range : 256 MB range opcode rs offset BEQZ, BNEZ 6 26 opcode target J, JAL opcode rs JR, JALR jump-&-link stores PC+4 into the link register (R31) February 29, 2012 L4-5http://csg.csail.mit.edu/6.s078
Instruction Execution Execution of an instruction involves 1. Instruction fetch 2. Decode 3. Register fetch 4. ALU operation 5. Memory operation (optional) 6. Write back and the computation of the next instruction address February 29, 2012 L4-6http://csg.csail.mit.edu/6.s078
Implementing an ISA Instructions fetch requires an Instruction memory, PC Decode requires understanding the instruction format Register Fetch requires interaction with a register file with a specific number of read/write ports ALU must have the ability to carry out the specified ops Memory operations requires a data memory Write-back requires interaction with the register file Update the PC requires arithmetic ops to calculate pc and condition February 29, 2012 L4-7http://csg.csail.mit.edu/6.s078
A single-cycle implementation A single-cycle MIPS implementation requires: A register file with 2 read ports and a write port An instruction memory, separate from data memory so that we can fetch an instruction as well as perform a data operation (Load/store) on the memory fetch & execute pc iMemdMem rf CPU February 29, 2012 L4-8http://csg.csail.mit.edu/6.s078
Single-Cycle SMIPS PC Inst Memory Decode Register File Execute Data Memory +4 February 29, 2012 L4-9http://csg.csail.mit.edu/6.s078 Datapath is shown only for convenience; it will be derived automatically from the high- level textual description
Single-Cycle SMIPS code structure module mkProc(Proc); // State Reg#(Addr) pc <- mkRegU; RFile rf <- mkRFile; Memory imem <- mkMemory; Memory dmem <- mkMemory; … February 29, 2012 L4-10http://csg.csail.mit.edu/6.s078
Single-Cycle SMIPS code structure … rule fetchExecute; //fetch let instResp = imem(MemReq{…}); //decode let decInst = decode(instResp); //register read let rVal1 = rf[…]; let rVal2 = … //execute let execInst = exec(decInst, pc, rVal1, rVal2); //memory (optional) if (instType Ld || St) data = dmem(…) //writeback if(instType Alu || Ld) rf.wr(rDst,data); pc <= execInst.brTaken ? execInst.addr : pc + 4; endrule endmodule; February 29, 2012 L4-11http://csg.csail.mit.edu/6.s078
Decoding dInst is all the information we need to do the later phases of execution on the instruction Not all fields of dInst are defined in each case We may decide to pass more information from decode to execute for efficiency– in that case the definition of the type of the decoded instruction has to be adjusted accordingly February 29, 2012 L5-12http://csg.csail.mit.edu/6.s078
Instruction grouping Many instructions have the same implementation except for the ALU function they invoke We can group such instructions and reduce the amount of code we have to write Example: R-Type ALU, I-Type ALU, Br Type, Memory Type, … February 29, 2012 L5-13http://csg.csail.mit.edu/6.s078
immValid Decoding Instructions: extract fields needed for execution from each instruction instruction branchComp rDst rSrc1 rSrc2 imm ext aluFunc instType 31:26 5:0 31:26 20:16 15:11 25:21 20:16 15:0 25:0 Lot of pure combinational logic: will be derived automatically from the high-level description decode February 29, 2012 L5-14http://csg.csail.mit.edu/6.s078
decode Decoding Instructions: input-output types instruction brComp rDst rSrc1 rSrc2 imm ext aluFunc iType 31:26, 5:0 31:26 5:0 31:26 20:16 15:11 25:21 20:16 15:0 25:0 Type DecodedInst Bit#(32) IType AluFunc Rindex BrType Bit#(32) Mux control logic not shown immValid Bool February 29, 2012 L7-15http://csg.csail.mit.edu/6.s078
Typedefs typedef enum {Alu, Ld, St, J, Jr, Jal, Jalr, Br} IType deriving(Bits, Eq); typedef enum {Eq, Neq, Le, Lt, Ge, Gt} BrType deriving(Bits, Eq); typedef enum {Add, Sub, And, Or, Xor, Nor, Slt, Sltu, LShift, RShift, Sra} AluFunc deriving(Bits, Eq); typedef enum {RAlu, IAlu, LdSt, J, Jr, Br} InstDType deriving(Bits, Eq); February 29, 2012 L7-16http://csg.csail.mit.edu/6.s078
Decode Function function DecodedInst decode(Bit#(32) inst); DecodedInst dInst = ?; let opcode = inst[ 31 : 26 ]; let rs = inst[ 25 : 21 ]; let rt = inst[ 20 : 16 ]; let rd = inst[ 15 : 11 ]; let funct = inst[ 5 : 0 ]; let imm = inst[ 15 : 0 ]; let target = inst[ 25 : 0 ]; case (getInstDType(inst))... endcase return dInst; endfunction February 29, 2012 L7-17http://csg.csail.mit.edu/6.s078 initially undefined
Instruction Encoding Bit#(6) opADDIU = 6’b001001; Bit#(6) opSLTI = 6’b001010; Bit#(6) opLW = 6’b100011; Bit#(6) opSW = 6’b101011; Bit#(6) opJ = 6’b000010; Bit#(6) opBEQ = 6’b000100; … Bit#(6) opFUNC = 6’b000000; Bit#(6) fcADDU = 6’b100001; Bit#(6) fcAND = 6’b100100; Bit#(6) fcJR = 6’b001000; … Bit#(6) opRT = 6’b000001; Bit#(6) rtBLTZ = 5’b00000; Bit#(6) rtBGEZ = 5’b00100; February 29, 2012 L7-18http://csg.csail.mit.edu/6.s078
Instruction Grouping function InstDType getInstDType(Bit#(32) inst) return case (inst[31:26]) opADDIU, opSLTI, …: IAlu; opLW, opSW: LdSt; opJ, opJAL: J; opBEQ, opRT, …: Br; opFUNC: case (inst[5:0]) fcADDU, fcAND, …: RAlu; fcJR, fcJALR: Jr; endcase; endfunction February 29, 2012 L7-19http://csg.csail.mit.edu/6.s078
Decoding Instructions: R-Type ALU case (getInstDType(opcode)) … RAlu: begin dInst.iType = Alu; dInst.aluFunc = case (funct) fcADDU: Add; fcSUBU: Sub;... endcase; dInst.rDst = rd; dInst.rSrc1 = rs; dInst.rSrc2 = rt; dInst.immValid = False; end February 29, 2012 L7-20http://csg.csail.mit.edu/6.s078
Decoding Instructions: I-Type ALU case (getInstDType(opcode)) … IAlu: begin dInst.iType = Alu; dInst.aluFunc = case (opcode) opADDIU: Add;... endcase; dInst.rDst = rt; dInst.rSrc1 = rs; dInst.imm = signedIAlu(opcode) ? signExtend(imm): zeroExtend(imm); dInst.immValid = True; end February 29, 2012 L7-21http://csg.csail.mit.edu/6.s078
Decoding Instructions: Load & Store case (getInstDType(opcode)) … LdSt: begin dInst.iType = opcode==opLW ? Ld : St; dInst.aluFunc = Add; dInst.rDst = rt; dInst.rSrc1 = rs; dInst.rSrc2 = rt; dInst.imm = signExtned(imm); dInst.immValid = True; end February 29, 2012 L7-22http://csg.csail.mit.edu/6.s078
Decoding Instructions: Jump case (getInstDType(opcode)) … J: begin dInst.iType = opcode==opJ ? J : Jal; dInst.rDst = 31; dInst.imm = zeroExtend({target, 2’b00}); dInst.immValid = False; end Jr: begin dInst.iType = funct==fcJR ? Jr : Jalr; dInst.rDst = rd; dInst.rSrc1 = rs; dInst.immValid = False; end February 29, 2012 L7-23http://csg.csail.mit.edu/6.s078
Decoding Instructions: Branch case (getInstDType(opcode)) … Br: begin dInst.iType = Br; dInst.brComp = case (opcode) opBEQ : EQ; opBLEZ: LE;... endcase; dInst.rSrc1 = rs; dInst.rSrc2 = rt; dInst.imm = signExtend({imm, 2’b00}); dInst.immValid = False; end February 29, 2012 L7-24http://csg.csail.mit.edu/6.s078
Reading Registers Read registers RVal2 RVal1 RF RSrc2 Pure combinational logic February 29, 2012 L7-25http://csg.csail.mit.edu/6.s078 RSrc1
Executing Instructions execute dInst addr brTaken data iType rDst ALU Branch Address pc rVal2 rVal1 Pure combinational logic either for memory reference or branch target either for rf write or St February 29, 2012 L7-26http://csg.csail.mit.edu/6.s078 ALUBr
Some Useful Functions function Bool memType (IType i) return (i==Ld || i == St); endfunction function Bool regWriteType (IType i) return (i==Alu || i==Ld || i==Jal || i==Jalr); endfunction function Bool controlType (IType i) return (i==J || i==Jr || i==Jal || i==Jalr || i==Br); endfunction February 29, 2012 L7-27http://csg.csail.mit.edu/6.s078
Execute Function function ExecInst exec(DecodedInst dInst, Data rVal1, Data rVal2, Addr pc); ExecInst einst = ?; Data aluVal2 = (dInst.immValid)? dInst.imm : rVal2 let aluRes = alu(rVal1, aluVal2, dInst.aluFunc); let brAddr = brAddrCal(pc, rVal1, dInst.iType, dInst.imm); einst.itype = dInst.iType; einst.addr = (memType(dInst.iType)? aluRes : brAddr; einst.data = dInst.iType==St ? rVal2 : aluRes; einst.brTaken = aluBr(rVal1, aluVal2, dInst.brComp); einst.rDst = dInst.rDst; return einst; endfunction February 29, 2012 L7-28http://csg.csail.mit.edu/6.s078
ALU function Data alu(Data a, Data b, Func func); Data res = case(func) Add : add(a, b); Sub : subtract(a, b); And : (a & b); Or : (a | b); Xor : (a ^ b); Nor : ~(a | b); Slt : setLessThan(a, b); Sltu : setLessThanUnsigned(a, b); LShift: logicalShiftLeft(a, b[4:0]); RShift: logicalShiftRight(a, b[4:0]); Sra : signedShiftRight(a, b[4:0]); endcase; return res; endfunction February 29, 2012 L7-29http://csg.csail.mit.edu/6.s078
Branch Resolution function Bool aluBr(Data a, Data b, BrType brComp); Bool brTaken = case(brComp) Eq : (a == b); Neq : (a != b); Le : signedLE(a, 0); Lt : signedLT(a, 0); Ge : signedGE(a, 0); Gt : signedGT(a, 0); endcase; return brTaken; endfunction February 29, 2012 L7-30http://csg.csail.mit.edu/6.s078
Branch Address Calculation function Addr brAddrCalc(Address pc, Data val, IType iType, Data imm); let targetAddr = case (iType) J, Jal : {pc[31:28], imm[27:0]}; Jr, Jalr : val; default : pc + imm; endcase; return targetAddr; endfunction February 29, 2012 L7-31http://csg.csail.mit.edu/6.s078
Single-Cycle SMIPS module mkProc(Proc); Reg#(Addr) pc <- mkRegU; RFile rf <- mkRFile; Memory mem <- mkTwoPortedMemory; let iMem = mem.iport ; let dMem = mem.dport; rule doProc; let inst = iMem(MemReq{op: Ld, addr: pc, data: ?}); let dInst = decode(inst); Data rVal1 = rf.rd1(dInst.rSrc1); Data rVal2 = rf.rd2(dInst.rSrc2); February 29, 2012 L7-32http://csg.csail.mit.edu/6.s078
Single-Cycle SMIPS cont let eInst = exec(dInst, rVal1, rVal2, pc); if(memType(eInst.iType)) eInst.data <- dMem(MemReq{ op: eInst.iType==Ld ? Ld : St, addr: eInst.addr, data: eInst.data}); if(regWriteType(eInst.iType)) rf.wr(eInst.rDst, eInst.data); pc <= eInst.brTaken ? eInst.addr : pc + 4; endrule endmodule February 29, 2012 L7-33http://csg.csail.mit.edu/6.s078 state updates
Single-Cycle SMIPS PC Inst Memory Decode Register File Execute Data Memory +4 The whole system was described using one rule; lots of big combinational functions performance? February 29, 2012 L5-34http://csg.csail.mit.edu/6.s078
Harvard-Style Datapath for MIPS February 29, x4 RegWrite Add clk WBSrcMemWrite addr wdata rdata Data Memory we RegDst BSrc ExtSelOpCode z OpSel clk zero? clk addr inst Inst. Memory PC rd1 GPRs rs1 rs2 ws wd rd2 we Imm Ext ALU Control 31 PCSrc br rind jabs pc+4 old way
Hardwired Control Table OpcodeExtSelBSrcOpSelMemWRegWWBSrcRegDstPCSrc ALU ALUi ALUiu LW SW BEQZ z=0 BEQZ z=1 J JAL JR JALR BSrc = Reg / ImmWBSrc = ALU / Mem / PC RegDst = rt / rd / R31PCSrc = pc+4 / br / rind / jabs ***no yesrindPC R31 rind***no ** jabs ** * no yesPCR31 jabs * * * no ** pc+4sExt 16 *0?no ** brsExt 16 *0?no ** pc+4sExt 16 Imm+yesno** pc+4ImmOpnoyesALUrt pc+4*RegFuncnoyesALUrd sExt 16 ImmOppc+4noyesALUrt pc+4sExt 16 Imm+noyesMemrt uExt 16 February 29, 2012 old way