Multi-cycle SMIPS Implementations

Computer Architecture: A Constructive Approach
Joel Emer
Computer Science & Artificial Intelligence Lab.
Massachusetts Institute of Technology
March 5, 2012

Harvard-Style Datapath for MIPS
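old way

[Figure: Harvard-style datapath. Components: PC, Inst. Memory (addr, inst), GPRs (rs1, rs2, ws, wd, we, rd1, rd2), Imm Ext, ALU, ALU Control, Data Memory (addr, wdata, rdata, we), and two adders (pc+4 and branch target). Control signals: PCSrc (pc+4 / br / rind / jabs), RegWrite, WBSrc, MemWrite, RegDst, BSrc, ExtSel, OpSel. The ALU's zero? output z and the OpCode feed the control logic.]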

Hardwired Control Table
old way

    Opcode     ExtSel  BSrc  OpSel  MemW  RegW  WBSrc  RegDst  PCSrc
    ALU        *       Reg   Func   no    yes   ALU    rd      pc+4
    ALUi       sExt16  Imm   Op     no    yes   ALU    rt      pc+4
    ALUiu      uExt16  Imm   Op     no    yes   ALU    rt      pc+4
    LW         sExt16  Imm   +      no    yes   Mem    rt      pc+4
    SW         sExt16  Imm   +      yes   no    *      *       pc+4
    BEQZ z=0   sExt16  *     0?     no    no    *      *       br
    BEQZ z=1   sExt16  *     0?     no    no    *      *       pc+4
    J          *       *     *      no    no    *      *       jabs
    JAL        *       *     *      no    yes   PC     R31     jabs
    JR         *       *     *      no    no    *      *       rind
    JALR       *       *     *      no    yes   PC     R31     rind

    BSrc   = Reg / Imm
    WBSrc  = ALU / Mem / PC
    RegDst = rt / rd / R31
    PCSrc  = pc+4 / br / rind / jabs

Single-Cycle SMIPS
new way

[Figure: PC (+4) -> Inst Memory -> Decode -> Register File -> Execute -> Data Memory]

The register file has 2 read & 1 write ports.
Separate Instruction & Data memories.
Datapath and control were derived automatically from a high-level rule-based description.

Single-Cycle SMIPS code structure
    module mkProc(Proc);
        Reg#(Addr) pc  <- mkRegU;
        RFile      rf  <- mkRFile;
        Memory     mem <- mkTwoPortedMemory;
        let iMem = mem.iport;
        let dMem = mem.dport;

        rule doProc;
            let inst  = iMem(MemReq{op:Ld, addr:pc, data:?});
            let dInst = decode(inst);
            Data rVal1 = rf.rd1(dInst.rSrc1);
            Data rVal2 = rf.rd2(dInst.rSrc2);
            let eInst = exec(dInst, rVal1, rVal2, pc);
            // update rf, pc and dMem (see "Single-Cycle SMIPS atomic state updates")

Decoding Instructions: input-output types
[Figure: decode takes instruction : Bit#(32) and produces a value of type DecodedInst. Mux control logic is not shown.]

    field     type      instruction bits
    iType     IType     31:26, 5:0
    aluFunc   AluFunc   31:26, 5:0
    brComp    BrType    31:26
    rDst      Rindex    20:16 or 15:11
    rSrc1     Rindex    25:21
    rSrc2     Rindex    20:16
    imm       Bit#(32)  15:0 or 25:0 (via ext)
    immValid  Bool
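The fields listed for DecodedInst can be written down as a Bluespec struct. This is only a sketch: the IType members are the ones used elsewhere in this lecture, but the width of Rindex and the members of AluFunc and BrType do not appear on the slides and are assumptions made purely for illustration.

```bsv
// Sketch of the decode output type. Names marked "assumed" are not from the slides.
typedef Bit#(5) Rindex;   // assumed: 5-bit index for 32 registers

// Members taken from memType / regWriteType / controlType in this lecture;
// their order here is arbitrary.
typedef enum {Alu, Ld, St, J, Jr, Jal, Jalr, Br} IType deriving (Bits, Eq);

// Assumed, hypothetical members; the real lists are not shown on the slides.
typedef enum {Add, Sub, And, Or, Xor, Slt} AluFunc deriving (Bits, Eq);
typedef enum {Equal, NotEqual, LessThan, GreaterEq} BrType deriving (Bits, Eq);

typedef struct {
    IType    iType;     // instruction class
    AluFunc  aluFunc;   // operation for the ALU
    BrType   brComp;    // comparison for the branch unit
    Rindex   rDst;      // destination register
    Rindex   rSrc1;     // first source register
    Rindex   rSrc2;     // second source register
    Bit#(32) imm;       // extended immediate
    Bool     immValid;  // True when imm replaces rVal2 as the ALU operand
} DecodedInst deriving (Bits, Eq);
```

Deriving Bits lets a DecodedInst be stored in a register or FIFO, which becomes relevant once decode and execute are separated by pipeline state.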

Reading Registers

[Figure: the register file RF is indexed by rSrc1 and rSrc2 and returns rVal1 and rVal2.]

Pure combinational logic

Executing Instructions
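[Figure: execute takes dInst, rVal1, rVal2 and pc. iType and rDst pass through from dInst. The ALU produces data (either for rf write or St) and addr (either for memory reference or branch target). ALUBr produces brTaken, and Branch Address computes the branch target from pc.]

ALU should be split…
Pure combinational logic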

Branch Address Calculation
    function Addr brAddrCalc(Addr pc, Data val, IType iType, Data imm);
        let targetAddr = case (iType)
            J, Jal   : {pc[31:28], imm[27:0]};  // absolute jump within the current region
            Jr, Jalr : val;                     // register-indirect jump
            default  : pc + imm;                // pc-relative branch
        endcase;
        return targetAddr;
    endfunction
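As a quick check of the three cases, the function can be exercised from a small testbench. The sketch below repeats the function so that it is self-contained; the typedefs for Addr and Data, the order of the IType members, the module name mkBrAddrCalcTest and the test values are all assumptions chosen for illustration.

```bsv
typedef Bit#(32) Addr;   // assumed width
typedef Bit#(32) Data;   // assumed width
typedef enum {Alu, Ld, St, J, Jr, Jal, Jalr, Br} IType deriving (Bits, Eq);

function Addr brAddrCalc(Addr pc, Data val, IType iType, Data imm);
    Addr targetAddr = case (iType)
        J, Jal   : {pc[31:28], imm[27:0]};
        Jr, Jalr : val;
        default  : (pc + imm);
    endcase;
    return targetAddr;
endfunction

// Hypothetical testbench: evaluates one example per case and stops.
module mkBrAddrCalcTest(Empty);
    rule check;
        Addr pc = 32'h1000_0040;
        // J: top 4 bits of pc, low 28 bits of imm; target is 32'h1000_0100
        $display("J  target = %h", brAddrCalc(pc, 0, J, 32'h0000_0100));
        // Jr: target is the register value, 32'h0000_2000
        $display("Jr target = %h", brAddrCalc(pc, 32'h0000_2000, Jr, 0));
        // Br: pc + imm; target is 32'h1000_0050
        $display("Br target = %h", brAddrCalc(pc, 0, Br, 32'h0000_0010));
        $finish;
    endrule
endmodule
```

Note that brAddrCalc always returns a target; whether the branch is actually taken is decided separately by brTaken.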

Some Useful Functions

    function Bool memType(IType i);
        return (i == Ld || i == St);
    endfunction

    function Bool regWriteType(IType i);
        return (i == Alu || i == Ld || i == Jal || i == Jalr);
    endfunction

    function Bool controlType(IType i);
        return (i == J || i == Jr || i == Jal || i == Jalr || i == Br);
    endfunction

Execute Function

    function ExecInst exec(DecodedInst dInst, Data rVal1, Data rVal2, Addr pc);
        ExecInst einst = ?;
        // second ALU operand: the immediate when valid, otherwise the register value
        Data aluVal2 = dInst.immValid ? dInst.imm : rVal2;
        let aluRes = alu(rVal1, aluVal2, dInst.aluFunc);
        let brAddr = brAddrCalc(pc, rVal1, dInst.iType, dInst.imm);
        einst.iType   = dInst.iType;
        einst.addr    = memType(dInst.iType) ? aluRes : brAddr;
        einst.data    = dInst.iType == St ? rVal2 : aluRes;
        einst.brTaken = aluBr(rVal1, aluVal2, dInst.brComp);
        einst.rDst    = dInst.rDst;
        return einst;
    endfunction

Single-Cycle SMIPS atomic state updates
            if (memType(eInst.iType))
                eInst.data <- dMem(MemReq{op: eInst.iType == Ld ? Ld : St,
                                          addr: eInst.addr, data: eInst.data});
            if (regWriteType(eInst.iType))
                rf.wr(eInst.rDst, eInst.data);
            pc <= eInst.brTaken ? eInst.addr : pc + 4;
        endrule
    endmodule

Single-Cycle SMIPS: Clock Speed
[Figure: PC (+4) -> Inst Memory -> Decode -> Register File -> Execute -> Data Memory]

    t_Clock > t_M + t_DEC + t_RF + t_ALU + t_M + t_WB

We can improve the clock speed if we execute each instruction in two clock cycles:

    t_Clock > max{t_M, (t_DEC + t_RF + t_ALU + t_M + t_WB)}
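It may help to compare the time per instruction under the two bounds. The short derivation below is a sketch that assumes the clock is set right at each lower bound and that every instruction takes exactly two cycles in the two-cycle design; the symbols a, b, T_1 and T_2 are introduced here only for the comparison.

```latex
\begin{align*}
a &= t_M, \qquad b = t_{DEC} + t_{RF} + t_{ALU} + t_M + t_{WB} \\
T_1 &= a + b && \text{(single-cycle: one cycle per instruction)} \\
T_2 &= 2 \cdot \max\{a,\, b\} && \text{(two-cycle: two cycles per instruction)} \\
\max\{a,\, b\} &\ge \tfrac{1}{2}(a + b) \;\Longrightarrow\; T_2 \ge T_1
\end{align*}
```

Under these assumptions the faster clock alone does not shorten the time per instruction; the benefit appears only when the two stages work on different instructions at the same time.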

Two-Cycle SMIPS

[Figure: PC (+4) -> Inst Memory -> ir -> Decode -> Register File -> Execute -> Data Memory, with a stage register selecting between fetch and execute.]

Introduce register “ir” to hold a fetched instruction and register “stage” to remember which stage (fetch/execute) we are in.

ir: The instruction register
You may recall from our earlier discussion of pipelining that when we take multiple cycles to perform some operation (e.g., IFFT), the intermediate registers may not contain any meaningful data in some cycles.
It is straightforward to convert ir into a pipeline register: we can associate a Valid/Invalid bit with ir.
Equivalently, we can think of ir as a single-element FIFO.
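One way to picture the Valid/Invalid idea is to wrap the contents of ir in Bluespec's Maybe type, so that the valid bit itself says which rule may fire. The following is a minimal sketch, not the lecture's implementation: fetchStub, mkIrValidSketch, the reset value of pc and the stopping condition are hypothetical, and the struct repeats the lecture's TypeFetch2Decode so that the block is self-contained.

```bsv
typedef Bit#(32) Addr;   // assumed width

typedef struct {
    Addr     pc;
    Bit#(32) inst;
} TypeFetch2Decode deriving (Bits, Eq);

// Hypothetical stand-in for an instruction memory read.
function Bit#(32) fetchStub(Addr a);
    return a;
endfunction

module mkIrValidSketch(Empty);
    Reg#(Addr) pc <- mkReg(0);
    // ir starts out Invalid: it holds no meaningful instruction yet
    Reg#(Maybe#(TypeFetch2Decode)) ir <- mkReg(tagged Invalid);

    // Fetch may fire only when ir is empty
    rule doFetch (!isValid(ir));
        ir <= tagged Valid (TypeFetch2Decode{pc: pc, inst: fetchStub(pc)});
    endrule

    // Execute may fire only when ir holds an instruction; it then empties ir
    rule doExecute (ir matches tagged Valid .f);
        $display("execute pc=%h inst=%h", f.pc, f.inst);
        pc <= f.pc + 4;
        ir <= tagged Invalid;
        if (f.pc == 12) $finish;   // stop the sketch after four instructions
    endrule
endmodule
```

In this sketch the valid bit plays the role of a stage register, and the two rules still alternate, so nothing is gained in throughput; it only shows the single-element-FIFO view of ir.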

Additional Types

    typedef struct {
        Addr     pc;
        Bit#(32) inst;
    } TypeFetch2Decode deriving (Bits, Eq);

    typedef enum {Fetch, Execute} TypeStage deriving (Bits, Eq);

Two-Cycle SMIPS

    module mkProc(Proc);
        Reg#(Addr) pc  <- mkRegU;
        RFile      rf  <- mkRFile;
        Memory     mem <- mkTwoPortedMemory;
        let iMem = mem.iport;
        let dMem = mem.dport;
        Reg#(TypeFetch2Decode) ir    <- mkRegU;
        Reg#(TypeStage)        stage <- mkReg(Fetch);

        rule doFetch (stage == Fetch);
            let inst = iMem(MemReq{op:Ld, addr:pc, data:?});
            ir <= TypeFetch2Decode{pc: pc, inst: inst};
            stage <= Execute;
        endrule

Two-Cycle SMIPS

        rule doExecute (stage == Execute);
            let irpc  = ir.pc;
            let inst  = ir.inst;
            let dInst = decode(inst);
            Data rVal1 = rf.rd1(dInst.rSrc1);
            Data rVal2 = rf.rd2(dInst.rSrc2);
            let eInst = exec(dInst, rVal1, rVal2, irpc);
            if (memType(eInst.iType))
                eInst.data <- dMem(MemReq{op: eInst.iType == Ld ? Ld : St,
                                          addr: eInst.addr, data: eInst.data});
            if (regWriteType(eInst.iType))
                rf.wr(eInst.rDst, eInst.data);
            pc <= eInst.brTaken ? eInst.addr : pc + 4;
            stage <= Fetch;
        endrule
    endmodule

Apart from reading the instruction from ir and resetting stage, there is no change from single-cycle.

Princeton versus Harvard Architecture
A Harvard architecture uses different memories for instructions and data; this is needed for a single-cycle implementation.
A Princeton architecture uses the same memory for instructions and data, and thus requires at least two cycles to execute Load/Store instructions.
The two-cycle implementations of the Princeton and Harvard architectures are almost the same.

SMIPS Princeton Architecture
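[Figure: PC (+4) -> Memory -> ir -> Decode -> Register File -> Execute -> Memory, with a stage register; a single Memory serves both instruction fetch and data access.]

Since both the Fetch and Execute stages want to use the memory, there is a structural hazard in accessing memory.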

Two-Cycle SMIPS Princeton
    module mkProc(Proc);
        Reg#(Addr) pc  <- mkRegU;
        RFile      rf  <- mkRFile;
        Memory     mem <- mkOnePortedMemory;
        let uMem = mem.port;
        Reg#(TypeFetch2Decode) ir    <- mkRegU;
        Reg#(TypeStage)        stage <- mkReg(Fetch);

        rule doFetch (stage == Fetch);
            let inst <- uMem(MemReq{op:Ld, addr:pc, data:?});
            ir <= TypeFetch2Decode{pc: pc, inst: inst};
            stage <= Execute;
        endrule

Two-Cycle SMIPS Princeton
        rule doExecute (stage == Execute);
            let irpc  = ir.pc;
            let inst  = ir.inst;
            let dInst = decode(inst);
            Data rVal1 = rf.rd1(dInst.rSrc1);
            Data rVal2 = rf.rd2(dInst.rSrc2);
            let eInst = exec(dInst, rVal1, rVal2, irpc);
            if (memType(eInst.iType))
                eInst.data <- uMem(MemReq{op: eInst.iType == Ld ? Ld : St,
                                          addr: eInst.addr, data: eInst.data});
            if (regWriteType(eInst.iType))
                rf.wr(eInst.rDst, eInst.data);
            pc <= eInst.brTaken ? eInst.addr : pc + 4;
            stage <= Fetch;
        endrule
    endmodule

Two-Cycle SMIPS: Analysis
[Figure: the two-cycle datapath split into its Fetch part (PC, +4, Inst Memory, ir) and its Execute part (Decode, Register File, Execute, Data Memory), with the stage register.]

In any given clock cycle, lots of unused hardware!
Next lecture: pipelining to increase the throughput.
