# EE204 Computer Architecture, L12: Single-Cycle Datapath Performance (Hina Anwar Khan, Spring 2011)

## Presentation transcript


Slide 2 (Hina Anwar Khan, Spring 2011): Performance of Single-Cycle Machines

Assume the following operation times for the functional units: memory, 2 nanoseconds (ns); ALU and adders, 2 ns; register file, 1 ns. Assume that MUXes, control, sign extension, PC accesses, and wires have no delay. Which implementation is faster?

1. Every instruction executes in one clock cycle of fixed length.
2. Every instruction executes in a clock cycle of varying length.

The time needed by each instruction class, in ns:

| Instruction | Inst. fetch | Reg. read | ALU op | Memory | Reg. write | Total |
|-------------|-------------|-----------|--------|--------|------------|-------|
| R-type      | 2           | 1         | 2      | 0      | 1          | 6 ns  |
| Load        | 2           | 1         | 2      | 2      | 1          | 8 ns  |
| Store       | 2           | 1         | 2      | 2      |            | 7 ns  |
| Branch      | 2           | 1         | 2      |        |            | 5 ns  |
| Jump        | 2           |           |        |        |            | 2 ns  |
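The totals in the table can be rederived by summing the unit delays along each instruction's path. The following is a minimal sketch under the slide's stated delays; the names `UNIT_DELAY_NS`, `PATHS`, and `instruction_latency_ns` are illustrative and not part of the lecture.

```python
# Unit delays in nanoseconds, as stated on the slide.
UNIT_DELAY_NS = {"mem": 2, "alu": 2, "reg": 1}

# Functional units each instruction class passes through, in order.
# "mem" appears twice for loads and stores: once for the instruction
# fetch and once for the data access.
PATHS = {
    "R-type": ["mem", "reg", "alu", "reg"],
    "Load":   ["mem", "reg", "alu", "mem", "reg"],
    "Store":  ["mem", "reg", "alu", "mem"],
    "Branch": ["mem", "reg", "alu"],
    "Jump":   ["mem"],
}

def instruction_latency_ns(path):
    """Sum the delays of the units on one instruction's path."""
    return sum(UNIT_DELAY_NS[unit] for unit in path)

LATENCY_NS = {name: instruction_latency_ns(path) for name, path in PATHS.items()}

for name, latency in LATENCY_NS.items():
    print(f"{name}: {latency} ns")
```

Because MUXes, control, and wires are assumed to have no delay, the longest path (load) alone sets the fixed clock period.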

Slide 3: Fixed vs. Variable Cycle Length

Assume a program has the following instruction mix: 24% loads, 12% stores, 44% R-type, 18% branches, 2% jumps.

For the fixed cycle length, the cycle time is 8 ns, long enough for the longest instruction (load), so every instruction takes 8 ns to execute.

For the variable cycle length, the average CPU clock cycle is:

8 × 24% + 7 × 12% + 6 × 44% + 5 × 18% + 2 × 2% = 6.34 ns ≈ 6.3 ns

The variable-clock implementation is therefore 8/6.3 ≈ 1.27 times faster, but it is extremely hard to implement. When instructions such as multiply and divide are added, which can take tens of cycles, the single-cycle scheme is too slow.
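The weighted average above can be checked with a few lines of Python. This is a sketch using only the latencies and instruction mix given on the slides; the variable and function names are illustrative.

```python
# Per-class latencies (ns) and instruction mix, both taken from the slides.
LATENCY_NS = {"Load": 8, "Store": 7, "R-type": 6, "Branch": 5, "Jump": 2}
MIX = {"Load": 0.24, "Store": 0.12, "R-type": 0.44, "Branch": 0.18, "Jump": 0.02}

def average_cycle_ns(latency, mix):
    """Weighted average cycle time for a variable-length clock."""
    return sum(latency[name] * mix[name] for name in mix)

fixed_ns = max(LATENCY_NS.values())             # fixed clock must fit the slowest class
variable_ns = average_cycle_ns(LATENCY_NS, MIX)
speedup = fixed_ns / variable_ns

print(f"fixed = {fixed_ns} ns, variable = {variable_ns:.2f} ns, speedup = {speedup:.2f}x")
```

Without intermediate rounding the average is 6.34 ns and the ratio is about 1.26; the slide's 1.27 comes from dividing by the rounded value 6.3.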

Slide 4: Observations on the Single Cycle Design

The single-cycle datapath is straightforward, but:

- It has to use three separate ALUs
- It has separate instruction and data memories
- Cycle time is determined by the worst-case path

A multi-cycle datapath might be better:

- We can reuse some of the hardware
- We can combine the memories
- Cycle time is still constant, but instructions may take differing numbers of cycles

Slide 5: Multi-Cycle Implementation

- Each step in execution takes one clock cycle
- Different instructions take different numbers of clock cycles
- A functional unit can be used more than once per instruction, as long as it is used on different clock cycles
- Hardware units are reduced and shared
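Since instructions take different numbers of cycles, the average cycles per instruction (CPI) depends on the instruction mix. The sketch below is only illustrative: the per-class cycle counts in `CYCLES` are an assumption (values typical of a textbook multicycle MIPS design) and do not appear in these slides; the mix is the one from the Fixed vs. Variable Cycle Length slide.

```python
# ASSUMED cycle counts per instruction class; not given in the slides.
CYCLES = {"Load": 5, "Store": 4, "R-type": 4, "Branch": 3, "Jump": 3}

# Instruction mix from the Fixed vs. Variable Cycle Length slide.
MIX = {"Load": 0.24, "Store": 0.12, "R-type": 0.44, "Branch": 0.18, "Jump": 0.02}

def average_cpi(cycles, mix):
    """Weighted average number of clock cycles per instruction."""
    return sum(cycles[name] * mix[name] for name in mix)

CPI = average_cpi(CYCLES, MIX)
print(f"average CPI = {CPI:.2f}")
```

Execution time per instruction is then CPI multiplied by the (shorter) multicycle clock period, which the slides do not specify.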

Slide 6: Multicycle Datapath

- Single instruction and data memory
- Single ALU
- Registers

Slide 7: Multicycle Execution

- Instruction Register (IR): holds the instruction until the end of execution
- Memory Data Register (MDR)
- A register
- B register
- ALUOut register

Slide 8: Multicycle Datapath

[Figure: multicycle datapath. Labels: Inst/Data Memory; Instruction Address; Data Address; Register Block; ALU; Arithmetic/branch instruction; lw/sw instruction; PC = PC + 4; Branch target address.]

Slide 9: Multicycle Datapath [figure]

Slide 10: Multicycle Datapath and Control Signals [figure]

Slide 11: One Single ALU

A single ALU performs all of the necessary functions:

- An arithmetic operation on two register operands
- Adding a register to a sign-extended constant, to compute memory addresses in lw/sw instructions
- Computing PC+4 to increment the PC
- Adding a sign-extended, shifted offset to (PC+4) for branches

Slide 12: Implications of Shared Functional Units

Multiplexors must be added, or existing multiplexors expanded. For example:

- The memory unit now holds both instructions (address in PC) and data (address in ALUOut)
- The ALU must now accommodate all the inputs of the previous ALU and adders

Slide 13: Two Extra Multiplexers

To enable all the actions listed for the ALU, two extra multiplexers are needed:

- A 2-to-1 mux, ALUSrcA, selects whether the first ALU input is the PC or a register
- A 4-to-1 mux, ALUSrcB, selects the second ALU input from among the register file, a constant 4, a sign-extended constant, and a sign-extended and shifted constant
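The two muxes can be modeled as plain selection functions. The sketch below is a behavioral illustration, not the lecture's hardware description: the numeric select encodings (0/1 for ALUSrcA, 0 to 3 for ALUSrcB), the 16-bit immediate width, and the names `sign_extend16` and `alu_operands` are assumptions made for the example.

```python
def sign_extend16(value):
    """Interpret the low 16 bits of value as a signed two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value

def alu_operands(alu_src_a, alu_src_b, pc, reg_a, reg_b, imm16):
    """Return the (first, second) ALU inputs chosen by the two muxes.

    Assumed encodings:
      alu_src_a: 0 = PC, 1 = register A
      alu_src_b: 0 = register B, 1 = constant 4,
                 2 = sign-extended immediate,
                 3 = sign-extended immediate shifted left by 2
    """
    first = pc if alu_src_a == 0 else reg_a
    ext = sign_extend16(imm16)
    second = [reg_b, 4, ext, ext << 2][alu_src_b]
    return first, second

# PC + 4: first input is the PC, second input is the constant 4.
print(alu_operands(0, 1, pc=0x1000, reg_a=7, reg_b=9, imm16=0))
```

With these encodings, an R-type operation would use (1, 0), a lw/sw address computation (1, 2), and a branch target computation (0, 3).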

Slide 14: One Single Memory

A single memory is used in both the instruction fetch and data access stages. The address for this memory may come from:

- the PC register, when fetching an instruction
- the ALU output, when a lw/sw instruction needs the effective memory address

To select whether the memory is being accessed for instructions or for data, a 2-to-1 mux, IorD, is added.
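The address selection for the shared memory can be sketched in the same behavioral style. The select encoding (0 for instruction fetch, 1 for data access), the function name `select_memory_address`, and the memory contents are all hypothetical values chosen for illustration.

```python
def select_memory_address(i_or_d, pc, alu_out):
    """Model the IorD mux: 0 selects the PC, 1 selects ALUOut (encoding assumed)."""
    return pc if i_or_d == 0 else alu_out

# Hypothetical unified memory holding one instruction word and one data word.
memory = {
    0x0000: 0x8C090004,  # placeholder instruction word at the PC
    0x1004: 42,          # placeholder data word
}

pc, alu_out = 0x0000, 0x1004
instruction = memory[select_memory_address(0, pc, alu_out)]  # instruction fetch
data = memory[select_memory_address(1, pc, alu_out)]         # lw data access

print(hex(instruction), data)
```

The two accesses happen on different clock cycles, which is what lets a single memory serve both roles.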

