Presentation is loading. Please wait.

Presentation is loading. Please wait.

EECS 322: Computer Architecture

Similar presentations


Presentation on theme: "EECS 322: Computer Architecture"— Presentation transcript:

1 EECS 322: Computer Architecture
Single-cycle Multi-cycle FSM controller Multi-cycle microcontroller CWRU EECS 322 March 6, 2000

2 MIPS instruction formats
B y t e H a l f w o r d W R g i s M m 1 . I n 2 3 4 P C - o p r s r t I m m e d i a t e Arithmetic add $rd,$rs,$rt o p r s r t r d . . . f u n c t Data Transfer lw $rd,offset($rs) sw $rd,offset($rs) o p r s r t A d d r e s s + l a t i v e a d d r e s s i n g Conditional branch beq $rd,$rs,raddr o p r s r t A d d r e s s P C + Unconditional jump j addr 5 . P s e u d o d i r e c t a d d r e s s i n g o p A d d r e s s P C CWRU EECS 322 March 6, 2000

3 Single Cycle = 2 adders + 1 ALU
Single Cycle Implementation Calculate instruction cycle time assuming negligible delays except: memory (2ns), ALU and adders (2ns), register file access (1ns) Adder2: PCPC+signext(IR[15-0]) <<2 Adder1: PC  PC + 4 Adder3: Arithmetic ALU Single Cycle = 2 adders + 1 ALU CWRU EECS 322 March 6, 2000

4 Architectural improved performance without speeding up the clock!
Single/Multi-Clock Comparison add = 6ns = Fetch(2ns)+RegR(1ns)+ALU(2ns)+RegW(2ns) lw = 8ns = Fetch(2ns)+RegR(1ns)+ALU(2ns)+MemR(2ns)+RegW(2ns) sw = 7ns = Fetch(2ns)+RegR(1ns)+ALU(2ns)+MemW(2ns) beq = 5ns = Fetch(2ns)+RegR(1ns)+ALU(2ns) j = 2ns = Fetch(2ns) Architectural improved performance without speeding up the clock! CWRU EECS 322 March 6, 2000

5 Some Design Trade-offs
High level design techniques Algorithms: change instruction usage minimize  ninstruction * tinstruction Architecture: Datapath, FSM, Microprogramming adders: ripple versus carry lookahead multiplier types, … Lower level design techniques (closer to physical design) clocking: single verus multi clock technology: layout tools: better place and route process technology: 0.5 micron to .18 micron CWRU EECS 322 March 6, 2000

6 Single-cycle problems
what if we had a more complicated instruction like floating point? (fadd = 30ns, fmul=100ns) wasteful of area (2 adders + 1 ALU) One Solution: use a “smaller” cycle time (if the technology can do it) have different instructions take different numbers of cycles a “multicycle” datapath (1 ALU) Multi-cycle approach We will be reusing functional units: ALU used to increment PC (Adder1) and to compute address (Adder2) Memory used for instruction and data CWRU EECS 322 March 6, 2000

7 Reality Check: Intel 8086 clock cycles
Arithmetic add reg16, reg mul dx:ax, reg16 very slow!! imul dx:ax, reg div dx:ax, reg idiv dx:ax, reg16 Data Transfer mov reg16, mem mov mem16, reg16 Conditional Branch /16 je displacement8 Unconditional Jump jmp segment:offset16 CWRU EECS 322 March 6, 2000

8 Multi-cycle = 1 ALU + Controller
Multi-cycle Datapath Multi-cycle = 1 ALU + Controller CWRU EECS 322 March 6, 2000

9 Multi-cycle Datapath: with controller
CWRU EECS 322 March 6, 2000

10 Multi-cycle: 5 execution steps
T1 (a,lw,sw,beq,j) Instruction Fetch T2 (a,lw,sw,beq,j) Instruction Decode and Register Fetch T3 (a,lw,sw,beq,j) Execution, Memory Address Calculation, or Branch Completion T4 (a,lw,sw) Memory Access or R-type instruction completion T5 (a,lw) Write-back step INSTRUCTIONS TAKE FROM CYCLES! CWRU EECS 322 March 6, 2000

11 Multi-cycle Approach All operations in each clock cycle Ti are done in parallel not sequential! For example, T1, IR = Memory[PC] and PC=PC+4 are done simultaneously! T1 T2 T3 T4 T5 Between Clock T2 and T3 the microcode sequencer will do a dispatch 1 CWRU EECS 322 March 6, 2000

12 Multi-cycle using Microprogramming
Finite State Machine ( hardwired control ) Microcode controller M i c r o c o d e C o m b i n a t i o n a l s t o r a g e c o n t r o l l o g i c D a t a p a t h c o n t r o l o u t p u t s D a t a p a t h O u t p u t s c o n t r o l firmware o u t p u t s O u t p u t s I n p u t 1 I n p u t s M i c r o p r o g r a m c o u n t e r S e q u e n c i n g c o n t r o l A d d e r N e x t s t a t e A d d r e s s s e l e c t l o g i c I n p u t s f r o m i n s t r u c t i o n S t a t e r e g i s t e r r e g i s t e r o p c o d e f i e l d I n p u t s f r o m i n s t r u c t i o n r e g i s t e r o p c o d e f i e l d Requires microcode memory to be faster than main memory CWRU EECS 322 March 6, 2000

13 Microcode: Trade-offs
Distinction between specification and implementation is sometimes blurred Specification Advantages: Easy to design and write (maintenance) Design architecture and microcode in parallel Implementation (off-chip ROM) Advantages Easy to change since values are in memory Can emulate other architectures Can make use of internal registers Implementation Disadvantages, SLOWER now that: Control is implemented on same chip as processor ROM is no longer faster than RAM No need to go back and make changes CWRU EECS 322 March 6, 2000

14 Microinstruction format
CWRU EECS 322 March 6, 2000

15 Microinstruction format: Maximally vs. Minimally Encoded
No encoding: 1 bit for each datapath operation faster, requires more memory (logic) used for Vax 780 — an astonishing 400K of memory! Lots of encoding: send the microinstructions through logic to get control signals uses less memory, slower Historical context of CISC: Too much logic to put on a single chip with everything else Use a ROM (or even RAM) to hold the microcode It’s easy to add new instructions CWRU EECS 322 March 6, 2000

16 Microprogramming: program
CWRU EECS 322 March 6, 2000

17 Microprogramming: program overview
T1 T2 T3 T4 T5 Fetch Fetch+1 Dispatch 1 Rformat1 BEQ1 JUMP1 Mem1 Dispatch 2 Rformat1+1 LW2 SW2 LW2+1 CWRU EECS 322 March 6, 2000

18 Microprogram steping: T1 Fetch
(Done in parallel) IRMEMORY[PC] & PC  PC + 4 Label ALU SRC1 SRC2 RCntl Memory PCwrite Seq Fetch add pc 4 ReadPC ALU Seq CWRU EECS 322 March 6, 2000

19 T2 Fetch + 1 AReg[IR[25-21]] & BReg[IR[20-16]] & ALUOutPC+signext(IR[15-0]) <<2 Label ALU SRC1 SRC2 RCntl Memory PCwrite Seq add pc ExtSh Read D#1 CWRU EECS 322 March 6, 2000

20 T3 Dispatch 1: Mem1 ALUOut  A + sign_extend(IR[15-0])
Label ALU SRC1 SRC2 RCntl Memory PCwrite Seq Mem1 add A ExtSh D#2 CWRU EECS 322 March 6, 2000

21 T4 Dispatch 2: LW2 MDR  Memory[ALUOut]
Label ALU SRC1 SRC2 RCntl Memory PCwrite Seq LW2 ReadALU Seq CWRU EECS 322 March 6, 2000

22 T5 LW2+1 Reg[ IR[20-16] ]  MDR Label ALU SRC1 SRC2 RCntl Memory PCwrite Seq WMDR Fetch CWRU EECS 322 March 6, 2000

23 T4 Dispatch 2: SW2 Memory[ ALUOut ]  B
Label ALU SRC1 SRC2 RCntl Memory PCwrite Seq SW2 WriteALU Fetch CWRU EECS 322 March 6, 2000

24 T3 Dispatch 1: Rformat1 ALUOut  A op(IR[31-26]) B op(IR[31-26])
Label ALU SRC1 SRC2 RCntl Memory PCwrite Seq Rf...1 op A B Seq CWRU EECS 322 March 6, 2000

25 T4 Dispatch 1: Rformat1+1 Reg[ IR[15-11] ]  ALUOut
Label ALU SRC1 SRC2 RCntl Memory PCwrite Seq WALU Fetch CWRU EECS 322 March 6, 2000

26 T3 Dispatch 1: BEQ1 If (A - B == 0) { PC  ALUOut; }
ALUOut = Address computed in T2 ! Label ALU SRC1 SRC2 RCntl Memory PCwrite Seq BEQ1 subt A B ALUOut-0 Fetch CWRU EECS 322 March 6, 2000

27 T3 Dispatch 1: Jump1 PC  PC[31-28] || IR[25-0]<<2
Label ALU SRC1 SRC2 RCntl Memory PCwrite Seq Jump Jaddr Fetch CWRU EECS 322 March 6, 2000

28 The Big Picture CWRU EECS 322 March 6, 2000


Download ppt "EECS 322: Computer Architecture"

Similar presentations


Ads by Google