EECS 322: Computer Architecture

Name: EECS 322: Computer Architecture
Uploaded: 2017-11-30T23:28:02+00:00
Duration: PTM21S57
Channel: Allen Lloyd
Description: EECS 322: Computer Architecture

EECS 322: Computer Architecture
Single-cycle Multi-cycle FSM controller Multi-cycle microcontroller CWRU EECS 322 March 6, 2000

MIPS instruction formats
[Figure: MIPS addressing modes. Recoverable labels: byte, halfword, word, register, memory, immediate, PC-relative addressing, pseudodirect addressing.]
Arithmetic: add $rd,$rs,$rt
    fields: op, rs, rt, rd, ..., funct
Data transfer: lw $rd,offset($rs) and sw $rd,offset($rs)
    fields: op, rs, rt, address (register + address)
Conditional branch: beq $rd,$rs,raddr
    fields: op, rs, rt, address (PC + address)
Unconditional jump: j addr
    fields: op, address (pseudodirect addressing, combined with PC)

Single Cycle = 2 adders + 1 ALU
Single Cycle Implementation
Calculate instruction cycle time assuming negligible delays except: memory (2ns), ALU and adders (2ns), register file access (1ns)
Adder1: PC ← PC + 4
Adder2: PC ← PC + signext(IR[15-0]) << 2
Adder3: Arithmetic ALU

Architecturally improved performance without speeding up the clock!
Single/Multi-Clock Comparison
add = 6ns = Fetch(2ns)+RegR(1ns)+ALU(2ns)+RegW(1ns)
lw  = 8ns = Fetch(2ns)+RegR(1ns)+ALU(2ns)+MemR(2ns)+RegW(1ns)
sw  = 7ns = Fetch(2ns)+RegR(1ns)+ALU(2ns)+MemW(2ns)
beq = 5ns = Fetch(2ns)+RegR(1ns)+ALU(2ns)
j   = 2ns = Fetch(2ns)
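
The per-instruction totals above can be checked with a short calculation. This is a minimal sketch, assuming the component delays from the single-cycle slide (memory 2ns, ALU 2ns, register file access 1ns for both read and write, which is what the stated totals require). The names DELAY, PATH and latency_ns are illustrative and not from the slides.

```python
# Component delays in ns, from the slide; all other delays assumed negligible.
DELAY = {"Fetch": 2, "RegR": 1, "ALU": 2, "MemR": 2, "MemW": 2, "RegW": 1}

# Functional units each instruction class passes through, in order.
PATH = {
    "add": ["Fetch", "RegR", "ALU", "RegW"],
    "lw":  ["Fetch", "RegR", "ALU", "MemR", "RegW"],
    "sw":  ["Fetch", "RegR", "ALU", "MemW"],
    "beq": ["Fetch", "RegR", "ALU"],
    "j":   ["Fetch"],
}

def latency_ns(instr):
    """Sum the delays of the units used by one instruction class."""
    return sum(DELAY[unit] for unit in PATH[instr])

latencies = {instr: latency_ns(instr) for instr in PATH}

# A single-cycle design must fit the slowest instruction into its one clock.
single_cycle_clock_ns = max(latencies.values())
```

Under these assumptions the single-cycle clock is set by lw, so every other instruction finishes early and waits for the clock edge.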

Some Design Trade-offs
High level design techniques
    Algorithms: change instruction usage, minimize Σ n_instruction * t_instruction
    Architecture: Datapath, FSM, Microprogramming
        adders: ripple versus carry lookahead
        multiplier types, …
Lower level design techniques (closer to physical design)
    clocking: single versus multi clock
    technology:
        layout tools: better place and route
        process technology: 0.5 micron to 0.18 micron
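
The algorithm-level objective above, read as the total of instruction count times instruction time, can be sketched as follows. The per-instruction times are the ones from the clock comparison slide; the instruction counts are hypothetical and chosen only for illustration.

```python
def total_time_ns(counts, times_ns):
    """Total execution time: sum over instruction classes of n * t."""
    return sum(counts[instr] * times_ns[instr] for instr in counts)

# Times in ns from the clock comparison slide.
times_ns = {"add": 6, "lw": 8, "sw": 7, "beq": 5, "j": 2}

# Hypothetical dynamic instruction counts (an assumption, not from the slides).
counts = {"add": 50, "lw": 20, "sw": 10, "beq": 15, "j": 5}

total = total_time_ns(counts, times_ns)
```

Changing the algorithm changes the counts; changing the architecture or technology changes the times. Either lowers the same sum.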

Single-cycle problems
What if we had a more complicated instruction like floating point? (fadd = 30ns, fmul = 100ns)
Wasteful of area (2 adders + 1 ALU)
One solution:
    use a “smaller” cycle time (if the technology can do it)
    have different instructions take different numbers of cycles
    a “multicycle” datapath (1 ALU)
Multi-cycle approach: we will be reusing functional units
    ALU used to increment PC (Adder1) and to compute address (Adder2)
    Memory used for instruction and data
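
To see why slow instructions hurt a single-cycle design, the sketch below adds the fadd and fmul times quoted above to the integer instruction times from the clock comparison slide and measures how long each instruction sits idle per cycle. The variable names are illustrative.

```python
# Instruction times in ns: integer classes from the clock comparison slide,
# floating-point times from this slide.
latency_ns = {"add": 6, "lw": 8, "sw": 7, "beq": 5, "j": 2,
              "fadd": 30, "fmul": 100}

# One clock period must cover the slowest instruction.
clock_ns = max(latency_ns.values())

# Time each instruction class wastes waiting for the end of the cycle.
idle_ns = {instr: clock_ns - t for instr, t in latency_ns.items()}
```

With these numbers an add would spend most of each cycle idle, which is the motivation for letting different instructions take different numbers of shorter cycles.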

Reality Check: Intel 8086 clock cycles
Arithmetic
    add reg16, reg
    mul dx:ax, reg16 (very slow!!)
    imul dx:ax, reg
    div dx:ax, reg
    idiv dx:ax, reg16
Data Transfer
    mov reg16, mem
    mov mem16, reg16
Conditional Branch
    /16 je displacement8
Unconditional Jump
    jmp segment:offset16

Multi-cycle = 1 ALU + Controller
[Figure: Multi-cycle Datapath]

Multi-cycle Datapath: with controller

Multi-cycle: 5 execution steps
T1 (a,lw,sw,beq,j) Instruction Fetch
T2 (a,lw,sw,beq,j) Instruction Decode and Register Fetch
T3 (a,lw,sw,beq,j) Execution, Memory Address Calculation, or Branch Completion
T4 (a,lw,sw) Memory Access or R-type instruction completion
T5 (lw) Write-back step
INSTRUCTIONS TAKE FROM 3 TO 5 CYCLES!

Multi-cycle Approach
All operations in each clock cycle Ti are done in parallel, not sequentially! For example, in T1, IR = Memory[PC] and PC = PC + 4 are done simultaneously!
[Figure: clock cycles T1 through T5]
Between clocks T2 and T3 the microcode sequencer performs Dispatch 1.
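
The parallel semantics of a clock cycle can be modeled in software by computing every new value from the old state and only then committing them together. The following is a minimal sketch of T1 under that model; the dictionary-based state, the function name t1_fetch and the memory contents are all hypothetical.

```python
def t1_fetch(state, memory):
    """Apply both T1 transfers at once: each right-hand side reads the OLD state."""
    old_pc = state["PC"]
    # IR gets the word at the old PC while PC advances by 4 in the same cycle.
    return {**state, "IR": memory[old_pc], "PC": old_pc + 4}

# Hypothetical word-addressed instruction memory (values are placeholders).
memory = {0: 0x8C010004, 4: 0x00221820}

state = {"PC": 0, "IR": None}
state = t1_fetch(state, memory)
```

Because IR is loaded from the old PC, the result does not depend on the order in which the two transfers are written, which is what "in parallel" means here.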

Multi-cycle using Microprogramming
Finite State Machine (hardwired control)
Microcode controller (firmware)
[Figure: two controller block diagrams. Recoverable labels for the finite state machine: combinational control logic, datapath control outputs, outputs, inputs, next state, state register, inputs from instruction register opcode field. Recoverable labels for the microcode controller: microcode storage, datapath control outputs, outputs, input, sequencing control, microprogram counter, adder, 1, address select logic, inputs from instruction register opcode field.]
Requires microcode memory to be faster than main memory

Microcode: Trade-offs
Distinction between specification and implementation is sometimes blurred
Specification advantages:
    Easy to design and write (maintenance)
    Design architecture and microcode in parallel
Implementation (off-chip ROM) advantages:
    Easy to change since values are in memory
    Can emulate other architectures
    Can make use of internal registers
Implementation disadvantages, SLOWER now that:
    Control is implemented on same chip as processor
    ROM is no longer faster than RAM
    No need to go back and make changes

Microinstruction format

Microinstruction format: Maximally vs. Minimally Encoded
No encoding:
    1 bit for each datapath operation
    faster, requires more memory (logic)
    used for Vax 780, an astonishing 400K of memory!
Lots of encoding:
    send the microinstructions through logic to get control signals
    uses less memory, slower
Historical context of CISC:
    Too much logic to put on a single chip with everything else
    Use a ROM (or even RAM) to hold the microcode
    It’s easy to add new instructions
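
The width trade-off between the two styles can be sketched for a single microinstruction field. This is an illustration only: it assumes the choices in a field are mutually exclusive, and the three SRC2 choices are simply the values that appear in the microprogram slides, not a complete list.

```python
def unencoded_bits(num_signals):
    """No encoding: one microinstruction bit per control signal."""
    return num_signals

def encoded_bits(num_choices):
    """Encoded field: just enough bits to number mutually exclusive choices."""
    return max(1, (num_choices - 1).bit_length())

def decode(code, num_choices):
    """The extra logic an encoded field needs: expand a code into one-hot signals."""
    return [1 if i == code else 0 for i in range(num_choices)]

# SRC2 values seen in the microprogram slides (assumed mutually exclusive).
src2_choices = ["B", "4", "ExtSh"]

wide = unencoded_bits(len(src2_choices))    # bits stored without encoding
narrow = encoded_bits(len(src2_choices))    # bits stored with encoding
signals = decode(src2_choices.index("4"), len(src2_choices))
```

The decode step is the source of the extra delay: the encoded form stores fewer bits per microinstruction but must pass through logic before it can drive the datapath.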

Microprogramming: program

Microprogramming: program overview
T1: Fetch
T2: Fetch+1, then Dispatch 1
T3: Rformat1, BEQ1, JUMP1, or Mem1 (Mem1 is followed by Dispatch 2)
T4: Rformat1+1, LW2, or SW2
T5: LW2+1
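
The sequencing in this overview can be traced with a small table-driven sketch. The labels and the Seq column values (Seq, Dispatch 1, Dispatch 2, Fetch) follow the step-by-step slides in this deck; the list layout, the dispatch dictionaries and the trace function are assumptions made for illustration, with "a" standing for arithmetic as in the execution-steps slide.

```python
# (label, sequencing) pairs. "Seq" falls through to the next entry,
# "Fetch" ends the instruction, and a dispatch jumps by instruction class.
MICROPROGRAM = [
    ("Fetch",      "Seq"),
    ("Fetch+1",    "Dispatch1"),
    ("Mem1",       "Dispatch2"),
    ("LW2",        "Seq"),
    ("LW2+1",      "Fetch"),
    ("SW2",        "Fetch"),
    ("Rformat1",   "Seq"),
    ("Rformat1+1", "Fetch"),
    ("BEQ1",       "Fetch"),
    ("JUMP1",      "Fetch"),
]
DISPATCH1 = {"a": "Rformat1", "lw": "Mem1", "sw": "Mem1",
             "beq": "BEQ1", "j": "JUMP1"}
DISPATCH2 = {"lw": "LW2", "sw": "SW2"}
INDEX = {label: i for i, (label, _) in enumerate(MICROPROGRAM)}

def trace(instr):
    """Return the microinstruction labels executed for one instruction class."""
    steps = []
    i = INDEX["Fetch"]
    while True:
        label, seq = MICROPROGRAM[i]
        steps.append(label)
        if seq == "Seq":
            i += 1
        elif seq == "Dispatch1":
            i = INDEX[DISPATCH1[instr]]
        elif seq == "Dispatch2":
            i = INDEX[DISPATCH2[instr]]
        else:  # "Fetch": next microinstruction starts a new instruction
            return steps

cycles = {instr: len(trace(instr)) for instr in DISPATCH1}
```

Each traced label corresponds to one clock cycle, so the length of a trace is the cycle count for that instruction class.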

Microprogram stepping: T1 Fetch
(Done in parallel) IR ← MEMORY[PC] & PC ← PC + 4
Label     ALU   SRC1  SRC2   RCntl  Memory    PCwrite   Seq
Fetch     add   pc    4             ReadPC    ALU       Seq

T2 Fetch + 1: A ← Reg[IR[25-21]] & B ← Reg[IR[20-16]] & ALUOut ← PC + signext(IR[15-0]) << 2
Label     ALU   SRC1  SRC2   RCntl  Memory    PCwrite   Seq
          add   pc    ExtSh  Read                       D#1
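
The three T2 transfers can be written out with explicit bit operations. This is a minimal sketch, assuming a 32-bit instruction word and a register file modeled as a Python list; the helper names bits, signext16 and t2_decode and the example register values are hypothetical.

```python
def bits(word, hi, lo):
    """Extract the field word[hi-lo], with bit 0 as the least significant bit."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)

def signext16(value):
    """Interpret a 16-bit field as a signed two's-complement number."""
    return value - 0x10000 if value & 0x8000 else value

def t2_decode(ir, pc, reg):
    """T2: read both source registers and precompute the branch target."""
    a = reg[bits(ir, 25, 21)]
    b = reg[bits(ir, 20, 16)]
    alu_out = (pc + (signext16(bits(ir, 15, 0)) << 2)) & 0xFFFFFFFF
    return a, b, alu_out

# Example word: opcode 4, rs = 1, rt = 2, 16-bit offset = -1 (0xFFFF).
reg = [0] * 32
reg[1] = 7
reg[2] = 7
a, b, alu_out = t2_decode(0x1022FFFF, 8, reg)
```

The branch target is computed before the opcode is known to be a branch; if the instruction turns out to be something else, ALUOut is simply overwritten in T3.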

T3 Dispatch 1: Mem1 ALUOut ← A + sign_extend(IR[15-0])
Label     ALU   SRC1  SRC2   RCntl  Memory    PCwrite   Seq
Mem1      add   A     ExtSh                             D#2

T4 Dispatch 2: LW2 MDR ← Memory[ALUOut]
Label     ALU   SRC1  SRC2   RCntl  Memory    PCwrite   Seq
LW2                                 ReadALU             Seq

T5 LW2+1: Reg[ IR[20-16] ] ← MDR
Label     ALU   SRC1  SRC2   RCntl  Memory    PCwrite   Seq
                             WMDR                       Fetch

T4 Dispatch 2: SW2 Memory[ ALUOut ] ← B
Label     ALU   SRC1  SRC2   RCntl  Memory    PCwrite   Seq
SW2                                 WriteALU            Fetch

T3 Dispatch 1: Rformat1 ALUOut ← A op(IR[31-26]) B
Label     ALU   SRC1  SRC2   RCntl  Memory    PCwrite   Seq
Rf...1    op    A     B                                 Seq

T4 Dispatch 1: Rformat1+1 Reg[ IR[15-11] ] ← ALUOut
Label     ALU   SRC1  SRC2   RCntl  Memory    PCwrite   Seq
                             WALU                       Fetch

T3 Dispatch 1: BEQ1 If (A - B == 0) { PC ← ALUOut; }
ALUOut = Address computed in T2!
Label     ALU   SRC1  SRC2   RCntl  Memory    PCwrite   Seq
BEQ1      subt  A     B                       ALUOut-0  Fetch

T3 Dispatch 1: Jump1 PC ← PC[31-28] || IR[25-0]<<2
Label     ALU   SRC1  SRC2   RCntl  Memory    PCwrite   Seq
Jump                                          Jaddr     Fetch
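
The two PC-writing steps, BEQ1 and Jump1, can be sketched as plain functions of the values available at the start of T3. This is an illustration assuming 32-bit values; the function names and the example operands are hypothetical.

```python
MASK32 = 0xFFFFFFFF

def beq1(pc, a, b, alu_out):
    """BEQ1: PC takes ALUOut (the target computed in T2) only when A - B == 0."""
    return alu_out if ((a - b) & MASK32) == 0 else pc

def jump1(pc, ir):
    """Jump1: upper 4 bits of PC concatenated with IR[25-0] shifted left by 2."""
    return (pc & 0xF0000000) | ((ir & 0x03FFFFFF) << 2)

taken = beq1(pc=8, a=7, b=7, alu_out=4)        # operands equal
not_taken = beq1(pc=8, a=7, b=9, alu_out=4)    # operands differ
target = jump1(pc=0x40000008, ir=0x08000010)   # 26-bit address field = 0x10
```

In both cases the ALU result from T2 or the instruction register already holds the address, so the step only decides whether to write PC.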

The Big Picture
