1 CS 161 Computer Architecture, Chapter 5, Lecture 12
Instructor: L.N. Bhuyan
Give qualifications of instructors: DAP
- teaching computer architecture at Berkeley since 1977
- co-author of textbook used in class
- best known for being one of the pioneers of RISC and RAID
- member of NAE
2 Datapath + Control Points
[Figure: multicycle datapath (PC, memory, instruction register, memory data register, register file, A and B registers, ALU, ALUOut, sign extend, shift left 2, multiplexers) annotated with its control points: PCWrite, PCWriteCond, IorD, MemRead, MemWrite, IRWrite, MemtoReg, PCSrc, ALUOp, ALUSrcA, ALUSrcB, RegWrite, RegDst.]
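5 Implementing a Finite State Machine
- internal storage (current state register)
- two combinational circuits:
  - next-state function
  - output function
[Figure: the inputs and the current state register feed the next-state function, whose result is latched into the state register on the clock; the output function produces the outputs.]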
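The structure above can be sketched in a few lines of Python. This is a toy illustration only: the three states, the single `go` input, and the `busy` output are hypothetical names, not taken from the lecture.

```python
# Toy FSM: a state register plus two combinational functions.
# State names, the "go" input, and the "busy" output are illustrative assumptions.

def next_state(state, go):
    """Next-state function: depends on the current state and the inputs."""
    if state == "IDLE":
        return "RUN" if go else "IDLE"
    if state == "RUN":
        return "DONE"
    return "IDLE"  # DONE always returns to IDLE


def output(state):
    """Output function: here it depends only on the current state."""
    return {"busy": state == "RUN"}


def simulate(inputs, state="IDLE"):
    """Each loop iteration models one clock edge updating the state register."""
    trace = []
    for go in inputs:
        trace.append((state, output(state)["busy"]))
        state = next_state(state, go)  # the state register latches the new state
    return trace
```

Calling `simulate([1, 0, 0, 1])` walks the machine through IDLE, RUN, DONE and back to IDLE, one state per clock.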
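7 FSM controller: execution cycles 3-5
[Figure: FSM states for cycle 4 (states 3, 5, 7) and cycle 5 (state 4).]
- State 3, lw memory access (step 4), entered from state 2: MemRead, IorD = 1
- State 5, sw memory access (step 4), entered from state 2: MemWrite, IorD = 1; then to state 0
- State 7, R-format completion (step 4), entered from state 6: RegDst = 1, RegWrite, MemtoReg = 0; then to state 0
- State 4, write-back (step 5), entered from state 3: RegDst = 0, RegWrite, MemtoReg = 1; then to state 0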
8 Add Jump
Note on reading the state diagram:
- a signal is don't care if not mentioned
- asserted if only its name appears
- otherwise set to the exact value shown
How many state bits will we need?
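As a quick check on the state-bit question, here is a small sketch. It assumes the controller has 10 states (numbered 0 to 9), which is the count the lecture quotes later for MIPS-lite; the function name is hypothetical.

```python
def state_bits(num_states):
    """Smallest number of bits that can give every state a distinct binary code."""
    return max(1, (num_states - 1).bit_length())


bits_needed = state_bits(10)  # 10 states assumed, numbered 0..9
```

Under that assumption the result is 4 bits, which leaves 6 of the 16 possible codes unused.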
9 Simple Questions
How many cycles will it take to execute this code?
    lw  $t2, 0($t3)
    lw  $t3, 4($t3)
    beq $t2, $t3, Label    # assume not taken
    add $t5, $t2, $t3
    sw  $t5, 8($t3)
Label:
What is going on during the 8th cycle of execution?
In what cycle does the actual addition of $t2 and $t3 take place?
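One way to work through these questions is to lay the instructions out cycle by cycle. The sketch below assumes the usual per-instruction step counts of the multicycle design (lw = 5, sw = 4, R-type = 4, beq = 3); the slide itself does not list them, and the helper names are hypothetical.

```python
# Assumed step counts for the multicycle implementation.
CYCLES = {"lw": 5, "sw": 4, "add": 4, "beq": 3}

# The code sequence from the slide; beq is assumed not taken.
program = ["lw", "lw", "beq", "add", "sw"]


def schedule(program):
    """Return one (instruction index, mnemonic, step) entry per clock cycle."""
    timeline = []
    for i, op in enumerate(program):
        for step in range(1, CYCLES[op] + 1):
            timeline.append((i, op, step))
    return timeline


timeline = schedule(program)
total_cycles = len(timeline)                         # 5 + 5 + 3 + 4 + 4
eighth_cycle = timeline[8 - 1]                       # what runs in cycle 8
add_exec_cycle = timeline.index((3, "add", 3)) + 1   # step 3 of add is the ALU operation
```

Under these assumptions the sequence takes 21 cycles, cycle 8 is step 3 of the second lw (the memory address computation), and the addition of $t2 and $t3 happens in cycle 16.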
10 Implementing the FSM controller
[Figure: a PLA or ROM implements both the next-state and output functions. Inputs: the instruction register opcode field (Op5..Op0) and the state register (S3..S0). Outputs: the datapath control points (PCWrite, PCWriteCond, IorD, MemRead, MemWrite, IRWrite, MemtoReg, PCSrc, ALUOp, ALUSrcB, ALUSrcA, RegWrite, RegDst) and the next state (NS3..NS0).]
12 ROM Implementation
- ROM = "Read Only Memory": the values of the memory locations are fixed ahead of time
- A ROM can be used to implement a truth table
  - if the address is m bits, we can address 2^m entries in the ROM
  - our outputs are the bits of data that the address points to
  - m sets the "height" (2^m entries), and n is the "width", equal to the number of outputs
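A small sketch of a ROM used as a truth table, modeled as a Python list with 2^m entries of n bits each. The 1-bit full adder stored here is only an illustrative choice of truth table, not something from the lecture.

```python
# ROM with M address bits and N data bits, modeled as a list of 2**M integers.
# Illustrative contents: a 1-bit full adder.
#   address = (a, b, cin), data = (carry out, sum)
M, N = 3, 2

ROM = [0] * (2 ** M)
for addr in range(2 ** M):
    a, b, cin = (addr >> 2) & 1, (addr >> 1) & 1, addr & 1
    ROM[addr] = a + b + cin  # 2-bit result: bit 1 = carry out, bit 0 = sum


def read(addr):
    """Reading the ROM is just an indexed lookup: the address selects the outputs."""
    return ROM[addr]
```

Every input combination gets its own entry, whether or not the outputs actually differ, which is why a ROM grows as 2^m regardless of how regular the function is.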
13 ROM Implementation
- How many inputs are there? 6 bits for opcode, 4 bits for state = 10 address lines (i.e., 2^10 = 1024 different addresses)
- How many outputs are there? 16 datapath-control outputs, 4 state bits = 20 outputs
- ROM is 2^10 x 20 = 20K bits (a rather unusual size, so go for the next size chip)
- Rather wasteful, due to lots of don't-care situations: the datapath-control outputs depend only on the state, not on the opcode.
14 ROM vs PLA
- Break up the table into two parts
  - 4 state bits tell you the 16 outputs: 2^4 x 16 bits of ROM
  - 10 bits tell you the 4 next-state bits: 2^10 x 4 bits of ROM
  - Total: about 4.3K bits of ROM, a large saving over the single 20K-bit ROM
- PLA is much smaller
  - can share product terms
  - only need entries that produce an active output
  - can take into account don't cares
- Size is (#inputs x #product-terms) + (#outputs x #product-terms)
  - For this example: (10 x 17) + (20 x 17) = 510 PLA cells
- A PLA cell is usually about the size of a ROM cell (slightly bigger)
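The size comparison can be reproduced with a few lines of arithmetic. The figures (10 inputs, 16 control outputs, 4 state bits, 17 product terms) are the ones quoted in the slides; the function names are hypothetical.

```python
def rom_bits(address_bits, data_bits):
    """A ROM stores one data word for every possible address."""
    return (2 ** address_bits) * data_bits


def pla_cells(inputs, outputs, product_terms):
    """PLA size: AND plane (inputs x terms) plus OR plane (outputs x terms)."""
    return inputs * product_terms + outputs * product_terms


single_rom = rom_bits(10, 20)             # one ROM: opcode + state -> all 20 outputs
output_rom = rom_bits(4, 16)              # state only -> 16 control outputs
next_state_rom = rom_bits(10, 4)          # opcode + state -> 4 next-state bits
split_rom = output_rom + next_state_rom
pla = pla_cells(10, 20, 17)
```

This gives 20480 bits for the single ROM, 256 + 4096 = 4352 bits for the split ROMs, and 510 cells for the PLA.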
15 Alternative to FSM for Multi-cycle?
- MIPS-lite has (about) 7 instructions, 10 FSM states
- Real machines have 100 or more instructions; real controllers have hundreds, or even thousands, of states!
- Problem: the FSM bubble diagram becomes too large
16 Observation about real machines
- Machine language: the next instruction to be executed is usually implied
  - the PC register determines the instruction
  - the next instruction is always at PC+4 (unless there is a branch or jump)
- FSM controller: often only one exit arc from the current state to the next state
- Suppose we borrow the idea from machine language and represent each control step as some kind of "instruction"?
- This leads to microprogrammed control
[Figure: a chain of states n, n+1, n+2, n+3.]
17 Micro-programmed Control
- In microprogrammed control, FSM states become microinstructions of a microprogram ("microcode")
  - one FSM state = one microinstruction
  - each microinstruction is usually represented textually, like an assembly instruction
- The FSM current state register becomes the microprogram counter (micro-PC)
  - normal sequencing: add 1 to the micro-PC to get the next microinstruction
  - microprogram branch: separate logic determines the next microinstruction
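The sequencing idea can be sketched as a tiny interpreter over a microprogram. The labels, the asserted signals, the three sequencing values ("seq", "dispatch", "fetch"), and the dispatch table below are simplified assumptions for illustration, not the lecture's actual microcode.

```python
# Each microinstruction: (label, control signals asserted, sequencing).
# Contents are illustrative only; real microcode carries more fields.
MICROPROGRAM = [
    ("Fetch",    ["MemRead", "IRWrite", "PCWrite"], "seq"),
    ("Decode",   [],                                "dispatch"),
    ("Mem1",     ["ALUSrcA"],                       "seq"),
    ("LW2",      ["MemRead", "IorD"],               "seq"),
    ("LW3",      ["RegWrite", "MemtoReg"],          "fetch"),
    ("Rformat1", ["ALUSrcA"],                       "seq"),
    ("Rformat2", ["RegWrite", "RegDst"],            "fetch"),
]

LABELS = {label: i for i, (label, _, _) in enumerate(MICROPROGRAM)}

# Microprogram branch: separate logic maps the opcode to a target label.
DISPATCH = {"lw": "Mem1", "add": "Rformat1"}


def run_instruction(opcode):
    """Return the labels of the microinstructions executed for one instruction."""
    upc = LABELS["Fetch"]  # micro-PC
    trace = []
    while True:
        label, _signals, sequencing = MICROPROGRAM[upc]
        trace.append(label)
        if sequencing == "seq":         # normal sequencing: micro-PC + 1
            upc += 1
        elif sequencing == "dispatch":  # branch chosen by the opcode
            upc = LABELS[DISPATCH[opcode]]
        else:                           # "fetch": instruction done, back to Fetch
            return trace
```

In this sketch `run_instruction("lw")` visits five microinstructions and `run_instruction("add")` visits four, mirroring one microinstruction per FSM state.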
18 Microprogramming vs Hardwired Control
- Microprogramming offers flexibility for design and architectural changes. The control memory (ROM) can be reprogrammed or replaced.
- Hardwired control is difficult to design for a complex instruction set architecture. Once it is designed, no further change is possible.
- Microprogramming is slow because the control memory is accessed in every cycle, and memory access is slow.
- Hardwired control is fast because the cycle time depends on the combinational logic delay of the control unit, which is much less than the memory access time.
19 Microprogramming
What are the "microinstructions"?
20 Microprogramming
- A specification methodology
  - appropriate if there are hundreds of opcodes, modes, cycles, etc.
  - signals are specified symbolically using microinstructions
- Will two implementations of the same architecture have the same microcode?
- What would a microassembler do?
22 Maximally vs. Minimally Encoded
- No encoding:
  - 1 bit for each datapath operation
  - faster, requires more memory (logic)
  - used for the VAX 780: an astonishing 400K of memory!
- Lots of encoding:
  - send the microinstructions through logic to get control signals
  - uses less memory, slower
- Historical context of CISC:
  - Too much logic to put on a single chip with everything else
  - Use a ROM (or even RAM) to hold the microcode
  - It's easy to add new instructions
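The trade-off can be illustrated with one small control field. The sketch assumes four mutually exclusive datapath operations with hypothetical names; a real microinstruction has many such fields.

```python
# Four mutually exclusive datapath operations (hypothetical names).
OPS = ["add", "sub", "and", "or"]


def unencoded_word(op):
    """No encoding: one stored bit per operation, driving the control lines directly."""
    return [1 if name == op else 0 for name in OPS]


def encoded_field(op):
    """Lots of encoding: store only the index of the operation."""
    return OPS.index(op)


def decode(field):
    """Extra decoder logic turns the stored index back into one line per operation."""
    return [1 if i == field else 0 for i in range(len(OPS))]


unencoded_bits = len(OPS)                   # bits stored per microinstruction
encoded_bits = (len(OPS) - 1).bit_length()  # fewer bits, but a decoder on the path
```

Both forms drive the same control lines; the encoded one stores 2 bits instead of 4 but adds a decode step between the control store and the datapath.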
23 Microcode: Trade-offs
- The distinction between specification and implementation is sometimes blurred
- Specification advantages:
  - Easy to design and write
  - Design architecture and microcode in parallel
- Implementation (off-chip ROM) advantages:
  - Easy to change since values are in memory
  - Can emulate other architectures
  - Can make use of internal registers
- Implementation disadvantages, SLOWER now that:
  - Control is implemented on the same chip as the processor
  - ROM is no longer faster than RAM
  - No need to go back and make changes
24 Historical Perspective
- In the '60s and '70s microprogramming was very important for implementing machines
- This led to more sophisticated ISAs and the VAX
- In the '80s RISC processors based on pipelining became popular
- Pipelining the microinstructions is also possible!
- Implementations of IA-32 architecture processors since the 486 use:
  - "hardwired control" for simpler instructions (few cycles, FSM control implemented using a PLA or random logic)
  - "microcoded control" for more complex instructions (large numbers of cycles, central control store)
- The IA-64 architecture uses a RISC-style ISA and can be implemented without a large central control store
25 Pentium 4
- Somewhere in all that "control" we must handle complex instructions
- The processor executes simple microinstructions, 70 bits wide (hardwired)
- 120 control lines for the integer datapath (400 for floating point)
- If an instruction requires more than 4 microinstructions to implement, control comes from the microcode ROM (8000 microinstructions)
- It's complicated!
26 Chapter 5 Summary
- If we understand the instructions... we can build a simple processor!
- If instructions take different amounts of time, multi-cycle is better
- Datapath implemented using:
  - Combinational logic for arithmetic
  - State-holding elements to remember bits
- Control implemented using:
  - Combinational logic for the single-cycle implementation
  - Finite state machine for the multi-cycle implementation
27 Pipelining (Chap 6)
- Techniques illustrated in chapter 5 are at the heart of every computer
- All recent computers, however, go beyond the techniques of chapter 5 and use pipelining to improve performance
- By overlapping execution of multiple instructions, pipelining can achieve:
  - throughput close to 1 instruction per clock cycle (like a single-cycle machine)
  - a clock cycle time determined by the delay of individual datapath components (like a multi-cycle machine)