Presentation on theme: "CS1104: Computer Organisation School of Computing National University of Singapore."— Presentation transcript:
CS1104: Computer Organisation http://www.comp.nus.edu.sg/~cs1104 School of Computing National University of Singapore
CS1104-P2-6Processor: Datapath and Control2 PII Lecture 6: Processor: Datapath and Control. Outline: Datapath (Single-bus Organization; Multiple-bus Organization). MIPS: Multicycle Datapath and Control (Stages of Instructions; Datapath Walkthroughs). Processor and Logic Design.
CS1104-P2-6Processor: Datapath and Control3 PII Lecture 6: Processor: Datapath and Control Reading: Chapter 9 of textbook, which is Chapter 7 in “Computer Organization” by Hamacher, Vranesic and Zaky. Optional reading: Chapter 5 in “Computer Organization & Design” by Patterson and Hennessy.
CS1104-P2-6Processor: Datapath and Control4 Datapath
CS1104-P2-6Processor: Datapath and Control5 Recap: Organisation [Figure: block diagram of a computer with the labels Processor, Control, Datapath, Registers, Cache, Memory, Devices (Input, Output) and Bus.]
CS1104-P2-6Processor: Datapath and Control6 Fundamental Concepts Processor (CPU): the active part of the computer, which does all the work (data manipulation and decision-making). Datapath: portion of the processor which contains hardware necessary to perform all operations required by the computer (the brawn). Control: portion of the processor (also in hardware) which tells the datapath what needs to be done (the brain).
CS1104-P2-6Processor: Datapath and Control7 Fundamental Concepts (2) Instruction execution cycle: fetch, decode, execute. Fetch: fetch next instruction (using PC) from memory into IR. Decode: decode the instruction. Execute: execute instruction. [Figure: cycle of Instruction Fetch, Instruction Decode, Operand Fetch, Execute, Result Store, Next Instruction.]
CS1104-P2-6Processor: Datapath and Control8 Fundamental Concepts (3) Fetch: Fetch next instruction into IR (Instruction Register). Assume each word is 4 bytes and each instruction is stored in a word, and that the memory is byte addressable. PC (Program Counter) contains address of next instruction. IR ← [[PC]]; PC ← [PC] + 4
CS1104-P2-6Processor: Datapath and Control9 Single-bus Organization [Figure: single-bus datapath. PC, MAR, MDR, Y, Z, TEMP, IR and the general-purpose registers R0 to R(n–1) attach to the internal processor bus; MAR and MDR connect to the address and data lines of the memory bus; a MUX (Select) feeds either Y or the constant 4 to ALU input A, the bus feeds input B, and the ALU result goes to Z; the ALU control lines include Add, Sub, XOR and Carry-in; the instruction decoder and control logic generates the control signals.]
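The fetch step can be sketched in a few lines of Python. This is only an illustrative model under the slide's assumptions (4-byte words, one instruction per word, byte-addressable memory); the names `memory`, `fetch` and the instruction words are made up for the example.

```python
# Illustrative model only: the names and the stored words are assumptions.
WORD_BYTES = 4  # each instruction occupies one 4-byte word


def fetch(memory: dict, pc: int) -> tuple:
    """Return (ir, new_pc), modelling IR <- [[PC]] and PC <- [PC] + 4."""
    ir = memory[pc]                # contents of the word addressed by PC
    return ir, pc + WORD_BYTES     # byte-addressable memory, so step by 4


# Hypothetical instruction words stored at byte addresses 0, 4 and 8.
memory = {0: 0x11111111, 4: 0x22222222, 8: 0x33333333}
pc = 0
ir, pc = fetch(memory, pc)     # first instruction
ir2, pc = fetch(memory, pc)    # second instruction
```

After two fetches the PC holds the address of the third word, since it advances by one word per instruction.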
CS1104-P2-6Processor: Datapath and Control10 Instruction Execution An instruction can be executed by performing one or more of the following operations in some specified sequence: Transfer a word of data from one register to another or to the ALU (Arithmetic Logic Unit). Perform an arithmetic or a logic operation and store the result in a register. Fetch the contents of a given memory location and load them into a register. Store a word of data from a register into a given memory location.
CS1104-P2-6Processor: Datapath and Control11 Register Transfer Register to register transfer: each register Ri has two control signals: Ri_in, which loads the data on the bus into the register, and Ri_out, which places the register's contents on the bus. Example: To transfer contents of R1 to R4: Set R1_out to 1. This places contents of R1 on the bus. Set R4_in to 1. This loads data from the processor bus into R4.
CS1104-P2-6Processor: Datapath and Control12 Register Transfer (2) [Figure: input and output gating for the registers. Switches controlled by Ri_in and Ri_out connect register Ri to the internal processor bus; Y_in gates the bus into Y; a MUX (Select) feeds either Y or the constant 4 to ALU input A, and the bus feeds input B; Z_in and Z_out gate the ALU result into and out of Z.]
CS1104-P2-6Processor: Datapath and Control13 Arithmetic/Logic Operation ALU: Performs arithmetic and logic operations on its A and B inputs. To perform R3 ← [R1] + [R2]:
1. R1_out, Y_in
2. R2_out, SelectY, Add, Z_in
3. Z_out, R3_in
[Figure: same register gating diagram as on the previous slide.]
CS1104-P2-6Processor: Datapath and Control14 Arithmetic/Logic Operation (2) If there are n operations, do we need n ALU control lines? We could use encoding, which requires only ⌈log2 n⌉ control lines for n operations. However, this increases complexity and hardware, because an additional decoder is needed. [Figure: ALU with inputs A and B and control lines Add, Sub, XOR, ... and Carry-in.]
CS1104-P2-6Processor: Datapath and Control15 Reading a Word from Memory Move (R1), R2 /* R2 ← [[R1]] */
1. MAR ← [R1]
2. Start a Read operation on the memory bus.
3. Wait for the MFC response from the memory.
4. Load MDR from the memory bus.
5. R2 ← [MDR]
MDR has four control signals: MDR_in and MDR_out connect it to the internal processor bus; MDR_inE and MDR_outE connect it to the memory-bus data lines.
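The three control steps for R3 ← [R1] + [R2] can be traced with a small Python sketch. It assumes 32-bit registers and models the bus, Y and Z as plain variables; the function name and register values are invented for illustration.

```python
# Sketch of the three control steps on a single bus (assumed 32-bit values).
MASK32 = 0xFFFFFFFF


def add_on_single_bus(regs: dict, src1: str, src2: str, dst: str) -> None:
    # Step 1: src1_out, Y_in -> first operand travels over the bus into Y.
    bus = regs[src1]
    y = bus
    # Step 2: src2_out, SelectY, Add, Z_in -> the MUX passes Y to ALU input A,
    # the bus carries the second operand to input B, and the sum is latched in Z.
    bus = regs[src2]
    z = (y + bus) & MASK32
    # Step 3: Z_out, dst_in -> the result travels over the bus into dst.
    bus = z
    regs[dst] = bus


regs = {"R1": 5, "R2": 7, "R3": 0}   # hypothetical register contents
add_on_single_bus(regs, "R1", "R2", "R3")
```

Three steps are needed because the single bus can carry only one value per step, which is why Y and Z exist as temporary holding registers.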
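As a rough illustration of the trade-off, the sketch below computes how many encoded control lines n operations need and models the extra decoder that turns the code back into one line per operation. The operation list and function names are assumptions chosen for the example.

```python
# Illustrative only: three ALU operations named on the slide, encoded in binary.
OPS = ["Add", "Sub", "XOR"]


def encoded_lines(n_ops: int) -> int:
    """Number of control lines needed to encode n_ops operations: ceil(log2 n)."""
    return (n_ops - 1).bit_length()


def decode(code: int, n_ops: int) -> list:
    """The extra decoder: turn a binary code into one-hot operation lines."""
    return [1 if i == code else 0 for i in range(n_ops)]


lines = encoded_lines(len(OPS))             # 2 lines instead of 3
one_hot = decode(OPS.index("Sub"), len(OPS))
```

With only three operations the saving is one line, but for 16 operations the encoded form needs 4 lines rather than 16, at the cost of the decoder.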
CS1104-P2-6Processor: Datapath and Control16 Reading a Word from Memory (2) Move (R1), R2 /* R2 ← [[R1]] */ Sequence of control steps:
1. R1_out, MAR_in, Read
2. MDR_inE, WMFC
3. MDR_out, R2_in
WMFC: Wait for arrival of MFC (Memory-Function-Completed) signal. MFC: To accommodate variability in response time, the processor waits until it receives an indication that the Read/Write operation has been completed. The addressed device sets MFC to 1 to indicate this.
CS1104-P2-6Processor: Datapath and Control17 Storing a Word in Memory Move R2, (R1) /* [R1] ← [R2] */ Sequence of control steps:
1. R1_out, MAR_in
2. R2_out, MDR_in, Write
3. MDR_outE, WMFC
CS1104-P2-6Processor: Datapath and Control18 Executing a Complete Instruction Add (R3), R1 /* R1 ← [R1] + [[R3]] */ Adds the contents of a memory location pointed to by R3 to register R1. Sequence of control steps:
1. PC_out, MAR_in, Read, Select4, Add, Z_in
2. Z_out, PC_in, Y_in, WMFC
3. MDR_out, IR_in
4. R3_out, MAR_in, Read
5. R1_out, Y_in, WMFC
6. MDR_out, SelectY, Add, Z_in
7. Z_out, R1_in, End
Steps 1 – 3: Instruction fetch
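The seven control steps above can be run through a minimal Python model of the single-bus datapath. Only the signal names come from the slide; the state layout, the `pending` read buffer that stands in for the memory's delayed response, the instruction word and the register values are all assumptions made for this sketch.

```python
# Sketch only: signal names follow the slide, everything else is assumed.
STEPS = [
    ["PC_out", "MAR_in", "Read", "Select4", "Add", "Z_in"],
    ["Z_out", "PC_in", "Y_in", "WMFC"],
    ["MDR_out", "IR_in"],
    ["R3_out", "MAR_in", "Read"],
    ["R1_out", "Y_in", "WMFC"],
    ["MDR_out", "SelectY", "Add", "Z_in"],
    ["Z_out", "R1_in", "End"],
]


def run_step(state: dict, mem: dict, signals: list) -> None:
    bus = None
    alu = None
    # The single *_out signal decides which register drives the bus.
    for sig in signals:
        if sig.endswith("_out"):
            bus = state[sig[:-4]]
    # ALU input A is the constant 4 or Y; input B is the bus.
    if "Add" in signals:
        a = 4 if "Select4" in signals else state["Y"]
        alu = (a + bus) & 0xFFFFFFFF
    # *_in signals latch from the bus, except Z, which latches the ALU result.
    for sig in signals:
        if sig == "Z_in":
            state["Z"] = alu
        elif sig.endswith("_in"):
            state[sig[:-3]] = bus
    # Read starts a memory access; WMFC waits until the data is in MDR.
    if "Read" in signals:
        state["pending"] = mem[state["MAR"]]
    if "WMFC" in signals:
        state["MDR"] = state.pop("pending")


state = {"PC": 0, "MAR": 0, "MDR": 0, "IR": 0, "Y": 0, "Z": 0, "R1": 10, "R3": 100}
mem = {0: 0xADD31, 100: 32}   # hypothetical instruction word and operand
for signals in STEPS:
    run_step(state, mem, signals)
```

In this run the PC advances to 4 during the fetch, IR receives the instruction word, and R1 ends up holding 10 + 32 = 42. Note how step 1 overlaps the PC increment with the memory read, which is why the fetch takes only three steps.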
CS1104-P2-6Processor: Datapath and Control19 Multiple-Bus Organization Single-bus structure: Control sequences are long as only one data item can be transferred over the bus in a clock cycle. Figure on next slide shows a three-bus structure. All registers are combined into a single block called register file with three ports: 2 outputs allowing 2 registers to be accessed simultaneously and have their contents put on buses A and B, and 1 input allowing data on bus C to be loaded into a third register. Buses A and B are used to transfer source operands to the A and B inputs of ALU, and result transferred to destination over bus C.
CS1104-P2-6Processor: Datapath and Control20 Multiple-Bus Organization (2) [Figure: three-bus datapath with the labels Bus A, Bus B, Bus C, PC, Incrementer, Register file, Constant 4, MUX, ALU (inputs A and B, output R), IR, Instruction decoder, MDR, MAR, memory-bus data lines and address lines.]
CS1104-P2-6Processor: Datapath and Control21 Multiple-Bus Organization (3) For the ALU, R=A (or R=B) means that its A (or B) input is passed unmodified to bus C. Add R4, R5, R6 /* R6 ← [R4] + [R5] */ Adds the contents of R4 and R5 and places the sum in R6. Sequence of control steps:
1. PC_out, R=B, MAR_in, Read, IncPC
2. WMFC
3. MDR_outB, R=B, IR_in
4. R4_outA, R5_outB, SelectA, Add, R6_in, End
CS1104-P2-6Processor: Datapath and Control22 Control Hardwired control or microprogrammed control. Hardwired control: [Figure: a decoder/encoder block generates the control signals from the control step counter (driven by the clock CLK), the IR, the external inputs and the condition codes.]
CS1104-P2-6Processor: Datapath and Control23 Control (2) Microprogrammed control: Control signals generated by a program. Control word (CW) is a microinstruction that contains individual bits that represent the various control signals. Vertical organization: highly encoded schemes that use compact codes to specify only a small number of control functions in each microinstruction. Horizontal organization: minimally encoded scheme in which many resources can be controlled with a single microinstruction. Microprogrammed control is popular in Complex Instruction Set Computer (CISC) architectures because complex instruction sets require complex controllers, which are more easily implemented as microprograms.
CS1104-P2-6Processor: Datapath and Control24 Control (3) Example of a horizontal organization scheme:
1. PC_out, MAR_in, Read, Select4, Add, Z_in
2. Z_out, PC_in, Y_in, WMFC
3. MDR_out, IR_in
4. R3_out, MAR_in, Read
5. R1_out, Y_in, WMFC
6. MDR_out, SelectY, Add, Z_in
7. Z_out, R1_in, End
Each micro-instruction has one bit per control signal (Select = 0: SelectY; Select = 1: Select4):
Micro-instruction  PC_in  PC_out  MAR_in  Read  MDR_out  IR_in  Y_in  Select  Add  Z_in  Z_out  R1_out  R1_in  R3_out  WMFC  End
1                  0      1       1       1     0        0      0     1       1    1     0      0       0      0       0     0
2                  1      0       0       0     0        0      1     0       0    0     1      0       0      0       1     0
3                  0      0       0       0     1        1      0     0       0    0     0      0       0      0       0     0
4                  0      0       1       1     0        0      0     0       0    0     0      0       0      1       0     0
5                  0      0       0       0     0        0      1     0       0    0     0      1       0      0       1     0
6                  0      0       0       0     1        0      0     0       1    1     0      0       0      0       0     0
7                  0      0       0       0     0        0      0     0       0    0     1      0       1      0       0     1
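One way to see how a horizontal micro-instruction is formed is to generate the control words from the seven control steps. The sketch below does that in Python; the ordering of the bit positions is simply a choice made for this example, and `Select` is 1 for Select4 and 0 for SelectY as on the slide.

```python
# Sketch: build one 16-bit control word per control step.
# The bit order below is an assumption made for this example.
SIGNALS = ["PC_in", "PC_out", "MAR_in", "Read", "MDR_out", "IR_in", "Y_in",
           "Select", "Add", "Z_in", "Z_out", "R1_out", "R1_in", "R3_out",
           "WMFC", "End"]

STEPS = [
    {"PC_out", "MAR_in", "Read", "Select", "Add", "Z_in"},  # Select=1: Select4
    {"Z_out", "PC_in", "Y_in", "WMFC"},
    {"MDR_out", "IR_in"},
    {"R3_out", "MAR_in", "Read"},
    {"R1_out", "Y_in", "WMFC"},
    {"MDR_out", "Add", "Z_in"},                             # Select=0: SelectY
    {"Z_out", "R1_in", "End"},
]


def control_word(active: set) -> str:
    """One bit per signal: '1' if the signal is asserted in this step."""
    return "".join("1" if name in active else "0" for name in SIGNALS)


def column(name: str) -> str:
    """Value of one signal across steps 1 to 7."""
    return "".join("1" if name in step else "0" for step in STEPS)


words = [control_word(step) for step in STEPS]
```

Each word is mostly zeros, which is the cost of the horizontal scheme: no decoding is needed, but the micro-instructions are wide.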
CS1104-P2-6Processor: Datapath and Control25 MIPS: Multicycle Datapath and Control Adapted from D. Patterson’s CS61C http://www.cs.berkeley.edu/~pattrsn/61CF00 Copyright 2000 UCB
CS1104-P2-6Processor: Datapath and Control26 Stages of a Datapath Problem: a single, atomic block which “executes an instruction” (performs all necessary operations beginning with fetching the instruction) would be too bulky and inefficient. Solution: break up the process of “executing an instruction” into stages, and then connect the stages to create the whole datapath. Smaller stages are easier to design. Easy to optimize (change) one stage without touching the others.
CS1104-P2-6Processor: Datapath and Control27 Stages of a Datapath (2) There is a wide variety of MIPS instructions: so what general steps do they have in common? Stages: 1. Instruction Fetch; 2. Instruction Decode; 3. ALU; 4. Memory Access; 5. Register Write.
CS1104-P2-6Processor: Datapath and Control28 Stages of a Datapath (3) Stage 1: Instruction Fetch. No matter what the instruction is, the 32-bit instruction word must first be fetched from memory (the cache-memory hierarchy). Also, this is where we increment PC (that is, PC = PC + 4, to point to the next instruction; byte addressing so + 4).
CS1104-P2-6Processor: Datapath and Control29 Stages of a Datapath (4) Stage 2: Instruction Decode Upon fetching the instruction, we next gather data from the fields (decode all necessary instruction data). First, read the opcode to determine instruction type and field lengths. Second, read in data from all necessary registers. For add, read two registers. For addi, read one register. For jal, no read necessary.
CS1104-P2-6Processor: Datapath and Control30 Stages of a Datapath (5) Stage 3: ALU (Arithmetic-Logic Unit) The real work of most instructions is done here: arithmetic (+, -, *, /), shifting, logic (&, |), comparisons (slt). What about loads and stores? For lw $t0, 40($t1), the address we are accessing in memory is the value in $t1 plus the value 40. We do this addition at this stage.
CS1104-P2-6Processor: Datapath and Control31 Stages of a Datapath (6) Stage 4: Memory Access Only the load and store instructions do anything during this stage; all other instructions remain idle. Since loads and stores have this unique step, we need the extra stage to account for them. As a result of the cache system, this stage is expected to be just as fast (on average) as the others.
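The address calculation for a load or store is a single addition, as the following Python sketch shows. It relies on the standard MIPS rule that the 16-bit offset is sign-extended before being added to the base register; the function name and the base value are assumptions for illustration.

```python
# Sketch of the stage 3 address calculation for lw/sw (assumed helper name).
def effective_address(base: int, offset: int) -> int:
    """Base register value plus sign-extended 16-bit offset, wrapped to 32 bits."""
    offset &= 0xFFFF
    if offset & 0x8000:            # negative offset: sign-extend
        offset -= 0x10000
    return (base + offset) & 0xFFFFFFFF


t1 = 0x1000                           # hypothetical contents of $t1
addr = effective_address(t1, 40)      # lw $t0, 40($t1)
below = effective_address(t1, -4)     # a negative offset also works
```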
CS1104-P2-6Processor: Datapath and Control32 Stages of a Datapath (7) Stage 5: Register Write Most instructions write the result of some computation into a register. Examples: arithmetic, logical, shifts, loads, slt What about stores, branches, jumps? They do not write anything into a register at the end. These remain idle during this fifth stage.
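CS1104-P2-6Processor: Datapath and Control33 Datapath: Generic Steps [Figure: generic datapath. PC (with +4) and instruction memory form stage 1, Instruction Fetch; the register file (ports rs, rt, rd) and the immediate field imm form stage 2, Decode/Register Read; the ALU is stage 3, Execute; data memory is stage 4, Memory; the write back to the registers is stage 5, Reg. Write.]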
CS1104-P2-6Processor: Datapath and Control34 Datapath Walkthroughs: add add $r3,$r1,$r2 # r3 = r1+r2 Stage 1: Fetch this instruction, increment PC. Stage 2: Decode to find that it is an add instruction, then read registers $r1 and $r2. Stage 3: Add the two values retrieved in stage 2. Stage 4: Idle (nothing to write to memory). Stage 5: Write result of stage 3 into register $r3.
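CS1104-P2-6Processor: Datapath and Control35 Datapath Walkthroughs: add (2) [Figure: generic datapath annotated for add r3, r1, r2. The register ports are set to 1, 2 and 3; the two register values go to the ALU, which computes reg + reg; the result is written back to the register file.]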
CS1104-P2-6Processor: Datapath and Control36 Datapath Walkthroughs: slti slti $r3,$r1,17 Stage 1: Fetch this instruction, increment PC. Stage 2: Decode to find it is an slti, then read register $r1. Stage 3: Compare value retrieved in stage 2 with the integer 17. Stage 4: Go idle. Stage 5: Write the result of stage 3 in register $r3.
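CS1104-P2-6Processor: Datapath and Control37 Datapath Walkthroughs: slti (2) [Figure: generic datapath annotated for slti r3, r1, 17. The register ports are set to 1, 3 and x (unused); the register value and the immediate 17 go to the ALU, which computes reg - 17 for the comparison; the result is written back to the register file.]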
CS1104-P2-6Processor: Datapath and Control38 Datapath Walkthroughs: sw sw $r3, 20($r1) Stage 1: Fetch this instruction, increment PC. Stage 2: Decode to find it is an sw, then read registers $r1 and $r3. Stage 3: Add 20 to value in register $r1 (retrieved in stage 2). Stage 4: Write value in register $r3 (retrieved in stage 2) into memory address computed in stage 3. Stage 5: Go idle (nothing to write into a register).
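CS1104-P2-6Processor: Datapath and Control39 Datapath Walkthroughs: sw (2) [Figure: generic datapath annotated for sw r3, 20(r1). The register ports are set to 1, 3 and x (unused); the ALU computes reg + 20 from the register value and the immediate 20; data memory performs MEM[r1+20] <- r3.]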
CS1104-P2-6Processor: Datapath and Control40 Why Five Stages? Could we have a different number of stages? Yes, and other architectures do. So why does MIPS have five stages, if instructions tend to go idle for at least one stage? There is one instruction that uses all five stages: the load.
CS1104-P2-6Processor: Datapath and Control41 Datapath Walkthroughs: lw lw $r3, 40($r1) Stage 1: Fetch this instruction, increment PC. Stage 2: Decode to find it is a lw, then read register $r1. Stage 3: Add 40 to value in register $r1 (retrieved in stage 2). Stage 4: Read value from memory address computed in stage 3. Stage 5: Write value found in stage 4 into register $r3.
CS1104-P2-6Processor: Datapath and Control42 Datapath Walkthroughs: lw (2) [Figure: generic datapath annotated for lw r3, 40(r1). The register ports are set to 1, 3 and x (unused); the ALU computes reg + 40 from the register value and the immediate 40; data memory performs r3 <- MEM[r1+40], and the loaded value is written back to the register file.]
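The four walkthroughs (add, slti, sw, lw) can be combined into one small Python interpreter that records which of the five stages each instruction actually uses. This is a behavioural sketch, not a hardware model: the dictionary-based instruction format, the stage labels IF/ID/EX/MEM/WB and the register and memory contents are assumptions made for the example.

```python
# Behavioural sketch of the five stages; data layout and names are assumed.
def run(instr: dict, regs: dict, mem: dict, pc: int):
    op = instr["op"]
    used = ["IF", "ID", "EX"]
    # Stage 1: fetch (the instruction is passed in) and PC = PC + 4.
    pc += 4
    # Stage 2: decode and read the registers this instruction needs.
    a = regs[instr["rs"]]
    b = regs[instr["rt"]] if op in ("add", "sw") else None
    # Stage 3: ALU.
    if op == "add":
        alu = a + b
    elif op == "slti":
        alu = 1 if a < instr["imm"] else 0
    elif op in ("lw", "sw"):
        alu = a + instr["imm"]          # effective address
    else:
        raise ValueError(f"unsupported instruction: {op}")
    # Stage 4: memory access (loads and stores only).
    result = alu
    if op == "lw":
        result = mem[alu]
        used.append("MEM")
    elif op == "sw":
        mem[alu] = b
        used.append("MEM")
    # Stage 5: register write (everything here except the store).
    if op != "sw":
        dest = instr["rd"] if op == "add" else instr["rt"]
        regs[dest] = result
        used.append("WB")
    return pc, used


regs = {"r1": 100, "r2": 5, "r3": 0}   # hypothetical register contents
mem = {140: 77}                         # hypothetical data memory
pc = 0
pc, add_stages = run({"op": "add", "rd": "r3", "rs": "r1", "rt": "r2"}, regs, mem, pc)
pc, sw_stages = run({"op": "sw", "rt": "r3", "rs": "r1", "imm": 20}, regs, mem, pc)
pc, slti_stages = run({"op": "slti", "rt": "r3", "rs": "r1", "imm": 17}, regs, mem, pc)
pc, lw_stages = run({"op": "lw", "rt": "r3", "rs": "r1", "imm": 40}, regs, mem, pc)
```

Only the load reports all five stages; add and slti skip the memory stage and sw skips the register write, matching the idle stages in the walkthroughs.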
CS1104-P2-6Processor: Datapath and Control43 What Hardware Is Needed? PC: a register which keeps track of address of the next instruction. General Purpose Registers Used in stages 2 (read) and 5 (write). We are currently working with 32 of these. Memory Used in stages 1 (fetch) and 4 (R/W). Cache system makes these two stages as fast as the others, on average.
CS1104-P2-6Processor: Datapath and Control44 Datapath: Summary Construct datapath based on register transfers required to perform instructions. Control part causes the right transfers to happen. [Figure: generic datapath (PC, +4, instruction memory, registers with ports rs, rt, rd, imm, ALU, data memory) with a Controller that takes the opcode and funct fields as inputs.]
CS1104-P2-6Processor: Datapath and Control45 Where is Logic Design Used? Combinational circuits are used for the ALU and other parts of the datapath. Different control signals are needed in different clock cycles and for different instructions to drive the ALU, registers and other parts of the datapath; generating them requires sequential circuits. [Figure: ALU with its ALU Control block.]
CS1104-P2-6Processor: Datapath and Control46 Where is Logic Design Used? (2) High-level view of finite state machine control. Sequential logic design can be used to assert the correct control signals at the correct times. [Figure: state diagram. Start leads to instruction fetch/decode and register fetch, which branches to one of four paths: memory access instructions, R-type instructions, branch instruction, jump instruction.]
CS1104-P2-6Processor: Datapath and Control47 Summary Datapath is the hardware that performs operations necessary to execute programs. Control instructs datapath on what to do next. Datapath needs: access to storage (general purpose registers and memory); computational ability (ALU); helper hardware (local registers and PC).
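The high-level finite state machine can be sketched as a transition function in Python. The state names, the split of fetch and decode into two states, and the mapping from mnemonics to instruction classes are assumptions for illustration; a real multicycle controller has more states inside each class.

```python
# High-level FSM sketch; state names and the opcode table are assumptions.
from enum import Enum, auto


class State(Enum):
    FETCH = auto()
    DECODE = auto()        # instruction decode and register fetch
    MEM_ACCESS = auto()    # memory access instructions
    R_TYPE = auto()
    BRANCH = auto()
    JUMP = auto()


CLASS_OF = {
    "lw": State.MEM_ACCESS, "sw": State.MEM_ACCESS,
    "add": State.R_TYPE, "slt": State.R_TYPE,
    "beq": State.BRANCH, "j": State.JUMP,
}


def next_state(state: State, opcode: str) -> State:
    if state is State.FETCH:
        return State.DECODE
    if state is State.DECODE:
        return CLASS_OF[opcode]    # the decoded opcode selects the path
    return State.FETCH             # every path returns to fetch


def states_for(opcode: str) -> list:
    """States visited while executing one instruction."""
    state, visited = State.FETCH, [State.FETCH]
    while True:
        state = next_state(state, opcode)
        if state is State.FETCH:
            return visited
        visited.append(state)


lw_path = [s.name for s in states_for("lw")]
j_path = [s.name for s in states_for("j")]
```

Because the next state depends on both the current state and the opcode, the controller has to remember where it is, which is what makes it a sequential rather than a combinational circuit.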
CS1104-P2-6Processor: Datapath and Control48 Summary (2) Five stages of datapath (executing an instruction): 1: Instruction Fetch (Increment PC) 2: Instruction Decode (Read Registers) 3: ALU (Computation) 4: Memory Access 5: Write to Registers ALL instructions must go through ALL five stages. Datapath designed in hardware.