Presentation is loading. Please wait.

Presentation is loading. Please wait.

CS61C L17 Single Cycle CPU Datapath (1) Garcia, Fall 2005 © UCB Lecturer PSOE, new dad Dan Garcia www.cs.berkeley.edu/~ddgarcia inst.eecs.berkeley.edu/~cs61c.

Similar presentations


Presentation on theme: "CS61C L17 Single Cycle CPU Datapath (1) Garcia, Fall 2005 © UCB Lecturer PSOE, new dad Dan Garcia www.cs.berkeley.edu/~ddgarcia inst.eecs.berkeley.edu/~cs61c."— Presentation transcript:

1 CS61C L17 Single Cycle CPU Datapath (1) Garcia, Fall 2005 © UCB Lecturer PSOE, new dad Dan Garcia www.cs.berkeley.edu/~ddgarcia inst.eecs.berkeley.edu/~cs61c CS61C : Machine Structures Lecture #17 Single Cycle CPU Datapath 2005-10-31 There is one handout today at the front and back of the room! Halloween plans?  Try the Castro, SF! halloweeninthecastro.com CPS today! Today 2005-10-31, from 7pm-mid ($5 donation) go at least once… Happy Halloween, everyone!

2 CS61C L17 Single Cycle CPU Datapath (2) Garcia, Fall 2005 © UCB Wed’s talk : Jim Larus, µsoft (Cal Ph.D.) “An Overview of the Singularity Project” 306 Soda Hall, Wed 2005-11-02 @ 4-5pm Dr. Larus is the author of SPIM! ftp://ftp.research.microsoft.com/pub/tr/TR-2005-135.pdf “Singularity is a research project in Microsoft Research that started with the question: what would a software platform look like if it was designed from scratch with the primary goal of dependability? Singularity is working to answer this question by building on advances in programming languages and tools to develop a new system architecture and operating system (named Singularity), with the aim of producing a more robust and dependable software platform. Singularity demonstrates the practicality of new technologies and architectural decisions, which should lead to the construction of more robust and dependable systems.”

3 CS61C L17 Single Cycle CPU Datapath (3) Garcia, Fall 2005 © UCB Review Use muxes to select among input S input bits selects 2 S inputs Each input can be n-bits wide, indep of S Implement muxes hierarchically ALU can be implemented using a mux Coupled with basic block elements N-bit adder-subtractor done using N 1- bit adders with XOR gates on input XOR serves as conditional inverter Programmable Logic Arrays are often used to implement our CL

4 CS61C L17 Single Cycle CPU Datapath (4) Garcia, Fall 2005 © UCB Anatomy: 5 components of any Computer Personal Computer Processor Computer Control (“brain”) Datapath (“brawn”) Memory (where programs, data live when running) Devices Input Output Keyboard, Mouse Display, Printer Disk (where programs, data live when not running) This week and next

5 CS61C L17 Single Cycle CPU Datapath (5) Garcia, Fall 2005 © UCB Outline of Today’s Lecture Design a processor: step-by-step Requirements of the Instruction Set Hardware components that match the instruction set requirements

6 CS61C L17 Single Cycle CPU Datapath (6) Garcia, Fall 2005 © UCB How to Design a Processor: step-by-step 1. Analyze instruction set architecture (ISA)  datapath requirements meaning of each instruction is given by the register transfers datapath must include storage element for ISA registers datapath must support each register transfer 2. Select set of datapath components and establish clocking methodology 3. Assemble datapath meeting requirements 4. Analyze implementation of each instruction to determine setting of control points that effects the register transfer. 5. Assemble the control logic

7 CS61C L17 Single Cycle CPU Datapath (7) Garcia, Fall 2005 © UCB Review: The MIPS Instruction Formats All MIPS instructions are 32 bits long. 3 formats: R-type I-type J-type The different fields are: op: operation (“opcode”) of the instruction rs, rt, rd: the source and destination register specifiers shamt: shift amount funct: selects the variant of the operation in the “op” field address / immediate: address offset or immediate value target address: target address of jump instruction optarget address 02631 6 bits26 bits oprsrtrdshamtfunct 061116212631 6 bits 5 bits oprsrt address/immediate 016212631 6 bits16 bits5 bits

8 CS61C L17 Single Cycle CPU Datapath (8) Garcia, Fall 2005 © UCB Step 1a: The MIPS-lite Subset for today ADDU and SUBU addu rd,rs,rt subu rd,rs,rt OR Immediate: ori rt,rs,imm16 LOAD and STORE Word lw rt,rs,imm16 sw rt,rs,imm16 BRANCH: beq rs,rt,imm16 oprsrtrdshamtfunct 061116212631 6 bits 5 bits oprsrtimmediate 016212631 6 bits16 bits5 bits oprsrtimmediate 016212631 6 bits16 bits5 bits oprsrtimmediate 016212631 6 bits16 bits5 bits

9 CS61C L17 Single Cycle CPU Datapath (9) Garcia, Fall 2005 © UCB Register Transfer Language RTL gives the meaning of the instructions All start by fetching the instruction {op, rs, rt, rd, shamt, funct} = MEM[ PC ] {op, rs, rt, Imm16} = MEM[ PC ] inst Register Transfers ADDUR[rd] = R[rs] + R[rt];PC = PC + 4 SUBUR[rd] = R[rs] – R[rt];PC = PC + 4 ORIR[rt] = R[rs] | zero_ext(Imm16); PC = PC + 4 LOADR[rt] = MEM[ R[rs] + sign_ext(Imm16)];PC = PC + 4 STOREMEM[ R[rs] + sign_ext(Imm16) ] = R[rt];PC = PC + 4 BEQ if ( R[rs] == R[rt] ) then PC = PC + 4 + (sign_ext(Imm16) || 00) else PC = PC + 4

10 CS61C L17 Single Cycle CPU Datapath (10) Garcia, Fall 2005 © UCB Step 1: Requirements of the Instruction Set Memory (MEM) instructions & data Registers (R: 32 x 32) read RS read RT Write RT or RD PC Extender (sign extend) Add and Sub register or extended immediate Add 4 or extended immediate to PC

11 CS61C L17 Single Cycle CPU Datapath (11) Garcia, Fall 2005 © UCB Step 2: Components of the Datapath Combinational Elements Storage Elements Clocking methodology

12 CS61C L17 Single Cycle CPU Datapath (12) Garcia, Fall 2005 © UCB Combinational Logic Elements (Building Blocks) Adder MUX ALU 32 A B Sum CarryOut 32 A B Result OP 32 A B Y Select Adder MUX ALU CarryIn

13 CS61C L17 Single Cycle CPU Datapath (13) Garcia, Fall 2005 © UCB ALU Needs for MIPS-lite + Rest of MIPS Addition, subtraction, logical OR, ==: ADDU R[rd] = R[rs] + R[rt];... SUBU R[rd] = R[rs] – R[rt];... ORIR[rt] = R[rs] | zero_ext(Imm16)... BEQ if ( R[rs] == R[rt] )... Test to see if output == 0 for any ALU operation gives == test. How? P&H also adds AND, Set Less Than (1 if A < B, 0 otherwise) ALU follows chap 5

14 CS61C L17 Single Cycle CPU Datapath (14) Garcia, Fall 2005 © UCB Administrivia Project 2 graded You have a week (2005-11-07) to regrade My wed OH this week moved to Fri @ 2p Final Exam location TBA (exam grp 14) Sat, 2005-12-17, 12:30–3:30pm ALL students are required to complete ALL of the exam (even if you aced the midterm) Same format as the midterm -3 Hours -Closed book, except for 2 study sheets + green -Leave your backpacks, books at home

15 CS61C L17 Single Cycle CPU Datapath (15) Garcia, Fall 2005 © UCB Storage Element: Idealized Memory Memory (idealized) One input bus: Data In One output bus: Data Out Memory word is selected by: Address selects the word to put on Data Out Write Enable = 1: address selects the memory word to be written via the Data In bus Clock input (CLK) The CLK input is a factor ONLY during write operation During read operation, behaves as a combinational logic block: -Address valid  Data Out valid after “access time.” Clk Data In Write Enable 32 DataOut Address

16 CS61C L17 Single Cycle CPU Datapath (16) Garcia, Fall 2005 © UCB Clk Data In Write Enable NN Data Out Storage Element: Register (Building Block) Similar to D Flip Flop except -N-bit input and output -Write Enable input Write Enable: -negated (or deasserted) (0): Data Out will not change -asserted (1): Data Out will become Data In

17 CS61C L17 Single Cycle CPU Datapath (17) Garcia, Fall 2005 © UCB Storage Element: Register File Register File consists of 32 registers: Two 32-bit output busses: busA and busB One 32-bit input bus: busW Register is selected by: RA (number) selects the register to put on busA (data) RB (number) selects the register to put on busB (data) RW (number) selects the register to be written via busW (data) when Write Enable is 1 Clock input (CLK) The CLK input is a factor ONLY during write operation During read operation, behaves as a combinational logic block: -RA or RB valid => busA or busB valid after “access time.” Clk busW Write Enable 32 busA 32 busB 555 RWRARB 32 32-bit Registers

18 CS61C L17 Single Cycle CPU Datapath (18) Garcia, Fall 2005 © UCB Step 3: Assemble DataPath meeting requirements Register Transfer Requirements  Datapath Assembly Instruction Fetch Read Operands and Execute Operation

19 CS61C L17 Single Cycle CPU Datapath (19) Garcia, Fall 2005 © UCB 3a: Overview of the Instruction Fetch Unit The common RTL operations Fetch the Instruction: mem[PC] Update the program counter: -Sequential Code: PC = PC + 4 -Branch and Jump: PC = “something else” 32 Instruction Word Address Instruction Memory PCClk Next Address Logic

20 CS61C L17 Single Cycle CPU Datapath (20) Garcia, Fall 2005 © UCB 3b: Add & Subtract R[rd] = R[rs] op R[rt] Ex.: addU rd,rs,rt Ra, Rb, and Rw come from instruction’s Rs, Rt, and Rd fields ALUctr and RegWr: control logic after decoding the instruction 32 Result ALUctr Clk busW RegWr 32 busA 32 busB 555 RwRaRb 32 32-bit Registers RsRtRd ALU oprsrtrdshamtfunct 061116212631 6 bits 5 bits Already defined register file, ALU

21 CS61C L17 Single Cycle CPU Datapath (21) Garcia, Fall 2005 © UCB Clocking Methodology Storage elements clocked by same edge Being physical devices, flip-flops (FF) and combinational logic have some delays Gates: delay from input change to output change Signals at FF D input must be stable before active clock edge to allow signal to travel within the FF, and we have the usual clock-to-Q delay “Critical path” (longest path through logic) determines length of clock period Clk........................

22 CS61C L17 Single Cycle CPU Datapath (22) Garcia, Fall 2005 © UCB Register-Register Timing: One complete cycle 32 Result ALUctr Clk busW RegWr 32 busA 32 busB 555 RwRaRb 32 32-bit Registers RsRtRd ALU Clk PC Rs, Rt, Rd, Op, Func ALUctr Instruction Memory Access Time Old ValueNew Value RegWrOld ValueNew Value Delay through Control Logic busA, B Register File Access Time Old ValueNew Value busW ALU Delay Old ValueNew Value Old ValueNew Value Old Value Register Write Occurs Here

23 CS61C L17 Single Cycle CPU Datapath (23) Garcia, Fall 2005 © UCB 3c: Logical Operations with Immediate R[rt] = R[rs] op ZeroExt[imm16] ] 32 Result ALUct r Clk busW RegWr 32 busA 32 busB 555 RwRaRb 32 32-bit Registers Rs ZeroExt Mux RtRd RegDst Mux 32 16 imm16 ALUSrc ALU 11 oprsrtimmediate 016212631 6 bits16 bits5 bits rd? immediate 0161531 16 bits 0 0 0 0 0 0 0 0 Rt? Already defined 32-bit MUX; Zero Ext? What about Rt register read??

24 CS61C L17 Single Cycle CPU Datapath (24) Garcia, Fall 2005 © UCB 3d: Load Operations R[rt] = Mem[R[rs] + SignExt[imm16]] Example: lw rt,rs,imm16 oprsrtimmediate 016212631 6 bits16 bits5 bits 32 ALUctr Clk busW RegWr 32 busA 32 busB 555 RwRaRb 32 32-bit Registers Rs RtRd RegDst Extender Mux 32 16 imm16 ALUSrc ExtOp Clk Data In WrEn 32 Adr Data Memory 32 ALU MemWr Mu x W_Src ?? Rt

25 CS61C L17 Single Cycle CPU Datapath (25) Garcia, Fall 2005 © UCB 3e: Store Operations Mem[ R[rs] + SignExt[imm16] ] = R[rt] Ex.: sw rt, rs, imm16 oprsrtimmediate 016212631 6 bits16 bits5 bits 32 ALUctr Clk busW RegWr 32 busA 32 busB 555 RwRaRb 32 32-bit Registers Rs Rt Rd RegDst Extender Mux 32 16 imm16 ALUSrc ExtOp Clk Data In WrEn 32 Adr Data Memory MemWr ALU 32 Mux W_Src

26 CS61C L17 Single Cycle CPU Datapath (26) Garcia, Fall 2005 © UCB 3f: The Branch Instruction beq rs, rt, imm16 mem[PC] Fetch the instruction from memory Equal = R[rs] == R[rt] Calculate branch condition if (Equal) Calculate the next instruction’s address -PC = PC + 4 + ( SignExt(imm16) x 4 ) else -PC = PC + 4 oprsrtimmediate 016212631 6 bits16 bits5 bits

27 CS61C L17 Single Cycle CPU Datapath (27) Garcia, Fall 2005 © UCB Datapath for Branch Operations beq rs, rt, imm16 Datapath generates condition (equal) oprsrtimmediate 016212631 6 bits16 bits5 bits 32 imm16 PC Clk 00 Adder Mux Adder 4 nPC_sel Clk busW RegWr 32 busA 32 busB 555 RwRaRb 32 32-bit Registers Rs Rt Equal? Cond PC Ext Inst Address Already MUX, adder, sign extend, zero

28 CS61C L17 Single Cycle CPU Datapath (28) Garcia, Fall 2005 © UCB Putting it All Together:A Single Cycle Datapath imm16 32 ALUctr Clk busW RegWr 32 busA 32 busB 555 RwRaRb 32 32-bit Registers Rs Rt Rd RegDst Extender Mux 32 16 imm16 ALUSrc ExtOp Mux MemtoReg Clk Data In WrEn 32 Adr Data Memory MemWr ALU Equal Instruction 0 1 0 1 0 1 Imm16RdRtRs = Adder PC Clk 00 Mux 4 nPC_sel PC Ext Adr Inst Memory

29 CS61C L17 Single Cycle CPU Datapath (29) Garcia, Fall 2005 © UCB An Abstract View of the Implementation Data Out Clk 5 RwRaRb 32 32-bit Registers Rd ALU Clk Data In Data Address Ideal Data Memory Instruction Address Ideal Instruction Memory Clk PC 5 Rs 5 Rt 32 A B Next Address Control Datapath Control Signals Conditions

30 CS61C L17 Single Cycle CPU Datapath (30) Garcia, Fall 2005 © UCB Peer Instruction A. If the destination reg is the same as the source reg, we could compute the incorrect value! B. We’re going to be able to read 2 registers and write a 3 rd in 1 cycle C. Datapath is hard, Control is easy ABC 1: FFF 2: FFT 3: FTF 4: FTT 5: TFF 6: TFT 7: TTF 8: TTT

31 CS61C L17 Single Cycle CPU Datapath (31) Garcia, Fall 2005 © UCB Peer Instruction A. Our ALU is a synchronous device B. We should use the main ALU to compute PC=PC+4 C. The ALU is inactive for memory reads or writes. ABC 1: FFF 2: FFT 3: FTF 4: FTT 5: TFF 6: TFT 7: TTF 8: TTT

32 CS61C L17 Single Cycle CPU Datapath (32) Garcia, Fall 2005 © UCB Peer Instruction A. SW can peek at HW (past ISA abstraction boundary) for optimizations B. SW can depend on particular HW implementation of ISA C. Timing diagrams serve as a critical debugging tool in the EE toolkit ABC 1: FFF 2: FFT 3: FTF 4: FTT 5: TFF 6: TFT 7: TTF 8: TTT

33 CS61C L17 Single Cycle CPU Datapath (33) Garcia, Fall 2005 © UCB Peer Instruction A. (a+b) (a+b) = b B. N-input gates can be thought of cascaded 2- input gates. I.e., (a ∆ b ∆ c ∆ d) = a ∆ (b ∆ (c ∆ d)) where ∆ is one of AND, OR, XOR, NAND C. You can use NOR(s) with clever wiring to simulate AND, OR, & NOT ABC 1: FFF 2: FFT 3: FTF 4: FTT 5: TFF 6: TFT 7: TTF 8: TTT

34 CS61C L17 Single Cycle CPU Datapath (34) Garcia, Fall 2005 © UCB A. (next slide) B. (next slide) C. You can use NOR(s) with clever wiring to simulate AND, OR, & NOT. ° NOR(a,a)= a+a = aa = a ° Using this NOT, can we make a NOR an OR? An And? ° TRUE Peer Instruction Answer A. (a+b) (a+b) = b B. N-input gates can be thought of cascaded 2- input gates. I.e., (a ∆ b ∆ c ∆ d) = a ∆ (b ∆ (c ∆ d)) where ∆ is one of AND, OR, XOR, NAND C. You can use NOR(s) with clever wiring to simulate AND, OR, & NOT ABC 1: FFF 2: FFT 3: FTF 4: FTT 5: TFF 6: TFT 7: TTF 8: TTT

35 CS61C L17 Single Cycle CPU Datapath (35) Garcia, Fall 2005 © UCB A. (a+b) (a+b) =?= b (a+b)(a+b) aa+ab+ba+bbdistribution 0+b(a+a)+bcomplimentarity, commutativity, distribution, idempotent b(1)+bidentity, complimentarity b+bidentity bidempotent Peer Instruction Answer (A) TRUE

36 CS61C L17 Single Cycle CPU Datapath (36) Garcia, Fall 2005 © UCB A. B. N-input gates can be thought of cascaded 2-input gates. I.e., (a ∆ b ∆ c ∆ d) = a ∆ (b ∆ (c ∆ d)) where ∆ is one of AND, OR, XOR, NAND…FALSE Let’s confirm! CORRECT 3-input XYZ|AND|OR|XOR|NAND 000| 0 |0 | 0 | 1 001| 0 |1 | 1 | 1 010| 0 |1 | 1 | 1 011| 0 |1 | 0 | 1 100| 0 |1 | 1 | 1 101| 0 |1 | 0 | 1 110| 0 |1 | 0 | 1 111| 1 |1 | 1 | 0 CORRECT 2-input YZ|AND|OR|XOR|NAND 00| 0 |0 | 0 | 1 01| 0 |1 | 1 | 1 10| 0 |1 | 1 | 1 11| 1 |1 | 0 | 0 0 0 0 1 0 1 1 1 0 1 0 1 0 1 1 0 0 1 0 0 1 1 1 1 Peer Instruction Answer (B)

37 CS61C L17 Single Cycle CPU Datapath (37) Garcia, Fall 2005 © UCB Peer Instruction A. Truth table for mux with 4-bits of signals has 2 4 rows B. We could cascade N 1-bit shifters to make 1 N-bit shifter for sll, srl C. If 1-bit adder delay is T, the N-bit adder delay would also be T ABC 1: FFF 2: FFT 3: FTF 4: FTT 5: TFF 6: TFT 7: TTF 8: TTT

38 CS61C L17 Single Cycle CPU Datapath (38) Garcia, Fall 2005 © UCB Peer Instruction Answer A. Truth table for mux with 4-bits of signals is 2 4 rows long B. We could cascade N 1-bit shifters to make 1 N-bit shifter for sll, srl C. If 1-bit adder delay is T, the N-bit adder delay would also be T ABC 1: FFF 2: FFT 3: FTF 4: FTT 5: TFF 6: TFT 7: TTF 8: TTT A. Truth table for mux with 4-bits of signals controls 16 inputs, for a total of 20 inputs, so truth table is 2 20 rows…FALSE B. We could cascade N 1-bit shifters to make 1 N-bit shifter for sll, srl … TRUE C. What about the cascading carry? FALSE

39 CS61C L17 Single Cycle CPU Datapath (39) Garcia, Fall 2005 © UCB °5 steps to design a processor 1. Analyze instruction set  datapath requirements 2. Select set of datapath components & establish clock methodology 3. Assemble datapath meeting the requirements 4. Analyze implementation of each instruction to determine setting of control points that effects the register transfer. 5. Assemble the control logic °Control is the hard part °Next time! Summary: Single cycle datapath Control Datapath Memory Processor Input Output


Download ppt "CS61C L17 Single Cycle CPU Datapath (1) Garcia, Fall 2005 © UCB Lecturer PSOE, new dad Dan Garcia www.cs.berkeley.edu/~ddgarcia inst.eecs.berkeley.edu/~cs61c."

Similar presentations


Ads by Google