1 1999 ©UCB CS 161 Review for Test 2
Instructor: L.N. Bhuyan (www.cs.ucr.edu/~bhuyan)
Adapted from notes by Dave Patterson (http.cs.berkeley.edu/~patterson)

2 1999 ©UCB How to Study for Test 2: Chap 5
°Single-cycle (CPI=1) processor: know how to reason about processor organization (datapath, control)
 -e.g., how to add another instruction? (may require modifying the datapath, the control, or both)
 -How to add multiplexors to the datapath
 -How to design the hardwired control unit
°Multicycle (CPI>1) processor
 -Changes to the single-cycle datapath
 -Control design through an FSM
 -How to add a new instruction to the multicycle machine?

3 1999 ©UCB Putting Together a Datapath for MIPS
[Figure: the five-step MIPS datapath — PC, instruction memory (Imem), registers, ALU, and data memory (Dmem).]
°Question: Which instruction uses which steps, and what is its execution time?

4 1999 ©UCB Datapath Timing: Single-cycle vs. Pipelined
°Suppose the following delays for the major functional units:
 -2 ns for a memory access or ALU operation
 -1 ns for a register file read or write
°Total datapath delay for single-cycle:

  Insn     Insn    Reg    ALU    Data    Reg    Total
  Type     Fetch   Read   Oper   Access  Write  Time
  beq      2 ns    1 ns   2 ns                  5 ns
  R-form   2 ns    1 ns   2 ns           1 ns   6 ns
  sw       2 ns    1 ns   2 ns   2 ns           7 ns
  lw       2 ns    1 ns   2 ns   2 ns    1 ns   8 ns

°What about the multicycle datapath?
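A small Python sketch (not from the slides; the stage names and dictionary layout are illustrative) of the timing arithmetic above: each instruction's delay is the sum of the stage delays it uses, and the single-cycle clock period must cover the slowest instruction.

```python
# Per-instruction delay = sum of the stages it uses; the single-cycle clock
# period must cover the slowest instruction, so every instruction pays 8 ns.

STAGE_DELAY_NS = {"fetch": 2, "reg_read": 1, "alu": 2, "mem": 2, "reg_write": 1}

INSTR_STAGES = {
    "beq":    ["fetch", "reg_read", "alu"],
    "R-form": ["fetch", "reg_read", "alu", "reg_write"],
    "sw":     ["fetch", "reg_read", "alu", "mem"],
    "lw":     ["fetch", "reg_read", "alu", "mem", "reg_write"],
}

def instr_delay(name):
    return sum(STAGE_DELAY_NS[stage] for stage in INSTR_STAGES[name])

for name in INSTR_STAGES:
    print(f"{name:7s} {instr_delay(name)} ns")              # 5, 6, 7, 8 ns

single_cycle_period = max(instr_delay(n) for n in INSTR_STAGES)
print("single-cycle clock period:", single_cycle_period, "ns")   # 8 ns
```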

5 1999 ©UCB Implementing Main Control
°Main Control has one 6-bit input (op) and 9 outputs: RegDst, Branch, MemRead, MemtoReg, ALUOp, MemWrite, ALUSrc, RegWrite (7 are 1-bit; ALUOp is 2 bits)
°To build Main Control as a sum of products:
 (1) Construct a minterm for each different instruction (or for R-type as a group); each minterm corresponds to a single instruction (or to all of the R-type instructions), e.g., M_R-format, M_lw
 (2) Determine each main control output by forming the logical OR of the relevant minterms (instructions), e.g., RegWrite = M_R-format OR M_lw
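Below is a hedged software sketch of the sum-of-products idea, not the gate-level PLA: exactly one minterm fires per opcode class, and each control output is the OR of the minterms that need it. The opcode encodings are the standard MIPS ones, and the signal values follow the usual single-cycle control table.

```python
OPCODES = {"R-format": 0b000000, "lw": 0b100011, "sw": 0b101011, "beq": 0b000100}

def minterms(op):
    """One-hot minterms: exactly one is True for a supported opcode."""
    return {name: (op == code) for name, code in OPCODES.items()}

def main_control(op):
    m = minterms(op)
    return {
        "RegDst":   m["R-format"],
        "ALUSrc":   m["lw"] or m["sw"],
        "MemtoReg": m["lw"],
        "RegWrite": m["R-format"] or m["lw"],   # RegWrite = M_R-format OR M_lw
        "MemRead":  m["lw"],
        "MemWrite": m["sw"],
        "Branch":   m["beq"],
        "ALUOp":    2 if m["R-format"] else (1 if m["beq"] else 0),
    }

print(main_control(OPCODES["lw"]))   # ALUSrc, MemtoReg, RegWrite, MemRead asserted
```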

6 1999 ©UCB Single-Cycle MIPS-lite CPU
[Figure: the complete single-cycle datapath — PC, Imem, register file, sign extend, ALU and ALU control (funct bits 5:0), Dmem, and the branch adder — with Main Control (op = bits [31:26]) driving RegWrite, RegDst, ALUSrc, ALUOp, MemRead, MemWrite, MemtoReg, Branch, and PCSrc.]

7 1999 ©UCB R-format Execution Illustration (step 4)
[Figure: the same single-cycle datapath with the R-format control settings highlighted — RegDst=1, ALUSrc=0, MemtoReg=1, PCSrc=0 — as the ALU computes [r1] + [r2].]

8 1999 ©UCB Multicycle Datapath (overview) — MIPS-lite Multicycle Version
[Figure: the multicycle datapath — PC, a single combined instruction/data memory, the register file, one ALU, and the temporary registers IR, MDR, A, B, and ALUOut.]
°One ALU (no extra adders)
°One memory (no separate Imem, Dmem)
°New temporary registers ("clocked"/require a clock input)

9 1999 ©UCB Cycle 3 Datapath (R-format) — MIPS-lite Multicycle Version
[Figure: in cycle 3 of an R-format instruction, the ALU inputs come from the A and B registers, the ALU control decodes the funct field (bits 5:0), and the result is latched into ALUOut: ALUOut = A op B.]

10 1999 ©UCB FSM diagram for Multicycle Machine
°Cycle 1 — state 0 (instruction fetch): MemRead, IorD=0, IRWrite, ALUSrcA=0, ALUSrcB=1, ALUOp=0, PCWrite, PCSrc=0
°Cycle 2 — state 1 (decode/register fetch): ALUSrcA=0, ALUSrcB=3, ALUOp=0; then branch on the opcode
°Cycle 3:
 -lw/sw (memory access path) — state 2: ALUSrcA=1, ALUSrcB=2, ALUOp=0
 -R-format execution — state 6: ALUSrcA=1, ALUSrcB=0, ALUOp=2
 -beq (branch completion) — state 8: ALUSrcA=1, ALUSrcB=0, ALUOp=1, PCWriteCond, PCSrc=1
°After the instruction completes, start a new instruction (return to state 0)
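A minimal Python sketch of the FSM above, assuming the state numbering shown (0, 1, 2, 6, 8); the lw/sw memory-access and write-back states are elided to keep it short, so those paths simply return to fetch here.

```python
# Each entry: asserted control signals for the state, plus a function giving
# the next state as a function of the opcode class.

FSM = {
    0: ({"MemRead": 1, "IorD": 0, "IRWrite": 1, "ALUSrcA": 0, "ALUSrcB": 1,
         "ALUOp": 0, "PCWrite": 1, "PCSrc": 0},
        lambda op: 1),                                          # fetch -> decode
    1: ({"ALUSrcA": 0, "ALUSrcB": 3, "ALUOp": 0},
        lambda op: {"lw": 2, "sw": 2, "R": 6, "beq": 8}[op]),   # dispatch on opcode
    2: ({"ALUSrcA": 1, "ALUSrcB": 2, "ALUOp": 0}, lambda op: 0),  # lw/sw address
    6: ({"ALUSrcA": 1, "ALUSrcB": 0, "ALUOp": 2}, lambda op: 0),  # R-format execute
    8: ({"ALUSrcA": 1, "ALUSrcB": 0, "ALUOp": 1,
         "PCWriteCond": 1, "PCSrc": 1},          lambda op: 0),   # branch completion
}

def step(state, opcode):
    signals, next_state = FSM[state]
    return signals, next_state(opcode)

state = 0
for _ in range(3):                      # cycles 1-3 of an R-format instruction
    signals, state = step(state, "R")
    print(signals)
```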

11 1999 ©UCB Implementing the FSM controller (C.3)
°PLA or ROM implementation of both the next-state and output functions
°Inputs: the instruction register opcode field (Op5–Op0) and the state register (S3–S0)
°Outputs: the datapath control points (PCWrite, PCWriteCond, IorD, MemRead, MemWrite, IRWrite, MemtoReg, PCSrc, ALUOp, ALUSrcB, ALUSrcA, RegWrite, RegDst) plus the next-state bits (NS3–NS0)

12 1999 ©UCB Micro-programmed Control (Chap. 5.5)
°In microprogrammed control, FSM states become microinstructions of a microprogram ("microcode")
 -one FSM state = one microinstruction
 -each microinstruction is usually represented textually, like an assembly instruction
°The FSM current-state register becomes the microprogram counter (micro-PC)
 -normal sequencing: add 1 to the micro-PC to get the next microinstruction
 -microprogram branch: separate dispatch logic determines the next microinstruction

13 1999 ©UCB Micro-program for Multi-cycle Machine

            ALU                 Reg    Memory         PC      Next
  Label     Op     In1   In2    File   Op    Src      Write   µ-Instr
  -------------------------------------------------------------------
  Fetch:    Add    PC    4             Rd    PC       ALU
            Add    PC    SE*4   Rd                            [D1]
  Mem:      Add    A     SE                                   [D2]
  LW:                                  Rd    ALU
                                 Wr                           Fetch
  SW:                                  Wr    ALU              Fetch
  Rform:    funct  A     B       Wr                           Fetch
  BEQ:      Sub    A     B                            Equ     Fetch

  D1 = { Mem, Rform, BEQ }    D2 = { LW, SW }
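The sketch below illustrates the sequencing mechanism from slide 12 applied to this microprogram: the micro-PC either advances by 1, dispatches through D1 or D2 on the opcode, or returns to Fetch. The textual microinstruction fields and dispatch-table indices are illustrative simplifications, not a hardware encoding.

```python
# MICROCODE rows: (label, textual microinstruction fields, sequencing control).
# next_upc implements the sequencing rules: sequential, dispatch, or back to Fetch.

MICROCODE = [
    ("Fetch", "ALU: Add PC 4;   Mem: Rd PC;   PCWrite <- ALU", "seq"),        # 0
    ("",      "ALU: Add PC SE*4;   Reg: Rd",                   "dispatch1"),  # 1
    ("Mem",   "ALU: Add A SE",                                 "dispatch2"),  # 2
    ("LW",    "Mem: Rd ALU",                                   "seq"),        # 3
    ("",      "Reg: Wr",                                       "fetch"),      # 4
    ("SW",    "Mem: Wr ALU",                                   "fetch"),      # 5
    ("Rform", "ALU: funct A B;   Reg: Wr",                     "fetch"),      # 6
    ("BEQ",   "ALU: Sub A B;   PCWrite if Equ",                "fetch"),      # 7
]
DISPATCH1 = {"lw": 2, "sw": 2, "R": 6, "beq": 7}   # D1 = { Mem, Rform, BEQ }
DISPATCH2 = {"lw": 3, "sw": 5}                     # D2 = { LW, SW }

def next_upc(upc, sequencing, opcode):
    if sequencing == "seq":       return upc + 1
    if sequencing == "dispatch1": return DISPATCH1[opcode]
    if sequencing == "dispatch2": return DISPATCH2[opcode]
    return 0                      # "fetch": back to the Fetch microinstruction

upc, opcode = 0, "lw"
for _ in range(5):                # the five microinstructions of a lw
    label, fields, seqctl = MICROCODE[upc]
    print(upc, label or "-", fields)
    upc = next_upc(upc, seqctl, opcode)
```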

14 1999 ©UCB How to Study for Test 2: Chap 6
°Pipelined processor: how do the pipelined datapath and control differ from the Chapter 5 architectures?
 -All instructions execute the same 5 stages
 -Pipeline registers separate the stages of the datapath and control
°Problems for pipelining
 -Pipeline hazards: structural, data, control (how is each solved?)

15 1999 ©UCB Pipelining Lessons
[Figure: the laundry analogy — four tasks A–D with 30-minute stages, overlapped between 6 PM and 9 PM.]
°Pipelining doesn't help the latency (execution time) of a single task; it helps the throughput of the entire workload
°Multiple tasks operate simultaneously using different resources
°Potential speedup = number of pipe stages
°What is the real speedup?
°Time to "fill" the pipeline and time to "drain" it reduce the speedup

16 1999 ©UCB Space-Time Diagram
°To simplify the pipeline, every instruction takes the same number of steps, called stages
°One clock cycle per stage
[Figure: space-time diagram — five instructions flowing through the IFtch, Dcd, Exec, Mem, WB stages, each shifted one clock cycle later than the previous one.]

17 1999 ©UCB Problems for Pipelining
°Hazards prevent the next instruction from executing during its designated clock cycle, limiting speedup
 -Structural hazards: the hardware cannot support this combination of instructions (a single person to fold and put clothes away)
 -Control hazards: conditional branches and other instructions may stall the pipeline, delaying later instructions (must check the detergent level before washing the next load)
 -Data hazards: an instruction depends on the result of a prior instruction still in the pipeline (matching socks in a later load)

18 1999 ©UCB Control Hazard: Solution 1
°Guess the branch outcome, then back up if wrong: "branch prediction"
 -For example, predict not taken
 -Impact: 1 clock per branch instruction if right, 2 if wrong (static prediction: right ~50% of the time)
 -More dynamic scheme: keep a history for each branch instruction (right ~90% of the time)
[Figure: pipeline diagram for add, beq, and a load — the instruction after the branch is fetched before the branch outcome is known.]
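A small sketch of the "keep history" idea, using a 2-bit saturating counter per branch indexed by the low PC bits; the table size and indexing scheme are assumptions for illustration, not something the slide specifies.

```python
class TwoBitPredictor:
    """One 2-bit saturating counter per table entry; a count >= 2 predicts taken."""

    def __init__(self, entries=256):
        self.counters = [1] * entries            # start "weakly not taken"

    def _index(self, pc):
        return (pc >> 2) % len(self.counters)    # word-aligned PC indexes the table

    def predict(self, pc):
        return self.counters[self._index(pc)] >= 2

    def update(self, pc, taken):
        i = self._index(pc)
        self.counters[i] = min(3, self.counters[i] + 1) if taken \
                           else max(0, self.counters[i] - 1)

predictor = TwoBitPredictor()
hits = 0
for taken in [True, True, False, True, True]:    # a mostly-taken branch
    hits += predictor.predict(0x400048) == taken
    predictor.update(0x400048, taken)
print(f"{hits}/5 predictions correct")           # accuracy improves as history builds
```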

19 1999 ©UCB Control Hazard: Solution 2
°Redefine branch behavior (the branch takes effect after the next instruction): "delayed branch"
°Impact: 1 clock cycle per branch instruction if the compiler can find an instruction to put in the "delay slot" (about 50% of the time)
[Figure: pipeline diagram for add, beq, a miscellaneous instruction in the delay slot, and a load.]

20 1999 ©UCB Data Hazard on $1: Illustration
°Dependencies backwards in time are hazards

  add $1,$2,$3
  sub $4,$1,$3
  and $6,$1,$7
  or  $8,$1,$9
  xor $10,$1,$11

[Figure: pipeline diagram (IF, ID/RF, EX, MEM, WB) — sub, and, and or read $1 before the add writes it back.]

21 1999 ©UCB Data Hazard: Solution
°"Forward" the result from one stage to another
°The "or" is OK if the register file is implemented properly (write in the first half of the cycle, read in the second half)

  add $1,$2,$3
  sub $4,$1,$3
  and $6,$1,$7
  or  $8,$1,$9
  xor $10,$1,$11

[Figure: the same pipeline diagram with the add's result forwarded to the EX stages of the dependent instructions.]
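A hedged sketch of the forwarding decision for the ALU's first operand, following the usual textbook conditions (the EX/MEM result takes priority over MEM/WB); the pipeline-register fields are plain dictionaries here, purely for illustration.

```python
def forward_a(id_ex, ex_mem, mem_wb):
    """Decide where the ALU's first (rs) operand comes from this cycle."""
    rs = id_ex["rs"]
    if ex_mem["reg_write"] and ex_mem["rd"] != 0 and ex_mem["rd"] == rs:
        return "EX/MEM.ALUOut"      # result computed last cycle, not yet written back
    if mem_wb["reg_write"] and mem_wb["rd"] != 0 and mem_wb["rd"] == rs:
        return "MEM/WB.WriteData"   # result being written back this cycle
    return "ID/EX.ReadData1"        # no hazard: use the register-file value

# add $1,$2,$3 followed by sub $4,$1,$3: sub's rs ($1) matches add's rd in EX/MEM
print(forward_a({"rs": 1},
                {"reg_write": True,  "rd": 1},
                {"reg_write": False, "rd": 0}))   # -> EX/MEM.ALUOut
```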

22 1999 ©UCB Data Hazard Even with Forwarding
°Must stall the pipeline 1 cycle (insert 1 bubble)

  lw  $1, 0($2)
  sub $4,$1,$6
  and $6,$1,$7
  or  $8,$1,$9

[Figure: pipeline diagram — the lw data is not available until MEM, so the dependent sub must wait one cycle (a bubble) even with forwarding.]
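And the matching load-use check, as a short sketch: if the instruction in EX is a load whose destination is a source of the instruction in ID, forwarding cannot help, so the pipeline stalls one cycle. The field names are again illustrative.

```python
def need_stall(id_ex, if_id):
    # Stall if the load's destination (rt) is a source of the decoding instruction.
    return id_ex["mem_read"] and id_ex["rt"] in (if_id["rs"], if_id["rt"])

# lw $1,0($2) in EX, sub $4,$1,$6 in ID  ->  one bubble needed
print(need_stall({"mem_read": True, "rt": 1}, {"rs": 1, "rt": 6}))   # True
```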

23 1999 ©UCB How to Study for Test 2: Chap 7
°Processor-memory performance gap: a problem for hardware designers and software developers alike
°Memory hierarchy — the goal: create the illusion of a single large, fast memory
 -Accesses that hit in the highest level are processed most quickly
 -Exploit the principle of locality to obtain a high hit rate
°Caches vs. virtual memory: how are they similar? different?

24 1999 ©UCB Memory Hierarchy: Terminology
°Hit Time: time to access the upper level, which consists of the time to determine hit/miss + the memory access time
°Miss Penalty: time to replace a block in the upper level + time to deliver the block to the processor
°Note: Hit Time << Miss Penalty ("<<" here means "much less than")

25 1999 ©UCB Issues with Direct-Mapped
°If the block size > 1, the rightmost bits of the index are really the offset within the indexed block

  ttttttttttttttttt  iiiiiiiiii  oooo
  tag: to check if we have the correct block
  index: to select the block
  offset: byte within the block

°Q: How do set-associative and fully-associative designs look?
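A quick sketch of splitting an address into the three fields above, assuming 32-bit addresses with 4 offset bits and 10 index bits as drawn (so the remaining 18 bits form the tag).

```python
OFFSET_BITS, INDEX_BITS = 4, 10        # 16-byte blocks, 1024 blocks

def split_address(addr):
    offset = addr & ((1 << OFFSET_BITS) - 1)
    index  = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag    = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset

print(split_address(0x00000014))       # -> (0, 1, 4): tag 0, index 1, offset 0x4
```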

26 1999 ©UCB Read from cache at offset, return word b
°Example address: 000000000000000000 0000000001 0100 (tag = 0, index = 1, offset = 0x4)
[Figure: a direct-mapped cache with 1024 entries (indices 0–1023); entry 1 is valid with tag 0 and data words a, b, c, d in bytes 0x0-3, 0x4-7, 0x8-b, 0xc-f — the offset 0x4 selects word b.]

27 1999 ©UCB Miss Rate Versus Block Size
[Figure 7.12: miss rate (0%–40%) versus block size (4 to 256 bytes) for direct-mapped caches with total sizes of 1 KB, 8 KB, 16 KB, 64 KB, and 256 KB.]

28 1999 ©UCB Compromise: N-way Set Associative Cache
°N-way set associative: N cache blocks for each cache index
 -Like having N direct-mapped caches operating in parallel
°Example: 2-way set associative cache
 -The cache index selects a "set" of 2 blocks from the cache
 -The 2 tags in the set are compared in parallel
 -Data is selected based on the tag result (whichever matched the address)
°Where is the data written? Based on the replacement policy: FIFO, LRU, or random
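A minimal sketch of a 2-way set-associative lookup with LRU replacement; the sizes and the use of Python lists in place of parallel comparators are illustrative assumptions, not a hardware design.

```python
OFFSET_BITS, INDEX_BITS, WAYS = 4, 6, 2          # 16 B blocks, 64 sets, 2-way

sets = [[] for _ in range(1 << INDEX_BITS)]      # each set: tags in LRU order (MRU last)

def access(addr):
    index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag   = addr >> (OFFSET_BITS + INDEX_BITS)
    tags  = sets[index]
    if tag in tags:                              # both tags compared "in parallel"
        tags.remove(tag); tags.append(tag)       # refresh LRU order
        return "hit"
    if len(tags) == WAYS:
        tags.pop(0)                              # evict the least recently used block
    tags.append(tag)
    return "miss"

for a in [0x0000, 0x1000, 0x2000, 0x0000]:       # three different tags, same set
    print(hex(a), access(a))                     # the final 0x0000 was evicted -> miss
```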

29 1999 ©UCB Improving Cache Performance
°In general, we want to minimize the Average Access Time:
   Average Access Time = Hit Time x (1 - Miss Rate) + Miss Penalty x Miss Rate
 (recall Hit Time << Miss Penalty)
°Generally, two ways to improve it:
 -Reduce the Miss Rate: larger block size, larger cache, higher associativity
 -Reduce the Miss Penalty: reduce DRAM latency, add an L2 cache
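A quick worked instance of the average-access-time formula exactly as written on this slide, with made-up numbers (1 ns hit time, 5% miss rate, 40 ns miss penalty).

```python
hit_time_ns, miss_rate, miss_penalty_ns = 1.0, 0.05, 40.0

avg_access_time = hit_time_ns * (1 - miss_rate) + miss_penalty_ns * miss_rate
print(f"average access time = {avg_access_time:.2f} ns")   # 2.95 ns
```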

30 1999 ©UCB Virtual Memory has its own terminology
°Each process has its own private "virtual address space" (e.g., 2^32 bytes); the CPU actually generates "virtual addresses"
°Each computer has a "physical address space" (e.g., 128 megabytes of DRAM), also called "real memory"
°Library analogy:
 -a virtual address is like the title of a book
 -a physical address is the location of the book in the library, as given by its Library of Congress call number

31 1999 ©UCB Mapping Virtual to Physical Address
[Figure: with a 1 KB page size, the virtual address splits into a virtual page number (bits 31..10) and a page offset (bits 9..0); translation replaces the virtual page number with a physical page number (bits 29..10 of the physical address) while the page offset passes through unchanged.]
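A sketch of the translation on this slide, assuming the 1 KB pages shown (10-bit page offset) and a plain dictionary standing in for the page table; the mapping values are made up and page-fault handling is omitted.

```python
PAGE_OFFSET_BITS = 10                 # 1 KB pages

page_table = {0x3FFFFF: 0x1A2B3}      # virtual page number -> physical page number (made up)

def translate(virtual_address):
    vpn    = virtual_address >> PAGE_OFFSET_BITS                # bits 31..10
    offset = virtual_address & ((1 << PAGE_OFFSET_BITS) - 1)    # bits 9..0
    ppn    = page_table[vpn]                  # page-fault handling omitted
    return (ppn << PAGE_OFFSET_BITS) | offset # offset passes through untranslated

print(hex(translate(0xFFFFFDAB)))             # -> 0x68acdab
```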

32 1999 ©UCB How to Translate Fast?
°Observation: since there is locality in pages of data, there must be locality in the virtual addresses of those pages!
°Why not create a cache of virtual-to-physical address translations to make translation fast? (smaller is faster)
°For historical reasons, such a "page table cache" is called a Translation Lookaside Buffer, or TLB
°TLB organization is the same as an Icache or Dcache: direct-mapped or set-associative

33 1999 ©UCB Access TLB and Cache in Parallel?
°Recall: address translation applies only to the virtual page number, not the page offset
°If the cache index bits of the PA "fit within" the page offset of the VA, then the index is not translated, so we can read the cache block while simultaneously accessing the TLB
°"Virtually indexed, physically tagged cache" (avoids the aliasing problem)
[Figure: the VA split into virtual page number and page offset; the cache tag comes from the translated physical page number, while the index and offset come from the untranslated page offset.]
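A small sketch of the "index fits within the page offset" condition: if the index plus block-offset bits do not exceed the page-offset bits, the cache index comes entirely from untranslated bits and the cache read can overlap the TLB lookup. The cache and page sizes below are illustrative.

```python
def can_index_in_parallel(cache_bytes, block_bytes, ways, page_bytes):
    sets = cache_bytes // (block_bytes * ways)
    index_bits       = sets.bit_length() - 1         # log2 for powers of two
    offset_bits      = block_bytes.bit_length() - 1
    page_offset_bits = page_bytes.bit_length() - 1
    return index_bits + offset_bits <= page_offset_bits

print(can_index_in_parallel(4096, 32, 1, 4096))    # True:  7 + 5 <= 12
print(can_index_in_parallel(16384, 32, 1, 4096))   # False: 9 + 5 >  12
```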

