1
Shift Instructions (1/4)
Move (shift) all the bits in a word to the left or right by a number of bits.
Example: shift right by 8 bits
Example: shift left by 8 bits
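As a small illustration, here is a sketch of both examples in C. It assumes a 32-bit unsigned word, and the function names are hypothetical.

```c
#include <stdint.h>

/* Illustrative sketch; function names are hypothetical. */

/* Shift a 32-bit word right by 8 bits: the low 8 bits fall off
   and 8 zero bits enter at the top. */
static uint32_t shift_right_8(uint32_t word) {
    return word >> 8;
}

/* Shift a 32-bit word left by 8 bits: the high 8 bits fall off
   and 8 zero bits enter at the bottom. */
static uint32_t shift_left_8(uint32_t word) {
    return word << 8;
}
```

For example, shift_right_8(0x12345678) gives 0x00123456 and shift_left_8(0x12345678) gives 0x34567800.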
2
Shift Instructions (2/4)
MIPS Shift Instruction Syntax: 1 2,3,4
where
1) operation name
2) register that will receive value
3) first operand (register)
4) shift amount (constant < 32, 5 bits)
MIPS shift instructions:
1. sll (shift left logical): shifts left and fills emptied bits with 0s
2. srl (shift right logical): shifts right and fills emptied bits with 0s
3. sra (shift right arithmetic): shifts right and fills emptied bits by sign extending
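The three shift instructions can be modeled on a 32-bit register value in C. This is a sketch only. The helper names are hypothetical, and the shift amount is masked to 5 bits to mirror the 5-bit shift field. sra is built from a logical shift plus explicit sign fill, because right-shifting a negative signed integer is implementation-defined in C.

```c
#include <stdint.h>

/* Illustrative sketch; helper names are hypothetical.
   shamt is masked to 5 bits, matching the 5-bit shift-amount field (0..31). */

static uint32_t mips_sll(uint32_t rt, unsigned shamt) {
    return rt << (shamt & 31u);            /* vacated low bits become 0 */
}

static uint32_t mips_srl(uint32_t rt, unsigned shamt) {
    return rt >> (shamt & 31u);            /* vacated high bits become 0 */
}

static uint32_t mips_sra(uint32_t rt, unsigned shamt) {
    shamt &= 31u;
    uint32_t result = rt >> shamt;         /* logical shift first */
    if (rt & 0x80000000u) {                /* sign bit is 1 (negative value) */
        result |= ~(UINT32_MAX >> shamt);  /* fill vacated high bits with 1s */
    }
    return result;
}
```

With 0x87654321 and a shift of 8, srl gives 0x00876543 while sra gives 0xFF876543. The two differ only in what fills the vacated top byte.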
3
Shift Instructions (3/4)
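Example: shift right arith by 8 bits (sign bit 0: the emptied bits are filled with 0s)
Example: shift right arith by 8 bits (sign bit 1: the emptied bits are filled with 1s)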
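The sign fill can also be produced with a different technique that avoids C's implementation-defined signed right shift. The idea is to flip the sign bit, shift logically, then subtract the shifted sign bit. This is a sketch; the name sra32 is hypothetical.

```c
#include <stdint.h>

/* Illustrative sketch; sra32 is a hypothetical name.
   Flipping bit 31 maps the signed range onto the unsigned range; after the
   logical shift, subtracting the shifted bias restores the sign (mod 2^32). */
static uint32_t sra32(uint32_t x, unsigned n) {
    n &= 31u;
    return ((x ^ 0x80000000u) >> n) - (0x80000000u >> n);
}
```

For instance, sra32(0x12345678, 8) gives 0x00123456 (sign bit 0), and sra32(0x92345678, 8) gives 0xFF923456 (sign bit 1).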
4
Shift Instructions (4/4)
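Since shifting may be faster than multiplication, a good compiler usually notices when C code multiplies by a power of 2 and compiles it to a shift instruction:
a *= 8; (in C, with a in $s0)
would compile to:
sll $s0,$s0,3 (in MIPS)
Likewise, shift right to divide by powers of 2; remember to use sra for signed values. Note that sra rounds toward negative infinity, whereas C integer division truncates toward zero: -5 / 2 is -2 in C, but -5 shifted right arithmetically by 1 is -3.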
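The sketch below checks both points in C: multiplying by 8 matches a left shift by 3, and sra differs from C division for negative values. The function names are hypothetical. All values are compared as 32-bit patterns so that the code never relies on implementation-defined signed shifts.

```c
#include <stdint.h>

/* Illustrative sketch; function names are hypothetical. */

/* a *= 8 as a shift: sll $s0,$s0,3 (8 == 2^3). */
static uint32_t times8_by_shift(uint32_t a) {
    return a << 3;
}

/* Arithmetic right shift on the 32-bit pattern (portable sign fill). */
static uint32_t sra_bits(uint32_t x, unsigned n) {
    n &= 31u;
    return ((x ^ 0x80000000u) >> n) - (0x80000000u >> n);
}

/* Signed division by 2^n as C defines it (truncates toward zero),
   returned as a 32-bit pattern for comparison with sra_bits.
   n must be in 0..30 so that the divisor fits in int32_t. */
static uint32_t c_div_pow2(int32_t a, unsigned n) {
    return (uint32_t)(a / ((int32_t)1 << n));
}
```

For non-negative values the two agree (20 / 4 and 20 sra 2 are both 5). For -5 and n = 1, c_div_pow2 yields the pattern of -2, while sra_bits yields the pattern of -3.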
5
“Shift and Add” Signed Multiplier
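Sign-extend the partial product at each stage.
The final step is a subtract (in two's complement, the multiplier's sign bit has weight -2^(n-1)).
Takes n clock cycles for n-bit operands.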
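A behavioral sketch of the signed shift-and-add idea for 8-bit operands (n = 8), with one loop iteration per clock cycle. It is not a register-level model. It accumulates each sign-extended, weighted multiplicand directly instead of shifting a product register, and it multiplies by 2^i instead of left-shifting so that it never shifts a negative value. The function name is hypothetical.

```c
#include <stdint.h>

/* Illustrative sketch; shift_add_mul8 is a hypothetical name.
   Signed 8-bit x 8-bit multiply; one loop iteration models one cycle. */
static int16_t shift_add_mul8(int8_t multiplicand, int8_t multiplier) {
    int32_t product = 0;                       /* wide enough: no overflow */
    uint8_t bits = (uint8_t)multiplier;        /* multiplier bit pattern */
    for (int i = 0; i < 8; i++) {
        if (bits & (1u << i)) {
            /* Sign-extended multiplicand weighted by 2^i. */
            int32_t term = (int32_t)multiplicand * (1 << i);
            if (i == 7)
                product -= term;  /* sign bit weighs -2^7: final step subtracts */
            else
                product += term;
        }
    }
    return (int16_t)product;
}
```

For example, shift_add_mul8(7, -2) adds 7 x (2 + 4 + ... + 64) = 882 over the first seven cycles and then subtracts 7 x 128 in the last one, giving -14.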
6
Fast multiplication hardware
7
Chap.5 The processor: Datapath and control
Jen-Chang Liu, Spring 2006
8
Hierarchy of Machine Structures
Software: Application (Netscape), Operating System (Windows 98), Compiler, Assembler
Instruction Set Architecture
Hardware: Processor, Memory, I/O system; Datapath & Control; Digital Design; Circuit Design; transistors
9
Five components of computer
Input, output, memory, datapath, control
10
Inside the motherboard (for Pentium Pro)
11
Chapter overview
Chap5: datapath and control
Chap6: pipeline
Chap7: memory hierarchy
Chap8: I/O
Chap9: multiprocessor
(Inside CPU: Chap5, Chap6)
12
Inside Processor: datapath and control
Datapath: the brawn of the processor; performs the arithmetic operations
Control: the brain of the processor; tells the datapath, memory, and I/O what to do
(Analogy: a production line)
13
Inside Pentium Processor
1/3 cache
14
Inside Pentium Pro Processor
15
Clocking methodology
Edge-triggered clocking: the contents of the state elements (flip-flops, registers, memory) change only on the active clock edge.
(Figure: clock waveform alternating high and low; example values 100, 101, 001, 111, 110, 001, 100.)
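The edge-triggered rule can be mimicked in software with a toy register that latches its input only on a low-to-high clock transition. This is a sketch under the assumption that the rising edge is the active edge. The type and function names are hypothetical.

```c
#include <stdint.h>
#include <stdbool.h>

/* Illustrative sketch; edge_reg and edge_reg_step are hypothetical names. */
typedef struct {
    uint8_t q;      /* stored state, visible to the combinational logic */
    bool prev_clk;  /* previous clock level, used to detect a rising edge */
} edge_reg;

/* Present clock level clk and data input d.
   The state changes only on a low -> high (rising) clock transition;
   while the clock stays high or low, a changing d has no effect. */
static void edge_reg_step(edge_reg *r, bool clk, uint8_t d) {
    if (clk && !r->prev_clk) {
        r->q = d;   /* active edge: latch the input */
    }
    r->prev_clk = clk;
}
```

Feeding the same register a new d while the clock is already high leaves q unchanged. The value updates only at the next rising edge.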
16
Timing constraint
The clock period must be long enough for all signals to become stable before the next active clock edge.
17
Design Target: MIPS
The instruction set architecture (ISA) determines the implementation.
We know how to execute MIPS code by hand; how do we design a circuit that executes it?
We design a simple implementation that covers a subset of the MIPS instructions:
Memory-reference instructions: lw, sw
Arithmetic-logic instructions: add, sub, and, or, slt
Branch and jump: beq, j
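A sketch of how the subset could be recognized from a 32-bit instruction word. It uses the standard MIPS opcodes (lw 35, sw 43, beq 4, j 2, R-format 0) and funct codes (add 32, sub 34, and 36, or 37, slt 42). The enum and function names are hypothetical.

```c
#include <stdint.h>

/* Illustrative sketch; inst_class and classify are hypothetical names. */
typedef enum { CLS_UNKNOWN, CLS_MEMORY, CLS_ARITH_LOGIC, CLS_BRANCH, CLS_JUMP } inst_class;

/* Classify a MIPS instruction by opcode (bits 31-26) and,
   for R-format (opcode 0), by funct (bits 5-0). */
static inst_class classify(uint32_t inst) {
    unsigned opcode = (unsigned)(inst >> 26);
    unsigned funct  = (unsigned)(inst & 0x3Fu);

    if (opcode == 0) {
        /* funct: add 32, sub 34, and 36, or 37, slt 42 */
        if (funct == 32 || funct == 34 || funct == 36 || funct == 37 || funct == 42)
            return CLS_ARITH_LOGIC;
        return CLS_UNKNOWN;
    }
    switch (opcode) {
    case 35: return CLS_MEMORY;   /* lw */
    case 43: return CLS_MEMORY;   /* sw */
    case 4:  return CLS_BRANCH;   /* beq */
    case 2:  return CLS_JUMP;     /* j */
    default: return CLS_UNKNOWN;
    }
}
```

For example, 0x014B4820 (add $t1, $t2, $t3) classifies as arithmetic-logic, and 0x8D490000 (lw $t1, 0($t2)) classifies as a memory reference.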
18
Outline of chapter 5
Building a datapath: instruction fetch, R-type instructions, load/store, branch
Single datapath implementation
Multiple cycle implementation
19
Preview: How to carry out an instruction
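4 steps to implement (execute) an instruction:
Instruction fetch -> Data/register read -> Instruction execution (ALU) -> Memory/register read/write
The instruction fetch step is the same for every instruction: read the instruction from memory.

Instruction          | Data/register read | Instruction execution (ALU) | Memory/register read/write
add $t0, $t1, $t2    | $t1, $t2           | $t1 + $t2                   | Write to $t0
lw $t0, 0($a0)       | $a0                | $a0 + 0                     | Read from memory, write to $t0
beq $t0, $t1, loop   | $t0, $t1           | $t0 - $t1                   | Write PC (if equal)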
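The last three steps for add, lw, and beq can be sketched in C as follows. Step 1 (fetch) is assumed to have already happened. The machine struct, the tiny word-addressed data memory, and the function names are hypothetical, and writes to $zero are not special-cased. Register numbers follow the MIPS convention ($a0 = 4, $t0 = 8, $t1 = 9, $t2 = 10).

```c
#include <stdint.h>

/* Illustrative sketch; machine, do_add, do_lw, do_beq are hypothetical names. */
typedef struct {
    uint32_t reg[32];
    uint32_t dmem[64];   /* data memory, indexed by byte address / 4 */
    uint32_t pc;
} machine;

/* add rd, rs, rt */
static void do_add(machine *m, unsigned rd, unsigned rs, unsigned rt) {
    uint32_t a = m->reg[rs], b = m->reg[rt];   /* step 2: register read */
    uint32_t sum = a + b;                      /* step 3: ALU computes rs + rt */
    m->reg[rd] = sum;                          /* step 4: register write */
    m->pc += 4;
}

/* lw rt, offset(rs); the caller keeps the address inside dmem */
static void do_lw(machine *m, unsigned rt, int16_t offset, unsigned rs) {
    uint32_t base = m->reg[rs];                        /* step 2: register read */
    uint32_t addr = base + (uint32_t)(int32_t)offset;  /* step 3: ALU adds base + offset */
    m->reg[rt] = m->dmem[addr / 4];                    /* step 4: memory read, register write */
    m->pc += 4;
}

/* beq rs, rt, offset (offset in words, relative to PC + 4) */
static void do_beq(machine *m, unsigned rs, unsigned rt, int16_t offset) {
    uint32_t a = m->reg[rs], b = m->reg[rt];   /* step 2: register read */
    uint32_t diff = a - b;                     /* step 3: ALU subtracts */
    m->pc += 4;
    if (diff == 0)                             /* step 4: write PC if equal */
        m->pc += (uint32_t)(int32_t)offset * 4u;
}
```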
20
Abstract view of carrying out an instruction
Instruction fetch -> Data/register read -> Instruction execution -> Memory/register read/write
21
How to build datapath for MIPS ISA?
Datapath: the path along which an instruction is performed
Consider each major component.
Build a datapath for each instruction class.
22
Outline: Building a datapath
1. Instruction fetch
2. R-type instructions
3. Load/store
4. Branch
Build a datapath for each instruction class, then combine them.
23
1. Instruction fetch
Instruction memory: the place to store the instructions
Program counter (PC): the address of the instruction
Adder: increments the PC to the next instruction
24
Instruction fetch (cont.)
The adder always adds (it computes PC + 4), so it needs no control lines.
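A software sketch of the fetch datapath. It reads the word at PC from a hypothetical instruction-memory array and then advances PC by 4. The struct and function names are hypothetical, and the out-of-range check exists only so the sketch is safe to run. There is no such check in the hardware.

```c
#include <stdint.h>
#include <stddef.h>

/* Illustrative sketch; fetch_unit and fetch are hypothetical names. */
typedef struct {
    const uint32_t *imem;  /* instruction memory (array of words) */
    size_t nwords;         /* number of words in imem */
    uint32_t pc;           /* byte address of the current instruction */
} fetch_unit;

/* Read the instruction at PC, then advance PC to the next instruction.
   The adder always adds 4, so it takes no control input.
   Returns 0 if PC is outside imem (a safeguard for this sketch only). */
static int fetch(fetch_unit *f, uint32_t *inst) {
    size_t index = f->pc / 4;          /* instructions are word-aligned */
    if (index >= f->nwords) return 0;
    *inst = f->imem[index];
    f->pc += 4;
    return 1;
}
```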
25
2. R-type instructions (R-format)
Arithmetic-logic instructions: add, sub, and, or, slt
Ex. add $t1, $t2, $t3
Format: opcode (6 bits) | rs (5) | rt (5) | rd (5) | shamt (5) | funct (6)
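The R-format fields can be extracted with shifts and masks. This is a sketch; the struct and function names are hypothetical.

```c
#include <stdint.h>

/* Illustrative sketch; rformat and decode_r are hypothetical names. */
typedef struct {
    unsigned opcode, rs, rt, rd, shamt, funct;
} rformat;

/* Split a 32-bit word into opcode(6) rs(5) rt(5) rd(5) shamt(5) funct(6). */
static rformat decode_r(uint32_t inst) {
    rformat f;
    f.opcode = (inst >> 26) & 0x3Fu;   /* bits 31-26 */
    f.rs     = (inst >> 21) & 0x1Fu;   /* bits 25-21 */
    f.rt     = (inst >> 16) & 0x1Fu;   /* bits 20-16 */
    f.rd     = (inst >> 11) & 0x1Fu;   /* bits 15-11 */
    f.shamt  = (inst >> 6)  & 0x1Fu;   /* bits 10-6 */
    f.funct  = inst & 0x3Fu;           /* bits 5-0 */
    return f;
}
```

For add $t1, $t2, $t3 (encoded as 0x014B4820), this gives opcode 0, rs 10, rt 11, rd 9, shamt 0, funct 32.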
26
Datapath elements for R-type inst.
Register file:
1. Read register: input a register number; output that register's data
2. Write register: input a register number and the data to write; the write happens only when RegWrite = 1
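A software sketch of the register file's read and write ports. The names are hypothetical. In hardware the write occurs on the clock edge; here it happens immediately when the function is called. The sketch also ignores writes to register 0, because MIPS $zero always reads as 0.

```c
#include <stdint.h>
#include <stdbool.h>

/* Illustrative sketch; regfile, rf_read, rf_write are hypothetical names. */
typedef struct {
    uint32_t r[32];   /* 32 registers of 32 bits each */
} regfile;

/* Read port: register number in, register data out. */
static uint32_t rf_read(const regfile *rf, unsigned regno) {
    return rf->r[regno & 31u];
}

/* Write port: register number and data in; writes only when RegWrite is 1.
   Register 0 ($zero) is hard-wired to 0, so writes to it are ignored. */
static void rf_write(regfile *rf, unsigned regno, uint32_t data, bool reg_write) {
    regno &= 31u;
    if (reg_write && regno != 0)
        rf->r[regno] = data;
}
```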
27
Datapath for R-type inst.
Format: opcode (6 bits) | rs (5) | rt (5) | rd (5) | shamt (5) | funct (6)
28
3. Load/store from/to memory
I-format
Load/store examples:
lw $t1, offset_value($t2)
sw $t1, offset_value($t2)
Format: opcode (6 bits) | rs (5) | rt (5) | signed offset (16)
The memory address accessed is $t2 + offset.
29
Datapath elements for load/store
lw $t1, offset_value($t2)
Elements needed: register file, ALU, and data memory
The ALU computes the address: base + offset
Store -> MemWrite; load -> MemRead
Sign-extend the 16-bit offset field to 32 bits.
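The sign extension and the base + offset addition can be sketched in C. The function names are hypothetical. The sign extension copies bit 15 of the offset into bits 31-16.

```c
#include <stdint.h>

/* Illustrative sketch; sign_extend16 and mem_address are hypothetical names. */

/* Sign-extend the 16-bit offset field (bits 15-0) to 32 bits. */
static uint32_t sign_extend16(uint32_t inst) {
    uint32_t imm = inst & 0xFFFFu;
    return (imm & 0x8000u) ? (imm | 0xFFFF0000u) : imm;
}

/* Effective address for lw/sw: base register value + sign-extended offset,
   which is what the ALU computes (wraps mod 2^32). */
static uint32_t mem_address(uint32_t base, uint32_t inst) {
    return base + sign_extend16(inst);
}
```

A negative offset works because the sign-extended value wraps around. With base 0x1000 and offset field 0xFFFC (-4), the address is 0x0FFC.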
30
Datapath for load/store
Format: opcode (6 bits) | rs (5) | rt (5) | signed offset (16)
31
4. Branch
I-format, e.g. beq $t1, $t2, offset
PC-relative addressing
Format: opcode (6 bits) | rs (5) | rt (5) | signed offset (16)
32
Details for branch: target address calculation
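Base address for the offset: PC + 4
Instructions are word-aligned, so the offset counts words: it is shifted left 2 bits (two 0 bits are appended).
Branch target = (PC + 4) + (sign-extended offset << 2)
Format: opcode (6 bits) | rs (5) | rt (5) | immediate (16)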
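The target calculation can be written in a few lines of C. In this sketch, pc is the address of the beq itself and inst is its 32-bit encoding. The function name is hypothetical.

```c
#include <stdint.h>

/* Illustrative sketch; branch_target is a hypothetical name.
   Branch target for beq: (PC + 4) + (sign-extended offset << 2). */
static uint32_t branch_target(uint32_t pc, uint32_t inst) {
    uint32_t imm = inst & 0xFFFFu;
    uint32_t offset = (imm & 0x8000u) ? (imm | 0xFFFF0000u) : imm;  /* sign-extend */
    return (pc + 4u) + (offset << 2);   /* word offset -> byte offset */
}
```

An offset of 3 at address 0x400000 targets 0x400010. An offset of -1 (0xFFFF) targets the beq itself.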
33
Datapath for branch
Format: opcode (6 bits) | rs (5) | rt (5) | signed offset (16)
34
How to combine these datapaths?
We have shown datapaths for instruction fetch, R-type instructions, load/store, and branch.
How do we assemble the datapaths?
How do we handle the control lines?
35
Outline
Building a datapath: instruction fetch, R-type instructions, load/store, branch
Single datapath implementation (current topic)
Multiple cycle implementation
36
Single datapath implementation
Attempt to execute every instruction in one clock cycle.
No datapath resource can be used more than once per instruction, so:
Duplicated units: e.g., separate memories for instructions and for data
Shared units: use a multiplexor to select among the inputs
(Analogy: one production line for add,…, lw, sw, beq,…)
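A shared unit needs a multiplexor in front of it. The sketch below shows a 2-to-1 mux feeding the single ALU's second operand. For R-type the operand comes from a register, and for lw/sw it comes from the sign-extended offset. The select name ALUSrc follows the textbook's naming and is an assumption here, as are the hypothetical function names.

```c
#include <stdint.h>

/* Illustrative sketch; mux2 and alu_add_operand are hypothetical names. */

/* 2-to-1 multiplexor: select chooses which input reaches the shared unit. */
static uint32_t mux2(uint32_t in0, uint32_t in1, unsigned select) {
    return select ? in1 : in0;
}

/* One ALU (shown adding) shared by R-type and lw/sw.
   alu_src = 0: second operand from register read data 2 (R-type)
   alu_src = 1: second operand from the sign-extended offset (lw/sw) */
static uint32_t alu_add_operand(uint32_t read_data_1, uint32_t read_data_2,
                                uint32_t sign_ext_imm, unsigned alu_src) {
    return read_data_1 + mux2(read_data_2, sign_ext_imm, alu_src);
}
```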
37
1. Combine R-type and lw/sw
R-format: opcode (6 bits) | rs (5) | rt (5) | rd (5) | shamt (5) | funct (6)
I-format (lw/sw): opcode (6 bits) | rs (5) | rt (5) | signed offset (16)
38
R-type + load/store
39
2. Add the instruction fetch
40
3. Add the branch unit
41
Simple datapath and control. See Fig 5.17 (p.307)
42
Trace the operation of the datapath !!!
We explain it in 4 steps, but they actually all operate within a single clock cycle. Quiz later!!!
Instruction fetch -> Data/register read -> Instruction execution -> Memory/register read/write
43
Step 1. Instruction fetch
add $t1,$t2,$t3 => add $9, $10, $11 => rs = 10, rt = 11, rd = 9, funct = 32
44
Step 2. Read source registers
add $t1,$t2,$t3 (rs = 10, rt = 11, rd = 9, funct = 32): read $t2 ($10) and $t3 ($11)
45
Step 3. Instruction execution
add $t1,$t2,$t3 (rs = 10, rt = 11, rd = 9, funct = 32): the ALU adds the two register values
46
Step 4. Write result
add $t1,$t2,$t3 (rs = 10, rt = 11, rd = 9, funct = 32): write the sum into $t1 ($9)
47
lw $t1, 0($t2) => lw $9, 0($10) => opcode = 35, rs = 10, rt = 9, offset = 0
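To check the field values in these traces, both instructions can be assembled from their fields. This is a sketch; the function names are hypothetical.

```c
#include <stdint.h>

/* Illustrative sketch; encode_r and encode_i are hypothetical names. */

/* Assemble R-format: opcode(6) rs(5) rt(5) rd(5) shamt(5) funct(6). */
static uint32_t encode_r(unsigned op, unsigned rs, unsigned rt, unsigned rd,
                         unsigned shamt, unsigned funct) {
    return ((uint32_t)(op & 0x3Fu) << 26) | ((uint32_t)(rs & 0x1Fu) << 21) |
           ((uint32_t)(rt & 0x1Fu) << 16) | ((uint32_t)(rd & 0x1Fu) << 11) |
           ((uint32_t)(shamt & 0x1Fu) << 6) | (uint32_t)(funct & 0x3Fu);
}

/* Assemble I-format: opcode(6) rs(5) rt(5) 16-bit two's-complement immediate. */
static uint32_t encode_i(unsigned op, unsigned rs, unsigned rt, int16_t imm) {
    return ((uint32_t)(op & 0x3Fu) << 26) | ((uint32_t)(rs & 0x1Fu) << 21) |
           ((uint32_t)(rt & 0x1Fu) << 16) | (uint32_t)(uint16_t)imm;
}
```

encode_r(0, 10, 11, 9, 0, 32) gives 0x014B4820 (add $t1, $t2, $t3), and encode_i(35, 10, 9, 0) gives 0x8D490000 (lw $t1, 0($t2)).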
49
Simple datapath and control. See Fig 5.19 (p.360)
50
How to generate control?
Inputs: opcode (6 bits) and funct (6 bits)
Truth-table look-up
Output: control signals (10 bits)
51
Hierarchy of control units
Instructions (binary representation) -> Main control unit -> ALUop (2 bits) + other control signals (6 1-bit)
ALUop + funct (Instruction[5-0]) -> ALU control unit -> ALU control signals (3 bits)
52
Why multiple levels of control?
Purpose:
Reduce the size of the main control unit
Potentially increase the speed of the control unit
ALUop (2 bits): instruction classification; it defines 3 classes of instructions: R-type, load/store, branch
53
Design main control unit
Instructions (binary representation): Opcode[31-26] -> Main control unit -> ALUop (2 bits) + other control signals (6 1-bit)
ALUop + funct (Instruction[5-0]) -> ALU control unit -> ALU control signals (3 bits)
54
Main control unit: observe the instruction set
55
Control signals for R-format? See Fig 5.19
56
57
Create truth table for main control unit
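A sketch of the main control unit as a table look-up on the opcode. The signal names and values follow the textbook's single-cycle control table for this subset. They are assumptions here, because the slide's table did not survive, and "don't care" entries are set to 0. The struct and function names are hypothetical.

```c
/* Illustrative sketch; control and main_control are hypothetical names.
   Signal names/values assumed from the textbook's single-cycle table;
   don't-care (X) entries are set to 0. */
typedef struct {
    unsigned reg_dst, alu_src, mem_to_reg, reg_write, mem_read, mem_write, branch;
    unsigned alu_op;  /* 2 bits: 00 add (lw/sw), 01 subtract (beq), 10 use funct (R-type) */
} control;

/* Main control unit: opcode (instruction bits 31-26) -> control signals. */
static control main_control(unsigned opcode) {
    control c = {0, 0, 0, 0, 0, 0, 0, 0};
    switch (opcode) {
    case 0:   /* R-format: write rd, ALU operation from funct */
        c.reg_dst = 1; c.reg_write = 1; c.alu_op = 2;
        break;
    case 35:  /* lw: address = rs + offset, memory -> rt */
        c.alu_src = 1; c.mem_to_reg = 1; c.reg_write = 1; c.mem_read = 1; c.alu_op = 0;
        break;
    case 43:  /* sw: address = rs + offset, rt -> memory */
        c.alu_src = 1; c.mem_write = 1; c.alu_op = 0;
        break;
    case 4:   /* beq: ALU subtracts to compare rs and rt */
        c.branch = 1; c.alu_op = 1;
        break;
    default:  /* not in this subset: all signals 0 */
        break;
    }
    return c;
}
```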
59
Design ALU control unit
Instructions (binary representation): Opcode[31-26] -> Main control unit -> ALUop (2 bits) + other control signals (6 1-bit)
ALUop + funct (Instruction[5-0]) -> ALU control unit -> ALU control signals (3 bits)
60
ALU control unit
Input 1: ALUop (2 bits)
Input 2: Instruction[5-0], the funct field (6 bits)
Output: ALU control (3 bits)
See Figure 4.20
61
ALU control signal: 3 bits = 1 bit + 2 bits
ALU control lines | Function
000 | and
001 | or
010 | add
110 | sub
111 | slt
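A sketch of the ALU control unit as a function of ALUop and funct. It uses the 3-bit encodings in the table and the ALUop classes (00 load/store, 01 branch, 1x R-type). The function name is hypothetical, and the invalid-code return value 0xF is an illustrative choice.

```c
/* Illustrative sketch; alu_control is a hypothetical name.
   ALUop (2 bits) + funct (6 bits) -> 3-bit ALU control:
   000 and, 001 or, 010 add, 110 sub, 111 slt. */
static unsigned alu_control(unsigned alu_op, unsigned funct) {
    switch (alu_op & 3u) {
    case 0: return 2u;              /* lw/sw: add (010) */
    case 1: return 6u;              /* beq: subtract (110) */
    default:                        /* 1x: R-type, decode funct */
        switch (funct & 0x3Fu) {
        case 32: return 2u;         /* add -> 010 */
        case 34: return 6u;         /* sub -> 110 */
        case 36: return 0u;         /* and -> 000 */
        case 37: return 1u;         /* or  -> 001 */
        case 42: return 7u;         /* slt -> 111 */
        default: return 0xFu;       /* not in the subset (illustrative marker) */
        }
    }
}
```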
62
Instruction set formats
Determine the ALU operation from the instruction set formats
63
Creating the truth table: 2^8 = 256 possible input combinations (2-bit ALUop + 6-bit funct)
65
Why a single-cycle implementation is not used?
It is inefficient. Why?
In a single-cycle implementation, the clock cycle time is the same for every instruction.
Clock cycle = longest path = load (lw)
Other instruction classes could fit in a shorter cycle!!!
66
Performance evaluation for single-cycle implementation
Assume these operation times:
Memory units: 2 ns
ALU: 2 ns
Register file: 1 ns
Calculate the time needed for each instruction class.
67
Memory units: 2 ns, ALU: 2 ns, register file: 1 ns
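The per-class arithmetic can be written out in C, summing the units each class uses in order. This sketch ignores all other delays (multiplexors, control, PC access, sign extension, wires), which is an assumption. The function names are hypothetical.

```c
/* Illustrative sketch; function names are hypothetical.
   Unit times from the slide (ns); all other delays assumed to be zero. */
enum { MEM_NS = 2, ALU_NS = 2, REG_NS = 1 };

/* fetch, register read, ALU, register write */
static int rtype_ns(void) { return MEM_NS + REG_NS + ALU_NS + REG_NS; }
/* fetch, register read, ALU (address), data memory read, register write */
static int lw_ns(void)    { return MEM_NS + REG_NS + ALU_NS + MEM_NS + REG_NS; }
/* fetch, register read, ALU (address), data memory write */
static int sw_ns(void)    { return MEM_NS + REG_NS + ALU_NS + MEM_NS; }
/* fetch, register read, ALU (compare) */
static int beq_ns(void)   { return MEM_NS + REG_NS + ALU_NS; }
/* fetch only */
static int jump_ns(void)  { return MEM_NS; }

/* Single-cycle clock = the longest instruction class. */
static int single_cycle_clock_ns(void) {
    int t[] = { rtype_ns(), lw_ns(), sw_ns(), beq_ns(), jump_ns() };
    int n = (int)(sizeof t / sizeof t[0]);
    int max = t[0];
    for (int i = 1; i < n; i++)
        if (t[i] > max) max = t[i];
    return max;
}
```

This yields R-type 6 ns, lw 8 ns, sw 7 ns, beq 5 ns, and j 2 ns. The single-cycle clock must therefore be 8 ns, set by the load, even though every other class finishes sooner.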
68
How to improve single-cycle datapath?
A variable-speed clock for each instruction class: difficult to implement
Multi-cycle implementation