Shift Instructions (1/4)


1 Shift Instructions (1/4)
Move (shift) all the bits in a word to the left or right by a number of bits.
Example: shift right by 8 bits
Example: shift left by 8 bits

2 Shift Instructions (2/4)
MIPS shift instruction syntax: 1 2,3,4
where
  1) operation name
  2) register that will receive the value
  3) first operand (register)
  4) shift amount (constant < 32, 5 bits)
MIPS shift instructions:
  1. sll (shift left logical): shifts left and fills emptied bits with 0s
  2. srl (shift right logical): shifts right and fills emptied bits with 0s
  3. sra (shift right arithmetic): shifts right and fills emptied bits by sign extending

3 Shift Instructions (3/4)
Two examples: shift right arithmetic by 8 bits
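The three shift instructions can be modelled in a few lines. This is a minimal sketch in Python, not MIPS code: the function names mirror the mnemonics, and the 32-bit mask and the sample values are assumptions chosen for illustration.

```python
MASK32 = 0xFFFFFFFF  # keep every result within one 32-bit word


def sll(word, shamt):
    """Shift left logical: emptied low bits are filled with 0s."""
    return (word << shamt) & MASK32


def srl(word, shamt):
    """Shift right logical: emptied high bits are filled with 0s."""
    return (word & MASK32) >> shamt


def sra(word, shamt):
    """Shift right arithmetic: emptied high bits copy the sign bit."""
    word &= MASK32
    if word & 0x80000000:      # sign bit set: negative in two's complement
        word -= 1 << 32        # reinterpret the pattern as a negative int
    return (word >> shamt) & MASK32  # Python's >> on ints is arithmetic


print(hex(sll(0x12345678, 8)))  # 0x34567800
print(hex(srl(0x12345678, 8)))  # 0x123456
print(hex(sra(0x12345678, 8)))  # 0x123456 (sign bit 0, filled with 0s)
print(hex(sra(0xF2345678, 8)))  # 0xfff23456 (sign bit 1, filled with 1s)
```

For a word whose sign bit is 0, srl and sra give the same result; they differ only when the sign bit is 1.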

4 Shift Instructions (4/4)
Since shifting may be faster than multiplication, a good compiler usually notices when C code multiplies by a power of 2 and compiles it to a shift instruction:
  a *= 8; (in C)
would compile to:
  sll $s0,$s0,3 (in MIPS)
Likewise, shift right to divide by powers of 2; remember to use sra so that the sign of a signed value is preserved.
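A short Python sketch of the same idea, assuming 32-bit signed values. The helper names are hypothetical. One caveat worth keeping in mind: an arithmetic shift right rounds toward negative infinity, while C integer division truncates toward zero, so the two agree only when the dividend is non-negative or exactly divisible.

```python
def to_signed32(word):
    """Interpret the low 32 bits of `word` as a two's-complement value."""
    word &= 0xFFFFFFFF
    return word - (1 << 32) if word & 0x80000000 else word


def times_eight(a):
    """Same effect as `sll $s0,$s0,3`: multiply by 2**3."""
    return to_signed32(a << 3)


def sra_divide(a, k):
    """Same effect as sra by k bits: divide by 2**k, rounding down."""
    return to_signed32(a) >> k


print(times_eight(5))      # 40
print(times_eight(-5))     # -40
print(sra_divide(-40, 3))  # -5
print(sra_divide(-7, 1))   # -4, whereas C computes -7 / 2 as -3
```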

5 “Shift and Add” Signed Multiplier
Sign-extend the partial product at each stage
The final step is a subtract
Takes n clock cycles
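The three points above can be sketched in software. This is an illustrative model, not the hardware on the slide: it assumes n-bit two's-complement operands, examines one multiplier bit per loop iteration (one iteration standing in for one clock cycle), and subtracts in the last step because the sign bit of the multiplier has weight -2^(n-1). The function name is hypothetical.

```python
def signed_shift_add_multiply(a, b, n=8):
    """Multiply two n-bit two's-complement integers by shift and add."""
    lo, hi = -(1 << (n - 1)), (1 << (n - 1)) - 1
    if not (lo <= a <= hi and lo <= b <= hi):
        raise ValueError("operands must fit in n bits")

    multiplier_bits = b & ((1 << n) - 1)  # two's-complement pattern of b
    product = 0
    for i in range(n):                    # n iterations model n clock cycles
        if (multiplier_bits >> i) & 1:
            # Python ints are unbounded, so shifting a negative `a`
            # keeps its sign: this plays the role of sign extension.
            partial = a << i
            if i == n - 1:
                product -= partial        # final step is a subtract
            else:
                product += partial
    return product


print(signed_shift_add_multiply(-3, 5))   # -15
print(signed_shift_add_multiply(-3, -5))  # 15
```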

6 Fast multiplication hardware

7 Chap.5 The processor: Datapath and control
Jen-Chang Liu, Spring 2006

8 Hierarchy of Machine Structures
Software:
  Application (Netscape)
  Compiler, Assembler, Operating System (Windows 98)
Instruction Set Architecture
Hardware:
  Processor, Memory, I/O system
  Datapath & Control
  Digital Design
  Circuit Design
  transistors

9 Five components of computer
Input, output, memory, datapath, control

10 Inside Mother board (for Pentium Pro)

11 Chapter overview
Chap5: datapath and control
Chap6: pipeline
Chap7: memory hierarchy
Chap8: I/O
Chap9: multiprocessor
(Figure label: Inside CPU)

12 Inside Processor: datapath and control
Datapath: brawn of the processor
  Performs the arithmetic operations
Control: brain of the processor
  Tells the datapath, memory, and I/O what to do
Analogy: a production line

13 Inside Pentium Processor
1/3 cache

14 Inside Pentium Pro Processor

15 Clocking methodology
Edge-triggered clocking: the contents of the state elements (flip-flops, registers, memory) change only on the active clock edge
(Figure: clock waveform with high and low levels, and a sequence of state values changing at the clock edges)
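Edge-triggered behaviour can be illustrated with a tiny model of a register. This is a sketch under one assumption not stated on the slide: the active edge is taken to be the rising edge (clock going from low to high). The class and method names are hypothetical.

```python
class EdgeTriggeredRegister:
    """State element that captures its input only on a rising clock edge."""

    def __init__(self, value=0):
        self.value = value     # stored state
        self._prev_clk = 0     # clock level seen on the previous call

    def tick(self, clk, d):
        """Present clock level `clk` and data input `d`; return the state."""
        if self._prev_clk == 0 and clk == 1:  # rising edge (assumed active)
            self.value = d
        self._prev_clk = clk
        return self.value


reg = EdgeTriggeredRegister()
print(reg.tick(0, 5))  # 0: clock low, input ignored
print(reg.tick(1, 5))  # 5: rising edge, input captured
print(reg.tick(1, 7))  # 5: clock still high, no edge
print(reg.tick(0, 7))  # 5: falling edge is not the active edge here
print(reg.tick(1, 7))  # 7: next rising edge
```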

16 Timing constraint
The clock period must be long enough for all signals to become stable

17 Design Target: MIPS
The instruction set architecture (ISA) determines the implementation
We know how to execute MIPS code by hand; how do we design a circuit to execute it?
We design a simple implementation that includes a subset of the MIPS instructions:
  Memory-reference inst.: lw, sw
  Arithmetic-logic inst.: add, sub, and, or, slt
  Branch: beq, j

18 Outline of chapter 5
Building a datapath
  Instruction fetch
  R-type instructions
  Load/store
  Branch
Single datapath implementation
Multiple cycle implementation

19 Preview: How to carry out an instruction
4 steps to implement (execute) an instruction:
  1. Instruction fetch
  2. Data/register read
  3. Instruction execution
  4. Memory/register read/write
Examples (step 1 is always: read inst. from memory):
  add $t0, $t1, $t2: read $t1, $t2; ALU computes $t1 + $t2; write to $t0
  lw $t0, 0($a0): read $a0; ALU computes $a0 + 0; read from memory
  beq $t0, $t1, loop: read $t0, $t1; ALU computes $t0 - $t1; write PC
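The four steps can be traced in a small Python sketch. It is a simplification under stated assumptions: instructions are given as already-fetched tuples rather than binary words, registers and memory are plain dictionaries, the branch target is an absolute address, and overflow is ignored. The function name `run_one` is hypothetical.

```python
def run_one(instr, regs, mem, pc):
    """Carry out one instruction and return the next PC.

    Step 1 (instruction fetch) is assumed done: `instr` was read at `pc`.
    """
    op = instr[0]
    if op == "add":                      # add rd, rs, rt
        _, rd, rs, rt = instr
        a, b = regs[rs], regs[rt]        # step 2: register read
        result = a + b                   # step 3: ALU adds
        regs[rd] = result                # step 4: register write
        return pc + 4
    if op == "lw":                       # lw rt, offset(base)
        _, rt, offset, base = instr
        base_value = regs[base]          # step 2: register read
        address = base_value + offset    # step 3: ALU adds base and offset
        regs[rt] = mem[address]          # step 4: memory read, register write
        return pc + 4
    if op == "beq":                      # beq rs, rt, target
        _, rs, rt, target = instr
        a, b = regs[rs], regs[rt]        # step 2: register read
        difference = a - b               # step 3: ALU subtracts
        return target if difference == 0 else pc + 4  # step 4: write PC
    raise ValueError(f"unsupported instruction: {op}")


regs = {"$t0": 0, "$t1": 3, "$t2": 4, "$a0": 100}
mem = {100: 42}
pc = run_one(("add", "$t0", "$t1", "$t2"), regs, mem, 0)
print(pc, regs["$t0"])  # 4 7
```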

20 Abstract view of carrying out an instruction
Instruction fetch -> Data/register read -> Instruction execution -> Memory/register read/write

21 How to build a datapath for the MIPS ISA?
Datapath: the path along which an instruction is performed
Consider each major component
Build a datapath for each instruction class

22 Outline
Building a datapath
  1. Instruction fetch
  2. R-type instructions
  3. Load/store
  4. Branch
Build a datapath for each instruction class, then combine them

23 1. Instruction fetch
Elements needed:
  A place to store the instructions
  The address of the instructions (PC)
  A unit to increment the PC to the next instruction

24 Instruction fetch (cont.)
The adder always adds, therefore it needs no control lines
(Numbers 1 to 3 are callouts in the figure)

25 2. R-type instruction
R-format instructions: arithmetic-logic instructions
  add, sub
  and, or
  slt
Ex. add $t1, $t2, $t3
Format (field, bits): opcode 6 | rs 5 | rt 5 | rd 5 | shamt 5 | funct 6
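To make the R-format layout concrete, the sketch below splits a 32-bit instruction word into its six fields. The function name is hypothetical; the sample word 0x014B4820 is the standard MIPS encoding of `add $t1, $t2, $t3` (registers $9, $10, $11).

```python
def decode_r_format(word):
    """Split a 32-bit R-format instruction word into its fields."""
    return {
        "opcode": (word >> 26) & 0x3F,  # bits 31-26, 6 bits
        "rs":     (word >> 21) & 0x1F,  # bits 25-21, 5 bits
        "rt":     (word >> 16) & 0x1F,  # bits 20-16, 5 bits
        "rd":     (word >> 11) & 0x1F,  # bits 15-11, 5 bits
        "shamt":  (word >> 6) & 0x1F,   # bits 10-6, 5 bits
        "funct":  word & 0x3F,          # bits 5-0, 6 bits
    }


fields = decode_r_format(0x014B4820)  # add $t1, $t2, $t3
print(fields)
# {'opcode': 0, 'rs': 10, 'rt': 11, 'rd': 9, 'shamt': 0, 'funct': 32}
```

Note that the destination register rd comes first in assembly syntax but third among the register fields of the encoded word.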

26 Datapath elements for R-type inst.
Register file operations:
  1. Read register: supply the read register no.; the data is output
  2. Write register: supply the write register no. and the input data, with RegWrite=1

27 Datapath for R-type inst.
Format (field, bits): opcode 6 | rs 5 | rt 5 | rd 5 | shamt 5 | funct 6

28 3. Load/store from/to memory
I-format
Load/store examples:
  lw $t1, offset_value($t2)
  sw $t1, offset_value($t2)
Format (field, bits): opcode 6 | rs 5 | rt 5 | signed offset 16
The memory address is the offset plus the contents of $t2

29 Datapath elements for load/store
lw $t1, offset_value($t2)
Elements: register file, ALU, and data memory
Address: base + offset
Store -> MemWrite
Load -> MemRead
Sign-extend the 16-bit offset field
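The address calculation for lw and sw can be sketched as follows. The function names are hypothetical, and wrapping the result to 32 bits is an assumption about the address width.

```python
def sign_extend16(imm):
    """Extend a 16-bit field to a signed integer by copying bit 15."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm


def effective_address(base, imm16):
    """Base register value plus sign-extended offset, wrapped to 32 bits."""
    return (base + sign_extend16(imm16)) & 0xFFFFFFFF


print(sign_extend16(0xFFFC))                   # -4
print(hex(effective_address(0x1000, 8)))       # 0x1008
print(hex(effective_address(0x1000, 0xFFFC)))  # 0xffc
```

Without sign extension, the offset field 0xFFFC would be treated as +65532 instead of -4, and negative offsets could not be expressed.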

30 Datapath for load/store
Format (field, bits): opcode 6 | rs 5 | rt 5 | signed offset 16

31 4. Branch
I-format
Example: beq $t1, $t2, offset
PC-relative addressing
Format (field, bits): opcode 6 | rs 5 | rt 5 | signed offset 16

32 Details for branch: target address calculation
Base address for offset: PC+4
Instructions are word-aligned: the offset is shifted left 2 bits (two 0 bits are appended)
Target address: (PC+4) + (offset shifted left 2 bits)
Format (field, bits): opcode 6 | rs 5 | rt 5 | immediate 16
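The target address calculation can be written out directly. This is a minimal sketch: the function names and the sample PC value are assumptions, and the 16-bit field is sign-extended before the shift because the offset is signed.

```python
def sign_extend16(imm):
    """Extend a 16-bit field to a signed integer by copying bit 15."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm


def branch_target(pc, imm16):
    """Target = (PC + 4) + (sign-extended offset shifted left 2 bits)."""
    byte_offset = sign_extend16(imm16) << 2  # word offset -> byte offset
    return (pc + 4 + byte_offset) & 0xFFFFFFFF


print(hex(branch_target(0x00400000, 3)))       # 0x400010
print(hex(branch_target(0x00400010, 0xFFFF)))  # 0x400010 (offset -1)
```

An offset of -1 branches back to the branch instruction itself, since the base is PC+4 rather than PC.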

33 Datapath for branch
Format (field, bits): opcode 6 | rs 5 | rt 5 | signed offset 16

34 How to combine these datapaths?
We have shown datapaths for:
  Instruction fetch
  R-type instructions
  Load/store
  Branch
How to assemble the datapaths?
How to handle control lines?

35 Outline
Building a datapath
  Instruction fetch
  R-type instructions
  Load/store
  Branch
Single datapath implementation
Multiple cycle implementation

36 Single datapath implementation
Attempt to execute all instructions in 1 clock cycle
  No datapath resource can be used more than once per instruction
  Duplicated units: e.g. memory for instructions and memory for data
  Shared units: use a multiplexor to select the input
Analogy: one production line shared by add,…, lw, sw, beq,…

37 1. Combine R-type and lw/sw
R-type format (field, bits): opcode 6 | rs 5 | rt 5 | rd 5 | shamt 5 | funct 6
lw/sw format (field, bits): opcode 6 | rs 5 | rt 5 | signed offset 16

38 R-type + load/store

39 2. Add the instruction fetch

40 3. Add the branch unit

41 Simple datapath and control. See Fig 5.17 (p.307)

42 Trace the operation of the datapath
The trace is explained in 4 steps, but they actually operate in a single clock cycle:
  Instruction fetch
  Data/register read
  Instruction execution
  Memory/register read/write
Quiz later!

43 add $t1,$t2,$t3 => add $9, $10, $11 => fields rs=10, rt=11, rd=9, funct=32
Step 1. Instruction fetch

44 add $t1,$t2,$t3 => fields rs=10, rt=11, rd=9, funct=32
Step 2. Read source registers

45 add $t1,$t2,$t3 => fields rs=10, rt=11, rd=9, funct=32
Step 3. Instruction execution

46 add $t1,$t2,$t3 => fields rs=10, rt=11, rd=9, funct=32
Step 4. Write result

47 lw $t1, 0($t2) => lw $9, 0($10) => fields opcode=35, rs=10, rt=9, offset=0

48 How to combine the datapaths?
We have shown datapaths for:
  Instruction fetch
  R-type instructions
  Load/store
  Branch
How to assemble the datapaths?
How to handle control lines?

49 Simple datapath and control. See Fig 5.19 (p.360)

50 How to generate control?
Inputs: 6 bits + 6 bits
Truth table look-up
Output: 10 bits of control signals

51 Hierarchy of control units
Instructions (binary representation)
  -> Main control unit
       -> ALUop (2 bits) -> ALU control unit -> ALU control signals (3 bits)
       -> Other control signals (6 1-bit)

52 Why multiple levels of control?
Purpose:
  Reduce the size of the main control unit
  Potentially increase the speed of the control unit
ALUop (2 bits): instruction classification, defines 3 classes of instructions
  R-type
  Load/store
  Branch

53 Design the main control unit
Instructions (binary representation)
  -> Opcode[31-26] -> Main control unit
       -> ALUop (2 bits) -> ALU control unit -> ALU control signals (3 bits)
       -> Other control signals (6 1-bit)

54 Main control unit Observe instruction set

55 See Fig 5.19 Control signal for R-format?

56 1

57 Create truth table for main control unit
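The transcript does not reproduce the truth table itself. The sketch below shows what such a table can look like as a look-up keyed by opcode. The signal names and values are an assumption: they follow the single-cycle MIPS design commonly shown in the Patterson and Hennessy textbook, and `None` marks a don't-care entry.

```python
# Opcode (bits 31-26) -> control signals. None means "don't care".
MAIN_CONTROL = {
    0b000000: dict(name="R-format", RegDst=1, ALUSrc=0, MemtoReg=0,
                   RegWrite=1, MemRead=0, MemWrite=0, Branch=0, ALUOp=0b10),
    0b100011: dict(name="lw", RegDst=0, ALUSrc=1, MemtoReg=1,
                   RegWrite=1, MemRead=1, MemWrite=0, Branch=0, ALUOp=0b00),
    0b101011: dict(name="sw", RegDst=None, ALUSrc=1, MemtoReg=None,
                   RegWrite=0, MemRead=0, MemWrite=1, Branch=0, ALUOp=0b00),
    0b000100: dict(name="beq", RegDst=None, ALUSrc=0, MemtoReg=None,
                   RegWrite=0, MemRead=0, MemWrite=0, Branch=1, ALUOp=0b01),
}


def main_control(instruction_word):
    """Look up the control signals from the opcode field of a 32-bit word."""
    opcode = (instruction_word >> 26) & 0x3F
    return MAIN_CONTROL[opcode]


signals = main_control(0x014B4820)  # add $t1, $t2, $t3
print(signals["name"], signals["RegWrite"], bin(signals["ALUOp"]))
# R-format 1 0b10
```

Only lw reads data memory and only sw writes it; RegWrite is 1 exactly for the classes that produce a register result (R-format and lw).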

58

59 Design the ALU control unit
Instructions (binary representation)
  -> Opcode[31-26] -> Main control unit
       -> ALUop (2 bits) -> ALU control unit -> ALU control signals (3 bits)
       -> Other control signals (6 1-bit)

60 ALU control unit
Input 1 (2 bits): ALUop
Input 2 (6 bits): Instruction[5-0]
Output (3 bits): ALU control
See Figure 4.20

61 ALU control signal
ALU control lines (1 bit + 2 bits) and function:
  0 00  and
  0 01  or
  0 10  add
  1 10  sub
  1 11  slt
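The ALU control unit can be sketched as a small function of ALUop and the funct field. The encodings are an assumption taken from the standard textbook design (ALUop 00 for load/store, 01 for branch, 10 for R-type; 3-bit ALU control codes 000 and, 001 or, 010 add, 110 sub, 111 slt); the function name is hypothetical.

```python
# funct field (Instruction[5-0]) -> 3-bit ALU control, used for R-type only
FUNCT_TO_ALU = {
    0b100000: 0b010,  # add
    0b100010: 0b110,  # sub
    0b100100: 0b000,  # and
    0b100101: 0b001,  # or
    0b101010: 0b111,  # slt
}


def alu_control(alu_op, funct):
    """Return the 3-bit ALU control code for a given ALUop and funct."""
    if alu_op == 0b00:   # load/store: add base and offset, funct ignored
        return 0b010
    if alu_op == 0b01:   # branch: subtract to compare, funct ignored
        return 0b110
    if alu_op == 0b10:   # R-type: the funct field selects the operation
        return FUNCT_TO_ALU[funct]
    raise ValueError("unsupported ALUop")


print(format(alu_control(0b10, 0b100000), "03b"))  # 010 (add)
print(format(alu_control(0b10, 0b101010), "03b"))  # 111 (slt)
print(format(alu_control(0b01, 0), "03b"))         # 110 (sub for beq)
```

Because only R-type instructions need the funct field, the main control unit can stay small: it emits a 2-bit class and leaves the per-operation decoding to this second level.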

62 Instruction set formats
Annotation: determines the ALU operation

63 Creating the truth table

64

65 Why is a single-cycle implementation not used?
It is inefficient. Why?
  Single-cycle implementation => the clock cycle time is the same for every instruction
  Clock cycle = longest path = load
  Other instruction classes could fit in a shorter cycle

66 Performance evaluation for single-cycle implementation
Assume the operation times:
  Memory units: 2 ns
  ALU: 2 ns
  Register file: 1 ns
Calculate the necessary time for each instruction class

67 Memory units: 2 ns, ALU: 2 ns, Register file: 1 ns
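With these unit delays, the time per instruction class can be worked out by summing the units each class passes through. The sketch below rests on two assumptions: the list of units per class follows the usual textbook analysis, and everything else (multiplexors, control, PC access, sign extension, wires) is treated as having no delay.

```python
UNIT_NS = {"memory": 2, "alu": 2, "regfile": 1}

# Functional units used by each instruction class, in order (assumed).
CLASS_PATHS = {
    "R-type": ["memory", "regfile", "alu", "regfile"],            # fetch, read, ALU, write
    "lw":     ["memory", "regfile", "alu", "memory", "regfile"],  # plus data memory
    "sw":     ["memory", "regfile", "alu", "memory"],             # no register write
    "beq":    ["memory", "regfile", "alu"],                       # no memory, no write
    "j":      ["memory"],                                         # instruction fetch only
}


def class_time_ns(name):
    """Sum the delays of the units on the path of one instruction class."""
    return sum(UNIT_NS[unit] for unit in CLASS_PATHS[name])


times = {name: class_time_ns(name) for name in CLASS_PATHS}
single_cycle_clock_ns = max(times.values())

print(times)                  # {'R-type': 6, 'lw': 8, 'sw': 7, 'beq': 5, 'j': 2}
print(single_cycle_clock_ns)  # 8
```

The single-cycle clock must cover the slowest class, lw at 8 ns, so every other class spends part of each cycle idle.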

68 How to improve the single-cycle datapath?
A variable-speed clock for each instruction class
  Difficult to implement
Multi-cycle implementation

