Download presentation

Presentation is loading. Please wait.

Published byArjun Reder Modified about 1 year ago

1
1 Chapter Three Last revision: 4/17/2015

2
2 Arithmetic Where we've been: –Performance (seconds, cycles, instructions) –Abstractions: Instruction Set Architecture Assembly Language and Machine Language What's up ahead: –Implementing the Architecture 32 operation result a b ALU

3
3 Bits are just bits (no inherent meaning) — conventions define relationship between bits and numbers Binary numbers (base 2) 0000 0001 0010 0011 0100 0101 0110 0111 1000 1001... decimal: 0...2 n -1 Of course it gets more complicated: numbers are finite (overflow) fractions and real numbers negative numbers e.g., no MIPS subi instruction; addi can add a negative number) How do we represent negative numbers? i.e., which bit patterns will represent which numbers? Numbers

4
4 Sign Magnitude: One's Complement Two's Complement 000 = +0000 = +0000 = +0 001 = +1001 = +1001 = +1 010 = +2010 = +2010 = +2 011 = +3011 = +3011 = +3 100 = -0100 = -3100 = -4 101 = -1101 = -2101 = -3 110 = -2110 = -1110 = -2 111 = -3111 = -0111 = -1 Issues: balance, number of zeros, ease of operations Which one is best? Why? Possible Representations

5
5 32 bit signed numbers: 0000 0000 0000 0000 0000 0000 0000 0000 two = 0 ten 0000 0000 0000 0000 0000 0000 0000 0001 two = + 1 ten 0000 0000 0000 0000 0000 0000 0000 0010 two = + 2 ten... 0111 1111 1111 1111 1111 1111 1111 1110 two = + 2,147,483,646 ten 0111 1111 1111 1111 1111 1111 1111 1111 two = + 2,147,483,647 ten 1000 0000 0000 0000 0000 0000 0000 0000 two = – 2,147,483,648 ten 1000 0000 0000 0000 0000 0000 0000 0001 two = – 2,147,483,647 ten 1000 0000 0000 0000 0000 0000 0000 0010 two = – 2,147,483,646 ten... 1111 1111 1111 1111 1111 1111 1111 1101 two = – 3 ten 1111 1111 1111 1111 1111 1111 1111 1110 two = – 2 ten 1111 1111 1111 1111 1111 1111 1111 1111 two = – 1 ten maxint minint MIPS

6
6 Negating a two's complement number: invert all bits and add 1 –remember: “negate (+/-)” and “invert (1/0)” are quite different! Converting n bit numbers into numbers with more than n bits: –MIPS 16 bit immediate gets converted to 32 bits for arithmetic –copy the most significant bit (the sign bit) into the other bits 0010 -> 0000 0010 1010 -> 1111 1010 –"sign extension" (lbu vs. lb) Two's Complement Operations

7
7 Just like in grade school (carry/borrow 1s) 0111 0111 0110 + 0110- 0110- 0101 Two's complement operations easy –subtraction using addition of negative numbers 0111 + 1010 Overflow (result too large for finite computer word): –e.g., adding two n-bit numbers does not yield an n-bit number 0111 + 0001 note that overflow term is somewhat misleading, 1000 it does not mean a carry “overflowed” Addition & Subtraction

8
8 No overflow when adding a positive and a negative number No overflow when signs are the same for subtraction Overflow occurs when the value affects the sign: –overflow when adding two positives yields a negative –or, adding two negatives gives a positive –or, subtract a negative from a positive and get a negative –or, subtract a positive from a negative and get a positive Consider the operations A + B, and A – B –Can overflow occur if B is 0 ? –Can overflow occur if A is 0 ? Detecting Overflow

9
9 An exception (interrupt) occurs –Control jumps to predefined address for exception –Interrupted address is saved for possible resumption Details based on software system / language –example: flight control vs. homework assignment Don't always want to detect overflow — new MIPS instructions: addu, addiu, subu note: addiu still sign-extends! note: sltu, sltiu for unsigned comparisons –Roll over: circular buffers –Saturation: pixel lightness control Effects of Overflow

10
10 Problem: Consider a logic function with three inputs: A, B, and C. Output D is true if at least one input is true Output E is true if exactly two inputs are true Output F is true only if all three inputs are true Show the truth table for these three functions. Show the Boolean equations for these three functions. Show an implementation consisting of inverters, AND, and OR gates. Review: Boolean Algebra & Gates

11
11 Let's build an ALU to support the andi and ori instructions –we'll just build a 1 bit ALU, and use 32 of them Possible Implementation (sum-of-products): b a operation result opabres An ALU (arithmetic logic unit)

12
12 Selects one of the inputs to be the output, based on a control input Lets build our ALU using a MUX: S C A B 0 1 Review: The Multiplexor note: we call this a 2-input mux even though it has 3 inputs!

13
13 Not easy to decide the “best” way to build something –Don't want too many inputs to a single gate –Dont want to have to go through too many gates –for our purposes, ease of comprehension is important Let's look at a 1-bit ALU for addition: How could we build a 1-bit ALU for add, and, and or? How could we build a 32-bit ALU? Different Implementations c out = a b + a c in + b c in sum = a xor b xor c in

14
14 Building a 32 bit ALU R0 = a ^ b; R1 = a V b; R2 = a + b; Case (Op) { 0: R = R0; 1: R = R1; 2: R = R2 } Case (Op) { 0: R = a ^ b; 1: R = a V b; 2: R = a + b}

15
15 Two's complement approch: just negate b and add. How do we negate? A very clever solution: Result = A + (~B) + 1 What about subtraction (a – b) ?

16
16 Need to support the set-on-less-than instruction (slt) –remember: slt is an arithmetic instruction –produces a 1 if rs < rt and 0 otherwise –use subtraction: (a-b) < 0 implies a < b Need to support test for equality (beq $t5, $t6, $t7) –use subtraction: (a-b) = 0 implies a = b Tailoring the ALU to the MIPS

17
Supporting slt Can we figure out the idea? 0 3 Result Operation a 1 CarryIn CarryOut 0 1 Binvert b 2 Less a.

18
18

19
19 Test for equality Notice control lines: 000 = and 001 = or 010 = add 110 = subtract 111 = slt Note: zero is a 1 when the result is zero! 0 3 Result Operation a 1 CarryIn CarryOut 0 1 Binvert b 2 Less a.

20
20 Conclusion We can build an ALU to support the MIPS instruction set –key idea: use multiplexor to select the output we want –we can efficiently perform subtraction using two’s complement –we can replicate a 1-bit ALU to produce a 32-bit ALU Important points about hardware –all of the gates are always working –the speed of a gate is affected by the number of inputs to the gate –the speed of a circuit is affected by the number of gates in series (on the “critical path” or the “deepest level of logic”) Our primary focus: comprehension, however, –Clever changes to organization can improve performance (similar to using better algorithms in software) –we’ll look at two examples for addition and multiplication

21
21 Is a 32-bit ALU as fast as a 1-bit ALU? Is there more than one way to do addition? –two extremes: ripple carry and sum-of-products Can you see the ripple? How could you get rid of it? c 1 = b 0 c 0 + a 0 c 0 + a 0 b 0 c 2 = b 1 c 1 + a 1 c 1 + a 1 b 1 c 2 = c 3 = b 2 c 2 + a 2 c 2 + a 2 b 2 c 3 = c 4 = b 3 c 3 + a 3 c 3 + a 3 b 3 c 4 = Not feasible! Why? Problem: ripple carry adder is slow

22
22 An approach in-between our two extremes Motivation: – If we didn't know the value of carry-in, what could we do? –When would we always generate a carry? g i = a i b i –When would we propagate the carry? p i = a i + b i Did we get rid of the ripple? c 1 = g 0 + p 0 c 0 c 2 = g 1 + p 1 c 1 c 2 = c 3 = g 2 + p 2 c 2 c 3 = c 4 = g 3 + p 3 c 3 c 4 = Feasible! Why? Carry-lookahead adder

23
23 Can’t build a 16 bit adder this way... (too big) Could use ripple carry of 4-bit CLA adders Better: use the CLA principle again! Use principle to build bigger adders

24
24 More complicated than addition –accomplished via shifting and addition More time and more area Let's look at 3 versions based on gradeschool algorithm 0010 (multiplicand) __x_1011 (multiplier) Negative numbers: convert and multiply –there are better techniques, we won’t look at them Multiplication

25
25 Multiplication: Implementation

26
26 Second Version

27
27 Final Version

28
28 Divide: Paper & Pencil 1001 Quotient Divisor 1000 1001010 Dividend –1000 10 101 1010 –1000 10 Remainder (or Modulo result) See how big a number can be subtracted, creating quotient bit on each step Binary => 1 * divisor or 0 * divisor Dividend = Quotient x Divisor + Remainder => | Dividend | = | Quotient | + | Divisor | 3 versions of divide, successive refinement

29
29 DIVIDE HARDWARE Version 1 64-bit Divisor reg, 64-bit ALU, 64-bit Remainder reg, 32-bit Quotient reg Remainder Quotient Divisor 64-bit ALU Shift Right Shift Left Write Control 32 bits 64 bits

30
30 2b. Restore the original value by adding the Divisor register to the Remainder register, & place the sum in the Remainder register. Also shift the Quotient register to the left, setting the new least significant bit to 0. Divide Algorithm Version 1 Takes n+1 steps for n-bit Quotient & Rem. Remainder QuotientDivisor 0000 0111 00000010 0000 Test Remainder Remainder < 0 Remainder 0 1. Subtract the Divisor register from the Remainder register, and place the result in the Remainder register. 2a. Shift the Quotient register to the left setting the new rightmost bit to 1. 3. Shift the Divisor register right1 bit. Done Yes: n+1 repetitions (n = 4 here) Start: Place Dividend in Remainder n+1 repetition? No: < n+1 repetitions

31
31 Divide Algorithm I example (7 / 2) Remainder QuotientDivisor 0000 0111000000010 0000 1:1110 0111000000010 0000 2:0000 0111000000010 0000 3:0000 0111000000001 0000 1:1111 0111000000001 0000 2:0000 0111000000001 0000 3:0000 0111000000000 1000 1:1111 1111000000000 1000 2:0000 0111000000000 1000 3:0000 0111000000000 0100 1:0000 0011000000000 0100 2:0000 0011000010000 0100 3:0000 0011000010000 0010 1:0000 0001000010000 0010 2:0000 0001000110000 0010 3:0000 0001000110000 0010 Answer: Quotient = 3 Remainder = 1

32
32 Observations on Divide Version 1 1/2 bits in divisor always 0 => 1/2 of 64-bit adder is wasted => 1/2 of divisor is wasted Instead of shifting divisor to right, shift remainder to left? 1st step cannot produce a 1 in quotient bit (otherwise too big) => switch order to shift first and then subtract, can save 1 iteration

33
33 Divide Algorithm I example: wasted space Remainder QuotientDivisor 0000 0111000000010 0000 1:1110 0111000000010 0000 2:0000 0111000000010 0000 3:0000 0111000000001 0000 1:1111 0111000000001 0000 2:0000 0111000000001 0000 3:0000 0111000000000 1000 1:1111 1111000000000 1000 2:0000 0111000000000 1000 3:0000 0111000000000 0100 1:0000 0011000000000 0100 2:0000 0011000010000 0100 3:0000 0011000010000 0010 1:0000 0001000010000 0010 2:0000 0001000110000 0010 3:0000 0001000110000 0010

34
34 Divide: Paper & Pencil 01010 Quotient Divisor 0001 00001010 Dividend 00001 –0001 0000 0001 –0001 0 00 Remainder (or Modulo result) – Notice that there is no way to get a 1 in leading digit! (this would be an overflow, since quotient would have n+1 bits)

35
35 DIVIDE HARDWARE Version 2 32-bit Divisor reg, 32-bit ALU, 64-bit Remainder reg, 32-bit Quotient reg Remainder Quotient Divisor 32-bit ALU Shift Left Write Control 32 bits 64 bits Shift Left

36
36 Divide Algorithm Version 2 Remainder Quotient Divisor 0000 0111 0000 0010 3b. Restore the original value by adding the Divisor register to the left half of the Remainderregister, &place the sum in the left half of the Remainder register. Also shift the Quotient register to the left, setting the new least significant bit to 0. Test Remainder Remainder < 0 Remainder 0 2. Subtract the Divisor register from the left half of the Remainder register, & place the result in the left half of the Remainder register. 3a. Shift the Quotient register to the left setting the new rightmost bit to 1. 1. Shift the Remainder register left 1 bit. Done Yes: n repetitions (n = 4 here) nth repetition? No: < n repetitions Start: Place Dividend in Remainder

37
37 Observations on Divide Version 2 Eliminate Quotient register by combining with Remainder as shifted left –Start by shifting the Remainder left as before. –Thereafter loop contains only two steps because the shifting of the Remainder register shifts both the remainder in the left half and the quotient in the right half –The consequence of combining the two registers together and the new order of the operations in the loop is that the remainder will shifted left one time too many. – Thus the final correction step must shift back only the remainder in the left half of the register

38
38 DIVIDE HARDWARE Version 3 32-bit Divisor reg, 32 -bit ALU, 64-bit Remainder reg, (0-bit Quotient reg) Remainder (Quotient) Divisor 32-bit ALU Write Control 32 bits 64 bits Shift Left “HI”“LO”

39
39 Divide Algorithm Version 3 Remainder Divisor 0000 01110010 3b. Restore the original value by adding the Divisor register to the left half of the Remainderregister, &place the sum in the left half of the Remainder register. Also shift the Remainder register to the left, setting the new least significant bit to 0. Test Remainder Remainder < 0 Remainder 0 2. Subtract the Divisor register from the left half of the Remainder register, & place the result in the left half of the Remainder register. 3a. Shift the Remainder register to the left setting the new rightmost bit to 1. 1. Shift the Remainder register left 1 bit. Done. Shift left half of Remainder right 1 bit. Yes: n repetitions (n = 4 here) nth repetition? No: < n repetitions Start: Place Dividend in Remainder

40
40 Observations on Divide Version 3 Same Hardware as Multiply: just need ALU to add or subtract, and 64-bit register to shift left or shift right Hi and Lo registers in MIPS combine to act as 64-bit register for multiply and divide Signed Divides: Simplest is to remember signs, make positive, and complement quotient and remainder if necessary –Note: Dividend and Remainder must have same sign –Note: Quotient negated if Divisor sign & Dividend sign disagree e.g., –7 ÷ 2 = –3, remainder = –1 –What about? –7 ÷ 2 = –4, remainder = +1 Possible for quotient to be too large: if divide 64-bit integer by 1, quotient is 64 bits (“called saturation”)

41
41 Floating Point (a brief look) We need a way to represent –numbers with fractions, e.g., 3.1416 –very small numbers, e.g.,.000000001 –very large numbers, e.g., 3.15576 10 9 Representation: –sign, exponent, significand: (–1) sign significand 2 exponent –more bits for significand gives more accuracy –more bits for exponent increases range IEEE 754 floating point standard: –single precision: 8 bit exponent, 23 bit significand –double precision: 11 bit exponent, 52 bit significand

42
42 Recall Scientific Notation 6.02 x 10 1.673 x 10 23 -24 exponent radix (base) Mantissa decimal point Sign, magnitude IEEE F.P. ± 1.M x 2 e - 127 Issues: –Arithmetic (+, -, *, / ) –Representation, Normal form –Range and Precision –Rounding –Exceptions (e.g., divide by zero, overflow, underflow) –Errors –Properties ( negation, inversion, if A B then A - B 0 )

43
Floating-Point Arithmetic Representation of floating point numbers in IEEE 754 standard: single precision 1823 sign exponent: excess 127 binary integer mantissa: sign + magnitude, normalized binary significand w/ hidden integer bit: 1.M actual exponent is e = E - 127 SE M N = (-1) 2 (1.M) S E-127 0 < E < 255 0 = 0 00000000 0... 0 -1.5 = 1 01111111 10... 0 Magnitude of numbers that can be represented is in the range: 2 -126 (1.0) to2 127 (2 - 2 -23 ) which is approximately: 1.8 x 10 -38 to3.40 x 10 38 (integer comparison valid on IEEE Fl.Pt. numbers of same sign!)

44
44 IEEE 754 floating-point standard Leading “1” bit of significand is implicit Exponent is “biased” to make sorting easier –all 0s is smallest exponent all 1s is largest –bias of 127 for single precision and 1023 for double precision –summary: (–1) sign significand) 2 exponent – bias Example: –decimal: -.75 = -3/4 = -3/2 2 –binary: -.11 = -1.1 x 2 -1 –floating point: exponent = 126 = 01111110 –IEEE single precision: 10111111010000000000000000000000

45
45 Floating Point Complexities Operations are somewhat more complicated (see text) In addition to overflow we can have “underflow” Accuracy can be a big problem –IEEE 754 keeps two extra bits, guard and round –four rounding modes –positive divided by zero yields “infinity” –zero divide by zero yields “not a number” –other complexities Implementing the standard can be tricky Not using the standard can be even worse –see text for description of 80x86 and Pentium bug!

46
46 Floating-Point Addition Use, assume four decimal digits of significand and two decimal digits of exponent Step 1. Align number for smaller exponent – Step 2. Significand addition – Step 3. Normalized scientific notation (overflow or underflow check) – Step 4. Round the number –

47
47 Flow Diagram of Floating –Point Addition Start 1. Compare the exponents of the two numbers. Shift the smaller number to the right until its exponent would match the larger exponent 2. Add the significands 3. Normalize the sum, either shifting right and incrementing the exponent or shifting left and decrementing the exponent Overflow or Underflow? Exception 4. Round the siginificand to the appropriate number of bits Still normalized? Done Yes No Yes

48
48 Block Diagram of Floating-Point Addition SignExponentFractionSignExponentFraction Small ALU SignExponentFraction 0 1 Exponent difference 0 1 Big ALU Shift right Shift left or right Increment or decrement Rounding hardware Control

49
49 Floating-Point Multiplication (I) Use, assume four decimal digits of significand and two decimal digits of exponent Step 1. Adding exponent together, and subtract the bias from the sum: – Step 2. Multiplication on significands: –

50
50 Floating-Point Multiplication (II) Step 3. Normalized scientific notation (overflow or underflow check) – Step 4. Round the number – Step 5. Sign determination –

51
51 Flow Diagram of Floating –Point Multiplication Start 1. Add the biased exponents of the two numbers, subtracting the bias from the sum to get the biased exponent 2. Multiply the significands 3. Normalize the product if necessary, shifting it right and incrementing the exponent Overflow or Underflow? Exception 4. Round the siginificand to the appropriate number of bits Still normalized? Done Yes No Yes 5. Set the sign of the product to positive if the signs of the original operands are the same; if they differ make the sign negative

52
52 Floating-Point Instructions in MIPS NameExampleComments 32 floating-point register $f0, $f1, $f2, …, $f31MIPS floating-point registers are used in pairs for double precision numbers. 2 30 memory words Memory[0], Memory[4], …, Memory[4294967292] Accessed only by data transfer instructions. MIPS uses byte addresses, so sequential word addresses differ by 4. Memory holds data structures, such as arrays, and spilled register, such as those saved on procedure calls. CategoryInstructionExampleMeaningComments Arithmetic FP add singleadd.s $f2,$f4,$f6$f2 = $f4 + $f6FP add (single precision) FP subtract singlesub.s $f2,$f4,$f6$f2 = $f4 - $f6FP subtract (single precision) FP multiply singlemul.s $f2,$f4,$f6$f2 = $f4 × $f6FP multiply (single precision) FP divide singlediv.s $f2,$f4,$f6$f2 = $f4 / $f6FP divide (single precision) FP add doubleadd.d $f2,$f4,$f6$f2 = $f4 + $f6FP add (double precision) FP subtract doublesub.d $f2,$f4,$f6$f2 = $f4 - $f6FP subtract (double precision) FP multiply doublemul.d $f2,$f4,$f6$f2 = $f4 × $f6FP multiply (double precision) FP divide doublediv.d $f2,$f4,$f6$f2 = $f4 / $f6FP divide (double precision) Data transfer load word copr. 1lwc1 $f1,100($s2)$f1 = Memory[$s2+100]32-bit data to FP register store word copr. 1swc1 $f1,100($s2)Memory[$s2+100] = $f132-bit data to memory Conditional branch branch on FP truebc1t 25if(cond == 1) go to PC+4+100PC-relative branch if FP cond. branch on FP falsebc1f 25if(cond == 0) go to PC+4+100PC-relative branch if not cond. FP compare single (eq,ne,lt,le,gt,ge) c.lt.s $f2,$f4 if($f2<$f4) cond = 1; else cond = 0; FP compare less than single precision FP compare double (eq,ne,lt,le,gt,ge) c.lt.d $f2,$f4 if($f2<$f4) cond = 1; else cond = 0; FP compare less than double precision MIPS floating-point operands MIPS floating-point assembly language

53
53 MIPS Floating-Point Architecture NameFormatExampleComments add.sR17166420add.s $f2,$f4,$f6 sub.sR17166421sub.s $f2,$f4,$f6 mul.sR17166422mul.s $f2,$f4,$f6 div.sR17166423div.s $f2,$f4,$f6 add.dR17 6420add.d $f2,$f4,$f6 sub.dR17 6421sub.d $f2,$f4,$f6 mul.dR17 6422mul.d $f2,$f4,$f6 div.dR17 6423div.d $f2,$f4,$f6 lwc1I49202100lwc1 $f1,100($s2) swc1I57202100swc1 $f1,100($s2) bc1tI178125bc1t 25 bc1fI178025bc1f 25 c.lt.sR171642060c.lt.s $f2,$f4 c.lt.dR17 42060c.lt.d $f2,$f4 Field size6 bits5 bits 6 bitsAll MIPS instructions 32 bits

54
54 Floating-Point C Program Compiling Convert a temperature in Fahrenheit to Celsius C code: float f2c (float fahr) { return ((5.0/9.0)*(fahr-32.0)); } MIPS ‘code’: f2c:lwc1$f16, const5($gp) lwc1$f18, const9($gp) div.s$f16, $f16, $f18 lwc1$f18, const32($gp) sub.s$f18, $f12, $f18 mul.s$f0, $f16, $f18 jr$ra

55
55 Floating-Point C Procedure with Two-Dimensional Matrices (I) Perform matrix multiplication of X = X + Y *Z C code: void mm (double x[][], double y[][], double z[][]) { int i, j, k; for(i = 0; i != 32; i = i + 1) for(j = 0; j != 32; i = j + 1) for(k = 0; k != 32; i = k + 1) x[i][j] = x[i][j] + y[i][k] * z[k][j]; }

56
56 Floating-Point C Procedure with Two-Dimensional Matrices (II) MIPS ‘code’: mm:… li $t1, 32; loop termination li $s0, 0; i = 0 L1: li $s1, 0; j = 0 L2: li $s2, 0; k = 0 sll $t2, $s0, 5 addu $t2, $t2, $s1 sll $t2, $t2, 3; byte offset of [i][j] addu $t2, $a0, $t2; byte address of x[i][j] l.d $f4, 0($t2) L3: sll $t0, $s2, 5 addu $t0, $t0, $s1 sll $t0, $t0, 3; byte offset of [k][j] addu $t0, $a2, $t0; byte address of z[k][j] l.d $f16, 0($t0) sll $t2, $s0, 5 addu $t0, $t0, $s2 sll $t0, $t0, 3; byte offset of [i][k] addu $t0, $a1, $t0; byte address of y[i][k] l.d $f18, 0($t0) mul.d $f16, $f18, $f16 add.d $f4, $f4, $f16; f4 = x[i][j] + y[i][k] * z[k][j] addiu $s2, $s2, 1; $k = k + 1 bne $s2, $t1, L3 s.d $f4, 0($t2); x[i][j] = $f4 addiu $s1, $s1, 1; $j = j + 1 bne $s1, $t1, L2 addiu $s0, $s0, 1; $i = i + 1 bne $s0, $t1, L1 …

57
57 Rounding with Guard Digits Add 2.56 ten ×10 0 to 2.34 ten ×10 2, assuming three significant decimal digits and round to the nearest decimal number With guard and round digits –Shift the smaller number to right to align the exponents, so 2.56 ten ×10 0 become 0.0256 ten ×10 2 –Guard hold 5 digit, round hold 6 digit – yield 2.37 ten ×10 2 Without guard and round digits – – yield 2.36 ten ×10 2

58
58 Chapter Four Summary Computer arithmetic is constrained by limited precision Bit patterns have no inherent meaning but standards do exist –two’s complement –IEEE 754 floating point Computer instructions determine “meaning” of the bit patterns Performance and accuracy are important so there are many complexities in real machines (i.e., algorithms and implementation). We are ready to move on (and implement the processor) you may want to look back (Section 4.12 is great reading!)

Similar presentations

© 2017 SlidePlayer.com Inc.

All rights reserved.

Ads by Google