# Lecture 11 Oct 12 Circuits for floating-point operations addition multiplication division (only sketchy)

## Presentation on theme: "Lecture 11 Oct 12 Circuits for floating-point operations addition multiplication division (only sketchy)"— Presentation transcript:

Lecture 11 Oct 12 Circuits for floating-point operations addition multiplication division (only sketchy)

Floating-Point Addition Consider a 4-digit decimal example 9.999 × 10 1 + 1.610 × 10 –1 1. Align decimal points Shift number with smaller exponent 9.999 × 10 1 + 0.016 × 10 1 2. Add significands 9.999 × 10 1 + 0.016 × 10 1 = 10.015 × 10 1 3. Normalize result & check for over/underflow 1.0015 × 10 2 4. Round and renormalize if necessary 1.002 × 10 2

Floating-Point Addition Now consider a 4-digit binary example 1.000 2 × 2 –1 + –1.110 2 × 2 –2 (0.5 + – 0.4375) 1. Align binary points Shift number with smaller exponent 1.000 2 × 2 –1 + –0.111 2 × 2 –1 2. Add significands 1.000 2 × 2 –1 + –0.111 2 × 2 –1 = 0.001 2 × 2 –1 3. Normalize result & check for over/underflow 1.000 2 × 2 –4, with no over/underflow 4. Round and renormalize if necessary 1.000 2 × 2 –4 (no change) = 0.0625

FP Adder Hardware Much more complex than integer adder Doing it in one clock cycle would take too long Much longer than integer operations Slower clock would penalize all instructions FP adder usually takes several cycles Can be pipelined

FP Adder Hardware Step 1 Step 2 Step 3 Step 4

Addition and Subtraction Exponents of both operands must be equal before adding or subtracting significands Significands aligned by shifting the significand of the smaller operand |E1-E2| base-  positions to the right, increasing its exponent, until exponents are equal E1  E2 - Exponent of larger number not decreased - this will result in a significand larger than 1 - a larger significand adder required

Addition/Subtraction – post-normalization Addition - resultant significand M (sum of two aligned significands) is in range 1/   M < 2 If M >1 - a postnormalization step - shifting significand to the right to yield M3 and increasing exponent by one - is required (an exponent overflow may occur) Subtraction - Resultant significand M is in range 0  |M|< 1 - postnormalization step - shifting significand to left and decreasing exponent - is required if M<1/  (an exponent underflow may occur) In extreme cases, the postnormalization step may require a shift left operation over all bits in significand, yielding a zero result

Example F1=(0.100000) 16  16 3 ; F2=(0.FFFFFF) 16  16 2 Short IBM format ; calculate F1 – F2 Significand of smaller number (F2) is shifted to the right - least-significant digit lost Shift is time consuming - result is wrong

Example - Continued Correct result (with “unlimited" number of significand digits) Error (also called loss of significance) is 0.1  16 -2 - 0.1  16 -3 = 0.F  16 -3 Solution to problem - guard digits - additional digits to the right of the significand to hold shifted- out digits In example - a single (hexadecimal) guard digit is sufficient

Steps in Addition/Subtraction of Floating-Point Numbers Step 1: Calculate difference d of the two exponents - d=|E1 - E2| Step 2: Shift significand of smaller number by d base-  positions to the right Step 3: Add aligned significands and set exponent of result to exponent of larger operand Step 4: Normalize resultant significand and adjust exponent if necessary Step 5: Round resultant significand and adjust exponent if necessary

Floating Point Complexities Operations are somewhat more complicated In addition to overflow we can have “underflow” Accuracy can be a big problem IEEE 754 keeps two extra bits, guard and round four rounding modes positive divided by zero yields “infinity” zero divide by zero yields “not a number” other complexities Implementing the standard can be tricky Not using the standard can be even worse see text for description of 80x86 and Pentium bug!

Floating-Point Multiplication Consider a 4-digit decimal example 1.110 × 10 10 × 9.200 × 10 –5 1. Add exponents For biased exponents, subtract bias from sum New exponent = 10 + –5 = 5 2. Multiply significands 1.110 × 9.200 = 10.212  10.212 × 10 5 3. Normalize result & check for over/underflow 1.0212 × 10 6 4. Round and renormalize if necessary 1.021 × 10 6 5. Determine sign of result from signs of operands +1.021 × 10 6

Floating-Point Multiplication Now consider a 4-digit binary example 1.000 2 × 2 –1 × –1.110 2 × 2 –2 (0.5 × –0.4375) 1. Add exponents Unbiased: –1 + –2 = –3 Biased: (–1 + 127) + (–2 + 127) = –3 + 254 – 127 = –3 + 127 2. Multiply significands 1.000 2 × 1.110 2 = 1.1102  1.110 2 × 2 –3 3. Normalize result & check for over/underflow 1.110 2 × 2 –3 (no change) with no over/underflow 4. Round and renormalize if necessary 1.110 2 × 2 –3 (no change) 5. Determine sign: +ve × –ve  –ve –1.110 2 × 2 –3 = –0.21875

FP Arithmetic Hardware FP multiplier is of similar complexity to FP adder But uses a multiplier for significands instead of an adder FP arithmetic hardware usually does Addition, subtraction, multiplication, division, reciprocal, square-root FP  integer conversion Operations usually takes several cycles Can be pipelined

Floating-Point Dividers Block diagram of a floating-point divider Rounding considerations Quotient must be produced with two extra bits (G and R), in case of the need for a normalizing left shift s1 / s2  (0.5, 2): may need postshifting (  s1  b e1 ) / (  s2  b e2 ) = (  s1 / s2 )  b e1-e2 Overflow or underflow can occur during division or normalization

round and sticky bits Original source: CMU 18-447 S’09 L4-21© 2009 J. C. Hoe

FP Instructions in MIPS FP hardware is coprocessor 1 Adjunct processor that extends the ISA Separate FP registers 32 single-precision: \$f0, \$f1, … \$f31 Paired for double-precision: \$f0/\$f1, \$f2/\$f3, … FP instructions operate only on FP registers Programs generally don’t do integer ops on FP data, or vice versa More registers with minimal code-size impact FP load and store instructions lwc1, ldc1, swc1, sdc1 e.g., ldc1 \$f8, 32(\$sp)

FP Instructions in MIPS Single-precision arithmetic add.s, sub.s, mul.s, div.s e.g., add.s \$f0, \$f1, \$f6 Double-precision arithmetic add.d, sub.d, mul.d, div.d e.g., mul.d \$f4, \$f4, \$f6 Single- and double-precision comparison c.xx.s, c.xx.d (xx is eq, lt, le, …) Sets or clears FP condition-code bit e.g. c.lt.s \$f3, \$f4 Branch on FP condition code true or false bc1t, bc1f e.g., bc1t TargetLabel

Floating-Point Instructions Floating-point arithmetic instructions for MIPS: add.s \$f0,\$f8,\$f10# set \$f0 to (\$f8) + fp (\$f10) sub.d \$f0,\$f8,\$f10# set \$f0 to (\$f8) – fp (\$f10) mul.d \$f0,\$f8,\$f10# set \$f0 to (\$f8)  fp (\$f10) div.s \$f0,\$f8,\$f10# set \$f0 to (\$f8) / fp (\$f10) neg.s \$f0,\$f8# set \$f0 to – (\$f8) The common floating-point instruction format for MIPS and components for arithmetic instructions. The extension (ex) field distinguishes single (* = s) from double (* = d) operands.

Floating-Point Format Conversions

Floating-Point Data Transfers MIPS instructions for floating-point load, store, and move: lwc1 \$f8,40(\$s3) # load mem[40+(\$s3)] into \$f8 swc1 \$f8,A(\$s3) # store (\$f8) into mem[A+(\$s3)] mov.s \$f0,\$f8 # load \$f0 with (\$f8) mov.d \$f0,\$f8 # load \$f0,\$f1 with (\$f8,\$f9) mfc1 \$t0,\$f12 # load \$t0 with (\$f12) mtc1 \$f8,\$t4 # load \$f8 with (\$t4) Instructions for floating-point data movement in MIPS.

Floating-Point Branches and Comparisons MIPS instructions for floating-point load, store, and move: bc1t L # branch on fp flag true bc1f L # branch on fp flag false c.eq.* \$f0,\$f8 # if (\$f0)=(\$f8), set flag to “true” c.lt.* \$f0,\$f8 # if (\$f0)<(\$f8), set flag to “true” c.le.* \$f0,\$f8 # if (\$f0)  (\$f8), set flag to “true” Floating-point branch and comparison instructions in MIPS.

Floating-Point Instructions MIPS InstructionUsage Move s/d registersmov.* fd,fs Move fm coprocessor 1mfc1 rt,rd Move to coprocessor 1mtc1 rd,rt Add single/doubleadd.* fd,fs,ft Subtract single/doublesub.* fd,fs,ft Multiply single/doublemul.* fd,fs,ft Divide single/doublediv.* fd,fs,ft Negate single/doubleneg.* fd,fs Compare equal s/dc.eq.* fs,ft Compare less s/dc.lt.* fs,ft Compare less or eq s/dc.le.* fs,ft Convert integer to singlecvt.s.w fd,fs Convert integer to double cvt.d.w fd,fs Convert single to doublecvt.d.s fd,fs Convert double to singlecvt.s.d fd,fs Convert single to integercvt.w.s fd,fs Convert double to integer cvt.w.d fd,fs Load word coprocessor 1lwc1 ft,imm(rs) Store word coprocessor 1swc1 ft,imm(rs) Branch coproc 1 truebc1t L Branch coproc 1 falsebc1f L Copy Control transfer Conversions Arithmetic Memory access ex # 0 4 # 0 1 0 1 rs 8 fn 6 0 1 2 3 7 50 60 62 32 33 32 36 * s/d for single/double # 0/1 for single/double

FP Example: °F to °C C code: float f2c (float fahr) { return ((5.0/9.0)*(fahr - 32.0)); } fahr in \$f12, result in \$f0, constants in addresses specified. Compiled MIPS code: f2c: lwc1 \$f16, const5 lwc2 \$f18, const9 div.s \$f16, \$f16, \$f18 lwc1 \$f18, const32 sub.s \$f18, \$f12, \$f18 mul.s \$f0, \$f16, \$f18 jr \$ra

FP Example: Array Multiplication X = X + Y × Z All 32 × 32 matrices, 64-bit double-precision elements C code: void mm (double x[][], double y[][], double z[][]) { int i, j, k; for (i = 0; i! = 32; i= i + 1) for (j = 0; j! = 32; j = j + 1) for (k = 0; k! = 32; k = k + 1) x[i][j] = x[i][j] + y[i][k] * z[k][j]; } Addresses of x, y, z in \$a0, \$a1, \$a2, and i, j, k in \$s0, \$s1, \$s2

FP Example: Array Multiplication MIPS code: li \$t1, 32 # \$t1 = 32 (row size/loop end) li \$s0, 0 # i = 0; initialize 1st for loop L1: li \$s1, 0 # j = 0; restart 2nd for loop L2: li \$s2, 0 # k = 0; restart 3rd for loop sll \$t2, \$s0, 5 # \$t2 = i * 32 (size of row of x) addu \$t2, \$t2, \$s1 # \$t2 = i * size(row) + j sll \$t2, \$t2, 3 # \$t2 = byte offset of [i][j] addu \$t2, \$a0, \$t2 # \$t2 = byte address of x[i][j] l.d \$f4, 0(\$t2) # \$f4 = 8 bytes of x[i][j] L3: sll \$t0, \$s2, 5 # \$t0 = k * 32 (size of row of z) addu \$t0, \$t0, \$s1 # \$t0 = k * size(row) + j sll \$t0, \$t0, 3 # \$t0 = byte offset of [k][j] addu \$t0, \$a2, \$t0 # \$t0 = byte address of z[k][j] l.d \$f16, 0(\$t0) # \$f16 = 8 bytes of z[k][j] …

FP Example: Array Multiplication … sll \$t0, \$s0, 5 # \$t0 = i*32 (size of row of y) addu \$t0, \$t0, \$s2 # \$t0 = i*size(row) + k sll \$t0, \$t0, 3 # \$t0 = byte offset of [i][k] addu \$t0, \$a1, \$t0 # \$t0 = byte address of y[i][k] l.d \$f18, 0(\$t0) # \$f18 = 8 bytes of y[i][k] mul.d \$f16, \$f18, \$f16 # \$f16 = y[i][k] * z[k][j] add.d \$f4, \$f4, \$f16 # f4=x[i][j] + y[i][k]*z[k][j] addiu \$s2, \$s2, 1 # \$k k + 1 bne \$s2, \$t1, L3 # if (k != 32) go to L3 s.d \$f4, 0(\$t2) # x[i][j] = \$f4 addiu \$s1, \$s1, 1 # \$j = j + 1 bne \$s1, \$t1, L2 # if (j != 32) go to L2 addiu \$s0, \$s0, 1 # \$i = i + 1 bne \$s0, \$t1, L1 # if (i != 32) go to L1

A simple example – with MIPS implementation Example: Write a MIPS program to load the smallest positive real number into the co- processor, multiply it by 10 and display the result. Loading a given number into a FP register involves using the instruction l.s \$fxx num # num is a memory location

Code in MIPS. text.globl main main: l.s \$f6, small l.s \$f8, ten mul.s \$f12, \$f6, \$f8 exit: li \$v0, 4 # print message la \$a0, msg2 syscall li \$v0,2 # print sum syscall li \$v0,4 # print an end of line la \$a0, cr syscall li \$v0,10 # exit syscall.data small:.word 0x00000001 # this is number 2^(-140) ten:.float 10.0 msg2:.asciiz "Ten times 2^(-140) = " cr:.asciiz "\n"

MARS output

Example of FP underflow In the previous example, suppose we multiply the result by 0.5 what will be the result? We can try this in MARS. This is the correct answer, since the operation has caused FP underflow.

Accurate Arithmetic IEEE Std 754 specifies additional rounding control Extra bits of precision (guard, round, sticky) Choice of rounding modes Allows programmer to fine-tune numerical behavior of a computation Not all FP units implement all options Most programming languages and FP libraries just use defaults Trade-off between hardware complexity, performance, and market requirements

Associativity Parallel programs may interleave operations in unexpected orders Assumptions of associativity may fail §3.6 Parallelism and Computer Arithmetic: Associativity Need to validate parallel programs under varying degrees of parallelism

Concluding Remarks ISAs support arithmetic Signed and unsigned integers Floating-point approximation to reals Bounded range and precision Operations can overflow and underflow MIPS ISA Core instructions: 54 most frequently used 100% of SPECINT, 97% of SPECFP Other instructions: less frequent