1 Chapter 3 Arithmetic for Computers. J.C. Chen (陳瑞奇), Department of Computer Science and Information Engineering, Asia University (亞洲大學資訊工程學系). Adapted from class notes by Prof. C.T. King, NTHU, Prof. M.J. Irwin, PSU and Prof. D.


1 Chapter 3: Arithmetic for Computers
J.C. Chen (陳瑞奇), Department of Computer Science and Information Engineering, Asia University (亞洲大學資訊工程學系)
Adapted from class notes by Prof. C.T. King, NTHU, Prof. M.J. Irwin, PSU and Prof. D. Patterson, UCB

2 Review: MIPS Addressing Modes
1. Register addressing (operand): fields op, rs, rt, rd, funct; the operand is a word in a register
2. Base addressing (operand): fields op, rs, rt, offset; base register plus offset selects a word or byte operand in memory
3. Immediate addressing (operand): fields op, rs, rt, operand; the operand is held in the instruction itself
4. PC-relative addressing (instruction): fields op, rs, rt, offset; Program Counter (PC) plus offset selects the branch destination instruction in memory
5. Pseudo-direct addressing (instruction): fields op, jump address; the jump address concatenated (||) with the upper bits of the Program Counter (PC) selects the jump destination instruction in memory

3 Fig. 3.1, p.169

4 MIPS Arithmetic Logic Unit (ALU)
- Must support the arithmetic/logic operations of the ISA:
  - add, addi, addiu, addu
  - sub, subu, neg
  - mult, multu, div, divu
  - sqrt
  - and, andi, nor, or, ori, xor, xori
  - beq, bne, slt, slti, sltiu, sltu
- [Figure: ALU with 32-bit inputs A and B, a 4-bit operation select m, a 32-bit result, 1-bit zero and ovf outputs, and CarryOut]
- With special handling for
  - sign extend: addi, addiu, slti, sltiu
  - zero extend: andi, ori, xori, lbu
  - no overflow detected: addu, addiu, subu, multu, divu, sltiu, sltu

5 MIPS Arithmetic Logic Unit (cont.), Fig. C.5.14
- Instruction groups labelled on the figure: add, addi, addiu, addu; sub, subu, beq, bne
- FIGURE C.5.13: The values of the three ALU control lines, Bnegate and Operation, and the corresponding ALU operations.

6 Review: ALU Construction
The ALU is built from 4 basic hardware components:
1. AND gate (c = a·b)
2. OR gate (c = a + b)
3. Inverter (c = NOT a)
4. Multiplexor (if d == 0, c = a; else c = b)

7 3.2 Addition & Subtraction, p.225 (Chinese edition p.227), Fig. 3.1: addition and subtraction examples

8 Addition & Subtraction (cont.)
- Just like in grade school (carry/borrow 1s)
- Two's complement operations are easy: subtraction is done by adding the negated number
- Overflow (result too large for a finite computer word): e.g., adding two n-bit numbers can give a result that does not fit in n bits

9 Review: A Full Adder
[Figure: 1-bit full adder with inputs A, B and carry_in and outputs S and carry_out, with its truth table (columns A, B, carry_in, carry_out, S)]
S = A xor B xor carry_in (odd parity function)
carry_out = A&B | A&carry_in | B&carry_in
- How can we use it to build a 32-bit adder?
- How can we modify it easily to build an adder/subtractor?
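As a minimal sketch of the two full-adder equations (Python is not used in the slides, and the function name `full_adder` is only an illustrative choice), the adder can be modelled on single bits and its truth table printed:

```python
def full_adder(a, b, carry_in):
    """1-bit full adder on bits 0/1, following the slide's two equations."""
    s = a ^ b ^ carry_in                                   # odd parity of the three inputs
    carry_out = (a & b) | (a & carry_in) | (b & carry_in)  # 1 when at least two inputs are 1
    return s, carry_out


# Print the truth table in the column order A, B, carry_in, carry_out, S
for a in (0, 1):
    for b in (0, 1):
        for c in (0, 1):
            s, cout = full_adder(a, b, c)
            print(a, b, c, cout, s)
```

In every row, carry_out and S read together as a 2-bit number equal the count of inputs that are 1.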

10 A 32-bit Ripple Carry Adder/Subtractor
- Remember 2's complement is just: complement all the bits, then add a 1 in the least significant bit
- [Figure: 32 1-bit full adders (FA) chained from c0 = carry_in to c32 = carry_out, with inputs A0..A31 and B0..B31 and sum outputs S0..S31. An add/sub control line (0 = add, 1 = sub) selects each B input: B0 if control = 0, !B0 if control = 1, so that control = 1 gives A - B. Worked 4-bit example with A = 0111.]
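The adder/subtractor can be sketched in the same bit-level style. This is an illustration under assumptions, not the slide's circuit: the names `add_sub` and `width` are hypothetical, the control line is assumed to feed c0 to supply the "+1" of two's complement, and the operand 0110 in the usage lines is an arbitrary example value.

```python
def full_adder(a, b, carry_in):
    """1-bit full adder: returns (sum bit, carry out)."""
    s = a ^ b ^ carry_in
    carry_out = (a & b) | (a & carry_in) | (b & carry_in)
    return s, carry_out


def add_sub(a, b, control, width=32):
    """Ripple-carry adder/subtractor on unsigned `width`-bit patterns.

    control = 0 computes a + b, control = 1 computes a - b.
    Returns (result bit pattern, carry out of the most significant bit).
    """
    carry = control                        # assumed: c0 = control adds the 1 for two's complement
    result = 0
    for i in range(width):
        a_i = (a >> i) & 1
        b_i = ((b >> i) & 1) ^ control     # B_i if control = 0, !B_i if control = 1
        s_i, carry = full_adder(a_i, b_i, carry)
        result |= s_i << i
    return result, carry


print(format(add_sub(0b0111, 0b0110, 0, width=4)[0], "04b"))  # 7 + 6 as a 4-bit pattern
print(format(add_sub(0b0111, 0b0110, 1, width=4)[0], "04b"))  # 7 - 6 as a 4-bit pattern
```

The XOR with `control` plays the role of the multiplexor that chooses between B and !B.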

11 Overflow Detection
- Overflow: the result is too large to represent in 32 bits
  - No overflow when adding a positive and a negative number
  - No overflow when signs are the same for subtraction
- Overflow occurs when
  - adding two positives yields a negative
  - or, adding two negatives gives a positive
  - or, subtracting a negative from a positive gives a negative
  - or, subtracting a positive from a negative gives a positive
- On your own: prove you can detect overflow by
  - carry into MSB xor carry out of MSB, e.g., for 4-bit signed numbers, -4 - 5
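The "carry into MSB xor carry out of MSB" rule can be checked exhaustively for 4-bit numbers. The sketch below is not from the slides; `add_with_overflow` and `to_signed` are hypothetical helper names.

```python
def add_with_overflow(a, b, width=4):
    """Add two `width`-bit two's-complement patterns.

    Returns (result pattern, overflow flag), where the flag is
    carry into the MSB XOR carry out of the MSB.
    """
    mask = (1 << width) - 1
    low_mask = mask >> 1                       # all bits below the MSB
    a &= mask
    b &= mask
    carry_into_msb = ((a & low_mask) + (b & low_mask)) >> (width - 1)
    total = a + b
    carry_out_of_msb = total >> width
    return total & mask, bool(carry_into_msb ^ carry_out_of_msb)


def to_signed(pattern, width=4):
    """Interpret a `width`-bit pattern as a two's-complement integer."""
    return pattern - (1 << width) if pattern >> (width - 1) else pattern


result, overflow = add_with_overflow(-4, -5)   # -9 does not fit in 4 bits
print(format(result, "04b"), to_signed(result), overflow)
```

Subtraction is covered by the same check once the second operand has been negated.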


13 Overflow Detection (cont.)

14 Overflow Conditions, p.226 (Chinese edition p.228), Fig. 3.2

15 Effects of Overflow, p.227 (Chinese edition p.229)
- An exception (interrupt) occurs
  - Control jumps to a predefined address for the exception
  - The interrupted address is saved for possible resumption (EPC)
- Don't always want to detect overflow: MIPS provides addu, addiu, subu, which do not raise an overflow exception

16 p.227 (Chinese edition p.229): move from coprocessor 0, then jr $s1

17 MIPS R2000 CPU and FPU, Fig. B.10.1

18 Arithmetic for Multimedia
- Graphics and media processing operates on vectors of 8-bit and 16-bit data
  - Use a 64-bit adder with a partitioned carry chain: operate on 8×8-bit, 4×16-bit, or 2×32-bit vectors
  - SIMD (single-instruction, multiple-data)
- Saturating operations
  - On overflow, the result is the largest representable value (c.f. 2s-complement modulo arithmetic)
  - E.g., clipping in audio, saturation in video
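To make the contrast between saturating and modulo arithmetic concrete, here is a small sketch of a partitioned 64-bit add over eight unsigned 8-bit lanes. The function name `lanes_add` and the choice of unsigned lanes are assumptions made for illustration only.

```python
def lanes_add(x, y, saturate):
    """Add eight unsigned 8-bit lanes packed into the 64-bit integers x and y."""
    result = 0
    for lane in range(8):
        shift = 8 * lane
        a = (x >> shift) & 0xFF
        b = (y >> shift) & 0xFF
        total = a + b                  # the carry is never passed to the next lane
        if saturate:
            total = min(total, 0xFF)   # clamp to the largest representable value
        else:
            total &= 0xFF              # modulo-256 wraparound
        result |= total << shift
    return result


print(hex(lanes_add(0x01FF, 0x0101, saturate=False)))  # low lane wraps around
print(hex(lanes_add(0x01FF, 0x0101, saturate=True)))   # low lane clips at 0xff
```

The wrapped low lane jumps from the brightest value to the darkest, which is why media code usually prefers the saturating form.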

19 Clipping in Audio
Clipping is a form of waveform distortion that occurs when an amplifier is overdriven and attempts to deliver an output voltage or current beyond its maximum capability.

20 Multiplication, p.230 (Chinese edition p.233)
Unsigned multiply example, (1000)_2 × (1011)_2:
        1000      (multiplicand)
      × 1011      (multiplier)
     -------
        1000
       1000_
      0000__
     1000___
     -------
     1011000
Example, (0010)_2 × (0011)_2:
        0010
      × 0011
     -------
        0010
       0010_
      0000__
     0000___
     -------
     0000110

21 Multiplication (cont.)
- Binary multiplication is just a bunch of left shifts and adds
- [Figure: an n-bit multiplicand and an n-bit multiplier form a partial product array, which sums to a 2n-bit double precision product]
- The partial products can be formed in parallel and added in parallel for faster multiplication

22 Multiplication (cont.)
- More complicated than addition
  - accomplished via shifting and addition
- More time and more area
- Let's look at 3 versions based on a grade-school algorithm: 0010 (multiplicand) × 1011 (multiplier)
- Negative numbers: convert and multiply
  - there are better techniques, we won't look at them

23 Multiplication: Implementation, first version
Datapath (Fig. 3.4), Control (Fig. 3.5)
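The first version can be sketched as a loop over the multiplier bits. This is an illustration rather than a transcription of Fig. 3.4 and Fig. 3.5: the name `multiply_v1` is hypothetical, and Python integers stand in for the 2n-bit multiplicand and product registers.

```python
def multiply_v1(multiplicand, multiplier, n=32):
    """Unsigned shift-and-add multiply of two n-bit values; returns the 2n-bit product."""
    assert 0 <= multiplicand < (1 << n) and 0 <= multiplier < (1 << n)
    product = 0                      # 2n-bit product register
    for _ in range(n):               # one repetition per multiplier bit
        if multiplier & 1:           # test the multiplier's least significant bit
            product += multiplicand  # add the (shifted) multiplicand to the product
        multiplicand <<= 1           # shift the multiplicand register left by 1
        multiplier >>= 1             # shift the multiplier register right by 1
    return product


print(format(multiply_v1(0b1000, 0b1011, n=4), "07b"))
print(format(multiply_v1(0b0010, 0b0011, n=4), "07b"))
```

Each loop iteration corresponds to one row of the grade-school partial product array.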

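24 Multiplication: Refined Version, Fig. 3.6, p.233 (Chinese edition p.236)
- The multiplier starts in the right half of the product register
- Add the multiplicand to the left half of the product, then place the result in the left half of the product register
- [Figure: Multiplicand, 32-bit ALU, and a product register with a shift-right control; annotation "What goes here?" on the right half of the product register]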
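A sketch of the refined scheme, with the multiplier sharing the product register, might look as follows. The name `multiply_refined` is hypothetical, and the extra carry bit that the hardware needs above the left half is handled implicitly by Python's unbounded integers.

```python
def multiply_refined(multiplicand, multiplier, n=32):
    """Unsigned multiply using one combined product/multiplier register."""
    assert 0 <= multiplicand < (1 << n) and 0 <= multiplier < (1 << n)
    product = multiplier                    # multiplier starts in the right half
    for _ in range(n):
        if product & 1:                     # low bit is the current multiplier bit
            product += multiplicand << n    # add the multiplicand into the left half
        product >>= 1                       # shift the whole register right by 1
    return product


print(multiply_refined(0b0010, 0b0011, n=4))
```

Compared with the first version, the multiplicand never moves and only an n-bit addition is needed per step, at the cost of shifting the product register.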

25 Faster Multiplier, Fig. 3.8, p.236 (Chinese edition p.239)
- Uses multiple adders
  - Cost/performance tradeoff
- [Fig. 3.8: fast multiplication hardware]

26 MIPS Multiply Instruction
- Multiply produces a double precision product
    mult $t1, $t2    # hi||lo = $t1 * $t2
- mflo: move from register Lo

27 MIPS Multiply Instruction (cont.), p.181 (Chinese edition p.178)
- Multiply produces a double precision product
    mult $s0, $s1    # hi||lo = $s0 * $s1
  - The low-order word of the product is left in processor register lo and the high-order word is left in register hi
  - Instructions mfhi rd and mflo rd are provided to move the product to (user accessible) registers in the register file
  - Instruction format fields: op, rs, rt, rd, shamt, funct
- Multiplies are done by fast, dedicated hardware and are much more complex (and slower) than adders
- Hardware dividers are even more complex and even slower

28 Division
- Division is just a bunch of quotient digit guesses and right shifts and subtracts
- Dividend = Quotient × Divisor + Remainder
- [Figure: dividend, divisor, partial remainder array, quotient and remainder, with their bit widths given in terms of n]

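29 Division: First version, Fig., p.186 (Chinese edition p.183), p.237 (Chinese edition p.241)
- Each repetition subtracts the divisor from the remainder; if the result is negative, the divisor is added back
- Repetitions = #dividend - #divisor + 1 = …
- [Worked example annotated with Subtract and Add steps]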
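The subtract-then-restore loop can be sketched as below. This is a hedged illustration, not the figure's control flow verbatim: `divide_v1` is a hypothetical name, the divisor is assumed to start in the left half of a 2n-bit register, and n + 1 repetitions are assumed.

```python
def divide_v1(dividend, divisor, n=32):
    """Unsigned restoring division of n-bit values; returns (quotient, remainder)."""
    assert 0 <= dividend < (1 << n) and 0 < divisor < (1 << n)
    remainder = dividend             # remainder register starts out holding the dividend
    divisor <<= n                    # divisor starts in the left half of a 2n-bit register
    quotient = 0
    for _ in range(n + 1):           # n + 1 repetitions
        remainder -= divisor         # Subtract
        if remainder >= 0:
            quotient = (quotient << 1) | 1   # the guess was right: shift in a 1
        else:
            remainder += divisor             # Add: restore the old remainder
            quotient = quotient << 1         # shift in a 0
        divisor >>= 1                # shift the divisor right by 1
    return quotient, remainder


print(divide_v1(0b0111, 0b0010, n=4))   # 7 divided by 2
```

The check `Dividend = Quotient × Divisor + Remainder` from the Division slide holds for every result this loop returns.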

30 Division (cont.), Fig., p.238 (Chinese edition p.242)

31 Optimized Divider, Fig., p.183 (Chinese edition p.180), p.240 (Chinese edition p.244)
- [Figure: Divisor, 32-bit ALU, combined Remainder/Quotient register with shift right, shift left and write controls, and a control test]
- [Worked example annotated with Subtract and Add steps: 32 repetitions; final repetition: shift Remainder right]

32 MIPS Divide Instruction
- Divide generates the remainder in hi and the quotient in lo
    div $s0, $s1    # lo = $s0 / $s1 (quotient)
                    # hi = $s0 mod $s1 (remainder)
  - Instructions mfhi rd and mflo rd are provided to move the quotient and remainder to (user accessible) registers in the register file
  - Instruction format fields: op, rs, rt, rd, shamt, funct
- As with multiply, divide ignores overflow, so software must determine if the quotient is too large. Software must also check the divisor to avoid division by 0.
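For signed operands, the sketch below models the lo/hi pair under one stated assumption that the slide does not spell out: the quotient is truncated toward zero, so the remainder takes the sign of the dividend. The name `mips_div` is hypothetical, and raising an exception on a zero divisor is a modelling choice standing in for the software check.

```python
def mips_div(dividend, divisor):
    """Model of signed div: returns (lo, hi) = (quotient, remainder)."""
    if divisor == 0:
        # Modelling choice: software is expected to test the divisor first.
        raise ZeroDivisionError("check the divisor before dividing")
    magnitude = abs(dividend) // abs(divisor)
    # Assumed convention: the quotient is truncated toward zero.
    quotient = magnitude if (dividend < 0) == (divisor < 0) else -magnitude
    remainder = dividend - quotient * divisor   # same sign as the dividend, or zero
    return quotient, remainder


for a, b in [(7, 2), (-7, 2), (7, -2), (-7, -2)]:
    print(a, b, mips_div(a, b))
```

Python's own `//` and `%` round toward negative infinity, so they cannot be used directly under this convention.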

33 MIPS Multiply/Divide Summary
- mtlo: Move To register Lo

34 MIPS Multiply/Divide Summary (cont.), p.243 (Chinese edition p.246), Fig. 3.13

35 Floating Point (a brief look)
- We need a way to represent
  - numbers with fractions, e.g., …
  - very small numbers, e.g., …
  - very large numbers, e.g., … × 10^9
- Representation: sign, exponent, significand: (-1)^sign × significand × 2^exponent
  - more bits for significand gives more accuracy
  - more bits for exponent increases range
- IEEE 754 floating point standard:
  - single precision: 8 bit exponent, 23 bit significand
  - double precision: 11 bit exponent, 52 bit significand

36 Representing Big (and Small) Numbers, p.245 (Chinese edition p.249)
- What if we want to encode the approximate age of the earth, 4,600,000,000 or 4.6 × 10^9, or the weight in kg of one a.m.u. (atomic mass unit), … or 1.6 × …? There is no way we can encode either of the above in a 32-bit integer.
- Floating point representation: (-1)^sign × F × 2^E
  - Still have to fit everything in 32 bits (single precision): s (1 bit) | E, exponent (8 bits) | F, fraction (23 bits)
  - The base (2, not 10) is hardwired in the design of the FPALU
  - More bits in the fraction (F) or the exponent (E) is a trade-off between precision (accuracy of the number) and range (size of the number)

37 IEEE 754 FP Standard
- Sign and magnitude representation: (-1)^S × (1 + Significand) × 2^E
- Single precision (32 bits): S (1 bit) | Exponent (8 bits) | Significand (23 bits)
- Double precision (64 bits): S (1 bit) | Exponent (11 bits) | Significand (52 bits)
- Overflow/underflow occurs when the exponent is too large/too small to be represented in the exponent field
- Hidden bit, e.g., … = 0.110 × 2^-1 = 1.100 × 2^-2

38 IEEE 754 FP Standard (cont.)
- The IEEE 754 bias is 127 for single precision and 1023 for double precision
- (-1)^S × (1 + Significand) × 2^E, where Exponent = E + bias and E = Exponent - bias
- [Figure: mapping between the true exponent E and the stored Exponent field for a bias of 127]

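39 IEEE 754 FP Standard (cont.), p.248 (Chinese edition p.252)
Example: show the IEEE 754 binary representation of the decimal number -0.75 in single and double precision.
Answer: -0.75 = -(0.11)_2 = -(1.1)_2 × 2^-1
- Single precision: sign = 1, Exponent field = 01111110 = (126)_10, Significand = 1000…0 (the hidden bit is not stored); (126) - 127 = -1
- Double precision: sign = 1, Exponent field = 01111111110 = (1022)_10, Significand = 1000…0; (1022) - 1023 = -1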
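The worked example can be checked by unpacking the bit fields of -0.75 with Python's standard `struct` module. This is a sketch for verification only; the helper name `float_fields` is hypothetical.

```python
import struct


def float_fields(x, double=False):
    """Return (sign, biased exponent, fraction) of x in IEEE 754 single or double format."""
    if double:
        (bits,) = struct.unpack(">Q", struct.pack(">d", x))
        exp_bits, frac_bits = 11, 52
    else:
        (bits,) = struct.unpack(">I", struct.pack(">f", x))
        exp_bits, frac_bits = 8, 23
    sign = bits >> (exp_bits + frac_bits)
    exponent = (bits >> frac_bits) & ((1 << exp_bits) - 1)
    fraction = bits & ((1 << frac_bits) - 1)
    return sign, exponent, fraction


sign, exponent, fraction = float_fields(-0.75)
print(sign, exponent, format(fraction, "023b"))               # single precision fields
sign, exponent, fraction = float_fields(-0.75, double=True)
print(sign, exponent, format(fraction, "052b"))               # double precision fields
```

Subtracting the bias (127 or 1023) from the printed exponent field recovers the true exponent of -1 in both formats.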

40 IEEE 754 FP Standard (cont.), p.246 (Chinese edition p.251), Fig.
- Most computers these days conform to the IEEE 754 floating point standard: (-1)^sign × (1 + F) × 2^(E - bias)
  - Formats for both single and double precision
  - F is stored in normalized form where the msb in the fraction is 1 (so there is no need to store it!), called the hidden bit
  - To simplify sorting FP numbers, E comes before F in the word and E is represented in excess (biased) notation
Single Precision        Double Precision        Object Represented
E (8)    F (23)         E (11)    F (52)
0        0              0         0             true zero (0)
0        nonzero        0         nonzero       ± denormalized number
1-254    anything       1-2046    anything      ± floating point number
255      0              2047      0             ± infinity
255      nonzero        2047      nonzero       not a number (NaN)
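The encoding table translates directly into a small decision function. This is an illustrative sketch; the name `classify` and its string labels are assumptions, and the sign bit is left out because it does not affect the category.

```python
def classify(exponent, fraction, exp_bits=8):
    """Name the object represented by a biased exponent field E and a fraction field F."""
    all_ones = (1 << exp_bits) - 1     # 255 for single precision, 2047 for exp_bits=11
    if exponent == 0:
        return "zero" if fraction == 0 else "denormalized"
    if exponent == all_ones:
        return "infinity" if fraction == 0 else "NaN"
    return "floating point number"     # 1..254 (or 1..2046): ordinary normalized value


print(classify(0, 0))
print(classify(126, 1 << 22))
print(classify(255, 0))
print(classify(2047, 1, exp_bits=11))
```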

41 Floating Point Complexities
- Operations are somewhat more complicated (see text)
- In addition to overflow we can have "underflow"
- Accuracy can be a big problem
  - IEEE 754 keeps two extra bits, guard and round
  - four rounding modes
  - positive divided by zero yields "infinity"
  - zero divided by zero yields "not a number"
  - other complexities
- Implementing the standard can be tricky

42 Floating Point Addition, p.250 (Chinese edition p.254)
- Addition (and subtraction): (±F1 × 2^E1) + (±F2 × 2^E2) = ±F3 × 2^E3
  - Step 0: Restore the hidden bit in F1 and in F2
  - Step 1: Align fractions by right shifting F2 by E1 - E2 positions (assuming E1 ≥ E2), keeping track of (three of) the bits shifted out in a guard bit, a round bit, and a sticky bit
  - Step 2: Add the resulting F2 to F1 to form F3
  - Step 3: Normalize F3 (so it is in the form 1.XXXXX…)
    - If F1 and F2 have the same sign, F3 is in [1,4), so at most a 1-bit right shift of F3 and an increment of E3 is needed
    - If F1 and F2 have different signs, F3 may require many left shifts, each time decrementing E3
  - Step 4: Round F3 and possibly normalize F3 again

43 Floating point addition, p.252 (Chinese edition p.257), Fig.
- Still normalized?
- Step 5: Rehide the most significant bit of F3 before storing the result
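The addition steps can be traced in a simplified sketch. It makes several assumptions that go beyond the slides: both operands are positive and normalized, operands are passed as (unbiased exponent, 23-bit fraction field) pairs, rounding is round-to-nearest-even, and exponent overflow is ignored. The name `fp_add_positive` is hypothetical.

```python
FRAC_BITS = 23  # single precision fraction width


def fp_add_positive(e1, f1, e2, f2):
    """Add two positive normalized numbers given as (unbiased exponent, fraction field)."""
    # Step 0: restore the hidden bit
    m1 = (1 << FRAC_BITS) | f1
    m2 = (1 << FRAC_BITS) | f2
    if e1 < e2:                                    # make operand 1 the one with the larger exponent
        e1, m1, e2, m2 = e2, m2, e1, m1
    # Step 1: align, keeping 3 extra low bits (guard, round, sticky)
    m1 <<= 3
    m2 <<= 3
    shift = e1 - e2
    sticky = 1 if m2 & ((1 << shift) - 1) else 0   # remembers any 1 bits shifted out
    m2 = (m2 >> shift) | sticky
    # Step 2: add
    m3, e3 = m1 + m2, e1
    # Step 3: normalize; same signs give a sum in [1, 4), so at most one right shift
    if m3 >> (FRAC_BITS + 4):
        m3 = (m3 >> 1) | (m3 & 1)                  # keep the sticky information
        e3 += 1
    # Step 4: round to nearest even using the 3 extra bits, renormalize if needed
    extra, m3 = m3 & 0b111, m3 >> 3
    if extra > 0b100 or (extra == 0b100 and m3 & 1):
        m3 += 1
        if m3 >> (FRAC_BITS + 1):
            m3 >>= 1
            e3 += 1
    # Step 5: rehide the most significant bit
    return e3, m3 & ((1 << FRAC_BITS) - 1)


e3, f3 = fp_add_positive(-1, 0, -2, 0x600000)      # 0.5 + 0.4375
print(e3, (1 + f3 / 2**FRAC_BITS) * 2.0**e3)
```

Handling operands of different signs would add the left-shift normalization loop described in Step 3, which is omitted here to keep the sketch short.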

44 p.254 (Chinese edition p.259), Fig.
[Figure labels: compare magnitudes (larger, larger, smaller); add/subtract; normalize; increment or decrement exponent]

45 MIPS R2000 CPU and FPU, Fig. B.10.1
Floating point registers: $f0, $f1, …, $f31

46 MIPS Floating Point Instructions
- MIPS has a separate Floating Point Register File ($f0, $f1, …, $f31), whose registers are used in pairs for double precision values, with special instructions to load to and store from them (from/to coprocessor 1)
    lwc1 $f0,54($s2)     # $f0 = Memory[$s2+54]
    swc1 $f0,58($s4)     # Memory[$s4+58] = $f0
- And supports IEEE 754 single precision operations
    add.s $f2,$f4,$f6    # $f2 = $f4 + $f6
  and double precision operations
    add.d $f2,$f4,$f6    # $f2||$f3 = $f4||$f5 + $f6||$f7
  similarly for sub.s, sub.d, mul.s, mul.d, div.s, div.d

47 MIPS Floating Point Instructions (cont.)
- And floating point single precision comparison operations
    c.lt.s $f2,$f4       # if ($f2 < $f4) cond = 1; else cond = 0
  where lt may be replaced with eq, neq, le, gt, ge
- and branch operations
    bc1t 25              # if (cond == 1) go to the PC-relative branch target
    bc1f 25              # if (cond == 0) go to the PC-relative branch target
- And double precision comparison operations
    c.lt.d $f2,$f4       # if ($f2||$f3 < $f4||$f5) cond = 1; else cond = 0

48 MIPS FP Multiplication, p.205 (Chinese edition p.203), Fig.

49 p.260 (Chinese edition p.265), Fig.
# $f2||$f3 = $f4||$f5 + $f6||$f7

50 p.260 (Chinese edition p.265), Fig. 3.18

51 FlPt, p.260 (Chinese edition p.265), Fig. 3.18

52 MIPS FP instruction encoding, p.261 (Chinese edition p.266), Fig. 3.19

53 MIPS FP instruction encoding, p.261 (Chinese edition p.266), Fig. 3.19

54 Fallacies, p.221 (Chinese edition p.219)

55 FP add, subtract associative?
- Parallel programs may interleave operations in unexpected orders
  - Assumptions of associativity may fail
- Need to validate parallel programs under varying degrees of parallelism
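A short experiment shows why the order of floating point additions matters. The three values are an illustrative choice, not taken from the slide, and Python floats are double precision.

```python
x, y, z = -1.5e38, 1.5e38, 1.0

left = (x + y) + z    # x and y cancel first, so z survives
right = x + (y + z)   # z is far below the precision of y and is rounded away

print(left, right, left == right)
```

A parallel reduction that regroups the same three additions can therefore return either result.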

56 x86 FP Architecture
- Originally based on the 8087 FP coprocessor
  - 8 × 80-bit extended-precision registers
  - Used as a push-down stack
  - Registers indexed from TOS: ST(0), ST(1), …
- FP values are 32-bit or 64-bit in memory
  - Converted on load/store of memory operand
  - Integer operands can also be converted on load/store
- Very difficult to generate and optimize code
  - Result: poor FP performance

57 Streaming SIMD Extension 2 (SSE2)
- Adds 8 × 128-bit registers
  - Extended to 16 registers in AMD64/EM64T
- Can be used for multiple FP operands
  - 2 × 64-bit double precision
  - 4 × 32-bit single precision
  - Instructions operate on them simultaneously: Single-Instruction Multiple-Data

58 Streaming SIMD Extensions
- In computing, Streaming SIMD Extensions (SSE) is a SIMD instruction set extension to the x86 architecture, designed by Intel and introduced in 1999 in their Pentium III series processors as a reply to AMD's 3DNow! (which had debuted a year earlier). SSE contains 70 new instructions.
- SSE originally added eight new 128-bit registers known as XMM0 through XMM7. The AMD64 extensions from AMD (originally called x86-64 and later duplicated by Intel) add a further eight registers, XMM8 through XMM15.
- SSE2, introduced with the Pentium 4, is a major enhancement to SSE. SSE2 adds new math instructions for double-precision (64-bit) floating point and also extends MMX instructions to operate on 128-bit XMM registers.
- SSE3 is an incremental upgrade to SSE2, adding a handful of DSP-oriented mathematics instructions and some process (thread) management instructions.
- SSE4 is another major enhancement, adding a dot product instruction, additional integer instructions, a popcnt instruction, and more.

59 Right Shift and Division (§3.8 Fallacies and Pitfalls)
- Left shift by i places multiplies an integer by 2^i
- Right shift divides by 2^i?
  - Only for unsigned integers
- For signed integers
  - Arithmetic right shift: replicate the sign bit
  - e.g., -5 / 4: 11111011_2 >> 2 = 11111110_2 = -2 (rounds toward -∞)
  - c.f. 11111011_2 >>> 2 = 00111110_2 = +62
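The pitfall can be reproduced on 8-bit patterns. The helper names below are hypothetical, and the 8-bit width is an assumption chosen so that -5 is the pattern 11111011.

```python
def arithmetic_shift_right(pattern, shift, width=8):
    """Shift a `width`-bit pattern right, replicating the sign bit."""
    sign = pattern >> (width - 1)
    value = pattern - (sign << width)          # reinterpret the pattern as a signed integer
    return (value >> shift) & ((1 << width) - 1)


def logical_shift_right(pattern, shift, width=8):
    """Shift a `width`-bit pattern right, filling with zeros."""
    return (pattern & ((1 << width) - 1)) >> shift


minus_five = 0b11111011                        # -5 as an 8-bit two's-complement pattern
print(format(arithmetic_shift_right(minus_five, 2), "08b"))   # sign-filled result
print(format(logical_shift_right(minus_five, 2), "08b"))      # zero-filled result
print(int(-5 / 4))                                            # division truncated toward zero
```

Neither shift reproduces the truncating quotient, which is why a compiler cannot blindly replace signed division by a power of two with a single shift.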

60 Who Cares About FP Accuracy?
- Important for scientific code
  - But for everyday consumer use? "My bank balance is out by …¢!"
- The Intel Pentium FDIV bug
  - The market expects accuracy
  - See Colwell, The Pentium Chronicles

61 Concluding Remarks (§3.9 Concluding Remarks)
- ISAs support arithmetic
  - Signed and unsigned integers
  - Floating-point approximation to reals
- Bounded range and precision
  - Operations can overflow and underflow
- MIPS ISA
  - Core instructions: 54 most frequently used (100% of SPECINT, 97% of SPECFP)
  - Other instructions: less frequent

62 Summary
- Questions?
- Exercises: 3.2.1, 3.2.2, 3.3.1, 3.3.2, …
- Midterm Exam
  - 2012/04/18 (Wed.) 15:30-16:50
  - Old International Conference Hall M001, basement level 1, Management Building (assigned seating)
  - Chapter 1 ~ Chapter 3
  - Closed book
  - Separate question and answer sheets (only answers written on the answer sheet are graded; label each question number clearly)
- Thank you!


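64 Midterm Exam (期中考)
- 2014/11/24 (Mon.), during the morning class period
- Classroom I311, 3rd floor, Information Building (assigned seating)
- Chapter 1 ~ Chapter 3 (closed book)
- Separate question and answer sheets (only answers written on the answer sheet are graded; label each question number clearly)
- Exam duration: 80 minutes

65 Midterm Exam
- 2013/11/06 (Wed.) 15:30-16:50 (student counseling period)
- Old Conference Hall M001, basement level 1, Management Building (assigned seating)
- Chapter 1 ~ Chapter 3 (closed book)
- Separate question and answer sheets (only answers written on the answer sheet are graded; label each question number clearly)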

