Computer Architecture Chapter 4 Arithmetic for Computers

Computer Architecture Chapter 4 Arithmetic for Computers
Give qualifications of instructors: DAP teaching computer architecture at Berkeley since 1977 Co-athor of textbook used in class Best known for being one of pioneers of RISC currently author of article on future of microprocessors in SciAm Sept 1995 RY took 152 as student, TAed 152,instructor in 152 undergrad and grad work at Berkeley joined NextGen to design fact 80x86 microprocessors one of architects of UltraSPARC fastest SPARC mper shipping this Fall

Contents Introduction Representation of Numbers Addition and Subtraction Logical Operations Construction of ALU Multiplication Floating Point Summary

Arithmetic Where we've been: What's up ahead:
Performance (seconds, cycles, instructions) Abstractions: Instruction Set Architecture Assembly Language and Machine Language What's up ahead: Implementing the Architecture 32 operation result a b ALU

Numbers Bits are just bits (no inherent meaning) conventions define relationship between bits and numbers Binary numbers (base 2) decimal: n-1 Of course it gets more complicated: numbers are finite (overflow) fractions and real numbers negative numbers How do we represent negative numbers? i.e., which bit patterns will represent which numbers?

Possible Representations
Sign Magnitude: One's Complement Two's Complement = = = = = = = = = = = = = = = = = = = = = = = = -1 Issues: balance, number of zeros, ease of operations Which one is best? Why?

MIPS 32 bit signed numbers: two = 0ten two = + 1ten two = + 2ten two = + 2,147,483,646ten two = + 2,147,483,647ten two = - 2,147,483,648ten two = - 2,147,483,647ten two = - 2,147,483,646ten two = - 3ten two = - 2ten two = - 1ten maxint minint

Two's Complement Operations
Negating a two's complement number: invert all bits and add 1 remember: negate and invert are quite different! MSB(Most Significant Bit) is sign bit Converting n bit numbers into numbers with more than n bits: MIPS 16 bit immediate gets converted to 32 bits for arithmetic copy the most significant bit (the sign bit) into the other bits > > "sign extension" (lbu vs. lb)

Addition & Subtraction
Just like in grade school (carry/borrow 1s) Two's complement operations easy subtraction using addition of negative numbers Overflow (result too large for finite computer word): e.g., adding two n-bit numbers does not yield an n-bit number note that overflow term is somewhat misleading, it does not mean a carry overflowed

No overflow when adding a positive and a negative number
Detecting Overflow No overflow when adding a positive and a negative number No overflow when signs are the same for subtraction Overflow occurs when the value affects the sign: overflow when adding two positives yields a negative or, adding two negatives gives a positive or, subtract a negative from a positive and get a negative or, subtract a positive from a negative and get a positive

An exception (interrupt) occurs
Effects of Overflow An exception (interrupt) occurs Control jumps to predefined address for exception Interrupted address is saved for possible resumption Details based on software system / language example: flight control vs. homework assignment Don't always want to detect overflow - new MIPS instructions: addu, addiu, subu note: addiu still sign-extends! note: sltu, sltiu for unsigned comparisons

An ALU (arithmetic logic unit)
Let's build an ALU to support the andi and ori instructions we'll just build a 1 bit ALU, and use 32 of them Possible Implementation (sum-of-products): operation result op a b res a b

An ALU (arithmetic logic unit)

Different Implementations
Let's look at a 1-bit ALU for addition: How could we build a 1-bit ALU for add, and, and or? How could we build a 32-bit ALU? cout = a b + a cin + b cin sum = a xor b xor cin

Building a 32 bit ALU

What about subtraction (a - b) ?
Two's complement approach: just negate b and add. How do we negate? A very clever solution:

Tailoring the ALU to the MIPS
Need to support the set-on-less-than instruction (slt) remember: slt is an arithmetic instruction produces a 1 if rs < rt and 0 otherwise use subtraction: (a-b) < 0 implies a < b Need to support test for equality (beq $t5, $t6, $t7) use subtraction: (a-b) = 0 implies a = b

Supporting slt Can we figure out the idea?

Test for equality Notice control lines: = and 001 = or 010 = add 110 = subtract 111 = slt Note: zero is a 1 when the result is zero!

Problem: ripple carry adder is slow
Is a 32-bit ALU as fast as a 1-bit ALU? Is there more than one way to do addition? two extremes: ripple carry and sum-of-products Can you see the ripple? How could you get rid of it? c1 = b0c0 + a0c0 + a0b0 c2 = b1c1 + a1c1 + a1b1 c2 = c3 = b2c2 + a2c2 + a2b2 c3 = c4 = b3c3 + a3c3 + a3b3 c4 = Not feasible! Why?

Carry-lookahead adder
An approach in-between our two extremes Motivation: If we didn't know the value of carry-in, what could we do? When would we always generate a carry? gi = ai bi When would we propagate the carry? pi = ai + bi Did we get rid of the ripple? c1 = g0 + p0c0 c2 = g1 + p1c1 c2 = g1 + p1(g0 + p0c0 ) = g1 + p1g0 + p1p0c0 c3 = g2 + p2c2 c3 = c4 = g3 + p3c3 c4 = Feasible! Why?

Use principle to build bigger adders
Can’t build a 16 bit adder this way... (too big) Could use ripple carry of 4-bit CLA adders Better: use the CLA principle again!

MULTIPLY (unsigned) Paper and pencil example (unsigned):
Multiplicand Multiplier Product m bits x n bits = m+n bit product Binary makes it easy: 0 => place 0 ( 0 x multiplicand) 1 => place a copy ( 1 x multiplicand) 4 versions of multiply hardware & algorithm: successive refinement

Unsigned Combinational Multiplier
A0 A1 A2 A3 B0 A0 A1 A2 A3 B1 A0 A1 A2 A3 B2 A0 A1 A2 A3 B3 P7 P6 P5 P4 P3 P2 P1 P0 Stage i accumulates A * 2 i if Bi == 1

How does it work? at each stage shift A left ( x 2)
B0 A0 A1 A2 A3 B1 B2 B3 P0 P1 P2 P3 P4 P5 P6 P7 at each stage shift A left ( x 2) use next bit of B to determine whether to add in shifted multiplicand accumulate 2n bit partial product at each stage

Unsigned shift-add multiplier (Version 1)
64-bit Multiplicand reg, 64-bit ALU, 64-bit Product reg, 32-bit multiplier reg Product Multiplier Multiplicand 64-bit ALU Shift Left Shift Right Write Control 32 bits 64 bits

Multiply Algorithm Version 1
Start Multiplier0 = 1 Multiplier0 = 0 Test Multiplier0 1. 1a. Add multiplicand to product & place the result in Product register Product Multiplier Multiplicand 2. Shift the Multiplicand register left 1 bit. 3. Shift the Multiplier register right 1 bit. 32nd repetition? No: < 32 repetitions Yes: 32 repetitions Done

Observations on Multiply Version 1
1/2 bits in multiplicand always 0  64-bit adder is wasted 0’s inserted in left of multiplicand as shifted  least significant bits of product never changed once formed Instead of shifting multiplicand to left, shift product to right?

MULTIPLY HARDWARE Version 2
32-bit Multiplicand reg, 32 -bit ALU, 64-bit Product reg, 32-bit Multiplier reg Multiplicand 32 bits Multiplier Shift Right 32-bit ALU 32 bits Shift Right Product Control Write 64 bits

Start Multiplier Multiplicand Product Multiplier0 = 1 Multiplier0 = 0 Test Multiplier0 1. 1a. Add multiplicand to the left half of product & place the result in the left half of Product register Product Multiplier Multiplicand 2. Shift the Product register right 1 bit. 3. Shift the Multiplier register right 1 bit. 32nd repetition? No: < 32 repetitions Yes: 32 repetitions Done

What’s going on? Multiplicand stay’s still and product moves right A0
A0 A1 A2 A3 B0 A0 A1 A2 A3 B1 A0 A1 A2 A3 B2 A0 A1 A2 A3 B3 P7 P6 P5 P4 P3 P2 P1 P0 Multiplicand stay’s still and product moves right

Break 5-minute Break/ Do it yourself Multiply Multiplier Multiplicand Product

Product register wastes space that exactly matches size of multiplier combine Multiplier register and Product register

MULTIPLY HARDWARE Version 3
32-bit Multiplicand reg, 32 -bit ALU, 64-bit Product reg, (0-bit Multiplier reg) Multiplicand 32 bits 32-bit ALU Shift Right Product (Multiplier) Control Write 64 bits

Done Yes: 32 repetitions 2. Shift the Product register right 1 bit. No: < 32 repetitions 1. Test Product0 Product0 = 0 Product0 = 1 1a. Add multiplicand to the left half of product & place the result in the left half of Product register 32nd repetition? Start Multiplicand Product

2 steps per bit because Multiplier & Product combined MIPS registers Hi and Lo are left and right half of Product Gives us MIPS instruction MultU How can you make it faster? What about signed multiplication? easiest solution is to make both positive & remember whether to complement product when done (leave out the sign bit, run for 31 steps) apply definition of 2’s complement need to sign-extend partial products and subtract at the end Booth’s Algorithm is elegant way to multiply signed numbers using same hardware as before and save cycles can handle multiple bits at a time

Motivation for Booth’s Algorithm
Example 2 x 6 = 0010 x 0110: x shift (0 in multiplier) add (1 in multiplier) add (1 in multiplier) shift (0 in multiplier) ALU with add or subtract gets same result in more than one way: 6 = – = – = For example x shift (0 in multiplier) – sub (first 1 in multpl.) shift (mid string of 1s) add (prior step had last 1)

Booth’s Algorithm Current Bit Bit to the Right Explanation Example Op
1 0 Begins run of 1s sub 1 1 Middle of run of 1s none 0 1 End of run of 1s add 0 0 Middle of run of 0s none Originally for Speed (when shift was faster than add) Replace a string of 1s in multiplier with an initial subtract when we first see a one and then later add for the bit after the last one –1 01111

Booths Example (2 x 7) Operation Multiplicand Product next?
0. initial value  sub 1a. P = P - m shift P (sign ext) 1b  nop, shift  nop, shift  add 4a shift 4b done

Booths Example (2 x -3) Operation Multiplicand Product next?
0. initial value  sub 1a. P = P - m shift P (sign ext) 1b  add 2a shift P 2b  sub 3a shift 3b  nop 4a shift 4b done

logical-- value shifted in is always "0"
Shifters Two kinds: logical-- value shifted in is always "0" arithmetic-- on right shifts, sign extend "0" msb lsb "0" msb lsb "0" Note: these are single bit shifts. A given instruction might request 0 to 32 bits to be shifted!

Combinational Shifter from MUXes
Basic Building Block 1 sel D 8-bit right shifter 1 S2 S1 S0 A0 A1 A2 A3 A4 A5 A6 A7 R0 R1 R2 R3 R4 R5 R6 R7 What comes in the MSBs? How many levels for 32-bit shifter? What if we use 4-1 Muxes ?

General Shift Right Scheme using 16 bit example
(0, 1) S 1 (0, 2) S 2 (0, 4) S 3 (0, 8) If added Right-to-left connections could support Rotate (not in MIPS but found in ISAs)

Barrel Shifter Technology-dependent solutions: transistor per switch
SR3 SR2 SR1 SR0 D3 D2 A6 D1 A5 D0 A4 A3 A2 A1 A0

Divide: Paper & Pencil 1001 Quotient
Divisor Dividend – – Remainder (or Modulo result) See how big a number can be subtracted, creating quotient bit on each step Binary  1 * divisor or 0 * divisor Dividend = Quotient x Divisor + Remainder  | Dividend | = | Quotient | + | Divisor | 3 versions of divide, successive refinement

DIVIDE HARDWARE Version 1
64-bit Divisor reg, 64-bit ALU, 64-bit Remainder reg, 32-bit Quotient reg Shift Right Divisor 64 bits Quotient Shift Left 64-bit ALU 32 bits Write Remainder Control 64 bits

Divide Algorithm Version 1
Start: Place Dividend in Remainder Takes n+1 steps for n-bit Quotient & Rem. Remainder Quotient Divisor 1. Subtract the Divisor register from the Remainder register, and place the result in the Remainder register. Remainder > 0 Test Remainder Remainder < 0 2a. Shift the Quotient register to the left setting the new rightmost bit to 1. 2b. Restore the original value by adding the Divisor register to the Remainder register, & place the sum in the Remainder register. Also shift the Quotient register to the left, setting the new least significant bit to 0. 3. Shift the Divisor register right1 bit. n+1 repetition? No: < n+1 repetitions Yes: n+1 repetitions (n = 4 here) Done

Observations on Divide Version 1
1/2 bits in divisor always 0  1/2 of 64-bit adder is wasted  1/2 of divisor is wasted Instead of shifting divisor to right, shift remainder to left? 1st step cannot produce a 1 in quotient bit (otherwise too big)  switch order to shift first and then subtract, can save 1 iteration

32-bit Divisor reg, 32-bit ALU, 64-bit Remainder reg, 32-bit Quotient reg Remainder Quotient Divisor 32-bit ALU Shift Left Write Control 32 bits 64 bits

Start: Place Dividend in Remainder Remainder Quotient Divisor 1. Shift the Remainder register left 1 bit. 2. Subtract the Divisor register from the left half of the Remainder register, & place the result in the left half of the Remainder register. Remainder > 0 Test Remainder Remainder < 0 3a. Shift the Quotient register to the left setting the new rightmost bit to 1. 3b. Restore the original value by adding the Divisor register to the left half of the Remainder register, &place the sum in the left half of the Remainder register. Also shift the Quotient register to the left, setting the new least significant bit to 0. nth repetition? No: < n repetitions Yes: n repetitions (n = 4 here) Done

Eliminate Quotient register by combining with Remainder as shifted left Start by shifting the Remainder left as before. Thereafter loop contains only two steps because the shifting of the Remainder register shifts both the remainder in the left half and the quotient in the right half The consequence of combining the two registers together and the new order of the operations in the loop is that the remainder will shifted left one time too many. Thus the final correction step must shift back only the remainder in the left half of the register

32-bit Divisor reg, 32 -bit ALU, 64-bit Remainder reg, (0-bit Quotient reg) Divisor 32 bits 32-bit ALU “HI” “LO” Shift Left Remainder (Quotient) Control 64 bits Write

Start: Place Dividend in Remainder Remainder Divisor 1. Shift the Remainder register left 1 bit. 2. Subtract the Divisor register from the left half of the Remainder register, & place the result in the left half of the Remainder register. Remainder > 0 Test Remainder Remainder < 0 3a. Shift the Remainder register to the left setting the new rightmost bit to 1. 3b. Restore the original value by adding the Divisor register to the left half of the Remainder register, &place the sum in the left half of the Remainder register. Also shift the Remainder register to the left, setting the new least significant bit to 0. nth repetition? No: < n repetitions Yes: n repetitions (n = 4 here) Done. Shift left half of Remainder right 1 bit.

Same Hardware as Multiply: just need ALU to add or subtract, and 63-bit register to shift left or shift right Hi and Lo registers in MIPS combine to act as 64-bit register for multiply and divide Signed Divides: Simplest is to remember signs, make positive, and complement quotient and remainder if necessary Note: Dividend and Remainder must have same sign Note: Quotient negated if Divisor sign & Dividend sign disagree e.g., –7 ÷ 2 = –3, remainder = –1 Possible for quotient to be too large: if divide 64-bit integer by 1, quotient is 64 bits (“called saturation”)

Floating Point (a brief look)
We need a way to represent numbers with fractions, e.g., very small numbers, e.g., very large numbers, e.g., ´ 109 Representation: sign, exponent, significand: (-)sign  significand  2exponent more bits for significand gives more accuracy more bits for exponent increases range IEEE 754 floating point standard: single precision: 8 bit exponent, 23 bit significand double precision: 11 bit exponent, 52 bit significand

IEEE 754 floating-point standard
Leading a bit of significand is implicit Exponent is biased to make sorting easier all 0s is smallest exponent all 1s is largest bias of 127 for single precision and 1023 for double precision summary: (-)sign  (1+significand)  2(exponent – bias) Example: decimal: = -3/4 = -3/22 binary: = -1.1 x 2-1 floating point: exponent = 126 = IEEE single precision:

Floating Point Complexities
Operations are somewhat more complicated (see text) In addition to overflow we can have underflow Accuracy can be a big problem IEEE 754 keeps two extra bits, guard and round four rounding modes positive divided by zero yields infinity zero divide by zero yields not a number other complexities Implementing the standard can be tricky Not using the standard can be even worse see text for description of 80x86 and Pentium bug!

Summary Computer arithmetic is constrained by limited precision
Bit patterns have no inherent meaning but standards do exist Two’s complement IEEE 754 floating point Computer instructions determine meaning of the bit patterns Performance and accuracy are important so there are many complexities in real machines (i.e., algorithms and implementation). We are ready to move on (and implement the processor) you may want to look back (Section 4.12 is great reading!)

Unsigned Multiply 4x4 multiply (p = a x b) 예)

Unsigned Multiplier 4x4 multiplier (p = a x b)
An array multiplier for unsigned numbers

Carry Save Adder

Signed Multiplier 8x8 signed multiply Sign bit extension 예)

Booth Multiplier Architecture
(Modified) Booth Encoding Partial Product Wallace Tree(Partial Product Adding) Sum & Carry Adder

Origin of Booth’s Algorithm

Modified Booth’s Algorithm

Algorithm Explanation(1)

Booth’s Algorithm(radix-4)
Booth Multiply Ex1. Booth’s Algorithm(radix-4) Multiplier에서 디코딩 과정을 거쳐 Partial product 수를 줄임으로써 빠른 연산을 수행 디코딩 방법

Booth Multiply Ex2.

Sign Extension 적은 Sign extension을 이용한 multiplier 구현

Wallace Tree Structure
Carry 전달 지연시간을 줄여 고속의 addition을 하기 위한 구조

Homework 8*8 bit 곱셈기에서 modified booth algorithm을 사용하였을 경우 전체 곱셈기에 대한 wallace tree를 완성하고 ripple carry adder를 사용하여 최종 곱셈 결과를 출력도록 하여라.

Computer Architecture Chapter 4 Arithmetic for Computers

Similar presentations

Presentation on theme: "Computer Architecture Chapter 4 Arithmetic for Computers"— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

Computer Architecture Chapter 4 Arithmetic for Computers

Similar presentations

Presentation on theme: "Computer Architecture Chapter 4 Arithmetic for Computers"— Presentation transcript:

Similar presentations

About project

Feedback