Chapter 4: Arithmetic for Computers (Part 1)


1 Chapter 4: Arithmetic for Computers (Part 1)
CS 447 Jason Bakos

2 Notes on Project 1
There are two different ways the following two words can be stored in a computer memory…
word1 .byte 0,1,2,3
word2 .half 0,1
One way is big-endian, where the word is stored in memory in its original order…
word1: 00 01 02 03        word2: 0000 0001
Another way is little-endian, where the word is stored in memory in byte-reversed order…
word1: 03 02 01 00        word2: 0001 0000
Of course, this affects the way in which the lw instruction works…
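The two byte orders above can be sketched in C by reading the same four bytes back as a word under each convention (an illustrative sketch with hypothetical function names, not part of the assignment):

```c
#include <stdint.h>

/* Big-endian: bytes appear in memory in their original order. */
uint32_t load_word_be(const unsigned char *p) {
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Little-endian: the least significant byte comes first in memory. */
uint32_t load_word_le(const unsigned char *p) {
    return ((uint32_t)p[3] << 24) | ((uint32_t)p[2] << 16) |
           ((uint32_t)p[1] << 8)  |  (uint32_t)p[0];
}
```

Given the bytes 00 01 02 03, the big-endian read yields 0x00010203 while the little-endian read yields 0x03020100, which is exactly the reversal the slide describes.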

3 Notes on Project 1 MIPS uses the endianness of the architecture underneath it. Intel uses little-endian, so we need to deal with that. This affects assignment 1 because the input data is stored as a series of bytes. If you use lw on your data set, the values will be loaded into your destination register in reverse byte order. Hint: try the lb/sb instructions. These instructions load/store a byte at an unaligned address and perform the translation for you.

4 Notes on Project 1 Hint: Use SPIM’s breakpoint and single-step features to help debug your program Also, make sure you use the registers and memory/stack displays Hint: You may want to temporarily store your input set into a word array for sorting Make sure you check Appendix A for additional useful instructions that I didn’t cover in class Make sure you comment your code!

5 Goals of Chapter 4 Data representation
Hardware mechanisms for performing arithmetic on data Hardware implications on the instruction set design

6 Review of Binary Representation
Binary/Hex -> Decimal conversion Decimal -> Binary/Hex conversion Least/most significant bits Highest representable number/maximum number of unique representable symbols Two's complement representation One's complement Finding signed number ranges (-2^(n-1) to 2^(n-1)-1) Doing arithmetic with two's complement Sign extension with load half/byte Unsigned loads Signed/unsigned comparison
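The sign-extension behavior listed above can be sketched in C; these hypothetical helpers widen an 8-bit byte to 32 bits the way signed (lb-style) and unsigned (lbu-style) byte loads do:

```c
#include <stdint.h>

/* Sign-extend an 8-bit value to 32 bits, as a signed byte load does:
   the cast to int8_t reinterprets bit 7 as the sign bit. */
int32_t sign_extend8(uint8_t b) {
    return (int32_t)(int8_t)b;
}

/* Zero-extend an 8-bit value to 32 bits, as an unsigned byte load does. */
uint32_t zero_extend8(uint8_t b) {
    return (uint32_t)b;
}
```

For example, the byte 0xFF sign-extends to -1 but zero-extends to 255, which is why signed and unsigned loads of the same byte can produce different register values.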

7 Binary Addition/Subtraction
Binary subtraction works exactly like addition, except the second operand is converted to two's complement. Overflow in signed arithmetic occurs under the following conditions:

Operation   Operand A   Operand B   Result indicating overflow
A+B         >= 0        >= 0        < 0
A+B         < 0         < 0         >= 0
A-B         >= 0        < 0         < 0
A-B         < 0         >= 0        >= 0
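The sign-based overflow conditions can be checked in software. This sketch (illustrative function names, with the wraparound arithmetic done on unsigned types to keep the C well-defined) compares operand and result signs:

```c
#include <stdint.h>

/* A+B overflows when both operands have the same sign but the result's
   sign differs from it. */
int add_overflows(int32_t a, int32_t b) {
    int32_t r = (int32_t)((uint32_t)a + (uint32_t)b);  /* wraparound add */
    return ((a < 0) == (b < 0)) && ((r < 0) != (a < 0));
}

/* A-B overflows when the operands have different signs and the result's
   sign differs from A's. */
int sub_overflows(int32_t a, int32_t b) {
    int32_t r = (int32_t)((uint32_t)a - (uint32_t)b);  /* wraparound subtract */
    return ((a < 0) != (b < 0)) && ((r < 0) != (a < 0));
}
```

This is the same test the hardware overflow unit performs on the sign bits, just expressed on full 32-bit values.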

8 What Happens When Overflow Occurs?
MIPS detects overflow with an exception/interrupt. When an interrupt occurs, the processor branches to handler code in the kernel at a fixed address, where special registers (BadVAddr, Status, Cause, and EPC) are used to handle the interrupt. SPIM has a simple built-in interrupt handler that deals with interrupts. We may come back to interrupts later.

9 Review of Shift and Logical Operations
MIPS has operations for SLL, SRL, and SRA. We covered these in the last chapter. MIPS implements bit-wise AND, OR, and XOR logical operations. These operations perform a bit-by-bit parallel logical operation on two registers. In C, use << and >> for shifts (>> is an arithmetic shift on signed operands), and &, |, ^, and ~ for bitwise AND, OR, XOR, and NOT, respectively.

10 Review of Logic Operations
The three main parts of a CPU ALU (Arithmetic and Logic Unit) Performs all logical, arithmetic, and shift operations CU (Control Unit) Controls the CPU – performs load/store, branch, and instruction fetch Registers Physical storage locations for data

11 Review of Logic Operations
In this chapter, our goal is to learn how the ALU is implemented The ALU is entirely constructed using boolean functions as hardware building blocks The 3 basic digital logic building blocks can be used to construct any digital logic system: AND, OR, and NOT These functions can be directly implemented using electric circuits (wires and transistors)

12 Review of Logic Operations
These “combinational” logic devices can be assembled to create a much more complex digital logic system. Truth tables for the three basic gates:

A B | A AND B        A B | A OR B        A | NOT A
0 0 |    0           0 0 |   0           0 |   1
0 1 |    0           0 1 |   1           1 |   0
1 0 |    0           1 0 |   1
1 1 |    1           1 1 |   1

13 Review of Logic Operations
We need another device to build an ALU… This is called a multiplexor… it implements an if-then-else in hardware: output C equals input A when the select line D is 0, and input B when D is 1.

A B D | C (out)
a b 0 | a
a b 1 | b

14 A 1-bit ALU Perform logic operations in parallel and mux the output
Next, we want to include addition, so let's build a single-bit adder, called a full adder

15 Full Adder From the following table, we can construct the circuit for a full adder and link multiple full adders together to form a multi-bit adder. We can also add this input to our ALU. How do we give subtraction ability to our adder? How do we detect overflow and zero results?

A B CarryIn | CarryOut Sum | Comments
0 0    0    |    0      0  | 0+0+0 = 00
0 0    1    |    0      1  | 0+0+1 = 01
0 1    0    |    0      1  | 0+1+0 = 01
0 1    1    |    1      0  | 0+1+1 = 10
1 0    0    |    0      1  | 1+0+0 = 01
1 0    1    |    1      0  | 1+0+1 = 10
1 1    0    |    1      0  | 1+1+0 = 10
1 1    1    |    1      1  | 1+1+1 = 11
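The truth table translates directly into C: Sum is the XOR of the three inputs, CarryOut is the majority function, and chaining the bit-slices gives a multi-bit ripple-carry adder (a software sketch of the hardware, with illustrative names):

```c
#include <stdint.h>

/* One-bit full adder, read straight from the truth table. */
void full_add(int a, int b, int cin, int *sum, int *cout) {
    *sum  = a ^ b ^ cin;                      /* 1 when an odd number of inputs are 1 */
    *cout = (a & b) | (a & cin) | (b & cin);  /* 1 when two or more inputs are 1 */
}

/* Chain 32 full adders: CarryOut of each bit feeds CarryIn of the next. */
uint32_t ripple_add(uint32_t a, uint32_t b) {
    uint32_t result = 0;
    int carry = 0;
    for (int i = 0; i < 32; i++) {
        int s, c;
        full_add((a >> i) & 1, (b >> i) & 1, carry, &s, &c);
        result |= (uint32_t)s << i;
        carry = c;
    }
    return result;
}
```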

16 Chapter 4: Arithmetic for Computers (Part 2)
CS 447 Jason Bakos

17 Logic/Arithmetic From the truth table for the mux, we can use sum-of-products to derive the logic equation With sum-of-products, for each ‘1’ row for each output, we AND together all the inputs (inverting the input 0’s), then OR all the row products To make it simpler, let’s add “don’t cares” to the table…

18 Logic/Arithmetic With don't cares, the mux truth table becomes:

A B D | C (out)
a X 0 | a
X b 1 | b

This gives us the following equation: (A AND (NOT D)) OR (B AND D). We don't need the don't-care inputs in our product terms. This is one way to simplify our logic equation. Other ways include propositional calculus, Karnaugh Maps, and the Quine-McCluskey algorithm.

19 Logic/Arithmetic Here is a (crude) digital logic design for the 2-to-1 mux Note that multiple muxes can be assembled in stages to implement multiple-input muxes

20 Logic/Arithmetic For the adder, let's minimize the logic using a Karnaugh Map… For CarryOut, we need 2^3 entries… The K-map for CarryOut:

           AB
CarryIn   00  01  11  10
   0       0   0   1   0
   1       0   1   1   1

We can minimize this to CarryOut = AB + A*CarryIn + B*CarryIn

21 Logic/Arithmetic There's no way to minimize this equation, so we need the full sum of products:
Sum = (NOT A)(NOT B)CarryIn + (NOT A)B(NOT CarryIn) + A(NOT B)(NOT CarryIn) + A*B*CarryIn

           AB
CarryIn   00  01  11  10
   0       0   1   0   1
   1       1   0   1   0
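Both equations can be checked exhaustively against real one-bit addition over all eight input rows, a small verification sketch:

```c
/* Returns 1 if CarryOut = AB + A*CarryIn + B*CarryIn and the four-term
   Sum equation agree with a+b+cin on every row of the truth table. */
int equations_match(void) {
    for (int a = 0; a < 2; a++)
        for (int b = 0; b < 2; b++)
            for (int cin = 0; cin < 2; cin++) {
                int total = a + b + cin;                      /* 0..3 */
                int cout = (a & b) | (a & cin) | (b & cin);   /* minimized CarryOut */
                int sum  = (!a & !b &  cin) | (!a &  b & !cin) |
                           ( a & !b & !cin) | ( a &  b &  cin); /* full SOP Sum */
                if (cout != (total >> 1) || sum != (total & 1))
                    return 0;
            }
    return 1;
}
```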

22 Logic/Arithmetic In order to implement subtraction, we can invert the B input to the adder and set CarryIn to 1. This can be implemented with a mux: select B or NOT B (call this input Binvert). Now we can build a 1-bit ALU using an AND, OR, addition, and subtraction operation. We can perform the AND, OR, and ADD in parallel and switch the results with a 4-input mux (Operation will be our D-input). To make the adder a subtractor, we'll need to set Binvert and CarryIn to 1.

23 Lecture 4: Arithmetic for Computers (Part 3)
CS 447 Jason Bakos

24 Chapter 4 Review So far, we’ve covered the following topics for this chapter Binary representation of signed integers 16 to 32 bit signed conversion Binary addition/subtraction Overflow detection/overflow exception handling Shift and logical operations Parts of the CPU AND, OR, XOR, and inverter gates Multiplexor (mux) and full adder Sum-of-products logic equations (truth tables) Logic minimization techniques Don’t cares and Karnaugh Maps

25 1-bit ALU Design
Components: AND, OR, and adder; 4-to-1 mux; “Binverter” (inverter and 2-to-1 mux)
Interface: Inputs: A, B, Binvert, Operation (2 bits), CarryIn, and Less. Outputs: CarryOut and Result
Digital functions are performed in parallel and the outputs are routed into a mux. The mux will also accept a Less input, which we'll accept from outside the 1-bit ALU. The select lines of the mux make up the “Operation” input to the ALU.

26 32-bit ALU In order to create a multi-bit ALU, array 32 1-bit ALUs
Connect the CarryOut of each bit to the CarryIn of the next bit A and B of each 1-bit ALU will be connected to each successive bit of the 32-bit A and B The Result outputs of each 1-bit ALU will form the 32-bit result We need to add an SLT unit and connect the output to the least significant 1-bit ALU’s Less input Hardwire the other “Less” inputs to 0 We need to add an Overflow unit We need to add a Zero detection unit
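The arrayed structure can be sketched in C: each bit-slice computes AND, OR, and the adder in parallel and a mux picks one, with carries rippling between slices. The operation encoding follows the slides, but the code itself is only an illustration (Less is wired to 0 here for brevity, so SLT is omitted):

```c
#include <stdint.h>

typedef struct { int result; int carry_out; } alu_bit_out;

/* One bit-slice: Binvert optionally inverts B (for subtract), then the
   2-bit operation selects among AND, OR, the adder sum, and Less. */
alu_bit_out alu_bit(int a, int b, int binvert, int carry_in, int less, int op) {
    int bb = binvert ? !b : b;
    int sum = a ^ bb ^ carry_in;
    alu_bit_out out;
    out.carry_out = (a & bb) | (a & carry_in) | (bb & carry_in);
    switch (op & 3) {
        case 0:  out.result = a & bb; break;   /* 00: and */
        case 1:  out.result = a | bb; break;   /* 01: or  */
        case 2:  out.result = sum;    break;   /* 10: add/subtract */
        default: out.result = less;   break;   /* 11: slt routes Less in */
    }
    return out;
}

/* Array 32 bit-slices; subtraction sets Binvert and bit 0's CarryIn to 1. */
uint32_t alu32(uint32_t a, uint32_t b, int binvert, int op) {
    uint32_t result = 0;
    int carry = binvert;
    for (int i = 0; i < 32; i++) {
        alu_bit_out o = alu_bit((a >> i) & 1, (b >> i) & 1, binvert, carry, 0, op);
        result |= (uint32_t)o.result << i;
        carry = o.carry_out;
    }
    return result;
}
```

For example, with A=6 and B=3 the four operation settings give AND=2, OR=7, ADD=9, and (with Binvert and CarryIn set) SUB=3.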

27 SLT Unit To compute SLT, we need to make sure that when the 1-bit ALU's Operation is set to 11, a subtract operation is also being computed. With this happening, the SLT unit can compute Less based on the MSB (sign) of A, B, and the Result:

Asign Bsign | Less
  0     0   | Rsign
  0     1   | 0
  1     0   | 1
  1     1   | Rsign

28 Overflow Unit When doing signed arithmetic, we need to follow this table, as we covered previously… How do we implement this in hardware?

Operation   Operand A   Operand B   Result indicating overflow
A+B         >= 0        >= 0        < 0
A+B         < 0         < 0         >= 0
A-B         >= 0        < 0         < 0
A-B         < 0         >= 0        >= 0

29 Overflow Unit We need a truth table…
Since we'll be computing the logic equation with SOP, we only need the rows where the output is 1:

Operation    A(31) B(31) R(31) | Overflow
010 (add)      0     0     1   |    1
010 (add)      1     1     0   |    1
110 (sub)      0     1     1   |    1
110 (sub)      1     0     0   |    1

30 Zero Detection Unit “OR” together all the 1-bit ALU Result outputs and invert; Zero is 1 only when every Result bit is 0

31 32-bit ALU Operation We need a 3-bit ALU Operation input into our 32-bit ALU The two least significant bits can be routed into all the 1-bit ALUs internally The most significant bit can be routed into the least significant 1-bit ALU’s CarryIn, and to Binvert of all the 1-bit ALUs

32 32-bit ALU Operation Here’s the final ALU Operation table:
ALU Operation | Function
     000      | and
     001      | or
     010      | add
     110      | subtract
     111      | set on less than

33 32-bit ALU In the end, our ALU will have the following interface:
Inputs: A and B (32 bits each), ALU Operation (3 bits). Outputs: CarryOut (1 bit), Zero (1 bit), Result (32 bits), Overflow (1 bit)

34 Carry Lookahead The adder architecture we previously looked at requires 2n gate delays to compute its result (worst case). The longest path that a digital signal must propagate through is called the “critical path”. This is WAAAYYYY too slow! There are other ways to build an adder that require only about lg n gate delays. Obviously, using SOP, we can build a circuit that will compute ANY function in 2 gate delays (2 levels of logic), but in the case of a 64-input system, the resulting design will be too big and too complex.

35 Carry Lookahead For example, we can easily see that the carries for bits 1 and 2 are computed as:
c1 = (a0 b0) + (a0 c0) + (b0 c0)
c2 = (a1 b1) + (a1 c1) + (b1 c1)
Hardware executes in parallel, so using the following fast CarryIn computation, we can perform an add with 3 gate delays:
c2 = (a1 b1) + (a1 a0 b0) + (a1 a0 c0) + (a1 b0 c0) + (b1 a0 b0) + (b1 a0 c0) + (b1 b0 c0)
I used the logical distributive law to compute this. As you can see, the CarryIn logic gets bigger and bigger for consecutive bits.

36 Carry Lookahead Carry lookahead adders are faster than ripple-carry adders.
Recall: ci+1 = (ai bi) + (ai ci) + (bi ci)
ci can be factored out: ci+1 = (ai bi) + (ai + bi) ci
So: c2 = (a1 b1) + (a1 + b1)((a0 b0) + (a0 + b0) c0)

37 Carry Lookahead Note the repeated appearance of (ai bi) and (ai + bi).
These are called generate (gi) and propagate (pi): gi = ai bi, pi = ai + bi, so ci+1 = gi + pi ci
This means that if gi = 1, a CarryOut is generated, and if pi = 1, a CarryOut is propagated from CarryIn.

38 Carry Lookahead c1=g0+(p0c0) c2=g1+(p1g0)+(p1p0c0)
c3=g2+(p2g1)+(p2p1g0)+(p2p1p0c0) c4=g3+(p3g2)+(p3p2g1)+(p3p2p1g0)+(p3p2p1p0c0) …This system will give us an adder with 5 gate delays but it is still too complex

39 Carry Lookahead To solve this, we’ll build our adder using 4-bit adders with carry lookahead, and connect them using “super”-propagate and generate logic The superpropagate is only true if all the bits propagate a carry P0=p0p1p2p3 P1=p4p5p6p7 P2=p8p9p10p11 P3=p12p13p14p15

40 Carry Lookahead The supergenerate follows a similar equation:
G0 = g3 + (p3 g2) + (p3 p2 g1) + (p3 p2 p1 g0)
G1 = g7 + (p7 g6) + (p7 p6 g5) + (p7 p6 p5 g4)
G2 = g11 + (p11 g10) + (p11 p10 g9) + (p11 p10 p9 g8)
G3 = g15 + (p15 g14) + (p15 p14 g13) + (p15 p14 p13 g12)
The supergenerate and superpropagate logic for the four 4-bit carry lookahead adders is contained in a Carry Lookahead Unit. This yields a worst-case delay of 7 gate delays. Reason?

41 Carry Lookahead We’ve covered all ALU functions except for the shifter
We'll talk about the shifter later

42 Lecture 4: Arithmetic for Computers (Part 4)
CS 447 Jason Bakos

43 Binary Multiplication
In multiplication, the first operand is called the multiplicand, and the second is called the multiplier. The result is called the product. Not counting the sign bits, if we multiply an n-bit multiplicand by an m-bit multiplier, we'll get an (n+m)-bit product.

44 Binary Multiplication
Binary multiplication works exactly like decimal multiplication. In fact, try working a multiplication out by hand and pretend you're using decimal numbers; the digits are just 0's and 1's.

45 First Hardware Design for Multiplier
Note that the multiplier is not routed into the ALU

46 Second Hardware Design for Multiplier
Architects realized that at least half of the bits in the 64-bit multiplicand register were always 0. Reduce the ALU to 32 bits and shift the product right instead of shifting the multiplicand left. In this case, the ALU and the multiplicand register are only 32 bits.

47 Second Hardware Design for Multiplier

48 Final Hardware Design for Multiplier
Let’s combine the product register with the multiplier register… Put the multiplier in the right half of the product register and initialize the left half with zeros – when we’re done, the product will be in the right half

49 Final Hardware Design for Multiplier

50 Final Hardware Design for Multiplier
For the first two designs, the multiplicand and the multiplier must be converted to positive. The signs would need to be remembered so the product can be converted to whatever sign it needs to be. The third design will deal with signed numbers, as long as the sign bit is extended in the product register.

51 Booth’s Algorithm Booth’s Algorithm starts with the observation that if we have the ability to both add and subtract, there are multiple ways to compute a product For every 0 in the multiplier, we shift the multiplicand For every 1 in the multiplier, we add the multiplicand to the product, then shift the multiplicand

52 Booth’s Algorithm Instead, when a 1 is seen in the multiplier, subtract instead of add Shift for all 1’s after this, until the first 0 is seen, then add The method was developed because in Booth’s era, shifters were faster than adders

53 Booth's Algorithm Example: 0010 (2) x 0110 (6)
bit 0 = 0: shift
bit 1 = 1 (first 1): subtract the multiplicand: -2 x 2^1 = -4
bit 2 = 1 (middle of the run): shift
bit 3 = 0 (first 0 after the run): add the multiplicand: +2 x 2^3 = +16
-4 + 16 = 12 = 2 x 6
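Booth's rule (subtract at the first 1 of a run, add at the first 0 after the run) can be sketched as a loop over the multiplier's bits. This is radix-2 Booth recoding in software with an illustrative function name, not the register-level hardware:

```c
#include <stdint.h>

/* Scan the multiplier from LSB to MSB, subtracting the shifted
   multiplicand at the start of each run of 1s and adding it just past
   the end of the run. (Assumes arithmetic >> on signed values, which
   holds on common compilers.) */
int64_t booth_multiply(int32_t multiplicand, int32_t multiplier) {
    int64_t product = 0;
    int prev = 0;                            /* the bit to the right of bit 0 is 0 */
    for (int i = 0; i < 32; i++) {
        int cur = (multiplier >> i) & 1;
        if (cur == 1 && prev == 0)           /* first 1 of a run: subtract */
            product -= (int64_t)multiplicand * ((int64_t)1 << i);
        else if (cur == 0 && prev == 1)      /* first 0 after a run: add */
            product += (int64_t)multiplicand * ((int64_t)1 << i);
        prev = cur;                          /* otherwise just shift */
    }
    return product;
}
```

Running it on the slide's example, booth_multiply(2, 6) performs exactly the subtract-at-2^1 and add-at-2^3 steps shown above and returns 12; because a run of 1s that reaches the sign bit simply never triggers the final add, negative multipliers come out correct as well.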

54 Lecture 4: Arithmetic for Computers (Part 5)
CS 447 Jason Bakos

55 Binary Division Like last lecture, we’ll start with some basic terminology… Again, let’s assume our numbers are base 10, but let’s only use 0’s and 1’s

56 Binary Division Recall:
Dividend=Quotient*Divisor + Remainder Let’s assume that both the dividend and divisor are positive and hence the quotient and the remainder are nonnegative The division operands and both results are 32-bit values and we will ignore the sign for now

57 First Hardware Design for Divider
Initialize the Quotient register to 0, initialize the left-half of the Divisor register with the divisor, and initialize the Remainder register with the dividend (right-aligned)

58 Second Hardware Design for Divider
Much like with the multiplier, the divisor and ALU can be reduced to 32-bits if we shift the remainder right instead of shifting the divisor to the left Also, the algorithm must be changed so the remainder is shifted left before the subtraction takes place

59 Third Hardware Design for Divider
Shift the bits of the quotient into the remainder register… Also, the last step of the algorithm is to shift the left half of the remainder right 1 bit
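The behavior of this design for unsigned operands can be sketched as restoring division in C: shift the next dividend bit into the remainder, subtract the divisor when it fits, and set the corresponding quotient bit. The helper names are illustrative, and the divisor must be nonzero:

```c
#include <stdint.h>

/* Restoring division: 32 iterations, one quotient bit per step. */
void udivmod(uint32_t dividend, uint32_t divisor,
             uint32_t *quotient, uint32_t *remainder) {
    uint32_t q = 0, r = 0;
    for (int i = 31; i >= 0; i--) {
        r = (r << 1) | ((dividend >> i) & 1);  /* shift next dividend bit in */
        if (r >= divisor) {                    /* subtraction succeeds: quotient bit is 1 */
            r -= divisor;
            q |= 1u << i;
        }
    }
    *quotient = q;
    *remainder = r;
}

uint32_t udiv(uint32_t a, uint32_t b) { uint32_t q, r; udivmod(a, b, &q, &r); return q; }
uint32_t umod(uint32_t a, uint32_t b) { uint32_t q, r; udivmod(a, b, &q, &r); return r; }
```

The "restore" in the hardware corresponds to the if-test here: when the subtraction would go negative, the hardware adds the divisor back, which this sketch models by simply not subtracting.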

60 Signed Division Simplest solution: remember the signs of the divisor and the dividend and then negate the quotient if the signs disagree The dividend and the remainder must have the same signs

61 Considerations The same hardware can be used for both multiply and divide Requirement: 64-bit register that can shift left or right and a 32-bit ALU that can add or subtract

62 Floating Point Floating point (also called real) numbers are used to represent values that are fractional or that are too big to fit in a 32-bit integer. Floating point numbers are expressed in scientific notation (base 2) and are normalized (no leading 0's): 1.xxxx (base 2) x 2^yyyy. In this case, xxxx is the significand and yyyy is the exponent.

63 Floating Point In MIPS, a floating point is represented in the following manner (IEEE 754 standard): bit 31: sign of significand bit (8) exponent (2’s comp) bit (23) significand Note that size of exponent and significand must be traded off... accuracy vs. range This allows us representation for signed numbers as small as 2x10-38 to 2x1038 Overflow and underflow must be detected Double-precision floating point numbers are 2 words... the significand is extended to 52 bits and the exponent to 11 bits Also, the first bit of the significand is implicit (only the fractional part is specified) In order to represent 0 in a float, put 0 in the exponent field So here’s the equation we use: (-1)S x (1+Significand) x 2E Or: (-1)S X (1+ (s1x2-1) + (s2x2-2) + (s3x2-3) + (s4x2-4) + ...) x 2E

64 Considerations IEEE 754 sought to make floating-point numbers easy to sort: the sign is the first bit, and the exponent comes before the significand. But we want an all-0s exponent to represent the most negative exponent and an all-1s exponent to represent the most positive. This is called biased notation, so we'll use the following equation:
(-1)^S x (1 + Significand) x 2^(Exponent - Bias)
Bias is 127 for single precision and 1023 for double precision.

65 Lecture 4: Arithmetic for Computers (Part 6)
CS 447 Jason Bakos

66 Converting Decimal Floating Point to Binary
Use the method I showed last lecture... Significand: Use the iterative method to convert the fractional part to binary Convert the integer part to binary using the “old-fashioned” method Shift the decimal point to the left until the number is normalized Drop the leading 1, and set the exponent to be the number of positions you shifted the decimal point Adjust the exponent for bias (127/1023)

67 Floating Point Addition
Let's add two decimal floating point numbers, one with exponent 10^1 and one with exponent 10^-1. Assume we can only store 4 digits of the significand and two digits of the exponent.

68 Floating Point Addition
1. Match exponents for both operands by un-normalizing one of them (match to the exponent of the larger number)
2. Add significands
3. Normalize the result
4. Round the significand
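The steps above can be followed with a toy decimal format: a value is a 4-digit significand d.ddd paired with an exponent, as in the example. This is an illustrative model of the algorithm for nonnegative values only, not the binary hardware, and rounding is simplified to truncation:

```c
/* Toy decimal floating point: value = (sig / 1000) * 10^exp,
   where sig holds 4 decimal digits (e.g. {9999, 1} is 9.999 x 10^1). */
typedef struct { int sig; int exp; } dfp;

dfp dfp_add(dfp x, dfp y) {
    /* Step 1: un-normalize the smaller operand to match the larger exponent. */
    while (x.exp < y.exp) { x.sig /= 10; x.exp++; }
    while (y.exp < x.exp) { y.sig /= 10; y.exp++; }

    /* Step 2: add significands. */
    dfp r = { x.sig + y.sig, x.exp };

    /* Steps 3 and 4: normalize back to 4 digits, truncating low digits
       (a crude stand-in for proper rounding). */
    while (r.sig >= 10000) { r.sig /= 10; r.exp++; }
    return r;
}
```

For instance, adding 9.999 x 10^1 and 1.610 x 10^-1 first shifts the second significand down to 0.016 x 10^1, sums to 10.015 x 10^1, then normalizes to 1.001 x 10^2.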

69 Binary Floating Point Addition

70 Floating Point Multiplication
Example: 1.110 x 10^10 times 9.200 x 10^-5. Assume 4 digits for the significand and 2 digits for the exponent. Calculate the exponent of the product by simply adding the exponents of the operands: 10 + (-5) = 5. If instead we add the biased exponents directly: 137 + 122 = 259. Something's wrong! We added both biases along with the exponents, so we must subtract one bias back out; the correct biased exponent is 5 + 127 = 132.

71 Floating Point Multiplication
Multiply the significands: 1.110 x 9.200 = 10.212000. Normalize and add 1 to the exponent: 1.0212 x 10^6. Round the significand to four digits: 1.021. Set the sign based on the signs of the operands: +1.021 x 10^6

72 Floating Point Multiplication

73 Accurate Arithmetic Integers can represent every value between the largest and smallest possible values. This is not the case with floating point: only 2^53 unique significand values can be represented with double precision fp. IEEE 754 always keeps 2 extra bits on the right of the significand during intermediate calculations, called guard and round, to minimize rounding errors.

74 Accurate Arithmetic Since the worst case for rounding is when the actual number is halfway between two floating point representations, accuracy is measured as the number of least-significant bits of error. This is called units in the last place (ulp). IEEE 754 guarantees that the computer is within 0.5 ulp (using guard and round).

