Presentation is loading. Please wait.

Presentation is loading. Please wait.

Chapter 8 Problems Prof. Sin-Min Lee Department of Mathematics and Computer Science.

Similar presentations


Presentation on theme: "Chapter 8 Problems Prof. Sin-Min Lee Department of Mathematics and Computer Science."— Presentation transcript:

1 Chapter 8 Problems Prof. Sin-Min Lee Department of Mathematics and Computer Science

2

3

4 Addition And Subtraction For both the non-negative and two’s complement notations, addition and subtraction are fairly straightforward. However, when the result cannot be represented as an 8-bit value, a problem arises. For the non- negative notation, consider the addition 255+1, 1111 1111 + 0000 0001. Straight binary addition yields the result 1 0000 0000, a 9-bit value, which cannot be stored in an 8-bit register.

5 Overflow Arithmetic overflow: The extra bit generates a carry out from the parallel adder. In non-negative notation, this carry bit can set an overflow flag, signaling the rest of the system that an overflow has occurred, and that the result generated is not entirely correct. The rest of the system can either fix the result of handle the error appropriately.

6

7 Overflow In two’s complement notation an overflow can occur at either end of the numeric range. At the positive end, adding 127+1, 0111 1111 + 0000 0001, yields a result of 1 0000 0000. However, in two’s complement notation, this is -128, not the desired value of +128. In this notation, the key to recognizing overflow is to check not only the carry out, but also the carry in to the most significant bit of the result. If two carries are equal, then there is no overflow.

8 Overflow Overflow only occurs when two numbers with the same sign are added. Adding two numbers with different signs always produces a valid result.

9 Overflow generation in unsigned two’s complement addition 126 01111110 +1 00000001 01111111 00 127 01111111 +1 00000001 10000000 01 -127 10000001 +(-1) 11111111 10000000 11 -128 10000000 +(-1) 11111111 01111111 10

10 n Whether there is an overflow or not depends on the interpretation (signed or unsigned number).

11 each partial product is the product of one bit of the multiplier times the multiplicand. There are as many partial products as there are bits in the multiplier, and there are as many bits in each partial product as there are bits in the multiplicand.

12

13 In looking for an algorithmic statement of this approach to binary multiplication, it was found that a group of Russian peasants used precisely this method to multiply decimal numbers, and as a result, the binary multiplication algorithm given here is commonly known as the Russian Peasant Algorithm:

14

15 The best of these fast multiplication algorithms was developed by HP for the HP PA RISC architecture. In developing the first generation of this architecture, the HP designers concluded that a hardware multiply instruction was not justified, because most multiplies are multiplies by constants and can be replaced by add-and-shift instructions (as on the Hawk) and because the rare multiply of one variable by another could be done quickly enough in software.

16 The HP PA RISC multiply algorithm is based on the following notions: 1.First, do the multiply in base 16, so each step involves multiplying the multiplicand by the least significant 4 bits of the multiplier and then adding the partial product to the result. 2.To multiply the multiplicand by one hex digit of the multiplier, use an efficient case/select control structure to select one of 16 different blocks of code for multiplying by a constant. Each of these blocks will be only a few instructions long because of the use of add-and-shift instructions. 3.To avoid the cost of loop control instructions, note that the fixed word size implies that each multiply can be done in a fixed number of iterations (4 hex digits in a 16 bit multiplier, or 8 hex digits in a 32 bit multiplier). So, just write out the body of the loop 4 or 8 times in series and eliminate any need for loop counters.

17

18

19

20

21

22

23

24

25

26

27

28

29 What are BCD Numbers? Binary Coded Decimal numbers are actually binary numbers that are "coded" to represent decimal numbers Coded numbers are "not real numbering systems". In fact, they are just what they say they are, "codes that represent actual numbers". Although the actual numbers can be mathematically manipulated, codes follow no such rules.

30

31

32

33 Example 1 BCD (365)_10 -------------> 0011 | 0110 | 0101 Example 2.

34 Addition Addition is analogous to decimal addition with normal binary addition taking place from right to left. For example,

35

36

37

38 Where the result of any addition exceeds 9(1001) then six (0110) must be added to the sum to account for the six invalid BCD codes that are available with a 4-bit number. This is illustrated in the example below

39 When one adds two BCD digits, if the binary sum is less than 1010, the corresponding BCD sum digit is correct. if the binary sum is greater than or equal to 1010, add (0110) 2 to the corresponding BCD sum digit and produce a carry.

40

41

42

43

44 8.4.1 Pipelining 1. Data enters a stage of the pipeline, and it will through go through different stages of arithmetic operation till the final computation is completed. Note: Each stage only perform its specific function. 2. Pipeline improves the performance by overlapping computation; that is, each stage operate on different data simultaneously.

45 Compare process time between Pipeline and Non-Pipeline Consider this code: For i = 1 to 100 Do {A[i] (B[i] * C[i]) + D[i]} (assume multiplication & Addition require 10ns) Non-pipeline: 10ns * 2 * 100 = 2000ns

46 An Example on Arithmetic Pipeline Consider the following snippet code: FOR I = 1 to 100 DO {A[I]  (B[I]. C[I]) + D[I]} Assume that each operation, multiplication and addition, requires 10 ns to complete. A non-pipelined uniprocessor take 20 ns to calculate A[I], and 2000 ns to execute the code. A pipeline unit could break this computation into two stages as shown in the next slide.

47 A two-stage pipeline to implement the program loop B[i] C[i] CLK D[i] CLK Stage 1 (*) Stage 2 (+) * Latch +

48 Example : A two-stage pipeline to implement the program loop During the first 10 ns, the first stage calculates B[1].C[1]. In the next 10 ns, stage 2 adds this value to D[1]and stores the result in A[1]. At the same time, stage 1 multiplies B[2] and C[2]. During the following 10 ns, stage 1 forms B[3].C[3] and stage 2 calculate the final value A[2]. Instead of 2000 ns which a non-pipelined uniprocessor would take, this pipeline executes the code in 1010 ns.

49 Speedup for Pipeline A pipeline’s speedup S is the time needed to process n pieces of data using non-pipeline (T 1 ), divided by the time needed to process same data using k-stage pipeline (T k ). Speedup of pipeline can expressed as: n Sn =Sn = nT 1 (k + n -1) T k

50 Continue: Apply Speedup expression using previous example: S 100 = 100 * 20ns (2 + 100 - 1) * 10 ns = 1.98 T 1, is the time to to process n pieces data using non- pipeline (* and +). n TkTk k = Two stages (* and +)

51 Speed up speedupThe most popular metrics used to measure the performance of a pipeline are throughput (the number of results generated per time unit) and speedup. A pipeline’s speedup, Sn, is the amount of time needed to process n pieces of data using a non-pipelined arithmetic unit, divided by the time need to process the same data using a k-stage pipeline. Sn = nT1 / (k + n - 1) x Tk

52 Calculating Speedup Sn = nT1 / (k + n -1) x Tk T1 : the time required to calculate using non-pipeline Tk : is the the clock period of the k-stage pipeline The pipeline unit requires k time units, each of duration Tk, to move the first piece of data to the pipeline. Using the last example, the speedup can be calculated as followed: S100 = 100x20ns / (2 + 100 - 1) x 10ns = 1.98 As n approaches infinity, n / (k + n - 1) approaches 1, so S  = T1 / Tk The maximum speedup occurs when each stage has the same delay.

53 8.4.2 Lookup Tables 1. Theoretically, any combinatorial circuit can be implemented by a ROM configured as a lookup table. 2. The ROM is programmed with data such that the correct values are output for any possible input values. 3. A 4x1 ROM is like a two-input AND gate. The input of the AND gate serves as the address input for ROM, and the output of ROM corresponds to the AND gate output.

54 Lookup ROM equivalent to AND gate A1 ROM D0 A0 Address Data 0 0 0 0 1 0 1 0 0 1 1 1 x & y x y AND x & y x y 4 x 1 All possible And gate inputs are stored in ROM


Download ppt "Chapter 8 Problems Prof. Sin-Min Lee Department of Mathematics and Computer Science."

Similar presentations


Ads by Google