Published by Jocelyn Dawson. Modified over 2 years ago.

1 Introduction to Integer Arithmetic

2 Suggested Reading
Computer Arithmetic – Behrooz Parhami – Oxford Press: the sections on Basic Division Schemes, Division by Convergence, and Square-Rooting Methods
Digital Computer Arithmetic – Joseph F. F. Cavanagh – McGraw-Hill
Computer Arithmetic Simulator by Israel Koren

3 Topics
Numeric Encodings
  Unsigned & two's complement
Programming Implications
  C promotion rules
Basic operations
  Addition, negation, multiplication
Programming Implications
  Consequences of overflow
  Using shifts to perform power-of-2 multiply/divide

4 Number Range
Decimal: $X = (x_{k-1} x_{k-2} \ldots x_1 x_0 . x_{-1} \ldots x_{-l})_{10}$, with $X_{min} = 0$ and $X_{max} = 10^k - 10^{-l}$
Binary number system: $X = (x_{k-1} x_{k-2} \ldots x_1 x_0 . x_{-1} \ldots x_{-l})_2$, with range $0$ to $2^k - 2^{-l}$
Conventional fixed-radix: $X = (x_{k-1} x_{k-2} \ldots x_1 x_0 . x_{-1} \ldots x_{-l})_r$, with range $0$ to $r^k - r^{-l}$
Notation: $ulp = r^{-l}$, the unit in the least significant position (unit in the last position)

5 Representations of signed numbers
Signed-magnitude
Biased: e.g. [-8, +7] -> [0, 15]; used for the F.P. exponent
Complement
  Radix-complement; with r = 2: two's complement (the most used representation)
  Diminished-radix complement (digit complement); with r = 2: ones' complement

6 [Figure: the same values encoded in signed-magnitude, biased, two's complement and ones' complement]

7 Encoding Integers
short int x = 15213;
short int y = -15213;
A C short is 2 bytes long.
[Formulas: value of a bit pattern read as unsigned (B2U) and as two's complement (B2T), with the sign bit marked]
Sign bit: for two's complement, the most significant bit indicates the sign
  0 for nonnegative
  1 for negative
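The two readings of a bit pattern named on this slide can be sketched in C. This is only an illustration: the function names `b2u` and `b2t` follow the B2U/B2T labels used later in the deck, the explicit width parameter `w` is an assumption, and the code assumes 1 <= w <= 31.

```c
#include <stdint.h>

/* B2U: read the low w bits as an unsigned number (bit i has weight 2^i). */
uint32_t b2u(uint32_t bits, unsigned w) {
    uint32_t value = 0;
    for (unsigned i = 0; i < w; i++) {
        if ((bits >> i) & 1u) value += (uint32_t)1 << i;
    }
    return value;
}

/* B2T: same bits, but the most significant bit has weight -2^(w-1). */
int32_t b2t(uint32_t bits, unsigned w) {
    int32_t value = 0;
    for (unsigned i = 0; i + 1 < w; i++) {
        if ((bits >> i) & 1u) value += (int32_t)1 << i;
    }
    if ((bits >> (w - 1)) & 1u) value -= (int32_t)1 << (w - 1);
    return value;
}
```

For example, the 16-bit pattern C4 93 reads as 50323 with `b2u` and as -15213 with `b2t`.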

8 Encoding Example (Cont.)
x = 15213: 00111011 01101101
y = -15213: 11000100 10010011

9 Numeric Ranges
Unsigned values
  UMin = 0 (000…0)
  UMax = $2^w - 1$ (111…1)
Two's complement values
  TMin = $-2^{w-1}$ (100…0)
  TMax = $2^{w-1} - 1$ (011…1)
Other values
  Minus 1 (-1): 111…1
Values for W = 16: UMax = 65535, TMax = 32767, TMin = -32768

10 Values for Different Word Sizes
[Table: UMax, TMax and TMin for several word sizes W]
Observations
  |TMin| = TMax + 1 (asymmetric range)
  UMax = 2 * TMax + 1
C Programming
  #include <limits.h> (K&R App. B11)
  Declares constants, e.g., ULONG_MAX, LONG_MAX, LONG_MIN
  Values are platform-specific

11 Unsigned & Signed Numeric Values
Equivalence
  Same encodings for nonnegative values
Uniqueness
  Every bit pattern represents a unique integer value
  Each representable integer has a unique bit encoding

X      B2T(X)  B2U(X)
0000    0       0
0001    1       1
0010    2       2
0011    3       3
0100    4       4
0101    5       5
0110    6       6
0111    7       7
1000   -8       8
1001   -7       9
1010   -6      10
1011   -5      11
1100   -4      12
1101   -3      13
1110   -2      14
1111   -1      15

12 Casting Signed to Unsigned
short int x = 15213;
unsigned short int ux = (unsigned short) x;
short int y = -15213;
unsigned short int uy = (unsigned short) y;
C allows conversions from signed to unsigned.
Resulting value
  No change in bit representation
  Nonnegative values are unchanged: ux = 15213
  Negative values change into (large) positive values: uy = 50323

13 Signed vs. Unsigned in C
Constants
  By default are considered to be signed integers
  Unsigned if they have U as a suffix, e.g. 0U
Casting
  Explicit casting between signed & unsigned is the same as U2T and T2U
    int tx, ty;
    unsigned ux, uy;
    tx = (int) ux;
    uy = (unsigned) ty;
  Implicit casting also occurs via assignments and procedure calls
    tx = ux;
    uy = ty;

14 Sign Extension
Task:
  Given a w-bit signed integer x
  Convert it to a (w+k)-bit integer with the same value
Rule:
  Make k copies of the sign bit:
  $X' = x_{w-1}, \ldots, x_{w-1}, x_{w-1}, x_{w-2}, \ldots, x_0$ (k copies of the MSB)
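The rule above can be written out by hand in C, although the compiler normally does this for you. A minimal sketch, assuming the w-bit value sits in the low bits of a 32-bit word and 1 <= w <= 32; the name `sign_extend` is hypothetical.

```c
#include <stdint.h>

/* Extend a w-bit two's complement pattern to 32 bits by copying the
   sign bit (bit w-1) into every higher bit position. */
uint32_t sign_extend(uint32_t bits, unsigned w) {
    uint32_t sign = (bits >> (w - 1)) & 1u;
    for (unsigned i = w; i < 32; i++) {
        /* Clear bit i, then set it to the sign bit. */
        bits = (bits & ~((uint32_t)1 << i)) | (sign << i);
    }
    return bits;
}
```

With w = 16, the pattern C4 93 becomes FF FF C4 93 and 3B 6D becomes 00 00 3B 6D.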

15 Sign Extension Example
Converting from a smaller to a larger integer data type
C automatically performs sign extension
short int x = 15213;
int ix = (int) x;
short int y = -15213;
int iy = (int) y;

      Decimal   Hex           Binary
x     15213     3B 6D         00111011 01101101
ix    15213     00 00 3B 6D   00000000 00000000 00111011 01101101
y     -15213    C4 93         11000100 10010011
iy    -15213    FF FF C4 93   11111111 11111111 11000100 10010011

16 Negating with Complement & Increment
Claim: the following holds for two's complement
  ~x + 1 == -x
Complement
  Observation: ~x + x == 1111…11 (binary) == -1
Increment
  ~x + x + (-x + 1) == -1 + (-x + 1)   (adding (-x + 1) to both sides of the equation)
  ~x + 1 == -x
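A short C check of the claim, as a sketch only. The function name `negate16` is hypothetical, and unsigned 16-bit patterns are used so that the wrap-around is well defined in C.

```c
#include <stdint.h>

/* Negate a 16-bit two's complement pattern: complement every bit,
   then add 1. The cast keeps only the low 16 bits. */
uint16_t negate16(uint16_t x) {
    return (uint16_t)(~x + 1u);
}
```

Note the one value without a positive counterpart: negating TMin (80 00) gives TMin again.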

17 Comp. & Incr. Examples
[Tables: worked complement-and-increment examples]

18 Unsigned Addition
Operands: w bits (u and v)
True sum: w+1 bits (u + v)
Discard carry: w bits, UAdd_w(u, v)
Standard addition function
  Ignores carry output
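A sketch of UAdd_w in C for an arbitrary width, assuming 1 <= w <= 31. The helper `uadd_carry` is not on the slide; it is added here only to expose the carry that UAdd_w throws away.

```c
#include <stdint.h>

/* UAdd_w(u, v): w-bit unsigned addition. The true sum needs w+1 bits;
   masking discards the carry out of bit w-1. */
uint32_t uadd(uint32_t u, uint32_t v, unsigned w) {
    uint32_t mask = ((uint32_t)1 << w) - 1u;
    return (u + v) & mask;
}

/* The discarded carry bit (hypothetical helper, for illustration). */
unsigned uadd_carry(uint32_t u, uint32_t v, unsigned w) {
    uint32_t mask = ((uint32_t)1 << w) - 1u;
    return (unsigned)((((u & mask) + (v & mask)) >> w) & 1u);
}
```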

19 Class Exercise - 1
Suppose that you have a number (positive or negative) represented in two's complement, using 4 bits of word length.
Specify the steps which are necessary to perform fast division by 2 and obtain the correct result for positive and negative numbers.
Remember that an arithmetic right shift replicates the sign bit into the vacated most significant positions.
Think about 4-bit numbers in two's complement, that is, numbers in the range [-8 to +7].

20 Carry and Overflow Detection in Software for Two's Complement Arithmetic
Detection in hardware is slightly different because the result (Sum) bit is not used; the $Cy_{i-1}$ and $Cy_{i-2}$ bits are used instead.
CARRY (addition or subtraction)
  [Truth table with columns: sign $A_{i-1}$, sign $B_{i-1}$, carry in $Cyin_{i-1}$, carry out $Cyout_i$]
  $Cyout_i = A_{i-1} \cdot B_{i-1} + (A_{i-1} \oplus B_{i-1}) \cdot Cyin_{i-1}$
OVERFLOW
  [Truth table with columns: sign $A_{i-1}$, sign $B_{i-1}$, carry in $Cyin_{i-1}$, overflow $Ovf_i$]
  $Ovf_i = A_{i-1} \cdot B_{i-1} \cdot \overline{Cyin_{i-1}} + \overline{A_{i-1}} \cdot \overline{B_{i-1}} \cdot Cyin_{i-1}$ (hardware method of detection)
  Software method of detection: the numbers have equal signs and the sign of the result differs from theirs.
It is possible to have a CARRY out and not have an OVERFLOW!

21 Basic Operations in Two's Complement: Addition and Subtraction Have the Same Treatment
A = -7; B = +8; A - B = A + (-B) = ? (4 bits)
    1001 (-7)
  + 1000 (-8)
  = 0001 (CY = 1, Borrow = 0); the true result is -15: OVERFLOW
A = +7; B = +8; A - B = A + (-B) = ?
    0111 (+7)
  + 1000 (-8)
  = 1111 (-1) (CY = 0, Borrow = 1): NO OVERFLOW
A = +7; B = +6; A - B = A + (-B) = ?
    0111 (+7)
  + 1010 (-6)
  = 0001 (+1) (CY = 1, Borrow = 0): NO OVERFLOW
NOTES:
1 – Multi-word arithmetic (e.g. a 16-bit subtraction on an 8-bit microcontroller) demands arithmetic operations which use the CARRY FLAG.
2 – The hardware has only one flag, which is usually termed CARRY, and the instructions are usually termed ADDc, SUBc (or SUBnc).
3 – The examples above show that in subtraction it actually has to be SUBTRACT on BORROW: a borrow is propagated when there is no CARRY.
4 – OVERFLOW (hardware detection): overflow occurs when the sign bits are zero and there is a carry from the previous bits, or when the sign bits are one and there is no carry from the previous bits: $Ov = X_{n-1} \cdot Y_{n-1} \cdot \overline{CY_{n-1}} + \overline{X_{n-1}} \cdot \overline{Y_{n-1}} \cdot CY_{n-1}$
5 – OVERFLOW (software detection): if both numbers have the same sign and the result of the addition has a different sign, then an overflow occurred.
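The software detection rule in note 5 can be sketched in C for the 4-bit examples above. The type `add4_result` and the function `add4` are hypothetical names; operands are passed as 4-bit patterns, with any subtraction already rewritten as the addition of the negated operand.

```c
#include <stdint.h>

typedef struct {
    uint8_t sum;   /* 4-bit result pattern */
    int carry;     /* carry out of the sign position */
    int overflow;  /* two's complement overflow */
} add4_result;

/* Add two 4-bit two's complement patterns and report both flags. */
add4_result add4(uint8_t a, uint8_t b) {
    add4_result r;
    unsigned wide = (a & 0xFu) + (b & 0xFu);   /* true sum, 5 bits */
    r.sum = (uint8_t)(wide & 0xFu);
    r.carry = (int)((wide >> 4) & 1u);
    int sa = (a >> 3) & 1, sb = (b >> 3) & 1, ss = (r.sum >> 3) & 1;
    /* Software rule: equal operand signs, different result sign. */
    r.overflow = (sa == sb) && (ss != sa);
    return r;
}
```

`add4(0x9, 0x8)` reproduces the first example (carry and overflow both set), while `add4(0x7, 0xA)` gives a carry without an overflow.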

22 Multiplication in Two's Complement
It can be easily performed in software by a sequence of shift-and-add operations, but it takes many clock cycles. The number of clock cycles is directly proportional to the number of bits of the operands.
The software algorithm can be improved, as will be seen in the next slide.
If the microprocessor has a parallel combinational multiplier, it can be done in one clock cycle (for single-word operands) or approximately 6 clock cycles for double-word operands (DSPs and other modern microprocessors have such a multiplier).

23 Optimized Multiplication in Software (http://www.convict.lu/Jeunes/Math/Fast_operations.htm)
The algorithm is based on a particularity of binary notation. Imagine multiplying the base-10 numbers $x_{10} = 7$ and $y_{10} = 5$:
  $x_2 = 111$, $y_2 = 101$, which signifies $y_{10} = 1 \cdot 2^2 + 0 \cdot 2^1 + 1 \cdot 2^0 = 1 \cdot 100_2 + 0 \cdot 10_2 + 1 \cdot 1_2$
The distributive rule gives us:
  111 * 101 = 111 * (1*100 + 0*10 + 1*1) = 111*(1*100) + 111*(0*10) + 111*(1*1)
The associative and commutative rules give us:
  = (111*100)*1 + (111*10)*0 + (111*1)*1
In binary notation, multiplying by powers of 2 is equivalent to shifting the number:
  = 11100*1 + 1110*0 + 111*1 = 11100 + 111 = 100011 = $35_{10}$
Thus a simple algorithm may be written for the multiplication z = x * y:
  z := 0
  while y <> 0 do
    is the least significant bit of y 1?
      yes: z := z + x;
      no: continue;
    shift x one digit to the left;
    shift y one digit to the right;
Let's now analyze the function MULV8, which may be accessed from within a program by preparing the temporary variables TEMPX and TEMPY, calling the function and finally retrieving the product from the variable RESULT.
For example, we want our program to compute:
  z := x * y
In PIC assembler this reads:
  MOVF x,W
  MOVWF TEMPX
  MOVF y,W
  MOVWF TEMPY
  CALL MULV8
  MOVF RESULT,W
  MOVWF z
The multiplication time (or number of cycles) depends on the bit pattern of the multiplicand and the multiplier. For example, multiplication by 3 is quite fast, because y soon becomes zero and the loop exits.
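The pseudocode above translates almost line for line into C. This is a sketch, not the MULV8 routine itself: the name `mul_shift_add` is hypothetical, and unlike the 8-bit RESULT used by the PIC code it keeps the full 16-bit product.

```c
#include <stdint.h>

/* Shift-and-add multiplication of two 8-bit unsigned numbers. */
uint16_t mul_shift_add(uint8_t x, uint8_t y) {
    uint16_t z = 0;
    uint16_t shifted_x = x;               /* x, shifted left once per step */
    while (y != 0) {
        if (y & 1u) {                     /* least significant bit of y is 1? */
            z = (uint16_t)(z + shifted_x);
        }
        shifted_x = (uint16_t)(shifted_x << 1);
        y >>= 1;                          /* loop ends once y has no 1 bits left */
    }
    return z;
}
```

The loop runs once per bit position up to the highest set bit of y, which is why small multipliers finish quickly.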

24 Optimized Multiplication in Software – 8 bits
Here is what the computer will do:
  clrf   'clear file' (in PIC language a file is an 8-bit register)
  movf   'transfer value from file to itself (F) or to the accumulator (W)'
  btfsc  'skip next instruction if the designated bit is clear'
  bcf    'bit clear at file'; with STATUS,C it clears the carry flag
  rrf    'rotate right file and store it to itself or to the accumulator'
  rlf    'rotate left file and store...'
  movlw  'fill accumulator with literal value'
  movwf  'transfer value from accumulator to file'
  btfss  'skip next instruction if the designated bit is set'; with STATUS,Z it tests whether the zero flag is set

MULV8
    CLRF RESULT
MULU8LOOP
    MOVF TEMPX,W
    BTFSC TEMPY,0       ; add only if bit 0 of TEMPY is set
    ADDWF RESULT
    BCF STATUS,C
    RRF TEMPY,F         ; shift y one digit to the right
    BCF STATUS,C
    RLF TEMPX,F         ; shift x one digit to the left
    MOVF TEMPY,F        ; sets Z if TEMPY is zero
    BTFSS STATUS,Z
    GOTO MULU8LOOP
    RETURN

25 Multiplication for 16 bits
ADD16
    MOVF TEMPX16,W
    ADDWF RESULT16
    BTFSC STATUS,C
    INCF RESULT16_H
    MOVF TEMPX16_H,W
    ADDWF RESULT16_H
    RETURN

MULV16
    CLRF RESULT16
    CLRF RESULT16_H
MULU16LOOP
    BTFSC TEMPY16,0
    CALL ADD16
    BCF STATUS,C
    RRF TEMPY16_H,F
    RRF TEMPY16,F
    BCF STATUS,C
    RLF TEMPX16,F
    RLF TEMPX16_H,F
    MOVF TEMPY16,F
    BTFSS STATUS,Z
    GOTO MULU16LOOP
    MOVF TEMPY16_H,F
    BTFSS STATUS,Z
    GOTO MULU16LOOP
    RETURN

26 Fixed-Point Arithmetic Representation
Layout: S | m bits (integer part) | n bits (fractional part)
Using two's complement:
  Integer part: $2^m - 1$ positive values, $2^m$ negative values
  Fractional part: $[2^{-n}, 1)$, with n bits
  Smallest number: $2^{-n}$
  Largest number: ~ 1
Let us suppose a fractional part with 10 bits:
  The smallest fraction is $2^{-10} = 1/1024 \approx 0.000977$
  The largest fraction is $1/2 + 1/4 + \ldots + 1/2^{10} = (2^{10} - 1)/2^{10} = 1023/1024 \approx 0.999$
Where is the position of the decimal point? It depends on the application.

27 Class Exercise - 2
Let us suppose an application (software for construction engineers) that requires objects as small as 1 mm, or as large as a medium-size building, to be represented on the screen. Consider that the computer has a word length of 16 bits.
Suppose also that the image used by the application can be rotated by very small angles (fractions of a degree, or radians) to give the illusion of continuous movement (the viewer can navigate inside the building as in a video game).
Devise and justify one possible fixed-point representation for this application that would be capable of satisfying the restrictions above.
Now suppose a flight simulator. What distances and object sizes can be represented using the same word partition as above?

28 Class Exercise 3
Represent the following real numbers (in base 10) in two's complement fixed-point arithmetic, with a total of eight bits, four of them in the fractional part, and perform the following operations:
A = […], B = […]
A + B =
A – B =
WHAT IF?
A = +5.5, B = +7.5
A + B =

29 Addition and Subtraction Using Fixed-Point Arithmetic
Addition: it all happens as if the numbers being added were integers. The integer unit of the ALU is used. Let us consider numbers in two's complement with a total of 8 bits and 4 bits in the fractional part:
A = […], B = […]
A + B = […] (the carry propagates from the fractional part into the integer part)
A - B = A + (-B) = […]
B = -2.375 = 1101.1010
-B = +2.375 = 0010.0110
WHAT IF?
A = +5.5 = 0101.1000, B = +7.5 = 0111.1000
A + B = +13.0 = 1101.0000: OVERFLOW! (the sign bit is set, so the pattern reads as a negative number)
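A sketch in C of the idea that fixed-point addition is just integer addition. It assumes the 8-bit format of this slide (4 fractional bits, so the stored pattern is the real value times 16); the macro `Q44` and the function `q44_add` are hypothetical names, and overflow is detected with the software rule from the earlier carry/overflow slide.

```c
#include <stdint.h>

/* Bit pattern of a real value in 8-bit two's complement with 4
   fractional bits: the value times 16, truncated to an integer. */
#define Q44(x) ((uint8_t)(int)((x) * 16))

/* Add two patterns with the plain integer adder. *overflow is set when
   both operands have the same sign and the result sign differs. */
uint8_t q44_add(uint8_t a, uint8_t b, int *overflow) {
    uint8_t sum = (uint8_t)(a + b);
    int sa = a >> 7, sb = b >> 7, ss = sum >> 7;
    *overflow = (sa == sb) && (ss != sa);
    return sum;
}
```

`q44_add(Q44(5.5), Q44(7.5), &ovf)` returns the pattern 1101.0000 (0xD0) with `ovf` set, matching the "WHAT IF" case.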

30 Class Exercise 4
Represent the following numbers in two's complement fixed-point arithmetic, with a total of eight bits and four bits in the fractional part, and obtain their product. The result has to fit into 8 bits using the same representation:
A = […], B = […]
A * B =

31 Fixed-Point Multiplication
Fixed-point arithmetic, together with scaling, is used to deal with integer and fractional values.
Overflow can occur and has to be treated by software.
Multiplication example: A = […], B = […], A * B = ?
[Figure: two 16-bit operands A and B, each split into an integer and a fraction field, give a 32-bit product (A * B)_32 with fields Integer H, Integer L, Fraction H and Fraction L. Scaling keeps Integer L and Fraction H as the 16-bit result (A * B)_16; bits lost from Integer H are overflow, bits lost from Fraction L are underflow.]
NOTES: if x is a small number, x^2 can UNDERFLOW; if y is a large number, y^2 can OVERFLOW.
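The multiply-then-scale step can be sketched in C. To keep the numbers small, this example assumes the 8-bit format used in the class exercises (4 fractional bits) instead of the 16-bit words of the figure. The name `q44_mul` is hypothetical, and clamping on overflow is one possible software treatment, not something the slide prescribes.

```c
#include <stdint.h>

/* Multiply two fixed-point values stored as value * 16 in an int8_t.
   The double-width product has 8 fractional bits, so dividing by 16
   scales it back to 4 fractional bits. */
int8_t q44_mul(int8_t a, int8_t b, int *overflow) {
    int wide = (int)a * (int)b;    /* double-width product */
    int scaled = wide / 16;        /* drops the low fraction bits (truncation) */
    *overflow = 0;
    if (scaled > INT8_MAX) { *overflow = 1; scaled = INT8_MAX; }
    if (scaled < INT8_MIN) { *overflow = 1; scaled = INT8_MIN; }
    return (int8_t)scaled;
}
```

With raw inputs 24 (1.5) and 40 (2.5) the result is 60 (3.75). Squaring the smallest value, raw 1 (0.0625), underflows to 0, and 88 times 120 (5.5 * 7.5) overflows.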

32 Losing Precision Because of Truncation
Consider an application that uses two's complement with a word length of 16 bits: 5 bits for the integer part and 10 bits for the fractional part (plus the sign bit).
Consider that we have to multiply a distance of […] meters by the cosine of 45° (0.707).
The correct value should be: […] * […] = […]
A = […] (16 bits)
B = […] (16 bits)
A * B = […] (32 bits)
A * B = […] (scaled to 16 bits)
Because of truncation, the result differs by 2.3%.

33 Does the Order of Computation Matter?
Let us consider a = 48221, b = […] and c = 33600, three values that we want to add and scale so that the result is translated into a domain with a 10-bit fractional part. Should we scale first and add afterwards, or vice versa?
Scaling the operands before performing the computation:
  Result1 = int[a / 1024] + int[b / 1024] + int[c / 1024] = 129
Scaling after adding the three values:
  Result2 = int[(a + b + c) / 1024] = 130
Result2 is more accurate than Result1!
CONCLUSION: sometimes we have to change the order of the operations to obtain better results.
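The two orders of evaluation are easy to compare in C. The function names below are hypothetical. Since the value of b is missing from the slide, the usage note picks b = 51900 purely as an illustrative assumption that reproduces the results 129 and 130.

```c
/* Scale each operand first, then add: every division truncates. */
long scale_then_add(long a, long b, long c) {
    return a / 1024 + b / 1024 + c / 1024;
}

/* Add first, then scale once: only one truncation takes place. */
long add_then_scale(long a, long b, long c) {
    return (a + b + c) / 1024;
}
```

With a = 48221, c = 33600 and the assumed b = 51900, `scale_then_add` gives 47 + 50 + 32 = 129, while `add_then_scale` gives 133721 / 1024 = 130.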

34 Some Comments About Arithmetic Operations on Embedded Systems
Many arithmetic operations can be sped up by using tables, e.g. trigonometric functions and division.
Software tricks (which would be very hard to implement in hardware) can be used to speed up arithmetic operations in embedded systems.
One of the techniques is to analyse the operands and decide how many loop iterations to perform.

35 Floating-Point vs. Fixed-Point
Floating-point provides a large dynamic range and fixed-point does not. What about precision?
A floating-point co-processor is very convenient for programmers because they do not have to worry about data ranges, alignment of the decimal point, or overflow and underflow detection. In fixed-point, the programmer has to worry about all these problems.
However, a floating-point unit demands considerable silicon area and power, which does not suit low-power embedded devices. In fact, most DSP processors have avoided floating-point units because of these restrictions.
When there is no floating-point unit, most arithmetic and trigonometric functions have to be done in software.

36 Class Exercise 5
Consider the following problem, where numbers are in two's complement, 8 bits in total and 4 bits in the fractional part.
A = 7.5, B = 0.25, C = 6.25
We want to do: (A + B + C) / […]
– Can I perform the addition and then divide the result? Does the result of the addition fit into the word length? What can I do?
What if: A = 0.25, B = 0.5, C = […]
We want to do: (A + B + C) / 1.25 * […]
– What happens if I perform the operations from left to right? What can I do to avoid losing significant bits during my operations?

37 Block Floating Point Operations - I
Block floating point provides some of the benefits of floating-point representation, but by scaling blocks of numbers rather than each individual number.
Block floating-point numbers are represented by the full word length of a fixed-point number.
If any one of a block of numbers becomes too large for the available word length, the programmer scales down all the numbers in the block by shifting them to the right. Example with word length = 8 bits: variables A, B and C are the result of some computation, and bits 8 and 9 do not fit into the original word length (overflow). To continue using them and maintain their relative values, they have to be scaled as a group (as a block), and the scaling has to be undone later.
Example: A = 10 | […], B = 00 | […], C = 01 | […]
Similarly, if the largest of a block of numbers is small, the programmer scales up all the numbers in the block to use the full available word length of the mantissa. Example with word length = 8 bits: A, B and C are the result of some previous computation where there was an underflow. If we scale up the block of variables, we do not lose the least significant bits. We have to undo the scale-up later to bring the result back to its proper domain.
Example: A = […] | 1101, B = […] | 1000, C = […] | […]
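The scale-down case can be sketched in C as follows. The function name `block_scale_down` and the use of 32-bit temporaries are assumptions for illustration; halving with `/ 2` stands in for the right shift so that negative values are handled without relying on implementation-defined shifts.

```c
#include <stdint.h>
#include <stddef.h>

/* Halve every value in the block, all by the same amount, until each
   one fits in a `bits`-bit two's complement word. Returns the shared
   exponent, i.e. how many positions the block was shifted. */
unsigned block_scale_down(int32_t *block, size_t n, unsigned bits) {
    int32_t max = ((int32_t)1 << (bits - 1)) - 1;
    int32_t min = -max - 1;
    unsigned exponent = 0;
    for (;;) {
        int fits = 1;
        for (size_t i = 0; i < n; i++) {
            if (block[i] > max || block[i] < min) fits = 0;
        }
        if (fits) return exponent;
        for (size_t i = 0; i < n; i++) block[i] /= 2;  /* one shared shift */
        exponent++;
    }
}
```

The returned exponent is what has to be remembered in order to undo the scaling after the computation.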

38 Block Floating Point Operations - II
This approach is used to make the most of the mantissa of the operands and also to minimize the loss of significant bits during arithmetic operations with scaling (truncation).
EXAMPLE: normalize operands (left shift until MSB = 1)
[Figure: block of values before and after normalization, with sign bits S and the shared exponent]

39 Example: 16-bit word processor
After converting our floating-point representation into a fixed-point representation, suppose that we have:
A = 5000, B = 9000, C = 8000
Suppose that we have to perform:
(-B + SQRT(B * B - 4 * A * C)) / (2 * A)
The intermediate results of (B * B) and (4 * A * C) are too big to fit into a 16-bit word. However, it is expected that the result fits into a 16-bit word.
Thus, we can use a block floating-point representation by shifting the data by the same amount and then performing the operations.
A = (5000 >> 10) (divide by 2^10, or 1024) => thus, exponent = 10
B = (9000 >> 10)
C = (8000 >> 10)
After the operation, the result is shifted back by the number of bits specified by the exponent.

40 Normalization Operations for Two's Complement Numbers
To use block floating point (and also other arithmetic operations), normalization operations are required.
Let us suppose 5-bit two's complement numbers. I have to calculate the normalization factor. How do I calculate it?
Let us try with the numbers +2 (00010) and –2 (11110).
For positive numbers I left shift 2 positions (x4): 00010 becomes 01000.
For negative numbers I left shift 3 positions (x8): 11110 becomes 10000.
So I have two different normalization factors? Does this work?
It works because, after the arithmetic operations, the resulting number is right shifted by the same amount.
Thus, calculate the number of left-shift positions (up to the sign bit) for the most significant 1 for positive numbers and for the most significant 0 for negative numbers.
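One way to compute the normalization factor in C, as a sketch: keep doubling the value while the result still fits in the w-bit two's complement range, and count the doublings. The name `norm_shift` is hypothetical and the code assumes a small word length (w <= 16) so that the doubling cannot overflow an `int`.

```c
/* Number of left shifts that can be applied to a w-bit two's
   complement value before it leaves the representable range. */
unsigned norm_shift(int value, unsigned w) {
    int max = (1 << (w - 1)) - 1;
    int min = -max - 1;
    unsigned shift = 0;
    /* Stop at w-1 shifts so that 0 and -1 do not loop forever. */
    while (shift < w - 1 && value * 2 <= max && value * 2 >= min) {
        value *= 2;
        shift++;
    }
    return shift;
}
```

For the 5-bit examples on the slide this gives 2 shifts for +2 and 3 shifts for –2.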

41 Division

42 Division
It is much more difficult to accelerate than multiplication. Some existing methods of implementation are:
Shift and subtract, or programmed division (similar to the paper-and-pencil method)
Restoring method
Non-restoring method
Division by convergence: obtain the reciprocal (inverse) of the divisor by some convergence method and multiply it by the dividend. Also a software method, but it assumes that the microprocessor has a hardware (very fast) multiplier.
Successive approximation methods to obtain the reciprocal of the divisor
Look-up table for the reciprocal (partial or total)
High-radix division: mostly methods for implementing in hardware
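As an illustration of division by convergence, the sketch below uses the Newton-Raphson iteration x := x * (2 - d * x), which converges to 1/d. The slide does not say which convergence method is meant, so this choice, the linear starting estimate 48/17 - (32/17) * d, and the name `divide_by_convergence` are all assumptions. Doubles are used for readability, and d must be positive.

```c
/* Approximate n / d (d > 0) by iterating towards 1/d and multiplying. */
double divide_by_convergence(double n, double d) {
    /* Scale n and d together so that d lies in [0.5, 1); the quotient
       is unchanged because both are scaled by the same power of 2. */
    while (d >= 1.0) { d /= 2.0; n /= 2.0; }
    while (d < 0.5)  { d *= 2.0; n *= 2.0; }

    /* Linear first estimate of 1/d on [0.5, 1). */
    double x = 48.0 / 17.0 - (32.0 / 17.0) * d;

    /* Each step roughly doubles the number of correct bits. */
    for (int i = 0; i < 5; i++) {
        x = x * (2.0 - d * x);
    }
    return n * x;
}
```

Only multiplications and subtractions appear inside the loop, which is why the method relies on a fast multiplier.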

43 Programmed (Restoring) Division Example – Integer Numbers
======== INTEGER DIV ========
z (dividend)   0111 0101    (117)
2^4 d          1010         (10)
=============================
s(0)           0111 0101
2s(0)        0 1110 101
-q3 2^4 d      1010         {q3 = 1}
s(1)           0100 101
2s(1)        0 1001 01
-q2 2^4 d      0000         {q2 = 0}
s(2)           1001 01
2s(2)        1 0010 1
-q1 2^4 d      1010         {q1 = 1}
s(3)           1000 1
2s(3)        1 0001
-q0 2^4 d      1010         {q0 = 1}
s(4)           0111
s = 0111 = 7 (remainder)    q = 1011 = 11 (quotient)

This method assumes that the dividend has 2n bits and the divisor has n bits. The method is similar to the paper-and-pencil method. Negative numbers have to be converted to positive first.
First, compare the divisor with the higher part of the dividend: the divisor must be larger, otherwise the quotient does not fit. Then, at each step, shift the dividend left. If the higher part of the shifted dividend is greater than or equal to the divisor, subtract the divisor from the higher part and set the corresponding quotient bit to 1. If the higher part of the shifted dividend is lower than the divisor, do not subtract anything and set the corresponding quotient bit to 0.
The number of iterations is equal to the number of bits of the divisor. The remainder is left in the higher part of the dividend.
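The procedure just described can be sketched in C. The function name `restoring_div` and its error convention are assumptions. The sketch compares before subtracting instead of subtracting and then restoring; the partial remainders and quotient bits are the same.

```c
#include <stdint.h>

/* Divide a 2k-bit unsigned dividend z by a k-bit divisor d (k <= 16).
   Returns 0 on success, -1 on division by zero or quotient overflow. */
int restoring_div(uint32_t z, uint32_t d, unsigned k,
                  uint32_t *q, uint32_t *rem) {
    if (d == 0 || (z >> k) >= d) return -1;  /* high part must be < d */
    uint64_t s = z;                          /* partial remainder */
    uint64_t dk = (uint64_t)d << k;          /* divisor aligned: 2^k * d */
    uint32_t quot = 0;
    for (unsigned i = 0; i < k; i++) {
        s <<= 1;                             /* shift partial remainder left */
        quot <<= 1;
        if (s >= dk) {                       /* subtraction would not go negative */
            s -= dk;
            quot |= 1u;                      /* quotient bit = 1 */
        }                                    /* else: quotient bit = 0 */
    }
    *q = quot;
    *rem = (uint32_t)(s >> k);               /* remainder sits in the high part */
    return 0;
}
```

Calling it with z = 117, d = 10 and k = 4 reproduces the trace above: quotient 11, remainder 7.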

44 Programmed (Restoring) Division Example – Fractional (Real) Numbers
[Trace: fractional division of z_frac by d_frac through the partial remainders s(0) to s(4), with quotient digits q_-1 = 1, q_-2 = 0, q_-3 = 1, q_-4 = 1, ending in the remainder s_frac and the quotient q_frac]
For fractional, or real, numbers the procedure is exactly the same as for integer numbers. The only difference is that the remainder, which is left in the higher part of the shifted dividend, has to be transferred to the lower part of it to be correct.
The main problem with this method is that it requires a comparison (which can be done by subtraction) on each step. This costs more clock cycles than necessary. The next slide shows nonrestoring division, which is simpler to implement, either in software or in hardware.

45 Nonrestoring Unsigned Division
=============================
z              0111 0101    (117)    No overflow, since in the higher part
2^4 d        0 1010         (10)     (0111) < (1010)
-2^4 d       1 0110
=============================
s(0)         0 0111 0101
2s(0)        0 1110 101      Positive, so subtract
+(-2^4 d)    1 0110
s(1)         0 0100 101
2s(1)        0 1001 01       Positive, so set q3 = 1 and subtract
+(-2^4 d)    1 0110
s(2)         1 1111 01
2s(2)        1 1110 1        Negative, so set q2 = 0 and add
+2^4 d       0 1010
s(3)         0 1000 1
2s(3)        1 0001          Positive, so set q1 = 1 and subtract
+(-2^4 d)    1 0110
s(4)         0 0111          Positive, so set q0 = 1
s = 0111 = 7 (remainder)    q = 1011 = 11 (quotient)

z = dividend, s = remainder, d = divisor
The big advantage of this method is that it is easy to test and decide whether we have to add or subtract the divisor on each iteration. This means a simple implementation.
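A C sketch of the nonrestoring scheme: the sign of the partial remainder alone decides whether the aligned divisor is subtracted or added, and the sign of the new partial remainder gives the quotient bit. The function name and error convention are assumptions, and the final correction step for a negative last remainder is standard practice rather than something shown on the slide.

```c
#include <stdint.h>

/* Nonrestoring division of a 2k-bit unsigned dividend z by a k-bit
   divisor d (k <= 16). Returns 0 on success, -1 on error. */
int nonrestoring_div(uint32_t z, uint32_t d, unsigned k,
                     uint32_t *q, uint32_t *rem) {
    if (d == 0 || (z >> k) >= d) return -1;
    int64_t s = (int64_t)z;                 /* signed partial remainder */
    int64_t dk = (int64_t)d << k;           /* 2^k * d */
    uint32_t quot = 0;
    for (unsigned i = 0; i < k; i++) {
        /* Positive: shift and subtract. Negative: shift and add. */
        s = (s >= 0) ? 2 * s - dk : 2 * s + dk;
        quot = (quot << 1) | (s >= 0 ? 1u : 0u);
    }
    if (s < 0) s += dk;                     /* final correction of the remainder */
    *q = quot;
    *rem = (uint32_t)(s >> k);
    return 0;
}
```

For z = 117, d = 10, k = 4 the partial remainders are 74, -12, 136 and 112, giving quotient bits 1011 and remainder 7 as in the trace.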

46 Programmed Division Using Left Shifts – Pseudo ASM
{Using left shifts, divide unsigned 2k-bit dividend,
 z_high | z_low, storing the k-bit quotient and remainder.
 Registers: R0 holds 0            Rc for counter
            Rd for divisor        Rs for z_high & remainder
            Rq for z_low & quotient}
{Load operands into regs Rd, Rs and Rq}
div:     load   Rd with divisor
         load   Rs with z_high
         load   Rq with z_low
{Check for exceptions}
         branch d_by_0 if Rd = R0
         branch d_ovfl if Rs >= Rd
{Initialize counter}
         load   k into Rc
{Begin division loop}
d_loop:  shift  Rq left 1        {zero to LSB, MSB to cy}
         rotate Rs left 1        {cy to LSB, MSB to cy}
         skip   if carry = 1
         branch no_sub if Rs < Rd
         sub    Rd from Rs       {2's compl. subtract}
         incr   Rq               {set quotient digit to 1}
no_sub:  decr   Rc               {decrement counter by 1}
         branch d_loop if Rc ≠ 0
{Store the quotient and remainder}
         store  Rq into quotient
         store  Rs into remainder
d_by_0:
d_ovfl:
d_done:

[Figure: registers Rd (divisor), Rs (partial remainder) and Rq (rest of the dividend / quotient)]
Even though it is an unsigned division, a two's complement subtraction instruction is required. Ignoring operand load and result store instructions, the function of a divide instruction is accomplished by executing between 6k+3 and 8k+3 machine instructions. For a 16-bit divisor this means well over 100 instructions on average.

47
47 Division Algorithm - 1 (http://www.sxlist.com/techref/microchip/math/div/24by16.htm)

48
48 Division Algorithm – 2
Fast division for PICs

If you went through our fast multiplying, now try the fast division if you dare. The algorithm applied here belongs to the CORDIC family. Also have a look at our CORDIC square-root function.

Normally, division algorithms follow the method children are taught. Let's take an example in which 27 is the divisor and the numerator begins with the digits 1, 6, 5:
start with the left-most digit
if 1 < 27, then append the second digit
if 16 < 27, then append the third digit
165 > 27, so integer-divide: 165 div 27 = 6
get the remainder: 165 - 6 * 27 = 3
now restart with the remainder

With RISC technology, at assembler level, the tests are performed with subtractions, checking whether the result is negative, zero or positive. The integer division is done by successive subtractions until the result is negative. A counter then indicates how many subtractions were made. As already pointed out, CORDIC takes a very different approach to mathematical operations. The incredible speed of the algorithms results from a divide-and-conquer approach.

In practice, let's see how our CORDIC division works. Suppose you want to integer-divide a numerator by 6 (decimal):
numerator
divisor := 6
base_index := 1
result := 0
Rotate divisor and base_index to the left until the most significant bits of numerator and divisor are aligned:
divisor = 12, base_index = 2
divisor = 24, base_index = 4
divisor = 48, base_index = 8
divisor = 96, base_index = 16
Now subtract the altered divisor from the numerator:
numerator - 96 < 0
If negative, which is the case here, rotate divisor and base_index back one digit to the right:
divisor = 48, base_index = 8
Subtract the rotated divisor from the numerator again:
numerator - 48, positive remainder
Now replace the numerator by the remainder:
new numerator := numerator - 48
This time add the base_index to result:
result := result (0) + 8 = 8
Now rotate divisor and base_index one digit to the right:
divisor = 24, base_index = 4
Subtract again: remainder positive, so
new numerator := numerator - 24
result := result + base_index = 8 + 4 = 12
Rotate:
divisor = 12, base_index = 2
Subtract: remainder positive, so
new numerator := numerator - 12
result := result + base_index = 12 + 2 = 14
Rotate:
divisor = 6, base_index = 1
Subtract: < 0, so do nothing
Stop.

Here is the code for the PIC 16F84 and 628:

; RESULT8 := TEMPX8 / TEMPY8 (unsigned); the remainder is left in TEMPX8
DIVV8
        MOVF    TEMPY8,F
        BTFSC   STATUS,Z        ;SKIP IF NON-ZERO
        RETURN
        CLRF    RESULT8
        MOVLW   1
        MOVWF   IDX16
SHIFT_IT8
        BCF     STATUS,C
        RLF     IDX16,F
        BCF     STATUS,C
        RLF     TEMPY8,F
        BTFSS   TEMPY8,7
        GOTO    SHIFT_IT8
DIVU8LOOP
        MOVF    TEMPY8,W
        SUBWF   TEMPX8
        BTFSC   STATUS,C
        GOTO    COUNT8
        ADDWF   TEMPX8
        GOTO    FINAL8
COUNT8
        MOVF    IDX16,W
        ADDWF   RESULT8
FINAL8
        BCF     STATUS,C
        RRF     TEMPY8,F
        BCF     STATUS,C
        RRF     IDX16,F
        BTFSS   STATUS,C
        GOTO    DIVU8LOOP
        RETURN

; TEMPX16_H:TEMPX16 := TEMPX16_H:TEMPX16 - TEMPY16_H:TEMPY16
SUB16
        MOVF    TEMPY16_H,W
        MOVWF   TEMPYY
        MOVF    TEMPY16,W
        SUBWF   TEMPX16
        BTFSS   STATUS,C
        INCF    TEMPYY,F
        MOVF    TEMPYY,W
        SUBWF   TEMPX16_H
        RETURN

; TEMPX16_H:TEMPX16 := TEMPX16_H:TEMPX16 + TEMPY16_H:TEMPY16
ADD16BIS
        MOVF    TEMPY16,W
        ADDWF   TEMPX16
        BTFSC   STATUS,C
        INCF    TEMPX16_H,F
        MOVF    TEMPY16_H,W
        ADDWF   TEMPX16_H
        RETURN

; 16-bit version of DIVV8
DIVV16
        MOVF    TEMPY16,F
        BTFSS   STATUS,Z
        GOTO    ZERO_TEST_SKIPPED
        MOVF    TEMPY16_H,F
        BTFSC   STATUS,Z
        RETURN
ZERO_TEST_SKIPPED
        MOVLW   1
        MOVWF   IDX16
        CLRF    IDX16_H
        CLRF    RESULT16
        CLRF    RESULT16_H
SHIFT_IT16
        BCF     STATUS,C
        RLF     IDX16,F
        RLF     IDX16_H,F
        BCF     STATUS,C
        RLF     TEMPY16,F
        RLF     TEMPY16_H,F
        BTFSS   TEMPY16_H,7
        GOTO    SHIFT_IT16
DIVU16LOOP
        CALL    SUB16
        BTFSC   STATUS,C
        GOTO    COUNTX
        CALL    ADD16BIS
        GOTO    FINALX
COUNTX
        MOVF    IDX16,W
        ADDWF   RESULT16
        BTFSC   STATUS,C
        INCF    RESULT16_H,F
        MOVF    IDX16_H,W
        ADDWF   RESULT16_H
FINALX
        BCF     STATUS,C
        RRF     TEMPY16_H,F
        RRF     TEMPY16,F
        BCF     STATUS,C
        RRF     IDX16_H,F
        RRF     IDX16,F
        BTFSS   STATUS,C
        GOTO    DIVU16LOOP
        RETURN

; ... somewhere in the code
        CALL    DIVV16

Note that these programs work only for unsigned variables. Worst case for DIVV8 is about 144 cycles, which at 20 MHz is about 30 microseconds. The benefit of this algorithm shows more clearly when larger variables are used.
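The walk-through on this slide (shift the divisor left together with base_index, then subtract and accumulate while shifting back to the right) can be sketched in C for 16-bit operands. This is an illustration, not a translation of the PIC routines: the name `div_align_subtract` and the optional remainder pointer are hypothetical, and the sketch always shifts the divisor up to bit 15, in the spirit of the assembler code's test of bit 7, instead of aligning it with the numerator.

```c
#include <assert.h>
#include <stdint.h>

/* Unsigned 16-bit division by aligned subtraction.
   Returns numerator / divisor; if rem is not null, stores the remainder.
   Assumes divisor != 0. */
uint16_t div_align_subtract(uint16_t numerator, uint16_t divisor, uint16_t *rem)
{
    assert(divisor != 0);

    uint16_t base_index = 1;
    uint16_t result = 0;

    /* Shift divisor and base_index left until the divisor's top bit is set.
       Testing before shifting keeps a divisor that already has bit 15 set. */
    while ((divisor & 0x8000u) == 0) {
        divisor = (uint16_t)(divisor << 1);
        base_index = (uint16_t)(base_index << 1);
    }

    /* One trial subtraction per bit position, from the highest downwards. */
    while (base_index != 0) {
        if (numerator >= divisor) {
            numerator = (uint16_t)(numerator - divisor);  /* keep the remainder */
            result = (uint16_t)(result + base_index);     /* add base_index */
        }
        divisor >>= 1;
        base_index >>= 1;
    }

    if (rem != 0)
        *rem = numerator;
    return result;
}
```

With the arbitrarily chosen test value 87 as numerator and 6 as divisor, the successful subtractions happen at base_index 8, 4 and 2, giving 14 with remainder 3, the same 8 + 4 + 2 accumulation as in the walk-through.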

49
49 CORDIC – Square Root
Square-root based on CORDIC

We explained the CORDIC basics for trig-functions earlier. The solution of exercise 2 of that page is shown here, after some preliminary explanations.

Perhaps you know the following card game: you tell a candidate to select and remember a number from 1 to 31. Then you show him the following five cards one by one. For each card he must say whether or not his number is written on it. As if by miracle, you can then tell him the number he chose. The card order is irrelevant. The trick is to mentally add the first number of each card for which he answered YES. Let's take an example: the candidate chooses 23.
card 1: Yes, so remember 1
card 2: Yes, so add 2 --> 3
card 3: Yes, add 4 --> 7
card 4: No, do nothing
card 5: Yes, add 16 --> 23

How does this game work? By answering yes or no, the candidate is simply converting the decimal number 23 into a binary number: 23 (decimal) = 10111 (binary) = [Yes, Yes, Yes, No, Yes], listed from card 1 (least significant bit) to card 5, where Yes = 1 and No = 0. Each card shows all the numbers with the same binary digit set to 1. The quiz-master converts back to decimal by evaluating the base polynomial:
1 x 2^4 + 0 x 2^3 + 1 x 2^2 + 1 x 2^1 + 1 x 2^0 = 1 x 16 + 0 x 8 + 1 x 4 + 1 x 2 + 1 x 1 = 23

In fact, CORDIC algorithms are based on this sort of computing. The appeal is of course the closeness of the binary system to computer systems. Multiplying by 2 is equivalent to shifting the binary number one digit to the left. Dividing by 2 is the same as shifting it one digit to the right:
… x 2 (decimal) = …
… DIV 2 (decimal) = …
These shift operations are extremely quick. NOTE: only one of our examples uses this shift trick, because only a few higher-level computer languages allow access to these low-level functions. But CORDIC has another speed advantage, which comes from the exponential approach. Multiplying and dividing may be reduced to additions and subtractions.
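The card game described above can be simulated in a few lines of C. This is a sketch for illustration only; the function names are hypothetical, and the cards are assumed to be numbered 1 to 5 so that card c lists every number whose binary digit c-1 is set.

```c
#include <assert.h>

/* Returns 1 if the number is printed on the given card (1..5), else 0.
   Card c lists every number from 1 to 31 whose bit c-1 is set. */
int is_on_card(int number, int card)
{
    return (number >> (card - 1)) & 1;
}

/* The first (smallest) number printed on card c is 2^(c-1). */
int first_number_on_card(int card)
{
    return 1 << (card - 1);
}

/* Plays one round: the candidate answers yes or no for each card, and the
   quiz-master adds the first number of every card answered with yes. */
int play_round(int secret)
{
    assert(secret >= 1 && secret <= 31);

    int guess = 0;
    for (int card = 1; card <= 5; card++) {
        if (is_on_card(secret, card))
            guess += first_number_on_card(card);
    }
    return guess;
}
```

For the secret number 23 the answers are yes, yes, yes, no, yes, so the quiz-master adds 1 + 2 + 4 + 16 and `play_round(23)` returns 23.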
To compute a square root with CORDIC, the result is built up by multiplying, adding and testing. For x = 12056:

L    2^L    test                      y
7    128    128 x 128 > 12056         0      (do nothing)
6    64     64 x 64 <= 12056          64
5    32     (64 + 32)^2 <= 12056      96     (64 + 32)
4    16     (96 + 16)^2 > 12056       96     (do nothing)
3    8      (96 + 8)^2 <= 12056       104    (96 + 8)
2    4      (104 + 4)^2 <= 12056      108    (104 + 4)
1    2      (108 + 2)^2 > 12056       108    (do nothing)
0    1      (108 + 1)^2 <= 12056      109    (108 + 1)

Here is a C routine for integer square-rooting of numbers between 0 and 65535:

    int isqrt (int x)
    {
      int base, i, y ;
      base = 128 ;
      y = 0 ;
      for (i = 1; i <= 8; i++)
      {
        y += base ;
        if ( (y * y) > x )
        {
          y -= base ;  // base should not have been added, so we subtract again
        }
        base >>= 1 ;   // shift 1 digit to the right = divide by 2
      }
      return y ;
    }

Here is a Robolab version (you may use our text-based modifiers or use the variable numbers of your choice):
