# Efficient Arbitrary Bit Length Sign Extension on Intel x86

Evaluating 3½ methods analytically and putting them to the test. Catalin Patulea, 10/10/2006.

2 Introduction

Sign extension?! What’s that?

- Integers are represented in binary.
  - Positive? Piece of cake! 5₁₀ = 0101₂, 15₁₀ = 1111₂, 341 985 713₁₀ = 10100011000100100100110110001₂
  - Negative? Uh-oh. We need a convention (rule).
- “2’s complement”: the representation whose “regular” (unsigned) sum with the positive value of the integer yields 0.
  - Motivation: it reuses most of the pre-existing arithmetic logic, and it’s nice and consistent.
  - Consequence: the high-order (most significant) bit indicates the sign of the integer.
  - −3₁₀ = 1101₂, since 1101₂ + 0011₂ = 0 (in 4 bits!)
  - −67₁₀ = 10111101₂, since 10111101₂ + 01000011₂ = 0 (in 8 bits!)
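The 2’s complement rule can be checked mechanically. This is a minimal C sketch, assuming a field of at most 31 bits held in a `uint32_t`; `twos_complement` and `sums_to_zero` are illustrative names, not something from the slides. It reproduces the −3 (4-bit) and −67 (8-bit) examples.

```c
#include <assert.h>
#include <stdint.h>

/* Two's complement of `value` in a `bits`-wide field (1 <= bits <= 31):
   invert every bit, add 1, and keep only the low `bits` bits. */
uint32_t twos_complement(uint32_t value, unsigned bits) {
    assert(bits >= 1 && bits <= 31);
    uint32_t mask = (1u << bits) - 1u;   /* e.g. 0xF for a 4-bit field */
    return (~value + 1u) & mask;
}

/* The defining property: x + (-x) == 0 once the carry out of the field is dropped. */
int sums_to_zero(uint32_t value, unsigned bits) {
    uint32_t mask = (1u << bits) - 1u;
    return ((value + twos_complement(value, bits)) & mask) == 0;
}
```

For instance, `twos_complement(67, 8)` yields `0xBD`, which is 10111101₂.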

3 Introduction (cont’d)

- Bitfield width is important!
  - −45₁₀ in 8 bits is 11010011₂
  - −45₁₀ in 16 bits is 1111111111010011₂
- We need to convert from one width to another, e.g. for:
  - intermediate results of calculations such as signed multiplication implemented in software
  - a storage format that differs from the processor’s native (optimal) register width
  - the assignment for SYSC 2001!
- Simply padding with 0’s gets us nowhere: −45₁₀ in 8 bits = 11010011₂ → 0000000011010011₂ = 211₁₀ ≠ −45₁₀
- Sign extension means:
  - if the most significant bit (MSB) is 0, the number is positive; prepend 0’s to the desired width
  - if the MSB is 1, the number is negative; prepend 1’s to the desired width
- Let’s concentrate on sign extending 12 bits to 16 bits.
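The sign extension rule can be applied literally in C, which also shows why zero padding fails. This is a sketch with hypothetical helper names `sign_extend` and `zero_extend`, assuming widths of at most 32 bits held in a `uint32_t`.

```c
#include <assert.h>
#include <stdint.h>

/* Widen a `from`-bit two's complement value to `to` bits (1 <= from <= to <= 32)
   by applying the rule literally: if the MSB is 1, prepend 1's; otherwise 0's. */
uint32_t sign_extend(uint32_t value, unsigned from, unsigned to) {
    assert(from >= 1 && from <= to && to <= 32);
    uint32_t to_mask = (to == 32) ? 0xFFFFFFFFu : ((1u << to) - 1u);
    uint32_t from_mask = (from == 32) ? 0xFFFFFFFFu : ((1u << from) - 1u);
    value &= from_mask;                    /* keep only the original field */
    if (value & (1u << (from - 1)))        /* MSB is 1: negative */
        value |= to_mask & ~from_mask;     /* prepend 1's up to `to` bits */
    /* MSB is 0: positive; the new high bits are already 0's */
    return value;
}

/* Zero padding, for contrast: keep the field and leave the high bits at 0. */
uint32_t zero_extend(uint32_t value, unsigned from) {
    assert(from >= 1 && from <= 32);
    return (from == 32) ? value : (value & ((1u << from) - 1u));
}
```

With the −45 example, `sign_extend(0xD3, 8, 16)` returns `0xFFD3` (1111111111010011₂), while `zero_extend(0xD3, 8)` returns 211.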

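4 Method 1: The simple one

- Advantages
  - Straightforward
- Disadvantages
  - A lot of code (relatively)
  - Slow because of branching!

C: `if (val & 0x800) { val |= 0xF000; } /* else, do nothing */`

Bit 11 of AX is bit 3 of AH, so the assembly only needs to touch AH:

| Assembly | Machine code | Clocks |
|---|---|---|
| `test ah,8` | `F6 C4 08` | 2 |
| `jz no_or` | `74 03` | 7/3 (taken/not taken) |
| `or ah,0F0h` | `80 CC F0` | 2 |
| `no_or:` | | |
| Total | 8 bytes | ~9 |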
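Method 1 as a standalone C function. This is a sketch that assumes the 12-bit field sits in bits 0-11 of a `uint16_t` with bits 12-15 already 0 (the `assert` checks this); the name `sext12_branch` is illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Method 1: test bit 11 (the sign bit of the 12-bit field) and, if it is set,
   OR in 1's for bits 12-15. */
uint16_t sext12_branch(uint16_t val) {
    assert((val & 0xF000u) == 0);   /* input must be a zero-extended 12-bit field */
    if (val & 0x800u) {
        val |= 0xF000u;             /* negative: prepend 1's */
    }
    /* else, do nothing: bits 12-15 are already 0 */
    return val;
}
```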

5 Method 2: Taking advantage of the CPU’s existing ability to sign-extend

- Advantages
  - No branches!!
  - Fewer instructions (two instead of three, although still 8 bytes)
- Disadvantages
  - Kind of brute force
  - Shifting is a relatively “expensive” operation

C: `val = (signed short)(val << 4) >> 4;`

| Assembly | Machine code | Clocks |
|---|---|---|
| `shl ax,4` | `66 C1 E0 04` | 3 |
| `sar ax,4` | `66 C1 F8 04` | 3 |
| Total | 8 bytes | 6 |
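Method 2 in C, written with fixed-width types. Note that the C line on the slide relies on two implementation-defined behaviours: converting a value in 0x8000..0xFFFF to a signed 16-bit type, and right-shifting a negative value. Mainstream x86 compilers such as GCC and Clang wrap (two’s complement) and shift arithmetically, which this sketch assumes; `sext12_shift` is an illustrative name.

```c
#include <assert.h>
#include <stdint.h>

/* Method 2: move the 12-bit field's sign bit (bit 11) up to bit 15, then
   arithmetic-shift right by 4 so the CPU copies that bit back into bits 12-15.
   Assumes two's complement conversion to int16_t and an arithmetic >>. */
uint16_t sext12_shift(uint16_t val) {
    assert((val & 0xF000u) == 0);                      /* 12-bit field expected */
    int16_t shifted = (int16_t)(uint16_t)(val << 4);   /* sign bit now in bit 15 */
    return (uint16_t)(shifted >> 4);                   /* arithmetic shift back */
}
```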

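6 Method 3: Bit-banging

- Advantages
  - No branching
  - Short code (two instructions)
  - Elementary, well-optimized operations that are likely to be available on a variety of architectures
- Disadvantages
  - Too cool for school?

C: `val = (val + 0xF800) ^ 0xF800;`

x86 immediates are stored little-endian, so 0F800h is encoded as the bytes `00 F8`:

| Assembly | Machine code | Clocks |
|---|---|---|
| `add ax,0F800h` | `66 05 00 F8` | 2 |
| `xor ax,0F800h` | `66 35 00 F8` | 2 |
| Total | 8 bytes | 4 |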
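Why the add/XOR pair works is easiest to see case by case, and an exhaustive check over all 4096 inputs confirms it. This is a sketch assuming a zero-extended 12-bit field in a `uint16_t`; `sext12_addxor` and `addxor_matches_definition` are illustrative names.

```c
#include <assert.h>
#include <stdint.h>

/* Method 3: add 0xF800, then XOR 0xF800, keeping 16 bits.
   - Bit 11 clear: the add just sets bits 11-15 (no carries); the XOR clears them.
   - Bit 11 set: 0x800 + 0xF800 = 0x10000 wraps to 0, clearing bits 11-15;
     the XOR then sets them, giving val | 0xF000. */
uint16_t sext12_addxor(uint16_t val) {
    assert((val & 0xF000u) == 0);                   /* 12-bit field expected */
    return (uint16_t)((val + 0xF800u) ^ 0xF800u);   /* cast keeps the low 16 bits */
}

/* Compare against the definition (prepend 1's iff bit 11 is set) for every input. */
int addxor_matches_definition(void) {
    for (uint16_t v = 0; v < 0x1000u; v++) {
        uint16_t expected = (v & 0x800u) ? (uint16_t)(v | 0xF000u) : v;
        if (sext12_addxor(v) != expected)
            return 0;
    }
    return 1;
}
```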

7 Method 3b: Register-only bit-banging

- Advantages
  - Same as Method 3
  - Even smaller code, since we’re using the native (default) operand size and registers
- Disadvantages
  - Ties up an additional register (EBX)

C: `val = (val + 0xF800) ^ 0xF800;`

| Assembly | Machine code | Clocks |
|---|---|---|
| `add eax,ebx` | `03 C3` | 2 |
| `xor eax,ebx` | `33 C3` | 2 |
| Total | 4 bytes | 4 |

Relies on prior initialization of `ebx = 0xF800`!!
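Modelling Method 3b on 32-bit values shows a subtlety: with EBX = 0xF800, only the low 16 bits (AX) hold the sign-extended result, because a negative input leaves bit 16 of EAX set. Loading EBX with 0xFFFFF800 instead would sign-extend all the way to 32 bits; that goes slightly beyond the slide, which only needs 16 bits. `sext12_reg` is an illustrative name, and the parameters merely stand in for the registers.

```c
#include <assert.h>
#include <stdint.h>

/* Method 3b on 32-bit "registers": `eax` holds the zero-extended 12-bit field,
   `ebx` holds the constant loaded beforehand. Arithmetic wraps modulo 2^32. */
uint32_t sext12_reg(uint32_t eax, uint32_t ebx) {
    assert((eax & ~0xFFFu) == 0);   /* EAX must hold a zero-extended 12-bit field */
    eax += ebx;                     /* add eax,ebx */
    eax ^= ebx;                     /* xor eax,ebx */
    return eax;
}
```

For example, `sext12_reg(0x800, 0xF800)` gives `0x1F800`: AX is the correct `0xF800`, but EAX as a whole is not.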

8 Theory == practice?

- Turns out we weren’t too far off.
- Performance gains aren’t very significant here, but could be important for embedded or high-performance applications such as DSP.
- Values for each method were obtained by averaging 10 timed trials of 16 million (2²⁴) loops of the algorithm each.
- Time is in millions (10⁶) of RDTSC units (CPU clock cycles) on a Pentium Celeron at 1.8 GHz; divide by 1800 to get the approximate time in seconds.
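A comparable measurement loop can be sketched in portable C. This is not the harness used for the results above: it substitutes the standard `clock()` for RDTSC, and `average_seconds`, `sink` and the choice of Method 3 as the routine under test are assumptions for illustration. The indirect call adds its own overhead, so methods should be compared relative to one another.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Written on every iteration so the compiler cannot discard the loop. */
static volatile uint16_t sink;

/* Method 3, used here as the routine under test. */
static uint16_t sext12_addxor(uint16_t val) {
    return (uint16_t)((val + 0xF800u) ^ 0xF800u);
}

/* Average CPU seconds per trial of `loops` calls to `fn`, over `trials` trials. */
double average_seconds(uint16_t (*fn)(uint16_t), unsigned trials, uint32_t loops) {
    assert(trials > 0);
    double total = 0.0;
    for (unsigned t = 0; t < trials; t++) {
        clock_t start = clock();
        for (uint32_t i = 0; i < loops; i++)
            sink = fn((uint16_t)(i & 0xFFFu));   /* cycle through 12-bit inputs */
        total += (double)(clock() - start) / CLOCKS_PER_SEC;
    }
    return total / trials;
}
```

Calling `average_seconds(sext12_addxor, 10, 1u << 24)` mirrors the setup of 10 trials of 2²⁴ loops each.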

9 Questions?
