Published by Payton Guise. Modified over 4 years ago.

Efficient Arbitrary Bit Length Sign Extension on Intel x86
Evaluating 3½ methods analytically and putting them to the test

Catalin Patulea
10/10/2006

2 Introduction

Sign extension?! What’s that?

Integers represented in binary
– Positive? Piece of cake!
    5₁₀ = 0101₂
    15₁₀ = 1111₂
    341 985 713₁₀ = 10100011000100100100110110001₂
– Negative? Uh-oh. Need a convention (rule).

“2’s complement”: representation such that its “regular” sum with the positive value of the integer yields 0.
– Motivation: reuses most pre-existing arithmetic operation logic, and it’s nice and consistent
– Consequence: high-order (most significant) bit indicates the sign of the integer
    −3₁₀ = 1101₂ since 1101₂ + 0011₂ = 0 (in 4 bits!)
    −67₁₀ = 10111101₂ since 10111101₂ + 01000011₂ = 0 (in 8 bits!)
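The two worked examples above can be checked mechanically. The following is a minimal sketch, not part of the original slides; the helper name `twos_complement` is hypothetical, and it assumes a field width between 1 and 31 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not from the slides): the two's complement bit
 * pattern of -value in a field `bits` wide, assuming 1 <= bits <= 31. */
uint32_t twos_complement(uint32_t value, unsigned bits)
{
    uint32_t mask = (UINT32_C(1) << bits) - 1;  /* keeps the low `bits` bits */
    return (~value + 1) & mask;                 /* invert, add one, truncate */
}

void check_twos_complement(void)
{
    /* -3 in 4 bits is 1101, and 1101 + 0011 wraps to 0000. */
    assert(twos_complement(3, 4) == 0xD);
    assert(((0xD + 0x3) & 0xF) == 0);

    /* -67 in 8 bits is 10111101, and 10111101 + 01000011 wraps to 0. */
    assert(twos_complement(67, 8) == 0xBD);
    assert(((0xBD + 0x43) & 0xFF) == 0);
}
```

The truncating mask is what "in 4 bits!" and "in 8 bits!" mean in practice: the carry out of the top bit is thrown away.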

3 Introduction (cont’d)

Bitfield width is important!
– −45₁₀ in 8 bits is 11010011₂
– −45₁₀ in 16 bits is 1111111111010011₂

Need to convert from one width to another
– Intermediate calculation results from, e.g., signed multiplication implemented in software
– Storage format different from processor’s native (optimal) register width
– Assignment for SYSC 2001!

Simply padding with 0’s gets us nowhere:
– −45₁₀ in 8 bits = 11010011₂ → 0000000011010011₂ = 211₁₀ ≠ −45₁₀

Sign extension means:
– If most significant bit (MSB) is 0, number is positive; prepend 0’s to desired width
– If MSB is 1, number is negative; prepend 1’s to desired width

Let’s concentrate on sign extending 12 bits to 16 bits
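The difference between zero padding and sign extension for the −45 example can be sketched as follows. This is an illustration only; the function names are hypothetical and do not appear in the slides.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers (not from the slides), widening 8 bits to 16. */

/* Zero padding: the new high byte is always 0. */
uint16_t zero_extend_8_to_16(uint8_t v)
{
    return v;
}

/* Sign extension: copy the MSB (bit 7) into every new high bit. */
uint16_t sign_extend_8_to_16(uint8_t v)
{
    return (v & 0x80) ? (uint16_t)(v | 0xFF00) : (uint16_t)v;
}

void check_widening(void)
{
    /* 11010011 is -45 in 8 bits; zero padding turns it into +211. */
    assert(zero_extend_8_to_16(0xD3) == 211);

    /* Sign extension yields 1111111111010011, the 16-bit pattern of -45. */
    assert(sign_extend_8_to_16(0xD3) == 0xFFD3);
    assert(sign_extend_8_to_16(0xD3) == (uint16_t)-45);

    /* Positive values are unaffected by the choice. */
    assert(sign_extend_8_to_16(0x2D) == zero_extend_8_to_16(0x2D));
}
```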

4 Method 1

The simple one

Advantages
– Straightforward

Disadvantages
– A lot of code (relatively)
– Slow because of branching!

C
    if (val & 0x800) { val |= 0xF000; } /* else, do nothing */

Assembly        Machine code    Clocks
test ah,8       F6 C4 08        2
jz no_or        74 03           7/3
or ah,0F0h      80 CC F0        2
no_or:

Total: ~9 clocks, 8 bytes
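To convince yourself that the branch does the right thing for every 12-bit input, it can be compared against the arithmetic definition of a 12-bit signed value. The sketch below wraps the slide's C statement in a function; the function names and the reference implementation are assumptions added for illustration. Like the slide's code, it assumes bits 12 to 15 of the input are already 0.

```c
#include <assert.h>
#include <stdint.h>

/* Method 1: test bit 11 and OR in the top four bits when it is set.
 * Assumes bits 12..15 of val are 0 on entry. */
uint16_t sign_extend_12_branch(uint16_t val)
{
    if (val & 0x800) {
        val |= 0xF000;
    }
    return val;
}

/* Reference by definition (illustrative): 12-bit patterns 2048..4095
 * stand for the negative values pattern - 4096. */
uint16_t sign_extend_12_reference(uint16_t val)
{
    int32_t n = (val >= 2048) ? (int32_t)val - 4096 : (int32_t)val;
    return (uint16_t)n;  /* conversion to unsigned wraps modulo 65536 */
}

void check_branch_method(void)
{
    /* Exhaustive check over all 4096 possible 12-bit inputs. */
    for (uint16_t v = 0; v < 4096; v++) {
        assert(sign_extend_12_branch(v) == sign_extend_12_reference(v));
    }
}
```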

5 Method 2

Taking advantage of the CPU’s existing ability to sign-extend

Advantages
– No branches!!
– Smaller code

Disadvantages
– Kind of brute force
– Shifting is a relatively “expensive” operation

C
    val = (signed short)(val << 4) >> 4;

Assembly        Machine code    Clocks
shl ax,4        66 C1 E0 04     3
sar ax,4        66 C1 F8 04     3

Total: 6 clocks, 8 bytes
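A version of the shift method with fixed-width types might look like the sketch below. Two things are assumptions rather than guarantees of the C standard: that converting an out-of-range value to `int16_t` wraps in two's complement, and that `>>` on a negative signed value is an arithmetic shift (like `sar`). Both hold on mainstream x86 compilers. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Method 2 (sketch): shift bit 11 up into bit 15, then shift back down
 * arithmetically so the sign bit is replicated.
 * ASSUMPTIONS: two's complement wrap on the int16_t conversion and an
 * arithmetic right shift for negative values (implementation-defined). */
int16_t sign_extend_12_shift(uint16_t val)
{
    /* The uint16_t cast drops whatever was shifted past bit 15. */
    int16_t shifted = (int16_t)(uint16_t)(val << 4);
    return (int16_t)(shifted >> 4);
}

void check_shift_method(void)
{
    assert(sign_extend_12_shift(0x7FF) == 2047);   /* largest positive  */
    assert(sign_extend_12_shift(0x800) == -2048);  /* most negative     */
    assert(sign_extend_12_shift(0xFFF) == -1);
    assert(sign_extend_12_shift(0xFD3) == -45);    /* 4096 - 45 = 0xFD3 */

    /* Unlike the other methods, garbage in bits 12..15 is shifted out. */
    assert(sign_extend_12_shift(0xA800) == -2048);
}
```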

6 Method 3

Bit-banging

Advantages
– No branching
– Smaller code
– Elementary, well optimized operations that are likely to be available on a variety of architectures

Disadvantages
– Too cool for school?

C
    val = (val + 0xF800) ^ 0xF800;

Assembly        Machine code    Clocks
add ax,0F800h   66 05 00 F8     2
xor ax,0F800h   66 35 00 F8     2

Total: 4 clocks, 8 bytes
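The constant 0xF800 has 1s from the sign bit (bit 11) upward. If the sign bit is set, the add carries out through the top bits and leaves them 0, and the XOR then sets them; if it is clear, the add sets the top bits and the XOR clears them again. The same idea extends to other field widths, in the spirit of the talk's title. The sketch below is an illustrative generalisation, not code from the slides: the function name and the `bits` parameter are hypothetical, the input's bits above the field are assumed to be 0, and the arithmetic is assumed to wrap at 16 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical generalisation of method 3: sign-extend the low `bits`
 * bits of val (1 <= bits <= 16) to 16 bits.
 * Assumes every bit of val above the field is 0. */
uint16_t sign_extend_addxor(uint16_t val, unsigned bits)
{
    /* 1s from the field's sign bit upward: 0xF800 when bits == 12. */
    uint16_t mask = (uint16_t)(0xFFFFu << (bits - 1));

    /* The inner cast discards the carry out of bit 15, as `add ax` does. */
    return (uint16_t)((uint16_t)(val + mask) ^ mask);
}

void check_addxor_method(void)
{
    /* The 12-bit case from the slide. */
    assert(sign_extend_addxor(0x7FF, 12) == 0x07FF);
    assert(sign_extend_addxor(0x800, 12) == 0xF800);
    assert(sign_extend_addxor(0xFFF, 12) == 0xFFFF);

    /* Other widths: -45 in 8 bits and -3 in 4 bits. */
    assert(sign_extend_addxor(0xD3, 8) == 0xFFD3);
    assert(sign_extend_addxor(0xD, 4) == 0xFFFD);
}
```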

7 Method 3b

Register-only bit-banging

Advantages
– Same as method 3
– Even smaller code since we’re using native (default) operand size and registers

Disadvantages
– Clobbers an additional register

C
    val = (val + 0xF800) ^ 0xF800;

Assembly        Machine code    Clocks
add eax,ebx     03 C3           2
xor eax,ebx     33 C3           2

Total: 4 clocks, 4 bytes

Relies on prior initialization of ebx = 0xF800!!
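One detail that seems worth checking: the add now happens in 32 bits, so the carry that the 16-bit version discards ends up in bit 16 of eax. The sketch below models the two instructions in C (the function name is hypothetical) and suggests that the result should be read from ax only. The last assertion tries a different constant, 0xFFFFF800, which is an assumption of this sketch and not something the slides propose; it appears to give a full 32-bit result.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative C model of `add eax,ebx` followed by `xor eax,ebx`. */
uint32_t addxor_32(uint32_t eax, uint32_t ebx)
{
    eax += ebx;
    eax ^= ebx;
    return eax;
}

void check_register_variant(void)
{
    const uint32_t ebx = 0xF800;  /* the slide's required initial value */

    /* Non-negative input: the whole register holds the right value. */
    assert(addxor_32(0x7FF, ebx) == 0x000007FF);

    /* Negative input: the low 16 bits are right, but the carry sits in bit 16. */
    assert(addxor_32(0x800, ebx) == 0x0001F800);
    assert((uint16_t)addxor_32(0x800, ebx) == 0xF800);

    /* Assumption of this sketch: 0xFFFFF800 extends 12 bits to 32 bits. */
    assert(addxor_32(0x800, 0xFFFFF800u) == 0xFFFFF800u);
}
```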

8 Theory == practice?

Turns out we weren’t too far off

Performance gains aren’t very significant here, but could be important for embedded or high-performance applications such as DSP

Values for each method were obtained by averaging 10 timed trials of 16 million (2²⁴) loops of the algorithm each

Time is in millions (10⁶) of RDTSC units on a Pentium Celeron at 1.8 GHz (divide by 1800 to get approximate time in seconds)
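The measurement procedure described above (several timed trials of a fixed loop count, then an average) could be sketched roughly as follows. This is not the harness used for the talk: it swaps RDTSC for the standard `clock()` function, calls each method through a function pointer (which adds overhead of its own), and every name in it is hypothetical. It illustrates the methodology only and will not reproduce the slide's numbers.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

typedef uint16_t (*extend_fn)(uint16_t);

/* Two of the methods from the slides, wrapped as functions. */
uint16_t method_branch(uint16_t val)
{
    if (val & 0x800) { val |= 0xF000; }
    return val;
}

uint16_t method_addxor(uint16_t val)
{
    return (uint16_t)((uint16_t)(val + 0xF800) ^ 0xF800);
}

/* One timed trial. The checksum is written out so the compiler cannot
 * discard the loop; it also lets two methods be compared for agreement. */
clock_t time_trial(extend_fn fn, uint32_t loops, uint32_t *checksum)
{
    uint32_t sum = 0;
    clock_t start = clock();
    for (uint32_t i = 0; i < loops; i++) {
        sum += fn((uint16_t)(i & 0xFFF));  /* cycle through all 12-bit inputs */
    }
    *checksum = sum;
    return clock() - start;
}

/* Average over `trials` trials, in clock() ticks (not RDTSC units). */
double average_ticks(extend_fn fn, uint32_t loops, int trials, uint32_t *checksum)
{
    double total = 0.0;
    for (int t = 0; t < trials; t++) {
        total += (double)time_trial(fn, loops, checksum);
    }
    return total / trials;
}

void check_methods_agree(void)
{
    uint32_t a = 0, b = 0;
    (void)average_ticks(method_branch, 4096, 2, &a);
    (void)average_ticks(method_addxor, 4096, 2, &b);
    assert(a == b);  /* same inputs, same outputs, same checksum */
}
```

To mirror the slide's setup, one would call `average_ticks(fn, 1u << 24, 10, &sum)` for each method and compare the averages.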

9 Questions?
