
Published by Payton Guise. Modified over 2 years ago.

1
Efficient Arbitrary Bit Length Sign Extension on Intel x86
Evaluating 3½ methods analytically and putting them to the test
Catalin Patulea, 10/10/2006

2
Introduction
Sign extension?! What’s that?
Integers represented in binary
– Positive? Piece of cake! 5₁₀ = 101₂
– Negative? Uh-oh. Need a convention (rule).
“2’s complement”: representation such that its “regular” sum with the positive value of the integer yields 0.
– Motivation: reuses most pre-existing arithmetic operation logic, and it’s nice and consistent
– Consequence: the high-order (most significant) bit indicates the sign of the integer
– -3₁₀ = 1101₂, since 1101₂ + 0011₂ = 0 (in 4 bits!)
– -67₁₀ = 10111101₂, since 10111101₂ + 01000011₂ = 0 (in 8 bits!)
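As a small illustration of the 2’s complement rule above, here is a minimal C sketch. The function names `twos_complement_negate` and `wrapped_sum` are made up for this example and are not from the slides; the sketch assumes field widths between 1 and 31 bits.

```c
#include <stdint.h>

/* Hypothetical helper: 2's complement encoding of -x in a field of
   `bits` bits (1..31). Invert every bit, add one, keep the low bits. */
uint32_t twos_complement_negate(uint32_t x, unsigned bits)
{
    uint32_t mask = (1u << bits) - 1u;
    return (~x + 1u) & mask;
}

/* Hypothetical helper: the "regular" sum of two patterns, with any
   carry out of the `bits`-bit field thrown away. */
uint32_t wrapped_sum(uint32_t a, uint32_t b, unsigned bits)
{
    uint32_t mask = (1u << bits) - 1u;
    return (a + b) & mask;
}
```

Calling `twos_complement_negate(3, 4)` gives the 4-bit pattern for -3, and `wrapped_sum` of that pattern with 3 comes back as 0, which is exactly the defining property on the slide.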

3
Introduction (cont’d)
Bitfield width is important!
– -45₁₀ in 8 bits is 11010011₂
– -45₁₀ in 16 bits is 1111111111010011₂
Need to convert from one width to another
– Intermediate calculation results from, e.g., signed multiplication implemented in software
– Storage format different from the processor’s native (optimal) register width
– Assignment for SYSC 2001!
Simply padding with 0’s gets us nowhere:
– -45₁₀ in 8 bits = 11010011₂, which padded with 0’s becomes 0000000011010011₂ = 211₁₀ ≠ -45₁₀
Sign extension means:
– If the most significant bit (MSB) is 0, the number is positive; prepend 0’s to the desired width
– If the MSB is 1, the number is negative; prepend 1’s to the desired width
Let’s concentrate on sign extending 12 bits to 16 bits
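To make the difference between zero padding and sign extension concrete, the following C sketch widens an 8-bit pattern to 16 bits both ways. The function names are illustrative only, not from the slides.

```c
#include <stdint.h>

/* Padding with 0's: the bit pattern is kept, but a negative value
   turns into a large positive one. */
uint16_t zero_extend_8_to_16(uint8_t v)
{
    return (uint16_t)v;
}

/* Sign extension: if the MSB (bit 7) is 1, prepend 1's;
   otherwise prepend 0's. */
uint16_t sign_extend_8_to_16(uint8_t v)
{
    return (v & 0x80u) ? (uint16_t)(v | 0xFF00u) : (uint16_t)v;
}
```

With the 8-bit pattern for -45 (0xD3), the first function yields 0x00D3, which is 211, while the second yields 0xFFD3, the 16-bit pattern for -45.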

4
Method 1
The simple one
Advantages
– Straightforward
Disadvantages
– A lot of code (relatively)
– Slow because of branching!

C:
    if (val & 0x800) { val |= 0xF000; } /* else, do nothing */

Assembly        Machine code    Clocks
test ah,8       F6 C4 08        2
jz no_or        74 03           7/3
or ah,0F0h      80 CC F0        2
no_or:

Total: ~9 clocks, 8 bytes
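Since the title promises arbitrary bit lengths, here is one way the branching method might be generalized from 12 bits to any field width. This is a sketch under assumptions: the name `sext_branch` and the `bits` parameter are invented for illustration, the result is 16 bits wide, and `bits` must be between 1 and 16.

```c
#include <stdint.h>

/* Hypothetical generalization of Method 1: sign-extend the low `bits`
   bits of val to 16 bits (1 <= bits <= 16) using a test and a branch. */
uint16_t sext_branch(uint16_t val, unsigned bits)
{
    uint16_t sign = (uint16_t)(1u << (bits - 1));   /* MSB of the field  */
    uint16_t high = (uint16_t)~((1u << bits) - 1u); /* bits above it     */

    if (val & sign) {
        val |= high;   /* negative: prepend 1's */
    }                  /* else: positive, nothing to do */
    return val;
}
```

For `bits = 12` the two masks come out as 0x800 and 0xF000, matching the constants in the slide’s C line.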

5
Method 2
Taking advantage of the CPU’s existing ability to sign-extend
Advantages
– No branches!!
– Smaller code
Disadvantages
– Kind of brute force
– Shifting is a relatively “expensive” operation

C:
    val = (signed short)(val << 4) >> 4;

Assembly        Machine code
shl ax,4        66 C1 E0 04
sar ax,4        66 C1 F8 04

Total: 6 clocks, 8 bytes
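The shift method can be wrapped in a function as sketched below. The name `sext12_shift` is made up for this example. Note the assumptions: converting an out-of-range value to `int16_t` and right-shifting a negative value are both implementation-defined in C, and the sketch relies on the common behavior (wrap-around conversion and an arithmetic shift, like `sar`).

```c
#include <stdint.h>

/* Method 2 as a function: sign-extend a 12-bit value to 16 bits. */
int16_t sext12_shift(uint16_t val)
{
    /* Move bit 11 (the 12-bit sign bit) up to bit 15 (the 16-bit sign bit).
       The conversion to int16_t is assumed to wrap, as on mainstream compilers. */
    int16_t shifted = (int16_t)(uint16_t)(val << 4);

    /* Shift back down. This is assumed to be an arithmetic shift,
       which copies the sign bit into the vacated high bits. */
    return (int16_t)(shifted >> 4);
}
```

On a compiler with those behaviors, `sext12_shift(0xFD3)` returns -45 and `sext12_shift(0x7FF)` returns 2047.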

6
Method 3
Bit-banging
Advantages
– No branching
– Smaller code
– Elementary, well-optimized operations that are likely to be available on a variety of architectures
Disadvantages
– Too cool for school?

C:
    val = (val + 0xF800) ^ 0xF800;

Assembly        Machine code
add ax,0F800h   66 05 00 F8
xor ax,0F800h   66 35 00 F8

Total: 4 clocks, 8 bytes
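Why the add/xor pair works is not obvious at first glance. For values below 0x800 the add simply sets bits 11 to 15 without any carry, and the xor clears them again. For values of 0x800 and above, the add carries out of bit 15 and clears bit 11, and the xor then sets bits 11 to 15. The following C sketch checks this exhaustively against the branching method; the function names are invented for illustration.

```c
#include <stdint.h>

/* Method 1 (reference): test the sign bit and branch. */
uint16_t sext12_branch(uint16_t val)
{
    if (val & 0x800u) {
        val |= 0xF000u;
    }
    return val;
}

/* Method 3: add then xor with 0xF800, truncated to 16 bits. */
uint16_t sext12_addxor(uint16_t val)
{
    return (uint16_t)((val + 0xF800u) ^ 0xF800u);
}

/* Counts the 12-bit inputs (0x000..0xFFF) on which the two disagree. */
int count_mismatches(void)
{
    int mismatches = 0;
    for (uint32_t v = 0; v <= 0xFFFu; v++) {
        if (sext12_branch((uint16_t)v) != sext12_addxor((uint16_t)v)) {
            mismatches++;
        }
    }
    return mismatches;
}
```

The truncation to 16 bits matters: it plays the role of the 16-bit `ax` register discarding the carry.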

7
Method 3b
Register-only bit-banging
Advantages
– Same as method 3
– Even smaller code since we’re using native (default) operand size and registers
Disadvantages
– Clobbers an additional register

C:
    val = (val + 0xF800) ^ 0xF800;

Assembly        Machine code
add eax,ebx     03 C3
xor eax,ebx     33 C3

Total: 4 clocks, 4 bytes
Relies on prior initialization of ebx = 0xF800!!
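One detail of the 32-bit register version is worth modeling: with ebx = 0xF800 the carry is no longer discarded, so only the low 16 bits of the result (the `ax` part) hold the sign-extended value. The C sketch below imitates the two instructions with plain 32-bit arithmetic; `addxor32` is a hypothetical name, and the 0xFFFFF800 constant is a variation introduced here, not something the slides mention.

```c
#include <stdint.h>

/* Models "add eax,ebx" followed by "xor eax,ebx" on 32-bit registers. */
uint32_t addxor32(uint32_t eax, uint32_t ebx)
{
    eax += ebx;   /* add eax,ebx */
    eax ^= ebx;   /* xor eax,ebx */
    return eax;
}
```

With ebx = 0xF800 and a negative 12-bit input such as 0xFD3, the full 32-bit result has bit 16 set, but truncating it to 16 bits gives the expected 0xFFD3. If a full 32-bit sign extension were wanted instead, initializing ebx to 0xFFFFF800 would make the carry fall off the top of the register and the whole result would be correct.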

8
Theory == practice?
Turns out we weren’t too far off
Performance gains aren’t very significant here, but could be important for embedded or high-performance applications such as DSP
Values for each method were obtained by averaging 10 timed trials of 16 million (2^24) loops of the algorithm each
Time is in millions (10^6) of RDTSC units on a Pentium Celeron at 1.8 GHz (divide by 1800 to get approximate time in seconds)
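A measurement along these lines could be sketched in C as follows. This is not the harness used for the slides: it substitutes the standard `clock()` function for RDTSC so that it stays portable, and the names `sext_fn` and `time_method` are assumptions made for this example. Calling through a function pointer also adds overhead of its own, so the numbers are only useful for rough comparison.

```c
#include <stdint.h>
#include <time.h>

typedef uint16_t (*sext_fn)(uint16_t);

/* Method 1: branch on the sign bit. */
uint16_t sext12_branch(uint16_t v)
{
    return (v & 0x800u) ? (uint16_t)(v | 0xF000u) : v;
}

/* Method 3: add then xor with 0xF800. */
uint16_t sext12_addxor(uint16_t v)
{
    return (uint16_t)((v + 0xF800u) ^ 0xF800u);
}

/* Runs fn `loops` times over cycling 12-bit inputs, stores the elapsed
   CPU time in *seconds, and returns a checksum of the results.
   The volatile checksum keeps the compiler from discarding the loop. */
uint32_t time_method(sext_fn fn, uint32_t loops, double *seconds)
{
    volatile uint32_t checksum = 0;
    clock_t start = clock();

    for (uint32_t i = 0; i < loops; i++) {
        checksum += fn((uint16_t)(i & 0xFFFu));
    }

    *seconds = (double)(clock() - start) / CLOCKS_PER_SEC;
    return checksum;
}
```

To mirror the procedure described above, one would call `time_method` with `loops` set to 2^24 ten times per method and average the times. Comparing the returned checksums across methods doubles as a sanity check that they compute the same results.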

9
Questions?
