Slide 1: Performing Advanced Bit Manipulations Efficiently in General-Purpose Processors
Yedidya Hilewitz and Ruby B. Lee
Princeton Architecture Lab for Multimedia and Security (PALMS), Department of Electrical Engineering, Princeton University
18th IEEE Symposium on Computer Arithmetic (ARITH-18), Montpellier, France, June 25-27, 2007

Slide 2: Background and Motivation
Advanced bit manipulations are not well supported by commodity microprocessors.
- These operations are performed using “programming tricks” (cf. Hacker’s Delight).
Bit manipulations play a role in applications of increasing importance.
We propose a new shifter architecture: the shifter is replaced by a new functional unit that directly supports advanced bit manipulation operations.

Slide 3: Outline
- Background and motivation
- Advanced bit manipulation operations
  - Delineation and example usage
- New shift-permute functional unit
- Summary and conclusions

Slide 4: Advanced Bit Manipulation Instructions
- Bit permutation: butterfly (bfly) and inverse butterfly (ibfly)
- Bit gather and bit scatter: parallel extract (pex) and parallel deposit (pdep)

Slide 5: Butterfly and Inverse Butterfly Circuits
Any of the n! permutations of n bits can be performed with one pass through a bfly instruction followed by one pass through an ibfly instruction: bfly + ibfly forms a general permutation circuit.
Each circuit has lg(n) stages of n 2:1 multiplexers, grouped into n/2 pairs that either pass through or swap their two inputs.
[Figures: 8-bit butterfly circuit; 8-bit inverse butterfly circuit]
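
Both circuits can be modeled in software to see how the per-pair control bits act. This is a minimal sketch, assuming the stage ordering used in the rotation example later in the talk (the inverse butterfly's first stage pairs adjacent bits and its last stage pairs bits n/2 apart) and assuming the butterfly applies the same stages in reverse order. The function names and the control-bit layout (one list of n/2 bits per stage, one bit per pair, ordered by the pair's lower bit position) are hypothetical.

```python
def _stage(x: int, n: int, dist: int, ctrl: list[int]) -> int:
    """One network stage: for each pair (i, i + dist), swap the two bits if its control bit is 1."""
    k = 0  # index of the current pair within this stage's n/2 control bits
    for i in range(n):
        if i & dist:  # i is the upper bit of a pair; it is handled together with its partner
            continue
        if ctrl[k] and ((x >> i) & 1) != ((x >> (i + dist)) & 1):
            x ^= (1 << i) | (1 << (i + dist))  # swapping two unequal bits flips both
        k += 1
    return x


def ibfly(x: int, controls: list[list[int]], n: int = 8) -> int:
    """Inverse butterfly (assumed ordering): lg(n) stages with pair distances 1, 2, 4, ..., n/2."""
    for j, ctrl in enumerate(controls):
        x = _stage(x, n, 1 << j, ctrl)
    return x


def bfly(x: int, controls: list[list[int]], n: int = 8) -> int:
    """Butterfly (assumed ordering): lg(n) stages with pair distances n/2, ..., 4, 2, 1."""
    for j, ctrl in enumerate(controls):
        x = _stage(x, n, n >> (j + 1), ctrl)
    return x
```

For example, setting every control bit to 1 maps bit i to bit 7 - i when n = 8, so ibfly(0b00001101, [[1] * 4] * 3) returns 0b10110000. Because every stage is its own inverse and bfly visits the distances in the opposite order, bfly(ibfly(x, c), c[::-1]) == x for any control set c.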

Slide 6: Bit Gather (Parallel Extract) and Bit Scatter (Parallel Deposit)
Parallel extract: pex r1 = r2, r3
- Extracts the bits of r2 flagged by 1s in r3, then compresses and right-justifies them in the result register.
- Parallel extract maps to the ibfly datapath.
Parallel deposit: pdep r1 = r2, r3
- Deposits the right-justified bits of r2 into the result register at the positions flagged by 1s in r3.
- Parallel deposit maps to the bfly datapath.
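
A bit-serial reference model makes the semantics of the two instructions concrete. It is a sketch of what pex and pdep compute, not of how the ibfly and bfly datapaths compute it; the function names mirror the mnemonics, bit 0 is taken as the least significant bit, and result bits that pdep does not write are assumed to be 0.

```python
def pex(r2: int, r3: int, n: int = 64) -> int:
    """Parallel extract: gather the bits of r2 selected by 1s in r3 into the low end of the result."""
    result, out_pos = 0, 0
    for i in range(n):
        if (r3 >> i) & 1:
            result |= ((r2 >> i) & 1) << out_pos  # next right-justified result bit
            out_pos += 1
    return result


def pdep(r2: int, r3: int, n: int = 64) -> int:
    """Parallel deposit: scatter the low-order bits of r2 to the positions selected by 1s in r3."""
    result, in_pos = 0, 0
    for i in range(n):
        if (r3 >> i) & 1:
            result |= ((r2 >> in_pos) & 1) << i  # consume r2 from its least significant bit upward
            in_pos += 1
    return result
```

For instance, pex(0b11010010, 0b01010101) selects bits 0, 2, 4 and 6 (values 0, 0, 1, 1) and returns 0b1100, and pdep(0b1100, 0b01010101) puts them back, giving 0b01010000. In general pdep(pex(x, m), m) == x & m.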

Slide 7: Example Usage: Bioinformatics (DNA Sequence Reversal)
The DNA bases A, C, G and T are represented by 2-bit codes.
Reversing a DNA sequence is equivalent to reversing the order of the bit pairs, which is a bfly or ibfly permutation.
- One ibfly instruction does the work of 11 to 23 ALU and shifter instructions.
- At minimum: 2 × (and, and, shift, shift, or) plus one byte-reverse instruction.
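
For comparison, the conventional instruction sequence counted on this slide can be written out. This sketch assumes 32 bases packed into a 64-bit word with the first base in the two least significant bits, and assumes the encoding A=00, C=01, G=10, T=11 (the slide only says that 2-bit codes are used). M2, M4, CODE, pack and reverse_bases are hypothetical names, and Python's int.to_bytes/int.from_bytes stand in for the byte-reverse instruction.

```python
M2 = 0x3333333333333333  # selects the low 2-bit field of every 4-bit nibble
M4 = 0x0F0F0F0F0F0F0F0F  # selects the low nibble of every byte
CODE = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}  # assumed 2-bit encoding


def pack(seq: str) -> int:
    """Pack up to 32 bases into a 64-bit word, first base in the least significant 2 bits."""
    x = 0
    for i, base in enumerate(seq):
        x |= CODE[base] << (2 * i)
    return x


def reverse_bases(x: int) -> int:
    """Reverse the order of the 32 two-bit fields (bases) in a 64-bit word."""
    x = ((x >> 2) & M2) | ((x & M2) << 2)  # and, and, shift, shift, or: swap 2-bit fields in each nibble
    x = ((x >> 4) & M4) | ((x & M4) << 4)  # and, and, shift, shift, or: swap nibbles in each byte
    return int.from_bytes(x.to_bytes(8, "little"), "big")  # byte reverse
```

Each of the first two steps is one and, and, shift, shift, or group from the slide, and the byte reversal brings the total to 11 operations, the lower end of the quoted 11 to 23 range. For example, reverse_bases(pack("TGCA")) equals pack("A" * 28 + "ACGT"), since the unused high fields encode A.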

Slide 8: Advanced Bit Manipulation Functional Unit
We propose adding a new functional unit that directly performs advanced bit manipulations.
To minimize cost, the new functional unit is intended to replace the existing shifter unit.
- The shifter currently performs the basic bit manipulation operations.
Our new functional unit represents an evolution of shifter designs.

Slide 9: Basic Bit Manipulation Operations
- shift r1 = r2, s
- extract r1 = r2, pos, len
- mix r1 = r2, r3
- rotate r1 = r2, s
- deposit r1 = r2, pos, len

Slide 10: Parallel Extract and Parallel Deposit
[Figures: parallel extract; parallel deposit]

Slide 11: Evolution of Shifter Designs
[Figures: barrel shifter; log shifter; our proposed design?]
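
The two earlier designs compute the same rotation with different circuit structures. The sketch below models them behaviorally, assuming a rotator rather than a plain shifter, and assuming a log shifter built from 4:1 multiplexers (the radix implied by the log4(n) entries in the comparison table on slide 22). The function names are hypothetical.

```python
def barrel_rotr(x: int, s: int, n: int = 64) -> int:
    """Barrel-shifter model: each output bit p is an n:1 mux that selects input bit (p + s) mod n."""
    out = 0
    for p in range(n):
        out |= ((x >> ((p + s) % n)) & 1) << p
    return out


def log4_rotr(x: int, s: int, n: int = 64) -> int:
    """Log-shifter model: ceil(log4 n) stages of 4:1 muxes; stage k rotates by digit_k * 4**k."""
    mask = (1 << n) - 1
    s %= n
    weight = 1                       # 4**k for stage k
    while weight < n:
        digit = (s // weight) % 4    # base-4 digit of s picks one of the 4 mux inputs
        amt = digit * weight
        if amt:
            x = ((x >> amt) | (x << (n - amt))) & mask
        weight *= 4
    return x
```

The barrel model has one level of wide (n-input) selection per output bit, while the log model has several levels of narrow (4-input) selection; both return the same rotation, e.g. barrel_rotr(1, 1, 8) == log4_rotr(1, 1, 8) == 0x80.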

Slide 12: New Shifter Design
An inverse butterfly (or butterfly) circuit enhanced with an extra multiplexer stage is the basis of the new shifter design.
We will show that either the butterfly or the inverse butterfly circuit alone can perform rotations.
Rotation is the basic operation underlying shift, extract, deposit and mix.
- The other basic bit manipulation operations are modeled as a rotation followed by zeroing, sign-bit propagation or merging.
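
The rotate-then-mask decomposition on this slide can be sketched behaviorally. This is a model under stated assumptions, not the unit's actual datapath: rotr stands in for the (inverse) butterfly configured for rotation, merge stands in for the extra multiplexer stage, bit positions count from the least significant bit, and all function names and parameters (pos, length, target) are hypothetical. mix, which merges subwords taken from two registers, is omitted.

```python
def rotr(x: int, s: int, n: int = 64) -> int:
    """Right rotation of an n-bit value: the operation the permutation network provides."""
    s %= n
    return ((x >> s) | (x << (n - s))) & ((1 << n) - 1)


def merge(rotated: int, mask: int, fill: int, n: int = 64) -> int:
    """Extra mux stage (model): keep the rotated bit where mask is 1, otherwise take the fill bit."""
    return (rotated & mask) | (fill & ~mask & ((1 << n) - 1))


def ones_if(bit: int, n: int = 64) -> int:
    """All-ones word if bit is 1 (sign-bit propagation), else all zeros (zeroing)."""
    return (1 << n) - 1 if bit else 0


def shr(x: int, s: int, n: int = 64, arithmetic: bool = False) -> int:
    """Right shift, 0 <= s < n: rotate right, keep the low n - s bits, fill the rest."""
    fill = ones_if((x >> (n - 1)) & 1, n) if arithmetic else 0
    return merge(rotr(x, s, n), (1 << (n - s)) - 1, fill, n)


def shl(x: int, s: int, n: int = 64) -> int:
    """Left shift, 0 <= s < n: rotate left (right by n - s), zero the low s bits."""
    return merge(rotr(x, n - s, n), ((1 << n) - 1) ^ ((1 << s) - 1), 0, n)


def extract(x: int, pos: int, length: int, n: int = 64, signed: bool = False) -> int:
    """Extract the length-bit field at pos, right-justified, zero- or sign-extended."""
    fill = ones_if((x >> (pos + length - 1)) & 1, n) if signed else 0
    return merge(rotr(x, pos, n), (1 << length) - 1, fill, n)


def deposit(x: int, pos: int, length: int, n: int = 64, target: int = 0) -> int:
    """Deposit the low length bits of x at pos; other bits come from target (0 means zeroing)."""
    return merge(rotr(x, n - pos, n), ((1 << length) - 1) << pos, target, n)
```

Every operation is one rotation plus one per-bit selection between the rotated value and a fill word (zeros, copies of a sign bit, or another register), which is the role the slide assigns to the extra multiplexer stage. For example, extract(0x80, 4, 4, n=8, signed=True) returns 0xF8, and deposit(0xBC, 4, 8, n=16, target=0xFFFF) returns 0xFBCF.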

Slide 13: New Shift-Permute Functional Unit Implementation

Slide 14: Configuring the Inverse Butterfly for Rotations
Hard problem: generating the control bits for rotations on the inverse butterfly circuit.
We derive an expression for the control bits as a recursive function of the shift amount, s, and the stage number, j.

Slide 15: Example: Right Rotation by 5 on an 8-bit Inverse Butterfly Circuit
After each stage j, the input is right rotated by 5 mod 2^j within each 2^j-bit subcircuit.

Slide 16: Example: Right Rotation by 5 on an 8-bit Inverse Butterfly Circuit
After stage 1, the input is right rotated by 5 mod 2 = 1 within each 2-bit subcircuit.

Slide 17: Example: Right Rotation by 5 on an 8-bit Inverse Butterfly Circuit
After stage 2, the input is right rotated by 5 mod 4 = 1 within each 4-bit subcircuit.
Bits that wrapped at the output of the previous stage are swapped.

Slide 19: Example: Right Rotation by 5 on an 8-bit Inverse Butterfly Circuit
After stage 3, the input is right rotated by 5 (5 mod 8 = 5).
Bits that wrapped at the output of the previous stage are passed through.

Slide 20: Rotations in General on an n-bit Inverse Butterfly Circuit
In the final stage:
- Shift amount s < n/2: swap the bits that wrapped (pass the others through).
- Shift amount s ≥ n/2: pass through the bits that wrapped (swap the others).
Each earlier stage j applies the same rule within its 2^j-bit subcircuits, with s mod 2^j in place of s and 2^(j-1) in place of n/2.
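
The swap/pass rule can be turned into a control-bit generator and checked against a plain rotation. This sketch is one way to derive the bits from the rule on this slide, applied stage by stage; it is not necessarily the exact recursive expression or circuit used in the paper. It assumes the stage ordering of the worked example (the stage with 2^j-bit subcircuits pairs bits 2^(j-1) apart), one control bit per pair ordered by the pair's lower bit position, and hypothetical function names. Loop index j below is 0-based, so it corresponds to the slides' stage j + 1.

```python
def rotation_controls(s: int, n: int = 8) -> list[list[int]]:
    """Control bits that make an n-bit inverse butterfly rotate right by s (one list per stage)."""
    controls = []
    for j in range(n.bit_length() - 1):            # lg(n) stages, n a power of two
        half = 1 << j                               # pair distance = half the subcircuit size
        s_prev = s % half                           # rotation already applied within each half
        upper = (s >> j) & 1                        # 1 iff s mod (2 * half) >= half
        ctrl = []
        for i in range(n):
            if i & half:
                continue                            # upper bit of a pair
            wrapped = (i % half) + s_prev >= half   # position holds a bit that wrapped last stage
            ctrl.append(int(wrapped) ^ upper)       # swap wrapped bits if upper == 0, else the others
        controls.append(ctrl)
    return controls


def ibfly(x: int, controls: list[list[int]], n: int = 8) -> int:
    """Inverse butterfly model: stage j swaps pairs (i, i + 2**j) whose control bit is 1."""
    for j, ctrl in enumerate(controls):
        dist, k = 1 << j, 0
        for i in range(n):
            if i & dist:
                continue
            if ctrl[k] and ((x >> i) & 1) != ((x >> (i + dist)) & 1):
                x ^= (1 << i) | (1 << (i + dist))
            k += 1
    return x
```

For s = 5 and n = 8, rotation_controls returns [[1, 1, 1, 1], [0, 1, 0, 1], [1, 1, 1, 0]]: every pair swaps in stage 1, only the wrapped bits swap in stage 2 (5 mod 4 = 1 < 2), and in stage 3 the wrapped bit passes through while the others swap (5 ≥ 4), matching the worked example.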

Slide 21: Circuit Implementation of the Rotation Control Bit Generator

Slide 22: Comparison to Barrel and Log Shifters

                                   Barrel    Log          IBFLY
# of gates                         n²        n×log4(n)    n×lg(n)
Control lines                      n         lg(n)        n/2×lg(n)
Gate delay (of datapath)           1         log4(n)      lg(n)
Mux width (capacitance)            n         4            2
Relative delay (logical effort)    1.16×     1×           1.19×
Bit manipulation capabilities      basic     basic        basic + advanced
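
Evaluating the table's expressions for a 64-bit datapath gives a sense of scale. This small sketch simply plugs n into the formulas above, assuming n is a power of two and log4(n) = ceil(lg(n) / 2); the relative-delay row comes from logical-effort analysis and is not recomputed. The function name costs is hypothetical.

```python
import math


def costs(n: int) -> dict[str, tuple[int, int, int, int]]:
    """(# of gates, control lines, gate delay, mux width) per design, from the table's formulas."""
    lg = int(math.log2(n))
    log4 = math.ceil(lg / 2)
    return {
        "barrel": (n * n, n, 1, n),
        "log": (n * log4, lg, log4, 4),
        "ibfly": (n * lg, n // 2 * lg, lg, 2),
    }
```

At n = 64 this gives (4096, 64, 1, 64) for the barrel shifter, (192, 6, 3, 4) for the log shifter and (384, 192, 6, 2) for the inverse butterfly: twice the multiplexers of the log shifter, and 192 control lines that must be derived from a 6-bit shift amount, which is the job of the rotation control bit generator.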

Slide 23: Summary and Conclusions
We proposed evolving the shifter into a new design based on butterfly and inverse butterfly datapaths.
- The new shifter subsumes the basic shifter, the multimedia shift-permute unit and the advanced bit manipulation unit.
We have shown how to perform the basic shifter operations on these datapaths:
- Rotation control bit generator
- Extra multiplexer stage for masking and merging
Using the new shifter design in future microprocessor implementations increases capabilities at only marginal cost.
