Common Subexpression Elimination Involving Multiple Variables for Linear DSP Synthesis
15th IEEE International Conference on Application Specific Architectures and Processors (ASAP)
Farzan Fallah: Advanced CAD Research, Fujitsu Labs. of America
Anup Hosangadi, Ryan Kastner: ECE Department, UCSB

Outline
– Introduction
– Arithmetic expressions and polynomial formulation
– Eliminating multiple variable common subexpressions
– Results
– Limitations of proposed technique
– Conclusions

Introduction
Multiplications by constants are encountered in many application areas
– DSP transforms in audio, video, and image processing (DFT, DCT, IDCT, etc.)
– Filtering operations in communication (FIR, IIR filters)
– Multiple Input Multiple Output (MIMO) systems
– Polynomials in computer graphics

Introduction
Multiplication is expensive in hardware
Decompose constant multiplications into shifts and additions
– 13*X = (1101)_2 * X = X + X<<2 + X<<3
Signed digits can reduce the number of additions/subtractions
– Canonical Signed Digits (CSD) (Knuth'74)
– (57)_10 = (0111001)_2 = (100-1001)_CSD
Further reduction is possible by common subexpression elimination
– Up to 50% reduction (R. Hartley, TCS'96)
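The decomposition on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' tool: the function names `binary_terms`, `csd_digits` and `shift_add` are hypothetical, and the CSD conversion uses the standard "look at the two low bits" recoding for non-negative integers.

```python
def binary_terms(c):
    """Shift amounts i such that c == sum(1 << i), for a non-negative c."""
    return [i for i in range(c.bit_length()) if (c >> i) & 1]


def csd_digits(c):
    """CSD digits of a non-negative c, least significant first, each in {-1, 0, 1}."""
    digits = []
    while c:
        if c & 1:
            d = 2 - (c & 3)  # +1 when c % 4 == 1, -1 when c % 4 == 3
            c -= d
        else:
            d = 0
        digits.append(d)
        c >>= 1
    return digits


def shift_add(x, digits):
    """Evaluate sum(d_i * (x << i)): a multiplication built from shifts and adds/subtracts."""
    return sum(d * (x << i) for i, d in enumerate(digits))


print(binary_terms(13))      # shift amounts for 13 = (1101)_2
print(csd_digits(57)[::-1])  # CSD digits of 57, most significant first
```

For 57 the recoding leaves three non-zero digits (two additions/subtractions) instead of the four non-zero bits of the plain binary form.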

Introduction
Common subexpressions = common digit patterns
– F_1 = 7*X = (0111)*X = X + X<<1 + X<<2
  F_2 = 13*X = (1101)*X = X + X<<2 + X<<3
  Cost: 4 additions, 4 shifts
– Common pattern "0101" => D_1 = X + X<<2
  F_1 = D_1 + X<<1
  F_2 = D_1 + X<<3
  Cost: 3 additions, 3 shifts
– Good for a single variable: FIR filters (transposed form)
– Multiple variables? (DFT, DCT, etc.)
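As a quick sanity check of the single-variable example, the sketch below compares the direct shift-and-add form with the version that shares the "0101" pattern. The function names are hypothetical and only serve this illustration.

```python
def direct(x):
    """7*x and 13*x computed independently: 4 additions, 4 shifts."""
    f1 = x + (x << 1) + (x << 2)
    f2 = x + (x << 2) + (x << 3)
    return f1, f2


def shared(x):
    """Same outputs with the common pattern D1 = x + (x << 2): 3 additions, 3 shifts."""
    d1 = x + (x << 2)
    return d1 + (x << 1), d1 + (x << 3)


print(direct(3), shared(3))
```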

Introduction
Matrix form of linear systems
\[
\begin{bmatrix} Y_1 \\ Y_2 \\ Y_3 \end{bmatrix}
=
\begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}
\begin{bmatrix} X_1 \\ X_2 \\ X_3 \end{bmatrix}
\]
[Figure: bit-level example (101100011101, 100101) for outputs Y_1, Y_2, Y_3, labeled "All distinct S_ij X_j and C_ik D_k"; Potkonjak, TCAD'95]

Arithmetic expressions and polynomial formulation
View linear systems as a set of arithmetic expressions
– Expressions consist of the +, -, << operators
– Develop a methodology for extracting common subexpressions
Polynomial formulation: a left shift by i is written as multiplication by L^i
C × X = Σ_i (±X × L^i)
(14)_10 × X = (1110)_2 × X = X<<3 + X<<2 + X<<1 = XL^3 + XL^2 + XL
            = (100-10)_CSD × X = XL^4 - XL

Arithmetic expressions and polynomial formulation
\[
\begin{bmatrix} Y_1 \\ Y_2 \end{bmatrix}
=
\begin{bmatrix} 5 & 7 \\ 4 & 12 \end{bmatrix}
\begin{bmatrix} X_1 \\ X_2 \end{bmatrix}
\]
5 = 0101, 7 = 0111, 4 = 0100, 12 = 1100
Polynomial formulation (term numbers in parentheses)
Y_1 = (1)X_1 + (2)X_1L^2 + (3)X_2 + (4)X_2L + (5)X_2L^2
Y_2 = (6)X_1L^2 + (7)X_2L^2 + (8)X_2L^3
Cost: 6 shifts, 6 additions
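The polynomial formulation can be generated mechanically from the constant matrix. The sketch below assumes non-negative constants in plain binary (no CSD). It represents a term X_j L^i as the pair (j, i), and the names `polynomial_terms` and `cost` are hypothetical.

```python
def polynomial_terms(matrix):
    """Map each row of a non-negative integer matrix to a list of (j, i) terms.

    A term (j, i) stands for X_j * L^i, i.e. X_j << i (variables are 1-indexed).
    """
    exprs = []
    for row in matrix:
        terms = []
        for j, c in enumerate(row, start=1):
            terms += [(j, i) for i in range(c.bit_length()) if (c >> i) & 1]
        exprs.append(terms)
    return exprs


def cost(exprs):
    """Count (additions, shifts) when every expression is computed on its own."""
    adds = sum(len(terms) - 1 for terms in exprs)
    shifts = sum(1 for terms in exprs for (_, i) in terms if i > 0)
    return adds, shifts


exprs = polynomial_terms([[5, 7], [4, 12]])
print(exprs)
print(cost(exprs))
```

On the 2x2 example this reproduces the eight terms listed on the slide and the stated cost of 6 additions and 6 shifts.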

Digit pattern matching techniques
        X_1     X_2
Y_1     0101    0111
Y_2     0100    1100
D_1 = X_2 + X_2<<1
Y_1 = X_1 + X_1<<2 + D_1 + X_2<<2
Y_2 = X_1<<2 + D_1<<2
Cost: 5 shifts, 5 additions

Algebraic techniques for factoring and eliminating common subexpressions
Algebraic methods in multi-level logic synthesis (MLLS)
– Reducing literal count in a set of Boolean expressions
– Factoring, decomposition: established algebraic techniques
Can be applied to linear arithmetic expressions as well
D_1 = X_1 + X_2<<2
Y_1 = D_1 + D_1<<3 + X_1<<3
Y_2 = D_1 + X_2<<2

Finding candidate common subexpressions (kernels)
Terminology
– Divisor: an expression having at least one term with a zero exponent of L
   e.g. X_1 + X_2L + X_3L^2 is a divisor
   X_1L + X_2L^2 + X_3L^2 is not a divisor
– Kernel: a divisor obtained from the original expression by division by an exponent of L
– Co-kernel: the exponent of L that is used to obtain the kernel
Example
– P = X_1L^3 + X_2L^3 + X_2L^2 + X_3
– Division by L^2 => kernel = X_1L + X_2L + X_2; co-kernel = L^2

Kernel generation algorithm
Recursively divide by the smallest non-zero exponent of L
Y_1 = (1)X_1 + (2)X_1L^2 + (3)X_2 + (4)X_2L + (5)X_2L^2
Y_2 = (6)X_1L^2 + (7)X_2L^2 + (8)X_2L^3
» Divide Y_1 by L
» Divide again by L
» Divide Y_2 by L^2

Kernel generation
All kernels and co-kernels for the example linear system, written as (kernel)[co-kernel]
– ((1)X_1 + (2)X_1L^2 + (3)X_2 + (4)X_2L + (5)X_2L^2)[1]
– ((2)X_1L + (4)X_2 + (5)X_2L)[L]
– ((2)X_1 + (5)X_2)[L^2]
– ((6)X_1L^2 + (7)X_2L^2 + (8)X_2L^3)[1]
– ((6)X_1 + (7)X_2 + (8)X_2L)[L^2]
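The recursive division can be sketched as follows. This is an illustrative reading of the slides, not the authors' implementation. A term is a hypothetical triple (term number, variable index j, exponent of L), the original expression is always listed with co-kernel 1, and a quotient with fewer than two terms is dropped, which is an assumption.

```python
def kernels(expr):
    """Return [(co_kernel_exponent, terms)] for one expression.

    expr is a list of (term_id, j, e) triples meaning X_j * L^e.
    """
    result = [(0, expr)]  # the expression itself, co-kernel L^0 = 1
    shift, current = 0, expr
    while True:
        positive = [e for (_, _, e) in current if e > 0]
        if not positive:
            break
        step = min(positive)  # smallest non-zero exponent of L
        # Keep only the terms divisible by L^step and lower their exponents.
        current = [(t, j, e - step) for (t, j, e) in current if e >= step]
        shift += step
        if len(current) < 2:  # assumption: a kernel needs at least two terms
            break
        result.append((shift, current))
    return result


Y1 = [(1, 1, 0), (2, 1, 2), (3, 2, 0), (4, 2, 1), (5, 2, 2)]
Y2 = [(6, 1, 2), (7, 2, 2), (8, 2, 3)]
for name, expr in (("Y1", Y1), ("Y2", Y2)):
    for co_kernel, terms in kernels(expr):
        print(name, "co-kernel L^%d:" % co_kernel, terms)
```

On the running example this yields co-kernels 1, L, L^2 for Y_1 and 1, L^2 for Y_2, the same five entries as listed above.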

Importance of kernels
Theorem: There exists a k-term common subexpression iff there is a k-term "non-overlapping" intersection between at least two kernels
Proof
– If: a non-overlapping k-term intersection => a k-term common subexpression
– Only if: suppose there are 2 instances of a k-term subexpression
   Case 1, "divisor": each instance will be a part of some kernel expression
   Case 2, "non-divisor": dividing by the smallest non-zero exponent of L will convert it into a "divisor"

Kernel generation
e.g. 10*X = (1010)*X = (1)XL + (2)XL^3
     14*X = (1110)*X = (3)XL + (4)XL^2 + (5)XL^3
– Common subexpression = XL + XL^3 = (X + XL^2)L
– Kernels involved in the intersection:
   ((1)X + (2)XL^2)
   ((3)X + (4)XL + (5)XL^2)

Overlapping kernels
Consider (1001001)*X = (1)XL^6 + (2)XL^3 + (3)X
– Kernels
   [1]   ((1)XL^6 + (2)XL^3 + (3)X)
   [L^3] ((1)XL^3 + (2)X)
– The intersection XL^3 + X uses term (2) in both kernels, so the two instances overlap

Finding kernel intersections
Form the Kernel Cube Matrix (KCM)
– One row for each kernel generated
– One column for each distinct kernel cube
– Each non-zero element represents a term (term number in parentheses)

                   1       2        3       4       5        6
Row  Co-kernel     X_1     X_1L^2   X_2     X_2L    X_2L^2   X_1L
1    1             1(1)    1(2)     1(3)    1(4)    1(5)     0
2    L             0       0        1(4)    1(5)    0        1(2)
3    L^2           1(2)    0        1(5)    0       0        0
4    L^2           1(6)    0        1(7)    1(8)    0        0

Y_1 = X_1 + X_1L^2 + X_2 + X_2L + X_2L^2
Y_2 = X_1L^2 + X_2L^2 + X_2L^3
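A KCM like this one can be built from the kernel list with a small helper. The sketch below is an assumption-laden illustration: `build_kcm` is a hypothetical name, a term is a (term number, variable index, exponent) triple, and a matrix entry stores the term number directly (0 for an empty cell) rather than a separate 1/0 flag.

```python
def build_kcm(kernel_rows):
    """kernel_rows: list of (co_kernel_exponent, [(term_id, j, e), ...]).

    Returns (columns, matrix): columns are the distinct cubes (j, e) in order
    of first appearance; matrix[r][c] is the term number, or 0 if empty.
    """
    columns = []
    for _, terms in kernel_rows:
        for _, j, e in terms:
            if (j, e) not in columns:
                columns.append((j, e))
    matrix = []
    for _, terms in kernel_rows:
        row = [0] * len(columns)
        for term_id, j, e in terms:
            row[columns.index((j, e))] = term_id
        matrix.append(row)
    return columns, matrix


# The four kernels used as KCM rows in the running example.
rows = [
    (0, [(1, 1, 0), (2, 1, 2), (3, 2, 0), (4, 2, 1), (5, 2, 2)]),
    (1, [(2, 1, 1), (4, 2, 0), (5, 2, 1)]),
    (2, [(2, 1, 0), (5, 2, 0)]),
    (2, [(6, 1, 0), (7, 2, 0), (8, 2, 1)]),
]
columns, matrix = build_kcm(rows)
for row in matrix:
    print(row)
```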

Finding kernel intersections
Each rectangle with non-overlapping terms = a common subexpression
– Rectangle: set of rows and columns such that all elements are '1'
Search only for prime rectangles
– Prime rectangle: rectangle that is not covered by any other rectangle
A prime rectangle may have overlapping terms
– Find a non-overlapping rectangle within the prime rectangle (MIR = Maximum Irredundant Rectangle)
Value of a rectangle (R = # of rows, C = # of columns)
– Value = # of additions/subtractions saved by selecting the rectangle
– V(R, C) = (R-1)*(C-1)

Finding kernel intersections
Selecting common subexpressions
– Greedy selection of the most valued non-overlapping rectangle in each iteration
– This is very expensive: worst case O(2^{MN}) prime rectangles to be considered (M = # of expressions; N = bit-width)
– Heuristic required (ping-pong)
   Start with a seed row/column
   Build the rectangle by intersections with other rows/columns
   Complexity is linear in the # of rows/columns

Finding kernel intersections
In the KCM above, the prime rectangle formed by rows 1, 2, 4 and columns X_2, X_2L has overlapping terms: rows 1 and 2 both use term (4).
MIR = terms {(3), (4); (7), (8)} (rows 1 and 4) OR terms {(4), (5); (7), (8)} (rows 2 and 4)

Extracting kernel intersections (1st iteration)
In the KCM above, rows 1 and 4 intersect in columns X_1, X_2, X_2L: terms (1), (3), (4) and (6), (7), (8).
Select D_1 = X_1 + X_2 + X_2L, saves 2 additions!
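On a KCM this small, the first-iteration choice can be reproduced by brute force. The sketch below is not the ping-pong heuristic from the slides. It enumerates column subsets exhaustively, accepts rows greedily in order while their terms do not overlap, and scores with V(R, C) = (R-1)*(C-1). The name `best_rectangle` and the tie-break (wider rectangles are tried first and only a strictly better value replaces them) are assumptions made for this illustration.

```python
from itertools import combinations

# KCM of the running example; entries are term numbers, 0 means empty.
# Columns: X_1, X_1L^2, X_2, X_2L, X_2L^2, X_1L
KCM = [
    [1, 2, 3, 4, 5, 0],  # row 1, co-kernel 1
    [0, 0, 4, 5, 0, 2],  # row 2, co-kernel L
    [2, 0, 5, 0, 0, 0],  # row 3, co-kernel L^2
    [6, 0, 7, 8, 0, 0],  # row 4, co-kernel L^2
]


def best_rectangle(matrix):
    """Return (value, rows, cols) with 0-based indices, or (0, (), ()) if none."""
    n_cols = len(matrix[0])
    best = (0, (), ())
    for size in range(n_cols, 1, -1):  # widest column sets first
        for cols in combinations(range(n_cols), size):
            rows, used = [], set()
            for r, row in enumerate(matrix):
                ids = [row[c] for c in cols]
                # Accept the row if every cell is filled and no term is reused.
                if all(ids) and used.isdisjoint(ids):
                    rows.append(r)
                    used.update(ids)
            value = (len(rows) - 1) * (len(cols) - 1)
            if value > best[0]:
                best = (value, tuple(rows), cols)
    return best


print(best_rectangle(KCM))
```

With these assumptions the search returns rows 1 and 4 (indices 0 and 3) and columns X_1, X_2, X_2L with value 2, matching D_1. The two-column rectangle X_1 + X_2 over rows 1, 3 and 4 also has value 2, so the tie-break matters here.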

Extracting kernel intersections (2nd iteration)

                   1       2        3        4       5       6
Row  Co-kernel     D_1     X_1L^2   X_2L^2   X_1     X_2     X_2L
1    1             1(1)    1(2)     1(3)     0       0       0
2    L^2           0       0        0        1(2)    1(3)    0
3    1             0       0        0        1(5)    1(6)    1(7)

Final implementation
D_2 = X_1 + X_2
D_1 = D_2 + X_2<<1
Y_1 = D_1 + D_2<<2
Y_2 = D_1<<2
Cost: 3 shifts, 3 additions
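A quick numerical check of the final implementation against the constant matrix [[5, 7], [4, 12]] of the running example. The function name is hypothetical. Since Y_2 = 4*X_1 + 12*X_2 = 4*(X_1 + 3*X_2), the sketch shifts D_1 left by 2 to form Y_2.

```python
def final_implementation(x1, x2):
    """Y1 = 5*x1 + 7*x2 and Y2 = 4*x1 + 12*x2 with 3 additions and 3 shifts."""
    d2 = x1 + x2          # D2 = X1 + X2
    d1 = d2 + (x2 << 1)   # D1 = X1 + 3*X2
    y1 = d1 + (d2 << 2)   # Y1 = D1 + 4*D2
    y2 = d1 << 2          # Y2 = 4*D1
    return y1, y2


print(final_implementation(1, 1))
```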

Experimental setup
Goal
– Reduction in # of additions/subtractions
– Effect on area/latency after synthesis
Transforms
– DCT, IDCT, DFT, DST, DHT
– 8x8 constant matrices
– 16 digits precision (CSD representation)
Compare with
– Potkonjak (TCAD'95)
– RESANDS (Nguyen et al., TVLSI'2000)

Experimental results
Number of additions/subtractions: (I) Original, (II) RESANDS, (III) Potkonjak, (IV) Our technique

                                                  % improvement over
Example    (I)      (II)     (III)    (IV)       (I)      (II)     (III)
DCT        274      202      227      174        36.5     13.1     23.3
IDCT       242      183      222      162        33.0     11.5     27.0
R-DFT      253      193      208      165        34.8     14.5     20.7
I-DFT      207      178      198      134        35.3     24.7     32.3
DST        320      238      252      200        37.5     16.0     20.6
DHT        284      209      211      175        38.4     16.3     17.0
Average    263.3    200.5    219.7    168.3      35.9     16.0     23.5

Experimental results
Synthesis results (minimum latency constraints): (II) RESANDS, (III) Potkonjak, (IV) Our technique

           Area (library units)            Latency (clock cycles)
Example    (II)      (III)     (IV)        (II)     (III)    (IV)
DCT        90667     96375     73311       10       11       11
IDCT       81868     99771     66864       10       11       11
R-DFT      90496     84770     69827       10       12       11
I-DFT      75140     84864     55940       10       11       10
DST        108101    106498    84715       11       12       11
DHT        93939     79409     71272       11       11       11
Average    90110     91948     70322       10.3     11.3     10.8

Limitations of this technique
Results dependent on the initial representation of constants
– Mixed representations: too many, O(3^N) per constant
Factoring of constants
– e.g. 105*X = 15*7*X = (16-1)*(8-1)*X = (D<<3) - D, where D = (X<<4) - X
– Factoring in general is very hard
Common subexpressions with reversed signs
– e.g. (X_1 - X_2) = -(X_2 - X_1) cannot be detected
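The factoring example can be checked numerically. The sketch below, with hypothetical function names, evaluates 105*X once through the factored form 15*7 (two subtractions) and once through the signed-digit form 128 - 32 + 8 + 1 (three additions/subtractions), which is the kind of representation the technique starts from.

```python
def times_105_factored(x):
    """105*x as 7 * (15 * x): two subtractions, two shifts."""
    d = (x << 4) - x     # 15 * x
    return (d << 3) - d  # 7 * (15 * x)


def times_105_signed_digits(x):
    """105*x from the signed digits 128 - 32 + 8 + 1: three additions/subtractions."""
    return (x << 7) - (x << 5) + (x << 3) + x


print(times_105_factored(2), times_105_signed_digits(2))
```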

Conclusions
Contributions
– Novel polynomial transformation
– Adapting rectangle covering methods
– Single-variable and multi-variable subexpressions eliminated together => better results
Future work
– Addressing shortcomings of the current method
– Optimization for timing, power

Conclusions Thank you!! Questions??
