Embedded ISA Support for Enhanced Floating-Point to Fixed-Point ANSI C Compilation Tor Aamodt and Paul Chow University of Toronto { aamodt, pc }@eecg.utoronto.ca 3rd ACM International Conference on Compilers, Architectures and Synthesis for Embedded Systems, Nov. 17-18th, 2000, San Jose CA

Tor Aamodt & Paul Chow, University of Toronto. Embedded ISA Support for Enhanced Floating-Point to Fixed-Point ANSI C Compilation (32 slides)

What is this presentation about?
- FOCUS: Signal processing applications developed using a high-level language representation and floating-point data types.
- WANT: Faster fixed-point software development.
- QUESTION: Are there "better" fixed-point DSP instruction sets in terms of runtime, power, or roundoff-noise performance?

Presentation Outline
- Motivation & Background
- Focus on:
    - Automatic Conversion to Fixed-Point
    - Architectural Enhancements
    - Some Experimental Results
- Summary / Future Directions

Motivation
- 80% of DSPs in use are fixed-point. Why?
- Because fixed-point hardware is cheaper and uses less power...
- ...however, it is much harder to develop signal-processing software for it.

Background
- UTDSP Project: DSP Compiler/Architecture Co-design
    - Traditional DSP architectures are hard for compilers to generate efficient code for (e.g. extended-precision accumulators).
    - First-generation silicon, Sept. 30, 1999: 108-pin PGA, 0.35 µm CMOS, 63 MHz (Sean Peng's M.A.Sc.). 16-bit fixed-point VLIW DSP with a novel 2-level instruction-fetching architecture (reduced pin count).
- June 2000: Synopsys CoCentric Fixed-Point Designer Tool
    - First commercial tool for transforming floating-point ANSI C programs into fixed-point ($20,000 US).

Background: Fixed-Point versus Floating-Point
- Fixed-Point: sign bit | Integer Part | Fractional Part (implied binary point)
- 32-bit Floating-Point (IEEE): sign bit | 8-bit exponent (excess 127) | 23+1 bit normalized mantissa (explicit binary point)
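The fixed-point layout can be sketched in C as follows. This is only an illustration: the 16-bit word size, the helper names `to_fixed` and `from_fixed`, and the round-and-saturate policy are assumptions, not something the slides specify. The position of the binary point is carried by the caller as an integer word length `iwl`, which is what "implied" means here.

```c
#include <stdint.h>

#define WORD_BITS 16   /* assumed word size, not taken from the slide */

/* Encode v with `iwl` integer bits: 1 sign bit, iwl integer bits and
   WORD_BITS - 1 - iwl fractional bits. Valid for 0 <= iwl <= 15. */
int16_t to_fixed(double v, int iwl)
{
    double scaled = v * (double)(1L << (WORD_BITS - 1 - iwl));
    scaled += (scaled >= 0.0) ? 0.5 : -0.5;      /* round to nearest */
    if (scaled > 32767.0)  return INT16_MAX;     /* saturate on overflow */
    if (scaled < -32768.0) return INT16_MIN;
    return (int16_t)scaled;                      /* cast truncates toward zero */
}

/* Decode a word back to a real value; the binary point is implied by iwl. */
double from_fixed(int16_t w, int iwl)
{
    return (double)w / (double)(1L << (WORD_BITS - 1 - iwl));
}
```

For instance, `to_fixed(1.5, 1)` stores 1.5 with one integer bit and 14 fractional bits, while `to_fixed(1.0, 0)` saturates because 1.0 does not fit when there are no integer bits.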

Background: Using Fixed-Point Arithmetic
- Floating-Point: $y_n = \alpha\, y_{n-1} + x_n$
- Fixed-Point: $y_n = ((\alpha\, y_{n-1} \gg 3) + x_n) \ll 1$
- The shifts in the fixed-point version are explicit scaling operations.

Automatic Conversion Process
Traditional Optimizing Compiler: Input Program -> Parser -> Optimizer -> Code Generator -> Processor
- CONSTRAINT: Input/Output Invariance
- GOAL: Application Speedup
- I.e. make code faster, but do not break anything!

Automatic Conversion Process
Traditional Optimizing Compiler with a Floating-Point to Fixed-Point Translator inserted: Input Program -> Parser -> Floating-Point to Fixed-Point Translator (driven by Sample Inputs) -> Optimizer -> Code Generator -> Processor
"RELAX" CONSTRAINTS...
GOALS:
- "Good" Input/Output Fidelity (e.g. good signal-to-noise ratio)
- Fast/Low-Power Operation (10-500× faster than FP emulation)

Floating-Point to Fixed-Point Translation
Floating-point source:
    float a, b, x[N];
    y = a*x[i] + b*x[i+1];
Fixed-point result:
    int a, b, x[N];
    y = ((a*x[i]) >> 2) + (b*x[i+1]);
1. Type Conversion (float becomes int)
2. Scaling Operations (the >> 2)
3. Fractional Fixed-Point Operations (each * is a fractional multiply)
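The three steps can be sketched in C on 16-bit words. The integer word lengths used here (IWL 0 for `a` and `x`, IWL 2 for `b` and `y`) are an assumption chosen so that the `>> 2` from the slide falls out naturally; the helper names `fmul` and `weighted_sum_fixed` are hypothetical.

```c
#include <stdint.h>

/* Fractional multiply: the 32-bit product of two 16-bit words carries two
   sign bits, so shifting right by 15 keeps the useful upper bits. The IWL
   of the result is the sum of the operand IWLs. Right-shifting a negative
   value is implementation-defined in C; mainstream compilers shift
   arithmetically, which is what this sketch relies on. */
int16_t fmul(int16_t a, int16_t b)
{
    return (int16_t)(((int32_t)a * (int32_t)b) >> 15);
}

/* Fixed-point version of y = a*x0 + b*x1 under the assumed formats
   IWL[a] = 0, IWL[x] = 0, IWL[b] = 2, IWL[y] = 2. */
int16_t weighted_sum_fixed(int16_t a, int16_t b, int16_t x0, int16_t x1)
{
    int16_t ax = fmul(a, x0);          /* IWL 0 + 0 = 0 */
    int16_t bx = fmul(b, x1);          /* IWL 2 + 0 = 2 */
    return (int16_t)((ax >> 2) + bx);  /* align ax to IWL 2, then add */
}
```

With a = 0.5 (16384), b = 2.5 (20480 at IWL 2), x0 = 0.25 (8192) and x1 = -0.5 (-16384), the result is the word for -1.125 at IWL 2, matching the floating-point expression.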

Floating-Point to Fixed-Point Translator
SUIF Parser* -> Identifier Assignment -> Instrument Code -> Profile (using Sample Inputs) -> Fixed-Point Conversion -> Optimizer
*SUIF = Stanford University Intermediate Format. See: http://suif.stanford.edu

Collecting Dynamic Range Information
Consider the ANSI C code:
    float a, b, x[N];
    y = a*x[i] + b*x[i+1];
Equivalent Expression Tree:
    y = +( *(a, x[i]), *(b, x[i+1]) )
ID Assignment:
    "0" : y
    "1" : tmp_1
    "2" : tmp_2
Code Instrumentation:
    tmp_1 = a*x[i];
    tmp_2 = b*x[i+1];
    y = tmp_1 + tmp_2;
    profile(tmp_1,1);
    profile(tmp_2,2);
    profile(y,0);
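A minimal sketch of what the instrumentation might record. The slides only show the `profile(value, id)` calls; the body below, which keeps the largest magnitude seen per identifier, is an assumption, as are the names `max_abs`, `instrumented` and `profiled_max`.

```c
#define NUM_IDS 3            /* identifiers 0..2, as in the ID assignment */

double max_abs[NUM_IDS];     /* largest magnitude observed per identifier */

/* Hypothetical body for profile(): track the dynamic range of one node. */
void profile(double value, int id)
{
    double mag = (value < 0.0) ? -value : value;
    if (mag > max_abs[id])
        max_abs[id] = mag;
}

/* Instrumented form of y = a*x[i] + b*x[i+1]. */
float instrumented(float a, float b, const float *x, int i)
{
    float tmp_1 = a * x[i];
    float tmp_2 = b * x[i + 1];
    float y = tmp_1 + tmp_2;
    profile(tmp_1, 1);
    profile(tmp_2, 2);
    profile(y, 0);
    return y;
}

/* Read back the recorded range once the sample inputs have been run. */
double profiled_max(int id)
{
    return max_abs[id];
}
```

Running the instrumented code over the sample inputs leaves one range per intermediate result, which is the data the later conversion step consumes.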

Generating Scaling Operations
- Signal Scaling: Integer Word Length (IWL)
- Definition: $\mathrm{IWL}[x] = \lfloor \log_2 \max(x) \rfloor + 1$
- Word layout: Sign bit | Integer Part (IWL bits) | Fractional Part
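The IWL definition can be evaluated without a logarithm by repeated halving or doubling. This sketch assumes that max(x) means the largest magnitude observed during profiling and that a signal with no recorded range defaults to IWL 0; the function name `iwl_from_max` is hypothetical.

```c
/* Return floor(log2(max_abs)) + 1 for max_abs > 0.
   Halving and doubling by 2 are exact in binary floating point, so the
   loops find the power-of-two interval [2^(iwl-1), 2^iwl) that contains
   max_abs. */
int iwl_from_max(double max_abs)
{
    int iwl = 0;                 /* iwl = 0 covers magnitudes in [0.5, 1) */
    if (max_abs <= 0.0)
        return 0;                /* assumed default when nothing was seen */
    while (max_abs >= 1.0) {
        max_abs /= 2.0;
        iwl++;
    }
    while (max_abs < 0.5) {
        max_abs *= 2.0;
        iwl--;
    }
    return iwl;
}
```

A negative IWL is possible: a signal that never exceeds 0.25 in magnitude gets IWL -1, meaning the leading fractional bits are never used.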

Generating Scaling Operations
Example: "A op B"
- The converted sub-expressions A and B each carry a measured IWL ($\mathrm{IWL}_A^{\text{measured}}$, $\mathrm{IWL}_B^{\text{measured}}$) and a current IWL ($\mathrm{IWL}_A^{\text{current}}$, $\mathrm{IWL}_B^{\text{current}}$).
- The result has a measured IWL, $\mathrm{IWL}_{A\,\mathrm{op}\,B}^{\text{measured}}$; its current IWL, $\mathrm{IWL}_{A\,\mathrm{op}\,B}^{\text{current}}$, is what the conversion must determine.

Automatic Conversion Process: IRP: Using Intermediate Result Profile Data
- Previous Algorithms:
    - "Worst-Case Evaluation": Markus Willems et al. FRIDGE: An Interactive Code Generation Environment for HW/SW CoDesign. ICASSP, April 1997. (Predecessor to the Synopsys CoCentric Fixed-Point Designer Tool.)
    - A "Statistical" Approach: Ki-Il Kum, Jiyang Kang, and Wonyong Sung. A Floating-Point to Fixed-Point C Converter for Fixed-Point Digital Signal Processors. In Proc. 2nd SUIF Compiler Workshop, August 1997.
- Neither uses Intermediate Result Profile data; instead, they combine range information from leaf nodes. Is useful information lost?

IRP: Additive Operations
$A \pm B \;\rightarrow\; (A \ll n_A) \pm (B \gg [n - n_B])$
where:
    $n_A = \mathrm{IWL}_A^{\text{current}} - \mathrm{IWL}_A^{\text{measured}}$
    $n_B = \mathrm{IWL}_B^{\text{current}} - \mathrm{IWL}_B^{\text{measured}}$
    $n = \mathrm{IWL}_A^{\text{measured}} - \mathrm{IWL}_B^{\text{measured}}$
Result: $\mathrm{IWL}_{A+B}^{\text{current}} = \mathrm{IWL}_A^{\text{measured}}$
For example, assume $|A| > |B|$ and $\mathrm{IWL}_{A+B}^{\text{measured}} \le \mathrm{IWL}_A^{\text{measured}}$: B is shifted right by n so that its binary point lines up with that of A.
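The additive rule reduces to a few integer subtractions at conversion time. The sketch below covers only the case stated on the slide (A is the larger operand and the sum needs no more integer bits than A); the type and function names are hypothetical.

```c
/* Current and measured integer word lengths of one sub-expression. */
typedef struct {
    int cur;
    int meas;
} add_operand_t;

/* Shifts to apply for "A +/- B" and the resulting current IWL. */
typedef struct {
    int a_left;      /* A << a_left                              */
    int b_right;     /* B >> b_right (negative means left shift) */
    int result_cur;  /* current IWL of the sum                   */
} add_shifts_t;

add_shifts_t irp_add_shifts(add_operand_t a, add_operand_t b)
{
    int n_a = a.cur - a.meas;    /* unused headroom in A          */
    int n_b = b.cur - b.meas;    /* unused headroom in B          */
    int n   = a.meas - b.meas;   /* range gap between A and B     */
    add_shifts_t s;
    s.a_left = n_a;
    s.b_right = n - n_b;
    s.result_cur = a.meas;
    return s;
}
```

After the shifts both operands sit at the measured IWL of A: A moves from its current IWL down by n_A, and B moves from its current IWL up by n - n_B, which equals the measured IWL of A minus the current IWL of B.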

IRP: Multiplication
$A \cdot B \;\rightarrow\; (A \ll n_A) \cdot (B \ll n_B)$
where:
    $n_A = \mathrm{IWL}_A^{\text{current}} - \mathrm{IWL}_A^{\text{measured}}$
    $n_B = \mathrm{IWL}_B^{\text{current}} - \mathrm{IWL}_B^{\text{measured}}$
Result: $\mathrm{IWL}_{A \cdot B}^{\text{current}} = \mathrm{IWL}_A^{\text{measured}} + \mathrm{IWL}_B^{\text{measured}}$

IRP: Division
$A / B \;\rightarrow\; (A \gg [n_{\text{dividend}} - n_A]) \,/\, (B \ll n_B)$
where:
    $n_A = \mathrm{IWL}_A^{\text{current}} - \mathrm{IWL}_A^{\text{measured}}$
    $n_B = \mathrm{IWL}_B^{\text{current}} - \mathrm{IWL}_B^{\text{measured}}$
    $n_{\text{diff}} = \mathrm{IWL}_{A/B}^{\text{measured}} - \mathrm{IWL}_A^{\text{measured}} + \mathrm{IWL}_B^{\text{measured}}$
    $n_{\text{dividend}} = n_{\text{diff}}$ if $n_{\text{diff}} \ge 0$, and $0$ otherwise
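The division rule can be sketched the same way. The names are hypothetical, and the comparison is taken to be "greater than or equal to zero", which gives the same result as a strict comparison because both branches agree when n_diff is 0.

```c
/* Current and measured integer word lengths of one sub-expression. */
typedef struct {
    int cur;
    int meas;
} div_operand_t;

/* Shifts to apply for "A / B". */
typedef struct {
    int a_right;   /* A >> a_right (negative means left shift) */
    int b_left;    /* B << b_left                              */
} div_shifts_t;

div_shifts_t irp_div_shifts(div_operand_t a, div_operand_t b, int iwl_quot_meas)
{
    int n_a = a.cur - a.meas;
    int n_b = b.cur - b.meas;
    /* Extra integer bits the quotient needs beyond IWL_A - IWL_B. */
    int n_diff = iwl_quot_meas - a.meas + b.meas;
    int n_dividend = (n_diff >= 0) ? n_diff : 0;
    div_shifts_t s;
    s.a_right = n_dividend - n_a;
    s.b_left = n_b;
    return s;
}
```

The divisor is normalized to its measured range, and the dividend is backed off only as far as the measured quotient range requires.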

IRP-SA: Using "Shift Absorption"
Example:
    y = (a*x[i] + (b*x[i+1]>>1)) << 1
Question: Is information discarded unnecessarily here?
Consider the following alternative:
    y = (a*x[i]<<1) + b*x[i+1]
BUT: Can we really discard most significant bits and get roughly the same answer? YES!
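Why discarding most significant bits can be harmless is easiest to see with wraparound (modulo 2^16) arithmetic: as long as the final sum fits in the word, overflow in an intermediate term cancels out. The sketch below compares the two forms on already-computed 16-bit products `ax` and `bx`; the function names and the 16-bit word size are assumptions.

```c
#include <stdint.h>

/* Plain form: (ax + (bx >> 1)) << 1. The right shift throws away the least
   significant bit of bx. Right-shifting a negative value and converting an
   out-of-range value to int16_t are implementation-defined in C; this
   sketch relies on the usual arithmetic-shift, two's-complement behaviour. */
int16_t sum_plain(int16_t ax, int16_t bx)
{
    int16_t aligned = (int16_t)(bx >> 1);
    return (int16_t)(uint16_t)(((uint16_t)ax + (uint16_t)aligned) << 1);
}

/* Shift-absorbed form: (ax << 1) + bx. The left shift may push the top bit
   of ax out of the word, but the addition wraps modulo 2^16, so the result
   is still correct whenever the true sum fits in 16 bits. */
int16_t sum_absorbed(int16_t ax, int16_t bx)
{
    uint16_t shifted = (uint16_t)((uint16_t)ax << 1);
    return (int16_t)(uint16_t)(shifted + (uint16_t)bx);
}
```

Take ax = 20480 (1.25 with one integer bit) and bx = -24577 (just below -0.75 with no integer bits). The exact sum at IWL 0 is 16383. The plain form returns 16382 because the low bit of bx was shifted out, while the shift-absorbed form overflows on `ax << 1` and still returns 16383.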

Architectural Support
Fractional Multiplication with internal Left Shift
- Operand A has $\mathrm{IWL}_A$ integer bits and operand B has $\mathrm{IWL}_B$; the full product A*B has $\mathrm{IWL}_A + \mathrm{IWL}_B$ integer bits.
- Common occurrence (using IRP-SA): (A*B) << n
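A software model of the idea, as a sketch only: the slides do not give the instruction's exact semantics, so the 16-bit operands, the function name `fmls` and the choice to drop overflowing high bits are assumptions. The point is that the left shift is applied to the full 32-bit product before the result word is extracted, so n extra low-order bits survive.

```c
#include <stdint.h>

/* Fractional multiply with internal left shift by n (0 <= n <= 15).
   An ordinary fractional multiply returns bits 15..30 of the product;
   this version returns bits (15-n)..(30-n). The product is reinterpreted
   as unsigned so that the shift and truncation are well defined, and the
   final conversion to int16_t assumes two's-complement wraparound. */
int16_t fmls(int16_t a, int16_t b, unsigned n)
{
    int32_t product = (int32_t)a * (int32_t)b;        /* full precision */
    uint32_t window = (uint32_t)product >> (15u - n); /* slide window down */
    return (int16_t)(uint16_t)window;                 /* keep low 16 bits */
}
```

For a = 3000 and b = 5000, an ordinary fractional multiply followed by `<< 2` yields 1828, whereas `fmls(3000, 5000, 2)` yields 1831, which is closer to the exact value of about 1831.05.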

Experimental Results
Benchmarks:
- 4th-Order Cascaded/Parallel IIR Filter (IIR-C, IIR-P)
- (Normalized) Lattice Filter (LAT, NLAT)
- 128-Point Radix-2 Decimation-in-Time FFT (FFT-NR, FFT-MW)
- Levinson-Durbin Recursion (LEVDUR)
- 10x10 Matrix Multiply (MMUL10)
- Nonlinear Control (INVPEND)
- Trig Function (SIN)

SQNR Enhancement: FMLS and/or IRP-SA

What Is The Effect of "Shift Absorption"?

Experimental Results: Rotational Inverted Pendulum
U of T System Control Group Non-linear Testbench

Closed-Loop System Response: Rotational Inverted Pendulum
12-bit Controller Comparison:
- WC (worst-case evaluation): 32.8 dB
- IRP-SA: 41.1 dB
- IRP-SA w/ FMLS: 48.0 dB

128-Point Radix-2 FFT (Generated by MATLAB Real-Time Workshop)

Speedup?
Rotational Inverted Pendulum: Fractional Multiply Output Shift Relative Frequencies

...Yup!

Speedup* Using FMLS

SQNR Enhancement for various Output Shift Sets

Summary
- The Fractional Multiply with internal Left Shift (FMLS) operation can improve runtime and signal-to-noise performance: speedups of up to 35%, and SQNR enhancement equivalent to up to 2 bits, perhaps even 4 bits (depending on how you choose to measure it).
- Easy VLSI implementation, and easy for the compiler to use.

Future Directions
- Higher-Level Transformations:
    - Automatic Generation of Block-Floating-Point...
    - Quantization Error Feedback...
    - BOTH need a signal-flow-graph representation, and therefore probably a better DSP language than ANSI C.
- Variable Precision Arithmetic (how much precision does each operation need?)
