Carnegie Mellon Adaptive Mapping of Linear DSP Algorithms to Fixed-Point Arithmetic Lawrence J. Chang Inpyo Hong Yevgen Voronenko Markus Püschel Department.

Slides:



Advertisements
Similar presentations
Algorithm Analysis Input size Time I1 T1 I2 T2 …
Advertisements

Finite Difference Discretization of Elliptic Equations: 1D Problem Lectures 2 and 3.
Hashing.
Fast Algorithms For Hierarchical Range Histogram Constructions
Offline Adaptation Using Automatically Generated Heuristics Frédéric de Mesmay, Yevgen Voronenko, and Markus Püschel Department of Electrical and Computer.
90-723: Data Structures and Algorithms for Information Processing Copyright © 1999, Carnegie Mellon. All Rights Reserved. 1 Lecture 2: Basics Data Structures.
EE663 Image Processing Histogram Equalization Dr. Samir H. Abdul-Jauwad Electrical Engineering Department King Fahd University of Petroleum & Minerals.
MATH 685/ CSI 700/ OR 682 Lecture Notes
Theoretical Program Checking Greg Bronevetsky. Background The field of Program Checking is about 13 years old. Pioneered by Manuel Blum, Hal Wasserman,
Parameterized Timing Analysis with General Delay Models and Arbitrary Variation Sources Khaled R. Heloue and Farid N. Najm University of Toronto {khaled,
Visual Recognition Tutorial
ECIV 201 Computational Methods for Civil Engineers Richard P. Ray, Ph.D., P.E. Error Analysis.
Non-Linear Statistical Static Timing Analysis for Non-Gaussian Variation Sources Lerong Cheng 1, Jinjun Xiong 2, and Prof. Lei He 1 1 EE Department, UCLA.
1 Optimisation Although Constraint Logic Programming is somehow focussed in constraint satisfaction (closer to a “logical” view), constraint optimisation.
Complexity Analysis (Part I)
CS107 Introduction to Computer Science
Evaluating Hypotheses
Development of Empirical Models From Process Data
Code and Decoder Design of LDPC Codes for Gbps Systems Jeremy Thorpe Presented to: Microsoft Research
Data Flow Analysis Compiler Design Nov. 8, 2005.
Lec 5 Feb 10 Goals: analysis of algorithms (continued) O notation summation formulas maximum subsequence sum problem (Chapter 2) three algorithms image.
Tutorial 10 Iterative Methods and Matrix Norms. 2 In an iterative process, the k+1 step is defined via: Iterative processes Eigenvector decomposition.
Automatic Generation of Customized Discrete Fourier Transform IPs Grace Nordin, Peter A. Milder, James C. Hoe, Markus Püschel Carnegie Mellon University.
Data Flow Analysis Compiler Design Nov. 8, 2005.
Review Rong Jin. Comparison of Different Classification Models  The goal of all classifiers Predicating class label y for an input x Estimate p(y|x)
A Low-Power Low-Memory Real-Time ASR System. Outline Overview of Automatic Speech Recognition (ASR) systems Sub-vector clustering and parameter quantization.
Lecture II-2: Probability Review
Kathy Grimes. Signals Electrical Mechanical Acoustic Most real-world signals are Analog – they vary continuously over time Many Limitations with Analog.
Prepared by: Hind J. Zourob Heba M. Matter Supervisor: Dr. Hatem El-Aydi Faculty Of Engineering Communications & Control Engineering.
- 1 - EE898-HW/SW co-design Hardware/Software Codesign “Finding right combination of HW/SW resulting in the most efficient product meeting the specification”
LINEAR REGRESSION Introduction Section 0 Lecture 1 Slide 1 Lecture 5 Slide 1 INTRODUCTION TO Modern Physics PHYX 2710 Fall 2004 Intermediate 3870 Fall.
Ch 8.1 Numerical Methods: The Euler or Tangent Line Method
Ch 8.3: The Runge-Kutta Method
Numerical Computations in Linear Algebra. Mathematically posed problems that are to be solved, or whose solution is to be confirmed on a digital computer.
SPL: A Language and Compiler for DSP Algorithms Jianxin Xiong 1, Jeremy Johnson 2 Robert Johnson 3, David Padua 1 1 Computer Science, University of Illinois.
03/12/20101 Analysis of FPGA based Kalman Filter Architectures Arvind Sudarsanam Dissertation Defense 12 March 2010.
Short Vector SIMD Code Generation for DSP Algorithms
Fixed-Point Arithmetics: Part II
High Performance Linear Transform Program Generation for the Cell BE
Automatic Generation of Staged Geometric Predicates Aleksandar Nanevski, Guy Blelloch and Robert Harper PSCICO project
Combined Uncertainty P M V Subbarao Professor Mechanical Engineering Department A Model for Propagation of Uncertainty ….
Practical Statistical Analysis Objectives: Conceptually understand the following for both linear and nonlinear models: 1.Best fit to model parameters 2.Experimental.
1 Integer transform Wen - Chih Hong Graduate Institute of Communication Engineering National Taiwan University, Taipei,
Analysis of Algorithms
Carnegie Mellon Generating High-Performance General Size Linear Transform Libraries Using Spiral Yevgen Voronenko Franz Franchetti Frédéric de Mesmay Markus.
Round-off Errors.
ECE 8443 – Pattern Recognition ECE 8423 – Adaptive Signal Processing Objectives: Deterministic vs. Random Maximum A Posteriori Maximum Likelihood Minimum.
Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display. 1 Chapter.
Copyright © Cengage Learning. All rights reserved. CHAPTER 11 ANALYSIS OF ALGORITHM EFFICIENCY ANALYSIS OF ALGORITHM EFFICIENCY.
 Embedded Digital Signal Processing (DSP) systems  Specification with floating-point data types  Implementation in fixed-point architectures  Precision.
Simulation is the process of studying the behavior of a real system by using a model that replicates the behavior of the system under different scenarios.
CPS120: Introduction to Computer Science Operations Lecture 9.
1 Markov Decision Processes Infinite Horizon Problems Alan Fern * * Based in part on slides by Craig Boutilier and Daniel Weld.
Orthogonalization via Deflation By Achiya Dax Hydrological Service Jerusalem, Israel
ESPL 1 Wordlength Optimization with Complexity-and-Distortion Measure and Its Application to Broadband Wireless Demodulator Design Kyungtae Han and Brian.
Carnegie Mellon High-Performance Code Generation for FIR Filters and the Discrete Wavelet Transform Using SPIRAL Aca Gačić Markus Püschel José M. F. Moura.
Numerical Methods.
1 Markov Decision Processes Infinite Horizon Problems Alan Fern * * Based in part on slides by Craig Boutilier and Daniel Weld.
Automatic Evaluation of the Accuracy of Fixed-point Algorithms Daniel MENARD 1, Olivier SENTIEYS 1,2 1 LASTI, University of Rennes 1 Lannion, FRANCE 2.
Random Interpretation Sumit Gulwani UC-Berkeley. 1 Program Analysis Applications in all aspects of software development, e.g. Program correctness Compiler.
Carnegie Mellon Program Generation with Spiral: Beyond Transforms This work was supported by DARPA DESA program, NSF-NGS/ITR, NSF-ACR, Mercury Inc., and.
Nonlinear State Estimation
Ch 8.2: Improvements on the Euler Method Consider the initial value problem y' = f (t, y), y(t 0 ) = y 0, with solution  (t). For many problems, Euler’s.
Naïve Bayes Classifier April 25 th, Classification Methods (1) Manual classification Used by Yahoo!, Looksmart, about.com, ODP Very accurate when.
Custom Reduction of Arithmetic in Linear DSP Transforms S. Misra, A. Zelinski, J. C. Hoe, and M. Püschel Dept. of Electrical and Computer Engineering Carnegie.
Confidence Intervals Cont.
Fang Fang James C. Hoe Markus Püschel Smarahara Misra
Chapter 7. Classification and Prediction
CSCI1600: Embedded and Real Time Software
Introduction to Data Structures
Presentation transcript:

Carnegie Mellon Adaptive Mapping of Linear DSP Algorithms to Fixed-Point Arithmetic Lawrence J. Chang Inpyo Hong Yevgen Voronenko Markus Püschel Department of Electrical & Computer Engineering Carnegie Mellon University Supported by NSF awards ACR , SYS , and ITR/NGS

Carnegie Mellon Motivation  Embedded DSP applications (SW and HW) typically use fixed- point arithmetic for reduced power/area and better throughput  Typically DSP algorithms are manually mapped to fixed-point implementation  time consuming, non-trivial task  difficult trade-off between range (to avoid overflow) and precision  usually done using simulations (not an exact science)  Our goal: automatically generate overflow-proof, and accurate fixed-point code (SW) for linear DSP kernels using the SPIRAL code generator

Carnegie Mellon Outline  Background  Approach using SPIRAL  Mapping to Fixed Point Code (Affine Arithmetic)  Accuracy Measure  Probabilistic Analysis  Results

Carnegie Mellon Background: SPIRAL  Generates fast, platform-adapted code for linear DSP transforms (DFT, DCTs, DSTs, filters, DWT, …)  Adapts by searching in the algorithm space and implementation space for the best match to the platform  Floating-point code only  Our goal: extend SPIRAL to generate overflow-proof, accurate fixed-point code S P I R A L DSP transform Formula Generator Formula Compiler Search Engine adapted implementation Performance Eval. runtime

Carnegie Mellon Background: Transform Algorithms  Reduce computation cost from O(n 2 ) to O(n log n) or below  For every transform there are many algorithms  An algorithm can be represented as  Sparse matrix factorization  Data flow DAG (Directed Acyclic Graph)  Program t1 = a * x2 t2 = t1 + x0 t3 = -s * x1 + c * x3 y3 = t2 + t3 y0 = t2 – t3 … Multiplication by constant s addition

Carnegie Mellon  Uses integers to represent fractional numbers:  Operations  Dynamic range:  -2 IB... 2 IB -1  much smaller than in floating-point ) risk of overflow  Problem: for a given application, choose IB (and thus FB) to avoid overflow  We present an algorithm to automatically choose, application dependent, “best” IB (and thus FB) for linear DSP kernels Background: Fixed-Point Arithmetic integer bitsfractional bitssign register width: RW = 1 + IB + FB (typically 16 or 32) IBFB a+b addition multiplication Example (RW=9, IB=FB=4) = = a·b » fb

Carnegie Mellon Outline  Background  Approach using SPIRAL  Mapping to Fixed Point Code (Affine Arithmetic)  Accuracy Measure  Probabilistic Analysis  Results

Carnegie Mellon Overview of Approach  Extension of SPIRAL code generator  Fixed-point mapping: maps floating-point code into fixed-point code, given the input range  Use SPIRAL to automatically search for the fixed-point implementation  with highest accuracy, or  with fastest runtime DSP transform Formula Generator Formula Compiler Search Engine adapted implementation Fixed-Point Mapping Performance Ev runtimeaccuracy input range

Carnegie Mellon Tool: Affine Arithmetic  Basic idea: propagate ranges through the computation (interval arithmetic, IA); each variable becomes an interval  Problem: leads to range overestimation, since correlations between variables are not considered  Solution: affine arithmetic (AA) [1]  represents range as affine expression  captures correlations IA: A(x) = [-M,M] AA: A(x) = c 0 ·E 0 +c 1 ·E 1 +… E i are ranges, e.g.,E i =[-1,1] [1] Fang Fang, Rob A. Rutenbar, Markus Püschel, and Tsuhan Chen Toward Efficient Static Analysis of Finite- Precision Effects in DSP Applications via Affine Arithmetic Modeling Proc. DAC 2003, pp

Carnegie Mellon Algorithm 1 [Range Propagation]  Input: Program with additions and multiplications by constants, ranges of inputs  Output: Ranges of outputs and intermediate results  Denote input ranges by x i with i2 [1, N]  We represent all variables v as affine expressions A:  Traverse all variables from input to output, and compute A: where c i are constants  Variable ranges R=[R min,R max ] are given by

Carnegie Mellon Program t1 = x1 + x2 t2 = x1 - x2 y1 = 1.2 * t1 y2 = -2.3 * t2 y3 = y1 + y2 Affine Expressions A(t1) = x1 + x2 A(t2) = x1 - x2 A(y1) = 1.2 x x2 A(y2) = -2.3 x x2 A(y3) = -1.1 x x2 Computed Ranges R(t1) = [-2,2] R(t2) = [-2,2] R(y1) = [-2.4,2.4] R(y2) = [-2.6,2.6] R(y3) = [-4.6,4.6] Example Given Ranges R(x1) = [-1,1] R(x2) = [-1,1] ranges are exact (not worst cases)

Carnegie Mellon Algorithm 2 [Error Propagation]  Input: Program with additions and multiplications by constants, ranges of inputs  Output: Error bounds on outputs and intermediate results  Denote by  i in [-1,1] independent random error variables  We augment affine expressions A with error terms:  Traverse all variables from input to output, and compute A  : where f i are error magnitude constants  Maximum error is given by f new error variable introduced

Carnegie Mellon Fixed-Point Mapping  Input:  floating point program (straightline code) for linear transform  ranges of input  Output: fixed-point program  Algorithm:  Determine the affine expressions of all intermediate and output variables; compute their maximal ranges  Mode 1: Global format  the largest range determines the fixed point format globally  Mode 2: Local format  allow different formats for all intermediate and output variables  Convert floating-point constants into fixed-point constants  Convert floating-point operations into fixed-point operations  Output fixed-point code

Carnegie Mellon Accuracy Measure  Goal: evaluate a SPIRAL generated fixed-point program for accuracy to enable search for best = most accurate algorithm  Choose input independent accuracy measure: matrix norm matrix for exact (floating-point) program matrix for fixed-point program max row sum norm Note: can be used to derive input dependent error bounds

Carnegie Mellon Outline  Background  Approach using SPIRAL  Mapping to Fixed Point Code (Affine Arithmetic)  Accuracy Measure  Probabilistic Analysis  Results

Carnegie Mellon Probabilistic Analysis leads to a range estimate of Fixed point mapping chooses range conservatively, namely: However: not all values in [-M,M] are equally likely Analysis:  Assume xi are uniformly distributed, independent random variables  Use Central Limit Theorem: A(x) is approximately Gaussian  Extend Fixed-Point Mapping to include a probabilistic mode (range satisfied with given probability p)

Carnegie Mellon Overestimation due to Central Limit Theorem affine expression with: 4 terms 16 terms 64 terms assuming input/error variables are independent

Carnegie Mellon Outline  Background  Approach using SPIRAL  Mapping to Fixed Point Code (Affine Arithmetic)  Accuracy Measure  Probabilistic Analysis  Results

Carnegie Mellon Accuracy Histogram DCT, size 32 10,000 random algorithms Spiral generated  Spread 10x, most within 2x  Need for search

Carnegie Mellon Global vs. Local Mode several transforms local mode a factor of better several transforms

Carnegie Mellon Local vs. Gaussian Local Mode 99.99% confidence for each variable gain: about a factor of 2.5-4

Carnegie Mellon Summary  An automatic method to generate accurate, overflow-proof fixed- point code for linear DSP kernels  Using SPIRAL to find the most accurate algorithm: 2x  Floating-point to fixed-point using affine arithmetic analysis (global, local: 2x, probabilistic: 4x)  16x  Current work:  Extend approach to handle loop code and thus arbitrary size transforms  Refine probabilistic mode to get statements as: prob(overflow) < p  Further down the road:  Fixed-point mapping compiler for more general numerical DSP kernels/applications