Accuracy-Configurable Adder for Approximate Arithmetic Designs

Presentation transcript:

Accuracy-Configurable Adder for Approximate Arithmetic Designs Andrew B. Kahng, Seokhyeong Kang VLSI CAD LABORATORY, UC San Diego 49th Design Automation Conference June 6th, 2012

Outline
Background and Motivation
Accuracy Configurable Adder Design
Experimental Setup and Results
Conclusions and Ongoing Works

Why Approximate Designs?
Threats to the traditional IC design approach:
Extreme variations: PVT variation uncertainty leads to design overhead
Reliability issues: hard errors (NBTI, latchup), soft errors (α-particle)
Cost: the cost (power/performance) of perfect accuracy is too high
Approximate designs: relaxing the requirement of correctness can dramatically reduce the cost of the design
What is the square root of 10? "A little more than three" versus "3.162278...": the approximation could be faster and more powerful

Previous Approximate Adders
Lu et al., IEEE Computer 2004: faster adder w/ shorter carry chain. High performance with small error rate. Large area overhead: not applicable for low-energy design
Zhu et al., TVLSI 2010: ETAI, accurate part + inaccurate part. Reduces error size. Error rate is high
Output accuracy is fixed → benefits can be limited by required accuracy

Our Work: Accuracy-Configurable Approximate Adder
How power benefits can be achieved: an accuracy-configurable design adapts to changing requirements by using a different mode in each situation
Accuracy-configurable approximate adder: approximate adder, followed by error correction (ECC-1) and error correction (ECC-2)
Mode 1: turn off ECC-1 and ECC-2 (accuracy: 90%)
Mode 2: turn off ECC-2 (accuracy: 95%)
Mode 3: turn on all ECC (accuracy: 100%)

Approximate Adder Implementation: 16-bit adder case
Carry chain is cut to reduce critical path delay
Sub-adders generate results of partial summation
Middle sub-adder improves accuracy (error 50% → 5.5%)

Approximate Adder Implementation: N-bit adder case
Probability of correct result:
Estimation over CLA (N=16)
K                  2      3      4      5      6
Min. clock cycle   0.5    0.65   0.75   0.83   0.89
Power              0.44   0.68   0.84   0.95   1.00
Pass rate          0.554  0.829  0.942  0.982  0.995
Area (only four values survive in the transcript, so their K columns are unclear): 0.87, 1.05, 1.12, 1.15
Approximate adder can be configured with "k"
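The cut carry chain described for the 16-bit case can be sketched in software to see where errors come from. The layout below is an assumption, since the slides do not spell it out: 2k-bit sub-adders that overlap by k bits, with every sub-adder except the lowest contributing only its upper k bits. The names `aca_add` and `pass_rate` are hypothetical, and the Monte Carlo estimate uses uniformly random inputs.

```python
import random


def aca_add(a: int, b: int, n: int = 16, k: int = 4) -> int:
    """Approximate n-bit sum from overlapping 2k-bit sub-adders (assumed layout)."""
    if n % k != 0 or n < 2 * k:
        raise ValueError("this sketch needs n to be a multiple of k and n >= 2k")
    window = (1 << (2 * k)) - 1  # mask for one 2k-bit sub-adder
    block = (1 << k) - 1         # mask for one k-bit result block
    # Lowest sub-adder: sum of bits [2k-1:0]; its carry-out is dropped (chain is cut).
    result = ((a & window) + (b & window)) & window
    # Each further sub-adder starts k bits higher and contributes only its upper k bits.
    for lo in range(k, n - 2 * k + 1, k):
        part = ((a >> lo) & window) + ((b >> lo) & window)
        result |= ((part >> k) & block) << (lo + k)
    return result


def pass_rate(n: int = 16, k: int = 4, trials: int = 20000, seed: int = 0) -> float:
    """Fraction of random input pairs for which the approximate sum is exact."""
    rng = random.Random(seed)
    mask = (1 << n) - 1
    hits = 0
    for _ in range(trials):
        a, b = rng.getrandbits(n), rng.getrandbits(n)
        hits += aca_add(a, b, n, k) == ((a + b) & mask)
    return hits / trials


if __name__ == "__main__":
    print(pass_rate(16, 4))
```

Under this assumed layout, the estimate for N=16, k=4 should land close to the 0.942 pass rate listed for K=4. The sketch only covers values of k that divide N, so it says nothing about the K=3, 5, 6 columns.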

Error Detection and Correction
Variable latency operation
Error can be detected and corrected with small overhead
Error detection: AND gates
Error correction: incrementor circuit
Error detection and correction can take more time than the critical path delay of a sub-adder, so the throughput can be reduced
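A behavioral sketch of detection and correction follows. The slide only names AND gates and an incrementor, so the details are assumptions: sub-adders are 2k bits wide and overlap by k bits, an error is flagged when all k lower bits of a sub-adder propagate and the previous stage produced a carry, and the fix adds one to that sub-adder's upper half. `aca_add_corrected` is a hypothetical name.

```python
from typing import Tuple


def aca_add_corrected(a: int, b: int, n: int = 16, k: int = 4) -> Tuple[int, int]:
    """Return (n-bit sum, number of corrections) for the assumed sub-adder layout."""
    if n % k != 0 or n < 2 * k:
        raise ValueError("this sketch needs n to be a multiple of k and n >= 2k")
    window = (1 << (2 * k)) - 1
    block = (1 << k) - 1
    first = (a & window) + (b & window)
    result = first & window
    carry = first >> (2 * k)  # carry-out of the lowest sub-adder (exact)
    corrections = 0
    for lo in range(k, n - 2 * k + 1, k):
        a_w, b_w = (a >> lo) & window, (b >> lo) & window
        # Detection: AND of the k propagate bits (a XOR b) in the lower half,
        # ANDed with the carry-out of the previous, already corrected stage.
        all_propagate = ((a_w ^ b_w) & block) == block
        upper = (a_w + b_w) >> k  # upper k bits plus this sub-adder's carry-out
        if all_propagate and carry:
            upper += 1  # correction: increment the upper half
            corrections += 1
        result |= (upper & block) << (lo + k)
        carry = upper >> k  # corrected carry-out, used by the next stage
    return result, corrections


if __name__ == "__main__":
    total, fixes = aca_add_corrected(0x00FF, 0x0001)
    print(hex(total), fixes)
```

Because each correction depends on the corrected carry of the stage below it, the fixes ripple from low to high bits in this sketch, which is one way to read the slide's remark that correction can take longer than a sub-adder's critical path.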

Accuracy Configuration with Pipeline
Each stage generates a result with different accuracy
Later stages can be turned off with power gating according to the accuracy requirement
Config.   Power gating     Accuracy   Power reduction
Mode-1    None             1.000      -11.5%
Mode-2    Stage 4          0.960      12.4%
Mode-3    Stage 3, 4       0.925      31.0%
Mode-4    Stage 2, 3, 4    0.900      51.6%
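The mode table can be read as a lookup: given a required accuracy, pick the mode that saves the most power while still meeting it. The sketch below does only that, using the four rows from the slide. The selection policy and the names `Mode`, `MODES`, and `select_mode` are illustrative assumptions, not something the slides describe.

```python
from typing import NamedTuple, Tuple


class Mode(NamedTuple):
    name: str
    gated_stages: Tuple[int, ...]  # pipeline stages turned off by power gating
    accuracy: float
    power_reduction_pct: float     # negative means overhead versus the baseline


# Rows copied from the slide's table.
MODES = (
    Mode("Mode-1", (), 1.000, -11.5),
    Mode("Mode-2", (4,), 0.960, 12.4),
    Mode("Mode-3", (3, 4), 0.925, 31.0),
    Mode("Mode-4", (2, 3, 4), 0.900, 51.6),
)


def select_mode(required_accuracy: float) -> Mode:
    """Pick the mode with the largest power reduction that meets the requirement."""
    candidates = [m for m in MODES if m.accuracy >= required_accuracy]
    if not candidates:
        raise ValueError("no mode reaches the required accuracy")
    return max(candidates, key=lambda m: m.power_reduction_pct)


if __name__ == "__main__":
    print(select_mode(0.95).name)
```

Note that Mode-1 carries a negative power reduction in the table, so full accuracy costs more power than the baseline adder. The savings come from the time spent in the gated modes.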

Experimental Setup and Metrics
Library: TSMC 65GP
Implementation: Synopsys Design Compiler
Simulation: Cadence NC-SIM
Input patterns: random data and actual data
Library preparation: Cadence Library Characterizer
Accuracy metrics:
Metric   Definition      Data type
ACCamp   1-|Rc-Re|/Rc    Amplitude data
ACCinf   1-Be/Bw         Information data
Rc and Re: correct and obtained results; Be: number of error bits; Bw: bit-width of data
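The two accuracy metrics are simple enough to compute directly. The sketch below assumes non-negative integer results and takes "number of error bits" to mean the count of bit positions where the correct and obtained results differ. The slide does not define the Rc = 0 case, so this version rejects it. The function names are hypothetical.

```python
def acc_amp(correct: int, obtained: int) -> float:
    """ACCamp = 1 - |Rc - Re| / Rc, for amplitude data."""
    if correct == 0:
        raise ValueError("ACCamp is undefined for Rc = 0 in this sketch")
    return 1 - abs(correct - obtained) / correct


def acc_inf(correct: int, obtained: int, bit_width: int) -> float:
    """ACCinf = 1 - Be / Bw, with Be counted as differing bit positions (assumed)."""
    mask = (1 << bit_width) - 1
    error_bits = bin((correct ^ obtained) & mask).count("1")
    return 1 - error_bits / bit_width


if __name__ == "__main__":
    # One wrong bit in a high position: large amplitude error, small bit-level error.
    print(acc_amp(0x0100, 0x0000), acc_inf(0x0100, 0x0000, 16))
```

The example shows why the slides keep both metrics: a single flipped bit costs little under ACCinf, but can wipe out the whole value under ACCamp.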

Approximate Adder Comparison
Accuracy vs. power consumption: image smoothing (Gaussian filter)
(a) Original image (b) Accurate adder (c) ACA (PSNR 24.5dB) (d) ETAI (25.3dB) (e) ETAII (16.2dB) (f) LU (11.1dB)
(c)~(f) consume 50% of the accurate adder's power
* ETAI cannot detect and correct errors

Approximate Adder Comparison
Accuracy vs. power consumption w/ voltage scaling (1.0V~0.6V)
ACA adder shows fine results (accuracy vs. power) on both ACCamp and ACCinf metrics

Accuracy Configuration and Power Saving
Power saving from voltage scaling + mode change (4-stage 32-bit adder case)
Accuracy: 1.0 → 0.9 (4X reduction)
Accuracy configuration w/ mode change is more effective than w/ voltage scaling

Accuracy Configuration and Power Saving
Power consumption when the accuracy requirement is varying (w/ SPEC 2006 benchmarks)
Average 30% power savings over no accuracy configuration

Conclusions and Ongoing Works
We proposed an accuracy-configurable approximate (ACA) adder, which can adapt to changing accuracy requirements
ACA can provide 30% power reduction with accuracy configuration during runtime
Ongoing works:
Accuracy-configurable design for other arithmetic units (multiplier, divider)
Automated synthesis flow (minimize power under the required accuracy)
Flow diagram labels: RTL, required accuracy, exact adder, approximate adder, synthesis, accuracy estimation

Thank You!

Accuracy-Configurable Approximate Design
Required accuracy can change during runtime
Idea of High-Efficiency Math highlighted by Intel Labs at ISSCC-2012: variable-precision floating point unit w/ accuracy tracking (variable-precision mantissa: 24-bit → 12-bit → 6-bit as needed)
Accuracy-configurable design adapts to changing requirements, maximizing benefits of the approximate design paradigm
The required accuracy … according to the applications. Intel Labs presented. As shown in this figure, In this work, we propose an accuracy-configurable approximate design.