Abbas Rahimi, Luca Benini, Rajesh K. Gupta

Presentation transcript:

Hierarchically Focused Guardbanding: An Adaptive Approach to Mitigate PVT Variations and Aging
Abbas Rahimi, Luca Benini, Rajesh K. Gupta
UC San Diego and Università di Bologna

Outline
  Device Variability: process, voltage, temperature, and aging
  Resilient Techniques
  Hierarchically Focused Guardbanding
  Analysis Flow for Timing Error Rate
  Parametric Model Fitting
  Hierarchical Sensors Observability
  Online Utilization of HFG
  Throughput improvement
  Conclusion
14-Nov-18 Rajesh K. Gupta / UC San Diego

Ever-increasing PVTA Variations
Variability in transistor characteristics (PVTA) is a major challenge in nanoscale CMOS:
  Static Process variation: effective transistor channel length and threshold voltage
  Dynamic variations: Temperature fluctuations, supply Voltage droops, and device Aging (NBTI, HCI)
To handle variations, designers use conservative guardbands, which leads to a loss of operational efficiency.
[Figure: clock period divided into actual circuit delay and guardband; labels: across-wafer frequency, VCC droop, temperature, aging]

Resilient Techniques
Sense & Adapt
  Observation using in situ monitors (Razor, EDS) with cycle-by-cycle corrections (leveraging CMOS knobs or replay)
Predict & Prevent
  Relying on external or replica monitors
  Model-based rule → derive an adaptive guardband to prevent errors
[Figure labels: Sense (detect), Adapt (correct); Sensors, Model, Prevent]

Our Resilient View
Sense & Adapt
  We have done cross-layer vulnerability analysis: manifestation of variability from instruction level to task level.
Model & Prevent
  In this work, we present Hierarchically Focused Guardbanding (HFG), a model-based rule to derive the guardband adaptively, avoiding PVTA-induced timing errors.
References
  [ILV] A. Rahimi, L. Benini, R. K. Gupta, “Analysis of Instruction-level Vulnerability to Dynamic Voltage and Temperature Variations,” DATE, 2012.
  [SLV] A. Rahimi, L. Benini, R. K. Gupta, “Application-Adaptive Guardbanding to Mitigate Static and Dynamic Variability,” IEEE Trans. on Computers, 2013.
  [PLV] A. Rahimi, L. Benini, R. K. Gupta, “Procedure Hopping: a Low Overhead Solution to Mitigate Variability in Shared-L1 Processor Clusters,” ISLPED, 2012.
  [TLV] A. Rahimi, A. Marongiu, P. Burgio, R. K. Gupta, L. Benini, “Variation-Tolerant OpenMP Tasking on Tightly-Coupled Processor Clusters,” DATE, 2013.

Contributions
A new high-level model for the Timing Error Rate of various integer and floating-point functional units (FUs) in the presence of PVTA variations.
  Online: a model-based rule to derive the guardband from the PVTA sensor readings
  Offline: identifying vulnerable FUs
The notion of Hierarchically “Focused” Guardbanding (HFG), which is guided by online utilization of the model in view of monitors, observation granularity, and reaction times.
Applying HFG to a GPU at two distinct granularities:
  Fine-grained: instruction-by-instruction monitoring and adaptive guardbanding
  Coarse-grained: kernel-level monitoring and adaptive guardbanding

HFG Analysis Flow for TER
The model takes into account:
  PVTA parameter variations
  Clock frequency
  Physical details of placed-and-routed FUs in 45nm TSMC technology
Analyzed FUs:
  10 32-bit integer FUs
  15 single-precision floating-point FUs (fully compatible with the IEEE 754 standard)
A full permutation of PVTA parameters and clock frequency is applied.
For each FUi working with tclk under given PVTA variations, we define the Timing Error Rate (TER) as:
$$\mathrm{TER} = \frac{\sum \mathrm{CriticalPaths}}{\sum \mathrm{Paths}} \times 100$$
CriticalPaths are those paths with a negative slack, which cannot meet the setup time of the flip-flops with a clock period of tclk under certain PVTA variations, and Σ Paths is the total number of paths in FUi.
After the back-end optimizations, during sign-off, we calculate TER by analysis of the FU.

PVTA parameter variations:
Parameter          Start Point   End Point   Step     # of Points
Voltage            0.88V         1.10V       0.01V    23
Temperature        0°C           120°C       10°C     13
Process (σWID)     0%            9.6%        3.2%     4
Aging (∆Vth)       0mV           100mV       25mV     5
tclk               0.2ns         5.0ns       -        25
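The TER definition and the characterization sweep on this slide can be illustrated with a minimal Python sketch. It is not the authors' sign-off flow: the path delays passed to `timing_error_rate` are hypothetical numbers, the `setup_ns` parameter and all function names are assumptions made for illustration, and the evenly spaced tclk sweep (which works out to a 0.2 ns step) is inferred from the stated end points and point count rather than given on the slide.

```python
from math import prod


def timing_error_rate(path_delays_ns, t_clk_ns, setup_ns=0.0):
    """Return TER in percent: the share of paths with negative slack at t_clk.

    path_delays_ns is a hypothetical list of path delays for one FU at one
    PVTA corner; a real flow would take these from static timing analysis.
    """
    if not path_delays_ns:
        raise ValueError("an FU needs at least one timing path")
    critical = sum(1 for d in path_delays_ns if t_clk_ns - setup_ns - d < 0)
    return 100.0 * critical / len(path_delays_ns)


def sweep(start, stop, points):
    """Evenly spaced values from start to stop, both inclusive."""
    step = (stop - start) / (points - 1)
    return [round(start + i * step, 6) for i in range(points)]


# Sweep ranges and point counts as listed in the parameter table.
GRID = {
    "voltage_v": sweep(0.88, 1.10, 23),
    "temperature_c": sweep(0.0, 120.0, 13),
    "process_sigma_wid_pct": sweep(0.0, 9.6, 4),
    "aging_dvth_mv": sweep(0.0, 100.0, 5),
    "t_clk_ns": sweep(0.2, 5.0, 25),  # step inferred, not stated
}

# Size of the full permutation of PVTA parameters and clock period.
NUM_POINTS = prod(len(values) for values in GRID.values())

if __name__ == "__main__":
    print(timing_error_rate([0.5, 0.9, 1.2, 1.3], t_clk_ns=1.0))
    print(NUM_POINTS)
```

Under these assumptions the full permutation covers 23 × 13 × 4 × 5 × 25 = 149,500 characterization points per FU.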

Parametric Model Fitting
[Figure: PVTA parameters and tclk feed the HFG ASIC analysis flow for TER; the resulting TER is grouped into classes of TER and fitted with linear discriminant analysis]

Classes of TER:
TER range             TER class
TER = 0%              Class0 (C0)
0% < TER <= 33%       ClassLow (CL)
33% < TER <= 66%      ClassMedium (CM)
66% < TER <= 100%     ClassHigh (CH)

Parametric Model
We used supervised learning (linear discriminant analysis) to generate a parametric model at the FU level that relates PVTA parameter variations and tclk to classes of TER.
On average, across all FUs, the resubstitution error is 0.036, meaning the models classify nearly all data correctly.
For extra characterization points, the model makes correct estimates for 97% of out-of-sample data. The remaining 3% are misclassified into the high-error-rate class, CH, and thus still receive a safe guardband.
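The TER classes that serve as labels for the model can be written down directly. The sketch below only covers the binning and the notion of a "safe" misclassification; it does not reproduce the linear discriminant analysis fit itself. The names `ter_class`, `SEVERITY`, and `is_conservative` are assumptions introduced for illustration.

```python
# Classes ordered from least to most severe, as in the classes-of-TER table.
SEVERITY = ["C0", "CL", "CM", "CH"]


def ter_class(ter_percent):
    """Map a TER value in percent to its class label."""
    if not 0.0 <= ter_percent <= 100.0:
        raise ValueError("TER must be between 0 and 100 percent")
    if ter_percent == 0.0:
        return "C0"
    if ter_percent <= 33.0:
        return "CL"
    if ter_percent <= 66.0:
        return "CM"
    return "CH"


def is_conservative(predicted, actual):
    """True when a prediction is at least as severe as the actual class.

    A misclassification toward a higher-error class (for example CH) only
    costs performance, because it leads to a larger guardband than needed.
    """
    return SEVERITY.index(predicted) >= SEVERITY.index(actual)


if __name__ == "__main__":
    print(ter_class(12.5), is_conservative("CH", "CL"))
```

This mirrors the slide's observation that out-of-sample points misclassified into CH are still safe: the model overestimates the error class and never underestimates it for those points.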

Delay Variation and TER Characterization
At design time, the delay of the FP adder has a large uncertainty of [0.73ns, 1.32ns], since the actual values of the PVTA parameters are unknown.

Hierarchical Sensors Observability
The question is which mix of monitors would be useful.
The more sensors we provide for an FU, the more its conservative guardband can be reduced.
Sensor overheads:
  In-situ PVT sensors impose 1−3% area overhead [Bowman’09]
  Five replica PVT sensors increase area by 0.2% [Lefurgy’11]
  The banks of 96 NBTI aging sensors occupy less than 0.01% of the core's area [Singh’11]
The guardband of the FP adder can be reduced by up to 8% (P_sensor), 24% (PA_sensors), 28% (PAT_sensors), and 44% (PATV_sensors).

Online Utilization of HFG
The control system tunes the clock frequency through an online model-based rule.
To support fast computation in the controller, the parametric model generates a distinct Look-Up Table (LUT) for each FU.
We apply HFG to the architecture at two granularities:
  Fine-grained: instruction-by-instruction monitoring and adaptation, in which the PATV sensor signals come from individual FUs
  Coarse-grained: kernel-level monitoring, which uses representative PATV sensors for the entire execution stage of the pipeline
Since TER characterization considers the static critical paths (which might not be activated during execution of certain dynamic inputs), the model always returns an upper bound on the actual TER, so the tclk returned by the LUTs guarantees the target TER.
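A LUT-based rule of this kind can be sketched in a few lines of Python. Everything concrete here is an assumption for illustration: the FU names, the binned sensor keys, and the tclk values are invented, and taking the maximum over FUs for the coarse-grained case is one plausible reading of "representative sensors for the entire execution stage", not a description of the authors' controller.

```python
from typing import Dict, Iterable, Tuple

# Quantized sensor reading: (process bin, aging bin, temperature bin, voltage bin).
PATV = Tuple[int, int, int, int]

# Hypothetical per-FU LUTs: binned PATV reading -> clock period in ns that
# meets the target TER. The values are made up for this sketch.
LUTS: Dict[str, Dict[PATV, float]] = {
    "int_add": {(0, 0, 0, 0): 0.6, (1, 1, 1, 1): 0.9},
    "fp_add": {(0, 0, 0, 0): 0.8, (1, 1, 1, 1): 1.3},
}


def worst_case_tclk(fu: str) -> float:
    """Most conservative clock period stored for an FU."""
    return max(LUTS[fu].values())


def fine_grained_tclk(fu: str, reading: PATV) -> float:
    """Instruction-level rule: look up the FU used by the current instruction.

    Readings without a LUT entry fall back to the worst-case period, which
    keeps the full conservative guardband.
    """
    return LUTS[fu].get(reading, worst_case_tclk(fu))


def coarse_grained_tclk(fus: Iterable[str], reading: PATV) -> float:
    """Kernel-level rule: one reading for the whole execution stage.

    The clock must be safe for every FU, so take the slowest requirement.
    """
    return max(fine_grained_tclk(fu, reading) for fu in fus)


if __name__ == "__main__":
    print(fine_grained_tclk("int_add", (0, 0, 0, 0)))
    print(coarse_grained_tclk(["int_add", "fp_add"], (0, 0, 0, 0)))
```

The sketch also shows why finer granularity can help throughput: with the same reading, an integer-add instruction can run at its own shorter period, while the kernel-level rule is bound by the slowest FU.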

Throughput benefit of HFG
With kernel-level monitoring, the throughput increases by 70% on average when the PE moves from the P_sensor-only scenario to the PATV_sensors scenario. The target TER is set to 0 to suit error-intolerant applications.
Instruction-by-instruction monitoring and adaptation improves the throughput by 1.8×−2.1×, depending on the PATV sensor configuration and the kernel's instructions.

Conclusion
We present a model ‡ and its usage for online variation-aware resource management, as well as design-time analysis of vulnerable functional units, through an accurate 45nm TSMC flow.
The model is used as an adaptive resource management technique to proactively prevent timing errors by applying focused guardbanding.
We demonstrate the effectiveness of HFG on a GPU architecture at two granularities of observation and adaptation: (i) fine-grained instruction-level; and (ii) coarse-grained kernel-level.
‡ Publicly available for download at: http://mesl.ucsd.edu/site/PVTA_MODELS/models.htm

Thank You!
ERC MultiTherman, NSF Variability Expedition