Abbas Rahimi‡, Luca Benini†, and Rajesh Gupta‡ ‡CSE, UC San Diego

Slides:



Advertisements
Similar presentations
Tunable Sensors for Process-Aware Voltage Scaling
Advertisements

ILP: IntroductionCSCE430/830 Instruction-level parallelism: Introduction CSCE430/830 Computer Architecture Lecturer: Prof. Hong Jiang Courtesy of Yifeng.
Mehmet Can Vuran, Instructor University of Nebraska-Lincoln Acknowledgement: Overheads adapted from those provided by the authors of the textbook.
Superscalar processors Review. Dependence graph S1S2 Nodes: instructions Edges: ordered relations among the instructions Any ordering-based transformation.
Zhiguo Ge, Weng-Fai Wong, and Hock-Beng Lim Proceedings of the Design, Automation, and Test in Europe Conference, 2007 (DATE’07) April /4/17.
VARIUS: A Model of Process Variation and Resulting Timing Errors for Microarchitects Sarangi et al Prateeksha Satyamoorthy CS
CENTRAL PROCESSING UNIT
IVF: Characterizing the Vulnerability of Microprocessor Structures to Intermittent Faults Songjun Pan 1,2, Yu Hu 1, and Xiaowei Li 1 1 Key Laboratory of.
Decomposition of Instruction Decoder for Low Power Design TingTing Hwang Department of Computer Science Tsing Hua University.
Memory Redundancy Elimination to Improve Application Energy Efficiency Keith Cooper and Li Xu Rice University October 2003.
DLX Instruction Format
EECC551 - Shaaban #1 Spring 2004 lec# Definition of basic instruction blocks Increasing Instruction-Level Parallelism & Size of Basic Blocks.
VHDL Synthesis of a MIPS-32 Processor Bryan Allen Dave Chandler Nate Ransom.
The Processor Andreas Klappenecker CPSC321 Computer Architecture.
Pipelining. Overview Pipelining is widely used in modern processors. Pipelining improves system performance in terms of throughput. Pipelined organization.
Pipelining By Toan Nguyen.
Analysis of Instruction-level Vulnerability to Dynamic Voltage and Temperature Variations ‡ Computer Science and Engineering, UC San Diego variability.org.
1 A Variability-Aware OpenMP Environment for Efficient Execution of Accuracy-Configurable Computation on Shared-FPU Processor Clusters Abbas Rahimi, Andrea.
Andrea Marongiu Luca Benini ETH Zurich Daniele Cesarini University of Bologna.
UC San Diego / VLSI CAD Laboratory Toward Quantifying the IC Design Value of Interconnect Technology Improvement Tuck-Boon Chan, Andrew B. Kahng, Jiajia.
1 EE 587 SoC Design & Test Partha Pande School of EECS Washington State University
Safe Overclocking Safe Overclocking of Tightly Coupled CGRAs and Processor Arrays using Razor © 2012 Guy Lemieux Alex Brant, Ameer Abdelhadi, Douglas Sim,
Procedure Hopping: a Low Overhead Solution to Mitigate Variability in Shared-L1 Processor Clusters Abbas Rahimi.
Low Power – High Speed MCML Circuits (II)
CSE 340 Computer Architecture Summer 2014 Basic MIPS Pipelining Review.
1 Designing a Pipelined Processor In this Chapter, we will study 1. Pipelined datapath 2. Pipelined control 3. Data Hazards 4. Forwarding 5. Branch Hazards.
Outline Introduction: BTI Aging and AVS Signoff Problem
CS /02 Semester II Help Session IIA Performance Measures Colin Tan S
1 Energy-Efficient Register Access Jessica H. Tseng and Krste Asanović MIT Laboratory for Computer Science, Cambridge, MA 02139, USA SBCCI2000.
UltraSPARC III Hari P. Ananthanarayanan Anand S. Rajan.
D ATA P ATH OF A PROCESSOR (MIPS) Module 1.1 : Elements of computer system UNIT 1.
Hrushikesh Chavan Younggyun Cho Structural Fault Tolerance for SOC.
Patricia Gonzalez Divya Akella VLSI Class Project.
Power Analysis of Embedded Software : A Fast Step Towards Software Power Minimization 指導教授 : 陳少傑 教授 組員 : R 張馨怡 R 林秀萍.
CSE431 L06 Basic MIPS Pipelining.1Irwin, PSU, 2005 MIPS Pipeline Datapath Modifications  What do we need to add/modify in our MIPS datapath? l State registers.
Introduction to Computer Organization Pipelining.
UNIT III -PIPELINE.
CSE 340 Computer Architecture Summer 2016 Understanding Performance.
Unified Adaptivity Optimization of Clock and Logic Signals Shiyan Hu and Jiang Hu Dept of Electrical and Computer Engineering Texas A&M University.
Stored Program Concept Learning Objectives Learn the meaning of the stored program concept The processor and its components The fetch-decode-execute and.
Protection in Virtual Mode
Abbas Rahimi‡, Luca Benini†, and Rajesh Gupta‡ ‡CSE, UC San Diego
Supervised Learning Based Model for Predicting Variability-Induced Timing Errors Xun Jiao, Abbas Rahimi, Balakrishnan Narayanaswamy, Hamed Fatemi, Jose.
Morgan Kaufmann Publishers
Performance of Single-cycle Design
ELEN 468 Advanced Logic Design
Morgan Kaufmann Publishers The Processor
CDA 3101 Spring 2016 Introduction to Computer Organization
A Review of Processor Design Flow
The University of British Columbia
Abbas Rahimi, Luca Benini, Rajesh K. Gupta
Pipelining and Vector Processing
Challenges in Nanoelectronics: Process Variability
Superscalar Processors & VLIW Processors
A. Rahimi, A. Marongiu, P. Burgio, R. K. Gupta, L. Benini
CS 704 Advanced Computer Architecture
Rocky K. C. Chang 6 November 2017
Circuit Design Techniques for Low Power DSPs
Overheads for Computers as Components 2nd ed.
Overview Parallel Processing Pipelining
†UCSD, ‡UCSB, EHTZ*, UNIBO*
Post-Silicon Calibration for Large-Volume Products
Central Processing Unit
Overview What are pipeline hazards? Types of hazards
Pipelining: Basic Concepts
Reducing pipeline hazards – three techniques
MIPS Assembly.
Low Power Digital Design
Measuring the Gap between FPGAs and ASICs
Guest Lecturer: Justin Hsia
Presentation transcript:

Analysis of ISA and Instruction Sequence Vulnerability to Dynamic Voltage and Temperature Variations Abbas Rahimi‡, Luca Benini†, and Rajesh Gupta‡ ‡CSE, UC San Diego †DEIS, Università di Bologna http://mesl.ucsd.edu http://variability.org

Our Cross-layer Vision Instruction-level Vulnerability (ILV) Sequence-level Vulnerability (SLV) Procedure-level Vulnerability (PLV) Our Cross-layer Vision [ILV] A. Rahimi, L. Benini, R. K. Gupta, “Analysis of Instruction-level Vulnerability to Dynamic Voltage and Temperature Variations,” DATE, 2012. [SLV] A. Rahimi, L. Benini, R. K. Gupta, “Application-Adaptive Guardbanding to Mitigate Static and Dynamic Variability,” TC, 2013.

Agenda Dynamic Voltage and Temperature Variation Delay Variability Among Pipeline Stages Instruction-level Vulnerability (ILV) Sequence-level Vulnerability (SLV) Classification of Instructions Classification of Sequence of Instructions Adaptive Guardbanding Utilizing ILV and SLV Experimental Results

Increasing Dynamic Variations Increasing dynamic environmental variations in ambient condition such as temperature fluctuations and supply voltage droops. Dynamic Variations contain high-frequency and low-frequency components which occur locally as well as globally across the die.

Quantifying Effects of Operating Conditions We analyze the effect of a full range of operating conditions (a temperature range of -40C−125C, and a voltage range of 0.72V−1.1V) on the delay and power of LEON ‎processor in 65nm TSMC. Critical path (ns) Dynamic variations cause the critical path delay to increase by a factor of 6.1×. Consequently, a large conservative guardband into the operating frequency is needed to ensure the error-free operation in presence of the dynamic variations.

Delay Variability Among Pipeline Stages T= 125°C The execute and memory parts are very sensitive to voltage and temperature variations, and also exhibit a large number of critical paths in comparison to the rest of processor. Similarly, we anticipate that the instructions that significantly exercise the execute and memory stages are likely to be more vulnerable to voltage and temperature variations Instruction-level Vulnerability (ILV) VDD= 1.1V

Methodology for ISA-level and Sequence-level Analysis For SPARC V8 instructions (V, T, F) are varied and ILVi is evaluated for every instructioni with random operands SLVi is evaluated for a high-frequent sequencei of instructions

ILV and SLV Metadata The ILV (SLV) for each instructioni (sequencei) at every operating condition is quantified: where Ni (Mi) is the total number of clock cycles in Monte Carlo simulation of instructioni (sequencei) with random operands. Violationj indicates whether there is a violated stage at clock cyclej or not. ILVi (SLVi) defines as the total number of violated cycles over the total simulated cycles for the instructioni (sequencei).

Classification of Instructions, cont. ILV at 0.88V, while varying temperature: (V, T) (0.88V, -40°C) (0.88V, 0°C) (0.88V, 125°C) Cycle time (ns) 1 1.02 1.06 1.08 1.10 1.12 1.04 1.16 1.18 Logical & Arithmetic add and or sll sra srl sub xnor xor Mem load 0.824 0.707 0.796 store 0.847 0.743 0.823 Mul.&Div mul 0.996 0.064 0.027 0.017 0.065 0.018 0.876 0.016 06 div 0.991 0.989 0.984 0.994 0.973 Instructions are partitioned into three main classes: (i) Logical & arithmetic; (ii) Memory; (iii) Multiply & divide. The 1st class shows an abrupt behavior when the clock cycle is slightly varied, mainly because the path distribution of the exercised part by this class is such that most of the paths have the same length, then we have all-or-nothing effect, which implies that either all instructions within this class fail or all make it.

Classification of Instructions ILV at 0.72V, while varying temperature: Corners (0.72V, -40°C) (0.72V, 0°C) (0.72V, 125°C) Cycle time (ns) 4.10 4.12 4.14 4.16 3.58 3.60 3.62 3.64 3.66 2.88 2.90 2.92 2.94 2.98 3.00 3.20 Logical & Arithmetic add 1 and or sll sra srl sub xnor xor Mem load 0.823 0.796 store 0.847 Mul.&Div mul 0.995 0.996 0.994 0.998 0.997 div 0.812 0.993 0.991 All instruction classes act similarly across the wide range of operating conditions: as the cycle time increases gradually, the ILV becomes 0, firstly for the 1st class, then for the 2nd class, and finally for the 3rd class. For every operating conditions ILV (3rd Class) ≥ ILV (2nd Class) ≥ ILV (1st Class)

Classification of Sequence of Instructions SLV at (0.81V, 125C) CT (ns) Seq1 Seq2 Seq3 Seq4 Seq5 Seq6 Seq7 Seq8 Seq9 Seq10 Seq11 Seq12 Seq13 Seq14 Seq15 Seq16 Seq17 Seq18 Seq19 Seq20 1.26 1 1.27 0.690 1.28 1.29 1.30 1.31 1.32 1.33 0.878 0.811 0.881 0.880 0.884 0.892 0.877 0.859 0.879 0.758 0.883 0.952 0.805 0.810 1.34 0.366 0.515 0.512 0.393 0.429 01 03 0.403 0.407 1.35 The top 20 high-frequent sequences (Seq1-Seq20) are extracted from 80 Billion dynamic instructions of 32 benchmarks. Sequences are classified into two classes based on their similarities in SLV values: Class I (Seq1-Seq19) is a mixture of all types of instructions including the memory, arithmetic/logical, and control instructions. Class II (Seq20) only consists of the arithmetic/logical instructions.

Classification of Sequence of Instructions (V,T) = (0.81V, 125°C) (V,T) = (0.72V, 125°C) CT (ns) Class I Class II 1.26 1 1.78 1.27 0.69 1.79 0.58 1.28 1.80 1.29 1.81 1.30 1.82 1.31 1.83 1.32 1.84 0.81 1.33 1.85 0.13 1.34 1.86 1.35 1.87 For every operating conditions: SLV (Class I) ≥ SLV (Class II) The sequence classification is consistent across operating corners. SLV value of two classes of the sequences at the same corner and with the same cycle time is not equal. Sequences in Class I need higher guardbands compared to Class II, because in addition of ALU's critical paths, the critical paths of memory are activated (for the load/store instructions) as well as the critical paths of integer code conditions (for the control instructions).

ILV (3rd Class) ≥ ILV (2nd Class) ≥ ILV (1st Class) ILV and SLV ? For every operating conditions: ILV (3rd Class) ≥ ILV (2nd Class) ≥ ILV (1st Class) SLV (Class I) ≥ SLV (Class II)

Adaptive Guardbanding Utilizing ILV and SLV We define an adaptive clock scaling for each class of instructions/ sequences to mitigate the conservative inter- and intra-corner guardbanding. At the runtime, in every cycle, the PLUT module sends the desired frequency to the adaptive clocking circuit utilizing the characterized SLV metadata of the current sequence and the operating condition monitored by CPM.

Effectiveness of Adaptive Guardbanding Full layout results on 45nm TSMC technology confirms: For the intolerant applications, in comparison to the worst-case design, the adaptive guardbanding eliminates the inter-corner guardbanding by up to 40% for sequences of Class II, and 37% for sequences of Class I. Simultaneously, it reduces intra-corner guardbanding among the two classes of the sequences by up to 5%. It achieves up to 87% performance improvement for error-tolerant (probabilistic) applications in comparison to the traditional worst-case design. It incurs only 0.022%, 0.012%, and 0.034% overheads for the total area, leakage power, and total power respectively.

Conclusion The notion of ILV and SLV to dynamic voltage and temperature variations is defined. Based on that, ISA partitioned into three classes: (i) ALU, (ii) MEMORY, (iii) MULTPLY & DIVIDE Sequence of instructions are partitioned into two classes: (i) a mixture of all types including ALU, MEM, CONTROL, etc. (ii) only ALU type instructions Leveraging these classifications across adaptive guardbanding techniques enables up to 40% speedup for error-intolerant (traditional) applications and 87% speedup for error-tolerant (probabilistic) application, in 45nm TSMC technology.

Thank you! http://mesl.ucsd.edu http://variability.org

Classification of Sequence of Instructions SLV at (0.81V, -40C). CT (ns) Seq1 Seq2 Seq3 Seq4 Seq5 Seq6 Seq7 Seq8 Seq9 Seq10 Seq11 Seq12 Seq13 Seq14 Seq15 Seq16 Seq17 Seq18 Seq19 Seq20 1.36 1 0.475 1.37 1.38 1.39 1.40 1.41 1.42 0.878 0.811 0.881 0.880 0.884 0.892 0.877 0.859 0.988 0.758 0.882 0.883 0.815 0.870 0.807 0.810 1.43 01 0.479 0.396 06 04 0.901 0.805 0.131 1.44