Analysis of Instruction-level Vulnerability to Dynamic Voltage and Temperature Variations ‡ Computer Science and Engineering, UC San Diego variability.org.

Presentation transcript:

Analysis of Instruction-level Vulnerability to Dynamic Voltage and Temperature Variations
Abbas Rahimi‡, Luca Benini†, Rajesh Gupta‡
‡ Computer Science and Engineering, UC San Diego (variability.org, mesl.ucsd.edu)
† Dipartimento di Elettronica, Informatica e Sistemistica, Università di Bologna (micrel.deis.unibo.it)

Outline
– Dynamic variations
– Dynamic variability among pipeline stages
– Methodology and quantifying instruction-level vulnerability (ILV)
– Classification of instructions
– Adaptive clock scaling utilizing ILV
– Conclusion

Motivation: Increasing Dynamic Variations
Dynamic environmental variations in ambient conditions, such as temperature fluctuations and supply voltage droops, are increasing. Dynamic variations contain high-frequency and low-frequency components, which occur locally as well as globally across the die.

[Figure: sources of variability (voltage, temperature, aging, process) and the layers they affect across the hardware/software stack (HW, instruction, compiler, SW).]
What happens to instructions under dynamic voltage and temperature variations? The instruction is the bridge to the software side.

Quantifying effects of operating conditions
We analyze the effect of a full range of operating conditions on the performance and power of the LEON-3 processor, which is compliant with the SPARC V8 architecture. Specifically, we used a temperature range of -40°C to 125°C and a voltage range of 0.72V to 1.1V. Across this range, dynamic variations increase the critical path delay by a factor of 6.1X. Consequently, a large, conservative guard-band on the operating frequency is needed to ensure error-free operation in the presence of dynamic variations.
[Figure: critical path delay (ns) across operating conditions.]
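As a rough illustration of what such a guard-band costs, the sketch below converts a delay spread into a worst-case clock frequency. Only the 6.1X factor comes from the slide. The 1.0 ns best-case critical path is a placeholder chosen for the arithmetic, and the function name is hypothetical.

```python
def safe_frequency_mhz(best_case_delay_ns: float, delay_factor: float) -> float:
    """Clock frequency (MHz) that still meets timing when the critical
    path is delay_factor times slower than its best-case delay."""
    worst_case_delay_ns = best_case_delay_ns * delay_factor
    return 1000.0 / worst_case_delay_ns


BEST_CASE_DELAY_NS = 1.0  # placeholder value, not taken from the slides
DELAY_FACTOR = 6.1        # delay spread reported on the slide

best_mhz = safe_frequency_mhz(BEST_CASE_DELAY_NS, 1.0)
worst_mhz = safe_frequency_mhz(BEST_CASE_DELAY_NS, DELAY_FACTOR)

# Fraction of the best-case frequency given up by clocking for the worst corner.
lost_fraction = 1.0 - worst_mhz / best_mhz
print(f"best: {best_mhz:.0f} MHz, guard-banded: {worst_mhz:.0f} MHz, "
      f"lost: {lost_fraction:.1%}")
```

Under these assumptions, a design clocked for the slowest corner gives up roughly five sixths of the frequency available at the fastest corner, which is the margin the rest of the talk tries to recover.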

Variability among pipeline stages
The execute and memory stages are very sensitive to voltage and temperature variations, and they also contain a large number of critical paths compared with the rest of the processor. We therefore anticipate that instructions that heavily exercise the execute and memory stages are more vulnerable to voltage and temperature variations. This motivates the notion of instruction-level vulnerability (ILV).
[Figure annotation: VDD = 1.1V, T = 125°C.]

Methodology for ISA-level Vulnerability
– Post-synthesis simulation of the 32-bit RISC LEON-3 processor
– Inputs: voltage variation, temperature variation, and SPARC V8 instructions with random operands
– Output: probability of failure (PoF) for every (voltage, temperature, frequency) combination, which gives the ILV to dynamic variations

Quantifying ILV
To quantify the ILV to voltage and temperature variations, we define the probability of failure (PoF) for each instruction i, where:
– N_i is the total number of clock cycles that the Monte Carlo simulation takes to execute instruction i with random operands
– Violation_j indicates whether there is a violated stage at clock cycle j (if any of the analyzed stages has one or more violated flip-flops at clock cycle j, we consider that stage a violated stage at cycle j)
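The equation itself did not survive in this transcript. One reading that is consistent with the two definitions above is PoF_i = (1/N_i) · Σ_{j=1..N_i} Violation_j, that is, the fraction of simulated cycles in which at least one stage is violated. The sketch below computes that quantity. The formula is an assumption, and the function names and nested-list input layout are hypothetical.

```python
from typing import Sequence


def stage_violated(flip_flop_violations: Sequence[bool]) -> bool:
    """A stage is violated if one or more of its flip-flops is violated."""
    return any(flip_flop_violations)


def cycle_violated(stages: Sequence[Sequence[bool]]) -> bool:
    """Violation_j: True if any analyzed stage is violated in this cycle."""
    return any(stage_violated(stage) for stage in stages)


def probability_of_failure(cycles: Sequence[Sequence[Sequence[bool]]]) -> float:
    """Assumed PoF_i: violated cycles divided by N_i, the total cycle count."""
    n_cycles = len(cycles)
    if n_cycles == 0:
        raise ValueError("at least one simulated cycle is required")
    return sum(cycle_violated(cycle) for cycle in cycles) / n_cycles


# Four simulated cycles, two analyzed stages each; one flip-flop of the
# first stage is violated in the second cycle only.
example_cycles = [
    [[False, False], [False]],
    [[False, True], [False]],
    [[False, False], [False]],
    [[False, False], [False]],
]
example_pof = probability_of_failure(example_cycles)  # 0.25
```

One such value would be computed per instruction and per (voltage, temperature, frequency) point.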

Classification of instructions, cont.
[Table: probability of failure of the ISA at 0.88V while varying temperature, for the corners (0.88V, -40°C), (0.88V, 0°C), and (0.88V, 125°C), as a function of cycle time (ns). Rows: Logical & Arithmetic (add, and, or, sll, sra, srl, sub, xnor, xor); Mem (load, store); Mul. & Div (mul, div).]
Instructions are partitioned into three main classes: 1st, logical and arithmetic; 2nd, memory; 3rd, multiply and divide. The 1st class changes abruptly when the clock cycle is varied slightly, mainly because most of the paths exercised by this class have the same length. The result is an all-or-nothing effect: either all instructions in this class fail or all of them pass.

Classification of instructions
[Table: probability of failure of the ISA at 0.72V while varying temperature, for the corners (0.72V, -40°C), (0.72V, 0°C), and (0.72V, 125°C), as a function of cycle time (ns). Rows: Logical & Arithmetic (add, and, or, sll, sra, srl, sub, xnor, xor); Mem (load, store); Mul. & Div (mul, div).]
All instruction classes behave similarly across the wide range of operating conditions: as the cycle time increases gradually, the PoF drops to 0 first for the 1st class, then for the 2nd class, and finally for the 3rd class:
PoF(3rd class) ≥ PoF(2nd class) ≥ PoF(1st class)
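The three-way partition and the ordering above can be written down as a small lookup and check. This is a minimal sketch: the mnemonics are the row labels from the slide tables, while the class numbering as integers, the function names, and the sample PoF values are illustrative assumptions and not measured data.

```python
from typing import Dict

# Class 1: logical & arithmetic, class 2: memory, class 3: multiply & divide.
CLASS_MEMBERS = {
    1: ("add", "and", "or", "sll", "sra", "srl", "sub", "xnor", "xor"),
    2: ("load", "store"),
    3: ("mul", "div"),
}

INSTRUCTION_CLASS: Dict[str, int] = {
    mnemonic: class_id
    for class_id, mnemonics in CLASS_MEMBERS.items()
    for mnemonic in mnemonics
}


def instruction_class(mnemonic: str) -> int:
    """Return the vulnerability class (1, 2 or 3) of an instruction label."""
    return INSTRUCTION_CLASS[mnemonic.lower()]


def ordering_holds(pof_by_class: Dict[int, float]) -> bool:
    """Check PoF(3rd class) >= PoF(2nd class) >= PoF(1st class)."""
    return pof_by_class[3] >= pof_by_class[2] >= pof_by_class[1]


# Made-up PoF values at a single (voltage, temperature, cycle time) point.
sample_pof = {1: 0.0, 2: 0.2, 3: 1.0}
sample_ok = ordering_holds(sample_pof)
```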

Adaptive clock scaling utilizing ILV
We define an adaptive clock cycle for each class of instructions to mitigate conservative guard-banding, not only within a fixed corner but also across corners. ILV is a valuable mechanism for alleviating the guard-band:
I. Within a fixed corner, by knowing which class of instructions is running, the processor can adapt the guard-band accordingly.
II. Across corners, the processor adjusts its guard-band by using a low-overhead variability observer.
Adaptive clock scaling can therefore decide on the processor clock speed at a very fine grain: it looks at the fetched instructions, tracks their entry into the pipeline stages, and at the same time monitors the current corner with low-overhead monitoring hardware.
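The decision described above can be sketched as a table lookup: the monitored corner selects a row, and the most demanding instruction class currently in the pipeline selects the cycle time. The table layout, the function name, and every cycle-time value below are hypothetical placeholders. The slides do not specify how the hardware implements this decision.

```python
from typing import Dict, Iterable, Tuple

Corner = Tuple[str, str]  # (voltage, temperature) as reported by the observer

# Hypothetical shortest error-free cycle time (ns) per corner and class.
CYCLE_TIME_NS: Dict[Corner, Dict[int, float]] = {
    ("0.88V", "125C"): {1: 1.0, 2: 1.2, 3: 1.5},
    ("0.72V", "125C"): {1: 2.0, 2: 2.6, 3: 3.4},
}


def select_cycle_time_ns(corner: Corner, classes_in_flight: Iterable[int]) -> float:
    """Pick the cycle time that is safe for every class now in the pipeline.

    If the pipeline holds no classified instruction, fall back to the
    slowest (most conservative) cycle time of the corner.
    """
    per_class = CYCLE_TIME_NS[corner]
    needed = [per_class[class_id] for class_id in set(classes_in_flight)]
    if not needed:
        return max(per_class.values())
    return max(needed)


corner_now = ("0.88V", "125C")
only_logic = select_cycle_time_ns(corner_now, [1, 1, 1])   # class 1 only
with_multiply = select_cycle_time_ns(corner_now, [1, 2, 3])  # a mul/div in flight
```

Taking the maximum over the classes in flight, instead of assuming class 3 is always the slowest, keeps the sketch correct even if a table entry were to break the PoF ordering.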

Effectiveness of Adaptive clock scaling
This figure shows how a procedure consisting of various classes of instructions can benefit from this technique under different operating conditions: it plots the performance improvement, relative to the traditional worst-case design, when the processor runs a procedure that consists only of specific classes.

Conclusion
The concept of instruction-level vulnerability to dynamic voltage and temperature variations is defined. Based on it, all exercised instructions in the integer pipeline of LEON-3 are partitioned into three classes for the full range of operating conditions:
– (i) the logical and arithmetic instructions
– (ii) the memory instructions
– (iii) the multiply and divide instructions
Leveraging this classification in conjunction with less intrusive variability observers provides an opportunity to enhance processor performance by 1.1X-5.5X in TSMC 65nm technology.

Thank you! micrel.deis.unibo.it