AGING AWARE DESIGN OF A MICROPROCESSOR BY DUTY CYCLE BALANCING ABHINAY RAJ KALAMBUR SABARAJAN - 50133612 Guided by: Professor Sridhar Ramalingam.

Slides:



Advertisements
Similar presentations
Subthreshold SRAM Designs for Cryptography Security Computations Adnan Gutub The Second International Conference on Software Engineering and Computer Systems.
Advertisements

Final Project : Pipelined Microprocessor Joseph Kim.
ISA Issues; Performance Considerations. Testing / System Verilog: ECE385.
Combining Statistical and Symbolic Simulation Mark Oskin Fred Chong and Matthew Farrens Dept. of Computer Science University of California at Davis.
1 Cleared for Open Publication July 30, S-2144 P148/MAPLD 2004 Rea MAPLD 148:"Is Scaling the Correct Approach for Radiation Hardened Conversions.
1/1/ /e/e eindhoven university of technology Microprocessor Design Course 5Z008 Dr.ir. A.C. (Ad) Verschueren Eindhoven University of Technology Section.
THE MIPS R10000 SUPERSCALAR MICROPROCESSOR Kenneth C. Yeager IEEE Micro in April 1996 Presented by Nitin Gupta.
Sim-alpha: A Validated, Execution-Driven Alpha Simulator Rajagopalan Desikan, Doug Burger, Stephen Keckler, Todd Austin.
Embedded Software Optimization for MP3 Decoder Implemented on RISC Core Yingbiao Yao, Qingdong Yao, Peng Liu, Zhibin Xiao Zhejiang University Information.
University of Michigan Electrical Engineering and Computer Science University of Michigan Electrical Engineering and Computer Science August 20, 2009 Enabling.
1 Lecture 17: Basic Pipelining Today’s topics:  5-stage pipeline  Hazards and instruction scheduling Mid-term exam stats:  Highest: 90, Mean: 58.
Lifetime Reliability-Aware Task Allocation and Scheduling for MPSoC Platforms Lin Huang, Feng Yuan and Qiang Xu Reliable Computing Laboratory Department.
Chapter XI Reduced Instruction Set Computing (RISC) CS 147 Li-Chuan Fang.
Computational Astrophysics: Methodology 1.Identify astrophysical problem 2.Write down corresponding equations 3.Identify numerical algorithm 4.Find a computer.
Spring 07, Jan 16 ELEC 7770: Advanced VLSI Design (Agrawal) 1 ELEC 7770 Advanced VLSI Design Spring 2007 Introduction Vishwani D. Agrawal James J. Danaher.
Mehdi Amirijoo1 Power estimation n General power dissipation in CMOS n High-level power estimation metrics n Power estimation of the HW part.
Instruction Set Architecture (ISA) for Low Power Hillary Grimes III Department of Electrical and Computer Engineering Auburn University.
EET 4250: Chapter 1 Performance Measurement, Instruction Count & CPI Acknowledgements: Some slides and lecture notes for this course adapted from Prof.
Restrictive Compression Techniques to Increase Level 1 Cache Capacity Prateek Pujara Aneesh Aggarwal Dept of Electrical and Computer Engineering Binghamton.
Author: D. Brooks, V.Tiwari and M. Martonosi Reviewer: Junxia Ma
University of Michigan Electrical Engineering and Computer Science 1 StageNet: A Reconfigurable CMP Fabric for Resilient Systems Shantanu Gupta Shuguang.
Architectural and Compiler Techniques for Energy Reduction in High-Performance Microprocessors Nikolaos Bellas, Ibrahim N. Hajj, Fellow, IEEE, Constantine.
Pipelining By Toan Nguyen.
8/16/2015\course\cpeg323-08F\Topics1b.ppt1 A Review of Processor Design Flow.
Analysis of Instruction-level Vulnerability to Dynamic Voltage and Temperature Variations ‡ Computer Science and Engineering, UC San Diego variability.org.
Advanced Computing and Information Systems laboratory Device Variability Impact on Logic Gate Failure Rates Erin Taylor and José Fortes Department of Electrical.
Low-Power Wireless Sensor Networks
1 Performance Evaluation of Computer Systems and Networks Introduction, Outlines, Class Policy Instructor: A. Ghasemi Many thanks to Dr. Behzad Akbari.
Drowsy Caches: Simple Techniques for Reducing Leakage Power Authors: ARM Ltd Krisztián Flautner, Advanced Computer Architecture Lab, The University of.
1 Computer Architecture Research Overview Rajeev Balasubramonian School of Computing, University of Utah
C OMPUTER O RGANIZATION AND D ESIGN The Hardware/Software Interface 5 th Edition Chapter 1 Computer Abstractions and Technology Sections 1.5 – 1.11.
Modern Floor-planning Based on B ∗ -Tree and Fast Simulated Annealing Paper by Chen T. C. and Cheng Y. W (2006) Presented by Gal Itzhak
1 Exploring Custom Instruction Synthesis for Application-Specific Instruction Set Processors with Multiple Design Objectives Lin, Hai Fei, Yunsi ACM/IEEE.
A Decompression Architecture for Low Power Embedded Systems Lekatsas, H.; Henkel, J.; Wolf, W.; Computer Design, Proceedings International.
Radix-2 2 Based Low Power Reconfigurable FFT Processor Presented by Cheng-Chien Wu, Master Student of CSIE,CCU 1 Author: Gin-Der Wu and Yi-Ming Liu Department.
Lecture 10: Logic Emulation October 8, 2013 ECE 636 Reconfigurable Computing Lecture 13 Logic Emulation.
LA-LRU: A Latency-Aware Replacement Policy for Variation Tolerant Caches Aarul Jain, Cambridge Silicon Radio, Phoenix Aviral Shrivastava, Arizona State.
Lecture 13: Logic Emulation October 25, 2004 ECE 697F Reconfigurable Computing Lecture 13 Logic Emulation.
A Hybrid Design Space Exploration Approach for a Coarse-Grained Reconfigurable Accelerator Farhad Mehdipour, Hamid Noori, Hiroaki Honda, Koji Inoue, Kazuaki.
EE5970 Computer Engineering Seminar Spring 2012 Michigan Technological University Based on: A Low-Power FPGA Based on Autonomous Fine-Grain Power Gating.
Morgan Kaufmann Publishers
1 chapter 1 Computer Architecture and Design ECE4480/5480 Computer Architecture and Design Department of Electrical and Computer Engineering University.
1 Copyright  2001 Pao-Ann Hsiung SW HW Module Outline l Introduction l Unified HW/SW Representations l HW/SW Partitioning Techniques l Integrated HW/SW.
1 Energy-Efficient Register Access Jessica H. Tseng and Krste Asanović MIT Laboratory for Computer Science, Cambridge, MA 02139, USA SBCCI2000.
Lecture 14: Caching, cont. EEN 312: Processors: Hardware, Software, and Interfacing Department of Electrical and Computer Engineering Spring 2014, Dr.
Static Identification of Delinquent Loads V.M. Panait A. Sasturkar W.-F. Fong.
An Automated Development Framework for a RISC Processor with Reconfigurable Instruction Set Extensions Nikolaos Vassiliadis, George Theodoridis and Spiridon.
Integrated Microsystems Lab. EE372 VLSI SYSTEM DESIGNE. Yoon 1-1 Panorama of VLSI Design Fabrication (Chem, physics) Technology (EE) Systems (CS) Matel.
Seok-jae, Lee VLSI Signal Processing Lab. Korea University
Sunpyo Hong, Hyesoon Kim
33 rd IEEE International Conference on Computer Design ICCD rd IEEE International Conference on Computer Design ICCD 2015 Improving Memristor Memory.
CML Path Selection based Branching for CGRAs ShriHari RajendranRadhika Thesis Committee : Prof. Aviral Shrivastava (Chair) Prof. Jennifer Blain Christen.
Taniya Siddiqua, Paul Lee University of Virginia, Charlottesville.
Real-World Pipelines Idea –Divide process into independent stages –Move objects through stages in sequence –At any given times, multiple objects being.
Niagara: A 32-Way Multithreaded Sparc Processor Kongetira, Aingaran, Olukotun Presentation by: Mohamed Abuobaida Mohamed For COE502 : Parallel Processing.
Introduction to Performance Tuning Chia-heng Tu PAS Lab Summer Workshop 2009 June 30,
EE 653: Group #3 Impact of Drowsy Caches on SER Arjun Bir Singh Mohammad Abdel-Majeed Sameer G Kulkarni.
History of Computers and Performance David Monismith Jan. 14, 2015 Based on notes from Dr. Bill Siever and from the Patterson and Hennessy Text.
PipeliningPipelining Computer Architecture (Fall 2006)
Basic Computer Organization and Design
Raghuraman Balasubramanian Karthikeyan Sankaralingam
Design-Space Exploration
Morgan Kaufmann Publishers
Architecture & Organization 1
Fundamentals of Computer Science Part i2
Architecture & Organization 1
Scalable Memory-Less Architecture for String Matching With FPGAs
A High Performance SoC: PkunityTM
Chapter 1 Introduction.
Srinivas Neginhal Anantharaman Kalyanaraman CprE 585: Survey Project
Presentation transcript:

AGING AWARE DESIGN OF A MICROPROCESSOR BY DUTY CYCLE BALANCING ABHINAY RAJ KALAMBUR SABARAJAN Guided by: Professor Sridhar Ramalingam

Overview Introduction Device Aging My work Aging aware Instruction Set Encoding(ArISE) Duty Cycle Balancing HP cacti 6.5 for cache memory analysis Algorithm for ArISE based on duty cycle balancing Simplescalar Results of SimpleScalar Simulation Reference

INTRODUCTION The degradation of CMOS devices over the lifetime. The Negative Bias Temperature Instability (NBTI). Microprocessors fabricated at nanoscale nodes are exposed to accelerated transistor aging. Device delays increase over time reducing the Mean Time To Failure (MTTF) of the processor. Novel Aging-aware Instruction Set Encoding methodology (ArISE) that improves the instruction encoding. based on Duty Cycle balancing which directly relates to the variation in threshold voltages of CMOS transistors.

DEVICE AGING  Transistors do age… like humans  Rate of aging related to stress on the devices.  Circuit aging refers to the deterioration of circuit performance over time.  Aging wasn't significant until the Moore's Law pushed the transistor channel lengths to 0.18 µm.  use of extremely small channel lengths and higher operating frequencies has elevated circuit aging.

Different Aging Phenomena  NBTI - Transistor holds same data for long periods.  HCI - Related to amount of switching.  Oxide Breakdown - Breakdown of gate-oxide between semi-conductor slowly over time.  Electron-Migration - Electron flow in same direction.

My Work  Technique to improve MTTF and reduce delay in microprocessors due to device aging.  Analysis of Out-of-order 11 stage pipeline microprocessor.

My Work(Contd.)  Fetch, decode and Execution unit most important.  Cache memory, decode and Execution unit – analyzed for duty cycle balancing.  Done at a system level using Cacti 6.5 for analyzing Cache memories.  Optimum Duty Cycle.  A system level iterative algorithm for developing new ISE.  Obtained ISE is implemented in Simplescalar.  Provides Aging aware solution for Cache memory and Decode stages.

The aging of devices are proportional to the device stress time and the switching frequency of the internal nodes. A highly biased duty cycle ratio, will have a heavy stress and the aging of the device will be accelerated. Microarchitecture solution to balance the utilization and the aging stress. lifetime behaviors of the microprocessor and divide them into two groups, invalid and valid paths. Finding optimum duty cycle value from the above method. Reduces aging effect significantly. Duty Cycle Balancing

Caches are extremely important for single core processor performance. CACTI is a memory modeling tool that currently comes from HP Labs. CACTI is a tool that allows you to explore the performance, area, and power impacts of cache memories. CACTI can correctly evaluate the power, area, and timing overhead of adding the power management units, including the penalties of wakeup latency and wakeup energy. Using CACTI 6.5, cache memory was simulated according to our requirements. Trade off between Cacti and Simcache  Cacti helps in segregating cache memory based on pipeline stages.  Simcache segregates cache memory into I-cache and D-cache. HP cacti 6.5 for cache memory analysis

Decoding and Execution stages Decoding and Execution are two stages which must be analyzed at the gate level and device level for duty cycle analysis. Analysis done by using cadence. For execution part, a simple ALU along with mux and register blocks are considered. Cadence provided values which was indeed close to the real world scenarios.

Duty Cycle Data

Aging aware Instruction Set Encoding(ArISE) Instruction Set Encoding (ISE) has a considerable impact on the wearout of the decoding stages. An aging-aware opcode for each instruction in such a way, that the overall lifetime of the decoding stages is improved. only the representing bit patterns are modified, while the opcode length remains unchanged. Improving the ISE is a challenging task. most encodings infer modifications in the gate-level implementation of the stages.

Simplescalar Modern processor are incredibly complex and are becoming increasingly hard to evaluate. SimpleScalar tool set - fast, flexible, and accurate simulation of modern processors. implement the SimpleScalar architecture (a close derivative of the MIPS architecture). model applications that simulate real programs running on a range of modern processors and systems. can emulate the Alpha, PISA, ARM, and x86 instruction sets.

Algorithm for ArISE based on duty cycle balancing ITERATIVE APPROACH 1. Select a starting instruction set encoding (ISE): ISE old 2. Compute duty cycle 3. While solution is not good enough or number of steps < limit do 3.0. Adjust temperature T 3.1. Generate ISE new 3.2. compute duty cycle 3.3. If duty cycle not equal to optimum duty cycle then GoTo else ISE old = ISE new ISE best = ISE old /* store best ISE */ GoTo 3.1. EndIf End.

Instruction Set Encoding based on Iterative approach Basic MIPS ISE - Eg ADD – MUL – DIV – Iterative Approach Encoding using Smaller bits ADD – MUL – DIV – This model gave an improvement of 10% in delay and MTTF.

Delay values for various ISE models ISE 1 ISE 2 ISE 3 No of Hits Delay (ns) No of Hits Delay (ns) No of Hits Delay (ns)

Algorithm for ArISE based on duty cycle balancing HIERARCHICAL APPROACH 1. Partition instructions into groups and subgroups /*Instruction groups, subgroups inside groups, */ /*instructions insides subgroups, etc.*/ 2. Rank each group (and subgroups subsequently) 2.1 Based on their hardware-impact 2.2 If there are groups/subgroups with same ranking then use occurrence frequency to rank these 3. For the coarsest down to the finest hierarchy-level do For the highest ranked group down to the lowest do 3.1 Find the best encoding for the elements within that group /*Either exhaustive or with simulated annealing*/ 3.2 Stop as soon as duty cycle is satisfactory. Endfor

Instruction Set Encoding based on Hierarchical approach Instructions are categorized into groups and subgroups Based on hardware impact and occurrence frequency Group 1 ADD, MOV, AND ADDU, MOVL, OR Sub-group1 Sub-group2 Group 2 SUB, XOR, MULT NOR, DIV, SRL Sub-group1 Sub-group2

Delay values for various ISE models ISE 1 ISE 2 ISE 3 No of Hits Delay (ns) No of Hits Delay (ns) No of Hits Delay (ns)

Results of SimpleScalar Simulation User Interface of Sim Out-of-order simulator

Results of SimpleScalar Simulation Duty Cycle results on Simple Scalar.

Delay model for both algorithms

RESULT The Optimum value of Duty cycle I got was 64% including valid and invalid paths. By using simplescalar and Hierarchical algorithm, I was able to reach as much as 67%. Initially, I used the iterative algorithm which improved the delay and MTTF by 10% which was not satisfactory. Later I used the Hierarchical algorithm with which I was able to improve the delay and MTTF by 54%. With proper grouping and subgrouping of instructions and current device level aging inhibition techniques, the delay and MTTF can be improved as much as to 80%.

REFERENCES [1] ArISE: Aging-aware instruction set encoding for lifetime improvement by Oboril, Fabian; Tahoori, Mehdi, th Asia and South Pacific Design Automation Conference (ASP-DAC), [2] Aging-Aware Instruction Cache Design by Duty Cycle Balancing by Tao Jin; Shuai Wang, 2012 IEEE Computer Society Annual Symposium on VLSI, [3] System-Level Modeling And Reliability Analysis Of Microprocessor Systems, Dissertation Presented by Chang-Chih Chen. [4] Efficient Instruction Encoding for Automatic Instruction Set Design of Configurable ASIPs, by Lee, Jong-eun; Choi, Kiyoung; Dutt, Nikil, Proceedings of the 2002 IEEE/ACM international conference on computer-aided design, 11/2002. [5] The SimpleScalar Tool Set, Version 2.0, by Doug burger and Todd M Austin, SimpleScalar LLC. [6] Aging-Aware Design of Microprocessor Instruction Pipelines, Fabian Oboril and Mehdi B. Tahoori, Ieee Transactions On Computer-Aided Design Of Integrated Circuits And Systems, Vol. 33, No. 5, May 2014.

[7] Aging-Aware Instruction Cache Design by Duty Cycle Balancing, Tao Jin and Shuai Wang, 2012 IEEE Computer Society Annual Symposium on VLSI. [8] Aging-aware Timing Analysis Considering Combined Effects of NBTI and PBTI, Saman Kiamehr, Farshad Firouzi, Mehdi. B. Tahoori, International Symposium on Quality Electronic Design (ISQED), [9] System-level modeling of microprocessor reliability degradation due to BTI and HCI, by Chen, Chang-Chih; Soonyoung Cha; Taizhi Liu; Milor, Linda, 2014 IEEE International Reliability Physics Symposium, [10] Aging mitigation in memory arrays using self-controlled bit-flipping technique, by Gebregiorgis, Anteneh; Ebrahimi, Mojtaba; Kiamehr, Saman; Oboril, Fabian; Hamdioui, Said; Tahoori, Mehdi B, The 20th Asia and South Pacific Design Automation Conference, 2015.

Questions?

Thank You!!!