Architectural Optimizations Ed Carlisle. DARA: A LOW-COST RELIABLE ARCHITECTURE BASED ON UNHARDENED DEVICES AND ITS CASE STUDY OF RADIATION STRESS TEST.



DARA: A Low-Cost Reliable Architecture Based on Unhardened Devices and Its Case Study of Radiation Stress Test
Jun Yao, Shogo Okada, Masaki Masuda, Kazutoshi Kobayashi, and Yasuhiko Nakashima
IEEE Transactions on Nuclear Science, December

Outline
- Background
- System Overview
- Adaptive Redundancy
- Error Recovery
- Instruction Decomposition for Atomic Updates
- Unhardened vs. Hardened Circuits
- Radiation Testing
- Results
- Shortfalls
- Conclusions

Background
- As processor switching voltages and feature sizes decrease, susceptibility to single-event effects (SEEs) increases.
- Typical causes of SEEs:
  - Cosmic rays
  - Solar energetic particles
  - Trapped protons in the Van Allen belts
- Circuits can be hardened by process or by design.
- Typical approaches:
  - Triple modular redundancy (TMR)
  - Watchdog timers that trigger rollback and recovery from system checkpoints
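TMR masks an upset in any one copy by taking a 2-of-3 majority of the three results. The following is a minimal Python sketch of a bitwise majority voter. It assumes each copy's output is an integer register value. The name `tmr_vote` is illustrative, not taken from the paper.

```python
def tmr_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority: each output bit takes the value held by at least two copies."""
    return (a & b) | (a & c) | (b & c)


if __name__ == "__main__":
    golden = 0b1011_0010
    upset = golden ^ 0b0000_1000  # a single-event upset flips one bit in one copy
    # The corrupted copy is outvoted regardless of which position it occupies.
    print(tmr_vote(upset, golden, golden) == golden)
    print(tmr_vote(golden, upset, golden) == golden)
    print(tmr_vote(golden, golden, upset) == golden)
```

The price is three full copies plus a voter. DARA's adaptive scheme tries to avoid paying that price all the time.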

DARA System Overview
DARA (Dynamic Adaptive Redundancy Architecture) combines:
- Stage-level data bypassing, which lets the redundant pipelines compare their data stage by stage
- Carefully tuned instruction decomposition, which ensures atomic updates in commercial instruction set architectures (ISAs)
- A fast rollback recovery scheme
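Stage-level comparison can be pictured as checking, every cycle, the values latched by each stage of the two lockstep pipelines. The sketch below is a hypothetical model, not DARA's circuit. It assumes a classic five-stage pipeline (IF, ID, EX, MEM, WB). It reports the oldest instruction whose two copies disagree, because younger instructions would be flushed by the recovery anyway. `compare_stages` and the snapshot layout are illustrative names.

```python
from typing import Dict, Optional, Tuple

# Hypothetical five-stage pipeline; the oldest in-flight instruction sits in WB.
STAGES = ("IF", "ID", "EX", "MEM", "WB")

# A snapshot maps stage -> (instruction sequence number, value latched by that stage).
Snapshot = Dict[str, Tuple[int, int]]


def compare_stages(pipe_a: Snapshot, pipe_b: Snapshot) -> Optional[Tuple[str, int]]:
    """Return (stage, instruction) for the oldest mismatch between the two copies, or None."""
    for stage in reversed(STAGES):  # scan WB -> IF, i.e. oldest instruction first
        if pipe_a[stage] != pipe_b[stage]:
            instr, _ = pipe_a[stage]
            return stage, instr
    return None


if __name__ == "__main__":
    a = {"IF": (4, 0), "ID": (3, 1), "EX": (2, 9), "MEM": (1, 5), "WB": (0, 7)}
    b = dict(a, EX=(2, 8))  # an upset corrupts the EX result of instruction 2 in one copy
    print(compare_stages(a, b))  # ('EX', 2)
    print(compare_stages(a, a))  # None
```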

Adaptive Redundancy
- DMR (dual modular redundancy) is used for fast, power-efficient SEE tolerance.
- The third module is disabled via power gating.
- If errors occur frequently, the third module can be enabled to identify the defective pipeline.
- Once the defective module has been disabled, the system reverts to DMR operation.
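The mode switching described above can be sketched as a small controller. Several details here are assumptions made for illustration: the error-count threshold, the class name `AdaptiveRedundancy`, and its method names. The slides do not say how DARA decides that errors are "frequent".

```python
from collections import Counter
from typing import Dict, List, Optional


class AdaptiveRedundancy:
    """Hypothetical controller: DMR normally, TMR only long enough to locate a bad pipeline."""

    def __init__(self, error_threshold: int = 3) -> None:
        self.active: List[int] = [0, 1]  # pipelines compared in DMR
        self.gated: List[int] = [2]      # third pipeline, power-gated until needed
        self.error_threshold = error_threshold  # assumed policy knob, not from the paper
        self.recent_errors = 0

    @property
    def mode(self) -> str:
        return "TMR" if len(self.active) == 3 else "DMR"

    def on_dmr_mismatch(self) -> None:
        """Called after each DMR mismatch (the error itself is handled by rollback)."""
        self.recent_errors += 1
        if self.mode == "DMR" and self.gated and self.recent_errors >= self.error_threshold:
            self.active.append(self.gated.pop())  # un-gate the third pipeline

    def vote(self, outputs: Dict[int, int]) -> Optional[int]:
        """In TMR, disable the pipeline that disagrees with the majority and return to DMR."""
        if self.mode != "TMR":
            return None
        majority, count = Counter(outputs[p] for p in self.active).most_common(1)[0]
        outliers = [p for p in self.active if outputs[p] != majority]
        if count < 2 or not outliers:
            return None  # no majority, or all three agree: keep voting
        bad = outliers[0]
        self.active.remove(bad)  # disable the defective pipeline
        self.recent_errors = 0
        return bad


if __name__ == "__main__":
    ctrl = AdaptiveRedundancy(error_threshold=2)
    ctrl.on_dmr_mismatch()
    ctrl.on_dmr_mismatch()                # errors are frequent: wake pipeline 2
    print(ctrl.mode)                      # TMR
    print(ctrl.vote({0: 5, 1: 7, 2: 5}))  # 1 (pipeline 1 is the outlier)
    print(ctrl.mode, ctrl.active)         # DMR [0, 2]
```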

Checkpoint and Rollback
- Many rollback strategies rely on a coarse-grained checkpoint stored in hardened storage.
  - Checkpoint contents include register file data, control register status, and memory updates.
- These checkpoints can incur a large overhead, depending on the size of an application’s working set.
- Rollback also incurs a performance penalty, particularly when the system experiences a high error rate.
- DARA instead uses a fine-grained, fast recovery scheme that makes full use of the redundant information already present in the dual-pipeline architecture.
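A back-of-the-envelope model makes the rollback penalty concrete. This is an illustrative sketch with made-up parameters, not data from the paper. With a coarse checkpoint every `interval` instructions, an error forces re-execution of everything since the last checkpoint. With fine-grained recovery, only the faulting instruction and the younger ones already in the pipeline are refetched. The sketch assumes that number is at most the pipeline depth.

```python
from typing import Sequence


def coarse_reexecution(error_positions: Sequence[int], interval: int) -> int:
    """Instructions redone if each error rolls back to the last coarse-grained checkpoint.

    Checkpoints are assumed to be taken just before instructions 0, interval, 2*interval, ...
    """
    return sum(pos % interval + 1 for pos in error_positions)


def fine_reexecution(error_positions: Sequence[int], pipeline_depth: int) -> int:
    """Upper bound if each error only refetches the faulting and younger in-flight instructions."""
    return len(error_positions) * pipeline_depth


if __name__ == "__main__":
    errors = [250, 1999]  # hypothetical instruction indices at which errors are detected
    print(coarse_reexecution(errors, interval=1000))   # 1251
    print(fine_reexecution(errors, pipeline_depth=5))  # 10
```

The model ignores the cost of saving the checkpoints themselves, which grows with the working set. It captures only the rollback side of the overhead.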

DARA Error Recovery
Fast recovery procedure:
a) An error is detected in instruction I2 in the execution stage.
b) Recovery preparation: the pipeline behaves as if instruction I1 were a mispredicted branch, flushing the younger instructions (I2 onward) from the preceding pipeline stages.
c) Execution resumes with instruction I2 restarting in the instruction fetch stage.
Because the scheme emulates mispredicted-branch handling, it can also be implemented in out-of-order processors.
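The recovery steps above map directly onto ordinary branch-misprediction handling. The following is a toy sketch of that mapping. It assumes an in-order, five-stage pipeline whose in-flight instructions are tracked as sequence numbers, oldest first. The `Pipeline` class and `recover` method are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Pipeline:
    """Toy pipeline model: in-flight instruction sequence numbers, oldest first."""
    in_flight: List[int]
    fetch_next: int
    flushed: List[int] = field(default_factory=list)

    def recover(self, faulty: int) -> None:
        """Handle an error in `faulty` as if the instruction just before it were a mispredicted branch."""
        idx = self.in_flight.index(faulty)
        self.flushed = self.in_flight[idx:]    # the faulting instruction and everything younger
        self.in_flight = self.in_flight[:idx]  # older instructions (e.g. I1) drain normally
        self.fetch_next = faulty               # the "branch target" is the faulting instruction


if __name__ == "__main__":
    # Five stages, oldest first: WB=I0, MEM=I1, EX=I2, ID=I3, IF=I4.
    p = Pipeline(in_flight=[0, 1, 2, 3, 4], fetch_next=5)
    p.recover(2)  # error detected in I2 at the execution stage
    print(p.in_flight, p.flushed, p.fetch_next)  # [0, 1] [2, 3, 4] 2
```

Reusing the flush-and-redirect path is what makes the scheme cheap. Processors that speculate on branches already have that path in hardware.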

Instruction Decomposition for Atomic Updates
- DARA’s rollback-based recovery requires that the updates made by a single instruction be atomic.
  - Not all ISAs guarantee this.
- DARA implements the SH-2 RISC ISA.
  - Example problematic instruction: a post-increment load (LD), which performs two operations: a memory load into Rn and an address update (Rm++).
  - This causes a problem for recovery if an error occurs during the memory load after the address update has already succeeded: re-executing the instruction would then load from the wrong address.
- The issue is resolved by decomposing such instructions in the instruction decode stage.
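The failure mode and its fix can be reproduced with a small model. This is a hypothetical sketch loosely based on SH-2's post-increment load (MOV.L @Rm+,Rn, which increments Rm by 4). Its fault model is an assumption: an error is detected before the faulting write commits, and the faulting step is simply re-executed. All function names are illustrative.

```python
from typing import Callable, Dict, List

Regs = Dict[str, int]
Mem = Dict[int, int]
SubOp = Callable[[Regs, Mem, bool], None]


class SoftError(Exception):
    """Hypothetical fault model: an error is detected before the faulting write commits."""


def fused_ld_postinc(rm: str, rn: str) -> SubOp:
    """Post-increment load as one step whose address update can commit before the load fails."""
    def op(regs: Regs, mem: Mem, inject: bool) -> None:
        addr = regs[rm]
        regs[rm] = addr + 4  # the address update succeeds ...
        if inject:
            raise SoftError  # ... then the memory load is hit by an error
        regs[rn] = mem[addr]
    return op


def load(rn: str, rm: str) -> SubOp:
    """Sub-instruction 1: Rn <- @Rm (single destination)."""
    def op(regs: Regs, mem: Mem, inject: bool) -> None:
        value = mem[regs[rm]]
        if inject:
            raise SoftError
        regs[rn] = value
    return op


def add_imm(rd: str, imm: int) -> SubOp:
    """Sub-instruction 2: Rd <- Rd + imm, issued after the memory access."""
    def op(regs: Regs, mem: Mem, inject: bool) -> None:
        value = regs[rd] + imm
        if inject:
            raise SoftError
        regs[rd] = value
    return op


def run(ops: List[SubOp], regs: Regs, mem: Mem, faulty_step: int = -1) -> Regs:
    """Execute sub-ops in order; a faulting sub-op is rolled back and re-executed."""
    for i, op in enumerate(ops):
        try:
            op(regs, mem, i == faulty_step)
        except SoftError:
            op(regs, mem, False)
    return regs


if __name__ == "__main__":
    mem = {100: 0xAA, 104: 0xBB}
    fused = run([fused_ld_postinc("R1", "R2")], {"R1": 100, "R2": 0}, mem, faulty_step=0)
    print(hex(fused["R2"]), fused["R1"])  # 0xbb 108 (loaded from the wrong address)
    split = run([load("R2", "R1"), add_imm("R1", 4)], {"R1": 100, "R2": 0}, mem, faulty_step=0)
    print(hex(split["R2"]), split["R1"])  # 0xaa 104
```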

Instruction Decomposition for Atomic Updates
Decomposition rules:
1. Always perform address updates after the memory access.
2. Use shadow registers for intermediate values.
3. Update the program counter only in the final sub-instruction.
Example: the RTE instruction performs two loads (LD, LD) and is decomposed as:
a) TMP1 <- R15 (stack pointer)
b) TMP2 <- R15 + #4
c) SR
d) R15 <- TMP2
e) PC
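The three rules can be checked on a model of RTE. The transcript does not preserve the operands of steps c) and e), so this is an illustrative decomposition that follows the rules, not the paper's exact sequence. It assumes SH-2 RTE semantics: PC is popped from @R15, then SR from the next word, and R15 advances by 8. Each sub-instruction writes exactly one register, and a faulting one is re-executed in isolation. `RTE_SUBOPS`, `execute`, and `fresh_state` are hypothetical names.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

State = Dict[str, Any]
SubOp = Tuple[str, Callable[[State], int]]

# Hypothetical decomposition of SH-2 RTE (PC <- @R15, SR <- @(R15 + 4), R15 += 8).
RTE_SUBOPS: List[SubOp] = [
    ("TMP1", lambda s: s["mem"][s["R15"]]),      # rule 2: saved PC goes to a shadow register
    ("TMP2", lambda s: s["mem"][s["R15"] + 4]),  # rule 2: saved SR goes to a shadow register
    ("SR",   lambda s: s["TMP2"]),               # commit SR from its shadow copy
    ("R15",  lambda s: s["R15"] + 8),            # rule 1: stack-pointer update after both loads
    ("PC",   lambda s: s["TMP1"]),               # rule 3: PC written by the final sub-instruction
]


def execute(subops: List[SubOp], state: State, faulty_step: Optional[int] = None) -> List[str]:
    """Run single-destination sub-ops in order and return the order of register writes."""
    written = []
    for i, (dest, compute) in enumerate(subops):
        value = compute(state)
        if i == faulty_step:
            # Error detected in this sub-op: its result is discarded before commit and the
            # sub-op is re-executed. Nothing it reads has been overwritten, so it recomputes cleanly.
            value = compute(state)
        state[dest] = value
        written.append(dest)
    return written


def fresh_state() -> State:
    return {"R15": 0x200, "SR": 0, "PC": 0, "TMP1": 0, "TMP2": 0,
            "mem": {0x200: 0x1000, 0x204: 0xF0}}


if __name__ == "__main__":
    for step in (None, 0, 1, 2, 3, 4):
        s = fresh_state()
        order = execute(RTE_SUBOPS, s, faulty_step=step)
        print(step, hex(s["PC"]), hex(s["SR"]), hex(s["R15"]), order[-1])
```

Whichever sub-instruction faults, the final PC, SR, and R15 match the fault-free run, and the PC is always the last register written.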

Unhardened vs. Hardened Circuits
- Radiation testing compares the architecture implemented with unhardened circuits against the same architecture implemented with hardened circuits.
- The unhardened circuit uses standard D flip-flops (DFF).
- The hardened circuit uses bistable cross-coupled dual-modular redundancy (BCDMR) flip-flops.

Radiation Testing
- The two circuits are enabled exclusively (one at a time) through a selector.
- Because there is no practical method to inject hard faults, only the DMR configuration is tested.
- L2 cache contents are not protected by DARA; they are physically stored in the host server’s DIMMs.
- The host server handles start/stop signals and L1 misses.
- The radiation source is calibrated so that DARA is the only component exposed to radiation.

Results
- The average number of recoveries is recorded to track how many errors the device experienced.
- Programs run on both DARA-DFF and DARA-BCDMR produce the same memory data access sequences and identical final memory results in both the radiation and non-radiation tests.
- Differences in execution time represent the overhead of error-recovery rollback.
- Circuit hardening results in a 71% increase in area and a 28% increase in power consumption.

Shortfalls
- Operation of the TMR configuration was not tested.
- The hardened and unhardened circuits were manufactured on the same chip.

Conclusions
- DARA achieved hardened-circuit reliability while using unhardened circuits.
  - Unhardened circuits use less power and require less area than their hardened counterparts.
- Adaptive DMR/TMR redundancy further reduces power consumption while still providing protection against both soft and hard errors.
- DARA’s fine-grained rollback scheme offers lower overhead and faster recovery than typical checkpointing schemes.

Questions?