11 Increasing Reliability of Performance-critical Pipeline structures Niranjan Soundararajan Advisors: Vijaykrishnan Narayanan Anand Sivasubramaniam Anand.


1 Increasing Reliability of Performance-critical Pipeline Structures
Niranjan Soundararajan
Advisors: Vijaykrishnan Narayanan, Anand Sivasubramaniam
Computer Systems Lab (CSL), Microsystems Design Lab (MDL)
Computer Science and Engineering, The Pennsylvania State University

2 Reliability – Increasing Importance
– Decreasing transistor size
– More transistors
– Power/temperature hotspots
– Increasing market segments
These trends make hardware reliability increasingly important.

3 Performance-critical Pipeline Structures
[Figure: pipeline diagram. Front end: Fetch, Decode, BHT, BTB, I-cache, RAT, Alloc. Back end: Issue Queue, Load/Store Queue, Reorder Buffer, ARF, D-cache, ALU; instructions retire at the end.]
Annotations: out-of-order entry activity, back-to-back wakeup, multi-width pipeline, clock frequency increase.

4 Transistor Failure
[Figure: failure rate vs. time (bathtub curve): manufacturing defects, random errors, wearout.]
– Solutions to reduce non-uniform aging due to NBTI, HCE on microprocessor structures
– Solutions to address the impact of process variations on the issue queue
– Soft error impact of DVFS on the vulnerability of GALS architectures
– Bounding vulnerability of processor structures to provide reliability guarantees

5 Outline
– Motivation
– Contributions
– Vulnerability bounding mechanisms
– Other solutions
  – Impact of DVFS on architectural vulnerability of GALS architectures
  – Address process variations in issue queue
  – Mitigate NBTI, HCE degradation in structures
– Conclusion and future work

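6 Introduction to Soft Errors
[Figure: particle strike through an n+ diffusion in a p-type substrate, generating electron-hole pairs; the stored value flips from 1 to 0, causing an error.]
A strike creates electron-hole pairs that can be collected by the source/drain diffusion areas of the transistor, changing the state of the device. (Source: M. Tahoori)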

7 Impact of Soft Errors
– Severity: in 2003, Fujitsu released the SPARC64 with 80% of its 200,000 latches covered by transient-fault protection
– Single Event Upset (SEU) model
– Metrics
  – MTBF: Mean Time Between Failures
  – FIT: Failures in Time; 1 FIT = 1 failure in a billion (10^9) hours
  – FIT_eff = FIT_raw × AVF
[Figure: Severity of soft error rates. Source: Shekhar Borkar, Intel, 2004]
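As a worked illustration of these metrics, the sketch below converts a raw FIT rate into an effective rate and an MTBF. It assumes a constant failure rate (so MTBF = 10^9 / FIT hours); the function names and example numbers are hypothetical, not from the talk.

```python
HOURS_PER_FIT_UNIT = 1e9  # 1 FIT = 1 failure per 10^9 device-hours


def effective_fit(fit_raw: float, avf: float) -> float:
    """FIT_eff = FIT_raw * AVF: only upsets in vulnerable (ACE) bits become failures."""
    if not 0.0 <= avf <= 1.0:
        raise ValueError("AVF must lie in [0, 1]")
    return fit_raw * avf


def mtbf_hours(fit: float) -> float:
    """MTBF in hours, assuming a constant (exponential) failure rate."""
    if fit <= 0:
        raise ValueError("FIT must be positive")
    return HOURS_PER_FIT_UNIT / fit


# Hypothetical example: 1000 raw FIT, 25% of bits vulnerable on average.
fit_eff = effective_fit(1000.0, 0.25)
print(fit_eff, mtbf_hours(fit_eff))  # 250.0 4000000.0
```

Because MTBF is inversely proportional to FIT_eff, halving AVF doubles MTBF in this model.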

8 Architectural Vulnerability Factor (AVF)
[Figure: instruction stream LD A, BR, ST B, ADD. Wrong-path instructions and dead stores are unACE instructions; instructions that affect user-visible output are Architecturally Correct Execution (ACE) instructions.]
AVF:
– Fraction of bits in a structure that are vulnerable to soft errors
– AVF = ACE bits / (ACE bits + unACE bits)
– A function of (size, time)
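Because ACE and unACE bits together make up the whole structure in every cycle, this ratio is usually evaluated as a time average over a simulation window. A minimal sketch, assuming a per-cycle count of ACE bits is already available (the function name and trace are illustrative):

```python
from typing import Sequence


def avf(ace_bits_per_cycle: Sequence[int], total_bits: int) -> float:
    """AVF = (sum over cycles of resident ACE bits) / (total_bits * cycles).

    In every cycle ACE bits + unACE bits = total_bits, so this is the
    time-averaged form of ACE / (ACE + unACE).
    """
    cycles = len(ace_bits_per_cycle)
    if cycles == 0 or total_bits <= 0:
        raise ValueError("need at least one cycle and a positive structure size")
    if any(b < 0 or b > total_bits for b in ace_bits_per_cycle):
        raise ValueError("ACE bit count out of range")
    return sum(ace_bits_per_cycle) / (total_bits * cycles)


# A 64-bit structure observed for 4 cycles.
print(avf([64, 32, 0, 32], 64))  # 0.5
```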

9 AVF: Why Is It Important to Micro-architects?
[Figure: design flow System Specification → Architectural Design → Logic Synthesis → Circuit Design → Physical Design → Fabrication and Packaging, annotated with AVF per structure and FIT_raw.]
System reliability = Σ (FIT_raw × AVF), summed over structures.
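The summation can be made concrete with a small sketch; the structure names and the FIT_raw/AVF values are hypothetical placeholders, not measurements from this work.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Structure:
    name: str
    fit_raw: float  # raw FIT of the structure's bits (set by technology and circuit design)
    avf: float      # architectural vulnerability factor, in [0, 1]


def system_fit(structures: Iterable[Structure]) -> float:
    """System FIT = sum over structures of FIT_raw * AVF."""
    return sum(s.fit_raw * s.avf for s in structures)


# Hypothetical numbers for illustration only.
parts = [Structure("ROB", 400.0, 0.25), Structure("IssueQueue", 200.0, 0.5)]
print(system_fit(parts))  # 200.0
```

Since FIT_raw is largely fixed once the circuit is designed, the AVF terms are what the architect can still change.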

10 State of the Art
Microprocessor design is a multi-dimensional problem involving performance, power, and reliability.
– Transient fault tolerance
  – Simultaneous Redundant Threading (SRT)
  – Lockstepping
– Optimization techniques
  – Parashar et al., ISCA'04
  – Gomaa et al., ISCA'05
  – Parashar et al., ASPLOS'06
  – Reddy et al., ASPLOS'06
Limitations: performance overhead; each technique offers a single point in the performance-reliability space.

11 Micro-architectural Reliability Knob
[Figure: reliability vs. performance, with the required FIT level marked; designs trade "more reliable, less performance" against "less reliable, more performance".]
– FIT_eff = FIT_raw × AVF
– With FIT_raw and AVF both constant, a design is fixed at a single point.
– Ideal solution: FIT_raw is inflexible, so tune AVF to meet the specification.
"Challenge for computer architects is not to provide absolute guarantees in reliability, but rather how to provide the adequate amount of reliability at the lowest cost for the target market segment" – Shubu Mukherjee, Intel, Architecture Design for Soft Errors
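Read as a knob, FIT_eff = FIT_raw × AVF gives the AVF budget a structure must stay under once FIT_raw and a target FIT are fixed. A minimal sketch (function name and numbers hypothetical):

```python
def avf_bound(fit_target: float, fit_raw: float) -> float:
    """Largest AVF that keeps FIT_eff = FIT_raw * AVF at or below fit_target.

    Capped at 1.0: if even a fully vulnerable structure meets the target,
    no throttling or redundancy is needed.
    """
    if fit_raw <= 0 or fit_target < 0:
        raise ValueError("fit_raw must be positive and fit_target non-negative")
    return min(1.0, fit_target / fit_raw)


print(avf_bound(100.0, 400.0))  # 0.25: structure may be at most 25% vulnerable
print(avf_bound(500.0, 400.0))  # 1.0: target already met
```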

12 Contributions
– First work that provides micro-architectural knobs to satisfy processor reliability budgets for transient faults
– Proactive and reactive mechanisms to monitor and bound the vulnerabilities of processor structures at cycle-level granularity

13 AVF Monitoring: Reorder Buffer/Physical Register File
[Figure: Fetch and Decode (in-order) → RAT → Issue Queue → ALU (out-of-order) → Reorder Buffer (PRF), ARF, Commit (in-order).]
Why the Reorder Buffer (ROB)?
1. It is a large pipeline structure holding many instructions.
2. Each instruction spends a significant percentage of its lifetime in the ROB.

14 AVF Monitoring Mechanism: Reorder Buffer (ROB)
[Figure: an N-entry ROB; each entry holds B bits filled at dispatch and an R-bit result filled at writeback (WB). Events tracked: dispatch, writeback, commit, and mis-speculation.]
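One way to read the event diagram is as a running counter of vulnerable bits, updated on each event and sampled every cycle. The sketch below is an illustrative model, not the hardware proposed in the talk; it assumes B bits become ACE at dispatch, R more at writeback, and that commit or squash releases the entry. Because a squashed entry's earlier cycles stay in the total, the estimate is conservative (it can only over-count).

```python
class RobAvfMonitor:
    """Illustrative per-cycle ACE-bit counter for an N-entry reorder buffer.

    Assumed accounting (hypothetical): B bits per entry become ACE at dispatch,
    R result bits become ACE at writeback, and commit or squash frees the entry.
    """

    def __init__(self, n_entries: int, b_bits: int, r_bits: int) -> None:
        self.n, self.b, self.r = n_entries, b_bits, r_bits
        self.live: dict[int, int] = {}  # ROB index -> bits currently counted as ACE
        self.ace_bits = 0               # ACE bits resident in the current cycle
        self.accumulated = 0            # sum of per-cycle ACE bits so far
        self.cycles = 0

    def dispatch(self, idx: int) -> None:
        if idx in self.live or len(self.live) >= self.n:
            raise ValueError("entry already in use or ROB full")
        self.live[idx] = self.b
        self.ace_bits += self.b

    def writeback(self, idx: int) -> None:
        self.live[idx] += self.r
        self.ace_bits += self.r

    def _free(self, idx: int) -> None:
        self.ace_bits -= self.live.pop(idx)

    def commit(self, idx: int) -> None:
        self._free(idx)

    def squash(self, idx: int) -> None:
        # Mis-speculated entry: stop counting it from now on; its earlier
        # cycles remain in `accumulated`, so the estimate stays conservative.
        self._free(idx)

    def end_cycle(self) -> None:
        self.accumulated += self.ace_bits
        self.cycles += 1

    def avf(self) -> float:
        capacity = self.n * (self.b + self.r)  # assumes each entry is B + R bits wide
        return self.accumulated / (capacity * self.cycles) if self.cycles else 0.0


# Hypothetical 4-entry ROB, 8 dispatch bits + 8 result bits per entry.
mon = RobAvfMonitor(n_entries=4, b_bits=8, r_bits=8)
mon.dispatch(0)
mon.end_cycle()   # 8 ACE bits this cycle
mon.writeback(0)
mon.end_cycle()   # 16 ACE bits
mon.commit(0)
mon.end_cycle()   # 0 ACE bits
print(mon.avf())  # 24 / (64 * 3) = 0.125
```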

15 Vulnerability Control via Throttling (VCT)
[Figure: dispatch and writeback both write into the reorder buffer.]
– Bound AVF by stalling dispatch and writeback; however, writeback cannot be stalled.
– Therefore the entire entry is treated as ACE at dispatch.
– The usable ROB size (N entries) is a function of the AVF bound.
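Counting the whole entry as ACE at dispatch makes the ROB's per-cycle AVF simply occupied entries / N, which is why the usable ROB size becomes a function of the AVF bound. A minimal throttling check under that assumption (function names hypothetical):

```python
import math

_EPS = 1e-9  # guards against floating-point round-down, e.g. 0.29 * 100 = 28.999...


def max_vulnerable_entries(n_entries: int, avf_bound: float) -> int:
    """Entries that may be occupied at once if each occupied entry is fully ACE.

    Per-cycle AVF = occupied / n_entries, so occupied <= floor(bound * N)
    keeps every cycle at or below the bound.
    """
    return math.floor(avf_bound * n_entries + _EPS)


def may_dispatch(occupied: int, n_entries: int, avf_bound: float) -> bool:
    """Dispatch is stalled (False) when one more entry would exceed the bound."""
    return occupied + 1 <= max_vulnerable_entries(n_entries, avf_bound)


print(max_vulnerable_entries(128, 0.25))                         # 32
print(may_dispatch(31, 128, 0.25), may_dispatch(32, 128, 0.25))  # True False
```

Only dispatch is throttled here, matching the point that writeback cannot be stalled.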

16 VCT Performance
[Figure: performance of VCT across application integrity levels, from high integrity to low integrity.]

17 Advantages of a Reactive Bounding Mechanism
[Figure: reorder buffer.]
– AVF bound exceeded → verify results
– Early accounting of writebacks
– Mis-speculated instructions

18 Simultaneous Redundant Threading (SRT): Importance of Selective Redundancy
[Figure: pipeline (Fetch, Decode, RAT, ISQ, ALU, Reorder Buffer (PRF), ARF) in which a redundant thread runs after the primary thread.]
– Redundant execution protects the entire pipeline, so AVF goes down.
– Result verification reduces AVF.

19 Vulnerability Control via Selective Redundancy (VCSR) Infrastructure
[Figure: pipeline (Fetch, Decode, RAT, ISQ, ALU, Reorder Buffer (ROB), ARF) with a result buffer; when the AVF bound is exceeded, a greedy heuristic selects instructions for redundant execution.]
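The slide names a greedy heuristic but not its selection criterion. Purely as a hypothetical illustration, the sketch below assumes that a redundantly verified instruction no longer contributes ACE bits and greedily picks the entries holding the most ACE bits until the remainder fits the bound.

```python
from typing import Dict, List


def greedy_select_for_redundancy(ace_bits_by_entry: Dict[int, int], bound_bits: int) -> List[int]:
    """Hypothetical greedy heuristic for selective redundancy.

    ace_bits_by_entry maps a ROB index to the ACE bits that entry currently holds.
    Assumes a redundantly verified entry stops contributing ACE bits, and picks
    the largest contributors first until the unverified total fits bound_bits.
    """
    remaining = sum(ace_bits_by_entry.values())
    chosen: List[int] = []
    # Largest contributors first; ties broken by index only for determinism.
    for idx, bits in sorted(ace_bits_by_entry.items(), key=lambda kv: (-kv[1], kv[0])):
        if remaining <= bound_bits:
            break
        chosen.append(idx)
        remaining -= bits
    return chosen


print(greedy_select_for_redundancy({0: 16, 1: 8, 2: 16, 3: 4}, bound_bits=20))  # [0, 2]
```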

20 VCSR Performance
[Figure: performance of SRT, VCT, and VCSR from high integrity to low integrity.]

21 Optimizations: Primary-Thread Out-of-Order Commit
[Figure: pipeline (Fetch, Decode, RAT, ISQ, ALU, Reorder Buffer (PRF), ARF) with a result buffer.]
– The writeback-to-commit interval affects ROB AVF.
– The secondary thread maintains the architected state.
– Non-compacting reorder buffer
– Reduces AVF
– Performance boost, since fewer instructions are re-executed

22 VCH with OOO Commit: Performance
[Figure: performance of SRT, VCT, VCSR, and VCH(OOO) from high integrity to low integrity.]

23 Impact of Vulnerability Bounding
– Per-cycle vulnerability bounds guarantee that FIT rates are met.
– Future work: developing a system-level AVF monitoring and bounding infrastructure.

24 Outline
– Motivation
– Contributions
– Vulnerability bounding mechanisms
– Summary of other works
  – Impact of DVFS on architectural vulnerability of GALS architectures
  – Address process variations in issue queue
  – Mitigate NBTI, HCE degradation in structures
– Conclusion and future work

25 Multiple domains, each driven by its own clock (Globally Asynchronous, Locally Synchronous, GALS)
– The need for a global clock network is avoided.
– GALS enables fine-grained voltage/frequency (VF) scaling tuned to individual domains; DVFS provides high performance per watt.
– DVFS algorithms for GALS architectures have been studied with respect to IPC per watt, but their reliability impact has been ignored.
– Voltage scaling affects FIT_raw; frequency scaling affects AVF.
Need for vulnerability analysis in GALS architectures:
– Impact on AVF of applying different DVFS algorithms
– Help designers choose DVFS algorithms that meet reliability requirements

26 AVF Impact across Algorithms
– Applying different algorithms causes significant AVF variation (38%).
– Most DVFS algorithms lead to worse AVF than non-DVFS.
[Figure: AVF per DVFS algorithm; lower is better.]

27 Outline (same as slide 5)

28 Process Variation (PV): Introduction
[Figure: taxonomy of variation. Static: systematic (sub-wavelength lithography, overlay, dose) and random (random dopant fluctuation, RDF). Dynamic: aging, thermal effects.]
– Process variation: variation in characteristics between two identically designed circuits [J. Tschanz et al., DAC 2005]
– Significant performance and power impact
– Lack of predictability in timing characteristics leads to loss of yield
– A definite need to address PV at the circuit and microarchitectural levels

29 Contributions
– Study the impact of PV on the issue queue of a microprocessor
  – A PV-unaware design has about 21% performance degradation w.r.t. a non-PV design
– PV is a non-deterministic phenomenon, so design-time static partitioning is not possible; our solution enables fast and slow entries to co-exist
– Instruction steering and sub-component switching schemes reduce the impact of PV
  – Performance loss is about 1.3% w.r.t. a non-PV design

30 Issue Queue
[Figure: issue queue entry (V, Opcode, R, Tag, Operand, R, Tag, Operand, Dest Tag; V = valid bit, R = operand ready bit) with forwarding tag comparators (Tag1 ... Tag N), select logic, dispatch-write and issue-read ports.]
Timeline (cycles t to t+3):
– Alloc logic: if the ISQ is full, allocation stalls dispatch
– Dispatch write: valid bit set
– Write forwarding: operand ready bit set; the instruction waits for ready operands
– Select: instruction ready
– Instruction issue: valid bit reset
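The dispatch, forwarding (wakeup), select, and issue steps above follow the conventional issue-queue model, which can be sketched in software. This is a generic behavioral model, not the PV-aware design; the class and field names are hypothetical, and select simply favors the lowest-numbered ready slot.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set


@dataclass
class IqEntry:
    opcode: str
    src_tags: List[int]  # source physical-register tags
    dest_tag: int        # tag broadcast when this instruction completes
    ready: List[bool]    # per-operand ready (R) bits


class IssueQueue:
    """Generic behavioral issue queue: dispatch write, tag wakeup, select, issue."""

    def __init__(self, size: int) -> None:
        self.slots: List[Optional[IqEntry]] = [None] * size  # None = valid bit clear

    def dispatch(self, opcode: str, src_tags: List[int], dest_tag: int,
                 ready_tags: Set[int]) -> bool:
        """Write into a free slot (valid bit set); False means alloc stalls (ISQ full)."""
        for i, slot in enumerate(self.slots):
            if slot is None:
                ready = [t in ready_tags for t in src_tags]
                self.slots[i] = IqEntry(opcode, list(src_tags), dest_tag, ready)
                return True
        return False

    def wakeup(self, broadcast_tags: Iterable[int]) -> None:
        """Forwarding: compare every broadcast tag with every source tag (CAM match)."""
        tags = set(broadcast_tags)
        for entry in self.slots:
            if entry is None:
                continue
            for k, tag in enumerate(entry.src_tags):
                if tag in tags:
                    entry.ready[k] = True  # operand ready bit set

    def select(self, width: int) -> List[IqEntry]:
        """Issue up to `width` fully ready entries, lowest slot first; free their slots."""
        issued: List[IqEntry] = []
        for i, entry in enumerate(self.slots):
            if len(issued) == width:
                break
            if entry is not None and all(entry.ready):
                issued.append(entry)
                self.slots[i] = None  # valid bit reset on issue
        return issued


iq = IssueQueue(size=2)
iq.dispatch("add", [1, 2], dest_tag=3, ready_tags={1})  # waits for tag 2
iq.dispatch("mul", [3], dest_tag=4, ready_tags=set())   # waits for the add
iq.wakeup([2])
print([e.opcode for e in iq.select(width=2)])  # ['add']
```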

31 Results
– Stalls are reduced with respect to specific activities.
– Operand switching and port switching further reduce stalls to a minimum.
[Figure: comparison of Non-PV, Shutdown, MCD, and PV-Aware designs, annotated 12%, 7.3%, and 1.3%.]

32 Outline (same as slide 5)

33 Increasing Impact of Transistor Wearout
– Transistor lifetime is decreasing with newer technologies [Figure: lifetime vs. decreasing technology; source: Intel]
– Conservative guardbands impact performance
– System longevity affects revenue: in more than 50% of organizations, machine age is over 10 years (poll by Gartner Research)
(Source: J. Blome, MICRO 2007)

34 Contributions
– NBTI and HCE impact is increasing in upcoming technologies.
– Conventional collapsing issue queues cause unwanted instruction movement across entries; collapsing is required for age-based selection.
– A round-robin scheme provides restricted collapsing.
– Restricted collapsing balances switching activity without losing much of age-based selection.
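The "unwanted instruction movement" of a conventional collapsing queue can be counted directly: issuing an entry forces every younger entry to shift toward the head. A minimal sketch (function name hypothetical) that counts those moves; the round-robin restricted-collapsing scheme itself is not modeled.

```python
from typing import List


def issue_collapsing(queue: List[str], pos: int) -> int:
    """Issue queue[pos] from a conventional collapsing issue queue.

    Index 0 is the head (oldest). Every younger entry shifts one slot toward
    the head so age order is preserved; each shift rewrites that entry's cells.
    Returns the number of entry moves caused by this issue.
    """
    if not 0 <= pos < len(queue):
        raise IndexError("no instruction at that position")
    moves = len(queue) - 1 - pos
    del queue[pos]
    return moves


q = list("ABCDEFGH")             # 8 waiting instructions, A oldest
total = issue_collapsing(q, 0)   # issuing the head moves 7 entries
total += issue_collapsing(q, 3)  # issuing E moves F, G, H: 3 more
print(total, q)  # 10 ['B', 'C', 'D', 'F', 'G', 'H']
```

A non-collapsing queue would make zero moves for the same issues but would no longer keep age order by position, which is the trade-off restricted collapsing aims to balance.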

35 Implementation
[Flow: SPEC2K benchmarks → SimpleScalar architectural simulator (ISQ model), 100M instructions → read/write/switch/data probabilities captured per cell → HSPICE transistor-level degradation model (32 nm, 380 K), 10-year degradation → read delay degradation.]
Typically, solutions look at worst-case probabilities that might rarely occur.
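The slide lists read, write, switch, and data probabilities captured per cell but does not define them. Assuming they are simple per-cycle frequencies, extracting them from a simulator trace might look like this (function name and trace format hypothetical); the HSPICE degradation step is not modeled.

```python
from collections import Counter
from typing import Dict, List, Tuple


def cell_probabilities(trace: List[Tuple[str, int]]) -> Dict[str, float]:
    """Per-cell activity probabilities from a cycle-by-cycle trace.

    trace[t] = (op, value): op is "read", "write" or "idle" in cycle t, and
    value is the bit the cell holds at the end of cycle t.
    """
    n = len(trace)
    if n == 0:
        raise ValueError("empty trace")
    ops = Counter(op for op, _ in trace)
    # Value changes between consecutive cycles (switching activity, HCE-relevant).
    toggles = sum(1 for (_, a), (_, b) in zip(trace, trace[1:]) if a != b)
    return {
        "read": ops["read"] / n,
        "write": ops["write"] / n,
        "switch": toggles / n,                  # toggles per cycle
        "data1": sum(v for _, v in trace) / n,  # fraction of cycles holding 1 (NBTI-relevant duty)
    }


trace = [("write", 1), ("idle", 1), ("read", 1), ("write", 0)]
print(cell_probabilities(trace))
# {'read': 0.25, 'write': 0.5, 'switch': 0.25, 'data1': 0.75}
```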

36 Results
[Figure: results annotated "32% reduction" and "1% reduction".]

37 Conclusion
– Growing reliability concern: "Pop culture of reliability has arrived" – Dr. Phil Emma, IBM [Architecture Design for Soft Errors]
– This work looks at increasing fault tolerance in the back end:
  – Soft errors
  – Process variation
  – Wearout

38 Current Work
– Multi-core designs have come to prominence.
– While caches have ECC, the multiple pipelines include structures holding data for which ECC is hard, so total vulnerability to soft errors increases.
– Study the impact on AVF of different structures in a multi-core environment.

39 Future Work
– Multi-core
  – Cores increase, market segments increase
  – ILP vs. TLP vs. clock frequency increase
  – Application/hardware senses the best configuration
– Reconfigurable hardware
  – Defect tolerance
  – Verification time increasing
  – "Firmware update" to control functionality


41 Backup Slides

42 DVFS Algorithms
– Threshold: VF scaling uses fixed thresholds; preset thresholds affect algorithm efficiency.
– Attack-Decay (AD): based on utilization in adjacent intervals; attack whenever there is a big utilization change, otherwise decay. Its greedy nature affects efficiency.
– Modified Attack-Decay (ModAD): the attack phase is modified to correspond to the utilization change; large VF swings can affect performance per watt.
– PI Greedy: sample-and-hold phase; VF scaling based on the energy-delay-squared (ED²) product of the past 2 intervals:
$$\mu_k = \mu_{k-1} + K_I\,(q'_k - q_{ref}) + K_p\,(q'_k - q'_{k-1}), \qquad f_k = \frac{\mu_k}{\mathrm{IPC}}$$
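The PI Greedy update can be transcribed directly. The slide does not define the signals, so this sketch assumes q'_k is the measured control signal (for example, queue occupancy) in interval k, q_ref its set point, and μ_k the controller state that is divided by IPC to give the next frequency. Gains and values are hypothetical, and a real implementation would also clamp f_k to the available VF levels.

```python
from dataclasses import dataclass


@dataclass
class PIFrequencyController:
    k_i: float     # integral gain K_I
    k_p: float     # proportional gain K_p
    q_ref: float   # set point q_ref
    mu: float      # controller state; holds mu_{k-1} between calls
    q_prev: float  # previous measurement q'_{k-1}

    def step(self, q: float, ipc: float) -> float:
        """mu_k = mu_{k-1} + K_I (q'_k - q_ref) + K_p (q'_k - q'_{k-1}); returns f_k = mu_k / IPC."""
        if ipc <= 0:
            raise ValueError("IPC must be positive")
        self.mu = self.mu + self.k_i * (q - self.q_ref) + self.k_p * (q - self.q_prev)
        self.q_prev = q
        return self.mu / ipc


ctrl = PIFrequencyController(k_i=0.5, k_p=0.25, q_ref=8.0, mu=1.0, q_prev=8.0)
print(ctrl.step(q=10.0, ipc=2.0))  # mu = 1 + 0.5*2 + 0.25*2 = 2.5, so f = 1.25
```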

43 Vulnerability Efficiency
– Non-DVFS has the best vulnerability efficiency.
– On average, AD and PI provide the best vulnerability efficiency among the DVFS algorithms.
[Figure: vulnerability efficiency per algorithm, 40% variation; lower is better.]

44 Round-Robin Scheme
[Figure: issue queue with a clock-control bit per entry, new instructions inserted at the tail, a PseudoHead (PH) in addition to the head, and a collapse control vector (e.g., 11100) governing which later entries collapse.]

45 Reliability Issues of Importance
Solutions that are robust but also overhead-aware

46 Contributions
[Figure: taxonomy of hardware failures: permanent vs. temporary, transient vs. intermittent, radiation vs. non-radiation causes; covers wearout, soft errors, power supply, and process variation. Source: ISCA 2005 tutorial]
– Bounding vulnerability of processor structures to provide reliability guarantees
– Study the impact of DVFS on the vulnerability of GALS architectures
– Solutions to address the impact of process variations on the issue queue
– Solutions to reduce non-uniform aging due to NBTI, HCE on microprocessor structures

47 Results
[Figure: results for SRT, Throttling (T), SR, and SR with T (OOO), from high integrity to low integrity.]

48 PV-aware Steering: OptiSteer
[Figure: RAT → Alloc → non-collapsing issue queue. A Stall Optimization Table (SOT) holds destination tags with a STALL field; each ISQ entry (entry id, Op, STag1, STag2, DTag) has a slow-entry bit; source tags (STag1, STag2) feed a decoder/demux that assigns the ISQ entry.]

49 Intra-Entry Variation Schemes: Operand Switching and Port Switching
[Figure: entry format (V, Opcode, R, Tag, R, Operand, Dest Tag); instruction fields (Op, STag1, Operand1, STag2, DTag) shown in original and switched orders at dispatch; operand switch and port switch on the dispatch-write and issue-read paths.]

50 Timeline of ISQ Activities
[Figure: the ISQ timeline of slide 30 (alloc, dispatch write, write forwarding, select, issue over cycles t to t+3) annotated with PV effects and remedies: slow dispatch write (operand switch), SOT fill, SOT value required (forwarding stall), port switch, fewer instructions selected, slow issue read.]

51 Conventional Collapsing ISQ
[Figure: entries 0 (head) to N (tail), each with a clock-control bit; when an instruction issues, the collapsing logic shifts younger entries toward the head, preserving age ordering for instruction selection.]

52 Round-Robin Scheme
[Figure: new instruction inserted at the tail; PseudoHead and Head pointers; restricted collapse; per-entry clock-control bits.]

53 NBTI/HCE
– NBTI: traps due to negative voltage at the gate (input "0")
  – Dominant in PMOS transistors
  – Increased when holding the same data for long periods
– HCE: traps due to the high electric field near the drain
  – Dominant in NMOS transistors
  – Increased when switching activity is high
– The Vth shift accumulates over time and affects timing.

54 Contributions
– Global solutions
  – Body biasing: frequency boost increases leakage; non-ideal for the issue queue
  – Time borrowing: absorbing clock jitter and skew becomes difficult
– Structure-specific solutions: solutions for the register file and caches
– The issue queue is a performance-determining structure whose operation combines CAM and SRAM cells.
– PV is a non-deterministic phenomenon; our solution enables fast and slow entries to co-exist.
– Instruction steering and sub-component switching schemes are proposed to reduce the impact of PV.

55 Results
[Figure: values 1.43, 1.14, 1.31, 1.36, 1.43, 1.42.]

56 Throughput Comparison
[Figure: 10.5% relative decrease.]

57 Switching Activity

58 Wearout Phenomena
[Figure: hot carrier effects, negative bias temperature instability, oxide breakdown, electromigration. Source: J. Blome, MICRO 2007]
– Factors: temperature, switching activity, data (gate bias), Vdd, current density
– NBTI and HCE impact is increasing in upcoming technologies
  – A. Tiwari, MICRO 2008
  – S. Sapatnekar, ISQED 2006

59 Optimizations – Vulnerability Control Hybrid
[Figure: pipeline (Fetch, Decode, RAT, ISQ, ALU, Reorder Buffer (PRF), ARF).]
– Dispatch bandwidth is not effectively utilized.
– Reduces the bottleneck in in-order units such as the result buffer.

60 Microprocessor Design: A Multi-Dimensional Problem
– Performance is not the only dimension:
  – Power
  – Thermal effects
  – Reliability
– The order of dimensions is driven by the market:
  – Aircraft, health care: reliability
  – Embedded: power, thermal
  – Desktops, game consoles: performance
– Data sensitivity is application-dependent.
[Table: integrity level of application domain (columns: application, examples, market volume, data integrity requirement). Levels range from low integrity (consumer electronics; huge market volume; low requirement) through moderate integrity (present-day automotive) and high integrity (enterprise server) to safety-critical (flight control; small market volume; very high requirement). Source: Hermann Kopetz, "Mitigation of Transient Faults at the System Level – the TTA approach", SELSE 2006]

61 GALS Architecture
– Domains are driven by individual clocks; each domain is internally synchronous.
– Careful tuning of a global clock distribution network is avoided, allowing better frequency scaling.
– Different domains interact through FIFO buffers.
[Figure: six clock domains covering Fetch, Decode, Rename, register read, FP/Mem/Int issue queues, execute, writeback, retire, register file, and D-cache.]
– DVFS gives high performance per watt; GALS enables fine-grained VF scaling tuned to individual domains.

62 Contributions
– DVFS algorithms for GALS architectures have been studied with respect to IPC per watt, but their reliability impact has been ignored.
– Voltage scaling affects FIT_raw; frequency scaling affects AVF.
– Characterize the impact on architectural vulnerability of applying different DVFS algorithms.
– Characterize the vulnerability efficiency (AVF × Watts / IPC) of DVFS algorithms.
– Help designers choose DVFS algorithms that meet reliability requirements.
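The vulnerability-efficiency metric combines the three quantities tracked in this study; lower is better, since it rewards low AVF and low power per unit of throughput. A minimal sketch for ranking algorithms by it; the algorithm names and numbers are placeholders, not results from this work.

```python
from typing import Dict, List, Tuple


def vulnerability_efficiency(avf: float, watts: float, ipc: float) -> float:
    """AVF * Watts / IPC (lower is better)."""
    if ipc <= 0:
        raise ValueError("IPC must be positive")
    return avf * watts / ipc


def rank_by_vulnerability_efficiency(results: Dict[str, Tuple[float, float, float]]) -> List[str]:
    """Order algorithm names from best (lowest) to worst vulnerability efficiency."""
    return sorted(results, key=lambda name: vulnerability_efficiency(*results[name]))


# Placeholder (avf, watts, ipc) triples for three hypothetical algorithms.
measured = {"alg_x": (0.2, 10.0, 2.0), "alg_y": (0.1, 10.0, 2.0), "alg_z": (0.3, 5.0, 1.0)}
print(rank_by_vulnerability_efficiency(measured))  # ['alg_y', 'alg_x', 'alg_z']
```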

