
From Crash-and-Recover to Sense-and-Adapt: Our Evolving Models of Computing Machines. Rajesh K. Gupta, UC San Diego.



Presentation transcript:

1 From Crash-and-Recover to Sense-and-Adapt: Our Evolving Models of Computing Machines. Rajesh K. Gupta, UC San Diego.

2 To a software designer, all chips look alike. To a hardware engineer, a chip is delivered as per contract in a data sheet.

3 Reality is: computers are built on stuff that is imperfect and…

4 Changing: From Chiseled Objects to Molecular Assemblies. 45nm implementation of the Leon3 processor core. Courtesy: P. Gupta, UCLA.

5 Engineers Know How to "Sandbag". PVTA margins add to guardbands:
– Static: Process variation in effective transistor channel length and threshold voltage.
– Dynamic: Temperature fluctuations, supply Voltage droops, and device Aging (NBTI, HCI).
(Figure: clock guardband versus actual circuit delay under temperature, aging, VCC droop, and across-wafer frequency variation.)

6 Uncertainty Means Unpredictability.
VLSI Designer: Eliminate it. Capture physics into models (statistical or plain-old Monte Carlo; manufacturing, temperature effects): simulate a 'degraded' netlist with model input changes (ΔVth); deterministic simulations capture known physical processes (e.g., aging); multiple (Monte Carlo) simulations wrapped around a nominal model.
Architect: Average it out. Workload (dynamic) variations.
Software, OS: Deny it. Simplify and re-organize the OS/tasks, breaking them into parts that are precise (worst case) and imprecise (average case).
Each doing their own thing: massive overdesign…

7 Let us step back a bit: the HW-SW stack. Applications | Operating System | Hardware Abstraction Layer (HAL).

8 Let us step back a bit: the HW-SW stack. Applications | Operating System | Hardware Abstraction Layer (HAL), with hardware varying across time or part.

9 Let us step back a bit: the HW-SW stack over hardware varying across time or part. The result is overdesigned hardware: 20x in sleep power, 50% in performance, a 40% larger chip, 35% more active power, 60% more sleep power.

10 What if? The same stack (Applications | Operating System | Hardware Abstraction Layer) over underdesigned hardware, varying across time or part.

11 New Hardware-Software Interface: Underdesigned Hardware plus Opportunistic Software, with minimal variability handling in hardware, in contrast to traditional fault tolerance. Hardware varies across time or part.

12 UNO Computing Machines Seek Opportunities Based on Sensing Results.
Variability manifestations: faulty cache bits, delay variation, power variation.
Variability signatures: cache bit map, CPU speed-power map, memory access time, ALU error rates.
Responses: Do nothing (elastic user, robust app); change algorithm parameters (codec setting, duty-cycle ratio); change algorithm implementation (alternate code path, dynamic recompilation); change hardware operating point (disabling parts of the cache, changing V-f).
Mechanisms: sensors, models, metadata; reflection, introspection.

13 UnO Computing Machines: Taxonomy of Underdesign. Flow: nominal design under performance constraints, manufacturing, manufactured die, hardware characterization tests, signature burn-in, manufactured die with stored signatures, die-specific adaptation, spanning both hardware and software. Courtesy: Puneet Gupta, UCLA.

14 Several Fundamental Questions. How do we distinguish between code that needs to be accurate and code that does not? How fine-grained are these distinctions (or do they have to be)? How do we communicate this information across the stack in a manner that is robust and portable, and error-controllable (= safe)? What is the model of error that should be used in designing UNO machines?

15 Building machines that leverage the move from Crash & Recover to Sense & Adapt.

16 Expedition Grand Challenge & Questions: "Can microelectronic variability be controlled and utilized in building better computer systems?"
Three goals: (a) address fundamental technical challenges (understand the problem); (b) create experimental systems (proof-of-concept prototypes); (c) pursue educational and broader-impact opportunities (ensure training for future talent).
Questions: What are the most effective ways to detect variability? What are the software-visible manifestations? What are the software mechanisms to exploit variability? How can designers and tools leverage adaptation? How do we verify and test HW-SW interfaces?

17 Thrusts traverse institutions on testbed vehicles, seeding various projects.
Group A (Signature Detection and Generation): characterizing variability in power consumption for modern computing platforms, and its implications; runtime support and software adaptation for variable hardware; probabilistic analysis of faulty hardware; understanding and exploiting variability in flash memory devices; FPGA-based variability simulator.
Group B (Variability Mitigation Measures): mitigating variability in solid-state storage devices; hardware solutions to better understand and exploit variability; VarEmu emulation-based testbed for variability-aware software; variability-aware opportunistic system software stack; application robustification for stochastic processors.
Group C (Opportunistic Software and Abstractions): effective error resilience; negative bias temperature instability and electromigration; memory-variability-aware runtime systems; design-dependent ring oscillator and software testbed; executing programs under relaxed semantics.

18 Observe and Control Variability Across the Stack. Monitor manifestations from the instruction level up to the task level: Instruction-level Vulnerability (ILV), Sequence-level Vulnerability (SLV), Procedure-level Vulnerability (PLV), Task-level Vulnerability (TLV). By the time we get to TLV, we are in a parallel software context: instruct the OpenMP scheduler, and even create an abstraction for programmers to express irregular and unstructured parallelism (code refactoring). These steps build variability abstractions up to the SW layer [ILV, SLV, PLV, TLV]. Rahimi et al., DATE'12, ISLPED'12, TC'13, DATE'13.

19 Closer to HW: Uncertainty Manifestations. The most immediate manifestations of variability are path delay and power variations. Path delay variations have been addressed extensively in delay-fault detection by the test community. With variability, it is possible to do better by focusing on the actual mechanisms; for instance, a major source of timing variation is voltage droops, and errors matter only when they end up in a state change. Combine these two observations and you get a rich literature in recent years for handling variability-induced errors: Razor, EDS, TRC, …

20 Detecting and Correcting Timing Errors. Detect an error, tune the supply voltage to reach a target error rate, borrow time, stretch the clock:
– Exploit detection circuits (e.g., for voltage droops) and double sampling with shadow latches; exploit the data dependence of circuit delays.
– Enable reduction in voltage margin; manage timing guardbands and voltage margins.
– Tunable replica circuits allow non-intrusive operation.
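The detect-and-tune idea above can be sketched as a simple feedback loop (a hedged sketch, not the hardware interface: measure_error_rate, the step size, and the voltage limits are illustrative assumptions):

```python
def tune_voltage(measure_error_rate, v_start=1.10, v_min=0.72,
                 target=0.001, step=0.01, max_iters=100):
    """Lower Vdd until the observed timing-error rate exceeds the target,
    then back off one step: the 'tune supply voltage to reach an error
    rate' idea on this slide. Voltages are kept on a rounded grid."""
    v = v_start
    for _ in range(max_iters):
        nxt = round(v - step, 4)
        if measure_error_rate(v) > target:
            v = round(v + step, 4)   # overshot: restore one step of margin
            break
        if nxt < v_min:
            break                    # at the floor with margin to spare
        v = nxt                      # still error-free: keep shaving voltage
    return v

# Toy error-rate model (assumed, for illustration): errors appear below 0.90 V.
toy_rate = lambda v: 0.0 if v >= 0.90 else 0.05
```

In hardware the "measurement" comes from in-situ detection circuits (Razor-style shadow latches, EDS); here it is a stand-in function.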

21 Sensing: Razor, Razor II, EDS, Bubble Razor. Double sampling (Razor I) [Ernst'03]; transition detector with time borrowing [Bowman'09]; Razor II [Das'09]; double sampling with time borrowing [Bowman'09]; EDS [Bowman'11].

22 Task Ingredients: Model, Sense, Predict, Adapt.
I. Sense & Adapt: observation using in-situ monitors (Razor, EDS) with cycle-by-cycle corrections (leveraging CMOS knobs or replay).
II. Predict & Prevent: rely on external or replica monitors; a model-based rule derives an adaptive guardband to prevent errors.

23 Characterize, Model, Predict. Don't fear errors: bits flip, and instructions don't always execute correctly. Bit error rate, timing error rate, instruction error rate, …

24 Characterize Instructions and Instruction Sequences for Vulnerability to Timing Errors. Characterize LEON3 in 65nm TSMC across the full range of operating conditions (−40°C to 125°C, 0.72V to 1.1V). Dynamic variations cause the critical path delay to increase by a factor of 6.1×.

25 Generate ILV, SLV "Metadata". The ILV (SLV) for each instruction i (sequence i) at every operating condition is quantified as
ILV_i = (1/N_i) Σ_{j=1..N_i} Violation_j
where N_i (M_i) is the total number of clock cycles in a Monte Carlo simulation of instruction i (sequence i) with random operands, and Violation_j indicates whether there is a violated stage at clock cycle j. That is, ILV_i (SLV_i) is the total number of violated cycles over the total simulated cycles for instruction i (sequence i). Now, I am going to jump over the characterization data…
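The definition above is a simple ratio; a minimal sketch (the violation trace here is illustrative, since real values come from gate-level Monte Carlo simulation with random operands):

```python
def vulnerability(violations):
    """ILV_i (or SLV_i): fraction of simulated clock cycles in which some
    pipeline stage violated timing, per the definition on this slide."""
    return sum(violations) / len(violations)

# Illustrative trace: 1 = a stage violated timing in that cycle, 0 = clean.
trace = [0, 0, 1, 0, 1, 0, 0, 0, 0, 0]
```

The same function serves for both ILV and SLV; only the simulated unit (single instruction vs. instruction sequence) differs.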

26 Connect the Dots from Paths to Instructions (VDD = 1.1V, T = 125°C). Observe: the execute and memory parts are sensitive to V/T variations, and also exhibit a large number of critical paths in comparison to the rest of the processor. Hypothesis: instructions that significantly exercise the execute and memory stages are likely to be more vulnerable to V/T variations, which motivates Instruction-level Vulnerability (ILV). For SPARC V8 instructions, (V, T, F) are varied and ILV_i is evaluated for every instruction i with random operands; SLV_i is evaluated for every high-frequency sequence i of instructions.

27 1) Classify Instructions in 3 Classes. Instructions are partitioned into three main classes: (i) logical & arithmetic; (ii) memory; (iii) multiply & divide. The first class shows an abrupt behavior when the clock cycle is slightly varied, mainly because the path distribution of the part it exercises is such that most paths have the same length; this yields an all-or-nothing effect, in which either all instructions within the class fail or all make it.
(Table: ILV at 0.88V while varying temperature (−40°C, 0°C, 125°C) and cycle time, for add, and, or, sll, sra, srl, sub, xnor, xor; load, store; mul, div.)

28 2) Check Them Across Temperature. (Table: ILV at 0.72V while varying temperature (−40°C, 0°C, 125°C) and cycle time, for the same instructions.) All instruction classes act similarly across the wide range of operating conditions: as the cycle time increases gradually, the ILV becomes 0, first for the 1st class, then for the 2nd class, and finally for the 3rd class. For every operating condition, ILV(3rd class) ≥ ILV(2nd class) ≥ ILV(1st class).

29 3) Classify Instruction Sequences. The top 20 high-frequency sequences (Seq1-Seq20) are extracted from 80 billion dynamic instructions of 32 benchmarks. Sequences are classified into two classes based on their similarities in SLV values: Class I (Seq20) consists only of arithmetic/logical instructions; Class II (Seq1-Seq19) is a mixture of all instruction types, including memory, arithmetic/logical, and control instructions.
(Table: SLV at (0.81V, 125°C) for Seq1-Seq20 across cycle times 1.26-1.35 ns.)

30 Classification of Sequences of Instructions (2/3). (Table: SLV at (0.81V, −40°C) for Seq1-Seq20 across cycle times 1.36-1.44 ns; the same trend holds under 165°C of temperature variation. A second table compares Class I and Class II SLV at (0.81V, 125°C) and (0.72V, 125°C).) For every operating condition, SLV(Class II) ≥ SLV(Class I). Sequences in Class II need higher guardbands than Class I because, in addition to the ALU's critical paths, the critical paths of memory are activated (for load/store instructions), as well as the critical paths of the integer condition codes (for control instructions).

31 ILV and SLV: Partition Instructions into Groups According to Their Vulnerability to Timing Errors. ILV classes: 1st = logical and arithmetic; 2nd = memory; 3rd = multiply and divide. SLV classes: Class I = logical and arithmetic; Class II = mixtures of memory, logic, and control. For every operating condition: ILV(3rd class) ≥ ILV(2nd class) ≥ ILV(1st class), and SLV(Class II) ≥ SLV(Class I). The classification covers the integer SPARC V8 ISA; SLV uses the top 20 high-frequency sequences from 80 billion dynamic instructions of 32 benchmarks.

32 Apply Statically to Achieve Higher Instruction Throughput and Lower Power: Use Instruction Vulnerabilities to Generate Better Code, Calls/Returns.

33 Now Use ILV, SLV to Dynamically Adapt Guardbands. At compile time, a variability-aware compiler uses ILV, SLV, and the application type:
I. Error-tolerant applications: duplication of critical instructions; satisfying the fidelity metric.
II. Error-intolerant applications: increasing the percentage of Class I sequences, i.e., increasing the number of arithmetic instructions relative to memory and control-flow instructions, e.g., through loop unrolling.
At runtime, adaptive clock scaling for each class of sequences mitigates the conservative inter- and intra-corner guardbanding. In every cycle, the PLUT module sends the desired frequency to the adaptive clocking circuit, using the characterized SLV metadata of the current sequence (delivered via memory-mapped I/O) and the operating condition (V, T) monitored by the CPM, on a LEON3 core (IF, ID, RA, EX, ME, WB pipeline with I$ and D$).
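The per-cycle lookup can be sketched as a table from (sequence class, operating corner) to clock period (a hedged sketch: the classes, corners, and periods below are illustrative stand-ins for the characterized SLV metadata, not the actual PLUT contents):

```python
# PLUT sketch: characterized safe cycle times (ns) per sequence class
# and operating corner, loosely shaped like the slide-29/30 data.
PLUT = {
    ("ClassI",  (0.81, 125)): 1.27,   # arithmetic/logical-only sequences
    ("ClassII", (0.81, 125)): 1.35,   # memory/control mixes need more margin
    ("ClassI",  (0.72, 125)): 1.79,
    ("ClassII", (0.72, 125)): 1.87,
}

def adaptive_clock(seq_class, corner):
    """Return the clock period for the current sequence class at the
    operating condition reported by the CPM; fall back to the worst
    characterized period when the corner is unknown."""
    return PLUT.get((seq_class, corner), max(PLUT.values()))
```

The fallback mirrors the safety argument on the slides: when the metadata cannot help, the conservative worst-case guardband applies.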

34 Utilization of SLV at Compile Time. Applying loop unrolling produces a longer chain of ALU instructions; as a result, the percentage of Class I sequences increases by up to 41% (31% on average). Hence, adaptive guardbanding benefits from this compiler transformation to further reduce the guardband for Class I sequences.

35 Effectiveness of Adaptive Guardbanding. Using online SLV coupled with offline compiler techniques enables the processor to achieve a 1.6× average speedup for error-intolerant applications compared to recent work [Hoang'11], by adapting the cycle time to dynamic variations (inter-corner) and to different instruction sequences (intra-corner). Adaptive guardbanding achieves up to 1.9× performance improvement for error-tolerant (probabilistic) applications in comparison to traditional worst-case design.

36 Example: Procedure Hopping in a Clustered CPU, Each Core with Its Own Voltage Domain. Statically characterize each procedure for PLV. A core increases its voltage if its monitored delay is high; a procedure hops from one core to another if its voltage variation is high. Less than 1% cycle overhead on EEMBC. (Figure: per-core frequency maps f0-f15 at VDD = 0.81V and 0.99V under VA-VDD-hopping.)

37 HW/SW Collaborative Architecture to Support Intra-cluster Procedure Hopping. The code is easily accessible via the shared-L1 I$; data and parameters are passed through the shared stack in TCDM (Tightly Coupled Data Memory); a procedure hopping information table (PHIT) keeps the status of a migrated procedure.

38 Apply: Model, Sense, and Adapt Dynamically. Combine characterization with online recognition.

39 Consider a Full Permutation of PVTA Parameters. For 10 32-bit integer and 15 single-precision FP functional units (FUs): for each FU_i working at t_clk under given PVTA variations, we define the Timing Error Rate (TER). The characterization sweep:

Parameter         Start    End      Step    # Points
Voltage           0.88V    1.10V    0.01V   23
Temperature       0°C      120°C    10°C    13
Process (σ_WID)   0%       9.6%     3.2%    4
Aging (ΔV_th)     0mV      100mV    25mV    5
t_clk             0.2ns    5.0ns    0.2ns   25
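The full permutation above can be enumerated directly; a sketch following the table (the point counts multiply out to 23 x 13 x 4 x 5 x 25 = 149,500 characterization runs per FU):

```python
import itertools

def frange(start, stop, step):
    # inclusive range on a rounded grid, avoiding float drift
    n = round((stop - start) / step)
    return [round(start + i * step, 4) for i in range(n + 1)]

voltages     = frange(0.88, 1.10, 0.01)   # V, 23 points
temperatures = frange(0, 120, 10)         # deg C, 13 points
process      = frange(0.0, 9.6, 3.2)      # sigma_WID %, 4 points
aging        = frange(0, 100, 25)         # delta-Vth mV, 5 points
t_clk        = frange(0.2, 5.0, 0.2)      # ns, 25 points

# One TER measurement per (V, T, P, A, t_clk) point.
sweep = list(itertools.product(voltages, temperatures, process, aging, t_clk))
```

Each tuple in `sweep` would drive one gate-level simulation of the FU; the slide's model fitting then consumes the resulting (point, TER) pairs.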

40 Parametric Model Fitting. We used supervised learning (linear discriminant analysis) to generate a parametric model at the FU level that relates PVTA parameter variation and t_clk to classes of TER: Class 0 (C_0): TER = 0%; Class Low (C_L): 0% < TER ≤ 33%; Class Medium (C_M): 33% < TER ≤ 66%; Class High (C_H): 66% < TER ≤ 100%. On average, across all FUs, the resubstitution error is 0.036, meaning the models classify nearly all training data correctly. For extra characterization points, the model makes correct estimates for 97% of out-of-sample data; the remaining 3% is misclassified into the high-error-rate class C_H, and thus will have a safe guardband. (Flow: the ASIC analysis flow for TER takes PVTA and t_clk, labels the TER class, and fits the parametric model by linear discriminant analysis, for hierarchically focused guardbanding, HFG.)
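The TER-to-class binning itself is simple to state; a sketch of the thresholds on this slide (the fitted LDA model that predicts the class from PVTA and t_clk is not reproduced here):

```python
def ter_class(ter):
    """Map a timing error rate (fraction in 0..1) to the slide's classes:
    C0: TER = 0; CL: 0 < TER <= 1/3; CM: 1/3 < TER <= 2/3; CH: above."""
    if ter == 0:
        return "C0"
    if ter <= 1 / 3:
        return "CL"
    if ter <= 2 / 3:
        return "CM"
    return "CH"
```

These labels are the targets the discriminant model is trained against; anything misclassified upward into CH simply gets a safe (conservative) guardband.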

41 Delay Variation and TER Characterization. At design time, the delay of the FP adder has a large uncertainty, [0.73ns, 1.32ns], since the actual values of the PVTA parameters are unknown.

42 Hierarchical Sensor Observability. The question is: what mix of monitors would be useful? The more sensors we provide for an FU, the greater the reduction in its conservative guardband. The guardband of the FP adder can be reduced by up to 8% (P sensor), 24% (PA sensors), 28% (PAT sensors), and 44% (PATV sensors). In-situ PVT sensors impose 1-3% area overhead [Bowman'09]; five replica PVT sensors increase core area by 0.2% [Lefurgy'11]; banks of 96 NBTI aging sensors occupy less than 0.01% of the core's area [Singh'11].

43 Online Utilization of Guardbanding. The control system tunes the clock frequency through an online model-based rule, at two granularities:
1. Fine-grained, instruction-by-instruction monitoring and adaptation that uses the signals of PATV sensors from individual FUs.
2. Coarse-grained, kernel-level monitoring that uses representative PATV sensors for the entire execution stage of the pipeline.

44 Throughput Benefit of HFG. At a target TER of 0, kernel-level monitoring improves throughput by 70% going from P to PATV sensors; instruction-level monitoring improves throughput by 1.8-2.1×.

45 Putting It Together: Coordinated Adaptation to Propagate Errors Toward the Application. Consider 16-core architectures with 8 shared FPUs.

46 Accurate and Approximate Operating Modes. Modeled after the STM P2012 16-core machine. In accurate mode, every pipeline uses (with 3.8% area overhead) EDS circuit sensors to detect any timing errors, and an ECU to correct errors using a multiple-issue operation replay mechanism (without changing frequency).

47 Accuracy-Configurable Architecture. In the approximate mode:
– The pipeline disables the EDS sensors on the N least-significant bits of the fraction, where N is reprogrammable through a memory-mapped register.
– The sign and exponent bits are always protected by EDS.
– Thus the pipeline ignores any timing error below the N least-significant bits of the fraction and saves on the recovery cost.
Switching between modes partially disables/enables the error detection circuits on the N fraction bits, so the FP pipeline can efficiently execute subsequent interleaved accurate and approximate software blocks.
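The effect of ignoring errors below the N least-significant fraction bits can be illustrated on the IEEE-754 single-precision layout (a sketch: the hardware disables error detection on those bits rather than zeroing them, so this only bounds the worst-case value perturbation):

```python
import struct

def drop_fraction_bits(x, n):
    """Zero the n least-significant bits of a float32 fraction, leaving
    the sign and exponent (the top 9 bits) untouched."""
    assert 0 <= n <= 23, "only the 23 fraction bits may be dropped"
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits &= ~((1 << n) - 1)          # clear the low n fraction bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]
```

For a value near pi (exponent 2^1), dropping 20 of the 23 fraction bits perturbs the result by at most 2^20 * 2^-22 = 0.25, while the sign and binade are preserved: the same bound that makes an error_significance_threshold of 20 tolerable for >30 dB PSNR.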

48 Fine-grained Interleaving Possible Through Coordination and Controlled Approximation.
Architecture: accuracy-reconfigurable FPUs that are shared among tightly coupled processors and support online FPV characterization.
Compiler: OpenMP pragmas for approximate FP computations; a profiling technique to identify tolerable error significance and error rate.
Runtime: the scheduler utilizes FPV metadata and promotes FPUs to accurate mode, or demotes them to approximate mode, depending on the code region's requirements. Either ignore timing errors (in approximate regions) or reduce the frequency of errors by assigning computations to correctable hardware resources, for a cost; ensure the safety of error ignorance through a set of rules.

49 FP Vulnerability Dynamically Monitored and Controlled by the ECU. The percentage of cycles with timing errors, as reported by the EDS sensors, is captured as FPV metadata, which is visible to software through memory-mapped registers. This enables the runtime scheduler to perform online selection of the best FP pipeline candidates: low-FPV units for accurate blocks, or steering errors, without correction, to the application.

50 OpenMP Compiler Extension. The directives are:

#pragma omp accurate
structured-block

#pragma omp approximate [error_significance_threshold(<N>)]
structured-block

Code snippet for a Gaussian filter utilizing the OpenMP variability-aware directives:

#pragma omp parallel
{
  #pragma omp accurate
  #pragma omp for
  for (i = K/2; i < (IMG_M-K/2); ++i) {      // iterate over image
    for (j = K/2; j < (IMG_N-K/2); ++j) {
      float sum = 0;
      int ii, jj;
      for (ii = -K/2; ii <= K/2; ++ii) {     // iterate over kernel
        for (jj = -K/2; jj <= K/2; ++jj) {
          float data = in[i+ii][j+jj];
          float coef = coeffs[ii+K/2][jj+K/2];
          float result;
          #pragma omp approximate error_significance_threshold(20)
          {
            result = data * coef;
            sum += result;
          }
        }
      }
      out[i][j] = sum/scale;
    }
  }
}

The approximate block is lowered to runtime calls:

int ID = GOMP_resolve_FP(GOMP_APPROX, GOMP_MUL, 20);  // invokes the runtime FPU scheduler
GOMP_FP(ID, data, coef, &result);                     // programs the FPU
int ID = GOMP_resolve_FP(GOMP_APPROX, GOMP_ADD, 20);
GOMP_FP(ID, sum, result, &sum);

51 FPV Metadata Can Even Drive Synthesis! Utilize fast, leaky standard cells (low-V_TH) for these paths, and the regular and slow standard cells (regular-V_TH and high-V_TH) for the rest of the paths, since their errors can be ignored.

52 Save Recovery Time and Energy Using FPV Monitoring (TSMC 45nm). For error-tolerant applications (Gaussian and Sobel filters), PSNR results support an error-significance threshold of N = 20 while maintaining >30 dB: FPUs are 36% more energy efficient, and recovery cycles are reduced by 46%. Five kernel codes serve as error-intolerant applications: 22% average energy savings.

Platform parameters: ARM v6 cores: 16; I$ size (per core): 16KB; I$ line: 4 words; hit latency: 1 cycle; miss latency: ≥59 cycles; TCDM banks: 16; TCDM latency: 2 cycles; TCDM size: 256KB; L3 latency: ≥60 cycles; L3 size: 256MB; shared FPUs: 8; FP ADD latency: 2; FP MUL latency: 2; FP DIV latency: 18.
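The >30 dB acceptance test is standard PSNR arithmetic; a sketch (assuming 8-bit images, so the peak value is 255, and with illustrative pixel data):

```python
import math

def psnr(ref, out, peak=255.0):
    """Peak signal-to-noise ratio in dB between a reference image and an
    approximated output, given as flat pixel lists of equal length."""
    mse = sum((a - b) ** 2 for a, b in zip(ref, out)) / len(ref)
    return float("inf") if mse == 0 else 10 * math.log10(peak * peak / mse)

ref = [100, 120, 140, 160]
out = [101, 119, 141, 160]   # small approximation error from dropped bits
```

An approximate configuration is acceptable when psnr(ref, out) stays above the 30 dB threshold quoted on the slide.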

53 Expedition Experimental Platforms & Artifacts. Building the research testbeds that drive our explorations poses interesting and unique challenges: mock-ups don't go far, since variability is at the heart of microelectronic scaling, so we need platforms that capture scaling and integration aspects. Testbeds to observe (Molecule, GreenLight, Ming) and to control (Oven, ERSA): Ming the Merciless, ERSA@BEE3, Molecule, Red Cooper.

54 Red Cooper Testbed. A customized chip with a processor plus speed/leakage sensors; a testbed board that closes the sensor feedback loop on board; used in building a duty-cycled OS based on variability sensors. (Diagram: applications, runtime, and microarchitecture/compilers over CPU, memory, storage, accelerators, and energy source/network (batteries), with vendor, process, ambient, and aging inputs and power, performance, and error outputs.)

55 Ferrari Chip: Closing the Loop On-Chip. Available since April 2013. ARM Cortex-M3 with JTAG, AMBA bus, GPIO, timers, PLL, RO clock configuration, 64 kB IMEM, 176 kB DMEM with ECC, and counters; 8 banks of sensors (N/P leakage, temperature, oxide) and 19 DDROs, with sensor outputs on GPIO. On-chip sensors are accessed via memory-mapped I/O and control (leakage sensors, DDROs, temperature sensors, reliability sensors), giving better support for the OS and software.

56 Sense-and-Adapt Fundamentally Alters the Stack. Machines that consist of parts with variations in performance, power, and reliability; machines that incorporate sensing circuits; machines with interfaces to change ongoing computation and structures; new machine models: QoS or relaxed-reliability parts.

57 Thank You! The Variability Expedition, an NSF Expeditions in Computing project. http://variability.org. Rajesh K. Gupta; Nikil Dutt, UCI; Puneet Gupta, UCLA; Mani Srivastava, UCLA; Steve Swanson, UCSD; Lara Dolecek, UCLA; Subhasish Mitra, Stanford; YY Zhou, UCSD; Tajana Rosing, UCSD; Alex Nicolau, UCI; Ranjit Jhala, UCSD; Sorin Lerner, UCSD; Rakesh Kumar, UIUC; Dennis Sylvester, UMich; Yuvraj Agarwal, CMU; Lucas Wanner, UCLA.

