Low Power and Reliable Design for Emerging Technologies

1 Low Power and Reliable Design for Emerging Technologies
Yuanqing Cheng, Assistant Professor, CADET Laboratory, School of Electronic and Information Engineering, Beihang University, 6/30/2019

2 Introduction to Myself
2012, Ph.D. degree from the Institute of Computing Technology, Chinese Academy of Sciences , post-doc research at the CNRS/LIRMM Laboratory, Montpellier, France (co-advisors: Patrick Girard and Aida Todri-Sanial). Joined Beihang University in Dec , visiting scholar at the University of California, Santa Barbara, CA, US.

3 Outline
Reliable Design for 3D Integrated Circuits; Low Power Design and Reliable Design for Emerging Memory Technologies (STT-MRAM, Carbon Nanotube); Conclusions

4 Topic 1: Reliable Design for 3D Integrated Circuits
Introduction to 3D ICs; Electromigration Elimination Techniques for 3D ICs; Power Supply Noise Reduction Technique for 3D ICs

5 Alleviating Through Silicon Via Electromigration for Three-dimensional Integrated Circuits Taking Advantage of Self-healing Effect [TVLSI’16]
Yuanqing Cheng1, Aida Todri-Sanial2, Jianlei Yang3, Weisheng Zhao1. 1. The School of Electrical and Information Engineering, Beihang University, Beijing, China; 2. LIRMM, CNRS/University of Montpellier, Montpellier, France; 3. ECE Department, University of Pittsburgh, PA, USA

6 Electromigration Elimination Techniques for 3D ICs
Advantages of 3D: smaller global timing delay; smaller interconnect power consumption; higher integration density (smaller form factor); integration of disparate technologies.
Challenges of 3D: chip yield due to the novel fabrication process; thermal-related issues; higher current density threatening the reliability of 3D ICs.
[Figure labels: Through Silicon Via (TSV), F2B bonding, B2B bonding, F2F bonding, C4 bump]

7 Electromigration in 2D ICs
High current density; mass transport of metal atoms; void and hillock formation; interconnect breakdown or short. [Figure labels: hillock, void, metal atom, current flow]

8 Electromigration in 3D ICs
Higher current density due to the higher power consumption of multiple tiers; thermal cycling; discontinuous bonding interface of the TSV; TSV defects (filling voids, misalignment, bonding interface contamination). TSV breakdown due to EM [T. Frank et al., IRPS, 2011]

9 Related Work TSV EM effect modeling
Pak et al. evaluated the EM impact on TSVs from the layout perspective and provided guidelines for EM-robust TSV design [ECTC’2011].
Chen et al. proposed a TSV EM model based on the finite element method to predict failure positions within a single TSV [ICEPT-HDP’2010].
Frank et al. explored the EM impact on TSV resistance and derived an analytical formula to describe the relationship [IRPS’2011].
2D interconnect EM effect investigation
Gonzalez et al. investigated the shape effect on electromigration for metal interconnects [Microelectronics Reliability, 1997].
J. Abella et al. proposed an EM mitigation technique that alternates current flows within signal interconnects [MICRO’08].
Li et al. emphasized the importance of considering EM reliability across the whole work flow, from foundry fabrication up to system design [ASP-DAC’15].
Guan et al. analyzed the EM effect on the reliability of signal lines, which carry AC current, and proposed a theoretical model to quantify the healing effect due to AC currents [ECTC’15].

10 TSV defects: filling void, misalignment, bonding interface contamination
[Fraunhofer] [Ziptronix]

11 Current flows from A to B continuously and causes EM effect.
[Figure: current flow between A and B; A→B: 1, B→A: 0]
1. Original TSV state is ‘0’.
2. A sends ‘1’ to B. Current flows from A to B to charge the TSV signal line.
3. B sends ‘0’ to A. Current flows from A to B to discharge the TSV signal line.

12 Online self-healing circuit
[Figure labels: off-line defective TSV detection, fault map, switch network, defective TSV, online self-healing circuit]

14 Neighboring EM mitigation module sharing
[Figure labels: TSV, defective TSV, EM mitigation module]

16 Depending on the current direction, control whether to change the current flow or not.
Judge the current direction. Synchronize with the top tier and recover the reversed signal at the receiving end.

17 Simulation target; Configuration

18 3% defective TSV rate; 1% and 5% defective TSV rates

19 Power Supply Noise-Aware Workload Assignments for Homogeneous 3D MPSoCs with Thermal Consideration
Yinglin Zhao1,2, Jianlei Yang1,3, Weisheng Zhao1,2, Aida Todri-Sanial*4, Yuanqing Cheng*2. 1. Fert Beijing Research Institute, BDBC; 2. School of Electrical and Information Engineering, Beihang University, Beijing, China; 3. School of Computer Science and Engineering, Beihang University, Beijing, China; 4. LIRMM, University of Montpellier / CNRS, Montpellier, France

20 Introduction: Power supply to 3D ICs

22 Step 1: input the core architecture and technology parameters to set up the architecture-level simulator.
Step 2: convert the power traces into current traces and feed them into the 3D MPSoC PDN model for PSN calculations.
Step 3: formulate the task scheduling problem and propose a heuristic algorithm to solve it.
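
As a rough illustration of Step 2, the sketch below converts a power trace into a current trace using I = P / Vdd. The function name, the supply voltage value, and the sample trace are assumptions made for illustration; the slides do not give the conversion used or the PDN model itself.

```python
def power_to_current(power_trace_w, vdd=1.0):
    """Convert power samples (watts) to current samples (amperes) via I = P / Vdd.

    `vdd` is an assumed nominal supply voltage, not a value from the slides.
    """
    if vdd <= 0:
        raise ValueError("vdd must be positive")
    return [p / vdd for p in power_trace_w]


# Hypothetical per-interval power samples for one core.
power_trace = [0.5, 1.0, 2.0]
current_trace = power_to_current(power_trace, vdd=1.0)
print(current_trace)
```

The resulting current samples would then serve as the load currents applied to a PDN model when estimating power supply noise.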

23 Introduction to Spintronics
“Electron does not have only a charge, but also a spin.” Is it possible to construct a practical electronic device that operates on the spin of the electron, rather than its charge? Albert Fert, Peter Grünberg: Giant MagnetoResistance (GMR), A. Fert et al., PRL, 1988. FM: Ferromagnetic; NM: Non Magnetic (Metal). Claude Chappert, Albert Fert, Nature Materials, 2007

24 GMR was a good success story of high technology
[Figure: read head of a hard disc drive; labels: track, GMR sensor, 5 nm, magnetic fields generated by the media, voltage, current I] 1997 (before GMR): 1 Gbit/in²; 2007: GMR heads ~ 800 Gbit/in²

25 The High R of MTJ is similar to R of transistors
High TMR ratio, up to 600%. High resistance, similar to semiconductor transistors.

26 MRAM R&D started from 1996
1996: DARPA (MIT, Honeywell, Motorola, IBM). MRAM: Magnetic Random Access Memory. P (parallel) for low resistance, AP (anti-parallel) for high resistance, encoding “1” and “0”. MRAM cell: 1 MTJ + 1 NMOS. [Figure labels: free layer, barrier (MgO), reference layer, bit line, word line, source line]

27 Perpendicular Vs. In-Plane STT-MRAM

28 Comparisons with Other Memory Technologies

29 MTJ for Beihang

30 Write Energy Optimizations for STT-MRAM LLCs by Data Pattern Recognition [ISVLSI’18]

31 Motivation
STT-MRAM write procedure; write energy challenge.
The write procedure of STT-MRAM injects a spin-polarized current from the bit line to the source line, or in the reverse direction, depending on the data written to the cell. The write current of STT-MRAM is usually much larger than the read current. In addition, as the write current formula in the slide shows, the write current increases remarkably as the write time shrinks. We ran a simulation on NVSim with the parameters provided by the MTJ models developed by Purdue University; the figure compares the write energy of SRAM and STT-MRAM. There is a huge gap between the energy consumed by SRAM and that consumed by STT-MRAM, so it is imperative to reduce the write energy overhead of STT-MRAM.

32 Related Work
Hybrid LLC cache [HPCA’09, NANOARCH’17]. Relax non-volatility [HPCA’14]. Write procedure optimizations: early-write termination [ICCAD’09]; AP-P state reversion [DATE’14]; multi-level cell and adaptive writing [ISQED’14].
In fact, there are already some research efforts to optimize the energy consumption of STT-MRAM based LLCs. One solution is the hybrid cache design: by combining SRAM and STT-MRAM, we can take advantage of the fast write speed and low write latency of SRAM while STT-MRAM reduces the leakage energy. Another solution is to reduce the thermal reliability of

33 Our Observation: the potential of write energy reduction from the data pattern perspective [Figure panels: SPEC2KINT, SPEC2KFP]

34 Data Pattern Characterization
Emphasize the common case. Is storing the data pattern better? (16/(4*32) = 12.5%) Expensive! How about putting the patterns in the index table? 38 even larger than the overhead of the above method!

35 Our Proposed Scheme
One interesting observation: only a few cache line patterns dominate. Another question: what is the energy saving of a frequent pattern? An example: pattern 0010 0010 0010 0010, Pfi = 10/100 = 10%, PEi = 4/8 = 50%, Wi = 0.05.

36 Capturing Dominating Patterns
Evaluating the potential. How to deal with pattern variations across different applications? Profiling; sorting; filling in the ROM index table.
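
The profiling, sorting, and table-filling steps can be sketched as follows. This is only a sketch under stated assumptions: the weight W = Pf × PE is inferred from the example numbers on the previous slide (10% × 50% = 0.05), and `build_index_table`, `toy_saving_ratio`, and the sample trace are hypothetical names and data, not the authors' implementation.

```python
from collections import Counter


def build_index_table(observed_patterns, saving_ratio, entries=16):
    """Rank patterns by W = Pf * PE and keep the top `entries`.

    Pf is the pattern's frequency in the profile; PE comes from the
    caller-supplied `saving_ratio` function (a placeholder assumption).
    """
    counts = Counter(observed_patterns)  # profiling
    total = len(observed_patterns)
    weights = {
        pattern: (count / total) * saving_ratio(pattern)
        for pattern, count in counts.items()
    }
    # Sorting: highest weight first, pattern string as a deterministic tie-break.
    ranked = sorted(weights, key=lambda p: (-weights[p], p))
    return ranked[:entries], weights  # filling the index table


def toy_saving_ratio(pattern):
    # Hypothetical per-pattern energy-saving ratios, for illustration only.
    return 0.5 if pattern == "0010001000100010" else 0.1


trace = ["0010001000100010"] * 10 + ["1111000011110000"] * 90
table, weights = build_index_table(trace, toy_saving_ratio)
print(table)
```

With this toy trace, the pattern seen in 10% of writes gets weight 0.10 × 0.5 = 0.05, matching the slide's example, while the other pattern gets 0.90 × 0.1 = 0.09 and is ranked first.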

37 The Big Picture of Our Scheme
Read procedure; write procedure

38 Experimental Results
Experiment setup. Write-back energy savings: 38%; 50%

39 Sensitivity Analysis
Impact of the number of index table entries. Overhead: 16 entries * (16-bit pattern code + 4-bit index); 4/(32*8) = 1.6%
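
The overhead figures quoted on the slides can be checked with a few lines of arithmetic. The interpretation of each term (a 4-bit index against 32*8 bits, a 16-bit pattern against 4*32 bits) follows the expressions as written on the slides; what each denominator physically represents is an assumption here.

```python
def overhead(extra_bits, covered_bits):
    """Return the storage overhead as a fraction of the covered bits."""
    return extra_bits / covered_bits


index_overhead = overhead(4, 32 * 8)     # 4/(32*8), quoted as 1.6%
pattern_overhead = overhead(16, 4 * 32)  # 16/(4*32), quoted as 12.5%
table_bits = 16 * (16 + 4)               # 16 entries * (pattern code + index)

print(f"index: {index_overhead:.2%}, pattern: {pattern_overhead:.2%}, table: {table_bits} bits")
```

The index-based figure evaluates to about 1.56%, which the slide rounds to 1.6%.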

40 Summary
A few frequently occurring data patterns dominate (make the common case energy efficient!). Capturing the dominating patterns and constructing an index table gives an efficient pattern characterization scheme. Write energy can be reduced significantly (38% for INT and over 50% for FP) with negligible storage overhead.

41 NEAR: A Novel Energy Aware Replacement Policy for STT-MRAM LLCs [ISCAS’18]

42 Brief Overview of Our Work
Write energy challenge of STT-MRAM. Our contributions: a novel cache replacement policy; low-overhead hardware implementation; 33.6% write energy saving with 0.5% performance overhead. [JAP, 46(2013)]

43 Preliminaries of STT-MRAM
MRAM evolution: Toggle MRAM [Everspin]; in-plane STT-MRAM; PMA STT-MRAM. Pros: fast read speed, non-volatile, low leakage power, high density. Cons: high write latency/energy, high cost, read disturbance.

44 Related Work Low power design for STT-MRAM LLCs
Comparisons with SRAM/DRAM [DAC’08]; STT-MRAM/SRAM hybrid structure [HPCA’09]; relaxing thermal stability [HPCA’14]; early write termination [ICCAD’09]; switching characteristics (AP-to-P transition) [DATE’14]. Our angle: cache management policy

45 Some Interesting Observations
Write challenge of STT-MRAM LLCs. Write energy unawareness of traditional cache replacement. Compact model of Beihang Univ.

46 The Proposed Scheme — “NEAR” Policy
The whole working flow. [Figure labels: CPU, SRAM cache, addr, write-back data, MinHash engine, comparator (=), data, tag, MUX]

47 MinHash Engine Implementation
Inspired by the MinHash search engine: high complexity; long matching latency. Our proposed implementation

48 Further Considerations
Balance between energy saving and performance. Overhead analysis: storage (index ROM, 64 bytes) vs. MB LLCs; energy overhead (comparators) ~ 4.4-bit write energy; latency incurred: 2 ns (comparators)

49 Experimental Setup Architectural configurations
Benchmarks and simulators: SPEC2K; NVSim for cache structure optimization and evaluation; Gem5 for performance evaluations. Simulation method: XX instructions executed, detailed configurations

50 Experimental Results (1/2)
Performance comparisons: 0.5%. Write energy savings: 33.61%

51 Experimental Results (2/2)
Parameter sensitivity analyses: trade-off between performance and energy saving (α); different cache configurations

52 Summary
Investigating write energy optimization. A novel cache replacement policy is proposed: MinHash engine based; trade-off between performance and energy saving. 33.61% of write energy can be saved with 0.5% performance degradation and negligible hardware overhead.

53 An Adaptive 3T-3MTJ Memory Cell Design for STT-MRAM Based LLCs [ICCAD’16, TVLSI’18]

54 Background Introduction to STT-MRAM and modeling
Some commonly used STT-MRAM cell structures: 4T-4MTJ [IEDM’13]; 3T-2MTJ [IMW’13]; 4T-2MTJ [VLSI’12]; 1T-1MTJ [IEDM’09]; 2T-2MTJ [IEDM’13]

55 Our Proposed 3T-3MTJ Design
Combination of reference (Ref.) sensing and differential (Diff.) sensing

56 Validations of the 3T-3MTJ Design
Waveforms of a single read operation

57 Robustness of 3T-3MTJ Cell Structure
Monte Carlo simulation settings. Stage 1 sensing; stage 2 sensing

58 Comparisons (Cell Level)
Layout and area comparisons; read energy and write energy comparisons
Cell      Area (F²)   Read energy (pJ)   Read latency (ns)   Write energy (pJ)   Write latency
1T-1MTJ   27.36       0.4                2.3                 4.7                 -
2T-2MTJ   66.96       0.026              0.2                 9.4
3T-3MTJ   40.68       0.5/2 = 0.25       3/2 = 1.5           7

59 The Memory Array Structure

60 Comparisons (Array Level)
Array area; read energy; read performance (latency); write energy

61 Reliability Assessment
Write probability analyses; write activities

62 The Adaptive Cache Design
Performance comparisons

63 Temperature Impact Analysis and Access Reliability Enhancement for 1T1MTJ STT-RAM

64 Thermal Analysis of 1T1MTJ STT-RAM
Motivation: TMR varies with temperature.

65 Validation of the Thermal Model

66 Read/Write Circuit for Evaluation
Sensing circuit; write circuit; parameters

67 Thermal Analysis of Read Operation
Read margin and energy

68 Read Challenges with Thermal Issue
Error rates

69 How About Write?
Write performance increases with temperature. Write error rate decreases with temperature. [Figure labels: write operation timing, read ‘1’, write error rate]

70 How About the Situation at 1X nm?
Model scaling and validation. Read: read performance degrades with temperature; read energy decreases slightly with temperature due to the sharp increase of MTJ resistance. Write: write performance improves with temperature; write energy decreases significantly with temperature due to large resistance variations. [Figure label: error rate]

71 A Novel SA Design for Thermal Reliability
Body-biasing SA design. Read margin comparison: the read margin can be improved. Read disturbance decreases due to the reduced read current.

72 Thermosiphon: A Thermal Aware NUCA Architecture for Write Energy Reduction of the STT-MRAM based LLCs

73 High Performance Desires Large Memory on Chip
[Figure labels: first processor (single-core); first Core™, Yonah (dual-core); Core™ i7 (8 cores)] The Intel i7-5960X: 20 MB on-chip LLC. The Intel Xeon Phi: 30 MB on-chip L2 cache.

74 Leakage Power – Nightmare!
Challenges: high static power consumption due to the CMOS leakage current; high power density, which will increase the working temperature of the CPU. Fig. 1: The power dissipation trend of integrated circuits [1]. [1] L. Wilson. International Technology Roadmap for Semiconductors (ITRS)[R]

75 Promising Thermal Properties
$\Delta = \frac{H_k M_s}{k_B T} V_{ol}$
Thermal properties [2]: write energy/latency drop dramatically; read energy/latency show slight fluctuations.
$\Delta$: thermal stability of the MTJ; $M_s$: saturation magnetization; $V_{ol}$: MTJ volume; $k_B$: Boltzmann constant
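
To illustrate the temperature dependence in the thermal stability formula, the sketch below rescales a reference Δ to another temperature. It assumes that everything except T in the formula is held fixed, which is a simplification; the function name and the numeric values (Δ = 60 at 300 K) are illustrative assumptions, not data from the slides.

```python
def delta_at(delta_ref, t_ref_kelvin, t_kelvin):
    """Rescale thermal stability to a new temperature, assuming delta ∝ 1/T.

    Assumes the other terms of the formula do not change with temperature.
    """
    if t_ref_kelvin <= 0 or t_kelvin <= 0:
        raise ValueError("temperatures must be positive (kelvin)")
    return delta_ref * t_ref_kelvin / t_kelvin


# Illustrative values only: delta = 60 at 300 K, evaluated at 360 K.
hot_delta = delta_at(60.0, 300.0, 360.0)
print(hot_delta)
```

Under this simplified model, a hotter cell has a lower thermal stability factor, which is consistent with the slide's point that writes become cheaper at higher temperature.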

76 Motivation of thermal aware NUCA design(2/2)
NUCA architecture. The current migration policy can’t exploit STT-MRAM’s full potential.

77 Design and Implementation of “Thermosiphon”
Hot region (light gray in the left figure); cool region (dark gray in the left figure). [3] F. Mesa-Martinez, E. Ardestani and J. Renau. Characterizing Processor Thermal Behavior. In ASPLOS, pages 193–204. ACM, 2010.

78 Design and Implementation of “Thermosiphon”(Cont.)
Implementation details. [Figure labels: boundary bank, access]

79 Experiment setup: Cadence Spectre, NVSim, Gem5, HotSpot

80 Experimental Results
Largest improvement: 7%. Hybrid-1: TNUCA 5.8%, our work 7%. Hybrid-2: TNUCA 2.5%, our work 3.9%.

81 Experimental Results (Cont.)
Saves 22.5% of write energy on average. More write operations are migrated into the hot region compared with T-NUCA.

82 Param. Sensitivity Analysis
According to the counter-bit experimental results, the access counter is set to 4 bits and the ratio counter counts up to a maximum of 6. Counter refresh policy

83 Conclusions
In this work, taking thermal effects into consideration, we propose a thermal-aware NUCA design, “Thermosiphon”. The experimental results show that, compared to the baseline, the proposed NUCA design improves performance by up to 7% and reduces write energy by 22.5% on average, with only 1.3% extra hardware overhead.

84 Acknowledgement
National Natural Science Foundation; Beijing Natural Science Foundation; The State Key Lab Open Project Funding, CAS; Huawei Technologies

