Published by Christiana Rogers. Modified over 6 years ago.
1
Low Power and Reliable Design for Emerging Technologies
Yuanqing Cheng Assistant Professor CADET Laboratory School of Electronic and Information Engineering Beihang University 6/30/2019
2
Introduction to Myself
2012: Ph.D. degree from the Institute of Computing Technology, Chinese Academy of Sciences
Post-doc research at the CNRS/LIRMM Laboratory, Montpellier, France; co-advisors: Patrick Girard and Aida Todri-Sanial
At Beihang University since Dec
Visiting scholar at the University of California, Santa Barbara, CA, US
3
Outline
Reliable Design for 3D Integrated Circuits
Low Power Design and Reliable Design for Emerging Memory Technologies: STT-MRAM; Carbon Nanotube
Conclusions
4
Topic 1: Reliable Design for 3D Integrated Circuits
Introduction to 3D ICs; Electromigration Elimination Techniques for 3D ICs; Power Supply Noise Reduction Technique for 3D ICs
5
Alleviating Through Silicon Via Electromigration for Three-dimensional Integrated Circuits Taking Advantage of Self-healing Effect [TVLSI’16]
Yuanqing Cheng1, Aida Todri-Sanial2, Jianlei Yang3, Weisheng Zhao1
1. The School of Electrical and Information Engineering, Beihang University, Beijing, China
2. LIRMM, CNRS/University of Montpellier, Montpellier, France
3. ECE Department, University of Pittsburgh, PA, USA
6
Electromigration Elimination Techniques for 3D ICs
Advantages of 3D: smaller global timing delay; lower interconnect power consumption; higher integration density (smaller form factor); integration of disparate technologies
Challenges of 3D: chip yield due to the novel fabrication process; thermal issues; higher current density threatening the reliability of 3D ICs
[Figure labels: Through Silicon Via (TSV) with F2B bonding, B2B bonding, and F2F bonding; C4 bump]
7
Electromigration in 2D ICs
High current density → mass transport of metal atoms → void and hillock formation → interconnect breakdown or short
[Figure labels: hillock, void, metal atom, current flow]
8
Electromigration in 3D ICs
Higher current density due to the higher power consumption of multiple tiers
Thermal cycling
Discontinuous bonding interface of TSVs
TSV defects: filling voids, misalignment, bonding interface contamination
[Figure: TSV breakdown due to EM; T. Frank et al., IRPS, 2011]
9
Related Work
TSV EM effect modeling:
Pak et al. evaluated the EM impact on TSVs from the layout perspective and provided guidelines for EM-robust TSV design [ECTC’2011]
Chen et al. proposed a TSV EM model based on the finite element method to predict failure positions within a single TSV [ICEPT-HDP’2010]
Frank et al. explored the EM impact on TSV resistance and derived an analytical formula to describe the relationship [IRPS’2011]
2D interconnect EM effect investigation:
Gonzalez et al. investigated the shape effect on electromigration for metal interconnects [Microelectronics Reliability, 1997]
J. Abella et al. proposed an EM mitigation technique that alternates current flows within signal interconnects [Micro’08]
Li et al. emphasized the importance of considering EM reliability across the whole work flow, from foundry fabrication up to system design [ASP-DAC’15]
Guan et al. analyzed the EM effect on the reliability of signal lines, which carry AC current, and proposed a theoretical model to quantify the healing effect due to AC currents [ECTC’15]
10
TSV defects: filling void, misalignment, bonding interface contamination
[Figure sources: Fraunhofer, Ziptronix]
11
Current flows from A to B continuously and causes EM effect.
[Figure labels: current flow; A→B: 1; B→A: 0]
1. The original TSV state is ‘0’.
2. A sends ‘1’ to B. Current flows from A to B to charge the TSV signal line.
3. B sends ‘0’ to A. Current flows from A to B to discharge the TSV signal line.
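The three-step example on this slide can be traced with a short script. This is a minimal sketch under a simplified assumption: every transition of the TSV signal line moves one unit of charge, a charging current leaves the sender, and a discharging current enters the sender. The function `net_current` and the unit-charge model are hypothetical and are not taken from the paper; the reverse cases (A sends ‘0’, B sends ‘1’) are inferred by symmetry.

```python
def net_current(transfers, state=0):
    """Net charge moved through the TSV, counted positive for A->B.

    transfers: list of (sender, bit) pairs, sender is "A" or "B".
    state:     initial logic value held on the TSV signal line.
    """
    net = 0
    for sender, bit in transfers:
        if bit == state:
            continue  # no transition, so no charging or discharging current
        charging = bit == 1
        # Charging current leaves the sender; discharging current enters it.
        flows_a_to_b = (sender == "A") == charging
        net += 1 if flows_a_to_b else -1
        state = bit
    return net

# Pattern from the slide: A sends '1', then B sends '0', repeated.
print(net_current([("A", 1), ("B", 0)] * 4))                      # 8
# A pattern that also uses the opposite direction cancels out.
print(net_current([("A", 1), ("B", 0), ("B", 1), ("A", 0)] * 2))  # 0
```

In the slide's pattern every transfer pushes current from A to B, so the stress accumulates in one direction. A pattern that mixes both directions has zero net flow in this model, which is the kind of balance a current-reversal scheme aims for.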
12
Online self-healing circuit
[Figure labels: off-line defective TSV detection, fault map, switch network, defective TSV, online self-healing circuit]
14
Neighboring EM mitigation module sharing
[Figure labels: TSV, defective TSV, EM mitigation modules]
16
Depending on the current direction, control whether to reverse the current flow.
Judge the current direction. Synchronize with the top tier and recover the reversed signal at the receiving end.
17
Simulation target Configuration
18
3% defective TSV rate; 1% and 5% defective TSV rates
19
Power Supply Noise-Aware Workload Assignments for Homogeneous 3D MPSoCs with Thermal Consideration
Yinglin Zhao1,2, Jianlei Yang1,3, Weisheng Zhao1,2, Aida Todri-Sanial*3, Yuanqing Cheng*2 1.Fert Beijing Research Institute, BDBC 2. School of Electrical and Information Engineering, Beihang University, Beijing, China 3. School of Computer Science and Engineering, Beihang University, Beijing, China 3. LIRMM, University of Montpellier / CNRS, Montpellier, France
20
Introduction Power supply to 3D ICs
21
22
Step 1: Input the core architecture and technology parameters to set up the architecture-level simulator.
Step 2: Convert the power traces into current traces and feed them into the 3D MPSoC PDN model for PSN calculations.
Step 3: Formulate the task scheduling problem and propose a heuristic algorithm to solve it.
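The trace conversion in the second step can be illustrated with a minimal sketch, assuming a constant supply voltage so that each current sample is I = P / Vdd. The function name `power_to_current` and the sample values are hypothetical; the actual flow feeds the resulting trace into the 3D MPSoC PDN model, which is not reproduced here.

```python
def power_to_current(power_trace_w, vdd_v):
    """Convert per-interval power samples (watts) into current samples (amperes)."""
    if vdd_v <= 0:
        raise ValueError("supply voltage must be positive")
    return [p / vdd_v for p in power_trace_w]

# Hypothetical power trace of one core, sampled at fixed intervals.
power_trace = [0.5, 1.0, 2.0]
print(power_to_current(power_trace, vdd_v=1.0))  # [0.5, 1.0, 2.0]
```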
23
Introduction to Spintronics
“Electron does not have only a charge, but also a spin”
Is it possible to construct a practical electronic device that operates on the spin of the electron, rather than its charge?
Albert Fert, Peter Grünberg: Giant MagnetoResistance (GMR); A. Fert et al., PRL, 1988
FM: Ferromagnetic; NM: Non Magnetic (Metal)
Claude Chappert, Albert Fert, Nature Materials, 2007
24
GMR was a good success story of high technology
track Read head of hard disc drive GMR sensor 5 nm Magnetic fields generated by the media 1997 (before GMR) : 1 Gbit/in2 , 2064 : GMR heads ~ 800 Gbit/in2 voltage current I 6/30/2019
25
The High R of MTJ is similar to R of transistors
High TMR ratio, up to 600%; high resistance, similar to semiconductor transistors
26
MRAM R&D started in 1996
1996: DARPA (MIT, Honeywell, Motorola, IBM); MRAM: Magnetic Random Access Memory
The P state gives low resistance and the AP state gives high resistance; the two states encode “1” and “0”
MRAM cell: 1 MTJ + 1 NMOS
[Figure labels: free layer, barrier (MgO), reference layer, bit line, word line, source line]
27
Perpendicular Vs. In-Plane STT-MRAM
28
Comparisons with Other Memory Technologies
29
MTJ for Beihang
30
Write Energy Optimizations for STT-MRAM
LLCs by Data Pattern Recognition [ISVLSI’18]
31
Motivation
STT-MRAM write procedure; write energy challenge
An STT-MRAM write injects a spin-polarized current from the bit line to the source line, or in the reverse direction, depending on the data written to the cell. The write current of STT-MRAM is usually much larger than the read current. In addition, as the write current formula on the slide shows, the write current increases markedly as the write time shrinks. We ran a simulation in NVSim using parameters from the MTJ models developed by Purdue University; the figure compares the write energy of SRAM and STT-MRAM. There is a huge gap between the energy consumed by SRAM and by STT-MRAM, so it is imperative to reduce the write energy overhead of STT-MRAM.
32
Related Work
Hybrid LLC cache [HPCA’09, NANOARCH’17]
Relax non-volatility [HPCA’14]
Write procedure optimizations: early-write termination [ICCAD’09]; AP-P state reversion [DATE’14]; multi-level cell and adaptive writing [ISQED’14]
In fact, there are already some research efforts to optimize the energy consumption of STT-MRAM based LLCs. One solution is the hybrid cache design. By combining SRAM and STT-MRAM, we can take advantage of the fast, low-latency writes of SRAM while STT-MRAM reduces the leakage energy. Another solution is to reduce the thermal reliability of
33
Our Observation
The potential for write energy reduction from the data pattern perspective
[Figure panels: SPEC2KINT, SPEC2KFP]
34
Data Pattern Characterization
Emphasizing the common case
Is storing the data pattern better? (16/(4*32) = 12.5%) Expensive!
How about putting them in the index table? 38 even larger than the overhead of the above method!
35
Our Proposed Scheme
One interesting observation: only a few cache line patterns dominate
Another question: how much energy does a frequent pattern save?
An example: pattern 0010 0010 0010 0010 with Pfi = 10/100 = 10%, PEi = 4/8 = 50%, Wi = 0.05
36
Capturing Dominating Patterns
Evaluating the potential
How to deal with pattern variations across different applications?
Profiling; sorting; filling in the ROM index table
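The profiling, sorting, and table-filling steps can be sketched as follows. This is an illustration only: it assumes that a pattern's weight is the product of its occurrence frequency and its energy-saving ratio (consistent with the earlier example, where 10% × 50% = 0.05), and the names `build_index_table` and `saving_ratio` are hypothetical.

```python
from collections import Counter

def build_index_table(observed_patterns, saving_ratio, entries=16):
    """Rank profiled patterns by weight and keep the top `entries`.

    weight = (occurrences / total) * saving_ratio(pattern)
    `saving_ratio` is a caller-supplied, hypothetical estimate in [0, 1].
    """
    counts = Counter(observed_patterns)                       # profiling
    total = sum(counts.values())
    weights = {p: (c / total) * saving_ratio(p) for p, c in counts.items()}
    ranked = sorted(weights, key=lambda p: (-weights[p], p))  # sorting
    return ranked[:entries], weights                          # table contents

# Hypothetical profile of 100 observed 16-bit patterns.
frequent = "0010" * 4
trace = ([frequent] * 10 + ["1111" * 4] * 5
         + [format(i, "016b") for i in range(100, 185)])
table, weights = build_index_table(trace, saving_ratio=lambda p: 0.5, entries=2)
print(table)              # ['0010001000100010', '1111111111111111']
print(weights[frequent])  # 0.05
```

Because the table is built from a profile, a different application mix would produce a different ranking, which is the pattern-variation question raised on the slide.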
37
The Big Picture of Our Scheme
[Figure panels: read procedure, write procedure]
38
Experimental Results
Experiment setup
Write-back energy savings: 38% and 50%
39
Sensitivity Analysis
Impact of the number of index table entries
Overhead: 16 entries * (16-bit pattern code + 4-bit index); 4/(32*8) = 1.6%
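The overhead figures on this slide can be checked with a few lines of arithmetic. This sketch assumes that the 32 in the slide's expression is the cache line size in bytes and that each line stores one 4-bit index; both readings are assumptions, and the function names are hypothetical.

```python
def index_table_bits(entries=16, pattern_bits=16, index_bits=4):
    """Total size of the index table in bits."""
    return entries * (pattern_bits + index_bits)

def per_line_overhead(index_bits=4, line_bytes=32):
    """Fraction of extra storage when each cache line carries one index."""
    return index_bits / (line_bytes * 8)

print(index_table_bits())            # 320 (bits, i.e. 40 bytes)
print(f"{per_line_overhead():.1%}")  # 1.6%
```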
40
Summary
A few frequently occurring data patterns dominate (make the common case energy efficient!)
Capturing the dominating patterns and constructing an index table gives an efficient pattern characterization scheme
Write energy can be reduced significantly (38% for INT and over 50% for FP) with negligible storage overhead
41
NEAR: A Novel Energy Aware Replacement Policy for STT-MRAM LLCs
[ISCAS’18]
42
Brief Overview of Our Work
Write energy challenge of STT-MRAM
Our contributions: a novel cache replacement policy; a low-overhead hardware implementation; 33.6% write energy saving with 0.5% performance overhead
[Figure source: JAP, 46(2013)]
43
Preliminaries of STT-MRAM
MRAM evolution: Toggle MRAM [Everspin]; in-plane STT-MRAM; PMA STT-MRAM
Pros: fast read speed, non-volatility, low leakage power, high density…
Cons: high write latency/energy, high cost, read disturbance…
44
Related Work
Low power design for STT-MRAM LLCs
Comparisons with SRAM/DRAM [DAC’08]
STT-MRAM/SRAM hybrid structure [HPCA’09]
Relaxing thermal stability [HPCA’14]
Early write termination [ICCAD’09]
Switching characteristics (AP-P transition) [DATE’14]
Our angle: cache management policy
45
Some Interesting Observations
Write challenge of STT-MRAM LLCs
Traditional cache replacement policies are unaware of write energy
[Figure source: compact model of Beihang Univ.]
46
The Proposed Scheme — “NEAR” Policy
The whole working flow
[Figure labels: CPU, SRAM cache, addr, write-back data, MinHash engine, comparator (=), data, tag, MUX]
47
MinHash Engine Implementation
Inspired by the MinHash search engine, which has high complexity and long matching latency
Our proposed implementation
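For readers unfamiliar with MinHash, the following sketch shows the textbook technique the slide refers to, not the hardware engine proposed in this work. It estimates the Jaccard similarity of two sets from short signatures. The feature encoding `line_features` (byte offset paired with byte value) and all function names are hypothetical choices for illustration.

```python
import hashlib

def _hash(seed, item):
    """Deterministic 64-bit hash of `item`, salted with `seed`."""
    digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
    return int.from_bytes(digest[:8], "big")

def line_features(data: bytes):
    """Hypothetical feature set of a cache line: one 'offset:value' per byte."""
    return {f"{i}:{b}" for i, b in enumerate(data)}

def minhash_signature(features, num_hashes=64):
    """Keep the minimum hash value of the set under each salted hash."""
    if not features:
        raise ValueError("feature set must not be empty")
    return [min(_hash(seed, f) for f in features) for seed in range(num_hashes)]

def estimate_similarity(sig_a, sig_b):
    """The fraction of matching signature slots estimates the Jaccard similarity."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must have the same length")
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)

old_line = bytes([0x22] * 32)
new_line = bytes([0x22] * 24 + [0xFF] * 8)
sim = estimate_similarity(minhash_signature(line_features(old_line)),
                          minhash_signature(line_features(new_line)))
print(f"estimated similarity: {sim:.2f}")
```

Comparing fixed-length signatures is cheaper than comparing full sets, but every signature requires many hash evaluations, which is consistent with the complexity and latency concerns listed on the slide.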
48
Further Considerations
Balance between energy saving and performance
Overhead analysis: storage (64-byte index ROM vs. MB-scale LLCs); energy overhead (comparators), approximately the write energy of 4.4 bits; latency incurred: 2 ns (comparators)
49
Experimental Setup
Architectural configurations
Benchmarks and simulators: SPEC2K; NVSim for cache structure optimization and evaluation; Gem5 for performance evaluations
Simulation method: XX instructions executed, detailed configurations
50
Experimental Results (1/2)
Performance comparisons: 0.5%
Write energy savings: 33.61%
51
Experimental Results (2/2)
Parameter sensitivity analyses: trade-off between performance and energy saving (α); different cache configurations
52
Summary
Investigating write energy optimization
A novel cache replacement policy is proposed: MinHash engine based; trade-off between performance and energy saving
33.61% of the write energy can be saved with 0.5% performance degradation and negligible hardware overhead
53
An Adaptive 3T-3MTJ Memory Cell Design for STT-MRAM Based LLCs [ICCAD’16, TVLSI’18]
54
Background
Introduction to STT-MRAM and modeling
Some commonly used STT-MRAM cell structures: 4T-4MTJ [IEDM’13], 3T-2MTJ [IMW’13], 4T-2MTJ [VLSI’12], 1T-1MTJ [IEDM’09], 2T-2MTJ [IEDM’13]
55
Our Proposed 3T-3MTJ Design
Combination of Ref. sensing and Diff. sensing
[Figure labels: Diff. sensing, Ref.]
56
Validations of the 3T-3MTJ Design
Waveforms of a single read operation
57
Robustness of 3T-3MTJ Cell Structure
Monte Carlo simulation settings
[Figure panels: Stage 1 sensing, Stage 2 sensing]
58
Comparisons (Cell Level)
Layout and area comparisons; read energy and write energy comparisons
Cell      Area (F2)  Read energy (pJ)  Read latency (ns)  Write energy (pJ)  Write latency
1T-1MTJ   27.36      0.4               2.3                4.7                -
2T-2MTJ   66.96      0.026             0.2                9.4
3T-3MTJ   40.68      0.5/2=0.25        3/2=1.5            7
59
The Memory Array Structure
60
Comparisons (Array Level)
Array area; read energy; read performance (latency); write energy
61
Reliability Assessment
Write probability analyses; write activities
62
The Adaptive Cache Design
Performance comparisons
63
Temperature Impact Analysis and Access Reliability Enhancement for 1T1MTJ STT-RAM
64
Thermal Analysis of 1T1MTJ STT-RAM
Motivation: TMR varies with temperature
65
Validation of the Thermal Model
66
Read/Write Circuit for Evaluation
Sensing circuit; write circuit; parameters
67
Thermal Analysis of Read Operation
Read margin & energy
68
Read Challenges with Thermal Issue
Error rates
69
How About Write?
Write performance increases with temperature. Write error rate decreases with temperature.
[Figure labels: write operation timing, read ‘1’, write error rate]
70
How About the Situation When Scaling to 1Xnm?
Model scaling and validation
Read: read performance degrades with temperature; read energy decreases slightly with temperature due to the sharp increase of MTJ resistance
Write: write performance improves with temperature; write energy decreases significantly with temperature due to large resistance variations
[Figure label: error rate]
71
A Novel SA Design for Thermal Reliability
Body biasing SA design; read margin comparison
Read margin can be improved
Read disturbance decreases due to the reduced read current
72
Thermosiphon: A Thermal Aware NUCA Architecture for Write Energy Reduction of the STT-MRAM based LLCs
73
High Performance Desires Large Memory on Chip
[Figure labels: first processor, single-core; first Core™ (Yonah), dual-core; Core™ i7, 8 cores]
The Intel i7-5960x: 20MB on-chip LLC
The Intel Xeon-Phi: 30MB on-chip L2 cache
74
Leakage Power – Nightmare!
Challenges:
High static power consumption due to CMOS leakage current.
High power density, which increases the working temperature of the CPU.
Fig. 1 The power dissipation trend of integrated circuits [1]
[1] L. Wilson. International Technology Roadmap for Semiconductors (ITRS)[R]
75
Promising Thermal Properties
$\Delta = \frac{H_k M_s}{k_B T} V_{ol}$
Thermal properties [2]: write energy/latency drop dramatically; read energy/latency fluctuate slightly
$\Delta$: thermal stability of the MTJ; $M_s$: saturation magnetization; $V_{ol}$: MTJ volume; $k_B$: Boltzmann constant
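As a worked illustration of the thermal stability expression on this slide, the sketch below scales Δ with temperature. It assumes that H_k, M_s, and the MTJ volume stay constant, so that Δ is inversely proportional to T; in a real MTJ, H_k and M_s also vary with temperature, so this is only a first-order view. The function name and the reference value Δ = 60 at 300 K are hypothetical.

```python
def scale_thermal_stability(delta_ref, t_ref_k, t_k):
    """Scale a reference thermal stability factor to another temperature.

    Assumes delta is proportional to 1/T with all material parameters fixed.
    Temperatures are in kelvin.
    """
    if t_ref_k <= 0 or t_k <= 0:
        raise ValueError("temperatures must be positive (kelvin)")
    return delta_ref * t_ref_k / t_k

# Hypothetical reference point: delta = 60 at 300 K, evaluated at 360 K.
print(scale_thermal_stability(60.0, 300.0, 360.0))  # 50.0
```

A lower Δ at a higher temperature means a lower energy barrier to switching, which is in line with the slide's observation that write energy and latency drop as temperature rises.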
76
Motivation of thermal aware NUCA design(2/2)
NUMA architecture
The current migration policy can’t exploit STT-MRAM’s full potential
77
Design and Implementation of “Thermosiphon”
Hot region (light gray in the left figure)
Cool region (dark gray in the left figure)
[3] F. Mesa-Martinez, E. Ardestani and J. Renau. Characterizing Processor Thermal Behavior. In ASPLOS, pages 193–204. ACM, 2010.
78
Design and Implementation of “Thermosiphon”(Cont.)
Implementation details
[Figure labels: boundary bank, access]
79
Experiment setup: Cadence Spectre, NVSim, Gem5, HotSpot
80
Experimental Results
Largest improvement: 7%
Hybrid-1: TNUCA 5.8%, our work 7%
Hybrid-2: TNUCA 2.5%, our work 3.9%
81
Experimental Results (Cont.)
Saves 22.5% write energy on average.
More write operations are migrated into the hot region than with T-NUCA.
82
Param. Sensitivity Analysis
Based on the counter-width experiments, the access counter is set to 4 bits and the ratio counter counts to at most 6.
Counter refresh policy
83
Conclusions
In this work, we propose “Thermosiphon”, a thermal-aware NUCA design. The experimental results show that, compared to the baseline, the proposed design improves performance by up to 7% and reduces write energy by 22.5% on average, with only 1.3% extra hardware overhead.
84
Acknowledgement
National Natural Science Foundation
Beijing Natural Science Foundation
The State Key Lab Open Project Funding, CAS
Huawei Technologies