Published by Phillip McKinney. Modified over 8 years ago.
1
3D IC Technology and Emerging 3D Processors
ECE 668
Csaba Andras-Moritz
2
Outline
  Motivation
  TSV-based 3D IC and Monolithic 3D IC
  Skybridge Fabric
  NP-dynamic Skybridge Fabric
  Skybridge-3D-CMOS Fabric
3
Motivation
  Device scaling challenges
    VDD and VTH not scaling linearly
    Secondary effects
  System-level power / performance challenges
    Interconnection bottleneck
    Increasing RC
  Fabrication challenges
    Lithography limitation
    Doping control challenges
  Figures: Performance trend [1]; Lithography challenge with scaling [1]
  [1] J. Warnock, "Circuit Design Challenges at the 14nm Technology Node", pp. 464-467, DAC 2011
4
Monolithic 3D IC vs. TSV-based 3D IC
  TSV 3D ICs: use the normal die process but need special packaging
  Monolithic 3D ICs: use smaller vias and apply a sequential process for each die
  TSV 3D ICs have an easier process and higher throughput
  Monolithic 3D ICs have better RC reduction
  Figures: Implementation of TSV-based 3D ICs [2] (via dimension: 5-10 μm); Block-level monolithic 3D [3] (via dimension: 50-100 nm)
  [2] J. H. Lau, et al., "TSV manufacturing yield and hidden costs for 3D IC integration", pp. 1031-1042, ECTC, 2010
  [3] S. Panth, et al., "High-density integration of functional modules using monolithic 3D-IC technology", pp. 681-686, ASP-DAC, 2013
5
Gate-level vs. Transistor-level Monolithic 3D IC
  Transistor-level monolithic 3D:
    Fine-grained 3D IC with intra-cell benefits
    Simplified process for each die because each die holds a single type of MOS
    Lower cost per layer because fewer mask layers are needed
    Uses existing commercial CAD tools for placement and routing
  Figures: Gate-level monolithic 3D [4]; Transistor-level monolithic 3D
  [4] S. Panth, et al., "Design challenges and solutions for ultra-high-density monolithic 3D ICs", pp. 1-2, S3S 2014
6
Overview of Transistor-Level Monolithic 3D IC
  Inter-layer dielectric avoids coupling between tiers
  Monolithic inter-layer vias connect the pull-up and pull-down networks
  Cell-to-cell routing uses the metal layers in the top tier
7
Monolithic 3D IC Bottom-up Sequential Process
8
Skybridge: 3D Integrated Framework
  True vertical integration
  Addresses 3D device, circuit, connectivity, heat management and manufacturing requirements
  Follows a template-based approach with uniform vertical nanowires
  Achieves density, power and performance benefits over 2D CMOS (see the benchmarking results)
  Figure: Abstract view of the envisioned Skybridge fabric [5]
  [5] M. Rahman, et al., "Fine-grained 3-D integrated circuit fabric using vertical nanowires", pp. 9.3.1-9.3.7, 3DIC 2015
9
Skybridge Fabric Components
  Fabric assembly by integration of core components
  Components specially architected for 3D
  Figure: Core fabric components
10
Vertical Gate-all-Around Junctionless Transistor
  Single type of uniform V-GAA junctionless transistor as the active device
  Simple device structure
  Uniform doping
  Device formation by material deposition
  Figure: Junctionless device structure and TCAD simulation results
11
Skybridge 3D Circuit Style
  Dynamic circuit style amenable to physical implementation
  Uses only a single type of uniform n-type V-GAA junctionless transistor
  Supports compound and cascaded dynamic circuits with both dual-rail and single-rail inputs
  Figure: Skybridge 3D circuit style. A) XOR schematic; B) HSPICE simulation; C) XOR layout [6]
  [6] M. Rahman, et al., "Skybridge: 3-D Integrated Circuit Technology Alternative to CMOS", 2014; http://arxiv.org/abs/1404.0607
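The dynamic circuit style can be illustrated with a purely behavioral sketch of one precharge/evaluate cycle of an n-type dynamic NAND gate. This is a logic-level model under simplifying assumptions (ideal switches, no charge sharing or leakage); the function name and phase labels are illustrative and not taken from the slides.

```python
def dynamic_nand(inputs, phases=("PRE", "EVA")):
    """Logic-level model of one precharge/evaluate cycle of a dynamic NAND.

    Assumption: the output node is precharged high, then discharged during
    evaluation only if every transistor in the series n-type stack is on.
    """
    out = None
    for phase in phases:
        if phase == "PRE":
            out = 1  # precharge: output node pulled to logic high
        elif phase == "EVA":
            if all(inputs):
                out = 0  # evaluate: full series stack conducts, node discharges
    return out
```

Because the output can only fall during evaluation, adding more series inputs changes the stack length but not the gate structure, which is one way to read the high fan-in support mentioned on the next slide.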
12
Skybridge Circuit Styles, and High Fan-In Options
  Various circuit options for optimization
  High fan-in support due to the dynamic circuit style
  Figures: Compound vs. cascaded circuits with dual-rail and single-rail inputs; Fan-in sensitivity analysis
13
Full Adder Implementation in Skybridge
  Follows the Skybridge circuit style
  Utilizes the fabric components
  32 transistors for the full adder accommodated in just 4 logic nanowires
  Figures: Full-adder layout; HSPICE simulated waveforms; Full-adder schematic
14
Volatile Memory in Skybridge Fabric
  Volatile memory with a single transistor type
  Unlike SRAM, no transistor sizing or doping requirements
  Two cross-coupled dynamic NAND gates for storage
  Addresses noise and leakage power concerns through circuit-level design
  Figures: Skybridge RAM schematic; Simulated HSPICE waveforms
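The storage principle of two cross-coupled NAND gates can be sketched at the logic level. The model below is a static approximation with active-low set/reset inputs; it ignores the precharge/evaluate clocking of the actual dynamic gates, and all names are illustrative assumptions rather than the Skybridge RAM design.

```python
def nand(a, b):
    """Two-input NAND on 0/1 values."""
    return 0 if (a and b) else 1


def nand_latch(set_n, reset_n, q=0, q_bar=1, max_iters=4):
    """Settle a cross-coupled NAND pair and return (q, q_bar).

    set_n and reset_n are active-low; (1, 1) holds the stored value.
    """
    for _ in range(max_iters):
        new_q = nand(set_n, q_bar)          # first gate sees the other gate's output
        new_q_bar = nand(reset_n, new_q)    # second gate sees the updated q
        if (new_q, new_q_bar) == (q, q_bar):
            break                           # feedback loop has settled
        q, q_bar = new_q, new_q_bar
    return q, q_bar
```

Calling `nand_latch(0, 1)` writes a 1, and a following `nand_latch(1, 1, q=1, q_bar=0)` holds it, which is the behavior the cross-coupling provides.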
15
Noise Mitigation in Skybridge
  Intrinsic fabric features for noise mitigation
  Engineered GND shielding approach
  All signal routing through coaxial structures
16
Signal Pull-Up and Delay Mitigation
  Higher gate voltage on the "Precharge" transistors to boost current
  Long interconnect delay mitigation
    Logic replication, dynamic buffer insertion, and CMOS-like inverters in long interconnect paths
  Figure: Inverters in a long interconnect path [7]
  [7] S. Khasanvis, et al., "Architecting Connectivity for Fine-grained 3-D Vertically Integrated Circuits", NANOARCH, pp. 175-180, 2015
17
Arithmetic Circuit Design Examples in Skybridge
  Arithmetic circuit design examples with adders and a multiplier
  High fan-in circuit designs to evaluate scalability potential
  Figures: Array multiplier design (block diagram); 8- and 16-bit CLA designs (block diagram)
18
Benchmarking Results (Skybridge vs. 2D-CMOS)
  Benchmarking with respect to equivalent CMOS designs at 16nm
  WISP-4: 30x density and 3.5x performance/watt benefits
  High bit-width arithmetic circuits: the 16-bit CLA design achieves 60.5x density and 16.5x performance/watt benefits
  Analytical interconnect modeling results: 10x less interconnect length and 100x lower repeater count
  Table: Benchmarking of arithmetic circuits

  Circuit            Throughput (s⁻¹)      Power (μW)        Area (μm²)
                     CMOS      SB          CMOS     SB       CMOS     SB
  4-bit multiplier   5.0e9     5.1e9       42.3     172      50       1.27
  4-bit CLA          9.9e9     10.4e9      235      19.4     18.7     0.76
  8-bit CLA          4.5e9     5.7e9       287      23.5     64.7     1.34
  16-bit CLA         2.4e9     3.7e9       297      27.8     130.2    2.15
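The 16-bit CLA benefits quoted on this slide can be recomputed from the table values. The snippet below is a minimal sketch; the dictionary layout and function names are illustrative, and performance/watt is assumed to mean throughput divided by power.

```python
# 16-bit CLA row of the arithmetic benchmarking table.
CLA16 = {
    "cmos": {"throughput": 2.4e9, "power_uw": 297.0, "area_um2": 130.2},
    "sb": {"throughput": 3.7e9, "power_uw": 27.8, "area_um2": 2.15},
}


def density_benefit(cmos, sb):
    """Area ratio: how many times smaller the Skybridge design is."""
    return cmos["area_um2"] / sb["area_um2"]


def perf_per_watt(design):
    """Throughput per watt, converting power from microwatts to watts."""
    return design["throughput"] / (design["power_uw"] * 1e-6)


def perf_per_watt_benefit(cmos, sb):
    """Ratio of Skybridge to CMOS performance/watt."""
    return perf_per_watt(sb) / perf_per_watt(cmos)


density = density_benefit(CLA16["cmos"], CLA16["sb"])
efficiency = perf_per_watt_benefit(CLA16["cmos"], CLA16["sb"])
print(f"density: {density:.1f}x, performance/watt: {efficiency:.1f}x")
```

The computed ratios land within rounding of the 60.5x and 16.5x figures stated above.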
19
NP-Dynamic Skybridge and Skybridge-3D-CMOS Fabric
  Integrated fabric approaches extending Skybridge 3-D concepts to incorporate both n-type and p-type transistors
  NP-Dynamic-Skybridge (NP-D-SB): an integrated framework to achieve NP-dynamic circuits in vertical nanowires
  SkyBridge-3D-CMOS (S3DC): an integrated framework to achieve static circuits in vertical nanowires
20
Fabric Components
  Fabric components specifically designed to incorporate both p- and n-type transistors
  Vertical Si nanowire array with p- and n-doped regions as building blocks
  Device engineering for designing both p- and n-type transistors
  SB-ILC provides Ohmic connection between doping regions
  Figure: Fabric components [8]
  [8] J. Shi, et al., "Architecting NP-Dynamic Skybridge", NANOARCH, pp. 169-174, 2015
21
NP-D-SB NOR and NAND Gate
  Support for elementary logic gates
    NAND and NOR with vertically stacked transistors in a single nanowire
  Compact implementation
    A 5-input NOR needs 5 nanowires in SB, but only one nanowire in NP-D-SB
  Figures: NOR gate; NAND gate
22
Compound Gates in NP-D-SB
  Improved logic diversity and flexibility
    Skybridge is limited to AND-of-NANDs logic for compound gates
    NP-dynamic SB supports both OR-of-NORs and AND-of-NANDs gate logic
  Diversity in logic expression helps to build compact circuits
  Figures: OR-of-NORs gate logic; AND-of-NANDs gate logic
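The two compound-gate forms can be written out as Boolean functions. This is a logic-only sketch, assuming each inner list is the input group of one NAND or NOR stack; the function names are illustrative and say nothing about the physical layout.

```python
def and_of_nands(groups):
    """AND over the NAND of each input group (0/1 values)."""
    return int(all(not all(group) for group in groups))


def or_of_nors(groups):
    """OR over the NOR of each input group (0/1 values)."""
    return int(any(not any(group) for group in groups))


# Example: two groups of two inputs each.
print(and_of_nands([[1, 0], [0, 1]]))
print(or_of_nors([[0, 0], [1, 1]]))
```

Having both forms available lets a function be expressed in whichever form needs fewer groups, which is the compactness argument made on this slide.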
23
Cascaded Gates in NP-D-SB
  Uses a uniform set of {PRE, EVA} clocks to control circuits
  No monotonicity problem when cascading n-type and p-type gates
  Figures: Cascaded gates schematic; Cascaded gates layout; HSPICE simulation
24
Benchmarking Results (NP-D-SB vs. SB)
  Significant benefits in latency, power-latency product and density
    3x latency benefit over the Skybridge single-rail implementation
    Over 2x density improvement over the Skybridge dual-rail implementation
    At least 17% throughput/power benefit
  Throughput is worse because there are fewer pipeline stages
  Figure: Benchmarking evaluation results
25
Cascaded Inverters in S3DC
  SB-CMOS follows the static CMOS circuit style
  Signal "In": routed between stages with a routing nanowire
  Signals "Int0", "Int1" and "Int2": routed between stages without routing nanowires
  Figure: SB-CMOS circuit style
26
S3DC NAND Gate
  3-input SB-3D-CMOS NAND: 3 nanowires for 3 parallel p-type transistors
  Multiple nanowires shorted together by SB-ILC and bridges
  Figures: 3-input NAND physical layout (with layout legend); 3-input NAND schematic
27
S3DC Full Adder
  SB-CMOS full adder implemented with 11 nanowires: 28 transistors in 0.06 μm², 28x denser than 16nm CMOS technology
  Figures: 1-bit SB-CMOS full adder design; 1-bit full adder transistor-level schematic; 1-bit SB-CMOS full adder physical layout (with layout legend)
28
S3DC 6T SRAM
  Cross-coupled inverters for holding the value
  Pass transistors for write / read control
  Independent read / write access
  Transistor strength customized with various voltage levels
  Figures: SRAM schematic and physical layout (with layout legend); SRAM operations (write operation, read operation)
29
Evaluation Results (S3DC vs. SB and 2D-CMOS)
  Compared with 16nm CMOS:
    Much better power and area efficiency (about 6.9x performance/watt and 46x smaller area)
    Worse performance (about 2.5x higher latency and 2.5x lower throughput)
  Compared with SB:
    Better latency but lower throughput
    Better power efficiency and lower power consumption
    Slightly better density (1.09 vs. 1.27 μm²)
  Table: Benchmarking evaluation results

  Design           Latency (ps)   Throughput (ops/s)   Power (μW)   Performance/Watt (ops/J)   Area (μm²)
  SB-CMOS          501            2E+9                 10.1         1.98E+14                   1.09
  16nm CMOS        201            4.97E+9              172          2.89E+13                   50.1
  SB (dual-rail)   524            5.09E+9              41.3         1.23E+14                   1.27
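The Performance/Watt column follows from the throughput and power columns, since operations per second divided by watts gives operations per joule. A small sketch that recomputes it from the table; the data structure and function name are illustrative.

```python
# (throughput in ops/s, power in microwatts, listed performance/watt in ops/J)
ROWS = {
    "SB-CMOS": (2e9, 10.1, 1.98e14),
    "16nm CMOS": (4.97e9, 172.0, 2.89e13),
    "SB (dual-rail)": (5.09e9, 41.3, 1.23e14),
}


def ops_per_joule(throughput_ops_per_s, power_uw):
    """Throughput divided by power, with power converted to watts."""
    return throughput_ops_per_s / (power_uw * 1e-6)


for name, (throughput, power_uw, listed) in ROWS.items():
    computed = ops_per_joule(throughput, power_uw)
    print(f"{name}: computed {computed:.3g} ops/J, listed {listed:.3g} ops/J")
```

Each computed value agrees with the listed one to within rounding.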
30
Thermal Evaluation Methodology
  Modeling and simulation of the thermal profile in 3-D
  Fine-grained transistor-level modeling accounting for thermal conductivity at the nanoscale
  Thermal profiling of 3-D circuits with and without Skybridge heat extraction features for the worst-case static heat scenario
  [9] M. Rahman, et al., "Architecting 3-D Integrated Circuit Fabric with Intrinsic Thermal Management Features", NANOARCH, pp. 157-162, 2015
31
Thermal Modeling of V-GAA Junctionless Transistor
  Analytical calculation of thermal resistance for different FET regions
  Electrical equivalent representation for HSPICE simulations
  Figures: Heat flow paths; Thermal resistance network for the device; Simulation results
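The electrical-equivalent idea can be sketched with the standard one-dimensional conduction formula R = L / (k A), treating heat flow as current and temperature rise as voltage. This is a generic illustration, not the model from [9]; the region dimensions, conductivities and power in the example are hypothetical placeholders.

```python
def thermal_resistance(length_m, conductivity_w_per_mk, area_m2):
    """1-D conduction resistance in K/W: R = L / (k * A)."""
    return length_m / (conductivity_w_per_mk * area_m2)


def series(*resistances):
    """Heat crosses the regions one after another."""
    return sum(resistances)


def parallel(*resistances):
    """Heat splits across alternative paths."""
    return 1.0 / sum(1.0 / r for r in resistances)


def temperature_rise(power_w, resistance_k_per_w):
    """Thermal analogue of Ohm's law: delta T = P * R."""
    return power_w * resistance_k_per_w


# Hypothetical example: channel in series with two parallel escape paths.
r_channel = thermal_resistance(16e-9, 10.0, (16e-9) ** 2)
r_path_a = thermal_resistance(30e-9, 100.0, (16e-9) ** 2)
r_path_b = thermal_resistance(30e-9, 1.0, (16e-9) ** 2)
r_total = series(r_channel, parallel(r_path_a, r_path_b))
print(temperature_rise(1e-6, r_total))
```

In this analogy, a heat extraction feature adds a low-resistance parallel path, which lowers the total resistance and therefore the temperature rise for the same power.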
32
Thermal Simulations for 3-D Circuits
  Up to 85% average temperature reduction with heat extraction
33
Circuit Evaluation Methodology, and Design Rules
  Evaluation methodology accounting for material structures, device physics, circuit style and 3-D parasitics
  Design rules derived from circuit requirements and manufacturing assumptions at 16nm
  Figure: Evaluation methodology
  Table: Design rules (values in nm, listed in column order Width X, Length Z, Thickness Y, Spacing; "-" marks an empty cell; rows with fewer than four values could not be aligned to columns)
    Bridge (X,Y,Z): 16-58, 16, 16-58, 16-37
    Transistor Channel (X,Y,X): 16, 58
    Transistor Spacing (Z): -, -, -, 16
    Gate Electrode (Z): 29, 16, 11.5, -
    Contact (X,Y,Z): 26, 16, 39
    Heat Junction (X,Y,Z): 22, 16, 6, -
    Coaxial (Si-M1) (X,Y): 37, -, 4 (Si-M1)
    Coaxial (M1-M2) (X,Y): 58, -, 4 (M1-M2)
34
3D WISP-4 Microprocessor
  4-bit fully functional microprocessor (WISP-4) design
  RISC architecture with 5 pipeline stages
  Implemented in Skybridge, NP-dynamic Skybridge and S3DC
  Figure: WISP-4 architecture
35
WISP-4: Instruction Fetch and Decode
  Instruction fetch stage
    4-bit CLA for the program counter
    4:16 decoder to decode the ROM address
    16x9 ROM to store instructions
  Instruction decode stage
    3:8 decoder to decode the opcode
    2-bit buffers for buffering address and data
  Figures: Instruction fetch; Instruction decode
36
WISP-4: Register File
  Register access stage
    Four 4-bit registers for operands
    Two 4:1 multiplexers and one 2:1 multiplexer for operand selection
  Figure: Register file
37
WISP-4: Arithmetic Logic Unit
  Execution stage
    4-bit CLA and multiplier for addition and multiplication
    A buffer for data buffering
    Two 2:1 multiplexers for result selection
  Figure: Arithmetic logic unit
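For readers unfamiliar with the 4-bit CLA used in the execution and fetch stages, the textbook carry-lookahead equations can be sketched as follows. This is the generic formulation, not the WISP-4 netlist; the function name and argument order are illustrative.

```python
def cla4(a, b, c0=0):
    """Add two 4-bit values with carry-lookahead logic.

    Returns (sum, carry_out), where sum is the low 4 bits of a + b + c0.
    """
    g = [((a >> i) & (b >> i)) & 1 for i in range(4)]  # generate: both bits set
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(4)]  # propagate: exactly one set

    # Each carry is computed directly from g, p and c0, with no rippling.
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0)
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0]) | (p[3] & p[2] & p[1] & p[0] & c0))

    carries = [c0, c1, c2, c3]
    total = sum((p[i] ^ carries[i]) << i for i in range(4))
    return total, c4
```

The widest term in `c4` is a five-input AND, which is why high fan-in gate support matters for lookahead adders.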
38
WISP-4 Benchmarking Results
  Compared with 2D CMOS:
    About 30x to 60x density benefits (27.3x to 56.6x in the table)
    Up to 8x power efficiency benefits
    Up to 2x throughput benefits
  Table: WISP-4 benchmarking results

  Design      Throughput (ops/s)   Power (μW)     Power efficiency (ops/J)   Density (mm⁻²)
  2D CMOS     4.31E+9              886            4.86E+12                   3.46E+3
  Skybridge   5.1E+9 (1.19x)       301 (0.34x)    1.69E+13 (3.46x)           1.05E+5 (30x)
  NP-D-SB     9.1E+9 (2.11x)       230 (0.26x)    3.96E+13 (8.15x)           1.96E+5 (56.6x)
  S3DC        4.55E+9 (1.06x)      186 (0.21x)    2.45E+13 (5.04x)           9.43E+4 (27.3x)