An Automated Design Flow for 3D Microarchitecture Evaluation

Presentation transcript:

An Automated Design Flow for 3D Microarchitecture Evaluation
Jason Cong (1), Ashok Jagannathan (1), Yuchun Ma (1,2), Glenn Reinman (1), Jie Wei (1), Yan Zhang (1)
(1) University of California, Los Angeles, California 90095, U.S.A.
(2) Tsinghua University, Beijing, P.R. China
Partially supported by DARPA/MTO through CFDRC

Outline
Introduction
3D Microarchitecture Evaluation Flow
  a. 3D thermal model
  b. 3D floorplanning
  c. IPC model and performance simulation
  d. 3D global routing and thermal via insertion
Case Study for a Design Driver
Conclusions and Future Work
2018/12/4 UCLA VLSICAD LAB

3-D IC Technology Alternatives
Chip level integration (3-D MCM): vertical interconnect pitch > 50 µm; vertical interconnect density < 20/mm (400/mm²)
Block level integration: vertical pitch > 5 µm; vertical density < 40k/mm²
Cell level integration: vertical pitch > 200 nm; vertical density < 25M/mm²

There are several groups of 3D technologies at different integration levels. Chip level integration is a fairly mature technology, also called 3-D MCM. Chips are integrated together using packaging techniques. The vertical interconnects are large, so the vertical interconnect density is very low. Block level integration integrates fabricated blocks through techniques such as wafer bonding. Interlayer vias are etched or drilled through the fabricated wafers, and the vertical interlayer via size is in microns. Cell level integration is the ultimate 3-D IC manufacturing technique, in which multiple device layers are fabricated one on top of another. In this case, the vertical interconnect pitch is similar to the size of a metal wire or via, and the vertical interconnect density is very high. However, this kind of technology is still at an early research stage and is currently used mostly for making memories. Our current work focuses on block level integration, but the underlying algorithms can be applied to the other techniques as well.
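The density bounds on this slide are consistent with the pitch bounds if the vertical interconnects are assumed to sit on a square grid, so that the areal density is at most 1/pitch². The short check below reproduces that arithmetic with the pitches read as 50 µm, 5 µm, and 200 nm. The square-grid assumption and the function name are illustrative, not taken from the slides.

```python
def max_via_density_per_mm2(pitch_um: float) -> float:
    """Upper bound on vias per mm^2 for a square grid with the given pitch (micrometres)."""
    vias_per_mm = 1000.0 / pitch_um  # 1 mm = 1000 um
    return vias_per_mm ** 2


if __name__ == "__main__":
    # Pitches for the three integration levels, in micrometres.
    for level, pitch_um in [("chip", 50.0), ("block", 5.0), ("cell", 0.2)]:
        density = max_via_density_per_mm2(pitch_um)
        print(f"{level:5s} level: pitch > {pitch_um} um, density < {density:,.0f} per mm^2")
```

A 50 µm pitch gives 20 interconnects per mm and therefore 400 per mm², which matches the chip-level figures quoted on the slide.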

3D ICs and Microarchitecture
3D IC Benefits
  Reduction of interconnect delay: increase of performance
  Reduction of wirelength: reduction of capacitive power
  Increase of integration density
  Allows heterogeneous (logic, DRAM, RF, etc.) integration
3D IC Concerns
  Thermal constraint
  Cost
No existing flow to evaluate 3D implementations of architectures systematically
  Not clear how much wirelength gain can turn into performance gain (in terms of BIPS)
  Thermal impact is not known

Our Contribution: MEVA-3D
An Automated Design Flow for 3D Architecture Evaluation (MEVA-3D)
  Evaluates 3D implementations of micro-architectures systematically in terms of both performance and thermal behavior
  Goal: build a physical prototype of the 3D architecture
MEVA-3D Flow
  Automated 2D/3D floorplanning
    Reduces the latency along critical loops in the micro-architecture by considering interconnect pipelining at a given target frequency
  3D routing and thermal optimization
Performance Evaluation
  Cycle-accurate architecture simulation on SPEC2000 benchmarks
Thermal Evaluation
  Resistive network model considering white space and thermal via insertion

Technology background
Wafer bonding 3D IC technologies
  With flipping the top layer
  Without flipping the top layer
[Figure: A 3D IC example with two device layers. (a) With flipping the top layer. (b) Without flipping the top layer.]

3D Microarchitecture Evaluation Flow

Overview of MEVA-3D

Performance Model and Simulation Flow
[Figure: flow diagram with the following labels]
  Core Library
  Target Frequency
  Interconnect Architecture
  Design Constraints: Area, Power, Performance, etc.
  Benchmarks
  First Pass Simulation: block power density estimation, loop sensitivity models
  Block Power Density
  Critical Paths with Sensitivities
  P2P Latency
  2D/3D Performance-driven Floorplan: pipeline estimation, thermal estimation
  MC-sim: Multicore Performance Simulator
  Communication delay based on communication architecture, p2p delay, queuing delay, ...
  Power = cores power + interconnect power
  Thermal simulation
  Interconnect Power Estimation

3D Thermal Model
Thermal Resistive Network [Wilkerson 2004]
  Device layer partitioned into tiles
  Tiles connected through thermal resistances
  Heat sources modeled as current sources
  Heat sinks of fixed temperature T0
  TS vias at the center of the tile
[Figure: (a) Tile stack array. (b) Single tile stack. (c) Tile stack analysis.]

This is the kind of thermal model that we use in TMARS. It is a thermal resistive network model based on the well-known analogy between heat flow and electrical current. The whole chip stack is partitioned into tiles, and the tiles are connected through thermal resistances. The heat sources are modeled as current sources. Whether or not a tile contains a via changes its resistance value.
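As a rough illustration of the electrical analogy, the sketch below analyzes a single tile stack as a series chain: each layer's power acts as a current source, the heat sink is a fixed-temperature node below layer 0, and temperature plays the role of voltage. It ignores lateral resistances between neighboring stacks, and all numbers and function names are hypothetical rather than taken from the slides.

```python
def stack_temperatures(powers_w, resistances_k_per_w, t_sink_c):
    """Temperatures of one tile stack whose heat sink sits below layer 0.

    resistances_k_per_w[k] is the vertical thermal resistance between layer k
    and the node below it (the heat sink when k == 0). Lateral resistances to
    neighboring stacks are ignored in this simplified sketch.
    """
    temps = []
    t = t_sink_c
    for k, r in enumerate(resistances_k_per_w):
        # All heat generated at or above layer k must flow down through r.
        heat_through_r = sum(powers_w[k:])
        t += r * heat_through_r  # "voltage drop" = resistance * "current"
        temps.append(t)
    return temps


def with_thermal_via(r_layer, r_via):
    """A via adds a parallel heat path, so the two resistances combine in parallel."""
    return r_layer * r_via / (r_layer + r_via)


if __name__ == "__main__":
    powers = [2.0, 1.0]       # W dissipated in layer 0 and layer 1 (hypothetical)
    resistances = [1.0, 4.0]  # K/W below layer 0 and between the layers (hypothetical)
    print(stack_temperatures(powers, resistances, 40.0))
    resistances[1] = with_thermal_via(resistances[1], 4.0)
    print(stack_temperatures(powers, resistances, 40.0))
```

In this toy setup, adding a via with the same resistance as the inter-layer material halves the inter-layer resistance and lowers only the upper layer's temperature, which is the effect the last sentence of the notes describes.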

Microarchitecture Floorplanning with Wire Pipelining
Given:
  target cycle time Tcycle
  clocking overhead Toverhead
  list of blocks in the microarchitecture with their area, dimensions, and total logic delay
  set of critical microarchitectural paths with performance sensitivity models for the paths
  average power density estimates for the blocks
Objective:
  Generate a floorplan that optimizes die area, performance, and maximum on-chip temperature.
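One way to see how Tcycle and Toverhead enter the problem is to count how many pipeline stages a path needs once its wire delay is known. The sketch below assumes that each cycle offers Tcycle minus Toverhead of usable time and that a path needs the ceiling of its total delay divided by that usable time. This rule, the function names, and the delay values are illustrative assumptions, not the formulation used in MEVA-3D.

```python
def cycles_needed(delay_ps: int, t_cycle_ps: int, t_overhead_ps: int) -> int:
    """Cycles needed to cover delay_ps when each cycle offers t_cycle - t_overhead."""
    usable_ps = t_cycle_ps - t_overhead_ps
    if usable_ps <= 0:
        raise ValueError("clocking overhead must be smaller than the cycle time")
    # Integer ceiling division; every path takes at least one cycle.
    return max(1, -(-delay_ps // usable_ps))


def extra_cycles(logic_ps: int, wire_ps: int, t_cycle_ps: int, t_overhead_ps: int) -> int:
    """Additional cycles on a path caused by its interconnect delay."""
    with_wire = cycles_needed(logic_ps + wire_ps, t_cycle_ps, t_overhead_ps)
    logic_only = cycles_needed(logic_ps, t_cycle_ps, t_overhead_ps)
    return with_wire - logic_only


if __name__ == "__main__":
    # 4 GHz target: 250 ps cycle, with a hypothetical 50 ps clocking overhead.
    print(extra_cycles(logic_ps=180, wire_ps=150, t_cycle_ps=250, t_overhead_ps=50))
    print(extra_cycles(logic_ps=180, wire_ps=10, t_cycle_ps=250, t_overhead_ps=50))
```

Under these assumptions a floorplan that shortens the wire on a critical path from 150 ps to 10 ps removes the extra cycle, which is the kind of gain the floorplanner is trying to find.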

3D floorplanner
3D thermal-driven floorplanning based on the TCG representation [Cong 2004]
  Multi-layer layout
  Thermal driven
  Performance evaluation
Use simulated annealing with the cost function
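The simulated annealing loop can be sketched on a much simpler problem than TCG-based floorplanning. The example below only decides which of two layers each block goes on and uses an assumed weighted-sum cost of footprint area and the power placed on the layer farther from the heat sink. The block data, the weights, the cost form, and the omission of a performance term are all simplifying assumptions. It does not reproduce the cost function of the actual floorplanner.

```python
import math
import random

# Hypothetical blocks as (area, power) pairs.
BLOCKS = [(4.0, 3.0), (3.0, 1.0), (2.0, 2.0), (1.0, 0.5)]


def layer_cost(state, w_area=1.0, w_temp=0.5):
    """Assumed cost: weighted footprint area plus power on layer 1 (far from the sink)."""
    area = [0.0, 0.0]
    hot_power = 0.0
    for (block_area, block_power), layer in zip(BLOCKS, state):
        area[layer] += block_area
        if layer == 1:
            hot_power += block_power
    return w_area * max(area) + w_temp * hot_power


def flip_one(state, rng):
    """Neighbor move: move one randomly chosen block to the other layer."""
    i = rng.randrange(len(state))
    return state[:i] + (1 - state[i],) + state[i + 1:]


def anneal(initial, cost, neighbor, rng, t_start=1.0, t_end=1e-3, alpha=0.95, moves_per_t=50):
    """Generic simulated annealing that returns the best state seen and its cost."""
    current, current_cost = initial, cost(initial)
    best, best_cost = current, current_cost
    t = t_start
    while t > t_end:
        for _ in range(moves_per_t):
            cand = neighbor(current, rng)
            cand_cost = cost(cand)
            delta = cand_cost - current_cost
            # Always accept improvements; accept uphill moves with probability exp(-delta/t).
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                current, current_cost = cand, cand_cost
                if current_cost < best_cost:
                    best, best_cost = current, current_cost
        t *= alpha  # geometric cooling schedule
    return best, best_cost


if __name__ == "__main__":
    start = (0, 0, 0, 0)  # every block on layer 0
    print(anneal(start, layer_cost, flip_one, random.Random(0)))
```

Accepting some uphill moves at high temperature is what lets the search escape layouts that are good on one objective but poor overall. A real floorplanner would replace the tuple state with a TCG and add a performance term to the cost.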

IPC Model and Performance Simulation
Pipeline evaluation
  The performance of the micro-architecture is measured in BIPS
IPC performance sensitivity model
  The IPC of a micro-architecture depends on a set of critical processor loops
  The model captures the degree of IPC degradation due to extra latency along each path [Sprangle 2002]
  Calculate the IPC degradation caused by the extra latency introduced by the interconnects in the layout
Performance simulation
  Performed after the physical design stage
  We adapted the SimpleScalar 3.0 tool set for our simulation framework
  Provides feedback to the performance estimators in the floorplanning stage
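A sensitivity model of this kind can be illustrated with a simple linear form: each critical loop has a fractional IPC loss per extra cycle, the losses add up, and BIPS is the resulting IPC multiplied by the clock frequency in GHz. The linear form, the sensitivity values, and the function names below are assumptions made for illustration. The slides do not give the actual model.

```python
def degraded_ipc(base_ipc, sensitivity, extra_cycles):
    """IPC after interconnect latency, assuming losses are linear and additive.

    sensitivity[loop] is the fractional IPC loss per extra cycle on that loop
    (hypothetical values); extra_cycles[loop] is the latency added by the layout.
    """
    loss = sum(sensitivity[loop] * extra_cycles.get(loop, 0) for loop in sensitivity)
    return base_ipc * (1.0 - loss)


def bips(ipc, freq_ghz):
    """Billions of instructions per second = IPC * clock frequency in GHz."""
    return ipc * freq_ghz


if __name__ == "__main__":
    sensitivity = {"wakeup": 0.0625, "L2": 0.03125, "MPLAT": 0.015625}
    layout_2d = {"wakeup": 1, "L2": 2}  # hypothetical extra cycles in a 2D layout
    layout_3d = {"L2": 1}               # hypothetical extra cycles in a 3D layout
    for name, extra in [("2D", layout_2d), ("3D", layout_3d)]:
        ipc = degraded_ipc(2.0, sensitivity, extra)
        print(name, ipc, bips(ipc, 4.0))
```

Because the estimate needs only the per-loop extra cycles, it is cheap enough to evaluate inside a floorplanning loop, with cycle-accurate simulation reserved for the final layout.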

Thermal Optimization Using Through-the-Silicon Vias
Two types of TS vias
  Signal vias, part of the netlists
  Thermal vias, with no connections, introduced to reduce temperature
After floorplanning, we can further reduce the temperature by thermal via insertion.
  Decreases the maximum temperature by 50%

3D Routing and Thermal Via Planning for 3D ICs
Simultaneous routing and thermal via planning method [Cong 2005]
  Multilevel routing and via planning framework
  Via planning during and after routing

Case Study for a Design Driver

Design Example
An out-of-order superscalar processor micro-architecture with 4 banks of L2 cache in 70 nm technology.

Properties of Design Example
[Tables: the critical paths; baseline processor parameters]

3D Design Driver
Alpha EV6-like core, 4 GHz clock frequency
Design space exploration without leveraging 3D for individual architectural blocks
[Figures: 2D EV6-like core; 3D EV6-like core (2 layers)]
  Wakeup loop: the extra cycle is eliminated.
  Branch misprediction resolution loop and L2 cache access latency: some of the extra cycles are eliminated.

The number of extra cycles for critical paths
[Table: rows are the critical path groups wakeup, ALU, DL1, L2, and MPLAT; columns are the target frequencies 3, 4, 5, 6, 7, and 8 GHz, each for the 2D and the 3D layout.]

Performance for the micro-architecture with 2D and 3D layout at different target frequencies

Maximum On-Chip Temperature
HS denotes a heat sink. 3D integration allows thermal vias to be inserted to reduce the temperature.

Thermal Profiles for 2D Chip (4 GHz)
Temperature distribution in 2D integration.

Thermal Profiles for 3D Chip (4 GHz)
Temperature distribution in 3D integration with one heat sink.
Temperature distribution in 3D integration with two heat sinks and flipped upper layer.

Conclusions
  3D integration can eliminate most of the extra cycles incurred by interconnects in 2D microprocessor designs.
  MEVA-3D can systematically evaluate a 3D architecture from both the performance side and the thermal side.
  By evaluating one Alpha-like out-of-order superscalar microprocessor, we show that 3D integration can improve the performance by 10% with comparable maximum on-chip temperature.

On-going Research
3D Architecture Modeling and Exploration
  3D layout alone can only reduce the global interconnect delay. In addition, 3D implementation of components can reduce their internal delay and power consumption.
  3D area, performance, and power models for key architectural components, including register files, caches, and issue queues
  Automated tool for microarchitectural and physical floorplanning co-design to explore the use of 2D and/or 3D architecture blocks
  Preliminary study shows 20% performance improvement over a 2D architecture.

THANKS