3D CMP and 3D IC Physical Design Flow
Jason Cong and Guojie Luo
University of California, Los Angeles
{cong, cong,


Outline
- Design Driver
  - 3D Chip Multiprocessor
    - Based on OpenRISC 1200
  - NoC Interconnect
  - RF Reconfigurable Interconnects
- Physical Design Flow
  - Design Flow for 3DM2
  - Design Flow in Development

3D Chip Multiprocessor (CMP)
- Three silicon layers
  - Tier 3: Cache data components
  - Tier 2: Interconnect and cache tags
  - Tier 1: Cores
- Non-Uniform Cache Access (NUCA)
  - Cores see different latencies to different cache banks
  - Data can migrate among distributed caches
    - Can hide latency
    - Adds interconnect traffic
(Figure: the three-tier stack with a heat sink.)
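To make the NUCA trade-off concrete (migration hides latency but adds interconnect traffic), here is a minimal sketch. Everything in it is an assumption for illustration: the banks sit in a row with bank i nearest core i, the cycle counts BASE_CYCLES and HOP_CYCLES are hypothetical, and the policy of moving a block one bank toward the requester on each access is a simple gradual-migration scheme, not necessarily the one used in this design.

```python
# Hypothetical latency parameters: the slide gives no numbers.
BASE_CYCLES = 10  # bank access time
HOP_CYCLES = 4    # extra delay per hop between a core and a bank


def access_latency(core, bank):
    """Banks sit in a row, with bank i nearest core i (assumed layout)."""
    return BASE_CYCLES + HOP_CYCLES * abs(core - bank)


class NucaCache:
    """Toy NUCA: blocks start in a home bank and migrate one bank toward
    the requesting core on every access (a simple gradual-migration policy)."""

    def __init__(self, num_banks):
        self.num_banks = num_banks
        self.location = {}   # block address -> current bank
        self.migrations = 0  # each migration is extra interconnect traffic

    def access(self, core, block):
        bank = self.location.setdefault(block, block % self.num_banks)
        latency = access_latency(core, bank)
        if bank != core:
            self.location[block] = bank + (1 if core > bank else -1)
            self.migrations += 1
        return latency
```

With `cache = NucaCache(4)`, five accesses from core 0 to block 3 (`[cache.access(0, 3) for _ in range(5)]`) return `[22, 18, 14, 10, 10]`: latency falls as the block migrates, at the cost of three migrations' worth of extra traffic.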

3DM2: MIT Lincoln Laboratory 0.18 µm 3D SOI Technology

3D CMP Test Chip Architecture
- Using OpenRISC 1200
  - Open-source in-order RISC uniprocessor
  - Has been tested in silicon and runs Linux
  - Simple core used due to test-chip area constraints
- MIT Lincoln Labs process
  - 180 nm, 25 mm² × 3 tiers
  - Taped out in November 2006
(Block diagram: L1 instruction cache, L1 data cache, priority arbiter, JTAG debug interface, UART (RS-232), L2 cache, on-chip RAM.)

3D CMP NoC Interconnect
- One example NoC using two 5-port routers
- Short vertical links to local L2 slices
- Links to the NoC fabric for remote L2 traffic

3D Reconfigurable Interconnects
- 3D integration
  - Targets interconnect latency by reducing wirelength
- RF interconnects
  - Frequency-Division Multiple Access (FDMA)
  - Targets interconnect congestion by improving bandwidth
    - Multiple signals can occupy a common interconnect
  - Further potential to dynamically tune frequencies
    - Adapt to different communication patterns
    - Interconnect density can be reduced while minimizing performance impact
(Figure: cores A to D and cache banks 0 to 3 connected through a shared RF fabric in several configurations.) One shared RF fabric can be configured to a wide range of topologies.
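One way to picture "dynamically tune frequencies to adapt to communication patterns" is a reconfiguration step that hands each FDMA carrier to a source/destination pair. The sketch below is a hypothetical greedy heuristic, not the authors' method: the function name `configure_rf_fabric`, the demand map, and the fallback to a baseline (non-RF) interconnect for unassigned pairs are all assumptions.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]  # (source, destination), e.g. ("A", "0") for core A -> bank 0


def configure_rf_fabric(demand: Dict[Pair, float],
                        carriers_ghz: List[float]) -> Dict[float, Pair]:
    """Greedy, hypothetical reconfiguration: give each FDMA carrier to one of
    the heaviest-traffic pairs. Pairs left without a carrier are assumed to use
    a baseline (non-RF) interconnect."""
    # Heaviest demand first; ties broken by pair name for determinism.
    ranked = sorted(demand, key=lambda pair: (-demand[pair], pair))
    return {carrier: pair for carrier, pair in zip(carriers_ghz, ranked)}
```

For demand `{("A", "0"): 5.0, ("B", "1"): 1.0, ("C", "3"): 8.0, ("D", "2"): 3.0}` and carriers `[8.0, 16.0, 24.0]`, the result is `{8.0: ("C", "3"), 16.0: ("A", "0"), 24.0: ("D", "2")}`. Calling it again when the workload's pattern changes yields a new assignment, which is the sense in which one shared fabric realizes different logical topologies.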

Carrier Frequency
- On/off digital switching is the main source of noise coupled into the RF interconnect
- Higher-frequency carriers avoid all of the base-band digital noise
- Clock rates of future CPUs are not expected to exceed 4-5 GHz (due to power consumption)
- The base-band noise bandwidth will therefore extend to around the clock rate
- We need to pick carrier frequencies far away from this noise
  => f1 = 8 GHz, f2 = 16 GHz, f3 = 24 GHz, f4 = 32 GHz
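The reasoning above (noise extends to about the clock rate, so carriers must sit well above it) can be turned into a small calculation. This sketch assumes the carriers are integer multiples of a fixed spacing, as the slide's 8/16/24/32 GHz values are; the guard band `guard_ghz` is a hypothetical parameter, not a number from the slides.

```python
import math


def pick_carriers(clock_ghz, guard_ghz, spacing_ghz, count):
    """Evenly spaced carriers (multiples of spacing_ghz) whose lowest member sits
    at least guard_ghz above the clock rate, where the base-band switching noise
    is assumed to end."""
    lowest_allowed = clock_ghz + guard_ghz
    k = math.ceil(lowest_allowed / spacing_ghz)  # first multiple clearing the noise
    return [spacing_ghz * (k + i) for i in range(count)]
```

With a 5 GHz clock, a hypothetical 3 GHz guard band, and 8 GHz spacing, `pick_carriers(5, 3, 8, 4)` returns `[8, 16, 24, 32]`, matching the slide's choice; widening the guard to 4 GHz would push the lowest carrier to 16 GHz.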

Bi-Directional FDMA Link/Bus
(Figure: a bi-directional link and a bi-directional bus.)
Advantages:
- Higher combined data rate
- Simultaneous, bi-directional communication
- Reconfigurable between bands
- Low in-band coupling for a parallel bus
- Potentially fewer I/O pins and smaller routing area

FDMA-I I/O Data Eye Diagram

3D CMP Roadmap
- 3D CMP with direct interconnects
  - Four OR1200 cores, four shared L2 cache banks, and a simple, static interconnect topology; implemented on an FPGA first and then fabricated at MIT LL
- Simulation infrastructure to explore the NUCA and RF design space
  - Dynamic adaptation of the RF interconnect to a diverse set of multithreaded and multitasking applications
- FPGA prototyping of the core, bus structures, NUCA, and RF
  - Choose the best power/performance point in the design space
- Final implementation on a 3D process (MIT LL)

Physical Design Flow for 3DM2 (1/2)
- RTL synthesis
- Partition netlist
- Floorplan: (1) place macros (memory), (2) place clock vias, (3) plan signal via regions
- P/G network: place P/G vias
- Place
- Trial route: check routing congestion and timing constraints
- RC extraction
- Clock tree: match min/max phase delay on 3 tiers
- Route: align signal vias
- RC extraction
- DRC, LVS
- Layout

Physical Design Flow for 3DM2 (2/2)
- Most 3D features are handled manually
- More 3D CAD tools are needed

Thermal-Aware 3D Physical Design Flow
- Inputs: netlist (LEF/DEF), design constraints, technology
- Thermal-driven 3D floorplanner
- Thermal-driven 3D placement
- Thermal-aware 3D router with thermal via planning
- Analysis and verification: parasitic extraction, timing analysis, thermal simulation (compact thermal model), layout verification
- Common database: OpenAccess
- Output: CIF/GDSII

Thermal Resistive Network [Wilkerson04]
- Circuit stack partitioned into tiles
- Tiles connected through thermal resistances
  - Lateral resistances: fixed
  - Vertical resistances: ∝ 1/#via
- Heat sources modeled as current sources
  - Current value = power
- Heat sinks modeled as ground nodes
- Accurate but slow
(Figure: (a) tile stack array, with lateral resistances R_lateral between stacks; (b) single tile stack with heat sources P_1 to P_5 and resistances R_1 to R_5.)
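The model above is an ordinary resistive circuit, so tile temperatures follow from nodal analysis: build the conductance matrix G, inject each tile's power as a current, and solve G T = P. The sketch below is a minimal dense-matrix version under stated assumptions, not the [Wilkerson04] implementation: temperatures are rises above ambient, the heat sink is an ideal ground reached from tier 0 through `r_sink`, and every tile column has at least one via so that its vertical resistance is `r_via_unit / vias[y][x]`. All parameter names are hypothetical.

```python
# Assumptions (not from the slides): temperatures are rises above ambient,
# the heat sink is an ideal ground under tier 0, and each tile column has
# vias[y][x] >= 1 vertical vias so R_vertical = r_via_unit / vias[y][x].


def solve_linear(a, b):
    """Gaussian elimination with partial pivoting; returns x with a @ x = b."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def tile_temperatures(power, vias, r_lat, r_via_unit, r_sink):
    """power[z][y][x]: heat in watts per tile, z = 0 is the tier next to the sink.
    Returns the temperature rise of every tile, indexed the same way."""
    nz, ny, nx = len(power), len(power[0]), len(power[0][0])
    n = nx * ny * nz

    def node(x, y, z):
        return (z * ny + y) * nx + x

    g = [[0.0] * n for _ in range(n)]  # thermal conductance matrix
    p = [0.0] * n                      # injected heat (the "current sources")

    def connect(i, j, r):
        g[i][i] += 1 / r
        g[j][j] += 1 / r
        g[i][j] -= 1 / r
        g[j][i] -= 1 / r

    for z in range(nz):
        for y in range(ny):
            for x in range(nx):
                i = node(x, y, z)
                p[i] = power[z][y][x]
                if x + 1 < nx:
                    connect(i, node(x + 1, y, z), r_lat)  # lateral: fixed
                if y + 1 < ny:
                    connect(i, node(x, y + 1, z), r_lat)
                if z + 1 < nz:  # vertical: inversely proportional to #via
                    connect(i, node(x, y, z + 1), r_via_unit / vias[y][x])
                if z == 0:
                    g[i][i] += 1 / r_sink  # path to the grounded heat sink
    t = solve_linear(g, p)
    return [[[t[node(x, y, z)] for x in range(nx)] for y in range(ny)]
            for z in range(nz)]
```

A dense solve costs O(n³) in the number of tiles, which illustrates why the slide calls this model accurate but slow; production tools would use sparse or iterative solvers instead.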

Thermal-Aware 3D Floorplanning [ICCAD04]
- Simulated annealing (SA) engine
  - New local z-neighbor operations
  - Cost function terms:
    - nwl: normalized wirelength
    - narea: normalized chip area
    - nvc: normalized interlayer via number
    - c_T: temperature cost
- Hybrid thermal evaluation
  - At each move: simplified chain model
  - At each SA temperature drop: resistive network model
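A rough sketch of the two cheap pieces evaluated at every SA move: the simplified chain model for one tile stack and a cost built from the four listed terms. Combining the terms as a weighted sum, and the values in `WEIGHTS`, are assumptions, since the slide names the terms without giving the formula. The Metropolis acceptance rule shown is the standard one for simulated annealing.

```python
import math
import random

# Hypothetical weights: the slide lists the cost terms but not how they are combined.
WEIGHTS = {"nwl": 1.0, "narea": 1.0, "nvc": 0.5, "c_t": 2.0}


def chain_temperature(powers, resistances):
    """Simplified chain model for one tile stack, both lists ordered from the
    tier next to the heat sink upward. All heat generated at or above tier k
    must cross resistance k, so the top-tier rise is sum_k R_k * (heat above k)."""
    rise, heat_above = 0.0, sum(powers)
    for p, r in zip(powers, resistances):
        rise += r * heat_above
        heat_above -= p
    return rise


def floorplan_cost(nwl, narea, nvc, c_t, w=WEIGHTS):
    """Weighted sum of the normalized terms (the weighting is an assumption)."""
    return w["nwl"] * nwl + w["narea"] * narea + w["nvc"] * nvc + w["c_t"] * c_t


def accept_move(delta_cost, sa_temperature, rng):
    """Metropolis acceptance rule used by simulated annealing."""
    if delta_cost <= 0:
        return True
    return rng.random() < math.exp(-delta_cost / sa_temperature)
```

The chain model is a single O(tiers) pass per stack, which is why it suits per-move evaluation; for one stack with no lateral paths it gives the same top-tier value as the full resistive network (three 1 W tiers with resistances 1.0, 0.5, 0.5 give 4.5 in both).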

3D Placement via Transformation [ASPDAC 07]
- Idea
  - Start from a 2D placement
  - Heuristic 2D-to-3D transformation
    - Reduce long nets
    - Keep local connecting nets
  - Window-based transformation
    - Balance WL and #via
  - RCN-graph-based refinement
    - Reduce #via and temperature
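A toy illustration of what a window-based 2D-to-3D transformation can look like. This is a hypothetical sketch, not the transformation from the ASPDAC 07 paper: the chip is cut into square windows, the cells in each window are split into balanced, contiguous layer groups, and coordinates shrink by 1/sqrt(K) so each of the K layers covers 1/K of the original area. Counting one via per tier boundary crossed by a two-pin net is also an assumption.

```python
import math
from collections import defaultdict


def window_stack(cells, num_layers, window):
    """Toy window-based 2D-to-3D transformation (hypothetical, not the paper's).

    cells: {name: (x, y)} from a legal 2D placement.
    Each window x window region is split into num_layers groups of (nearly)
    equal size, in x-then-y order, so horizontally adjacent cells tend to stay
    on the same layer. Coordinates shrink by 1/sqrt(num_layers).
    Returns {name: (x, y, layer)}; overlap removal is left to a later step.
    """
    scale = 1.0 / math.sqrt(num_layers)
    windows = defaultdict(list)
    for name, (x, y) in cells.items():
        windows[(int(x // window), int(y // window))].append(name)
    placed = {}
    for members in windows.values():
        members.sort(key=lambda n: (cells[n][0], cells[n][1], n))
        for rank, name in enumerate(members):
            layer = rank * num_layers // len(members)  # balanced contiguous groups
            x, y = cells[name]
            placed[name] = (x * scale, y * scale, layer)
    return placed


def count_vias(placed, nets):
    """Inter-layer vias for two-pin nets, one per tier boundary crossed (assumption)."""
    return sum(abs(placed[a][2] - placed[b][2]) for a, b in nets)
```

For four cells in a row chained by three nets, one large window puts them on layers 0, 0, 1, 1 (one via), while windows two cells wide give 0, 1, 0, 1 (three vias). In this toy version only the via count responds to the window size; re-packing coordinates per window, where wirelength would also move, is omitted.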

Multilevel TS-Via Planning and 3D Routing [ASPDAC'05 & ICCAD'05]
- Alternating-direction TS-via planning
  - Decompose the nonlinear programming (NLP) problem into simplified sub-problems
- Embedded in a multilevel framework with routing
(Figure: resistive chain with heat currents I_1 to I_4 and resistances R_1 to R_4, with R_2, R_3, R_4 each inversely proportional to a_2, a_3, a_4.)
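As an illustration of how small a "simplified sub-problem" can be (a hypothetical stand-in, not the decomposition used in the cited papers), take the chain from the figure: heat current I_k enters tier k, tier 1 sits next to the heat sink, and R_k = ρ/a_k, where a_k is the via area given to level k and ρ is a constant (both assumptions). The top-tier rise is T = Σ_k ρ S_k / a_k with S_k = Σ_{j≥k} I_j. Minimizing T under a budget Σ_k a_k = A with a Lagrange multiplier λ gives ρ S_k / a_k² = λ, so a_k = A √S_k / Σ_j √S_j and T_min = ρ (Σ_k √S_k)² / A. The functions below implement that closed form.

```python
import math


def crossing_heat(currents):
    """S_k: heat crossing resistance k = heat generated at tier k and above.
    currents are ordered from the tier next to the heat sink upward."""
    return [sum(currents[k:]) for k in range(len(currents))]


def allocate_via_area(currents, total_area):
    """Closed-form minimiser of sum_k rho * S_k / a_k subject to sum_k a_k = A:
    a_k = A * sqrt(S_k) / sum_j sqrt(S_j)."""
    roots = [math.sqrt(s) for s in crossing_heat(currents)]
    total = sum(roots)
    if total == 0:  # no heat anywhere: any split is optimal
        return [total_area / len(currents)] * len(currents)
    return [total_area * r / total for r in roots]


def top_temperature(currents, areas, rho=1.0):
    """Top-tier rise with R_k = rho / a_k (terms carrying no heat are skipped)."""
    return sum(rho * s / a for s, a in zip(crossing_heat(currents), areas) if s > 0)
```

For currents `[1.0, 1.0, 2.0]` and a budget of 3, a uniform split gives a top-tier rise of 9.0, while the square-root allocation gives (2 + √3 + √2)²/3, about 8.83: vias go preferentially to the lower levels, which carry more of the heat.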

OpenAccess Extension for 3D Design
- Define additional 3D information
  - Device layer
  - Inter-layer via
- Provide an interface for 3D CAD tools
  - Parameter extraction
  - Timing
  - LVS
- Compatible with Cadence Encounter
(Figure: 2D OA with AppDef, an OA Gear wrapper, 3D CAD tools, and Cadence tools.)

Summary
- Design Driver
  - 3D Chip Multiprocessor
    - Based on OpenRISC 1200
  - NoC Interconnect
  - RF Reconfigurable Interconnects
- Physical Design Flow
  - Design Flow for 3DM2
  - Design Flow in Development

THE END Thank You!