Physical Planning for the Architectural Exploration of Large-Scale Chip Multiprocessors
Javier de San Pedro, Nikita Nikitin, Jordi Cortadella and Jordi Petit

Presentation transcript:

Physical Planning for the Architectural Exploration of Large-Scale Chip Multiprocessors
Javier de San Pedro, Nikita Nikitin, Jordi Cortadella and Jordi Petit
Universitat Politècnica de Catalunya (Barcelona)
Project funded by Intel Corp.

Outline
- Introduction
  – Architectural exploration of CMPs
- Physical planning
  – Integration with architectural exploration
  – Floorplanning and wire planning technology
- Results and conclusions
NOCS 2013

Designing a Chip Multiprocessor
- How many cores?
- How much L2/L3 on-chip cache?
- Interconnect: mesh/ring/bus?
- How many memory controllers?
[Figure labels: CMP, Off-Chip Memory, DSP, Graphics, Data Mining, Bioinformatics]

Automated exploration
- Models (performance/power)
  – Cores
  – On-chip caches
  – Off-chip memories
  – Interconnect fabrics
  – Cache protocol
- Workloads
- Constraints
  – Area
  – Throughput
  – Power
- Exploration tool
- Architectural configuration
  – Number of cores
  – Cluster size
  – L2/L3 size
  – Hierarchical interconnect
  – Memory controllers
- Huge design space: billions of configurations
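To make the size of the design space concrete, the following is a minimal sketch of how a configuration space can be enumerated and filtered by an area constraint. The parameter ranges, the per-block area constants and the names `SPACE`, `area_mm2` and `feasible` are all hypothetical and chosen only for illustration; they are not taken from the presentation or from the authors' exploration tool.

```python
from itertools import product
from math import prod

# Hypothetical parameter ranges (assumptions for illustration only).
SPACE = {
    "mesh": [(2, 2), (4, 4), (6, 4), (8, 8)],   # clusters in X and Y
    "cores_per_cluster": [1, 2, 4, 6, 8],
    "l2_kb": [64, 96, 128, 256],                # L2 per core
    "l3_mb": [0, 16, 32, 64],                   # shared L3
    "mem_ctrl": [2, 4, 8],
}

def count_configs(space):
    """Size of the Cartesian product of all parameter ranges."""
    return prod(len(values) for values in space.values())

def enumerate_configs(space):
    """Yield every configuration as a dict."""
    keys = list(space)
    for combo in product(*(space[k] for k in keys)):
        yield dict(zip(keys, combo))

def area_mm2(cfg):
    """Toy linear area model; every constant here is made up."""
    cores = cfg["mesh"][0] * cfg["mesh"][1] * cfg["cores_per_cluster"]
    per_core = 2.0 + 0.005 * cfg["l2_kb"]       # core plus its L2
    return cores * per_core + 1.0 * cfg["l3_mb"] + 3.0 * cfg["mem_ctrl"]

def feasible(space, area_limit):
    """Keep only the configurations that satisfy the area constraint."""
    return [c for c in enumerate_configs(space) if area_mm2(c) <= area_limit]

if __name__ == "__main__":
    print(count_configs(SPACE), "configurations in total")
    print(len(feasible(SPACE, 400.0)), "within a 400 mm^2 area limit")
```

Even this toy space multiplies quickly with every added parameter, which is why exhaustive simulation is impractical and a fast filtering step is needed first.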

Our exploration flow
- Architectural configurations → Analytical modeling* → Promising configurations → Simulation
*N. Nikitin et al., "Analytical Performance Modeling of Hierarchical Interconnect Fabrics," NOCS 2012

Configuration example
- 6x4 mesh, 24 clusters
- total 144 cores
- 6 cores/cluster
- 1 C1, 128K L1, 256K L2
- 5 C2, 64K L1, 96K L Mb total shared L3
Throughput = IPC
[Figure labels: C1, C2, L2, L3, Bus, NI, R]
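The totals on this slide follow from simple arithmetic, which the short sketch below reproduces. It uses only the mesh size, the core counts and the L1 sizes stated on the slide; the names `MESH`, `CLUSTER_CORES` and `summarize` are hypothetical, and the L2 and L3 figures are left out because part of that text is missing from the transcript.

```python
# Values taken from the slide: a 6x4 mesh of clusters,
# each cluster holding 1 C1 core and 5 C2 cores.
MESH = (6, 4)
CLUSTER_CORES = {
    "C1": {"count": 1, "l1_kb": 128},
    "C2": {"count": 5, "l1_kb": 64},
}

def summarize(mesh, cluster_cores):
    """Derive cluster, core and L1 totals for a tiled configuration."""
    clusters = mesh[0] * mesh[1]
    cores_per_cluster = sum(c["count"] for c in cluster_cores.values())
    l1_kb_per_cluster = sum(c["count"] * c["l1_kb"] for c in cluster_cores.values())
    return {
        "clusters": clusters,
        "cores_per_cluster": cores_per_cluster,
        "total_cores": clusters * cores_per_cluster,
        "l1_kb_per_cluster": l1_kb_per_cluster,
        "l1_kb_total": clusters * l1_kb_per_cluster,
    }

if __name__ == "__main__":
    for key, value in summarize(MESH, CLUSTER_CORES).items():
        print(f"{key}: {value}")
```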

Motivation for physical planning
- Block area is used during exploration
- Min area, but …
  – Unbalanced aspect ratio
  – L2 not adjacent to cores
  – Uneven distribution of ring routers
[Figure labels: C, L2, r, R, L3; tile sides N, S, W, E]

Abutability in tiled CMPs
[Figure labels: tile sides N, S, E, W]

Over-the-cell routing
ISPD 2013
- Tiled CMPs
[Figure labels: Cache, Core, Router; "≈ wires"; "1000 wires ≈ 10 µm"]

Physical planning
- Flow stages: Analytical Modeling, Physical Planning, Simulation
- Physical planning estimations:
  – Area
  – Wirelength
  – Routability
[Figure labels: L3, R, Local IC, L2, C]

Floorplanner
- Slicing structures & Simulated Annealing*
- Lightweight maze router
- Constraints:
  – Adjacency (Core ↔ L2)
  – Balanced links (rings)
  – Maximum length for critical nets
*D.F. Wong and C.L. Liu, "A New Algorithm for Floorplan Design," DAC 1986
[Figure: slicing tree with V and H cut nodes]
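As an illustration of slicing structures combined with simulated annealing, here is a minimal sketch that evaluates a slicing floorplan written as a postfix expression and anneals it to reduce the bounding-box area. It is a simplification, not the authors' floorplanner: blocks have fixed shapes, only two move types are used (operand swap and cut flip), the cost is area alone, and the adjacency, link-balance and net-length constraints from the slide are not modeled. The convention that `V` places blocks side by side and `H` stacks them is an assumption, as are all block names and sizes.

```python
import math
import random

CUTS = ("H", "V")

def bounding_box(expr, sizes):
    """Evaluate a postfix slicing expression; return (width, height)."""
    stack = []
    for tok in expr:
        if tok in CUTS:
            w2, h2 = stack.pop()
            w1, h1 = stack.pop()
            if tok == "V":                      # side by side: widths add
                stack.append((w1 + w2, max(h1, h2)))
            else:                               # stacked: heights add
                stack.append((max(w1, w2), h1 + h2))
        else:
            stack.append(sizes[tok])            # leaf block with a fixed shape
    assert len(stack) == 1, "malformed slicing expression"
    return stack[0]

def area(expr, sizes):
    w, h = bounding_box(expr, sizes)
    return w * h

def neighbor(expr, rng):
    """Return a perturbed copy: swap two blocks or flip one cut direction."""
    new = list(expr)
    if rng.random() < 0.5:
        operands = [i for i, t in enumerate(new) if t not in CUTS]
        i, j = rng.sample(operands, 2)
        new[i], new[j] = new[j], new[i]
    else:
        k = rng.choice([i for i, t in enumerate(new) if t in CUTS])
        new[k] = "H" if new[k] == "V" else "V"
    return new

def anneal(expr, sizes, steps=2000, t0=10.0, alpha=0.995, seed=0):
    """Simulated annealing over slicing expressions, minimizing area."""
    rng = random.Random(seed)
    cur, cur_cost = list(expr), area(expr, sizes)
    best, best_cost = cur, cur_cost
    temp = t0
    for _ in range(steps):
        cand = neighbor(cur, rng)
        cost = area(cand, sizes)
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if cost <= cur_cost or rng.random() < math.exp((cur_cost - cost) / temp):
            cur, cur_cost = cand, cost
            if cost < best_cost:
                best, best_cost = cand, cost
        temp *= alpha
    return best, best_cost

# Hypothetical block shapes (width, height) in arbitrary units.
SIZES = {"core": (4, 4), "l2": (4, 2), "l3": (8, 3), "router": (2, 2)}
START = ["core", "l2", "V", "l3", "router", "V", "H"]

if __name__ == "__main__":
    print("initial area:", area(START, SIZES))
    best, cost = anneal(START, SIZES)
    print("annealed:", " ".join(best), "area:", cost)
```

Both moves keep the expression a valid postfix string, so every candidate can be evaluated directly without a repair step.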

Wire planner
- SAT-based approach for gridded routing
- Grid unit: link width (≈ wires)
- Customizable for any type of Boolean-encoded constraints (abutability, 1D/2D routing, …)
[Figure: top view and cross-section view]
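To show what a Boolean encoding of a gridded routing problem can look like, the sketch below encodes a deliberately small 1D case: each link spans an interval of grid columns and must be assigned to exactly one track, and links whose spans overlap may not share a track. This is an assumed toy formulation, not the encoding used by the authors' wire planner, and the brute-force `solve` function stands in for a real SAT solver. All names and the example nets are hypothetical.

```python
from itertools import combinations, product

def encode(nets, tracks):
    """Build CNF clauses; var[(n, t)] is true when net n is routed on track t."""
    names = sorted(nets)
    var = {(n, t): i * tracks + t + 1
           for i, n in enumerate(names) for t in range(tracks)}
    clauses = []
    for n in names:
        clauses.append([var[n, t] for t in range(tracks)])      # at least one track
        for a, b in combinations(range(tracks), 2):
            clauses.append([-var[n, a], -var[n, b]])            # at most one track
    for m, n in combinations(names, 2):
        (lo1, hi1), (lo2, hi2) = nets[m], nets[n]
        if lo1 <= hi2 and lo2 <= hi1:                           # column spans overlap
            for t in range(tracks):
                clauses.append([-var[m, t], -var[n, t]])        # cannot share a track
    return var, clauses

def solve(num_vars, clauses):
    """Brute-force satisfiability check; only suitable for tiny instances."""
    for bits in product([False, True], repeat=num_vars):
        if all(any(bits[abs(lit) - 1] == (lit > 0) for lit in clause)
               for clause in clauses):
            return bits
    return None

def route(nets, tracks):
    """Return a {net: track} assignment, or None if no legal routing exists."""
    var, clauses = encode(nets, tracks)
    model = solve(len(var), clauses)
    if model is None:
        return None
    return {n: t for (n, t), v in var.items() if model[v - 1]}

# Hypothetical links given as (first column, last column) on the routing grid.
NETS = {"a": (0, 3), "b": (2, 5), "c": (4, 7)}

if __name__ == "__main__":
    print("1 track :", route(NETS, 1))
    print("2 tracks:", route(NETS, 2))
```

Extra requirements can be added as further clauses over the same variables, which is the sense in which such an encoding is customizable.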

Filtering floorplans
[Figure label: Area limit]

After physical planning

Conclusions
- Physical planning has a significant impact during architectural exploration of tiled CMPs
- Future work:
  – New topologies (mesh of rings)
  – Multi-module memory floorplanning
  – Regularity and choppability