Multi-Core Design Automation Challenges. John Darringer, IBM Research, IBM T. J. Watson Research Center. © 2007 IBM Corporation.

Presentation transcript:

© 2006 IBM Corporation 0

Multi-Core Design Automation Challenges
John Darringer
IBM Research, IBM T. J. Watson Research Center, Yorktown Heights, NY, USA
DAC 2007
© 2007 IBM Corporation

System Performance Requires An Integrated Approach
• Scaling no longer provides traditional performance boost
• Power limits everything
• Advances will come from entire performance stack
[Figure: performance stack with levels Technology, Chip Level, System Level, Application. Labels: Dynamic optimization; Assist Threads; Fast Computation; Power Optimization; Compiler Support; Packaging, Cooling; New Devices; Dense SRAM, eDRAM; Optics; Memory; Languages, Software Tuning; Efficient Programming; Middleware; Multiple Cores; SMT; Accelerators; Power Management; Interconnect; Circuits]
[Figure: Device Performance vs. Production Date, comparing Recent and Historical Trend]

Innovation in System Design
• Power 4 Multi-Core-2001
• Power 5 Multi-Thread-2004
• CELL Accelerators-2006
• Power Ghz-2007

Trend to Modular Application Optimized Systems
• Growing use of diverse modular components
• Chip integration may evolve to component assembly
• Challenge is in system-level design
  – Optimizing architecture for specific applications
[Figure labels: Core, Accelerator, Cache, Memory, Blades, SMP, ...]

Multi-Core ASICs
• Multi-core ASIC SoCs are common today
  – Address broad range of markets
  – Enable high functional integration
  – Provide rapid time to market
• One example from 2004
  – Cisco Silicon Packet Processor
  – bit RISC processors
  – 47 BIPS

Multi-Core Processors
• Power efficient, reusable cores
• Application matched accelerators
• Flexible scalable interconnect
• Optimized memory hierarchy
• High speed I/O
• Energy management
• Deliver system performance
• Rapid chip assembly to serve diverse markets

CHALLENGE
• System Design
  – Continued performance growth
  – Increasing power efficiency
  – Optimizing for new applications
• Design Automation
  – Custom design efficiency
  – ASIC productivity
  – Design and verification
• Enablers
  – Physical Architecture
  – Integrated Early Analysis
  – Multi-Core Verification

Physical Architecture
• Complement logical architecture
• Streamline chip integration
• Plan for interconnect
• Provide predictable results
• Multiple strategies
  – Fixed layout per block
  – Parametric or generated
  – Extended synthesis
[Figures: Example Logical Architecture; Example Physical Architecture]

Modular Components
• Components need self-contained vertical stack
  – with clean interfaces to enable automated integration
[Figure: current vs. future chips. Current "Component": Mixed Fabric and Component Function; Custom Interface. Future Component: Component Function; Component Fabric Interface. Current Chips: Custom crafting of clock, data, and power meshes. Future Chips: Automated connection with parametric fabric]

Custom Design
• Careful interconnect design
  – Communication
  – Clock distribution
  – Power and ground
• Better power efficiency
  – Clock gating, Power gating
  – Detailed transistor sizing
• High bandwidth memory and I/O
• Higher frequency operation

Challenges of Modular Design
• Custom Layout
  – Flexible shape and orientation
  – Optimum mesh for power and clock
  – Distributed communication and test
  – Manually optimized
• Modular Layout
  – Constrained shape and orientation
  – Separate power and clock per core
  – Parametric interconnect fabric
  – Automatic connection to fabric
[Figure label: Core]

Custom Clock Design
• Distribution network
  – Latches and clocked gates
  – Control skew and jitter
  – Minimize power
  – Survive variation and noise
• Interconnect models
  – Inductance critical
  – Transmission line
  – Buffer placement
• Hand optimized
  – Still an art
Phillip Restle

Custom Power Distribution
• Distribute to all devices
• Multiple voltage domains
• Simulate detailed power demand
• Model chip and package
• Consider ground coupling
• Balance mesh and trees
• Allocate decoupling capacitors
• Focus on resonant frequency
• Explore clock/power gating scenarios
Howard Chen
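
The "focus on resonant frequency" point can be illustrated with the textbook ideal LC resonance formula, f = 1 / (2π√(LC)). The sketch below is only a first-order illustration, not the analysis used in the talk: it assumes the package behaves as a single lumped inductance and the on-chip decoupling as a single lumped capacitance, and the numeric values are hypothetical.

```python
import math

def resonant_frequency_hz(inductance_h: float, capacitance_f: float) -> float:
    """Resonant frequency of an ideal lumped LC circuit: 1 / (2*pi*sqrt(L*C))."""
    if inductance_h <= 0 or capacitance_f <= 0:
        raise ValueError("inductance and capacitance must be positive")
    return 1.0 / (2.0 * math.pi * math.sqrt(inductance_h * capacitance_f))

# Hypothetical values chosen for illustration only (not from the talk).
package_inductance_h = 10e-12    # 10 pH effective package inductance
decap_f = 100e-9                 # 100 nF total on-chip decoupling capacitance

f0 = resonant_frequency_hz(package_inductance_h, decap_f)
# Quadrupling the decoupling capacitance halves the resonant frequency.
f0_more_decap = resonant_frequency_hz(package_inductance_h, 4 * decap_f)

print(f"resonance: {f0 / 1e6:.1f} MHz, with 4x decap: {f0_more_decap / 1e6:.1f} MHz")
```

Under this simple model, adding decoupling capacitance moves the resonance lower, which is one reason capacitor allocation and the resonant frequency are considered together.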

Challenges of Modular Design
• Custom Wiring
  – Optimized over chip
  – Resources shared
  – Variation minimized
  – Complex analysis and integration
• Modular Wiring
  – Optimized at block level
  – Fixed resource allocation
  – Some variation in results
  – Requires automated integration

Spectrum of Strategies
Modular Reuse: fixed physical architecture
• Careful block design
• Custom within block
• Automated block connect
• Predictable results
• Good for planned cases
• Stresses design
Extended Synthesis: generated physical architecture
• More abstract layout
• Heavy physical synthesis
• Unique block configuration
• Results will vary
• Flexible restructuring
• Stresses tools
[Figure axis: Fixed Layout … Parametric … Generated]

Systems Demand Early Analysis
• To explore many more options
  – Cores, Accelerators, Interconnect, Memory Hierarchy, …
• To consider many design criteria simultaneously
  – Power, Performance, Latency, Hotspots, Reliability, …
• To optimize system for specific market
• Environment exists for early functional modeling
• But today’s tools are not linked to physical design

Early System Analysis
• Loosely coupled disciplines with multiple experts and distinct models
[Figure labels: Design Team; Performance Models; Power Analysis; Thermal Analysis; Interconnect Analysis; Floorplan; Design; Technology; Package; Implementation; Assumptions]

Performance Modeling Is Changing
• New parallel workloads emerging
  – Execution vs. trace driven
• Shifting to multi-core designs
  – Stresses balance of model performance and accuracy
• Complex interconnect fabric and memory hierarchy
  – Bus, switch, network, asynchronous, …
• Increasing use of SystemC
  – For early software development and component sharing

Early Physical Planning is Essential
• Interconnect requires full chip layout
  – Estimate component area before implementation
  – Need more accurate methods
  – Have to plan for all facilities to predict chip size
• Placement coupled to many factors
  – Interconnect performance
  – Power
  – Thermal and reliability concerns
  – Yield
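
A rough idea of what "estimate component area before implementation" might look like in its simplest form is sketched below. This is an illustrative assumption, not the method described in the talk: the `Block` type, the `utilization` factor (fraction of die area usable by blocks), the fixed overhead term, and all numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable

@dataclass
class Block:
    """A chip component with an early area estimate (hypothetical model)."""
    name: str
    area_mm2: float
    count: int = 1

def estimate_chip_area_mm2(blocks: Iterable[Block],
                           utilization: float = 0.8,
                           fixed_overhead_mm2: float = 0.0) -> float:
    """Sum block areas, inflate by placement utilization, add fixed overhead.

    'utilization' and 'fixed_overhead_mm2' (e.g. I/O ring, pervasive logic)
    are assumed planning parameters, not values from the presentation.
    """
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    block_area = sum(b.area_mm2 * b.count for b in blocks)
    return block_area / utilization + fixed_overhead_mm2

# Hypothetical four-core design with a shared cache.
design = [Block("core", 10.0, count=4), Block("cache", 20.0)]
chip_area = estimate_chip_area_mm2(design, utilization=0.75, fixed_overhead_mm2=10.0)
print(f"estimated chip area: {chip_area:.1f} mm^2")
```

The overhead term stands in for the slide's point that all facilities, not only the main blocks, have to be planned for before chip size can be predicted.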

Interconnect Fabric Modeling
• Interconnect delays
  – Affect performance
  – Depend on placement
  – Require accurate modeling
[Figure: Interconnects in Multi-Core Designs. Labels: Memory Controller; Core; Cache; Async/Sync Interface with Parametric delay; Interconnect Delays]
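
To make the dependence of interconnect delay on placement concrete, here is a minimal sketch. It assumes an unbuffered wire modeled as a distributed RC line, whose Elmore delay is 0.5 · R_total · C_total, routed along the Manhattan distance between two block positions. The model choice, the per-millimetre wire parameters, and the coordinates are all illustrative assumptions rather than anything stated in the talk.

```python
from typing import Tuple

Point = Tuple[float, float]  # block position in mm (hypothetical floorplan)

def manhattan_mm(a: Point, b: Point) -> float:
    """Manhattan (rectilinear) distance between two placed blocks."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def wire_delay_ps(length_mm: float,
                  r_ohm_per_mm: float = 100.0,
                  c_ff_per_mm: float = 200.0) -> float:
    """Elmore delay of an unbuffered distributed RC wire, in picoseconds.

    delay = 0.5 * R_total * C_total, so delay grows with length squared.
    The default r and c values are assumed, not taken from a real process.
    """
    r_total = r_ohm_per_mm * length_mm             # ohms
    c_total = c_ff_per_mm * length_mm * 1e-15      # farads
    return 0.5 * r_total * c_total * 1e12          # seconds -> picoseconds

core = (0.0, 0.0)
near_controller = (1.0, 1.0)   # 2 mm away
far_controller = (2.0, 2.0)    # 4 mm away

near_delay = wire_delay_ps(manhattan_mm(core, near_controller))
far_delay = wire_delay_ps(manhattan_mm(core, far_controller))
print(f"near: {near_delay:.0f} ps, far: {far_delay:.0f} ps")
```

In this model, doubling the distance quadruples the delay, so a performance model that ignores the floorplan can be badly off for long links.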

Power Is a Key Criterion, but Hard to Predict
• Need estimate before implementation
  – Voltage/Frequency scaling, Voltage islands, clock gating, leakage
• Not just core, but many diverse chip components
  – Core, cache, interconnect, controllers, I/O, pervasive
• Model “interesting” states and transitions
• Scale known implementations
  – Complex measurement process for calibration
  – Requires data from chip layout
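
The effect of voltage/frequency scaling on an early power estimate can be sketched with the standard first-order CMOS model: dynamic power ≈ α · C · V² · f, plus leakage power ≈ V · I_leak. The sketch below is a simplification under stated assumptions: leakage current is treated as constant (in practice it also varies with voltage and temperature), and all parameter values are hypothetical.

```python
def dynamic_power_w(activity: float, cap_f: float, vdd: float, freq_hz: float) -> float:
    """First-order dynamic power: activity * C * Vdd^2 * f."""
    return activity * cap_f * vdd ** 2 * freq_hz

def total_power_w(activity: float, cap_f: float, vdd: float,
                  freq_hz: float, leak_a: float) -> float:
    """Dynamic power plus leakage modeled as Vdd * I_leak (I_leak assumed constant)."""
    return dynamic_power_w(activity, cap_f, vdd, freq_hz) + vdd * leak_a

# Hypothetical component: 1 nF switched capacitance, 20% activity, 0.1 A leakage.
nominal = total_power_w(0.2, 1e-9, vdd=1.0, freq_hz=2.0e9, leak_a=0.1)
# Scale voltage and frequency down by 20% each.
scaled = total_power_w(0.2, 1e-9, vdd=0.8, freq_hz=1.6e9, leak_a=0.1)
# Clock gating is modeled here simply as a lower activity factor.
gated = total_power_w(0.05, 1e-9, vdd=1.0, freq_hz=2.0e9, leak_a=0.1)

print(f"nominal {nominal:.4f} W, scaled {scaled:.4f} W, gated {gated:.4f} W")
```

A model like this would need one parameter set per component and per "interesting" state, calibrated against measurements of known implementations as the slide notes.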

Integrated Early System Analysis
• Couple all forms of early analysis
• Share data in central repository
• Industry standard data model
  – Open Access
• Hand-off to chip integration
  – Assumptions, blocks, layout, …
• Graphic interface for editing
• Stage is set for optimization
[Figure labels: Design Team; Design; Floorplan; Package; Technology; Implementation; Assumptions; Results; Performance; Power; Interconnect; Thermal; Optimize; Handoff]

Multi-Core Verification
• Verification has always been the greatest challenge
• Complexity grows with each generation
• Challenge is to exploit reuse with multi-core designs
  – Requires clear interface definition
[Figure: Traditional Approach vs. Multi-Core Approach. Labels: Core; Core Verification; System Verification]

Core Verification
• Complexity growing
  – Clock/Power gating, Voltage and frequency scaling
• Formal methods are used
  – Checking RTL = netlist
  – Checking assertions
  – Proving implementation equivalent to reference model
• Simulation still dominates
• Need higher level of specification
  – Improve quality
  – Stretch synthesis and verification tools
• Reuse verification environment
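
The idea of proving an implementation equivalent to a reference model can be shown on a toy scale. The sketch below compares a behavioral 1-bit full adder with a gate-level version by exhaustively enumerating all inputs. Exhaustive enumeration is a stand-in chosen for illustration; production equivalence checkers rely on techniques such as BDDs or SAT solving, and every function name here is hypothetical.

```python
from itertools import product
from typing import Callable, Optional, Tuple

Bits = Tuple[int, ...]

def reference_full_adder(a: int, b: int, cin: int) -> Tuple[int, int]:
    """Behavioral reference model: returns (sum, carry_out)."""
    total = a + b + cin
    return total & 1, total >> 1

def netlist_full_adder(a: int, b: int, cin: int) -> Tuple[int, int]:
    """Gate-level style implementation built from XOR, AND, OR."""
    x = a ^ b
    return x ^ cin, (a & b) | (x & cin)

def buggy_full_adder(a: int, b: int, cin: int) -> Tuple[int, int]:
    """Deliberately wrong netlist: the carry term (x & cin) is missing."""
    x = a ^ b
    return x ^ cin, a & b

def find_mismatch(ref: Callable, impl: Callable, n_inputs: int) -> Optional[Bits]:
    """Return the first input vector where ref and impl differ, else None."""
    for bits in product((0, 1), repeat=n_inputs):
        if ref(*bits) != impl(*bits):
            return bits
    return None

good_result = find_mismatch(reference_full_adder, netlist_full_adder, 3)
bad_result = find_mismatch(reference_full_adder, buggy_full_adder, 3)
print("correct netlist mismatch:", good_result)
print("buggy netlist counterexample:", bad_result)
```

The returned counterexample plays the same role as the failing trace a formal tool reports. Enumeration grows as 2^n in the number of inputs, which is why real designs need symbolic methods.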

System Verification
• More complex systems
  – Many cores, accelerators, networks, asynchronous links
• Memory and network contention is critical area
• Formal methods have made impact
  – Verifying abstract memory protocols
• Simulation is still the final check
• Need system-level test case generation
  – Use system knowledge to expose resource contention issues

Summary
• Exciting and challenging times
  – Designing application optimized multi-core systems
  – Delivering custom efficiency with ASIC productivity
• Focus areas
  – Physical Architecture to streamline chip integration
  – Integrated Early Analysis to explore design space
  – Multi-core verification that exploits reuse
• Long history of invention in today’s RTL flow
• Innovation is needed now at the system level

Acknowledgements
• Thanks to the following people
  – Emrah Acar, Reinaldo Bergamaschi, Pradip Bose, Howard Chen, Nagu Dhanwada, Steven German, Steve Kosonocky, Indira Nair, Ruchir Puri, Phillip Restle, Albert Ruehli, Michael Vinov.