Some Thoughts on Technology and Strategies for Petaflops.

Rick Stevens, Argonne / Chicago

Possible paths to Petaflops
- Traditional commodity clusters
  - Leverage Moore's law on GP microprocessors
  - Interconnect and memory bandwidth problems
- Type C machines
  - DARPA HPCS paths (e.g. Cascade)
- Embedded-systems-based clusters
  - QCDOC is one example
  - BG/L is another example

Beyond Commodity Clusters
- Improved design capability
  - Small groups can design SoCs
  - Small groups can gain access to state-of-the-art fabrication capabilities
  - Design cycles are getting shorter thanks to increasing availability of off-the-shelf IP (Blue Logic, MIPS, etc.)
- QCDOC example

Hardware/Software Co-design
- Application kernels
  - Simple "FORTRAN"-like C code: well-behaved basic blocks with performance requirement annotations
- Compiler builds a performance model for each basic block
- Decision point based on the performance estimate
  - Compile for GPU or synthesize logic/FPGA code
- Generate glue code/runtime
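
The decision point in this flow can be sketched in a few lines. This is only an illustration: the slide does not say how the compiler's performance model is built, so the sketch substitutes a simple roofline-style bound (the minimum of peak compute rate and arithmetic intensity times memory bandwidth). The names `BasicBlock`, `Target`, `estimate_gflops`, and `choose_path` are hypothetical, and every number is invented. The final step, generating glue code and runtime, is not modeled.

```python
from dataclasses import dataclass


@dataclass
class BasicBlock:
    name: str
    flops: float            # floating-point operations per invocation
    bytes_moved: float      # bytes read plus written per invocation
    required_gflops: float  # the performance-requirement annotation


@dataclass
class Target:
    name: str
    peak_gflops: float       # hypothetical peak compute rate
    mem_gbytes_per_s: float  # hypothetical sustained memory bandwidth


def estimate_gflops(block: BasicBlock, target: Target) -> float:
    """Roofline-style bound: limited by compute peak or by memory traffic."""
    intensity = block.flops / block.bytes_moved  # flops per byte
    return min(target.peak_gflops, intensity * target.mem_gbytes_per_s)


def choose_path(block: BasicBlock, gpu: Target) -> str:
    """Decision point: compile if the estimate meets the annotation, else synthesize."""
    if estimate_gflops(block, gpu) >= block.required_gflops:
        return "compile-for-gpu"
    return "synthesize-fpga-logic"


# All numbers below are invented for illustration only.
gpu = Target("gpu", peak_gflops=4.0, mem_gbytes_per_s=2.0)
stencil = BasicBlock("stencil", flops=8, bytes_moved=16, required_gflops=0.5)
dense = BasicBlock("dense", flops=64, bytes_moved=16, required_gflops=10.0)

decisions = {b.name: choose_path(b, gpu) for b in (stencil, dense)}
```

With these made-up inputs, the memory-bound `stencil` block meets its modest annotation on the GPU target, while `dense` asks for more than the target's peak and is routed to logic synthesis. A real compiler would need a far richer model; the point is only that the annotation plus an estimate is enough to drive the branch.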

Special-purpose SoCs
- Network Processing Units (NPUs)
  - Core of fast IP switches and routers
  - Many companies are producing 10 Gbps components and moving towards 40 Gbps parts
- DSPs
  - Cell phone base stations
  - Signal processing and array-on-a-chip processors
  - Example: a 2 GHz, 175-million-transistor, 64-processor DSP array costing several hundred dollars a chip in quantities of 1,000

Graphics Accelerators
- NVIDIA GeForce4 example
  - > 100 M transistors
  - High-speed (QDR) RAM interface, > 10 GB/s
- Moving towards general-purpose processors
  - Cg programming language (programmable shaders)
- Evolving to become faster than the main CPU on a commodity-based node
  - Pentium or Itanium2 processor becomes a service processor?

Extendable Cores
- Possible target for HPC hardware/software co-design
- Provides a reconfigurable node platform
- Xilinx Virtex-II Pro
  - Multiple PowerPC cores (1-4)
  - Millions of gates of FPGA
  - Clock rates lag high-performance chips
- Other vendors are producing similar things
  - MIPS cores, SPARClite cores, etc.

Billion Transistor Dies by 2005/6
- Design challenges and opportunities
  - Many 32-bit cores available at < 500,000 transistors
  - Several 64-bit cores available at < 2,000,000 transistors
  - Complete SoC libraries becoming available (e.g. Blue Logic)
- Unprecedented opportunity for semi-custom node architectures based on SoC technologies
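
The transistor counts on this slide imply a rough core budget per die, which the following arithmetic sketch makes explicit. The `core_fraction` parameter is an assumption added here, not something from the slide: it stands for the share of the die spent on cores rather than memory, interconnect, and I/O, and the 25% value used below is purely illustrative.

```python
DIE_TRANSISTORS = 1_000_000_000  # "billion transistor dies"
CORE_32BIT = 500_000             # ceiling quoted for many 32-bit cores
CORE_64BIT = 2_000_000           # ceiling quoted for several 64-bit cores


def max_cores(die: int, per_core: int, core_fraction: float = 1.0) -> int:
    """Cores that fit if `core_fraction` of the die is spent on cores."""
    return int(die * core_fraction) // per_core


# Whole die spent on cores (an idealized bound, not a realistic design).
cores_32_full = max_cores(DIE_TRANSISTORS, CORE_32BIT)
cores_64_full = max_cores(DIE_TRANSISTORS, CORE_64BIT)

# Illustrative assumption: only a quarter of the die goes to cores.
cores_32_quarter = max_cores(DIE_TRANSISTORS, CORE_32BIT, core_fraction=0.25)
cores_64_quarter = max_cores(DIE_TRANSISTORS, CORE_64BIT, core_fraction=0.25)
```

At the quoted ceilings, a billion transistors is worth about 2,000 32-bit cores or 500 64-bit cores. Even with three quarters of the die reserved for other uses, several hundred 32-bit cores would still fit, which is the sense in which the opportunity is unprecedented.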

Design Tools are Improving
- We can start to think in terms similar to desktop publishing from 20 years ago
- Mass customization will become possible, but:
  - What design macros are needed?
  - How to involve algorithms and applications developers in the design process?
  - How to connect with systems software (OS, runtime, libraries)?

Evolution of Commodity Clusters
[Figure: cluster diagram. Recoverable labels: GPU/Node …; Commodity Network; High-Performance Interconnect; SoCs …; I/O; O(1000) nodes, GP services; O(100K) nodes, semi-custom or reconfigurable.]
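
The node counts in this diagram set the per-node performance a petaflops machine would need. The sketch below works that out under a loudly stated assumption that is not in the slide: the full 10^15 flop/s target is divided evenly across one class of nodes. The function name `per_node_flops` is hypothetical.

```python
PETAFLOPS = 10**15  # target rate in floating-point operations per second


def per_node_flops(target_flops: int, nodes: int) -> float:
    """Rate each node must sustain if the target is split evenly (assumption)."""
    return target_flops / nodes


# O(100K) semi-custom or reconfigurable SoC nodes
soc_node_rate = per_node_flops(PETAFLOPS, 100_000)

# O(1000) general-purpose nodes, for comparison
gp_node_rate = per_node_flops(PETAFLOPS, 1_000)
```

Under that assumption each of 100,000 SoC nodes would need to sustain 10 Gflop/s, while a machine built from only 1,000 nodes would need 1 Tflop/s per node.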

Systems Software for SoCs
- Embedded processor systems software
  - DSP: real-time OS/runtime, ~40K in on-chip FLASH ROM (shadow RAM), off-chip extensions for future
  - NPUs: real-time runtime support, typically < 100K; some general-purpose co-processors (Linux typically used in Juniper)
  - Graphics processors: on-chip runtime support, upgradeable via device drivers

A Few Recommendations
- Comprehensive applications studies
  - To determine the feasibility of acceleration via semi-custom SoC/CLoCs
  - To understand what OS functions are actually required for full HPC applications
- Establish some design challenges
  - Pick several core algorithms (besides lattice gauge) and do some paper designs to validate the possible advantages of SoC-based approaches
- An augmented cluster testbed
  - GP Linux cluster with SoC/CLoC-based compute backends