

Chip Design for the Next Generation Neuromorphic Many-Core Systems
Sebastian Höppner; collaborators: TUD (Technische Universität Dresden), UMAN (The University of Manchester)
HBP Summit 2015, Madrid, 28 September 2015

Outline
Targets and Architecture
Chip Innovations
Roadmap

NM-MC (SpiNNaker) Rationale
Communication- and memory-centric architecture for efficient real-time simulation of spiking neural networks.
NM-MC-1 (SpiNNaker) has a broad user base: ~40 systems in use around the world.
Flexibility: adaptable network, neuron model and plasticity.
Real-time: suits robotics and is faster than HPC.
Capacity of 10^9 neurons and 10^12 synapses.
Energy per synaptic event: 10^-8 J (HPC: 10^-4 J).
SpiNNaker uses old 130 nm CMOS technology, leaving scope for a 10x improvement with modern technology and innovative circuit techniques.
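The energy-per-event figures become an average power once a synaptic event rate is fixed. A minimal sketch, assuming a hypothetical workload that is not taken from the slides: 10^9 neurons with 10^3 synapses each (i.e. the full 10^12-synapse capacity), every neuron firing at 10 Hz.

```python
# Energy per synaptic event, as quoted on the rationale slide.
ENERGY_PER_EVENT_J = {
    "SpiNNaker (NM-MC-1)": 1e-8,
    "HPC": 1e-4,
}


def average_power_w(events_per_second: float, energy_per_event_j: float) -> float:
    """Average power = synaptic event rate x energy per event."""
    return events_per_second * energy_per_event_j


# Hypothetical workload (assumption, not from the slide):
# 10^9 neurons x 10^3 synapses per neuron x 10 Hz firing rate.
NEURONS = 10**9
SYNAPSES_PER_NEURON = 10**3
RATE_HZ = 10
events_per_s = NEURONS * SYNAPSES_PER_NEURON * RATE_HZ  # 10^13 events/s

if __name__ == "__main__":
    for platform, energy in ENERGY_PER_EVENT_J.items():
        print(f"{platform}: {average_power_w(events_per_s, energy):.3g} W")
```

Whatever rate is assumed, the ratio between the two platforms stays at 10^4, since power scales linearly with energy per event.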

NM-MC Chip Scaling Targets

| Feature | SpiNNaker | SpiNNaker2 |
|---|---|---|
| technology | 130 nm | 28 nm |
| cores | 18 | 68 |
| core frequency | 200 MHz | >400 MHz |
| external memory | 128 MByte (1 GByte/s) | 2 GByte (>10 GByte/s) |
| power | 1 W | |
| power management | no | yes |
| floating point support | | |
| exponential function hardware | | |
| vector processing | | |
| true random numbers | | |
| biological real-time operation | | |
| no. of neurons / synapses | 16k / 16M | 128k / 128M |
| energy per synaptic event | 10^-8 J | 10^-9 J (≈10x improvement) |

SpiNNaker2 Chip Architecture
Router sub-system (spike communication): SpiNNaker router; 3 Gbit/s inter-chip links (drawn in groups of three) and a 3 Gbit/s host link.
Processing sub-system (neuromorphic computation): 68 ARM Cortex-M4 processing elements (PEs) at >400 MHz, connected by a Network-on-Chip; periphery (timer, IRQ); true random number generator.
Memory sub-system (synaptic memory): shared memory and an HMC interface with >10 GByte/s.
IP contract with ARM for the Cortex-M4 processor cores signed in 01/2015.

Neuromorphic Computation Scenario
High update rates: peak processing load and spike communication bandwidth; latency requirement of <1 ms for processing and communication.
Low update rates: relaxed processing load and spike communication bandwidth.

Computation: Power Management Techniques
Dynamic voltage and frequency scaling (DVFS): self-DVFS by the ARM cores based on neural network activity, up to 40% energy reduction.
Adaptive supply voltage (AVS): the power rail adapts so that the chip runs at the minimum required voltage, up to 25% energy reduction.
[Figure: supply levels VDD, VDD2, VDD3 over time t, showing "slow" AVS adaptation and "fast" DVFS switching.]
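Activity-driven DVFS can be pictured as a per-timestep decision: estimate the work, then pick the lowest voltage/frequency pair that still meets the 1 ms real-time deadline. A minimal sketch, assuming three hypothetical performance levels and a hypothetical cost per synaptic event (the slide gives no numbers, and this is not the chip's actual controller). It models dynamic energy only (energy per cycle proportional to V^2) and ignores leakage and AVS.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PerfLevel:
    name: str
    vdd: float      # supply voltage in volts (hypothetical value)
    freq_hz: float  # core clock in Hz (hypothetical value)


# Hypothetical performance levels, lowest first; the slide gives no numbers.
LEVELS = [
    PerfLevel("PL1", 0.70, 100e6),
    PerfLevel("PL2", 0.85, 200e6),
    PerfLevel("PL3", 1.00, 400e6),
]

TIMESTEP_S = 1e-3        # 1 ms biological real-time step
CYCLES_PER_EVENT = 100   # hypothetical processing cost of one synaptic event


def select_level(n_events: int) -> PerfLevel:
    """Return the lowest level that finishes this timestep's work before the deadline."""
    cycles = n_events * CYCLES_PER_EVENT
    for level in LEVELS:
        if cycles / level.freq_hz <= TIMESTEP_S:
            return level
    return LEVELS[-1]  # overloaded: run at the highest level anyway


def relative_energy(n_events: int, level: PerfLevel) -> float:
    """Dynamic energy in arbitrary units: cycles x V^2 (capacitance folded in, leakage ignored)."""
    return n_events * CYCLES_PER_EVENT * level.vdd ** 2
```

With these made-up numbers, a quiet timestep of 500 events runs at PL1 and costs about half the dynamic energy of running it at PL3, since (0.70/1.00)^2 = 0.49. The "up to 40%" on the slide is a figure for the real chip, not something this toy model reproduces.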

Computation: Exponential Function Accelerator
Neural models often employ exponential decays. Calculating exponentials is costly on standard processors and typically dominates synapse update time in SpiNNaker.
Exponential function accelerator unit:
Latency: 4 clock cycles, fully pipelined.
Standard s16.15 fixed-point format, full accuracy (1 LSB).
Integrated with the ARM processor via AHB.
Up to 15x speed-up.
[Figure: ARM core connected via AHB to the Exp unit with a FIFO.]
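To make the s16.15 format concrete (1 sign bit, 16 integer bits, 15 fractional bits in a 32-bit word, so one LSB is 2^-15), here is a minimal software reference model of exp on s16.15 values. It is a sketch, not the accelerator's algorithm, which the slide does not describe: it computes in floating point, rounds to the nearest s16.15 value (within half an LSB of exp of the quantized input) and saturates on overflow.

```python
import math

FRAC_BITS = 15
SCALE = 1 << FRAC_BITS      # 2^15: one LSB = 2^-15
RAW_MAX = (1 << 31) - 1     # s16.15 occupies a signed 32-bit word
RAW_MIN = -(1 << 31)
LN_MAX = math.log(RAW_MAX / SCALE)  # largest input whose exp is representable


def to_s16_15(x: float) -> int:
    """Quantize a real number to the nearest s16.15 raw value, saturating."""
    raw = round(x * SCALE)
    return max(RAW_MIN, min(RAW_MAX, raw))


def from_s16_15(raw: int) -> float:
    return raw / SCALE


def exp_s16_15(raw: int) -> int:
    """Reference exp for an s16.15 input: round-to-nearest result, saturating on overflow."""
    x = from_s16_15(raw)
    if x > LN_MAX:          # also avoids OverflowError from math.exp for large inputs
        return RAW_MAX
    return to_s16_15(math.exp(x))
```

A typical use is a membrane decay factor exp(-dt/tau); with a hypothetical dt = 1 ms and tau = 20 ms this is `exp_s16_15(to_s16_15(-1.0 / 20.0))`, which a model could compute once per timestep instead of calling a software exp per synapse.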

Computation: Random Number Generation
Random numbers are required for neural modeling and for stochastic computing.
Hardware acceleration for pseudo-random number generation (PRNG) per ARM core.
True random numbers from silicon noise: PLL jitter is re-used as a noise source, so there is no power overhead; dedicated true random oscillators.
Local generation and global distribution of entropy.
Santos contains the equivalent of 125k true random and 8M pseudo-random 8-bit number sources at 1 kHz.
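The source counts on this slide translate into raw bit rates, and a per-core PRNG needs only a few shifts and XORs per step. A minimal sketch: the generator below is Marsaglia's xorshift32, used purely as a stand-in because the slide does not name the PRNG the hardware implements; `random_bytes` and `bit_rate` are hypothetical helpers for illustration.

```python
MASK32 = 0xFFFFFFFF


def xorshift32(state: int) -> int:
    """One step of Marsaglia's xorshift32 (shifts 13, 17, 5); state must be non-zero."""
    state ^= (state << 13) & MASK32
    state ^= state >> 17
    state ^= (state << 5) & MASK32
    return state


def random_bytes(seed: int, n: int) -> list[int]:
    """Draw n 8-bit values by taking the top byte of successive xorshift32 states."""
    state, out = seed, []
    for _ in range(n):
        state = xorshift32(state)
        out.append(state >> 24)
    return out


def bit_rate(sources: int, bits_per_number: int = 8, rate_hz: int = 1000) -> int:
    """Aggregate random bits per second from independent sources."""
    return sources * bits_per_number * rate_hz


if __name__ == "__main__":
    print("true random:  ", bit_rate(125_000), "bit/s")
    print("pseudo random:", bit_rate(8_000_000), "bit/s")
```

With the slide's numbers, 125k true random 8-bit sources at 1 kHz correspond to 1 Gbit/s of entropy, and 8M pseudo-random sources to 64 Gbit/s.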

Communication: Energy-Efficient Inter-Chip Links
Lightweight packet-switched spike communication within the SpiNNaker network.
5 Gbit/s multi-standard SerDes in 28 nm CMOS with TX and RX equalization and clock-data recovery.
Towards energy-proportional communication: a low-power IDLE mode with fast wake-up for biological real-time operation; lock time reduced by 1000x (1.5 ms → <1.5 µs).
[Figure: link power versus data rate, showing the IDLE mode and fast wake-up.]
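The wake-up time matters because of the 1 ms real-time step: a link that needs 1.5 ms to lock cannot sleep between spike bursts at all, while one that locks in 1.5 µs can idle for almost the whole step. A minimal sketch of one link's energy per timestep, assuming hypothetical active and idle power figures and burst duration (none are given on the slide), and assuming the link burns full active power while waking up.

```python
TIMESTEP_S = 1e-3  # 1 ms biological real-time step


def energy_per_timestep(t_data_s: float, wake_s: float,
                        p_active_w: float, p_idle_w: float) -> float:
    """Energy of one link over one timestep, idling between bursts only if wake-up fits."""
    if t_data_s + wake_s > TIMESTEP_S:
        # Wake-up too slow to fit in the timestep: the link must stay active throughout.
        return p_active_w * TIMESTEP_S
    busy_s = t_data_s + wake_s  # assumption: wake-up costs full active power
    return p_active_w * busy_s + p_idle_w * (TIMESTEP_S - busy_s)


# Hypothetical figures: 10 mW active, 1 mW idle, 10 us of traffic per timestep.
P_ACTIVE_W, P_IDLE_W, T_DATA_S = 10e-3, 1e-3, 10e-6

if __name__ == "__main__":
    slow = energy_per_timestep(T_DATA_S, 1.5e-3, P_ACTIVE_W, P_IDLE_W)
    fast = energy_per_timestep(T_DATA_S, 1.5e-6, P_ACTIVE_W, P_IDLE_W)
    print(f"1.5 ms lock: {slow:.3g} J, 1.5 us lock: {fast:.3g} J per timestep")
```

Under these assumptions the fast-lock link spends most of each timestep in IDLE, so its energy tracks the actual traffic, which is what "energy-proportional communication" aims for.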

Santos Chip
[Die photo: labeled blocks SerDes, processing elements, router, MCU, shared memory and memory interface.]
GLOBALFOUNDRIES 28 nm SLP CMOS, 18 mm². Tape-out 07/2015.

Santos Evaluation Platform
Demonstrator and evaluation platform for NM-MC2 components.
Supports NM-MC2 software development and platform integration.
Up to 4 boards with 4 Santos modules each → 64 ARM cores.
Available in Q1 2016.

NM-MC2 Chip Roadmap (timeline 2013 to 2023, earliest first; "?" marks tentative dates)
NanoLink28: SerDes transceiver; chip size 4 mm².
Santos28: 4 ARM M4 cores; AVFS and DVFS power management; SpiNNaker router with SerDes; LPDDR2 memory interface; chip size 18 mm².
Nanolink28_gen2 (?): 2nd iteration SerDes; HMC interface.
Nanolink28_gen3 (?): final SerDes; final HMC interface.
SpiNNaker2 (?): 68 ARM M4 cores; power management; SpiNNaker router with SerDes; HMC interface; chip size 70 mm², >1 billion transistors.

Thank you for your attention
The NM-MC2 chip design team:
The University of Manchester: Luis Plana, Jim Garside, Steve Temple, David Lester, Steve Furber
Technische Universität Dresden: Sebastian Höppner, Stefan Scholze, Andreas Dixius, Johannes Partzsch, Georg Ellguth, Stephan Hartmann, Thomas Hocker, Stephan Henker, Jörg Schreiter, Stefan Hänzsche, Stefan Schiefer, Love Cederstroem, René Schüffny, Christian Mayr

HBP Period 1 Review (Oct 2013 – Sep 2014), 26-28 Jan 2015
Memory Integration
DRAM for synaptic memory: Hybrid Memory Cube (HMC), a 3D stacked-die DRAM module.
Serial high-speed interface at 15 Gbit/s to the processor chip for high bandwidth and low power consumption.
Status: conceptual planning.
Research challenge: 15G SerDes PHY in 28 nm SLP technology.
Develop innovative and cost-efficient system-in-package integration concepts (by Micron).
Target footprint ≈2.5 cm x 2.5 cm.
The Hybrid Memory Cube technology offers low-footprint DRAM integration based on a 3D chip stack. The HMC memory interface uses low-voltage-swing serial links at data rates up to 15 Gbit/s, which reduces the energy per bit for memory access and allows a compact physical realization with fewer physical lines. Implementing these high-speed interfaces in a super-low-power technology (e.g. 28 nm SLP), whose devices perform worse than those of high-performance counterparts (e.g. 28 nm HPP), is a main research challenge.
Contributions by: