
1
From tens to millions of neurons: how computer architecture can help
Paul Fox, Computer Architecture Group

2
What hinders the scaling of neural computation? Neural computation = communication + data structures + algorithms. But almost everybody ignores the first two!

3
What is computer architecture? Designing computer systems that are appropriate for their intended use. The relevant design points for neural computation are: memory hierarchy; type and number of processors; communication infrastructure. Just the things that existing approaches don't consider!

4
Our approach: the Bluehive system. Vast communication and memory resources; reprogrammable hardware using FPGAs. We can explore different system designs and see what is most appropriate for neural computation.

5
Organisation of data for spiking neural networks

6
First approach – Custom FPGA pipeline

7
Running 256k Neurons

8
First approach – Custom FPGA pipeline. Real-time performance for at least 256k neurons over 4 boards. Saturates memory bandwidth. Plenty of FPGA area is left, so a more complex neuron model could be used, but only if it doesn't need more data. However, building the pipeline was time-consuming, and it is not really usable by non-computer-scientists. Can we use more area to make something that is easier to program but still attains performance approaching the custom pipeline?

9
Single scalar processor. [Diagram: DDR2 RAM (driven from a 200 MHz FPGA) on a 256-bit data bus, Block RAM on a data bus of any width, and a single scalar processor performing one 32-bit transfer at a time.]

10
Multicore scalar processor. [Diagram: the same memory hierarchy (DDR2 RAM on a 256-bit data bus, Block RAM on a data bus of any width), now shared by several scalar processors.] This ruins spatial locality, and inter-processor communication is needed.

11
Vector processor – many words at a time. [Diagram: DDR2 RAM (driven from a 200 MHz FPGA) on a 256-bit data bus, Block RAM on a data bus of any width, feeding a single vector processor.]

12
Productivity vs. Performance
[Chart: run time (s) against lines of code. Izhikevich.c on NIOS II: ~200 lines, 125 s. IzhikevichVec.c on dual-core NIOS II + BlueVec: ~500 lines, 2 s. NeuronSimulator/*.bsv in Bluespec SystemVerilog: 5k-10k lines, 1 s.]
The vector version doesn't have much more code than the original, for a massive performance improvement.

13
Simple example: compute the histogram of array[N]:

for (i = 0; i < N; i++) histo[array[i]] += 1;

This is akin to I-value accumulation: histo[i] is the number of spikes reaching neuron i, and array is the array of target neurons from some neuron.

14
Using BlueVec (16 updates at a time; vector register R is denoted R):

for (i = 0; i < N; i += 16) {
  Load(0, array+i);    // load 16 array elements (a vector of 16 addresses) from DDR2
  Commit;
  LoadLocalH(1, 0);    // load a vector of 16 histo frequencies from BRAM
  IncrementH(1);       // increment 16 frequencies
  StoreLocalH(1, 0);   // store them back
}

15
Optimisation 1: Software pipelining

Load(0, array);          // load first 16 values
for (i = 0; i < N; i += 16) {
  Commit;
  Load(0, array+i+16);   // prefetch the 16 values needed for the next iteration
  LoadLocalH(1, 0);
  IncrementH(1);
  StoreLocalH(1, 0);
}

This exposes parallelism (an alternative to DMA into a scratchpad).
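In plain C the same software-pipelining pattern looks like the sketch below. This is an illustrative emulation, not BlueVec code: the "vector register" is a local buffer, and on real hardware the prefetch would overlap with the compute instead of being a plain copy:

```c
#include <string.h>

#define VLEN 16  /* vector width, as on the slide */

/* Software-pipelined histogram: the load for iteration i+VLEN is issued
   before the data for iteration i is consumed, so the memory latency can
   hide behind the increments.  Assumes n is a non-zero multiple of VLEN. */
static void histogram_pipelined(const int *array, int n, int *histo)
{
    int vreg[VLEN];
    memcpy(vreg, array, sizeof vreg);                    /* load first 16 values */
    for (int i = 0; i < n; i += VLEN) {
        int cur[VLEN];
        memcpy(cur, vreg, sizeof cur);                   /* commit current vector */
        if (i + VLEN < n)
            memcpy(vreg, array + i + VLEN, sizeof vreg); /* prefetch the next 16 */
        for (int j = 0; j < VLEN; j++)
            histo[cur[j]] += 1;                          /* 16 updates per pass */
    }
}
```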

16
Optimisation 2: Burst loads

Load(0, array, 8);           // burst load of 8 16-element vectors into registers 0..7
for (i = 0; i < N; i += 128) {
  Commit;
  Load(0, array+i+128, 8);
  LoadLocalH(8, 0); IncrementH(8); StoreLocalH(8, 0);
  …
  LoadLocalH(8, 7); IncrementH(8); StoreLocalH(8, 7);
}

Now 54x faster than the pure C version.

17
Application to Izhikevich simulator

Izhikevich.c (193 lines of code):
  I-values: 87.8 s (74%)
  Neuron updates: 28.1 s (24%)
  Spike delay buffer: 1.5 s (2%)
  Total: 117.4 s

IzhikevichVec.c (417 lines of code):
  I-values: 1.4 s (35%)
  Neuron updates: 1.8 s (45%)
  Spike delay buffer: 0.8 s (20%)
  Total: 4.0 s

18
BlueVec stats: 3-stage pipeline; single-cycle instructions (mostly); clocks at 210 MHz, i.e. does not inhibit the NIOS II frequency; simple design (1210 lines of code). DE4 logic utilisation: NIOS II 5%, BlueVec 8%.

19
Multicore processing

NIOS II (265 lines of code):
  1 core: 122 s
  2 cores: 69 s
  4 cores: 86 s

NIOS II + BlueVec (499 lines of code):
  1 core: 18% logic utilisation, 4.0 s
  2 cores: 27% logic utilisation, 2.1 s
  4 cores: 49% logic utilisation, 1.9 s

Attention must be given to memory access patterns.

20
Future work: support distributed simulation across multiple FPGAs. High-level language support for vector processing?

C (vector extensions):
typedef __attribute__((vector_size(32))) int v8i;
v8i vadd(v8i a, v8i b) { return a + b; }

NESL:
function dotprod(a,b) = sum({x * y: x in a; y in b});

21
Towards better benchmarks. Nengo is a compiler from algorithms to LIF neurons (University of Waterloo). The Nengo archive currently contains 22 models developed by neuroscientists; the biggest has 3 million neurons. We have developed an LIF simulator in C and applied it to the Nengo model for digit recognition (6k neurons, 1m connections). On a scalar NIOS II it runs 25 times slower than real time. Soon to be vectorised.
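For reference, a leaky integrate-and-fire update step can be sketched in C as below. This is a generic textbook Euler step with made-up parameter values, not the simulator described above:

```c
/* Minimal LIF neuron: membrane potential v decays towards V_REST and is
   driven by the input current; a spike fires when v crosses V_THRESH.
   All parameter names and values are illustrative assumptions. */
#define V_REST   0.0f
#define V_THRESH 1.0f
#define TAU      0.02f   /* membrane time constant (s) */

typedef struct { float v; } lif_neuron;

/* One Euler step of dt seconds; returns 1 if the neuron spiked. */
static int lif_step(lif_neuron *n, float input, float dt)
{
    n->v += dt * (-(n->v - V_REST) + input) / TAU;
    if (n->v >= V_THRESH) {
        n->v = V_REST;   /* reset after the spike */
        return 1;
    }
    return 0;
}
```

Driving a neuron with a constant supra-threshold input (for example input = 2.0 with dt = 1 ms) makes it spike periodically; the per-neuron work is tiny, which is why I-value accumulation, not the update itself, dominates the run time in the tables above.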

22
Example for LIF character recognition

LIF.c (324 lines of code):
  I-values: 331.7 ms (83%)
  Gain/Bias: 39.2 ms (9%)
  Neuron updates: 26.8 ms (6%)
  Total: 397.7 ms

LIFVec.c (496 lines of code):
  I-values: 7.9 ms (42%)
  Gain/Bias: 3.6 ms (18%)
  Neuron updates: 5.8 ms (30%)
  Total: 18.9 ms

23
LIF simulator on FPGA running a Nengo model

24
Conclusion: when designing a neural computation system you need to think about every part of the computation, not just the algorithm. Some form of vector processor is likely to be most appropriate. Or write your model in NeuroML and let us do the hard work!

25
Questions?
