MSECE Thesis Presentation Paul D. Reynolds

1 MSECE Thesis Presentation
Algorithm Implementation in FPGAs Demonstrated Through Neural Network Inversion on the SRC-6e. MSECE Thesis Presentation, Paul D. Reynolds.

2 Algorithm Hardware Implementation
Algorithms: one output for each input; purely combinational. Problems: too large to be directly implemented; timing issues. Solution: clocked design; repeated use of hardware.

3 Implementation Hardware
SRC-6e Reconfigurable Computer. 2 Pentium III processors, 1 GHz. 2 Xilinx XC2V6000 FPGAs, 100 MHz: 144 multipliers, 144 block RAMs. 6 memory blocks, 4 MB each.

4 Hardware Architecture

5 SRC-6e Development Environment
main.c: written in C; executes on the Pentium processors; command-line interface; hardware accessed as a function.

6 SRC-6e Development Environment
hardware.mc: written in modified C or FORTRAN; executes in hardware; controls memory transfer; one for each FPGA used; can hold the entire code or call hardware description functions.

7 Hardware Description: VHDL and Verilog
Reasons for use. To avoid C-compiler idiosyncrasies: latency added to certain loops; 16-bit multiplies converted to 32-bit multiplies. More control: fixed-point multiplication with truncation; pipelines and parallel execution simpler. IP cores usable. More efficient implementation.

8 Neural Network and Inversion Example

9 Problem Background
To determine the optimal sonar setup to maximize the ensonification of a grid of water. Influences on ensonification: environmental conditions (temperature, wind speed); bathymetry (bottom type, shape of bottom); sonar system. A total of 27 different factors are accounted for.

10 Ensonification Example
15 by 80 pixel grid. Red: high signal to interference ratio. Blue: low signal to interference ratio. Bottom: no signal.

11 Original Solution
Take current conditions. Match to previous optimum sonar setups with similar conditions. Run the acoustic model using current conditions and previous optimum setups. Use the sonar setup with the highest signal to interference ratio.

12 New Problem
Problem: one acoustic model run took tens of seconds. Solution: train a neural network on the acoustic model (APL & University of Washington).

13 Neural Network Overview
Inspired by the human ability to recognize patterns. A mathematical structure able to mimic a pattern. Trained using known data: show the network several examples and identify each example; the network learns the pattern. Then show the network a new case and let the network identify it.

14 Neural Network Structure
Each neuron is the squashed sum of the inputs to that neuron. A squash is a non-linear function that restricts outputs to between 0 and 1. Each arrow is a weight times a neuron output. [Diagram labels: INPUTS, NEURON, LAYER, WEIGHT, OUTPUTS]
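The structure described on this slide can be sketched in a few lines of Python. This is only an illustration: it assumes the squash is the logistic sigmoid (the deck mentions a sigmoid later, but the exact function is not preserved here), and the names `squash`, `layer_forward`, and `network_forward` are hypothetical.

```python
import math

def squash(x: float) -> float:
    """Logistic sigmoid (an assumption): maps any input into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def layer_forward(inputs, weights):
    """One layer: each neuron is the squashed sum of weight * input.

    weights[j][i] is the weight on the arrow from input i to neuron j.
    """
    return [squash(sum(w * x for w, x in zip(row, inputs))) for row in weights]

def network_forward(inputs, layers):
    """Layers are evaluated one after another; each feeds the next."""
    activations = list(inputs)
    for weights in layers:
        activations = layer_forward(activations, weights)
    return activations
```

Calling `network_forward(x, [w_hidden, w_out])` with two weight matrices evaluates a two-layer network on the input list `x`.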

15 Ensonification Neural Network
Taught using examples from the acoustical model. Recognizes a pattern between the 27 given inputs and the 15 by 80 grid output. [Architecture diagram and squash equation not preserved in the transcript]

16 Did the neural network solve the problem?
Yes: the neural network approximation of the acoustic model runs in 1 ms. However, the method of locating the best setup is the same: run many possible setups through the neural network and choose the best. Problem: better, but still not real time.

17 How to find a good setup solution: Particle Swarm Optimization
Idea: several particles wandering over a fitness surface. Math: $x_{k+1} = x_k + v_k$; $v_{k+1} = v_k + \mathrm{rand} \cdot w_1 (G_b - x_k) + \mathrm{rand} \cdot w_2 (P_b - x_k)$. Theory: momentum pushes particles around the surface; each particle is pulled towards its personal best and towards the global best; eventually the particles oscillate around the global best.

18 Particle Swarm - Math
$x_{k+1} = x_k + v_k$: next position = current position + current velocity.
$v_{k+1} = v_k + \mathrm{rand} \cdot w_1 (G_b - x_k) + \mathrm{rand} \cdot w_2 (P_b - x_k)$: next velocity = current velocity + global pull + personal pull.
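A minimal Python sketch of the two update equations, applied to every dimension of one particle. The function name `pso_step` and the `rng` parameter are hypothetical; drawing a fresh random number per dimension and per pull is an assumption, since the slide only writes `rand`.

```python
import random

def pso_step(x, v, personal_best, global_best, w1, w2, rng=random):
    """One update for one particle.

    Both updates read the current x and v, as in the slide's equations:
    the position moves by the current velocity, and the velocity gains a
    global pull (weight w1) and a personal pull (weight w2).
    """
    new_x = [xi + vi for xi, vi in zip(x, v)]
    new_v = [
        vi + rng.random() * w1 * (g - xi) + rng.random() * w2 * (p - xi)
        for xi, vi, p, g in zip(x, v, personal_best, global_best)
    ]
    return new_x, new_v
```

Passing `rng=random.Random(0)` gives a repeatable run, which is handy when comparing against a deterministic variant.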

19 Particle Swarm in Operation
Link to Particle Swarm file – in Quicktime

20 Particle Swarm Optimization
27 inputs to the neural network; sonar system setup. Fitness surface: calculated from the neural network output. Two options. Match a desired output: sum of the difference from the desired output; minimize the difference. Maximize signal to interference ratio in an area: ignore output in undesired locations.
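The two fitness options might look like the following sketch. Using the absolute value of each difference and a 0/1 mask for the area of interest are assumptions, and the names `match_fitness` and `area_fitness` are hypothetical.

```python
def match_fitness(output, desired):
    """Option 1: sum of absolute differences from a desired output.

    Lower is better, so the swarm minimizes this value.
    """
    return sum(abs(o - d) for o, d in zip(output, desired))

def area_fitness(output, mask):
    """Option 2: total signal to interference ratio inside an area.

    Pixels whose mask entry is 0 are ignored; higher is better.
    """
    return sum(o for o, m in zip(output, mask) if m)
```

Here `output` would be the flattened 15 by 80 grid produced by the network, with `desired` or `mask` having the same length.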

21 Particle Swarm in Operation
Link to Particle Swarm file – in Quicktime

22 New Problem Enters
Time for a 100k-step particle swarm using a 2.2 GHz Pentium: nearly 2 minutes. Desire a real-time version. Solution: implement the neural network and particle swarm optimization in parallel on reconfigurable hardware.

23 Three Design Stages
Activation function design: sigmoid not efficient to calculate. Neural network design: parallel design. Particle swarm optimization: hardware implementation.

24 Activation Function Design
Fixed point design. Sigmoid accuracy level. Weight accuracy level.

25 Fixed Point Design
Versus floating point: easier, less area, faster. Data range of -50 to 85: 2's complement, 7 integer bits, 1 sign bit. Fractional portion: sigmoid outputs less than 1; some number of fractional bits.

26 Sigmoid Accuracy Level

27 Weight Accuracy Level

28 Total Accuracy

29 Fixed Point Results
16-bit number: 1 sign bit, 7 integer bits, 8 fractional bits. Advantages: 18 x 18 multipliers; 64-bit input banks.
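The 16-bit format (1 sign bit, 7 integer bits, 8 fractional bits) can be modelled with plain integers, as in this sketch. Saturating on overflow is an assumption, and the helper names are hypothetical. The multiply drops the low 8 bits of the product, which mirrors the truncating multiplication mentioned earlier in the deck.

```python
FRAC_BITS = 8
SCALE = 1 << FRAC_BITS                         # 256 steps per unit
MIN_RAW, MAX_RAW = -(1 << 15), (1 << 15) - 1   # 16-bit two's complement

def saturate(raw: int) -> int:
    """Clamp to the 16-bit range (assumed overflow behaviour)."""
    return max(MIN_RAW, min(MAX_RAW, raw))

def to_fixed(value: float) -> int:
    """Convert a real value to the nearest representable raw integer."""
    return saturate(int(round(value * SCALE)))

def to_float(raw: int) -> float:
    return raw / SCALE

def fixed_mul(a: int, b: int) -> int:
    """The product has 16 fractional bits; shift right by 8 to truncate."""
    return saturate((a * b) >> FRAC_BITS)
```

The representable range is -128 to just under 128, which covers the stated data range of -50 to 85; for example `to_fixed(85.0)` is 21760.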

30 Activation Function Approximation
Compared 4 designs: look-up table, shift and add, CORDIC, Taylor series.

31 Look-up Table
Advantages: unlimited accuracy; short latency of 3. Disadvantages: desire an entirely in-chip design; might use memory needed for weights.
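A look-up table version of the sigmoid might be sketched as below. The table covers inputs from -8 to 8 in steps of 1/256 and saturates outside that range; both choices are assumptions for illustration, not the thesis design. Inputs and outputs are raw integers with 8 fractional bits.

```python
import math

STEP = 256                            # raw units per 1.0 (8 fractional bits)
X_MIN, X_MAX = -8 * STEP, 8 * STEP    # assumed table range: [-8, 8)

# One entry per representable input in the range: 4096 entries.
TABLE = [round(STEP / (1.0 + math.exp(-raw / STEP))) for raw in range(X_MIN, X_MAX)]

def sigmoid_lut(raw_x: int) -> int:
    """Return the sigmoid of a raw fixed-point input as a raw fixed-point value."""
    if raw_x < X_MIN:
        return 0                      # saturate low
    if raw_x >= X_MAX:
        return STEP                   # saturate high (1.0)
    return TABLE[raw_x - X_MIN]
```

The table size shows the trade-off the slide points out: accuracy is limited only by the number of entries, but every entry occupies memory that could otherwise hold weights.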

32 Look-up Table

33 Shift and Add
$Y(x) = 2^{-n} x + b$. Advantages: small design; short latency of 5. Disadvantages: piecewise outputs; limited accuracy.
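To illustrate the form $Y(x) = 2^{-n} x + b$, here is a sketch using a commonly cited piecewise-linear sigmoid approximation whose slopes are all powers of two. The breakpoints and offsets are an assumption and are not taken from the thesis. Values are raw integers with 8 fractional bits, so each segment needs only a shift and an add.

```python
def sigmoid_shift_add(raw_x: int) -> int:
    """Piecewise-linear sigmoid on raw fixed-point values (256 = 1.0)."""
    ax = -raw_x if raw_x < 0 else raw_x   # work on |x|, mirror at the end
    if ax >= 1280:                        # |x| >= 5
        y = 256                           # 1.0
    elif ax >= 608:                       # 2.375 <= |x| < 5
        y = (ax >> 5) + 216               # x/32 + 0.84375
    elif ax >= 256:                       # 1 <= |x| < 2.375
        y = (ax >> 3) + 160               # x/8 + 0.625
    else:                                 # |x| < 1
        y = (ax >> 2) + 128               # x/4 + 0.5
    return 256 - y if raw_x < 0 else y    # sigmoid(-x) = 1 - sigmoid(x)
```

For an input of 1.0 (raw 256) this returns 192, i.e. 0.75, against a true value of about 0.731, which shows the limited accuracy listed as a disadvantage.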

34 Shift and Add

35 CORDIC
Computation: divide argument by 2; series of rotations for sinh(x) and cosh(x); division for tanh(x); shift and add for result.
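The computation steps can be followed in this floating-point sketch of hyperbolic CORDIC. It relies on the identity sigmoid(x) = (1 + tanh(x/2)) / 2, which would explain the "divide argument by 2" step. The iteration count, the function names, and the restriction to the basic convergence range (|x/2| up to about 1.1) are assumptions; a hardware version would use shifts instead of the multiplications by powers of two, and would need range extension for larger inputs.

```python
import math

def cordic_tanh(theta: float, iterations: int = 16) -> float:
    """tanh via hyperbolic CORDIC rotations, valid for |theta| <= 1.1."""
    if abs(theta) > 1.1:
        raise ValueError("outside the basic hyperbolic CORDIC range")
    x, y, z = 1.0, 0.0, theta     # x tracks cosh, y tracks sinh (both scaled)
    i, repeat_at, done = 1, 4, 0
    while done < iterations:
        t = 2.0 ** -i             # a right shift by i in hardware
        d = 1.0 if z >= 0 else -1.0
        x, y, z = x + d * y * t, y + d * x * t, z - d * math.atanh(t)
        done += 1
        if i == repeat_at:
            repeat_at = 3 * repeat_at + 1   # indices 4, 13, 40, ... run twice
        else:
            i += 1
    return y / x                  # the CORDIC gain cancels in the division

def sigmoid_cordic(x: float) -> float:
    """Halve the argument, take tanh, then shift and add for the result."""
    return 0.5 + 0.5 * cordic_tanh(x / 2.0)
```

Because tanh is the ratio sinh/cosh, the usual CORDIC scale factor never has to be applied, at the cost of the division listed in the slide.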

36 CORDIC
Advantages: unlimited accuracy; real calculation. Disadvantages: long latency of 50; large design.

37 CORDIC

38 Taylor Series
$Y(x) = a + b(x - x_0) + c(x - x_0)^2$. Advantages: unlimited accuracy. Average: latency of 10; medium-size design. Disadvantages: 3 multipliers.
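A sketch of the second-order expansion for the logistic sigmoid. The coefficients follow from its derivatives, $s' = s(1-s)$ and $s'' = s(1-s)(1-2s)$; how the thesis chose the expansion points $x_0$ and stored the coefficients is not stated, so `taylor_coeffs` and `sigmoid_taylor` are hypothetical helpers.

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def taylor_coeffs(x0: float):
    """Coefficients a, b, c of the expansion around x0."""
    s = sigmoid(x0)
    a = s
    b = s * (1.0 - s)                          # first derivative
    c = 0.5 * s * (1.0 - s) * (1.0 - 2.0 * s)  # second derivative / 2
    return a, b, c

def sigmoid_taylor(x: float, x0: float) -> float:
    """Evaluate a + b*dx + c*dx*dx with three multiplies."""
    a, b, c = taylor_coeffs(x0)
    dx = x - x0
    return a + b * dx + c * (dx * dx)
```

Written this way the evaluation uses three multiplications (b*dx, dx*dx, and c times the square), which would be consistent with the three multipliers listed as the disadvantage.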

39 Taylor Series

40 Neural Network Design
Architecture. Desired: maximum parallel design; entirely on-chip design. Limitations: 92, bit weights in 144 RAMB16s; layers are serial; 144 18x18 multipliers.

41 Neural Network Design
Initial test design: serial pipeline; one multiply per clock; 92,000 clocks; 1 ms = PC equivalent.

42 Test Output FPGA output Real output

43 Test Output FPGA output Real output

44 Test Output FPGA output Real output

45 Neural Network Design
Maximum parallel version: 71 multiplies in parallel. Zero weight padding: treat all layers as the same length, 71. 25 clock wait for pipeline. Total: 1475 clocks per network evaluation; 15 microseconds; 60,000 network evaluations per second.
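The zero weight padding idea can be checked with a small sketch: every layer is padded to the same width so that one fixed-width datapath can process all of them, and the padding does not change the real outputs. The sigmoid squash, the helper names, and the tiny layer sizes in the usage note are assumptions for illustration; the slide's width is 71.

```python
import math

def squash(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def forward(inputs, layers):
    """Plain evaluation: weights[j][i] connects input i to neuron j."""
    act = list(inputs)
    for weights in layers:
        act = [squash(sum(w * a for w, a in zip(row, act))) for row in weights]
    return act

def pad_vector(values, width):
    return list(values) + [0.0] * (width - len(values))

def pad_layer(weights, width):
    """Pad a weight matrix with zeros to width x width."""
    padded = [pad_vector(row, width) for row in weights]
    padded += [[0.0] * width for _ in range(width - len(padded))]
    return padded

def padded_forward(inputs, layers, width, n_outputs):
    """Evaluate with every layer padded to the same width.

    Padded neurons output squash(0) = 0.5, but the next layer's weights
    on them are zero, so they never affect a real neuron.
    """
    padded_layers = [pad_layer(w, width) for w in layers]
    return forward(pad_vector(inputs, width), padded_layers)[:n_outputs]
```

For a 2-3-1 network padded to width 4, `padded_forward(x, layers, 4, 1)` returns the same value as `forward(x, layers)`.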

46 Neural Network Design

47 Particle Swarm Optimization
2 chips in the SRC. Particle swarm: controls inputs; sends to fitness chip; receives a fitness back. Fitness function: calculates network; compares to desired output.

48 Particle Swarm Implementation
Problem: randomness. $v_{k+1} = v_k + \mathrm{rand} \cdot w_1 (G_b - x_k) + \mathrm{rand} \cdot w_2 (P_b - x_k)$. Solutions. Remove randomness: $v_{k+1} = v_k + w_1 (G_b - x_k) + w_2 (P_b - x_k)$. Linear feedback shift register. Squared decimal implementation.
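As an illustration of the linear feedback shift register option, here is a sketch of a 16-bit Fibonacci LFSR with taps at bits 16, 14, 13, and 11, a widely used maximal-length configuration. The width, taps, and seed are assumptions, since the transcript does not preserve the thesis's choice, and the function names are hypothetical.

```python
def lfsr16_step(state: int) -> int:
    """Advance the register by one bit (taps 16, 14, 13, 11)."""
    bit = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
    return (state >> 1) | (bit << 15)

def lfsr16_period(seed: int) -> int:
    """Count steps until the register returns to the seed (seed must be non-zero)."""
    state, steps = lfsr16_step(seed), 1
    while state != seed:
        state, steps = lfsr16_step(state), steps + 1
    return steps

def lfsr16_stream(seed: int = 0xACE1):
    """Yield pseudo-random fractions in [0, 1) to stand in for rand."""
    state = seed
    while True:
        state = lfsr16_step(state)
        yield state / 65536.0
```

Successive values from `lfsr16_stream()` could replace `rand` in the velocity update; the register needs only shifts and XORs, which makes it cheap in hardware.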

49 Random vs. Deterministic
Deterministic – Blue Random – Green/Red

50 Linear Feedback Shift Register

51 Squared Decimal

52 Randomness Results
Standard conventional swarm error, deterministic swarm error, LFSR swarm error, and squared decimal error, each in units per pixel [values missing from the transcript].

53 Randomness Results
The gain from randomness is not significant. Deterministic method used. All much higher than conventional swarm. Approximated network: approximation error between networks, 1.423 units per pixel; deterministic error on approximated network, [value missing from the transcript] units per pixel.

54 Particle Swarm Chip
10 agents with preset starting points and velocities: 8 from previous data, with random velocities; 1 at maximum range, aimed down; 1 at minimum range, aimed up. Restrictions: maximum velocity; range.

55 Update Equation Implementation
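[Datapath diagram] Inputs: XnDimk, VnDimk, PnDimk, Gk, Vmaxk, Xmaxk, Xmink. Position path: X+V, compare, new XnDimk. Velocity path: P-X, G-X, V+1/8(P-X)+1/16(G-X), new VnDimk. Equations: $x_{k+1} = x_k + v_k$; $v_{k+1} = v_k + w_1 (G_b - x_k) + w_2 (P_b - x_k)$.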
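The datapath can be approximated in integer arithmetic as below. The weights 1/8 and 1/16 become right shifts by 3 and 4, as in the expression V+1/8(P-X)+1/16(G-X). Exactly how the compare stage applies the Vmax, Xmax, and Xmin limits is an assumption here (simple clamping), and the function names are hypothetical.

```python
def clamp(value: int, low: int, high: int) -> int:
    return max(low, min(high, value))

def update_dimension(x, v, p, g, v_max, x_min, x_max):
    """Update one dimension of one agent on integer (fixed-point) values.

    x, v: current position and velocity; p, g: personal and global best.
    """
    new_x = clamp(x + v, x_min, x_max)            # X+V, then range compare
    new_v = v + ((p - x) >> 3) + ((g - x) >> 4)   # V + 1/8(P-X) + 1/16(G-X)
    new_v = clamp(new_v, -v_max, v_max)           # maximum velocity limit
    return new_x, new_v
```

Python's `>>` on negative integers rounds toward negative infinity, the same result as an arithmetic right shift on two's complement values.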

56 Results – Output Matching 100k iteration PSO ->1.76 s
SWARM REAL

57 Results – Output Matching 100k iteration PSO ->1.76 s
SWARM REAL

58 Results – Output Matching 100k iteration PSO ->1.76 s
SWARM REAL

59 Particle Swarm-Area Specific 100k iteration PSO ->1.76 s

60 Particle Swarm-Area Specific 100k iteration PSO ->1.76 s

61 Particle Swarm-Area Specific 100k iteration PSO ->1.76 s

62 ANY QUESTIONS?

