
1 PARALLELIZATION OF ARTIFICIAL NEURAL NETWORKS Joe Bradish CS5802 Fall 2015

2 BASICS OF ARTIFICIAL NEURAL NETWORKS What is an Artificial Neural Network (ANN)? What makes up a neuron? How is “learning” modelled in ANNs?

3 STRUCTURE OF A NEURAL NETWORK
- A neural network is a collection of interconnected neurons that compute and generate impulses
- Its specific parts include neurons, synapses, and activation functions
- An artificial neural network is a mathematical model based on the biological neural networks found in animal brains

4 BASIC STRUCTURE OF A NEURON A neuron takes an input vector {x1, x2, …, xn} and an associated vector of weights {w1, w2, …, wn}. The weighted sum of the inputs is computed and fed into an activation function, which maps the sum to a value, generally in the range [-1, 1], as in the step activation function shown. That value is the output of the neuron.
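A minimal sketch of this computation in Python (the function name and example numbers are illustrative, not from the slides):

```python
# Minimal sketch of a single neuron with a step activation (illustrative only).
def neuron_output(inputs, weights, threshold=0.0):
    # Weighted sum of inputs: sum(x_i * w_i)
    s = sum(x * w for x, w in zip(inputs, weights))
    # Step activation: map the sum to {-1, 1} around the threshold
    return 1.0 if s >= threshold else -1.0

print(neuron_output([0.5, -1.0, 2.0], [0.8, 0.2, 0.1]))  # 0.4 >= 0, so the neuron outputs 1.0
```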

5 TRAINING A NEURAL NETWORK
- To properly train a neural network, the weights must be "tuned" so that the network models the goal function as closely as possible
- The "goal" function is the function that maps input data to output data in the training set
- Training is by far the most costly step in the majority of scenarios
  - Google has reported training times under 2 days for certain problems and network sizes
  - Once trained, however, a network can classify new items very quickly
- Some popular training options:
  - Backpropagation (used in the majority of cases)
  - Genetic algorithms with simulated annealing
  - Hebbian learning
  - A combination of different methods in a "Committee of Machines"

6 BACKPROPAGATION
- The most popular training method
- Works by reducing the error on the training set
- Requires many training examples to drive the error down
- Uses gradient descent on the error, typically the mean squared error
- Partial derivatives determine how much each neuron/weight is to blame for each part of the error
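A minimal, self-contained sketch of backpropagation with gradient descent on the mean squared error; the tiny XOR task, network shape, learning rate, and iteration count are arbitrary illustrative choices, not from the presentation:

```python
# Sketch of backpropagation: gradient descent on the mean squared error of a tiny network.
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)   # training inputs
y = np.array([[0], [1], [1], [0]], dtype=float)               # target outputs (XOR)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

W1 = rng.normal(size=(2, 4)); b1 = np.zeros((1, 4))   # input -> hidden
W2 = rng.normal(size=(4, 1)); b2 = np.zeros((1, 1))   # hidden -> output
lr = 0.5

for epoch in range(10000):
    # Forward pass
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)

    # Backward pass: propagate the error gradient with the chain rule
    err = out - y                                  # dE/dout (up to a constant factor)
    d_out = err * out * (1 - out)                  # through the output sigmoid
    d_hid = (d_out @ W2.T) * h * (1 - h)           # through the hidden sigmoid

    # Gradient-descent weight updates
    W2 -= lr * h.T @ d_out;  b2 -= lr * d_out.sum(axis=0, keepdims=True)
    W1 -= lr * X.T @ d_hid;  b1 -= lr * d_hid.sum(axis=0, keepdims=True)

h = sigmoid(X @ W1 + b1)
print(np.round(sigmoid(h @ W2 + b2), 2))   # should approach [[0], [1], [1], [0]]
```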

7 The backward pass is done through backpropagation, which uses the chain rule to calculate the partial derivatives.
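In standard backpropagation notation (not taken from the slide's figure), the chain rule splits the error derivative for a weight w_ij feeding neuron j into locally computable factors:

```latex
\frac{\partial E}{\partial w_{ij}}
  = \frac{\partial E}{\partial o_j}\,
    \frac{\partial o_j}{\partial \mathrm{net}_j}\,
    \frac{\partial \mathrm{net}_j}{\partial w_{ij}},
\qquad
\mathrm{net}_j = \sum_i w_{ij}\, o_i,\quad o_j = f(\mathrm{net}_j)
```

Here f is the activation function; for the sigmoid, f'(net) = f(net)(1 - f(net)), which is why the code above multiplies by out * (1 - out).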

8 The underlying operations are embarrassingly parallel, but many problems still remain. Backpropagation, communication, and computation must all be considered when scaling neural networks.

9 PROBLEMS WITH SCALING BACKPROPAGATION
- Requires the neurons of one layer to be fully connected to the neurons of the next layer, so a lot of communication is required
- Gradient descent is prone to getting stuck in local optima
- Requires many iterations to reduce the error to an acceptable rate
- Training data sets are very large
  - Rule of thumb: the training set size should be roughly the number of weights divided by the permitted classification error rate
  - A 10% error rate needs about 10x the number of weights, 1% needs 100x, and so on (sketched below)
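A small sketch of the rule of thumb above; the helper name is made up for illustration:

```python
# Heuristic from the slide: training set size ≈ number of weights / permitted error rate.
def rule_of_thumb_training_size(num_weights, error_rate):
    return int(num_weights / error_rate)

print(rule_of_thumb_training_size(31_000, 0.10))  # 10% error -> 310,000 examples (10x the weights)
print(rule_of_thumb_training_size(31_000, 0.01))  # 1% error  -> 3,100,000 examples (100x the weights)
```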

10 COMPUTATIONAL ISSUES IN SCALING ANNS
- The main operation is matrix multiplication
  - An N-node layer requires N^2 scalar multiplications and N sums of N numbers
  - Requires a good multiply or multiply-and-add function
- Activation function
  - The sigmoid f(x) = 1 / (1 + e^-x) is often used
  - It has to be approximated efficiently
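A sketch of the dominant per-layer operation, a matrix multiply followed by the activation; the piecewise-linear "hard sigmoid" shown is one common cheap approximation, used here only as an illustration (real implementations choose an approximation suited to their hardware):

```python
# Sketch of the dominant per-layer computation: y = f(W @ x).
# For an N-node, fully connected layer, W @ x costs N^2 multiplications and N sums of N numbers.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def hard_sigmoid(x):
    # A common cheap, piecewise-linear approximation of the sigmoid (illustrative choice).
    return np.clip(0.2 * x + 0.5, 0.0, 1.0)

N = 4
W = np.random.default_rng(1).normal(size=(N, N))   # N-node layer, fully connected
x = np.ones(N)

print(sigmoid(W @ x))       # exact activation
print(hard_sigmoid(W @ x))  # approximated activation
```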

11 COMMUNICATION ISSUES IN SCALING ANNS
- A high degree of connectivity leads to large data flows
- Network structure and bandwidth are very important
- Broadcasts and ring topologies are often used to meet the communication requirements
- In many cases, adding more processors does not mean faster computation
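A minimal sketch of the two communication patterns mentioned, written with mpi4py (assumed available along with an MPI runtime); it only illustrates the patterns, not an actual training loop:

```python
# Run with e.g.: mpiexec -n 4 python comm_patterns.py
# Illustrative sketch of broadcast and ring communication with mpi4py (assumed available).
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

# Broadcast: rank 0 sends the current weight vector to every worker.
weights = np.arange(8, dtype=float) if rank == 0 else np.empty(8)
comm.Bcast(weights, root=0)

# Ring: each worker passes its partial result to the next rank and receives from the previous.
partial = np.full(8, float(rank))
received = np.empty(8)
comm.Sendrecv(partial, dest=(rank + 1) % size,
              recvbuf=received, source=(rank - 1) % size)

print(f"rank {rank}: got weights from root and a partial result from rank {(rank - 1) % size}")
```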

12 TWO KEY METHODOLOGIES
Model dimension
- One model, but multiple workers each train an individual part of it
- High amount of communication: workers need to synchronize at the edges of their partitions
- Efficient when the computation per neuron is heavy, e.g. datasets where each data point contains many attributes
Data dimension
- Different workers train on completely different sets of data
- Also a high amount of communication: parameters/weights must be synchronized to keep the model consistent
- Efficient when each weight needs a high amount of computation, e.g. large datasets where each data point contains only a few attributes
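A sketch of training along the data dimension: each worker computes gradients on its own shard and the gradients are averaged (here via an MPI all-reduce) so every worker keeps a consistent copy of the weights. The linear-regression task and the mpi4py usage are illustrative assumptions:

```python
# Run with e.g.: mpiexec -n 4 python data_parallel.py
# Sketch of data parallelism: each worker trains on a different shard of the data,
# then gradients are averaged so every worker keeps an identical copy of the weights.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

rng = np.random.default_rng(rank)            # each worker gets its own data shard
X_shard = rng.normal(size=(100, 10))
y_shard = X_shard @ np.arange(10.0) + rng.normal(scale=0.1, size=100)

w = np.zeros(10)                             # identical initial weights on every worker
lr = 0.1

for step in range(100):
    # Local gradient of the mean squared error on this worker's shard
    grad_local = 2.0 / len(X_shard) * X_shard.T @ (X_shard @ w - y_shard)

    # Synchronize: sum the gradients across all workers, then average
    grad_global = np.empty_like(grad_local)
    comm.Allreduce(grad_local, grad_global, op=MPI.SUM)
    w -= lr * grad_global / size

if rank == 0:
    print("learned weights:", np.round(w, 2))   # should approach [0, 1, 2, ..., 9]
```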

13 Example of splitting on the data dimension

14 SPANN (SCALABLE PARALLEL ARTIFICIAL NEURAL NETWORK)
- Inspired by the human brain's ability to communicate between groups of neurons without fully connected paths
- Focuses on parallelizing the model dimension
- Uses the MPI library
- Reduces the need for communication between every neuron in consecutive layers of a neural network
- Only boundary values are communicated between "ghost" neurons
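The following is only an illustration of the ghost-neuron idea, not SPANN's actual code: each processor owns a slice of a layer and exchanges just its boundary values with neighbouring processors instead of communicating every activation (mpi4py assumed):

```python
# Run with e.g.: mpiexec -n 4 python ghost_exchange.py
# Illustrative sketch (not SPANN's implementation): each processor owns a slice of a layer's
# activations and exchanges only boundary values with its neighbours ("ghost" neurons),
# instead of sending every activation to every other processor.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

local = np.random.default_rng(rank).normal(size=16)   # this processor's slice of the layer
left, right = (rank - 1) % size, (rank + 1) % size

# Exchange a single boundary value with each neighbour.
ghost_left = np.empty(1)
ghost_right = np.empty(1)
comm.Sendrecv(local[:1],  dest=left,  recvbuf=ghost_right, source=right)
comm.Sendrecv(local[-1:], dest=right, recvbuf=ghost_left,  source=left)

print(f"rank {rank}: exchanged 2 boundary values instead of all {local.size} activations")
```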

15 BIOLOGICAL INSPIRATION
- The neocortex is the part of the brain most commonly associated with intelligence
- It has a columnar structure with an estimated 6 layers

16 SPANN CONT. Recall the serial backpropagation formulation and compare it with the parallel backpropagation used in SPANN:
- L is the number of layers, including the input/output layers; N_proc is the number of processors being used
- As shown by the first box, every input is sent to every processor
- Each processor holds only N_hidden / N_proc hidden neurons per layer and N_out / N_proc output neurons
- Divide by the number of processors to get the weights per processor
Example comparison of a 3-layer network:
- Serial ANN: 200 input, 48 output, 125 hidden neurons; (200 + 48) * 125 = 31,000 weights need to be trained
- Parallel ANN using SPANN: 200 input, 48 output, 120 hidden neurons, 6 layers, 8 processors; 30,280 weights need to be trained, but only 3,785 per processor
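A sketch of the weight-count arithmetic on this slide; the per-processor figure simply divides the slide's stated parallel total by the number of processors, and the helper name is made up:

```python
# Weight-count arithmetic from the slide (illustrative helper; bias terms not counted).
def fully_connected_weights(layer_sizes):
    # Weights in a fully connected feed-forward network with the given layer sizes.
    return sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))

# Serial 3-layer example: 200 inputs, 125 hidden, 48 outputs
print(fully_connected_weights([200, 125, 48]))   # 200*125 + 125*48 = (200 + 48) * 125 = 31,000

# SPANN example from the slide: 30,280 weights in total, split over 8 processors
print(30_280 // 8)                               # 3,785 weights per processor
```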

17

18 PERFORMANCE COMPARISON
- Training 37,890 weights on a serial ANN took 1,313 seconds, compared to 842 seconds for 30,240 weights
- The serial version shows a significant slowdown as the problem grows: at resolution 8 it computes ~36 weights/sec, but at resolution 9 it falls to only ~28.5 weights/sec
- The time taken per weight grows more slowly in SPANN, so once the training data reaches a significant size, SPANN becomes much quicker per weight
- The speedup factor is related to the training data size: the larger the size, the larger the speedup
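As a check on the throughput figures, dividing the weight counts by the reported serial training times gives:

```latex
\frac{30{,}240\ \text{weights}}{842\ \text{s}} \approx 35.9\ \tfrac{\text{weights}}{\text{s}},
\qquad
\frac{37{,}890\ \text{weights}}{1313\ \text{s}} \approx 28.9\ \tfrac{\text{weights}}{\text{s}}
```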

19 RESULTS CONT.

20 SPANN CONCLUSIONS
- Developed an architecture that can scale to billions of weights (synapses)
- Succeeds by reducing the communication requirements between layers to a few "gatekeeper" nodes
- Uses a biological model of the human brain as inspiration

21 SCALING ANNS CONCLUSIONS
- Neural networks are a tool that has driven significant developments in artificial intelligence and machine learning
- Scaling issues, in both communication and computation, are significant even though the calculations are embarrassingly parallel
- SPANN showed promising results
- Research continues today, with a heavy focus on communication, since training set sizes are growing faster than the computational requirements in many cases

22 QUESTIONS?

