
Minerva: A Scalable and Highly Efficient Training Platform for Deep Learning
M. Wang, T. Xiao, J. Li, J. Zhang, C. Hong, & Z. Zhang (2014)
Presentation by Cameron Hamilton

Overview
Problem: there is a disparity between deep learning tools oriented toward productivity and generality (e.g. MATLAB) and task-specific tools designed for speed and scale (e.g. CUDA-Convnet).
Solution: Minerva, a matrix-based API with a MATLAB-like procedural coding style. The program is translated into an internal dataflow graph at runtime, which is generic enough to be executed on different types of hardware.

Minerva System Overview
Every training iteration has two phases:
1. Generate a dataflow graph from the user code.
2. Evaluate the dataflow graph.
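To make the two-phase design concrete, here is a minimal, self-contained C++ sketch of lazy dataflow-graph construction and evaluation. It is illustrative only and uses none of the real Minerva API: the "user code" merely records operation nodes, and a separate evaluation pass runs them, mirroring the generate-then-evaluate split described above.

    // Illustrative only: a toy lazy-evaluation dataflow graph, not Minerva code.
    #include <functional>
    #include <iostream>
    #include <memory>
    #include <vector>

    struct Node {
        std::vector<std::shared_ptr<Node>> inputs;              // dependencies
        std::function<double(const std::vector<double>&)> op;   // kernel to run
        double value = 0.0;
        bool evaluated = false;
    };
    using NodePtr = std::shared_ptr<Node>;

    // Phase 1: "user code" only builds graph nodes; nothing is computed yet.
    NodePtr Constant(double v) {
        auto n = std::make_shared<Node>();
        n->op = [v](const std::vector<double>&) { return v; };
        return n;
    }
    NodePtr Add(NodePtr a, NodePtr b) {
        auto n = std::make_shared<Node>();
        n->inputs = {a, b};
        n->op = [](const std::vector<double>& in) { return in[0] + in[1]; };
        return n;
    }
    NodePtr Mul(NodePtr a, NodePtr b) {
        auto n = std::make_shared<Node>();
        n->inputs = {a, b};
        n->op = [](const std::vector<double>& in) { return in[0] * in[1]; };
        return n;
    }

    // Phase 2: evaluate the graph; independent subgraphs could be dispatched
    // to different devices or threads.
    double Evaluate(const NodePtr& n) {
        if (n->evaluated) return n->value;
        std::vector<double> in;
        for (const auto& dep : n->inputs) in.push_back(Evaluate(dep));
        n->value = n->op(in);
        n->evaluated = true;
        return n->value;
    }

    int main() {
        // y = (w * x) + b, expressed lazily in a Minerva-like style.
        NodePtr w = Constant(2.0), x = Constant(3.0), b = Constant(1.0);
        NodePtr y = Add(Mul(w, x), b);
        std::cout << "y = " << Evaluate(y) << "\n";  // prints y = 7
    }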

Example of User Code

System Overview: Performance via Parallelism
The performance of deep learning algorithms depends on whether operations can be performed in parallel. Minerva exploits two forms of parallelism:
Model parallelism: a single model is partitioned across multiple workers.
Data parallelism: model replicas are assigned to different portions of the data set and exchange updates via a "logically centralized parameter server" (p. 4).
Minerva always evaluates on the GPU if one is available.
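The sketch below is a toy illustration of the data-parallel pattern only (multiple replicas pushing updates to one logically centralized server). It is not Minerva code; the ParameterServer class, ComputeGradient function, and the single scalar "model" are all invented for illustration.

    // Illustrative only: data parallelism with a centralized parameter server.
    #include <iostream>
    #include <mutex>
    #include <thread>
    #include <vector>

    struct ParameterServer {
        std::vector<double> weights;
        std::mutex mu;
        explicit ParameterServer(size_t n) : weights(n, 0.0) {}
        // Apply one worker's gradient update under a lock.
        void Push(const std::vector<double>& grad, double lr) {
            std::lock_guard<std::mutex> lock(mu);
            for (size_t i = 0; i < weights.size(); ++i) weights[i] -= lr * grad[i];
        }
        // Fetch the current weights.
        std::vector<double> Pull() {
            std::lock_guard<std::mutex> lock(mu);
            return weights;
        }
    };

    // Toy gradient: pulls the weight toward the mean of this worker's shard.
    std::vector<double> ComputeGradient(const std::vector<double>& shard, double w) {
        double err = 0.0;
        for (double x : shard) err += (w - x);
        return {err / shard.size()};
    }

    int main() {
        ParameterServer server(1);
        // Two model replicas, each assigned a different portion of the data set.
        std::vector<std::vector<double>> shards = {{1.0, 2.0, 3.0}, {4.0, 5.0, 6.0}};
        std::vector<std::thread> workers;
        for (const auto& shard : shards) {
            workers.emplace_back([&server, &shard] {
                for (int step = 0; step < 200; ++step) {
                    double w = server.Pull()[0];
                    server.Push(ComputeGradient(shard, w), /*lr=*/0.1);
                }
            });
        }
        for (auto& t : workers) t.join();
        std::cout << "final weight: " << server.Pull()[0] << "\n";  // near the overall mean
    }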

Programming Model
The Minerva API defines three stages for deep learning:
1. Define the model architecture
    Model model;
    Layer layer1 = model.AddLayer(dim);
    model.AddConnection(layer1, layer2, FULL);
2. Declare the primary matrices (i.e. weights and biases)
    Matrix W = Matrix(layer2, layer1, RANDOM);
    Matrix b(layer2, 1, RANDOM);
    Vector<Matrix> inputs = LoadBatches(layer1, …);

Programming Model (continued)
3. Specify the training procedure
Convolutional neural networks (CNNs) are specified with a different syntax: the architecture is declared with a single line, AddConvConnect(layer1, layer2, …), and Minerva then handles the arrangement of these layers (p. 4).

Programming Model: Expressing Parallelism
Model parallelism (partition layers across workers):
    SetPartition(layer1, 2);
    SetPartition(layer2, 2);
Data parallelism (register parameters with the parameter server and periodically push/pull updates):
    ParameterSet pset;
    pset.Add("W", W); pset.Add("V", V);
    pset.Add("b", b); pset.Add("c", c);
    RegisterToParameterServer(pset);
    … // learning procedure here
    if (epoch % 3 == 0) PushToParameterServer(pset);
    if (epoch % 6 == 0) PullFromParameterServer(pset);
    EvalAll();

Putting it All Together

System Design: More on Parallelism
Within a neural network, the operations at each computing vertex (i.e. forward propagation, backward propagation, weight update) are predefined, so training can be partitioned across theoretically any number of threads. Updates are shared between local parameter servers.
Load balance: each layer's work is divided among the partitions.
Coordination and overhead: ownership of a computing vertex is determined by the location of its input and output vertices; partitions stick to their vertices.
Locality: a vertex in layer n receives its input from layer n-1 and sends its output to layer n+1.
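As a rough sketch of the load-balancing idea (not taken from the paper), the helper below splits one layer's vertices into near-equal contiguous ranges, one per partition; each partition would then own the forward/backward/update operations for its range. The function name and layout are invented for illustration.

    // Illustrative only: evenly assign a layer's vertices to partitions.
    #include <iostream>
    #include <utility>
    #include <vector>

    // Split `units` vertices into `partitions` near-equal contiguous ranges.
    std::vector<std::pair<int, int>> PartitionLayer(int units, int partitions) {
        std::vector<std::pair<int, int>> ranges;
        int base = units / partitions, extra = units % partitions, start = 0;
        for (int p = 0; p < partitions; ++p) {
            int size = base + (p < extra ? 1 : 0);  // spread the remainder evenly
            ranges.emplace_back(start, start + size);
            start += size;
        }
        return ranges;
    }

    int main() {
        // A 10-unit layer split across 4 partitions; 10 is not evenly divisible
        // by 4, the case the concluding comments flag as unclear.
        for (const auto& [lo, hi] : PartitionLayer(10, 4))
            std::cout << "vertices [" << lo << ", " << hi << ") -> one partition\n";
    }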

Model Parallelism

Convolutional Networks
Partitions handle patches of the input data; the patches are merged and then convolved with a kernel.
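The following is a minimal 1-D sketch of that patch-parallel idea, not the paper's actual implementation: each partition takes an overlapping patch of the input (a halo of kernel_size - 1 elements), convolves it locally, and the merged per-patch outputs equal the full convolution. All names here are invented for illustration.

    // Illustrative only: patch-parallel 1-D "valid" convolution.
    #include <cassert>
    #include <iostream>
    #include <vector>

    // Plain 1-D valid convolution (correlation form, enough for the sketch).
    std::vector<double> Convolve(const std::vector<double>& x,
                                 const std::vector<double>& k) {
        std::vector<double> y(x.size() - k.size() + 1, 0.0);
        for (size_t i = 0; i < y.size(); ++i)
            for (size_t j = 0; j < k.size(); ++j) y[i] += x[i + j] * k[j];
        return y;
    }

    int main() {
        std::vector<double> input = {1, 2, 3, 4, 5, 6, 7, 8};
        std::vector<double> kernel = {0.25, 0.5, 0.25};
        const size_t halo = kernel.size() - 1;

        // Two overlapping patches of the input (in a distributed setting the
        // halo elements would be exchanged between partitions).
        std::vector<double> patch1(input.begin(), input.begin() + 5);       // [0, 5)
        std::vector<double> patch2(input.begin() + 5 - halo, input.end());  // [3, 8)

        // Each partition convolves its own patch; the outputs are then merged.
        auto out1 = Convolve(patch1, kernel);
        auto out2 = Convolve(patch2, kernel);
        out1.insert(out1.end(), out2.begin(), out2.end());

        assert(out1 == Convolve(input, kernel));  // identical to the full convolution
        std::cout << "merged " << out1.size() << " outputs from 2 partitions\n";
    }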

More on Data Parallelism
Each machine/partition has its own local parameter server that updates and exchanges with its neighbor servers.
Coordination is done through a belief-propagation-like algorithm (p. 7): each server merges updates with its neighbors, then "gossips to each of them the missing portion".
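The paper's exact protocol is not reproduced in the slides; the sketch below only illustrates the general gossip idea, with local servers on a ring repeatedly averaging their parameter with their neighbors so that updates diffuse without a single central server. The values and topology are made up.

    // Illustrative only: gossip-style averaging between local parameter servers.
    #include <iostream>
    #include <vector>

    int main() {
        // One local parameter value per machine/partition (ring topology).
        std::vector<double> params = {0.0, 4.0, 8.0, 12.0};

        for (int round = 0; round < 20; ++round) {
            std::vector<double> next(params.size());
            for (size_t i = 0; i < params.size(); ++i) {
                size_t left = (i + params.size() - 1) % params.size();
                size_t right = (i + 1) % params.size();
                // Merge with the two neighbors' values each round.
                next[i] = (params[left] + params[i] + params[right]) / 3.0;
            }
            params = next;
        }
        for (double p : params) std::cout << p << " ";  // all values near the mean, 6.0
        std::cout << "\n";
    }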

Experience and Evaluation: Minerva Implementation Highlights
ImageNet (CNN), 1K-class classification task (Krizhevsky et al., 2012):
42.7% top-1 error rate
15x faster than the MATLAB implementation
4.6x faster with a 16-way partition on a 16-core machine than with no partitioning
Speech-net (1100 input neurons, 8 hidden layers of 2000 sigmoid neurons each, 9000-unit softmax output layer):
1.5-2x faster than the MATLAB implementation
RNN: 10000 inputs, 1000 hidden units, 10000 flat outputs

Experience and Evaluation: Scaling Up (Figure 8)
With a mini-batch size of 128, Minerva (GPU) trained the CNN faster than Caffe did with mini-batch sizes of 256 and 512.

Experience and Evaluation (additional results figures)

Conclusion
A powerful and versatile framework for big data and deep learning.
Pipelining may be preferable to partitioning fully connected layers, which causes heavy traffic.
My Comments
Lacks a restricted Boltzmann machine (RBM), so a deep belief network (DBN) is not currently possible.
The API appears concise and readable.
Lacks an algorithm for genetic design of networks (e.g. NEAT); however, population generation would be ideal for partitioning.
It is not clear how Minerva handles situations where the number of partitions does not evenly divide the number of nodes in a given layer.

References
Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems (pp. 1097-1105).
Wang, M., Xiao, T., Li, J., Zhang, J., Hong, C., & Zhang, Z. (2014). Minerva: A scalable and highly efficient training platform for deep learning.
All figures appearing within this presentation are borrowed from Wang et al., 2014.