GREEDY LAYER-WISE UNSUPERVISED PRETRAINING. Presented by Simi P S, IV semester (part-time) M.Tech (DS & AI), Department of Computer Science, Cochin University of Science & Technology.
INTRODUCTION: WHY IS TRAINING DEEP NEURAL NETWORKS A CHALLENGE? THE VANISHING GRADIENT PROBLEM. As the number of hidden layers increases, the amount of error information propagated back to the earlier layers is dramatically reduced. This means that weights in hidden layers close to the output layer are updated normally, whereas weights in hidden layers close to the input layer are updated minimally or not at all. Historically, this problem prevented the training of very deep neural networks. Normal training also easily gets stuck in undesirable local optima, which prevents the lower layers from learning useful features. The problem can be partially circumvented by pretraining the layers in an unsupervised fashion, thus initialising them in a region of the error function that is easier to train (or fine-tune) with steepest-descent techniques.
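To make the vanishing-gradient effect tangible, here is a small illustrative sketch in Keras. It builds an arbitrary 10-layer sigmoid network on made-up data (both are assumptions, not from the presentation) and prints the mean absolute gradient of each layer's weights; layers nearer the input typically receive far smaller gradients.

```python
# Illustrative sketch: gradient magnitudes shrink toward the input in a deep
# sigmoid network. The architecture and random data are assumptions.
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

X = np.random.rand(64, 20).astype("float32")
y = np.random.randint(0, 2, size=(64, 1)).astype("float32")

# A deep network with saturating (sigmoid) activations throughout.
model = Sequential(
    [Dense(32, activation="sigmoid", input_shape=(20,))]
    + [Dense(32, activation="sigmoid") for _ in range(8)]
    + [Dense(1, activation="sigmoid")]
)

with tf.GradientTape() as tape:
    loss = tf.keras.losses.binary_crossentropy(y, model(X))
grads = tape.gradient(loss, model.trainable_variables)

# Every other trainable variable is a layer's kernel (the rest are biases).
for i, g in enumerate(grads[::2]):
    print(f"layer {i}: mean |grad| = {tf.reduce_mean(tf.abs(g)).numpy():.2e}")
```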
PRETRAINING. Pretraining is based on the assumption that it is easier to train a shallow network than a deep one, and it contrives a layer-wise training process in which we only ever fit a shallow model at a time. Pretraining can be used to iteratively deepen a supervised model, or an unsupervised model that can later be repurposed as a supervised model. It may be useful for problems with small amounts of labeled data and large amounts of unlabeled data. Pretraining involves successively adding a new hidden layer to a model and refitting, allowing the newly added layer to learn from the outputs of the existing hidden layers, often while keeping the weights of the existing hidden layers fixed. This gives the technique the name "layer-wise", as the model is trained one layer at a time. It is nevertheless called pretraining because it is meant to be only a first step, before a joint training algorithm is applied to fine-tune all the layers together. Pretraining acts both as a regularizer (it decreases test error without decreasing training error) and as a form of parameter initialization.
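As a concrete illustration of the "add a layer, keep the rest fixed, refit" loop just described, here is a minimal Keras sketch of greedy layer-wise (supervised) pretraining, in the spirit of the machinelearningmastery tutorial listed in the references. The data, layer sizes, and epoch counts are arbitrary assumptions.

```python
# Minimal sketch of greedy layer-wise (supervised) pretraining in Keras.
# Data, layer widths, and epochs are illustrative assumptions.
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# Hypothetical data: 1000 samples, 20 features, 3 classes.
X = np.random.rand(1000, 20).astype("float32")
y = np.random.randint(0, 3, size=(1000,))

# Base model: one hidden layer plus an output layer.
model = Sequential([
    Dense(32, activation="relu", input_shape=(20,)),
    Dense(3, activation="softmax"),
])
model.compile(loss="sparse_categorical_crossentropy", optimizer="adam")
model.fit(X, y, epochs=10, verbose=0)

# Greedily deepen the model: add one new hidden layer at a time and refit,
# keeping the previously trained hidden layers fixed.
for _ in range(3):
    output_layer = model.layers[-1]
    model.pop()                                   # detach the output layer
    for layer in model.layers:
        layer.trainable = False                   # freeze existing hidden layers
    model.add(Dense(32, activation="relu"))       # new hidden layer to be trained
    model.add(output_layer)                       # reattach the output layer
    model.compile(loss="sparse_categorical_crossentropy", optimizer="adam")
    model.fit(X, y, epochs=10, verbose=0)         # updates the new layer and the output layer
```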
PRETRAINING: MAIN APPROACHES. There are two main approaches: supervised greedy layer-wise pretraining and unsupervised greedy layer-wise pretraining. The key benefits of pretraining are a simplified training process, facilitation of deeper networks, usefulness as a weight-initialization scheme, and perhaps lower generalization error. In general, pretraining may help both in terms of optimization and in terms of generalization.
GREEDY LAYER-WISE TRAINING. Greedy layer-wise training provides a way to develop deep multi-layered neural networks. It is a pretraining algorithm that trains each layer of a deep belief network (DBN) in a sequential way, feeding the lower layers' results to the upper layers. This can yield a better optimization of the network than traditional end-to-end training with stochastic gradient descent. Greedy algorithms break a problem into many components and then solve for the optimal version of each component in isolation; unfortunately, combining the individually optimal components is not guaranteed to yield an optimal complete solution.
GREEDY LAYER-WISE UNSUPERVISED PRETRAINING. The approach optimizes each piece of the solution independently, one piece at a time, rather than jointly optimizing all pieces; here the independent pieces are the layers of the network, and each layer is optimized greedily. Greedy layer-wise pretraining proceeds one layer at a time, training the k-th layer while keeping the previous ones fixed; the lower layers (which are trained first) are not adapted after the upper layers are introduced. Each layer is trained with an unsupervised representation-learning algorithm. After unsupervised training, there is usually a fine-tuning stage, in which a joint supervised training algorithm is applied to all the layers.
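To make this recipe concrete, here is a minimal sketch that uses stacked autoencoders in Keras as the unsupervised representation learner (one common choice; the classic literature also uses RBMs). All layer sizes, data, and epoch counts are illustrative assumptions.

```python
# Minimal sketch of greedy layer-wise *unsupervised* pretraining with stacked
# autoencoders, followed by supervised fine-tuning. All sizes and data are assumptions.
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

X = np.random.rand(1000, 20).astype("float32")    # hypothetical unlabeled data

def pretrain_layer(inputs, n_hidden):
    """Train a one-hidden-layer autoencoder on `inputs` and return its encoder layer."""
    n_in = inputs.shape[1]
    autoencoder = Sequential([
        Dense(n_hidden, activation="relu", input_shape=(n_in,)),  # encoder
        Dense(n_in, activation="linear"),                         # decoder
    ])
    autoencoder.compile(loss="mse", optimizer="adam")
    autoencoder.fit(inputs, inputs, epochs=10, verbose=0)         # reconstruct the inputs
    return autoencoder.layers[0]

# Greedily pretrain each layer on the codes produced by the layers below it;
# earlier layers are not revisited while later ones are trained.
encoders, features = [], X
for n_hidden in (64, 32, 16):
    encoder = pretrain_layer(features, n_hidden)
    encoders.append(encoder)
    features = encoder(features).numpy()          # codes become the next layer's input

# Fine-tuning stage: stack the pretrained encoders, add a supervised head,
# and jointly train all layers on (hypothetical) labels.
y = np.random.randint(0, 3, size=(1000,))
model = Sequential(encoders + [Dense(3, activation="softmax")])
model.compile(loss="sparse_categorical_crossentropy", optimizer="adam")
model.fit(X, y, epochs=10, verbose=0)
```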
APPLICATIONS OF GREEDY LAYER-WISE UNSUPERVISED PRETRAINING. As initialization for other unsupervised learning algorithms, such as deep autoencoders and probabilistic models with many layers of latent variables; such models include deep belief networks and deep Boltzmann machines. As a regularizer: one hypothesis is that pretraining encourages the learning algorithm to discover features that relate to the underlying causes that generate the observed data. When the initial representation is poor: the use of word embeddings is a great example, where learned word embeddings naturally encode similarity between words. When the function to be learned is extremely complicated.
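As a hypothetical sketch of the word-embedding case: a matrix of vectors obtained from unsupervised pretraining (for example word2vec or GloVe) initializes an Embedding layer, which a small supervised model then fine-tunes. The vocabulary size, dimensions, labels, and the random stand-in for the "pretrained" matrix below are all assumptions.

```python
# Sketch: initialize an Embedding layer from pretrained vectors, then fine-tune
# on a supervised task. The pretrained matrix here is a random stand-in.
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, GlobalAveragePooling1D, Dense

vocab_size, embed_dim, seq_len = 5000, 50, 30
pretrained_vectors = np.random.rand(vocab_size, embed_dim).astype("float32")

embedding = Embedding(vocab_size, embed_dim)      # left trainable so it can be fine-tuned
model = Sequential([
    embedding,
    GlobalAveragePooling1D(),                     # average the word vectors of a sentence
    Dense(3, activation="softmax"),               # small supervised head
])
model.build(input_shape=(None, seq_len))          # create the weights
embedding.set_weights([pretrained_vectors])       # load the pretrained representation
model.compile(loss="sparse_categorical_crossentropy", optimizer="adam")

# Hypothetical integer-encoded sentences and labels for the supervised task.
X = np.random.randint(0, vocab_size, size=(1000, seq_len))
y = np.random.randint(0, 3, size=(1000,))
model.fit(X, y, epochs=5, verbose=0)
```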
Why Do We Need Greedy Layer-Wise Training? Training a deep structure can be difficult because there may be strong dependencies across the layers' parameters, for example the relation between pixels and higher-level parts of an image. SOLUTION: the first step is adapting the lower layers so that they feed good input to the final setting of the upper layers (the harder part); next, the upper layers are adjusted to make use of that final setting of the lower layers. Greedy layer-wise training was introduced precisely to tackle this issue. It can be used for training a DBN in a layer-wise sequence, where each layer is an RBM (restricted Boltzmann machine). It has been shown to bring better generalization by initializing the network near a good local minimum (of a local criterion), which helps it form a representation of high-level abstractions of the input.
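To show what "each layer is an RBM" means in practice, here is a minimal NumPy sketch of training a single RBM with one step of contrastive divergence (CD-1); in a DBN, the hidden activities of this RBM would become the training data for the next RBM. The sizes, learning rate, and binary toy data are illustrative assumptions.

```python
# Minimal sketch: train one RBM layer with CD-1 (contrastive divergence).
# Sizes, learning rate, and the binary toy data are assumptions.
import numpy as np

rng = np.random.default_rng(0)
X = (rng.random((1000, 20)) > 0.5).astype(float)      # hypothetical binary data

n_visible, n_hidden, lr = 20, 16, 0.1
W = 0.01 * rng.standard_normal((n_visible, n_hidden))
b_v, b_h = np.zeros(n_visible), np.zeros(n_hidden)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for epoch in range(10):
    for i in range(0, len(X), 32):                    # mini-batches of 32
        v0 = X[i:i + 32]
        # Positive phase: hidden probabilities and samples given the data.
        h0_prob = sigmoid(v0 @ W + b_h)
        h0 = (rng.random(h0_prob.shape) < h0_prob).astype(float)
        # Negative phase: one Gibbs step (reconstruction) for CD-1.
        v1_prob = sigmoid(h0 @ W.T + b_v)
        h1_prob = sigmoid(v1_prob @ W + b_h)
        # Gradient approximation: data statistics minus model statistics.
        W += lr * (v0.T @ h0_prob - v1_prob.T @ h1_prob) / len(v0)
        b_v += lr * (v0 - v1_prob).mean(axis=0)
        b_h += lr * (h0_prob - h1_prob).mean(axis=0)

# The hidden probabilities sigmoid(X @ W + b_h) would serve as the "data"
# for training the next RBM in the stack.
```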
Unsupervised pretraining is a way to initialize the weights when training deep neural networks. Initialization with pretraining can have better convergence properties than simple random initialization, especially when the number of (labeled) training points is not very large. It is most helpful when the number of labeled examples is very small or the number of unlabeled examples is very large, because the source of information added by unsupervised pretraining is the unlabeled data.
THE ADVANTAGES OF UNSUPERVISED PRETRAINING. It allows us to use all of our data in the training process (a shared lower-level representation), and it does not require the training criterion to be labeled (it is unsupervised). This unsupervised training process provides a good starting point for supervised training and restricts the range of parameters for the subsequent supervised training. Unsupervised pretraining is likely to be most useful when the function to be learned is extremely complicated: if the true underlying function is complicated and shaped by regularities of the input distribution, unsupervised learning can be a more appropriate regularizer. Unsupervised pretraining can also help tasks other than classification and can act to improve optimization rather than being merely a regularizer; for example, it can improve both train and test reconstruction error for deep autoencoders. Unsupervised pretraining initializes the neural network parameters into a region that they do not escape, and the results following this initialization are more consistent and less likely to be very bad than without it.
DISADVANTAGES OF UNSUPERVISED PRETRAINING. It operates with two separate training phases. Unsupervised pretraining does not offer a clear way to adjust the strength of the regularization arising from the unsupervised stage; there is no way of flexibly adapting it. Another disadvantage of having two separate training phases is that each phase has its own hyperparameters. The performance of the second phase usually cannot be predicted during the first phase, so there is a long delay between proposing hyperparameters for the first phase and being able to update them using feedback from the second phase.
SCOPE. Today, unsupervised pretraining has been largely abandoned, except in the field of natural language processing, where the natural representation of words as one-hot vectors conveys no similarity information and where very large unlabeled sets are available. In that case, the advantage of pretraining is that one can pretrain once on a huge unlabeled set (for example, a corpus containing billions of words), learn a good representation (typically of words, but also of sentences), and then use this representation, or fine-tune it, for a supervised task whose training set contains substantially fewer examples. Otherwise, pretraining is no longer necessary: its purpose was to find a good initialization for the network weights in order to facilitate convergence when a large number of layers was employed. Nowadays we have ReLU activations, dropout, and batch normalization, all of which contribute to solving the problem of training deep neural networks.
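For contrast, here is a minimal sketch of the modern recipe just mentioned: a moderately deep network trained end-to-end from random initialization, relying on ReLU activations, batch normalization, and dropout instead of unsupervised pretraining. The architecture, data, and hyperparameters are illustrative assumptions.

```python
# Sketch of the modern alternative: train a deep network jointly from random
# initialization with ReLU, batch normalization, and dropout. Sizes are assumptions.
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, BatchNormalization

X = np.random.rand(1000, 20).astype("float32")
y = np.random.randint(0, 3, size=(1000,))

model = Sequential()
model.add(Dense(64, activation="relu", input_shape=(20,)))
for _ in range(4):                          # several hidden blocks, trained jointly
    model.add(Dense(64, activation="relu"))
    model.add(BatchNormalization())         # keeps activations well-scaled
    model.add(Dropout(0.3))                 # regularizes in place of pretraining
model.add(Dense(3, activation="softmax"))

model.compile(loss="sparse_categorical_crossentropy", optimizer="adam",
              metrics=["accuracy"])
model.fit(X, y, epochs=10, batch_size=32, verbose=0)
```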
REFERENCES
Free online books: Deep Learning, by Ian Goodfellow, Yoshua Bengio, and Aaron Courville; Neural Networks and Deep Learning, by Michael Nielsen.
Videos and lectures: Deep Learning, Self-Taught Learning and Unsupervised Feature Learning, by Andrew Ng.
Other links: https://machinelearningmastery.com/greedy-layer-wise-pretraining-tutorial/ and https://cedar.buffalo.edu/~srihari/CSE676/15.1%20Greedy-Layerwise.pdf
THANK YOU