1
Submitted by: Ankit Bhutani (Y9227094)
Supervised by: Prof. Amitabha Mukerjee, Prof. K S Venkatesh
2
AUTO-ASSOCIATIVE NEURAL NETWORKS
OUTPUT TRAINED TO BE SIMILAR TO THE INPUT
3
BOTTLENECK CONSTRAINT
LINEAR ACTIVATION – PCA [Baldi et al., 1989]
NON-LINEAR PCA [Kramer, 1991] – 5-LAYERED NETWORK
ALTERNATE SIGMOID AND LINEAR ACTIVATIONS
EXTRACTS NON-LINEAR FACTORS
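A minimal numpy sketch (not from the slides) of the [Baldi et al., 1989] result named above: a linear autoencoder with a bottleneck, trained by gradient descent on squared reconstruction error, recovers the same subspace as PCA. All dimensions, the learning rate, and the iteration count are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10)) @ rng.normal(size=(10, 10))  # correlated data
X -= X.mean(axis=0)                                         # center, as in PCA

d, k = X.shape[1], 3                       # input dim, bottleneck size
W1 = rng.normal(scale=0.1, size=(d, k))    # encoder weights
W2 = rng.normal(scale=0.1, size=(k, d))    # decoder weights

lr = 1e-3
for _ in range(2000):
    H = X @ W1                # linear activation: no non-linearity
    Z = H @ W2                # reconstruction
    E = Z - X                 # gradient of 0.5 * ||Z - X||^2 w.r.t. Z
    W2 -= lr * H.T @ E / len(X)
    W1 -= lr * X.T @ (E @ W2.T) / len(X)

# Compare the learnt subspace with the top-k principal components:
# singular values near 1 mean the two subspaces coincide.
U, S, Vt = np.linalg.svd(X, full_matrices=False)
pca_basis = Vt[:k].T
overlap = np.linalg.svd(np.linalg.qr(W1)[0].T @ pca_basis, compute_uv=False)
print("subspace overlap (1.0 = identical):", overlap.round(3))
```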
4
ABILITY TO LEARN HIGHLY COMPLEX FUNCTIONS
TACKLES THE NON-LINEAR STRUCTURE OF THE UNDERLYING DATA
HIERARCHICAL REPRESENTATION
RESULTS FROM CIRCUIT THEORY – A SINGLE-LAYERED NETWORK WOULD NEED AN EXPONENTIALLY HIGH NUMBER OF HIDDEN UNITS
5
DIFFICULTY IN TRAINING DEEP NETWORKS
NON-CONVEX NATURE OF OPTIMIZATION – GETS STUCK IN LOCAL MINIMA
VANISHING GRADIENTS DURING BACKPROPAGATION
SOLUTION – "INITIAL WEIGHTS MUST BE CLOSE TO A GOOD SOLUTION" [Hinton et al., 2006]
GENERATIVE PRE-TRAINING FOLLOWED BY FINE-TUNING
6
PRE-TRAINING
INCREMENTAL LAYER-WISE TRAINING
EACH LAYER ONLY TRIES TO REPRODUCE THE HIDDEN-LAYER ACTIVATIONS OF THE PREVIOUS LAYER
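A hedged sketch of the incremental scheme described above, using shallow tied-weight autoencoders: each layer is trained to reproduce the activations of the layer below it, and its own hidden activations then become the next layer's input. The helper names, sizes, and hyperparameters are hypothetical.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def train_shallow_ae(H_in, n_hidden, lr=0.1, epochs=50):
    """Train one shallow tied-weight autoencoder to reproduce H_in."""
    rng = np.random.default_rng(0)
    n_vis = H_in.shape[1]
    W = rng.normal(scale=0.01, size=(n_vis, n_hidden))
    b, c = np.zeros(n_hidden), np.zeros(n_vis)
    for _ in range(epochs):
        H = sigmoid(H_in @ W + b)          # encode
        Z = sigmoid(H @ W.T + c)           # decode with tied weights
        dZ = (Z - H_in) * Z * (1 - Z)      # squared-error grad through sigmoid
        dH = (dZ @ W) * H * (1 - H)
        W -= lr * (H_in.T @ dH + dZ.T @ H) / len(H_in)
        b -= lr * dH.mean(axis=0)
        c -= lr * dZ.mean(axis=0)
    return W, b

def pretrain(X, layer_sizes):
    """Greedy layer-wise pre-training: each layer reproduces the one below."""
    weights, H = [], X
    for n_hidden in layer_sizes:
        W, b = train_shallow_ae(H, n_hidden)
        weights.append((W, b))
        H = sigmoid(H @ W + b)   # activations become the next layer's input
    return weights
```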
7
FINE-TUNING
INITIALIZE THE AUTOENCODER WITH WEIGHTS LEARNT BY PRE-TRAINING
PERFORM BACKPROPAGATION AS USUAL
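A sketch of this initialization step, reusing the hypothetical `pretrain` and `sigmoid` helpers from the previous sketch: the pre-trained weights are unrolled into a deep encoder-decoder network, which is then trained end-to-end by ordinary backpropagation (only the forward pass is shown).

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def init_deep_ae(pretrained):
    """Unroll the pre-trained stack: encoder keeps each (W, b);
    the decoder mirrors them with transposed weights and fresh biases."""
    encoder = list(pretrained)
    decoder = [(W.T.copy(), np.zeros(W.shape[0])) for W, b in reversed(pretrained)]
    return encoder + decoder

def forward(X, layers):
    H = X
    for W, b in layers:
        H = sigmoid(H @ W + b)
    return H   # fine-tuning then backpropagates the reconstruction error as usual

# layers = init_deep_ae(pretrain(X, [256, 64, 8]))   # hypothetical layer sizes
# Z = forward(X, layers)
```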
8
STOCHASTIC – RESTRICTED BOLTZMANN MACHINES (RBMs)
HIDDEN-LAYER ACTIVATIONS (0-1) USED TO TAKE A PROBABILISTIC DECISION OF PUTTING 0 OR 1
MODEL LEARNS THE JOINT PROBABILITY OF 2 BINARY DISTRIBUTIONS – ONE OVER THE INPUT AND THE OTHER OVER THE HIDDEN LAYER
EXACT METHODS – COMPUTATIONALLY INTRACTABLE
NUMERICAL APPROXIMATION – CONTRASTIVE DIVERGENCE
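A minimal sketch of one contrastive-divergence (CD-1) update for a binary RBM, the numerical approximation named above. The hidden probabilities are used for a stochastic 0/1 decision exactly as the slide describes; the learning rate and shapes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def cd1_update(V0, W, b, c, lr=0.1):
    """One CD-1 step for a binary RBM. V0: (N, n_vis) batch of binary inputs."""
    # Positive phase: hidden probabilities, then a stochastic 0/1 decision
    ph0 = sigmoid(V0 @ W + b)
    H0 = (rng.random(ph0.shape) < ph0).astype(float)
    # Negative phase: one Gibbs step back to visibles and hiddens
    pv1 = sigmoid(H0 @ W.T + c)
    V1 = (rng.random(pv1.shape) < pv1).astype(float)
    ph1 = sigmoid(V1 @ W + b)
    # Approximate log-likelihood gradient: data statistics minus model statistics
    N = len(V0)
    W += lr * (V0.T @ ph0 - V1.T @ ph1) / N
    b += lr * (ph0 - ph1).mean(axis=0)
    c += lr * (V0 - V1).mean(axis=0)
    return W, b, c
```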
9
DETERMINISTIC – SHALLOW AUTOENCODERS
HIDDEN-LAYER ACTIVATIONS (0-1) ARE DIRECTLY USED AS INPUT TO THE NEXT LAYER
TRAINED BY BACKPROPAGATION
DENOISING AUTOENCODERS
CONTRACTIVE AUTOENCODERS
SPARSE AUTOENCODERS
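Of the variants listed, the denoising autoencoder is the simplest to sketch: corrupt the input, then train the shallow autoencoder to reconstruct the clean original. The masking-noise level here is an illustrative choice, not one taken from the slides.

```python
import numpy as np

def corrupt(X, rng, noise=0.3):
    """Masking noise: zero out a random ~30% of the input components."""
    return X * (rng.random(X.shape) > noise)

# In the shallow-autoencoder training loop, encode the CORRUPTED input
#   H = sigmoid(corrupt(X, rng) @ W + b)
#   Z = sigmoid(H @ W.T + c)
# but compute the reconstruction loss against the ORIGINAL, clean X.
```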
10
TASK \ MODEL | RBM | SHALLOW AE
CLASSIFIER | [Hinton et al., 2006] and many others since then | Investigated by [Bengio et al., 2007], [Ranzato et al., 2007], [Vincent et al., 2008], [Rifai et al., 2011] etc.
DEEP AE | [Hinton & Salakhutdinov, 2006] | No significant results reported in literature – GAP
11
MNIST Big and Small Digits
12
Square & Room
2D Robot Arm
3D Robot Arm
13
Libraries used
Numpy, Scipy
Theano – takes care of parallelization
GPU Specifications (Tesla C1060)
Memory – 256 MB
Frequency – 33 MHz
Number of Cores – 240
14
REVERSE CROSS-ENTROPY
L(X, Z; Θ) = − Σ_k [ X_k log(Z_k) + (1 − X_k) log(1 − Z_k) ]
X – Original input
Z – Output (reconstruction)
Θ – Parameters – Weights and Biases
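A small numpy sketch of this reconstruction cross-entropy in its standard form, assuming the entries of X and Z lie in (0, 1); the clipping constant is an illustrative guard against log(0).

```python
import numpy as np

def reconstruction_cross_entropy(X, Z, eps=1e-7):
    """-sum_k [ X_k log Z_k + (1 - X_k) log(1 - Z_k) ], averaged over the batch."""
    Z = np.clip(Z, eps, 1 - eps)   # guard against log(0)
    return -np.mean(np.sum(X * np.log(Z) + (1 - X) * np.log(1 - Z), axis=1))
```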
15
RESULTS FROM PRELIMINARY EXPERIMENTS
16
TIME TAKEN FOR TRAINING
CONTRACTIVE AUTOENCODERS TAKE VERY LONG TO TRAIN
17
EXPERIMENT USING SPARSE REPRESENTATIONS
STRATEGY A – BOTTLENECK
STRATEGY B – SPARSITY + BOTTLENECK
STRATEGY C – NO CONSTRAINT + BOTTLENECK
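A hedged sketch of how a sparsity constraint (Strategy B) can be imposed on top of the bottleneck. The slides do not specify which penalty was used; shown here is the common KL-divergence penalty toward a target mean activation, with illustrative values for the target and its weight.

```python
import numpy as np

def sparsity_penalty(H, rho=0.05, beta=3.0):
    """KL(rho || mean activation) penalty pushing hidden units toward sparsity.
    H: (N, n_hidden) sigmoid activations in (0, 1)."""
    rho_hat = np.clip(H.mean(axis=0), 1e-7, 1 - 1e-7)  # mean activation per unit
    kl = rho * np.log(rho / rho_hat) + (1 - rho) * np.log((1 - rho) / (1 - rho_hat))
    return beta * kl.sum()

# Added to the reconstruction loss:
#   total = reconstruction_cross_entropy(X, Z) + sparsity_penalty(H)
```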
19
MOMENTUM
INCORPORATING THE PREVIOUS UPDATE
CANCELS OUT COMPONENTS IN OPPOSITE DIRECTIONS – PREVENTS OSCILLATION
ADDS UP COMPONENTS IN THE SAME DIRECTION – SPEEDS UP TRAINING
WEIGHT DECAY REGULARIZATION
PREVENTS OVER-FITTING
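A minimal sketch combining the two techniques in a single gradient-descent update; the coefficients are typical illustrative values, not the ones used in these experiments.

```python
import numpy as np

def sgd_step(W, grad, velocity, lr=0.1, momentum=0.9, weight_decay=1e-4):
    """One update with momentum and weight decay (L2 regularization).
    The velocity term carries the previous update: opposing components cancel
    (less oscillation), aligned components accumulate (faster training)."""
    velocity = momentum * velocity - lr * (grad + weight_decay * W)
    return W + velocity, velocity
```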
20
USING ALTERNATE LAYER SPARSITY WITH MOMENTUM & WEIGHT DECAY YIELDS BEST RESULTS
21
MOTIVATION