
1 Submitted by: Ankit Bhutani (Y9227094)  Supervised by: Prof. Amitabha Mukerjee and Prof. K S Venkatesh

2  AUTO-ASSOCIATIVE NEURAL NETWORKS  OUTPUT SIMILAR TO INPUT

3  BOTTLENECK CONSTRAINT  LINEAR ACTIVATION – PCA [Baldi et al., 1989]  NON-LINEAR PCA [Kramer, 1991] – 5-LAYER NETWORK  ALTERNATING SIGMOID AND LINEAR ACTIVATIONS  EXTRACTS NON-LINEAR FACTORS
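
The bottleneck idea fits in a few lines of numpy; a minimal sketch (sizes, learning rate, and iteration count are illustrative, not from the thesis). With linear activations and squared error, the learnt bottleneck spans the top principal components, which is the [Baldi et al., 1989] result:

```python
import numpy as np

# Linear bottleneck autoencoder trained by gradient descent.
# With linear units and squared error, the optimal bottleneck
# spans the top principal components [Baldi et al., 1989].
rng = np.random.RandomState(0)
X = rng.randn(500, 20)                # 500 samples, 20-dim input (toy data)
X -= X.mean(axis=0)                   # centre the data, as PCA assumes

n_hidden = 5                          # bottleneck size < input size
W1 = 0.01 * rng.randn(20, n_hidden)   # encoder weights
W2 = 0.01 * rng.randn(n_hidden, 20)   # decoder weights
lr = 0.01

for epoch in range(2000):
    H = X @ W1                        # linear encoder
    Z = H @ W2                        # linear decoder
    err = Z - X                       # reconstruction error
    W2 -= lr * (H.T @ err) / len(X)
    W1 -= lr * (X.T @ (err @ W2.T)) / len(X)

# W1's columns now span (up to rotation) the top-5 PCA subspace of X.
```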

4  ABILITY TO LEARN HIGHLY COMPLEX FUNCTIONS  TACKLES THE NON-LINEAR STRUCTURE OF THE UNDERLYING DATA  HIERARCHICAL REPRESENTATION  RESULTS FROM CIRCUIT THEORY – A SINGLE-LAYER NETWORK WOULD NEED AN EXPONENTIALLY LARGE NUMBER OF HIDDEN UNITS

5  DIFFICULTY IN TRAINING DEEP NETWORKS  NON-CONVEX NATURE OF THE OPTIMIZATION – GETS STUCK IN LOCAL MINIMA  VANISHING GRADIENTS DURING BACKPROPAGATION  SOLUTION – "INITIAL WEIGHTS MUST BE CLOSE TO A GOOD SOLUTION" [Hinton et al., 2006]  GENERATIVE PRE-TRAINING FOLLOWED BY FINE-TUNING

6  PRE-TRAINING  INCREMENTAL LAYER-WISE TRAINING  EACH LAYER ONLY TRIES TO REPRODUCE THE HIDDEN-LAYER ACTIVATIONS OF THE PREVIOUS LAYER
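
A minimal numpy sketch of the layer-wise scheme (function names, tied weights, and hyperparameters are illustrative assumptions, not the thesis code): each layer is trained as a shallow autoencoder on the activations produced by the layer below it.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def train_shallow_ae(X, n_hidden, lr=0.1, epochs=100, seed=0):
    """One sigmoid autoencoder layer with tied weights, trained by
    gradient descent on squared error (illustrative stand-in)."""
    rng = np.random.RandomState(seed)
    W = 0.01 * rng.randn(X.shape[1], n_hidden)
    for _ in range(epochs):
        H = sigmoid(X @ W)                 # encode
        Z = sigmoid(H @ W.T)               # decode with tied weights
        dZ = (Z - X) * Z * (1 - Z)         # gradient at decoder pre-activation
        dH = (dZ @ W) * H * (1 - H)        # gradient at encoder pre-activation
        W -= lr * ((X.T @ dH) + (dZ.T @ H)) / len(X)
    return W

def pretrain(X, layer_sizes):
    """Greedy layer-wise pre-training: each new layer is fit to the
    hidden activations of the previous one."""
    weights, inp = [], X
    for n_hidden in layer_sizes:
        W = train_shallow_ae(inp, n_hidden)
        weights.append(W)
        inp = sigmoid(inp @ W)             # activations feed the next layer
    return weights
```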

7  INITIALIZE THE AUTOENCODER WITH THE WEIGHTS LEARNT BY PRE-TRAINING  PERFORM BACKPROPAGATION AS USUAL
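
A sketch of the unrolling step, assuming the tied-weight encoder stack from the sketch above: the pre-trained encoder weights become the first half of the deep autoencoder and their transposes become the decoder, after which ordinary backpropagation fine-tunes everything end-to-end.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def unroll(weights):
    """Encoder weights followed by their transposes as the decoder."""
    return weights + [W.T for W in reversed(weights)]

def forward(X, params):
    A = X
    for W in params:        # e.g. 784-500-200-30-200-500-784 (illustrative)
        A = sigmoid(A @ W)
    return A                # reconstruction of X

# Fine-tuning = standard backpropagation on the reconstruction loss,
# starting from these weights instead of a random initialization.
```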

8  STOCHASTIC – RESTRICTED BOLTZMANN MACHINES (RBMs)  HIDDEN-LAYER ACTIVATIONS IN (0, 1) USED AS PROBABILITIES FOR SAMPLING BINARY STATES (0 OR 1)  MODEL LEARNS THE JOINT DISTRIBUTION OF TWO SETS OF BINARY VARIABLES – ONE OVER THE INPUT LAYER AND ONE OVER THE HIDDEN LAYER  EXACT METHODS – COMPUTATIONALLY INTRACTABLE  NUMERICAL APPROXIMATION – CONTRASTIVE DIVERGENCE
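
A minimal sketch of one contrastive-divergence update (CD-1) for a binary RBM; variable names and the learning rate are illustrative, not from the thesis.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def cd1_update(v0, W, b_vis, b_hid, rng, lr=0.1):
    """One CD-1 step on a batch of binary visible vectors v0."""
    # Positive phase: sample hidden states from the data.
    p_h0 = sigmoid(v0 @ W + b_hid)
    h0 = (rng.uniform(size=p_h0.shape) < p_h0).astype(float)
    # Negative phase: one step of Gibbs sampling back and forth.
    p_v1 = sigmoid(h0 @ W.T + b_vis)
    v1 = (rng.uniform(size=p_v1.shape) < p_v1).astype(float)
    p_h1 = sigmoid(v1 @ W + b_hid)
    # Approximate log-likelihood gradient: data stats minus model stats.
    n = len(v0)
    W += lr * (v0.T @ p_h0 - v1.T @ p_h1) / n
    b_vis += lr * (v0 - v1).mean(axis=0)
    b_hid += lr * (p_h0 - p_h1).mean(axis=0)
```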

9  DETERMINISTIC – SHALLOW AUTOENCODERS  HIDDEN-LAYER ACTIVATIONS IN (0, 1) USED DIRECTLY AS INPUT TO THE NEXT LAYER  TRAINED BY BACKPROPAGATION  DENOISING AUTOENCODERS  CONTRACTIVE AUTOENCODERS  SPARSE AUTOENCODERS
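
For the denoising variant, the only change to the shallow autoencoder is that training reconstructs the clean input from a corrupted copy; a sketch with masking noise (the corruption level is illustrative):

```python
import numpy as np

def corrupt(X, rng, p=0.3):
    """Masking noise: zero each entry independently with probability p."""
    mask = rng.uniform(size=X.shape) >= p
    return X * mask

# A denoising autoencoder then minimises
#     loss(decode(encode(corrupt(X))), X)
# rather than
#     loss(decode(encode(X)), X).
```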

10  TASK \ MODEL | RBM                                               | SHALLOW AE
    CLASSIFIER   | [Hinton et al., 2006] and many others since then  | Investigated by [Bengio et al., 2007], [Ranzato et al., 2007], [Vincent et al., 2008], [Rifai et al., 2011], etc.
    DEEP AE      | [Hinton & Salakhutdinov, 2006]                    | No significant results reported in literature – GAP

11  MNIST  Big and Small Digits

12  Square & Room  2D Robot Arm  3D Robot Arm

13  Libraries used  Numpy, Scipy  Theano – handles GPU parallelization  GPU specifications – Tesla C1060  Memory – 256 MB  Frequency – 33 MHz  Number of cores – 240
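
A minimal Theano sketch of the kind of computation this setup runs (layer sizes are illustrative; on Theano of that era, setting THEANO_FLAGS=device=gpu,floatX=float32 moved the compiled function onto the GPU):

```python
import numpy as np
import theano
import theano.tensor as T

# Shallow tied-weight autoencoder with a cross-entropy cost, compiled
# by Theano; gradients come from symbolic differentiation.
x = T.matrix('x')
rng = np.random.RandomState(0)
W = theano.shared(np.asarray(0.01 * rng.randn(784, 256),
                             dtype=theano.config.floatX), name='W')
b = theano.shared(np.zeros(256, dtype=theano.config.floatX), name='b')
c = theano.shared(np.zeros(784, dtype=theano.config.floatX), name='c')

h = T.nnet.sigmoid(T.dot(x, W) + b)          # encoder
z = T.nnet.sigmoid(T.dot(h, W.T) + c)        # tied-weight decoder
cost = T.nnet.binary_crossentropy(z, x).mean()

lr = 0.1
grads = T.grad(cost, [W, b, c])
updates = [(p, p - lr * g) for p, g in zip([W, b, c], grads)]
train_step = theano.function([x], cost, updates=updates)
# train_step(batch) performs one gradient-descent step and returns the cost.
```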

14  REVERSE CROSS-ENTROPY  $L(X, Z; \Theta) = -\sum_k \left[ x_k \log z_k + (1 - x_k) \log(1 - z_k) \right]$  X – ORIGINAL INPUT  Z – OUTPUT (RECONSTRUCTION)  Θ – PARAMETERS – WEIGHTS AND BIASES
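
The same loss written out in numpy (the eps clipping is a standard numerical guard against log(0), not from the slide):

```python
import numpy as np

def cross_entropy(X, Z, eps=1e-10):
    """Reconstruction cross-entropy: sum over units, mean over the batch.
    X and Z are arrays with entries in (0, 1)."""
    Z = np.clip(Z, eps, 1 - eps)
    return -np.sum(X * np.log(Z) + (1 - X) * np.log(1 - Z), axis=1).mean()
```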

15  RESULTS FROM PRELIMINARY EXPERIMENTS

16  TIME TAKEN FOR TRAINING  CONTRACTIVE AUTOENCODERS TAKE A VERY LONG TIME TO TRAIN

17  EXPERIMENT USING SPARSE REPRESENTATIONS  STRATEGY A – BOTTLENECK  STRATEGY B – SPARSITY + BOTTLENECK  STRATEGY C – NO CONSTRAINT + BOTTLENECK
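
The slides do not spell out the form of the sparsity penalty; one common choice, shown here purely as an assumption, is the KL divergence between a target activation rho and each hidden unit's mean activation:

```python
import numpy as np

def sparsity_penalty(H, rho=0.05, eps=1e-10):
    """KL(rho || rho_hat) summed over hidden units, where rho_hat is the
    mean activation of each unit over the batch. rho is illustrative."""
    rho_hat = np.clip(H.mean(axis=0), eps, 1 - eps)
    return np.sum(rho * np.log(rho / rho_hat)
                  + (1 - rho) * np.log((1 - rho) / (1 - rho_hat)))

# Total cost = reconstruction loss + beta * sparsity_penalty(H),
# applied to whichever layers the strategy constrains
# (e.g. alternate layers, as slide 20 reports).
```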

19  MOMENTUM  INCORPORATES THE PREVIOUS UPDATE  CANCELS OUT COMPONENTS IN OPPOSITE DIRECTIONS – PREVENTS OSCILLATION  ADDS UP COMPONENTS IN THE SAME DIRECTION – SPEEDS UP TRAINING  WEIGHT DECAY  REGULARIZATION  PREVENTS OVER-FITTING
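
Both tricks fit in a one-line update rule; a sketch with illustrative coefficients (mu and lam are not the thesis settings):

```python
import numpy as np

def sgd_update(W, grad, velocity, lr=0.1, mu=0.9, lam=1e-4):
    """Gradient descent with momentum (mu) and weight decay (lam).
    The velocity carries over a fraction of the previous update."""
    velocity = mu * velocity - lr * (grad + lam * W)
    return W + velocity, velocity
```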

20  ALTERNATE-LAYER SPARSITY WITH MOMENTUM & WEIGHT DECAY YIELDS THE BEST RESULTS

21  MOTIVATION

