
1 Andrew Ng Advanced topics

2 Andrew Ng Outline Self-taught learning Learning feature hierarchies (Deep learning) Scaling up

3 Andrew Ng Self-taught learning

4 Andrew Ng Supervised learning Testing: What is this? Cars Motorcycles

5 Andrew Ng Semi-supervised learning Unlabeled images (all cars/motorcycles) Testing: What is this? Car Motorcycle

6 Andrew Ng Self-taught learning Testing: What is this? Car Motorcycle Unlabeled images (random internet images)

7 Andrew Ng Self-taught learning Sparse coding, LCC, etc. learn bases φ1, …, φk from the unlabeled data. Use the learned φ1, …, φk to represent the labeled training/test sets (cars, motorcycles): using φ1, …, φk, each input is represented by its activations a1, …, ak. If the labeled training set is small, this can give a huge performance boost.
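
To make the encoding step concrete, here is a minimal numpy sketch, assuming the bases φ1, …, φk have already been learned from unlabeled data; the ISTA solver, the penalty weight lam, and the random stand-ins for Phi and x are illustrative assumptions, not the specific sparse-coding setup used in the slides.

```python
import numpy as np

def sparse_code(x, Phi, lam=0.1, n_iters=200):
    """Encode x as sparse activations a1..ak, so that x ~ Phi @ a.

    Minimizes 0.5*||x - Phi a||^2 + lam*||a||_1 by ISTA
    (iterative shrinkage-thresholding). Columns of Phi are the
    learned bases phi_1..phi_k.
    """
    a = np.zeros(Phi.shape[1])
    L = np.linalg.norm(Phi, 2) ** 2           # Lipschitz constant of the gradient
    for _ in range(n_iters):
        grad = Phi.T @ (Phi @ a - x)          # gradient of the squared-error term
        z = a - grad / L                      # gradient step
        a = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft-threshold
    return a

# Illustrative usage: featurize one (random stand-in) input with 32 learned bases.
Phi = np.random.randn(64, 32)
x = np.random.randn(64)
features = sparse_code(x, Phi)                # new representation a1..a32 for x
```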

8 Andrew Ng Learning feature hierarchies/Deep learning

9 Andrew Ng Why feature hierarchies pixels → edges → object parts (combinations of edges) → object models

10 Andrew Ng Deep learning algorithms Stacked sparse coding algorithm Deep Belief Network (DBN) (Hinton) Deep sparse autoencoders (Bengio) [Other related work: LeCun, Lee, Yuille, Ng, …]

11 Andrew Ng Deep learning with autoencoders Logistic regression Neural network Sparse autoencoder Deep autoencoder

12 Andrew Ng Logistic regression Logistic regression has a learned parameter vector θ. On input x, it outputs hθ(x) = 1 / (1 + exp(-θ^T x)). Draw a logistic regression unit as a single node whose inputs are x1, x2, x3 and +1 (the intercept term).
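
As a small illustration (not part of the original slides), a single logistic unit can be written in a few lines of numpy; the particular values for x and theta below are made up.

```python
import numpy as np

def logistic_unit(x, theta):
    """One logistic regression unit: h_theta(x) = 1 / (1 + exp(-theta^T x)).

    x is assumed to already include the +1 intercept ("bias") entry.
    """
    return 1.0 / (1.0 + np.exp(-np.dot(theta, x)))

# Made-up example with inputs x1, x2, x3 and the +1 bias term.
x = np.array([0.5, -1.2, 2.0, 1.0])
theta = np.array([0.3, 0.1, -0.4, 0.2])
print(logistic_unit(x, theta))                # a value in (0, 1)
```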

13 Andrew Ng Neural Network String a lot of logistic units together. Example 3-layer network: Layer 1 has inputs x1, x2, x3 and +1; Layer 2 has hidden units a1, a2, a3; Layer 3 is the output unit.

14 Andrew Ng Neural Network Example 4-layer network with 2 output units: Layer 1 has inputs x1, x2, x3 and +1, Layers 2 and 3 are hidden layers (each with a +1 bias unit), and Layer 4 has the 2 output units.

15 Andrew Ng Neural Network example [Courtesy of Yann LeCun]

16 Andrew Ng Training a neural network Given training set (x1, y1), (x2, y2), (x3, y3), …, adjust the parameters θ (for every node) to make hθ(xi) ≈ yi, e.g. by minimizing Σi ||hθ(xi) - yi||². (Use gradient descent; the "backpropagation" algorithm computes the gradients. Susceptible to local optima.)
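
A minimal numpy sketch of this training loop for a small 3-layer network, using batch gradient descent with backpropagated gradients of the squared error; the sigmoid activations, learning rate, epoch count, and the XOR toy data are assumptions for illustration only.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_network(X, Y, n_hidden=3, lr=0.5, n_epochs=10000, seed=0):
    """Gradient descent on 1/(2m) * sum_i ||h_theta(x_i) - y_i||^2,
    with gradients computed by backpropagation."""
    rng = np.random.default_rng(seed)
    m, n_in = X.shape
    n_out = Y.shape[1]
    W1 = rng.normal(0.0, 0.5, (n_in, n_hidden)); b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.5, (n_hidden, n_out)); b2 = np.zeros(n_out)
    for _ in range(n_epochs):
        # Forward pass.
        A1 = sigmoid(X @ W1 + b1)             # hidden activations (Layer 2)
        A2 = sigmoid(A1 @ W2 + b2)            # outputs h_theta(x) (Layer 3)
        # Backward pass (backpropagation).
        d2 = (A2 - Y) * A2 * (1.0 - A2) / m
        d1 = (d2 @ W2.T) * A1 * (1.0 - A1)
        W2 -= lr * (A1.T @ d2); b2 -= lr * d2.sum(axis=0)
        W1 -= lr * (X.T @ d1);  b1 -= lr * d1.sum(axis=0)
    return W1, b1, W2, b2

# Toy usage on XOR, a problem that needs the hidden layer.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
Y = np.array([[0], [1], [1], [0]], dtype=float)
params = train_network(X, Y)                  # may land in a local optimum
```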

17 Andrew Ng Unsupervised feature learning with a neural network Autoencoder: a 3-layer network (Layer 1: inputs x1…x6 and +1; Layer 2: hidden units a1, a2, a3 and +1; Layer 3: outputs x1…x6) trained to output the input (learn the identity function). The solution is trivial unless we: constrain the number of units in Layer 2 (learn a compressed representation), or constrain Layer 2 to be sparse.

18 Andrew Ng Unsupervised feature learning with a neural network Training a sparse autoencoder: given an unlabeled training set x1, x2, …, minimize a reconstruction error term plus an L1 sparsity term on the hidden activations a1, a2, a3, e.g. Σi ||x̂i - xi||² + λ Σj |aj|.
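
A minimal sketch of such an objective in numpy, assuming a one-hidden-layer autoencoder with sigmoid units and inputs scaled to [0, 1]; the penalty weight lam and the choice of a plain L1 penalty follow the slide's wording but are otherwise illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sparse_autoencoder_loss(W1, b1, W2, b2, X, lam=1e-3):
    """Reconstruction error plus L1 sparsity penalty on the hidden activations.

    X: (m, n) matrix of unlabeled examples x1, x2, ... (values in [0, 1]).
    """
    A = sigmoid(X @ W1 + b1)                  # hidden activations a1, a2, a3, ...
    X_hat = sigmoid(A @ W2 + b2)              # reconstruction of the input
    reconstruction_error = np.sum((X_hat - X) ** 2)
    l1_sparsity = lam * np.sum(np.abs(A))
    return reconstruction_error + l1_sparsity
```

The weights would then be fit by gradient descent (backpropagation) on this loss, just as for the supervised network above, but with the input itself as the target.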

19 Andrew Ng Unsupervised feature learning with a neural network [autoencoder diagram: Layer 1 with inputs x1…x6 and +1, Layer 2 with hidden units a1, a2, a3, Layer 3 reconstructing x1…x6]

20 Andrew Ng Unsupervised feature learning with a neural network [diagram: inputs x1…x6 and +1 feeding hidden units a1, a2, a3] New representation for the input.

21 Andrew Ng Unsupervised feature learning with a neural network [diagram: inputs x1…x6 and +1 feeding hidden units a1, a2, a3]

22 Andrew Ng Unsupervised feature learning with a neural network [diagram: inputs x1…x6 feeding a1, a2, a3, feeding a new layer b1, b2, b3] Train the new parameters so that the b's reconstruct the a's, subject to the bi's being sparse.

25 Andrew Ng Unsupervised feature learning with a neural network [diagram: inputs x1…x6 feeding a1, a2, a3, feeding b1, b2, b3] New representation for the input.

26 Andrew Ng Unsupervised feature learning with a neural network [diagram: inputs x1…x6 feeding a1, a2, a3, feeding b1, b2, b3]

27 Andrew Ng Unsupervised feature learning with a neural network [diagram: inputs x1…x6 feeding a1, a2, a3, feeding b1, b2, b3, feeding c1, c2, c3]

28 Andrew Ng Unsupervised feature learning with a neural network [diagram: inputs x1…x6 feeding a1, a2, a3, feeding b1, b2, b3, feeding c1, c2, c3] New representation for the input: use [c1, c2, c3] as the representation to feed to the learning algorithm.
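
A sketch of this greedy, layer-by-layer stacking; train_sparse_autoencoder is a hypothetical helper (it is not defined in the slides) that trains one sparse autoencoder on a data matrix and returns its encoding function.

```python
def greedy_layerwise_features(X, layer_sizes, train_sparse_autoencoder):
    """Stack sparse autoencoders: train the first layer on the raw inputs x,
    the next on the a's, the next on the b's, and return the top-level codes
    (the c's) to feed to a supervised learning algorithm.

    train_sparse_autoencoder(data, n_hidden) -> (encode_fn, params) is a
    hypothetical helper; each encode_fn maps a data matrix to activations.
    """
    reps = X
    encoders = []
    for n_hidden in layer_sizes:              # e.g. [n_a, n_b, n_c]
        encode, _params = train_sparse_autoencoder(reps, n_hidden)
        encoders.append(encode)
        reps = encode(reps)                   # new representation for the input
    return reps, encoders                     # reps plays the role of [c1, c2, c3]
```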

29 Andrew Ng Deep Belief Net Deep Belief Net (DBN) is another algorithm for learning a feature hierarchy. Building block: 2-layer graphical model (Restricted Boltzmann Machine). Can then learn additional layers one at a time.

30 Andrew Ng Restricted Boltzmann machine (RBM) Input [x1, x2, x3, x4]; Layer 2 [a1, a2, a3] (binary-valued). MRF with joint distribution P(x, a) ∝ exp(Σij Wij xi aj) (connections only between the x's and the a's). Use Gibbs sampling for inference. Given observed inputs x, want the maximum likelihood estimate of W, i.e. maximize log P(x) = log Σa P(x, a).

31 Andrew Ng Restricted Boltzmann machine (RBM) Input [x1, x2, x3, x4]; Layer 2 [a1, a2, a3] (binary-valued). Gradient ascent on log P(x): ∂ log P(x) / ∂Wij = E[xi aj]obs - E[xi aj]prior, where [xi aj]obs comes from fixing x to the observed value and sampling a from P(a|x), and [xi aj]prior comes from running Gibbs sampling to convergence. Adding a sparsity constraint on the ai's usually improves results.
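
In practice the [xi aj]prior term is usually approximated rather than obtained by running Gibbs sampling to convergence; the sketch below uses a single Gibbs step (contrastive divergence, CD-1) for a binary RBM without bias terms, a common approximation rather than the exact procedure described on the slide.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cd1_update(W, x, lr=0.01, rng=None):
    """One approximate gradient-ascent step on log P(x) for a binary RBM.

    W[i, j] couples visible unit x_i and hidden unit a_j (biases omitted).
    The "obs" statistics fix x and use P(a|x); the "prior" statistics are
    approximated with a single Gibbs step x -> a -> x' -> a' (CD-1).
    """
    if rng is None:
        rng = np.random.default_rng()
    # Positive ("obs") phase: x fixed to its observed value.
    p_a = sigmoid(x @ W)                          # P(a_j = 1 | x)
    a = (rng.random(p_a.shape) < p_a).astype(float)
    obs = np.outer(x, p_a)                        # ~ E[x_i a_j]_obs
    # Negative ("prior") phase, approximated by one Gibbs step.
    p_x = sigmoid(a @ W.T)                        # P(x_i = 1 | a)
    x_neg = (rng.random(p_x.shape) < p_x).astype(float)
    p_a_neg = sigmoid(x_neg @ W)
    prior = np.outer(x_neg, p_a_neg)              # ~ E[x_i a_j]_prior
    return W + lr * (obs - prior)                 # gradient ascent step
```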

32 Andrew Ng Deep Belief Network Input [x1, x2, x3, x4]; Layer 2 [a1, a2, a3]; Layer 3 [b1, b2, b3]. Similar to a sparse autoencoder in many ways. Stack RBMs on top of each other to get a DBN. Train with approximate maximum likelihood, often with a sparsity constraint on the ai's.

33 Andrew Ng Deep Belief Network Input [x 1, x 2, x 3, x 4 ] Layer 2. [a 1, a 2, a 3 ] Layer 3. [b 1, b 2, b 3 ] Layer 4. [c 1, c 2, c 3 ]

34 Andrew Ng Deep learning examples

35 Andrew Ng Convolutional DBN for audio [diagram: spectrogram input feeding detection units, which feed a max-pooling unit]

36 Andrew Ng Convolutional DBN for audio Spectrogram

37 Andrew Ng Probabilistic max pooling Convolutional neural net: a pooling unit computes max{x1, x2, x3, x4}, where the xi are real numbers. Convolutional DBN: a pooling unit computes max{x1, x2, x3, x4}, where the xi are {0,1} and mutually exclusive, so there are only 5 possible cases. This collapses 2^n configurations into n+1 configurations and permits bottom-up and top-down inference.
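
A small sketch of sampling one pooling block under this scheme, using only the bottom-up inputs to the detection units (the top-down term is omitted here for brevity); the function and variable names are illustrative, not from the slides.

```python
import numpy as np

def sample_pooling_block(bottom_up, rng=None):
    """Sample one probabilistic max-pooling block.

    bottom_up: length-n array of inputs I_1..I_n to the binary detection units.
    Because at most one detection unit may be on, the block has only n + 1
    configurations: "unit i on" for some i, or "all units off".
    Returns (detection, pooled): the 0/1 detection vector and the pooling unit.
    """
    if rng is None:
        rng = np.random.default_rng()
    n = len(bottom_up)
    logits = np.append(np.asarray(bottom_up, dtype=float), 0.0)  # last slot = "all off"
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()                          # softmax over the n + 1 cases
    choice = rng.choice(n + 1, p=probs)
    detection = np.zeros(n)
    if choice < n:
        detection[choice] = 1.0
    pooled = detection.max()                      # 1 iff some detection unit is on
    return detection, pooled
```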

38 Andrew Ng Convolutional DBN for audio Spectrogram

39 Andrew Ng Convolutional DBN for audio [diagram: one CDBN layer (detection units and max pooling) feeding a second CDBN layer (detection units and max pooling)]

40 Andrew Ng CDBNs for speech Learned first-layer bases

41 Andrew Ng Convolutional DBN for Images [diagram] Input data V: visible nodes (binary or real). Detection layer H: hidden nodes (binary), computed with shared "filter" weights Wk. Max-pooling layer P: "max-pooling" nodes (binary); within each pooling block, at most one hidden node is active.

42 Andrew Ng Convolutional DBN on face images pixels → edges → object parts (combinations of edges) → object models. Note: sparsity is important for these results.

43 Andrew Ng Learning of object parts Examples of learned object parts from object categories: Faces, Cars, Elephants, Chairs.

44 Andrew Ng Training on multiple objects Second-layer and third-layer bases learned from 4 object categories. Plot of H(class | neuron active), trained on 4 classes (cars, faces, motorbikes, airplanes). Second layer: shared features and object-specific features. Third layer: more specific features.

45 Andrew Ng Hierarchical probabilistic inference Generating posterior samples from faces by "filling in" experiments (cf. Lee and Mumford, 2003); combines bottom-up and top-down inference. [figure: input images; samples from feedforward inference (control); samples from full posterior inference]

46 Andrew Ng Key issue in feature learning: Scaling up

47 Andrew Ng Scaling up with graphics processors [plot: peak GFlops versus US$ for NVIDIA GPUs and Intel CPUs; source: NVIDIA CUDA Programming Guide]

48 Andrew Ng Scaling up with GPUs [chart: approximate number of parameters (millions) learnable using GPUs (Raina et al., 2009)]

49 Andrew Ng Unsupervised feature learning: Does it work?

50 Andrew Ng State-of-the-art task performance
Audio:
  TIMIT phone classification: prior art (Clarkson et al., 1999) 79.6%; Stanford feature learning 80.3%
  TIMIT speaker identification: prior art (Reynolds, 1995) 99.7%; Stanford feature learning 100.0%
Images:
  CIFAR object classification: prior art (Yu and Zhang, 2010) 74.5%; Stanford feature learning 75.5%
  NORB object classification: prior art (Ranzato et al., 2009) 94.4%; Stanford feature learning 96.2%
Multimodal (audio/video):
  AVLetters lip reading: prior art (Zhao et al., 2009) 58.9%; Stanford feature learning 63.1%
Video:
  UCF activity classification: prior art (Kläser et al., 2008) 86%; Stanford feature learning 87%
  Hollywood2 classification: prior art (Laptev, 2004) 47%; Stanford feature learning 50%

51 Andrew Ng Summary Instead of hand-tuning features, use unsupervised feature learning! Sparse coding, LCC. Advanced topics: –Self-taught learning –Deep learning –Scaling up

52 Andrew Ng Other resources Workshop page: Code for Sparse coding, LCC. References. Full online tutorial.

