Published by Jameson Parrent. Modified over 3 years ago.

1
Self-taught Learning: Transfer Learning from Unlabeled Data
Rajat Raina, Honglak Lee, Roger Grosse, Alexis Battle, Chaitanya Ekanadham, Helen Kwong, Benjamin Packer, Narut Sereewattanawoot, Andrew Y. Ng
Stanford University

2
The “one learning algorithm” hypothesis
There is some evidence that the human brain uses essentially the same algorithm to understand many different input modalities.
– Example: Ferret experiments, in which the “input” for vision was plugged into the auditory part of the brain, and the auditory cortex learns to “see.” [Roe et al., 1992]
(Roe et al., 1992; Hawkins & Blakeslee, 2004)

3
The “one learning algorithm” hypothesis
If we could find this one learning algorithm, we would be done. (Finally!)

4
Finding a deep learning algorithm
If the brain really is one learning algorithm, it would suffice to just:
– Find a learning algorithm for a single layer, and
– Show that it can build a small number of layers.
We evaluate our algorithms:
– Against biology (e.g., Sparse RBMs for V2: Poster yesterday, Lee et al.).
– On applications (this talk).

5
Supervised learning
[Figure: Cars vs. Motorcycles; Train / Test]
Supervised learning algorithms may not work well with limited labeled data.

6
Learning in humans
Your brain has $10^{14}$ synapses (connections).
You will live for $10^{9}$ seconds.
If each synapse requires 1 bit to parameterize, you need to “learn” $10^{14}$ bits in $10^{9}$ seconds, or $10^{5}$ bits per second.
Human learning is largely unsupervised, and uses readily available unlabeled data.
(Geoffrey Hinton, personal communication)
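The estimate on this slide is a single division. A minimal Python check, using the slide's order-of-magnitude figures and its own assumption of 1 bit per synapse (the constant names are ours):

```python
# Order-of-magnitude figures taken from the slide.
SYNAPSES = 10**14          # synapses in the human brain
LIFETIME_SECONDS = 10**9   # rough lifetime in seconds
BITS_PER_SYNAPSE = 1       # the slide's simplifying assumption

bits_to_learn = SYNAPSES * BITS_PER_SYNAPSE
bits_per_second = bits_to_learn // LIFETIME_SECONDS
print(bits_per_second)  # 100000

# For scale: 10**9 seconds is a little under 32 years.
years = LIFETIME_SECONDS / (365.25 * 24 * 3600)
print(f"{years:.1f} years")
```

Labeled examples arrive far too slowly to supply information at that rate, which is the slide's argument for learning mostly from unlabeled data.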

7
Supervised learning
[Figure: Cars vs. Motorcycles; Train / Test]

8
“Brain-like” Learning
[Figure: Cars vs. Motorcycles; Train / Test; plus unlabeled images (randomly downloaded from the Internet)]

9
“Brain-like” Learning
– Labeled digits + unlabeled English characters → ?
– Labeled webpages + unlabeled newspaper articles → ?
– Labeled Russian speech + unlabeled English speech → ?

10
“Self-taught Learning”
– Labeled digits + unlabeled English characters → ?
– Labeled webpages + unlabeled newspaper articles → ?
– Labeled Russian speech + unlabeled English speech → ?

11
Recent history of machine learning
– 20 years ago: Supervised learning.
– 10 years ago: Semi-supervised learning.
– 10 years ago: Transfer learning.
– Next: Self-taught learning?
[Figure: example data for each setting; labels shown include Cars, Motorcycles, Bus, Tractor, Aircraft, Helicopter, and Natural scenes]

12
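Self-taught Learning
Labeled examples: $\{(x_l^{(1)}, y^{(1)}), (x_l^{(2)}, y^{(2)}), \dots, (x_l^{(m)}, y^{(m)})\}$
Unlabeled examples: $\{x_u^{(1)}, x_u^{(2)}, \dots, x_u^{(k)}\}$
The unlabeled and labeled data:
– Need not share labels $y$.
– Need not share a generative distribution.
Advantage: Such unlabeled data is often easy to obtain.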

13
A self-taught learning algorithm
Overview: Represent each labeled or unlabeled input as a sparse linear combination of “basis vectors”.
[Figure: an input image patch drawn as 0.8 × (basis) + 0.3 × (basis) + 0.5 × (basis)]
$x = 0.8\, b_{87} + 0.3\, b_{376} + 0.5\, b_{411}$
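To make the notation concrete, here is a minimal numpy sketch of a sparse linear combination. The input dimension, the dictionary size of 512 bases, and the random unit-norm bases are hypothetical; only the indices 87, 376, 411 and the coefficients 0.8, 0.3, 0.5 come from the slide.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 196          # input dimension (hypothetical, e.g. a 14x14 image patch)
num_bases = 512  # number of basis vectors (hypothetical)

# Hypothetical dictionary: one basis vector b_j per column, each with unit L2 norm.
B = rng.standard_normal((n, num_bases))
B /= np.linalg.norm(B, axis=0)

# Sparse activation vector: only three of the 512 coefficients are nonzero.
a = np.zeros(num_bases)
a[[87, 376, 411]] = [0.8, 0.3, 0.5]

# The input is the dictionary times the sparse activations.
x = B @ a

print(np.count_nonzero(a), "of", num_bases, "coefficients are nonzero")
print(np.allclose(x, 0.8 * B[:, 87] + 0.3 * B[:, 376] + 0.5 * B[:, 411]))  # True
```

In this view the representation of x used downstream is the activation vector a, which has many dimensions but very few nonzero entries.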

14
A self-taught learning algorithm
Key steps:
1. Learn good bases using unlabeled data.
2. Use these learnt bases to construct “higher-level” features for the labeled data.
3. Apply a standard supervised learning algorithm on these features.

15
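Learning the bases: Sparse coding
Given only unlabeled data, we find good bases b using sparse coding:
$$\min_{b,\,a} \; \sum_{i=1}^{k} \Big\| x_u^{(i)} - \sum_{j} a_j^{(i)} b_j \Big\|_2^2 \;+\; \beta \sum_{i=1}^{k} \big\| a^{(i)} \big\|_1$$
The first term is the reconstruction error; the second is the sparsity penalty.
[Details: An extra normalization constraint on $b_j$ is required: $\|b_j\|_2 \le 1$ for all $j$.]
(Efficient algorithms: Lee et al., NIPS 2006)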
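A minimal numpy sketch of this optimization, for illustration only. It alternates between the two variables: with the bases fixed, it updates the activations by ISTA (iterative soft-thresholding); with the activations fixed, it takes one projected gradient step on the bases and rescales any basis with $\|b_j\|_2 > 1$ back onto the unit ball. This is a simple substitute, not the efficient algorithms of Lee et al. (NIPS 2006) cited on the slide; the function names, $\beta = 0.1$, the iteration counts, and the random stand-in data are all assumptions.

```python
import numpy as np


def soft_threshold(z, t):
    """Proximal operator of t * ||.||_1 (elementwise shrinkage toward zero)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def objective(X, B, A, beta):
    """Reconstruction error + beta * sparsity penalty, summed over examples.

    X: (n, k) unlabeled inputs x_u^(i) as columns
    B: (n, s) bases b_j as columns
    A: (s, k) activations a^(i) as columns
    """
    return np.sum((X - B @ A) ** 2) + beta * np.sum(np.abs(A))


def learn_bases(X, num_bases, beta=0.1, outer_iters=50, inner_iters=50, seed=0):
    """Alternating minimization: ISTA for A, projected gradient for B."""
    rng = np.random.default_rng(seed)
    B = rng.standard_normal((X.shape[0], num_bases))
    B /= np.linalg.norm(B, axis=0)  # start on the constraint set ||b_j||_2 <= 1
    A = np.zeros((num_bases, X.shape[1]))
    for _ in range(outer_iters):
        # Fix B; ISTA steps on A (L1-regularized least squares), step size 1/L.
        step = 1.0 / (2.0 * np.linalg.norm(B, 2) ** 2)
        for _ in range(inner_iters):
            A = soft_threshold(A - step * 2.0 * B.T @ (B @ A - X), step * beta)
        # Fix A; one gradient step on B (the sparsity penalty does not depend on B).
        step_b = 1.0 / (2.0 * np.linalg.norm(A, 2) ** 2 + 1e-12)
        B = B - step_b * 2.0 * (B @ A - X) @ A.T
        # Normalization constraint: project each column onto the unit L2 ball.
        B /= np.maximum(np.linalg.norm(B, axis=0), 1.0)
    return B, A


rng = np.random.default_rng(1)
X_unlabeled = rng.standard_normal((64, 500))  # stand-in for unlabeled inputs, one per column
beta = 0.1
B, A = learn_bases(X_unlabeled, num_bases=32, beta=beta)
print("objective with A = 0:   ", objective(X_unlabeled, B, np.zeros_like(A), beta))
print("objective after fitting:", objective(X_unlabeled, B, A, beta))
```

Both updates use a step size of $1/L$, with $L$ the Lipschitz constant of the reconstruction term's gradient, so neither update increases the objective. Random Gaussian data is used only so the sketch runs on its own; on natural image patches the columns of `B` would be the learnt bases.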

16
Example bases
– Natural images. Learnt bases: “Edges”
– Handwritten characters. Learnt bases: “Strokes”

17
Constructing features
Using the learnt bases b, compute features for the examples $x_l$ from the classification task by solving:
$$\hat{a}(x_l^{(i)}) = \arg\min_{a^{(i)}} \; \Big\| x_l^{(i)} - \sum_{j} a_j^{(i)} b_j \Big\|_2^2 \;+\; \beta \big\| a^{(i)} \big\|_1$$
(reconstruction error + sparsity penalty)
For example: $x_l = 0.8\, b_{87} + 0.3\, b_{376} + 0.5\, b_{411}$.
Finally, learn a classifier using a standard supervised learning algorithm (e.g., SVM) over these features.
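A minimal numpy sketch of this step. It solves the L1-regularized least-squares problem above with ISTA (a simple proximal-gradient method, substituted here for whatever solver the authors used), and a nearest-centroid classifier stands in for the SVM mentioned on the slide. The bases are random unit vectors and the two-class data is synthetic; in the actual pipeline `B` would be learnt from unlabeled data. All names, sizes, and $\beta$ are hypothetical.

```python
import numpy as np


def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def sparse_features(X, B, beta=0.1, iters=200):
    """Approximately solve min_a ||x - B a||_2^2 + beta*||a||_1 per column x of X (ISTA).

    X: (n, m) labeled inputs x_l^(i) as columns; B: (n, s) fixed bases.
    Returns (s, m): the sparse features a-hat(x_l^(i)) as columns.
    """
    L = 2.0 * np.linalg.norm(B, 2) ** 2  # Lipschitz constant of the smooth term's gradient
    A = np.zeros((B.shape[1], X.shape[1]))
    for _ in range(iters):
        A = soft_threshold(A - 2.0 * B.T @ (B @ A - X) / L, beta / L)
    return A


class NearestCentroid:
    """Hypothetical stand-in for 'a standard supervised learning algorithm'."""

    def fit(self, F, y):  # F: (m, d) feature rows, y: (m,) labels
        self.classes_ = np.unique(y)
        self.centroids_ = np.stack([F[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict(self, F):
        d = ((F[:, None, :] - self.centroids_[None, :, :]) ** 2).sum(axis=2)
        return self.classes_[np.argmin(d, axis=1)]


rng = np.random.default_rng(0)
n, s = 64, 128
B = rng.standard_normal((n, s))
B /= np.linalg.norm(B, axis=0)  # stand-in for learnt bases with ||b_j||_2 = 1


def make_examples(pool, count):
    """Synthetic inputs: each is 3 bases drawn from `pool`, plus a little noise."""
    X = np.zeros((n, count))
    for i in range(count):
        idx = rng.choice(pool, size=3, replace=False)
        X[:, i] = B[:, idx] @ rng.uniform(0.5, 1.0, size=3)
    return X + 0.01 * rng.standard_normal((n, count))


pools = {0: np.arange(0, 10), 1: np.arange(10, 20)}
X_train = np.hstack([make_examples(pools[c], 50) for c in (0, 1)])
X_test = np.hstack([make_examples(pools[c], 50) for c in (0, 1)])
y_train = np.repeat([0, 1], 50)
y_test = np.repeat([0, 1], 50)

clf = NearestCentroid().fit(sparse_features(X_train, B).T, y_train)
accuracy = np.mean(clf.predict(sparse_features(X_test, B).T) == y_test)
print("test accuracy:", accuracy)
```

The classifier only ever sees the sparse feature vectors, never the raw inputs; swapping in an SVM would change only the `NearestCentroid` part.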

18–21
Image classification
[Figure: large platypus image from the Caltech101 dataset, with feature visualizations]

22
Image classification (Caltech101, 15 labeled images per class)
– Baseline: 16%
– PCA: 37%
– Sparse coding: 47% (36.0% error reduction)
Other reported results:
– Fei-Fei et al., 2004: 16%
– Berg et al., 2005: 17%
– Holub et al., 2005: 40%
– Serre et al., 2005: 35%
– Berg et al., 2005: 48%
– Zhang et al., 2006: 59%
– Lazebnik et al., 2006: 56%

23
Character recognition
[Figure: example images from the Digits, Handwritten English, and English font datasets]
Handwritten English classification (20 labeled images per handwritten character); bases learnt on digits:
– Raw: 54.8%
– PCA: 54.8%
– Sparse coding: 58.5%
– 8.2% error reduction
English font classification (20 labeled images per font character); bases learnt on handwritten English:
– Raw: 17.9%
– PCA: 14.5%
– Sparse coding: 16.6%
– Sparse coding + Raw: 20.2%
– 2.8% error reduction
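Assuming “error reduction” on these slides means the relative reduction in error rate, $(e_{\text{base}} - e_{\text{new}})/e_{\text{base}}$ with $e = 100\% - \text{accuracy}$, both figures on this slide can be reproduced from the rounded accuracies, taking raw pixels as the baseline. The function name is ours.

```python
def error_reduction(baseline_acc, new_acc):
    """Relative reduction in error rate, in percent, given accuracies in percent."""
    baseline_err = 100.0 - baseline_acc
    new_err = 100.0 - new_acc
    return 100.0 * (baseline_err - new_err) / baseline_err


# Handwritten English: raw 54.8% -> sparse coding 58.5%
print(f"{error_reduction(54.8, 58.5):.1f}%")  # 8.2%
# English font: raw 17.9% -> sparse coding + raw 20.2%
print(f"{error_reduction(17.9, 20.2):.1f}%")  # 2.8%
```

Measured this way, a 3.7-point accuracy gain at roughly 55% accuracy removes about 8% of the remaining errors.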

24
Text classification
[Figure: Reuters newswire, Webpages, UseNet articles]
Webpage classification (2 labeled documents per class); bases learnt on Reuters newswire:
– Raw words: 62.8%
– PCA: 63.3%
– Sparse coding: 64.3%
– 4.0% error reduction
UseNet classification (2 labeled documents per class); bases learnt on Reuters newswire:
– Raw words: 61.3%
– PCA: 60.7%
– Sparse coding: 63.8%
– 6.5% error reduction

25
Shift-invariant sparse coding
[Figure: sparse features, basis functions, and reconstruction]
(Algorithms: Grosse et al., UAI 2007)
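In shift-invariant sparse coding, as we understand Grosse et al. (UAI 2007), each basis function is short and may occur at any time offset, so a signal is reconstructed as a sum of convolutions $x \approx \sum_j b_j * s_j$ with sparse activation signals $s_j$. The numpy sketch below shows only this reconstruction side (learning the bases and activations is not shown); all sizes, spike positions, and values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 200        # signal length (hypothetical)
width = 16     # basis function length (hypothetical)
num_bases = 3

# Short basis functions b_j, each scaled to unit L2 norm.
bases = rng.standard_normal((num_bases, width))
bases /= np.linalg.norm(bases, axis=1, keepdims=True)

# Sparse activation signals s_j: each spike marks where, and how strongly,
# basis j occurs in the signal.
S = np.zeros((num_bases, T - width + 1))
S[0, [10, 120]] = [1.0, 0.7]  # the same basis used at two different offsets
S[1, 60] = -0.5
S[2, 150] = 0.9

# Reconstruction: x = sum_j (b_j * s_j), where * is full 1-D convolution.
x = sum(np.convolve(S[j], bases[j]) for j in range(num_bases))
print(x.shape)  # (200,)
```

Because basis 0 is reused at offsets 10 and 120, one learnt basis covers the same pattern wherever it appears, instead of needing a separate basis for each position.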

26
Audio classification
Speaker identification (5 labels, TIMIT corpus, 1 sentence per speaker); bases learnt on different dialects:
– Spectrogram: 38.5%
– MFCCs: 43.8%
– Sparse coding: 48.7%
– 8.7% error reduction
(Details: Grosse et al., UAI 2007)
Musical genre classification (5 labels, 18 seconds per genre); bases learnt on different genres, songs:
– Spectrogram: 48.4%
– MFCCs: 54.0%
– Music-specific model: 49.3%
– Sparse coding: 56.6%
– 5.7% error reduction

27
Sparse deep belief networks
[Figure: network with visible layer v and hidden layer h; W, b, c: parameters. New: Sparse RBM]
(Details: Lee et al., NIPS 2007. Poster yesterday.)
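As we understand Lee et al. (NIPS 2007), the sparse RBM adds a regularizer to ordinary RBM training that pushes each hidden unit's average activation probability toward a small target $p$: $\lambda \sum_j \big(p - \frac{1}{m}\sum_{l=1}^{m} \mathbb{E}[h_j \mid v^{(l)}]\big)^2$. The numpy sketch below computes that penalty and its gradient with respect to the hidden biases, assuming binary hidden units with $P(h_j = 1 \mid v) = \sigma(\text{bias}_j + \sum_i W_{ij} v_i)$. It uses a generic `hidden_bias` rather than guessing which of the slide's b or c plays that role; `p`, `lam`, and the random data are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def hidden_probs(V, W, hidden_bias):
    """P(h_j = 1 | v) for every example (row of V) and hidden unit j.

    V: (m, num_visible), W: (num_visible, num_hidden), hidden_bias: (num_hidden,)
    """
    return sigmoid(V @ W + hidden_bias)


def sparsity_penalty(V, W, hidden_bias, p=0.02, lam=1.0):
    """lam * sum_j (p - mean_l E[h_j | v^(l)])^2."""
    mean_activation = hidden_probs(V, W, hidden_bias).mean(axis=0)
    return lam * np.sum((p - mean_activation) ** 2)


def sparsity_grad_hidden_bias(V, W, hidden_bias, p=0.02, lam=1.0):
    """Gradient of the penalty with respect to the hidden biases."""
    q = hidden_probs(V, W, hidden_bias)  # (m, num_hidden)
    mean_activation = q.mean(axis=0)
    # d/d bias_j of mean_l sigmoid(z_lj) is mean_l q_lj * (1 - q_lj).
    return -2.0 * lam * (p - mean_activation) * (q * (1.0 - q)).mean(axis=0)


rng = np.random.default_rng(0)
V = rng.random((100, 20))  # 100 hypothetical visible vectors
W = 0.1 * rng.standard_normal((20, 50))
hidden_bias = np.zeros(50)

before = sparsity_penalty(V, W, hidden_bias)
hidden_bias = hidden_bias - 5.0 * sparsity_grad_hidden_bias(V, W, hidden_bias)
after = sparsity_penalty(V, W, hidden_bias)
print(before > after)  # True: the step pushes average activations toward p
```

In full training this term would be added to the usual RBM update (e.g. contrastive divergence), which is not shown here.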

28
Sparse deep belief networks
Image classification (Caltech101 dataset):
– 1-layer sparse DBN: 44.5%
– 2-layer sparse DBN: 46.6%
– 3.2% error reduction
(Details: Lee et al., NIPS 2007. Poster yesterday.)

29
Summary
Self-taught learning: Unlabeled data does not share the labels of the classification task.
– Use unlabeled data to discover features.
– Use sparse coding to construct an easy-to-classify, “higher-level” representation.
[Figure: unlabeled images; Cars vs. Motorcycles; an input drawn as 0.8 × (basis) + 0.3 × (basis) + 0.5 × (basis)]

30
THE END

31
Related Work
– Weston et al., ICML 2006: Make stronger assumptions on the unlabeled data.
– Ando & Zhang, JMLR 2005: For natural language tasks and character recognition, use heuristics to construct a transfer learning task using unlabeled data.
