Rajat Raina, Honglak Lee, Roger Grosse, Alexis Battle, Chaitanya Ekanadham, Helen Kwong, Benjamin Packer, Narut Sereewattanawoot, Andrew Y. Ng. Stanford University.


1 Self-taught Learning: Transfer Learning from Unlabeled Data. Rajat Raina, Honglak Lee, Roger Grosse, Alexis Battle, Chaitanya Ekanadham, Helen Kwong, Benjamin Packer, Narut Sereewattanawoot, Andrew Y. Ng. Stanford University.

2 The “one learning algorithm” hypothesis. There is some evidence that the human brain uses essentially the same algorithm to understand many different input modalities. Example: ferret experiments in which the “input” for vision was plugged into the auditory part of the brain, and the auditory cortex learned to “see” [Roe et al., 1992]. (Roe et al.; Hawkins & Blakeslee, 2004)

3 The “one learning algorithm” hypothesis. There is some evidence that the human brain uses essentially the same algorithm to understand many different input modalities. Example: ferret experiments in which the “input” for vision was plugged into the auditory part of the brain, and the auditory cortex learned to “see” [Roe et al., 1992]. If we could find this one learning algorithm, we would be done. (Finally!) (Roe et al.; Hawkins & Blakeslee, 2004)

4 Finding a deep learning algorithm. This talk: if the brain really is one learning algorithm, it would suffice to (1) find a learning algorithm for a single layer, and (2) show that it can build a small number of layers. We evaluate our algorithms against biology and on applications. E.g., sparse RBMs for V2: poster yesterday (Lee et al.).

5 Supervised learning. [Figure: Cars and Motorcycles images, split into Train and Test.] Supervised learning algorithms may not work well with limited labeled data.

6 Learning in humans. Your brain has 10^14 synapses (connections). You will live for 10^9 seconds. If each synapse requires 1 bit to parameterize, you need to “learn” 10^14 bits in 10^9 seconds, or 10^5 bits per second. Human learning is largely unsupervised, and uses readily available unlabeled data. (Geoffrey Hinton, personal communication)

7 Supervised learning. [Figure: Cars and Motorcycles images, split into Train and Test.]

8 “Brain-like” learning. [Figure: Cars and Motorcycles images, split into Train and Test, plus unlabeled images randomly downloaded from the Internet.]

9 “Brain-like” learning
Labeled digits + unlabeled English characters = ?
Labeled webpages + unlabeled newspaper articles = ?
Labeled Russian speech + unlabeled English speech = ?

10 “Self-taught learning”
Labeled digits + unlabeled English characters = ?
Labeled webpages + unlabeled newspaper articles = ?
Labeled Russian speech + unlabeled English speech = ?

11 Recent history of machine learning
20 years ago: supervised learning.
10 years ago: semi-supervised learning.
10 years ago: transfer learning.
Next: self-taught learning?
[Figure: example image sets for each setting, with labels including Cars, Motorcycles, Bus, Tractor, Aircraft, Helicopter, and Natural scenes.]

12 Self-taught learning. Given labeled examples and unlabeled examples, the unlabeled and labeled data need not share labels y and need not share a generative distribution. Advantage: such unlabeled data is often easy to obtain.

13 A self-taught learning algorithm. Overview: represent each labeled or unlabeled input as a sparse linear combination of “basis vectors”. [Figure: an input x written as a weighted sum of three basis vectors b, with leading coefficient 0.8.]

14 A self-taught learning algorithm. Key steps:
1. Learn good bases using unlabeled data.
2. Use these learnt bases to construct “higher-level” features for the labeled data.
3. Apply a standard supervised learning algorithm on these features.
[Figure: an input x written as a weighted sum of three basis vectors b, with leading coefficient 0.8.]

15 Learning the bases: sparse coding. Given only unlabeled data, we find good bases b using sparse coding: [Equation: reconstruction error + sparsity penalty.] [Details: An extra normalization constraint on the bases b is required.] (Efficient algorithms: Lee et al., NIPS 2006)
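A common way to write a sparse coding objective with these two terms is sketched below. The symbols x_u^{(i)} (unlabeled inputs), a^{(i)} (activations), β (sparsity weight), and the unit-norm form of the normalization constraint are assumptions that follow the usual formulation; they are not notation taken from the slide.

```latex
\[
\min_{b,\,a} \; \sum_i \Big(
  \underbrace{\big\| x_u^{(i)} - \textstyle\sum_j a_j^{(i)} b_j \big\|_2^2}_{\text{reconstruction error}}
  \; + \;
  \underbrace{\beta \, \big\| a^{(i)} \big\|_1}_{\text{sparsity penalty}}
\Big)
\quad \text{subject to } \| b_j \|_2 \le 1 \ \text{for all } j .
\]
```

Without a bound on the bases, the penalty could be made arbitrarily small by scaling b up and a down, which is why some normalization constraint is needed.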

16 Example bases Natural images. Learnt bases: “Edges” Self-taught Learning Handwritten characters. Learnt bases: “Strokes”

17 Constructing features. Using the learnt bases b, compute features for the examples x_l from the classification task by solving: [Equation: reconstruction error + sparsity penalty.] Finally, learn a classifier using a standard supervised learning algorithm (e.g., SVM) over these features. [Figure: a labeled input x_l written as a weighted sum of three basis vectors b, with leading coefficient 0.8.]
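A minimal, dependency-free sketch of the feature-construction step, assuming the objective is squared reconstruction error plus an L1 penalty on the activations with the bases held fixed. The solver is plain ISTA (iterative soft-thresholding), chosen here only for brevity; it is not the efficient algorithm of Lee et al. The names `sparse_features`, `soft_threshold`, `beta`, and `n_iter` are hypothetical.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (proximal step for the L1 penalty)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def sparse_features(x, bases, beta=0.1, n_iter=500):
    """Approximately minimize ||x - sum_j a[j] * bases[j]||^2 + beta * ||a||_1 over a.

    x is a list of floats; bases is a list of basis vectors of the same length.
    The bases stay fixed; only the activations a are optimized.
    """
    n_bases, dim = len(bases), len(x)
    # 2 * ||B||_F^2 upper-bounds the Lipschitz constant of the gradient,
    # so 1 / lipschitz is a safe step size.
    lipschitz = 2.0 * sum(v * v for b in bases for v in b)
    a = [0.0] * n_bases
    for _ in range(n_iter):
        recon = [sum(a[j] * bases[j][i] for j in range(n_bases)) for i in range(dim)]
        resid = [recon[i] - x[i] for i in range(dim)]
        # Gradient of the squared reconstruction error with respect to each a[j].
        grad = [2.0 * sum(bases[j][i] * resid[i] for i in range(dim))
                for j in range(n_bases)]
        a = [soft_threshold(a[j] - grad[j] / lipschitz, beta / lipschitz)
             for j in range(n_bases)]
    return a


# Toy input: mostly aligned with the first basis, with a little of the second.
features = sparse_features([0.8, 0.02, 0.0], [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
print(features)
```

The resulting activation vector is what would be passed to the supervised learner (e.g., an SVM) in place of the raw input. Small components are driven exactly to zero by the thresholding step, which is what makes the features sparse.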

18 Image classification. Large image (platypus from the Caltech101 dataset). Feature visualization.

19 Image classification. Platypus image (Caltech101 dataset). Feature visualization.

20 Image classification. Platypus image (Caltech101 dataset). Feature visualization.

21 Image classification. Platypus image (Caltech101 dataset). Feature visualization.

22 Image classification (15 labeled images per class)
Baseline: 16%
PCA: 37%
Sparse coding: 47% (36.0% error reduction)
Other reported results: Fei-Fei et al., 2004: 16%; Berg et al., 2005: 17%; Holub et al., 2005: 40%; Serre et al., 2005: 35%; Berg et al., 2005: 48%; Zhang et al., 2006: 59%; Lazebnik et al., 2006: 56%.

23 Character recognition. [Figure: example digits, handwritten English characters, and English font characters.]
Handwritten English classification (20 labeled images per handwritten character; bases learnt on digits)
Raw: 54.8%
PCA: 54.8%
Sparse coding: 58.5% (8.2% error reduction)
English font classification (20 labeled images per font character; bases learnt on handwritten English)
Raw: 17.9%
PCA: 14.5%
Sparse coding: 16.6%
Sparse coding + Raw: 20.2% (2.8% error reduction)

24 Text classification. [Figure: Reuters newswire, webpages, UseNet articles.]
Webpage classification (2 labeled documents per class; bases learnt on Reuters newswire)
Raw words: 62.8%
PCA: 63.3%
Sparse coding: 64.3% (4.0% error reduction)
UseNet classification (2 labeled documents per class; bases learnt on Reuters newswire)
Raw words: 61.3%
PCA: 60.7%
Sparse coding: 63.8% (6.5% error reduction)
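The “error reduction” figures on this slide are consistent with a relative reduction in error rate against the raw-words row, reading the percentages as accuracies. That reading is an assumption inferred from the numbers, and the function name below is hypothetical.

```python
def relative_error_reduction(baseline_accuracy, new_accuracy):
    """Percent reduction in error rate when accuracy moves from baseline to new.

    Both arguments are accuracies in percent (0 to 100).
    """
    baseline_error = 100.0 - baseline_accuracy
    new_error = 100.0 - new_accuracy
    return 100.0 * (baseline_error - new_error) / baseline_error


# Webpage classification: raw words 62.8% vs. sparse coding 64.3%.
print(round(relative_error_reduction(62.8, 64.3), 1))  # prints 4.0
# UseNet classification: raw words 61.3% vs. sparse coding 63.8%.
print(round(relative_error_reduction(61.3, 63.8), 1))  # prints 6.5
```

Relative error reduction makes a gain of a point or two in accuracy easier to compare across tasks whose baselines differ widely.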

25 Shift-invariant sparse coding. [Figure: sparse features, basis functions, and the resulting reconstruction.] (Algorithms: Grosse et al., UAI 2007)
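A minimal sketch of the reconstruction side of a shift-invariant model, assuming a 1-D signal is modelled as a sum of short basis functions, each placed at the time offsets where its sparse activations are nonzero (that is, a sum of convolutions). It only illustrates how sparse features and basis functions combine into a reconstruction; it does not learn anything, and `reconstruct` and its arguments are hypothetical names.

```python
def reconstruct(activations, basis_functions, length):
    """Sum, over bases, of each basis function placed at its active time offsets.

    activations[j][t] is the coefficient of basis j starting at time t;
    basis_functions[j] is a short list of samples.
    """
    signal = [0.0] * length
    for acts, basis in zip(activations, basis_functions):
        for start, coeff in enumerate(acts):
            if coeff == 0.0:
                continue  # sparse: most offsets contribute nothing
            for k, sample in enumerate(basis):
                if start + k < length:  # truncate at the end of the signal
                    signal[start + k] += coeff * sample
    return signal


basis_functions = [[1.0, 0.5], [1.0, -1.0]]
activations = [
    [0.0, 2.0, 0.0, 0.0, 0.0],  # basis 0 fires once, at t = 1
    [0.0, 0.0, 0.0, 1.0, 0.0],  # basis 1 fires once, at t = 3
]
print(reconstruct(activations, basis_functions, 5))  # prints [0.0, 2.0, 1.0, 1.0, -1.0]
```

Because the same basis function can be reused at any offset, one basis can account for a sound wherever it occurs in the signal.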

26 Audio classification (Details: Grosse et al., UAI 2007)
Speaker identification (5 labels, TIMIT corpus, 1 sentence per speaker; bases learnt on different dialects)
Spectrogram: 38.5%
MFCCs: 43.8%
Sparse coding: 48.7% (8.7% error reduction)
Musical genre classification (5 labels, 18 seconds per genre; bases learnt on different genres, songs)
Spectrogram: 48.4%
MFCCs: 54.0%
Music-specific model: 49.3%
Sparse coding: 56.6% (5.7% error reduction)

27 Sparse deep belief networks. New: sparse RBM. h: hidden layer; v: visible layer; W, b, c: parameters. (Details: Lee et al., NIPS poster yesterday.)

28 Sparse deep belief networks. Image classification (Caltech101 dataset)
1-layer sparse DBN: 44.5%
2-layer sparse DBN: 46.6% (3.2% error reduction)
(Details: Lee et al., NIPS poster yesterday.)

29 Summary. Self-taught learning: the unlabeled data does not share the labels of the classification task. Use unlabeled data to discover features. Use sparse coding to construct an easy-to-classify, “higher-level” representation. [Figure: Cars and Motorcycles images, unlabeled images, and an input written as a weighted sum of basis vectors.]

30 THE END

31 Related work. Weston et al., ICML 2006: make stronger assumptions on the unlabeled data. Ando & Zhang, JMLR 2005: for natural language tasks and character recognition, use heuristics to construct a transfer learning task using unlabeled data.

