Self-taught Learning: Transfer Learning from Unlabeled Data. Rajat Raina, Honglak Lee, Roger Grosse, Alexis Battle, Chaitanya Ekanadham, Helen Kwong, Benjamin Packer, Narut Sereewattanawoot, Andrew Y. Ng. Stanford University.

Presentation transcript:

Self-taught Learning: Transfer Learning from Unlabeled Data. Rajat Raina, Honglak Lee, Roger Grosse, Alexis Battle, Chaitanya Ekanadham, Helen Kwong, Benjamin Packer, Narut Sereewattanawoot, Andrew Y. Ng. Stanford University.

The "one learning algorithm" hypothesis. There is some evidence that the human brain uses essentially the same algorithm to understand many different input modalities. Example: ferret experiments in which the "input" for vision was plugged into the auditory part of the brain, and the auditory cortex learned to "see." (Roe et al., 1992; Hawkins & Blakeslee, 2004.) If we could find this one learning algorithm, we would be done. (Finally!)

This talk: finding a deep learning algorithm. If the brain really is one learning algorithm, it would suffice to just: (1) find a learning algorithm for a single layer, and (2) show that it can be used to build a small number of layers. We evaluate our algorithms: against biology, and on applications. (E.g., sparse RBMs for V2: poster yesterday, Lee et al.)

Supervised learning. [Figure: labeled training and test images of cars and motorcycles.] Supervised learning algorithms may not work well with limited labeled data.

Learning in humans. Your brain has about 10^14 synapses (connections), and you will live for about 10^9 seconds. If each synapse requires 1 bit to parameterize, you need to "learn" roughly 10^14 bits in 10^9 seconds, or about 10^5 bits per second. Human learning is therefore largely unsupervised, and uses readily available unlabeled data. (Geoffrey Hinton, personal communication.)

"Brain-like" learning. [Figure: the same car/motorcycle train and test setup as before, now augmented with unlabeled images randomly downloaded from the Internet.]

"Brain-like" learning, which we call "self-taught learning": labeled digits + unlabeled English characters = ?; labeled webpages + unlabeled newspaper articles = ?; labeled Russian speech + unlabeled English speech = ?

Recent history of machine learning:
20 years ago: supervised learning.
10 years ago: semi-supervised learning.
10 years ago: transfer learning.
Next: self-taught learning?
[Figure: example datasets for each paradigm, including labeled cars and motorcycles, buses, tractors, aircraft, helicopters, and unlabeled natural scenes.]

Self-taught learning. Labeled examples: {(x_l^(1), y^(1)), ..., (x_l^(m), y^(m))}. Unlabeled examples: {x_u^(1), ..., x_u^(k)}. The unlabeled and labeled data need not share labels y, and need not share a generative distribution. Advantage: such unlabeled data is often easy to obtain.

A self-taught learning algorithm (overview): represent each labeled or unlabeled input as a sparse linear combination of "basis vectors", x ≈ Σ_j a_j b_j, with only a few coefficients a_j nonzero. [Figure: an image patch expressed as a weighted sum of three basis patches, e.g. x = 0.8·b_... + ...·b_... + ...·b_411.]

A self-taught learning algorithm. Key steps:
1. Learn good bases b using unlabeled data.
2. Use these learnt bases to construct "higher-level" features for the labeled data.
3. Apply a standard supervised learning algorithm on these features.

Learning the bases: sparse coding. Given only unlabeled data, we find good bases b by solving
minimize over b, a:  Σ_i || x_u^(i) - Σ_j a_j^(i) b_j ||^2 + β Σ_i || a^(i) ||_1
where the first term is the reconstruction error and the second is the sparsity penalty. [Details: an extra normalization constraint on the bases b is required.] (Efficient algorithms: Lee et al., NIPS 2006.)
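
For concreteness, here is a minimal sketch of step 1 (learning bases from unlabeled data) using an off-the-shelf dictionary-learning routine, which optimizes the same kind of reconstruction-plus-L1 objective with normalized atoms. It is not the efficient algorithm of Lee et al. (NIPS 2006); the array shapes, parameter values, and random placeholder data are purely illustrative.

import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning

# Placeholder unlabeled data: one row per example (e.g., a 14x14 image patch, flattened).
X_unlabeled = np.random.rand(2000, 196)

dict_learner = MiniBatchDictionaryLearning(
    n_components=128,   # number of basis vectors b_j
    alpha=1.0,          # sparsity penalty weight (beta in the objective above)
    batch_size=256,
    random_state=0,
)
dict_learner.fit(X_unlabeled)
bases = dict_learner.components_   # shape (128, 196): the learnt bases b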

Example bases. Natural images: the learnt bases look like "edges". Handwritten characters: the learnt bases look like "strokes".

Constructing features. Using the learnt bases b, compute features for the examples x_l from the classification task by solving
a(x_l) = arg min over a:  || x_l - Σ_j a_j b_j ||^2 + β || a ||_1
(reconstruction error plus sparsity penalty). Finally, learn a classifier using a standard supervised learning algorithm (e.g., an SVM) over these features.
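
Continuing the previous sketch, steps 2 and 3 can be illustrated with a lasso-style sparse coder that holds the bases fixed, followed by a standard classifier. `bases` is assumed to come from the sketch above; the labeled arrays here are placeholders standing in for the task's labeled examples x_l and labels y.

import numpy as np
from sklearn.decomposition import SparseCoder
from sklearn.svm import LinearSVC

# Placeholder labeled data (same dimensionality as the unlabeled patches).
X_labeled = np.random.rand(40, 196)
y = np.repeat([0, 1], 20)

coder = SparseCoder(
    dictionary=bases,                  # the learnt bases b, one basis per row
    transform_algorithm="lasso_lars",  # L1-penalized coding
    transform_alpha=1.0,               # sparsity penalty weight
)
features = coder.transform(X_labeled)  # sparse activations a(x_l), one row per example

clf = LinearSVC()                      # any standard supervised learner (e.g., an SVM)
clf.fit(features, y)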

Image classification. [Figure: feature visualization for sparse coding on a large platypus image from the Caltech101 dataset.]

Image classification (15 labeled images per class):
Baseline: 16% | PCA: 37% | Sparse coding: 47% (36.0% error reduction)
Other reported results: Fei-Fei et al., 2004: 16%; Berg et al., 2005: 17%; Holub et al., 2005: 40%; Serre et al., 2005: 35%; Berg et al., 2005: 48%; Zhang et al., 2006: 59%; Lazebnik et al., 2006: 56%.

Character recognition (domains: digits, handwritten English, English font):
Handwritten English classification (20 labeled images per handwritten character; bases learnt on digits): Raw: 54.8% | PCA: 54.8% | Sparse coding: 58.5% (8.2% error reduction)
English font classification (20 labeled images per font character; bases learnt on handwritten English): Raw: 17.9% | PCA: 14.5% | Sparse coding: 16.6% | Sparse coding + Raw: 20.2% (2.8% error reduction)

Text classification (domains: Reuters newswire, webpages, UseNet articles):
Webpage classification (2 labeled documents per class; bases learnt on Reuters newswire): Raw words: 62.8% | PCA: 63.3% | Sparse coding: 64.3% (4.0% error reduction)
UseNet classification (2 labeled documents per class; bases learnt on Reuters newswire): Raw words: 61.3% | PCA: 60.7% | Sparse coding: 63.8% (6.5% error reduction)

Shift-invariant sparse coding. [Figure: a signal reconstructed as basis functions convolved with sparse features.] (Algorithms: Grosse et al., UAI 2007.)
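
Shift-invariant sparse coding represents a signal as a sum of short basis functions convolved with sparse activation signals, so the same basis can explain a pattern wherever it occurs in time. Below is a toy numpy sketch of just the reconstruction step; the learning algorithm of Grosse et al. is not shown, and all names and shapes are illustrative.

import numpy as np

def reconstruct(activations, bases):
    """Sum of each basis waveform convolved with its (mostly zero) activation signal.

    activations: (n_bases, T) sparse coefficient time series
    bases:       (n_bases, L) short basis waveforms
    """
    T = activations.shape[1]
    L = bases.shape[1]
    signal = np.zeros(T + L - 1)
    for a_j, b_j in zip(activations, bases):
        signal += np.convolve(a_j, b_j, mode="full")
    return signal

# Toy usage: two occurrences of one basis and one occurrence of another.
bases = np.array([[1.0, 0.5, 0.25], [0.0, 1.0, -1.0]])
acts = np.zeros((2, 10))
acts[0, 2] = 0.8
acts[0, 7] = 0.3
acts[1, 4] = 0.5
x_hat = reconstruct(acts, bases)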

Audio classification (details: Grosse et al., UAI 2007):
Speaker identification (5 labels, TIMIT corpus, 1 sentence per speaker; bases learnt on different dialects): Spectrogram: 38.5% | MFCCs: 43.8% | Sparse coding: 48.7% (8.7% error reduction)
Musical genre classification (5 labels, 18 seconds per genre; bases learnt on different genres and songs): Spectrogram: 48.4% | MFCCs: 54.0% | Music-specific model: 49.3% | Sparse coding: 56.6% (5.7% error reduction)

Sparse deep belief networks. [Figure: a restricted Boltzmann machine with visible layer v, hidden layer h, and parameters W, b, c; adding a new sparsity penalty gives the sparse RBM.] (Details: Lee et al., NIPS poster yesterday.)
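
The sparse RBM adds a sparsity penalty to ordinary RBM training. As a rough, heavily simplified sketch (not the exact regularizer or training procedure from Lee et al.'s poster), here is one contrastive-divergence (CD-1) step for a binary RBM with a sparsity-target term added to the hidden-bias update; all sizes and hyperparameters are illustrative.

import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sparse_rbm_cd1_step(v0, W, b, c, lr=0.01, sparsity_target=0.02, sparsity_cost=0.1):
    # Positive phase: hidden unit probabilities given the data.
    h0_prob = sigmoid(v0 @ W + c)
    h0_sample = (rng.random(h0_prob.shape) < h0_prob).astype(float)
    # Negative phase: one Gibbs step (reconstruct visibles, re-infer hiddens).
    v1_prob = sigmoid(h0_sample @ W.T + b)
    h1_prob = sigmoid(v1_prob @ W + c)
    # Contrastive-divergence (CD-1) gradient estimates.
    dW = (v0.T @ h0_prob - v1_prob.T @ h1_prob) / v0.shape[0]
    db = (v0 - v1_prob).mean(axis=0)
    dc = (h0_prob - h1_prob).mean(axis=0)
    # Simple sparsity term: nudge hidden biases so the average hidden
    # activation moves toward a small target value.
    dc += sparsity_cost * (sparsity_target - h0_prob.mean(axis=0))
    W += lr * dW
    b += lr * db
    c += lr * dc

# Toy usage on random binary data.
n_visible, n_hidden = 196, 64
W = 0.01 * rng.standard_normal((n_visible, n_hidden))
b = np.zeros(n_visible)
c = np.zeros(n_hidden)
v_batch = (rng.random((32, n_visible)) < 0.1).astype(float)
sparse_rbm_cd1_step(v_batch, W, b, c)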

Sparse deep belief networks: image classification (Caltech101 dataset). 1-layer sparse DBN: 44.5% | 2-layer sparse DBN: 46.6% (3.2% error reduction). (Details: Lee et al., NIPS poster yesterday.)

Summary. Self-taught learning: the unlabeled data does not share the labels of the classification task. Use unlabeled data to discover features; use sparse coding to construct an easy-to-classify, "higher-level" representation. [Figure: cars vs. motorcycles classification using bases learnt from unlabeled images.]

THE END

Related work. Weston et al., ICML 2006: make stronger assumptions on the unlabeled data. Ando & Zhang, JMLR 2005: for natural language tasks and character recognition, use heuristics to construct a transfer learning task using unlabeled data.