Scalable Learning in Computer Vision

Presentation transcript:

Scalable Learning in Computer Vision. Adam Coates, Honglak Lee, Rajat Raina, Andrew Y. Ng. Stanford University.

Computer Vision is Hard

Introduction. One reason for difficulty: small datasets.

Common Dataset Sizes (positives per class):
Caltech 101: 800
Caltech 256: 827
PASCAL 2008 (Car): 840
PASCAL 2008 (Person): 4168
LabelMe (Pedestrian): 25330
NORB (Synthetic): 38880

Introduction. But the world is complex: it is hard to get extremely high accuracy on real images if we haven't seen enough examples. [Plot on slide: AUC vs. training set size.]

Introduction

Small datasets:
- Clever features: carefully designed to be robust to lighting, distortion, etc.
- Clever models: try to use knowledge of object structure, with some machine learning on top.

Large datasets:
- Simple features: favor speed over invariance and expressive power.
- Simple model: generic, with little human knowledge built in; rely on machine learning to solve everything else.

Supervised Learning from Synthetic Data

The Learning Pipeline. We need to scale up each part of the learning process to really large datasets. [Pipeline diagram: Image Data → Low-level features → Learning Algorithm.]

Synthetic Data. There is not enough labeled data for algorithms to learn all the knowledge they need:
- Lighting variation
- Object pose variation
- Intra-class variation
Instead, synthesize positive examples that include this knowledge. This is much easier than building the knowledge into the algorithms.

Synthetic Data. Collect images of the object on a green-screen turntable. [Diagram: green-screen image → segmented object → photometric/geometric distortion → composited onto a synthetic background.]
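A minimal sketch of how such a positive example might be synthesized (not the authors' pipeline; the alpha-mask data layout and the specific distortions are illustrative assumptions):

```python
import numpy as np

def synthesize_example(obj_rgb, obj_mask, background, rng=np.random.default_rng()):
    """Composite a green-screen-segmented object onto a background with simple distortions.

    obj_rgb:    (H, W, 3) float array in [0, 1], object cut from the green screen
    obj_mask:   (H, W) binary array, 1 where the object is
    background: (H, W, 3) float array in [0, 1], e.g. a random natural image
    """
    # Photometric distortion: random brightness/contrast change on the object.
    gain = rng.uniform(0.7, 1.3)
    bias = rng.uniform(-0.1, 0.1)
    obj = np.clip(gain * obj_rgb + bias, 0.0, 1.0)

    # Geometric distortion: random horizontal flip (a stand-in for rotation/scaling).
    if rng.random() < 0.5:
        obj = obj[:, ::-1]
        obj_mask = obj_mask[:, ::-1]

    # Alpha-composite the distorted object over the synthetic background.
    alpha = obj_mask[..., None].astype(obj.dtype)
    return alpha * obj + (1.0 - alpha) * background
```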

Synthetic Data: Example. Claw hammers. [Images on slide: synthetic examples (training set) alongside real examples (test set).]

The Learning Pipeline. Feature computation can be prohibitive for large numbers of images. E.g., 100 million examples x 1000 features = 100 billion feature values to compute. [Pipeline diagram: Image Data → Low-level features → Learning Algorithm.]

Features on CPUs vs. GPUs. It is difficult to keep scaling feature computation on CPUs, which are designed for general-purpose computing; GPUs are outpacing CPUs dramatically. [Chart on slide, from the nVidia CUDA Programming Guide.]

Features on GPUs. Features: cross-correlation with image patches (high data locality, high arithmetic intensity). Implemented brute-force, which is faster than FFT for small filter sizes, and orders of magnitude faster than FFT on the CPU: 20x to 100x speedups, depending on filter size.
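As a rough illustration (plain NumPy rather than the actual GPU kernel; the single-channel, "valid"-mode setup is an assumption), brute-force cross-correlation of an image with a bank of small patches looks like this. Each output position/filter pair is independent, which is why the computation maps so well to one-thread-per-output GPU execution:

```python
import numpy as np

def patch_features(image, filters):
    """Brute-force 'valid' cross-correlation of one image with a filter bank.

    image:   (H, W) array
    filters: (K, fh, fw) array of small patches
    returns: (K, H - fh + 1, W - fw + 1) response maps
    """
    K, fh, fw = filters.shape
    H, W = image.shape
    out = np.empty((K, H - fh + 1, W - fw + 1), dtype=image.dtype)
    for k in range(K):                      # each (k, y, x) output is independent:
        for y in range(H - fh + 1):         # on a GPU, one thread per output value
            for x in range(W - fw + 1):
                out[k, y, x] = np.sum(image[y:y + fh, x:x + fw] * filters[k])
    return out
```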

The Learning Pipeline. A large number of feature vectors on disk is too slow to access repeatedly. E.g., we can run an online algorithm on one machine, but disk access becomes a difficult bottleneck. [Pipeline diagram: Image Data → Low-level features → Learning Algorithm.]

Distributed Training. Solution: store everything in RAM. No problem: RAM costs as little as $20/GB. Our cluster with 120 GB of RAM has capacity for >100 million examples (at 1000 features of 1 byte each).

Distributed Training. Algorithms that can be trained from sufficient statistics are easy to distribute: decision tree splits can be chosen using histograms of each feature, and histograms can be computed for small chunks of data on separate machines and then combined. [Diagram: per-chunk histograms from Slave 1 and Slave 2 are summed on the Master, which picks the split.]
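A minimal sketch of the idea (hypothetical helper names, plain Python rather than the actual cluster code, and an assumed fixed binning of features to [0, 1]): each worker histograms its chunk of a feature's values by class, the master sums the histograms, and the split threshold is chosen from the combined counts.

```python
import numpy as np

N_BINS = 32
EDGES = np.linspace(0.0, 1.0, N_BINS + 1)   # assume feature values scaled to [0, 1]

def chunk_histograms(feature_values, labels):
    """Per-class histograms for one worker's chunk of a single feature."""
    return {c: np.histogram(feature_values[labels == c], bins=EDGES)[0]
            for c in np.unique(labels)}

def merge_histograms(per_worker_hists):
    """Sum the per-class histograms sent back by all workers (the 'master' step)."""
    merged = {}
    for hists in per_worker_hists:
        for c, h in hists.items():
            merged[c] = merged.get(c, np.zeros(N_BINS, dtype=int)) + h
    return merged

def best_split(merged):
    """Pick the bin edge minimizing the weighted Gini impurity of the two sides."""
    classes = sorted(merged)
    counts = np.stack([merged[c] for c in classes])         # (n_classes, n_bins)
    left = np.cumsum(counts, axis=1)[:, :-1]                 # class counts below each edge
    right = counts.sum(axis=1, keepdims=True) - left

    def gini(side):
        n = side.sum(axis=0)                                 # samples on this side
        p = side / np.maximum(n, 1)
        return n * (1.0 - (p ** 2).sum(axis=0))

    score = gini(left) + gini(right)
    return EDGES[1:-1][np.argmin(score)]
```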

The Learning Pipeline. We've scaled up each piece of the pipeline by a large factor over traditional approaches: Image Data (>1000x), Low-level features (20x-100x), Learning Algorithm (>10x).

Size Matters. [Plot on slide: AUC vs. training set size.]

UNSUPERVISED FEATURE LEARNING

Traditional supervised learning. Training data: labeled examples of cars and motorcycles. Testing: what is this?

Self-taught learning. Training data: unlabeled natural scenes, plus labeled car and motorcycle examples. Testing: what is this?

Learning representations. [Pipeline diagram: Image Data → Low-level features → Learning Algorithm.] Where do we get good low-level representations?

Computer vision features: SIFT, Spin image, HoG, RIFT, Textons, GLOH.

Unsupervised feature learning. [Diagram: input image (pixels) → "sparse coding" layer (edges; cf. V1) → higher layer (combinations of edges; cf. V2).] A DBN (Hinton et al., 2006) with an additional sparseness constraint. [Related work: Hinton, Bengio, LeCun, and others.] Note: no explicit "pooling."
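The model in the talk is a sparse DBN, not classical sparse coding, but a generic L1 sparse-coding step gives a feel for what "sparse edge codes of image patches" means computationally. A minimal ISTA sketch in NumPy (the dictionary D and penalty lam are illustrative assumptions, not the talk's model):

```python
import numpy as np

def sparse_codes(X, D, lam=0.1, n_iter=200):
    """ISTA: minimize 0.5 * ||X - A @ D||^2 + lam * ||A||_1 over the codes A.

    X: (n_patches, n_pixels) image patches
    D: (n_atoms, n_pixels) dictionary with roughly unit-norm rows
    returns A: (n_patches, n_atoms) sparse codes
    """
    L = np.linalg.norm(D @ D.T, 2)            # Lipschitz constant of the gradient
    A = np.zeros((X.shape[0], D.shape[0]))
    for _ in range(n_iter):
        grad = (A @ D - X) @ D.T              # gradient of the quadratic term
        A -= grad / L
        A = np.sign(A) * np.maximum(np.abs(A) - lam / L, 0.0)   # soft-threshold
    return A
```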

Unsupervised feature learning. [Diagram: input image → Model V1 → higher layer (model V2?) → higher layer (model V3?).] Very expensive to train: > 1 million examples, > 1 million parameters.

Learning Large RBMs on GPUs. [Chart: learning time for 10 million examples (log scale) vs. millions of parameters (1, 18, 36, 45). The dual-core CPU takes on the order of days to about two weeks; the GPU takes on the order of hours, up to 72x faster.] (Rajat Raina, Anand Madhavan, Andrew Y. Ng)
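To make concrete what is being accelerated, here is a minimal one-step contrastive divergence (CD-1) update for a binary RBM (plain NumPy with illustrative hyperparameters; not the authors' GPU implementation). The work is dominated by the large matrix products, which is what maps so well to GPUs:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(V, W, b_h, b_v, lr=1e-3, rng=np.random.default_rng()):
    """One CD-1 step for a binary RBM.

    V: (batch, n_visible) data batch; W: (n_visible, n_hidden) weights;
    b_h: (n_hidden,) hidden biases; b_v: (n_visible,) visible biases.
    """
    ph = sigmoid(V @ W + b_h)                        # P(h = 1 | data)
    h = (rng.random(ph.shape) < ph).astype(V.dtype)  # sample hidden states
    pv = sigmoid(h @ W.T + b_v)                      # reconstruction of v
    ph_recon = sigmoid(pv @ W + b_h)                 # P(h = 1 | reconstruction)
    W += lr * (V.T @ ph - pv.T @ ph_recon) / len(V)  # positive minus negative phase
    b_h += lr * (ph - ph_recon).mean(axis=0)
    b_v += lr * (V - pv).mean(axis=0)
    return W, b_h, b_v
```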

Learning features. We can now train very complex networks and learn increasingly complex features, both more specific and more general-purpose than hand-engineered features. [Feature hierarchy: pixels → edges → object parts (combinations of edges) → object models.]

Conclusion
- Performance gains from large training sets are significant, even for very simple learning algorithms.
- Scalability of the system allows these algorithms to improve "for free" over time.
- Unsupervised algorithms promise high-quality features and representations without the need for hand-collected data.
- GPUs are a major enabling technology.

THANK YOU