Context as Supervisory Signal: Discovering Objects with Predictable Context. Carl Doersch, Abhinav Gupta, Alexei Efros.


Unsupervised Object Discovery
Children learn to see without millions of labels. Is there a cue hidden in the data itself that we can use to learn better representations?

How hard is unsupervised object discovery?
In the state-of-the-art literature it is still commonplace to select just 20 categories from Caltech-101 [1]. Le et al. made the New York Times for spending 1M+ CPU hours and reporting just 3 objects [2].
[1] Faktor & Irani, "Clustering by Composition" – Unsupervised Discovery of Image Categories, ECCV 2012.
[2] Le et al., Building High-level Features Using Large Scale Unsupervised Learning, ICML 2012.

Object Discovery = Patch Clustering

Clustering algorithms

Clustering algorithms?

Clustering as Prediction
[Figure: 2D scatter of datapoints; one coordinate of a held-out point is hidden ("Y value is here") and the cluster is used to predict it; prediction vs. actual shown.]

What cue is used for clustering here?
The datapoints are redundant: within a cluster, knowing one coordinate tells you the other. Images are redundant too!
Contribution #1 (of 3): use patch-context redundancy for clustering.

More rigorously
Unlike standard clustering, we measure the affinity of a whole cluster rather than of pairs. Each datapoint x can be divided into a patch and its context: x = [p, c]. Cluster affinity is defined by how 'learnable' the function c = F(p) is (or the distribution P(c|p)); we measure how well the function has been learned via leave-one-out cross-validation, as in the sketch below.
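A minimal sketch of such a learnability-based affinity, assuming the predictor F is a simple nearest-neighbor appearance transfer between cluster members (the function names and the choice of predictor are illustrative, not the paper's implementation):

```python
# Score a candidate cluster by how well each member's context c can be
# predicted from its patch p using only the *other* members
# (leave-one-out cross-validation). Higher score = more "learnable".
import numpy as np

def cluster_affinity(patches: np.ndarray, contexts: np.ndarray) -> float:
    """patches: (n, dp) patch features; contexts: (n, dc) context features."""
    n = len(patches)
    if n < 2:
        return float("-inf")  # no other members to learn from
    errors = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        # Predictor F: nearest neighbor among the remaining members,
        # transferring that neighbor's context as the prediction.
        dists = np.linalg.norm(patches[others] - patches[i], axis=1)
        pred = contexts[others[int(np.argmin(dists))]]
        errors.append(np.linalg.norm(pred - contexts[i]))
    return -float(np.mean(errors))
```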

Previous whispers of this idea
– Weakly supervised methods
– Probabilistic clustering: choose clusters such that the dataset can be compressed
– Autoencoders & RBMs; sparse coding (Olshausen & Field 1997)

Clustering as Prediction
[Figure: the same prediction vs. actual illustration as before.]

When is a prediction "good enough"?
Image distance metrics are unreliable.
[Figure: an input patch next to its nearest neighbors, with reported distances (labeled "min distance: 2.59e-4" and "max distance: 1.22e-4") that do not track perceptual similarity.]

It’s even worse during prediction
Horizontal gratings are very close together in HOG feature space, and are easy to predict. Cats have more complex shapes that lie farther apart in feature space and are harder to predict.

Our task is more like this:

Contribution #2 (of 3): Cluster Membership as Hypothesis Testing
[Figure: two candidate conditionals, P_1(X|Y) and P_2(X|Y), evaluated at the held-out value ("Y value is here").]

What are the hypotheses?
– Thing hypothesis: the patch is a member of the proposed cluster, and its context is best predicted by transferring appearance from other cluster members.
– Stuff hypothesis: the patch is 'miscellaneous', and the best we can do is model its context as clutter or texture, via low-level image statistics.
A formalization as a likelihood-ratio test is sketched below.
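One natural formalization, consistent with the slides though not necessarily the paper's exact decision rule (the threshold τ is an assumption), is a log-likelihood-ratio test between the two hypotheses:

```latex
% Membership test for a patch p with context c in a proposed cluster:
% compare how well the thing and stuff models explain the context.
\[
  \mathrm{score}(p, c)
    = \log P_{\mathrm{thing}}(c \mid p) - \log P_{\mathrm{stuff}}(c \mid p),
\]
\[
  \text{declare ``thing'' (cluster member) when } \mathrm{score}(p, c) > \tau.
\]
```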

Our Hypotheses
[Figure: condition region (patch) and prediction region (context); the thing model predicts the context via a predicted correspondence to other cluster members, the stuff model via texture statistics; each prediction is scored against the true region (high vs. low likelihood).]

Real predictions aren’t so easy!

What to do?
– Generate multiple predictions from different images, and take the best.
– Take bits and pieces from many images.
– Still not enough? Generate more predictions with warping.
With arbitrarily small pieces and lots of warping, you can match anything! A sketch of the best-of-many scoring follows below.
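A minimal sketch of the best-of-many idea (hypothetical shapes and names, not the paper's matching machinery):

```python
# For each context cell, take the best prediction offered by any of m
# candidate source images. This is "generate many predictions and take
# the best" in its simplest form; the real system must constrain matches,
# since arbitrarily small warped pieces could otherwise match anything.
import numpy as np

def best_of_many_cell_errors(true_cells: np.ndarray,
                             predictions: np.ndarray) -> np.ndarray:
    """true_cells: (k, d) features of the k context cells.
    predictions: (m, k, d) candidate predictions from m source images.
    Returns (k,) per-cell error of the best candidate for each cell."""
    errs = ((predictions - true_cells[None]) ** 2).sum(axis=-1)  # (m, k)
    return errs.min(axis=0)
```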

Generative Image Models
Long history in computer vision. The goal: come up with a distribution P(c|p) without looking at c. The distribution is extremely complicated due to dependencies between features.
(Hinton et al.; Weber et al.; Fergus et al.; Fei-Fei et al.; Sudderth et al.; Wang et al. 2006)

Generative Image Models
[Diagram: graphical model, Object → Part Locations → Features.]

Inference in Generative Models
[Diagram: the same graphical model; inference proceeds from observed Features back to Part Locations and the Object.]

Contribution #3: Likelihood Factorization
Let c be the context and p be a patch. Break c into chunks c_1 … c_k (e.g., HOG cells). By the chain rule, the problem boils down to estimating
P(c|p) = P(c_1|p) · P(c_2|c_1, p) · … · P(c_k|c_1, …, c_{k-1}, p),
i.e., one small conditional per chunk. A sketch of accumulating this factorized likelihood follows below.
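A minimal sketch of how such a factorized log-likelihood could be accumulated, with each per-chunk conditional supplied as a callable (the interface is hypothetical; in the paper the conditionals are learned predictors over HOG features):

```python
# Accumulate log P(c|p) chunk by chunk via the chain rule.
# Each `conditionals[i]` is a hypothetical callable returning
# log P(c_i | c_1..c_{i-1}, p). Any model (thing or stuff) exposing this
# interface yields a log-likelihood directly comparable with any other.
from typing import Callable, Sequence
import numpy as np

# log_cond(chunk, conditioning) -> log P(chunk | conditioning)
LogCond = Callable[[np.ndarray, Sequence[np.ndarray]], float]

def factored_log_likelihood(patch: np.ndarray,
                            chunks: Sequence[np.ndarray],
                            conditionals: Sequence[LogCond]) -> float:
    """chunks: context chunks c_1..c_k (e.g., HOG cells), in a fixed order."""
    total = 0.0
    seen = [patch]  # conditioning set grows as chunks are explained
    for chunk, log_cond in zip(chunks, conditionals):
        total += log_cond(chunk, tuple(seen))  # log P(c_i | c_<i, p)
        seen.append(chunk)
    return total
```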

Our Hypotheses
[Figure: as before, the thing model predicts the prediction region (context) from the condition region (patch) via a predicted correspondence, and the stuff model predicts it from texture statistics; both are scored against the true region.]

Surprisingly many good properties
– This is not an approximation.
– It breaks a hard generative problem into a series of easy discriminative ones.
– Latent-variable inference can use maximum likelihood: it's okay to overfit to the condition region.
– The c_i's do not need to be integrated out.
– Likelihoods can be compared across models.

Why did nobody think of this?
– We now have 1000 predictions per detection.
– The original idea of generative models was to be based on real-world generative processes.
– It's not obvious that these conditional predictions actually capture what we want, or that they make the job easier.

Discriminative won over generative
DPMs can be seen as doing maximum-likelihood estimation over latent parts. This overfits, but discriminative training doesn't care (Felzenszwalb et al. 2008).

Recap of Contributions
1. Clustering as prediction.
2. Statistical hypothesis testing of stuff vs. thing to determine cluster membership.
3. Factorizing the computation of the stuff and thing data likelihoods.

Stuff Model
[Figure: find nearest neighbors for the condition region and transfer their prediction regions.]

Faster Stuff Model
[Figure: the conditional over ~1,000,000 random image patches is replaced by a 5,000-element dictionary (a GMM of patches).]

Faster Stuff Model
[Figure: features from the condition region give an estimated cluster membership over the 5,000 dictionary elements; each element has a prediction region specified in the dictionary, and the conditional is obtained by marginalization.]
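A minimal sketch of this kind of dictionary-based stuff model, assuming scikit-learn's GaussianMixture (the component count, features, and class interface are illustrative; the real model is not this simple):

```python
# Dictionary-based "stuff" predictor: fit a GMM over condition-region
# features; at test time, soft-assign a patch to the components and
# predict the context as the responsibility-weighted average of each
# component's stored prediction-region template (marginalization).
import numpy as np
from sklearn.mixture import GaussianMixture

class StuffDictionary:
    def __init__(self, n_components: int = 5000):
        # NOTE: 5000 components is the slide's dictionary size; fitting
        # that many is expensive and only meant to mirror the idea.
        self.gmm = GaussianMixture(n_components=n_components,
                                   covariance_type="diag")
        self.templates = None  # (n_components, d_context)

    def fit(self, cond_feats: np.ndarray, context_feats: np.ndarray):
        self.gmm.fit(cond_feats)
        resp = self.gmm.predict_proba(cond_feats)            # (n, k)
        # Responsibility-weighted mean context per component.
        weights = resp / (resp.sum(axis=0, keepdims=True) + 1e-12)
        self.templates = weights.T @ context_feats            # (k, d_context)
        return self

    def predict_context(self, cond_feat: np.ndarray) -> np.ndarray:
        resp = self.gmm.predict_proba(cond_feat[None])[0]     # (k,)
        return resp @ self.templates  # marginalize over dictionary elements
```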

Thing Model
[Figure: correspondence between cluster image i and the query image, mapping condition region to condition region and prediction region to prediction region.]

Finite and Occluded ‘Things’
If the thing model believes it can't predict well, it 'mimics' the background (stuff) model:
– when the prediction region corresponds to regions that were previously predicted poorly;
– when the thing model has been predicting badly nearby (this handles occlusion!).
A toy sketch of this fallback follows below.
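A toy sketch of the fallback, not the paper's exact rule: a per-cell mixture of the two models whose weight drops where the thing model has recently been failing (the confidence mapping is an assumption):

```python
# Per-cell log-likelihood where the thing model falls back to the stuff
# model wherever its own confidence is low, e.g. because nearby cells or
# the corresponding regions have a poor prediction history (occlusion).
import numpy as np

def occlusion_confidence(recent_errors: np.ndarray,
                         scale: float = 1.0) -> np.ndarray:
    """Map each cell's recent (e.g., spatially smoothed) prediction error
    to a confidence in [0, 1]: high error -> low trust in the thing model."""
    return np.exp(-recent_errors / scale)

def blended_cell_loglik(thing_ll: np.ndarray,
                        stuff_ll: np.ndarray,
                        confidence: np.ndarray) -> np.ndarray:
    """Per-cell mixture w*P_thing + (1-w)*P_stuff, computed in log space."""
    w = np.clip(confidence, 1e-6, 1.0 - 1e-6)
    return np.logaddexp(np.log(w) + thing_ll,
                        np.log(1.0 - w) + stuff_ll)
```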

Clusters discovered on 6 Pascal Categories
Purity vs. cumulative coverage (legend values as reported):
  Visual Words    .63 (.37)
  Russel et al.   .66 (.38)
  HOG Kmeans      .70 (.40)
  Our Approach    .83 (.48)
  Singh et al.    .83 (.47)

Results: Pascal 2011, unsupervised
[Figures across three slides: top-ranked discovered clusters, shown in columns Rank, Initial, and Final.]

Errors

Unsupervised keypoint transfer
[Figure: car keypoints transferred across cluster members: left/right front and back wheel centers, headlights, taillights, mirrors, and front/back roof corners.]

Keypoint Precision-Recall