Learning Semantics with Less Supervision

Agenda: Beyond Fixed Keypoints; Beyond Keypoints; Open Discussion

Part Discovery from Partial Correspondence [Subhransu Maji and Gregory Shakhnarovich, CVPR 2013]

Keypoints in diverse categories Where are the keypoints? Can you name them?

Does the name of a keypoint matter? We can mark correspondences without naming parts. [Maji and Shakhnarovich, HCOMP’12]

Annotation interface on MTurk: example landmarks are provided.

Example annotations: annotators mark 5 landmark pairs on average.

Are the landmarks consistent across annotators? Yes.

Semantic part discovery: given a window in the first image, we can find the corresponding window in the second image.

Propagate the correspondence through the “semantic graph”.

Semantic part discovery: discover parts using breadth-first traversal of the semantic graph (figure panels: Iter 0, Iter 1, Iter 2).
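The breadth-first propagation can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the semantic graph is an adjacency dictionary keyed by image id, with an edge wherever landmark annotations allow a window to be transferred. The names `bfs_iterations`, `graph` and `levels` are hypothetical.

```python
from collections import deque

def bfs_iterations(graph, seed):
    """Return {image: iteration at which the seed window first reaches it}."""
    reached = {seed: 0}
    queue = deque([seed])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor not in reached:
                # The window is transferred one hop further per iteration.
                reached[neighbor] = reached[node] + 1
                queue.append(neighbor)
    return reached

# Toy graph (assumed data): edges link images that share marked landmark pairs.
graph = {"a": ["b", "c"], "b": ["a", "d"], "c": ["a"], "d": ["b"]}
levels = bfs_iterations(graph, "a")
```

Images at level 0, 1 and 2 correspond to the Iter 0, Iter 1 and Iter 2 panels of the slide.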

The semantic graph alone is not good enough. Compare graph only with graph + appearance; the appearance model is trained using latent LDA (latent variables: scale, translation, membership).

Semantic part discovery: graph only vs. graph + appearance.

Examples of learned parts

Part-based representation (figure labels: image; other activations on the training set)

Detecting church buildings: individual parts (figure labels: graph mining; better seeds)

Detecting church buildings: collection of parts. Detection is challenging due to structural variability. Latent LDA parts + voting: AP = 39.9%; DPM: AP = 34.7%.

Label transfer: ask users to label parts where it makes sense (e.g. arch, tower, window), then transfer the labels to test images.

Agenda: Beyond Fixed Keypoints; Beyond Keypoints; Open Discussion

Unsupervised Discovery of Mid-Level Discriminative Patches [Saurabh Singh, Abhinav Gupta and Alexei Efros, ECCV 2012]

Can we get nice parts without supervision? Idea 0: K-means clustering in HOG space
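For reference, Idea 0 amounts to plain K-means on patch descriptors. The sketch below is a toy version under stated assumptions: the 2-D tuples stand in for HOG vectors, the centers are initialized from the first k points to keep the run deterministic, and the names `kmeans` and `patches` are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=10):
    # Deterministic initialization for the sketch: the first k points.
    centers = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center under Euclidean distance.
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c]))
                  for p in points]
        # Update step: each center becomes the mean of its members.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers

# Toy descriptors (assumed data) standing in for HOG vectors.
patches = [(0.0, 0.1), (5.0, 5.1), (0.2, 0.0), (5.2, 4.9)]
labels, centers = kmeans(patches, k=2)
```

The Euclidean distance used in the assignment step is exactly what the following slides criticize: it is fixed in advance rather than learned per cluster.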

Still not good enough: the SVM memorizes bad examples and still scores them highly. However, the space of bad examples is much more diverse, so we can avoid overfitting if we train on a training subset but look for patches on a validation subset.

Why does K-means on HOG fail? A chicken-and-egg problem: if we know that a set of patches is visually similar, we can easily learn a distance metric for them; if we know the distance metric, we can easily find other members.

Idea 1: Discriminative Clustering. Start with K-means. Train a discriminative classifier for the distance function, using all other classes as negative examples. Re-assign patches to the cluster whose classifier gives the highest score. Repeat.
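The loop in Idea 1 can be sketched as below. This is an illustration under assumptions, not the method from the paper: the linear classifier is a stand-in (LDA with identity covariance, i.e. the difference of class means) rather than an SVM, the initial labels are given instead of computed by K-means, and all function names are hypothetical.

```python
def mean(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def train_linear(pos, neg):
    """Stand-in for an SVM: w = mean(pos) - mean(neg), boundary at the midpoint."""
    mp, mn = mean(pos), mean(neg)
    w = [a - b for a, b in zip(mp, mn)]
    b = -sum(wi * (a + c) / 2 for wi, a, c in zip(w, mp, mn))
    return w, b

def score(model, x):
    w, b = model
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def discriminative_clustering(points, labels, k, rounds=5):
    labels = list(labels)
    for _ in range(rounds):
        models = {}
        for c in range(k):
            # Members of cluster c are positives, all other patches negatives.
            pos = [p for p, l in zip(points, labels) if l == c]
            neg = [p for p, l in zip(points, labels) if l != c]
            if pos and neg:
                models[c] = train_linear(pos, neg)
        if not models:
            break
        # Re-assign each patch to the cluster whose classifier scores highest.
        new = [max(models, key=lambda c: score(models[c], p)) for p in points]
        if new == labels:
            break
        labels = new
    return labels

# Toy data (assumed): the third patch starts in the wrong cluster.
points = [(0.0, 0.0), (1.0, 0.0), (4.0, 0.0), (5.0, 0.0)]
final = discriminative_clustering(points, [0, 0, 0, 1], k=2)
```

On this toy input the misassigned patch moves to the second cluster after one round and the labels then stop changing.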

Idea 2: Discriminative Clustering+. Start with K-means or kNN. Train a discriminative classifier for the distance function, using detection. Detect the patches and assign them to the top k clusters. Repeat.

Can we get good parts without supervision? What makes a good part? It must occur frequently in one class (representative). It must not occur frequently in all classes (discriminative).

Discriminative Clustering+

Idea 3: Discriminative Clustering++. Split the discovery dataset into two equal parts (training and validation). Train on the training subset. Run the trained classifier on the validation set to collect examples. Exchange the training and validation sets. Repeat.
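The alternation between the two halves can be sketched as follows. The sketch makes several assumptions that are not in the slides: each half comes with its own negative set, the detector is a difference-of-means linear scorer standing in for an SVM, and the new cluster is simply the top m detections. All names are hypothetical.

```python
def mean(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def train(pos, neg):
    """Stand-in detector: direction from the negative mean to the positive mean."""
    return [a - b for a, b in zip(mean(pos), mean(neg))]

def top_detections(w, pool, m):
    """The m highest-scoring patches in the pool become the new cluster."""
    return sorted(pool, key=lambda p: dot(w, p), reverse=True)[:m]

def cross_validated_cluster(seed, d1, n1, d2, n2, m=2, rounds=2):
    members = list(seed)            # initial cluster, drawn from d1
    # Each entry: (negatives to train against, pool to run detection on).
    halves = [(n1, d2), (n2, d1)]
    for r in range(rounds):
        negatives, pool = halves[r % 2]
        w = train(members, negatives)
        # Members are always collected from the half NOT used for training.
        members = top_detections(w, pool, m)
    return members

# Toy data (assumed): patches with a large first coordinate form the part.
d1 = [(1.0, 0.0), (0.9, 0.1), (0.0, 1.0)]
d2 = [(1.1, 0.0), (0.0, 0.9), (0.8, 0.2)]
n1 = [(0.0, 1.1), (0.1, 0.9)]
n2 = [(0.0, 1.0), (0.2, 1.0)]
result = cross_validated_cluster([(1.0, 0.0)], d1, n1, d2, n2)
```

Because the cluster is rebuilt from held-out detections each round, an example the classifier merely memorized during training cannot keep its place in the cluster.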

Discriminative Clustering++

Doublets: discover second-order relationships. Start with high-scoring patches. Find spatial correlations to other (weaker) patches. Rank the potential doublets on the validation set.

Doublets

AP on MIT Indoor-67 scene recognition dataset

Blocks that Shout: Distinctive Parts for Scene Classification [Juneja, Vedaldi, Jawahar and Zisserman, CVPR 2013]. Example scene classes: bookstore, buffet, computer room, closet.

Three steps: seeding (proposing initial parts), expansion (learning part detectors), selection (identifying good parts).

Step 1: Seeding. Segment the image. Find proposal regions based on “objectness”. Compute HOG features for each.

Step 2: Expansion. Train an Exemplar SVM for each seed region [Malisiewicz et al]. Apply it on the validation set to collect more examples. Retrain and repeat.

Step 3: Selection. Good parts should occur frequently in a small number of classes but infrequently in the rest. Collect the top 5 parts from each validation image, sort the occurrences of each part by score and keep the top r. Compute the entropy of each part over the class distribution and retain the lowest-entropy parts. Filter out any parts too similar to others (based on the cosine similarity of their SVM weights).
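The selection step can be sketched as an entropy ranking followed by a similarity filter. This is a minimal illustration under assumptions: each part is given as a hypothetical tuple of (name, class counts of its top r occurrences, SVM weight vector), and the cosine threshold of 0.9 is an arbitrary value chosen for the example.

```python
import math

def entropy(class_counts):
    """Shannon entropy (bits) of a part's occurrences over scene classes."""
    total = sum(class_counts)
    probs = [c / total for c in class_counts if c > 0]
    return -sum(p * math.log2(p) for p in probs)

def cosine(u, v):
    num = sum(a * b for a, b in zip(u, v))
    den = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return num / den

def select_parts(parts, keep, max_cos=0.9):
    """Keep the lowest-entropy parts, skipping near-duplicates of chosen ones."""
    ranked = sorted(parts, key=lambda p: entropy(p[1]))
    chosen = []
    for name, counts, w in ranked:
        if all(cosine(w, cw) < max_cos for _, _, cw in chosen):
            chosen.append((name, counts, w))
        if len(chosen) == keep:
            break
    return [name for name, _, _ in chosen]

# Toy parts (assumed data): counts are over three scene classes.
parts = [
    ("shelf", [9, 1, 0], [1.0, 0.0]),
    ("shelf_dup", [8, 1, 1], [0.99, 0.05]),
    ("wall", [3, 3, 4], [0.0, 1.0]),
    ("stove", [0, 10, 0], [0.5, 0.5]),
]
selected = select_parts(parts, keep=3)
```

A part that fires in only one class has entropy 0 and is ranked first; a part spread evenly over classes is ranked last. Here `shelf_dup` is dropped because its weights are nearly parallel to those of `shelf`.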

Features and learning. Features explored: dense RootSIFT, BoW, LLC, Improved Fisher Vectors. Learning: non-linear SVM (sqrt kernel).
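One common reading of a "sqrt kernel" is the Hellinger kernel, which can be computed with an explicit feature map so that a linear SVM on the mapped features behaves like the non-linear one. The sketch below assumes non-negative histogram features such as BoW counts; whether the authors implement it exactly this way is an assumption, and the signed variant often used for Fisher vectors is not shown. Function names are hypothetical.

```python
import math

def hellinger_map(hist):
    """L1-normalize a non-negative histogram, then take element-wise square roots."""
    total = sum(hist)
    if total == 0:
        return [0.0] * len(hist)
    return [math.sqrt(h / total) for h in hist]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def sqrt_kernel(x, y):
    """Hellinger kernel: a plain dot product in the mapped space."""
    return dot(hellinger_map(x), hellinger_map(y))

# Toy histograms (assumed data).
a = [4, 0, 0]
b = [0, 4, 0]
c = [1, 1, 2]
k_ac = sqrt_kernel(a, c)
```

The same mapping applied to SIFT descriptors is what turns SIFT into RootSIFT.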

Results on MIT Indoor-67 (Singh et al | Juneja et al)
Seeding: K-means on HOG | Exemplar SVM
Feature space: HOG | IFV
SVM: Linear | Non-linear
Selection: Purity & discriminativeness (penalizes parts that perform well for multiple clusters) | Entropy rank (allows for parts that work for multiple clusters)
AP on MIT 67: 49.4 | 61.1

Learning Collections of Parts for Object Recognition [Endres, Shih, Jiaa and Hoiem, CVPR13]

Overview of the method. Seeding: random samples including the full bounding box and sub-window boxes. Expanding: Exemplar SVM with fast training (using LDA). Selection: greedy method that picks parts so that each training example is explained by a part. Appearance consistency: include parts that have a high SVM score. Spatial consistency: prefer parts that come from the same location within the bounding box. Training and detection: boosting over Category Independent Object Proposals [Endres & Hoiem].
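The greedy selection idea can be illustrated with a simple coverage heuristic. This is only a sketch of the general principle and not the objective used in the paper: it assumes each candidate part is summarized by the set of training examples it fires on, and it repeatedly picks the part that explains the most still-unexplained examples. The names `greedy_select` and `coverage` are hypothetical.

```python
def greedy_select(coverage, budget):
    """coverage: {part name: set of training example ids the part explains}."""
    uncovered = set().union(*coverage.values())
    chosen = []
    while uncovered and len(chosen) < budget:
        # Pick the part explaining the most uncovered examples
        # (ties broken by name order, for determinism).
        best = max(sorted(coverage), key=lambda p: len(coverage[p] & uncovered))
        if not coverage[best] & uncovered:
            break
        chosen.append(best)
        uncovered -= coverage[best]
    return chosen

# Toy candidates (assumed data): example ids 1..5.
coverage = {
    "wheel": {1, 2, 3},
    "door": {3, 4},
    "window": {4, 5},
    "roof": {1, 2},
}
picked = greedy_select(coverage, budget=3)
```

Here `roof` is never picked because `wheel` already explains the same examples, which is the kind of redundancy a greedy coverage criterion avoids.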

Results on PASCAL 2010 detection: averages of patches over the top 15 detections on the validation set, for a set of parts.

Agenda: Beyond Fixed Keypoints; Beyond Keypoints; Open Discussion

Gender Recognition on Labeled Faces in the Wild: male or female?
Much easier dataset: no occlusion, high resolution, centered frontal faces.
Method | Gender AP
Kumar et al, ICCV 2009 | 95.52
Frontal Face poselet | 96.43
Poselets + Deep Learning | 99.54
[Zhang et al, arXiv:1311.5591]

Poselets vs DPMs vs Discriminative Patches
Approach: Parametric | Non-parametric
Speed: Faster (fewer types) | Slower | Slower (many types)
Redundancy: Little | A lot (improves accuracy) | A lot
Spatial model: Sophisticated | Primitive (threshold) | Primitive
Supervision requirements: Needs 2 keypoints | Needs more keypoints (10+) | No supervision
Uses multi-scale signal? Two scale levels | Yes, multiple scales | Yes
Jointly trained: Yes | No
Attached semantics: Medium

Supervision in parts, on a spectrum from unsupervised to strongly supervised (figure labels: Discriminative Patches, ISM, DPMs, SIFT, Poselets)

Questions for open discussion: What is the future for mid-level parts? More supervision vs less supervision? Should low-level parts be hard-coded or jointly trained? Parametric vs non-parametric approaches? Parts with or without associated semantics?