Published by Jacqueline Adair. Modified over 9 years ago.
1
Object class recognition using unsupervised scale-invariant learning
Rob Fergus, Pietro Perona, Andrew Zisserman
Oxford University, California Institute of Technology
2
Goal
- Recognition of object categories
- Unassisted learning
3
Some object categories
Learn from examples
Difficulties:
- Size variation
- Background clutter
- Occlusion
- Intra-class variation
4
Main issues
- Representation
- Learning
- Recognition
5
Model: Constellation of Parts
- Fischler & Elschlager 1973
- Yuille '91
- Brunelli & Poggio '93
- Lades, v.d. Malsburg et al. '93
- Cootes, Lanitis, Taylor et al. '95
- Amit & Geman '95, '99
- Perona et al. '95, '96, '98, '00
- Agarwal & Roth '02
6
Detection & Representation of regions
Find regions within image
- Use Kadir and Brady's salient region operator [IJCV '01]
Each region is described by:
- Location: (x,y) coords. of region center
- Scale: diameter of region (pixels)
- Appearance: 11x11 patch, normalize, then projection onto PCA basis, giving coefficients c1, c2, ..., c15
Gives representation of appearance in low-dimensional vector space
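The appearance step above (11x11 patch, normalize, project onto a PCA basis) can be sketched as follows. This is a minimal illustration, not the authors' code. It assumes that "normalize" means zero-mean, unit-variance intensity scaling, which the slide does not specify, and `placeholder_basis` is a hypothetical stand-in for a basis that would really come from PCA over many training patches.

```python
import math

SIDE = 11        # patch side length, from the slide
N_COEFFS = 15    # number of PCA coefficients, from the slide

def normalize_patch(patch):
    """Flatten a square patch and scale it to zero mean and unit variance.

    Assumption: the slide only says "Normalize"; this choice is illustrative.
    """
    values = [float(v) for row in patch for v in row]
    mean = sum(values) / len(values)
    centered = [v - mean for v in values]
    std = math.sqrt(sum(c * c for c in centered) / len(centered))
    if std == 0.0:
        return centered  # flat patch: every entry is already zero
    return [c / std for c in centered]

def project(vector, basis):
    """Return one coefficient per basis vector (plain dot products)."""
    return [sum(v * b for v, b in zip(vector, axis)) for axis in basis]

def placeholder_basis(dim=SIDE * SIDE, count=N_COEFFS):
    """Hypothetical orthonormal basis: the first `count` coordinate axes.

    A real basis would hold the leading principal components of training patches.
    """
    return [[1.0 if i == k else 0.0 for i in range(dim)] for k in range(count)]

# Synthetic patch with a diagonal intensity ramp, used only as example input.
patch = [[x + y for x in range(SIDE)] for y in range(SIDE)]
coeffs = project(normalize_patch(patch), placeholder_basis())
```

Here `coeffs` plays the role of the vector (c1, ..., c15) on the slide: a 121-pixel patch reduced to 15 numbers.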
7
Generative probabilistic model
Foreground model:
- Gaussian shape pdf
- Gaussian part appearance pdf
- Gaussian relative scale pdf (over log(scale))
- Prob. of detection (e.g. 0.8, 0.75, 0.9)
Clutter model:
- Uniform shape pdf
- Gaussian background appearance pdf
- Uniform relative scale pdf (over log(scale))
- Poisson pdf on # detections
Based on Burl, Weber et al. [ECCV '98, '00]
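To make the foreground/clutter split concrete, the sketch below compares one region's appearance vector under a foreground Gaussian and a background Gaussian. It covers only the appearance term and leaves out shape, relative scale, detection probability, and the Poisson term. Diagonal covariances and all parameter values are assumptions made for the example, not values from the slides.

```python
import math

def diag_gaussian_logpdf(x, mean, var):
    """Log density of a Gaussian with diagonal covariance (assumed for simplicity)."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def appearance_log_ratio(x, fg_mean, fg_var, bg_mean, bg_var):
    """Positive when the foreground part explains x better than the clutter model."""
    return (diag_gaussian_logpdf(x, fg_mean, fg_var)
            - diag_gaussian_logpdf(x, bg_mean, bg_var))

# Hypothetical 2-D parameters: a tight foreground part, a broad background.
fg_mean, fg_var = [1.0, 1.0], [0.1, 0.1]
bg_mean, bg_var = [0.0, 0.0], [1.0, 1.0]

near_part = appearance_log_ratio([1.0, 1.0], fg_mean, fg_var, bg_mean, bg_var)
near_clutter = appearance_log_ratio([0.0, 0.0], fg_mean, fg_var, bg_mean, bg_var)
```

A region whose appearance sits near the part mean gets a positive log ratio, and one near the background mean gets a negative one.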
8
Recognition
9
Motorbikes Samples from appearance model
10
Learning
11
Learning
Task: Estimation of model parameters
Chicken and Egg type problem, since we initially know neither:
- Model parameters
- Assignment of regions to foreground / background
Let the assignments be a hidden variable and use the EM algorithm to learn them and the model parameters
12
Learning procedure
- Find regions & their location, scale & appearance
- Initialize model parameters
- Use EM and iterate to convergence:
  E-step: Compute assignments for which regions are foreground / background
  M-step: Update model parameters
- Trying to maximize likelihood: consistency in shape & appearance
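The E-step / M-step alternation can be illustrated with a deliberately simplified stand-in: a two-component 1-D Gaussian mixture in which each feature is softly assigned to "foreground" or "background". This is not the constellation model itself, which assigns regions to parts jointly over shape, scale and appearance. The data, the initialization and the function names are assumptions made for the example.

```python
import math

def gaussian_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def em_foreground_background(xs, iterations=50):
    """Toy EM: returns (fg_mean, bg_mean, soft foreground assignments)."""
    # Crude initialization (assumption): put the two means at the data extremes.
    fg_mean, bg_mean = min(xs), max(xs)
    fg_var = bg_var = 1.0
    fg_weight = 0.5
    resp = [0.5] * len(xs)
    for _ in range(iterations):
        # E-step: soft assignment of each feature to the foreground component.
        resp = []
        for x in xs:
            f = fg_weight * gaussian_pdf(x, fg_mean, fg_var)
            b = (1.0 - fg_weight) * gaussian_pdf(x, bg_mean, bg_var)
            total = f + b
            resp.append(f / total if total > 0.0 else 0.5)
        # M-step: re-estimate parameters from the soft assignments.
        n_fg = max(sum(resp), 1e-12)
        n_bg = max(len(xs) - sum(resp), 1e-12)
        fg_mean = sum(r * x for r, x in zip(resp, xs)) / n_fg
        bg_mean = sum((1.0 - r) * x for r, x in zip(resp, xs)) / n_bg
        fg_var = max(sum(r * (x - fg_mean) ** 2 for r, x in zip(resp, xs)) / n_fg, 1e-6)
        bg_var = max(sum((1.0 - r) * (x - bg_mean) ** 2 for r, x in zip(resp, xs)) / n_bg, 1e-6)
        fg_weight = min(max(n_fg / len(xs), 1e-6), 1.0 - 1e-6)
    return fg_mean, bg_mean, resp

# Synthetic features: one cluster near 0 and one near 5.
xs = [0.0, 0.1, -0.1, 0.2, 5.0, 5.1, 4.9, 5.2]
fg_mean, bg_mean, resp = em_foreground_background(xs)
```

Neither the parameters nor the assignments are known at the start, yet alternating the two steps recovers both on this toy data, which is the chicken-and-egg point made on the previous slide.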
13
Experiments
14
Experimental procedure
Two series of experiments:
- Fixed-scale model: objects the same size (manual normalization)
- Scale-invariant model: objects between 100 and 550 pixels in width
Training
- 50% of images
- No identification of object within image
Testing
- 50% of images
- Simple object present/absent test
Datasets: Motorbikes, Airplanes, Frontal Faces, Cars (Side), Cars (Rear), Spotted cats
15
Motorbikes
16
Background images evaluated with motorbike model
17
Frontal faces
18
Airplanes
19
Spotted cats
20
Summary of results (% equal error rate)
Dataset         Fixed scale experiment    Scale invariant experiment
Motorbikes      7.5                       6.7
Faces           4.6 (only one value listed)
Airplanes       9.8                       7.0
Cars (Rear)     15.2                      9.7
Spotted cats    10.0 (only one value listed)
Note: Within each series, same settings used for all datasets
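For readers unfamiliar with the metric, the sketch below shows one common way to compute an equal error rate from per-image scores in an object present/absent test: sweep a threshold and report the error where the false-negative and false-positive rates are closest. The scores are invented for the example and the function name is hypothetical. The slides do not describe how the authors computed their figures.

```python
def equal_error_rate(pos_scores, neg_scores):
    """Approximate EER, as a fraction in [0, 1], from observed score thresholds.

    pos_scores: scores for images that contain the object
    neg_scores: scores for background images
    An image is declared "object present" when its score >= threshold.
    """
    best_gap, best_rate = None, None
    for t in sorted(set(pos_scores) | set(neg_scores)):
        fnr = sum(s < t for s in pos_scores) / len(pos_scores)   # missed objects
        fpr = sum(s >= t for s in neg_scores) / len(neg_scores)  # false alarms
        gap = abs(fnr - fpr)
        if best_gap is None or gap < best_gap:
            best_gap, best_rate = gap, (fnr + fpr) / 2.0
    return best_rate

# Invented scores: one object image scores low, one background image scores high.
eer = equal_error_rate([0.9, 0.8, 0.7, 0.3], [0.1, 0.2, 0.4, 0.6])
separable = equal_error_rate([0.9, 0.8], [0.1, 0.2])
```

Multiplying the returned fraction by 100 gives a percentage on the same scale as the table above.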
21
Comparison to other methods (% equal error rate)
Dataset        Ours    Others
Motorbikes     7.5     16.0    Weber et al. [ECCV '00]
Faces          4.6     6.0     Weber
Airplanes      9.8     32.0    Weber
Cars (Side)    11.5    21.0    Agarwal & Roth [ECCV '02]
22
Robustness of Algorithm
23
Summary
- Comprehensive probabilistic model for object classes
- Learn appearance, shape, relative scale, occlusion etc. simultaneously in scale and translation invariant manner
- Same algorithm gives <= 10% error across 5 diverse datasets with identical settings
- Datasets available from: http://www.robots.ox.ac.uk/~vgg/data
Limitations and future work
- Very reliant on region detector. Future work: different part types (e.g. edgel curves)
- Only learns a single viewpoint. Future work: use mixture models
- Need lots of images to learn. Future work: Bayesian learning with fewer images [ICCV '03 (Fei Fei, Fergus, Perona)]
- Need more thorough testing. Future work: looking towards testing 100's of datasets
24
Overview of approaches to category recognition
[Figure: ease of training (unsupervised; labelled; normalized & labelled; normalized & labelled & segmented) against # categories (log 2). Plotted: Schneiderman & Kanade [CVPR '00], Viola & Jones [CVPR '01], Weber et al. [ECCV '00], Fergus et al. [CVPR '03], Humans]
25
Cars from rear
26
ROC equal error rates
- Pre-scaled data (identical settings)
- Scale-invariant learning and recognition
27
Image size histograms
28
Sampling from models: Faces, Motorbikes
29
Overview of Object Representation
- Objects represented as a constellation of parts
- The constellation of parts uses a generative parametric probabilistic model, which is an extension of Weber et al. [ECCV '00]
- It handles appearance, shape, relative scale, occlusion and background statistics, all in a probabilistic manner
- Model uses salient regions found by Kadir and Brady's interest operator
- PCA is used to give a compact representation of appearance for each region
30
Learning
Task: Estimation of model parameters
Done using the EM algorithm in a maximum-likelihood manner:
  E-step: Compute soft assignments for which P regions are foreground / background
  M-step: Update model parameters
- 200-400 images required to avoid overfitting
- 6 part model with 30 regions/image and 400 training images takes around 24 hrs to run (~100 EM iterations)
- Foreground/background decision is automated
- Characteristics of region detector are learnt
- Generative model, so no background set used
31
Salient regions
Find regions within image
- Use Kadir and Brady's salient region operator [IJCV '01]
- Uses gray-scale input
- Finds maxima in entropy over scale and location
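The idea of "maxima in entropy over scale" can be sketched with a much-reduced version: measure the gray-level histogram entropy in a window around a point at several radii and keep the radius with the highest entropy. This is an illustration only. The actual Kadir and Brady operator also weights entropy by how the local histogram changes across scale and searches over location, and the square windows, bin count and test image here are assumptions.

```python
import math
from collections import Counter

def window_entropy(image, cx, cy, radius, bins=16):
    """Shannon entropy (bits) of the gray-level histogram in a square window.

    image: list of rows of integer gray levels in 0..255
    The window is clipped at the image border.
    """
    counts = Counter()
    for y in range(max(0, cy - radius), min(len(image), cy + radius + 1)):
        for x in range(max(0, cx - radius), min(len(image[0]), cx + radius + 1)):
            counts[image[y][x] * bins // 256] += 1
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def best_scale(image, cx, cy, radii):
    """Radius with the highest window entropy (first one wins on ties)."""
    return max(radii, key=lambda r: window_entropy(image, cx, cy, r))

# Synthetic 9x9 image: dark interior with a bright one-pixel border.
SIZE = 9
ring = [[255 if x in (0, SIZE - 1) or y in (0, SIZE - 1) else 0
         for x in range(SIZE)] for y in range(SIZE)]
scale = best_scale(ring, 4, 4, [1, 2, 3, 4])
```

On this image the window stays uniform, with zero entropy, until the radius reaches the bright border, so the largest radius is selected.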
32
Motorbikes