Thomas Berg and Peter Belhumeur

“POOF: Part-Based One-vs-One Features for Fine-Grained Categorization, Face Verification, and Attribute Estimation”
Thomas Berg and Peter Belhumeur, CVPR 2013
VGG Reading Group, Eric Sommerlade

Summary:
A POOF is a scalar defined for a discriminative region between two classes and two landmarks, for a set of base features (e.g. HOG or colour histogram).
Perks:
- Regions automatically learned from the data set
- Great performance
- Transfers in knowledge from external datasets

Motivation:
Standard approach to part-based recognition:
- extract standard features (SIFT, HOG, LBP)
- train classifier
- relevant regions tuned by hand
Idea: “standard” features are hardly optimal for a specific problem. “Best” depends on:
- domain (dog features != bird features)
- task (face recognition != gender classification)

POOF feature learning:
From a dataset with landmark annotations:
- Choose feature part f
- Choose alignment part a
- Align and crop to a 128x64 region
- Larger/shorter distance between f and a -> coarser/finer scale

Cell sizes: 8x8 and 16x16 pixels -> 8*16 + 4*8 = 160 cells

Per cell:
- 8-bin gradient direction histogram, Dg=8 (‘gradhist’), or Felzenszwalb HOG, Dg=31
- Color histogram, Dc=32
Concatenated length: (Dg+Dc)*160
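The cell count and concatenated length quoted above can be checked with a few lines of arithmetic. This is only a sketch: it assumes the 128x64 crop is tiled by non-overlapping square cells at each scale, and the function names are made up for illustration.

```python
REGION_W, REGION_H = 128, 64  # aligned crop size given on the slide


def cells_in_grid(cell_size):
    """Number of non-overlapping square cells of this size tiling the crop."""
    return (REGION_W // cell_size) * (REGION_H // cell_size)


def descriptor_length(d_grad, d_color=32):
    """Concatenated length (Dg + Dc) * number of cells, over both scales."""
    n_cells = cells_in_grid(8) + cells_in_grid(16)
    return (d_grad + d_color) * n_cells


n_cells = cells_in_grid(8) + cells_in_grid(16)
gradhist_len = descriptor_length(8)   # 'gradhist' base feature, Dg=8
hog_len = descriptor_length(31)       # Felzenszwalb HOG, Dg=31
print(n_cells, gradhist_len, hog_len)
```

Under these assumptions the base descriptor has 160 cells, giving 6400 dimensions with gradhist and 10080 with HOG.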

For each scale (8x8, 16x16):
- learn a linear SVM, get weights w
- keep max abs(w_i) per cell
- keep cells with max(w_c) >= median(max(w_c))
- keep the connected component (4?) starting at f
[Figure: weight vector W split into cells c1, c2, …, cn; per-cell max; threshold]

Retrain the SVM on the selected cells only.
Get POOF (bitmap + SVM weight vector)
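The cell-selection step for a single scale might look like the sketch below. It is not the authors' code. It assumes 4-connectivity (the slide itself marks this with a question mark) and assumes the cell containing f is always kept as the seed, even if it falls below the median. The name `select_cells` and the toy weights are hypothetical.

```python
from collections import deque
from statistics import median


def select_cells(weights, seed):
    """Return a boolean mask of the cells kept for one scale.

    weights[r][c] is the list of SVM weights belonging to cell (r, c);
    seed is the (row, col) of the cell containing the feature part f.
    """
    rows, cols = len(weights), len(weights[0])
    # Strength of a cell = largest absolute SVM weight inside it.
    strength = [[max(abs(w) for w in cell) for cell in row] for row in weights]
    # Candidate cells are those at or above the median strength.
    threshold = median(s for row in strength for s in row)
    candidate = [[s >= threshold for s in row] for row in strength]

    # Flood fill from the seed over 4-connected candidate cells.
    # Assumption: the seed cell is always kept, even if below the threshold.
    mask = [[False] * cols for _ in range(rows)]
    mask[seed[0]][seed[1]] = True
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and candidate[nr][nc] and not mask[nr][nc]:
                mask[nr][nc] = True
                queue.append((nr, nc))
    return mask


# Toy 3x4 grid with two weights per cell (values are made up).
grid = [[0.9, -0.8, 0.1, 0.7],
        [0.2, 0.6, 0.1, 0.1],
        [0.1, 0.1, 0.1, 0.9]]
toy_weights = [[[v, -0.5 * v] for v in row] for row in grid]
mask = select_cells(toy_weights, seed=(0, 0))
for row in mask:
    print("".join("#" if kept else "." for kept in row))
```

In the toy grid the strong cells at the right edge pass the median threshold but are not connected to the seed, so only the 2x2 block around f survives. The SVM would then be retrained on the features of the surviving cells.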

POOF feature extraction:
- Find corresponding landmarks (authors use Belhumeur CVPR 2011)
- Align & crop to 128x64 region
- Get base features
- Get SVM score from features in masked region
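The last extraction step reduces the masked region to one scalar, which is the POOF value. A minimal sketch, assuming the retrained linear SVM is stored as per-cell weight lists plus a bias (the function name and the data layout are illustrative, not taken from the paper):

```python
def poof_score(cell_features, mask, cell_weights, bias=0.0):
    """Linear SVM score computed only over the cells selected by the mask.

    cell_features[r][c] and cell_weights[r][c] are equal-length lists of
    base-feature values and SVM weights for cell (r, c).
    """
    score = bias
    for r, mask_row in enumerate(mask):
        for c, selected in enumerate(mask_row):
            if selected:
                # Dot product between this cell's weights and its features.
                score += sum(w * x for w, x in
                             zip(cell_weights[r][c], cell_features[r][c]))
    return score


# Toy example: a 1x2 grid where only the first cell is in the mask.
features = [[[1.0, 2.0], [3.0, 4.0]]]
weights = [[[0.5, -1.0], [9.0, 9.0]]]
mask = [[True, False]]
score = poof_score(features, mask, weights, bias=0.25)
print(score)
```

The second cell has large weights but lies outside the mask, so it contributes nothing to the score.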

Results: categorization
- UCSD birds dataset, 200 classes
- 13 landmarks used
- About 5m POOF combinations possible
- Randomly chosen subset of 5000 POOFs
- Use as feature vector in one-vs-all linear SVM
- Evaluation on gt bbox of object, with gt landmarks or detected landmarks
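To make the random subset concrete, the sketch below draws distinct POOF specifications. It assumes a POOF is identified by an unordered pair of classes and an ordered pair of distinct landmarks (f, a), with the base feature fixed per experiment. That parameterisation and the function name are assumptions for illustration, and the slide does not say how its "about 5m" figure was counted.

```python
import math
import random


def sample_poof_specs(n_classes, n_parts, k, seed=0):
    """Draw k distinct (class1, class2, f, a) tuples uniformly at random."""
    total = math.comb(n_classes, 2) * n_parts * (n_parts - 1)
    if k > total:
        raise ValueError("cannot draw more specs than exist")
    rng = random.Random(seed)
    specs = set()
    while len(specs) < k:
        c1, c2 = sorted(rng.sample(range(n_classes), 2))  # unordered class pair
        f, a = rng.sample(range(n_parts), 2)              # feature part, alignment part
        specs.add((c1, c2, f, a))
    return sorted(specs)


specs = sample_poof_specs(n_classes=200, n_parts=13, k=5000)
print(len(specs), specs[0])
```

Each sampled tuple would correspond to one learned POOF, and the 5000 scores extracted from an image would form its feature vector for the one-vs-all SVM.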

Methods compared: gradhist, HOG, lowlevel baseline, [27], [4] (MKL), [33] (RF), [32], [8], [35]
200det: 54 56 28
14det: 65 70 57
200gt: 69 73 40 17 19
14gt: 80 85 44
5det: 55

Results: Face Verification
Are two images of the same person?
- LFW dataset
- 16 landmarks, 120 subjects -> ~3.5m POOF choices
- Each image I yields a vector f(I) of randomly chosen POOFs
- For an image pair (I, J), concat [|f(I)-f(J)|, f(I).*f(J)]
- Train same-vs-different classifier
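The pair representation is the element-wise absolute difference followed by the element-wise product of the two POOF vectors. A small sketch, with a hypothetical function name and made-up values:

```python
def pair_features(f_i, f_j):
    """Concatenate |f(I) - f(J)| and f(I) .* f(J) for one image pair."""
    if len(f_i) != len(f_j):
        raise ValueError("POOF vectors must have the same length")
    abs_diff = [abs(a - b) for a, b in zip(f_i, f_j)]
    product = [a * b for a, b in zip(f_i, f_j)]
    return abs_diff + product


f_i = [1.0, -2.0, 0.5]   # POOF scores for image I (made-up values)
f_j = [0.5, 1.0, 0.5]    # POOF scores for image J
pair = pair_features(f_i, f_j)
print(pair)
```

Both halves are symmetric in I and J, so the same-vs-different classifier sees the same input whichever order the pair is given in.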

Performance equal to Tom-vs-Pete (BMVC 2012)
But:
- Support regions learned automatically
- Linear SVM, not RBF -> faster
Uses same “identity preserving alignment” on landmark detections [2]
[Figure: input; affine; canonical; mean of all closest in dataset]

Results: Attribute classification
- Attributes such as gender, “big nose”, “eyeglasses” (Kumar [14])
- POOFs learned as before, on LFW dataset
- Extract POOFs from attribute dataset
- Train linear SVM for each attribute
- POOFs transfer discriminability from different classes -> no need for fully labelled attribute dataset

Results: Attribute classification
- Restricted number of attribute samples
- POOF features don’t latch on to noise …
