Presentation on theme: "Jan-Michael Frahm, Enrique Dunn Spring 2013"— Presentation transcript:
1Jan-Michael Frahm, Enrique Dunn Spring 2013 776 Computer VisionFace/Object detectionJan-Michael Frahm, Enrique DunnSpring 2013
2Previous Lecture The Viola/Jones Face Detector A seminal approach to real-time object detectionTraining is slow, but detection is very fastKey ideasIntegral images for fast feature evaluationBoosting for feature selectionAttentional cascade for fast rejection of non-face windowsP. Viola and M. Jones. Rapid object detection using a boosted cascade of simple features. CVPR 2001.P. Viola and M. Jones. Robust real-time face detection. IJCV 57(2), 2004.
3Image Features “Rectangle filters” Value = ∑ (pixels in white area) – ∑ (pixels in black area)For real problems results are only as good as the features used...This is the main piece of ad-hoc (or domain) knowledgeRather than the pixels, we have selected a very large set of simple functionsSensitive to edges and other critcal features of the image** At multiple scalesSince the final classifier is a perceptron it is important that the features be non-linear… otherwise the final classifier will be a simple perceptron.We introduce a threshold to yield binary features
4Computing the integral image ii(x, y-1)s(x-1, y)i(x, y)Cumulative row sum: s(x, y) = s(x–1, y) + i(x, y) Integral image: ii(x, y) = ii(x, y−1) + s(x, y)MATLAB: ii = cumsum(cumsum(double(i)), 2);
5Boosting for face detection First two features selected by boosting: This feature combination can yield 100% detection rate and 50% false positive rate
7Receiver operating characteristic Attentional cascadeChain classifiers that are progressively more complex and have lower false positive rates:Receiver operating characteristicvsfalsenegdetermined by% False Pos% Detection50In general simple classifiers, while they are more efficient, they are also weaker.We could define a computational risk hierarchy (in analogy with structural risk minimization)…A nested set of classifier classesThe training process is reminiscent of boosting…- previous classifiers reweight the examples used to train subsequent classifiersThe goal of the training process is different- instead of minimizing errors minimize false positivesTTTTIMAGESUB-WINDOWFACEClassifier 1Classifier 2Classifier 3FFFNON-FACENON-FACENON-FACE
8Boosting vs. SVM Advantages of boosting Disadvantages Integrates classifier training with feature selectionComplexity of training is linear instead of quadratic in the number of training examplesFlexibility in the choice of weak learners, boosting schemeTesting is fastEasy to implementDisadvantagesNeeds many training examplesTraining is slowOften doesn’t work as well as SVM (especially for many-class problems)
9Face Recognition Attributes for training Similes for training N. Kumar, A. C. Berg, P. N. Belhumeur, and S. K. Nayar, "Attribute and Simile Classifiers for Face Verification," ICCV 2009.
10Face Recognition with Attributes Low-levelfeaturesImagesAttributesVerificationRGBHOGLBPSIFT…+-DarkhairAsianRoundJawDifferentMaleRGBHOGLBPSIFT…+-
11Learning an attribute classifier TrainingimagesLow-levelfeaturesFeatureselectionTrainclassifierRGBHoGHSV…RGB, NoseHoG, EyesGenderclassifierMalesHSV, HairRGBHoGHSV…Edges, Mouth…Male0.87Females
14Principal Component Analysis A N x N pixel image of a face, represented as a vector occupies a single point in N2-dimensional image space.Images of faces being similar in overall configuration, will not be randomly distributed in this huge image space.Therefore, they can be described by a low dimensional subspace.Main idea of PCA for faces:To find vectors that best account for variation of face images in entire image space.These vectors are called eigen vectors.Construct a face space and project the images into this face space (eigenfaces).
15Image RepresentationTraining set of m images of size N*N are represented by vectors of size N2x1,x2,x3,…,xMExample
16Average Image and Difference Images The average training set is defined bym= (1/m) ∑mi=1 xiEach face differs from the average by vectorri = xi – m
17Covariance Matrix The covariance matrix is constructed as C = AAT where A=[r1,…,rm]Finding eigenvectors of N2 x N2 matrix is intractable. Hence, use the matrix ATA of size m x m and find eigenvectors of this small matrix.Size of this matrix is N2 x N2
18Eigenvalues and Eigenvectors - Definition If v is a nonzero vector and λ is a number such thatAv = λv, then v is said to be an eigenvector of A with eigenvalue λ.Examplel(eigenvalues)Av(eigenvectors)
19Eigenvectors of Covariance Matrix The eigenvectors vi of ATA are:Consider the eigenvectors vi of ATA such thatATAvi = iviPremultiplying both sides by A, we haveAAT(Avi) = i(Avi)
20Face Space A face image can be projected into this face space by The eigenvectors of covariance matrix areui = AviFace Spaceui resemble facial images which look ghostly, hence called EigenfacesA face image can be projected into this face space bypk = UT(xk – m) where k=1,…,m
27Limitations of Eigenfaces Approach Variations in lighting conditionsDifferent lighting conditions for enrolment and query.Bright light causing image saturation.Differences in pose – Head orientation- 2D feature distances appear to distort.Expression- Change in feature location and shape.
28Linear Discriminant Analysis PCA does not use class informationPCA projections are optimal for reconstruction from a low dimensional basis, they may not be optimal from a discrimination standpoint.LDA is an enhancement to PCAConstructs a discriminant subspace that minimizes the scatter between images of same class and maximizes the scatter between different class images
29More sliding window detection: Discriminative part-based models Many slides based on P. Felzenszwalb
34Pedestrian detection Features: Histograms of oriented gradients (HOG) Learn a pedestrian template using a linear support vector machineAt test time, convolve feature map with templateHOG feature mapTemplateDetector response mapN. Dalal and B. Triggs, Histograms of Oriented Gradients for Human Detection, CVPR 2005
36Discriminative part-based models Root filterPart filtersDeformation weightsP. Felzenszwalb, R. Girshick, D. McAllester, D. Ramanan, Object Detection with Discriminatively Trained Part Based Models, PAMI 32(9), 2010
38Object hypothesisMultiscale model: the resolution of part filters is twice the resolution of the root
39Scoring an object hypothesis The score of a hypothesis is the sum of filter scores minus the sum of deformation costsSubwindow featuresDisplacementsFiltersDeformation weights
40Scoring an object hypothesis The score of a hypothesis is the sum of filter scores minus the sum of deformation costsRecall: pictorial structuresSubwindow featuresDisplacementsFiltersDeformation weightsMatching costDeformation cost
41Scoring an object hypothesis The score of a hypothesis is the sum of filter scores minus the sum of deformation costsSubwindow featuresDisplacementsFiltersDeformation weightsConcatenation of filter and deformation weightsConcatenation of subwindow features and displacements
42DetectionDefine the score of each root filter location as the score given the best part placements:
43DetectionDefine the score of each root filter location as the score given the best part placements:Efficient computation: generalized distance transformsFor each “default” part location, find the best-scoring displacementHead filter responsesDistance transformHead filter
46Training Training data consists of images with labeled bounding boxes Need to learn the filters and deformation parameters
47Training Our classifier has the form w are model parameters, z are latent hypothesesLatent SVM training:Initialize w and iterate:Fix w and find the best z for each training example (detection)Fix z and solve for w (standard SVM training)Issue: too many negative examplesDo “data mining” to find “hard” negatives
56Quantitative results (PASCAL 2008) 7 systems competed in the 2008 challengeOut of 20 classes, first place in 7 classes and second place in 8 classesBicyclesPersonBirdProposed approachProposed approachProposed approach