Automatic Matching of Multi-View Images

Ed Bremer, University of Rochester

References
[1] Mikolajczyk, K., Schmid, C., 2004. A performance evaluation of local descriptors. Submitted to PAMI, October 2004.
[2] Mikolajczyk, K., Tuytelaars, T., Schmid, C., Zisserman, A., Matas, J., Schaffalitzky, F., Kadir, T., Van Gool, L., 2004. A comparison of affine region detectors. Submitted to International Journal of Computer Vision, August 2004.
[3] Lowe, D., 2004. Distinctive Image Features from Scale-Invariant Keypoints. International Journal of Computer Vision, 60(2).
[4] Matas, J., Chum, O., Urban, M., Pajdla, T., 2002. Robust Wide Baseline Stereo From Maximally Stable Extremal Regions. Proc. British Machine Vision Conference (BMVC 2002), pages 384-393.
[5] Zisserman, A., Schaffalitzky, F., 2002. Multi-view matching for unordered image sets, or "How do I organize my holiday snaps?". Proceedings of the 7th European Conference on Computer Vision, Copenhagen, Denmark, vol. 1.
[6] Baumberg, A., 2000. Reliable Feature Matching Across Widely Separated Views. In Proc. CVPR.
[7] Mikolajczyk, K., Schmid, C., 2001. Indexing based on scale invariant interest points. In Proc. 8th ICCV.

Outline
- Motivation
- Applications
- Process Components
- Region Detectors
- Descriptors
- Matching Criteria
- Performance Evaluation
- Conclusion & Next Steps

Motivation
Multi-view/Multi-image Matching
- Multiple images of a scene taken by a single camera or by multiple cameras, with different rotation, scale, viewpoint and illumination
(Figure: 3D scene)

Motivation: Applications
Detecting matching regions is used in all of the following:
- Image registration
- Super-resolution
- Stereo vision
- Object detection and recognition
- Object and motion tracking
- Indexing and retrieval of objects
- 3D scene reconstruction
- Scene recognition

Examples of Multi-view Images [2]

Process Components
- Covariant region detection: detect image regions covariant to the class of transformations between the reference image and the transformed image
- Invariant descriptor: compute invariant descriptors from the covariant regions
- Descriptor matching: compute the distance between descriptors in the reference image and the transformed image
[1]

Region Detectors
- Support regions for the computation of descriptors
- Determined independently in each image
- Scale invariant or affine invariant
- Can be points (feature points) or regions (covariant)
- Provide dense (local) coverage, robust to occlusion
- Need to be stable and repeatable
Five region detectors:
- Harris points -> invariant to rotation
- Harris-Laplace -> invariant to rotation and scale
- Hessian-Laplace -> invariant to rotation and scale
- Harris-Affine -> invariant to affine image transformations
- Hessian-Affine -> invariant to affine image transformations
[1]

Region Detectors
Harris points
- Maxima of the Harris function are used to locate interest points
- Support region is fixed in size: a 41x41 neighborhood centered at the interest point
Harris-Laplace regions
- Scale-adapted Harris function
- Scale is selected where the Laplacian-of-Gaussian attains a local minimum or maximum across scale-space
[1]
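As a rough illustration of the Harris function mentioned above, the following sketch computes the response R = det(M) - k * trace(M)^2 from the second moment matrix M. The function name `harris_response`, the unweighted 3x3 window, and k = 0.04 are assumptions chosen for brevity, not values taken from the slides.

```python
def harris_response(img, k=0.04):
    """Harris response R = det(M) - k * trace(M)^2 for every interior pixel.

    img is a list of rows of floats. M is the second moment matrix summed
    over an unweighted 3x3 window (assumption; a Gaussian window is common).
    """
    h, w = len(img), len(img[0])

    def px(y, x):
        # Clamp coordinates so border pixels reuse their nearest neighbor.
        return img[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]

    # Central-difference gradients in x and y.
    ix = [[(px(y, x + 1) - px(y, x - 1)) / 2.0 for x in range(w)] for y in range(h)]
    iy = [[(px(y + 1, x) - px(y - 1, x)) / 2.0 for x in range(w)] for y in range(h)]

    resp = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            sxx = syy = sxy = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    gx, gy = ix[y + dy][x + dx], iy[y + dy][x + dx]
                    sxx += gx * gx
                    syy += gy * gy
                    sxy += gx * gy
            resp[y][x] = (sxx * syy - sxy * sxy) - k * (sxx + syy) ** 2
    return resp


# A bright square in the lower-right quadrant; its corner sits at (y=4, x=4).
image = [[1.0 if (x >= 4 and y >= 4) else 0.0 for x in range(9)] for y in range(9)]
R = harris_response(image)
```

In this toy image the response is positive at the corner, negative along the straight edge, and zero in flat areas, which is why maxima of R are used as interest points.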

Region Detectors
Harris-Laplace performance
- Approximately 10% better than the Laplacian, Lowe or gradient methods
- The standard Harris detector is very poor under scale changes
[7]

Region Detectors
Hessian-Laplace regions
- Interest point is at a local maximum of the Hessian determinant
- Scale is selected at the maximum of the Laplacian-of-Gaussian across scale-space (Difference-of-Gaussians can also be used)
[1], [3]

Region Detectors
Harris-Affine regions
- Find regions using the Harris-Laplace detector
- Region is affine adapted based on the 2nd moment matrix
Hessian-Affine regions
- Find regions using the Hessian-Laplace detector
- Region is affine adapted based on the 2nd moment matrix
[2]

Region Detectors
(Figure: regions produced by the Harris-Affine and Hessian-Affine detectors [2])

Region Detectors
(Figure: affine normalization using the 2nd moment matrix for regions L and R [2])

Region Detectors
Region normalization
- Detectors produce circular or elliptical regions
- Size depends on the detection scale
- Map regions to a circular region with constant radius
- Rotate regions in the direction of the dominant gradient orientation
Illumination normalization
- Model the illumination change as an affine transformation of intensity -> aI(x) + b
- Normalize using the mean and standard deviation of the pixel intensities
[1]
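The illumination normalization step can be sketched as follows. This is a minimal illustration, assuming a patch stored as a list of rows; the function name `normalize_illumination` is hypothetical. Subtracting the mean removes b and dividing by the standard deviation removes a (for a > 0).

```python
from statistics import mean, pstdev


def normalize_illumination(patch):
    """Return the patch rescaled to zero mean and unit standard deviation."""
    values = [v for row in patch for v in row]
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant patch carries no structure; map it to zeros.
        return [[0.0 for _ in row] for row in patch]
    return [[(v - mu) / sigma for v in row] for row in patch]


patch = [[10.0, 20.0], [30.0, 40.0]]
# The same patch under an affine illumination change a*I + b (a = 2, b = 5).
brighter = [[2.0 * v + 5.0 for v in row] for row in patch]

norm_a = normalize_illumination(patch)
norm_b = normalize_illumination(brighter)
```

Both normalized patches come out the same up to floating-point error, which is the invariance the slide describes.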

Descriptors
Descriptors -> feature vector
- Invariant to changes in scale and rotation, to affine transformations and to affine illumination changes
- Need to be distinct, stable and repeatable
- Distribution (histogram) type or covariance type
Ten descriptor types:
- Scale-Invariant Feature Transform (SIFT)
- Gradient Location and Orientation Histogram (GLOH)
- Shape Context
- Principal Component Analysis (PCA)-SIFT
- Steerable Filters
- Differential Invariants
- Complex Filters
- Moment Invariants
- Cross-Correlation
- Spin Image
[1]

Descriptors
SIFT and GLOH 3D descriptors
- SIFT -> 4 x 4 x 8 = 128-dimensional descriptor
- GLOH -> log-polar, [(2 x 8) + 1] x 16 = 272-dimensional descriptor
[1]

Matching Criteria
Distance measure
- Find putative matches between images
- Mahalanobis distance: used for covariance-type descriptors
- Euclidean distance: used for distribution (histogram) descriptors
- Direct distance comparison is not suitable for indexing or database searching
Simple threshold
- Descriptors match if the distance between them is below a threshold t
- A descriptor in the reference image can have many matches to descriptors in the transformed image
Nearest Neighbor (NN)
- Find the closest match between descriptors in the reference and transformed images
- A descriptor in the reference image can have only 1 match to a descriptor in the transformed image
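The difference between the two matching criteria can be sketched with Euclidean distance on toy descriptors. The function names are hypothetical, and applying the threshold t to the nearest neighbor as well is an assumption made here so that distant nearest neighbors are not accepted.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two descriptors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def threshold_matches(ref, trans, t):
    """All (i, j) pairs with distance below t; one i may match many j."""
    return [(i, j)
            for i, a in enumerate(ref)
            for j, b in enumerate(trans)
            if euclidean(a, b) < t]


def nearest_neighbor_matches(ref, trans, t):
    """For each reference descriptor keep only its closest match, if below t."""
    matches = []
    for i, a in enumerate(ref):
        j = min(range(len(trans)), key=lambda idx: euclidean(a, trans[idx]))
        if euclidean(a, trans[j]) < t:
            matches.append((i, j))
    return matches


ref = [[0.0, 0.0], [5.0, 5.0]]
trans = [[0.1, 0.0], [0.3, 0.0], [5.0, 5.2]]

by_threshold = threshold_matches(ref, trans, 0.5)
by_nn = nearest_neighbor_matches(ref, trans, 0.5)
```

With these toy values the threshold criterion returns two matches for the first reference descriptor, while nearest-neighbor matching keeps only the closer one.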

Performance Evaluation
Criterion basis
- Recall rate = #correct matches / #correspondences
- 1-precision = #false matches / (#correct matches + #false matches)
- Ideal descriptor -> recall rate = 1 for any precision, given no overlap error
[1]
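A small worked example of the two evaluation measures, using made-up counts (80 correct matches, 20 false matches, 100 ground-truth correspondences) purely for illustration:

```python
def recall(correct_matches, correspondences):
    """Fraction of ground-truth correspondences that were correctly matched."""
    return correct_matches / correspondences


def one_minus_precision(correct_matches, false_matches):
    """Fraction of all returned matches that are false."""
    return false_matches / (correct_matches + false_matches)


# Hypothetical counts, not results from the cited evaluation.
r = recall(80, 100)
p = one_minus_precision(80, 20)
```

Sweeping the matching threshold t and plotting recall against 1-precision gives the curves used to compare descriptors.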

SIFT - Scale Invariant Feature Transform
Scale Invariant Feature Transform (SIFT), Lowe [3]
Features
- Invariant to image scale and rotation
- Invariant to small changes in illumination and 3D camera viewpoint
- Extracts a large number of highly distinctive features
- Enables detection of small objects
- Improved performance in cluttered scenes
- Algorithms are efficient: complex operations are applied to local regions or features rather than to the whole image
Procedure
1. Scale-space extrema detection
2. Keypoint localization
3. Orientation assignment
4. Keypoint vector (descriptor)

SIFT - Scale Invariant Feature Transform [3]
Scale-Space Blob Detector
- Search for stable features over all scales and image locations
- Scale-space kernel -> Gaussian function
- Difference of Gaussian

Difference of Gaussian (DoG)
- Simple subtraction of adjacent blurred images L: D(x, y, σ) = L(x, y, kσ) - L(x, y, σ)
- Approximation to the scale-normalized Laplacian of Gaussian
- Maxima and minima of the scale-normalized Laplacian produce the most stable image features compared with the gradient, Hessian, or Harris corner function (Mikolajczyk 2002)
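The subtraction of blurred images can be sketched in plain Python as below. The kernel radius of 3σ, the clamped borders, and the function names are assumptions for illustration; a real implementation would normally use an image-processing library.

```python
import math


def gaussian_kernel(sigma):
    """Normalized 1D Gaussian kernel truncated at about 3 sigma (assumption)."""
    radius = max(1, int(math.ceil(3 * sigma)))
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur(img, sigma):
    """Separable Gaussian blur with clamped borders."""
    kernel = gaussian_kernel(sigma)
    r = len(kernel) // 2
    h, w = len(img), len(img[0])
    rows = [[sum(kernel[i + r] * img[y][min(max(x + i, 0), w - 1)]
                 for i in range(-r, r + 1))
             for x in range(w)] for y in range(h)]
    return [[sum(kernel[i + r] * rows[min(max(y + i, 0), h - 1)][x]
                 for i in range(-r, r + 1))
             for x in range(w)] for y in range(h)]


def difference_of_gaussian(img, sigma, k):
    """D = L(k * sigma) - L(sigma), computed pixel by pixel."""
    wide = blur(img, k * sigma)
    narrow = blur(img, sigma)
    return [[p - q for p, q in zip(rw, rn)] for rw, rn in zip(wide, narrow)]


# A single bright dot on a dark background, and a constant image.
dot = [[1.0 if (x == 5 and y == 5) else 0.0 for x in range(11)] for y in range(11)]
flat = [[5.0] * 7 for _ in range(7)]

dog_dot = difference_of_gaussian(dot, 1.0, 2.0 ** (1.0 / 3.0))
dog_flat = difference_of_gaussian(flat, 1.0, 2.0 ** (1.0 / 3.0))
```

The DoG of a constant image is zero everywhere, and a bright blob gives a negative response at its center, so blobs show up as extrema of D.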

Scale-Space Image Set
- Divide each octave into s intervals
- Compute s + 3 filtered (increasingly blurry) images, with k = 2^(1/s)
- For s = 3, k = 2^(1/3) ≈ 1.26:
  6th -> 3.17σ
  5th -> 2.52σ
  4th -> 2.00σ
  3rd -> 1.59σ
  2nd -> 1.26σ
  1st -> 1.00σ
- Subtract adjacent images to produce the DoG images
- Repeat for the next octave, starting from the image with twice the initial σ (2.00σ, two images from the top of the stack) decimated by 2
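The blur levels within one octave follow directly from k = 2^(1/s). A minimal sketch (the function name `octave_sigmas` and the base blur `sigma0 = 1.0` are illustrative assumptions) reproduces the listed multiples of σ up to rounding:

```python
def octave_sigmas(s, sigma0=1.0):
    """Blur levels of the s + 3 Gaussian images in one octave."""
    k = 2.0 ** (1.0 / s)
    return [sigma0 * k ** i for i in range(s + 3)]


sigmas = octave_sigmas(3)
rounded = [round(v, 2) for v in sigmas]
# s + 3 Gaussian images give s + 2 DoG images after subtracting neighbors.
num_dog_images = len(sigmas) - 1
```

The fourth image has exactly twice the initial blur, which is why it can seed the next octave after downsampling.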

(Figure: scale-space pyramid, from Lowe [3])

Locating Scale-Space Extrema
- Detect local maxima and minima of D(x, y, σ)
- Compare each sample point to its 8 neighbors in the same scale image and to the 9 neighbors in each of the scale images above and below (26 neighbors in total)
- Mark the sample if it is greater than or less than all of these neighbors
- Extrema are searched in s of the DoG images per octave
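The 26-neighbor comparison can be sketched as follows, assuming three adjacent DoG images given as lists of rows. The function name `is_extremum` is hypothetical, and the strict comparison (ties are rejected) is a choice made for this sketch.

```python
def is_extremum(below, current, above, y, x):
    """True if current[y][x] is strictly above or strictly below all 26 neighbors."""
    center = current[y][x]
    neighbors = []
    for layer_index, layer in enumerate((below, current, above)):
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                if layer_index == 1 and dy == 0 and dx == 0:
                    continue  # skip the sample point itself
                neighbors.append(layer[y + dy][x + dx])
    return (all(center > n for n in neighbors)
            or all(center < n for n in neighbors))


flat = [[0.0] * 3 for _ in range(3)]
peak = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
pit = [[0.0, 0.0, 0.0], [0.0, -1.0, 0.0], [0.0, 0.0, 0.0]]
```

Here `is_extremum(flat, peak, flat, 1, 1)` and `is_extremum(flat, pit, flat, 1, 1)` both hold, while a completely flat neighborhood is rejected.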

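Improving Localization
- Reject points that have low contrast using $|D(\hat{\mathbf{x}})| < \text{threshold}$, where
  $D(\hat{\mathbf{x}}) = D + \frac{1}{2} \frac{\partial D}{\partial \mathbf{x}}^{T} \hat{\mathbf{x}}$
- The offset of the extremum is given by
  $\hat{\mathbf{x}} = -\left(\frac{\partial^{2} D}{\partial \mathbf{x}^{2}}\right)^{-1} \frac{\partial D}{\partial \mathbf{x}}$
- The Hessian and the derivative of D(x, y, σ) are computed using differences of neighboring sample points
- $\mathbf{x} = (x, y, \sigma)^{T}$ is the offset from the sample point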
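The localization step can be sketched on a 3x3x3 block of DoG values. This is an illustrative sketch, not Lowe's implementation: the function names, the `cube[s][y][x]` layout, and the use of Cramer's rule for the 3x3 solve are assumptions.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(a, b):
    """Solve the 3x3 linear system a * x = b with Cramer's rule."""
    d = det3(a)
    if abs(d) < 1e-12:
        raise ValueError("singular Hessian")
    out = []
    for col in range(3):
        m = [[b[r] if c == col else a[r][c] for c in range(3)] for r in range(3)]
        out.append(det3(m) / d)
    return out


def refine_extremum(cube):
    """cube[s][y][x] holds DoG values around the sample point at cube[1][1][1].

    Returns (offset, value): offset = (dx, dy, dsigma) of the fitted extremum
    and the interpolated D at that offset.
    """
    c = cube[1][1][1]
    # First derivatives by central differences.
    g = [(cube[1][1][2] - cube[1][1][0]) / 2.0,
         (cube[1][2][1] - cube[1][0][1]) / 2.0,
         (cube[2][1][1] - cube[0][1][1]) / 2.0]
    # Second derivatives by differences of neighboring samples.
    dxx = cube[1][1][2] + cube[1][1][0] - 2 * c
    dyy = cube[1][2][1] + cube[1][0][1] - 2 * c
    dss = cube[2][1][1] + cube[0][1][1] - 2 * c
    dxy = (cube[1][2][2] - cube[1][2][0] - cube[1][0][2] + cube[1][0][0]) / 4.0
    dxs = (cube[2][1][2] - cube[2][1][0] - cube[0][1][2] + cube[0][1][0]) / 4.0
    dys = (cube[2][2][1] - cube[2][0][1] - cube[0][2][1] + cube[0][0][1]) / 4.0
    hessian = [[dxx, dxy, dxs], [dxy, dyy, dys], [dxs, dys, dss]]
    offset = solve3(hessian, [-v for v in g])
    value = c + 0.5 * sum(gi * oi for gi, oi in zip(g, offset))
    return offset, value


def f(x, y, s):
    # Synthetic quadratic with its maximum (value 1.0) at (0.2, -0.1, 0.3).
    return 1.0 - (x - 0.2) ** 2 - (y + 0.1) ** 2 - (s - 0.3) ** 2


cube = [[[f(x, y, s) for x in (-1, 0, 1)] for y in (-1, 0, 1)] for s in (-1, 0, 1)]
offset, value = refine_extremum(cube)
```

For this synthetic quadratic the recovered offset is (0.2, -0.1, 0.3) and the interpolated value is 1.0, up to floating-point error. A keypoint would then be rejected when the absolute interpolated value falls below the chosen contrast threshold.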

Edge Rejection
- Eliminate poorly defined peaks (edges) using the 2x2 Hessian matrix H of D
- Verify that the ratio of principal curvatures is less than a threshold r = 10, i.e. $\mathrm{Tr}(H)^{2} / \mathrm{Det}(H) < (r + 1)^{2} / r$
- Efficient to compute -> fewer than 20 floating point operations
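The curvature-ratio test needs only the trace and determinant of the 2x2 Hessian, so the eigenvalues never have to be computed. A minimal sketch, with a hypothetical function name and r = 10 as the default:

```python
def passes_edge_test(dxx, dyy, dxy, r=10.0):
    """True if the principal-curvature ratio of the 2x2 Hessian is below r."""
    trace = dxx + dyy
    det = dxx * dyy - dxy * dxy
    if det <= 0:
        # Curvatures of opposite sign (or a zero curvature): not a clean peak.
        return False
    return trace * trace / det < (r + 1) ** 2 / r


blob_ok = passes_edge_test(-2.0, -2.0, 0.0)    # equal curvatures
edge_ok = passes_edge_test(-20.0, -1.0, 0.0)   # one curvature dominates
saddle_ok = passes_edge_test(1.0, -1.0, 0.0)   # opposite signs
```

The blob-like point passes, while the edge-like and saddle points are rejected.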

Results from Lowe [3]: 832 keypoints reduced to 536 (233x189 image)

SIFT - Scale Invariant Feature Transform
Results from Lowe [3]: performance measures

Orientation: rotational invariance
- Use the scale of the keypoint to select the blurred image L(x, y, σ)
- Compute the gradient magnitude m(x, y) and orientation θ(x, y) at each image sample using pixel differences:
  $m(x, y) = \sqrt{(L(x+1, y) - L(x-1, y))^{2} + (L(x, y+1) - L(x, y-1))^{2}}$
  $\theta(x, y) = \tan^{-1}\left(\frac{L(x, y+1) - L(x, y-1)}{L(x+1, y) - L(x-1, y)}\right)$
- Build an orientation histogram of the sample points: entries are weighted by gradient magnitude and by a Gaussian window around the keypoint, and the bins cover the 360° range
- Peaks in the histogram correspond to dominant directions of the local gradients
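The orientation histogram can be sketched as below. The 36-bin default follows Lowe's paper rather than the slides, and the function names, window radius, and window sigma are illustrative assumptions.

```python
import math


def gradient(L, y, x):
    """Gradient magnitude and orientation (degrees in [0, 360)) by pixel differences."""
    dx = L[y][x + 1] - L[y][x - 1]
    dy = L[y + 1][x] - L[y - 1][x]
    return math.hypot(dx, dy), math.degrees(math.atan2(dy, dx)) % 360.0


def orientation_histogram(L, cy, cx, radius, sigma, bins=36):
    """Histogram of orientations weighted by magnitude and a Gaussian window."""
    hist = [0.0] * bins
    for y in range(cy - radius, cy + radius + 1):
        for x in range(cx - radius, cx + radius + 1):
            m, theta = gradient(L, y, x)
            weight = math.exp(-((y - cy) ** 2 + (x - cx) ** 2) / (2.0 * sigma ** 2))
            hist[int(theta / (360.0 / bins)) % bins] += m * weight
    return hist


# Intensity ramps: one increasing along x, one increasing along the diagonal.
ramp_x = [[float(x) for x in range(9)] for y in range(9)]
ramp_diag = [[float(x + y) for x in range(9)] for y in range(9)]

hist_x = orientation_histogram(ramp_x, 4, 4, 2, 1.5)
hist_diag = orientation_histogram(ramp_diag, 4, 4, 2, 1.5)
```

For the horizontal ramp all weight lands in the first bin (0° to 10°), and for the diagonal ramp in the bin containing 45°; the index of the largest bin gives the dominant orientation.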

Descriptor: the feature vector
- Orientation histograms accumulated over sub-regions allow shifts in gradient positions
- 128-element feature vector -> 4x4 array of histograms with 8 orientation bins each (Lowe's figure shows a smaller 2x2x8 example computed from an 8x8 set of samples)
- Feature vectors are matched by nearest neighbor (Euclidean distance)
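The 4 x 4 x 8 layout can be illustrated with a deliberately simplified descriptor. This sketch omits the Gaussian weighting, trilinear interpolation, and clipping used in Lowe's method, and it assumes the gradient magnitudes and orientations of a 16x16 patch are already available; the function name is hypothetical.

```python
import math


def sift_like_descriptor(mag, ori, cells=4, bins=8):
    """Simplified SIFT-style descriptor.

    mag and ori are square arrays (e.g. 16x16) of gradient magnitude and
    orientation in degrees, measured relative to the keypoint orientation.
    Returns a unit-length vector of cells * cells * bins entries.
    """
    size = len(mag)
    cell = size // cells
    vec = [0.0] * (cells * cells * bins)
    for y in range(cells * cell):
        for x in range(cells * cell):
            b = int((ori[y][x] % 360.0) / (360.0 / bins)) % bins
            idx = ((y // cell) * cells + (x // cell)) * bins + b
            vec[idx] += mag[y][x]
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else vec


# Toy input: unit gradients all pointing at 0 degrees.
mag = [[1.0] * 16 for _ in range(16)]
ori = [[0.0] * 16 for _ in range(16)]
descriptor = sift_like_descriptor(mag, ori)
```

Each of the 16 cells contributes one 8-bin histogram, giving the 128 entries; normalizing to unit length reduces sensitivity to contrast changes.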

Results from Lowe [3]: two training objects recognized in a cluttered image
- Small squares show point matches
- Large rectangles show the border of each training image after affine transformation

Conclusions
- The Harris-Laplace region detector performs better than the Laplacian, DoG and gradient scale-space operators
- Scale-space detectors provide invariance to rotation and scale, and to small changes in illumination and viewpoint
- Affine adaptation provides invariance to affine transformations
- GLOH and SIFT descriptors provide the best performance
- Dense, localized descriptors perform well under occlusion
Next steps
- Coding and testing of region detectors, descriptors and matching…
