6 Image alignment: Applications
- Recognition of object instances
7 Image alignment: Challenges
- Small degree of overlap
- Intensity changes
- Occlusion, clutter
8 Feature-based alignment
- Search for sets of feature matches that agree in terms of:
  - Local appearance
  - Geometric configuration
9 Feature-based alignment: Overview
- Alignment as fitting
  - Affine transformations
  - Homographies
- Robust alignment
  - Descriptor-based feature matching
  - RANSAC
- Large-scale alignment
  - Inverted indexing
  - Vocabulary trees
- Application: searching the night sky
10-11 Alignment as fitting
- Previous lectures: fitting a model to features in one image: find the model M that minimizes Σ_i residual(x_i, M)
- Alignment: fitting a model to a transformation between pairs of matched features (x_i, x_i') in two images: find the transformation T that minimizes Σ_i residual(T(x_i), x_i')
13 Let's start with affine transformations
- Simple fitting procedure (linear least squares)
- Approximates viewpoint changes for roughly planar objects and roughly orthographic cameras
- Can be used to initialize fitting for more complex models
14-15 Fitting an affine transformation
- Assume we know the correspondences; how do we get the transformation?
- Want to find M, t to minimize Σ_i ||M x_i + t − x_i'||²
16 Fitting an affine transformation
- Linear system with six unknowns
- Each match gives us two linearly independent equations: need at least three matches to solve for the transformation parameters
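The linear system above can be sketched in NumPy. This is a minimal illustration (the helper name `fit_affine` and the row layout are assumptions, not from the slides): each match (x, y) → (x', y') contributes two rows to A p = b with unknowns p = (m11, m12, m21, m22, tx, ty).

```python
import numpy as np

def fit_affine(src, dst):
    """Fit an affine transform (2x2 matrix M, translation t) mapping
    src points to dst points by linear least squares."""
    src = np.asarray(src, float)
    dst = np.asarray(dst, float)
    n = len(src)
    A = np.zeros((2 * n, 6))
    b = dst.reshape(-1)          # [x1', y1', x2', y2', ...]
    A[0::2, 0:2] = src           # x' = m11*x + m12*y + tx
    A[0::2, 4] = 1.0
    A[1::2, 2:4] = src           # y' = m21*x + m22*y + ty
    A[1::2, 5] = 1.0
    p, *_ = np.linalg.lstsq(A, b, rcond=None)
    return p[:4].reshape(2, 2), p[4:]
```

With exactly three non-collinear matches the system is square and solved exactly; with more matches, `lstsq` returns the least-squares fit.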
17 Fitting a plane projective transformation
- Homography: plane projective transformation (transformation taking a quad to another arbitrary quad)
18 Homography
- The transformation between two views of a planar surface
- The transformation between images from two cameras that share the same center
20-21 Fitting a homography
- Recall: homogeneous coordinates
  - Converting to homogeneous image coordinates: (x, y) → (x, y, 1)
  - Converting from homogeneous image coordinates: (x, y, w) → (x/w, y/w)
- Equation for homography: λ x' = H x, where x = (x, y, 1)ᵀ, x' = (x', y', 1)ᵀ, and H is a 3×3 matrix defined up to scale
22 Fitting a homography
- Equation for homography: λ x' = H x; eliminating the scale λ (e.g. via the cross product x' × H x = 0) gives 3 equations per match, only 2 of them linearly independent
23 Direct linear transform
- H has 8 degrees of freedom (9 parameters, but scale is arbitrary)
- One match gives us two linearly independent equations
- Homogeneous least squares: find h minimizing ||Ah||² subject to ||h|| = 1
  - Solution: eigenvector of AᵀA corresponding to the smallest eigenvalue
- Four matches needed for a minimal solution
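A minimal DLT sketch, assuming one common formulation of the two equations per match (the function name `fit_homography` is illustrative). The eigenvector of AᵀA with the smallest eigenvalue is obtained here as the last right singular vector of A:

```python
import numpy as np

def fit_homography(src, dst):
    """Estimate a 3x3 homography H mapping src -> dst (both Nx2, N >= 4)
    up to scale, by homogeneous least squares (DLT)."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        # two linearly independent equations per match
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    A = np.asarray(A, float)
    _, _, Vt = np.linalg.svd(A)
    H = Vt[-1].reshape(3, 3)     # right singular vector, smallest singular value
    return H / H[2, 2]           # fix scale (assumes H[2, 2] != 0)
```

In practice the point coordinates are usually normalized (centered and scaled) before building A for numerical stability; that step is omitted here.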
24-25 Robust feature-based alignment
- So far, we've assumed that we are given a set of "ground-truth" correspondences between the two images we want to align
- What if we don't know the correspondences?
33 Generating putative correspondences
- Need to compare feature descriptors of local patches surrounding interest points
34 Feature descriptors
- Recall: feature detection and description
35 Feature descriptors
- Simplest descriptor: vector of raw intensity values
- How to compare two such vectors?
  - Sum of squared differences (SSD): not invariant to intensity change
  - Normalized correlation: invariant to affine intensity change
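The two comparison measures can be sketched directly (a minimal illustration; the patches are flattened to 1-D vectors):

```python
import numpy as np

def ssd(a, b):
    """Sum of squared differences: small means similar, but the score
    changes if one patch's intensities are shifted or scaled."""
    return np.sum((a - b) ** 2)

def normalized_correlation(a, b):
    """Zero-mean normalized correlation in [-1, 1]: invariant to affine
    intensity changes b -> alpha * b + beta (alpha > 0)."""
    a = a - a.mean()
    b = b - b.mean()
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
```

For a patch b = 2a + 5, SSD reports a large difference while normalized correlation is exactly 1, illustrating the invariance claimed on the slide.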
36 Disadvantage of intensity vectors as descriptors
- Small deformations can affect the matching score a lot
38-39 Feature descriptors: SIFT
- Descriptor computation:
  - Divide patch into 4×4 sub-patches
  - Compute histogram of gradient orientations (8 reference angles) inside each sub-patch
  - Resulting descriptor: 4×4×8 = 128 dimensions
- Advantages over raw vectors of pixel values:
  - Gradients are less sensitive to illumination change
  - Pooling of gradients over the sub-patches achieves robustness to small shifts while still preserving some spatial information

David G. Lowe. "Distinctive image features from scale-invariant keypoints." IJCV 60(2), 2004.
40 Feature matching
- Generating putative matches: for each patch in one image, find a short list of patches in the other image that could match it based solely on appearance
41 Problem: Ambiguous putative matches
Source: Y. Furukawa
42 Rejection of unreliable matches
- How can we tell which putative matches are more reliable?
- Heuristic: compare the distance of the nearest neighbor to that of the second nearest neighbor
  - The ratio of closest to second-closest distance will be high (near 1) for features that are not distinctive
  - A threshold of 0.8 provides good separation

David G. Lowe. "Distinctive image features from scale-invariant keypoints." IJCV 60(2), 2004.
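The ratio test above can be sketched as follows (a minimal brute-force version; real systems use approximate nearest-neighbor search instead of the exhaustive distance computation shown here):

```python
import numpy as np

def ratio_test_matches(desc1, desc2, ratio=0.8):
    """Lowe's ratio test: keep a putative match only if the nearest
    neighbor in desc2 is much closer than the second nearest
    (distance ratio below `ratio`). desc2 needs at least 2 rows."""
    matches = []
    for i, d in enumerate(desc1):
        dists = np.linalg.norm(desc2 - d, axis=1)
        nn = np.argsort(dists)[:2]          # two closest candidates
        if dists[nn[0]] < ratio * dists[nn[1]]:
            matches.append((i, int(nn[0])))
    return matches
```

A distinctive feature has one clear nearest neighbor and passes; an ambiguous feature has two near-equidistant neighbors (ratio near 1) and is rejected.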
43 RANSAC
- The set of putative matches contains a very high percentage of outliers
- RANSAC loop:
  1. Randomly select a seed group of matches
  2. Compute the transformation from the seed group
  3. Find inliers to this transformation
  4. If the number of inliers is sufficiently large, re-compute the least-squares estimate of the transformation on all of the inliers
- Keep the transformation with the largest number of inliers
45-47 RANSAC example: Translation
- Select one match, count inliers
- Repeat for other matches
- Select the translation with the most inliers
48 Scalability: Alignment to large databases
- What if we need to align a test image with thousands or millions of images in a model database?
- Need efficient putative match generation: approximate descriptor similarity search, inverted indices
49 Large-scale visual search
- Inverted indexing
- Reranking / geometric verification

Figure from: Kristen Grauman and Bastian Leibe, Visual Object Recognition, Synthesis Lectures on Artificial Intelligence and Machine Learning, April 2011, Vol. 5, No. 2, pp. 1-181
50 Example indexing technique: Vocabulary trees
(Figure: test image matched against a database via a vocabulary tree with inverted index)

D. Nistér and H. Stewénius, Scalable Recognition with a Vocabulary Tree, CVPR 2006
51 Descriptor space
- Goal: find a set of representative prototypes or cluster centers to which descriptors can be quantized
52 K-means clustering
- Want to minimize the sum of squared Euclidean distances between points x_i and their nearest cluster centers m_k: Σ_k Σ_{i in cluster k} ||x_i − m_k||²
- Algorithm:
  1. Randomly initialize K cluster centers
  2. Iterate until convergence:
     - Assign each data point to the nearest center
     - Recompute each cluster center as the mean of all points assigned to it
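The two-step iteration above (Lloyd's algorithm) can be sketched as follows, a minimal version assuming random-point initialization:

```python
import numpy as np

def kmeans(X, k, n_iters=100, rng=None):
    """Lloyd's algorithm: alternate nearest-center assignment and
    mean recomputation until the assignments stop changing."""
    X = np.asarray(X, float)
    rng = np.random.default_rng(rng)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    assign = np.full(len(X), -1)
    for _ in range(n_iters):
        # assignment step: nearest center for every point
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        new_assign = dists.argmin(axis=1)
        if np.array_equal(new_assign, assign):   # converged
            break
        assign = new_assign
        # update step: each center becomes the mean of its points
        for j in range(k):
            if np.any(assign == j):              # skip empty clusters
                centers[j] = X[assign == j].mean(axis=0)
    return centers, assign
```

Each iteration cannot increase the objective, so the algorithm converges, though only to a local minimum that depends on the initialization.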
53 K-means demo
Source: http://shabal.in/visuals/kmeans/1.html
54 Recall: Visual codebook for generalized Hough transform
(Appearance codebook)
Source: B. Leibe
55 Hierarchical k-means clustering of descriptor space (vocabulary tree)
- Run k-means on the descriptor space; here k defines the branch factor of the tree (how fast the tree branches). In this illustration, k is three.
- Run k-means again, recursively, on each of the resulting quantization cells.
- This defines the vocabulary tree: essentially a hierarchical set of cluster centers and their corresponding Voronoi regions.
- A typical setting is a branch factor of 10 and six levels, resulting in a million leaf nodes ("Mega-Voc").

Slide credit: D. Nister
56 Vocabulary tree/inverted index
- With the vocabulary tree built, the online phase can begin.
- To add an image to the database, perform feature extraction; each descriptor vector is dropped down from the root of the tree and quantized very efficiently into a path down the tree, encoded by a single integer.
- Each node in the vocabulary tree has an associated inverted file index; indices pointing back to the new image are added to the relevant inverted files.
- This is a very efficient operation, carried out whenever an image is added.

Slide credit: D. Nister
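The "drop a descriptor down the tree" quantization can be sketched as follows. The node representation (a dict with a 'centers' array and a 'children' list, None at the leaves) is an assumption for illustration, not the paper's data structure:

```python
import numpy as np

def quantize(desc, node):
    """Quantize a descriptor by walking down a vocabulary tree: at each
    node pick the nearest of the k centers and recurse into that child.
    The sequence of choices is the path encoding the leaf word."""
    path = []
    while node is not None:
        centers = node['centers']
        j = int(np.argmin(np.linalg.norm(centers - desc, axis=1)))
        path.append(j)
        node = node['children'][j] if node['children'] else None
    return path
```

With branch factor k and depth L this costs only k·L distance computations per descriptor, versus k^L for a flat codebook of the same size, which is what makes million-word vocabularies practical.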
57 Populating the vocabulary tree/inverted index
- Model images

Slide credit: D. Nister
61 Looking up a test image
- To query with an input image, quantize its descriptor vectors in the same way and accumulate scores for the database images using term frequency-inverse document frequency (tf-idf) weighting, effectively an entropy weighting of the information.
- The winner is the database image with the most information in common with the input image.

Slide credit: D. Nister
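A simplified tf-idf voting sketch under stated assumptions: images are represented as lists of visual-word ids (tree leaves), and scores accumulate only over words shared with the query. The actual Nistér-Stewénius scheme normalizes the tf-idf vectors and compares them in an Lp norm; this bare accumulation is only meant to show why the inverted index makes lookup cheap:

```python
import math
from collections import Counter, defaultdict

def build_inverted_index(db_words):
    """db_words maps image id -> list of visual-word ids. Returns an
    inverted index (word -> per-image counts) and idf weights."""
    inv = defaultdict(Counter)
    for img, words in db_words.items():
        for w in words:
            inv[w][img] += 1
    n_images = len(db_words)
    idf = {w: math.log(n_images / len(imgs)) for w, imgs in inv.items()}
    return inv, idf

def score_query(query_words, inv, idf):
    """Accumulate weighted votes only for images sharing words with the
    query; a word occurring in every image has idf = 0 and contributes nothing."""
    scores = Counter()
    for w, qtf in Counter(query_words).items():
        for img, tf in inv.get(w, {}).items():
            scores[img] += qtf * tf * idf[w] ** 2
    return scores
```

Because scoring touches only the inverted files of the query's words, the cost depends on the query size and word frequencies, not on the total number of database images.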
62 Cool application of large-scale alignment: searching the night sky