Compact Signatures for High-speed Interest Point Description and Matching Calonder, Lepetit, Fua, Konolige, Bowman, Mihelich (as rendered by Lord)
Just Kidding. Actually, we're doing three papers: –Fast Keypoint Recognition in Ten Lines of Code (Ferns) –Keypoint Signatures for Fast Learning and Recognition (Signatures) –Compact Signatures for High-speed Interest Point Description and Matching (Compact Signatures) We will be doing them briefly, so don't worry. Context: we're talking about keypoint description and matching.
Ferns Problem: Features designed to be invariant or robust to commonly observed deformations (e.g. SIFT) are slow to compute, limiting how many can be handled in many practical applications. Solution: Move most of the computation offline via a discriminative learning framework.
Ferns We want to assign the patch around a keypoint to the most probable class ĉ_i given the binary features f_j calculated over it: Standard Bayes's Rule: Assuming a uniform prior, this becomes a maximum likelihood expression: Choose a very simple feature, the sign of the difference between two pixels:
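To make the pixel-pair feature concrete, here is a minimal sketch. The helper name `binary_feature`, the direction of the comparison, and the toy patch values are assumptions for illustration, not taken from the paper.

```python
# Hypothetical helper: one binary feature is the sign of the difference
# between the intensities at two pixel locations in the patch.
def binary_feature(patch, p1, p2):
    """Return 1 if the intensity at p1 is less than at p2, else 0.

    patch is a list of rows; p1 and p2 are (row, col) tuples.
    The '<' direction is an assumed convention.
    """
    (r1, c1), (r2, c2) = p1, p2
    return 1 if patch[r1][c1] < patch[r2][c2] else 0


# Toy 3x3 patch with made-up intensities.
patch = [
    [10, 200, 30],
    [40, 50, 60],
    [70, 80, 90],
]

f_a = binary_feature(patch, (0, 0), (0, 1))  # compares 10 with 200
f_b = binary_feature(patch, (2, 2), (1, 1))  # compares 90 with 50
```

Each feature costs one comparison at run time, which is what makes evaluating hundreds of them per patch affordable.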
Ferns Need about 300 of these features for accurate classification. The full joint thus can't be represented. As usual, seek to alleviate this problem with independence assumptions. At the extreme: This (complete independence) will of course not really work on anything. So, a simple in-between: partition the features at random into groups, model dependence within each group, and assume independence between groups. These groups are the ferns:
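The three modelling choices on this slide can be written out as follows. This is a reconstruction from the surrounding text, not a copy of the slide's equations: N is the total number of binary features, F_k denotes the k-th group (fern), and M groups of S features each are assumed, matching the symbols used on the next slide.

```latex
% Maximum-likelihood classification under a uniform prior:
\hat{c}_i = \arg\max_{c_i} P(f_1, \dots, f_N \mid C = c_i)

% Extreme case, complete independence between features:
P(f_1, \dots, f_N \mid C = c_i) = \prod_{j=1}^{N} P(f_j \mid C = c_i)

% In-between: M groups F_k of S features each,
% dependence modelled within a group, independence assumed between groups:
P(f_1, \dots, f_N \mid C = c_i) = \prod_{k=1}^{M} P(F_k \mid C = c_i)
```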
Ferns The titular ten lines: The fern form has M × 2^S parameters, with M between 30 and 50, and S about 10.
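The slide's listing did not survive in this transcript, so the following is a sketch of the idea rather than the paper's code. It assumes flattened patches, add-one smoothing of the counts, and toy sizes (M = 5, S = 3); all function names and the synthetic ramp data are hypothetical.

```python
import math
import random


def fern_index(patch, tests):
    """Pack the S binary tests of one fern into an integer in [0, 2**S)."""
    idx = 0
    for a, b in tests:
        idx = (idx << 1) | (1 if patch[a] < patch[b] else 0)
    return idx


def train(patches, labels, ferns, num_classes):
    """Estimate log P(F_k | C = c) by counting, with add-one smoothing (assumed)."""
    S = len(ferns[0])
    counts = [[[1] * (2 ** S) for _ in range(num_classes)] for _ in ferns]
    for patch, c in zip(patches, labels):
        for k, tests in enumerate(ferns):
            counts[k][c][fern_index(patch, tests)] += 1
    return [[[math.log(n / sum(row)) for n in row] for row in per_fern]
            for per_fern in counts]


def classify(patch, ferns, log_prob, num_classes):
    """Sum per-fern log-likelihoods (independence between ferns), take the argmax."""
    scores = [0.0] * num_classes
    for k, tests in enumerate(ferns):
        idx = fern_index(patch, tests)
        for c in range(num_classes):
            scores[c] += log_prob[k][c][idx]
    return max(range(num_classes), key=lambda c: scores[c])


rng = random.Random(0)


def make_patch(c):
    """Synthetic 16-pixel patch: a rising ramp for class 0, falling for class 1."""
    ramp = [10 * i + rng.randint(-2, 2) for i in range(16)]
    return ramp if c == 0 else ramp[::-1]


M, S = 5, 3  # toy sizes; the slide quotes M of 30 to 50 and S of about 10
ferns = [[tuple(rng.sample(range(16), 2)) for _ in range(S)] for _ in range(M)]
labels = [0, 1] * 20
patches = [make_patch(c) for c in labels]
log_prob = train(patches, labels, ferns, num_classes=2)

pred0 = classify(make_patch(0), ferns, log_prob, num_classes=2)
pred1 = classify(make_patch(1), ferns, log_prob, num_classes=2)
params_per_class = M * 2 ** S  # size of the table stored for each class
```

The table of log-probabilities is built offline; at run time a patch needs only M × S pixel comparisons and M table lookups per class.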
Ferns Other details, which we'll skip: –Modeling confidence in empirical estimates –Using thresholds to reduce evaluation count –Relationship with Random Trees –Comparison against SIFT
Signatures Problem: Ferns are based on an offline training phase, so you can't learn new features online. This renders ferns useless for, e.g., SLAM. Solution: Describe new classes in terms of the old (assuming the initial set is rich enough).
Signatures Pull some keypoints at random from an arbitrary textured scene (here, N DoG/SIFT points not within 5 pixels of one another): Call these points the base set, and train a Randomiz(s)ed Tree classifier on them. (Call the method Generic Trees.) You also warp the base set patches to make the class recognition transformation-invariant (TBD): The response of a keypoint from the base set to the classifier trained on the base set should peak at that keypoint:
Signatures The response of a keypoint not in the base set tends to peak in multiple (but relatively few) locations. This response is the keypoint's signature (intended to be transformation-invariant): By thresholding, you can replace this signature with a sparse approximation to itself: A signature is essentially the collection of base patches you most resemble:
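The thresholding step can be sketched as below. The response values are made up, the function name is hypothetical, and the slide does not say whether the surviving responses are kept or binarized; this sketch keeps them.

```python
def sparse_signature(response, t):
    """Keep only the base-set indices whose classifier response exceeds t.

    response[i] is the classifier's response for base keypoint i.
    Returns a dict mapping base index to response value.
    """
    return {i: r for i, r in enumerate(response) if r > t}


# Made-up response over a base set of 7 keypoints: a few peaks, the rest near zero.
response = [0.002, 0.41, 0.001, 0.003, 0.35, 0.004, 0.23]

sig = sparse_signature(response, t=0.01)
peaks = sorted(sig)  # indices of the base patches this keypoint most resembles
```

Raising t shortens the signature, which is the sense in which t determines signature length implicitly.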
Signatures For evaluation, signatures are matched using best-bin-first, with geometric ground truth, on baseline pairs like this: N and t determine signature length, N explicitly and t implicitly (N increases description and matching cost, t only increases matching cost). (At t=0.01,) signature lengths are short and tightly distributed. Experimentally, found reason to go beyond N=300
Signatures t does not have to be terribly large to max out your matching performance:
Signatures The selling point of this is that it gives very similar performance to SIFT, at a fraction of the cost in time (TBD): According to the paper, this represents a 35-time speedup. Division gives me about 53. Am I misunderstanding something, or was that a typo? (They also show this can be applied to SLAM, but we'll note that without getting into it yet. TBD.)
Compact Signatures Problem: Signatures are naturally sparse, but the first attempt at them did not exploit this: matching time and memory usage are higher than needed. Solution: Compress the signatures through random projection. (This is the whole paper.)
Compact Signatures You again have a base classifier consisting of J fern units. Now, though, the ferns are combined additively, like random trees, so they're not really the ferns detailed in the reference (TBD): And again, there is a sparse version of the response created by thresholding against θ: With base size N, feature count d, and bytes to store a float b, the memory requirement of the approach is J × 2^d × N × b bytes. For J=50, d=10, and N=500, this exceeds 100 MB.
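A quick check of the memory figure, assuming each of the J ferns has 2^d leaves, each leaf stores an N-vector of floats, and b = 4 bytes (single precision). The value of b is an assumption; the slide does not state it.

```python
def fern_memory_bytes(J, d, N, b):
    """Bytes needed for J ferns with 2**d leaves, each holding N floats of b bytes."""
    return J * (2 ** d) * N * b


mem = fern_memory_bytes(J=50, d=10, N=500, b=4)  # b=4 assumes 32-bit floats
mem_mb = mem / 1e6
print(f"{mem} bytes = {mem_mb:.1f} MB")
```

With these numbers the total is 102,400,000 bytes, consistent with the slide's "exceeds 100 MB".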
Compact Signatures However, you can compress this with a random ortho-projection (ROP) matrix Φ: Because the fern responses are combined linearly, you can pre-compress the leaf vectors and avoid storing their uncompressed versions: This effectively replaces N by M (the row dimension of Φ), dividing the memory requirement by N/M and requiring N/M times fewer operations to compute a descriptor. There is then further (SIMD-enabling) bit-level compression:
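The pre-compression argument rests on linearity: projecting the sum of the selected leaf vectors gives the same result as summing leaf vectors that were projected ahead of time. The sketch below checks this numerically with toy sizes. A plain Gaussian random matrix stands in for Φ (it is not orthogonalised), and all names and values are illustrative assumptions.

```python
import random

rng = random.Random(0)
N, M, J = 8, 3, 4  # toy sizes: base-set size, compressed size, number of ferns


def matvec(A, x):
    """Multiply an M x N matrix (list of rows) by a length-N vector."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def vec_sum(vectors):
    """Element-wise sum of equal-length vectors."""
    return [sum(col) for col in zip(*vectors)]


# Stand-in for the projection matrix: random Gaussian entries.
phi = [[rng.gauss(0, 1) for _ in range(N)] for _ in range(M)]

# One selected leaf vector per fern for some patch (made-up values).
leaves = [[rng.random() for _ in range(N)] for _ in range(J)]

# Original order: sum the N-dimensional leaf vectors, then project.
compress_after = matvec(phi, vec_sum(leaves))

# Pre-compressed: project each leaf offline, then sum M-dimensional vectors.
compressed_leaves = [matvec(phi, v) for v in leaves]
compress_before = vec_sum(compressed_leaves)

max_diff = max(abs(a - b) for a, b in zip(compress_after, compress_before))
```

Only the M-dimensional `compressed_leaves` need to be stored, which is where the N/M saving in both memory and per-descriptor additions comes from.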
Compact Signatures The transformation from the previous approach (top) to this one (bottom) can be pictured like this:
There's no reason to make M larger than 176, and no reason to worry much about how you do the projection:
Compact Signatures This paper was about time and space: There are details about PTAM incorporation and a small appendix on compressive sensing, which we don't cover in detail here.
TBD –Transformation-invariance –SLAM application –Ferns vs. random trees