Announcements
No class Thursday. Attend Rao lecture.
Double-check your paper assignments.
Key Points
Rigid rotation is a 3x3 orthonormal matrix (with determinant +1).
3-D translation, acting on homogeneous points (x, y, z, 1)^T, is a 3x4 matrix.
3-D translation + rotation is also a 3x4 matrix, [R | t].
Scaled orthographic projection: remove row three and allow scaling.
Planar object (z = 0): also remove column 3.
Projective transformations:
–Rigid rotation (and translation) of a planar object is represented by a 3x3 matrix.
–When we write points in homogeneous coordinates, the perspective division is implicit.
–When we drop rigidity, the 3x3 matrix is arbitrary.
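A minimal Python sketch of the matrices listed above, assuming a rotation about the z-axis as the example rotation; the helper names (`rot_z`, `rigid_3x4`, `scaled_orthographic`, `planar`, `apply`) are hypothetical, chosen for illustration. It builds [R | t], removes row three and scales for scaled orthographic projection, and removes column 3 for a planar object lying in z = 0.

```python
import math


def rot_z(theta):
    """Rotation about the z-axis: a 3x3 orthonormal matrix with determinant +1."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def rigid_3x4(R, t):
    """Append translation t as a fourth column: the 3x4 matrix [R | t]."""
    return [list(R[i]) + [t[i]] for i in range(3)]


def scaled_orthographic(M, s):
    """Remove row three (depth) and scale the remaining two rows by s."""
    return [[s * m for m in row] for row in M[:2]]


def planar(M):
    """For points with z = 0, column 3 always multiplies zero, so drop it."""
    return [row[:2] + row[3:] for row in M]


def apply(M, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


R = rot_z(math.pi / 6)
P = scaled_orthographic(rigid_3x4(R, [1.0, 2.0, 3.0]), 0.5)  # 2x4 matrix
A = planar(P)                                                # 2x3 affine map
X = [2.0, -1.0, 0.0, 1.0]                                    # planar point, homogeneous
print(apply(P, X), apply(A, [2.0, -1.0, 1.0]))               # same image point
```

Dropping column 3 loses nothing for planar points, which is why a planar object under scaled orthographic projection reduces to a 2x3 affine map.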
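Projective
Rigid rotation and translation (of a planar object): the notation suggests that the first two columns are orthonormal (they are the first two columns of a rotation matrix), and the transformation has 6 degrees of freedom (3 for rotation, 3 for translation).
Projective transformation: the notation suggests that the transformation is an unconstrained linear transformation of homogeneous coordinates.
Points in homogeneous coordinates are equivalent up to scale: (x, y, w) and (kx, ky, kw) represent the same point for any k ≠ 0.
The transformation has 8 degrees of freedom: 9 entries, minus 1 because its scale is arbitrary.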
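To make the 8 degrees of freedom concrete, here is a small sketch (the helper names `apply_h` and `scaled` are hypothetical) that applies an arbitrary 3x3 matrix H to planar points in homogeneous coordinates. Multiplying H by any nonzero k leaves every image point unchanged, so one of the 9 entries is redundant. The matrix H is made-up example data.

```python
def apply_h(H, p):
    """Map 2-D point p = (x, y) through 3x3 H using homogeneous coordinates."""
    x, y = p
    u, v, w = (H[i][0] * x + H[i][1] * y + H[i][2] for i in range(3))
    return (u / w, v / w)  # divide out the arbitrary homogeneous scale


def scaled(H, k):
    """Return k * H."""
    return [[k * h for h in row] for row in H]


# Arbitrary (made-up) projective transformation.
H = [[1.0, 0.2, 3.0],
     [0.1, 1.5, -2.0],
     [0.001, 0.002, 1.0]]

points = [(0.0, 0.0), (4.0, -1.0), (10.0, 7.0)]
for p in points:
    print(apply_h(H, p), apply_h(scaled(H, 7.0), p))  # identical pairs

dof = 3 * 3 - 1  # 9 entries minus the arbitrary overall scale
```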
Lines: Parameterization
Equation for a line: ax + by + c = 0.
Parameterize the line as l = (a, b, c)^T.
p = (x, y, 1)^T is on the line if l^T p = ax + by + c = 0.
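A minimal check of the incidence condition, using a hypothetical helper `on_line`; the line x + y - 2 = 0 is an arbitrary example. Because l and any nonzero multiple of l give the same equation, they represent the same line.

```python
def on_line(l, p, tol=1e-9):
    """True if p = (x, y) satisfies l^T (x, y, 1) = ax + by + c = 0."""
    a, b, c = l
    x, y = p
    return abs(a * x + b * y + c) <= tol


l = (1.0, 1.0, -2.0)                          # the line x + y - 2 = 0
print(on_line(l, (1.0, 1.0)))                 # True: (1, 1) lies on it
print(on_line(l, (2.0, 1.0)))                 # False: (2, 1) does not
print(on_line((2.0, 2.0, -4.0), (0.5, 1.5)))  # True: 2l is the same line
```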
Line Intersection
The intersection of l and l’ is l x l’ (where x denotes the cross product).
This follows because l x l’ is orthogonal to both l and l’: the point p = l x l’ satisfies l^T p = 0 and l’^T p = 0, so it lies on both lines.
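The cross-product construction can be sketched in a few lines of Python (the helper names are hypothetical). The example intersects the lines x = 1, i.e. (1, 0, -1), and y = 2, i.e. (0, 1, -2), then dehomogenizes by dividing by the third coordinate.

```python
def cross(u, v):
    """Cross product of two 3-vectors."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def intersect(l1, l2, eps=1e-12):
    """Intersection of two lines as a Euclidean point, or None if they are parallel."""
    p = cross(l1, l2)
    if abs(p[2]) < eps:
        return None  # third coordinate 0: the lines meet only at infinity
    return (p[0] / p[2], p[1] / p[2])


l1, l2 = (1.0, 0.0, -1.0), (0.0, 1.0, -2.0)  # x = 1 and y = 2
p = cross(l1, l2)
print(p, dot(l1, p), dot(l2, p))  # p is orthogonal to both lines
print(intersect(l1, l2))          # (1.0, 2.0)
```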
Intersection of Parallel Lines
Suppose l and l’ are parallel. We can write l = (a, b, c), l’ = (a, b, c’).
l x l’ = (c’-c)(b, -a, 0). This is equivalent (up to scale) to (b, -a, 0).
This point corresponds to a line through the focal point that is parallel to the image plane, so it never intersects the image plane.
We can think of the real plane as the points (x, y, w) with w ≠ 0. When w = 0, we say these points lie on the ideal line at infinity.
Note that a projective transformation can map this line to a finite line, such as the horizon, which we do see in images.
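A quick numerical check of the formula l x l’ = (c’-c)(b, -a, 0), using the arbitrary parallel lines x + 2y + 3 = 0 and x + 2y + 7 = 0. The result has third coordinate 0, so it is an ideal point, and it lies on the line at infinity (0, 0, 1).

```python
def cross(u, v):
    """Cross product of two 3-vectors."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


a, b, c, c2 = 1, 2, 3, 7
l, l2 = (a, b, c), (a, b, c2)  # parallel: same (a, b), different c
p = cross(l, l2)
expected = tuple((c2 - c) * t for t in (b, -a, 0))
line_at_infinity = (0, 0, 1)

print(p, expected)  # (8, -4, 0) both times
print(sum(x * y for x, y in zip(line_at_infinity, p)))  # 0: p is on the ideal line
```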
Invariants of Lines
Affine transformations are the subgroup of projective transformations in which the last row is (0, 0, 1).
These map the line at infinity to itself: a point (x, y, 0) is mapped to another point whose third coordinate is 0.
So parallelism is an affine invariant: parallel lines meet at infinity, and their images still meet at infinity.
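The sketch below (hypothetical helper names, made-up matrices) maps two points on each of two parallel lines through a transformation, rebuilds each image line as the cross product of its two image points, and checks whether the image lines still meet at infinity. An affine matrix (last row (0, 0, 1)) keeps them parallel; a general projective matrix gives them a finite vanishing point.

```python
def cross(u, v):
    """Cross product: the line through two points, or the meet of two lines."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def apply3(H, p):
    """Apply a 3x3 matrix to a homogeneous 2-D point."""
    return tuple(sum(H[i][j] * p[j] for j in range(3)) for i in range(3))


def images_parallel(H, line1_pts, line2_pts, tol=1e-9):
    """True if the images of two lines under H meet at infinity (third coordinate ~ 0)."""
    m1 = cross(apply3(H, line1_pts[0]), apply3(H, line1_pts[1]))
    m2 = cross(apply3(H, line2_pts[0]), apply3(H, line2_pts[1]))
    x, y, w = cross(m1, m2)
    return abs(w) <= tol * max(abs(x), abs(y), abs(w))


# Two points on x + 2y + 3 = 0 and two on x + 2y + 7 = 0 (parallel lines).
L1 = [(-3.0, 0.0, 1.0), (-1.0, -1.0, 1.0)]
L2 = [(-7.0, 0.0, 1.0), (-1.0, -3.0, 1.0)]

A = [[2.0, 1.0, 5.0], [0.5, 3.0, -1.0], [0.0, 0.0, 1.0]]  # affine: last row (0, 0, 1)
H = [[2.0, 1.0, 5.0], [0.5, 3.0, -1.0], [0.2, 0.1, 1.0]]  # general projective

print(images_parallel(A, L1, L2))  # True: parallelism preserved
print(images_parallel(H, L1, L2))  # False: the lines now meet at a finite point
```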
Invariance in 3D to 2D
3D to 2D “invariance” isn’t captured by the mathematical definition of invariance, because 3D to 2D transformations don’t form a group.
–You can’t compose or invert them.
Definition: Let f be a function on images. We say f is an invariant iff for every object O, if I1 and I2 are images of O, then f(I1) = f(I2).
This means we can define f(O) as f(I) for any image I of O. O and I can match only if f(O) = f(I).
f is a non-trivial invariant if there exist two images I1 and I2 such that f(I1) ≠ f(I2).
Non-Invariance in 3D to 2D
Theorem: Assume the valid objects are all 3D point sets of size k, for some k. Then there are no non-trivial invariants of the images of these objects under perspective projection.
Proof Strategy
Let f be an invariant. Suppose two objects, A and B, have a common image I0. Then f(I) = f(J) whenever I and J are images of either A or B, since both equal f(I0).
Given any two objects O0 and Ok, we construct a series of objects O1, …, O(k-1) so that Oi and O(i+1) have a common image for every i = 0, …, k-1.
Chaining these equalities, f takes the same value on every image of O0, …, Ok. So for any pair of images I, J of any two objects, f(I) = f(J), and f is trivial.
Constructing O1 … O(k-1)
Oi has its first i points identical to the first i points of Ok, and its remaining points identical to the remaining points of O0. So Oi and O(i+1) differ only in point i+1.
If two objects are identical except for one point, they produce the same image when viewed from a focal point on the line joining those two points.
–Along that line, those two points look the same.
–The remaining points are identical, so they always look the same.
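The construction can be checked numerically. This is a sketch under stated assumptions: the helper names are hypothetical, the camera is a simple pinhole whose focal point is placed on the line through the two differing points (beyond one of them), and the example points were chosen so that every point lies in front of the camera. For each consecutive pair Oi, O(i+1) it confirms that both objects produce the same image.

```python
import math


def sub(u, v):
    return tuple(a - b for a, b in zip(u, v))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def normalize(u):
    n = math.sqrt(dot(u, u))
    return tuple(a / n for a in u)


def intermediate_objects(O0, Ok):
    """Return [O0, O1, ..., Ok], where Oi = first i points of Ok + remaining points of O0."""
    return [list(Ok[:i]) + list(O0[i:]) for i in range(len(O0) + 1)]


def project(points, center, direction):
    """Pinhole projection with focal point `center`, looking along `direction`.
    Assumes no point has zero depth along `direction`."""
    d = normalize(direction)
    helper = (1.0, 0.0, 0.0) if abs(d[0]) < 0.9 else (0.0, 1.0, 0.0)
    e1 = normalize(cross(d, helper))  # image x-axis, orthogonal to d
    e2 = cross(d, e1)                 # image y-axis
    image = []
    for X in points:
        v = sub(X, center)
        depth = dot(v, d)
        image.append((dot(v, e1) / depth, dot(v, e2) / depth))
    return image


def common_image(A, B):
    """Project A and B (differing in exactly one point) from a focal point on the
    line through the two differing points, so both objects give the same image."""
    diff = [i for i, (a, b) in enumerate(zip(A, B)) if a != b]
    if len(diff) != 1:
        raise ValueError("objects must differ in exactly one point")
    p, q = A[diff[0]], B[diff[0]]
    center = tuple(2 * a - b for a, b in zip(p, q))  # p + (p - q): beyond p on line pq
    direction = sub(q, p)                            # p and q share this viewing ray
    return project(A, center, direction), project(B, center, direction)


O0 = [(0.0, 0.0, 5.0), (1.0, 0.0, 6.0), (0.0, 1.0, 7.0)]
Ok = [(2.0, 1.0, 4.0), (-1.0, 2.0, 8.0), (3.0, 3.0, 9.0)]
chain = intermediate_objects(O0, Ok)
for Oi, Oj in zip(chain, chain[1:]):
    img_i, img_j = common_image(Oi, Oj)
    print(max(abs(a - b) for u, w in zip(img_i, img_j) for a, b in zip(u, w)))
```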
Summary
Planar objects give rise to a rich set of invariants.
General 3-D objects (arbitrary point sets) have no non-trivial invariants.
–We can deal with this by focusing on planar portions of objects.
–Or on special restricted classes of objects.
–Or by relaxing the notion of invariants.
However, invariants have become less popular in computer vision due to these limitations.
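One restricted class that does have a true 3D-to-2D invariant is four collinear points: their cross-ratio is preserved by perspective projection, so it satisfies the definition of invariance given earlier for this class. A minimal sketch, with hypothetical helper names, made-up points, and two arbitrary viewpoints (chosen so all points lie in front of the camera):

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def normalize(u):
    n = math.sqrt(dot(u, u))
    return tuple(a / n for a in u)


def project(points, center, direction):
    """Pinhole projection; assumes every point is in front of the camera."""
    d = normalize(direction)
    helper = (1.0, 0.0, 0.0) if abs(d[0]) < 0.9 else (0.0, 1.0, 0.0)
    e1 = normalize(cross(d, helper))
    e2 = cross(d, e1)
    out = []
    for X in points:
        v = tuple(a - b for a, b in zip(X, center))
        out.append((dot(v, e1) / dot(v, d), dot(v, e2) / dot(v, d)))
    return out


def cross_ratio(t1, t2, t3, t4):
    return (t1 - t3) * (t2 - t4) / ((t1 - t4) * (t2 - t3))


def image_cross_ratio(pts):
    """Cross-ratio of four collinear 2-D points, via signed positions along their line."""
    u = normalize(tuple(b - a for a, b in zip(pts[0], pts[3])))
    return cross_ratio(*(dot(tuple(b - a for a, b in zip(pts[0], p)), u) for p in pts))


A, D = (0.0, 0.0, 10.0), (1.0, 0.5, 0.2)
ts = (0.0, 1.0, 3.0, 7.0)
object_pts = [tuple(a + t * d for a, d in zip(A, D)) for t in ts]

print(cross_ratio(*ts))  # 9/7 for the 3-D points
print(image_cross_ratio(project(object_pts, (0.0, 0.0, 0.0), (0.0, 0.0, 1.0))))
print(image_cross_ratio(project(object_pts, (5.0, -3.0, -2.0), (0.3, 0.2, 1.0))))
```

Different collinear configurations have different cross-ratios, so this invariant is non-trivial; it escapes the theorem only because four collinear points are not a general 3D point set.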
Lowe and Biederman
Background
Viewpoint Invariant Non-Accidental Properties.
–Lowe sees these as probabilistic.
–Biederman drops the probabilistic view.
–Primitive properties.
–Composing them into units/geons.
Use in Recognition.
–Speed up search.
–Geons: analogy to speech.
Evidence for Value.
–Computational speed.
–Human psychology: parts; qualitative descriptions; view invariance.
Background
Computational
–2D approach to recognition. Lowe is reacting to Marr. Partly due to Lowe, recognition rarely involves 3D reconstruction now (though 3D models are also rarer).
–State of the art:
–Little recognition of 3D objects; grouping is implicit.
–Speed and robustness are a big concern.
–2D recognition through search.
Psychology
–Much more ambitious and specific than any prior theory of recognition (I believe).
–Perceptual organization (P.O.) is widely studied, but rarely related to other tasks.
Contrast
–CS must account for low-level processing.
–Psych must account for categorization.
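Viewpoint Invariant NAPs
Non-Accidental Property
–Happens rarely by chance.
–Happens more frequently because of scene structure.
–p = property, c = chance, s = structure.
In Bayesian terms, P(s | p) = P(p | s) P(s) / (P(p | s) P(s) + P(p | c) P(c)).
Lowe focuses on P(p | c), the probability that the property arises by chance. Jepson and Richards consider the prior P(s).
Biederman downplays probabilistic inference. He is not concerned with background or feature detection.
P(p | s) is high due to viewpoint invariance.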
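The probabilistic reading can be made concrete with Bayes' rule, P(s | p) = P(p | s) P(s) / (P(p | s) P(s) + P(p | c) P(c)), treating structure s and chance c as the only two explanations. The sketch below uses purely hypothetical probabilities to show how a small P(p | c) (the property rarely arises by chance) pushes the posterior for structure up, even when the prior P(s) is small.

```python
def posterior_structure(p_given_s, p_given_c, prior_s):
    """P(s | p) by Bayes' rule, assuming structure (s) and chance (c) are the only
    explanations, so P(c) = 1 - P(s)."""
    joint_s = p_given_s * prior_s
    joint_c = p_given_c * (1.0 - prior_s)
    return joint_s / (joint_s + joint_c)


# Hypothetical numbers, for illustration only.
rare_by_chance = posterior_structure(p_given_s=0.9, p_given_c=0.001, prior_s=0.05)
common_by_chance = posterior_structure(p_given_s=0.9, p_given_c=0.5, prior_s=0.05)
print(round(rare_by_chance, 3), round(common_by_chance, 3))  # 0.979 0.087
```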
Issues with Non-Accidental Properties
Is it “just” Bayesian inference?
–Then why not model all the information?
This may fit Lowe. Biederman relies more on certain (non-probabilistic) inference.
See also Feldman, Jepson, Richards.
Viewpoint Invariance
Match properties that are invariant to viewing conditions.
–Parallelism, symmetry, collinearity, cotermination, straightness.
–Lowe picks one side of each property (e.g., parallel); Biederman stresses the contrast (e.g., parallel vs. non-parallel).
Why? How are they used?
–Lowe: correspondence of geometric features, to speed up search.
–Biederman: description of parts for indexing.
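Collinearity is exactly viewpoint invariant under perspective projection (a 3D line that misses the focal point projects to a 2D line), so it is easy to check numerically; parallelism, by contrast, survives only approximately. A minimal sketch with a hypothetical camera model (rotation about the y-axis, then translation, then division by depth) and made-up points:

```python
import math


def view(X, yaw, t):
    """Rotate X about the y-axis by `yaw`, translate by t, then project to (x/z, y/z)."""
    x, y, z = X
    c, s = math.cos(yaw), math.sin(yaw)
    xc = c * x + s * z + t[0]
    yc = y + t[1]
    zc = -s * x + c * z + t[2]  # assumed positive: point in front of the camera
    return (xc / zc, yc / zc)


def collinear_2d(a, b, c, tol=1e-9):
    """Signed-area test, normalized by segment lengths so the tolerance is scale-free."""
    area = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    scale = math.hypot(b[0] - a[0], b[1] - a[1]) * math.hypot(c[0] - a[0], c[1] - a[1])
    return abs(area) <= tol * scale


line_pts = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0), (3.0, 3.0, 3.0)]  # collinear in 3-D
tri_pts = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]   # not collinear
t = (0.5, -0.2, 10.0)
for yaw in (0.0, 0.3, -0.7, 1.2):
    print(yaw,
          collinear_2d(*(view(X, yaw, t) for X in line_pts)),
          collinear_2d(*(view(X, yaw, t) for X in tri_pts)))
```

The collinear triple tests collinear from every viewpoint; the non-collinear triple would only look collinear from the accidental viewpoints whose focal point lies in its plane.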
Geons
–Biederman: description of geons.
Are these properties still view invariant when used to describe a geon?
A 3D shape’s occluding contour depends on viewpoint; it may be straight from one view and curved from another.
Metric properties are not truly invariant. These may be more like quasi-invariants.
Geons for Recognition
Analogy to speech: geons play the role of phonemes.
–36 different geons.
–Different relations between them.
–Millions of ways of putting a few geons together.
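The combinatorial point can be illustrated with a toy count. Assume, hypothetically, that an object is a chain of k geons drawn from the 36 types, with each adjacent pair joined by one of R relations; R = 10 below is a placeholder, not Biederman's figure. Then there are 36^k * R^(k-1) descriptions.

```python
def chain_descriptions(k, n_geons=36, n_relations=10):
    """Toy count of k-geon chains: a geon type for each part and a relation label
    for each of the k - 1 joints. n_relations=10 is a hypothetical placeholder."""
    if k < 1:
        raise ValueError("need at least one geon")
    return n_geons ** k * n_relations ** (k - 1)


for k in (1, 2, 3):
    print(k, chain_descriptions(k))  # 36, 12960, 4665600
```

Even this crude model reaches millions of descriptions with three geons; richer relation sets or non-chain arrangements only increase the count.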
Empirical Support for Geons
First, separate the predictions of geon theory:
–Part structure is important in recognition.
–Perceptual grouping can be used for filling in.
–NAPs are used for indexing.
–View-invariant descriptions.
–Qualitative descriptions.
Second, what is the alternative?
–View-based recognition with many examples.
Empirical Support
Recognition is fast; fine metric judgments are slow.
–Does this disqualify other approaches?
Recognition is view-invariant.
–Does this disqualify other approaches?
The number of geon descriptions is sufficient for the number of categories we recognize.
–This argues for plausibility, but no more.
Empirical Support (2)
2-4 geons are needed for recognition.
Complex objects are no harder to recognize than simple ones.
Line drawings vs. colored images: colored images are recognized at similar speed.
Empirical Support (3): Degraded Objects
Deleting contours in ways that disrupt geon structure interferes more with recognition.
Deleting components is worse than deleting midsections.
This argues for perceptual organization for interpolation/reconstruction. But for geons?
Should we measure the information deleted rather than contour length?
Conclusions
It may help to separate:
–Perceptual organization/completion.
–View invariance.
–Part structure.
All three are widely used in computer vision.
Biederman’s paper probably addresses view invariance least.
–This became the subject of much research.