Presentation on theme: "Linear Subspaces - Geometry. No Invariants, so Capture Variation Each image = a pt. in a high-dimensional space. –Image: Each pixel a dimension. –Point."— Presentation transcript:
No Invariants, so Capture Variation Each image = a pt. in a high-dimensional space. –Image: Each pixel a dimension. –Point set: Each coordinate of each pt. A dimension. Simplest rep. of variation is linear. –Basis (eigen) images: x 1 …x k –Each image, x = a 1 x 1 + … + a k x k Useful if k << n.
When is this accurate? Approximately right when: –Variation approximately linear. Always true for small variation. –Some variations big, some small, can discard small. Exactly right sometimes. –Point features with scaled-orthographic projection. –Convex, Lambertian objects and distant lights.
Principal Components Analysis (PCA) All-purpose linear approximation. Given images (as vectors) Finds low-dimensional linear subspace that best approximates them. –Eg., minimizes distance from images to subspace.
Derivation on whiteboard This is all taken from Duda, Hart and Stork Pattern Classification pp. 114- 117. Excerpt in library.
SVD Scatter matrix can be big, so computation non-trivial. Stack data into matrix X, each row an image. SVD gives X = UDV T –D is diagonal with non-increasing values. –U and V have orthonormal rows. V T (:,1:k) gives first k principal components. matlab
Linear Combinations ISP Immediately apparent that u and v coordinates lie in a 4D linear subspace
We Can Remove Translation (1) This is trivial, because we can pick a simple origin. –World origin is arbitrary. –Example: We can assume first point is at origin. Rotation then doesn’t effect that point. All its motion is translation. –Better to pick center of mass as origin. Average of all points. This also averages all noise.
Specifically, we can never tell where the world points were to begin with. Adding one to every x coordinate in P and then subtracting 1 in every tx is undetectable. So, wlog we can assume that sum(P(k,:)) = 0 for k from 1 to 3, ie., sum(x1 … xn) = 0, sum(y1…yn) = 0, sum(z1 … zn) = 0. Rotation doesn’t move the origin, which is now the center of mass. Neither does scaled orthographic projection. So, this only moves from translation. Explicitly, we assume sum(p) = (0,0,0)^T. Then: sum(s*R(p)) = s*R(sum(p)) = s*R(0,0,0)^T = (0,0,0)^T. (^T means transpose).
More explicitly, suppose sum(p) = (0,0,0,n)^T. Then, sum(R*P) = R*(sum(P)) = R*(0,0,0,n)^T = (0,0,0,n)^T. Sum(T*R*P) = T*(0,0,0,n)^T = (ntx,nty,ntz,n)^T. (Or just look at the 2x4 projection matrix). If we subtract tx or ty from every row, then the residual is (s11,s12,s13;s21,s22,s23)*P. I = s part of matrix + t part of matrix.
Remove Translation (2) Notice this is just the first step of PCA.
Rank Theorem has rank 3. This means there are 3 vectors such that every row of is a linear combination of these vectors. These vectors are the rows of P. S P So, given any object, u and v coordinates of any image of it lie in a 3-dimensional linear subspace.
Lower-Dimensional Subspace ( affine invariant coordinates of point 4 relative to first three. Represent image: ( … n n ) –This representation is complete. –The or coordinates of all images of an object occupy a 1D linear subspace.
Representation is Complete 3D-2D affine transformation is projection in some direction + 2D-2D affine transformation. 2D-2D affine maps first three image points anywhere. So they’re irrelevant to a complete representation. Once we use only affine coordinates, 2D-2D affine transformation no longer matters.
Let m1…mn be 3D model points. Let i1…in be 2D image points Let P be the plane spanned by m1, m2, and m3. Let m4’,… mn’ be the projection of m4…mn onto P. Write the affine coordinates of mi’ relative to m1,m2,m3, as (ai,bi). The image points depend on the viewing direction. For some viewing direction, let i’j be the intersection of the line connecting mj and ij with P. The triangle formed by m4,m4’,i’4 is similar to the one formed by mj,mj’,i’j, for any j. So we have: (m’4-i4’) = rj(m’j-ij’) for some scale factor rj. i’j appears in the same image position as mj. Since i’j is coplanar with p1,p2,p3, it’s affine coordinates are invariant to projection. They are ( j, j). Then, writing m’j, i’j with affine coordinates we have: ( j, j) = (aj,bj) + (( ) – (a4,b4))/rj. Looking at either component, we get a series of linear equations which define a line in alpha or beta space.
Summary Projection can be linearized. So images produced under projection can be linear and low-dimensional. Are these results relevant to surfaces of real 3D objects projected to 2D? –Maybe to features; probably not to intensity images. Why would images of a class of objects be low-dimensional?