Advanced Computer Vision
Chapter 2 Image Formation Presented by: 傅楸善 & 翁丞世
Image Formation
2.1 - Geometric primitives and transformations
2.2 - Photometric image formation
2.3 - The digital camera
In Chapter 2, we break down image formation into three major components. Geometric image formation (Section 2.1) deals with points, lines, and planes, and how these are mapped onto images using projective geometry and other models (including radial lens distortion). Photometric image formation (Section 2.2) covers radiometry, which describes how light interacts with surfaces in the world, and optics, which projects light onto the sensor plane. Finally, Section 2.3 covers how sensors work, including topics such as sampling and aliasing, color sensing, and in-camera compression.
Image Formation Section 2.1 introduces the basic geometric primitives used throughout the book (points, lines, and planes) and the geometric transformations that project these 3D quantities into 2D image features (Figure 2.1a). Section 2.2 describes how lighting, surface properties (Figure 2.1b), and camera optics (Figure 2.1c) interact in order to produce the color values that fall onto the image sensor. Section 2.3 describes how continuous color images are turned into discrete digital samples inside the image sensor (Figure 2.1d) and how to avoid (or at least characterize) sampling deficiencies, such as aliasing.
2.1 Geometric Primitives and Transformations
2D transformations
3D transformations
3D rotations
3D to 2D projections
Lens distortions
Before we can intelligently analyze and manipulate images, we need to establish a vocabulary for describing the geometry of a scene. In this chapter, we present a simplified model of this image formation process.
2.1.1 Geometric Primitives Geometric primitives form the basic building blocks used to describe three-dimensional shapes. Points, Lines, Planes
2.1.1 Geometric Primitives
2D points (pixel coordinates in an image) are denoted using a pair of values: $\mathbf{x} = (x, y) = \begin{bmatrix} x \\ y \end{bmatrix} \in \mathcal{R}^2$
2.1.1 Geometric Primitives
Homogeneous coordinates (or projective coordinates): $\tilde{\mathbf{x}} = (\tilde{x}, \tilde{y}, \tilde{w}) \in \mathcal{P}^2$
$\mathcal{P}^2 = \mathcal{R}^3 - (0, 0, 0)$ is called the 2D projective space.
$\tilde{\mathbf{x}} = (\tilde{x}, \tilde{y}, \tilde{w}) = \tilde{w}(x, y, 1) = \tilde{w}\bar{\mathbf{x}}$, where $\bar{\mathbf{x}} = (x, y, 1)$ is the augmented vector.
When $\tilde{w} = 0$, the point represented is a point at infinity.
2.1.1 Geometric Primitives
2D lines can also be represented using homogeneous coordinates: $\tilde{\mathbf{l}} = (a, b, c)$, with $\bar{\mathbf{x}} \cdot \tilde{\mathbf{l}} = ax + by + c = 0$.
Normalizing the line equation vector gives $\mathbf{l} = (\hat{n}_x, \hat{n}_y, d) = (\hat{\mathbf{n}}, d)$ with $\|\hat{\mathbf{n}}\| = 1$, where $\hat{\mathbf{n}} = (\hat{n}_x, \hat{n}_y) = (\cos\theta, \sin\theta)$.
2.1.1 Geometric Primitives
2D conics: circle, ellipse, parabola, hyperbola. These and other algebraic curves can be expressed with simple polynomial homogeneous equations, e.g., the quadric equation $\bar{\mathbf{x}}^T \mathbf{Q} \bar{\mathbf{x}} = 0$.
2.1.1 Geometric Primitives
3D points, similar to 2D points: $\mathbf{x} = (x, y, z) \in \mathcal{R}^3$, $\tilde{\mathbf{x}} = (\tilde{x}, \tilde{y}, \tilde{z}, \tilde{w}) \in \mathcal{P}^3$, $\bar{\mathbf{x}} = (x, y, z, 1)$
2.1.1 Geometric Primitives
3D planes: $\tilde{\mathbf{m}} = (a, b, c, d)$, with $\bar{\mathbf{x}} \cdot \tilde{\mathbf{m}} = ax + by + cz + d = 0$.
Normalizing the plane equation vector gives $\mathbf{m} = (\hat{n}_x, \hat{n}_y, \hat{n}_z, d) = (\hat{\mathbf{n}}, d)$ with $\|\hat{\mathbf{n}}\| = 1$; the normal can be parameterized in spherical coordinates as $\hat{\mathbf{n}} = (\cos\theta \sin\phi, \sin\theta \sin\phi, \cos\phi)$.
2.1.1 Geometric Primitives
3D lines can be represented using two points on the line, $\mathbf{p}$ and $\mathbf{q}$. Any other point on the line can be expressed as a linear combination of these two points: $\mathbf{r} = (1 - \lambda)\mathbf{p} + \lambda\mathbf{q}$
2.1.2 2D Transformation
Having defined our basic primitives, we can now turn our attention to how they can be transformed.
2.1.2 2D Transformation
Translation: $\mathbf{x}' = \mathbf{x} + \mathbf{t}$
$\mathbf{x}' = \begin{bmatrix} \mathbf{I} & \mathbf{t} \end{bmatrix} \bar{\mathbf{x}}$, where $\mathbf{I}$ is the 2x2 identity matrix.
$\bar{\mathbf{x}}' = \begin{bmatrix} \mathbf{I} & \mathbf{t} \\ \mathbf{0}^T & 1 \end{bmatrix} \bar{\mathbf{x}}$ makes it possible to chain transformations using matrix multiplication.
2.1.2 2D Transformation
Rotation + translation: $\mathbf{x}' = \mathbf{R}\mathbf{x} + \mathbf{t}$
$\mathbf{x}' = \begin{bmatrix} \mathbf{R} & \mathbf{t} \end{bmatrix} \bar{\mathbf{x}}$, where $\mathbf{R} = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$, with $\mathbf{R}\mathbf{R}^T = \mathbf{I}$ and $|\mathbf{R}| = 1$.
2.1.2 2D Transformation
Scaled rotation, also known as the similarity transform: $\mathbf{x}' = s\mathbf{R}\mathbf{x} + \mathbf{t}$
$\mathbf{x}' = \begin{bmatrix} s\mathbf{R} & \mathbf{t} \end{bmatrix} \bar{\mathbf{x}} = \begin{bmatrix} a & -b & t_x \\ b & a & t_y \end{bmatrix} \bar{\mathbf{x}}$
The similarity transform preserves angles between lines.
2.1.2 2D Transformation
Affine: written as $\mathbf{x}' = \mathbf{A}\bar{\mathbf{x}}$, where $\mathbf{A}$ is an arbitrary 2x3 matrix:
$\mathbf{x}' = \begin{bmatrix} a_{00} & a_{01} & a_{02} \\ a_{10} & a_{11} & a_{12} \end{bmatrix} \bar{\mathbf{x}}$
Parallel lines remain parallel under affine transformations.
2.1.2 2D Transformation
Projective, also known as a perspective transform or homography: $\tilde{\mathbf{x}}' = \tilde{\mathbf{H}} \tilde{\mathbf{x}}$, where $\tilde{\mathbf{H}}$ is an arbitrary 3x3 matrix. After dividing by the third component:
$x' = \dfrac{h_{00}x + h_{01}y + h_{02}}{h_{20}x + h_{21}y + h_{22}}, \quad y' = \dfrac{h_{10}x + h_{11}y + h_{12}}{h_{20}x + h_{21}y + h_{22}}$
Perspective transformations preserve straight lines.
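Applying a homography in code makes the perspective division explicit. Below is a minimal Python sketch; the matrix H and the test point are arbitrary example values, not taken from the slides:

```python
import numpy as np

# An arbitrary example homography (values are illustrative only).
H = np.array([[1.0,   0.2,   5.0],
              [0.1,   1.0,   3.0],
              [0.001, 0.002, 1.0]])

def apply_homography(H, x, y):
    """Map a 2D point through a 3x3 homography, then divide by w."""
    xh = H @ np.array([x, y, 1.0])       # augmented (homogeneous) coordinates
    return xh[0] / xh[2], xh[1] / xh[2]  # perspective division

print(apply_homography(H, 10.0, 20.0))
```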
2.1.2 2D Transformation
2.1.3 3D Transformation
The set of three-dimensional coordinate transformations is very similar to that available for 2D transformations.
2.1.3 3D Transformation
Translation: $\mathbf{x}' = \mathbf{x} + \mathbf{t}$
$\mathbf{x}' = \begin{bmatrix} \mathbf{I} & \mathbf{t} \end{bmatrix} \bar{\mathbf{x}}$, where $\mathbf{I}$ is the 3x3 identity matrix.
2.1.3 3D Transformation
Rotation + translation: $\mathbf{x}' = \mathbf{R}\mathbf{x} + \mathbf{t}$
$\mathbf{x}' = \begin{bmatrix} \mathbf{R} & \mathbf{t} \end{bmatrix} \bar{\mathbf{x}}$, with $\mathbf{R}\mathbf{R}^T = \mathbf{I}$ and $|\mathbf{R}| = 1$.
2.1.3 3D Transformation
Scaled rotation: $\mathbf{x}' = s\mathbf{R}\mathbf{x} + \mathbf{t}$
$\mathbf{x}' = \begin{bmatrix} s\mathbf{R} & \mathbf{t} \end{bmatrix} \bar{\mathbf{x}}$
The similarity transform preserves angles between lines and planes.
2.1.3 3D Transformation
Affine: $\mathbf{x}' = \mathbf{A}\bar{\mathbf{x}}$, where $\mathbf{A}$ is an arbitrary 3x4 matrix:
$\mathbf{x}' = \begin{bmatrix} a_{00} & a_{01} & a_{02} & a_{03} \\ a_{10} & a_{11} & a_{12} & a_{13} \\ a_{20} & a_{21} & a_{22} & a_{23} \end{bmatrix} \bar{\mathbf{x}}$
Parallel lines and planes remain parallel under affine transformations.
2.1.3 3D Transformation
Projective, also known as a perspective transform: $\tilde{\mathbf{x}}' = \tilde{\mathbf{H}} \tilde{\mathbf{x}}$, where $\tilde{\mathbf{H}}$ is an arbitrary 4x4 matrix. Perspective transformations preserve straight lines.
2.1.3 3D Transformation
2.1.4 3D Rotations
The biggest difference between 2D and 3D coordinate transformations is that the parameterization of the 3D rotation matrix $\mathbf{R}$ is not as straightforward.
2.1.4 3D Rotations
Axis/angle: a rotation can be represented by a rotation axis $\hat{\mathbf{n}}$ and an angle $\theta$, or equivalently by a 3D vector $\boldsymbol{\omega} = \theta\hat{\mathbf{n}}$.
2.1.4 3D Rotations
Project the vector $\mathbf{v}$ onto the axis $\hat{\mathbf{n}}$: $\mathbf{v}_\parallel = \hat{\mathbf{n}}(\hat{\mathbf{n}} \cdot \mathbf{v}) = (\hat{\mathbf{n}}\hat{\mathbf{n}}^T)\mathbf{v}$
The perpendicular residual of $\mathbf{v}$ from $\hat{\mathbf{n}}$: $\mathbf{v}_\perp = \mathbf{v} - \mathbf{v}_\parallel = (\mathbf{I} - \hat{\mathbf{n}}\hat{\mathbf{n}}^T)\mathbf{v}$
Rotate this vector by 90°: $\mathbf{v}_\times = \hat{\mathbf{n}} \times \mathbf{v} = [\hat{\mathbf{n}}]_\times \mathbf{v}$, where
$[\hat{\mathbf{n}}]_\times = \begin{bmatrix} 0 & -\hat{n}_z & \hat{n}_y \\ \hat{n}_z & 0 & -\hat{n}_x \\ -\hat{n}_y & \hat{n}_x & 0 \end{bmatrix}$
is the matrix form of the cross product operator.
2.1.4 3D Rotations
Rotating this vector by another 90°: $\mathbf{v}_{\times\times} = [\hat{\mathbf{n}}]_\times \mathbf{v}_\times = [\hat{\mathbf{n}}]_\times^2 \mathbf{v} = -\mathbf{v}_\perp$
Hence $\mathbf{v}_\parallel = \mathbf{v} - \mathbf{v}_\perp = \mathbf{v} + \mathbf{v}_{\times\times} = (\mathbf{I} + [\hat{\mathbf{n}}]_\times^2)\mathbf{v}$
The rotated in-plane component: $\mathbf{u}_\perp = \cos\theta\,\mathbf{v}_\perp + \sin\theta\,\mathbf{v}_\times = (\sin\theta\,[\hat{\mathbf{n}}]_\times - \cos\theta\,[\hat{\mathbf{n}}]_\times^2)\mathbf{v}$
The final rotated vector: $\mathbf{u} = \mathbf{u}_\perp + \mathbf{v}_\parallel = (\mathbf{I} + \sin\theta\,[\hat{\mathbf{n}}]_\times + (1 - \cos\theta)[\hat{\mathbf{n}}]_\times^2)\mathbf{v}$
2.1.4 3D Rotations
We can therefore write the rotation matrix corresponding to a rotation by $\theta$ around an axis $\hat{\mathbf{n}}$ (Rodrigues' formula) as
$\mathbf{R}(\hat{\mathbf{n}}, \theta) = \mathbf{I} + \sin\theta\,[\hat{\mathbf{n}}]_\times + (1 - \cos\theta)[\hat{\mathbf{n}}]_\times^2$
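As a concrete illustration, here is a short NumPy sketch of this formula; the axis and angle in the usage example are arbitrary test values:

```python
import numpy as np

def rotation_matrix(n, theta):
    """Rodrigues' formula: R = I + sin(theta) [n]x + (1 - cos(theta)) [n]x^2."""
    n = np.asarray(n, dtype=float)
    n = n / np.linalg.norm(n)              # the rotation axis must be unit length
    nx = np.array([[0.0, -n[2], n[1]],     # [n]x, the cross-product matrix
                   [n[2], 0.0, -n[0]],
                   [-n[1], n[0], 0.0]])
    return np.eye(3) + np.sin(theta) * nx + (1 - np.cos(theta)) * (nx @ nx)

R = rotation_matrix([0, 0, 1], np.pi / 2)           # 90 degrees about the z axis
print(np.round(R @ np.array([1.0, 0.0, 0.0]), 6))   # -> [0, 1, 0]
```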
2.1.5 3D to 2D Projections
We need to specify how 3D primitives are projected onto the image plane. The simplest model is orthography; the more commonly used model is perspective.
2.1.5 3D to 2D Projections
An orthographic projection simply drops the $z$ component of the three-dimensional coordinate $\mathbf{p}$ to obtain the 2D point $\mathbf{x}$: $\mathbf{x} = \begin{bmatrix} \mathbf{I}_{2\times2} & \mathbf{0} \end{bmatrix} \mathbf{p}$. In homogeneous coordinates, $\tilde{\mathbf{x}} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \tilde{\mathbf{p}}$
2.1.5 3D to 2D Projections
In practice, world coordinates need to be scaled to fit onto an image sensor, so scaled orthography is actually more commonly used: $\mathbf{x} = s \begin{bmatrix} \mathbf{I}_{2\times2} & \mathbf{0} \end{bmatrix} \mathbf{p}$
2.1.5 3D to 2D Projections
A closely related projection model is para-perspective. In this model, object points are first projected onto a local reference plane parallel to the image plane; they are then projected parallel to the line of sight to the object center.
2.1.5 3D to 2D Projections
The most commonly used projection in computer graphics and computer vision is true 3D perspective. Here, points are projected onto the image plane by dividing them by their $z$ component:
$\bar{\mathbf{x}} = \mathcal{P}_z(\mathbf{p}) = \begin{bmatrix} x/z \\ y/z \\ 1 \end{bmatrix}$
2.1.5 Camera Intrinsics
Once we have projected a 3D point through an ideal pinhole using a projection matrix, we must still transform the resulting coordinates according to the pixel sensor spacing and the relative position of the sensor plane to the origin.
2.1.5 Camera Intrinsics
The combined 2D to 3D projection can then be written as $\mathbf{p} = \mathbf{M}_s \bar{\mathbf{x}}_s$, where:
$(x_s, y_s)$: pixel coordinates
$(s_x, s_y)$: pixel spacings
$\mathbf{c}_s$: 3D origin coordinate
$\mathbf{R}_s$: 3D rotation matrix
$\mathbf{M}_s$: sensor homography matrix
2.1.5 Camera Intrinsics
The relationship between the 3D pixel center $\mathbf{p}$ and the 3D camera-centered point $\mathbf{p}_c$ is given by an unknown scaling $\alpha$: $\mathbf{p} = \alpha \mathbf{p}_c$. This leads to $\tilde{\mathbf{x}}_s = \mathbf{K} \mathbf{p}_c$ and, for world coordinates, $\tilde{\mathbf{x}}_s = \mathbf{K} \begin{bmatrix} \mathbf{R} & \mathbf{t} \end{bmatrix} \bar{\mathbf{p}}_w = \mathbf{P} \bar{\mathbf{p}}_w$, where:
$\mathbf{K}$: calibration matrix (camera intrinsics)
$\bar{\mathbf{p}}_w$: 3D world coordinates
$\begin{bmatrix} \mathbf{R} & \mathbf{t} \end{bmatrix}$: camera extrinsics
$\mathbf{P}$: camera matrix
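To make this concrete, here is a minimal Python sketch that assembles $\mathbf{P} = \mathbf{K}\begin{bmatrix} \mathbf{R} & \mathbf{t} \end{bmatrix}$ and projects a world point to pixel coordinates. All numeric values (focal length in pixels, principal point, test point) are hypothetical:

```python
import numpy as np

K = np.array([[500.0,   0.0, 320.0],   # hypothetical intrinsics:
              [  0.0, 500.0, 240.0],   # fx = fy = 500, principal point (320, 240)
              [  0.0,   0.0,   1.0]])
R = np.eye(3)                          # camera aligned with the world axes
t = np.zeros((3, 1))                   # camera at the world origin

P = K @ np.hstack([R, t])              # 3x4 camera matrix P = K [R | t]

pw = np.array([0.2, -0.1, 2.0, 1.0])   # world point in homogeneous coordinates
x = P @ pw
u, v = x[0] / x[2], x[1] / x[2]        # perspective division to pixels
print(u, v)                            # -> 370.0 215.0
```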
2.1.5 Camera Intrinsics
2.1.5 A Note on Focal Lengths
2.1.5 Camera Matrix
The 3x4 camera matrix is $\mathbf{P} = \mathbf{K} \begin{bmatrix} \mathbf{R} & \mathbf{t} \end{bmatrix}$. It can also be written as a full-rank 4x4 matrix $\tilde{\mathbf{P}} = \tilde{\mathbf{K}} \mathbf{E}$, where $\mathbf{E}$ is a 3D rigid-body (Euclidean) transformation and $\tilde{\mathbf{K}}$ is the full-rank calibration matrix.
2.1.6 Lens Distortions
The imaging models above all assume that cameras obey a linear projection model, where straight lines in the world result in straight lines in the image. Many wide-angle lenses have noticeable radial distortion, which manifests itself as a visible curvature in the projection of straight lines.
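Radial distortion is commonly modeled with low-order polynomial terms in the squared radius, $x' = x(1 + \kappa_1 r^2 + \kappa_2 r^4)$, and likewise for $y$. The sketch below assumes that model, with hypothetical coefficients; other parameterizations exist:

```python
def radial_distort(x, y, k1, k2):
    """Apply a polynomial radial distortion model to normalized coordinates."""
    r2 = x * x + y * y                    # squared distance from the image center
    scale = 1.0 + k1 * r2 + k2 * r2 * r2
    return x * scale, y * scale

# k1 < 0 gives barrel distortion (typical of wide-angle lenses);
# k1 > 0 gives pincushion distortion.
print(radial_distort(0.5, 0.5, -0.2, 0.0))
```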
2.1.6 Lens Distortions
2.2 Photometric Image Formation
Images are not composed of 2D features; they are made up of discrete color or intensity values. How do these values relate to the lighting in the environment, surface properties and geometry, camera optics, and sensor properties?
2.2.1 Lighting To produce an image, the scene must be illuminated with one or more light sources. A point light source originates at a single location in space (e.g., a small light bulb), potentially at infinity (e.g., the sun). A point light source has an intensity and a color spectrum, i.e., a distribution over wavelengths L(λ).
2.2.2 Reflectance and Shading
The Bidirectional Reflectance Distribution Function (BRDF) describes how much of each wavelength arriving at an incident direction $\hat{\mathbf{v}}_i$ is emitted in a reflected direction $\hat{\mathbf{v}}_r$. It is a function of the angles of the incident and reflected directions relative to the local surface frame, $f_r(\theta_i, \phi_i, \theta_r, \phi_r; \lambda)$.
2.2.2 Diffuse Reflection
2.2.2 Diffuse Reflection vs. Specular Reflection
2.2.2 Diffuse Reflection vs. Specular Reflection
2.2.2 Phong Shading
Phong reflection is an empirical model of local illumination that combines the diffuse and specular components of reflection. Objects are generally illuminated not only by point light sources but also by a diffuse (ambient) illumination corresponding to inter-reflection (e.g., the walls in a room) or distant sources, such as the blue sky.
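A minimal single-channel sketch of the Phong model follows; the reflectance coefficients and vectors in the example are arbitrary illustrative values:

```python
import numpy as np

def phong(n, l, v, ka, kd, ks, shininess, light=1.0, ambient=0.1):
    """Ambient + diffuse + specular shading for one color channel.
    n, l, v are unit vectors: surface normal, direction to the light,
    and direction to the viewer."""
    r = 2.0 * np.dot(n, l) * n - l                     # mirror reflection of l about n
    diffuse = kd * light * max(np.dot(n, l), 0.0)
    specular = ks * light * max(np.dot(r, v), 0.0) ** shininess
    return ka * ambient + diffuse + specular

n = np.array([0.0, 0.0, 1.0])
l = np.array([0.0, 0.6, 0.8])
v = np.array([0.0, 0.0, 1.0])
print(phong(n, l, v, ka=1.0, kd=0.7, ks=0.3, shininess=32))
```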
2.2.3 Optics
Once the light from a scene reaches the camera, it must still pass through the lens before reaching the sensor. For many applications, it suffices to treat the lens as an ideal pinhole.
2.2.3 Optics
2.2.3 Chromatic Aberration
Chromatic aberration is the tendency for light of different colors to focus at slightly different distances.
2.2.3 Vignetting
Vignetting is the tendency for the brightness of the image to fall off towards the edges of the image.
2.3 The Digital Camera
CCD: Charge-Coupled Device
CMOS: Complementary Metal-Oxide-Semiconductor
A/D: Analog-to-Digital Converter
DSP: Digital Signal Processor
JPEG: Joint Photographic Experts Group
2.3 Image Sensor
CCD (Charge-Coupled Device): photons are accumulated in each active well during the exposure time. In a transfer phase, the charges are transferred from well to well in a kind of "bucket brigade" until they are deposited at the sense amplifiers, which amplify the signal and pass it to an Analog-to-Digital Converter (ADC).
2.3 Image Sensor
CMOS (Complementary Metal-Oxide-Semiconductor): photons hitting the sensor directly affect the conductivity of a photodetector, which can be selectively gated to control exposure duration, and locally amplified before being read out using a multiplexing scheme.
2.3 Image Sensor
Shutter speed: the shutter speed (exposure time) directly controls the amount of light reaching the sensor and, hence, determines whether images are under- or over-exposed.
Sampling pitch: the physical spacing between adjacent sensor cells on the imaging chip.
2.3 Image Sensor
Fill factor: the active sensing area size as a fraction of the theoretically available sensing area (the product of the horizontal and vertical sampling pitches).
Chip size: video and point-and-shoot cameras have traditionally used small chip areas (1/4-inch to 1/2-inch sensors), while digital SLR (single-lens reflex) cameras try to come closer to the traditional size of a 35mm film frame. When overall device size is not important, having a larger chip size is preferable, since each sensor cell can be more photo-sensitive.
2.3 Image Sensor
Analog gain: before analog-to-digital conversion, the sensed signal is usually boosted by a sense amplifier.
Sensor noise: throughout the whole sensing process, noise is added from various sources, which may include fixed pattern noise, dark current noise, shot noise, amplifier noise, and quantization noise.
2.3 Image Sensor
ADC resolution: how many bits the analog-to-digital converter yields; its noise level determines how many of these bits are useful in practice.
Digital post-processing: enhancing the image before compressing and storing the pixel values.
2.3.1 Sampling and Aliasing
2.3.1 Nyquist Frequency
To avoid aliasing, the sampling frequency must be at least twice the maximum frequency present in the signal: $f_s \geq 2 f_{\max}$
The maximum frequency in a signal is known as the Nyquist frequency, and the inverse of the minimum sampling frequency, $r_s = 1/f_s$, is known as the Nyquist rate.
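A small numerical check (an assumed example, not from the slides): sampling a sinusoid above the Nyquist limit produces exactly the same samples as a lower-frequency alias:

```python
import numpy as np

fs = 10.0                            # sampling frequency; Nyquist limit is fs/2 = 5 Hz
t = np.arange(0, 1, 1 / fs)          # ten sample times over one second
high = np.sin(2 * np.pi * 9 * t)     # 9 Hz signal, well above the limit
alias = np.sin(2 * np.pi * -1 * t)   # the -1 Hz alias it folds down to (9 - fs)
print(np.allclose(high, alias))      # -> True: the two are indistinguishable
```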
2.3.1 Nyquist Frequency
2.3.2 Color
2.3.2 CIE RGB
In the 1930s, the Commission Internationale de l'Eclairage (CIE) standardized the RGB representation by performing color matching experiments using the primary colors of red (700.0nm wavelength), green (546.1nm), and blue (435.8nm). For certain pure spectra in the blue-green range, a negative amount of red light has to be added, i.e., some red must be added to the test color being matched.
2.3.2 CIE RGB
2.3.2 XYZ
The transformation from RGB to XYZ is given by
$\begin{bmatrix} X \\ Y \\ Z \end{bmatrix} = \frac{1}{0.17697} \begin{bmatrix} 0.49 & 0.31 & 0.20 \\ 0.17697 & 0.81240 & 0.01063 \\ 0.00 & 0.01 & 0.99 \end{bmatrix} \begin{bmatrix} R \\ G \\ B \end{bmatrix}$
Chromaticity coordinates: $x = \frac{X}{X+Y+Z}, \quad y = \frac{Y}{X+Y+Z}, \quad z = \frac{Z}{X+Y+Z}$, so that $x + y + z = 1$.
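In code, the conversion and chromaticity normalization look as follows; this sketch assumes linear CIE RGB inputs:

```python
import numpy as np

M = (1 / 0.17697) * np.array([[0.49,    0.31,    0.20],
                              [0.17697, 0.81240, 0.01063],
                              [0.00,    0.01,    0.99]])

def rgb_to_chromaticity(rgb):
    """Convert linear CIE RGB to XYZ, then to chromaticity (x, y)."""
    X, Y, Z = M @ np.asarray(rgb, dtype=float)
    s = X + Y + Z                    # normalizing sum, so x + y + z = 1
    return X / s, Y / s

print(rgb_to_chromaticity([1.0, 1.0, 1.0]))   # equal-energy white -> (1/3, 1/3)
```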
2.3.2 XYZ
2.3.2 L*a*b* Color Space
The CIE defined a non-linear re-mapping of the XYZ space called L*a*b* (also sometimes called CIELAB), where L* is lightness.
2.3.2 Color Cameras
Each color camera integrates light according to the spectral response functions of its red, green, and blue sensors:
$R = \int L(\lambda) S_R(\lambda)\, d\lambda, \quad G = \int L(\lambda) S_G(\lambda)\, d\lambda, \quad B = \int L(\lambda) S_B(\lambda)\, d\lambda$
where $L(\lambda)$ is the incoming spectrum of light at a given pixel and $S_R(\lambda)$, $S_G(\lambda)$, $S_B(\lambda)$ are the red, green, and blue spectral sensitivities of the corresponding sensors.
2.3.2 Color Filter Arrays
2.3.2 Bayer Pattern
Green filters are placed over half of the sensors (in a checkerboard pattern), and red and blue filters over the remaining ones. The reason there are twice as many green filters as red and blue is that the luminance signal is mostly determined by green values, and the visual system is much more sensitive to high-frequency detail in luminance than in chrominance. The process of interpolating the missing color values so that we have valid RGB values for all the pixels is known as demosaicing.
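Below is a minimal bilinear demosaicing sketch. It assumes an RGGB Bayer layout (real sensors vary) and simple averaging kernels; production demosaicers are considerably more sophisticated:

```python
import numpy as np
from scipy.ndimage import convolve

def demosaic_bilinear(raw):
    """Bilinear demosaicing of an RGGB Bayer mosaic (2D float array)."""
    h, w = raw.shape
    rows, cols = np.mgrid[0:h, 0:w]
    masks = {
        "r": (rows % 2 == 0) & (cols % 2 == 0),   # red on even rows/cols
        "g": (rows % 2) != (cols % 2),            # green on the checkerboard
        "b": (rows % 2 == 1) & (cols % 2 == 1),   # blue on odd rows/cols
    }
    k_rb = np.array([[1, 2, 1], [2, 4, 2], [1, 2, 1]]) / 4.0  # red/blue interp
    k_g = np.array([[0, 1, 0], [1, 4, 1], [0, 1, 0]]) / 4.0   # green interp
    out = np.zeros((h, w, 3))
    for i, (name, k) in enumerate([("r", k_rb), ("g", k_g), ("b", k_rb)]):
        out[..., i] = convolve(raw * masks[name], k, mode="mirror")
    return out

rgb = demosaic_bilinear(np.random.rand(8, 8))
print(rgb.shape)   # -> (8, 8, 3)
```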
2.3.2 Color Balance (Auto White Balance)
Moves the white point of a given image closer to pure white. If the illuminant is strongly colored, such as incandescent indoor lighting (which generally results in a yellow or orange hue), the compensation can be quite significant.
2.3.2 Gamma
The relationship between the voltage and the resulting brightness of a CRT display was characterized by a number called gamma ($\gamma$), since the formula was roughly $B = V^\gamma$, with $\gamma \approx 2.2$.
2.3.2 Gamma
To compensate for this effect, the electronics in the TV camera would pre-map the sensed luminance $Y$ through an inverse gamma: $Y' = Y^{1/\gamma}$, with $1/\gamma \approx 0.45$.
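The encode/decode pair is a one-liner each. The sketch below uses the slide's value of gamma = 2.2 and ignores the linear segment near black that standards such as sRGB add:

```python
import numpy as np

def gamma_encode(Y, gamma=2.2):
    return np.power(Y, 1.0 / gamma)    # Y' = Y^(1/2.2), roughly Y^0.45

def gamma_decode(Yp, gamma=2.2):
    return np.power(Yp, gamma)         # the display applies B = V^2.2

Y = np.array([0.0, 0.25, 0.5, 1.0])    # linear luminance in [0, 1]
print(gamma_decode(gamma_encode(Y)))   # round trip recovers the input
```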
2.3.2 Gamma
2.3.2 Other Color Spaces
YUV is computed from $R'G'B'$, the triplet of gamma-compressed color components.
2.3.2 Other Color Spaces
The JPEG standard uses the full eight-bit range with no reserved values, with $R'G'B'$ the eight-bit gamma-compressed color components. The Cb and Cr signals carry the blue and red color difference signals and have more useful mnemonics than UV.
2.3.2 Other Color Spaces
HSV (hue, saturation, value): a projection of the RGB color cube onto a non-linear chroma angle, a radial saturation percentage, and a luminance-inspired value.
2.3.2 Other Color Spaces
2.3.3 Compression
All color video and image compression algorithms start by converting the signal into YCbCr (or some closely related variant), so that they can compress the luminance signal with higher fidelity than the chrominance signal. In video, it is common to subsample Cb and Cr by a factor of two horizontally; with still images (JPEG), the subsampling (averaging) occurs both horizontally and vertically.
2.3.3 Compression Once the luminance and chrominance images have been appropriately subsampled and separated into individual images, they are then passed to a block transform stage. The most common technique used here is the Discrete Cosine Transform (DCT), which is a real-valued variant of the Discrete Fourier Transform (DFT).
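The sketch below applies a 2D DCT to one 8x8 block (random data, purely illustrative) to show the transform stage in isolation; JPEG additionally level-shifts, quantizes, and entropy-codes the coefficients:

```python
import numpy as np
from scipy.fft import dctn

block = np.random.rand(8, 8)          # stand-in for one 8x8 image block
coeffs = dctn(block, norm="ortho")    # orthonormal 2D DCT-II

# Energy compaction: the top-left (low-frequency) coefficients carry most
# of the signal, which is what makes coarse quantization effective.
print(coeffs[0, 0], np.abs(coeffs).max())
```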
2.3.3 Compression After transform coding, the coefficient values are quantized into a set of small integer values that can be coded using a variable bit length scheme such as a Huffman code or an arithmetic code.
2.3.3 Compression JPEG (Joint Photographic Experts Group)
2.3.3 PSNR (Peak Signal-to-Noise Ratio)
The quality of a compression algorithm is measured by:
$\text{MSE} = \frac{1}{n} \sum_x \left[ I(x) - \hat{I}(x) \right]^2$, where $I(x)$ is the original uncompressed image and $\hat{I}(x)$ is its compressed counterpart
$\text{RMS} = \sqrt{\text{MSE}}$
$\text{PSNR} = 10 \log_{10} \frac{I_{\max}^2}{\text{MSE}} = 20 \log_{10} \frac{I_{\max}}{\text{RMS}}$
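These formulas translate directly into a few lines of Python; the test images below are random data for illustration:

```python
import numpy as np

def psnr(original, compressed, i_max=255.0):
    """Peak signal-to-noise ratio in dB between an image and its
    compressed counterpart."""
    mse = np.mean((original.astype(float) - compressed.astype(float)) ** 2)
    return 10 * np.log10(i_max ** 2 / mse)   # = 20 log10(i_max / rms)

a = np.random.randint(0, 256, (64, 64))
b = np.clip(a + np.random.randint(-5, 6, a.shape), 0, 255)
print(psnr(a, b))
```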
Project due March 22
Camera calibration, i.e., compute #pixels/mm of object displacement.
Use lenses of focal length: 16mm, 25mm, 55mm
Object displacements of: 1mm, 5mm, 10mm, 20mm
Object distances of: 0.5m, 1m, 2m
Camera parameters: 8.8mm x 6.6mm sensor ==> 512 x 485 pixels
Are pixels square or rectangular?
Calculate theoretical values and compare with measured values.
Calculate field of view in degrees of angle.
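A sketch of the theoretical calculation (my own derivation under a thin-lens assumption, not a prescribed solution): with magnification $m = f/(Z - f)$, a 1mm object displacement moves the image by $m$ millimeters, i.e., $m$ divided by the pixel pitch in pixels:

```python
import math

SENSOR_W_MM, SENSOR_H_MM = 8.8, 6.6   # sensor size from the assignment
PIX_W, PIX_H = 512, 485               # pixel counts from the assignment

pitch_x = SENSOR_W_MM / PIX_W         # ~0.0172 mm/pixel horizontally
pitch_y = SENSOR_H_MM / PIX_H         # ~0.0136 mm/pixel vertically
# pitch_x != pitch_y, so the pixels are rectangular, not square.

def pixels_per_mm(f_mm, z_mm):
    """Predicted image displacement in pixels per mm of object displacement."""
    m = f_mm / (z_mm - f_mm)          # thin-lens magnification f / (Z - f)
    return m / pitch_x, m / pitch_y

def fov_deg(f_mm, sensor_mm):
    """Field of view in degrees along one sensor dimension."""
    return 2 * math.degrees(math.atan(sensor_mm / (2 * f_mm)))

for f in (16, 25, 55):                # focal lengths in mm
    print(f, round(fov_deg(f, SENSOR_W_MM), 1),
          [pixels_per_mm(f, z) for z in (500, 1000, 2000)])
```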