Probabilistic video stabilization using Kalman filtering and mosaicking.
Probabilistic video stabilization using Kalman filtering and mosaicking
ABSTRACT The removal of unwanted, parasitic vibrations in a video sequence induced by camera motion is an essential part of video acquisition. We present a new image processing method that removes such vibrations and reconstructs a video sequence free of sudden camera movement.
INTRODUCTION-1 One approach (optical stabilization) consists of implementing an optical system that compensates for unwanted camera motion using motion sensors and an active optical system. It is the most powerful approach, but it makes video cameras significantly more expensive.
INTRODUCTION-2 This paper focuses on another approach, which performs post-processing of the video sequence to eliminate unwanted motion in the video (swings and twists) caused by a person holding the camera or by mechanical vibration.
VIDEO STABILIZATION AND RECONSTRUCTION FRAMEWORK-1 The overall algorithm consists of the following steps:
1. Video sequence stabilization
   1) Estimation of the pair-wise transformations between adjacent frames.
   2) Estimation of the intentional motion parameters (Kalman filtering in time).
   3) Compensation of each frame for unwanted motion (frame warping).
VIDEO STABILIZATION AND RECONSTRUCTION FRAMEWORK-2
2. Reconstruction of undefined regions using mosaicking:
   1) Estimation of the transformations between distant frames.
   2) Warping distant frames and constructing a mosaic for the undefined regions in each frame.
Estimation of the pair-wise transformations between adjacent frames-1 Under an affine transformation, pixel locations in frames n−1 and n are related by x′ = A·x + b, where x and x′ are the pixel coordinates before and after the transformation, respectively. The elements of the matrix A describe zoom, rotation and dolly motion of the camera, and the vector b describes panning and tracking motion.
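The affine model above can be made concrete with a tiny sketch (names are mine, not the paper's): A carries the zoom/rotation part and b the translation part of the inter-frame motion.

```python
import numpy as np

def affine_transform(x, A, b):
    """Map a pixel coordinate x to x' = A @ x + b.

    A (2x2) captures zoom, rotation and dolly motion; b (2-vector)
    captures panning and tracking, as described in the slide above.
    """
    return A @ np.asarray(x, dtype=float) + np.asarray(b, dtype=float)

# Example: a pure 2x zoom plus a pan of (5, -3).
A = np.array([[2.0, 0.0], [0.0, 2.0]])
b = np.array([5.0, -3.0])
print(affine_transform([10.0, 10.0], A, b))  # -> [25. 17.]
```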
Estimation of the pair-wise transformations between adjacent frames-2 The transform T, aligning frames n and n−m, is estimated by minimizing the cost function C(T) = Σ_{x∈P} φ(I_n(T(x)) − I_{n−m}(x)) with respect to the parameters of T, where m = 1 for adjacent frames and P is the set of all locations in the image plane for which the transformed coordinates lie within the limits of the valid image coordinates. The choice of the function φ(x) is crucial for the robustness of the estimated transformation.
Estimation of the pair-wise transformations between adjacent frames-3 Here we use an approximation to the L_p-norm given by φ(x) = (x² + β)^{p/2}, with β = 0.01, which ensures differentiability of the cost function near zero; p = 1 was chosen empirically from several test sequences. In order to avoid local minima of the cost function (2) and to accelerate convergence, we use a multi-scale implementation.
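The robust penalty and cost can be sketched as below. The closed form of φ here, (x² + β)^{p/2}, is a reconstruction consistent with the slide's description (an L_p-norm approximation, differentiable near zero thanks to β), not a formula quoted from the paper; the parameter search itself is omitted.

```python
import numpy as np

def phi(x, p=1.0, beta=0.01):
    # Smoothed approximation to |x|^p; beta keeps it differentiable at zero.
    return (x * x + beta) ** (p / 2.0)

def cost(frame_n, frame_prev_warped):
    """Robust registration cost: sum of phi over the intensity differences.

    frame_prev_warped is the previous frame resampled under the candidate
    transform; in the paper the sum runs over the valid overlap set P.
    """
    diff = frame_n - frame_prev_warped
    return phi(diff).sum()
```

In practice this cost would be minimized over the six affine parameters with a gradient-based optimizer, coarse-to-fine over an image pyramid, which is what the multi-scale implementation above refers to.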
Estimating intentional motion parameters-1 The cumulative transform for frame n, denoted (A_n^c, b_n^c), can be obtained by cascading the pair-wise transforms: A_n^c = A_n·A_{n−1}^c and b_n^c = A_n·b_{n−1}^c + b_n. The elements of the matrix A_n^c describe the cumulative zoom, rotation and dolly motion of the camera, and the vector b_n^c describes the cumulative panning and tracking motion. Similarly, we describe the image transform parameters representing intentional motion in terms of the intentional cumulative transform (Ã_n^c, b̃_n^c).
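The cascading of pair-wise transforms into cumulative ones follows directly from function composition: applying (A1, b1) and then (A2, b2) gives x → A2·A1·x + (A2·b1 + b2). A minimal sketch (helper names are mine):

```python
import numpy as np

def cascade(A1, b1, A2, b2):
    """Compose two affine transforms: apply (A1, b1) first, then (A2, b2)."""
    return A2 @ A1, A2 @ b1 + b2

def cumulative(pairwise):
    """Fold a list of pair-wise (A, b) transforms into cumulative transforms."""
    A_c, b_c = np.eye(2), np.zeros(2)
    out = []
    for A, b in pairwise:
        A_c, b_c = cascade(A_c, b_c, A, b)
        out.append((A_c.copy(), b_c.copy()))
    return out
```

For pure translations the cumulative translation is simply the running sum of the pair-wise ones, which is also the simplified model used in the RESULTS section.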
Estimating intentional motion parameters-2 Optimal estimation of the intentional cumulative transform is carried out using a recursive Kalman filtering algorithm. We treat the observed cumulative transform parameters as noisy observations of the intentional cumulative transform parameters, which obey a physics-based dynamic model. The state model for each of the parameters depends on the real-life expected behavior of these parameters. Two distinct behavior patterns can be identified, leading to different dynamic models for different parameters.
Two distinct behavior patterns leading to different dynamic models for different parameters We introduce velocity variables for the translation parameters b_x and b_y, respectively. It is reasonable to assume independence of the dynamic models for each of the 4 parameters. For example, b_x and its velocity v_x follow the constant-velocity dynamic model b_{x,n+1} = b_{x,n} + v_{x,n}, v_{x,n+1} = v_{x,n} + w_n, where w_n is white Gaussian noise with variance σ_w².
Two distinct behavior patterns leading to different dynamic models for different parameters The remaining parameters (the elements of the matrix A) are assumed to be constant in the absence of noise. The simple dynamic model for these parameters is a random walk: a_{n+1} = a_n + u_n, where u_n is white Gaussian noise.
The overall state-space model for the intentional cumulative transform parameters combines these dynamic models; the variance of the noise term is different for each kind of variable.
The observed cumulative transform parameters are treated as noisy observations of the intentional cumulative transform parameters. The observation model for each parameter is independent, leading to an observation model of the form y_n = s_n + v_n, where v_n is observation noise. The observation noise variances describe the variability of the unwanted transformation between frames.
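The Kalman filtering step above can be sketched for a single translation parameter. This is a minimal illustration, not the paper's implementation: the state is [position, velocity] following the constant-velocity model, the noisy cumulative translation is the observation, and the tuning variances q and r are assumed values standing in for the paper's process- and observation-noise variances.

```python
import numpy as np

def kalman_smooth(y, q=1e-3, r=1.0):
    """1-D Kalman filter over noisy cumulative-translation samples y.

    State s = [position, velocity]; we observe only the position.
    q: process-noise variance, r: observation-noise variance (tuning knobs).
    """
    F = np.array([[1.0, 1.0], [0.0, 1.0]])   # constant-velocity dynamics
    H = np.array([[1.0, 0.0]])               # observation picks out position
    Q = q * np.eye(2)
    R = np.array([[r]])
    s = np.array([y[0], 0.0])                # initialize at first observation
    P = np.eye(2)
    out = []
    for z in y:
        # Predict step: propagate state and covariance through the dynamics.
        s = F @ s
        P = F @ P @ F.T + Q
        # Update step: blend the prediction with the new observation.
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)
        s = s + K @ (np.array([z]) - H @ s)
        P = (np.eye(2) - K @ H) @ P
        out.append(s[0])
    return np.array(out)
```

The filtered positions estimate the intentional (smooth) camera path; the difference between observed and filtered positions is the unwanted motion to be compensated.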
Compensation of each frame for unwanted transformation (frame warping) The resulting correcting transform is obtained by composing the inverse of the observed cumulative transform with the intentional cumulative transform, mapping the initial coordinates x to the transformed coordinates x′ in frame n. Using (9), a warped frame is computed by resampling the original frame at the transformed coordinates. Note: computing image values at non-integer locations in (10) is carried out by cubic interpolation.
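The warping step can be sketched as follows. This is an illustrative resampler, not the paper's code: it assumes SciPy is available, uses `map_coordinates` with spline order 3 for the cubic interpolation mentioned above, and marks out-of-bounds samples with NaN, which foreshadows the undefined regions discussed in the mosaicking section.

```python
import numpy as np
from scipy.ndimage import map_coordinates

def warp_frame(frame, A, b):
    """Resample a grayscale frame under the affine map T(x) = A @ x + b.

    For every output pixel x we look up the input frame at T(x); values
    at non-integer locations come from cubic interpolation (order=3).
    Out-of-bounds lookups are filled with NaN (undefined regions).
    """
    h, w = frame.shape
    ys, xs = np.mgrid[0:h, 0:w].astype(float)
    coords = np.stack([ys.ravel(), xs.ravel()])          # (2, h*w), row/col
    src = A @ coords + np.asarray(b, dtype=float)[:, None]
    warped = map_coordinates(frame, src, order=3, cval=np.nan)
    return warped.reshape(h, w)
```

With an identity A and a translation b of one row, the bottom row of the output has no source pixels and comes back as NaN, exactly the kind of undefined region the mosaicking stage later fills.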
Reconstruction of undefined regions using mosaicking After the compensating transformation is applied to each frame, undefined regions appear near the edges of each frame. The extent of these regions varies from frame to frame and presents unacceptable visual artifacts. Using frame trimming and magnification, or filling with a constant value, leads to severe quality degradation of the resulting video and limits the range of possible correcting transformations. Here we propose to use mosaicking for each frame in order to exploit the temporal correlations between frames.
Estimation of the transformations between distant frames-1 In order to properly align up to M future and past frames with respect to the current warped frame n, we need to find the registration parameters of these frames with respect to the current frame. For a given frame n, we sequentially estimate the global transform parameters between frames n and n ± m, where 2 ≤ m ≤ M. For each m, cascaded transforms are used as initial conditions for the solution.
Estimation of the transformations between distant frames-2 The coordinate transformation obtained using cascaded transforms is given by composing the intermediate pair-wise transforms. For instance, the transformation for a past frame n − m with respect to frame n can be found by inverting the registration of frame n viewed as a future frame of frame n − m.
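The inversion used for past frames above can be written out explicitly. This is a small illustrative helper (the function name is mine, not the paper's): inverting x′ = A·x + b gives x = A⁻¹·x′ − A⁻¹·b, so a forward registration cascaded with its inverse is the identity.

```python
import numpy as np

def invert_affine(A, b):
    """Invert the affine map x' = A @ x + b, giving x = A_inv @ x' + b_inv."""
    A_inv = np.linalg.inv(np.asarray(A, dtype=float))
    return A_inv, -A_inv @ np.asarray(b, dtype=float)

# Round trip: applying the forward map and then its inverse recovers x.
A = np.array([[1.1, 0.1], [0.0, 0.9]])
b = np.array([3.0, -2.0])
A_inv, b_inv = invert_affine(A, b)
x = np.array([7.0, 4.0])
print(A_inv @ (A @ x + b) + b_inv)  # recovers x
```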
Warping distant frames and composing the mosaic for undefined regions in each frame-1 Each frame out of the 2M neighboring frames is aligned with respect to the warped current frame given by (10). The aligning transform for each frame is formed by cascading the inverted registration transform with the correcting transform defined in (9). The resulting warping transform is given by this cascade.
Warping distant frames and composing the mosaic for undefined regions in each frame-2 The warped neighboring frames are computed as in (10). For each undefined pixel x in the target frame, the reconstructed image value is found as a weighted average of the values contributed by the aligned neighboring frames, where the weights are set to the inverse of the registration errors obtained by minimizing (2).
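The weighted fill just described can be sketched as below. This is an illustrative implementation under stated assumptions: undefined pixels (in both the target and the aligned neighbors) are marked with NaN, `errors` holds the per-frame registration costs from minimizing (2), and the function name is mine.

```python
import numpy as np

def mosaic_fill(target, aligned, errors):
    """Fill NaN (undefined) pixels of target from aligned neighbor frames.

    aligned: warped neighbor frames (same shape as target, NaN where they
    are themselves undefined); errors: per-frame registration costs.
    Each contributing frame is weighted by 1/error, so better-registered
    frames contribute more.
    """
    out = target.copy()
    undef = np.isnan(target)
    num = np.zeros_like(target)   # weighted sum of contributions
    den = np.zeros_like(target)   # sum of weights per pixel
    for frame, err in zip(aligned, errors):
        w = 1.0 / err
        valid = ~np.isnan(frame)
        num[valid] += w * frame[valid]
        den[valid] += w
    fill = undef & (den > 0)      # only fill where some neighbor covers x
    out[fill] = num[fill] / den[fill]
    return out
```

Pixels not covered by any neighbor stay undefined, and pixels already defined in the target are left untouched.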
RESULTS We tested our technique on 3 real-life video sequences (which we call A, B and C). To simplify the task we modify the motion model: first, we assume only translational motion between frames, described by the vector b. Using this model, the cumulative transform parameters are given by the components of the cumulative translation; the components for sequence A are shown in Figure 3.
CONCLUSIONS Using our technique we obtained promising preliminary results on random test sequences with complex motion and severe vibrations. We compared our results with one of the commercial products and showed a significant performance improvement for our technique (compare sequences A and B). Our method of stabilization can be easily adapted to perform additional processing, such as sampling-rate conversion, static mosaic construction, and ego-motion estimation.