# Probabilistic video stabilization using Kalman filtering and mosaicking


ABSTRACT  The removal of unwanted, parasitic vibrations in a video sequence induced by camera motion is an essential part of video acquisition.  We present a new image processing method to remove such vibrations and reconstruct a video sequence void of sudden camera movement.

INTRODUCTION-1  One approach, optical stabilization, implements an optical system that compensates for unwanted camera motion using motion sensors and an active optical system. It is the most powerful approach, but it makes video cameras significantly more expensive.

INTRODUCTION-2  This paper focuses on another approach, which consists in post-processing the video sequence to eliminate unwanted motion (swings and twists) caused by a person holding the camera or by mechanical vibration.

VIDEO STABILIZATION AND RECONSTRUCTION FRAMEWORK-1  The overall algorithm consists of the following steps:

1. Video sequence stabilization
   1) Estimation of the pair-wise transformations between adjacent frames.
   2) Estimation of the intentional motion parameters (Kalman filtering in time).
   3) Compensation of each frame for unwanted motion (frame warping).

VIDEO STABILIZATION AND RECONSTRUCTION FRAMEWORK-2

2. Reconstruction of undefined regions using mosaicking:
   1) Estimation of the transformation between distant frames.
   2) Warping distant frames and constructing a mosaic for undefined regions in each frame.

The block diagram of the overall algorithm

Estimation of the pair-wise transformations between adjacent frames-1  Under an affine transformation, pixel locations in two frames are related by x' = A x + b (1), where x and x' are pixel coordinates before and after transformation, respectively. The elements of matrix A describe zoom, rotation and dolly motion of the camera, and vector b describes panning and tracking motion.

Estimation of the pair-wise transformations between adjacent frames-2  The transform aligning frames n and n+m is estimated by minimizing a cost function, referred to as (2), with respect to the transform parameters. Here m=1, and the cost is evaluated over the set of all locations in the image plane whose transformed coordinates lie within the valid image coordinates. The choice of function φ(x) is crucial for the robustness of the estimated transformation.

Estimation of the pair-wise transformations between adjacent frames-3  Here we use for φ an approximation to the ℓp-norm in which β=0.01 ensures differentiability of the cost function near zero; p=1 was chosen empirically from several test sequences. To avoid local minima of the cost function (2) and to accelerate convergence, we use a multi-scale implementation.
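The role of β and p can be illustrated with a small sketch. The exact formula for φ is not preserved in this transcript, so the form below, (r² + β)^(p/2), is an assumption: it is one common smooth approximation to |r|^p, and the function names `phi` and `robust_cost` are hypothetical.

```python
def phi(residual, beta=0.01, p=1.0):
    """Smooth stand-in for |residual|**p (assumed form, not taken from the slides)."""
    return (residual * residual + beta) ** (p / 2.0)


def robust_cost(residuals, beta=0.01, p=1.0):
    """Sum the penalty over intensity differences at all valid pixel locations."""
    return sum(phi(r, beta, p) for r in residuals)


# With p = 1 an outlier grows the cost roughly linearly, not quadratically.
inlier_residuals = [0.5, -0.3, 0.2]
print(robust_cost(inlier_residuals))
print(robust_cost(inlier_residuals + [100.0]))
```

With p=1 a single badly mismatched pixel adds about its absolute difference to the cost, which is why such a penalty is less sensitive to outliers than a squared error.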

Estimating intentional motion parameters-1  The cumulative transform for frame n is obtained by cascading the pair-wise transforms of the preceding frames. As with the pair-wise transform, the elements of its matrix A describe zoom, rotation and dolly motion of the camera, and its vector b describes panning and tracking motion. Similarly, we describe the image transform parameters representing intentional motion in terms of an intentional cumulative transform.
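A minimal sketch of how pair-wise affine transforms might be accumulated is given below. The composition order (each pair-wise transform maps frame n-1 coordinates to frame n coordinates, and the cumulative transform maps frame 0 to frame n) is an assumption, since the original formula is not preserved; `compose` and `cumulative_transforms` are hypothetical names.

```python
def compose(outer, inner):
    """Return the affine map x -> outer(inner(x)); each map is (A, b), A 2x2, b length 2."""
    (A2, b2), (A1, b1) = outer, inner
    A = [[sum(A2[i][k] * A1[k][j] for k in range(2)) for j in range(2)]
         for i in range(2)]
    b = [sum(A2[i][k] * b1[k] for k in range(2)) + b2[i] for i in range(2)]
    return A, b


def cumulative_transforms(pairwise):
    """Accumulate pair-wise maps (frame n-1 -> n) into cumulative maps (frame 0 -> n)."""
    identity = ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
    result = [identity]
    for step in pairwise:
        result.append(compose(step, result[-1]))
    return result


eye = [[1.0, 0.0], [0.0, 1.0]]
shifts = [(eye, [1.0, 2.0]), (eye, [3.0, -1.0])]
print(cumulative_transforms(shifts)[-1])
```

For pure translations the cumulative vector b is simply the running sum of the pair-wise vectors, which matches the simplified model used in the results.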

Estimating intentional motion parameters-2  Optimal estimation of the intentional cumulative transform is carried out using a recursive Kalman filtering algorithm. We treat the observed cumulative transform parameters as noisy observations of the intentional cumulative transform parameters, which obey a physics-based dynamic model. The state model for each parameter depends on its expected real-life behavior. Two distinct behavior patterns can be identified, leading to different dynamic models for different parameters.

Two distinct behavior patterns leading to different dynamic models for different parameters  We introduce a velocity variable for each of four of the transform parameters. It is reasonable to assume that the dynamic models of these 4 parameters are independent. Each such parameter, together with its velocity variable, follows a dynamic model driven by white Gaussian noise of a given variance.

Two distinct behavior patterns leading to different dynamic models for different parameters  The remaining parameters are assumed to be constant in the absence of noise, which gives a simpler dynamic model for them.

The overall state-space model for the intentional cumulative transform parameters combines these per-parameter models. The variance of the noise term is different for each kind of variable.

The observed cumulative transform parameters are treated as noisy observations of the intentional cumulative transform parameters. The observation model for each parameter is independent. The observation noise variances describe the variability of the unwanted transformation between frames.
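The recursion for a single parameter with a velocity variable can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: it assumes a constant-velocity state model (value advances by velocity, noise of variance `q` enters the velocity), a scalar observation of the value with noise variance `r`, and arbitrary initial covariances. All names and numeric values are hypothetical.

```python
def kalman_constant_velocity(observations, q=1e-3, r=1.0):
    """Filter one transform parameter; state is (value, velocity)."""
    value, velocity = observations[0], 0.0
    p00, p01, p11 = 1.0, 0.0, 1.0  # assumed initial state covariance
    filtered = [value]
    for z in observations[1:]:
        # Predict: value advances by velocity; velocity keeps its value plus noise.
        value_pred = value + velocity
        p00_pred = p00 + 2.0 * p01 + p11
        p01_pred = p01 + p11
        p11_pred = p11 + q
        # Update with the noisy observation of the value.
        s = p00_pred + r
        k0, k1 = p00_pred / s, p01_pred / s
        innovation = z - value_pred
        value = value_pred + k0 * innovation
        velocity = velocity + k1 * innovation
        p00 = (1.0 - k0) * p00_pred
        p01 = (1.0 - k0) * p01_pred
        p11 = p11_pred - k1 * p01_pred
        filtered.append(value)
    return filtered


def roughness(seq):
    """Sum of squared second differences, a simple measure of jitter."""
    return sum((seq[i + 1] - 2.0 * seq[i] + seq[i - 1]) ** 2
               for i in range(1, len(seq) - 1))


# A steady pan (slope 1 per frame) corrupted by alternating +/-1 jitter.
observed = [n + (1.0 if n % 2 == 0 else -1.0) for n in range(60)]
smoothed = kalman_constant_velocity(observed)
print(roughness(observed), roughness(smoothed))
```

A larger `r` relative to `q` makes the filter trust the dynamic model more and the observations less, so more of the frame-to-frame variation is treated as unwanted motion. A parameter assumed constant in the absence of noise would use the same recursion without the velocity variable.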

Compensation of each frame for unwanted transformation (frame warping)  The resulting correcting transform (9) relates the initial and transformed coordinates in frame n. Using (9), a warped frame is computed as in (10). Image values at non-integer locations in (10) are computed by cubic interpolation.
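The warping step can be sketched with inverse mapping: for each output pixel, find the source location under an affine correcting transform and interpolate. The sketch below assumes the transform has the form x' = A x + b and uses the Keys cubic convolution kernel as one possible cubic interpolator; the slides do not say which cubic scheme was used. Pixels whose source falls outside the frame are marked `None`, and all names are hypothetical.

```python
import math


def cubic_weight(t, a=-0.5):
    """Keys cubic convolution kernel (a = -0.5 is a common choice)."""
    t = abs(t)
    if t <= 1.0:
        return (a + 2.0) * t ** 3 - (a + 3.0) * t ** 2 + 1.0
    if t < 2.0:
        return a * t ** 3 - 5.0 * a * t ** 2 + 8.0 * a * t - 4.0 * a
    return 0.0


def sample_cubic(image, x, y):
    """Interpolate image (list of rows) at a non-integer (x, y), clamping at borders."""
    height, width = len(image), len(image[0])
    ix, iy = math.floor(x), math.floor(y)
    value = 0.0
    for j in range(iy - 1, iy + 3):
        wy = cubic_weight(y - j)
        row = image[min(max(j, 0), height - 1)]
        for i in range(ix - 1, ix + 3):
            value += wy * cubic_weight(x - i) * row[min(max(i, 0), width - 1)]
    return value


def warp_frame(image, A, b):
    """Warp with x' = A x + b; pixels whose source lies outside the frame are None."""
    height, width = len(image), len(image[0])
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    inv = [[A[1][1] / det, -A[0][1] / det], [-A[1][0] / det, A[0][0] / det]]
    warped = []
    for yp in range(height):
        out_row = []
        for xp in range(width):
            dx, dy = xp - b[0], yp - b[1]
            x = inv[0][0] * dx + inv[0][1] * dy
            y = inv[1][0] * dx + inv[1][1] * dy
            inside = 0.0 <= x <= width - 1 and 0.0 <= y <= height - 1
            out_row.append(sample_cubic(image, x, y) if inside else None)
        warped.append(out_row)
    return warped


ramp = [[float(x + 10 * y) for x in range(8)] for y in range(8)]
shifted = warp_frame(ramp, [[1.0, 0.0], [0.0, 1.0]], [1.5, 0.0])
print(shifted[3])
```

The `None` entries along the left edge of the shifted frame are the undefined regions that appear once a compensating transform is applied.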

Reconstruction of undefined regions using mosaicking  After the compensating transformation is applied to each frame, undefined regions appear near the edges of each frame. The extent of these regions varies from frame to frame and produces unacceptable visual artifacts. Frame trimming and magnification, or filling with a constant value, lead to severe quality degradation of the resulting video and limit the range of possible correcting transformations. Here we propose to use mosaicking for each frame in order to exploit temporal correlations between frames.

Mosaicking illustration

Estimation of the transformation between distant frames-1  In order to properly align up to M future and past frames with respect to the current warped frame n, we need to find the registration parameters of these frames with respect to the current frame. For a given frame n, we sequentially estimate the global transform parameters between frames n and n ± m, where 2 ≤ m ≤ M. For each m, transforms that have already been estimated are cascaded and used to initialize the solution.

Estimation of the transformation between distant frames-2  The coordinate transformation obtained from cascaded transforms is the composition of the individual transforms. For instance, the transformation for a past frame (n - m) with respect to frame n can be found by inverting the registration of frame n as a future frame of frame (n - m).
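Inverting an affine registration, as in the past-frame example above, can be sketched as follows. The transform is assumed to have the form x' = A x + b; the numeric values and the names `invert_affine` and `apply_affine` are hypothetical.

```python
def invert_affine(A, b):
    """Invert x' = A x + b, returning (A_inv, b_inv) such that x = A_inv x' + b_inv."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    A_inv = [[A[1][1] / det, -A[0][1] / det],
             [-A[1][0] / det, A[0][0] / det]]
    b_inv = [-(A_inv[0][0] * b[0] + A_inv[0][1] * b[1]),
             -(A_inv[1][0] * b[0] + A_inv[1][1] * b[1])]
    return A_inv, b_inv


def apply_affine(A, b, point):
    """Apply x' = A x + b to a 2-D point."""
    return [A[0][0] * point[0] + A[0][1] * point[1] + b[0],
            A[1][0] * point[0] + A[1][1] * point[1] + b[1]]


# Hypothetical registration of frame n as a future frame of frame n - m.
A_fwd, b_fwd = [[1.1, 0.2], [-0.1, 0.9]], [3.0, -2.0]
A_back, b_back = invert_affine(A_fwd, b_fwd)
print(apply_affine(A_back, b_back, apply_affine(A_fwd, b_fwd, [5.0, 7.0])))
```

Reusing the inverse avoids a second registration for the past direction, or at least provides a starting point for refining it.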

Warping distant frames and composing mosaic for regions in each frame-1  Each of the 2M neighboring frames is aligned with respect to the warped current frame given by (10). The aligning transform for a neighboring frame is formed by cascading its inverted registration transform with the correcting transform defined in (9), which gives the resulting warping transform.

Warping distant frames and composing mosaic for regions in each frame-2  The warped neighboring frame is then computed with this transform. For each undefined pixel x in the target frame, the reconstructed image value is a weighted combination of the warped neighboring frames, where the weights are set to the inverse of the registration errors obtained by minimizing (2).
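A minimal sketch of the filling step is shown below. It assumes the combination is a normalized weighted average with weights equal to the inverse registration error, and that undefined pixels are represented as `None`; the exact formula is not preserved in the transcript, and `fill_undefined` is a hypothetical name.

```python
def fill_undefined(target, neighbors, errors):
    """Fill None pixels of target with an error-weighted average of warped neighbors.

    neighbors: warped frames aligned with target (None where undefined).
    errors: one positive registration error per neighbor; weight = 1 / error.
    """
    filled = [row[:] for row in target]
    for y, row in enumerate(target):
        for x, value in enumerate(row):
            if value is not None:
                continue  # defined pixels of the current frame are kept as-is
            num = den = 0.0
            for frame, err in zip(neighbors, errors):
                if frame[y][x] is not None:
                    num += frame[y][x] / err
                    den += 1.0 / err
            if den > 0.0:
                filled[y][x] = num / den
    return filled


target = [[None, 5.0]]
neighbors = [[[10.0, 0.0]], [[20.0, 0.0]]]
print(fill_undefined(target, neighbors, errors=[1.0, 3.0]))
```

Frames that registered poorly contribute less to the mosaic, and a pixel that no neighbor covers stays undefined.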

RESULTS  We test our technique on 3 real-life video sequences (which we call A, B and C). ABCABC  To simplify the task we modify the motion model, first we assume only translational motion between described by vector b.  Using this model, the cumulative transform parameters are given by the components of for sequence A are shown in Figure3.

Cumulative motion parameters for sequence A

Assuming a static camera (performing "total motion compensation"), the correcting transform simply cancels the cumulative translation. The result of applying such a compensating transform is illustrated in Figure 4.

In the figure, it can be seen that landmark objects in the corrected sequence do not move with respect to the frame coordinates, while rotational vibrations remain uncorrected.

Full 6-parameter inter-frame affine motion model

Full results for sequences A, B and C

CONCLUSIONS  Using our technique we obtained promising preliminary results on random test sequences with complex motion and severe vibrations.  We compared our results with one of commercial products and showed a significant improvement of performance for our technique. Compared A and B. ABAB  Our method of stabilization can be easily adapted to perform additional processing, such as sampling rate conversion, static mosaic construction, ego-motion estimation.