Understanding and evaluating blind deconvolution algorithms




1 Understanding and evaluating blind deconvolution algorithms
Anat Levin (1,2), Yair Weiss (1,3), Fredo Durand (1), Bill Freeman (1,4)
(1) MIT CSAIL, (2) Weizmann Institute, (3) Hebrew University, (4) Adobe

2 Blind deconvolution
blurred image = sharp image ⊗ kernel; both the sharp image and the kernel are unknown.
Rich literature, no perfect solution: Fergus et al. 06, Levin 06, Jia 07, Joshi et al. 08, Shan et al. 08.
In this talk: no new algorithm. What makes blind deconvolution hard? Quantitatively evaluate recent algorithms on the same dataset.
Images captured by a hand-held camera are often degraded by camera shake. Under certain assumptions the blur can be expressed as a convolution of a sharp image with a blur kernel, where both the sharp image and the blur kernel are unknown. The blind deconvolution task is to recover both from a single input image. Blind deconvolution is a long-standing signal processing problem that has attracted a lot of attention, and it has recently gained renewed interest in our community. Despite the rich literature, the problem is far from solved. This talk does not present a new algorithm either. Instead, our goal is to analyze the major components of the blind deconvolution problem: what actually makes it hard, and which aspects should attract future research effort. We then quantitatively evaluate recent blind deconvolution algorithms on the same data.

3 Blind deconvolution: unknowns to estimate
blurred image (input, known) = sharp image ⊗ blur kernel + noise; the sharp image and the blur kernel are unknown and need to be estimated.
In blind deconvolution we are given a blurred image y, which is the convolution of a sharp image x with a blur kernel k, plus noise. Both x and k are unknown, so the number of unknowns is larger than the number of constraints and the problem is ill-posed. For example, there are two perfectly valid solutions for the same input. The first is the desired one: the true sharp image and true kernel. A second possible interpretation is that there was no blur at all, i.e. k is a delta kernel and the sharp image x equals the observed blurry image y; this solution also perfectly satisfies the convolution equation. Since there is an infinite space of possible solutions, a common approach is to add a prior that favors an image x which looks like a natural image.
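To make the formation model concrete, here is a minimal numpy sketch (an illustration, not the authors' code; the image, kernel, and noise level are all placeholders) of the model y = k ⊗ x + n, showing that the trivial "no blur" explanation (x = y, k = delta) satisfies the convolution constraint just as well as the true pair:

```python
import numpy as np
from scipy.signal import convolve2d

rng = np.random.default_rng(0)
x = rng.random((64, 64))                      # stand-in "sharp image"
k = np.ones((5, 5)) / 25.0                    # true blur kernel (box, sums to 1)
noise = 0.01 * rng.standard_normal((64, 64))
y = convolve2d(x, k, mode="same", boundary="symm") + noise   # observed blurry image

# Trivial alternative explanation: no blur at all.
k_delta = np.zeros((5, 5)); k_delta[2, 2] = 1.0
y_delta = convolve2d(y, k_delta, mode="same", boundary="symm")

# Both explanations satisfy the convolution constraint (up to noise):
print(np.allclose(y_delta, y))                # True: the delta kernel reproduces y exactly
```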

4 Natural image priors
Parametric models of the log-probability of a derivative x: Gaussian: -x^2; Laplacian: -|x|; sparse (hyper-Laplacian): -|x|^0.5, -|x|^0.25.
[Figure: derivative log-histogram from a natural image, with the parametric fits overlaid.]
One strong property of natural images is their sparse derivative distribution. If we plot the log-histogram of derivatives in a natural image, we can fit it with a parametric model of the form |x|^alpha with alpha smaller than one, and a distribution of the form exp(-|x|^alpha) with alpha < 1 is sparse. Derivative distributions in natural images are sparse.
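As an illustration (my own sketch, not from the paper), the three penalties above applied to an image's derivatives; the image array and the choice of alpha are placeholders. The sparse penalty sum |grad x|^alpha with alpha < 1 is the cost used throughout the talk:

```python
import numpy as np

def derivative_penalty(img, alpha=0.5):
    """Sum of |horizontal and vertical derivatives|**alpha.

    alpha = 2  -> Gaussian-style penalty
    alpha = 1  -> Laplacian penalty
    alpha < 1  -> sparse (hyper-Laplacian) penalty used in the talk
    """
    dx = np.diff(img, axis=1)   # filter [-1, 1]
    dy = np.diff(img, axis=0)   # filter [-1; 1]
    return np.sum(np.abs(dx) ** alpha) + np.sum(np.abs(dy) ** alpha)
```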

5 Sparse priors in image processing
Denoising: Simoncelli et al., Roth & Black
Inpainting: Sapiro et al., Levin et al.
Super resolution: Tappen et al.
Transparency: Levin et al.
Demosaicing: Tappen et al., Hel-Or et al.
Non-blind deconvolution
Sparse priors are a very popular tool in image processing and have been used in a large number of applications. We will check whether this prior can help solve the blind deconvolution problem as well.

6 Naïve MAPx,k estimation
Find a kernel k and latent image x minimizing: [convolution constraint] + [sparse prior]. The sparse prior should favor sharper x explanations.
The simplest approach to blind deconvolution is MAP estimation of the latent image x and kernel k: we seek the pair (x, k) that maximizes the posterior probability given the observed image y. This reduces to finding a pair that satisfies the convolution constraint up to noise and has sparse derivatives, i.e. minimizes the sum of derivative magnitudes to the power alpha. Now that we have a good prior, we might expect that optimizing the MAPx,k score will "of course" solve blind deconvolution and return the desired sharp image…
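Written out, the MAPx,k objective sketched above (a reconstruction from the slide's "convolution constraint + sparse prior" description; lambda is the weight balancing the two terms and g_x, g_y denote horizontal and vertical derivative filters):

$$
(\hat{x},\hat{k}) \;=\; \arg\min_{x,k}\;\; \lambda\,\|k \otimes x - y\|^2 \;+\; \sum_i \big(|g_{x,i}(x)|^{\alpha} + |g_{y,i}(x)|^{\alpha}\big), \qquad \alpha < 1 .
$$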

7 The MAPx,k paradox
Claim 1: Let x be an arbitrarily large image sampled from a sparse prior, and let y = k ⊗ x. Then the delta explanation is favored: P(x = y, k = delta | y) > P(x = true image, k = true kernel | y).
But here we hit a problem: it turns out that the delta explanation is more probable than the correct one. The probability of the (blurred image, delta kernel) pair is higher than the probability of the (sharp image, correct kernel) pair. In fact, the paper proves this always happens: given an arbitrarily large image x sampled exactly from a sparse prior and a blurred version y, the explanation that sets the image to the observed blurred image y and the kernel to a delta is more probable than the correct explanation with the true sharp image and true kernel. This means the MAPx,k approach will always fail.

8 The MAPx,k failure: sharp vs. blurred
To illustrate this, we take a natural image and blur it with a Gaussian filter. We then consider local windows from each image and compare the sparsity measure of the blurred and sharp windows, i.e. the sum of derivative magnitudes to the power alpha. If the sharp explanation is more probable, the sum of derivatives in the sharp window should be lower.

9 The MAPx,k failure
Red windows = [ p(sharp x) > p(blurred x) ]. Window sizes: 15x15, 25x25, 45x45. Filters: simple derivatives [-1,1], [-1;1] and FoE filters (Roth & Black).
We mark in red all local windows in which the sharp explanation is more probable. This actually happens in only a small percentage of the windows. If we increase the window size to 25x25 pixels, an even smaller percentage of windows favors the sharp explanation, and when the window size is as large as 45x45 pixels the blurred version is favored in 100% of the windows. Contrary to our intuition, the sparse prior actually favors the blurred image, not the sharp one. So far we have evaluated only simple first derivatives computed with the [-1 1] filter, but the problem persists with more sophisticated filters as well, for example the Field of Experts filters of Roth and Black: again the blurred image is favored.
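A minimal sketch of this window test (my own illustration, not the authors' code; the Gaussian blur width and the non-overlapping window tiling are assumptions), using the sum-of-|derivatives|^alpha cost from slide 4:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def sparsity_cost(patch, alpha=0.5):
    # sum of |derivatives|^alpha with the [-1,1] and [-1;1] filters
    return (np.abs(np.diff(patch, axis=1)) ** alpha).sum() + \
           (np.abs(np.diff(patch, axis=0)) ** alpha).sum()

def fraction_sharp_favored(sharp, win=15, alpha=0.5, sigma=3.0):
    """Fraction of local windows in which the sharp window is cheaper
    (i.e. more probable under the sparse prior) than the blurred one."""
    blurred = gaussian_filter(sharp, sigma)
    favored, total = 0, 0
    for i in range(0, sharp.shape[0] - win, win):
        for j in range(0, sharp.shape[1] - win, win):
            s = sparsity_cost(sharp[i:i+win, j:j+win], alpha)
            b = sparsity_cost(blurred[i:i+win, j:j+win], alpha)
            favored += s < b
            total += 1
    return favored / total

# Usage (with any grayscale image array `img` scaled to [0, 1]):
# for win in (15, 25, 45):
#     print(win, fraction_sharp_favored(img, win))
```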

10 The MAPx,k failure - intuition
Blur kernel k = [0.5, 0.5].
Step edge: P(step edge) > P(blurred step edge); the sharp edge has the cheaper sum of derivatives.
Impulse: P(impulse) < P(blurred impulse); the blurred impulse has the cheaper sum of derivatives.
To understand why the sparse prior favors the blurred image, consider two toy signals. The first is a step edge and its blurred version. The sharp edge has a single derivative with absolute value 1; in the smooth edge the contrast is divided between two pixels, giving two derivatives of magnitude 0.5. Measuring the sum of derivatives to the power alpha = 0.5, the sharp edge costs 1 while the smooth edge costs about 1.4, so the sharp edge is cheaper. This is the canonical example in which the sparse prior indeed favors the sharp signal, as we might have originally thought. Now consider a second type of signal, an impulse. The sharp impulse has two derivatives of magnitude 1. When we blur it, the height of the impulse is reduced, leaving two derivatives of magnitude only 0.5. The contrast of the smooth signal is lower, so it is no wonder that its sum of derivatives is cheaper: the smoothed impulse is more probable than the sharp one.
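The arithmetic behind the two toy cases, as a small sketch (my own numbers, chosen to match the slide's k = [0.5, 0.5] and alpha = 0.5):

```python
import numpy as np

def cost(signal, alpha=0.5):
    return np.sum(np.abs(np.diff(signal)) ** alpha)

step         = np.array([0, 0, 1, 1], float)
blurred_step = np.array([0, 0, 0.5, 1, 1], float)       # step convolved with [0.5, 0.5]
impulse      = np.array([0, 0, 1, 0, 0], float)
blurred_imp  = np.array([0, 0, 0.5, 0.5, 0, 0], float)  # impulse convolved with [0.5, 0.5]

print(cost(step), cost(blurred_step))    # 1.0 vs ~1.41 -> the sharp edge is cheaper
print(cost(impulse), cost(blurred_imp))  # 2.0 vs ~1.41 -> the blurred impulse is cheaper
```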

11 Blur reduces derivative contrast
P(sharp real image) < P(blurred real image): the blurred image is cheaper.
What happens in real images? Here we plot a row from a natural image and its derivative signal. Real images contain a lot of texture that behaves like impulse signals: its contrast is reduced by blur. Blurring a natural image therefore reduces the total derivative contrast, so the blurred image achieves a higher probability than the sharp one.
[Figure: a real image row and its derivatives.] Noise and texture behave as impulses: total derivative contrast is reduced by blur.

12 Why does MAPx,k fail?
Too few measurements? It fails even with an infinitely large image.
Wrong prior? It fails even for signals sampled from the prior.
The choice of estimator.
We have seen that a sparse prior with MAPx,k fails and cannot favor the sharp solution. What is actually wrong? One possible reason is that we do not have enough measurements, but we prove that the problem persists even with an infinitely large image. A second explanation could be that the sparse prior is wrong, but we show that the problem remains even when we directly sample a signal from the sparse prior, so the prior is perfectly correct in the generative sense. In the second part of this talk we show that the problem is the choice of estimation rule.

13 MAPk estimation
MAPx,k estimates x and k simultaneously: $\arg\max_{x,k} P(x, k \mid y)$.
MAPk estimates k alone, marginalizing over x: $\arg\max_{k} P(k \mid y)$, where $P(k \mid y) = \int P(x, k \mid y)\,dx$.
So far we have discussed MAP estimation of both x and k simultaneously and seen that this approach fails. We now consider a second estimator: MAP on the kernel k alone. For every candidate k we evaluate P(k|y), obtained by marginalizing P(x,k|y) over all possible images x. The main difference is that we now consider the volume of all x explanations, not only the probability of the single best x.
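Expanding the marginalization with Bayes' rule (a standard derivation, added here for clarity rather than copied from the slide) makes explicit what is being integrated:

$$
P(k \mid y) \;\propto\; P(k)\, P(y \mid k) \;=\; P(k) \int P(y \mid x, k)\, P(x)\, dx ,
$$

so each candidate kernel is scored by how well the whole family of plausible sharp images explains y, not by the single best x.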

14 Results in this paper
Let x be an arbitrarily large image sampled from a sparse prior, and let y = k ⊗ x. Then:
Claim 1 (the MAPx,k estimator fails): the delta explanation is favored.
Claim 2 (the MAPk estimator succeeds): P(k|y) is maximized by the true kernel.
In the first part of the talk we showed that, given an arbitrarily large image x sampled from a sparse prior and a blurred version y, MAP estimation of both x and k favors the delta explanation over the correct one. The paper also proves that under the exact same conditions, with a sufficiently large image, MAP estimation of k alone, marginalizing over x, succeeds and favors the true kernel.

15 Intuition: dimensionality asymmetry
MAPx,k: estimation unreliable. The number of measurements is always lower than the number of unknowns: #y < #x + #k.
MAPk: estimation reliable. Many measurements for large images: #y >> #k.
Blurred image y: ~10^5 measurements. Sharp image x: large, ~10^5 unknowns. Kernel k: small, ~10^2 unknowns.
The main factor that makes MAP estimation of k alone succeed while simultaneous MAP estimation of x and k fails is the strong asymmetry in the dimensionality of the two unknowns: the dimensionality of the latent image x grows with the size of the input image y, whereas the size of the blur kernel is fixed and usually far smaller than the image. The limitation of MAP estimators given few measurements is well known in estimation theory and statistical signal processing; estimation theory also tells us that MAP estimators provably converge to the true solution given a sufficiently large number of measurements. When we do MAP on both x and k, the number of unknowns (the size of x plus the size of k) always exceeds the number of measurements (the number of pixels in y), so the estimation is unreliable. In contrast, when we do MAP on k alone, for a sufficiently large image the number of unknowns is far smaller than the number of measurements, because k is much smaller than the image; the estimation is therefore reliable and we can recover the correct kernel.

16 Approximate MAPk strategies
Marginalization over x is challenging to compute. Approximation strategies:
Independence assumption in derivative space: Levin NIPS06.
Variational approximation: Miskin and Mackay 00, Fergus et al. SIGGRAPH06.
Laplace approximation: Brainard and Freeman 97, Bronstein et al. 05.
While we have proved that MAP estimation of k alone can favor the true kernel, evaluating it in practice is not simple because marginalizing over all x explanations is computationally expensive; there are, however, several approximation strategies. In fact, the paper shows that a number of recent blind deconvolution algorithms are essentially approximate MAPk estimators.

17 Evaluation on 1D signals
Exact MAPk: favors the correct solution.
MAPx,k: favors the delta solution.
MAPk, Gaussian prior: favors the correct solution despite the wrong prior!
MAPk, variational approximation (Fergus et al.): favors the correct solution.
So far we have discussed the theory of the MAPx,k and MAPk estimators; we now move to evaluating blind deconvolution strategies. In the first evaluation we used 1D row signals from a natural image. The advantage of 1D signals is that the MAPk score, marginalizing over x, can be computed exactly, while for 2D images one must use some approximation. In this experiment we blurred the signal with a 5-tap box filter and assumed the correct kernel is known to be a box filter, so the only unknown parameter is the box width and we perform an exhaustive search over it. With MAP on k alone, the negative log-likelihood achieves its minimum at the correct kernel width of 5. As discussed before, MAP on both x and k favors the delta solution at width 1. Another option considered in the paper is MAP on the kernel with a Gaussian prior on the image derivatives instead of a sparse one; the advantage of a Gaussian prior is that the MAPk score can be evaluated in closed form. It is well known that the distribution of natural images is highly non-Gaussian, yet despite the wrong prior, Gaussian MAPk favors the correct kernel width. This stands in contrast to the failure of MAP on both x and k with the correct sparse prior, and it suggests that the estimation strategy is essentially more important than using the correct prior. The last score we consider is a variational approximation to MAPk with a sparse prior, essentially the approach taken by Fergus et al.; this approximation also favors the correct box kernel width.
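A minimal 1D sketch of the Gaussian-prior MAPk score mentioned above (my own implementation under a circular-convolution assumption, not the authors' code; the hyper-parameters beta, eta, and eps are assumptions, not values from the paper). With a Gaussian prior on derivatives, y given k is zero-mean Gaussian whose covariance diagonalizes in the Fourier domain, so -log P(y|k) has a closed form and can be minimized over the box width:

```python
import numpy as np

def neg_log_py_given_k(y, k_full, beta=50.0, eta=0.01, eps=1e-6):
    """-log P(y | k) for a 1D signal under a Gaussian prior on derivatives.

    Assumes circular convolution, so the covariance of y diagonalizes in the
    Fourier domain with per-frequency variance |K|^2 / (beta |G|^2) + eta^2.
    beta: prior precision, eta: noise std, eps: regularizer for the DC term.
    """
    n = len(y)
    K = np.fft.fft(k_full, n)                 # kernel spectrum
    G = np.fft.fft([1.0, -1.0], n)            # derivative-filter spectrum
    S = np.abs(K) ** 2 / (beta * np.abs(G) ** 2 + eps) + eta ** 2
    Y = np.fft.fft(y)
    return 0.5 * np.sum(np.abs(Y) ** 2 / (n * S) + np.log(S))

def box(width, n):
    k = np.zeros(n); k[:width] = 1.0 / width
    return k

# Usage sketch: blur a 1D row `row` (length 512) with a 5-tap box filter and
# score candidate widths; the correct width should give the lowest score.
# y = np.real(np.fft.ifft(np.fft.fft(row) * np.fft.fft(box(5, 512))))
# scores = {w: neg_log_py_given_k(y, box(w, 512)) for w in range(1, 10)}
```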

18 Ground truth data acquisition 4 images x 8 kernels = 32 test images
We now move to evaluation on 2D images. We captured motion-blur data with ground truth, and we make this data publicly available online. The data consists of 4 images times 8 kernels, 32 test images in total.
4 images x 8 kernels = 32 test images
Data available online:

19 Comparison
Panels: ground truth; MAPx,k; Fergus et al. SIGGRAPH06 (MAPk, variational approx.); Shan et al. SIGGRAPH08 (adjusted MAPx,k); MAPk, Gaussian prior.
Here is one test image from the dataset. We implemented a MAP on x and k algorithm and, as expected, it returns the delta kernel. The Fergus et al. algorithm approximates MAP on k and successfully recovers the kernel and sharp image. We also tried the algorithm of Shan et al.; it seemingly does MAP on both x and k, but it has many additional components built into it which are hard to analyze and apparently do affect the results. Finally, Gaussian MAP on k recovers something resembling the right kernel, but the result is of lower quality because it still uses the wrong prior. Note that for the Fergus and Shan results we used the authors' code.

20 Evaluation
[Plot: cumulative histogram of deconvolution successes; success percentage vs. error-ratio threshold r, bin r = #{ deconv error < r }. Curves: Fergus (variational MAPk), Shan et al. SIGGRAPH08, MAPk with Gaussian prior, MAPx,k with sparse prior.]
Here we plot the cumulative histogram of success for the different algorithms: bin r shows the percentage of test examples for which an algorithm achieves a deconvolution error below threshold r. Empirically, error ratios above a threshold of 2-3 are already visually implausible. The algorithm of Fergus et al. outperforms all existing alternatives by far, producing acceptable results on about 70% of the images while all other methods succeed on fewer than 30%. The second best algorithm is that of Shan et al. Gaussian MAP on k does not do as well, because it uses a wrong prior and natural images are highly non-Gaussian; still, it does better than MAP on x and k with a sparse prior, demonstrating again that the choice of estimator is more important than the prior itself.
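For reference, a small sketch (an illustration, not the paper's evaluation code; the threshold grid is an assumption) of how such a cumulative success histogram can be computed from a list of per-image error ratios:

```python
import numpy as np

def cumulative_success(error_ratios, thresholds=np.arange(1.0, 6.0, 0.5)):
    """Percentage of test images whose deconvolution error ratio falls below
    each threshold r, i.e. bin r = 100 * #{error < r} / #images."""
    errs = np.asarray(error_ratios, float)
    return {float(r): 100.0 * np.mean(errs < r) for r in thresholds}

# e.g. cumulative_success([1.3, 2.1, 4.0, 1.1, 8.5, 2.8])
```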

21 Problem: uniform blur assumption is unrealistic
Before we conclude, one last point. Most existing blind deconvolution algorithms assume the blur is spatially uniform over the image and can be expressed as a convolution. However, our attempts to capture blurred images with a hand-held camera showed that this model is unrealistic. For example, this slide shows the traces of 4 points at the 4 corners of a planar object, and the blur is different at each corner. Note that Fergus et al. performed a similar experiment in their SIGGRAPH 2006 paper but reached the opposite conclusion. For the evaluation in this paper we made a special effort to capture images with spatially uniform blur, because this assumption is made by all the algorithms we wanted to evaluate. Yet we predict that making blind deconvolution work in practice on real examples will require algorithms that relax the uniform blur assumption.
Variation of dot traces at 4 corners. Note: opposite conclusion reached by Fergus et al., 2006.

22 Ground truth data available online
Summary
A good estimator is more important than the correct prior: the MAPk approach can do deconvolution even with a Gaussian prior, while the MAPx,k approach fails even with a sparse prior.
The spatially uniform blur assumption is invalid.
We compare blind deconvolution algorithms on the same dataset; Fergus et al. 06 significantly outperforms all alternatives.
Ground truth data is available online.
To summarize, in this talk we discussed the difference between MAP on k alone and simultaneous MAP on both x and k, and we saw that the choice of estimator is more important than the correct prior: MAP on k can succeed even with a wrong Gaussian prior, whereas MAP on both x and k fails even with the correct sparse prior. We also argued that the common model of spatially uniform blur is unrealistic. We collected ground-truth data and quantitatively compared existing blind deconvolution algorithms; the algorithm of Fergus et al. significantly outperforms all existing alternatives. Finally, we make our data publicly available online and encourage other researchers to use it to evaluate other deconvolution algorithms.

