# Matrix Factorization with Unknown Noise

Deyu Meng

References:

Deyu Meng, Fernando De la Torre. Robust Matrix Factorization with Unknown Noise. International Conference on Computer Vision (ICCV), 2013.

Qian Zhao, Deyu Meng, Zongben Xu, Wangmeng Zuo, Lei Zhang. Robust Principal Component Analysis with Complex Noise. International Conference on Machine Learning (ICML), 2014.

My lecture is about matrix factorization with unknown noise.

Low-rank matrix factorization is widely used in computer vision.
Applications: structure from motion, photometric stereo, face modeling, and background subtraction (e.g., Zheng et al., 2012; Eriksson and Hengel, 2010; Candes et al., 2012). Low-rank matrix factorization aims to factorize a given data matrix into the product of two smaller matrices, which encode the subspace information underlying the data. Such subspace information is very useful for many practical computer vision tasks like these.

Complete, clean data (or data with Gaussian noise): SVD gives the global solution
In practice, however, there are always missing data, and there is always heavy, complex noise.
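A minimal NumPy sketch of why SVD gives the global solution for complete data: by the Eckart-Young theorem, the truncated SVD is a best rank-k approximation in the Frobenius norm, and its error equals the root-sum-square of the discarded singular values. The helper name `best_rank_k` and the synthetic matrix are illustrative assumptions. The sketch requires every entry to be observed, which is exactly the assumption that fails once data are missing.

```python
import numpy as np


def best_rank_k(X, k):
    """Return U (m x k) and V (k x n) such that U @ V is a best rank-k approximation of X.

    Uses the truncated SVD; by the Eckart-Young theorem this globally minimizes
    ||X - U V||_F, but only when every entry of X is observed.
    """
    U_full, s, Vt = np.linalg.svd(X, full_matrices=False)
    U = U_full[:, :k] * s[:k]  # absorb the top-k singular values into U
    V = Vt[:k, :]
    return U, V


# Example: a rank-3 matrix plus small Gaussian noise, fully observed
rng = np.random.default_rng(0)
X = rng.standard_normal((8, 3)) @ rng.standard_normal((3, 6))
X += 0.01 * rng.standard_normal(X.shape)

U, V = best_rank_k(X, 3)
s = np.linalg.svd(X, compute_uv=False)
residual = np.linalg.norm(X - U @ V)     # Frobenius norm of the fitting error
discarded = np.sqrt(np.sum(s[3:] ** 2))  # energy in the dropped singular values
print(residual, discarded)               # the two values agree up to rounding
```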

L2 norm model vs. L1 norm model
L2 norm model: $\min_{\mathbf{U},\mathbf{V}} \|\mathbf{W}\odot(\mathbf{X}-\mathbf{U}\mathbf{V})\|_F$. Methods: Young diagram (CVPR, 2008), L2 Wiberg (IJCV, 2007), LM_S/LM_M (IJCV, 2008), SALS (CVIU, 2010), LRSDP (NIPS, 2010), Damped Wiberg (ICCV, 2011), Weighted SVD (Technometrics, 1979), WLRA (ICML, 2003), Damped Newton (CVPR, 2005). Pros: smooth model, faster algorithms, global optimum for data without missing entries. Cons: not robust to heavy outliers.

L1 norm model: $\min_{\mathbf{U},\mathbf{V}} \|\mathbf{W}\odot(\mathbf{X}-\mathbf{U}\mathbf{V})\|_1$. Methods: Torre&Black (ICCV, 2001), R1PCA (ICML, 2006), PCAL1 (PAMI, 2008), ALP/AQP (CVPR, 2005), L1Wiberg (CVPR, 2010, best paper award), RegL1ALM (CVPR, 2012), CWM (AAAI, 2013), Reg-ALM-L1 (CVPR, 2013). Pros: robust to extreme outliers. Cons: non-smooth model, slow algorithms, poor performance on data with Gaussian noise.

Matrix factorization tasks are mainly solved with two optimization models, based on the L2 and L1 norms, respectively. Here W is the weight matrix indicating the positions of the missing entries. Many methods have been proposed for both models. L2-model methods are generally faster owing to the smoothness of the model, and when no data are missing, SVD finds the global optimum. However, the L2 model is not robust to outliers or heavy non-Gaussian noise, and it has local minima when data are missing. L1-model methods have recently attracted much attention because of their robustness to outliers. However, the L1 model is non-smooth, the related methods are generally very slow, and they often perform poorly on data with Gaussian noise.
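The robustness trade-off is visible even in the simplest special case: fitting a single constant c to a vector, i.e. a rank-1 factorization with U fixed to a column of ones. The L2 solution is the mean, which one gross outlier drags away; the L1 solution is the median, which ignores it. The data below are made up for illustration, and the grid search only double-checks the two closed-form minimizers.

```python
import numpy as np

# Fit one constant c to x, i.e. x ~ U V with U a column of ones and V = [c]
x = np.array([1.0, 1.1, 0.9, 1.0, 50.0])  # four clean values and one gross outlier

c_l2 = x.mean()      # argmin_c sum (x_i - c)^2: pulled toward the outlier
c_l1 = np.median(x)  # argmin_c sum |x_i - c|: unaffected by the outlier

# Brute-force check of both minimizers on a fine grid of candidate constants
grid = np.linspace(0.0, 20.0, 20001)
l2_loss = ((x[None, :] - grid[:, None]) ** 2).sum(axis=1)  # smooth in c
l1_loss = np.abs(x[None, :] - grid[:, None]).sum(axis=1)   # non-smooth: kinks at each x_i
print(c_l2, grid[l2_loss.argmin()])  # both about 10.8
print(c_l1, grid[l1_loss.argmin()])  # both about 1.0
```

The kinks in `l1_loss` are the one-dimensional version of the non-smoothness that makes L1 factorization methods slower to optimize.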

The L2 model is optimal for Gaussian noise
The L1 model is optimal for Laplacian noise. But real noise is generally neither Gaussian nor Laplacian. Both models have intrinsic limitations: the L2 model is statistically optimal only for Gaussian noise, and the L1 model only for Laplacian noise. Since real noise is generally neither, both models tend to be inappropriate in real applications.
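"Optimal" here means maximum likelihood. As a sketch, assume i.i.d. noise $\varepsilon_{ij}$ on the observed entries $\Omega$, write $\mathbf{u}_i$ for the $i$-th row of $\mathbf{U}$ and $\mathbf{v}_j$ for the $j$-th column of $\mathbf{V}$, and let $w_{ij}=1$ on $\Omega$ and $0$ elsewhere (this notation is ours). The negative log-likelihood then reduces, up to constants, to the L2 objective under Gaussian noise and to the L1 objective under Laplacian noise:

$$
\varepsilon_{ij}\sim\mathcal{N}(0,\sigma^2):\quad
-\log p(\mathbf{X}\mid\mathbf{U},\mathbf{V})
=\frac{1}{2\sigma^2}\sum_{(i,j)\in\Omega}\bigl(x_{ij}-\mathbf{u}_i\mathbf{v}_j\bigr)^2+\text{const}
=\frac{1}{2\sigma^2}\,\|\mathbf{W}\odot(\mathbf{X}-\mathbf{U}\mathbf{V})\|_F^2+\text{const}
$$

$$
\varepsilon_{ij}\sim\mathrm{Laplace}(0,b):\quad
-\log p(\mathbf{X}\mid\mathbf{U},\mathbf{V})
=\frac{1}{b}\sum_{(i,j)\in\Omega}\bigl|x_{ij}-\mathbf{u}_i\mathbf{v}_j\bigr|+\text{const}
=\frac{1}{b}\,\|\mathbf{W}\odot(\mathbf{X}-\mathbf{U}\mathbf{V})\|_1+\text{const}
$$

Each model is therefore the maximum-likelihood estimator only when the noise actually follows its distribution.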

Yale B faces (figure): saturation and shadow noise; camera noise

We propose a Mixture of Gaussians (MoG) noise model
Universal approximation property of MoG: an MoG can approximate any continuous distribution (Maz’ya and Schmidt, 1996). E.g., a Laplace distribution can be expressed exactly as a scale mixture of Gaussians (Andrews and Mallows, 1974). Our idea is to model the noise as a mixture of Gaussians, motivated by this universal approximation ability. By doing so, we expect to extend the effective range of current L2 and L1 matrix factorization methods.
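A numerical check of the Laplace example: if the variance $v$ is drawn from an exponential distribution with rate $1/(2b^2)$ and $x \mid v \sim \mathcal{N}(0, v)$, the marginal density of $x$ is exactly the Laplace density with scale $b$. The sketch below integrates over $v$ on the grid $t=\log v$ (a choice made here so the integrand is smooth); the function names and grid limits are illustrative assumptions.

```python
import numpy as np


def laplace_pdf(x, b):
    """Density of the zero-mean Laplace distribution with scale b."""
    return np.exp(-np.abs(x) / b) / (2.0 * b)


def gaussian_scale_mixture_pdf(x, b, t_min=-30.0, t_max=10.0, n=20001):
    """Density at x of N(0, v) mixed over v ~ Exponential(rate = 1 / (2 b^2)).

    The integral over the variance v is computed on the grid t = log(v), where the
    integrand is smooth and decays quickly at both ends. Intended for x != 0.
    """
    lam = 1.0 / (2.0 * b * b)                 # rate of the exponential mixing density
    t = np.linspace(t_min, t_max, n)
    v = np.exp(t)                             # variances of the mixed Gaussians
    mixing = lam * np.exp(-lam * v)           # p(v)
    gauss = np.exp(-x * x / (2.0 * v)) / np.sqrt(2.0 * np.pi * v)  # N(x | 0, v)
    f = mixing * gauss * v                    # change of variables: dv = v dt
    return float(np.sum(f) * (t[1] - t[0]))   # Riemann sum; endpoint terms are negligible


for x in (0.5, 1.0, 2.0):
    print(x, laplace_pdf(x, 1.0), gaussian_scale_mixture_pdf(x, 1.0))
```

This is a continuous scale mixture; a finite MoG with a few components can be seen as a coarse discretization of it.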

MLE model: use the EM algorithm to solve it!
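A sketch of the MoG-noise likelihood and the EM updates it leads to, written in standard EM form with our own notation (an assumption, not copied from the slides): $\Omega$ is the set of observed entries, $\mathbf{u}_i$ the $i$-th row of $\mathbf{U}$, $\mathbf{v}_j$ the $j$-th column of $\mathbf{V}$, and $\gamma_{ijk}$ the posterior responsibility of component $k$ for entry $(i,j)$.

$$
x_{ij}=\mathbf{u}_i\mathbf{v}_j+\varepsilon_{ij},\qquad
\varepsilon_{ij}\sim\sum_{k=1}^{K}\pi_k\,\mathcal{N}(0,\sigma_k^2),\qquad
\max_{\mathbf{U},\mathbf{V},\boldsymbol{\pi},\boldsymbol{\sigma}}\;
\sum_{(i,j)\in\Omega}\log\sum_{k=1}^{K}\pi_k\,\mathcal{N}\bigl(x_{ij}\mid\mathbf{u}_i\mathbf{v}_j,\sigma_k^2\bigr)
$$

E step:

$$
\gamma_{ijk}=\frac{\pi_k\,\mathcal{N}(x_{ij}\mid\mathbf{u}_i\mathbf{v}_j,\sigma_k^2)}{\sum_{l=1}^{K}\pi_l\,\mathcal{N}(x_{ij}\mid\mathbf{u}_i\mathbf{v}_j,\sigma_l^2)}
$$

M step:

$$
N_k=\sum_{(i,j)\in\Omega}\gamma_{ijk},\qquad
\pi_k=\frac{N_k}{\sum_{l=1}^{K}N_l},\qquad
\sigma_k^2=\frac{1}{N_k}\sum_{(i,j)\in\Omega}\gamma_{ijk}\bigl(x_{ij}-\mathbf{u}_i\mathbf{v}_j\bigr)^2
$$

$$
\min_{\mathbf{U},\mathbf{V}}\sum_{(i,j)\in\Omega}\sum_{k=1}^{K}\frac{\gamma_{ijk}}{2\sigma_k^2}\bigl(x_{ij}-\mathbf{u}_i\mathbf{v}_j\bigr)^2
=\min_{\mathbf{U},\mathbf{V}}\|\mathbf{W}\odot(\mathbf{X}-\mathbf{U}\mathbf{V})\|_F^2,
\qquad
w_{ij}=\begin{cases}\sqrt{\sum_{k}\gamma_{ijk}/(2\sigma_k^2)}, & (i,j)\in\Omega\\ 0, & \text{otherwise}\end{cases}
$$

The U, V update is thus a weighted L2 problem, which the L2 solvers listed earlier are designed to handle.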

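E step and M step: in the M step, $\mathbf{U}$ and $\mathbf{V}$ are updated by minimizing a weighted L2 objective, $\|\mathbf{W}\odot(\mathbf{X}-\mathbf{U}\mathbf{V})\|_F^2$.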
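A compact NumPy sketch of the EM iteration, under stated assumptions. The E step and the closed-form updates of the mixing weights and variances are the standard MoG-EM formulas. The U/V part of the M step solves the weighted L2 problem with weights $w_{ij}^2=\sum_k\gamma_{ijk}/(2\sigma_k^2)$ using one sweep of exact alternating weighted least squares. The name `mog_lrmf`, the random initialization, the numerical floors, and the synthetic data are hypothetical choices for illustration, not the authors' implementation.

```python
import numpy as np


def _log_gauss(e, var):
    # Elementwise log N(e | 0, var)
    return -0.5 * (np.log(2.0 * np.pi * var) + e ** 2 / var)


def _log_mixture_terms(e, pi, var):
    # Array of shape (K, n_obs): log pi_k + log N(e_t | 0, var_k)
    return np.log(pi)[:, None] + _log_gauss(e[None, :], var[:, None])


def _loglik(e, pi, var):
    # sum_t log sum_k pi_k N(e_t | 0, var_k), computed with log-sum-exp
    lp = _log_mixture_terms(e, pi, var)
    mx = lp.max(axis=0)
    return float(np.sum(mx + np.log(np.exp(lp - mx).sum(axis=0))))


def mog_lrmf(X, mask, rank, n_comp=2, n_iter=30, seed=0):
    """EM sketch for X ~ U @ V with mixture-of-Gaussians noise on observed entries."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    obs = mask.astype(bool)
    X0 = np.where(obs, X, 0.0)                   # missing entries never enter the fit
    U = rng.standard_normal((m, rank))
    V = rng.standard_normal((rank, n))
    e = (X0 - U @ V)[obs]                        # residuals on observed entries
    pi = np.full(n_comp, 1.0 / n_comp)
    var = e.var() * np.logspace(-2, 0, n_comp)   # spread-out initial variances (assumption)
    history = [_loglik(e, pi, var)]
    for _ in range(n_iter):
        # E step: responsibilities gamma[k, t] of component k for residual t
        lp = _log_mixture_terms(e, pi, var)
        gamma = np.exp(lp - lp.max(axis=0))
        gamma /= gamma.sum(axis=0)
        # M step, MoG part: closed-form mixing weights and variances (with tiny floors)
        Nk = gamma.sum(axis=1) + 1e-12
        pi = Nk / Nk.sum()
        var = np.maximum((gamma * e ** 2).sum(axis=1) / Nk, 1e-10)
        # M step, U/V part: weighted L2 with w_ij^2 = sum_k gamma_ijk / (2 var_k)
        W = np.zeros((m, n))
        W[obs] = np.sqrt((gamma / (2.0 * var[:, None])).sum(axis=0))
        for i in range(m):                       # exact weighted least squares for each row of U
            U[i] = np.linalg.lstsq(V.T * W[i][:, None], W[i] * X0[i], rcond=None)[0]
        for j in range(n):                       # exact weighted least squares for each column of V
            V[:, j] = np.linalg.lstsq(U * W[:, j][:, None], W[:, j] * X0[:, j], rcond=None)[0]
        e = (X0 - U @ V)[obs]
        history.append(_loglik(e, pi, var))
    return U, V, pi, var, history


# Synthetic data: rank 2, small Gaussian noise, about 10% gross outliers, about 10% missing
rng = np.random.default_rng(1)
L = rng.standard_normal((30, 2)) @ rng.standard_normal((2, 20))
noise = 0.1 * rng.standard_normal(L.shape)
outliers = rng.random(L.shape) < 0.1
noise[outliers] += rng.uniform(-5.0, 5.0, size=outliers.sum())
mask = rng.random(L.shape) > 0.1
X = np.where(mask, L + noise, np.nan)

U, V, pi, var, history = mog_lrmf(X, mask, rank=2)
print("mixing weights:", pi, "variances:", var)
print("log-likelihood, first vs last:", history[0], history[-1])
```

Because each step exactly maximizes the expected complete-data log-likelihood over its own block of parameters, the recorded log-likelihood should not decrease between iterations, apart from the effect of the small numerical floors.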

Synthetic experiments
Three noise cases: Gaussian noise, sparse noise, mixture noise. Six error measurements: two measure what L2 and L1 methods optimize; four are good measures for estimating the ground-truth subspace. We now compare performance in synthetic experiments. We designed three series of experiments, on data with Gaussian noise, sparse noise, and mixture noise, respectively, and used six measurements for assessment. The first two are the objective functions of the L2 and L1 models. The last four, however, are the measures we really care about, since they assess the accuracy of the output against the ground truth.
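A small illustration of why the objective values alone can mislead: with one corrupted entry, even the exact clean factors have nonzero L2 and L1 objective values, while an error measured against the ground truth is zero. The function names are hypothetical, and `groundtruth_error` is one plausible ground-truth measure chosen for illustration; it is not necessarily one of the four used in these experiments.

```python
import numpy as np


def l2_error(X, W, U, V):
    # What L2 methods minimize: ||W ⊙ (X - UV)||_F
    return float(np.linalg.norm(W * (X - U @ V)))


def l1_error(X, W, U, V):
    # What L1 methods minimize: ||W ⊙ (X - UV)||_1, the sum of absolute values
    return float(np.abs(W * (X - U @ V)).sum())


def groundtruth_error(X_true, U, V):
    # Illustrative ground-truth measure (assumption): Frobenius distance to the clean matrix
    return float(np.linalg.norm(X_true - U @ V))


X_true = np.array([[1.0, 2.0], [2.0, 4.0]])              # clean rank-1 matrix [1, 2]^T [1, 2]
X = X_true.copy()
X[0, 1] += 3.0                                            # one corrupted entry
W = np.array([[1.0, 1.0], [0.0, 1.0]])                    # entry (1, 0) treated as missing
U, V = np.array([[1.0], [2.0]]), np.array([[1.0, 2.0]])   # the exact clean factors

print(l2_error(X, W, U, V), l1_error(X, W, U, V), groundtruth_error(X_true, U, V))
# 3.0 3.0 0.0
```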

MoG performs similarly to L2 methods and better than L1 methods
(Legend: our method, L2 methods, L1 methods.) Gaussian noise experiments: MoG performs similarly to L2 methods and better than L1 methods. Sparse noise experiments: MoG performs as well as the best L1 method and better than L2 methods. Mixture noise experiments: MoG performs better than all competing L2 and L1 methods, and it is the best method in every mixture noise case.

Why is MoG robust to outliers?
L1 methods perform well with outliers or heavy noise because the Laplace distribution underlying them is heavy-tailed. When the noise is fitted with two Gaussians, the resulting MoG distribution is also heavy-tailed.
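A small check of the heavy-tail argument with a hypothetical two-component mixture (weights 0.9 and 0.1, variances 1 and 25). Its kurtosis exceeds the Gaussian value of 3, and its tail probability far exceeds that of a single Gaussian with the same variance. Assuming the standard EM weight $w^2(e)=\sum_k\gamma_k(e)/(2\sigma_k^2)$ in the weighted L2 step, a large residual is also automatically down-weighted, from roughly 0.49 at $e=0$ to about 0.02 at $e=20$. The function names and parameter values are illustrative.

```python
import math


def mog_tail(t, pis, variances):
    # P(|e| > t) for a zero-mean Gaussian mixture
    return sum(p * math.erfc(t / math.sqrt(2.0 * v)) for p, v in zip(pis, variances))


def entry_weight(e, pis, variances):
    # w^2(e) = sum_k gamma_k(e) / (2 var_k), where gamma_k(e) is the posterior of component k
    dens = [p * math.exp(-e * e / (2.0 * v)) / math.sqrt(2.0 * math.pi * v)
            for p, v in zip(pis, variances)]
    total = sum(dens)
    return sum(d / total / (2.0 * v) for d, v in zip(dens, variances))


# Hypothetical noise model: a narrow "inlier" Gaussian and a wide "outlier" Gaussian
pis, variances = [0.9, 0.1], [1.0, 25.0]
total_var = sum(p * v for p, v in zip(pis, variances))  # about 3.4
kurtosis = 3.0 * sum(p * v * v for p, v in zip(pis, variances)) / total_var ** 2

gauss_tail = math.erfc(10.0 / math.sqrt(2.0 * total_var))  # single Gaussian, same variance
print("kurtosis:", kurtosis, "(a Gaussian has 3)")
print("P(|e| > 10):", mog_tail(10.0, pis, variances), "vs", gauss_tail)
print("w^2 at e = 0 and e = 20:", entry_weight(0.0, pis, variances), entry_weight(20.0, pis, variances))
```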

Face modeling experiments
We also ran face modeling experiments. Like the other methods, ours removes some occlusions and saturation from the face, but it performs better at extracting the diffuse component. In these images, small bright highlights on the face behave like outliers, while face details hidden in shadow behave like Gaussian noise. Because the noise is this complicated, our method performs better, as expected.

Explanation (figure): saturation and shadow noise; camera noise

Background Subtraction

Summary: We propose an LRMF model with mixture-of-Gaussians (MoG) noise. The new method handles outliers as well as L1-norm methods do, but more efficiently. The extracted noise components have clear physical meanings.

Thanks!