1 Nonsmooth Nonnegative Matrix Factorization (nsNMF)
Alberto Pascual-Montano, Member, IEEE, J.M. Carazo, Senior Member, IEEE, Kieko Kochi, Dietrich Lehmann, and Roberto D. Pascual-Marqui
2006, IEEE. Presenter: 張庭豪
2 Outline
INTRODUCTION
REVIEW OF NMF AND ITS SPARSE VARIANTS
NONSMOOTH NMF (nsNMF)
EXPERIMENTS
CONCLUSIONS AND DISCUSSION
3 INTRODUCTION
Nonnegative matrix factorization (NMF) has been introduced as a matrix factorization technique that produces a useful decomposition in the analysis of data. This method results in a reduced representation of the original data that can be seen either as a feature extraction or a dimensionality reduction technique. More importantly, NMF can be interpreted as a parts-based representation of the data due to the fact that only additive, not subtractive, combinations are allowed.
4 INTRODUCTION
Formally, the nonnegative matrix decomposition can be described as follows:
V ≈ WH
where V ∈ R^{p×n} is a positive data matrix with p variables and n objects, W ∈ R^{p×q} contains the q reduced basis vectors (factors) as columns, and H ∈ R^{q×n} contains the coefficients of the linear combinations of the basis vectors needed to reconstruct the original data (also known as encoding vectors).
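The shapes in the decomposition can be sketched in NumPy; the dimensions p, n, q below are arbitrary illustrative values:

```python
import numpy as np

# Illustrative dimensions: p variables, n objects, q factors (q < min(p, n))
p, n, q = 100, 50, 5
rng = np.random.default_rng(0)

W = rng.random((p, q))   # q basis vectors as columns
H = rng.random((q, n))   # encoding vectors as rows
V_approx = W @ H         # reconstruction of the p x n data matrix

# Only additive combinations: a product of nonnegative factors stays nonnegative
assert V_approx.shape == (p, n)
assert (V_approx >= 0).all()
```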
5 INTRODUCTION
In fact, taking a closer look at the basis and encoding vectors produced by NMF, it is noticeable that there is a high degree of overlapping among basis vectors, which contradicts the intuitive notion of "parts". In this sense, a matrix factorization technique capable of producing more localized, less overlapped feature representations of the data is highly desirable in many applications. The new method, here referred to as Nonsmooth Nonnegative Matrix Factorization (nsNMF), differs from the original in the use of an extra smoothness matrix for imposing sparseness. The goal of nsNMF is to find sparse structures in the basis functions that explain the data set.
6 REVIEW OF NONNEGATIVE MATRIX FACTORIZATION (NMF) AND ITS SPARSE VARIANTS
V ≈ WH, with V ∈ R^{p×n}, W ∈ R^{p×q}, H ∈ R^{q×n}. The columns of W (the basis vectors) are normalized (sum up to 1).
The objective function, based on the Poisson likelihood, is:
F = Σ_{i,j} [ V_ij log (WH)_ij − (WH)_ij − log(V_ij!) ]
which, after some simplifications and elimination of pure data terms, gives the function to be maximized:
F = Σ_{i,j} [ V_ij log (WH)_ij − (WH)_ij ]
7 REVIEW OF NONNEGATIVE MATRIX FACTORIZATION (NMF) AND ITS SPARSE VARIANTS
Taking the derivative with respect to H gives:
∂F/∂H_aj = Σ_i [ W_ia V_ij / (WH)_ij − W_ia ]
The gradient algorithm then states:
H_aj ← H_aj + η_aj ∂F/∂H_aj
for some step size η_aj.
8 REVIEW OF NONNEGATIVE MATRIX FACTORIZATION (NMF) AND ITS SPARSE VARIANTS
Forcing:
η_aj = H_aj / Σ_i W_ia
gives the multiplicative rule:
H_aj ← H_aj ( Σ_i W_ia V_ij / (WH)_ij ) / ( Σ_i W_ia )
9 REVIEW OF NONNEGATIVE MATRIX FACTORIZATION (NMF) AND ITS SPARSE VARIANTS
Taking the derivative with respect to W gives:
∂F/∂W_ia = Σ_j [ H_aj V_ij / (WH)_ij − H_aj ]
The gradient algorithm then states:
W_ia ← W_ia + η_ia ∂F/∂W_ia
Forcing the step size η_ia = W_ia / Σ_j H_aj gives:
W_ia ← W_ia ( Σ_j H_aj V_ij / (WH)_ij ) / ( Σ_j H_aj )
10 REVIEW OF NONNEGATIVE MATRIX FACTORIZATION (NMF) AND ITS SPARSE VARIANTS
Finally, the derived algorithm is as follows:
1. Initialize W and H with positive random numbers.
2. For each basis vector W_a ∈ R^{p×1}, update the corresponding encoding vector H_a ∈ R^{1×n}, followed by updating and normalizing the basis vector W_a. Repeat this process until convergence.
Repeat until convergence:
  For a = 1...q:
    For b = 1...n: update H_ab
    For c = 1...p: update W_ca
    Normalize W_a so that its elements sum to 1
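The steps above can be sketched in NumPy as a vectorized version of the per-element loops (the function name, iteration count, and epsilon guard are illustrative choices):

```python
import numpy as np

def nmf_divergence(V, q, n_iter=200, eps=1e-9, seed=0):
    """Multiplicative-update NMF for the Poisson/divergence objective.
    A sketch of the standard rules with W columns normalized to sum to 1."""
    rng = np.random.default_rng(seed)
    p, n = V.shape
    W = rng.random((p, q)) + eps
    H = rng.random((q, n)) + eps
    W /= W.sum(axis=0, keepdims=True)         # normalize basis columns
    for _ in range(n_iter):
        R = V / (W @ H + eps)                 # ratio V_ij / (WH)_ij
        H *= W.T @ R                          # H_aj <- H_aj * sum_i W_ia R_ij (col sums of W are 1)
        R = V / (W @ H + eps)
        W *= R @ H.T / (H.sum(axis=1) + eps)  # W_ia <- W_ia * sum_j R_ij H_aj / sum_j H_aj
        W /= W.sum(axis=0, keepdims=True)     # renormalize
    return W, H

V = np.random.default_rng(1).random((30, 20)) + 0.1   # synthetic positive data
W, H = nmf_divergence(V, q=3)
```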
11 REVIEW OF NONNEGATIVE MATRIX FACTORIZATION (NMF) AND ITS SPARSE VARIANTS
Local Nonnegative Matrix Factorization (LNMF)
The LNMF algorithm is intended for learning spatially localized, parts-based representations of visual patterns. The aim is to obtain a truly parts-based representation of objects by imposing sparseness constraints on the encoding vectors (matrix H) and locality constraints on the basis components (matrix W). Taking the factorization problem defined in (1), define A = [a_ij] = WᵀW and B = [b_ij] = HHᵀ, where A, B ∈ R^{q×q}.
12 REVIEW OF NONNEGATIVE MATRIX FACTORIZATION (NMF) AND ITS SPARSE VARIANTS
The LNMF algorithm is based on the following three additional constraints:
1. Maximum sparseness in H. It should contain as many zero components as possible, which implies that the number of basis components required to represent V is minimized. Mathematically, each a_ij should be minimum.
2. Maximum expressiveness of W. This constraint is closely related to the previous one and aims at further enforcing maximum sparseness in H. Mathematically, Σ_{i=1}^{q} b_ii should be maximum.
3. Maximum orthogonality of W. Different bases should be as orthogonal as possible to minimize redundancy. This is forced by minimizing Σ_{i≠j} a_ij. Combining this constraint with the one described in point 1, the objective is to minimize Σ_{i,j} a_ij.
13 REVIEW OF NONNEGATIVE MATRIX FACTORIZATION (NMF) AND ITS SPARSE VARIANTS
Thus, summing over all i, j, the constrained divergence function is:
D(V‖WH) + α Σ_{i,j} a_ij − β Σ_i b_ii
where α, β > 0 represent constants expressing the importance of the additional constraints described above.
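Under these definitions the penalized objective can be evaluated directly; a minimal sketch (function name, α, β values, and the epsilon guard are illustrative):

```python
import numpy as np

def lnmf_objective(V, W, H, alpha=1.0, beta=1.0, eps=1e-9):
    """LNMF-style constrained divergence:
    D(V||WH) + alpha * sum_ij a_ij - beta * sum_i b_ii,
    with A = W^T W and B = H H^T."""
    WH = W @ H + eps
    div = np.sum(V * np.log((V + eps) / WH) - V + WH)  # KL-type divergence
    A = W.T @ W                                        # a_ij terms (orthogonality/sparseness)
    B = H @ H.T                                        # b_ii terms (expressiveness)
    return div + alpha * A.sum() - beta * np.trace(B)

rng = np.random.default_rng(0)
V = rng.random((10, 8)) + 0.1
W, H = rng.random((10, 3)), rng.random((3, 8))
obj = lnmf_objective(V, W, H, alpha=0.1, beta=0.1)
```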
14 REVIEW OF NONNEGATIVE MATRIX FACTORIZATION (NMF) AND ITS SPARSE VARIANTS
Nonnegative Sparse Coding (NNSC)
Similar to the LNMF algorithm, the Nonnegative Sparse Coding (NNSC) method is intended to decompose multivariate data into a set of positive sparse components. Combining a small reconstruction error with a sparseness criterion, the objective function is:
(1/2) ‖V − WH‖² + λ Σ_{i,j} f(H_ij)
where the form of f defines how sparseness on H is measured and λ controls the trade-off between sparseness and the accuracy of the reconstruction.
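A sketch of evaluating this objective, assuming the common choice f(x) = |x| (an L1 penalty) and writing the trade-off weight as lam:

```python
import numpy as np

def nnsc_objective(V, W, H, lam=0.1, f=np.abs):
    """NNSC objective: 0.5 * ||V - WH||_F^2 + lam * sum_ij f(H_ij).
    f measures sparseness on H (abs = L1 penalty); lam sets the trade-off."""
    recon = 0.5 * np.linalg.norm(V - W @ H, "fro") ** 2
    return recon + lam * f(H).sum()

rng = np.random.default_rng(0)
V = rng.random((10, 8))
W, H = rng.random((10, 3)), rng.random((3, 8))
```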
15 REVIEW OF NONNEGATIVE MATRIX FACTORIZATION (NMF) AND ITS SPARSE VARIANTS
Sparse Nonnegative Matrix Factorization (SNMF)
Instead of using a Euclidean least-squares type functional, as in (22), SNMF uses a divergence term. Thus, the sparse NMF functional is:
D(V‖WH) + α Σ_{i,j} H_ij
for α > 0. This method forces sparseness by minimizing the sum of all H_ij. The update rule for matrix H is:
H_aj ← H_aj ( Σ_i W_ia V_ij / (WH)_ij ) / (1 + α)
while the update rules for W are expressed in (17) and (18).
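One SNMF-style update of H can be sketched as the standard multiplicative step scaled by 1/(1 + α), which shrinks every entry of H toward zero (function name and epsilon guard are illustrative):

```python
import numpy as np

def snmf_update_H(V, W, H, alpha=0.5, eps=1e-9):
    """One SNMF multiplicative update of H: the standard divergence step
    divided by (1 + alpha), shrinking all entries of H toward zero.
    Assumes the columns of W sum to 1."""
    R = V / (W @ H + eps)                # V_ij / (WH)_ij
    return H * (W.T @ R) / (1.0 + alpha)

rng = np.random.default_rng(0)
V = rng.random((10, 8)) + 0.1
W = rng.random((10, 3)); W /= W.sum(axis=0, keepdims=True)
H = rng.random((3, 8))
```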
16 OUR PROPOSAL: NONSMOOTH NONNEGATIVE MATRIX FACTORIZATION (nsNMF)
Because of the multiplicative nature of the model, i.e., "basis" multiplied by "encoding," sparseness in one of the factors will almost certainly force "nonsparseness" or smoothness in the other. On the other hand, forcing sparseness constraints on both the basis and the encoding vectors will deteriorate the goodness of fit of the model to the data. Therefore, from the outset, this approach is doomed to failure in achieving generalized sparseness and satisfactory goodness of fit.
17 OUR PROPOSAL: NONSMOOTH NONNEGATIVE MATRIX FACTORIZATION (nsNMF)
The new model proposed in this study, denoted as "NonSmooth Nonnegative Matrix Factorization" (nsNMF), is defined as:
V ≈ WSH
where V, W, and H are the same as in the original NMF model (V ≈ WH). The positive symmetric matrix S ∈ R^{q×q} is a "smoothing" matrix defined as:
S = (1 − θ) I + (θ/q) 1 1ᵀ
where I is the identity matrix, 1 is a vector of ones, and the parameter satisfies 0 ≤ θ ≤ 1.
18 OUR PROPOSAL: NONSMOOTH NONNEGATIVE MATRIX FACTORIZATION (nsNMF)
The interpretation of S as a smoothing matrix can be explained as follows: let X be a positive, nonzero vector, and consider the transformed vector Y = SX. If θ = 0, then Y = X and no smoothing of X has occurred. However, as θ → 1, the vector Y tends to the constant vector with all elements almost equal to the average of the elements of X. This is the smoothest possible vector in the sense of "nonsparseness" because all entries are equal to the same nonzero value, instead of having some values close to zero and others clearly nonzero. Note that the parameter θ controls the extent of smoothness of the matrix operator S.
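The smoothing effect of S can be checked numerically; a minimal sketch (the test vector and θ values are illustrative):

```python
import numpy as np

def smoothing_matrix(q, theta):
    """S = (1 - theta) * I + (theta / q) * 11^T, with 0 <= theta <= 1."""
    return (1 - theta) * np.eye(q) + (theta / q) * np.ones((q, q))

x = np.array([4.0, 0.0, 0.0, 0.0])    # sparse vector with mean 1
y0 = smoothing_matrix(4, 0.0) @ x     # theta = 0: x is returned unchanged
y9 = smoothing_matrix(4, 0.99) @ x    # theta near 1: entries pulled toward the mean
```

Note that S is doubly stochastic (its rows and columns sum to 1), so Y = SX preserves the mean of X while reducing its spread.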
19 OUR PROPOSAL: NONSMOOTH NONNEGATIVE MATRIX FACTORIZATION (nsNMF)
However, due to the multiplicative nature of the model (28), strong smoothing in S will force strong sparseness in both the basis and the encoding vectors. Therefore, the parameter θ controls the sparseness of the model.
V = (WS)H = W(SH)
Nonsparseness in the basis W will force sparseness in the encoding H. At the same time, nonsparseness in the encoding H will force sparseness in the basis W.
The algorithm requires only three changes:
1. In the update equation for H (16), substitute W with WS.
2. In the update equation for W (17), substitute H with SH.
3. Equation (18) remains the same.
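Putting the three substitutions together yields a compact nsNMF sketch, built on the divergence-based multiplicative updates reviewed earlier (the θ value, iteration count, and initialization scheme are illustrative):

```python
import numpy as np

def nsnmf(V, q, theta=0.5, n_iter=200, eps=1e-9, seed=0):
    """Sketch of nsNMF: divergence-based multiplicative NMF updates with
    W -> W @ S in the H-step and H -> S @ H in the W-step."""
    rng = np.random.default_rng(seed)
    p, n = V.shape
    S = (1 - theta) * np.eye(q) + (theta / q) * np.ones((q, q))
    W = rng.random((p, q)) + eps
    H = rng.random((q, n)) + eps
    W /= W.sum(axis=0, keepdims=True)
    for _ in range(n_iter):
        WS = W @ S                                 # smoothed basis for the H update
        H *= WS.T @ (V / (WS @ H + eps)) / (WS.sum(axis=0)[:, None] + eps)
        SH = S @ H                                 # smoothed encoding for the W update
        W *= (V / (W @ SH + eps)) @ SH.T / (SH.sum(axis=1) + eps)
        W /= W.sum(axis=0, keepdims=True)          # renormalize basis columns
    return W, H, S

V = np.random.default_rng(2).random((20, 15)) + 0.1
W, H, S = nsnmf(V, q=3, theta=0.5, n_iter=50)
```

With θ = 0, S is the identity and the procedure reduces to the plain multiplicative NMF updates.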
20 EXPERIMENTS
As mentioned in the previous section, the multiplicative nature of the sparse variants of the NMF model will produce a paradoxical effect: imposing sparseness in one of the factors will almost certainly force smoothness in the other in an attempt to reproduce the data as well as possible. Additionally, forcing sparseness constraints on both the basis and the encoding vectors will decrease the explained variance of the data by the model.
Different NMF-type methods were applied to the same randomly generated positive data set (5 variables, 20 items, rank = 3). Table 1 shows the results when using exactly three factors in all cases.
22 EXPERIMENTS
"Swimmer" data set (basis images): NMF failed to extract the 16 limbs and the torso, while nsNMF successfully explained the data using one factor for each independent part. NMF extracts parts of the data in a more holistic manner, while nsNMF sparsely represents the same reality.
23 EXPERIMENTS
CBCL faces data set: 49 basis images; the resulting factors are not sparse.
24 EXPERIMENTS: NMF Results
Fig. 3(a) shows the results of the Lee and Seung algorithm applied to the facial database using 49 factors. Even if the factors' images give an intuitive notion of a parts-based representation of the original faces, the factorization is not really sparse enough to represent unique parts of an average face. In other words, the NMF algorithm allows some undesirable overlapping of parts, especially in those areas that are common to most of the faces in the input data.
27 CONCLUSIONS AND DISCUSSION
The approach presented here is an attempt to improve the ability of the classical NMF algorithm in this process by producing truly sparse components of the data structure. The experimental results on both synthetic and real data sets have shown that the nsNMF algorithm outperformed the existing sparse NMF variants in producing parts-based representations of the data while maintaining the goodness of fit.