Nonnegative Matrix Factorization with Sparseness Constraints S. Race MA591R.


Introduction to NMF
Factor A = WH
- A – m x n data matrix: m non-negative scalar variables, n measurements form the columns of A
- W – m x r matrix of basis vectors
- H – r x n coefficient matrix: describes how strongly each building block is present in each measurement vector
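To make the shapes concrete, here is a minimal numpy sketch; the sizes m, n, r are made-up illustration values, not from the slides:

```python
import numpy as np

# Illustrative (made-up) sizes: m variables, n measurements, r basis vectors, r << min(m, n)
m, n, r = 100, 40, 5
rng = np.random.default_rng(0)
A = rng.random((m, n))   # data matrix: each column is one measurement
W = rng.random((m, r))   # basis vectors ("building blocks"), one per column
H = rng.random((r, n))   # coefficients: how strongly each block appears in each measurement
print((W @ H).shape)     # (100, 40): same shape as A, so the approximation A = WH is well-defined
```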

Introduction to NMF (cont.)
Purpose: parts-based representation of the data
- Data compression
- Noise reduction
Examples:
- Term-document matrices
- Image processing
- Any data composed of hidden parts

Introduction to NMF (cont.)
Optimize the accuracy of the solution:
- min ||A - WH||_F subject to W, H >= 0
- We can drop the nonnegativity constraints by reparameterizing with elementwise squares: min ||A - (W.*W)(H.*H)||_F (.* denotes the elementwise product)
- Many options for the objective function
- Many options for the algorithm
- W, H will depend on the initial choices
- Convergence is not always guaranteed
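As a concrete illustration (not code from the presentation), the Frobenius objective and the elementwise-square reparameterization fit in a few lines of numpy; the function names are hypothetical:

```python
import numpy as np

def nmf_objective(A, W, H):
    """Reconstruction error ||A - WH||_F for nonnegative W, H."""
    return np.linalg.norm(A - W @ H, ord="fro")

def reparam_objective(A, V, G):
    """Unconstrained form: W = V*V and H = G*G (elementwise squares) are nonnegative
    by construction, so V and G can be optimized without explicit constraints."""
    return np.linalg.norm(A - (V * V) @ (G * G), ord="fro")
```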

Common Algorithms
- Alternating least squares (Paatero 1994)
- Multiplicative update rules (Lee-Seung 2000, Nature); used by Hoyer
- Gradient descent (Hoyer 2004; Berry-Plemmons 2004)
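For reference, the Lee-Seung multiplicative updates for the Frobenius objective are short enough to sketch. This is a generic textbook version (random initialization, fixed iteration count), not the presenter's code:

```python
import numpy as np

def nmf_multiplicative(A, r, n_iter=200, eps=1e-9, seed=0):
    """Lee-Seung multiplicative updates for min ||A - WH||_F with W, H >= 0."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    W = rng.random((m, r))
    H = rng.random((r, n))
    for _ in range(n_iter):
        H *= (W.T @ A) / (W.T @ W @ H + eps)   # elementwise update keeps H nonnegative
        W *= (A @ H.T) / (W @ H @ H.T + eps)   # elementwise update keeps W nonnegative
    return W, H
```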

Why is sparsity important?
- The nature of some data: text mining, disease patterns
- Better interpretation of results
- Storage concerns

Non-negative Sparse Coding I
- Proposed by Patrik Hoyer in 2002
- Add a penalty function to the objective to encourage sparseness
- OBJ: min 1/2 ||A - WH||_F^2 + λ Σ_ij f(H_ij), subject to W, H >= 0
- The parameter λ controls the trade-off between accuracy and sparseness
- f is strictly increasing; f(H_ij) = H_ij (i.e., the penalty Σ_ij H_ij) works
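A minimal sketch of the penalized cost with f(h) = h, plus the corresponding multiplicative H-update in which the λ term simply enters the denominator; function names are illustrative, not from the slides:

```python
import numpy as np

def nsc_objective(A, W, H, lam):
    """Non-negative sparse coding cost: 0.5*||A - WH||_F^2 + lam * sum(H), i.e. f(h) = h."""
    return 0.5 * np.linalg.norm(A - W @ H, "fro") ** 2 + lam * H.sum()

def update_H(A, W, H, lam, eps=1e-9):
    """Multiplicative H-update for the L1-penalized objective; stays nonnegative."""
    return H * (W.T @ A) / (W.T @ W @ H + lam + eps)
```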

Sparse Objective Function
- The penalty term can always be decreased by scaling W up and H down: set W = cW and H = (1/c)H with c > 1; the product WH (and hence the reconstruction error) is unchanged while Σ H_ij shrinks
- Thus, left alone, the penalized objective simply yields the plain NMF solution
- A constraint on the scale of W or H is needed
- Fix the norm of the columns of W or the rows of H
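A quick numerical check of the scaling argument (made-up random matrices): rescaling leaves the reconstruction untouched while shrinking the penalty.

```python
import numpy as np

rng = np.random.default_rng(0)
W, H, c = rng.random((5, 3)), rng.random((3, 8)), 10.0

print(np.allclose(W @ H, (c * W) @ (H / c)))   # True: reconstruction unchanged
print(H.sum(), (H / c).sum())                  # penalty sum(H) shrinks by a factor of c
```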

Non-negative Sparse Coding I
Pros:
- Simple, efficient
- Guaranteed to reach a global minimum using the multiplicative update rule
Cons:
- Sparseness is controlled only implicitly: the optimal λ is found by trial and error
- Sparseness is only constrained for H

NMF with sparseness constraints II
- First we need some way to define the sparseness of a vector
- A vector with one nonzero entry is maximally sparse
- A multiple of the vector of all ones, e, is minimally sparse
- The Cauchy-Bunyakovsky-Schwarz (CBS) inequality bounds the ratio of the norms: ||x||_2 <= ||x||_1 <= sqrt(n) ||x||_2
- How can we combine these ideas?

Hoyer's Sparseness Parameter
sparseness(x) = (sqrt(n) - ||x||_1 / ||x||_2) / (sqrt(n) - 1), where n is the dimensionality of x
- Equals 1 when x has a single nonzero entry, 0 when all entries are equal in magnitude
- This measure indicates that we can control a vector's sparseness by manipulating its L1 and L2 norms
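In code, the measure is a direct transcription of the definition above:

```python
import numpy as np

def sparseness(x):
    """Hoyer's sparseness: 1 for a single nonzero entry, 0 when all entries are equal."""
    x = np.asarray(x, dtype=float)
    n = x.size
    l1 = np.abs(x).sum()
    l2 = np.sqrt((x ** 2).sum())
    return (np.sqrt(n) - l1 / l2) / (np.sqrt(n) - 1)

print(sparseness([0, 0, 0, 5]))   # 1.0  (maximally sparse)
print(sparseness([1, 1, 1, 1]))   # 0.0  (minimally sparse)
```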

[Figure: plot of the sparseness function for vectors with n = 2]

Implementing Sparseness Constraints
- Now that we have an explicit measure of sparseness, how can we incorporate it into the algorithm?
- Hoyer: at each step, project each column of the matrix onto the nearest vector of the desired sparseness

Hoyer's Projection Algorithm
- Problem: given any vector x, find the closest (in the Euclidean sense) non-negative vector s with a given L1 norm and a given L2 norm
- We can easily solve this problem in the 3-dimensional case and extend the result

Hoyer's Projection Algorithm
- Set s_i = x_i + (L1 - Σ x_i)/n for all i
- Set Z = {}
- Iterate:
  - Set m_i = L1/(n - size(Z)) if i not in Z, 0 otherwise
  - Set s = m + β(s - m), where β >= 0 solves the quadratic ||m + β(s - m)||_2 = L2
  - If s is non-negative, we are finished
  - Set Z = Z ∪ {i : s_i < 0}
  - Set s_i = 0 for all i in Z
  - Calculate c = (Σ s_i - L1)/(n - size(Z))
  - Set s_i = s_i - c for all i not in Z
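The steps above translate fairly directly into numpy. The sketch below follows the iteration as described; it assumes the requested L1/L2 pair is attainable and omits the degenerate cases handled in Hoyer's paper:

```python
import numpy as np

def project_to_sparseness(x, L1, L2, max_iter=100):
    """Find the closest nonnegative vector to x with ||s||_1 = L1 and ||s||_2 = L2
    (a sketch of Hoyer's 2004 projection)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    s = x + (L1 - x.sum()) / n           # project onto the hyperplane sum(s) = L1
    Z = np.zeros(n, dtype=bool)          # indices fixed at zero
    for _ in range(max_iter):
        m = np.where(Z, 0.0, L1 / (n - Z.sum()))
        d = s - m
        # beta >= 0 solving ||m + beta*d||_2 = L2 (a quadratic in beta)
        a, b, c = d @ d, 2 * (m @ d), m @ m - L2 ** 2
        beta = (-b + np.sqrt(max(b * b - 4 * a * c, 0.0))) / (2 * a)
        s = m + beta * d
        if (s >= 0).all():
            return s                     # nonnegative: destination reached
        Z |= s < 0                       # fix negative entries at zero
        s[Z] = 0.0
        corr = (s.sum() - L1) / (n - Z.sum())
        s[~Z] -= corr                    # restore the L1 sum on the free entries
    return s
```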

The Algorithm in words
- Project x onto the hyperplane Σ s_i = L1
- Within this space, move radially outward from the center of the joint-constraint hypersphere toward the point
- If the result is non-negative, the destination is reached
- Else, set the negative values to zero and project to a new point in a similar fashion

NMF with sparseness constraints
- Step 1: Initialize W, H to random positive matrices
- Step 2: If constraints apply to W, H, or both, project each column (of W) or row (of H), respectively, to have unchanged L2 norm and the desired L1 norm

NMF w/ Sparseness Algorithm
Step 3: Iterate
- If sparseness constraints on W apply:
  - Set W = W - μ_W (WH - A) H^T
  - Project the columns of W as in Step 2
- Else, take a standard multiplicative step for W
- If sparseness constraints on H apply:
  - Set H = H - μ_H W^T (WH - A)
  - Project the rows of H as in Step 2
- Else, take a standard multiplicative step for H
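Putting Steps 1-3 together, here is a simplified end-to-end sketch. It reuses project_to_sparseness() from the earlier snippet, uses a fixed step size μ (Hoyer's implementation adapts it), and the function name sparse_nmf is illustrative, not from the slides:

```python
import numpy as np

def sparse_nmf(A, r, sW=None, sH=None, mu=1e-3, n_iter=500, eps=1e-9, seed=0):
    """Simplified sparseness-constrained NMF: gradient step + projection where a
    sparseness target is given, standard multiplicative step otherwise."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    W, H = rng.random((m, r)), rng.random((r, n))

    def target_l1(v, s):
        # L1 norm implied by the current L2 norm and the requested sparseness s
        k = v.size
        return np.linalg.norm(v) * (np.sqrt(k) - s * (np.sqrt(k) - 1))

    def project_columns(M, s):
        return np.column_stack([project_to_sparseness(v, target_l1(v, s), np.linalg.norm(v))
                                for v in M.T])

    if sW is not None:                    # Step 2: project columns of W
        W = project_columns(W, sW)
    if sH is not None:                    # Step 2: project rows of H (columns of H.T)
        H = project_columns(H.T, sH).T

    for _ in range(n_iter):               # Step 3
        if sW is not None:
            W = project_columns(W - mu * (W @ H - A) @ H.T, sW)
        else:
            W *= (A @ H.T) / (W @ H @ H.T + eps)
        if sH is not None:
            H = project_columns((H - mu * W.T @ (W @ H - A)).T, sH).T
        else:
            H *= (W.T @ A) / (W.T @ W @ H + eps)
    return W, H
```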

Advantages of New Method
- Sparseness is controlled explicitly with a parameter that is easily interpreted
- The sparseness of W, H, or both can be constrained
- The number of iterations required grows very slowly with the dimensionality of the problem

[Figure: dotted lines represent the min and max iterations; the solid line shows the average number required]

An Example from Hoyer's Work

Text Mining Results
- Text to Matrix Generator (TMG), Dimitrios Zeimpekis and E. Gallopoulos, University of Patras (group/Projects/TMG/)
- NMF with sparseness constraints code from Hoyer's web page