Multiplicative updates for L1-regularized regression


1 Multiplicative updates for L1-regularized regression
Prof. Lawrence Saul
Dept of Computer Science & Engineering, UC San Diego
(Joint work with Fei Sha & Albert Park)

2 Trends in data analysis
Larger data sets:
  In the 1990s: thousands of examples
  More recently: millions or billions
Increased dimensionality:
  High resolution, multispectral images
  Large vocabulary text processing
  Gene expression data

3 How do we scale?
Faster computers:
  Moore’s law is not enough.
  Data acquisition is too fast.
Massive parallelism:
  Effective, but expensive.
  Not always easy to program.
Brain over brawn:
  New, better algorithms.
  Intelligent data analysis.

4 Searching for sparse models
Less is more:
  The number of nonzero parameters should not scale with size or dimensionality.
Models with sparse solutions:
  Support vector machines
  Nonnegative matrix factorization
  L1-norm regularized regression

5 An unexpected connection
Different problems:
  Large margin classification
  High dimensional data analysis
  Linear and logistic regression
Similar learning algorithms:
  Multiplicative vs. additive updates
  Guarantees of monotonic convergence

6 This talk
I. Multiplicative updates
  Unusual form
  Attractive properties
II. Sparse regression
  L1 norm regularization
  Relation to quadratic programming
III. Experimental results
  Sparse solutions
  Convex duality
  Large-scale problems

7 Part I. Multiplicative updates
Be fruitful and multiply.

8 Nonnegative quadratic programming (NQP)
Optimization: minimize a quadratic form over the nonnegative orthant.
Solutions:
  Cannot be found analytically.
  Tend to be sparse.
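For reference, the NQP problem takes the following standard form (the notation here follows the NQP literature and is not copied from the slide; A is symmetric, and the problem is convex when A is positive semidefinite):

```latex
\min_{v}\; F(v) \;=\; \tfrac{1}{2}\, v^{\top} A\, v \;+\; b^{\top} v
\qquad \text{subject to} \qquad v_i \ge 0 \;\; \text{for all } i .
```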

9 Matrix decomposition
Quadratic form
Nonnegative components
[Equation figure: the matrix of the quadratic form written as a difference of two nonnegative matrices.]
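A compact statement of the decomposition the slide pictures, as it is usually written in the multiplicative-update literature (this exact notation is an assumption):

```latex
A \;=\; A^{+} - A^{-},
\qquad
A^{+}_{ij} = \max(A_{ij},\, 0),
\qquad
A^{-}_{ij} = \max(-A_{ij},\, 0),
```

so that both A+ and A- contain only nonnegative elements.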

10 Multiplicative update
Matrix-vector products:
  By construction, these vectors are nonnegative.
Iterative update:
  Multiplicative and elementwise
  No learning rate
  Enforces nonnegativity
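A minimal NumPy sketch of this style of update, assuming the standard closed-form factor from the NQP literature, (-b_i + sqrt(b_i^2 + 4 (A+v)_i (A-v)_i)) / (2 (A+v)_i); the function name, iteration count, and epsilon safeguard are illustrative, not from the talk:

```python
import numpy as np

def nqp_multiplicative_update(A, b, v, n_iter=1000, eps=1e-12):
    """Sketch: minimize 0.5 * v^T A v + b^T v subject to v >= 0
    using elementwise multiplicative updates (no learning rate)."""
    Ap = np.where(A > 0, A, 0.0)    # nonnegative part of A
    Am = np.where(A < 0, -A, 0.0)   # magnitudes of the negative entries
    for _ in range(n_iter):
        a = Ap @ v + eps            # (A+ v)_i, kept strictly positive
        c = Am @ v                  # (A- v)_i
        # multiplicative factor applied elementwise; keeps v nonnegative
        v = v * (-b + np.sqrt(b * b + 4.0 * a * c)) / (2.0 * a)
    return v
```

Note that when the factor equals one for an element with v_i > 0, the condition (Av + b)_i = 0 follows, which matches the fixed-point discussion on the next slide.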

11 Fixed points
v_i = 0: when the multiplicative factor is less than unity, the element decays quickly to zero.
v_i > 0: when the multiplicative factor equals unity, the partial derivative vanishes: (Av + b)_i = 0.
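Taken together, the two cases match the Karush-Kuhn-Tucker conditions at the NQP minimum (a brief restatement, not a formula from the slide):

```latex
v_i \ge 0, \qquad (A v + b)_i \ge 0, \qquad v_i \,(A v + b)_i = 0 \quad \text{for all } i .
```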

12 Attractive properties for NQP
Theoretical guarantees:
  Objective decreases at each iteration.
  Updates converge to the global minimum.
Practical advantages:
  No learning rate.
  No constraint checking.
  Easy to implement (and vectorize).

13 Part II. Sparse regression
Feature selection via L1 norm regularization…

14 Linear regression
Training examples:
  Vector inputs
  Scalar outputs
Model fitting:
  Tractable: least squares
  Ill-posed: if dimensionality exceeds n
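In symbols (notation is mine, not the slide's): given examples (x_t, y_t) with vector inputs x_t and scalar outputs y_t, the unregularized fit minimizes the squared error

```latex
\min_{w} \;\; \tfrac{1}{2} \sum_{t=1}^{n} \bigl( y_t - w \cdot x_t \bigr)^{2},
```

which is tractable by least squares but ill-posed when the dimensionality of x_t exceeds the number of examples n.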

15 Regularization
L2 norm vs. L1 norm: what is the difference?
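The two regularized objectives, written with a penalty weight γ (the symbol is assumed, not taken from the slide):

```latex
\text{L2:}\;\; \min_{w}\; \tfrac{1}{2}\sum_{t} (y_t - w\cdot x_t)^2 + \gamma\, \|w\|_2^2
\qquad\qquad
\text{L1:}\;\; \min_{w}\; \tfrac{1}{2}\sum_{t} (y_t - w\cdot x_t)^2 + \gamma\, \|w\|_1 .
```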

16 L2 versus L1
L2 norm:
  Differentiable
  Analytically tractable
  Favors small (but nonzero) weights.
L1 norm:
  Non-differentiable, but convex
  Requires iterative solution.
  Estimated weights are sparse!

17 Reformulation as NQP
L1-regularized regression
Change of variables:
  Separate out the +/- elements of w.
  Introduce nonnegativity constraints.
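A sketch of the change of variables (this is the standard device for the reduction; the exact notation on the slide is not reproduced):

```latex
w = u - v, \qquad u_i,\, v_i \ge 0, \qquad \|w\|_1 = \sum_i (u_i + v_i) \;\;\text{at the optimum,}
```

since at most one of u_i, v_i is nonzero in any optimal solution.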

18 L1 norm as NQP
Under the change of variables, these problems are equivalent!
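A hypothetical end-to-end sketch of the reduction, reusing nqp_multiplicative_update from the earlier sketch. Stacking z = [u; v] turns L1-regularized least squares into an NQP whose matrix is built from blocks of X^T X; the helper name and constants are illustrative:

```python
import numpy as np

def l1_regression_via_nqp(X, y, gamma, n_iter=2000):
    """Sketch: min_w 0.5 * ||X w - y||^2 + gamma * ||w||_1
    via w = u - v with u, v >= 0, solved as an NQP over z = [u; v]."""
    d = X.shape[1]
    G = X.T @ X
    A = np.block([[G, -G], [-G, G]])                 # quadratic term for z = [u; v]
    b = np.concatenate([gamma * np.ones(d) - X.T @ y,
                        gamma * np.ones(d) + X.T @ y])
    z = np.ones(2 * d)                               # strictly positive starting point
    z = nqp_multiplicative_update(A, b, z, n_iter=n_iter)
    u, v = z[:d], z[d:]
    return u - v                                     # recover the signed weight vector
```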

19 Why reformulate?
Differentiability:
  Simpler to optimize a smooth function, even with constraints.
Multiplicative updates:
  Well-suited to NQP.
  Monotonic convergence.
  No learning rate.
  Enforce nonnegativity.

20 Logistic regression
Training examples:
  Vector inputs
  Binary (0/1) outputs
L1-regularized model fitting:
  Solve the optimization via multiple L1-regularized linear regressions.
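One way to realize "multiple L1-regularized linear regressions" is an IRLS-style outer loop; the talk does not spell out its exact reduction, so the sketch below is an assumption, reusing l1_regression_via_nqp from the earlier sketch:

```python
import numpy as np

def l1_logistic_regression(X, y, gamma, n_outer=20):
    """Sketch: fit L1-regularized logistic regression by repeatedly solving
    reweighted L1-regularized least-squares subproblems (IRLS-style)."""
    d = X.shape[1]
    w = np.zeros(d)
    for _ in range(n_outer):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))         # predicted probabilities
        s = np.clip(p * (1.0 - p), 1e-6, None)     # IRLS weights
        z = X @ w + (y - p) / s                    # working responses
        Xs = X * np.sqrt(s)[:, None]               # rescale so each subproblem is an
        zs = z * np.sqrt(s)                        # ordinary L1-regularized least squares
        w = l1_regression_via_nqp(Xs, zs, gamma)
    return w
```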

21 Part III. Experimental results

22 Convergence to sparse solution
Evolution of weight vector under multiplicative updates for L1-regularized linear regression.

23 Primal-dual convergence
The convex dual of NQP is NQP!
Multiplicative updates can also solve the dual.
The duality gap bounds intermediate errors.
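A quick way to see the claim, assuming A is positive definite (this derivation is mine, not from the slide): minimizing the Lagrangian L(v, λ) = ½ vᵀAv + bᵀv - λᵀv over v gives v = A⁻¹(λ - b), and the dual problem becomes

```latex
\max_{\lambda \ge 0} \;\; -\tfrac{1}{2}\, (\lambda - b)^{\top} A^{-1} (\lambda - b),
```

which is again a nonnegative quadratic program, now in the dual variables λ. Evaluating primal and dual objectives at the current iterates gives the duality gap that bounds the intermediate error.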

24 Large-scale implementation
L1-regularized logistic regression on n=19K documents and d=1.2M features (70/20/10 split for train/test/dev)

25 Discussion
Related work based on:
  Auxiliary functions
  Iterative least squares
  Nonnegativity constraints
Strengths of our approach:
  Simplicity
  Scalability
  Modularity
  Insights from related models

