
1 Reduced rank regression via convex optimization
Ming Gu, UC Berkeley

2 Content
Classical Reduced Rank Regression (RRR)
Convex programming based RRR models
Equivalent forms
New: dual formulation
New: efficient algorithm based on duality
Variations of RRR
Numerical experiments
Future work

3 Reduced Rank Regression (I)
Multivariate least squares regression: minX ||AX - B||F, where A is n x m, B is n x l, and X is m x l.
Solution: [Q, R] = qr(A); X = R^-1 (Q' B).
But X may have too many parameters. (More parameters = less useful model.)
Fix: require X to be low-rank.
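A minimal sketch of this least squares step in Python/NumPy (rather than the MATLAB notation above), assuming A has full column rank; the helper name is ours:

```python
import numpy as np

def multivariate_ls(A, B):
    """Sketch: solve min_X ||A X - B||_F via QR, assuming A has full column rank."""
    Q, R = np.linalg.qr(A)               # thin QR: A = Q R
    return np.linalg.solve(R, Q.T @ B)   # X = R^{-1} (Q^T B)

# Example usage with random data: A is n x m, B is n x l, X comes out m x l
A, B = np.random.randn(100, 20), np.random.randn(100, 5)
X = multivariate_ls(A, B)
```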

4 Reduced Rank Regression (II)
Model: min rank(X)<=r ||AX - B||F
Solution: [Q,R] = qr(A); [U,S,V] = svd(Q'B); X = R^-1 (U(:,1:r) S(1:r,1:r) V(:,1:r)')
Applications: time series analysis, finance, signal processing, ...
Problem: the solution is discontinuous in r.
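A sketch of this closed-form solution in Python/NumPy, assuming A has full column rank and r <= min(m, l); the function name is ours:

```python
import numpy as np

def reduced_rank_regression(A, B, r):
    """Sketch: min_X ||A X - B||_F s.t. rank(X) <= r, via QR of A and a truncated SVD."""
    Q, R = np.linalg.qr(A)
    U, s, Vt = np.linalg.svd(Q.T @ B, full_matrices=False)
    Cr = U[:, :r] @ np.diag(s[:r]) @ Vt[:r, :]   # best rank-r approximation of Q^T B
    return np.linalg.solve(R, Cr)
```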

5 Convex Optimization: Background (I)
Compressive sensing: minx ||Ax - b||2 s.t. ||x||1 <= τ
Motivation: the ℓ1-norm constraint promotes "sparse" optimal solutions and often allows exact recovery. A buzzword in scientific computing.
Applications: coding/information theory; statistical signal processing; machine learning; medical imaging.
Algorithms: steepest-descent-based methods; proximal gradient methods.

6 Convex Optimization: Background (II)
Low-rank matrix recovery: minX ||A(X) - b||2^2 + μ||X||*, where A(X) is a linear map of X and ||X||* = ||svd(X)||1 (nuclear norm).
Optimal solutions are often low-rank and exact. Ref: Candes, Ma, Wright, ...
Robust principal component analysis: minL,S ||L||* + ||S||1 s.t. M = L + S.
Optimal L is low-rank; S captures the outliers. Ref: Candes, Li, Ma, Wright, ...
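For concreteness, a tiny sketch of the nuclear norm ||X||* = ||svd(X)||1 in Python/NumPy (the function name is ours):

```python
import numpy as np

def nuclear_norm(X):
    """||X||_* = sum of the singular values of X, i.e. the l1 norm of svd(X)."""
    return np.linalg.svd(X, compute_uv=False).sum()

# For a rank-2 matrix only two singular values are nonzero, so they alone contribute
X = np.outer(np.random.randn(8), np.random.randn(6)) + np.outer(np.random.randn(8), np.random.randn(6))
print(nuclear_norm(X))
```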

7 Reduced Rank Regression: Convex Program Formulation
minX ||AX - B||F s.t. ||X||* <= τ    (LSτ, LASSO)
||X||* = ||svd(X)||1 (nuclear norm).
The nuclear norm induces low-rank optimal solutions.
The solution depends continuously on τ.
Ref: Yuan/Ekici/Monteiro/Lu; Lu/Monteiro/Yuan.

8 More Reduced Rank Regression Models
minX 1/2 ||AX - B||F^2 + λ||X||*    (QPλ)
minX ||X||* s.t. ||AX - B||F <= σ    (BPσ, Basis Pursuit Denoise)
Both are equivalent to minX ||AX - B||F s.t. ||X||* <= τ (LSτ, LASSO) for corresponding, generally unknown, values of λ and σ.
QPλ and LSτ have been solved in the literature.
Our work: an efficient method for BPσ and LSτ, assuming approximate knowledge of the noise level.

9 Existing Algorithm: VNS (2009)
Solves QPλ.
Variant of Nesterov's Smooth method (VNS).
"Optimal" iteration bound.
Works only on small problems; previous algorithms are even less efficient.
Ref: Lu, Monteiro, Yuan.

10 Our Work: Start with the Dual
Model: minX ||AX - B||F s.t. ||X||* <= τ    (LSτ)
Let R = AX - B. Define the Lagrange dual function
    L(Y, λ) = infR,X { ||R||F - tr(Y^T(AX - B - R)) + λ(||X||* - τ) }.
L(Y, λ) = -∞ unless ||Y||F <= 1 and ||A^T Y||2 <= λ; in that case L(Y, λ) = tr(Y^T B) - λτ.
Dual: maxY,λ tr(Y^T B) - λτ, s.t. ||Y||F <= 1, ||A^T Y||2 <= λ.
Y = (AX - B)/||AX - B||F and λ = ||A^T Y||2 are dual feasible.
Duality gap: ||AX - B||F - (tr(Y^T B) - λτ) >= 0, with equality at the optimum.
What does this mean?
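A small sketch of this dual-feasible pair and the resulting duality gap (Python/NumPy; assumes ||X||* <= τ and AX ≠ B so the residual can be normalized):

```python
import numpy as np

def duality_gap(A, B, X, tau):
    """Duality gap for LS_tau at a primal-feasible X, per slide 10.

    Y = (AX - B)/||AX - B||_F and lambda = ||A^T Y||_2 are dual feasible;
    the gap ||AX - B||_F - (tr(Y^T B) - lambda*tau) is >= 0 and zero at the optimum.
    """
    R = A @ X - B
    fro = np.linalg.norm(R, 'fro')
    Y = R / fro
    lam = np.linalg.norm(A.T @ Y, 2)          # spectral norm of A^T Y
    return fro - (np.trace(Y.T @ B) - lam * tau)
```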

11 Algorithm Design: LSτ
Strategy: search direction based on steepest descent; spectral projection for feasibility.
Based on the work of van den Berg and Friedlander.
Previous work: spgl1, Spectral Projected Gradient in the ℓ1 norm.
Only major changes: ℓ1 norm to nuclear norm; ℓ2 norm to Frobenius norm.

12 Spectral Projection
Given a matrix C, project onto the feasible set:
Pτ(C) = argminX ||C - X||F s.t. ||X||* <= τ.
Pτ(C) is available using svd(C), and is in general low-rank.
Previous work includes Birgin et al. and Berg et al. for the ℓ1-norm ball.
[Figure: C and its projection Pτ(C) onto the nuclear-norm ball.]
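One way to compute Pτ(C) is to take the SVD of C, project the singular values onto the ℓ1 ball of radius τ, and reassemble; the sketch below (Python/NumPy) uses the standard sorting-based ℓ1-ball projection and is an illustration, not necessarily the implementation used in the talk.

```python
import numpy as np

def project_simplex_like(s, tau):
    """Project a nonnegative vector s onto {w >= 0 : sum(w) <= tau} (sorting-based)."""
    if s.sum() <= tau:
        return s.copy()
    u = np.sort(s)[::-1]
    css = np.cumsum(u)
    k = np.nonzero(u * np.arange(1, len(s) + 1) > css - tau)[0][-1]
    theta = (css[k] - tau) / (k + 1.0)
    return np.maximum(s - theta, 0.0)

def project_nuclear_ball(C, tau):
    """P_tau(C): nearest matrix to C in Frobenius norm with nuclear norm at most tau."""
    U, s, Vt = np.linalg.svd(C, full_matrices=False)
    return (U * project_simplex_like(s, tau)) @ Vt   # shrink the singular values, keep the factors
```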

13 Spectral Projected Gradient
Problem solved: minX ||AX - B||F s.t. ||X||* <= τ    (LSτ, LASSO)

repeat until duality gap is tiny
    compute gradient G = A^T (AX - B)
    choose step size α
    repeat
        Xnew = Pτ(X - αG)
        if ||A Xnew - B||F < ||AX - B||F
            X = Xnew; break
        end if
        α = α/2
    end repeat
end repeat
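A runnable sketch of this loop (Python/NumPy), reusing the project_nuclear_ball() and duality_gap() helpers sketched above; the initial step size, tolerance, and iteration cap are arbitrary illustrative choices.

```python
import numpy as np

def spg_ls_tau(A, B, tau, max_iter=500, tol=1e-6):
    """Sketch: spectral projected gradient for min_X ||AX - B||_F s.t. ||X||_* <= tau."""
    X = np.zeros((A.shape[1], B.shape[1]))
    for _ in range(max_iter):
        if duality_gap(A, B, X, tau) < tol:
            break
        R = A @ X - B
        G = A.T @ R                              # gradient of (1/2)||AX - B||_F^2
        alpha = 1.0
        while alpha > 1e-12:                     # backtracking: halve alpha until the residual drops
            X_new = project_nuclear_ball(X - alpha * G, tau)
            if np.linalg.norm(A @ X_new - B, 'fro') < np.linalg.norm(R, 'fro'):
                X = X_new
                break
            alpha /= 2.0
        else:
            break                                # no descent step found; stop
    return X
```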

14 Algorithm Design: BPσ
Solve for the right τ in LSτ.
Use an inexact Newton method.
Algorithm based on van den Berg and Friedlander.
[Figure: the residual norm σ plotted as a function of τ.]

15 Spectral Projected Gradient (SPG*)
Problem solved: minX ||X||* s.t. ||AX - B||F <= σ    (BPσ, Basis Pursuit Denoise)
Auxiliary problem: minX ||AX - B||F s.t. ||X||* <= τ    (LSτ, LASSO)

repeat until convergence
    compute an approximate optimal solution X of LSτ
    R = B - AX;  φ = ||R||F;  φ' = -||A^T R||2/φ
    τ = τ - (φ - σ)/φ'
end repeat    (inexact Newton method)
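A sketch of this outer Newton iteration on τ (Python/NumPy), calling the spg_ls_tau() routine above as the inexact LSτ solver; the starting τ, tolerance, and iteration cap are illustrative choices.

```python
import numpy as np

def spg_star(A, B, sigma, tau0=0.0, max_newton=20, tol=1e-6):
    """Sketch of SPG*: Newton iteration on tau so that phi(tau) = ||AX - B||_F hits sigma."""
    tau = tau0
    X = np.zeros((A.shape[1], B.shape[1]))
    for _ in range(max_newton):
        X = spg_ls_tau(A, B, tau)                  # approximately solve LS_tau
        R = B - A @ X
        phi = np.linalg.norm(R, 'fro')
        if abs(phi - sigma) < tol or phi == 0.0:
            break
        dphi = -np.linalg.norm(A.T @ R, 2) / phi   # slope of the value function phi(tau)
        tau = tau - (phi - sigma) / dphi           # Newton step on phi(tau) = sigma
    return X, tau
```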

16 Numerical Comparison
[Figure: SPG* on random data (solving BPσ) vs. VNS on random data (solving QPλ).]

17 More RRR Models
min over x1, x2, ..., xl, X of ||[A1 x1, A2 x2, ..., Al xl] + AX - B||F, s.t. ||X||* <= τ
Regression on individual columns and on the whole matrix.
Dual program: maxY,λ tr(Y^T B) - λτ, s.t. ||Y||F <= 1, ||A^T Y||2 <= λ, tr(Y^T Aj) = 0 for j = 1, ..., l.
SPG* works with minor changes.

18 Motivation: Mixed AR and VAR Model (I)
Given a time series y1, y2, ..., yn, ...
An auto-regressive (AR) model has the form yt = yt-1 α1 + yt-2 α2 + ... + yt-r αr + εt, where εt is white noise at time step t.
The AR coefficients solve the least squares problem minx ||Ax - b||2, where row t of A holds the lagged values (yt-1, ..., yt-r), x = (α1, ..., αr)^T, and b holds the corresponding yt.
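A small sketch of this setup (Python/NumPy); the helper name and the use of lstsq are our illustrative choices.

```python
import numpy as np

def fit_ar(y, r):
    """Sketch: fit AR(r) coefficients for y_t = y_{t-1} a_1 + ... + y_{t-r} a_r + eps_t by least squares."""
    n = len(y)
    # Row i of A holds (y_{t-1}, ..., y_{t-r}) and b[i] = y_t, for t = r, ..., n-1 (0-based)
    A = np.column_stack([y[r - k - 1:n - k - 1] for k in range(r)])
    b = y[r:]
    alpha, *_ = np.linalg.lstsq(A, b, rcond=None)
    return alpha
```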

19 Motivation: Mixed AR and VAR Model (II)
Given l time series y1j, y2j, ..., ynj, ..., for j = 1, ..., l.
A vector auto-regressive (VAR) model has the form Yt = Yt-1 α1 + Yt-2 α2 + ... + Yt-r αr + Et, where Et is white noise at time t and Yt = (yt1, ..., ytl).
Each αi is now an l x l matrix instead of a scalar.
The VAR coefficients solve the least squares problem minX ||AX - B||F, where row t of A concatenates the lagged rows (Yt-1, ..., Yt-r), X stacks (α1; ...; αr), and row t of B is Yt.
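Analogously, a sketch of assembling the VAR(r) least squares data (Python/NumPy); Y holds one series per column, and the row-stacking convention here is our assumption for illustration.

```python
import numpy as np

def var_design(Y, r):
    """Sketch: build A and B so that min_X ||AX - B||_F fits a VAR(r) model to the rows of Y."""
    n, l = Y.shape
    A = np.hstack([Y[r - k - 1:n - k - 1, :] for k in range(r)])  # (n - r) x (r*l), lagged blocks
    B = Y[r:, :]                                                   # (n - r) x l, targets Y_t
    return A, B

# Example: ordinary least squares VAR fit (X is (r*l) x l)
# A, B = var_design(Y, 10); X, *_ = np.linalg.lstsq(A, B, rcond=None)
```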

20 Motivation: Mixed AR and VAR Model (III)
Given l time series y1j, y2j, ..., ynj, ..., for j = 1, ..., l.
Use one AR model for each j to capture idiosyncratic information.
Use a single VAR model for information common to all series.
Mixed AR-VAR model: min over x1, x2, ..., xl, X of ||[A1 x1, A2 x2, ..., Al xl] + AX - B||F, s.t. ||X||* <= τ
Each column Aj xj is an independent AR model; the norm-restricted X is the VAR model.
X needs to be low-rank to be useful.

21 Numerical Experiment: DOW stocks (I)
30 DOW component stocks.
Daily log returns starting from 01/03/2008: log_return = log((close today)/(close yesterday)).
Returns clipped at 5%.
Individual ARs + joint VAR.
Model parameters: r = 10, s = 500; results are similar for other choices of r and s.
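A tiny sketch of the return preprocessing described above (Python/NumPy); the 5% clipping level comes from the slide, while the function and array names are placeholders.

```python
import numpy as np

def clipped_log_returns(close, clip=0.05):
    """Daily log returns log(close_t / close_{t-1}), clipped to [-clip, +clip]."""
    r = np.log(close[1:] / close[:-1])
    return np.clip(r, -clip, clip)

# close: array of daily closing prices for one stock, in date order
# returns = clipped_log_returns(close)
```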

22 Numerical Experiment: DOW stocks (II)

23 Conclusions and future work
New efficient algorithm for reduced rank regression.
Very fast and reliable in numerical experiments.
Future work:
More efficient and reliable implementations.
More applications of reduced rank regression.
Extensions to other forms of reduced rank regression.
In general, convex optimization techniques for other regression problems.

24 Talk online at http://math.berkeley.edu/~mgu/LAPACKSeminar.htm
Thank you

