1 Gregory Beylkin, Jochen Garcke, and Martin J. Mohlenkamp
Multivariate Regression and Machine Learning with Sums of Separable Functions
Gregory Beylkin, Jochen Garcke, and Martin J. Mohlenkamp
Presented by Zujia Huang, MATH 6645

2 Overview Problem: multivariate regression in high dimensions
Sometimes the dimension d is large but the underlying function is fairly simple.
The approach: approximate the function as a separable function, $f(x_1,\dots,x_d) \approx \prod_{i=1}^{d} g_i(x_i)$,
or as a sum of separable functions,
$$f(x_1,\dots,x_d) \approx \sum_{l=1}^{r} s_l \prod_{i=1}^{d} g_i^l(x_i),$$
where r is the separation rank and the coefficient $s_l$ is chosen so that $\|g_i^l\| = 1$.
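A minimal sketch of evaluating such a model (my own illustration, not the authors' code; the function name and the representation of the factors as arbitrary callables are assumptions):

import numpy as np

def eval_separable_sum(X, s, g):
    """Evaluate f(x) ~ sum_l s[l] * prod_i g[l][i](x_i) on the rows of X.
    X: (M, d) data points; s: (r,) coefficients;
    g: list of r lists of d univariate callables g[l][i]."""
    M, d = X.shape
    out = np.zeros(M)
    for l in range(len(s)):
        term = np.full(M, s[l], dtype=float)
        for i in range(d):
            term *= g[l][i](X[:, i])   # multiply in the i-th univariate factor
        out += term
    return out

# toy usage with r = 2 terms in d = 3 dimensions
g = [[np.sin, np.cos, lambda x: x**2],
     [np.exp, lambda x: np.ones_like(x), np.tanh]]
X = np.random.rand(5, 3)
print(eval_separable_sum(X, np.array([0.7, 0.3]), g))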

3 Overview The form is frequently used in statistics, but with constraints such as orthogonality or positivity of the component functions $g_i^l$.
If such constraints are removed, a good approximation can often be achieved with small r, which increases computational efficiency.
The basic idea is to avoid the "curse of dimensionality": a fully explicit multivariate model with N degrees of freedom per direction requires on the order of $N^{d}$ parameters, while the sum-of-products model requires only on the order of $r\,d\,N$.
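As a rough worked illustration (the specific numbers are my own, not from the slides): with $N = 100$ basis functions per direction, $d = 10$ dimensions, and separation rank $r = 5$,
$$N^{d} = 100^{10} = 10^{20} \qquad\text{versus}\qquad r\,d\,N = 5 \cdot 10 \cdot 100 = 5000.$$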

4 Algorithm Data-driven inner product
The least-squares error is measured in a data-driven inner product, i.e. by summing over the sample points.
The minimization collapses to one-dimensional subproblems: starting from an initial guess for the $g_i^l$, fix the components in all directions but one and minimize in the remaining direction.
For example, to solve the subproblem in direction m = 1, hold $g_2^l, \dots, g_d^l$ fixed and minimize over $g_1^l$ alone.
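Written out (a reconstruction from the surrounding definitions, not a verbatim copy of the slide): given data $\{(\mathbf{x}_j, y_j)\}_{j=1}^{M}$, minimize
$$\sum_{j=1}^{M}\Bigl(y_j - \sum_{l=1}^{r} s_l \prod_{i=1}^{d} g_i^l(x_{j,i})\Bigr)^{2},$$
and the subproblem in direction $m = 1$ minimizes the same sum over $g_1^l$ (and $s_l$) only, with $g_i^l$ for $i \neq 1$ held fixed.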

5 One Dimensional Subproblems
1. Linear dependence: $g_i^l$ depends linearly on a set of coefficients.
Search for $g_i^l$ in a finite-dimensional space, e.g. polynomials: choose basis functions $\phi_k(x)$ and write $g_1^l(x) = \sum_k c_k \phi_k(x)$, where $c^l$ is the set of coefficients.
E.g. quadratic: $g_1^l(x) = c_0 + c_1 x + c_2 x^{2} = (1,\ x,\ x^{2})\,(c_0,\ c_1,\ c_2)^{*}$, where ${}^{*}$ denotes transpose.

6 Taking the gradient of the error with respect to the coefficients and setting it to zero gives a linear system Az = b.
The system has a block structure: blocks A(l, l') for the pairs of terms l, l' = 1, ..., r and right-hand-side blocks b(1), ..., b(r) (e.g. a 3×3 block matrix for r = 3).
The coefficients for term l are read off as $c^l = z(l,:)$.
Finally, renormalize $g_1^l$ so that $\|g_1^l\| = 1$ and incorporate the norm into $s_l$.
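A minimal numpy sketch of this linear solve (my own reconstruction under stated assumptions: a monomial basis in the active direction and no regularization; the function name and signature are hypothetical, not from the paper):

import numpy as np

def solve_linear_subproblem(X, y, s, G, active, degree=2):
    """One linear 1D subproblem: with all directions except `active` frozen,
    the model is linear in the coefficients, so the block normal equations
    A z = b can be assembled from a design matrix and solved directly.
    X: (M, d) data; y: (M,) targets; s: (r,) scales;
    G: (r, d, M) current factor values g_i^l(x_{j,i})."""
    M, d = X.shape
    r = len(s)
    mask = np.ones(d, dtype=bool)
    mask[active] = False
    # p[l, j] = s_l * prod_{i != active} g_i^l(x_{j,i})  (the frozen part of each term)
    p = s[:, None] * np.prod(G[:, mask, :], axis=1)                 # (r, M)
    phi = np.vander(X[:, active], degree + 1, increasing=True)      # (M, K): 1, x, x^2, ...
    # design matrix: the column for (term l, basis k) holds p[l, j] * phi[j, k]
    D = np.concatenate([p[l, :, None] * phi for l in range(r)], axis=1)  # (M, r*K)
    A, b = D.T @ D, D.T @ y          # block-structured normal equations A z = b
    z = np.linalg.solve(A, b)
    return z.reshape(r, -1)          # row l gives the new coefficients c^l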

7 One Dimensional Subproblems
2. Nonlinear dependence: $g_i^l$ depends nonlinearly on a set of coefficients.
The input to the nonlinear optimization is the vector of per-sample errors (residuals); if a derivative is needed, the Jacobian of that residual vector with respect to the coefficients is supplied as well.
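A minimal sketch of the nonlinear case (assumptions: SciPy's generic least-squares solver stands in for whatever optimizer the authors used, and the Gaussian-bump parameterization of $g_1^l$ is purely illustrative):

import numpy as np
from scipy.optimize import least_squares

def residuals(theta, x_active, p, y):
    """Vector of errors fed to the optimizer.
    theta: flattened (r, 2) nonlinear parameters (center, width) per term;
    x_active: (M,) active coordinate; p: (r, M) frozen products s_l * prod_{i != m} g_i^l."""
    r = p.shape[0]
    th = theta.reshape(r, 2)
    g = np.exp(-((x_active[None, :] - th[:, :1]) / th[:, 1:]) ** 2)   # (r, M)
    return y - np.sum(p * g, axis=0)

# usage sketch: start from theta0 and let the solver estimate derivatives,
# or pass an analytic Jacobian via the `jac=` argument if one is available.
# result = least_squares(residuals, theta0, args=(x_active, p, y))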

8 Iterative Improvement
Minimize the 1D least-squares problem for each $g_i^l$.
Then use ALS (alternating least squares): for example, minimize the MSE in direction m = 1 to get improved $g_1^l$, then minimize the MSE in direction m = 2 to get improved $g_2^l$, and so on through all directions.
Alternatively, update all directions simultaneously (this may increase the error).
Repeat the process and monitor the change until convergence.
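A minimal sketch of the outer ALS loop (my own reconstruction; `update_direction` and `predict` are hypothetical helpers standing in for the 1D solves and the model evaluation sketched earlier, and the tolerance and sweep cap are assumptions):

import numpy as np

def als_fit(X, y, update_direction, predict, G, s, tol=1e-8, max_sweeps=100):
    """Alternate over directions, improving one set of factors at a time,
    and stop when the training MSE stops decreasing."""
    d = X.shape[1]
    prev_mse = np.inf
    for sweep in range(max_sweeps):
        for m in range(d):                              # sweep over the d directions
            G, s = update_direction(X, y, G, s, active=m)
        mse = np.mean((y - predict(X, G, s)) ** 2)
        if prev_mse - mse < tol * max(prev_mse, 1.0):   # monitor change until convergence
            return G, s, mse
        prev_mse = mse
    return G, s, mse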

9 Vector-Valued Functions
Often in machine learning problems the response is vector-valued.
Approximating each of the V response variables separately costs V times as much: V independent problems.
Alternative method proposed: replace the scalar coefficient $s_l$ by a vector.
Instead of solving independent problems, the correlations between the response variables are used to obtain a more compact representation.
Cost: $VrdM \rightarrow r(V + dM)$.
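Written out (a reconstruction consistent with the slide, with $V$ denoting the number of response variables):
$$\mathbf{f}(x_1,\dots,x_d) \approx \sum_{l=1}^{r} \mathbf{s}_l \prod_{i=1}^{d} g_i^l(x_i), \qquad \mathbf{s}_l \in \mathbb{R}^{V},$$
so all V outputs share the same univariate factors $g_i^l$ and differ only through the vectors $\mathbf{s}_l$.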

10 Numerical Examples The authors use synthetic datasets to examine the regression method: the Friedman datasets, with N = 20000 samples.
Friedman 1 dataset: MSE for fitting using polynomials in each direction.
(Figure: the fitted functions $g_i^l(x_i)$ along each dimension, polynomial, d = 3; for x6–x10 they are constant because those variables are unused.)
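A sketch of generating Friedman-1-style data with scikit-learn (the exact noise level and preprocessing used by the authors are not stated on the slide, so the settings below are assumptions; the generator follows the standard Friedman #1 definition, in which only x1–x5 enter the target and x6–x10 are irrelevant):

from sklearn.datasets import make_friedman1

# y = 10 sin(pi x1 x2) + 20 (x3 - 0.5)^2 + 10 x4 + 5 x5 + noise
X, y = make_friedman1(n_samples=20000, n_features=10, noise=1.0, random_state=0)
print(X.shape, y.shape)   # (20000, 10) (20000,)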

11 Numerical Examples Friedman 2 dataset
MSE for fitting using polynomials in each direction.
(Figure: the fitted functions along each dimension; with r = 2 and d = 2 the model already captures % of the variance.)

12 Numerical Examples Friedman 3 dataset
MSE for fitting using polynomials, rational functions, and multilevel tent functions in each direction.

13 Numerical Examples: Vector-Valued Data
Helicopter flight data: use the current state to predict [yaw rate, forward velocity, lateral velocity].
Rank 6, multilevel basis at level 5; $s_l$ is now a 1×3 vector.
Captures 80% of the variance.

14 Conclusions Given enough noise-free data, complicated functions can be well approximated by the sum-of-products form.
The method is faster than most of the ML algorithms it was compared against and is reasonably accurate.
(Comparison table: best result from the literature vs. this method with a polynomial basis vs. this method with a multilevel basis.)

15 References
G. Beylkin, J. Garcke, and M. J. Mohlenkamp, Multivariate regression and machine learning with sums of separable functions, SIAM J. Sci. Comput., 31 (2009), no. 3, pp. 1840–1857.
D. Meyer, F. Leisch, and K. Hornik, The support vector machine under test, Neurocomputing, 55 (2003), pp. 169–186.
A. Y. Ng, A. Coates, M. Diel, V. Ganapathi, J. Schulte, B. Tse, E. Berger, and E. Liang, Autonomous inverted helicopter flight via reinforcement learning, in International Symposium on Experimental Robotics, 2004.

