Regularized Least-Squares and Convex Optimization.


1 Regularized Least-Squares and Convex Optimization

2 www.ncsu.edu/crsc/events/ugw06/.../OptimizationUW06.ppt

3 Regularization In mathematics and statistics, particularly in the fields of machine learning and inverse problems, regularization involves introducing additional information in order to prevent overfitting.

4 Overfitting An overfitted machine learning model describes random error or noise instead of the underlying relationship. The model is excessively complex, such as having too many parameters relative to the number of observations. Overfitting causes poor predictive performance.

5 Overfitting Credit approval (credit limit): x1 = score, x2 = salary, x3 = married, x4 = age, x5 = # children, x6 = glasses, x7 = hair, x8 = height, x9 = address, x10 = weight, x11 = status, x12 = …

6 Instability  Two variables are highly correlated: one coefficient can grow very large in the positive direction while the other grows correspondingly large in the negative direction, cancelling the first variable's effect. o The fitted coefficients take very different values on different instances (training samples). o Such high-variance models cause instability of the least squares procedure.
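A minimal NumPy sketch (not from the slides; the data and variable names are illustrative) of this instability: with two nearly identical columns, the least-squares coefficients come out large, sign-opposed, and very different across two noisy realizations of the same relationship.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
x1 = rng.normal(size=n)
x2 = x1 + 1e-3 * rng.normal(size=n)      # nearly identical to x1
A = np.column_stack([x1, x2])

# Two noisy realizations of the same underlying relationship y ≈ x1
for seed in (1, 2):
    y = x1 + 0.1 * np.random.default_rng(seed).normal(size=n)
    coef = np.linalg.lstsq(A, y, rcond=None)[0]
    print("least-squares coefficients:", coef)   # large, sign-opposed, unstable
```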

7 Overfitting  Cross-validation  Leave-one-out  Regularization o Add a penalty for complexity, such as restrictions for smoothness or bounds on the vector space norm  Ridge regression (L2 norm)  Lasso (L1 norm)

8 Ridge Regression As mentioned in the previous lecture, ridge regression penalizes the size of the regression coefficients. Specifically, the ridge regression estimate is defined as the value of x that minimizes \|Ax - y\|_2^2 + \lambda \|x\|_2^2.

9 Ridge Regression Theorem: the solution to the ridge regression problem is given by \hat{x} = (A^T A + \lambda I)^{-1} A^T y. Note the similarity to the ordinary least squares solution \hat{x} = (A^T A)^{-1} A^T y, but with the addition of a "ridge" \lambda I down the diagonal.

10 Ridge Regression  Assume A is standardized and y is centered.  The solution is indexed by the tuning parameter λ. o Choose λ by cross-validation.  Inclusion of λ makes the problem non-singular even if A^T A is not invertible. o This was the original motivation for ridge regression (Hoerl and Kennard, 1970).
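A minimal NumPy sketch of this closed-form solution (the function name and arguments are illustrative, not from the slides):

```python
import numpy as np

def ridge_closed_form(A, y, lam):
    """Ridge estimate (A^T A + lam*I)^{-1} A^T y, computed via a linear solve."""
    p = A.shape[1]
    return np.linalg.solve(A.T @ A + lam * np.eye(p), A.T @ y)
```

Solving the linear system is preferable to forming the inverse explicitly, and the system is non-singular for any lam > 0.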

11 Ridge Regression The ridge objective is smooth and convex, so it can be minimized using gradient descent, steepest descent, Newton, or quasi-Newton methods.
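A minimal gradient-descent sketch for the ridge objective \|Ax - y\|_2^2 + \lambda \|x\|_2^2 (the step size and iteration count are illustrative assumptions; in practice they would be tuned or replaced by a line search):

```python
import numpy as np

def ridge_gradient_descent(A, y, lam, step=1e-3, iters=5000):
    """Minimize ||Ax - y||^2 + lam*||x||^2 by plain gradient descent."""
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        grad = 2 * A.T @ (A @ x - y) + 2 * lam * x   # gradient of the objective
        x -= step * grad
    return x
```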

12 Least Absolute Shrinkage and Selection Operator (LASSO) The LASSO coefficients are the solutions to the L1-regularized optimization problem \min_x \|Ax - y\|_2^2 + \lambda \|x\|_1.

13 LASSO  The L1 regularization term penalizes all coefficients equally.  This makes x *SPARSE*. o A sparse x means reduced complexity. o Can be viewed as a selection of relevant / important features.
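Unlike the ridge objective, the lasso objective is not differentiable at zero, so plain gradient descent does not apply directly. One standard approach (not named on the slides) is proximal gradient descent, i.e. ISTA with soft-thresholding; a minimal sketch with an illustrative iteration count:

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t*||.||_1: shrink each entry toward zero by t."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_ista(A, y, lam, iters=5000):
    """Minimize ||Ax - y||^2 + lam*||x||_1 by proximal gradient descent (ISTA)."""
    step = 1.0 / (2 * np.linalg.norm(A, 2) ** 2)     # 1/L for the smooth term
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        grad = 2 * A.T @ (A @ x - y)                 # gradient of the smooth term
        x = soft_threshold(x - step * grad, step * lam)
    return x
```

The soft-thresholding step is what sets small coefficients exactly to zero, producing the sparse x described above.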

14 Convex Optimization Optimization is the mathematical discipline concerned with finding the maxima and minima of functions, possibly subject to constraints.

15 What do we optimize? A real function of n variables, f(x_1, x_2, …, x_n), with or without constraints.

16 Unconstrained optimization

17 Optimization with constraints

18 Least-squares and LP Analytical solution (for least-squares), good algorithms and software, high accuracy and high reliability, well-understood time complexity: a mature technology! (Context: mathematical optimization ⊃ convex optimization ⊃ least-squares, LP; versus general nonlinear optimization.)

19 Convex Hull A set C is convex if, for any two points x and y in C, every point on the line segment connecting x and y is also in C (i.e., \theta x + (1-\theta) y \in C for all \theta \in [0, 1]). The convex hull of a set of points X is the minimal convex set containing X.

20 Example The non-negative orthant R_+^n is a convex set. Norm balls: let ||.|| be some norm on R^n (e.g., the Euclidean norm \|x\|_2 = \sqrt{\sum_i x_i^2}). Then the set {x : ||x|| ≤ 1} is a convex set.

21 Intersections and Unions of Convex Sets If C_1, C_2, …, C_k are convex sets, then their intersection is also a convex set. The union of convex sets in general will not be convex.

22 Why Study Convex Optimization? With only a bit of exaggeration, we can say that, if you formulate a practical problem as a convex optimization problem, then you have solved the original problem. If not, there is little chance you can solve it. -- Section 1.3.2, p. 8, Convex Optimization (Boyd and Vandenberghe)

23 Mathematical Optimization All learning is some optimization problem -> stick to the canonical form: minimize f_0(x) subject to f_i(x) ≤ b_i, i = 1, …, m. Here x = (x_1, x_2, …, x_p) are the optimization variables, x* is the optimal solution, f_0 : R^p -> R is the objective function, and f_i : R^p -> R are the constraint functions.

24 What is Convex Optimization? An optimization problem (OP) with convex objective and constraint functions: if f_0, …, f_m are convex, the problem is a convex OP, and it can be solved efficiently!

25 Convex Function Definition: the weighted mean of the function evaluated at any two points is greater than or equal to the function evaluated at the weighted mean of the two points.
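In symbols, this is the standard convexity inequality (θ is the weight):

```latex
f(\theta x + (1-\theta) y) \le \theta f(x) + (1-\theta) f(y)
\quad \text{for all } x, y \in D(f),\ \theta \in [0, 1].
```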

26 Convex Function What does the definition mean? Pick any two points x, y and evaluate the function there, giving f(x), f(y). Draw the chord passing through the two points (x, f(x)) and (y, f(y)). The function is convex if, at every point along the segment between x and y, the function lies on or below the chord between f(x) and f(y). f is concave if -f is convex.

27 Convex Function

28 Convex!

29 Convex Function Not Convex!!!

30 Convex Function It is easy to see why convexity allows for an efficient solution: just "slide" down the objective function as far as possible and you will reach a minimum, which by convexity is a global minimum.

31 First Order Condition for Convexity Suppose the function f: R^n → R is differentiable (the gradient ∇_x f(x) exists at all points x in the domain of f). Then f is convex if and only if D(f) is a convex set and, for all x, y ∈ D(f), f(y) ≥ f(x) + ∇f(x)^T (y - x); that is, the function lies above its tangent at the point (x, f(x)).

32 Second Order Condition for Convexity Suppose the function f: R^n → R is twice differentiable. Then f is convex if and only if D(f) is a convex set and its Hessian is positive semidefinite: ∇²f(x) ⪰ 0 for all x ∈ D(f). Here ⪰ refers to positive semidefiniteness rather than component-wise inequality.
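A quick numerical way to check positive semidefiniteness is to inspect the eigenvalues of the (symmetric) Hessian; a small NumPy sketch, using the ridge objective's Hessian as an illustrative example:

```python
import numpy as np

def is_psd(H, tol=1e-10):
    """True if the symmetric matrix H is positive semidefinite (all eigenvalues >= 0)."""
    return bool(np.all(np.linalg.eigvalsh(H) >= -tol))

# Hessian of the ridge objective ||Ax - y||^2 + lam*||x||^2 is 2*A^T A + 2*lam*I
A = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
lam = 0.1
H = 2 * A.T @ A + 2 * lam * np.eye(2)
print(is_psd(H))   # True: the ridge objective is convex
```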

33 Example Exponential: let f: R → R, f(x) = e^{ax} for any a ∈ R. To show f is convex, we can simply take the second derivative f''(x) = a^2 e^{ax}, which is positive for all x. Quadratic function: f: R^n → R, f(x) = 1/2 x^T A x + b^T x + c for a symmetric matrix A ∈ S^n, b ∈ R^n, c ∈ R. If A is positive semidefinite then the function is convex.
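The reasoning behind the quadratic example, written out (a standard derivation, not shown on the slide):

```latex
f(x) = \tfrac{1}{2} x^T A x + b^T x + c, \qquad
\nabla f(x) = A x + b, \qquad
\nabla^2 f(x) = A,
```

so by the second order condition, f is convex exactly when A is positive semidefinite.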

34 Convex Optimization Problems A convex optimization problem has the form: minimize f(x) subject to x ∈ C, where f is a convex function and C is a convex set. Another formulation is: minimize f(x) subject to g_i(x) ≤ 0 and h_i(x) = 0, where the g_i are convex functions, the h_i are affine functions, and x is the optimization variable.
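Problems written in this form can be handed to an off-the-shelf solver. A minimal sketch using the CVXPY library (the random data, the lasso-style objective, and the non-negativity constraint are illustrative assumptions, not from the slides):

```python
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(30, 5))
y = rng.normal(size=30)
lam = 0.5

x = cp.Variable(5)
objective = cp.Minimize(cp.sum_squares(A @ x - y) + lam * cp.norm1(x))  # convex objective
constraints = [x >= 0]                                                  # convex (affine) constraint
problem = cp.Problem(objective, constraints)
problem.solve()
print(x.value)
```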

