Regularized Least-Squares and Convex Optimization

Regularization In mathematics and statistics, particularly in the fields of machine learning and inverse problems, regularization involves introducing additional information in order to prevent overfitting.

Overfitting An overfitted machine learning model describes random error or noise instead of the underlying relationship. The model is excessively complex, such as having too many parameters relative to the number of observations. Overfitting causes poor predictive performance.
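As an illustration (this example is not from the slides), here is a minimal NumPy sketch: a degree-9 polynomial fit through 10 noisy points drives the training error to nearly zero while the test error grows, exactly the excess-complexity failure described above.

```python
import numpy as np

rng = np.random.default_rng(0)

# 10 noisy samples of an underlying linear relationship y = 2x
x_train = np.linspace(0, 1, 10)
y_train = 2 * x_train + rng.normal(scale=0.1, size=10)
x_test = np.linspace(0, 1, 100)
y_test = 2 * x_test

for degree in (1, 9):
    # degree 9 on 10 points interpolates the noise exactly
    coeffs = np.polyfit(x_train, y_train, degree)
    train_err = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_err = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree {degree}: train MSE {train_err:.4f}, test MSE {test_err:.4f}")
```

The high-degree fit typically achieves near-zero training error but a noticeably larger test error than the simple linear fit.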

Overfitting Example: credit approval (credit limit) with many features: x1 = score, x2 = salary, x3 = married, x4 = age, x5 = number of children, x6 = glasses, x7 = hair, x8 = height, x9 = address, x10 = weight, x11 = status, x12 = … Most of these features are irrelevant to the credit decision, but an excessively complex model can fit noise in them.

Instability If two variables are highly correlated, one coefficient can grow very large in the positive direction while the other grows correspondingly large in the negative direction, cancelling the first variable's effect. The fitted coefficients then take very different values on different samples of the data: such high-variance models make the least-squares procedure unstable.

Overfitting Remedies:
- Cross validation (e.g., leave-one-out)
- Regularization: form a penalty for complexity, such as restrictions for smoothness or bounds on the vector space norm
  - Ridge regression (L2-norm penalty)
  - Lasso (L1-norm penalty)

Ridge Regression As mentioned in the previous lecture, ridge regression penalizes the size of the regression coefficients. Specifically, the ridge regression estimate is defined as the value of x that minimizes

$$\|y - Ax\|_2^2 + \lambda \|x\|_2^2.$$

Ridge Regression Theorem: the solution to the ridge regression problem is given by

$$\hat{x} = (A^T A + \lambda I)^{-1} A^T y.$$

Note the similarity to the ordinary least squares solution $\hat{x} = (A^T A)^{-1} A^T y$, but with the addition of a "ridge" of $\lambda$ down the diagonal.
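A minimal NumPy sketch of this closed form (the function name is ours). Solving the linear system is preferred to forming the inverse explicitly:

```python
import numpy as np

def ridge_closed_form(A, y, lam):
    """Ridge estimate: solve (A^T A + lam * I) x = A^T y."""
    p = A.shape[1]
    return np.linalg.solve(A.T @ A + lam * np.eye(p), A.T @ y)

# The "ridge" makes the system solvable even when A^T A is singular:
A = np.array([[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])  # perfectly correlated columns
y = np.array([1.0, 2.0, 3.0])
print(ridge_closed_form(A, y, lam=0.1))
```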

Ridge Regression
- Assume A is standardized and y is centered.
- The solution is indexed by the tuning parameter λ, which is typically chosen by cross validation.
- Inclusion of λ makes the problem non-singular even if $A^T A$ is not invertible; this was the original motivation for ridge regression (Hoerl and Kennard, 1970).
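As a sketch of choosing λ by cross validation (the helper name and fold scheme are our own, not from the slides):

```python
import numpy as np

def cv_ridge_lambda(A, y, lambdas, k=5, seed=0):
    """Pick lambda for ridge regression by k-fold cross-validated MSE."""
    n, p = A.shape
    folds = np.array_split(np.random.default_rng(seed).permutation(n), k)
    errors = []
    for lam in lambdas:
        fold_errs = []
        for fold in folds:
            train = np.setdiff1d(np.arange(n), fold)
            # Ridge estimate on the training split
            x_hat = np.linalg.solve(A[train].T @ A[train] + lam * np.eye(p),
                                    A[train].T @ y[train])
            # Held-out squared error on the validation fold
            fold_errs.append(np.mean((A[fold] @ x_hat - y[fold]) ** 2))
        errors.append(np.mean(fold_errs))
    return lambdas[int(np.argmin(errors))]

# Usage (hypothetical data):
# lam_best = cv_ridge_lambda(A, y, lambdas=[0.01, 0.1, 1.0, 10.0])
```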

Ridge Regression The objective is smooth and convex, so it can be minimized using gradient descent, steepest descent, Newton's method, or quasi-Newton methods.
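As a concrete sketch (function name and step-size rule are our choices), gradient descent on the ridge objective, whose gradient is $2A^T(Ax - y) + 2\lambda x$:

```python
import numpy as np

def ridge_gradient_descent(A, y, lam, step=None, iters=500):
    """Minimize ||y - A x||^2 + lam ||x||^2 by gradient descent."""
    n, p = A.shape
    x = np.zeros(p)
    if step is None:
        # Safe fixed step 1/L, where L = 2 * (lambda_max(A^T A) + lam)
        # is the Lipschitz constant of the gradient.
        L = 2 * (np.linalg.eigvalsh(A.T @ A).max() + lam)
        step = 1.0 / L
    for _ in range(iters):
        grad = 2 * A.T @ (A @ x - y) + 2 * lam * x
        x -= step * grad
    return x
```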

Least Absolute Shrinkage and Selection Operator (LASSO) The LASSO coefficients are the solutions to the L1-regularized optimization problem

$$\hat{x} = \arg\min_x \|y - Ax\|_2^2 + \lambda \|x\|_1.$$

LASSO
- The regularization term penalizes all factors equally.
- This makes the solution x sparse: a sparse x means reduced complexity, and the nonzero entries can be viewed as a selection of relevant / important features.
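One standard way to solve the LASSO problem (not specified in the slides) is proximal gradient descent, a.k.a. ISTA, where the L1 proximal step is entrywise soft-thresholding; a minimal sketch:

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1: shrink each entry toward zero."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_ista(A, y, lam, iters=1000):
    """Minimize ||y - A x||^2 + lam ||x||_1 by proximal gradient (ISTA)."""
    n, p = A.shape
    # Step 1/L with L = 2 * lambda_max(A^T A), the gradient's Lipschitz constant
    step = 1.0 / (2 * np.linalg.eigvalsh(A.T @ A).max())
    x = np.zeros(p)
    for _ in range(iters):
        grad = 2 * A.T @ (A @ x - y)       # gradient of the smooth part
        x = soft_threshold(x - step * grad, step * lam)
    return x
```

The soft-thresholding step is what zeroes out small coefficients, producing the sparsity discussed above.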

Convex Optimization Optimization is the mathematical discipline concerned with finding the maxima and minima of functions, possibly subject to constraints.

What do we optimize? A real function of n variables, with or without constraints.

Unconstrained optimization:

$$\min_{x \in \mathbb{R}^n} f(x)$$

Optimization with constraints:

$$\min_{x} f(x) \quad \text{subject to} \quad g_i(x) \le 0,\; i = 1,\dots,m, \qquad h_j(x) = 0,\; j = 1,\dots,p$$

Mathematical Optimization Within mathematical optimization, convex optimization includes least-squares and linear programming (LP), in contrast to general nonlinear optimization. Least-squares and LP are a mature technology: analytical solutions (for least-squares), good algorithms and software, high accuracy and high reliability, and well-understood time complexity.

Convex Hull A set C is convex if, for every pair of points x, y ∈ C, every point on the line segment connecting x and y is in C. The convex hull of a set of points X is the minimal convex set containing X.
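As a quick illustration (assuming SciPy is available), scipy.spatial.ConvexHull computes the hull of a point set:

```python
import numpy as np
from scipy.spatial import ConvexHull

rng = np.random.default_rng(1)
points = rng.random((20, 2))     # 20 random points in the unit square
hull = ConvexHull(points)
print(points[hull.vertices])     # vertices of the minimal convex set containing the points
```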

Examples The non-negative orthant $\mathbb{R}^n_+$ is a convex set. Norm balls: let ||·|| be some norm on $\mathbb{R}^n$ (e.g., the Euclidean norm $\|x\|_2 = \sqrt{x^T x}$). Then the set {x : ||x|| ≤ 1} is a convex set.

Intersections and Unions of Convex Sets If C_1, C_2, …, C_k are convex sets, then their intersection $\bigcap_{i=1}^{k} C_i$ is also a convex set. The union of convex sets will, in general, not be convex.

Why Study Convex Optimization? "With only a bit of exaggeration, we can say that, if you formulate a practical problem as a convex optimization problem, then you have solved the original problem. If not, there is little chance you can solve it." — Section 1.3.2, p. 8, Convex Optimization

Mathematical Optimization All learning is some optimization problem, so stick to the canonical form:

$$\begin{aligned} \text{minimize} \quad & f_0(x) \\ \text{subject to} \quad & f_i(x) \le b_i, \quad i = 1, \dots, m \end{aligned}$$

where x = (x_1, x_2, …, x_p) are the optimization variables, x* denotes an optimal solution, f_0 : R^p → R is the objective function, and f_i : R^p → R are the constraint functions.

What is Convex Optimization? A convex OP is an optimization problem with convex objective and constraint functions: if f_0, …, f_m are all convex, the problem is a convex OP, and it has an efficient solution!

Convex Function Definition: the weighted mean of the function evaluated at any two points is greater than or equal to the function evaluated at the weighted mean of the two points. That is, for all x, y in the domain and θ ∈ [0, 1],

$$f(\theta x + (1 - \theta) y) \le \theta f(x) + (1 - \theta) f(y).$$

Convex Function What does the definition mean? Pick any two points x, y and evaluate the function at them, giving f(x), f(y). Draw the line segment (chord) passing through the two points (x, f(x)) and (y, f(y)). The function is convex if, at every point between x and y, it lies below that chord. f is concave if −f is convex.
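A minimal numeric check of this chord definition (the helper name is ours): sample θ ∈ [0, 1] and test whether the graph ever rises above the chord.

```python
import numpy as np

def violates_convexity(f, x, y, num_thetas=50):
    """True if f(theta*x + (1-theta)*y) > theta*f(x) + (1-theta)*f(y)
    for some sampled theta, i.e., the graph rises above the chord."""
    thetas = np.linspace(0, 1, num_thetas)
    lhs = np.array([f(t * x + (1 - t) * y) for t in thetas])  # graph values
    rhs = thetas * f(x) + (1 - thetas) * f(y)                 # chord values
    return bool(np.any(lhs > rhs + 1e-12))

print(violates_convexity(lambda z: z**2, -1.0, 2.0))      # False: x^2 is convex
print(violates_convexity(lambda z: np.sin(z), 0.0, 4.0))  # True: sin is not convex
```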

[Figure: example of a convex function — the graph lies below every chord]

[Figure: example of a non-convex function — the graph rises above some chord]

Convex Function It is easy to see why convexity allows for an efficient solution: just "slide" down the objective function as far as possible, and you will reach a minimum. For a convex function, any such local minimum is also a global minimum.

First Order Condition for Convexity Suppose f : R^n → R is differentiable (the gradient ∇_x f(x) exists at all points x in the domain of f). Then f is convex if and only if D(f) is a convex set and, for all x, y ∈ D(f),

$$f(y) \ge f(x) + \nabla_x f(x)^T (y - x),$$

i.e., the first-order approximation of f at (x, f(x)) is a global underestimator of f.

Second Order Condition for Convexity Suppose f : R^n → R is twice differentiable. Then f is convex if and only if D(f) is a convex set and its Hessian is positive semidefinite: for all x ∈ D(f),

$$\nabla_x^2 f(x) \succeq 0,$$

where ⪰ refers to positive semi-definiteness rather than component-wise inequality.

Examples Exponential: let f : R → R, f(x) = e^{ax} for any a ∈ R. To show f is convex, we can simply take the second derivative f''(x) = a² e^{ax}, which is positive for all x. Quadratic function: f : R^n → R, f(x) = ½ x^T A x + b^T x + c for a symmetric matrix A ∈ S^n, b ∈ R^n, c ∈ R. If A is positive semidefinite, then the function is convex.
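A quick numeric companion to the quadratic example (the helper name is ours): the Hessian of f(x) = ½xᵀAx + bᵀx + c is the constant matrix A, so convexity reduces to checking A's eigenvalues.

```python
import numpy as np

def is_psd(H, tol=1e-10):
    """Check positive semidefiniteness via eigenvalues (H assumed symmetric)."""
    return bool(np.all(np.linalg.eigvalsh(H) >= -tol))

# Quadratic f(x) = 1/2 x^T A x + b^T x + c has constant Hessian A.
A1 = np.array([[2.0, 0.0], [0.0, 1.0]])   # PSD   -> f is convex
A2 = np.array([[1.0, 0.0], [0.0, -1.0]])  # indefinite -> f is not convex
print(is_psd(A1), is_psd(A2))             # True False
```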

Convex Optimization Problems A convex optimization problem is

$$\min_x f(x) \quad \text{subject to} \quad x \in C,$$

where f is a convex function and C is a convex set. An equivalent formulation is

$$\min_x f(x) \quad \text{subject to} \quad g_i(x) \le 0,\; i = 1,\dots,m, \qquad h_j(x) = 0,\; j = 1,\dots,p,$$

where the g_i are convex functions, the h_j are affine functions, and x is the optimization variable.
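For completeness, a sketch using the CVXPY modeling library (assuming it is installed; not referenced in the slides) to pose the LASSO in exactly this convex form and hand it to a solver:

```python
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((30, 10))
# Ground truth with only two relevant features
y = A @ np.array([3.0, -2.0] + [0.0] * 8) + 0.1 * rng.standard_normal(30)

x = cp.Variable(10)
lam = 1.0
# Convex objective: sum of a convex quadratic and a convex L1 penalty
objective = cp.Minimize(cp.sum_squares(y - A @ x) + lam * cp.norm1(x))
prob = cp.Problem(objective)
prob.solve()
print(np.round(x.value, 3))   # near-zero entries for the irrelevant features
```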