Regularized risk minimization


1 Regularized risk minimization
Usman Roshan

2 Supervised learning for two classes
We are given n training samples (x_i, y_i), i = 1..n, drawn i.i.d. from a probability distribution P(x,y). Each x_i is a d-dimensional vector (x_i ∈ R^d) and each y_i is +1 or −1. Our problem is to learn a function f(x) for predicting the labels of test samples x_i' ∈ R^d, i = 1..n', also drawn i.i.d. from P(x,y).
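A minimal sketch of this setup in Python/NumPy, assuming illustrative Gaussian classes and a linear predictor (neither is specified on the slide):

import numpy as np

rng = np.random.default_rng(0)
n, d = 100, 2

# Draw n i.i.d. labeled samples: class +1 centered at (1, 1), class -1 at (-1, -1).
y = rng.choice([-1, 1], size=n)
X = rng.normal(size=(n, d)) + y[:, None] * 1.0

# One candidate predictor f(x) = sign(w . x); learning w is the problem above.
w = np.array([1.0, 1.0])
predictions = np.sign(X @ w)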

3 Loss function A loss function c(x, y, f(x)) maps to [0, ∞). Examples are given below.
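Some standard examples, written in LaTeX (the slide's own list is not in the transcript; the hinge and logistic losses reappear on slides 11 and 14):

0-1 loss: c(x, y, f(x)) = \begin{cases} 0 & \text{if } y f(x) > 0 \\ 1 & \text{otherwise} \end{cases}
Squared loss: c(x, y, f(x)) = (y - f(x))^2
Hinge loss: c(x, y, f(x)) = \max(0, 1 - y f(x))
Logistic loss: c(x, y, f(x)) = \ln(1 + e^{-y f(x)})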

4 Test error We quantify the test error as the expected error on the test set (in other words, the average test error); the two-class form is given below. We want to find the f that minimizes this, but it requires P(y|x), which we do not have access to.
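The slide's formula is not in the transcript; a standard two-class form (following Learning with Kernels), averaging over the test points x_1', ..., x_{n'}' and the unknown conditional P(y|x), is:

\frac{1}{n'} \sum_{i=1}^{n'} \sum_{y \in \{\pm 1\}} c(x_i', y, f(x_i')) \, P(y \mid x_i')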

5 Expected risk Suppose we do not have test data x'. Then we average the test error over all possible data points x. This is also known as the expected risk, or the expected value of the loss function in Bayesian decision theory. We want to find the f that minimizes it, but we do not have all data points, only training data, and we do not know P(x,y).
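In LaTeX, the expected risk is the expectation of the loss under the unknown distribution:

R[f] = \int c(x, y, f(x)) \, dP(x, y) = E_{(x,y) \sim P}\big[ c(x, y, f(x)) \big]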

6 Empirical risk Since we only have training data, we cannot calculate the expected risk (we do not even know P(x,y)). Solution: approximate P(x,y) with the empirical distribution p_emp(x,y), given below. The delta function δ_x(y) = 1 if x = y and 0 otherwise.
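In LaTeX, the empirical distribution places mass 1/n on each training pair:

p_{emp}(x, y) = \frac{1}{n} \sum_{i=1}^{n} \delta_{x_i}(x) \, \delta_{y_i}(y)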

7 Empirical risk We can now define the empirical risk, given below.
Once the loss function is chosen and the training data are given, we can find the f that minimizes it.
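Substituting p_emp(x,y) for P(x,y) in the expected risk gives:

R_{emp}[f] = \frac{1}{n} \sum_{i=1}^{n} c(x_i, y_i, f(x_i))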

8 Bounding the expected risk
Recall from earlier that we bounded the expected risk by the empirical risk plus a complexity term. This suggests minimizing the empirical risk plus a measure of classifier complexity.
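One standard form of such a bound is Vapnik's VC bound (the slide's exact statement is not in the transcript): with probability at least 1 − η over the draw of the n training samples,

R[f] \le R_{emp}[f] + \sqrt{\frac{h\left(\ln\frac{2n}{h} + 1\right) - \ln\frac{\eta}{4}}{n}}

where h is the VC dimension of the function class containing f.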

9 Regularized risk minimization
Minimize the regularized risk given below. Note the term added to the empirical risk; it measures classifier complexity.
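In LaTeX, with regularization parameter λ > 0 and complexity term Ω[f]:

R_{reg}[f] = R_{emp}[f] + \lambda \, \Omega[f]

For a linear classifier f(x) = w^\top x, a common choice is \Omega[f] = \frac{1}{2} \lVert w \rVert^2.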

10 Representer theorem The representer theorem plays a central role in statistical estimation.
(Statement taken from Learning with Kernels by Schölkopf and Smola.)
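Roughly stated: if f minimizes the regularized risk over a reproducing kernel Hilbert space with kernel k, with a regularizer that is strictly increasing in \lVert f \rVert, then the minimizer admits an expansion over the training points:

f(x) = \sum_{i=1}^{n} \alpha_i \, k(x_i, x)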

11 Regularized empirical risk
Linear regression, logistic regression, and the SVM all fit this framework; their regularized risks are given below.
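With a linear model f(x) = w^\top x, the three methods share the template R_emp + regularizer and differ only in the loss (the slide's exact constants are not in the transcript):

Linear regression (squared loss): \min_w \frac{1}{n} \sum_{i=1}^{n} (y_i - w^\top x_i)^2 + \lambda \lVert w \rVert^2
Logistic regression (logistic loss): \min_w \frac{1}{n} \sum_{i=1}^{n} \ln\left(1 + e^{-y_i w^\top x_i}\right) + \lambda \lVert w \rVert^2
SVM (hinge loss): \min_w \frac{1}{n} \sum_{i=1}^{n} \max\left(0,\, 1 - y_i w^\top x_i\right) + \lambda \lVert w \rVert^2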

12 Single layer neural network
Linear regression regularized risk versus single layer neural network regularized risk (both given below).
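A plausible reading of the comparison (the slide's formulas are not in the transcript): the network passes the same linear combination through a nonlinearity σ, e.g. the sigmoid, before the squared loss:

Linear regression: \min_w \frac{1}{n} \sum_{i=1}^{n} (y_i - w^\top x_i)^2 + \lambda \lVert w \rVert^2
Single layer network: \min_w \frac{1}{n} \sum_{i=1}^{n} \left(y_i - \sigma(w^\top x_i)\right)^2 + \lambda \lVert w \rVert^2, \quad \sigma(z) = \frac{1}{1 + e^{-z}}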

13 Other loss functions From "A Scalable Modular Convex Solver for Regularized Risk Minimization", Teo et al., KDD 2007

14 Regularizer L1 norm and L2 norm (both given below). The L1 norm gives a sparse solution (many entries of w will be zero). Squared loss with an L1 penalty is known as the "lasso"; logistic loss with an L1 penalty is L1-regularized logistic regression.
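In LaTeX:

\lVert w \rVert_1 = \sum_{j=1}^{d} |w_j|, \qquad \lVert w \rVert_2^2 = \sum_{j=1}^{d} w_j^2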

15 Regularized risk minimizer exercise
Compare the SVM to regularized logistic regression. Software: version 2.1 executables for OSL machines are available on the course website.
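The course executables are not reproduced here; a minimal sketch of the same comparison in Python with scikit-learn (the synthetic dataset and parameter choices are illustrative assumptions, not the course setup):

import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC
from sklearn.linear_model import LogisticRegression

# Synthetic two-class data standing in for the course datasets.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Hinge loss + L2 penalty (SVM) vs. logistic loss + L2 penalty.
svm = LinearSVC(C=1.0).fit(X_train, y_train)
logreg = LogisticRegression(C=1.0, penalty="l2").fit(X_train, y_train)

print("SVM test accuracy:   ", svm.score(X_test, y_test))
print("LogReg test accuracy:", logreg.score(X_test, y_test))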

