
1 Machine Learning Seminar: Support Vector Regression Presented by: Heng Ji 10/08/03

2 Outline: Regression Background; Linear ε-Insensitive Loss Algorithm (Primal Formulation, Dual Formulation, Kernel Formulation); Quadratic ε-Insensitive Loss Algorithm; Kernel Ridge Regression & Gaussian Processes

3 Regression = find a function that fits the observations. The observations are (x, y) pairs: (1949, 100), (1950, 117), ..., (1996, 1462), (1997, 1469), (1998, 1467), (1999, 1474).

4 Linear fit... Not so good...

5 Better linear fit... Take logarithm of y and fit a straight line

6 Transform back to the original scale. So-so...
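A minimal sketch of the fit-the-log-then-transform-back idea from the last three slides, using only the observation pairs listed on the regression slide (the intermediate years hidden behind "..." are simply omitted here):

```python
# Sketch of the two fits described above, on the listed (year, value) pairs only.
import numpy as np

years = np.array([1949, 1950, 1996, 1997, 1998, 1999], dtype=float)
values = np.array([100, 117, 1462, 1469, 1467, 1474], dtype=float)

# Plain straight-line fit on the raw values ("not so good").
slope, intercept = np.polyfit(years, values, deg=1)

# Better: fit a straight line to log(y), then transform back to the original scale.
log_slope, log_intercept = np.polyfit(years, np.log(values), deg=1)
back_transformed = np.exp(log_slope * years + log_intercept)

print(back_transformed.round(1))
```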

7 So what is regression about? Construct a model of a process, using examples of the process. Input: x (possibly a vector). Output: y (generated by the process). Examples: pairs of input and output {x, y}. Our model: the function f(x) is our estimate of the true function g(x).

8 Assumption about the process: the "fixed regressor model" y(n) = g[x(n)] + ε(n), where x(n) is the observed input, y(n) the observed output, g[x(n)] the true underlying function, and ε(n) an i.i.d. noise process with zero mean. Data set: D = {(x(n), y(n)), n = 1, ..., N}.

9 Example (figure: example data illustrating the noise variance σ²).

10 Model Sets (examples). True function: g(x) = 0.5 + x + x^2 + 6x^3. Candidate model sets: F_1 = {a + bx} (linear); F_2 = {a + bx + cx^2} (quadratic); F_3 = {a + bx + cx^2 + dx^3} (cubic); note that F_1 ⊂ F_2 ⊂ F_3.
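As an illustration of the three model sets, here is a small sketch that fits each family to noisy samples of g, following the fixed regressor model from slide 8; the noise level and sample size are arbitrary choices, not from the slides:

```python
# Fit linear, quadratic, and cubic polynomials to noisy samples of
# g(x) = 0.5 + x + x^2 + 6x^3.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 30)
g = 0.5 + x + x**2 + 6 * x**3
y = g + rng.normal(scale=0.5, size=x.shape)   # fixed regressor model: y = g(x) + noise

for degree, name in [(1, "linear"), (2, "quadratic"), (3, "cubic")]:
    coeffs = np.polyfit(x, y, deg=degree)
    mse = np.mean((np.polyval(coeffs, x) - y) ** 2)
    print(f"{name}: training MSE = {mse:.3f}")
```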

11 Idealized regression: choose an appropriate model family F (our hypothesis set) and find f_opt(x) ∈ F with minimum "distance" to the true function g(x) (the "error").

12 How do we measure "distance"? Q: What is the distance (difference) between the functions f and g?
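The slide leaves the question open; one common choice (an assumption here, not stated on the slide) is the squared L2 distance between the two functions, approximated numerically on a grid:

```python
# Numerically approximate the integral of (f - g)^2 over [0, 1].
import numpy as np

x = np.linspace(0, 1, 1000)
f = lambda t: t**2
g = lambda t: t**2 + 0.1 * np.sin(2 * np.pi * t)

l2_squared = np.trapz((f(x) - g(x)) ** 2, x)
print(l2_squared)
```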

13 Margin Slack Variable. For an example (x_i, y_i) and function f, the margin slack variable measures how far the prediction error |y_i − f(x_i)| exceeds θ − γ, where θ is the target accuracy in testing and γ is the difference between the target accuracy and the margin achieved in training.

14 ε-Insensitive Loss Function. Let ε = θ − γ, so the margin slack variable becomes ξ_i = max(0, |y_i − f(x_i)| − ε). Linear ε-insensitive loss: L(y, f(x)) = max(0, |y − f(x)| − ε), written |y − f(x)|_ε. Quadratic ε-insensitive loss: L(y, f(x)) = |y − f(x)|_ε^2.
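A short sketch of the two loss functions as just defined (the vectorized form is my own phrasing, not from the slides):

```python
import numpy as np

def linear_eps_insensitive(y, f_x, eps):
    """Linear eps-insensitive loss: zero inside the eps-band, linear outside."""
    return np.maximum(0.0, np.abs(y - f_x) - eps)

def quadratic_eps_insensitive(y, f_x, eps):
    """Quadratic eps-insensitive loss: the linear loss, squared."""
    return linear_eps_insensitive(y, f_x, eps) ** 2

print(linear_eps_insensitive(np.array([1.0, 2.0, 5.0]), 2.0, eps=0.5))
# -> [0.5 0.  2.5]
```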

15 Linear ε-Insensitive Loss gives a linear SV machine (figure: the ε-tube around the fitted line; slack variables ξ and ξ* measure by how much y_i lies above or below the tube).

16 Basic Idea of SV Regression. Starting point: we have input data X = {(x_1, y_1), ..., (x_N, y_N)}. Goal: we want to find a robust function f(x) that has at most ε deviation from the targets y, while at the same time being as flat as possible. Idea: simple regression problem + optimization + kernel trick.

17 Primal Regression Problem. Thus, setting f(x) = ⟨w, x⟩ + b, the primal regression problem is: min (1/2)||w||^2 + C Σ_i (ξ_i + ξ_i*) subject to y_i − ⟨w, x_i⟩ − b ≤ ε + ξ_i, ⟨w, x_i⟩ + b − y_i ≤ ε + ξ_i*, and ξ_i, ξ_i* ≥ 0.

18 Linear ε-Insensitive Loss Regression. In the minimization above, ε determines the insensitive zone and C sets the trade-off between the training error and ||w||; ε and C must be tuned simultaneously. Does this make regression more difficult than classification?
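For concreteness, a hypothetical sketch of this primal problem written as a quadratic program in cvxpy; the data, ε, and C values are illustrative and not from the slides:

```python
# Solve min 0.5*||w||^2 + C*sum(xi + xi_star) subject to the eps-band constraints.
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 2))
y = X @ np.array([1.0, -2.0]) + 0.5 + rng.normal(scale=0.1, size=30)

eps, C = 0.1, 1.0
w = cp.Variable(2)
b = cp.Variable()
xi = cp.Variable(30, nonneg=True)        # slack for points above the eps-band
xi_star = cp.Variable(30, nonneg=True)   # slack for points below the eps-band

objective = cp.Minimize(0.5 * cp.sum_squares(w) + C * cp.sum(xi + xi_star))
constraints = [y - X @ w - b <= eps + xi,
               X @ w + b - y <= eps + xi_star]
cp.Problem(objective, constraints).solve()
print(w.value, b.value)
```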

19 Parameters used in SV Regression

20 Dual Formulation. The Lagrangian function helps us formulate the dual problem. Notation: ε is the insensitive-loss parameter; β_i and β_i* are the Lagrange multipliers; ξ_i is the slack for points above the ε-band and ξ_i* the slack for points below it. Setting the derivatives of the Lagrangian with respect to the primal variables to zero gives the optimality conditions.

21 Dual Formulation (cont'd). Dual problem: max Σ_i y_i(β_i* − β_i) − ε Σ_i (β_i* + β_i) − (1/2) Σ_i Σ_j (β_i* − β_i)(β_j* − β_j)⟨x_i, x_j⟩ subject to Σ_i (β_i* − β_i) = 0 and 0 ≤ β_i, β_i* ≤ C. Solving this quadratic program gives w = Σ_i (β_i* − β_i) x_i.

22 KKT Optimality Conditions and b. The KKT optimality conditions require the products of the dual variables and their constraints to vanish at the solution. Hence b can be computed from any support vector with 0 < β_i < C, e.g. b = y_i − ⟨w, x_i⟩ − ε. The conditions also mean that the Lagrange multipliers are non-zero only for points on or outside the ε-band; these points are the support vectors.

23 The Idea of SVM (figure: a nonlinear map Φ takes the data from the input space to a feature space in which a linear function can be fitted).

24 Kernel Version. Why can we use a kernel? The complexity of the function's representation depends only on the number of SVs, and the complete algorithm can be described in terms of inner products. A kernel K(x, z) = ⟨Φ(x), Φ(z)⟩ therefore provides an implicit mapping to the feature space, giving f(x) = Σ_i (β_i* − β_i) K(x_i, x) + b.
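A sketch of this kernel-expansion prediction, assuming the dual coefficients (β_i* − β_i) and b have already been obtained by solving the dual problem; the RBF kernel and the numbers are illustrative choices:

```python
# Predict f(x) = sum_i (beta_i* - beta_i) K(x_i, x) + b from given dual coefficients.
import numpy as np

def rbf_kernel(a, b, gamma=1.0):
    """Gaussian (RBF) kernel matrix between two sets of points."""
    sq_dists = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return np.exp(-gamma * sq_dists)

def predict(x_new, x_sv, dual_coef, b, gamma=1.0):
    """dual_coef[i] plays the role of (beta_i* - beta_i) for support vector i."""
    return rbf_kernel(x_new, x_sv, gamma) @ dual_coef + b

# Tiny hypothetical example: two support vectors in 1-D.
x_sv = np.array([[0.0], [1.0]])
dual_coef = np.array([0.7, -0.3])
print(predict(np.array([[0.5]]), x_sv, dual_coef, b=0.1))
```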

25 Quadratic ε-Insensitive Loss Regression. Problem: min (1/2)||w||^2 + C Σ_i (ξ_i^2 + ξ_i*^2) subject to the same ε-band constraints as before (explicit non-negativity constraints on the slacks are no longer needed). In the kernel formulation, the quadratic slack term appears as an extra 1/C added to the diagonal of the kernel matrix.

26 Kernel Ridge Regression & Gaussian Processes. With ε = 0 the problem reduces to least-squares linear regression with a weight-decay factor controlled by C: min λ||w||^2 + Σ_i ξ_i^2 (λ ~ 1/C) subject to y_i − ⟨w, x_i⟩ = ξ_i. The kernel formulation has the closed-form solution α = (K + λI)^−1 y (I: identity matrix), so f(x) = Σ_i α_i K(x_i, x), which is also the mean of a Gaussian process posterior distribution.
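A minimal numerical sketch of the closed-form kernel ridge solution α = (K + λI)^−1 y; the RBF kernel, λ, and data are illustrative choices, not from the slides:

```python
import numpy as np

def rbf_kernel(a, b, gamma=1.0):
    sq_dists = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return np.exp(-gamma * sq_dists)

rng = np.random.default_rng(1)
X = rng.uniform(-2, 2, size=(25, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=25)

lam = 0.1                                # lambda ~ 1/C, the weight-decay factor
K = rbf_kernel(X, X)
alpha = np.linalg.solve(K + lam * np.eye(len(X)), y)

X_test = np.array([[0.0], [1.0]])
f_test = rbf_kernel(X_test, X) @ alpha   # prediction = GP posterior mean
print(f_test)
```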

27 Architecture of the SV Regression Machine (figure: the input is compared with the support vectors through kernel units, whose outputs are weighted, summed, and offset by the bias b), similar to regression in a three-layered neural network!?

28 Conclusion. SVM is a useful alternative to neural networks. Two key concepts of SVM: optimization and the kernel trick. Advantages of SV regression: the solution is represented by a small subset of the training points; the existence of a global minimum is ensured; the optimization of a reliable generalization bound is ensured.

29 Discussion 1: Influence of the insensitivity band on regression quality. 17 measured training data points are used. Left: ε = 0.1, and 15 SVs are chosen. Right: ε = 0.5, and the 6 chosen SVs produce a much better regression function.
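The original figure and its 17 measured points are not reproduced here; the following sketch only reproduces the qualitative effect on synthetic data, using scikit-learn's SVR (an assumption, not the tool used on the slide): a larger ε typically leaves fewer support vectors.

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(42)
X = np.sort(rng.uniform(0, 5, size=(17, 1)), axis=0)
y = np.sin(X).ravel() + rng.normal(scale=0.15, size=17)

for eps in (0.1, 0.5):
    model = SVR(kernel="rbf", C=10.0, epsilon=eps).fit(X, y)
    print(f"epsilon={eps}: {len(model.support_)} support vectors")
```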

30 Discussion 2: ε-Insensitive Loss. It enables sparseness within the SVs, but does it guarantee sparseness? It is robust (robust to small changes in the data/model) and less sensitive to outliers.

