
1 Machine Learning Seminar: Support Vector Regression Presented by: Heng Ji 10/08/03

2 Outline: Regression Background; Linear ε-Insensitive Loss Algorithm (Primal Formulation, Dual Formulation, Kernel Formulation); Quadratic ε-Insensitive Loss Algorithm; Kernel Ridge Regression & Gaussian Processes

3 Regression = find a function that fits the observations. The observations are (x, y) pairs: (1949, 100), (1950, 117), ..., (1996, 1462), (1997, 1469), (1998, 1467), (1999, 1474).

4 Linear fit... Not so good...

5 Better linear fit... Take logarithm of y and fit a straight line

6 Transform back to the original scale. So-so...
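A minimal sketch of the fit-the-log-then-transform-back idea from the last three slides, using only the observation pairs listed on the regression slide (the intermediate years hidden behind "..." are simply omitted here):

```python
# Sketch of the two fits described above, on the listed (year, value) pairs only.
import numpy as np

years = np.array([1949, 1950, 1996, 1997, 1998, 1999], dtype=float)
values = np.array([100, 117, 1462, 1469, 1467, 1474], dtype=float)

# Plain straight-line fit on the raw values ("not so good").
slope, intercept = np.polyfit(years, values, deg=1)

# Better: fit a straight line to log(y), then transform back to the original scale.
log_slope, log_intercept = np.polyfit(years, np.log(values), deg=1)
back_transformed = np.exp(log_slope * years + log_intercept)

print(back_transformed.round(1))
```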

7 So what is regression about? Construct a model of a process, using examples of the process. Input: x (possibly a vector). Output: y (generated by the process). Examples: pairs of input and output {x, y}. Our model: the function f(x) is our estimate of the true function g(x).

8 Assumption about the process: the "fixed regressor model" y(n) = g[x(n)] + ε(n), where x(n) is the observed input, y(n) the observed output, g[x(n)] the true underlying function, and ε(n) an i.i.d. noise process with zero mean. Data set: D = {(x(n), y(n)), n = 1, ..., N}.

9 Example (figure: example data illustrating the noise variance σ²).

10 Model Sets (examples). True function: g(x) = 0.5 + x + x^2 + 6x^3. Candidate model sets: F_1 = {a + bx} (linear); F_2 = {a + bx + cx^2} (quadratic); F_3 = {a + bx + cx^2 + dx^3} (cubic); note that F_1 ⊂ F_2 ⊂ F_3.
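As an illustration of the three model sets, here is a small sketch that fits each family to noisy samples of g, following the fixed regressor model from slide 8; the noise level and sample size are arbitrary choices, not from the slides:

```python
# Fit linear, quadratic, and cubic polynomials to noisy samples of
# g(x) = 0.5 + x + x^2 + 6x^3.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 30)
g = 0.5 + x + x**2 + 6 * x**3
y = g + rng.normal(scale=0.5, size=x.shape)   # fixed regressor model: y = g(x) + noise

for degree, name in [(1, "linear"), (2, "quadratic"), (3, "cubic")]:
    coeffs = np.polyfit(x, y, deg=degree)
    mse = np.mean((np.polyval(coeffs, x) - y) ** 2)
    print(f"{name}: training MSE = {mse:.3f}")
```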

11 Idealized regression: choose an appropriate model family F (our hypothesis set) and find f_opt(x) ∈ F with minimum "distance" to the true function g(x) (the "error").

12 How do we measure "distance"? Q: What is the distance (difference) between the functions f and g?
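The slide leaves the question open; one common choice (an assumption here, not stated on the slide) is the squared L2 distance between the two functions, approximated numerically on a grid:

```python
# Numerically approximate the integral of (f - g)^2 over [0, 1].
import numpy as np

x = np.linspace(0, 1, 1000)
f = lambda t: t**2
g = lambda t: t**2 + 0.1 * np.sin(2 * np.pi * t)

l2_squared = np.trapz((f(x) - g(x)) ** 2, x)
print(l2_squared)
```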

13 Margin Slack Variable. For an example (x_i, y_i) and function f, the margin slack variable measures how far the prediction error |y_i − f(x_i)| exceeds θ − γ, where θ is the target accuracy in testing and γ is the difference between the target accuracy and the margin achieved in training.

14 ε-Insensitive Loss Function. Let ε = θ − γ, so the margin slack variable becomes ξ_i = max(0, |y_i − f(x_i)| − ε). Linear ε-insensitive loss: L(y, f(x)) = max(0, |y − f(x)| − ε), written |y − f(x)|_ε. Quadratic ε-insensitive loss: L(y, f(x)) = |y − f(x)|_ε^2.
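A short sketch of the two loss functions as just defined (the vectorized form is my own phrasing, not from the slides):

```python
import numpy as np

def linear_eps_insensitive(y, f_x, eps):
    """Linear eps-insensitive loss: zero inside the eps-band, linear outside."""
    return np.maximum(0.0, np.abs(y - f_x) - eps)

def quadratic_eps_insensitive(y, f_x, eps):
    """Quadratic eps-insensitive loss: the linear loss, squared."""
    return linear_eps_insensitive(y, f_x, eps) ** 2

print(linear_eps_insensitive(np.array([1.0, 2.0, 5.0]), 2.0, eps=0.5))
# -> [0.5 0.  2.5]
```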

15 Linear ε-Insensitive Loss gives a linear SV machine (figure: the ε-tube around the fitted line; slack variables ξ and ξ* measure by how much y_i lies above or below the tube).

16 Basic Idea of SV Regression. Starting point: we have input data X = {(x_1, y_1), ..., (x_N, y_N)}. Goal: we want to find a robust function f(x) that has at most ε deviation from the targets y, while at the same time being as flat as possible. Idea: simple regression problem + optimization + kernel trick.

17 Primal Regression Problem. Thus, setting f(x) = ⟨w, x⟩ + b, the primal regression problem is: min (1/2)||w||^2 + C Σ_i (ξ_i + ξ_i*) subject to y_i − ⟨w, x_i⟩ − b ≤ ε + ξ_i, ⟨w, x_i⟩ + b − y_i ≤ ε + ξ_i*, and ξ_i, ξ_i* ≥ 0.

18 Linear ε-Insensitive Loss Regression. In the minimization above, ε determines the insensitive zone and C sets the trade-off between the training error and ||w||; ε and C must be tuned simultaneously. Does this make regression more difficult than classification?
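For concreteness, a hypothetical sketch of this primal problem written as a quadratic program in cvxpy; the data, ε, and C values are illustrative and not from the slides:

```python
# Solve min 0.5*||w||^2 + C*sum(xi + xi_star) subject to the eps-band constraints.
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 2))
y = X @ np.array([1.0, -2.0]) + 0.5 + rng.normal(scale=0.1, size=30)

eps, C = 0.1, 1.0
w = cp.Variable(2)
b = cp.Variable()
xi = cp.Variable(30, nonneg=True)        # slack for points above the eps-band
xi_star = cp.Variable(30, nonneg=True)   # slack for points below the eps-band

objective = cp.Minimize(0.5 * cp.sum_squares(w) + C * cp.sum(xi + xi_star))
constraints = [y - X @ w - b <= eps + xi,
               X @ w + b - y <= eps + xi_star]
cp.Problem(objective, constraints).solve()
print(w.value, b.value)
```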

19 Parameters used in SV Regression

20 Dual Formulation. The Lagrangian function helps us formulate the dual problem. Notation: ε is the insensitive-loss parameter; β_i and β_i* are the Lagrange multipliers; ξ_i is the slack for points above the ε-band and ξ_i* the slack for points below it. Setting the derivatives of the Lagrangian with respect to the primal variables to zero gives the optimality conditions.

21 Dual Formulation (cont'd). Dual problem: max Σ_i y_i(β_i* − β_i) − ε Σ_i (β_i* + β_i) − (1/2) Σ_i Σ_j (β_i* − β_i)(β_j* − β_j)⟨x_i, x_j⟩ subject to Σ_i (β_i* − β_i) = 0 and 0 ≤ β_i, β_i* ≤ C. Solving this quadratic program gives w = Σ_i (β_i* − β_i) x_i.

22 KKT Optimality Conditions and b. The KKT optimality conditions require the products of the dual variables and their constraints to vanish at the solution. Hence b can be computed from any support vector with 0 < β_i < C, e.g. b = y_i − ⟨w, x_i⟩ − ε. The conditions also mean that the Lagrange multipliers are non-zero only for points on or outside the ε-band; these points are the support vectors.

23 The Idea of SVM (figure: a nonlinear map Φ takes the data from the input space to a feature space in which a linear function can be fitted).

24 Kernel Version. Why can we use a kernel? The complexity of the function's representation depends only on the number of SVs, and the complete algorithm can be described in terms of inner products. A kernel K(x, z) = ⟨Φ(x), Φ(z)⟩ therefore provides an implicit mapping to the feature space, giving f(x) = Σ_i (β_i* − β_i) K(x_i, x) + b.
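A sketch of this kernel-expansion prediction, assuming the dual coefficients (β_i* − β_i) and b have already been obtained by solving the dual problem; the RBF kernel and the numbers are illustrative choices:

```python
# Predict f(x) = sum_i (beta_i* - beta_i) K(x_i, x) + b from given dual coefficients.
import numpy as np

def rbf_kernel(a, b, gamma=1.0):
    """Gaussian (RBF) kernel matrix between two sets of points."""
    sq_dists = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return np.exp(-gamma * sq_dists)

def predict(x_new, x_sv, dual_coef, b, gamma=1.0):
    """dual_coef[i] plays the role of (beta_i* - beta_i) for support vector i."""
    return rbf_kernel(x_new, x_sv, gamma) @ dual_coef + b

# Tiny hypothetical example: two support vectors in 1-D.
x_sv = np.array([[0.0], [1.0]])
dual_coef = np.array([0.7, -0.3])
print(predict(np.array([[0.5]]), x_sv, dual_coef, b=0.1))
```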

25 Quadratic ε-Insensitive Loss Regression. Problem: min (1/2)||w||^2 + C Σ_i (ξ_i^2 + ξ_i*^2) subject to the same ε-band constraints as before (explicit non-negativity constraints on the slacks are no longer needed). In the kernel formulation, the quadratic slack term appears as an extra 1/C added to the diagonal of the kernel matrix.

26 Kernel Ridge Regression & Gaussian Processes. With ε = 0 the problem reduces to least-squares linear regression with a weight-decay factor controlled by C: min λ||w||^2 + Σ_i ξ_i^2 (λ ~ 1/C) subject to y_i − ⟨w, x_i⟩ = ξ_i. The kernel formulation has the closed-form solution α = (K + λI)^−1 y (I: identity matrix), so f(x) = Σ_i α_i K(x_i, x), which is also the mean of a Gaussian process posterior distribution.
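A minimal numerical sketch of the closed-form kernel ridge solution α = (K + λI)^−1 y; the RBF kernel, λ, and data are illustrative choices, not from the slides:

```python
import numpy as np

def rbf_kernel(a, b, gamma=1.0):
    sq_dists = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return np.exp(-gamma * sq_dists)

rng = np.random.default_rng(1)
X = rng.uniform(-2, 2, size=(25, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=25)

lam = 0.1                                # lambda ~ 1/C, the weight-decay factor
K = rbf_kernel(X, X)
alpha = np.linalg.solve(K + lam * np.eye(len(X)), y)

X_test = np.array([[0.0], [1.0]])
f_test = rbf_kernel(X_test, X) @ alpha   # prediction = GP posterior mean
print(f_test)
```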

27 Architecture of the SV Regression Machine (figure: the input is compared with the support vectors through kernel units, whose outputs are weighted, summed, and offset by the bias b), similar to regression in a three-layered neural network!?

28 Conclusion. SVM is a useful alternative to neural networks. Two key concepts of SVM: optimization and the kernel trick. Advantages of SV regression: the solution is represented by a small subset of the training points; the existence of a global minimum is ensured; the optimization of a reliable generalization bound is ensured.

29 Discussion 1: Influence of the insensitivity band on regression quality. 17 measured training data points are used. Left: ε = 0.1, and 15 SVs are chosen. Right: ε = 0.5, and the 6 chosen SVs produce a much better regression function.
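The original figure and its 17 measured points are not reproduced here; the following sketch only reproduces the qualitative effect on synthetic data, using scikit-learn's SVR (an assumption, not the tool used on the slide): a larger ε typically leaves fewer support vectors.

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(42)
X = np.sort(rng.uniform(0, 5, size=(17, 1)), axis=0)
y = np.sin(X).ravel() + rng.normal(scale=0.15, size=17)

for eps in (0.1, 0.5):
    model = SVR(kernel="rbf", C=10.0, epsilon=eps).fit(X, y)
    print(f"epsilon={eps}: {len(model.support_)} support vectors")
```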

30 Discussion 2: ε-Insensitive Loss. It enables sparseness within the SVs, but does it guarantee sparseness? It is robust (robust to small changes in the data/model) and less sensitive to outliers.

