Published by Genevieve Mallery. Modified over 3 years ago.

1
VC theory, Support vectors and Hedged prediction technology

2
Overfitting in classification
Assume a family C of classifiers of points in feature space F. A family of classifiers is a map from C × F to {0,1} (negative and positive class). For each subset X of F and each c in C, c(X) defines a partitioning of X into two classes. C shatters X if every partitioning of X is accomplished by some c in C. If some point set X of size d is shattered by C, then the VC-dimension is at least d. If no point set of d+1 elements can be shattered by C, then the VC-dimension is at most d.

3
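VC-dimension of hyperplanes
The set of oriented points on the line (thresholds) shatters any two distinct points, but no three. The set of lines in the plane shatters any three non-collinear points, but no four points. In general, d+1 points in general position in E^d can be shattered, but any d+2 points in E^d can be partitioned into two blocks whose convex hulls intersect (Radon's theorem), so no hyperplane can separate those two blocks. The VC-dimension of hyperplanes in E^d is thus d+1.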
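The one-dimensional case can be checked by brute force. The sketch below assumes a classifier on the line is a cut position plus an orientation (which side is positive); the function names `threshold_labelings` and `is_shattered` are illustrative and do not come from the slides.

```python
from itertools import product


def threshold_labelings(points):
    """Return all 0/1 labelings of `points` achievable by an oriented threshold.

    A classifier is a cut position t plus an orientation: either the points
    to the right of t are positive, or the points to the left are.
    """
    pts = sorted(set(points))
    # Candidate cuts: below all points, between neighbours, above all points.
    cuts = [pts[0] - 1.0]
    cuts += [(a + b) / 2 for a, b in zip(pts, pts[1:])]
    cuts += [pts[-1] + 1.0]

    achieved = set()
    for t in cuts:
        for right_is_positive in (True, False):
            labels = tuple(int((x > t) == right_is_positive) for x in points)
            achieved.add(labels)
    return achieved


def is_shattered(points):
    """True if every 0/1 labeling of `points` is achieved by some threshold."""
    wanted = set(product((0, 1), repeat=len(points)))
    return wanted <= threshold_labelings(points)


two_points = is_shattered([0.0, 1.0])
three_points = is_shattered([0.0, 1.0, 2.0])
print(two_points, three_points)
```

For three points, the labelings that put the middle point in a different class from both outer points are never produced, which is why the VC-dimension in one dimension is 2.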

4
Why VC-dimension?
Elegant and pedagogical, but not very useful in practice. It bounds the future error of a classifier (PAC-learning). Assume an exchangeable distribution of (xi, yi). For the first N points, the training error for c is the observed error rate of c. How good it is to select from C the classifier with the best performance on the training set depends on the VC-dimension h:
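One commonly quoted form of this dependence is Vapnik's bound; this is an assumption here, and the slide may use a different variant. With probability at least 1 - eta, the true error is at most the training error plus sqrt((h*(ln(2N/h) + 1) - ln(eta/4)) / N). The sketch below only evaluates that second term; the name `vc_confidence` and the default eta = 0.05 are illustrative choices.

```python
import math


def vc_confidence(h, n, eta=0.05):
    """Confidence term of Vapnik's bound (assumed form, see lead-in).

    h   : VC-dimension of the classifier family
    n   : number of training points (N on the slide)
    eta : allowed probability that the bound fails
    """
    return math.sqrt((h * (math.log(2 * n / h) + 1) - math.log(eta / 4)) / n)


# The term grows with the VC-dimension and shrinks with more training data.
for h in (3, 30):
    for n in (1_000, 10_000):
        print(f"h={h:3d} N={n:6d} term={vc_confidence(h, n):.3f}")
```

The term is often too loose to be informative for realistic h and N, which fits the slide's remark that the bound is more pedagogical than useful.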

5
Why VC-dimension?

6
Classify with hyperplanes
Frank Rosenblatt (1928–1971): pioneering work in classifying by hyperplanes in high-dimensional spaces. Criticized by Minsky and Papert, since real classes are not normally linearly separable. ANN research was taken up again in the 1980s, with non-linear mappings to get improved separation. A predecessor to SVM/kernel methods.

7
Find parallel hyperplanes
Separate examples by wide-margin hyperplanes (classification). Enclose examples between hyperplanes (regression). If necessary, non-linearly map examples to a high-dimensional space where they are better separated.

8
Find parallel hyperplanes: classification
Red: true separating plane. Blue: wide-margin separation in sample. Classify by the plane between the blue planes.

9
Find parallel hyperplanes: regression
Red: true central plane. Blue: narrowest margin enclosing sample. For a new xk, predict yk so that (xk, yk) lies on the mid-plane (dotted).

15
From vector to scalar product

16
Soft Margins

17
Quadratic programming also goes through with soft margins. Specification of the softness constant C is part of most packages. However, no a priori rule for setting C is established, and experimentation is necessary for each application. The choice is between narrowing the margin, allowing more outliers, and using a more liberal kernel (to be described).
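To make the role of C concrete, the sketch below evaluates the usual soft-margin primal objective, 0.5*||w||^2 + C * (sum of slacks), for a fixed hyperplane. This standard formulation is assumed here rather than taken from the slides, as is the convention that a larger C penalizes margin violations more heavily. The function name and the toy data are illustrative.

```python
def soft_margin_objective(w, b, xs, ys, C):
    """Return (objective, slacks) for the hyperplane w.x + b = 0.

    ys holds labels in {-1, +1}. The slack of a point is how far it falls
    short of the margin: max(0, 1 - y * (w.x + b)).
    """
    margin_term = 0.5 * sum(wi * wi for wi in w)
    slacks = []
    for x, y in zip(xs, ys):
        score = sum(wi * xi for wi, xi in zip(w, x)) + b
        slacks.append(max(0.0, 1.0 - y * score))
    return margin_term + C * sum(slacks), slacks


# Two well-separated points plus one positive outlier on the wrong side.
xs = [(0.0, 0.0), (2.0, 0.0), (0.5, 0.0)]
ys = [-1, 1, 1]
w, b = (1.0, 0.0), -1.0

for C in (1.0, 10.0):
    value, slacks = soft_margin_objective(w, b, xs, ys, C)
    print(f"C={C}: objective={value}, slacks={slacks}")
```

Only the outlier has non-zero slack. Raising C makes that single violation dominate the objective, which pushes an optimizer toward a narrower margin that accommodates it.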

18
SVM packages
Inputs: xi, yi, plus KERNEL and SOFTNESS information. The only output is a vector of coefficients; non-zero coefficients indicate support vectors. The hyperplane is obtained from these coefficients.
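In the standard dual formulation, the hyperplane follows from the coefficients as w = sum_i alpha_i * y_i * x_i, with b chosen so that a support vector lies exactly on the margin. The sketch below assumes that formulation, a linear kernel, and a hard margin; the symbol alpha and the function name are conventions adopted here, not taken from the slides.

```python
def hyperplane_from_dual(alphas, xs, ys, tol=1e-12):
    """Recover (w, b, support_indices) from dual coefficients.

    alphas : one non-negative coefficient per training point
    xs, ys : training points and their labels in {-1, +1}
    """
    dim = len(xs[0])
    # w is a weighted sum of the training points; only support vectors count.
    w = [sum(a * y * x[k] for a, x, y in zip(alphas, xs, ys)) for k in range(dim)]
    support = [i for i, a in enumerate(alphas) if a > tol]
    # Hard-margin assumption: a support vector satisfies y * (w.x + b) = 1.
    i = support[0]
    b = ys[i] - sum(wk * xk for wk, xk in zip(w, xs[i]))
    return w, b, support


xs = [(0.0, 0.0), (2.0, 0.0), (3.0, 1.0)]
ys = [-1, 1, 1]
alphas = [0.5, 0.5, 0.0]  # hand-computed for this toy set; the third point is not a support vector

w, b, support = hyperplane_from_dual(alphas, xs, ys)
print(w, b, support)
```

With a soft margin, b should instead be computed from a support vector whose coefficient lies strictly between 0 and C.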

19
Kernel Trick

21
Example: 2D space (x1, x2). Map to 5D space (c1*x1, c2*x2, c3*x1^2, c4*x1*x2, c5*x2^2).
K(x,y) = (x·y + 1)^2 = 2*x1*y1 + 2*x2*y2 + x1^2*y1^2 + x2^2*y2^2 + 2*x1*x2*y1*y2 + 1 = φ(x)·φ(y) + 1,
where φ(x) = φ((x1,x2)) = (√2*x1, √2*x2, x1^2, √2*x1*x2, x2^2).
Hyperplanes in R^5 are mapped back to conic sections in R^2!
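The identity can be checked numerically. The sketch below is a minimal check under the definitions in the example; the helper names `poly_kernel`, `phi` and `dot` are illustrative. The 5D map carries no constant coordinate, so the kernel equals the 5D scalar product plus 1.

```python
import math


def poly_kernel(x, y):
    """Quadratic kernel (x.y + 1)^2 computed directly in 2D."""
    return (x[0] * y[0] + x[1] * y[1] + 1) ** 2


def phi(x):
    """Explicit map from 2D to 5D used in the example."""
    x1, x2 = x
    r2 = math.sqrt(2)
    return (r2 * x1, r2 * x2, x1 * x1, r2 * x1 * x2, x2 * x2)


def dot(u, v):
    """Ordinary scalar product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


x, y = (1.0, 2.0), (3.0, -1.0)
in_2d = poly_kernel(x, y)
in_5d = dot(phi(x), phi(y)) + 1  # constant term added separately
print(in_2d, in_5d, math.isclose(in_2d, in_5d))
```

The kernel needs one 2D scalar product and a square, while the explicit route needs a 5D scalar product; the gap widens quickly with higher degree and dimension.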

22
Kernel Trick
Gaussian kernel: K(x,y) = exp(-||x-y||^2 / (2σ^2))
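A minimal sketch of the Gaussian kernel, assuming the usual width parameter sigma in the denominator (2 * sigma^2). The parameter name `sigma` and its default value are assumptions made for illustration.

```python
import math


def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian (RBF) kernel: exp(-||x - y||^2 / (2 * sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2 * sigma ** 2))


same = gaussian_kernel((0.0, 0.0), (0.0, 0.0))
near = gaussian_kernel((0.0, 0.0), (1.0, 1.0))
wide = gaussian_kernel((0.0, 0.0), (1.0, 1.0), sigma=3.0)
print(same, near, wide)
```

The kernel value is 1 for identical points and decays with distance. A larger sigma makes distant points look more similar, which gives a smoother decision surface.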


