Presentation on theme: "Mehdi Ghayoumi MSB rm 132 Ofc hr: Thur, 11-12 a Machine Learning."— Presentation transcript:

1 Mehdi Ghayoumi, MSB rm 132, mghayoum@kent.edu, Office hours: Thur 11-12 am. Machine Learning

2 Announcements: HW1 Feb 11 and HW2 Mar 11. Programming HW1 Feb 25 and Programming HW2 Apr 1. Quiz 1 Mar 11 and Quiz 2 Apr 8. First project report Mar 16 and 18; second project report Apr 13 and 15. Apr 29: exam review. May 4: final exam.

3 Machine Learning Announcements: Project presentation this Monday, Mar 2, from 2 to 4, Room 160 MSB.

4 Machine Learning

5

6 Regularization: ridge regression, Lasso, group Lasso, graph-guided group Lasso, (weighted) total variation, ℓ1-ℓ2 mixed norms, elastic net, fused Lasso, ℓp and ℓ0 regularization, sparse coding, hierarchical sparse coding, nuclear norm, trace norm, K-SVD, basis pursuit denoising, quadratic basis pursuit denoising, sparse portfolio optimization, low-rank approximation, overlapped Schatten 1-norm, latent Schatten 1-norm, tensor completion, tensor denoising, composite penalties, sparse multicomposite problems, sparse PCA, structured sparsity;

7 Machine Learning Optimization: forward-backward splitting, alternating directions method of multipliers (ADMM), alternating proximal algorithms, gradient projection, stochastic generalized gradient, bilevel optimization, non-convex proximal splitting, block coordinate descent algorithm, parallel coordinate descent algorithm, stochastic gradient descent, semidefinite programming, robust iterative hard thresholding, alternating nonnegative least squares, hierarchical alternating least squares, conditional gradient algorithm, Frank-Wolfe algorithm, hybrid conditional gradient-smoothing algorithm;

8 Machine Learning Kernels and support vector machines: support estimation, kernel PCA, subspace learning, reproducing kernel Hilbert spaces, output kernel learning, multi-task learning, support vector machines, least squares support vector machines, Lagrange duality, primal and dual representations, feature map, Mercer kernel, Nyström approximation, Fenchel duality, kernel ridge regression, kernel mean embedding, causality, importance reweighting, domain adaptation, multilayer support vector machine, Gaussian process regression, kernel recursive least-squares, dictionary learning, quantized kernel LMS;

9 Machine Learning

10 SVM is related to statistical learning theory. SVM was first introduced in 1992. SVM became popular because of its success in handwritten digit recognition. SVM is now regarded as an important example of "kernel methods", one of the key areas in machine learning.

11 An SVM is an abstract learning machine which will learn from a training data set and attempt to generalize and make correct predictions on novel data.

12 (Figure: a separating hyperplane with margin ρ; r is the distance from a point to it.) The distance from example x_i to the separator is r = y_i (w^T x_i + b) / ||w||.
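A one-line check of this distance formula in numpy, with a made-up separator and point (none of the numbers come from the slides):

```python
import numpy as np

# Made-up separator and labeled point, purely for illustration.
w, b = np.array([3.0, 4.0]), -1.0
x_i, y_i = np.array([2.0, 1.0]), 1

# r = y_i (w^T x_i + b) / ||w||
r = y_i * (w @ x_i + b) / np.linalg.norm(w)
print(r)   # (3*2 + 4*1 - 1) / 5 = 1.8
```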

13 Let the training set {(x_i, y_i)}, i = 1..n, with x_i ∈ R^d and y_i ∈ {-1, 1}, be separated by a hyperplane with margin ρ. Then for each training example (x_i, y_i): w^T x_i + b ≤ -ρ/2 if y_i = -1, and w^T x_i + b ≥ ρ/2 if y_i = 1, which combine to y_i (w^T x_i + b) ≥ ρ/2.

14 For every support vector x_s the above inequality is an equality. After rescaling w and b by ρ/2 in the equality, we obtain that the distance between each x_s and the hyperplane is r = y_s (w^T x_s + b) / ||w|| = 1 / ||w||. Then the margin can be expressed through the (rescaled) w and b as ρ = 2r = 2 / ||w||.

15 Then we can formulate the quadratic optimization problem: Find w and b such that ρ = 2 / ||w|| is maximized, and for all (x_i, y_i), i = 1..n: y_i (w^T x_i + b) ≥ 1.

16 Which can be reformulated as: Find w and b such that Φ(w) = ||w||^2 = w^T w is minimized, and for all (x_i, y_i), i = 1..n: y_i (w^T x_i + b) ≥ 1.
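A minimal numerical sketch of this hard-margin primal, assuming the cvxpy package and a hypothetical linearly separable toy dataset (X, y, and the variable names are illustration only, not from the slides):

```python
import numpy as np
import cvxpy as cp

# Hypothetical linearly separable toy data (not from the slides).
X = np.array([[2.0, 2.0], [3.0, 3.0], [-1.0, -1.0], [-2.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
n, d = X.shape

w = cp.Variable(d)
b = cp.Variable()

# Minimize Phi(w) = w^T w subject to y_i (w^T x_i + b) >= 1 for all i.
objective = cp.Minimize(cp.sum_squares(w))
constraints = [cp.multiply(y, X @ w + b) >= 1]
cp.Problem(objective, constraints).solve()

print("w =", w.value, "b =", b.value, "margin =", 2 / np.linalg.norm(w.value))
```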

17 Need to optimize a quadratic function subject to linear constraints. The solution involves constructing a dual problem where a Lagrange multiplier α_i is associated with every inequality constraint in the primal problem: Find α_1 … α_n such that Q(α) = Σ_i α_i - ½ Σ_i Σ_j α_i α_j y_i y_j x_i^T x_j is maximized and (1) Σ_i α_i y_i = 0, (2) α_i ≥ 0 for all i.

18 We will call this the primal formulation: L(w, b, α) = ½ w^T w - Σ_i α_i [y_i (w^T x_i + b) - 1], where the α_i are Lagrange multipliers, and thus α_i ≥ 0. At the minimum, we can take the derivatives with respect to b and w and set these to zero: ∂L/∂b = 0 gives Σ_i α_i y_i = 0, and ∂L/∂w = 0 gives w = Σ_i α_i y_i x_i.

19 Substituting these back into the Lagrangian, we get the dual formulation, also known as the Wolfe dual: Q(α) = Σ_i α_i - ½ Σ_i Σ_j α_i α_j y_i y_j x_i^T x_j, which must be maximized with respect to the α_i subject to the constraints Σ_i α_i y_i = 0 and α_i ≥ 0.

20 Given a solution α_1 … α_n to the dual problem, the solution to the primal is w = Σ_i α_i y_i x_i and b = y_k - Σ_i α_i y_i x_i^T x_k for any α_k > 0. Each non-zero α_i indicates that the corresponding x_i is a support vector. Then the classifying function is f(x) = Σ_i α_i y_i x_i^T x + b. Keep in mind that solving the optimization problem involved computing the inner products x_i^T x_j between all training points.
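A minimal sketch of solving this dual numerically and then recovering w, b, and f(x) as above, again assuming cvxpy and the same hypothetical separable toy data as the primal sketch (nothing here is from the slides):

```python
import numpy as np
import cvxpy as cp

# Hypothetical toy data (same as the primal sketch above, not from the slides).
X = np.array([[2.0, 2.0], [3.0, 3.0], [-1.0, -1.0], [-2.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
n = X.shape[0]

alpha = cp.Variable(n)

# Q(alpha) = sum_i alpha_i - 1/2 sum_ij alpha_i alpha_j y_i y_j x_i^T x_j.
# The double sum equals ||sum_i alpha_i y_i x_i||^2, which keeps the problem convex for cvxpy.
Q = cp.sum(alpha) - 0.5 * cp.sum_squares(X.T @ cp.multiply(alpha, y))
constraints = [alpha >= 0, cp.sum(cp.multiply(y, alpha)) == 0]
cp.Problem(cp.Maximize(Q), constraints).solve()

a = alpha.value
w = (a * y) @ X                       # w = sum_i alpha_i y_i x_i
k = int(np.argmax(a))                 # an index with alpha_k > 0 (a support vector)
b = y[k] - (a * y) @ (X @ X[k])       # b = y_k - sum_i alpha_i y_i x_i^T x_k

f = lambda x: (a * y) @ (X @ x) + b   # f(x) = sum_i alpha_i y_i x_i^T x + b
print("w =", w, "b =", b, "f([2, 2]) =", f(np.array([2.0, 2.0])))
```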

21 What if the training set is not linearly separable? Slack variables ξ_i can be added to allow misclassification of difficult or noisy examples; the resulting margin is called a soft margin.

22 The old formulation: Find w and b such that Φ(w) = w^T w is minimized, and for all (x_i, y_i), i = 1..n: y_i (w^T x_i + b) ≥ 1. The modified formulation incorporates slack variables: Find w, b, and ξ_i such that Φ(w) = w^T w + C Σ_i ξ_i is minimized, and for all (x_i, y_i), i = 1..n: y_i (w^T x_i + b) ≥ 1 - ξ_i, with ξ_i ≥ 0. The parameter C can be viewed as a way to control overfitting.
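A minimal sketch of this soft-margin primal, assuming cvxpy, a hypothetical non-separable toy dataset, and an arbitrary choice of C (none of these come from the slides):

```python
import numpy as np
import cvxpy as cp

# Hypothetical non-separable toy data (not from the slides): the point (2.2, 2.2)
# with label -1 lies between two +1 points, so no hyperplane separates the classes.
X = np.array([[2.0, 2.0], [3.0, 3.0], [1.0, 2.0], [-1.0, -1.0], [-2.0, 0.0], [2.2, 2.2]])
y = np.array([1.0, 1.0, 1.0, -1.0, -1.0, -1.0])
n, d = X.shape
C = 1.0                                   # trade-off between margin size and total slack

w, b, xi = cp.Variable(d), cp.Variable(), cp.Variable(n)

# Minimize w^T w + C * sum_i xi_i subject to y_i (w^T x_i + b) >= 1 - xi_i, xi_i >= 0.
objective = cp.Minimize(cp.sum_squares(w) + C * cp.sum(xi))
constraints = [cp.multiply(y, X @ w + b) >= 1 - xi, xi >= 0]
cp.Problem(objective, constraints).solve()

print("w =", w.value, "b =", b.value, "total slack =", xi.value.sum())
```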

23 The dual problem is identical to the separable case: Find α_1 … α_n such that Q(α) = Σ_i α_i - ½ Σ_i Σ_j α_i α_j y_i y_j x_i^T x_j is maximized and (1) Σ_i α_i y_i = 0, (2) 0 ≤ α_i ≤ C for all i. Again, the x_i with non-zero α_i will be support vectors. The solution to the dual problem is w = Σ_i α_i y_i x_i, b = y_k (1 - ξ_k) - Σ_i α_i y_i x_i^T x_k for any k s.t. α_k > 0, and f(x) = Σ_i α_i y_i x_i^T x + b.
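For comparison, a sketch using scikit-learn's linear-kernel SVC, an off-the-shelf solver for this same soft-margin dual (the toy data and C value are again hypothetical, not from the slides); it exposes the support vectors and the products α_i y_i directly:

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical toy data (same as the soft-margin sketch above, not from the slides).
X = np.array([[2.0, 2.0], [3.0, 3.0], [1.0, 2.0], [-1.0, -1.0], [-2.0, 0.0], [2.2, 2.2]])
y = np.array([1, 1, 1, -1, -1, -1])

clf = SVC(kernel="linear", C=1.0).fit(X, y)

print("support vectors:\n", clf.support_vectors_)   # the x_i with non-zero alpha_i
print("alpha_i * y_i:", clf.dual_coef_)             # signed dual coefficients
print("w:", clf.coef_, "b:", clf.intercept_)        # recovered primal solution
print("prediction for [1, 1]:", clf.predict([[1.0, 1.0]]))
```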

24 Vapnik has proved the following: the class of optimal linear separators has VC dimension h bounded as h ≤ min(⌈D²/ρ²⌉, m_0) + 1, where ρ is the margin, D is the diameter of the smallest sphere that can enclose all of the training examples, and m_0 is the dimensionality.
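A small worked example of this bound with made-up numbers (D, ρ, and m_0 below are illustrative values, not from the slides):

```python
import math

# Illustrative values, not from the slides: enclosing-sphere diameter, margin, dimensionality.
D, rho, m0 = 10.0, 2.0, 50

h_bound = min(math.ceil(D**2 / rho**2), m0) + 1
print(h_bound)   # min(ceil(100 / 4), 50) + 1 = 26
```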

25 (Figure: 1-D datasets plotted along an x axis, with the hard case mapped to a second dimension x².) Datasets that are linearly separable with some noise work out great. But what are we going to do if the dataset is just too hard? How about mapping the data to a higher-dimensional space:

26 General idea: the original feature space can always be mapped to some higher-dimensional feature space where the training set is separable: Φ: x → φ(x).
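A tiny illustration of such a feature map, using a hypothetical 1-D dataset and the quadratic map φ(x) = (x, x²); the specific map and data are assumptions for the example, not from the slides:

```python
import numpy as np

# Hypothetical 1-D dataset: class +1 sits between the two groups of class -1,
# so no single threshold on x separates the labels.
x = np.array([-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0])
y = np.array([-1, -1, 1, 1, 1, -1, -1])

# Quadratic feature map phi(x) = (x, x^2): in this 2-D space the classes
# are separated by the horizontal line x^2 = 2.5.
phi = np.column_stack([x, x**2])

predictions = np.where(phi[:, 1] < 2.5, 1, -1)
print(predictions)                       # matches y: the mapped data is linearly separable
print(np.array_equal(predictions, y))    # True
```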

27 Thank you!

