Supervised Learning Recap

1 Supervised Learning Recap
Machine Learning

2 Last Time Support Vector Machines Kernel Methods

3 Today Review of Supervised Learning Unsupervised Learning
(Soft) K-means clustering
Expectation Maximization
Spectral Clustering
Principal Components Analysis
Latent Semantic Analysis

4 Supervised Learning Linear Regression Logistic Regression
Graphical Models Hidden Markov Models Neural Networks Support Vector Machines Kernel Methods

5 Major concepts Gaussian, Multinomial, Bernoulli Distributions
Joint vs. Conditional Distributions
Marginalization
Maximum Likelihood
Risk Minimization
Gradient Descent
Feature Extraction, Kernel Methods

6 Some favorite distributions
Bernoulli Multinomial Gaussian

7 Maximum Likelihood Identify the parameter values that yield the maximum likelihood of generating the observed data: take the partial derivative of the likelihood function, set it to zero, and solve. NB: the maximum likelihood parameters are the same as the maximum log likelihood parameters.

8 Maximum Log Likelihood
Why do we like the log function? It turns products (difficult to differentiate) into sums (easy to differentiate): log(xy) = log(x) + log(y) and log(x^c) = c log(x).
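As a quick illustration (not from the slides), the sketch below fits a Bernoulli parameter by maximum likelihood: the log turns the product of Bernoulli terms into a sum, and setting the derivative to zero gives the sample mean. The data and the grid check are made-up assumptions.

```python
import numpy as np

x = np.array([1, 0, 1, 1, 0, 1, 1, 0, 1, 1])   # assumed toy coin flips

def log_likelihood(p, x):
    # the log turns the product of Bernoulli terms into a sum
    return np.sum(x * np.log(p) + (1 - x) * np.log(1 - p))

# Setting d/dp log L = sum(x)/p - (n - sum(x))/(1 - p) to zero gives p = mean(x)
p_mle = x.mean()

# Numerical check: no candidate on a grid beats the closed form
grid = np.linspace(0.01, 0.99, 99)
best_on_grid = grid[np.argmax([log_likelihood(p, x) for p in grid])]
print(p_mle, best_on_grid)   # both close to 0.7
```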

9 Risk Minimization Pick a loss function
Squared loss
Linear loss
Perceptron (classification) loss
Identify the parameters that minimize the loss function: take the partial derivative of the loss function, set it to zero, and solve.
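For concreteness, here is a minimal sketch (mine, not the slides') of that recipe for squared loss with a linear model: differentiating (1/2)||Xw - y||^2 and setting the gradient X^T(Xw - y) to zero yields the normal equations. The toy data, noise level, and true weights are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.c_[np.ones(50), rng.normal(size=50)]           # bias column + one feature
y = X @ np.array([1.0, 2.0]) + 0.1 * rng.normal(size=50)

w = np.linalg.solve(X.T @ X, X.T @ y)                 # "set to zero and solve"
print(w)                                              # close to [1.0, 2.0]
print(0.5 * np.sum((X @ w - y) ** 2))                 # the minimized squared loss
```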

10 Frequentists v. Bayesians
Point estimates vs. Posteriors
Risk Minimization vs. Maximum Likelihood
L2-Regularization
Frequentists: add a constraint on the size of the weight vector
Bayesians: introduce a zero-mean prior on the weight vector
Result is the same!

11 L2-Regularization
Frequentists: introduce a cost on the size of the weights
Bayesians: introduce a prior on the weights
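A hedged sketch of why the two views coincide: minimizing squared loss plus lambda * ||w||^2 (the frequentist cost) has the closed form w = (X^T X + lambda I)^{-1} X^T y, which is also the MAP estimate under a zero-mean Gaussian prior on w. The data and the value of lambda below are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
y = X @ np.array([0.5, -1.0, 2.0]) + 0.1 * rng.normal(size=100)

lam = 1.0                                             # assumed regularization strength
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(3), X.T @ y)
print(w_ridge)                                        # shrunk toward zero vs. ordinary least squares
```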

12 Types of Classifiers
Generative Models: highest resource requirements; need to approximate the joint probability.
Discriminative Models: moderate resource requirements; typically fewer parameters to approximate than generative models.
Discriminant Functions: can be trained probabilistically, but the output does not include confidence information.

13 Linear Regression Fit a line to a set of points

14 Linear Regression
Extension to higher dimensions
Polynomial fitting
Arbitrary function fitting: wavelets, radial basis functions
Classifier output
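As an illustrative sketch (the noisy sine curve and degree-5 polynomial features are assumptions), the same least-squares machinery extends beyond fitting a line once the input is expanded into basis features; wavelets or radial basis functions would simply swap in a different design matrix.

```python
import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(-1, 1, 40)
y = np.sin(3 * x) + 0.1 * rng.normal(size=x.size)

degree = 5
Phi = np.vander(x, degree + 1)              # columns are x^5 ... x^0 (polynomial basis)
w, *_ = np.linalg.lstsq(Phi, y, rcond=None) # ordinary least squares on the expanded features
y_hat = Phi @ w                              # fitted curve at the training points
print(np.round(w, 2))
```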

15 Logistic Regression Fit Gaussians to the data for each class
The decision boundary is where the PDFs cross. There is no closed-form solution for the gradient, so use gradient descent.
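A minimal sketch of logistic regression trained by gradient descent (not the slides' code; the two Gaussian clusters, learning rate, and iteration count are assumptions): the gradient of the negative log-likelihood is X^T(sigma(Xw) - y), and we step against it.

```python
import numpy as np

rng = np.random.default_rng(3)
X0 = rng.normal(loc=-1.0, size=(50, 2))        # class 0 drawn from one Gaussian
X1 = rng.normal(loc=+1.0, size=(50, 2))        # class 1 from another
X = np.c_[np.ones(100), np.vstack([X0, X1])]   # add a bias column
y = np.r_[np.zeros(50), np.ones(50)]

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w = np.zeros(3)
for _ in range(2000):
    grad = X.T @ (sigmoid(X @ w) - y) / len(y)  # gradient of the negative log-likelihood
    w -= 0.5 * grad                             # gradient descent step
print(w)   # the decision boundary is where sigmoid(x . w) = 0.5
```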

16 Graphical Models General way to describe the dependence relationships between variables. Junction Tree Algorithm allows us to efficiently calculate marginals over any variable.

17 Junction Tree Algorithm
Moralization: "marry the parents" and make the graph undirected.
Triangulation: add edges so that no chordless cycle of length greater than 3 remains.
Junction Tree Construction: identify separators such that the running intersection property holds.
Introduction of Evidence: pass messages around the junction tree to generate marginals.
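As a small, hedged illustration of just the moralization step (the toy DAG below is an assumption, given as a dict mapping each node to its parents): connect co-parents and drop edge directions.

```python
from itertools import combinations

parents = {"A": [], "B": [], "C": ["A", "B"], "D": ["C"]}   # assumed toy DAG

moral_edges = set()
for child, ps in parents.items():
    for p in ps:
        moral_edges.add(frozenset((p, child)))      # undirected parent-child edge
    for p, q in combinations(ps, 2):
        moral_edges.add(frozenset((p, q)))          # "marry" the co-parents

print(sorted(tuple(sorted(e)) for e in moral_edges))
# [('A', 'B'), ('A', 'C'), ('B', 'C'), ('C', 'D')]
```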

18 Hidden Markov Models Sequential Modeling
Generative model of the relationship between observations and state (class) sequences
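To make that relationship concrete, here is a hedged sketch of the forward algorithm for a tiny HMM (the transition matrix A, emission matrix B, initial distribution pi, and observation sequence are all assumed toy values); it sums over hidden state sequences to get the likelihood of the observations.

```python
import numpy as np

A = np.array([[0.7, 0.3], [0.4, 0.6]])    # state transition probabilities
B = np.array([[0.9, 0.1], [0.2, 0.8]])    # P(observation | state)
pi = np.array([0.5, 0.5])                 # initial state distribution
obs = [0, 1, 1, 0]                        # observed symbol indices

alpha = pi * B[:, obs[0]]                 # initialize with the first observation
for o in obs[1:]:
    alpha = (alpha @ A) * B[:, o]         # propagate through states, weight by emission
print(alpha.sum())                        # P(observation sequence)
```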

19 Perceptron Step function used for squashing.
Classifier as Neuron metaphor.

20 Perceptron Loss Classification Error vs. Sigmoid Error
Loss is only calculated on mistakes; perceptrons use strictly classification error.
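A minimal sketch of that mistake-driven behaviour (the separable toy data and the number of passes are assumptions): the perceptron update fires only when a point is misclassified.

```python
import numpy as np

rng = np.random.default_rng(4)
X = np.c_[np.ones(100), rng.normal(size=(100, 2))]
y = np.sign(X @ np.array([0.2, 1.0, -1.0]))     # separable labels in {-1, +1} by construction

w = np.zeros(3)
for _ in range(50):                              # a few passes over the data
    for xi, yi in zip(X, y):
        if yi * (xi @ w) <= 0:                   # mistake (or on the boundary)
            w += yi * xi                         # perceptron update; correct points cost nothing
print(w)
```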

21 Neural Networks Interconnected Layers of Perceptrons or Logistic Regression “neurons”

22 Neural Networks There are many possible configurations of neural networks: vary the number of layers and the size of each layer.

23 Support Vector Machines
Maximum Margin Classification (figure: small margin vs. large margin)

24 Support Vector Machines
Optimization Function Decision Function
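The slide shows the optimization and decision functions as equations; as a stand-in sketch (not the slides' formulation), the snippet below trains a linear SVM by sub-gradient descent on the regularized hinge loss and classifies with sign(w . x + b). The toy data, lambda, and learning rate are assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)
X = np.vstack([rng.normal(-2, 1, (50, 2)), rng.normal(+2, 1, (50, 2))])
y = np.r_[-np.ones(50), np.ones(50)]

w, b, lam, lr = np.zeros(2), 0.0, 0.01, 0.01
for _ in range(200):
    for xi, yi in zip(X, y):
        if yi * (xi @ w + b) < 1:          # inside the margin: hinge loss is active
            w += lr * (yi * xi - lam * w)
            b += lr * yi
        else:
            w -= lr * lam * w              # only the regularizer pulls on w
print(np.sign(X @ w + b)[:5], y[:5])       # decision function on a few points
```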

25 Visualization of Support Vectors

26 Questions? Now would be a good time to ask questions about Supervised Techniques.

27 Clustering Identify discrete groups of similar data points
Data points are unlabeled

28 Recall K-Means Algorithm Select K – the desired number of clusters
Initialize K cluster centroids.
For each point in the data set, assign it to the cluster with the closest centroid.
Update the centroids based on the points assigned to each cluster.
If any data point has changed clusters, repeat.
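Here is a compact sketch of that loop (the two-blob toy data and K = 2 are assumptions, and it assumes no cluster goes empty): assign each point to its nearest centroid, recompute the centroids, and stop once no assignment changes.

```python
import numpy as np

rng = np.random.default_rng(6)
X = np.vstack([rng.normal(-3, 1, (50, 2)), rng.normal(+3, 1, (50, 2))])
K = 2

centroids = X[rng.choice(len(X), K, replace=False)]   # initialize K centroids from the data
assign = None
while True:
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    new_assign = dists.argmin(axis=1)                  # nearest centroid per point
    if assign is not None and np.array_equal(new_assign, assign):
        break                                          # no point changed clusters
    assign = new_assign
    centroids = np.array([X[assign == k].mean(axis=0) for k in range(K)])
print(centroids)   # near (-3, -3) and (+3, +3)
```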

29 k-means output

30 Soft K-means In k-means, we force every data point to belong to exactly one cluster. This constraint can be relaxed. Minimizes the entropy of cluster assignment.

31 Soft k-means example

32 Soft k-means We still define a cluster by a centroid, but we calculate the centroid as the weighted mean of all the data points. Convergence is based on a stopping threshold rather than on changed assignments.
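A hedged sketch of soft k-means (the stiffness beta, toy data, and threshold are assumptions): responsibilities come from a softmax over negative squared distances, each centroid is the responsibility-weighted mean of all points, and the loop stops when the centroids barely move.

```python
import numpy as np

rng = np.random.default_rng(7)
X = np.vstack([rng.normal(-3, 1, (50, 2)), rng.normal(+3, 1, (50, 2))])
K, beta = 2, 1.0                                       # beta is an assumed stiffness

centroids = X[rng.choice(len(X), K, replace=False)]
for _ in range(200):
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    logits = -beta * d2
    logits -= logits.max(axis=1, keepdims=True)        # numerical stability
    r = np.exp(logits)
    r /= r.sum(axis=1, keepdims=True)                  # soft assignments per point
    new_centroids = (r.T @ X) / r.sum(axis=0)[:, None] # responsibility-weighted means
    if np.abs(new_centroids - centroids).max() < 1e-6: # stopping threshold, not changed labels
        centroids = new_centroids
        break
    centroids = new_centroids
print(centroids)
```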

33 Gaussian Mixture Models
Rather than identifying clusters by "nearest" centroids, fit a set of k Gaussians to the data: p(x) = \pi_1 f_1(x) + \pi_2 f_2(x) + \ldots + \pi_k f_k(x)

34 GMM example

35 Gaussian Mixture Models
Formally, a mixture model is the weighted sum of a number of pdfs, where the weights are determined by a distribution.
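As a small sketch of that definition (the component means, standard deviations, and mixing weights below are assumed toy values), the mixture density is just the pi-weighted sum of the component pdfs, with the weights summing to one.

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

pi = np.array([0.3, 0.7])        # mixing weights (sum to 1)
mu = np.array([-2.0, 1.0])
sigma = np.array([0.5, 1.0])

x = np.linspace(-5, 5, 11)
p = sum(pi[k] * gaussian_pdf(x, mu[k], sigma[k]) for k in range(2))
print(p)                          # mixture density evaluated on a grid
```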

36 Graphical Models with unobserved variables
What if you have variables in a graphical model that are never observed? These are latent variables. Training latent variable models is an unsupervised learning application. (Figure: an example model with latent states "uncomfortable" and "amused" and observed variables "sweating" and "laughing".)

37 Latent Variable HMMs We can cluster sequences using an HMM with unobserved state variables. We will train the latent variable models using Expectation Maximization.

38 Expectation Maximization
Both the training of GMMs and the training of Gaussian models with latent variables are accomplished using Expectation Maximization.
Step 1, Expectation (E-step): evaluate the "responsibilities" of each cluster with the current parameters.
Step 2, Maximization (M-step): re-estimate the parameters using the existing "responsibilities".
Related to k-means.
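To ground the two steps, here is a hedged sketch of EM for a one-dimensional, two-component GMM (the synthetic data, initial parameters, and iteration count are assumptions): the E-step computes responsibilities with the current parameters, and the M-step re-estimates means, variances, and mixing weights from them.

```python
import numpy as np

rng = np.random.default_rng(8)
x = np.r_[rng.normal(-2, 0.5, 100), rng.normal(2, 1.0, 100)]

mu, sigma, pi = np.array([-1.0, 1.0]), np.array([1.0, 1.0]), np.array([0.5, 0.5])

def normal_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

for _ in range(100):
    # E-step: responsibility of each component for each point, shape (N, 2)
    dens = pi * normal_pdf(x[:, None], mu, sigma)
    resp = dens / dens.sum(axis=1, keepdims=True)
    # M-step: re-estimate parameters from the responsibilities
    Nk = resp.sum(axis=0)
    mu = (resp * x[:, None]).sum(axis=0) / Nk
    sigma = np.sqrt((resp * (x[:, None] - mu) ** 2).sum(axis=0) / Nk)
    pi = Nk / len(x)
print(mu, sigma, pi)   # recovers roughly (-2, 2), (0.5, 1.0), (0.5, 0.5)
```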

39 Questions One more time for questions on supervised learning…

40 Next Time Gaussian Mixture Models (GMMs) Expectation Maximization

