Machine Learning Basics


1 Machine Learning Basics
周岚

2 Supervised Learning Algorithms
Learn to associate some input with some output, given a training set of examples of inputs x and outputs y

3 Probabilistic Supervised Learning
Estimate a probability distribution p(y | x)
Use maximum likelihood estimation to find the best parameter vector θ for a parametric family of distributions p(y | x; θ)
Examples: linear regression, logistic regression
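As a minimal illustration (one choice among many), the sketch below fits a logistic regression model p(y = 1 | x; θ) by maximum likelihood using plain gradient ascent on the average log-likelihood; the toy data, learning rate, and step count are arbitrary placeholder values.

```python
import numpy as np

def fit_logistic_regression(X, y, lr=0.1, n_steps=1000):
    """Maximum likelihood estimate of theta for p(y=1 | x; theta) = sigmoid(x . theta).

    X: (m, d) design matrix (append a bias column before calling, if desired).
    y: (m,) labels in {0, 1}.
    """
    theta = np.zeros(X.shape[1])
    for _ in range(n_steps):
        p = 1.0 / (1.0 + np.exp(-X @ theta))   # predicted P(y = 1 | x)
        grad = X.T @ (y - p) / len(y)          # gradient of the mean log-likelihood
        theta += lr * grad                     # ascend the log-likelihood
    return theta

# Toy usage: two 1-D clusters, with a bias column appended.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 1, (50, 1)), rng.normal(2, 1, (50, 1))])
X = np.hstack([X, np.ones((100, 1))])
y = np.concatenate([np.zeros(50), np.ones(50)])
theta = fit_logistic_regression(X, y)
```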

4 Support Vector Machines
One of the most influential approaches to supervised learning
Similar to logistic regression in that it is driven by a linear function w^T x + b, but it outputs a class identity rather than a probability
Kernel trick: the linear function can be rewritten as w^T x + b = b + Σ_i α_i x^T x^(i), where x^(i) is a training example and α is a vector of coefficients
This lets us replace x by the output of a given feature function φ(x)
and replace the dot product with a kernel function k(x, x^(i)) = φ(x) · φ(x^(i))

5 Support Vector Machines
Make predictions using the function f(x) = b + Σ_i α_i k(x, x^(i))
The kernel trick is powerful: it allows us to learn models that are nonlinear as a function of x, and the kernel often admits an implementation that is significantly more computationally efficient than explicitly constructing φ(x)
The most commonly used kernel is the Gaussian kernel k(u, v) = N(u − v; 0, σ²I), also known as the radial basis function (RBF) kernel

6 Support Vector Machines
A drawback of kernel machines is that the cost of evaluating the decision function is linear in the number of training examples, because the i-th example contributes a term α_i k(x, x^(i))
Support vector machines mitigate this by learning an α vector that contains mostly zeros
Classifying a new example then requires evaluating the kernel only for the training examples that have nonzero α_i, known as the support vectors
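To make this concrete, here is a minimal NumPy sketch of the prediction function f(x) = b + Σ_i α_i k(x, x^(i)) with a Gaussian kernel. The coefficients alpha and the bias b are assumed to come from an already-trained SVM solver, which is not shown; skipping the zero entries of α is what keeps prediction cheap when most α_i are zero.

```python
import numpy as np

def gaussian_kernel(u, v, sigma=1.0):
    """Gaussian (RBF) kernel, omitting the Gaussian normalization constant."""
    return np.exp(-np.sum((u - v) ** 2) / (2 * sigma ** 2))

def svm_predict(x, X_train, alpha, b, kernel=gaussian_kernel):
    """f(x) = b + sum_i alpha_i k(x, x^(i)); the sign gives the predicted class.

    Only training examples with nonzero alpha (the support vectors) contribute,
    so we skip the rest to keep evaluation cheap.
    """
    support = np.flatnonzero(alpha)
    score = b + sum(alpha[i] * kernel(x, X_train[i]) for i in support)
    return np.sign(score)
```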

7 Other Simple Supervised Learning Algorithms
k-nearest neighbors
Strength: very high capacity, so it can obtain high accuracy given a large training set
Weaknesses: high computational cost per prediction, it can generalize very badly given a small, finite training set, and it cannot learn that one feature is more discriminative than another
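A minimal k-nearest-neighbors sketch makes both points visible: there is no training step and the model can fit very complex decision boundaries, but every query scans the entire training set and all features contribute equally to the distance. The arrays and the value of k are placeholders.

```python
import numpy as np
from collections import Counter

def knn_predict(x, X_train, y_train, k=3):
    """Classify x by majority vote among its k nearest training examples."""
    dists = np.linalg.norm(X_train - x, axis=1)  # O(m) distance computations per query
    nearest = np.argsort(dists)[:k]              # indices of the k nearest examples
    return Counter(y_train[nearest]).most_common(1)[0][0]
```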

8 Other Simple Supervised Learning Algorithms
Decision trees
Non-parametric if the tree is allowed to grow to arbitrary size, though in practice decision trees are usually regularized with size constraints (such as a maximum depth), which turns them into parametric models
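As an illustrative sketch only (assuming scikit-learn is available), a size constraint such as a maximum depth is one common way to apply this kind of regularization; the fit/predict calls are commented out because no dataset is defined here.

```python
from sklearn.tree import DecisionTreeClassifier

# Left unconstrained, a decision tree can grow until it memorizes the training set;
# capping its depth acts as a size-based regularizer.
tree = DecisionTreeClassifier(max_depth=3)
# tree.fit(X_train, y_train); tree.predict(X_test)  # with your own data
```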

9 Unsupervised Learning Algorithms
A classic unsupervised learning task is to find the “best” representation of the data: one that preserves as much information about x as possible while keeping the representation simpler or more accessible than x itself
Three ways of defining a simpler representation:
lower-dimensional representations
sparse representations
independent representations

10 Principal Components Analysis
PCA learns a representation that has lower dimensionality than the original input and whose elements have no linear correlation with each other
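A minimal NumPy sketch of PCA via the singular value decomposition of the centered data; the number of components is an arbitrary parameter, and the resulting coordinates are lower-dimensional and linearly uncorrelated by construction.

```python
import numpy as np

def pca(X, n_components):
    """Project X (shape (m, d)) onto its top n_components principal directions."""
    X_centered = X - X.mean(axis=0)                  # PCA assumes zero-mean data
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    W = Vt[:n_components].T                          # top right-singular vectors
    Z = X_centered @ W                               # low-dimensional representation
    return Z, W

# The columns of Z are linearly uncorrelated: Z.T @ Z is (approximately) diagonal.
```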

11 Principal Components Analysis

12 k-means Clustering
Divide the training set into k different clusters of examples that are near each other
Initialize k different centroids {μ^(1), …, μ^(k)} to different values
Alternate between two steps until convergence:
each training example is assigned to cluster i, where i is the index of the nearest centroid μ^(i)
each centroid μ^(i) is updated to the mean of all training examples x^(j) assigned to cluster i
One difficulty is that the clustering problem is inherently ill-posed: there is no single criterion that measures how well a clustering of the data corresponds to the real world
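A minimal NumPy sketch of the two alternating steps described above; the initialization (random training examples as centroids) and the fixed iteration budget are simplifications for the example.

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    """Alternate assignment and centroid-update steps for a fixed number of iterations."""
    rng = np.random.default_rng(seed)
    # Initialize the k centroids to k distinct training examples.
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iters):
        # Assignment step: each example goes to its nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each centroid becomes the mean of its assigned examples.
        for i in range(k):
            if np.any(labels == i):
                centroids[i] = X[labels == i].mean(axis=0)
    return labels, centroids
```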

13 Stochastic Gradient Descent
An extension of the gradient descent algorithm
The cost function often decomposes as an average over training examples, so gradient descent requires computing ∇_θ J(θ) = (1/m) Σ_i ∇_θ L(x^(i), y^(i), θ)
Computational cost: O(m) per gradient step

14 Stochastic Gradient Descent
On each step, sample a minibatch of examples B = {x^(1), …, x^(m′)} drawn uniformly from the training set, where the minibatch size m′ is typically much smaller than m
The estimate of the gradient is formed as g = (1/m′) ∇_θ Σ_i L(x^(i), y^(i), θ), using only the examples in B
The stochastic gradient descent algorithm then follows the estimated gradient downhill: θ ← θ − εg, where ε is the learning rate
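A minimal NumPy sketch of this loop; grad_L stands in for the average per-example gradient of whatever loss is being minimized, and the batch size, learning rate, and step count are placeholder values.

```python
import numpy as np

def sgd(theta, X, y, grad_L, lr=0.01, batch_size=32, n_steps=1000, seed=0):
    """Stochastic gradient descent: follow minibatch gradient estimates downhill.

    grad_L(theta, X_batch, y_batch) must return the average gradient of the
    loss over the minibatch (same shape as theta).
    """
    rng = np.random.default_rng(seed)
    for _ in range(n_steps):
        idx = rng.choice(len(X), size=batch_size, replace=False)  # sample minibatch B
        g = grad_L(theta, X[idx], y[idx])                         # gradient estimate from B
        theta = theta - lr * g                                    # step with learning rate epsilon
    return theta
```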

15 Building a Machine Learning Algorithm
Nearly all deep learning algorithms can be described as particular instances of a fairly simple recipe: combine a specification of a dataset, a cost function, an optimization procedure, and a model
Linear regression, for example, combines:
a dataset consisting of X and y
the cost function J(w, b) = −E_{x,y∼p̂_data} log p_model(y | x)
the model specification p_model(y | x) = N(y; x^T w + b, 1)
an optimization algorithm defined by solving for where the gradient of the cost is zero, using the normal equations
Recognizing that most machine learning algorithms can be described using this recipe helps to see the different algorithms as part of a taxonomy of methods for doing related tasks that work for similar reasons, rather than as a long list of algorithms that each have separate justifications.
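As a concrete instance of the recipe, this sketch builds a synthetic dataset and solves the linear regression optimization step in closed form; np.linalg.lstsq solves the normal equations X^T X w = X^T y in a numerically stable way. The data, noise scale, and weights are invented for the example.

```python
import numpy as np

# Dataset: design matrix X (with a bias column) and targets y.
rng = np.random.default_rng(0)
X = np.hstack([rng.normal(size=(100, 3)), np.ones((100, 1))])
true_w = np.array([2.0, -1.0, 0.5, 3.0])
y = X @ true_w + rng.normal(scale=0.1, size=100)

# Model: p_model(y | x) = N(y; x^T w, 1). Cost: mean squared error (equivalently,
# the negative log-likelihood). Optimization: set the gradient to zero, i.e. the
# normal equations, solved here via least squares.
w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
```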

16 Challenges Motivating Deep Learning
The simple machine learning algorithms described above work very well on a wide variety of important problems, but they have not succeeded in solving the central problems of AI, such as recognizing speech or recognizing objects
The development of deep learning was motivated in part by the failure of traditional algorithms to generalize well on such AI tasks.

17 The Curse of Dimensionality
Many machine learning problems become exceedingly difficult when the number of dimensions in the data is high; this phenomenon is known as the curse of dimensionality
The number of possible distinct configurations of a set of variables increases exponentially as the number of variables increases: for example, d binary variables have 2^d possible joint configurations

18 The Curse of Dimensionality

19 Local Constancy and Smoothness Regularization
In order to generalize well, machine learning algorithms need to be guided by prior beliefs about what kind of function they should learn.
Among the most widely used of these implicit “priors” is the smoothness (or local constancy) prior: the function we learn should not change very much within a small region, i.e., f(x) ≈ f(x + ε) for small ε

20 Local Constancy and Smoothness Regularization

21 Local Constancy and Smoothness Regularization
Smoothness-based methods work extremely well as long as there are enough examples for the learning algorithm to observe high points on most peaks and low points on most valleys of the true underlying function to be learned.
If the function additionally behaves differently in different regions, it can become extremely complicated to describe with a set of training examples.
The core idea in deep learning is to assume that the data was generated by the composition of factors or features, potentially at multiple levels in a hierarchy.

22 Manifold Learning
A manifold is a connected region: a set of points associated with a neighborhood around each point

23 Manifold Learning
Many machine learning problems seem hopeless if we expect the learning algorithm to learn functions with interesting variations across all of R^n
Manifold learning algorithms surmount this obstacle by assuming that most of R^n consists of invalid inputs, and that interesting inputs occur only along a collection of manifolds
Evidence in favor of this assumption:
the probability distribution over images, text strings, and sounds that occur in real life is highly concentrated
we can also imagine such neighborhoods and transformations, at least informally

24 Manifold Learning
When the data lies on a low-dimensional manifold, it can be most natural to represent the data in terms of coordinates on the manifold rather than in terms of coordinates in R^n.
Example: roads as 1-D manifolds embedded in 3-D space
Extracting these manifold coordinates is challenging, but holds the promise of improving many machine learning algorithms.

25 Thank you!

