Machine Learning – a Probabilistic Perspective Introduction Cui Jiaqi 2018.4.8
The goal of machine learning is to develop methods that can automatically detect patterns in data, and then to use the uncovered patterns to predict future data or other outcomes of interest.
Types of machine learning
- TYPE 1: the predictive or supervised learning approach
- TYPE 2: the descriptive or unsupervised learning approach
- TYPE 3: reinforcement learning: learning how to act or behave when given occasional reward or punishment signals
Supervised learning
Classification:
- the goal is to make predictions on novel inputs
- we report the MAP (maximum a posteriori) estimate: y_hat = argmax_c p(y = c | x, D)
Regression: like classification, except the response variable is continuous
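A minimal illustration of the MAP rule: given posterior probabilities over classes (made up here for illustration), report the class with the highest posterior.

```python
# MAP estimate: pick the class with the highest posterior probability.
# The posterior values below are invented purely for illustration.
posterior = {"spam": 0.8, "ham": 0.2}

y_hat = max(posterior, key=posterior.get)  # argmax over classes
print(y_hat)  # "spam"
```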
Unsupervised learning
We are just given output data, without any inputs.
Two differences from the supervised case:
- unsupervised learning is unconditional density estimation, p(x_i | theta), instead of conditional density estimation, p(y_i | x_i, theta)
- x_i is a vector of features, so we need to create multivariate probability models
Discovering clusters
- estimate the distribution over the number of clusters, p(K | D)
- estimate which cluster each point belongs to: z_i* = argmax_k p(z_i = k | x_i, D), where z_i in {1, ..., K} represents the cluster to which data point i is assigned
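A hard-assignment sketch of the clustering idea: plain 2-means in one dimension, with made-up data and a crude initialisation. The book's formulation is probabilistic (it estimates p(z_i = k | x_i, D)); k-means is the simplest hard-assignment approximation.

```python
# Toy 1-D data with two obvious groups (values invented for illustration).
points = [1.0, 1.2, 0.8, 8.0, 8.3, 7.9]
centers = [points[0], points[3]]  # crude initialisation: one point per group

for _ in range(10):
    # Assignment step: each point joins its nearest center (hard z_i).
    clusters = [[], []]
    for p in points:
        k = 0 if abs(p - centers[0]) <= abs(p - centers[1]) else 1
        clusters[k].append(p)
    # Update step: move each center to the mean of its cluster.
    centers = [sum(c) / len(c) for c in clusters]

print(centers)  # roughly [1.0, 8.07]
```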
Discovering latent factors
- dimensionality reduction, e.g. principal components analysis
Discovering graph structure
- given a set of correlated variables, discover which ones are most correlated with which others, represented by a graph G
- useful to discover new knowledge and to get better joint probability density estimators
Matrix completion
K-nearest neighbors
- p(y = c | x, D, K) = (1/K) * sum_{i in N_K(x, D)} I(y_i = c)
- N_K(x, D) are the (indices of the) K nearest points to x in D
- I(e) is the indicator function: I(e) = 1 if e is true, 0 otherwise
- with Euclidean distance, KNN does not work well with high-dimensional inputs
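A minimal KNN classifier sketch in plain Python, on toy data made up for illustration: majority vote over the K nearest training points under Euclidean distance.

```python
import math
from collections import Counter

def knn_predict(X_train, y_train, x, K):
    """Predict the label of x by majority vote among its K nearest
    neighbours in the training set, using Euclidean distance."""
    nearest = sorted(range(len(X_train)),
                     key=lambda i: math.dist(X_train[i], x))[:K]
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Toy dataset: two classes in opposite corners of the plane.
X = [(0, 0), (0, 1), (5, 5), (6, 5)]
y = ["a", "a", "b", "b"]

print(knn_predict(X, y, (1, 0), 3))  # "a" (two of the 3 nearest are "a")
```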
Linear regression
- the connection between linear regression and Gaussians: p(y | x, theta) = N(y | w^T x, sigma^2), i.e. the response is a linear function of the inputs plus Gaussian noise
- polynomial regression: replace x with a nonlinear function of the inputs, phi(x)
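A small sketch of fitting a linear model by least squares, which is the maximum-likelihood estimate under the Gaussian noise model above. The toy data is invented (y = 2x + 1 exactly).

```python
import numpy as np

# Toy data generated from y = 2x + 1 with no noise.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * x + 1.0

# Design matrix with a bias column, so the model is y = w0 + w1*x.
X = np.column_stack([np.ones_like(x), x])

# Least squares: solves min_w ||Xw - y||^2 (the Gaussian MLE).
w = np.linalg.lstsq(X, y, rcond=None)[0]
print(w)  # close to [1, 2]
```

Polynomial regression uses the same machinery: add columns x**2, x**3, ... to the design matrix (the basis expansion phi(x)).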
Logistic regression
- generalize linear regression to the (binary) classification setting
- replace the Gaussian distribution for y with a Bernoulli distribution: p(y | x, w) = Ber(y | mu(x))
- compute a linear combination of the inputs, but then pass it through a function that ensures 0 <= mu(x) <= 1, by defining mu(x) = sigm(w^T x), where sigm(eta) = 1 / (1 + exp(-eta)) is the sigmoid (logistic) function
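A sketch of logistic regression on tiny 1-D toy data, trained by stochastic gradient descent on the Bernoulli negative log-likelihood. The data, learning rate, and number of epochs are all arbitrary choices for illustration.

```python
import math

def sigmoid(eta):
    """The logistic function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))

# Toy 1-D data: negative inputs are class 0, positive inputs class 1.
X = [-2.0, -1.0, 1.0, 2.0]
y = [0, 0, 1, 1]

w, b = 0.0, 0.0
for _ in range(2000):
    for xi, yi in zip(X, y):
        mu = sigmoid(w * xi + b)
        # Gradient of the Bernoulli NLL is (mu - y) * x for w, (mu - y) for b.
        w -= 0.1 * (mu - yi) * xi
        b -= 0.1 * (mu - yi)

print(sigmoid(w * 2.0 + b))   # close to 1: confident class 1
print(sigmoid(w * -2.0 + b))  # close to 0: confident class 0
```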
Model selection
- measure the misclassification rate on held-out data
- a common split: about 80% of the data for the training set and 20% for the validation set
- when data is scarce, use (K-fold) cross validation
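The K-fold splitting step can be sketched in plain Python: partition the indices into K folds, and let each fold serve once as the validation set while the rest form the training set.

```python
def kfold_indices(n, k):
    """Split indices 0..n-1 into k contiguous folds of near-equal size."""
    folds, start = [], 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds

folds = kfold_indices(10, 5)
for fold in folds:
    # Train on everything outside the current fold, validate on the fold;
    # average the validation error over all k rounds.
    train = [i for i in range(10) if i not in fold]
    print(fold, len(train))
```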