Presentation on theme: "Introduction to Machine Learning: Supervised Learning. Name: 李政軒" — Presentation transcript:

1 Introduction to Machine Learning: Supervised Learning. Name: 李政軒

2 Learning a Class from Examples
Class C of a family car:
– Knowledge extraction: What do people expect from a family car?
– Prediction: Is car x a family car?
Output: positive and negative examples
Input: x1: price, x2: engine power

3 Training set X
Training set for the class of a family car. Each data point corresponds to one example car, and the coordinates of the point indicate the price and engine power of that car. "+" denotes a positive example of the class; "‒" denotes a negative example.

4 Class C
We may have reason to believe that, for a car to be a family car, its price and engine power should each lie in a certain range.
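One standard way to write such a class (a sketch; the endpoints p1, p2, e1, e2 are assumed names for the unknown range boundaries, not given on the slide) is as an axis-aligned rectangle in the (price, engine power) plane:

    C:\; (p_1 \le \text{price} \le p_2) \;\wedge\; (e_1 \le \text{engine power} \le e_2)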

5 Hypothesis class H
The class of family car as defined by the learning system. C is the actual class; h is our induced hypothesis.
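In the usual notation (a sketch; a training set X of N labeled pairs (x^t, r^t) with r^t ∈ {0, 1} is assumed, since the slide does not write it out), the empirical error of a hypothesis h on X counts its misclassifications:

    E(h \mid X) = \sum_{t=1}^{N} \mathbf{1}\left( h(x^t) \ne r^t \right)

A hypothesis with E(h | X) = 0 is consistent with the training set.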

6 S, G, and the Version Space
S is the most specific hypothesis; G is the most general hypothesis. Any h ∈ H between S and G is a valid hypothesis with no error and is said to be consistent with the training set; together these hypotheses make up the version space.

7 Margin
Choose the h with the largest margin:
– We choose the hypothesis with the largest margin, for the best separation. The shaded instances are those that define the margin; other instances can be removed without affecting h.

8 VC Dimension
N points can be labeled in 2^N ways as positive and negative. H shatters N points if, for every one of these labelings, there exists an h ∈ H consistent with it; the largest such N is VC(H).
Summary:
– The maximum number of points that can be shattered by H is called the VC dimension of H; it measures the capacity of H.
– A lookup table has infinite VC dimension.
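As a concrete worked example (a standard result, not stated on the slide): in two dimensions, axis-aligned rectangles can shatter 4 suitably placed points, but no set of 5 points can be shattered, because labeling the 4 outermost points positive and the remaining point negative defeats every rectangle: any rectangle containing the outermost 4 must also contain the fifth. Hence

    \mathrm{VC}(\text{axis-aligned rectangles in } \mathbb{R}^2) = 4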

9 Probably Approximately Correct (PAC) Learning
The probability of a positive example falling in the error region between C and h (and causing an error) is at most ε; allot each of the four strips at most ε/4.
Pr that a random instance misses one strip: 1 ‒ ε/4
Pr that N independent instances all miss one strip: (1 ‒ ε/4)^N
Pr that the N instances miss any of the 4 strips (union bound): at most 4(1 ‒ ε/4)^N
We require 4(1 ‒ ε/4)^N ≤ δ. Since (1 ‒ x) ≤ exp(‒x), it suffices that 4 exp(‒εN/4) ≤ δ, which gives N ≥ (4/ε) log(4/δ).
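A minimal Python sketch of this sample-complexity bound (the function name pac_sample_size is an assumption for illustration):

    import math

    def pac_sample_size(epsilon, delta):
        # Smallest N with N >= (4/epsilon) * log(4/delta), so that with
        # probability at least 1 - delta the tightest rectangle has
        # error at most epsilon.
        return math.ceil((4.0 / epsilon) * math.log(4.0 / delta))

    # Error at most 5% with confidence 95%:
    print(pac_sample_size(0.05, 0.05))  # 351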

10 Noise and Model Complexity
Use the simpler model because it is:
– Simpler to use: easy to check whether a data instance falls inside the rectangle.
– Easier to train: easy to find the corner values of a rectangle.
– Easier to explain: more interpretable.
– Better at generalizing: less variance, and less affected by single instances.

11 Learning Multiple Classes, C_i, i = 1,...,K
Train hypotheses h_i(x), i = 1,...,K.
There are three hypotheses induced, each one covering the instances of one class and leaving outside the instances of the other two classes. "?" marks the reject regions, where no class, or more than one class, is chosen.
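The one-vs-all training rule implied here can be sketched as follows (notation assumed, since the slide omits the formula):

    h_i(x^t) = \begin{cases} 1 & \text{if } x^t \in C_i \\ 0 & \text{if } x^t \in C_j,\ j \ne i \end{cases}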

12 Regression Linear, second-order and sixth-order polynomials are fitted to the same set of points.
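The criterion minimized by each fit is, in the usual notation (a sketch; the training pairs (x^t, r^t) and fitted polynomial g are assumed, since the slide shows only the plot):

    E(g \mid X) = \frac{1}{N} \sum_{t=1}^{N} \left[ r^t - g(x^t) \right]^2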

13 Model Selection & Generalization
Learning is an ill-posed problem:
– The data alone are not sufficient to find a unique solution.
Hence the need for an inductive bias, i.e. assumptions about H:
– The inductive bias of a learning algorithm is the set of assumptions that the learner uses to predict outputs for inputs it has not encountered.
– A classical example of an inductive bias is Occam's razor: assume that the simplest hypothesis consistent with the data is actually the best.

14 Model Selection & Generalization
Generalization: how well a model trained on the training set predicts the right output for new instances.
Underfitting: H is less complex than C or f; the hypothesis is simpler than the target function.
Overfitting: H is more complex than C or f; the hypothesis is more complex than the target function.

15 Triple Trade-Off
There is a trade-off between three factors:
– the complexity of H, c(H): the capacity of the hypothesis class;
– the amount of training data, N;
– the generalization error, E, on new data.
As N increases, E decreases. As c(H) increases, E first decreases and then increases.

16 Cross-Validation
To estimate generalization error, we need data unseen during training. We split the data into:
– Training set (50%)
– Validation set (25%): gives the validation error, used during model selection to test generalization ability
– Test (publication) set (25%): gives the expected error; contains examples used neither in training nor in validation

17 Cross-Validation
For example, to find the right order in polynomial regression:
– We are given a number of candidate polynomials of different orders.
– For each order, we find the coefficients on the training set, calculate the error on the validation set, and take the order with the least validation error as the best polynomial (sketched below).
Use resampling when there are few data.
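A minimal NumPy sketch of this selection procedure (the synthetic data, candidate orders, and split sizes are assumptions for illustration, following the 50/25/25 split from the previous slide):

    import numpy as np

    rng = np.random.default_rng(0)

    # Synthetic data: a noisy cubic, 40 points.
    x = rng.uniform(-1, 1, 40)
    r = x ** 3 - 0.5 * x + rng.normal(0, 0.05, 40)

    # Shuffle, then use 50% for training and 25% for validation
    # (the remaining 25% would be held out as the test set).
    idx = rng.permutation(len(x))
    tr, va = idx[:20], idx[20:30]

    best_order, best_err = None, np.inf
    for order in range(1, 7):  # candidate polynomial orders
        coeffs = np.polyfit(x[tr], r[tr], order)  # fit on training set
        err = np.mean((np.polyval(coeffs, x[va]) - r[va]) ** 2)  # validation error
        if err < best_err:
            best_order, best_err = order, err

    print(best_order, best_err)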

18 Dimensions of a Supervised Learner
1. Model: g(x | θ)
– For example, in linear regression the model is a linear function of the input, whose slope and intercept are the parameters learned from the data.
2. Loss function: E(θ | X) = Σt L(r^t, g(x^t | θ))
3. Optimization procedure: θ* = arg minθ E(θ | X)
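A minimal sketch instantiating these three dimensions for linear regression (the data and variable names are assumptions for illustration; optimization is done here in closed form by least squares):

    import numpy as np

    # 1. Model: g(x | theta) = w1 * x + w0, with theta = (w1, w0).
    def g(x, theta):
        w1, w0 = theta
        return w1 * x + w0

    # 2. Loss function: sum of squared errors over the training set.
    def loss(theta, x, r):
        return np.sum((r - g(x, theta)) ** 2)

    # 3. Optimization: closed-form least squares for theta*.
    x = np.array([0.0, 1.0, 2.0, 3.0])
    r = np.array([1.1, 2.9, 5.2, 7.1])
    A = np.column_stack([x, np.ones_like(x)])
    theta_star, *_ = np.linalg.lstsq(A, r, rcond=None)

    print(theta_star, loss(theta_star, x, r))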

