Introduction to Machine Learning: Supervised Learning. Name: 李政軒.

Presentation transcript:

Introduction to Machine Learning: Supervised Learning. Name: 李政軒

Learning a Class from Examples. Class C: the class of family cars. Knowledge extraction: what do people expect from a family car? Prediction: is car x a family car? Output: positive and negative examples. Input representation: x1 = price, x2 = engine power.

Training set X. Training set for the class of a family car. Each data point corresponds to one example car, and the coordinates of the point indicate the price and engine power of that car. "+" denotes a positive example of the class; "-" denotes a negative example.

Class C. We may assume that for a car to be a family car, its price and engine power should each lie within a certain range: (p1 ≤ price ≤ p2) AND (e1 ≤ engine power ≤ e2), for suitable values p1, p2, e1, e2.
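As a minimal illustration (not from the slides), this rectangle hypothesis can be written as a simple membership test in Python; the bounds p1, p2, e1, e2 below are hypothetical placeholder values.

```python
# Minimal sketch: an axis-aligned rectangle hypothesis for the family-car class.
# The default bounds p1, p2, e1, e2 are hypothetical placeholder values.

def is_family_car(price, engine_power, p1=15_000, p2=25_000, e1=100, e2=200):
    """Return True if the car falls inside the hypothesised rectangle."""
    return (p1 <= price <= p2) and (e1 <= engine_power <= e2)

print(is_family_car(18_000, 150))  # True: inside both ranges
print(is_family_car(40_000, 150))  # False: price outside the range
```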

Hypothesis class H. The class of family car as defined by the learning system. C is the actual class; h is our induced hypothesis.

S, G, and the Version Space. S is the most specific hypothesis; G is the most general hypothesis. Any h ∈ H between S and G is a valid hypothesis with no error and is said to be consistent with the training set; together these hypotheses make up the version space.
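A small sketch of how S could be computed, assuming the rectangle hypothesis class above and a hypothetical list of positive examples: S is simply the tightest bounding box of the positives (G would be the largest rectangle that still excludes every negative).

```python
# Sketch: the most specific hypothesis S is the tightest axis-aligned rectangle
# enclosing all positive examples. The data values here are hypothetical.

positives = [(16_000, 120), (18_000, 150), (22_000, 180)]  # (price, engine power)

prices = [p for p, _ in positives]
powers = [e for _, e in positives]
S = (min(prices), max(prices), min(powers), max(powers))
print("S =", S)  # tightest rectangle: (16000, 22000, 120, 180)
```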

Margin. Choose the h with the largest margin: we choose the hypothesis with the largest margin, for the best separation. The shaded instances are those that define the margin; other instances can be removed without affecting h.

VC Dimension. N points can be labeled in 2^N ways as positive and negative. H shatters N points if, for every one of these labelings, there exists an h ∈ H consistent with it. Summary: the maximum number of points that can be shattered by H is called the VC dimension of H, VC(H), and it measures the capacity of H. A lookup table has infinite VC dimension.
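The shattering condition can be checked mechanically for axis-aligned rectangles. The sketch below (not from the slides) enumerates all 2^N labelings of a point set and tests whether the tightest rectangle around the positives excludes every negative; the example points are hypothetical and illustrate the standard result that VC(axis-aligned rectangles) = 4.

```python
from itertools import product

def rectangles_shatter(points):
    """Check whether axis-aligned rectangles can realise all 2^N labelings."""
    for labels in product([False, True], repeat=len(points)):
        pos = [p for p, lab in zip(points, labels) if lab]
        neg = [p for p, lab in zip(points, labels) if not lab]
        if not pos:  # an empty rectangle handles the all-negative labeling
            continue
        x1 = min(x for x, _ in pos); x2 = max(x for x, _ in pos)
        y1 = min(y for _, y in pos); y2 = max(y for _, y in pos)
        # the tightest rectangle around the positives must exclude every negative
        if any(x1 <= x <= x2 and y1 <= y <= y2 for x, y in neg):
            return False
    return True

# Four points arranged in a diamond can be shattered; adding a fifth point in
# the middle cannot, illustrating VC(rectangles) = 4.
print(rectangles_shatter([(0, 1), (1, 0), (2, 1), (1, 2)]))          # True
print(rectangles_shatter([(0, 1), (1, 0), (2, 1), (1, 2), (1, 1)]))  # False
```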

Probably Approximately Correct (PAC) Learning. The probability of a positive example falling in the error region (and causing an error) is at most ε, and each of the four strips has probability mass at most ε/4. Probability that one instance misses a strip: 1 − ε/4. Probability that N instances miss a strip: (1 − ε/4)^N. Probability that N instances miss any of the 4 strips: at most 4(1 − ε/4)^N. Requiring 4(1 − ε/4)^N ≤ δ and using (1 − x) ≤ exp(−x), it suffices that 4 exp(−εN/4) ≤ δ, i.e. N ≥ (4/ε) ln(4/δ).
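The final bound can be turned into a small numeric helper; the sketch below simply evaluates N ≥ (4/ε) ln(4/δ) for hypothetical values of ε and δ.

```python
from math import ceil, log

def pac_sample_size(eps, delta):
    """Sample size making the tightest rectangle ε-accurate with prob. ≥ 1 − δ:
    N ≥ (4/ε) ln(4/δ)."""
    return ceil((4 / eps) * log(4 / delta))

# e.g. error at most 10% with 95% confidence:
print(pac_sample_size(0.1, 0.05))  # 176  (ceil(40 * ln 80) ≈ ceil(175.3))
```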

Noise and Model Complexity. Use the simpler model because it is: simpler to use (easier to check whether a data instance belongs to the class); easier to train (easier to find the corner values of a rectangle); easier to explain (more interpretable); and it generalizes better (less variance, less affected by single instances).

Learning Multiple Classes, C_i, i = 1,...,K. Train hypotheses h_i(x), i = 1,...,K. Three hypotheses are induced, each one covering the instances of one class and leaving outside the instances of the other two classes. "?" marks reject regions where no class, or more than one class, is chosen.
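A minimal sketch of this scheme, assuming each h_i is an arbitrary boolean classifier (the three range tests below are hypothetical): predict the single class whose hypothesis accepts x, and reject otherwise.

```python
# Sketch of K one-vs-rest hypotheses with a reject option, as on the slide.
# Each h_i is any boolean classifier; here they are hypothetical range tests.

def predict(x, hypotheses):
    """Return the single class whose hypothesis accepts x, or None (reject)
    when no class, or more than one class, is chosen."""
    chosen = [i for i, h in enumerate(hypotheses) if h(x)]
    return chosen[0] if len(chosen) == 1 else None

# Hypothetical 1-D example with three classes covering disjoint ranges:
hypotheses = [lambda x: 0 <= x < 10,
              lambda x: 10 <= x < 20,
              lambda x: 20 <= x < 30]
print(predict(5, hypotheses))   # 0
print(predict(45, hypotheses))  # None -> reject region
```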

Regression. Linear, second-order, and sixth-order polynomials are fitted to the same set of points.
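A sketch of such a comparison, using hypothetical noisy data and numpy's least-squares polynomial fit; it reproduces only the idea of the figure, not its actual data.

```python
import numpy as np

# Fit first-, second- and sixth-order polynomials to the same (hypothetical)
# noisy points and compare their training errors.
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 10)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=x.size)

for order in (1, 2, 6):
    coeffs = np.polyfit(x, y, deg=order)               # least-squares fit
    train_err = np.mean((np.polyval(coeffs, x) - y) ** 2)
    print(f"order {order}: training MSE = {train_err:.4f}")
```

Training error alone keeps dropping as the order grows; as later slides note, that is exactly why a separate validation set is needed.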

Model Selection & Generalization. Learning is an ill-posed problem: the data alone are not sufficient to find a unique solution, hence the need for an inductive bias (assumptions) about H. The inductive bias of a learning algorithm is the set of assumptions the learner uses to predict outputs for inputs it has not encountered. A classical example of an inductive bias is Occam's razor: assume that the simplest hypothesis consistent with the data is the best one.

Model Selection & Generalization. Generalization: how well a model trained on the training set predicts the right output for new instances. Underfitting: H is less complex than C or f, i.e. the hypothesis class is less complex than the true class or function. Overfitting: H is more complex than C or f, i.e. the hypothesis class is more complex than the true class or function.

Triple Trade-Off. There is a trade-off between three factors: the complexity of H, c(H), i.e. the capacity of the hypothesis class; the amount of training data, N; and the generalization error, E, on new data. As N increases, E decreases. As c(H) increases, E first decreases and then increases.

Cross-Validation. To estimate generalization error, we need data unseen during training. We split the data into: a training set (50%); a validation set (25%), which gives the validation error; and a test (publication) set (25%), which estimates the expected error and is used to test generalization ability. The test set contains examples not used in training or validation.
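A minimal sketch of this 50/25/25 split on hypothetical data, using a random permutation of the indices.

```python
import numpy as np

# Hypothetical data set, split 50/25/25 into training / validation / test.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = rng.integers(0, 2, size=100)

idx = rng.permutation(len(X))
n_train, n_val = int(0.50 * len(X)), int(0.25 * len(X))
train_idx = idx[:n_train]
val_idx = idx[n_train:n_train + n_val]
test_idx = idx[n_train + n_val:]          # kept aside until the very end
print(len(train_idx), len(val_idx), len(test_idx))  # 50 25 25
```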

Cross-Validation. For example, to find the right order in polynomial regression: given a number of candidate polynomials of different orders, for each order we find the coefficients on the training set, calculate the error on the validation set, and take the order with the least validation error as the best polynomial. Use resampling when there is little data.
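A sketch of that recipe on hypothetical data: fit each candidate order on the training split, score it on the validation split, and keep the order with the lowest validation error.

```python
import numpy as np

# Hypothetical noisy data, split into training and validation halves.
rng = np.random.default_rng(1)
x = np.linspace(0, 1, 40)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=x.size)
x_tr, y_tr, x_val, y_val = x[::2], y[::2], x[1::2], y[1::2]

best_order, best_err = None, np.inf
for order in range(1, 9):                       # hypothetical candidate orders
    coeffs = np.polyfit(x_tr, y_tr, deg=order)  # fit on the training split
    val_err = np.mean((np.polyval(coeffs, x_val) - y_val) ** 2)
    if val_err < best_err:
        best_order, best_err = order, val_err
print("best order:", best_order)
```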

Dimensions of a Supervised Learner. 1. Model, g(x|θ): for example, in linear regression the model is a linear function of the input whose slope and intercept are the parameters learned from the data. 2. Loss function, E(θ|X) = Σ_t L(r^t, g(x^t|θ)): measures the difference between the desired outputs and the model's predictions over the training set. 3. Optimization procedure, θ* = arg min_θ E(θ|X): finds the parameters that minimize the loss on the training set.
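A sketch of these three dimensions for simple linear regression on hypothetical data: the model g, a squared-error loss, and least squares as the optimization procedure.

```python
import numpy as np

# Hypothetical data generated from a noisy line r = 3x + 1.
rng = np.random.default_rng(2)
x = np.linspace(0, 1, 30)
r = 3.0 * x + 1.0 + rng.normal(scale=0.1, size=x.size)

def g(x, w0, w1):                        # 1. model: g(x | w0, w1) = w1*x + w0
    return w1 * x + w0

def loss(w0, w1):                        # 2. loss function: sum of squared errors
    return np.sum((r - g(x, w0, w1)) ** 2)

w1, w0 = np.polyfit(x, r, deg=1)         # 3. optimization: least squares
print(f"w0 = {w0:.2f}, w1 = {w1:.2f}, loss = {loss(w0, w1):.3f}")
```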