CHAPTER 2: Supervised Learning

Lecture Notes for E Alpaydın, 2004, Introduction to Machine Learning, © The MIT Press (V1.1)

Learning a Class from Examples
Class C of a “family car”.
Prediction: Is car x a family car?
Knowledge extraction: What do people expect from a family car?
Output: positive (+) and negative (−) examples.
Input representation: x1 = price, x2 = engine power.
Learn a model f: R × R → {+, −}.

Training set X
X = {x^t, r^t}, t = 1, ..., N, where each input is x^t = [x1^t, x2^t]^T and r^t = 1 if x^t is a positive example, r^t = 0 if it is negative.

Class C
The class is an axis-aligned rectangle in the (price, engine power) plane: (p1 ≤ price ≤ p2) AND (e1 ≤ engine power ≤ e2) for some values p1, p2, e1, e2.
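A minimal Python sketch of this setup; the prices (in thousands), engine powers, and rectangle bounds below are made-up illustrative values, not taken from the slides:

```python
# A candidate hypothesis h: an axis-aligned rectangle over (price, engine power).
# The bounds p1, p2, e1, e2 are assumed values chosen for illustration.
def h(x, p1=15, p2=30, e1=100, e2=200):
    price, power = x
    return 1 if (p1 <= price <= p2) and (e1 <= power <= e2) else 0

# Tiny training set X = [(x^t, r^t)]: inputs are (price, engine power),
# labels r^t are 1 for "family car" (+) and 0 for negative examples.
X = [((20, 150), 1), ((25, 180), 1), ((50, 300), 0), ((10, 80), 0)]

for x, r in X:
    print(x, "label:", r, "h(x):", h(x))
```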

Family Car Decision Tree
[Figure: a decision tree that classifies a car as a family car (Yes/No) using the tests HP > 200, HP > 100, and Price > 24K.]
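The tree can be written as nested conditionals. The exact branch layout is not recoverable from the slide figure, so the arrangement below is an assumption chosen to mirror the rectangle class (moderate power, bounded price):

```python
# Hypothetical reading of the family-car decision tree; the branch layout is assumed.
def family_car(hp, price):
    if hp > 200:          # very powerful cars: not family cars
        return "No"
    if hp > 100:          # moderate power: decide on price
        return "No" if price > 24_000 else "Yes"
    return "No"           # underpowered cars: not family cars

print(family_car(hp=150, price=20_000))  # -> "Yes" under this assumed tree
```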

Hypothesis class H
H is the set of all axis-aligned rectangles. The empirical error of a hypothesis h on the training set X is E(h | X) = Σ_t 1(h(x^t) ≠ r^t), the number of training examples that h misclassifies.
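A short standalone sketch of the empirical error count, reusing the illustrative rectangle and toy data from above:

```python
# Empirical error E(h | X): number of training examples that h misclassifies.
def empirical_error(h, X):
    return sum(1 for x, r in X if h(x) != r)

# Illustrative rectangle hypothesis and toy data (same made-up values as before).
h_rect = lambda x: 1 if (15 <= x[0] <= 30) and (100 <= x[1] <= 200) else 0
X = [((20, 150), 1), ((25, 180), 1), ((50, 300), 0), ((10, 80), 0)]
print("E(h | X) =", empirical_error(h_rect, X))  # 0 on this toy set
```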

S, G, and the Version Space
The most specific hypothesis S is the tightest rectangle containing all positive examples and no negatives; the most general hypothesis G is the largest such rectangle. Any h ∈ H between S and G is consistent with the training set, and together these hypotheses make up the version space (Mitchell, 1997).
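For the rectangle class, the most specific hypothesis S is simply the bounding box of the positive examples; a minimal sketch on the same toy data:

```python
# Most specific hypothesis S: the tightest axis-aligned rectangle
# around the positive examples (here: (price, engine power) pairs).
def most_specific(X):
    pos = [x for x, r in X if r == 1]
    prices, powers = zip(*pos)
    return (min(prices), max(prices), min(powers), max(powers))

X = [((20, 150), 1), ((25, 180), 1), ((50, 300), 0), ((10, 80), 0)]
print("S = (p1, p2, e1, e2) =", most_specific(X))  # (20, 25, 150, 180)
```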

VC Dimension
N points can be labeled in 2^N ways as +/−. H shatters N points if, for every one of these labelings, there is an h ∈ H consistent with it; the largest such N is the Vapnik-Chervonenkis dimension VC(H).
An axis-aligned rectangle can shatter at most 4 points; it does not work for 5 points, so VC(H) = 4 for axis-aligned rectangles. Try it as a homework exercise.
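A brute-force check of shattering for one particular set of 4 points (a diamond configuration chosen for illustration). For rectangles it is enough to test the bounding box of the positive points, since any consistent rectangle must contain that box:

```python
from itertools import product

# Four points in a diamond configuration (illustrative choice).
points = [(0, 1), (1, 0), (2, 1), (1, 2)]

def in_rect(p, rect):
    x1, x2, y1, y2 = rect
    return x1 <= p[0] <= x2 and y1 <= p[1] <= y2

def shattered(points):
    # For every +/- labeling, test the bounding box of the positive points:
    # any rectangle containing all positives also contains this box, so if
    # the box includes a negative point, every candidate rectangle does too.
    for labels in product([0, 1], repeat=len(points)):
        pos = [p for p, r in zip(points, labels) if r == 1]
        if pos:
            xs, ys = zip(*pos)
            rect = (min(xs), max(xs), min(ys), max(ys))
        else:
            rect = (1, 0, 1, 0)  # empty rectangle for the all-negative labeling
        if any(in_rect(p, rect) != bool(r) for p, r in zip(points, labels)):
            return False
    return True

print(shattered(points))  # True: these 4 points are shattered, so VC(rectangles) >= 4
```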

Probably Approximately Correct (PAC) Learning
How many training examples N should we have, such that with probability at least 1 − δ, h has error at most ε? (Blumer et al., 1989)
Each of the four strips (between the tightest rectangle and C) has probability at most ε/4. The probability that a random instance misses a given strip is 1 − ε/4; the probability that all N instances miss that strip is (1 − ε/4)^N; and, by the union bound, the probability that the N instances miss at least one of the 4 strips is at most 4(1 − ε/4)^N. Requiring 4(1 − ε/4)^N ≤ δ and using (1 − x) ≤ exp(−x) gives 4 exp(−εN/4) ≤ δ, hence N ≥ (4/ε) ln(4/δ).
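A direct use of the bound as a sketch; the ε and δ values are arbitrary examples:

```python
from math import ceil, log

# PAC sample bound for the axis-aligned rectangle class:
# N >= (4 / epsilon) * ln(4 / delta)
def pac_sample_size(epsilon, delta):
    return ceil((4 / epsilon) * log(4 / delta))

print(pac_sample_size(epsilon=0.1, delta=0.05))  # ~176 examples
```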

Noise and Model Complexity
When two models fit the data comparably well, use the simpler one, because it is:
simpler to use (lower computational complexity),
easier to train (lower space complexity),
easier to explain (more interpretable),
and it generalizes better (lower variance; Occam’s razor).
Other classification techniques: decision trees and kNN.

Multiple Classes, C_i, i = 1, ..., K
Train K hypotheses h_i(x), i = 1, ..., K, where h_i(x) = 1 if x ∈ C_i and h_i(x) = 0 if x ∈ C_j, j ≠ i (each class is learned against the rest).
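A schematic one-vs-rest sketch in Python; the centroid-based binary learner used here is a toy stand-in, not a method from the chapter:

```python
# One-vs-rest: train one binary hypothesis h_i per class C_i.
# The binary learner below scores a point by closeness to the positive centroid.
def fit_binary(X, targets):
    pos = [x for x, t in zip(X, targets) if t == 1]
    cx = sum(x[0] for x in pos) / len(pos)
    cy = sum(x[1] for x in pos) / len(pos)
    # Higher score = closer to the centroid of the positive class.
    return lambda x: -((x[0] - cx) ** 2 + (x[1] - cy) ** 2)

def train_one_vs_rest(X, labels, classes):
    return {c: fit_binary(X, [1 if r == c else 0 for r in labels]) for c in classes}

X = [(1, 1), (1, 2), (5, 5), (6, 5), (9, 1)]
labels = ["A", "A", "B", "B", "C"]
h = train_one_vs_rest(X, labels, classes={"A", "B", "C"})
# Predict by picking the class whose hypothesis gives the highest score.
print(max(h, key=lambda c: h[c]((5.5, 5))))  # -> "B"
```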

Regression
Outputs are now numeric: the training set is X = {x^t, r^t}, t = 1, ..., N, with r^t ∈ R. A linear model g(x) = w1 x + w0 is fit by minimizing the squared error E(w1, w0 | X) = (1/N) Σ_t [r^t − (w1 x^t + w0)]^2.

Finding Regression Coefficients
How do we find w1 and w0? Set dE/dw1 = 0 and dE/dw0 = 0, then solve the two resulting equations. (Ungraded homework: carry out the derivation.)
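Carrying out that derivation gives the familiar closed-form least-squares solution; a small sketch with made-up data:

```python
# Closed-form solution of dE/dw1 = 0 and dE/dw0 = 0 for g(x) = w1*x + w0.
def fit_line(xs, rs):
    n = len(xs)
    x_bar = sum(xs) / n
    r_bar = sum(rs) / n
    w1 = (sum(x * r for x, r in zip(xs, rs)) - n * x_bar * r_bar) / \
         (sum(x * x for x in xs) - n * x_bar ** 2)
    w0 = r_bar - w1 * x_bar
    return w1, w0

# Toy data generated from r = 2x + 1 (illustrative only).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
rs = [1.0, 3.0, 5.0, 7.0, 9.0]
print(fit_line(xs, rs))  # -> (2.0, 1.0)
```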

Model Selection & Generalization
Learning is an ill-posed problem: the data alone are not sufficient to find a unique solution, hence the need for an inductive bias, i.e. assumptions about H.
Generalization: how well a model performs on new data.
Overfitting: H is more complex than C or f.
Underfitting: H is less complex than C or f.

Underfitting and Overfitting
Underfitting: the model is too simple, so both training and test errors are large.
Overfitting: the model is too complex, so the test error is large even though the training error is small.
(For a decision tree, complexity can be measured by the number of nodes it uses; more generally, it is the complexity of the classification function.)

Cross-Validation
Two errors: the training error and the test error, which is used as an estimate of the generalization error (the error on new examples). Typically the training error is smaller than the generalization error.
To estimate generalization error, we need data unseen during training. We can split the data as:
Training set (50%)
Validation set (25%), optional, used for selecting learning-algorithm parameters (e.g. model complexity)
Test (publication) set (25%)
Use resampling when there is little data.
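A minimal sketch of the 50/25/25 split described above (the random shuffling and the seed are incidental choices):

```python
import random

# Split data into training (50%), validation (25%), and test (25%) sets.
def split_data(data, seed=0):
    data = list(data)
    random.Random(seed).shuffle(data)
    n = len(data)
    n_train, n_val = n // 2, n // 4
    return (data[:n_train],                      # training set (50%)
            data[n_train:n_train + n_val],       # validation set (25%)
            data[n_train + n_val:])              # test / publication set (25%)

train, val, test = split_data(range(100))
print(len(train), len(val), len(test))  # 50 25 25
```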

Triple Trade-Off
There is a trade-off between three factors (Dietterich, 2003):
1. the complexity of the hypothesis class, c(H),
2. the training set size, N,
3. the generalization error, E, on new data.
As N increases, E decreases. As c(H) increases, E first decreases and then increases (overfitting). As c(H) increases, the training error decreases for some time and then stays constant (frequently at 0).

Dimensions of a Supervised Learner
1. Model: g(x | θ), where θ denotes the model parameters.
2. Loss function: E(θ | X) = Σ_t L(r^t, g(x^t | θ)), computed over the data set X.
3. Optimization procedure: θ* = arg min_θ E(θ | X).
Remark: this procedure is typical of parametric approaches to supervised learning; non-parametric approaches work differently.
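The three dimensions made concrete for the earlier linear-regression example: model g(x | θ) = w1·x + w0, squared-error loss, and gradient descent as one possible optimization procedure (the learning rate and step count are arbitrary choices):

```python
# 1. Model: g(x | theta) with theta = (w1, w0).
def g(x, theta):
    w1, w0 = theta
    return w1 * x + w0

# 2. Loss function: E(theta | X) = sum over t of (r^t - g(x^t | theta))^2.
def loss(theta, xs, rs):
    return sum((r - g(x, theta)) ** 2 for x, r in zip(xs, rs))

# 3. Optimization procedure: gradient descent on E (one of many possible choices).
def fit(xs, rs, lr=0.01, steps=2000):
    w1, w0 = 0.0, 0.0
    for _ in range(steps):
        dw1 = sum(-2 * x * (r - g(x, (w1, w0))) for x, r in zip(xs, rs))
        dw0 = sum(-2 * (r - g(x, (w1, w0))) for x, r in zip(xs, rs))
        w1, w0 = w1 - lr * dw1, w0 - lr * dw0
    return w1, w0

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
rs = [1.0, 3.0, 5.0, 7.0, 9.0]
print(fit(xs, rs))  # approximately (2.0, 1.0)
```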