CH. 2: Supervised Learning


CH. 2: Supervised Learning
Supervised learning (SL) learns an unknown mapping f from the input to the output, where the correct outputs are provided by a supervisor. Given a training set X = {x^t, r^t}, t = 1, ..., N, of inputs x^t and their correct outputs r^t, figure out f.

2.1 Learning from Examples
Example: learn the class C of a "family car". Car representation: x = (x1, x2), where x1: price, x2: engine power. Given a training set X = {x^t, r^t}, t = 1, ..., N, where r^t = 1 if x^t is a family car (positive example) and r^t = 0 otherwise (negative example).

Through discussion with experts and analysis of the training data, a hypothesis class H for the car class C is defined as the set of axis-aligned rectangles in the (price, engine power) plane:

(p1 ≤ price ≤ p2) AND (e1 ≤ engine power ≤ e2),

where each choice of the bounds p1, p2, e1, e2 gives one hypothesis h in H.
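As an illustration (not part of the original slides), one such rectangle hypothesis can be written as a small predicate; the threshold values below are arbitrary:

    # A hypothesis h in H: an axis-aligned rectangle in the (price, engine power) plane.
    def make_rectangle_hypothesis(p1, p2, e1, e2):
        """h(x) = 1 if p1 <= price <= p2 and e1 <= engine_power <= e2, else 0."""
        def h(x):
            price, engine_power = x
            return 1 if (p1 <= price <= p2) and (e1 <= engine_power <= e2) else 0
        return h

    h = make_rectangle_hypothesis(p1=15000, p2=30000, e1=100, e2=200)
    print(h((20000, 150)))  # 1: inside the rectangle, predicted to be a family car
    print(h((50000, 300)))  # 0: outside the rectangle, predicted not to be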

Problem: Given X and H, the learning algorithm attempts to find an h ∈ H that minimizes the empirical error

E(h | X) = Σ_{t=1..N} 1(h(x^t) ≠ r^t),

where 1(·) is the indicator function: 1 if its argument is true and 0 otherwise.
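A minimal error-count sketch on toy data (the hypothesis and data values below are illustrative, not from the slides):

    def empirical_error(h, X, r):
        """Number of training examples that hypothesis h misclassifies."""
        return sum(1 for x_t, r_t in zip(X, r) if h(x_t) != r_t)

    # Toy data: (price, engine power) pairs with family-car labels.
    h = lambda x: 1 if (15000 <= x[0] <= 30000) and (100 <= x[1] <= 200) else 0
    X = [(20000, 150), (25000, 180), (50000, 300), (10000, 90)]
    r = [1, 1, 0, 0]
    print(empirical_error(h, X, r))  # 0: this h is consistent with the toy data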

S, G, and Version Space
S: the most specific hypothesis, the tightest rectangle that includes all positive examples and none of the negatives. G: the most general hypothesis, the largest rectangle that includes all positives and none of the negatives. Version space: the set of all hypotheses h ∈ H between S and G, i.e., all hypotheses consistent with the training set.
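A sketch of computing the most specific hypothesis S on toy data (illustrative only; the data values are made up):

    def most_specific_hypothesis(X, r):
        """S: the tightest axis-aligned rectangle around the positive examples."""
        positives = [x for x, label in zip(X, r) if label == 1]
        prices = [p for p, _ in positives]
        powers = [e for _, e in positives]
        return (min(prices), max(prices), min(powers), max(powers))  # (p1, p2, e1, e2)

    X = [(20000, 150), (25000, 180), (50000, 300), (10000, 90)]
    r = [1, 1, 0, 0]
    print(most_specific_hypothesis(X, r))  # (20000, 25000, 150, 180)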

Among the hypotheses in the version space, choose the h with the largest margin, i.e., the one whose boundary is as far as possible from the closest training instances of either class.

2.2 Vapnik-Chervonenkis (VC) Dimension
N points can be labeled in 2^N ways as +/– (or 1/0); e.g., N = 3 gives 8 labelings. If for every such labeling there is an h ∈ H that separates the + examples from the – examples, H shatters the N points; e.g., both the line class and the axis-aligned rectangle class can shatter 3 points. VC(H): the maximum number of points that can be shattered by H; e.g., VC(line) = 3, VC(axis-aligned rectangle) = 4.
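In symbols (a restatement of the definition above, not taken from the slides):

    % H shatters points x_1, ..., x_N iff every one of the 2^N labelings is realized by some h in H:
    \{ (h(x_1), \dots, h(x_N)) : h \in \mathcal{H} \} = \{0, 1\}^N
    % The VC dimension is the size of the largest point set that H can shatter:
    \mathrm{VC}(\mathcal{H}) = \max \{ N : \exists\, x_1, \dots, x_N \text{ shattered by } \mathcal{H} \}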

Examples: (i) The VC dimension of the "line" hypothesis class in 2D space is 3, i.e., VC(line) = 3: three points in general position can be shattered by lines, but no set of four points can be.

(ii) An axis-aligned rectangle can shatter 4 points, i.e., VC(axis-aligned rectangle) = 4. (In the accompanying figure, only the rectangles covering exactly two of the points are shown.) (iii) VC(triangle) = 7. Assignment (monotonicity property): show that for any two hypothesis classes H1 ⊆ H2, VC(H1) ≤ VC(H2).

2.3 Probably Approximately Correct (PAC) Learning
Use the tightest rectangle around the positive examples as the hypothesis, i.e., h = S. C: the actual class; h: the induced class; their region of difference is where h makes errors.

Problem: How many training examples N do we need so that, with probability at least 1 − δ, the hypothesis h has error at most ε? Mathematically,

P(error ≤ ε) ≥ 1 − δ,

where the error is the probability that a random instance falls in C Δ h, the region of difference between C and h.

The region of difference C Δ h lies within four strips between the boundary of h = S and the boundary of C. In order that the probability of a positive car falling in C Δ h (i.e., an error) is at most ε, we require each strip to have probability at most ε/4. The probability that a random instance falls in one such strip is then upper bounded by ε/4. The probability that an instance misses (i.e., is correct on) a strip is at least 1 − ε/4. The probability that N independent instances all miss a strip is (1 − ε/4)^N. The probability that N instances miss at least one of the 4 strips is at most 4(1 − ε/4)^N. This probability should be at most δ.

Using the inequality (1 − x) ≤ e^(−x), we get 4(1 − ε/4)^N ≤ 4e^(−Nε/4). Requiring 4e^(−Nε/4) ≤ δ gives

N ≥ (4/ε) ln(4/δ).

So with at least this many training examples, the tightest rectangle h = S has error at most ε with probability at least 1 − δ.
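A quick numerical check of the bound (a hedged sketch; the ε and δ values below are arbitrary):

    import math

    def pac_sample_size(epsilon, delta):
        """Smallest N satisfying N >= (4/epsilon) * ln(4/delta)."""
        return math.ceil((4.0 / epsilon) * math.log(4.0 / delta))

    print(pac_sample_size(0.1, 0.05))  # 176 examples suffice for eps = 0.1, delta = 0.05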

2.4 Noise and Model Complexity
Noise arises from errors, imprecision, uncertainty, etc. To fit noisy data exactly, complicated hypotheses are generally necessary. However, simpler hypotheses usually make more sense because they are easier to use, check, train, and explain, and they generalize better. Occam's razor: simpler explanations are more plausible, and any unnecessary complexity should be shaved off.

Examples (figures not included in this transcript): 1) classification, 2) regression.

2.5 Learning Multiple Classes
Multiple classes C_i, i = 1, ..., K. Training set: X = {x^t, r^t}, t = 1, ..., N, where r_i^t = 1 if x^t ∈ C_i and r_i^t = 0 otherwise. Treat the K-class classification problem as K two-class problems, i.e., train K hypotheses h_i, i = 1, ..., K, where h_i(x^t) = 1 if x^t ∈ C_i and 0 otherwise. Total error:

E({h_i} | X) = Σ_{t=1..N} Σ_{i=1..K} 1(h_i(x^t) ≠ r_i^t).
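A minimal one-vs-all sketch (the helper names and the trivial base learner below are made up for illustration):

    def one_vs_all_train(X, r, K, train_binary):
        """Train K two-class hypotheses; train_binary(X, labels) returns a function h(x)."""
        hypotheses = []
        for i in range(K):
            labels = [1 if r_t == i else 0 for r_t in r]   # class i vs. the rest
            hypotheses.append(train_binary(X, labels))
        return hypotheses

    def total_error(hypotheses, X, r):
        """Count every disagreement between h_i(x^t) and the 0/1 label r_i^t."""
        return sum(1 for t, x in enumerate(X)
                     for i, h in enumerate(hypotheses)
                     if h(x) != (1 if r[t] == i else 0))

    # Trivial base learner used only to make the sketch runnable: predict the majority label.
    def majority_learner(X, labels):
        m = 1 if sum(labels) * 2 >= len(labels) else 0
        return lambda x: m

    X, r, K = [(0, 0), (1, 1), (2, 2)], [0, 1, 2], 3
    hs = one_vs_all_train(X, r, K, majority_learner)
    print(total_error(hs, X, r))  # 3: the trivial learner misses each class's one positive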

2.6 Regression
Unlike classification problems, whose outputs are Boolean values (yes/no), the outputs of regression problems are numeric. Training set: X = {x^t, r^t}, t = 1, ..., N, with real-valued r^t. Find f such that r^t = f(x^t) + ε, where ε is random noise, and let g be the estimate of f. Total (empirical) error:

E(g | X) = (1/N) Σ_{t=1..N} [r^t − g(x^t)]².

For the linear model: g(x) = w1·x + w0.

Minimize E(w1, w0 | X) = (1/N) Σ_{t=1..N} [r^t − (w1·x^t + w0)]² by setting its partial derivatives with respect to w1 and w0 to zero:

∂E/∂w1 = 0  ⇒  Σ_t x^t·r^t = w1 Σ_t (x^t)² + w0 Σ_t x^t   (1)
∂E/∂w0 = 0  ⇒  Σ_t r^t = w1 Σ_t x^t + N·w0   (2)

Solving (1) and (2) for w1 and w0 gives the least-squares estimates:

w1 = [Σ_t x^t·r^t − (Σ_t x^t)(Σ_t r^t)/N] / [Σ_t (x^t)² − (Σ_t x^t)²/N],
w0 = r̄ − w1·x̄,

where x̄ = (1/N) Σ_t x^t and r̄ = (1/N) Σ_t r^t.
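A hedged numerical sketch of this fit on made-up data, checking the closed form against numpy's own least-squares routine:

    import numpy as np

    rng = np.random.default_rng(0)
    x = np.linspace(0.0, 1.0, 20)
    r = 2.0 * x + 1.0 + 0.1 * rng.standard_normal(x.shape)   # noisy samples of f(x) = 2x + 1

    N = len(x)
    w1 = (np.sum(x * r) - np.sum(x) * np.sum(r) / N) / (np.sum(x**2) - np.sum(x)**2 / N)
    w0 = np.mean(r) - w1 * np.mean(x)
    print(w1, w0)                   # close to 2 and 1
    print(np.polyfit(x, r, deg=1))  # numpy's least-squares fit gives the same w1, w0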

2.7 Model Selection
Example: learning a Boolean function of d binary inputs from examples. There are 2^d possible input instances and 2^(2^d) possible Boolean functions (e.g., 16 functions when d = 2), and each such function is a candidate hypothesis.

Each training example removes half of the remaining hypotheses: e.g., seeing an input whose output is 0 removes all hypotheses that output 1 for that input. This illustrates that learning starts with all possible hypotheses and, as more examples are seen, the inconsistent hypotheses are removed. Ill-posed problem: the training examples alone are not sufficient to lead to a unique solution.
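A small sketch of this elimination process for d = 2 (the particular training examples below are made up):

    from itertools import product

    # Hypotheses = all Boolean functions of d = 2 binary inputs,
    # each represented as a truth table (a dict from input pair to output bit).
    d = 2
    inputs = list(product([0, 1], repeat=d))                 # 4 possible instances
    hypotheses = [dict(zip(inputs, outs))                    # 2**(2**d) = 16 functions
                  for outs in product([0, 1], repeat=len(inputs))]

    examples = [((0, 0), 0), ((0, 1), 1)]                    # made-up training examples
    for x, y in examples:
        hypotheses = [h for h in hypotheses if h[x] == y]    # drop inconsistent hypotheses
        print(len(hypotheses))                               # 16 -> 8 -> 4: still not unique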

Inductive bias: extra assumptions or restrictions on the hypothesis class introduced to make learning possible. Model selection: choosing a good inductive bias, i.e., deciding between hypothesis classes. Generalization: how well a model trained on the training set predicts the right output for new instances. Underfitting: the hypothesis class H is less complex than the true class C, e.g., fitting a line to data sampled from a third-order polynomial.

Overfitting: the hypothesis class H is more complex than the true class C, e.g., fitting a third-order polynomial to data sampled from a line. Triple trade-off: a trade-off among three factors: 1. the size of the training set, N; 2. the complexity (or capacity) of H, c(H); 3. the generalization error, E, on new data.
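An illustrative sketch of under/overfitting (the data, noise level, and degrees are all made-up choices): polynomials of increasing degree are fit to noisy samples of a line, and their errors are compared on held-out data.

    import numpy as np

    rng = np.random.default_rng(1)
    x_train, x_val = rng.uniform(-1, 1, 20), rng.uniform(-1, 1, 200)
    f = lambda x: 2.0 * x + 1.0                                  # the true model is a line
    r_train = f(x_train) + 0.2 * rng.standard_normal(x_train.shape)
    r_val = f(x_val) + 0.2 * rng.standard_normal(x_val.shape)

    for degree in (1, 3, 9):
        coeffs = np.polyfit(x_train, r_train, deg=degree)
        e_train = np.mean((r_train - np.polyval(coeffs, x_train)) ** 2)
        e_val = np.mean((r_val - np.polyval(coeffs, x_val)) ** 2)
        # Higher degrees drive the training error down but tend to raise the held-out error.
        print(degree, round(e_train, 3), round(e_val, 3))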

Cross-validation
The data are divided into: i) a training set, ii) a validation set, and iii) a publication (test) set. Training set: for inducing the hypotheses. Validation set: for testing the generalization ability of the induced hypotheses and selecting the best one. Publication (test) set: for estimating the expected error of the best hypothesis.
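A minimal sketch of such a split (the 50/25/25 proportions and the random data are arbitrary choices for illustration):

    import numpy as np

    rng = np.random.default_rng(2)
    X = rng.standard_normal((100, 2))           # made-up data: 100 instances, 2 features
    idx = rng.permutation(len(X))
    train, val, test = np.split(idx, [50, 75])  # index sets for the three subsets
    print(len(train), len(val), len(test))      # 50 25 25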

2.8 Dimensions of a Supervised Learning Algorithm
Training sample: X = {x^t, r^t}, t = 1, ..., N.
1. Model: g(x | θ), which defines the hypothesis class H; θ are its parameters.
2. Loss function: E(θ | X) = Σ_t L(r^t, g(x^t | θ)).
3. Optimization procedure: θ* = arg min_θ E(θ | X).
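A hedged sketch instantiating the three dimensions for 1D linear regression: the model g(x | θ) = w1·x + w0, a squared-error loss, and plain gradient descent as the optimizer (all specific choices here are illustrative).

    import numpy as np

    rng = np.random.default_rng(3)
    x = rng.uniform(-1, 1, 50)
    r = 2.0 * x + 1.0 + 0.1 * rng.standard_normal(x.shape)   # made-up data

    def g(x, theta):                     # 1. model: g(x | theta) = w1*x + w0
        w1, w0 = theta
        return w1 * x + w0

    def E(theta, x, r):                  # 2. loss function: mean squared error
        return np.mean((r - g(x, theta)) ** 2)

    theta, lr = np.zeros(2), 0.1         # 3. optimization: gradient descent on E
    for _ in range(500):
        err = r - g(x, theta)
        grad = np.array([-2 * np.mean(err * x), -2 * np.mean(err)])
        theta -= lr * grad
    print(theta, E(theta, x, r))         # theta approaches [2, 1]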
