Fundamentals of machine learning: Types of machine learning; In-sample and out-of-sample errors; Version space; VC dimension

Unsupervised learning: input only – no labels. Coins in a vending machine cluster by size and weight. How many clusters are here? Would different attributes make the clusters more distinct?
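A minimal sketch of this idea, assuming invented diameter and weight measurements for two coin denominations; scikit-learn's KMeans stands in for whatever clustering method is actually used:

```python
# Sketch: cluster unlabeled coin measurements by size and weight.
# The measurements below are invented for illustration, not real coin data.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Two hypothetical coin types: (diameter mm, weight g) plus measurement noise.
coins = np.vstack([
    rng.normal([19.0, 2.5], 0.2, size=(50, 2)),   # small, light coins
    rng.normal([24.0, 5.7], 0.2, size=(50, 2)),   # large, heavy coins
])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(coins)
print(kmeans.cluster_centers_)   # the two centers recover the two coin types
```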

Supervised learning: every example has a label. Labels have enabled a model based on linear discriminants that will let the vending machine guess coin value without facial recognition.
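A hedged sketch of the supervised version, reusing the same invented coin measurements but now with labels; a plain perceptron plays the role of the linear discriminant:

```python
# Sketch: with labels, a linear discriminant can guess the coin value.
# Features, labels, and denominations are invented for illustration.
import numpy as np
from sklearn.linear_model import Perceptron

rng = np.random.default_rng(1)
small = rng.normal([19.0, 2.5], 0.2, size=(50, 2))   # labeled "5 cents"
large = rng.normal([24.0, 5.7], 0.2, size=(50, 2))   # labeled "50 cents"
X = np.vstack([small, large])
y = np.array([5] * 50 + [50] * 50)

clf = Perceptron().fit(X, y)
print(clf.predict([[18.8, 2.4], [24.1, 5.8]]))   # expected: [ 5 50 ]
```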

Reinforcement learning: no one correct output. Data: input plus a graded output. Find the relationship between inputs and high-grade outputs.
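A tiny caricature of learning from graded feedback, assuming a two-armed bandit with made-up payout probabilities; the learner never sees a "correct" action, only rewards:

```python
# Sketch: epsilon-greedy learning from graded feedback (rewards), not labels.
# Payout probabilities are invented for illustration.
import random

random.seed(0)
payout = {"A": 0.3, "B": 0.7}   # hidden reward probability of each action
value = {"A": 0.0, "B": 0.0}    # estimated value of each action
count = {"A": 0, "B": 0}

for _ in range(1000):
    if random.random() < 0.1:                  # explore occasionally
        action = random.choice(["A", "B"])
    else:                                      # otherwise exploit estimates
        action = max(value, key=value.get)
    reward = 1.0 if random.random() < payout[action] else 0.0
    count[action] += 1
    value[action] += (reward - value[action]) / count[action]  # running mean

print(value)   # the estimate for "B" should approach 0.7
```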

In-sample error, E_in: how well do the boundaries match the training data? Out-of-sample error, E_out: how often will this system fail if implemented in the field?
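A rough sketch of how these two errors are usually estimated in practice, with a held-out split standing in for "the field"; the data and model are the invented coin example from above:

```python
# Sketch: E_in from the training set, an estimate of E_out from held-out data.
import numpy as np
from sklearn.linear_model import Perceptron
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(2)
X = np.vstack([rng.normal([19.0, 2.5], 0.4, size=(100, 2)),
               rng.normal([24.0, 5.7], 0.4, size=(100, 2))])
y = np.array([0] * 100 + [1] * 100)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5,
                                                    random_state=0)
clf = Perceptron().fit(X_train, y_train)
E_in = 1.0 - clf.score(X_train, y_train)      # in-sample (training) error
E_out_est = 1.0 - clf.score(X_test, y_test)   # estimate of out-of-sample error
print(E_in, E_out_est)
```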

The quality of the data largely determines the success of machine learning. How many data points? How much uncertainty? We assume each datum is labeled correctly; the uncertainty is in the values of the attributes.

Choosing the right model: a good model has small in-sample error and generalizes well. Often a tradeoff between these characteristics is required.

A type of model defines a hypothesis set. A particular member of the set is selected by minimizing some in-sample error. The error definition varies with the problem but is usually local (i.e., accumulated from the error at each data point). Example: linear discriminants.
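A common way to write such a local, pointwise-accumulated error, assuming N labeled examples (x^t, r^t) and some per-point loss e; the specific form is an assumption, not transcribed from the slide:

```latex
E_{\text{in}}(h) \;=\; \frac{1}{N}\sum_{t=1}^{N} e\!\left(h(x^{t}),\, r^{t}\right),
\qquad \text{e.g. } e\!\left(h(x), r\right) = \mathbf{1}\!\left[\,h(x) \neq r\,\right] \text{ for classification.}
```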

Supervised learning is the focus of this course. Example: a dichotomy based on 2 attributes (figure: examples of family cars). This and the following figures are from E. Alpaydın 2010, Introduction to Machine Learning 2e, © The MIT Press (V1.0).

Assume the family car (class C) is uniquely defined by a range of price and engine power, so that the blue rectangle is the true boundary of class C. In a real problem, of course, we don't know this boundary.
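In symbols (the standard form of this example in Alpaydın's text, with x_1 = price and x_2 = engine power), the assumed class is a conjunction of two intervals:

```latex
h(\mathbf{x}) =
\begin{cases}
1 & \text{if } p_1 \le x_1 \le p_2 \ \text{ and } \ e_1 \le x_2 \le e_2 \\
0 & \text{otherwise}
\end{cases}
```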

Hypothesis class H: axis-aligned rectangles. h (the yellow rectangle) is a particular member of H. The in-sample error of h is defined by counting misclassifications.
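The equation on the original slide was not transcribed; a small sketch of the usual count for a rectangle hypothesis, on invented (price, power) data:

```python
# Sketch: in-sample error of an axis-aligned rectangle hypothesis h,
# obtained by counting misclassified training points. Data are invented.
import numpy as np

def h(x, p1, p2, e1, e2):
    """Rectangle hypothesis: 1 if price and engine power are both in range."""
    return int(p1 <= x[0] <= p2 and e1 <= x[1] <= e2)

# (price, engine power) and label: 1 = family car, 0 = not
X = np.array([[14.0, 120.0], [16.0, 150.0], [22.0, 200.0], [30.0, 300.0]])
r = np.array([1, 1, 0, 0])

rect = (12.0, 18.0, 100.0, 180.0)           # a particular h in H
predictions = np.array([h(x, *rect) for x in X])
E_in = np.mean(predictions != r)            # fraction of points misclassified
print(E_in)                                 # 0.0 for this toy example
```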

Hypothesis class H: axis-aligned rectangles. For the dataset shown, the in-sample error of h (the yellow rectangle) is zero, but we expect the out-of-sample error to be nonzero: h leaves room for both false positives and false negatives.

Should we expect the negative examples to cluster? (Family-car example.)

S, G, and the version space: S is the most specific hypothesis with E_in = 0; G is the most general hypothesis with E_in = 0. Any h ∈ H between S and G is consistent (no in-sample error), and together these hypotheses make up the version space.
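A sketch of the most specific hypothesis S for the rectangle class: the tightest axis-aligned rectangle around the positive examples (G, the most general consistent rectangle, would instead expand until it touched a negative example). The data are invented:

```python
# Sketch: S = tightest axis-aligned rectangle containing all positive examples.
import numpy as np

X = np.array([[14.0, 120.0], [16.0, 150.0], [15.0, 140.0],   # positives
              [22.0, 200.0], [8.0, 60.0]])                    # negatives
r = np.array([1, 1, 1, 0, 0])

pos = X[r == 1]
S = (pos[:, 0].min(), pos[:, 0].max(),    # price range of the positives
     pos[:, 1].min(), pos[:, 1].max())    # power range of the positives
print(S)   # (14.0, 16.0, 120.0, 150.0)
```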

(Figure: the S and G rectangles.) A dichotomizer has been trained on N examples. Results are poor due to limited data. An expert will label any additional attribute vector that I specify. Where should these attribute vectors be chosen to make the most effective use of the expert?

Margin: the distance between the decision boundary and the closest instance in a specified class. The S and G hypotheses have narrow margins and are not expected to "generalize" well: even though E_in is zero, we expect E_out to be large. Why?
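For the linear boundaries w·x + b = 0 discussed on the next slide, the margin is usually written as the smallest scaled distance from a correctly labeled point to the boundary (a standard definition, not transcribed from the slide):

```latex
\text{margin}(h) \;=\; \min_{t} \; \frac{r^{t}\left(\mathbf{w}\cdot\mathbf{x}^{t} + b\right)}{\lVert \mathbf{w} \rVert}, \qquad r^{t} \in \{-1, +1\}.
```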

Choose the h in the version space with the largest margin to maximize generalization. The data points that determine S and G are shaded; they "support" the h with the largest margin, at the greatest distance between S and G. This is the logic behind "support vector machines".
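A minimal maximum-margin sketch using scikit-learn's linear SVM on invented, separable data; the shaded points of the slide correspond to clf.support_vectors_ here:

```python
# Sketch: a linear SVM picks the consistent hypothesis with the largest margin.
# Data are invented and separable; a large C approximates a hard margin.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(3)
X = np.vstack([rng.normal([0.0, 0.0], 0.3, size=(20, 2)),
               rng.normal([3.0, 3.0], 0.3, size=(20, 2))])
y = np.array([-1] * 20 + [1] * 20)

clf = SVC(kernel="linear", C=1e6).fit(X, y)
print(clf.support_vectors_)      # the few points that "support" the boundary
w = clf.coef_[0]
print(2.0 / np.linalg.norm(w))   # width of the margin between the two classes
```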

Vapnik-Chervonenkis dimension, d_VC. H is a hypothesis set for a dichotomizer. H(X) is the set of dichotomies created by applying H to a dataset X with N points. N points can be labeled ±1 in 2^N ways, so regardless of the size of H, |H(X)| is bounded by 2^N. H "shatters" N points if every labeling of the points is consistent with some member of H. d_VC(H) = k if k is the largest number of points that can be shattered by H. d_VC(H) is called the "capacity" of H.
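Written compactly (a restatement of the definition above, not an addition to it):

```latex
d_{\mathrm{VC}}(\mathcal{H}) \;=\; \max\left\{\, N \;:\; \exists\, \mathcal{X},\ |\mathcal{X}| = N,\ |\mathcal{H}(\mathcal{X})| = 2^{N} \,\right\}
```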

Vapnik-Chervonenkis dimension, d_VC. To prove that d_VC = k we get to choose the k points. To prove that d_VC = 3 for the 2D linear dichotomizer, it is better to choose the non-collinear (black) points: the fact that 3 points in a line cannot be shattered does not prove d_VC < 3.
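A brute-force sketch of this argument: check every ±1 labeling of a point set for linear separability, here via an LP feasibility test with scipy. The particular point sets are only illustrations:

```python
# Sketch: brute-force check that a point set is shattered by 2D linear
# dichotomizers, using an LP feasibility test. Point sets are illustrative.
import itertools
import numpy as np
from scipy.optimize import linprog

def linearly_separable(X, y):
    """True if some h(x) = sign(w.x + b) realizes labels y in {-1,+1} on X."""
    n, d = X.shape
    # Feasibility of y_i * (w.x_i + b) >= 1, written as A_ub @ [w, b] <= b_ub.
    A_ub = -y[:, None] * np.hstack([X, np.ones((n, 1))])
    b_ub = -np.ones(n)
    res = linprog(c=np.zeros(d + 1), A_ub=A_ub, b_ub=b_ub,
                  bounds=[(None, None)] * (d + 1), method="highs")
    return res.success

def shattered(X):
    """True if every +/-1 labeling of the points in X is linearly separable."""
    return all(linearly_separable(X, np.array(lab))
               for lab in itertools.product([-1, 1], repeat=len(X)))

# Three non-collinear points are shattered (supports d_VC >= 3) ...
print(shattered(np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])))      # True
# ... but four points are not: the XOR-style labeling is not separable.
print(shattered(np.array([[0.0, 0.0], [1.0, 1.0],
                          [1.0, 0.0], [0.0, 1.0]])))                  # False
```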

Break points: every set of 4 points has at least 2 labelings that are not linearly separable, so k = 4 is the break point for the 2D linear dichotomizer. d_VC(H) + 1 is always a break point. For the d-dimensional linear dichotomizer, d_VC(H) = d + 1.

What is the VC dimension of the hypothesis class defined by the union of all axis-aligned rectangles?

VC dimension is conservative. The VC dimension is based on all possible ways to label the examples; it ignores the probability distribution from which the dataset was drawn. In real-world data, examples with small differences in their attributes usually belong to the same class. This is the basis of "similarity" classification methods. (Family-car example.)