PAC Learning (adapted from Tom M. Mitchell, Carnegie Mellon University)

Learning Issues: Under what conditions is successful learning … possible? … assured for a particular learning algorithm?

Sample Complexity: How many training examples are needed for a learner to converge (with high probability) to a successful hypothesis?

Computational Complexity: How much computational effort is needed for a learner to converge (with high probability) to a successful hypothesis?

The world: X is the sample space. Example: two dice, X = {(1,1), (1,2), …, (6,5), (6,6)}.

The weighted world (X, P): P is a probability distribution over X. Example: biased dice, {(1,1; p_11), (1,2; p_12), …, (6,5; p_65), (6,6; p_66)}, where p_ij is the probability of outcome (i,j).

An event E is a subset of X. Example: two dice, {(1,1), (1,2), …, (6,5), (6,6)}.

An event E is a subset of X. Example: a pair in two dice, E = {(1,1), (2,2), (3,3), (4,4), (5,5), (6,6)}.

A concept c is the indicator function of an event E. Example: a pair in two dice, c(x,y) := (x == y).
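To make the running example concrete, here is a minimal Python sketch (not from the slides) of the two-dice world: the sample space X, an assumed biased distribution P over it, and the "pair" concept as an indicator function. The bias values are invented for illustration.

```python
import itertools

# Sample space X: all outcomes of rolling two dice.
X = list(itertools.product(range(1, 7), repeat=2))

# A distribution P over X: an assumed bias toward doubles (weights are made up).
weights = [3.0 if x == y else 1.0 for (x, y) in X]
P = {outcome: w / sum(weights) for outcome, w in zip(X, weights)}

# The event E = "a pair" and its concept c, the indicator function of E.
E = {(x, y) for (x, y) in X if x == y}

def c(x, y):
    return int(x == y)   # 1 if the outcome lies in E, 0 otherwise

print(f"P(pair) = {sum(P[e] for e in E):.3f}")
```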

A hypothesis h is an approximation to a concept c. Example: a separating hyperplane, h(x,y) := 0.5 · [1 + sign(a·x + b·y + c)].
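A short sketch of that hypothesis form in Python; the coefficients a, b, c below are arbitrary placeholders, not values from the slides (and the intercept c here is unrelated to the target concept c above).

```python
import numpy as np

def make_hyperplane_hypothesis(a, b, c):
    """h(x, y) = 0.5 * (1 + sign(a*x + b*y + c)): label 1 on one side of the
    line a*x + b*y + c = 0 and label 0 on the other side (and on the line
    itself, since np.sign(0) == 0)."""
    def h(x, y):
        return int(0.5 * (1 + np.sign(a * x + b * y + c)))
    return h

h = make_hyperplane_hypothesis(a=1.0, b=-1.0, c=0.5)   # assumed coefficients
print(h(5, 2), h(2, 5))                                # opposite sides of the line
```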

The dataset D is an i.i.d. sample from (X, P): D = {(x_i, c(x_i)) : i = 1, …, m}, i.e. m labeled examples.
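A sketch of drawing such a sample, reusing the distribution P and the concept c from the dice sketch above (both assumed to be in scope); each drawn instance is paired with its label under c.

```python
import random

def draw_dataset(P, c, m, rng=random):
    """Draw m i.i.d. outcomes from the distribution P (dict: outcome -> prob)
    and label each one with the target concept c."""
    outcomes, probs = zip(*P.items())
    sample = rng.choices(outcomes, weights=probs, k=m)
    return [((x, y), c(x, y)) for (x, y) in sample]

# Example (using the dice P and c defined earlier):
# data = draw_dataset(P, c, m=20)
```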

An inductive learner L is an algorithm that uses the data D to produce a hypothesis h ∈ H. Example: the perceptron algorithm, h(x,y) := 0.5 · [1 + sign(a(D)·x + b(D)·y + c(D))], where the coefficients a(D), b(D), c(D) are computed from the data.
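A minimal perceptron sketch, assuming 2-D instances and 0/1 labels; the learning rate and epoch count are arbitrary choices, and convergence is only guaranteed when the training data are linearly separable.

```python
def perceptron(data, epochs=100, lr=1.0):
    """Learn coefficients (a, b, c) for h(x, y) = 0.5*(1 + sign(a*x + b*y + c))
    from data of the form [((x, y), label), ...] with labels in {0, 1}."""
    a = b = c = 0.0
    for _ in range(epochs):
        for (x, y), label in data:
            pred = 1 if a * x + b * y + c > 0 else 0
            if pred != label:                  # mistake-driven update
                step = lr if label == 1 else -lr
                a += step * x
                b += step * y
                c += step
    return a, b, c

# Assumed usage with the earlier sketches:
# a_hat, b_hat, c_hat = perceptron(draw_dataset(P, c, m=50))
# h = make_hyperplane_hypothesis(a_hat, b_hat, c_hat)
```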

Error measures: the training error of a hypothesis h is how often h disagrees with the target concept over the training instances; the true error of h is how often it disagrees over future instances drawn at random.

True error of h: error_P(h) ≡ Pr_{x ~ P}[c(x) ≠ h(x)], the probability that h misclassifies an instance drawn at random according to P.
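A sketch of both measures, again reusing the dice names P and c (assumed to be in scope): the training error is an average over the sample, while the true error is estimated here by drawing fresh instances from P.

```python
import random

def training_error(h, data):
    """Fraction of training examples on which h disagrees with the given label."""
    return sum(h(x, y) != label for (x, y), label in data) / len(data)

def estimated_true_error(h, P, c, n_trials=100_000, rng=random):
    """Monte Carlo estimate of error_P(h) = Pr_{x ~ P}[c(x) != h(x)]."""
    outcomes, probs = zip(*P.items())
    draws = rng.choices(outcomes, weights=probs, k=n_trials)
    return sum(h(x, y) != c(x, y) for (x, y) in draws) / n_trials
```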

Learnability: how should learnability be described? One natural measure is the number of training examples needed to learn a hypothesis h for which error_P(h) = 0, but demanding zero true error is infeasible.

PAC learnability: weaken the demands on the learner. Allow a true error of up to ε (the accuracy parameter) and a failure probability of up to δ; both ε and δ can be made arbitrarily small. This is Probably Approximately Correct (PAC) learning.

PAC learnability: C is PAC-learnable by L if L outputs a hypothesis whose true error is < ε, with probability at least (1 − δ), after a reasonable number of examples and a reasonable time per example. "Reasonable" means polynomial in 1/ε, 1/δ, n (the size of the examples), and the encoding length of the target concept.
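Written out formally (following the standard definition in Mitchell, Chapter 7; the notation is mine, not copied from the slides): for every target concept c ∈ C, every distribution P over X, and every ε, δ ∈ (0, 1/2), the learner L must satisfy

```latex
\Pr\bigl[\,\mathrm{error}_{P}(h_{L}) \le \epsilon\,\bigr] \;\ge\; 1 - \delta,
\qquad
\text{time and examples used} \;=\; \mathrm{poly}\!\left(\tfrac{1}{\epsilon},\, \tfrac{1}{\delta},\, n,\, \mathrm{size}(c)\right),
```

where h_L is the hypothesis that L outputs.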

PAC Learnability

C is PAC-learnable if each target concept in C can be learned from a polynomial number of training examples and the processing time per example is also polynomially bounded, where "polynomial" is in terms of 1/ε, 1/δ, n (the size of the examples), and the encoding length of the target concept c.
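As a concrete instance of "a polynomial number of training examples", the standard bound for a finite hypothesis space H (Mitchell, Chapter 7) says it suffices for a consistent learner to see

```latex
m \;\ge\; \frac{1}{\epsilon}\left(\ln\lvert H\rvert + \ln\frac{1}{\delta}\right)
```

i.i.d. examples in order to output, with probability at least 1 − δ, a hypothesis with true error at most ε. For example, with |H| = 2^10, ε = 0.1 and δ = 0.05, this gives m ≥ 10 · (10 ln 2 + ln 20) ≈ 100 examples.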