PAC Learning — adapted from Tom M. Mitchell, Carnegie Mellon University
Learning Issues: Under what conditions is successful learning … possible? … assured for a particular learning algorithm?
Sample Complexity How many training examples are needed … for a learner to converge (with high probability) to a successful hypothesis?
Computational Complexity How much computational effort is needed … for a learner to converge (with high probability) to a successful hypothesis?
The world: X is the sample space. Example (two dice): X = {(1,1), (1,2), …, (6,5), (6,6)}
The weighted world: a distribution 𝒟 over X. Example (biased dice): {(1,1; p_11), (1,2; p_12), …, (6,5; p_65), (6,6; p_66)}
An event E is a subset of X. Example (a pair in two dice): E = {(1,1), (2,2), (3,3), (4,4), (5,5), (6,6)}
A concept c is the indicator function of an event E. Example (a pair in two dice): c(x,y) := (x == y)
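The sample space, the event, and its indicator concept can be made concrete in a few lines of Python (a small sketch of the two-dice example above):

```python
import itertools

# Sample space X for two dice: all 36 ordered outcomes.
X = list(itertools.product(range(1, 7), range(1, 7)))

# Concept c: the indicator function of the event "a pair".
def c(x, y):
    return int(x == y)

# The event E induced by c is exactly the subset of X where c is 1.
E = [(x, y) for (x, y) in X if c(x, y)]
print(E)  # [(1, 1), (2, 2), (3, 3), (4, 4), (5, 5), (6, 6)]
```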
A hypothesis h is an approximation to a concept c. Example (a separating hyperplane): h(x,y) := (1/2)·[1 + sign(a·x + b·y + c)]
The dataset D is an i.i.d. sample of m examples from (X, 𝒟): D = {(x_i, c(x_i))}, i = 1, …, m
An inductive learner L is an algorithm that uses the data D to produce a hypothesis h ∈ H. Example (the perceptron algorithm): h(x,y) := (1/2)·[1 + sign(a(D)·x + b(D)·y + c(D))]
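The perceptron example can be sketched as a learner that maps a dataset D to the coefficients a(D), b(D), c(D) of a linear threshold hypothesis. The separable toy concept x + y > 7 below is an assumption for illustration, not from the slides:

```python
def perceptron(D, epochs=1000):
    """Learn a linear threshold hypothesis from labeled data
    D = [((x, y), label)] with labels in {0, 1}.
    Returns h(x, y) = 1 if a*x + b*y + c > 0 else 0."""
    a = b = c = 0.0
    for _ in range(epochs):
        for (x, y), label in D:
            pred = 1 if a * x + b * y + c > 0 else 0
            # Standard perceptron update: shift the boundary on mistakes.
            a += (label - pred) * x
            b += (label - pred) * y
            c += (label - pred)
    return lambda x, y: 1 if a * x + b * y + c > 0 else 0

# Toy linearly separable concept over the two-dice grid: x + y > 7.
D = [((x, y), int(x + y > 7)) for x in range(1, 7) for y in range(1, 7)]
h = perceptron(D)
print(all(h(x, y) == label for (x, y), label in D))  # True
```

Because this toy dataset is linearly separable, the perceptron convergence theorem guarantees the learned h eventually classifies every training example correctly.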
Error measures: the training error of hypothesis h is how often h(x) ≠ c(x) over the training instances; the true error of h is how often h(x) ≠ c(x) over future instances drawn at random from 𝒟.
True error: error_𝒟(h) ≡ Pr_{x ~ 𝒟}[c(x) ≠ h(x)], the probability that h misclassifies an instance drawn at random according to 𝒟.
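The true error can be estimated by Monte Carlo sampling from 𝒟. A minimal sketch, assuming the uniform two-dice distribution and a hypothetical hypothesis that mislabels exactly one outcome:

```python
import random

random.seed(0)

# Target concept: "a pair". Hypothetical hypothesis: wrong only on (6, 6).
def c(x, y): return int(x == y)
def h(x, y): return int(x == y and x != 6)

# Under the uniform distribution D on 36 outcomes,
# error_D(h) = Pr[(x,y) = (6,6)] = 1/36.
m = 100_000
sample = [(random.randint(1, 6), random.randint(1, 6)) for _ in range(m)]
est = sum(h(x, y) != c(x, y) for (x, y) in sample) / m
print(round(est, 3))  # close to 1/36 ≈ 0.028
```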
Learnability: how should learnability be described? One candidate: the number of training examples needed to learn a hypothesis h for which error_𝒟(h) = 0. Infeasible: a finite random sample cannot guarantee zero true error.
PAC Learnability: weaken the demands on the learner. Require only true error < ε (accuracy), with failure probability at most δ; ε and δ can be made arbitrarily small. This is Probably Approximately Correct learning.
PAC Learnability: C is PAC-learnable by L if L achieves true error < ε with probability at least (1 − δ), after a reasonable number of examples and reasonable time per example. "Reasonable" means polynomial in 1/ε, 1/δ, n (the size of examples), and the encoding length of the target concept.
PAC Learnability: C is PAC-learnable if each target concept in C can be learned from a polynomial number of training examples, and the processing time per example is also polynomially bounded — polynomial in 1/ε, 1/δ, n (the size of examples), and the encoding length of the target concept c.
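For a consistent learner over a finite hypothesis space H, the classic PAC bound states that m ≥ (1/ε)(ln|H| + ln(1/δ)) examples suffice for true error < ε with probability at least 1 − δ. A sketch of that bound (the finite-H setting is an assumption; the slides only state the polynomial requirement):

```python
from math import ceil, log

def sample_complexity(H_size, eps, delta):
    """Sample-complexity bound for a consistent learner over a finite
    hypothesis space H: m >= (1/eps) * (ln|H| + ln(1/delta)) examples
    suffice for true error < eps with probability at least 1 - delta."""
    return ceil((log(H_size) + log(1 / delta)) / eps)

# E.g. |H| = 2^10 hypotheses, 10% error tolerance, 5% failure probability:
print(sample_complexity(2**10, eps=0.1, delta=0.05))  # 100
```

Note the bound is polynomial in 1/ε and 1/δ and logarithmic in |H|, which is exactly the "reasonable" growth the PAC definition demands.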