Lehrstuhl für Informatik 2 Gabriella Kókai: Maschine Learning 1 Computational Learning Theory.


1 Computational Learning Theory

2 Content
- Introduction
- Probably Learning an Approximately Correct Hypothesis
- Sample Complexity for Finite Hypothesis Spaces
- Sample Complexity for Infinite Hypothesis Spaces
- The Mistake Bound Model of Learning
- Summary

3 Introduction
Goal: a theoretical characterisation of
- the difficulty of several types of ML problems
- the capabilities of several types of ML algorithms
Questions to answer:
- Under what conditions is successful learning possible, and under what conditions is it impossible?
- Under what conditions is a particular ML algorithm assured to learn successfully?
PAC:
- Identify classes of hypotheses that can or cannot be learned from a polynomial number of training examples
- Define a natural complexity measure for hypothesis spaces that makes it possible to bound the number of training examples required for inductive learning

4 Introduction 2
Task
- Given: some training examples of a target concept c and a space of candidate hypotheses H
- Goal: inductive learning of a hypothesis that approximates c
Questions
- Sample complexity: How many training examples are needed for a learner to converge (with high probability) to a successful hypothesis?
- Computational complexity: How much computational effort is needed for a learner to converge (with high probability) to a successful hypothesis?
- Mistake bound: How many training examples will the learner misclassify before converging to a successful hypothesis?

5 Introduction 3
It is possible to set quantitative bounds on these measures, depending on attributes of the learning problem such as:
- the size or complexity of the hypothesis space considered by the learner
- the accuracy to which the target concept must be approximated
- the probability that the learner will output a successful hypothesis
- the manner in which training examples are presented to the learner

6 Content
- Introduction
- Probably Learning an Approximately Correct Hypothesis
  - The Problem Setting
  - Error of the Hypothesis
  - PAC Learnability
- Sample Complexity for Finite Hypothesis Spaces
- Sample Complexity for Infinite Hypothesis Spaces
- The Mistake Bound Model of Learning
- Summary

7 Probably Learning an Approximately Correct Hypothesis
PAC (probably approximately correct): probably learning an approximately correct solution
Restriction: we only consider learning boolean-valued concepts from noise-free training data
- The results can be extended to the more general scenario of learning real-valued target functions (Natarajan 1991)
- The results can be extended to learning from certain types of noisy data (Laird 1988, Kearns and Vazirani 1994)

8 The Problem Setting
Notation:
- X: set of all possible instances over which target functions may be defined
- C: set of target concepts that our learner L might be called upon to learn
- D: probability distribution over X, generally not known to the learner; assumed stationary (the distribution does not change over time)
- T: set of training examples
- H: space of candidate hypotheses
Each target concept c in C corresponds to some subset of X or, equivalently, to some boolean-valued function $c: X \to \{0, 1\}$.
Sought: after observing a sequence of training examples of c, L must output some h from H that estimates c.
Evaluation of the success of L: the performance of h over new instances drawn randomly from X according to D.

9 Error of the Hypothesis
True error: the error of h with respect to c and D,
$$error_D(h) \equiv \Pr_{x \in D}[c(x) \neq h(x)]$$
The true error is not directly observable: L can only observe the performance of h over the training examples.
Training error: the fraction of training examples misclassified by h,
$$error_T(h) \equiv \frac{|\{\langle x, c(x) \rangle \in T : h(x) \neq c(x)\}|}{|T|}$$
Analysis: how probable is it that the observed training error for h gives a misleading estimate of the true error $error_D(h)$?
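To make the two error notions concrete, here is a minimal Python sketch under hypothetical choices that are not on the slide: instances are 4-bit vectors, D is assumed uniform over them, the target concept is $c(x) = x_0 \wedge x_1$ and the hypothesis is $h(x) = x_0$. Because D is assumed uniform, the true error can be computed exactly by enumerating X, while the training error is measured on examples drawn from D.

```python
import itertools
import random

N_BITS = 4  # hypothetical instance space X = {0,1}^4

def target(x):
    """Hypothetical target concept c(x) = x0 AND x1."""
    return int(x[0] == 1 and x[1] == 1)

def hypothesis(x):
    """Hypothetical hypothesis h(x) = x0."""
    return int(x[0] == 1)

def true_error(h, c, n_bits):
    """error_D(h) for D assumed uniform over {0,1}^n: fraction of all instances where h and c disagree."""
    instances = list(itertools.product([0, 1], repeat=n_bits))
    return sum(h(x) != c(x) for x in instances) / len(instances)

def training_error(h, examples):
    """error_T(h): fraction of training examples <x, c(x)> misclassified by h."""
    return sum(h(x) != label for x, label in examples) / len(examples)

def draw_examples(c, n_bits, m, rng):
    """Draw m instances i.i.d. from the uniform D and label them with c."""
    xs = [tuple(rng.randint(0, 1) for _ in range(n_bits)) for _ in range(m)]
    return [(x, c(x)) for x in xs]

rng = random.Random(0)
print(true_error(hypothesis, target, N_BITS))  # 0.25
for m in (5, 50, 5000):
    T = draw_examples(target, N_BITS, m, rng)
    print(m, training_error(hypothesis, T))
```

With only a few examples the training error can be far from 0.25; as m grows it concentrates around the true error, which is exactly the question the analysis on this slide asks.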

10 PAC Learnability
Goal: characterise classes of target concepts that can be reliably learned from a reasonable number of randomly drawn training examples and a reasonable amount of computation
A possible definition of successful training: search for a hypothesis with $error_D(h) = 0$. Problems with this:
- multiple hypotheses may be consistent with the training examples
- the training set may be non-representative
Definition of PAC learnability: Consider a concept class C defined over a set of instances X of length n and a learner L using hypothesis space H. C is PAC-learnable by L using H if for all $c \in C$, distributions D over X, $\epsilon$ such that $0 < \epsilon < 1/2$, and $\delta$ such that $0 < \delta < 1/2$, the learner L will with probability at least $(1 - \delta)$ output a hypothesis $h \in H$ such that $error_D(h) \le \epsilon$, in time that is polynomial in $1/\epsilon$, $1/\delta$, n and size(c).

11 Content
- Introduction
- Probably Learning an Approximately Correct Hypothesis
- Sample Complexity for Finite Hypothesis Spaces
  - Agnostic Learning and Inconsistent Hypotheses
  - Conjunctions of Boolean Literals Are PAC-Learnable
- Sample Complexity for Infinite Hypothesis Spaces
- The Mistake Bound Model of Learning
- Summary

12 Sample Complexity for Finite Hypothesis Spaces
Definition: the sample complexity of a learning problem is the number of training examples required for successful learning; it depends on the constraints of the learning problem.
Consistent learner: outputs a hypothesis that perfectly fits the training data whenever possible.
Question: can we derive a bound on the number of training examples required by any consistent learner, independent of the specific algorithm it uses to derive a consistent hypothesis? Yes.
Significance of the version space $VS_{H,T} = \{h \in H \mid (\forall \langle x, c(x) \rangle \in T)\; h(x) = c(x)\}$: every consistent learner outputs a hypothesis belonging to the version space.
Therefore, to bound the number of examples needed by any consistent learner, we need only bound the number of examples needed to assure that the version space contains no unacceptable hypotheses.

13 Sample Complexity for Finite Hypothesis Spaces 2
Definition of ε-exhausted (Haussler 1988): Consider a hypothesis space H, target concept c, instance distribution D and a set of training examples T of c. The version space $VS_{H,T}$ is said to be ε-exhausted with respect to c and D if every hypothesis h in $VS_{H,T}$ has error less than ε with respect to c and D:
$$(\forall h \in VS_{H,T})\; error_D(h) < \epsilon$$
[Figure: a version space $VS_{H,T}$ that is 0.3-exhausted but not 0.1-exhausted]

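14 Sample Complexity for Finite Hypothesis Spaces 3
Theorem: ε-exhausting the version space (Haussler 1988). If the hypothesis space H is finite and T is a sequence of $m \ge 1$ independent randomly drawn examples of some target concept c, then for any $0 \le \epsilon \le 1$ the probability that $VS_{H,T}$ is not ε-exhausted (with respect to c) is less than or equal to
$$|H| e^{-\epsilon m}$$
Important consequence: to keep this failure probability below a given upper limit δ, require $|H| e^{-\epsilon m} \le \delta$, i.e. choose
$$m \ge \frac{1}{\epsilon}\left(\ln|H| + \ln\frac{1}{\delta}\right)$$
Hint 1: m grows linearly in $1/\epsilon$, logarithmically in $1/\delta$, and logarithmically in the size of H.
Hint 2: the bound can be a substantial overestimate; for large |H| the probability bound $|H| e^{-\epsilon m}$ can even exceed 1.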
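The bound and its consequence can be evaluated directly. This is a minimal Python sketch; the function names and the example value |H| = 1000 are hypothetical choices for illustration.

```python
import math

def haussler_failure_bound(h_size, eps, m):
    """Upper bound |H| * e^(-eps * m) on Pr[VS_{H,T} is not eps-exhausted]."""
    return h_size * math.exp(-eps * m)

def sample_complexity_consistent(h_size, eps, delta):
    """Smallest integer m with m >= (1/eps) * (ln|H| + ln(1/delta))."""
    return math.ceil((math.log(h_size) + math.log(1 / delta)) / eps)

m = sample_complexity_consistent(1000, 0.1, 0.05)
print(m)                                               # 100
print(haussler_failure_bound(1000, 0.1, m) <= 0.05)    # True
# Halving eps roughly doubles m (linear in 1/eps):
print(sample_complexity_consistent(1000, 0.05, 0.05))  # 199
# Hint 2: with few examples the "probability" bound is larger than 1.
print(round(haussler_failure_bound(1000, 0.1, 10), 1))  # 367.9
```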

15 Agnostic Learning and Inconsistent Hypotheses
Problem: a consistent hypothesis is not always possible (e.g. H does not contain c).
Agnostic learning: choose the hypothesis with the smallest training error,
$$h_{best} = \arg\min_{h \in H} error_T(h)$$
Sought: m such that, with high probability, $error_D(h_{best}) \le error_T(h_{best}) + \epsilon$.

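16 Agnostic Learning and Inconsistent Hypotheses 2
Analogy: m independent coin flips showing heads with some probability (m distinct trials of a Bernoulli experiment).
Hoeffding bound: characterises the deviation between the true probability of some event and its observed frequency over m independent trials:
$$\Pr[error_D(h) > error_T(h) + \epsilon] \le e^{-2m\epsilon^2}$$
Requirement: the error of $h_{best}$ must be bounded; since $h_{best}$ could be any hypothesis in H,
$$\Pr[(\exists h \in H)\; error_D(h) > error_T(h) + \epsilon] \le |H| e^{-2m\epsilon^2}$$
Interpretation: given δ, choose
$$m \ge \frac{1}{2\epsilon^2}\left(\ln|H| + \ln\frac{1}{\delta}\right)$$
m still depends logarithmically on |H| and on $1/\delta$, but it now grows as $1/\epsilon^2$ instead of $1/\epsilon$.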
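A short Python sketch comparing the consistent-learner bound with the agnostic (Hoeffding) bound shows the effect of the $1/\epsilon^2$ growth. The function names and the example values (|H| = 1000, δ = 0.05) are hypothetical.

```python
import math

def sample_complexity_consistent(h_size, eps, delta):
    """m >= (1/eps) * (ln|H| + ln(1/delta)) for a consistent learner."""
    return math.ceil((math.log(h_size) + math.log(1 / delta)) / eps)

def sample_complexity_agnostic(h_size, eps, delta):
    """m >= 1/(2 eps^2) * (ln|H| + ln(1/delta)), from the Hoeffding bound."""
    return math.ceil((math.log(h_size) + math.log(1 / delta)) / (2 * eps ** 2))

for eps in (0.1, 0.05):
    print(eps,
          sample_complexity_consistent(1000, eps, 0.05),
          sample_complexity_agnostic(1000, eps, 0.05))
# 0.1 100 496
# 0.05 199 1981
```

Halving ε roughly doubles the consistent-learner bound but roughly quadruples the agnostic bound.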

17 Conjunctions of Boolean Literals Are PAC-Learnable
Example: C is the class of target concepts described by conjunctions of boolean literals (a literal is any boolean variable or its negation).
Is C PAC-learnable? Yes:
- any consistent learner requires only a polynomial number of training examples to learn any c in C
- there is a specific algorithm that uses polynomial time per training example
Assumption H = C, so $|H| = 3^n$ (each of the n variables appears positively, negated, or not at all). From Haussler's theorem:
$$m \ge \frac{1}{\epsilon}\left(n \ln 3 + \ln\frac{1}{\delta}\right)$$
m grows linearly in the number of literals n, linearly in $1/\epsilon$, and logarithmically in $1/\delta$.

18 Conjunctions of Boolean Literals Are PAC-Learnable 2
Example with numbers: n = 10 boolean variables; wanted: 95% confidence ($\delta = 0.05$) that the error of the hypothesis is at most ε.
Algorithm with polynomial computing time: the Find-S algorithm computes, for each new positive training example, the intersection of the literals shared by the current hypothesis and the new training example, using time linear in n.
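Plugging the numbers into the bound from the previous slide gives a concrete sample size. This Python sketch assumes ε = 0.1 purely for illustration; the function name is hypothetical.

```python
import math

def sample_complexity_conjunctions(n, eps, delta):
    """m >= (1/eps) * (n ln 3 + ln(1/delta)), using |H| = 3^n."""
    return math.ceil((n * math.log(3) + math.log(1 / delta)) / eps)

# n = 10 variables, delta = 0.05 (95% confidence), eps = 0.1 (assumed value).
print(sample_complexity_conjunctions(10, 0.1, 0.05))  # 140

# m grows linearly in n:
for n in (10, 20, 40):
    print(n, sample_complexity_conjunctions(n, 0.1, 0.05))
```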

19 Find-S: Finding a Maximally Specific Hypothesis
Use the more_general_than partial ordering:
- Begin with the most specific possible hypothesis in H
- Generalise this hypothesis each time it fails to cover an observed positive example
1. Initialise h to the most specific hypothesis in H
2. For each positive training instance x:
   For each attribute constraint $a_i$ in h:
     If the constraint $a_i$ is satisfied by x, then do nothing
     Else replace $a_i$ in h by the next more general constraint that is satisfied by x
3. Output hypothesis h
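The three steps above can be sketched in Python for attribute-vector hypotheses. The encoding is an assumption: `None` stands for "no value is acceptable" (the most specific constraint) and `"?"` for "any value". The toy data is in the style of Mitchell's EnjoySport example and should be read as illustrative.

```python
# Hypothetical encoding of attribute constraints:
EMPTY = None  # no value is acceptable (most specific constraint)
ANY = "?"     # any value is acceptable (most general constraint)

def find_s(examples, n_attributes):
    """Return the maximally specific hypothesis covering all positive examples."""
    h = [EMPTY] * n_attributes                 # 1. most specific hypothesis in H
    for x, is_positive in examples:
        if not is_positive:                    # negative examples are ignored
            continue
        for i, value in enumerate(x):          # 2. check each attribute constraint
            if h[i] is EMPTY:
                h[i] = value                   # next more general: exactly this value
            elif h[i] != ANY and h[i] != value:
                h[i] = ANY                     # next more general: any value
    return h                                   # 3. output h

# Illustrative toy data (sky, air temp, humidity, wind, water, forecast).
examples = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), True),
    (("Sunny", "Warm", "High", "Strong", "Warm", "Same"), True),
    (("Rainy", "Cold", "High", "Strong", "Warm", "Change"), False),
    (("Sunny", "Warm", "High", "Strong", "Cool", "Change"), True),
]
print(find_s(examples, 6))  # ['Sunny', 'Warm', '?', 'Strong', '?', '?']
```

Each positive example is processed in time linear in the number of attributes, matching the linear-time claim on the previous slide.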

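20 Find-S: Finding a Maximally Specific Hypothesis (Example)
[Figure: trace of Find-S on a sequence of training examples]
1. Step: initialise h to the most specific hypothesis.
2. Step (first example, positive): h is generalised to cover this example.
3. Step (second example, positive): substitute a '?' in place of any attribute value in h that is not satisfied by the new example.
Third example, negative: the Find-S algorithm simply ignores every negative example.
4. Step: [h after the next positive example; shown in the figure]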

21 Content
- Introduction
- Probably Learning an Approximately Correct Hypothesis
- Sample Complexity for Finite Hypothesis Spaces
- Sample Complexity for Infinite Hypothesis Spaces
  - Shattering a Set of Instances
  - The Vapnik-Chervonenkis Dimension
  - Sample Complexity and the VC Dimension
- The Mistake Bound Model of Learning
- Summary

22 Sample Complexity for Infinite Hypothesis Spaces
Disadvantages of the previous estimate:
- it is a weak (loose) bound
- it cannot be used for an infinite hypothesis space
Definition: Shattering a Set of Instances. A set of instances S is shattered by hypothesis space H if and only if for every dichotomy of S there exists some hypothesis in H consistent with this dichotomy.
Here the measure is based not on the number of distinct hypotheses |H|, but on the number of distinct instances from X that can be completely discriminated using H.
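The definition of shattering translates into a brute-force check: collect every labelling of S produced by some hypothesis and compare it with all $2^{|S|}$ dichotomies. This Python sketch uses a hypothetical class of threshold hypotheses $h_t(x) = (x \ge t)$ on the real line, represented by a finite set of thresholds that covers every labelling a threshold can produce on S.

```python
from itertools import product

def is_shattered(S, H):
    """True iff every dichotomy of S is produced by some hypothesis h in H."""
    realised = {tuple(h(x) for x in S) for h in H}
    return all(d in realised for d in product((False, True), repeat=len(S)))

def thresholds(S):
    """Hypothetical H: threshold hypotheses h_t(x) = (x >= t). Thresholds at the
    points of S plus one above max(S) realise every labelling thresholds can give on S."""
    ts = sorted(S) + [max(S) + 1]
    return [lambda x, t=t: x >= t for t in ts]

print(is_shattered([0.5], thresholds([0.5])))            # True
print(is_shattered([1.0, 2.0], thresholds([1.0, 2.0])))  # False: (+, -) is impossible
```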

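23 Shattering a Set of Instances
[Figure: a set of instances and the hypotheses in H that realise its dichotomies]
Follows from the definition: shattering is judged from the aspect of all hypotheses in H; a set is not shattered if some dichotomy of it is realised by no hypothesis h in H.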

24 The Vapnik-Chervonenkis Dimension
A set S of instances has $2^{|S|}$ different dichotomies.
Definition (Vapnik-Chervonenkis dimension): The Vapnik-Chervonenkis dimension, VC(H), of hypothesis space H defined over instance space X is the size of the largest finite subset of X shattered by H. If arbitrarily large finite sets of X can be shattered by H, then $VC(H) \equiv \infty$.
Example: Let $X = \mathbb{R}$ and H the set of intervals on the real numbers, $a < x < b$. VC(H) = ?
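The interval example can be checked by brute force in Python. This sketch uses closed intervals [a, b] with endpoints taken from S, plus one empty interval; on a finite S these realise every labelling any real interval (open or closed) can produce, since the positive points of such a labelling are always contiguous in sorted order. The helper names are hypothetical.

```python
from itertools import product

def is_shattered(S, H):
    """True iff every dichotomy of S is produced by some hypothesis in H."""
    realised = {tuple(h(x) for x in S) for h in H}
    return all(d in realised for d in product((False, True), repeat=len(S)))

def intervals(S):
    """Hypothetical finite stand-in for 'all intervals on the real line' restricted to S."""
    hyps = [lambda x: False]  # empty interval: every point is negative
    hyps += [lambda x, a=a, b=b: a <= x <= b for a in S for b in S if a <= b]
    return hyps

print(is_shattered([1.0, 2.0], intervals([1.0, 2.0])))            # True
print(is_shattered([1.0, 2.0, 3.0], intervals([1.0, 2.0, 3.0])))  # False
```

The three-point set fails on the dichotomy (+, -, +), and the same argument applies to any three points on the line, so for intervals VC(H) = 2.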

25 The Vapnik-Chervonenkis Dimension 2
Example: Let X be the set of points in the x, y plane and H the set of linear decision surfaces in the plane; VC(H) = 3.
[Figure panels: general case, shattering is obvious; irregular special case; no shattering possible]
Three points in general position can be shattered by lines, so VC(H) is at least 3; no set of four points in the plane can be shattered.

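26 Sample Complexity and the VC Dimension
Earlier: $m \ge \frac{1}{\epsilon}(\ln|H| + \ln(1/\delta))$. Using the VC dimension instead of |H|,
$$m \ge \frac{1}{\epsilon}\left(4\log_2\frac{2}{\delta} + 8\,VC(H)\log_2\frac{13}{\epsilon}\right)$$
randomly drawn examples suffice to probably approximately learn any c in C.
Theorem: Lower bound on sample complexity. Consider any concept class C such that $VC(C) \ge 2$, any learner L, and any $0 < \epsilon < 1/8$ and $0 < \delta < 1/100$. Then there exists a distribution D and a target concept in C such that if L observes fewer examples than
$$\max\left[\frac{1}{\epsilon}\log\frac{1}{\delta},\; \frac{VC(C) - 1}{32\epsilon}\right]$$
then with probability at least δ, L outputs a hypothesis h having $error_D(h) > \epsilon$.
Hint: both bounds are logarithmic in $1/\delta$ and linear in VC(H).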
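Both bounds are easy to evaluate numerically. In this Python sketch the example values VC = 3 (lines in the plane), ε = 0.1 and δ = 0.005 are chosen to satisfy the theorem's conditions; the slide writes the lower bound with an unspecified "log", and the natural logarithm used here is an assumption.

```python
import math

def vc_upper_bound(vc, eps, delta):
    """Sufficient m: (1/eps) * (4 log2(2/delta) + 8 VC(H) log2(13/eps))."""
    return math.ceil((4 * math.log2(2 / delta) + 8 * vc * math.log2(13 / eps)) / eps)

def vc_lower_bound(vc, eps, delta):
    """max[(1/eps) log(1/delta), (VC(C) - 1) / (32 eps)]; natural log assumed."""
    if not (vc >= 2 and 0 < eps < 1 / 8 and 0 < delta < 1 / 100):
        raise ValueError("theorem requires VC(C) >= 2, 0 < eps < 1/8, 0 < delta < 1/100")
    return max(math.log(1 / delta) / eps, (vc - 1) / (32 * eps))

print(vc_upper_bound(3, 0.1, 0.005))            # 2032
print(round(vc_lower_bound(3, 0.1, 0.005), 1))
```

The gap between the two numbers illustrates how loose the sufficient bound can be compared with the necessary one.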

27 Content
- Introduction
- Probably Learning an Approximately Correct Hypothesis
- Sample Complexity for Finite Hypothesis Spaces
- Sample Complexity for Infinite Hypothesis Spaces
- The Mistake Bound Model of Learning
  - The Mistake Bound for the FIND-S Algorithm
  - The Mistake Bound for the HALVING Algorithm
  - Optimal Mistake Bounds
  - WEIGHTED-MAJORITY Algorithm
- Summary

28 The Mistake Bound Model of Learning
Mistake bound model: the learner is evaluated by the total number of mistakes it makes before it converges to the correct hypothesis.
Problem: inductive learning
- The learner receives a sequence of training examples, but after each instance x it must predict the target value c(x) before it is shown the correct target value by the trainer.
- Success criterion: exact learning or PAC learning
Question: how many mistakes will the learner make in its predictions before it learns the target concept? This matters in practical applications where learning must be done while the system is in actual use.
Exact learning: the learned hypothesis must agree with the target concept everywhere, $(\forall x \in X)\; h(x) = c(x)$.

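29 The Mistake Bound for the Find-S Algorithm
Assumption: $c \in H$, where H is the set of conjunctions of up to n boolean literals $l_1, \dots, l_n$ and their negations; learning without noise.
Find-S algorithm:
- Initialise h to the most specific hypothesis $l_1 \wedge \neg l_1 \wedge l_2 \wedge \neg l_2 \wedge \dots \wedge l_n \wedge \neg l_n$
- For each positive training instance x: remove from h any literal that is not satisfied by x
- Output hypothesis h
Can we bound the total number of mistakes that Find-S will make before exactly learning c? Yes.
Note: Find-S makes no errors on negative instances, because h is always at least as specific as c.
The first mistake removes n of the 2n literals, and each additional mistake removes at least one more literal. Hence Find-S makes at most n + 1 mistakes (worst case: the most general target concept, $(\forall x)\; c(x) = 1$).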
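The mistake-bound argument can be checked by running Find-S in the online protocol of the previous slide. In this Python sketch a conjunction is a set of literals `(i, v)` meaning "$x_i = v$" (a hypothetical encoding); the worst-case stream makes exactly n + 1 mistakes, and a random stream for a hypothetical target never exceeds that bound.

```python
import random

def satisfies(x, h):
    """True iff instance x satisfies conjunction h, a set of literals (i, v) meaning x[i] == v."""
    return all(x[i] == v for i, v in h)

def online_find_s(stream, n):
    """Find-S in the mistake-bound protocol: predict c(x) with the current h,
    count a mistake if the prediction is wrong, then learn from the revealed label."""
    h = {(i, v) for i in range(n) for v in (0, 1)}  # l_1 AND NOT l_1 AND ... (never satisfied)
    mistakes = 0
    for x, label in stream:
        if satisfies(x, h) != label:
            mistakes += 1
        if label:                                     # only positive examples change h
            h = {(i, v) for i, v in h if x[i] == v}   # drop literals not satisfied by x
    return h, mistakes

n = 5
# Worst case: target c(x) = 1 for all x; after the first example,
# each instance removes only one literal.
worst = [((0,) * n, True)] + [(tuple(int(j == i) for j in range(n)), True) for i in range(n)]
print(online_find_s(worst, n))  # (set(), 6)

# Hypothetical target x_0 AND NOT x_2 with randomly drawn instances.
rng = random.Random(0)
target = {(0, 1), (2, 0)}
xs = [tuple(rng.randint(0, 1) for _ in range(n)) for _ in range(100)]
stream = [(x, satisfies(x, target)) for x in xs]
h, mistakes = online_find_s(stream, n)
print(mistakes <= n + 1, target <= h)  # True True
```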

30 The Mistake Bound for the HALVING Algorithm
HALVING algorithm = refine the version space + predict by a majority vote over the version space.
Every mistake removes at least half of the current version space, so the algorithm makes at most $\lfloor \log_2 |H| \rfloor$ mistakes.
Note: the version space can also shrink in the case of a correct prediction.
Extension: WEIGHTED-MAJORITY algorithm (weighted vote).
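A minimal Python sketch of the HALVING algorithm on a small finite hypothesis space. The choices here are assumptions for illustration: H is the set of conjunctions over 3 boolean variables encoded as tuples (1 = required true, 0 = required false, None = unconstrained; the never-true concept is left out), so |H| = 27 and the bound is $\lfloor \log_2 27 \rfloor = 4$ mistakes; ties in the vote predict 0.

```python
import math
from itertools import product

def all_conjunctions(n):
    """Hypothetical H: each variable is required 1, required 0, or unconstrained (None)."""
    return list(product((1, 0, None), repeat=n))

def predict(h, x):
    return all(req is None or xi == req for req, xi in zip(h, x))

def halving(H, stream):
    """Predict by majority vote of the version space, then remove every
    hypothesis that disagrees with the revealed label."""
    vs = list(H)
    mistakes = 0
    for x, label in stream:
        votes_for_1 = sum(predict(h, x) for h in vs)
        prediction = 2 * votes_for_1 > len(vs)   # strict majority; ties predict 0
        if prediction != label:
            mistakes += 1                        # at least half of vs was wrong
        vs = [h for h in vs if predict(h, x) == label]
    return vs, mistakes

target = (1, None, 0)  # hypothetical target: x_0 AND NOT x_2
stream = [(x, predict(target, x)) for x in product((0, 1), repeat=3)]
vs, mistakes = halving(all_conjunctions(3), stream)
print(vs, mistakes, math.floor(math.log2(27)))
```

After all 8 instances the version space contains only the target, and the mistake count stays within the logarithmic bound.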

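31 Optimal Mistake Bounds
Question: what is the optimal mistake bound for an arbitrary concept class C, i.e. the lowest worst-case mistake bound over all possible learning algorithms?
Let H = C. For algorithm A, let $M_A(C) = \max_{c \in C} M_A(c)$ be the maximum number of mistakes A makes over all target concepts c in C and all possible training sequences. Then
$$Opt(C) \equiv \min_{A} M_A(C)$$
For example (Littlestone 1987):
$$VC(C) \le Opt(C) \le M_{Halving}(C) \le \log_2(|C|)$$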

32 WEIGHTED-MAJORITY Algorithm
- Generalisation of the HALVING algorithm
- Weighted vote among a pool of prediction algorithms
- Learns by altering the weight associated with each prediction algorithm
- Advantage: accommodates inconsistent training data
Note: with $\beta = 0$ it becomes the HALVING algorithm.
Theorem: relative mistake bound for WEIGHTED-MAJORITY. Let T be any sequence of training examples, let A be any set of n prediction algorithms, and let k be the minimum number of mistakes made by any algorithm in A for the training sequence T. Then the number of mistakes over T made by the WEIGHTED-MAJORITY algorithm using $\beta = 1/2$ is at most
$$2.4\,(k + \log_2 n)$$

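33 WEIGHTED-MAJORITY Algorithm 2
$a_i$ denotes the i-th prediction algorithm in the pool A of algorithms; $w_i$ denotes the weight associated with $a_i$.
- For each i, initialise $w_i \leftarrow 1$
- For each training example $\langle x, c(x) \rangle$:
  - Initialise $q_0$ and $q_1$ to 0
  - For each prediction algorithm $a_i$:
    - If $a_i(x) = 0$ then $q_0 \leftarrow q_0 + w_i$
    - If $a_i(x) = 1$ then $q_1 \leftarrow q_1 + w_i$
  - If $q_1 > q_0$ then predict $c(x) = 1$
  - If $q_0 > q_1$ then predict $c(x) = 0$
  - If $q_1 = q_0$ then predict 0 or 1 at random for c(x)
  - For each prediction algorithm $a_i$ in A: if $a_i(x) \neq c(x)$ then $w_i \leftarrow \beta w_i$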
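The pseudocode above translates almost line by line into Python. The experiment is hypothetical: a pool of 8 "experts", where expert i simply predicts bit i of the instance, and a target equal to bit 3 with about 10% of the labels flipped, so even the best expert makes some mistakes. The test checks the exact form of the theorem's bound, $(k + \log_2 n)/\log_2(4/3)$; the constant 2.4 on the previous slide is this $1/\log_2(4/3) \approx 2.41$ rounded.

```python
import math
import random

def weighted_majority(experts, stream, beta=0.5, rng=None):
    """WEIGHTED-MAJORITY: weighted vote of the experts; multiply the weight of
    every expert that was wrong by beta. Returns (weights, number of mistakes)."""
    rng = rng or random.Random(0)
    w = [1.0] * len(experts)
    mistakes = 0
    for x, label in stream:
        q0 = sum(wi for wi, a in zip(w, experts) if a(x) == 0)
        q1 = sum(wi for wi, a in zip(w, experts) if a(x) == 1)
        if q1 > q0:
            prediction = 1
        elif q0 > q1:
            prediction = 0
        else:
            prediction = rng.randint(0, 1)   # tie: predict 0 or 1 at random
        if prediction != label:
            mistakes += 1
        w = [wi * beta if a(x) != label else wi for wi, a in zip(w, experts)]
    return w, mistakes

rng = random.Random(42)
experts = [lambda x, i=i: x[i] for i in range(8)]   # hypothetical pool: expert i predicts bit i
stream = []
for _ in range(200):
    x = tuple(rng.randint(0, 1) for _ in range(8))
    label = x[3] if rng.random() > 0.1 else 1 - x[3]  # about 10% label noise
    stream.append((x, label))

weights, mistakes = weighted_majority(experts, stream)
k = min(sum(a(x) != y for x, y in stream) for a in experts)  # best expert's mistakes
print(mistakes, k, round(2.4 * (k + math.log2(len(experts))), 1))
```

Setting `beta=0.0` removes wrong experts entirely, which corresponds to the HALVING algorithm noted on the previous slide.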

34 Content
- Introduction
- Probably Learning an Approximately Correct Hypothesis
- Sample Complexity for Finite Hypothesis Spaces
- Sample Complexity for Infinite Hypothesis Spaces
- The Mistake Bound Model of Learning
- Summary

35 Summary
- PAC learning versus exact learning
- Consistent and inconsistent hypotheses, agnostic learning
- VC dimension: complexity of the hypothesis space, the size of the largest subset of instances that can be shattered
- Bounds on the number of training examples sufficient for successful learning under the PAC model
- Mistake bound model: analyses the number of training examples a learner will misclassify before it exactly learns the target concept
- WEIGHTED-MAJORITY algorithm: combines the weighted votes of multiple prediction algorithms to classify new instances

