Inductive Learning (2/2) Version Space and PAC Learning

Presentation on theme: "Inductive Learning (2/2) Version Space and PAC Learning"— Presentation transcript:

1 Inductive Learning (2/2) Version Space and PAC Learning
Russell and Norvig: Chapter 18, Sections 18.5 through 18.7; Chapter 19, Sections 19.1 through 19.3. CS121 – Winter 2003

2 Version Space and PAC Learning
Contents: introduction to inductive learning; logic-based inductive learning (decision tree method, version space method); function-based inductive learning (neural nets); PAC learning.

3 Inductive Learning Scheme
Example set X: {[A, B, …, CONCEPT]}. Hypothesis space H: {[CONCEPT(x) ⇔ S(A, B, …)]}. A training set D of positive (+) and negative (−) examples is drawn from X, and the learner outputs an inductive hypothesis h.

4 Predicate-Learning Methods
Two methods: the decision tree method and the version space method. The version space method uses an explicit representation of the hypothesis space H and needs to provide H with some "structure".

5 Version Space Method
V is the version space. Initially V ← H. For every example x in the training set D: eliminate from V every hypothesis that does not agree with x; if V is empty, then return failure. Return V. But the size of V is enormous! Idea: define a partial ordering on the hypotheses in H and only represent the upper and lower bounds of V for this ordering. Compared to the decision tree method, this algorithm is incremental and least-commitment.
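As a concrete illustration, here is a minimal Python sketch of this naive elimination loop (not from the slides); it assumes hypotheses are modeled as predicates and examples as (object, label) pairs:

```python
# Naive version-space learning: keep every hypothesis that agrees with
# all examples seen so far. Names and representation are illustrative.
def version_space(hypotheses, examples):
    V = list(hypotheses)
    for x, label in examples:
        # Eliminate from V every hypothesis that does not agree with x.
        V = [h for h in V if h(x) == label]
        if not V:
            return None  # failure: no hypothesis agrees with all examples
    return V
```

Even in this form, V starts out with |H| members, which is exactly why the boundary representation developed below is needed.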

6 Version Space and PAC Learning
Rewarded Card Example. (r=1) ∨ … ∨ (r=10) ∨ (r=J) ∨ (r=Q) ∨ (r=K) ⇔ ANY-RANK(r); (r=1) ∨ … ∨ (r=10) ⇔ NUM(r); (r=J) ∨ (r=Q) ∨ (r=K) ⇔ FACE(r); (s=♠) ∨ (s=♣) ∨ (s=♦) ∨ (s=♥) ⇔ ANY-SUIT(s); (s=♠) ∨ (s=♣) ⇔ BLACK(s); (s=♦) ∨ (s=♥) ⇔ RED(s). An hypothesis is any sentence of the form: R(r) ∧ S(s) ⇒ REWARD([r,s]), where R(r) is ANY-RANK(r), NUM(r), FACE(r), or (r=j), and S(s) is ANY-SUIT(s), BLACK(s), RED(s), or (s=k).

7 Simplified Representation
For simplicity, we represent a concept by rs, with: r = a, n, f, 1, …, 10, j, q, k and s = a, b, r, ♠, ♣, ♦, ♥. For example: n♣ represents NUM(r) ∧ (s=♣) ⇒ REWARD([r,s]), and aa represents ANY-RANK(r) ∧ ANY-SUIT(s) ⇒ REWARD([r,s]).

8 Extension of an Hypothesis
The extension of an hypothesis h is the set of objects that satisfy h. Examples: the extension of f♠ is {j♠, q♠, k♠}; the extension of aa is the set of all cards.
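A hedged Python sketch of this representation and of extensions, with the ASCII letters S, C, D, H standing in for the four suit symbols (all names are illustrative, not from the slides):

```python
# The rs representation for the card example: a hypothesis is a
# (rank, suit) pair, where 'a', 'n', 'f' abbreviate ANY-RANK, NUM, FACE
# and 'a', 'b', 'r' abbreviate ANY-SUIT, BLACK, RED.
RANKS = ['1', '2', '3', '4', '5', '6', '7', '8', '9', '10', 'j', 'q', 'k']
SUITS = ['S', 'C', 'D', 'H']   # spades, clubs, diamonds, hearts
FACES = {'j', 'q', 'k'}

def rank_matches(R, r):
    if R == 'a': return True            # ANY-RANK
    if R == 'n': return r not in FACES  # NUM
    if R == 'f': return r in FACES      # FACE
    return R == r                       # a specific rank such as '4'

def suit_matches(S, s):
    if S == 'a': return True             # ANY-SUIT
    if S == 'b': return s in ('S', 'C')  # BLACK
    if S == 'r': return s in ('D', 'H')  # RED
    return S == s                        # a specific suit

def extension(h):
    """All cards (rank, suit) that satisfy hypothesis h = (R, S)."""
    R, S = h
    return {(r, s) for r in RANKS for s in SUITS
            if rank_matches(R, r) and suit_matches(S, s)}

# extension(('f', 'S')) == {('j', 'S'), ('q', 'S'), ('k', 'S')}
```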

9 More General/Specific Relation
Let h1 and h2 be two hypotheses in H. h1 is more general than h2 iff the extension of h1 is a proper superset of h2's. Examples: aa is more general than f♠; f♠ is more general than q♠; fr and nr are not comparable.

10 More General/Specific Relation
Let h1 and h2 be two hypotheses in H. h1 is more general than h2 iff the extension of h1 is a proper superset of h2's. The inverse of the "more general" relation is the "more specific" relation. The "more general" relation defines a partial ordering on the hypotheses in H.
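Given the extension sketch above, the ordering can be tested directly on sets (Python's set comparisons implement proper superset):

```python
def more_general(h1, h2):
    """h1 is strictly more general than h2: ext(h1) properly contains ext(h2)."""
    return extension(h1) > extension(h2)

def more_general_or_equal(h1, h2):
    return extension(h1) >= extension(h2)

# more_general(('a', 'a'), ('f', 'S'))  -> True  (aa vs f-spades)
# more_general(('f', 'r'), ('n', 'r'))  -> False (fr and nr not comparable)
```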

11 Example: Subset of Partial Order
[Diagram: a fragment of the partial order, with 4♣ at the bottom; its generalizations 4b, n♣, 4a, a♣, nb, na, ab above it; and aa at the top.]

12 Construction of Ordering Relation
[Diagram: the ordering is built per attribute. Ranks 1, …, 10 generalize to n; j, q, k generalize to f; n and f generalize to a. Suits ♠ and ♣ generalize to b; ♦ and ♥ generalize to r; b and r generalize to a.]

13 G-Boundary / S-Boundary of V
An hypothesis in V is most general iff no hypothesis in V is more general. G-boundary G of V: set of most general hypotheses in V.

14 G-Boundary / S-Boundary of V
An hypothesis in V is most general iff no hypothesis in V is more general. G-boundary G of V: set of most general hypotheses in V. An hypothesis in V is most specific iff no hypothesis in V is more specific. S-boundary S of V: set of most specific hypotheses in V.
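A brute-force sketch of both boundaries, reusing more_general from the earlier sketch:

```python
def boundaries(V):
    """G- and S-boundaries of a version space V, by pairwise comparison."""
    G = [h for h in V if not any(more_general(g, h) for g in V)]
    S = [h for h in V if not any(more_general(h, g) for g in V)]
    return G, S
```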

15 Example: G-/S-Boundaries of V
Initially, G = {aa} and S contains all most specific hypotheses: 1♠, …, k♥. Now suppose that 4♣ is given as a positive example. We replace every hypothesis in S whose extension does not contain 4♣ by its generalization set. [Diagram: the lattice between the row of single cards (S) and aa (G).]

16 Example: G-/S-Boundaries of V
[Diagram: G = {aa}, S = {4♣}, with na, ab, a♣, 4a, nb, n♣, 4b in between.] Here, both G and S have size 1. This is not the case in general!

17 Example: G-/S-Boundaries of V
The generalization set of an hypothesis h is the set of the hypotheses that are immediately more general than h. [Diagram: the generalization set of 4♣ is {n♣, 4b}.] Let 7♣ be the next (positive) example.

18 Example: G-/S-Boundaries of V
aa na ab 4a nb a 4b n Let 7 be the next (positive) example 4 Version Space and PAC Learning

19 Example: G-/S-Boundaries of V
Let 5♥ be the next (negative) example. [Diagram: the specialization set of aa (na, fa, ab, ar); G is specialized to exclude 5♥.]

20 Example: G-/S-Boundaries of V
G and S, and all hypotheses in between, form exactly the version space. 1. If an hypothesis between G and S disagreed with an example x, then an hypothesis in G or S would also disagree with x, hence would have been removed. [Diagram: the version space {ab, nb, a♣, n♣}.]

21 Example: G-/S-Boundaries of V
G and S, and all hypotheses in between, form exactly the version space. 2. If there were an hypothesis not in this set which agreed with all examples, then it would have to be either no more specific than any member of G – but then it would be in G – or no more general than some member of S – but then it would be in S. [Diagram: the version space {ab, nb, a♣, n♣}.]

22 Example: G-/S-Boundaries of V
At this stage, do 8♣, 6♥, j♠ satisfy CONCEPT? 8♣: Yes; 6♥: No; j♠: Maybe (the remaining hypotheses disagree on it). [Diagram: the version space {ab, nb, a♣, n♣}.]

23 Example: G-/S-Boundaries of V
ab nb a n Let 2 be the next (positive) example Version Space and PAC Learning

24 Example: G-/S-Boundaries of V
ab nb Let j be the next (negative) example Version Space and PAC Learning

25 Example: G-/S-Boundaries of V
+ 4 7 2 – 5 j nb NUM(r)  BLACK(s)  REWARD([r,s]) Version Space and PAC Learning

26 Example: G-/S-Boundaries of V
Let us return to the version space {ab, nb, a♣, n♣} … and let 8♣ be the next (negative) example. The only most specific hypothesis (n♣) disagrees with this example, hence no hypothesis in H agrees with all examples.

27 Example: G-/S-Boundaries of V
Let us return to the version space {ab, nb, a♣, n♣} … and let j♥ be the next (positive) example. The only most general hypothesis (ab) disagrees with this example, hence no hypothesis in H agrees with all examples.

28 Version Space and PAC Learning
Version Space Update. x ← new example. If x is positive then (G,S) ← POSITIVE-UPDATE(G,S,x), else (G,S) ← NEGATIVE-UPDATE(G,S,x). If G or S is empty, then return failure.
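A minimal sketch of this dispatch in Python, assuming the two update procedures sketched after the next slides (the extra H argument exists only because those sketches search the whole hypothesis space):

```python
def process_example(G, S, x, positive, H):
    # Dispatch on the label, then check whether either boundary collapsed.
    if positive:
        G, S = positive_update(G, S, x, H)
    else:
        G, S = negative_update(G, S, x, H)
    if not G or not S:
        return None  # failure: the version space is empty
    return G, S
```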

29 POSITIVE-UPDATE(G,S,x)
Eliminate all hypotheses in G that do not agree with x.

30 POSITIVE-UPDATE(G,S,x)
Eliminate all hypotheses in G that do not agree with x. Minimally generalize all hypotheses in S until they are consistent with x, using the generalization sets of the hypotheses.

31 POSITIVE-UPDATE(G,S,x)
Eliminate all hypotheses in G that do not agree with x. Minimally generalize all hypotheses in S until they are consistent with x. Remove from S every hypothesis that is neither more specific than nor equal to a hypothesis in G (this step was not needed in the card example).

32 POSITIVE-UPDATE(G,S,x)
Eliminate all hypotheses in G that do not agree with x. Minimally generalize all hypotheses in S until they are consistent with x. Remove from S every hypothesis that is neither more specific than nor equal to a hypothesis in G. Remove from S every hypothesis that is more general than another hypothesis in S. Return (G,S).
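A hedged brute-force sketch of these four steps. Instead of explicit generalization sets, it searches the whole hypothesis space H for minimal generalizations, which is viable only for a small H such as the card example:

```python
def covers(h, card):
    return card in extension(h)

def positive_update(G, S, x, H):
    # 1. Eliminate hypotheses in G that do not agree with the positive x.
    G = [g for g in G if covers(g, x)]
    # 2. Minimally generalize hypotheses in S until consistent with x.
    new_S = []
    for s in S:
        if covers(s, x):
            new_S.append(s)
        else:
            ups = [h for h in H
                   if covers(h, x) and more_general_or_equal(h, s)]
            new_S += [h for h in ups
                      if not any(more_general(h, u) for u in ups)]
    # 3. Keep only hypotheses more specific than or equal to a member of G.
    new_S = [s for s in new_S
             if any(more_general_or_equal(g, s) for g in G)]
    # 4. Drop hypotheses more general than another member of S.
    return G, [s for s in new_S
               if not any(more_general(s, t) for t in new_S)]
```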

33 NEGATIVE-UPDATE(G,S,x)
Eliminate all hypotheses in S that do not agree with x. Minimally specialize all hypotheses in G until they are consistent with x. Remove from G every hypothesis that is neither more general than nor equal to a hypothesis in S. Remove from G every hypothesis that is more specific than another hypothesis in G. Return (G,S).
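The dual sketch for a negative example, under the same brute-force assumptions:

```python
def negative_update(G, S, x, H):
    # 1. Eliminate hypotheses in S that cover the negative example x.
    S = [s for s in S if not covers(s, x)]
    # 2. Minimally specialize hypotheses in G until consistent with x.
    new_G = []
    for g in G:
        if not covers(g, x):
            new_G.append(g)
        else:
            downs = [h for h in H
                     if not covers(h, x) and more_general_or_equal(g, h)]
            new_G += [h for h in downs
                      if not any(more_general(u, h) for u in downs)]
    # 3. Keep only hypotheses more general than or equal to a member of S.
    new_G = [g for g in new_G
             if any(more_general_or_equal(g, s) for s in S)]
    # 4. Drop hypotheses more specific than another member of G.
    return [g for g in new_G
            if not any(more_general(t, g) for t in new_G)], S
```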

34 Example-Selection Strategy
Suppose that at each step the learning procedure can select the object (card) of the next example. Let it pick the object such that, whether the example turns out positive or negative, it will eliminate one-half of the remaining hypotheses. Then a single hypothesis will be isolated in O(log |H|) steps.

35 Version Space and PAC Learning
Example aa na ab 9? j? j? nb a n Version Space and PAC Learning

36 Example-Selection Strategy
Suppose that at each step the learning procedure can select the object (card) of the next example. Let it pick the object such that, whether the example turns out positive or negative, it will eliminate one-half of the remaining hypotheses. Then a single hypothesis will be isolated in O(log |H|) steps. But picking the object that eliminates half the version space may be expensive.
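One way to see the cost, as a sketch: score every candidate object by how evenly a positive/negative answer would split the current version space V, which already takes on the order of |V| · |X| extension tests per query (covers is from the update sketch above):

```python
def best_query(V, objects):
    """Pick the object whose answer splits V most evenly, whatever it is."""
    def imbalance(x):
        pos = sum(1 for h in V if covers(h, x))
        return abs(2 * pos - len(V))   # 0 means a perfect half/half split
    return min(objects, key=imbalance)
```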

37 Version Space and PAC Learning
Noise. If some examples are misclassified, the version space may collapse. Possible solution: maintain several G- and S-boundaries, e.g., consistent with all examples, with all examples but one, etc. (Exercise: develop this idea!)

38 Current-Best-Hypothesis Search
Keep one hypothesis at each step. Generalize or specialize the hypothesis at each new example. Details left as an exercise…

39 Version Space and PAC Learning
VSL vs DTL. Decision tree learning (DTL) is more efficient if all examples are given in advance; else, it may produce successive hypotheses, each poorly related to the previous one. Version space learning (VSL) is incremental. DTL can produce simplified hypotheses that do not agree with all examples. DTL has been more widely used in practice.

40 Can Inductive Learning Work?
Example set X, with f the correct hypothesis and p(x) the probability that example x is picked from X. A training set D of size m (positive and negative examples) is drawn from X. The hypothesis space H has size |H|, and the learner outputs an inductive hypothesis h.

41 Approximately Correct Hypothesis
h  H is approximately correct (AC) with accuracy e iff: Pr[h(x)  f(x)]  e where x is an example picked with probability distribution p from X Version Space and PAC Learning

42 PAC Learning Procedure
L is Probably Approximately Correct (PAC) with confidence δ iff: Pr[ Pr[h(x) ≠ f(x)] > ε ] ≤ δ. Can L be PAC? If yes, how big should the size m of the training set D be?

43 Version Space and PAC Learning
Can L Be PAC? Let g be an arbitrary element of H that is not approximately correct. Since g is not AC, we have: Pr[g(x) ≠ f(x)] > ε. So the probability that g is consistent with all the examples in D is at most (1−ε)^m … … and the probability that there exists a non-AC hypothesis matching all the examples in D is at most |H|(1−ε)^m.

44 Version Space and PAC Learning
Can L Be PAC? Let g be an arbitrary element of H that is not approximately correct. Since g is not AC, we have: Pr[g(x) ≠ f(x)] > ε. So the probability that g is consistent with all the examples in D is at most (1−ε)^m … … and the probability that there exists a non-AC hypothesis matching all the examples in D is at most |H|(1−ε)^m. Therefore, L is PAC if the size m of the training set verifies: |H|(1−ε)^m ≤ δ.

45 Version Space and PAC Learning
Size of Training Set. From |H|(1−ε)^m ≤ δ we derive: m ≥ ln(δ/|H|) / ln(1−ε). Since ε < −ln(1−ε) for 0 < ε < 1, this is implied by: m ≥ ln(|H|/δ) / ε. So m increases logarithmically with the size of the hypothesis space. But how big is |H|?
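A quick numeric check of this bound; the count |H| = 16 × 7 = 112 for the card example is my own tally of slide 6's predicates (16 rank options times 7 suit options), not a figure from the slides:

```python
import math

def sample_size(H_size, eps, delta):
    """Smallest m satisfying m >= ln(|H|/delta) / eps."""
    return math.ceil(math.log(H_size / delta) / eps)

print(sample_size(112, 0.1, 0.1))  # -> 71 examples suffice
```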

46 Version Space and PAC Learning
Importance of KIS Bias. If H is the set of all logical sentences with n base predicates, then |H| = 2^(2^n), and m is exponential in n. If H is the set of all conjunctions of k << n base predicates picked among n predicates, then |H| = O(n^k) and m is logarithmic in n. ⇒ Importance of choosing a "good" KIS bias.
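Since the bound above grows with ln |H|, comparing the two biases for the illustrative values n = 10 and k = 3 (my choice, not the slides') makes the gap concrete:

```python
import math

n, k = 10, 3
ln_H_all  = (2 ** n) * math.log(2)  # |H| = 2^(2^n): ln|H| = 2^n ln 2 ~ 709.8
ln_H_conj = k * math.log(n)         # |H| = O(n^k):  ln|H| ~ k ln n  ~ 6.9
print(ln_H_all, ln_H_conj)
```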

47 Explanation-Based Learning
KB: background knowledge. D: observed knowledge such that KB ⊭ D. Inductive learning: find h such that KB and h are consistent and KB,h ⊨ D. Explanation-based learning: find h such that KB = KB1,KB2 with KB1 ⊨ h and KB2,h ⊨ D. Example: derivatives of functions. KB1 is the general theory; D consists of examples; h defines the derivatives of usual functions; KB2 gives simplification rules. Nothing really new is learnt!

48 Version Space and PAC Learning
Summary: version space method; structure of the hypothesis space; generalization/specialization of hypotheses; PAC learning; explanation-based learning.

