Inductive Learning (2/2) Version Space and PAC Learning Russell and Norvig: Chapter 18, Sections 18.5 through 18.7 Chapter 18, Section 18.5 Chapter 19,

Inductive Learning (2/2) Version Space and PAC Learning Russell and Norvig: Chapter 18, Sections 18.5 through 18.7 Chapter 18, Section 18.5 Chapter 19, Sections 19.1 through 19.3 CS121 – Winter 2003

Contents Introduction to inductive learning Logic-based inductive learning: Decision tree method Version space method Function-based inductive learning Neural nets + PAC learning

+ + + + + + + + + + + + - - - - - - - - - - - - Example set X {[A, B, …, CONCEPT]} Inductive Learning Scheme Hypothesis space H {[CONCEPT(x)  S(A,B, …)]} Training set  Inductive hypothesis h

Predicate-Learning Methods Decision tree Version space Explicit representation of hypothesis space H Need to provide H with some “structure”

Version Space Method 1.V  H 2.For every example x in training set  do a. Eliminate from V every hypothesis that does not agree with x b. If V is empty then return failure 3.Return V Compared to the decision tree method, this algorithm is: incremental least-commitment But the size of V is enormous!!! V is the version space Idea: Define a partial ordering on the hypotheses in H and only represent the upper and lower bounds of V for this ordering

Rewarded Card Example (r=1) v … v (r=10) v (r=J) v (r=Q) v (r=K)  ANY-RANK(r) (r=1) v … v (r=10)  NUM(r) (r=J) v (r=Q) v (r=K)  FACE(r) (s=  ) v (s=  ) v (s=  ) v (s= )  ANY-SUIT(s) (s=  ) v (s=  )  BLACK(s) (s=  ) v (s= )  RED(s) An hypothesis is any sentence of the form: R(r)  S(s)  REWARD([r,s]) where: R(r) is ANY-RANK(r), NUM(r), FACE(r), or (r=j) S(s) is ANY-SUIT(s), BLACK(s), RED(s), or (s=k)

Simplified Representation For simplicity, we represent a concept by rs, with: r = a, n, f, 1, …, 10, j, q, k s = a, b, r, , , , For example: n  represents: NUM(r)  (s=  )  REWARD([r,s]) aa represents: ANY-RANK(r)  ANY-SUIT(s)  REWARD([r,s])

Extension of an Hypothesis The extension of an hypothesis h is the set of objects that verifies h Examples: The extension of f  is: {j , q , k  } The extension of aa is the set of all cards

More General/Specific Relation Let h 1 and h 2 be two hypotheses in H h 1 is more general than h 2 iff the extension of h 1 is a proper superset of h 2 ’s Examples: aa is more general than f  f is more general than q fr and nr are not comparable

More General/Specific Relation Let h 1 and h 2 be two hypotheses in H h 1 is more general than h 2 iff the extension of h 1 is a proper superset of h 2 ’s The inverse of the “more general” relation is the “more specific” relation The “more general” relation defines a partial ordering on the hypotheses in H

Example: Subset of Partial Order aa naab nb nn 44 4b aa 4a

Construction of Ordering Relation 110 n a f jk ……   b a r 

G-Boundary / S-Boundary of V An hypothesis in V is most general iff no hypothesis in V is more general G-boundary G of V: Set of most general hypotheses in V

G-Boundary / S-Boundary of V An hypothesis in V is most general iff no hypothesis in V is more general G-boundary G of V: Set of most general hypotheses in V An hypothesis in V is most specific iff no hypothesis in V is more general S-boundary S of V: Set of most specific hypotheses in V

aa naab nb nn 44 4b aa 4a Example: G-/S-Boundaries of V aa 44 11 k …… Now suppose that 4  is given as a positive example S G We replace every hypothesis in S whose extension does not contain 4  by its generalization set

Example: G-/S-Boundaries of V aa naab nb nn 44 4b aa 4a Here, both G and S have size 1. This is not the case in general!

Example: G-/S-Boundaries of V aa naab nb nn 44 4b aa 4a Let 7  be the next (positive) example Generalization set of 4  The generalization set of an hypothesis h is the set of the hypotheses that are immediately more general than h

Example: G-/S-Boundaries of V aa naab nb nn 44 4b aa 4a Let 7  be the next (positive) example

Example: G-/S-Boundaries of V aa naab nb nn aa Let 5 be the next (negative) example Specialization set of aa

Example: G-/S-Boundaries of V ab nb nn aa G and S, and all hypotheses in between form exactly the version space 1. If an hypothesis between G and S disagreed with an example x, then an hypothesis G or S would also disagree with x, hence would have been removed

Example: G-/S-Boundaries of V ab nb nn aa G and S, and all hypotheses in between form exactly the version space 2. If there were an hypothesis not in this set which agreed with all examples, then it would have to be either no more specific than any member of G – but then it would be in G – or no more general than some member of S – but then it would be in S

Example: G-/S-Boundaries of V ab nb nn aa Do 8 , 6 , j  satisfy CONCEPT? Yes No Maybe At this stage …

Example: G-/S-Boundaries of V ab nb nn aa Let 2  be the next (positive) example

Example: G-/S-Boundaries of V ab nb Let j  be the next (negative) example

Example: G-/S-Boundaries of V nb + 4  7  2  – 5 j  NUM(r)  BLACK(s)  REWARD([r,s])

Example: G-/S-Boundaries of V ab nb nn aa … and let 8  be the next (negative) example Let us return to the version space … The only most specific hypothesis disagrees with this example, hence no hypothesis in H agrees with all examples

Example: G-/S-Boundaries of V ab nb nn aa … and let j be the next (positive) example Let us return to the version space … The only most general hypothesis disagrees with this example, hence no hypothesis in H agrees with all examples

Version Space Update 1.x  new example 2.If x is positive then (G,S)  POSITIVE-UPDATE(G,S,x) 3.Else (G,S)  NEGATIVE-UPDATE(G,S,x) 4.If G or S is empty then return failure

POSITIVE-UPDATE(G,S,x) 1.Eliminate all hypotheses in G that do not agree with x

POSITIVE-UPDATE(G,S,x) 2.Minimally generalize all hypotheses in S until they are consistent with x Using the generalization sets of the hypotheses

POSITIVE-UPDATE(G,S,x) 1.Eliminate all hypotheses in G that do not agree with x 2.Minimally generalize all hypotheses in S until they are consistent with x 3.Remove from S every hypothesis that is neither more specific than nor equal to a hypothesis in G This step was not needed in the card example

POSITIVE-UPDATE(G,S,x) 1.Eliminate all hypotheses in G that do not agree with x 2.Minimally generalize all hypotheses in S until they are consistent with x 3.Remove from S every hypothesis that is neither more specific than nor equal to a hypothesis in G 4.Remove from S every hypothesis that is more general than another hypothesis in S 5.Return (G,S)

NEGATIVE-UPDATE(G,S,x) 1.Eliminate all hypotheses in S that do not agree with x 2.Minimally specialize all hypotheses in G until they are consistent with x 3.Remove from G every hypothesis that is neither more general than nor equal to a hypothesis in S 4.Remove from G every hypothesis that is more specific than another hypothesis in G 5.Return (G,S)

Example-Selection Strategy Suppose that at each step the learning procedure has the possibility to select the object (card) of the next example Let it pick the object such that, whether the example is positive or not, it will eliminate one-half of the remaining hypotheses Then a single hypothesis will be isolated in O(log |H|) steps

aa naab nb nn aa Example 9  ? j ? j  ?

Example-Selection Strategy Suppose that at each step the learning procedure has the possibility to select the object (card) of the next example Let it pick the object such that, whether the example is positive or not, it will eliminate one-half of the remaining hypotheses Then a single hypothesis will be isolated in O(log |H|) steps But picking the object that eliminates half the version space may be expensive

Noise If some examples are misclassified the version space may collapse Possible solution: Maintain several G- and S-boundaries, e.g., consistent with all examples, all examples but one, etc… (Exercise: Develop this idea!)

Current-Best-Hypothesis Search Keep one hypothesis at each step Generalize or specialize the hypothesis at each new example Details left as an exercise…

VSL vs DTL Decision tree learning (DTL) is more efficient if all examples are given in advance; else, it may produce successive hypotheses, each poorly related to the previous one Version space learning (VSL) is incremental DTL can produce simplified hypotheses that do not agree with all examples DTL has been more widely used in practice

+ + + + + + + + + + + + - - - - - - - - - - - - Example set X Can Inductive Learning Work? Hypothesis space H Training set  Inductive hypothesis h size m size |H| f : correct hypothesis p( x ): probability that example x is picked from X

Approximately Correct Hypothesis h  H is approximately correct (AC) with accuracy  iff: Pr[ h ( x )  f ( x )]   where x is an example picked with probability distribution p from X

PAC Learning Procedure L is Provably Approximately Correct (PAC) with confidence  iff: Pr[ Pr[ h ( x )  f ( x )] >  ]   Can L be PAC? If yes, how big should the size m of the training set  be?

Can L Be PAC? Let g be an arbitrary element of H that is not approximately correct Since g is not AC, we have: Pr[ g ( x )  f ( x )] >  So, the probability that g is consistent with all the examples in  is at most (1-  ) m … … and he probability that there exists a non-AC hypothesis matching all the examples in  is at most |H|(1-  ) m

Can L Be PAC? Let g be an arbitrary element of H that is not approximately correct Since g is not AC, we have: Pr[ g ( x )  f ( x )] >  So, the probability that g is consistent with all the examples in  is at most (1-  ) m … … and he probability that there exists a non-AC hypothesis matching all the examples in  is at most |H|(1-  ) m Therefore, L is PAC if the size m of the training set verifies: |H|(1-  ) m  

Size of Training Set From |H|(1-  ) m   we derive: m  ln(  /|H|) / ln(1-  ) Since  < -ln(1-  ) for 0<  <1, we have: m  ln(  /|H|) / (-  ) m  ln(|H|/  ) /  So, m increases logarithmically with the size of the hypothesis space But how big is |H|?

If H is the set of all logical sentences with n base predicates, then |H| =, and m is exponential in n If H is the set of all conjunctions of k << n base predicates picked among n predicates, then |H| = O(n k ) and m is logarithmic in n  Importance of choosing a “good” KIS bias Importance of KIS Bias 2 2n2n

Inductive learning Find h  such that KB and h are consistent KB, h  Explanation-Based Learning KB: Background knowledge  : Observed knowledge such that KB  Explanation-based learning Find h  such that KB = KB1,KB2 KB1 h KB2, h  Example: Derivatives of functions KB1 is the general theory  consists of examples h defines the derivatives of usual functions KB2 gives simplification rules Nothing really new is learnt!

Summary Version space method Structure of hypothesis space Generalization/specialization of hypothesis PAC learning Explanation-based learning

Inductive Learning (2/2) Version Space and PAC Learning Russell and Norvig: Chapter 18, Sections 18.5 through 18.7 Chapter 18, Section 18.5 Chapter 19,

Similar presentations

Presentation on theme: "Inductive Learning (2/2) Version Space and PAC Learning Russell and Norvig: Chapter 18, Sections 18.5 through 18.7 Chapter 18, Section 18.5 Chapter 19,"— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

Inductive Learning (2/2) Version Space and PAC Learning Russell and Norvig: Chapter 18, Sections 18.5 through 18.7 Chapter 18, Section 18.5 Chapter 19,

Similar presentations

Presentation on theme: "Inductive Learning (2/2) Version Space and PAC Learning Russell and Norvig: Chapter 18, Sections 18.5 through 18.7 Chapter 18, Section 18.5 Chapter 19,"— Presentation transcript:

Similar presentations

About project

Feedback