Inductive Learning (2/2): Version Space and PAC Learning. Russell and Norvig: Chapter 18, Sections 18.5 through 18.7; Chapter 19, Sections 19.1 through 19.3. CS121 – Winter 2003

Contents: Introduction to inductive learning; logic-based inductive learning (decision tree method, version space method); function-based inductive learning (neural nets); PAC learning.

Inductive Learning Scheme: example set X {[A, B, …, CONCEPT]}; hypothesis space H {[CONCEPT(x) ⇔ S(A, B, …)]}; training set D; inductive hypothesis h.

Predicate-Learning Methods: decision tree, version space. Explicit representation of the hypothesis space H; need to provide H with some “structure”.

Version Space Method: 1. V ← H. 2. For every example x in training set D do: (a) eliminate from V every hypothesis that does not agree with x; (b) if V is empty then return failure. 3. Return V. V is the version space. Compared to the decision tree method, this algorithm is incremental and least-commitment, but the size of V is enormous! Idea: define a partial ordering on the hypotheses in H and only represent the upper and lower bounds of V for this ordering.
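As a concrete illustration of this brute-force filter, here is a minimal Python sketch; the function and parameter names (version_space, agrees) are mine, not the slides'.

```python
from typing import Callable, Iterable, List, Optional, Tuple, TypeVar

H_T = TypeVar("H_T")   # hypothesis type
X_T = TypeVar("X_T")   # example (object) type

def version_space(
    hypotheses: Iterable[H_T],
    examples: Iterable[Tuple[X_T, bool]],       # (object, label) pairs
    agrees: Callable[[H_T, X_T, bool], bool],   # does h classify x as label?
) -> Optional[List[H_T]]:
    """Keep every hypothesis that is consistent with all examples seen so far."""
    V = list(hypotheses)                            # 1. V <- H
    for x, label in examples:                       # 2. for every training example
        V = [h for h in V if agrees(h, x, label)]   # 2a. drop inconsistent hypotheses
        if not V:                                   # 2b. nothing left
            return None                             #     failure
    return V                                        # 3. the version space
```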

Rewarded Card Example: (r=1) ∨ … ∨ (r=10) ∨ (r=J) ∨ (r=Q) ∨ (r=K) ⇒ ANY-RANK(r); (r=1) ∨ … ∨ (r=10) ⇒ NUM(r); (r=J) ∨ (r=Q) ∨ (r=K) ⇒ FACE(r); (s=♠) ∨ (s=♣) ∨ (s=♦) ∨ (s=♥) ⇒ ANY-SUIT(s); (s=♠) ∨ (s=♣) ⇒ BLACK(s); (s=♦) ∨ (s=♥) ⇒ RED(s). A hypothesis is any sentence of the form R(r) ∧ S(s) ⇒ REWARD([r,s]), where R(r) is ANY-RANK(r), NUM(r), FACE(r), or (r=j), and S(s) is ANY-SUIT(s), BLACK(s), RED(s), or (s=k).

Simplified Representation: for simplicity, we represent a concept by rs, with r = a, n, f, 1, …, 10, j, q, k and s = a, b, r, ♠, ♣, ♦, ♥. For example, n♥ represents NUM(r) ∧ (s=♥) ⇒ REWARD([r,s]), and aa represents ANY-RANK(r) ∧ ANY-SUIT(s) ⇒ REWARD([r,s]).
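One possible Python encoding of this card domain, to make the rs notation concrete (a sketch; RANKS, SUITS, CARDS, covers and H are names I introduce, not part of the slides):

```python
# Ranks 1..10 plus face cards; suits spades/clubs (black), diamonds/hearts (red).
RANKS = list(range(1, 11)) + ["j", "q", "k"]
SUITS = ["spades", "clubs", "diamonds", "hearts"]
CARDS = [(r, s) for r in RANKS for s in SUITS]

# A hypothesis is a pair (R, S): R in {"a", "n", "f"} or a specific rank,
# S in {"a", "b", "r"} or a specific suit -- e.g. ("n", "hearts") stands for n-hearts.
def rank_matches(R, rank) -> bool:
    if R == "a":
        return True                      # ANY-RANK
    if R == "n":
        return rank in range(1, 11)      # NUM
    if R == "f":
        return rank in ("j", "q", "k")   # FACE
    return R == rank                     # specific rank

def suit_matches(S, suit) -> bool:
    if S == "a":
        return True                            # ANY-SUIT
    if S == "b":
        return suit in ("spades", "clubs")     # BLACK
    if S == "r":
        return suit in ("diamonds", "hearts")  # RED
    return S == suit                           # specific suit

def covers(h, card) -> bool:
    """Does hypothesis h predict REWARD for this card?"""
    (R, S), (rank, suit) = h, card
    return rank_matches(R, rank) and suit_matches(S, suit)

# The full hypothesis space used in this example:
H = [(R, S) for R in ["a", "n", "f"] + RANKS for S in ["a", "b", "r"] + SUITS]
```

With this encoding H has only 16 × 7 = 112 hypotheses, small enough to enumerate explicitly; the boundary representation introduced below matters when the hypothesis space is much larger.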

Extension of a Hypothesis: the extension of a hypothesis h is the set of objects that verify h. Examples: the extension of f♥ is {j♥, q♥, k♥}; the extension of aa is the set of all cards.

More General/Specific Relation: let h1 and h2 be two hypotheses in H. h1 is more general than h2 iff the extension of h1 is a proper superset of h2's. Examples: aa is more general than f♥; f♥ is more general than q♥; fr and nr are not comparable.

More General/Specific Relation: let h1 and h2 be two hypotheses in H. h1 is more general than h2 iff the extension of h1 is a proper superset of h2's. The inverse of the “more general” relation is the “more specific” relation. The “more general” relation defines a partial ordering on the hypotheses in H.
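Continuing the card-domain sketch (and reusing its hypothetical CARDS and covers helpers), the extension and the more-general relation follow directly from these definitions:

```python
def extension(h):
    """Set of cards for which hypothesis h predicts REWARD."""
    return {card for card in CARDS if covers(h, card)}

def more_general(h1, h2) -> bool:
    """h1 is strictly more general than h2 iff ext(h2) is a proper subset of ext(h1)."""
    return extension(h2) < extension(h1)   # proper-subset test on Python sets

# Examples from the slides:
# more_general(("a", "a"), ("f", "hearts"))  -> True   (aa is more general than f-hearts)
# more_general(("f", "r"), ("n", "r"))       -> False  (fr and nr are not comparable)
# more_general(("n", "r"), ("f", "r"))       -> False
```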

Example: Subset of Partial Order. [Diagram: a fragment of the generalization lattice, with hypotheses such as aa, na, ab, nb and rank-specific hypotheses like 4a and 4b ordered from most general (aa) down to most specific.]

Construction of Ordering Relation. [Diagram: the rank hierarchy (specific ranks 1, …, 10 generalize to n; j, q, k generalize to f; n and f generalize to a) and the suit hierarchy (♠ and ♣ generalize to b; ♦ and ♥ generalize to r; b and r generalize to a).]

G-Boundary / S-Boundary of V: a hypothesis in V is most general iff no hypothesis in V is more general. G-boundary G of V: the set of most general hypotheses in V.

G-Boundary / S-Boundary of V: a hypothesis in V is most general iff no hypothesis in V is more general. G-boundary G of V: the set of most general hypotheses in V. A hypothesis in V is most specific iff no hypothesis in V is more specific. S-boundary S of V: the set of most specific hypotheses in V.
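Given an explicit version space V and the more_general sketch above, the two boundaries can be computed as follows (again my own helper names):

```python
def g_boundary(V):
    """Most general hypotheses of V: no hypothesis in V is strictly more general."""
    return [h for h in V if not any(more_general(g, h) for g in V)]

def s_boundary(V):
    """Most specific hypotheses of V: no hypothesis in V is strictly more specific."""
    return [h for h in V if not any(more_general(h, g) for g in V)]
```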

aa naab nb nn 44 4b aa 4a Example: G-/S-Boundaries of V aa 44 11 k …… Now suppose that 4  is given as a positive example S G We replace every hypothesis in S whose extension does not contain 4  by its generalization set

Example: G-/S-Boundaries of V. [Diagram: after this first positive example, G = {aa} and S = {4♣}.] Here, both G and S have size 1. This is not the case in general!

Example: G-/S-Boundaries of V. [Diagram] Let 7♣ be the next (positive) example. Generalization set of 4♣: the generalization set of a hypothesis h is the set of hypotheses that are immediately more general than h.

Example: G-/S-Boundaries of V. [Diagram: S is generalized just enough to cover the new example.] Let 7♣ be the next (positive) example.

Example: G-/S-Boundaries of V. [Diagram] Let 5♥ be the next (negative) example. Specialization set of aa.

Example: G-/S-Boundaries of V. G and S, and all hypotheses in between, form exactly the version space. 1. If a hypothesis between G and S disagreed with an example x, then a hypothesis in G or S would also disagree with x, hence would have been removed.

Example: G-/S-Boundaries of V. G and S, and all hypotheses in between, form exactly the version space. 2. If there were a hypothesis not in this set which agreed with all examples, then it would have to be either no more specific than any member of G (but then it would be in G) or no more general than some member of S (but then it would be in S).

Example: G-/S-Boundaries of V. At this stage: does 8♣ satisfy CONCEPT? Yes. Does 6♥? No. Does j♠? Maybe.

Example: G-/S-Boundaries of V. [Diagram] Let 2♠ be the next (positive) example.

Example: G-/S-Boundaries of V. [Diagram] Let j♠ be the next (negative) example.

Example: G-/S-Boundaries of V. Positive examples: 4♣, 7♣, 2♠; negative examples: 5♥, j♠. The learned hypothesis is NUM(r) ∧ BLACK(s) ⇒ REWARD([r,s]).

Example: G-/S-Boundaries of V. Let us return to the version space … and let 8♣ be the next (negative) example. The only most specific hypothesis disagrees with this example, hence no hypothesis in H agrees with all examples.

Example: G-/S-Boundaries of V. Let us return to the version space … and let j♦ be the next (positive) example. The only most general hypothesis disagrees with this example, hence no hypothesis in H agrees with all examples.

Version Space Update: 1. x ← new example. 2. If x is positive then (G,S) ← POSITIVE-UPDATE(G,S,x). 3. Else (G,S) ← NEGATIVE-UPDATE(G,S,x). 4. If G or S is empty then return failure.
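In code, the outer update loop might look like the sketch below; positive_update and negative_update are sketched after the NEGATIVE-UPDATE slide further down, and all names are my own.

```python
def candidate_elimination(G, S, examples):
    """Maintain the G- and S-boundaries over a stream of (object, label) examples."""
    for x, positive in examples:
        if positive:
            G, S = positive_update(G, S, x)
        else:
            G, S = negative_update(G, S, x)
        if not G or not S:
            return None                   # failure: a boundary collapsed
    return G, S
```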

POSITIVE-UPDATE(G,S,x): 1. Eliminate all hypotheses in G that do not agree with x.

POSITIVE-UPDATE(G,S,x): 2. Minimally generalize all hypotheses in S until they are consistent with x, using the generalization sets of the hypotheses.

POSITIVE-UPDATE(G,S,x): 1. Eliminate all hypotheses in G that do not agree with x. 2. Minimally generalize all hypotheses in S until they are consistent with x. 3. Remove from S every hypothesis that is neither more specific than nor equal to a hypothesis in G. (This step was not needed in the card example.)

POSITIVE-UPDATE(G,S,x): 1. Eliminate all hypotheses in G that do not agree with x. 2. Minimally generalize all hypotheses in S until they are consistent with x. 3. Remove from S every hypothesis that is neither more specific than nor equal to a hypothesis in G. 4. Remove from S every hypothesis that is more general than another hypothesis in S. 5. Return (G,S).

NEGATIVE-UPDATE(G,S,x): 1. Eliminate all hypotheses in S that do not agree with x. 2. Minimally specialize all hypotheses in G until they are consistent with x. 3. Remove from G every hypothesis that is neither more general than nor equal to a hypothesis in S. 4. Remove from G every hypothesis that is more specific than another hypothesis in G. 5. Return (G,S).
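Here is a sketch of both updates for the card domain, reusing the hypothetical covers and more_general helpers from the earlier sketches; the one-step generalization/specialization operators (rank_ups, suit_ups, rank_downs, suit_downs, generalizations, specializations) are also my own names, not the slides'.

```python
def rank_ups(R):                         # one-step rank generalizations
    if R in ("j", "q", "k"): return ["f"]
    if R in ("n", "f"):      return ["a"]
    if R == "a":             return []
    return ["n"]                         # a specific numeric rank generalizes to NUM

def suit_ups(S):                         # one-step suit generalizations
    if S in ("spades", "clubs"):    return ["b"]
    if S in ("diamonds", "hearts"): return ["r"]
    if S in ("b", "r"):             return ["a"]
    return []                            # "a" has no generalization

def rank_downs(R):                       # one-step rank specializations
    if R == "a": return ["n", "f"]
    if R == "n": return list(range(1, 11))
    if R == "f": return ["j", "q", "k"]
    return []

def suit_downs(S):                       # one-step suit specializations
    if S == "a": return ["b", "r"]
    if S == "b": return ["spades", "clubs"]
    if S == "r": return ["diamonds", "hearts"]
    return []

def generalizations(h):
    """Hypotheses immediately more general than h."""
    R, S = h
    return [(R2, S) for R2 in rank_ups(R)] + [(R, S2) for S2 in suit_ups(S)]

def specializations(h):
    """Hypotheses immediately more specific than h."""
    R, S = h
    return [(R2, S) for R2 in rank_downs(R)] + [(R, S2) for S2 in suit_downs(S)]

def positive_update(G, S, x):
    # 1. Eliminate hypotheses in G that do not cover the positive example x.
    G = [g for g in G if covers(g, x)]
    # 2. Minimally generalize hypotheses in S until they cover x.
    new_S = set()
    for s in S:
        frontier = [s]
        while frontier:
            h = frontier.pop()
            if covers(h, x):
                new_S.add(h)
            else:
                frontier.extend(generalizations(h))
    # 3. Keep only hypotheses more specific than or equal to some member of G.
    new_S = {s for s in new_S if any(g == s or more_general(g, s) for g in G)}
    # 4. Drop any hypothesis that is more general than another member of S.
    new_S = [s for s in new_S if not any(more_general(s, t) for t in new_S)]
    return G, new_S

def negative_update(G, S, x):
    # 1. Eliminate hypotheses in S that cover the negative example x.
    S = [s for s in S if not covers(s, x)]
    # 2. Minimally specialize hypotheses in G until they exclude x.
    new_G = set()
    for g in G:
        frontier = [g]
        while frontier:
            h = frontier.pop()
            if not covers(h, x):
                new_G.add(h)
            else:
                frontier.extend(specializations(h))
    # 3. Keep only hypotheses more general than or equal to some member of S.
    new_G = {g for g in new_G if any(s == g or more_general(g, s) for s in S)}
    # 4. Drop any hypothesis that is more specific than another member of G.
    new_G = [g for g in new_G if not any(more_general(t, g) for t in new_G)]
    return new_G, S
```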

Example-Selection Strategy: suppose that at each step the learning procedure can select the object (card) of the next example. Let it pick an object such that, whether the example turns out to be positive or negative, it will eliminate one half of the remaining hypotheses. Then a single hypothesis will be isolated in O(log |H|) steps.

aa naab nb nn aa Example 9  ? j ? j  ?

Example-Selection Strategy: suppose that at each step the learning procedure can select the object (card) of the next example. Let it pick an object such that, whether the example turns out to be positive or negative, it will eliminate one half of the remaining hypotheses. Then a single hypothesis will be isolated in O(log |H|) steps. But picking the object that eliminates half the version space may be expensive.
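For the card example, one way to implement this strategy (reusing the hypothetical CARDS, covers and explicit version space V from the earlier sketches) is to score each candidate card by how evenly it splits V and query the most balanced one:

```python
def best_query(V, cards=CARDS):
    """Pick the card whose label, whichever way it comes back, prunes V the most.

    A card covered by p hypotheses of V leaves p hypotheses if the answer is
    positive and len(V) - p if it is negative, so the worst case is smallest
    when p is as close as possible to len(V) / 2.
    """
    def worst_case(card):
        p = sum(covers(h, card) for h in V)
        return max(p, len(V) - p)
    return min(cards, key=worst_case)
```

Scoring every candidate against every remaining hypothesis is exactly the cost the slide warns about.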

Noise: if some examples are misclassified, the version space may collapse. Possible solution: maintain several G- and S-boundaries, e.g., consistent with all examples, with all examples but one, etc. (Exercise: develop this idea!)

Current-Best-Hypothesis Search: keep one hypothesis at each step; generalize or specialize the hypothesis at each new example. Details left as an exercise…

VSL vs DTL: Decision tree learning (DTL) is more efficient if all examples are given in advance; otherwise, it may produce successive hypotheses, each poorly related to the previous one. Version space learning (VSL) is incremental. DTL can produce simplified hypotheses that do not agree with all examples. DTL has been more widely used in practice.

Can Inductive Learning Work? Example set X; hypothesis space H of size |H|; training set D of size m; inductive hypothesis h. f: the correct hypothesis; p(x): the probability that example x is picked from X.

Approximately Correct Hypothesis: h ∈ H is approximately correct (AC) with accuracy ε iff Pr[h(x) ≠ f(x)] ≤ ε, where x is an example picked with probability distribution p from X.

PAC Learning: procedure L is Probably Approximately Correct (PAC) with confidence δ iff Pr[ Pr[h(x) ≠ f(x)] > ε ] ≤ δ. Can L be PAC? If yes, how big should the size m of the training set D be?

Can L Be PAC? Let g be an arbitrary element of H that is not approximately correct. Since g is not AC, we have Pr[g(x) ≠ f(x)] > ε. So, the probability that g is consistent with all the examples in D is at most (1-ε)^m … and the probability that there exists a non-AC hypothesis matching all the examples in D is at most |H|(1-ε)^m.

Can L Be PAC? Let g be an arbitrary element of H that is not approximately correct. Since g is not AC, we have Pr[g(x) ≠ f(x)] > ε. So, the probability that g is consistent with all the examples in D is at most (1-ε)^m … and the probability that there exists a non-AC hypothesis matching all the examples in D is at most |H|(1-ε)^m. Therefore, L is PAC if the size m of the training set verifies: |H|(1-ε)^m ≤ δ.

Size of Training Set: from |H|(1-ε)^m ≤ δ we derive m ≥ ln(δ/|H|) / ln(1-ε). Since ε < -ln(1-ε) for 0 < ε < 1, it suffices to have m ≥ ln(δ/|H|) / (-ε), i.e., m ≥ ln(|H|/δ) / ε. So, m increases logarithmically with the size of the hypothesis space. But how big is |H|?
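The bound is easy to evaluate numerically; here is a small sketch (the 112-hypothesis figure refers to my earlier encoding of the card example, and the numbers are illustrative, not from the slides):

```python
from math import ceil, log

def pac_sample_size(H_size: int, eps: float, delta: float) -> int:
    """Smallest m with m >= ln(|H|/delta)/eps, which is sufficient for PAC learning."""
    return ceil((log(H_size) - log(delta)) / eps)

# e.g. the 112-hypothesis card space, with accuracy eps = 0.1 and confidence delta = 0.05:
print(pac_sample_size(112, 0.1, 0.05))   # -> 78
```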

Importance of KIS Bias: if H is the set of all logical sentences built from n base predicates, then |H| = 2^(2^n), and m is exponential in n. If H is the set of all conjunctions of k << n base predicates picked among the n predicates, then |H| = O(n^k) and m is logarithmic in n. Hence the importance of choosing a “good” KIS (keep it simple) bias.
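To see the effect concretely, compare the two hypothesis-space sizes for, say, n = 10 base predicates and conjunctions of k = 3 of them, reusing the pac_sample_size sketch above (illustrative numbers only):

```python
n, k = 10, 3
H_all = 2 ** (2 ** n)   # all Boolean functions of n predicates: 2^(2^n)
H_conj = n ** k         # rough count matching the O(n^k) estimate for k-conjunctions

print(pac_sample_size(H_all, 0.1, 0.05))    # -> 7128 examples
print(pac_sample_size(H_conj, 0.1, 0.05))   # -> 100 examples
```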

Explanation-Based Learning. Inductive learning: find h such that KB and h are consistent and KB, h ⊨ D. Explanation-based learning: KB is the background knowledge, and D is observed knowledge such that KB ⊨ D. Find h such that KB = KB1, KB2, with KB1 ⊨ h and KB2, h ⊨ D. Example: derivatives of functions. KB1 is the general theory; D consists of examples; h defines the derivatives of usual functions; KB2 gives simplification rules. Nothing really new is learnt!

Summary: version space method; structure of the hypothesis space; generalization/specialization of hypotheses; PAC learning; explanation-based learning.