Inductive Learning (2/2) Version Space and PAC Learning


Inductive Learning (2/2) Version Space and PAC Learning
Russell and Norvig: Chapter 18, Sections 18.5 through 18.7; Chapter 19, Sections 19.1 through 19.3
CS121 – Winter 2003

Contents
Introduction to inductive learning
Logic-based inductive learning: Decision tree method, Version space method
Function-based inductive learning: Neural nets
+ PAC learning

Inductive Learning Scheme (diagram)
Example set X: {[A, B, …, CONCEPT]}
Hypothesis space H: {[CONCEPT(x) ⇔ S(A,B, …)]}
Training set D (positive and negative examples)
Inductive hypothesis h

Predicate-Learning Methods
Decision tree
Version space
Need to provide H with some “structure”
Explicit representation of hypothesis space H

Version Space Method
V is the version space
V ← H
For every example x in training set D do
  Eliminate from V every hypothesis that does not agree with x
  If V is empty then return failure
Return V
But the size of V is enormous!!!
Idea: Define a partial ordering on the hypotheses in H and only represent the upper and lower bounds of V for this ordering
Compared to the decision tree method, this algorithm is: incremental, least-commitment
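To make the procedure above concrete, here is a minimal sketch in Python (the function name version_space and the toy threshold domain are illustrative choices of mine, not part of the slides): start from all of H and strike out every hypothesis that disagrees with a training example.

```python
def version_space(H, D, agrees):
    """V <- H, then eliminate every hypothesis that disagrees with some example in D."""
    V = [h for h in H if all(agrees(h, ex) for ex in D)]
    if not V:
        raise ValueError("failure: no hypothesis in H agrees with all examples")
    return V

# Toy usage: hypotheses are thresholds t meaning "x is positive iff x >= t".
H = range(11)
D = [(7, True), (9, True), (3, False)]                # (value, label) pairs
agrees = lambda t, ex: (ex[0] >= t) == ex[1]
print(version_space(H, D, agrees))                    # [4, 5, 6, 7]
```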

Rewarded Card Example
(r=1) v … v (r=10) v (r=J) v (r=Q) v (r=K) ⇔ ANY-RANK(r)
(r=1) v … v (r=10) ⇔ NUM(r)
(r=J) v (r=Q) v (r=K) ⇔ FACE(r)
(s=♠) v (s=♣) v (s=♦) v (s=♥) ⇔ ANY-SUIT(s)
(s=♠) v (s=♣) ⇔ BLACK(s)
(s=♦) v (s=♥) ⇔ RED(s)
An hypothesis is any sentence of the form: R(r) ∧ S(s) ⇒ REWARD([r,s])
where R(r) is ANY-RANK(r), NUM(r), FACE(r), or (r=j), and S(s) is ANY-SUIT(s), BLACK(s), RED(s), or (s=k)

Simplified Representation
For simplicity, we represent a concept by rs, with:
r = a, n, f, 1, …, 10, j, q, k
s = a, b, r, ♠, ♣, ♦, ♥
For example:
n♣ represents: NUM(r) ∧ (s=♣) ⇒ REWARD([r,s])
aa represents: ANY-RANK(r) ∧ ANY-SUIT(s) ⇒ REWARD([r,s])
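With the rs notation in place, the filtering algorithm from the Version Space Method slide can be instantiated for the rewarded-card domain. This is only an illustrative sketch: the Python encoding of ranks, suits, and the helper predicts_reward are my own choices, and the labeled cards are the ones used in the G-/S-boundary trace later in the deck.

```python
from itertools import product

RANKS = list(range(1, 11)) + ["j", "q", "k"]
SUITS = ["spade", "club", "diamond", "heart"]

# One predicate per rank symbol and per suit symbol of the rs notation.
RANK_PREDS = {"a": lambda r: True,                      # ANY-RANK
              "n": lambda r: isinstance(r, int),        # NUM
              "f": lambda r: r in ("j", "q", "k")}      # FACE
RANK_PREDS.update({r: (lambda r0: lambda r: r == r0)(r) for r in RANKS})
SUIT_PREDS = {"a": lambda s: True,                      # ANY-SUIT
              "b": lambda s: s in ("spade", "club"),    # BLACK
              "r": lambda s: s in ("diamond", "heart")} # RED
SUIT_PREDS.update({s: (lambda s0: lambda s: s == s0)(s) for s in SUITS})

H = list(product(RANK_PREDS, SUIT_PREDS))               # 16 x 7 = 112 hypotheses "rs"

def predicts_reward(h, card):
    (rp, sp), (rank, suit) = h, card
    return RANK_PREDS[rp](rank) and SUIT_PREDS[sp](suit)

def version_space(H, D):
    """Keep every hypothesis that agrees with all labeled cards in D."""
    V = [h for h in H if all(predicts_reward(h, c) == label for c, label in D)]
    if not V:
        raise ValueError("failure: no hypothesis in H agrees with all examples")
    return V

D = [((4, "club"), True), ((7, "club"), True), ((2, "spade"), True),
     ((5, "heart"), False), (("j", "spade"), False)]
print(version_space(H, D))   # [('n', 'b')], i.e. NUM(r) ∧ BLACK(s) => REWARD
```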

Extension of an Hypothesis
The extension of an hypothesis h is the set of objects that verify h
Examples:
The extension of f♣ is: {j♣, q♣, k♣}
The extension of aa is the set of all cards

More General/Specific Relation
Let h1 and h2 be two hypotheses in H
h1 is more general than h2 iff the extension of h1 is a proper superset of h2’s
Examples:
aa is more general than f♣
f♣ is more general than q♣
fr and nr are not comparable

More General/Specific Relation
Let h1 and h2 be two hypotheses in H
h1 is more general than h2 iff the extension of h1 is a proper superset of h2’s
The inverse of the “more general” relation is the “more specific” relation
The “more general” relation defines a partial ordering on the hypotheses in H
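A small sketch of the two notions just defined, extension and "more general than", with hypotheses written directly as predicates over (rank, suit) pairs; the lambda encodings below are illustrative, not the slides' notation.

```python
RANKS = list(range(1, 11)) + ["j", "q", "k"]
SUITS = ["spade", "club", "diamond", "heart"]
CARDS = [(r, s) for r in RANKS for s in SUITS]          # the 52-card example set

def extension(h):
    """The set of cards that verify the predicate h."""
    return {c for c in CARDS if h(*c)}

def more_general(h1, h2):
    """h1 is more general than h2 iff ext(h1) is a proper superset of ext(h2)."""
    return extension(h1) > extension(h2)

aa = lambda r, s: True                                                # aa
fc = lambda r, s: r in ("j", "q", "k") and s == "club"                # f♣
qc = lambda r, s: r == "q" and s == "club"                            # q♣
fr = lambda r, s: r in ("j", "q", "k") and s in ("diamond", "heart")  # fr
nr = lambda r, s: isinstance(r, int) and s in ("diamond", "heart")    # nr

print(sorted(extension(fc)))                        # [('j', 'club'), ('k', 'club'), ('q', 'club')]
print(more_general(aa, fc), more_general(fc, qc))   # True True
print(more_general(fr, nr), more_general(nr, fr))   # False False: fr and nr are not comparable
```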

Example: Subset of Partial Order (diagram)
A subset of the partial order, from the single card 4♣ at the bottom, through 4b, n♣, 4a, a♣, nb, na, ab, up to aa at the top

Construction of Ordering Relation (diagram)
The ordering is built from two small lattices: one over rank predicates (the specific ranks 1, …, 10 below n, the ranks j, q, k below f, and a above both) and one over suit predicates (♠ and ♣ below b, ♦ and ♥ below r, and a above both)

G-Boundary / S-Boundary of V
An hypothesis in V is most general iff no hypothesis in V is more general
G-boundary G of V: Set of most general hypotheses in V

G-Boundary / S-Boundary of V
An hypothesis in V is most general iff no hypothesis in V is more general
G-boundary G of V: Set of most general hypotheses in V
An hypothesis in V is most specific iff no hypothesis in V is more specific
S-boundary S of V: Set of most specific hypotheses in V
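Given an explicitly enumerated version space, the two boundaries follow directly from the definition. A minimal sketch, assuming the version space {ab, a♣, nb, n♣} that the card trace on the next slides reaches after its first three examples; the predicate encodings are mine.

```python
RANKS = list(range(1, 11)) + ["j", "q", "k"]
SUITS = ["spade", "club", "diamond", "heart"]
CARDS = [(r, s) for r in RANKS for s in SUITS]

def extension(h):
    return {c for c in CARDS if h[1](*c)}              # h = (name, predicate)

def more_general(h1, h2):
    return extension(h1) > extension(h2)               # proper superset of extensions

V = [("ab", lambda r, s: s in ("spade", "club")),
     ("ac", lambda r, s: s == "club"),
     ("nb", lambda r, s: isinstance(r, int) and s in ("spade", "club")),
     ("nc", lambda r, s: isinstance(r, int) and s == "club")]

G = [h[0] for h in V if not any(more_general(h2, h) for h2 in V)]   # most general hypotheses in V
S = [h[0] for h in V if not any(more_general(h, h2) for h2 in V)]   # most specific hypotheses in V
print(G, S)   # ['ab'] ['nc'] : G-boundary {ab}, S-boundary {n♣}
```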

Example: G-/S-Boundaries of V (diagram)
Initially G = {aa} and S contains all the single-card hypotheses (1♠, …, k♥)
Now suppose that 4♣ is given as a positive example
We replace every hypothesis in S whose extension does not contain 4♣ by its generalization set

Example: G-/S-Boundaries of V (diagram)
After this update, G = {aa} and S = {4♣}
Here, both G and S have size 1. This is not the case in general!

Example: G-/S-Boundaries of V (diagram)
The generalization set of an hypothesis h is the set of the hypotheses that are immediately more general than h (shown: the generalization set of 4♣)
Let 7♣ be the next (positive) example

Example: G-/S-Boundaries of V (diagram)
Processing the positive example 7♣: S is minimally generalized from 4♣ to n♣ (the most specific generalization of 4♣ that also covers 7♣); G = {aa} is unchanged

Example: G-/S-Boundaries of V (diagram)
Now G = {aa} and S = {n♣}
Let 5♥ be the next (negative) example; the diagram shows the specialization set of aa

Example: G-/S-Boundaries of V
After the negative example 5♥: G = {ab}, S = {n♣}, and the version space is {ab, nb, a♣, n♣}
G and S, and all hypotheses in between, form exactly the version space:
1. If an hypothesis between G and S disagreed with an example x, then an hypothesis in G or S would also disagree with x, hence would have been removed

Example: G-/S-Boundaries of V
G and S, and all hypotheses in between, form exactly the version space:
2. If there were an hypothesis not in this set which agreed with all examples, then it would have to be either no more specific than any member of G – but then it would be in G – or no more general than some member of S – but then it would be in S

Example: G-/S-Boundaries of V (diagram)
At this stage, do 8, 6, and j (the cards marked in the diagram) satisfy CONCEPT? The possible answers: Yes (the card is covered by every hypothesis between S and G), No (covered by none), Maybe (covered by some hypotheses but not others)

Example: G-/S-Boundaries of V (diagram)
Let 2♠ be the next (positive) example
S is minimally generalized from n♣ to nb; G = {ab} is unchanged

Example: G-/S-Boundaries of V (diagram)
Let j♠ be the next (negative) example
G is minimally specialized from ab to nb, so G = S = {nb}

Example: G-/S-Boundaries of V
Positive examples: 4♣, 7♣, 2♠; negative examples: 5♥, j♠
The version space has converged to the single hypothesis nb: NUM(r) ∧ BLACK(s) ⇒ REWARD([r,s])

Example: G-/S-Boundaries of V (diagram)
Let us return to the version space {ab, nb, a♣, n♣} … and let 8♣ be the next (negative) example
The only most specific hypothesis (n♣) disagrees with this example, hence no hypothesis in H agrees with all examples

Example: G-/S-Boundaries of V (diagram)
Let us return to the version space {ab, nb, a♣, n♣} … and let j♥ be the next (positive) example
The only most general hypothesis (ab) disagrees with this example, hence no hypothesis in H agrees with all examples

Version Space Update
x ← new example
If x is positive then (G,S) ← POSITIVE-UPDATE(G,S,x)
Else (G,S) ← NEGATIVE-UPDATE(G,S,x)
If G or S is empty then return failure

POSITIVE-UPDATE(G,S,x)
Eliminate all hypotheses in G that do not agree with x

POSITIVE-UPDATE(G,S,x)
Eliminate all hypotheses in G that do not agree with x
Minimally generalize all hypotheses in S until they are consistent with x, using the generalization sets of the hypotheses

POSITIVE-UPDATE(G,S,x)
Eliminate all hypotheses in G that do not agree with x
Minimally generalize all hypotheses in S until they are consistent with x
Remove from S every hypothesis that is neither more specific than nor equal to a hypothesis in G (this step was not needed in the card example)

POSITIVE-UPDATE(G,S,x)
Eliminate all hypotheses in G that do not agree with x
Minimally generalize all hypotheses in S until they are consistent with x
Remove from S every hypothesis that is neither more specific than nor equal to a hypothesis in G
Remove from S every hypothesis that is more general than another hypothesis in S
Return (G,S)

NEGATIVE-UPDATE(G,S,x)
Eliminate all hypotheses in S that do not agree with x
Minimally specialize all hypotheses in G until they are consistent with x
Remove from G every hypothesis that is neither more general than nor equal to a hypothesis in S
Remove from G every hypothesis that is more specific than another hypothesis in G
Return (G,S)
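Putting the two procedures together gives a small candidate-elimination sketch. It is an assumed implementation rather than the slides' own code: "minimally generalize" and "minimally specialize" are done by brute force over the enumerated hypothesis space (keeping the most specific generalizations, respectively the most general specializations, that are consistent with x), which is cheap for the 112-hypothesis card domain.

```python
from functools import lru_cache
from itertools import product

RANKS = list(range(1, 11)) + ["j", "q", "k"]
SUITS = ["spade", "club", "diamond", "heart"]
CARDS = [(r, s) for r in RANKS for s in SUITS]
H = list(product(["a", "n", "f"] + RANKS, ["a", "b", "r"] + SUITS))

def matches(h, card):
    """Does the rs hypothesis h predict REWARD for this (rank, suit) card?"""
    (rp, sp), (rank, suit) = h, card
    rank_ok = {"a": True, "n": isinstance(rank, int),
               "f": rank in ("j", "q", "k")}.get(rp, rank == rp)
    suit_ok = {"a": True, "b": suit in ("spade", "club"),
               "r": suit in ("diamond", "heart")}.get(sp, suit == sp)
    return rank_ok and suit_ok

@lru_cache(maxsize=None)
def ext(h):
    return frozenset(c for c in CARDS if matches(h, c))

def geq(h1, h2):                     # h1 is more general than or equal to h2
    return ext(h1) >= ext(h2)

def most_specific(hs):               # drop every hypothesis more general than another one
    return {h for h in hs if not any(h != h2 and geq(h, h2) for h2 in hs)}

def most_general(hs):                # drop every hypothesis more specific than another one
    return {h for h in hs if not any(h != h2 and geq(h2, h) for h2 in hs)}

def positive_update(G, S, x):
    G = {g for g in G if matches(g, x)}                              # prune G
    S2 = set()
    for s in S:                                                      # minimally generalize S
        S2 |= {s} if matches(s, x) else most_specific(
            {h for h in H if geq(h, s) and matches(h, x)})
    S2 = {s for s in S2 if any(geq(g, s) for g in G)}                # keep s below some g in G
    return G, most_specific(S2)                                      # prune non-minimal s

def negative_update(G, S, x):                                        # the dual procedure
    S = {s for s in S if not matches(s, x)}
    G2 = set()
    for g in G:                                                      # minimally specialize G
        G2 |= {g} if not matches(g, x) else most_general(
            {h for h in H if geq(g, h) and not matches(h, x)})
    G2 = {g for g in G2 if any(geq(g, s) for s in S)}                # keep g above some s in S
    return most_general(G2), S

# Rewarded-card trace: 4♣+, 7♣+, 5♥-, 2♠+, j♠-
G, S = {("a", "a")}, {h for h in H if len(ext(h)) == 1}              # G = {aa}, S = single cards
for card, positive in [((4, "club"), True), ((7, "club"), True), ((5, "heart"), False),
                       ((2, "spade"), True), (("j", "spade"), False)]:
    G, S = positive_update(G, S, card) if positive else negative_update(G, S, card)
    if not G or not S:
        raise RuntimeError("version space collapsed")
    print(card, "->", G, S)
# After the last example both boundaries are {('n', 'b')}: NUM(r) ∧ BLACK(s) => REWARD([r,s])
```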

Example-Selection Strategy
Suppose that at each step the learning procedure has the possibility to select the object (card) of the next example
Let it pick the object such that, whether the example is positive or not, it will eliminate one-half of the remaining hypotheses
Then a single hypothesis will be isolated in O(log |H|) steps

Example (diagram)
The current version space {ab, nb, a♣, n♣}, with candidate query cards (a 9 and two jacks of particular suits) marked where each would split the remaining hypotheses

Example-Selection Strategy
Suppose that at each step the learning procedure has the possibility to select the object (card) of the next example
Let it pick the object such that, whether the example is positive or not, it will eliminate one-half of the remaining hypotheses
Then a single hypothesis will be isolated in O(log |H|) steps
But picking the object that eliminates half the version space may be expensive
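A sketch of this selection rule for the card example (the helper names and the explicit listing of the version space are mine): among all 52 cards, pick the one whose covered / not-covered split of the current version space is closest to one half.

```python
RANKS = list(range(1, 11)) + ["j", "q", "k"]
SUITS = ["spade", "club", "diamond", "heart"]
CARDS = [(r, s) for r in RANKS for s in SUITS]

V = {  # the version space {ab, nb, a♣, n♣} reached after 4♣+, 7♣+, 5♥-
    "ab": lambda r, s: s in ("spade", "club"),
    "nb": lambda r, s: isinstance(r, int) and s in ("spade", "club"),
    "ac": lambda r, s: s == "club",
    "nc": lambda r, s: isinstance(r, int) and s == "club",
}

def imbalance(card):
    """How far the yes/no split of V on this card is from a perfect half."""
    yes = sum(h(*card) for h in V.values())
    return abs(2 * yes - len(V))

query = min(CARDS, key=imbalance)
print(query, [name for name, h in V.items() if h(*query)])
# (1, 'spade') ['ab', 'nb']: whichever answer comes back, half of V is eliminated
```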

Noise
If some examples are misclassified, the version space may collapse
Possible solution: Maintain several G- and S-boundaries, e.g., consistent with all examples, all examples but one, etc… (Exercise: Develop this idea!)

Current-Best-Hypothesis Search
Keep one hypothesis at each step
Generalize or specialize the hypothesis at each new example
Details left as an exercise…

VSL vs DTL
Decision tree learning (DTL) is more efficient if all examples are given in advance; else, it may produce successive hypotheses, each poorly related to the previous one
Version space learning (VSL) is incremental
DTL can produce simplified hypotheses that do not agree with all examples
DTL has been more widely used in practice

Can Inductive Learning Work? (diagram)
Example set X; hypothesis space H, size |H|; training set D, size m (positive and negative examples); inductive hypothesis h
f: correct hypothesis
p(x): probability that example x is picked from X

Approximately Correct Hypothesis
h ∈ H is approximately correct (AC) with accuracy ε iff: Pr[h(x) ≠ f(x)] ≤ ε
where x is an example picked with probability distribution p from X
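To make the definition concrete, a tiny sketch that computes Pr[h(x) ≠ f(x)] exactly under the assumption of a uniform distribution p over the 52 cards, taking f = nb (NUM ∧ BLACK) as the correct hypothesis and h = ab as the candidate; these particular choices are mine.

```python
RANKS = list(range(1, 11)) + ["j", "q", "k"]
SUITS = ["spade", "club", "diamond", "heart"]
CARDS = [(r, s) for r in RANKS for s in SUITS]

h = lambda r, s: s in ("spade", "club")                            # ab: ANY-RANK ∧ BLACK
f = lambda r, s: isinstance(r, int) and s in ("spade", "club")     # nb: NUM ∧ BLACK (the target)

# With p uniform, the error is just the fraction of cards on which h and f disagree:
error = sum(h(r, s) != f(r, s) for r, s in CARDS) / len(CARDS)
print(error)   # 6/52 ≈ 0.115 (the six black face cards), so ab is AC only for eps >= 6/52
```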

PAC Learning
Procedure L is Probably Approximately Correct (PAC) with confidence δ iff: Pr[ Pr[h(x) ≠ f(x)] > ε ] ≤ δ
Can L be PAC? If yes, how big should the size m of the training set D be?

Can L Be PAC?
Let g be an arbitrary element of H that is not approximately correct
Since g is not AC, we have: Pr[g(x) ≠ f(x)] > ε
So, the probability that g is consistent with all the examples in D is at most (1−ε)^m …
… and the probability that there exists a non-AC hypothesis matching all the examples in D is at most |H|(1−ε)^m

Can L Be PAC?
Let g be an arbitrary element of H that is not approximately correct
Since g is not AC, we have: Pr[g(x) ≠ f(x)] > ε
So, the probability that g is consistent with all the examples in D is at most (1−ε)^m …
… and the probability that there exists a non-AC hypothesis matching all the examples in D is at most |H|(1−ε)^m
Therefore, L is PAC if the size m of the training set verifies: |H|(1−ε)^m ≤ δ

Size of Training Set
From |H|(1−ε)^m ≤ δ we derive: m ≥ ln(δ/|H|) / ln(1−ε)
Since ε < −ln(1−ε) for 0 < ε < 1, it suffices that: m ≥ ln(δ/|H|) / (−ε), i.e., m ≥ ln(|H|/δ) / ε
So, m increases logarithmically with the size of the hypothesis space
But how big is |H|?
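A quick numerical check of the bound. The function name and the values of ε and δ are arbitrary illustrations; |H| = 112 is the size of the rewarded-card hypothesis space (16 rank predicates × 7 suit predicates).

```python
import math

def pac_sample_size(H_size, eps, delta):
    """Smallest m with |H|(1-eps)^m <= delta, plus the simpler bound m >= ln(|H|/delta)/eps."""
    exact  = math.ceil(math.log(delta / H_size) / math.log(1 - eps))
    simple = math.ceil(math.log(H_size / delta) / eps)
    return exact, simple

print(pac_sample_size(H_size=112, eps=0.1, delta=0.05))   # (74, 78): the simple bound is a bit looser
```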

Importance of KIS Bias
If H is the set of all logical sentences with n base predicates, then |H| = 2^(2^n), and m is exponential in n
If H is the set of all conjunctions of k << n base predicates picked among n predicates, then |H| = O(n^k) and m is logarithmic in n
⇒ Importance of choosing a “good” KIS bias
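A small illustration of why the bias matters, comparing the bound m ≥ ln(|H|/δ)/ε for the unrestricted space (|H| = 2^(2^n)) and for conjunctions of k = 3 of the n base predicates (|H| = O(n^3)); the choices of k, ε, and δ are arbitrary.

```python
import math

def sample_size(H_size, eps=0.1, delta=0.05):
    """m >= ln(|H|/delta)/eps, written as ln|H| - ln(delta) so that huge |H| values still work."""
    return math.ceil((math.log(H_size) - math.log(delta)) / eps)

for n in (5, 10, 20):
    H_all  = 2 ** (2 ** n)        # every boolean function of n base predicates
    H_conj = math.comb(n, 3)      # conjunctions of 3 of the n predicates, O(n^3) of them
    print(n, sample_size(H_all), sample_size(H_conj))
# n=5: 252 vs 53, n=10: 7128 vs 78, n=20: about 7.3 million vs 101
```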

Explanation-Based Learning
KB: Background knowledge
D: Observed knowledge such that KB ⊭ D
Inductive learning: Find h such that KB and h are consistent and KB,h ⊨ D
Explanation-based learning: Find h such that KB = KB1,KB2, KB1 ⊨ h, and KB2,h ⊨ D
Example: Derivatives of functions: KB1 is the general theory, D consists of examples, h defines the derivatives of usual functions, KB2 gives simplification rules
Nothing really new is learnt!

Summary
Version space method
Structure of hypothesis space
Generalization/specialization of hypothesis
PAC learning
Explanation-based learning