Concept Learning: Definitions, Search Space and General-Specific Ordering, The Candidate Elimination Algorithm, Inductive Bias

Concept Learning
- Definitions
- Search Space and General-Specific Ordering
- The Candidate Elimination Algorithm
- Inductive Bias

Main Ideas
- The algorithm that finds the maximally specific hypothesis is limited in that it finds only one of the many hypotheses consistent with the training data.
- The Candidate Elimination Algorithm (CEA) finds ALL hypotheses consistent with the training data.
- CEA does this without explicitly enumerating all consistent hypotheses.
- Applications:
  - Chemical mass spectroscopy
  - Control rules for heuristic search

Consistency vs Coverage
[Figure: a training set D of positive and negative examples, with two hypotheses drawn over it. h1 covers a different set of examples than h2; h2 is consistent with training set D, while h1 is not.]

Version Space
[Figure: the version space VS drawn as a region inside the hypothesis space H.]
Version space: the subset of hypotheses from H that are consistent with training set D.

List-Then-Eliminate Algorithm
Algorithm:
1. Initialize the version space VS to contain all hypotheses in H.
2. For each training example x, remove from VS every hypothesis h inconsistent with x, i.e. every h with h(x) ≠ c(x).
3. Output the version space VS.
Comments: this is infeasible in practice; the size of H is unmanageable.
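The following is a minimal sketch of List-Then-Eliminate for this attribute-vector setting. It assumes hypotheses are tuples over the six mushroom attributes, with '?' as a wildcard and '0' meaning "match nothing"; the names matches and list_then_eliminate are illustrative, not from the slides.

```python
def matches(h, x):
    """True if hypothesis h covers instance x: every non-wildcard attribute
    of h must equal the corresponding attribute of x. The all-'0' hypothesis
    covers no instance at all."""
    return all(hv == '?' or hv == xv for hv, xv in zip(h, x))


def list_then_eliminate(hypothesis_space, training_data):
    """Keep every hypothesis consistent with all labeled examples.
    Infeasible in practice: it enumerates the entire hypothesis space."""
    vs = list(hypothesis_space)
    for x, is_positive in training_data:
        vs = [h for h in vs if matches(h, x) == is_positive]
    return vs
```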

Previous Exercise: Mushrooms
Let's recall our exercise in which we tried to classify mushrooms as poisonous or not-poisonous.
Training set D:
((red, small, round, humid, low, smooth), poisonous)
((red, small, elongated, humid, low, smooth), poisonous)
((gray, large, elongated, humid, low, rough), not-poisonous)
((red, small, elongated, humid, high, rough), poisonous)

Consistent Hypotheses
Our first algorithm found only one of the six consistent hypotheses:
S (most specific): (red, small, ?, humid, ?, ?)
Intermediate: (?, small, ?, humid, ?, ?), (red, ?, ?, humid, ?, ?), (red, small, ?, ?, ?, ?)
G (most general): (red, ?, ?, ?, ?, ?), (?, small, ?, ?, ?, ?)

Candidate-Elimination Algorithm
The candidate elimination algorithm keeps two lists of hypotheses consistent with the training data:
- S, the list of most specific hypotheses, e.g. (red, small, ?, humid, ?, ?)
- G, the list of most general hypotheses, e.g. (red, ?, ?, ?, ?, ?), (?, small, ?, ?, ?, ?)
These two lists are enough to derive the whole version space VS.

Candidate-Elimination Algorithm
1. Initialize G to the set of maximally general hypotheses in H.
2. Initialize S to the set of maximally specific hypotheses in H.
3. For each training example X:
   a) If X is positive: generalize S if necessary.
   b) If X is negative: specialize G if necessary.
4. Output {G, S}.

Positive Examples
a) If X is positive:
- Remove from G any hypothesis inconsistent with X.
- For each hypothesis h in S not consistent with X:
  - Remove h from S.
  - Add to S all minimal generalizations of h consistent with X such that some member of G is more general than the generalization.
- Remove from S any hypothesis more general than any other hypothesis in S.
[Figure: an inconsistent h in S is replaced by its minimal generalizations, moving S toward G.]

Negative Examples
b) If X is negative:
- Remove from S any hypothesis inconsistent with X.
- For each hypothesis h in G not consistent with X:
  - Remove h from G.
  - Add to G all minimal specializations of h consistent with X such that some member of S is more specific than the specialization.
- Remove from G any hypothesis less general than any other hypothesis in G.
[Figure: an inconsistent h in G is replaced by its minimal specializations, moving G toward S.]
Both update steps are sketched in code below.
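Putting the positive and negative updates from the last two slides together, here is a hedged sketch of the Candidate Elimination Algorithm for conjunctive attribute hypotheses. It reuses the matches helper from the earlier snippet; min_generalize, min_specializations, more_general_or_equal and candidate_elimination are illustrative names, not from the slides.

```python
def more_general_or_equal(h1, h2):
    """True if h1 covers every instance that h2 covers.
    The all-'0' hypothesis covers nothing, so any h1 is >= it."""
    if '0' in h2:
        return True
    return all(a == '?' or a == b for a, b in zip(h1, h2))


def min_generalize(h, x):
    """Minimal generalization of h that covers the positive instance x."""
    return tuple(xv if hv == '0' else (hv if hv in ('?', xv) else '?')
                 for hv, xv in zip(h, x))


def min_specializations(g, x, attribute_values):
    """Minimal specializations of g that exclude the negative instance x:
    replace one wildcard at a time by a value different from x's."""
    return [g[:i] + (v,) + g[i + 1:]
            for i, gv in enumerate(g) if gv == '?'
            for v in attribute_values[i] if v != x[i]]


def candidate_elimination(training_data, attribute_values, n_attrs):
    S = {('0',) * n_attrs}          # maximally specific boundary
    G = {('?',) * n_attrs}          # maximally general boundary
    for x, is_positive in training_data:
        if is_positive:
            G = {g for g in G if matches(g, x)}        # drop inconsistent g
            S = {min_generalize(s, x) for s in S}      # generalize S
            S = {s for s in S                          # keep s only if below some g
                 if any(more_general_or_equal(g, s) for g in G)}
        else:
            S = {s for s in S if not matches(s, x)}    # drop inconsistent s
            new_G = set()
            for g in G:
                if matches(g, x):                      # g wrongly covers the negative x
                    new_G.update(
                        spec for spec in min_specializations(g, x, attribute_values)
                        if any(more_general_or_equal(spec, s) for s in S))
                else:
                    new_G.add(g)
            # keep only the maximally general members of G
            G = {g for g in new_G
                 if not any(g2 != g and more_general_or_equal(g2, g) for g2 in new_G)}
    return S, G
```

For conjunctive hypotheses S remains a single hypothesis throughout, so the "remove from S any hypothesis more general than another" step is omitted in this sketch.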

An Exercise
Initialize the S and G sets:
S: (0, 0, 0, 0, 0, 0)
G: (?, ?, ?, ?, ?, ?)
Let's look at the first two examples:
((red, small, round, humid, low, smooth), poisonous)
((red, small, elongated, humid, low, smooth), poisonous)

An Exercise: Two Positives
The first two examples are positive:
((red, small, round, humid, low, smooth), poisonous)
((red, small, elongated, humid, low, smooth), poisonous)
S is generalized: (0, 0, 0, 0, 0, 0) → (red, small, round, humid, low, smooth) → (red, small, ?, humid, low, smooth)
G stays unchanged: (?, ?, ?, ?, ?, ?)

An Exercise: First Negative
The third example is a negative example:
((gray, large, elongated, humid, low, rough), not-poisonous)
S stays unchanged: (red, small, ?, humid, low, smooth)
G is specialized: (?, ?, ?, ?, ?, ?) → (red, ?, ?, ?, ?, ?), (?, small, ?, ?, ?, ?), (?, ?, ?, ?, ?, smooth)
Why is (?, ?, round, ?, ?, ?) not a valid specialization of G?

An Exercise: Another Positive
The fourth example is a positive example:
((red, small, elongated, humid, high, rough), poisonous)
S is generalized: (red, small, ?, humid, low, smooth) → (red, small, ?, humid, ?, ?)
G loses its member inconsistent with this positive example: (red, ?, ?, ?, ?, ?), (?, small, ?, ?, ?, ?), (?, ?, ?, ?, ?, smooth) → (red, ?, ?, ?, ?, ?), (?, small, ?, ?, ?, ?)

The Learned Version Space VS
G: (red, ?, ?, ?, ?, ?), (?, small, ?, ?, ?, ?)
Intermediate: (red, ?, ?, humid, ?, ?), (red, small, ?, ?, ?, ?), (?, small, ?, humid, ?, ?)
S: (red, small, ?, humid, ?, ?)
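For concreteness, here is how the candidate_elimination sketch above could be run on the four mushroom examples. The attribute value sets passed to the specialization step are assumptions (only the values that actually appear in the exercise).

```python
attribute_values = [
    ('red', 'gray'), ('small', 'large'), ('round', 'elongated'),
    ('humid', 'dry'), ('low', 'high'), ('smooth', 'rough'),
]
D = [
    (('red', 'small', 'round', 'humid', 'low', 'smooth'), True),       # poisonous
    (('red', 'small', 'elongated', 'humid', 'low', 'smooth'), True),   # poisonous
    (('gray', 'large', 'elongated', 'humid', 'low', 'rough'), False),  # not-poisonous
    (('red', 'small', 'elongated', 'humid', 'high', 'rough'), True),   # poisonous
]
S, G = candidate_elimination(D, attribute_values, n_attrs=6)
print(S)  # {('red', 'small', '?', 'humid', '?', '?')}
print(G)  # {('red', '?', '?', '?', '?', '?'), ('?', 'small', '?', '?', '?', '?')}
```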

Points to Consider
- Will the algorithm converge to the right hypothesis?
  The algorithm is guaranteed to converge to the right hypothesis provided that:
  - no errors exist in the examples, and
  - the target concept is included in the hypothesis space H.
- What happens if errors exist in the examples?
  The right hypothesis would be inconsistent and thus eliminated.
  If the S and G sets converge to an empty space, we have evidence that the true concept lies outside the space H.

Query Learning
Remember the version space VS after seeing our four examples from the mushroom database:
G: (red, ?, ?, ?, ?, ?), (?, small, ?, ?, ?, ?)
Intermediate: (red, ?, ?, humid, ?, ?), (red, small, ?, ?, ?, ?), (?, small, ?, humid, ?, ?)
S: (red, small, ?, humid, ?, ?)
What would be a good question to pose to the algorithm? What example is best next?

Query Learning
Remember there are three settings for learning:
1. Tasks are generated by a random process outside the learner.
2. The learner can pose queries to a teacher.
3. The learner explores its surroundings autonomously.
Here we focus on the second setting: posing queries to an expert.
Version space strategy: ask about the class of an example that would prune half of the space.
Example: (red, small, round, dry, low, smooth)
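As a quick check that this particular query splits the version space in half, the sketch below reuses the matches helper and lists the six hypotheses of the learned version space explicitly.

```python
version_space = [
    ('red', 'small', '?', 'humid', '?', '?'),   # S
    ('red', '?', '?', 'humid', '?', '?'),
    ('red', 'small', '?', '?', '?', '?'),
    ('?', 'small', '?', 'humid', '?', '?'),
    ('red', '?', '?', '?', '?', '?'),           # G
    ('?', 'small', '?', '?', '?', '?'),         # G
]
query = ('red', 'small', 'round', 'dry', 'low', 'smooth')
positive_votes = sum(matches(h, query) for h in version_space)
print(positive_votes, len(version_space) - positive_votes)
# 3 3 -> whichever label the teacher gives, half of the version space is eliminated
```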

Query Learning
In general, if we are able to prune the version space by half with each new query, then we can find an optimal hypothesis in the following number of steps:
log2 |VS|
Can you explain why?
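As a worked instance (using the six-hypothesis version space from the mushroom exercise): ⌈log2 |VS|⌉ = ⌈log2 6⌉ = 3, so about three well-chosen queries suffice to narrow the version space down to a single hypothesis.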

Classifying Examples
What if the version space VS has not collapsed into a single hypothesis and we are asked to classify a new instance?
- Suppose all hypotheses in the set S agree that the instance is positive. Then we can be sure that all hypotheses in VS agree the instance is positive. Why?
- The same can be said if the instance is classified as negative by all members of the set G. Why?
- In general, one can vote over all hypotheses in VS if there is no unanimous agreement.
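A minimal sketch of this voting rule, assuming the matches helper and the version_space list from the earlier snippets (classify_by_vote is an illustrative name):

```python
def classify_by_vote(version_space, x):
    """Vote over every hypothesis in the version space. A unanimous vote means
    the whole version space agrees, so the label is certain; otherwise return
    the majority label together with the fraction of hypotheses backing it."""
    positive = sum(matches(h, x) for h in version_space)
    negative = len(version_space) - positive
    return positive >= negative, max(positive, negative) / len(version_space)


print(classify_by_vote(version_space, ('red', 'small', 'round', 'humid', 'high', 'smooth')))
# (True, 1.0): the member of S matches, so every hypothesis in VS classifies it positive
```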

Concept Learning
- Definitions
- Search Space and General-Specific Ordering
- The Candidate Elimination Algorithm
- Inductive Bias

Inductive Bias
Inductive bias is the preference for a hypothesis space H and a search mechanism over H.
What would happen if we chose an H that contains all possible hypotheses? What would the size of H be?
|H| = the size of the power set of the input space X.
Example: with n Boolean features, |X| = 2^n, and so |H| = 2^(2^n).
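For a sense of scale (assuming, say, n = 6 Boolean features for illustration): |X| = 2^6 = 64 possible instances, and |H| = 2^64 ≈ 1.8 × 10^19 hypotheses, far too many to enumerate.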

Inductive Bias
In this case the candidate elimination algorithm would merely memorize the training data: it could classify with certainty only the training examples it has already seen, because H is so large that it contains every possible hypothesis and therefore rules nothing out about unseen instances.
A property of any inductive algorithm: it must have some embedded assumptions about the nature of H. Without assumptions, learning is impossible.

Summary
- The candidate elimination algorithm exploits the general-specific ordering of hypotheses to find all hypotheses consistent with the training data.
- The version space contains all consistent hypotheses and is simply represented by two lists: S and G.
- The candidate elimination algorithm is not robust to noise and assumes the target concept is included in the hypothesis space.
- Any inductive algorithm needs some assumptions about the hypothesis space; otherwise it would be impossible to perform predictions.