CS 9633 Machine Learning Concept Learning


CS 9633 Machine Learning: Concept Learning
References:
- Machine Learning, Tom Mitchell, 1997, Chapter 2
- Artificial Intelligence: A Modern Approach, Russell and Norvig, Second Edition, 2003, pages 678–686
- Elements of Machine Learning, Pat Langley, 1996, Chapter 2

Concept Learning
Inferring a Boolean-valued function from training examples. Training examples are labeled as members or non-members of the concept.

A Concept Learning Task Is Defined By
- the set of instances over which the target function is defined
- the target function
- the set of candidate hypotheses considered by the learner
- the set of available training examples

Example Concept
Days on which you would enjoy water sports.

Instances X
Possible days, each described by the attributes:
- Sky (Sunny, Cloudy, Rainy)
- AirTemp (Warm, Cold)
- Humidity (Normal, High)
- Wind (Strong, Weak)
- Water (Warm, Cold)
- Forecast (Same, Change)

Hypotheses H
Each hypothesis is a vector of 6 constraints, one specifying the value of each attribute. For each attribute, the constraint is one of:
- ? : any value is acceptable for this attribute
- a single required value for the attribute
- 0 : no value is acceptable
Sample hypothesis: <Rainy, ?, ?, ?, Warm, ?>
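
This representation is easy to render in code. Below is a minimal sketch in Python (the helper name `matches` is invented here, not from the slides): '?' accepts any value, and '0', which never equals an attribute value, accepts none.

```python
# Minimal sketch of the hypothesis representation: a 6-tuple of constraints.
def matches(h, x):
    """Return True iff instance x satisfies hypothesis h, i.e. h(x) = 1."""
    # '?' accepts anything; '0' equals no attribute value, so it rejects all.
    return all(c == '?' or c == v for c, v in zip(h, x))

h = ('Rainy', '?', '?', '?', 'Warm', '?')   # the sample hypothesis above
x = ('Rainy', 'Cold', 'High', 'Strong', 'Warm', 'Same')
print(matches(h, x))                        # True
```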

General and Specific Hypotheses
Most general hypothesis: <?, ?, ?, ?, ?, ?>
Most specific hypothesis: <0, 0, 0, 0, 0, 0>

Target Concept c
EnjoySport: X → {0, 1}

Training Examples D

Example  Sky    AirTemp  Humidity  Wind    Water  Forecast  EnjoySport
1        Sunny  Warm     Normal    Strong  Warm   Same      Yes
2        Sunny  Warm     High      Strong  Warm   Same      Yes
3        Rainy  Cold     High      Strong  Warm   Change    No
4        Sunny  Warm     High      Strong  Cool   Change    Yes

Determine
A hypothesis h in H such that h(x) = c(x) for all x in X.

The Inductive Learning Hypothesis
Any hypothesis found to approximate the target function well over a sufficiently large set of training examples will also approximate the target function well over other, unobserved examples.

Concept Learning as Search
Concept learning can be viewed as searching through a large space of hypotheses implicitly defined by the hypothesis representation.

Instance and Hypothesis Space Size
How many instances? How many syntactically distinct hypotheses? How many semantically distinct hypotheses?
- Sky (Sunny, Cloudy, Rainy)
- AirTemp (Warm, Cold)
- Humidity (Normal, High)
- Wind (Strong, Weak)
- Water (Warm, Cold)
- Forecast (Same, Change)
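
The counts can be checked with a few lines of arithmetic; this sketch reproduces the standard numbers from Mitchell, Chapter 2:

```python
# Sizes of the EnjoySport instance and hypothesis spaces.
values = [3, 2, 2, 2, 2, 2]          # number of values per attribute

instances = syntactic = semantic = 1
for v in values:
    instances *= v                   # |X| = 3*2*2*2*2*2
    syntactic *= v + 2               # each constraint: a value, '?', or '0'
    semantic  *= v + 1               # a value or '?' ...
semantic += 1                        # ...plus the single empty hypothesis

print(instances, syntactic, semantic)   # 96 5120 973
```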

Searching the Hypothesis Space
The goal is to efficiently search the hypothesis space to find the hypothesis that best fits the training data. The hypothesis space is potentially very large, possibly infinite.

General-to-Specific Ordering of Hypotheses
It is often possible to use a natural general-to-specific ordering of the hypothesis space to organize the search. We can then often exhaustively search the space without explicitly enumerating every hypothesis.

Example
h1 = <Sunny, ?, ?, Strong, ?, ?>
h2 = <Sunny, ?, ?, ?, ?, ?>
Which is more general?

Notation
For any instance x in X and hypothesis h in H, we say that x satisfies h if and only if h(x) = 1.

Definition
Let hj and hk be Boolean-valued functions defined over X. Then hj is more general than or equal to hk (written hj ≥g hk) if and only if
(∀x ∈ X) [(hk(x) = 1) → (hj(x) = 1)]
hj is strictly more general than hk (written hj >g hk) if and only if (hj ≥g hk) ∧ ¬(hk ≥g hj).
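
For the conjunctive representation this relation can be tested attribute by attribute, rather than by quantifying over all of X. A sketch (the helper names `constraint_geq` and `more_general_or_equal` are invented here); it confirms that h2 from the earlier example is strictly more general than h1:

```python
# Attribute-wise test of >=g for conjunctive hypotheses.
def constraint_geq(cj, ck):
    """True iff constraint cj accepts every value that ck accepts."""
    return cj == '?' or cj == ck or ck == '0'

def more_general_or_equal(hj, hk):
    return all(constraint_geq(cj, ck) for cj, ck in zip(hj, hk))

def strictly_more_general(hj, hk):
    return more_general_or_equal(hj, hk) and not more_general_or_equal(hk, hj)

h1 = ('Sunny', '?', '?', 'Strong', '?', '?')
h2 = ('Sunny', '?', '?', '?', '?', '?')
print(strictly_more_general(h2, h1))   # True: h2 covers every instance h1 covers
```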

Partially Ordered Sets
Properties of a partial order:
- reflexive
- transitive
- antisymmetric
The hypotheses form a lattice under this ordering.

Important Point
The ≥g and >g relations depend only on which instances satisfy the hypotheses, not on the target concept. We will now consider algorithms that take advantage of this partial order among hypotheses to organize the search space.

FIND-S Algorithm
Approach: start with the most specific hypothesis, and generalize it whenever it fails to cover a positive training example. A hypothesis "covers" a positive training example when it correctly classifies the example as true.

FIND-S
1. Initialize h to the most specific hypothesis in H.
2. For each positive training instance x:
   For each attribute constraint ai in h:
     If the constraint ai is satisfied by x, do nothing;
     else replace ai in h by the next more general constraint that is satisfied by x.
3. Output hypothesis h.

Apply to Training Examples D

Example  Sky    AirTemp  Humidity  Wind    Water  Forecast  EnjoySport
1        Sunny  Warm     Normal    Strong  Warm   Same      Yes
2        Sunny  Warm     High      Strong  Warm   Same      Yes
3        Rainy  Cold     High      Strong  Warm   Change    No
4        Sunny  Warm     High      Strong  Cool   Change    Yes
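
A direct transcription of FIND-S for this representation, run on the table above (a sketch; `find_s` is an invented name):

```python
# FIND-S for conjunctive hypotheses over the EnjoySport data.
def find_s(examples):
    h = ['0'] * 6                                # most specific hypothesis in H
    for x, label in examples:
        if label != 'Yes':                       # FIND-S ignores negative examples
            continue
        for i, (hc, xc) in enumerate(zip(h, x)):
            if hc == '0':
                h[i] = xc                        # first positive: copy the instance
            elif hc != xc:
                h[i] = '?'                       # minimal generalization
    return tuple(h)

D = [(('Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same'), 'Yes'),
     (('Sunny', 'Warm', 'High', 'Strong', 'Warm', 'Same'), 'Yes'),
     (('Rainy', 'Cold', 'High', 'Strong', 'Warm', 'Change'), 'No'),
     (('Sunny', 'Warm', 'High', 'Strong', 'Cool', 'Change'), 'Yes')]
print(find_s(D))   # ('Sunny', 'Warm', '?', 'Strong', '?', '?')
```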

Traversing the Lattice
[Figure: FIND-S moves through the lattice from the specific end toward the general end.]

Properties of FIND-S
- The hypothesis space is described as a conjunction of attribute constraints.
- FIND-S is guaranteed to output the most specific hypothesis within H that is consistent with the positive training examples.
- The final hypothesis is also consistent with the negative examples if the correct target concept is contained in H and the training examples are correct.

Consider This Example
Attribute 1 (possible values X, Y); Attribute 2 (possible values A, B, C).

Attribute 1  Attribute 2  Label
X            A            Yes
X            B            Yes
X            C            No

FIND-S generalizes the two positive examples to <X, ?>, which also covers the negative example <X, C>: the conjunctive hypothesis space does not contain this target concept.

Issues
- Has the learner converged to the correct target concept?
- Are there other consistent hypotheses?
- Why prefer the most specific hypothesis?
- Are the training examples consistent?
- What if there are several maximally specific consistent hypotheses?

Candidate Elimination Algorithm
The goal is to output a description of all hypotheses consistent with the training examples, computed without explicitly enumerating them. Also called least-commitment search. Like FIND-S, it uses the more-general-than partial ordering.

Definition
A hypothesis h is consistent with a set of training examples D if and only if h(x) = c(x) for each example <x, c(x)> in D.

Definition
The version space, denoted VS_{H,D}, with respect to hypothesis space H and training examples D, is the subset of hypotheses from H consistent with the training examples in D:
VS_{H,D} = {h ∈ H | Consistent(h, D)}

List-Then-Eliminate Algorithm
1. VersionSpace ← a list of every hypothesis in H.
2. For each training example <x, c(x)>: remove from VersionSpace any hypothesis h for which h(x) ≠ c(x).
3. Output the list of hypotheses in VersionSpace.
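
Because |H| is only 5120 here, LIST-THEN-ELIMINATE is actually feasible for this task; a sketch (`list_then_eliminate` and `DOMAINS` are invented names):

```python
from itertools import product

# All 5120 syntactic hypotheses: each attribute takes a value, '?', or '0'.
DOMAINS = [('Sunny', 'Cloudy', 'Rainy'), ('Warm', 'Cold'), ('Normal', 'High'),
           ('Strong', 'Weak'), ('Warm', 'Cold'), ('Same', 'Change')]

def matches(h, x):
    return all(c == '?' or c == v for c, v in zip(h, x))

def list_then_eliminate(examples):
    H = product(*[d + ('?', '0') for d in DOMAINS])
    return [h for h in H
            if all(matches(h, x) == (label == 'Yes') for x, label in examples)]

# On the four EnjoySport examples this returns exactly the six hypotheses
# of the version space shown in the next slide.
```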

A More Compact Representation of the Version Space
The candidate-elimination algorithm uses a more compact representation: the version space is represented by its most general and most specific members. These members form boundaries that delimit the version space within the partially ordered hypothesis space.

S: {<Sunny, Warm, ?, Strong, ?, ?>}
Intermediate: <Sunny, ?, ?, Strong, ?, ?>, <Sunny, Warm, ?, ?, ?, ?>, <?, Warm, ?, Strong, ?, ?>
G: {<Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?>}

[Figure: the version space is bounded above by the G-set {G1, G2, G3, ..., Gm} and below by the S-set {S1, S2, S3, ..., Sn}; hypotheses outside these boundaries lie in the inconsistent regions.]

Definitions of the General and Specific Boundaries
Definition: The general boundary G, with respect to hypothesis space H and training data D, is the set of maximally general members of H consistent with D.
Definition: The specific boundary S, with respect to hypothesis space H and training data D, is the set of minimally general (maximally specific) members of H consistent with D.

Theorem 2.1: Version Space Representation Theorem
Let X be an arbitrary set of instances, H a set of Boolean-valued hypotheses defined over X, c: X → {0, 1} an arbitrary target concept defined over X, and D an arbitrary set of training examples {<x, c(x)>}. For all X, H, c, and D such that S and G are well defined,
VS_{H,D} = {h ∈ H | (∃s ∈ S)(∃g ∈ G) (g ≥g h ≥g s)}

Candidate-Elimination Learning Algorithm
Initialize G to the most general boundary and S to the most specific, then use the training examples to refine both.

Initialize G to the set of maximally general hypotheses in H.
Initialize S to the set of maximally specific hypotheses in H.
For each training example d:
  If d is a positive example:
    - Remove from G any hypothesis inconsistent with d.
    - For each hypothesis s in S that is not consistent with d:
      - Remove s from S.
      - Add to S all minimal generalizations h of s such that h is consistent with d, and some member of G is more general than h.
      - Remove from S any hypothesis that is more general than another hypothesis in S.
  If d is a negative example:
    - Remove from S any hypothesis inconsistent with d.
    - For each hypothesis g in G that is not consistent with d:
      - Remove g from G.
      - Add to G all minimal specializations h of g such that h is consistent with d, and some member of S is more specific than h.
      - Remove from G any hypothesis that is less general than another hypothesis in G.
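
These boundary updates can be coded directly for the conjunctive representation. A compact sketch follows; `min_generalize`, `min_specialize`, and `more_geq` are helper names invented here. For this hypothesis language each s has exactly one minimal generalization, and a g is minimally specialized by filling one '?' with a value the negative example does not have.

```python
# A sketch of CANDIDATE-ELIMINATION for the conjunctive EnjoySport language.
DOMAINS = [('Sunny', 'Cloudy', 'Rainy'), ('Warm', 'Cold'), ('Normal', 'High'),
           ('Strong', 'Weak'), ('Warm', 'Cold'), ('Same', 'Change')]

def matches(h, x):
    return all(c == '?' or c == v for c, v in zip(h, x))

def more_geq(hj, hk):
    """hj >=g hk, tested attribute-wise."""
    return all(cj == '?' or cj == ck or ck == '0' for cj, ck in zip(hj, hk))

def min_generalize(s, x):
    """The unique minimal generalization of s covering positive instance x."""
    return tuple(v if c in ('0', v) else '?' for c, v in zip(s, x))

def min_specialize(g, x):
    """All minimal specializations of g excluding negative instance x."""
    return [g[:i] + (v,) + g[i + 1:]
            for i, c in enumerate(g) if c == '?'
            for v in DOMAINS[i] if v != x[i]]

def candidate_elimination(examples):
    S, G = [('0',) * 6], [('?',) * 6]
    for x, label in examples:
        if label == 'Yes':
            G = [g for g in G if matches(g, x)]             # drop inconsistent g
            S = [min_generalize(s, x) if not matches(s, x) else s for s in S]
            S = [s for s in S if any(more_geq(g, s) for g in G)
                 and not any(t != s and more_geq(s, t) for t in S)]
        else:
            S = [s for s in S if not matches(s, x)]         # drop inconsistent s
            newG = [g for g in G if not matches(g, x)]
            for g in G:
                if matches(g, x):                           # g is inconsistent
                    newG += [h for h in min_specialize(g, x)
                             if any(more_geq(h, s) for s in S)]
            G = [g for g in newG
                 if not any(h != g and more_geq(h, g) for h in newG)]
    return S, G

D = [(('Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same'), 'Yes'),
     (('Sunny', 'Warm', 'High', 'Strong', 'Warm', 'Same'), 'Yes'),
     (('Rainy', 'Cold', 'High', 'Strong', 'Warm', 'Change'), 'No'),
     (('Sunny', 'Warm', 'High', 'Strong', 'Cool', 'Change'), 'Yes')]
S, G = candidate_elimination(D)
print(S)   # [('Sunny', 'Warm', '?', 'Strong', '?', '?')]
print(G)   # [('Sunny', '?', '?', '?', '?', '?'), ('?', 'Warm', '?', '?', '?', '?')]
```

Running it on D reproduces the trace in the next four slides.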

Training Example 1: <Sunny, Warm, Normal, Strong, Warm, Same>, EnjoySport = Yes
S0: {<0, 0, 0, 0, 0, 0>}  →  S1: {<Sunny, Warm, Normal, Strong, Warm, Same>}
G0: {<?, ?, ?, ?, ?, ?>}  →  G1: {<?, ?, ?, ?, ?, ?>}

Training Example 2: <Sunny, Warm, High, Strong, Warm, Same>, EnjoySport = Yes
S1: {<Sunny, Warm, Normal, Strong, Warm, Same>}  →  S2: {<Sunny, Warm, ?, Strong, Warm, Same>}
G1: {<?, ?, ?, ?, ?, ?>}  →  G2: {<?, ?, ?, ?, ?, ?>}

Training Example 3: <Rainy, Cold, High, Strong, Warm, Change>, EnjoySport = No
S2: {<Sunny, Warm, ?, Strong, Warm, Same>}  →  S3: {<Sunny, Warm, ?, Strong, Warm, Same>}
G2: {<?, ?, ?, ?, ?, ?>}  →  G3: {<Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?>, <?, ?, ?, ?, ?, Same>}

Training Example 4: <Sunny, Warm, High, Strong, Cool, Change>, EnjoySport = Yes
S3: {<Sunny, Warm, ?, Strong, Warm, Same>}  →  S4: {<Sunny, Warm, ?, Strong, ?, ?>}
G3: {<Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?>, <?, ?, ?, ?, ?, Same>}  →  G4: {<Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?>}

Questions
- Does the order of presentation of examples matter?
- How will you know when the concept has been learned?
- How do you know when you have presented enough training data?
- What happens if incorrectly labeled examples are presented?
- If the learner can request examples, which example should be requested next?

Current version space:
S: {<Sunny, Warm, ?, Strong, ?, ?>}
Intermediate: <Sunny, ?, ?, Strong, ?, ?>, <Sunny, Warm, ?, ?, ?, ?>, <?, Warm, ?, Strong, ?, ?>
G: {<Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?>}
Select an example that would be classified as positive by some hypotheses and negative by others.
Training example possibility: <Sunny, Warm, Normal, Light, Warm, Same>, Positive.

Partially Learned Concepts
Can we classify unseen examples even though we still have multiple hypotheses? For some instances, yes.

Optimal Strategy
Generate instances that satisfy exactly half of the hypotheses in the current version space. The correct query concept can then be found in ⌈log2 |VS|⌉ experiments.

S: {<Sunny, Warm, ?, Strong, ?, ?>}
Intermediate: <Sunny, ?, ?, Strong, ?, ?>, <Sunny, Warm, ?, ?, ?, ?>, <?, Warm, ?, Strong, ?, ?>
G: {<Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?>}
A: <Sunny, Warm, Normal, Strong, Cool, Change> ?

(Same version space.) B: <Rainy, Cold, Normal, Light, Warm, Same> ?

(Same version space.) C: <Sunny, Warm, Normal, Light, Warm, Same> ?

(Same version space.) D: <Sunny, Cold, Normal, Strong, Warm, Same> ?
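
Polling the six version-space members settles instances A through D: unanimous votes classify confidently, split votes do not. A sketch (the hypotheses are copied from the version-space diagram above):

```python
# Classify with a partially learned concept by polling the version space.
VS = [('Sunny', 'Warm', '?', 'Strong', '?', '?'),
      ('Sunny', '?', '?', 'Strong', '?', '?'),
      ('Sunny', 'Warm', '?', '?', '?', '?'),
      ('?', 'Warm', '?', 'Strong', '?', '?'),
      ('Sunny', '?', '?', '?', '?', '?'),
      ('?', 'Warm', '?', '?', '?', '?')]

def matches(h, x):
    return all(c == '?' or c == v for c, v in zip(h, x))

queries = {'A': ('Sunny', 'Warm', 'Normal', 'Strong', 'Cool', 'Change'),
           'B': ('Rainy', 'Cold', 'Normal', 'Light', 'Warm', 'Same'),
           'C': ('Sunny', 'Warm', 'Normal', 'Light', 'Warm', 'Same'),
           'D': ('Sunny', 'Cold', 'Normal', 'Strong', 'Warm', 'Same')}
for name, x in queries.items():
    votes = sum(matches(h, x) for h in VS)
    print(name, f'{votes}/6 vote positive')
# A: 6/6 (positive), B: 0/6 (negative), C: 3/6 and D: 2/6 (ambiguous)
```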

Inductive Bias
Candidate Elimination converges toward the true concept provided that:
- it is given enough training examples,
- its initial hypothesis space contains the target concept, and
- the training data contains no noise.
Fundamental questions:
- What if the target concept is not contained in the hypothesis space?
- Can we include every possible hypothesis?
- How does the size of the hypothesis space influence the ability to generalize?
- How does the size of the hypothesis space influence the number of examples we need?

Biased Hypothesis Space
The hypothesis space we have been considering biases the concepts we can learn. An alternative is to use a hypothesis space that can represent every teachable concept (all possible subsets of X).

Unbiased Hypothesis Space
We need to be able to represent all possible subsets of instances. If set X has |X| elements, what is the size of the power set of X? For our example, the size of X is 96. What is the size of the power set? What was the size of the hypothesis space we have been considering?
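
For scale, the power set dwarfs the conjunctive space (a one-line check; 973 is the semantically distinct count computed earlier):

```python
# One concept per subset of the 96 instances.
print(2 ** 96)   # 79228162514264337593543950336 concepts, vs. 973 in our H
```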

Consider an Alternate Definition of EnjoySport
We want to be able to represent every subset of instances with a new hypothesis space H′. Let H′ be the power set of X. One method: allow arbitrary disjunctions, conjunctions, and negations of the earlier hypotheses. The target concept "Sky = Sunny or Sky = Cloudy" would then be:
<Sunny, ?, ?, ?, ?, ?> ∨ <Cloudy, ?, ?, ?, ?, ?>

Alternate Representation
Positive examples: x1, x2, and x3. Negative examples: x4 and x5.
Most general hypothesis: {¬(x4 ∨ x5)}
Most specific hypothesis: {(x1 ∨ x2 ∨ x3)}
What can we classify unambiguously? Only the training examples themselves: every unseen instance satisfies exactly half of the version space, so no generalization is possible.

The Conclusion: The Problem of Induction
Generalizing from any set of observations is never logically justified, since there always exist many hypotheses that could account for the observed data. A learning system MUST limit or direct its search through the space of possible knowledge structures; this is called the bias of the system. Two kinds of bias:
- Representational bias (limits the hypotheses that can be expressed)
- Search bias (considers all possible concept descriptions, but examines some earlier than others)

Inductive Learning
- All inductive learning systems use inductive bias.
- Learning algorithms can be characterized by the bias they use.
- Bias is a useful point of comparison between learning algorithms.

Notation
y ≻ z : z inductively follows from y
y ⊢ z : z deductively follows from y

Notation
Let L be an arbitrary learning algorithm and Dc an arbitrary set of training data for an arbitrary concept c. After learning, L is asked to classify a new instance xi; let L(xi, Dc) denote that classification. The inductive inference step is then:
(Dc ∧ xi) ≻ L(xi, Dc)

Inductive Bias Definition
Consider a concept learning algorithm L for the set of instances X. Let c be an arbitrary concept defined over X, and let Dc = {<x, c(x)>} be an arbitrary set of training examples of c. Let L(xi, Dc) denote the classification assigned to the instance xi by L after training on data Dc. The inductive bias of L is any minimal set of assertions B such that, for any target concept c and corresponding training examples Dc:
(∀xi ∈ X) [(B ∧ Dc ∧ xi) ⊢ L(xi, Dc)]

[Figure: an inductive system and its equivalent deductive system.]
Inductive system: training examples and a new instance go into the Candidate Elimination algorithm (using hypothesis space H), which outputs a classification of the new instance, or "don't know".
Deductive system: the same training examples and new instance, plus the assertion "H contains the target concept", go into a theorem prover, which outputs a classification of the new instance, or "don't know".

Conclusions
- Concept learning can be cast as search.
- The general-to-specific ordering of hypotheses provides a useful structure for guiding the search.
- Version spaces and candidate elimination provide a useful framework for studying machine learning issues.
- Inductive learning must use a representational or search bias.