Presentation on theme: "Concept Learning and the General-to-Specific Ordering"— Presentation transcript:
1 Concept Learning and the General-to-Specific Ordering
2 Content: The learning task; Notation; The inductive learning hypothesis; Concept learning as search; Find-S: finding a maximally specific hypothesis; Version Space and the Candidate-Elimination algorithm; Remarks; Inductive Bias; Summary
3 The learning task. Concept learning: inferring a boolean-valued function from training examples (inputs and outputs).
4 Notation. Example: days on which my friend Aldo enjoys his favorite water sport. Representation: a hypothesis consists of constraints on the instance attributes. For each attribute: '?' means any value is acceptable; '∅' means no value is acceptable. Example: <?, Cold, High, ?, ?, ?>. Most general hypothesis (every day is a positive example): <?, ?, ?, ?, ?, ?>. Most specific hypothesis (no day is a positive example): <∅, ∅, ∅, ∅, ∅, ∅>.
5 Notation 2. Given: Instances X: the set of possible days, each described by the attributes (Sky with possible values Sunny, Cloudy and Rainy; AirTemp with Warm, Cold; ...). Hypotheses H: each hypothesis is described by a conjunction of constraints on the attributes Sky, AirTemp, Humidity, Wind, Water and Forecast; each constraint is '?', '∅' and/or a specific value. Target concept c: EnjoySport. Training examples D: positive and negative examples of the target function. Determine: a hypothesis h in H such that h(x) = c(x) for all x in X.
6 Notation 3. The set of training examples consists of ordered pairs <x, c(x)>: an instance x from X together with its target concept value c(x). c(x) = 1: positive example, a member of the target concept. c(x) = 0: negative example, a non-member of the target concept.
7 The inductive learning hypothesis. The only information available about c is its value over the training examples. Therefore an inductive learning algorithm can at best guarantee that the output hypothesis fits the target concept over the training data. Fundamental assumption: the best hypothesis regarding unseen instances is the hypothesis that best fits the observed data. The inductive learning hypothesis: any hypothesis found to approximate the target function well over a sufficiently large set of training examples will also approximate the target function well over other, unobserved examples.
8 Content: The learning task; Concept learning as search; General-to-Specific Ordering of Hypotheses; Find-S: finding a maximally specific hypothesis; Version Space and the Candidate-Elimination algorithm; Remarks; Inductive Bias; Summary
9 Concept learning as search. Note: by selecting a hypothesis representation, the designer of the learning algorithm implicitly defines the space of all hypotheses that the program can ever represent and therefore can ever learn. Homework: verify the following counts for the EnjoySport example: 3*2*2*2*2*2 = 96 distinct instances; 5*4*4*4*4*4 = 5120 syntactically distinct hypotheses (each attribute can additionally take '?' or '∅'); 1 + 4*3*3*3*3*3 = 973 semantically distinct hypotheses (every hypothesis containing a '∅' classifies all instances as negative, so they all represent the same concept).
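The three counts on this slide can be checked mechanically. A minimal sketch, assuming only the attribute domains of the running EnjoySport example (Sky has 3 values, the other five attributes have 2 each):

```python
# Counting the EnjoySport spaces from the attribute value counts.
from math import prod

domain_sizes = [3, 2, 2, 2, 2, 2]  # Sky, AirTemp, Humidity, Wind, Water, Forecast

instances = prod(domain_sizes)                    # every combination of concrete values
syntactic = prod(d + 2 for d in domain_sizes)     # each attribute also allows '?' and '∅'
semantic = 1 + prod(d + 1 for d in domain_sizes)  # any '∅' yields the same all-negative concept

print(instances, syntactic, semantic)  # 96 5120 973
```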
10 General-to-Specific Ordering of Hypotheses. Example general-to-specific: the second hypothesis is less constrained, so it classifies more instances as positive. In detail: for any instance x in X and hypothesis h in H, we say that x satisfies h if and only if h(x) = 1. Let h_j and h_k be boolean-valued functions defined over X. Then h_j is more_general_than_or_equal_to h_k (written h_j ≥g h_k) if and only if (∀x ∈ X) [(h_k(x) = 1) → (h_j(x) = 1)]. h_j is (strictly) more_general_than h_k (written h_j >g h_k) if and only if (h_j ≥g h_k) ∧ ¬(h_k ≥g h_j).
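For the conjunctive representation used here, the ≥g relation can be tested attribute by attribute. A hedged sketch; encoding '?' as the string '?' and the empty constraint '∅' as None is my own choice, not fixed by the slides:

```python
# more_general_than_or_equal_to for conjunctive hypotheses over six attributes.

def covers(hypothesis, instance):
    """h(x) = 1 iff every constraint accepts the corresponding attribute value."""
    # A None constraint never equals a concrete value, so it rejects everything.
    return all(c == '?' or c == v for c, v in zip(hypothesis, instance))

def more_general_or_equal(hj, hk):
    """hj >=g hk iff every instance satisfying hk also satisfies hj."""
    if any(c is None for c in hk):
        return True  # hk classifies nothing as positive, so anything is >=g hk
    return all(cj == '?' or cj == ck for cj, ck in zip(hj, hk))

h1 = ('Sunny', '?', '?', 'Strong', '?', '?')
h2 = ('Sunny', '?', '?', '?', '?', '?')
print(more_general_or_equal(h2, h1))  # True: h2 is less constrained
print(more_general_or_equal(h1, h2))  # False
```

The pair (h1, h2) also illustrates the strict relation: h2 >g h1 because h2 ≥g h1 holds and h1 ≥g h2 does not.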
11 General-to-Specific Ordering of Hypotheses 2. The relations are defined independently of the target concept. The relation more_general_than_or_equal_to defines a partial order over the hypothesis space H. Informally: there may be pairs h_j and h_k such that neither h_j ≥g h_k nor h_k ≥g h_j.
12 Example. Each hypothesis corresponds to some subset of X (the subset of instances that it classifies as positive). The arrows represent the more_general_than relation, with each arrow pointing toward the less general hypothesis.
13 Find-S: finding a maximally specific hypothesis. Use the more_general_than partial ordering: begin with the most specific possible hypothesis in H and generalise it each time it fails to cover an observed positive example.
1. Initialise h to the most specific hypothesis in H
2. For each positive training instance x: for each attribute constraint a_i in h, if the constraint a_i is satisfied by x then do nothing, else replace a_i in h by the next more general constraint that is satisfied by x
3. Output hypothesis h
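The two-line update rule above is short enough to implement directly. A minimal Find-S sketch, again using '?' for "any value" and None for the empty constraint; the training tuples reproduce the lecture's EnjoySport examples from the illustrative-example slides:

```python
# Find-S for the conjunctive representation: ignore negatives, minimally
# generalise on each positive example.

def find_s(examples, n_attributes=6):
    h = [None] * n_attributes          # most specific hypothesis <∅,...,∅>
    for x, label in examples:
        if label != 'Yes':
            continue                   # Find-S simply ignores negative examples
        for i, value in enumerate(x):
            if h[i] is None:
                h[i] = value           # first positive example: copy its values
            elif h[i] != value and h[i] != '?':
                h[i] = '?'             # next more general constraint satisfied by x
    return tuple(h)

training = [
    (('Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same'),   'Yes'),
    (('Sunny', 'Warm', 'High',   'Strong', 'Warm', 'Same'),   'Yes'),
    (('Rainy', 'Cold', 'High',   'Strong', 'Warm', 'Change'), 'No'),
    (('Sunny', 'Warm', 'High',   'Strong', 'Cool', 'Change'), 'Yes'),
]
print(find_s(training))  # ('Sunny', 'Warm', '?', 'Strong', '?', '?')
```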
14 Find-S: finding a maximally specific hypothesis (example). Step 1: h ← <∅, ∅, ∅, ∅, ∅, ∅>. Step 2: the first positive example forces h ← <Sunny, Warm, Normal, Strong, Warm, Same>. Step 3: for the second positive example, a '?' is substituted in place of any attribute value in h that is not satisfied by the new example: h ← <Sunny, Warm, ?, Strong, Warm, Same>. The third example is negative: the FIND-S algorithm simply ignores every negative example. Step 4: the fourth positive example yields h ← <Sunny, Warm, ?, Strong, ?, ?>.
15 Remarks. In the general case, as long as we assume that the hypothesis space H contains a hypothesis that describes the true target concept c, and that the training data contain no errors, the current hypothesis h can never require a revision in response to a negative example. Why? The current hypothesis is the most specific one consistent with the observed positive examples, so the target concept c must be more_general_than_or_equal_to h. But the target concept c never covers a negative example, and thus neither does h. Therefore no revision to h is required in response to any negative example. In the literature there are many different algorithms that use the same more_general_than partial ordering.
16 Content: The learning task; Concept learning as search; Find-S: finding a maximally specific hypothesis; Version Space and the Candidate-Elimination Algorithm (key idea: output a description of the set of all hypotheses consistent with the training examples): Representation; The list-then-eliminate algorithm; A more compact representation of the version space; Candidate-elimination learning algorithm; Example; Remarks; Inductive Bias; Summary
17 Representation. Definition: a hypothesis h is consistent with the set of training examples D if and only if h(x) = c(x) for each example <x, c(x)> in D. Difference between consistent and satisfies: x satisfies h ⇔ h(x) = 1, regardless of whether x is a positive or negative example of the target concept; h is consistent with <x, c(x)> ⇔ h(x) = c(x), i.e. the hypothesis h classifies the example correctly with respect to the target concept. Definition: the version space, denoted VS_{H,D}, with respect to hypothesis space H and training examples D, is the subset of hypotheses from H consistent with the training examples in D.
18 The List-Then-Eliminate algorithm. The simplest way to represent the version space is to list all of its elements.
1. VersionSpace ← a list containing every hypothesis in H
2. For each training example <x, c(x)>: remove from VersionSpace any hypothesis h for which h(x) ≠ c(x)
3. Output the list of hypotheses in VersionSpace
This algorithm cannot be applied whenever the hypothesis space H is infinite. Advantage: it is guaranteed to output all hypotheses consistent with the training data. Disadvantage: it is not efficiently computable (it enumerates all hypotheses in H).
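Because the EnjoySport hypothesis space is finite (5120 syntactically distinct hypotheses), List-Then-Eliminate can actually be run. A hedged sketch that enumerates every hypothesis (each constraint is '?', the empty constraint None, or one concrete value) and filters on the lecture's four training examples:

```python
# List-Then-Eliminate over the full finite EnjoySport hypothesis space.
from itertools import product

DOMAINS = [('Sunny', 'Cloudy', 'Rainy'), ('Warm', 'Cold'), ('Normal', 'High'),
           ('Strong', 'Weak'), ('Warm', 'Cool'), ('Same', 'Change')]

def covers(h, x):
    return all(c == '?' or c == v for c, v in zip(h, x))

def list_then_eliminate(examples):
    # Step 1: every syntactically distinct hypothesis.
    version_space = list(product(*(('?', None) + d for d in DOMAINS)))
    # Step 2: drop every hypothesis that misclassifies some training example.
    for x, c in examples:
        version_space = [h for h in version_space if covers(h, x) == (c == 'Yes')]
    return version_space

training = [
    (('Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same'),   'Yes'),
    (('Sunny', 'Warm', 'High',   'Strong', 'Warm', 'Same'),   'Yes'),
    (('Rainy', 'Cold', 'High',   'Strong', 'Warm', 'Change'), 'No'),
    (('Sunny', 'Warm', 'High',   'Strong', 'Cool', 'Change'), 'Yes'),
]
print(len(list_then_eliminate(training)))  # 6 consistent hypotheses remain
```

The six survivors are exactly the version space that the more compact S/G representation on the next slides delimits.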
19 A more compact representation of the version space. Representation: the version space is represented by its most general and least general members. These two sets of members form the general and specific boundary sets, which delimit the version space within the partially ordered hypothesis space.
20 Example of the List-Then-Eliminate algorithm. The result: arrows indicate the more_general_than relation. Candidate-Elimination represents the version space by storing only its most general members (labeled G) and its most specific ones (labeled S). With these two sets it is possible to enumerate all members of the version space. The hypothesis we are looking for lies between these two sets in the general-to-specific partial ordering over hypotheses.
21 Definition of the Boundary Sets. Definition: the general boundary G, with respect to hyp. space H and training data D, is the set of maximally general members of H consistent with D. Definition: the specific boundary S, with respect to hyp. space H and training data D, is the set of minimally general (i.e. maximally specific) members of H consistent with D.
22 Definition of the Boundary Sets (2). Theorem (version space representation theorem): let X be an arbitrary set of instances and let H be a set of boolean-valued hypotheses defined over X. Let c: X → {0, 1} be an arbitrary target concept defined over X, and let D be an arbitrary set of training examples {<x, c(x)>}. For all X, H, c and D such that S and G are well defined: VS_{H,D} = {h ∈ H | (∃s ∈ S)(∃g ∈ G) (g ≥g h ≥g s)}.
23 Candidate-Elimination learning algorithm. The algorithm computes the version space containing all hypotheses from H that are consistent with an observed sequence of training examples. Begin with the version space equal to the set of all hypotheses in H: G ← {<?, ?, ?, ?, ?, ?>}, the most general hypothesis in H, and S ← {<∅, ∅, ∅, ∅, ∅, ∅>}, the most specific; together they delimit the entire hypothesis space. As each training example is considered, the S and G boundary sets are generalised and specialised, respectively, to eliminate from the version space any hypothesis found inconsistent with the new training example. After all examples have been processed, the computed version space contains all the hypotheses consistent with these examples, and only these hypotheses.
24 Candidate-Elimination learning algorithm 2
1. Initialise G to the set of maximally general hypotheses in H
2. Initialise S to the set of maximally specific hypotheses in H
3. For each training example d, do:
   If d is a positive example:
     Remove from G any hypothesis inconsistent with d
     For each hypothesis s in S that is not consistent with d:
       Remove s from S
       Add to S all minimal generalisations h of s such that h is consistent with d and some member of G is more general than h
     Remove from S any hypothesis that is more general than another hypothesis in S
   If d is a negative example:
     Remove from S any hypothesis inconsistent with d
     For each hypothesis g in G that is not consistent with d:
       Remove g from G
       Add to G all minimal specialisations h of g such that h is consistent with d and some member of S is more specific than h
     Remove from G any hypothesis that is less general than another hypothesis in G
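The pseudocode above can be sketched directly for the conjunctive EnjoySport representation. A hedged, minimal implementation ('?' = any value, None = the empty constraint; helper names are my own, and the minimal-generalisation/specialisation operators are specialised to this representation):

```python
# Candidate-Elimination for conjunctive hypotheses over six attributes.

DOMAINS = {0: ('Sunny', 'Cloudy', 'Rainy'), 1: ('Warm', 'Cold'),
           2: ('Normal', 'High'), 3: ('Strong', 'Weak'),
           4: ('Warm', 'Cool'), 5: ('Same', 'Change')}

def covers(h, x):
    return all(c == '?' or c == v for c, v in zip(h, x))

def more_general_or_equal(hj, hk):
    if any(c is None for c in hk):
        return True                      # hk classifies nothing as positive
    return all(cj == '?' or cj == ck for cj, ck in zip(hj, hk))

def min_generalizations(s, x):
    """The unique minimal generalisation of s covering positive instance x."""
    return [tuple(v if c is None else ('?' if c != v else c)
                  for c, v in zip(s, x))]

def min_specializations(g, x):
    """Minimal specialisations of g that exclude negative instance x."""
    result = []
    for i, c in enumerate(g):
        if c == '?':
            for value in DOMAINS[i]:
                if value != x[i]:
                    result.append(g[:i] + (value,) + g[i + 1:])
    return result

def candidate_elimination(examples):
    G = [('?',) * 6]
    S = [(None,) * 6]
    for x, label in examples:
        if label == 'Yes':
            G = [g for g in G if covers(g, x)]
            S = ([h for s in S if not covers(s, x)
                  for h in min_generalizations(s, x)
                  if any(more_general_or_equal(g, h) for g in G)]
                 + [s for s in S if covers(s, x)])
        else:
            S = [s for s in S if not covers(s, x)]
            G = ([h for g in G if covers(g, x)
                  for h in min_specializations(g, x)
                  if any(more_general_or_equal(h, s) for s in S)]
                 + [g for g in G if not covers(g, x)])
        # Prune members that are not minimal (S) / not maximal (G).
        S = [s for s in S if not any(t != s and more_general_or_equal(s, t) for t in S)]
        G = [g for g in G if not any(t != g and more_general_or_equal(t, g) for t in G)]
    return S, G

training = [
    (('Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same'),   'Yes'),
    (('Sunny', 'Warm', 'High',   'Strong', 'Warm', 'Same'),   'Yes'),
    (('Rainy', 'Cold', 'High',   'Strong', 'Warm', 'Change'), 'No'),
    (('Sunny', 'Warm', 'High',   'Strong', 'Cool', 'Change'), 'Yes'),
]
S, G = candidate_elimination(training)
print(S)  # [('Sunny', 'Warm', '?', 'Strong', '?', '?')]
print(G)  # [('Sunny', '?', '?', '?', '?', '?'), ('?', 'Warm', '?', '?', '?', '?')]
```

Run on the lecture's four examples, this reproduces the boundary sets of the illustrative example on the next slides.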
25 An illustrative example. 1st example (Sunny, Warm, Normal, Strong, Warm, Same, Yes): S is overly specific and fails to cover this example, so the S boundary is moved to the least more general hypothesis; no update of the G boundary is needed. 2nd example (Sunny, Warm, High, Strong, Warm, Same, Yes): S is generalised further, again leaving G unchanged.
26 An illustrative example 2. 3rd example (Rainy, Cold, High, Strong, Warm, Change, No): it reveals that G is overly general (it incorrectly predicts this example as positive), so G must be specialised. There are six attributes that could be specified; why are there only three new hypotheses? For example, <?, ?, Normal, ?, ?, ?> is a minimal specialisation that correctly labels the new example as negative, but it is not included. Why? This hypothesis is inconsistent with the previously encountered positive examples. The algorithm determines this by noting that the hypothesis is not more general than the current specific boundary S.
28 An illustrative example 4. 4th example (Sunny, Warm, High, Strong, Cool, Change, Yes): S is generalised, and one member of G is removed because it fails to cover the new positive example. Why? It cannot be specialised (that would not make it cover the new example), and it cannot be generalised (by the definition of G, any more general hypothesis would cover at least one negative training example). Therefore the hypothesis must be dropped from G.
29 An illustrative example 5. The final boundary sets S and G delimit the version space of all hypotheses consistent with the set of incrementally observed training examples. This learned version space is independent of the sequence in which the training examples are presented.
30 Content: The learning task; Concept learning as search; Find-S: finding a maximally specific hypothesis; Version Space and the Candidate-Elimination algorithm; Remarks (Does the Candidate-Elimination algorithm converge to a correct hypothesis? What training examples should the learner request next?); Inductive Bias; Summary
31 Does the Candidate-Elimination algorithm converge to the correct hypothesis? Convergence requires that (1) there are no errors in the training examples, and (2) some hypothesis in H correctly describes the target concept. The target concept is exactly learned when the S and G boundary sets converge to a single, identical hypothesis. What happens if the training data contain errors? Assume example 2 is incorrectly presented as negative: the correct target concept is then removed (every hypothesis inconsistent with the training examples is removed). Given sufficient additional training data, the learner will eventually detect the inconsistency by noticing that S and G converge to an empty version space.
32 What training example should the learner request next? So far we assumed that the training examples are provided to the learner by some external teacher. Definition: query: the learner is allowed to conduct experiments in which it chooses the next instance and then obtains the correct classification for this instance from an external oracle. Example EnjoySport: what would be a good query strategy? Clearly the learner should choose an instance that would be classified positive by some hypotheses in the version space but negative by others, shrinking the version space from six hypotheses to half this number. In general, the optimal query strategy for a concept learner is to generate instances that satisfy exactly half the hypotheses in the current version space. When this is possible, the correct target concept can be found with ⌈log2 |VS|⌉ experiments.
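The query strategy can be sketched by brute force over the 96 instances: pick the instance whose positive/negative split of the current version space is closest to half. A hedged sketch; the six-hypothesis `version_space` below is the one from the lecture's running example, and the helper names are my own:

```python
# Choose the query instance that splits the version space most evenly.
from itertools import product

DOMAINS = [('Sunny', 'Cloudy', 'Rainy'), ('Warm', 'Cold'), ('Normal', 'High'),
           ('Strong', 'Weak'), ('Warm', 'Cool'), ('Same', 'Change')]

def covers(h, x):
    return all(c == '?' or c == v for c, v in zip(h, x))

version_space = [
    ('Sunny', 'Warm', '?', 'Strong', '?', '?'),
    ('Sunny', 'Warm', '?', '?', '?', '?'),
    ('Sunny', '?', '?', 'Strong', '?', '?'),
    ('?', 'Warm', '?', 'Strong', '?', '?'),
    ('Sunny', '?', '?', '?', '?', '?'),
    ('?', 'Warm', '?', '?', '?', '?'),
]

def best_query(version_space):
    half = len(version_space) / 2
    # Scan all 96 instances; minimise the distance from an even split.
    return min(product(*DOMAINS),
               key=lambda x: abs(sum(covers(h, x) for h in version_space) - half))

x = best_query(version_space)
n_pos = sum(covers(h, x) for h in version_space)
print(n_pos)  # 3: whichever answer the oracle gives, half the hypotheses are eliminated
```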
33 Content: The learning task; Concept learning as search; Find-S: finding a maximally specific hypothesis; Version Space and the Candidate-Elimination algorithm; Remarks; Inductive Bias (A biased hypothesis space; An unbiased learner); Summary
34 Inductive Bias. Questions: What if the target concept is not contained in the hypothesis space? Can this difficulty be avoided by using a hypothesis space that includes every possible hypothesis? How does the size of this hypothesis space influence the ability of the algorithm to generalise to unobserved instances? How does the size of the hypothesis space influence the number of training examples that must be observed?
35 A biased hypothesis space. Suppose we wish to assure that the hypothesis space contains the unknown target concept. The obvious solution is to enrich the hypothesis space to include every possible hypothesis. In the EnjoySport example, the hypothesis space was restricted to include only conjunctions of attribute values. For some sets of training examples (e.g. ones expressing a disjunctive target concept), the algorithm would find that there are zero hypotheses in the version space. Why? The most specific hypothesis consistent with the first two (positive) examples and representable in the given hypothesis space H is overly general: it incorrectly covers the third, negative example. The problem is the bias: H contains only conjunctive hypotheses.
36 An Unbiased Learner. Goal: ensure that the target concept is in the hypothesis space. Solution: provide a hypothesis space capable of representing every teachable concept, i.e. capable of representing every possible subset of the instances X: in general, the power set of X. Example EnjoySport: the size of the instance space X is 96, so there are 2^96 (about 10^28) distinct target concepts. Reformulate the example by defining a hypothesis space H' corresponding to the power set of X: allow arbitrary disjunctions, conjunctions and negations of the earlier hypotheses. One way to express, for example, Sky = Sunny or Sky = Cloudy in such an H': <Sunny, ?, ?, ?, ?, ?> ∨ <Cloudy, ?, ?, ?, ?, ?>.
37 An Unbiased Learner 2. The Candidate-Elimination algorithm can still be used, but a new problem arises: the concept learning algorithm is now completely unable to generalise beyond the observed examples. Why? Suppose we present three positive and two negative examples to the learner. S will contain just the disjunction of the positive examples, and G will be the negated disjunction of the negative examples. S and G will always be the disjunction of the positive examples and the negated disjunction of the negative examples, so the only instances that are unambiguously classified by S and G are the observed training examples themselves.
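Since H' is the power set of X, each hypothesis is simply a subset of X, and the failure can be shown with plain sets. A hedged sketch with a small stand-in instance space (the numbers are illustrative, not from the slides); an instance is unambiguously classified exactly when S and G, the extreme members of the version space, agree on it:

```python
# The unbiased learner: S covers exactly the positives, G excludes exactly
# the negatives, so every unseen instance is classified ambiguously.
X = set(range(8))            # a small stand-in instance space
positives = {0, 1, 2}        # three positive training examples
negatives = {3, 4}           # two negative training examples

S = positives                # the disjunction of the positive examples
G = X - negatives            # the negated disjunction of the negative examples

# Every hypothesis in the version space lies between S and G, so an instance
# is unambiguously classified only where S and G agree.
unambiguous = {x for x in X if (x in S) == (x in G)}
print(sorted(unambiguous))   # [0, 1, 2, 3, 4]: only the training examples themselves
```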
38 Summary. Concept learning can be seen as a problem of searching through a large, predefined space of potential hypotheses. The general_to_specific partial ordering of hypotheses provides a useful structure for organising the search through the hypothesis space. FIND-S performs a specific-to-general search along one branch of the partial ordering to find the most specific hypothesis consistent with the training examples. Candidate-Elimination incrementally computes the sets of maximally specific (S) and maximally general (G) hypotheses. S and G delimit the entire set of hypotheses consistent with the data and provide a means to identify the target concept: by looking at S and G one can determine whether the learner has converged to the target concept, whether the training data are inconsistent, and which query would be most useful to refine the version space next. The version space and the Candidate-Elimination algorithm are not robust to noisy data, and many problems arise if the unknown target concept is not expressible in the provided hypothesis space.