
1 CpSc 810: Machine Learning Concept Learning and General to Specific Ordering

2 Copyright Notice Most slides in this presentation are adapted from the textbook slides and various other sources. The copyright belongs to the original authors. Thanks!

3 Review
- Choose the training experience
- Choose the target function
- Choose a representation
- Choose the parameter-fitting algorithm
- Evaluate the entire system

4 What is a Concept? A concept is a subset of objects or events defined over a larger set. Example: the concept of a bird is the subset of all objects (i.e., of the set of all things, or of all animals) that belong to the category "bird". Alternatively, a concept is a Boolean-valued function defined over this larger set. Example: a function defined over all animals whose value is true for birds and false for every other animal. (Diagram labels: Things, Animals, Birds, Cars.)

5 What is Concept-Learning? Given a set of examples labeled as members or non-members of a concept, concept learning consists of automatically inferring the general definition of this concept. In other words, concept learning is inferring a Boolean-valued function from training examples of its input and output.

6 Example of a Concept-Learning Task
Concept: Good Days for Water Sports (values: Yes, No)
Attributes/Features:
- Sky (values: Sunny, Cloudy, Rainy)
- AirTemp (values: Warm, Cold)
- Humidity (values: Normal, High)
- Wind (values: Strong, Weak)
- Water (values: Warm, Cool)
- Forecast (values: Same, Change)
(The slide shows one training example together with its class label.)

7 Example of a Concept-Learning Task
Training examples. The task is to learn to predict the value of EnjoySport for an arbitrary day, i.e., to learn the general concept EnjoySport. One possible Python encoding of this table appears below.

Day  Sky    AirTemp  Humidity  Wind    Water  Forecast  EnjoySport
1    Sunny  Warm     Normal    Strong  Warm   Same      Yes
2    Sunny  Warm     High      Strong  Warm   Same      Yes
3    Rainy  Cold     High      Strong  Warm   Change    No
4    Sunny  Warm     High      Strong  Cool   Change    Yes
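A minimal sketch of the data in Python (the tuple layout and the name TRAINING_DATA are illustrative choices, not part of the original slides):

# Illustrative Python encoding of the four EnjoySport training examples.
# Attribute order: (Sky, AirTemp, Humidity, Wind, Water, Forecast);
# the Boolean label is the EnjoySport value.
TRAINING_DATA = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), True),
    (("Sunny", "Warm", "High", "Strong", "Warm", "Same"), True),
    (("Rainy", "Cold", "High", "Strong", "Warm", "Change"), False),
    (("Sunny", "Warm", "High", "Strong", "Cool", "Change"), True),
]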

8 Terminology and Notation
Instances X: the set of items over which the concept is defined is called the set of instances.
The concept to be learned is called the target concept, denoted c: X --> {0,1}.
The set of training examples D is a set of instances x along with their target concept values c(x).
Members of the concept (instances for which c(x) = 1) are called positive examples; nonmembers of the concept (instances for which c(x) = 0) are called negative examples.

9 Terminology and Notation Given a set of training examples of the target concept c, the problem faced by the learner is to hypothesize, or estimate, c. The hypothesis space H represents the set of all possible hypotheses; H is determined by the human designer's choice of a hypothesis representation. Each hypothesis h in H is described by a conjunction of constraints on the attributes. The goal of concept learning is to find a hypothesis h: X --> {0,1} such that h(x) = c(x) for all x in X.

10 Representing Hypotheses
For each attribute, the constraint can be:
- a specific value (e.g., Water = Warm)
- "?", meaning "any value is acceptable"
- "0", meaning "no value is acceptable"
Each hypothesis consists of a conjunction of constraints on the attributes. Example of a hypothesis: <?, Cold, High, ?, ?, ?> (if the air temperature is cold and the humidity is high, then it is a good day for water sports).
Selecting a hypothesis representation is an important step, since it restricts (or biases) the space that can be searched. For example, the hypothesis "if the air temperature is cold OR the humidity is high, then it is a good day for water sports" cannot be expressed in our chosen representation. A sketch of this representation in code follows.
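A minimal Python sketch of the conjunctive representation, assuming the tuple encoding above (the helper name matches() is ours, not from the slides):

# A hypothesis is a 6-tuple of constraints: a specific value, "?" (any
# value is acceptable), or "0" (no value is acceptable).
def matches(hypothesis, instance):
    """Return True iff the instance satisfies every attribute constraint.
    A "0" constraint matches nothing, since no attribute value equals "0"."""
    return all(c in ("?", v) for c, v in zip(hypothesis, instance))

h = ("?", "Cold", "High", "?", "?", "?")
print(matches(h, ("Rainy", "Cold", "High", "Strong", "Warm", "Change")))  # True
print(matches(h, ("Sunny", "Warm", "High", "Strong", "Warm", "Same")))    # False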

11 Prototypical Concept-Learning Task
Given: instances X, target concept c, hypothesis space H, and training examples D = {<x, c(x)>}.
Determine: a hypothesis h in H such that h(x) = c(x) for all x in D.

12 The Inductive Learning Hypothesis The inductive learning hypothesis: any hypothesis found to approximate the target function well over a sufficiently large set of training examples will also approximate the target function well over other, unobserved examples.

13 Concept Learning as Search
Concept learning can be viewed as the task of searching through a large space of hypotheses implicitly defined by the hypothesis representation. The goal of the search is to find the hypothesis that best fits the training examples.
How many possible hypotheses are there for the EnjoySport learning task?
- Syntactically distinct hypotheses: 5*4*4*4*4*4 = 5120
- Semantically distinct hypotheses (every hypothesis containing one or more "0" constraints represents the same empty set of instances): 1 + (4*3*3*3*3*3) = 973
The short computation below verifies these counts.
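A quick check of the counts in Python (attribute sizes taken from slide 6):

from math import prod

# Number of values per attribute: Sky has 3, the other five have 2 each.
sizes = [3, 2, 2, 2, 2, 2]
syntactic = prod(n + 2 for n in sizes)      # each attribute also allows "?" and "0": 5 * 4**5
semantic = 1 + prod(n + 1 for n in sizes)   # one empty hypothesis, plus "?"-extended combinations
print(syntactic, semantic)                  # 5120 973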

14 General-to-Specific Ordering of Hypotheses
Most general hypothesis: <?, ?, ?, ?, ?, ?> (every day is a good day for water sports).
Most specific hypothesis: <0, 0, 0, 0, 0, 0> (no day is a good day for water sports).
Definition: let h_j and h_k be Boolean-valued functions defined over X. Then h_j is more-general-than-or-equal-to h_k iff for all x in X, (h_k(x) = 1) --> (h_j(x) = 1).
Example: h1 = <Sunny, ?, ?, Strong, ?, ?> and h2 = <Sunny, ?, ?, ?, ?, ?>. Every instance that is classified as positive by h1 will also be classified as positive by h2. Therefore h2 is more general than h1. This relation can be tested directly in code, as sketched below.
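For the conjunctive representation, the more-general-than-or-equal-to relation can be tested attribute by attribute; a sketch (the function name is ours):

def more_general_or_equal(hj, hk):
    """True iff every instance accepted by hk is also accepted by hj."""
    if "0" in hk:  # hk accepts no instance at all, so any hj is >= hk
        return True
    # Each constraint of hj must be "?" or identical to hk's constraint.
    return all(a == "?" or a == b for a, b in zip(hj, hk))

h1 = ("Sunny", "?", "?", "Strong", "?", "?")
h2 = ("Sunny", "?", "?", "?", "?", "?")
print(more_general_or_equal(h2, h1))  # True: h2 is more general than h1
print(more_general_or_equal(h1, h2))  # False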

15 Instances, Hypotheses, more_general_than (figure: instances in X and hypotheses in H, with arrows running from more specific to more general hypotheses)

16 Find-S, a Maximally Specific Hypothesis Learning Algorithm
1. Initialize h to the most specific hypothesis in H.
2. For each positive training instance x:
   For each attribute constraint a_i in h:
     If the constraint a_i is satisfied by x, do nothing;
     else replace a_i in h by the next more general constraint that is satisfied by x.
3. Output hypothesis h.
A Python sketch of this loop follows.
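A runnable sketch of Find-S for our conjunctive representation, assuming the TRAINING_DATA encoding from earlier (helper names are illustrative):

def find_s(examples):
    """Return the maximally specific conjunctive hypothesis consistent
    with the positive examples (negative examples are simply ignored)."""
    n = len(examples[0][0])
    h = ["0"] * n                    # most specific hypothesis: rejects everything
    for instance, label in examples:
        if not label:
            continue                 # Find-S ignores negative examples
        for i, value in enumerate(instance):
            if h[i] == "0":
                h[i] = value         # first positive example: adopt its values
            elif h[i] != value:
                h[i] = "?"           # conflicting value: generalize to "any value"
    return tuple(h)

# On the four EnjoySport examples, find_s(TRAINING_DATA) yields
# ('Sunny', 'Warm', '?', 'Strong', '?', '?').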

17 Hypothesis Space Search by Find-S (figure: the sequence of increasingly general hypotheses produced by Find-S on the training data)

18 Shortcomings of Find-S
Although Find-S finds a hypothesis consistent with the training data, it does not indicate whether that is the only one available.
- Is it a good strategy to prefer the most specific hypothesis?
- What if the training set is inconsistent (noisy)?
- What if there are several maximally specific consistent hypotheses? Find-S cannot backtrack!

19 Version Spaces
A hypothesis h is consistent with a set of training examples D of target concept c iff h(x) = c(x) for each example <x, c(x)> in D.
The version space, denoted VS_{H,D}, with respect to hypothesis space H and training examples D, is the subset of hypotheses from H consistent with all the training examples in D. The consistency test is a one-liner, as sketched below.
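A sketch of the consistency test, reusing the cover test from earlier (names are ours):

def consistent(hypothesis, examples):
    """hypothesis is consistent with D iff h(x) == c(x) for every <x, c(x)> in D."""
    def matches(h, x):  # the cover test sketched earlier
        return all(c in ("?", v) for c, v in zip(h, x))
    return all(matches(hypothesis, x) == label for x, label in examples)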

20 The List-Then-Eliminate Algorithm
1. VersionSpace <- a list containing every hypothesis in H.
2. For each training example <x, c(x)>, remove from VersionSpace any hypothesis h for which h(x) ≠ c(x).
3. Output the list of hypotheses in VersionSpace.
A direct implementation is sketched below.
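Because H is finite (and here small), List-Then-Eliminate can be written directly; a sketch, with the attribute-value lists mirroring slide 6:

from itertools import product

def list_then_eliminate(examples, attribute_values):
    """Enumerate every conjunctive hypothesis, then keep those consistent
    with all training examples -- feasible only because |H| = 5120 here."""
    def matches(h, x):
        return all(c in ("?", v) for c, v in zip(h, x))
    constraints = [list(vals) + ["?", "0"] for vals in attribute_values]
    return [h for h in product(*constraints)
            if all(matches(h, x) == label for x, label in examples)]

ATTRIBUTE_VALUES = [
    ["Sunny", "Cloudy", "Rainy"], ["Warm", "Cold"], ["Normal", "High"],
    ["Strong", "Weak"], ["Warm", "Cool"], ["Same", "Change"],
]
# list_then_eliminate(TRAINING_DATA, ATTRIBUTE_VALUES) returns the six
# hypotheses of the version space shown on the next slide.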

21 Example Version Space
The version space for our training examples contains six hypotheses, bounded by S = {<Sunny, Warm, ?, Strong, ?, ?>} and G = {<Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?>}.

Day  Sky    AirTemp  Humidity  Wind    Water  Forecast  EnjoySport
1    Sunny  Warm     Normal    Strong  Warm   Same      Yes
2    Sunny  Warm     High      Strong  Warm   Same      Yes
3    Rainy  Cold     High      Strong  Warm   Change    No
4    Sunny  Warm     High      Strong  Cool   Change    Yes

22 Discussion of List-Then-Eliminate It works whenever the hypothesis space H is finite, and it is guaranteed to output all hypotheses consistent with the training data. Shortcoming: exhaustively enumerating all hypotheses in H is an unrealistic requirement.

23 A Compact Representation for Version Spaces
Instead of enumerating all the hypotheses consistent with a training set, we can represent only its most specific and most general boundaries; the hypotheses in between these two boundaries can be generated as needed.
Definition: the general boundary G, with respect to hypothesis space H and training data D, is the set of maximally general members of H consistent with D.
Definition: the specific boundary S, with respect to hypothesis space H and training data D, is the set of minimally general (i.e., maximally specific) members of H consistent with D.

24 Version Space Representation Theorem
Every member of the version space lies between these boundaries:
VS_{H,D} = { h in H | there exist s in S and g in G such that g ≥ h ≥ s }
where x ≥ y means x is more general than or equal to y.
Proof sketch: 1. every h satisfying the right-hand side of the above expression is in VS_{H,D}; 2. every member of VS_{H,D} satisfies the right-hand side of the expression.

25 Candidate-Elimination Learning Algorithm
Initialize G to the set of maximally general hypotheses in H and S to the set of maximally specific hypotheses in H. For each training example d:
- If d is a positive example: remove from G any hypothesis inconsistent with d; for each s in S inconsistent with d, remove s and add its minimal generalizations that are consistent with d and are less general than some member of G; then remove from S any hypothesis more general than another hypothesis in S.
- If d is a negative example: remove from S any hypothesis inconsistent with d; for each g in G inconsistent with d, remove g and add its minimal specializations that are consistent with d and are more general than some member of S; then remove from G any hypothesis less general than another hypothesis in G.
A runnable sketch follows below.
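A compact Python sketch of the algorithm under the same conjunctive representation (all helper names are ours; this is an illustration, not the textbook's exact pseudocode):

def candidate_elimination(examples, domains):
    """Maintain the S and G boundary sets over the conjunctive hypothesis
    space; `domains` lists the possible values of each attribute."""
    n = len(domains)
    S = {("0",) * n}              # most specific boundary
    G = {("?",) * n}              # most general boundary

    def covers(h, x):
        return all(c in ("?", v) for c, v in zip(h, x))

    def more_general(hj, hk):     # hj >= hk
        if "0" in hk:             # hk covers nothing, so anything is >= hk
            return True
        return all(a == "?" or a == b for a, b in zip(hj, hk))

    def generalize(s, x):         # the unique minimal generalization covering x
        return tuple(v if c == "0" else (c if c == v else "?")
                     for c, v in zip(s, x))

    def specializations(g, x):    # minimal specializations of g that exclude x
        for i, c in enumerate(g):
            if c == "?":
                for v in domains[i]:
                    if v != x[i]:
                        yield g[:i] + (v,) + g[i + 1:]

    for x, positive in examples:
        if positive:
            G = {g for g in G if covers(g, x)}
            # In this representation the minimal generalization is unique,
            # so S stays a singleton set.
            S = {generalize(s, x) for s in S}
            S = {s for s in S if any(more_general(g, s) for g in G)}
        else:
            S = {s for s in S if not covers(s, x)}
            new_G = set()
            for g in G:
                if not covers(g, x):
                    new_G.add(g)
                else:
                    for h in specializations(g, x):
                        if any(more_general(h, s) for s in S):
                            new_G.add(h)
            # keep only the maximally general members of the boundary
            G = {g for g in new_G
                 if not any(h != g and more_general(h, g) for h in new_G)}
    return S, G

# On the EnjoySport data this returns
# S = {('Sunny', 'Warm', '?', 'Strong', '?', '?')}
# G = {('Sunny', '?', '?', '?', '?', '?'), ('?', 'Warm', '?', '?', '?', '?')}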

26-30 Candidate-Elimination Trace
(Slides 26-30 step through the algorithm on the four EnjoySport training examples, showing how the S and G boundaries are updated after each example.)

31 Remarks on Version Spaces and Candidate-Elimination
The Candidate-Elimination algorithm computes the version space containing all (and only) the hypotheses from H that are consistent with an observed sequence of training examples.
The version space learned by the Candidate-Elimination algorithm will converge toward the hypothesis that correctly describes the target concept, provided that (1) there are no errors in the training examples, and (2) there is some hypothesis in H that correctly describes the target concept.
Convergence can be sped up by presenting the data in a strategic order; the best examples are those that satisfy exactly half of the hypotheses in the current version space.
Version spaces can be used to assign certainty scores to the classification of new examples, as in the sketch below.
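One way to realize the certainty-score idea (a sketch; the voting scheme is implied by the slide, and the function name is ours):

def vote(version_space, x):
    """Fraction of version-space hypotheses that classify x as positive.
    1.0 or 0.0 means all hypotheses agree; a value near 0.5 means x is
    a maximally informative example to query next."""
    def covers(h, inst):
        return all(c in ("?", v) for c, v in zip(h, inst))
    positive_votes = sum(1 for h in version_space if covers(h, x))
    return positive_votes / len(version_space)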

32 How Should These Be Classified? (figure: new instances on which the version-space hypotheses may or may not agree)

33 A Biased Hypothesis Space
Given our previous choice of the hypothesis space representation, no hypothesis is consistent with a training set such as:
<Sunny, Warm, Normal, Strong, Cool, Change> Yes
<Cloudy, Warm, Normal, Strong, Cool, Change> Yes
<Rainy, Warm, Normal, Strong, Cool, Change> No
(the target concept here is "Sky = Sunny or Sky = Cloudy"). We have BIASED the learner to consider only conjunctive hypotheses. In this case, we need a more expressive hypothesis space.

34 An Unbiased Learner
In order to solve the problem caused by the bias of the hypothesis space, we can remove this bias and allow the hypotheses to represent every possible subset of instances, by allowing arbitrary disjunctions, conjunctions, and negations of the previous hypotheses. The previous examples could then be expressed as:
<Sunny, ?, ?, ?, ?, ?> v <Cloudy, ?, ?, ?, ?, ?>
However, such an unbiased learner is not able to generalize beyond the observed examples! Every non-observed example will be classified as positive by half the hypotheses of the version space and as negative by the other half.

35 An Example of an Unbiased Learner Rote-learner: learning corresponds simply to storing each observed training example in memory. Subsequent instances are classified by looking them up in memory: if the instance is found in memory, the stored classification is returned; otherwise, the system refuses to classify the new instance.

36 The Futility of Bias-Free Learning
Fundamental property of inductive learning: a learner that makes no a priori assumptions regarding the identity of the target concept has no rational basis for classifying any unseen instances.
In the EnjoySport task, there is an implicit assumption that the target concept can be represented by a conjunction of attribute values.
We constantly have recourse to inductive biases. Example: we all know that the sun will rise tomorrow. Although we cannot deduce that it will do so based on the fact that it rose today, yesterday, the day before, etc., we take this leap of faith, or use this inductive bias, naturally!

37 The Futility of Bias-Free Learning
Notation: A |- B indicates that B follows deductively from A; A |> B indicates that B follows inductively from A.
The inductive inference step: (D_c ∧ x_i) |> L(x_i, D_c).

38 Inductive Bias
Consider: a concept-learning algorithm L; instances X and target concept c; training examples D_c = {<x, c(x)>}. Let L(x_i, D_c) denote the classification assigned to the instance x_i by L after training on data D_c.
Definition: the inductive bias of L is any minimal set of additional assumptions B sufficient to justify its inductive inferences as deductive inferences, i.e., such that for any target concept c and corresponding training examples D_c: for all x_i in X, (B ∧ D_c ∧ x_i) |- L(x_i, D_c).
Inductive bias of the Candidate-Elimination algorithm: the target concept c is contained in the given hypothesis space H.

39 Inductive Systems and Equivalent Deductive Systems (figure: an inductive learner with its implicit bias behaves like a deductive theorem prover that receives the bias B as an explicit input assertion)

40 Ranking Inductive Learners According to Their Biases
- Rote-learner: simply memorizes the training data and their classifications; no generalization is involved (weakest bias: none).
- Candidate-Elimination: new instances are classified only if all the hypotheses in the version space agree on the classification.
- Find-S: new instances are classified using the most specific hypothesis consistent with the training data (strongest bias of the three).

41 Summary Points
- Concept learning as search through H
- General-to-specific ordering over H
- Version space candidate elimination algorithm
- S and G boundaries characterize the learner's uncertainty
- The learner can generate useful queries
- Inductive leaps are possible only if the learner is biased
- Inductive learners can be modeled by equivalent deductive systems

