1 Machine Learning Version Spaces Learning

2 Introduction: very many approaches to machine learning:
- Neural Net approaches
- Symbolic approaches:
  - version spaces
  - decision trees
  - knowledge discovery
  - data mining
  - speed-up learning
  - inductive learning
  - ...

3 Version Spaces A concept learning technique based on refining models of the world.

4 Concept Learning
- Example: a student has the following observations about having an allergic reaction after meals:

  Restaurant | Meal      | Day      | Cost      | Reaction
  Alma 3     | breakfast | Friday   | cheap     | Yes (+)
  De Moete   | lunch     | Friday   | expensive | No  (-)
  Alma 3     | lunch     | Saturday | cheap     | Yes (+)
  Sedes      | breakfast | Sunday   | cheap     | No  (-)
  Alma 3     | breakfast | Sunday   | expensive | No  (-)

- Concept to learn: under which circumstances do I get an allergic reaction after meals?
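
For the sketches further down it helps to have this table in machine-readable form. Below is a minimal Python encoding of it; the variable name EXAMPLES and the tuple layout are our own choices, not part of the lecture.

```python
# Each training example is an event (restaurant, meal, day, cost) plus a boolean
# label: True = allergic reaction (+), False = no reaction (-), as in the table above.
EXAMPLES = [
    (("Alma 3",   "breakfast", "Friday",   "cheap"),     True),
    (("De Moete", "lunch",     "Friday",   "expensive"), False),
    (("Alma 3",   "lunch",     "Saturday", "cheap"),     True),
    (("Sedes",    "breakfast", "Sunday",   "cheap"),     False),
    (("Alma 3",   "breakfast", "Sunday",   "expensive"), False),
]
```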

5 In general
- There is a set of all possible events.
  Example: Restaurant x Meal x Day x Cost: 3 x 3 x 7 x 2 = 126 possible events.
- There is a boolean function (implicitly) defined on this set.
  Example: Reaction: Restaurant x Meal x Day x Cost --> Bool
- We have the value of this function for SOME examples only.
- Find an inductive 'guess' of the concept that covers all the examples!

6 Pictured:
  (figure: the set of all possible events, a few of them marked + or -)
- Given some examples (+ and -): find a concept that covers all the positive examples and none of the negative ones!

7 Non-determinism
  (figure: several different concepts, each covering all + examples and no - examples)
- Many different ways to solve this!
- How to choose?

8 An obvious, but bad choice:
- The concept IS:
  - Alma 3 and breakfast and Friday and cheap, OR
  - Alma 3 and lunch and Saturday and cheap
- Does NOT generalize the examples at all!

9 Pictured:
  (figure: only the two positive example points themselves are covered)
- Only the positive examples are positive!

10 Equally bad is:
- The concept is anything EXCEPT:
  - De Moete and lunch and Friday and expensive, AND
  - Sedes and breakfast and Sunday and cheap, AND
  - Alma 3 and breakfast and Sunday and expensive

11 Pictured:
  (figure: everything except the three negative example points is covered)
- Everything except the negative examples is positive.

12 Solution: fix a language of hypotheses
- We introduce a fixed language of concept descriptions = the hypothesis space.
- The concept can only be identified as being one of the hypotheses in this language:
  - avoids the problem of having 'useless' conclusions
  - forces some generalization/induction to cover more than just the given examples

13 Example:
- Every hypothesis is a 4-tuple over Restaurant x Meal x Day x Cost.
- most general hypothesis: [?, ?, ?, ?]
- maximally specific: ex.: [Sedes, lunch, Monday, cheap]
- combinations of ? and values are allowed: ex.: [De Moete, ?, ?, expensive] or [?, lunch, ?, ?]
- One more hypothesis: ⊥ (bottom = denotes no example)

14 Hypotheses relate to sets of possible events
  (figure: events x1, x2 and the sets of events covered by three hypotheses, ordered from specific to general)
- h1 = [?, lunch, Monday, ?]
- h2 = [?, lunch, ?, cheap]
- h3 = [?, lunch, ?, ?]
- h3 is more general than h1 and h2: it covers a larger set of events.

15 Expressive power of this hypothesis language:
- Conjunctions of explicit, individual properties
- Examples:
  - [?, lunch, Monday, ?] : Meal = lunch ∧ Day = Monday
  - [?, lunch, ?, cheap] : Meal = lunch ∧ Cost = cheap
  - [?, lunch, ?, ?] : Meal = lunch
- In addition to the 2 special hypotheses:
  - [?, ?, ?, ?] : anything
  - ⊥ : nothing
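
In Python, a hypothesis in this language can be sketched as a 4-tuple whose entries are either a concrete value or the wildcard '?', with None standing in for ⊥. The covers function below is an assumed encoding for illustration, not code from the lecture.

```python
# A hypothesis is either None (bottom: covers nothing) or a 4-tuple whose entries
# are concrete attribute values or '?' (matches any value at that position).
def covers(h, event):
    """True if hypothesis h covers the event (a 4-tuple of attribute values)."""
    if h is None:                      # bottom covers no event
        return False
    return all(hv == "?" or hv == ev for hv, ev in zip(h, event))

# [?, lunch, ?, cheap] covers the event <Alma 3, lunch, Saturday, cheap>
assert covers(("?", "lunch", "?", "cheap"), ("Alma 3", "lunch", "Saturday", "cheap"))
```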

16 Other languages of hypotheses are allowed
- Example: identify the color of a given sequence of colored objects.
- The examples: red : +, purple : -, blue : +
- A useful language of hypotheses is a generalization hierarchy:
    any_color
      mono_color: red, blue, green, orange, purple
      poly_color
    ⊥

17 Important about hypothesis languages:
- They should have a specific-to-general ordering, corresponding to the set inclusion of the events they cover.
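
For the conjunctive 4-tuple language this ordering can be checked syntactically. The helper below is a sketch under that assumption and reuses the covers/None encoding introduced above.

```python
def more_general_or_equal(h1, h2):
    """True if h1 covers at least the events that h2 covers (h1 >= h2 in generality)."""
    if h2 is None:        # bottom is below every hypothesis
        return True
    if h1 is None:        # bottom is only above bottom
        return False
    # h1 is at least as general iff, attribute by attribute, it is '?' or agrees with h2
    return all(a == "?" or a == b for a, b in zip(h1, h2))

# [?, ?, ?, cheap] is more general than [Alma 3, ?, ?, cheap]
assert more_general_or_equal(("?", "?", "?", "cheap"), ("Alma 3", "?", "?", "cheap"))
```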

18 Defining Concept Learning:
- Given:
  - A set X of possible events.
    Ex.: Eat-events: <Restaurant, Meal, Day, Cost>
  - An (unknown) target function c: X -> {-, +}.
    Ex.: Reaction: Eat-events -> {-, +}
  - A language of hypotheses H.
    Ex.: conjunctions: [?, lunch, Monday, ?]
  - A set of training examples D, with their value under c.
    Ex.: (<Alma 3, breakfast, Friday, cheap>, +), ...
- Find:
  - A hypothesis h in H such that for all x in D: x is covered by h ⟺ c(x) = +

19 19 The inductive learning hypothesis: If a hypothesis approximates the target function well over a sufficiently large number of examples, then the hypothesis will also approximate the target function well on other unobserved examples.

20 Find-S: a naïve algorithm

Initialize: h := ⊥
For each positive training example x in D do:
    If h does not cover x:
        Replace h by a minimal generalization of h that covers x
Return h
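
A sketch of Find-S for the conjunctive language, assuming the EXAMPLES and covers encodings above. In this language the minimal generalization of ⊥ is the positive event itself; otherwise, every attribute on which h and the event disagree is replaced by '?'.

```python
def minimal_generalization(h, event):
    """Smallest change to h (in the conjunctive language) so that it covers event."""
    if h is None:                       # bottom: its only minimal generalization
        return tuple(event)             # is the event itself
    return tuple(hv if hv == ev else "?" for hv, ev in zip(h, event))

def find_s(examples):
    h = None                            # h := bottom
    for event, label in examples:
        if label and not covers(h, event):
            h = minimal_generalization(h, event)
    return h

print(find_s(EXAMPLES))                 # -> ('Alma 3', '?', '?', 'cheap')
```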

21 Reaction example:
- Initially: h = ⊥
- Positive example 1, (Alma 3, breakfast, Friday, cheap):
  h = [Alma 3, breakfast, Friday, cheap]
  (the minimal generalizations of ⊥ are the individual events)
- Positive example 2, (Alma 3, lunch, Saturday, cheap):
  h = [Alma 3, ?, ?, cheap]
  (generalization = replace differing values by ?)
- No more positive examples: return h

22 Properties of Find-S
- Non-deterministic: depending on H, there may be several minimal generalizations.
- Example: a hypothesis space H with the concepts Hacker, Scientist and Footballplayer over the individuals Beth, Jo and Alex. Beth can be minimally generalized in 2 ways to include a new example Jo.

23 Properties of Find-S (2)
- May pick an incorrect hypothesis (w.r.t. the negative examples).
- Example (same H): D: Beth (+), Alex (-), Jo (+). A minimal generalization that covers Beth and Jo but also the negative example Alex is a wrong solution.

24 Properties of Find-S (3)
- Cannot detect inconsistency of the training data:
  D: Beth (+), Jo (+), Beth (-)
- Nor the inability of the language H to learn the concept:
  D: Beth (+), Alex (-), Jo (+), with an H (e.g. containing only Scientist) in which no hypothesis separates these examples.

25 Nice about Find-S:
- It doesn't have to remember previous examples!
- If the previous h already covered all previous examples, then a minimal generalization h' will too!
  (e.g., if h already covered the first 20 examples, then h' will as well)

26 Dual Find-S:

Initialize: h := [?, ?, ..., ?]
For each negative training example x in D do:
    If h does cover x:
        Replace h by a minimal specialization of h that does not cover x
Return h
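
A sketch of Dual Find-S under the same encoding. The DOMAINS table and the tie-breaking rule (prefer a specialization that still covers the known positive examples) are our own assumptions; the slides leave the choice among minimal specializations open.

```python
DOMAINS = [
    ("Alma 3", "De Moete", "Sedes"),                              # Restaurant
    ("breakfast", "lunch", "dinner"),                             # Meal
    ("Monday", "Tuesday", "Wednesday", "Thursday",
     "Friday", "Saturday", "Sunday"),                             # Day
    ("cheap", "expensive"),                                       # Cost
]

def minimal_specializations(h, event):
    """All minimal ways to change h so that it no longer covers the negative event."""
    result = []
    for i, values in enumerate(DOMAINS):
        if h[i] == "?":                       # only '?' positions can be tightened
            for v in values:
                if v != event[i]:
                    result.append(h[:i] + (v,) + h[i + 1:])
    return result

def dual_find_s(examples):
    h = ("?", "?", "?", "?")
    positives = [e for e, label in examples if label]
    for event, label in examples:
        if not label and covers(h, event):
            candidates = minimal_specializations(h, event)
            # assumed tie-break: prefer a specialization still covering all positives
            good = [c for c in candidates if all(covers(c, p) for p in positives)]
            h = (good or candidates)[0]
    return h

print(dual_find_s(EXAMPLES))   # -> ('Alma 3', '?', '?', 'cheap') with this tie-break;
                               #    the slides make a different, equally minimal choice
```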

27 Reaction example:
- Initially: h = [?, ?, ?, ?]
- Negative example 1, (De Moete, lunch, Friday, expensive): h = [?, breakfast, ?, ?]
- Negative example 2, (Sedes, breakfast, Sunday, cheap): h = [Alma 3, breakfast, ?, ?]
- Negative example 3, (Alma 3, breakfast, Sunday, expensive): h = [Alma 3, breakfast, ?, cheap]

28 Version Spaces: the idea
- Perform both Find-S and Dual Find-S:
  - Find-S deals with the positive examples
  - Dual Find-S deals with the negative examples
- BUT: do NOT select 1 minimal generalization or specialization at each step; keep track of ALL minimal generalizations or specializations.

29 Version spaces: initialization
- The boundary sets G and S are initialized to contain only the largest and the smallest hypotheses respectively:
  G = { [?, ?, ..., ?] }
  S = { ⊥ }

30 Negative examples:
- Replace the top hypothesis by ALL minimal specializations that do NOT cover the negative example:
  G = { [?, ?, ..., ?] }  becomes  G = { h1, h2, ..., hn }
  S = { ⊥ }
- Invariant: only the hypotheses more specific than the ones of G are still possible: they don't cover the negative example.

31 Positive examples:
- Replace the bottom hypothesis by ALL minimal generalizations that DO cover the positive example:
  G = { h1, h2, ..., hn }
  S = { ⊥ }  becomes  S = { h1', h2', ..., hm' }
- Invariant: only the hypotheses more general than the ones of S are still possible: they do cover the positive example.

32 Later: negative examples
- Replace all hypotheses in G that cover the next negative example by ALL minimal specializations that do NOT cover it:
  G = { h1, h2, ..., hn }  becomes  G = { h1, h21, h22, ..., hn }
  S = { h1', h2', ..., hm' }
- Invariant: only the hypotheses more specific than the ones of G are still possible: they don't cover the negative example.

33 Later: positive examples
- Replace all hypotheses in S that do not cover the next positive example by ALL minimal generalizations that DO cover it:
  G = { h1, h21, h22, ..., hn }
  S = { h1', h2', ..., hm' }  becomes  S = { h11', h12', h13', h2', ..., hm' }
- Invariant: only the hypotheses more general than the ones of S are still possible: they do cover the positive example.

34 Optimization: negative examples
- Only consider specializations of elements in G that are still more general than some specific hypothesis (in S).
- (here the invariant on S is used: every new G-element must be more general than some element of S)

35 Optimization: positive examples
- Only consider generalizations of elements in S that are still more specific than some general hypothesis (in G).
- (here the invariant on G is used: every new S-element must be more specific than some element of G)

36 Pruning: negative examples
- The new negative example can also be used to prune all the S-hypotheses that cover the negative example.
- (the invariant only holds for the previous examples, not the last one: S-elements that cover the last negative example must be removed)

37 Pruning: positive examples
- The new positive example can also be used to prune all the G-hypotheses that do not cover the positive example.

38 Eliminate redundant hypotheses
- If a hypothesis in G is more specific than another hypothesis in G: eliminate it!
- Reason: the invariant acts as a wave front: anything above G is not allowed. The most general elements of G define the real boundary.
- The dual obviously holds for S.

39 Convergence:
- Eventually G and S may come to share a common element: then Version Spaces has converged to a solution.
- The remaining examples still need to be checked against this solution.

40 Reaction example
- Initialization:
  Most general:  G = { [?, ?, ?, ?] }
  Most specific: S = { ⊥ }

41 Alma 3, breakfast, Friday, cheap: +
- Positive example: minimal generalization of ⊥
  G = { [?, ?, ?, ?] }
  S = { [Alma 3, breakfast, Friday, cheap] }

42 De Moete, lunch, Friday, expensive: -
- Negative example: minimal specialization of [?, ?, ?, ?]: 15 possible specializations!
- 4 are excluded because they match the negative example:
  [De Moete, ?, ?, ?], [?, lunch, ?, ?], [?, ?, Friday, ?], [?, ?, ?, expensive]
- 8 are excluded because they do not generalize the specific model [Alma 3, breakfast, Friday, cheap]:
  [Sedes, ?, ?, ?], [?, dinner, ?, ?], [?, ?, Monday, ?], [?, ?, Tuesday, ?], [?, ?, Wednesday, ?], [?, ?, Thursday, ?], [?, ?, Saturday, ?], [?, ?, Sunday, ?]
- 3 remain: [Alma 3, ?, ?, ?], [?, breakfast, ?, ?], [?, ?, ?, cheap]

43 Result after example 2:
  G = { [Alma 3, ?, ?, ?], [?, breakfast, ?, ?], [?, ?, ?, cheap] }
  S = { [Alma 3, breakfast, Friday, cheap] }

44 Alma 3, lunch, Saturday, cheap: +
- Positive example: minimal generalization of [Alma 3, breakfast, Friday, cheap]:
  S = { [Alma 3, ?, ?, cheap] }
- [?, breakfast, ?, ?] is pruned from G: it does not match the new example.
  G = { [Alma 3, ?, ?, ?], [?, ?, ?, cheap] }

45 Sedes, breakfast, Sunday, cheap: -
- Negative example: minimal specialization of the general models [Alma 3, ?, ?, ?] and [?, ?, ?, cheap].
- [Alma 3, ?, ?, ?] does not cover the negative example and stays.
- [?, ?, ?, cheap] does cover it; the only specialization that is introduced, [Alma 3, ?, ?, cheap], is pruned because it is more specific than another general hypothesis.
  G = { [Alma 3, ?, ?, ?] }
  S = { [Alma 3, ?, ?, cheap] }

46 Alma 3, breakfast, Sunday, expensive: -
- Negative example: minimal specialization of [Alma 3, ?, ?, ?]:
  the only surviving specialization is [Alma 3, ?, ?, cheap]: the same hypothesis as in S!
- G and S have converged on [Alma 3, ?, ?, cheap]:
  Cheap food at Alma 3 produces the allergy!

47 Version Space Algorithm:

Initially:
  G := { the hypothesis that covers everything }
  S := { ⊥ }

For each new positive example:
  Generalize all hypotheses in S that do not cover the example yet, but ensure the following:
    - Only introduce minimal changes on the hypotheses.
    - Each new specific hypothesis is a specialization of some general hypothesis.
    - No new specific hypothesis is a generalization of some other specific hypothesis.
  Prune away all hypotheses in G that do not cover the example.

48 Version Space Algorithm (2):

For each new negative example:
  Specialize all hypotheses in G that cover the example, but ensure the following:
    - Only introduce minimal changes on the hypotheses.
    - Each new general hypothesis is a generalization of some specific hypothesis.
    - No new general hypothesis is a specialization of some other general hypothesis.
  Prune away all hypotheses in S that cover the example.

Until there are no more examples: report S and G,
or S or G becomes empty: report failure.
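
Putting the two halves together gives the candidate-elimination loop sketched below. It reuses EXAMPLES, covers, more_general_or_equal, minimal_generalization, minimal_specializations and DOMAINS from the earlier sketches and follows the rules on slides 29-38; it is our own rendering for illustration, not code from the lecture.

```python
def remove_redundant(hyps, keep_most_general):
    """Eliminate redundant boundary elements (slide 38)."""
    hyps = list(dict.fromkeys(hyps))                 # drop duplicates, keep order
    kept = []
    for h in hyps:
        if keep_most_general:   # for G: drop h if another element is more general
            dominated = any(o != h and more_general_or_equal(o, h) for o in hyps)
        else:                   # for S: drop h if another element is more specific
            dominated = any(o != h and more_general_or_equal(h, o) for o in hyps)
        if not dominated:
            kept.append(h)
    return kept

def version_space(examples):
    G = [("?", "?", "?", "?")]                       # most general boundary
    S = [None]                                       # most specific boundary (bottom)
    for event, label in examples:
        if label:                                    # positive example
            G = [g for g in G if covers(g, event)]   # prune G (slide 37)
            new_S = []
            for s in S:
                if covers(s, event):
                    new_S.append(s)
                else:
                    h = minimal_generalization(s, event)
                    if any(more_general_or_equal(g, h) for g in G):  # stay below G
                        new_S.append(h)
            S = remove_redundant(new_S, keep_most_general=False)
        else:                                        # negative example
            S = [s for s in S if not covers(s, event)]   # prune S (slide 36)
            new_G = []
            for g in G:
                if not covers(g, event):
                    new_G.append(g)
                else:
                    for h in minimal_specializations(g, event):
                        if any(more_general_or_equal(h, s) for s in S):  # stay above S
                            new_G.append(h)
            G = remove_redundant(new_G, keep_most_general=True)
        if not S or not G:
            raise ValueError("data inconsistent or concept not representable in H")
    return S, G

S, G = version_space(EXAMPLES)
print(S, G)   # both boundaries end up at [('Alma 3', '?', '?', 'cheap')]
```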

49 Properties of VS:
- Symmetry: positive and negative examples are dealt with in a completely dual way.
- Does not need to remember previous examples.
- Noise: VS cannot deal with noise!
  If a positive example is mistakenly given as negative, then VS eliminates the desired hypothesis from the version space!

50 Termination:
- If it terminates because of "no more examples", example (spaces on termination):
  G = { [Alma 3, ?, ?, ?], [?, ?, Monday, ?] }
  S = { [Alma 3, ?, Monday, cheap] }
  intermediate hypotheses: [Alma 3, ?, ?, cheap], [?, ?, Monday, cheap], [Alma 3, ?, Monday, ?]
- Then all these hypotheses, and all intermediate hypotheses, are still correct descriptions covering the given data.
- VS makes NO unnecessary choices!

51 Termination (2):
- If it terminates because S or G becomes empty, then either:
  - the data is inconsistent (noise?), or
  - the target concept cannot be represented in the hypothesis language H.
- Example:
  - target concept: [Alma 3, breakfast, ?, cheap] ∨ [Alma 3, lunch, ?, cheap]
  - given (+ and -) examples of this concept,
  - it cannot be learned in our language H.

52 Which example next?
- VS can decide itself which example would be most useful next.
- It can 'query' a user for the most relevant additional classification!
- Example (spaces from slide 50): an event that is classified positive by 3 of the remaining hypotheses and negative by the other 3 is the most informative new example.
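
One way to sketch this query selection in Python: enumerate candidate events and prefer one on which the remaining hypotheses disagree most evenly. Enumerating over DOMAINS and scoring by vote balance are our own assumptions; the hypotheses argument stands for the remaining version space (in practice, the hypotheses between S and G).

```python
from itertools import product

def most_informative_query(hypotheses):
    """Pick the event on which the remaining hypotheses disagree the most."""
    best_event, best_balance = None, -1
    for event in product(*DOMAINS):
        votes = sum(covers(h, event) for h in hypotheses)
        balance = min(votes, len(hypotheses) - votes)   # 0 = useless, max = even split
        if balance > best_balance:
            best_event, best_balance = event, balance
    return best_event
```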

53 Use of partially learned concepts:
- Example (spaces from slide 50): an event that is covered by all remaining hypotheses can be classified as positive.
- It is enough to check that it is covered by the hypotheses in S! (all others generalize these)

54 Use of partially learned concepts (2):
- Example: an event that is covered by no remaining hypothesis can be classified as negative.
- It is enough to check that it is not covered by any hypothesis in G! (all others specialize these)

55 Use of partially learned concepts (3):
- Example: an event that is covered by 3 and not covered by 3 of the remaining hypotheses cannot be classified.
- No conclusion.

56 Use of partially learned concepts (4):
- Example: an event that is covered by 1 and not covered by 5 of the remaining hypotheses can only be classified with a certain degree of precision:
  it probably does not belong to the concept (ratio: 1/6).
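
These four cases can be folded into one classification routine over the partially learned concept. The return convention ('+', '-', a coverage ratio, or None) is our own choice for this sketch.

```python
def classify_partial(event, S, G, remaining_hypotheses=None):
    """Classify an event with a partially learned concept (slides 53-56)."""
    if all(covers(s, event) for s in S):       # covered by every S-hypothesis
        return "+"                             # hence by all remaining hypotheses
    if not any(covers(g, event) for g in G):   # covered by no G-hypothesis
        return "-"                             # hence by no remaining hypothesis
    if remaining_hypotheses:                   # optionally: degree of confidence
        votes = sum(covers(h, event) for h in remaining_hypotheses)
        return votes / len(remaining_hypotheses)
    return None                                # no conclusion
```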

57 The relevance of inductive BIAS: choosing H
- Our hypothesis language H fails to learn some concepts.
  See the example: [Alma 3, breakfast, ?, cheap] ∨ [Alma 3, lunch, ?, cheap]
- What about choosing a more expressive language H'?
- Assume H':
  - allows conjunctions (as before)
  - allows disjunction and negation too!
  - Example: Restaurant = Alma 3 ∨ ~(Day = Monday)

58 Inductive BIAS (2):
- This language H' allows us to represent ANY subset of the complete set of all events X.
- But X has 3 x 3 x 7 x 2 = 126 elements, so we can now express 2^126 different hypotheses!
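
To make the contrast concrete, a quick back-of-the-envelope count (our own arithmetic, not from the slides): the conjunctive language contains only a few hundred hypotheses, while H' can express every one of the 2^126 subsets of X.

```python
n_events = 3 * 3 * 7 * 2                   # |X| = 126 possible events
n_hprime = 2 ** n_events                   # hypotheses expressible in H' (any subset of X)
n_conjunctive = 4 * 4 * 8 * 3 + 1          # (|values| + 1 wildcard) per attribute, plus bottom
print(n_events, n_conjunctive, n_hprime)   # 126, 385, about 8.5e37
```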

59 Inductive BIAS (3)
- Version Spaces using H' on the reaction data:
  S grows by disjoining the positive examples:
    { [Alma 3, breakfast, Friday, cheap] }
    { [Alma 3, breakfast, Friday, cheap] ∨ [Alma 3, lunch, Saturday, cheap] }
  G shrinks by conjoining negations of the negative examples:
    { ~[De Moete, lunch, Friday, expensive] }
    { ~[De Moete, lunch, Friday, expensive] ∧ ~[Sedes, breakfast, Sunday, cheap] }
    { ~[De Moete, lunch, Friday, expensive] ∧ ~[Sedes, breakfast, Sunday, cheap] ∧ ~[Alma 3, breakfast, Sunday, expensive] }

60 Inductive BIAS (4)
- Resulting version spaces:
  G = { ~[De Moete, lunch, Friday, expensive] ∧ ~[Sedes, breakfast, Sunday, cheap] ∧ ~[Alma 3, breakfast, Sunday, expensive] }
  S = { [Alma 3, breakfast, Friday, cheap] ∨ [Alma 3, lunch, Saturday, cheap] }
- We haven't learned anything: we have merely restated our positive and negative examples!
- In general: in order to be able to learn, we need an inductive BIAS (= an assumption).
  Example: "The desired concept CAN be described as a conjunction of features."

61 Shift of Bias
- Practical approach to the bias problem:
  - Start VS with a very weak hypothesis language.
  - If the concept is learned: o.k.
  - Else: refine your language and restart VS.
- Avoids the choice of bias up front.
- Gives the most general concept that can be learned.

