
2 Machine Learning — EIE426-AICV (filename: eie426-machine-learning-0809.ppt)

3 Contents
- Machine learning concepts and procedures
- Learning by recording cases
- Learning by building identification trees
- Simplification of decision rules

4 Machine Learning
Two kinds of learning:
- Learning based on coupling new information to previously acquired knowledge. Usually, a great deal of reasoning is involved.
(1) Learning by analyzing differences
(2) Learning by managing multiple models
(3) Learning by explaining experience
(4) Learning by correcting mistakes
- Learning based on digging useful regularity out of data.
(1) Learning by recording cases
(2) Learning by building identification trees
(3) Learning by training neural nets
(4) Learning by simulating evolution

5 Learning by Recording Cases
The consistency heuristic: whenever you want to guess a property of something, given nothing else to go on but a set of reference cases, find the most similar case, as measured by known properties, for which the target property is known, and guess that the unknown property is the same as that case's known property.
This technique is good for problem domains in which good models are impossible to build. Learning does nothing with the information in the recorded cases until that information is actually used.

6 Learning by Recording Cases (cont.)

7 Learning by Recording Cases (cont.)

8 Finding Nearest Neighbors
The straightforward way: calculate the distance to each other object and find the minimum among those distances. For n other objects, there are n distances to compute and (n-1) distance comparisons to do.
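
A minimal Python sketch of this brute-force search (the block colours, coordinates, and dictionary layout are illustrative, not taken from the slides; math.dist needs Python 3.8+):

```python
import math

def nearest_neighbor(query, cases):
    """Brute-force search: n distance computations, n - 1 comparisons."""
    best_case, best_dist = None, float("inf")
    for case in cases:
        # Distance measured over the known properties (here, 2-D coordinates).
        dist = math.dist(query, case["point"])
        if dist < best_dist:
            best_case, best_dist = case, dist
    return best_case

# Guess the colour of an unknown block from its (width, height).
cases = [
    {"point": (1.0, 5.0), "color": "red"},
    {"point": (3.0, 4.5), "color": "orange"},
    {"point": (5.5, 1.0), "color": "green"},
]
print(nearest_neighbor((5.0, 1.5), cases)["color"])  # -> green
```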

9 Decision Trees
A decision tree is a representation that is a semantic tree in which
- Each leaf node is connected to a set of possible answers.
- Each non-leaf node is connected to a test that splits its set of possible answers into subsets corresponding to different test results.
- Each branch carries a particular test result's subset to another node.

10 Decision Trees (cont.)

11 K-D Tree
A k-d tree is a representation that is a decision tree in which
- The set of possible answers consists of points, one of which may be the nearest neighbor to a given point.
- Each test specifies a coordinate, a threshold, and a neutral zone around the threshold containing no points.
- Each test divides a set of points into two sets, according to which side of the threshold each point lies on.

12 K-D Tree (cont.)
[Figure: eight colored blocks (Red, Orange, Yellow, Purple, Red, Violet, Blue, Green) plotted by Width and Height on axes running from 0 to 6.]

13 K-D Tree (cont.)
[Figure: the resulting k-d decision tree, with tests such as Height > 3.5, Width > 3.5, Width > 3.0, Height > 1.5, and Height > 5.5 leading to the leaf colors Orange, Yellow, Purple, Red, Violet, Blue, and Green.]

14 K-D Tree (cont.)
To divide the cases into sets,
- If there is only one case, stop.
- If this is the first division of cases, pick the vertical axis for comparison; otherwise, pick the axis that is different from the axis at the next higher level.
- Considering only the axis of comparison, find the average position of the two middle objects. Call this average position the threshold, and construct a decision-tree test that compares unknowns in the axis of comparison against the threshold. Also note the positions of the two middle objects in the axis of comparison. Call these positions the upper and lower boundaries.
- Divide up all the objects into two subsets, according to which side of the average position they lie on.
- Divide up the objects in each subset, forming a subtree for each, using this procedure.
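
A hedged sketch of this division procedure for 2-D cases, assuming the same {"point": (width, height), ...} records as the earlier nearest-neighbor sketch (the node layout and names are mine, not from the slides):

```python
def build_kd_tree(cases, axis=1):
    """Divide cases recursively; axis 1 (height, the vertical axis) first, then alternate."""
    if len(cases) == 1:
        return {"leaf": cases[0]}                     # only one case: stop
    ordered = sorted(cases, key=lambda c: c["point"][axis])
    mid = len(ordered) // 2
    lower, upper = ordered[mid - 1], ordered[mid]     # the two middle objects
    return {
        "axis": axis,
        "threshold": (lower["point"][axis] + upper["point"][axis]) / 2.0,
        "lower_boundary": lower["point"][axis],       # neutral zone around the threshold
        "upper_boundary": upper["point"][axis],
        "below": build_kd_tree(ordered[:mid], 1 - axis),
        "above": build_kd_tree(ordered[mid:], 1 - axis),
    }
```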

15 K-D Tree (cont.)
To find the nearest neighbor using the K-D procedure,
- Determine whether there is only one element in the set under consideration.
- If there is only one, report it.
- Otherwise, compare the unknown, in the axis of comparison, against the current node's threshold. The result determines the likely set.
- Find the nearest neighbor in the likely set using this procedure.
- Determine whether the distance to the nearest neighbor in the likely set is less than or equal to the distance to the other set's boundary in the axis of comparison:
  - If it is, then report the nearest neighbor in the likely set.
  - If it is not, check the unlikely set using this procedure; return the nearer of the nearest neighbors in the likely set and in the unlikely set.
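
A matching sketch of the search over the nodes produced by build_kd_tree above (again an illustration of the procedure, not code from the slides):

```python
import math

def kd_nearest(node, query):
    """Return (case, distance) for the nearest neighbor of `query`."""
    if "leaf" in node:
        case = node["leaf"]
        return case, math.dist(query, case["point"])
    axis = node["axis"]
    if query[axis] <= node["threshold"]:
        likely, unlikely, boundary = node["below"], node["above"], node["upper_boundary"]
    else:
        likely, unlikely, boundary = node["above"], node["below"], node["lower_boundary"]
    best_case, best_dist = kd_nearest(likely, query)
    # Check the unlikely set only if the other set's boundary (in the axis of
    # comparison) is closer than the best distance found in the likely set.
    if abs(query[axis] - boundary) < best_dist:
        other_case, other_dist = kd_nearest(unlikely, query)
        if other_dist < best_dist:
            best_case, best_dist = other_case, other_dist
    return best_case, best_dist
```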

16 Learning by Building Identification Trees
Identification-tree building is the most widely used learning method. Thousands of practical identification trees, for applications ranging from medical diagnosis to process control, have been built using the method.

17 From Data to Identification Trees

Name  | Hair   | Height  | Weight  | Lotion | Result
------|--------|---------|---------|--------|----------
Sarah | blonde | average | light   | no     | sunburned
Dana  | blonde | tall    | average | yes    | none
Alex  | brown  | short   | average | yes    | none
Annie | blonde | short   | average | no     | sunburned
Emily | red    | average | heavy   | no     | sunburned
Pete  | brown  | tall    | heavy   | no     | none
John  | brown  | average | heavy   | no     | none
Katie | blonde | short   | light   | yes    | none

18 From Data to Identification Trees (cont.)
An identification tree is a representation that is a decision tree in which
- Each set of possible conclusions is established implicitly by a list of samples of known class.
In the table, there are 3 x 3 x 3 x 2 = 54 possible attribute combinations, so the probability of an exact match with someone already observed is only 8/54. It can therefore be impractical to classify an unknown object by looking for an exact match.

19 Identification Tree
[Figure: two candidate identification trees built from the data — one rooted at a Height test with further Weight and Hair-color tests, the other rooted at a Hair test with a further Weight test — each sorting the eight people into leaves.]

20 Identification Tree (cont.)
Which is the right identification tree? How can you construct the smallest identification tree?
- The world is inherently simple. Therefore the smallest identification tree that is consistent with the samples is the one that is most likely to identify unknown objects correctly.

21 Tests Should Minimize Disorder

22 Tests Should Minimize Disorder (cont.)
[Figure: the samples split by candidate first tests; for example, the Hair Color = Blonde branch receives 4 samples: Sarah, Dana, Annie, and Katie.]

23 Information Theory Supplies a Disorder Formula
The average disorder of a test is

    Average disorder = Σ_b (n_b / n_t) × ( −Σ_c (n_bc / n_b) log2 (n_bc / n_b) )

where n_b is the number of samples in branch b, n_t is the total number of samples in all branches, and n_bc is the number of samples in branch b of class c.

24 Disorder Formula
For two classes, A and B, the disorder of a single branch is −(n_A/n_b) log2 (n_A/n_b) − (n_B/n_b) log2 (n_B/n_b).
If they are perfectly balanced, that is, n_bc / n_b = 0.5 for both classes, then the disorder is −0.5 log2 0.5 − 0.5 log2 0.5 = 1.
If there are only A's or only B's (perfect homogeneity), then the disorder is 0 (taking 0 log2 0 = 0).

25 Disorder Measure
As a branch moves from perfect homogeneity to perfect balance, its disorder varies smoothly between zero and one.

26 Disorder Measure (cont.)
The first test:

Test   | Disorder
-------|---------
Hair   | 0.5
Height | 0.69
Weight | 0.94
Lotion | 0.61

Thus, the hair-color test is the winner.
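
A small Python check of these numbers, using the data table from slide 17 (the tuple encoding of the samples is my own; only the attribute values and classes come from the slides):

```python
from math import log2

SAMPLES = [
    # (hair, height, weight, lotion, result)
    ("blonde", "average", "light",   "no",  "sunburned"),
    ("blonde", "tall",    "average", "yes", "none"),
    ("brown",  "short",   "average", "yes", "none"),
    ("blonde", "short",   "average", "no",  "sunburned"),
    ("red",    "average", "heavy",   "no",  "sunburned"),
    ("brown",  "tall",    "heavy",   "no",  "none"),
    ("brown",  "average", "heavy",   "no",  "none"),
    ("blonde", "short",   "light",   "yes", "none"),
]
ATTRS = {"hair": 0, "height": 1, "weight": 2, "lotion": 3}

def average_disorder(samples, attr):
    """Weighted sum over branches of -sum_c (n_bc/n_b) log2 (n_bc/n_b)."""
    total = len(samples)
    branches = {}
    for s in samples:
        branches.setdefault(s[ATTRS[attr]], []).append(s[-1])
    disorder = 0.0
    for classes in branches.values():
        n_b = len(classes)
        branch = -sum((classes.count(c) / n_b) * log2(classes.count(c) / n_b)
                      for c in set(classes))
        disorder += (n_b / total) * branch
    return disorder

for attr in ATTRS:
    print(attr, round(average_disorder(SAMPLES, attr), 2))
# hair 0.5, height 0.69, weight 0.94, lotion 0.61

# Among the blondes only (slide 27): height 0.5, weight 1.0, lotion 0.0
blondes = [s for s in SAMPLES if s[0] == "blonde"]
for attr in ("height", "weight", "lotion"):
    print(attr, round(average_disorder(blondes, attr), 2))
```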

27 Disorder Measure (cont.)
Once the hair test is selected, the choice of another test to separate out the sunburned people from among Sarah, Dana, Annie, and Katie is decided by the following calculations:

Test   | Disorder
-------|---------
Height | 0.5
Weight | 1
Lotion | 0

Thus, the lotion-used test is the clear winner.

28 Identification Tree Algorithm
To generate an identification tree using SPROUTER,
- Until each leaf node is populated by as homogeneous a sample set as possible:
  - Select a leaf node with an inhomogeneous sample set.
  - Replace that leaf node by a test node that divides the inhomogeneous sample set into minimally inhomogeneous subsets, according to some measure of disorder.
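
A hedged sketch of this SPROUTER-style loop written recursively, reusing SAMPLES, ATTRS, and average_disorder from the sketch after slide 26 (the recursive formulation and names are mine, not from the slides; ties and impure leaves are not handled):

```python
def sprouter(samples, attrs):
    """Grow an identification tree: split until each leaf is as homogeneous as possible."""
    classes = {s[-1] for s in samples}
    if len(classes) <= 1 or not attrs:
        return classes.pop() if classes else None        # homogeneous leaf
    # Pick the test that leaves the least average disorder.
    best = min(attrs, key=lambda a: average_disorder(samples, a))
    node = {"test": best, "branches": {}}
    remaining = [a for a in attrs if a != best]
    for value in {s[ATTRS[best]] for s in samples}:
        subset = [s for s in samples if s[ATTRS[best]] == value]
        node["branches"][value] = sprouter(subset, remaining)
    return node

tree = sprouter(SAMPLES, list(ATTRS))
# -> tests hair first; the blonde branch then tests lotion; red -> sunburned; brown -> none
```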

29 From Trees to Rules
If the person's hair color is blonde and the person uses lotion, then nothing happens.
If the person's hair color is blonde and the person uses no lotion, then the person turns red.
If the person's hair color is red, then the person turns red.
If the person's hair color is brown, then nothing happens.

30 Unnecessary Rule Antecedents Should Be Eliminated
If the person's hair color is blonde and the person uses lotion, then nothing happens.
can be simplified to
If the person uses lotion, then nothing happens.

31 Contingency Table
Checking the first antecedent (is blonde): keep the second antecedent, so consider only the samples who use lotion (Dana, Alex, Katie).

                     | No change | Sunburned
Person is blonde     | 2         | 0
Person is not blonde | 1         | 0

The first antecedent can be eliminated.

Checking the second antecedent (uses lotion): keep the first antecedent, so consider only the blonde samples (Sarah, Dana, Annie, Katie).

                      | No change | Sunburned
Person uses lotion    | 2         | 0
Person uses no lotion | 0         | 2

The second antecedent cannot be eliminated.
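
A small sketch of how such a contingency table can be computed, reusing SAMPLES from the sketch after slide 26 (the predicate-based encoding of antecedents is my own):

```python
def contingency(samples, keep, check):
    """2x2 table for the antecedent `check`, among samples satisfying `keep`.

    Rows: check true / check false; columns: no change ('none') / sunburned.
    """
    table = {(row, col): 0 for row in (True, False) for col in ("none", "sunburned")}
    for s in samples:
        if keep(s):
            table[(check(s), s[-1])] += 1
    return table

# Rule: "blonde and uses lotion -> nothing happens".
# Check the 'blonde' antecedent while keeping the 'uses lotion' antecedent:
is_blonde = lambda s: s[0] == "blonde"
uses_lotion = lambda s: s[3] == "yes"
print(contingency(SAMPLES, keep=uses_lotion, check=is_blonde))
# {(True, 'none'): 2, (True, 'sunburned'): 0, (False, 'none'): 1, (False, 'sunburned'): 0}
# The 'sunburned' column is empty either way, so the 'blonde' antecedent can be dropped.
```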

32 Contingency Table (cont.)
If the person's hair color is blonde and the person does not use lotion, then the person turns red.

Checking the first antecedent (is blonde): keep the second antecedent, so consider only the samples who use no lotion (Sarah, Annie, Emily, Pete, John).

                     | No change | Sunburned
Person is blonde     | 0         | 2
Person is not blonde | 2         | 1

The first antecedent cannot be eliminated.

Checking the second antecedent (uses no lotion): keep the first antecedent, so consider only the blonde samples (Sarah, Dana, Annie, Katie).

                      | No change | Sunburned
Person uses no lotion | 0         | 2
Person uses lotion    | 2         | 0

The second antecedent cannot be eliminated either.

33 Contingency Table (cont.)
If the person's hair color is red, then the person turns red.
There is no other antecedent, so all 8 samples are considered.

                         | No change | Sunburned
Person is red haired     | 0         | 1
Person is not red haired | 5         | 2

The antecedent cannot be eliminated.

If the person's hair color is brown, then nothing happens.
There is no other antecedent, so all 8 samples are considered.

                           | No change | Sunburned
Person is brown haired     | 3         | 0
Person is not brown haired | 2         | 3

The antecedent cannot be eliminated.

34 Unnecessary Rules Should Be Eliminated
If the person's hair color is blonde and the person uses no lotion, then the person turns red. ----- Rule 1
If the person uses lotion, then nothing happens. ----- Rule 2
If the person's hair color is red, then the person turns red. ----- Rule 3
If the person's hair color is brown, then nothing happens. ----- Rule 4

35 Default Rules and Tie Breaker
Default rule:
If no other rule applies, then the person turns red. ----- Rule 5
or
If no other rule applies, then nothing happens. ----- Rule 6
Choose the default rule to minimize the total number of rules.
Tie breaker 1: Choose the default rule that covers the most common consequent in the sample set. Rule 6 is used together with Rules 1 and 3.
Tie breaker 2: Choose the default rule that produces the simplest rules. Rule 5 is used together with Rules 2 and 4.

36 Rule Generation Algorithm
To generate rules from an identification tree using PRUNER,
- Create one rule for each root-to-leaf path in the identification tree.
- Simplify each rule by discarding antecedents that have no effect on the conclusion reached by the rule.
- Replace those rules that share the most common consequent by a default rule that is triggered when no other rule is triggered (eliminating as many other rules as possible). In the event of a tie, use some heuristic tie breaker to choose a default rule.
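
A sketch of the first PRUNER step only (one rule per root-to-leaf path), reusing the tree built by the sprouter sketch after slide 28; the rule representation is my own, and the antecedent-elimination and default-rule steps are not shown:

```python
def tree_to_rules(node, antecedents=()):
    """One rule per root-to-leaf path: ([(attribute, value), ...], consequent)."""
    if not isinstance(node, dict):                 # leaf: node is the consequent class
        return [(list(antecedents), node)]
    rules = []
    for value, child in node["branches"].items():
        rules += tree_to_rules(child, antecedents + ((node["test"], value),))
    return rules

for conditions, result in tree_to_rules(tree):
    print("IF " + " AND ".join(f"{a} = {v}" for a, v in conditions) + f" THEN {result}")
# e.g. IF hair = blonde AND lotion = no THEN sunburned
```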

