EIE426-AICV Machine Learning (Filename: eie426-machine-learning-0809.ppt)

Contents
- Machine learning concepts and procedures
- Learning by recording cases
- Learning by building identification trees
- Simplification of decision rules

Machine Learning
Two kinds of learning:
- Learning based on coupling new information to previously acquired knowledge; usually a great deal of reasoning is involved:
(1) Learning by analyzing differences
(2) Learning by managing multiple models
(3) Learning by explaining experience
(4) Learning by correcting mistakes
- Learning based on digging useful regularity out of data:
(1) Learning by recording cases
(2) Learning by building identification trees
(3) Learning by training neural nets
(4) Learning by simulating evolution

Learning by Recording Cases
The consistency heuristic: whenever you want to guess a property of something, given nothing else to go on but a set of reference cases, find the most similar case, as measured by known properties, for which the property is known, and guess that the unknown property is the same as in that case.
This technique is good for problem domains in which good models are impossible to build. The learner does nothing with the information in the recorded cases until that information is used.

Learning by Recording Cases (cont.) [two figure slides; content not transcribed]

Finding Nearest Neighbors
The straightforward way: calculate the distance to every other object and find the minimum among those distances. For n other objects, there are n distances to compute and (n-1) distance comparisons to do.
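As an illustration of this straightforward approach, here is a minimal Python sketch; the helper name, the case encoding, and the example points are my own, not from the slides:

```python
import math

def nearest_neighbor(query, cases):
    """Brute force: compute the distance from `query` to every stored case
    and return the case at the minimum distance (n distances, n-1 comparisons)."""
    best_case, best_dist = None, math.inf
    for case in cases:
        d = math.dist(query, case["features"])  # Euclidean distance on known properties
        if d < best_dist:
            best_case, best_dist = case, d
    return best_case

# Example: guess the colour of an unknown block from its (width, height).
cases = [
    {"features": (1.0, 4.0), "colour": "purple"},
    {"features": (2.5, 5.5), "colour": "red"},
    {"features": (5.0, 2.0), "colour": "green"},
]
print(nearest_neighbor((2.0, 5.0), cases)["colour"])  # -> "red"
```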

Decision Trees
A decision tree is a representation that is a semantic tree in which
- Each leaf node is connected to a set of possible answers.
- Each non-leaf node is connected to a test that splits its set of possible answers into subsets corresponding to different test results.
- Each branch carries a particular test result's subset to another node.

Decision Trees (cont.) [figure slide; content not transcribed]

K-D Tree
A k-d tree is a representation that is a decision tree in which
- The set of possible answers consists of points, one of which may be the nearest neighbor to a given point.
- Each test specifies a coordinate, a threshold, and a neutral zone around the threshold containing no points.
- Each test divides a set of points into two sets, according to which side of the threshold each point lies on.

K-D Tree (cont.) [figure: the reference blocks Red, Orange, Yellow, Purple, Red, Violet, Blue, and Green plotted on Width (horizontal) and Height (vertical) axes, with spacings of about 2.00 U and 4.00 U]

K-D Tree (cont.) [figure: the k-d decision tree for these blocks, built from tests such as Width > 3.5, Width > 3.0, Height > 1.5, Height > 3.5, and Height > 5.5, with the coloured blocks at the leaves]

K-D Tree (cont.)
To divide the cases into sets:
- If there is only one case, stop.
- If this is the first division of cases, pick the vertical axis for comparison; otherwise, pick the axis that is different from the axis at the next higher level.
- Considering only the axis of comparison, find the average position of the two middle objects. Call this average position the threshold, and construct a decision-tree test that compares unknowns in the axis of comparison against the threshold. Also note the positions of the two middle objects in the axis of comparison; call these positions the upper and lower boundaries.
- Divide up all the objects into two subsets, according to which side of the average position they lie on.
- Divide up the objects in each subset, forming a subtree for each, using this procedure.
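A minimal Python sketch of this division procedure for two-dimensional (width, height) points; the KDNode class and build_kd_tree function are my own names, not from the slides, and the sketch assumes the points are distinct on each splitting axis (true of the example above):

```python
class KDNode:
    """Internal node: a test `point[axis] > threshold`; leaves hold a single point."""
    def __init__(self, axis=None, threshold=None, lower=None, upper=None,
                 left=None, right=None, point=None):
        self.axis, self.threshold = axis, threshold
        self.lower, self.upper = lower, upper      # boundary positions around the threshold
        self.left, self.right = left, right        # left: <= threshold, right: > threshold
        self.point = point                         # set only for leaf nodes

def build_kd_tree(points, axis=1):
    """Divide `points` (tuples of (width, height)) as the slide describes.
    axis=1 (height, the vertical axis) is used for the first division."""
    if len(points) == 1:
        return KDNode(point=points[0])
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    low, high = ordered[mid - 1][axis], ordered[mid][axis]   # the two middle objects
    threshold = (low + high) / 2                             # average position
    left = [p for p in points if p[axis] <= threshold]
    right = [p for p in points if p[axis] > threshold]
    next_axis = 1 - axis                                     # alternate height/width
    return KDNode(axis=axis, threshold=threshold, lower=low, upper=high,
                  left=build_kd_tree(left, next_axis),
                  right=build_kd_tree(right, next_axis))
```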

K-D Tree (cont.)
To find the nearest neighbor using the k-d procedure:
- Determine whether there is only one element in the set under consideration. If there is only one, report it.
- Otherwise, compare the unknown, in the axis of comparison, against the current node's threshold. The result determines the likely set.
- Find the nearest neighbor in the likely set using this procedure.
- Determine whether the distance to the nearest neighbor in the likely set is less than or equal to the distance to the other set's boundary in the axis of comparison:
  - If it is, then report the nearest neighbor in the likely set.
  - If it is not, check the unlikely set using this procedure and return the nearer of the nearest neighbors in the likely set and the unlikely set.
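A matching sketch of the search procedure, reusing the KDNode and build_kd_tree definitions from the previous block (so it is not standalone); the names and the example points are my own:

```python
import math

def kd_nearest(node, query, best=None):
    """Nearest-neighbour search over the KDNode tree built above."""
    if node.point is not None:                        # only one element: report it
        if best is None or math.dist(query, node.point) < math.dist(query, best):
            return node.point
        return best
    # Compare the unknown against the node's threshold to pick the likely set.
    if query[node.axis] > node.threshold:
        likely, unlikely, boundary = node.right, node.left, node.lower
    else:
        likely, unlikely, boundary = node.left, node.right, node.upper
    best = kd_nearest(likely, query, best)
    # Check the unlikely set only if its boundary is at least as close
    # as the nearest neighbour found so far.
    if abs(query[node.axis] - boundary) <= math.dist(query, best):
        best = kd_nearest(unlikely, query, best)
    return best

points = [(1.0, 4.0), (2.5, 5.5), (5.0, 2.0), (4.0, 6.0)]
tree = build_kd_tree(points)
print(kd_nearest(tree, (2.0, 5.0)))   # -> (2.5, 5.5)
```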

Learning by Building Identification Trees
Identification-tree building is the most widely used learning method. Thousands of practical identification trees, for applications ranging from medical diagnosis to process control, have been built using the method.

From Data to Identification Trees

Name    Hair    Height   Weight   Lotion  Result
Sarah   blonde  average  light    no      sunburned
Dana    blonde  tall     average  yes     none
Alex    brown   short    average  yes     none
Annie   blonde  short    average  no      sunburned
Emily   red     average  heavy    no      sunburned
Pete    brown   tall     heavy    no      none
John    brown   average  heavy    no      none
Katie   blonde  short    light    yes     none

From Data to Identification Trees (cont.)
An identification tree is a representation that is a decision tree in which
- Each set of possible conclusions is established implicitly by a list of samples of known class.
In the table, there are 3 x 3 x 3 x 2 = 54 possible attribute combinations. The probability of an exact match with someone already observed is only 8/54, so it can be impractical to classify an unknown object by looking for an exact match.

Identification Tree [figure: two identification trees consistent with the sample data, one rooted at a Height test (followed by Weight and Hair-color tests) and one rooted at a Hair-color test (followed by a Weight test); the sunburned samples Sarah, Annie, and Emily appear at the marked leaves]

Identification Tree (cont.)
Which is the right identification tree?
- The world is inherently simple. Therefore the smallest identification tree that is consistent with the samples is the one that is most likely to identify unknown objects correctly.
How can you construct the smallest identification tree?

Tests Should Minimize Disorder [figure slide; content not transcribed]

Tests Should Minimize Disorder (cont.) [figure: the Hair Color test; its Blonde branch receives 4 samples: Sarah, Dana, Annie, Katie]

Information Theory Supplies a Disorder Formula

Average disorder = \sum_b \frac{n_b}{n_t} \left( -\sum_c \frac{n_{bc}}{n_b} \log_2 \frac{n_{bc}}{n_b} \right)

where n_b is the number of samples in branch b, n_t is the total number of samples in all branches, and n_{bc} is the number of samples in branch b belonging to class c.

Disorder Formula
For two classes, A and B, within one branch b:
- If they are perfectly balanced, that is, n_{bc}/n_b = 0.5 for c = 1, 2, then the branch disorder is -0.5 \log_2 0.5 - 0.5 \log_2 0.5 = 1.
- If there are only A's or only B's (perfect homogeneity), then the branch disorder is -1 \log_2 1 = 0 (taking 0 \log_2 0 = 0).

Disorder Measure
As a branch moves from perfect homogeneity to perfect balance, its disorder varies smoothly between zero and one.

Disorder Measure (cont.)
The first test:

Test     Average disorder
Hair     0.5
Height   0.69
Weight   0.94
Lotion   0.61

Thus, the hair-color test is the winner.

Disorder Measure (cont.)
Once the hair test is selected, the choice of another test to separate out the sunburned people from among Sarah, Dana, Annie, and Katie is decided by the following calculations:

Test     Average disorder
Height   0.5
Weight   1
Lotion   0

Thus, the lotion-used test is the clear winner.
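The disorder values in the two tables above can be reproduced with a short Python sketch of the formula; the function names and the tuple encoding of the eight samples are my own, not from the slides:

```python
from math import log2

# The eight samples from the sunburn table.
SAMPLES = [
    ("Sarah", "blonde", "average", "light",   "no",  "sunburned"),
    ("Dana",  "blonde", "tall",    "average", "yes", "none"),
    ("Alex",  "brown",  "short",   "average", "yes", "none"),
    ("Annie", "blonde", "short",   "average", "no",  "sunburned"),
    ("Emily", "red",    "average", "heavy",   "no",  "sunburned"),
    ("Pete",  "brown",  "tall",    "heavy",   "no",  "none"),
    ("John",  "brown",  "average", "heavy",   "no",  "none"),
    ("Katie", "blonde", "short",   "light",   "yes", "none"),
]
ATTRS = {"hair": 1, "height": 2, "weight": 3, "lotion": 4}   # column indexes
RESULT = 5

def branch_disorder(branch):
    """-sum_c (n_bc/n_b) log2 (n_bc/n_b) over the classes present in one branch."""
    n_b = len(branch)
    counts = {}
    for s in branch:
        counts[s[RESULT]] = counts.get(s[RESULT], 0) + 1
    return -sum((n_bc / n_b) * log2(n_bc / n_b) for n_bc in counts.values())

def average_disorder(samples, attr):
    """sum_b (n_b/n_t) * branch_disorder(b) over the branches induced by `attr`."""
    col, n_t = ATTRS[attr], len(samples)
    branches = {}
    for s in samples:
        branches.setdefault(s[col], []).append(s)
    return sum(len(b) / n_t * branch_disorder(b) for b in branches.values())

for attr in ATTRS:                                   # first test, all eight samples
    print(attr, round(average_disorder(SAMPLES, attr), 2))
# hair 0.5, height 0.69, weight 0.94, lotion 0.61

blonde = [s for s in SAMPLES if s[1] == "blonde"]    # second test, blonde branch only
for attr in ("height", "weight", "lotion"):
    print(attr, round(average_disorder(blonde, attr), 2))
# height 0.5, weight 1.0, lotion 0.0
```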

Identification Tree Algorithm
To generate an identification tree using SPROUTER:
- Until each leaf node is populated by as homogeneous a sample set as possible:
  - Select a leaf node with an inhomogeneous sample set.
  - Replace that leaf node by a test node that divides the inhomogeneous sample set into minimally inhomogeneous subsets, according to some measure of disorder.
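A minimal recursive sketch along these lines, reusing SAMPLES, ATTRS, RESULT, and average_disorder from the disorder sketch above (so it is not standalone); the function names and the nested-dictionary tree representation are my own:

```python
def homogeneous(samples):
    """True when every sample in the set has the same result class."""
    return len({s[RESULT] for s in samples}) == 1

def sprouter(samples, attrs):
    """Grow a tree greedily: at each inhomogeneous node, pick the test with
    the smallest average disorder and split the samples on it."""
    if homogeneous(samples) or not attrs:
        return {s[RESULT] for s in samples}          # leaf: the class(es) present
    best = min(attrs, key=lambda a: average_disorder(samples, a))
    col = ATTRS[best]
    branches = {}
    for s in samples:
        branches.setdefault(s[col], []).append(s)
    remaining = [a for a in attrs if a != best]
    return {best: {value: sprouter(branch, remaining)
                   for value, branch in branches.items()}}

print(sprouter(SAMPLES, list(ATTRS)))
# {'hair': {'blonde': {'lotion': {'no': {'sunburned'}, 'yes': {'none'}}},
#           'brown': {'none'}, 'red': {'sunburned'}}}
```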

From Trees to Rules
If the person's hair color is blonde and the person uses lotion, then nothing happens.
If the person's hair color is blonde and the person uses no lotion, then the person turns red.
If the person's hair color is red, then the person turns red.
If the person's hair color is brown, then nothing happens.

Unnecessary Rule Antecedents Should Be Eliminated
If the person's hair color is blonde and the person uses lotion, then nothing happens.
can be simplified to
If the person uses lotion, then nothing happens.

Contingency Table
Checking the first antecedent (the second antecedent is kept; samples that use lotion: Dana, Alex, Katie):

                        No change   Sunburned
Person is blonde            2           0
Person is not blonde        1           0

The first antecedent can be eliminated.

Checking the second antecedent (the first antecedent is kept; blonde samples: Sarah, Dana, Annie, Katie):

                        No change   Sunburned
Person uses lotion          2           0
Person uses no lotion       0           2

The second antecedent cannot be eliminated.

Contingency Table (cont.)
If the person's hair color is blonde and the person does not use lotion, then the person turns red.

Checking the first antecedent (the second antecedent is kept; samples that use no lotion: Sarah, Annie, Emily, Pete, John):

                        No change   Sunburned
Person is blonde            0           2
Person is not blonde        2           1

The first antecedent cannot be eliminated.

Checking the second antecedent (the first antecedent is kept; blonde samples: Sarah, Dana, Annie, Katie):

                        No change   Sunburned
Person uses no lotion       0           2
Person uses lotion          2           0

The second antecedent cannot be eliminated either.

Contingency Table (cont.)
If the person's hair color is red, then the person turns red.
There is no other antecedent to hold fixed, so all 8 samples are considered:

                            No change   Sunburned
Person is red haired            0           1
Person is not red haired        5           2

The antecedent cannot be eliminated.

If the person's hair color is brown, then nothing happens.
There is no other antecedent to hold fixed, so all 8 samples are considered:

                            No change   Sunburned
Person is brown haired          3           0
Person is not brown haired      2           3

The antecedent cannot be eliminated.
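One way to read the elimination test behind these tables, sketched in Python (reusing SAMPLES and RESULT from the disorder sketch; the helper names and the lambda encodings of the antecedents are my own): an antecedent can be dropped when the samples that fail it, but satisfy the rule's other antecedents, never contradict the rule's consequent.

```python
def contingency(samples, antecedent, others):
    """Rows: antecedent true/false among samples satisfying the other antecedents.
    Each row maps result classes to counts."""
    rows = {True: {}, False: {}}
    for s in samples:
        if all(test(s) for test in others):
            row = rows[antecedent(s)]
            row[s[RESULT]] = row.get(s[RESULT], 0) + 1
    return rows

def can_eliminate(samples, antecedent, others, consequent):
    """Drop the antecedent if samples failing it (but passing the others)
    never have a result different from the rule's consequent."""
    false_row = contingency(samples, antecedent, others)[False]
    return all(result == consequent for result in false_row)

# Rule: blonde and uses lotion -> nothing happens ("none").
is_blonde   = lambda s: s[1] == "blonde"
uses_lotion = lambda s: s[4] == "yes"
print(can_eliminate(SAMPLES, is_blonde, [uses_lotion], "none"))   # True
print(can_eliminate(SAMPLES, uses_lotion, [is_blonde], "none"))   # False
```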

Unnecessary Rules Should Be Eliminated
Rule 1: If the person's hair color is blonde and the person uses no lotion, then the person turns red.
Rule 2: If the person uses lotion, then nothing happens.
Rule 3: If the person's hair color is red, then the person turns red.
Rule 4: If the person's hair color is brown, then nothing happens.

Default Rules and Tie Breakers
Default rule, either:
Rule 5: If no other rule applies, then the person turns red.
or
Rule 6: If no other rule applies, then nothing happens.
Choose the default rule that minimizes the total number of rules.
Tie breaker 1: Choose the default rule that covers the most common consequent in the sample set. Rule 6 is then used together with Rules 1 and 3.
Tie breaker 2: Choose the default rule that produces the simplest rules. Rule 5 is then used together with Rules 2 and 4.

Rule Generation Algorithm
To generate rules from an identification tree using PRUNER:
- Create one rule for each root-to-leaf path in the identification tree.
- Simplify each rule by discarding antecedents that have no effect on the conclusion reached by the rule.
- Replace those rules that share the most common consequent by a default rule that is triggered when no other rule is triggered (eliminating as many other rules as possible). In the event of a tie, use some heuristic tie breaker to choose a default rule.
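For the first step, extracting one rule per root-to-leaf path can be sketched as follows, operating on the nested-dictionary tree returned by the sprouter sketch above (so it is not standalone; the names are my own):

```python
def tree_to_rules(tree, antecedents=()):
    """Walk the nested-dict tree and emit (antecedents, conclusion) pairs,
    one per root-to-leaf path."""
    if isinstance(tree, set):                         # leaf: the class at this path
        return [(list(antecedents), tree)]
    (attr, branches), = tree.items()
    rules = []
    for value, subtree in branches.items():
        rules.extend(tree_to_rules(subtree, antecedents + ((attr, value),)))
    return rules

for conditions, conclusion in tree_to_rules(sprouter(SAMPLES, list(ATTRS))):
    print("IF", " AND ".join(f"{a} = {v}" for a, v in conditions), "THEN", conclusion)
# IF hair = blonde AND lotion = no THEN {'sunburned'}
# IF hair = blonde AND lotion = yes THEN {'none'}
# IF hair = brown THEN {'none'}
# IF hair = red THEN {'sunburned'}
```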