Presentation on theme: "Knowledge Acquisition and Modelling"— Presentation transcript:
1 Knowledge Acquisition and Modelling Decision Tables and Decision Trees
2 Decision Trees Can be left to right Or top down Map of a reasoning processDescribes in a tree like structureA graphical representation of a decision situationDecision situation points are connected together by arcs and terminate in ovalsMain componentsDecision points represented by nodesActionsParticular choices from a decision point represented by arcs (or straight lines)Can be left to rightOr top down
3 Decision Trees Essentially flowcharts A natural order of ‘micro decisions’ (Boolean – yes/no decisions) to reach a conclusionIn simplest form all you need isA startA cascade of Boolean decisions (each with exactly outbound branches)A set of decision nodes and representing all the ‘leaves’ of the decision tree
5 An Example Bank - loan application Classify application : approved class, denied class Criteria - Target Class approved if 3 binary attributes have certain value:(a) borrower has good credit history (credit rating in excess of some threshold)(b) loan amount less than some percentage of collateral value (e.g., 80% home value)(c) borrower has income to make payments on loanPossible scenarios = 32 = 8If the parameters for splitting the nodes can be adjusted, the number of scenarios grows exponentially.
6 How They Work Decision rules - partition sample of data Terminal node (leaf) indicates the class assignmentTree partitions samples into mutually exclusive groupsOne group for each terminal node All pathsstart at the root nodeend at a leaf (terminal node = decision node)
7 How They Work Each path represents a decision rule joining (AND) of all the tests along that pathseparate paths that result in the same class are disjunctions (ORs)All paths - mutually exclusivefor any one case - only one path will be followedfalse decisions on the left branchtrue decisions on the right branch
8 Disjunctive Normal Form Non-terminal node - model identifies an attribute to be testedtest splits attribute into mutually exclusive disjoint setssplitting continues until a node - one class (terminal node or leaf) Structure - disjunctive normal formlimits form of a rule to conjunctions (adding) of termsallows disjunction (or-ing) over a set of rules
9 Predicting Commute Time Leave AtIf we leave at 10 AM and there are no cars stalled on the road, what will our commute time be?10 AM9 AM8 AMStall?Accident?LongNoYesNoYesShortLongMediumLong
10 Inductive LearningMaking a series of Boolean decisions and following the relevant branch:Did we leave at 10 AM?Did a car stall on the road?Is there an accident on the road?Answering each yes/no question allows traversal of tree to reach a conclusion
11 Decision Tree as a Rule Set IF hour == 8am THEN commute time = longIF hour == 9am AND accident == yes THEN commute time = longIF hour == 9am AND accident == no THEN commute time = mediumIF hour == 10am and stall == yesTHEN commute time = longIF hour == 10am and stall == noTHEN commute time = short
12 How to Create a Decision Tree Make a list of attributes that we can measureThese attributes (for now) must be discreteWe then choose a target attribute that we want to predictThen create an experience table that lists what we have seen in the past
14 Choosing AttributesThe previous experience decision table showed 4 attributes: hour, weather, accident and stallBut the decision tree only showed 3 attributes: hour, accident and stallWhy is that?
15 Choosing AttributesMethods for selecting attributes show that weather is not a discriminating attributeWe use the principle of Occam’s Razor: Given a number of competing hypotheses, the simplest one is preferable
16 Decision Tree Algorithms The basic idea behind any decision tree algorithm is as follows:Choose the best attribute(s) to split the remaining instances and make that attribute a decision nodeRepeat this process for recursively for each childStop when:All the instances have the same target attribute valueThere are no more attributesThere are no more instances
17 Identifying the Best Attributes Refer back to our original decision treeLeave At9 AM10 AM8 AMAccident?Stall?LongYesNoYesNoShortLongMediumLongHow did we know to split on leave at and then on stall and accident and not weather?
18 ID3 HeuristicTo determine the best attribute, we look at the ID3 heuristicID3 splits attributes based on their entropy.Entropy is the measure of disinformationEntropy is minimized when all values of the target attribute are the same.If we know that commute time will always be short, then entropy = 0Entropy is maximized when there is an equal chance of all values for the target attribute (i.e. the result is random)If commute time = short in 3 instances, medium in 3 instances and long in 3 instances, entropy is maximized
19 Entropy Calculation of entropy Entropy(S) = ∑(i=1 to l)-|Si|/|S| * log2(|Si|/|S|)S = set of examplesSi = subset of S with value vi under the target attributel = size of the range of the target attribute
20 ID3 ID3 splits on attributes with the lowest entropy We calculate the entropy for all values of an attribute as the weighted sum of subset entropies as follows:∑(i = 1 to k) |Si|/|S| Entropy(Si), where k is the range of the attribute we are testingWe can also measure information gain (which is inversely proportional to entropy) as follows:Entropy(S) - ∑(i = 1 to k) |Si|/|S| Entropy(Si)
21 ID3Given our commute time sample set, we can calculate the entropy of each attribute at the root nodeAttributeExpected EntropyInformation GainHour0.6511WeatherAccidentStall
22 Problems with ID3 ID3 is not optimal Uses expected entropy reduction, not actual reductionMust use discrete (or discretized) attributesWhat if we left for work at 9:30 AM?We could break down the attributes into smaller values…
23 Problems with ID3If we broke down leave time to the minute, we might get something like this:8:02 AM8:03 AM9:05 AM9:07 AM9:09 AM10:02 AMLongMediumShortLongLongShortSince entropy is very low for each branch, we have n branches with n leaves. This would not be helpful for predictive modeling.
24 Problems with ID3 Can use a technique known as discretization choose cut points, such as 9AM for splitting continuous attributesgenerally lie in a subset of boundary points, such that a boundary point is where two adjacent instances in a sorted list have different target value attributes
25 Pruning (another technique for attribute selection) Pre-PruningDecide during the building process when to stop adding attributes(possibly based on their information gain)May be problematicIndividually attributes may not contribute much to a decisionBut what about in combination with other attributesmay have a significant impactPost-Pruningwaits until the full decision tree has built and then prunes the attributesSubtree ReplacementSubtree Raising
26 Subtree Replacement Entire subtree is replaced by a single leaf node A 54123
27 Subtree Replacement Node 6 replaced the subtree Generalizes tree a little more, but may increase accuracyAB654
28 Subtree Raising Entire subtree is raised onto another node A B C 5 4 1 23
29 Subtree Raising Entire subtree is raised onto another node This was not discussed in detail as it is not clear whether this is really worthwhile (as it is very time consuming)AC123
30 Problems with Decision Trees While decision trees classify quickly, the time for building a tree may be higher than another type of classifierDecision trees suffer from a problem of errors propagating throughout a treeA very serious problem as the number of classes increases
31 Error PropagationSince decision trees work by a series of local decisions, what happens when one of these local decisions is wrong?Every decision from that point on may be wrongWe may never return to the correct path of the tree
32 Decision tree representation of salary decision Modern Systems Analysis and Design Fourth Edition Jeffrey A. Hoffer , Joey F. George, Joseph S. Valacich
33 Decision TablesUsed to lay out in tabular form all possible situations which a decision may encounter and to specify which action to take in each of these situations.A matrix representation of the logic of a decisionSpecifies the possible conditions and the resulting actions
34 Terminology Decision Table Condition Stubs Action Stubs Rules A decision table is a tabular form that presents a set of conditions and their corresponding actions.Condition StubsCondition stubs describe the conditions or factors that will affect the decision or policy.They are listed in the upper section of the decision table.Action StubsAction stubs describe, in the form of statements, the possible policy actions or decisions.They are listed in the lower section of the decision table.RulesRules describe which actions are to be taken under a specific combination of conditions.They are specified by first inserting different combinations of condition attribute values and then putting X's in the appropriate columns of the action section of the table.
35 ExampleModern Systems Analysis and Design Fourth Edition Jeffrey A. Hoffer , Joey F. George, Joseph S. Valacich
36 Decision Table Methodology 1. Identify Conditions & ValuesFind the data attribute each condition tests and all of the attribute's values.2. Compute Max Number of RulesMultiply the number of values for each condition data attribute by each other.3. Identify Possible ActionsDetermine each independent action to be taken for the decision or policy.4. Enter All Possible RulesFill in the values of the condition data attributes in each numbered rule column.5. Define Actions for each RuleFor each rule, mark the appropriate actions with an X in the decision table.6. Verify the PolicyReview completed decision table with end-users.7. Simplify the TableEliminate and/or consolidate rules to reduce the number of columns.
37 A Simple ExampleScenario: A marketing company wishes to construct a decision table to decide how to treat clients according to three characteristics:Gender,City Dweller, andage group: A (under 30), B (between 30 and 60), C (over 60).The company has four products (W, X, Y and Z) to test market.Product W will appeal to male city dwellers.Product X will appeal to young males.Product Y will appeal to Female middle aged shoppers who do not live in cities.Product Z will appeal to all but older males.
38 Identify Conditions & Values The three data attributes tested by the conditions in this problem aregender, with values M and F;city dweller, with value Y and N; andage group, with values A, B, and Cas stated in the problem.
39 2. Compute Maximum Number of Rules The maximum number of rules is 2 x 2 x 3 = 12
40 3. Identify Possible Actions The four actions are:market product W,market product X,market product Y,market product Z.
41 4. Enter All Possible Conditions RULES123456789101112SexmfCityynYAgeabc
42 5. Define Actions for each Rule Market123456789101112WXYZ
43 Full table RULES Actions Market W Z 1 2 3 4 5 6 7 8 9 10 11 12 Sex m f CityynYAgeabcActionsMarketWXZ
44 6. Verify the PolicyLet us assume that the client agreed with our decision table.
45 7. Simplify the Table There appear to be no impossible rules. Note that rules 2, 4, 6, 7, 10, 12 have the same action pattern.Rules 2, 6 and 10 havetwo of the three condition values (gender and city dweller) identical andall three of the values of the non- identical value (age) are covered,so they can be condensed into a single column 2.The rules 4 and 12 have identical action pattern, but they cannot be combined because the indifferent attribute "Age" does not have all its values covered in these two columns.Age group B is missing.
46 The revised table is as follows: RULES12345678910SexMFCityYNAgeA-BCActionsMarketWXZ
47 Step 1. Identify Conditions & Values We first examine the problem and identify the data attributes upon which the decision or policy depends.We then list the possible values of each data attribute.Often, answering the question: "What do I need to know in order to take action in this situation?" will help identify the appropriate condition attributes.
48 Step 2. Compute Maximum Number of Rules A rule is determined by a different combination of the condition attributes values.Since we have listed these values in the previous step, the multiplication rule of counting tells us that there will be no more columns than the product of the number of values for each of the condition attributes.This can be easily verified by constructing a tree diagram listing all possible values of each attribute for each branch of the preceding attribute.The number of leaves of the tree will be the product described above.Since some combinations of attribute values may be impossible, the actual number of rules may be less that the maximum.
49 Step 3. Identify Possible Actions The actions describe the decisions to be made or the policy rules to be followed.Asking the question, "What are the different options for implementing the decision or policy?", should help identify the possible actions.
50 Step 4. Enter All Possible Rules We now begin to build the decision table by listingthe condition descriptions in the left margin of the upper part of the table andthe action descriptions in the left margin of the lower part.Then we write consecutive numbers from 1 to the maximum number of rules across the top.In the rule columns and the condition rows, we list all possible combinations of condition attribute values.A rule of thumb for arranging the rule combinations is to alternate the possible values for the first condition, then repeat each value of the second condition as many times as there are values in the first condition, repeat each value of the third condition as many times as needed to cover one iteration of the second condition values, etc.
51 Step 5. Define Actions for each Rule In this step we decide which actions are appropriate for each combination of condition attribute values and mark an X in that column of the action row.This should be fairly straightforward if the decision making procedure is well defined.If it is not well defined then the organization of the decision table makes it easier to get the end-user to specify the action(s) for each rule.
52 step 6. Verify the PolicyReview the completed decision table with the end-users.Resolve any rules for which the actions are not specific.Verify that rules you think are impossible or cannot in actuality occur.Resolve apparent contradictions, such as one rule with two contradictory actions.Finally, verify that each rule's actions are correct.
53 Step 7. Simplify the Decision Table In this step we look for and eliminate impossible rules, and also combine rules with indifferent conditions. An indifferent condition is one whose values do not affect the decision and always result in the same action.Impossible rules are those in which the given combination of condition attribute values cannot occur according to the specifications of the problem.(E.G. if we assumed for marketing purposes that all middle-aged men lived in the city).To determine indifferent conditions, first look for rules with exactly the same actions.From these, find those whose condition values are the same except for one and only one condition (called the indifferent condition).This latter set of rules has the potential for being collapsed into a single rule with the indifferent condition value replaced with a dash.Note that all possible values of the indifferent condition must be present among the rules to be combined before they can be collapsed.
54 ExerciseA student may receive a final course grade of A, B, C, D, or F. In deriving the student's final course grade, the instructor first determines an initial or tentative grade for the student, which is determined in the following manner for a student who has:received a total of no lower than 90 percent on the first 3 assignments and received a score no lower than 70 percent on the 4th assignment will receive an initial grade of A for the course.scored a total lower than 90 percent but no lower than 80 percent on the first 3 assignments and received a score no lower 70 percent on the 4th assignment will receive an initial grade of B for the course.received a total lower than 80 percent but no lower than 70 percent on the first 3 assignments and received a score no lower than 70 percent on the 4th assignment will receive an initial grade of C for the course.scored a total lower than 70 percent but no lower than 60 percent on the first 3 assignments and received a score no lower 70 percent on the 4th assignment will receive an initial grade of D for the course.a total lower than 60 percent on the first 3 assignments, or received a score lower than 70 percent on the 4th assignment, will receive an initial grade of F for the course.Once the instructor has determined the initial course grade for the student, the final course grade will be determined.The student's final course grade will be the same as his or her initial course grade if no more than 3 class periods during the semester were missed.Otherwise, the student's final course grade will be one letter grade lower than his or her initial course grade (for example, an A will become a B).