17.5 Rule Learning

Given the importance of rule-based systems and the human effort that is required to elicit good rules from experts, it is natural to ask whether expert system rules could be learned automatically. There are two major types of learning, inductive and deductive, and both can be used to learn rules.

Neural-network learning, for example, is inductive because the functions learned are hypotheses about some underlying and unknown function. In successful learning, the hypotheses typically give correct outputs for most inputs, but they might also err. Inductive rule-learning methods create new rules about a domain that are not derivable from any previous rules. I present methods for inductively learning rules in both the propositional and the predicate calculus.

Deductive rule learning enhances the efficiency of a system's performance by deducing additional rules from previously known domain rules and facts. The conclusions that the system can derive using these additional rules could also have been derived without them, but with the additional rules the system might perform more efficiently. I will explain a technique called explanation-based generalization (EBG) for deducing additional rules.

17.5.1 Learning Propositional Calculus Rules

Several methods for inductive rule learning have been proposed; I describe one of them here. I first describe the general idea for propositional Horn clause logic. Then, I show how a similar technique can be used to learn first-order logic Horn clause rules.

To frame my discussion, I again use my simple example of approving a bank loan. Instead of being given rules for this problem, suppose we are given a training set consisting of the values of attributes for a large number of individuals. To illustrate, consider the data given in Table 17.1 (I use 1 for True and 0 for False). This table might be compiled, for example, from records of loan applications and the decisions made by human loan officers.

Members of the training set for which the value of OK is 1 are called positive instances; members for which the value of OK is 0 are called negative instances. From the training set, we desire to induce rules of the form

$\alpha_1 \wedge \alpha_2 \wedge \cdots \wedge \alpha_n \supset \mathrm{OK}$

where the $\alpha_i$ are propositional parameters from the set {APP, RATING, INC, BAL}. If the antecedent of a rule has value True for an instance in the training set, we say that the rule covers that instance. We can change any existing rule to make it cover fewer instances by adding a parameter to its antecedent. Such a change makes the rule more specific. Two rules can cover more instances than can one alone, so adding a rule makes the system using these rules more general. We seek a set of rules that covers all and only the positive instances in the training set.

Searching for a set of rules can be computationally difficult. I describe a "greedy" method, which I call separate and conquer. We first attempt to find a single rule that covers only positive instances, even if it doesn't cover all the positive instances. We search for such a rule by starting with a rule that covers all instances (positive and negative), and we gradually make it more specific by adding parameters to its antecedent. Since a single rule might not cover all the positive instances, we gradually add rules (making them as specific as needed as we go) until the entire set of rules covers all and only the positive instances.

Here is how the method works for our example. We start with the provisional rule

T ⊃ OK

which covers all instances. Now we must add a parameter to make it cover fewer negative instances, working toward covering only positive ones. Which parameter (from the set {APP, RATING, INC, BAL}) should we add? Several criteria have been used for making the selection. To keep my discussion simple, I will base our decision on an easy-to-calculate ratio:

$r_\alpha = n_\alpha^{+} / n_\alpha$

where $n_\alpha$ is the total number of (positive and negative) instances covered by the rule's antecedent once α has been added to it, and $n_\alpha^{+}$ is the number of positive instances among them. We select the α yielding the largest value of $r_\alpha$. In our case, the values are

$r_{\mathrm{APP}} = 3/6 = 0.5$
$r_{\mathrm{RATING}} = 4/6 = 0.667$
$r_{\mathrm{INC}} = 3/6 = 0.5$
$r_{\mathrm{BAL}} = 3/4 = 0.75$

So, we select BAL, yielding the provisional rule

BAL ⊃ OK

This rule covers the positive instances 3, 4, and 7, but also covers the negative instance 1, so we must specialize it further.

Table 17.2

Individual  APP  RATING  INC  BAL  OK
1           1    0       0    1    0
3           1    1       0    1    1
4           0    1       1    1    1
7           1    1       1    1    1

We use the same technique to select another parameter. The calculations for the $r_\alpha$'s must now take into account the fact that we have already decided that the first component in the antecedent is BAL:

$r_{\mathrm{APP}} = 2/3 = 0.667$
$r_{\mathrm{RATING}} = 3/3 = 1.0$
$r_{\mathrm{INC}} = 2/2 = 1.0$
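The second round of ratio calculations can be checked with a minimal sketch that uses only the four rows of Table 17.2, that is, the instances already covered by BAL ⊃ OK. The names `TABLE_17_2`, `COLUMNS`, and `counts` are illustrative and do not come from the text.

```python
# Rows of Table 17.2: individual -> (APP, RATING, INC, BAL, OK).
# These are the instances covered by the provisional rule BAL => OK.
TABLE_17_2 = {
    1: (1, 0, 0, 1, 0),
    3: (1, 1, 0, 1, 1),
    4: (0, 1, 1, 1, 1),
    7: (1, 1, 1, 1, 1),
}
COLUMNS = ("APP", "RATING", "INC", "BAL", "OK")


def counts(param, rows):
    """Return (n_pos, n) for the rows in which `param` is 1.

    n is the number of covered rows, n_pos the number of those with OK = 1.
    """
    col = COLUMNS.index(param)
    covered = [row for row in rows.values() if row[col] == 1]
    n_pos = sum(row[-1] for row in covered)
    return n_pos, len(covered)


# BAL is already in the antecedent, so only the other three are candidates.
for param in ("APP", "RATING", "INC"):
    n_pos, n = counts(param, TABLE_17_2)
    print(f"r_{param} = {n_pos}/{n} = {n_pos / n:.3f}")
```

Returning the pair (n_pos, n) rather than just the quotient keeps the sample size available, which matters when two parameters have the same ratio.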

With BAL already in the antecedent, we have a tie between RATING and INC, both at 1.0. We might select RATING because $r_{\mathrm{RATING}}$ is based on a larger sample. (You should explore the consequences of selecting INC instead.) The rule

BAL ∧ RATING ⊃ OK

covers only positive instances, so we do not need to add further parameters to the antecedent of this rule. But this rule does not cover all of the positive instances. Specifically, it does not cover positive instance 6. So, we must add another rule.

To learn the next rule, we first eliminate from the table all of the positive instances already covered by the first rule, to obtain the reduced data shown in Table 17.3. We begin the process all over again with this reduced table, starting with the rule T ⊃ OK. This rule covers some negative instances, namely, 1, 2, 5, 8, and 9. To select a parameter to add to the antecedent, we calculate

$r_{\mathrm{APP}} = 1/4 = 0.25$
$r_{\mathrm{RATING}} = 1/3 = 0.333$
$r_{\mathrm{INC}} = 1/4 = 0.25$
$r_{\mathrm{BAL}} = 0/1 = 0.0$

RATING has the largest ratio, so we select it to give us the rule RATING ⊃ OK. This rule covers negative instances 5 and 9, so we must add another parameter to the antecedent.

Table 17.4

Individual  APP  RATING  INC  BAL  OK
5           0    1       1    0    0
6           1    1       1    0    1
9           1    1       0    0    0

$r_{\mathrm{APP}} = 1/2 = 0.5$
$r_{\mathrm{INC}} = 1/2 = 0.5$
$r_{\mathrm{BAL}} = 0/0$ (taken as 0.0)

APP and INC tie at 0.5, so we arbitrarily select APP to give us the rule APP ∧ RATING ⊃ OK. This rule covers negative instance 9, so we must add another parameter to the antecedent. The reduced data is shown in Table 17.5.

Table 17.5

Individual  APP  RATING  INC  BAL  OK
6           1    1       1    0    1
9           1    1       0    0    0

$r_{\mathrm{INC}} = 1/1 = 1.0$
$r_{\mathrm{BAL}} = 0/0$ (taken as 0.0)

We select INC, which finally results in the rule APP ∧ RATING ∧ INC ⊃ OK. These two rules, namely, BAL ∧ RATING ⊃ OK and APP ∧ RATING ∧ INC ⊃ OK, cover all and only the positive instances, so we are finished.
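The whole walk-through can be sketched as a short greedy loop. This is a sketch under stated assumptions, not a definitive implementation. The rows for individuals 1, 3, 4, 5, 6, 7, and 9 are taken from Tables 17.2, 17.4, and 17.5. The rows for individuals 2 and 8 are not shown in the text, so the values below are assumptions chosen to agree with the reported ratios (the text does not determine which of the two has APP = 1). The tie-breaking policy (larger sample first, then the listed parameter order) is also an assumption that mirrors the choices made in the example, and the function names are illustrative.

```python
from fractions import Fraction

PARAMS = ["APP", "RATING", "INC", "BAL"]

# individual -> (APP, RATING, INC, BAL, OK)
ROWS = {
    1: (1, 0, 0, 1, 0),
    2: (0, 0, 1, 0, 0),  # ASSUMED: row not given in the text
    3: (1, 1, 0, 1, 1),
    4: (0, 1, 1, 1, 1),
    5: (0, 1, 1, 0, 0),
    6: (1, 1, 1, 0, 1),
    7: (1, 1, 1, 1, 1),
    8: (1, 0, 1, 0, 0),  # ASSUMED: row not given in the text
    9: (1, 1, 0, 0, 0),
}
DATA = {i: dict(zip(PARAMS + ["OK"], row)) for i, row in ROWS.items()}


def covers(antecedent, row):
    """A rule covers a row when every parameter in its antecedent is 1."""
    return all(row[p] == 1 for p in antecedent)


def score(antecedent, param, data):
    """Return (ratio, n) for the antecedent extended by `param`; 0/0 counts as 0."""
    covered = [r for r in data.values() if covers(antecedent + [param], r)]
    n = len(covered)
    n_pos = sum(r["OK"] for r in covered)
    return (Fraction(n_pos, n) if n else Fraction(0), n)


def learn_rule(data):
    """Specialize T => OK until the rule covers no negative instance."""
    antecedent = []
    while any(covers(antecedent, r) and r["OK"] == 0 for r in data.values()):
        candidates = [p for p in PARAMS if p not in antecedent]
        if not candidates:
            raise ValueError("negative instances cannot be excluded")
        # max() keeps the first maximal item, so remaining ties fall back
        # to the order of PARAMS (an assumed tie-breaking policy).
        antecedent.append(max(candidates, key=lambda p: score(antecedent, p, data)))
    return antecedent


def learn_rules(data):
    """Add rules until every positive instance is covered."""
    remaining, rules = dict(data), []
    while any(r["OK"] == 1 for r in remaining.values()):
        rule = learn_rule(remaining)
        kept = {i: r for i, r in remaining.items()
                if not (r["OK"] == 1 and covers(rule, r))}
        if len(kept) == len(remaining):
            raise ValueError("learned rule covers no positive instance")
        rules.append(rule)
        remaining = kept
    return rules


for rule in learn_rules(DATA):
    print(" AND ".join(rule), "=> OK")
```

Under these assumptions the loop adds parameters in the order BAL, RATING for the first rule and RATING, APP, INC for the second, matching the two rules derived by hand. Changing the order of `PARAMS` is one way to explore what happens when a tie is broken differently.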
