ACTION RULES Slides by A A Tzacheva.

2 Knowledge Discovery
"The nontrivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data." (Fayyad et al., 1996)
"Data mining is the process of discovering meaningful new correlations, patterns and trends by sifting through large amounts of data stored in repositories." (Gartner Group)
"The saying that knowledge is power is not quite true… Used knowledge is power." (Edward E. Free)

3 Knowledge Discovery
Knowledge Discovery in Databases (KDD) is a new area of research that combines many algorithms and techniques used in artificial intelligence, statistics, databases, machine learning, etc.
KDD is the process of extracting previously unknown, non-obvious, and interesting information from huge amounts of data.
Past research on data mining has mostly focused on techniques for generating rules from datasets.

4 [Figure: KDD process diagram. Labels: Data Cleaning, Integration, Data Warehouse, Selection, Transformation, Prepared data, Data Mining, Patterns, Evaluation, Visualization, Knowledge Base]

5 Knowledge Discovery
[Pohle, 2003] Many data mining systems are good at deriving useful statistics and patterns from huge amounts of data, but they are not very smart at interpreting these results, which is crucial for turning them into interesting, understandable and actionable knowledge.
There is a lack of sophisticated tool support for incorporating human domain knowledge into the mining process. This domain knowledge should be updated with the mining results.
Mining Process (Fayyad): [[Business Understanding], [Domain Knowledge]] → [Data Understanding] → [Data Preparation] → [Modeling/Mining] → [Evaluation] → [Deployment]

6 Associations: two conditions occur together, with some confidence
Presumptive, Objective: E = [Cond_1  Cond_2]
Interestingness Function
Data Mining Task: for a given dataset $D$, language of facts $L$, interestingness function $I_{D,L}$ and threshold $c$, efficiently find associations $E$ such that $I_{D,L}(E) > c$.
The knowledge engineer defines $c$.

7 Are All the “Discovered” Patterns Interesting?
The set of rules discovered by a data mining algorithm is large, and we want the subset of rules that are interesting, because these algorithms discover accurate rules rather than interesting rules.
An association is interesting if it is easily understood by humans, valid on new or test data with some degree of certainty, potentially useful, novel, or validates some hypothesis that a user seeks to confirm.
Two aspects of rule interestingness have been studied in the data mining literature: objective and subjective measures.
Objective measures are data-driven and domain-independent. Generally, these measures evaluate the rules based on their quality and the similarity between them, rather than considering the user's beliefs about the domain.
* Note: "domain" here means the type of data, e.g. financial data (financial domain) or medical data (medical domain). If the measures being calculated are independent of the domain, then they can always be calculated, whether the data is financial, medical or of any other type.

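8 Objective Measure Examples
Assume $\varphi \rightarrow \psi$ is an association rule over a dataset of $n$ objects. Some objective measures are:
- Support or Strength: $\mathrm{card}[\varphi \wedge \psi]$
- Confidence or Certainty Factor: $\mathrm{card}[\varphi \wedge \psi] / \mathrm{card}[\varphi]$
- Coverage Factor: $\mathrm{card}[\varphi \wedge \psi] / \mathrm{card}[\psi]$
- Leverage: $\mathrm{card}[\varphi \wedge \psi]/n - (\mathrm{card}[\varphi]/n) \cdot (\mathrm{card}[\psi]/n)$
- Lift: $n \cdot \mathrm{card}[\varphi \wedge \psi] / (\mathrm{card}[\varphi] \cdot \mathrm{card}[\psi])$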
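
The objective measures named on this slide can be computed directly from counts. The sketch below assumes the usual reading of the rule as "antecedent implies consequent", with card[...] counting matching rows; the function name, the dictionary-based row format and the toy rows are illustrative assumptions, not part of the slides.

```python
from fractions import Fraction


def objective_measures(rows, antecedent, consequent):
    """Objective measures for the rule antecedent -> consequent.

    rows: list of dicts, one per object (hypothetical format).
    antecedent, consequent: dicts mapping attribute -> required value.
    Assumes each side of the rule matches at least one row.
    """
    def matches(row, cond):
        return all(row.get(attr) == val for attr, val in cond.items())

    n = len(rows)
    n_ante = sum(matches(r, antecedent) for r in rows)
    n_cons = sum(matches(r, consequent) for r in rows)
    n_both = sum(matches(r, antecedent) and matches(r, consequent) for r in rows)
    return {
        "support": n_both,
        "confidence": Fraction(n_both, n_ante),
        "coverage": Fraction(n_both, n_cons),
        "leverage": Fraction(n_both, n) - Fraction(n_ante, n) * Fraction(n_cons, n),
        "lift": Fraction(n * n_both, n_ante * n_cons),
    }


# Made-up toy data: five objects with a condition b and a decision d.
rows = [
    {"b": "P", "d": "L"},
    {"b": "P", "d": "L"},
    {"b": "S", "d": "H"},
    {"b": "S", "d": "L"},
    {"b": "R", "d": "L"},
]
m = objective_measures(rows, {"b": "S"}, {"d": "H"})
print(m)
```

Exact fractions are used instead of floats so that the ratios can be compared without rounding concerns; converting with `float(...)` is enough for display.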

9 Problem: Subjective Interestingness
A rule is:
- unexpected, if it contradicts the user's beliefs about the domain and therefore surprises the user
- novel, if to some extent it contributes to new knowledge
- actionable, if the user can take an action to his/her advantage based on this rule
In the data mining literature, actionability has been quantified in terms of unexpectedness. For example:
- most actionable knowledge is unexpected
- most unexpected knowledge is actionable

10 Actionability - Subjective
- So, if we are able to calculate the actionability of a rule (the way we are able to calculate support and confidence), then we have a way of knowing whether our rule is unexpected and interesting, that is, a way to measure the interestingness of the rule.
- In the data mining literature, data mining is viewed as the process of turning data into information, information into action, and action into value or profit.
- However, the task of finding actionable rules is not trivial. Actionability is seen as an elusive concept because it is difficult to know the space of all rules and the actions to be attached to them.

11 Unexpectedness
Objective unexpectedness /does not depend on domain knowledge/:
- If $r = [A \rightarrow B_1]$ has a high confidence and $r_1 = [A * C \rightarrow B_2]$ has a high confidence, then $r_1$ is unexpected.
Argument: unexpectedness is inherently subjective, and the prior beliefs of the user form an important component of it. [Padmanabhan & Tuzhilin]
- $A \rightarrow B$ is unexpected with respect to the belief $\alpha \rightarrow \beta$ on the dataset $D$ if the following conditions hold:
  - $B \wedge \beta = \mathit{False}$ [$B$ and $\beta$ logically contradict each other]
  - $A \wedge \alpha$ happen together on a large subset of $D$
  - $A * \alpha \rightarrow B$ is true, which means $A * \alpha \rightarrow \neg\beta$

12 Actionability
The actionability measure is based on the rule's benefit to the user, that is, the user can do something in his/her interest with the rule.
This measure is very important for rules to be interesting, in the sense that users are always looking for patterns that help them improve their performance and do better work.
The practical purpose of obtaining information is to improve the business, that is, the information must support successful business decision-making. Actions can then be performed to make the business succeed.

13 Actionable Rules and Action Rules
Actionable rule mining deals with benefit-driven actions required for decision making.
Rules are unexpected if they "surprise" the user, and rules are actionable if the user can do something with them to his/her advantage. For example, a user may be able to change undesirable/non-profitable patterns into desirable/profitable patterns.
There are methods which define actionability as an approximation of unexpectedness.
In order to produce unexpected and/or actionable rules, the system must know what the user expects, i.e., his/her existing knowledge or concepts about the domain.
Machine learning also typically assumes that the domain knowledge is correct or at least partially correct.
Although both unexpectedness and actionability are important, actionability is the key concept in most applications because actionable rules allow the user to do his/her job better by taking some specific actions in response to the discovered knowledge.

14 Action Rules
Next, we will focus on a special type of rules, called action rules, which are actionable rules. We will study a well-defined algorithm for discovering such rules.
These rules can be constructed from classification rules to suggest a way to re-classify objects (for instance customers or patients) to a desired state.
In e-commerce applications, this re-classification may mean that a consumer who is not interested in a certain product may now buy it, and may therefore fall into a group of more profitable customers. In the medical domain, this re-classification may mean finding how to change the class of a tumor from malignant to benign.

15 Action Rules
These groups are described by values of classification attributes in a decision table schema. By a decision table we mean any information system where the set of attributes is partitioned into conditions and decisions.
To discover action rules, the set of conditions must be partitioned into stable conditions and flexible conditions/attributes. For simplicity, we also assume that there is only one decision attribute.
For example, date of birth is a stable attribute, and the interest rate on any customer account is a flexible attribute (dependent on the bank).
The assumption that the decision attribute d is flexible is quite essential.

16 Action Rules
Decision table: any information system $S$ of the form $S = (A_{Fl} \cup A_{St} \cup \{d\})$, where
- $d$ is a distinguished attribute called the decision
- the elements of $A_{St}$ are called stable conditions
- the elements of $A_{Fl} \cup \{d\}$ are called flexible conditions
Example of an action rule:
$[(b_1, v_1 \rightarrow w_1) \wedge (b_2, v_2 \rightarrow w_2) \wedge \dots \wedge (b_p, v_p \rightarrow w_p)](x) \Rightarrow [(d, k_1 \rightarrow k_2)](x)$
This means that if we change the value of attribute $b_1$ from $v_1$ to $w_1$, the value of attribute $b_2$ from $v_2$ to $w_2$, and so on up to the value of attribute $b_p$ from $v_p$ to $w_p$, then the value of the decision attribute $d$ will change from $k_1$ to the desired value $k_2$.
Assumption: $(\forall i)[(1 \le i \le p) \rightarrow (b_i \in A_{Fl})]$, in other words, the attributes $b_1, b_2, \dots, b_p$ are all flexible attributes.
* Note: the objects are the rows of the database (decision table), and the attributes are the columns of the database.

17 Action Rules
Decision Table:
X    a  b  c  d
x1   0  S  0  L
x2   0  R  1  L
x3   0  S  1  L
x4   0  R  1  L
x5   2  P  2  L
x6   2  P  2  L
x7   2  S  2  H
{a, c} - stable attributes, {b, d} - flexible attributes, d - decision attribute (its values are L = Low profitability customer and H = High profitability customer).
Rules discovered:
$r_1 = [(b, P) \rightarrow (d, L)]$
$r_2 = [(a, 2) \wedge (b, S) \rightarrow (d, H)]$
$(r_1, r_2)$-action rule: $[(b, P \rightarrow S)](x) \Rightarrow [(d, L \rightarrow H)](x)$
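
To make the construction on this slide concrete, the following sketch encodes the decision table, checks that the two classification rules hold on it, and pairs them into an action rule candidate. It is a minimal illustration under stated assumptions: the helper names (`matches`, `holds`, `combine`) are hypothetical, and returning the stable conditions of the second rule as a separate precondition is a choice made here, not something the slide specifies.

```python
# Decision table from the slide: a, c are stable, b is flexible, d is the decision.
TABLE = {
    "x1": {"a": 0, "b": "S", "c": 0, "d": "L"},
    "x2": {"a": 0, "b": "R", "c": 1, "d": "L"},
    "x3": {"a": 0, "b": "S", "c": 1, "d": "L"},
    "x4": {"a": 0, "b": "R", "c": 1, "d": "L"},
    "x5": {"a": 2, "b": "P", "c": 2, "d": "L"},
    "x6": {"a": 2, "b": "P", "c": 2, "d": "L"},
    "x7": {"a": 2, "b": "S", "c": 2, "d": "H"},
}
STABLE = {"a", "c"}
FLEXIBLE = {"b"}


def matches(row, cond):
    """True if the row has every attribute value listed in cond."""
    return all(row[attr] == val for attr, val in cond.items())


def holds(table, lhs, rhs):
    """True if some object matches lhs and every such object matches rhs."""
    matched = [row for row in table.values() if matches(row, lhs)]
    return bool(matched) and all(matches(row, rhs) for row in matched)


def combine(r1, r2, stable, flexible):
    """Pair two classification rules (lhs, rhs) into an action rule candidate.

    Returns (changes, decision_change, stable_requirements), or None when
    the rules disagree on a shared stable attribute.
    """
    (lhs1, rhs1), (lhs2, rhs2) = r1, r2
    shared = set(lhs1) & set(lhs2)
    if any(lhs1[a] != lhs2[a] for a in shared & stable):
        return None
    changes = {a: (lhs1[a], lhs2[a])
               for a in sorted(shared & flexible) if lhs1[a] != lhs2[a]}
    decision = {a: (rhs1[a], rhs2[a]) for a in rhs1}
    stable_req = {a: v for a, v in lhs2.items() if a in stable}
    return changes, decision, stable_req


R1 = ({"b": "P"}, {"d": "L"})
R2 = ({"a": 2, "b": "S"}, {"d": "H"})
changes, decision, stable_req = combine(R1, R2, STABLE, FLEXIBLE)

# Objects that currently match r1 and already satisfy r2's stable conditions.
candidates = sorted(x for x, row in TABLE.items()
                    if matches(row, R1[0]) and matches(row, stable_req))
print(changes, decision, stable_req, candidates)
```

Under these assumptions the flexible change is b: P to S with decision change d: L to H, which corresponds to the action rule on the slide, and the objects to which it could be applied are x5 and x6.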

18 Practical Examples
Next, we see some action rules extracted from 3 different databases, 2 in the medical domain and 1 in the financial domain:
- Binding to thrombin database
- Insurance company benchmark database
- Breast cancer database
However, we first need to introduce one more notion: the cost of an action rule.

19 Cost of Action Rule
Usually, there is a cost (monetary or moral) associated with undertaking some kind of action. For example:
- decreasing the interest rate on a customer account (re-classifying the customer from one interest rate group to another) may cost us mailing a letter to them and doing some internal administration on the account, say $5.
- relocating an employee from one city to another (re-classifying the employee from one division to another) may cost us the moving expense, in addition to a moral cost: some negative emotions of the employee about it may influence his/her future performance or perception of our organization.
We will denote the cost with $\rho$, a number from 0 to $+\infty$. The cost will be close to 0 if the action is trivial (very easy to accomplish), and it will be close to plus infinity ($+\infty$) if the action is very difficult (almost impossible) to accomplish.

20 Cost of Action Rule
Assumption: if we have an information system $S = (X, A, V)$, where $X$ are the objects, $A$ are the attributes, and $V$ are the values of the attributes, assume attribute $b \in A$ is flexible, and $b_1, b_2 \in V_b$ ($b_1$ and $b_2$ are some of the values of $b$).
By $\rho_S(X, b_1, b_2)$ we mean a number from $(0, +\infty]$ which describes the average predicted cost of an approved action associated with a possible re-classification of qualifying objects $X$ from class $b_1$ to class $b_2$.
Object $X$ qualifies for re-classification from $b_1$ to $b_2$ if $b(X) = b_1$ (currently the value of $b$ is $b_1$ for that object).

21 Cost of Action Rule
Action rule $r$: $[(b_1, v_1 \rightarrow w_1) \wedge (b_2, v_2 \rightarrow w_2) \wedge \dots \wedge (b_p, v_p \rightarrow w_p)](x) \Rightarrow (d, k_1 \rightarrow k_2)(x)$
The cost of the left hand side of the rule $r$ ($\mathrm{cost}_{Left}$) is equal to the sum of the costs of the terms listed in the left hand side:
$\mathrm{cost}_{Left} = \sum \{\rho_S(v_i, w_i) : 1 \le i \le p\}$
Action rule $r$ is feasible in $S$ if $\mathrm{cost}_{Left} < \rho_S(k_1, k_2)$.
For any feasible action rule $r$, the cost of the conditional part (left hand side) of $r$ is lower than the cost of its decision part (right hand side, where the decision attribute is listed).
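
The feasibility test on this slide amounts to a sum and a comparison. The sketch below shows one way to write it; the lookup table keyed by (attribute, from, to) and the function names are assumptions made for illustration. The first three costs are copied from the insurance output shown later in this transcript, and the "Relocation" entry is a made-up example of an almost impossible action.

```python
import math

# Hypothetical cost table: (attribute, value_from, value_to) -> cost.
RHO = {
    ("High level education", 2, 3): 0.00176027,
    ("Social class B1", 2, 4): 0.0146667,
    ("Contribution car policies", 5, 6): 0.4,
    ("Relocation", "city A", "city B"): math.inf,  # made-up, near-impossible action
}


def left_cost(lhs_terms, rho):
    """Sum of the costs of the terms on the left hand side of an action rule."""
    return sum(rho[term] for term in lhs_terms)


def is_feasible(lhs_terms, decision_term, rho):
    """A rule is feasible if its left hand side costs less than its decision part."""
    return left_cost(lhs_terms, rho) < rho[decision_term]


lhs = [("High level education", 2, 3), ("Social class B1", 2, 4)]
goal = ("Contribution car policies", 5, 6)
cost = left_cost(lhs, RHO)
feasible = is_feasible(lhs, goal, RHO)
infeasible = is_feasible([("Relocation", "city A", "city B")], goal, RHO)
print(cost, feasible, infeasible)
```

Using `math.inf` for an action that cannot realistically be carried out keeps the comparison well defined: any rule containing such a term is never feasible.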

22 Binding to Thrombin Database
The first database, the Binding to Thrombin database, is used for drug design and was provided in the KDD Cup 2001 competition.
Drugs are typically small organic molecules that achieve their desired activity by binding to a target site on a receptor. The first step in the discovery of a new drug is usually to identify and isolate the receptor to which it should bind, followed by testing many small molecules for their ability to bind to the target site.
This leaves researchers with the task of determining what separates the active (binding) compounds from the inactive (non-binding) ones. Such a determination can then be used in the design of new compounds that not only bind, but also have all the other properties required for a drug (solubility, oral absorption, lack of side effects, appropriate duration of action, toxicity, etc.).

23 Binding to Thrombin Database
The data set consists of 1909 compounds (the objects/rows in the database) tested for their ability to bind to a target site on thrombin, a key receptor in blood clotting.
Each compound is described by binary features (the attributes/columns in the database), which describe three-dimensional properties of the molecule. Biological activity in general, and receptor binding affinity in particular, correlate with various structural and physical properties of small organic molecules. The task in KDD Cup 2001 was to determine which of these properties are critical in this case and to learn to accurately predict the class value: Active or Inactive.
In this testing we use the class attribute, which has value A for active and I for inactive, as the re-classification attribute for the action rules. In this way, we suggest to the user which molecular properties can be changed in order to reclassify the chemical compound from the inactive to the active class, so that it binds to thrombin.

24 Binding to Thrombin Database
The following results were found with LowestCostReclassifier (software for extracting action rules of lowest cost):
decisionAttribute = activity
valueFrom = 0
valueTo = 1
dataFile= thrombin
numberOfObjects: 1908
minConfidenceL1 = 0.65
minFeasibilityL3 = 0.0001
knownCost = 0.3
maxCostL2 = 0.01
---- Goal Node:
(f304, 1->0 | 0.12327) => (activity, 0->1 | 0.3) 1
(f172, 0->1 | 0.00472118) => (f304, 1->0 | 0.12327) 0.998424
---- Action Rule of Min Cost Found: ----
(f172, 0->1 | 0.00472118) => (activity, 0->1 | 0.3) 0.998424
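
One way to read this output is that a term on the left hand side of the goal rule is replaced by the left hand side of a cheaper rule that produces the same change. The sketch below illustrates only that substitution step on the thrombin numbers. It is an interpretation of the printed output, not the LowestCostReclassifier implementation; the function name and data layout are hypothetical, and how confidences are combined is deliberately left out because the output does not make it clear.

```python
def substitute_cheaper(goal_lhs, helper_rules):
    """Replace costly left-hand-side terms with cheaper rules that imply them.

    goal_lhs: dict mapping a term (attribute, from, to) to its cost.
    helper_rules: list of (lhs, target) pairs, where lhs is a dict of
    term -> cost and target is the term that the rule produces.
    """
    new_lhs = dict(goal_lhs)
    for lhs, target in helper_rules:
        # Substitute only when the helper rule is strictly cheaper than the term.
        if target in new_lhs and sum(lhs.values()) < new_lhs[target]:
            del new_lhs[target]
            new_lhs.update(lhs)
    return new_lhs


# Values copied from the thrombin output above.
goal_lhs = {("f304", 1, 0): 0.12327}
helpers = [({("f172", 0, 1): 0.00472118}, ("f304", 1, 0))]

result = substitute_cheaper(goal_lhs, helpers)
total_cost = sum(result.values())
print(result, total_cost)
```

With these inputs the left hand side becomes the single term (f172, 0->1) at cost 0.00472118, which is below the maxCostL2 threshold of 0.01 printed in the output.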

25 Insurance Company Benchmark database
The next database is in the financial domain: the Insurance Company Benchmark (COIL 2000) database used in the CoIL 2000 Challenge.
The data contains 5,822 tuples (the customers/rows in the database). The features (the attributes/columns in the database) include product usage data and socio-demographic data derived from zip area codes.
The data was supplied by the Dutch data mining company Sentient Machine Research and is based on a real-world business problem.
In our testing the user would like to reclassify the attribute Contribution car policies from a value of 5 to 6.

26 Insurance Company Benchmark database
The following results were found with LowestCostReclassifier:
decisionAttribute = Contribution car policies
valueFrom = 5
valueTo = 6
dataFile= Insurance
numberOfObjects: 5822
minConfidenceL1 = 0.72
minFeasibilityL3 = 0.0001
knownCost = 0.4
maxCostL2 = 0.02
---- Goal Node:
(Private health insurance, 3->4 | 0.02423844) => (Contribution car policies, 5->6 | 0.4) 0.833333
(High level education, 2->3 | 0.00176027) ^ (Social class B1, 2->4 | 0.0146667) => (Private health insurance, 3->4 | 0.3) 0.714286
---- Action Rule of Min Cost Found: ----
(High level education, 2->3 | 0.00176027) ^ (Social class B1, 2->4 | 0.0146667) => (Contribution car policies, 5->6 | 0.4) 0.714286

27 Breast Cancer Database
Another database used is a breast cancer database. It was obtained at the University of Wisconsin Hospitals, Madison, from Dr. William H. Wolberg.
It contains a class attribute which classifies the tumor as benign or malignant. The rest of the attributes (columns in the database) describe common factors that radiologists and pathologists examine in order to make the diagnosis, such as Clump Thickness, Uniformity of Cell Shape, Bare Nuclei, etc.
The database has 699 instances (the objects/rows in the database): Benign: 458 (65.5%) and Malignant: 241 (34.5%).
The class attribute is used as the re-classification attribute with LowestCostReclassifier. This way, we will provide suggestions/actions to be undertaken in order to change the class from malignant to benign.

28 Breast Cancer Database
The following results were found with LowestCostReclassifier:
decisionAttribute = Class
valueFrom = 4
valueTo = 2
dataFile= brcancer
numberOfObjects: 699
minConfidenceL1 = 0.7
minFeasibilityL3 = 0.0001
knownCost = 0.3
maxCostL2 = 0.00059
...
!!!!!!!List of sifted action rules is empty i.e. no feasible action rules were found, so this state/child has no descendents: (Marginal Adhesion, 1->3 | 0.00128449)
@@@@@@@@ Traversing next node
The list Q of nodes traversed was empty.

29 Breast Cancer Database
---- Best Node ----
(Uniformity of Cell Size, 8->1 | 0.00179706) ^ (Normal Nucleoli, 5->3 | 0.00721622) => (Class, 4->2 | 0.3) 1
(Marginal Adhesion, 1->3 | 0.00128449) => (Uniformity of Cell Size, 8->1 | 0.00179706) 0.85
---- Action Rule of Min Cost Found: ----
(Marginal Adhesion, 1->3 | 0.00128449) => (Class, 4->2 | 0.3) 0.85
In this case, the goal could not be reached, possibly because the specified maximum cost, 0.00059, was too low. However, the best node found so far was still returned, and its cost, 0.00128449, is still lower than the cost currently known to the user, 0.3.
