SDS-Rules and Association Rules
March 17, 2004, Nicosia, Cyprus
Tomáš Karban (1), Jan Rauch (2), Milan Šimůnek (2)
(1) Charles University, Prague, Dept. of Software Engineering
(2) University of Economics, Prague, Dept. of Information and Knowledge Engineering
ACM Symposium on Applied Computing, SAC 2004
Agenda
Introduction to association rules. Motivation of SDS-rules. SDS-rules in detail: SDS-quantifiers, disjoint sets. Implementation technique. Application on medical data. Conclusion.
Association Rules (1)
An association rule expresses a relation between a premise (antecedent) and a consequence (succedent). The antecedent and the succedent are Boolean attributes derived as conjunctions from columns of the studied data table (rows = objects). The quantifier of the rule is a truth condition based on the contingency table of the antecedent and the succedent. Example: account(low) & salary(low) ⇒ (90%) loan_quality(bad)
Association Rules (2)
Contingency table:
                    succedent    not succedent
  antecedent            a              b
  not antecedent        c              d
Example quantifier: founded implication. Various quantifiers are available: implications, double implications, equivalence, statistical hypothesis tests, above/outside average relations, etc.
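The formula for the founded implication did not survive in this transcript. As a minimal sketch, assuming the commonly used definition (a/(a+b) ≥ p and a ≥ Base), the truth condition could be evaluated on the contingency table as follows. The function name and the parameter names `p` and `base` are illustrative, not taken from the slides.

```python
def founded_implication(a: int, b: int, p: float, base: int) -> bool:
    """Assumed definition: a/(a+b) >= p and a >= base.

    a = objects satisfying both antecedent and succedent,
    b = objects satisfying the antecedent but not the succedent.
    """
    if a + b == 0:
        # No object satisfies the antecedent, so the rule cannot hold.
        return False
    return a / (a + b) >= p and a >= base


# Hypothetical frequencies, chosen only for illustration.
holds = founded_implication(a=90, b=10, p=0.9, base=50)
print(holds)
```

Note that only the first row of the table (a and b) enters this particular quantifier; other quantifiers from the list use all four frequencies.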
Motivation of SDS-rules
SDS-rules describe interesting relations between pairs of disjoint sets (usually capturing their difference). They are used in a similar way to association rules and with the same methods. Examples: get pairs of sets that differ significantly in a selected property; get all properties that differ on a fixed pair of sets; a combination of both. The motivation comes directly from the demands of the STULONG project (atherosclerosis risk factors).
SDS-Rules (1)
SDS-rules can be understood as an extension of association rules. An SDS-rule combines an SDS-quantifier with three Boolean attributes. Two of the attributes define two disjoint sets A and B; the third defines some property. The SDS-quantifier defines the relation of the two sets in that property.
SDS-Rules (2)
The table of frequencies is extended to a six-fold table (called the “SDS-table”):
                      property    not property
  first set               a             b
  second set              c             d
  outside both sets       e             f
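To make the six frequencies concrete, here is a minimal sketch that counts them directly from rows of a data table. The column layout (a, c, e for objects with the property; b, d, f for objects without it) is an assumption about the original slide, and the function name, the predicates, and the sample rows are hypothetical.

```python
from collections import namedtuple

# Assumed layout: (a, b) first set, (c, d) second set, (e, f) outside both.
SDSTable = namedtuple("SDSTable", "a b c d e f")


def sds_table(rows, in_first, in_second, has_property):
    """Count the six SDS-table frequencies for the given predicates."""
    counts = [0] * 6
    for row in rows:
        first, second = in_first(row), in_second(row)
        if first and second:
            raise ValueError("the two sets must be disjoint")
        group = 0 if first else 1 if second else 2
        column = 0 if has_property(row) else 1
        counts[2 * group + column] += 1
    return SDSTable(*counts)


# Hypothetical data, for illustration only.
rows = [
    {"account": "low", "salary": "mid", "loan": "bad"},
    {"account": "low", "salary": "mid", "loan": "good"},
    {"account": "high", "salary": "low", "loan": "bad"},
    {"account": "mid", "salary": "low", "loan": "bad"},
    {"account": "low", "salary": "high", "loan": "good"},
    {"account": "high", "salary": "low", "loan": "good"},
]
table = sds_table(
    rows,
    in_first=lambda r: r["account"] == "low" and r["salary"] == "mid",
    in_second=lambda r: r["salary"] == "low",
    has_property=lambda r: r["loan"] == "bad",
)
print(table)
```

The six counts always sum to the number of rows, because every object falls into exactly one of the three groups.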
Asymmetric Multiplicative Difference Quantifier
The percentage of objects with the property in the first set is at least k times the percentage in the second set. Both sets have a size bigger than Base.
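The formula itself is missing from the transcript. A minimal sketch of the verbal condition, assuming the SDS-table layout where a, b count the first set (with / without the property) and c, d count the second set, might look like this. The strict comparison with `base` follows the wording “bigger than Base”; the original may well use ≥ instead.

```python
def asymmetric_multiplicative_difference(a, b, c, d, k, base):
    """Assumed reading: a/(a+b) >= k * c/(c+d), both set sizes > base."""
    first_size = a + b
    second_size = c + d
    if first_size <= base or second_size <= base:
        return False
    # Cross-multiplied form of a/first_size >= k * c/second_size,
    # which avoids division (base is assumed to be non-negative).
    return a * second_size >= k * c * first_size


# Hypothetical frequencies: 40% vs. 10% of objects have the property.
print(asymmetric_multiplicative_difference(40, 60, 10, 90, k=3, base=50))
```

The quantifier is asymmetric: swapping the two sets generally changes the result.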
Symmetric Additive Difference Quantifier
The percentage of objects with the property differs between the first and the second set by at least p. Both sets have a size bigger than Base.
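As with the previous quantifier, the formula is missing from the transcript, so the following is only a sketch of the verbal condition. It assumes that a, b count the first set (with / without the property), c, d count the second set, and that p is given as a fraction between 0 and 1.

```python
def symmetric_additive_difference(a, b, c, d, p, base):
    """Assumed reading: |a/(a+b) - c/(c+d)| >= p, both set sizes > base."""
    first_size = a + b
    second_size = c + d
    if first_size <= base or second_size <= base:
        return False
    return abs(a / first_size - c / second_size) >= p


# Hypothetical frequencies: 40% vs. 10% of objects have the property.
print(symmetric_additive_difference(40, 60, 10, 90, p=0.25, base=50))
```

Because of the absolute value, swapping the first and the second set gives the same truth value, which is what makes the quantifier symmetric.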
Disjoint sets
An empty intersection of the two sets can be arranged syntactically by forcing a common attribute into both set definitions. If the coefficients (i.e. values of the attribute) of the common attribute are disjoint, the sets are disjoint. Example: account(low) & salary(mid) versus salary(low) & sex(male)
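The syntactic check can be sketched as follows, assuming each set definition is a conjunction of literals represented as a mapping from attribute name to its set of allowed values (coefficient). This representation and the function name are hypothetical, chosen only for illustration.

```python
def syntactically_disjoint(first, second):
    """True if some attribute common to both definitions has disjoint coefficients.

    Each definition maps attribute name -> set of allowed values.
    """
    common = first.keys() & second.keys()
    return any(first[attr].isdisjoint(second[attr]) for attr in common)


# The example from the slide: the common attribute is salary.
first = {"account": {"low"}, "salary": {"mid"}}
second = {"salary": {"low"}, "sex": {"male"}}
print(syntactically_disjoint(first, second))
```

The check is sufficient but not necessary: two definitions without a common attribute may still select disjoint sets of objects in a particular data table, but that cannot be seen from the syntax alone.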
Implementation Technique
Data representation: bit strings for every value of every attribute being used. Bit string length = number of objects in the data table. A value of “1” at position i of bit string s(x) means that object i has value x for attribute s. Fast operations on bit strings: AND, OR, NOT. Bit strings are built for the first set, the second set, and the studied property. Calculation of the SDS-table: counting the “1” bits in bit strings. Truth value of an SDS-rule: an expression on frequencies from the SDS-table. The approach is memory conservative.
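The bit-string technique can be illustrated with a minimal sketch that uses Python integers as bit strings. This is not the authors' implementation; the function names, the sample columns, and the ordering of the six frequencies (a, b, c, d, e, f) are assumptions.

```python
def bit_strings(column):
    """Map each value of one attribute to an int whose bit i is 1
    if and only if object i has that value."""
    strings = {}
    for i, value in enumerate(column):
        strings[value] = strings.get(value, 0) | (1 << i)
    return strings


def ones(bits):
    """Count the 1 bits in a non-negative integer."""
    return bin(bits).count("1")


def sds_table(first, second, prop, n):
    """Return (a, b, c, d, e, f) for bit strings of length n."""
    mask = (1 << n) - 1               # keeps NOT within n bits
    outside = ~(first | second) & mask
    not_prop = ~prop & mask
    return tuple(
        ones(group & column)
        for group in (first, second, outside)
        for column in (prop, not_prop)
    )


# Hypothetical data table with six objects.
account = bit_strings(["low", "low", "high", "mid", "low", "high"])
salary = bit_strings(["mid", "mid", "low", "low", "high", "low"])
loan = bit_strings(["bad", "good", "bad", "bad", "good", "good"])

first_set = account["low"] & salary["mid"]   # account(low) & salary(mid)
second_set = salary["low"]                   # salary(low)
table = sds_table(first_set, second_set, loan["bad"], n=6)
print(table)
```

Conjunctions become a single AND over whole bit strings, so the cost of one SDS-table is a handful of bitwise operations plus six bit counts, independent of how many attributes the data table has.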
Application on Medical Data
The STULONG project (a “longitudinal study”) studied the prevalence of risk factors of atherosclerosis in 1400 middle-aged men, with a detailed entry examination and 20 years of checkups. Among many other analytical questions: Are there strong relations between the entry examination and the cause of death? Are there differences in the entry examination between men of the risk group who came down with an observed cardiovascular disease (during control examinations) and those who stayed healthy?
Results (1)
If we compare the group of patients who are divorced, have completed apprentice school, and have “other” responsibility in their jobs with the second group of patients, who are already pensioners, there is a 53.8% difference in the presence of “other” cause of death.
Results (2)
If we compare the group of patients who came down with some cardiovascular disease during the control checks with those who stayed healthy, we see that the second group contained 3.97% more patients working in a managerial position.
Conclusion
A new method of describing potentially interesting patterns by SDS-rules was presented. The method was inspired by and applied to medical data; other application domains can benefit as well. The method is computationally efficient. Drawback: the results are usually large, and the SDS-rules produced are similar in certain domains (“nuggets”), which calls for an additional software tool for “online result browsing”. Development of statistical SDS-quantifiers is in progress.