SDS-Rules and Classification Tomáš Karban ECML/PKDD 2003 – Dubrovnik (Cavtat) September 22, 2003.


1 SDS-Rules and Classification Tomáš Karban ECML/PKDD 2003 – Dubrovnik (Cavtat) September 22, 2003

2 Motivation for SDS-rules
- STULONG project (http://euromise.vse.cz/challenge2003/)
- Middle-aged men studied with respect to atherosclerosis (heart disease) risk factors
- Studying differences between the normal group and the risk group of patients
- Comparing groups of patients with different physical, social, family and biochemical backgrounds
- Searching for pairs of sets that differ markedly in a selected property

3 SDS-Rules
- SDS-rules can be understood as an extension of association rules
- SDS-rules have the form ≈(α, β, γ)
- α and β define two disjoint sets A and B
- γ defines some property
- The symbol ≈ stands for the SDS-quantifier, which defines the relation between the two sets with respect to the property γ

4 Extend the Four-Fold Table
The table of frequencies is extended to a six-fold table; the third row counts the objects outside the sets A and B.

              γ    ¬γ
  α           a    b
  β           c    d
  ¬α ∧ ¬β     e    f
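The six frequencies could be counted along the following lines. This is only a minimal Python sketch, not the tool used in the presentation: it assumes each patient is a dictionary of attribute values and that the two sets and the property are supplied as predicates. The names `six_fold_table`, `in_a`, `in_b` and `has_property` are hypothetical.

```python
from typing import Callable, Dict, Iterable, Tuple

Record = Dict[str, str]
Predicate = Callable[[Record], bool]


def six_fold_table(
    records: Iterable[Record],
    in_a: Predicate,
    in_b: Predicate,
    has_property: Predicate,
) -> Tuple[int, int, int, int, int, int]:
    """Count the frequencies (a, b, c, d, e, f) of a six-fold table.

    Rows: set A, set B, objects outside both sets.
    Columns: property present, property absent.
    Assumption: A and B are disjoint, so in_a and in_b never both hold.
    """
    counts = [0] * 6
    for record in records:
        if in_a(record):
            row = 0
        elif in_b(record):
            row = 1
        else:
            row = 2
        col = 0 if has_property(record) else 1
        counts[2 * row + col] += 1
    return tuple(counts)
```

The frequencies e and f are kept even though a quantifier may ignore them, so the same table can serve different quantifiers.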

5 SDS-Quantifiers
Symmetric Additive Difference: the distribution of the property γ differs in absolute value between set A and set B by more than p, and both sets have a reasonable size.
With this quantifier we are able to find those pairs of sets that differ significantly in the property γ.
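One possible reading of this quantifier is sketched below in Python. The exact formula is not preserved in the transcript, so the sketch rests on two assumptions: "distribution of the property" is taken as the relative frequency of the property within each set, and "reasonable size" is taken as a minimum object count per set. The function name and the `min_size` parameter are hypothetical.

```python
def symmetric_additive_difference(
    a: int, b: int, c: int, d: int, p: float, min_size: int
) -> bool:
    """Check one assumed form of the symmetric additive difference.

    a, b: objects of set A with / without the property
    c, d: objects of set B with / without the property
    p: required absolute difference of the relative frequencies
    min_size: assumed minimum number of objects in each set
    """
    size_a = a + b
    size_b = c + d
    # Both sets must be large enough (and non-empty, to avoid dividing by zero).
    if min(size_a, size_b) < max(min_size, 1):
        return False
    return abs(a / size_a - c / size_b) > p
```

Because the absolute value is used, swapping the two sets gives the same answer, which matches the "symmetric" in the name.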

6 Definition of Disjoint Sets
Syntactical rule for α and β: one common attribute is forced.
- Example 1 (common attribute: smoking):
  α: smoking(no) & beer(half liter a day)
  β: smoking(5-10 cigarettes) & coffee(2 cups a day)
- Example 2 (no common attribute):
  α: smoking(no) & beer(half liter a day)
  β: coffee(2 cups a day) & BMI(>25)
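The syntactical rule can be checked mechanically. The Python sketch below is an illustration under stated assumptions, not the presented implementation: each cedent is a conjunction stored as a dictionary from attribute to a single category, every patient has exactly one category per attribute, and categories of one attribute do not overlap. The function name is hypothetical.

```python
from typing import Dict


def syntactically_disjoint(alpha: Dict[str, str], beta: Dict[str, str]) -> bool:
    """Return True if the two cedents share an attribute with different values.

    Under the stated assumptions no patient can satisfy both conjunctions,
    so the sets they define are guaranteed to be disjoint.
    """
    return any(
        attribute in beta and beta[attribute] != value
        for attribute, value in alpha.items()
    )


example_1 = (
    {"smoking": "no", "beer": "half liter a day"},
    {"smoking": "5-10 cigarettes", "coffee": "2 cups a day"},
)
example_2 = (
    {"smoking": "no", "beer": "half liter a day"},
    {"coffee": "2 cups a day", "BMI": ">25"},
)
```

With these inputs the check accepts Example 1 and rejects Example 2, where a patient could belong to both sets.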

7 Analytical Questions
- Are there any strong relations between the entry examination and the cause of death?
- Are there differences in the entry examination between men of the risk group who came down with an observed cardiovascular disease (during control examinations) and those who stayed healthy?

8 SDS Results (1)
If we compare the group of patients who are divorced, have reached apprentice school education and have other responsibility in their jobs with the second group of patients, who are already pensioners, there is a 53.8% difference in the presence of other cause of death.

9 SDS Results (2)
If we compare the group of patients who came down with some cardiovascular disease during the control checks with those who stayed healthy, we see that the second group had 3.97% more patients working in a managerial position.

10 SDS Results (3)
Comparing the group of patients who do not drink beer and have a BMI equal to or greater than 27 against the group of patients who drink more than 1 liter of beer a day and have a cholesterol level between 200 and 250 mg, we can see that 36.0% more patients came down with some cardiovascular disease in the first group.

11 Conclusion for SDS-rules
- There are hundreds or even thousands of SDS-rules in every presented task.
- SDS-rules of one task are often very similar.
- How important is a particular attribute in a cedent conjunction?
- “SDS-rule neighborhood browsing”
- Semi-automatically generalize or refine the acquired knowledge
- Attributes were divided into logical groups; inter-group relations were not studied. Consult an expert if some important problem is not covered.

12 Classification
- Estimate the cause of death based on the attributes from the entry examination
- Estimate whether the patient stayed healthy during control examinations, based on the attributes from the entry examination
- Weka 3.2.3 was used
- J48 decision tree and rules, neural net, Bayes classifier + stacking

13 Classification Results (1)
- Poor results: all models tend to assign every patient to the biggest cause-of-death class, tumorous disease (29.92%)
- ⇒ Insufficient information to successfully estimate the cause of death

14 Classification Results (2)
- Estimating whether a patient stayed healthy during control examinations failed in the same way
- The success rate was comparable to the size of the biggest class (those who stayed healthy), approx. 66%
- ⇒ Insufficient information to successfully estimate whether the patient stayed healthy
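The comparison against the size of the biggest class amounts to a majority-class baseline. The Python sketch below illustrates that baseline only; the presentation itself used Weka, and the function name and the toy labels are hypothetical, chosen to mirror the approximate 66% figure.

```python
from collections import Counter
from typing import Hashable, Sequence


def majority_baseline_accuracy(labels: Sequence[Hashable]) -> float:
    """Accuracy of a classifier that always predicts the most frequent class."""
    if not labels:
        raise ValueError("labels must not be empty")
    most_common_count = Counter(labels).most_common(1)[0][1]
    return most_common_count / len(labels)


# Toy labels (not the STULONG data): 66 of 100 patients stayed healthy.
toy_labels = ["healthy"] * 66 + ["ill"] * 34
baseline = majority_baseline_accuracy(toy_labels)
```

A model whose accuracy does not clearly exceed this baseline has learned little beyond the class distribution, which is the situation the slide describes.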

15 References
- Hájek, Havránek: Mechanizing Hypothesis Formation – Mathematical Foundations for a General Theory (Springer-Verlag, 1978)
- Rauch, Šimůnek: Alternative Approach to Mining Association Rules (in proceedings of the workshop ICDM02, Japan, 2002)
- The STULONG project: http://euromise.vse.cz/stulong-en/

16 SDS Results (1)

17 SDS Results (2)

18 SDS Results (3)

