# Mining for Patterns Based on Contingency Tables by KL-Miner: First Experience

## Presentation transcript

### Mining for Patterns Based on Contingency Tables by KL-Miner: First Experience

Jan Rauch, Milan Šimůnek (PhD student), Václav Lín (student); University of Economics, Prague

### KL-Miner, First Experience (FDM 2003)

- KL-Miner basic features
- Application example
- Implementation principles
- Scalability
- Concluding remarks

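### KL-Miner: data and patterns

Data: a data matrix M with attributes $A_1, \dots, A_P$ and rows (objects) $o_1, \dots, o_n$, for example:

| M | $A_1$ | $A_2$ | … | $A_P$ |
|---|---|---|---|---|
| $o_1$ | 2 | 12 | … | 1 |
| $o_2$ | 1 | 5 | … | 4 |
| … | … | … | … | … |
| $o_n$ | 3 | 9 | … | 2 |

Patterns, i.e. KL-hypotheses $R \sim C / \gamma$, where:

- $R \in \{A_1, \dots, A_P\}$ is the row attribute, with possible values (categories) $r_1, \dots, r_K$;
- $C \in \{A_1, \dots, A_P\}$ is the column attribute, with categories $c_1, \dots, c_L$;
- $\gamma$ is a Boolean attribute (the condition) derived from the other attributes $A_1, \dots, A_P$;
- $\sim$ is a KL-quantifier, i.e. a condition imposed on the contingency table of $R$ and $C$.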

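### KL-quantifiers

Contingency table of $R$ and $C$, where $n_{k,l}$ is the number of rows that satisfy $\gamma$ and have $R = r_k$ and $C = c_l$:

| | $c_1$ | … | $c_L$ |
|---|---|---|---|
| $r_1$ | $n_{1,1}$ | … | $n_{1,L}$ |
| … | … | … | … |
| $r_K$ | $n_{K,1}$ | … | $n_{K,L}$ |

Examples of quantifiers:

- simple aggregate functions;
- Kendall's quantifier, e.g. $|\tau_b| \ge p$.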
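To make the table concrete, here is a minimal sketch of how such a contingency table could be filled by a plain scan over the data matrix, assuming each row is a Python dict from attribute name to category and that the condition $\gamma$ is given as a predicate on a row. The names `contingency_table` and `Row` and the toy data are hypothetical; this naive scan is for illustration only and is not how KL-Miner computes the table.

```python
from collections.abc import Callable, Sequence

# Hypothetical encoding of a row of M: attribute name -> category.
Row = dict[str, int]


def contingency_table(
    rows: Sequence[Row],
    r_attr: str,
    r_cats: Sequence[int],
    c_attr: str,
    c_cats: Sequence[int],
    condition: Callable[[Row], bool],
) -> list[list[int]]:
    """Hypothetical helper: K x L table n[k][l] of rows satisfying `condition`
    with R == r_cats[k] and C == c_cats[l] (naive row-by-row scan)."""
    r_index = {cat: k for k, cat in enumerate(r_cats)}
    c_index = {cat: l for l, cat in enumerate(c_cats)}
    table = [[0] * len(c_cats) for _ in r_cats]
    for row in rows:
        if condition(row):  # only rows satisfying gamma enter the table
            table[r_index[row[r_attr]]][c_index[row[c_attr]]] += 1
    return table


# Invented toy data: attribute G plays the role of the condition gamma.
rows = [
    {"A1": 1, "A2": 1, "G": 1},
    {"A1": 1, "A2": 2, "G": 1},
    {"A1": 2, "A2": 2, "G": 1},
    {"A1": 2, "A2": 2, "G": 0},
]
print(contingency_table(rows, "A1", [1, 2], "A2", [1, 2], lambda r: r["G"] == 1))
# prints [[1, 1], [0, 1]]
```

A quantifier is then simply a predicate evaluated on the returned table.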

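### Kendall's quantifier

- $\tau_b \in \langle -1; 1 \rangle$
- $\tau_b > 0$: positive ordinal dependence
- $\tau_b < 0$: negative ordinal dependence
- $\tau_b = 0$: ordinal independence
- $|\tau_b| = 1$: $C$ is a function of $R$

Kendall's quantifier: e.g. $|\tau_b| \ge p$ or $|\tau_b| \le p$, where Kendall's coefficient is

$$\tau_b = \frac{2(N_c - N_d)}{\sqrt{\left(n^2 - \sum_k n_{k,*}^2\right)\left(n^2 - \sum_l n_{*,l}^2\right)}}$$

with $n_{k,*} = \sum_l n_{k,l}$, $n_{*,l} = \sum_k n_{k,l}$, $n = \sum_{k,l} n_{k,l}$, $N_c = \sum_{k,l} n_{k,l} \sum_{i>k} \sum_{j>l} n_{i,j}$ (concordant pairs) and $N_d = \sum_{k,l} n_{k,l} \sum_{i>k} \sum_{j<l} n_{i,j}$ (discordant pairs).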
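A direct, unoptimized sketch of the standard $\tau_b$ formula for a $K \times L$ contingency table, assuming rows and columns are listed in increasing category order. The function name `kendall_tau_b` is hypothetical, and the quadruple loop is chosen for readability, not speed; KL-Miner's own evaluation may differ.

```python
import math
from collections.abc import Sequence


def kendall_tau_b(n: Sequence[Sequence[int]]) -> float:
    """Hypothetical helper: Kendall's tau-b of a K x L table n[k][l];
    rows and columns must be in increasing category order."""
    K, L = len(n), len(n[0])
    total = sum(map(sum, n))
    row_sums = [sum(row) for row in n]
    col_sums = [sum(n[k][l] for k in range(K)) for l in range(L)]
    concordant = discordant = 0
    for k in range(K):
        for l in range(L):
            for i in range(k + 1, K):  # cells in rows strictly below row k
                for j in range(L):
                    if j > l:
                        concordant += n[k][l] * n[i][j]
                    elif j < l:
                        discordant += n[k][l] * n[i][j]
    denom = math.sqrt(
        (total**2 - sum(r * r for r in row_sums))
        * (total**2 - sum(c * c for c in col_sums))
    )
    if denom == 0:
        raise ValueError("tau_b is undefined when R or C takes a single value")
    return 2 * (concordant - discordant) / denom


print(kendall_tau_b([[5, 0], [0, 5]]))  # 1.0
print(kendall_tau_b([[0, 5], [5, 0]]))  # -1.0
print(kendall_tau_b([[1, 1], [1, 1]]))  # 0.0
```

With this helper, a quantifier such as $|\tau_b| \ge p$ reduces to `abs(kendall_tau_b(table)) >= p`.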

### KL-Miner application example

STULONG project: 1419 patients, entry examination. See http://euromise.vse.cz

### STULONG attribute examples (1)

- Systolic blood pressure
- Smoking
- Group of patients

### STULONG attribute examples (2)

- Skinfold above musculus triceps (mm)
- Beer: amount per day

219 attributes in total, 38 of them ordinal; we use 17 ordinal attributes.

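### Example: analytic question

Are there any ordinal dependencies among attributes under some conditions? We look for KL-hypotheses $R \sim C / \gamma$ such that:

- at least 50 patients satisfy the condition $\gamma$;
- $|\tau_b| \ge 0.75$;
- $\gamma$ is one of the relevant conditions.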
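The quantifier of this analytic question can be sketched as a predicate on the contingency table. This is a minimal illustration, assuming the "at least 50 patients" requirement is checked as the total count of the table; the function `analytic_question_holds` and its parameter names are hypothetical, and $\tau_b$ is passed in as a callable so the sketch stays independent of any particular implementation.

```python
from collections.abc import Callable, Sequence

Table = Sequence[Sequence[int]]


def analytic_question_holds(
    table: Table,
    tau_b: Callable[[Table], float],
    min_patients: int = 50,
    min_abs_tau: float = 0.75,
) -> bool:
    """Hypothetical quantifier: at least `min_patients` rows in the table
    and |tau_b| >= min_abs_tau."""
    if sum(map(sum, table)) < min_patients:
        return False  # cheap frequency check first; tau_b is not computed
    return abs(tau_b(table)) >= min_abs_tau


big = [[30, 0], [0, 30]]  # 60 patients (invented)
small = [[5, 0], [0, 5]]  # 10 patients (invented)
print(analytic_question_holds(big, lambda t: 0.82))    # True
print(analytic_question_holds(small, lambda t: 0.82))  # False
print(analytic_question_holds(big, lambda t: -0.78))   # True
print(analytic_question_holds(big, lambda t: 0.5))     # False
```

Checking the patient count before computing $\tau_b$ lets the cheaper test discard most candidate conditions early.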

### Example: relevant condition specification (1)

Relevant conditions include:

- Group of patients (normal), Group of patients (risk), …
- Beer 10 (yes), Beer 12 (yes), …, Beer 10 (yes) ∧ Beer 12 (yes)
- sliding windows …
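One way to enumerate such conditions is sketched below, assuming a condition is a conjunction of at most two literals of the form attribute(category), each attribute used at most once. The function `conjunctions`, the `BoolLiteral` encoding and the length limit are hypothetical; the slide does not state how KL-Miner generates its conditions.

```python
from itertools import combinations

# Hypothetical encoding of a literal: (attribute, category), e.g. ("Beer 10", "yes").
BoolLiteral = tuple[str, str]


def conjunctions(
    literals: list[BoolLiteral], max_len: int = 2
) -> list[tuple[BoolLiteral, ...]]:
    """Hypothetical helper: all conjunctions of 1..max_len literals
    that use each attribute at most once."""
    result = []
    for size in range(1, max_len + 1):
        for combo in combinations(literals, size):
            if len({attr for attr, _ in combo}) == size:  # skip e.g. normal AND risk
                result.append(combo)
    return result


pool = [
    ("Group of patients", "normal"),
    ("Group of patients", "risk"),
    ("Beer 10", "yes"),
    ("Beer 12", "yes"),
]
for conj in conjunctions(pool):
    print(" AND ".join(f"{a}({c})" for a, c in conj))
```

The four literals yield four single-literal conditions and five two-literal conjunctions; the contradictory pair Group of patients (normal) AND Group of patients (risk) is skipped.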

### Example: relevant condition specification (2)

Sliding window over the ordered categories 4, 5, 6, 7, 9, 10, 11, 12, 13, 14, 15, …, 43, 44, 45, 46, 47, 48, 49, 50.
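A sliding window can be sketched as every run of consecutive categories of a fixed length, shifted by one position. The helper `sliding_windows` and the window length used below are hypothetical, since the slide gives neither the length nor the step.

```python
def sliding_windows(categories: list[int], length: int) -> list[list[int]]:
    """Hypothetical helper: all runs of `length` consecutive categories,
    moving the window by one position."""
    if not 1 <= length <= len(categories):
        raise ValueError("window length must be between 1 and the number of categories")
    return [categories[i:i + length] for i in range(len(categories) - length + 1)]


# Each window w defines a Boolean attribute "A in w", usable as a condition.
print(sliding_windows([4, 5, 6, 7, 9], 3))  # [[4, 5, 6], [5, 6, 7], [6, 7, 9]]
```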

### Example: output overview

- Run time: 2 min 1 s
- 550 310 verifications
- 25 hypotheses found
- Hardware: 3.06 GHz, 512 MB DDR SDRAM
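A rough reading of these figures, assuming the reported run time covers all verifications, gives the average throughput:

$$\frac{550\,310 \text{ verifications}}{2 \text{ min } 1 \text{ s}} = \frac{550\,310}{121 \text{ s}} \approx 4\,548 \text{ verifications/s}$$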

### Example: output detail (1)

$\tau_b = 0.82$ (i.e. strong positive ordinal dependence)

### Example: output detail (2)

$\tau_b = 0.78$ (i.e. strong positive ordinal dependence)

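### Implementation principles (1)

| M | $A_1$ | $A_2$ | … | $A_P$ | $A_1[1]$ | $A_1[2]$ | $A_1[3]$ |
|---|---|---|---|---|---|---|---|
| $o_1$ | 2 | 12 | … | 1 | 0 | 1 | 0 |
| $o_2$ | 1 | 5 | … | 4 | 1 | 0 | 0 |
| … | … | … | … | … | … | … | … |
| $o_n$ | 3 | 9 | … | 2 | 0 | 0 | 1 |

Columns $A_1, \dots, A_P$ are the attributes; columns $A_1[1]$, $A_1[2]$, $A_1[3]$ are the cards of categories of $A_1$. Attributes are represented by cards of categories, i.e. strings of bits: the bit of $A_1[v]$ for object $o_i$ is 1 exactly when $o_i$ has $A_1 = v$.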
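A minimal sketch of building cards of categories, assuming each card is held as a Python integer used as a bit set (bit $i$ for object $o_{i+1}$). This storage choice and the helpers `category_cards` and `card_bits` are hypothetical; the slide only says that cards are strings of bits. With the $A_1$ values 2, 1, 3 of the example rows, it reproduces the cards 010, 100, 001.

```python
def category_cards(column: list[int], categories: list[int]) -> dict[int, int]:
    """Hypothetical helper: card of each category as an int bit string;
    bit i is 1 iff object o_{i+1} has that category."""
    cards = {cat: 0 for cat in categories}
    for i, value in enumerate(column):
        cards[value] |= 1 << i
    return cards


def card_bits(card: int, n_objects: int) -> str:
    """Hypothetical helper: render a card with o_1 first."""
    return "".join("1" if (card >> i) & 1 else "0" for i in range(n_objects))


a1 = [2, 1, 3]  # values of A_1 for the three example objects
cards = category_cards(a1, [1, 2, 3])
for cat in (1, 2, 3):
    print(f"A1[{cat}]", card_bits(cards[cat], len(a1)))
# A1[1] 010
# A1[2] 100
# A1[3] 001
```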

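### Implementation principles (2)

$\mathrm{CARD}[\varphi]$ is the bit-string representation of a Boolean attribute $\varphi$. For a conjunction it is the bitwise AND ($\dot\wedge$) of the cards of its parts:

$$\mathrm{CARD}[\text{Group of patients (normal)} \wedge \text{Beer 10 (yes)} \wedge \text{Beer 12 (yes)}] = \text{Group of patients}[\text{normal}] \mathbin{\dot\wedge} \text{Beer 10}[\text{yes}] \mathbin{\dot\wedge} \text{Beer 12}[\text{yes}]$$

$\mathrm{Count}(\cdot)$ is the number of 1s in the bit string.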
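The card of a conjunction and the Count function can be sketched with Python integers as bit sets, assuming the same bit layout as for single-category cards (bit $i$ for object $o_{i+1}$). The helper names and the three example cards are hypothetical.

```python
import operator
from functools import reduce


def card_of_conjunction(cards: list[int]) -> int:
    """Hypothetical helper: CARD of a conjunction is the bitwise AND
    of the cards of its parts (at least one card is required)."""
    return reduce(operator.and_, cards)


def count(card: int) -> int:
    """Hypothetical helper for Count(): number of 1 bits in the card."""
    return bin(card).count("1")


# Invented cards over five objects; bit i stands for object o_{i+1}.
group_normal = 0b11110
beer10_yes = 0b01110
beer12_yes = 0b11010
conj = card_of_conjunction([group_normal, beer10_yes, beer12_yes])
print(bin(conj), count(conj))  # 0b1010 2
```

Because the AND works on whole machine words at once, a conjunction is evaluated for many objects in a single operation, which is the main reason for the bit-string representation.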

### Implementation principles (3)

$n_{1,1} = \mathrm{Count}(R[r_1] \mathbin{\dot\wedge} C[c_1] \mathbin{\dot\wedge} \mathrm{CARD}[\gamma])$, and similarly for every $n_{k,l}$.
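Applying the formula to every cell gives the whole contingency table from cards alone. This is a minimal sketch, assuming cards are Python integers used as bit sets; `table_from_cards` and the four-object data are hypothetical.

```python
def count(card: int) -> int:
    """Hypothetical helper for Count(): number of 1 bits in the card."""
    return bin(card).count("1")


def table_from_cards(
    r_cards: list[int], c_cards: list[int], cond_card: int
) -> list[list[int]]:
    """Hypothetical helper: n[k][l] = Count(R[r_k] AND C[c_l] AND CARD[gamma])."""
    return [[count(rk & cl & cond_card) for cl in c_cards] for rk in r_cards]


# Invented data, four objects, bit i = object o_{i+1}:
# R = 1, 1, 2, 2;  C = 1, 2, 2, 2;  gamma holds for o_1, o_2, o_3.
r_cards = [0b0011, 0b1100]
c_cards = [0b0001, 0b1110]
gamma = 0b0111
print(table_from_cards(r_cards, c_cards, gamma))  # [[1, 1], [0, 1]]
```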

### Scalability

Test with 75 000 verifications: run time approximately linear.

### Concluding remarks

- KL-Miner gives practically interesting results
- Suitable for interactive work
- Further quantifiers
- Combinations with further mining procedures

