Intelligent Database Systems Lab, 國立雲林科技大學 National Yunlin University of Science and Technology. Presenter: Chien-Hsing Chen. Author: Geoffrey I. Webb.


1 Discovering significant rules (SIGKDD 2006). Author: Geoffrey I. Webb. Presenter: Chien-Hsing Chen, Intelligent Database Systems Lab, National Yunlin University of Science and Technology.

2 Outline
• Motivation
• Objective
• Method
• Experiments
• Conclusions
• Personal Opinion

3 Motivation
• Numerous techniques have been developed that seek to avoid false discoveries.
• None provides a generic solution that is flexible enough to accommodate varying definitions of true and false discoveries.
• In some real-world tasks, all "discoveries" may be false unless appropriate safeguards are employed.

4 Objective
• This paper presents two generic techniques that allow definitions of true and false discoveries to be specified, so that significant rules are found while false discoveries are controlled.
• Treatments named: Non-redundant, Productive, Significance, Bonferroni.

5 Association rules: Introduction
• Finding rules from data D = t_1, t_2, ..., t_n, where each transaction or record t_i ⊆ I and I = {item_1, item_2, ..., item_m} is the set of items.
• Items take the form a_i = v_{i,j}, where a_i represents an attribute and v_{i,j} a value of a_i.
• No transaction t_i, 1 ≤ i ≤ n, may contain two items a_i = v_{i,j} and a_i = v_{i,k} with j ≠ k.
• Rules take the form x → y, where x ⊂ I and y ∈ I (the consequent y is limited to a single value).
• Constraints: minsup, minconf.
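As a rough illustration of these definitions, the Python sketch below computes the support and confidence of one rule on a toy data set and checks them against minsup and minconf. The transactions, the item names, the threshold values and the helper names `support` and `confidence` are all made up for illustration; they are not taken from the paper.

```python
from typing import FrozenSet, List

Transaction = FrozenSet[str]

def support(data: List[Transaction], items: FrozenSet[str]) -> float:
    """Fraction of transactions that contain every item in `items`."""
    return sum(1 for t in data if items <= t) / len(data)

def confidence(data: List[Transaction], x: FrozenSet[str], y: str) -> float:
    """conf(x -> y) = sup(x and y together) / sup(x)."""
    sup_x = support(data, x)
    return support(data, x | {y}) / sup_x if sup_x > 0 else 0.0

# Hypothetical attribute=value data: at most one value per attribute per record.
data: List[Transaction] = [
    frozenset({"sex=female", "condition=pregnant", "symptom=oedema"}),
    frozenset({"sex=female", "condition=pregnant"}),
    frozenset({"sex=female"}),
    frozenset({"sex=male"}),
]

minsup, minconf = 0.25, 0.5          # illustrative thresholds
x, y = frozenset({"condition=pregnant"}), "symptom=oedema"

sup_xy = support(data, x | {y})      # support of the rule x -> y
conf_xy = confidence(data, x, y)
passes = sup_xy >= minsup and conf_xy >= minconf
print(sup_xy, conf_xy, passes)
```

Note that the support of a rule x → y is taken here to be the support of x and y together, which is the usual convention.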

6 Problem Statement
• {pregnant} → oedema (presenter's gloss: pregnancy → large belly)
• {pregnant, female} → oedema
• {pregnant} → female
• {pregnant, dataminer} → oedema
• Unproductive rules

7 Techniques for preventing false discoveries
• A non-redundant rule constraint discards rules x → y for which ∃ z ∈ x : sup(x → y) = sup(x \ {z} → y).
─ Example: sup({pregnant, female} → oedema) = sup({pregnant} → oedema).
• Confidences shown: conf({pregnant} → oedema), conf({female} → oedema); 50%.
• A minimum improvement constraint is stronger than a non-redundant rule constraint, as it rejects all redundant rules as well as many unproductive rules.
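The two constraints on this slide can be sketched in Python as follows. This is a minimal illustration, not the paper's implementation: the toy data and the function names `is_redundant` and `improvement` are assumptions, and improvement is taken to mean the confidence of x → y minus the highest confidence of any rule whose antecedent is a proper subset of x (including the empty antecedent).

```python
from itertools import combinations

def count(data, items):
    """Number of transactions containing every item in `items`."""
    return sum(1 for t in data if items <= t)

def is_redundant(data, x, y):
    """True if removing some z from x leaves the support of the rule unchanged."""
    full = count(data, x | {y})
    return any(count(data, (x - {z}) | {y}) == full for z in x)

def improvement(data, x, y):
    """conf(x -> y) minus the best confidence over proper subsets of x.

    Assumes x is non-empty. A rule is treated as productive when this is > 0.
    """
    def conf(antecedent):
        n = count(data, antecedent)
        return count(data, antecedent | {y}) / n if n else 0.0

    proper_subsets = [frozenset(c) for r in range(len(x))
                      for c in combinations(x, r)]
    return conf(x) - max(conf(s) for s in proper_subsets)

# Hypothetical data in which every pregnant record is also female.
data = [
    frozenset({"pregnant", "female", "oedema"}),
    frozenset({"pregnant", "female", "oedema"}),
    frozenset({"pregnant", "female"}),
    frozenset({"female"}),
    frozenset({"male", "oedema"}),
    frozenset({"male"}),
]

specialised = frozenset({"pregnant", "female"})
general = frozenset({"pregnant"})
print(is_redundant(data, specialised, "oedema"))   # adding "female" changes nothing
print(improvement(data, specialised, "oedema"))
print(improvement(data, general, "oedema"))
```

In this toy data the specialised rule has zero improvement over {pregnant} → oedema, so a minimum improvement constraint with any positive threshold would reject it as well.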

8 Techniques for preventing false discoveries
• The multiple comparisons (multiple tests) problem.
• Using a statistical test avoids the "reject all rules" outcome.
• It can reduce type-1 error relative to minimum support or confidence constraints.
• Statistical test: Fisher exact test.
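For readers unfamiliar with the Fisher exact test, here is a small standard-library Python sketch of the one-sided version on a 2×2 contingency table. How the four cells are filled in for a given rule is defined in the paper and is not reproduced here; the function name `fisher_exact_greater` and the example counts are assumptions made for illustration.

```python
from math import comb

def fisher_exact_greater(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher exact test on the table [[a, b], [c, d]].

    Returns the probability, with all margins held fixed, of a top-left
    cell at least as large as `a` (hypergeometric upper tail).
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)
    upper = min(row1, col1)
    tail = sum(comb(row1, k) * comb(row2, col1 - k)
               for k in range(a, upper + 1))
    return tail / total

# Hypothetical counts: rows = antecedent present/absent,
# columns = consequent present/absent.
p = fisher_exact_greater(3, 1, 1, 3)
print(p)
```

A rule would be accepted at level α only when this p-value is at most α, which is what lets a significance test replace arbitrary minimum support or confidence thresholds.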

9 The Within-Search Approach
• The Bonferroni adjustment replaces α in the hypothesis tests with α' = α/r, where r is the number of tests performed.
─ Holm procedure
• Within-search approach
─ Statistical tests, with an appropriate Bonferroni adjustment, are applied to rules as they are encountered during the search process.
• Figure labels: α, true discovery, false discovery.
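The Bonferroni adjustment and the Holm procedure named on this slide can be sketched in a few lines of Python. The p-values below are invented for illustration, and the function names are assumptions; the sketch applies the corrections to a finished list of p-values rather than inside a rule search.

```python
from typing import List

def bonferroni(pvalues: List[float], alpha: float = 0.05) -> List[bool]:
    """Reject H0 for each test whose p-value is at most alpha / r."""
    r = len(pvalues)
    return [p <= alpha / r for p in pvalues]

def holm(pvalues: List[float], alpha: float = 0.05) -> List[bool]:
    """Holm step-down procedure.

    Test p-values in ascending order against alpha / (r - rank) and stop
    at the first one that fails; everything after it is not rejected.
    """
    r = len(pvalues)
    order = sorted(range(r), key=lambda i: pvalues[i])
    reject = [False] * r
    for rank, i in enumerate(order):
        if pvalues[i] <= alpha / (r - rank):
            reject[i] = True
        else:
            break
    return reject

pvalues = [0.001, 0.013, 0.02, 0.6]   # hypothetical p-values
print(bonferroni(pvalues))
print(holm(pvalues))
```

Holm never rejects fewer hypotheses than Bonferroni at the same α, which is why it is usually preferred when all p-values are available at once.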

10 Holdout Evaluation
• Benjamini-Yekutieli procedure
• t-test
• A family of procedures depending on two prior distributions
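The slide gives no detail on the Benjamini-Yekutieli procedure, so the following Python sketch shows the standard step-up form of it: with m tests, find the largest rank k such that the k-th smallest p-value is at most k·α / (m·c(m)), where c(m) = 1 + 1/2 + ... + 1/m, and reject the k smallest. In a holdout setting the p-values would come from testing candidate rules on the holdout data; the p-values and the function name here are assumptions for illustration.

```python
from typing import List

def benjamini_yekutieli(pvalues: List[float], alpha: float = 0.05) -> List[bool]:
    """Benjamini-Yekutieli step-up procedure (false discovery rate control)."""
    m = len(pvalues)
    if m == 0:
        return []
    c_m = sum(1.0 / i for i in range(1, m + 1))    # harmonic correction term
    order = sorted(range(m), key=lambda i: pvalues[i])
    largest_k = 0
    for k, i in enumerate(order, start=1):
        if pvalues[i] <= k * alpha / (m * c_m):
            largest_k = k                          # keep the largest passing rank
    reject = [False] * m
    for i in order[:largest_k]:
        reject[i] = True
    return reject

holdout_pvalues = [0.001, 0.008, 0.039, 0.041, 0.6]   # hypothetical
print(benjamini_yekutieli(holdout_pvalues))
```

Unlike the Bonferroni and Holm corrections, this procedure controls the expected proportion of false discoveries among the accepted rules rather than the chance of making any false discovery at all.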

11 Experiments 1
• 10,000 transactions, 100 binary variables; 100 data sets were generated.
• Treatments: Non-redundant; Productive; Significance; Bonferroni; Non-redundant + holdout; Productive + holdout.
• Result label: no rules.
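A scaled-down version of this kind of experiment can be sketched in Python. The sketch below is not the paper's experimental code: it uses 500 rows and 8 variables instead of 10,000 and 100, considers only single-item antecedents, and all function names are made up. Because every variable is generated independently, any rule that passes a test is by construction a false discovery.

```python
import random
from itertools import permutations
from math import comb

def fisher_greater(a, b, c, d):
    """One-sided Fisher exact p-value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)
    tail = sum(comb(row1, k) * comb(row2, col1 - k)
               for k in range(a, min(row1, col1) + 1))
    return tail / total

def random_data(n_rows, n_vars, seed=0):
    """Independent, equiprobable binary variables: no true rules exist."""
    rng = random.Random(seed)
    return [[rng.random() < 0.5 for _ in range(n_vars)] for _ in range(n_rows)]

def rule_pvalues(data):
    """p-value for every rule x -> y between two distinct variables."""
    pvals = {}
    for x, y in permutations(range(len(data[0])), 2):
        a = sum(1 for row in data if row[x] and row[y])
        b = sum(1 for row in data if row[x] and not row[y])
        c = sum(1 for row in data if not row[x] and row[y])
        d = len(data) - a - b - c
        pvals[(x, y)] = fisher_greater(a, b, c, d)
    return pvals

data = random_data(n_rows=500, n_vars=8)
pvals = rule_pvalues(data)
alpha = 0.05
unadjusted = [rule for rule, p in pvals.items() if p <= alpha]
adjusted = [rule for rule, p in pvals.items() if p <= alpha / len(pvals)]
print(len(pvals), len(unadjusted), len(adjusted))
```

Comparing the two counts shows how an unadjusted significance test can still admit spurious rules on pure noise, while the Bonferroni-adjusted threshold admits at most as many.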

12 Experiments 2
• 10,000 transactions, each containing values for 20 binary variables.
• Each of the X values was randomly generated, with each value being equiprobable.
• All treatments found all true rules relating to X and Y65.
• The non-redundant, productive, and significance treatments all made false discoveries.
• Neither the Bonferroni treatment nor either of the holdout treatments made any false discoveries.
• Example label: "X55, X60".

13 Experiments 3-1

14 Experiments 3-2: passed by holdout evaluation

16 Conclusion: The majority of rules are spurious.

17 Opinion
• Advantage: significant rules
• Drawback: Fisher exact test
• Application: …

