Presentation on theme: "Skewing: An Efficient Alternative to Lookahead for Decision Tree Induction" — Presentation transcript:

1 Skewing: An Efficient Alternative to Lookahead for Decision Tree Induction. David Page, Soumya Ray. Department of Biostatistics and Medical Informatics and Department of Computer Sciences, University of Wisconsin, Madison, USA.

2 Main Contribution
- Greedy tree learning algorithms suffer from myopia.
- Myopia can be remedied by lookahead, but lookahead is computationally very expensive.
- We present an approach that efficiently addresses the myopia of tree learners.

3 Task Setting
Given: m examples of n Boolean attributes each, labeled according to a function f over some subset of the n attributes.
Do: Learn the Boolean function f.

4 TDIDT Algorithm
- Top-Down Induction of Decision Trees.
- Greedy algorithm: chooses the feature that locally optimizes some measure of "purity" of the class labels.
- Common measures: information gain, Gini index.

5 TDIDT Example
x1  x2  x3  Value
0   1   1   +
0   1   0   −
1   0   1   −
0   0   0   +

6 TDIDT Example
Split on x1: the x1 = 1 branch holds (1−), so it becomes a "−" leaf; the x1 = 0 branch holds (2+, 1−).
Split that branch on x2: x2 = 0 gives (1+); x2 = 1 gives (1+, 1−).
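
Slides 4-6 walk through the greedy split choice at the heart of TDIDT. Below is a minimal runnable Python sketch of that choice; the helper names (entropy, information_gain, best_split) are our own illustrative choices, not the authors' code. Run on the four examples from slide 5, it selects x1, matching the tree on slide 6.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(examples, attr):
    """Entropy reduction from splitting `examples` on Boolean attribute `attr`."""
    labels = [y for _, y in examples]
    gain = entropy(labels)
    for v in (0, 1):
        subset = [y for x, y in examples if x[attr] == v]
        if subset:
            gain -= (len(subset) / len(examples)) * entropy(subset)
    return gain

def best_split(examples, attrs):
    """Greedy TDIDT choice: the attribute with the highest information gain."""
    return max(attrs, key=lambda a: information_gain(examples, a))

# The four training examples from slide 5: (assignment, label).
data = [({"x1": 0, "x2": 1, "x3": 1}, "+"),
        ({"x1": 0, "x2": 1, "x3": 0}, "-"),
        ({"x1": 1, "x2": 0, "x3": 1}, "-"),
        ({"x1": 0, "x2": 0, "x3": 0}, "+")]
print(best_split(data, ["x1", "x2", "x3"]))  # -> x1, matching the tree on slide 6
```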

7 Outline
- Introduction to the TDIDT algorithm
- Myopia and "hard" functions
- Skewing
- Experiments with the skewing algorithm
- Sequential skewing
- Experiments with sequential skewing
- Conclusions and future work

8 Myopia and Correlation Immunity
- For certain Boolean functions, no single variable has "gain" according to standard purity measures (e.g., entropy, Gini): no variable is correlated with the class.
- In cryptography, such functions are called correlation immune.
- Given such a target function, every variable looks equally good (or equally bad), so in an application the learner cannot distinguish relevant from irrelevant variables.

9 A Correlation Immune Function
f = x1 ⊕ x2
x1  x2  f
0   0   0
0   1   1
1   0   1
1   1   0
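
To see correlation immunity concretely, the snippet below (reusing the hypothetical information_gain helper from the sketch after slide 6) computes each variable's gain for f = x1 ⊕ x2 over the full truth table, with a third, irrelevant variable added: every gain is exactly zero.

```python
# Correlation immunity in action: on a uniform sample of the full truth
# table of f = x1 XOR x2 (with an irrelevant x3), no variable has gain.
xor_data = [({"x1": a, "x2": b, "x3": c}, a ^ b)
            for a in (0, 1) for b in (0, 1) for c in (0, 1)]
for attr in ("x1", "x2", "x3"):
    print(attr, information_gain(xor_data, attr))  # each prints 0.0
```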

10 Examples
- In Drosophila, survival is an exclusive-or function of gender and the expression of the SxL gene.
- In ligand-domain drug binding, binding may have an exclusive-or subfunction of ligand charge and domain charge.

11 Learning Hard Functions
- Standard method of learning hard functions with TDIDT: depth-k lookahead.
- This costs O(m · n^(2^(k+1) − 1)) for m examples in n variables.
- Can we devise a technique that allows TDIDT algorithms to efficiently learn hard functions?

12 Key Idea
Correlation immune functions are not hard if the data distribution is significantly different from uniform.

13 Example
- The uniform distribution can be sampled by setting each variable (feature) independently of all others, with probability 0.5 of being set to 1.
- Consider instead a distribution where each variable has probability 0.75 of being set to 1.
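
Here is a minimal sketch of the reweighting this implies, with the favored setting being 1 for every variable as slide 13 describes; the skew_weight helper name is our own, and xor_data is reused from the sketch after slide 9. The printed weights (in 64ths) match the tables on the slides that follow, and x1, which had zero gain under the uniform distribution, becomes informative.

```python
def skew_weight(row, favored, p=0.75):
    """Probability of `row` under the product distribution that puts
    probability p on each variable's favored value."""
    w = 1.0
    for var, fav in favored.items():
        w *= p if row[var] == fav else (1.0 - p)
    return w

favored = {"x1": 1, "x2": 1, "x3": 1}
for x, y in xor_data:
    print(x, y, round(64 * skew_weight(x, favored)))  # 1, 3, 3, 9, 3, 9, 9, 27

# Under these weights x1 is now correlated with the class:
w_pos = sum(skew_weight(x, favored) for x, y in xor_data if x["x1"] == 0 and y == 1)
w_all = sum(skew_weight(x, favored) for x, y in xor_data if x["x1"] == 0)
print(w_pos / w_all)  # 0.75, versus 0.375 for the sample as a whole
```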

14 Example
f = x1 ⊕ x2 (x3 is irrelevant):
x1  x2  x3  f
0   0   0   0
0   0   1   0
0   1   0   1
0   1   1   1
1   0   0   1
1   0   1   1
1   1   0   0
1   1   1   0

15 Example
The same table, reweighted under the skewed distribution: each row's weight is the product of 0.75 for every variable set to 1 and 0.25 for every variable set to 0.
x1  x2  x3  f  Weight
0   0   0   0   1/64
0   0   1   0   3/64
0   1   0   1   3/64
0   1   1   1   9/64
1   0   0   1   3/64
1   0   1   1   9/64
1   1   0   0   9/64
1   1   1   0  27/64

16 Example
Summing the weights by class: rows with f = 1 have total weight 24/64 = 0.375; rows with f = 0 have total weight 40/64 = 0.625.

17 Example
The x1 = 0 rows:
x1  x2  x3  f  Weight
0   0   0   0   1/64
0   0   1   0   3/64
0   1   0   1   3/64
0   1   1   1   9/64
Weighted fraction with f = 1: 12/16 = 0.75.

18 Example
The x1 = 1 rows:
x1  x2  x3  f  Weight
1   0   0   1   3/64
1   0   1   1   9/64
1   1   0   0   9/64
1   1   1   0  27/64
Weighted fraction with f = 1: 12/48 = 0.25, so x1 now has gain.

19 Example
By contrast, consider the irrelevant variable x3 (the x3 = 0 rows are shown):
x1  x2  x3  f  Weight
0   0   0   0   1/64
0   1   0   1   3/64
1   0   0   1   3/64
1   1   0   0   9/64
Weighted fraction with f = 1: 6/16 = 0.375, the same as over the whole table, so x3 still shows no gain while x1 and x2 now do.

20 Key Idea
Given a large enough sample and a second distribution sufficiently different from the first, we can learn functions that are hard for TDIDT algorithms under the original distribution.

21 Issues to Address
- How can we get a "sufficiently different" distribution? Our approach: "skew" the given sample by choosing "favored settings" for the variables.
- What about the effects of a not-large-enough sample? Our approach: average the "goodness" of each variable over multiple skews.

22 Skewing Algorithm
For T trials do:
  1. Choose a favored setting for each variable.
  2. Reweight the sample accordingly.
  3. Calculate the entropy of each variable's split under this weighting.
  4. For each variable that has sufficient gain, increment a counter.
Finally, split on the variable with the highest count.
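
Below is a minimal sketch of this loop, under stated assumptions: it reuses the hypothetical skew_weight helper from the sketch after slide 13 and the xor_data table from the sketch after slide 9, with a weighted variant of the earlier entropy code. The trial count T and the gain threshold are illustrative knobs, not values from the paper.

```python
import math
import random

def weighted_entropy(pairs):
    """Entropy of a weighted label distribution; pairs = (label, weight)."""
    total = sum(w for _, w in pairs)
    ent = 0.0
    for label in {y for y, _ in pairs}:
        p = sum(w for y, w in pairs if y == label) / total
        if p > 0:
            ent -= p * math.log2(p)
    return ent

def weighted_gain(examples, weights, attr):
    """Information gain of `attr`, with per-example weights."""
    pairs = [(y, w) for (x, y), w in zip(examples, weights)]
    gain = weighted_entropy(pairs)
    total = sum(weights)
    for v in (0, 1):
        sub = [(y, w) for (x, y), w in zip(examples, weights) if x[attr] == v]
        if sub:
            gain -= (sum(w for _, w in sub) / total) * weighted_entropy(sub)
    return gain

def skewing_choose_split(examples, attrs, T=30, gain_threshold=0.05):
    """For T trials: pick a random favored setting, reweight the sample,
    and count each variable that shows sufficient gain; finally return
    the variable with the highest count."""
    counts = {a: 0 for a in attrs}
    for _ in range(T):
        favored = {a: random.randint(0, 1) for a in attrs}
        weights = [skew_weight(x, favored) for x, _ in examples]
        for a in attrs:
            if weighted_gain(examples, weights, a) > gain_threshold:
                counts[a] += 1
    return max(attrs, key=lambda a: counts[a])

print(skewing_choose_split(xor_data, ["x1", "x2", "x3"]))  # x1 (x1 and x2 tie; x3 never gains)
```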

23 Experiments
- ID3 vs. ID3 with skewing (ID3 is used to avoid issues to do with parameters, pruning, etc.).
- Synthetic propositional data: examples of 30 Boolean variables; target Boolean functions of 2-6 of these variables; randomly chosen targets and randomly chosen hard targets.
- UCI datasets (Perlich et al., JMLR 2003).
- 10-fold cross validation.
- Evaluation metric: weighted accuracy, the average of the accuracy over positives and the accuracy over negatives.
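
For clarity, here is a one-function sketch of the evaluation metric named in the last bullet; the code is our own illustration, assuming 0/1 labels.

```python
def weighted_accuracy(y_true, y_pred):
    """Average of the accuracy on positive and on negative examples."""
    pos = [(t, p) for t, p in zip(y_true, y_pred) if t == 1]
    neg = [(t, p) for t, p in zip(y_true, y_pred) if t == 0]
    acc_pos = sum(t == p for t, p in pos) / len(pos)
    acc_neg = sum(t == p for t, p in neg) / len(neg)
    return (acc_pos + acc_neg) / 2

print(weighted_accuracy([1, 1, 0, 0], [1, 0, 0, 0]))  # 0.75
```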

24 Results (3-variable Boolean functions) [plots: random functions; hard functions]

25 Results (4-variable Boolean functions) [plots: random functions; hard functions]

26 Results (5-variable Boolean functions) [plots: random functions; hard functions]

27 Results (6-variable Boolean functions) [plots: random functions; hard functions]

28 Current Shortcomings
- Sensitive to noise and high-dimensional data.
- Very small signal on the hardest correlation immune functions (parity) given more than 3 relevant variables.
- Only very small gains on the real-world datasets attempted so far. Are there few correlation immune functions in practice? Or is it noise, dimensionality, or not enough examples?

