Published by Jadon Loomer. Modified over 2 years ago.

1
Entropy Estimation and Applications to Decision Trees

2
Estimation
Distribution over K=8 classes
Repeat 50,000 times:
1. Generate N samples
2. Estimate entropy from samples
[Figure: distributions of the entropy estimates for N=10, N=100, N=50000; H=1.289]
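The experiment on this slide can be sketched in a few lines. This is a minimal illustration, not the presenter's code: the 8-class distribution `p`, the reduced repeat count, and the use of the plug-in estimator are all assumptions, since the transcript does not say which distribution or estimator produced the plots.

```python
import math
import random
import statistics

def plugin_entropy(counts):
    """Plug-in (maximum-likelihood) entropy estimate in nats."""
    n = sum(counts)
    return -sum(c / n * math.log(c / n) for c in counts if c > 0)

def simulate(p, n_samples, repeats, rng):
    """Draw `repeats` data sets of size n_samples from p and estimate entropy on each."""
    k = len(p)
    estimates = []
    for _ in range(repeats):
        counts = [0] * k
        for label in rng.choices(range(k), weights=p, k=n_samples):
            counts[label] += 1
        estimates.append(plugin_entropy(counts))
    return estimates

# Hypothetical distribution over K=8 classes (not the one used on the slide).
p = [0.4, 0.2, 0.15, 0.1, 0.05, 0.05, 0.03, 0.02]
true_entropy = -sum(q * math.log(q) for q in p)

if __name__ == "__main__":
    rng = random.Random(0)
    # 2,000 repeats instead of 50,000 to keep the sketch fast.
    for n_samples in (10, 100):
        est = simulate(p, n_samples, 2000, rng)
        print(n_samples, true_entropy, statistics.mean(est), statistics.stdev(est))
```

For small N the mean of the estimates falls below the true entropy, and the spread shrinks as N grows, which is the behaviour the three panels are meant to show.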

3
Estimation
Estimating the true entropy
Goals:
1. Consistency: large N guarantees correct result
2. Low variance: variation of estimates small
3. Low bias: expected estimate should be correct
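The three goals can be written down for an estimator \hat{H}_N computed from N samples. The notation below is introduced here for illustration and does not come from the slides. The last line is the standard first-order bias of the plug-in estimator for K classes (in nats), which shows why the plug-in estimate is too low for small N.

```latex
\begin{align*}
\text{Consistency:} \quad & \hat{H}_N \xrightarrow{\;p\;} H \quad \text{as } N \to \infty \\
\text{Variance:} \quad & \operatorname{Var}[\hat{H}_N] = \mathbb{E}\big[(\hat{H}_N - \mathbb{E}[\hat{H}_N])^2\big] \\
\text{Bias:} \quad & \operatorname{Bias}[\hat{H}_N] = \mathbb{E}[\hat{H}_N] - H \\
\text{Plug-in bias:} \quad & \mathbb{E}[\hat{H}^{\text{plugin}}_N] - H \approx -\frac{K-1}{2N}
\end{align*}
```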

4
Discrete Entropy Estimators

5
Experimental Results
UCI classification data sets
Accuracy on test set
Plugin vs. Grassberger
Better trees
Source: [Nowozin, “Improved Information Gain Estimates for Decision Tree Induction”, ICML 2012]
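The two estimators compared on this slide can be sketched as follows. The plug-in estimator is standard. For the Grassberger estimator the sketch assumes the form \hat{H}_G = log N - (1/N) Σ_k n_k G(n_k), with G computed by Grassberger's recursion; the slide text does not give the formula, so check it against the cited paper before relying on it. Function names are hypothetical.

```python
import math

EULER_GAMMA = 0.5772156649015329

def plugin_entropy(counts):
    """Plug-in estimate: entropy of the empirical class frequencies (nats)."""
    n = sum(counts)
    return -sum(c / n * math.log(c / n) for c in counts if c > 0)

def grassberger_g(n):
    """G(n) via the recursion (assumed form):
    G(1) = -gamma - ln 2, G(2) = 2 - gamma - ln 2,
    G(2m+1) = G(2m), G(2m+2) = G(2m) + 2 / (2m+1).
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    g = -EULER_GAMMA - math.log(2.0)  # G(1)
    if n == 1:
        return g
    g += 2.0  # G(2)
    # Each step moves from G(2m) to G(2m+2); odd n reuses the value at n-1.
    for m in range(1, n // 2):
        g += 2.0 / (2 * m + 1)
    return g

def grassberger_entropy(counts):
    """Grassberger-style estimate (nats), replacing log n_k by G(n_k)."""
    n = sum(counts)
    return math.log(n) - sum(c * grassberger_g(c) for c in counts if c > 0) / n

if __name__ == "__main__":
    counts = [3, 1, 1, 0, 2]
    print(plugin_entropy(counts), grassberger_entropy(counts))
```

Both functions take a histogram of class counts, so either can be dropped into an information-gain computation at a decision tree node.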

6
Differential Entropy Estimation
In regression, differential entropy
– measures remaining uncertainty about y
– is a function of a distribution
Problem
– q is not from a parametric family
Solution 1: project onto a parametric family
Solution 2: non-parametric entropy estimation
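For reference, the differential entropy of a density q over the regression target y is the standard definition below. Unlike discrete entropy, it can be negative.

```latex
h(q) = -\int q(y) \, \log q(y) \, \mathrm{d}y
```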

7
Solution 1: parametric family
– Uniform minimum variance unbiased estimator (UMVUE)
[Ahmed, Gokhale, “Entropy expressions and their estimators for multivariate distributions”, IEEE Trans. Inf. Theory, 1989]
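As an illustration, assume the parametric family is the multivariate normal in d dimensions; the transcript does not say which family the slides use. The entropy is then a function of the covariance only, and the expected log-determinant of the scatter matrix S (a Wishart matrix with n-1 degrees of freedom when the mean is estimated) gives an unbiased correction in terms of the digamma function ψ. The derivation below is a sketch under those assumptions, valid for n > d; the indexing in the cited paper may differ depending on how the sample count and the mean are handled.

```latex
\begin{align*}
h\big(\mathcal{N}(\mu, \Sigma)\big) &= \frac{d}{2} \log(2 \pi e) + \frac{1}{2} \log |\Sigma| \\
S &= \sum_{i=1}^{n} (y_i - \bar{y})(y_i - \bar{y})^{\top} \\
\mathbb{E}\big[\log |S|\big] &= \log |\Sigma| + d \log 2 + \sum_{j=1}^{d} \psi\!\left(\frac{n - j}{2}\right) \\
\hat{h} &= \frac{d}{2} \log(\pi e) + \frac{1}{2} \log |S| - \frac{1}{2} \sum_{j=1}^{d} \psi\!\left(\frac{n - j}{2}\right),
\qquad \mathbb{E}[\hat{h}] = h\big(\mathcal{N}(\mu, \Sigma)\big)
\end{align*}
```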

8
Solution 1: parametric family

9

10
Solution 2: Non-parametric entropy estimation
[Kozachenko, Leonenko, “Sample estimate of the entropy of a random vector”, Probl. Peredachi Inf., 1987]
[Beirlant, Dudewicz, Győrfi, van der Meulen, “Nonparametric entropy estimation: An overview”, 2001]
[Wang, Kulkarni, Verdú, “Universal estimation of information measures for analog sources”, FnT Comm. Inf. Th., 2009]

11
Solution 2: Non-parametric estimation
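A minimal sketch of a nearest-neighbour entropy estimate in the spirit of Kozachenko and Leonenko, restricted to one dimension so that it needs only the standard library. It assumes the classic 1-nearest-neighbour form, \hat{h} = (d/N) Σ_i log ρ_i + log(N-1) + log V_d + γ, with d = 1 and V_1 = 2; the slide text does not state which variant is used, and the function name is hypothetical.

```python
import math

EULER_GAMMA = 0.5772156649015329

def kl_entropy_1d(samples):
    """1-nearest-neighbour differential entropy estimate in nats (1-D only).

    rho_i is the distance from sample i to its nearest neighbour;
    V_1 = 2 is the length of the unit ball [-1, 1] in one dimension.
    """
    xs = sorted(samples)
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two samples")
    total = 0.0
    for i, x in enumerate(xs):
        # After sorting, the nearest neighbour is one of the two adjacent points.
        left = x - xs[i - 1] if i > 0 else math.inf
        right = xs[i + 1] - x if i < n - 1 else math.inf
        rho = min(left, right)
        if rho <= 0.0:
            raise ValueError("duplicate samples give a zero distance")
        total += math.log(rho)
    return total / n + math.log(n - 1) + math.log(2.0) + EULER_GAMMA
```

One quick sanity check is the scaling law of differential entropy: multiplying every sample by a > 0 shifts the estimate by exactly log a, just as h(aY) = h(Y) + log a.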

12
Experimental Results
[Nowozin, “Improved Information Gain Estimates for Decision Tree Induction”, ICML 2012]

13
Streaming Decision Trees

14
Streaming Data
“Infinite data” setting
10 possible splits and their scores
When to stop and make a decision?

15
Streaming Decision Trees
[Domingos, Hulten, “Mining High-Speed Data Streams”, KDD 2000]
[Jin, Agrawal, “Efficient Decision Tree Construction on Streaming Data”, KDD 2003]
[Loh, Nowozin, “Faster Hoeffding racing: Bernstein races via jackknife estimates”, ALT 2013]
Score splits on a subset of samples only
Domingos/Hulten (Hoeffding Trees), 2000:
– Compute sample count n for given precision
– Streaming decision tree induction
– Incorrect confidence intervals, but work well in practice
Jin/Agrawal, 2003:
– Tighter confidence interval, asymptotic derivation using delta method
Loh/Nowozin, 2013:
– Racing algorithm (bad splits are removed early)
– Finite sample confidence intervals for entropy and Gini
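The "sample count n for given precision" step in Hoeffding trees rests on the Hoeffding bound: with probability at least 1 - δ, the mean of n independent observations with range R lies within ε = sqrt(R² ln(1/δ) / (2n)) of its expectation. The sketch below only evaluates that bound; the function names are hypothetical. The bound holds for sample means of bounded variables, whereas split scores such as information gain are not simple sample means, which is one reading of the "incorrect confidence intervals" remark above.

```python
import math

def hoeffding_epsilon(value_range, delta, n):
    """Half-width of the Hoeffding confidence interval after n samples."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))

def hoeffding_sample_count(value_range, delta, epsilon):
    """Smallest n for which the Hoeffding half-width is at most epsilon."""
    return math.ceil(value_range ** 2 * math.log(1.0 / delta) / (2.0 * epsilon ** 2))

if __name__ == "__main__":
    n = hoeffding_sample_count(value_range=1.0, delta=1e-3, epsilon=0.05)
    print(n, hoeffding_epsilon(1.0, 1e-3, n))
```

In a streaming tree, a node would keep reading samples until the gap between the best and second-best split scores exceeds ε, then commit to the best split.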

16
Multivariate Delta Method
[DasGupta, “Asymptotic Theory of Statistics and Probability”, Springer, 2008]
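The slide body is not preserved in this transcript. For reference, the standard statement of the multivariate delta method is given below, for a statistic T_n that is asymptotically normal around θ and a function g that is differentiable at θ with non-zero gradient. The last line gives the covariance that applies when T_n is a vector of multinomial class proportions.

```latex
\begin{align*}
\sqrt{n} \, (T_n - \theta) \xrightarrow{\;d\;} \mathcal{N}(0, \Sigma)
\quad &\Longrightarrow \quad
\sqrt{n} \, \big(g(T_n) - g(\theta)\big) \xrightarrow{\;d\;}
\mathcal{N}\big(0, \; \nabla g(\theta)^{\top} \Sigma \, \nabla g(\theta)\big) \\
\text{Multinomial proportions:} \quad & \Sigma = \operatorname{diag}(p) - p \, p^{\top}
\end{align*}
```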

17
Delta Method for the Information Gain
[Small, “Expansions and Asymptotics for Statistics”, CRC, 2010]
[DasGupta, “Asymptotic Theory of Statistics and Probability”, Springer, 2008]

18
Delta Method Example
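The example on this slide is not preserved in the transcript. As a simpler stand-in, the sketch below applies the delta method to the plug-in entropy of a single histogram rather than to the full information gain. Applying the delta method to g(p) = -Σ p_k log p_k gives the asymptotic variance σ² = Σ_k p_k (log p_k)² - H², so an approximate interval is \hat{H} ± z · sqrt(σ² / n). Function names and the default z = 1.96 are assumptions for illustration.

```python
import math

def entropy(p):
    """Shannon entropy in nats of a probability vector."""
    return -sum(q * math.log(q) for q in p if q > 0)

def entropy_asymptotic_variance(p):
    """Delta-method variance of sqrt(n) * (plug-in entropy - H): Var[-log p(X)]."""
    h = entropy(p)
    return sum(q * math.log(q) ** 2 for q in p if q > 0) - h ** 2

def entropy_confidence_interval(counts, z=1.96):
    """Approximate interval for the entropy from class counts.

    Asymptotic only: it ignores the plug-in bias and degenerates to zero
    width when the empirical distribution is uniform.
    """
    n = sum(counts)
    p_hat = [c / n for c in counts]
    h = entropy(p_hat)
    variance = max(entropy_asymptotic_variance(p_hat), 0.0)  # guard rounding error
    half_width = z * math.sqrt(variance / n)
    return h - half_width, h + half_width

if __name__ == "__main__":
    print(entropy_confidence_interval([50, 25, 25]))
```

In the streaming setting, intervals of this kind for competing splits indicate when enough samples have been seen to separate the best split from the rest.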

19
Conclusion on Entropy Estimation
Statistical problem
Large body of literature exists on entropy estimation
Better estimators yield better decision trees
Distribution of estimate relevant in the streaming setting
