Near-Minimax Optimal Learning with Decision Trees. Rob Nowak and Clay Scott, University of Wisconsin-Madison and Rice University. Supported by the NSF and the ONR.


1 Near-Minimax Optimal Learning with Decision Trees University of Wisconsin-Madison and Rice University Rob Nowak and Clay Scott Supported by the NSF and the ONR nowak@engr.wisc.edu

2 Basic Problem Classification: build a decision rule based on labeled training data. Given n training points, how well can we do?

3 Smooth Decision Boundaries Suppose that the Bayes decision boundary behaves locally like a Lipschitz function (Mammen & Tsybakov '99).

4 Dyadic Thinking about Classification Trees recursive dyadic partition

5 Dyadic Thinking about Classification Trees Pruned dyadic partition; pruned dyadic tree. Hierarchical structure facilitates optimization.
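A minimal sketch of the recursive dyadic partition idea from slides 4 and 5, written in Python. The names `Cell`, `dyadic_partition`, `should_split`, and `crosses_diagonal` are hypothetical, and two choices are assumptions rather than anything stated on the slides: cells are split at their midpoint while cycling through the coordinates, and "pruning" is emulated by refusing to split cells that do not touch a toy decision boundary (the line y = x).

```python
from typing import Callable, List, Tuple

# A cell is an axis-aligned box: one (low, high) interval per coordinate.
Cell = Tuple[Tuple[float, float], ...]


def dyadic_partition(cell: Cell, depth: int, max_depth: int,
                     should_split: Callable[[Cell], bool]) -> List[Cell]:
    """Return the leaf cells of a recursive dyadic partition of `cell`."""
    if depth == max_depth or not should_split(cell):
        return [cell]
    axis = depth % len(cell)          # assumption: cycle through coordinates
    low, high = cell[axis]
    mid = (low + high) / 2.0          # dyadic split: always at the midpoint
    left = cell[:axis] + ((low, mid),) + cell[axis + 1:]
    right = cell[:axis] + ((mid, high),) + cell[axis + 1:]
    return (dyadic_partition(left, depth + 1, max_depth, should_split)
            + dyadic_partition(right, depth + 1, max_depth, should_split))


def crosses_diagonal(cell: Cell) -> bool:
    """True if the interior of a 2-d cell meets the toy boundary y = x."""
    (x0, x1), (y0, y1) = cell
    return x0 < y1 and y0 < x1


def area(cell: Cell) -> float:
    """Volume of a cell (product of its side lengths)."""
    result = 1.0
    for low, high in cell:
        result *= high - low
    return result


unit_square: Cell = ((0.0, 1.0), (0.0, 1.0))
complete = dyadic_partition(unit_square, 0, 4, lambda c: True)
pruned = dyadic_partition(unit_square, 0, 4, crosses_diagonal)
```

The pruned partition keeps small cells only near the boundary and leaves large cells elsewhere, which is the kind of unbalanced structure the later slides argue for.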

6 The Classification Problem Problem:

7 Classifiers The Bayes Classifier: Minimum Empirical Risk Classifier:
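The formulas for slide 7 did not survive in this transcript. For orientation, the standard textbook definitions of the two classifiers it names are given below; the notation ($\eta$, $R$, $\hat{R}_n$, $\mathcal{F}$) is an assumption and may differ from what the slide used.

```latex
% Assumed notation: (X, Y) with Y in {0, 1}, training sample (X_1, Y_1), ..., (X_n, Y_n),
% and a candidate class of classifiers \mathcal{F}.
\begin{align*}
  \eta(x) &= P(Y = 1 \mid X = x), &
  f^*(x) &= \mathbf{1}\{\eta(x) \ge 1/2\}
    && \text{(Bayes classifier)} \\
  R(f) &= P\bigl(f(X) \ne Y\bigr), &
  \hat{R}_n(f) &= \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}\{f(X_i) \ne Y_i\}
    && \text{(risk and empirical risk)} \\
  & & \hat{f}_n &= \arg\min_{f \in \mathcal{F}} \hat{R}_n(f)
    && \text{(minimum empirical risk classifier)}
\end{align*}
```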

8 Generalization Error Bounds

9

10

11 Selecting a good h

12 Convergence to Bayes Error

13 Ex. Dyadic Classification Trees [Figure panels: labeled training data; Bayes decision boundary; complete RDP; pruned RDP; dyadic classification tree]

14 Codes for DCTs [Figure: tree with nodes labeled by code bits] code-lengths: ex: code: 0001001111 + 6 bits for leaf labels
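One way such a tree code can be built is sketched below in Python. This is an illustrative convention, not necessarily the one on the slide: the tree is walked in preorder, writing 0 at each internal node and 1 at each leaf, and one extra bit per leaf stores its class label. The names `Node`, `structure_bits`, and `label_bits` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    label: Optional[int] = None        # class label (0 or 1), set only at leaves
    left: Optional["Node"] = None
    right: Optional["Node"] = None

    def is_leaf(self) -> bool:
        return self.left is None and self.right is None


def structure_bits(node: Node) -> str:
    """Preorder code for the tree shape: 0 = internal node, 1 = leaf."""
    if node.is_leaf():
        return "1"
    return "0" + structure_bits(node.left) + structure_bits(node.right)


def label_bits(node: Node) -> str:
    """One bit per leaf, in left-to-right order, giving its class label."""
    if node.is_leaf():
        return str(node.label)
    return label_bits(node.left) + label_bits(node.right)


# A small three-leaf tree: the root splits, and its right child splits again.
tree = Node(left=Node(label=0),
            right=Node(left=Node(label=1), right=Node(label=0)))

shape_code = structure_bits(tree)
leaf_code = label_bits(tree)
total_length = len(shape_code) + len(leaf_code)
```

Under this convention a tree with |T| leaves costs 2|T| - 1 bits for its shape plus |T| bits for its labels, so the code length depends only on the number of leaves.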

15 Error Bounds for DCTs Compare with CART:

16 Rate of Convergence Suppose that the Bayes decision boundary behaves locally like a Lipschitz function (Mammen & Tsybakov '99; C. Scott & RN '02).

17 Why too slow? Because the Bayes boundary is a (d-1)-dimensional manifold, "good" trees are unbalanced, yet all |T|-leaf trees are equally favored.

18 Local Error Bounds in Classification Spatial Error Decomposition: Mansour & McAllester '00

19 Relative Chernoff Bound
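The statement on slide 19 is not preserved in the transcript. The standard relative (multiplicative) Chernoff bound for an empirical frequency is reproduced below for reference; the symbols $p$, $\hat{p}$, $\epsilon$, and $\delta$ are assumed notation, and the slide may state the bound in a different but related form.

```latex
% Assumption: \hat{p} = \frac{1}{n}\sum_{i=1}^{n} Z_i with Z_i i.i.d. Bernoulli(p), and 0 \le \epsilon \le 1.
\begin{align*}
  P\bigl(\hat{p} \le (1 - \epsilon)\, p\bigr) &\le \exp\!\left(-\frac{n p\, \epsilon^2}{2}\right), \\
  P\bigl(\hat{p} \ge (1 + \epsilon)\, p\bigr) &\le \exp\!\left(-\frac{n p\, \epsilon^2}{3}\right).
\end{align*}
% Setting the first right-hand side equal to \delta and solving for \epsilon gives,
% with probability at least 1 - \delta,
\[
  p \;\le\; \hat{p} + \sqrt{\frac{2\, p \log(1/\delta)}{n}} .
\]
```

The deviation term scales with $\sqrt{p}$, so it is small for events of small probability; this is what makes the bound useful for cells of small volume.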

20

21 Local Error Bounds in Classification

22 Bounded Densities

23 Global vs. Local Key: local complexity is offset by small volumes!

24 Local Bounds for DCTs

25 Unbalanced Tree J leaves, depth J-1 Global bound: Local bound:
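The two bounds on slide 25 are not preserved, so the Python sketch below only illustrates the qualitative point, under loudly stated assumptions. The "global" penalty is taken to scale like sqrt(J / n), as for a penalty that depends only on the number of leaves. The "local" penalty is taken to be a sum over leaves of sqrt(volume × depth / n), with the cell volume standing in for its probability mass (reasonable only under a bounded density). Constants and log factors are dropped in both, and the function names are hypothetical.

```python
import math
from typing import List


def unbalanced_leaf_depths(num_leaves: int) -> List[int]:
    """Leaf depths of a maximally unbalanced tree: one leaf per depth
    1, ..., J-1, plus a second leaf at the deepest level J-1."""
    return list(range(1, num_leaves)) + [num_leaves - 1]


def global_penalty(num_leaves: int, n: int) -> float:
    """Assumed form: depends only on the number of leaves."""
    return math.sqrt(num_leaves / n)


def local_penalty(num_leaves: int, n: int) -> float:
    """Assumed form: each leaf pays for its depth, weighted by its volume."""
    total = 0.0
    for depth in unbalanced_leaf_depths(num_leaves):
        volume = 2.0 ** (-depth)       # a dyadic cell at this depth
        total += math.sqrt(volume * depth / n)
    return total


n = 10_000
for J in (10, 50, 200):
    print(J, round(global_penalty(J, n), 4), round(local_penalty(J, n), 4))
```

Under these assumed forms the global penalty keeps growing with J, while the local penalty levels off because deep leaves have geometrically small volume.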

26 Convergence to Bayes Error Mammen & Tsybakov '99; C. Scott & RN '03

27 Concluding Remarks ~ data-dependent bound Neural Information Processing Systems 2002, 2003 nowak@engr.wisc.edu

