A Bootstrap Interval Estimator for Bayes' Classification Error
Chad M. Hawes a,b, Carey E. Priebe a
a The Johns Hopkins University, Dept. of Applied Mathematics & Statistics
b The Johns Hopkins University Applied Physics Laboratory

Abstract
Given a finite-length classifier training set, we propose a new estimation approach that provides an interval estimate of the Bayes-optimal classification error L*, by:
- Assuming power-law decay for the unconditional error rate of the k-nearest neighbor (kNN) classifier
- Constructing bootstrap-sampled training sets of varying size
- Evaluating the kNN classifier on the bootstrap training sets to estimate its unconditional error rate
- Fitting the resulting kNN error-rate decay, as a function of training-set size, to the assumed power-law form
The standard kNN rule provides an upper bound on L*; Hellman's (k,k') nearest neighbor rule with reject option provides a lower bound on L*. The result is an asymptotic interval estimate of L* obtained from a finite sample. We apply this L* interval estimator to two classification datasets.

Motivation
Knowledge of the Bayes-optimal classification error L* tells us the best any classification rule could do on a given classification problem:
- The difference between your classifier's error rate L_n and L* indicates how much improvement is possible by changing your classifier, for a fixed feature set.
- If L* is small and |L_n - L*| is large, then it is worth spending time and money to improve your classifier.
Knowledge of L* also indicates how good our features are for discriminating between our (two) classes:
- If L* is large and |L_n - L*| is small, then it is better to spend time and money finding better features (changing F_XY) than improving your classifier.
An estimate of the Bayes error L* is therefore useful for guiding where to invest time and money: classifier improvement versus feature development.

Model & Notation
- Training data: D_n = {(X_1, Y_1), ..., (X_n, Y_n)}, with feature vectors X_i in R^d and class labels Y_i in {0, 1}.
- Testing data: T_m, a test set of size m.
- From D_n we build the k-nearest neighbor (kNN) classification rule, denoted g_n.
- Conditional probability of error for the kNN rule: finite sample L_n = P(g_n(X) != Y | D_n); asymptotic: lim_{n->inf} L_n.
- Unconditional probability of error for the kNN rule: finite sample E[L_n]; asymptotic: L_inf(k) = lim_{n->inf} E[L_n].
- The empirical distribution puts mass 1/n on the n training samples.

Theory
- No approach to estimate the Bayes error can work for all joint distributions F_XY. Devroye 1982: for any (fixed) integer n, any eps > 0, and any classification rule g_n, there exists a distribution F_XY with Bayes error L* = 0 such that E[L_n] > 1/2 - eps. => There exist conditions on F_XY for which our technique applies.
- Asymptotic kNN-rule error rates form an interval bound on L*. Devijver 1979: for fixed k, L_inf(k,k') <= L* <= L_inf(k), where the lower bound L_inf(k,k') is the asymptotic error rate of the kNN rule with reject option (Hellman 1970). => If we estimate the asymptotic rates with a finite sample, we have an L* estimate.
- The kNN rule's unconditional error follows a known form for a class of distributions F_XY. Snapp & Venkatesh 1998: under regularity conditions on F_XY, the finite-sample unconditional error rate of the kNN rule, for fixed k, follows an asymptotic expansion of the form E[L_n] ~ L_inf(k) + sum_{j>=2} c_j n^(-j/d). => There is a known parametric form for the kNN rule's error-rate decay.

Approach: Part 1
1. Construct B bootstrap-sampled training datasets of size n_j from D_n, sampling with replacement from the empirical distribution. For each bootstrap-constructed training dataset, estimate the kNN-rule conditional error rate on the test set T_m, yielding B error-rate estimates at size n_j.
2. Estimate the mean and variance of these B estimates for training sample size n_j: the mean provides an estimate of the unconditional error rate; the variance is used for weighted fitting of the error-rate decay curve.
3. Repeat steps 1 and 2 for the desired training sample sizes n_1 < n_2 < ... < n_J. This yields unconditional error-rate estimates across the chosen training sizes.
4. Construct the estimated unconditional error-rate decay curve versus training sample size n.

Approach: Part 2
1. Assume the kNN-rule error rates decay according to a simple power-law form, E[L_n] ~ L_inf + a * n^(-b).
2. Perform a weighted nonlinear least-squares fit to the constructed error-rate curve, using the variances of the bootstrapped conditional error-rate estimates as weights (see the code sketch below).
3. The resulting fitted asymptote, an estimate of L_inf(k), forms the upper bound for L*. The strong assumption on the form of the error-rate decay is what enables estimation of the asymptotic error rate from only a finite sample.
4. Repeat the entire procedure using Hellman's (k,k') nearest neighbor rule with reject option to form the lower-bound estimate of L_inf(k,k'). This yields the interval estimate [estimated L_inf(k,k'), estimated L_inf(k)] for the Bayes classification error.
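The two parts above can be summarized in a minimal sketch, not the authors' implementation: it assumes scikit-learn's KNeighborsClassifier for the kNN rule and scipy's curve_fit for the weighted fit, and the helper names, bootstrap count B, neighbor count k, starting values, and random seed are illustrative choices made here.

    # Minimal sketch of the bootstrap error-curve / power-law-fit procedure.
    # Assumptions (not from the poster): scikit-learn kNN, scipy curve_fit,
    # and the helper names / default parameters chosen below.
    import numpy as np
    from scipy.optimize import curve_fit
    from sklearn.neighbors import KNeighborsClassifier

    rng = np.random.default_rng(0)

    def bootstrap_error_curve(X_train, y_train, X_test, y_test, sizes, B=50, k=5):
        """For each training size n_j, draw B bootstrap samples from the empirical
        distribution of the training data, fit a kNN rule, and score it on the
        fixed test set.  Returns per-size means and variances of the error rate."""
        X_train, y_train = np.asarray(X_train), np.asarray(y_train)
        X_test, y_test = np.asarray(X_test), np.asarray(y_test)
        means, variances = [], []
        for n_j in sizes:
            errs = np.empty(B)
            for b in range(B):
                idx = rng.integers(0, len(y_train), size=n_j)      # sample with replacement
                clf = KNeighborsClassifier(n_neighbors=k)
                clf.fit(X_train[idx], y_train[idx])
                errs[b] = np.mean(clf.predict(X_test) != y_test)   # conditional error estimate
            means.append(errs.mean())                              # unconditional error estimate
            variances.append(errs.var(ddof=1))                     # weight for the fit
        return np.asarray(means), np.asarray(variances)

    def power_law(n, L_inf, a, b):
        """Assumed decay form: E[L_n] ~ L_inf + a * n**(-b)."""
        return L_inf + a * n ** (-b)

    def fit_asymptotic_error(sizes, means, variances):
        """Weighted nonlinear least-squares fit; the fitted L_inf estimates the
        asymptotic (n -> infinity) error rate of the rule being bootstrapped."""
        sigma = np.sqrt(variances) + 1e-12                         # avoid zero weights
        popt, _ = curve_fit(power_law, np.asarray(sizes, dtype=float), means,
                            p0=[float(means[-1]), 1.0, 0.5], sigma=sigma,
                            bounds=([0.0, 0.0, 0.0], [1.0, np.inf, np.inf]))
        return popt[0]

Running this once with the standard kNN rule gives the upper endpoint of the interval; repeating it with a (k,k') reject-option rule in place of KNeighborsClassifier gives the lower endpoint (a sketch of such a rule follows the results below).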
PMH Distribution
The Priebe, Marchette, Healy (PMH) distribution has known L* = 0.0653, d = 6; training set size n = 2000, test set size m = 2000.
[Figure: symbols are bootstrap estimates of the unconditional error rate versus training size; the resulting interval estimate for L* is reported on the poster.]

Pima Indians
The UCI Pima Indian Diabetes dataset has unknown L*, d = 8; training set size n = 500, test set size m = 268.
[Figure: symbols are bootstrap estimates of the unconditional error rate versus training size; the resulting interval estimate for L* is reported on the poster.]

References
[1] Devijver, P. "New error bounds with the nearest neighbor rule," IEEE Trans. on Information Theory, 25, 1979.
[2] Devroye, L. "Any discrimination rule can have an arbitrarily bad probability of error for finite sample size," IEEE Trans. on Pattern Analysis & Machine Intelligence, 4, 1982.
[3] Hellman, M. "The nearest neighbor classification rule with a reject option," IEEE Trans. on Systems Science & Cybernetics, 6, 1970.
[4] Priebe, C., D. Marchette, & D. Healy. "Integrated sensing and processing decision trees," IEEE Trans. on Pattern Analysis & Machine Intelligence, 26, 2004.
[5] Snapp, R. & S. Venkatesh. "Asymptotic expansions of the k nearest neighbor risk," Annals of Statistics, 26, 1998.
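As a closing illustration, here is a hedged sketch of one way to implement Hellman's (k,k') nearest-neighbor rule with a reject option, which supplies the lower endpoint of the interval. The accept threshold and the convention that rejected test points are not counted as errors are assumptions made for this sketch, not details taken from the poster.

    # Hedged sketch of a (k, k') nearest-neighbor rule with a reject option
    # (after Hellman 1970).  Error-accounting convention assumed here:
    # rejected points count as neither correct nor incorrect.
    import numpy as np
    from sklearn.neighbors import NearestNeighbors

    def knn_reject_error(X_train, y_train, X_test, y_test, k=5, k_prime=4):
        """Classify a test point only if at least k_prime of its k nearest
        training neighbors share a label; otherwise reject.  Returns the
        fraction of test points that are accepted and misclassified."""
        X_train, y_train = np.asarray(X_train), np.asarray(y_train)
        nn = NearestNeighbors(n_neighbors=k).fit(X_train)
        _, neighbor_idx = nn.kneighbors(np.asarray(X_test))
        errors = 0
        for neighbors, true_label in zip(neighbor_idx, np.asarray(y_test)):
            labels, counts = np.unique(y_train[neighbors], return_counts=True)
            winner = counts.argmax()
            if counts[winner] >= k_prime:          # enough agreement: accept
                errors += int(labels[winner] != true_label)
            # else: reject -- withhold a decision (assumed not to count as an error)
        return errors / len(y_test)

Feeding this error rate, in place of the plain kNN error, through the same bootstrap-and-fit pipeline sketched above gives the lower-endpoint fit, and the pair of fitted asymptotes brackets L*.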

