Presentation on theme: "Score Function for Data Mining Algorithms. Based on Chapter 7 of Hand, Mannila, & Smyth. David Madigan." (Presentation transcript)

1 Score Function for Data Mining Algorithms
Based on Chapter 7 of Hand, Mannila, & Smyth
David Madigan

2 Introduction
e.g., how to pick the "best" a and b in Y = aX + b; the usual score in this case is the sum of squared errors
Scores for patterns versus scores for models
Predictive versus descriptive scores
Typical scores are a poor substitute for utility-based scores
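A minimal sketch of the opening example, assuming NumPy and made-up data: pick a and b in Y = aX + b by minimizing the sum of squared errors.

```python
import numpy as np

# Minimal sketch: choose a and b in Y = aX + b by minimizing the
# sum of squared errors (the usual score function for this model).
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=100)
y = 2.0 * x + 1.0 + rng.normal(scale=1.5, size=100)   # synthetic data

# Closed-form least-squares solution for the two coefficients.
X = np.column_stack([x, np.ones_like(x)])             # design matrix [X, 1]
(a_hat, b_hat), *_ = np.linalg.lstsq(X, y, rcond=None)

sse = np.sum((y - (a_hat * x + b_hat)) ** 2)          # the score being minimized
print(f"a = {a_hat:.3f}, b = {b_hat:.3f}, SSE = {sse:.2f}")
```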

3 Predictive Model Scores
Assume all observations are equally important
Depend on differences rather than values
Symmetric
Proper scoring rules
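A hedged sketch of two common predictive scores for probability forecasts of a binary outcome: squared error (the Brier score) and log loss. Both are proper scoring rules, i.e., minimized in expectation by the true probability. The function names, forecasts, and data below are illustrative assumptions, not from the slides.

```python
import numpy as np

def brier_score(p, y):
    # Mean squared error between forecast probabilities and 0/1 outcomes.
    return np.mean((p - y) ** 2)

def log_loss(p, y, eps=1e-12):
    # Negative average log probability assigned to the observed outcomes.
    p = np.clip(p, eps, 1 - eps)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

rng = np.random.default_rng(1)
true_p = 0.7
y = rng.binomial(1, true_p, size=10_000)

for forecast in (0.5, 0.7, 0.9):
    p = np.full_like(y, forecast, dtype=float)
    print(forecast, round(brier_score(p, y), 4), round(log_loss(p, y), 4))
# The forecast closest to the true probability (0.7) gets the best score.
```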

4 Descriptive Model Scores
"Pick the model that assigns the highest probability to what actually happened"
Many different scoring rules for non-probabilistic models
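A small sketch of a likelihood-based descriptive score: fit two candidate density models to the same data and prefer the one that assigns the higher probability to what was observed. The candidate models (Normal versus Laplace) and the data are my own illustrative choices.

```python
import numpy as np
from scipy.stats import norm, laplace

rng = np.random.default_rng(6)
data = rng.normal(loc=0.0, scale=1.0, size=500)

# Log-likelihood of the data under each fitted model (MLE plug-in fits).
loglik_normal = np.sum(norm.logpdf(data, loc=data.mean(), scale=data.std()))
mad = np.mean(np.abs(data - np.median(data)))          # MLE of the Laplace scale
loglik_laplace = np.sum(laplace.logpdf(data, loc=np.median(data), scale=mad))

print(f"Normal model log-likelihood:  {loglik_normal:.1f}")
print(f"Laplace model log-likelihood: {loglik_laplace:.1f}")
# The higher log-likelihood identifies the better-fitting descriptive model.
```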

5 Scoring Models with Different Complexities
Bias-variance tradeoff (MSE = Variance + Bias²)
score(model) = error(model) + penalty-function(model)
AIC = 2 S_L + 2 d, where S_L is the negative log-likelihood and d is the number of fitted parameters
Selects overly complex models?
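A sketch of the "error plus penalty" idea using AIC to compare polynomial regression models of increasing complexity. The Gaussian-error setup, the quadratic truth, and all numbers are assumptions made for illustration.

```python
import numpy as np

# score(model) = error(model) + penalty(model), here AIC = 2*S_L + 2*d,
# with S_L the negative log-likelihood and d the number of fitted parameters.
rng = np.random.default_rng(2)
n = 60
x = np.linspace(-2, 2, n)
y = 1.0 - 0.5 * x + 0.8 * x**2 + rng.normal(scale=0.7, size=n)  # quadratic truth

for degree in range(1, 7):
    coefs = np.polyfit(x, y, degree)
    resid = y - np.polyval(coefs, x)
    sigma2 = np.mean(resid**2)                        # MLE of the error variance
    neg_loglik = 0.5 * n * (np.log(2 * np.pi * sigma2) + 1)
    d = (degree + 1) + 1                              # polynomial coefficients plus the error variance
    aic = 2 * neg_loglik + 2 * d
    print(f"degree {degree}: AIC = {aic:.1f}")
# AIC is usually smallest near the true degree (2), though it is sometimes
# criticized for selecting overly complex models.
```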

6 Bayesian Criterion
p(D|M) = ∫ p(D|θ, M) p(θ|M) dθ
Typically impossible to compute analytically
All sorts of Monte Carlo approximations

7 Laplace Method for p(D|M)
Write p(D|M) = ∫ p(D|θ, M) p(θ|M) dθ = ∫ exp{n h(θ)} dθ, where h(θ) is the log of the integrand divided by n
Laplace's Method: expanding h about its mode θ̃ gives
p(D|M) ≈ (2π)^(d/2) |Σ|^(1/2) p(D|θ̃, M) p(θ̃|M), where Σ is the inverse of the negative Hessian of the log-integrand at θ̃ and d is the dimension of θ
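A hedged numerical sketch of Laplace's method on a toy Bernoulli model with a Beta prior, where the exact marginal likelihood is known and can be used as a check. The data counts and the prior are my own assumptions.

```python
import numpy as np
from scipy.special import betaln

# Laplace approximation to p(D|M) for a Bernoulli likelihood with a Beta(a, b)
# prior. log_integrand is the log of the integrand (log-likelihood + log-prior).
n, s = 50, 18          # n trials, s successes (made-up data)
a, b = 2.0, 2.0        # Beta prior parameters (assumed)

def log_integrand(theta):
    return ((s + a - 1) * np.log(theta)
            + (n - s + b - 1) * np.log(1 - theta)
            - betaln(a, b))

# Posterior mode and the second derivative of the log-integrand at the mode.
mode = (s + a - 1) / (n + a + b - 2)
second_deriv = -(s + a - 1) / mode**2 - (n - s + b - 1) / (1 - mode)**2

# Laplace approximation: integrand at the mode times a Gaussian correction.
log_laplace = log_integrand(mode) + 0.5 * np.log(2 * np.pi / -second_deriv)

# Exact log marginal likelihood for the Beta-Bernoulli model.
log_exact = betaln(a + s, b + n - s) - betaln(a, b)
print(f"Laplace: {log_laplace:.4f}   exact: {log_exact:.4f}")
```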

8 Laplace cont.
Tierney & Kadane (1986, JASA) show the approximation is O(n^-1)
Using the MLE instead of the posterior mode is also O(n^-1)
Using the expected information matrix in Σ is O(n^-1/2), but convenient since it is often computed by standard software
Raftery (1993) suggested approximating the posterior mode by a single Newton step starting at the MLE
Note that the prior is explicit in these approximations

9 Monte Carlo Estimates of p(D|M)
Draw iid θ1, …, θm from the prior p(θ) and average the likelihood values:
p̂(D|M) = (1/m) Σ_i p(D|θi)
In practice this estimator has large variance
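A sketch of this simple prior-sampling estimator on the same toy Beta-Bernoulli model as above (an assumption), where the exact answer is available for comparison.

```python
import numpy as np
from scipy.special import betaln

# Simple Monte Carlo estimate of p(D|M): average the likelihood p(D|theta_i)
# over iid draws theta_1, ..., theta_m from the prior.
rng = np.random.default_rng(3)
n, s = 50, 18
a, b = 2.0, 2.0
m = 100_000

theta = rng.beta(a, b, size=m)                         # draws from the prior
log_lik = s * np.log(theta) + (n - s) * np.log(1 - theta)
# Average the likelihood values in a numerically stable way (log-sum-exp).
log_phat = np.logaddexp.reduce(log_lik) - np.log(m)

log_exact = betaln(a + s, b + n - s) - betaln(a, b)
print(f"MC estimate: {log_phat:.4f}   exact: {log_exact:.4f}")
# Re-running with different seeds shows the estimator's variance, which grows
# when the prior places little mass where the likelihood is concentrated.
```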

10 Monte Carlo Estimates of p(D|M) (cont.)
Draw iid θ1, …, θm from p(θ|D):
"Importance Sampling"
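The slide's exact estimator is not reproduced here; below is a hedged sketch of the generic importance-sampling form, p̂(D|M) = (1/m) Σ p(D|θi) p(θi) / g(θi), with draws from an importance density g chosen close to the posterior. The toy model, prior, and choice of g are my assumptions.

```python
import numpy as np
from scipy.special import betaln
from scipy.stats import beta as beta_dist

# Importance-sampling estimate of p(D|M) on the Beta-Bernoulli toy model.
rng = np.random.default_rng(4)
n, s = 50, 18
a, b = 2.0, 2.0
m = 100_000

# Importance density g: a Beta shaped like the likelihood (close to the posterior).
theta = rng.beta(s + 1, n - s + 1, size=m)
log_g = beta_dist(s + 1, n - s + 1).logpdf(theta)

log_lik = s * np.log(theta) + (n - s) * np.log(1 - theta)
log_prior = beta_dist(a, b).logpdf(theta)
log_w = log_lik + log_prior - log_g                    # log of p(D|theta)p(theta)/g(theta)
log_phat = np.logaddexp.reduce(log_w) - np.log(m)

log_exact = betaln(a + s, b + n - s) - betaln(a, b)
print(f"IS estimate: {log_phat:.4f}   exact: {log_exact:.4f}")
```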

11 Monte Carlo Estimates of p(D|M) (cont.)
Newton and Raftery's "Harmonic Mean Estimator":
p̂(D|M) = [ (1/m) Σ_i 1/p(D|θi) ]^-1, with θ1, …, θm drawn from p(θ|D)
Unstable in practice and needs modification
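A sketch of the harmonic mean estimator on the same toy model (an assumption), run with several seeds to show its variability.

```python
import numpy as np
from scipy.special import betaln

# Harmonic mean estimator: posterior draws, harmonic mean of the likelihoods.
n, s = 50, 18
a, b = 2.0, 2.0
m = 100_000
log_exact = betaln(a + s, b + n - s) - betaln(a, b)

for seed in range(5):
    rng = np.random.default_rng(seed)
    theta = rng.beta(a + s, b + n - s, size=m)         # draws from the posterior
    log_lik = s * np.log(theta) + (n - s) * np.log(1 - theta)
    # log of [ (1/m) * sum_i 1/p(D|theta_i) ]^(-1)
    log_phat = np.log(m) - np.logaddexp.reduce(-log_lik)
    print(f"seed {seed}: harmonic mean = {log_phat:.4f}   exact = {log_exact:.4f}")
# Occasional draws with tiny likelihood dominate the sum of 1/p(D|theta_i),
# which is why this estimator is unstable in practice.
```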

12 p(D|M) from Gibbs Sampler Output
Suppose we decompose θ into (θ1, θ2) such that p(θ1|D, θ2) and p(θ2|D, θ1) are available in closed form…
First note the following identity (for any θ*): p(D) = p(D|θ*) p(θ*) / p(θ*|D)   [Chib (1995)]
p(D|θ*) and p(θ*) are usually easy to evaluate. What about p(θ*|D)?

13 p(D|M) from Gibbs Sampler Output
The Gibbs sampler gives (dependent) draws from p(θ1, θ2|D), and hence marginally from p(θ2|D)…
…so p(θ1*|D) can be estimated by p̂(θ1*|D) = (1/G) Σ_g p(θ1*|D, θ2(g)); the remaining factor p(θ2*|θ1*, D) is available in closed form
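A sketch of this two-block construction on a conjugate Normal model with unknown mean and variance. The model, priors, data, and all names below are my own choices for illustration, not taken from the slides; the estimate of p(θ1*|D) averages the closed-form conditional over the Gibbs draws, exactly as described above.

```python
import numpy as np
from scipy.stats import norm, invgamma

# Chib (1995): log p(D) = log p(D|mu*, s2*) + log p(mu*, s2*) - log p(mu*, s2*|D),
# with p(mu*, s2*|D) = p(mu*|D) * p(s2*|mu*, D); the second factor is in closed
# form and p(mu*|D) is estimated by averaging p(mu*|D, s2) over Gibbs draws of s2.
rng = np.random.default_rng(5)
y = rng.normal(loc=1.5, scale=2.0, size=40)            # made-up data
n, ybar = len(y), y.mean()
mu0, tau2 = 0.0, 10.0                                  # N(mu0, tau2) prior on mu
a0, b0 = 2.0, 2.0                                      # InvGamma(a0, b0) prior on s2

def mu_conditional(s2):
    """Mean and variance of the closed-form conditional p(mu | D, s2)."""
    var = 1.0 / (1.0 / tau2 + n / s2)
    return var * (mu0 / tau2 + n * ybar / s2), var

# Two-block Gibbs sampler for (mu, s2); burn-in omitted for brevity.
mu, s2, s2_draws = ybar, y.var(), []
for _ in range(5000):
    m, v = mu_conditional(s2)
    mu = rng.normal(m, np.sqrt(v))
    s2 = 1.0 / rng.gamma(a0 + n / 2, 1.0 / (b0 + 0.5 * np.sum((y - mu) ** 2)))
    s2_draws.append(s2)

# Evaluation point (mu*, s2*): any high-posterior point works.
mu_star, s2_star = ybar, float(np.mean(s2_draws))

# p(mu*|D): Rao-Blackwellized average of p(mu*|D, s2) over the s2 draws.
cond = [mu_conditional(s2g) for s2g in s2_draws]
log_post_mu = np.log(np.mean([norm.pdf(mu_star, m, np.sqrt(v)) for m, v in cond]))
# p(s2*|mu*, D): available in closed form (inverse gamma).
log_post_s2 = invgamma.logpdf(s2_star, a0 + n / 2,
                              scale=b0 + 0.5 * np.sum((y - mu_star) ** 2))

log_lik = np.sum(norm.logpdf(y, loc=mu_star, scale=np.sqrt(s2_star)))
log_prior = (norm.logpdf(mu_star, loc=mu0, scale=np.sqrt(tau2))
             + invgamma.logpdf(s2_star, a0, scale=b0))

log_pD = log_lik + log_prior - (log_post_mu + log_post_s2)
print(f"Chib estimate of log p(D|M): {log_pD:.3f}")
```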

14 What about 3 parameter blocks…
Decompose p(θ1*, θ2*, θ3*|D) = p(θ1*|D) p(θ2*|θ1*, D) p(θ3*|θ1*, θ2*, D): the first factor can be estimated from the main Gibbs output (OK) and the last is available in closed form (OK), but what about p(θ2*|θ1*, D)?
To get the draws needed for that factor, continue the Gibbs sampler, sampling in turn from p(θ2|D, θ1*, θ3) and p(θ3|D, θ1*, θ2)

15 Bayesian Information Criterion
BIC: log p(D|M) ≈ log p(D|θ̂, M) - (d/2) log n
BIC is an O(1) approximation to log p(D|M)
Circumvents an explicit prior
The approximation is O(n^-1/2) for a class of priors called "unit information priors"
No free lunch (Weakliem (1998) example)
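A small sketch of BIC as an approximation to the log marginal likelihood, checked against the exact value on the Beta-Bernoulli toy model used above (the data counts and Beta(2, 2) prior are assumptions).

```python
import numpy as np
from scipy.special import betaln

# BIC approximation: log p(D|M) ~ log p(D|theta_hat) - (d/2) * log(n).
n, s = 50, 18
a, b = 2.0, 2.0

theta_hat = s / n                                      # MLE of the Bernoulli parameter
log_lik_hat = s * np.log(theta_hat) + (n - s) * np.log(1 - theta_hat)
d = 1                                                  # one free parameter
bic_approx = log_lik_hat - 0.5 * d * np.log(n)

log_exact = betaln(a + s, b + n - s) - betaln(a, b)    # exact log p(D|M)
print(f"BIC approximation: {bic_approx:.3f}   exact: {log_exact:.3f}")
# The BIC error is O(1), so the two numbers agree only roughly.
```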

16 Score Functions on Hold-Out Data
Instead of penalizing complexity, look at performance on hold-out data
Note: even using hold-out data, performance results can be optimistically biased

17 Classification performance when the data are in fact noise… (figure)
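The figure itself is not reproduced. As a stand-in, here is a hedged simulation sketch of the phenomenon: the labels are pure noise, yet selecting the "best" of many useless rules by its hold-out accuracy reports a score well above the 50% that is actually achievable. All numbers are made up.

```python
import numpy as np

rng = np.random.default_rng(7)
n_holdout, n_rules = 100, 200

X = rng.normal(size=(n_holdout, n_rules))
y = rng.integers(0, 2, size=n_holdout)                 # labels are pure noise

# Each "classifier" j predicts 1 when feature j is positive (or negative).
accs = []
for j in range(n_rules):
    for sign in (1, -1):
        pred = (sign * X[:, j] > 0).astype(int)
        accs.append(np.mean(pred == y))

print(f"average hold-out accuracy: {np.mean(accs):.2f}")   # about 0.50
print(f"selected (best) accuracy:  {np.max(accs):.2f}")    # noticeably higher
# The model picked for its hold-out score looks better than it really is.
```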

