
1 Ch 3. Linear Models for Regression (2/2) Pattern Recognition and Machine Learning, C. M. Bishop, 2006. Summarized by Yung-Kyun Noh, Biointelligence Laboratory, Seoul National University

2 Contents
3.4 Bayesian Model Comparison
3.5 The Evidence Approximation
 3.5.1 Evaluation of the evidence function
 3.5.2 Maximizing the evidence function
 3.5.3 Effective number of parameters
3.6 Limitations of Fixed Basis Functions

3 Bayesian Model Comparison (1/3)
The problem of model selection from a Bayesian perspective
- Over-fitting associated with maximum likelihood can be avoided by marginalizing over the model parameters instead of making point estimates of their values.
- It also allows multiple complexity parameters to be determined simultaneously as part of the training process (as in the relevance vector machine).
- The Bayesian view of model comparison simply involves the use of probabilities to represent uncertainty in the choice of model.
Posterior over models: p(M_i | D) ∝ p(M_i) p(D | M_i)
- p(M_i): the prior, expressing a preference for different models.
- p(D | M_i): the model evidence (marginal likelihood), expressing the preference shown by the data for different models; the parameters have been marginalized out.

4 Bayesian Model Comparison (2/3)
Bayes factor: the ratio of model evidences for two models, p(D | M_i) / p(D | M_j).
Predictive distribution: a mixture distribution, obtained by averaging the predictive distributions of the individual models, weighted by their posterior probabilities:
 p(t | x, D) = Σ_i p(t | x, M_i, D) p(M_i | D)
Model evidence: p(D | M_i) = ∫ p(D | w, M_i) p(w | M_i) dw
- Sampling perspective: the marginal likelihood can be viewed as the probability of generating the data set D from a model whose parameters are sampled at random from the prior (a sketch of this view follows below).
- The evidence is also the normalizing term that appears in the denominator when evaluating the posterior distribution over parameters:
 p(w | D, M_i) = p(D | w, M_i) p(w | M_i) / p(D | M_i)
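The sampling perspective can be made concrete. Below is a minimal sketch, assuming a Gaussian prior w ~ N(0, α⁻¹I) and Gaussian noise with precision β; the function name mc_log_evidence and its defaults are illustrative, not from the slides. It estimates ln p(t) by averaging the likelihood over parameter vectors drawn from the prior.

```python
import numpy as np

def mc_log_evidence(X, t, alpha, beta, num_samples=10000, rng=None):
    """Monte Carlo estimate of ln p(t | X) for t = X w + noise,
    with prior w ~ N(0, alpha^{-1} I) and noise precision beta.
    (Illustrative sketch, not code from the slides.)"""
    rng = np.random.default_rng() if rng is None else rng
    N, M = X.shape
    # Sample parameter vectors from the prior.
    W = rng.normal(0.0, alpha ** -0.5, size=(num_samples, M))
    # Log likelihood ln p(t | w, beta) for every prior sample.
    resid = t[None, :] - W @ X.T                      # (num_samples, N)
    log_lik = (0.5 * N * np.log(beta / (2 * np.pi))
               - 0.5 * beta * np.sum(resid ** 2, axis=1))
    # log-mean-exp keeps the average numerically stable.
    m = log_lik.max()
    return m + np.log(np.mean(np.exp(log_lik - m)))
```

The Bayes factor of two models is then the exponential of the difference of their estimated log evidences; in practice the likelihood of any single prior sample underflows for realistic N, which is why the averaging is done in log space.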

5 Bayesian Model Comparison (3/3)
Assume that the posterior distribution is sharply peaked around the most probable value w_MAP, with width Δw_posterior, and that the prior is flat with width Δw_prior. For a model having a set of M parameters,
 ln p(D) ≈ ln p(D | w_MAP) + M ln(Δw_posterior / Δw_prior)
(the single-parameter argument behind this is sketched below). A simple model has little variability and so will generate data sets that are fairly similar to each other. A complex model spreads its predictive probability over too broad a range of data sets and so assigns relatively small probability to any one of them.
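The reasoning behind this approximation, following the single-parameter argument of PRML Section 3.4 (a sketch; Δw_posterior and Δw_prior denote the widths of the peaked posterior and the flat prior):

```latex
% One parameter: approximate the integrand by a box of height
% p(D | w_MAP) and width \Delta w_{posterior}, under a flat prior
% p(w) = 1 / \Delta w_{prior}:
p(\mathcal{D}) = \int p(\mathcal{D} \mid w)\, p(w)\, dw
  \simeq p(\mathcal{D} \mid w_{\mathrm{MAP}})\,
         \frac{\Delta w_{\mathrm{posterior}}}{\Delta w_{\mathrm{prior}}}
% With M parameters, each direction contributes one such ratio:
\ln p(\mathcal{D}) \simeq \ln p(\mathcal{D} \mid \mathbf{w}_{\mathrm{MAP}})
  + M \ln \frac{\Delta w_{\mathrm{posterior}}}{\Delta w_{\mathrm{prior}}}
```

The first term rewards fit to the data; the second is negative (the posterior is narrower than the prior) and its magnitude grows with M, which is exactly the complexity penalty described above.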

6 The Evidence Approximation (1/2)
A fully Bayesian treatment of the linear basis function model
- Hyperparameters: α, β.
- Prediction: marginalize with respect to the hyperparameters as well as w.
Predictive distribution:
 p(t | t) = ∫∫∫ p(t | w, β) p(w | t, α, β) p(α, β | t) dw dα dβ
- If the posterior distribution p(α, β | t) is sharply peaked around values α̂, β̂, the predictive distribution is obtained simply by marginalizing over w with α, β fixed to these values:
 p(t | t) ≈ p(t | t, α̂, β̂) = ∫ p(t | w, β̂) p(w | t, α̂, β̂) dw

7 The Evidence Approximation (2/2)
If the prior p(α, β) is relatively flat, then in the evidence framework the values α̂, β̂ are obtained by maximizing the marginal likelihood function p(t | α, β).
- The hyperparameters can thus be determined from the training data alone, without recourse to cross-validation.
- Recall that the ratio α/β is analogous to a regularization parameter.
Maximizing the evidence:
- Set the derivatives of the evidence function to zero and obtain re-estimation equations for α and β, or
- use the expectation-maximization (EM) algorithm.

8 Evaluation of the Evidence Function
Marginal likelihood: integrate over the weight parameters,
 p(t | α, β) = ∫ p(t | w, β) p(w | α) dw
Model evidence in log form:
 ln p(t | α, β) = (M/2) ln α + (N/2) ln β − E(m_N) − (1/2) ln |A| − (N/2) ln 2π
 where E(m_N) = (β/2) ||t − Φ m_N||² + (α/2) m_N^T m_N, A = α I + β Φ^T Φ, and m_N = β A⁻¹ Φ^T t (a transcription in code follows below).
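A direct transcription of this closed-form log evidence (a sketch, assuming a design matrix Phi of shape N x M, a target vector t, and NumPy as the numerical backend):

```python
import numpy as np

def log_evidence(Phi, t, alpha, beta):
    """ln p(t | alpha, beta) for the Bayesian linear basis-function model."""
    N, M = Phi.shape
    A = alpha * np.eye(M) + beta * Phi.T @ Phi        # A = alpha I + beta Phi^T Phi
    m_N = beta * np.linalg.solve(A, Phi.T @ t)        # posterior mean m_N
    E_mN = (0.5 * beta * np.sum((t - Phi @ m_N) ** 2) # E(m_N)
            + 0.5 * alpha * m_N @ m_N)
    _, logdet_A = np.linalg.slogdet(A)                # ln |A|, computed stably
    return (0.5 * M * np.log(alpha) + 0.5 * N * np.log(beta)
            - E_mN - 0.5 * logdet_A - 0.5 * N * np.log(2 * np.pi))
```

Evaluating this function over a grid of α, β values, or across models with different numbers of basis functions M, turns the model comparison of Section 3.4 into a computable criterion.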

9 Maximizing the Evidence Function
Maximization of ln p(t | α, β): set its derivatives with respect to α and β to zero.
- With respect to α: α = γ / (m_N^T m_N), where γ = Σ_i λ_i / (α + λ_i).
- u_i and λ_i are the eigenvectors and eigenvalues defined by (β Φ^T Φ) u_i = λ_i u_i.
- With respect to β: 1/β = (1/(N − γ)) Σ_n { t_n − m_N^T φ(x_n) }².
- Both right-hand sides depend implicitly on the quantity being estimated, so the two equations are solved by iterating to convergence (see the sketch after this slide).
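A minimal sketch of the resulting fixed-point iteration (the function name, initial values, and convergence test are illustrative choices, not from the slides):

```python
import numpy as np

def estimate_hyperparameters(Phi, t, alpha=1.0, beta=1.0, tol=1e-6, max_iter=1000):
    """Re-estimate alpha and beta by maximizing the evidence function."""
    N, M = Phi.shape
    # The eigenvalues of beta * Phi^T Phi scale linearly with beta,
    # so eigendecompose Phi^T Phi once and rescale inside the loop.
    eig0 = np.linalg.eigvalsh(Phi.T @ Phi)
    gamma = 0.0
    for _ in range(max_iter):
        lam = beta * eig0                              # eigenvalues lambda_i
        gamma = np.sum(lam / (alpha + lam))            # effective no. of parameters
        A = alpha * np.eye(M) + beta * Phi.T @ Phi
        m_N = beta * np.linalg.solve(A, Phi.T @ t)     # posterior mean
        alpha_new = gamma / (m_N @ m_N)                # alpha = gamma / m_N^T m_N
        beta_new = (N - gamma) / np.sum((t - Phi @ m_N) ** 2)
        if abs(alpha_new - alpha) < tol and abs(beta_new - beta) < tol:
            return alpha_new, beta_new, gamma
        alpha, beta = alpha_new, beta_new
    return alpha, beta, gamma
```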

10 Effective Number of Parameters (1/2)
γ = Σ_i λ_i / (α + λ_i): the effective total number of well-determined parameters.
- Directions in parameter space with λ_i ≫ α are well determined by the data and each contributes approximately 1 to γ; directions with λ_i ≪ α are governed by the prior and contribute approximately 0, so 0 ≤ γ ≤ M.

11 Effective Number of Parameters (2/2)
(Figure: plots of the test error and the log evidence against the hyperparameter α, indicating the optimal α.)

12 Limitations of Fixed Basis Functions
Models comprising a linear combination of fixed, nonlinear basis functions
- have closed-form solutions to the least-squares problem, and
- have a tractable Bayesian treatment.
The difficulty
- The basis functions are fixed before the training data set is observed; this is a manifestation of the curse of dimensionality, since the number of basis functions needed grows rapidly with the input dimensionality.
Properties of real data sets that alleviate this problem
- The data vectors {x_n} typically lie close to a nonlinear manifold whose intrinsic dimensionality is smaller than that of the input space.
- Target variables may have significant dependence on only a small number of possible directions within the data manifold.

