Simple Bayesian Supervised Models
Saskia Klein & Steffen Bollmann
Content
- Recap from last week
- Bayesian Linear Regression
  - What is linear regression?
  - Applying Bayes' theorem to linear regression
  - Example
  - Comparison to conventional linear regression
- Bayesian Logistic Regression
- Naive Bayes classifier
Source: Bishop (ch. 3, 4); Barber (ch. 10)
Maximum a posteriori estimation
The Bayesian approach to estimating the parameters of a distribution given a set of observations is to maximize the posterior distribution, which allows prior information to be taken into account:

posterior = likelihood × prior / evidence, i.e. p(θ | D) = p(D | θ) p(θ) / p(D)
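As a concrete illustration (not from the slides): for a Gaussian likelihood with known noise variance and a Gaussian prior on the mean, the posterior is again Gaussian, so the MAP estimate has a closed form. All numbers below are invented for the example.

```python
import numpy as np

# MAP estimate of a Gaussian mean: likelihood N(mu, sigma2) with known
# noise variance sigma2, prior N(mu0, sigma0sq).  Illustrative sketch.
def map_gaussian_mean(x, mu0, sigma0sq, sigma2):
    n = len(x)
    # The posterior is Gaussian, so its mode (the MAP) equals its mean.
    return (sigma2 * mu0 + sigma0sq * np.sum(x)) / (sigma2 + n * sigma0sq)

x = np.array([1.0, 1.0, 1.0, 1.0])
mu_map = map_gaussian_mean(x, mu0=0.0, sigma0sq=1.0, sigma2=1.0)
# The prior centred at 0 pulls the estimate below the sample mean of 1.0.
```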
Conjugate prior
In general, for a given probability distribution p(x|η), we can seek a prior p(η) that is conjugate to the likelihood function, so that the posterior distribution has the same functional form as the prior. For any member of the exponential family, p(x|η) = h(x) g(η) exp(ηᵀu(x)), there exists a conjugate prior that can be written in the form

p(η | χ, ν) = f(χ, ν) g(η)^ν exp(ν ηᵀχ)

Important conjugate pairs include:
- Binomial – Beta
- Multinomial – Dirichlet
- Gaussian – Gaussian (for the mean)
- Gaussian – Gamma (for the precision)
- Exponential – Gamma
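The Binomial–Beta pair above makes the mechanics of conjugacy concrete: the posterior is again a Beta, obtained by adding the observed counts to the prior hyperparameters. A minimal sketch (the hyperparameters and coin-flip counts are invented for illustration):

```python
# Beta prior + Binomial likelihood -> Beta posterior (conjugacy).
# Prior Beta(a, b); observing `heads` successes and `tails` failures
# simply increments the hyperparameters.
def beta_binomial_update(a, b, heads, tails):
    """Return the posterior hyperparameters Beta(a + heads, b + tails)."""
    return a + heads, b + tails

a_post, b_post = beta_binomial_update(a=2, b=2, heads=7, tails=3)
# MAP estimate = mode of Beta(a, b) = (a - 1) / (a + b - 2) for a, b > 1.
p_map = (a_post - 1) / (a_post + b_post - 2)
```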
Linear Regression
Examples of linear regression models
Bayesian Linear Regression
Bayesian Linear Regression – Likelihood
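With a Gaussian noise model (as in Bishop ch. 3), the likelihood of the targets t = (t₁, …, t_N) given the weights w and noise precision β factorizes over the observations:

```latex
p(\mathbf{t} \mid \mathbf{w}, \beta)
  = \prod_{n=1}^{N} \mathcal{N}\!\left(t_n \,\middle|\, \mathbf{w}^{\mathsf T}\boldsymbol{\phi}(\mathbf{x}_n),\; \beta^{-1}\right)
```

where φ(x) is the vector of basis functions evaluated at input x.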
Bayesian Linear Regression – Prior
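The conjugate prior for the weights under this Gaussian likelihood is itself Gaussian; the common zero-mean isotropic choice (Bishop ch. 3), governed by a single precision hyperparameter α, is

```latex
p(\mathbf{w} \mid \alpha) = \mathcal{N}\!\left(\mathbf{w} \,\middle|\, \mathbf{0},\; \alpha^{-1}\mathbf{I}\right)
```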
Bayesian Linear Regression – Posterior Distribution
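Because likelihood and prior are both Gaussian, the posterior over the weights is Gaussian with closed-form mean and covariance (for the zero-mean isotropic prior, Bishop eqs. 3.53–3.54): S_N⁻¹ = αI + βΦᵀΦ and m_N = βS_NΦᵀt. A runnable sketch with invented toy data (the values of α, β and the data are made up for illustration):

```python
import numpy as np

# Posterior over the weights in Bayesian linear regression,
# assuming a zero-mean isotropic prior N(0, alpha^{-1} I) and
# known noise precision beta.  Phi is the N x M design matrix.
def posterior(Phi, t, alpha, beta):
    """Return posterior mean m_N and covariance S_N of the weights."""
    S_N_inv = alpha * np.eye(Phi.shape[1]) + beta * Phi.T @ Phi
    S_N = np.linalg.inv(S_N_inv)
    m_N = beta * S_N @ Phi.T @ t
    return m_N, S_N

# Toy data: y = 0.5 x plus Gaussian noise, linear basis phi(x) = (1, x).
rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 50)
t = 0.5 * x + rng.normal(0.0, 0.1, x.size)
Phi = np.column_stack([np.ones_like(x), x])
m_N, S_N = posterior(Phi, t, alpha=2.0, beta=100.0)
# m_N[1] recovers the true slope of 0.5 up to noise and prior shrinkage.
```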
Example: Linear Regression (Matlab demo)
Predictive Distribution
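Integrating the weights out against their Gaussian posterior yields a Gaussian predictive distribution for a new input x* (Bishop eqs. 3.58–3.59):

```latex
p(t_* \mid \mathbf{x}_*, \mathbf{t})
  = \mathcal{N}\!\left(t_* \,\middle|\, \mathbf{m}_N^{\mathsf T}\boldsymbol{\phi}(\mathbf{x}_*),\; \sigma_N^2(\mathbf{x}_*)\right),
\qquad
\sigma_N^2(\mathbf{x}_*) = \frac{1}{\beta} + \boldsymbol{\phi}(\mathbf{x}_*)^{\mathsf T}\mathbf{S}_N\,\boldsymbol{\phi}(\mathbf{x}_*)
```

The predictive variance combines the observation noise 1/β with the remaining uncertainty in the weights, which shrinks as more data arrive.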
Common problem in linear regression: overfitting / model complexity
- Least-squares approach (maximizing the likelihood): yields only a point estimate of the weights
- Regularization: the regularization term and its value must be chosen
- Cross-validation: requires large datasets and high computational power
- Bayesian approach: a full distribution over the weights, given a good prior; model comparison is computationally demanding, but no validation data are required
From Regression to Classification
Classification (figure: decision boundary)
Bayesian Logistic Regression
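For logistic regression there is no conjugate prior, so the exact posterior is intractable; a common first step (and the basis of the Laplace approximation used in Barber's demo) is to find the MAP weights under a Gaussian prior. A minimal Newton/IRLS sketch with invented toy data, not the demo code itself:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

# MAP weights for logistic regression with a Gaussian prior
# N(0, alpha^{-1} I), found by Newton's method (IRLS).
def map_logistic(Phi, y, alpha=1.0, iters=20):
    w = np.zeros(Phi.shape[1])
    for _ in range(iters):
        p = sigmoid(Phi @ w)
        # Gradient and Hessian of the negative log posterior.
        grad = Phi.T @ (p - y) + alpha * w
        R = p * (1 - p)
        H = Phi.T @ (Phi * R[:, None]) + alpha * np.eye(Phi.shape[1])
        w = w - np.linalg.solve(H, grad)
    return w

# Toy 1-D data: class 1 for x > 0, basis phi(x) = (1, x).
x = np.linspace(-3, 3, 40)
Phi = np.column_stack([np.ones_like(x), x])
y = (x > 0).astype(float)
w = map_logistic(Phi, y)
# The Gaussian prior keeps the weights finite even though the
# classes are linearly separable (plain maximum likelihood diverges).
```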
Example
Barber: DemosExercises\demoBayesLogRegression.m
Naive Bayes classifier
Why "naive"? The model makes strong independence assumptions: given the class variable, the presence or absence of each feature is assumed to be unrelated to the presence or absence of any other feature. Relations between features are ignored, and every feature contributes independently to the class.
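Under this independence assumption, the class-conditional likelihood factorizes over features, so training reduces to per-feature counts. A minimal Bernoulli naive Bayes sketch for binary features (the tiny dataset is invented for illustration; this is not library code):

```python
import numpy as np

# Bernoulli naive Bayes: each binary feature is modelled as an
# independent coin given the class.
def fit_nb(X, y, smoothing=1.0):
    classes = np.unique(y)
    priors, thetas = {}, {}
    for c in classes:
        Xc = X[y == c]
        priors[c] = len(Xc) / len(X)
        # Laplace-smoothed per-feature probability that the feature is 1.
        thetas[c] = (Xc.sum(axis=0) + smoothing) / (len(Xc) + 2 * smoothing)
    return priors, thetas

def predict_nb(priors, thetas, x):
    # Pick the class maximizing log prior + sum of per-feature log likelihoods.
    def log_post(c):
        th = thetas[c]
        return np.log(priors[c]) + np.sum(x * np.log(th) + (1 - x) * np.log(1 - th))
    return max(priors, key=log_post)

X = np.array([[1, 1, 0], [1, 0, 0], [0, 1, 1], [0, 0, 1]])
y = np.array([0, 0, 1, 1])
priors, thetas = fit_nb(X, y)
pred = predict_nb(priors, thetas, np.array([1, 1, 0]))
```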
Thank you for your attention