
Simple Bayesian Supervised Models
Saskia Klein & Steffen Bollmann

Content
- Recap from last week
- Bayesian Linear Regression
  - What is linear regression?
  - Applying Bayesian theory to linear regression
  - Example
  - Comparison to conventional linear regression
- Bayesian Logistic Regression
- Naive Bayes classifier
- Sources: Bishop (ch. 3, 4); Barber (ch. 10)

Maximum a posteriori estimation
The Bayesian approach to estimating the parameters of a distribution from a set of observations is to maximize the posterior distribution. This makes it possible to account for prior information:

$$\underbrace{p(\theta \mid D)}_{\text{posterior}} \;=\; \frac{\overbrace{p(D \mid \theta)}^{\text{likelihood}} \;\overbrace{p(\theta)}^{\text{prior}}}{\underbrace{p(D)}_{\text{evidence}}}$$
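As a small worked example (a sketch added for illustration, not from the original slide): for a coin with unknown bias θ = P(head) and a Beta(a, b) prior, the posterior after observing h heads and t tails is Beta(a + h, b + t), and its mode is the MAP estimate.

```python
import numpy as np

def map_coin_bias(flips, a=2.0, b=2.0):
    """MAP estimate of a coin's bias theta = P(head).

    flips : array of 0/1 outcomes (1 = head)
    a, b  : Beta prior hyperparameters (a, b > 1 so the posterior mode exists)
    """
    h = int(np.sum(flips))
    t = len(flips) - h
    # Posterior is Beta(a + h, b + t); its mode is the MAP estimate.
    return (a + h - 1.0) / (a + b + h + t - 2.0)

rng = np.random.default_rng(0)
flips = (rng.random(50) < 0.7).astype(int)  # 50 flips, true bias 0.7
print(map_coin_bias(flips))                 # close to 0.7
```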

Conjugate prior
In general, for a given probability distribution p(x|η) we can seek a prior p(η) that is conjugate to the likelihood function, so that the posterior distribution has the same functional form as the prior. For any member of the exponential family there exists a conjugate prior, which can be written in the form

$$p(\eta \mid \chi, \nu) = f(\chi, \nu)\, g(\eta)^{\nu} \exp\{\nu\, \eta^{\mathsf{T}} \chi\}.$$

Important conjugate pairs include:
- Binomial – Beta
- Multinomial – Dirichlet
- Gaussian – Gaussian (for the mean)
- Gaussian – Gamma (for the precision)
- Exponential – Gamma
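As an illustration of conjugacy (a standard result, e.g. Bishop ch. 2, added here for reference): for the Gaussian–Gaussian (mean) pair with known variance σ², combining the prior N(μ | μ₀, σ₀²) with N observations of mean x̄ gives a Gaussian posterior

$$p(\mu \mid \mathbf{x}) = \mathcal{N}(\mu \mid \mu_N, \sigma_N^2), \qquad \mu_N = \frac{\sigma^2}{N\sigma_0^2 + \sigma^2}\,\mu_0 + \frac{N\sigma_0^2}{N\sigma_0^2 + \sigma^2}\,\bar{x}, \qquad \frac{1}{\sigma_N^2} = \frac{1}{\sigma_0^2} + \frac{N}{\sigma^2},$$

i.e. the posterior mean interpolates between the prior mean and the sample mean, weighted by their precisions.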

Linear Regression
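For reference (the standard linear-basis-function model from Bishop ch. 3, which this part of the talk presumably follows):

$$y(\mathbf{x}, \mathbf{w}) = \sum_{j=0}^{M-1} w_j\,\phi_j(\mathbf{x}) = \mathbf{w}^{\mathsf{T}} \boldsymbol{\phi}(\mathbf{x}),$$

which is linear in the weights w but, through the basis functions φⱼ (e.g. polynomials φⱼ(x) = xʲ or Gaussian bumps), not necessarily linear in the input x.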

Examples of linear regression models

Bayesian Linear Regression


Bayesian Linear Regression – Likelihood
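For reference (standard form from Bishop ch. 3): assuming Gaussian noise with precision β, the likelihood of the targets t = (t₁, …, t_N) is

$$p(\mathbf{t} \mid \mathbf{w}, \beta) = \prod_{n=1}^{N} \mathcal{N}\!\left(t_n \,\middle|\, \mathbf{w}^{\mathsf{T}} \boldsymbol{\phi}(\mathbf{x}_n),\, \beta^{-1}\right).$$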


Bayesian Linear Regression – Prior
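For reference (Bishop ch. 3): the conjugate prior for the Gaussian likelihood above is itself Gaussian,

$$p(\mathbf{w}) = \mathcal{N}(\mathbf{w} \mid \mathbf{m}_0, \mathbf{S}_0), \qquad \text{often the isotropic choice } p(\mathbf{w} \mid \alpha) = \mathcal{N}(\mathbf{w} \mid \mathbf{0}, \alpha^{-1}\mathbf{I}).$$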


Bayesian Linear Regression – Posterior Distribution
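For reference (Bishop ch. 3): with Φ the N×M design matrix whose rows are φ(xₙ)ᵀ, conjugacy gives a Gaussian posterior in closed form,

$$p(\mathbf{w} \mid \mathbf{t}) = \mathcal{N}(\mathbf{w} \mid \mathbf{m}_N, \mathbf{S}_N), \qquad \mathbf{m}_N = \mathbf{S}_N\!\left(\mathbf{S}_0^{-1}\mathbf{m}_0 + \beta\, \boldsymbol{\Phi}^{\mathsf{T}} \mathbf{t}\right), \qquad \mathbf{S}_N^{-1} = \mathbf{S}_0^{-1} + \beta\, \boldsymbol{\Phi}^{\mathsf{T}} \boldsymbol{\Phi}.$$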

Example: Linear Regression (MATLAB demo)
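The demo itself is a MATLAB script; as a rough stand-in, here is a minimal Python sketch of the same posterior update, assuming a zero-mean isotropic prior N(0, α⁻¹I) and known noise precision β (all names illustrative):

```python
import numpy as np

def bayes_linreg_posterior(Phi, t, alpha=1.0, beta=25.0):
    """Posterior N(w | m_N, S_N) for a linear model t = Phi @ w + noise.

    Phi   : (N, M) design matrix of basis-function values
    t     : (N,) targets
    alpha : precision of the zero-mean isotropic Gaussian prior on w
    beta  : known precision of the Gaussian observation noise
    """
    M = Phi.shape[1]
    S_N_inv = alpha * np.eye(M) + beta * Phi.T @ Phi
    S_N = np.linalg.inv(S_N_inv)
    m_N = beta * S_N @ Phi.T @ t
    return m_N, S_N

# Straight-line fit y = w0 + w1 * x on noisy synthetic data
rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, 30)
t = 0.5 - 0.3 * x + rng.normal(0, 0.2, 30)     # noise std 0.2 -> beta = 25
Phi = np.column_stack([np.ones_like(x), x])    # basis: [1, x]
m_N, S_N = bayes_linreg_posterior(Phi, t, alpha=2.0, beta=25.0)
print(m_N)   # posterior mean of (w0, w1), close to (0.5, -0.3)
```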

Predictive Distribution
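For reference (Bishop ch. 3): integrating the model over the weight posterior gives the predictive distribution for a new input x*,

$$p(t_* \mid \mathbf{x}_*, \mathbf{t}) = \mathcal{N}\!\left(t_* \,\middle|\, \mathbf{m}_N^{\mathsf{T}} \boldsymbol{\phi}(\mathbf{x}_*),\, \sigma_N^2(\mathbf{x}_*)\right), \qquad \sigma_N^2(\mathbf{x}_*) = \frac{1}{\beta} + \boldsymbol{\phi}(\mathbf{x}_*)^{\mathsf{T}} \mathbf{S}_N\, \boldsymbol{\phi}(\mathbf{x}_*),$$

where the first variance term is observation noise and the second reflects the remaining uncertainty in w.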

Common Problem in Linear Regression: Overfitting / Model Complexity
- Least-squares approach (maximizing the likelihood): point estimate of the weights
  - Regularization: the regularization term and its weight must be chosen
  - Cross-validation: requires large datasets and high computational power
- Bayesian approach: distribution over the weights
  - requires a good prior
  - model comparison: computationally demanding, but no validation data required

From Regression to Classification

Classification
- decision boundary

Bayesian Logistic Regression
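For reference (Bishop ch. 4): the model for the class posterior is

$$p(C_1 \mid \boldsymbol{\phi}) = \sigma(\mathbf{w}^{\mathsf{T}} \boldsymbol{\phi}), \qquad \sigma(a) = \frac{1}{1 + e^{-a}},$$

and with a Gaussian prior on w the weight posterior is no longer Gaussian and has no closed form, so it is typically approximated, e.g. with the Laplace approximation.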

Example: Barber, DemosExercises\demoBayesLogRegression.m
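The referenced demo is Barber's MATLAB script; as a rough stand-in, here is a minimal Python sketch of the MAP fit, assuming a Gaussian prior N(0, α⁻¹I) on w (which is equivalent to L2-regularized logistic regression; all names illustrative):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def map_logistic_regression(Phi, y, alpha=1.0, lr=0.1, n_iter=2000):
    """MAP weights for logistic regression with prior w ~ N(0, (1/alpha) I).

    Maximizes log p(y|w) + log p(w) by gradient ascent; this is the same
    objective as L2-regularized maximum likelihood.
    """
    w = np.zeros(Phi.shape[1])
    for _ in range(n_iter):
        p = sigmoid(Phi @ w)
        grad = Phi.T @ (y - p) - alpha * w   # gradient of the log posterior
        w += lr * grad / len(y)
    return w

# Usage on synthetic two-class data
rng = np.random.default_rng(2)
X = rng.normal(size=(200, 2))
y = (X @ np.array([1.5, -2.0]) + 0.3 > 0).astype(float)
Phi = np.column_stack([np.ones(len(X)), X])  # add a bias column
print(map_logistic_regression(Phi, y, alpha=1.0))
```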


Naive Bayes classifier
Why "naive"?
- strong independence assumptions: the presence or absence of a feature is assumed to be unrelated to the presence or absence of any other feature, given the class variable
- relations between features are ignored; every feature contributes independently to the class
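A minimal sketch of a Gaussian naive Bayes classifier, illustrating the factorized class-conditional p(x|c) = ∏ᵢ p(xᵢ|c) (illustrative code, not from the slides):

```python
import numpy as np

class GaussianNaiveBayes:
    """Minimal Gaussian naive Bayes: p(x|c) = prod_i N(x_i | mu_ci, var_ci)."""

    def fit(self, X, y):
        self.classes = np.unique(y)
        self.priors = np.array([np.mean(y == c) for c in self.classes])
        # Per-class, per-feature mean and variance (small floor for stability)
        self.mu = np.array([X[y == c].mean(axis=0) for c in self.classes])
        self.var = np.array([X[y == c].var(axis=0) + 1e-9 for c in self.classes])
        return self

    def predict(self, X):
        # log p(c|x) is proportional to log p(c) + sum_i log N(x_i | mu_ci, var_ci)
        log_post = []
        for k in range(len(self.classes)):
            ll = -0.5 * np.sum(np.log(2 * np.pi * self.var[k])
                               + (X - self.mu[k]) ** 2 / self.var[k], axis=1)
            log_post.append(np.log(self.priors[k]) + ll)
        return self.classes[np.argmax(np.array(log_post), axis=0)]

# Usage on two well-separated Gaussian blobs
rng = np.random.default_rng(3)
X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(3, 1, (100, 2))])
y = np.array([0] * 100 + [1] * 100)
model = GaussianNaiveBayes().fit(X, y)
print((model.predict(X) == y).mean())   # training accuracy, close to 1.0
```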

Thank you for your attention
Saskia Klein & Steffen Bollmann