Simple Bayesian Supervised Models
Saskia Klein & Steffen Bollmann
Content
- Recap from last week
- Bayesian Linear Regression
  - What is linear regression?
  - Applying Bayes' theorem to linear regression
  - Example
  - Comparison to conventional linear regression
- Bayesian Logistic Regression
- Naive Bayes classifier
Source: Bishop (ch. 3, 4); Barber (ch. 10)
Maximum a posteriori estimation
The Bayesian approach to estimating the parameters of a distribution given a set of observations is to maximize the posterior distribution, which allows prior information to be taken into account:

posterior = likelihood × prior / evidence, i.e. p(θ | D) = p(D | θ) p(θ) / p(D)
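As a concrete illustration (not from the slides): for a Gaussian likelihood with known noise variance and a Gaussian prior on the mean, the posterior is again Gaussian, so the MAP estimate has a closed form. All numbers below are invented for the example.

```python
import numpy as np

# MAP estimate of a Gaussian mean: likelihood N(mu, sigma2) with known
# noise variance sigma2, prior N(mu0, sigma0sq).  Illustrative sketch.
def map_gaussian_mean(x, mu0, sigma0sq, sigma2):
    n = len(x)
    # The posterior is Gaussian, so its mode (the MAP) equals its mean.
    return (sigma2 * mu0 + sigma0sq * np.sum(x)) / (sigma2 + n * sigma0sq)

x = np.array([1.0, 1.0, 1.0, 1.0])
mu_map = map_gaussian_mean(x, mu0=0.0, sigma0sq=1.0, sigma2=1.0)
# The prior centred at 0 pulls the estimate below the sample mean of 1.0.
```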
Conjugate prior
In general, for a given probability distribution p(x|η), we can seek a prior p(η) that is conjugate to the likelihood function, so that the posterior distribution has the same functional form as the prior. For any member of the exponential family, p(x|η) = h(x) g(η) exp(ηᵀu(x)), there exists a conjugate prior that can be written in the form

p(η | χ, ν) = f(χ, ν) g(η)^ν exp(ν ηᵀχ)

Important conjugate pairs include:
- Binomial – Beta
- Multinomial – Dirichlet
- Gaussian – Gaussian (for the mean)
- Gaussian – Gamma (for the precision)
- Exponential – Gamma
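The Binomial–Beta pair above makes the mechanics of conjugacy concrete: the posterior is again a Beta, obtained by adding the observed counts to the prior hyperparameters. A minimal sketch (the hyperparameters and coin-flip counts are invented for illustration):

```python
# Beta prior + Binomial likelihood -> Beta posterior (conjugacy).
# Prior Beta(a, b); observing `heads` successes and `tails` failures
# simply increments the hyperparameters.
def beta_binomial_update(a, b, heads, tails):
    """Return the posterior hyperparameters Beta(a + heads, b + tails)."""
    return a + heads, b + tails

a_post, b_post = beta_binomial_update(a=2, b=2, heads=7, tails=3)
# MAP estimate = mode of Beta(a, b) = (a - 1) / (a + b - 2) for a, b > 1.
p_map = (a_post - 1) / (a_post + b_post - 2)
```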
Linear Regression
Examples of linear regression models
Bayesian Linear Regression
Bayesian Linear Regression – Likelihood
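With a Gaussian noise model (as in Bishop ch. 3), the likelihood of the targets t = (t₁, …, t_N) given the weights w and noise precision β factorizes over the observations:

```latex
p(\mathbf{t} \mid \mathbf{w}, \beta)
  = \prod_{n=1}^{N} \mathcal{N}\!\left(t_n \,\middle|\, \mathbf{w}^{\mathsf T}\boldsymbol{\phi}(\mathbf{x}_n),\; \beta^{-1}\right)
```

where φ(x) is the vector of basis functions evaluated at input x.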
Bayesian Linear Regression – Prior
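The conjugate prior for the weights under this Gaussian likelihood is itself Gaussian; the common zero-mean isotropic choice (Bishop ch. 3), governed by a single precision hyperparameter α, is

```latex
p(\mathbf{w} \mid \alpha) = \mathcal{N}\!\left(\mathbf{w} \,\middle|\, \mathbf{0},\; \alpha^{-1}\mathbf{I}\right)
```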
Bayesian Linear Regression – Posterior Distribution
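Because likelihood and prior are both Gaussian, the posterior over the weights is Gaussian with closed-form mean and covariance (for the zero-mean isotropic prior, Bishop eqs. 3.53–3.54): S_N⁻¹ = αI + βΦᵀΦ and m_N = βS_NΦᵀt. A runnable sketch with invented toy data (the values of α, β and the data are made up for illustration):

```python
import numpy as np

# Posterior over the weights in Bayesian linear regression,
# assuming a zero-mean isotropic prior N(0, alpha^{-1} I) and
# known noise precision beta.  Phi is the N x M design matrix.
def posterior(Phi, t, alpha, beta):
    """Return posterior mean m_N and covariance S_N of the weights."""
    S_N_inv = alpha * np.eye(Phi.shape[1]) + beta * Phi.T @ Phi
    S_N = np.linalg.inv(S_N_inv)
    m_N = beta * S_N @ Phi.T @ t
    return m_N, S_N

# Toy data: y = 0.5 x plus Gaussian noise, linear basis phi(x) = (1, x).
rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 50)
t = 0.5 * x + rng.normal(0.0, 0.1, x.size)
Phi = np.column_stack([np.ones_like(x), x])
m_N, S_N = posterior(Phi, t, alpha=2.0, beta=100.0)
# m_N[1] recovers the true slope of 0.5 up to noise and prior shrinkage.
```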
Example: Linear Regression (Matlab demo)
Predictive Distribution
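Integrating the weights out against their Gaussian posterior yields a Gaussian predictive distribution for a new input x* (Bishop eqs. 3.58–3.59):

```latex
p(t_* \mid \mathbf{x}_*, \mathbf{t})
  = \mathcal{N}\!\left(t_* \,\middle|\, \mathbf{m}_N^{\mathsf T}\boldsymbol{\phi}(\mathbf{x}_*),\; \sigma_N^2(\mathbf{x}_*)\right),
\qquad
\sigma_N^2(\mathbf{x}_*) = \frac{1}{\beta} + \boldsymbol{\phi}(\mathbf{x}_*)^{\mathsf T}\mathbf{S}_N\,\boldsymbol{\phi}(\mathbf{x}_*)
```

The predictive variance combines the observation noise 1/β with the remaining uncertainty in the weights, which shrinks as more data arrive.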
Common problem in linear regression: overfitting / model complexity
- Least-squares approach (maximizing the likelihood): yields only a point estimate of the weights
- Regularization: the regularization term and its value must be chosen
- Cross-validation: requires large datasets and high computational power
- Bayesian approach: a full distribution over the weights, given a good prior; model comparison is computationally demanding, but no validation data are required
From Regression to Classification
Classification (figure: decision boundary)
Bayesian Logistic Regression
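For logistic regression there is no conjugate prior, so the exact posterior is intractable; a common first step (and the basis of the Laplace approximation used in Barber's demo) is to find the MAP weights under a Gaussian prior. A minimal Newton/IRLS sketch with invented toy data, not the demo code itself:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

# MAP weights for logistic regression with a Gaussian prior
# N(0, alpha^{-1} I), found by Newton's method (IRLS).
def map_logistic(Phi, y, alpha=1.0, iters=20):
    w = np.zeros(Phi.shape[1])
    for _ in range(iters):
        p = sigmoid(Phi @ w)
        # Gradient and Hessian of the negative log posterior.
        grad = Phi.T @ (p - y) + alpha * w
        R = p * (1 - p)
        H = Phi.T @ (Phi * R[:, None]) + alpha * np.eye(Phi.shape[1])
        w = w - np.linalg.solve(H, grad)
    return w

# Toy 1-D data: class 1 for x > 0, basis phi(x) = (1, x).
x = np.linspace(-3, 3, 40)
Phi = np.column_stack([np.ones_like(x), x])
y = (x > 0).astype(float)
w = map_logistic(Phi, y)
# The Gaussian prior keeps the weights finite even though the
# classes are linearly separable (plain maximum likelihood diverges).
```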
Example
Barber: DemosExercises\demoBayesLogRegression.m
Naive Bayes classifier
Why "naive"? The model makes strong independence assumptions: given the class variable, the presence or absence of each feature is assumed to be unrelated to the presence or absence of any other feature. Relations between features are ignored, and every feature contributes independently to the class.
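Under this independence assumption, the class-conditional likelihood factorizes over features, so training reduces to per-feature counts. A minimal Bernoulli naive Bayes sketch for binary features (the tiny dataset is invented for illustration; this is not library code):

```python
import numpy as np

# Bernoulli naive Bayes: each binary feature is modelled as an
# independent coin given the class.
def fit_nb(X, y, smoothing=1.0):
    classes = np.unique(y)
    priors, thetas = {}, {}
    for c in classes:
        Xc = X[y == c]
        priors[c] = len(Xc) / len(X)
        # Laplace-smoothed per-feature probability that the feature is 1.
        thetas[c] = (Xc.sum(axis=0) + smoothing) / (len(Xc) + 2 * smoothing)
    return priors, thetas

def predict_nb(priors, thetas, x):
    # Pick the class maximizing log prior + sum of per-feature log likelihoods.
    def log_post(c):
        th = thetas[c]
        return np.log(priors[c]) + np.sum(x * np.log(th) + (1 - x) * np.log(1 - th))
    return max(priors, key=log_post)

X = np.array([[1, 1, 0], [1, 0, 0], [0, 1, 1], [0, 0, 1]])
y = np.array([0, 0, 1, 1])
priors, thetas = fit_nb(X, y)
pred = predict_nb(priors, thetas, np.array([1, 1, 0]))
```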
Thank you for your attention