Bayesian Framework EE 645 ZHAO XIN

A Brief Introduction to Bayesian Framework
- The Bayesian Philosophy
- Bayesian Neural Network
- Some Discussion on Priors

Bayes' Rule: Likelihood, Prior Distribution, Normalizing Constant
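The equation itself did not survive the transcript; in standard notation, with model weights w and data D, the rule with those three labeled pieces reads

P(w | D) = P(D | w) P(w) / P(D)

where P(D | w) is the likelihood, P(w) the prior distribution, and P(D) = \int P(D | w) P(w) dw the normalizing constant (also called the evidence).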

Bayesian Prediction
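Bayesian prediction averages the model over the whole posterior instead of committing to a single weight estimate; for a new input x* the predictive distribution is

P(y* | x*, D) = \int P(y* | x*, w) P(w | D) dw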

Hierarchical Model
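In a hierarchical model the prior on the weights itself depends on hyper-parameters \theta with their own hyper-prior, so the joint posterior factors as

P(w, \theta | D) \propto P(D | w) P(w | \theta) P(\theta)

and inference can proceed level by level, which is exactly the structure of the three-level inference used later in this talk.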

An Example Bayesian Network

Some Discussion on Priors
- Priors converging to a Gaussian process if the number of hidden units is infinite
- Priors leading to smooth and Brownian functions
- Fractional Brownian priors
- Priors converging to a non-Gaussian stable process
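These limits come from Neal's analysis of one-hidden-layer networks (see the references). Writing the network output as

f(x) = b + \sum_{j=1}^{H} v_j h(x; u_j)

with i.i.d. hidden-to-output weights v_j of variance \omega^2 / H, the central limit theorem makes f(x) converge to a Gaussian process as the number of hidden units H goes to infinity; heavy-tailed (infinite-variance) priors on v_j give convergence to a non-Gaussian stable process instead.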

Bayesian Framework for the LS RBF-Kernel SVM MUD
- Basic Problem and Solution
- Probabilistic Interpretation of the LS-SVM
- First Level Inference
- Second Level Inference
- Third Level Inference
- Basic MUD Model
- Results and Discussion
- Summary

Basic Problem for LS SVM
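The slide's formulas are lost in this transcript; in Suykens' formulation (see the references), the LS-SVM classifier replaces the standard SVM inequality constraints with equality constraints and a squared-error term:

minimize (1/2) w^T w + (\gamma / 2) \sum_i e_i^2
subject to y_i [w^T \varphi(x_i) + b] = 1 - e_i, for i = 1, ..., N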

Basic Solution for LS SVM
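Because all constraints are equalities, the KKT conditions reduce to a single linear system in the bias b and the support values \alpha, with \Omega_{ij} = y_i y_j K(x_i, x_j):

[ 0      y^T              ] [ b     ]   [ 0   ]
[ y      \Omega + I/\gamma ] [ \alpha ] = [ 1_N ]

As a concrete illustration, here is a minimal numpy sketch of this solve with an RBF kernel. The function names and the kernel-width convention (sigma) are my own choices, not from the slides.

import numpy as np

def lssvm_train(X, y, gamma=1.0, sigma=1.0):
    """Solve the LS-SVM KKT linear system for bias b and support values alpha."""
    n = X.shape[0]
    # RBF (Gaussian) kernel matrix: K_ij = exp(-||x_i - x_j||^2 / (2 sigma^2))
    sq = np.sum(X**2, axis=1)
    K = np.exp(-(sq[:, None] + sq[None, :] - 2.0 * X @ X.T) / (2.0 * sigma**2))
    Omega = np.outer(y, y) * K
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = y                            # top row:     [0, y^T]
    A[1:, 0] = y                            # left column:  y
    A[1:, 1:] = Omega + np.eye(n) / gamma   # Omega + I/gamma
    rhs = np.concatenate(([0.0], np.ones(n)))
    sol = np.linalg.solve(A, rhs)
    return sol[1:], sol[0]                  # alpha, b

def lssvm_predict(X_train, y_train, alpha, b, X_test, sigma=1.0):
    """Evaluate sign(sum_i alpha_i y_i K(x, x_i) + b) on test points."""
    sq_tr = np.sum(X_train**2, axis=1)
    sq_te = np.sum(X_test**2, axis=1)
    K = np.exp(-(sq_te[:, None] + sq_tr[None, :] - 2.0 * X_test @ X_train.T)
               / (2.0 * sigma**2))
    return np.sign(K @ (alpha * y_train) + b)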

The Formula for SVM
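The resulting decision function is the usual kernel expansion

y(x) = sign( \sum_i \alpha_i y_i K(x, x_i) + b )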

First Level Inference
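Following the Van Gestel and Suykens paper cited in the references, the first level applies Bayes' rule to the weights and bias, with the hyper-parameters \mu (prior precision), \zeta (noise precision), and the model H held fixed:

p(w, b | D, \mu, \zeta, H) \propto p(D | w, b, \zeta, H) p(w, b | \mu, H)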

Some Assumptions at this Level
- Separable Gaussian prior for the conditional P(w, b)
- Independent data points
- Gaussian-distributed errors
- The variance of b goes to infinity

Result of the First Level
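Under the assumptions above, maximizing this posterior is equivalent to minimizing the regularized cost

J(w, b) = (\mu / 2) w^T w + (\zeta / 2) \sum_i e_i^2

so the MAP estimate coincides with the LS-SVM solution with regularization ratio \gamma = \zeta / \mu.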

Conditional Distribution of Weight w and Bias b

Unbalanced Case at the 1st Level: If the means of the +1 class and the -1 class do not project perfectly onto +1 and -1, a bias term appears. We therefore introduce two new random variables, as follows.

Last Solution for First Level

Second Level Inference
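The second level applies Bayes' rule to the hyper-parameters themselves, with the first-level evidence acting as the likelihood:

p(\mu, \zeta | D, H) \propto p(D | \mu, \zeta, H) p(\mu, \zeta | H)

where p(D | \mu, \zeta, H) = \int p(D | w, b, \zeta, H) p(w, b | \mu, H) dw db.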

Result of Second Level Inference

Last Solution for Second Level

Third Level Inference
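The third level ranks candidate models (here, essentially different kernel parameters) by their evidence:

p(H | D) \propto p(D | H) p(H)

With a flat prior over models, comparing models reduces to comparing the evidence p(D | H), obtained by integrating out \mu and \zeta.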

Some Assumption in this Level

Last Solution for Third Level

Some Comments on this Level
- For a Gaussian-kernel machine, the variance of the Gaussian function can represent the model H.
- It is impossible to compute the evidence for every possible model.
- Fortunately, in general (for instance with the Gaussian-kernel SVM) the classifier's performance is quite smooth with respect to variation of the model parameter, so we only need to sample models in the region of interest, as sketched below.
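A minimal sketch of that sampling strategy, reusing the lssvm_train/lssvm_predict functions defined earlier and using held-out accuracy as a crude stand-in for the exact level-3 evidence (the data arrays X_tr, y_tr, X_val, y_val are assumed given):

import numpy as np

sigmas = np.logspace(-1, 1, 15)   # candidate kernel widths in the region of interest
scores = []
for s in sigmas:
    alpha, b = lssvm_train(X_tr, y_tr, gamma=1.0, sigma=s)
    # score each candidate model on held-out data
    scores.append(np.mean(lssvm_predict(X_tr, y_tr, alpha, b, X_val, sigma=s) == y_val))
best_sigma = sigmas[int(np.argmax(scores))]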

A Synchronous CDMA Transmitter
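In the standard synchronous CDMA model (Verdú, cited in the references), the received baseband signal over one symbol interval is

r(t) = \sum_{k=1}^{K} A_k b_k s_k(t) + \sigma n(t)

with amplitudes A_k, bits b_k in {-1, +1}, unit-energy signature waveforms s_k(t), and white Gaussian noise n(t); the multiuser detector recovers the bits b_k jointly from r(t).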

The LS SVM Receiver Diagram

Results and Discussions

First Inference

Second Inference

Third Inference (Plot 1)

A Sample of Parameter Chosen

Detector Performance

Some Discussion of this Detector
- The first-level inference improves the performance of the LS-SVM detector, especially in the high-SNR region, by accounting for the bias term.
- The LS-SVM detector is very smooth with respect to variation of the hyper-parameters, which means an adaptive LS-SVM should work reasonably well as long as the channel properties do not vary too quickly.
- The computations for the 2nd- and 3rd-level inference are very complex, so it is not worthwhile to carry them out exactly here; approximation formulas can be used instead.

Summary of the Bayesian Network Approach
- Pick a basic neural network.
- Choose the priors properly (physically sensible and convenient for theoretical derivation).
- Find a reasonable hierarchical framework (a three-level inference framework is very typical), apply Bayes' rule at each level, and find helpful assumptions that simplify the problem.

Some Comments on the Bayesian Framework
- It helps us understand a neural-network model physically.
- It gives a principled way to optimize the parameters and, more importantly, the hyper-parameters, which can otherwise be nearly impossible to set.
- It can even complement existing methods on some problems.

References
T. Van Gestel and J. A. K. Suykens, "A Bayesian Framework for Least Squares Support Vector Machine Classifiers."
N. Cristianini and J. Shawe-Taylor, An Introduction to Support Vector Machines, 2000.
R. M. Neal, Bayesian Learning for Neural Networks, 1996.
S. Verdú, Multiuser Detection.