Introduction to Bayesian Learning Ata Kaban School of Computer Science University of Birmingham

Overview
Today we learn about:
– Bayes' rule & how to turn it into a classifier
– E.g. how to decide whether a patient is ill or healthy, based on:
  – a probabilistic model of the observed data
  – prior knowledge

Classification problem
Training data: examples of the form (d, h(d))
– where d are the data objects to classify (inputs)
– and h(d) is the correct class label for d, with h(d) ∈ {1,…,K}
Goal: given a new object d_new, predict h(d_new)

A word about the Bayesian framework
– Allows us to combine observed data and prior knowledge
– Provides practical learning algorithms
– It is a generative (model-based) approach, which offers a useful conceptual framework
  – This means that any kind of object (e.g. time series, trees, etc.) can be classified, based on a probabilistic model specification

Bayes' Rule

  P(h|d) = P(d|h) P(h) / P(d)

Who is who in Bayes' rule:
– P(h): the prior probability of hypothesis h
– P(d|h): the likelihood of the data d under hypothesis h
– P(d): the marginal probability of the data (the evidence)
– P(h|d): the posterior probability of h after seeing d

Probabilities – auxiliary slide for memory refreshing
– Have two dice, h_1 and h_2
– The probability of rolling an i given die h_1 is denoted P(i|h_1). This is a conditional probability.
– Pick a die at random with probability P(h_j), j = 1 or 2. The probability of picking die h_j and rolling an i with it is called the joint probability, and is P(i, h_j) = P(h_j) P(i|h_j).
– For any events X and Y, P(X,Y) = P(X|Y) P(Y)
– If we know P(X,Y), then the so-called marginal probability P(X) can be computed as P(X) = Σ_Y P(X,Y)
– Probabilities sum to 1. Conditional probabilities sum to 1 provided that their conditions are the same.
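As a quick numerical check of these identities, here is a minimal Python sketch; the dice biases and pick probabilities are made-up illustrative numbers, not from the slides:

```python
# Two dice: h1 is fair, h2 is loaded towards 6 (illustrative numbers).
P_h = {"h1": 0.5, "h2": 0.5}                          # prior P(h_j) for picking each die
P_i_given_h = {
    "h1": {i: 1/6 for i in range(1, 7)},              # fair die
    "h2": {**{i: 0.1 for i in range(1, 6)}, 6: 0.5},  # loaded die
}

# Joint probability: P(i, h_j) = P(h_j) * P(i | h_j)
def joint(i, h):
    return P_h[h] * P_i_given_h[h][i]

# Marginal probability: P(i) = sum over j of P(i, h_j)
def marginal(i):
    return sum(joint(i, h) for h in P_h)

print(joint(6, "h2"))   # 0.5 * 0.5 = 0.25
print(marginal(6))      # 0.5 * 1/6 + 0.5 * 0.5 ≈ 0.3333

# Sanity checks: conditionals with the same condition sum to 1,
# and the marginal over all outcomes sums to 1.
assert abs(sum(P_i_given_h["h2"].values()) - 1) < 1e-9
assert abs(sum(marginal(i) for i in range(1, 7)) - 1) < 1e-9
```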

Does the patient have cancer or not?
A patient takes a lab test and the result comes back positive. It is known that the test returns a correct positive result in only 98% of the cases and a correct negative result in only 97% of the cases. Furthermore, only 0.008 of the entire population has this disease.
1. What is the probability that this patient has cancer?
2. What is the probability that he does not have cancer?
3. What is the diagnosis?

Choosing Hypotheses
Generally we want the most probable hypothesis given the training data. This is the maximum a posteriori (MAP) hypothesis:

  h_MAP = argmax_h P(h|d) = argmax_h P(d|h) P(h)

– Useful observation: it does not depend on the denominator P(d)
The maximum likelihood (ML) hypothesis drops the prior and maximises the likelihood alone:

  h_ML = argmax_h P(d|h)

Now we compute the diagnosis (see the sketch below)
– To find the Maximum Likelihood hypothesis, we evaluate P(d|h) for the data d, which is the positive lab test, and choose the hypothesis (diagnosis) that maximises it:
  P(+|cancer) = 0.98 vs. P(+|¬cancer) = 0.03, so h_ML = cancer
– To find the Maximum A Posteriori hypothesis, we evaluate P(d|h)P(h) for the positive lab test and choose the hypothesis (diagnosis) that maximises it. This is the same as choosing the hypothesis that gives the higher posterior probability:
  P(+|cancer) P(cancer) = 0.98 × 0.008 ≈ 0.0078
  P(+|¬cancer) P(¬cancer) = 0.03 × 0.992 ≈ 0.0298, so h_MAP = ¬cancer
– Normalising, P(cancer|+) = 0.0078 / (0.0078 + 0.0298) ≈ 0.21 and P(¬cancer|+) ≈ 0.79, so the diagnosis is: no cancer.
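The same computation as a minimal Python sketch, using the 0.008 prior from Mitchell's textbook version of this example:

```python
# Mitchell's cancer-test example: prior and test accuracies.
p_cancer = 0.008                  # P(cancer): prevalence in the population
p_pos_given_cancer = 0.98         # P(+|cancer): correct positive rate
p_pos_given_healthy = 0.03        # P(+|¬cancer): 1 - correct negative rate (0.97)

# ML hypothesis: compare likelihoods P(d|h) of the positive test.
likelihoods = {"cancer": p_pos_given_cancer, "healthy": p_pos_given_healthy}
h_ml = max(likelihoods, key=likelihoods.get)

# MAP hypothesis: compare P(d|h) * P(h); P(d) is a common factor, so ignore it.
unnormalised = {
    "cancer": p_pos_given_cancer * p_cancer,         # 0.98 * 0.008 ≈ 0.0078
    "healthy": p_pos_given_healthy * (1 - p_cancer)  # 0.03 * 0.992 ≈ 0.0298
}
h_map = max(unnormalised, key=unnormalised.get)

# Full posterior, normalising by P(+) = sum of the joint terms.
p_pos = sum(unnormalised.values())
p_cancer_given_pos = unnormalised["cancer"] / p_pos

print(h_ml)                           # cancer  (likelihood alone ignores the rare prior)
print(h_map)                          # healthy (the prior dominates)
print(round(p_cancer_given_pos, 3))   # ≈ 0.209
```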

The Naïve Bayes Classifier
What can we do if our data d has several attributes?
– Naïve Bayes assumption: the attributes that describe data instances are conditionally independent given the classification hypothesis:
  P(a_1,…,a_T | h) = ∏_t P(a_t | h)
  – it is a simplifying assumption; obviously it may be violated in reality
  – in spite of that, it works well in practice
– The Bayesian classifier that uses the Naïve Bayes assumption and computes the MAP hypothesis is called the Naïve Bayes classifier
– One of the most practical learning methods
– Successful applications:
  – Medical diagnosis
  – Text classification

Example. ‘Play Tennis’ data
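For reference, the standard 14-example Play Tennis dataset (from Tom Mitchell's Machine Learning, which these slides follow):

Day  Outlook   Temp  Humidity  Wind    PlayTennis
D1   Sunny     Hot   High      Weak    No
D2   Sunny     Hot   High      Strong  No
D3   Overcast  Hot   High      Weak    Yes
D4   Rain      Mild  High      Weak    Yes
D5   Rain      Cool  Normal    Weak    Yes
D6   Rain      Cool  Normal    Strong  No
D7   Overcast  Cool  Normal    Strong  Yes
D8   Sunny     Mild  High      Weak    No
D9   Sunny     Cool  Normal    Weak    Yes
D10  Rain      Mild  Normal    Weak    Yes
D11  Sunny     Mild  Normal    Strong  Yes
D12  Overcast  Mild  High      Strong  Yes
D13  Overcast  Hot   Normal    Weak    Yes
D14  Rain      Mild  High      Strong  No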

Naïve Bayes solution
Classify any new datum instance x = (a_1,…,a_T) as:

  h_NB = argmax_h P(h) ∏_t P(a_t | h)

To do this based on training examples, we need to estimate the parameters from the training examples:
– For each target value (hypothesis) h: estimate P(h) as the fraction of training examples with class h
– For each attribute value a_t of each datum instance: estimate P(a_t|h) as the fraction of class-h examples in which attribute t takes the value a_t

Based on the examples in the table, classify the following datum x:
  x = (Outl=Sunny, Temp=Cool, Hum=High, Wind=Strong)
That means: play tennis or not?
Working:
  P(yes) P(Sunny|yes) P(Cool|yes) P(High|yes) P(Strong|yes) = 9/14 × 2/9 × 3/9 × 3/9 × 3/9 ≈ 0.0053
  P(no) P(Sunny|no) P(Cool|no) P(High|no) P(Strong|no) = 5/14 × 3/5 × 1/5 × 4/5 × 3/5 ≈ 0.0206
The latter is larger, so the Naïve Bayes answer is: don't play tennis.
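A self-contained Python sketch of this working, with the Play Tennis table hard-coded; this is an illustration only (e.g. it does no smoothing for attribute values unseen in a class):

```python
from collections import Counter, defaultdict

# (Outlook, Temperature, Humidity, Wind) -> PlayTennis, from the table above.
data = [
    (("Sunny","Hot","High","Weak"), "No"),        (("Sunny","Hot","High","Strong"), "No"),
    (("Overcast","Hot","High","Weak"), "Yes"),    (("Rain","Mild","High","Weak"), "Yes"),
    (("Rain","Cool","Normal","Weak"), "Yes"),     (("Rain","Cool","Normal","Strong"), "No"),
    (("Overcast","Cool","Normal","Strong"), "Yes"),(("Sunny","Mild","High","Weak"), "No"),
    (("Sunny","Cool","Normal","Weak"), "Yes"),    (("Rain","Mild","Normal","Weak"), "Yes"),
    (("Sunny","Mild","Normal","Strong"), "Yes"),  (("Overcast","Mild","High","Strong"), "Yes"),
    (("Overcast","Hot","Normal","Weak"), "Yes"),  (("Rain","Mild","High","Strong"), "No"),
]

# Estimate P(h) and P(a_t|h) by counting.
class_counts = Counter(h for _, h in data)
attr_counts = defaultdict(Counter)          # (attribute index, class) -> value counts
for x, h in data:
    for t, a in enumerate(x):
        attr_counts[(t, h)][a] += 1

def nb_score(x, h):
    """Unnormalised P(h) * prod_t P(a_t|h)."""
    score = class_counts[h] / len(data)
    for t, a in enumerate(x):
        score *= attr_counts[(t, h)][a] / class_counts[h]
    return score

x = ("Sunny", "Cool", "High", "Strong")
scores = {h: nb_score(x, h) for h in class_counts}
print(scores)                        # {'No': ≈0.0206, 'Yes': ≈0.0053}
print(max(scores, key=scores.get))   # No -> don't play tennis
```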

Learning to classify text
– Learn from examples which articles are of interest
– The attributes are the words
– Observe that the Naïve Bayes assumption just means that we have a random sequence model within each class!
– NB classifiers are one of the most effective for this task
– Resources for those interested:
  – Tom Mitchell: Machine Learning (book), Chapter 6.
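A minimal bag-of-words Naïve Bayes sketch in the same spirit; the two classes, toy documents, and the use of add-one (Laplace) smoothing are illustrative choices, not from the slides:

```python
import math
from collections import Counter

# Toy training corpus: (document, class) pairs -- illustrative only.
train = [
    ("the match was a great game of tennis", "sport"),
    ("the team won the final game", "sport"),
    ("the election results were announced today", "politics"),
    ("the minister gave a speech on policy", "politics"),
]

classes = {h for _, h in train}
docs_per_class = Counter(h for _, h in train)
words_per_class = {h: Counter() for h in classes}   # word counts within each class
for doc, h in train:
    words_per_class[h].update(doc.split())
vocab = {w for counts in words_per_class.values() for w in counts}

def log_posterior(doc, h):
    """log P(h) + sum_t log P(w_t|h), with add-one (Laplace) smoothing."""
    lp = math.log(docs_per_class[h] / len(train))
    total = sum(words_per_class[h].values())
    for w in doc.split():
        lp += math.log((words_per_class[h][w] + 1) / (total + len(vocab)))
    return lp

doc = "a great speech on the game"
print(max(classes, key=lambda h: log_posterior(doc, h)))
```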

Results on a benchmark text corpus

Remember
– Bayes' rule can be turned into a classifier
– Maximum A Posteriori (MAP) hypothesis estimation incorporates prior knowledge; Maximum Likelihood (ML) doesn't
– The Naïve Bayes classifier is a simple but effective Bayesian classifier for vector data (i.e. data with several attributes), which assumes that the attributes are independent given the class
– Bayesian classification is a generative approach to classification

Resources
– Textbook reading (contains details about using Naïve Bayes for text classification): Tom Mitchell, Machine Learning (book), Chapter 6.
– Further reading for those interested to learn more: