A Practical Course in Graphical Bayesian Modeling; Class 1 Eric-Jan Wagenmakers.

Outline
- A bit of probability theory
- Bayesian foundations
- Parameter estimation: A simple example
- WinBUGS and R2WinBUGS

Probability Theory (Wasserman, 2004)  The sample space Ω is the set of possible outcomes of an experiment.  If we toss a coin twice then Ω = {HH, HT, TH, TT}.  The event that the first toss is heads is A = {HH, HT}.

Probability Theory (Wasserman, 2004)  denotes intersection: “A and B”  denotes union: “A or B”

Probability Theory (Wasserman, 2004) P is a probability measure when the following axioms are satisfied: 1. Probabilities are never negative: P(A) ≥ 0 for every event A. 2. Probabilities add to one: P(Ω) = 1. 3. The probability of the union of non-overlapping (disjoint) events is the sum of their probabilities: P(A1 ∪ A2 ∪ …) = P(A1) + P(A2) + …

Probability Theory (Wasserman, 2004) For any events A and B: P(A ∪ B) = P(A) + P(B) − P(A ∩ B). (The slide illustrates this with a Venn diagram of A and B within Ω.)

Conditional Probability The conditional probability of A given B is P(A | B) = P(A ∩ B) / P(B).
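As a small worked example (the query is mine, not from the slides), the two-coin-toss sample space from the earlier slide can be used directly:

```r
# Conditional probability with the sample space Omega = {HH, HT, TH, TT}.
omega <- c("HH", "HT", "TH", "TT")       # equally likely outcomes
A <- c("HH", "HT")                       # first toss is heads
B <- c("HH", "HT", "TH")                 # at least one toss is heads

p <- function(event) length(event) / length(omega)
p(intersect(A, B)) / p(B)                # P(A | B) = (2/4) / (3/4) = 2/3
```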

Conditional Probability You will often encounter this as P(A ∩ B) = P(A | B) P(B).

Conditional Probability From P(A ∩ B) = P(A | B) P(B) and P(A ∩ B) = P(B | A) P(A) follows Bayes’ rule.

Bayes’ Rule P(A | B) = P(B | A) P(A) / P(B).

The Law of Total Probability Let A1, …, Ak be a partition of Ω. Then, for any event B: P(B) = P(B | A1) P(A1) + … + P(B | Ak) P(Ak).

The Law of Total Probability This is just a weighted average of P(B | Ai) over the disjoint sets A1, …, Ak. For instance, when all P(Ai) are equal (each equal to 1/k), the equation becomes: P(B) = (1/k) [P(B | A1) + … + P(B | Ak)].

Bayes’ Rule Revisited Combining Bayes’ rule with the law of total probability: P(Ai | B) = P(B | Ai) P(Ai) / [P(B | A1) P(A1) + … + P(B | Ak) P(Ak)].

Example (Wasserman, 2004) I divide my email into three categories: “spam”, “low priority”, and “high priority”. Previous experience suggests that the a priori probabilities of a random email belonging to these categories are .7, .2, and .1, respectively.

Example (Wasserman, 2004) The probabilities of the word “free” occurring in the three categories are .9, .01, and .01, respectively. I receive an email with the word “free”. What is the probability that it is spam?
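As a check on the arithmetic, here is a small R sketch of the calculation (the example is from the slides, but the code and variable names are mine):

```r
# Bayes' rule for the spam example.
prior  <- c(spam = 0.7, low = 0.2, high = 0.1)    # P(category)
p_free <- c(spam = 0.9, low = 0.01, high = 0.01)  # P("free" | category)

marginal  <- sum(p_free * prior)        # law of total probability: P("free")
posterior <- p_free * prior / marginal  # Bayes' rule: P(category | "free")
round(posterior, 3)
# spam: 0.995, low: 0.003, high: 0.002
```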

Outline
- A bit of probability theory
- Bayesian foundations
- Parameter estimation: A simple example
- WinBUGS and R2WinBUGS

The Bayesian Agenda  Bayesians use probability to quantify uncertainty or “degree of belief” about parameters and hypotheses.  Prior knowledge for a parameter θ is updated through the data to yield the posterior knowledge.

The Bayesian Agenda Also note that Bayes’ rule allows one to learn, from the probability of what is observed, something about what is not observed.

The Bayesian Agenda  But why would one measure “degree of belief” by means of probability? Couldn’t we choose something else that makes sense?  Yes, perhaps we can, but the choice of probability is anything but ad-hoc.

The Bayesian Agenda  Assume “degree of belief” can be measured by a single number.  Assume you are rational, that is, not self- contradictory or “obviously silly”.  Then degree of belief can be shown to follow the same rules as the probability calculus.

The Bayesian Agenda  For instance, a rational agent would not hold intransitive beliefs, such as:

The Bayesian Agenda  When you use a single number to measure uncertainty or quantify evidence, and these numbers do not follow the rules of probability calculus, you can (almost certainly?) be shown to be silly or incoherent.  One of the theoretical attractions of the Bayesian paradigm is that it ensures coherence right from the start.

Coherence Example a la De Finetti There exists a ticket that says “If the French national soccer team wins the 2010 World Cup, this ticket pays $1.” You must determine the fair price for this ticket. After you set the price, I can choose either to sell the ticket to you or to buy the ticket from you. This is similar to how you would divide a pie according to the rule “you cut, I choose”. Please write this number down; you are not allowed to change it later!

Coherence Example a la De Finetti  There exists another ticket that says “If the Spanish national soccer team wins the 2010 World Cup, this ticket pays $1.”  You must again determine the fair price for this ticket.

Coherence Example a la De Finetti  There exists a third ticket that says “If either the French or the Spanish national soccer team wins the 2010 World Cup, this ticket pays $1.”  What is the fair price for this ticket?
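The slides leave the answer open, but the coherence point is that the fair price of the third ticket must equal the sum of the first two prices, because the two events are disjoint; otherwise I can trade the tickets so that I make a guaranteed profit (a “Dutch book”). A hypothetical numeric illustration in R, with made-up prices that are not from the slides:

```r
# Hypothetical incoherent prices: the combined ticket is priced below the sum of the parts.
price_france <- 0.30
price_spain  <- 0.30
price_either <- 0.50

# I buy the combined ticket from you and sell you the two separate tickets.
upfront <- price_france + price_spain - price_either   # +0.10 for me, up front

# Whatever happens, the later payouts cancel: if France or Spain wins, the combined
# ticket pays me $1 and I owe you $1 on the corresponding separate ticket; if neither
# wins, nothing is paid. So the upfront amount is a guaranteed profit.
upfront
```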

Bayesian Foundations  Bayesians use probability to quantify uncertainty or “degree of belief” about parameters and hypotheses.  Prior knowledge for a parameter θ is updated through the data to yield posterior knowledge.  This happens through the use of probability calculus.

Bayes’ Rule P(θ | data) = P(data | θ) P(θ) / P(data): the posterior distribution equals the likelihood times the prior distribution, divided by the marginal probability of the data.

Bayesian Foundations This equation allows one to learn, from the probability of what is observed, something about what is not observed. Bayesian statistics was long known as “inverse probability”.

Nuisance Variables  Suppose θ is the mean of a normal distribution, and α is the standard deviation.  You are interested in θ, but not in α.  Using the Bayesian paradigm, how can you go from P(θ, α | x) to P(θ | x)? That is, how can you get rid of the nuisance parameter α? Show how this involves P(α).

Nuisance Variables P(θ | x) = ∫ P(θ, α | x) dα = ∫ P(θ | α, x) P(α | x) dα.

Predictions  Suppose you observe data x, and you use a model with parameter θ.  What is your prediction for new data y, given that you’ve observed x? In other words, show how you can obtain P(y|x).

Predictions P(y | x) = ∫ P(y | θ, x) P(θ | x) dθ = ∫ P(y | θ) P(θ | x) dθ, assuming the new data y and the old data x are conditionally independent given θ.
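One common way to evaluate such a predictive distribution is by Monte Carlo: average the likelihood of the new data over draws from the posterior. A rough R sketch, using the beta-binomial setting of the estimation example later in these slides (the specific numbers are mine, not from the slides):

```r
# Posterior predictive by Monte Carlo: P(y | x) is approximately the average of
# P(y | theta) over posterior draws of theta.
set.seed(1)
theta_draws <- rbeta(10000, 10, 2)   # draws from an assumed Beta(10, 2) posterior
p_y <- sapply(0:10, function(y) mean(dbinom(y, size = 10, prob = theta_draws)))
names(p_y) <- 0:10
round(p_y, 3)                        # predictive probabilities for 0..10 successes
```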

Want to Know More?

Outline
- A bit of probability theory
- Bayesian foundations
- Parameter estimation: A simple example
- WinBUGS and R2WinBUGS

Bayesian Parameter Estimation: Example  We prepare for you a series of 10 factual true/false questions of equal difficulty.  You answer 9 out of 10 questions correctly.  What is your latent probability θ of answering any one question correctly?

Bayesian Parameter Estimation: Example  We start with a prior distribution for θ. This reflect all we know about θ prior to the experiment. Here we make a standard choice and assume that all values of θ are equally likely a priori.

Bayesian Parameter Estimation: Example  We then update the prior distribution by means of the data (technically, the likelihood) to arrive at a posterior distribution.

The Likelihood We use the binomial model, in which P(D | θ) is given by P(D | θ) = (n choose s) θ^s (1 − θ)^(n − s), where n = 10 is the number of trials and s = 9 is the number of successes.

Bayesian Parameter Estimation: Example  The posterior distribution is a compromise between what we knew before the experiment (i.e., the prior) and what we have learned from the experiment (i.e., the likelihood). The posterior distribution reflects all that we know about θ.

Mode = 0.9; 95% confidence interval: (0.59, 0.98).
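For this example the posterior is available in closed form: a uniform Beta(1, 1) prior combined with 9 successes in 10 trials gives a Beta(10, 2) posterior. A small R check of the numbers above (illustrative code, not from the slides):

```r
# Beta(1, 1) prior + binomial data (s = 9, n = 10) gives a Beta(1 + s, 1 + n - s) posterior.
a <- 1 + 9   # 10
b <- 1 + 1   # 2
(a - 1) / (a + b - 2)           # posterior mode: 0.9
qbeta(c(0.025, 0.975), a, b)    # central 95% interval: roughly (0.59, 0.98)
```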

Bayesian Parameter Estimation: Example Sometimes it is difficult or impossible to obtain the posterior distribution analytically. In this case, we can use Markov chain Monte Carlo algorithms to sample from the posterior. As the number of samples increases, the difference between the sampling-based approximation and the analytical posterior becomes arbitrarily small.

Mode = 0.9; 95% confidence interval: (0.59, 0.98). With 9000 samples, the result is almost identical to the analytical result.
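To make the idea concrete, here is a minimal Metropolis sampler for this example written in plain R. This is only a sketch (WinBUGS runs the sampling for you); the proposal width, starting value, and seed are my own choices, not from the slides:

```r
# Metropolis sampling from the posterior of theta (9 successes in 10 trials, uniform prior).
set.seed(1)
s <- 9; n <- 10
log_post <- function(theta) {
  if (theta <= 0 || theta >= 1) return(-Inf)       # outside (0, 1): zero posterior density
  dbinom(s, n, theta, log = TRUE)                  # uniform prior only adds a constant
}
n_samples <- 9000
theta <- numeric(n_samples)
theta[1] <- 0.5
for (i in 2:n_samples) {
  proposal <- theta[i - 1] + rnorm(1, sd = 0.1)    # random-walk proposal
  if (log(runif(1)) < log_post(proposal) - log_post(theta[i - 1])) {
    theta[i] <- proposal                           # accept
  } else {
    theta[i] <- theta[i - 1]                       # reject: keep the current value
  }
}
quantile(theta, c(0.025, 0.975))                   # close to the analytical (0.59, 0.98)
```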

Outline
- A bit of probability theory
- Bayesian foundations
- Parameter estimation: A simple example
- WinBUGS and R2WinBUGS

WinBUGS: Bayesian inference Using Gibbs Sampling. You want to have this installed (plus the registration key).

WinBUGS  Knows many probability distributions (likelihoods);  Allows you to specify a model;  Allows you to specify priors;  Will then automatically run the MCMC sampling routines and produce output.

Want to Know More About MCMC?

Models in WinBUGS  The models you can specify in WinBUGS are directed acyclical graphs (DAGs).

Models in WinBUGS (Spiegelhalter, 1998) (The slide shows a DAG with nodes A, B, C, D, and E: A and B are parents of C and of D, and C is the parent of E.) In this graph, E depends only on C.

Models in WinBUGS (Spiegelhalter, 1998) If the nodes are stochastic, the joint distribution factorizes…

Models in WinBUGS (Spiegelhalter, 1998) P(A, B, C, D, E) = P(A) P(B) P(C | A, B) P(D | A, B) P(E | C)

Models in WinBUGS (Spiegelhalter, 1998) This means we can sometimes perform “local” computations to get what we want.

Models in WinBUGS (Spiegelhalter, 1998) What is P(C | A, B, D, E)?

Models in WinBUGS (Spiegelhalter, 1998) P(C | A, B, D, E) is proportional to P(C | A, B) P(E | C); D is irrelevant.
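A tiny numeric check of this local computation, with binary nodes and made-up conditional probabilities (all numbers below are hypothetical, chosen only to illustrate that the factors involving D cancel):

```r
# Made-up probabilities for binary (0/1) nodes in the DAG A, B -> C; A, B -> D; C -> E.
p_A <- 0.3; p_B <- 0.6
p_C_given_AB <- function(c, a, b) { p1 <- 0.1 + 0.4 * a + 0.3 * b; ifelse(c == 1, p1, 1 - p1) }
p_D_given_AB <- function(d, a, b) { p1 <- 0.2 + 0.5 * a + 0.2 * b; ifelse(d == 1, p1, 1 - p1) }
p_E_given_C  <- function(e, c)    { p1 <- ifelse(c == 1, 0.9, 0.2); ifelse(e == 1, p1, 1 - p1) }

joint <- function(a, b, c, d, e) {
  ifelse(a == 1, p_A, 1 - p_A) * ifelse(b == 1, p_B, 1 - p_B) *
    p_C_given_AB(c, a, b) * p_D_given_AB(d, a, b) * p_E_given_C(e, c)
}

# Evidence: A = 1, B = 0, D = 1, E = 1. Full conditional of C from the joint:
num <- sapply(0:1, function(c) joint(1, 0, c, 1, 1))
num / sum(num)

# "Local" computation: normalize P(C | A, B) * P(E | C) only. Same answer; D drops out.
loc <- sapply(0:1, function(c) p_C_given_AB(c, 1, 0) * p_E_given_C(1, c))
loc / sum(loc)
```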

WinBUGS & R WinBUGS produces MCMC samples. We want to analyze the output in a nice program, such as R. This can be accomplished using the R package “R2WinBUGS”.
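Putting the pieces together, a hedged sketch of the R2WinBUGS workflow for the rate example is shown below. It assumes WinBUGS 1.4 is installed (with the registration key) and that the R2WinBUGS package is available; the file name, node names, and sampler settings are illustrative choices of mine, not taken from the slides:

```r
library(R2WinBUGS)

# BUGS model: uniform prior on the rate theta, binomial likelihood for s out of n.
model <- "
model {
  theta ~ dbeta(1, 1)
  s ~ dbin(theta, n)
}
"
writeLines(model, "rate_model.txt")

data  <- list(s = 9, n = 10)
inits <- function() list(theta = runif(1))

# Calls WinBUGS, runs the MCMC sampling routines, and returns the samples to R.
samples <- bugs(data, inits, parameters.to.save = "theta",
                model.file = "rate_model.txt",
                n.chains = 3, n.iter = 10000)
print(samples)
```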

End of Class 1