1
Bayesian Methods: What they are and how to use them in Forensic Science/Computing
2
Bayesian Methods
This is probably a more apt meme for us. [Meme image; credit: unknown]
3
Bayesian Statistics
The basic Bayesian philosophy: Prior Knowledge × Data = Updated Knowledge, i.e. a better understanding of the world.
Prior × Data = Posterior
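In symbols, this is just Bayes' theorem (a standard statement, spelled out here for reference, with θ the unknown parameter and x the data):

p(\theta \mid x) \;=\; \frac{p(x \mid \theta)\, p(\theta)}{p(x)} \;\propto\; \underbrace{p(x \mid \theta)}_{\text{likelihood (the data)}} \times \underbrace{p(\theta)}_{\text{prior}}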
4
Bayesian Statistics: Bayesianism can be a lot like a religion.
There are different “sects” of Bayesians. The “fundamentalist” followers in each sect think the others are apostates and heretics… The major Bayesian “churches”:
Bayes Nets / Graphical Models: Steffen Lauritzen (Oxford), Judea Pearl (UCLA)
Parametric: BUGS (Bayesian inference Using Gibbs Sampling), MCMC (Markov chain Monte Carlo): Andrew Gelman (Columbia), David Spiegelhalter (Cambridge)
Empirical Bayes (data-driven): Brad Efron (Stanford)
We’ll learn the basics of using these.
5
Is this a “fair coin”?
6
Is this a “fair coin”? Before we gather any data on this coin’s flipping behavior, what do we believe about its probability of landing heads, pHeads? Represent your beliefs about a parameter, before you’ve gathered any data, as a prior (a priori) density over that parameter.
7
Some prior beliefs we may have about pHeads for the coin
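The slide’s curves aren’t reproduced here, but candidate priors for pHeads can be sketched as Beta densities. A small Python illustration (not from the slides; the particular parameter choices are assumptions):

import numpy as np
from scipy.stats import beta

p = np.linspace(0.001, 0.999, 500)            # grid over possible pHeads values
priors = {
    "no idea (flat)":           beta(1, 1),   # Beta(1,1) = Uniform(0,1)
    "probably fair":            beta(50, 50), # tightly concentrated near 0.5
    "suspect it favours tails": beta(2, 8),   # most mass below 0.5
}
for label, dist in priors.items():
    # dist.pdf(p) is the density curve one would plot for each prior
    print(f"{label:26s} prior mean = {dist.mean():.2f}")

Beta densities are a natural choice here because pHeads lives on (0, 1).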
8
Is this a “fair coin”? Now flip the coin and gather some data: x ∈ {0, 1}, with 1 = “Heads” and 0 = “Tails”. Based on this data and what we believed about pHeads before, what can we say about it now?
9
Is this a “fair coin”?
Pr(pHeads | data) ∝ Pr(data | pHeads) × Pr(pHeads)
On the left: our beliefs about pHeads after we gathered the data (the a posteriori probability). On the right: the data (the likelihood), and our beliefs about pHeads before we gathered the data (the a priori probability).
10
Is this a “fair coin”? The likelihood of observing the data given pHeads is the data model. Here, good models for the data are either the Bernoulli or the Binomial likelihood:
Bernoulli: Pr(x | pHeads) = pHeads^x (1 − pHeads)^(1−x), for a single flip x ∈ {0, 1}
Binomial: Pr(s | n, pHeads) = C(n, s) pHeads^s (1 − pHeads)^(n−s), for s heads in n flips
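To make the likelihood concrete, a small Python check (the data, s = 14 heads in n = 20 flips, are assumed for illustration) evaluating the Binomial likelihood at a few candidate values of pHeads:

from scipy.stats import binom

n, s = 20, 14    # assumed data: s = 14 heads in n = 20 flips
for p in (0.3, 0.5, 0.7):
    # Binomial likelihood of the observed data, given pHeads = p
    print(f"pHeads = {p}: likelihood = {binom.pmf(s, n, p):.4f}")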
11
Is this a “fair coin”? Let’s determine the posterior with a Beta(1,1) prior on pHeads and a Binomial likelihood model for the data. Directed Acyclic Graph (DAG) representation: [figure]. The joint PDF factorizes as:
Pr(s, pHeads) = Binomial(s | n, pHeads) × Beta(pHeads | 1, 1)
12
Is this a “fair coin”? The model is simple enough that we can obtain an analytical solution for the posterior:
pHeads | s ~ Beta(s + 1, n − s + 1)
13
Is this a “fair coin”? The model is simple enough that we can obtain an analytical solution for the posterior:
pHeads | s ~ Beta(s + 1, n − s + 1)
Conjugate model: the posterior has the same functional form as the prior (here, both are Beta densities).
14
Side note: the MLE for pHeads is the sample proportion of heads, p̂ = s / n.
At this point, what would you bet on, H or T?
Given this model, why does the posterior look like “the data”?
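To make the conjugate update concrete, a minimal Python sketch (same assumed data, 14 heads in 20 flips):

from scipy.stats import beta

n, s = 20, 14                        # assumed data: 14 heads in 20 flips
posterior = beta(s + 1, n - s + 1)   # Beta(1,1) prior + Binomial data

print("MLE (s/n):           ", s / n)
print("Posterior mean:      ", posterior.mean())       # (s+1)/(n+2)
print("Central 95% interval:", posterior.interval(0.95))
print("Pr(pHeads > 0.5):    ", 1 - posterior.cdf(0.5))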
15
Is this a “fair coin”? Most of the posteriors we will model will not have an analytical form. Picking an arbitrary prior, in general, leads to an analytically intractable posterior.
16
Is this a “fair coin”? Most of the posteriors we will model will not have an analytical form. For example, the normalizing constant, from the law of total probability:
Pr(data) = ∫ Pr(data | pHeads) Pr(pHeads) dpHeads, integrating pHeads over (0, 1)
For most priors we can’t do this integral analytically...
17
Is this a “fair coin”? But we can (often) get these posteriors numerically. The general trick: Markov chain Monte Carlo (MCMC).
18
MCMC in a Nutshell
We can (often) get these posteriors numerically: by specifying the prior and the likelihood, MCMC allows us to draw samples in proportion to the (unnormalized) posterior. We avoid having to explicitly evaluate any nasty integrals.
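Before turning to Stan and JAGS on the next slide, here is a minimal random-walk Metropolis-Hastings sketch in Python (the data are again the assumed 14 heads in 20 flips); it samples the posterior of pHeads using only the unnormalized density:

import numpy as np

rng = np.random.default_rng(0)
n, s = 20, 14                       # assumed data: 14 heads in 20 flips

def log_unnorm_posterior(p):
    """Log of likelihood × prior; the Beta(1,1) prior is flat, so it drops out."""
    if p <= 0.0 or p >= 1.0:
        return -np.inf              # zero density outside (0, 1)
    return s * np.log(p) + (n - s) * np.log(1.0 - p)

samples, p = [], 0.5                # start the chain at p = 0.5
for _ in range(50_000):
    proposal = p + rng.normal(0.0, 0.1)                    # random-walk proposal
    log_alpha = log_unnorm_posterior(proposal) - log_unnorm_posterior(p)
    if np.log(rng.uniform()) < log_alpha:                  # Metropolis accept/reject
        p = proposal
    samples.append(p)

post = np.array(samples[5_000:])    # discard burn-in
print(post.mean(), np.quantile(post, [0.025, 0.975]))

The accept/reject step only ever uses ratios of the unnormalized posterior, which is exactly why the nasty normalizing integral never has to be computed.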
19
Back to: Is this a “fair coin”?
Stan language:

data {
  int n;
  int s;
  real mu;                  // prior mean for Z, passed in as data
  real<lower=0> sigma;      // prior sd for Z (e.g., mu = 0, sigma = 1.25 to match the JAGS version)
}
parameters {
  real Z;
}
model {
  Z ~ normal(mu, sigma);
  s ~ binomial_logit(n, Z);
}
generated quantities {
  real pi;
  pi = inv_logit(Z);
}

(B)ayesian inference (U)sing (G)ibbs (S)ampling — BUGS language, JAGS dialect:

model {
  # Likelihood (JAGS's binomial is dbin(p, size)):
  s ~ dbin(ppi, n)
  # Prior (dnorm is parameterized by precision in JAGS):
  Z ~ dnorm(0, 1/(1.25^2))
  ppi <- ilogit(Z)
}
20
Back to: Is this a “fair coin”?
[Figure: prior and posterior densities for pHeads]
So, do you believe the coin is fair after observing the data?
21
A Glimpse Into Regression
It’s easy to expand into many other statistical methods within the Bayesian framework.
Key: all parameters of a model, instead of being unknown but fixed (frequentist), have distributions (Bayesian). These are given a priori distributions, which are updated in light of the data (xi, yi).
Simple linear regression model: yi = β0 + β1 xi + εi
where yi is the response variable, xi the explanatory (predictor) variable, β0 the intercept, β1 the regression coefficient (slope), and εi the error.
22
A Glimpse Into Regression
[Scatterplot, GC-Ethanol (Azevedo data): Peak Area Ratio (standardized) vs. Concentration (standardized)]
23
A Glimpse Into Regression
GC-Ethanol (Azevedo data). Best-fit line: simple linear regression.
Priors:
β0 ~ Cauchy(0, 1) — fairly uninformative, but realistic (Gelman)
β1 ~ Cauchy(0, 5) — fairly uninformative, but realistic (Gelman)
ε ~ Normal(0, 1), truncated to ε > 0 — a standard, realistic choice
Likelihood (data model): yi ~ Normal(β0 + β1 xi, ε)
24
A Glimpse Into Regression
Stan language, simple linear regression:

data {
  int<lower=0> N;
  vector[N] x;
  vector[N] y;
}
parameters {
  real beta0;             // Intercept
  real<lower=0> beta1;    // Slope
  real<lower=0> epsilon;  // Residual (noise) scale
}
model {
  // Priors on regression coef, intercept and noise
  beta0 ~ cauchy(0, 1);
  beta1 ~ cauchy(0, 5);
  epsilon ~ normal(0, 1);
  // Likelihood ("vectorized" form)
  y ~ normal(beta0 + beta1 * x, epsilon);
}
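One way to actually run this model from Python — a sketch only, assuming cmdstanpy is installed, the program above is saved as regression.stan, and using synthetic stand-in data (the Azevedo data aren’t reproduced here):

import numpy as np
from cmdstanpy import CmdStanModel

rng = np.random.default_rng(1)
x = rng.normal(size=30)
y = 0.5 + 2.0 * x + rng.normal(scale=0.3, size=30)  # synthetic stand-in data

model = CmdStanModel(stan_file="regression.stan")   # the program shown above
fit = model.sample(data={"N": len(x), "x": x.tolist(), "y": y.tolist()})
print(fit.summary().loc[["beta0", "beta1", "epsilon"], ["Mean", "5%", "95%"]])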
25
A Glimpse Into Regression
JAGS language, simple linear regression:

model {
  # Priors on regression coef, intercept and noise
  beta0 ~ dnorm(0, 0.0001)
  beta1 ~ dnorm(0, 0.0001)
  epsilon ~ dnorm(0, 1) T(1.0E-8, 1.0E12)
  tau <- 1/pow(epsilon, 2)   # Need precision for BUGS/JAGS
  # Likelihood
  for(i in 1:N) {
    mu[i] <- beta0 + beta1*x[i]
    y[i] ~ dnorm(mu[i], tau)
  }
}
26
Priors
27
Posteriors
[Figure: marginal posterior densities p(β0 | Data), p(β1 | Data), p(εi | Data)]
28
Lines From the Posterior
[Figure, GC-Ethanol (Azevedo data): lines drawn from the posterior, with 95% Highest Posterior Density intervals for Epost[yi | xi]; Peak Area Ratio (standardized) vs. Concentration (standardized)]
29
Some First Cautions
Bayesians will tell you the answer to your question, but you need a frequentist to tell you if they’re right (Saunders). Though there are many opinions out there about “checking” your Bayesian model:
Try multiple priors (sensitivity analysis)
Posterior predictive checking
Frequentist properties (see Efron)
30
Lines From the Posterior
[Figure, GC-Ethanol (Azevedo data): 95% Highest Predictive Posterior Density intervals for yi(xi), around Epost[yi | xi]; Peak Area Ratio (standardized) vs. Concentration (standardized)]
31
Nets
32
Bayesian Networks
A “scenario” is represented by a joint probability function. It contains variables relevant to a situation, which represent uncertain information, and “dependencies” between variables that describe how they influence each other. A graphical way to represent the joint probability function is with nodes and directed lines. This is called a Bayesian Network (Pearl).
33
Bayesian Networks: (a very!) simple example (Wikipedia).
What is the probability the Grass is Wet?
Influenced by the possibility of Rain
Influenced by the possibility of Sprinkler action
Sprinkler action is itself influenced by the possibility of Rain
Construct the joint probability function to answer questions about this scenario. Following the arrows of the network, it factorizes as:
Pr(Grass Wet, Sprinkler, Rain) = Pr(Grass Wet | Sprinkler, Rain) × Pr(Sprinkler | Rain) × Pr(Rain)
34
Bayesian Networks: the conditional probability tables.

Pr(Rain):
  Rain = yes: 20%    Rain = no: 80%

Pr(Sprinkler | Rain):
                       Rain = yes   Rain = no
  Sprinkler was on:        1%          40%
  Sprinkler was off:      99%          60%

Pr(Grass Wet | Rain, Sprinkler):
                       Sprinkler was on          Sprinkler was off
                       Rain = yes  Rain = no     Rain = yes  Rain = no
  Grass wet:              99%         90%           80%          0%
  Grass not wet:           1%         10%           20%        100%
35
Bayesian Networks
You observe the grass is wet, so enter the evidence: Pr(Grass Wet) is set to 100%. The other probabilities, Pr(Rain) and Pr(Sprinkler), are adjusted given the observation.
[Figure: the network with updated marginal probabilities Pr(Rain), Pr(Sprinkler), Pr(Grass Wet)]
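The updated Pr(Rain) can be checked by hand with exact inference by enumeration. A Python sketch (not from the slides) using the CPTs from the previous slide:

from itertools import product

p_rain = {True: 0.20, False: 0.80}
p_sprinkler = {True:  {True: 0.01, False: 0.99},   # Pr(Sprinkler | Rain = yes)
               False: {True: 0.40, False: 0.60}}   # Pr(Sprinkler | Rain = no)
p_wet = {(True, True): 0.99, (True, False): 0.90,  # Pr(Wet = yes | Sprinkler, Rain)
         (False, True): 0.80, (False, False): 0.0}

def joint(rain, sprinkler, wet):
    """Pr(Rain, Sprinkler, Wet) from the chain-rule factorization of the DAG."""
    pw = p_wet[(sprinkler, rain)]
    return p_rain[rain] * p_sprinkler[rain][sprinkler] * (pw if wet else 1.0 - pw)

# Condition on the observation Grass Wet = yes:
num = sum(joint(True, s, True) for s in (True, False))
den = sum(joint(r, s, True) for r, s in product((True, False), repeat=2))
print(f"Pr(Rain | Grass Wet) = {num / den:.4f}")    # ≈ 0.3577

So observing wet grass raises Pr(Rain) from 20% to about 36%.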
36
Bayesian Networks: areas where Bayesian Networks are used
Medical recommendation/diagnosis: IBM/Watson, Massachusetts General Hospital/DXplain
Image processing
Business decision support: Boeing, Intel, United Technologies, Oracle, Philips
Information search algorithms and on-line recommendation engines
Space vehicle diagnostics: NASA
Search and rescue planning: US Military
Requires software. Some free options: GeNIe (University of Pittsburgh), SamIam (UCLA), Hugin (free only for a few nodes), the gR R packages
37
Bayesian Statistics
[Figure: Bayesian network for the provenance of a painting given trace evidence found on that painting]
38
Hypothesis Testing Frequentist hypothesis testing:
Assume/derive a “null” probability model for a statistic. E.g.: sample averages follow a Gaussian curve.
[Figure: null density with the observed statistic far out in a tail]
Say the sample statistic falls out here, in the tail: “Wow!” That’s an unlikely value under the null hypothesis (small p-value).
39
Hypothesis Testing Bayesian hypothesis testing:
Assume/derive a “null” probability model for a statistic, p(x | null), and also assume an “alternative” probability model, p(x | alt).
[Figure: the densities p(x | null) and p(x | alt), with the observed statistic marked]
Say the sample statistic falls here: compare how likely it is under each model.
40
The “Bayesian Framework”
Bayes’ Rule (Aitken, Taroni):
Pr(Hp | E, I) = Pr(E | Hp, I) × Pr(Hp | I) / Pr(E | I)
where:
Hp = the prosecution’s hypothesis
Hd = the defence’s hypothesis
E = any evidence
I = any background information
41
The “Bayesian Framework”: the odds form of Bayes’ Rule:
Pr(Hp | E, I) / Pr(Hd | E, I) = [Pr(E | Hp, I) / Pr(E | Hd, I)] × [Pr(Hp | I) / Pr(Hd | I)]
Posterior odds in favour of the prosecution’s hypothesis = Likelihood Ratio × Prior odds in favour of the prosecution’s hypothesis
E.g., prior odds of 1:100 combined with a likelihood ratio of 1,000 give posterior odds of 10:1.
42
The “Bayesian Framework”
The likelihood ratio has largely come to be the main quantity of interest in the forensic statistics literature:
LR = Pr(E | Hp, I) / Pr(E | Hd, I)
A measure of how much “weight” or “support” the “evidence” gives to one hypothesis relative to the other; here, Hp relative to Hd.
Major players: Evett, Aitken, Taroni, Champod. Influenced by Dennis Lindley.
43
The “Bayesian Framework”
The likelihood ratio ranges from 0 to infinity. Points of interest on the LR scale:
LR = 0 means the evidence TOTALLY DOES NOT SUPPORT Hp relative to Hd
LR = 1 means the evidence does not support either hypothesis more strongly than the other
LR = ∞ means the evidence TOTALLY SUPPORTS Hp relative to Hd
44
The “Bayesian Framework”
A standard verbal scale for the LR “weight of evidence” IS IN NO WAY, SHAPE OR FORM, SETTLED IN THE STATISTICS LITERATURE! A popular verbal scale is due to Jeffreys, but there are others. Read the British R v. T footwear case!
45
Bayesian Networks: the Likelihood Ratio can be obtained from the BN once evidence is entered. Use the odds form of Bayes’ Theorem:
LR = Posterior odds / Prior odds = [Pr(Hp | E) / Pr(Hd | E)] ÷ [Pr(Hp) / Pr(Hd)]
Posterior odds: probabilities of the theories after we entered the evidence. Prior odds: probabilities of the theories before we entered the evidence.
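A concrete Python sketch (reusing the grass-wet network from earlier as a stand-in, since the painting CPTs are not reproduced here), with Hp: it rained, Hd: it did not rain, and E: the grass is wet:

# CPTs as in the enumeration sketch earlier
p_rain = {True: 0.20, False: 0.80}
p_sprinkler = {True:  {True: 0.01, False: 0.99},
               False: {True: 0.40, False: 0.60}}
p_wet = {(True, True): 0.99, (True, False): 0.90,
         (False, True): 0.80, (False, False): 0.0}

def pr_evidence_and_rain(rain):
    """Pr(Grass Wet = yes, Rain = rain), summing the sprinkler out."""
    return sum(p_rain[rain] * p_sprinkler[rain][s] * p_wet[(s, rain)]
               for s in (True, False))

prior_odds = p_rain[True] / p_rain[False]                                 # 0.25
posterior_odds = pr_evidence_and_rain(True) / pr_evidence_and_rain(False)
print(f"LR = {posterior_odds / prior_odds:.3f}")                          # ≈ 2.23

The result, about 2.23, equals Pr(E | Hp) / Pr(E | Hd), exactly as the odds form of Bayes’ Theorem guarantees.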
46
The “Bayesian Framework”
Computing the LR from our painting provenance example: [Figure: the provenance network with the trace evidence entered]