Presentation on theme: "Bayesian models for fMRI data"— Presentation transcript:
1 Bayesian models for fMRI data
Klaas Enno Stephan
Translational Neuromodeling Unit (TNU), Institute for Biomedical Engineering, University of Zurich & ETH Zurich
Laboratory for Social & Neural Systems Research (SNS), University of Zurich
Wellcome Trust Centre for Neuroimaging, University College London
With many thanks for slides & images to: FIL Methods group, particularly Guillaume Flandin and Jean Daunizeau
[Image: The Reverend Thomas Bayes]
SPM Course Zurich, 13-15 February 2013
6 Principles of Bayesian inference
Formulation of a generative model: likelihood p(y|θ) and prior distribution p(θ)
Observation of data y
Update of beliefs based upon observations, given a prior state of knowledge
7 Posterior mean & variance of univariate Gaussians
[Figure: likelihood, prior and posterior densities; the equations are not preserved in the transcript]
Posterior mean = variance-weighted combination of prior mean and data mean
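For reference, the textbook result for a single observation with Gaussian noise and a Gaussian prior is sketched below. The symbols are assumptions chosen for illustration (prior mean and variance written as \mu_p and \sigma_p^2, noise variance as \sigma_e^2) and may differ from the notation on the original slide.

```latex
\begin{aligned}
p(y \mid \theta) &= \mathcal{N}(y;\, \theta,\, \sigma_e^2), \qquad
p(\theta) = \mathcal{N}(\theta;\, \mu_p,\, \sigma_p^2) \\
p(\theta \mid y) &= \mathcal{N}(\theta;\, \mu,\, \sigma^2) \\
\frac{1}{\sigma^2} &= \frac{1}{\sigma_e^2} + \frac{1}{\sigma_p^2} \\
\mu &= \sigma^2 \left( \frac{1}{\sigma_e^2}\, y + \frac{1}{\sigma_p^2}\, \mu_p \right)
\end{aligned}
```

The posterior mean always lies between the prior mean and the observation, and the posterior variance is smaller than either the prior or the noise variance.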
8 Same thing – but expressed as precision weighting
[Figure: likelihood, prior and posterior densities; the equations are not preserved in the transcript]
Relative precision weighting
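A minimal numerical sketch of precision weighting for a univariate Gaussian follows. It assumes one observation with known noise variance; the function and argument names are hypothetical and not taken from the slides or from SPM.

```python
def gaussian_posterior(y, noise_var, prior_mean, prior_var):
    """Posterior mean and variance for y ~ N(theta, noise_var)
    under the prior theta ~ N(prior_mean, prior_var)."""
    lik_precision = 1.0 / noise_var      # precision of the data
    prior_precision = 1.0 / prior_var    # precision of the prior
    post_precision = lik_precision + prior_precision  # precisions add
    post_mean = (lik_precision * y + prior_precision * prior_mean) / post_precision
    return post_mean, 1.0 / post_precision


def relative_precision_update(y, noise_var, prior_mean, prior_var):
    """Same posterior mean, written as prior mean plus a weighted prediction error."""
    weight = (1.0 / noise_var) / (1.0 / noise_var + 1.0 / prior_var)
    return prior_mean + weight * (y - prior_mean)


# Hypothetical numbers: equally precise data and prior, so the posterior
# mean falls halfway between the prior mean (0.0) and the observation (2.0).
mean, var = gaussian_posterior(2.0, 1.0, 0.0, 1.0)
same_mean = relative_precision_update(2.0, 1.0, 0.0, 1.0)
```

The second function makes the "relative precision" reading explicit: the more precise the data are compared with the prior, the further the estimate moves towards the observation.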
9 Same thing – but explicit hierarchical perspective
[Figure: likelihood, prior and posterior densities; the equations are not preserved in the transcript]
Relative precision weighting
10 Why should I know about Bayesian stats?
Because Bayesian principles are fundamental for:
  statistical inference in general
  sophisticated analyses of (neuronal) systems
  contemporary theories of brain function
11 Problems of classical (frequentist) statistics
p-value: probability of observing the data (or more extreme data) in the effect's absence
Limitations:
  One can never accept the null hypothesis
  Given enough data, one can always demonstrate a significant effect
  Correction for multiple comparisons necessary
Solution: infer posterior probability of the effect
12 Generative models: Forward and inverse problems
Forward problem: likelihood and prior
Inverse problem: posterior distribution
13 Dynamic causal modeling (DCM)
Modalities: EEG, MEG, fMRI
Forward model: predicting measured activity given a putative neuronal state
Model inversion: estimating neuronal mechanisms from brain activity measures
Friston et al. (2003) NeuroImage
14 The Bayesian brain hypothesis & free-energy principle
Prediction error = sensations - predictions
Action: change sensory input
Perception: change predictions
Maximizing the evidence (of the brain's generative model) = minimizing the surprise about the data (sensory inputs).
Friston et al. 2006, J Physiol Paris
15 Individual hierarchical Bayesian learning
Levels shown: volatility, associations, events in the world, sensory stimuli
Mathys et al. 2011, Front. Hum. Neurosci.
16 Aberrant Bayesian message passing in schizophrenia
abnormal (precision-weighted) prediction errors
abnormal modulation of NMDAR-dependent synaptic plasticity at forward connections of cortical hierarchies
[Figure: cortical hierarchy with "forward & lateral" and "backward & lateral" input. Panel labels: forward recognition effects; de-correlating lateral interactions; backward generation effects; lateral interactions mediating priors]
Figure legend (symbols other than g are missing from the transcript):
  g: generative model
  expectation of approximate recognition density
  parameters of generative model (= connection strengths between levels)
  hyperparameters (= parameters encoding the uncertainty of the approximate recognition model)
  prediction error
Stephan et al. 2006, Biol. Psychiatry
17 Why should I know about Bayesian stats?
Because SPM is getting more and more Bayesian:
  Segmentation & spatial normalisation
  Posterior probability maps (PPMs)
    1st level: specific spatial priors
    2nd level: global spatial priors
  Dynamic Causal Modelling (DCM)
  Bayesian Model Selection (BMS)
  EEG: source reconstruction
18 [Figure: SPM analysis pipeline (image time-series, realignment, smoothing with a kernel, normalisation to a template, general linear model with design matrix, parameter estimates, statistical parametric map (SPM), statistical inference via Gaussian field theory at p < 0.05), annotated with its Bayesian components: Bayesian segmentation and normalisation; spatial priors on activation extent; posterior probability maps (PPMs); Dynamic Causal Modelling]
19 Spatial normalisation: Bayesian regularisation
Deformations consist of a linear combination of smooth basis functions (3D DCT).
Find maximum a posteriori (MAP) estimates of the deformation parameters.
The MAP objective (equation not preserved in the transcript) combines two terms:
  “Difference” between template and source image
  Squared distance between parameters and their expected values (regularisation)
21 Bayesian segmentation with empirical priors
Goal: for each voxel, compute the probability that it belongs to a particular tissue type, given its intensity
Likelihood: intensities are modelled by a mixture of Gaussian distributions representing different tissue classes (e.g. GM, WM, CSF).
Priors: obtained from tissue probability maps (segmented images of 151 subjects).
p(tissue | intensity) ∝ p(intensity | tissue) ∙ p(tissue)
Ashburner & Friston 2005, NeuroImage
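The voxel-wise rule p(tissue | intensity) ∝ p(intensity | tissue) ∙ p(tissue) can be illustrated with a toy sketch. This is not the SPM segmentation routine, which also handles bias correction and warping of the priors; the class means, standard deviations and prior values below are made-up numbers, and all names are hypothetical.

```python
import math


def gaussian_pdf(x, mean, sd):
    """Density of N(mean, sd^2) at x."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def tissue_posterior(intensity, class_params, priors):
    """Posterior tissue probabilities at one voxel.

    class_params: {tissue: (mean, sd)} of the Gaussian intensity model.
    priors: {tissue: p(tissue)} read from tissue probability maps at this voxel.
    """
    unnormalised = {
        tissue: gaussian_pdf(intensity, mean, sd) * priors[tissue]
        for tissue, (mean, sd) in class_params.items()
    }
    total = sum(unnormalised.values())  # normalising constant p(intensity)
    return {tissue: value / total for tissue, value in unnormalised.items()}


# Hypothetical intensity model and priors for a single voxel.
class_params = {"GM": (0.6, 0.1), "WM": (0.9, 0.1), "CSF": (0.2, 0.1)}
priors = {"GM": 0.5, "WM": 0.3, "CSF": 0.2}
posterior = tissue_posterior(0.65, class_params, priors)
```

With these numbers the voxel is assigned almost entirely to grey matter, because its intensity sits close to the GM class mean and the GM prior is the largest.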
22 Bayesian fMRI analyses
General Linear Model: y = Xθ + ε, with ε ~ N(0, Cε)
What are the priors?
  In “classical” SPM, no priors (= “flat” priors)
  Full Bayes: priors are predefined
  Empirical Bayes: priors are estimated from the data, assuming a hierarchical generative model
    Parameters of one level = priors for distribution of parameters at lower level
    Parameters and hyperparameters at each level can be estimated using EM
23 Posterior Probability Maps (PPMs)
Posterior distribution: probability of the effect given the data
  mean: size of effect
  precision: variability
Posterior probability map: images of the probability that an activation exceeds some specified threshold, given the data y
Two thresholds:
  activation threshold: percentage of whole brain mean signal
  probability that voxels must exceed to be displayed (e.g. 95%)
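The two thresholds can be illustrated with a short sketch, assuming a Gaussian posterior at each voxel. The function names and the example numbers are hypothetical and are not SPM code.

```python
import math


def prob_exceeds(post_mean, post_sd, activation_threshold):
    """P(effect > activation_threshold | y) for a Gaussian posterior."""
    z = (activation_threshold - post_mean) / post_sd
    return 0.5 * math.erfc(z / math.sqrt(2.0))  # upper tail of the normal CDF


def ppm_mask(post_means, post_sds, activation_threshold, prob_threshold=0.95):
    """True for voxels whose posterior probability reaches the display threshold."""
    return [
        prob_exceeds(mean, sd, activation_threshold) >= prob_threshold
        for mean, sd in zip(post_means, post_sds)
    ]


# Two hypothetical voxels with the same posterior sd but different effect sizes.
mask = ppm_mask([2.0, 1.2], [0.5, 0.5], activation_threshold=1.0)
```

Only the first voxel is displayed: its posterior mean lies two standard deviations above the activation threshold, whereas the second lies less than half a standard deviation above it.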
24 2nd level PPMs with global priors
1st level (GLM): y = Xθ(1) + ε(1)
2nd level (shrinkage prior): θ(1) = 0 + ε(2)
Heuristically: use the variance of mean-corrected activity over voxels as prior variance of θ at any particular voxel.
θ(1) reflects regionally specific effects
  → assume that it is zero on average over voxels
  → variance of this prior is implicitly estimated by estimating ε(2)
In the absence of evidence to the contrary, parameters will shrink to zero.
25 2nd level PPMs with global priors
1st level (GLM): voxel-specific
2nd level (shrinkage prior): global pooled estimate over voxels
Compute Cε and Cθ via ReML/EM, and apply the usual rule for computing posterior mean & covariance for Gaussians.
Friston & Penny 2003, NeuroImage
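The "usual rule" for a linear Gaussian model with a zero-mean shrinkage prior can be written as below. This is the standard conjugate result, given as a sketch: the symbols \eta_{\theta|y} and C_{\theta|y} for the posterior mean and covariance are assumed notation, and the slide's own equations are not preserved in the transcript.

```latex
\begin{aligned}
y &= X\theta + \varepsilon, \qquad
\varepsilon \sim \mathcal{N}(0,\, C_\varepsilon), \qquad
\theta \sim \mathcal{N}(0,\, C_\theta) \\
C_{\theta \mid y} &= \left( X^{\top} C_\varepsilon^{-1} X + C_\theta^{-1} \right)^{-1} \\
\eta_{\theta \mid y} &= C_{\theta \mid y}\, X^{\top} C_\varepsilon^{-1}\, y
\end{aligned}
```

When the data are uninformative (large C_\varepsilon), the prior precision dominates and the posterior mean shrinks towards zero; as C_\theta grows, the expression approaches the generalised least-squares estimate.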
27 PPMs and multiple comparisons
Friston & Penny (2003): no need to correct for multiple comparisons:
  Thresholding a PPM at 95% confidence: in every voxel, the posterior probability of an activation is at least 95%.
  At most, 5% of the voxels identified could have activations less than the activation threshold.
  Independent of the search volume, thresholding a PPM thus puts an upper bound on the false discovery rate.
NB: this claim is being debated
28 PPMs vs. SPMs
PPMs: show activations greater than a given size
SPMs: show voxels with non-zero activations
29 PPMs: pros and cons
Advantages:
  One can infer that a cause did not elicit a response
  Inference is independent of search volume
  PPMs do not conflate effect-size and effect-variability
Disadvantages:
  Estimating priors over voxels is computationally demanding
  Practical benefits are yet to be established
  Thresholds other than zero require justification
30 Model comparison and selection
Given competing hypotheses on structure & functional mechanisms of a system, which model is the best?
Which model represents the best balance between model fit and model complexity?
For which model m does p(y|m) become maximal?
Pitt & Myung (2002) TICS
31 Bayesian model selection (BMS)
Model evidence: p(y|m) = ∫ p(y|θ,m) p(θ|m) dθ
  accounts for both accuracy and complexity of the model
  a measure of generalizability
[Figure: p(y|m) plotted over y (all possible datasets); Ghahramani, 2004]
Various approximations, e.g.: negative free energy, AIC, BIC
MacKay 1992, Neural Comput.
Penny et al. 2004a, NeuroImage
32 Approximations to the model evidence
Logarithm is a monotonic function
Maximizing log model evidence = maximizing model evidence
Log model evidence = balance between fit and complexity
In SPM2 & SPM5, interface offers 2 approximations:
  Akaike Information Criterion: AIC = log p(y|θ,m) - p, where p is the no. of parameters
  Bayesian Information Criterion: BIC = log p(y|θ,m) - (p/2) log N, where N is the no. of data points
Penny et al. 2004a, NeuroImage
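As a hedged illustration, the two criteria can be computed as approximations to the log evidence, i.e. accuracy minus a complexity penalty, so that larger values are better. This is the convention assumed here, following the cited Penny et al. (2004a); it differs by a factor of -2 from the more common "2k - 2 log L" form. The function names are hypothetical.

```python
import math


def aic_log_evidence(log_likelihood, n_params):
    """AIC as a log-evidence approximation: accuracy minus number of parameters."""
    return log_likelihood - n_params


def bic_log_evidence(log_likelihood, n_params, n_data):
    """BIC as a log-evidence approximation: the penalty grows with log(n_data)."""
    return log_likelihood - 0.5 * n_params * math.log(n_data)


# Hypothetical fit: maximised log likelihood of -100 with 5 parameters and 100 scans.
aic = aic_log_evidence(-100.0, 5)
bic = bic_log_evidence(-100.0, 5, 100)
```

Under this convention BIC penalises each parameter more heavily than AIC as soon as there are more than about 7 data points (N > e^2), so BIC tends to favour simpler models.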
33 The (negative) free energy approximation
Under Gaussian assumptions about the posterior (Laplace approximation); the equation is not preserved in the transcript.
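For orientation, the standard decomposition of the negative free energy into accuracy and complexity is sketched below, together with the exact Kullback-Leibler divergence between a Gaussian approximate posterior q and a Gaussian prior. The notation (\mu_\theta, C_\theta for the prior; \mu_{\theta|y}, C_{\theta|y} for the posterior; d for the number of parameters) is an assumption, and the original slide may show a simplified form.

```latex
\begin{aligned}
\log p(y \mid m) &= F + \mathrm{KL}\left[\, q(\theta) \,\|\, p(\theta \mid y, m) \,\right] \\
F &= \underbrace{\left\langle \log p(y \mid \theta, m) \right\rangle_q}_{\text{accuracy}}
   - \underbrace{\mathrm{KL}\left[\, q(\theta) \,\|\, p(\theta \mid m) \,\right]}_{\text{complexity}} \\
\mathrm{KL}\left[\, q \,\|\, p \,\right] &= \tfrac{1}{2} \Big(
   \ln \lvert C_\theta \rvert - \ln \lvert C_{\theta \mid y} \rvert
   + \operatorname{tr}\!\left( C_\theta^{-1} C_{\theta \mid y} \right) - d \\
 &\qquad + \left( \mu_{\theta \mid y} - \mu_\theta \right)^{\top} C_\theta^{-1}
   \left( \mu_{\theta \mid y} - \mu_\theta \right) \Big)
\end{aligned}
```

Because the KL divergence in the first line is non-negative, F is a lower bound on the log evidence. The determinant terms capture dependencies among prior and posterior parameters, and the quadratic term grows as the posterior mean moves away from the prior mean.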
34 The complexity term in F
In contrast to AIC & BIC, the complexity term of the negative free energy F accounts for parameter interdependencies.
The complexity term of F is higher
  the more independent the prior parameters (more effective DFs)
  the more dependent the posterior parameters
  the more the posterior mean deviates from the prior mean
NB: Since SPM8, only F is used for model selection!
35 Bayes factors
To compare two models, we could just compare their log evidences.
But: the log evidence is just some number, which is not very intuitive!
A more intuitive interpretation of model comparisons is made possible by Bayes factors:
B12 = p(y|m1) / p(y|m2), a positive value in [0; ∞[
Kass & Raftery classification:
  B12         p(m1|y)   Evidence
  1 to 3      50-75%    weak
  3 to 20     75-95%    positive
  20 to 150   95-99%    strong
  ≥ 150       ≥ 99%     very strong
Kass & Raftery 1995, J. Am. Stat. Assoc.
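The relationship between log evidences, Bayes factors and the table's categories can be sketched as follows. The code assumes two models with equal prior probabilities; the function names are hypothetical, and the cut-offs are the ones listed in the table.

```python
import math


def bayes_factor(log_evidence_1, log_evidence_2):
    """B12 = p(y|m1) / p(y|m2), computed from (approximate) log evidences."""
    return math.exp(log_evidence_1 - log_evidence_2)


def posterior_prob_m1(b12):
    """p(m1|y) when only m1 and m2 are compared under equal model priors."""
    return b12 / (1.0 + b12)


def kass_raftery_label(b12):
    """Evidence category for model 1 over model 2."""
    if b12 < 1.0:
        return "favours model 2"
    if b12 < 3.0:
        return "weak"
    if b12 < 20.0:
        return "positive"
    if b12 < 150.0:
        return "strong"
    return "very strong"


# Hypothetical example: a log-evidence difference of 3 between two models.
b12 = bayes_factor(-120.0, -123.0)
label = kass_raftery_label(b12)
```

The probability column of the table follows directly from the second function: a Bayes factor of 3 gives p(m1|y) = 0.75, 20 gives about 0.95, and 150 gives about 0.99.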
36 BMS in SPM8: an example
[Figure: four DCMs (M1 to M4) with regions V1, V5 and PPC and inputs "stim" and "attention", plus plots of the model comparison results]
M2 better than M1: BF ≈ 2966
M3 better than M2: BF ≈ 12, ΔF = 2.450
M4 better than M3: BF ≈ 23, ΔF = 3.144
Posterior model probability in lower plot is a normalised probability: p(m_i|y) = p(y|m_i) / sum_j p(y|m_j)
Note that under flat model priors p(m_i|y) ∝ p(y|m_i)
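The normalisation p(m_i|y) = p(y|m_i) / sum_j p(y|m_j) can be sketched numerically, assuming flat model priors. The relative log evidences below are derived from the Bayes factors and F differences quoted on the slide (M1 is set to 0 as an arbitrary reference); the function name is hypothetical and this is not SPM code.

```python
import math


def posterior_model_probs(log_evidences):
    """Normalised posterior model probabilities under flat model priors.

    Subtracting the maximum before exponentiating avoids numerical
    overflow or underflow when log evidences are large in magnitude.
    """
    largest = max(log_evidences)
    weights = [math.exp(f - largest) for f in log_evidences]
    total = sum(weights)
    return [w / total for w in weights]


# Relative log evidences for M1..M4, reconstructed from the slide's numbers:
# BF(M2 vs M1) of about 2966, then F differences of 2.450 and 3.144.
f_m1 = 0.0
f_m2 = f_m1 + math.log(2966.0)
f_m3 = f_m2 + 2.450
f_m4 = f_m3 + 3.144
probs = posterior_model_probs([f_m1, f_m2, f_m3, f_m4])
```

Only differences in log evidence matter here, so adding a constant to all four values leaves the resulting probabilities unchanged.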