
1 Bayesian models for fMRI data
SPM Course Zurich, 13–15 February 2013
Klaas Enno Stephan
Translational Neuromodeling Unit (TNU), Institute for Biomedical Engineering, University of Zurich & ETH Zurich
Laboratory for Social & Neural Systems Research (SNS), University of Zurich
Wellcome Trust Centre for Neuroimaging, University College London
With many thanks for slides & images to: FIL Methods group, particularly Guillaume Flandin and Jean Daunizeau
[Portrait: The Reverend Thomas Bayes (c. 1701–1761)]


3 Bayes' Theorem
"Bayes' theorem describes how an ideally rational person processes information." (Wikipedia)
Posterior = Likelihood × Prior / Evidence
[Portrait: Reverend Thomas Bayes]

4 Bayes' Theorem
Given data y and parameters θ, the joint probability can be written in two ways:
p(y,θ) = p(y|θ) p(θ) = p(θ|y) p(y)
Eliminating p(y,θ) gives Bayes' rule:
p(θ|y) = p(y|θ) p(θ) / p(y)
(Posterior = Likelihood × Prior / Evidence)
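Spelled out as a derivation, this is just the standard manipulation the slide alludes to:

```latex
% Factor the joint density two ways:
p(y, \theta) \;=\; p(y \mid \theta)\, p(\theta) \;=\; p(\theta \mid y)\, p(y)
% Equate the two factorisations and divide by the evidence p(y):
p(\theta \mid y) \;=\;
\frac{\overbrace{p(y \mid \theta)}^{\text{likelihood}}\;
      \overbrace{p(\theta)}^{\text{prior}}}
     {\underbrace{p(y)}_{\text{evidence}}}
```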

5 Bayesian inference: an animation

6 Principles of Bayesian inference
Formulation of a generative model: a likelihood p(y|θ) and a prior distribution p(θ).
Observation of data y.
Update of beliefs based upon observations, given a prior state of knowledge: p(θ|y) ∝ p(y|θ) p(θ).

7 Posterior mean & variance of univariate Gaussians
Likelihood p(y|μ) = N(y; μ, σ_e²) and prior p(μ) = N(μ; μ_p, σ_p²).
Posterior: p(μ|y) = N(μ; m, s²), with
1/s² = 1/σ_e² + 1/σ_p²
m = s² (μ_p/σ_p² + y/σ_e²)
Posterior mean = variance-weighted combination of prior mean and data mean.
[Figure: prior, likelihood and posterior densities]
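A minimal numeric sketch of this update; all numbers are made up for illustration:

```python
# Posterior of a univariate Gaussian mean with a Gaussian prior.
# The posterior mean is the precision-weighted (inverse-variance-weighted)
# combination of the prior mean and the data mean.

prior_mean, prior_var = 0.0, 4.0   # hypothetical prior
data_mean, data_var = 3.0, 1.0     # hypothetical likelihood (data mean & variance)

prior_prec = 1.0 / prior_var       # precision = 1 / variance
data_prec = 1.0 / data_var

post_prec = prior_prec + data_prec                 # precisions add
post_mean = (prior_prec * prior_mean
             + data_prec * data_mean) / post_prec  # precision-weighted average
post_var = 1.0 / post_prec

print(post_mean, post_var)  # 2.4 0.8 -- pulled towards the more precise data
```

Expressed in precisions rather than variances, this is exactly the relative precision weighting of the next slide.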

8 Relative precision weighting
Same thing, but expressed as a precision weighting (precision λ = 1/σ²):
Posterior precision: λ_post = λ_e + λ_p
Posterior mean: m = (λ_e y + λ_p μ_p) / (λ_e + λ_p)
[Figure: prior, likelihood and posterior densities]

9 Same thing, but from an explicit hierarchical perspective
Two-level model: y = μ + ε(1), μ = μ_p + ε(2).
The posterior again combines the two levels by relative precision weighting.
[Figure: prior, likelihood and posterior densities]

10 Why should I know about Bayesian stats?
Because Bayesian principles are fundamental for:
statistical inference in general
sophisticated analyses of (neuronal) systems
contemporary theories of brain function

11 Problems of classical (frequentist) statistics
p-value: probability of observing data at least as extreme as those measured, in the effect's absence.
Limitations:
One can never accept the null hypothesis.
Given enough data, one can always demonstrate a significant effect.
Correction for multiple comparisons is necessary.
Solution: infer the posterior probability of the effect.

12 Generative models: forward and inverse problems
Forward problem: the likelihood p(y|θ) and the prior p(θ) specify how data are generated from causes.
Inverse problem: compute the posterior distribution p(θ|y) over the causes, given observed data.
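A toy illustration of the two directions, using a conjugate Gaussian model so the inversion has a closed form (all numbers are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)

# Forward problem: run the generative model from cause to data.
theta = rng.normal(0.0, 1.0)           # draw a parameter from the prior p(theta)
y = theta + rng.normal(0.0, 0.5, 100)  # draw data from the likelihood p(y|theta)

# Inverse problem: go from the data back to a posterior over the cause.
prior_prec = 1.0 / 1.0**2              # prior is N(0, 1)
data_prec = len(y) / 0.5**2            # 100 observations with noise sd 0.5
post_var = 1.0 / (prior_prec + data_prec)
post_mean = post_var * data_prec * y.mean()  # prior mean is 0, so it drops out
print(theta, post_mean, post_var)
```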

13 Dynamic causal modeling (DCM)
Forward model: predicting measured activity (EEG, MEG, fMRI) given a putative neuronal state.
Model inversion: estimating neuronal mechanisms from measures of brain activity.
Friston et al. 2003, NeuroImage

14 The Bayesian brain hypothesis & free-energy principle
Prediction error = sensations − predictions.
Perception: change predictions to reduce prediction error.
Action: change sensory input to reduce prediction error.
Maximizing the evidence of the brain's generative model = minimizing the surprise about the data (sensory inputs).
Friston et al. 2006, J Physiol Paris

15 Individual hierarchical Bayesian learning
[Diagram: a hierarchy of hidden states, from volatility via associations to events in the world, generating sensory stimuli]
Mathys et al. 2011, Front. Hum. Neurosci.

16 Aberrant Bayesian message passing in schizophrenia
[Diagram of a cortical hierarchy: forward & lateral connections (recognition effects), backward & lateral connections (generation effects), de-correlating lateral interactions, and lateral interactions mediating priors]
Abnormal (precision-weighted) prediction errors → abnormal modulation of NMDAR-dependent synaptic plasticity at forward connections of cortical hierarchies.
Stephan et al. 2006, Biol. Psychiatry

17 Why should I know about Bayesian stats?
Because SPM is getting more and more Bayesian:
Segmentation & spatial normalisation
Posterior probability maps (PPMs)
–1st level: specific spatial priors
–2nd level: global spatial priors
Dynamic Causal Modelling (DCM)
Bayesian Model Selection (BMS)
EEG: source reconstruction

18 [Diagram: the standard SPM pipeline, image time-series → realignment → smoothing (kernel) → normalisation (template) → general linear model (design matrix) → parameter estimates → statistical parametric map (SPM) → statistical inference (Gaussian field theory, p < 0.05), annotated with the Bayesian components: Bayesian segmentation and normalisation, spatial priors on activation extent, posterior probability maps (PPMs), and dynamic causal modelling]

19 Spatial normalisation: Bayesian regularisation
Deformations consist of a linear combination of smooth basis functions (3D DCT).
Find maximum a posteriori (MAP) estimates: the MAP objective combines the "difference" between template and source image (the likelihood term) with the squared distance between the parameters and their expected values (the regularisation, i.e. prior, term).
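A schematic sketch of the kind of objective being minimised; the function and variable names are illustrative, not SPM's actual interface:

```python
import numpy as np

def map_objective(params, image_difference, prior_mean, prior_prec):
    """Hypothetical MAP objective for regularised registration:
    an image-mismatch (likelihood) term plus a squared-distance
    penalty pulling the deformation parameters towards their
    expected values (the log-prior, up to a constant)."""
    mismatch = image_difference(params)  # "difference" between template & source
    d = params - prior_mean
    penalty = 0.5 * d @ prior_prec @ d   # regularisation term
    return mismatch + penalty            # minimising this gives the MAP estimate

# Toy usage: a quadratic mismatch whose minimum lies away from the prior mean,
# so the MAP solution is a compromise between fit and regularisation.
target = np.array([1.0, 2.0, 0.5])
f = lambda p: float(np.sum((p - target) ** 2))
print(map_objective(np.zeros(3), f, prior_mean=np.zeros(3), prior_prec=np.eye(3)))
```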

20 Spatial normalisation: overfitting
Template image
Affine registration (χ² = 472.1)
Non-linear registration without regularisation (χ² = 287.3)
Non-linear registration using regularisation (χ² = 302.7)

21 Bayesian segmentation with empirical priors
Goal: for each voxel, compute the probability that it belongs to a particular tissue type, given its intensity:
p(tissue | intensity) ∝ p(intensity | tissue) ∙ p(tissue)
Likelihood: intensities are modelled by a mixture of Gaussian distributions representing different tissue classes (e.g. GM, WM, CSF).
Priors: obtained from tissue probability maps (segmented images of 151 subjects).
Ashburner & Friston 2005, NeuroImage
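A per-voxel sketch of this rule; the Gaussian parameters and prior probabilities below are invented for illustration (in SPM they come from the data and from the tissue probability maps):

```python
import numpy as np
from scipy.stats import norm

intensity = 0.62                    # one voxel's intensity
classes = ["GM", "WM", "CSF"]
means = [0.60, 0.80, 0.30]          # mixture-of-Gaussians likelihood per class
sds = [0.08, 0.06, 0.10]
priors = np.array([0.5, 0.3, 0.2])  # from tissue probability maps

lik = np.array([norm.pdf(intensity, m, s) for m, s in zip(means, sds)])
post = lik * priors
post /= post.sum()                        # normalise over tissue classes
print(dict(zip(classes, post.round(3))))  # p(tissue | intensity) per class
```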

22 Bayesian fMRI analyses
General linear model: y = Xβ + ε, with ε ~ N(0, Cε). What are the priors on β?
In "classical" SPM: no priors (= "flat" priors).
Full Bayes: priors are predefined.
Empirical Bayes: priors are estimated from the data, assuming a hierarchical generative model. The parameters of one level act as priors on the distribution of the parameters at the level below; parameters and hyperparameters at each level can be estimated using EM.

23 Posterior probability maps (PPMs)
Posterior distribution: probability of the effect given the data, summarised by its mean (size of effect) and precision (variability).
Posterior probability map: image of the probability that an activation exceeds some specified threshold γ, given the data y.
Two thresholds:
activation threshold γ: size of effect, e.g. a percentage of the whole-brain mean signal
probability threshold α that voxels must exceed to be displayed (e.g. 95%)
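For a Gaussian posterior at a single voxel, the PPM value and its thresholding look like this (numbers are hypothetical):

```python
from scipy.stats import norm

post_mean, post_sd = 1.2, 0.5  # posterior mean & sd of the effect at one voxel
gamma = 0.8                    # activation threshold (size of effect)
alpha = 0.95                   # probability threshold for display

# p(effect > gamma | y) under a Gaussian posterior:
p_exceed = 1.0 - norm.cdf(gamma, loc=post_mean, scale=post_sd)
print(p_exceed, p_exceed >= alpha)  # ~0.79 -> not displayed at alpha = 0.95
```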

24 2nd level PPMs with global priors
1st level (GLM): y = Xβ(1) + ε(1)
2nd level (shrinkage prior): β(1) = 0 + ε(2)
β(1) reflects regionally specific effects; assume that it is zero on average over voxels; the variance of this prior is implicitly estimated by estimating ε(2).
In the absence of evidence to the contrary, parameters will shrink to zero.
Heuristically: use the variance of mean-corrected activity over voxels as the prior variance of β at any particular voxel.

25 2nd level PPMs with global priors
1st level (GLM): y = Xβ + ε (voxel-specific)
2nd level (shrinkage prior): β = 0 + ε(2) (global; pooled estimate over voxels)
Compute Cε and Cβ via ReML/EM, and apply the usual rule for computing the posterior mean & covariance of Gaussians.
Friston & Penny 2003, NeuroImage
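The "usual rule" for a linear-Gaussian model, sketched with made-up covariances (in SPM these would come out of ReML/EM):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 50, 3
X = rng.normal(size=(n, p))  # design matrix
y = X @ np.array([1.0, 0.0, -0.5]) + rng.normal(scale=0.3, size=n)

C_e = 0.3**2 * np.eye(n)  # observation noise covariance (assumed known here)
C_b = 1.0 * np.eye(p)     # zero-mean shrinkage prior covariance on beta

Ce_inv = np.linalg.inv(C_e)
Cb_inv = np.linalg.inv(C_b)
post_cov = np.linalg.inv(X.T @ Ce_inv @ X + Cb_inv)  # posterior covariance
post_mean = post_cov @ X.T @ Ce_inv @ y              # posterior mean, shrunk to 0
print(post_mean)
```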

26 PPMs vs. SPMs
Classical t-test: probability of the data (or more extreme data), given that the effect is zero: p(y | β = 0).
Bayesian test: posterior probability that the effect exceeds a threshold: p(β > γ | y).
[Figure: likelihood, prior and posterior densities; example SPMs and PPMs]

27 PPMs and multiple comparisons
Friston & Penny (2003): no need to correct for multiple comparisons.
Thresholding a PPM at 95% confidence: in every voxel, the posterior probability of an activation ≥ γ is ≥ 95%. At most, 5% of the voxels identified could have activations less than γ.
Independent of the search volume, thresholding a PPM thus puts an upper bound on the false discovery rate.
NB: this argument is still being debated.

28 PPMs vs. SPMs
PPMs: show activations greater than a given size.
SPMs: show voxels with non-zero activations.

29 PPMs: pros and cons
Advantages:
One can infer that a cause did not elicit a response.
Inference is independent of search volume.
PPMs do not conflate effect size and effect variability.
Disadvantages:
Estimating priors over voxels is computationally demanding.
Practical benefits are yet to be established.
Thresholds other than zero require justification.

30 Model comparison and selection
Given competing hypotheses about the structure & functional mechanisms of a system, which model is the best?
For which model m does p(y|m) become maximal?
Which model represents the best balance between model fit and model complexity?
Pitt & Myung 2002, TICS

31 Bayesian model selection (BMS)
Model evidence: p(y|m) = ∫ p(y|θ,m) p(θ|m) dθ
The evidence accounts for both the accuracy and the complexity of the model; it is a measure of generalizability, i.e. of how well the model predicts over all possible datasets y.
Various approximations exist, e.g. the negative free energy, AIC and BIC.
Ghahramani 2004; MacKay 1992, Neural Comput.; Penny et al. 2004a, NeuroImage
[Figure: p(y|m) for models of different complexity, plotted over all possible datasets y]
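A toy numeric check of the evidence integral for a conjugate Gaussian model, where the marginal is also available in closed form (all numbers are hypothetical):

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

y = 1.5                                              # observed datum
lik = lambda th: norm.pdf(y, loc=th, scale=1.0)      # p(y | theta, m)
prior = lambda th: norm.pdf(th, loc=0.0, scale=2.0)  # p(theta | m)

# p(y|m) = integral of p(y|theta, m) p(theta|m) over theta
evidence, _ = quad(lambda th: lik(th) * prior(th), -30.0, 30.0)

# Conjugacy check: the marginal of y is N(0, 1^2 + 2^2)
print(evidence, norm.pdf(y, loc=0.0, scale=np.sqrt(1.0 + 4.0)))
```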

32 Approximations to the model evidence
Log model evidence = balance between fit and complexity.
Since the logarithm is a monotonic function, maximizing the log model evidence = maximizing the model evidence.
In SPM2 & SPM5, the interface offers two approximations (p = number of parameters, N = number of data points, accuracy = log likelihood of the fitted model):
Akaike Information Criterion: AIC = accuracy − p
Bayesian Information Criterion: BIC = accuracy − (p/2) log N
Penny et al. 2004a, NeuroImage
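In this log-evidence convention (larger values indicate a better model, as in Penny et al. 2004a), both criteria are one-liners; the numbers below are hypothetical:

```python
import numpy as np

def aic(accuracy, n_params):
    # accuracy = log likelihood of the fitted model
    return accuracy - n_params

def bic(accuracy, n_params, n_data):
    return accuracy - 0.5 * n_params * np.log(n_data)

# A richer model must buy its extra parameters with enough extra accuracy.
print(aic(-100.0, 5), bic(-100.0, 5, 200))   # -105.0  ~-113.2
print(aic(-95.0, 10), bic(-95.0, 10, 200))   # -105.0  ~-121.5
```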

33 The (negative) free energy approximation
Under Gaussian assumptions about the posterior (the Laplace approximation), the negative free energy F provides a lower bound on the log model evidence.
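The formula itself did not survive transcription; in its standard form, the negative free energy bounds the log evidence and decomposes into an accuracy and a complexity term:

```latex
F \;=\; \underbrace{\big\langle \log p(y \mid \theta, m) \big\rangle_{q}}_{\text{accuracy}}
\;-\; \underbrace{KL\big[\, q(\theta) \,\big\|\, p(\theta \mid m) \big]}_{\text{complexity}}
\;\le\; \log p(y \mid m)
```

Here q(θ) is the (Gaussian) approximate posterior; F equals the log evidence exactly when q matches the true posterior.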

34 The complexity term in F
In contrast to AIC & BIC, the complexity term of the negative free energy F accounts for parameter interdependencies; see the standard form below.
The complexity term of F is higher
–the more independent the prior parameters (→ effective DFs)
–the more dependent the posterior parameters
–the more the posterior mean deviates from the prior mean
NB: since SPM8, only F is used for model selection!
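Under the Laplace approximation, the complexity term is usually written as follows (μ and C denote the means and covariances of the prior and the posterior); each summand matches one of the bullets above:

```latex
\text{complexity} \;=\;
\tfrac{1}{2}\,\ln \lvert C_{\theta} \rvert
\;-\; \tfrac{1}{2}\,\ln \lvert C_{\theta \mid y} \rvert
\;+\; \tfrac{1}{2}\,\big(\mu_{\theta \mid y} - \mu_{\theta}\big)^{\!T}
      C_{\theta}^{-1}\,\big(\mu_{\theta \mid y} - \mu_{\theta}\big)
```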

35 Bayes factors
To compare two models, we could just compare their log evidences. But the log evidence is just some number, not very intuitive!
A more intuitive interpretation of model comparisons is made possible by Bayes factors:
B12 = p(y|m1) / p(y|m2), a positive value in [0; ∞[
Kass & Raftery classification (Kass & Raftery 1995, J. Am. Stat. Assoc.):
B12         p(m1|y)    Evidence
1 to 3      50–75%     weak
3 to 20     75–95%     positive
20 to 150   95–99%     strong
≥ 150       ≥ 99%      very strong
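Since software typically reports log evidences (or their free-energy approximations), the Bayes factor is recovered by exponentiating their difference; a small sketch with invented values:

```python
import numpy as np

log_ev_m1, log_ev_m2 = -300.0, -303.0  # hypothetical log evidences (or F values)

bf_12 = np.exp(log_ev_m1 - log_ev_m2)  # ~20.1 -> "strong" on the table above
p_m1 = bf_12 / (1.0 + bf_12)           # posterior p(m1|y) under equal model priors
print(bf_12, p_m1)                     # ~20.1, ~0.95
```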

36 BMS in SPM8: an example
[Diagram: four DCMs (M1–M4) of the attention-to-motion dataset, each containing V1, V5 and PPC with stimulus input and attentional modulation entering at different connections]
BF ≈ 2966: M2 better than M1
BF ≈ 12: M3 better than M2
BF ≈ 23: M4 better than M3

37 Thank you

