
Statistical Analysis
Rik Henson
With thanks to: Karl Friston, Andrew Holmes, Stefan Kiebel, Will Penny

Overview [processing-pipeline figure]: fMRI time-series → motion correction → spatial normalisation (standard template) → smoothing kernel → General Linear Model (design matrix) → parameter estimates → Statistical Parametric Map

Some Terminology
- SPM ("Statistical Parametric Mapping") is a massively univariate approach, meaning that a statistic (e.g., a T-value) is calculated for every voxel, using the "General Linear Model"
- Experimental manipulations are specified in a model ("design matrix") which is fit to each voxel to estimate the size of the experimental effects ("parameter estimates") in that voxel…
- … on which one or more hypotheses ("contrasts") are tested to make statistical inferences ("p-values"), correcting for multiple comparisons across voxels (using "Gaussian Field Theory")
- The parametric statistics assume continuous-valued data and additive noise that conforms to a "normal" distribution ("nonparametric" versions of SPM eschew such assumptions)

Some Terminology
- SPM is usually focused on "functional specialisation", i.e. localising different functions to different regions in the brain
- One might also be interested in "functional integration": how different regions (voxels) interact
- Multivariate approaches work on whole images and can identify spatial/temporal patterns over voxels, without necessarily specifying a design matrix (PCA, ICA)…
- … or with an experimental design matrix (PLS, CVA), or with an explicit anatomical model of connectivity between regions ("effective connectivity"), e.g. using Dynamic Causal Modelling

Overview
1. General Linear Model: design matrix
2. fMRI timeseries: global normalisation, highpass filtering, HRF convolution, temporal autocorrelation
3. Statistical Inference: Gaussian Field Theory
4. Random Effects
5. Experimental Designs
6. Effective Connectivity

General Linear Model…
Parametric statistics: one-sample t-test, two-sample t-test, paired t-test, ANOVA, ANCOVA, correlation, linear regression, multiple regression, F-tests, etc… all are cases of the General Linear Model

General Linear Model
Equation for a single (and every) voxel: y_j = x_j1 b_1 + … + x_jP b_P + e_j, with e_j ~ N(0, σ²)
- y_j : data for scan j = 1…N
- x_jp : explanatory variables / covariates / regressors, p = 1…P
- b_p : parameters / regression slopes / fixed effects
- e_j : residual errors, independent & identically (normally) distributed
Equivalent matrix form: y = Xb + e, where X is the "design matrix" / model

Matrix Formulation
The single-scan equations for scans j = 1…N stack into the simultaneous equations y = Xb + e (rows of X index scans, columns index regressors)… which can be solved for the parameters b_1…b_P

General Linear Model (Estimation)
Estimate the parameters from a least-squares fit to the data y: b̂ = (XᵀX)⁻¹Xᵀy = X⁺y (OLS estimates)
Fitted response: Ŷ = Xb̂
Residual errors and estimated error variance: r = y − Ŷ and σ̂² = rᵀr / df, where df are the degrees of freedom (assuming iid errors): df = N − rank(X) (= N − P if X is full rank)
(Equivalently, with the residual-forming matrix R = I − XX⁺: r = Ry and df = trace(R))
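
As a concrete illustration, here is a minimal NumPy sketch of these estimation equations (the two-column design matrix and the data are hypothetical toy values, not taken from the slides):

```python
import numpy as np

# Hypothetical toy design: one on/off regressor plus a constant term
X = np.column_stack([np.tile([1.0, 1.0, 0.0, 0.0], 3), np.ones(12)])
rng = np.random.default_rng(0)
y = X @ np.array([2.0, 10.0]) + rng.normal(0.0, 1.0, 12)

# OLS estimates: b = (X'X)^-1 X'y = X+ y
b = np.linalg.pinv(X) @ y

# Fitted response and residuals
Y_hat = X @ b
r = y - Y_hat

# Degrees of freedom and estimated error variance
df = len(y) - np.linalg.matrix_rank(X)   # df = N - rank(X)
sigma2 = (r @ r) / df                    # estimated error variance

# Equivalent route via the residual-forming matrix R = I - X X+
R = np.eye(len(y)) - X @ np.linalg.pinv(X)
```

Note that `r = R @ y` and `df = trace(R)` hold exactly, matching the parenthetical on the slide.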

General Linear Model (Inference)
Specify a contrast (hypothesis) c, a linear combination of the parameter estimates: cᵀb̂, e.g. c = [1 −1 0 0]ᵀ
Calculate the T-statistic for that contrast (c a vector): T(N−p) = cᵀb̂ / sqrt(var(cᵀb̂)) = cᵀb̂ / sqrt(σ̂² cᵀ(XᵀX)⁻¹c)
…or an F-statistic (c a matrix, e.g. c = [2 −1 −1 0; −1 2 −1 0; −1 −1 2 0]): F(p−p0, N−p) = [(r0ᵀr0 − rᵀr) / (p − p0)] / [rᵀr / (N − p)], where r0 and p0 are the residuals and number of parameters of the reduced model specified by c
The "p-value" is the probability of falsely rejecting the null hypothesis, H0: cᵀb = 0
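
The T- and F-statistics can be sketched the same way. This toy two-condition example (hypothetical data) also checks a useful identity: the F-test against the mean-only reduced model equals the square of the A-versus-B T-contrast:

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical design: condition A, condition B, and a grand-mean column
X = np.column_stack([np.repeat([1.0, 0.0], 6),
                     np.repeat([0.0, 1.0], 6),
                     np.ones(12)])
y = X[:, 0] * 3.0 + rng.normal(0.0, 1.0, 12)

b = np.linalg.pinv(X) @ y
r = y - X @ b
df = len(y) - np.linalg.matrix_rank(X)
sigma2 = (r @ r) / df

# T-statistic for the contrast c = [1 -1 0] (tests A > B)
c = np.array([1.0, -1.0, 0.0])
T = (c @ b) / np.sqrt(sigma2 * (c @ np.linalg.pinv(X.T @ X) @ c))

# F-statistic via the extra sum of squares of the reduced (mean-only) model
X0 = np.ones((12, 1))
r0 = y - X0 @ np.linalg.pinv(X0) @ y
p_extra = np.linalg.matrix_rank(X) - np.linalg.matrix_rank(X0)  # numerator df
F = ((r0 @ r0 - r @ r) / p_extra) / ((r @ r) / df)
```

The pseudoinverse is used because the design is overparameterised (the condition columns sum to the mean column); the contrast [1 −1 0] is estimable, so its value and variance do not depend on the choice of generalised inverse.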

Example PET experiment
12 scans, 3 conditions (1-way ANOVA): y_j = x_1j b_1 + x_2j b_2 + x_3j b_3 + x_4j b_4 + e_j, where the (dummy) variables are:
- x_1j = [0,1] = condition A (first 4 scans)
- x_2j = [0,1] = condition B (second 4 scans)
- x_3j = [0,1] = condition C (third 4 scans)
- x_4j = [1] = grand mean
Note rank(X) = 3.
T-contrasts: [1 −1 0 0] tests whether A>B; [−1 1 0 0] tests whether B>A
F-contrast: [2 −1 −1 0; −1 2 −1 0; −1 −1 2 0] tests the main effect of A, B, C
[Figure: example data, parameter estimates and residuals] For c = [−1 1 0 0]: T(9) = 10/sqrt(3.3×8) = 1.94, df = 12 − 3 = 9, p < .05
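
The design matrix and contrasts on this slide can be written down directly; a short NumPy sketch (the rank check mirrors the slide, and the F-contrast rows sum to zero over the condition columns, so they are estimable despite the rank deficiency):

```python
import numpy as np

# 12 scans, 3 conditions (4 scans each) plus a grand-mean column
X = np.zeros((12, 4))
X[0:4, 0] = 1.0    # condition A
X[4:8, 1] = 1.0    # condition B
X[8:12, 2] = 1.0   # condition C
X[:, 3] = 1.0      # grand mean

# The condition columns sum to the mean column, so rank(X) = 3, not 4
rank = np.linalg.matrix_rank(X)

# T-contrast testing A > B, and the F-contrast for the main effect of condition
c_t = np.array([1.0, -1.0, 0.0, 0.0])
c_F = np.array([[ 2.0, -1.0, -1.0, 0.0],
                [-1.0,  2.0, -1.0, 0.0],
                [-1.0, -1.0,  2.0, 0.0]])
```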

Global Effects
- There may be variation in PET tracer dose from scan to scan
- Such "global" changes in image intensity (gCBF) confound the local/regional (rCBF) changes of the experiment
- Adjust for global effects by: AnCova (additive model) for PET; proportional scaling for fMRI
- This can improve the statistics when the global signal is orthogonal to the effects of interest (as here)… but can also worsen them when the effects of interest are correlated with the global signal (as next)
[Figure: rCBF plotted against the global signal, under AnCova and scaling adjustments]

Global Effects (AnCova)
12 scans, 3 conditions, 1 confounding covariate: y_j = x_1j b_1 + x_2j b_2 + x_3j b_3 + x_4j b_4 + x_5j b_5 + e_j, where the (dummy) variables are:
- x_1j, x_2j, x_3j = [0,1] = conditions A, B, C (4 scans each)
- x_4j = grand mean
- x_5j = global signal (mean over all voxels, further mean-corrected over all scans)
Note rank(X) = 4. The global is correlated here with the conditions (and time); the global estimate can be scaled to, e.g., 50 ml/min/dl.
[Figure: data, parameter estimates b_1…b_5 and residuals] For c = [−1 1 0 0]: T(8) = 3.3/sqrt(3.8×8) = 0.61, df = 12 − 4 = 8, p > .05

Global Effects (fMRI)
- Two types of scaling: grand-mean scaling and global scaling
- Grand-mean scaling is automatic; global scaling is optional
- Grand-mean scaling scales by 100/mean over all voxels and ALL scans (i.e., a single number per session)
- Global scaling scales by 100/mean over all voxels for EACH scan (i.e., a different scaling factor for every scan)
- The problem with global scaling is that the TRUE global is not (normally) known… we only estimate it by the mean over voxels
- So if there is a large signal change over many voxels, the global estimate will be confounded by local changes
- This can produce artifactual deactivations in other regions after global scaling
- Since most sources of global variability in fMRI are low-frequency (drift), high-pass filtering may be sufficient, and many people do not use global scaling

A word on correlation/estimability
- If any column of X is a linear combination of any others (X is rank deficient), some parameters cannot be estimated uniquely (they are inestimable)…
- … which means some contrasts cannot be tested (e.g., only contrasts that sum to zero over the dependent columns)
- This has implications for whether the "baseline" (constant term) is explicitly or implicitly modelled
[Figure: with columns A, B and A+B, rank(X) = 2; the mean contrast c_m = [1 0 0] is not estimable, but the difference contrast c_d = [1 −1 0] is. With an "implicit" baseline (columns A, B): b_1 = 1.6, b_2 = 0.7, c_d = [1 −1], c_d·b = 0.9. With an "explicit" baseline (columns A, A+B): b_1 = 0.9, b_2 = 0.7, c_d = [1 0], c_d·b = 0.9]

A word on correlation/estimability
- (Rank deficiency might be thought of as perfect correlation…)
- The "implicit" (A, B) and "explicit" (A, A+B) parameterisations are related by a transformation T = [1 1; 0 1]: X(1)·T = X(2), and contrasts map as c(1)ᵀ·T = c(2)ᵀ, e.g. [1 −1]·T = [1 0]

A word on correlation/estimability
- When there is high (but not perfect) correlation between regressors, the parameters can be estimated…
- … but the estimates will be inefficient (i.e., highly variable)…
- … meaning some contrasts will not lead to very powerful tests
- (E.g., regressors A, B and A+B convolved with the HRF are no longer perfectly collinear, but remain highly correlated)
- SPM shows the pairwise correlation between regressors, but this will NOT tell you that, e.g., X1+X2 is highly correlated with X3… so some contrasts can still be inefficient, even though the pairwise correlations are low

A word on orthogonalisation
- To remove correlation between two regressors, you can explicitly orthogonalise one (X1) with respect to the other (X2): X1⊥ = X1 − (X2X2⁺)X1 (Gram-Schmidt)
- Paradoxically, this will NOT change the parameter estimate for X1, but it will for X2; in other words, the parameter estimate for the orthogonalised regressor is unchanged
- This reflects the fact that parameter estimates automatically reflect the orthogonal component of each regressor…
- … so there is no need to orthogonalise, UNLESS you have an a priori reason for assigning the common variance to the other regressor
[Figure: geometry of the fit: Y projected onto the plane of X1 and X2]

A word on orthogonalisation
[Figure: worked example]
- Original model (X1, X2): b_1 = 0.9, b_2 = 0.7
- Model M1, orthogonalise X2: b_1(M1) = 1.6, b_2(M1) = 0.7 (the orthogonalised regressor's estimate is unchanged)
- Model M2, orthogonalise X1: b_1(M2) = 0.9 (= b_1(M1) − b_2(M1)), b_2(M2) = 1.15 (= (b_1(M1) + b_2(M1))/2)
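
The claim that the orthogonalised regressor's estimate is unchanged can be verified numerically. A sketch with hypothetical correlated regressors (the 0.9/0.7 effect sizes echo the slide; everything else is made up):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100
x1 = rng.normal(size=n)
x2 = 0.7 * x1 + rng.normal(scale=0.5, size=n)   # correlated with x1
y = 0.9 * x1 + 0.7 * x2 + rng.normal(scale=0.1, size=n)

X = np.column_stack([x1, x2])
b = np.linalg.lstsq(X, y, rcond=None)[0]

# Gram-Schmidt: orthogonalise x2 with respect to x1
x2_orth = x2 - x1 * (x1 @ x2) / (x1 @ x1)
X_orth = np.column_stack([x1, x2_orth])
b_orth = np.linalg.lstsq(X_orth, y, rcond=None)[0]

# The estimate for the orthogonalised regressor (x2) is unchanged;
# x1 now absorbs the variance it shares with x2
```

With these data, `b_orth[1]` equals `b[1]` (to machine precision), while `b_orth[0]` jumps from roughly 0.9 to roughly 0.9 + 0.7 × 0.7, the simple-regression slope.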

Overview
1. General Linear Model: design matrix
2. fMRI timeseries: global normalisation, highpass filtering, HRF convolution, temporal autocorrelation
3. Statistical Inference: Gaussian Field Theory
4. Random Effects
5. Experimental Designs
6. Effective Connectivity

fMRI Analysis
- Scans are treated as a timeseries… and can be filtered to remove low-frequency (1/f) noise
- Effects of interest are convolved with the haemodynamic (BOLD) response function (HRF), to capture the sluggish nature of the response
- Scans can no longer be treated as independent observations… they are typically temporally autocorrelated (for TRs < 8 s)

(Epoch) fMRI example…
[Figure: a voxel timeseries modelled as b_1 × box-car function + b_2 × baseline (mean) + error ε(t); box-car unconvolved]

(Epoch) fMRI example…
[Figure: the same model in matrix form: data vector y (voxel time series) = design matrix X × parameters b + error vector ε]

Low frequency noise
- Physical: scanner drifts
- Physiological (aliased): cardiac (~1 Hz), respiratory (~0.25 Hz)
[Figure: power spectra of the noise and of the signal (e.g. an infinite 30 s on-off box-car), with a highpass filter]

(Epoch) fMRI example… with highpass filter
[Figure: the design matrix X now includes a set of low-frequency regressors implementing the highpass filter, alongside the box-car and mean; data vector y = X × parameters b + error]
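
The highpass filter can be implemented as a discrete cosine basis set up to a cutoff period. This sketch assumes a 128 s cutoff (a commonly used default; treat it as an assumption) and simply projects the drift components out of the data:

```python
import numpy as np

def dct_highpass_basis(n_scans, TR, cutoff=128.0):
    """Discrete cosine drift basis: components with period > `cutoff` seconds.
    (The 128 s default cutoff is an assumed, commonly used value.)"""
    order = int(np.floor(2 * n_scans * TR / cutoff)) + 1
    t = np.arange(n_scans)
    cols = [np.cos(np.pi * k * (2 * t + 1) / (2 * n_scans)) for k in range(1, order)]
    return np.array(cols).T

n, TR = 128, 2.0
K = dct_highpass_basis(n, TR)

# Highpass filtering = projecting the drift basis out of the timeseries
R = np.eye(n) - K @ np.linalg.pinv(K)
drift = K[:, 0]          # a slow component the filter should remove
filtered = R @ drift     # ~ zero after filtering
```

In practice these drift regressors can either be added as confound columns of the design matrix or applied as the projection above; the two are equivalent for the effects of interest.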

(Epoch) fMRI example… fitted and adjusted data
[Figure: raw fMRI timeseries; highpass-filtered (and scaled) data; fitted high-pass filter; fitted box-car; adjusted data; residuals]

fMRI Analysis
- Scans are treated as a timeseries… and can be filtered to remove low-frequency (1/f) noise
- Effects of interest are convolved with the haemodynamic (BOLD) response function (HRF), to capture the sluggish nature of the response
- Scans can no longer be treated as independent observations… they are typically temporally autocorrelated (for TRs < 8 s)

Convolution with HRF
[Figure: box-car function ⊗ haemodynamic response function = convolved regressor. The unconvolved fit leaves structured residuals; the convolved fit leaves residuals with less structure]

fMRI Analysis
- Scans are treated as a timeseries… and can be filtered to remove low-frequency (1/f) noise
- Effects of interest are convolved with the haemodynamic (BOLD) response function (HRF), to capture the sluggish nature of the response
- Scans can no longer be treated as independent observations… they are typically temporally autocorrelated (for TRs < 8 s)

Temporal autocorrelation…
- Because the data are typically correlated from one scan to the next, one cannot assume the degrees of freedom (dfs) are simply the number of scans minus the dfs used in the model; one needs the "effective degrees of freedom"
- In other words, the residual errors are not independent: y = Xb + e, e ~ N(0, σ²V), V ≠ I, V = AA′, where A is the intrinsic autocorrelation
- Generalised least squares: Ky = KXb + Ke, Ke ~ N(0, σ²V), V = KAA′K′
- (Autocorrelation is a special case of "nonsphericity"…)

Temporal autocorrelation (History)
Ky = KXb + Ke, Ke ~ N(0, σ²V), V = KAA′K′
- One method is to estimate A, using for example an AR(p) model, then set K = A⁻¹, giving V = I (which allows OLS). This "pre-whitening" is sensitive, but can be biased if K is mis-estimated.
- Another method (SPM99) is to smooth the data with a known autocorrelation that swamps any intrinsic autocorrelation: K = S, V = SAA′S′ ≈ SS′ (use GLS). Effective degrees of freedom are calculated with the Satterthwaite approximation (df = trace(RV)² / trace(RVRV)). This is more robust (providing the temporal smoothing is sufficient, e.g. a 4 s FWHM Gaussian), but less sensitive.
- The most recent method (SPM2) is to restrict K to the highpass filter, and estimate the residual autocorrelation A using voxel-wide, one-step ReML…
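
A sketch of the pre-whitening idea with a known AR(1) autocorrelation (ρ = 0.4 is an arbitrary illustrative value): choosing K = V^(−1/2) makes the covariance of the whitened errors spherical, cov(Ke) = σ²KVK′ = σ²I:

```python
import numpy as np

n, rho = 200, 0.4

# Intrinsic AR(1) autocorrelation: V[i, j] = rho^|i - j|
i = np.arange(n)
V = rho ** np.abs(i[:, None] - i[None, :])

# Whitening matrix K = V^(-1/2) via the eigendecomposition of V
evals, evecs = np.linalg.eigh(V)
K = evecs @ np.diag(evals ** -0.5) @ evecs.T

# After whitening, K V K' = I, so OLS on (Ky, KX) is valid
whitened_cov = K @ V @ K.T
```

In practice A (and hence V) must itself be estimated from the residuals, which is where the bias risk mentioned above comes in.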

Nonsphericity and ReML (SPM2) [new in SPM2]
- Nonsphericity means (kind of) that: Ce = cov(e) ≠ σ²I
- Nonsphericity can be modelled by a set of variance components: Ce = λ1Q1 + λ2Q2 + λ3Q3 … (the λi are hyper-parameters)
- Non-identical (inhomogeneous) errors: e.g. two groups of subjects
- Non-independent (autocorrelated) errors: e.g. white noise + AR(1)
[Figure: scan-by-scan error covariance matrices, spherical vs nonspherical]

Nonsphericity and ReML (SPM2) [new in SPM2]
- Joint estimation of parameters and hyper-parameters requires ReML
- ReML gives (Restricted) Maximum Likelihood (ML) estimates of the (hyper)parameters, rather than Ordinary Least Squares (OLS) estimates: b_OLS = (XᵀX)⁻¹Xᵀy (= X⁺y), whereas b_ML = (XᵀCe⁻¹X)⁻¹XᵀCe⁻¹y, with Ĉe = ReML(yyᵀ, X, Q)
- ML estimates are more efficient and entail exact dfs (no Satterthwaite approximation)… but are computationally expensive: ReML is iterative (unless there is only one hyper-parameter)
- To speed up: the correlation of errors, V, is estimated by pooling over voxels, V = ReML(Σj yj yjᵀ, X, Q); the covariance of errors (σ²V) is then estimated by a single, voxel-specific scaling hyperparameter

Nonsphericity and ReML (SPM2) [new in SPM2]
- Voxels to be pooled are collected by a first pass through the data (OLS) (biased if the correlation structure is not stationary across voxels?)
- The correlation structure V is estimated iteratively using ReML once, pooling over all voxels
- The remaining hyper-parameter is estimated using V and ReML noniteratively, for each voxel
- The estimate of nonsphericity is used to pre-whiten the data and design matrix, W = V^(−1/2) (or by KW, if a highpass filter K is present)
- (This is why design matrices in SPM2 can differ from those in SPM99 after estimation)
[Figure: X, W and WX]

The Full-Monty T-test (SPM2) [new in SPM2; figure only]

Overview
1. General Linear Model: design matrix
2. fMRI timeseries: global normalisation, highpass filtering, HRF convolution, temporal autocorrelation
3. Statistical Inference: Gaussian Field Theory
4. Random Effects
5. Experimental Designs
6. Effective Connectivity

Multiple comparisons…
- If n = 100,000 voxels are tested with a probability p_u = 0.05 of falsely rejecting H0… then approximately n × p_u (e.g. 5,000) will do so by chance (false positives, or "type I" errors)
- Therefore we need to "correct" the p-values for the number of comparisons
- A severe correction would be Bonferroni, where p_c = p_u / n… but this is only appropriate when the n tests are independent…
- SPMs are smooth, meaning that nearby voxels are correlated → Gaussian Field Theory…
[Figure: SPM{t} of random noise, Gaussian 10 mm FWHM (2 mm pixels), thresholded at p_u = 0.05]
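
The scale of the problem is easy to simulate: under the null, independent tests at p_u = 0.05 produce about n × p_u false positives (a toy simulation with independent voxels only; real SPMs are smooth, which is why field theory rather than Bonferroni is used):

```python
import numpy as np

rng = np.random.default_rng(4)
n_voxels, p_u = 100_000, 0.05

# Null data: an independent z-value at every "voxel"
z = rng.standard_normal(n_voxels)
z_crit = 1.6449                 # one-tailed z threshold for p = 0.05

false_pos = int(np.sum(z > z_crit))   # expect ~ n_voxels * p_u = 5000

# Bonferroni correction: p_c = p_u / n gives a far stricter threshold,
# appropriate only when the n tests are independent
p_c = p_u / n_voxels
```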

Gaussian Field Theory
- Consider the SPM as a lattice representation of a continuous random field
- "Euler characteristic": a topological measure of the "excursion set" (e.g. number of components minus number of "holes")
- Smoothness is estimated by the covariance of the partial derivatives of the residuals (expressed as "resels" or FWHM)
- Assumes: 1) the residuals are multivariate normal; 2) smoothness ≫ voxel size (practically, FWHM ≥ 3 × VoxDim)
- Not necessarily stationary: smoothness can be estimated locally as resels-per-voxel

E[(WAu)] = S Rd (W) rd (u) Generalised Form General form for expected Euler characteristic for D dimensions: E[(WAu)] = S Rd (W) rd (u) Rd (W): d-dimensional Minkowski – function of dimension, d, space W and smoothness: R0(W) = (W) Euler characteristic of W R1(W) = resel diameter R2(W) = resel surface area R3(W) = resel volume rd (W): d-dimensional EC density of Z(x) – function of dimension, d, threshold, u, and statistic, e.g. Z-statistic: r0(u) = 1- (u) r1(u) = (4 ln2)1/2 exp(-u2/2) / (2p) r2(u) = (4 ln2) exp(-u2/2) / (2p)3/2 r3(u) = (4 ln2)3/2 (u2 -1) exp(-u2/2) / (2p)2 r4(u) = (4 ln2)2 (u3 -3u) exp(-u2/2) / (2p)5/2

Levels of Inference
Three levels of inference:
- voxel-level inference: extreme voxel values, e.g. P(t ≥ 4.37) = 0.048
- cluster-level inference: big suprathreshold clusters, e.g. P(n ≥ 82, t ≥ u) = 0.029
- set-level inference: many suprathreshold clusters, e.g. P(c ≥ 3, n ≥ k, t ≥ u) = 0.019
- Omnibus: P(c ≥ 7, t ≥ u) = 0.031
[Figure: SPM with clusters of n = 82, 32 and 12 voxels]
Parameters: "height" threshold u: t > 3.09; "extent" threshold k = 12 voxels; dimension D = 3; volume S = 32³ voxels; smoothness FWHM = 4.7 voxels

(Spatial) Specificity vs. Sensitivity

Small-volume correction
- If you have an a priori region of interest, there is no need to correct for the whole brain… you can use GFT to correct for a Small Volume (SVC)
- The volume can be based on: an anatomically-defined region; a geometric approximation to it (e.g. rhomboid/sphere); or a functionally-defined mask (based on an ORTHOGONAL contrast!)
- The extent of the correction can be APPROXIMATED by a Bonferroni correction for the number of resels…
- … but the correction also depends on the shape (surface area) as well as the size (volume) of the region (you may want to smooth the volume if it is rough)

Example SPM window

Overview
1. General Linear Model: design matrix
2. fMRI timeseries: global normalisation, highpass filtering, HRF convolution, temporal autocorrelation
3. Statistical Inference: Gaussian Field Theory
4. Random Effects
5. Experimental Designs
6. Effective Connectivity

Fixed vs. Random Effects
- Subjects can be Fixed or Random variables
- If subjects are a Fixed variable in a single design matrix (SPM "sessions"), the error term conflates within- and between-subject variance [Figure: multi-subject Fixed Effect model, subjects 1–6, error df ~ 300]
- In PET, this is not such a problem because the within-subject (between-scan) variance can be as great as the between-subject variance; but in fMRI the between-scan variance is normally much smaller than the between-subject variance
- If one wishes to make an inference from a subject sample to the population, one needs to treat subjects as a Random variable, with a proper mixture of within- and between-subject variance
- In SPM, this is achieved by a two-stage procedure: 1) (contrasts of) parameters are estimated from a (Fixed Effect) model for each subject; 2) images of these contrasts become the data for a second design matrix (usually a simple t-test or ANOVA)

Two-stage "Summary Statistic" approach
- 1st level (within-subject): each subject's model is fit with within-subject error σ_w, and contrast images of cᵀb̂_i are computed per subject
- 2nd level (between-subject): the contrast images of the N = 6 subjects enter a one-sample t-test (error df = 5), estimating the population effect b̂_pop; the SPM{t} is thresholded at e.g. p < 0.001 (uncorrected)
- WHEN this is the special case of n independent observations per subject: var(b̂_pop) = σ²_b / N + σ²_w / Nn
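
The two-stage procedure in miniature (all numbers hypothetical): each subject's contrast is estimated at the 1st level, and the six contrast values enter a one-sample t-test at the 2nd level with N − 1 = 5 df:

```python
import numpy as np

rng = np.random.default_rng(5)
n_subj, n_scans = 6, 50

# 1st level: estimate a contrast of parameter estimates per subject
con = []
for s in range(n_subj):
    x = np.r_[np.ones(n_scans // 2), np.zeros(n_scans // 2)]  # toy design
    X = np.column_stack([x, np.ones(n_scans)])
    beta_true = rng.normal(1.0, 0.5)        # between-subject variability
    y = X @ np.array([beta_true, 10.0]) + rng.normal(0.0, 1.0, n_scans)
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    con.append(b[0])                        # contrast c = [1 0]

# 2nd level: one-sample t-test across subjects
con = np.array(con)
df = n_subj - 1
t_stat = con.mean() / (con.std(ddof=1) / np.sqrt(n_subj))
```

Because the 2nd-level variance is computed across subjects, it automatically mixes within- and between-subject variance, which is what licenses population inference.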

Limitations of the 2-stage approach [new in SPM2]
- The summary statistic approach is a special case, valid only when each subject's design matrix is identical ("balanced designs")
- In practice, the approach is reasonably robust to unbalanced designs (Penny, 2004)
- More generally, exact solutions to any hierarchical GLM can be obtained using ReML
- This is computationally expensive to perform at every voxel (so it is not implemented in SPM2)
- Also, modelling nonsphericity at the 2nd level can minimise the potential bias of unbalanced designs…

Nonsphericity again! [new in SPM2]
- When tests at the 2nd level are more complicated than one/two-sample t-tests, the errors can be non-i.i.d.
- For example, two groups (e.g. patients and controls) may have different variances (non-identically distributed; inhomogeneity of variance)
- Or, when taking more than one parameter per subject (repeated measures, e.g. multiple basis functions in event-related fMRI), the errors may be non-independent
- (If the nonsphericity correction is selected, inhomogeneity is assumed, with a further option for repeated measures)
- The same method of variance-component estimation with ReML (as used for autocorrelation) is applied (the Greenhouse-Geisser correction for repeated-measures ANOVAs is a special-case approximation)
[Figure: variance components Q for inhomogeneous variance and for repeated measures (3 groups of 4 subjects)]

Hierarchical Models [new in SPM2]
y = X(1) β(1) + e(1)
β(1) = X(2) β(2) + e(2)
…
β(n−1) = X(n) β(n) + e(n), with Ce(i) = Σ_k λ_k(i) Q_k(i)
- The two-stage approach is a special case of the hierarchical GLM
- In a Bayesian framework, the parameters of one level can be made priors on the distribution of the parameters at the lower level: "Parametric Empirical Bayes" (Friston et al, 2002)
- The parameters and hyperparameters at each level can be estimated using the EM algorithm (a generalisation of ReML)
- Note that the parameters and hyperparameters at the final level do not differ from the classical framework
- The second level could be subjects; it could also be voxels…

Parametric Empirical Bayes & PPMs [new in SPM2]
Bayes rule: p(θ|y) ∝ p(y|θ) p(θ) (posterior ∝ likelihood × prior; the posterior gives the PPM, the likelihood the SPM)
What are the priors?
- In "classical" SPM, no (flat) priors
- In "full" Bayes, priors might come from theoretical arguments, or from independent data
- In "empirical" Bayes, priors derive from the same data, assuming a hierarchical model for the generation of those data

Parametric Empirical Bayes & PPMs [new in SPM2]
Bayes rule: p(θ|y) ∝ p(y|θ) p(θ)
- For PPMs in SPM2, the priors come from the distribution over voxels
- If the mean over voxels is removed, the prior mean can be set to zero (a "shrinkage" prior)
- One can threshold the posteriors for a given probability that a parameter estimate is greater than some value… to give a posterior probability map (PPM), the Bayesian analogue of the classical T-test
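
A toy sketch of the shrinkage-prior idea using normal-normal conjugacy (the simulated estimates and the known, common error variance are assumptions for illustration): the prior variance is taken from the spread of estimates over voxels, and each voxel's posterior is thresholded at P(β > γ):

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(6)
# Hypothetical per-voxel contrast estimates and their (assumed known) variance
b_hat = rng.normal(0.0, 1.0, 10_000)
s2 = 0.25

# Empirical shrinkage prior: zero mean, variance estimated from the spread
# of estimates over voxels (total variance = prior variance + error variance)
tau2 = max(b_hat.var() - s2, 1e-6)

# Normal-normal posterior at each voxel
post_var = tau2 * s2 / (tau2 + s2)
post_mean = b_hat * tau2 / (tau2 + s2)

# Posterior probability that the effect exceeds a threshold gamma
gamma = 0.5
ppm = 0.5 * (1 - np.vectorize(erf)((gamma - post_mean) / sqrt(2 * post_var)))
```

Every posterior mean is pulled towards zero (shrinkage), and the map `ppm` can be thresholded at, say, 0.95 to display "activations greater than γ with high posterior probability".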

Parametric Empirical Bayes & PPMs [new in SPM2]
PPM (Bayesian) vs SPM (classical):
- activations greater than a certain amount vs voxels with non-zero activations
- can infer no responses vs cannot "prove the null hypothesis"
- no fallacy of inference vs fallacy of inference (large df)
- inference independent of search volume vs must correct for search volume
- computationally expensive vs computationally faster

Overview
1. General Linear Model: design matrix
2. fMRI timeseries: global normalisation, highpass filtering, HRF convolution, temporal autocorrelation
3. Statistical Inference: Gaussian Field Theory
4. Random Effects
5. Experimental Designs
6. Effective Connectivity

A taxonomy of design
- Categorical designs: Subtraction (additive factors and pure insertion); Conjunction (testing multiple hypotheses)
- Parametric designs: Linear (cognitive components and dimensions); Nonlinear (polynomial expansions)
- Factorial designs: Categorical (interactions and pure insertion; adaptation, modulation and dual-task inference); Parametric (linear and nonlinear interactions; psychophysiological interactions)

A taxonomy of design
- Categorical designs: Subtraction (additive factors and pure insertion); Conjunction (testing multiple hypotheses)
- Parametric designs: Linear (cognitive components and dimensions); Nonlinear (polynomial expansions)
- Factorial designs: Categorical (interactions and pure insertion; adaptation, modulation and dual-task inference); Parametric (linear and nonlinear interactions; psychophysiological interactions)

A categorical analysis
Experimental design: word generation (G) vs word repetition (R), alternating (R G R G R G R G R G R G)
G − R = intrinsic word generation… under the assumption of pure insertion, i.e. that G and R do not differ in other ways

A taxonomy of design
- Categorical designs: Subtraction (additive factors and pure insertion); Conjunction (testing multiple hypotheses)
- Parametric designs: Linear (cognitive components and dimensions); Nonlinear (polynomial expansions)
- Factorial designs: Categorical (interactions and pure insertion; adaptation, modulation and dual-task inference); Parametric (linear and nonlinear interactions; psychophysiological interactions)

Cognitive Conjunctions
- One way to minimise the problem of pure insertion is to isolate the same process in several different ways (i.e., multiple subtractions of different conditions)
- 2×2 design (Price et al, 1997), factors Task (viewing/naming) × Stimuli (objects/colours), with V = visual processing, R = object recognition, P = phonological retrieval:
  A1 object viewing = R,V; A2 colour viewing = V; B1 object naming = P,R,V; B2 colour naming = P,V
- Conjunction: (Object − Colour viewing) [1 −1 0 0] & (Object − Colour naming) [0 0 1 −1]: [R,V − V] & [P,R,V − P,V] = R & R = R (assuming R×P = 0; see later)
- This isolates the common object recognition response (R)

Cognitive Conjunctions [new in SPM2]
- The original (SPM97) definition of conjunctions entailed the sum of two simple effects (A1−A2 + B1−B2) plus exclusive masking with the interaction (A1−A2) − (B1−B2); i.e., "effects significant and of similar size"
- (The difference between conjunctions and masking is that conjunction p-values reflect the conjoint probabilities of the contrasts)
- The SPM2 definition of conjunctions uses advances in Gaussian Field Theory (e.g. T² fields), allowing corrected p-values
- However, the logic has changed slightly, in that voxels can survive a conjunction even though they show an interaction

A taxonomy of design
- Categorical designs: Subtraction (additive factors and pure insertion); Conjunction (testing multiple hypotheses)
- Parametric designs: Linear (cognitive components and dimensions); Nonlinear (polynomial expansions)
- Factorial designs: Categorical (interactions and pure insertion; adaptation, modulation and dual-task inference); Parametric (linear and nonlinear interactions; psychophysiological interactions)

Nonlinear parametric responses
Polynomial expansion of a parametric modulator: f(x) ~ b1.x + b2.x^2 + ..., up to order N-1 for N levels.
E.g., an F-contrast [0 1 0] on the quadratic parameter reveals an inverted-U response to increasing word presentation rate in the DLPFC (SPM{F}).
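A minimal numpy sketch of the polynomial expansion (the presentation rates, noise level and effect sizes are invented for illustration; note the columns here are ordered constant/linear/quadratic, so the quadratic term is the third regressor rather than the second as in the slide's contrast):

```python
import numpy as np

# Hypothetical word-presentation rates, one per scan (illustrative values)
rates = np.tile(np.array([10., 20., 30., 40., 50., 60.]), 20)

# Polynomial expansion of the parametric modulator; a 2nd-order model
# suffices to capture an inverted-U shape
x = rates - rates.mean()               # centre to reduce collinearity
X = np.column_stack([np.ones_like(x),  # constant
                     x,                # linear term
                     x**2])            # quadratic term

# Simulated inverted-U response peaking at intermediate rates, plus noise
rng = np.random.default_rng(0)
y = 5.0 - 0.01 * (rates - 35.)**2 + rng.normal(0, 0.5, size=rates.shape)

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
# An F-test on the quadratic regressor asks whether that term is needed;
# a negative quadratic beta indicates the inverted U
print(beta[2] < 0)   # True
```

An F-contrast picking out the quadratic column tests the nonlinearity; the sign of the estimated coefficient gives the direction of the curvature.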

Interactions and pure insertion
The presence of an interaction can reveal a failure of pure insertion (using the earlier example). 2x2 design: Task (Viewing/Naming) x Stimuli (Objects/Colours); components V, R, P as before, but now:
 - Object viewing = R,V
 - Colour viewing = V
 - Object naming = P,R,V,RxP
 - Colour naming = P,V
(Object - Colour) x (Viewing - Naming):
[1 -1 0 0] - [0 0 1 -1] = [1 -1] ⊗ [1 -1] = [1 -1 -1 1]
[R,V - V] - [P,R,V,RxP - P,V] = R - (R,RxP) = RxP
=> a naming-specific object-recognition response.
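The interaction contrast is just the Kronecker product of the two main-effect contrasts, which can be confirmed directly:

```python
import numpy as np

# Conditions ordered: Obj-view, Col-view, Obj-name, Col-name
task = np.array([1, -1])   # Viewing - Naming
stim = np.array([1, -1])   # Objects - Colours

# The 2x2 interaction contrast is the Kronecker product of the main effects
interaction = np.kron(task, stim)
print(interaction)   # [ 1 -1 -1  1]
```

This generalises: for any factorial design, interaction contrasts are Kronecker products of the factor-wise contrasts.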

Psycho-Physiological Interaction (PPI)
A parametric, factorial design in which one factor is psychological (e.g., attention) and the other is physiological (viz. activity extracted from a brain region of interest).
Example: attentional modulation of the V1 contribution to V5. Regressing V5 activity on V1 activity gives a steeper slope during attention than during no attention (SPM{Z}).

Psycho-Physiological Interaction (PPI) (new in SPM2)
PPIs are tested by a GLM of the form:
y = (V1xA).b1 + V1.b2 + A.b3 + e, with contrast c = [1 0 0]
However, the interaction term of interest, V1xA, is the product of V1 activity and the attention block AFTER convolution with the HRF. We are really interested in the interaction at the neural level, but:
(HRF ⊗ V1) x (HRF ⊗ A) ≠ HRF ⊗ (V1 x A)
(unless A is low frequency, e.g., blocked; so this is a problem for event-related PPIs).
SPM2 can therefore deconvolve the physiological regressor (V1) before calculating the interaction term and reconvolving with the HRF. The deconvolution is ill-constrained, so it is regularised using smoothness priors (via ReML).
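A toy numpy sketch of the PPI GLM at the neural level (all time series and effect sizes are simulated for illustration; HRF convolution and the deconvolution step described above are deliberately omitted, and a constant column is added to the design, so the contrast gains a fourth zero):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
v1 = rng.normal(size=n)                # V1 time series (illustrative)
attn = np.repeat([0., 1.], n // 2)     # attention block regressor A
ppi = v1 * attn                        # interaction term V1 x A

# Simulate V5 responding to V1 more strongly under attention
y = 0.4 * v1 + 0.5 * ppi + rng.normal(0, 0.3, size=n)

# GLM: y = ppi.b1 + v1.b2 + attn.b3 + const + e; contrast c = [1 0 0 0]
X = np.column_stack([ppi, v1, attn, np.ones(n)])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
print(b[:3])   # estimates near the simulated effects (b1 ~ 0.5, b2 ~ 0.4, b3 ~ 0)
```

Testing b1 against zero asks whether the V1 contribution to V5 depends on attention, which is the PPI inference.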

Overview
1. General Linear Model: design matrix, global normalisation
2. fMRI time series: highpass filtering, HRF convolution, temporal autocorrelation
3. Statistical Inference: Gaussian field theory
4. Random Effects
5. Experimental Designs
6. Effective Connectivity

Effective vs. functional connectivity
Model: A = V1 fMRI time series; B = 0.5*A + e1; C = 0.3*A + e2
Correlations between the (normalised) time series: A-B = 0.49, A-C = 0.30, B-C = 0.12
There is no connection between B and C, yet B and C are correlated because of their common input from A. Analysis of the correlation structure (functional connectivity) yields paths A-B = 0.49 and A-C = 0.31, whereas proper effective connectivity analysis (SEM) estimates the B-C path at -0.02 (χ² = 0.5, n.s.), exposing the fallacy of fitting the wrong model.
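This common-input effect is easy to reproduce. The generative model follows the slide; the sample size and the partial-correlation check are illustrative additions:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5000
A = rng.normal(size=n)                  # common input (e.g., V1 time series)
B = 0.5 * A + rng.normal(size=n)        # B receives input from A only
C = 0.3 * A + rng.normal(size=n)        # C receives input from A only

# B and C share no direct connection, yet correlate via the common input A
r_bc = np.corrcoef(B, C)[0, 1]
print(r_bc > 0)                         # True

# Partialling out A removes the spurious correlation
resB = B - np.polyfit(A, B, 1)[0] * A
resC = C - np.polyfit(A, C, 1)[0] * A
r_partial = np.corrcoef(resB, resC)[0, 1]
print(abs(r_partial) < abs(r_bc))       # True
```

Functional connectivity (raw correlation) reports a B-C link; conditioning on the true common cause, as an effective-connectivity model does, makes it vanish.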

Dynamic Causal Modelling (new in SPM2)
PPIs allow only a simple (restricted) test of effective connectivity; Structural Equation Modelling is more powerful (Büchel & Friston, 1997). In SPM2, however, Dynamic Causal Modelling (DCM) is preferred. DCMs are dynamic models specified at the neural level. The neural dynamics are transformed into predicted BOLD signals using a biologically realistic haemodynamic forward model (HDM). The neural dynamics comprise a deterministic state-space model with a bilinear approximation to model interactions between variables.

Dynamic Causal Modelling (new in SPM2)
The variables comprise: connections between regions; self-connections; direct inputs (e.g., visual stimulation); and contextual inputs (e.g., attention). Connections can be bidirectional. The parameters are estimated using an EM algorithm, with priors that are empirical (for the haemodynamic model), principled (dynamics constrained to be convergent) and shrinkage (zero-mean, for the connections). Inference uses posterior probabilities, and there are methods for Bayesian model comparison.
Example network: a direct input u1 (e.g., visual stimuli) drives V1 (z1), while a contextual input u2 (e.g., attention) modulates the connections among V1, V5 (z2) and SPC (z3), generating observed signals y1, y2, y3.
State equation: dz/dt = f(z, u, θ) ≈ Az + uBz + Cu
Observation equation: y = h(z, θh) + e
where z is the state vector, u the inputs and θ the parameters (connection/haemodynamic).
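The bilinear state equation can be simulated directly. This sketch uses a toy two-region system with invented (not fitted) parameters and simple Euler integration, leaving out the haemodynamic observation model:

```python
import numpy as np

# Toy two-region bilinear model: dz/dt = A z + u2 * (B z) + C u1
A = np.array([[-1.0, 0.0],
              [ 0.8, -1.0]])     # intrinsic connections (negative self-connections)
B = np.array([[0.0, 0.0],
              [0.4, 0.0]])       # contextual input u2 modulates z1 -> z2
C = np.array([1.0, 0.0])         # direct input u1 drives region 1

def simulate(u1, u2, dt=0.01, steps=2000):
    """Euler integration of the bilinear neural state equation."""
    z = np.zeros(2)
    for _ in range(steps):
        zdot = A @ z + u2 * (B @ z) + C * u1
        z = z + dt * zdot
    return z

z_no_ctx = simulate(u1=1.0, u2=0.0)   # steady state approx. [1.0, 0.8]
z_ctx = simulate(u1=1.0, u2=1.0)      # steady state approx. [1.0, 1.2]
print(z_ctx[1] > z_no_ctx[1])         # True: modulation boosts region 2
```

The contextual input changes the effective z1 → z2 coupling from 0.8 to 1.2, so region 2's steady-state response increases even though the driving input is unchanged; this is the mechanism DCM estimates from data.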

Dynamic Causal Modelling (new in SPM2)
[Schematic: a two-region bilinear model in which the stimulus input u1 drives z1, z1 and z2 are reciprocally connected with negative self-connections, and the contextual input u2 modulates the z1 → z2 connection.]

Dynamic Causal Modelling (new in SPM2)
Example: the attention-to-motion dataset of Büchel & Friston (1997), analysed with DCM by Friston et al. (2003). Inputs: Photic (dots vs fixation), Motion (moving vs static), Attention (detect changes). The estimated network comprises V1, V5, SPC and IFG, with connection strengths between .37 and .82 and posterior probabilities of 90-100% (values shown on the network figure).
Attention modulates the backward connections IFG → SPC and SPC → V5. The intrinsic connection V1 → V5 is insignificant in the absence of motion.

Some References
Friston KJ, Holmes AP, Worsley KJ, Poline J-B, Frith CD, Frackowiak RSJ (1995) "Statistical parametric maps in functional imaging: a general linear approach" Human Brain Mapping 2:189-210
Worsley KJ, Friston KJ (1995) "Analysis of fMRI time-series revisited - again" NeuroImage 2:173-181
Friston KJ, Josephs O, Zarahn E, Holmes AP, Poline J-B (2000) "To smooth or not to smooth" NeuroImage
Zarahn E, Aguirre GK, D'Esposito M (1997) "Empirical analyses of BOLD fMRI statistics" NeuroImage 5:179-197
Holmes AP, Friston KJ (1998) "Generalisability, random effects & population inference" NeuroImage 7(4-2/3):S754
Worsley KJ, Marrett S, Neelin P, Evans AC (1992) "A three-dimensional statistical analysis for CBF activation studies in human brain" Journal of Cerebral Blood Flow and Metabolism 12:900-918
Worsley KJ, Marrett S, Neelin P, Vandal AC, Friston KJ, Evans AC (1995) "A unified statistical approach for determining significant signals in images of cerebral activation" Human Brain Mapping 4:58-73
Friston KJ, Worsley KJ, Frackowiak RSJ, Mazziotta JC, Evans AC (1994) "Assessing the significance of focal activations using their spatial extent" Human Brain Mapping 1:214-220
Cao J (1999) "The size of the connected components of excursion sets of χ², t and F fields" Advances in Applied Probability (in press)
Worsley KJ, Marrett S, Neelin P, Evans AC (1995) "Searching scale space for activation in PET images" Human Brain Mapping 4:74-90
Worsley KJ, Poline J-B, Vandal AC, Friston KJ (1995) "Tests for distributed, non-focal brain activations" NeuroImage 2:183-194
Friston KJ, Holmes AP, Poline J-B, Price CJ, Frith CD (1996) "Detecting activations in PET and fMRI: levels of inference and power" NeuroImage 4:223-235

PCA/SVD and Eigenimages
A time series of 1-D images: 128 scans of 32 "voxels". The figure shows the eigenvalues and spatial "modes", the expression of the first three "eigenimages", and the time series "reconstituted" from them.

PCA/SVD and Eigenimages
Y = U S V^T = s1.U1.V1^T + s2.U2.V2^T + s3.U3.V3^T + ...
The data matrix Y (time x voxels) is decomposed into a sum of rank-one components, each the outer product of a temporal mode Ui and a spatial mode (eigenimage) Vi weighted by the singular value si; truncating the sum gives successive approximations of Y.
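The decomposition and its truncated approximation can be sketched with numpy's SVD. The toy data (128 scans x 32 voxels, built from two hypothetical spatiotemporal modes plus noise) mirror the dimensions on the slide:

```python
import numpy as np

rng = np.random.default_rng(3)
t = np.arange(128)

# Toy data: two spatiotemporal modes plus noise (128 scans x 32 voxels)
U_true = np.column_stack([np.sin(2 * np.pi * t / 64),
                          np.cos(2 * np.pi * t / 32)])
V_true = rng.normal(size=(2, 32))
Y = U_true @ V_true + 0.1 * rng.normal(size=(128, 32))

# Y = U S V^T; keeping the first k terms gives the best rank-k approximation
U, s, Vt = np.linalg.svd(Y, full_matrices=False)
k = 2
Y_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# The first two singular values dominate, so rank 2 captures most variance
explained = (s[:k]**2).sum() / (s**2).sum()
print(explained > 0.9)   # True
```

Columns of U are the temporal modes, rows of Vᵀ the eigenimages, and the squared singular values give the variance (eigenvalue spectrum) each mode accounts for.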

Time x Condition interaction
Time x Condition interactions (i.e., adaptation) are assessed with the SPM{T}.

Structural Equation Modelling (SEM)
Minimise the difference between the observed (S) and implied (Σ) covariances by adjusting the path coefficients (B).
The implied covariance structure: x = xB + z, so x = z(I - B)^-1
where x is the matrix of time series of regions 1-3 and B is the matrix of unidirectional path coefficients.
Variance-covariance structure: x^T x = Σ = (I - B)^-T C (I - B)^-1, where C = z^T z contains the residual variances (u, v, w) and covariances.
The free parameters are estimated by minimising a (maximum likelihood) function of S and Σ.
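The implied covariance formula can be computed directly. A sketch with a hypothetical three-region chain (1 → 2 → 3; path values and residual variances are invented for illustration):

```python
import numpy as np

# Hypothetical path model: region 1 -> 2 (0.7), region 2 -> 3 (0.4)
B = np.array([[0.0, 0.7, 0.0],
              [0.0, 0.0, 0.4],
              [0.0, 0.0, 0.0]])   # B[i, j] = path from region i to region j
C = np.diag([1.0, 0.5, 0.5])      # residual (co)variances, C = z^T z

# x = x B + z  =>  x = z (I - B)^-1, so the implied covariance is
# Sigma = (I - B)^-T C (I - B)^-1
I = np.eye(3)
M = np.linalg.inv(I - B)
Sigma = M.T @ C @ M

# Regions 1 and 3 covary purely via the indirect path 1 -> 2 -> 3
print(Sigma[0, 2])   # 0.7 * 0.4 = 0.28
```

Fitting an SEM amounts to adjusting the nonzero entries of B (and C) until Sigma matches the observed covariance S as closely as possible.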

Changes in "effective connectivity": the path coefficients estimated separately under no attention and attention differ (figure values 0.43, 0.75 and 0.47, 0.76).

Second-order Interactions
Modulatory influence of parietal cortex (PP) on the V1 → V5 connection: including the interaction term V1xPP as an input to V5 improves the model (χ² = 11, p < 0.01; path coefficient 0.14).
The information that attention is required does not come from V1, so the mechanism must be top-down: either PP modulates the V1 → V5 connection, or PP changes the sensitivity of V5 to its other inputs (i.e., V1). Previous presentations showed V5 → PP modulated by PFC, on the same principle.