
1 Statistical analysis of fMRI data, ‘bubbles’ data, and the connectivity between the two
Keith Worsley, McGill (and Chicago); Nicholas Chamandy, McGill and Google; Jonathan Taylor, Université de Montréal and Stanford; Robert Adler, Technion; Philippe Schyns and Fraser Smith, Glasgow; Frédéric Gosselin, Université de Montréal; Arnaud Charil and Alan Evans, Montreal Neurological Institute

2 Before you start: PCA of time × space
1: exclude first frames
2: drift
3: long-range correlation or anatomical effect: remove by converting to % of brain
4: signal?

3 Bad design: 2 mins rest, 2 mins Mozart, 2 mins Eminem, 2 mins James Brown

4 [Figure: PCA of the bad-design data. Temporal components (sd, % variance explained): 0.41, 17%; 0.31, 9.5%; 0.24, 5.6%. Periods: 5.2, 16.1, 15.6, 11.6 seconds. Spatial components shown by slice; temporal components plotted over the Rest, Mozart, Eminem, J. Brown blocks.]

5 Effect of stimulus on brain response
Alternating hot and warm stimuli separated by rest (9 seconds each). The stimulus is delayed and dispersed by ~6 s, modeled by convolving the stimulus with the “hemodynamic response function” (HRF), a difference of two gamma densities.
Responses = stimuli * HRF, sampled every 3 seconds.
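A minimal sketch of this convolution step in Python (not the FMRISTAT/BRAINSTAT code; the gamma parameters below are typical values and are assumptions here):

```python
import numpy as np
from scipy.stats import gamma

dt = 0.1                                    # fine time grid for the convolution (seconds)
t = np.arange(0, 360, dt)

# Block stimuli: 9 s hot, 9 s rest, 9 s warm, 9 s rest, repeated
cycle = np.floor(t / 9).astype(int) % 4
hot = (cycle == 0).astype(float)
warm = (cycle == 2).astype(float)

# HRF as a difference of two gamma densities (illustrative parameters:
# positive peak around 5 s, undershoot around 10 s, undershoot weight 0.35)
h_t = np.arange(0, 30, dt)
hrf = gamma.pdf(h_t, a=6, scale=0.9) - 0.35 * gamma.pdf(h_t, a=12, scale=0.9)
hrf /= hrf.sum() * dt                       # normalise to unit integral

# Responses = stimuli * HRF, sampled every TR = 3 seconds
TR = 3.0
step = round(TR / dt)
resp_hot = (np.convolve(hot, hrf)[:len(t)] * dt)[::step]
resp_warm = (np.convolve(warm, hrf)[:len(t)] * dt)[::step]
X = np.column_stack([resp_hot, resp_warm])  # regressors for the 1st-level linear model
```

Drift terms (1, t, t², t³), as on the 1st-level slide below, would be appended to X before fitting.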

6 fMRI data, pain experiment, one slice. T = (hot − warm effect) / Sd ~ t_110 if there is no effect.

7 How fMRI differs from other repeated measures data
Many reps (~200 time points)
Few subjects (~15)
Df within subjects is high, so not worth pooling sd across subjects
Df between subjects is low, so use spatial smoothing to boost df
Data sets are huge (~4 GB), not easy to use statistics packages such as R

8 FMRISTAT (Matlab) / BRAINSTAT (Python) statistical analysis strategy
Analyse each voxel separately
Borrow strength from neighbours when needed
Break up the analysis into stages:
1st level: analyse each time series separately
2nd level: combine 1st level results over runs
3rd level: combine 2nd level results over subjects
Cut corners: do a reasonable analysis in a reasonable time (or else no one will use it!)

9

10 1st level: Linear model with AR(p) errors
Data: Y_t = fMRI data at time t; x_t = (responses, 1, t, t², t³, …)' to allow for drift
Model: Y_t = x_t'β + ε_t, with ε_t = a_1 ε_{t-1} + … + a_p ε_{t-p} + σ_F η_t, η_t ~ N(0,1) i.i.d.
Fit in 2 stages:
1st pass: fit by least squares, find residuals, estimate AR parameters a_1 … a_p
2nd pass: whiten data, re-fit by least squares
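A minimal sketch of the two-pass fit for p = 1, for a single voxel (a simplification of the FMRISTAT/BRAINSTAT procedure; the bias correction and spatial smoothing of â_1 described on the later slides are omitted):

```python
import numpy as np

def ar1_prewhitened_fit(Y, X):
    """Two-pass least squares for Y = X @ beta + AR(1) errors, one voxel."""
    Y = np.asarray(Y, float)
    X = np.asarray(X, float)

    # 1st pass: ordinary least squares, residuals, lag-1 autocorrelation
    beta_ols, *_ = np.linalg.lstsq(X, Y, rcond=None)
    r = Y - X @ beta_ols
    a1 = np.sum(r[1:] * r[:-1]) / np.sum(r ** 2)

    # 2nd pass: whiten data and design, then re-fit by least squares
    Yw, Xw = Y.copy(), X.copy()
    Yw[1:] = Y[1:] - a1 * Y[:-1]
    Xw[1:] = X[1:] - a1 * X[:-1]
    Yw[0] *= np.sqrt(1 - a1 ** 2)            # first observation of a stationary AR(1)
    Xw[0] *= np.sqrt(1 - a1 ** 2)
    beta, *_ = np.linalg.lstsq(Xw, Yw, rcond=None)
    sigma2_F = np.sum((Yw - Xw @ beta) ** 2) / (len(Y) - X.shape[1])
    return beta, a1, sigma2_F
```

A contrast of beta (e.g. hot − warm) divided by its estimated sd gives the T statistic compared with t_110 on the earlier slide.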

11 Higher levels: Mixed effects model
Data: E_i = effect (contrast in β) from the previous level; S_i = sd of the effect from the previous level; z_i = (1, treatment, group, gender, …)'
Model: E_i = z_i'γ + S_i ε_i^F + σ_R ε_i^R (S_i has high df, so it is assumed fixed)
ε_i^F ~ N(0,1) i.i.d. fixed effects error; ε_i^R ~ N(0,1) i.i.d. random effects error
Fit by ReML; use EM for stability, 10 iterations
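A simplified sketch of this higher-level fit for one voxel, iterating generalized least squares for γ with an EM-style update for σ_R² (FMRISTAT's ReML version differs in its details; this only conveys the idea):

```python
import numpy as np

def mixed_effects_fit(E, S, Z, n_iter=10):
    """Fit E_i = z_i' gamma + S_i eps_F + sigma_R eps_R, with S_i treated as fixed."""
    E = np.asarray(E, float)
    S = np.asarray(S, float)
    Z = np.asarray(Z, float)
    sigma2_R = np.var(E)                        # crude starting value
    for _ in range(n_iter):
        V = S ** 2 + sigma2_R                   # marginal variance of each effect E_i
        W = 1.0 / V
        gamma = np.linalg.solve(Z.T @ (W[:, None] * Z), Z.T @ (W * E))
        r = E - Z @ gamma
        # EM-style update: posterior second moment of each random effect
        b = sigma2_R / V
        u2 = (b * r) ** 2 + sigma2_R * S ** 2 / V
        sigma2_R = np.mean(u2)
    return gamma, sigma2_R
```

At the 2nd level of the pain experiment, E and S are the four run-level effects and their sds and Z is a column of ones; with only 3 df the random effects sd is very noisy, which is what the next slides address.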

12 Where we use spatial information
1st level: smooth the AR parameters to lower their variability and increase “df” (“df” is defined by a Satterthwaite approximation, a surrogate for the variance of the variance parameters)
Higher levels: smooth the random / fixed effects sd ratio to lower variability and increase “df”
Final level: use random field theory to correct for multiple comparisons

13 1st level: Autocorrelation
AR(1) model: ε_t = a_1 ε_{t-1} + σ_F η_t
Fit the linear model using least squares, with residuals ε̂_t = Y_t − Ŷ_t and â_1 = Correlation(ε̂_t, ε̂_{t-1}).
Estimating the ε_t changes their correlation structure slightly, so â_1 is slightly biased (by about −0.05); after bias correction the bias is ~0.
[Figure: raw autocorrelation, smoothed 12.4 mm, and bias corrected â_1 maps.]

14 How much smoothing?
Variability in â lowers the df, and the df depends on the contrast. Smoothing â brings the df back up:
df_â = df_residual × (2 (FWHM_â / FWHM_data)² + 1)^(3/2)
1/df_eff = 1/df_residual + 2 acor(contrast of data)² / df_â
Example: target = 100 df, residual df = 110, FWHM_data = 8.79 mm, acor(contrast of data) = 0.79 and 0.61; the smoothing needed is FWHM_â = 10.3 mm for the hot stimulus and 12.4 mm for the hot-warm stimulus.

15 2nd level: 4 runs, 3 df for random effects sd
[Figure: effect E_i, sd S_i and T stat E_i / S_i for runs 1 to 4 and the 2nd level.]
… a very noisy sd … so T > 15.96 is needed for P < 0.05 (corrected) … and no response is detected.

16 Basic idea: increase “df” by spatial smoothing (local pooling) of the sd.
We can't smooth the random effects sd directly: it has too much anatomical structure.
Instead, smooth the ratio (random effects sd) / (fixed effects sd), which removes the anatomical structure before smoothing.
Solution: spatial smoothing of the sd ratio:
sd = smooth(random effects sd / fixed effects sd) × fixed effects sd
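A minimal sketch of that divide, smooth, multiply step on 3D sd images (Gaussian smoothing stands in for the actual kernel; the 19 mm FWHM comes from a later slide, and the 3 mm voxel size is an assumption):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def smoothed_mixed_sd(sd_random, sd_fixed, fwhm_mm=19.0, voxel_mm=3.0):
    """Smooth the random/fixed sd ratio, then multiply back by the fixed effects sd."""
    ratio = sd_random / sd_fixed                               # anatomy largely cancels here
    sigma = fwhm_mm / (2 * np.sqrt(2 * np.log(2))) / voxel_mm  # FWHM -> Gaussian sigma in voxels
    return gaussian_filter(ratio, sigma) * sd_fixed            # mixed effects sd, boosted df
```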

17 [Figure: the random effects sd (3 df) divided by the fixed effects sd (440 df) gives the random/fixed sd ratio; smoothing this ratio (values around 1.3) and multiplying back by the fixed effects sd gives the mixed effects sd with ~100 df. The fixed effects sd is roughly the average S_i.]

18 How much smoothing?
df_ratio = df_random × (2 (FWHM_ratio / FWHM_data)² + 1)^(3/2)
1/df_eff = 1/df_ratio + 1/df_fixed
Example: df_random = 3, df_fixed = 4 × 110 = 440, FWHM_data = 8 mm. For a target of df_eff = 100, the required smoothing is FWHM_ratio = 19 mm.
[Figure: df_eff against FWHM_ratio, rising from df_eff = 3 (random effects analysis, no smoothing) towards df_eff = 440 (fixed effects analysis, infinite smoothing).]
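As an arithmetic check, a short sketch that inverts the two formulas above for the smoothing needed to reach a target df, using the numbers on this slide:

```python
import numpy as np

def fwhm_for_target_df(df_target, df_random, df_fixed, fwhm_data):
    """Solve df_eff(FWHM_ratio) = df_target for the sd-ratio smoothing FWHM."""
    df_ratio = 1.0 / (1.0 / df_target - 1.0 / df_fixed)   # from 1/df_eff = 1/df_ratio + 1/df_fixed
    factor = (df_ratio / df_random) ** (2.0 / 3.0)        # undo the 3/2 power
    return fwhm_data * np.sqrt((factor - 1.0) / 2.0)

print(fwhm_for_target_df(df_target=100, df_random=3, df_fixed=440, fwhm_data=8.0))  # ~19 mm
```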

19 Final result: 19 mm smoothing, 100 df
[Figure: effect E_i, sd S_i and T stat E_i / S_i for runs 1 to 4 and the 2nd level.]
… a less noisy sd … and T > 4.93 for P < 0.05 (corrected) … and now we can detect a response!

20 Final level: Multiple comparisons correction
The threshold t is chosen so that P(max_S Z(s) ≥ t) = 0.05.
[Figure: P value against the FWHM (Full Width at Half Maximum) of the smoothing filter, comparing random field theory, Bonferroni and discrete local maxima, with an illustration of a smooth field Z(s).]

21 Random field theory
Z(s) = white noise * FWHM filter. The corrected P-value combines the resels (resolution elements) of the search region, Resels_0(S) … Resels_3(S), with the EC densities EC_0 … EC_3: P(max_S Z(s) ≥ t) ≈ Σ_d Resels_d(S) × EC_d(t).
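A minimal sketch of the random field theory correction for a Gaussian field, using the standard closed-form EC densities (not the FMRISTAT code; the resel and voxel counts below are illustrative):

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import brentq

def ec_density_gauss(t, d):
    """EC density of a Gaussian random field in dimension d (resels measured in FWHMs)."""
    if d == 0:
        return norm.sf(t)
    const = (4 * np.log(2)) ** (d / 2) / (2 * np.pi) ** ((d + 1) / 2)
    hermite = {1: 1.0, 2: t, 3: t ** 2 - 1}[d]            # Hermite polynomial H_{d-1}(t)
    return const * hermite * np.exp(-t ** 2 / 2)

def rft_p(t, resels):
    """Approximate P(max_S Z(s) >= t) as sum_d Resels_d(S) * EC_d(t)."""
    return sum(r * ec_density_gauss(t, d) for d, r in enumerate(resels))

resels = [1, 30, 300, 1000]                               # Resels_0..Resels_3 (illustrative)
z_rft = brentq(lambda t: rft_p(t, resels) - 0.05, 2, 10)  # threshold for corrected P = 0.05
z_bon = norm.isf(0.05 / 50000)                            # Bonferroni over ~50000 voxels
print(z_rft, z_bon)
```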

22 Discrete local maxima
Bonferroni applied to the N events {Z(s) ≥ t and Z(s) is a discrete local maximum}, i.e. {Z(s) ≥ t and the neighbouring Z's ≤ Z(s)}. Conservative.
If Z(s) is stationary, with Cor(Z(s_1), Z(s_2)) = ρ(s_1 − s_2), then the DLM P-value is
P{max_S Z(s) ≥ t} ≤ N × P{Z(s) ≥ t and the neighbouring Z's ≤ Z(s)}
We only need to evaluate a (2D+1)-variate integral …
[Diagram: Z(s) and its neighbours Z(s_{±1}), Z(s_{±2}) along each axis.]

23 Discrete local maxima: “Markovian” trick
If ρ is “separable”: s = (x,y), ρ((x,y)) = ρ((x,0)) × ρ((0,y)), e.g. a Gaussian spatial correlation function ρ((x,y)) = exp(−½(x² + y²)/w²), then Z(s) has a “Markovian” property: conditional on the central Z(s), the Z's on different axes are independent: Z(s_{±1}) ⊥ Z(s_{±2}) | Z(s).
So condition on Z(s) = z, find P{neighbouring Z's ≤ z | Z(s) = z} = ∏_d P{Z(s_{±d}) ≤ z | Z(s) = z}, then take expectations over Z(s) = z.
This cuts the (2D+1)-variate integral down to a bivariate integral.
[Diagram: Z(s) and its neighbours along each axis.]

24

25 Example: single run, hot-warm
[Figure: some activation is detected by DLM but not by BON or RFT; some is detected by BON and DLM but not by RFT.]

26 Estimating the delay of the response
The delay or latency to the peak of the HRF is approximated by a linear combination of two optimally chosen basis functions:
HRF(t + shift) ~ basis_1(t) w_1(shift) + basis_2(t) w_2(shift)
Convolve the bases with the stimulus, then add them to the linear model.
[Figure: HRF, basis_1 and basis_2 against t (seconds), and the effect of a shift (delay).]

27 Fit the linear model and estimate w_1 and w_2. Equate w_2 / w_1 to its estimate, then solve for the shift (Henson et al., 2002).
To reduce bias when the magnitude is small, use shift / (1 + 1/T²), where T = w_1 / Sd(w_1) is the T statistic for the magnitude. This shrinks the shift to 0 where there is little evidence for a response.
[Figure: w_1, w_2 and w_2 / w_1 as functions of shift (−5 to 5 seconds).]
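A minimal sketch of the invert-and-shrink step, assuming the ratio w_2(shift)/w_1(shift) has already been tabulated on a grid from the two basis functions (the basis construction itself is not shown, and the names are illustrative):

```python
import numpy as np

def estimate_shift(w1_hat, w2_hat, sd_w1, shift_grid, ratio_grid):
    """Solve w2/w1 = ratio(shift) on a tabulated grid, then shrink the estimate.

    ratio_grid must be increasing in shift over the range of interest for np.interp."""
    shift = np.interp(w2_hat / w1_hat, ratio_grid, shift_grid)
    T = w1_hat / sd_w1                      # T statistic for the magnitude
    return shift / (1 + 1 / T ** 2)         # shrink towards 0 when the evidence is weak
```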

28 Shift of the hot stimulus
[Figure panels: T stat for magnitude, T stat for shift, shift (secs), sd of shift (secs).]

29 Shift of the hot stimulus
[Figure panels: T stat for magnitude (T > 4), T stat for shift (T ~ 2), shift (~1 sec), sd of shift (± 0.5 sec).]

30 Combining shifts of the hot stimulus (contours are T stat for magnitude > 4)
[Figure: combined shift ~1 sec, sd ± 0.25 sec, T ~ 4.]

31 Shift of the hot stimulus (secs), shown where the T stat for magnitude is > 4.93

32 Functional Imaging Analysis Contest, HBM 2005
15 subjects, 4 runs per subject (2 with events, 2 with blocks), 4 conditions per run:
Same sentence, same speaker
Same sentence, different speaker
Different sentence, same speaker
Different sentence, different speaker
3T, 191 frames, TR = 2.5 s
Greater %BOLD response for different − same sentences (1.08 ± 0.16%) and different − same speaker (0.47 ± 0.08%)
Greater latency for different − same sentences (0.148 ± 0.035 secs)

33 Contrasts in the data used for effects
[Figure: contrasts of the data over time (secs) for a 9 sec blocks, 9 sec gaps design and a 90 sec blocks, 90 sec gaps design. Sd of the effects: hot 0.16 and 0.28, warm 0.16 and 0.43, hot-warm 0.19 and 0.55.]
With long blocks the contrast ends up only using data near the block transitions, ignoring data in the middle of blocks.

34 Optimum block design
[Figure: sd of the hot effect and of the hot-warm effect (magnitude) and of their delays, as functions of block length and gap (5 to 20 secs each). The best designs are marked with an X; regions with not enough signal are indicated.]

35 Optimum event design
[Figure: sd of the effect (secs for delays) against the average time between events (5 to 20 secs), for uniform, random and concentrated event timing; solid lines for magnitudes, dotted lines for delays; regions with not enough signal are indicated.]
About 12 secs between events is best for magnitudes; about 7 secs is best for delays.

36 How many subjects?
The largest portion of the variance comes from the last stage, i.e. combining over subjects:
Var = sd_run²/(n_run n_sess n_subj) + sd_sess²/(n_sess n_subj) + sd_subj²/n_subj
If you want to optimize total scanner time, take more subjects. What you do at the early stages doesn't matter very much!
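A small sketch of that variance formula, comparing two allocations with the same total scanner time (the component sds here are made-up, illustrative numbers):

```python
def effect_var(sd_run, sd_sess, sd_subj, n_run, n_sess, n_subj):
    """Variance of the final effect estimate across runs, sessions and subjects."""
    return (sd_run ** 2 / (n_run * n_sess * n_subj)
            + sd_sess ** 2 / (n_sess * n_subj)
            + sd_subj ** 2 / n_subj)

# Same total scanner time (n_run * n_sess * n_subj = 48), illustrative sds:
print(effect_var(1.0, 0.5, 0.5, n_run=4, n_sess=2, n_subj=6))    # fewer subjects
print(effect_var(1.0, 0.5, 0.5, n_run=2, n_sess=2, n_subj=12))   # more subjects: smaller variance
```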

37 Features special to FMRISTAT / BRAINSTAT
Bias correction for the AR coefficients
Df boosting due to smoothing of the AR coefficients and of the random/fixed effects variance ratio
P-value adjustment for peaks due to small FWHM (DLM) and for clusters due to spatially varying FWHM
Delays analysed the same way as magnitudes
Sd of effects before collecting data

38 What is ‘bubbles’?

39 Nature (2005)

40 Subject is shown one of 40 faces (Happy, Sad, Fearful, Neutral) chosen at random …

41 … but the face is only revealed through random ‘bubbles’.
First trial: “Sad” expression. 75 random bubble centres are smoothed by a Gaussian ‘bubble’ to give what the subject sees. The subject is asked the expression: “Neutral”. Response: incorrect.

42 Your turn … Trial 2 Subject response: “Fearful” CORRECT

43 Your turn … Trial 3 Subject response: “Happy” INCORRECT (Fearful)

44 Your turn … Trial 4 Subject response: “Happy” CORRECT

45 Your turn … Trial 5 Subject response: “Fearful” CORRECT

46 Your turn … Trial 6 Subject response: “Sad” CORRECT

47 Your turn … Trial 7 Subject response: “Happy” CORRECT

48 Your turn … Trial 8 Subject response: “Neutral” CORRECT

49 Your turn … Trial 9 Subject response: “Happy” CORRECT

50 Your turn … Trial 3000 Subject response: “Happy” INCORRECT (Fearful)

51 Bubbles analysis
E.g. Fearful (3000/4 = 750 trials): sum the bubble masks over trials 1 + 2 + … + 750, and over the correct trials only.
Proportion of correct bubbles = (sum of correct bubbles) / (sum of all bubbles)
Threshold at the proportion of correct trials (0.68), scale to [0,1], and use this as a bubble mask.
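A minimal sketch of that calculation, assuming bubbles is an (n_trials, height, width) array of bubble masks and correct is the 0/1 response vector (the names, and the reading of “thresholded … scaled to [0,1]” as subtract, clip and rescale, are assumptions):

```python
import numpy as np

def bubble_mask(bubbles, correct):
    """Proportion of correct bubbles per pixel, thresholded and rescaled to [0, 1]."""
    correct = np.asarray(correct, bool)
    prop = bubbles[correct].sum(axis=0) / bubbles.sum(axis=0)  # (sum correct) / (sum all)
    threshold = correct.mean()               # proportion of correct trials, e.g. 0.68
    mask = np.clip(prop - threshold, 0, None)
    return mask / mask.max()                 # scaled to [0, 1]
```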

52 Results
[Figure: the bubble masks applied to the average face for Happy, Sad, Fearful and Neutral.]
But are these features real or just noise? We need statistics …

53 Statistical analysis
Correlate the bubbles with the response (correct = 1, incorrect = 0), separately for each expression.
This is equivalent to a 2-sample Z-statistic for correct vs. incorrect bubbles, e.g. Fearful (responses 0, 1, 1, 0, 1, 1, 1, …, 1 over trials 1, 2, …, 750), and very similar to the proportion of correct bubbles. The result is a Z ~ N(0,1) statistic image.
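A minimal sketch of the pixelwise correlation-with-response Z image, using the same illustrative bubbles array and correct vector (under the null, the correlation times √n is approximately N(0,1)):

```python
import numpy as np

def bubbles_z(bubbles, correct):
    """Pixelwise correlation between bubble values and the 0/1 response, as a Z image."""
    y = np.asarray(correct, float)
    y = y - y.mean()
    x = bubbles - bubbles.mean(axis=0)
    denom = np.sqrt((x ** 2).sum(axis=0)) * np.sqrt((y ** 2).sum()) + 1e-12  # guard constant pixels
    r = (x * y[:, None, None]).sum(axis=0) / denom
    return r * np.sqrt(len(y))               # approximately N(0,1) under the null
```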

54 Results, thresholded at Z = 1.64 (P = 0.05, uncorrected)
Multiple comparisons correction? We need random field theory …
[Figure: thresholded Z ~ N(0,1) statistic images on the average face for Happy, Sad, Fearful and Neutral.]

55 Results, corrected for search
Random field theory threshold: Z = 3.92 (P = 0.05)
Saddle-point approximation (Chamandy, 2007) thresholds: Z = 3.82, 3.80, 3.81, 3.80 (P = 0.05)
Bonferroni: Z = 4.87 (P = 0.05): detects nothing
[Figure: Z ~ N(0,1) statistic images on the average face for Happy, Sad, Fearful and Neutral.]

56 Scale: separate analysis of the bubbles at each scale

57 Scale space: smooth Z(s) with a range of filter widths w (a continuous wavelet transform). This adds an extra dimension to the random field: Z(s, w).
A 15 mm signal is best detected with a 15 mm smoothing filter.
[Figure: Z(s, w) with w = FWHM (6.8 to 34 mm, on a log scale) against s (−60 to 60 mm): scale space with no signal, and with one 15 mm signal.]
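A minimal sketch of building Z(s, w) from a 1D white-noise Z(s), rescaling each scale so that smoothed white noise keeps unit variance (the filter widths are the ones on the slide; the 1 mm spacing is an assumption):

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

def scale_space(z, fwhms, spacing=1.0):
    """Smooth a 1D white-noise field at several FWHMs, renormalising to unit variance."""
    rows = []
    for fwhm in fwhms:
        sigma = fwhm / (2 * np.sqrt(2 * np.log(2))) / spacing   # FWHM -> sigma in samples
        smoothed = gaussian_filter1d(z, sigma)
        # Smoothing unit white noise shrinks its variance to roughly 1 / (2 sigma sqrt(pi));
        # multiply by the square root of its inverse to get a unit-variance Z at this scale.
        rows.append(smoothed * np.sqrt(2 * sigma * np.sqrt(np.pi)))
    return np.array(rows)                                       # Z(s, w): one row per scale

z = np.random.randn(121)                     # white noise on s = -60 .. 60 mm, 1 mm spacing
Z_sw = scale_space(z, fwhms=[6.8, 10.2, 15.2, 22.7, 34.0])
```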

58 But if the signals are too close together they are detected as a single signal half way between them.
Matched Filter Theorem (= Gauss-Markov Theorem): “to best detect a signal in white noise, the filter should match the signal”.
[Figure: Z(s, w) for two 10 mm signals 20 mm apart, and for 10 mm and 23 mm signals; w = FWHM (6.8 to 34 mm, log scale) against s (−60 to 60 mm).]

59 Scale space can even separate two signals at the same location!
[Figure: Z(s, w) for 8 mm and 150 mm signals at the same location; w = FWHM (6.8 to 170 mm, log scale) against s (−60 to 60 mm).]

60 Bubbles task in the fMRI scanner
Correlate the bubbles with BOLD at every voxel: calculate Z for each pair (bubble pixel, fMRI voxel), giving a 5D “image” of Z statistics over trials 1, 2, …, 3000.

61 Thresholding?
Thresholding in advance is vital, since we cannot store all of the ~1 billion 5D Z values.
Resels = (image resels = 146.2) × (fMRI resels = 1057.2); for P = 0.05 the threshold is Z = 6.22 (approx).
Only keep 5D local maxima: Z(pixel, voxel) > Z(pixel, 6 neighbours of voxel) and Z(pixel, voxel) > Z(4 neighbours of pixel, voxel).

62 Generalised linear models?
The random response is Y = 1 (correct) or 0 (incorrect), or Y = fMRI.
The regressors are X_j = bubble mask at pixel j, j = 1, …, 240 × 380 = 91200 (!)
Logistic regression or ordinary regression: logit(E(Y)) or E(Y) = b_0 + X_1 b_1 + … + X_91200 b_91200
But there are only n = 3000 observations (trials) …
Instead, since the regressors are independent, fit them one at a time: logit(E(Y)) or E(Y) = b_0 + X_j b_j (see the sketch after this slide).
However the regressors (bubbles) are random with a simple known distribution, so turn the problem around and condition on Y: E(X_j) = c_0 + Y c_j.
This is equivalent to conditional logistic regression (Cox, 1962), which gives exact inference for b_j conditional on sufficient statistics for b_0. Cox also suggested using saddle-point approximations to improve the accuracy of inference …
Interactions? logit(E(Y)) or E(Y) = b_0 + X_1 b_1 + … + X_91200 b_91200 + X_1 X_2 b_1,2 + …
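A rough sketch of the “fit one regressor at a time” idea, using an ordinary (unconditional) logistic fit per pixel via statsmodels. Cox's conditional version and the saddle-point refinement are not shown, and looping over all 91200 pixels this way is slow; it is only meant to make the idea concrete:

```python
import numpy as np
import statsmodels.api as sm

def per_pixel_logistic_z(bubbles, correct):
    """Fit logit(E(Y)) = b0 + X_j b_j separately for each pixel j; return Wald Z for b_j."""
    n_trials, h, w = bubbles.shape
    X = bubbles.reshape(n_trials, h * w)
    y = np.asarray(correct, float)
    z = np.zeros(h * w)
    for j in range(h * w):
        fit = sm.Logit(y, sm.add_constant(X[:, j])).fit(disp=0)
        z[j] = fit.params[1] / fit.bse[1]      # Z statistic for the bubble effect at pixel j
    return z.reshape(h, w)
```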

