Surface-based Exploratory Group Analysis in FreeSurfer

Surface-based Exploratory Group Analysis in FreeSurfer

Outline Processing Stages Command-line Stream Assemble Data
Design/Contrast (GLM Theory) Analyze Visualize Interactive/Automated GUI (QDEC) Correction for multiple comparisons

Aging Exploratory Analysis
In which areas does thickness change with age? Cortical Thickness vs Aging Salat, et al, 2004, Cerebral Cortex

N=40 (all in fsaverage space)
Aging Thickness Study N=40 (all in fsaverage space) Positive Age Correlation Negative Age Correlation p<.01

Surface-based Measures
Morphometric (eg, thickness) Functional PET MEG/EEG Diffusion (?) sampled just under the surface For group morphometric analysis, the observed data is comprised of a set of surface measures (such as cortical thickness) at each vertex in a surface model, for each subject in the group. This data can be organized as a set of vectors, each associated with a different vertex in the surface model, and containing a surface measurement for every subject in the group at the corresponding vertex.

Processing Stages Specify Subjects and Surface measures Assemble Data:
Resample into Common Space Smooth Concatenate into one file Model and Contrasts (GLM) Fit Model (Estimate) Correct for multiple comparisons Visualize

The General Linear Model (GLM)
Is Thickness correlated with Age? Thickness x1 x2 y2 y1 Dependent Variable, Measurement Subject 1 Subject 2 HRF Amplitude IQ, Height, Weight Linear modeling describes the observed data as a linear combination of explanatory factors plus noise, and determines how well that description explains the data being analyzed. Age Of course, you’d need more then two subjects … Independent Variable

System of Linear Equations
Linear Model Intercept: b Slope: m Thickness Age x1 x2 y2 y1 System of Linear Equations y1 = 1*b + x1*m y2 = 1*b + x2*m y1 y2 1 x1 1 x2 b m = * Matrix Formulation Intercept = Offset X = Design Matrix  = Regression Coefficients = Parameter estimates = “betas” = Intercepts and Slopes = beta.mgh (mri_glmfit) First, a linear model must be designed to include all explanatory variables (EVs) that may account for each vector's values. A simple linear model is given by y=a*x+b+e, where the observed data y is a one-dimensional vector of surface measures -- one measurement per subject at a vertex; x is a one-dimensional vector containing a variable, such as age, describing each subject; a is the parameter estimate (PE) for x, for instance the value that a subject's age must be multiplied by to fit the data in y; b is a constant, and in this example, would correspond to the baseline measurement present in the data; and e is the error in the model fitting. If an additional explanatory variable is added to explain the observed data, the model would be given as y=a1*x1+a2*x2+b+e, containing two different model waveforms, a1*x1 and a2*x2, corresponding to two variables, such as age and gender, describing all subjects in the study. 2.1 Estimation overview Once the model is specified, an estimation step follows, in which the model is fit to each vertex's vector separately; no interactions between vertices are taken into account in the examples presented here. This step generates the estimate of the "goodness of fit" of each of the explanatory variables to each vector of surface measurements. Thus if a particular vertex responds strongly to the explanatory variable x1, a large value for a1 will be produced by model-fitting; if the data appears unrelated to x2 then a2 will have a very small value. This kind of linear modeling is commonly expressed in matrix notation, where the the matrix X contains all the explanatory variables (designed effects and confounds) in the model, and the matrix A contains all the PEs. The matrix X is also commonly called the design matrix and it can be user-specified in FreeSurfer in the form of an FSGD (FreeSurfer Group Descriptor) file, as the exercises below illustrate. Each column of X corresponds to a different explanatory variable (also called a regressor or a covariate). As typically formulated and solved, the estimation step produces a set of estimates of the PEs, which in turn are used in hypothesis testing. b m  Y = X* mri_glmfit output: beta.mgh

Hypotheses and Contrasts
Is Thickness correlated with Age? Does m = 0? Null Hypothesis: H0: m=0 y1 y2 1 x1 1 x2 b m = * m= [0 1]* b m Intercept: b Slope: m Thickness Age x1 x2 y2 y1 b m   = C* ? 2.2 Inference overview Estimates of the PEs can be converted into statistical parametric maps, which are commonly visualized as a color-coded surface overlay. The overlay assigns each vertex a value based on the likelihood that the null hypothesis is false at that vertex. A linear combination of estimates of PEs is used to encode the particular hypothesis of interest. This encoding is accomplished with a user-specified ''contrast vector'', which assigns a contrast weight to each column of the design matrix. A simple example of a contrast vector that tests the null hypothesis for the explanatory variable associated with the first design matrix column would be[ ]. To compute this particular contrast at each vertex, the PE value associated with the first design matrix column at that vertex is divided by the error in its estimate, yielding a t-value. The t-value provides a good measure of confidence in the estimate of the PE value, and can be converted into a probability (P) or Z statistic at that vertex via a standard statistical transformation. T, P and Z values all convey the same information about how significantly the observed data is related to a given explanatory variable. C=[0 1]: Contrast Matrix mri_glmfit output: gamma.mgh

More than Two Data Points
Thickness Intercept: b y1 y2 y3 y4 1 x1 1 x2 1 x3 1 x4 b m = * Slope: m Age Y = X* y1 = 1*b + x1*m y2 = 1*b + x2*m y3 = 1*b + x3*m y4 = 1*b + x4*m No matter how many data points you have, there is still only one slope and one intercept. rvar.mgh = residual error variance Model Error Noise Uncertainty rvar.mgh

t-Test and p-values  = C* Y = X* p-value/significance
value between 0 and 1 closer to 0 means more significant FreeSurfer stores p-values as –log10(p): 0.1=10-1sig=1, 0.01=10-2sig=2 sig.mgh files Signed by sign of  p-value is for an unsigned test p-value sig = 1.0 0.05 = 1.3 0.01 = 2.0 0.001= 3.0 A t-value map can be produced for each explanatory variable of interest. Each map indicates how strongly vertices on the surface are related to one explanatory variable. Parameter estimates can also be compared to see if one explanatory variable is more strongly related to the data than another. To encode this kind of hypothesis, one PE is subtracted from another using a "contrast" vector such as [ ], a combined standard error is computed, and a new t-map is generated. In a similar fashion, to test for a more complicated collection of effects, a matrix of contrast weights can be specified. A more rigorous description of single and multiple linear regression and GLM, types of analyses, estimation and hypothesis testing is available at

Two Groups Do groups differ in Intercept? Do groups differ in Slope?
Intercept: b1 Slope: m1 Thickness Age Intercept: b2 Slope: m2 Is average slope different than 0? …

Two Groups Y = X* * = y11 = 1*b1 + 0*b2 + x11*m1 + 0*m2
Intercept: b1 Slope: m1 Thickness Age Intercept: b2 Slope: m2 y11 y12 y21 y22 x11 0 x12 0 x21 x22 b1 b2 m1 m2 = * Y = X* y11 = 1*b1 + 0*b2 + x11*m *m2 y12 = 1*b1 + 0*b2 + x12*m *m2 y21 = 0*b1 + 1*b *m1 + x21*m2 y22 = 0*b1 + 1*b *m1 + x22*m2

Two Groups Y = X*  * = y11 1 0 x11 0 y12 1 0 x12 0 y21 0 1 0 x21
b1 b2 m1 m2 y11 y12 y21 y22 x11 0 x12 0 x21 x22 * Do groups differ in Intercept? Does b1=b2? Does b1-b2 = 0? C = [ ],  = C* = Do groups differ in Slope? Does m1=m2? Does m1-m2=0? C = [ ],  = C* b1 b2 m1 m2 Y = X*  Intercept: b1 Slope: m1 Thickness Age Intercept: b2 Slope: m2 Is average slope different than 0? Does (m1+m2)/2 = 0? C = [ ],  = C*

Surface-based Group Analysis in FreeSurfer
Create your own design matrix and contrast matrices Create an FSGD File FreeSurfer creates design matrix You still have to specify contrasts QDEC Limited to 2 discrete variables, 2 levels max Limited to 2 continuous variables

Command-line Processing Stages
Assemble Data (mris_preproc) Resample into Common Space Smooth Concatenate into one file Model and Contrasts (GLM) (FSGD) Fit Model (Estimate) (mri_glmfit) Correct for multiple comparisons Visualize (tksurfer) } recon-all -qcache -qcache is a way to run preprocessing (resampling and smoothing) on each individual subject. The results are then stored in the subject's folder. These two steps take the longest when doing a group analysis. When this is done for a bunch of subjects, it allows for fast group analysis and allows the user to change the subject selection and re-run without having to wait along time. It make it possible to do these things more interactively, especially when running QDEC.

Specifying Subjects $SUBJECTS_DIR fred jenny margaret … bert
Subject ID $SUBJECTS_DIR fred jenny margaret … bert Set SUBJECTS_DIR, issue a command (mksubjdirs) to create directory structure. Convert (using mri_convert) data to COR format and store in mri/RRR, run recon-all. Typically, we acquire 3 high-quality T1 volumes to use as input to reconstruction (128-slice sagital, 1 mm in-plane resolution).

FreeSurfer Directory Tree
bert bem stats morph mri rgb scripts surf tiff label orig T1 brain wm aseg Subject ID lh.aparc_annnot rh.aparc_annnot Set SUBJECTS_DIR, issue a command (mksubjdirs) to create directory structure. Convert (using mri_convert) data to COR format and store in mri/RRR, run recon-all. Typically, we acquire 3 high-quality T1 volumes to use as input to reconstruction (128-slice sagital, 1 mm in-plane resolution). lh.white rh.white lh.thickness rh.thickness lh.sphere.reg rh.sphere.reg SUBJECTS_DIR environment variable

Example: Thickness Study
$SUBJECTS_DIR/bert/surf/lh.thickness $SUBJECTS_DIR/fred/surf/lh.thickness $SUBJECTS_DIR/jenny/surf/lh.thickness $SUBJECTS_DIR/margaret/surf/lh.thickness …

FreeSurfer Group Descriptor (FSGD) File
Simple text file List of all subjects in the study Accompanying demographics Like a spreadsheet Automatic design matrix creation You must still specify the contrast matrices Integrated with tksurfer Note: Can specify design matrix explicitly with --design

FSGD Format GroupDescriptorFile 1 Class Male Class Female
Variables Age Weight IQ Input bert Male Input fred Male Input jenny Female Input margaret Female One Discrete Factor (Gender) with Two Levels (M&F) Three Continuous Variables: Age, Weight, IQ Class = Group Note: Can specify design matrix explicitly with --design

} FSGDF  X (Automatic) } X = C = [-1 1 0 0 0 0 0 0]
X = Male Group Female Group Female Age Male Age Weight Age IQ C = [ ] } Tests for the difference in intercept/offset between groups C = [ ] } Tests for the difference in age slope between groups DODS – Different Offset, Different Slope

Another FSGD Example Two Discrete Factors One Continuous Variable: Age
Gender: Two Levels (M&F) Handedness: Two Levels (L&R) One Continuous Variable: Age GroupDescriptorFile 1 Class MaleRight Class MaleLeft Class FemaleRight Class FemaleLeft Variables Age Input bert MaleLeft Input fred MaleRight Input jenny FemaleRight Input margaret FemaleLeft Class = Group

Interaction Contrast  3-1)- 4-2) 1+2+ 3-4
Two Discrete Factors (no continuous, for now) Gender: Two Levels (M&F) Handedness: Two Levels (L&R) Four Regressors (Offsets) MR (1), ML (2), FR (3), FL (4) M F R L 1 4 3 2   GroupDescriptorFile 1 Class MaleRight Class MaleLeft Class FemaleRight Class FemaleLeft Input bert MaleLeft Input fred MaleRight Input jenny FemaleRight Input margaret FemaleLeft  3-1)- 4-2) 1+2+ 3-4 C = [ ]  This is a different example with two discrete factors each with two levels.

} Number of Regressors } C = [-1 1 0 0 0 0 0 0]
Each Group/Class: Has its own Intercept Has its own Slope for each continuous variable DODS = Different offset, different slope NRegressors = NClasses*(NVariables+1) NRegressors C = [ ] } Tests for the difference in intercept/offset between groups C = [ ] } Tests for the difference in age slope between groups

Factors, Levels, Groups, Classes
Factors can be Discrete or Continuous: Continuous Variables: Age, IQ, Volume, etc Discrete Factors: Gender, Handedness, Diagnosis Discrete Factors have Levels: Gender: Male and Female Handedness: Left and Right Diagnosis: Normal, MCI, AD Group or Class: Specification of All Discrete Factors: Left-handed Male MCI Right-handed Female Normal

Assemble Data: mris_preproc
mris_preproc --help --fsgd FSGDFile : Specify subjects thru FSGD File --hemi lh : Process left hemisphere --meas thickness : $SUBJECTS_DIR/subjectid/surf/hemi.thickness --target fsaverage : common space is subject fsaverage --o lh.thickness.mgh : output “volume-encoded surface file” Lots of other options! lh.thickness.mgh – file with thickness maps for all subjects  Input to Smoother or GLM

Surface Smoothing mri_surf2surf --help Loads lh.thickness.mgh
2D surface-based smoothing Specify FWHM (eg, fwhm = 10 mm) Saves lh.thickness.sm10.mgh Can be slow (~10-60min) recon-all -qcache

mri_glmfit Reads in FSGD File and constructs X
Reads in your contrasts (C1, C2, etc) Loads data (lh.thickness.sm10.mgh) Fits GLM (ie, computes ) Computes contrasts (=C*) t or F ratios, significances Significance -log10(p) (.01  2, .001  3)

mri_glmfit mri_glmfit --y lh.thickness.sm10.mgh --fsgd gender_age.txt
--C age.mtx –C gender.mtx --surf fsaverage lh --cortex --glmdir lh.gender_age.glmdir mri_glmfit --help

--C age.mtx –C gender.mtx --surf fsaverage lh --cortex --glmdir lh.gender_age.glmdir Input file (output from smoothing). Stack of subjects, one frame per subject

--C age.mtx –C gender.mtx --surf fsaverage lh --cortex --glmdir lh.gender_age.glmdir FreeSurfer Group Descriptor File (FSGD) Group membership Covariates

--C age.mtx –C gender.mtx --surf fsaverage lh --cortex --glmdir lh.gender_age.glmdir Contrast Matrices Simple text/ASCII files Test hypotheses

--C age.mtx –C gender.mtx --surf fsaverage lh --cortex --glmdir lh.gender_age.glmdir Perform analysis on left hemisphere of fsaverage subject Masks by fsaverage cortex.label Computes FWHM in 2D

mri_glmfit Output directory: lh.gender_age.glmdir/ mri_glmfit
beta.mgh – parameter estimates rvar.mgh – residual error variance etc … age/ sig.mgh – -log10(p), uncorrected gamma.mgh, F.mgh gender/ sig.mgh – -log10(p) mri_glmfit --y lh.thickness.sm10.mgh --fsgd gender_age.txt --C age.mtx –C gender.mtx --surf fsaverage lh --cortex --glmdir lh.gender_age.glmdir

Visualization with tksurfer
Saturation: -log10(p), Eg, 5=.00001 Threshold: -log10(p), Eg, 2=.01 uncorrected False Dicovery Rate Eg, .01 View->Configure->Overlay File->LoadOverlay

Visualization with tksurfer
File-> Load Group Descriptor File …

Problem of Multiple Comparisons
p value is probability that a voxel is falsely activated Threshold too liberal: many false positives Threshold too restrictive: lose activation (false negatives)

Clusters True signal tends to be clustered
p<.10 p<.01 p<10-7 True signal tends to be clustered False Positives tend to be randomly distributed in space Cluster – set of spatially contiguous voxels that are above a given threshold.

Cluster-forming Threshold
p<.00001 sig<5 p<.0001 sig<4 Unthresholded p<.001 sig<3 As threshold lowers, clusters may expand or merge and new clusters can form. No way to say what the threshold should be.

Cluster Table, Uncorrected
p<.0001 sig<4 38 clusters ClusterNo Area(mm2) X Y Z Structure Cluster superiorfrontal Cluster insula Cluster superiorparietal Cluster precentral Cluster supramarginal … How likely is it to get a cluster of a certain size under the null hypothesis?

Correction for Multiple Comparisons
Cluster-based Monte Carlo simulation Permutation Tests False Discovery Rate (FDR) – built into tksurfer and QDEC. (Genovese, et al, NI 2002)

Cluster-based Corr. for Multiple Comparisons
Simulate data under Null Hypothesis: Synthesize Gaussian noise and then smooth (Monte Carlo) Permute rows of design matrix (Permutation, orthog) Analyze, threshold, cluster, max cluster size Repeat 10,000 times Analyze real data, get cluster sizes P(cluster) = #MaxClusterSize > ClusterSize/10000 mri_glmfit-sim

Cluster Table, Corrected
p<.0001 sig<4 22 clusters out of 38 have cluster p-value < .05 ClusterNo Area(mm2) X Y Z Structure Cluster P Cluster superiorfrontal Cluster insula Cluster superiorparietal Cluster precentral Cluster supramarginal … Note the difference between the Cluster Forming Threshold (p<.0001) and the Cluster p-value.

Surface-based Corr. for Multiple Comparisons
2D Cluster-based Correction at p < .05 mri_glmfit-sim --glmdir lh.gender_age.glmdir --cache pos 2 --cwpvalthresh .05 --2spaces

2D Cluster-based Correction at p < .05 Original mri_glmfit command: mri_glmfit --y lh.thickness.sm10.mgh --fsgd gender_age.txt --C age.mtx –C gender.mtx --surf fsaverage lh --cortex --glmdir lh.gender_age.glmdir mri_glmfit-sim --glmdir lh.gender_age.glmdir --cache pos 2 --cwpvalthresh .05 --2spaces lh.gender_age.glmdir/ beta.mgh – parameter estimates rvar.mgh – residual error variance etc … age/ sig.mgh – -log10(p), uncorrected gamma.mgh, F.mgh gender/ sig.mgh – -log10(p)

2D Cluster-based Correction at p < .05 mri_glmfit-sim --glmdir lh.gender_age.glmdir --cache pos 2 --cwpvalthresh .05 --2spaces Use pre-cached simulation results positive contrast voxelwise threshold = 2 (p<.01) Can use another simulation or permutation

2D Cluster-based Correction at p < .05 mri_glmfit-sim --glmdir lh.gender_age.glmdir --cache pos 2 --cwpvalthresh .05 --2spaces Cluster-wise threshold p<.05

49 Surface-based Corr. for Multiple Comparisons 2D Cluster-based Correction at p < .05 mri_glmfit-sim --glmdir lh.gender_age.glmdir --cache pos 2 --cwpvalthresh .05 --2spaces Bonferroni correct over two hemispheres

Correction for Multiple Comparisons Output
mri_glmfit-sim --glmdir lh.gender_age.glmdir --cache pos 2 --cwpvalthresh .05 --2spaces lh.gender_age.glmdir age gender sig.mgh – pre-existing uncorrected p-values cache.th20.pos.sig.cluster.mgh – map of significance of clusters cache.th20.pos.sig.ocn.annot – annotation of significant clusters cache.th20.pos.sig.cluster.summary – text file of cluster table (clusters, sizes, MNI305 XYZ, and their significances) Only shows clusters p<.05

Tutorial Command-line Stream QDEC – same data set
Create an FSGD File for a thickness study Age and Gender Run mris_preproc mri_surf2surf mri_glmfit mri_glmfit-sim tksurfer QDEC – same data set

QDEC GUI Load QDEC Table File List of Subjects
List of Factors (Discrete and Cont) Choose Factors Choose Input (cached): Hemisphere Measure (eg, thickness) Smoothing Level “Analyze” Builds Design Matrix Builds Contrast Matrices Constructs Human-Readable Questions Analyzes Displays Results

Surface-based Exploratory Group Analysis in FreeSurfer

Similar presentations

Presentation on theme: "Surface-based Exploratory Group Analysis in FreeSurfer"— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

Surface-based Exploratory Group Analysis in FreeSurfer

Similar presentations

Presentation on theme: "Surface-based Exploratory Group Analysis in FreeSurfer"— Presentation transcript:

Similar presentations

About project

Feedback