fMRI Analysis with emphasis on the General Linear Model

fMRI Analysis with emphasis on the General Linear Model. Jody Culham, Brain and Mind Institute, Department of Psychology, Western University. http://www.fmri4newbies.com/ Last Update: February 11, 2013. Last Course: Psychology 9223, W2013, Western University.

Statistical Foundations

What data do we start with? Each voxel is a “big box of neurons”. 30 slices x 64 voxels x 64 voxels of (3 mm)^3 = 122,880 voxels. Each voxel has a time course.

What data do we start with? Measured BOLD signal = neural activation in response to stimulus/task (we know the paradigm), passed through “Mother Nature’s convolution” by the vasculature, + error.

What data do we start with? We know the paradigm and predicted neural activity. We can model the HRF and assume it’s linear(ish). Thus we can predict an expected time course.

Choice of HRFs: Boynton; Two-Gamma (preferred). We could even derive subject-specific HRFs (20 Ss’ HRFs; Handwerker et al., 2004, NI).

What data do we start with? We know the paradigm and predicted neural activity. We can model the HRF and assume it’s linear(ish). Thus we can predict an expected time course. Now we can see how closely our predicted time course matches the real voxel time course.
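The prediction step described above can be sketched in a few lines. This is only an illustration: the two-gamma parameter values (shapes 6 and 16, undershoot ratio 1/6) and the names `two_gamma_hrf`, `boxcar`, and `predicted` are assumptions, not values taken from the lecture. The timing (136 volumes, TR = 2 s, 16 s blocks) follows the simple experiment described later in these slides.

```python
import math
import numpy as np

TR = 2.0            # seconds per volume
N_VOLUMES = 136     # 272 s / 2 s

def two_gamma_hrf(t, peak_shape=6.0, under_shape=16.0, under_ratio=1.0 / 6.0):
    """Two-gamma HRF sampled at times t (seconds); parameter values are assumed."""
    t = np.asarray(t, dtype=float)
    peak = t ** (peak_shape - 1) * np.exp(-t) / math.gamma(peak_shape)
    under = t ** (under_shape - 1) * np.exp(-t) / math.gamma(under_shape)
    hrf = peak - under_ratio * under
    return hrf / hrf.sum()          # scale so the samples sum to 1

# Boxcar of predicted neural activity: 8 volumes off, 8 volumes on, repeated
boxcar = np.zeros(N_VOLUMES)
for start in range(8, N_VOLUMES, 16):
    boxcar[start:start + 8] = 1.0

# Convolve the boxcar with the HRF and trim to the length of the run
hrf = two_gamma_hrf(np.arange(0, 32, TR))
predicted = np.convolve(boxcar, hrf)[:N_VOLUMES]
```

The convolved predictor rises a few volumes after each block starts, which is the hemodynamic lag that a plain boxcar ignores.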

A Simple Correlation Will Do. r (df = 135) = 0.528, p < .000001. [Scatterplot: amplitude of data time point in 1 voxel vs. amplitude of predictor time point] Each dot is one time point in our subject’s data (except that I got too bored to draw all 136 time points for our run). Now we just have to repeat this 122,879 more times!
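Repeating the correlation for every voxel does not need a loop. The sketch below uses simulated data (random noise plus one artificially "active" voxel), an unconvolved boxcar predictor, and hypothetical names such as `r_all`. It shows the idea only and is not the procedure used by any particular package.

```python
import numpy as np

rng = np.random.default_rng(0)
n_time, n_voxels = 136, 1000      # 1000 simulated voxels stand in for 122,880

# Boxcar predictor (8 volumes off, 8 on); HRF convolution omitted for brevity
predictor = np.tile(np.r_[np.zeros(8), np.ones(8)], 9)[:n_time]

data = rng.standard_normal((n_voxels, n_time))   # pure-noise voxels
data[0] += predictor                              # one simulated "active" voxel

def zscore(a, axis=-1):
    return (a - a.mean(axis=axis, keepdims=True)) / a.std(axis=axis, keepdims=True)

# Pearson r for every voxel at once: mean product of z-scores
r_all = zscore(data) @ zscore(predictor) / n_time
```

`r_all[0]` is the correlation for the simulated active voxel; the remaining entries show how large r can get by chance alone.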

Statistical Approaches in a Nutshell. t-tests: compare activation levels between two conditions; use a time-shift to account for hemodynamic lag. Correlations: model activation and see whether any areas show a similar pattern. Fourier analysis: do a Fourier analysis to see if there is energy at your paradigm frequency. Fourier analysis images from Huettel, Song & McCarthy, 2004, Functional Magnetic Resonance Imaging.

Effect of Thresholds. [Figure: maps thresholded at r = .40 (16% of variance, p < .000001) and at r = .80 (64% of variance)]

The General Linear Model (GLM). GLM definition from Huettel et al.: a class of statistical tests that assume that the experimental data are composed of the linear combination of different model factors, along with uncorrelated noise. Model: statistical model. Linear: things add up sensibly (1+1 = 2); note that linearity refers to the predictors in the model and not necessarily the BOLD signal. General: many simpler statistical procedures such as correlations, t-tests and ANOVAs are subsumed by the GLM.

Benefits of the GLM. GLM is an overarching tool that can do anything that the simpler tests do. It allows any combination of contrasts (e.g., intact - scrambled, scrambled - baseline), unlike simpler methods (correlations, t-tests, Fourier analyses); allows more complex designs (e.g., factorial designs); allows much greater flexibility for combining data within subjects and between subjects; allows comparisons between groups; allows counterbalancing orders within and between subjects; allows modelling of known sources of noise in the data (e.g., error trials, head motion).

Composition of a Voxel Time Course

A Simple Experiment. Lateral Occipital Complex responds when subject views objects. Conditions over time: Blank Screen, Intact Objects, Scrambled Objects. One volume (12 slices) every 2 seconds for 272 seconds (4 minutes, 32 seconds). Condition changes every 16 seconds (8 volumes).

What’s real? [Figure: four time courses, panels A to D]

What’s real? I created each of those time courses by taking the predictor function and adding a variable amount of random noise: data = signal + noise.

What’s real? Which of the data sets below is more convincing?

Formal Statistics. Formal statistics are just doing what your eyeball test of significance did: estimate how likely it is that the signal is real given how noisy the data is. Confidence: how likely is it that the results could occur purely due to chance? “p value” = probability value. If “p = .03”, that means that if there were truly no effect, results at least this strong would occur by chance only .03/1 or 3% of the time. By convention, if the probability that a result could be due to chance is less than 5% (p < .05), we say that result is statistically significant. Significance depends on: signal (differences between conditions); noise (other variability); sample size (more time points are more convincing).

Let’s create a time course for one LO voxel

We’ll begin with activation Response to Intact Objects is 4X greater than Scrambled Objects

Then we’ll assume that our modelled activation is off because of a transient component

Our modelled activation could be off for other reasons. All of the following could lead to inaccurate models: different shape of function; different width of function; different latency of function.

Reminder: Variability of HRF Intersubject variability of HRF in M1 Handwerker et al., 2004, NeuroImage

Now let’s add some variability due to head motion

…though really motion is more complex. Head motion can be quantified with 6 parameters given in any motion correction algorithm: x translation, y translation, z translation, xy rotation, xz rotation, yz rotation. For simplicity, I’ve only included parameter one in our model. Head motion can lead to other problems not predictable by these parameters.

Now let’s throw in a pinch of linear drift. Linear drift could arise from magnet noise (e.g., parts warm up) or physiological noise (e.g., subject’s head sinks).

and then we’ll add a dash of low frequency noise. Low frequency noise can arise from magnet noise or physiological noise (e.g., subject’s cycles of alertness/drowsiness). Low frequency noise would occur over a range of frequencies but for simplicity, I’ve only included one frequency (1 cycle per run) here. Linear drift is really just very low frequency noise.

and our last ingredient… some high frequency noise. High frequency noise can arise from magnet noise or physiological noise (e.g., subject’s breathing rate and heart rate).

When we add these all together, we get a realistic time course

General Linear Model

Now let’s be the experimenter. First, we take our time course and normalize it using z scores: z = (x - mean)/SD. Normalization leads to data where mean = zero and SD = 1. Alternative: You can transform the data into % BOLD signal change. This is usually a better approach because it’s not dependent on variance.
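Both normalizations are one-liners. In the sketch below the raw values are made up, and using the run mean as the baseline for % signal change is an assumption (a rest-period mean is another common choice).

```python
import numpy as np

# Made-up raw scanner values for one voxel (arbitrary units)
raw = np.array([1000.0, 1010.0, 1020.0, 1010.0, 1000.0, 990.0])

# z scores: mean 0, SD 1
z = (raw - raw.mean()) / raw.std()

# % BOLD signal change relative to a baseline (here, assumed to be the run mean)
baseline = raw.mean()
percent_change = 100.0 * (raw - baseline) / baseline
```

Note that z scores divide by the voxel's own variability, so a noisy voxel and a quiet voxel with the same true effect get different z amplitudes, whereas % signal change keeps them on a common scale.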

Wake Up!!!!! If you only pay attention to one slide in this lecture, it should be the next one!!!

We create a GLM with 2 predictors: fMRI Signal = (β1 × Predictor 1) + (β2 × Predictor 2) + Residuals, i.e., fMRI Signal = Design Matrix x Betas + Residuals. In words: “our data” = “what we CAN explain” x “how much of it we CAN explain” + “what we CANNOT explain”. Statistical significance is basically a ratio of explained to unexplained variance.
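A minimal sketch of this equation with an ordinary least-squares fit is given below. The data are simulated, the block order (blank, intact, blank, scrambled) is an assumption, the predictors are left unconvolved for brevity, and the "true" betas of 2 and 0.5 are chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n_volumes, block = 136, 8          # 272 s at TR = 2 s, 16 s blocks

# Block order is an assumption: blank, intact, blank, scrambled, repeating
block_index = (np.arange(n_volumes) // block) % 4
intact = (block_index == 1).astype(float)
scrambled = (block_index == 3).astype(float)
constant = np.ones(n_volumes)

X = np.column_stack([intact, scrambled, constant])    # design matrix
true_betas = np.array([2.0, 0.5, 0.0])
y = X @ true_betas + rng.normal(0.0, 0.1, n_volumes)  # simulated voxel data

betas, *_ = np.linalg.lstsq(X, y, rcond=None)   # "how much of it we CAN explain"
residuals = y - X @ betas                       # "what we CANNOT explain"
r_squared = 1.0 - residuals.var() / y.var()     # proportion of variance explained
```

The fitted `betas` recover the simulated weights closely because the noise is small; with noisier data the residual variance grows and the same betas become less convincing.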

Implementation of GLM in SPM. [Figure: SPM design matrix with Intact Predictor and Scrambled Predictor columns; time runs downward] Many thanks to Øystein Bech Gadmar for creating this figure in SPM. SPM represents time as going down. SPM represents predictors within the design matrix as grayscale plots (where black = low, white = high) over time. GLM includes a constant to take care of the average activation level throughout each run. SPM shows this explicitly (BV may not).

Effect of Beta Weights Adjustments to the beta weights have the effect of raising or lowering the height of the predictor while keeping the shape constant

Dynamic Example

The beta weight is NOT a correlation. Correlations measure goodness of fit regardless of scale; beta weights are a measure of scale. [Figure, four panels: small ß, large r; small ß, small r; large ß, large r; large ß, small r]

We create a GLM with 2 predictors: when β1 = 2 and β2 = 0.5, fMRI Signal = (2 × Predictor 1) + (0.5 × Predictor 2) + Residuals, i.e., fMRI Signal = Design Matrix x Betas + Residuals. In words: “our data” = “what we CAN explain” x “how much of it we CAN explain” + “what we CANNOT explain”. Statistical significance is basically a ratio of explained to unexplained variance.

The “Linear” in GLM. The GLM assumes that activation adds linearly. Much more on this next lecture. Poldrack, Mumford & Nichols, 2011, fMRI Data Analysis

Correlated Predictors. Where possible, avoid predictors that are highly correlated with one another. This is why we NEVER include a baseline predictor: the baseline predictor is almost completely correlated with the sum of existing predictors. [Figure: each of the two stimulus predictors correlates r = -.53 with the baseline predictor; their sum correlates r = -.95 with the baseline predictor]

Which model accounts for this data? [Figure: alternative sets of beta weights (e.g., β = 1 and β = 0 vs. β = 0 and β = -1 with a baseline predictor) that fit the same data equally well] Because the predictors are highly correlated, the betas are not uniquely determined and you can’t tell which beta combo is best.
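The ambiguity can be demonstrated numerically. The sketch below is an idealized case, not the slide's figure: with unconvolved boxcars and an explicit constant column, a baseline predictor is an exact linear combination of the other columns, so the design matrix loses a rank. The block order and the two beta sets are assumptions chosen for illustration.

```python
import numpy as np

n_volumes, block = 136, 8
block_index = (np.arange(n_volumes) // block) % 4   # assumed block order
intact = (block_index == 1).astype(float)
scrambled = (block_index == 3).astype(float)
baseline = 1.0 - intact - scrambled     # "on" whenever no stimulus is shown
constant = np.ones(n_volumes)

X = np.column_stack([intact, scrambled, baseline, constant])
rank = np.linalg.matrix_rank(X)         # 3, although X has 4 columns

# Two different beta sets that predict exactly the same time course
betas_a = np.array([1.0, 0.0, 0.0, 0.0])
betas_b = np.array([0.0, -1.0, -1.0, 1.0])
same_fit = np.allclose(X @ betas_a, X @ betas_b)

r_baseline_vs_sum = np.corrcoef(baseline, intact + scrambled)[0, 1]
```

With HRF-convolved predictors the dependence is usually not exact (hence correlations like -.95 rather than -1), but the fit is still nearly as ambiguous and the beta estimates become unstable.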

Orthogonalizing Regressors

Orthogonalizing Regressors. Outcome depends highly on which regressor goes into the model first. Use only with caution! Poldrack, Mumford & Nichols, 2011, fMRI Data Analysis

Maximizing Your Power. data = signal + noise. As we saw earlier, the GLM is basically comparing the amount of signal to the amount of noise. How can we improve our stats? Increase signal; decrease noise; increase sample size (keep subject in longer).

How to Reduce Noise. If you can’t get rid of an artifact, you can include it as a “predictor of no interest” to soak up variance. Example: Some people include predictors from the outcome of motion correction algorithms. This works best when the motion is uncorrelated with your paradigm (predictors of interest). Corollary: Never leave out predictors for conditions that will affect your data (e.g., error trials).

Including First Derivative. Some recommend including the first derivative of the HRF-convolved predictor; it can soak up some of the variance due to misestimations of the HRF.

Now do you understand why we did temporal filtering? [Figure: raw data and high-pass, low-pass, and band-pass filtered versions] Poldrack, Mumford & Nichols, 2011, fMRI Data Analysis

Reducing Residuals

Alternative to Filtering. Rather than filtering low frequencies from our raw data, we can include a “discrete cosine basis set” that soaks up variance due to low frequency noise. Poldrack, Mumford & Nichols, 2011, fMRI Data Analysis
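A discrete cosine basis set is easy to generate by hand. The sketch below builds DCT-II cosines slower than a chosen cutoff; the function name `dct_drift_regressors`, the 128 s cutoff, and the rule used to count the regressors are all assumptions for illustration rather than the exact recipe of any software package.

```python
import numpy as np

def dct_drift_regressors(n_volumes, tr, cutoff_s=128.0):
    """Low-frequency cosine regressors (DCT-II), excluding the constant term."""
    # Assumed rule: keep cosines whose period is at least cutoff_s seconds
    n_basis = int(np.floor(2.0 * n_volumes * tr / cutoff_s))
    n = np.arange(n_volumes)
    columns = [np.cos(np.pi * k * (2 * n + 1) / (2 * n_volumes))
               for k in range(1, n_basis + 1)]
    return np.column_stack(columns)

# For the 136-volume run at TR = 2 s this yields 4 slow cosine regressors
drift = dct_drift_regressors(n_volumes=136, tr=2.0)
```

These columns would be appended to the design matrix as predictors of no interest. They are orthogonal to each other and to the constant, so they soak up slow drift without competing with one another.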

Contrasts: Examples with Real Data

Sam’s Paradigm: Localizer for Ventral-Stream Visual Areas Fusiform Face Area

Contrasts in the GLM. We can examine whether a single predictor is significant (compared to the baseline). We can also examine whether a single predictor is significantly greater than another predictor. [Figure: axial slice at z = -20, shown with R and L labels]

Contrast Vectors

                     Houses  Faces  Objects  Bodies  Scram
Faces - Baseline        0     +1       0       0       0
Faces - Houses         -1     +1       0       0       0
Faces - Objects         0     +1      -1       0       0
Faces - Bodies          0     +1       0      -1       0
Faces - Scrambled       0     +1       0       0      -1

Balanced Contrasts

Unbalanced:
Condition        Houses  Faces  Objects  Bodies  Scram
β                   1      2       1       1       1
Contrast           -1     +1      -1      -1      -1     Σ = -3
Contrast x β       -1     +2      -1      -1      -1     Σ = -2

Balanced:
Condition        Houses  Faces  Objects  Bodies  Scram
β                   1      2       1       1       1
Contrast           -1     +4      -1      -1      -1     Σ = 0
Contrast x β       -1     +8      -1      -1      -1     Σ = 4

If you do not balance the contrast, you are comparing one condition vs. the sum of all the others. If you balance the contrast, you are comparing one condition vs. the average of all the others.

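Problems with Bulk Contrasts

Balanced: Faces vs. Other (only Faces differs)
Condition        Houses  Faces  Objects  Bodies  Scram
β                   1      2       1       1       1
Contrast           -1     +4      -1      -1      -1     Σ = 0
Contrast x β       -1     +8      -1      -1      -1     Σ = 4

Balanced: Faces vs. Other (only Scrambled differs)
Condition        Houses  Faces  Objects  Bodies  Scram
β                   2      2       2       2      0.5
Contrast           -1     +4      -1      -1      -1     Σ = 0
Contrast x β       -2     +8      -2      -2     -0.5    Σ = 1.5

Bulk contrasts can be significant if only a subset of conditions differ.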
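The arithmetic on the last two slides is just a dot product between a contrast vector and the betas. A small sketch, assuming the column order Houses, Faces, Objects, Bodies, Scrambled and using hypothetical variable names:

```python
import numpy as np

# Column order assumed: Houses, Faces, Objects, Bodies, Scrambled
unbalanced = np.array([-1, 1, -1, -1, -1])
balanced = np.array([-1, 4, -1, -1, -1])

betas_faces_high = np.array([1, 2, 1, 1, 1])       # only Faces is elevated
betas_scram_low = np.array([2, 2, 2, 2, 0.5])      # only Scrambled differs

unbalanced_value = unbalanced @ betas_faces_high    # -2
balanced_value = balanced @ betas_faces_high        # 4
bulk_value = balanced @ betas_scram_low             # 1.5
```

The last value is positive even though Faces does not differ from Houses, Objects, or Bodies, which is why a significant bulk contrast should be followed up with pairwise contrasts or a conjunction.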

Conjunctions (sometimes called Masking). Combine the contrasts Faces - Baseline AND Faces - Houses AND Faces - Objects AND Faces - Bodies AND Faces - Scrambled. To describe this in text: [(Faces > Baseline) AND (Faces > Houses) AND (Faces > Objects) AND (Faces > Bodies) AND (Faces > Scrambled)]

Conjunction Example. [Figure: individual maps for Faces – Baseline, Faces – Houses, Faces – Objects, Faces – Bodies, and Faces – Scrambled; superimposed maps; conjunction]

P Values for Conjunctions. If the contrasts are independent, e.g., [(Faces > Houses) AND (Scrambled > Baseline)]: p_combined = (p_single contrast)^(number of contrasts), e.g., p_combined = (0.05)^2 = 0.0025. If the contrasts are non-independent, e.g., [(Faces > Houses) AND (Faces > Baseline)]: p_combined is less straightforward to compute. http://mindhive.mit.edu/node/90

Real Voxel: GLM. Here’s the time course from a voxel in right FFA (defined by conjunction). [Figure: GLM data, model, and residuals] 262 volumes (time points). df_total = # volumes - 1; df_predictors = # of predictors; df_residual = df_total - df_predictors. GLM predictors account for (0.784)^2 = 61% of variance.

Real Voxel: Betas. t = β/se, e.g., t_Face = β_Face/se_Face = 1.371/0.076 = 18.145; t(5,261) = 18.145, p < .000001

Real Voxel: Contrasts. Σ[Contrast x β] = (0 x 0.964) + (1 x 1.371) + (-1 x 0.687) = 1.371 – 0.687 = 0.684
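The t statistic for a beta or a contrast can be sketched with the textbook ordinary least-squares formula, where the standard error is sqrt(σ² c' (X'X)⁻¹ c). This is a generic illustration on simulated data, not necessarily the computation any particular package performs (for example, it ignores serial correlations). The predictor names, block order, and simulated betas are assumptions.

```python
import numpy as np

def contrast_t(X, y, contrast):
    """t statistic for a contrast of betas under ordinary least squares."""
    betas, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ betas
    df_resid = X.shape[0] - np.linalg.matrix_rank(X)
    sigma2 = residuals @ residuals / df_resid          # residual variance
    se = np.sqrt(sigma2 * contrast @ np.linalg.inv(X.T @ X) @ contrast)
    return (contrast @ betas) / se, df_resid

rng = np.random.default_rng(2)
n_volumes, block = 136, 8
block_index = (np.arange(n_volumes) // block) % 4      # assumed block order
faces = (block_index == 1).astype(float)
objects = (block_index == 3).astype(float)
X = np.column_stack([faces, objects, np.ones(n_volumes)])
y = X @ np.array([1.4, 0.7, 0.0]) + rng.normal(0.0, 0.3, n_volumes)

# Contrast Faces - Objects: weights +1, -1, and 0 for the constant
t_faces_vs_objects, df_resid = contrast_t(X, y, np.array([1.0, -1.0, 0.0]))
```

Here the residual degrees of freedom are the number of volumes minus the number of independent columns (including the constant), which matches df_total - df_predictors on the slide above.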

Dealing with Faulty Assumptions

What’s this #*%&ing reviewer complaining about?! Correction for multiple comparisons. Correction for serial correlations: only necessary for data from single subjects; not necessary for group data.

Types of Errors

                                    Is the region truly active?
                                    Yes              No
Does our stat test indicate    Yes  HIT              Type I Error
that the region is active?     No   Type II Error    Correct Rejection

p value: probability of a Type I error, e.g., p < .05: “If a voxel is in reality NOT active, there is less than a 5% probability that our stats will declare it ‘active’.” Slide modified from Duke course

Dead Salmon. 130,000 voxels; no correction for multiple comparisons. Poster at Human Brain Mapping conference, 2009.

Fishy Headlines

Mega-Multiple Comparisons Problem. Typical 3T data set: 30 slices x 64 x 64 = 122,880 voxels of (3 mm)^3. If we choose p < 0.05, then 122,880 voxels x 0.05 = approx. 6144 voxels should be significant due to chance alone. We can reduce this number by only examining voxels inside the brain: ~64,000 voxels (of (3 mm)^3) x 0.05 = 3200 voxels significant by chance.

Possible Solutions to Multiple Comparisons Problem: Bonferroni Correction (including small volume correction); Cluster Correction; False Discovery Rate; Gaussian Random Field Theory; Test-Retest Reliability.

Bonferroni Correction: divide desired p value by number of comparisons. Example: desired p value: p < .05; number of voxels in brain: 64,000; required p value: p < .05 / 64,000, i.e., p < .00000078. Variant: small-volume correction. Only search within a limited space (brain, cortical surface, region of interest); this reduces the number of voxels and thus the severity of Bonferroni. Drawback: overly conservative. It assumes that each voxel is independent of others, which is not true: adjacent voxels are more likely to be sig in fMRI data than non-adjacent voxels.
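The numbers on the last two slides can be reproduced directly. A tiny sketch (variable names are hypothetical; the voxel counts are the ones quoted above):

```python
alpha = 0.05
voxels_in_volume = 30 * 64 * 64       # 122,880 voxels in the acquired volume
voxels_in_brain = 64_000              # approximate in-brain count from the slide

# Expected number of false positives with no correction
expected_false_pos_volume = voxels_in_volume * alpha    # about 6144
expected_false_pos_brain = voxels_in_brain * alpha      # about 3200

# Bonferroni: per-voxel threshold that keeps the familywise rate at alpha
bonferroni_threshold = alpha / voxels_in_brain          # about 7.8e-07
```

Restricting the search to a smaller region shrinks the divisor, which is all that small-volume correction does.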

Cluster Correction. Falsely activated voxels should be randomly dispersed, so set the minimum cluster size (k) to be large enough to make it unlikely that a cluster of that size would occur by chance. Some algorithms assume that data from adjacent voxels are uncorrelated (not true); some algorithms (e.g., Brain Voyager) estimate and factor in spatial smoothness of maps. Cluster threshold may differ for different contrasts. Drawbacks: handicaps small regions (e.g., subcortical foci) more than large regions; researcher can test many combinations of p values and k values and publish the one that looks the best.

False Discovery Rate. “Controls the proportion of rejected hypotheses that are falsely rejected” (i.e., the proportion of declared activations that are Type I errors). A standard p value (e.g., p < .01) means that a certain proportion of all voxels will be significant by chance (1%). FDR uses a q value (e.g., q < .01), meaning that a certain proportion of the “activated” (colored) voxels will be significant by chance (1%). Drawbacks: very conservative when there is little activation; less conservative when there is a lot of activation.
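One standard way to control the FDR is the Benjamini-Hochberg step-up procedure; the slide does not say which procedure a given package uses, so treat this as an illustrative sketch. The p values in the example are made up.

```python
import numpy as np

def benjamini_hochberg(p_values, q=0.05):
    """Boolean mask of tests declared significant at FDR level q."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    # Compare the i-th smallest p value with its threshold q * i / m
    passes = p[order] <= q * np.arange(1, m + 1) / m
    significant = np.zeros(m, dtype=bool)
    if passes.any():
        last = np.nonzero(passes)[0].max()     # largest rank meeting its threshold
        significant[order[:last + 1]] = True
    return significant

p_example = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
mask = benjamini_hochberg(p_example, q=0.05)
```

In this example only the two smallest p values survive, whereas five of the ten would pass an uncorrected p < .05. Because the thresholds scale with the rank, the procedure becomes more lenient as more voxels show strong effects, which is the behaviour described in the drawbacks above.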

Gaussian Random Field Theory. Fundamental to SPM. If data are very smooth, then the chance of noise points passing threshold is reduced. Can correct for the number of “resolvable elements” (“resels”) rather than number of voxels. Drawback: Requires smoothing. Slide modified from Duke course

Test-Retest Reliability. Perform statistical tests on each half of the data. The probability of a given voxel appearing in both purely by chance is the square of the p value used in each half, e.g., .001 x .001 = .000001. Alternatively, use the first half to select an ROI and the second half to test your hypothesis. Drawback: By splitting your data in half, you’re reducing your statistical power to see effects.

Sanity Checks: “Poor Man’s Bonferroni” For casual data exploration, not publication Jack up the threshold till you get rid of the schmutz (especially in air, ventricles, white matter – may be real) If you have a comparison where one condition is expected to produce much more activity than the other, turn on both tails of the comparison If two areas are symmetrically active, they’re less likely to be due to chance (only works for bilateral areas) Jody’s rule of thumb: “If ya can’t trust the negatives, can ya trust the positives?” Too subjective for serious use Example: MT localizer data Moving rings > stationary rings (orange) Stationary rings > moving rings (blue)

Have We Been So Obsessed with Limiting Type I Error that Type II Error is Out of Control?

                                    Is the region truly active?
                                    Yes              No
Does our stat test indicate    Yes  HIT              Type I Error
that the region is active?     No   Type II Error    Correct Rejection

Slide modified from Duke course

Comparison of Methods (simulated data). Uncorrected: high Type I, low Type II. Bonferroni: low Type I, high Type II. FDR: low Type I, low Type II. Poldrack, Mumford & Nichols, 2011, fMRI Data Analysis

Strategies for Exploration vs. Publication. Deductive approach: have a specific hypothesis/contrast planned; run all your subjects; run the stats as planned; publish. Inductive approach: run a few subjects to see if you’re on the right track; spend a lot of time exploring the pilot data for interesting patterns; “find the story” in the data; you may even change the experiment, run additional subjects, or run a follow-up experiment to chase the story. While you need to use rigorous corrections for publication, do not be overly conservative when exploring pilot data or you might miss interesting trends. Random effects analyses can be quite conservative so you may want to do exploratory analyses with fixed effects (and then run more subjects if needed so you can publish random effects).

What’s this #*%&ing reviewer complaining about?! Correction for multiple comparisons. Correction for serial correlations: only necessary for data from single subjects; not necessary for group data. Stay tuned to find out why: Group Data lecture.

Correction for Temporal Correlations. When analyzing a single subject, degrees of freedom = number of volumes – 1, e.g., if our run has 200 volumes (400 s long if TR = 2), then df = 199. Statistical methods assume that each of our time points is independent. In the case of fMRI, this assumption is false. Even in a “screen saver scan”, activation in a voxel at one time is correlated with its activation within ~6 sec. This artificially inflates your statistical significance.

Autocorrelation function. To calculate the magnitude of the problem, we can compute the autocorrelation function on the residuals. For a voxel or ROI, correlate its time course with itself shifted in time (original, shift by 1 volume, shift by 2 volumes, ...). Plot these correlations by the degree of shift. If there’s no autocorrelation, the function should drop from 1 to 0 abruptly (pink line). The points circled in yellow suggest there is some autocorrelation, especially at a shift of 1, called AR(1).
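The autocorrelation function described here can be sketched as follows. The series are simulated rather than real residuals, the AR(1) coefficient of 0.4 is an arbitrary assumption, and the series are made much longer than a real run so that the estimates are stable.

```python
import numpy as np

def autocorrelation(x, max_lag):
    """Sample autocorrelation of x for lags 0..max_lag."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    denom = x @ x
    return np.array([x[:x.size - k] @ x[k:] / denom for k in range(max_lag + 1)])

rng = np.random.default_rng(3)
n = 5000                      # long series so the estimates are stable
white = rng.standard_normal(n)

ar1 = np.zeros(n)             # AR(1) noise with an assumed coefficient of 0.4
for t in range(1, n):
    ar1[t] = 0.4 * ar1[t - 1] + white[t]

acf_white = autocorrelation(white, max_lag=5)   # drops to about 0 after lag 0
acf_ar1 = autocorrelation(ar1, max_lag=5)       # decays gradually from lag 1
```

For independent noise the function falls to roughly zero at a shift of 1, whereas the AR(1) series stays clearly positive at a shift of 1 and decays from there, which is the pattern the slide attributes to fMRI residuals.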

BV can correct for the autocorrelation to yield revised (usually lower) p values. [Figure: maps BEFORE and AFTER correction]

BV Preprocessing Options

Temporal Smoothing of Data. We have the option in our software to temporally smooth our data (i.e., remove high temporal frequencies or “low-pass filter”). However, I recommend that you not use this option. Now do you understand why?

To Localize or Not to Localise?

To Localize or Not to Localise? Neuroimagers can’t even agree how to SPELL localiser/localizer!

Methodological Fundamentalism The latest review I received…

Approach #1: Voxelwise Statistics Run a statistical contrast for every voxel in your search volume. Correct for multiple comparisons. Find a bunch of blobs.

Voxelwise Approach: Example. Malach et al., 1995, PNAS. Question: Are there areas of the human brain that are more responsive to objects than scrambled objects? You will recognize this as what we now call an LO localizer, but Malach was the first to identify LO. LO (red) responds more to objects, abstract sculptures and faces than to textures, unlike visual cortex (blue), which responds well to all stimuli. LO activation is shown in red, behind MT+ activation in green.

Approach #2: Region of interest (ROI) analysis. Identify a region of interest: Functional ROI, Anatomical, or Functional-Anatomical (images from O’Reilly et al., 2012, SCAN). Perform statistical contrasts for the ROI data in an INDEPENDENT data set. Because the runs that are used to generate the area are independent from those used to test the hypothesis, liberal statistical thresholds (e.g., p < .05) can be used.

Localizer Scan A separate scan conducted to identify functional regions of interest

Example of ROI Approach. Culham et al., 2003, Experimental Brain Research. Does the Lateral Occipital Complex compute object shape for grasping? Step 1: Localize LOC (Intact Objects vs. Scrambled Objects).

Example of ROI Approach. Culham et al., 2003, Experimental Brain Research. Does the Lateral Occipital Complex compute object shape for grasping? Step 2: Extract LOC data from experimental runs. [Figure: Grasping vs. Reaching; NS, p = .35; NS, p = .31]

Example of ROI Approach: Very Simple Stats. Extract average peak from each subject for each condition. Then simply do a paired t-test to see whether the peaks are significantly different between conditions.

% BOLD Signal Change, Left Hem. LOC
Subject   Grasping   Reaching
1           0.02       0.03
2           0.19       0.08
3           0.04       0.01
4           0.10       0.32
5           1.01      -0.27
6           0.16       0.09
7           0.12

[Figure: NS, p = .35; NS, p = .31] Instead of using % BOLD Signal Change, you can use beta weights. You can also do a planned contrast in Brain Voyager using a module called the ROI GLM.

Example: The Danger of ROI Approaches. Example 1: LOC may be a heterogeneous area with subdivisions; ROI analyses gloss over this. Example 2: Some experiments miss important areas (e.g., Kanwisher et al., 1997 identified one important face processing area -- the fusiform face area, FFA -- but did not report a second area that is a very important part of the face processing network -- the occipital face area, OFA -- because it was less robust and consistent than the FFA).

Pros and Cons: Voxelwise Approach. Benefits: Requires no prior hypotheses about areas involved. Includes entire brain. May identify subregions of known areas that are implicated in a function. Doesn’t require independent data set. Drawbacks: Requires conservative corrections for multiple comparisons (vulnerable to Type II errors). Neglects individual differences in brain regions (poor for some types of studies, e.g., topographic areas). Can lose spatial resolution with intersubject averaging. Requires speculation about areas involved.

Pros and Cons: ROI Approach. Benefits: Extraction of ROI data can be subjected to simple stats. Elimination of mega multiple comparisons problem greatly improves statistical power (e.g., p < .05). Hypothesis-driven. Useful when hypotheses are motivated by other techniques (e.g., electrophysiology) in specific brain regions. ROI is not smeared due to intersubject averaging. Important for discriminating abutting areas (e.g., V1/V2). Easy to analyze and interpret. Can be useful for dissecting factorial design data in an unbiased manner. Drawbacks: Neglects other areas that may play a fundamental role. If multiple ROIs need to be considered, you can spend a lot of scan time collecting localizer data (thus limiting the time available for experimental runs). Works best for reliable and robust areas with unambiguous definitions. Sometimes you can’t find an ROI in some subjects. Selection of ROIs can be highly subjective and error-prone.

A Proposed Resolution. There is no reason not to do BOTH ROI analyses and voxelwise analyses: ROI analyses for well-defined key regions; voxelwise analyses to see if other regions are also involved. Ideally, the conclusions will not differ. If the conclusions do differ, there may be sensible reasons. Effect in ROI but not voxelwise: perhaps region is highly variable in stereotaxic location between subjects; perhaps voxelwise approach is not powerful enough. Effect in voxelwise but not ROI: perhaps ROI is not homogeneous or is context-specific.

The War of Non-Independence

Finding the Obvious. A priori probability of getting JQKA sequence = (1/13)^4 = 1/28,561. A posteriori probability of getting JQKA sequence = 1/1 = 100%. Non-independence error occurs when statistical tests performed are not independent from the means used to select the brain region. Arguments from Vul & Kanwisher, book chapter in press

Non-independence Error. Egregious example: Identify Area X with contrast of A > B. Do post hoc stats showing that A is statistically higher than B. Act surprised!!! More subtle example of selection bias: Identify Area X with contrast of A > B. Do post hoc stats showing that A is statistically higher than C and C is statistically greater than B. Arguments from Vul & Kanwisher, book chapter in press. Figure from Kriegeskorte et al., 2009, Nature Neuroscience

Double Dipping & How to Avoid It. Kriegeskorte et al., 2009, Nature Neuroscience surveyed 134 papers in prestigious journals; 42% showed at least one example of non-independence error.

Correlations Between Individual Subjects’ Brain Activity and Behavioral Measures. Sample of Critiqued Papers: Eisenberger, Lieberman & Williams, 2003, Science: measured fMRI activity during social rejection (social exclusion > inclusion); correlated self-reported distress with brain activity; found r = .88 in anterior cingulate cortex, an area implicated in physical pain perception; concluded “rejection hurts”.

“Voodoo Correlations” (Vul et al., 2009). The original title of the paper was not well-received by reviewers so it was changed, even though some people still use the term “voodoo”. Reliability of personality and emotion measures: r ~ .7 to .8. Reliability of activation in a given voxel: r ~ .7. Highest expected behavior-fMRI correlation is ~.74 (the square root of .8 x .7). So how can we have behavior-fMRI correlations of r ~ .9?!
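The ceiling quoted here follows from the classical attenuation argument: an observed correlation between two noisy measures cannot be expected to exceed the square root of the product of their reliabilities. As a worked example, assuming reliabilities of .8 for the behavioral measure and .7 for the voxel's activation (the optimistic values used by Vul et al., 2009; the symbols below are my own notation):

```latex
r_{\text{max}} = \sqrt{\rho_{\text{behavior}} \cdot \rho_{\text{fMRI}}}
             = \sqrt{0.8 \times 0.7}
             = \sqrt{0.56}
             \approx 0.748
```

If both reliabilities were only .7, the same formula would give a ceiling of .70, so reported correlations near .9 sit well above either bound.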

“Voodoo Correlations” "Notably, 53% of the surveyed studies selected voxels based on a correlation with the behavioral individual-differences measure and then used those same data to compute a correlation within that subset of voxels." Vul et al., 2009, Perspectives on Psychological Science

Avoiding “Voodoo”. Use independent means to select region and then evaluate correlation. Do split-half reliability test. WARNING: This is reassuring that the result can be replicated in your sample but does not demonstrate that the result generalizes to the population.

Is the “voodoo” problem all that bad? High correlations can occur in legitimately analyzed data. Did voxelwise analyses use appropriate correction for multiple comparisons? Then the result is statistically significant regardless of specific correlation. Is additional data being used for inference purposes? If they pretend to provide independent support, that’s bad. For presentation purposes? Alternative formats can be useful in demonstrating that data is clean (e.g., time courses look sensible; correlations are not driven by outliers).