Estimating Power for fMRI & Classification Directions in fMRI Thomas Nichols Clinical Imaging Centre GlaxoSmithKline.

Overview
Power Exploration
–ROIs (small/big, lots/few)?
–GD Mitsis, GD Iannetti, TS Smart, I Tracey & R Wise. Regions of interest analysis in pharmacological fMRI: How do the definition criteria influence the inferred result? NeuroImage, Epub.
Power Prediction
Classification

Power Review: 1 Test
Power: the probability of rejecting H0 when HA is true
Specify your null distribution
–Mean = 0, variance = σ²
Specify the effect size (Δ), which leads to the alternative distribution
Specify the false positive rate, α
[Figure: null and alternative distributions separated by Δ/σ, with the areas α and power marked]
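The single-test setup above can be sketched numerically. This is a minimal illustration, not the author's code: it assumes a one-sided z-test with known variance, so the null is N(0, 1) and the alternative is N(Δ/σ, 1). The function name `one_sided_power` is hypothetical.

```python
from statistics import NormalDist

def one_sided_power(effect_over_sigma, alpha=0.05):
    """Power of a one-sided z-test when the alternative mean is Δ/σ (assumed known variance)."""
    std_normal = NormalDist()
    critical = std_normal.inv_cdf(1.0 - alpha)   # threshold fixed by the null distribution
    # Under the alternative the statistic is N(Δ/σ, 1); power is the mass above the threshold
    return 1.0 - std_normal.cdf(critical - effect_over_sigma)

for d in (0.0, 1.0, 2.0, 3.0):
    print(f"Δ/σ = {d:.1f}: power = {one_sided_power(d):.3f}")
```

With Δ/σ = 0 the power equals α, which is a quick sanity check on the setup.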

Power: 100,000 Tests?
Avoid the multiple testing problem if possible
–Typically a study will use a well-characterized paradigm
–Expected region of response should be known
But…
–Variation in functional and structural anatomy
–“Perfect” region never known
Should we use a focal ROI? A voxel-wise search in a neighborhood? Search over the whole brain anyway?

Qualitative Power Exploration
Simplified power setting
–Not voxel-wise; instead largish (>1000 voxel) VOIs
–Large VOIs: assuming σ within << σ between, so different-sized VOIs will have similar variance
–Large VOIs: assuming independence between VOIs
Consider the impact of many vs. fewer VOIs
–Many VOIs: better follows anatomy and the possible shape of the signal, but worse multiple testing correction
–Fewer VOIs: will dilute localized signal, but fewer tests to correct for

AAL & Derived ROI Atlases
Atlas 0 (AAL): k = 116 regions, α FWE = (surrogate for correlated voxel-wise search)
Atlas 1 (AAL symmetric): k = 58 regions, α FWE =
Atlas 2: k = 28 regions, α FWE =
Atlas 3: k = 17 regions, α FWE =
Atlas 4 (Lobar AAL): k = 6 regions, α FWE =
Atlas 5 (whole GM): k = 1 region, α FWE =

L Amygdala
Atlas 0 (AAL): k = 116 regions, α FWE = ; signal # VOIs = 1, strength = 100%
Atlas 1 (AAL symmetric): k = 58 regions, α FWE = ; signal # VOIs = 1, strength = 47%
Atlas 2: k = 28 regions, α FWE = ; signal # VOIs = 1, strength = 47%
Atlas 3: k = 17 regions, α FWE = ; signal # VOIs = 1, strength = 4.9%
Atlas 4 (Lobar AAL): k = 6 regions, α FWE = ; signal # VOIs = 1, strength = 0.6%
Atlas 5 (whole GM): k = 1 region, α FWE = ; signal # VOIs = 1, strength = 0.1%

Power: L Amygdala, True ROI
True ROI best (of course)
Rich ROI atlas (k=116) beats coarser atlases
–Dilution more punishing than greater multiple testing
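The trade-off between dilution and multiple testing can be sketched as follows. This is an illustration under stated assumptions, not the calculation behind the slide: it assumes a Bonferroni correction at α FWE = 0.05 (the slide's α FWE values are not preserved), a one-sided z-test with equal variance in every VOI, and a hypothetical undiluted effect `FULL_EFFECT`. The k and strength values are the ones listed for L amygdala.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()

def voi_power(k, strength, full_effect, alpha_fwe=0.05):
    """One-sided z-test power for the signal VOI in an atlas of k regions."""
    alpha_per_voi = alpha_fwe / k                  # Bonferroni over k VOIs (assumed)
    critical = STD_NORMAL.inv_cdf(1.0 - alpha_per_voi)
    diluted_effect = strength * full_effect        # signal averaged over a larger VOI
    return 1.0 - STD_NORMAL.cdf(critical - diluted_effect)

# (k regions, signal strength in the VOI containing L amygdala)
ATLASES = {"Atlas 0": (116, 1.00), "Atlas 1": (58, 0.47), "Atlas 2": (28, 0.47),
           "Atlas 3": (17, 0.049), "Atlas 4": (6, 0.006), "Atlas 5": (1, 0.001)}
FULL_EFFECT = 4.0   # hypothetical Δ/σ inside the true ROI

for name, (k, strength) in ATLASES.items():
    print(f"{name}: k = {k:3d}, power = {voi_power(k, strength, FULL_EFFECT):.3f}")
```

Under these assumptions the stricter threshold for k = 116 costs far less power than the loss of signal strength in the coarser atlases, which matches the qualitative point of the slide.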

Power: L Amygdala, Shifted ROI
True ROI best
Wrong (unshifted) ROI next
Rich ROI atlas still beats coarser atlases

Power: ½ of Mid-Cingulate
Whole Mid-Cing ROI best
Again, huge (k=116) atlas next best
But we’ve assumed RFX
–No precision gain for large ROIs, as shrinking σ within is no help

Power: ½ of Mid-Cingulate: FFX
Whole Mid-Cing ROI best
Now Symmetric AAL atlas (k=58) best!
–If σ between is small, the precision increase with large ROIs has impact

Power Exploration Conclusions
Compared range of scales
–Whole brain, lobar (k=6), …, AAL (k=116)
Focal structures
–Focal ROIs best
More extended signals, with heterogeneity
–Rich atlas best
Dilution of signal worse than Bonferroni
But whole-brain always less powerful than reduced volume
–Suggests voxel-wise / “Multiple Endpoint” result preferred, constrained coarsely

Why Doesn’t Bonf. Hurt More?
Test result (observed) vs. truth (unobserved):
              H0 True            H0 False
Reject H0     Type I Error (α)   Power
Accept H0     Correct            Type II Error
Example
–1100 total voxels
–100 voxels have β=Δ: a test with 50% power will on average detect 50 of these truly active voxels
–1000 voxels have β=0: α=5% implies that on average 50 null voxels will be false positives
1 Signal ROI
–1 opportunity for a positive
100 Signal Voxels
–100 opportunities for a positive
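The arithmetic in the example, and the "opportunities for a positive" point, can be checked with a short sketch. The second function assumes the tests are independent, which is an assumption added here for illustration (neighbouring voxels are correlated in practice); both function names are hypothetical.

```python
def expected_detections(n_signal, n_null, power, alpha):
    """Expected true positives and false positives for voxel-wise tests."""
    return n_signal * power, n_null * alpha

def prob_any_detection(n_signal, per_test_power):
    """P(at least one of n_signal tests rejects), assuming independent tests."""
    return 1.0 - (1.0 - per_test_power) ** n_signal

true_pos, false_pos = expected_detections(100, 1000, power=0.5, alpha=0.05)
print(f"expected true positives: {true_pos:.0f}, expected false positives: {false_pos:.0f}")

# Even if a strict correction cut per-voxel power to 5%, 100 signal voxels
# give many chances to detect something; a single ROI gives only one.
print(f"1 ROI:      {prob_any_detection(1, 0.05):.3f}")
print(f"100 voxels: {prob_any_detection(100, 0.05):.3f}")
```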

Formal Power Analysis
N: number of subjects
–Adjusted to achieve sufficient power
α: the size of the test you’d like to use
–Commonly set to 0.05 (5% false positive rate)
Δ: the size of the effect you’re interested in detecting
–Based on intuition or similar studies
σ²: the variance of Δ
–Has a complicated structure with very little intuition
–Depends on many things …

Power for Group fMRI
[Figure: time series for Subject 1, Subject 2, …, Subject N]
Within each subject: temporal autocorrelation, $\mathrm{Cov}(Y) = \sigma_W^2 V$
Between subject variability, $\sigma_B^2$
J. Mumford & TE. Nichols. NeuroImage 39:261–268

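Level 1
$Y_k = X_k \beta_k + \varepsilon_k$
–$Y_k$: $T_k$-vector timeseries for subject $k$
–$X_k$: $T_k \times p$ design matrix
–$\beta_k$: $p$-vector of parameters, $(\beta_{k0}, \beta_{k1}, \beta_{k2}, \beta_{k3})^T$ in the pictured example
–$\varepsilon_k$: $T_k$-vector error term, $\mathrm{Cov}(\varepsilon_k) = \sigma_k^2 V_k$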

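Level 2
$\hat{\beta}_{cont} = X_g \beta_g + \varepsilon_g$
–$\hat{\beta}_{cont}$: $N$-vector of first-level contrast estimates $c\hat{\beta}_k$
–$X_g$: $N \times p_g$ design matrix
–$\beta_g$: $p_g$-vector of parameters, $(\beta_{g1}, \beta_{g2})^T$ in the pictured example
–$\varepsilon_g$: $N$-vector error term
–$\mathrm{Cov}(\varepsilon_g) = V_g = \mathrm{diag}\{ c (X_k^T V_k^{-1} X_k)^{-1} \sigma_k^2 c^T \} + \sigma_B^2 I_N$
–First term: within subject variability; second term: between subject variability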
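One diagonal entry of $V_g$ can be worked through for a simple case. This sketch makes several simplifying assumptions that are not in the source: the first-level design is an intercept plus an unconvolved on/off boxcar, the noise is white ($V_k = I$), the contrast is $c = [0, 1]$, and the variance values are hypothetical.

```python
def within_subject_variance(x, sigma2_k):
    """c (X'X)^-1 c' * sigma_k^2 for X = [1, x], c = [0, 1], V_k = I (assumed)."""
    n = len(x)
    sx = sum(x)
    sxx = sum(v * v for v in x)
    det = n * sxx - sx * sx
    # (X'X)^-1 = (1/det) * [[sxx, -sx], [-sx, n]]; the slope entry is n/det
    return sigma2_k * n / det

def boxcar(n_scans, on=5, off=5):
    """15 s on / 15 s off at TR = 3 s gives 5 scans on, 5 scans off."""
    return [1.0 if (t % (on + off)) < on else 0.0 for t in range(n_scans)]

SIGMA2_K = 1.0   # hypothetical within-subject noise variance
SIGMA2_B = 0.5   # hypothetical between-subject variance

for n_scans in (50, 100, 200):
    within = within_subject_variance(boxcar(n_scans), SIGMA2_K)
    print(f"{n_scans} scans: within = {within:.3f}, V_g entry = {within + SIGMA2_B:.3f}")
```

Longer runs shrink only the within-subject term; the between-subject term $\sigma_B^2$ sets a floor that only more subjects can overcome.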

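Alternative distribution
For a specific $H_A: c_g \beta_g = \Delta$, $t$ is distributed $T_{N-p_g,\,\mathrm{ncp}}$
–$\mathrm{ncp} = \Delta / \sqrt{c_g (X_g^T V_g^{-1} X_g)^{-1} c_g^T}$
Inputs to the power calculation (the figure marks each as known or guessed)
–$N$: # subjects
–$\alpha$: FPR
–$\Delta$: effect magnitude
–$\sigma^2$: effect SD, determined by
  $c_g$, $X_g$: 2nd level model
  $\sigma_W^2 V$: within-subject SD
  $\sigma_B^2$: between-subject SD
–Within-subject SD, in turn determined by
  $c$, $X_k$: 1st level model
  $\sigma_k^2$: noise magnitude
  $V_k(\sigma_{WN}, \sigma_{AR}, \rho)$: noise autocorrelation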
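Power under a noncentral t alternative can be approximated by simulation. This is a Monte Carlo sketch rather than an exact noncentral-t calculation, and it assumes the simplest group model (a one-sample test, $X_g$ a column of ones, equal variance for every subject). All function names and numeric values are hypothetical.

```python
import math
import random

def simulate_t(ncp, df, n_sims, rng):
    """Draw n_sims values of T = (Z + ncp) / sqrt(chi2_df / df)."""
    draws = []
    for _ in range(n_sims):
        z = rng.gauss(0.0, 1.0)
        chi2 = rng.gammavariate(df / 2.0, 2.0)   # chi-square with df degrees of freedom
        draws.append((z + ncp) / math.sqrt(chi2 / df))
    return draws

def mc_power(ncp, df, alpha=0.05, n_sims=20000, seed=0):
    """One-sided power: share of alternative draws above the simulated null quantile."""
    rng = random.Random(seed)
    null = sorted(simulate_t(0.0, df, n_sims, rng))
    critical = null[int((1.0 - alpha) * n_sims)]  # empirical 1 - alpha quantile
    alt = simulate_t(ncp, df, n_sims, rng)
    return sum(t > critical for t in alt) / n_sims

def one_sample_ncp(delta, var_within, var_between, n_subj):
    """ncp when X_g is a column of ones and V_g = (var_within + var_between) I_N."""
    return delta / math.sqrt((var_within + var_between) / n_subj)

n_subj = 18
ncp = one_sample_ncp(delta=0.5, var_within=0.04, var_between=0.46, n_subj=n_subj)
print(f"ncp = {ncp:.2f}, power = {mc_power(ncp, df=n_subj - 1):.3f}")
```

An exact calculation would use the noncentral t distribution function from a statistics library; the simulation keeps the sketch dependency-free.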

Model
Block design: 15s on, 15s off
TR=3s
HRF: Gamma, sd=3
Parameters estimated from Block study
–FIAC single subject data
–Read 3 little pigs
–Same/different speaker, same/different sentence
–Looked at blocks with same sentence, same speaker

Power as a function of run length and sample size
Assumes fixed maximal scanner time
21 Ss optimal
Between 18 and 23 subjects sufficient
–17 subjects cannot obtain sufficient power

More importantly… cost!
Cost to achieve 80% power
Cost = $300 per subject + $10 per each extra minute
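The cost trade-off can be sketched as a small grid search. Several things here are assumptions for illustration only: power uses a normal approximation instead of the noncentral t, the within-subject variance is taken to fall as 1/minutes, every scanned minute is charged at $10 per subject, and the effect and variance values are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()

def approx_power(n_subj, minutes, delta, within_var_per_minute, var_between, alpha=0.05):
    """Normal approximation to one-sided, one-sample group power."""
    var_subject = within_var_per_minute / minutes + var_between
    ncp = delta / (var_subject / n_subj) ** 0.5
    return 1.0 - STD_NORMAL.cdf(STD_NORMAL.inv_cdf(1.0 - alpha) - ncp)

def study_cost(n_subj, minutes):
    """$300 per subject plus $10 per scanned minute per subject (assumed reading)."""
    return n_subj * (300 + 10 * minutes)

def cheapest_design(target=0.80, max_subj=40, max_minutes=40, **params):
    """Return (cost, n_subj, minutes) for the cheapest design reaching the target power."""
    feasible = [(study_cost(n, m), n, m)
                for n in range(2, max_subj + 1)
                for m in range(1, max_minutes + 1)
                if approx_power(n, m, **params) >= target]
    return min(feasible) if feasible else None

PARAMS = dict(delta=0.5, within_var_per_minute=2.0, var_between=0.25)  # hypothetical
cost, n_subj, minutes = cheapest_design(**PARAMS)
print(f"cheapest: {n_subj} subjects x {minutes} min, cost ${cost}")
```

The search shows why the cheapest design is neither the fewest subjects nor the shortest runs: extra minutes are cheap but only reduce the within-subject part of the variance.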

Power, Accounting for Searching over Space?
S Hayasaka, AM Peiffer, CE Hugenschmidt, PJ Laurienti. Power and sample size calculation for neuroimaging studies by non-central random field theory. NeuroImage 37 (2007) 721–730

Univariate vs. Multivariate
Mass Univariate Modelling
–Model each voxel independently (account for dependence at inference stage)
–Great for localization
–Doesn’t acknowledge spatial structure
Multivariate Modelling
–Model entire volume simultaneously
–Explicitly uses spatial structure
–Not as good for localization

Multivariate Classification: Classification of Subjects
ICA components appear to distinguish NC vs. SZ vs. BP
–fMRI experiment: auditory oddball task
But no one voxel responsible
VD Calhoun, PK Maciejewski, GD Pearlson, KA Kiehl. Temporal Lobe and “Default” Hemodynamic Brain Modes Discriminate Between Schizophrenia and Bipolar Disorder. Human Brain Mapping, Epub 2007 Sep 25

Multivariate Classification
Even a very simple method can give very good performance
–Define the average IC grp for each group
–Label subject k with the group that has the minimum Euclidean distance (between IC k and IC grp)
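The two steps above amount to a nearest-mean classifier, which can be sketched as follows. The component vectors are made-up toy values and the function names are hypothetical; real IC maps would have one entry per voxel.

```python
import math

def group_means(components_by_group):
    """Average the component vectors within each group (the IC grp of each group)."""
    means = {}
    for group, vectors in components_by_group.items():
        n = len(vectors)
        means[group] = [sum(column) / n for column in zip(*vectors)]
    return means

def classify(ic_k, means):
    """Label a subject with the group whose mean is nearest in Euclidean distance."""
    return min(means, key=lambda group: math.dist(ic_k, means[group]))

# Toy 3-element "component maps" for training subjects in each group
training = {
    "NC": [[1.0, 0.0, 0.1], [0.9, 0.1, 0.0]],
    "SZ": [[0.0, 1.0, 0.1], [0.1, 0.9, 0.2]],
    "BP": [[0.1, 0.0, 1.0], [0.0, 0.2, 0.9]],
}
means = group_means(training)
print(classify([0.8, 0.2, 0.1], means))
```

For an honest accuracy estimate, the subject being labelled should be left out when the group means are computed.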

Multivariate Classification: Prediction Time Series

Inferring Experience Based Cognition from Virtual Reality fMRI
Greg Siegle, Walter Schneider, Maureen McHugo, Melissa Thomas, Lori Koerbel, Lena Gemmer, Kate Fissell, Sudhir Pathak, Dan Jones, Kevin Jarbo
University of Pittsburgh
Pittsburgh Brain Activity Interpretation Competition

Virtual Reality fMRI
Paradigm
–Subjects explore a neighborhood, looking for fruit, guns, dogs
–11 features rated continuously, e.g. arousal, valence, movement, dog, cell phone, etc.
–3 sessions of fMRI data; features only given for the first 2 sessions
[Figure: Inferring Cognition; R² value lost in extraction; axis label “minutes”]

Surprisingly accurate results
Very different methods gave similar scores (based on pre- and post-processing)
Similar methods (e.g., support vector machines) gave very different results
[Figure: “st place” correlation (leading numeral lost in extraction) by feature: Arousal, Valence, Hits, SearchPeople, SearchWeapons, SearchFruit, Instructions, Dog, Faces, FruitsVegetables, WeaponsTools, InteriorExterior, Velocity]

Lessons from Contest
Pre-processing mattered
–Detrending details had big impact
Multivariate, but not uninformed
–Winners used masks
–Weighting salient voxels, ignoring uninformative ones
–Post-processing clean up
In general, extensive tuning per feature to be predicted
[Figure: Subject 14 visual cortex, used for “Interior Exterior”; Subject 13 auditory cortex, used for “Dog”]
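As a reminder of what detrending does, here is a minimal sketch of linear detrending of one voxel time series. The source does not say which detrending methods the contestants used, so this is only one common form, and the function name is hypothetical.

```python
def linear_detrend(series):
    """Subtract the least-squares straight line from a time series (needs >= 2 points)."""
    n = len(series)
    t_mean = (n - 1) / 2.0
    y_mean = sum(series) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(series))
    slope = sxy / sxx
    return [y - (y_mean + slope * (t - t_mean)) for t, y in enumerate(series)]

# Slow scanner drift plus an on/off response; detrending removes the drift
drifting = [100.0 + 0.3 * t + (1.0 if (t // 5) % 2 == 0 else 0.0) for t in range(40)]
cleaned = linear_detrend(drifting)
print(f"range before: {max(drifting) - min(drifting):.2f}, after: {max(cleaned) - min(cleaned):.2f}")
```

Choices such as the polynomial order or a high-pass filter cutoff change which slow signals survive, which is one plausible reason the details mattered for slowly varying features.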

Conclusions
Power for fMRI
–Focused ROIs, but not too focused
–Exact power predictions possible, but, as always, based on guesses
Classification
–Uses entire brain to predict subject identity or cognitive state
–New direction, methods still evolving
–e.g. Support Vector Machines work well, but never without appreciable feature selection/tuning