1 Multiple comparison correction
Klaas Enno Stephan
Laboratory for Social and Neural Systems Research, Institute for Empirical Research in Economics, University of Zurich
Functional Imaging Laboratory (FIL), Wellcome Trust Centre for Neuroimaging, University College London
With many thanks for slides & images to: FIL Methods group
Methods & models for fMRI data analysis in neuroeconomics, 21 October 2009
2 Overview of SPM
[Flowchart: image time-series → realignment → smoothing (kernel) → general linear model (design matrix) → parameter estimates → statistical inference via Gaussian field theory → statistical parametric map (SPM) thresholded at p < 0.05; normalisation to a template runs in parallel.]
3 Voxel-wise time series analysis
[Diagram: the BOLD signal time series of a single voxel is submitted to model specification, parameter estimation and hypothesis testing, yielding a statistic and, across voxels, the SPM.]
4 Inference at a single voxel
Null hypothesis H0: activation is zero.
t = contrast of estimated parameters / variance estimate
p-value: p = p(t > u | H0), the probability of observing a value of t at least as extreme as the threshold u.
If p is small, we reject the null hypothesis. We can choose u to ensure a voxel-wise significance level of α.
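The p-value and the threshold u for a given voxel-wise level α can be computed from the t-distribution; a minimal sketch with scipy, using hypothetical values for the t-statistic and the degrees of freedom:

```python
from scipy import stats

# Hypothetical single-voxel test: t-statistic and degrees of freedom
t_value = 3.2
df = 20

# One-sided p-value: p(t > u | H0) = survival function of the t-distribution
p = stats.t.sf(t_value, df)

# Conversely, the threshold u that guarantees a voxel-wise level alpha
alpha = 0.05
u = stats.t.isf(alpha, df)  # inverse survival function

print(p, u)
```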
5 Student's t-distribution
The t-distribution approximates the normal distribution for small samples: it accounts for the population standard deviation σ having to be estimated by the sample standard deviation Sn. For high degrees of freedom (large samples), t approximates Z.
[Plot: t-distribution densities for ν = 1, 2, 5, 10 and ∞, converging to the standard normal.]
6 Types of error

                       Actual condition
Test result            H0 true                 H0 false
Reject H0              False positive (FP),    True positive (TP)
                       Type I error α
Failure to reject H0   True negative (TN)      False negative (FN),
                                               Type II error β

specificity = 1 - α = TN / (TN + FP): proportion of actual negatives correctly identified
sensitivity (power) = 1 - β = TP / (TP + FN): proportion of actual positives correctly identified
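Both proportions follow directly from the table's counts; a small sketch with hypothetical confusion-matrix counts:

```python
# Hypothetical counts for a voxel-wise test (not from the talk's data)
tp, fp, tn, fn = 90, 5, 880, 25

sensitivity = tp / (tp + fn)  # power: proportion of actual positives detected
specificity = tn / (tn + fp)  # proportion of actual negatives correctly retained

print(sensitivity, specificity)
```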
7 Assessing SPMs
High threshold: good specificity, but poor power (risk of false negatives).
Medium threshold: a compromise between the two.
Low threshold: good power, but poor specificity (risk of false positives).
9 Use of 'uncorrected' p-value, α = 0.1
[Simulated null images; the percentage of null pixels that are false positives per image: 11.3%, 12.5%, 10.8%, 11.5%, 10.0%, 10.7%, 11.2%, 10.2%, 9.5%.]
Using an 'uncorrected' p-value of 0.1 will lead us to conclude, on average, that 10% of voxels are active when they are not. This is clearly undesirable. To correct for this, we can define a null hypothesis for images of statistics.
10 Family-wise null hypothesis
Activation is zero everywhere. If we reject the voxel null hypothesis at any voxel, we reject the family-wise null hypothesis. A false positive anywhere in the image gives a family-wise error (FWE).
Family-wise error (FWE) rate = 'corrected' p-value.
11 Uncorrected vs corrected p-values
[Simulated null images comparing an 'uncorrected' p-value of 0.1 (many false positives in every image) with a 'corrected' (FWE) p-value of 0.1.]
12 The Bonferroni correction
The family-wise error rate (FWE), α, for a family of N independent voxels satisfies
α ≤ N v
where v is the voxel-wise error rate. Therefore, to ensure a particular FWE rate, we can use
v = α / N
BUT ...
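The correction itself is a single division; the sketch below (hypothetical voxel count, simulated independent null data) also checks empirically that the resulting family-wise error rate lands near the nominal level:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical family of N independent voxels, nominal FWE level alpha
n_voxels = 1000
alpha_fwe = 0.05

# Bonferroni: per-voxel threshold v = alpha / N, and its Z equivalent
v = alpha_fwe / n_voxels
z_threshold = stats.norm.isf(v)

# Empirical check on simulated null images: fraction of images
# containing at least one suprathreshold voxel (a family-wise error)
z = rng.standard_normal((1000, n_voxels))
fwe_rate = np.mean((z > z_threshold).any(axis=1))
print(z_threshold, fwe_rate)
```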
13 The Bonferroni correction
[Images: independent voxels vs spatially correlated voxels.]
The Bonferroni correction assumes independence of voxels. This is too conservative for brain images, which always have a degree of smoothness.
14 Smoothness (inverse roughness)
roughness = 1 / smoothness
Intrinsic smoothness:
- MRI signals are acquired in k-space (Fourier space); after projection onto anatomical space, signals have continuous support
- diffusion of vasodilatory molecules has extended spatial support
Extrinsic smoothness:
- resampling during preprocessing
- matched filter theorem: deliberate additional smoothing to increase SNR
Smoothness is described in resolution elements ("resels"):
- a resel is the size of the image part that corresponds to the FWHM (full width at half maximum) of the Gaussian convolution kernel that would have produced the observed image when applied to independent voxel values
- the number of resels is similar to, but not identical to, the number of independent observations
- it can be computed from spatial derivatives of the residuals
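As a rough illustration only (SPM's actual estimator works from the spatial derivatives of the residuals), a resel count can be approximated by dividing the search volume by the volume of one resel; the volume and FWHM values below are taken from the real-data example later in the talk:

```python
# Illustrative resel count: search volume / volume of one resel.
# Volume and smoothness values from the talk's real-data RFT example.
voxel_size_mm = (2.0, 2.0, 2.0)
n_voxels = 110_776                 # search volume S in voxels
fwhm_mm = (5.1, 5.8, 6.9)          # estimated smoothness per axis

volume_mm3 = n_voxels * voxel_size_mm[0] * voxel_size_mm[1] * voxel_size_mm[2]
resel_volume_mm3 = fwhm_mm[0] * fwhm_mm[1] * fwhm_mm[2]
n_resels = volume_mm3 / resel_volume_mm3
print(round(n_resels, 1))
```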
15 Random field theory
Consider a statistic image as a discretisation ("lattice approximation") of a continuous underlying random field with a certain smoothness, and use results from continuous random field theory.
16 Euler characteristic (EC)
The EC is a topological measure: threshold an image at u; at high thresholds, the EC equals the number of blobs. At high u:
p(one or more blobs) = E[EC]
and therefore, under H0:
FWE rate α = E[EC]
17 Euler characteristic (EC) for 2D images
E[EC] = R (4 ln 2) (2π)^(-3/2) Z_T exp(-Z_T^2 / 2)
where R = number of resels and Z_T = Z value threshold.
We can determine the Z threshold for which E[EC] = α. At this threshold, every remaining voxel represents a significant activation, corrected for multiple comparisons across the search volume.
Example: for 100 resels, E[EC] ≈ 0.049 at a Z threshold of 3.8. That is, the probability of getting one or more blobs where Z is greater than 3.8 is about 0.049.
[Plot: expected EC values for an image of 100 resels.]
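The 2D expected-EC formula can be checked numerically; a minimal sketch reproducing the 100-resel example:

```python
import math

# Expected Euler characteristic of a thresholded 2D Gaussian random field:
# E[EC] = R * (4 ln 2) * (2*pi)^(-3/2) * z * exp(-z^2 / 2)
def expected_ec_2d(resels, z):
    return resels * (4 * math.log(2)) * (2 * math.pi) ** (-1.5) * z * math.exp(-z ** 2 / 2)

# The slide's example: 100 resels at a Z threshold of 3.8
print(round(expected_ec_2d(100, 3.8), 3))
```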
18 Euler characteristic (EC) for any image
Computation of E[EC] can be generalised to volumes of any dimension, shape and size (Worsley et al. 1996). When we have an a priori hypothesis about where an activation should be, we can reduce the search volume ("small volume correction", SVC):
- mask defined by (probabilistic) anatomical atlases
- mask defined by separate "functional localisers"
- mask defined by orthogonal contrasts
- (spherical) search volume around previously reported coordinates
Worsley et al. (1996) A unified statistical approach for determining significant signals in images of cerebral activation. Human Brain Mapping 4: 58-73.
19 Computing E[EC] as a function of search volume and threshold
E[EC(u)] ≈ λ(Ω) |Λ|^(1/2) (u^2 - 1) exp(-u^2 / 2) / (2π)^2
where Ω ⊂ R^3 is the search region, λ(Ω) its volume and |Λ|^(1/2) the roughness.
Assumptions: multivariate normal field; stationarity*; ACF twice differentiable at 0.
* Results are valid without stationarity, but are more accurate when stationarity holds.
20 Voxel, cluster and set level tests
Voxel-level test: intensity of a voxel.
Cluster-level test: spatial extent above u.
Set-level test: number of clusters above u.
Regional specificity is highest at the voxel level and decreases towards the set level; sensitivity increases in the same direction.
21 False discovery rate (FDR)
Family-wise error rate (FWE): probability of one or more false positive voxels in the entire image.
False discovery rate (FDR): FDR = E(V/R), where R is the number of voxels declared active and V the number of those declared falsely; i.e. the expected proportion of activated voxels that are false positives.
23 Controlling false positives at 10%: three criteria
Control of the per-comparison rate at 10%: the percentage of null voxels that are false positives per image is 11.3%, 12.5%, 10.8%, 11.5%, 10.0%, 10.7%, 11.2%, 10.2%, 9.5%.
Control of the family-wise error rate at 10%: a family-wise error occurs in about 1 image in 10.
Control of the false discovery rate at 10%: the percentage of activated voxels that are false positives per image is 6.7%, 10.4%, 14.9%, 9.3%, 16.2%, 13.8%, 14.0%, 10.5%, 12.2%, 8.7%.
24 Benjamini & Hochberg procedure
Select the desired limit q on the FDR.
Order the p-values: p(1) ≤ p(2) ≤ ... ≤ p(V).
Let r be the largest i such that p(i) ≤ (i/V) q.
Reject all null hypotheses corresponding to p(1), ..., p(r).
[Plot: ordered p-values p(i) against i/V (the proportion of all selected voxels), with the line (i/V) q; hypotheses below the line are rejected.]
Benjamini & Hochberg (1995) JRSS-B 57.
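The procedure above is a few lines of array code; a sketch with hypothetical p-values:

```python
import numpy as np

def benjamini_hochberg(p_values, q=0.05):
    """Return a boolean mask of rejected null hypotheses at FDR level q."""
    p = np.asarray(p_values)
    V = p.size
    order = np.argsort(p)
    sorted_p = p[order]
    # Largest i (1-based) with p(i) <= (i / V) * q
    below = sorted_p <= (np.arange(1, V + 1) / V) * q
    reject = np.zeros(V, dtype=bool)
    if below.any():
        r = np.max(np.nonzero(below)[0])  # 0-based index of that largest i
        reject[order[: r + 1]] = True
    return reject

# Hypothetical p-values: a few small (signal) and several larger (noise)
p_vals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
reject = benjamini_hochberg(p_vals, q=0.05)
print(reject)
```

Note that p(3) = 0.039 exceeds its line value (3/10) × 0.05 = 0.015, so only the first two hypotheses are rejected even though 0.039 < 0.05 uncorrected.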
25 Real data: FWE correction with RFT
Threshold: search volume S = 110,776 voxels of 2 × 2 × 2 mm; smoothness 5.1 × 5.8 × 6.9 mm FWHM; u = 9.870.
Result: 5 voxels above the threshold.
[Map of -log10 p-values.]
26 Real data: FDR correction
Threshold: u = 3.83.
Result: 3,073 voxels above the threshold.
27 Caveats concerning FDR
Current methodological discussions concern the question of whether standard FDR implementations are actually valid for neuroimaging data.
Chumbley & Friston (2009, NeuroImage): the fMRI signal is spatially extended and does not have compact support; inference should therefore not be about single voxels, but about topological features of the signal (e.g. peaks or clusters).
28 Caveats concerning FDR
"Imagine that we declare a hundred voxels significant using an FDR criterion. 95 of these voxels constitute a single region that is truly active. The remaining five voxels are false discoveries and are dispersed randomly over the search space. In this example, the false discovery rate of voxels conforms to its expectation of 5%. However, the false discovery rate in terms of regional activations is over 80%. This is because we have discovered six activations but only one is a true activation." (Chumbley & Friston 2009, NeuroImage)
Possible alternative: FDR on topological features (e.g. peaks, clusters).
29 Conclusions
Corrections for multiple testing are necessary to control the false positive risk.
FWE (random field theory):
- very specific, not so sensitive
- inference about topological features (peaks, clusters)
- excellent for large sample sizes (e.g. single-subject analyses or large group analyses)
- affords little power for group studies with small sample sizes → consider non-parametric methods (not discussed in this talk)
FDR:
- less specific, more sensitive
- represents the false positive risk over the whole set of selected voxels
- voxel-wise inference (which has been criticised)
- interpret with care!
30 Further reading
Chumbley JR, Friston KJ (2009). False discovery rate revisited: FDR and topological inference using Gaussian random fields. NeuroImage 44(1): 62-70.
Friston KJ, Frith CD, Liddle PF, Frackowiak RS (1991). Comparing functional (PET) images: the assessment of significant change. J Cereb Blood Flow Metab 11(4): 690-699.
Genovese CR, Lazar NA, Nichols T (2002). Thresholding of statistical maps in functional neuroimaging using the false discovery rate. NeuroImage 15(4): 870-878.
Worsley KJ, Marrett S, Neelin P, Vandal AC, Friston KJ, Evans AC (1996). A unified statistical approach for determining significant signals in images of cerebral activation. Human Brain Mapping 4: 58-73.