
Published by Natalie Simpson. Modified over 2 years ago.

1
Analysis of fMRI data using Support Vector Machine (SVM) Janaina Mourao-Miranda

2
Recently, pattern recognition methods have been used to analyze fMRI data with the goal of decoding the information represented in the subject's brain at a particular time.

References:
Carlson, T.A., Schrater, P., He, S. (2003). Patterns of activity in the categorical representations of objects. J Cogn Neurosci.
Cox, D.D., Savoy, R.L. (2003). Functional magnetic resonance imaging (fMRI) "brain reading": detecting and classifying distributed patterns of fMRI activity in human visual cortex. NeuroImage.
Mourão-Miranda, J., Bokde, A.L.W., Born, C., Hampel, H., Stetter, M. (2005). Classifying brain states and determining the discriminating activation patterns: support vector machine on functional MRI data. NeuroImage.
Davatzikos, C., Ruparel, K., Fan, Y., Shen, D.G., Acharyya, M., Loughead, J.W., Gur, R.C., Langleben, D.D. (2005). Classifying spatial patterns of brain activity with machine learning methods: application to lie detection. NeuroImage.
Haynes, J.D., Rees, G. (2005). Predicting the orientation of invisible stimuli from activity in human primary visual cortex. Nature Neuroscience 8:
Kriegeskorte, N., Goebel, R., Bandettini, P. (2006). Information-based functional brain mapping. PNAS.
LaConte, S., Strother, S., Cherkassky, V., Anderson, J., Hu, X. (2005). Support vector machines for temporal classification of block design fMRI data. NeuroImage.
Mitchell, T.M., Hutchinson, R., Niculescu, R.S., Pereira, F., Wang, X., Just, M., Newman, S. (2004). Learning to decode cognitive states from brain images. Machine Learning.
Mourão-Miranda, J., Reynaud, E., McGlone, F., Calvert, G., Brammer, M. (2006). The impact of temporal compression and space selection on SVM analysis of single-subject and multi-subject fMRI data. NeuroImage (accepted).
Norman, K.A., Polyn, S.M., Detre, G.J., Haxby, J.V. (2006). Beyond mind-reading: multivoxel pattern analysis of fMRI data. Trends in Cognitive Sciences.
Haynes, J.D., Rees, G. (2006). Decoding mental states from brain activity in humans. Nature Reviews Neuroscience.

3
Pattern recognition is a field within the area of machine learning.
Learning methodology: automatic procedures that learn a task from a series of examples. No mathematical model is available.
Supervised learning:
- Training examples: $(X_1, y_1), (X_2, y_2), \ldots, (X_n, y_n)$, where each input $X_i$ comes with a known output $y_i$.
- Learning/training: generate a function or hypothesis $f$ such that $f(X_i) = y_i$.
- Test/prediction: for a test example $X_i$, predict $f(X_i) \rightarrow y_i$.

4
Machine Learning Methods
- Artificial Neural Networks
- Decision Trees
- Bayesian Networks
- Support Vector Machines
- ...
SVM is a classifier derived from statistical learning theory by Vapnik and Chervonenkis.
SVMs were introduced by Boser, Guyon and Vapnik at COLT-92.
They are a powerful tool for statistical pattern recognition.

5
SVM Applications
- Face/Object Recognition
- Handwritten Digit Recognition
- Texture Classification
- Classification of Microarray Gene Expression Data
- Protein Structure Prediction
- fMRI data analysis

6
fMRI Data Analysis
Data: 1. voxel time series (BOLD signal intensity over time); 2. experimental design.
Classical approach: mass-univariate analysis (e.g. GLM).
- Output: map of activated regions, task 1 vs. task 2.
Pattern recognition approach: multivariate analysis.
- SVM training. Input: volumes from task 1 and volumes from task 2. Output: map of discriminating regions between task 1 and task 2.
- SVM test. Input: new example. Output: prediction, task 1 or task 2.

7
fMRI data as input to a classifier
Each fMRI volume is treated as a vector in an extremely high-dimensional space (~200,000 voxels, or dimensions, after the mask).
fMRI volume -> feature vector (dimension = number of voxels)
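The volume-to-vector step can be sketched with NumPy boolean indexing. This is only an illustration: the toy volume size, the mask, and the helper name `volumes_to_matrix` are assumptions and do not come from the slides.

```python
import numpy as np

# Hypothetical toy volume of 4 x 4 x 3 voxels (a real volume is far larger).
volume = np.arange(48, dtype=float).reshape(4, 4, 3)

# Hypothetical brain mask: True for voxels considered inside the brain.
mask = np.zeros(volume.shape, dtype=bool)
mask[1:3, 1:3, :] = True

# Boolean indexing keeps only the in-mask voxels and flattens them to 1-D.
feature_vector = volume[mask]


def volumes_to_matrix(volumes, mask):
    """Stack masked volumes into an (n_volumes, n_voxels) matrix."""
    return np.stack([v[mask] for v in volumes])


# Two example volumes become two rows of the classifier input matrix.
X = volumes_to_matrix([volume, volume + 1.0], mask)
```

Each row of `X` is one training or test example; its length equals the number of voxels inside the mask.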

8
Binary classification can be viewed as the task of finding a hyperplane.
[Figure: training volumes (volume in t1, t2, t3, t4, ...) from task 1 and task 2 plotted in a space with axes voxel 1 and voxel 2 and separated by a hyperplane with weight vector w; a volume from a new subject (task ?) is to be classified.]

9
Simplest Approach: Fisher Linear Discriminant (FLD)
The FLD classifies by projecting the training set onto the axis defined by the difference between the centers of mass of the two classes, corrected for the within-class covariance.
[Figure: projections of the volumes, e.g. X_1(t_1), onto the learned weight vector w in the voxel 1 / voxel 2 plane, with a threshold (thr) along w.]
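A minimal two-class FLD sketch follows. The function names, the toy data, the midpoint threshold and the small ridge term are assumptions made for illustration; with far more voxels than volumes the within-class scatter is singular, so some regularization or prior dimensionality reduction would be needed in practice.

```python
import numpy as np


def fld_train(X, y, ridge=1e-6):
    """Fit a two-class Fisher linear discriminant.

    X: (n_samples, n_features) array; y: labels in {-1, +1}.
    Returns the weight vector w and a threshold thr on the projection.
    """
    X_pos, X_neg = X[y == 1], X[y == -1]
    m_pos, m_neg = X_pos.mean(axis=0), X_neg.mean(axis=0)
    # Within-class scatter: sum of the scatter matrices of both classes.
    S_w = (X_pos - m_pos).T @ (X_pos - m_pos) + (X_neg - m_neg).T @ (X_neg - m_neg)
    # Small ridge term (an assumption) keeps S_w invertible.
    S_w = S_w + ridge * np.eye(X.shape[1])
    # Difference of class means, corrected for the within-class covariance.
    w = np.linalg.solve(S_w, m_pos - m_neg)
    # Threshold halfway between the projected class means (an assumption).
    thr = 0.5 * (w @ m_pos + w @ m_neg)
    return w, thr


def fld_predict(X, w, thr):
    """Project onto w and compare with the threshold."""
    return np.where(X @ w > thr, 1, -1)


# Toy two-voxel data set.
X = np.array([[2.0, 1.0], [3.0, 2.0], [2.5, 1.2],
              [-2.0, -1.0], [-3.0, -2.0], [-2.5, -1.2]])
y = np.array([1, 1, 1, -1, -1, -1])
w, thr = fld_train(X, y)
predictions = fld_predict(X, w, thr)
```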

10
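Which of the linear separators is optimal?
Data: $(X_i, y_i)$, $i = 1, \ldots, N$
Observations: $X_i \in \mathbb{R}^d$
Labels: $y_i \in \{-1, +1\}$
All hyperplanes in $\mathbb{R}^d$ are parameterized by a vector $w$ and a constant $b$. They can be expressed as $w \cdot X + b = 0$.
Our aim is to find a hyperplane, with decision function $f(X) = \mathrm{sign}(w \cdot X + b)$, that correctly classifies our data: for the examples $(X_1, +1)$ and $(X_2, -1)$, $f(X_1) = +1$ and $f(X_2) = -1$.
[Figure: several candidate separating lines in the voxel 1 / voxel 2 plane, with the optimal hyperplane and its weight vector w highlighted.]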
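The decision function can be written in a couple of lines. The weight vector, offset and the two example points below are made-up values chosen only to show the sign rule.

```python
import numpy as np


def decision_function(X, w, b):
    """Return sign(w . X + b) for every row of X.

    Note that np.sign gives 0 for points lying exactly on the hyperplane.
    """
    return np.sign(X @ w + b)


# Hypothetical hyperplane in a two-voxel space.
w = np.array([1.0, -1.0])
b = 0.0

# Two hypothetical examples, one on each side of the hyperplane.
X = np.array([[2.0, 0.5],
              [0.5, 2.0]])
labels = decision_function(X, w, b)
```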

11
Optimal hyperplane: Largest Margin Classifier
Among all hyperplanes separating the data there is a unique optimal hyperplane: the one with the largest margin (the distance from the hyperplane to the closest points).
Let us assume that all test points are generated by adding bounded noise ($r$) to the training examples (test and training data are assumed to have been generated by the same underlying dependence).
If the optimal hyperplane has margin $> r$, it will correctly separate the test points.
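To make the notion of margin concrete, the sketch below computes the distance from a hyperplane to its closest training point for two candidate separators of the same toy data. The data and the two candidate hyperplanes are assumptions chosen for illustration.

```python
import numpy as np


def geometric_margin(X, y, w, b):
    """Smallest signed distance y_i (w . X_i + b) / ||w|| over the data.

    The value is positive only if the hyperplane separates the two classes.
    """
    return np.min(y * (X @ w + b)) / np.linalg.norm(w)


# Toy two-voxel data, two examples per class.
X = np.array([[1.0, 1.0], [2.0, 2.0], [-1.0, -1.0], [-2.0, -2.0]])
y = np.array([1, 1, -1, -1])

# Both candidates separate the data, but with different margins.
margin_diagonal = geometric_margin(X, y, np.array([1.0, 1.0]), 0.0)
margin_axis = geometric_margin(X, y, np.array([1.0, 0.0]), 0.0)
```

Here the diagonal separator leaves more room around the training points, so it tolerates larger bounded noise on the test points.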

12
Support Vector Machine (SVM)
Data: $(X_i, y_i)$, $i = 1, \ldots, N$; observations $X_i \in \mathbb{R}^d$; labels $y_i \in \{-1, +1\}$.
The distance between the separating hyperplane and a sample $X_i$ is
$$d_i = \frac{|w \cdot X_i + b|}{\|w\|}$$
Assuming that a margin $\tau$ exists, all training patterns obey $d_i \ge \tau$. For correctly classified patterns $|w \cdot X_i + b| = y_i (w \cdot X_i + b)$, so substituting $d_i$ gives
$$\frac{y_i (w \cdot X_i + b)}{\|w\|} \ge \tau, \quad i = 1, \ldots, N$$
To limit the number of solutions we fix the scale of the product $\tau \|w\| = 1$. Maximizing the margin $\tau$ is then equivalent to minimizing the norm of $w$.
Finding an optimal hyperplane is a quadratic optimization problem with linear constraints and can be formally stated as: determine $w$ and $b$ that minimize the functional
$$\Phi(w) = \frac{\|w\|^2}{2}$$
subject to the constraints
$$y_i (w \cdot X_i + b) \ge 1, \quad i = 1, \ldots, N$$
The solution has the form
$$w = \sum_i \alpha_i y_i X_i, \qquad b = y_i - w \cdot X_i \ \text{ for any } X_i \text{ such that } \alpha_i \neq 0$$
The examples $X_i$ for which $\alpha_i > 0$ are called the support vectors.
[Figure: optimal hyperplane with weight vector w, the margin, the support vectors lying on the margin, and the distance d from a sample X_i to the hyperplane.]
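The form of the solution follows from the standard Lagrangian treatment of this constrained problem. The derivation below is a textbook sketch rather than something shown on the slide; it assumes the hyperplane convention $w \cdot X + b = 0$ and multipliers $\alpha_i \ge 0$.

```latex
\begin{align*}
L(w, b, \alpha) &= \tfrac{1}{2}\|w\|^2
  - \sum_{i=1}^{N} \alpha_i \left[ y_i (w \cdot X_i + b) - 1 \right],
  \qquad \alpha_i \ge 0 \\
\frac{\partial L}{\partial w} = 0
  &\;\Rightarrow\; w = \sum_{i=1}^{N} \alpha_i y_i X_i \\
\frac{\partial L}{\partial b} = 0
  &\;\Rightarrow\; \sum_{i=1}^{N} \alpha_i y_i = 0 \\
\alpha_i \left[ y_i (w \cdot X_i + b) - 1 \right] = 0
  &\;\Rightarrow\; y_i (w \cdot X_i + b) = 1 \ \text{ whenever } \alpha_i > 0 \\
  &\;\Rightarrow\; b = y_i - w \cdot X_i \quad (\text{since } y_i^2 = 1)
\end{align*}
```

The complementary slackness condition in the fourth line is why only the examples lying exactly on the margin contribute to $w$.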

13
How to interpret the learned weight vector (discriminating volume)?
[Figure: training volumes from task 1 and task 2, the separating hyperplane H, and its normal vector w displayed as the weight vector W (discriminating volume).]
The value of each voxel in the discriminating volume indicates the importance of that voxel in differentiating between the two classes or brain states.
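Displaying the weight vector as a volume amounts to writing each weight back to the position of its voxel. The sketch below assumes the feature vector was built by boolean masking, so the same mask restores the order; the mask, the weights and the name `weights_to_volume` are hypothetical.

```python
import numpy as np


def weights_to_volume(w, mask):
    """Place one weight per in-mask voxel back into a 3-D volume.

    Voxels outside the mask are set to zero.
    """
    volume = np.zeros(mask.shape, dtype=float)
    volume[mask] = w
    return volume


# Hypothetical 2 x 2 x 2 mask with three in-brain voxels.
mask = np.zeros((2, 2, 2), dtype=bool)
mask[0, 0, 0] = mask[0, 1, 1] = mask[1, 0, 1] = True

# Hypothetical weights: the sign indicates which class a voxel favors.
w = np.array([0.8, -0.3, 0.1])
discriminating_volume = weights_to_volume(w, mask)
```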

14
Advantage of using Multivariate Methods
Voxel 1: there is no mean difference between the classes.
Voxel 2: there is a mean difference between the classes.
Univariate analysis: only detects activation in voxel 2.
SVM (multivariate analysis): gives weight to both voxels.
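A constructed example shows why a voxel with no mean difference can still deserve a weight. Here both voxels share a fluctuation and only voxel 2 carries the task effect; all numbers and the hand-picked weight vector are assumptions for illustration, not results from the slides.

```python
import numpy as np

# Fluctuation shared by both voxels, unrelated to the task.
shared = np.linspace(-2.0, 2.0, 21)

# Voxel 1 carries only the shared fluctuation in both tasks.
voxel1 = np.concatenate([shared, shared])
# Voxel 2 carries the fluctuation plus a task effect of +0.5 or -0.5.
voxel2 = np.concatenate([shared + 0.5, shared - 0.5])
y = np.concatenate([np.ones(21), -np.ones(21)])

# Mean difference per voxel: zero for voxel 1, nonzero for voxel 2.
mean_diff_v1 = voxel1[y == 1].mean() - voxel1[y == -1].mean()
mean_diff_v2 = voxel2[y == 1].mean() - voxel2[y == -1].mean()

# Using voxel 2 alone, the shared fluctuation hides the task effect.
acc_voxel2_only = np.mean(np.sign(voxel2) == y)

# Giving voxel 1 a negative weight cancels the shared fluctuation.
w = np.array([-1.0, 1.0])
X = np.column_stack([voxel1, voxel2])
acc_both_voxels = np.mean(np.sign(X @ w) == y)
```

In this toy case the two-voxel classifier separates the classes perfectly, while thresholding voxel 2 alone does not.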

15
Pattern Recognition Method: General Procedure
- Pre-processing: realignment, normalization, smoothing
- Dimensionality reduction (e.g. PCA) and/or feature selection (e.g. ROI)
- Split data: training and test
- SVM training and test
- Output: accuracy; discriminating maps (SVM weight vector)
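The dimensionality-reduction step can be sketched with an SVD-based PCA. The function names and the random stand-in data are assumptions. Estimating the PCA basis from the training examples only is a design choice of this sketch to keep test data out of the fit; the slides do not state where the basis is estimated.

```python
import numpy as np


def pca_fit(X_train, n_components):
    """Estimate a PCA basis from the training examples via SVD."""
    mean = X_train.mean(axis=0)
    # Rows of Vt are the principal directions in voxel space.
    _, _, Vt = np.linalg.svd(X_train - mean, full_matrices=False)
    return mean, Vt[:n_components]


def pca_transform(X, mean, components):
    """Project examples onto the principal directions."""
    return (X - mean) @ components.T


# Random stand-in data: 20 volumes with 500 voxels each.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 500))

# Split data: training and test.
X_train, X_test = X[:15], X[15:]

mean, components = pca_fit(X_train, n_components=10)
Z_train = pca_transform(X_train, mean, components)
Z_test = pca_transform(X_test, mean, components)
```

A classifier would then be trained on `Z_train` and evaluated on `Z_test`.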

16
Applications

17
Can we classify brain states using the whole brain information from different subjects?

18
Machine Learning Method: Support Vector Machine
Training subjects: fMRI scanner -> brain looking at a pleasant stimulus; brain looking at an unpleasant stimulus.
Test subject: fMRI scanner -> ? -> prediction: the subject was viewing a pleasant stimulus.

19
Application I
Number of subjects: 16
Tasks: viewing unpleasant and pleasant pictures (6 blocks of 7 scans)
Pre-processing procedures: motion correction, normalization to standard space, spatial filter. Mask to select voxels inside the brain.
Leave-one-out test: training on 15 subjects, test on 1 subject. This procedure was repeated 16 times and the results (error rate) were averaged.
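The leave-one-out test over subjects can be sketched as an index generator. The function name is hypothetical, and the figure of 42 scans per subject is an assumption used only to build example indices (6 blocks of 7 scans, taken once).

```python
import numpy as np


def leave_one_subject_out(subject_ids):
    """Yield (subject, train_idx, test_idx), holding out one subject at a time."""
    subject_ids = np.asarray(subject_ids)
    for subject in np.unique(subject_ids):
        is_test = subject_ids == subject
        yield subject, np.where(~is_test)[0], np.where(is_test)[0]


# 16 subjects; 42 scans per subject is assumed for illustration.
subject_ids = np.repeat(np.arange(16), 6 * 7)
folds = list(leave_one_subject_out(subject_ids))

# In a full analysis a classifier would be trained on train_idx and tested
# on test_idx in every fold, and the 16 error rates would be averaged.
```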

20
Spatial observations -> Pre-processing -> PCA -> SVM
Output: accuracy; discriminating volume
[Figure: spatial weight vector shown on axial slices z = -18, -6, 6, 18, 30, 42; color scale from unpleasant to pleasant.]

21
Can we classify groups using the whole brain information from different subjects?

22
Pattern Classification of Brain Activity in Depression
Collaboration with Cynthia H.Y. Fu
TP = 74%, TN = 63%

23
Can we improve the accuracy by averaging time points?

24
First Approach: use single volumes as training examples.
Second Approach: use the average of the volumes within the block as training examples (one example per block).
Third Approach: use block-specific estimators as training examples.
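The second approach (one averaged example per block) reduces to a reshape and a mean, provided the scans of each block are stored contiguously and every block has the same length. Both conditions, the function name and the toy numbers are assumptions of this sketch.

```python
import numpy as np


def average_within_blocks(X, n_blocks, scans_per_block):
    """Average consecutive scans so that each block yields one example.

    X: (n_blocks * scans_per_block, n_voxels), scans ordered block by block.
    """
    n_voxels = X.shape[1]
    return X.reshape(n_blocks, scans_per_block, n_voxels).mean(axis=1)


# Toy data: 6 blocks of 7 scans with 3 voxels each.
X = np.arange(6 * 7 * 3, dtype=float).reshape(42, 3)
X_blocks = average_within_blocks(X, n_blocks=6, scans_per_block=7)
```

The labels have to be reduced in the same way, one label per block.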

25
Multi-subject Classifier: impact of temporal compression and spatial selection on SVM accuracy
Conditions: unpleasant, neutral, pleasant.
(A) No temporal compression. Steps: whole data; SVD/PCA; split data (training and test); SVM training and test.
(B) Temporal compression I. Steps: average volumes within the blocks; SVD/PCA; split data (training and test); SVM training and test.
(C) Temporal compression II. Steps: GLM analysis (block-specific estimator); SVD/PCA; split data (training and test); SVM training and test.

26
Can we improve the accuracy by using ROIs?

27
Fourth Approach: space restriction (training with the ROIs selected by GLM).
Fifth Approach: space restriction (training with the ROIs selected by SVM).
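One simple way to restrict the space from a per-voxel score (a GLM statistic or an SVM weight) is to keep the voxels with the largest absolute values. The slides do not say how the ROIs were thresholded, so the top-k rule, the function name and the numbers below are assumptions.

```python
import numpy as np


def select_top_voxels(scores, k):
    """Return the sorted indices of the k voxels with the largest |score|."""
    order = np.argsort(np.abs(scores))[::-1]
    return np.sort(order[:k])


# Hypothetical per-voxel scores, e.g. SVM weights from a first training pass.
scores = np.array([0.1, -2.0, 0.05, 1.5, -0.3])
keep = select_top_voxels(scores, k=2)

# Restrict every example to the selected voxels before retraining.
X = np.arange(10.0).reshape(2, 5)
X_selected = X[:, keep]
```

To keep the test accuracy unbiased, the scores would normally be computed from the training examples only.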

28
Multi-subject Classifier: impact of temporal compression and spatial selection on SVM accuracy
Conditions: unpleasant, neutral, pleasant.
(A) Whole brain. Steps: SVD/PCA; split data (training and test); SVM training and test.
(B) Space selection by the GLM. Steps: GLM analysis; select activated voxels based on GLM; SVD/PCA; split data (training and test); SVM training and test.
(C) Space selection by the SVM. Steps: SVM analysis; select discriminating voxels based on SVM; SVD/PCA; split data (training and test); SVM training and test.

29
Single-subject Classifier: impact of temporal compression and spatial selection on SVM accuracy
Conditions: unpleasant, neutral, pleasant.
(A) Whole brain. Steps: SVD/PCA; split data (training and test); SVM training and test.
(B) Temporal compression. Steps: average volumes within the blocks; SVD/PCA; split data (training and test); SVM training and test.
(C) Space selection by the GLM. Steps: GLM analysis; select activated voxels based on GLM; SVD/PCA; split data (training and test); SVM training and test.
(D) Temporal compression + space selection. Steps: GLM analysis; select activated voxels based on GLM; average volumes within the blocks; SVD/PCA; split data (training and test); SVM training and test.

30
Summary

Multi-subject Classifier
Method                                          Mean accuracy
Whole brain (no temporal compression)           62.00%
Temporal compression I (average)                83.68%
Temporal compression II (betas)                 80.55%
Space selection I (GLM)                         57.59%
Space selection II (SVM)                        62.85%
Space selection I and Temporal compression I    79.86%
Space selection II and Temporal compression I   82.99%

Single-subject Classifier
Method                                          Mean accuracy
Whole brain                                     69.34%
Temporal compression I                          75.00%
Space selection I                               81.84%
Space selection I and Temporal compression I    85.41%

31
How similar are the results of the GLM and the SVM?

32
Contrast between viewing unpleasant (red scale) and neutral pictures (blue scale)
(A) Standard GLM analysis: Spmt, random effect, p-value < (uncorrected)
(B) SVM: whole data
(C) SVM: time-compressed data

33
Contrast between viewing unpleasant (red scale) and pleasant pictures (blue scale)
(A) Standard GLM analysis: Spmt, random effect, p-value < (uncorrected)
(B) SVM: whole data
(C) SVM: time-compressed data

34
Contrast between viewing neutral (red scale) and pleasant pictures (blue scale)
(A) Standard GLM analysis: Spmt, random effect, p-value < (uncorrected)
(B) SVM: whole data
(C) SVM: time-compressed data

35
Does the classifier work for event-related (ER) designs?

36
Block vs. ER designs
Stimuli: pleasant, unpleasant and neutral pictures.
[Figure: stimulus timelines for the block design and the event-related design; time axis with TR = 3 s.]

37
Block Design vs. ER
Conditions: unpleasant, neutral, pleasant.
(A) Event-related (ER). Steps: whole data; SVD/PCA; average 2 volumes within the event; leave-one-out SVM training.
(B) Block as ER. Steps: whole data; SVD/PCA; average 2 volumes within the block; leave-one-out SVM training.
(C) Block. Steps: whole data; SVD/PCA; average all volumes within the block; leave-one-out SVM training.

38
Summary

SVM results: ER design (2-class SVM)
Approach used to define train and test examples   Accuracy for unpleasant   Accuracy for neutral
Scans 1 and 2                                     66%                       64%
Mean of scans 1 and 2                             72%                       65%
Scan 1                                            63%                       65%
Scan 2                                            68%                       70%

SVM results: Block design (2-class SVM)
Approach used to define train and test examples   Accuracy for unpleasant   Accuracy for neutral
Scans 1 to 7                                      80%
Mean of scans 1 to 7                              84%                       88%

39
Discriminating volume (SVM weight vector): unpleasant (red scale) x neutral (blue scale)
[Figure: two panels, Block Design and ER Design.]

40
Can we make use of the temporal dimension in decoding?

41
Spatial Observation
[Figure: timeline alternating fixation and unpleasant or pleasant stimuli.]

42
Spatial SVM
Spatial observations -> Pre-processing -> PCA -> SVM
Output: accuracy; discriminating volume
[Figure: spatial weight vector shown on axial slices z = -18, -6, 6, 18, 30, 42; color scale from unpleasant to pleasant.]

43
Spatial Temporal Observation
$V_i = [v_1 \; v_2 \; v_3 \; v_4 \; v_5 \; v_6 \; v_7 \; v_8 \; v_9 \; v_{10} \; v_{11} \; v_{12} \; v_{13} \; v_{14}]$
[Figure: one duty cycle of 14 volumes, v_t1 to v_t14, spanning the fixation period and the unpleasant or pleasant stimuli.]
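Building the spatiotemporal observation amounts to concatenating the 14 volumes of a duty cycle into one long vector. The sketch assumes the volumes of each cycle are stored contiguously; the function name and the toy sizes are hypothetical.

```python
import numpy as np


def spatiotemporal_examples(X, n_cycles, vols_per_cycle):
    """Concatenate the volumes of each duty cycle into a single example.

    X: (n_cycles * vols_per_cycle, n_voxels), volumes ordered in time.
    Returns an array of shape (n_cycles, vols_per_cycle * n_voxels).
    """
    n_voxels = X.shape[1]
    return X.reshape(n_cycles, vols_per_cycle * n_voxels)


# Toy data: 2 duty cycles of 14 volumes with 3 voxels each.
X = np.arange(2 * 14 * 3, dtype=float).reshape(28, 3)
V = spatiotemporal_examples(X, n_cycles=2, vols_per_cycle=14)

# A weight vector learned on V can be cut back into one map per time point.
w = np.zeros(14 * 3)
maps_per_time_point = w.reshape(14, 3)
```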

44
Spatiotemporal SVM: Block Design
Training example: whole duty cycle.
Spatiotemporal observations (4D data including all volumes within the duty cycle) -> Pre-processing -> PCA -> SVM
Output: accuracy; dynamic discriminating volume
[Figure: unpleasant and pleasant duty cycles with time points T1, T2, ... marked along each cycle; color scale from unpleasant to pleasant.]

45
Spatial-Temporal weight vector: Dynamic discriminating map
[Figure: time points T1 to T7 shown on axial slices z = -18, -6, 6, 18, 30, 42; color scale from unpleasant to pleasant.]

46
Spatial-Temporal weight vector: Dynamic discriminating map
[Figure: time points T8, T9, T10, T11, T12, T13, T; color scale from unpleasant to pleasant.]

47
[Figure: time point T, slice z = -18; regions labeled A, B, C and D; color scale from unpleasant to pleasant.]

48
[Figure: time point T5, one axial slice; regions labeled A and B; color scale from unpleasant to pleasant.]

49
[Figure: time point T5, one axial slice; regions labeled C, D, E and F; color scale from unpleasant to pleasant.]

50
The Brain Image Analysis Unit (BIAU)
