Published by Natalie Simpson. Modified over 4 years ago.

1
**Analysis of fMRI data using Support Vector Machine (SVM)**

Janaina Mourão-Miranda

2
Recently, pattern recognition methods have been used to analyze fMRI data with the goal of decoding the information represented in the subject’s brain at a particular time.

- Carlson, T.A., Schrater, P., He, S. (2003). Patterns of activity in the categorical representations of objects. J Cogn Neurosci.
- Cox, D.D., Savoy, R.L. (2003). Functional magnetic resonance imaging (fMRI) "brain reading": detecting and classifying distributed patterns of fMRI activity in human visual cortex. NeuroImage.
- Mourão-Miranda, J., Bokde, A.L.W., Born, C., Hampel, H., Stetter, M. (2005). Classifying brain states and determining the discriminating activation patterns: support vector machine on functional MRI data. NeuroImage.
- Davatzikos, C., Ruparel, K., Fan, Y., Shen, D.G., Acharyya, M., Loughead, J.W., Gur, R.C., Langleben, D.D. (2005). Classifying spatial patterns of brain activity with machine learning methods: application to lie detection. NeuroImage.
- Haynes, J.D., Rees, G. (2005). Predicting the orientation of invisible stimuli from activity in human primary visual cortex. Nature Neuroscience, 8.
- Kriegeskorte, N., Goebel, R., Bandettini, P. (2006). Information-based functional brain mapping. PNAS.
- LaConte, S., Strother, S., Cherkassky, V., Anderson, J., Hu, X. (2005). Support vector machines for temporal classification of block design fMRI data. NeuroImage.
- Mitchell, T.M., Hutchinson, R., Niculescu, R.S., Pereira, F., Wang, X., Just, M., Newman, S. (2004). Learning to decode cognitive states from brain images. Machine Learning.
- Mourão-Miranda, J., Reynaud, E., McGlone, F., Calvert, G., Brammer, M. (2006). The impact of temporal compression and space selection on SVM analysis of single-subject and multi-subject fMRI data. NeuroImage (accepted).
- Norman, K.A., Polyn, S.M., Detre, G.J., Haxby, J.V. (2006). Beyond mind-reading: multivoxel pattern analysis of fMRI data. Trends in Cognitive Sciences.
- Haynes, J.D., Rees, G. (2006). Decoding mental states from brain activity in humans. Nature Reviews Neuroscience.

3
**Pattern recognition is a field within the area of machine learning**

Supervised learning

- Input: X1, X2, X3. Output: y1, y2, y3.
- No mathematical model is available.
- Learning methodology: automatic procedures that learn a task from a series of examples.
- Learning/training: from the training examples (X1, y1), (X2, y2), ..., (Xn, yn), generate a function or hypothesis f such that f(Xi) = yi.
- Test/prediction: for a test example Xi, the learned function gives the prediction f(Xi) → yi.
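As a minimal sketch of the train-then-predict cycle described on this slide, the following uses a nearest-class-mean rule as a stand-in for the learned function f. The rule, the function names `train` and `predict`, and the toy numbers are illustrative assumptions and are not taken from the presentation.

```python
def train(examples):
    """Learn one mean vector per label from (X, y) training pairs."""
    sums, counts = {}, {}
    for x, y in examples:
        if y not in sums:
            sums[y] = [0.0] * len(x)
            counts[y] = 0
        sums[y] = [s + xi for s, xi in zip(sums[y], x)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def predict(model, x):
    """f(X): return the label whose class mean is closest to x."""
    def squared_distance(mean):
        return sum((xi - mi) ** 2 for xi, mi in zip(x, mean))
    return min(model, key=lambda label: squared_distance(model[label]))


# Toy training examples (X_i, y_i); the values are made up for illustration.
training_examples = [
    ((1.0, 1.0), "task 1"),
    ((1.5, 0.5), "task 1"),
    ((4.0, 4.0), "task 2"),
    ((5.0, 3.5), "task 2"),
]
model = train(training_examples)

# Prediction for two unseen test examples.
print(predict(model, (1.2, 1.1)))
print(predict(model, (4.2, 3.9)))
```

Any classifier, including the SVM discussed later, fits the same two-step interface: fit on labeled examples, then apply the learned function to new ones.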

4
**Machine Learning Methods**

- Artificial neural networks
- Decision trees
- Bayesian networks
- Support vector machines
- ...

The SVM is a classifier derived from the statistical learning theory of Vapnik and Chervonenkis. SVMs were introduced by Boser, Guyon and Vapnik at COLT-92. They are a powerful tool for statistical pattern recognition.

5
**Support Vector Machine (SVM) Applications**

- Face/object recognition
- fMRI data analysis
- Handwritten digit recognition
- Classification of microarray gene expression data
- Texture classification
- Protein structure prediction

6
**fMRI Data Analysis**

Classical approach: mass-univariate analysis (e.g. GLM)

- Input: (1) voxel time series (BOLD signal intensity over time); (2) experimental design.
- Output: map of activated regions, task 1 vs. task 2.

Pattern recognition approach: multivariate analysis

- SVM training. Input: volumes from task 1 and volumes from task 2. Output: map of discriminating regions between task 1 and task 2.
- SVM test. Input: new example. Output: prediction, task 1 or task 2.

Standard methods for fMRI data analysis take voxel time series as input and produce a statistical parametric map showing the most activated regions. This is a voxel-based (univariate) analysis that does not take into account the spatial correlation of the data. In the SVM approach, each fMRI volume is treated as a spatial pattern, and the method is used to map these patterns to an instantaneous brain state.

7
**fMRI data as input to a classifier**

Each fMRI volume is treated as a vector in an extremely high-dimensional space (~200,000 voxels, or dimensions, after the mask): fMRI volume → feature vector (dimension = number of voxels).
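The masking step can be sketched as follows. This is a toy illustration with nested lists, assuming a boolean brain mask of the same shape as the volume; the function name `volume_to_feature_vector` and the 2 x 2 x 2 data are hypothetical. A real analysis would read the images with a neuroimaging library.

```python
def volume_to_feature_vector(volume, mask):
    """Flatten a 3D volume (nested lists) and keep only the in-mask voxels."""
    features = []
    for plane, mask_plane in zip(volume, mask):
        for row, mask_row in zip(plane, mask_plane):
            for value, inside_brain in zip(row, mask_row):
                if inside_brain:
                    features.append(value)
    return features


# Toy 2 x 2 x 2 "volume" with 8 voxels; the mask keeps 5 of them.
volume = [[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]]]
mask = [[[True, False], [True, True]], [[False, True], [False, True]]]

x = volume_to_feature_vector(volume, mask)
print(len(x), x)  # the dimension equals the number of in-mask voxels
```

Applying the same mask to every volume guarantees that a given vector position always refers to the same voxel, which is what lets the classifier weights be mapped back into brain space.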

8
**Binary classification can be viewed as a task of finding a hyperplane**

[Figure: 2D plot with axes voxel 1 and voxel 2. The volumes acquired at t1 to t4 are plotted as points for task 1 and for task 2, together with the weight vector w and a volume from a new subject marked "task ?".]

If we imagine a brain with only 2 voxels, we can view the classification problem in 2D space. Binary classification can then be viewed as the task of finding a hyperplane that separates the two conditions. The hyperplane is described by a weight vector w and an offset b. To find a separating hyperplane we applied two different approaches: FLD and SVM.

9
**Simplest Approach: Fisher Linear Discriminant**

[Figure: 2D plot with axes voxel 1 and voxel 2 showing the learned weight vector w, the projections of the training examples onto w (e.g. the projection of X1(t1)) and the threshold thr.]

The FLD classifies by projecting the training set onto the axis defined by the difference between the centers of mass of the two classes (tasks), corrected for the within-class covariance. The example illustrates how important it is to correct for the within-class covariance. The ellipses represent the distributions of training examples from the two classes. In the first picture there is a lot of overlap between the projections of the two classes. After correcting for the within-class covariance there is no more overlap between the projections, and the data are classified correctly.
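The effect of the covariance correction can be reproduced on a two-voxel toy problem. The sketch below assumes the usual textbook form of the FLD, w = Sw⁻¹(m_a − m_b), with the threshold halfway between the projected class means. The data points and all function names are made up for illustration.

```python
def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]


def mean(points):
    n = len(points)
    return [sum(p[0] for p in points) / n, sum(p[1] for p in points) / n]


def scatter(points, m):
    """2 x 2 scatter matrix of the points around their mean m."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for p in points:
        d = [p[0] - m[0], p[1] - m[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s


def count_errors(w, thr, class_a, class_b):
    """Class a is predicted when the projection onto w exceeds thr."""
    errors = sum(dot(w, p) <= thr for p in class_a)
    errors += sum(dot(w, p) > thr for p in class_b)
    return errors


# Two elongated, correlated classes (voxel 1, voxel 2); toy values.
class_a = [(0.0, 1.0), (2.0, 2.0), (4.0, 5.0), (6.0, 6.0)]
class_b = [(2.0, 1.0), (4.0, 2.0), (6.0, 5.0), (8.0, 6.0)]
ma, mb = mean(class_a), mean(class_b)
diff = [ma[0] - mb[0], ma[1] - mb[1]]

# Without correction: project onto the difference of the class means.
thr_plain = 0.5 * (dot(diff, ma) + dot(diff, mb))
errors_plain = count_errors(diff, thr_plain, class_a, class_b)

# With correction: multiply by the inverse within-class scatter matrix.
sa, sb = scatter(class_a, ma), scatter(class_b, mb)
sw = [[sa[i][j] + sb[i][j] for j in range(2)] for i in range(2)]
det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
sw_inv = [[sw[1][1] / det, -sw[0][1] / det],
          [-sw[1][0] / det, sw[0][0] / det]]
w_fld = [dot(sw_inv[0], diff), dot(sw_inv[1], diff)]
thr_fld = 0.5 * (dot(w_fld, ma) + dot(w_fld, mb))
errors_fld = count_errors(w_fld, thr_fld, class_a, class_b)

print("errors without correction:", errors_plain)
print("errors with correction:", errors_fld)
```

With thousands of voxels and few volumes the within-class scatter matrix is singular, so in practice the FLD needs dimensionality reduction or regularization before this inverse can be taken.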

10
**Optimal Hyperplane: Which of the linear separators is optimal?**

Data: $\langle X_i, y_i \rangle$, $i = 1, \dots, N$. Observations: $X_i \in \mathbb{R}^d$. Labels: $y_i \in \{-1, +1\}$.

[Figure: 2D plot with axes voxel 1 and voxel 2 showing the examples $(X_1, +1)$ and $(X_2, -1)$, the weight vector $w$ and several candidate separating hyperplanes.]

All hyperplanes in $\mathbb{R}^d$ are parameterized by a vector $w$ and a constant $b$, and can be expressed as $w \cdot X + b = 0$. Our aim is to find a hyperplane, or decision function $f(X) = \mathrm{sign}(w \cdot X + b)$, that correctly classifies our data: $f(X_1) = +1$ and $f(X_2) = -1$.

The picture shows that many possible hyperplanes separate the data. However, a classifier that does very well on the training data might not generalize well to unseen examples. The SVM is a “large margin classifier”, which selects the most robust of the many possible solutions.
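To make the "many possible hyperplanes" point concrete, the sketch below applies the decision function f(X) = sign(w·X + b) with three different (w, b) pairs to the same toy training set. The points and the candidate hyperplanes are hypothetical values chosen for illustration; none of them comes from the presentation.

```python
def decision(w, b, x):
    """f(X) = sign(w . X + b): +1 on one side of the hyperplane, -1 on the other."""
    activation = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if activation >= 0 else -1


# Toy training set: ((voxel 1, voxel 2), label).
training = [
    ((3.0, 3.0), +1),
    ((4.0, 2.5), +1),
    ((1.0, 1.0), -1),
    ((0.5, 2.0), -1),
]

# Three different candidate hyperplanes (w, b).
candidates = [((1.0, 1.0), -4.0), ((1.0, 0.0), -2.0), ((1.0, 2.0), -6.5)]

error_counts = []
for w, b in candidates:
    errors = sum(decision(w, b, x) != y for x, y in training)
    error_counts.append(errors)
    print("w =", w, "b =", b, "training errors =", errors)
```

All three candidates classify the training set without error, so training accuracy alone cannot choose between them. The margin criterion is what breaks the tie.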

11
**Optimal Hyperplane: Largest Margin Classifier**

Among all hyperplanes separating the data there is a unique optimal hyperplane: the one with the largest margin (the distance from the hyperplane to the closest points). Suppose that all test points are generated by adding bounded noise (r) to the training examples (test and training data are assumed to have been generated by the same underlying dependence). If the optimal hyperplane has margin > r, it will correctly separate the test points.
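The margin of a given separating hyperplane can be computed directly as the smallest point-to-hyperplane distance. The sketch below compares a few hand-picked candidates on toy data and checks each margin against a noise bound r. The candidates, the data and the value of r are assumptions for illustration, and picking the best of three candidates is not the same as solving for the true optimal hyperplane.

```python
import math


def margin(w, b, points):
    """Smallest distance |w . x + b| / ||w|| from the points to the hyperplane.

    Meaningful as a margin only if the hyperplane separates the classes.
    """
    norm = math.sqrt(sum(wi * wi for wi in w))
    return min(abs(sum(wi * xi for wi, xi in zip(w, x)) + b) for x in points) / norm


# Toy training points (voxel 1, voxel 2); all candidates below separate them.
points = [(3.0, 3.0), (4.0, 2.5), (1.0, 1.0), (0.5, 2.0)]
candidates = [((1.0, 1.0), -4.0), ((1.0, 0.0), -2.0), ((1.0, 2.0), -6.5)]

margins = [margin(w, b, points) for w, b in candidates]
best = max(range(len(candidates)), key=lambda k: margins[k])

# Bounded noise of radius r cannot push a point across a hyperplane
# whose margin is larger than r.
r = 0.5
tolerates_noise = [m > r for m in margins]

for (w, b), m in zip(candidates, margins):
    print("w =", w, "b =", b, "margin =", round(m, 3))
print("largest margin among the candidates: index", best)
```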

12
**Support Vector Machine (SVM)**

Data: $\langle X_i, y_i \rangle$, $i = 1, \dots, N$. Observations: $X_i \in \mathbb{R}^d$. Labels: $y_i \in \{-1, +1\}$.

[Figure: optimal hyperplane with weight vector $w$, the margin, the distance $d$ from a sample to the hyperplane, and the support vectors $X_i$.]

Among all hyperplanes separating the data there is a unique optimal hyperplane, the one with the largest margin.

The distance between the separating hyperplane and a sample $X_i$ is

$$d = \frac{|w \cdot X_i + b|}{\|w\|}.$$

For a correctly classified sample, $|w \cdot X_i + b| = y_i (w \cdot X_i + b)$. Assuming that a margin $\rho$ exists, all training patterns therefore obey the inequality

$$\frac{y_i (w \cdot X_i + b)}{\|w\|} \geq \rho, \quad i = 1, \dots, N.$$

To limit the number of solutions we fix the scale of the product $\rho \|w\| = 1$. Maximizing the margin $\rho$ is then equivalent to minimizing the norm of $w$.

Finding the optimal hyperplane is a quadratic optimization problem with linear constraints and can be formally stated as: determine $w$ and $b$ that minimize the functional

$$\Phi(w) = \frac{\|w\|^2}{2}$$

subject to the constraints

$$y_i (w \cdot X_i + b) \geq 1, \quad i = 1, \dots, N.$$

The solution has the form

$$w = \sum_i \alpha_i y_i X_i, \qquad b = y_i - w \cdot X_i \ \text{ for any } X_i \text{ such that } \alpha_i \neq 0.$$

The examples $X_i$ for which $\alpha_i > 0$ are called the support vectors.
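The form of the solution can be checked on a toy problem small enough to solve by hand. In the sketch below the multipliers α are not produced by a solver. They are the hand-derived values for this particular three-point data set, and the data themselves are hypothetical. The code only verifies that w = Σ αᵢ yᵢ Xᵢ and b = yᵢ − w·Xᵢ satisfy the constraints.

```python
import math

# Toy data: two points that end up as support vectors and one that does not.
X = [(1.0, 1.0), (-1.0, -1.0), (3.0, 2.0)]
y = [+1, -1, +1]

# Hand-derived multipliers for this toy problem (alpha = 0 for the third point).
alpha = [0.25, 0.25, 0.0]

# w = sum_i alpha_i * y_i * X_i
w = [sum(a * yi * xi[k] for a, yi, xi in zip(alpha, y, X)) for k in range(2)]


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


# b = y_i - w . X_i for any support vector (alpha_i > 0); use the first one.
support = [i for i, a in enumerate(alpha) if a > 0]
b = y[support[0]] - dot(w, X[support[0]])

# Constraint values y_i (w . X_i + b): exactly 1 on the support vectors,
# at least 1 everywhere else.
constraint_values = [yi * (dot(w, xi) + b) for xi, yi in zip(X, y)]
margin = 1.0 / math.sqrt(dot(w, w))

print("w =", w, "b =", b)
print("constraint values:", constraint_values)
print("margin =", margin)
```

Only the two points closest to the hyperplane determine w. Removing the third point would leave the solution unchanged, which is the sense in which the support vectors "support" the hyperplane.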

13
**Weight vector (Discriminating Volume)**

How should the learned weight vector (discriminating volume) be interpreted?

[Figure: 2D example with training points from task 1 and task 2, the hyperplane H and the weight vector (discriminating volume) W = [0.45 0.89].]

The value of each voxel in the discriminating volume indicates the importance of that voxel in differentiating between the two classes or brain states.

14
**Advantage of using Multivariate Methods**

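[Figure: scatter plot of the two classes with axes voxel 1 and voxel 2.]

- Voxel 1: there is no mean difference between the classes.
- Voxel 2: there is a mean difference between the classes.
- Univariate analysis: only detects activation in voxel 2.
- SVM (multivariate analysis): gives weight to both voxels.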
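A small numeric sketch of this situation follows. Voxel 1 has identical class means, yet it improves the classification when combined with voxel 2. The data and the weight vector w = (1, -1) are hand-picked for illustration. The weights are not learned by an SVM here.

```python
# Toy data (voxel 1, voxel 2) for two classes; values are made up.
class_1 = [(1.0, 0.0), (2.0, 2.0), (5.0, 4.0), (6.0, 6.0)]
class_2 = [(1.0, 2.0), (2.0, 4.0), (5.0, 6.0), (6.0, 8.0)]


def voxel_mean(points, k):
    return sum(p[k] for p in points) / len(points)


# Univariate view: per-voxel difference of the class means.
mean_diff = [voxel_mean(class_1, k) - voxel_mean(class_2, k) for k in range(2)]

# Univariate classifier on voxel 2 alone, threshold halfway between the means.
thr_v2 = 0.5 * (voxel_mean(class_1, 1) + voxel_mean(class_2, 1))
univariate_errors = sum(p[1] >= thr_v2 for p in class_1)
univariate_errors += sum(p[1] < thr_v2 for p in class_2)

# Multivariate classifier: hand-picked weights on BOTH voxels.
w, thr = (1.0, -1.0), -0.5
multivariate_errors = sum(w[0] * p[0] + w[1] * p[1] <= thr for p in class_1)
multivariate_errors += sum(w[0] * p[0] + w[1] * p[1] > thr for p in class_2)

print("mean difference per voxel:", mean_diff)
print("errors using voxel 2 only:", univariate_errors)
print("errors using both voxels:", multivariate_errors)
```

Voxel 1 carries no signal on its own, but it tracks the variance shared by both classes. Subtracting it from voxel 2 removes that shared variance, which is why a multivariate method assigns it a non-zero weight.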

15
**Pattern Recognition Method: General Procedure**

1. Pre-processing: normalization, realignment, smoothing.
2. Split data: training and test.
3. Dimensionality reduction (e.g. PCA) and/or feature selection (e.g. ROI).
4. SVM training and test.
5. Output: accuracy and discriminating maps (SVM weight vector).

16
Applications

17
**Can we classify brain states using the whole brain information from different subjects?**

18
**Training Subjects and Test Subject**

[Figure: fMRI scans of the training subjects' brains, acquired while looking at a pleasant stimulus and at an unpleasant stimulus, are used to train the machine learning method (support vector machine). An fMRI scan of the test subject's brain, acquired while looking at an unknown stimulus ("?"), is then classified, for example: "The subject was viewing a pleasant stimulus."]

19
**Application I**

- Number of subjects: 16.
- Tasks: viewing unpleasant and pleasant pictures (6 blocks of 7 scans).
- Pre-processing procedures: motion correction, normalization to standard space, spatial filter, and a mask to select voxels inside the brain.
- Leave-one-out test: training on 15 subjects, test on 1 subject. This procedure was repeated 16 times and the results (error rates) were averaged.
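The leave-one-subject-out scheme can be sketched as below. The function names and the `run_fold` callback are hypothetical. The callback stands for whatever trains the classifier on the 15 training subjects and returns the error rate on the held-out subject, and no real results are produced here.

```python
def leave_one_subject_out(subject_ids):
    """Yield (training subjects, held-out test subject) for every subject."""
    for held_out in subject_ids:
        train = [s for s in subject_ids if s != held_out]
        yield train, held_out


def cross_validate(subject_ids, run_fold):
    """Average the error rate returned by run_fold(train, test) over all folds."""
    errors = [run_fold(train, test)
              for train, test in leave_one_subject_out(subject_ids)]
    return sum(errors) / len(errors)


subjects = list(range(1, 17))  # 16 subjects, as in Application I
folds = list(leave_one_subject_out(subjects))
print(len(folds), "folds; first fold tests on subject", folds[0][1],
      "and trains on", len(folds[0][0]), "subjects")
```

Holding out a whole subject, rather than individual volumes, keeps every scan of the test subject out of training, so the averaged error estimates generalization to a new person.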

20
**Discriminating volume**

Pipeline: pre-processing → PCA → SVM → output (accuracy and discriminating volume), applied to spatial observations.

[Figure: spatial weight vector (discriminating volume) for unpleasant vs. pleasant, shown on axial slices z = -18, -6, 6, 18, 30 and 42, with a color scale from -1.00 to 1.00.]

21
**Can we classify groups using the whole brain information from different subjects?**

22
**Pattern Classification of Brain Activity in Depression**

Collaboration with Cynthia H.Y. Fu. Results: TP = 74%, TN = 63%.

23
**Can we improve the accuracy by averaging time points?**

24
**Three Approaches to Defining Training Examples**

- First approach: use single volumes as training examples.
- Second approach: use the average of the volumes within the block as training examples (one example per block).
- Third approach: use block-specific estimators as training examples.
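The second approach (one averaged example per block) can be sketched as follows, assuming the volumes are already flattened into feature vectors and stored in acquisition order with a fixed number of scans per block. The function name and the toy numbers are illustrative.

```python
def average_within_blocks(volumes, scans_per_block):
    """Return one voxel-wise mean volume per block of consecutive scans."""
    if len(volumes) % scans_per_block != 0:
        raise ValueError("number of volumes must be a multiple of scans_per_block")
    averaged = []
    for start in range(0, len(volumes), scans_per_block):
        block = volumes[start:start + scans_per_block]
        # zip(*block) groups the values of the same voxel across the block.
        averaged.append([sum(voxel) / len(block) for voxel in zip(*block)])
    return averaged


# Toy data: 4 volumes of 3 voxels each, 2 scans per block.
volumes = [
    [1.0, 2.0, 3.0],
    [3.0, 4.0, 5.0],
    [10.0, 0.0, 2.0],
    [20.0, 0.0, 4.0],
]
examples = average_within_blocks(volumes, scans_per_block=2)
print(examples)
```

Averaging trades the number of training examples for a better signal-to-noise ratio per example. With 6 blocks of 7 scans, for instance, 42 single-volume examples become 6 block averages.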

25
**Multi-subject Classifier**

Impact of temporal compression and spatial selection on SVM accuracy (classes: unpleasant, neutral, pleasant). Starting from the whole data, three pipelines are compared. Each includes the steps "split data: training and test", "SVD/PCA" and "SVM training and test"; they differ in the temporal compression step:

- (A) No temporal compression.
- (B) Temporal compression I: average the volumes within the blocks.
- (C) Temporal compression II: GLM analysis (block-specific estimator).

26
**Can we improve the accuracy by using ROIs?**

27
**Fourth and Fifth Approaches: Space Restriction**

- Fourth approach: space restriction (training with the ROIs selected by GLM).
- Fifth approach: space restriction (training with the ROIs selected by SVM).

28
**Multi-subject Classifier**

Impact of temporal compression and spatial selection on SVM accuracy (classes: unpleasant, neutral, pleasant). Three pipelines are compared. Each includes the steps "split data: training and test", "SVD/PCA" and "SVM training and test"; they differ in the space selection step:

- (A) Whole brain: no space selection.
- (B) Space selection by the GLM: GLM analysis, then select activated voxels based on the GLM.
- (C) Space selection by the SVM: SVM analysis, then select discriminating voxels based on the SVM.

29
**Single-subject Classifier**

Impact of temporal compression and spatial selection on SVM accuracy (classes: unpleasant, neutral, pleasant). Four pipelines are compared. Each includes the steps "split data: training and test", "SVD/PCA" and "SVM training and test"; they differ in the compression and selection steps:

- (A) Whole brain.
- (B) Temporal compression: average the volumes within the blocks.
- (C) Space selection by the GLM: GLM analysis, then select activated voxels based on the GLM.
- (D) Temporal compression + space selection: GLM analysis, select activated voxels based on the GLM, and average the volumes within the blocks.

30
**Summary**

Multi-subject classifier

| Method | Mean accuracy |
|---|---|
| Whole brain (no temporal compression) | 62.00% |
| Temporal compression I (average) | 83.68% |
| Temporal compression II (betas) | 80.55% |
| Space selection I (GLM) | 57.59% |
| Space selection II (SVM) | 62.85% |
| Space selection I and temporal compression I | 79.86% |
| Space selection II and temporal compression I | 82.99% |

Single-subject classifier

| Method | Mean accuracy |
|---|---|
| Whole brain | 69.34% |
| Temporal compression I | 75.00% |
| Space selection I | 81.84% |
| Space selection I and temporal compression I | 85.41% |

31
**How similar are the results of the GLM and the SVM?**

32
**Contrast between viewing unpleasant (red scale) and neutral pictures (blue scale)**

- (A) Standard GLM analysis: SPM t-map, random effects, uncorrected p-value threshold.
- (B) SVM: whole data.
- (C) SVM: time-compressed data.

33
**Contrast between viewing unpleasant (red scale) and pleasant pictures (blue scale)**

- (A) Standard GLM analysis: SPM t-map, random effects, uncorrected p-value threshold.
- (B) SVM: whole data.
- (C) SVM: time-compressed data.

34
**Contrast between viewing neutral (red scale) and pleasant pictures (blue scale)**

- (A) Standard GLM analysis: SPM t-map, random effects, uncorrected p-value threshold.
- (B) SVM: whole data.
- (C) SVM: time-compressed data.

35
**Does the classifier work for event-related (ER) designs?**

36
**Block vs. ER Designs**

Stimuli: pleasant, unpleasant and neutral pictures.

[Figure: stimulus timelines for the two designs. Block: time axis labeled 1 to 13 (TR = 3 s). Event: time axis labeled 1 to 2 (TR = 3 s).]

37
**Block Design vs. ER**

Starting from the whole data (classes: unpleasant, neutral, pleasant), three ways of defining the examples are compared. Each uses leave-one-out testing, SVD/PCA and SVM training:

- (A) Event-related (ER): average 2 volumes “within” the event.
- (B) Block as ER: average 2 volumes within the block.
- (C) Block: average all volumes within the block.

38
**Summary of SVM Results (2-class SVM)**

ER design, by approach used to define the train and test examples:

- Scans 1 and 2: accuracy for unpleasant 66%, accuracy for neutral 64%.
- Mean of scans 1 and 2: accuracy for unpleasant 72%, accuracy for neutral 65%.
- Scan 1: 63% (single value given).
- Scan 2: accuracy for unpleasant 68%, accuracy for neutral 70%.

Block design, by approach used to define the train and test examples:

- Scans 1 to 7: 80% (single value given).
- Mean of scans 1 to 7: accuracy for unpleasant 84%, accuracy for neutral 88%.

39
**Discriminating volume (SVM weight vector)**

[Figure: unpleasant (red scale) vs. neutral (blue scale). Panels: block design, ER design.]

40
**Can we make use of the temporal dimension in decoding?**

41
**Spatial Observation**

[Figure: experimental timeline with unpleasant or pleasant stimuli and fixation.]

42
**Spatial SVM**

Pipeline: pre-processing → PCA → SVM → output (accuracy and discriminating volume), applied to spatial observations.

[Figure: spatial weight vector (discriminating volume) for unpleasant vs. pleasant, shown on axial slices z = -18, -6, 6, 18, 30 and 42, with a color scale from -1.00 to 1.00.]

43
**Spatial Temporal Observation**

[Figure: one duty cycle with unpleasant or pleasant stimuli and fixation, covering the volumes vt1 to vt14.]

Each spatiotemporal observation combines the 14 volumes of the duty cycle: Vi = [v1 v2 v3 v4 v5 v6 v7 v8 v9 v10 v11 v12 v13 v14].
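Building the spatiotemporal observation amounts to concatenating the volumes of one duty cycle into a single long vector. The sketch assumes each volume is already a masked feature vector. The function name and the toy values are hypothetical.

```python
def spatiotemporal_vector(duty_cycle_volumes):
    """Concatenate the volumes v1..vT of one duty cycle into one vector Vi."""
    vector = []
    for volume in duty_cycle_volumes:
        vector.extend(volume)
    return vector


# Toy duty cycle: 14 time points, 3 voxels per volume.
# The value t * 10 + v encodes time point t and voxel index v for readability.
n_voxels = 3
duty_cycle = [[float(t * 10 + v) for v in range(n_voxels)] for t in range(1, 15)]

V = spatiotemporal_vector(duty_cycle)
print(len(V))   # 14 time points x 3 voxels
print(V[:3])    # the first volume (v1) comes first
```

Because the feature order is fixed, the learned weight vector can be cut back into 14 pieces of n_voxels values each, giving one discriminating map per time point.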

44
**Spatiotemporal SVM: Block Design**

Spatiotemporal observations: 4D data including all volumes within the duty cycle. Training example: whole duty cycle (T1 to T14).

Pipeline: pre-processing → PCA → SVM → output (accuracy and dynamic discriminating volume).

[Figure: dynamic discriminating volume for unpleasant vs. pleasant at time points T1 to T14, with a color scale from -1.00 to 1.00.]

45
**Spatial-Temporal Weight Vector: T1 to T7**

[Figure: dynamic discriminating map for unpleasant vs. pleasant at time points T1 to T7, shown on axial slices z = -18, -6, 6, 18, 30 and 42, with a color scale from -1.00 to 1.00.]

46
**Spatial-Temporal Weight Vector: T8 to T14**

[Figure: dynamic discriminating map for unpleasant vs. pleasant at time points T8 to T14, with a color scale from -1.00 to 1.00.]

47
**Dynamic Discriminating Map: T5, z = -18**

[Figure: discriminating map for unpleasant vs. pleasant at time point T5, slice z = -18, with regions labeled A, B, C and D and a color scale from -1.00 to 1.00.]

48
**Dynamic Discriminating Map: T5, z = -6**

[Figure: discriminating map for unpleasant vs. pleasant at time point T5, slice z = -6, with regions labeled A and B and a color scale from -1.00 to 1.00.]

49
**Dynamic Discriminating Map: T5, z = -6 (continued)**

[Figure: discriminating map for unpleasant vs. pleasant at time point T5, slice z = -6, with regions labeled C, D, E and F and a color scale from -1.00 to 1.00.]

50
**The Brain Image Analysis Unit (BIAU)**
