
Analysis of fMRI data using Support Vector Machine (SVM) Janaina Mourao-Miranda

Recently, pattern recognition methods have been used to analyze fMRI data with the goal of decoding the information represented in the subject's brain at a particular time.

References:
Carlson, T.A., Schrater, P., He, S. (2003). Patterns of activity in the categorical representations of objects. J Cogn Neurosci.
Cox, D.D., Savoy, R.L. (2003). Functional magnetic resonance imaging (fMRI) "brain reading": detecting and classifying distributed patterns of fMRI activity in human visual cortex. NeuroImage.
Mourão-Miranda, J., Bokde, A.L.W., Born, C., Hampel, H., Stetter, M. (2005). Classifying brain states and determining the discriminating activation patterns: support vector machine on functional MRI data. NeuroImage.
Davatzikos, C., Ruparel, K., Fan, Y., Shen, D.G., Acharyya, M., Loughead, J.W., Gur, R.C., Langleben, D.D. (2005). Classifying spatial patterns of brain activity with machine learning methods: application to lie detection. NeuroImage.
Haynes, J.D., Rees, G. (2005). Predicting the orientation of invisible stimuli from activity in human primary visual cortex. Nature Neuroscience 8:686-91.
Kriegeskorte, N., Goebel, R., Bandettini, P. (2006). Information-based functional brain mapping. PNAS.
LaConte, S., Strother, S., Cherkassky, V., Anderson, J., Hu, X. (2005). Support vector machines for temporal classification of block design fMRI data. NeuroImage.
Mitchell, T.M., Hutchinson, R., Niculescu, R.S., Pereira, F., Wang, X., Just, M., Newman, S. (2004). Learning to decode cognitive states from brain images. Machine Learning.
Mourão-Miranda, J., Reynaud, E., McGlone, F., Calvert, G., Brammer, M. (2006). The impact of temporal compression and space selection on SVM analysis of single-subject and multi-subject fMRI data. NeuroImage (accepted).
Norman, K.A., Polyn, S.M., Detre, G.J., Haxby, J.V. (2006). Beyond mind-reading: multivoxel pattern analysis of fMRI data. Trends in Cognitive Sciences.
Haynes, J.D., Rees, G. (2006). Decoding mental states from brain activity in humans. Nature Reviews Neuroscience.

Pattern recognition is a field within machine learning.

Supervised learning: given inputs X1, X2, X3, ... and outputs y1, y2, y3, ..., with no mathematical model available, the task is learned from examples.

Learning methodology: automatic procedures that learn a task from a series of examples.

Learning/training: given training examples (X1, y1), (X2, y2), ..., (Xn, yn), generate a function or hypothesis f such that f(Xi) = yi.

Test/prediction: for a new test example Xi, predict the output as f(Xi) -> yi.
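A minimal sketch of this train/predict cycle in Python, using scikit-learn (a toolbox choice of mine; the slides do not name one, and all data here are synthetic placeholders):

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical training examples (X1, y1), ..., (Xn, yn)
rng = np.random.default_rng(0)
X_train = rng.normal(size=(20, 5))             # n = 20 examples, 5 features
y_train = np.where(X_train[:, 0] > 0, 1, -1)   # labels in {-1, +1}

f = SVC(kernel="linear")   # the function/hypothesis f to be learned
f.fit(X_train, y_train)    # training: find f such that f(Xi) = yi

X_test = rng.normal(size=(1, 5))   # an unseen test example
print(f.predict(X_test))           # prediction f(X_test) -> y
```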

Machine learning methods: artificial neural networks, decision trees, Bayesian networks, support vector machines, ... The SVM is a classifier derived from statistical learning theory by Vapnik and Chervonenkis; SVMs were introduced by Boser, Guyon and Vapnik at COLT-92. They are a powerful tool for statistical pattern recognition.

SVM applications: face/object recognition, handwritten digit recognition, texture classification, classification of microarray gene expression data, protein structure prediction, fMRI data analysis.

fMRI Data Analysis

Classical approach: mass-univariate analysis (e.g. the GLM). Input: the voxel time series (BOLD signal intensity over time) and the experimental design. Output: a statistical parametric map of the regions activated by task 1 vs. task 2. This is a voxel-based (univariate) analysis; it does not take into account the spatial correlation of the data.

Pattern recognition approach: multivariate analysis. Each fMRI volume is treated as a spatial pattern, and the method learns to map these patterns to the instantaneous brain state. SVM training: input volumes from task 1 and volumes from task 2; output: a map of the regions discriminating between task 1 and task 2. SVM test: a new example receives the prediction task 1 or task 2.

fMRI data as input to a classifier: each fMRI volume is treated as a vector in an extremely high dimensional space (~200,000 voxels, i.e. dimensions, after masking). Each volume becomes a feature vector whose dimension is the number of voxels.
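A sketch of this vectorization, assuming the data are already loaded as a 4D numpy array (x, y, z, time); the array contents and mask rule below are illustrative stand-ins, not real data:

```python
import numpy as np

volumes = np.random.rand(53, 63, 46, 84)    # hypothetical 4D fMRI dataset
brain_mask = volumes.mean(axis=3) > 0.2     # stand-in for a real brain mask

# One feature vector per volume: dimension = number of in-mask voxels
X = volumes[brain_mask].T                   # shape: (n_volumes, n_voxels)
print(X.shape)
```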

Binary classification can be viewed as the task of finding a hyperplane. If we imagine a brain with only 2 voxels, we can visualize the classification problem in 2D space: the volumes acquired during task 1 and task 2 (at times t1 to t4) form two clouds of points, and a volume from a new subject must be assigned to one of them. The binary classification problem is then the task of finding a hyperplane that separates the two conditions; the hyperplane is described by a weight vector w and an offset b. Two different approaches to finding a separating hyperplane are considered here: the Fisher linear discriminant (FLD) and the SVM.

Simplest approach: Fisher linear discriminant (FLD). The FLD classifies by projecting the training set onto the axis defined by the difference between the centers of mass of the two classes (tasks), corrected for the within-class covariance. The illustration shows why this correction matters: the ellipses represent the distributions of training examples from both classes. Before correcting for the within-class covariance, the projections of the two classes overlap heavily; after the correction there is no more overlap and the data are classified correctly.
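A minimal FLD sketch using scikit-learn's LinearDiscriminantAnalysis, which I am assuming here as an equivalent of the FLD described (the correlated-noise covariance below is my construction to mirror the ellipses in the slide):

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(1)
cov = [[1.0, 0.9], [0.9, 1.0]]                      # strong within-class covariance
X1 = rng.multivariate_normal([0, 0], cov, 50)       # class 1 training examples
X2 = rng.multivariate_normal([1, 1], cov, 50)       # class 2 training examples
X = np.vstack([X1, X2])
y = np.array([0] * 50 + [1] * 50)

fld = LinearDiscriminantAnalysis()   # projects onto the mean-difference axis,
fld.fit(X, y)                        # corrected for within-class covariance
print(fld.coef_)                     # the learned projection/weight vector
```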

Optimal hyperplane: which of the linear separators is optimal?

Data: <Xi, yi>, i = 1, ..., N; observations Xi ∈ R^d; labels yi ∈ {-1, +1}. All hyperplanes in R^d are parameterized by a vector w and a constant b, and can be expressed as w·X + b = 0. The aim is to find a hyperplane/decision function f(X) = sign(w·X + b) that correctly classifies the data, e.g. f(X1) = +1 and f(X2) = -1. Many possible hyperplanes separate the data, but a classifier that does very well on the training data might not generalize well to unseen examples. The SVM is a "large margin classifier": among the many possible solutions it selects the most robust one.

Optimal hyperplane: largest margin classifier. Among all hyperplanes separating the data there is a unique optimal hyperplane: the one with the largest margin (the distance of the closest points to the hyperplane). Suppose all test points are generated by adding bounded noise r to the training examples (test and training data are assumed to have been generated by the same underlying dependence). If the optimal hyperplane has margin γ > r, it will correctly separate the test points.

Support Vector Machine (SVM)

Data: <Xi, yi>, i = 1, ..., N; observations Xi ∈ R^d; labels yi ∈ {-1, +1}.

The distance between the separating hyperplane and a sample Xi is d = |w·Xi + b| / ||w||. Assuming a margin γ exists, all training patterns obey the inequality yi·d ≥ γ, i = 1, ..., n. Substituting d into this inequality gives yi(w·Xi + b) / ||w|| ≥ γ. Thus maximizing the margin γ is equivalent to minimizing the norm of w. To limit the number of solutions, the scale of the product is fixed: γ·||w|| = 1.

Finding the optimal hyperplane is a quadratic optimization problem with linear constraints and can be formally stated as: determine w and b that minimize the functional Φ(w) = ||w||²/2 subject to the constraints yi(w·Xi + b) ≥ 1, i = 1, ..., n.

The solution has the form w = Σ αi yi Xi and b = yi - w·Xi for any Xi such that αi > 0. The examples Xi for which αi > 0 are called the support vectors. Among all hyperplanes separating the data there is a unique optimal hyperplane: the one with the largest margin.
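A sketch recovering w, b, and the support vectors from a trained linear SVM; scikit-learn and the synthetic separable data are my assumptions, and the large C approximates the hard-margin formulation above:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(2)
X = rng.normal(size=(40, 2))
y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)      # linearly separable labels

clf = SVC(kernel="linear", C=1e3).fit(X, y)     # large C ~ hard margin
w = clf.coef_[0]                                # w = sum_i alpha_i * y_i * Xi
b = clf.intercept_[0]
print("number of support vectors (alpha_i > 0):", len(clf.support_vectors_))
# The decision function is sign(w.X + b):
print(np.sign(X @ w + b)[:5], clf.predict(X)[:5])
```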

Weight vector (discriminating volume). How should the learned weight vector (the discriminating volume) be interpreted? In the 2-voxel example, the hyperplane H separating task 1 from task 2 has weight vector w = [0.45 0.89]. The value of each voxel in the discriminating volume indicates the importance of that voxel in differentiating between the two classes or brain states.

Advantage of using multivariate methods: consider two voxels where voxel 1 shows no mean difference between conditions but voxel 2 does. A univariate analysis only detects activation in voxel 2; the SVM (a multivariate analysis) assigns a weight to both voxels.
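A toy numerical illustration of this point (my construction, not from the slides): voxel 1 has identical class means but shares noise with voxel 2, so it carries information jointly even though a t-test finds nothing:

```python
import numpy as np
from scipy import stats
from sklearn.svm import SVC

rng = np.random.default_rng(3)
n = 100
noise = rng.normal(size=n)
y = np.repeat([0, 1], n // 2)
voxel1 = noise                    # no mean difference between conditions
voxel2 = y + 0.5 * noise          # mean difference plus the shared noise
X = np.column_stack([voxel1, voxel2])

# Univariate analysis: voxel 1 is not significant
print(stats.ttest_ind(voxel1[y == 0], voxel1[y == 1]))
# Multivariate analysis: the SVM weights voxel 1 as well (to cancel the noise)
w = SVC(kernel="linear").fit(X, y).coef_[0]
print("SVM weights both voxels:", w)
```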

Pattern recognition method: general procedure. 1. Pre-processing: normalization, realignment, smoothing. 2. Split the data into training and test sets. 3. Dimensionality reduction (e.g. PCA) and/or feature selection (e.g. ROIs). 4. SVM training and test. 5. Output: accuracy and discriminating maps (the SVM weight vector). A sketch of this pipeline follows below.
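A hedged end-to-end sketch of the procedure above; all array shapes and names are placeholders for real pre-processed data, and scikit-learn is an assumed toolbox:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

X = np.random.rand(84, 20000)           # n_volumes x n_voxels (pre-processed)
y = np.random.randint(0, 2, size=84)    # task labels

X_train, X_test = X[:60], X[60:]        # split data: training and test
y_train, y_test = y[:60], y[60:]

pca = PCA(n_components=30).fit(X_train)              # dimensionality reduction
clf = SVC(kernel="linear").fit(pca.transform(X_train), y_train)

print("accuracy:", accuracy_score(y_test, clf.predict(pca.transform(X_test))))
# Discriminating map: back-project the SVM weights into voxel space
w_voxels = clf.coef_ @ pca.components_               # shape: (1, n_voxels)
```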

Applications

Can we classify brain states using the whole brain information from different subjects?

Training subjects: fMRI scans of brains looking at a pleasant stimulus and brains looking at an unpleasant stimulus are used to train the machine learning method (a support vector machine). Test subject: given a scan from a new subject, the classifier predicts the brain state, e.g. "the subject was viewing a pleasant stimulus".

Application I. Number of subjects: 16. Tasks: viewing unpleasant and pleasant pictures (6 blocks of 7 scans). Pre-processing procedures: motion correction, normalization to standard space, spatial filtering; a mask was used to select voxels inside the brain. Leave-one-out test: training on 15 subjects, test on 1 subject; this procedure was repeated 16 times and the results (error rates) were averaged. A cross-validation sketch follows below.
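A sketch of the leave-one-subject-out test described above, using scikit-learn's LeaveOneGroupOut (an assumed implementation; the sizes and random data are illustrative):

```python
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut
from sklearn.svm import SVC

n_subjects, scans_per_subject, n_voxels = 16, 84, 500   # illustrative sizes
X = np.random.rand(n_subjects * scans_per_subject, n_voxels)
y = np.random.randint(0, 2, size=len(X))                # pleasant/unpleasant
subject = np.repeat(np.arange(n_subjects), scans_per_subject)

scores = []
for train_idx, test_idx in LeaveOneGroupOut().split(X, y, groups=subject):
    clf = SVC(kernel="linear").fit(X[train_idx], y[train_idx])  # 15 subjects
    scores.append(clf.score(X[test_idx], y[test_idx]))          # 1 left out
print("mean accuracy over 16 folds:", np.mean(scores))
```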

Pipeline: spatial observations (unpleasant vs. pleasant) -> pre-processing -> PCA -> SVM. Output: accuracy and the discriminating volume (the spatial weight vector, displayed on axial slices z = -18 to z = 42; red scale = unpleasant, blue scale = pleasant, weights from -1.00 to 1.00).

Can we classify groups using the whole brain information from different subjects?

Pattern classification of brain activity in depression (collaboration with Cynthia H.Y. Fu): true positive rate = 74%, true negative rate = 63%.

Can we improve the accuracy by averaging time points?

First approach: use single volumes as training examples. Second approach: use the average of the volumes within each block as training examples (one example per block). Third approach: use block-specific estimators as training examples. (A sketch of the block-averaging approach follows below.)
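A sketch of "temporal compression I", averaging the volumes within each block to obtain one training example per block (the shapes below are assumptions matching the 6-blocks-of-7-scans design):

```python
import numpy as np

n_blocks, scans_per_block, n_voxels = 6, 7, 20000
X = np.random.rand(n_blocks * scans_per_block, n_voxels)   # single volumes

# One averaged example per block
X_blocks = X.reshape(n_blocks, scans_per_block, n_voxels).mean(axis=1)
print(X_blocks.shape)   # (6, 20000): one training example per block
```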

Multi-subject classifier: impact of temporal compression on SVM accuracy (conditions: unpleasant, neutral, pleasant).
(A) No temporal compression (whole data): split data into training and test -> SVD/PCA -> SVM training and test.
(B) Temporal compression I: average the volumes within each block -> split data into training and test -> SVD/PCA -> SVM training and test.
(C) Temporal compression II: GLM analysis yielding block-specific estimators -> split data into training and test -> SVD/PCA -> SVM training and test.

Can we improve the accuracy by using ROIs?

Fourth approach: space restriction (training with the ROIs selected by the GLM). Fifth approach: space restriction (training with the ROIs selected by the SVM). (A sketch of SVM-based voxel selection follows below.)
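A sketch of space selection by the SVM: keep the voxels with the largest absolute discriminating weights and retrain on that restricted set (the threshold of 1,000 voxels and the random data are illustrative assumptions; in practice the selection must be done within the training set only, to avoid circularity):

```python
import numpy as np
from sklearn.svm import SVC

X = np.random.rand(60, 20000)            # training volumes only
y = np.random.randint(0, 2, size=60)

w = SVC(kernel="linear").fit(X, y).coef_[0]
roi = np.argsort(np.abs(w))[-1000:]      # top 1,000 discriminating voxels
clf_roi = SVC(kernel="linear").fit(X[:, roi], y)   # training restricted to ROI
```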

Multi-subject classifier: impact of spatial selection on SVM accuracy (conditions: unpleasant, neutral, pleasant).
(A) Whole brain: split data into training and test -> SVD/PCA -> SVM training and test.
(B) Space selection by the GLM: GLM analysis -> select activated voxels based on the GLM -> split data into training and test -> SVD/PCA -> SVM training and test.
(C) Space selection by the SVM: SVM analysis -> select discriminating voxels based on the SVM -> split data into training and test -> SVD/PCA -> SVM training and test.

Single-subject classifier: impact of temporal compression and spatial selection on SVM accuracy (conditions: unpleasant, neutral, pleasant).
(A) Whole brain: split data into training and test -> SVD/PCA -> SVM training and test.
(B) Temporal compression: average the volumes within each block -> split data -> SVD/PCA -> SVM training and test.
(C) Space selection by the GLM: GLM analysis -> select activated voxels based on the GLM -> split data -> SVD/PCA -> SVM training and test.
(D) Temporal compression + space selection: GLM analysis -> select activated voxels based on the GLM -> average the volumes within each block -> split data -> SVD/PCA -> SVM training and test.

Summary

Multi-subject classifier:
Method                                          Mean accuracy
Whole brain (no temporal compression)           62.00%
Temporal compression I (average)                83.68%
Temporal compression II (betas)                 80.55%
Space selection I (GLM)                         57.59%
Space selection II (SVM)                        62.85%
Space selection I + temporal compression I      79.86%
Space selection II + temporal compression I     82.99%

Single-subject classifier:
Method                                          Mean accuracy
Whole brain                                     69.34%
Temporal compression I                          75.00%
Space selection I                               81.84%
Space selection I + temporal compression I      85.41%

How similar are the results of the GLM and the SVM?

Contrast between viewing unpleasant (red scale) and neutral (blue scale) pictures. (A) Standard GLM analysis: SPM{t}, random effects, p < 0.001 (uncorrected). (B) SVM on the whole data. (C) SVM on the time-compressed data.

Contrast between viewing unpleasant (red scale) and pleasant (blue scale) pictures. (A) Standard GLM analysis: SPM{t}, random effects, p < 0.001 (uncorrected). (B) SVM on the whole data. (C) SVM on the time-compressed data.

Contrast between viewing neutral (red scale) and pleasant (blue scale) pictures. (A) Standard GLM analysis: SPM{t}, random effects, p < 0.001 (uncorrected). (B) SVM on the whole data. (C) SVM on the time-compressed data.

Does the classifier work for event-related (ER) designs?

Block vs. ER designs. Stimuli: pleasant, unpleasant and neutral pictures (TR = 3 s). In the block design, stimulation extends over a long run of consecutive scans (timeline shown over 13 scans); in the event-related design, each event spans about 2 scans.

Block design vs. ER, whole data (conditions: unpleasant, neutral, pleasant).
(A) Event-related (ER): average the 2 volumes "within" each event -> leave-one-out -> SVD/PCA -> SVM training.
(B) Block as ER: average 2 volumes within each block -> leave-one-out -> SVD/PCA -> SVM training.
(C) Block: average all volumes within each block -> leave-one-out -> SVD/PCA -> SVM training.

Summary

SVM results, ER design (2-class SVM):
Approach used to define training/test examples    Accuracy (unpleasant)    Accuracy (neutral)
Scans 1 and 2                                     66%                      64%
Mean of scans 1 and 2                             72%                      65%
Scan 1                                            63%                      -
Scan 2                                            68%                      70%

SVM results, block design (2-class SVM):
Approach used to define training/test examples    Accuracy (unpleasant)    Accuracy (neutral)
Scans 1 to 7                                      80%                      -
Mean of scans 1 to 7                              84%                      88%

Discriminating volume (SVM weight vector): unpleasant (red scale) vs. neutral (blue scale), for the block design and the ER design.

Can we make use of the temporal dimension in decoding?

Spatial observation: a single volume acquired while the subject views an unpleasant or pleasant stimulus, alternating with fixation.

Spatial SVM: spatial observations -> pre-processing -> PCA -> SVM. Output: accuracy and the discriminating volume (spatial weight vector, axial slices z = -18 to z = 42; red scale = unpleasant, blue scale = pleasant).

Spatiotemporal observation: one duty cycle consists of an unpleasant or pleasant stimulus followed by fixation, spanning 14 volumes (vt1 to vt14). Each observation concatenates all volumes within the cycle: Vi = [v1 v2 v3 v4 v5 v6 v7 v8 v9 v10 v11 v12 v13 v14]. (A sketch of this concatenation follows below.)
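A sketch of building the spatiotemporal observation by concatenating the 14 volumes of one duty cycle into a single feature vector (the number of cycles and the voxel count are assumed sizes):

```python
import numpy as np

n_cycles, volumes_per_cycle, n_voxels = 12, 14, 20000   # assumed sizes
X = np.random.rand(n_cycles * volumes_per_cycle, n_voxels)

# One spatiotemporal example per duty cycle: Vi = [v1 ... v14],
# dimension = 14 * n_voxels
X_st = X.reshape(n_cycles, volumes_per_cycle * n_voxels)
print(X_st.shape)   # (12, 280000)
```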

Spatiotemporal SVM, block design: spatiotemporal observations (4D data including all volumes within the duty cycle, T1 to T14; the whole duty cycle is one training example) -> pre-processing -> PCA -> SVM. Output: accuracy and a dynamic discriminating volume (red scale = unpleasant, blue scale = pleasant).

Spatiotemporal weight vector, dynamic discriminating map for time points T1 to T7 (red scale = unpleasant, blue scale = pleasant; axial slices z = -18 to z = 42).

Spatiotemporal weight vector, dynamic discriminating map for time points T8 to T14 (red scale = unpleasant, blue scale = pleasant).

Dynamic discriminating map at T5, slice z = -18: panels A to D (red scale = unpleasant, blue scale = pleasant).

Dynamic discriminating map at T5, slice z = -6: panels A and B.

Dynamic discriminating map at T5, slice z = -6: panels C to F.

The Brain Image Analysis Unit (BIAU)