Presentation on theme: "Mind Reading with fMRI Ken Norman Department of Psychology Princeton University May 1, 2007."— Presentation transcript:
Mind Reading with fMRI Ken Norman Department of Psychology Princeton University May 1, 2007
Brain Scanning Today's topic: Applying pattern classifiers to brain scanning data, to decode the information represented in a person's brain at a particular point in time. This is NOT the standard approach. Standard approach: Stick someone in the scanner; have them perform a cognitive task; explore which brain regions are engaged by the cognitive task
Brain Scanning If you're interested in memory retrieval: Scan people while they're retrieving memories; scan people during a control condition; look at which brain regions respond differentially. This approach has been very productive for cognitive neuroscience
Brain Scanning Alternative approach to analyzing brain scanning data: Use pattern classification algorithms, applied to distributed patterns of neural activity, to identify the neural signatures of particular thoughts and memories Once we have trained the classifier to recognize a particular thought, we can use the classifier to track the comings and goings of those thoughts over time
Motivation Why pattern classification? Reason #1: Improve the interface between fMRI and cognitive theories. Cognitive neuroscientists have developed very detailed theories of how information is processed in the brain: What information is represented in different brain structures? How is it represented? How is that information transformed at different stages of processing? To directly test these theories, we need a way of decoding the informational contents of the subject's brain state
Motivation Reason #2: We aren't doing as good a job of data mining fMRI data as we could... We collect several GB of information from each subject. There is a lot of information about subjects' thoughts buried in these big data files; the challenge is how to extract this information. Machine learning researchers have developed tremendously powerful algorithms for extracting meaningful regularities from large data sets. These algorithms are not routinely used in fMRI data analysis…
Outline 3 minute overview of functional MRI Brief overview of existing research on fMRI pattern classification Technical challenges & machine learning issues
Brain Scanning 101
How do we image neural activity with functional MRI? Brain regions that are active use up more metabolic resources In particular, they use up more oxygen from the blood The MRI machine can be tuned to detect the difference between oxygenated and deoxygenated blood By looking at which brain areas have deoxygenated vs. oxygenated blood, we can get a sense of which brain areas are active at a particular moment
Brain Scanning 101 It takes approx. 2 seconds for the MRI machine to take a snapshot of blood flow (across the entire brain)
fMRI images Big cube, made out of a grid of little cubes – Pixel = one square in a 2D grid (picture element) – Voxel = one of the tiny little cubes in an fMRI image (like a volumetric pixel). Voxels are approx. 3 millimeters on each side. Neuron size ~ 10 micrometers. Each voxel reflects the aggregate activity of a very large number of neurons. We aren't directly measuring neural activity; we are measuring blood flow! Blood flow response is smeared out in time (peak response = ~6 sec after neural activity)
Patterns in the brain Key idea: Cognitive states correspond to distributed patterns of brain activity What do these patterns in the brain look like?
The Eight Categories Study (Haxby et al. 2001) Faces Cats Scissors Chairs Houses Bottles Shoes Scrambled Pictures slides courtesy of Jim Haxby
Accuracy of Category Identification [Figure: identification accuracy ± SE for each category, with chance level marked] Overall Accuracy = 96% slides courtesy of Jim Haxby
Our Studies We set out to extend the basic pattern classification method. The brain patterns from the Haxby study correspond to several minutes' worth of brain activity. We wanted to see if we could classify cognitive states based on single brain images (reflecting ~2 seconds' worth of neural activity)
Pattern Classification Method General approach: Say that we want to be able to track the presence of two different cognitive states in the subject's brain (e.g., viewing shoes vs. bottles) using fMRI
Pattern Classification Method 1. Acquire brain data while the subject is thinking about shoes or bottles
Pattern Classification Method 1. Acquire brain data 2. Convert each functional brain volume (~2 seconds' worth of data) into a vector that reflects the pattern of activity across voxels at that point in time. We typically do some kind of feature selection to cut down on the number of voxels
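Step 2 can be sketched roughly as follows. This is a minimal illustration, not the lab's actual pipeline: the array names `bold` and `mask` are hypothetical, the data are random, and the per-voxel z-scoring is an added assumption that the slides do not mention.

```python
import numpy as np

def volumes_to_patterns(bold, mask):
    """Flatten a 4D scan (x, y, z, time) into a (time, voxels) matrix.

    `bold` and `mask` are hypothetical inputs; `mask` is a boolean 3D array
    marking the voxels that survived feature selection.
    """
    # Boolean indexing over the three spatial axes gives (n_voxels, n_time).
    patterns = bold[mask].T
    # Assumption: z-score each voxel over time so voxels share a common scale.
    mean = patterns.mean(axis=0)
    std = patterns.std(axis=0)
    std[std == 0] = 1.0  # avoid dividing by zero for flat voxels
    return (patterns - mean) / std

rng = np.random.default_rng(0)
bold = rng.normal(size=(4, 4, 3, 10))    # toy scan: 48 voxels, 10 volumes
mask = np.zeros((4, 4, 3), dtype=bool)
mask[1:3, 1:3, :] = True                 # keep 12 voxels
patterns = volumes_to_patterns(bold, mask)
```

Each row of `patterns` is then one brain pattern (one ~2-second volume) that can be handed to a classifier.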
Pattern Classification Method 1. Acquire brain data 2. Generate brain patterns 3. Label brain patterns according to whether the subject was viewing shoes vs. bottles (adjusting for lag in the blood flow response)
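One simple way to adjust for the lag is to delay the stimulus labels by a whole number of volumes. The sketch below assumes the ~6-second peak lag and ~2-second volume time quoted earlier; the function name, the `rest` code of -1, and the toy label sequence are illustrative assumptions. A fixed shift is only one option; convolving labels with a hemodynamic response model is another.

```python
import numpy as np

def shift_labels(labels, lag_seconds=6.0, tr_seconds=2.0, rest=-1):
    """Delay stimulus labels by the blood flow lag, in whole volumes."""
    labels = np.asarray(labels)
    shift = int(round(lag_seconds / tr_seconds))
    # Volumes at the start have no stimulus-driven response yet: mark as rest.
    shifted = np.full_like(labels, rest)
    if shift < len(labels):
        shifted[shift:] = labels[:len(labels) - shift]
    return shifted

# Toy run: 4 volumes of class 0, 4 volumes of class 1, then 4 volumes of rest.
labels = np.array([0, 0, 0, 0, 1, 1, 1, 1, -1, -1, -1, -1])
shifted = shift_labels(labels)
```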
Pattern Classification Method 1. Acquire brain data 2. Generate brain patterns 3. Label brain patterns 4. Train a classifier to discriminate between bottle patterns and shoe patterns
Simple Neural Network Classifier (Logistic Regression) To estimate how much subjects are thinking about bottles, compute a weighted sum of voxel activity values; do the same for shoes. Apply decision rule (e.g., sigmoid function). To train the classifier, we use a learning algorithm that sets the weights to maximize decision performance (e.g., backpropagation) [Figure: input layer (voxels) connected to an output layer with Bottle and Shoe units]
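A minimal sketch of such a classifier, assuming a single sigmoid output unit (bottle vs. shoe) instead of one unit per category, plain gradient descent, and a small L2 penalty. The data are synthetic, and the learning rate, step count, and penalty strength are arbitrary illustrative choices.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_logistic(X, y, lr=0.1, n_steps=500, l2=0.01):
    """Fit voxel weights by gradient descent on the logistic loss."""
    n_samples, n_voxels = X.shape
    w = np.zeros(n_voxels)
    b = 0.0
    for _ in range(n_steps):
        p = sigmoid(X @ w + b)                       # weighted sum, then sigmoid
        grad_w = X.T @ (p - y) / n_samples + l2 * w  # loss gradient plus L2 term
        grad_b = np.mean(p - y)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def predict_proba(X, w, b):
    return sigmoid(X @ w + b)

# Synthetic patterns: 20 voxels, of which the first 5 carry class signal.
rng = np.random.default_rng(0)
n_per_class, n_voxels = 40, 20
signal = np.zeros(n_voxels)
signal[:5] = 1.0
X = np.vstack([rng.normal(size=(n_per_class, n_voxels)) + signal,
               rng.normal(size=(n_per_class, n_voxels)) - signal])
y = np.concatenate([np.ones(n_per_class), np.zeros(n_per_class)])
w, b = train_logistic(X, y)
train_accuracy = np.mean((predict_proba(X, w, b) > 0.5) == y)
```

Training accuracy by itself says little; the informative number is accuracy on patterns the classifier has not seen.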
Pattern Classification Method 1. Acquire brain data 2. Generate brain patterns 3. Label brain patterns 4. Train the classifier 5. Apply the trained classifier to new brain patterns (not presented at training).
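Testing on patterns not presented at training is often done by holding out one scanning run at a time. The sketch below assumes the data are divided into runs (the slides do not say how held-out patterns were chosen) and uses a nearest-class-mean classifier as a stand-in for whichever classifier is trained; all data are synthetic.

```python
import numpy as np

def leave_one_run_out(X, y, runs):
    """Train on all runs but one, test on the held-out run, and repeat."""
    accuracies = []
    for held_out in np.unique(runs):
        train = runs != held_out
        test = ~train
        classes = np.unique(y[train])
        # Stand-in classifier: assign each test pattern to the nearest class mean.
        centroids = np.stack([X[train & (y == c)].mean(axis=0) for c in classes])
        dists = ((X[test][:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        predicted = classes[dists.argmin(axis=1)]
        accuracies.append(np.mean(predicted == y[test]))
    return np.array(accuracies)

# Synthetic data: 4 runs of 20 patterns, 30 voxels, 6 of them informative.
rng = np.random.default_rng(1)
n_runs, n_per_run, n_voxels = 4, 20, 30
runs = np.repeat(np.arange(n_runs), n_per_run)
y = np.tile(np.repeat([0, 1], n_per_run // 2), n_runs)
signal = np.zeros(n_voxels)
signal[:6] = 1.0
X = rng.normal(size=(len(y), n_voxels)) + np.outer(2 * y - 1, signal)
acc = leave_one_run_out(X, y, runs)
```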
Free Recall & Mental Time Travel (Polyn et al., 2005) How do we selectively retrieve memories from a particular event? Intuitively: We try to recapture our mindset from that event Concretely: We try to make our brain state during recall resemble our brain state during the original event Mental Time Travel Goal of the study: Use fMRI pattern-analysis to image this process of mental time travel as it happens...
Imaging Mental Time Travel (Polyn et al., 2005) Memory experiment: Subjects study 3 types of stimuli: faces (e.g., Jack Nicholson), locations (e.g., Giza pyramids), and objects (e.g., flask). Recall test: Recall items from all 3 categories, in any order. Hypothesis: To recall a particular category, subjects try to recapture their mindset from the study phase. In concrete terms: Subjects try to make their brain state at test resemble their brain state when they were studying that category. If subjects successfully recapture their brain state from the study phase, this will trigger recall of specific studied items...
Analysis strategy Step 1: Feed fMRI data from the study phase into a pattern classification algorithm Train the pattern classifier to recognize the brain patterns associated with studying faces vs. locations vs. objects
Neural network classifier Mapping from voxel activity values to output units (one per category)
Analysis strategy Step 2: Apply the trained classifier to brain data from the retrieval phase. Use the classifier to track, second-by-second, how well the subject's brain state at retrieval matches their brain state when they were studying faces vs. locations vs. objects
Predictions As subjects try to recall faces, locations, and objects, their brain state should come into alignment with the brain states associated with studying faces, locations, and objects. This neural measure of category-specific mental reinstatement should be predictive of recall
Final free recall - classifier output [Figure: classifier traces for Subject 9 during final free recall, showing match to face study context, match to location study context, and match to object study context]
Other findings Kamitani & Tong (2005): decode the orientation of a striped pattern that is being viewed by the subject (accurate to within 20 degrees)
2006 Pittsburgh competition Subjects were scanned while they watched 3 episodes of Home Improvement. Time-varying ratings obtained for amusement, food, tools, faces... Goal: predict ratings using brain data. Train a classifier using brain data + ratings from 2 episodes. Then, feed the trained classifier the brain data from the 3rd episode and use the classifier to predict (in a second-by-second fashion) the subject's feature ratings
Interim Summary By applying classifiers to fMRI data, we can derive a time-varying estimate of the subject's cognitive state that relates in a meaningful way to their behavior. Technical challenges
Technical Challenges From the perspective of machine learning, fMRI classification is a particularly difficult problem (Mitchell et al., 2004, Machine Learning) Big patterns Noisy patterns Relatively few patterns What can we do to improve classification?
Classifiers We have tried lots of classifiers – Neural network, correlation-based classifiers, support vector machines, Gaussian Naive Bayes, boosting, k-nearest-neighbor, linear discriminant analysis... The exact classifier that we use doesn't seem to matter (much); nonlinear classifiers do not systematically outperform linear classifiers... Regularization helps (e.g., ridge regression outperforms ordinary, unregularized regression)
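The regularization point can be illustrated with closed-form ridge regression on synthetic data that mimics the fMRI regime (few, noisy patterns relative to the number of voxels). The sizes, noise level, and penalty `lam=20.0` below are arbitrary assumptions for illustration, not values from the talk.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge weights; lam=0 gives ordinary least squares."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

# Synthetic regime: 40 noisy training patterns, 30 voxels, weak true signal.
rng = np.random.default_rng(2)
n_train, n_test, n_voxels = 40, 200, 30
true_w = np.zeros(n_voxels)
true_w[:5] = 0.5
X_train = rng.normal(size=(n_train, n_voxels))
X_test = rng.normal(size=(n_test, n_voxels))
y_train = X_train @ true_w + rng.normal(scale=2.0, size=n_train)
y_test = X_test @ true_w + rng.normal(scale=2.0, size=n_test)

w_ols = ridge_fit(X_train, y_train, lam=0.0)
w_ridge = ridge_fit(X_train, y_train, lam=20.0)
mse_ols = np.mean((X_test @ w_ols - y_test) ** 2)      # held-out error, no penalty
mse_ridge = np.mean((X_test @ w_ridge - y_test) ** 2)  # held-out error, with penalty
```

The penalty shrinks the weights toward zero, which trades a little bias for a large reduction in variance when patterns are scarce.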
Feature Selection Getting rid of noisy voxels greatly helps performance. Standard method: Run a voxel-wise omnibus ANOVA on the conditions of interest (e.g., face vs. location vs. object). Get rid of voxels that don't vary significantly across conditions
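A minimal sketch of voxel-wise ANOVA selection follows. To stay NumPy-only it keeps the `n_keep` voxels with the largest F statistic instead of thresholding on a p-value as the slide describes; that substitution, the function names, and the synthetic data are all assumptions.

```python
import numpy as np

def anova_f(X, y):
    """One-way ANOVA F statistic for every voxel (column of X)."""
    classes = np.unique(y)
    n, k = X.shape[0], len(classes)
    grand_mean = X.mean(axis=0)
    ss_between = np.zeros(X.shape[1])
    ss_within = np.zeros(X.shape[1])
    for c in classes:
        Xc = X[y == c]
        ss_between += len(Xc) * (Xc.mean(axis=0) - grand_mean) ** 2
        ss_within += ((Xc - Xc.mean(axis=0)) ** 2).sum(axis=0)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

def select_top_voxels(X, y, n_keep):
    """Indices of the n_keep voxels with the largest F, in ascending order."""
    f = anova_f(X, y)
    return np.sort(np.argsort(f)[::-1][:n_keep])

# Synthetic data: 3 conditions, 50 voxels, only the first 5 vary by condition.
rng = np.random.default_rng(4)
y = np.repeat([0, 1, 2], 20)
X = rng.normal(size=(60, 50))
effects = np.array([0.0, 2.0, -2.0])
X[:, :5] += effects[y][:, None]
selected = select_top_voxels(X, y, n_keep=5)
```

Selection should use training patterns only; choosing voxels with the test patterns included inflates the measured accuracy.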
Feature Selection This ANOVA method helps, but it has several problems. Main benefit of linear classifiers is that they can aggregate weak signals across voxels. In light of this, it seems like a bad idea to discard individual voxels just because the voxel's signal is weak...
Feature Selection What we really want to do is to come up with multivariate means of voxel selection: we want to select sets of voxels that, in aggregate, carry useful information. Promising approach: Searchlights (Kriegeskorte et al., 2006, PNAS)
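The searchlight idea can be sketched as scoring each voxel by how well its spherical neighborhood, taken as a set, separates the conditions. The version below is a simplified stand-in, not the published method: it assumes two classes, a radius of 1.5 voxels, a split-half nearest-class-mean test as the multivariate score, and synthetic data.

```python
import numpy as np

def searchlight_scores(X, coords, y, radius=1.5):
    """Score each voxel by the decoding accuracy of its spherical neighborhood."""
    half = np.arange(len(y)) % 2 == 0      # even samples train, odd samples test
    scores = np.zeros(len(coords))
    for i, center in enumerate(coords):
        sphere = np.linalg.norm(coords - center, axis=1) <= radius
        Xs = X[:, sphere]
        m0 = Xs[half & (y == 0)].mean(axis=0)
        m1 = Xs[half & (y == 1)].mean(axis=0)
        test = Xs[~half]
        d0 = ((test - m0) ** 2).sum(axis=1)
        d1 = ((test - m1) ** 2).sum(axis=1)
        # Predict class 1 when the pattern is closer to the class-1 mean.
        scores[i] = np.mean((d1 < d0) == (y[~half] == 1))
    return scores

# Synthetic 5x5x5 volume with a weakly informative cluster around the center.
rng = np.random.default_rng(3)
coords = np.argwhere(np.ones((5, 5, 5), dtype=bool))
center = np.array([2, 2, 2])
informative = np.linalg.norm(coords - center, axis=1) <= 1.5
y = np.repeat([0, 1], 40)
X = rng.normal(size=(80, len(coords)))
X[:, informative] += 0.5 * (2 * y - 1)[:, None]
scores = searchlight_scores(X, coords, y)
center_idx = int(np.flatnonzero((coords == center).all(axis=1))[0])
```

Each informative voxel carries only a weak signal here, but the sphere around the center pools them, which is the aggregation that a voxel-by-voxel test misses.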
Dimensionality Reduction We are also exploring different methods of re-coding the data. There is extensive redundancy across voxels (esp. spatially proximal voxels). Is there a more efficient way to represent the input (i.e., with fewer dimensions)? Manifold learning; spatial wavelet decomposition; ICA
Dimensionality reduction algorithms Generative models (David Weiss & David Blei) Each brain state is made of a linear combination of neural topics Each topic = a pattern of voxel activity across the whole brain (positive and negative values are OK) To generate a brain state from topics, multiply each topic by a positive value Topics are constrained to be spatially sparse (L2 regularization; trying L1 also)
Next steps We know a lot about the brain (in general), the fMRI response, and cognition that we are not telling the classifier… Currently: Each brain pattern is treated as a distinct observation In actuality: There is massive correlation between adjacent time points Knowing the information represented at time n tells you a lot about the information represented at time n + 1
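One very simple way to let time n inform time n + 1 is to smooth the classifier's output over time. The sketch below uses an exponential moving average; it is an illustrative assumption, not a method described in the talk, and the function name and `alpha` value are hypothetical.

```python
import numpy as np

def smooth_evidence(p, alpha=0.5):
    """Exponentially smooth a time series of classifier outputs.

    alpha weights the current volume; (1 - alpha) carries over the previous
    smoothed estimate, so nearby time points share information.
    """
    p = np.asarray(p, dtype=float)
    out = np.empty_like(p)
    out[0] = p[0]
    for t in range(1, len(p)):
        out[t] = alpha * p[t] + (1 - alpha) * out[t - 1]
    return out

raw = np.array([0.0, 1.0, 1.0])
smoothed = smooth_evidence(raw, alpha=0.5)
```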
Next steps In addition to temporal correlation, there is extensive spatial correlation Nearby voxels tend to represent similar things One way to address this issue is by spatially smoothing the data (averaging together activity from nearby voxels) However, you can lose information this way A more sophisticated approach would be to directly measure pairwise correlations between voxels and incorporate this information in the model
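Spatial smoothing can be sketched as replacing each voxel with the average of itself and its six face-adjacent neighbors. Gaussian kernels are more common in practice; the simple 7-point average, the edge padding, and the function name below are assumptions made to keep the example short.

```python
import numpy as np

def box_smooth(volume):
    """Average each voxel with its 6 face-adjacent neighbors."""
    volume = np.asarray(volume, dtype=float)
    # Repeat edge values so border voxels also have six neighbors.
    padded = np.pad(volume, 1, mode="edge")
    total = padded[1:-1, 1:-1, 1:-1].copy()
    total += padded[:-2, 1:-1, 1:-1] + padded[2:, 1:-1, 1:-1]
    total += padded[1:-1, :-2, 1:-1] + padded[1:-1, 2:, 1:-1]
    total += padded[1:-1, 1:-1, :-2] + padded[1:-1, 1:-1, 2:]
    return total / 7.0

spike = np.zeros((3, 3, 3))
spike[1, 1, 1] = 7.0            # one active voxel in an otherwise silent cube
smoothed = box_smooth(spike)
```

The single active voxel is spread across its neighbors, which shows both the noise-averaging benefit and the loss of fine-grained spatial information noted on the slide.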
Next steps Currently, our analyses are focused on single subjects. Is there some way to leverage data from other subjects to help with classification? If you run 10 subjects in the Haxby 8-category experiment, none of the subjects will have the exact same shoe representation, but the shoe representations are not random either. It might be possible to draw on data from other subjects to set priors on which voxels will be involved in representing shoes
Next steps Also, there is an enormous body of evidence relating to which brain structures are involved in a given cognitive task (e.g., face area, place area). We can use this information to set priors on voxel weights in the classification process
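One way to express a prior on voxel weights is to shrink them toward a prior mean instead of toward zero, i.e. minimize ||Xw - y||^2 + lam * ||w - prior_mean||^2. This is a sketch of that idea for a linear-regression-style readout, not the approach the talk commits to; `prior_mean` is a hypothetical vector that could come from other subjects or from known functional regions.

```python
import numpy as np

def ridge_with_prior(X, y, prior_mean, lam):
    """Ridge regression that shrinks the weights toward prior_mean."""
    n_features = X.shape[1]
    A = X.T @ X + lam * np.eye(n_features)
    # Setting the gradient to zero gives (X'X + lam*I) w = X'y + lam*prior_mean.
    return np.linalg.solve(A, X.T @ y + lam * prior_mean)

rng = np.random.default_rng(5)
X = rng.normal(size=(30, 8))
y = rng.normal(size=30)
prior_mean = np.linspace(-1, 1, 8)       # hypothetical prior on voxel weights
w_strong = ridge_with_prior(X, y, prior_mean, lam=1e9)   # prior dominates
w_zero = ridge_with_prior(X, y, np.zeros(8), lam=5.0)    # ordinary ridge
```

With a large `lam` the weights stay close to the prior; with a zero prior mean the estimate reduces to ordinary ridge regression.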
Next steps The cognitive states that we are trying to classify often have a hierarchical structure How you represent a stimulus depends on the task that you are performing Informing the classifier about this hierarchical structure should boost classification
Next steps Different tasks (dangerous/safe, land/water) have different neural signatures. If we can detect the neural signatures of these tasks, we can conditionalize the classifier on which task representation is present in the subject's head
Next steps Lots of potential constraints: temporal autocorrelation; correlation between nearby voxels; data from other subjects in the same experiment; data from other experiments; hierarchical structure of cognitive states. How do we inform the classifier of these constraints? Graphical models should provide a way of doing this…
Summary By applying pattern classification algorithms to neuroimaging data, we can extract a tremendous amount of information regarding what subjects are thinking, and how subjects' thoughts evolve over time. Plenty of room for improvement... Solving these problems will require meaningful contributions from several disciplines: Cognitive psychology, neuroscience, machine learning, engineering, signal processing, statistics, and mathematics...
Computational Memory Lab Michael Bannert Melissa Carroll Denis Chigirev Greg Detre Chris Moore Ehren Newman Joel Quamme Susan Robison Per Sederberg Matt Weber David Weiss And many others… Princeton Colleagues David Blei (Comp. Sci.) Matt Botvinick (PSY) Jon Cohen (PSY) Ingrid Daubechies (Math) Jim Haxby (PSY) Fei-Fei Li (Comp. Sci.) Dan Osherson (PSY) Peter Ramadge (EE) Rob Schapire (Comp. Sci.) Greg Stephens (Physics)
my Princeton Multi-Voxel Pattern Analysis Toolkit currently in public beta-testing: NiAM (NeuroImaging Analysis Methods) group meets Fridays 2pm