1 Hidden Process Models Rebecca Hutchinson Joint work with Tom Mitchell and Indra Rustandi

2 Talk Outline
fMRI (functional Magnetic Resonance Imaging) data
Prior work on analyzing fMRI data
HPMs (Hidden Process Models)
Preliminary results
HPMs and BodyMedia

3 functional MRI

4 fMRI Basics
Safe and non-invasive.
Temporal resolution: ~1 3D image every second.
Spatial resolution: ~1 mm
–Voxels: 3mm x 3mm x 3-5mm
Measures the BOLD (Blood Oxygen Level Dependent) response
–Indirect indicator of neural activity

5 The BOLD response
The ratio of deoxy-hemoglobin to oxy-hemoglobin (the two have different magnetic properties).
Also called the hemodynamic response function (HRF).
Common working assumption: responses sum linearly.

6 More on BOLD response
[Figure: a typical BOLD response to a brief stimulation; axes: signal amplitude vs. time (seconds).]
(Here, the subject reads a word, decides whether it is a noun or verb, and pushes a button in less than 1 second.)


8 Lots of features! 10,000-15,000 voxels per image …

9 Study: Pictures and Sentences
13 normal subjects; 40 trials per subject.
Sentences and pictures describe 3 symbols: *, +, and $, using ‘above’, ‘below’, ‘not above’, ‘not below’.
Images are acquired every 0.5 seconds.
[Trial timeline figure: Read Sentence, View Picture, Press Button, Fixation, Rest; markers at t=0, 4 sec., 8 sec.]

10 The star is not below the plus.


14 fMRI Summary
High-dimensional time series data.
Considerable noise in the data.
Typically a small number of examples (trials) compared with features (voxels).
BOLD responses sum linearly.

15 Talk Outline
fMRI (functional Magnetic Resonance Imaging) data
Prior work on analyzing fMRI data
HPMs (Hidden Process Models)
Preliminary results
HPMs and BodyMedia

16 It’s not hopeless!
The learning setting is tough, but we can do it! Feature selection is key.
Learn fMRI(t,t+8) -> {Picture, Sentence}
[Trial timeline figure: Read Sentence, View Picture, Press Button, Fixation, Rest; markers at t=0, 4 sec., 8 sec.]

17 Results
Subject:   A      B      C      D      E      F      G
Accuracy:  71.3%  91.2%  76.2%  96.3%  85.0%  66.2%  71.3%
Subject:   H      I      J      K      L      M      Avg.
Accuracy:  95.0%  81.2%  90.0%  85.0%  65.0%  90.0%  81.8%
Gaussian Naïve Bayes Classifier.
95% confidence intervals per subject are +/- 10%-15%.
Accuracy of the default classifier is 50%.
Feature selection: top 240 most active voxels in the brain.
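As a sketch of the Gaussian Naïve Bayes classifier named on this slide (one Gaussian per class per feature, with features assumed conditionally independent given the class): the tiny two-voxel dataset used below is invented for illustration; in the real experiments each example is a window of selected-voxel fMRI activity, not two numbers, and all function names here are mine.

```python
import math

def train_gnb(examples, labels):
    """Fit per-class mean and variance for each feature (voxel),
    plus a class prior from the label frequencies."""
    model = {}
    for c in set(labels):
        rows = [x for x, y in zip(examples, labels) if y == c]
        n = len(rows)
        means = [sum(col) / n for col in zip(*rows)]
        variances = [max(sum((v - m) ** 2 for v in col) / n, 1e-6)
                     for col, m in zip(zip(*rows), means)]
        model[c] = (means, variances, n / len(labels))
    return model

def classify_gnb(model, x):
    """Pick the class with the highest log posterior, summing
    independent Gaussian log-densities over the features."""
    best, best_score = None, float("-inf")
    for c, (means, variances, prior) in model.items():
        score = math.log(prior)
        for v, m, var in zip(x, means, variances):
            score += -0.5 * math.log(2 * math.pi * var) \
                     - (v - m) ** 2 / (2 * var)
        if score > best_score:
            best, best_score = c, score
    return best

model = train_gnb([[1.0, 0.1], [1.2, 0.0], [-1.0, 0.2], [-1.1, 0.1]],
                  ["Picture", "Picture", "Sentence", "Sentence"])
```

With thousands of voxels the log-posterior sum simply grows longer, which is why feature selection (the top 240 most active voxels above) matters.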

18 Why is this interesting?
Cognitive architectures like ACT-R and 4CAPS predict the cognitive processes involved in tasks, along with the cortical regions associated with those processes.
Machine learning can contribute to these architectures by linking their predictions to empirical fMRI data.

19 Other Successes
We can distinguish between 12 semantic categories of words (e.g. tools vs. buildings).
We can train classifiers across multiple subjects.

20 What can’t we do?
Take into account that the responses for Picture and Sentence overlap.
What does the response for Decide look like, and when does it start?
[Trial timeline figure: Read Sentence, View Picture, Press Button, Fixation, Rest; markers at t=0, 4 sec., 8 sec.]

21 Talk Outline
fMRI (functional Magnetic Resonance Imaging) data
Prior work on analyzing fMRI data
HPMs (Hidden Process Models)
Preliminary results
HPMs and BodyMedia

22 Motivation
Overlapping processes
–The responses to Picture and Sentence could overlap in space and/or time.
Hidden processes
–Decide does not directly correspond to the known stimuli.
Move to a temporal model.

23 Hidden Markov Models?
Can’t model overlapping processes – states are mutually exclusive.
Markov assumption: given state t-1, state t is independent of everything before t-1.
BOLD response: not Markov!
[Graphical model: hidden state CogProc_t in {Picture, Sentence, Decide} emitting fMRI_t, for times t-1, t, t+1, t+2.]

24 factorial HMMs?
Have more flexibility than we need.
–The Picture state sequence should not be {…}
Still have the Markov assumption problem.
[Graphical model: binary chains Picture, Sentence, Decide (each in {0,1}) jointly emitting fMRI_t, for times t-1, t, t+1, t+2.]

25 Hidden Process Models
[Diagram: three processes – Read sentence (Process ID 1), View picture (Process ID 2), Decide whether consistent (Process ID 3) – each with its own response; process instances tagged with their process IDs over the trial; observed fMRI in cortical regions 1 and 2.]

26 HPM Parameters
Set of processes, each of which has:
–a process ID
–a maximum response duration R
–emission weights for each voxel v: [W(v,1),…,W(v,t),…,W(v,R)]
–a multinomial distribution over possible start times within a trial: [θ1,…,θt,…,θT]
Set of standard deviations, one for each voxel: [σ1,…,σv,…,σV]
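One way the parameter set on slide 26 might be organized as a data structure; all class and field names below are my own illustration, not the thesis software.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Process:
    process_id: int
    name: str
    max_duration: int                # maximum response duration R
    weights: Dict[int, List[float]]  # voxel v -> [W(v,1), ..., W(v,R)]
    start_dist: List[float]          # multinomial over start times [theta_1, ..., theta_T]

@dataclass
class HPM:
    processes: Dict[int, Process]    # keyed by process ID
    sigmas: List[float]              # per-voxel standard deviations [sigma_1, ..., sigma_V]

# A one-process, one-voxel toy model (values invented):
read_sentence = Process(1, "Read sentence", 3, {0: [0.0, 1.0, 0.5]}, [0.5, 0.5])
hpm = HPM({1: read_sentence}, [1.0])
```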

27 Interpreting data with HPMs
Data interpretation (int):
–a set of process instances, each of which has a process ID and a start time S
To predict fMRI data using an HPM and int:
–For each active process instance, add the response associated with its process ID to the prediction, aligned at its start time.

28 Synthetic Data Example
[Figure: responses for Processes 1, 2, and 3; process instances ProcessID=1 (S=1), ProcessID=2 (S=17), ProcessID=3 (S=21); and the resulting predicted data.]
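Slide 27's prediction rule (add each active instance's response, aligned at its start time) can be sketched for a single voxel as follows; the toy responses and 0-based start times below are invented, not data from the studies.

```python
def predict_signal(length, responses, instances):
    """Predict one voxel's time series by summing, for each process
    instance (process_id, start_time), that process's response
    aligned at its start time."""
    prediction = [0.0] * length
    for process_id, start in instances:
        for offset, value in enumerate(responses[process_id]):
            t = start + offset
            if t < length:
                prediction[t] += value
    return prediction

# Two overlapping instances: process 1 starts at t=0, process 2 at t=1.
predicted = predict_signal(4,
                           {1: [1.0, 2.0, 1.0], 2: [0.5, 0.5]},
                           [(1, 0), (2, 1)])
```

Where the two responses overlap (t=1 and t=2), their contributions sum, reflecting the linearity assumption on slide 5.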

29 Our Assumptions
Processes, not states.
–One hidden variable: the process start time.
Known number of processes in the model.
–e.g. Picture, Sentence, Decide – 3 processes
Known number of instantiations of those processes.
–e.g. numTrials*3 process instances
Each process has a unique signature.
Contributions of overlapping processes to the same output variable sum linearly.

30 The generative model
Together, an HPM and an interpretation (int) define a probability distribution over sequences of fMRI images:
P(y(v,t) | hpm, int) = N(μ(v,t), σ(v))
where μ(v,t) = Σ over active process instances i of W(i.procID)(v, t – start(i))

31 Inference
Given:
–an HPM
–a set of data interpretations (int), each a set of process IDs and start times
–priors over the interpretations
P(int=i | Y) ∝ P(Y | int=i) P(int=i)
Choose the interpretation i with the highest posterior probability.

32 Synthetic Data Example
[Figure: observed data compared against two candidate interpretations – Interpretation 1: ProcessID=1 (S=1), ProcessID=2 (S=17), ProcessID=3 (S=21); Interpretation 2: ProcessID=2 (S=1), ProcessID=1 (S=17), ProcessID=3 (S=23) – with the prediction under each.]

33 Learning the Model
EM (Expectation-Maximization) algorithm.
E-step:
–Estimate a conditional distribution over the start times of the process instances given the observed data, P(S|fMRI).
M-step:
–Use the distribution from the E-step to get maximum-likelihood estimates of the HPM parameters {θ, W, σ}.

34 More on the E-step
The start times of the process instances are not necessarily conditionally independent given the data.
–Must consider joint configurations.
–With no constraints, T^nInstances configurations – an intractable number for a typical experiment.
Can we consider a smaller set of start time configurations?

35 Reducing complexity
Prior knowledge
–Landmarks: events with known timing that “trigger” processes. One per process instance.
–Offsets: the interval of possible delays from a landmark to a process instance onset. One vector of n offsets per process.
Conditional independencies
–Introduced when no process instance could be active.
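The landmark-and-offset constraint can be illustrated by enumerating joint start-time configurations: each instance may start only at its landmark plus one of its allowed offsets, so the count drops from T^nInstances to the product of the offset-set sizes. The offset sets below are the ones on slide 37; the landmark times (0 and 8) are made-up examples.

```python
from itertools import product

def configurations(instances, offsets):
    """Enumerate joint start-time configurations: each process
    instance (process_name, landmark_time) may start at its landmark
    plus any of that process's allowed offsets."""
    per_instance = [[landmark + off for off in offsets[name]]
                    for name, landmark in instances]
    return list(product(*per_instance))

offsets = {"Sentence": [0, 1], "Picture": [0, 1], "Decide": [0, 1, 2, 3]}
confs = configurations([("Sentence", 0), ("Picture", 8), ("Decide", 8)],
                       offsets)
```

For this one-trial example there are only 2 x 2 x 4 = 16 joint configurations, versus T^3 with no constraints.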

36 Before Prior Knowledge
[Figure: the cognitive processes (Read sentence, View picture, Decide whether consistent) with unconstrained timing, above the observed fMRI in cortical regions 1 and 2.]

37 Prior Knowledge
[Figure: the same processes, now tied to landmarks (the stimuli: Sentence Presentation, Picture Presentation). Sentence offsets = {0,1}; Picture offsets = {0,1}; Decide offsets = {0,1,2,3}.]
Landmarks go to process instances. Offset values are determined by process IDs.

38 Conditional Independencies
[Figure: the same landmark-constrained processes (Sentence offsets = {0,1}; Picture offsets = {0,1}; Decide offsets = {0,1,2,3}); at a time point where no process instance can be active, the trial splits into conditionally independent segments.]

39 More on the M-step
Weighted least squares procedure
–exact, but may become intractable for large problems
–weights are the probabilities computed in the E-step
Gradient ascent procedure
–approximate, but may be necessary when the exact method is intractable
–uses derivatives of the expected log likelihood of the data with respect to the parameters
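The exact M-step is a weighted least-squares fit with the E-step probabilities as weights. As an illustration of its simplest special case only (start times known with certainty, instances of one process non-overlapping, weights all 1), the least-squares estimate of a process's response W reduces to averaging the aligned windows of the signal; names and data below are mine, and this is not the thesis's general procedure.

```python
def estimate_response(signal, starts, duration):
    """Least-squares estimate of one process's response when its
    instances have known start times, do not overlap, and carry
    equal weight: average the aligned windows of the signal."""
    windows = [signal[s:s + duration] for s in starts]
    return [sum(vals) / len(windows) for vals in zip(*windows)]

# Two noisy occurrences of the same response, at t=0 and t=4:
signal = [1.0, 2.0, 0.0, 0.0, 1.2, 1.8, 0.0, 0.0]
response = estimate_response(signal, [0, 4], 2)
```

When instances overlap or start times are uncertain, the averaging above no longer suffices and the full weighted design-matrix solve (or gradient ascent) is needed.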

40 Talk Outline
fMRI (functional Magnetic Resonance Imaging) data
Prior work on analyzing fMRI data
HPMs (Hidden Process Models)
Preliminary results
HPMs and BodyMedia

41 Preliminary Results
[Trial timeline figure: View Picture or Read Sentence, then Read Sentence or View Picture, Press Button, Fixation, Rest; markers at t=0, 4 sec., 8 sec. GNB and HPM are each asked “picture or sentence?”, with 8 sec. and 16 sec. windows marked.]

42 GNB vs. HPM Classification
GNB: non-overlapping processes.
HPM: simultaneous classification of multiple overlapping processes.
Average improvement of 15% in classification error using HPM vs. GNB.
E.g., for one subject:
–GNB classification error: 0.14
–HPM classification error: 0.09

43 Learned models
[Figure: learned response models for Comprehend sentence and Comprehend picture; example from trial 25.]

44 Model selection experiments
Model with 2 or 3 cognitive processes? How would we know ground truth?
–Cross-validated data likelihood P(testData | HPM): better with 3 processes than 2.
–Cross-validated classification accuracy: better with 3 processes than 2.

45 Current work and challenges
Add temporal and/or spatial smoothness constraints.
Feature selection for HPMs.
Process libraries and hierarchies.
Process parameters (e.g. whether a sentence is negated or not).
Model process interactions.
Scaling parameters for response amplitudes, to model habituation effects.

46 Talk Outline
fMRI (functional Magnetic Resonance Imaging) data
Prior work on analyzing fMRI data
HPMs (Hidden Process Models)
Preliminary results
HPMs and BodyMedia

47 One idea…
[Diagram, analogous to slide 25, for BodyMedia data: processes Riding bus (Process ID 1), Eating (Process ID 2), Walking (Process ID 3), each with a response; process instances over time; observed data from Sensors 1 and 2.]

48 Some questions
What processes are interesting?
What granularity/duration would processes have?
What would the landmarks be?
Are variable process durations needed?
Is there a better way to parameterize process signatures?