The General Linear Model …a talk for dummies — presentation transcript:

1 The General Linear Model …a talk for dummies
Holly Rossiter & Louise Croft, Methods for Dummies 2010

2 Topics covered
The General Linear Model
The Design Matrix
Parameters
Error
An fMRI example
Problems with the GLM + solutions
A cake recipe

3 Overview of SPM
[Pipeline diagram: image time-series → realignment → normalisation (template) → smoothing (kernel) → general linear model (design matrix) → parameter estimates → statistical inference (Gaussian field theory, p < 0.05).]

4 The General Linear Model
Describes a response (y), such as the BOLD response in a voxel, in terms of all its contributing factors (Xβ) in a linear combination, y = Xβ + e, whilst also accounting for the contribution of error (e).

5 The General Linear Model
Dependent variable Describes a response (such as the BOLD response in a single voxel, taken from an fMRI scan)

6 The General Linear Model
Independent variable (aka predictor), e.g. experimental conditions. Embodies all available knowledge about experimentally controlled factors and potential confounds.

7 The General Linear Model
Parameters (aka regression coefficients / beta weights) quantify how much each predictor (X) independently influences the dependent variable (Y): the slope of the regression line.

8 The General Linear Model
Error Variance in the data (y) which is not explained by the linear combination of predictors (x)

9 Therefore… DV = (IV × Parameters) + Error. As we take samples of a response (y) many times, this equation actually represents a matrix equation: y is an n×1 vector of observations, X an n×p design matrix, β a p×1 vector of parameters, and e an n×1 vector of errors…

10 …the GLM matrix
[Figure: the BOLD signal from a single voxel, sampled at many time points — one response per scan, stacked over time into the vector y.]

11 The Design Matrix models all known predictors of y
[Figure: the fMRI signal over time alongside the columns of the design matrix.] SPM always adds a constant term to the design matrix, to model the overall mean of the signal, which is arbitrary for BOLD data.

12 Each predictor (x) has an expected signal time course, which contributes to y
+ 1 × 2 × 3 × X1 X2 X3 X: Predictors fMRI signal Time Y: Observed Data = + Residual residuals

13 The design matrix does not account for all of y
If we plot our observations (n) on a graph, they will not fall in a straight line. This is a result of uncontrolled influences (other than x) on y; this contribution to y is called the error (or residual). Minimising the difference between the response predicted by the model (ŷ) and the actual response (y) minimises the error of the model.

14 So, we need to construct a design matrix which aims to explain as much of the variance in y as possible…
…in order to minimise the error of the model.

15 How?

16 Parameters (β)
Beta is the slope of the regression line. It quantifies a specific predictor's (x) contribution to y. The parameter (β) chosen for a model should minimise the error, reducing the amount of variance in y which is left unexplained.

17 So, we have our set of hypothetical time series x1, x2, x3…
…the "known" predictors (Generation, Shadowing, Baseline), and our measured data. We want to estimate (or want SPM to estimate) this observed time series as a linear combination of the hypothetical time series.

18 We find the best parameter values by modelling...
y ≈ β1·(Generation) + β2·(Shadowing) + β3·(Baseline), where the βs are the "unknown" parameters. The best parameters will minimise the error in the model.

19 Generation, Shadowing, Baseline
[Figure: one candidate choice of β1, β2, β3.] Here there is a lot of residual variance in y which is unexplained by the model (error). Not brilliant.

20 Generation, Shadowing, Baseline
[Figure: another candidate choice of parameters.] …and the same goes here. Still not great.

21 ...but here we have a good fit, with minimal error
[Figure: the fitted model β1·(Generation) + β2·(Shadowing) + β3·(Baseline), with estimated parameters 0.83, 0.16 and 2.98.]

22 Generation, Shadowing, Baseline
…and the same model can fit different data, just with different parameters. [Figure: a different data set fitted with parameters 0.68, 0.82 and 2.17.] In other words:

23 Generation, Shadowing, Baseline
…as you can see: different data (y), the same predictors (x), different parameters (β). The model doesn't care. [Figure: yet another data set, fitted with parameters 0.03, 0.06 and 2.04.]

24 So the same model can be used across all voxels of the brain, just using different parameters
[Figure: each voxel's time series yields one parameter image per regressor: beta_0001.img, beta_0002.img, beta_0003.img…]

25 Finding the optimal parameter can also be visualised in geometric space
The y and x in our model are all vectors of the same dimensionality and so lie within the same large space. The design matrix (x1, x2, x3…) defines a subspace: the design space. [Figure: the plane spanned by x1 and x2.]

26 Parameters determine the co-ordinates of the predicted response (ŷ) within this space
The actual data (y), however, lie outside this space, so there is always a difference between the predicted ŷ and the actual y. [Figure: y lying off the plane spanned by x1 and x2, with ŷ in the plane.]

27 We need to minimise the difference between predicted ŷ and actual y
So the GLM aims to find the projection of y onto the design space which minimises the error of the model (minimises the difference between predicted ŷ and actual y). [Figure: the error vector e from ŷ to y, to be minimised.]

28 The smallest error vector is orthogonal to the design space…
…so the best parameter will position ŷ such that the error vector between y and ŷ is orthogonal to the design space, minimising the error. [Figure: e perpendicular to the plane spanned by x1 and x2.]
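In matrix terms, this orthogonality condition is what pins down the best parameters (a standard least-squares identity spelling out what the figure shows; the implication on the right gives the "normal equations" that the next slides solve):

```latex
\hat{y} = X\hat{\beta}, \qquad e = y - \hat{y}, \qquad
X^{\top} e = 0 \;\;\Longrightarrow\;\; X^{\top}X\hat{\beta} = X^{\top}y
```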

29 How do we find the parameter which produces minimal error?

30 The optimal parameter can be calculated using Ordinary Least Squares
A statistical method for estimating unknown parameters from sampled data, minimising the difference between predicted and observed data: choose β so that ∑_{t=1}^{N} e_t² is a minimum.
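A hedged numpy sketch of the OLS estimate, continuing the simulated X and y from the earlier block. The normal-equations line is the textbook formula; np.linalg.lstsq is the numerically safer equivalent:

```python
import numpy as np

# OLS: choose beta to minimise the sum of squared errors
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)         # normal equations: (X'X) b = X'y
beta_hat_ls, *_ = np.linalg.lstsq(X, y, rcond=None)  # equivalent, more stable

residuals = y - X @ beta_hat
print(np.allclose(beta_hat, beta_hat_ls))            # True
print(X.T @ residuals)                               # ~0: residual orthogonal to design space
print(np.sum(residuals ** 2))                        # the minimised quantity
```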

31 Some assumptions in the GLM: error
1. Errors are normally distributed. 2. The error variance is the same at each and every measurement point. 3. There is no correlation between errors at different time points/data points.
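A small sketch of how one might eyeball these three assumptions on the residuals from the fit above; the particular checks are illustrative choices, not what SPM does internally:

```python
from scipy import stats

# 1. Normality: Shapiro-Wilk test on the residuals
print(stats.shapiro(residuals))

# 2. Constant variance: compare the two halves of the time series
half = len(residuals) // 2
print(residuals[:half].var(), residuals[half:].var())

# 3. Independence: lag-1 autocorrelation should be near zero
print(np.corrcoef(residuals[:-1], residuals[1:])[0, 1])
```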

32 To re-cap…

33 [Figure: BOLD signal time course (y) = multiple predictors (X) × parameters (variance accounted for by the predictors) + E (unaccounted variance: error).]

34 An fMRI example

35 An example, using fMRI…
A single voxel is analysed across many different time points; Y = the BOLD signal at each time point from that voxel. In order to explain how the GLM is applied to brain imaging data, I will use the example of a simple fMRI experiment. Obviously with fMRI we want to look at all the voxels in the brain, but each GLM focuses on just one voxel across a series of time points (left-hand picture), and the actual value of Y is the strength of the BOLD signal at that location and that time point (right-hand picture).

36 A simple fMRI experiment
One session: passive word listening versus rest, 7 cycles of rest and listening. Question: is there a change in the BOLD response between listening and rest? This is a simple fMRI experiment borrowed from some SPM course slides: one session comparing passive word listening to rest, cycled 7 times. The main question is whether the BOLD response changes between listening and rest, and in which areas this change occurs. The graph plots the strength of the BOLD response at a particular voxel (y-axis) against time (x-axis); the change between listening and rest can be seen in the stimulus function, and the BOLD response clearly changes between these conditions.

37 General Linear Model: fMRI example
Y: the BOLD response at each time point at the chosen voxel. X: predictors that explain the data, such as listening vs rest in this example. β: how much each predictor explains the data (the coefficient; here β = 0.44). e: variance in the data that cannot be explained by the predictors (noise). This is how the GLM applies to the fMRI experiment. It is also important to include other predictors even if they are not experimentally interesting: for example, you might want to include the effect of gender or fatigue on the response as well. The beta value for each x indicates how much that particular predictor explains the BOLD response, and e captures all the variance your predictors cannot explain, i.e. any noise left over in the data.

38 Statistical Parametric Mapping
The null hypothesis is that your effect of interest explains none of your data. For the simple listening task: how much does listening vs rest explain the BOLD signal, and is it statistically significant? From this equation β can be calculated. So for your effect of interest, in this case let's say listening as compared to rest, the null hypothesis is that it explains none of the BOLD response. You calculate beta and then determine whether the effect is statistically significant by performing statistical tests such as t-tests or ANOVAs (statistical inference, e.g. p < 0.05).
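A hedged sketch of the t-test behind this kind of inference, for a hypothetical contrast c picking out the first regressor of the simulated fit; this mirrors the standard GLM t-statistic rather than SPM's exact implementation:

```python
from scipy import stats

c = np.array([1.0, 0.0, 0.0])                  # hypothetical contrast: first regressor vs 0
df = X.shape[0] - np.linalg.matrix_rank(X)     # degrees of freedom
sigma2 = (residuals @ residuals) / df          # estimated error variance
var_c = sigma2 * (c @ np.linalg.inv(X.T @ X) @ c)
t = (c @ beta_hat) / np.sqrt(var_c)            # t-statistic for H0: c'beta = 0
p = stats.t.sf(t, df)                          # one-tailed p-value
print(t, p)
```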

39 Problems with this model/issues to consider...
1. The BOLD signal is not a simple on/off. 2. Low-frequency noise. 3. Assumptions about the error may be violated. 4. Physiological confounds. Although this is an extremely useful and important model, there are some issues to consider. First, the BOLD response is not simply an on/off signal, so the model needs to be adjusted to match the shape of the BOLD response. There is also a lot of low-frequency noise within fMRI data which may contaminate it. The GLM makes several assumptions about the error, some of which may be violated, and there are physiological confounds to consider when recording and analysing fMRI data. I'll go through each of these in turn and then explain how we can attempt to solve these problems.

40 Problem 1: Shape of the BOLD response. Solution: Convolution model
[Figure: impulses (input function) ⊗ hemodynamic response function (HRF) = expected BOLD response.] The BOLD signal does not respond immediately, and it responds with a characteristic shape (middle picture), whereas the impulses or conditions change in a 0-to-1, on/off way (left-hand picture). The model needs to be altered to accommodate the haemodynamic response function (right-hand picture). This is termed the convolution model.

41 Convolution model of the BOLD response
Convolve the stimulus function with a canonical hemodynamic response function (HRF). [Figure: original (on/off) and convolved regressors, with the HRF.] So you alter the model from an on/off boxcar (red column on the left) to a smoother, convolved version (green column on the left). The graph on the right shows the BOLD response against scan number, and how the adjusted model matches the actual HRF much more closely.
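A sketch of the convolution step, using a difference-of-two-gammas approximation to the canonical HRF; the TR and timing parameters below are assumptions for illustration (loosely modelled on the usual defaults), not a reimplementation of SPM's spm_hrf:

```python
import numpy as np
from scipy import stats

TR = 2.0                                         # repetition time in seconds (assumed)
t_hrf = np.arange(0, 32, TR)                     # 32 s of HRF support
# Peak around 6 s, small undershoot around 16 s
hrf = stats.gamma.pdf(t_hrf, 6) - stats.gamma.pdf(t_hrf, 16) / 6.0
hrf /= hrf.sum()                                 # normalise the kernel

boxcar = (np.arange(T) % 20 < 10).astype(float)  # on/off stimulus function from before
x_conv = np.convolve(boxcar, hrf)[:T]            # convolved regressor, trimmed to T scans
```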

42 Problem 2: Low-frequency noise. Solution: High-pass filtering
[Figure legend: blue = data; black = mean + low-frequency drift; green = predicted response, taking into account low-frequency drift; red = predicted response, NOT taking into account low-frequency drift.] The second problem is that there is a lot of low-frequency noise or baseline drift in fMRI data, for example due to breathing. To remove it we can apply a high-pass filter: a discrete cosine transform (DCT) set of very low-frequency columns is added to the design matrix, so that this low-frequency noise is modelled and taken into account.
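A sketch of such a discrete cosine basis set, continuing the running example; the 128 s cut-off is the commonly quoted SPM default, and the basis formula is the standard DCT:

```python
cutoff = 128.0                                 # high-pass cut-off in seconds
n_basis = int(2 * T * TR / cutoff)             # number of low-frequency regressors
tt = np.arange(T)
dct = np.column_stack([
    np.sqrt(2.0 / T) * np.cos(np.pi * k * (2 * tt + 1) / (2 * T))
    for k in range(1, n_basis + 1)
])                                             # shape (T, n_basis): slow drift terms
X_hp = np.column_stack([x_conv, dct, np.ones(T)])  # design with drift modelled out
```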

43 Problem 3: Error is correlated, although the model assumes it is not. Solution: Autoregressive model
The error at each time point is correlated with the error at the previous time point: e at time t is correlated with e at time t−1. One of the main violations of the error assumptions in the GLM is that it assumes the error is uncorrelated, whereas in fMRI data it often is correlated. The model wants the error to look like the left-hand graph ("it should be…"), whereas in fact it looks more like the right-hand graph ("it is…"): error at time t is correlated with error at time t−1.

44 Autoregressive model
Temporal autocorrelation in y = Xβ + e over time: e_t = a·e_(t−1) + ε. This equation shows that for the error at time t, there is an autocovariance function a which gives its correlation with the error at time t−1. The error at one point is correlated with the previous time point and, to a lesser degree, the time point before that, but it drops off quickly. The autocovariance function can be calculated, the data analysed taking it into account, and the GLM performed again without this correlated error; this is called 'whitening'. All you basically need to understand is: the GLM assumes that the error is not correlated, but it is; SPM can fit an 'autoregressive model' which calculates this correlation and then removes it from the data, so it is no longer a problem.
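A bare-bones sketch of AR(1) prewhitening on the running example. SPM estimates the autocorrelation more carefully (via restricted maximum likelihood); here a is estimated crudely from the lag-1 autocorrelation of the OLS residuals:

```python
# Estimate the AR(1) coefficient a in e_t = a * e_(t-1) + epsilon
a = np.corrcoef(residuals[:-1], residuals[1:])[0, 1]

# Whiten data and design by removing the predicted carry-over from t-1
y_w = y[1:] - a * y[:-1]
X_w = X[1:] - a * X[:-1]

# Refit the GLM on the whitened data: its errors are now ~uncorrelated
beta_w, *_ = np.linalg.lstsq(X_w, y_w, rcond=None)
```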

45 Problem 4: Physiological Confounds
Head movements, arterial pulsations, breathing, eye blinks (visual cortex), adaptation effects, fatigue, changes in attention to task. The fourth problem is that of physiological confounds, such as the subject moving their head in the scanner, eye blinks, fatigue and so on, which may affect the data. Some of these can be minimised by giving clear instructions to the participant and ensuring your scan doesn't last too long!

46 To recap…
Response = (Predictors × Parameters) + Error. So just a final recap of the GLM: Y is your data, i.e. the BOLD response; X are the predictors that are able to explain your data; β is how much each predictor explains the data; and the error is whatever is left over.

47 Analogy: Reverse Cookery
Start with the finished product and try to explain how it was made… You specify which ingredients to add (X); for each ingredient, the GLM finds the quantities (β) that produce the best reproduction; then, if you tried to make the cake with what you know about X and β, the error would be the difference between the original cake/data and yours! I found this in one of the previous years' slides and thought it was quite a good way of explaining the GLM: reverse cookery. Imagine you start with a finished cake (Y); you work out what ingredients are in it (X, your predictors) and how much you need of each ingredient (β); if you then tried to make the cake with that knowledge, the difference between the original and what you made would be the error!

48 Thanks to…
Previous years' MfD slides (2003–2009); Guillaume Flandin (our expert); slides from previous SPM courses; Suz and Christian. We'd just like to thank Guillaume for his help, the previous years' presenters and the SPM course for their slides, and Suz and Christian. Thanks for listening.

