Lorelei Howard and Nick Wright MfD 2008


t-tests, ANOVA and regression - and their application to the statistical analysis of fMRI data. Lorelei Howard and Nick Wright, MfD 2008. Hello, my name is Lorelei. I'm in the first year of my PhD. Nick, whom you met last week, and I will be explaining various statistical tests and their application to the analysis of fMRI data. In the first part of this session I will be talking to you about… (next slide)

Overview: Why do we need statistics? P values; t-tests; ANOVA. …t-tests and ANOVAs. Prior to this I will be discussing some basic principles of statistical testing, mainly for the benefit of those of you who have not had much experience using statistics. I will start by explaining why we need statistics in science. For those of you who are familiar with this material, I apologise and ask you to bear with me. I will then discuss the use of p values and how these enable us to quantify our certainty in the generalisability of our results.

Why do we need statistics? To enable us to test experimental hypotheses. H0 = null hypothesis; H1 = experimental hypothesis. In terms of fMRI: null = no difference in brain activation between these 2 conditions; experimental = there is a difference in brain activation between these 2 conditions. In each experiment there are two hypotheses: the null hypothesis, denoted H0, and the experimental hypothesis, denoted H1. The experimental hypothesis can also be referred to as the alternative or research hypothesis; it makes a prediction that must be falsifiable. For example, we may wish to investigate whether the presentation of famous names elicits more activation in the fusiform gyrus than the presentation of the names of unknown people. In such a study the null hypothesis would be that there is no difference in activation between the two conditions, and the experimental hypothesis would be that there is a difference. So how do we establish which of these two possible outcomes is true?

2 types of statistics: descriptive statistics, e.g. the mean and standard deviation (S.D.), and inferential statistics, e.g. t-tests, ANOVAs and regression. There are 2 types of statistics that can help us determine whether our data support our experimental hypothesis. Descriptive statistics: values such as the mean and the S.D. allow us to summarise our data sets. Returning to our example, for each voxel we could use the mean BOLD signal in the two conditions to describe any differences between them. Say that we were examining the activity in a voxel from the fusiform gyrus and we found the following: mean BOLD activity for condition 1 (famous names) = 500 units, whereas mean BOLD activity for condition 2 (unknown names) = 498 units. We can see that these 2 numbers differ numerically, but we cannot comment on whether this difference is meaningful. Inferential statistics include tests such as t-tests, ANOVAs and regression. Using the example above, a t-test could be used to determine whether the difference between these 2 conditions is statistically significant. Inferential statistics also allow us to comment on the likelihood that the effects found in the current dataset are genuine.

Issues when making inferences: it is important to acknowledge that the data collected represent only a single sample from a much larger population. It is therefore possible that if a different sample were used then different results could have been obtained.

So how do we know whether the effect observed in our sample was genuine? We don't. Instead we use p values to indicate our level of certainty that our results represent a genuine effect present in the whole population. (By genuine we mean that the observed effect was caused by a true effect present within the whole population.)

P values: the p value is the probability that the observed result was obtained by chance, i.e. when the null hypothesis is true. The α level is set a priori (usually 0.05). If p < α we reject the null hypothesis and accept the experimental hypothesis, concluding that we are 95% certain that our experimental effect is genuine. If, however, p > α we retain (fail to reject) the null hypothesis. Each test result (e.g. a t value) is associated with a particular p value. The α level is basically an acceptance level; usually it is set to 0.05, but as I understand it, α levels are usually much lower than this in fMRI. If p < α we reject the null hypothesis and accept the experimental hypothesis; if p > α we retain the null hypothesis and conclude that there was no significant difference in brain activation levels between the two conditions.
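As an illustrative sketch (not from the slides), the p value for a test statistic can be looked up from the relevant distribution and compared with α; the t value, degrees of freedom and α below are made-up numbers:

```python
from scipy import stats

t_value, df = 2.3, 18   # hypothetical test result and degrees of freedom
alpha = 0.05            # a priori acceptance level

# two-tailed p value: probability of a |t| at least this extreme when H0 is true
p = 2 * stats.t.sf(abs(t_value), df)

if p < alpha:
    print(f"p = {p:.3f} < {alpha}: reject H0, accept H1")
else:
    print(f"p = {p:.3f} >= {alpha}: retain H0")
```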

Two types of errors: Type I error = false positive; an α level of 0.05 means that there is a 5% risk that a type I error will be encountered. Type II error = false negative. Beware of errors: we must be aware that errors can occur during this process. A type I error (false positive) is where we incorrectly reject the null hypothesis; the pre-determined α level determines the risk of this type of error, so an α level of 0.05 means a 5% risk of a type I error. The other type of error is a type II error (false negative), where we incorrectly retain the null hypothesis and miss a genuine effect.

t-tests compare two group means. The type of statistical test used depends on your hypotheses. If you would like to compare the means of two groups of data then a t-test would be appropriate, as it compares two group means in the context of their variability. I think the use of t-tests can best be explained with an example; here I will be using a hypothetical experiment to aid explanation…

Hypothetical experiment. [Figure: timeline of the experiment, in which participants view faces of cartoon characters over time.] Research question: does viewing pictures of the Simpson and the Griffin families activate the same brain regions? Two conditions: condition 1 = Simpson family faces; condition 2 = Griffin family faces. H0 = no difference in activation between the two conditions; H1 = a difference.

Calculating t: the difference between the means divided by the pooled standard error of the mean, t = (x̄1 - x̄2) / √(s1²/n1 + s2²/n2). To calculate the t score, take the mean from group 1 (x̄1), then the mean from group 2 (x̄2), and find the difference between these two. Divide this by their shared standard error, which is calculated by taking the variance for group 1 and dividing it by the sample size for that group, doing the same for group 2, adding the two together, and finally taking the square root. Put the resulting value back into the original calculation to get your t value.
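A minimal sketch of this calculation (not part of the original slides), using invented data for the two groups; the step-by-step version follows the recipe above and is checked against scipy's unequal-variance t-test, which uses the same standard-error formula:

```python
import numpy as np
from scipy import stats

group1 = np.array([500., 512., 498., 505., 490., 508.])   # e.g. BOLD signal, condition 1
group2 = np.array([498., 495., 501., 489., 492., 497.])   # e.g. BOLD signal, condition 2

# difference between the means divided by the shared standard error
diff = group1.mean() - group2.mean()
se = np.sqrt(group1.var(ddof=1) / len(group1) + group2.var(ddof=1) / len(group2))
t_manual = diff / se

t_scipy, p = stats.ttest_ind(group1, group2, equal_var=False)
print(t_manual, t_scipy, p)   # the two t values agree
```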

How do we apply this to fMRI data analysis? If we just return to our hypothetical experiment…

[Figure: experiment timeline.] NB: for each voxel, calculate the mean and variance of the activation levels associated with the Simpson family members, and then do the same for the Griffin family. Calculate the difference between the means, calculate their shared S.E., and compute t. Once you have your t value you then need to work out your degrees of freedom…
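As an illustrative sketch (with simulated data, not real fMRI), the same calculation can be vectorised so that a t value is computed for every voxel at once; the array shapes and variable names here are invented for the example:

```python
import numpy as np

# simulated per-trial activation: (x, y, z, n_trials) for each condition
rng = np.random.default_rng(0)
simpsons = rng.normal(500, 10, size=(4, 4, 4, 20))
griffins = rng.normal(498, 10, size=(4, 4, 4, 20))

n1, n2 = simpsons.shape[-1], griffins.shape[-1]

# mean and variance over trials, separately for each voxel
diff = simpsons.mean(axis=-1) - griffins.mean(axis=-1)
se = np.sqrt(simpsons.var(axis=-1, ddof=1) / n1 + griffins.var(axis=-1, ddof=1) / n2)

t_map = diff / se            # one t value per voxel
print(t_map.shape)           # (4, 4, 4)
```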

Degrees of freedom = the number of unconstrained data points, which in this case = the number of data points - 1. The t value and df can be used to find the associated p value, which is then compared to the α level.

Different types of t-test. Two-sample t-tests, like the example above, may be related or independent: related = the two samples are related, i.e. the same people take part in both conditions; independent = two independent samples, i.e. different people in the two conditions. (Q – does fMRI always use related designs?) There are also one-sample t-tests, which compare the mean of one sample to a given value (e.g. 0).
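A small sketch of how these three variants look in code (invented data; the scipy functions named here are the standard ones for each design):

```python
import numpy as np
from scipy import stats

cond1 = np.array([500., 512., 498., 505., 490., 508.])
cond2 = np.array([498., 495., 501., 489., 492., 497.])

print(stats.ttest_rel(cond1, cond2))         # related: same subjects in both conditions
print(stats.ttest_ind(cond1, cond2))         # independent: different people in the two conditions
print(stats.ttest_1samp(cond1 - cond2, 0.0)) # one sample: mean of the differences vs a given value (0)
```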

Another approach to group differences: Analysis Of Variance (ANOVA). Variances, not means; multiple groups, e.g. different facial expressions. H0 = no differences between groups; H1 = differences between groups. ANOVA uses variances instead of means to compare groups. Unlike t-tests, where only 2 groups can be compared, ANOVA can compare across 3, 4, 5 groups or more. For example, do different facial expressions elicit different neuronal activity? We could compare the neural activity associated with happy, sad, and neutral faces.

Calculating F: F = the between-group variance divided by the within-group variance (the model variance / error variance). For F to be significant, the between-group variance should be considerably larger than the within-group variance. F is the statistic computed in ANOVA; it can be defined as the between-group variance divided by the within-group variance, sometimes referred to as the model variance / error variance. If F = 1 this means that the between- and within-group variances are equal, so for F to be significant it must be noticeably larger than one, i.e. the between-group variance should be considerably larger than the within-group variance. Again, this value and its associated degrees of freedom are used to find the p value.
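An illustrative sketch of a one-way ANOVA on made-up data for the three facial-expression conditions; scipy.stats.f_oneway returns F and its p value directly:

```python
import numpy as np
from scipy import stats

happy   = np.array([510., 515., 508., 520., 512.])
sad     = np.array([495., 500., 498., 493., 499.])
neutral = np.array([502., 505., 499., 503., 501.])

f_value, p_value = stats.f_oneway(happy, sad, neutral)
print(f_value, p_value)   # significant only if between-group variance >> within-group variance
```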

What can be concluded from a significant ANOVA? That there is a significant difference between the groups, NOT where this difference lies. Finding exactly where the differences lie requires further statistical analyses.

Different types of ANOVA. One-way ANOVA: one factor with more than 2 levels. Factorial ANOVAs: more than 1 factor. Mixed-design ANOVAs: some factors independent, others related. The type that I have described is referred to as a one-way ANOVA because it has one factor, cartoon character (with more than 2 levels). You can also have two-way or three-way ANOVAs; these are factorial ANOVAs and allow for possible interactions between factors as well as main effects. For example, you could have 2 factors with 2 levels each; this would be a 2 x 2 factorial design. You can also have related or independent designs, or a mixture.

Conclusions: t-tests assess whether two group means differ significantly; they can compare two samples, or one sample to a given value. ANOVAs compare more than two groups, or more complicated scenarios; they use variances instead of means.

Further reading: Howell, Statistical Methods for Psychologists; Howitt and Cramer, An Introduction to Statistics in Psychology; Huettel, Functional Magnetic Resonance Imaging (especially chapter 12). Acknowledgements: MfD slides 2005–2007.

PART 2 Correlation Regression Relevance to GLM and SPM

Correlation: the strength and direction of the relationship between variables. [Scattergrams of Y against X: positive correlation, negative correlation, no correlation.]

Describing correlation: covariance. A statistic representing the degree to which 2 variables vary together: cov(x,y) = Σ(xᵢ - x̄)(yᵢ - ȳ) / (n - 1), cf. the variance formula s² = Σ(xᵢ - x̄)² / (n - 1). But the absolute value of cov(x,y) is also a function of the standard deviations of x and y.

Describing correlation: the Pearson correlation coefficient (r). Equation: r = cov(x,y) / (sₓ s_y), where s = the standard deviation of the sample. r = -1 (maximum negative correlation); r = 0 (no constant relationship); r = 1 (maximum positive correlation). Limitations: sensitive to extreme values; r is an estimate from the sample, but does it represent the population parameter?; it describes a relationship, not a prediction.
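A brief sketch (with invented data) showing the covariance and r computed from the formulas above and checked against numpy/scipy:

```python
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1, 6.8])

n = len(x)
cov_xy = np.sum((x - x.mean()) * (y - y.mean())) / (n - 1)   # covariance formula
r_manual = cov_xy / (x.std(ddof=1) * y.std(ddof=1))          # r = cov(x,y) / (sx * sy)

r_scipy, p = stats.pearsonr(x, y)
print(cov_xy, np.cov(x, y)[0, 1])   # matches numpy's sample covariance
print(r_manual, r_scipy)            # matches scipy's Pearson r
```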

Summary Correlation Regression Relevance to SPM

Regression: prediction of one variable from knowledge of one or more other variables. Regression v. correlation: regression allows you to predict one variable from the other (not just say whether there is an association). Linear regression aims to fit a straight line to the data such that, for any value of x, it gives the best prediction of y.

Best-fit line, minimising the sum of squared errors. Describing the line as in GCSE maths: y = mx + c. Here, ŷ = bx + a, where ŷ is the predicted value of y, b is the slope of the regression line and a is the intercept. Residual error (ε): the difference between the obtained and predicted values of y, i.e. ε = y - ŷ (y = observed, ŷ = predicted). The best-fit line (the values of b and a) is the one that minimises the sum of squared errors, SSerror = Σ(y - ŷ)².

How to minimise SSerror: minimise Σ(y - ŷ)², which is Σ(y - bx - a)². Plotting SSerror for each possible regression line gives a parabola; the minimum SSerror is at the bottom of the curve, where the gradient is zero, and this can be found with calculus. Take the partial derivatives of Σ(y - bx - a)² with respect to a and b, set them to zero and solve as simultaneous equations, giving the values of a and b. [Figure: SSerror plotted against candidate lines forms a parabola; the minimum SSerror is where the gradient = 0.]
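As a sketch of the result of that calculus (the standard least-squares formulas, not spelled out on the slide): b = cov(x,y)/var(x) and a = ȳ - b·x̄, checked here against numpy's own straight-line fit on invented data:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1, 6.8])

# closed-form least-squares solution for y-hat = b*x + a
b = np.cov(x, y)[0, 1] / np.var(x, ddof=1)
a = y.mean() - b * x.mean()

b_np, a_np = np.polyfit(x, y, 1)       # numpy's straight-line fit
print((b, a), (b_np, a_np))            # the two solutions agree

ss_error = np.sum((y - (b * x + a)) ** 2)   # minimised sum of squared errors
print(ss_error)
```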

How good is the model? We can calculate the regression line for any data, but how well does it fit the data? Total variance = predicted variance + error variance: sy² = sŷ² + ser². Also, it can be shown that r² is the proportion of the variance in y that is explained by our regression model: r² = sŷ² / sy². Substituting sŷ² = r²·sy² into sy² = sŷ² + ser² and rearranging gives ser² = sy²·(1 - r²). From this we can see that the greater the correlation, the smaller the error variance, and so the better our prediction.
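A quick numerical check of this decomposition (same invented x, y data as above, with b and a computed as in the previous sketch):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1, 6.8])

b = np.cov(x, y)[0, 1] / np.var(x, ddof=1)
a = y.mean() - b * x.mean()
y_hat = b * x + a

s_y2  = np.var(y, ddof=1)          # total variance
s_yh2 = np.var(y_hat, ddof=1)      # predicted (model) variance
s_er2 = np.var(y - y_hat, ddof=1)  # error variance

r = np.corrcoef(x, y)[0, 1]
print(s_y2, s_yh2 + s_er2)         # total = predicted + error
print(r**2, s_yh2 / s_y2)          # r^2 = proportion of variance explained
```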

Is the model significant? i.e. do we get a significantly better prediction of y from our regression equation than by just predicting the mean? F-statistic: complicated rearranging gives F(dfŷ, dfer) = sŷ² / ser² = … = r²·(n - 2) / (1 - r²). And it follows that t(n-2) = r·√(n - 2) / √(1 - r²). So all we need to know are r and n!
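A short sketch checking these formulas numerically (same invented data; scipy.stats.linregress reports the regression p value to compare against):

```python
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1, 6.8])
n = len(x)

r, _ = stats.pearsonr(x, y)

F = r**2 * (n - 2) / (1 - r**2)             # F with (1, n-2) degrees of freedom
t = r * np.sqrt(n - 2) / np.sqrt(1 - r**2)  # equivalent t statistic
p_from_t = 2 * stats.t.sf(abs(t), n - 2)

res = stats.linregress(x, y)
print(F, t**2)                 # F = t^2 for simple regression
print(p_from_t, res.pvalue)    # matches the regression p value
```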

Summary Correlation Regression Relevance to SPM

General Linear Model: linear regression is actually a form of the General Linear Model, where the parameters are b, the slope of the line, and a, the intercept: y = bx + a + ε. A General Linear Model is just any model that describes the data in terms of a straight line.

One voxel: the GLM. Y = X × β + e, where Y is the data vector (the voxel's time series), X is the design matrix, β is the vector of parameters and e is the error vector. Our aim: solve the equation for β, which tells us how much BOLD signal is explained by X. [Figure: Y = X × β + e shown in matrix form, with the data vector (voxel), design matrix, parameter vector and error vector.]
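A minimal sketch of solving the GLM for one voxel by least squares (simulated design matrix and time series; the regressor and variable names are invented for the example):

```python
import numpy as np

rng = np.random.default_rng(0)
n_scans = 100

# design matrix X: one task regressor (a simple on/off boxcar) plus a constant column
boxcar = np.tile(np.r_[np.ones(10), np.zeros(10)], n_scans // 20)
X = np.column_stack([boxcar, np.ones(n_scans)])

# simulated voxel time series Y = X*beta + noise
true_beta = np.array([2.0, 500.0])
Y = X @ true_beta + rng.normal(0, 1, n_scans)

# least-squares estimate of beta: how much BOLD signal is explained by X
beta_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
print(beta_hat)    # close to the true [2.0, 500.0]
```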

Multiple regression is used to determine the effect of a number of independent variables, x1, x2, x3 etc., on a single dependent variable, y. The different x variables are combined in a linear way and each has its own regression coefficient: y = b0 + b1x1 + b2x2 + … + bnxn + ε. The b parameters reflect the independent contribution of each independent variable, x, to the value of the dependent variable, y, i.e. the amount of variance in y that is accounted for by each x variable after all the other x variables have been accounted for.

SPM Linear regression is a GLM that models the effect of one independent variable, x, on one dependent variable, y Multiple Regression models the effect of several independent variables, x1, x2 etc, on one dependent variable, y Both are types of General Linear Model This is what SPM does and will be explained soon…

Summary Correlation Regression Relevance to SPM Thanks!