Presentation on theme: "Z, t, and F tests Making inferences from experimental sample to population using statistical tests."— Presentation transcript:
1 z, t, and F tests: Making inferences from experimental sample to population using statistical tests
2 CONTENTS OF TALK
Distributions and probability: what is a statistical test?
The normal distribution
Inferences from sample to population: hypothesis testing
Central limit theorem
z tests and example
t tests and example
F tests/ANOVA and example
…and finally…
Statistical tests and SPM
3 Distribution & Probability
If we know something about the distribution of events, we know something about the probability that one of these events will occur.
e.g. I know that 75% of people have brown eyes, therefore there is a probability of .75 that the next person I meet will have brown eyes.
We can use information about distributions to decide how probable it is that the results of an experiment looking at variable x support a particular hypothesis about the distribution of variable x in the population. This is the central aim of experimental science.
This is how statistical tests work: test a sample distribution (our experimental results) against a hypothesised distribution, resulting in a ‘p’ value for how likely it is that we would obtain results at least as extreme as ours under the null hypothesis (null hypothesis = there is no effect or difference between conditions), i.e. how likely it is that our results were a fluke!
e.g. in an experiment I measure RT in two different conditions and find a difference between conditions. I want to know whether my data can statistically support the hypothesis that there is a genuine difference in RT between these two conditions in the population as a whole, i.e. that the data come from a population where the means of the two conditions are different. The null hypothesis is therefore that the means are the same, and I want a probability of less than .05 of getting the results I obtained under the null hypothesis.
A statistical test allows me to ‘test’ how likely it is that the sample data come from a parent population with a particular characteristic.
4 The Normal Distribution
Many continuous variables follow a normal distribution, and it plays a special role in the statistical tests we are interested in.
[Figure: normal curve, P(x) plotted against x, with 68% of the distribution lying within 1 s.d. of the mean]
The x-axis represents the values of a particular variable.
The y-axis represents the proportion of members of the population that have each value of the variable.
The area under the curve represents probability, e.g. the area under the curve between two values on the x-axis represents the probability of an individual having a value in that range.
Mean and standard deviation tell you the basic features of a distribution:
mean = average value of all members of the group
standard deviation = a measure of how much the values of individual members vary in relation to the mean
The normal distribution is symmetrical about the mean.
68% of the normal distribution lies within 1 s.d. of the mean.
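As a rough check of the "area under the curve = probability" idea, the sketch below uses Python's standard-library `statistics.NormalDist` to compute the area within 1 and 2 s.d. of the mean. The helper name `prob_between` is ours, introduced for illustration only.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
std_normal = NormalDist(mu=0.0, sigma=1.0)

def prob_between(dist, low, high):
    """Area under the curve of `dist` between low and high (hypothetical helper)."""
    return dist.cdf(high) - dist.cdf(low)

within_1_sd = prob_between(std_normal, -1.0, 1.0)
within_2_sd = prob_between(std_normal, -2.0, 2.0)

print(f"within 1 s.d.: {within_1_sd:.4f}")  # roughly 0.68
print(f"within 2 s.d.: {within_2_sd:.4f}")  # roughly 0.95
```

The same function works for any mean and s.d., because the proportions depend only on the distance from the mean measured in s.d. units.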
5 Sample to Population: Testing Hypotheses
t, z, and F tests mathematically compare the distribution of an experimental sample (i.e. the mean and standard deviation of your results) to a normal distribution whose parameters represent some hypothesised feature of the population, which you think your results support.
How does this work? (without going through the derivation of the equations…!)
…CENTRAL LIMIT THEOREM
6 Central Limit Theorem
Special feature of the normal distribution which underlies its use in statistical tests…
Take k samples, each of size n, from a population, and calculate the mean of each sample. The distribution of those means will approximate a normal distribution (for populations with a finite variance), even if the population itself is not normal. As the sample size n increases, the distribution of sample means tends to a normal distribution centred on the population mean, with s.d. equal to the population s.d. divided by √n.
Because the means of samples tend towards a normal distribution in this way, we can convert the mean of our sample distribution (the experimental results) into a value from a standardised normal distribution.
A z-test achieves this conversion by performing a linear transformation; the equation is given on the next slide.
This can be thought of as expressing your results and your hypothesis in the same ‘units’, so the z-statistic represents a value on the x-axis of the standard normal distribution, for which we know all the p-values.
[Figure: distribution of sample means, centred on the population mean μ, with the sample mean marked and 68% of the distribution lying within 1 s.d. either side of μ]
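A small simulation can make the theorem concrete. The sketch below is illustrative only: the exponential population (mean 1, s.d. 1), the sample size of 30, the 5000 repetitions and the fixed seed are all our own assumptions, chosen because the exponential distribution is clearly not normal.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

def sample_means(k, n):
    """Draw k samples of size n from an exponential population
    (mean 1, s.d. 1) and return the mean of each sample."""
    return [statistics.fmean([random.expovariate(1.0) for _ in range(n)])
            for _ in range(k)]

means = sample_means(k=5000, n=30)

mean_of_means = statistics.fmean(means)   # should sit close to the population mean, 1
sd_of_means = statistics.stdev(means)     # should sit close to 1 / sqrt(30)
expected_sd = 1.0 / 30 ** 0.5

# Share of sample means within 1 s.d. of their centre; roughly 68% if near-normal.
share_within_1_sd = sum(
    abs(m - mean_of_means) <= sd_of_means for m in means
) / len(means)

print(mean_of_means, sd_of_means, expected_sd, share_within_1_sd)
```

Although the individual scores are strongly skewed, the sample means cluster symmetrically around the population mean, which is what lets us refer a sample mean to a normal curve.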
7 z-tests: What are they?
formula: z = (x̄ − μ) / (σ / √n), where x̄ = sample mean, μ = population mean, σ = population s.d., n = size of sample (for a single score, n = 1, this reduces to z = (X − μ) / σ)
Plug in the values, and get a ‘z-value’ which corresponds to a location on the x-axis of a standardised normal distribution (μ = 0, σ = 1).
For the standardised normal distribution we know the probability of obtaining a value at least as extreme as any particular value (area under the curve); this is what you read off from a table of z-values.
Because we are dealing with the probabilities of hypotheses about our sample, there is always a chance you are wrong. Choosing the significance level sets how big you allow this chance to be.
p < .05 = a less than 5% chance that you would obtain a result at least as extreme as yours if the null hypothesis were true. Rejecting the null hypothesis when it is in fact true is a Type 1 error.
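The "plug in the values, then read off the table" step can be sketched as follows. This is a minimal illustration, assuming the usual one-sample form z = (sample mean − population mean) / (population s.d. / √n); the function name `z_test` and the numbers passed to it are hypothetical, not taken from the talk.

```python
from math import sqrt
from statistics import NormalDist

def z_test(sample_mean, pop_mean, pop_sd, n):
    """Return (z, two-tailed p) for a one-sample z-test (illustrative helper)."""
    z = (sample_mean - pop_mean) / (pop_sd / sqrt(n))
    # Area in both tails beyond |z| under the standard normal curve.
    p_two_tailed = 2.0 * NormalDist().cdf(-abs(z))
    return z, p_two_tailed

# Hypothetical numbers, for illustration only.
z, p = z_test(sample_mean=103.0, pop_mean=100.0, pop_sd=15.0, n=100)
print(f"z = {z:.2f}, two-tailed p = {p:.4f}")
```

`NormalDist().cdf` plays the role of the printed z-table here: it returns the area under the standard normal curve below a given z.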
8 z-tests: Worked Example
Battery of psychological tests to judge IQ, from which we have obtained a distribution:
Mean = 50
S.D. = 10
This represents the distribution of the entire population.
We would like to find out the probability of various scores, e.g. which scores are so high that they are obtained by only 10% of the population?
We need to transform the distribution to a STANDARD NORMAL DISTRIBUTION: z = (X − μ) / σ = (X − 50) / 10
Thus we now have a z distribution.
No change in the data, since the new distribution has the same shape and the observations stand in the same relation to each other (same as converting inches to centimeters): we have performed a LINEAR TRANSFORMATION.
Now, a score that was 60 is 1, i.e. the score is 1 S.D. above the mean.
A z score represents the number of S.D. that observation Xi is above or below the mean.
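The worked example can be reproduced in a few lines. The sketch below uses the slide's mean of 50 and S.D. of 10; the helper name `z_score` is ours, and the cut-off for the top 10% comes from the inverse of the standard normal curve rather than from a printed table.

```python
from statistics import NormalDist

scores = NormalDist(mu=50.0, sigma=10.0)   # population distribution from the slide

def z_score(x, dist):
    """Number of S.D. that score x lies above (+) or below (-) the mean."""
    return (x - dist.mean) / dist.stdev

z_60 = z_score(60.0, scores)               # a score of 60 is 1 S.D. above the mean

# z value that cuts off the top 10% of a standard normal distribution.
z_cut = NormalDist().inv_cdf(0.90)

# Undo the linear transformation to express the cut-off as a raw score.
top10_cutoff = scores.mean + z_cut * scores.stdev

print(f"z for a score of 60: {z_60:.1f}")
print(f"z cutting off the top 10%: {z_cut:.2f}")
print(f"raw score cutting off the top 10%: {top10_cutoff:.1f}")
```

Scores above roughly 63 are therefore the ones obtained by only about 10% of this population.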
9 t-tests: Testing Hypotheses About Means
formula: t = (x̄ − μ) / (s / √n), where x̄ = sample mean, μ = hypothesised population mean, s = sample s.d., n = size of sample
For a z-test you need to know the population mean and s.d. Often you don’t know the s.d. of the hypothesised or comparison population, and so you use a t-test. This uses the sample s.d. instead.
This introduces a source of error, which decreases as your sample size increases.
Therefore, the t statistic is distributed differently depending on the size of the sample, forming a family of bell-shaped curves that approach the normal curve as the sample gets larger. The degrees of freedom (d.f. = sample size − 1) represent which of these curves you are relating your t-value to. There are different tables of p-values for different degrees of freedom.
larger sample = more ‘squashed’ t-statistic distribution (lighter tails) = easier to get significance
Kinds of t-tests (the formula is slightly different for these different kinds):
Single-sample: tests whether a sample mean is significantly different from a hypothesised population value (e.g. 0)
Independent-samples: tests the difference between the means of two independent groups
Paired-samples: tests the difference between two linked sets of measurements, for example means obtained in two conditions by a single group of participants
10 t-tests: Worked Example of Single Sample t-test
We know the finger tapping speed in the normal population:
Mean = 100 ms per tap
Finger tapping speed in 8 subjects with caffeine addiction:
Mean = 89.4 ms
Standard deviation = 20 ms
Does this show that caffeine addiction has an effect on tapping speed?
Null Hypothesis H0: tapping speed not faster after caffeine
Preselected significance level was 0.05
Calculate the t value: t(7) = √8 × (89.4 − 100) / 20 = −1.5
Find the area below t(7) = −1.5, get about 0.09: i.e. about 9% of the time we would expect a sample mean as low as this under H0
This value is above 0.05 => we could NOT reject H0!
We can’t conclude that caffeine addiction has an effect on tapping speed.
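The calculation in this example can be checked without a t-table. The sketch below is one possible way to do it using only the standard library: it integrates the density of the t distribution numerically (Simpson's rule) to get the lower-tail area. The function names `t_pdf` and `t_cdf` and the step count are our own choices, not part of the talk.

```python
from math import gamma, pi, sqrt

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(t, df, steps=2000):
    """P(T <= t): 0.5 plus the signed area from 0 to t (Simpson's rule, even steps)."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

# Values from the slide: 8 subjects, sample mean 89.4 ms, s.d. 20 ms, population mean 100 ms.
n, sample_mean, sample_sd, pop_mean = 8, 89.4, 20.0, 100.0
df = n - 1
t_value = sqrt(n) * (sample_mean - pop_mean) / sample_sd
p_lower = t_cdf(t_value, df)   # one-tailed area below the observed t

print(f"t({df}) = {t_value:.2f}, one-tailed p = {p_lower:.3f}")
print("reject H0" if p_lower < 0.05 else "cannot reject H0")
```

The t value comes out at about −1.5 and the lower-tail area is a little under 0.1, which is above the 0.05 level, so the sketch reaches the same decision as the slide.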
11 F-tests / ANOVAs: What are they?
ANOVA = analysis of variance
It involves calculating an F value whose significance is tested (similarly to a z or t value).
Like t-tests, F-tests deal with differences between or among sample means, but with any number of means (each mean corresponding to one condition, i.e. one level of a ‘factor’).
Q/ do k means differ? A/ yes, if the F value is significant
Q/ how do the factors influence each other? A/ look at the interaction effects
ANOVA calculates F values by comparing the variability between conditions with the variability within each condition (this is what the formula does).
e.g. we give a drug that we believe will improve memory to a group of people and give a placebo to another group. We then take dependent measures of their memory performance, e.g. mean number of words recalled from memorised lists.
An ANOVA compares the variability that we observe between the two conditions to the variability observed within each condition. Variability is measured as the sum of the squared differences of each score from the mean.
Thus, when the variability that we predict (between the two groups) is much greater than the variability we don't predict (within each group), we will conclude that our treatments produce different results.
12 F-tests / ANOVAs: What are they?
ANOVA calculates an F value, which has a distribution related to the sample size and number of conditions (degrees of freedom).
The formula compares the variance between and within conditions or ‘factors’ as discussed above: F = MS between / MS within. We won’t worry about the derivation! (n.b. MS = mean squares)
If the F statistic is significant, this tells us that the means of the conditions differ significantly => they are not likely to have come from the same ‘population’ = our variable is having an effect.
When can we use ANOVAs?
The formula is based on a model of what contributes to the value of any particular data point, and how the variance in the data is composed. This model makes a number of assumptions that must be met in order to allow us to use ANOVA:
homogeneity of variance
normality
independence of observations
Remember: when you get a significant F value, this just tells you that there is a significant difference somewhere between the means of the conditions in the ANOVA. Therefore, you often need to do planned or post-hoc comparisons in order to test more specific hypotheses and probe interaction effects.
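The comparison of between-condition and within-condition variability can be sketched for the simplest case, a one-way ANOVA with independent groups. The function name `one_way_f` and the recall scores are made up for illustration; they are not data from the talk.

```python
from statistics import fmean

def one_way_f(groups):
    """Return (F, df_between, df_within) for a one-way independent-groups ANOVA."""
    all_scores = [x for g in groups for x in g]
    grand_mean = fmean(all_scores)
    # Variability between conditions: squared distance of each group mean
    # from the grand mean, weighted by group size.
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    # Variability within conditions: squared distance of each score from its own group mean.
    ss_within = sum((x - fmean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    ms_between = ss_between / df_between   # MS = mean squares
    ms_within = ss_within / df_within
    return ms_between / ms_within, df_between, df_within

# Made-up numbers of words recalled, for illustration only.
drug = [12, 14, 11, 15, 13]
placebo = [9, 10, 8, 11, 12]

f_value, df_b, df_w = one_way_f([drug, placebo])
print(f"F({df_b}, {df_w}) = {f_value:.2f}")
```

With only two groups, F equals the square of the independent-samples t value, which is one way to see that the F-test generalises the t-test to any number of means.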
13 ANOVAs: Worked Example
Testing Differences between independent sample means: Following rTMS over the Right Parietal cortex, are the incorrectly cued trials in a cued RT task slowed down compared to the correctly cued trials?
“Repeated measures” ANOVA:
1 group of 14 healthy volunteers
Perform 100 trials pre- and 100 trials post-stimulation
Real vs Sham rTMS on two separate days
Within-session factors:
Correct vs Incorrect trials
Pre vs Post
Between-session factors:
Real vs Sham rTMS
Null Hypothesis H0: there is no difference in the RTs of incorrectly cued trials
Many possibilities if H0 is rejected:
All means are different from each other: meanICpreR vs. meanICpostR vs. meanICpreS vs. meanICpostS
Means in the Real condition are different from means in the Sham
Interaction of means might be different (pre_post in Real differs from pre_post in Sham)
14 Why do we care? Statistical tests in SPM
Example in a simple block design of the effect of a drug on right hand movement versus rest:
Factorial Design: 2x2, DRUG (Real vs Placebo) by movement (move vs rest)
fMRI: acquired 8 measurements, 2 of each condition
Subjects: 12 healthy volunteers, counterbalanced order
15 Why do we care? Statistical tests in SPM
We perform ANOVAs, t-tests, and F-tests when we create a design matrix and specify contrasts.
Reminder: GLM equation to explain our data y: y = X b + e
X is the design matrix: enter this into SPM to tell the program how to divide up the imaging data into the different conditions. Each column in the matrix represents one condition.
b are the parameters, one per regressor (column of X): assign them specific weights to test specific hypotheses (i.e. CONTRASTS) between conditions.
e = error
In this case: y = (b1x1 + b2x2 + b3x3 + b4x4) + e
Column 1 = right movement with drug
Column 2 = rest with drug
Column 3 = right movement with placebo
Column 4 = rest with placebo
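To make y = X b + e concrete, the sketch below builds a toy design matrix with the four condition columns and fits it to eight made-up measurements (2 per condition). Everything here is an assumption for illustration: the block order, the signal values, and the helper `fit_orthogonal_glm`, which relies on the columns not overlapping and is not how SPM itself estimates the model.

```python
# Columns: 0 = move + drug, 1 = rest + drug, 2 = move + placebo, 3 = rest + placebo
CONDITION_ORDER = [0, 1, 2, 3, 0, 1, 2, 3]   # hypothetical order of the 8 measurements

# One row per measurement, one column per condition (1 if that condition was on).
X = [[1.0 if col == cond else 0.0 for col in range(4)] for cond in CONDITION_ORDER]

y = [5.1, 1.0, 3.9, 1.2, 4.9, 0.8, 4.1, 1.0]  # made-up signal values

def fit_orthogonal_glm(X, y):
    """Least-squares estimates of b for y = X b + e.
    Valid only when the columns of X are mutually orthogonal, which holds
    for this non-overlapping indicator design (an assumption of the sketch)."""
    b = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        b.append(sum(c * v for c, v in zip(col, y)) / sum(c * c for c in col))
    return b

b = fit_orthogonal_glm(X, y)

# e = y - X b: what the model leaves unexplained.
residuals = [yi - sum(xij * bj for xij, bj in zip(row, b)) for row, yi in zip(X, y)]
rss = sum(r * r for r in residuals)
residual_variance = rss / (len(y) - len(b))   # error variance estimate, n - p degrees of freedom

print([round(v, 2) for v in b], round(residual_variance, 3))
```

For this kind of design each estimated parameter is simply the mean signal in its condition; with overlapping or continuous regressors a general least-squares solver would be needed instead.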
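16 Why do we care? t-tests in SPM
A t-contrast is a linear combination of parameters: c’ × b
If we think that 1 regressor in our design matrix (e.g. b1) could lead to an interesting activation, we compute: 1×b1 + 0×b2 + 0×b3 + 0×b4 and divide by its standard error (the estimated s.d. of this combination).
Our question: Is the activity in condition 1 significantly different from zero, i.e. from the unmodelled baseline? To compare condition 1 with the other conditions, the other conditions need negative weights so that the weights sum to 0.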
17 Why do we care? t-tests in SPM
In SPM, we make the weights sum to 0 when testing specific hypotheses.
T-tests in our study would include:
Main effects of movement across all sessions
Main effects of the drug: increases; decreases
Interaction: increases; decreases
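One plausible set of contrast weights for these tests is sketched below. The weights are our reconstruction from the column order given earlier (1 = move with drug, 2 = rest with drug, 3 = move with placebo, 4 = rest with placebo), not the vectors shown in the original slides, and the parameter estimates, error variance and `t_contrast` helper are hypothetical. The standard error formula assumes a non-overlapping design with the same number of scans in every condition.

```python
from math import sqrt

# Assumed weights over columns 1-4; each set sums to 0.
CONTRASTS = {
    "main effect of movement": [1, -1, 1, -1],   # move > rest, across drug and placebo
    "drug increases":          [1, 1, -1, -1],   # drug > placebo
    "drug decreases":          [-1, -1, 1, 1],   # placebo > drug
    "interaction increases":   [1, -1, -1, 1],   # (move - rest) larger under drug
    "interaction decreases":   [-1, 1, 1, -1],   # (move - rest) smaller under drug
}

b = [5.0, 0.9, 4.0, 1.1]     # hypothetical parameter estimates
residual_variance = 0.02     # hypothetical error variance estimate
scans_per_condition = 2      # 8 measurements, 2 of each condition

def t_contrast(c, b, residual_variance, scans_per_condition):
    """t = c'b / standard error of c'b.
    For a non-overlapping indicator design with equal scans per condition,
    the variance of c'b is residual_variance * sum(c_i^2) / scans_per_condition."""
    effect = sum(ci * bi for ci, bi in zip(c, b))
    se = sqrt(residual_variance * sum(ci * ci for ci in c) / scans_per_condition)
    return effect / se

t_values = {name: t_contrast(c, b, residual_variance, scans_per_condition)
            for name, c in CONTRASTS.items()}

for name, t in t_values.items():
    print(f"{name}: t = {t:.1f}")
```

Because the t-contrast is directional, the "decreases" contrast is just the "increases" contrast with every weight negated, and its t value has the opposite sign.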
18 Why do we care? F-tests in SPM
An F-test tests multiple linear hypotheses jointly: does the design matrix X model anything?
F-contrasts in our previous example: are there any differences between the drug and placebo altogether (i.e. increases AND decreases)?
Used if we want to make more general inferences about data:
1) that might not be found with simple averaging (cancellations?)
2) to test for effects that are jointly expressed by several one-dimensional contrasts, e.g. 'all effects of interest' to check whether there is any effect at all
3) in case data is modelled in a more complex way (hrf & deriv)
4) when you have multiple regressors and think that the effect expresses itself in some of them, not only one
5) if you do not have a very clear hypothesis: might be useful to derive more hypotheses to be tested with t-contrasts
=> more details will be given later in the course…
19 References
‘Statistical Methods for Psychology’ (2001), by David Howell
SPM website: