t- and F-tests: Testing hypotheses

Overview
- Distribution & probability
- Standardised normal distribution
- t-test
- F-test (ANOVA)

Starting Point
The central aim of statistical tests is to determine the likelihood of a value in a sample, given that the null hypothesis is true: P(value|H0).
- H0: there is no statistically significant difference between the sample and the population (or between two samples).
- H1: there IS a statistically significant difference between the sample and the population (or between two samples).
- Significance level: P(value|H0) < 0.05
It is quite important to note that this does not refer to the probability of the value itself, but to the probability of the value under the assumption that H0 is true. To decide when to reject H0, we set a significance level, conventionally p < 0.05: if the likelihood of a value in our sample is 5% or less, given that H0 is true, we reject H0.
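As a minimal sketch of computing P(value|H0), suppose the null distribution is normal; the population mean, standard deviation and observed value below are invented for illustration:

```python
from scipy import stats

# Hypothetical null distribution: normal with mean 100, sd 15 (IQ-style units)
mu, sigma = 100.0, 15.0

# Hypothetical observed sample value
x = 135.0

# P(a value at least this extreme | H0), two-tailed
z = (x - mu) / sigma
p = 2 * stats.norm.sf(abs(z))

print(round(z, 3))   # standardised distance from the null mean
print(p < 0.05)      # reject H0 at the 5% level?
```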

Types of Error

                      Population
Decision          H0 true             H1 true
Accept H0         1-a (correct)       b-error (Type II error)
Accept H1         a-error (Type I)    1-b (correct)

There are two types of error that you can make: a Type I (or alpha) error denotes a false positive result, i.e. you accept H1 even though H0 is true. Conversely, a Type II (or beta) error denotes a false negative result, i.e. you accept H0 even though H1 is true. The two remaining cells describe the probability that, given alpha (or beta), you make the correct decision: accepting H0 when it is true (true negative) or rejecting H0, i.e. accepting H1, when H1 is true (true positive). The way we decide whether a given value is highly unlikely (i.e. statistically significant) is to look at the underlying distribution.

Distribution & Probability
If we know something about the distribution of events, we know something about the probability of those events. Many common attributes (for example, foot size) roughly follow the normal distribution. The normal distribution is fully characterised by its mean (mu) and its standard deviation (sigma): the mean is the average of the data, and the standard deviation is a measure of its variability. So given the mean and standard deviation of a normal distribution, you know what it looks like and could in theory draw it. The area from mu - sigma to mu + sigma covers approximately 68% of all values in the normal distribution; from mu - 2*sigma to mu + 2*sigma roughly 95%; and the interval of 3*sigma around mu about 99%. This distribution represents your data under the assumption that H0 is correct. This is important, because if you now discover that, say, Will Penny has size-14 feet, then this is a rather unlikely event.
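The 68/95/99% figures quoted above can be checked directly from the normal cumulative distribution function:

```python
from scipy import stats

# Proportion of a normal distribution lying within k standard deviations of the mean
within = {}
for k in (1, 2, 3):
    within[k] = stats.norm.cdf(k) - stats.norm.cdf(-k)
    print(k, round(within[k], 4))   # ~0.68, ~0.95, ~0.997
```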

Standardised Normal Distribution
The z-score represents a value on the x-axis for which we know the p-value:
- 2-tailed: z = ±1.96 marks roughly 2 SD around the mean, covering 95% ('significant')
- 1-tailed: z = 1.65 (or -1.65) cuts off 95% of the area measured from minus (or plus) infinity

There are infinitely many normal distributions, with infinitely many mu's and sigma's, so the problem of comparability arises: is an IQ score of 115 in one test (ranging from 0-200) comparable to an IQ score of 115 in another (ranging from 50-150)? (Or is a 5° difference in Celsius the same as a 5° difference in Fahrenheit?) One way to make distributions directly comparable is to standardise them by computing a linear transformation: z = (x - mu) / sigma. The standardised normal distribution is exactly that: the normal distribution with mu = 0 and sigma = 1. This can be thought of as expressing your data in the same 'units'. Recall from the previous slide that the range of 2 standard deviations around the mean covers approximately 95%; because the standard deviation of the standardised normal distribution is 1, a z-score of +2 or -2 gives the boundary of the 95% confidence interval. This holds only for 2-tailed tests: compare the area distributed around the mean versus the area from minus infinity to z = 2.0.
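The critical z-values quoted on the slide come from the inverse of the normal CDF (the percent-point function):

```python
from scipy import stats

# Critical z for a two-tailed test at alpha = 0.05 (2.5% in each tail)
z_two = stats.norm.ppf(1 - 0.05 / 2)
print(round(z_two, 2))   # 1.96

# Critical z for a one-tailed test at alpha = 0.05 (5% in one tail);
# this is ~1.645, often quoted rounded as 1.65
z_one = stats.norm.ppf(1 - 0.05)
print(round(z_one, 3))
```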

t-tests: Testing Hypotheses About Means
For a z-test you need to know the population mean and standard deviation. Often you don't know the s.d. of the hypothesised or comparison population, so you use a t-test, which uses the sample s.d. instead. This introduces a source of error, which decreases as your sample size increases. The t statistic is therefore distributed differently depending on the size of the sample, like a family of normal-shaped curves. The degrees of freedom (df = sample size - 1) determine which of these curves you relate your t-value to, and there are different tables of p-values for different degrees of freedom. A larger sample gives a more 'squashed' t distribution, making it easier to reach significance.
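The 'family of curves' idea can be seen by printing the two-tailed critical t-value for increasing degrees of freedom; it shrinks towards the z critical value of 1.96:

```python
from scipy import stats

# Two-tailed critical t at alpha = 0.05 for a range of degrees of freedom
crit = {df: stats.t.ppf(1 - 0.05 / 2, df) for df in (2, 10, 30, 1000)}
for df, value in crit.items():
    print(df, round(value, 3))   # decreases towards 1.96 as df grows
```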

Degrees of Freedom (df)
The number of scores in a sample that are free to vary. E.g. take n = 4 scores with mean = 10: the scores must sum to 4 x 10 = 40, so df = n - 1 = 4 - 1 = 3. If score1 = 10, score2 = 15 and score3 = 5, then score4 is forced to be 10.

Kinds of t-tests
The formula is slightly different for each:
- Single-sample: tests whether a sample mean is significantly different from a pre-existing value (e.g. norms)
- Paired-samples: tests the relationship between two linked samples, e.g. means obtained in two conditions by a single group of participants
- Independent-samples: tests the relationship between two independent populations
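As a sketch, the three kinds of t-test correspond to three different scipy calls; the data below are randomly generated purely for illustration:

```python
import numpy as np
from scipy import stats

# Invented data: 20 participants measured twice, plus a separate group of 20
rng = np.random.default_rng(0)
before = rng.normal(50, 10, size=20)
after = before + rng.normal(3, 5, size=20)     # same participants, linked scores
other_group = rng.normal(55, 10, size=20)      # an independent group

# Single-sample: is the mean of `before` different from a norm of 50?
t1, p1 = stats.ttest_1samp(before, popmean=50.0)

# Paired-samples: two linked measurements on the same participants
t2, p2 = stats.ttest_rel(before, after)

# Independent-samples: two unrelated groups
t3, p3 = stats.ttest_ind(before, other_group)

print(round(t1, 2), round(t2, 2), round(t3, 2))
```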

Independent-Samples t-test: Example
Recall of 40 nouns. Group 1 task: memorise. Group 2 task: memorise and form images. Number of words recalled (raw score columns partially lost in transcription):
Group 1: mean = 19, std = sqrt(40)
Group 2 (imagery): mean = 26, std = sqrt(50)
df = (n1 - 1) + (n2 - 1) = 18
The resulting t exceeds the critical value, so we reject H0.
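The slide's result can be reproduced from the summary statistics alone (df = 18 implies n = 10 per group):

```python
import math
from scipy import stats

# Summary statistics from the word-recall example
t, p = stats.ttest_ind_from_stats(
    mean1=19.0, std1=math.sqrt(40.0), nobs1=10,   # group 1: memorise
    mean2=26.0, std2=math.sqrt(50.0), nobs2=10,   # group 2: memorise + imagery
)
print(round(t, 3))   # -2.333: pooled variance 45, SE 3, t = -7/3
print(p < 0.05)      # True: reject H0
```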

Bonferroni Correction
To control for false positives when making multiple comparisons, divide the significance level by the number of comparisons: alpha_corrected = alpha / m. E.g. for four comparisons, each individual test must reach p < 0.05 / 4 = 0.0125.
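A minimal sketch of the correction; the raw p-values are invented for illustration:

```python
alpha = 0.05
m = 4  # number of comparisons, as in the slide's example

# Bonferroni: evaluate each individual test at alpha / m
alpha_corrected = alpha / m
print(alpha_corrected)   # 0.0125

# A raw p-value only counts as significant if it beats the corrected level
p_values = [0.030, 0.010, 0.200, 0.004]   # hypothetical results
significant = [p < alpha_corrected for p in p_values]
print(significant)   # [False, True, False, True]
```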

F-tests / Analysis of Variance (ANOVA)
t-tests allow inferences about 2 sample means, but what if you have more than 2 conditions? E.g. placebo, drug 20mg, drug 40mg, drug 60mg. Comparing every pair requires six t-tests:
Placebo vs. 20mg, Placebo vs. 40mg, Placebo vs. 60mg, 20mg vs. 40mg, 20mg vs. 60mg, 40mg vs. 60mg
The chance of making a Type I error increases as you do more t-tests. ANOVA controls this error by testing all means at once; it can compare any number k of means. The drawback is a loss of specificity.
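The inflation of the Type I error rate can be quantified: for m independent tests at alpha = 0.05, the chance of at least one false positive is 1 - (1 - alpha)^m.

```python
# Familywise Type I error rate for m independent tests at alpha = 0.05
alpha = 0.05
familywise = {}
for m in (1, 3, 6):
    familywise[m] = 1 - (1 - alpha) ** m
    print(m, round(familywise[m], 3))   # rises well above 0.05 as m grows
```

With the six pairwise drug comparisons above, the familywise error rate is already about 26%.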

F-tests / Analysis of Variance (ANOVA)
There are different types of ANOVA depending upon the experimental design (independent, repeated-measures, multi-factorial).
Assumptions:
- observations within each sample are independent
- samples are normally distributed
- samples have equal variances

F-tests / Analysis of Variance (ANOVA)
F = obtained difference between sample means / difference expected by chance (error)
i.e.
F = variance (differences) between sample means / variance (differences) expected by chance (error)
The difference between sample means is easy for 2 samples (e.g. X1 = 20, X2 = 30, difference = 10), but if X3 = 35 the concept of 'difference between sample means' gets tricky.

F-tests / Analysis of Variance (ANOVA)
The solution is to use variance, which is related to the SD: standard deviation = sqrt(variance).
E.g.:
Set 1: 20, 30, 35 → s² = 58.3
Set 2: 28, 30, 31 → s² = 2.33
These two variances provide a relatively accurate representation of the size of the differences.
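The slide's variances can be checked directly (sample variance divides by n - 1, matching the df idea above):

```python
import numpy as np

set1 = [20, 30, 35]
set2 = [28, 30, 31]

# ddof=1 gives the sample variance (divide by n - 1), matching the slide
v1 = np.var(set1, ddof=1)
v2 = np.var(set2, ddof=1)
print(round(v1, 1))   # 58.3
print(round(v2, 2))   # 2.33
```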

F-tests / Analysis of Variance (ANOVA)
Simple ANOVA example. Total variability partitions into:
- Between-treatments variance, which measures differences due to: (1) treatment effects, (2) chance
- Within-treatments variance, which measures differences due to: (1) chance alone

F-tests / Analysis of Variance (ANOVA)
F = MS_between / MS_within
When the treatment has no effect, differences between groups/treatments are entirely due to chance; numerator and denominator will be similar, and the F-ratio should have a value around 1.00. When the treatment does have an effect, the between-treatment differences (numerator) should be larger than chance (denominator), and the F-ratio should be noticeably larger than 1.00.

F-tests / Analysis of Variance (ANOVA)
Simple independent-samples ANOVA example:

        Placebo   Drug A   Drug B   Drug C
Mean    1.0       1.0      4.0      6.0
SD      1.73      1.0      1.0      1.73
n       3         3        3        3

F(3, 8) = 9.00, p < 0.05. There is a difference somewhere, so we have to use post-hoc tests (essentially t-tests corrected for multiple comparisons) to examine it further.
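The F-ratio for this example can be recovered from the summary table alone, following the between/within partition described above:

```python
import numpy as np

# Summary statistics from the slide's table
means = np.array([1.0, 1.0, 4.0, 6.0])
sds = np.array([1.73, 1.0, 1.0, 1.73])
n = 3                       # participants per group
k = len(means)              # number of groups

grand_mean = means.mean()   # groups are equal-sized, so this is the grand mean

# Between-treatments: variability of the group means around the grand mean
ss_between = n * np.sum((means - grand_mean) ** 2)
ms_between = ss_between / (k - 1)          # df_between = k - 1 = 3

# Within-treatments: pooled variability inside each group
ss_within = np.sum((n - 1) * sds ** 2)
ms_within = ss_within / (k * (n - 1))      # df_within = k(n - 1) = 8

F = ms_between / ms_within
print(round(F, 1))   # 9.0, matching F(3, 8) = 9.00 on the slide
```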

F-tests / Analysis of Variance (ANOVA)
It gets more complicated than that, though. A bit of notation first:
- An independent variable is called a factor, e.g. if we compare doses of a drug, then dose is our factor.
- The different values of our independent variable are its levels, e.g. 20mg, 40mg and 60mg are the 3 levels of our factor.

F-tests / Analysis of Variance (ANOVA)
ANOVA can test more complicated hypotheses. Example: a 2-factor ANOVA (data modelled on Schachter, 1968).
Factors:
- Weight: normal vs. obese participants
- Stomach: full vs. empty
Participants have to rate 5 types of crackers; the dependent variable is how many they eat. This experiment is a 2x2 factorial design: 2 factors x 2 levels.

F-tests / Analysis of Variance (ANOVA)
Mean number of crackers eaten:

         Empty   Full
Normal   22      15     (row total = 37)
Obese    17      18     (row total = 35)
Total    = 39    = 33

Result: no main effect for factor A (normal/obese: 37 vs. 35), and no main effect for factor B (empty/full: 39 vs. 33).
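The pattern of weak main effects alongside a crossing interaction can be sketched from the cell means alone:

```python
# Cell means from the slide (weight group x stomach state)
normal_empty, normal_full = 22, 15
obese_empty, obese_full = 17, 18

# Main effect of weight: compare row means (nearly equal)
print((normal_empty + normal_full) / 2, (obese_empty + obese_full) / 2)  # 18.5 17.5

# Main effect of stomach: compare column means (also similar)
print((normal_empty + obese_empty) / 2, (normal_full + obese_full) / 2)  # 19.5 16.5

# Interaction: the effect of stomach state differs by weight group
print(normal_empty - normal_full)   # 7: normal participants eat far less when full
print(obese_empty - obese_full)     # -1: obese participants barely change
```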

F-tests / Analysis of Variance (ANOVA)
[Interaction plot: mean number of crackers eaten (y-axis, 14-23) against stomach state (empty vs. full) for the normal and obese groups. The normal group drops from 22 to 15 while the obese group rises slightly from 17 to 18; the lines cross, indicating an interaction.]

F-tests / Analysis of Variance (ANOVA) Application to imaging…

F-tests / Analysis of Variance (ANOVA)
Application to imaging: in the early days, the subtraction methodology was used, with t-tests corrected for multiple comparisons. E.g. (pain or visual task) - (appropriate rest condition) = statistical parametric map.

F-tests / Analysis of Variance (ANOVA)
This is still a fairly simple analysis: it shows the main effect of pain (collapsing across the pain source) and the individual conditions. More complex analyses can look at interactions between factors. (Derbyshire, Whalley, Stenger & Oakley, 2004)

References
- Gravetter & Wallnau, Statistics for the Behavioural Sciences
- Last year's presentation, thank you to Louise Whiteley & Elisabeth Rounis: http://www.fil.ion.ucl.ac.uk/spm/doc/mfd-2004.html
- Google