PSY2004 Research Methods PSY2005 Applied Research Methods Week Eleven Stephen Nunn

What it is. Why it is important.

sensitivity of a statistical test

why stats? variability in the data: lots of different, random sources of variability

we’re trying to see if changes in the Independent Variable (e.g., type of non-word, treatment type) affect scores on the Dependent Variable (e.g., reaction time, no. of days drugs taken)

lots of other things affect the DV: individual differences, time of day, mood, level of attention, etc. etc. Lots of random, unsystematic sources of variation, unrelated to the IV: ‘noise’

sometimes the effects due to the IV are big and strong: easy to see through the noise. But what if the effect you’re looking for is small and weak?

your ‘equipment’ (eyes, statistical test) needs to be sensitive enough to spot it; otherwise you’ll miss it

sensitivity of a statistical test

ability or probability of detecting an effect [when there is one]

sounds like a good thing [but is often ignored]

Reviews of meta-analyses* suggest most social science effect sizes are medium at best, and mostly small (Ellis, 2010)
* meta-analyses combine the results from several studies addressing the same hypotheses

Estimates of power in published psychological research (e.g., Clark-Carter, 1997, looking at the BJP): mean power for medium effects = 0.6; mean power for small effects = 0.2. NB the recommended level of power = 0.8

What does power = 0.2 mean? [when there is an effect to detect] you only have a 20% chance of detecting it [i.e., getting a statistically significant result]

The ‘noise’ will tend to swamp the effect of your IV. Repeated running of the same study would only give a significant result 20% of the time
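
a minimal simulation sketch of this (Python with NumPy/SciPy; the effect size d = 0.3 and n = 30 per group are illustrative choices, not values from the slides): run the ‘same’ underpowered study 10,000 times and count how often it comes out significant

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

def run_study(n=30, d=0.3):
    """One two-group study: the IV shifts the treatment mean by d SDs."""
    control = rng.normal(0.0, 1.0, n)
    treatment = rng.normal(d, 1.0, n)
    return stats.ttest_ind(treatment, control).pvalue

# 'repeated running of the same study': how often is p < .05?
p_values = np.array([run_study() for _ in range(10_000)])
print(f"proportion significant: {(p_values < 0.05).mean():.2f}")  # ~0.2
```

the remaining ~80% of runs are exactly the Type II errors described on the next slide.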

Or, you have an 80% probability of making a Type II error [failing to reject the null hypothesis when it is false]

what affects power? anything that changes the effect / ’noise’ ratio

effect size: all other things being equal, you will have greater power with a bigger effect, less power with a smaller effect
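
a quick way to see this, using statsmodels’ power calculator (the n = 50 per group is an illustrative choice):

```python
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()  # independent-samples t test
for label, d in [("small", 0.2), ("medium", 0.5), ("large", 0.8)]:
    power = analysis.power(effect_size=d, nobs1=50, alpha=0.05)
    print(f"{label} effect (d = {d}): power = {power:.2f}")
# small ~0.17, medium ~0.70, large ~0.98
```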

design: all other things being equal, repeated measures designs are more powerful than independent groups designs because they allow you to remove the ‘noise’ in the data due to individual differences
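
a simulation sketch of why (the effect d = 0.4, the individual-differences SD of 2, and n = 30 are all illustrative assumptions): when the same subjects appear in both conditions, their individual differences cancel out of the comparison

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, d, sims = 30, 0.4, 5_000
sig_rm = sig_ig = 0

for _ in range(sims):
    # repeated measures: same subjects in both conditions,
    # so their individual differences (SD = 2) cancel out
    subj = rng.normal(0, 2.0, n)
    pre = subj + rng.normal(0, 1.0, n)
    post = subj + rng.normal(d, 1.0, n)
    sig_rm += stats.ttest_rel(post, pre).pvalue < 0.05

    # independent groups: different subjects per condition,
    # so individual differences stay in the 'noise'
    g1 = rng.normal(0, 2.0, n) + rng.normal(0, 1.0, n)
    g2 = rng.normal(0, 2.0, n) + rng.normal(d, 1.0, n)
    sig_ig += stats.ttest_ind(g2, g1).pvalue < 0.05

print(f"power, repeated measures:  {sig_rm / sims:.2f}")  # roughly 0.3
print(f"power, independent groups: {sig_ig / sims:.2f}")  # roughly 0.1
```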

cell size: all other things being equal (i.e., for a fixed total N), simpler designs with fewer levels of your IV mean more participants per cell, which will increase power
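
a sketch of this with statsmodels’ ANOVA power calculator: the same total N and the same overall effect size (Cohen’s f), spread over two vs six groups (all the numbers are illustrative):

```python
from statsmodels.stats.power import FTestAnovaPower

analysis = FTestAnovaPower()
total_n = 120                  # fixed total sample size
for k in (2, 6):               # two levels of the IV vs six
    power = analysis.power(effect_size=0.25, nobs=total_n,
                           alpha=0.05, k_groups=k)
    print(f"{k} groups of {total_n // k}: power = {power:.2f}")
# roughly 0.78 with 2 groups vs roughly 0.5 with 6
```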

alpha [the criterion for rejecting H0]: stricter (smaller) alphas DECREASE power; e.g., post-hoc Type 1 error rate corrections such as Bonferroni achieve their control at the expense of power
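
e.g., with statsmodels (d = 0.5 and n = 40 per group are illustrative; 0.05/4 is a Bonferroni correction across four comparisons):

```python
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
for alpha in (0.05, 0.05 / 4):  # uncorrected vs Bonferroni-corrected
    power = analysis.power(effect_size=0.5, nobs1=40, alpha=alpha)
    print(f"alpha = {alpha:.4f}: power = {power:.2f}")
# power drops from roughly 0.6 to roughly 0.4
```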

measures, samples: unreliable measures and heterogeneous samples increase the ‘noise’ and so decrease power

sample size: a larger N gives you more power [from the Central Limit Theorem: increasing N reduces the variability in the sample means, i.e., reduces the ‘noise’]
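
a quick check of that bracketed claim (pure simulation; the population SD of 1 and the sample sizes are arbitrary choices): the spread of sample means shrinks as 1/sqrt(n)

```python
import numpy as np

rng = np.random.default_rng(1)
for n in (10, 40, 160):
    # 10,000 samples of size n from the same population (SD = 1)
    sample_means = rng.normal(0, 1, (10_000, n)).mean(axis=1)
    print(f"n = {n:>3}: SD of sample means = {sample_means.std():.3f} "
          f"(theory: 1/sqrt(n) = {1 / np.sqrt(n):.3f})")
```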

but does this matter?

for the individual researcher: power = 0.2 means you are highly likely to waste time and other resources

for ‘science’: should we not worry more about Type 1 errors? [rejecting H0 when it is true]

maybe, but: there is a common (but mistaken) tendency to interpret non-significant results as evidence for no difference; i.e., a non-significant result due to low power isn’t just a waste of resources, it can also be actively misleading

maybe, but: a strong publication bias in Psychology means Type 1 errors and power are intertwined; i.e., only significant results tend to get published

This bias means that if all H0s were true then all published studies would be Type 1 errors; i.e., keeping the Type 1 error rate at 5% for individual studies, or for research as a whole, doesn’t keep the error rate in the literature at that level, because of the publication bias

Low power across the discipline increases the proportion of published studies that are Type 1 errors; i.e., general low power reduces the proportion of studies with false H0s that reach significance and are therefore published (due to the publication bias). The ratio of Type 1 errors to correct rejections of H0 is therefore increased (Ellis, 2010)

At 80% power:
significant (published): H0 true → Type 1 errors (5%); H0 false → correct rejections of H0 (80%)
non-significant: H0 true → correct failures to reject H0; H0 false → Type 2 errors
ratio of Type 1 errors to correct rejections = 5:80 (6.2%)

At 40% power:
significant (published): H0 true → Type 1 errors (5%); H0 false → correct rejections of H0 (40%)
non-significant: H0 true → correct failures to reject H0; H0 false → Type 2 errors
ratio of Type 1 errors to correct rejections = 5:40 (12.5%)
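
the ratios above are just alpha : power. A two-line check (this assumes, as the table implicitly does, equal numbers of true-H0 and false-H0 studies being run):

```python
# among *published* (i.e., significant) results, how do Type 1 errors
# compare with correct rejections? alpha = .05 throughout
alpha = 0.05
for power in (0.80, 0.40):
    print(f"power = {power:.0%}: Type 1 errors : correct rejections "
          f"= {alpha * 100:.0f}:{power * 100:.0f} ({alpha / power:.1%})")
# power = 80%: 5:80 (6.2%); power = 40%: 5:40 (12.5%)
```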

NB the publication bias also makes it harder to reveal Type 1 errors in the literature; i.e., non-significant failures to replicate a published study (which would reveal it as a possible Type 1 error) are less likely to be published, due to the publishing bias against non-significant findings.

maybe, but: small-sample (i.e., low power) studies tend to over-estimate the size of effects and are more likely to be Type 1 errors (Ellis, 2010); i.e., studies with a small N are more likely to give misleading results (but not always)

low power is BAD for individual researchers and BAD for Psychology as a discipline

what you should do: make sure your own study (e.g., your FYP) has sufficient power; use something like G*Power to calculate the N you need for power (1-β) = 0.8; simplify your designs and only include what’s necessary (an extra IV or condition either reduces power or raises the N you need)
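
if you’d rather stay in code than use G*Power’s GUI, statsmodels can do the same calculation; a sketch for an independent-samples t test (the expected effect size d = 0.5, a ‘medium’ effect, is an illustrative assumption you’d replace with your own estimate):

```python
import math
from statsmodels.stats.power import TTestIndPower

# solve for the N per group needed for power = 0.8 at alpha = .05
n = TTestIndPower().solve_power(effect_size=0.5, power=0.8, alpha=0.05)
print(f"N per group: {math.ceil(n)}")  # 64 per group, 128 in total
```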

what you should do: pay more attention to published studies that have greater power (e.g., larger samples); distrust results from small samples; look for meta-analyses