Introduction to Power and Effect Size  More to life than statistical significance  Reporting effect size  Assessing power

Statistical significance  It turns out a lot of researchers do not know what p<.05 actually means  Cohen (1994) article: "The earth is round (p<.05)"  What it means: "Given that H0 is true, what is the probability of these (or more extreme) data?"  Trouble is, most people want to know: "Given these data, what is the probability that H0 is true?"

Always a difference  We commonly define the null hypothesis as ‘no difference’  Differences between groups always exist (at some level of precision)  Obtaining statistical significance can then be seen as just a matter of sample size  Furthermore, the importance and magnitude of an effect are not reflected in the p value (because of the role of sample size in the probability value attained)

What should we be doing?  Want to make sure we have looked hard enough for the difference – power analysis  Figure out how big the thing we are looking for is – effect size

Calculating effect size  Different statistical tests have different effect sizes developed for them  However, the general principle is the same

Types of effect size  Two basic classes of effect size  If focused on means: standardized differences. Allows comparison across samples and variables with differing variance. Equivalent to z scores. Note that sometimes there is no need to standardize (the units of the scale have inherent meaning)  Variance-accounted-for: amount explained versus the total  Some others (see Kirk, 1996)

Cohen’s d – Differences Between Means  Use this with the independent samples t test  Getting at: magnitude of the experimental result  Cohen was one of the pioneers  Defined d as the standardized difference between two means (see below)
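Written out, the definition is in terms of population parameters, with σ the (assumed common) standard deviation of the two populations:

```latex
d = \frac{\mu_1 - \mu_2}{\sigma}
```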

The usual problem  In a theoretical discussion, use of parameters is fine  We want a practical tool  Use sample statistics. Side note: Cohen suggested we could use either sample standard deviation, since they should both be roughly equal. In practice, people now use the pooled standard deviation.
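A minimal sketch of the sample version with a pooled standard deviation; the function name and the example data are made up for illustration:

```python
import numpy as np

def cohens_d(x1, x2):
    """Cohen's d for two independent samples, using the pooled standard deviation."""
    n1, n2 = len(x1), len(x2)
    v1, v2 = np.var(x1, ddof=1), np.var(x2, ddof=1)   # sample variances
    s_pooled = np.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    return (np.mean(x1) - np.mean(x2)) / s_pooled

# Illustrative data, not from the slides
group1 = np.array([5.1, 6.3, 5.8, 7.0, 6.1])
group2 = np.array([4.2, 5.0, 4.8, 5.5, 4.9])
print(round(cohens_d(group1, group2), 2))
```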

Characterizing effect size  Cohen emphasized that the interpretation of effects requires the researcher to consider things narrowly in terms of the specific area of inquiry  Evaluation of effect sizes inherently requires a personal value judgment regarding the practical or clinical importance of the effects

  2 =  A measure of the degree to which variability among observations can be attributed to conditions Eta-squared

Small, medium, large? Cohen (1969)  ‘small’: real, but difficult to detect; the difference between the heights of 15 year old and 16 year old girls in the US  ‘medium’: ‘large enough to be visible to the naked eye’; the difference between the heights of 14 & 18 year old girls  ‘large’: ‘grossly perceptible and therefore large’; the difference between the heights of 8 & 18 year old girls

How big?  Cohen (e.g. 1969, 1988) offers some rules of thumb, now a fairly widespread convention  Looked at the social science literature and suggested some ways to carve results into small, medium, and large effects  In terms of d: 0.2 small, 0.5 medium, 0.8 large. In terms of η²: .10+ large  Be wary of “mindlessly invoking” these (or any other) criteria, especially for small samples

Maximum Power!  In statistics we want to give ourselves the best chance to find a significant result if one exists  Power represents the probability of finding that significant result if it exists  Power = 1 − β  β is our Type II error rate, or the probability of retaining the null when we should have rejected it

Two kinds of power analysis  A priori: used when planning your study. What sample size is needed?  Post hoc: used when evaluating a study. What chance did you have of significant results? Not really appropriate. If you do the power analysis and conduct your analysis accordingly, then you did what you could. To say afterwards, “I would have found a difference but didn’t have enough power” isn’t going to impress anyone.

A priori power  We can use all this to calculate how many subjects we need to run  Decide on an acceptable level of power  Have a standard α level  Figure out the desirable or expected effect size  Calculate N (a sketch of the calculation follows)
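One way to run that calculation in software; this is a sketch that assumes the Python statsmodels package is available, and the effect size, α, and power values are just example numbers:

```python
from statsmodels.stats.power import TTestIndPower

# Solve for n per group in an independent-samples t test,
# given an expected effect size, an alpha level, and a desired level of power.
analysis = TTestIndPower()
n_per_group = analysis.solve_power(effect_size=0.5,        # expected Cohen's d
                                   alpha=0.05,             # Type I error rate
                                   power=0.80,             # 1 - beta
                                   alternative='two-sided')
print(n_per_group)   # roughly 64 per group for these inputs
```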

A priori Effect Size?  Figure out an effect size before I run my experiment?  Several ways to do this:  Base it on substantive knowledge (what you know about the situation and the scale of measurement)  Base it on previous research  Use Cohen’s conventions

An acceptable level of power?  Why not set power at .99?  Practicalities: the cost of increasing power (usually done by increasing n) can be high

Post hoc power  If you fail to reject the null hypothesis, you might want to know what chance you had of finding a significant result – defending the failure  This is a little dubious  Better used in a post hoc fashion to figure out the likelihood of other experiments replicating your results

Carrying out the calculation  When you actually have to implement power calculations, you can use specialist programs – there are lots of websites with applications to do the calculation for you

The hard way  Use the sample mean (or difference score) and the raw score associated with your alpha cutoff point (i.e. convert the critical value to a raw score)  Use z-score methods to help you find the percentages for beta and/or 1 − beta (sketched below)
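A sketch of that hand calculation for the simplest case (a one-sample test with σ known, so z rather than t); the numbers are invented for illustration and scipy supplies the normal-distribution functions:

```python
import numpy as np
from scipy.stats import norm

# Illustrative values, not from the slides
mu0, mu1 = 100, 105      # mean under H0 and the mean we think is really there
sigma, n = 15, 30        # population SD and planned sample size
alpha = 0.05             # two-tailed alpha

se = sigma / np.sqrt(n)                          # standard error of the mean
crit_raw = mu0 + norm.ppf(1 - alpha / 2) * se    # alpha cutoff as a raw score

# beta = chance the sample mean falls below the cutoff even though H1 is true
# (the lower rejection tail is ignored; it is negligible for these numbers)
beta = norm.cdf(crit_raw, loc=mu1, scale=se)
power = 1 - beta
print(round(beta, 3), round(power, 3))
```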

Reminder: Factors affecting Power  Alpha level  Sample size  Effect size  Variability
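A quick numerical illustration of how each factor moves power, reusing the statsmodels solver from the a priori sketch (the particular numbers are arbitrary; greater variability enters by shrinking the standardized effect size):

```python
from statsmodels.stats.power import TTestIndPower

solver = TTestIndPower()
base = dict(effect_size=0.5, nobs1=30, alpha=0.05)     # baseline scenario

print(solver.power(**base))                            # baseline power
print(solver.power(**{**base, 'alpha': 0.10}))         # larger alpha  -> more power
print(solver.power(**{**base, 'nobs1': 60}))           # larger n      -> more power
print(solver.power(**{**base, 'effect_size': 0.8}))    # larger effect -> more power
```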

Howell’s general rule  Look for big effects or  Use big samples