Life After P-hacking (APS May 2013, Washington DC). With minor edits for posting. Uri Simonsohn, Penn (gave the talk); Leif Nelson, UC Berkeley; Joe Simmons.

Presentation transcript:

Life After P-hacking (APS May 2013, Washington DC). With minor edits for posting. Uri Simonsohn, Penn (gave the talk); Leif Nelson, UC Berkeley; Joe Simmons, Penn.

Definition. P-hacking: exploiting researcher degrees of freedom in pursuit of p < .05.
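One researcher degree of freedom is deciding when to stop collecting data. The simulation below is a minimal sketch, not taken from the talk, of how checking for p < .05 after each batch of subjects inflates the false-positive rate when there is no true effect. The batch sizes (20, 30, 40, 50 per cell), the known-variance z-test, and the function names are all illustrative assumptions.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

# Two-sided critical value for alpha = .05
Z_CRIT = NormalDist().inv_cdf(0.975)


def significant(a, b):
    """Two-sided z-test for equal means, assuming a known SD of 1 per group."""
    n = len(a)  # both groups have the same size
    z = (fmean(a) - fmean(b)) / sqrt(2 / n)
    return abs(z) > Z_CRIT


def false_positive_rate(looks, n_sims=4000, seed=1):
    """Share of null simulations declared significant at any of the looks.

    `looks` lists the per-cell sample sizes at which the data are tested
    (hypothetical values). Both groups are drawn from the same N(0, 1)
    population, so every significant result is a false positive.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        a, b = [], []
        for n in looks:
            # Collect more subjects until each cell reaches n
            while len(a) < n:
                a.append(rng.gauss(0, 1))
                b.append(rng.gauss(0, 1))
            if significant(a, b):
                hits += 1
                break  # stop collecting as soon as p < .05
    return hits / n_sims


fixed_rate = false_positive_rate([50])                # sample size set in advance
peeking_rate = false_positive_rate([20, 30, 40, 50])  # test after every batch
```

Comparing `fixed_rate` with `peeking_rate` shows the cost of this single degree of freedom; dropping measures or conditions would compound it.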

Life after p-hacking: n>50; Direct replications; 21 words; Compromise writing; Who to hire; What about Bayesian?

Median study: n≈20.
False-Positive Psych: n>20.
What can you reliably detect with n=20?
MTurk study:
– N=674
– Why not published ds?

n=20 is enough for:
– Men taller than women: n=6
– People above median age closer to retirement: n=10
– Women, more shoes than men: n=15

n=20 is not enough for:
– People who like spicy food are more likely to like Indian food: n=27
– Liberals rate social equality as more important than do conservatives: n=34
– People who like eggs report eating egg salad more often: n=47
– Men weigh more than women: n=47
– Smokers think smoking is less likely to kill someone than do non-smokers: n=146


Are you studying a bigger effect than: Men weigh more than women? If not, use n>50
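The n>50 rule of thumb can be related to a standard power calculation. The sketch below is not from the talk: it uses a normal approximation to the two-sample t-test, and the effect size `d` (Cohen's d) is a hypothetical input rather than one of the MTurk estimates.

```python
from math import ceil, sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def power_two_groups(d, n_per_cell, alpha=0.05):
    """Approximate power of a two-sided, two-group comparison.

    Normal approximation: treats the test statistic as N(shift, 1),
    where shift = d * sqrt(n_per_cell / 2).
    """
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = d * sqrt(n_per_cell / 2)
    # Probability of landing in either rejection region
    return _STD_NORMAL.cdf(shift - z_crit) + _STD_NORMAL.cdf(-shift - z_crit)


def n_per_cell_for_power(d, power=0.80, alpha=0.05):
    """Approximate per-cell n needed to detect effect size d.

    Ignores the negligible far-tail rejection probability.
    """
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_power = _STD_NORMAL.inv_cdf(power)
    return ceil(2 * ((z_crit + z_power) / d) ** 2)
```

For instance, `power_two_groups(0.5, 20)` gives the approximate power at n=20 per cell for a hypothetical d of 0.5, and `n_per_cell_for_power(0.5)` gives the per-cell n for 80% power. An exact t-test calculation would return slightly larger sample sizes.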

Life after p-hacking: n>50; Direct replications; 21 words; Compromise writing; Who to hire; What about Bayesian?

p < .03. Estimates are way off. Subjects confused? Big outliers.

p <.03 Study 1?

Run calories study again. Same exclusion rule.

Why not just conceptual replication?
– Restart p-hacking clock
– Failures do not count

Replications
Conceptual:
– Rule out confounds
– Rule in generalizability
Direct:
– Rule out false-positive

Life after p-hacking: n>50; Direct replications; 21 words (Google it); Compromise writing; Who to hire; What about Bayesian?

How can an organic farmer compete?

How can an organic researcher compete?
– If you determined sample size in advance: say it.
– If you did not drop variables: say it.
– If you did not drop conditions: say it.

21 Word Solution
Footnote 1: "We report how we determined our sample size, all data exclusions (if any), all manipulations, and all measures in the study."
Organic farmer; organic researcher.

Life after p-hacking: n>50; Direct replications; 21 words; Compromise writing; Who to hire; What about Bayesian?

Compromise writing
While reviewers are still in the dark ages, have it both ways.
Clean version in main text:
– All studies worked & < 2500 words
Supplement/footnote:
– n=100 n=150
– p=.08 w/o exclusion
– Data and materials online
Only reformers read the small print. The organic 21 words apply. Everybody likes the paper.

Life after p-hacking: n>50; Direct replications; 21 words; Compromise writing; Who to hire; What about Bayesian?

If you hire based on quantity you pass on these guys

What's the alternative to counting papers?
– Rookies: Best 1
– Tenure: Best 3
– Full: Best 5
Try it. It is a powerful question: What's her best paper?

Life after p-hacking: n>50; Direct replications; 21 words; Compromise writing; Who to hire; What about Bayesian?
Only speak for myself here. My prior: Bayesians will be unhappy in 3, 2, 1

P-hacking also invalidates Bayesian results

Let me say that again

Bayesian proposals for Psych
1) Bayesian t-test
– Replications use it sometimes
– Turns out: α = 5%
2) Bayesian estimation
– Latest JEP:G
– Turns out: changes nothing
1%

t-test vs. Bayesian estimation: changes nothing. How similar? Results change by less than if we dropped 1 observation at random.

But! Isn't data-peeking OK for Bayes?
– Not when used for hypothesis testing
Also:
– Dropped subjects, measures, and conditions invalidate all inference.

P-hacking: Bayesian stats. Drunk driving: leather seats. Good reasons to go Bayesian do not include p-hacking.

Next slide is the last.

Life after p-hacking: n>50; Direct replications; 21 words; Compromise writing; Who to hire; What about Bayesian?
Only speak for myself here. Leif Nelson, UC Berkeley; Joe Simmons, Penn.