Beyond Null Hypothesis Testing: Supplementary Statistical Techniques

PSYC 6130, PROF. J. ELDER

Limitations of NHT
Criticisms of NHT date from the 1930s.
– Null hypothesis is rarely true.
– The real question is not about the existence of an effect, but about the nature of the effect:
    – What is the direction of the effect?
    – What is the size of the effect?
    – How important is it?
    – What are the underlying mechanisms (theory)?

Direction of Effect
NHT is reasonably well suited to testing the direction of an effect.
– For example, are tall men more or less likely to be wealthy? (Canadian Community Health Survey, 2004)

Magnitude of Effect
NHT by itself tells us nothing about the magnitude of an effect. This is really a problem of descriptive statistics.
The simplest descriptor of the magnitude of an effect is a point estimate, e.g., the difference between the two sample means.

Magnitude of Effect
A problem with a point estimate is that it suggests a certainty we do not really have.
A more complete and useful description of the magnitude of the effect is provided by a confidence interval.
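As a rough illustration of a confidence interval on a difference of means, here is a minimal sketch. It assumes two independent samples and uses a large-sample normal approximation rather than the t distribution, so for small samples the true interval would be somewhat wider. The function name and arguments are hypothetical, not taken from the lecture.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def diff_of_means_ci(sample1, sample2, confidence=0.95):
    """Approximate CI for mean(sample1) - mean(sample2).

    Assumption: independent samples, normal (z) approximation.
    """
    diff = mean(sample1) - mean(sample2)  # the point estimate
    # Standard error of the difference between two independent sample means.
    se = sqrt(variance(sample1) / len(sample1) + variance(sample2) / len(sample2))
    # Two-sided critical value, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return diff - z * se, diff + z * se
```

The interval is centred on the point estimate, and its width shrinks as the sample sizes grow.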

Importance of Effect
However, even a confidence interval does not really tell us whether a treatment or factor is important.
One way to judge whether a difference of means is ‘big’ is to compare the size of the difference of the means to the values of the means themselves. In the height example, wealthy men are roughly 1.2% taller.

Importance of Effect
However, it is often more meaningful to compare the treatment effect to the overall variation in the measured variable.
We call this normalized measure of the effect the effect size d.
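The formula for d did not survive in this transcript. A common choice, assumed here, is the difference of the sample means divided by the pooled standard deviation (often called Cohen's d). The function name below is hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def effect_size_d(group1, group2):
    """Difference of means in units of the pooled standard deviation.

    Assumption: homogeneity of variance, so the two sample variances
    can be pooled, weighted by their degrees of freedom.
    """
    n1, n2 = len(group1), len(group2)
    pooled_var = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    return (mean(group1) - mean(group2)) / sqrt(pooled_var)
```

Because d is expressed in standard deviation units, it does not depend on the measurement scale or on the sample size.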

Example Effect Sizes
[Figure: example effect sizes, showing the distributions for Group 1 and Group 2]

Importance of Effect
d provides a sense of how much of the variation in the dependent variable is due to the ‘treatment’.

End of Lecture 6 Oct 22, 2008

Theory
Even when augmented with measures of effect size, NHT does not directly tell us about the mechanism by which the treatment affects the dependent variable.
e.g., Wealthy men are taller because…
– Tall men attract wealthy women?
– Wealthy men come from wealthy families that provided better care (e.g., nutrition)?
To understand these relationships, it is not enough to test the significance of effects and to quantify them.
Ultimately, we require detailed, mechanistic (causal), testable theories, and experiments that test these theories.
These theories should generate quantitative predictions that can be compared against experimental outcomes.
The theory that provides the closest quantitative account of the data should be considered our current ‘working hypothesis’ about how the system under study operates.
When comparing theories, we must keep Occam’s razor in mind: a more complex theory should be preferred only if it accounts for the data substantially better.
This process is less dependent on NHT, and more dependent upon model fitting, analysis of variance and cross-validation techniques.

Planning Experiments: Statistical Power

Planning a Study
There are many considerations that go into planning an experiment or study. Here we focus on the statistical considerations.
Some possible questions:
– How many samples (e.g., subjects) will I need for my study?
– I already know that I will only have access to n samples (subjects). Will this be enough?
Answering these questions depends on understanding the relationship between sample size, effect size, and statistical power.

Sample Size and Effect Size Codetermine Power

Statistical Power
Power is defined as the complement of the Type II error rate β: power = 1 − β.
Thus understanding power means understanding Type II errors.

Type I Errors and the Null Hypothesis Distribution (NHD)
To understand Type I errors, we considered the situation where the null hypothesis is true, and modeled the null hypothesis distribution.

Understanding Type II Errors
To understand the factors that determine Type II errors, we need to model the situation when the null hypothesis is false and the alternative hypothesis is true.
The difficulty is that the alternative hypothesis typically encompasses a range of possible population means, and we do not know which one is the correct mean.
But suppose for the moment we did. This defines the alternative hypothesis distribution (AHD), which follows a non-central t distribution.
We will often approximate this as a normal distribution, in order to compute rough estimates of power.

Sampling Distributions of the Difference of the Means
[Figure: NHD and AHD for the difference of the means; vertical axis: probability p]

Standardizing the Alternative Hypothesis Distribution
Just as for the NHD, it is useful to standardize the AHD:

Standardized Distributions of the Difference of the Means
[Figure: standardized NHD and AHD; horizontal axis: t; vertical axis: probability p(t)]

Planning an Experiment: Approximations
Estimates of effect size are always approximate, and so it is reasonable to make approximations when planning a study. For example:

Standardized Distributions of the Difference of the Means
(Assume homogeneity of variance, equal sample sizes)
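Under these two assumptions the standard error of the difference of the means is σ·sqrt(2/n), so the expected t value (the centre of the standardized AHD) is approximately d·sqrt(n/2), where n is the size of each group. The sketch below is an illustration of that relationship. The symbol names are assumptions, since the slide's own notation is not preserved here.

```python
from math import sqrt


def expected_t(d, n_per_group):
    """Approximate centre of the standardized AHD for a two-sample test.

    Assumptions: equal group sizes and homogeneity of variance, so that
    SE(difference) = sigma * sqrt(2 / n) and expected t = d * sqrt(n / 2).
    """
    return d * sqrt(n_per_group / 2)
```

The expected t value grows with both the effect size and the square root of the sample size, which is why the two jointly determine power.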

Estimating Power
[Figure: central t distribution (NHD) and non-central t distribution (AHD), with the expected t value marked; vertical axis: Pr(t)]

Calculating Power from Sample Size and Effect Size
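A minimal sketch of this calculation for a two-sample test, using the normal approximation to the AHD mentioned earlier in place of the non-central t distribution. It assumes equal group sizes and homogeneity of variance. The function name and defaults are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(d, n_per_group, alpha=0.05, two_tailed=True):
    """Rough power estimate from effect size d and per-group sample size."""
    std_normal = NormalDist()
    delta = d * sqrt(n_per_group / 2)  # expected t value under the AHD
    tail = alpha / 2 if two_tailed else alpha
    z_crit = std_normal.inv_cdf(1 - tail)  # critical value under the NHD
    # Probability that the test statistic exceeds the upper critical value.
    power = 1 - std_normal.cdf(z_crit - delta)
    if two_tailed:
        # Small contribution from the opposite rejection region.
        power += std_normal.cdf(-z_crit - delta)
    return power
```

When d is 0 the function returns alpha, as it should: with no true effect, the only rejections are Type I errors.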

Planning Experiments
Planning experiments may involve estimating any one of three variables (sample size, effect size, power) given knowledge or assumptions about the other two.

Example: Height Difference between Men and Women

Example 1.
From this large prior study we know men are on average 5.4” taller than women.
We wish to see if this also applies to University students, i.e., whether male students are taller on average than female students.
What power will we obtain if we have a class of 10 males and 10 females?

Example 2a.
From this large prior study we know men are on average 5.4” taller than women.
We wish to see if this also applies to University students, i.e., whether male students are taller on average than female students.
What sample size do we need to obtain power of 0.8?

Example 2b.
Suppose we only care about differences greater than 1”.
Suppose also that we wish to have power of at least .8 (i.e., 80% chance of rejecting the null hypothesis, given it is false) for a 2-tailed test with α = .05.
What is the maximum sample size worth collecting?
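Examples 2a and 2b both ask for a sample size given a target power and an effect size. One way to sketch this is to invert the normal approximation: the required expected t value is roughly the sum of the critical z value and the z value for the desired power, and n follows from the expected t being d·sqrt(n/2). The effect size d (the height difference divided by the standard deviation) is left as an input because the standard deviation from the prior study is not preserved in this transcript. The function name is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def required_n_per_group(d, power=0.8, alpha=0.05):
    """Approximate per-group n for a 2-tailed, two-sample test.

    Assumptions: equal group sizes, homogeneity of variance, normal
    approximation to the AHD, far rejection tail ignored.
    """
    std_normal = NormalDist()
    # Expected t value needed to reach the target power.
    delta = std_normal.inv_cdf(1 - alpha / 2) + std_normal.inv_cdf(power)
    # Solve delta = d * sqrt(n / 2) for n and round up.
    return ceil(2 * (delta / d) ** 2)
```

For Example 2b, d would be the smallest difference of interest (1”) expressed in standard deviation units. Collecting more than the resulting n mainly buys power to detect differences too small to care about.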

Example 3.
Suppose we are stuck with a sample size of 10 (i.e., 10 men and 10 women). Is it worth doing the study?
Let’s decide that it is not worth doing the study unless we have power of at least .8 (i.e., 80% chance of rejecting the null hypothesis, given it is false) for a 2-tailed test with α = .05.
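With the sample size fixed, the question becomes how large the effect must be to reach the target power. The sketch below solves the same normal-approximation relationship for d. It is an illustration under the stated assumptions, not necessarily the calculation on the original slide, and the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def min_detectable_d(n_per_group, power=0.8, alpha=0.05):
    """Smallest effect size d that yields the target power (2-tailed test).

    Assumptions: equal group sizes, homogeneity of variance, normal
    approximation to the AHD.
    """
    std_normal = NormalDist()
    delta = std_normal.inv_cdf(1 - alpha / 2) + std_normal.inv_cdf(power)
    # Solve delta = d * sqrt(n / 2) for d.
    return delta * sqrt(2 / n_per_group)
```

Under this approximation, 10 subjects per group gives a minimum detectable d of about 1.25, so the study is worth doing only if the true effect is expected to be at least that large.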

Manipulating Power
In theory, power can be manipulated by changing
– Sample size
– Alpha level
– Effect size
    – Increase strength of treatment
    – Decrease variability
        – Control of nuisance variables
        – Matched designs

One-Sample Tests
Note the greater power of one-sample tests, relative to two-sample tests!
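The reason can be sketched from the standard errors. In a one-sample test the standard error of the mean is σ/sqrt(n), whereas for a difference of two independent means (equal n per group, equal variances) it is σ·sqrt(2/n). For the same d and n, the expected t value is therefore larger by a factor of sqrt(2) in the one-sample case. The function names below are hypothetical.

```python
from math import sqrt


def expected_t_one_sample(d, n):
    """Expected t for a one-sample test: SE = sigma / sqrt(n)."""
    return d * sqrt(n)


def expected_t_two_sample(d, n_per_group):
    """Expected t for a two-sample test: SE = sigma * sqrt(2 / n)."""
    return d * sqrt(n_per_group / 2)
```

A larger expected t value shifts the AHD further from the NHD, which is what increases power.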

Unequal Sample Sizes
When samples are of different sizes, apply the same formulas for estimating power, using an average sample size.
The most accurate method is to use the harmonic mean, n_h = 2 n1 n2 / (n1 + n2).
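As a small illustration, the harmonic mean of two group sizes can be computed as follows (the function name is hypothetical). The result can then be used in place of n in the equal-sample-size power formulas.

```python
def harmonic_mean_n(n1, n2):
    """Harmonic mean of two sample sizes: 2 * n1 * n2 / (n1 + n2)."""
    return 2 * n1 * n2 / (n1 + n2)
```

The harmonic mean is pulled toward the smaller group, so groups of 10 and 30 give an effective n of 15 rather than the arithmetic mean of 20.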

Effect Size for Paired Sample Designs
Two methods for computing effect size for paired designs:
– (e.g., Dunlap et al., 1996)
– (e.g., Rosenthal, 1991)
Either method is fine, as long as you know what it means!
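The two formulas are not preserved in this transcript. The sketch below shows two commonly used definitions, which may or may not match the slide exactly: one scales the mean difference by the standard deviation of the original scores, the other by the standard deviation of the difference scores. Which cited source corresponds to which definition is an assumption here, and the function name is hypothetical.

```python
from math import sqrt
from statistics import mean, stdev, variance


def paired_effect_sizes(before, after):
    """Return (d_raw, d_diff) for paired measurements of equal length."""
    diffs = [a - b for a, b in zip(after, before)]
    mean_diff = mean(diffs)
    # (1) Scale by the pooled SD of the original scores, as in an
    #     independent-samples design (ignores the pairing).
    d_raw = mean_diff / sqrt((variance(before) + variance(after)) / 2)
    # (2) Scale by the SD of the difference scores, which shrinks when
    #     the paired measurements are strongly correlated.
    d_diff = mean_diff / stdev(diffs)
    return d_raw, d_diff
```

The two values can differ substantially, so it matters to report which definition was used.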