Biostatistics in Practice
Peter D. Christenson, Biostatistician
Session 3: Testing Hypotheses

Session 3 Preparation
We have been using a recent study on hyperactivity to illustrate the concepts in this course. The questions below, based on this paper, are intended to prepare you for Session 3.
1. Look at the bottom panel of Figure 3. Based on what we have discussed about confidence intervals, do you see evidence for a change in hyperactivity under Mix A?
2. Repeat question 1 for placebo.

Session 3 Preparation: #1 and #2

Session 3 Preparation
3. Now look at the fourth vertical bar in this same panel of Fig. 3. Does it agree with your combined conclusions for questions 1 and 2?

Session 3 Preparation: #3

Session 3 Preparation
4. Do you think that the negative conclusion for question #1 has been "proven"?
5. Do you think that the positive conclusion for question #2 has been "proven"?

Session 3 Preparation: #4 and #5
Possible values for the real effect. Zero is "ruled out".

Session 3 Preparation
5. From Tables 1 and 2, we see that ( )/209 = 34% of parents of the younger children and ( )/160 = 19% of parents of the older children were initially interested but did not complete the study. What are the main reported reasons for not completing? Does it seem logical that the rate is higher for the 3-year-olds? Do you have any intuition about whether the 34% vs. 19% difference is large enough to support an age difference, regardless of the logical reason?

Session 3 Preparation: #5
               Younger   Older
Consented        73%      90%
Completed        66%      81%
It is not intuitive whether 73% vs. 90% is a real, reproducible difference.

Session 3 Goals
- Statistical testing concepts
- Three most common tests
- Software
- Equivalence of testing and confidence intervals
- False positive and false negative conclusions

Goal: Do Groups Differ By More than is Expected By Chance?
Cohan (2005) Crit Care Med;33:

Goal: Do Groups Differ By More than is Expected By Chance?
First, we need to:
1. Specify the experimental units (persons? blood draws?).
2. Specify a single outcome for each unit (e.g., yes/no, or the mean or minimum of several measurements).
3. Examine the raw data, e.g., with a histogram, to check that they meet the test's requirements.
4. Specify the group summary measure to be used (e.g., %, mean, or median over units).
5. Choose the particular statistical test for the outcome.

Outcome Type → Statistical Test
Cohan (2005) Crit Care Med;33:
Summary measure   Statistical test
Medians           Wilcoxon test
Percentages       Chi-square test
Means             t-test

Minimal MAP: Group Distributions of Individual Units
[Figure: stem-and-leaf plots of minimal MAP for the AI group (N = 42) and the non-AI group (N = 38); multiply Stem.Leaf by 10.]
→ Approximately normally distributed
→ Use means to summarize groups.
→ Use a t-test to compare means.

Goal: Do Groups Differ By More than is Expected By Chance?
Next, we need to:
1. Calculate a standardized quantity for the particular test, a "test statistic". Often: t = (Diff in Group Means)/SE(Diff).
2. Compare the test statistic to what it is expected to be if (the populations represented by) the groups do not differ. Often: t has approximately a normal bell-curve distribution.
3. Declare the groups to differ if the test statistic deviates too much from the expectation in step 2. Often: |t| > ~2.

t-Test for Minimal MAP: Step 1
1. Calculate a standardized quantity for the particular test, a "test statistic".
Group    N    Mean   Std Dev   SE(Mean)
AI       42   ( )    10.78     1.66 = 10.78/√42
Non-AI   38   ( )    8.71      1.41 = 8.71/√38
Diff in Group Means = 7.2
SE(Diff) ≈ sqrt(SEM_AI² + SEM_Non-AI²) = sqrt(1.66² + 1.41²) ≈ 2.2
→ Test Statistic = t = (Diff in Group Means)/SE(Diff) = 3.28
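As a minimal sketch (Python, standard library only), the Step 1 arithmetic can be reproduced from the group sizes and SDs. The group means are not in this transcript, so the rounded difference of 7.2 is plugged in directly; that is why t comes out near 3.3 rather than exactly 3.28, which presumably used unrounded values.

```python
import math

# Summary statistics from the slide; the rounded difference of means (7.2)
# stands in for the group means, which this transcript does not give.
n_ai, sd_ai = 42, 10.78
n_non, sd_non = 38, 8.71
diff = 7.2

def sem(sd, n):
    """Standard error of a group mean: SD / sqrt(N)."""
    return sd / math.sqrt(n)

se_ai = sem(sd_ai, n_ai)      # about 1.66
se_non = sem(sd_non, n_non)   # about 1.41
se_diff = math.sqrt(se_ai**2 + se_non**2)  # SE of the difference, about 2.18
t = diff / se_diff            # standardized test statistic

print(f"SE(AI) = {se_ai:.2f}, SE(non-AI) = {se_non:.2f}")
print(f"SE(Diff) = {se_diff:.2f}, t = {t:.2f}")
```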

t-Test for Minimal MAP: Step 2
2. Compare the test statistic to what it is expected to be if (the populations represented by) the groups do not differ. Often: t has approximately a normal bell-curve distribution.
[Figure: bell curve of the expected values of the test statistic if the groups do not differ, with a 0.95 chance of falling in the central region; observed t = 3.28.]
The area under a section of the curve is the probability of values in that interval, e.g., 0.5 for 0 to ∞, and 0.14 for -2 to -1.
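The areas quoted for the bell curve can be checked against the standard normal distribution. This sketch uses Python's `statistics.NormalDist` and treats t as exactly standard normal, which is the slide's approximation.

```python
import math
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, SD 1

def area(a, b):
    """Probability that a standard normal value falls between a and b."""
    return Z.cdf(b) - Z.cdf(a)

print(round(area(0, math.inf), 3))      # 0.5
print(round(area(-2, -1), 3))           # 0.136
print(round(area(-1.96, 1.96), 3))      # 0.95
```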

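t-Test for Minimal MAP: Step 3
3. Declare the groups to differ if the test statistic is too deviant. [How much?]
[Figure: the same curve, with a 95% chance in the center and 2.5% in each tail; observed t = 3.28.]
Convention: "Too deviant" is |t| > ~2. "Two-tailed" means that the 5% is allocated equally to the possibility of either group being superior, 2.5% to each.
Conclude: the groups differ, since |t| ≥ 3.28 has less than a 5% chance of occurring if there is no difference between the entire populations.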
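The "~2" convention is the two-tailed 5% cutoff of the standard normal curve. A sketch under that normal approximation (a t distribution with 78 degrees of freedom would give a slightly larger cutoff, about 1.99):

```python
from statistics import NormalDist

def two_tailed_cutoff(alpha=0.05):
    """Cutoff c with P(|Z| > c) = alpha, putting alpha/2 in each tail."""
    return NormalDist().inv_cdf(1 - alpha / 2)

t_obs = 3.28
for alpha in (0.05, 0.01):
    c = two_tailed_cutoff(alpha)
    print(f"alpha = {alpha}: cutoff = {c:.2f}, declare a difference: {abs(t_obs) > c}")
```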

t-Test for Minimal MAP: p value
[Figure: the same curve, with a 95% chance in the center; observed t = 3.28.]
p-value: the probability of a test statistic at least as deviant as the one observed, if the populations really do not differ. Smaller values ↔ more evidence of group differences.
Here, area = p value = 2(0.0007) = 0.0014 << 0.05, so declare the groups to differ.
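The p-value can be computed two ways: with the normal approximation used in the figures, or with a t distribution on 42 + 38 - 2 = 78 degrees of freedom (the equal-variance choice, an assumption here). Python's standard library has no t distribution, so this sketch integrates its density numerically with Simpson's rule; in practice a statistics package would do this.

```python
import math
from statistics import NormalDist

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_two_sided_p(t, df, steps=2000):
    """Two-sided p-value 2 * P(T >= |t|), via Simpson's rule on [0, |t|]."""
    b = abs(t)
    h = b / steps  # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_0_to_b = total * h / 3
    return 2 * (0.5 - area_0_to_b)  # the density is symmetric about 0

t_obs, df = 3.28, 42 + 38 - 2
p_normal = 2 * (1 - NormalDist().cdf(abs(t_obs)))
p_t = t_two_sided_p(t_obs, df)
print(f"normal approximation: p = {p_normal:.4f}")
print(f"t distribution, {df} df: p = {p_t:.4f}")
```

The t-distribution value lands in the same range as the slide's 2(0.0007) ≈ 0.0014 and is larger than the normal value because the t curve has heavier tails; an exact match would need the unrounded t, which the transcript does not give.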

t-Test: Technical Note
There are actually several types of t-tests: equal vs. unequal variance (variance = SD²), depending on whether the SDs differ too much between the groups. [Yes, there is another statistical test for comparing the SDs.]
SE(Diff) ≈ sqrt(SEM_AI² + SEM_Non-AI²) = sqrt(1.66² + 1.41²) ≈ 2.2 is approximate. Software implements more complicated exact formulas.
(AI: N = 42, SD = 10.78, SE(Mean) = 1.66 = 10.78/√42; Non-AI: N = 38, SD = 8.71, SE(Mean) = 1.41 = 8.71/√38.)
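A sketch of the two standard versions, using the summary statistics on this slide: the unequal-variance (Welch) version uses the same sqrt(SEM² + SEM²) standard error together with approximate Welch-Satterthwaite degrees of freedom, while the equal-variance version pools the two SDs. The separate test for comparing SDs is not shown.

```python
import math

n1, sd1 = 42, 10.78   # AI group
n2, sd2 = 38, 8.71    # non-AI group

# Unequal-variance (Welch): squared SEs added, approximate degrees of freedom.
v1, v2 = sd1**2 / n1, sd2**2 / n2
se_welch = math.sqrt(v1 + v2)
df_welch = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

# Equal-variance: pooled variance, then SE and degrees of freedom.
sp2 = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
se_pooled = math.sqrt(sp2 * (1 / n1 + 1 / n2))
df_pooled = n1 + n2 - 2

print(f"Welch:  SE = {se_welch:.3f}, df = {df_welch:.1f}")
print(f"Pooled: SE = {se_pooled:.3f}, df = {df_pooled}")
```

Here the two standard errors differ only slightly (about 2.18 vs. 2.21), so the choice barely changes the conclusion.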

t-Test: Another Note
There are other types of t-tests as well. A two-sided t-test assumes that differences (between groups, or from pre to post) are possible in both directions, e.g., an increase or a decrease. A one-sided t-test assumes that differences can occur in only one direction (only an increase, or only a decrease), or that one group can only have higher (or only lower) responses than the other group. This situation is very rare, and one-sided tests are generally not acceptable.
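One reason one-sided tests draw suspicion: at the same 5% level, the one-sided cutoff is lower and the one-sided p-value is half the two-sided one, so a result is easier to call significant. A sketch under the normal approximation:

```python
from statistics import NormalDist

Z = NormalDist()
alpha = 0.05
t_obs = 3.28

two_sided_cutoff = Z.inv_cdf(1 - alpha / 2)   # about 1.96: alpha split over both tails
one_sided_cutoff = Z.inv_cdf(1 - alpha)       # about 1.645: all of alpha in one tail

p_two_sided = 2 * (1 - Z.cdf(abs(t_obs)))     # deviations in either direction count
p_one_sided = 1 - Z.cdf(t_obs)                # only the pre-specified direction counts

print(f"cutoffs: two-sided {two_sided_cutoff:.3f}, one-sided {one_sided_cutoff:.3f}")
print(f"p-values: two-sided {p_two_sided:.4f}, one-sided {p_one_sided:.4f}")
```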

Back to Paper: Normal Range
Δ = 7.2 is the best guess for the MAP difference between a randomly chosen AI patient and a randomly chosen non-AI patient, without other patient information. What is the "normal" range for AI patients?
(AI: SD = 10.8, N = 42; Non-AI: SD = 8.7, N = 38.)

Back to Paper: Confidence Intervals
Δ = 7.2 is the best guess for the MAP difference between the means of "all" AI and non-AI patients. We are 95% sure that this difference is within ≈ 7.2 ± 2 SE(Diff) = 7.2 ± 2(2.2) = 2.8 to 11.6.
(AI: SD = 10.8, N = 42, SE = 1.66; Non-AI: SD = 8.7, N = 38, SE = 1.41; SE(Diff of Means) ≈ sqrt(SEM_AI² + SEM_Non-AI²) = 2.2.)
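The interval arithmetic as a small sketch; the multiplier 2 is the slide's rounding of the normal value 1.96, and both are shown.

```python
def ci95(diff, se, multiplier=2.0):
    """Approximate 95% confidence interval: diff +/- multiplier * SE."""
    return diff - multiplier * se, diff + multiplier * se

lower, upper = ci95(7.2, 2.2)               # slide's rule of thumb
print(f"{lower:.1f} to {upper:.1f}")        # 2.8 to 11.6
lower196, upper196 = ci95(7.2, 2.2, 1.96)   # slightly narrower
print(f"{lower196:.1f} to {upper196:.1f}")
```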

Back to Paper: t-test
Δ = 7.2 is statistically significant (p = 0.0014); i.e., only 14 of 1000 sets of 80 patients would differ this much or more, if AI and non-AI patients really did not differ in MAP. Is Δ = 7.2 clinically significant?
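One way to make "14 of 1000 sets of 80 patients" concrete is to simulate many studies in which there is truly no difference. This sketch assumes a hypothetical model: normally distributed MAP with the observed SDs (10.78 and 8.71), equal population means, and 42 + 38 patients per study; it uses the unequal-variance t statistic. The fraction of simulated studies with |t| ≥ 3.28 should be on the order of 1 to 2 per 1000.

```python
import math
import random

def mean_var(xs):
    """Sample mean and sample variance (n - 1 denominator)."""
    m = sum(xs) / len(xs)
    return m, sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def welch_t(x, y):
    """Unequal-variance t statistic for mean(x) - mean(y)."""
    mx, vx = mean_var(x)
    my, vy = mean_var(y)
    return (mx - my) / math.sqrt(vx / len(x) + vy / len(y))

def null_exceedance_rate(n_studies=20000, t_obs=3.28, seed=1):
    """Fraction of simulated no-difference studies with |t| >= t_obs.
    Hypothetical model: normal MAP, equal means, SDs 10.78 and 8.71."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_studies):
        ai = [rng.gauss(0.0, 10.78) for _ in range(42)]
        non_ai = [rng.gauss(0.0, 8.71) for _ in range(38)]
        if abs(welch_t(ai, non_ai)) >= t_obs:
            hits += 1
    return hits / n_studies

rate = null_exceedance_rate()
print(f"{rate * 1000:.1f} per 1000 simulated studies")
```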

Confidence Intervals ↔ Tests
[Figure from the hyperactivity paper: confidence intervals corresponding to p > 0.05, p ≈ 0.05, and p < 0.05.]

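Confidence Intervals ↔ Tests
|t| = |Δ/SE(Δ)| < 2
is equivalent to: |Δ| < 2 SE(Δ)
is equivalent to: -2 SE(Δ) < Δ < 2 SE(Δ)
is equivalent to: Δ - 2 SE(Δ) < 0 < Δ + 2 SE(Δ), i.e., 0 lies inside the 95% confidence interval.
So the test is non-significant (p > 0.05) exactly when the 95% confidence interval contains 0.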
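The chain of equivalences can be checked numerically. This sketch uses the slide's multiplier of 2 and compares the test decision with whether the interval Δ ± 2 SE(Δ) excludes 0, for many random (Δ, SE) pairs and then for the MAP example.

```python
import random

def significant(diff, se):
    """Test view: |t| = |diff / SE| exceeds 2."""
    return abs(diff / se) > 2

def ci_excludes_zero(diff, se):
    """Interval view: 0 lies outside diff +/- 2 * SE."""
    lower, upper = diff - 2 * se, diff + 2 * se
    return lower > 0 or upper < 0

rng = random.Random(0)
pairs = [(rng.uniform(-10, 10), rng.uniform(0.1, 5)) for _ in range(10000)]
agree = all(significant(d, s) == ci_excludes_zero(d, s) for d, s in pairs)
print("decisions always agree:", agree)
print("MAP example:", significant(7.2, 2.2), ci_excludes_zero(7.2, 2.2))
```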

Confidence Intervals ↔ Tests
[Figure: 95% confidence intervals for each group.]
Non-overlapping 95% confidence intervals, as here, are sufficient for significant (p < 0.05) group differences. However, non-overlap is not necessary: the intervals can overlap and the groups can still differ significantly.
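Why overlap is not necessary: each group's own interval has a half-width of about 2 SE, so two intervals overlap until the means are 2(SE₁ + SE₂) apart, but the test needs only 2 sqrt(SE₁² + SE₂²), which is always smaller. This sketch uses the MAP standard errors and a hypothetical difference of 5.0 chosen to fall in that gap.

```python
import math

se1, se2 = 1.66, 1.41   # SEs of the two group means (MAP example)

# Each group's own 95% CI is mean +/- 2*SE, so the two intervals overlap
# whenever the difference in means is less than 2*(se1 + se2).
overlap_limit = 2 * (se1 + se2)
# The test declares a difference when |diff| > 2*SE(Diff).
test_limit = 2 * math.sqrt(se1**2 + se2**2)

diff = 5.0  # hypothetical difference, between the two limits
print(f"intervals overlap if diff < {overlap_limit:.2f}")
print(f"test significant if diff > {test_limit:.2f}")
print("overlap:", diff < overlap_limit, " significant:", diff > test_limit)
```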

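Back to Paper: Experimental Units
Cannot use a t-test for comparing lab data when there are multiple blood draws per subject, because draws from the same subject are not independent experimental units.
[Table footnote from the paper: "b at least 100 µg/kg/min of propofol administered at the time of blood draw, or any pentobarbital in the 48 hrs before the blood draw"]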

Tests on Percentages
Is 26.3% vs. 61.9% statistically significant (p < 0.05), i.e., is the difference so large that it would have less than a 5% chance of occurring by chance if the groups do not really differ?
Solution: the same theme as for means. Find a test statistic and compare it to its expected values if the groups do not differ. See next slide.

Tests on Percentages
[Figure: chi-square distribution with expected value 1; 95% chance of values below the cutoff; observed = 10.2.]
Here, the test statistic is built from squared, standardized deviations and is expected to be about 1, rather than a difference expected to be 0. The 95% cutoff for a chi-square with 1 degree of freedom, as in this two-group comparison of percentages, is 3.84. Test statistic = 10.2 >> 3.84, so p < 0.05. In fact, p = 0.002.

Tests on Percentages: Chi-Square
The chi-square test statistic (10.2 in the example) is found by first calculating the expected number of AI patients with MAP < 60, and the same for non-AI patients, if AI and non-AI patients really do not differ in this respect. Chi-square is then the sum, over the cells of the table, of the standardized squared deviations (Observed - Expected)²/Expected. This should be close to 1, as in the graph on the previous slide, if the groups do not differ. The value 10.2 seems too big to have happened by chance (probability = 0.002).
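The chi-square recipe can be sketched directly. The counts are an assumption reconstructed from the percentages and group sizes: 61.9% of 42 AI patients is 26, and 26.3% of 38 non-AI patients is 10 (the only whole-number counts consistent with those percentages). This plain Pearson version, without continuity correction, reproduces the 10.2. Its p-value, about 0.0014, is a bit smaller than the slide's 0.002; the slides do not say which variant of the test was used.

```python
import math
from statistics import NormalDist

def chi_square_2x2(yes1, n1, yes2, n2):
    """Pearson chi-square (no continuity correction) comparing the
    proportions yes1/n1 and yes2/n2; returns (statistic, expected counts, p)."""
    total = n1 + n2
    total_yes = yes1 + yes2
    observed = [yes1, n1 - yes1, yes2, n2 - yes2]
    # Expected counts if both groups share the overall proportion.
    expected = [n1 * total_yes / total, n1 * (total - total_yes) / total,
                n2 * total_yes / total, n2 * (total - total_yes) / total]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # With 1 degree of freedom, chi2 is the square of a standard normal,
    # so P(X >= chi2) = P(|Z| >= sqrt(chi2)) = erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, expected, p

# Reconstructed counts (assumption): AI 26/42 = 61.9%, non-AI 10/38 = 26.3%.
chi2, expected, p = chi_square_2x2(26, 42, 10, 38)
cutoff = NormalDist().inv_cdf(0.975) ** 2   # 95% cutoff for 1 df, about 3.84
print("expected counts:", [round(e, 1) for e in expected])
print(f"chi-square = {chi2:.1f}, 95% cutoff = {cutoff:.2f}, p = {p:.4f}")
```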

Back to t-Test
[Figure: bell curve with a 95% chance in the center and 2.5% in each tail; observed t = 3.28.]
Declare the groups to differ if the test statistic is too deviant. Convention: "Too deviant" is |t| > ~2. Why not choose, say, |t| > 3, so that our chance of wrongly declaring a difference is even smaller, < 1%? How much "deviance" is enough proof?

Graphical Representation of t-test
[Figure: two curves for Δ = effect (difference between group means), one for no real effect (centered at 0) and one for a real effect (centered at 3); effect in study = 1.13. The horizontal axis is Δ itself, not t = Δ/SE(Δ). Values of Δ beyond the cutoff → conclude a real effect.]
\\\ = probability of concluding an effect when there is no real effect (5%).
/// = probability of concluding no effect when there is a real effect (41%).

Graphical Representation of t-test
[Figure: the same two curves for Δ (no real effect at 0, real effect = 3), with the cutoff for concluding a real effect shifted to the right.]
Suppose we need stronger proof, i.e., we shift the cutoff to the right. Then the chance of a false positive is reduced to ~1%, but the chance of a false negative increases to ~60%.
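The trade-off between false positives and false negatives can be sketched with two normal curves for Δ. The transcript does not give the SE behind the figure, so SE = 1.6 here is hypothetical, and the rule "conclude a real effect if Δ exceeds the cutoff" uses the upper tail only, as shaded in the figure. Under these assumptions the 5% cutoff happens to give a false-negative rate near the slide's 41%; the exact rate for the stricter cutoff depends on details not in the transcript, but the direction of the trade-off does not.

```python
from statistics import NormalDist

def error_rates(true_effect, se, alpha):
    """Return (cutoff, false-negative rate, power) for concluding a real effect
    when the observed difference exceeds a cutoff chosen so that the
    false-positive rate under no real effect is alpha (upper tail only)."""
    cutoff = NormalDist(0, se).inv_cdf(1 - alpha)              # no-effect curve
    false_negative = NormalDist(true_effect, se).cdf(cutoff)   # real-effect curve
    return cutoff, false_negative, 1 - false_negative

SE = 1.6  # hypothetical standard error of the difference
for alpha in (0.05, 0.01):
    cutoff, fn, power = error_rates(true_effect=3.0, se=SE, alpha=alpha)
    print(f"false positive {alpha:.0%}: cutoff {cutoff:.2f}, "
          f"false negative {fn:.0%}, power {power:.0%}")
```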

Power of a Study
Statistical power is the sensitivity of a study to detect real effects, if they exist. In the example two slides back, it is 100% - 41% = 59%. Power is the topic of the next session, #4.