2 Structure of the book
Section 1 (Ch 1–10): Basic concepts and techniques
Section 2 (Ch 11–15): Inference for quantitative outcomes
Section 3 (Ch 16–19): Inference for categorical outcomes
3 In Chapter 16:
+ 16.1 Proportions
− 16.2 Sampling Distribution of a Proportion
+ 16.3 Hypothesis Test, Normal Approximation
− 16.4 Hypothesis Test, Exact Binomial
+ 16.5 CI for parameter p
− 16.6 Sample Size and Power
(+ ≡ covered in introductory recorded lecture)
4 Binary Variables
Binary variable ≡ two possible outcomes: “success” or “failure”
Examples: current smoker (Y/N); gender (M/F); survival 5+ years (Y/N); case or non-case (Y/N)
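5 Binomial Proportions
Start by calculating the sample proportion “p-hat”: $\hat{p} = x/n$, where x is the number of successes and n is the sample size.
Illustration: An SRS of 57 adults identifies 17 smokers. Therefore, the sample proportion is $\hat{p} = 17/57 = .2982$.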
6 Two Special Proportions
Incidence proportion ≡ proportion of those at risk who develop a condition over a specified period of time ≡ “average risk”
Prevalence proportion ≡ proportion with the characteristic or condition at a particular time
8 Sampling Distribution of a Proportion*
The binomial pmf forms the basis of the inferential methods in this chapter.
Normal approximations to the binomial are appropriate in all but small samples.
The sampling distribution of a proportion is addressed in §16.2.
* The basis of sampling distributions is often misunderstood by introductory students.
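A quick simulation can make the idea of a sampling distribution concrete: draw many samples of the same size, compute $\hat{p}$ each time, and look at how those values spread. The sketch below assumes a population proportion of p = .21 (the national prevalence used later in the chapter) and n = 57; the function name `simulate_phat`, the seed, and the 10,000 repetitions are illustrative choices. The simulated mean of $\hat{p}$ should sit near p, and its SD near $\sqrt{p(1-p)/n}$.

```python
import math
import random
import statistics


def simulate_phat(p, n, reps=10_000, seed=1):
    """Hypothetical helper: draw `reps` samples of size n from a population
    with success probability p and return the sample proportion of each."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    phats = []
    for _ in range(reps):
        x = sum(rng.random() < p for _ in range(n))  # binomial count of successes
        phats.append(x / n)
    return phats


p, n = 0.21, 57  # assumed population proportion and sample size
phats = simulate_phat(p, n)
print(f"mean of p-hat: {statistics.fmean(phats):.4f} (theory: {p:.4f})")
print(f"SD of p-hat:   {statistics.stdev(phats):.4f} "
      f"(theory: {math.sqrt(p * (1 - p) / n):.4f})")
```

A histogram of `phats` would look roughly bell-shaped at this sample size, which is why the normal approximation works here.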
9 Confidence Interval for p
The true value of parameter p will never be known with absolute certainty, but it can be estimated with (1 – α)100% confidence.
11 Illustration
Data: 17 smokers in an SRS of n = 57
For 95% confidence, use $z_{1-.05/2} = z_{.975} = 1.96$
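The interval formula itself did not survive on this slide. The sketch below assumes the plus-four method named elsewhere in the deck: add 2 successes and 2 failures, so $\tilde{p} = (x+2)/(n+4)$ and $SE = \sqrt{\tilde{p}(1-\tilde{p})/(n+4)}$, then take $\tilde{p} \pm z \cdot SE$. With x = 17 and n = 57 it reproduces the (.195 to .428) interval quoted on slide 18. The function name `plus_four_ci` is hypothetical.

```python
import math


def plus_four_ci(x, n, z=1.96):
    """Plus-four CI for a binomial proportion: add 2 successes and
    2 failures, then apply the usual normal-approximation interval."""
    p_tilde = (x + 2) / (n + 4)                          # plus-four point estimate
    se = math.sqrt(p_tilde * (1 - p_tilde) / (n + 4))    # standard error
    return p_tilde - z * se, p_tilde + z * se


lower, upper = plus_four_ci(17, 57)
print(f"95% plus-four CI: ({lower:.3f}, {upper:.3f})")  # (0.195, 0.428)
```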
12 Computer Programs
Data: x = 17 and n = 57
Wilson’s CI method produces results that are similar (but not identical) to those of the plus-four method.
Computation is also available via www.OpenEPI.com
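For comparison, here is a sketch of the Wilson score interval (without continuity correction), using the standard formula $\frac{\hat{p} + z^2/2n \pm z\sqrt{\hat{p}(1-\hat{p})/n + z^2/4n^2}}{1 + z^2/n}$. Whether OpenEPI reports exactly this variant or a continuity-corrected one is an assumption to check against the program itself; the function name `wilson_ci` is hypothetical.

```python
import math


def wilson_ci(x, n, z=1.96):
    """Wilson score interval for a binomial proportion (no continuity correction)."""
    p_hat = x / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half_width = (z / denom) * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2))
    return center - half_width, center + half_width


lower, upper = wilson_ci(17, 57)
print(f"95% Wilson CI: ({lower:.3f}, {upper:.3f})")  # (0.195, 0.427)
```

The upper limit differs from the plus-four result (.428) only in the third decimal, which illustrates “similar but not identical.”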
13 Testing a Proportion
A. Hypothesis statements
B. z statistic or binomial procedure*
C. P-value with interpretation
* In small samples (fewer than 5 successes expected), the exact binomial procedure is required.
14 Hypothesis Testing Example
A survey in a particular community shows a sample prevalence of 17 of 57 (.2982). The National Center for Health Statistics reports a national prevalence of 21%. Do the data support the claim that the particular community has a prevalence that exceeds 21%?
Under the null hypothesis: $H_0: p = .21$
Note: The value of p under the null hypothesis (call this $p_0$) comes from the research question, NOT the data.
$H_a: p > .21$ (one-sided) or $H_a: p \neq .21$ (two-sided)
16 P-value (example)
Use Table B or a computer program to determine the area under the curve beyond $z_{stat} = 1.63$.
From Table B: one-tailed P-value = .0515; two-tailed P-value = 2 × .0515 = .1030
Marginal evidence against $H_0$ (marginal significance)
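The z statistic’s formula slide is missing, so the sketch below assumes the usual large-sample statistic $z = (\hat{p} - p_0)/\sqrt{p_0 q_0 / n}$, with the standard error computed from $p_0$ (not $\hat{p}$). It uses Python’s `statistics.NormalDist` in place of Table B. The function name `one_sample_z_test` is hypothetical.

```python
import math
from statistics import NormalDist


def one_sample_z_test(x, n, p0):
    """Large-sample z test of H0: p = p0 (normal approximation).
    The standard error uses p0, the value specified by H0."""
    p_hat = x / n
    se0 = math.sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / se0
    p_one = 1 - NormalDist().cdf(z)              # upper tail, for Ha: p > p0
    p_two = 2 * (1 - NormalDist().cdf(abs(z)))   # two-sided, for Ha: p != p0
    return z, p_one, p_two


z, p_one, p_two = one_sample_z_test(17, 57, 0.21)
print(f"z = {z:.2f}, one-tailed P = {p_one:.4f}, two-tailed P = {p_two:.4f}")
```

Carried at full precision, z comes out near 1.64 with a one-tailed P-value of about .051 and a two-tailed P-value of about .10; the small gap from the hand calculation (1.63, .0515) reflects rounding of intermediate values.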
17 Summary of Illustrative Test
$H_0: p = .21$
17 of 57 (.2982) were smokers
Two-tailed P-value = .10
The evidence against the claim made by $H_0$ is marginally significant.
My P value is marginally small!?
18 Hypothesis Test with the CI
$H_0: p = p_0$ can be tested at the α level of significance with the (1 – α)100% CI.
Example: Test $H_0: p = .21$ at α = .05
95% CI for p = (.195 to .428) (plus-four CI, slide 11)
The CI does NOT exclude a proportion of .21.
Therefore, the difference is NOT significant at α = .05.
19 WinPEPI > Describe > A.
Test $H_0: p = .21$
Data: x = 17; n = 57
Note: The mid-P exact P-values match our z test!
[Screenshots: WinPEPI input and output]
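To see why the mid-P value tracks the z test, the sketch below computes exact binomial upper-tail probabilities for x = 17, n = 57 under $p_0 = .21$. The conventional exact P-value is $P(X \ge x)$; the mid-P value counts only half of the observed outcome, $P(X > x) + \tfrac{1}{2}P(X = x)$. This is a from-scratch calculation, not WinPEPI’s code, and the helper name `binom_pmf` is hypothetical.

```python
import math
from statistics import NormalDist


def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


x, n, p0 = 17, 57, 0.21   # smokers, sample size, proportion under H0

# Upper-tail probabilities for Ha: p > p0
p_greater = sum(binom_pmf(k, n, p0) for k in range(x + 1, n + 1))  # P(X > 17)
exact_p = p_greater + binom_pmf(x, n, p0)        # conventional exact P(X >= 17)
mid_p = p_greater + 0.5 * binom_pmf(x, n, p0)    # mid-P: only half of P(X = 17)

# Large-sample z test for comparison
z = (x / n - p0) / math.sqrt(p0 * (1 - p0) / n)
z_p = 1 - NormalDist().cdf(z)

print(f"one-tailed exact P  = {exact_p:.4f}")
print(f"one-tailed mid-P    = {mid_p:.4f}")
print(f"one-tailed z-test P = {z_p:.4f}")
```

The mid-P value lands much closer to the z-test P-value than the conventional exact P-value does. Two-sided values are often obtained by doubling, though WinPEPI’s own two-sided convention may differ.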
20 Tea Challenge Example
An individual correctly identifies the order of adding milk to tea in 6 of 8 attempts. Ask: Is this good evidence against guessing (a 50/50 chance of being right)?
A. Hypotheses: Under the null hypothesis of guessing (“50/50” chance of being right), $H_0: p = 0.5$
B. Statistic: None. An exact binomial procedure must be used (small n: only $np_0 = 8 \times 0.5 = 4$ successes are expected, fewer than 5).
21 WinPEPI > Describe > A.
[Screenshots: WinPEPI input and output]
The two-tailed exact mid-P P-value = 0.180, suggesting the evidence against $H_0$ is not significant.
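Because n = 8 is tiny, the binomial probabilities can be computed exactly as fractions. The sketch below finds the conventional one-tailed exact P-value $P(X \ge 6)$ and the one-tailed mid-P value $P(X > 6) + \tfrac{1}{2}P(X = 6)$ for X ~ Binomial(8, 0.5), then doubles the mid-P for a two-tailed value. Doubling is an assumption about how the two-tailed figure is formed; it is exact here because the null distribution is symmetric, and it reproduces the 0.180 shown. The function name `upper_tail_probs` is hypothetical.

```python
from fractions import Fraction
from math import comb


def upper_tail_probs(x, n, p0):
    """Return (P(X >= x), mid-P) for X ~ Binomial(n, p0), computed exactly
    with rational arithmetic. Mid-P counts only half of P(X = x)."""
    p0 = Fraction(p0)

    def pmf(k):
        return comb(n, k) * p0**k * (1 - p0) ** (n - k)

    p_greater = sum(pmf(k) for k in range(x + 1, n + 1))  # P(X > x)
    exact = p_greater + pmf(x)        # conventional exact P(X >= x)
    mid = p_greater + pmf(x) / 2      # mid-P
    return exact, mid


exact, mid = upper_tail_probs(6, 8, Fraction(1, 2))
print(f"one-tailed exact P = {exact} = {float(exact):.4f}")  # 37/256 = 0.1445
print(f"one-tailed mid-P   = {mid} = {float(mid):.4f}")      # 23/256 = 0.0898
print(f"two-tailed mid-P   = {float(2 * mid):.3f}")          # 0.180
```

The conventional exact two-tailed P-value (2 × 37/256 ≈ .29) is noticeably larger than the mid-P value; either way, 6 of 8 is not convincing evidence against guessing.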
22 Conditions
Sampling independence
Valid data
The plus-four CI requires at least 10 observations.
z tests require at least 5 expected cases.
“I'd rather have a sound judgment than a talent.”
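Independence and data validity have to be judged from the study design, but the two numeric rules of thumb can be checked mechanically. The sketch below is a hypothetical helper, `choose_procedures`; note that it also requires at least 5 expected failures, a common textbook convention that the slide itself does not state.

```python
def choose_procedures(n, p0):
    """Hypothetical helper applying the slide's rules of thumb.
    - z test: at least 5 expected successes under H0 (this sketch also
      requires at least 5 expected failures, an added assumption).
    - plus-four CI: at least 10 observations."""
    expected_successes = n * p0
    expected_failures = n * (1 - p0)
    use_z = expected_successes >= 5 and expected_failures >= 5
    return {
        "test": "z test" if use_z else "exact binomial",
        "plus_four_ci_ok": n >= 10,
    }


print(choose_procedures(57, 0.21))  # smoking example: 11.97 expected successes
print(choose_procedures(8, 0.5))    # tea challenge: 4 expected successes
```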
23 Sample Size and Power
Three approaches:
1. n needed to estimate p with margin of error m (estimation)
2. n needed to test $H_0$ at a given α and power
3. Power of a test of $H_0$ at a given n
Presentation covers first method only
24 To Achieve Margin of Error m
$n = \frac{z_{1-\alpha/2}^2 \, p^* q^*}{m^2}$, where $p^*$ represents an educated guess for the population proportion p and $q^* = 1 - p^*$
When no educated guess for $p^*$ is available, let $p^* = .5$ (for a conservative estimate).
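25 Example: n Needed to Achieve m
Suppose $p^* = 0.30$.
For 95% confidence and m = .03, use: $n = \frac{1.96^2(.30)(.70)}{.03^2} = 896.4$, rounded up to 897
For 95% confidence and m = .05, use: $n = \frac{1.96^2(.30)(.70)}{.05^2} = 322.7$, rounded up to 323
A larger sample size yields a smaller m.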
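The margin-of-error formula $n = z_{1-\alpha/2}^2 p^* q^* / m^2$ translates directly into a small function. This is a sketch; the function name `n_for_margin` is hypothetical, and it looks up $z_{1-\alpha/2}$ with `statistics.NormalDist` rather than using the rounded 1.96.

```python
import math
from statistics import NormalDist


def n_for_margin(p_star, m, confidence=0.95):
    """Sample size so a (confidence)100% CI for p has margin of error about m.
    p_star is an educated guess for p; use 0.5 when no guess is available."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    n = z**2 * p_star * (1 - p_star) / m**2
    return math.ceil(n)  # round up so the margin is no larger than m


for m in (0.03, 0.05):
    print(f"m = {m}: n = {n_for_margin(0.30, m)} (p* = .30), "
          f"{n_for_margin(0.5, m)} (p* = .5, conservative)")
```

With $p^* = .30$ this gives 897 for m = .03 and 323 for m = .05; the conservative $p^* = .5$ raises these to 1068 and 385.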
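26 n to Test $H_0: p = p_0$
$n = \frac{\left(z_{1-\alpha/2}\sqrt{p_0 q_0} + z_{1-\beta}\sqrt{p_1 q_1}\right)^2}{(p_1 - p_0)^2}$
where α ≡ alpha level of the test (two-sided); 1 – β ≡ power of the test; $p_0$ ≡ proportion under the null hypothesis; $p_1$ ≡ proportion under the alternative hypothesis; $q_0 = 1 - p_0$ and $q_1 = 1 - p_1$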
27 n to Test $H_0: p = p_0$, Example
How large a sample is needed to test $H_0: p = .21$ against $H_a: p = .31$ at α = 0.05 (two-sided) with 90% power?
Round n up to the next whole number to ensure the stated power.
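The sketch below implements the normal-approximation sample-size formula for testing $H_0: p = p_0$ against a specific $p_1$, with $z_{1-\alpha/2}$ and $z_{1-\beta}$ taken from `statistics.NormalDist`. The function name `n_for_test` is hypothetical, and the formula is assumed to be the one on the previous slide.

```python
import math
from statistics import NormalDist


def n_for_test(p0, p1, alpha=0.05, power=0.90):
    """Sample size for a two-sided z test of H0: p = p0 to reach the given
    power when the true proportion is p1 (normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # z_{1-alpha/2}
    z_beta = NormalDist().inv_cdf(power)            # z_{1-beta}
    numerator = (z_alpha * math.sqrt(p0 * (1 - p0))
                 + z_beta * math.sqrt(p1 * (1 - p1))) ** 2
    n = numerator / (p1 - p0) ** 2
    return n, math.ceil(n)   # raw value and rounded-up sample size


raw, n = n_for_test(0.21, 0.31)
print(f"n = {raw:.1f}, round up to {n}")   # n = 193.5, round up to 194
```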
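28 Power When Testing $H_0: p = p_0$
$1 - \beta = \Phi\left(-z_{1-\alpha/2}\sqrt{\frac{p_0 q_0}{p_1 q_1}} + \frac{|p_1 - p_0|\sqrt{n}}{\sqrt{p_1 q_1}}\right)$
where α ≡ alpha level of the test (two-sided); n ≡ sample size; $p_0$ ≡ proportion under the null hypothesis; $p_1$ ≡ proportion under the alternative hypothesis; $q_0 = 1 - p_0$ and $q_1 = 1 - p_1$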
29 Power, Example
What is the power of testing $H_0: p = .21$ against $H_a: p = .31$ at α = 0.05 (two-sided) when n = 57?
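Power can be approximated by evaluating the standard normal cdf Φ at $-z_{1-\alpha/2}\sqrt{p_0 q_0/(p_1 q_1)} + |p_1 - p_0|\sqrt{n}/\sqrt{p_1 q_1}$. The sketch below does this for the example; the function name `power_one_proportion` is hypothetical, and like the formula it ignores the (negligible) chance of rejecting in the opposite tail.

```python
import math
from statistics import NormalDist


def power_one_proportion(p0, p1, n, alpha=0.05):
    """Approximate power of a two-sided z test of H0: p = p0 when the true
    proportion is p1 (ignores the rejection region on the far side)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    root_pq0 = math.sqrt(p0 * (1 - p0))   # sqrt(p0 q0)
    root_pq1 = math.sqrt(p1 * (1 - p1))   # sqrt(p1 q1)
    arg = -z_alpha * root_pq0 / root_pq1 + abs(p1 - p0) * math.sqrt(n) / root_pq1
    return NormalDist().cdf(arg)


print(f"power = {power_one_proportion(0.21, 0.31, 57):.2f}")  # power = 0.46
```

With only 57 observations, the chance of detecting a true prevalence of .31 is under one half, far below the 90% power targeted in the sample-size example.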