STAT 312 Chapter 7 - Statistical Intervals Based on a Single Sample


1 STAT 312 Chapter 7 - Statistical Intervals Based on a Single Sample
7.1 - Basic Properties of Confidence Intervals
7.2 - Large-Sample Confidence Intervals for a Population Mean and Proportion
7.3 - Intervals Based on a Normal Population Distribution
7.4 - Confidence Intervals for the Variance and Standard Deviation of a Normal Population
Chapter 8 - Tests of Hypotheses Based on a Single Sample
8.1 - Hypotheses and Test Procedures
8.2 - Z-Tests for Hypotheses about a Population Mean
8.3 - The One-Sample T-Test
8.4 - Tests Concerning a Population Proportion
8.5 - Further Aspects of Hypothesis Testing

2 “Statistical Inference”
Inference about the POPULATION via “Hypothesis Testing”
Study Question: Has the mean (i.e., average) “Age at First Birth” of women in the U.S. changed since 2010 (25.4 yrs old)?
Present Day: Assume X = “Age at First Birth” follows a normal distribution (i.e., “bell curve”) in the population.
H0: population mean age μ = 25.4 (i.e., no change since 2010)
Random sample of size n = 400 ages: x1, x2, x3, x4, x5, …, x400
The reasonableness of the normality assumption is empirically verifiable (e.g., histogram, Q-Q plot), and in fact formally testable, from the sample data. If it is violated (e.g., skewed data) or the check is inconclusive (e.g., small sample size), then a transformation (e.g., logarithm) or a “distribution-free” nonparametric test should be used instead. Examples: Sign Test, Wilcoxon Signed Rank Test. (The Mann-Whitney U Test is the two-sample analogue, equivalent to the Wilcoxon Rank Sum Test.)
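As a rough illustration of the normality check and the nonparametric fallback mentioned above, here is a minimal R sketch. The data are simulated (an assumption, since the actual 400 ages are not given), and the variable names are hypothetical; `shapiro.test` and `wilcox.test` are standard functions in R's stats package.

```r
# Simulated data only: a stand-in for the n = 400 observed ages.
set.seed(312)
ages <- rnorm(400, mean = 25.4, sd = 1.5)

# Graphical checks (uncomment when running interactively):
# hist(ages); qqnorm(ages); qqline(ages)

# Formal test of normality (H0: the data come from a normal distribution).
normality <- shapiro.test(ages)

# Distribution-free alternative to the one-sample t-test:
# Wilcoxon signed rank test of H0: center of symmetry = 25.4.
signed_rank <- wilcox.test(ages, mu = 25.4)

print(normality$p.value)
print(signed_rank$p.value)
```

A small Shapiro-Wilk p-value would cast doubt on normality; with real data one would also look at the histogram and Q-Q plot rather than rely on the test alone.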

3 “Statistical Inference”
Inference about the POPULATION via “Hypothesis Testing”
Study Question: Has the mean (i.e., average) “Age at First Birth” of women in the U.S. changed since 2010 (25.4 yrs old)?
Present Day: Assume X = “Age at First Birth” follows a normal distribution (i.e., “bell curve”) in the population.
H0: population mean age μ = 25.4 (i.e., no change since 2010)
Random sample of size n = 400 ages: x1, x2, x3, x4, x5, …, x400
The required sample size n depends in part on the power of the test, i.e., the desired probability of correctly rejecting a false null hypothesis (80% or more). Coming up next!

4 Power and Sample Size
IF the null hypothesis H0: μ = μ0 is true, then we should expect a random sample mean to lie in its “acceptance region” with probability 1 – α, the “confidence level.” That is,
P(Accept H0 | H0 is true) = 1 – α.
Therefore, we should expect a random sample mean to lie in its “rejection region” with probability α, the “significance level.” That is,
P(Reject H0 | H0 is true) = α (“Type 1 Error”).
Figure: “Null Distribution” of the sample mean under H0: μ = μ0, with a central Acceptance Region for H0 (area 1 – α) and a Rejection Region of area α/2 in each tail; the upper critical value is μ0 + zα/2 (σ / √n).

5 Power and Sample Size
IF the null hypothesis H0: μ = μ0 is false, then the “power” to correctly reject it in favor of a particular alternative HA: μ = μ1 is
P(Reject H0 | H0 is false) = 1 – β.
Thus, P(Accept H0 | H0 is false) = β (“Type 2 Error”).
Figure: the “Null Distribution” (H0: μ = μ0; acceptance region of area 1 – α, rejection regions of area α/2 each) overlaid with the “Alternative Distribution” (HA: μ = μ1; area 1 – β in the rejection region, area β in the acceptance region).
The upper critical value can be written two ways: μ0 + zα/2 (σ / √n) under H0, and μ1 – zβ (σ / √n) under HA (taking μ1 > μ0). Set them equal to each other, and solve for n…
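The “set them equal and solve for n” step can be written out as follows. This is a sketch under the assumptions that μ1 > μ0 and that the negligible chance of rejecting in the far (lower) tail is ignored; the symbol Δ for the standardized difference is the one used in the worked examples on the following slides.

```latex
\begin{aligned}
\mu_0 + z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} &= \mu_1 - z_{\beta}\,\frac{\sigma}{\sqrt{n}} \\
\left(z_{\alpha/2} + z_{\beta}\right)\frac{\sigma}{\sqrt{n}} &= \mu_1 - \mu_0 \\
\sqrt{n} &= \frac{z_{\alpha/2} + z_{\beta}}{\Delta},
  \qquad \Delta = \frac{|\mu_1 - \mu_0|}{\sigma} \\
n &= \left(\frac{z_{\alpha/2} + z_{\beta}}{\Delta}\right)^{2}
\end{aligned}
```

Rounding n up to the next integer gives the minimum sample size that achieves at least the desired power.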

6 Given:
X ~ N(μ, σ): normally distributed population random variable, with unknown mean but known standard deviation
H0: μ = μ0 (null hypothesis value)
HA: μ = μ1 (specific alternative hypothesis value)
α = significance level (or equivalently, confidence level 1 – α)
1 – β = power (or equivalently, Type 2 error rate β)
Then the minimum required sample size is n ≥ ((zα/2 + zβ) / Δ)², where Δ = |μ1 – μ0| / σ and zp is the value of Z ~ N(0, 1) with right-tail area p.
Example: σ = 1.5 yrs, μ0 = 25.4 yrs, α = .05 ⇒ z.025 = 1.96
Suppose it is suspected that currently, μ1 = 26 yrs.
Want 90% power of correctly rejecting H0 in favor of HA, if it is false ⇒ 1 – β = .90 ⇒ β = .10 ⇒ z.10 = 1.28 (via qnorm(.9))
Δ = |26 – 25.4| / 1.5 = 0.4
So… minimum sample size required is n ≥ 66
Want more power!

7 Given:
X ~ N(μ, σ): normally distributed population random variable, with unknown mean but known standard deviation
H0: μ = μ0 (null hypothesis value)
HA: μ = μ1 (specific alternative hypothesis value)
α = significance level (or equivalently, confidence level 1 – α)
1 – β = power (or equivalently, Type 2 error rate β)
Then the minimum required sample size is n ≥ ((zα/2 + zβ) / Δ)², where Δ = |μ1 – μ0| / σ.
Example: σ = 1.5 yrs, μ0 = 25.4 yrs, α = .05 ⇒ z.025 = 1.96 (via qnorm(.975))
Suppose it is suspected that currently, μ1 = 26 yrs ⇒ Δ = |26 – 25.4| / 1.5 = 0.4
Want 90% power of correctly rejecting H0 in favor of HA, if it is false ⇒ 1 – β = .90 ⇒ β = .10 ⇒ z.10 = 1.28 (via qnorm(.9)) ⇒ minimum sample size required is n ≥ 66
Want 95% power of correctly rejecting H0 in favor of HA, if it is false ⇒ 1 – β = .95 ⇒ β = .05 ⇒ z.05 = 1.645 (via qnorm(.95)) ⇒ minimum sample size required is n ≥ 82
Next: change μ1.

8 Given:
X ~ N(μ, σ): normally distributed population random variable, with unknown mean but known standard deviation
H0: μ = μ0 (null hypothesis value)
HA: μ = μ1 (specific alternative hypothesis value)
α = significance level (or equivalently, confidence level 1 – α)
1 – β = power (or equivalently, Type 2 error rate β)
Then the minimum required sample size is n ≥ ((zα/2 + zβ) / Δ)², where Δ = |μ1 – μ0| / σ.
Example: σ = 1.5 yrs, μ0 = 25.4 yrs, α = .05 ⇒ z.025 = 1.96 (via qnorm(.975))
Want 95% power of correctly rejecting H0 in favor of HA, if it is false ⇒ 1 – β = .95 ⇒ β = .05 ⇒ z.05 = 1.645
Suppose it is suspected that currently, μ1 = 26 yrs ⇒ Δ = |26 – 25.4| / 1.5 = 0.4 ⇒ minimum sample size required is n ≥ 82
Suppose it is suspected that currently, μ1 = 25.7 yrs ⇒ Δ = |25.7 – 25.4| / 1.5 = 0.2 ⇒ minimum sample size required is n ≥ 325
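The three worked examples above (n ≥ 66, 82, and 325) can be reproduced with a few lines of R. This is a minimal sketch, not code from the lecture: the function name `min_sample_size` and its argument names are hypothetical, and it assumes a two-sided Z-test with known σ.

```r
# Minimum n for a two-sided Z-test with known sigma (hypothetical helper).
min_sample_size <- function(mu0, mu1, sigma, alpha = 0.05, power = 0.90) {
  delta   <- abs(mu1 - mu0) / sigma   # standardized difference
  z_alpha <- qnorm(1 - alpha / 2)     # z_{alpha/2}, e.g. 1.96 for alpha = .05
  z_beta  <- qnorm(power)             # z_{beta},    e.g. 1.28 for 90% power
  ceiling(((z_alpha + z_beta) / delta)^2)
}

n_90      <- min_sample_size(25.4, 26.0, 1.5, power = 0.90)  # 90% power, mu1 = 26
n_95      <- min_sample_size(25.4, 26.0, 1.5, power = 0.95)  # 95% power, mu1 = 26
n_95_near <- min_sample_size(25.4, 25.7, 1.5, power = 0.95)  # 95% power, mu1 = 25.7

print(c(n_90, n_95, n_95_near))
```

Halving Δ roughly quadruples the required n, since n is proportional to 1/Δ².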

9 Given:
X ~ N(μ, σ): normally distributed population random variable, with unknown mean but known standard deviation
H0: μ = μ0 (null hypothesis value)
HA: μ = μ1 (specific alternative hypothesis value)
α = significance level (or equivalently, confidence level 1 – α)
1 – β = power (or equivalently, Type 2 error rate β)
Then the minimum required sample size is n ≥ ((zα/2 + zβ) / Δ)², where Δ = |μ1 – μ0| / σ.
Example: σ = 1.5 yrs, μ0 = 25.4 yrs, α = .05 ⇒ z.025 = 1.96
Suppose it is suspected that currently, μ1 = 25.7 yrs ⇒ Δ = 0.2.
With n = 400, how much power exists to correctly reject H0 in favor of HA, if it is false?
Power = 1 – β = P(Z ≤ Δ√n – z.025) = P(Z ≤ (0.2)(20) – 1.96) = P(Z ≤ 2.04) = .9793, i.e., 98%
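The power calculation for a fixed n is the sample-size formula solved for 1 – β instead of n. A minimal R sketch follows; the function name `power_z_test` is hypothetical, and the formula ignores the negligible probability of rejecting in the tail opposite to μ1.

```r
# Approximate power of a two-sided Z-test with known sigma (hypothetical helper).
power_z_test <- function(mu0, mu1, sigma, n, alpha = 0.05) {
  delta <- abs(mu1 - mu0) / sigma                # standardized difference
  pnorm(delta * sqrt(n) - qnorm(1 - alpha / 2))  # P(Z <= delta*sqrt(n) - z_{alpha/2})
}

power_400 <- power_z_test(mu0 = 25.4, mu1 = 25.7, sigma = 1.5, n = 400)
print(round(power_400, 2))
```

With n = 400 this gives roughly the 98% power quoted above, comfortably more than the 95% that n = 325 was designed to achieve.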


11 Comments

12 Given:
X ~ N(μ, σ): normally distributed population random variable, with unknown mean but known standard deviation
H0: μ = μ0 (null hypothesis)
HA: μ ≠ μ0 (2-sided alternative hypothesis)
α = significance level (or equivalently, confidence level 1 – α)
n = sample size
From the sample x1, x2,…, xn, we obtain the sample mean x̄, the sample standard deviation s, and the “standard error” s.e. = σ / √n, with which to test the null hypothesis (via confidence interval, acceptance region, or p-value).
In practice, however, it is far more common that the true population standard deviation σ is unknown. So we must estimate it from the sample! Recall that s estimates σ, so the estimated standard error is s / √n.

13 Given:
X ~ N(μ, σ): normally distributed population random variable, with unknown mean but known standard deviation
H0: μ = μ0 (null hypothesis)
HA: μ ≠ μ0 (2-sided alternative hypothesis)
α = significance level (or equivalently, confidence level 1 – α)
n = sample size
From the sample x1, x2,…, xn, we obtain the sample mean x̄, the sample standard deviation s, and the “standard error” s.e. = σ / √n, with which to test the null hypothesis (via confidence interval, acceptance region, or p-value).
In practice, however, it is far more common that the true population standard deviation σ is unknown. So we must estimate it from the sample! Recall that s estimates σ, so the estimated standard error is s / √n.
This introduces additional variability from one sample to another… PROBLEM??? Not if n is “large”… say, ≥ 30.
But what if n < 30? T-test!

14 Student’s T-Distribution
… is actually a family of distributions, indexed by the degrees of freedom (df), labeled t_df. It is named for “Student,” the pen name of William S. Gosset.
Figure: densities of t1, t2, t3, and t10 compared with Z ~ N(0, 1).
As the sample size n gets large, t_df converges to the standard normal distribution Z ~ N(0, 1). So the T-test is especially useful when n < 30.

15 Student’s T-Distribution
… is actually a family of distributions, indexed by the degrees of freedom (df), labeled t_df. It is named for “Student,” the pen name of William S. Gosset.
Figure: density of t4 compared with Z ~ N(0, 1); the right-tail area beyond z = 1.96 under Z is .025.
As the sample size n gets large, t_df converges to the standard normal distribution Z ~ N(0, 1). So the T-test is especially useful when n < 30.

16 Lecture Notes Appendix…
or… in R, either of these equivalent calls:
qt(.975, 4)
qt(.025, 4, lower.tail = FALSE)
[1] …

17 Student’s T-Distribution
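… is actually a family of distributions, indexed by the degrees of freedom (df), labeled t_df. It is named for “Student,” the pen name of William S. Gosset.
Figure: density of t4 compared with Z ~ N(0, 1), each with right-tail area .025; the corresponding critical values are z.025 = 1.96 and t4, .025 = 2.776.
Because any t-distribution has heavier tails than the Z-distribution, it follows that for the same right-tailed area value, t-score > z-score.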

18 If n is large, T-score ≈ 2. If n is small, T-score > 2.
… the “T-score” increases (from ≈ 2 up to its maximum, reached at df = 1, for a 95% confidence level) as n decreases ⇒ larger margin of error ⇒ less power to reject, even if a genuine difference exists!
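To see how the 95% critical value grows as the sample size shrinks, one can tabulate `qt` against `qnorm` in R. This is a small illustrative sketch; the particular df values chosen are arbitrary.

```r
# 95% two-sided critical values: t_{df, .025} versus z_{.025}.
df_values <- c(1, 2, 3, 4, 10, 15, 30, 100)
t_crit <- qt(0.975, df_values)   # right-tail area .025 under t_df
z_crit <- qnorm(0.975)           # right-tail area .025 under N(0, 1)

comparison <- data.frame(
  df     = df_values,
  t_crit = round(t_crit, 3),
  z_crit = round(z_crit, 3)
)
print(comparison)
```

Every t critical value exceeds the z value, and the gap closes as df increases, which is the convergence to N(0, 1) described on the earlier slides.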

19 Given: X = Age at first birth ~ N(μ, σ)
H0: μ = 25.4 yrs (null hypothesis)
HA: μ ≠ 25.4 yrs (alternative hypothesis)
Previously… σ = 1.5 yrs, n = 400, statistically significant at α = .05
Now suppose that σ is unknown, and n < 30.
Example: n = 16, s = 1.22 yrs
standard error (estimate) = s / √n = 1.22 / √16 = 0.305 yrs
.025 critical value = t15, .025

20 Lecture Notes Appendix…

21 Given: X = Age at first birth ~ N(μ, σ)
H0: μ = 25.4 yrs (null hypothesis)
HA: μ ≠ 25.4 yrs (alternative hypothesis)
Previously… σ = 1.5 yrs, n = 400, statistically significant at α = .05
Now suppose that σ is unknown, and n < 30.
Example: n = 16, sample mean = 25.9 yrs, s = 1.22 yrs
standard error (estimate) = s / √n = 1.22 / √16 = 0.305 yrs
.025 critical value = t15, .025 = 2.131
95% margin of error = (2.131)(0.305 yrs) = 0.65 yrs
95% Confidence Interval = (25.9 – 0.65, 25.9 + 0.65) = (25.25, 26.55) yrs
Test Statistic: t = (25.9 – 25.4) / 0.305 = 1.639
p-value = 2 P(T15 ≥ 1.639)

22 Lecture Notes Appendix…

23 Given: X = Age at first birth ~ N(μ, σ)
H0: μ = 25.4 yrs (null hypothesis)
HA: μ ≠ 25.4 yrs (alternative hypothesis)
Previously… σ = 1.5 yrs, n = 400, statistically significant at α = .05
Now suppose that σ is unknown, and n < 30.
Example: n = 16, sample mean = 25.9 yrs, s = 1.22 yrs
standard error (estimate) = s / √n = 1.22 / √16 = 0.305 yrs
.025 critical value = t15, .025 = 2.131
95% margin of error = (2.131)(0.305 yrs) = 0.65 yrs
95% Confidence Interval = (25.9 – 0.65, 25.9 + 0.65) = (25.25, 26.55) yrs
p-value = 2 P(T15 ≥ 1.639) = 2 × (between .05 and .10) = between .10 and .20.
CONCLUSIONS:
The 95% CI does contain the null value μ = 25.4.
The p-value is between .10 and .20, i.e., > .05. (Note: The R command 2 * pt(1.639, 15, lower.tail = F) gives the exact p-value as .122.)
Not statistically significant; small n gives low power!
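The whole one-sample t-test above can be recomputed from the summary statistics alone. The following R sketch uses only the numbers given on the slide (n = 16, sample mean 25.9, s = 1.22, null value 25.4); the variable names are hypothetical.

```r
# One-sample t-test from summary statistics (values taken from the example).
xbar <- 25.9   # sample mean (yrs)
s    <- 1.22   # sample standard deviation (yrs)
n    <- 16     # sample size
mu0  <- 25.4   # null value (yrs)

se      <- s / sqrt(n)                  # estimated standard error
t_crit  <- qt(0.975, df = n - 1)        # t_{15, .025}
moe     <- t_crit * se                  # 95% margin of error
ci      <- c(xbar - moe, xbar + moe)    # 95% confidence interval
t_stat  <- (xbar - mu0) / se            # test statistic
p_value <- 2 * pt(abs(t_stat), df = n - 1, lower.tail = FALSE)  # two-sided

print(round(c(se = se, t_crit = t_crit, moe = moe), 3))
print(round(ci, 2))
print(round(c(t = t_stat, p = p_value), 3))
```

Because the interval contains 25.4 and the p-value exceeds .05, both routes lead to the same conclusion: H0 is not rejected at the 5% level.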

24 Edited R code:
# Generate a normally distributed random sample of 16 age values…
y = rnorm(16, 0, 1)
z = (y - mean(y)) / sd(y)   # standardize: mean 0, sd 1 exactly
x = 25.9 + 1.22*z           # rescale to the desired mean and sd
sort(round(x, 1))
# … with sample mean = 25.9 and sample sd = 1.22.
c(mean(x), sd(x))
[1] …
# Two-sided one-sample t-test of H0: μ = 25.4
t.test(x, mu = 25.4)
One Sample t-test
data: x
t = …, df = 15, p-value = …
alternative hypothesis: true mean is not equal to 25.4
95 percent confidence interval: …
sample estimates: mean of x 25.9


27 See…

